In machine learning, the goal is to learn a function that approximates the true relationship between inputs (X) and outputs (y).
However, every model makes assumptions.
Prediction error can be decomposed into three components: bias, variance, and irreducible error.
Understanding these components explains why models underfit or overfit.
Assume there exists an unknown true function:
y = f(x) + ε
Where:
- f(x) is the true underlying relationship
- ε is random noise (irreducible error)
A model tries to approximate f(x) with an estimated function:
ŷ = g(x)
The difference between f(x) and g(x) is where bias and variance arise.
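This setup can be sketched with a small simulation. The specific true function f(x) = sin(2πx), the noise level, and the choice of a straight line for g(x) are illustrative assumptions, not given by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true function and noise level; the text only fixes the
# form y = f(x) + eps, not these particular choices.
def f(x):
    return np.sin(2 * np.pi * x)

NOISE_STD = 0.3

def sample_dataset(n=30):
    """Draw (x, y) pairs from y = f(x) + eps."""
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, NOISE_STD, n)
    return x, y

# g(x): here a degree-1 polynomial fit stands in for the learned
# approximation of f.
x, y = sample_dataset()
coeffs = np.polyfit(x, y, deg=1)
g = np.poly1d(coeffs)
print(g(0.5))  # the model's estimate of f(0.5)
```

Everything that follows — bias and variance — is a statement about how g(x) behaves relative to f(x) across many such sampled datasets.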
Bias measures how far the average model prediction is from the true function.
Formally:
Bias(x) = E[g(x)] − f(x)
Bias represents systematic error.
Geometrically, a high-bias model is too rigid to bend and match the curve of the true function. It will consistently miss the true pattern, even if trained on many datasets.

High bias means:
- The model is too simple for the data
- Its errors are systematic rather than random
- Even training performance is poor

This leads to underfitting.
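The expectation E[g(x)] can be estimated by averaging predictions over many independently drawn training sets. In this sketch, the true function and noise level are illustrative assumptions; a straight line fit to a sine curve shows a clearly nonzero bias:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true function; any nonlinear f illustrates the point.
    return np.sin(2 * np.pi * x)

def fit_linear(n=30):
    """Fit g on one fresh dataset drawn from y = f(x) + eps."""
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, 0.3, n)
    return np.poly1d(np.polyfit(x, y, 1))

# Bias(x0) = E[g(x0)] - f(x0), with the expectation taken over datasets
x0 = 0.25
preds = np.array([fit_linear()(x0) for _ in range(500)])
bias = preds.mean() - f(x0)
print(f"estimated bias at x0={x0}: {bias:.3f}")
```

No matter how many datasets are averaged, the line's average prediction stays far from f(x0): the error is systematic.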
Variance measures how much the model prediction changes when trained on different datasets.
Formally:
Variance(x) = E[(g(x) − E[g(x)])²]
Geometrically, a high-variance model bends to follow the noise in each particular sample, so its fitted curve changes shape from one training set to the next.

High variance means:
- The model is too complex for the amount of data
- It fits noise as if it were signal
- Small changes in the training data produce very different fits

This leads to overfitting.
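The variance formula can be estimated the same way, by refitting on many fresh datasets and measuring the spread of predictions at a point. The true function, noise level, and the two polynomial degrees compared here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Hypothetical true function used only for illustration.
    return np.sin(2 * np.pi * x)

def fit_poly(deg, n=30):
    """Fit a degree-`deg` polynomial on one fresh noisy dataset."""
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, 0.3, n)
    return np.poly1d(np.polyfit(x, y, deg))

# Variance(x0) = E[(g(x0) - E[g(x0)])^2], estimated over 300 datasets
x0 = 0.25
variances = {}
for deg in (1, 12):
    preds = np.array([fit_poly(deg)(x0) for _ in range(300)])
    variances[deg] = np.mean((preds - preds.mean()) ** 2)
    print(f"degree {deg:2d}: variance at x0 = {variances[deg]:.4f}")
```

The high-degree model's prediction at the same point swings far more from dataset to dataset than the line's does.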
For a given input x, expected prediction error can be decomposed as:
Total Error = Bias² + Variance + Irreducible Error
Irreducible error comes from noise ε and cannot be eliminated.
The model can only control bias and variance.
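The decomposition can be checked numerically: simulate many training sets, estimate bias² and variance at a point, and compare their sum (plus σ²) against the directly measured expected squared error. The true function, noise level σ = 0.3, and the cubic model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
NOISE_STD = 0.3  # sigma of eps; assumed known so sigma^2 can be added in

def f(x):
    # Hypothetical true function for the simulation.
    return np.sin(2 * np.pi * x)

x0 = 0.25
preds, sq_errors = [], []
for _ in range(2000):
    x = rng.uniform(0, 1, 30)
    y = f(x) + rng.normal(0, NOISE_STD, 30)
    g = np.poly1d(np.polyfit(x, y, 3))
    preds.append(g(x0))
    # a fresh noisy target at x0, for the expected squared prediction error
    y0 = f(x0) + rng.normal(0, NOISE_STD)
    sq_errors.append((y0 - g(x0)) ** 2)

preds = np.array(preds)
bias_sq = (preds.mean() - f(x0)) ** 2
variance = preds.var()
decomposed = bias_sq + variance + NOISE_STD ** 2
total = np.mean(sq_errors)
print(f"bias^2 + variance + sigma^2 = {decomposed:.4f}")
print(f"measured expected error     = {total:.4f}")
```

Up to Monte Carlo noise, the two numbers agree: the three components account for all of the expected error.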
Consider fitting points in 2D space.
Case 1: Linear model for nonlinear data
- The fitted line cannot follow the curve
- Systematic deviation appears → High Bias
Case 2: Very high-degree polynomial
- The curve passes through every point
- Small fluctuations in data drastically change the shape → High Variance
The ideal model:
- Is flexible enough to capture the true pattern (low bias)
- Is stable enough to give similar fits across datasets (low variance)

Model complexity affects both components: as complexity increases, bias falls while variance rises; as complexity decreases, the reverse happens. There is no free improvement.

The goal is to find a balance that minimizes total error.
Underfitting:
- High Bias
- Low Variance
- Poor training performance
Overfitting:
- Low Bias
- High Variance
- Large gap between training and validation performance
Bias and variance explain:
- Why overly simple models underfit
- Why overly complex models overfit
- Why there is a tradeoff when choosing model complexity

They are central to understanding model behavior.