Classical ML Foundations

Overfitting and Underfitting

When training a machine learning model, it is important to evaluate not only how well it fits the training data, but also how it performs on unseen data.

Two common problems are underfitting and overfitting.


Underfitting

Underfitting occurs when a model is too simple to capture the underlying patterns in the data.

An underfitted model:
- has high bias
- performs poorly on training data
- performs poorly on validation and test data
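As a minimal sketch of these symptoms, the example below fits a straight line to data generated from y = x^2. The dataset, the train/validation split, and the helper names `fit_line` and `mse` are illustrative assumptions, not part of the original text. The line cannot bend to follow the curve, so the error stays high on both subsets.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for the model y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: noise-free y = x^2 on [-3, 3].
points = [(i / 10, (i / 10) ** 2) for i in range(-30, 31)]
train = points[0::2]  # even-indexed points for training
val = points[1::2]    # odd-indexed points for validation

a, b = fit_line(*zip(*train))


def line(x):
    return a * x + b


train_mse = mse(line, *zip(*train))
val_mse = mse(line, *zip(*val))

# Both errors are far from zero, even though the data contains no noise:
# the model is too simple, not unlucky.
print(f"train MSE = {train_mse:.2f}, validation MSE = {val_mse:.2f}")
```

Because the data is noise-free, a model with a quadratic term would reach zero error here. The remaining error of the line comes entirely from bias.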

Common causes:


Overfitting

Overfitting occurs when a model is flexible enough to fit the noise in the training data along with, or instead of, the underlying patterns.

An overfitted model:
- has high variance
- performs very well on training data
- performs poorly on validation and test data
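As a minimal sketch of this behavior, the example below uses a 1-nearest-neighbor regressor, which simply memorizes its training points. The choice of model, the synthetic data (a sine curve plus Gaussian noise), and the names `noisy_sample`, `predict_1nn`, and `mse` are assumptions made for illustration only.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def noisy_sample(n):
    """Draw n points from y = sin(x) + Gaussian noise (hypothetical data)."""
    xs = [random.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + random.gauss(0, 0.3)) for x in xs]


def predict_1nn(train, x):
    """Return the target value of the training point closest to x."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]


def mse(train, data):
    """Mean squared error of the 1-NN model built on `train`, scored on `data`."""
    return sum((predict_1nn(train, x) - y) ** 2 for x, y in data) / len(data)


train = noisy_sample(40)
val = noisy_sample(40)

# Every training point is its own nearest neighbor, so the training error is zero.
train_mse = mse(train, train)
# On new points the memorized noise does not transfer, so the error is above zero.
val_mse = mse(train, val)

print(f"train MSE = {train_mse:.3f}, validation MSE = {val_mse:.3f}")
```

The zero training error says nothing about how well the model generalizes. It only shows that the model reproduced the noise in its training set exactly.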

Common causes:


Training vs Validation Performance

A useful diagnostic tool is the comparison between training and validation performance:

Scenario       Training Error   Validation Error
Underfitting   High             High
Good fit       Low              Low
Overfitting    Low              High
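The table can be read as a simple decision rule. The sketch below encodes it, assuming a single hypothetical `threshold` that separates "low" from "high" error. What counts as high depends on the task and the metric, and the text does not specify it. The function name `diagnose` and the `"inconclusive"` result for the combination the table does not list are likewise illustrative additions.

```python
def diagnose(train_error, val_error, threshold):
    """Map a (training error, validation error) pair to a scenario in the table.

    `threshold` is an assumed, task-dependent cutoff between low and high error.
    """
    train_high = train_error > threshold
    val_high = val_error > threshold

    if train_high and val_high:
        return "underfitting"
    if not train_high and val_high:
        return "overfitting"
    if not train_high and not val_high:
        return "good fit"
    # High training error with low validation error is not covered by the table.
    return "inconclusive"


# Usage with made-up error values and a cutoff of 0.5:
print(diagnose(0.90, 0.95, threshold=0.5))  # underfitting
print(diagnose(0.05, 0.08, threshold=0.5))  # good fit
print(diagnose(0.02, 0.80, threshold=0.5))  # overfitting
```

In practice the gap between the two errors is often more informative than a fixed cutoff, so the hard threshold should be treated as a simplification.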

Why This Matters

Understanding overfitting and underfitting helps to:
- choose the right model complexity
- interpret training results correctly
- improve generalization to new data