After training a machine learning model, we need to understand how good its predictions really are.
Evaluation metrics help us answer different questions about model behavior. There is no single “best” metric — each one highlights a different aspect of performance.
In regression problems, predictions are continuous values.
Regression metrics measure how far predictions are from the true values.
Common regression metrics include:

- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R² score
In general:

- lower error means better predictions
- different metrics penalize errors in different ways
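The metrics above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard formulas, not a library implementation; the function name and sample values are made up for the example.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot             # R² = 1 means a perfect fit
    return mae, mse, rmse, r2

mae, mse, rmse, r2 = regression_metrics([3.0, 5.0, 2.5], [2.5, 5.0, 3.0])
```

Note how MSE (and therefore RMSE) squares each error, so it penalizes large mistakes much more heavily than MAE does.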
In classification, the model answers a yes / no question or chooses a class.
Before defining metrics, we must clearly define the question the model is answering.
Question:
> Does this image contain a cat?
Possible answers:

- Yes → Cat
- No → Not a cat
The confusion matrix helps us understand how the model answers this question.
|                 | Predicted: Cat | Predicted: Not Cat |
|-----------------|----------------|--------------------|
| Actual: Cat     | TP             | FN                 |
| Actual: Not Cat | FP             | TN                 |
Interpretation:

- True Positive (TP): the image contains a cat, and the model says “cat”
- False Negative (FN): the image contains a cat, but the model says “not cat”
- False Positive (FP): the image does not contain a cat, but the model says “cat”
- True Negative (TN): the image does not contain a cat, and the model says “not cat”
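Counting these four outcomes is straightforward. The sketch below tallies them from two parallel lists of labels; the function name and the sample labels are illustrative, not from the original text.

```python
def confusion_counts(y_true, y_pred, positive="cat"):
    """Tally TP, FN, FP, TN for binary labels, treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fn, fp, tn

y_true = ["cat", "cat", "not cat", "not cat", "cat"]
y_pred = ["cat", "not cat", "cat", "not cat", "cat"]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)  # (2, 1, 1, 1)
```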
Accuracy answers the question:
> How often is the model correct overall?
Accuracy is useful when:

- classes are balanced
- all errors have similar consequences
Accuracy can be misleading when:

- one class is much more frequent than the other
- false positives and false negatives have different costs
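A small numeric sketch makes the imbalance problem concrete. The counts below are invented for illustration: a model that always answers “not cat” on a dataset with very few cats.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fn + fp + tn)

# Imbalanced data: 10 cats, 990 non-cats. A model that always says
# "not cat" gets TP=0, FN=10, FP=0, TN=990.
always_negative = accuracy(tp=0, fn=10, fp=0, tn=990)  # 0.99
```

The model scores 99% accuracy while never finding a single cat, which is exactly why accuracy alone can mislead on imbalanced classes.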
Precision answers the question:
> When the model predicts “cat”, how often is it correct?
Precision is important when:

- false positives are costly
- we want to trust positive predictions
Recall answers the question:
> How many actual cats did the model successfully find?
Recall is important when:

- false negatives are costly
- missing a positive case is dangerous
The F1-score balances precision and recall.
It is useful when:

- classes are imbalanced
- both false positives and false negatives matter
- a single metric is needed for comparison
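Precision, recall, and F1 follow directly from the confusion-matrix counts. The sketch below uses the standard formulas; the example counts (TP=8, FP=2, FN=4) are made up for illustration.

```python
def precision(tp, fp):
    """Of all positive predictions, the fraction that were actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all actual positives, the fraction the model found."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

p = precision(tp=8, fp=2)   # 0.8  — 8 of 10 "cat" predictions were right
r = recall(tp=8, fn=4)      # ~0.667 — the model found 8 of 12 actual cats
f1 = f1_score(p, r)
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both reasonably high, which is what makes it a useful single number for comparison.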
Evaluation metrics are not just numbers.
They reflect:

- the question the model is answering
- the real-world cost of errors
- the priorities of the problem
Choosing the right metric is a design decision, not a technical detail.