Avoiding Overconfidence: The Role of Model Calibration in Machine Learning
Table of contents
- Why is it necessary?
- What is it?
- How to do it?
Why is it necessary?
In (supervised) ML projects, the aim of training is to find a model that can learn from data and estimate outcomes. In (binary) classification, the aim is typically to predict whether an event will occur given a set of features. For example, given the characteristics of a visitor, will they click on an ad or make a purchase?
Depending on the objective of a project, predicting whether a visitor will take a certain action (purchase or click) may be sufficient. However, when the probability of an action (e.g., a purchase) becomes the focus, rather than merely its presence or absence, model calibration becomes an important step in model evaluation. Calibration is not reserved solely for classification; it applies to regression as well.
What is it?
Model calibration is a form of post-processing applied to a trained model to account for the mismatch between its predicted probabilities and the empirical (observed) probabilities.
A typical way to visualize this mismatch is a calibration plot, where the x-axis shows the mean predicted probability and the y-axis shows the empirical probability, i.e., the observed fraction of positives.
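As a minimal sketch of how such a plot can be produced (the held-out labels y_true and predicted probabilities y_prob are placeholders, not data from the article), scikit-learn's calibration_curve computes the binned points:

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

def plot_calibration(y_true, y_prob, n_bins=10):
    # Bin the predictions and compute the observed fraction of positives per bin.
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=n_bins)
    plt.plot(mean_pred, frac_pos, marker="o", label="model")
    # The diagonal corresponds to a perfectly calibrated model.
    plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
    plt.xlabel("Mean predicted probability")
    plt.ylabel("Empirical probability (fraction of positives)")
    plt.legend()
    plt.show()
```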
The models shown above are overconfident at the lower end of the mean predicted probabilities and underconfident at the higher end. A well-calibrated model would follow the blue dashed line, where the empirical probabilities match the mean predicted probabilities.
How to do it?
There are a few methods for calibrating (classification) models: one is Platt scaling, another is isotonic regression. Both are implemented in scikit-learn via CalibratedClassifierCV.
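As a minimal sketch of both methods (the synthetic dataset and the Naive Bayes base model are illustrative assumptions, not the article's exact setup):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5_000, random_state=0)

# method="sigmoid" is Platt scaling; method="isotonic" is isotonic regression.
# With cv=5, the base model is fitted on each training fold and calibrated
# on the corresponding held-out fold.
platt = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5).fit(X, y)
isotonic = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5).fit(X, y)

calibrated_probs = isotonic.predict_proba(X)[:, 1]
```

Platt scaling fits a sigmoid to the model's scores and tends to work with relatively little data; isotonic regression is non-parametric and generally needs more data to avoid overfitting.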
Below, two classifiers (Naive Bayes and logistic regression) are fitted to a synthetic dataset, as in the following sketch.
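A sketch of that setup (the dataset parameters are illustrative assumptions); the resulting curve points are what the calibration plot displays:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

for name, model in [("Naive Bayes", nb), ("Logistic regression", lr)]:
    # Observed fraction of positives vs. mean predicted probability per bin.
    frac_pos, mean_pred = calibration_curve(
        y_test, model.predict_proba(X_test)[:, 1], n_bins=10
    )
    print(name, frac_pos, mean_pred)
```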
Both follow the diagonal but do not fall exactly on it. However, because logistic regression is fitted by minimizing cross-entropy (log loss), a loss that directly penalizes miscalibrated probabilities, its curve is closer to the dashed line than that of Naive Bayes.
Regardless of the type of model, the calibrated versions are closer to the dashed line. In particular, calibration with isotonic regression seems to improve both models, with the biggest improvement for Naive Bayes.
In addition to the above calibration methods, there is another method called "temperature scaling", which rescales a model's raw outputs (logits) by a single learned constant. Furthermore, there is another article on model calibration, written by a Google data scientist, that dives into more detail.
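As a rough illustration of the idea for a binary classifier (this helper is a sketch under assumed inputs, not code from the article): a single temperature T is learned on validation logits by minimizing log loss, and logits are divided by T before the sigmoid.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_temperature(logits, y_true):
    # logits: raw model outputs before the sigmoid; y_true: 0/1 labels,
    # both taken from a held-out validation set.
    def nll(t):
        p = np.clip(sigmoid(logits / t), 1e-12, 1 - 1e-12)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    # Search T over a positive range (the bounds here are an assumption).
    result = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
    return result.x

# Usage: probs = sigmoid(test_logits / fit_temperature(val_logits, val_labels))
```

Because a single scalar cannot change which class has the highest score, temperature scaling adjusts confidence without changing the model's predicted labels.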