Optimizing proper loss functions is widely believed to yield predictors with good calibration properties; the intuition is that the global optimum of a proper loss is the ground-truth conditional probability, which is perfectly calibrated. However, typical machine learning models are trained to approximately minimize loss over restricted families of predictors that are unlikely to contain the ground truth. Under what circumstances does minimizing a proper loss over a restricted family lead to calibrated models? What precise calibration guarantees does it provide? In this work, we give a comprehensive answer to these questions. We replace global optimality with a local optimality condition, which stipulates that the (proper) loss of the predictor cannot be reduced much by post-processing its predictions with a certain family of Lipschitz functions. We show that any predictor satisfying this local optimality condition is smoothly calibrated in the sense of Kakade and Foster (2008) and Błasiok et al. (2023). Local optimality is plausibly satisfied by well-trained deep neural networks, which suggests an explanation for why they are calibrated from proper loss minimization alone. Finally, we show that the connection between local optimality and calibration error goes both ways: nearly calibrated predictors are also nearly locally optimal.
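
To make the two central notions concrete, the following is a sketch in notation introduced here for illustration (the formal definitions in the body of the paper handle technicalities such as keeping post-processed predictions in $[0,1]$). For a predictor $f$ and binary labels $y \in \{0,1\}$, the smooth calibration error of Kakade and Foster (2008) and Błasiok et al. (2023) can be written as
\[
\mathrm{smCE}(f) \;=\; \sup_{\eta \in \mathcal{L}} \; \mathbb{E}\bigl[\eta(f(x))\,\bigl(y - f(x)\bigr)\bigr],
\]
where $\mathcal{L}$ is the family of $1$-Lipschitz functions $\eta : [0,1] \to [-1,1]$, and a predictor $f$ is $\epsilon$-locally optimal for a proper loss $\ell$ if no post-processing update drawn from such a family reduces the expected loss by more than $\epsilon$:
\[
\mathbb{E}\bigl[\ell\bigl(y, f(x)\bigr)\bigr] \;-\; \inf_{\eta \in \mathcal{L}} \mathbb{E}\bigl[\ell\bigl(y,\, f(x) + \eta(f(x))\bigr)\bigr] \;\le\; \epsilon.
\]
In this language, the results relate the two quantities in both directions: $\epsilon$-local optimality for a proper loss controls $\mathrm{smCE}(f)$, and small smooth calibration error implies near local optimality.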