Electronic Journal of Statistics

Beyond sigmoids: How to obtain well-calibrated probabilities from binary classifiers with beta calibration

Meelis Kull, Telmo M. Silva Filho, and Peter Flach

Full-text: Open access


For optimal decision making under variable class distributions and misclassification costs, a classifier needs to produce well-calibrated estimates of the posterior probability. Isotonic calibration is a powerful non-parametric method, but it is prone to overfitting on smaller datasets; hence a parametric method based on the logistic sigmoid curve is commonly used. While logistic calibration is designed for normally distributed per-class scores, we demonstrate experimentally that many classifiers, including Naive Bayes and Adaboost, suffer from a particular distortion where these score distributions are heavily skewed. In such cases logistic calibration can easily yield probability estimates that are worse than the original scores. Moreover, the logistic curve family does not include the identity function, so logistic calibration can easily uncalibrate a perfectly calibrated classifier.
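For context, the logistic (Platt-style) calibration discussed above fits a sigmoid μ(s) = 1/(1 + exp(−(As + B))) to held-out scores by logistic regression. The following is a minimal sketch, not code from the paper; the function name `fit_platt` and the use of scikit-learn's `LogisticRegression` with a large `C` to approximate unregularised maximum likelihood are my own choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt(scores, labels):
    """Platt-style logistic calibration: fit sigma(A*s + B) to raw scores.

    Sketch only: a large C approximates the unregularised MLE fit;
    Platt's original method also smooths the labels, which is omitted here.
    """
    lr = LogisticRegression(C=1e6, max_iter=1000)
    lr.fit(np.asarray(scores).reshape(-1, 1), labels)
    A, B = lr.coef_[0, 0], lr.intercept_[0]
    return lambda s: 1.0 / (1.0 + np.exp(-(A * np.asarray(s) + B)))
```

Because the fitted family contains only sigmoids of the raw score, it cannot represent the identity map on probabilities, which is the deficiency the abstract points out.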

In this paper we solve all these problems with a richer class of parametric calibration maps based on the beta distribution. We derive the method from first principles and show that fitting it is as easy as fitting a logistic curve. Extensive experiments show that beta calibration is superior to logistic calibration for a wide range of classifiers: Naive Bayes, Adaboost, random forest, logistic regression, support vector machine and multi-layer perceptron. If the original classifier is already calibrated, then beta calibration learns a function close to the identity. On this basis we build a statistical test to recognise whether a model deviates from being well-calibrated.
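The claim that fitting beta calibration is as easy as fitting a logistic curve can be sketched as follows: the beta calibration map μ(s) = 1/(1 + exp(−(a·ln s − b·ln(1−s) + c))) is linear in the features ln s and −ln(1−s), so it can be fitted with ordinary logistic regression on those two features. This is a rough sketch under my own naming (`fit_beta_calibration`, `eps` clipping); the paper additionally constrains a, b ≥ 0, which this unconstrained fit omits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_beta_calibration(scores, labels, eps=1e-12):
    """Beta calibration sketch: logistic regression on (ln s, -ln(1-s)).

    The fitted map includes the identity (a = b = 1, c = 0), so an
    already-calibrated classifier is left (approximately) unchanged.
    Note: the a, b >= 0 constraint from the paper is not enforced here.
    """
    s = np.clip(np.asarray(scores, dtype=float), eps, 1 - eps)
    X = np.column_stack([np.log(s), -np.log(1 - s)])
    lr = LogisticRegression(C=1e6, max_iter=1000)
    lr.fit(X, labels)
    a, b = lr.coef_[0]
    c = lr.intercept_[0]

    def mu(s):
        s = np.clip(np.asarray(s, dtype=float), eps, 1 - eps)
        z = a * np.log(s) - b * np.log(1 - s) + c
        return 1.0 / (1.0 + np.exp(-z))

    return mu, (a, b, c)
```

With a = b = 1 and c = 0 the exponent reduces to ln(s/(1−s)), so μ(s) = s exactly: this is the identity member of the family that the logistic curve family lacks.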

Article information

Electron. J. Statist., Volume 11, Number 2 (2017), 5052-5080.

Received: June 2017
First available in Project Euclid: 15 December 2017


Keywords: binary classification; classifier calibration; posterior probabilities; logistic function; sigmoid; beta distribution

Creative Commons Attribution 4.0 International License.


Kull, Meelis; Silva Filho, Telmo M.; Flach, Peter. Beyond sigmoids: How to obtain well-calibrated probabilities from binary classifiers with beta calibration. Electron. J. Statist. 11 (2017), no. 2, 5052--5080. doi:10.1214/17-EJS1338SI. https://projecteuclid.org/euclid.ejs/1513306867

