Electronic Journal of Statistics

Why scoring functions cannot assess tail properties

Jonas R. Brehmer and Kirstin Strokorb

Full-text: Open access


Motivated by the growing interest in sound forecast evaluation techniques with an emphasis on distribution tails rather than average behaviour, we investigate a fundamental question arising in this context: Can statistical features of distribution tails be elicitable, i.e. be the unique minimizer of an expected score? We demonstrate that expected scores are not suitable to distinguish genuine tail properties in a very strong sense. Specifically, we introduce the class of max-functionals, which contains key characteristics from extreme value theory, for instance the extreme value index. We show that its members fail to be elicitable and that their elicitation complexity is in fact infinite under mild regularity assumptions. Further we prove that, even if the information of a max-functional is reported via the entire distribution function, a proper scoring rule cannot separate max-functional values. These findings highlight the caution needed in forecast evaluation and statistical inference if relevant information is encoded by such functionals.
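As a point of reference for the notion of elicitability used above, the following minimal sketch shows two standard elicitable functionals: the mean, which minimizes the expected squared error, and the median, which minimizes the expected absolute error. It is an illustration only and not taken from the article; the sample, the grid, and all function names are hypothetical, and a crude grid search stands in for exact minimization.

```python
def expected_score(score, forecast, sample):
    """Average score of a point forecast over an empirical sample."""
    return sum(score(forecast, y) for y in sample) / len(sample)


def squared_error(x, y):
    """Scoring function whose expected value is minimized by the mean."""
    return (x - y) ** 2


def absolute_error(x, y):
    """Scoring function whose expected value is minimized by the median."""
    return abs(x - y)


def grid_minimizer(score, sample, grid):
    """Return the grid point with the smallest average score."""
    return min(grid, key=lambda x: expected_score(score, x, sample))


# Hypothetical sample with one large observation (assumption for illustration).
sample = [1.0, 2.0, 3.0, 4.0, 40.0]

# Candidate point forecasts 0.0, 0.1, ..., 50.0.
grid = [i / 10 for i in range(0, 501)]

mean_forecast = grid_minimizer(squared_error, sample, grid)
median_forecast = grid_minimizer(absolute_error, sample, grid)

print(mean_forecast, median_forecast)
```

On this sample the two minimizers coincide with the sample mean (10.0) and the sample median (3.0). The article's result is that no scoring function plays the analogous role for max-functionals such as the extreme value index.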

Article information

Electron. J. Statist., Volume 13, Number 2 (2019), 4015-4034.

Received: May 2019
First available in Project Euclid: 5 October 2019

Primary: 62C05: General considerations 62G32: Statistics of extreme values; tail inference
Secondary: 91B06: Decision theory [See also 62Cxx, 90B50, 91A35]

Keywords: Elicitability elicitation complexity extreme value index max-functional proper scoring rule scoring functions consistency tail equivalence

Creative Commons Attribution 4.0 International License.


Brehmer, Jonas R.; Strokorb, Kirstin. Why scoring functions cannot assess tail properties. Electron. J. Statist. 13 (2019), no. 2, 4015–4034. doi:10.1214/19-EJS1622. https://projecteuclid.org/euclid.ejs/1570241097

