The Annals of Applied Statistics

Focusing on regions of interest in forecast evaluation

Hajo Holzmann and Bernhard Klar



Often, interest in forecast evaluation focuses on certain regions of the whole potential range of the outcome, and forecasts should mainly be ranked according to their performance within these regions. A prime example is risk management, which relies on forecasts of risk measures such as the value-at-risk or the expected shortfall, and hence requires appropriate loss distribution forecasts in the tails. Further examples include weather forecasts with a focus on extreme conditions, or forecasts of environmental variables such as ozone with a focus on concentration levels with adverse health effects.
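As a concrete illustration of the risk measures mentioned above, the sketch below computes the value-at-risk and expected shortfall of a loss distribution that is, purely for the example, assumed to be normal; the function name and parameters are ours, not the paper's.

```python
import math
from statistics import NormalDist

def var_es_normal(mu, sigma, alpha):
    """Value-at-risk and expected shortfall at level alpha for
    losses assumed to follow N(mu, sigma^2) (illustrative only)."""
    z = NormalDist().inv_cdf(alpha)               # standard normal quantile
    var = mu + sigma * z                          # VaR is the alpha-quantile
    # Closed form for the normal case: ES = mu + sigma * phi(z) / (1 - alpha)
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    es = mu + sigma * phi / (1 - alpha)
    return var, es

# Example: 99% VaR and ES for a standard normal loss distribution
var, es = var_es_normal(0.0, 1.0, 0.99)
```

Since the expected shortfall averages losses beyond the value-at-risk, `es` always exceeds `var`; both depend on the forecast loss distribution only through its tail, which is exactly the region-of-interest issue the paper addresses.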

In this paper, we show how weighted scoring rules can be used to this end, and in particular that they make it possible to rank several potentially misspecified forecasts objectively with the region of interest in mind. This is demonstrated in various simulation scenarios. We introduce desirable properties of weighted scoring rules and present general construction principles based on conditional densities or distributions and on scoring rules for probability forecasts. In our empirical application to log-return time series, all forecasts appear to be slightly misspecified, as is often unavoidable in practice, and no method performs best overall. However, using weighted scoring rules, the best method for predicting losses can be identified, and it is therefore the method of choice for the purpose of risk management.
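One weighted scoring rule of the kind discussed is the censored likelihood score of Diks, Panchenko and van Dijk (2011, cited below). The sketch here is a minimal implementation for an indicator weight on the left tail, w(y) = 1{y ≤ r}, applied to a normal density forecast; the function name, the negative orientation (lower score = better), and the choice of a normal forecast are our assumptions for illustration.

```python
import math
from statistics import NormalDist

def censored_likelihood_score(mu, sigma, r, y):
    """Censored likelihood score (negatively oriented sketch) for a
    N(mu, sigma^2) density forecast with indicator weight w(y) = 1{y <= r}.
    Inside the region of interest, the forecast density itself is scored;
    outside, only the total probability mass outside the region matters."""
    d = NormalDist(mu, sigma)
    if y <= r:
        return -math.log(d.pdf(y))                # full density contribution
    return -math.log(1.0 - d.cdf(r))              # censored contribution

# Example: score a standard normal forecast at threshold r = -1
# for an outcome in the tail (y = -2) and one outside it (y = 0)
tail_score = censored_likelihood_score(0.0, 1.0, -1.0, -2.0)
body_score = censored_likelihood_score(0.0, 1.0, -1.0, 0.0)
```

Because outcomes outside the region enter only through the censored mass term, two forecasts that differ only outside the tail receive identical scores there, which is what lets misspecified forecasts be ranked with the region of interest in mind.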

Article information

Ann. Appl. Stat., Volume 11, Number 4 (2017), 2404-2431.

Received: March 2017
Revised: August 2017
First available in Project Euclid: 28 December 2017


Keywords: financial time series; predictive performance; probabilistic forecast; locally proper weighted scoring rule; misspecified forecast; rare and extreme events; risk management


Holzmann, Hajo; Klar, Bernhard. Focusing on regions of interest in forecast evaluation. Ann. Appl. Stat. 11 (2017), no. 4, 2404--2431. doi:10.1214/17-AOAS1088.



  • Amisano, G. and Giacomini, R. (2007). Comparing density forecasts via weighted likelihood ratio tests. J. Bus. Econom. Statist. 25 177–190.
  • Bank of England (2017). Monetary Policy Framework.
  • Billi, R. M. (2017). A note on nominal GDP targeting and the zero lower bound. Macroecon. Dyn. 21 2138–2157.
  • Casati, B., Wilson, L. J., Stephenson, D. B., Nurmi, P., Ghelli, A., Pocernich, M., Damrath, U., Ebert, E. E., Brown, B. G. and Mason, S. (2008). Forecast verification: Current status and future directions. Meteorol. Appl. 15 3–18.
  • Dawid, A. P. (1984). Statistical theory: The prequential approach. J. Roy. Statist. Soc. Ser. A 147 278–292.
  • De Nicolò, G. and Lucchetta, M. (2017). Forecasting tail risks. J. Appl. Econometrics 32 159–170.
  • Diebold, F. X. (2015). Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests. J. Bus. Econom. Statist. 33 1–24.
  • Diebold, F. X., Gunther, T. A. and Tay, A. (1998). Evaluating density forecasts: With applications to financial risk management. Internat. Econom. Rev. 39 863–883.
  • Diebold, F. X. and Mariano, R. S. (1995). Comparing predictive accuracy. J. Bus. Econom. Statist. 13 253–263.
  • Diks, C., Panchenko, V. and van Dijk, D. (2011). Likelihood-based scoring rules for comparing density forecasts in tails. J. Econometrics 163 215–230.
  • Elliott, G. and Timmermann, A. (2016). Forecasting in economics and finance. Ann. Rev. Econ. 8 81–110.
  • Garín, J., Lester, R. and Sims, E. (2016). On the desirability of nominal GDP targeting. J. Econom. Dynam. Control 69 21–44.
  • Ghalanos, A. (2014). rugarch: Univariate GARCH models. R package version 1.3-5.
  • Giacomini, R. and White, H. (2006). Tests of conditional predictive ability. Econometrica 74 1545–1578.
  • Giesbergen, B. (2017). China: How realistic is the government’s growth target? Economic Report. Rabobank.
  • Gneiting, T. (2011). Making and evaluating point forecasts. J. Amer. Statist. Assoc. 106 746–762.
  • Gneiting, T. and Raftery, A. E. (2005). Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133 1098–1118.
  • Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. J. Amer. Statist. Assoc. 102 359–378.
  • Gneiting, T. and Ranjan, R. (2011). Comparing density forecasts using threshold- and quantile-weighted scoring rules. J. Bus. Econom. Statist. 29 411–422.
  • Haiden, T., Magnussen, L. and Richardson, D. (2014). Statistical evaluation of ECMWF extreme wind forecasts. In European Centre for Medium-Range Weather Forecasts Newsletter, Spring 2014.
  • Holzmann, H. and Klar, B. (2016). Weighted scoring rules and hypothesis testing. Available at arXiv:1611.07345v2.
  • Holzmann, H. and Klar, B. (2017a). Supplement to “Focusing on regions of interest in forecast evaluation.” DOI:10.1214/17-AOAS1088SUPP.
  • Holzmann, H. and Klar, B. (2017b). Discussion of “Elicitability and backtesting: Perspectives for banking regulation.” Ann. Appl. Stat. 11 1875–1882.
  • Hyvärinen, A. (2005). Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 6 695–709.
  • Krüger, F., Lerch, S., Thorarinsdottir, T. L. and Gneiting, T. (2017). Probabilistic forecasting and comparative model assessment based on Markov chain Monte Carlo output. Preprint. Available at arXiv:1608.06802.
  • Lerch, S., Thorarinsdottir, T. L., Ravazzolo, F. and Gneiting, T. (2017). Forecaster’s dilemma: Extreme events and forecast evaluation. Statist. Sci. 32 106–127.
  • Matheson, J. E. and Winkler, R. L. (1976). Scoring rules for continuous probability distributions. Manage. Sci. 22 1087–1096.
  • McNeil, A. J., Frey, R. and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton Univ. Press, Princeton, NJ.
  • Nolde, N. and Ziegel, J. F. (2017). Elicitability and backtesting: Perspectives for banking regulation. Ann. Appl. Stat. To appear.
  • Opschoor, A., van Dijk, D. and van der Wel, M. (2017). Combining density forecasts using focused scoring rules. J. Appl. Econometrics 2017 1–16. DOI:10.1002/jae.2575.
  • Patton, A. (2017). Evaluating and comparing possibly misspecified forecasts. Working paper.
  • Pelenis, J. (2014). Weighted scoring rules for comparison of density forecasts on subsets of interest. Preprint.
  • Pisoni, E., Farina, M., Pagani, G. and Piroddi, L. (2011). Environmental over-threshold event forecasting using NARX models. In Preprints of the 18th IFAC World Congress, Milan, Italy.
  • R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Thorarinsdottir, T. L. and Gneiting, T. (2010). Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. J. Roy. Statist. Soc. Ser. A 173 371–388.

Supplemental materials

  • Supplement to “Focusing on regions of interest in forecast evaluation”. We discuss weighted versions of the multivariate Hyvärinen score and of multivariate energy scores and provide the proof of Theorem 3. Further, we present the remaining simulation results for scenario B as well as additional simulation results for the Wilcoxon signed-rank test in all scenarios.