In our discussion of the insightful paper by Nolde and Ziegel, we further investigate comparative backtests based on consistent scoring rules. We use Diebold–Mariano tests in pairwise comparisons instead of mere rankings in terms of average scores, and illustrate the use of weighted proper scoring rules, which address the quality of forecasts of the full loss distribution in its upper tail rather than some specific risk measure such as the Value at Risk. Overall, at lower levels up to 95%, these allow for better discrimination between competing forecasting methods.
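As a rough illustration of the pairwise comparison described above, the following minimal sketch applies a Diebold–Mariano-type test to the score differences of two Value-at-Risk forecasts, scored with the quantile (pinball) loss, which is a consistent scoring function for a quantile. It is not the implementation used in the discussion. The simulated losses, the two forecasters and all function names are hypothetical, and the test statistic assumes serially uncorrelated score differences, so no autocorrelation-robust variance estimate is used.

```python
import math
import random
from statistics import NormalDist, fmean


def quantile_score(forecast, loss, alpha):
    """Pinball loss: a consistent scoring function for the alpha-quantile (VaR)."""
    indicator = 1.0 if loss <= forecast else 0.0
    return (indicator - alpha) * (forecast - loss)


def diebold_mariano(scores_a, scores_b):
    """Return the DM-type statistic and its two-sided asymptotic p-value.

    Assumption: score differences are serially uncorrelated, so the plain
    sample variance is used instead of a HAC estimator.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = fmean(diffs)
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    stat = mean_diff / math.sqrt(var_diff / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Hypothetical setup: i.i.d. standard normal losses and two constant forecasts.
rng = random.Random(0)
alpha = 0.95
losses = [rng.gauss(0.0, 1.0) for _ in range(2000)]

var_a = NormalDist(0.0, 1.0).inv_cdf(alpha)  # forecaster A: correct model
var_b = NormalDist(0.0, 0.5).inv_cdf(alpha)  # forecaster B: scale too small

scores_a = [quantile_score(var_a, y, alpha) for y in losses]
scores_b = [quantile_score(var_b, y, alpha) for y in losses]

dm_stat, p_value = diebold_mariano(scores_a, scores_b)
print(f"mean score A = {fmean(scores_a):.4f}, mean score B = {fmean(scores_b):.4f}")
print(f"DM statistic = {dm_stat:.2f}, p-value = {p_value:.4g}")
```

A negative statistic indicates that forecaster A attains the lower average score. The p-value indicates whether that difference is statistically significant, which a ranking by average scores alone does not show.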
"Discussion of “Elicitability and backtesting: Perspectives for banking regulation”." Ann. Appl. Stat. 11 (4) 1875 - 1882, December 2017. https://doi.org/10.1214/17-AOAS1041A