Conditional predictive inference for stable algorithms
Lukas Steinberger, Hannes Leeb
Ann. Statist. 51(1): 290-311 (February 2023). DOI: 10.1214/22-AOS2250


We investigate generically applicable and intuitively appealing prediction intervals based on k-fold cross-validation. We focus on the conditional coverage probability of the proposed intervals, given the observations in the training sample (hence, training conditional validity), and show that it is close to the nominal level, in an appropriate sense, provided that the underlying algorithm used for computing point predictions is sufficiently stable when feature-response pairs are omitted. Our results are based on a finite sample analysis of the empirical distribution function of k-fold cross-validation residuals and hold in nonparametric settings with only minimal assumptions on the error distribution. To illustrate our results, we also apply them to high-dimensional linear predictors, where we obtain uniform asymptotic training conditional validity as both sample size and dimension tend to infinity at the same rate and consistent parameter estimation typically fails. These results show that despite the serious problems of resampling procedures for inference on the unknown parameters (cf. in A Festschrift for Erich L. Lehmann (1983) 28–48 Wadsworth; Ann. Statist. 24 (1996) 307–335; J. Mach. Learn. Res. 19 (2018) 5), cross-validation methods can be successfully applied to obtain reliable predictive inference even in high dimensions and conditionally on the training data.
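As a rough illustration of the kind of interval the abstract describes, the following minimal sketch forms a prediction interval from empirical quantiles of k-fold cross-validation residuals. It is not the authors' implementation, and the paper's exact construction may differ. The function names (`ridge_fit`, `kfold_cv_residuals`, `cv_prediction_interval`), the choice of ridge regression as the "stable" point predictor, and the parameters `lam`, `alpha`, `k`, and `seed` are all assumptions made for this example.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Ridge coefficients; a hypothetical stand-in for a stable algorithm."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def kfold_cv_residuals(X, y, k=5, lam=1.0, seed=0):
    """Out-of-fold residuals: each y_i minus a prediction fitted without fold(i)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    resid = np.empty(n)
    for idx in folds:
        train = np.ones(n, dtype=bool)
        train[idx] = False  # omit the current fold from the fit
        beta = ridge_fit(X[train], y[train], lam)
        resid[idx] = y[idx] - X[idx] @ beta
    return resid


def cv_prediction_interval(X, y, x_new, alpha=0.1, k=5, lam=1.0, seed=0):
    """Interval: full-sample prediction plus empirical residual quantiles.

    Assumed construction for illustration only.
    """
    resid = kfold_cv_residuals(X, y, k=k, lam=lam, seed=seed)
    q_lo, q_hi = np.quantile(resid, [alpha / 2, 1 - alpha / 2])
    center = x_new @ ridge_fit(X, y, lam)
    return center + q_lo, center + q_hi


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 500, 5
    beta_true = rng.normal(size=p)
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(size=n)

    # Coverage on fresh data, with the training sample held fixed.
    X_test = rng.normal(size=(2000, p))
    y_test = X_test @ beta_true + rng.normal(size=2000)
    lower, upper = cv_prediction_interval(X, y, X_test, alpha=0.1)
    print("empirical coverage:", np.mean((y_test >= lower) & (y_test <= upper)))
```

The coverage printed at the end is computed over fresh test points with the training sample held fixed, which is the training conditional quantity the abstract refers to. In this low-dimensional simulated setting it should be near the nominal level of 1 - `alpha`.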

Funding Statement

The first author was supported by the Austrian Science Fund (FWF): P 28233-N32 and I 5484-N; the latter project is part of the Research Unit 5381 of the German Research Foundation. Part of this research was conducted while he was funded by the German Research Foundation (DFG): RO 3766/401.
The second author was supported by the Austrian Science Fund (FWF): P 28233-N32 and P 26354-N26.


Acknowledgments

The authors thank the participants of the “ISOR Research Seminar in Statistics and Econometrics” at the University of Vienna for discussion of an early version of the paper. In particular, we want to thank Benedikt M. Pötscher and David Preinerstorfer for valuable comments. We are also grateful to three anonymous referees and an associate editor for their constructive feedback, which helped improve the paper.


Citation

Lukas Steinberger, Hannes Leeb. "Conditional predictive inference for stable algorithms." Ann. Statist. 51 (1) 290-311, February 2023. https://doi.org/10.1214/22-AOS2250


Received: 1 April 2022; Revised: 1 October 2022; Published: February 2023
First available in Project Euclid: 23 March 2023

MathSciNet: MR4564857
zbMATH: 07684013
Digital Object Identifier: 10.1214/22-AOS2250

Primary: 62G15, 62J02
Secondary: 62G20, 62J07

Keywords: algorithmic stability, cross-validation, high-dimensional regression, prediction intervals

Rights: Copyright © 2023 Institute of Mathematical Statistics

