The Annals of Statistics

Consistency of cross validation for comparing regression procedures

Yuhong Yang



Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about the consistency of cross validation when it is applied to compare parametric with nonparametric methods, or to compare nonparametric methods with one another. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1.

Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion used for estimation without affecting the consistency property.
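As an illustration only, the comparison described in the abstract can be sketched with a single data split: each candidate procedure is fitted on an estimation part and then judged by squared prediction error on a held-out evaluation part. This is a minimal sketch and not the paper's exact procedure or conditions. The function names (`fit_linear`, `fit_kernel`, `select_by_holdout`), the simulated regression function, the bandwidth 0.05 and the 300/100 split are all assumptions made for this example.

```python
import numpy as np


def fit_linear(x_train, y_train):
    """Parametric candidate: simple least-squares line (hypothetical helper)."""
    slope, intercept = np.polyfit(x_train, y_train, deg=1)
    return lambda x_new: slope * x_new + intercept


def fit_kernel(x_train, y_train, bandwidth=0.05):
    """Nonparametric candidate: Nadaraya-Watson estimator with a Gaussian kernel.

    The bandwidth is fixed by assumption; it is not tuned here.
    """
    def predict(x_new):
        scaled = (x_new[:, None] - x_train[None, :]) / bandwidth
        weights = np.exp(-0.5 * scaled ** 2)
        return (weights @ y_train) / weights.sum(axis=1)
    return predict


def select_by_holdout(x, y, procedures, n_eval, rng):
    """Fit every procedure on the estimation part and pick the one with the
    smallest mean squared prediction error on the evaluation part."""
    perm = rng.permutation(len(x))
    eval_idx, est_idx = perm[:n_eval], perm[n_eval:]
    errors = {}
    for name, fit in procedures.items():
        predict = fit(x[est_idx], y[est_idx])
        residuals = y[eval_idx] - predict(x[eval_idx])
        errors[name] = float(np.mean(residuals ** 2))
    return min(errors, key=errors.get), errors


# Simulated data (an assumption for the example): a clearly nonlinear signal.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

procedures = {"linear": fit_linear, "kernel": fit_kernel}
winner, errors = select_by_holdout(x, y, procedures, n_eval=100, rng=rng)
print(winner, errors)
```

Here the evaluation part (100 points) is smaller than the estimation part (300 points). The split sizes are passed in as `n_eval` so that different splitting ratios can be tried, and averaging the selection over several random splits would be a natural extension of the same sketch.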

Article information

Ann. Statist., Volume 35, Number 6 (2007), 2450-2473.

First available in Project Euclid: 22 January 2008


Primary: 62G07: Density estimation; 62B10: Information-theoretic topics [See also 94A17]
Secondary: 62C20: Minimax procedures

Consistency; cross validation; model selection


Yang, Yuhong. Consistency of cross validation for comparing regression procedures. Ann. Statist. 35 (2007), no. 6, 2450--2473. doi:10.1214/009053607000000514.


