The Annals of Statistics

Consistency of cross validation for comparing regression procedures

Yuhong Yang
Source: Ann. Statist. Volume 35, Number 6 (2007), 2450-2473.

Abstract

Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about the consistency of cross validation when it is used to compare parametric with nonparametric methods, or to compare nonparametric methods with one another. We show that under some conditions, with an appropriate choice of the data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1.

Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, the proportion of data used for evaluation in CV need not be dominant in size. It can even be of a smaller order than the proportion used for estimation without affecting the consistency property.
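As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of data-splitting cross validation used to choose between a parametric and a nonparametric regression procedure. It is not taken from the paper: the two procedures (a least-squares line and a Gaussian-kernel Nadaraya-Watson estimator), the fixed bandwidth, the simulated data, the averaging over random splits, and the names `fit_linear`, `fit_kernel`, `cv_select` and `n_est` are all assumptions made for illustration.

```python
import numpy as np

def fit_linear(x, y):
    # Parametric procedure (assumed for illustration): least-squares line.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda t: coef[0] + coef[1] * np.asarray(t)

def fit_kernel(x, y, bandwidth=0.1):
    # Nonparametric procedure (assumed for illustration):
    # Nadaraya-Watson estimator with a Gaussian kernel and fixed bandwidth.
    def predict(t):
        t = np.atleast_1d(t)
        w = np.exp(-0.5 * ((t[:, None] - x[None, :]) / bandwidth) ** 2)
        return (w @ y) / np.maximum(w.sum(axis=1), 1e-12)
    return predict

def cv_select(x, y, procedures, n_est, n_splits=20, seed=0):
    """Pick the procedure with the smallest average evaluation error.

    n_est observations are used for estimation in each random split;
    the remaining len(x) - n_est observations are used for evaluation.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    total = {name: 0.0 for name in procedures}
    for _ in range(n_splits):
        perm = rng.permutation(n)
        est, ev = perm[:n_est], perm[n_est:]
        for name, fit in procedures.items():
            f = fit(x[est], y[est])
            total[name] += np.mean((y[ev] - f(x[ev])) ** 2)
    avg_err = {name: s / n_splits for name, s in total.items()}
    return min(avg_err, key=avg_err.get), avg_err

# Simulated example: the true regression function is nonlinear,
# so the kernel procedure is expected to be preferred.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=200)

procedures = {"linear": fit_linear, "kernel": fit_kernel}
best, avg_err = cv_select(x, y, procedures, n_est=120)
print(best, avg_err)
```

Here the evaluation part (80 of 200 observations) is smaller than the estimation part, which loosely mirrors the abstract's remark that the evaluation proportion need not dominate when both procedures are nonparametric. The sketch does not verify any of the paper's conditions; the splitting ratio is the quantity one would vary to explore them.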

Primary Subjects: 62G07, 62B10
Secondary Subjects: 62C20
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1201012968
Digital Object Identifier: doi:10.1214/009053607000000514
Mathematical Reviews number (MathSciNet): MR2382654
Zentralblatt MATH identifier: 1129.62039


2012 © Institute of Mathematical Statistics
