Electronic Journal of Statistics

Oracle inequalities for cross-validation type procedures

Guillaume Lecué and Charles Mitchell


Abstract

We prove oracle inequalities for three different types of adaptation procedures inspired by cross-validation and aggregation. These procedures are then applied to the construction of Lasso estimators and aggregation with exponential weights with data-driven regularization and temperature parameters, respectively. We also prove oracle inequalities for the cross-validation procedure itself under some convexity assumptions.

Article information

Source
Electron. J. Statist., Volume 6 (2012), 1803-1837.

Dates
First available in Project Euclid: 4 October 2012

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1349355603

Digital Object Identifier
doi:10.1214/12-EJS730

Mathematical Reviews number (MathSciNet)
MR2988465

Zentralblatt MATH identifier
1295.62051

Subjects
Primary: 62G99: None of the above, but in this section

Keywords
Adaptation, aggregation, cross-validation, sparsity

Citation

Lecué, Guillaume; Mitchell, Charles. Oracle inequalities for cross-validation type procedures. Electron. J. Statist. 6 (2012), 1803–1837. doi:10.1214/12-EJS730. https://projecteuclid.org/euclid.ejs/1349355603


