Electronic Journal of Statistics

Model selection by resampling penalization

Sylvain Arlot

Full-text: Open access


In this paper, a new family of resampling-based penalization procedures for model selection is defined in a general framework. It generalizes several methods, including Efron’s bootstrap penalization and the leave-one-out penalization recently proposed by Arlot (2008), to any exchangeable weighted bootstrap resampling scheme. In the heteroscedastic regression framework, assuming the models to have a particular structure, these resampling penalties are proved to satisfy a non-asymptotic oracle inequality with leading constant close to 1. In particular, they are asymptotically optimal. Resampling penalties are used for defining an estimator adapting simultaneously to the smoothness of the regression function and to the heteroscedasticity of the noise. This is remarkable because resampling penalties are general-purpose devices, which have not been built specifically to handle heteroscedastic data. Hence, resampling penalties naturally adapt to heteroscedasticity. A simulation study shows that resampling penalties improve on V-fold cross-validation in terms of final prediction error, in particular when the signal-to-noise ratio is not large.
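
The idea summarized above can be illustrated with a minimal sketch for histogram (regressogram) selection on [0, 1]. The abstract does not state the penalty formula, so the sketch rests on assumptions: the penalty of a model is taken to be a constant C times the average, over resampling weights W, of the gap between the unweighted and the W-weighted empirical risk of the estimator fitted with weights W; the weights follow a random hold-out scheme (weight about 2 on a random half of the sample, 0 elsewhere); and C = 1 is used for that scheme. The function names, the default values, and the fallback for bins that receive no resampled weight are hypothetical choices made for this illustration only.

```python
import numpy as np


def fit_regressogram(x, y, n_bins, weights=None):
    """Weighted least-squares histogram fit on [0, 1]: per-bin weighted mean of y."""
    if weights is None:
        weights = np.ones_like(y, dtype=float)
    # Bin index of each point; x == 1.0 is assigned to the last bin.
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    w_sum = np.bincount(bins, weights=weights, minlength=n_bins)
    wy_sum = np.bincount(bins, weights=weights * y, minlength=n_bins)
    # Bins with zero total weight get 0 here; callers may override them.
    values = np.divide(wy_sum, w_sum, out=np.zeros(n_bins), where=w_sum > 0)
    return bins, values


def resampling_penalty(x, y, n_bins, n_resamples=200, C=1.0, rng=None):
    """Monte Carlo estimate of the assumed penalty C * E_W[P_n g(s_W) - P_n^W g(s_W)]."""
    rng = np.random.default_rng(rng)
    n = len(y)
    bins, full_values = fit_regressogram(x, y, n_bins)
    gaps = []
    for _ in range(n_resamples):
        # Random hold-out weights (assumed scheme): positive on a random half, mean 1.
        w = np.zeros(n)
        w[rng.choice(n, size=n // 2, replace=False)] = n / (n // 2)
        _, values = fit_regressogram(x, y, n_bins, weights=w)
        # Sketch convention: bins with no resampled weight reuse the full-sample fit.
        w_sum = np.bincount(bins, weights=w, minlength=n_bins)
        values = np.where(w_sum > 0, values, full_values)
        resid2 = (y - values[bins]) ** 2
        # Unweighted empirical risk minus weighted empirical risk of the resampled fit.
        gaps.append(resid2.mean() - np.average(resid2, weights=w))
    return C * float(np.mean(gaps))


def select_n_bins(x, y, candidates, **kwargs):
    """Pick the bin count minimizing empirical risk plus the resampling penalty."""
    crit = {}
    for d in candidates:
        bins, values = fit_regressogram(x, y, d)
        emp_risk = float(np.mean((y - values[bins]) ** 2))
        crit[d] = emp_risk + resampling_penalty(x, y, d, **kwargs)
    return min(crit, key=crit.get), crit
```

A call such as `select_n_bins(x, y, [1, 2, 5, 10, 20], rng=0)` returns the selected number of bins together with the penalized criterion for each candidate. Other exchangeable weight schemes would change only the line that draws `w` and, under the assumption made here, the value of `C`.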

Article information

Electron. J. Statist. Volume 3 (2009), 557-624.

First available in Project Euclid: 19 June 2009

Primary: 62G09: Resampling methods
Secondary: 62G08: Nonparametric regression 62M20: Prediction [See also 60G25]; filtering [See also 60G35, 93E10, 93E11]

Keywords: Non-parametric statistics; resampling; exchangeable weighted bootstrap; model selection; penalization; non-parametric regression; adaptivity; heteroscedastic data; regressogram; histogram selection


Arlot, Sylvain. Model selection by resampling penalization. Electron. J. Statist. 3 (2009), 557–624. doi:10.1214/08-EJS196. https://projecteuclid.org/euclid.ejs/1245415825.


  • [1] Marc Aerts, Gerda Claeskens, and Jeffrey D. Hart. Testing the fit of a parametric function. J. Amer. Statist. Assoc., 94(447):869–879, 1999.
  • [2] Hirotugu Akaike. Statistical predictor identification. Ann. Inst. Statist. Math., 22:203–217, 1970.
  • [3] Hirotugu Akaike. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971), pages 267–281. Akadémiai Kiadó, Budapest, 1973.
  • [4] David M. Allen. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16:125–127, 1974.
  • [5] Miguel A. Arcones and Evarist Giné. On the bootstrap of M-estimators and other statistical functionals. In Exploring the limits of bootstrap (East Lansing, MI, 1990), Wiley Ser. Probab. Math. Statist. Probab. Math. Statist., pages 13–47. Wiley, New York, 1992.
  • [6] Sylvain Arlot. Resampling and Model Selection. PhD thesis, University Paris-Sud 11, December 2007. oai:tel.archives-ouvertes.fr:tel-00198803_v1.
  • [7] Sylvain Arlot. Suboptimality of penalties proportional to the dimension for model selection in heteroscedastic regression, December 2008. arXiv:0812.3141v1.
  • [8] Sylvain Arlot. Technical appendix to “Model selection by resampling penalization”, 2009. Appendix to hal-00262478.
  • [9] Sylvain Arlot. V-fold cross-validation improved: V-fold penalization, February 2008. arXiv:0802.0566v2.
  • [10] Sylvain Arlot, Gilles Blanchard, and Étienne Roquain. Some non-asymptotic results on resampling in high dimension, I: confidence regions. Ann. Statist., 2008. To appear.
  • [11] Sylvain Arlot and Pascal Massart. Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res., 10(Feb):245–279, 2009.
  • [12] Jean-Yves Audibert. Théorie Statistique de l’Apprentissage: une approche PAC-Bayésienne. PhD thesis, Université Paris VI, June 2004.
  • [13] Yannick Baraud. Model selection for regression on a fixed design. Probab. Theory Related Fields, 117(4):467–493, 2000.
  • [14] Yannick Baraud. Model selection for regression on a random design. ESAIM Probab. Statist., 6:127–146 (electronic), 2002.
  • [15] Philippe Barbe and Patrice Bertail. The weighted bootstrap, volume 98 of Lecture Notes in Statistics. Springer-Verlag, New York, 1995.
  • [16] Andrew Barron, Lucien Birgé, and Pascal Massart. Risk bounds for model selection via penalization. Probab. Theory Related Fields, 113(3):301–413, 1999.
  • [17] Peter L. Bartlett, Stéphane Boucheron, and Gábor Lugosi. Model selection and error estimation. Machine Learning, 48:85–113, 2002.
  • [18] Peter L. Bartlett, Olivier Bousquet, and Shahar Mendelson. Local Rademacher complexities. Ann. Statist., 33(4):1497–1537, 2005.
  • [19] Peter L. Bartlett, Shahar Mendelson, and Petra Philips. Local complexities for empirical risk minimization. In Learning theory, volume 3120 of Lecture Notes in Comput. Sci., pages 270–284. Springer, Berlin, 2004.
  • [20] Lucien Birgé and Pascal Massart. Gaussian model selection. J. Eur. Math. Soc. (JEMS), 3(3):203–268, 2001.
  • [21] Lucien Birgé and Pascal Massart. Minimal penalties for Gaussian model selection. Probab. Theory Related Fields, 138(1-2):33–73, 2007.
  • [22] Prabir Burman. A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 76(3):503–514, 1989.
  • [23] Prabir Burman. Estimation of equifrequency histograms. Statist. Probab. Lett., 56(3):227–238, 2002.
  • [24] Olivier Catoni. Pac-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, volume 56 of IMS Lecture Notes Monograph Series. Inst. Math. Statist., 2007.
  • [25] Joseph E. Cavanaugh and Robert H. Shumway. A bootstrap variant of AIC for state-space model selection. Statist. Sinica, 7(2):473–496, 1997.
  • [26] Luc Devroye and Gábor Lugosi. Combinatorial methods in density estimation. Springer Series in Statistics. Springer-Verlag, New York, 2001.
  • [27] David L. Donoho and Iain M. Johnstone. Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc., 90(432):1200–1224, 1995.
  • [28] Sam Efromovich and Mark Pinsker. Sharp-optimal and adaptive estimation for heteroscedastic nonparametric regression. Statist. Sinica, 6(4):925–942, 1996.
  • [29] Bradley Efron. Bootstrap methods: another look at the jackknife. Ann. Statist., 7(1):1–26, 1979.
  • [30] Bradley Efron. Estimating the error rate of a prediction rule: improvement on cross-validation. J. Amer. Statist. Assoc., 78(382):316–331, 1983.
  • [31] Bradley Efron. How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc., 81(394):461–470, 1986.
  • [32] Bradley Efron and Robert Tibshirani. Improvements on cross-validation: the .632+ bootstrap method. J. Amer. Statist. Assoc., 92(438):548–560, 1997.
  • [33] Magalie Fromont. Model selection by bootstrap penalization for classification. In Learning theory, volume 3120 of Lecture Notes in Comput. Sci., pages 285–299. Springer, Berlin, 2004.
  • [34] Magalie Fromont. Model selection by bootstrap penalization for classification. Mach. Learn., 66(2–3):165–207, 2007.
  • [35] Leonid Galtchouk and Sergey Pergamenshchikov. Adaptive asymptotically efficient estimation in heteroscedastic nonparametric regression via model selection, October 2008. arXiv:0810.1173.
  • [36] Seymour Geisser. The predictive sample reuse method with applications. J. Amer. Statist. Assoc., 70:320–328, 1975.
  • [37] Xavier Gendre. Simultaneous estimation of the mean and the variance in heteroscedastic Gaussian regression. Electronic Journal of Statistics, 2:1345–1372, 2008.
  • [38] László Györfi, Michael Kohler, Adam Krzyżak, and Harro Walk. A distribution-free theory of nonparametric regression. Springer Series in Statistics. Springer-Verlag, New York, 2002.
  • [39] Peter Hall. The bootstrap and Edgeworth expansion. Springer Series in Statistics. Springer-Verlag, New York, 1992.
  • [40] Peter Hall and Enno Mammen. On general resampling algorithms and their performance in distribution estimation. Ann. Statist., 22(4):2011–2030, 1994.
  • [41] Don Hush and Clint Scovel. Concentration of the hypergeometric distribution. Statist. Probab. Lett., 75(2):127–132, 2005.
  • [42] Marie Hušková and Paul Janssen. Consistency of the generalized bootstrap for degenerate U-statistics. Ann. Statist., 21(4):1811–1823, 1993.
  • [43] Makio Ishiguro, Yosiyuki Sakamoto, and Genshiro Kitagawa. Bootstrapping log likelihood and EIC, an extension of AIC. Ann. Inst. Statist. Math., 49(3):411–434, 1997.
  • [44] C. Matthew Jones and Anatoly A. Zhigljavsky. Approximating the negative moments of the Poisson distribution. Statist. Probab. Lett., 66(2):171–181, 2004.
  • [45] Vladimir Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory, 47(5):1902–1914, 2001.
  • [46] Vladimir Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist., 34(6):2593–2656, 2006.
  • [47] A. P. Korostelëv and A. B. Tsybakov. Minimax theory of image reconstruction, volume 82 of Lecture Notes in Statistics. Springer-Verlag, New York, 1993.
  • [48] Robert A. Lew. Bounds on negative moments. SIAM J. Appl. Math., 30(4):728–731, 1976.
  • [49] Ker-Chau Li. Asymptotic optimality for Cp, CL, cross-validation and generalized cross-validation: discrete index set. Ann. Statist., 15(3):958–975, 1987.
  • [50] Gábor Lugosi and Marten Wegkamp. Complexity regularization via localized random penalties. Ann. Statist., 32(4):1679–1697, 2004.
  • [51] Colin L. Mallows. Some comments on Cp. Technometrics, 15:661–675, 1973.
  • [52] Enno Mammen. When does bootstrap work? Asymptotic results and simulations, volume 77 of Lecture Notes in Statistics. Springer, 1992.
  • [53] Enno Mammen and Alexandre B. Tsybakov. Smooth discrimination analysis. Ann. Statist., 27(6):1808–1829, 1999.
  • [54] David M. Mason and Michael A. Newton. A rank statistics approach to the consistency of a general bootstrap. Ann. Statist., 20(3):1611–1624, 1992.
  • [55] Pascal Massart. Concentration inequalities and model selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003, With a foreword by Jean Picard.
  • [56] Dimitris N. Politis, Joseph P. Romano, and Michael Wolf. Subsampling. Springer Series in Statistics. Springer-Verlag, New York, 1999.
  • [57] Jens Præstgaard and Jon A. Wellner. Exchangeably weighted bootstraps of the general empirical process. Ann. Probab., 21(4):2053–2086, 1993.
  • [58] Marie Sauvé. Histogram selection in non Gaussian regression. ESAIM: Probability and Statistics, 13:70–86, 2009.
  • [59] Jun Shao. Bootstrap model selection. J. Amer. Statist. Assoc., 91(434):655–665, 1996.
  • [60] Jun Shao. An asymptotic theory for linear model selection. Statist. Sinica, 7(2):221–264, 1997. With comments and a rejoinder by the author.
  • [61] Ritei Shibata. An optimal selection of regression variables. Biometrika, 68(1):45–54, 1981.
  • [62] Ritei Shibata. Bootstrap estimate of Kullback-Leibler information for model selection. Statist. Sinica, 7(2):375–394, 1997.
  • [63] Charles J. Stone. Optimal rates of convergence for nonparametric estimators. Ann. Statist., 8(6):1348–1360, 1980.
  • [64] Charles J. Stone. An asymptotically optimal histogram selection rule. In Proceedings of the Berkeley conference in honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983), Wadsworth Statist./Probab. Ser., pages 513–520, Belmont, CA, 1985. Wadsworth.
  • [65] Mervyn Stone. Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. Ser. B, 36:111–147, 1974. With discussion by G.A. Barnard, A.C. Atkinson, L.K. Chan, A.P. Dawid, F. Downton, J. Dickey, A.G. Baker, O. Barndorff-Nielsen, D.R. Cox, S. Giesser, D. Hinkley, R.R. Hocking, and A.S. Young, and with a reply by the authors.
  • [66] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to statistics.
  • [67] Chien-Fu Jeff Wu. Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Statist., 14(4):1261–1350, 1986. With discussion and a rejoinder by the author.
  • [68] Yuhong Yang. Consistency of cross validation for comparing regression procedures. Ann. Statist., 35(6):2450–2473, 2007.
  • [69] Yuhong Yang and Andrew Barron. Information-theoretic determination of minimax rates of convergence. Ann. Statist., 27(5):1564–1599, 1999.
  • [70] Marko Žnidarič. Asymptotic expansions for inverse moments of binomial and Poisson distributions. arXiv:math.ST/0511226, November 2005.