Electronic Journal of Statistics

General oracle inequalities for model selection

Charles Mitchell and Sara van de Geer

Full-text: Open access


Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions on both the margin and the tails of the losses used. Starting with examples from the three basic estimation problems (regression, classification and density estimation), we formulate risk bounds for empirical risk minimization and prove them at a very general level, for general margin and power tail behavior of the excess losses. We then apply these bounds to typical examples.
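As an illustration of the selection step the abstract refers to, the following is a minimal sketch of choosing among a finite set of fixed candidate predictors by minimizing empirical risk on a sample. It is not taken from the paper: the function names (`empirical_risk`, `select_by_erm`), the squared loss, the three candidates and the simulated data are all assumptions made for this example, and the paper's general margin and tail conditions are not modelled here.

```python
import random


def empirical_risk(predict, sample, loss):
    """Average loss of `predict` over the (x, y) pairs in `sample`."""
    return sum(loss(predict(x), y) for x, y in sample) / len(sample)


def select_by_erm(candidates, sample, loss):
    """Return the name of the candidate with the smallest empirical risk,
    together with the empirical risk of every candidate."""
    risks = {name: empirical_risk(f, sample, loss) for name, f in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks


def squared_loss(prediction, y):
    """Squared error loss, as used in least squares regression."""
    return (prediction - y) ** 2


# Hypothetical data (an assumption for this sketch): y = 2x + Gaussian noise.
rng = random.Random(0)
sample = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    sample.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))

# Hypothetical finite collection of fixed candidate predictors.
candidates = {
    "zero": lambda x: 0.0,
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
}

best, risks = select_by_erm(candidates, sample, squared_loss)
print("selected:", best)
for name, risk in sorted(risks.items(), key=lambda item: item[1]):
    print(f"{name}: empirical risk {risk:.4f}")
```

An oracle inequality would compare the true risk of the selected candidate with that of the best candidate in the collection; the sketch only carries out the selection itself.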

Article information

Electron. J. Statist., Volume 3 (2009), 176-204.

First available in Project Euclid: 3 March 2009

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1236089916

Digital Object Identifier
doi:10.1214/08-EJS254

Primary: 62G05: Estimation
Secondary: 62G20: Asymptotic properties


Mitchell, Charles; van de Geer, Sara. General oracle inequalities for model selection. Electron. J. Statist. 3 (2009), 176--204. doi:10.1214/08-EJS254. https://projecteuclid.org/euclid.ejs/1236089916

