Electronic Journal of Statistics

General oracle inequalities for model selection

Charles Mitchell and Sara van de Geer

Source: Electron. J. Statist. Volume 3 (2009), 176-204.

Abstract

Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions on both the margin and the tails of the losses used. Starting with examples from the three basic estimation problems (regression, classification and density estimation), we formulate risk bounds for empirical risk minimization and prove them at a very general level, for general margin and power tail behavior of the excess losses. We then apply these bounds to typical examples.

Primary Subjects: 62G05
Secondary Subjects: 62G20

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ejs/1236089916
Digital Object Identifier: doi:10.1214/08-EJS254

2009 © Institute of Mathematical Statistics