## The Annals of Statistics

### Risk bounds for statistical learning

#### Abstract

We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with ways of measuring the “size” of a class of classifiers other than entropy with bracketing as in Tsybakov’s work. In particular, we derive new risk bounds for the ERM when the classification rules belong to some VC-class under margin conditions and discuss the optimality of these bounds in a minimax sense.

#### Article information

Source
Ann. Statist. Volume 34, Number 5 (2006), 2326-2366.

Dates
First available in Project Euclid: 23 January 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1169571799

Digital Object Identifier
doi:10.1214/009053606000000786

Mathematical Reviews number (MathSciNet)
MR2291502

Zentralblatt MATH identifier
1108.62007

Subjects
Primary: 60E15: Inequalities; stochastic orderings
Secondary: 60F10: Large deviations; 94A17: Measures of information, entropy

#### Citation

Massart, Pascal; Nédélec, Élodie. Risk bounds for statistical learning. Ann. Statist. 34 (2006), no. 5, 2326--2366. doi:10.1214/009053606000000786. https://projecteuclid.org/euclid.aos/1169571799

#### References

• Barron, A. R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301--413.
• Birgé, L. (2005). A new lower bound for multiple hypothesis testing. IEEE Trans. Inform. Theory 51 1611--1615.
• Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli 4 329--375.
• Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495--500.
• Devroye, L. and Lugosi, G. (1995). Lower bounds in pattern recognition and learning. Pattern Recognition 28 1011--1018.
• Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press.
• Edelsbrunner, H. (1987). Algorithms in Combinatorial Geometry. Springer, Berlin.
• Haussler, D. (1995). Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik--Chervonenkis dimension. J. Combin. Theory Ser. A 69 217--232.
• Haussler, D., Littlestone, N. and Warmuth, M. (1994). Predicting $\{0,1\}$-functions on randomly drawn points. Inform. and Comput. 115 248--292.
• Koltchinskii, V. I. (1981). On the central limit theorem for empirical measures. Theor. Probab. Math. Statist. 24 71--82.
• Korostelev, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statist. 82. Springer, New York.
• Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Isoperimetry and Processes. Springer, Berlin.
• Lugosi, G. (2002). Pattern classification and learning theory. In Principles of Nonparametric Learning (L. Györfi, ed.) 1--56. Springer, Vienna.
• Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808--1829.
• Massart, P. (2000). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. (6) 9 245--303.
• Massart, P. (2006). Concentration inequalities and model selection. Lectures on Probability Theory and Statistics. Ecole d'Eté de Probabilités de Saint Flour XXXIII. Lecture Notes in Math. 1896. Springer, Berlin. To appear.
• Massart, P. and Rio, E. (1998). A uniform Marcinkiewicz--Zygmund strong law of large numbers for empirical processes. In Festschrift for Miklós Csörgő: Asymptotic Methods in Probability and Statistics (B. Szyszkowicz, ed.) 199--211. North-Holland, Amsterdam.
• McDiarmid, C. (1989). On the method of bounded differences. In Surveys in Combinatorics 1989 (J. Siemons, ed.) 148--188. Cambridge Univ. Press.
• Pollard, D. (1982). A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A 33 235--248.
• Reynaud-Bouret, P. (2003). Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields 126 103--153.
• Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505--563.
• Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135--166.
• Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data. Springer, New York.
• Vapnik, V. N. and Chervonenkis, A. Ya. (1974). Theory of Pattern Recognition. Nauka, Moscow. (In Russian.)
• Yang, Y. and Barron, A. R. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564--1599.
• Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. L. Yang, eds.) 423--435. Springer, New York.