## The Annals of Statistics

### Fast learning rates for plug-in classifiers

#### Abstract

It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) the plug-in classifiers generally converge more slowly than the classifiers based on empirical risk minimization. We show that both conjectures are not correct. In particular, we construct plug-in classifiers that can achieve not only fast, but also super-fast rates, that is, rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved.
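For readers unfamiliar with the term, a plug-in classifier first estimates the regression function $\eta(x) = P(Y = 1 \mid X = x)$ and then predicts label 1 wherever the estimate is at least 1/2. The sketch below is only an illustration of that idea, not the construction analysed in the article: it swaps in a Nadaraya-Watson kernel estimate with a Gaussian kernel, and the function names, the bandwidth value, and the toy data distribution are all assumptions made for this example.

```python
import math
import random


def nw_regression_estimate(x, sample, bandwidth):
    """Nadaraya-Watson estimate of eta(x) = P(Y = 1 | X = x), Gaussian kernel.

    Illustrative choice of estimator; any regression estimate could be plugged in.
    """
    num = 0.0
    den = 0.0
    for xi, yi in sample:
        w = math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
        num += w * yi
        den += w
    # If every weight underflows to zero, fall back to 1/2 (no information).
    return num / den if den > 0 else 0.5


def plug_in_classify(x, sample, bandwidth=0.1):
    """Plug-in rule: predict 1 exactly when the estimated eta(x) is at least 1/2."""
    return 1 if nw_regression_estimate(x, sample, bandwidth) >= 0.5 else 0


def make_sample(n, rng):
    """Hypothetical toy data: X uniform on [0, 1], eta(x) = 0.9 if x > 0.5 else 0.1."""
    sample = []
    for _ in range(n):
        x = rng.random()
        eta = 0.9 if x > 0.5 else 0.1
        y = 1 if rng.random() < eta else 0
        sample.append((x, y))
    return sample


rng = random.Random(0)
train = make_sample(500, rng)
print(plug_in_classify(0.2, train), plug_in_classify(0.8, train))
```

In this toy distribution $\eta$ stays well away from 1/2 everywhere, which is an extreme low-noise situation: a rough estimate of $\eta$ is already enough to land on the correct side of the threshold.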

#### Article information

Source
Ann. Statist., Volume 35, Number 2 (2007), 608-633.

Dates
First available in Project Euclid: 5 July 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1183667286

Digital Object Identifier
doi:10.1214/009053606000001217

Mathematical Reviews number (MathSciNet)
MR2336861

Zentralblatt MATH identifier
1118.62041

#### Citation

Audibert, Jean-Yves; Tsybakov, Alexandre B. Fast learning rates for plug-in classifiers. Ann. Statist. 35 (2007), no. 2, 608--633. doi:10.1214/009053606000001217. https://projecteuclid.org/euclid.aos/1183667286

#### References

• Audibert, J.-Y. (2004). Classification under polynomial entropy and margin assumptions and randomized estimators. Preprint, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris VI and VII. Available at www.proba.jussieu.fr/mathdoc/textes/PMA-908.pdf.
• Audibert, J.-Y. and Tsybakov, A. B. (2005). Fast learning rates for plug-in classifiers under the margin condition. Preprint, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris VI and VII. Available at arxiv.org/abs/math/0507180.
• Bartlett, P. L., Jordan, M. I. and McAuliffe, J. D. (2006). Convexity, classification and risk bounds. J. Amer. Statist. Assoc. 101 138--156.
• Birman, M. Š. and Solomjak, M. Z. (1967). Piecewise-polynomial approximations of functions of the classes $W^\alpha_p$. Mat. Sb. (N.S.) 73 331--355.
• Blanchard, G., Bousquet, O. and Massart, P. (2004). Statistical performance of support vector machines. Unpublished manuscript. Available at www.kyb.mpg.de/publications/pss/ps2731.ps.
• Blanchard, G., Lugosi, G. and Vayatis, N. (2004). On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res. 4 861--894.
• Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323--375.
• Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
• Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York.
• Kolmogorov, A. N. and Tihomirov, V. M. (1961). $\varepsilon$-entropy and $\varepsilon$-capacity of sets in functional space. Amer. Math. Soc. Transl. (2) 17 277--364.
• Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization (with discussion). Ann. Statist. 34 2593--2706.
• Koltchinskii, V. and Beznosova, O. (2005). Exponential convergence rates in classification. In Learning Theory. Lecture Notes in Comput. Sci. 3559 295--307. Springer, Berlin.
• Korostelëv, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction. Springer, New York.
• Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808--1829.
• Massart, P. and Nédélec, E. (2006). Risk bounds for statistical learning. Ann. Statist. 34 2326--2366.
• Nemirovskii, A. S., Polyak, B. T. and Tsybakov, A. B. (1985). Rate of convergence of nonparametric estimators of maximum-likelihood type. Probl. Inform. Trans. 21 258--272.
• Steinwart, I. and Scovel, J. C. (2007). Fast rates for support vector machines using Gaussian kernels. Ann. Statist. 35 575--607.
• Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040--1053.
• Tarigan, B. and van de Geer, S. (2004). Adaptivity of support vector machines with $\ell_1$ penalty. Preprint, Univ. Leiden. Available at stat.ethz.ch/~geer/svm4.pdf.
• Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135--166.
• Tsybakov, A. B. (2004). Introduction à l'estimation non-paramétrique. Springer, Berlin.
• Tsybakov, A. B. and van de Geer, S. (2005). Square root penalty: Adaptation to the margin in classification and in edge estimation. Ann. Statist. 33 1203--1224.
• van de Geer, S. (2000). Applications of Empirical Process Theory. Cambridge Univ. Press.
• Vapnik, V. N. (1998). Statistical Learning Theory. Wiley, New York.
• Vapnik, V. N. and Chervonenkis, A. Ya. (1974). Theory of Pattern Recognition. Statistical Problems of Learning. Nauka, Moscow. (In Russian.)
• Yang, Y. (1999). Minimax nonparametric classification. I. Rates of convergence. II. Model selection for adaptation. IEEE Trans. Inform. Theory 45 2271--2292.
• Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564--1599.