The Annals of Statistics

Fast learning rates for plug-in classifiers

Jean-Yves Audibert and Alexandre B. Tsybakov

Full-text: Open access


It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) the plug-in classifiers generally converge more slowly than the classifiers based on empirical risk minimization. We show that both conjectures are not correct. In particular, we construct plug-in classifiers that can achieve not only fast, but also super-fast rates, that is, rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved.
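
For readers unfamiliar with the term, a plug-in classifier estimates the regression function $\eta(x) = P(Y = 1 \mid X = x)$ and then predicts 1 whenever the estimate is at least 1/2. The sketch below is only an illustration under stated assumptions and is not the construction analysed in the paper: it uses a simple Gaussian-kernel (Nadaraya-Watson) estimate in one dimension, and the names `eta_hat`, `plug_in_classifier`, `true_eta` and the bandwidth value are hypothetical. The toy $\eta$ stays away from 1/2, which is a particularly strong form of the low noise condition mentioned in the abstract.

```python
import math
import random


def eta_hat(x, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of eta(x) = P(Y = 1 | X = x), Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return 0.5  # no usable data near x: fall back to an uninformative value
    return sum(w * y for w, y in zip(weights, ys)) / total


def plug_in_classifier(x, xs, ys, bandwidth=0.1):
    """Plug the estimate into the Bayes rule: predict 1 iff eta_hat(x) >= 1/2."""
    return 1 if eta_hat(x, xs, ys, bandwidth) >= 0.5 else 0


def true_eta(x):
    """Toy regression function (an assumption): always at distance 0.4 from 1/2."""
    return 0.9 if x >= 0.5 else 0.1


# Simulated training sample: X uniform on [0, 1], Y | X = x ~ Bernoulli(true_eta(x)).
rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys = [1 if rng.random() < true_eta(x) else 0 for x in xs]

# Classify one point on each side of the decision boundary at x = 0.5.
predictions = [plug_in_classifier(x, xs, ys) for x in (0.1, 0.9)]
print(predictions)
```

The excess Bayes risk of such a rule is driven by how often the estimate falls on the wrong side of 1/2, which is why a regression function that rarely comes close to 1/2 makes classification easier than estimating $\eta$ itself.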

Article information

Ann. Statist., Volume 35, Number 2 (2007), 608-633.

First available in Project Euclid: 5 July 2007

Primary: 62G07: Density estimation
Secondary: 62G08: Nonparametric regression; 62H05: Characterization and structure theory; 68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}

Keywords: Classification; statistical learning; fast rates of convergence; excess risk; plug-in classifiers; minimax lower bounds


Audibert, Jean-Yves; Tsybakov, Alexandre B. Fast learning rates for plug-in classifiers. Ann. Statist. 35 (2007), no. 2, 608--633. doi:10.1214/009053606000001217.

