Bernoulli, Volume 13, Number 4 (2007), 1000-1022.

Optimal rates of aggregation in classification under low noise assumption

Guillaume Lecué

Abstract

In the same spirit as Tsybakov, we define the optimality of an aggregation procedure in the problem of classification. Using an aggregate with exponential weights, we obtain an optimal rate of convex aggregation for the hinge risk under the margin assumption. Moreover, we obtain an optimal rate of model selection aggregation under the margin assumption for the excess Bayes risk.
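
To make "an aggregate with exponential weights" concrete, here is a minimal sketch, not taken from the article. It assumes base classifiers with values in {-1, 1}, weights proportional to the exponential of minus the cumulative empirical hinge loss, and a temperature of 1. The function names, the `temperature` parameter and the toy data are all hypothetical.

```python
import math

def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, 1}."""
    return max(0.0, 1.0 - y * score)

def exponential_weights(classifiers, sample, temperature=1.0):
    """Weights proportional to exp(-cumulative hinge loss / temperature).

    The temperature value is an assumption of this sketch.
    """
    cumulative = [sum(hinge_loss(y, f(x)) for x, y in sample)
                  for f in classifiers]
    # Subtract the smallest loss before exponentiating for numerical stability;
    # the normalized weights are unchanged by this shift.
    smallest = min(cumulative)
    raw = [math.exp(-(c - smallest) / temperature) for c in cumulative]
    total = sum(raw)
    return [r / total for r in raw]

def aggregate(classifiers, weights):
    """Convex combination of the base classifiers (a real-valued function)."""
    def combined(x):
        return sum(w * f(x) for w, f in zip(weights, classifiers))
    return combined

# Toy data (hypothetical): points on the real line labelled by their sign.
sample = [(-2.0, -1), (-1.0, -1), (0.5, 1), (1.5, 1)]
classifiers = [
    lambda x: 1 if x > 0 else -1,   # agrees with every label
    lambda x: -1 if x > 0 else 1,   # disagrees with every label
    lambda x: 1,                    # constant classifier
]

weights = exponential_weights(classifiers, sample)
aggregated = aggregate(classifiers, weights)
print(weights)
print(aggregated(-1.0), aggregated(1.5))
```

The aggregate is a convex combination, so its values lie in [-1, 1]; taking its sign gives a classifier. On the toy data nearly all the weight goes to the first classifier, which makes no errors.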

Article information

Source
Bernoulli, Volume 13, Number 4 (2007), 1000-1022.

Dates
First available in Project Euclid: 9 November 2007

Permanent link to this document
https://projecteuclid.org/euclid.bj/1194625600

Digital Object Identifier
doi:10.3150/07-BEJ6044

Mathematical Reviews number (MathSciNet)
MR2364224

Zentralblatt MATH identifier
1129.62060

Keywords
aggregation of classifiers; classification; optimal rates; margin

Citation

Lecué, Guillaume. Optimal rates of aggregation in classification under low noise assumption. Bernoulli 13 (2007), no. 4, 1000-1022. doi:10.3150/07-BEJ6044. https://projecteuclid.org/euclid.bj/1194625600


References

  • Audibert, J.-Y. and Tsybakov, A.B. (2007). Fast learning rates for plug-in classifiers under margin condition. Ann. Statist. 35. To appear.
  • Bartlett, P.L., Freund, Y., Lee, W.S. and Schapire, R.E. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651-1686.
  • Bartlett, P.L., Jordan, M.I. and McAuliffe, J.D. (2006). Convexity, classification and risk bounds. J. Amer. Statist. Assoc. 101 138-156.
  • Birgé, L. (2006). Model selection via testing: An alternative to (penalized) maximum likelihood estimators. Ann. Inst. H. Poincaré Probab. Statist. 42 273-325.
  • Blanchard, G., Bousquet, O. and Massart, P. (2004). Statistical performance of support vector machines. Available at http://mahery.math.u-psud.fr/~blanchard/publi/.
  • Blanchard, G., Lugosi, G. and Vayatis, N. (2003). On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res. 4 861-894.
  • Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Statist. 9 323-375.
  • Bühlmann, P. and Yu, B. (2002). Analyzing bagging. Ann. Statist. 30 927-961.
  • Bunea, F., Tsybakov, A.B. and Wegkamp, M. (2005). Aggregation for Gaussian regression. Ann. Statist. To appear. Available at http://www.stat.fsu.edu/~wegkamp.
  • Catoni, O. (1999). "Universal" aggregation rules with exact bias bounds. Preprint n. 510, LPMA. Available at http://www.proba.jussieu.fr/mathdoc/preprints/index.html.
  • Catoni, O. (2001). Statistical Learning Theory and Stochastic Optimization. Ecole d'Été de Probabilités de Saint-Flour 2001. Lecture Notes in Math. 1851. New York: Springer.
  • Chesneau, C. and Lecué, G. (2006). Adapting to unknown smoothness by aggregation of thresholded wavelet estimators. Submitted.
  • Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning 20 273-297.
  • Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. New York: Springer.
  • Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55 119-139.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337-407.
  • Herbei, R. and Wegkamp, H. (2006). Classification with reject option. Canad. J. Statist. 34 709-721.
  • Juditsky, A. and Nemirovski, A. (2000). Functional aggregation for nonparametric estimation. Ann. Statist. 28 681-712.
  • Juditsky, A., Rigollet, P. and Tsybakov, A.B. (2006). Learning by mirror averaging. Preprint n. 1034, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris 6 and Paris 7. Available at http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2005.
  • Lecué, G. (2006). Optimal oracle inequality for aggregation of classifiers under low noise condition. In Proceedings of the 19th Annual Conference on Learning Theory, COLT 2006 32 364-378.
  • Lecué, G. (2007). Simultaneous adaptation to the margin and to complexity in classification. Ann. Statist. To appear. Available at http://hal.ccsd.cnrs.fr/ccsd-00009241/en/.
  • Lecué, G. (2007). Suboptimality of penalized empirical risk minimization. In COLT07. To appear.
  • Lin, Y. (1999). A note on margin-based loss functions in classification. Technical Report 1029r, Dept. Statistics, Univ. Wisconsin, Madison.
  • Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods. Ann. Statist. 32 30-55.
  • Mammen, E. and Tsybakov, A.B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808-1829.
  • Massart, P. (2000). Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse Math. (6) 2 245-303.
  • Massart, P. (2004). Concentration inequalities and model selection. Lecture Notes of Saint Flour.
  • Massart, P. and Nédélec, E. (2006). Risk bound for statistical learning. Ann. Statist. 34 2326-2366.
  • Nemirovski, A. (2000). Topics in non-parametric statistics. Ecole d'Été de Probabilités de Saint-Flour 1998. Lecture Notes in Math. 1738 85-277. New York: Springer.
  • Schölkopf, B. and Smola, A. (2002). Learning with Kernels. MIT Press.
  • Steinwart, I. and Scovel, C. (2005). Fast rates for support vector machines. In Proceedings of the 18th Annual Conference on Learning Theory, COLT 2005. Berlin: Springer.
  • Steinwart, I. and Scovel, C. (2007). Fast rates for support vector machines using Gaussian kernels. Ann. Statist. 35 575-607.
  • Tsybakov, A.B. (2003). Optimal rates of aggregation. In Computational Learning Theory and Kernel Machines (B. Schölkopf and M. Warmuth, eds.). Lecture Notes in Artificial Intelligence 2777 303-313. Heidelberg: Springer.
  • Tsybakov, A.B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135-166.
  • Vovk, V.G. (1990). Aggregating strategies. In Proceedings of the 3rd Annual Workshop on Computational Learning Theory, COLT90 371-386. San Mateo, CA: Morgan Kaufmann.
  • Yang, Y. (1999). Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inform. Theory 45 2271-2284.
  • Yang, Y. (1999). Minimax nonparametric classification. II. Model selection for adaptation. IEEE Trans. Inform. Theory 45 2285-2292.
  • Yang, Y. (2000). Mixing strategies for density estimation. Ann. Statist. 28 75-87.
  • Zhang, T. (2004). Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist. 32 56-85.