The Annals of Statistics

Optimal aggregation of classifiers in statistical learning

Alexander B. Tsybakov
Source: Ann. Statist. Volume 32, Number 1 (2004), 135-166.

Abstract

Classification can be considered as nonparametric estimation of sets, where the risk is defined by means of a specific distance between sets associated with misclassification error. It is shown that the rates of convergence of classifiers depend on two parameters: the complexity of the class of candidate sets and the margin parameter. The dependence is explicitly given, indicating that optimal fast rates approaching $O(n^{-1})$ can be attained, where $n$ is the sample size, and that the proposed classifiers have the property of robustness to the margin. The main result of the paper concerns optimal aggregation of classifiers: we suggest a classifier that automatically adapts both to the complexity and to the margin, and attains the optimal fast rates, up to a logarithmic factor.
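
As a hedged illustration of the dependence described in the abstract (the abstract itself does not state the formula; the symbols $\kappa$, $\rho$, $\hat G_n$, $G^*$ and $R$ are the notation usually attached to this result and are assumptions here): let $\kappa \ge 1$ denote the margin parameter and $0 < \rho < 1$ the complexity parameter, an exponent bounding the entropy of the class of candidate sets. With $R(G)$ the misclassification risk of a set $G$, $G^*$ the optimal set and $\hat G_n$ a classifier built from $n$ observations, the excess risk is then of order

$$ \mathbb{E}\,\bigl[R(\hat G_n) - R(G^*)\bigr] = O\bigl(n^{-\kappa/(2\kappa+\rho-1)}\bigr). $$

For $\kappa = 1$ the exponent equals $1/(1+\rho)$, which approaches $1$ as $\rho \to 0$; this is the fast rate approaching $O(n^{-1})$. As $\kappa \to \infty$ the exponent tends to $1/2$, which corresponds to the slow rate $O(n^{-1/2})$.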

Primary Subjects: 62G07
Secondary Subjects: 62G08, 62H30, 68T10
Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1079120131
Digital Object Identifier: doi:10.1214/aos/1079120131
Mathematical Reviews number (MathSciNet): MR2051002
Zentralblatt MATH identifier: 02113753


2013 © Institute of Mathematical Statistics
