The Annals of Statistics

On the Bayes-risk consistency of regularized boosting methods

Gábor Lugosi and Nicolas Vayatis

Abstract

The probability of error of classification methods based on convex combinations of simple base classifiers produced by "boosting" algorithms is investigated. The main result of the paper is that certain regularized boosting algorithms provide Bayes-risk consistent classifiers under the sole assumption that the Bayes classifier may be approximated by a convex combination of the base classifiers. Nonasymptotic, distribution-free bounds are also developed, which offer interesting new insight into how boosting works and help explain its success in practical classification problems.
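
A minimal sketch, in assumed notation, of the kind of regularized estimator the abstract refers to; the base class $\mathcal{C}$, the convex cost function $\phi$ and the smoothing parameter $\lambda$ below are illustrative placeholders rather than the paper's exact statement:

\[
  \hat f_n^{\lambda} \;=\; \operatorname*{arg\,min}_{f \in \lambda \cdot \mathrm{conv}(\mathcal{C})} \;
  \frac{1}{n} \sum_{i=1}^{n} \phi\bigl(-Y_i f(X_i)\bigr),
  \qquad \phi \ \text{convex and increasing, e.g.}\ \phi(x) = e^{x},
\]

where $(X_1, Y_1), \ldots, (X_n, Y_n)$ with $Y_i \in \{-1, +1\}$ are the training data, $\mathrm{conv}(\mathcal{C})$ is the convex hull of the base classifiers, the resulting classifier predicts $\mathrm{sgn}\bigl(\hat f_n^{\lambda}(x)\bigr)$, and the smoothing parameter $\lambda$ is then chosen in a data-dependent way by penalized model selection.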

Article information

Source
Ann. Statist., Volume 32, Number 1 (2004), 30-55.

Dates
First available in Project Euclid: 12 March 2004

Permanent link to this document
https://projecteuclid.org/euclid.aos/1079120129

Digital Object Identifier
doi:10.1214/aos/1079120129

Mathematical Reviews number (MathSciNet)
MR2051000

Zentralblatt MATH identifier
1105.62319

Subjects
Primary: 60G99 (None of the above, but in this section); 62C12 (Empirical decision procedures; empirical Bayes procedures); 62G99 (None of the above, but in this section)

Keywords
Boosting, classification, Bayes-risk consistency, penalized model selection, smoothing parameter, convex cost functions, empirical processes

Citation

Lugosi, Gábor; Vayatis, Nicolas. On the Bayes-risk consistency of regularized boosting methods. Ann. Statist. 32 (2004), no. 1, 30--55. doi:10.1214/aos/1079120129. https://projecteuclid.org/euclid.aos/1079120129

