The Annals of Statistics

Three papers on boosting: an introduction

Vladimir Koltchinskii and Bin Yu


Abstract

The notion of boosting originated in the machine learning literature in the 1980s [Valiant, L. G. (1984). A theory of the learnable. In Proc. 16th Annual ACM Symposium on Theory of Computing 436--445. ACM Press, New York]. The goal of boosting is to improve the generalization performance of weak (or base) learning algorithms by combining them in a certain way. The first algorithm of this type was discovered by Schapire [Schapire, R. E. (1990). The strength of weak learnability. Machine Learning 5 197--227] and the second by Freund [Freund, Y. (1995). Boosting a weak learning algorithm by majority. Inform. and Comput. 121 256--285]. Freund and Schapire [Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119--139] then came up with a more practical version of boosting and invented the algorithm called AdaBoost, which combines simple classification rules into much more powerful and precise classification algorithms. For a fixed number of iterations, AdaBoost runs the weak (or base) learning algorithm on resampled versions of the original data set in a sequential manner and then combines the resulting classifiers through a weighted summation at the end of the iterations. Gradually, it became clear that AdaBoost is a special case of a more general statistical methodology for combining simple estimates in classification or regression into more complex and more precise ones. The study of the statistical properties of these methods has since been conducted in several directions in both the machine learning and statistics communities.

The problem of consistency of AdaBoost is posed by Leo Breiman in the first paper in this issue of The Annals of Statistics. Breiman studies one ingredient needed to prove consistency: the convergence properties of AdaBoost as a numerical method in the population case. This paper circulated for a couple of years as a preprint, and its results were also covered in the Wald Lectures delivered by Breiman at the IMS Annual Meeting in 2002 in Banff, Canada. The papers by Jiang, by Lugosi and Vayatis, and by Zhang, published below with discussions, consider various versions of boosting and give answers to the consistency question posed by Breiman.
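The AdaBoost procedure sketched in the abstract (sequentially reweighting the training sample, fitting a weak learner each round, and combining the fitted rules by a weighted vote) can be illustrated with a minimal implementation. This is an illustrative sketch only, not any of the authors' code; the decision-stump base learner and all names here are assumptions:

```python
import math

def fit_stump(X, y, w):
    """Fit a decision stump (single-feature threshold rule) by
    minimizing weighted 0-1 error. Labels y are in {-1, +1}."""
    best, best_err = None, float("inf")
    n = len(X)
    for j in range(len(X[0])):
        for thresh in sorted(set(row[j] for row in X)):
            for sign in (1, -1):
                pred = [sign if row[j] >= thresh else -sign for row in X]
                err = sum(w[i] for i in range(n) if pred[i] != y[i])
                if err < best_err:
                    best_err, best = err, (j, thresh, sign)
    j, thresh, sign = best
    return (lambda row: sign if row[j] >= thresh else -sign), best_err

def adaboost(X, y, T=10):
    """Run T rounds of AdaBoost; return the weighted-majority classifier."""
    n = len(X)
    w = [1.0 / n] * n              # start with uniform weights
    ensemble = []                  # pairs (alpha_t, h_t)
    for _ in range(T):
        h, err = fit_stump(X, y, w)
        err = max(err, 1e-12)      # guard against log(0) on a perfect stump
        if err >= 0.5:
            break                  # weak learner no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Upweight misclassified points, downweight correct ones, renormalize.
        w = [w[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(n)]
        z = sum(w)
        w = [wi / z for wi in w]
    def classify(row):
        score = sum(alpha * h(row) for alpha, h in ensemble)
        return 1 if score >= 0 else -1
    return classify

# Toy usage on a small separable sample.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [3, 2], [2, 3], [3, 3]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
clf = adaboost(X, y, T=5)
print([clf(row) for row in X])  # → [-1, -1, -1, -1, 1, 1, 1, 1]
```

In practice the reweighting step may be realized either by carrying explicit sample weights into the base learner, as above, or by resampling the data set in proportion to the weights; the two implementations express the same scheme.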

Article information

Source
Ann. Statist., Volume 32, Number 1 (2004), 12.

Dates
First available in Project Euclid: 12 March 2004

Permanent link to this document
https://projecteuclid.org/euclid.aos/1079120127

Digital Object Identifier
doi:10.1214/aos/1079120127

Zentralblatt MATH identifier
1105.62318

Citation

Koltchinskii, Vladimir; Yu, Bin. Three papers on boosting: an introduction. Ann. Statist. 32 (2004), no. 1, 12. doi:10.1214/aos/1079120127. https://projecteuclid.org/euclid.aos/1079120127



References

  • Freund, Y. (1995). Boosting a weak learning algorithm by majority. Inform. and Comput. 121 256--285.
  • Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119--139.
  • Schapire, R. E. (1990). The strength of weak learnability. Machine Learning 5 197--227.
  • Valiant, L. G. (1984). A theory of the learnable. In Proc. 16th Annual ACM Symposium on Theory of Computing 436--445. ACM Press, New York.