The Annals of Statistics

Process consistency for AdaBoost

Wenxin Jiang

Source: Ann. Statist. Volume 32, Number 1 (2004), 13-29.

Abstract

Recent experiments and theoretical studies show that AdaBoost can overfit in the limit of large time. If running the algorithm forever is suboptimal, a natural question is how low the prediction error can be during the process of AdaBoost. We show under general regularity conditions that the process of AdaBoost generates a consistent prediction, one whose prediction error approaches the optimal Bayes error as the sample size increases. This result suggests that, while running the algorithm forever can be suboptimal, it is reasonable to expect that some regularization method based on truncating the process may lead to near-optimal performance for sufficiently large sample sizes.
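
The idea of truncating the AdaBoost process can be illustrated with a minimal sketch. It is not the paper's construction or proof: it assumes one-dimensional decision stumps as base learners, synthetic data with a 20% label-flip rate (so the Bayes error is 0.2), and a held-out sample used to pick the stopping round. All function names (`fit_stump`, `adaboost_path`, `error_curve`, `make_data`) are hypothetical and chosen for illustration only.

```python
import math
import random

def fit_stump(xs, ys, w):
    """Return (weighted_error, threshold, sign) of the best 1-D decision stump."""
    best = (float("inf"), 0.0, 1)
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            # The stump predicts `sign` for x > thr and `-sign` otherwise.
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (sign if xi > thr else -sign) != yi)
            if err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost_path(xs, ys, rounds):
    """Run AdaBoost and return one (alpha, threshold, sign) triple per round."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = fit_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Up-weight misclassified points, then renormalize.
        w = [wi * math.exp(-alpha * yi * (sign if xi > thr else -sign))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def error_curve(ensemble, xs, ys):
    """Misclassification rate of the ensemble truncated after each round."""
    scores = [0.0] * len(xs)
    curve = []
    for alpha, thr, sign in ensemble:
        scores = [s + alpha * (sign if x > thr else -sign)
                  for s, x in zip(scores, xs)]
        wrong = sum((1 if s > 0 else -1) != y for s, y in zip(scores, ys))
        curve.append(wrong / len(ys))
    return curve

def make_data(n, rng, noise=0.2):
    """Assumed toy model: y = sign(x), flipped with probability `noise`."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [(1 if x > 0 else -1) * (-1 if rng.random() < noise else 1)
          for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(100, rng)
valid_x, valid_y = make_data(1000, rng)

ensemble = adaboost_path(train_x, train_y, rounds=50)
valid_curve = error_curve(ensemble, valid_x, valid_y)

# Truncation: keep only the first best_t rounds, chosen on held-out data.
best_t = min(range(len(valid_curve)), key=valid_curve.__getitem__) + 1
print("stop after round", best_t, "held-out error", valid_curve[best_t - 1])
```

Choosing the stopping round on a held-out sample is only one possible truncation rule, used here because it is easy to state. The abstract asserts that a suitable stopping time exists under regularity conditions; it does not prescribe this particular rule.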

Primary Subjects: 62G99
Secondary Subjects: 68T99
Keywords: AdaBoost; Bayes error; boosting; consistency; prediction error; VC dimension

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1079120128
Digital Object Identifier: doi:10.1214/aos/1079120128
Mathematical Reviews number (MathSciNet): MR2050999
Zentralblatt MATH identifier: 02113741

2010 © Institute of Mathematical Statistics