Process consistency for AdaBoost
Wenxin Jiang
Source: Ann. Statist.
Volume 32, Number 1
(2004), 13-29.
Abstract
Recent experiments and theoretical studies show that AdaBoost can overfit in the limit of large time. If running the algorithm forever is suboptimal, a natural question is how low can the prediction error be during the process of AdaBoost? We show under general regularity conditions that during the process of AdaBoost a consistent prediction is generated, which has the prediction error approximating the optimal Bayes error as the sample size increases. This result suggests that, while running the algorithm forever can be suboptimal, it is reasonable to expect that some regularization method via truncation of the process may lead to a near-optimal performance for sufficiently large sample size.
Primary Subjects: 62G99
Secondary Subjects: 68T99
Keywords: AdaBoost; Bayes error; boosting; consistency; prediction error; VC dimension
Links and Identifiers
Permanent link to this document: http://projecteuclid.org/euclid.aos/1079120128
Digital Object Identifier: doi:10.1214/aos/1079120128
Mathematical Reviews number (MathSciNet):
MR2050999
Zentralblatt MATH identifier:
02113741
References
Anthony, M. and Biggs, N. (1992). Computational Learning Theory: An Introduction. Cambridge Univ. Press.
Breiman, L. (1997). Prediction games and arcing classifiers. Technical Report 504, Dept. Statistics, Univ. California, Berkeley.
Breiman, L. (2000). Some infinity theory for predictor ensembles. Technical Report 579, Dept. Statistics, Univ. California, Berkeley.
Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119--139.
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337--407.
Grove, A. J. and Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proc. 15th National Conference on Artificial Intelligence 692--699. AAAI Press, Menlo Park, CA.
Jiang, W. (2002). On weak base hypotheses and their implications for boosting regression and classification. Ann. Statist. 30 51--73.
Mason, L., Baxter, J., Bartlett, P. and Frean, M. (1999). Boosting algorithms as gradient descent in function space. Technical report, Dept. Systems Engineering, Australian National Univ.
Schapire, R. E. (1999). Theoretical views of boosting. In Computational Learning Theory: Proc. Fourth European Conference 1--10.
Schapire, R. E., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651--1686.