The Annals of Statistics

Process consistency for AdaBoost

Wenxin Jiang

Abstract

Recent experiments and theoretical studies show that AdaBoost can overfit in the limit of large time. If running the algorithm forever is suboptimal, a natural question is how low the prediction error can be during the process of AdaBoost. We show, under general regularity conditions, that the process of AdaBoost generates a consistent prediction, whose prediction error approaches the optimal Bayes error as the sample size increases. This result suggests that, while running the algorithm forever can be suboptimal, some regularization method that truncates the process can be expected to yield near-optimal performance for sufficiently large sample sizes.
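The paper's result is theoretical, but the idea of "regularization via truncation" can be made concrete with a small sketch. Below is a minimal, illustrative implementation (assuming discrete AdaBoost with decision stump base learners, synthetic data, and a held-out validation set used to choose the truncation point); the specific stopping rule, data, and function names are assumptions for illustration, not a construction prescribed by the paper.

```python
import numpy as np

def stump_predict(X, feature, threshold, sign):
    # Decision stump: +sign where X[:, feature] > threshold, -sign otherwise.
    return np.where(X[:, feature] > threshold, sign, -sign)

def fit_stump(X, y, w):
    # Exhaustively pick the stump minimizing the weighted training error.
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                err = w[stump_predict(X, j, thr, s) != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, s), err
    return best, best_err

def adaboost(X, y, T):
    # Discrete AdaBoost; yields the growing ensemble after each round.
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []  # list of (alpha, stump) pairs
    for _ in range(T):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against degenerate errors
        alpha = 0.5 * np.log((1 - err) / err)   # base-learner weight
        w = w * np.exp(-alpha * y * stump_predict(X, *stump))
        w /= w.sum()
        ensemble.append((alpha, stump))
        yield list(ensemble)

def ensemble_predict(ensemble, X):
    # Sign of the weighted vote of all stumps accumulated so far.
    score = sum(alpha * stump_predict(X, *stump) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)

# Truncation of the process: instead of running the algorithm forever,
# stop at the round with the smallest error on a held-out sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = np.where(X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=400) > 0, 1, -1)
X_tr, y_tr, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

best_round, best_err = 0, np.inf
for t, ens in enumerate(adaboost(X_tr, y_tr, T=50), start=1):
    val_err = np.mean(ensemble_predict(ens, X_val) != y_val)
    if val_err < best_err:
        best_round, best_err = t, val_err
print(f"truncate after round {best_round}: held-out error {best_err:.3f}")
```

The held-out stopping rule above is just one plausible way to truncate the process; the paper's consistency statement concerns the existence of a suitable truncation whose prediction error approaches the Bayes error as the sample size grows, not any particular data-driven rule.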

Article information

Source
Ann. Statist. Volume 32, Number 1 (2004), 13-29.

Dates
First available in Project Euclid: 12 March 2004

Permanent link to this document
http://projecteuclid.org/euclid.aos/1079120128

Digital Object Identifier
doi:10.1214/aos/1079120128

Mathematical Reviews number (MathSciNet)
MR2050999

Zentralblatt MATH identifier
02113741

Subjects
Primary: 62G99: None of the above, but in this section
Secondary: 68T99: None of the above, but in this section

Keywords
AdaBoost; Bayes error; boosting; consistency; prediction error; VC dimension

Citation

Jiang, Wenxin. Process consistency for AdaBoost. Ann. Statist. 32 (2004), no. 1, 13--29. doi:10.1214/aos/1079120128. http://projecteuclid.org/euclid.aos/1079120128.

References

  • Anthony, M. and Biggs, N. (1992). Computational Learning Theory: An Introduction. Cambridge Univ. Press.
  • Breiman, L. (1997). Prediction games and arcing classifiers. Technical Report 504, Dept. Statistics, Univ. California, Berkeley.
  • Breiman, L. (2000). Some infinity theory for predictor ensembles. Technical Report 579, Dept. Statistics, Univ. California, Berkeley.
  • Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
  • Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119--139.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337--407.
  • Grove, A. J. and Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proc. 15th National Conference on Artificial Intelligence 692--699. AAAI Press, Menlo Park, CA.
  • Jiang, W. (2002). On weak base hypotheses and their implications for boosting regression and classification. Ann. Statist. 30 51--73.
  • Mason, L., Baxter, J., Bartlett, P. and Frean, M. (1999). Boosting algorithms as gradient descent in function space. Technical report, Dept. Systems Engineering, Australian National Univ.
  • Schapire, R. E. (1999). Theoretical views of boosting. In Computational Learning Theory: Proc. Fourth European Conference 1--10.
  • Schapire, R. E., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651--1686.