Open Access
February 2002
On weak base Hypotheses and their implications for boosting regression and classification
Wenxin Jiang
Ann. Statist. 30(1): 51-73 (February 2002). DOI: 10.1214/aos/1015362184

Abstract

When studying the training error and the prediction error for boosting, it is often assumed that the hypotheses returned by the base learner are weakly accurate, or are able to beat a random guesser by a certain amount of difference. It has been an open question how much this difference can be, whether it will eventually disappear in the boosting process or be bounded by a positive amount. This question is crucial for the behavior of both the training error and the prediction error. In this paper we study this problem and show affirmatively that the amount of improvement over the random guesser will be at least a positive amount for almost all possible sample realizations and for most of the commonly used base hypotheses. This has a number of implications for the prediction error, including, for example, that boosting forever may not be good and regularization may be necessary. The problem is studied by first considering an analog of AdaBoost in regression, where we study similar properties and find that, for good performance, one cannot hope to avoid regularization by just adopting the boosting device to regression.
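The "improvement over the random guesser" can be made concrete with a small sketch. The code below is not taken from the paper: it assumes the textbook AdaBoost update, one-dimensional decision stumps as the base hypotheses, a hand-made ten-point data set, and the usual definition of the edge as 1/2 minus the weighted training error of the chosen stump. The names `best_stump` and `adaboost_edges` are hypothetical helpers introduced only for this illustration.

```python
import math


def stump_predict(x, thr, sign):
    """Predict `sign` to the right of the threshold and `-sign` otherwise."""
    return sign if x > thr else -sign


def best_stump(xs, ys, w):
    """Return (weighted_error, threshold, sign) of the best 1-D stump."""
    pts = sorted(set(xs))
    # Candidate thresholds: one below all points, plus midpoints between neighbors.
    cands = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    best = (1.0, None, 1)
    for thr in cands:
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, thr, sign) != yi)
            if err < best[0]:
                best = (err, thr, sign)
    return best


def adaboost_edges(xs, ys, rounds):
    """Run textbook AdaBoost and record the edge 1/2 - error at each round."""
    n = len(xs)
    w = [1.0 / n] * n
    edges = []
    for _ in range(rounds):
        err, thr, sign = best_stump(xs, ys, w)
        edges.append(0.5 - err)
        if err <= 0.0 or err >= 0.5:
            break  # perfect or useless base hypothesis: stop early
        alpha = 0.5 * math.log((1.0 - err) / err)
        # Up-weight misclassified points, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, sign))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return edges


# Toy data (an assumption for illustration): labels no single stump can fit.
xs = [float(i) for i in range(10)]
ys = [-1, -1, 1, -1, -1, 1, 1, -1, 1, 1]
edges = adaboost_edges(xs, ys, rounds=20)
print(min(edges), max(edges))
```

On this toy set the recorded edges can be inspected round by round to see whether they shrink toward zero or stay away from it, which is the question the abstract raises. The sketch only illustrates the quantity being discussed and says nothing about the paper's proof technique or its regression analog.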

Citation

Wenxin Jiang. "On weak base Hypotheses and their implications for boosting regression and classification." Ann. Statist. 30 (1) 51 - 73, February 2002. https://doi.org/10.1214/aos/1015362184

Information

Published: February 2002
First available in Project Euclid: 5 March 2002

zbMATH: 1012.62066
MathSciNet: MR1892655
Digital Object Identifier: 10.1214/aos/1015362184

Subjects:
Primary: 62G99
Secondary: 68T99

Keywords: Angular span, boosting, classification, error bounds, least squares regression, matching pursuit, nearest neighbor rule, overfit, prediction error, regularization, training error, weak hypotheses

Rights: Copyright © 2002 Institute of Mathematical Statistics

Vol.30 • No. 1 • February 2002