The Annals of Statistics

On weak base hypotheses and their implications for boosting regression and classification

Wenxin Jiang


When studying the training error and the prediction error for boosting, it is often assumed that the hypotheses returned by the base learner are weakly accurate, that is, able to beat a random guesser by some margin. It has been an open question how large this margin can be: does it eventually vanish in the boosting process, or is it bounded below by a positive amount? This question is crucial for the behavior of both the training error and the prediction error. In this paper we study this problem and show affirmatively that the improvement over the random guesser is bounded below by a positive amount, for almost all possible sample realizations and for most of the commonly used base hypotheses. This has a number of implications for the prediction error, including, for example, that boosting forever may not be good and that regularization may be necessary. The problem is studied by first considering an analog of AdaBoost in regression, where we study similar properties and find that, for good performance, one cannot hope to avoid regularization simply by adapting the boosting device to regression.
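The "margin over a random guesser" in the abstract is the quantity usually written gamma_t = 1/2 - eps_t, where eps_t is the weighted training error of the base hypothesis chosen in round t. The following minimal numpy sketch (not from the paper; the decision-stump learner, synthetic data, and round count are all illustrative assumptions) runs a bare-bones AdaBoost and records this edge in each round:

```python
import numpy as np

# Illustrative sketch: minimal AdaBoost with decision stumps, tracking
# the "edge" gamma_t = 1/2 - eps_t of each base hypothesis, where
# eps_t is its weighted training error under the current weights.

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # labels in {-1, +1}

def best_stump(X, y, w):
    """Exhaustive search for the stump (feature, threshold, sign)
    minimizing the weighted 0-1 error under weights w."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] > t, s, -s)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best

w = np.full(n, 1.0 / n)  # uniform initial weights
edges = []
for _ in range(10):  # 10 boosting rounds
    err, j, t, s = best_stump(X, y, w)
    edges.append(0.5 - err)             # edge over random guessing
    alpha = 0.5 * np.log((1 - err) / err)
    pred = np.where(X[:, j] > t, s, -s)
    w *= np.exp(-alpha * y * pred)      # AdaBoost reweighting
    w /= w.sum()

print(min(edges))
```

On data like this the smallest recorded edge stays strictly positive across rounds, which is the phenomenon the paper establishes: the base learner's advantage over random guessing does not vanish as boosting proceeds, with the consequences for overfitting and regularization discussed in the abstract.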

Article information

Ann. Statist., Volume 30, Number 1 (2002), 51-73.

First available in Project Euclid: 5 March 2002

Primary: 62G99: None of the above, but in this section
Secondary: 68T99

Keywords: angular span; boosting; classification; error bounds; least squares regression; matching pursuit; nearest neighbor rule; overfit; prediction error; regularization; training error; weak hypotheses


Jiang, Wenxin. On weak base hypotheses and their implications for boosting regression and classification. Ann. Statist. 30 (2002), no. 1, 51--73. doi:10.1214/aos/1015362184.
