Statistical Science

Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting

Andreas Buja, David Mease, and Abraham J. Wyner

Full-text: Open access

Abstract

The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then give a display of its potential with extensions from classification (where it all started) to least squares, exponential family models and survival analysis, to base-learners other than trees such as smoothing splines, to degrees of freedom and regularization, and to fascinating recent work in model selection. The uninitiated reader will find that the authors did a nice job of presenting a certain coherent and useful interpretation of boosting. The other reader, though, who has watched the business of boosting for a while, may have quibbles with the authors over details of the historical record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as “the statistical view” has proven fruitful, it has also resulted in some ideas about why boosting works that may be misconceived, and in some recommendations that may be misguided.

Article information

Source
Statist. Sci., Volume 22, Number 4 (2007), 506–512.

Dates
First available in Project Euclid: 7 April 2008

Permanent link to this document
https://projecteuclid.org/euclid.ss/1207580164

Digital Object Identifier
doi:10.1214/07-STS242B

Mathematical Reviews number (MathSciNet)
MR2420455

Zentralblatt MATH identifier
1246.62165

Citation

Buja, Andreas; Mease, David; Wyner, Abraham J. Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting. Statist. Sci. 22 (2007), no. 4, 506–512. doi:10.1214/07-STS242B. https://projecteuclid.org/euclid.ss/1207580164


References

  • Amit, Y. and Blanchard, G. (2001). Multiple randomized classifiers: MRCL. Technical report, Univ. Chicago.
  • Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9 1545–1588.
  • Breiman, L. (1997). Arcing the edge. Technical Report 486, Dept. Statistics, Univ. California. Available at www.stat.berkeley.edu.
  • Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801–849.
  • Breiman, L. (1999). Random forests—Random features. Technical Report 567, Dept. Statistics, Univ. California. Available at www.stat.berkeley.edu.
  • Breiman, L. (2000a). Some infinity theory for predictor ensembles. Technical Report 577, Dept. Statistics, Univ. California. Available at www.stat.berkeley.edu.
  • Breiman, L. (2000b). Discussion of “Additive logistic regression: A statistical view of boosting,” by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 374–377.
  • Breiman, L. (2004). Population theory for boosting ensembles. Ann. Statist. 32 1–11.
  • Buja, A., Stuetzle, W. and Shen, Y. (2005). Loss functions for binary class probability estimation: Structure and applications. Technical report, Univ. Washington. Available at http://www.stat.washington.edu/wxs/Learning-papers/paper-proper-scoring.pdf.
  • Bühlmann, P. and Yu, B. (2003). Boosting with the L2 loss: Regression and classification. J. Amer. Statist. Assoc. 98 324–339.
  • Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189–1232.
  • Friedman, J. H. (2002). Stochastic gradient boosting. Comput. Statist. Data Anal. 38 367–378.
  • Friedman, J. H., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337–407.
  • Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA.
  • Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139.
  • Mason, L., Baxter, J., Bartlett, P. and Frean, M. (2000). Functional gradient techniques for combining hypotheses. In Advances in Large Margin Classifiers (A. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans, eds.). MIT Press, Cambridge.
  • Mease, D., Wyner, A. and Buja, A. (2007). Boosted classification trees and class probability/quantile estimation. J. Machine Learning Research 8 409–439.
  • Mease, D. and Wyner, A. (2007). Evidence contrary to the statistical view of boosting. J. Machine Learning Research. To appear.
  • Ridgeway, G. (1999). The state of boosting. Comput. Sci. Statistics 31 172–181.
  • Ridgeway, G. (2000). Discussion of “Additive logistic regression: A statistical view of boosting,” by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 393–400.
  • Schapire, R. E. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning 37 297–336.
  • Wyner, A. (2003). On boosting and the exponential loss. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics.