The Annals of Statistics

Greedy function approximation: A gradient boosting machine.

Jerome H. Friedman


Abstract

Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and of Friedman, Hastie and Tibshirani are discussed.
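The “gradient boosting” paradigm described above amounts to repeatedly fitting a base learner to the pseudo-residuals (the negative gradient of the loss at the current fit) and adding a damped version of its predictions to the additive expansion. The sketch below illustrates this for the squared-error case with regression trees as base learners. It is a minimal illustration only: it assumes scikit-learn's DecisionTreeRegressor, and the function names (fit_gradient_boosting, predict_boosted) are illustrative rather than taken from the paper.

    # Minimal sketch of least-squares gradient boosting with regression trees.
    # Assumes scikit-learn's DecisionTreeRegressor as the base learner; the
    # function names below are illustrative and not from the paper itself.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gradient_boosting(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
        """Build F(x) = F0 + sum_m nu * h_m(x) by steepest descent in
        function space under squared-error loss."""
        F0 = float(np.mean(y))                 # initial constant approximation
        F = np.full(len(y), F0)                # current fit F_m(x_i) on the training data
        trees = []
        for _ in range(n_stages):
            residuals = y - F                  # negative gradient of (1/2)(y - F)^2
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, residuals)             # fit base learner to pseudo-residuals
            F = F + learning_rate * tree.predict(X)
            trees.append(tree)
        return F0, trees

    def predict_boosted(X, F0, trees, learning_rate=0.1):
        return F0 + learning_rate * sum(tree.predict(X) for tree in trees)

Under this framing, the other losses named in the abstract (least absolute deviation, Huber-M, multiclass logistic likelihood) change only how the pseudo-residuals and the per-stage fitted constants are computed; the stagewise structure stays the same.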

Article information

Source
Ann. Statist. Volume 29, Number 5 (2001), 1189-1232.

Dates
First available in Project Euclid: 8 February 2002

Permanent link to this document
https://projecteuclid.org/euclid.aos/1013203451

Digital Object Identifier
doi:10.1214/aos/1013203451

Mathematical Reviews number (MathSciNet)
MR1873328

Zentralblatt MATH identifier
1043.62034

Subjects
Primary: 62-02: Research exposition (monographs, survey articles)
62-07: Data analysis
62-08: Computational methods
62G08: Nonparametric regression
62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}

Keywords
Function estimation; boosting; decision trees; robust nonparametric regression

Citation

Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 (2001), no. 5, 1189--1232. doi:10.1214/aos/1013203451. https://projecteuclid.org/euclid.aos/1013203451


References

  • Becker, R. A. and Cleveland, W. S. (1996). The design and control of Trellis display. J. Comput. Statist. Graphics 5 123-155.
  • Breiman, L. (1997). Pasting bites together for prediction in large data sets and on-line. Technical report, Dept. Statistics, Univ. California, Berkeley.
  • Breiman, L. (1999). Prediction games and arcing algorithms. Neural Comp. 11 1493-1517.
  • Breiman, L., Friedman, J. H., Olshen, R. and Stone, C. (1983). Classification and Regression Trees. Wadsworth, Belmont, CA.
  • Copas, J. B. (1983). Regression, prediction, and shrinkage (with discussion). J. Roy. Statist. Soc. Ser. B 45 311-354.
  • Donoho, D. L. (1993). Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. In Different Perspectives on Wavelets. Proceedings of Symposia in Applied Mathematics (I. Daubechies, ed.) 47 173-205. Amer. Math. Soc., Providence, RI.
  • Drucker, H. (1997). Improving regressors using boosting techniques. In Proceedings of the Fourteenth International Conference on Machine Learning (D. Fisher, Jr., ed.) 107-115. Morgan Kaufmann, San Francisco.
  • Duffy, N. and Helmbold, D. (1999). A geometric approach to leveraging weak learners. In Computational Learning Theory. Proceedings of 4th European Conference EuroCOLT99 (P. Fischer and H. U. Simon, eds.) 18-33. Springer, New York.
  • Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference 148-156. Morgan Kaufmann, San Francisco.
  • Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141.
  • Friedman, J. H., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion). Ann. Statist. 28 337-407.
  • Griffin, W. L., Fisher, N. I., Friedman, J. H., Ryan, C. G. and O'Reilly, S. (1999). Cr-Pyrope garnets in lithospheric mantle. J. Petrology 40 679-704.
  • Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
  • Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73-101.
  • Mallat, S. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Processing 41 3397-3415.
  • Powell, M. J. D. (1987). Radial basis functions for multivariate interpolation: a review. In Algorithms for Approximation (J. C. Mason and M. G. Cox, eds.) 143-167. Clarendon Press, Oxford.
  • Rätsch, G., Onoda, T. and Müller, K. R. (1998). Soft margins for AdaBoost. NeuroCOLT Technical Report NC-TR-98-021.
  • Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press.
  • Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning representations by backpropagating errors. Nature 323 533-536.
  • Schapire, R. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, New York.
  • Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York.
  • Warner, J. R., Toronto, A. E., Veasey, L. R. and Stephenson, R. (1961). A mathematical model for medical diagnosis-application to congenital heart disease. J. Amer. Med. Assoc. 177 177-184.