The Annals of Statistics

Greedy function approximation: A gradient boosting machine.

Jerome H. Friedman

Source: Ann. Statist. Volume 29, Number 5 (2001), 1189-1232.

Abstract

Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.
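
As an illustration of the stagewise idea in the abstract, the following is a minimal sketch of least-squares gradient boosting in which each stage fits a one-split regression stump to the current residuals (the negative gradient of squared-error loss). It is not the paper's reference implementation. The function names, the single-feature restriction, and the `shrinkage` and `n_stages` parameters are assumptions made for this example.

```python
# Minimal sketch: least-squares gradient boosting with one-split stumps.
# All names (fit_stump, gradient_boost, shrinkage, ...) are illustrative
# assumptions, not taken from the paper.

def fit_stump(x, r):
    """Fit a one-split regression stump to targets r by least squares."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(r)
    mean_all = total / n
    # Fallback: a constant fit when no split is possible (all x equal).
    best = (float("inf"), None, mean_all, mean_all)
    left_sum = 0.0
    for k in range(1, n):
        left_sum += r[order[k - 1]]
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # cannot split between equal values
        left_mean = left_sum / k
        right_mean = (total - left_sum) / (n - k)
        # Squared error equals sum(r^2) minus this gain, so a lower
        # score means a better split.
        score = -(k * left_mean ** 2 + (n - k) * right_mean ** 2)
        if score < best[0]:
            best = (score, (lo + hi) / 2.0, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    if threshold is None or xi <= threshold:
        return left_mean
    return right_mean


def gradient_boost(x, y, n_stages=50, shrinkage=0.1):
    """Stagewise additive fit: each stage fits the negative gradient."""
    f0 = sum(y) / len(y)  # the constant that minimizes squared error
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_stages):
        # For loss (y - F)^2 / 2 the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + shrinkage * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return f0, stumps, shrinkage


def predict(model, xi):
    f0, stumps, shrinkage = model
    return f0 + shrinkage * sum(predict_stump(s, xi) for s in stumps)


def mse(model, x, y):
    return sum((predict(model, xi) - yi) ** 2
               for xi, yi in zip(x, y)) / len(y)
```

Swapping the residual computation for the negative gradient of another loss (for example the sign of the residual for least absolute deviation) gives the corresponding variant of the same loop, although a faithful version would also re-estimate the terminal-node values under that loss.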

Primary Subjects: 62-02, 62-07, 62-08, 62G08, 62H30, 68T10
Keywords: Function estimation; boosting; decision trees; robust nonparametric regression

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1013203451
Digital Object Identifier: doi:10.1214/aos/1013203451
Mathematical Reviews number (MathSciNet): MR1873328
Zentralblatt MATH identifier: 01829052

References

Becker, R. A. and Cleveland, W. S. (1996). The design and control of Trellis display. J. Comput. Statist. Graphics 5 123-155.
Breiman, L. (1997). Pasting bites together for prediction in large data sets and on-line. Technical report, Dept. Statistics, Univ. California, Berkeley.
Breiman, L. (1999). Prediction games and arcing algorithms. Neural Comp. 11 1493-1517.
Breiman, L., Friedman, J. H., Olshen, R. and Stone, C. (1983). Classification and Regression Trees. Wadsworth, Belmont, CA.
Copas, J. B. (1983). Regression, prediction, and shrinkage (with discussion). J. Roy. Statist. Soc. Ser. B 45 311-354.
Mathematical Reviews (MathSciNet): MR86a:62104
Donoho, D. L. (1993). Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. In Different Perspectives on Wavelets. Proceedings of Symposium in Applied Mathematics (I. Daubechies, ed.) 47 173-205. Amer. Math. Soc., Providence RI.
Mathematical Reviews (MathSciNet): MR95k:62099
Drucker, H. (1997). Improving regressors using boosting techniques. Proceedings of Fourteenth International Conference on Machine Learning (D. Fisher, Jr., ed.) 107-115. Morgan Kaufmann, San Francisco.
Duffy, N. and Helmbold, D. (1999). A geometric approach to leveraging weak learners. In Computational Learning Theory. Proceedings of 4th European Conference EuroCOLT99 (P. Fischer and H. U. Simon, eds.) 18-33. Springer, New York.
Mathematical Reviews (MathSciNet): MR1724977
Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference 148-156. Morgan Kaufmann, San Francisco.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141.
Friedman, J. H., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion). Ann. Statist. 28 337-407.
Griffin, W. L., Fisher, N. I., Friedman, J. H., Ryan, C. G. and O'Reilly, S. (1999). Cr-Pyrope garnets in lithospheric mantle. J. Petrology. 40 679-704.
Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
Mathematical Reviews (MathSciNet): MR92e:62117
Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73-101.
Mathematical Reviews (MathSciNet): MR28:4622
Mallat, S. and Zhang, Z. (1993). Matching pursuits with time frequency dictionaries. IEEE Trans. Signal Processing 41 3397-3415.
Zentralblatt MATH: 0842.94004
Powell, M. J. D. (1987). Radial basis functions for multivariate interpolation: a review. In Algorithms for Approximation (J. C. Mason and M. G. Cox, eds.) 143-167. Clarendon Press, Oxford.
Ratsch, G., Onoda, T. and Muller, K. R. (1998). Soft margins for AdaBoost. NeuroCOLT Technical Report NC-TR-98-021.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press.
Mathematical Reviews (MathSciNet): MR98a:68161
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning representations by backpropagating errors. Nature 323 533-536.
Schapire, R. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, New York.
Mathematical Reviews (MathSciNet): MR1811573
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York.
Mathematical Reviews (MathSciNet): MR98a:68159
Warner, J. R., Toronto, A. E., Veasey, L. R. and Stephenson, R. (1961). A mathematical model for medical diagnosis-application to congenital heart disease. J. Amer. Med. Assoc. 177 177-184.

2010 © Institute of Mathematical Statistics