Function estimation/approximation is viewed from the perspective
of numerical optimization in function space, rather than parameter space. A
connection is made between stagewise additive expansions and steepest-descent
minimization. A general gradient descent “boosting” paradigm is
developed for additive expansions based on any fitting criterion.Specific
algorithms are presented for least-squares, least absolute deviation, and
Huber-M loss functions for regression, and multiclass logistic likelihood for
classification. Special enhancements are derived for the particular case where
the individual additive components are regression trees, and tools for
interpreting such “TreeBoost” models are presented. Gradient
boosting of regression trees produces competitive, highly robust, interpretable
procedures for both regression and classification, especially appropriate for
mining less than clean data. Connections between this approach and the boosting
methods of Freund and Shapire and Friedman, Hastie and Tibshirani are
discussed.
References
Becker, R. A. and Cleveland, W. S (1996). The design and control ofTrellis display. J. Comput. Statist. Graphics 5 123-155.
Breiman, L. (1997). Pasting bites together for prediction in large data sets and on-line. Technical report, Dept. Statistics, Univ. California, Berkeley.
Breiman, L. (1999). Prediction games and arcing algorithms. Neural Comp. 11 1493-1517.
Breiman, L., Friedman, J. H., Olshen, R. and Stone, C. (1983). Classification and Regression Trees. Wadsworth, Belmont, CA.
Copas, J. B. (1983). Regression, prediction, and shrinkage (with discussion). J. Roy. Statist. Soc. Ser. B 45 311-354.
Donoho, D. L. (1993). Nonlinear wavelete methods for recovery of signals, densities, and spectra from indirect and noisy data. In Different Perspectives on Wavelets. Proceedings of Symposium in Applied Mathematics (I. Daubechies, ed.) 47 173-205. Amer. Math. Soc., Providence RI.
Drucker, H. (1997). Improving regressors using boosting techniques. Proceedings of Fourteenth International Conference on Machine Learning (D. Fisher, Jr., ed.) 107-115. MorganKaufmann, San Francisco.
Duffy, N. and Helmbold, D. (1999). A geometric approach to leveraging weak learners. In Computational Learning Theory. Proceedings of 4th European Conference EuroCOLT99 (P. Fischer and H. U. Simon, eds.) 18-33. Springer, New York.
Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference 148-156. Morgan Kaufman, San Francisco.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141.
Friedman J. H., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: a statistical view ofboosting (with discussion). Ann. Statist. 28 337-407.
Griffin, W. L., Fisher, N. I., Friedman J. H., Ryan, C. G. and O'Reilly, S. (1999). Cr-Pyrope garnets in lithospheric mantle. J. Petrology. 40 679-704.
Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
Huber, P. (1964). Robust estimation ofa location parameter. Ann. Math. Statist. 35 73-101.
Mallat, S. and Zhang, Z. (1993). Matching pursuits with time frequency dictionaries. IEEE Trans. Signal Processing 41 3397-3415.
Powell, M. J. D. (1987). Radial basis functions for multivariate interpolation: a review. In Algorithms for Approximation (J. C. Mason and M. G. Cox, eds.) 143-167. Clarendon Press, Oxford.
Ratsch, G., Onoda, T. and Muller, K. R. (1998). Soft margins for AdaBoost. NeuroCOLT Technical Report NC-TR-98-021.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press.
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning representations by backpropagating errors. Nature 323 533-536.
Schapire, R. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, New York.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York.
Warner, J. R., Toronto, A. E., Veasey, L. R. and Stephenson, R. (1961). A mathematical model for medical diagnosis-application to congenital heart disease. J. Amer. Med. Assoc. 177 177-184.