In this paper, we investigate the theoretical and empirical properties of L2 boosting with kernel regression estimates as weak learners. We show that each step of L2 boosting reduces the bias of the estimate by two orders of magnitude without deteriorating the order of the variance. We illustrate the theoretical findings with simulated examples. We also demonstrate that L2 boosting outperforms higher-order kernels, a well-known method for reducing the bias of the kernel estimate.
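The boosting scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a Nadaraya-Watson smoother with a Gaussian kernel and a fixed bandwidth `h`, and the names `nw_smooth` and `l2_boost` are hypothetical. Each boosting step smooths the current residuals and adds the result to the running fit.

```python
import math


def nw_smooth(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimate (Gaussian kernel, bandwidth h) at x_eval.

    Assumption: the kernel and the fixed bandwidth are illustrative choices.
    """
    fitted = []
    for x0 in x_eval:
        # Kernel weight of every training point relative to x0.
        weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x_train]
        total = sum(weights)
        # Locally weighted average of the responses.
        fitted.append(sum(w * yi for w, yi in zip(weights, y_train)) / total)
    return fitted


def l2_boost(x_train, y_train, x_eval, h, n_steps):
    """L2 boosting with the kernel smoother as weak learner.

    n_steps = 0 returns the plain kernel estimate; every further step
    fits the smoother to the current residuals and adds it to the fit.
    """
    fit_train = nw_smooth(x_train, y_train, x_train, h)
    fit_eval = nw_smooth(x_train, y_train, x_eval, h)
    for _ in range(n_steps):
        # Residuals of the current fit on the training sample.
        residuals = [yi - fi for yi, fi in zip(y_train, fit_train)]
        step_train = nw_smooth(x_train, residuals, x_train, h)
        step_eval = nw_smooth(x_train, residuals, x_eval, h)
        # Update the fit on both the training and the evaluation points.
        fit_train = [f + s for f, s in zip(fit_train, step_train)]
        fit_eval = [f + s for f, s in zip(fit_eval, step_eval)]
    return fit_eval
```

Calling `l2_boost(x, y, x, h, 0)` gives the unboosted kernel estimate, so comparing it with `l2_boost(x, y, x, h, 1)` on noise-free data is a simple way to see the bias reduction from a single boosting step.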