Source: Ann. Statist. Volume 35, Number 6 (2007), 2723-2768.
We introduce a useful tool for analyzing boosting algorithms, the “smooth margin function,” which is a differentiable approximation of the usual margin. We present two boosting algorithms based on this smooth margin, “coordinate ascent boosting” and “approximate coordinate ascent boosting.” They are similar to Freund and Schapire’s AdaBoost algorithm and Breiman’s arc-gv algorithm. We give rates of convergence to the maximum margin solution for both of our algorithms and for arc-gv. We then use the smooth margin function to study AdaBoost’s convergence properties. When the edges of the weak classifiers fall within a specified range, we give a precise bound on the margin that AdaBoost attains. This shows that a previous bound proved by Rätsch and Warmuth is exactly tight. Finally, we use the smooth margin to capture explicit properties of AdaBoost in cases where it exhibits cyclic behavior.
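The abstract does not define the smooth margin, so the following is a minimal sketch under stated assumptions. It uses the matrix formulation common in this line of work: M[i, j] = y_i h_j(x_i) in {-1, +1}, and a nonnegative coefficient vector λ over weak classifiers. Under that formulation the usual margin is μ(λ) = min_i (Mλ)_i / ||λ||_1. The smooth margin is taken here as G(λ) = -ln F(λ) / ||λ||_1, where F(λ) = Σ_i exp(-(Mλ)_i) is AdaBoost's exponential loss. The sum F(λ) lies between exp(-min_i (Mλ)_i) and m·exp(-min_i (Mλ)_i). It follows that μ(λ) - ln(m)/||λ||_1 ≤ G(λ) ≤ μ(λ). So G is a differentiable lower bound on the margin, and it becomes tight as ||λ||_1 grows.

The code covers only standard AdaBoost, written as coordinate descent on F. It does not implement the paper's coordinate ascent algorithms, whose update rules are not given here. The helper names (margin, smooth_margin, adaboost) and the 3x3 example matrix are hypothetical illustrations.

```python
import numpy as np


def margin(M, lam):
    """Usual margin mu(lam) = min_i (M @ lam)_i / ||lam||_1 (assumes lam >= 0, lam != 0)."""
    return np.min(M @ lam) / np.sum(lam)


def smooth_margin(M, lam):
    """Smooth margin G(lam) = -ln F(lam) / ||lam||_1, F(lam) = sum_i exp(-(M @ lam)_i)."""
    return -np.log(np.sum(np.exp(-(M @ lam)))) / np.sum(lam)


def adaboost(M, T):
    """Standard AdaBoost in matrix form (coordinate descent on F).

    M[i, j] = y_i * h_j(x_i) in {-1, +1}. Returns the nonnegative coefficient
    vector lam after T rounds. Assumes every selected edge r lies in (0, 1).
    """
    n = M.shape[1]
    lam = np.zeros(n)
    for _ in range(T):
        d = np.exp(-(M @ lam))
        d /= d.sum()                  # distribution over training examples
        edges = d @ M                 # edge (d^T M)_j of each weak classifier
        j = int(np.argmax(edges))     # weak learner returns the largest edge
        r = edges[j]
        lam[j] += 0.5 * np.log((1 + r) / (1 - r))  # exact line search on F
    return lam


# Hypothetical 3x3 example: each weak classifier misclassifies exactly one example.
M = np.array([[-1, 1, 1],
              [1, -1, 1],
              [1, 1, -1]], dtype=float)
lam = adaboost(M, T=50)
print("margin:", margin(M, lam), "smooth margin:", smooth_margin(M, lam))
```

In this example every row of M sums to 1. For any λ ≥ 0, the entries of Mλ therefore sum to ||λ||_1, so no λ can achieve a margin above 1/3. Every distribution d also gives some weak classifier an edge of at least 1/3. Each column contains a -1, so every edge stays strictly below 1. As a result, each AdaBoost step size is positive and finite.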
References
Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801--849.
Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493--1517.
Caruana, R. and Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning algorithms. In Proc. Twenty-Third International Conference on Machine Learning 161--168. ACM Press, New York.
Collins, M., Schapire, R. E. and Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning 48 253--285.
Drucker, H. and Cortes, C. (1996). Boosting decision trees. In Advances in Neural Information Processing Systems 8 479--485. MIT Press, Cambridge, MA.
Duffy, N. and Helmbold, D. (1999). A geometric approach to leveraging weak learners. Computational Learning Theory (Nordkirchen, 1999). Lecture Notes in Comput. Sci. 1572 18--33. Springer, Berlin.
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119--139.
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337--407.
Grove, A. J. and Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proc. Fifteenth National Conference on Artificial Intelligence 692--699.
Koltchinskii, V. and Panchenko, D. (2005). Complexities of convex combinations and bounding the generalization error in classification. Ann. Statist. 33 1455--1496.
Kutin, S. (2002). Algorithmic stability and ensemble-based learning. Ph.D. dissertation, Univ. Chicago.
Mason, L., Baxter, J., Bartlett, P. and Frean, M. (2000). Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems 12 512--518. MIT Press, Cambridge, MA.
Meir, R. and Rätsch, G. (2003). An introduction to boosting and leveraging. Advanced Lectures on Machine Learning. Lecture Notes in Comput. Sci. 2600 119--183. Springer, Berlin.
Quinlan, J. R. (1996). Bagging, boosting, and C4.5. In Proc. Thirteenth National Conference on Artificial Intelligence 725--730. AAAI Press, Menlo Park, CA.
Rätsch, G. (2001). Robust boosting via convex optimization: Theory and applications. Ph.D. dissertation, Dept. Computer Science, Univ. Potsdam, Potsdam, Germany.
Rätsch, G., Onoda, T. and Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning 42 287--320.
Rätsch, G. and Warmuth, M. (2005). Efficient margin maximizing with boosting. J. Mach. Learn. Res. 6 2131--2152.
Reyzin, L. and Schapire, R. E. (2006). How boosting the margin can also boost classifier complexity. In Proc. Twenty-Third International Conference on Machine Learning 753--760. ACM Press, New York.
Rosset, S., Zhu, J. and Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. J. Mach. Learn. Res. 5 941--973.
Rudin, C. (2004). Boosting, margins and dynamics. Ph.D. dissertation, Princeton Univ.
Rudin, C., Cortes, C., Mohri, M. and Schapire, R. E. (2005). Margin-based ranking meets boosting in the middle. Learning Theory. Lecture Notes in Comput. Sci. 3559 63--78. Springer, Berlin.
Rudin, C., Daubechies, I. and Schapire, R. E. (2004). The dynamics of AdaBoost: Cyclic behavior and convergence of margins. J. Mach. Learn. Res. 5 1557--1595.
Rudin, C., Daubechies, I. and Schapire, R. E. (2004). On the dynamics of boosting. In Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.
Rudin, C. and Schapire, R. E. (2007). Margin-based ranking and why Adaboost is actually a ranking algorithm. To appear.
Rudin, C., Schapire, R. E. and Daubechies, I. (2004). Boosting based on a smooth margin. Learning Theory. Lecture Notes in Comput. Sci. 3120 502--517. Springer, Berlin.
Rudin, C., Schapire, R. E. and Daubechies, I. (2007). Precise statements of convergence for AdaBoost and arc-gv. In Proc. AMS-IMS-SIAM Joint Summer Research Conference: Machine Learning, Statistics, and Discovery 131--145.
Schapire, R. E. (2003). The boosting approach to machine learning: An overview. Nonlinear Estimation and Classification. Lecture Notes in Statist. 171 149--171. Springer, New York.
Schapire, R. E., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651--1686.
Zhang, T. and Yu, B. (2005). Boosting with early stopping: Convergence and consistency. Ann. Statist. 33 1538--1579.