The Annals of Statistics

Analysis of boosting algorithms using the smooth margin function

Cynthia Rudin, Robert E. Schapire, and Ingrid Daubechies
Source: Ann. Statist. Volume 35, Number 6 (2007), 2723-2768.

Abstract

We introduce a useful tool for analyzing boosting algorithms called the “smooth margin function,” a differentiable approximation of the usual margin for boosting algorithms. We present two boosting algorithms based on this smooth margin, “coordinate ascent boosting” and “approximate coordinate ascent boosting,” which are similar to Freund and Schapire’s AdaBoost algorithm and Breiman’s arc-gv algorithm. We give convergence rates to the maximum margin solution for both of our algorithms and for arc-gv. We then study AdaBoost’s convergence properties using the smooth margin function. We precisely bound the margin attained by AdaBoost when the edges of the weak classifiers fall within a specified range. This shows that a previous bound proved by Rätsch and Warmuth is exactly tight. Furthermore, we use the smooth margin to capture explicit properties of AdaBoost in cases where cyclic behavior occurs.
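The abstract does not define the smooth margin, so the following is only a minimal sketch under stated assumptions. It assumes a matrix M with entries M[i][j] = y_i h_j(x_i) in {-1, +1}, a nonnegative coefficient vector lam, the usual margin mu(lam) = min_i (M lam)_i / ||lam||_1, and a smooth margin of the form G(lam) = -ln(sum_i exp(-(M lam)_i)) / ||lam||_1. The function names and the toy matrix are illustrative and are not taken from the paper. Under these assumptions, G is differentiable in lam and satisfies G(lam) <= mu(lam) <= G(lam) + ln(m) / ||lam||_1 for m training examples, so the gap shrinks as ||lam||_1 grows.

```python
import math


def scores(M, lam):
    """Unnormalized margins (M lam)_i, one per training example."""
    return [sum(m_ij * l_j for m_ij, l_j in zip(row, lam)) for row in M]


def margin(M, lam):
    """Usual (non-smooth) margin: min_i (M lam)_i / ||lam||_1."""
    return min(scores(M, lam)) / sum(lam)


def smooth_margin(M, lam):
    """Assumed smooth margin: -ln(sum_i exp(-(M lam)_i)) / ||lam||_1.

    The log-sum-exp is shifted by its largest term for numerical stability.
    """
    neg = [-s for s in scores(M, lam)]
    top = max(neg)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in neg))
    return -log_sum_exp / sum(lam)


if __name__ == "__main__":
    # Hypothetical toy problem: 3 examples, 3 weak classifiers.
    M = [[1, -1, 1],
         [-1, 1, 1],
         [1, 1, -1]]
    for scale in (1.0, 10.0, 100.0):
        lam = [scale, scale, scale]
        g, mu = smooth_margin(M, lam), margin(M, lam)
        # The gap mu - g is at most ln(m) / ||lam||_1.
        print(f"||lam||_1={sum(lam):7.1f}  smooth={g:.4f}  margin={mu:.4f}")
```

Scaling lam leaves the usual margin unchanged but pushes the smooth margin up toward it, which is the sense in which the smooth function approximates the margin.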

Primary Subjects: 68W40, 68Q25
Secondary Subjects: 68Q32
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1201012978
Digital Object Identifier: doi:10.1214/009053607000000785
Mathematical Reviews number (MathSciNet): MR2382664
Zentralblatt MATH identifier: 1132.68827

References

Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801--849.
Mathematical Reviews (MathSciNet): MR1635406
Digital Object Identifier: doi:10.1214/aos/1024691079
Project Euclid: euclid.aos/1024691079
Zentralblatt MATH: 0934.62064
Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493--1517.
Caruana, R. and Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning algorithms. In Proc. Twenty-Third International Conference on Machine Learning 161--168. ACM Press, New York.
Collins, M., Schapire, R. E. and Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning 48 253--285.
Zentralblatt MATH: 05686328
Drucker, H. and Cortes, C. (1996). Boosting decision trees. In Advances in Neural Information Processing Systems 8 479--485. MIT Press, Cambridge, MA.
Duffy, N. and Helmbold, D. (1999). A geometric approach to leveraging weak learners. Computational Learning Theory (Nordkirchen, 1999). Lecture Notes in Comput. Sci. 1572 18--33. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR1724977
Digital Object Identifier: doi:10.1007/3-540-49097-3_3
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119--139.
Mathematical Reviews (MathSciNet): MR1473055
Digital Object Identifier: doi:10.1006/jcss.1997.1504
Zentralblatt MATH: 0880.68103
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337--407.
Mathematical Reviews (MathSciNet): MR1790002
Digital Object Identifier: doi:10.1214/aos/1016218223
Project Euclid: euclid.aos/1016218223
Zentralblatt MATH: 1106.62323
Grove, A. J. and Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proc. Fifteenth National Conference on Artificial Intelligence 692--699.
Koltchinskii, V. and Panchenko, D. (2005). Complexities of convex combinations and bounding the generalization error in classification. Ann. Statist. 33 1455--1496.
Mathematical Reviews (MathSciNet): MR2166553
Digital Object Identifier: doi:10.1214/009053605000000228
Project Euclid: euclid.aos/1123250220
Zentralblatt MATH: 1080.62045
Kutin, S. (2002). Algorithmic stability and ensemble-based learning. Ph.D. dissertation, Univ. Chicago.
Mason, L., Baxter, J., Bartlett, P. and Frean, M. (2000). Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems 12 512--518. MIT Press, Cambridge, MA.
Mathematical Reviews (MathSciNet): MR1820960
Meir, R. and Rätsch, G. (2003). An introduction to boosting and leveraging. Advanced Lectures on Machine Learning. Lecture Notes in Comput. Sci. 2600 119--183. Springer, Berlin.
Quinlan, J. R. (1996). Bagging, boosting, and C4.5. In Proc. Thirteenth National Conference on Artificial Intelligence 725--730. AAAI Press, Menlo Park, CA.
Rätsch, G. (2001). Robust boosting via convex optimization: Theory and applications. Ph.D. dissertation, Dept. Computer Science, Univ. Potsdam, Potsdam, Germany.
Rätsch, G., Onoda, T. and Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning 42 287--320.
Rätsch, G. and Warmuth, M. (2005). Efficient margin maximizing with boosting. J. Mach. Learn. Res. 6 2131--2152.
Mathematical Reviews (MathSciNet): MR2249883
Reyzin, L. and Schapire, R. E. (2006). How boosting the margin can also boost classifier complexity. In Proc. Twenty-Third International Conference on Machine Learning 753--760. ACM Press, New York.
Rosset, S., Zhu, J. and Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. J. Mach. Learn. Res. 5 941--973.
Mathematical Reviews (MathSciNet): MR2248005
Rudin, C. (2004). Boosting, margins and dynamics. Ph.D. dissertation, Princeton Univ.
Rudin, C., Cortes, C., Mohri, M. and Schapire, R. E. (2005). Margin-based ranking meets boosting in the middle. Learning Theory. Lecture Notes in Comput. Sci. 3559 63--78. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR2203254
Zentralblatt MATH: 05034626
Rudin, C., Daubechies, I. and Schapire, R. E. (2004). The dynamics of AdaBoost: Cyclic behavior and convergence of margins. J. Mach. Learn. Res. 5 1557--1595.
Mathematical Reviews (MathSciNet): MR2248027
Rudin, C., Daubechies, I. and Schapire, R. E. (2004). On the dynamics of boosting. In Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.
Rudin, C. and Schapire, R. E. (2007). Margin-based ranking and why Adaboost is actually a ranking algorithm. To appear.
Rudin, C., Schapire, R. E. and Daubechies, I. (2004). Boosting based on a smooth margin. Learning Theory. Lecture Notes in Comput. Sci. 3120 502--517. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR2177931
Zentralblatt MATH: 1078.68724
Rudin, C., Schapire, R. E. and Daubechies, I. (2007). Precise statements of convergence for AdaBoost and arc-gv. In Proc. AMS-IMS-SIAM Joint Summer Research Conference: Machine Learning, Statistics, and Discovery 131--145.
Mathematical Reviews (MathSciNet): MR2433289
Zentralblatt MATH: 1141.68722
Schapire, R. E. (2003). The boosting approach to machine learning: An overview. Nonlinear Estimation and Classification. Lecture Notes in Statist. 171 149--171. Springer, New York.
Mathematical Reviews (MathSciNet): MR2005788
Schapire, R. E., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651--1686.
Mathematical Reviews (MathSciNet): MR1673273
Digital Object Identifier: doi:10.1214/aos/1024691352
Project Euclid: euclid.aos/1024691352
Zentralblatt MATH: 0929.62069
Zhang, T. and Yu, B. (2005). Boosting with early stopping: Convergence and consistency. Ann. Statist. 33 1538--1579.
Mathematical Reviews (MathSciNet): MR2166555
Digital Object Identifier: doi:10.1214/009053605000000255
Project Euclid: euclid.aos/1123250222
Zentralblatt MATH: 1078.62038

2012 © Institute of Mathematical Statistics
