The Annals of Statistics
previous :: next

Boosting the margin: a new explanation for the effectiveness of voting methods

Peter Bartlett, Yoav Freund, Wee Sun Lee, and Robert E. Schapire
Source: Ann. Statist. Volume 26, Number 5 (1998), 1651-1686.

Abstract

One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik’s support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.

First Page: Show Hide
Primary Subjects: 62H30
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1024691352
Mathematical Reviews number (MathSciNet): MR1673273
Digital Object Identifier: doi:10.1214/aos/1024691352
Zentralblatt MATH identifier: 0929.62069

References

1 BARRON, A. R. 1993. Universal approximation bounds for superposition of a sigmoidal function. IEEE Trans. Inform. Theory 39 930 945.
Mathematical Reviews (MathSciNet): MR94h:92001
Digital Object Identifier: doi:10.1109/18.256500
2 BARTLETT, P. L. 1998. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans. Inform. Theory 44 525 536.
Mathematical Reviews (MathSciNet): MR98i:68235
Digital Object Identifier: doi:10.1109/18.661502
3 BAUER, E. and KOHAVI, R. 1997. An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Unpublished manuscript.
4 BAUM, E. B. and HAUSSLER, D. 1989. What size net gives valid generalization? Neural Computation 1 151 160.
Mathematical Reviews (MathSciNet): MR94a:11191
Zentralblatt MATH: 0743.68083
Digital Object Identifier: doi:10.1007/BF01810851
5 BOSER, B. E., GUy ON, I. M. and VAPNIK, V. N. 1992. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory 144 152. ACM, New York.
6 BREIMAN, L. 1996. Bagging predictors. Machine Learning 24 123 140.
Zentralblatt MATH: 0858.68080
Mathematical Reviews (MathSciNet): MR1425957
Digital Object Identifier: doi:10.1214/aos/1032181158
Project Euclid: euclid.aos/1032181158
7 BREIMAN, L. 1997. Prediction games and arcing classifiers. Technical Report 504, Dept. Statistics, Univ. California, Berkeley.
Mathematical Reviews (MathSciNet): MR1635406
Zentralblatt MATH: 0934.62064
Digital Object Identifier: doi:10.1214/aos/1024691079
Project Euclid: euclid.aos/1024691079
8 BREIMAN, L. 1998. Arcing classifiers with discussion. Ann. Statist. 26 801 849.
Zentralblatt MATH: 0934.62064
Mathematical Reviews (MathSciNet): MR1635406
Digital Object Identifier: doi:10.1214/aos/1024691079
Project Euclid: euclid.aos/1024691079
9 BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.
Mathematical Reviews (MathSciNet): MR86b:62101
Zentralblatt MATH: 0541.62042
10 CORTES, C. and VAPNIK, V. 1995. Support-vector networks. Machine Learning 20 273 297.
Zentralblatt MATH: 0831.68098
11 DEVROy E, L. 1982. Bounds for the uniform deviation of empirical measures. J. Multivariate Anal. 12 72 79.
Mathematical Reviews (MathSciNet): MR83f:62091
Zentralblatt MATH: 0492.60006
Digital Object Identifier: doi:10.1016/0047-259X(82)90083-5
12 DIETTERICH, T. G. 1998. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Unpublished manuscript.
13 DIETTERICH, T. G. and BAKIRI, G. 1995. Solving multiclass learning problems via error-correcting output codes. J. Artificial Intelligence Res. 2 263 286.
Zentralblatt MATH: 0900.68358
14 DONAHUE, M. J., GURVITS, L., DARKEN, C. and SONTAG, E. 1997. Rates of convex approximation in non-Hilbert spaces. Constr. Approx. 13 187 220.
Mathematical Reviews (MathSciNet): MR98c:41051
Zentralblatt MATH: 0876.41016
15 DRUCKER, H. 1997. Improving regressors using boosting techniques. In Machine Learning: Proceedings of the Fourteenth International Conference 107 115. Morgan Kauffman, San Francisco.
16 DRUCKER, H. and CORTES, C. 1996. Boosting decision trees. Advances in Neural Information Processing Sy stems 8 479 485.
17 FREUND, Y. 1995. Boosting a weak learning algorithm by majority. Inform. Comput. 121 256 285.
Mathematical Reviews (MathSciNet): MR96h:68166
Zentralblatt MATH: 0833.68109
Digital Object Identifier: doi:10.1006/inco.1995.1136
18 FREUND, Y. and SCHAPIRE, R. E. 1996. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference 148 156. Morgan Kauffman, San Francisco.
19 FREUND, Y. and SCHAPIRE, R. E. 1996. Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory 325 332.
20 FREUND, Y. and SCHAPIRE, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Sy stem Sci. 55 119 139.
Mathematical Reviews (MathSciNet): MR99g:68172
Zentralblatt MATH: 0880.68103
Digital Object Identifier: doi:10.1006/jcss.1997.1504
21 FREUND, Y. and SCHAPIRE, R. E. 1998. Adaptive game playing using multiplicative weights. Games and Economic Behavior. To appear.
Mathematical Reviews (MathSciNet): MR1729311
Zentralblatt MATH: 0964.91007
Digital Object Identifier: doi:10.1006/game.1999.0738
22 FRIEDMAN, J. H. 1998. On bias, variance, 0 1-loss, and the curse-of-dimensionality. Available at http: stat.stanford.edu jhf.
23 GROVE, A. J. and SCHUURMANS, D. 1998. Boosting in the limit: maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence. AAAI Press, Meno Park, NJ.
24 JONES, L. K. 1992. A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. Ann. Statist. 20 608 613.
Mathematical Reviews (MathSciNet): MR93b:62089
Zentralblatt MATH: 0746.62060
Digital Object Identifier: doi:10.1214/aos/1176348546
Project Euclid: euclid.aos/1176348546
25 KOHAVI, R. and WOLPERT, D. H. 1996. Bias plus variance decomposition for zero-one loss functions. In Machine Learning: Proceedings of the Thirteenth International Conference 275 283. Morgan Kauffman, San Francisco.
26 KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and variance. In Proceedings of the Twelfth International Conference on Machine Learning 313 321. Morgan Kauffman, San Francisco.
SCHAPIRE, FREUND, BARTLETT AND LEE 1686
Mathematical Reviews (MathSciNet): MR1884092
Zentralblatt MATH: 1013.68175
Digital Object Identifier: doi:10.1162/15324430152733133
27 LEE, W. S., BARTLETT, P. L. and WILLIAMSON, R. C. 1996. Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans. Inform. Theory 42 2118 2132.
Mathematical Reviews (MathSciNet): MR98c:68184
Zentralblatt MATH: 0874.68253
Digital Object Identifier: doi:10.1109/18.485715
28 LEE, W. S., BARTLETT, P. L. and WILLIAMSON, R. C. 1998. The importance of convexity in learning with squared loss. IEEE Trans. Inform. Theory. To appear.
Mathematical Reviews (MathSciNet): MR99k:68160
Zentralblatt MATH: 0935.68091
Digital Object Identifier: doi:10.1109/18.705577
29 MACLIN, R. and OPITZ, D. 1997. An empirical evaluation of bagging and boosting. In Proceedings of the Fourteenth National Conference on Artificial Intelligence 546 551.
30 MERZ, C. J. and MURPHY, P. M. 1998. UCI repository of machine learning databases. Available at http: www.ics.uci.edu mlearn MLRepository.html.
31 QUINLAN, J. R. 1996. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence 725 730.
32 QUINLAN, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo.
33 SAUER, N. 1972. On the density of families of sets. J. Combin. Theory Ser. A 13 145 147.
Mathematical Reviews (MathSciNet): MR46:7017
Zentralblatt MATH: 0248.05005
Digital Object Identifier: doi:10.1016/0097-3165(72)90019-2
34 SCHAPIRE, R. E. 1990. The strength of weak learnability. Machine Learning 5 197 227.
35 SCHAPIRE, R. E. 1997. Using output codes to boost multiclass learning problems. In Machine Learning: Proceedings of the Fourteenth International Conference 313 321. Morgan Kauffman, San Francisco.
36 SCHAPIRE, R. E. and SINGER, Y. 1998. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory.
Mathematical Reviews (MathSciNet): MR1811573
Zentralblatt MATH: 0945.68194
37 SCHWENK, H. and BENGIO, Y. 1998. Training methods for adaptive boosting of neural networks for character recognition. Advances in Neural Information Processing Sy stems 10 647 653. MIT Press.
40 TIBSHIRANI, R. 1996. Bias, variance and prediction error for classification rules. Technical Report, Univ. Toronto.
41 VAPNIK, V. N. 1995. The Nature of Statistical Learning Theory. Springer, New York.
Mathematical Reviews (MathSciNet): MR98a:68159
42 VAPNIK, V. N. and CHERVONENKIS, A. YA. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 264 280.
Zentralblatt MATH: 0247.60005
Mathematical Reviews (MathSciNet): MR301855
FLORHAM PARK, NEW JERSEY 07932-0971 FLORHAM PARK, NEW JERSEY 07932-0971 E-MAIL: schapire@research.att.com E-MAIL: yoav@research.att.com
RSISE, AUSTRALIAN NATIONAL UNIVERSITY UNIVERSITY COLLEGE UNSW
CANBERRA, ACT 0200 AUSTRALIAN DEFENCE FORCE ACADEMY AUSTRALIA CANBERRA ACT 2600 E-MAIL: Peter.Bartlett@anu.edu.au AUSTRALIA E-MAIL: w-lee@ee.adfa.oz.au
previous :: next

2013 © Institute of Mathematical Statistics

The Annals of Statistics

The Annals of Statistics

Turn MathJax Off
What is MathJax?