## The Annals of Statistics

- Ann. Statist.
- Volume 26, Number 3 (1998), 801-1214

### Arcing classifier (with discussion and a rejoinder by the author)

**Full-text: Open access**

#### Abstract

Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. One of the more effective methods is bagging. Here, modified training sets are formed by resampling from the original training set, classifiers are constructed using these training sets, and the classifiers are then combined by voting. Freund and Schapire propose an algorithm the basis of which is to adaptively resample and combine (hence the acronym “arcing”) so that the weights in the resampling are increased for those cases most often misclassified and the combining is done by weighted voting. Arcing is more successful than bagging at reducing test set error. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works. We introduce definitions of bias and variance for a classifier as components of the test set error. Unstable classifiers can have low bias on a large range of data sets; their problem is high variance. Combining multiple versions, either through bagging or arcing, reduces variance significantly.
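The bagging-versus-arcing contrast in the abstract can be sketched in a few lines of code. This is a minimal illustration, not Breiman's experimental setup: the synthetic data, the decision-stump base classifier, and the AdaBoost-style reweighting (one form of arcing, which Breiman credits to Freund and Schapire) are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y, w):
    """Weighted decision stump: best (feature, threshold, sign) split."""
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, sign, weighted error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def predict_stump(stump, X):
    j, t, s, _ = stump
    return np.where(X[:, j] <= t, s, -s)

def vote(stumps, X, coefs=None):
    """Combine stumps by (optionally weighted) voting."""
    P = np.array([predict_stump(st, X) for st in stumps])
    if coefs is not None:
        P = np.asarray(coefs)[:, None] * P
    return np.sign(P.sum(axis=0))

# Synthetic two-class data whose boundary no single stump can represent:
# a lone stump is a weak, unstable classifier here.
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
Xt = rng.normal(size=(1000, 2))
yt = np.where(Xt[:, 0] + Xt[:, 1] > 0, 1, -1)
n = len(y)

# Bagging: uniform bootstrap resamples, unweighted vote.
bag = []
for _ in range(25):
    idx = rng.integers(0, n, n)
    bag.append(fit_stump(X[idx], y[idx], np.full(n, 1.0 / n)))

# Arcing (AdaBoost-style reweighting): increase the relative weight of
# misclassified cases each round, then combine by weighted vote.
w = np.full(n, 1.0 / n)
arc, coefs = [], []
for _ in range(25):
    st = fit_stump(X, y, w)
    err = min(max(st[3], 1e-10), 0.4999)  # clip away degenerate cases
    beta = err / (1.0 - err)
    miss = predict_stump(st, X) != y
    w = np.where(miss, w, w * beta)       # down-weight correct cases
    w = w / w.sum()
    arc.append(st)
    coefs.append(np.log(1.0 / beta))

bag_err = (vote(bag, Xt) != yt).mean()
arc_err = (vote(arc, Xt, coefs) != yt).mean()
print(f"bagging test error: {bag_err:.3f}, arcing test error: {arc_err:.3f}")
```

On data like this, where the base classifier is biased as well as unstable, the adaptive reweighting lets successive stumps concentrate on the cases the current ensemble gets wrong, which is the mechanism the paper investigates.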

#### Article information

**Source**

Ann. Statist. Volume 26, Number 3 (1998), 801-849.

**Dates**

First available: 21 June 2002

**Permanent link to this document**

http://projecteuclid.org/euclid.aos/1024691079

**Mathematical Reviews number (MathSciNet)**

MR1635406

**Digital Object Identifier**

doi:10.1214/aos/1024691079

**Subjects**

Primary: 62H30

**Keywords**

Ensemble methods, decision trees, neural networks, bagging, boosting, error-correcting output coding, Markov chain Monte Carlo

#### Citation

Breiman, Leo. Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics 26 (1998), no. 3, 801--849. doi:10.1214/aos/1024691079. http://projecteuclid.org/euclid.aos/1024691079.

#### References

- ALI, K. 1995. Learning probabilistic relational concept descriptions. Ph.D. dissertation, Dept. Computer Science, Univ. California, Irvine.
- BREIMAN, L. 1996a. Bagging predictors. Machine Learning 26 123-140.
- BREIMAN, L. 1996b. The heuristics of instability in model selection. Ann. Statist. 24 2350-2383.
- BREIMAN, L., FRIEDMAN, J., OLSHEN, R. and STONE, C. 1984. Classification and Regression Trees. Chapman and Hall, London.
- DRUCKER, H. and CORTES, C. 1996. Boosting decision trees. Advances in Neural Information Processing Systems 8 479-485.
- FREUND, Y. and SCHAPIRE, R. 1996. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference (L. Saitta, ed.) 148-156. Morgan Kaufmann, San Francisco.
- FREUND, Y. and SCHAPIRE, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119-139.
- FRIEDMAN, J. H. 1996. On bias, variance, 0/1-loss, and the curse of dimensionality. Journal of Knowledge Discovery and Data Mining. To appear.
- GEMAN, S., BIENENSTOCK, E. and DOURSAT, R. 1992. Neural networks and the bias/variance dilemma. Neural Computation 4 1-58.
- HASTIE, T. and TIBSHIRANI, R. 1994. Handwritten digit recognition via deformable prototypes. Unpublished manuscript. Available at ftp://stat.stanford.edu/pub/hastie/zip.ps.Z.
- KEARNS, M. and VALIANT, L. G. 1988. Learning Boolean formulae or finite automata is as hard as factoring. Technical Report TR-14-88, Aiken Computation Laboratory, Harvard Univ.
- KEARNS, M. and VALIANT, L. G. 1989. Cryptographic limitations on learning Boolean formulae and finite automata. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing 433-444. ACM Press, New York.
- KOHAVI, R. and WOLPERT, D. H. 1996. Bias plus variance decomposition for zero-one loss functions. In Machine Learning: Proceedings of the Thirteenth International Conference (L. Saitta, ed.) 275-283. Morgan Kaufmann, San Francisco.
- KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and variance. In Proceedings of the Twelfth International Conference on Machine Learning (A. Prieditis and S. Russell, eds.) 313-321. Morgan Kaufmann, San Francisco.
- LE CUN, Y., BOSER, B., DENKER, J., HENDERSON, D., HOWARD, R., HUBBARD, W. and JACKEL, L. 1990. Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems 2 396-404.
- MICHIE, D., SPIEGELHALTER, D. and TAYLOR, C. 1994. Machine Learning, Neural and Statistical Classification. Ellis Horwood, London.
- QUINLAN, J. R. 1996. Bagging, boosting, and C4.5. In Proceedings of AAAI '96 National Conference on Artificial Intelligence 725-730.
- SCHAPIRE, R. 1990. The strength of weak learnability. Machine Learning 5 197-227.
- SIMARD, P., LE CUN, Y. and DENKER, J. 1993. Efficient pattern recognition using a new transformation distance. Advances in Neural Information Processing Systems 5 50-58.
- TIBSHIRANI, R. 1996. Bias, variance, and prediction error for classification rules. Technical Report, Dept. Statistics, Univ. Toronto.
- VAPNIK, V. 1995. The Nature of Statistical Learning Theory. Springer, New York.
- BERKELEY, CALIFORNIA 94720-3860. E-MAIL: leo@stat.berkeley.edu
- 1 BREIMAN, L. 1996. Bagging predictors. Machine Learning 26 123-140.
- 2 BREIMAN, L. 1996. The heuristics of instability in model selection. Ann. Statist. 24 2350-2383.
- 3 DRUCKER, H. and CORTES, C. 1996. Boosting decision trees. Advances in Neural Information Processing Systems 8 479-485.
- 4 FLOYD, S. and WARMUTH, M. 1995. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning 21 269-304.
- 5 FREUND, Y. 1995. Boosting a weak learning algorithm by majority. Inform. and Comput. 121 256-285.
- 6 FREUND, Y. and SCHAPIRE, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119-139.
- 7 KEARNS, M. and VALIANT, L. G. 1994. Cryptographic limitations on learning Boolean formulae and finite automata. J. Assoc. Comput. Mach. 41 67-95.
- 8 KOHAVI, R. and WOLPERT, D. H. 1996. Bias plus variance decomposition for zero-one loss functions. In Machine Learning: Proceedings of the Thirteenth International Conference (L. Saitta, ed.) 275-283. Morgan Kaufmann, San Francisco.
- 9 KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and variance. In Proceedings of the Twelfth International Conference on Machine Learning (A. Prieditis and S. Russell, eds.) 313-321. Morgan Kaufmann, San Francisco.
- 10 QUINLAN, J. R. 1996. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence 725-730.
- 11 QUINLAN, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco.
- 12 SCHAPIRE, R. E. 1990. The strength of weak learnability. Machine Learning 5 197-227.
- 13 SCHAPIRE, R. E., FREUND, Y., BARTLETT, P. and LEE, W. S. 1998. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26. To appear.
- 14 TIBSHIRANI, R. 1996. Bias, variance and prediction error for classification rules. Technical Report, Univ. Toronto.
- 15 VALIANT, L. G. 1984. A theory of the learnable. Communications of the ACM 27 1134-1142.
- 16 VAPNIK, V. N. 1982. Estimation of Dependences Based on Empirical Data. Springer, New York.
- FLORHAM PARK, NEW JERSEY 07932-0971. E-MAIL: yoav@research.att.com, schapire@research.att.com
- ...queries of the form Q_m = 1{X_i <= c}, where X_i is one component of a fixed-length feature vector and c is a constant. One could of course use multivariate functions or "transgenerated features" [7]. Suppose we examine some of the queries by constructing a single binary tree T by the usual data-driven induction method: stepwise entropy reduction estimated from a training set L. Since we cannot entertain all possible splits at each node, we exploit a natural partial ordering on the set Q and examine only a tiny fraction of them. Basically we incrementally grow the geometric arrangements as we proceed down the tree. The classifier based on T is then C(Q, L) = arg max_j P(Y = j | T). If the depths of the leaves of T are far smaller than M, then evidently C(Q, L) is not the Bayes classifier. However, for depths on the order of hundreds or thousands we could expect that P(Y = j | T) would approximate P(Y = j | Q), the difference in some appropriate norm being a kind of "approximation error." Of course, we cannot actually create or store a tree of such depth, and the best classification rate we obtained with a single tree of average depth around ten was about 90% on test sets similar to the one discussed by Leo Breiman.
- ...the covariances among the outputs of the trees T_n and T_m, 1 <= n, m <= N. Naturally, small covariances lead to small errors. More generally, if the trees are produced with some sampling mechanism from the population of trees, involving either resampling from L or random restrictions on the queries, then the quantities above can be analyzed by taking expectations relative to the space of trees. In regard to arcing, probably the deterministic reweighting of misclassified examples produces new trees which are quite different from the previous ones. Moreover, the errors induced on data points which were correctly classified by the existing trees are sufficiently randomized to avoid any systematic deterioration.
- 1 AMIT, Y. and GEMAN, D. 1994. Randomized inquiries about shape; an application to handwritten digit recognition. Technical Report 401, Univ. Chicago.
- 2 AMIT, Y. and GEMAN, D. 1997. Shape quantization and recognition with randomized trees. Neural Computation 9 1545-1588.
- 3 AMIT, Y., GEMAN, D. and JEDYNAK, B. 1998. Efficient focusing and face detection. In Face Recognition: From Theory to Applications (H. Wechsler and J. Phillips, eds.). Springer, Berlin.
- 4 AMIT, Y., GEMAN, D. and WILDER, K. 1997. Joint induction of shape features and tree classifiers. IEEE Trans. PAMI 19 1300-1306.
- 5 BREIMAN, L., FRIEDMAN, J., OLSHEN, R. and STONE, C. 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.
- 6 FREUND, Y. and SCHAPIRE, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119-139.
- 7 FRIEDMAN, J. H. 1977. A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comput. 26 404-408.
- 8 GEMAN, S., BIENENSTOCK, E. and DOURSAT, R. 1992. Neural networks and the bias/variance dilemma. Neural Computation 4 1-58.
- 9 SHLIEN, S. 1990. Multiple binary decision tree classifiers. Pattern Recognition 23 757-763.
- 10 WILDER, K. 1998. Decision tree algorithms for handwritten digit recognition. Ph.D. dissertation, Univ. Massachusetts, Amherst.
- CHICAGO, ILLINOIS 60637. E-MAIL: amit@galton.uchicago.edu
- UNIVERSITY OF MASSACHUSETTS, AMHERST, MASSACHUSETTS 01003. E-MAIL: geman@math.umass.edu
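The covariance argument in the discussion above can be checked numerically. Under the simplifying assumption (introduced here, not a model from the discussion) of N tree outputs with common variance s2 and pairwise correlation rho, the variance of their average is s2 * (rho + (1 - rho) / N), so weakly correlated trees give an ensemble with far smaller variance than any single tree:

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials, s2, rho = 25, 20000, 1.0, 0.1

# Each trial draws N "tree outputs" sharing a common component (correlation
# rho) plus independent private noise, so each output has Var = s2 and any
# pair has Cov = rho * s2.
shared = np.sqrt(rho * s2) * rng.normal(size=(trials, 1))
private = np.sqrt((1 - rho) * s2) * rng.normal(size=(trials, N))
avg = (shared + private).mean(axis=1)

empirical = avg.var()
theoretical = s2 * (rho + (1 - rho) / N)  # variance of the averaged output
print(f"empirical {empirical:.4f} vs theoretical {theoretical:.4f}")
```

As rho shrinks toward zero the ensemble variance approaches s2 / N, which is the sense in which resampling or randomized query restrictions that decorrelate the trees pay off.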
- ALI, K. M. and PAZZANI, M. J. 1996. Error reduction through learning multiple descriptions. Machine Learning 24 173-202.
- CHERKAUER, K. J. 1996. Human expert-level performance on a scientific image analysis task by a system using combined artificial neural networks. In Working Notes of the AAAI Workshop on Integrating Multiple Learned Models (P. Chan, ed.) 15-21. AAAI Press, Menlo Park, CA.
- DIETTERICH, T. G. and BAKIRI, G. 1995. Solving multiclass learning problems via error-correcting output codes. J. Artificial Intelligence Res. 2 263-286.
- DIETTERICH, T. G. and KONG, E. B. 1995. Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Technical Report, Dept. Computer Science, Oregon State Univ., Corvallis, Oregon. Available at ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-bias.ps.gz.
- FREUND, Y. and SCHAPIRE, R. E. 1996. Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (L. Saitta, ed.) 148-156. Morgan Kaufmann, San Francisco.
- HASHEM, S. 1993. Optimal linear combinations of neural networks. Ph.D. dissertation, School of Industrial Engineering, Purdue Univ., Lafayette, IN.
- KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and variance. In Twelfth International Conference on Machine Learning (A. Prieditis and S. Russell, eds.) 313-321. Morgan Kaufmann, San Francisco.
- MACKAY, D. 1992. A practical Bayesian framework for backpropagation networks. Neural Computation 4 448-472.
- NEAL, R. 1993. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. Computer Science, Univ. Toronto.
- PERRONE, M. P. and COOPER, L. N. 1993. When networks disagree: ensemble methods for hybrid neural networks. In Neural Networks for Speech and Image Processing (R. J. Mammone, ed.) 126-142. Chapman and Hall, London.
- QUINLAN, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco.
- SCHAPIRE, R. E. 1997. Using output codes to boost multiclass learning problems. In Proceedings of the Fourteenth International Conference on Machine Learning 313-321. Morgan Kaufmann, San Francisco.
- CORVALLIS, OREGON 97331-3202. E-MAIL: tgd@cs.orst.edu
- AMIT, Y. and GEMAN, D. 1997. Shape quantization and recognition with randomized trees. Neural Computation 9 1545-1588.
- BREIMAN, L. 1996c. Out-of-bag estimation. Available at ftp.stat.berkeley.edu/users/breiman/OOBestimation.
- BREIMAN, L. 1997. Prediction games and arcing algorithms. Technical Report 504, Dept. Statistics, Univ. California, Berkeley. Available at www.stat.berkeley.edu.
- DIETTERICH, T. 1998. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization. Machine Learning 1-22.
- FREUND, Y. and SCHAPIRE, R. 1996. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference (L. Saitta, ed.) 148-156. Morgan Kaufmann, San Francisco.
- JI, C. and MA, S. 1997. Combinations of weak classifiers. IEEE Trans. Neural Networks 8 32-42.
- KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and variance. In Proceedings of the Twelfth International Conference on Machine Learning (A. Prieditis and S. Russell, eds.) 313-321. Morgan Kaufmann, San Francisco.
- SCHAPIRE, R., FREUND, Y., BARTLETT, P. and LEE, W. S. 1998. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26. To appear.
- VAPNIK, V. N. 1995. The Nature of Statistical Learning Theory. Springer, New York.
- BERKELEY, CALIFORNIA 94720-3860. E-MAIL: leo@stat.berkeley.edu

### More like this

- Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers. Koltchinskii, V. and Panchenko, D., The Annals of Statistics, 2002
- Boosting the margin: a new explanation for the effectiveness of voting methods. Bartlett, Peter, Freund, Yoav, Lee, Wee Sun, and Schapire, Robert E., The Annals of Statistics, 1998
- Three papers on boosting: an introduction. Koltchinskii, Vladimir and Yu, Bin, The Annals of Statistics, 2004
- Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert, The Annals of Statistics, 2000
- Analyzing bagging. Bühlmann, Peter and Yu, Bin, The Annals of Statistics, 2002
- The false discovery rate for statistical pattern recognition. Scott, Clayton, Bellala, Gowtham, and Willett, Rebecca, Electronic Journal of Statistics, 2009
- Semi-parametric dynamic time series modelling with applications to detecting neural dynamics. Rigat, Fabio and Smith, Jim Q., The Annals of Applied Statistics, 2009
- Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. Wu, C. F. J., The Annals of Statistics, 1986
- Bounding the generalization error of convex combinations of classifiers: balancing the dimensionality and the margins. Koltchinskii, Vladimir, Panchenko, Dmitriy, and Lozano, Fernando, The Annals of Applied Probability, 2003
- A concrete statistical realization of Kleinberg's stochastic discrimination for pattern recognition. Part I. Two-class classification. Chen, Dechang, Huang, Peng, and Cheng, Xiuzhen, The Annals of Statistics, 2003