## The Annals of Statistics

### Classification algorithms using adaptive partitioning

#### Abstract

Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335–1353; Mach. Learn. 66 (2007) 209–242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms of the parameter $\alpha$ of margin conditions and a rate $s$ of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness $\beta$ of the regression function, which governs its approximability by piecewise polynomials on adaptive partitions. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Hölder conditions, which govern approximation by nonadaptive partitions, and therefore for a given regression function can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.
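To make the set-estimator viewpoint concrete, the following is a minimal illustrative sketch, not the paper's algorithm: it builds an adaptive dyadic tree partition of $[0,1]^2$ and labels each leaf by majority vote, i.e., the simpler piecewise-constant variant that the paper contrasts with its higher-order decorated trees. All function and variable names are hypothetical.

```python
# Hedged sketch: adaptive dyadic partitioning with piecewise-constant leaf
# labels. The set of cells labeled 1 is the estimate of the Bayes set.

def majority(labels):
    """Majority label in a leaf; ties and empty leaves default to class 0."""
    return 1 if sum(labels) * 2 > len(labels) else 0

def grow(points, labels, box, depth, max_depth):
    """Recursively refine a cell of the dyadic partition.

    points -- list of (x, y) samples falling in `box`
    box    -- (x0, x1, y0, y1) axis-aligned cell
    Returns a tree: ('leaf', label) or ('split', axis, mid, left, right).
    A cell is split only while it holds both classes and depth allows,
    so the partition adapts to the data near the decision boundary.
    """
    if depth == max_depth or len(set(labels)) <= 1:
        return ('leaf', majority(labels))
    axis = depth % 2                      # alternate split axes (dyadic)
    x0, x1, y0, y1 = box
    mid = (x0 + x1) / 2 if axis == 0 else (y0 + y1) / 2
    lp, ll, rp, rl = [], [], [], []
    for p, lab in zip(points, labels):
        (lp if p[axis] < mid else rp).append(p)
        (ll if p[axis] < mid else rl).append(lab)
    lbox = (x0, mid, y0, y1) if axis == 0 else (x0, x1, y0, mid)
    rbox = (mid, x1, y0, y1) if axis == 0 else (x0, x1, mid, y1)
    return ('split', axis, mid,
            grow(lp, ll, lbox, depth + 1, max_depth),
            grow(rp, rl, rbox, depth + 1, max_depth))

def classify(tree, p):
    """Route a point down the tree to its leaf label."""
    while tree[0] == 'split':
        _, axis, mid, left, right = tree
        tree = left if p[axis] < mid else right
    return tree[1]

# Toy data: class 1 iff x > 0.5, so the Bayes set is the right half-square.
pts = [(0.1, 0.2), (0.3, 0.7), (0.4, 0.4), (0.6, 0.1), (0.8, 0.9), (0.9, 0.5)]
labs = [0, 0, 0, 1, 1, 1]
tree = grow(pts, labs, (0.0, 1.0, 0.0, 1.0), 0, 4)
print(classify(tree, (0.25, 0.5)), classify(tree, (0.75, 0.5)))  # prints: 0 1
```

A decorated tree in the paper's sense would replace the constant leaf label with a higher-order local rule, which is what yields the improved approximation rates under Besov smoothness.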

#### Article information

Source
Ann. Statist., Volume 42, Number 6 (2014), 2141–2163.

Dates
First available in Project Euclid: 20 October 2014

https://projecteuclid.org/euclid.aos/1413810724

Digital Object Identifier
doi:10.1214/14-AOS1234

Mathematical Reviews number (MathSciNet)
MR3269976

Zentralblatt MATH identifier
1310.62074

#### Citation

Binev, Peter; Cohen, Albert; Dahmen, Wolfgang; DeVore, Ronald. Classification algorithms using adaptive partitioning. Ann. Statist. 42 (2014), no. 6, 2141--2163. doi:10.1214/14-AOS1234. https://projecteuclid.org/euclid.aos/1413810724

#### References

• [1] Akakpo, N. (2012). Adaptation to anisotropy and inhomogeneity via dyadic piecewise polynomial selection. Math. Methods Statist. 21 1–28.
• [2] Audibert, J.-Y. and Tsybakov, A. B. (2007). Fast learning rates for plug-in classifiers. Ann. Statist. 35 608–633.
• [3] Binev, P., Cohen, A., Dahmen, W. and DeVore, R. (2007). Universal algorithms for learning theory. II. Piecewise polynomial functions. Constr. Approx. 26 127–152.
• [4] Binev, P., Cohen, A., Dahmen, W. and DeVore, R. (2014). Supplement to “Classification algorithms using adaptive partitioning.” DOI:10.1214/14-AOS1234SUPP.
• [5] Blanchard, G. and Massart, P. (2006). Discussion: “Local Rademacher complexities and oracle inequalities in risk minimization” [Ann. Statist. 34 (2006), no. 6, 2593–2656; 2329442] by V. Koltchinskii. Ann. Statist. 34 2664–2671.
• [6] Blanchard, G., Schäfer, C., Rozenholc, Y. and Müller, K.-R. (2007). Optimal dyadic decision trees. Mach. Learn. 66 209–242.
• [7] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323–375.
• [8] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
• [9] Cohen, A., Dahmen, W., Daubechies, I. and DeVore, R. (2001). Tree approximation and optimal encoding. Appl. Comput. Harmon. Anal. 11 192–226.
• [10] DeVore, R. A. (1998). Nonlinear approximation. In Acta Numerica, 1998. Acta Numer. 7 51–150. Cambridge Univ. Press, Cambridge.
• [11] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. Springer, New York.
• [12] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York.
• [13] Massart, P. and Nédélec, É. (2006). Risk bounds for statistical learning. Ann. Statist. 34 2326–2366.
• [14] Scott, C. and Nowak, R. D. (2006). Minimax-optimal classification with dyadic decision trees. IEEE Trans. Inform. Theory 52 1335–1353.
• [15] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166.