The Annals of Statistics

Classification algorithms using adaptive partitioning

Peter Binev, Albert Cohen, Wolfgang Dahmen, and Ronald DeVore



Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335–1353; Mach. Learn. 66 (2007) 209–242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms of the parameter $\alpha$ of margin conditions and a rate $s$ of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness $\beta$ of the regression function that governs its approximability by piecewise polynomials on adaptive partitions. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Hölder conditions, which govern approximation by nonadaptive partitions, and therefore for a given regression function can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.
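To make the idea of a tree-based set estimator concrete, the following is a minimal sketch of the piecewise constant baseline that the abstract contrasts with, not the decorated-tree algorithm of the paper. It assumes one-dimensional inputs in [0, 1), labels in {-1, +1}, and a simple greedy refinement rule; the names `fit_dyadic_classifier`, `predict`, `max_depth`, and `min_points` are hypothetical and chosen only for illustration.

```python
def fit_dyadic_classifier(xs, ys, max_depth=6, min_points=4):
    """Greedy adaptive dyadic partition of [0, 1) with one label per cell.

    xs: sample points in [0, 1); ys: labels in {-1, +1}.
    Returns a left-to-right list of (left, right, label) cells covering [0, 1).
    Assumption: this refinement rule is illustrative, not the paper's method.
    """
    cells = []

    def refine(left, right, depth):
        labels = [y for x, y in zip(xs, ys) if left <= x < right]
        total = sum(labels)                  # empirical sum of labels in the cell
        label = 1 if total >= 0 else -1      # majority vote (ties go to +1)
        pure = abs(total) == len(labels)     # every label in the cell agrees
        if pure or depth == max_depth or len(labels) < min_points:
            cells.append((left, right, label))
            return
        mid = (left + right) / 2             # dyadic split at the midpoint
        refine(left, mid, depth + 1)
        refine(mid, right, depth + 1)

    refine(0.0, 1.0, 0)
    return cells


def predict(cells, x):
    """Return the label of the cell containing x."""
    for left, right, label in cells:
        if left <= x < right:
            return label
    raise ValueError("x must lie in [0, 1)")


# Noise-free toy data whose Bayes set is [0.3, 1).
xs = [i / 200 for i in range(200)]
ys = [1 if x >= 0.3 else -1 for x in xs]
cells = fit_dyadic_classifier(xs, ys)
print(len(cells), predict(cells, 0.1), predict(cells, 0.9))
```

The union of the cells labeled +1 is the estimated set. The partition is adaptive in the sense that cells are refined only where the labels disagree, so small cells concentrate near the boundary of the Bayes set.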

Article information

Ann. Statist. Volume 42, Number 6 (2014), 2141–2163.

First available in Project Euclid: 20 October 2014


Primary: 62M45: Neural nets and related approaches; 65D05: Interpolation; 68Q32: Computational learning theory [See also 68T05]; 97N50: Interpolation and approximation

Keywords: binary classification, adaptive methods, set estimators, tree-based algorithms


Binev, Peter; Cohen, Albert; Dahmen, Wolfgang; DeVore, Ronald. Classification algorithms using adaptive partitioning. Ann. Statist. 42 (2014), no. 6, 2141–2163. doi:10.1214/14-AOS1234.


  • [1] Akakpo, N. (2012). Adaptation to anisotropy and inhomogeneity via dyadic piecewise polynomial selection. Math. Methods Statist. 21 1–28.
  • [2] Audibert, J.-Y. and Tsybakov, A. B. (2007). Fast learning rates for plug-in classifiers. Ann. Statist. 35 608–633.
  • [3] Binev, P., Cohen, A., Dahmen, W. and DeVore, R. (2007). Universal algorithms for learning theory. II. Piecewise polynomial functions. Constr. Approx. 26 127–152.
  • [4] Binev, P., Cohen, A., Dahmen, W. and DeVore, R. (2014). Supplement to “Classification algorithms using adaptive partitioning.” DOI:10.1214/14-AOS1234SUPP.
  • [5] Blanchard, G. and Massart, P. (2006). Discussion: “Local Rademacher complexities and oracle inequalities in risk minimization” [Ann. Statist. 34 (2006), no. 6, 2593–2656; MR2329442] by V. Koltchinskii. Ann. Statist. 34 2664–2671.
  • [6] Blanchard, G., Schäfer, C., Rozenholc, Y. and Müller, K.-R. (2007). Optimal dyadic decision trees. Mach. Learn. 66 209–242.
  • [7] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323–375.
  • [8] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
  • [9] Cohen, A., Dahmen, W., Daubechies, I. and DeVore, R. (2001). Tree approximation and optimal encoding. Appl. Comput. Harmon. Anal. 11 192–226.
  • [10] DeVore, R. A. (1998). Nonlinear approximation. In Acta Numerica, 1998. Acta Numer. 7 51–150. Cambridge Univ. Press, Cambridge.
  • [11] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. Springer, New York.
  • [12] Györfi, L., Kohler, M., Krzyzak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, Berlin.
  • [13] Massart, P. and Nédélec, É. (2006). Risk bounds for statistical learning. Ann. Statist. 34 2326–2366.
  • [14] Scott, C. and Nowak, R. D. (2006). Minimax-optimal classification with dyadic decision trees. IEEE Trans. Inform. Theory 52 1335–1353.
  • [15] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166.

Supplemental materials

  • Supplementary material: Proof of Theorem 2.1. This supplement contains the detailed proof of Theorem 2.1 [4].