The Annals of Statistics

Tree-structured regression and the differentiation of integrals

Richard A. Olshen

This paper provides answers to questions regarding the almost sure limiting behavior of rooted, binary tree-structured rules for regression. Examples show that questions raised by Gordon and Olshen in 1984 have negative answers. For these examples of regression functions and sequences of their associated binary tree-structured approximations, for all regression functions except those in a set of the first category, almost sure consistency fails dramatically on events of full probability. One consequence is that almost sure consistency of binary tree-structured rules such as CART requires conditions beyond requiring that (1) the regression function be in ℒ¹, (2) partitions of a Euclidean feature space be into polytopes with sides parallel to coordinate axes, (3) the mesh of the partitions become arbitrarily fine almost surely and (4) the empirical learning sample content of each polytope be “large enough.” The material in this paper includes the solution to a problem raised by Dudley in discussions. The main results have a corollary regarding the lack of almost sure consistency of certain Bayes-risk consistent rules for classification.
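Conditions (2) through (4) above describe the kind of estimator at issue: the feature space is cut into axis-parallel boxes, the boxes shrink, each box must hold enough sample points, and the prediction on a box is the average response there. If m is the regression function and P the distribution of the features, the population counterpart of that average on a cell A is (1/P(A))∫_A m dP, the kind of average whose limits as cells shrink are studied in the differentiation of integrals. The Python below is a minimal sketch under stated assumptions, not the paper's construction and not CART: the names build_tree, predict and mesh, and the parameters max_depth and min_leaf, are hypothetical; splits bisect the longest side of a box at its midpoint instead of being chosen to reduce squared error; and the caller supplies the root box. Meeting these conditions does not by itself give almost sure consistency, which is what the paper's examples show.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Point = Sequence[float]


@dataclass
class Node:
    """One cell of the partition: the axis-parallel box [lower, upper]."""
    lower: List[float]
    upper: List[float]
    value: float                    # mean response of the sample points in this box
    axis: Optional[int] = None      # splitting coordinate; None marks a leaf
    cut: Optional[float] = None     # split threshold along `axis`
    left: Optional["Node"] = None   # child box where x[axis] <= cut
    right: Optional["Node"] = None  # child box where x[axis] > cut


def build_tree(points: List[Point], ys: List[float], lower: List[float],
               upper: List[float], max_depth: int, min_leaf: int,
               depth: int = 0) -> Node:
    """Hypothetical recursive bisection (midpoint of the longest side, not CART)."""
    if min_leaf < 1:
        raise ValueError("min_leaf must be at least 1 so that no cell is empty")
    if not ys:
        raise ValueError("a cell needs at least one sample point")
    node = Node(list(lower), list(upper), sum(ys) / len(ys))
    if depth >= max_depth:
        return node
    # Bisect the longest side so that cell side lengths (the mesh) shrink with depth.
    axis = max(range(len(lower)), key=lambda j: upper[j] - lower[j])
    cut = (lower[axis] + upper[axis]) / 2.0
    left_idx = [i for i, p in enumerate(points) if p[axis] <= cut]
    right_idx = [i for i, p in enumerate(points) if p[axis] > cut]
    # Keep the cell whole if either child would hold fewer than min_leaf points.
    if len(left_idx) < min_leaf or len(right_idx) < min_leaf:
        return node
    left_upper = list(upper)
    left_upper[axis] = cut
    right_lower = list(lower)
    right_lower[axis] = cut
    node.axis, node.cut = axis, cut
    node.left = build_tree([points[i] for i in left_idx], [ys[i] for i in left_idx],
                           lower, left_upper, max_depth, min_leaf, depth + 1)
    node.right = build_tree([points[i] for i in right_idx], [ys[i] for i in right_idx],
                            right_lower, upper, max_depth, min_leaf, depth + 1)
    return node


def predict(node: Node, x: Point) -> float:
    """Return the average response over the leaf box that contains x."""
    while node.axis is not None:
        node = node.left if x[node.axis] <= node.cut else node.right
    return node.value


def mesh(node: Node) -> float:
    """Largest side length among the leaf boxes, a simple measure of fineness."""
    if node.axis is None:
        return max(hi - lo for lo, hi in zip(node.lower, node.upper))
    return max(mesh(node.left), mesh(node.right))
```

For instance, with the one-dimensional points 0.1, 0.2, 0.6, 0.9 in [0, 1], responses 1, 1, 3, 3, max_depth=1 and min_leaf=1, the root is bisected at 0.5, predict returns 1.0 left of the cut and 3.0 right of it, and mesh returns 0.5. Raising min_leaf to 3 blocks the split, so every prediction is the overall mean 2.0.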

Article information

Ann. Statist., Volume 35, Number 1 (2007), 1-12.

First available in Project Euclid: 6 June 2007

Digital Object Identifier: doi:10.1214/009053606000001000

Primary:
  • 26B05: Continuity and differentiation questions
  • 28A15: Abstract differentiation theory, differentiation of set functions [See also 26A24]
  • 62G08: Nonparametric regression
  • 62C12: Empirical decision procedures; empirical Bayes procedures

Keywords: binary tree-structured partitions; regression; maximal functions; differentiation of integrals


Olshen, Richard A. Tree-structured regression and the differentiation of integrals. Ann. Statist. 35 (2007), no. 1, 1–12. doi:10.1214/009053606000001000.

  • Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. Since 1993 this book has been published by Chapman and Hall, New York.
  • Busemann, H. and Feller, W. (1934). Zur Differentiation der Lebesgueschen Integrale. Fund. Math. 22 226–256.
  • de Guzmán, M. (1975). Differentiation of Integrals in ℝⁿ. Lecture Notes in Math. 481. Springer, Berlin.
  • Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
  • Devroye, L. and Krzyżak, A. (2002). New multivariate product density estimators. J. Multivariate Anal. 82 88–110.
  • Donoho, D. L. (1997). CART and best-ortho-basis: A connection. Ann. Statist. 25 1870–1911.
  • Gersho, A. and Gray, R. M. (1992). Vector Quantization and Signal Compression. Kluwer, Dordrecht.
  • Gordon, L. and Olshen, R. A. (1984). Almost surely consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 15 147–163.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
  • Lugosi, G. and Nobel, A. (1996). Consistency of data-driven histogram methods for density estimation and classification. Ann. Statist. 24 687–706.
  • Nobel, A. (1996). Histogram regression estimation using data-dependent partitions. Ann. Statist. 24 1084–1105.
  • Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press.
  • Saks, S. (1934). Remarks on the differentiability of the Lebesgue indefinite integral. Fund. Math. 22 257–261.
  • Stone, C. J. (1977). Consistent nonparametric regression (with discussion). Ann. Statist. 5 595–645.
  • Zhang, H. and Singer, B. (1999). Recursive Partitioning in the Health Sciences. Springer, New York.