The Annals of Statistics

Consistency of data-driven histogram methods for density estimation and classification

Gábor Lugosi and Andrew Nobel
Source: Ann. Statist. Volume 24, Number 2 (1996), 687-706.

Abstract

We present general sufficient conditions for the almost sure $L_1$-consistency of histogram density estimates based on data-dependent partitions. Analogous conditions guarantee the almost-sure risk consistency of histogram classification schemes based on data-dependent partitions. Multivariate data are considered throughout.

In each case, the desired consistency requires shrinking cells, subexponential growth of a combinatorial complexity measure and sublinear growth of the number of cells. It is not required that the cells of every partition be rectangles with sides parallel to the coordinate axis or that each cell contain a minimum number of points. No assumptions are made concerning the common distribution of the training vectors.

We apply the results to establish the consistency of several known partitioning estimates, including the $k_n$-spacing density estimate, classifiers based on statistically equivalent blocks and classifiers based on multivariate clustering schemes.

First Page: Show Hide
Primary Subjects: 62G07
Secondary Subjects: 62H30
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1032894460
Mathematical Reviews number (MathSciNet): MR1394983
Digital Object Identifier: doi:10.1214/aos/1032894460
Zentralblatt MATH identifier: 0859.62040

References

ANDERSON, T. W. 1966. Some nonparametric multivariate procedures based on statistically Z. equivalent blocks. In Multivariate Analy sis P. R. Krishnaiah, ed. 5 27. Academic Press, New York. Z.
Mathematical Reviews (MathSciNet): MR214250
Zentralblatt MATH: 0245.62054
BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. 1984. Classification and Regression Trees. Wadsworth, Belmont, CA. Z.
Mathematical Reviews (MathSciNet): MR86b:62101
Zentralblatt MATH: 0541.62042
CHEN, X. R. and ZHAO, L. C. 1987. Almost sure L -norm convergence for data-based histogram 1 density estimates. J. Multivariate Anal. 21 179 188. Z.
Zentralblatt MATH: 0614.62044
Mathematical Reviews (MathSciNet): MR877850
Digital Object Identifier: doi:10.1016/0047-259X(87)90106-0
COVER, T. M. 1965. Geometrical and statistical properties of sy stems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers 14 326 334. Z.
CSISZAR, I. and KORNER, J. 1981. Information Theory: Coding Theorems for Discrete Memory´ ¨ less Sy stems. Academic Press, New York.
DEVROy E, L. 1988. Automatic pattern recognition: a study of the probability of error. IEEE Transactions on Pattern Analy sis and Machine Intelligence 10 530 543. Z.
DEVROy E, L. and Gy ORFI, L. 1983. Distribution-free exponential bound on the L error of ¨ 1 partitioning estimates of a regression function. In Proceedings of the Fourth PannonZ ian Sy mposium on Mathematical Statistics F. Konecny, J. Mogy orodi and W. Wertz, ´. eds. 67 76. Akademiai Kiado, Budapest, Hungary. ´ ´ Z.
DEVROy E, L. and Gy ORFI, L. 1985. Nonparametric Density Estimation: The L View. Wiley, New ¨ 1 York. Z.
Mathematical Reviews (MathSciNet): MR86i:62065
Zentralblatt MATH: 0546.62015
GERSHO, A. and GRAY, R. M. 1992. Vector Quantization and Signal Compression. Kluwer, Boston.Z.
GESSAMAN, M. P. 1970. A consistent nonparametric multivariate density estimator based on statistically equivalent blocks. Ann. Math. Statist. 41 1344 1346. Z.
GORDON, L. and OLSHEN, R. A. 1978. Asy mptotically efficient solutions to the classification problem. Ann. Statist. 6 515 533. Z.
Mathematical Reviews (MathSciNet): MR468035
Zentralblatt MATH: 0437.62056
Digital Object Identifier: doi:10.1214/aos/1176344197
Project Euclid: euclid.aos/1176344197
GORDON, L. and OLSHEN, R. A. 1980. Consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 10 611 627. Z.
Zentralblatt MATH: 0453.62035
Mathematical Reviews (MathSciNet): MR599694
Digital Object Identifier: doi:10.1016/0047-259X(80)90074-3
GORDON, L. and OLSHEN, R. 1984. Almost surely consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 15 147 163. Z.
Zentralblatt MATH: 0542.62032
Mathematical Reviews (MathSciNet): MR763592
Digital Object Identifier: doi:10.1016/0047-259X(84)90022-8
HARTIGAN, J. A. 1975. Clustering Algorithms. Wiley, New York. Z.
Mathematical Reviews (MathSciNet): MR53:9518
Zentralblatt MATH: 0372.62040
LINDER, T., LUGOSI, G. and ZEGER, K. 1994. Rates of convergence in the source coding theorem, empirical quantizer design, and universal lossy source coding. IEEE Trans. Inform. Theory 40 1728 1740. Z.
Mathematical Reviews (MathSciNet): MR1322387
Digital Object Identifier: doi:10.1109/18.340451
LUGOSI, G. and NOBEL, A. B. 1993. Consistency of data-driven histogram methods for density estimation and classification. Technical Report UIUC-BI-93-01, Beckman Institute, Univ. Illinois, Urbana Champaign. Z.
MAHALANOBIS, P. C. 1961. A method of fractile graphical analysis. Sankhy a Ser. A 23 41 64. Z.
Mathematical Reviews (MathSciNet): MR31:4105
Zentralblatt MATH: 0097.34905
NOBEL, A. 1995. Recursive partitioning to reduce distortion. Technical Report UIUC-BI-95-01, Beckman Institute, Univ. Illinois, Urbana Champaign. Z.
NOBEL, A. 1996. Histogram regression estimation using data-dependent partitions. Ann. Statist. Z. 24 3. Z.
Mathematical Reviews (MathSciNet): MR97k:62099
Zentralblatt MATH: 0862.62038
Digital Object Identifier: doi:10.1214/aos/1032526958
Project Euclid: euclid.aos/1032526958
PARTHASARATHY, K. R. and BHATTACHARy A, P. K. 1961. Some limit theorems in regression theory. Sankhy a Ser. A 23 91 102. Z.
Mathematical Reviews (MathSciNet): MR24:A3748
Zentralblatt MATH: 0097.34903
PATRICK, E. A. and FISHER, F. P., II 1967. Introduction to the performance of distribution-free conditional risk learning sy stems. Technical Report TR-EE-67-12, Purdue Univ. Z.
POLLARD, D. 1984. Convergence of Stochastic Processes. Springer, New York. Z.
Mathematical Reviews (MathSciNet): MR86i:60074
Zentralblatt MATH: 0544.60045
STONE, C. J. 1977. Consistent nonparametric regression. Ann. Statist. 5 595 645. Z.
Zentralblatt MATH: 0366.62051
Mathematical Reviews (MathSciNet): MR443204
Digital Object Identifier: doi:10.1214/aos/1176343886
Project Euclid: euclid.aos/1176343886
STONE, C. J. 1985. An asy mptotically optimal histogram selection rule. In Proc. Berkeley Conf. Z. in Honor of Jerzy Ney man and Jack Kiefer L. Le Cam and R. A. Olshen, eds. 2 513 520. Wadsworth, Belmont, CA. Z.
Mathematical Reviews (MathSciNet): MR822050
VAN Ry ZIN, J. 1973. A histogram method of density estimation. Communications in Statistics 2 493 506. Z.
Mathematical Reviews (MathSciNet): MR48:7453
Digital Object Identifier: doi:10.1080/03610927308827093
VAPNIK, V. N. and CHERVONENKIS, A. YA. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 264 280. Z.
Zentralblatt MATH: 0247.60005
ZHAO, L. C., KRISHNAIAH, P. R. and CHEN, X. R. 1990. Almost sure L -norm convergence for r data-based histogram estimates. Theory Probab. Appl. 35 396 403.
Zentralblatt MATH: 0721.62037
Mathematical Reviews (MathSciNet): MR1069145
TECHNICAL UNIVERSITY OF BUDAPEST CHAPEL HILL, NORTH CAROLINA 27599-3260 BUDAPEST HUNGARY

2012 © Institute of Mathematical Statistics

The Annals of Statistics

The Annals of Statistics