### Empirical geometry of multivariate data: a deconvolution approach

V. I. Koltchinskii
Source: Ann. Statist. Volume 28, Number 2 (2000), 591-629.

#### Abstract

Let $\{Y_j : j = 1,\ldots,n\}$ be independent observations in $\mathbb{R}^m$, $m \geq 1$, with common distribution $Q$. Suppose that $Y_j = X_j + \xi_j$, $j = 1,\ldots,n$, where $\{X_j, \xi_j : j = 1,\ldots,n\}$ are independent, $X_j$, $j = 1,\ldots,n$, have common distribution $P$ and $\xi_j$, $j = 1,\ldots,n$, have common distribution $\mu$, so that $Q = P * \mu$. The problem is to recover hidden geometric structure of the support of $P$ based on the independent observations $Y_j$. Assuming that the distribution of the errors $\mu$ is known, deconvolution statistical estimates of the fractal dimension and the hierarchical cluster tree of the support that converge with exponential rates are suggested. Moreover, the exponential rates of convergence hold for adaptive versions of the estimates even in the case of normal noise $\xi_j$ with unknown covariance. In the case of the dimension estimation, though, the exponential rate holds only when the set of all possible values of the dimension is finite (e.g., when the dimension is known to be integer). If this set is infinite, the optimal convergence rate of the estimator becomes very slow (typically, logarithmic), even when there is no noise in the observations.
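The observation model $Y_j = X_j + \xi_j$ with $Q = P * \mu$ can be illustrated numerically. The sketch below is not the paper's estimator; it only simulates the model under assumptions chosen for illustration: $P$ is taken to be uniform on the unit circle in $\mathbb{R}^2$ (a support of dimension 1), $\mu$ is isotropic Gaussian noise with a known standard deviation `sigma`, and the function names are hypothetical. It checks the standard fact that convolution turns into a product of characteristic functions, which is what makes dividing by the known characteristic function of $\mu$ a natural deconvolution step.

```python
import cmath
import math
import random


def sample_model(n, sigma, rng):
    """Draw n pairs (X_j, Y_j) with X_j uniform on the unit circle
    (an illustrative choice of P) and Y_j = X_j + xi_j, xi_j ~ N(0, sigma^2 I)."""
    xs, ys = [], []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = (math.cos(theta), math.sin(theta))
        xi = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        xs.append(x)
        ys.append((x[0] + xi[0], x[1] + xi[1]))
    return xs, ys


def empirical_cf(points, t):
    """Empirical characteristic function (1/n) * sum_j exp(i <t, z_j>)."""
    total = sum(cmath.exp(1j * (t[0] * z[0] + t[1] * z[1])) for z in points)
    return total / len(points)


def gaussian_cf(t, sigma):
    """Characteristic function of N(0, sigma^2 I) in the plane, evaluated at t."""
    return math.exp(-0.5 * sigma ** 2 * (t[0] ** 2 + t[1] ** 2))


rng = random.Random(0)          # fixed seed so the run is reproducible
sigma = 0.3                     # assumed known noise level
xs, ys = sample_model(20000, sigma, rng)

t = (1.0, 0.5)                  # an arbitrary frequency
cf_y = empirical_cf(ys, t)                                  # estimates the cf of Q
cf_x_times_mu = empirical_cf(xs, t) * gaussian_cf(t, sigma)  # cf of P times cf of mu
deconvolved = cf_y / gaussian_cf(t, sigma)                   # estimates the cf of P

print("cf of Y:            ", cf_y)
print("cf of X * cf of mu: ", cf_x_times_mu)
print("deconvolved cf of X:", deconvolved)
```

In practice the $X_j$ are hidden, so only `cf_y` and the known `gaussian_cf` would be available; the comparison with `empirical_cf(xs, t)` is possible here only because the data are simulated. Dividing by the Gaussian characteristic function amplifies sampling error at high frequencies, which is why this sketch evaluates a single moderate frequency.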

Primary Subjects: 62H30, 62H12, 62G07
Secondary Subjects: 62F17
Full-text: Open access

Permanent link to this document: http://projecteuclid.org/euclid.aos/1016218232
Mathematical Reviews number (MathSciNet): MR1790011
Digital Object Identifier: doi:10.1214/aos/1016218232
Zentralblatt MATH identifier: 01828954
