A central limit theorem for multivariate generalized trimmed $k$-means



The Annals of Statistics

Luis A. García-Escudero, Alfonso Gordaliza, and Carlos Matrán

Source: Ann. Statist. Volume 27, Number 3 (1999), 1061-1079.

Abstract

A central limit theorem for generalized trimmed $k$-means is obtained in a very general framework that covers the multivariate setting, general penalty functions and general $k \geq 1$. Several applications, including the location estimator case $(k = 1)$ for elliptical distributions and the construction of multivariate (not necessarily connected) tolerance zones, are also given.
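The abstract does not define the estimator. The following is a minimal sketch that assumes the squared Euclidean penalty $\Phi(t) = t^2$ (the paper covers general penalty functions) and uses a Lloyd-type alternation of our own choosing. It is not taken from the paper, which studies the asymptotic distribution of the estimator rather than an algorithm for computing it. The function names `nearest`, `trim` and `trimmed_kmeans` are hypothetical. Each iteration discards the $\lfloor n\alpha \rfloor$ points farthest from their nearest center ("impartial" trimming: the data, not the analyst, decide what is trimmed) and then moves each center to the mean of its kept points. The procedure reaches a local optimum only.

```python
import math
import random


def nearest(x, centers):
    """Index of, and Euclidean distance to, the center closest to x."""
    dists = [math.dist(x, c) for c in centers]
    j = min(range(len(centers)), key=dists.__getitem__)
    return j, dists[j]


def trim(data, centers, h):
    """Impartial trimming: indices of the h points closest to their nearest center."""
    scored = sorted((nearest(x, centers)[1], i) for i, x in enumerate(data))
    return [i for _, i in scored[:h]]


def trimmed_kmeans(data, k, alpha, init=None, n_iter=100, seed=0):
    """Hypothetical Lloyd-type sketch of trimmed k-means with phi(t) = t**2.

    Alternates trimming the floor(n * alpha) worst-fitted points with
    recomputing each center as the mean of its kept points. Only a local
    optimum is found; the penalty and the stopping rule are assumptions.
    """
    n = len(data)
    h = n - math.floor(n * alpha)  # number of non-trimmed points
    if init is None:
        init = random.Random(seed).sample(data, k)
    centers = [list(c) for c in init]
    for _ in range(n_iter):
        kept = trim(data, centers, h)
        groups = [[] for _ in range(k)]
        for i in kept:
            groups[nearest(data[i], centers)[0]].append(data[i])
        # Empty groups keep their previous center.
        new_centers = [
            [sum(coord) / len(g) for coord in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    kept = trim(data, centers, h)
    dists = [nearest(data[i], centers)[1] for i in kept]
    objective = sum(d * d for d in dists) / h  # mean penalty over kept points
    radius = max(dists)  # kept points lie in the union of balls B(m_j, radius)
    return centers, sorted(kept), radius, objective
```

For example, with two tight groups of four points near $(0,0)$ and $(10,10)$ plus two distant outliers, $k = 2$ and $\alpha = 0.2$ trim exactly the two outliers. The returned radius is the largest nearest-center distance among the kept points, so the union of the balls $B(m_j, r)$ covers the non-trimmed sample. That union need not be connected, which is one way to read the tolerance zones mentioned above.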

Primary Subjects: 60F05, 62H20
Secondary Subjects: 62G35, 62G15
Keywords: Central limit theorem; k-means; clustering; impartial trimming; robustness; tolerance zones


Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1018031268
Mathematical Reviews number (MathSciNet): MR1724041
Digital Object Identifier: doi:10.1214/aos/1018031268


2008 © Institute of Mathematical Statistics