Annals of Statistics

A central limit theorem for multivariate generalized trimmed $k$-means

Luis A. García-Escudero, Alfonso Gordaliza, and Carlos Matrán

Abstract

A central limit theorem for generalized trimmed $k$-means is obtained in a very general framework that covers the multivariate setting, general penalty functions and general $k \geq 1$. Several applications, including the location estimator case $(k = 1)$ for elliptical distributions and the construction of multivariate (not necessarily connected) tolerance zones, are also given.

Article information

Source
Ann. Statist., Volume 27, Number 3 (1999), 1061-1079.

Dates
First available in Project Euclid: 5 April 2002

Permanent link to this document
https://projecteuclid.org/euclid.aos/1018031268

Digital Object Identifier
doi:10.1214/aos/1018031268

Mathematical Reviews number (MathSciNet)
MR1724041

Zentralblatt MATH identifier
0984.62042

Subjects
Primary: 60F05: Central limit and other weak theorems; 62H20: Measures of association (correlation, canonical correlation, etc.)
Secondary: 62G35: Robustness; 62G15: Tolerance and confidence regions

Keywords
Central limit theorem k-means clustering impartial trimming robustness tolerance zones

Citation

García-Escudero, Luis A.; Gordaliza, Alfonso; Matrán, Carlos. A central limit theorem for multivariate generalized trimmed $k$-means. Ann. Statist. 27 (1999), no. 3, 1061--1079. doi:10.1214/aos/1018031268. https://projecteuclid.org/euclid.aos/1018031268

References

  • Baddeley, A. (1977). Integrals of a moving manifold and geometrical probability. Adv. in Appl. Probab. 9 588-603.
  • Butler, R. W. (1982). Nonparametric interval and point prediction using data trimmed by a Grubbs-type outlier rule. Ann. Statist. 10 197-204.
  • Butler, R. W., Davies, P. L. and Jhun, M. (1993). Asymptotics for the minimum covariance determinant estimator. Ann. Statist. 21 1385-1400.
  • Cuesta-Albertos, J. A., Gordaliza, A. and Matrán, C. (1997). Trimmed k-means: an attempt to robustify quantizers. Ann. Statist. 25 553-576.
  • Cuesta-Albertos, J. A., Gordaliza, A. and Matrán, C. (1998). Trimmed best k-nets: a robustified version of an $L_\infty$-based clustering method. Statist. Probab. Lett. 36 401-413.
  • Cuevas, A. and Fraiman, R. (1997). A plug-in approach to support estimation. Ann. Statist. 25 2300-2312.
  • Davies, P. L. (1987). Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann. Statist. 15 1269-1292.
  • Fleischer, P. (1964). Sufficient conditions for achieving minimum distortion in a quantizer. IEEE Int. Conv. Rec. 104-111.
  • García-Escudero, L. A. and Gordaliza, A. (1999). Robustness properties of k-means and trimmed k-means. J. Amer. Statist. Assoc. To appear.
  • García-Escudero, L. A., Gordaliza, A. and Matrán, C. (1997). Asymptotics for trimmed k-means and associated tolerance zones. J. Statist. Plann. Inference 77 247-262.
  • Gordaliza, A. (1991a). Best approximations to random variables based on trimming procedures. J. Approx. Theory 64 162-180.
  • Gordaliza, A. (1991b). On the breakdown point of multivariate location estimators based on trimming procedures. Statist. Probab. Lett. 11 387-394.
  • Hartigan, J. A. (1978). Asymptotic distribution for clustering criteria. Ann. Statist. 6 117-131.
  • Hössjer, O. (1994). Rank-based estimates in the linear model with high breakdown point. J. Amer. Statist. Assoc. 89 149-158.
  • Huber, P. J. (1967). The behavior of maximum likelihood estimators under non-standard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Probab. 1 221-233. Univ. California Press, Berkeley.
  • Hyndman, R. J. (1996). Computing and graphing highest density regions. Amer. Statist. 50 120-126.
  • Kim, J. and Pollard, D. (1990). Cube root asymptotics. Ann. Statist. 18 191-219.
  • Li, L. and Flury, B. (1995). Uniqueness of principal points for univariate distributions. Statist. Probab. Lett. 25 323-327.
  • Mili, M. and Coakley, C. (1996). Robust estimation in structured linear regression. Ann. Statist. 24 2593-2607.
  • Pollard, D. (1981). Strong consistency of k-means clustering. Ann. Statist. 9 135-140.
  • Pollard, D. (1982). A central limit theorem for k-means clustering. Ann. Probab. 10 919-926.
  • Rousseeuw, P. J. (1983). Multivariate estimation with high breakdown point. In Proceedings of the Fourth Pannonian Symposium on Mathematical Statistics (W. Grossmann, G. Pflug, I. Vincze and W. Wertz, eds.) B 283-297. Reidel, Dordrecht.
  • Rousseeuw, P. J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871-880.
  • Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. Wiley, New York.
  • Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, New York.
  • Serinko, R. J. and Babu, G. J. (1992). Weak limit theorems for univariate k-means clustering under nonregular conditions. J. Multivariate Anal. 49 188-203.
  • Stute, W. and Zhu, L. X. (1995). Asymptotics of k-means clustering based on projection pursuit. Sankhyā 57 462-471.
  • Tableman, M. (1994). The asymptotics of the least trimmed absolute deviation (LTAD) estimator. Statist. Probab. Lett. 19 387-398.
  • Van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Wiley, New York.
  • Vandev, D. L. and Neykov, N. M. (1993). Robust maximum likelihood in the Gaussian case. In New Directions in Statistical Data Analysis and Robustness (S. Morgenthaler, E. Ronchetti and W. A. Stahel, eds.). Birkhäuser, Basel.
  • Yohai, V. and Maronna, R. (1976). Location estimators based on linear combinations of modified order statistics. Comm. Statist. Theory Methods 5 481-486.