The Annals of Statistics

Trimmed $k$-means: an attempt to robustify quantizers

J. A. Cuesta-Albertos, A. Gordaliza, and C. Matrán

Full-text: Open access

Abstract

A class of procedures based on "impartial trimming" (self-determined by the data) is introduced with the aim of robustifying $k$-means, and hence the associated clustering analysis. We include a detailed study of optimal regions, showing that only nonpathological regions can arise from impartial trimming procedures. The asymptotic results provided in the paper focus on strong consistency of the suggested methods under widely general conditions. A section is devoted to exploring the performance of the procedure to detect anomalous data in simulated data sets.
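
The abstract does not spell out an algorithm, so the following is only a minimal sketch of how impartial trimming can be combined with $k$-means. It assumes a squared Euclidean penalty and a simple alternating heuristic (trim, assign, update means), which is a common way to approximate such estimators but is not taken from the paper. The function name `trimmed_kmeans` and the `init_centers` argument are hypothetical, and the heuristic may stop at a local optimum that depends on the starting centers.

```python
from math import floor


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_kmeans(points, init_centers, alpha, max_iter=100):
    """Heuristic trimmed k-means: alternate trimming, assignment, mean updates.

    points       : list of equal-length tuples of floats
    init_centers : list of k starting centers (hypothetical interface)
    alpha        : trimming proportion in [0, 1)
    Returns (centers, labels); labels[i] is a cluster index, or -1 if trimmed.
    """
    n = len(points)
    n_keep = n - floor(alpha * n)  # number of untrimmed observations
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Nearest center and squared distance for every point.
        nearest = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                   for p in points]
        dists = [sq_dist(p, centers[j]) for p, j in zip(points, nearest)]
        # Impartial trimming: keep the n_keep points closest to their center,
        # so the data themselves decide which observations are discarded.
        order = sorted(range(n), key=lambda i: dists[i])
        kept = set(order[:n_keep])
        new_labels = [nearest[i] if i in kept else -1 for i in range(n)]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move
        labels = new_labels
        # Update each center as the mean of its untrimmed members.
        for j in range(len(centers)):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Two tight groups plus one gross outlier; alpha = 0.12 trims one of nine points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
        (100.0, 100.0)]
centers, labels = trimmed_kmeans(data, init_centers=[data[0], data[4]], alpha=0.12)
print(centers)  # [(0.5, 0.5), (10.5, 10.5)]
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the outlier is the single trimmed point (label -1), so both centers stay at the means of the two groups. With other starting centers the same heuristic could instead keep the outlier as its own cluster, which is why several starts are usually tried in practice.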

Article information

Source
Ann. Statist. Volume 25, Number 2 (1997), 553-576.

Dates
First available in Project Euclid: 12 September 2002

Permanent link to this document
http://projecteuclid.org/euclid.aos/1031833664

Digital Object Identifier
doi:10.1214/aos/1031833664

Mathematical Reviews number (MathSciNet)
MR1439314

Zentralblatt MATH identifier
0878.62045

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]; 60F15: Strong theorems
Secondary: 62F35: Robustness and adaptive procedures

Keywords
$k$-means; trimmed $k$-means; clustering methods; consistency; robustness

Citation

Cuesta-Albertos, J. A.; Gordaliza, A.; Matrán, C. Trimmed $k$-means: an attempt to robustify quantizers. Ann. Statist. 25 (1997), no. 2, 553--576. doi:10.1214/aos/1031833664. http://projecteuclid.org/euclid.aos/1031833664.



References

  • ARCONES, M. A. and GINÉ, E. 1992. On the bootstrap of M-estimators and other statistical functions. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.) 13-47. Wiley, New York.
  • CAMBANIS, S. and GERR, N. L. 1983. A simple class of asymptotically optimal quantizers. IEEE Trans. Inform. Theory IT-29 664-676.
  • CUESTA-ALBERTOS, J. A. and MATRÁN, C. 1988. The strong law of large numbers for k-means and best possible nets of Banach valued random variables. Probab. Theory Related Fields 78 523-534.
  • CUESTA-ALBERTOS, J. A., GORDALIZA, A. and MATRÁN, C. 1995. On the Cauchy mean value property for k-means. Multivariate Statistics. Proceedings of the Fifth Tartu Conference on Multivariate Statistics 247-265. VSP TEV, Vilnius, Lithuania.
  • CUESTA-ALBERTOS, J. A., GORDALIZA, A. and MATRÁN, C. 1996. Trimmed k-nets. Preprint.
  • GORDALIZA, A. 1991a. Best approximations to random variables based on trimming procedures. J. Approx. Theory 64 162-180.
  • GORDALIZA, A. 1991b. On the breakdown point of multivariate location estimators based on trimming procedures. Statist. Probab. Lett. 11 387-394.
  • HARTIGAN, J. 1975. Clustering Algorithms. Wiley, New York.
  • HARTIGAN, J. 1978. Asymptotic distributions for clustering criteria. Ann. Statist. 6 117-131.
  • IEEE 1982. IEEE Trans. Inform. Theory IT-28.
  • KAUFMAN, L. and ROUSSEEUW, P. J. 1990. Finding Groups in Data. An Introduction to Cluster Analysis. Wiley, New York.
  • POLLARD, D. 1981. Strong consistency of k-means clustering. Ann. Statist. 9 135-140.
  • POLLARD, D. 1982. A central limit theorem for k-means clustering. Ann. Probab. 10 919-926.
  • ROUSSEEUW, P. J. and LEROY, A. 1987. Robust Regression and Outlier Detection. Wiley, New York.
  • SERINKO, R. J. and BABU, G. J. 1992. Weak limit theorems for univariate k-mean clustering under a nonregular condition. J. Multivariate Anal. 41 273-296.
  • SVERDRUP-THYGESON, H. 1981. Strong law of large numbers for measures of central tendency and dispersion of random variables in compact metric spaces. Ann. Statist. 9 141-145.
  • TARPEY, T., LI, L. and FLURY, B. 1995. Principal points and self-consistent points of elliptical distributions. Ann. Statist. 23 103-112.