Electronic Journal of Statistics

Asymptotics of a clustering criterion for smooth distributions

Karthik Bharath, Vladimir Pozdnyakov, and Dipak K. Dey

Full-text: Open access

Abstract

We develop a clustering framework for observations from a population with a smooth probability distribution function and derive its asymptotic properties. We propose a clustering criterion based on a linear combination of order statistics and examine the asymptotic behavior of the point at which the observations are split into two clusters. These results can be used to construct an interval estimate of the split point and to develop tests for bimodality and for the presence of clusters.
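To make the idea of a split point concrete, the following is a minimal sketch of the classical univariate two-means split, which chooses the cut in the sorted sample that minimizes the within-cluster sum of squares. It is an illustration only and is not the criterion proposed in the paper; the function name `two_means_split` and the midpoint convention for reporting the split point are assumptions made here.

```python
def two_means_split(x):
    """Split a univariate sample into two clusters at the best cut of the sorted data.

    Illustrative stand-in (classical two-means), NOT the paper's criterion.
    Returns (k, split_point, wss): the left cluster holds the k smallest
    observations, split_point is the midpoint between the two order
    statistics adjacent to the cut (an assumed convention), and wss is
    the minimized within-cluster sum of squares.
    """
    xs = sorted(float(v) for v in x)  # order statistics
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    total = sum(xs)
    total_sq = sum(v * v for v in xs)

    best_k, best_wss = None, float("inf")
    left_sum = 0.0
    left_sq = 0.0
    for k in range(1, n):  # k = size of the left cluster
        v = xs[k - 1]
        left_sum += v
        left_sq += v * v
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squares about each cluster mean: sum(v^2) - (sum v)^2 / size
        wss = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if wss < best_wss:
            best_k, best_wss = k, wss

    split_point = 0.5 * (xs[best_k - 1] + xs[best_k])
    return best_k, split_point, best_wss


k, split_point, wss = two_means_split([5.1, 0.0, 0.2, 5.0, 0.1, 5.2])
print(k, split_point, wss)
```

Because the cut is a function of the order statistics alone, the split point is a random quantity whose sampling behavior can be studied, which is the kind of question the abstract describes.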

Article information

Source
Electron. J. Statist., Volume 7 (2013), 1078-1093.

Dates
First available in Project Euclid: 15 April 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1366031051

Digital Object Identifier
doi:10.1214/13-EJS801

Mathematical Reviews number (MathSciNet)
MR3044510

Zentralblatt MATH identifier
1336.62172

Subjects
Primary:
62F05: Asymptotic properties of tests
62G30: Order statistics; empirical distribution functions
Secondary:
60F17: Functional limit theorems; invariance principles
62M02: Markov processes: hypothesis testing

Keywords
Clustering; trimmed means; CLT

Citation

Bharath, Karthik; Pozdnyakov, Vladimir; Dey, Dipak K. Asymptotics of a clustering criterion for smooth distributions. Electron. J. Statist. 7 (2013), 1078-1093. doi:10.1214/13-EJS801. https://projecteuclid.org/euclid.ejs/1366031051


