The Annals of Probability
- Ann. Probab.
- Volume 10, Number 4 (1982), 919-926.
A Central Limit Theorem for $k$-Means Clustering
A set of $n$ points in Euclidean space is partitioned into the $k$ groups that minimize the within-groups sum of squares. Under the assumption that the $n$ points are sampled independently from a fixed distribution, conditions are found that ensure asymptotic normality of the vector of means of the $k$ groups. The method of proof makes novel application of a functional central limit theorem for empirical processes, a generalization of Donsker's theorem due to Dudley.
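To make the object of the theorem concrete, the following is a minimal sketch of the partition described in the abstract, restricted to the simplest case of $k = 2$ groups on the real line. That restriction, and the helper names `within_ss` and `two_means_1d`, are assumptions made for illustration only; they are not taken from the paper, which treats general $k$ in Euclidean space. In one dimension an optimal partition consists of contiguous blocks of the sorted sample, so scanning every split point yields the exact minimizer of the within-groups sum of squares.

```python
from typing import Optional, Sequence, Tuple


def within_ss(points: Sequence[float]) -> float:
    """Sum of squared deviations of the points from their own mean."""
    if not points:
        return 0.0
    mean = sum(points) / len(points)
    return sum((x - mean) ** 2 for x in points)


def two_means_1d(points: Sequence[float]) -> Tuple[Tuple[float, float], float]:
    """Exact 2-means on the line (hypothetical helper, illustration only).

    Tries every split of the sorted sample into a left and a right block
    and keeps the split with the smallest within-groups sum of squares.
    Returns the pair of group means and the minimized sum of squares.
    """
    xs = sorted(points)
    if len(xs) < 2:
        raise ValueError("need at least two points")
    best_means: Optional[Tuple[float, float]] = None
    best_wss = float("inf")
    for cut in range(1, len(xs)):
        left, right = xs[:cut], xs[cut:]
        wss = within_ss(left) + within_ss(right)
        if wss < best_wss:
            best_wss = wss
            best_means = (sum(left) / len(left), sum(right) / len(right))
    assert best_means is not None
    return best_means, best_wss


# Small deterministic example: two well-separated pairs of points.
means, wss = two_means_1d([0.0, 1.0, 10.0, 11.0])
```

The pair `means` plays the role of the vector of group means whose fluctuations, as $n$ grows under independent sampling, the theorem describes as asymptotically normal. The sketch computes the estimator only; it does not verify the conditions the paper requires.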
First available in Project Euclid: 19 April 2007
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 60F05: Central limit and other weak theorems. 60F17: Functional limit theorems; invariance principles
$k$-means clustering; central limit theorem; minimized within cluster sum of squares; differentiability in quadratic mean; Donsker classes of functions; functional central limit theorem for empirical processes
Pollard, David. A Central Limit Theorem for $k$-Means Clustering. Ann. Probab. 10 (1982), no. 4, 919--926. doi:10.1214/aop/1176993713. https://projecteuclid.org/euclid.aop/1176993713