The Annals of Probability

A Central Limit Theorem for $k$-Means Clustering

David Pollard

Abstract

A set of $n$ points in Euclidean space is partitioned into the $k$ groups that minimize the within-group sum of squares. Under the assumption that the $n$ points are an independent sample from a fixed distribution, conditions are found that ensure asymptotic normality of the vector of means of the $k$ groups. The method of proof makes novel use of a functional central limit theorem for empirical processes, a generalization of Donsker's theorem due to Dudley.
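
As an illustration of the partitioning criterion only (it is not taken from the paper), the following minimal sketch computes the exact minimizer of the within-group sum of squares in the simplest setting: points on the real line and $k = 2$. It relies on the fact that in one dimension each optimal group is a contiguous run of the sorted points, so every split point can be checked in turn. The function name `two_means_1d` and the sample data are hypothetical choices made for this example.

```python
def two_means_1d(xs):
    """Exact 2-means for points on the real line (illustrative sketch).

    Returns (left_mean, right_mean, cost), where cost is the minimized
    within-group sum of squares over all partitions into two non-empty groups.
    """
    pts = sorted(xs)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points to form two groups")

    # Prefix sums of x and x^2 give each group's sum of squares in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in pts:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def group_ss(lo, hi):
        """Sum of squared deviations from the mean for pts[lo:hi]."""
        size = hi - lo
        total = prefix[hi] - prefix[lo]
        return (prefix_sq[hi] - prefix_sq[lo]) - total * total / size

    best = None
    # In one dimension the optimal groups are contiguous, so try every split.
    for split in range(1, n):
        cost = group_ss(0, split) + group_ss(split, n)
        if best is None or cost < best[2]:
            left_mean = prefix[split] / split
            right_mean = (prefix[n] - prefix[split]) / (n - split)
            best = (left_mean, right_mean, cost)
    return best


print(two_means_1d([0.0, 1.0, 10.0, 11.0]))  # → (0.5, 10.5, 1.0)
```

The returned pair of group means is the finite-sample quantity whose asymptotic distribution the abstract describes. Applying the function to repeated independent samples of growing size is one informal way to observe the fluctuations of these means.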

Article information

Source
Ann. Probab., Volume 10, Number 4 (1982), 919-926.

Dates
First available in Project Euclid: 19 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aop/1176993713

Digital Object Identifier
doi:10.1214/aop/1176993713

Mathematical Reviews number (MathSciNet)
MR672292

Zentralblatt MATH identifier
0502.62055

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 60F05: Central limit and other weak theorems
60F17: Functional limit theorems; invariance principles

Keywords
$k$-means clustering; central limit theorem; minimized within cluster sum of squares; differentiability in quadratic mean; Donsker classes of functions; functional central limit theorem for empirical processes

Citation

Pollard, David. A Central Limit Theorem for $k$-Means Clustering. Ann. Probab. 10 (1982), no. 4, 919--926. doi:10.1214/aop/1176993713. https://projecteuclid.org/euclid.aop/1176993713