## The Annals of Probability

- Ann. Probab.
- Volume 10, Number 4 (1982), 919-926.

### A Central Limit Theorem for $k$-Means Clustering

#### Abstract

A set of $n$ points in Euclidean space is partitioned into the $k$ groups that minimize the within-group sum of squares. Under the assumption that the $n$ points are sampled independently from a fixed distribution, conditions are found that ensure asymptotic normality of the vector of means of the $k$ groups. The method of proof makes novel application of a functional central limit theorem for empirical processes, a generalization of Donsker's theorem due to Dudley.
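As an illustration of the criterion the abstract refers to, the following minimal sketch computes the within-group sum of squares and finds its exact minimizer for the special case $k = 2$ on the real line. It is not taken from the paper: the function names `wss` and `exact_two_means_1d` are hypothetical, and the restriction to one dimension is an assumption made so that the global minimum can be found by trying every split of the sorted sample (optimal groups on the line are contiguous intervals).

```python
def wss(points, centers):
    """Within-group sum of squares: each point is charged the squared
    distance to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in points)


def exact_two_means_1d(points):
    """Exact 2-means for points on the real line (illustrative assumption:
    one dimension, k = 2). Tries every split of the sorted sample and keeps
    the pair of group means with the smallest within-group sum of squares."""
    xs = sorted(points)
    best_cost, best_centers = None, None
    for i in range(1, len(xs)):
        left, right = xs[:i], xs[i:]
        centers = (sum(left) / len(left), sum(right) / len(right))
        cost = wss(xs, centers)
        if best_cost is None or cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers, best_cost


if __name__ == "__main__":
    sample = [0.0, 1.0, 10.0, 11.0]
    centers, cost = exact_two_means_1d(sample)
    print(centers, cost)
```

The vector of group means returned here is the sample quantity whose asymptotic normality the paper studies. Iterative heuristics such as Lloyd's algorithm may stop at a local minimum, which is why this sketch enumerates the splits instead.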

#### Article information

**Source**

Ann. Probab., Volume 10, Number 4 (1982), 919-926.

**Dates**

First available in Project Euclid: 19 April 2007

**Permanent link to this document**

https://projecteuclid.org/euclid.aop/1176993713

**Digital Object Identifier**

doi:10.1214/aop/1176993713

**Mathematical Reviews number (MathSciNet)**

MR672292

**Zentralblatt MATH identifier**

0502.62055

**Subjects**

Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]

Secondary:

- 60F05: Central limit and other weak theorems
- 60F17: Functional limit theorems; invariance principles

**Keywords**

$k$-means clustering; central limit theorem; minimized within cluster sum of squares; differentiability in quadratic mean; Donsker classes of functions; functional central limit theorem for empirical processes

#### Citation

Pollard, David. A Central Limit Theorem for $k$-Means Clustering. Ann. Probab. 10 (1982), no. 4, 919--926. doi:10.1214/aop/1176993713. https://projecteuclid.org/euclid.aop/1176993713