A Central Limit Theorem for $k$-Means Clustering
David Pollard
Ann. Probab. 10(4): 919-926 (November, 1982). DOI: 10.1214/aop/1176993713


A set of $n$ points in Euclidean space is partitioned into the $k$ groups that minimize the within-group sum of squares. Under the assumption that the $n$ points are sampled independently from a fixed distribution, conditions are found that ensure asymptotic normality of the vector of means of the $k$ groups. The method of proof makes a novel application of a functional central limit theorem for empirical processes, a generalization of Donsker's theorem due to Dudley.
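
As a rough sketch of the objects the abstract refers to, the criterion and the shape of the limit statement can be written out as follows. The notation here ($x_i$, $a_j$, $W_n$, $W$, $P$, $b_n$, $\mu$, $\Sigma$) is assumed for illustration and is not taken from this page; the precise regularity conditions and the form of the limiting covariance are given in the paper itself. For sample points $x_1, \dots, x_n$ and candidate centers $a_1, \dots, a_k$, the within-group sum of squares, normalized by $n$ (which does not change the minimizer), is

$$W_n(a_1, \dots, a_k) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - a_j \rVert^2 .$$

Minimizing over centers is equivalent to minimizing over partitions, because for any fixed partition the best center for each group is that group's mean. If $P$ denotes the sampling distribution, the population counterpart is

$$W(a_1, \dots, a_k) = \int \min_{1 \le j \le k} \lVert x - a_j \rVert^2 \, P(dx) .$$

Writing $b_n$ for the vector of optimal sample centers and $\mu$ for a minimizer of $W$, assumed unique up to relabeling of the groups, asymptotic normality means a statement of the form

$$\sqrt{n}\,(b_n - \mu) \xrightarrow{d} N(0, \Sigma),$$

where $\Sigma$ is a placeholder for a covariance matrix determined by $P$.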


David Pollard. "A Central Limit Theorem for $k$-Means Clustering." Ann. Probab. 10 (4) 919 - 926, November, 1982.


Published: November, 1982
First available in Project Euclid: 19 April 2007

zbMATH: 0502.62055
MathSciNet: MR672292
Digital Object Identifier: 10.1214/aop/1176993713

Primary: 62H30
Secondary: 60F05, 60F17

Keywords: $k$-means clustering, central limit theorem, differentiability in quadratic mean, Donsker classes of functions, functional central limit theorem for empirical processes, minimized within cluster sum of squares

Rights: Copyright © 1982 Institute of Mathematical Statistics
