Open Access
Learning mixtures of separated nonspherical Gaussians
Sanjeev Arora, Ravi Kannan
Ann. Appl. Probab. 15(1A): 69-92 (February 2005). DOI: 10.1214/105051604000000512


Mixtures of Gaussian (or normal) distributions arise in a variety of application areas. Many heuristics have been proposed for the task of finding the component Gaussians given samples from the mixture, such as the EM algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy. Statist. Soc. Ser. B 39 (1977) 1–38]. These do not provably run in polynomial time.

We present the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension. The Gaussians may have arbitrary shape, but they must satisfy a “separation condition” which places a lower bound on the distance between the centers of any two component Gaussians. The mathematical results at the heart of our proof are “distance concentration” results—proved using isoperimetric inequalities—which establish bounds on the probability distribution of the distance between a pair of points generated according to the mixture.
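The "distance concentration" idea can be illustrated numerically. The sketch below is not the paper's algorithm and does not handle nonspherical Gaussians: it assumes two spherical Gaussians with a common standard deviation `sigma` in dimension `n`, and all names and parameter values (`squared_distance_samples`, `separation`, `pairs`) are hypothetical choices made for illustration. Under those assumptions, the squared distance between two points from the same component has mean 2·sigma²·n, while for points from different components the mean grows by the squared distance between the centers, so a large enough separation makes the two kinds of pairs distinguishable by distance alone.

```python
import random


def sample_spherical(center, sigma, rng):
    """Draw one point from a spherical Gaussian N(center, sigma^2 * I)."""
    return [c + rng.gauss(0.0, sigma) for c in center]


def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def squared_distance_samples(n=400, sigma=1.0, separation=30.0, pairs=200, seed=0):
    """Return squared distances for same-component and cross-component pairs.

    Illustrative assumption: two spherical components whose centers differ
    only in the first coordinate, by `separation`.
    """
    rng = random.Random(seed)
    mu1 = [0.0] * n
    mu2 = [separation] + [0.0] * (n - 1)
    same = [
        sq_dist(sample_spherical(mu1, sigma, rng), sample_spherical(mu1, sigma, rng))
        for _ in range(pairs)
    ]
    cross = [
        sq_dist(sample_spherical(mu1, sigma, rng), sample_spherical(mu2, sigma, rng))
        for _ in range(pairs)
    ]
    return same, cross


if __name__ == "__main__":
    same, cross = squared_distance_samples()
    # Same-component pairs cluster near 2 * sigma^2 * n; cross-component
    # pairs cluster near 2 * sigma^2 * n + separation^2.
    print("same-component  mean:", sum(same) / len(same))
    print("cross-component mean:", sum(cross) / len(cross))
    print("largest same-component distance:", max(same))
    print("smallest cross-component distance:", min(cross))
```

With these illustrative parameters the separation is well above n^(1/4), so the fluctuation of the squared distances (of order sigma²·√n) is small compared with the gap between the two means. Shrinking `separation` toward zero makes the two sets of distances overlap.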

We also formalize the more general problem of max-likelihood fit of a Gaussian mixture to unstructured data.
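For orientation, the textbook form of the maximum-likelihood objective for a mixture of k Gaussians is written out here. This is the standard formulation, given as an assumption; the paper's own formalization may differ in its details. The symbols (data points x_1, ..., x_m, mixing weights w_i, means μ_i, covariance matrices Σ_i) are notation introduced for this illustration.

```latex
\[
  \max_{\{w_i,\ \mu_i,\ \Sigma_i\}_{i=1}^{k}}
  \ \sum_{j=1}^{m} \log \Biggl( \sum_{i=1}^{k} w_i \,
    \mathcal{N}\bigl(x_j ;\ \mu_i, \Sigma_i\bigr) \Biggr)
  \qquad \text{subject to } w_i \ge 0,\quad \sum_{i=1}^{k} w_i = 1 .
\]
```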



Sanjeev Arora, Ravi Kannan. "Learning mixtures of separated nonspherical Gaussians." Ann. Appl. Probab. 15(1A): 69-92, February 2005.


Published: February 2005
First available in Project Euclid: 28 January 2005

zbMATH: 1059.62062
MathSciNet: MR2115036
Digital Object Identifier: 10.1214/105051604000000512

Primary: 62-07, 62H30, 62N02, 68T05

Keywords: clustering, efficient algorithms, estimation, Gaussian mixtures, isoperimetric inequalities, learning, mixture models

Rights: Copyright © 2005 Institute of Mathematical Statistics
