## Electronic Journal of Statistics

### Statistical convergence of the EM algorithm on Gaussian mixture models

#### Abstract

We study the convergence behavior of the Expectation-Maximization (EM) algorithm on Gaussian mixture models with an arbitrary number of mixture components and arbitrary mixing weights. We show that as long as the means of the components are separated by at least $\Omega(\sqrt{\min\{M,d\}})$, where $M$ is the number of components and $d$ is the dimension, the EM algorithm converges locally to the global optimum of the log-likelihood. Further, we show that the convergence rate is linear and characterize the size of the basin of attraction to the global optimum.
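As a rough illustration of the setting described in the abstract, the following is a minimal sketch of EM on a Gaussian mixture, started from an initialization near the true means. It is not taken from the paper: it assumes identity covariances and updates only the means and mixing weights, and the function names (`em_step`, `run_em`), the separation, the sample size, and the size of the initial perturbation are all hypothetical choices made for the example.

```python
import numpy as np


def em_step(X, means, weights):
    """One EM iteration for a mixture of identity-covariance Gaussians (assumed setting)."""
    # E-step: log(weight_k * N(x_i | mu_k, I)), dropping constants shared by all k.
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_resp = np.log(weights)[None, :] - 0.5 * sq_dist
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: responsibility-weighted averages give the new means and weights.
    nk = resp.sum(axis=0)
    new_means = (resp.T @ X) / nk[:, None]
    new_weights = nk / X.shape[0]
    return new_means, new_weights


def run_em(X, means, weights, n_iter=30):
    """Run a fixed number of EM iterations from the given initialization."""
    for _ in range(n_iter):
        means, weights = em_step(X, means, weights)
    return means, weights


# Illustrative data: M = 3 well-separated components in d = 2 dimensions.
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
true_weights = np.array([0.5, 0.3, 0.2])
labels = rng.choice(3, size=3000, p=true_weights)
X = true_means[labels] + rng.standard_normal((3000, 2))

# Local initialization: every mean is shifted away from the truth by (1, 1),
# and the mixing weights start out uniform.
est_means, est_weights = run_em(X, true_means + 1.0, np.full(3, 1.0 / 3.0))
print("max mean error:", np.abs(est_means - true_means).max())
print("estimated weights:", est_weights)
```

With an initialization this close to the truth, the iterates in this sketch settle near the true parameters up to sampling error; how large the perturbation may be before that stops holding is the basin-of-attraction question the paper addresses.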

#### Article information

Source
Electron. J. Statist., Volume 14, Number 1 (2020), 632–660.

Dates
First available in Project Euclid: 28 January 2020

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1580202033

Digital Object Identifier
doi:10.1214/19-EJS1660

Mathematical Reviews number (MathSciNet)
MR4056269

Zentralblatt MATH identifier
07163269

Subjects
Primary: 62F10: Point estimation

#### Citation

Zhao, Ruofei; Li, Yuanzhi; Sun, Yuekai. Statistical convergence of the EM algorithm on Gaussian mixture models. Electron. J. Statist. 14 (2020), no. 1, 632–660. doi:10.1214/19-EJS1660. https://projecteuclid.org/euclid.ejs/1580202033
