## The Annals of Statistics

- Ann. Statist.
- Volume 45, Number 1 (2017), 77-120.

### Statistical guarantees for the EM algorithm: From population to sample-based analysis

Sivaraman Balakrishnan, Martin J. Wainwright, and Bin Yu

#### Abstract

The EM algorithm is a widely used tool in maximum-likelihood estimation in incomplete data problems. Existing theoretical work has focused on conditions under which the iterates or likelihood values converge, and the associated rates of convergence. Such guarantees do not distinguish whether the ultimate fixed point is a near global optimum or a bad local optimum of the sample likelihood, nor do they relate the obtained fixed point to the global optima of the idealized population likelihood (obtained in the limit of infinite data). This paper develops a theoretical framework for quantifying when and how quickly EM-type iterates converge to a small neighborhood of a given global optimum of the population likelihood. For correctly specified models, such a characterization yields rigorous guarantees on the performance of certain two-stage estimators in which a suitable initial pilot estimator is refined with iterations of the EM algorithm. Our analysis is divided into two parts: a treatment of the EM and first-order EM algorithms at the population level, followed by results that apply to these algorithms on a finite set of samples. Our conditions allow for a characterization of the region of convergence of EM-type iterates to a given population fixed point, that is, the region of the parameter space over which convergence is guaranteed to a point within a small neighborhood of the specified population fixed point. We verify our conditions and give tight characterizations of the region of convergence for three canonical problems of interest: symmetric mixture of two Gaussians, symmetric mixture of two regressions and linear regression with covariates missing completely at random.
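The first canonical problem in the abstract is the symmetric two-component mixture 0.5·N(θ*, σ²I) + 0.5·N(−θ*, σ²I), which has a closed-form EM update. Below is a minimal pure-Python sketch, assuming a known noise level σ. The function names (`sample_symmetric_mixture`, `em_operator`, `run_em`) are hypothetical and are not taken from the paper. The sketch relies on the identity 2w_θ(y) − 1 = tanh(⟨θ, y⟩/σ²), where w_θ(y) is the posterior weight on the +θ component, so the sample EM update becomes M_n(θ) = (1/n) Σᵢ tanh(⟨θ, yᵢ⟩/σ²) yᵢ. Passing `step` switches to a first-order (gradient) EM update θ + step·(M_n(θ) − θ). For this model, that update moves along ∇Q(θ | θ) up to a positive scaling.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sample_symmetric_mixture(theta_star, sigma, n, seed=0):
    """Draw n points from 0.5*N(theta_star, sigma^2 I) + 0.5*N(-theta_star, sigma^2 I)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        sign = 1.0 if rng.random() < 0.5 else -1.0  # latent component label
        data.append([sign * t + rng.gauss(0.0, sigma) for t in theta_star])
    return data


def em_operator(theta, data, sigma):
    """Sample EM update M_n(theta) = (1/n) * sum_i tanh(<theta, y_i> / sigma^2) * y_i.

    tanh(<theta, y> / sigma^2) equals 2 * w_theta(y) - 1, where w_theta(y) is the
    E-step posterior probability that y was drawn from the +theta component.
    """
    n = len(data)
    out = [0.0] * len(theta)
    for y in data:
        weight = math.tanh(dot(theta, y) / sigma ** 2)
        for j, yj in enumerate(y):
            out[j] += weight * yj / n
    return out


def run_em(theta0, data, sigma, iters=50, step=None):
    """Standard EM when step is None; otherwise first-order EM with the given step size."""
    theta = list(theta0)
    for _ in range(iters):
        m = em_operator(theta, data, sigma)
        if step is None:
            theta = m  # full M-step
        else:
            # gradient step along M_n(theta) - theta, proportional to grad Q(theta | theta)
            theta = [t + step * (mj - t) for t, mj in zip(theta, m)]
    return theta


data = sample_symmetric_mixture([2.0, 0.0], sigma=1.0, n=2000, seed=1)
theta_em = run_em([1.0, 0.5], data, sigma=1.0)
theta_gem = run_em([1.0, 0.5], data, sigma=1.0, iters=200, step=0.5)
print(theta_em, theta_gem)
```

With θ* = [2.0, 0.0] and σ = 1, both variants started from [1.0, 0.5] should settle near θ*. Started from the reflected point [−1.0, −0.5], they should settle near −θ*, which reflects the sign symmetry of the model. The sketch does not estimate the size of the region of convergence or the distance between the sample fixed point and θ*. Those quantities are what the paper's guarantees address.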

#### Article information

**Source**

Ann. Statist., Volume 45, Number 1 (2017), 77-120.

**Dates**

Received: September 2015

Revised: January 2016

First available in Project Euclid: 21 February 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1487667618

**Digital Object Identifier**

doi:10.1214/16-AOS1435

**Mathematical Reviews number (MathSciNet)**

MR3611487

**Zentralblatt MATH identifier**

1367.62052

**Subjects**

Primary:

- 62F10: Point estimation
- 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Secondary: 90C30: Nonlinear programming

**Keywords**

EM algorithm; first-order EM algorithm; nonconvex optimization; maximum likelihood estimation

#### Citation

Balakrishnan, Sivaraman; Wainwright, Martin J.; Yu, Bin. Statistical guarantees for the EM algorithm: From population to sample-based analysis. Ann. Statist. 45 (2017), no. 1, 77--120. doi:10.1214/16-AOS1435. https://projecteuclid.org/euclid.aos/1487667618

#### Supplemental materials

- Supplement to “Statistical guarantees for the EM algorithm: From population to sample-based analysis”. The supplement [3] contains all remaining technical proofs omitted from the main text due to space constraints. Digital Object Identifier: doi:10.1214/16-AOS1435SUPP