The Annals of Statistics

Convergence of a stochastic approximation version of the EM algorithm

Bernard Delyon, Marc Lavielle, and Eric Moulines


The expectation-maximization (EM) algorithm is a powerful computational technique for locating maxima of functions. It is widely used in statistics for maximum likelihood or maximum a posteriori estimation in incomplete data models. In certain situations, however, this method is not applicable because the expectation step cannot be performed in closed form. To deal with these problems, a novel method is introduced, the stochastic approximation EM (SAEM), which replaces the expectation step of the EM algorithm by one iteration of a stochastic approximation procedure. The convergence of the SAEM algorithm is established under conditions that are applicable to many practical situations. Moreover, it is proved that, under mild additional conditions, the attractive stationary points of the SAEM algorithm correspond to the local maxima of the function. Examples are presented to support these findings.
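To make the scheme in the abstract concrete, the following sketch applies an SAEM-style iteration to a toy two-component Gaussian mixture with fixed weights and unit variances. The model, the data, and the step-size choice gamma_k = 1/k are illustrative assumptions for this sketch, not the paper's own examples: the intractable E-step expectation is replaced by a single simulation of the missing labels, followed by a Robbins-Monro smoothing of the complete-data sufficient statistics and a closed-form M-step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical incomplete-data setup: observations from a two-component
# Gaussian mixture with unit variances; the component labels are missing.
true_mu = np.array([-2.0, 3.0])
n = 500
z_true = rng.integers(0, 2, size=n)
y = rng.normal(true_mu[z_true], 1.0)

mu = np.array([-0.5, 0.5])   # initial guess for the component means
w = np.array([0.5, 0.5])     # mixture weights, held fixed here
s_count = np.zeros(2)        # smoothed sufficient statistics:
s_sum = np.zeros(2)          # per-component counts and sums

for k in range(1, 201):
    # Simulation step: draw the missing labels from their posterior under
    # the current parameters (one draw replaces the E-step expectation).
    log_p = -0.5 * (y[:, None] - mu[None, :]) ** 2 + np.log(w)
    p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = (rng.random(n) < p[:, 1]).astype(int)

    # Stochastic approximation step: Robbins-Monro update of the
    # complete-data sufficient statistics with step size gamma_k = 1/k.
    gamma = 1.0 / k
    cnt = np.array([(z == 0).sum(), (z == 1).sum()], dtype=float)
    ssum = np.array([y[z == 0].sum(), y[z == 1].sum()])
    s_count += gamma * (cnt - s_count)
    s_sum += gamma * (ssum - s_sum)

    # Maximization step: closed-form update from the smoothed statistics.
    mu = s_sum / np.maximum(s_count, 1e-12)

print(mu)  # the estimated means should settle near the true values
```

The decreasing step size is what distinguishes SAEM from a plain stochastic EM: early, noisy simulations are progressively averaged out, so the parameter sequence stabilizes rather than fluctuating indefinitely.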

Article information

Ann. Statist., Volume 27, Number 1 (1999), 94-128.

First available in Project Euclid: 5 April 2002

Primary: 65U05 62F10: Point estimation
Secondary: 62M30: Spatial processes 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Keywords: Incomplete data; optimization; maximum likelihood; missing data; Monte Carlo algorithm; EM algorithm; simulation; stochastic algorithm


Delyon, Bernard; Lavielle, Marc; Moulines, Eric. Convergence of a stochastic approximation version of the EM algorithm. Ann. Statist. 27 (1999), no. 1, 94--128. doi:10.1214/aos/1018031103.

