Statistical Science

Stochastic Approximation and Newton’s Estimate of a Mixing Distribution

Ryan Martin and Jayanta K. Ghosh

Full-text: Open access


Many statistical problems involve mixture models, and the need for computationally efficient methods to estimate the mixing distribution has increased dramatically in recent years. Newton [Sankhyā Ser. A 64 (2002) 306–322] proposed a fast recursive algorithm for estimating the mixing distribution, which we study as a special case of stochastic approximation (SA). We begin with a review of SA, some recent statistical applications, and the theory necessary for analysis of an SA algorithm, which includes Lyapunov functions and ODE stability theory. Standard SA results are then used to prove consistency of Newton's estimate in the case of a finite mixture. We also propose a modification of Newton's algorithm that allows for estimation of an additional unknown parameter in the model, and prove its consistency.
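Newton's recursive estimate, as described in the abstract, updates the current estimate of the mixing distribution by a convex combination with the posterior over the mixing support given each new observation. The sketch below illustrates the recursion for a finite mixture; the normal kernel, the two-point support, and the step sizes w_i = 1/(i + 2) are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def newton_recursive_estimate(x, support, sigma=1.0):
    """Recursively estimate mixing weights over a finite `support`.

    One pass of Newton's recursion: given observation x_i, compute the
    posterior over the support points and move the current estimate a
    step w_i toward it.
    """
    m = np.full(len(support), 1.0 / len(support))  # uniform initial guess
    for i, xi in enumerate(x):
        wi = 1.0 / (i + 2)  # decreasing, Robbins-Monro-type step size
        # Kernel p(x_i | theta_j): here an (unnormalized) N(theta_j, sigma^2)
        # density; the normalizing constant cancels in the posterior.
        k = np.exp(-0.5 * ((xi - support) / sigma) ** 2)
        post = m * k
        post /= post.sum()               # posterior over support given x_i
        m = (1 - wi) * m + wi * post     # Newton's convex-combination update
    return m

# Toy example: two well-separated components with true weights (0.3, 0.7).
rng = np.random.default_rng(0)
support = np.array([-2.0, 2.0])
theta = rng.choice(support, size=2000, p=[0.3, 0.7])
x = theta + rng.normal(size=2000)
m_hat = newton_recursive_estimate(x, support)
```

With well-separated components the recursion essentially averages near-one-hot posteriors, so `m_hat` approaches the true weights as more data arrive, consistent with the finite-mixture consistency result proved in the paper.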

Article information

Statist. Sci., Volume 23, Number 3 (2008), 365–382.

First available in Project Euclid: 28 January 2009


Keywords: Stochastic approximation; empirical Bayes; mixture models; Lyapunov functions


Martin, Ryan; Ghosh, Jayanta K. Stochastic Approximation and Newton’s Estimate of a Mixing Distribution. Statist. Sci. 23 (2008), no. 3, 365–382. doi:10.1214/08-STS265.



  • [1] Allison, D., Gadbury, G., Heo, M., Fernández, J., Lee, C., Prolla, T. and Weindruch, R. (2002). A mixture model approach for the analysis of microarray gene expression data. Comput. Statist. Data Anal. 39 1–20.
  • [2] Andrieu, C., Moulines, E. and Priouret, P. (2005). Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim. 44 283–312.
  • [3] Barron, A., Schervish, M. and Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems. Ann. Statist. 27 536–561.
  • [4] Cipra, B. (1987). Introduction to the Ising model. Amer. Math. Monthly 94 937–959.
  • [5] Csiszár, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3 146–158.
  • [6] Delyon, B., Lavielle, M. and Moulines, E. (1999). Convergence of a stochastic approximation version of the EM algorithm. Ann. Statist. 27 98–128.
  • [7] Dempster, A., Laird, N. and Rubin, D. (1977). Maximum-likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1–38.
  • [8] Efron, B. and Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genet. Epidemiol. 23 70–86.
  • [9] Efron, B., Tibshirani, R., Storey, J. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160.
  • [10] Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19 1257–1272.
  • [11] Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (1999). Posterior consistency of Dirichlet mixtures in density estimation. Ann. Statist. 27 143–158.
  • [12] Ghosal, S. and van der Vaart, A. (2001). Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Statist. 29 1233–1263.
  • [13] Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. Springer, New York.
  • [14] Ghosh, J. and Tokdar, S. (2006). Convergence and consistency of Newton’s algorithm for estimating a mixing distribution. In Frontiers in Statistics 429–443. Imp. Coll. Press, London.
  • [15] Gilks, W., Roberts, G. and Sahu, S. (1998). Adaptive Markov chain Monte Carlo through regeneration. J. Amer. Statist. Assoc. 93 1045–1054.
  • [16] Haario, H., Saksman, E. and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli 7 223–242.
  • [17] Isaacson, D. L. and Madsen, R. W. (1976). Markov Chains: Theory and Applications. Wiley, New York.
  • [18] Kou, S. C., Zhou, Q. and Wong, W. H. (2006). Equi-energy sampler with applications in statistical inference and statistical mechanics. Ann. Statist. 34 1581–1619.
  • [19] Kushner, H. J. and Yin, G. G. (2003). Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed. Springer, New York.
  • [20] LaSalle, J. and Lefschetz, S. (1961). Stability by Liapunov’s Direct Method with Applications. Academic Press, New York.
  • [21] Liang, F., Liu, C. and Carroll, R. J. (2007). Stochastic approximation in Monte Carlo computation. J. Amer. Statist. Assoc. 102 305–320.
  • [22] Lindsay, B. (1995). Mixture Models: Theory, Geometry and Applications. IMS, Hayward, CA.
  • [23] Liu, J. S. (1996). Nonparametric hierarchical Bayes via sequential imputations. Ann. Statist. 24 911–930.
  • [24] McLachlan, G., Bean, R. and Peel, D. (2002). A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18 413–422.
  • [25] Nevel’son, M. B. and Has’minskii, R. Z. (1973). Stochastic Approximation and Recursive Estimation. Translations of Mathematical Monographs 47. Amer. Math. Soc., Providence, RI.
  • [26] Newton, M. A. (2002). A nonparametric recursive estimator of the mixing distribution. Sankhyā Ser. A 64 306–322.
  • [27] Newton, M. A., Quintana, F. A. and Zhang, Y. (1998). Nonparametric Bayes methods using predictive updating. In Practical Nonparametric and Semiparametric Bayesian Statistics (D. Dey, P. Muller and D. Sinha, eds.) 45–61. Springer, New York.
  • [28] Newton, M. A. and Zhang, Y. (1999). A recursive algorithm for nonparametric analysis with missing data. Biometrika 86 15–26.
  • [29] Quintana, F. A. and Newton, M. A. (2000). Computational aspects of nonparametric Bayesian analysis with applications to the modeling of multiple binary sequences. J. Comput. Graph. Statist. 9 711–737.
  • [30] Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35 1–20.
  • [31] Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist. 22 400–407.
  • [32] Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer, New York.
  • [33] San Martin, E. and Quintana, F. (2002). Consistency and identifiability revisited. Braz. J. Probab. Stat. 16 99–106.
  • [34] Scott, J. G. and Berger, J. O. (2006). An exploration of aspects of Bayesian multiple testing. J. Statist. Plann. Inference 136 2144–2162.
  • [35] Shyamalkumar, N. (1996). Cyclic I₀ projections and its applications in statistics. Technical Report 96-24, Dept. Statistics, Purdue Univ., West Lafayette, IN.
  • [36] Tang, Y., Ghosal, S. and Roy, A. (2007). Nonparametric Bayesian estimation of positive false discovery rates. Biometrics 63 1126–1134.
  • [37] Teicher, H. (1963). Identifiability of finite mixtures. Ann. Math. Statist. 34 1265–1269.
  • [38] Tokdar, S. T., Martin, R. and Ghosh, J. K. (2008). Consistency of a recursive estimate of mixing distributions. Ann. Statist. To appear.
  • [39] Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the Poor Man’s data augmentation algorithm. J. Amer. Statist. Assoc. 85 699–704.