The Annals of Statistics

Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities

Subhashis Ghosal and Aad W. van der Vaart



We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or location-scale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is also assumed to lie in this class with the true mixing distribution either compactly supported or having sub-Gaussian tails. We obtain bounds for Hellinger bracketing entropies for this class, and from these bounds, we deduce the convergence rates of (sieve) MLEs in Hellinger distance. The rate turns out to be $(\log n)^\kappa /\sqrt{n}$, where $\kappa \ge 1$ is a constant that depends on the type of mixtures and the choice of the sieve. Next, we consider a Dirichlet mixture of normals as a prior on the unknown density. We estimate the prior probability of a certain Kullback-Leibler type neighborhood and then invoke a general theorem that computes the posterior convergence rate in terms of the growth rate of the Hellinger entropy and the concentration rate of the prior. The posterior distribution is also seen to converge at the rate $(\log n)^\kappa /\sqrt{n}$, where $\kappa$ now depends on the tail behavior of the base measure of the Dirichlet process.
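As an informal illustration of the quantities named in the abstract, the following Python sketch evaluates a finite location mixture of normals, approximates the Hellinger distance between two such densities by numerical integration, and computes the rate $(\log n)^\kappa /\sqrt{n}$ for a given $n$ and $\kappa$. It is not taken from the paper. The function names, the integration grid, and the convention $h(f,g) = \|\sqrt{f} - \sqrt{g}\|_2$ (no factor $1/2$) are assumptions made for this example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sd):
    """Finite location mixture of normals with a common scale sd.

    Illustrative stand-in for the mixture class in the abstract; a finite
    mixing distribution is trivially compactly supported.
    """
    return sum(w * normal_pdf(x, m, sd) for w, m in zip(weights, means))


def hellinger(f, g, lo=-12.0, hi=12.0, steps=4800):
    """Approximate h(f, g) = sqrt( int (sqrt f - sqrt g)^2 dx ).

    Midpoint rule on [lo, hi]; the interval and step count are assumptions
    chosen for densities concentrated well inside the interval.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return math.sqrt(total * dx)


def rate(n, kappa):
    """The rate (log n)^kappa / sqrt(n) from the abstract."""
    return math.log(n) ** kappa / math.sqrt(n)


# Two example densities (hypothetical parameter values).
f = lambda x: mixture_pdf(x, [0.5, 0.5], [-1.0, 1.0], 1.0)
g = lambda x: mixture_pdf(x, [0.3, 0.7], [-1.0, 1.5], 1.0)

h_fg = hellinger(f, g)
rates = {n: rate(n, 1.0) for n in (10**2, 10**4, 10**6)}
```

With $\kappa = 1$ the rate differs from the parametric rate $1/\sqrt{n}$ only by the factor $\log n$, which is why these rates are often described as nearly parametric.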

Article information

Ann. Statist. Volume 29, Number 5 (2001), 1233-1263.

First available in Project Euclid: 8 February 2002

Primary: 62G07: Density estimation; 62G20: Asymptotic properties

Keywords: bracketing, Dirichlet mixture, maximum likelihood, mixture of normals, posterior distribution, rate of convergence, sieve, entropy


Ghosal, Subhashis; van der Vaart, Aad W. Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Statist. 29 (2001), no. 5, 1233-1263. doi:10.1214/aos/1013203452.
