The Annals of Statistics

Convergence of latent mixing measures in finite and infinite mixture models

XuanLong Nguyen



This paper studies convergence behavior of latent mixing measures that arise in finite and infinite mixture models, using transportation distances (i.e., Wasserstein metrics). The relationship between Wasserstein distances on the space of mixing measures and $f$-divergence functionals such as Hellinger and Kullback–Leibler distances on the space of mixture distributions is investigated in detail using various identifiability conditions. Convergence in Wasserstein metrics for discrete measures implies convergence of individual atoms that provide support for the measures, thereby providing a natural interpretation of convergence of clusters in clustering applications where mixture models are typically employed. Convergence rates of posterior distributions for latent mixing measures are established, for both finite mixtures of multivariate distributions and infinite mixtures based on the Dirichlet process.
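As a small illustration of the transportation distance mentioned in the abstract, the sketch below computes the first-order Wasserstein distance $W_1$ between two discrete mixing measures whose atoms lie on the real line. It is not taken from the paper, which works with general $W_r$ on multivariate parameter spaces; it relies only on the standard fact that in one dimension $W_1$ equals the integral of the absolute difference between the two cumulative distribution functions. The function name `wasserstein1_1d` and the example measures are assumptions made for illustration.

```python
def wasserstein1_1d(atoms_g, weights_g, atoms_h, weights_h):
    """W_1 between two discrete probability measures on the real line.

    Illustrative sketch only (one-dimensional special case): uses the
    identity W_1(G, H) = integral of |F_G(x) - F_H(x)| dx.
    """
    # Signed mass at every atom: +p for atoms of G, -q for atoms of H.
    signed = sorted(
        [(x, p) for x, p in zip(atoms_g, weights_g)]
        + [(x, -q) for x, q in zip(atoms_h, weights_h)]
    )
    total = 0.0
    cdf_gap = 0.0  # running value of F_G(x) - F_H(x)
    for (x, mass), (x_next, _) in zip(signed, signed[1:]):
        cdf_gap += mass
        # The CDF difference is constant between consecutive atoms.
        total += abs(cdf_gap) * (x_next - x)
    return total


# Hypothetical two-component mixing measure G0 and perturbed versions of it.
g0_atoms, g0_weights = [-1.0, 1.0], [0.5, 0.5]

for n in (1, 10, 100):
    # Move each atom of G0 toward the origin by 1/n, keeping the weights.
    gn_atoms = [-1.0 + 1.0 / n, 1.0 - 1.0 / n]
    print(n, wasserstein1_1d(gn_atoms, g0_weights, g0_atoms, g0_weights))
```

In this toy example every unit of mass travels a distance of $1/n$, so the computed distance shrinks at the same rate as the displacement of the atoms, which is the sense in which convergence in a Wasserstein metric can be read as convergence of cluster locations.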

Article information

Ann. Statist., Volume 41, Number 1 (2013), 370-400.

First available in Project Euclid: 26 March 2013

Subjects

Primary: 62F15: Bayesian inference; 62G05: Estimation
Secondary: 62G20: Asymptotic properties

Keywords

Mixture distributions; hierarchical models; Wasserstein metric; transportation distances; Bayesian nonparametrics; $f$-divergence; rates of convergence; Dirichlet processes


Nguyen, XuanLong. Convergence of latent mixing measures in finite and infinite mixture models. Ann. Statist. 41 (2013), no. 1, 370–400. doi:10.1214/12-AOS1065.


