Statistical Science

MCMC for Normalized Random Measure Mixture Models

Stefano Favaro and Yee Whye Teh


This paper concerns the use of Markov chain Monte Carlo methods for posterior sampling in Bayesian nonparametric mixture models with normalized random measure priors. Making use of some recent posterior characterizations for the class of normalized random measures, we propose novel Markov chain Monte Carlo methods of both marginal type and conditional type. The proposed marginal samplers are generalizations of Neal’s well-regarded Algorithm 8 for Dirichlet process mixture models, whereas the conditional sampler is a variation of those recently introduced in the literature. For both the marginal and conditional methods, we consider as a running example a mixture model with an underlying normalized generalized Gamma process prior, and describe comparative simulation results demonstrating the efficacies of the proposed methods.
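For orientation, the Dirichlet process special case that the marginal samplers generalize can be sketched in a few lines. The block below is a minimal illustration of one sweep of Neal's Algorithm 8 for a Dirichlet process mixture. It is not the paper's sampler for normalized random measures. The Gaussian kernel with known standard deviation `sigma`, the Gaussian base measure `N(mu0, tau0^2)`, the function name `algorithm8_sweep`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def algorithm8_sweep(y, z, phi, alpha, m, sigma, mu0, tau0, rng):
    """One sweep of Neal's Algorithm 8 for a DP mixture (illustrative sketch).

    Assumed model: y_i ~ N(phi[z_i], sigma^2), base measure G0 = N(mu0, tau0^2),
    DP concentration alpha, m auxiliary parameters per reassignment.
    """
    z, phi = z.copy(), phi.copy()
    for i in range(len(y)):
        # Cluster sizes with observation i removed.
        counts = np.bincount(np.delete(z, i), minlength=len(phi))
        # Auxiliary parameters are fresh draws from G0 ...
        aux = rng.normal(mu0, tau0, size=m)
        if counts[z[i]] == 0:
            # ... except that a singleton's parameter is kept as the first one,
            # and its now-empty cluster is dropped and labels are shifted down.
            k = z[i]
            aux[0] = phi[k]
            phi = np.delete(phi, k)
            counts = np.delete(counts, k)
            z[z > k] -= 1
        cand = np.concatenate([phi, aux])
        weights = np.concatenate([counts, np.full(m, alpha / m)])
        # Log of (prior weight) x (Gaussian likelihood), up to a constant.
        logp = np.log(weights) - 0.5 * ((y[i] - cand) / sigma) ** 2
        p = np.exp(logp - logp.max())
        j = rng.choice(len(cand), p=p / p.sum())
        if j >= len(phi):
            # An auxiliary parameter was chosen: open a new cluster with it.
            phi = np.append(phi, cand[j])
            j = len(phi) - 1
        z[i] = j
    # Conjugate Gibbs update of each cluster parameter given its members.
    for k in range(len(phi)):
        yk = y[z == k]
        prec = 1.0 / tau0**2 + len(yk) / sigma**2
        mean = (mu0 / tau0**2 + yk.sum() / sigma**2) / prec
        phi[k] = rng.normal(mean, prec ** -0.5)
    return z, phi

# Toy usage on synthetic data (values chosen only for illustration).
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-5.0, 1.0, 30), rng.normal(5.0, 1.0, 30)])
z = np.zeros(len(y), dtype=int)   # start with all points in one cluster
phi = np.array([y.mean()])
for _ in range(50):
    z, phi = algorithm8_sweep(y, z, phi, alpha=1.0, m=3, sigma=1.0,
                              mu0=0.0, tau0=10.0, rng=rng)
```

In this special case the weight of an existing cluster is its size and the weight of each auxiliary parameter is `alpha / m`. As the abstract indicates, the marginal samplers proposed in the paper generalize this scheme to normalized random measure priors; the modified weights that requires are not reproduced here.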

Article information

Statist. Sci., Volume 28, Number 3 (2013), 335-359.

First available in Project Euclid: 28 August 2013

Keywords: Bayesian nonparametrics; hierarchical mixture model; completely random measure; normalized random measure; Dirichlet process; normalized generalized Gamma process; MCMC posterior sampling method; marginalized sampler; Algorithm 8; conditional sampler; slice sampling


Favaro, Stefano; Teh, Yee Whye. MCMC for Normalized Random Measure Mixture Models. Statist. Sci. 28 (2013), no. 3, 335–359. doi:10.1214/13-STS422.

