Bayesian Analysis

Bayesian Cluster Analysis: Point Estimation and Credible Balls

Sara Wade and Zoubin Ghahramani

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access


Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. Unlike popular algorithms such as agglomerative hierarchical clustering or k-means, which return a single clustering solution, Bayesian nonparametric models provide a posterior over the entire space of partitions, allowing one to assess statistical properties such as uncertainty in the number of clusters. However, an important problem is how to summarize this posterior; the huge dimension of partition space and the difficulty of visualizing it add to the problem. In a Bayesian analysis, the posterior of a real-valued parameter of interest is often summarized by reporting a point estimate, such as the posterior mean, along with a 95% credible interval to characterize uncertainty. In this paper, we extend these ideas to develop appropriate point estimates and credible sets that summarize the posterior of the clustering structure, based on decision-theoretic and information-theoretic techniques.
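As a rough illustration of what summarizing a posterior over partitions can look like, the sketch below computes the variation of information between two clusterings and picks, from a set of sampled partitions, the one with the smallest estimated posterior expected Binder's loss. It is a minimal sketch under stated assumptions and not the procedure developed in the paper: the function names are hypothetical, the partitions are assumed to be integer label vectors from an MCMC sampler, equal misclassification costs are assumed for Binder's loss, and the search is restricted to the sampled partitions.

```python
import numpy as np

def variation_of_information(a, b):
    """Variation of information (in bits) between two partitions given as label arrays."""
    a = np.asarray(a)
    b = np.asarray(b)
    n = a.size
    # Relabel clusters as 0..k-1 and build the contingency table of joint counts.
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)
    p = table / n                       # joint proportions p_ij
    pa = p.sum(axis=1, keepdims=True)   # cluster proportions in a
    pb = p.sum(axis=0, keepdims=True)   # cluster proportions in b
    nz = p > 0
    # VI = H(a | b) + H(b | a), summed over non-empty cells only.
    terms = np.log2((p / pa)[nz]) + np.log2((p / pb)[nz])
    return float(-(p[nz] * terms).sum())

def binder_point_estimate(samples):
    """Return the sampled partition minimizing estimated expected Binder's loss.

    Assumptions (not from the source): equal misclassification costs, and the
    candidate set is limited to the sampled partitions themselves.
    """
    samples = np.asarray(samples)                       # shape (S, n)
    # co[s, i, j] is True when sample s puts items i and j in the same cluster.
    co = samples[:, :, None] == samples[:, None, :]
    psm = co.mean(axis=0)                               # posterior similarity matrix
    # Expected Binder's loss of each candidate, up to a constant factor.
    losses = np.abs(co - psm).sum(axis=(1, 2))
    return samples[np.argmin(losses)], psm

# Hypothetical sampler output: four sampled partitions of four items.
samples = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
estimate, psm = binder_point_estimate(samples)
```

Here `estimate` is a single representative partition and `psm[i, j]` estimates the posterior probability that items `i` and `j` share a cluster. A distance such as `variation_of_information` could then be used to measure how far other sampled partitions lie from the estimate.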

Article information

Bayesian Anal. (2017), 29 pages.

First available in Project Euclid: 19 October 2017

Digital Object Identifier: doi:10.1214/17-BA1073

Keywords: mixture model; random partition; variation of information; Binder’s loss

Creative Commons Attribution 4.0 International License.


Wade, Sara; Ghahramani, Zoubin. Bayesian Cluster Analysis: Point Estimation and Credible Balls. Bayesian Anal., advance publication, 19 October 2017. doi:10.1214/17-BA1073.


