Bayesian Analysis

Analysis of the Maximal a Posteriori Partition in the Gaussian Dirichlet Process Mixture Model

Łukasz Rajkowski

Full-text: Open access

Abstract

Mixture models are a natural choice in many applications, but it can be difficult to place an a priori upper bound on the number of components. To circumvent this, investigators are increasingly turning to Dirichlet process mixture models (DPMMs). It is therefore important to understand the strengths and weaknesses of this approach. This work considers the MAP (maximum a posteriori) clustering for the Gaussian DPMM, in which the cluster means have a Gaussian distribution and, within each cluster, the observations have a Gaussian distribution. Some desirable properties of the MAP partition are proved: ‘almost disjointness’ of the convex hulls of clusters (they may have at most one point in common) and, under natural assumptions, comparability of the sizes of those clusters that intersect any fixed ball with the number of observations, as the latter goes to infinity. Consequently, the number of such clusters remains bounded. Furthermore, if the data arise from independent identically distributed sampling from a given distribution with bounded support, then the asymptotic MAP partition of the observation space maximises a function that has a straightforward expression and depends only on the within-group covariance parameter. As the operator norm of this covariance parameter decreases, the number of clusters in the MAP partition becomes arbitrarily large, which may lead to overestimation of the number of mixture components.
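
To make the object of study concrete, the following is a minimal sketch of a brute-force MAP partition search in a one-dimensional Gaussian DPMM. It is an illustration only, not the paper's method or notation: the zero prior mean, the scalar variances `sigma2` (within-cluster) and `tau2` (cluster means), the concentration `alpha`, and all function names are assumptions introduced here. The score of a partition is the log Chinese Restaurant Process prior plus the log marginal likelihood of each cluster with its mean integrated out.

```python
import math


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms its own block ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def log_crp_prior(partition, alpha):
    """Log probability of a partition under the Chinese Restaurant Process."""
    n = sum(len(block) for block in partition)
    log_p = len(partition) * math.log(alpha)
    log_p += sum(math.lgamma(len(block)) for block in partition)  # log (n_k - 1)!
    log_p -= math.lgamma(alpha + n) - math.lgamma(alpha)
    return log_p


def log_marginal(xs, sigma2, tau2):
    """Log marginal density of one cluster.

    Assumed model: mean ~ N(0, tau2), x_i | mean ~ N(mean, sigma2), i.i.d.
    The joint covariance is sigma2 * I + tau2 * (all-ones matrix).
    """
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    quad = (s2 - tau2 / (sigma2 + n * tau2) * s1 * s1) / sigma2
    log_det = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * tau2)
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


def log_posterior(partition, data, alpha, sigma2, tau2):
    """Unnormalised log posterior of a partition of the data indices."""
    score = log_crp_prior(partition, alpha)
    for block in partition:
        score += log_marginal([data[i] for i in block], sigma2, tau2)
    return score


def map_partition(data, alpha=1.0, sigma2=1.0, tau2=25.0):
    """Exhaustive search over all partitions; feasible only for tiny data sets."""
    candidates = set_partitions(list(range(len(data))))
    return max(candidates, key=lambda p: log_posterior(p, data, alpha, sigma2, tau2))


data = [-5.0, -4.8, 5.0, 5.2]
best = map_partition(data)
print(sorted(sorted(block) for block in best))
```

With these illustrative settings the two well-separated pairs of points end up in separate clusters. Rerunning `map_partition` with smaller values of `sigma2` is one informal way to explore the abstract's remark that a shrinking within-group covariance leads to more clusters. The exhaustive search is used only for transparency: the number of partitions grows as the Bell numbers, so it does not scale beyond a handful of points.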

Article information

Source
Bayesian Anal., Volume 14, Number 2 (2019), 477-494.

Dates
First available in Project Euclid: 30 July 2018

Permanent link to this document
https://projecteuclid.org/euclid.ba/1532937626

Digital Object Identifier
doi:10.1214/18-BA1114

Mathematical Reviews number (MathSciNet)
MR3934094

Zentralblatt MATH identifier
07045439

Subjects
Primary: 62F15: Bayesian inference

Keywords
Dirichlet process mixture models; Chinese Restaurant Process

Rights
Creative Commons Attribution 4.0 International License.

Citation

Rajkowski, Łukasz. Analysis of the Maximal a Posteriori Partition in the Gaussian Dirichlet Process Mixture Model. Bayesian Anal. 14 (2019), no. 2, 477–494. doi:10.1214/18-BA1114. https://projecteuclid.org/euclid.ba/1532937626




Supplemental materials

  • Supplementary Material to “Analysis of the Maximal a Posteriori Partition in the Gaussian Dirichlet Process Mixture Model”. Supplement A: This supplementary material contains proofs that were left for the appendix. Supplement B: This supplementary material contains results of computer simulations.