Bayesian Analysis

Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination

Christopher Yau and Chris Holmes

Full-text: Open access

Abstract

We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical, population-based nonparametric prior on the cluster locations, scaled by the inverse covariance matrices of the likelihood, we arrive at a "sparsity prior" representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest, including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. It also allows for cluster-specific variable selection. We demonstrate improved inference on a number of canonical problems.
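The nonparametric prior in models of this kind is typically built on the Dirichlet process. As a point of orientation only (this is an illustrative sketch, not the authors' implementation), the stick-breaking construction of Sethuraman (1994, cited in the references below) generates the random mixture weights of a Dirichlet process prior:

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process prior."""
    # Beta(1, alpha) stick proportions (Sethuraman 1994).
    betas = rng.beta(1.0, alpha, size=n_atoms)
    # Weight k is beta_k times the stick length left after the first k-1 breaks.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=1.0, n_atoms=50, rng=rng)
# weights are nonnegative and sum to just under 1; the shortfall is the
# probability mass beyond the truncation level n_atoms.
```

The slice and retrospective samplers cited below (Walker 2007; Kalli, Griffin and Walker 2011; Papaspiliopoulos and Roberts 2008) avoid this fixed truncation by sampling only the finitely many sticks the data actually need.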

Article information

Source
Bayesian Anal., Volume 6, Number 2 (2011), 329-351.

Dates
First available in Project Euclid: 13 June 2012

Permanent link to this document
https://projecteuclid.org/euclid.ba/1339612049

Digital Object Identifier
doi:10.1214/11-BA612

Mathematical Reviews number (MathSciNet)
MR2806247

Zentralblatt MATH identifier
1330.62265

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 60G57: Random measures; 62B10: Information-theoretic topics [See also 94A17]; 62F15: Bayesian inference; 62G99: None of the above, but in this section; 62H99: None of the above, but in this section; 62P10: Applications to biology and medical sciences

Keywords
Bayesian mixture models; Bayesian nonparametric priors; variable selection; unsupervised learning

Citation

Yau, Christopher; Holmes, Chris. Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination. Bayesian Anal. 6 (2011), no. 2, 329--351. doi:10.1214/11-BA612. https://projecteuclid.org/euclid.ba/1339612049


References

  • Andrews, D. and Mallows, C. (1974). "Scale mixtures of normal distributions." Journal of the Royal Statistical Society. Series B (Methodological), 36(1): 99–102.
  • Antoniak, C. (1974). "Mixtures of Dirichlet Processes With Applications to Bayesian Nonparametric Problems." The Annals of Statistics, 2: 1152–1174.
  • Claeskens, G. and Hjort, N. (2008). Model Selection and Model Averaging. Cambridge University Press.
  • Döhner, H., Stilgenbauer, S., Benner, A., Leupolt, E., Kröber, A., Bullinger, L., Döhner, K., Bentz, M., and Lichter, P. (2000). "Genomic aberrations and survival in chronic lymphocytic leukemia." New England Journal of Medicine, 343(26): 1910–1916.
  • Dy, J. and Brodley, C. (2004). "Feature selection for unsupervised learning." Journal of Machine Learning Research, 5: 845–889.
  • Escobar, M. (1994). "Estimating normal means with a Dirichlet process prior." Journal of the American Statistical Association, 89(425): 268–277.
  • Escobar, M. and West, M. (1995). "Bayesian Density Estimation and Inference Using Mixtures." Journal of the American Statistical Association, 90(430): 577–588.
  • Ferguson, T. (1973). "A Bayesian Analysis of Some Nonparametric Problems." The Annals of Statistics, 1(2): 209–230.
  • Fisher, R. (1936). "The use of multiple measurements in taxonomic problems." Annals of Eugenics, 7: 179–188.
  • Fraley, C. and Raftery, A. (2002). "Model-based clustering, discriminant analysis, and density estimation." Journal of the American Statistical Association, 97(458): 611–631.
  • Friedman, J. and Meulman, J. (2004). "Clustering objects on subsets of attributes (with Discussion)." Journal of the Royal Statistical Society, Series B, 66: 815–849.
  • Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer Series in Statistics. New York: Springer.
  • Green, P. and Richardson, S. (2001). "Modelling Heterogeneity With and Without the Dirichlet Process." Scandinavian Journal of Statistics, 28(2): 355–375.
  • Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second Edition). Springer.
  • Hoff, P. (2006). "Model-based subspace clustering." Bayesian Analysis, 1(2): 321–344.
  • Ishwaran, H. and James, L. (2001). "Gibbs sampling methods for stick-breaking priors." Journal of the American Statistical Association, 96: 161–173.
  • Jain, S. and Neal, R. (2000). "A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model." Journal of Computational and Graphical Statistics, 13: 158–182.
  • Jasra, A., Holmes, C., and Stephens, D. (2005). "Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling." Statistical Science, 20(1): 50–67.
  • Kalli, M., Griffin, J., and Walker, S. (2011). "Slice sampling mixture models." Statistics and Computing, 21(1): 93–105.
  • Kim, S., Tadesse, M., and Vannucci, M. (2006). "Variable selection in clustering via Dirichlet process mixture models." Biometrika, 93(4): 877–893.
  • Knight, S. J. L., Yau, C., Timbs, A., Sadighi-Akha, E., Dreau, H., Burns, A., Oscier, D., Pettitt, A., Holmes, C., Taylor, J., Cazier, J.-B., and Schuh, A. (2011). "A genome-wide array-based sequential analysis quantifies the proportion of sub-clones carrying genomic changes in B-cell chronic lymphocytic leukaemia and reveals the complexity of clonal dynamics." Submitted to Leukemia.
  • Law, M., Figueiredo, M., and Jain, A. (2004). "Simultaneous feature selection and clustering using mixture models." IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9): 1154–1166.
  • MacEachern, S. (1998). "Estimating mixture of Dirichlet process models." Journal of Computational and Graphical Statistics, 7(2): 223–238.
  • MacKay, D. (1995). "Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks." Network: Computation in Neural Systems, 6(3): 469–505.
  • Maugis, C., Celeux, G., and Martin-Magniette, M. (2009). "Variable Selection for Clustering with Gaussian Mixture Models." Biometrics, 65: 701–709.
  • Neal, R. (2000). "Markov Chain Sampling Methods for Dirichlet Process Mixture Models." Journal of Computational and Graphical Statistics, 9: 249–265.
  • Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer-Verlag.
  • Pan, W. and Shen, X. (2007). "Penalized model-based Clustering with Application to Variable Selection." Journal of Machine Learning Research, 8: 1145–1164.
  • Papaspiliopoulos, O. and Roberts, G. O. (2008). "Retrospective Markov chain Monte Carlo for Dirichlet process hierarchical models." Biometrika, 95: 169–186.
  • Raftery, A. and Dean, N. (2006). "Variable selection for model-based clustering." Journal of the American Statistical Association, 101(473): 168–178.
  • Richardson, S. and Green, P. (1997). "On Bayesian analysis of mixtures with an unknown number of components." Journal of the Royal Statistical Society. Series B (Methodological), 59(4): 731–792.
  • Sethuraman, J. (1994). "A Constructive Definition of Dirichlet Priors." Statistica Sinica, 4: 639–650.
  • Stephens, M. (2000). "Bayesian analysis of mixture models with an unknown number of components - an alternative to reversible jump methods." The Annals of Statistics, 28(1): 40–74.
  • Tadesse, M., Sha, N., and Vannucci, M. (2005). "Bayesian Variable Selection in Clustering High-Dimensional Data." Journal of the American Statistical Association, 100(470): 602–618.
  • Tipping, M. (2001). "Sparse Bayesian Learning and the Relevance Vector Machine." Journal of Machine Learning Research, 1: 211–244.
  • Titterington, D., Smith, A., and Makov, U. (1985). Statistical Analysis of Finite Mixture Distributions. New York: Wiley.
  • Walker, S. (2007). "Sampling the Dirichlet mixture model with slices." Communications in Statistics - Simulation and Computation, 36(1-3): 45–54.