Electronic Journal of Statistics

Simple approximate MAP inference for Dirichlet processes mixtures

Yordan P. Raykov, Alexis Boukouvalas, and Max A. Little

Full-text: Open access

Abstract

The Dirichlet process mixture model (DPMM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPMM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMMs. This algorithm is as simple as DP-means clustering, solves the MAP problem as well as Gibbs sampling, while requiring only a fraction of the computational effort. (For freely available code that implements the MAP-DP algorithm for Gaussian mixtures, see http://www.maxlittle.net/.) Unlike related small variance asymptotics (SVA), our method is non-degenerate and so inherits the “rich get richer” property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood which enables out-of-sample calculations and the use of standard tools such as cross-validation. We illustrate the benefits of our algorithm on a range of examples and contrast it to variational, SVA and sampling approaches, both in terms of computational complexity and clustering performance. We demonstrate the wide applicability of our approach by presenting an approximate MAP inference method for the infinite hidden Markov model whose performance contrasts favorably with a recently proposed hybrid SVA approach. Similarly, we show how our algorithm can be applied to a semiparametric mixed-effects regression model where the random effects distribution is modelled using an infinite mixture model, as used in longitudinal progression modelling in population health science. Finally, we propose directions for future research on approximate MAP inference in Bayesian nonparametrics.
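To illustrate the kind of assignment rule at the core of such an algorithm, the following minimal Python sketch (not the authors' reference implementation linked above) performs MAP-DP-style clustering under simplifying assumptions: spherical Gaussian clusters with a known observation variance sigma2 and a zero-mean Gaussian prior of variance sigma2_0 on the cluster means, so that each cluster's posterior predictive density is a closed-form Gaussian. Each point is reassigned to whichever existing cluster, or a newly opened one, minimises the negative log predictive density plus the Dirichlet process penalty (minus the log of the cluster size, or minus log alpha for a new cluster); the -log N_k term is what produces the "rich get richer" behaviour mentioned in the abstract. The function name and parameter defaults are illustrative only.

import numpy as np

def map_dp_gaussian(X, alpha=1.0, sigma2=1.0, sigma2_0=10.0, max_iter=100):
    # Illustrative MAP-DP-style sketch, not the authors' reference code.
    # Assumes spherical Gaussian clusters with known observation variance
    # sigma2 and a zero-mean Gaussian prior (variance sigma2_0) on the
    # cluster means, giving a closed-form Gaussian posterior predictive.
    N, D = X.shape
    z = np.zeros(N, dtype=int)            # start with all points in one cluster
    K = 1
    for _ in range(max_iter):
        changed = False
        for i in range(N):
            costs = np.full(K + 1, np.inf)
            for k in range(K):
                members = (z == k)
                members[i] = False        # leave-one-out: exclude point i
                n_k = members.sum()
                if n_k == 0:
                    continue              # cluster emptied during this sweep
                # Posterior predictive N(mu_pred, v_pred * I) for cluster k
                x_bar = X[members].mean(axis=0)
                v_post = 1.0 / (1.0 / sigma2_0 + n_k / sigma2)
                mu_pred = v_post * n_k * x_bar / sigma2
                v_pred = v_post + sigma2
                costs[k] = (0.5 * np.sum((X[i] - mu_pred) ** 2) / v_pred
                            + 0.5 * D * np.log(2.0 * np.pi * v_pred)
                            - np.log(n_k))            # "rich get richer" term
            # Opening a new cluster: prior predictive cost minus log(alpha)
            v_new = sigma2_0 + sigma2
            costs[K] = (0.5 * np.sum(X[i] ** 2) / v_new
                        + 0.5 * D * np.log(2.0 * np.pi * v_new)
                        - np.log(alpha))
            k_star = int(np.argmin(costs))
            if k_star != z[i]:
                z[i], changed = k_star, True
            if k_star == K:
                K += 1
        # Drop clusters emptied during the sweep and relabel contiguously
        _, z = np.unique(z, return_inverse=True)
        K = z.max() + 1
        if not changed:
            break
    return z

# Example on synthetic data: two well-separated spherical clusters
# X = np.vstack([np.random.randn(60, 2) + 4.0, np.random.randn(60, 2) - 4.0])
# print(map_dp_gaussian(X, alpha=1.0))

The paper's algorithm handles general exponential-family likelihoods with conjugate priors and assesses convergence through the monotone decrease of the complete-data negative log posterior; this sketch simply stops once no assignment changes within a sweep.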

Article information

Source
Electron. J. Statist. Volume 10, Number 2 (2016), 3548-3578.

Dates
Received: May 2016
First available in Project Euclid: 16 November 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1479287231

Digital Object Identifier
doi:10.1214/16-EJS1196

Zentralblatt MATH identifier
1357.62227

Subjects
Primary: 62F15: Bayesian inference
Secondary: 62G86: Nonparametric inference and fuzziness

Keywords
Bayesian nonparametrics; clustering; Gaussian mixture model

Citation

Raykov, Yordan P.; Boukouvalas, Alexis; Little, Max A. Simple approximate MAP inference for Dirichlet processes mixtures. Electron. J. Statist. 10 (2016), no. 2, 3548--3578. doi:10.1214/16-EJS1196. https://projecteuclid.org/euclid.ejs/1479287231

