Bayesian Analysis

The Discrete Infinite Logistic Normal Distribution

John Paisley, Chong Wang, and David M. Blei

Abstract

We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN generalizes the hierarchical Dirichlet process (HDP) to model correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables and study its statistical properties. We then develop a variational inference algorithm for approximate posterior inference. We apply DILN to topic modeling of documents and study its empirical performance on four corpora, comparing it with the HDP and the correlated topic model (CTM). To scale to large data sets, we develop a stochastic variational inference algorithm for DILN and compare it with similar algorithms for the HDP and latent Dirichlet allocation (LDA) on a collection of 350,000 articles from Nature.
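
To make the representation described in the abstract concrete, the Python sketch below draws group-level atom weights from a finite truncation of the model: top-level stick-breaking weights as in the HDP, a correlated Gaussian vector per group standing in for the Gaussian process that induces correlation between atom weights, and normalized, scaled gamma variables. This is a minimal illustrative sketch; the function name sample_diln, the zero-mean exchangeable covariance, and the exact parameterization are assumptions for illustration, not the paper's precise construction.

import numpy as np

def sample_diln(num_groups, truncation, alpha, beta, cov, seed=None):
    """Illustrative draw of group-level weights from a truncated DILN-style model."""
    rng = np.random.default_rng(seed)
    # 1. Truncated stick-breaking for the top-level weights p ~ GEM(alpha).
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # close the sticks at the truncation level
    p = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    # 2. Per-group correlated Gaussian vector (assumption: stands in for the
    #    Gaussian process evaluated at the atom locations).
    u = rng.multivariate_normal(np.zeros(truncation), cov, size=num_groups)
    # 3. Gamma variables with shapes tied to the top-level weights, scaled by
    #    the exponentiated Gaussians, then normalized to group-level weights.
    z = rng.gamma(shape=beta * p, scale=1.0, size=(num_groups, truncation))
    w = z * np.exp(u)
    return w / w.sum(axis=1, keepdims=True)

# Example: 5 groups, truncation 10, exchangeable positive correlation.
K = 10
cov = 0.5 * np.eye(K) + 0.5 * np.ones((K, K))
pi = sample_diln(num_groups=5, truncation=K, alpha=1.0, beta=5.0, cov=cov)
print(pi.shape, pi.sum(axis=1))  # (5, 10); each row sums to 1

Setting the off-diagonal covariance to zero recovers independent scalings across atoms; positive off-diagonal entries make the weights of correlated atoms rise and fall together within a group, which is the correlation structure DILN adds over the HDP.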

Article information

Source
Bayesian Anal., Volume 7, Number 4 (2012), 997–1034.

Dates
First available in Project Euclid: 27 November 2012

Permanent link to this document
https://projecteuclid.org/euclid.ba/1354024470

Digital Object Identifier
doi:10.1214/12-BA734

Mathematical Reviews number (MathSciNet)
MR3000022

Zentralblatt MATH identifier
1330.62081

Keywords
mixed-membership models; Dirichlet process; Gaussian process

Citation

Paisley, John; Wang, Chong; Blei, David M. The Discrete Infinite Logistic Normal Distribution. Bayesian Anal. 7 (2012), no. 4, 997–1034. doi:10.1214/12-BA734. https://projecteuclid.org/euclid.ba/1354024470


References

  • Airoldi, E., Blei, D., Fienberg, S., and Xing, E. (2008). “Mixed Membership Stochastic Blockmodels.” Journal of Machine Learning Research, 9: 1981–2014.
  • Aitchison, J. (1982). “The statistical analysis of compositional data.” Journal of the Royal Statistical Society, Series B, 44(2): 139–177.
  • Armagan, A. and Dunson, D. (2011). “Sparse variational analysis of large longitudinal data sets.” Statistics & Probability Letters, 81: 1056–1062.
  • Asuncion, A., Welling, M., Smyth, P., and Teh, Y. (2009). “On smoothing and inference for topic models.” In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), 27–34.
  • Blackwell, D. and MacQueen, J. (1973). “Ferguson Distributions Via Pólya Urn Schemes.” Annals of Statistics, 1(2): 353–355.
  • Blei, D. and Jordan, M. (2005). “Variational inference for Dirichlet process mixtures.” Bayesian Analysis, 1(1): 121–144.
  • Blei, D. and Lafferty, J. (2007). “A correlated topic model of Science.” Annals of Applied Statistics, 1(1): 17–35.
  • — (2009). “Topic Models.” In Srivastava, A. and Sahami, M. (eds.), Text Mining: Theory and Applications. Taylor and Francis.
  • Blei, D., Ng, A., and Jordan, M. (2003). “Latent Dirichlet allocation.” Journal of Machine Learning Research, 3: 993–1022.
  • Dawid, A. (1981). “Some matrix-variate distribution theory: Notational considerations and a Bayesian application.” Biometrika, 68(1): 265–274.
  • Duan, J., Guindani, M., and Gelfand, A. (2007). “Generalized spatial Dirichlet process models.” Biometrika, 94(4): 809–825.
  • Dunson, D. and Park, J. (2008). “Kernel stick-breaking processes.” Biometrika, 95(2): 307–323.
  • Dunson, D., Pillai, N., and Park, J. (2007). “Bayesian density regression.” Journal of the Royal Statistical Society, Series B, 69(2): 163–183.
  • Erosheva, E., Fienberg, S., and Joutard, C. (2007). “Describing Disability Through Individual-Level Mixture Models for Multivariate Binary Data.” Annals of Applied Statistics, 1(2): 346–384.
  • Erosheva, E., Fienberg, S., and Lafferty, J. (2004). “Mixed-membership models of scientific publications.” Proceedings of the National Academy of Sciences, 101(suppl. 1): 5220–5227.
  • Escobar, M. D. and West, M. (1995). “Bayesian density estimation and inference using mixtures.” Journal of the American Statistical Association, 90(430): 577–588.
  • Ferguson, T. (1973). “A Bayesian analysis of some nonparametric problems.” The Annals of Statistics, 1: 209–230.
  • — (1983). “Bayesian density estimation by mixtures of normal distributions.” In Rizvi, M., Rustagi, J., and Siegmund, D. (eds.), Recent Advances in Statistics, volume 155, 287–302. Academic Press.
  • Gelfand, A., Kottas, A., and MacEachern, S. (2005). “Bayesian nonparametric spatial modeling with Dirichlet process mixing.” Journal of the American Statistical Association, 100: 1021–1035.
  • Griffin, J. and Steel, M. (2006). “Order-based dependent Dirichlet processes.” Journal of the American Statistical Association, 101(473): 179–194.
  • Griffin, J. E. and Walker, S. G. (2010). “Posterior Simulation of Normalized Random Measure Mixtures.” Journal of Computational and Graphical Statistics, 20(1): 241–259.
  • Griffiths, T. and Steyvers, M. (2004). “Finding scientific topics.” Proceedings of the National Academy of Sciences, 101(suppl. 1): 5228–5235.
  • Hastings, W. (1970). “Monte Carlo sampling methods using Markov chains and their applications.” Biometrika, 57: 97–109.
  • Hoffman, M., Blei, D., and Bach, F. (2010). “Online learning for latent Dirichlet allocation.” In Advances in Neural Information Processing Systems (NIPS) 23, 856–864.
  • Ishwaran, H. and James, L. (2001). “Gibbs sampling methods for stick-breaking priors.” Journal of the American Statistical Association, 96(453): 161–173.
  • Ishwaran, H. and Zarepour, M. (2002). “Exact and approximate sum representations for the Dirichlet process.” Canadian Journal of Statistics, 30: 269–283.
  • Jordan, M., Ghahramani, Z., Jaakkola, T., and Saul, L. (1999). “An introduction to variational methods for graphical models.” Machine Learning, 37: 183–233.
  • Kalli, M., Griffin, J., and Walker, S. (2011). “Slice sampling mixture models.” Statistics and Computing, 21: 93–105.
  • Kingman, J. (1993). Poisson Processes. Oxford University Press, USA.
  • Kurihara, K., Welling, M., and Vlassis, N. (2006). “Accelerated variational DP mixture models.” In Advances in Neural Information Processing Systems (NIPS) 19, 761–768.
  • Lenk, P. (1988). “The logistic normal distribution for Bayesian, nonparametric, predictive densities.” Journal of the American Statistical Association, 83(402): 509–516.
  • Liang, P., Petrov, S., Jordan, M., and Klein, D. (2007). “The infinite PCFG using hierarchical Dirichlet processes.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 688–697.
  • Lo, A. (1984). “On a class of Bayesian nonparametric estimates. I. Density estimates.” Annals of Statistics, 12: 351–357.
  • MacEachern, S. (1999). “Dependent nonparametric processes.” In ASA Proceedings of the Section on Bayesian Statistical Science, 50–55.
  • Müller, P., Quintana, F., and Rosner, G. (2004). “A method for combining inference across related nonparametric Bayesian models.” Journal of the Royal Statistical Society, Series B, 66: 735–749.
  • Paisley, J., Wang, C., and Blei, D. (2011). “The discrete infinite logistic normal distribution for mixed-membership modeling.” In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 15, 74–82.
  • Pritchard, J., Stephens, M., and Donnelly, P. (2000). “Inference of population structure using multilocus genotype data.” Genetics, 155: 945–959.
  • Rao, V. and Teh, Y. W. (2009). “Spatial normalized gamma processes.” In Advances in Neural Information Processing Systems (NIPS) 22, 1554–1562.
  • Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning. MIT press.
  • Ren, L., Du, L., Carin, L., and Dunson, D. (2011). “Logistic stick-breaking process.” Journal of Machine Learning Research, 12: 203–239.
  • Robbins, H. and Monro, S. (1951). “A stochastic approximation method.” The Annals of Mathematical Statistics, 22(3): 400–407.
  • Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd Edition. Springer Texts in Statistics.
  • Sato, M. (2001). “Online model selection based on the variational Bayes.” Neural Computation, 13(7): 1649–1681.
  • Sethuraman, J. (1994). “A constructive definition of Dirichlet priors.” Statistica Sinica, 4: 639–650.
  • Teh, Y., Jordan, M., Beal, M., and Blei, D. (2006). “Hierarchical Dirichlet processes.” Journal of the American Statistical Association, 101(476): 1566–1581.
  • Teh, Y., Kurihara, K., and Welling, M. (2007). “Collapsed variational inference for HDP.” In Advances in Neural Information Processing Systems (NIPS) 20, 1481–1488.
  • Wainwright, M. and Jordan, M. (2008). “Graphical models, exponential families, and variational inference.” Foundations and Trends in Machine Learning, 1: 1–305.
  • Wang, C., Paisley, J., and Blei, D. (2011). “Online learning for the hierarchical Dirichlet process.” In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 15, 752–760.
  • Winn, J. and Bishop, C. (2005). “Variational message passing.” Journal of Machine Learning Research, 6: 661–694.