Bayesian Analysis

Nonparametric Bayesian Negative Binomial Factor Analysis

Mingyuan Zhou

Full-text: Open access


A common approach to analyzing a covariate-sample count matrix, an element of which represents how many times a covariate appears in a sample, is to factorize it under the Poisson likelihood. We show its limitation in capturing the tendency for a covariate present in a sample to both repeat itself and excite related ones. To address this limitation, we construct negative binomial factor analysis (NBFA) to factorize the matrix under the negative binomial likelihood, and relate it to a Dirichlet-multinomial distribution based mixed-membership model. To support countably infinite factors, we propose the hierarchical gamma-negative binomial process. By exploiting newly proved connections between discrete distributions, we construct two blocked Gibbs samplers and a collapsed Gibbs sampler, all of which adaptively truncate their numbers of factors, and demonstrate that the blocked Gibbs sampler developed under a compound Poisson representation converges fast and has low computational complexity. Example results show that NBFA has a distinct mechanism for adjusting its number of inferred factors according to the sample lengths, and provides clear advantages in parsimonious representation, predictive power, and computational complexity over previously proposed discrete latent variable models, which either completely ignore burstiness or model only the burstiness of the covariates but not that of the factors.
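The compound Poisson representation mentioned in the abstract states that a negative binomial draw NB(r, p) can be generated as a Poisson-distributed number of logarithmic-series draws. The small simulation below checks this identity numerically; it is only an illustrative sketch, not the paper's Gibbs sampler, and note that NumPy parameterizes the negative binomial by the failure probability, so 1 - p is passed there.

```python
import numpy as np

rng = np.random.default_rng(0)
r, p = 5.0, 0.4   # NB(r, p) in the paper's parameterization, mean r*p/(1-p)
n = 50_000

# Direct draws: NumPy's negative_binomial takes the failure probability, hence 1-p.
direct = rng.negative_binomial(r, 1.0 - p, size=n)

# Compound Poisson representation: x = sum of L i.i.d. Logarithmic(p) draws,
# where L ~ Poisson(-r * log(1-p)).
L = rng.poisson(-r * np.log(1.0 - p), size=n)
compound = np.array([rng.logseries(p, size=l).sum() if l > 0 else 0 for l in L])

# Both empirical means should be close to r*p/(1-p) = 10/3.
print(direct.mean(), compound.mean())
```

Both sample means agree up to Monte Carlo error, which is what makes the representation usable for inference: conditioning on the latent Poisson count turns a negative binomial likelihood into a Poisson one.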

Article information

Bayesian Anal., Volume 13, Number 4 (2018), 1065-1093.

First available in Project Euclid: 16 November 2017


Keywords: burstiness; count matrix factorization; hierarchical gamma-negative binomial process; parsimonious representation; self- and cross-excitation

Creative Commons Attribution 4.0 International License.


Zhou, Mingyuan. Nonparametric Bayesian Negative Binomial Factor Analysis. Bayesian Anal. 13 (2018), no. 4, 1065--1093. doi:10.1214/17-BA1070.



  • Aldous, D. (1983). “Exchangeability and related topics.” In École d’Été de Probabilités de Saint-Flour XIII, 1–198. Springer.
  • Antoniak, C. E. (1974). “Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems.” Annals of Statistics, 2(6): 1152–1174.
  • Blei, D. and Lafferty, J. (2005). “Correlated Topic Models.” In NIPS, 147–154.
  • Blei, D., Ng, A., and Jordan, M. (2003). “Latent Dirichlet allocation.” Journal of Machine Learning Research, 3: 993–1022.
  • Broderick, T., Mackey, L., Paisley, J., and Jordan, M. I. (2015). “Combinatorial Clustering and the Beta Negative Binomial Process.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Buntine, W. and Jakulin, A. (2006). “Discrete Component Analysis.” In Subspace, Latent Structure and Feature Selection Techniques. Springer-Verlag.
  • Canny, J. (2004). “GaP: a factor model for discrete data.” In SIGIR.
  • Church, K. W. and Gale, W. A. (1995). “Poisson mixtures.” Natural Language Engineering.
  • Doyle, G. and Elkan, C. (2009). “Accounting for burstiness in topic models.” In ICML.
  • Dunson, D. B. and Herring, A. H. (2005). “Bayesian latent variable models for mixed discrete outcomes.” Biostatistics, 6(1): 11–25.
  • Escobar, M. D. and West, M. (1995). “Bayesian density estimation and inference using mixtures.” Journal of the American Statistical Association.
  • Ewens, W. J. (1972). “The sampling theory of selectively neutral alleles.” Theoretical Population Biology, 3(1): 87–112.
  • Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J. (2008). “LIBLINEAR: A Library for Large Linear Classification.” Journal of Machine Learning Research, 9: 1871–1874.
  • Ferguson, T. S. (1973). “A Bayesian analysis of some nonparametric problems.” Annals of Statistics, 1(2): 209–230.
  • Fox, E. B., Sudderth, E. B., Jordan, M. I., and Willsky, A. S. (2011). “A sticky HDP-HMM with application to speaker diarization.” Annals of Applied Statistics.
  • Gan, Z., Chen, C., Henao, R., Carlson, D., and Carin, L. (2015). “Scalable Deep Poisson Factor Analysis for Topic Modeling.” In ICML.
  • Griffiths, T. L. and Steyvers, M. (2004). “Finding Scientific Topics.” PNAS.
  • Hofmann, T. (1999). “Probabilistic Latent Semantic Analysis.” In UAI.
  • Ishwaran, H. and James, L. F. (2001). “Gibbs sampling methods for stick-breaking priors.” Journal of the American Statistical Association, 96(453): 161–173.
  • Lee, D. D. and Seung, H. S. (2001). “Algorithms for Non-negative Matrix Factorization.” In NIPS.
  • Lijoi, A., Mena, R. H., and Prünster, I. (2007). “Controlling the reinforcement in Bayesian non-parametric mixture models.” Journal of the Royal Statistical Society: Series B, 69(4): 715–740.
  • Madsen, R. E., Kauchak, D., and Elkan, C. (2005). “Modeling word burstiness using the Dirichlet distribution.” In ICML.
  • Mosimann, J. E. (1962). “On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions.” Biometrika, 65–82.
  • Newman, D., Asuncion, A., Smyth, P., and Welling, M. (2009). “Distributed algorithms for topic models.” Journal of Machine Learning Research.
  • Paisley, J., Wang, C., and Blei, D. M. (2012). “The Discrete Infinite Logistic Normal Distribution.” Bayesian Analysis.
  • Papaspiliopoulos, O. and Roberts, G. O. (2008). “Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models.” Biometrika.
  • Pitman, J. (2006). Combinatorial stochastic processes. Lecture Notes in Mathematics. Springer-Verlag.
  • Pritchard, J. K., Stephens, M., and Donnelly, P. (2000). “Inference of population structure using multilocus genotype data.” Genetics, 155(2): 945–959.
  • Ranganath, R., Tang, L., Charlin, L., and Blei, D. M. (2015). “Deep exponential families.” In AISTATS.
  • Regazzini, E., Lijoi, A., and Prünster, I. (2003). “Distributional results for means of normalized random measures with independent increments.” Annals of Statistics, 31(2): 560–585.
  • Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). “Hierarchical Dirichlet processes.” Journal of the American Statistical Association, 101: 1566–1581.
  • Walker, S. G. (2007). “Sampling the Dirichlet mixture model with slices.” Communications in Statistics Simulation and Computation.
  • Wallach, H. M., Mimno, D. M., and McCallum, A. (2009a). “Rethinking LDA: Why priors matter.” In NIPS.
  • Wallach, H. M., Murray, I., Salakhutdinov, R., and Mimno, D. (2009b). “Evaluation Methods for Topic Models.” In ICML.
  • Zhou, M. (2017). “Nonparametric Bayesian Negative Binomial Factor Analysis: Supplementary Material.” Bayesian Analysis.
  • Zhou, M. and Carin, L. (2015). “Negative Binomial Process Count and Mixture Modeling.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 307–320.
  • Zhou, M., Cong, Y., and Chen, B. (2016a). “Augmentable Gamma Belief Networks.” JMLR, 17(163): 1–44.
  • Zhou, M., Hannah, L., Dunson, D., and Carin, L. (2012). “Beta-Negative Binomial Process and Poisson Factor Analysis.” In AISTATS, 1462–1471.
  • Zhou, M., Padilla, O. H. M., and Scott, J. G. (2016b). “Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes.” Journal of the American Statistical Association, 111(515): 1144–1156.

Supplemental materials