The Annals of Statistics

Tensor decompositions and sparse log-linear models

James E. Johndrow, Anirban Bhattacharya, and David B. Dunson



Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
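To make the connection in the abstract concrete: in a latent structure (latent class) model, the joint probability mass function of multivariate categorical data factorizes as a nonnegative PARAFAC decomposition, $p(y_1,\ldots,y_p)=\sum_{h=1}^{k}\nu_h\prod_j \lambda^{(j)}_{h}(y_j)$. The sketch below builds such a probability tensor with NumPy; all dimensions, the number of classes, and the random seed are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent class (PARAFAC) model for p = 3 categorical variables with
# d_j = 2, 3, 2 levels and k = 2 latent classes (illustrative values).
k = 2
dims = (2, 3, 2)

# Mixture weights nu_h over latent classes (a probability vector).
nu = rng.dirichlet(np.ones(k))

# Class-conditional pmfs lambda^{(j)}_h: for each variable j, a (k, d_j)
# matrix whose rows are probability vectors over the d_j levels.
lam = [rng.dirichlet(np.ones(d), size=k) for d in dims]

# Joint pmf: p(y1, y2, y3) = sum_h nu_h * prod_j lambda^{(j)}_h(y_j),
# i.e. a rank-k nonnegative PARAFAC decomposition of the tensor P.
P = np.einsum('h,ha,hb,hc->abc', nu, lam[0], lam[1], lam[2])

assert np.isclose(P.sum(), 1.0)  # P is a valid probability tensor
```

The number of latent classes `k` bounds the nonnegative rank of `P`, which is the notion of dimensionality reduction the paper relates to sparsity in log-linear models.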

Article information

Ann. Statist., Volume 45, Number 1 (2017), 1-38.

Received: April 2014
Revised: November 2015
First available in Project Euclid: 21 February 2017

Digital Object Identifier: doi:10.1214/15-AOS1414

Primary: 62F15: Bayesian inference

Keywords: Bayesian, categorical data, contingency table, latent class analysis, graphical model, high-dimensional, low rank, Parafac, Tucker, sparsity


Johndrow, James E.; Bhattacharya, Anirban; Dunson, David B. Tensor decompositions and sparse log-linear models. Ann. Statist. 45 (2017), no. 1, 1--38. doi:10.1214/15-AOS1414.


Supplemental materials

  • Supplement to: “Tensor decompositions and sparse log-linear models”. The supplement has three parts. The first gives a proof of Remark 3.4 and a constructive proof of a bound on the nonnegative rank of $d^{2}$ tensors corresponding to sparse log-linear models. The second gives an MCMC algorithm for posterior computation in c-Tucker models, and the third gives supplementary figures and tables for Section 5.
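For contrast with the PARAFAC form, a Tucker decomposition of a probability tensor replaces the single latent class index with one latent index per variable and a core tensor $g$ that is itself a joint pmf over those indices: $p(y)=\sum_{z_1,\ldots,z_p} g(z_1,\ldots,z_p)\prod_j \lambda^{(j)}_{z_j}(y_j)$. The collapsed Tucker (c-Tucker) class proposed in the paper interpolates between this and PARAFAC by sharing latent indices across groups of variables. The sketch below is an illustrative Tucker construction only; the dimensions and seed are arbitrary choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

dims = (2, 3, 2)   # levels d_j of the three observed variables (illustrative)
core = (2, 2, 2)   # per-mode latent dimensions (r_1, r_2, r_3) of the core

# Core tensor g: a joint pmf over the latent indices (z1, z2, z3).
g = rng.dirichlet(np.ones(int(np.prod(core)))).reshape(core)

# Mode-specific conditionals lambda^{(j)}: an (r_j, d_j) matrix whose
# rows are pmfs over the d_j observed levels.
lam = [rng.dirichlet(np.ones(d), size=r) for d, r in zip(dims, core)]

# Tucker form: p(y) = sum_z g(z) * prod_j lambda^{(j)}_{z_j}(y_j).
P = np.einsum('xyz,xa,yb,zc->abc', g, lam[0], lam[1], lam[2])

assert np.isclose(P.sum(), 1.0)  # again a valid probability tensor
```

PARAFAC is recovered when the core `g` is supported on its superdiagonal; collapsing some modes of the core so that several variables share one latent index yields the intermediate c-Tucker structure.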