## The Annals of Statistics

### Tensor decompositions and sparse log-linear models

#### Abstract

Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
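
As a rough illustration of the two notions of dimensionality reduction contrasted above, the standard forms can be written side by side. The notation here ($p$ categorical variables, $d_j$ levels for variable $j$, $k$ latent classes, and the symbols $\pi$, $\nu$, $\lambda$, $\theta$) is assumed for illustration and may differ from the paper's. A latent class model gives a PARAFAC-type factorization of the probability tensor:

$$
\pi_{c_1 \cdots c_p} = \Pr(y_1 = c_1, \ldots, y_p = c_p) = \sum_{h=1}^{k} \nu_h \prod_{j=1}^{p} \lambda^{(j)}_{h c_j},
$$

where $\nu$ lies in the probability simplex and each $\lambda^{(j)}_{h}$ is a probability vector over the $d_j$ levels of variable $j$; the smallest $k$ admitting such a representation is the nonnegative rank of $\pi$. A log-linear model instead writes

$$
\log \pi_{c_1 \cdots c_p} = \sum_{E \subseteq \{1,\ldots,p\}} \theta_E(c_E),
$$

with sparsity meaning that most of the interaction terms $\theta_E$ are identically zero.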

#### Article information

Source
Ann. Statist. Volume 45, Number 1 (2017), 1-38.

Dates
Revised: November 2015
First available in Project Euclid: 21 February 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1487667616

Digital Object Identifier
doi:10.1214/15-AOS1414

Zentralblatt MATH identifier
1367.62180

Subjects
Primary: 62F15: Bayesian inference

#### Citation

Johndrow, James E.; Bhattacharya, Anirban; Dunson, David B. Tensor decompositions and sparse log-linear models. Ann. Statist. 45 (2017), no. 1, 1–38. doi:10.1214/15-AOS1414. https://projecteuclid.org/euclid.aos/1487667616

#### Supplemental materials

• Supplement to: “Tensor decompositions and sparse log-linear models”. The supplement has three parts. The first gives a proof of Remark 3.4 and a constructive proof of a bound on the nonnegative rank of $d^{2}$ tensors corresponding to sparse log-linear models. The second gives an MCMC algorithm for posterior computation in c-Tucker models, and the third gives supplementary figures and tables for Section 5.