Multivariate mixed membership modeling: Inferring domain-specific risk profiles
Massimiliano Russo, Burton H. Singer, David B. Dunson
Ann. Appl. Stat. 16(1): 391-413 (March 2022). DOI: 10.1214/21-AOAS1496

Abstract

Characterizing the shared memberships of individuals in a classification scheme poses severe interpretability issues, even with a moderate number of classes (say, four). Mixed membership models quantify this phenomenon, but they typically focus on goodness of fit rather than on interpretable inference. To achieve a good numerical fit, these models may in fact require many extreme profiles, making the results difficult to interpret. We introduce a new class of multivariate mixed membership models that, when variables can be partitioned into subject-matter-based domains, can provide a good fit to the data using fewer profiles than standard formulations. The proposed model explicitly accounts for the blocks of variables corresponding to the distinct domains, along with a cross-domain correlation structure that provides new information about the shared membership of individuals in a complex classification scheme. We specify a multivariate logistic normal distribution for the membership vectors, which makes it easy to incorporate auxiliary information through a latent multivariate logistic regression. A Bayesian approach to inference, relying on Pólya-gamma data augmentation, facilitates efficient posterior computation via Markov chain Monte Carlo. We apply this methodology to a spatially explicit study of malaria risk over time on the Brazilian Amazon frontier.
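To make the model structure described above more concrete, here is a minimal generative sketch in NumPy. It is not the authors' code and it covers only simulation, not the Pólya-gamma MCMC sampler. Several choices are assumptions made for illustration: each domain's membership vector is the softmax of a block of latent Gaussian scores with the last profile as a reference category (score fixed at 0); a single joint covariance `cov` across all blocks supplies the cross-domain correlation; covariates enter through a linear predictor `X @ B` (the "latent multivariate logistic regression"); and responses follow the usual grade-of-membership/LDA-style step of drawing a profile per variable, then a category. All function names, dimensions, and parameter values are hypothetical.

```python
import numpy as np


def softmax_rows(eta):
    """Map each row of unconstrained scores onto the probability simplex."""
    z = eta - eta.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sample_domain_memberships(X, B, cov, profiles_per_domain, rng):
    """Draw one membership vector per subject and domain (logistic normal sketch).

    X: (n, p) covariates; B: (p, K) coefficients, with K = sum(H_d - 1) over domains.
    Latent scores eta_i = x_i^T B + eps_i, eps_i ~ N(0, cov); the off-diagonal
    blocks of `cov` correlate the domains. Within each domain the last profile
    is the reference category (score fixed at 0), an assumption of this sketch.
    """
    n = X.shape[0]
    K = B.shape[1]
    eta = X @ B + rng.multivariate_normal(np.zeros(K), cov, size=n)  # (n, K)
    memberships, start = [], 0
    for H in profiles_per_domain:
        block = eta[:, start:start + H - 1]
        block = np.hstack([block, np.zeros((n, 1))])  # append reference profile
        memberships.append(softmax_rows(block))
        start += H - 1
    return memberships


def sample_responses(pi, profile_probs, rng):
    """Generate categorical responses for one domain, mixed-membership style.

    pi: (n, H) memberships; profile_probs: (J, H, C) category probabilities of
    each of J variables under each extreme profile. For every subject and
    variable, a profile is drawn from pi[i], then a category from that profile.
    """
    n, H = pi.shape
    J, _, C = profile_probs.shape
    y = np.empty((n, J), dtype=int)
    for i in range(n):
        for j in range(J):
            h = rng.choice(H, p=pi[i])
            y[i, j] = rng.choice(C, p=profile_probs[j, h])
    return y


rng = np.random.default_rng(0)
profiles = [3, 4]                      # hypothetical: two domains, 3 and 4 profiles
K = sum(h - 1 for h in profiles)       # 5 latent scores in total
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
B = rng.normal(scale=0.5, size=(2, K))
cov = 0.5 * np.eye(K) + 0.5            # unit variances, 0.5 correlation between all scores
pis = sample_domain_memberships(X, B, cov, profiles, rng)
probs = rng.dirichlet(np.ones(3), size=(6, profiles[0]))  # 6 variables, 3 categories
y = sample_responses(pis[0], probs, rng)
print([m.shape for m in pis], y.shape)  # [(200, 3), (200, 4)] (200, 6)
```

Because the domains share one Gaussian draw per subject, a subject with high weight on a given profile in one domain tends, under positive correlation, to have correlated weights in the other domain; this is the kind of cross-domain dependence the abstract refers to, here encoded only through the assumed `cov`.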

Funding Statement

The work was partially supported by funding from grants R01ES027498 and R01ES028804 from the United States National Institute of Environmental Health Sciences and grant N00014-16-1-2147 from the Office of Naval Research.

Acknowledgments

Data for this study were extracted from the project “Land Use and Health” funded by the International Development Research Centre (IDRC), grant #94-0206-00, awarded to Centro de Desenvolvimento e Planejamento Regional, CEDEPLAR (Belo Horizonte, MG, Brazil), PI: Diana O. Sawyer. The data were cleaned and processed by Marcia C. Castro and originally used in Castro et al. (2006) (www.pnas.org/cgi/doi/10.1073/pnas.0510576103). Massimiliano Russo is also affiliated with the Department of Data Science, Dana-Farber Cancer Institute.

Citation

Massimiliano Russo. Burton H. Singer. David B. Dunson. "Multivariate mixed membership modeling: Inferring domain-specific risk profiles." Ann. Appl. Stat. 16 (1) 391 - 413, March 2022. https://doi.org/10.1214/21-AOAS1496

Information

Received: 1 July 2020; Revised: 1 December 2020; Published: March 2022
First available in Project Euclid: 28 March 2022

MathSciNet: MR4400515
zbMATH: 1498.62124
Digital Object Identifier: 10.1214/21-AOAS1496

Keywords: admixture model, contingency table, latent Dirichlet allocation, multivariate categorical data, multivariate logistic normal distribution

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
23 PAGES

Vol.16 • No. 1 • March 2022