Characterizing the shared memberships of individuals in a classification scheme poses severe interpretability issues, even when using a moderate number of classes (say, four). Mixed membership models quantify this phenomenon, but they typically focus on goodness-of-fit more than on interpretable inference. To achieve a good numerical fit, these models may, in fact, require many extreme profiles, making the results difficult to interpret. We introduce a new class of multivariate mixed membership models that, when variables can be partitioned into subject-matter-based domains, can provide a good fit to the data using fewer profiles than standard formulations. The proposed model explicitly accounts for the blocks of variables corresponding to the distinct domains, along with a cross-domain correlation structure that provides new information about the shared membership of individuals in a complex classification scheme. We specify a multivariate logistic normal distribution for the membership vectors, which makes it easy to incorporate auxiliary information through a latent multivariate logistic regression. A Bayesian approach to inference, relying on Pólya-gamma data augmentation, facilitates efficient posterior computation via Markov chain Monte Carlo. We apply this methodology to a spatially explicit study of malaria risk over time on the Brazilian Amazon frontier.
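The generative idea summarized above can be sketched as follows: one correlated Gaussian latent vector per individual, with a covariate-dependent mean (the latent regression), is split into domain-specific blocks. Each block is then mapped to a membership vector on that domain's simplex. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the additive logistic transform that fixes the last profile of each domain as a reference with logit 0, and all dimensions are hypothetical choices made for this sketch, and the paper's exact parameterization may differ.

```python
import numpy as np


def softmax_with_reference(eta):
    """Map K-1 free logits to a point on the K-simplex.

    Assumption: the last profile is a reference category whose logit is fixed at 0.
    """
    z = np.append(eta, 0.0)
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def sample_memberships(X, B, Sigma, profiles_per_domain, rng):
    """Draw domain-specific membership vectors from a multivariate logistic normal.

    X: (n, p) covariates (auxiliary information).
    B: (p, q) latent regression coefficients, with q = sum(K_d - 1).
    Sigma: (q, q) latent covariance; its off-diagonal blocks encode
        the cross-domain correlation between membership vectors.
    profiles_per_domain: list of K_d, the number of profiles in each domain.
    Returns a list with one (n, K_d) membership matrix per domain.
    """
    n = X.shape[0]
    q = Sigma.shape[0]
    # Latent logits: covariate-driven mean plus correlated Gaussian noise
    eta = X @ B + rng.multivariate_normal(np.zeros(q), Sigma, size=n)
    memberships = []
    start = 0
    for K in profiles_per_domain:
        block = eta[:, start:start + K - 1]  # logits belonging to this domain
        memberships.append(np.apply_along_axis(softmax_with_reference, 1, block))
        start += K - 1
    return memberships
```

For example, with two hypothetical domains having 3 and 2 profiles (so q = 3), a positive entry linking the first logit of domain 1 with the logit of domain 2 in `Sigma` makes individuals who lean toward a profile in one domain tend to lean toward a corresponding profile in the other. That tendency is the kind of shared membership the cross-domain structure is meant to capture.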
The work was partially supported by funding from grants R01ES027498 and R01ES028804 from the United States National Institute of Environmental Health Sciences and grant N00014-16-1-2147 from the Office of Naval Research.
Data for this study were extracted from the project “Land Use and Health” funded by the International Development Research Centre (IDRC), grant #94-0206-00, awarded to Centro de Desenvolvimento e Planejamento Regional, CEDEPLAR (Belo Horizonte, MG, Brazil), PI: Diana O. Sawyer. The data were cleaned and processed by Marcia C. Castro and originally used in Castro et al. (2006) (www.pnas.org/cgi/doi/10.1073/pnas.0510576103). Massimiliano Russo is also affiliated with the Department of Data Science, Dana-Farber Cancer Institute.
"Multivariate mixed membership modeling: Inferring domain-specific risk profiles." Ann. Appl. Stat. 16 (1) 391–413, March 2022. https://doi.org/10.1214/21-AOAS1496