September 2023 Bayesian combinatorial MultiStudy factor analysis
Isabella N. Grabski, Roberta De Vito, Lorenzo Trippa, Giovanni Parmigiani
Ann. Appl. Stat. 17(3): 2212-2235 (September 2023). DOI: 10.1214/22-AOAS1715

Abstract

Mutations in the BRCA1 and BRCA2 genes are known to be highly associated with breast cancer. Identifying both shared and unique transcript expression patterns in blood samples from these groups can shed light on whether and how the disease mechanisms differ among individuals by mutation status, but this is challenging in the high-dimensional setting. A recent method, Bayesian multistudy factor analysis (BMSFA), identifies latent factors common to all studies (or equivalently, groups) and latent factors specific to individual studies. However, BMSFA does not allow for factors shared by more than one but fewer than all studies. This is critical in our context, as we may expect some but not all signals to be shared by BRCA1- and BRCA2-mutation carriers but not necessarily by other high-risk groups. We extend BMSFA by introducing a new method, Tetris, for Bayesian combinatorial multistudy factor analysis, which identifies latent factors that any combination of studies or groups can share. We model the subsets of studies that share latent factors with an Indian buffet process and offer a way to summarize uncertainty in the sharing patterns using credible balls. We test our method with an extensive range of simulations and showcase its utility not only in dimension reduction but also in covariance estimation. When applied to transcript expression data from high-risk families grouped by mutation status, Tetris reveals the features and pathways characterizing each group and the sharing patterns among them. Finally, we further extend Tetris to discover groupings of samples when group labels are not provided, which can elucidate additional structure in these data.
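
The idea of letting any combination of studies share a factor can be illustrated with a small simulation. The sketch below is not the authors' implementation or their Gibbs sampler. It draws a binary study-by-factor sharing matrix from a standard Indian buffet process and then forms one covariance matrix per study. The function names (`sample_ibp`, `study_covariances`), the parameter `alpha`, the loadings `Phi`, the noise variances `psi`, and the covariance form `Phi diag(a_s) Phi^T + diag(psi_s)` are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def sample_ibp(num_studies, alpha, rng):
    """Draw a binary sharing matrix A (studies x factors) from an Indian buffet process."""
    columns = []  # one list of 0/1 flags per factor, one flag per study seen so far
    for s in range(num_studies):
        # Existing factors: study s uses factor k with probability m_k / (s + 1),
        # where m_k counts the earlier studies that already use factor k.
        for col in columns:
            m_k = sum(col)
            col.append(int(rng.random() < m_k / (s + 1)))
        # New factors: Poisson(alpha / (s + 1)) factors introduced by study s.
        for _ in range(rng.poisson(alpha / (s + 1))):
            columns.append([0] * s + [1])
    if not columns:
        return np.zeros((num_studies, 0), dtype=int)
    return np.array(columns, dtype=int).T

def study_covariances(A, Phi, psi):
    """Assumed per-study covariance: Phi diag(a_s) Phi^T + diag(psi_s)."""
    return [Phi @ np.diag(a_s) @ Phi.T + np.diag(psi_s) for a_s, psi_s in zip(A, psi)]

rng = np.random.default_rng(0)
S, P = 4, 6                                # hypothetical: 4 studies, 6 features
A = sample_ibp(S, alpha=2.0, rng=rng)      # A[s, k] = 1 if study s uses factor k
K = A.shape[1]
Phi = rng.normal(size=(P, K))              # hypothetical factor loadings
psi = rng.uniform(0.5, 1.5, size=(S, P))   # hypothetical study-specific noise variances
covs = study_covariances(A, Phi, psi)

# Classify each factor by how many studies share it.
shared_by = A.sum(axis=0)
print("common to all studies:", int((shared_by == S).sum()))
print("shared by some studies:", int(((shared_by > 1) & (shared_by < S)).sum()))
print("study-specific:", int((shared_by == 1).sum()))
```

In this sketch, a column of `A` that is all ones plays the role of a common factor, a column with a single one plays the role of a study-specific factor, and every other column is a partially shared factor, the case that the abstract says BMSFA cannot represent.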

Funding Statement

Research supported by the U.S.A.’s National Science Foundation grants NSF-DMS1810829 and 2113707 (LT and GP), the National Cancer Institute of the National Institutes of Health under award 5R01CA262710-02 (LT and GP), the National Library of Medicine of the National Institutes of Health under Award Number T32LM012411 (ING), and the National Science Foundation Graduate Research Fellowship under Grant No. DGE1745303 (ING).

Acknowledgments

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We thank Dr. Arkaprava Roy for guidance on the implementation of the PFA model.

Citation

Isabella N. Grabski, Roberta De Vito, Lorenzo Trippa, Giovanni Parmigiani. "Bayesian combinatorial MultiStudy factor analysis." Ann. Appl. Stat. 17(3): 2212-2235, September 2023. https://doi.org/10.1214/22-AOAS1715

Information

Received: 1 April 2021; Revised: 1 November 2022; Published: September 2023
First available in Project Euclid: 7 September 2023

MathSciNet: MR4637664
Digital Object Identifier: 10.1214/22-AOAS1715

Keywords: dimension reduction, factor analysis, Gibbs sampling, multistudy analysis, unsupervised learning

Rights: Copyright © 2023 Institute of Mathematical Statistics

JOURNAL ARTICLE
24 PAGES

