Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 7, Number 2 (2013), 799-822.
Sparse latent factor models with interactions: Analysis of gene expression data
Sparse latent multi-factor models have been used in many exploratory and predictive problems with high-dimensional multivariate observations. Because of concerns about identifiability, the latent factors are almost always assumed to be linearly related to measured feature variables. Here we explore the analysis of multi-factor models with different structures of interactions between latent factors, including multiplicative effects as well as a more general framework for nonlinear interactions introduced via a Gaussian process. We utilize sparsity priors to test whether the factors and interaction terms have a significant effect. The performance of the models is evaluated through simulated and real data applications in genomics. Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. We examine interactions between factors directly associated with different chromosomal regions detected with copy number alteration in breast cancer data. In this context, significant interaction effects for specific genes suggest synergies between duplications and deletions in different regions of the chromosome.
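To make the idea of a multiplicative interaction between latent factors concrete, the following is a minimal simulation sketch. It is not the authors' specification: the abstract does not give the model equations, so the two-factor form, the function name `simulate_interaction_model`, the inclusion probabilities, and the "zero or Gaussian" draw used to mimic a sparsity prior are all assumptions made for illustration.

```python
import random


def simulate_interaction_model(n_genes=6, n_samples=200, seed=0, noise_sd=0.5):
    """Simulate expression data from a hypothetical two-factor model.

    Assumed form (illustrative only, not taken from the paper):
        X[g][i] = a1[g]*l1[i] + a2[g]*l2[i] + th[g]*l1[i]*l2[i] + noise
    where l1, l2 are latent factor scores per sample, a1, a2 are sparse
    linear loadings per gene, and th is a sparse interaction loading.
    """
    rng = random.Random(seed)

    # Latent factor scores, one value per sample.
    l1 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    l2 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    def sparse_loading(p_nonzero):
        # Exactly zero with probability 1 - p_nonzero, Gaussian otherwise.
        # This mimics the effect of a sparsity prior; it is an assumption.
        return rng.gauss(0.0, 2.0) if rng.random() < p_nonzero else 0.0

    a1 = [sparse_loading(0.5) for _ in range(n_genes)]
    a2 = [sparse_loading(0.5) for _ in range(n_genes)]
    th = [sparse_loading(0.3) for _ in range(n_genes)]  # interaction loadings

    # Rows are genes, columns are samples.
    X = [
        [
            a1[g] * l1[i] + a2[g] * l2[i] + th[g] * l1[i] * l2[i]
            + rng.gauss(0.0, noise_sd)
            for i in range(n_samples)
        ]
        for g in range(n_genes)
    ]
    return X, a1, a2, th, l1, l2


if __name__ == "__main__":
    X, a1, a2, th, l1, l2 = simulate_interaction_model()
    # Genes with a nonzero interaction loading are the ones for which the
    # two factors act jointly rather than additively in this toy setup.
    print([g for g, t in enumerate(th) if t != 0.0])
```

In this toy setup, a gene whose interaction loading is exactly zero responds only additively to the two factors, which is the kind of question a sparsity prior on the interaction terms is meant to address.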
First available in Project Euclid: 27 June 2013
Mayrink, Vinicius Diniz; Lucas, Joseph Edward. Sparse latent factor models with interactions: Analysis of gene expression data. Ann. Appl. Stat. 7 (2013), no. 2, 799-822. doi:10.1214/12-AOAS607. https://projecteuclid.org/euclid.aoas/1372338468
- Supplementary material: Sparse latent factor models with interactions: Posterior computation, simulated studies and gene selection procedure. Additional material containing the following: formulations of the complete conditional posterior distributions for parameters in the proposed models, simulated studies to evaluate the performance of the models, and the description of the procedure used to select genes for the real applications.