The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 5, Number 4 (2011), 2630-2650.
A sparse conditional Gaussian graphical model for analysis of genetical genomics data
Genetical genomics experiments are now routinely conducted to measure both genetic markers and gene expression on the same subjects. The gene expression levels are often treated as quantitative traits and subjected to standard genetic analysis in order to identify expression quantitative trait loci (eQTL). However, the genetic architecture of many gene expressions may be complex, and a poorly estimated genetic architecture may compromise inference about the dependency structure of the genes at the transcriptional level. In this paper we introduce a sparse conditional Gaussian graphical model (cGGM) for studying the conditional independence relationships among a set of gene expressions after adjusting for possible genetic effects, where the gene expressions are modeled with seemingly unrelated regressions. We present an efficient coordinate descent algorithm that yields penalized estimates of both the regression coefficients and the sparse concentration matrix. The corresponding graph can be used to determine conditional independence among a group of genes while adjusting for shared genetic effects. Simulation experiments, together with asymptotic convergence rates and sparsistency results, are used to justify the proposed methods. By sparsistency, we mean the property that all parameters that are zero are estimated as exactly zero with probability tending to one. We apply our methods to a yeast eQTL data set and demonstrate that the conditional Gaussian graphical model leads to a more interpretable gene network than a standard Gaussian graphical model based on gene expression data alone.
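To make concrete why adjusting for shared genetic effects can change the estimated graph, the following minimal sketch works through a two-gene toy case in closed form. It is not the authors' algorithm and involves no penalized estimation: the marker effects, variances, and function names are hypothetical values chosen only for illustration. Two expressions depend on one shared marker X with independent noise, so they are conditionally independent given X, yet the marginal concentration matrix (the inverse covariance of the expressions alone) has a nonzero off-diagonal entry, which a standard Gaussian graphical model would read as an edge.

```python
# Toy illustration (hypothetical numbers, not taken from the paper):
# Y_j = effects[j] * X + e_j for j = 1, 2, with independent noise e_j.

def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def marginal_covariance(effects, marker_var, noise_var):
    """Cov(Y) when the marker X is ignored."""
    g1, g2 = effects
    shared = g1 * g2 * marker_var  # covariance induced by the shared marker
    return [[g1 * g1 * marker_var + noise_var, shared],
            [shared, g2 * g2 * marker_var + noise_var]]

def conditional_covariance(noise_var):
    """Cov(Y | X): only the independent noise remains once X is fixed."""
    return [[noise_var, 0.0], [0.0, noise_var]]

effects = (2.0, 1.5)  # assumed marker effects on gene 1 and gene 2
marker_var = 1.0      # assumed variance of the marker
noise_var = 1.0       # assumed noise variance for each gene

marginal_precision = inverse_2x2(
    marginal_covariance(effects, marker_var, noise_var))
conditional_precision = inverse_2x2(conditional_covariance(noise_var))

# An edge in the graph corresponds to a nonzero off-diagonal entry
# of the concentration (precision) matrix.
tol = 1e-12
print("edge ignoring the marker:", abs(marginal_precision[0][1]) > tol)
print("edge adjusting for the marker:", abs(conditional_precision[0][1]) > tol)
```

In this toy setting the edge that appears when the marker is ignored is entirely due to the shared genetic effect and disappears once the marker is conditioned on. With real data both the marker effects and the concentration matrix are unknown and must be estimated.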
First available in Project Euclid: 20 December 2011
Yin, Jianxin; Li, Hongzhe. A sparse conditional Gaussian graphical model for analysis of genetical genomics data. Ann. Appl. Stat. 5 (2011), no. 4, 2630-2650. doi:10.1214/11-AOAS494. https://projecteuclid.org/euclid.aoas/1324399609
- Supplementary material: Supplemental materials for “A sparse conditional Gaussian graphical model for analysis of genetical genomics data”. The online supplemental materials include the simulation standard errors for Tables 1 and 2; two propositions, one on the Hessian matrix of the likelihood function and one on the convergence of the algorithm; and the theoretical properties of the proposed penalized estimates of the sparse cGGM: their asymptotic distribution, the oracle properties when p and q are fixed as n → ∞, and the convergence rates and sparsistency of the estimators when p = p_n and q = q_n diverge as n → ∞. All proofs are also given in the supplemental materials.