Open Access
December 2011
A sparse conditional Gaussian graphical model for analysis of genetical genomics data
Jianxin Yin, Hongzhe Li
Ann. Appl. Stat. 5(4): 2630-2650 (December 2011). DOI: 10.1214/11-AOAS494
Abstract

Genetical genomics experiments are now routinely conducted to measure both genetic markers and gene expression levels on the same subjects. The gene expression levels are often treated as quantitative traits and subjected to standard genetic analysis in order to identify expression quantitative trait loci (eQTL). However, the genetic architecture of many gene expression traits may be complex, and a poorly estimated genetic architecture may compromise inference of the dependency structure among genes at the transcriptional level. In this paper we introduce a sparse conditional Gaussian graphical model for studying the conditional independence relationships among a set of gene expressions after adjusting for possible genetic effects, where the gene expressions are modeled with seemingly unrelated regressions. We present an efficient coordinate descent algorithm to obtain penalized estimates of both the regression coefficients and the sparse concentration matrix. The corresponding graph can be used to determine the conditional independence among a group of genes while adjusting for shared genetic effects. We justify the proposed methods with simulation experiments and with asymptotic results on convergence rates and sparsistency. By sparsistency, we mean the property that all parameters that are zero are estimated as zero with probability tending to one. We apply our methods to the analysis of a yeast eQTL data set and demonstrate that the conditional Gaussian graphical model leads to a more interpretable gene network than a standard Gaussian graphical model based on gene expression data alone.
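The idea of adjusting for shared genetic effects can be illustrated with a small simulation. The sketch below is not the penalized coordinate descent estimator described in the abstract: it is an unpenalized, low-dimensional stand-in that regresses each expression on a marker by ordinary least squares and then inverts the residual covariance. All names and values (`gamma`, `theta_true`, the single binary marker, the sample size) are hypothetical choices made for illustration only.

```python
import numpy as np


def partial_correlation(data):
    """Partial correlations from the inverse sample covariance (concentration matrix)."""
    theta = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


rng = np.random.default_rng(0)
n = 5000

# One binary marker (hypothetical) that affects genes 0 and 1 but not gene 2.
x = rng.integers(0, 2, size=(n, 1)).astype(float)
gamma = np.array([[1.5, 1.5, 0.0]])

# Assumed conditional concentration matrix: the only edge is between genes 1 and 2.
theta_true = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.4],
    [0.0, 0.4, 1.0],
])
noise = rng.multivariate_normal(np.zeros(3), np.linalg.inv(theta_true), size=n)
y = x @ gamma + noise

# Standard Gaussian graphical model: expression data alone.
marginal = partial_correlation(y)

# Conditional version: remove the marker effect by least squares, then
# estimate the concentration matrix from the residuals.
design = np.hstack([np.ones((n, 1)), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coef
conditional = partial_correlation(residuals)

print("gene0-gene1, expression only:    ", round(marginal[0, 1], 3))
print("gene0-gene1, adjusted for marker:", round(conditional[0, 1], 3))
print("gene1-gene2, adjusted for marker:", round(conditional[1, 2], 3))
```

In this toy setting the shared marker induces a spurious partial correlation between genes 0 and 1 when expression data are used alone, and that association largely disappears once the marker effect is removed, while the genuine edge between genes 1 and 2 remains. In the high-dimensional setting the abstract targets, the least-squares step and the matrix inversion would be replaced by sparse, penalized estimates.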

Copyright © 2011 Institute of Mathematical Statistics
Jianxin Yin and Hongzhe Li "A sparse conditional Gaussian graphical model for analysis of genetical genomics data," The Annals of Applied Statistics 5(4), 2630-2650, (December 2011). https://doi.org/10.1214/11-AOAS494