The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 11, Number 1 (2017), 161-184.
A statistical framework for data integration through graphical models with application to cancer genomics
Yuping Zhang, Zhengqing Ouyang, and Hongyu Zhao
Abstract
Recent advances in high-throughput biotechnologies have generated various types of genetic, genomic, epigenetic, transcriptomic and proteomic data across different biological conditions. Integrating data from diverse experiments may lead to a more unified and global view of biological systems and complex diseases. We present a coherent statistical framework, based on graphical models, for integrating various types of data from distinct but related biological conditions. Specifically, the framework is designed to model multiple networks that share regulatory mechanisms, using heterogeneous high-dimensional datasets. The performance of our approach is illustrated through simulations and applications to cancer genomics.
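The abstract describes estimating several related networks that share regulatory structure. As a rough illustration of that idea only, the sketch below implements a different, simpler and well-known technique: the group graphical lasso of Danaher, Wang and Witten, fitted by ADMM, which assumes every condition produces Gaussian data of the same type. It is not the authors' framework, which additionally handles heterogeneous data types. The function name `group_graphical_lasso`, the penalty weights `lam1` and `lam2`, and the ADMM settings (`rho`, `n_iter`, `tol`) are illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def group_graphical_lasso(S_list, n_list, lam1, lam2, rho=1.0, n_iter=200, tol=1e-5):
    """Illustrative sketch (not the paper's method): group graphical lasso via ADMM.

    Minimizes, over precision matrices Theta_1..Theta_K,
        sum_k n_k * [tr(S_k Theta_k) - log det Theta_k]
        + lam1 * sum_k sum_{i != j} |Theta_k[i, j]|
        + lam2 * sum_{i != j} sqrt(sum_k Theta_k[i, j]^2)
    S_list: K sample covariance matrices (p x p); n_list: K sample sizes.
    lam2 ties edge (i, j) across conditions, encouraging a shared graph.
    """
    K, p = len(S_list), S_list[0].shape[0]
    Z = [np.eye(p) for _ in range(K)]
    U = [np.zeros((p, p)) for _ in range(K)]
    off = ~np.eye(p, dtype=bool)  # only off-diagonal entries are penalized
    for _ in range(n_iter):
        # Theta-step: closed-form solution via an eigendecomposition per condition.
        Theta = []
        for S, n, Zk, Uk in zip(S_list, n_list, Z, U):
            d, V = np.linalg.eigh(S - (rho / n) * (Zk - Uk))
            theta = (n / (2 * rho)) * (-d + np.sqrt(d ** 2 + 4 * rho / n))
            Theta.append((V * theta) @ V.T)  # V diag(theta) V^T
        # Z-step: soft-threshold each entry, then shrink each edge group across K.
        A = np.stack([T + Uk for T, Uk in zip(Theta, U)])  # shape (K, p, p)
        st = soft_threshold(A, lam1 / rho)
        norm = np.sqrt((st ** 2).sum(axis=0))  # l2 norm of edge (i, j) across K
        scale = np.maximum(1 - (lam2 / rho) / np.maximum(norm, 1e-12), 0.0)
        Z_new = np.where(off, st * scale, A)  # diagonal left unpenalized
        # U-step: scaled dual update for the constraint Theta_k = Z_k.
        U = [Uk + T - Zk for Uk, T, Zk in zip(U, Theta, Z_new)]
        diff = max(np.abs(Zn - Zo).max() for Zn, Zo in zip(Z_new, Z))
        Z = list(Z_new)
        if diff < tol:
            break
    return Z
```

Increasing `lam2` removes an edge from all conditions at once (shared structure), while `lam1` permits condition-specific sparsity. With both penalties set to 0 and n > p, the estimate reduces to the inverse of each sample covariance matrix.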
Article information
Source
Ann. Appl. Stat. Volume 11, Number 1 (2017), 161-184.
Dates
Received: February 2016
Revised: September 2016
First available in Project Euclid: 8 April 2017
Permanent link to this document
http://projecteuclid.org/euclid.aoas/1491616876
Digital Object Identifier
doi:10.1214/16-AOAS998
Keywords
Cancer genomics; data integration; graphical models
Citation
Zhang, Yuping; Ouyang, Zhengqing; Zhao, Hongyu. A statistical framework for data integration through graphical models with application to cancer genomics. Ann. Appl. Stat. 11 (2017), no. 1, 161–184. doi:10.1214/16-AOAS998. http://projecteuclid.org/euclid.aoas/1491616876.
Supplemental materials
- Supplement to “A statistical framework for data integration through graphical models with application to cancer genomics.” We present technical and methodological details regarding the model and algorithm in Sections 2 and 4, as well as complementary results for the application in Section 7. Digital Object Identifier: doi:10.1214/16-AOAS998SUPP

More like this
- Detection of epigenomic network community oncomarkers
  Bartlett, Thomas E. and Zaikin, Alexey, The Annals of Applied Statistics, 2016
- A hierarchical Bayesian model for inference of copy number variants and their association to gene expression
  Cassese, Alberto, Guindani, Michele, Tadesse, Mahlet G., Falciani, Francesco, and Vannucci, Marina, The Annals of Applied Statistics, 2014
- An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling
  Merl, Daniel, Chen, Julia Ling-Yu, Chi, Jen-Tsan, and West, Mike, The Annals of Applied Statistics, 2009
- Sparse integrative clustering of multiple omics data sets
  Shen, Ronglai, Wang, Sijian, and Mo, Qianxing, The Annals of Applied Statistics, 2013
- Bayesian sparse graphical models for classification with application to protein expression data
  Baladandayuthapani, Veerabhadran, Talluri, Rajesh, Ji, Yuan, Coombes, Kevin R., Lu, Yiling, Hennessy, Bryan T., Davies, Michael A., and Mallick, Bani K., The Annals of Applied Statistics, 2014
- Bayesian joint modeling of multiple gene networks and diverse genomic data to identify target genes of a transcription factor
  Wei, Peng and Pan, Wei, The Annals of Applied Statistics, 2012
- A hierarchical framework for state-space matrix inference and clustering
  Zuo, Chandler, Chen, Kailei, Hewitt, Kyle J., Bresnick, Emery H., and Keleş, Sündüz, The Annals of Applied Statistics, 2016
- Learning a nonlinear dynamical system model of gene regulation: A perturbed steady-state approach
  Meister, Arwen, Li, Ye Henry, Choi, Bokyung, and Wong, Wing Hung, The Annals of Applied Statistics, 2013
- A Bayesian graphical model for genome-wide association studies (GWAS)
  Briollais, Laurent, Dobra, Adrian, Liu, Jinnan, Friedlander, Matt, Ozcelik, Hilmi, and Massam, Hélène, The Annals of Applied Statistics, 2016
- A Bayesian graphical modeling approach to microRNA regulatory network inference
  Stingo, Francesco C., Chen, Yian A., Vannucci, Marina, Barrier, Marianne, and Mirkes, Philip E., The Annals of Applied Statistics, 2010
