The Annals of Applied Statistics

Joint estimation of multiple related biological networks

Chris J. Oates, Jim Korkola, Joe W. Gray, and Sach Mukherjee

Abstract

Graphical models are widely used to make inferences concerning interplay in multivariate systems. In many applications, data are collected from multiple related but nonidentical units whose underlying networks may differ but are likely to share features. Here we present a hierarchical Bayesian formulation for joint estimation of multiple networks in this nonidentically distributed setting. The approach is general: given a suitable class of graphical models, it uses an exchangeability assumption on networks to provide a corresponding joint formulation. Motivated by emerging experimental designs in molecular biology, we focus on time-course data with interventions, using dynamic Bayesian networks as the graphical models. We introduce a computationally efficient, deterministic algorithm for exact joint inference in this setting. We provide an upper bound on the gains that joint estimation offers relative to separate estimation for each network and empirical results that support and extend the theory, including an extensive simulation study and an application to proteomic data from human cancer cell lines. Finally, we describe approximations that are still more computationally efficient than the exact algorithm and that also demonstrate good empirical performance.
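
The abstract does not give model details, so the following is only a toy sketch of how exact, deterministic joint inference can work when related units are tied together through a shared latent structure. It assumes a star-shaped model for a single node: each unit j has its own parent set, all units are coupled to one latent parent set, and the coupling decays with the size of the symmetric difference between parent sets. The coupling form, the strength `lam`, the in-degree limit of 2, the random likelihood scores and all function names are illustrative assumptions, not the authors' formulation or their MATLAB implementation. Posterior marginals are computed with sum-product (belief propagation) messages and compared against brute-force enumeration.

```python
import itertools
import numpy as np


def hamming_coupling(parent_sets, lam):
    """Assumed coupling p(unit set a | latent set g), proportional to exp(-lam * |g symmetric-difference a|)."""
    dist = np.array([[len(g ^ a) for a in parent_sets] for g in parent_sets], dtype=float)
    weights = np.exp(-lam * dist)
    return weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1


def joint_posterior_marginals(lik, prior_latent, coupling):
    """Exact per-unit posterior marginals on a star-shaped model via sum-product.

    lik[j, a]       : likelihood of unit j's data under parent set a   (J x K)
    prior_latent[g] : prior probability of latent parent set g         (K,)
    coupling[g, a]  : p(unit parent set a | latent parent set g)       (K x K)
    """
    lik = np.asarray(lik, dtype=float)
    # Message from each unit to the latent hub: msgs[j, g] = sum_a coupling[g, a] * lik[j, a]
    msgs = lik @ coupling.T
    post = np.empty_like(lik)
    for j in range(lik.shape[0]):
        # Hub belief excluding unit j: latent prior times messages from all other units
        hub = prior_latent * np.prod(np.delete(msgs, j, axis=0), axis=0)
        # Message from hub to unit j, combined with unit j's own likelihood
        unnorm = lik[j] * (hub @ coupling)
        post[j] = unnorm / unnorm.sum()
    return post


def brute_force_marginals(lik, prior_latent, coupling):
    """Same marginals by enumerating every joint configuration (for checking only)."""
    J, K = lik.shape
    post = np.zeros((J, K))
    for g in range(K):
        for config in itertools.product(range(K), repeat=J):
            w = prior_latent[g]
            for j, a in enumerate(config):
                w *= coupling[g, a] * lik[j, a]
            for j, a in enumerate(config):
                post[j, a] += w
    return post / post.sum(axis=1, keepdims=True)


# Candidate parent sets for one node: subsets of 3 predictors with at most 2 members.
parents = [frozenset(s) for r in range(3) for s in itertools.combinations(range(3), r)]
rng = np.random.default_rng(0)
lik = rng.random((3, len(parents)))              # made-up likelihood scores for 3 units
prior = np.full(len(parents), 1.0 / len(parents))  # uniform prior on the latent set
coupling = hamming_coupling(parents, lam=1.5)

post = joint_posterior_marginals(lik, prior, coupling)
print(np.allclose(post, brute_force_marginals(lik, prior, coupling)))
```

In this sketch, setting `lam=0` makes the coupling uniform, so each unit's posterior reduces to its own normalized likelihood, which corresponds to estimating each unit separately; larger values of `lam` share more information across units.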

Article information

Source
Ann. Appl. Stat. Volume 8, Number 3 (2014), 1892-1919.

Dates
First available in Project Euclid: 23 October 2014

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1414091238

Digital Object Identifier
doi:10.1214/14-AOAS761

Mathematical Reviews number (MathSciNet)
MR3271357

Zentralblatt MATH identifier
1304.62136

Keywords
Bayesian network; hierarchical model; belief propagation; information sharing

Citation

Oates, Chris J.; Korkola, Jim; Gray, Joe W.; Mukherjee, Sach. Joint estimation of multiple related biological networks. Ann. Appl. Stat. 8 (2014), no. 3, 1892–1919. doi:10.1214/14-AOAS761. http://projecteuclid.org/euclid.aoas/1414091238.


Supplemental materials

  • Supplementary material A: Additional results and protocols. Includes: Alternative data generating models; robustness to in-degree restriction, outliers, batch effects and nonexchangeability; ancillary information for breast cancer; inferred wild type networks for breast cancer.
  • Supplementary material B: Computational implementation. MATLAB R2014a code (serial and parallel) implementing joint network inference.