Annals of Applied Statistics

Refining cellular pathway models using an ensemble of heterogeneous data sources

Alexander M. Franks, Florian Markowetz, and Edoardo M. Airoldi

Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment-specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.

Article information

Ann. Appl. Stat., Volume 12, Number 3 (2018), 1361-1384.

Received: March 2014
Revised: January 2016
First available in Project Euclid: 11 September 2018

Keywords: Multi-level modeling; statistical network analysis; Bayesian inference; regulation and signaling dynamics


Franks, Alexander M.; Markowetz, Florian; Airoldi, Edoardo M. Refining cellular pathway models using an ensemble of heterogeneous data sources. Ann. Appl. Stat. 12 (2018), no. 3, 1361-1384. doi:10.1214/16-AOAS915.



Supplemental materials

  • Supplementary figures. In this Appendix, we give convergence diagnostics for network statistics, we explore sensitivity of the results to variations in the compartment map, and we give more details about the simulation results.