## The Annals of Applied Statistics

### Network inference and biological dynamics

#### Abstract

Network inference approaches are now widely used in biological applications to probe regulatory relationships between molecular components such as genes or proteins. Many methods have been proposed for this setting, but the connections and differences between their statistical formulations have received less attention. In this paper, we show how a broad class of statistical network inference methods, including a number of existing approaches, can be described in terms of variable selection for the linear model. This reveals some subtle but important differences between the methods, including the treatment of time intervals in discretely observed data. In developing a general formulation, we also explore the relationship between single-cell stochastic dynamics and network inference on averages over cells. This clarifies the link between biochemical networks as they operate at the cellular level and network inference as carried out on data that are averages over populations of cells. We present empirical results, comparing thirty-two network inference methods that are instances of the general formulation we describe, using two published dynamical models. Our investigation sheds light on the applicability and limitations of network inference and provides guidance for practitioners and suggestions for experimental design.

#### Article information

Source
Ann. Appl. Stat. Volume 6, Number 3 (2012), 1209–1235.

Dates
First available in Project Euclid: 31 August 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1346418580

Digital Object Identifier
doi:10.1214/11-AOAS532

Mathematical Reviews number (MathSciNet)
MR3012527

Zentralblatt MATH identifier
1257.62108

#### Citation

Oates, Chris J.; Mukherjee, Sach. Network inference and biological dynamics. Ann. Appl. Stat. 6 (2012), no. 3, 1209–1235. doi:10.1214/11-AOAS532. https://projecteuclid.org/euclid.aoas/1346418580

#### Supplemental materials

• Supplementary material: Additional materials. This supplement provides the dynamical systems used in this paper, accompanying MATLAB R2010a scripts, derivations, and additional figures (SFigures 1–16).