The Annals of Applied Statistics

Inference and characterization of multi-attribute networks with application to computational biology

Natallia Katenka and Eric D. Kolaczyk

Full-text: Open access

Abstract

Our work is motivated by, and illustrated with, an application of association networks in computational biology, specifically in the context of gene/protein regulatory networks. Association networks represent systems of interacting elements, where a link between two different elements indicates a sufficient level of similarity between element attributes. While in reality relational ties between elements can be expected to be based on similarity across multiple attributes, the vast majority of work to date on association networks involves ties defined with respect to only a single attribute. We propose an approach for the inference of multi-attribute association networks from measurements on continuous attribute variables, using canonical correlation and a hypothesis-testing strategy. Within this context, we then study the impact of partial information on multi-attribute network inference and characterization, when only a subset of attributes is available. We consider in detail the case of two attributes, wherein we examine through a combination of analytical and numerical techniques the implications of the choice and number of node attributes on the ability to detect network links and, more generally, to estimate higher-level network summary statistics, such as node degree, clustering coefficients and measures of centrality. Illustrations and applications throughout the paper are developed using gene and protein expression measurements on human cancer cell lines from the NCI-60 database.
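The abstract describes the inference step only at a high level. The following is a minimal sketch, not the authors' implementation, assuming each node carries exactly two continuous attributes (for instance a gene and a protein expression profile) observed on the same n samples. For every node pair it computes the canonical correlations between the two attribute pairs, tests the null hypothesis that all canonical correlations are zero using Bartlett's chi-square approximation, controls the false discovery rate across pairs with the Benjamini-Hochberg procedure, and then computes node degree and local clustering coefficient on the inferred graph. The choice of Bartlett's test and Benjamini-Hochberg, and all function names (`wilks_lambda`, `bartlett_pvalue`, `infer_network`, `degree_and_clustering`), are illustrative assumptions; the paper's exact test statistic and thresholding may differ.

```python
import math
import random
from itertools import combinations

# Assumption: every node has exactly two continuous attributes,
# each observed on the same n samples (e.g. one value per cell line).


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def cov_block(U, V):
    """2x2 (cross-)covariance matrix between attribute pairs U and V."""
    return [[cov(u, v) for v in V] for u in U]


def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def wilks_lambda(X, Y):
    """Return (Lambda, rho1): Lambda = (1 - rho1^2)(1 - rho2^2) and rho1 is
    the first (largest) canonical correlation between attribute sets X, Y."""
    Sxx, Syy, Sxy = cov_block(X, X), cov_block(Y, Y), cov_block(X, Y)
    Syx = [[Sxy[j][i] for j in range(2)] for i in range(2)]
    # Eigenvalues of M = Sxx^-1 Sxy Syy^-1 Syx are the squared canonical correlations.
    M = mul2(mul2(inv2(Sxx), Sxy), mul2(inv2(Syy), Syx))
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    lam = 1.0 - tr + det  # det(I - M) for a 2x2 matrix
    top = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))
    return lam, math.sqrt(min(max(top, 0.0), 1.0))


def bartlett_pvalue(lam, n):
    """Bartlett's approximation for H0: all canonical correlations are zero,
    with p = q = 2 attributes: stat = -(n - 1 - (p + q + 1)/2) ln Lambda ~ chi2(4).
    The chi2(4) upper tail has the closed form exp(-s/2) * (1 + s/2)."""
    stat = max(-(n - 3.5) * math.log(max(lam, 1e-300)), 0.0)
    return math.exp(-stat / 2) * (1 + stat / 2)


def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # largest rank whose p-value clears its threshold
    return set(order[:k])


def infer_network(attrs, q=0.05):
    """attrs[v] = (x1, x2) for node v. Returns edges (i, j), i < j, whose
    canonical-correlation test survives Benjamini-Hochberg at level q."""
    n = len(attrs[0][0])
    pairs = list(combinations(range(len(attrs)), 2))
    pvals = [bartlett_pvalue(wilks_lambda(attrs[i], attrs[j])[0], n)
             for i, j in pairs]
    return {pairs[k] for k in benjamini_hochberg(pvals, q)}


def degree_and_clustering(edges, num_nodes):
    """Node degrees and local clustering coefficients of an undirected graph."""
    nbrs = {v: set() for v in range(num_nodes)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    deg = {v: len(nb) for v, nb in nbrs.items()}
    clus = {}
    for v, nb in nbrs.items():
        k = len(nb)
        links = sum(1 for a, b in combinations(sorted(nb), 2) if b in nbrs[a])
        clus[v] = links / (k * (k - 1) / 2) if k > 1 else 0.0
    return deg, clus


if __name__ == "__main__":
    # Hypothetical synthetic data: node 1 tracks node 0 on its first attribute.
    rng = random.Random(0)
    n = 60
    a = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]
    b = [[x + 0.1 * rng.gauss(0, 1) for x in a[0]],
         [rng.gauss(0, 1) for _ in range(n)]]
    c = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]
    edges = infer_network([a, b, c])
    print(edges, degree_and_clustering(edges, 3))
```

With one attribute per node, the first canonical correlation reduces to the absolute Pearson correlation, so a single-attribute variant of this sketch (not shown) is one way to mimic the partial-information comparison in the abstract: infer the network from one attribute and from both, then compare the resulting degrees and clustering coefficients.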

Article information

Source
Ann. Appl. Stat., Volume 6, Number 3 (2012), 1068-1094.

Dates
First available in Project Euclid: 31 August 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1346418574

Digital Object Identifier
doi:10.1214/12-AOAS550

Mathematical Reviews number (MathSciNet)
MR3012521

Zentralblatt MATH identifier
1254.92035

Keywords
Multi-attribute association networks; gene/protein regulatory networks; canonical correlation

Citation

Katenka, Natallia; Kolaczyk, Eric D. Inference and characterization of multi-attribute networks with application to computational biology. Ann. Appl. Stat. 6 (2012), no. 3, 1068-1094. doi:10.1214/12-AOAS550. https://projecteuclid.org/euclid.aoas/1346418574
