The Annals of Applied Statistics

Inference and characterization of multi-attribute networks with application to computational biology

Natallia Katenka and Eric D. Kolaczyk

Our work is motivated by, and illustrated with, applications of association networks in computational biology, specifically gene/protein regulatory networks. Association networks represent systems of interacting elements, where a link between two different elements indicates a sufficient level of similarity between their attributes. While in reality relational ties between elements can be expected to be based on similarity across multiple attributes, the vast majority of work to date on association networks involves ties defined with respect to only a single attribute. We propose an approach for inferring multi-attribute association networks from measurements on continuous attribute variables, using canonical correlation and a hypothesis-testing strategy. Within this context, we then study how partial information, when only a subset of attributes is available, affects multi-attribute network inference and characterization. We consider in detail the case of two attributes, combining analytical and numerical techniques to examine how the choice and number of node attributes affect the ability to detect network links and, more generally, to estimate higher-level network summary statistics such as node degree, clustering coefficients and measures of centrality. Illustrations and applications throughout the paper use gene and protein expression measurements on human cancer cell lines from the NCI-60 database.
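To make the general idea concrete, the sketch below infers a multi-attribute association network: each node carries several continuous attributes (for example, gene and protein expression) measured over the same samples, and a candidate link between two nodes is tested using the largest sample canonical correlation between their attribute blocks. The specific test is an assumption, not necessarily the authors' procedure: a permutation test on the largest canonical correlation is substituted, followed by a Benjamini-Hochberg correction across all node pairs. The function names (`canonical_correlations`, `edge_pvalue`, `infer_network`) and the simulated data are hypothetical and for illustration only.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between X (n x p) and Y (n x q), largest first."""
    Xc = X - X.mean(axis=0)  # center each attribute
    Yc = Y - Y.mean(axis=0)
    qx, _ = np.linalg.qr(Xc)  # orthonormal basis for the column space of Xc
    qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def edge_pvalue(X, Y, n_perm=999, rng=None):
    """Permutation p-value for H0: no association between the attribute blocks X and Y."""
    rng = np.random.default_rng(rng)
    observed = canonical_correlations(X, Y)[0]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(Y.shape[0])  # break the sample pairing
        if canonical_correlations(X, Y[perm])[0] >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

def infer_network(attrs, alpha=0.05, n_perm=199, seed=0):
    """attrs has shape (n_nodes, n_samples, n_attributes); returns a boolean adjacency matrix."""
    rng = np.random.default_rng(seed)
    m = attrs.shape[0]
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    pvals = np.array([edge_pvalue(attrs[i], attrs[j], n_perm, rng) for i, j in pairs])
    # Benjamini-Hochberg step-up: reject the k smallest p-values,
    # where k is the largest rank with p_(k) <= alpha * k / (number of tests).
    order = np.argsort(pvals)
    k = len(pvals)
    passed = pvals[order] <= alpha * np.arange(1, k + 1) / k
    n_reject = passed.nonzero()[0].max() + 1 if passed.any() else 0
    adj = np.zeros((m, m), dtype=bool)
    for idx in order[:n_reject]:
        i, j = pairs[idx]
        adj[i, j] = adj[j, i] = True
    return adj

# Simulated example: 4 nodes, 60 samples, 2 attributes per node.
# Nodes 0 and 1 share a common signal; nodes 2 and 3 are independent noise.
rng = np.random.default_rng(1)
n_samples = 60
base = rng.normal(size=(n_samples, 2))
attrs = np.stack([
    base,
    base + 0.1 * rng.normal(size=(n_samples, 2)),
    rng.normal(size=(n_samples, 2)),
    rng.normal(size=(n_samples, 2)),
])
adj = infer_network(attrs)
degree = adj.sum(axis=1)  # node degree, one of the summary statistics discussed above
print(adj.astype(int))
print("degree:", degree)
```

Dropping one column from every node's attribute block (for example, `attrs[:, :, :1]`) and rerunning `infer_network` gives a rough, illustrative way to compare single-attribute and two-attribute link detection on the same data.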

Ann. Appl. Stat., Volume 6, Number 3 (2012), 1068-1094.

Keywords: multi-attribute association networks; gene/protein regulatory networks; canonical correlation


