The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 6, Number 2 (2012), 561-600.
More power via graph-structured tests for differential expression of gene networks
We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of smooth distribution shift on the graph. We also investigate the identification of nonhomogeneous subgraphs of a given large graph, which poses both computational and multiple hypothesis testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast and bladder cancer gene expression data analyzed in the context of KEGG and NCI pathways.
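As a rough illustration of how a graph can enter a two-sample test of means, the sketch below projects both samples onto the k smoothest eigenvectors of a graph Laplacian and applies the classical Hotelling T^2 statistic in that reduced space. The abstract does not spell out the construction, so the use of Laplacian eigenvectors, the toy path graph, and the function names `path_graph_laplacian` and `graph_hotelling_t2` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def path_graph_laplacian(p):
    """Combinatorial Laplacian L = D - A of a path graph on p nodes (toy graph, assumed)."""
    A = np.zeros((p, p))
    idx = np.arange(p - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def graph_hotelling_t2(X1, X2, laplacian, k):
    """Hotelling T^2 restricted to the k smoothest graph-Fourier coordinates.

    X1, X2: arrays of shape (n1, p) and (n2, p), one row per sample.
    Returns (T^2, F-scaled statistic, (df1, df2)).
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    # eigh returns eigenvalues in ascending order, so the first k
    # eigenvectors are the ones that vary most smoothly over the graph.
    _, U = np.linalg.eigh(laplacian)
    Uk = U[:, :k]
    Z1, Z2 = X1 @ Uk, X2 @ Uk
    diff = Z1.mean(axis=0) - Z2.mean(axis=0)
    # Pooled covariance of the projected samples.
    S1 = np.atleast_2d(np.cov(Z1, rowvar=False))
    S2 = np.atleast_2d(np.cov(Z2, rowvar=False))
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S, diff)
    # Under Gaussian data with equal covariances and no mean shift,
    # this scaled statistic follows an F(k, n1 + n2 - k - 1) distribution.
    f_stat = t2 * (n1 + n2 - k - 1) / (k * (n1 + n2 - 2))
    return t2, f_stat, (k, n1 + n2 - k - 1)
```

With k equal to the number of nodes, the projection is an orthogonal change of basis and the statistic coincides with the ordinary Hotelling T^2. Choosing k smaller reduces the dimension of the test, which is where a gain in power can come from when the mean shift really is smooth on the graph.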
First available in Project Euclid: 11 June 2012
Jacob, Laurent; Neuvial, Pierre; Dudoit, Sandrine. More power via graph-structured tests for differential expression of gene networks. Ann. Appl. Stat. 6 (2012), no. 2, 561--600. doi:10.1214/11-AOAS528. https://projecteuclid.org/euclid.aoas/1339419608
- Supplementary material A: Technical results and proofs. This section contains our technical results (lemma and corollaries) on the gain in power, along with their proofs. It also contains the upper bound used in the branch-and-bound algorithm, with its proof. Finally, it contains the lemma characterizing the subgraphs that would be missed by the approximate subgraph discovery algorithm presented in Section 4.2, along with its proof.
- Supplementary material B: Pathways considered in the experiments. This section lists the names of the pathways considered in the experiments.
- Supplementary material C: Gene lists. This section lists the genes belonging to each of the pathways studied in detail in the experiments, along with their $t$-statistics and corresponding $p$-values.