The Annals of Applied Statistics

More power via graph-structured tests for differential expression of gene networks

Laurent Jacob, Pierre Neuvial, and Sandrine Dudoit


We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of smooth distribution shift on the graph. We also investigate the identification of nonhomogeneous subgraphs of a given large graph, which poses both computational and multiple hypothesis testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast and bladder cancer gene expression data analyzed in the context of KEGG and NCI pathways.
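The idea of a test that accounts for graph structure can be illustrated with a minimal sketch. It is an illustrative reading suggested by the keywords (Hotelling $T^2$, graph Laplacian, dimensionality reduction), not necessarily the authors' exact procedure: project both samples onto the $k$ Laplacian eigenvectors with the smallest eigenvalues, which span the smoothest signals on the graph, and compute Hotelling's $T^2$ in that reduced space. The function names, the path graph, the choice of $k$ and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def path_laplacian(p):
    """Combinatorial Laplacian L = D - A of a path graph on p nodes (toy graph, an assumption)."""
    A = np.zeros((p, p))
    idx = np.arange(p - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def hotelling_t2(X1, X2):
    """Two-sample Hotelling T^2 statistic with pooled covariance (rows are observations)."""
    n1, n2 = X1.shape[0], X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    # atleast_2d handles the one-dimensional case, where np.cov returns a scalar
    S = np.atleast_2d(((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2))
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(S, diff))


def graph_t2(X1, X2, L, k):
    """Hotelling T^2 after projecting onto the k smoothest Laplacian eigenvectors."""
    _, U = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    Uk = U[:, :k]             # low-frequency (smooth) components
    return hotelling_t2(X1 @ Uk, X2 @ Uk)


# Hypothetical synthetic data: a constant mean shift, the smoothest signal on a connected graph
rng = np.random.default_rng(0)
p, n = 10, 30
L = path_laplacian(p)
X1 = rng.normal(size=(n, p))
X2 = rng.normal(size=(n, p)) + np.full(p, 0.3)
print("T2 on first 3 components:", graph_t2(X1, X2, L, k=3))
print("T2 on all components:", graph_t2(X1, X2, L, k=p))
```

With $k = p$ the projection is an orthogonal change of basis, so the statistic equals the ordinary Hotelling $T^2$. Any gain in power comes from choosing $k < p$ when the mean shift is concentrated in the smooth components.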

Article information

Ann. Appl. Stat., Volume 6, Number 2 (2012), 561-600.

First available in Project Euclid: 11 June 2012

Keywords: Differential expression; biological networks; pathways; enrichment analysis; two-sample test; Hotelling $T^{2}$; spectral graph theory; graph Laplacian; dimensionality reduction


Jacob, Laurent; Neuvial, Pierre; Dudoit, Sandrine. More power via graph-structured tests for differential expression of gene networks. Ann. Appl. Stat. 6 (2012), no. 2, 561--600. doi:10.1214/11-AOAS528.


Supplemental materials

  • Supplementary material A: Technical results and proofs. This section contains our technical results (a lemma and corollaries) on the gain in power, along with their proofs. It also contains the upper bound used in the branch-and-bound algorithm, with its proof. Finally, it contains the lemma characterizing the subgraphs that would be missed by the approximate subgraph discovery algorithm presented in Section 4.2, along with its proof.
  • Supplementary material B: Pathways considered in the experiments. This section lists the names of the pathways considered in the experiments.
  • Supplementary material C: Gene lists. This section lists the genes belonging to each of the pathways studied in detail in the experiments, along with their $t$-statistics and corresponding $p$-values.