The Annals of Applied Statistics

More power via graph-structured tests for differential expression of gene networks

Laurent Jacob, Pierre Neuvial, and Sandrine Dudoit
Source: Ann. Appl. Stat. Volume 6, Number 2 (2012), 561-600.

Abstract

We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of smooth distribution shift on the graph. We also investigate the identification of nonhomogeneous subgraphs of a given large graph, which poses both computational and multiple hypothesis testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast and bladder cancer gene expression data analyzed in the context of KEGG and NCI pathways.
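
As an illustration only, one way to encode a "smooth shift on the graph" is to project both samples onto the lowest-frequency eigenvectors of the graph Laplacian and apply a classical Hotelling $T^{2}$ statistic in that reduced space. The abstract does not spell out the construction, so the sketch below is an assumption about how such a test could look, not the authors' implementation. The function names (`graph_laplacian`, `hotelling_t2`, `graph_t2`) and the parameter `k` (number of retained components) are hypothetical.

```python
import numpy as np


def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 statistic with pooled covariance.

    x and y are (n_samples, n_features) arrays; the pooled covariance
    must be invertible, so n1 + n2 - 2 >= n_features is required.
    """
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # handles the one-feature case
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


def graph_t2(x, y, adj, k):
    """Hotelling T^2 restricted to the k smoothest graph components.

    Assumption for this sketch: "smooth" means low-frequency with
    respect to the Laplacian, i.e. the eigenvectors with the k
    smallest eigenvalues.
    """
    _, vecs = np.linalg.eigh(graph_laplacian(adj))  # ascending eigenvalues
    basis = vecs[:, :k]
    return hotelling_t2(x @ basis, y @ basis)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Path graph on four nodes (toy stand-in for a gene network).
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
    x = rng.normal(size=(20, 4))
    y = rng.normal(size=(20, 4)) + 0.5  # constant, hence smooth, shift
    print("full T^2:", hotelling_t2(x, y))
    print("graph T^2 (k=2):", graph_t2(x, y, adj, k=2))
```

When `k` equals the number of nodes, the projection is an orthogonal change of basis and the statistic coincides with the ordinary $T^{2}$. Choosing a smaller `k` spends fewer degrees of freedom, which is where a power gain would be expected if the true shift is indeed smooth on the graph.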

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1339419608
Digital Object Identifier: doi:10.1214/11-AOAS528
Zentralblatt MATH identifier: 06062731
Mathematical Reviews number (MathSciNet): MR2976483


2013 © Institute of Mathematical Statistics
