The Annals of Applied Statistics

A novel and efficient algorithm for de novo discovery of mutated driver pathways in cancer

Binghui Liu, Chong Wu, Xiaotong Shen, and Wei Pan


Next-generation sequencing studies of cancer somatic mutations have found that driver mutations tend to appear in most tumor samples but rarely overlap within any single tumor sample, presumably because a single driver mutation can perturb the whole pathway. Based on the corresponding concepts of coverage and mutual exclusivity, new methods can be designed for de novo discovery of mutated driver pathways in cancer. Because the computational problem is a combinatorial optimization whose objective function involves a discontinuous indicator function in high dimension, many existing optimization algorithms, such as brute-force enumeration, gradient descent and Newton’s method, are practically infeasible or directly inapplicable. We develop a new algorithm based on a novel formulation of the problem as nonconvex programming and nonconvex regularization. The method is computationally more efficient, more effective and more scalable than existing Monte Carlo searching and several other algorithms that have been applied to The Cancer Genome Atlas (TCGA) project. We also extend the new method to integrative analysis of both mutation and gene expression data. We demonstrate the promising performance of the new methods with applications to three cancer datasets to discover de novo mutated driver pathways.
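The abstract does not state the objective function, so the following is only a minimal sketch of the two ideas it names, coverage and mutual exclusivity, and not the authors' algorithm. It assumes the weight used by Vandin, Upfal and Raphael (2012, Genome Res. 22 375–385), W(M) = 2|Γ(M)| − Σ_{g∈M} |Γ(g)|, where Γ(g) is the set of samples mutated in gene g and Γ(M) is the set of samples mutated in at least one gene of M. The truncated $L_{1}$ penalty is assumed to take the form min(|x|, τ)/τ, as in Shen, Pan and Zhu (2012). All function names and the toy mutation matrix are hypothetical.

```python
from itertools import combinations

import numpy as np


def coverage(A, genes):
    """Number of samples with at least one mutation among `genes`."""
    return int(A[:, genes].any(axis=1).sum())


def weight(A, genes):
    """Assumed score: 2*|Gamma(M)| - sum_g |Gamma(g)|.

    It rewards high coverage and penalizes samples that carry
    mutations in more than one gene of the set.
    """
    genes = list(genes)
    return 2 * coverage(A, genes) - int(A[:, genes].sum())


def brute_force(A, k):
    """Score every gene set of size k; the cost grows as C(p, k)."""
    p = A.shape[1]
    return max(combinations(range(p), k), key=lambda M: weight(A, M))


def tlp(x, tau):
    """Assumed truncated L1 penalty: a continuous surrogate for I(x != 0)."""
    return np.minimum(np.abs(x), tau) / tau


# Toy binary mutation matrix (made-up data): rows are tumor samples,
# columns are genes. Genes 0, 1 and 2 are mutually exclusive and together
# cover every sample; gene 3 overlaps with each of them.
A = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
])

best = brute_force(A, 3)
print(best, weight(A, best))
```

Exhaustive search is workable here only because there are four genes. With thousands of genes the number of candidate sets is too large to enumerate, and the indicator inside the coverage term rules out gradient-based methods, which is the difficulty the abstract describes. A continuous surrogate such as the assumed `tlp` is one way to approximate an indicator.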

Article information

Ann. Appl. Stat. Volume 11, Number 3 (2017), 1481-1512.

Received: September 2015
Revised: March 2017
First available in Project Euclid: 5 October 2017

Keywords: DNA sequencing, driver mutations, optimization, subset selection, truncated $L_{1}$ penalty


Liu, Binghui; Wu, Chong; Shen, Xiaotong; Pan, Wei. A novel and efficient algorithm for de novo discovery of mutated driver pathways in cancer. Ann. Appl. Stat. 11 (2017), no. 3, 1481--1512. doi:10.1214/17-AOAS1042.

