The Annals of Applied Statistics

A phylogenetic latent feature model for clonal deconvolution

Francesco Marass, Florent Mouliere, Ke Yuan, Nitzan Rosenfeld, and Florian Markowetz

Full-text: Open access


Tumours develop in an evolutionary process, in which the accumulation of mutations produces subpopulations of cells with distinct mutational profiles, called clones. This process leads to the genetic heterogeneity widely observed in tumour sequencing data, but identifying the genotypes and frequencies of the different clones is still a major challenge. Here, we present Cloe, a phylogenetic latent feature model to deconvolute tumour sequencing data into a set of related genotypes. Our approach extends latent feature models by placing the features as nodes in a latent tree. The resulting model can capture both the acquisition and the loss of mutations, as well as episodes of convergent evolution. We establish the validity of Cloe on synthetic data and assess its performance on controlled biological data, comparing our reconstructions to those of several published state-of-the-art methods. We show that our method provides highly accurate reconstructions and identifies the number of clones, their genotypes and frequencies even at a modest sequencing depth. As a proof of concept, we apply our model to clinical data from three cases with chronic lymphocytic leukaemia and one case with acute myeloid leukaemia.

Article information

Ann. Appl. Stat. Volume 10, Number 4 (2016), 2377-2404.

Received: April 2016
Revised: August 2016
First available in Project Euclid: 5 January 2017

Digital Object Identifier: doi:10.1214/16-AOAS986

Keywords: clonal deconvolution; tumour heterogeneity; latent feature model; phylogeny; admixture


Marass, Francesco; Mouliere, Florent; Yuan, Ke; Rosenfeld, Nitzan; Markowetz, Florian. A phylogenetic latent feature model for clonal deconvolution. Ann. Appl. Stat. 10 (2016), no. 4, 2377–2404. doi:10.1214/16-AOAS986.



