Open Access
February 2020
Statistical Inference for the Evolutionary History of Cancer Genomes
Khanh N. Dinh, Roman Jaksik, Marek Kimmel, Amaury Lambert, Simon Tavaré
Statist. Sci. 35(1): 129-144 (February 2020). DOI: 10.1214/19-STS7561


Recent years have seen considerable work on inference about cancer evolution from mutations identified in cancer samples. Much of the modeling work has been based on classical models of population genetics, generalized to accommodate time-varying cell population size. Reverse-time, genealogical views of such models, commonly known as coalescents, have been used to infer aspects of the past of growing populations. Another approach is to use branching processes, the simplest scenario being the classical linear birth-death process. Inference from evolutionary models of DNA often exploits summary statistics of the sequence data, a common one being the so-called Site Frequency Spectrum (SFS). In a bulk tumor sequencing experiment, we can estimate, for each site at which a novel somatic point mutation has arisen, the proportion of cells that carry that mutation. The sites are then grouped into collections with similar mutant fractions. We examine how the SFS based on birth-death processes differs from the SFS based on the coalescent model. The difference may stem from the different sampling mechanisms in the two approaches. However, we also show that, despite this difference, the two are quantitatively comparable for the range of parameters typical of tumor cell populations. We also present a model of tumor evolution with selective sweeps, and demonstrate how it may help in understanding the history of a tumor as well as the influence of data pre-processing. We illustrate the theory with applications to several examples from The Cancer Genome Atlas tumors.
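
The grouping step described in the abstract can be illustrated with a minimal sketch. It assumes that a mutant cell fraction in (0, 1] has already been estimated for each mutated site and that equal-width bins are used. The function name `binned_sfs`, the bin count, and the input values are hypothetical choices for illustration, not the authors' procedure.

```python
from typing import List, Sequence


def binned_sfs(fractions: Sequence[float], n_bins: int = 10) -> List[int]:
    """Count mutated sites per equal-width bin of mutant cell fraction.

    Bin k covers [k / n_bins, (k + 1) / n_bins); a fraction of exactly 1.0
    (a mutation carried by every cell) is placed in the last bin.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    counts = [0] * n_bins
    for f in fractions:
        if not 0.0 < f <= 1.0:
            raise ValueError(f"mutant fraction out of range: {f}")
        # Clamp so that f == 1.0 falls into the last bin instead of overflowing.
        idx = min(int(f * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical per-site mutant fractions (illustrative values, not real data).
example_fractions = [0.05, 0.07, 0.12, 0.48, 0.51, 0.95, 1.0]
sfs = binned_sfs(example_fractions, n_bins=10)
print(sfs)
```

Each entry of the returned list is the number of sites whose mutant fraction falls in the corresponding bin. In practice the fractions would have to be derived from sequencing read counts, and that step depends on factors such as ploidy and sample purity that this sketch does not model.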


Khanh N. Dinh, Roman Jaksik, Marek Kimmel, Amaury Lambert, Simon Tavaré. "Statistical Inference for the Evolutionary History of Cancer Genomes." Statist. Sci. 35(1): 129-144, February 2020.


Published: February 2020
First available in Project Euclid: 3 March 2020

MathSciNet: MR4071362
Digital Object Identifier: 10.1214/19-STS7561

Keywords: birth-death processes, bulk sequencing, cancer evolution, clonal selection, coalescents, ploidy, site frequency spectrum, tumor heterogeneity

Rights: Copyright © 2020 Institute of Mathematical Statistics
