Open Access
December 2020
RNDClone: Tumor subclone reconstruction based on integrating DNA and RNA sequence data
Tianjian Zhou, Subhajit Sengupta, Peter Müller, Yuan Ji
Ann. Appl. Stat. 14(4): 1856-1877 (December 2020). DOI: 10.1214/20-AOAS1368


A tumor cell population consists of genetically heterogeneous subpopulations, known as subclones. Bulk data from high-throughput sequencing provide total and variant DNA and RNA read counts at many nucleotide loci, each count being a mixture of signals from different subclones. We present RNDClone, a tool that deconvolutes this mixture and reconstructs subclones with distinct DNA genotypes and RNA expression profiles. In particular, we infer the number and population frequencies of subclones, as well as subclonal copy numbers, variant allele numbers and gene expression levels, by jointly modeling DNA and RNA read counts from the same tumor samples with generalized latent factor models. Incorporating RNA-level data provides new insights into intra-tumor heterogeneity beyond existing DNA-based inference. The performance of RNDClone is assessed using simulated and real-world datasets, including an analysis of three samples from a lung cancer patient in The Cancer Genome Atlas (TCGA). A potentially fatal subclone is identified in the primary tumor, which could explain the rapid progression and sudden death of the patient despite a promising diagnosis by conventional standards. The R package $\mathtt{RNDClone}$ is available in the Supplementary Material (Zhou et al. (2020)) and online at
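To make the "mixture of signals" idea concrete, the following is a minimal sketch of the forward direction only: given hypothetical subclonal copy numbers, variant allele numbers and population frequencies, it computes the expected variant allele fraction seen in bulk data. It is a generic latent factor illustration, not the RNDClone likelihood or the package's API; the function names `mix` and `expected_vaf` and all numbers are assumptions made up for this example. RNDClone addresses the inverse problem, inferring these matrices from observed read counts.

```python
def mix(profile, weights):
    """Average per-subclone values at each locus, weighted by subclone frequency.

    profile: list of loci, each a list with one value per subclone.
    weights: list of subclones, each a list with one frequency per sample.
    Returns a loci-by-samples table of mixed values.
    """
    n_samples = len(weights[0])
    return [
        [sum(value * w[t] for value, w in zip(row, weights)) for t in range(n_samples)]
        for row in profile
    ]


def expected_vaf(copy_number, variant_alleles, weights):
    """Expected variant allele fraction: mixed variant alleles over mixed copies."""
    mean_copies = mix(copy_number, weights)
    mean_variants = mix(variant_alleles, weights)
    return [
        [v / c for v, c in zip(v_row, c_row)]
        for v_row, c_row in zip(mean_variants, mean_copies)
    ]


# Hypothetical toy inputs: 3 loci, 3 subclones (the first is normal), 2 samples.
copy_number = [[2, 2, 3], [2, 1, 2], [2, 2, 2]]      # copies per locus and subclone
variant_alleles = [[0, 1, 1], [0, 0, 1], [0, 2, 0]]  # variant copies, at most copy_number
weights = [[0.5, 0.2], [0.3, 0.3], [0.2, 0.5]]       # each sample's frequencies sum to 1

vaf = expected_vaf(copy_number, variant_alleles, weights)
for locus, row in enumerate(vaf):
    print(locus, [round(x, 3) for x in row])
```

In this toy setup the same subclonal genotypes yield different bulk fractions in the two samples because the subclone frequencies differ, which is the kind of signal a deconvolution method can exploit when several samples from one tumor are available.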



Tianjian Zhou, Subhajit Sengupta, Peter Müller, Yuan Ji. "RNDClone: Tumor subclone reconstruction based on integrating DNA and RNA sequence data." Ann. Appl. Stat. 14 (4) 1856-1877, December 2020.


Received: 1 March 2020; Revised: 1 June 2020; Published: December 2020
First available in Project Euclid: 19 December 2020

MathSciNet: MR4194251
Digital Object Identifier: 10.1214/20-AOAS1368

Keywords: copy number, gene expression, high-throughput sequencing, intra-tumor heterogeneity, latent factor model, somatic mutation

Rights: Copyright © 2020 Institute of Mathematical Statistics
