Abstract
A tumor cell population consists of genetically heterogeneous subpopulations, known as subclones. Bulk sequencing data from high-throughput sequencing technology provide total and variant DNA and RNA read counts for many nucleotide loci as a mixture of signals from different subclones. We present RNDClone as a tool to deconvolute the mixture and reconstruct the subclones with distinct DNA genotypes and RNA expression profiles. In particular, we infer the number and population frequencies of subclones, as well as subclonal copy numbers, variant allele numbers and gene expression levels, by jointly modeling DNA and RNA read counts from the same tumor samples based on generalized latent factor models. Incorporating data at the RNA level provides new insights into intra-tumor heterogeneity in addition to the existing DNA-based inference. Performance of RNDClone is assessed using simulated and real-world datasets, including an analysis of three samples from a lung cancer patient in The Cancer Genome Atlas (TCGA). A potentially fatal subclone is identified in the primary tumor, which could explain the rapid prognosis and sudden death of the patient despite a promising diagnosis by conventional standards. The R package $\mathtt{RNDClone}$ is available in the Supplementary Material (Zhou et al., 2020) and online at https://github.com/tianjianzhou/RNDClone.
Citation
Tianjian Zhou, Subhajit Sengupta, Peter Müller, Yuan Ji. "RNDClone: Tumor subclone reconstruction based on integrating DNA and RNA sequence data." Ann. Appl. Stat. 14 (4) 1856-1877, December 2020. https://doi.org/10.1214/20-AOAS1368