Open Access
September 2018
BayCount: A Bayesian decomposition method for inferring tumor heterogeneity using RNA-Seq counts
Fangzheng Xie, Mingyuan Zhou, Yanxun Xu
Ann. Appl. Stat. 12(3): 1605-1627 (September 2018). DOI: 10.1214/17-AOAS1123

Abstract

Tumors are heterogeneous. A tumor sample usually consists of a set of subclones with distinct transcriptional profiles and potentially different degrees of aggressiveness and responses to drugs. Understanding tumor heterogeneity is therefore critical for precise cancer prognosis and treatment. In this paper we introduce BayCount, a Bayesian decomposition method for inferring tumor heterogeneity from highly over-dispersed RNA sequencing count data. Using negative binomial factor analysis, BayCount accounts for both the between-sample and gene-specific random effects on the raw counts of sequencing reads mapped to each gene. For posterior inference, we develop an efficient compound Poisson-based blocked Gibbs sampler. Simulation studies show that BayCount accurately recovers the subclonal structure, including the number of subclones, the proportions of these subclones in each tumor sample, and the gene expression profiles in each subclone. As real-world examples, we apply BayCount to The Cancer Genome Atlas lung cancer and kidney cancer RNA sequencing count data and obtain biologically interpretable results. Our method represents the first effort in characterizing tumor heterogeneity using RNA sequencing count data that simultaneously removes the need to normalize the counts, achieves statistical robustness, and obtains biologically/clinically meaningful insights. The R package BayCount implementing our model and algorithm is available for download.
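
To make the idea of a negative binomial factor model for count data concrete, the following is a minimal simulation sketch in Python. It is not the BayCount R package and not necessarily the paper's exact parameterization: the function name `simulate_nb_factor_counts`, the prior choices, and the assumption that the negative binomial shape for gene i in sample j is a sum over subclones of (expression profile × subclone weight), with a sample-specific probability parameter, are all illustrative assumptions. Counts are drawn through the standard gamma-Poisson mixture representation of the negative binomial distribution.

```python
import numpy as np

def simulate_nb_factor_counts(n_genes=200, n_samples=30, n_subclones=3, seed=0):
    """Simulate over-dispersed counts from an assumed NB factor model.

    Hypothetical illustration only; names and priors are not from the paper.
    """
    rng = np.random.default_rng(seed)

    # phi[:, k]: expression profile of subclone k over genes (each column sums to 1).
    phi = rng.dirichlet(np.ones(n_genes), size=n_subclones).T  # (genes, subclones)

    # theta[k, j]: nonnegative weight of subclone k in sample j (assumed gamma prior).
    theta = rng.gamma(shape=1.0, scale=2000.0, size=(n_subclones, n_samples))

    # p[j]: sample-specific NB probability parameter (assumed uniform range).
    p = rng.uniform(0.3, 0.7, size=n_samples)

    # NB shape for gene i in sample j: sum_k phi[i, k] * theta[k, j].
    nb_shape = phi @ theta  # (genes, samples)

    # Gamma-Poisson mixture: lambda ~ Gamma(shape, scale = p / (1 - p)),
    # counts ~ Poisson(lambda), which marginally gives NB(shape, p).
    lam = rng.gamma(shape=nb_shape, scale=p / (1.0 - p))
    counts = rng.poisson(lam)

    # Subclone proportions in each sample: normalize weights across subclones.
    proportions = theta / theta.sum(axis=0, keepdims=True)
    return counts, phi, theta, p, proportions

counts, phi, theta, p, proportions = simulate_nb_factor_counts()
```

Under these assumptions, each column of `proportions` sums to one and plays the role of the subclone proportions in a sample, while the columns of `phi` play the role of subclone-specific expression profiles. An inference method would aim to recover both from `counts` alone.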

Citation

Fangzheng Xie, Mingyuan Zhou, Yanxun Xu. "BayCount: A Bayesian decomposition method for inferring tumor heterogeneity using RNA-Seq counts." Ann. Appl. Stat. 12 (3) 1605 - 1627, September 2018. https://doi.org/10.1214/17-AOAS1123

Information

Received: 1 February 2017; Revised: 1 November 2017; Published: September 2018
First available in Project Euclid: 11 September 2018

zbMATH: 06979644
MathSciNet: MR3852690
Digital Object Identifier: 10.1214/17-AOAS1123

Keywords: Cancer genomics, Compound Poisson, Markov chain Monte Carlo, negative binomial, overdispersion

Rights: Copyright © 2018 Institute of Mathematical Statistics
