Gene expression deconvolution is a powerful tool for exploring the microenvironment of complex tissues composed of multiple cell groups using transcriptomic data. Characterizing cell activities for a particular condition has been regarded as a primary mission against diseases. For example, cancer immunology aims to clarify the role of the immune system in the progression and development of cancer by analyzing the immune cell components of tumors. To that end, many deconvolution methods have been proposed for inferring cell subpopulations within tissues. Nevertheless, two problems limit the practicality of current approaches. First, most approaches use external purified data to preselect cell type-specific genes that contribute to deconvolution. However, some types of cells cannot be found in purified profiles, and the genes specifically over- or under-expressed in them cannot be identified. This is a particular problem in cancer studies. Hence, a preselection strategy that is independent of deconvolution is inappropriate. The second problem is that existing approaches do not recover the expression profiles of unknown cells present in bulk tissues when the reference set of purified cell-specific profiles is incomplete, which results in biased estimation of unknown cell proportions. Furthermore, it causes the shift-invariant property of deconvolution to fail, which in turn affects the estimation performance. To address these two problems, we propose a novel semireference-based deconvolution approach, BayICE, which employs hierarchical Bayesian modeling with stochastic search variable selection. We develop a comprehensive Markov chain Monte Carlo procedure through Gibbs sampling to estimate proportions, expression profiles and signature genes for a set of known reference cell types as well as an unknown cell type.
Simulation and validation studies illustrate that BayICE outperforms existing semireference-based deconvolution approaches in estimating cell proportions. We further show that BayICE is applicable to single-cell RNA-seq data. Subsequently, we demonstrate an application of BayICE to RNA sequencing data from patients with non-small cell lung cancer. The model is implemented in the R package “BayICE,” and the algorithm is available for download.
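The second problem described in the abstract, bias from an incomplete reference set, can be illustrated with a small sketch. This is not the BayICE model: it assumes a noiseless linear mixing model and uses plain least squares with clipping and renormalization as a stand-in estimator. All function names, dimensions and simulated values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_bulk(profiles, proportions):
    """Mix cell-type profiles (genes x types) using proportions (types x samples)."""
    return profiles @ proportions

def estimate_proportions(bulk, reference):
    """Least-squares fit, clipped to be nonnegative and rescaled so each sample sums to 1.

    Illustrative stand-in estimator only; it is not the Bayesian procedure of the paper.
    """
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)
    return coef / coef.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n_genes, n_types, n_samples = 200, 4, 5

# Hypothetical expression profiles and true proportions (assumed values).
profiles = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, n_types))
true_props = rng.dirichlet(np.ones(n_types), size=n_samples).T  # types x samples
bulk = simulate_bulk(profiles, true_props)

# Complete reference: all four cell types are available.
full_est = estimate_proportions(bulk, profiles)

# Incomplete reference: the fourth cell type is unknown and left out.
partial_est = estimate_proportions(bulk, profiles[:, :3])

print("max error, complete reference:  ", np.abs(full_est - true_props).max())
print("max error, incomplete reference:", np.abs(partial_est - true_props[:3]).max())
```

With the complete reference the proportions are recovered up to numerical precision in this noiseless setting. With the incomplete reference the estimates for the three known types are forced to sum to 1, so the share of the missing type is absorbed by the known ones. Modeling the unknown cell type explicitly, as the abstract proposes, is one way to avoid this.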
We thank Dr. Jiebiao Wang (University of Pittsburgh) for the suggestions to refine the manuscript. This work was supported by the Ministry of Science and Technology [MOST 107-2118-M-007-001].
An-Shun Tai, George C. Tseng, Wen-Ping Hsieh. "BayICE: A Bayesian hierarchical model for semireference-based deconvolution of bulk transcriptomic data." Ann. Appl. Stat. 15 (1) 391-411, March 2021. https://doi.org/10.1214/20-AOAS1376