Open Access
March 2021 The role of scale in the estimation of cell-type proportions
Gregory J. Hunt, Johann A. Gagnon-Bartsch
Ann. Appl. Stat. 15(1): 270-286 (March 2021). DOI: 10.1214/20-AOAS1395

Abstract

Complex tissues are composed of many different types of cells, each involved in a multitude of biological processes. Consequently, an important component of understanding such processes is understanding the cell-type composition of the tissues. Estimating cell-type composition from high-throughput gene expression data is known as cell-type deconvolution. In this paper we first summarize the extensive deconvolution literature by identifying a common regression-like approach to deconvolution, which we call the unified deconvolution-as-regression (UDAR) framework. While methods that fall under this framework all use a similar model, they fit it to data on different scales. Two popular scales for gene expression data are logarithmic and linear. Unfortunately, each of these scales has problems in the UDAR framework: using log-scale gene expressions implies a biologically implausible model, while using linear-scale gene expressions leads to statistically inefficient estimators. To explore ways to address these issues, we consider how deconvolution methods may use an adjusted model that is a hybrid of the two scales. In analyses of simulations as well as a collection of eleven real benchmark datasets, we find that a prototypical hybrid-scale adjustment to the UDAR framework improves statistical efficiency and robustness. More broadly, we believe these hybrid-scale modeling principles may be incorporated into many existing deconvolution methods.
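As a rough illustration of what a regression-like deconvolution can look like, the sketch below regresses a mixed expression profile on per-cell-type reference profiles using linear-scale data. It is not the authors' UDAR formulation or their hybrid-scale adjustment; the function name `estimate_proportions`, the simulated reference matrix, and the use of ordinary least squares followed by clipping and renormalization are all assumptions made for illustration. The last lines also check, under the assumption that expression mixes additively on the linear scale, that the same proportion-weighted combination does not hold after a log transform.

```python
import numpy as np


def estimate_proportions(reference, mixture):
    """Estimate cell-type proportions by least squares (illustrative only).

    reference: (genes x cell_types) linear-scale expression profiles.
    mixture:   (genes,) linear-scale expression of the mixed sample.
    """
    # Ordinary least squares fit of the mixture on the reference profiles.
    coef, *_ = np.linalg.lstsq(reference, mixture, rcond=None)
    # Simplification: clip negative coefficients and renormalize to sum to 1.
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated coefficients are zero")
    return coef / total


# Simulated data (hypothetical): 200 genes, 3 cell types.
rng = np.random.default_rng(0)
reference = rng.lognormal(mean=5.0, sigma=1.0, size=(200, 3))
true_props = np.array([0.6, 0.3, 0.1])

# Assumption: expression mixes additively on the linear scale (noise-free here).
mixture = reference @ true_props

estimated = estimate_proportions(reference, mixture)
print("estimated proportions:", np.round(estimated, 4))

# On the log scale the same weighted combination no longer reproduces the mixture.
log_scale_gap = np.max(np.abs(np.log(mixture) - np.log(reference) @ true_props))
print("max log-scale discrepancy:", log_scale_gap)
```

With noise-free data the least-squares fit recovers the simulated proportions; real data would add noise, and the scale on which that noise is modeled is the kind of choice the abstract is concerned with.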

Acknowledgments

The authors gratefully acknowledge support from the National Science Foundation (grant no. DMS-1646108).

Citation

Gregory J. Hunt, Johann A. Gagnon-Bartsch. "The role of scale in the estimation of cell-type proportions." Ann. Appl. Stat. 15 (1): 270-286, March 2021. https://doi.org/10.1214/20-AOAS1395

Information

Received: 1 March 2020; Revised: 1 September 2020; Published: March 2021
First available in Project Euclid: 18 March 2021

Digital Object Identifier: 10.1214/20-AOAS1395

Keywords: Deconvolution, high-throughput sequencing, proportions

Rights: Copyright © 2021 Institute of Mathematical Statistics
