Open Access
June 2021
A compositional model to assess expression changes from single-cell RNA-seq data
Xiuyu Ma, Keegan Korthauer, Christina Kendziorski, Michael A. Newton
Ann. Appl. Stat. 15(2): 880-901 (June 2021). DOI: 10.1214/20-AOAS1423


On the problem of scoring genes for evidence of changes in the distribution of single-cell expression, we introduce an empirical Bayesian mixture approach and evaluate its operating characteristics in a range of numerical experiments. The proposed approach leverages cell-subtype structure revealed by cluster analysis to boost gene-level information on expression changes. Cell clustering informs gene-level analysis through a specially constructed prior distribution over pairs of multinomial probability vectors; this prior meshes with available model-based tools that score patterns of differential expression over multiple subtypes. We derive an explicit formula for the posterior probability that a gene has the same distribution in two cellular conditions, allowing for a gene-specific mixture over subtypes in each condition. The compositional structure of the model confers an advantage in two ways: it allows a host of gene-specific mixture components, and it constrains the mixing proportions at the whole-cell level. This structure leads to a novel form of information sharing through which the cell-clustering results support gene-level scoring of differential distribution. According to our numerical experiments, the result is improved sensitivity compared to several standard approaches for detecting distributional expression changes.
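To give a feel for the kind of posterior probability the abstract describes, the sketch below computes, in a much simpler setting, the posterior probability that two multinomial probability vectors are equal. It assumes cell counts over K subtypes in each condition, a single symmetric Dirichlet prior, and a prior probability `prior_equal` that the vectors coincide. This is a plain Dirichlet-multinomial Bayes factor, not the paper's double Dirichlet mixture or its explicit formula; the function names are hypothetical.

```python
from math import exp, lgamma, log


def log_multivariate_beta(a):
    """log B(a) = sum_k lgamma(a_k) - lgamma(sum_k a_k)."""
    return sum(lgamma(x) for x in a) - lgamma(sum(a))


def log_marginal_kernel(counts, alpha):
    """Dirichlet-multinomial log marginal likelihood WITHOUT the multinomial
    coefficient; the coefficients are identical under both hypotheses below,
    so they cancel in the Bayes factor."""
    posterior = [a + n for a, n in zip(alpha, counts)]
    return log_multivariate_beta(posterior) - log_multivariate_beta(alpha)


def posterior_equal(n1, n2, alpha, prior_equal=0.5):
    """P(phi == psi | counts) under a simplified two-hypothesis model (assumed):
    H0: both conditions share one Dirichlet(alpha) vector;
    H1: each condition has its own independent Dirichlet(alpha) vector."""
    pooled = [a + b for a, b in zip(n1, n2)]
    log_m0 = log_marginal_kernel(pooled, alpha)
    log_m1 = log_marginal_kernel(n1, alpha) + log_marginal_kernel(n2, alpha)
    # Posterior log odds of H0 versus H1.
    x = log(prior_equal) - log(1.0 - prior_equal) + log_m0 - log_m1
    # Numerically stable logistic transform of the log odds.
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    return exp(x) / (1.0 + exp(x))


if __name__ == "__main__":
    alpha = [1.0, 1.0, 1.0]
    print(posterior_equal([50, 30, 20], [48, 32, 20], alpha))  # similar mixes
    print(posterior_equal([80, 10, 10], [10, 10, 80], alpha))  # very different
```

Similar subtype compositions yield a posterior probability well above one half, while sharply different compositions drive it toward zero. The full model additionally lets the equality hold only on blocks of subtypes that a given gene cannot distinguish, which this sketch does not attempt.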

Funding Statement

This research was supported in part by U.S. National Institutes of Health Grants P50 DE026787, P30CA14520-45, R01 GM102756, U54AI117924, and U.S. National Science Foundation Grant 1740707.


Dr Newton is also in the Department of Biostatistics and Medical Informatics at University of Wisconsin–Madison.

We thank the Associate Editor and referees for very helpful comments.


Citation

Xiuyu Ma, Keegan Korthauer, Christina Kendziorski, Michael A. Newton. "A compositional model to assess expression changes from single-cell RNA-seq data." Ann. Appl. Stat. 15(2): 880-901, June 2021.


Received: 1 June 2019; Revised: 1 November 2020; Published: June 2021
First available in Project Euclid: 12 July 2021

Digital Object Identifier: 10.1214/20-AOAS1423

Keywords: clustering, double Dirichlet mixture, empirical Bayes, local false discovery rate, mixture model
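In an empirical Bayes analysis, a gene's posterior probability of having the same distribution in both conditions can be read as a local false discovery rate. One common way to turn such scores into a gene list, assumed here and not necessarily the procedure used in the paper, is to select the largest set of genes whose average posterior null probability stays at or below a target level `alpha`. The function name below is hypothetical.

```python
def select_by_expected_fdr(post_null, alpha=0.05):
    """Return indices of the largest gene list whose mean posterior probability
    of equivalent distribution (local fdr) is <= alpha (assumed rule)."""
    # Rank genes from strongest to weakest evidence of a distributional change.
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    selected = []
    running_sum = 0.0
    for k, i in enumerate(order, start=1):
        running_sum += post_null[i]
        # Cumulative means are nondecreasing in this order, so stop at the first exceedance.
        if running_sum / k > alpha:
            break
        selected.append(i)
    return sorted(selected)


if __name__ == "__main__":
    print(select_by_expected_fdr([0.01, 0.02, 0.5, 0.03, 0.9], alpha=0.05))
```

Because genes are added in increasing order of local fdr, the running average is the estimated false discovery rate of the current list, so the loop can stop at the first gene that pushes it above `alpha`.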

Rights: Copyright © 2021 Institute of Mathematical Statistics

