Open Access
June 2019
Graphical models for zero-inflated single cell gene expression
Andrew McDavid, Raphael Gottardo, Noah Simon, Mathias Drton
Ann. Appl. Stat. 13(2): 848-873 (June 2019). DOI: 10.1214/18-AOAS1213


Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene coregulatory networks from such data, we propose a multivariate Hurdle model. It comprises a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods, or in bulk data sets. An R implementation is available at
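
To illustrate the group lasso penalty mentioned in the abstract, the following is a minimal Python sketch of block soft-thresholding, the standard proximal operator of the penalty lam * ||beta||_2 for one parameter group. It is a generic building block of group lasso solvers and not the authors' R implementation. The function name `group_soft_threshold` is hypothetical, and treating the parameters attached to one candidate edge as a single group is an assumption made for illustration.

```python
import numpy as np


def group_soft_threshold(beta, lam):
    """Proximal operator of lam * ||beta||_2 for one parameter group.

    The whole group is set to zero when its Euclidean norm is at most lam;
    otherwise the group is shrunk toward zero by the factor (1 - lam / norm).
    """
    beta = np.asarray(beta, dtype=float)
    norm = np.linalg.norm(beta)
    if norm <= lam:
        # Entire group removed: under the assumption above, no edge is selected.
        return np.zeros_like(beta)
    return (1.0 - lam / norm) * beta


# Hypothetical groups of edge parameters (values chosen only for illustration).
strong_edge = group_soft_threshold([3.0, 4.0], lam=1.0)  # norm 5, scaled by 0.8
weak_edge = group_soft_threshold([0.3, 0.4], lam=1.0)    # norm 0.5, zeroed out
```

Because the penalty acts on the norm of the whole group, all parameters of a group are kept or dropped together, which is what makes this kind of penalty suitable for selecting edges rather than individual coefficients.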



Andrew McDavid, Raphael Gottardo, Noah Simon, Mathias Drton. "Graphical models for zero-inflated single cell gene expression." Ann. Appl. Stat. 13(2): 848-873, June 2019.


Received: 1 October 2016; Revised: 1 March 2018; Published: June 2019
First available in Project Euclid: 17 June 2019

zbMATH: 1423.62148
MathSciNet: MR3963555
Digital Object Identifier: 10.1214/18-AOAS1213

Keywords: Gene network, Graphical model, group lasso, single cell gene expression

Rights: Copyright © 2019 Institute of Mathematical Statistics


Vol.13 • No. 2 • June 2019