Annals of Applied Statistics

Graphical models for zero-inflated single cell gene expression

Andrew McDavid, Raphael Gottardo, Noah Simon, and Mathias Drton

Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene coregulatory networks from such data, we propose a multivariate Hurdle model that comprises a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods, or in bulk data sets. An R implementation is available at
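The group lasso penalty mentioned in the abstract keeps or drops all parameters linking a pair of genes as one block. The sketch below illustrates only that generic building block, the block soft-thresholding (proximal) step, and how nonzero blocks could be read off as a gene's neighborhood. It is not the authors' R implementation; the gene names, coefficient values, threshold, and function names are hypothetical and chosen purely for illustration.

```python
import math


def group_soft_threshold(block, threshold):
    """Proximal operator of threshold * ||block||_2 (block soft-thresholding).

    The whole block is set to zero when its Euclidean norm is at most
    `threshold`; otherwise every entry is shrunk by the same factor.
    """
    norm = math.sqrt(sum(x * x for x in block))
    if norm <= threshold:
        return [0.0] * len(block)
    scale = 1.0 - threshold / norm
    return [scale * x for x in block]


def selected_neighbors(blocks, threshold):
    """Return the names of blocks that remain nonzero after thresholding.

    `blocks` maps a candidate neighbor gene (hypothetical names) to the
    list of parameters coupling it with the target gene.
    """
    kept = []
    for gene, block in blocks.items():
        shrunk = group_soft_threshold(block, threshold)
        if any(x != 0.0 for x in shrunk):
            kept.append(gene)
    return kept


# Hypothetical parameter blocks for one target gene (illustrative values only).
blocks = {"geneB": [3.0, 4.0], "geneC": [0.3, -0.4]}
print(selected_neighbors(blocks, threshold=1.0))
```

In neighborhood selection, a step of this kind would be applied once per target gene, and the per-gene neighborhoods would then be combined into a single undirected graph. The rule for combining them (for example, requiring an edge in either or both directions) is a separate design choice that this sketch does not cover.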

Article information

Ann. Appl. Stat., Volume 13, Number 2 (2019), 848-873.

Received: October 2016
Revised: March 2018
First available in Project Euclid: 17 June 2019

Keywords: Gene network, single cell gene expression, graphical model, group lasso


McDavid, Andrew; Gottardo, Raphael; Simon, Noah; Drton, Mathias. Graphical models for zero-inflated single cell gene expression. Ann. Appl. Stat. 13 (2019), no. 2, 848–873. doi:10.1214/18-AOAS1213.




Supplemental materials

  • Derivations and methods. Supplemental derivations and methods for simulation and data preprocessing.