Bayesian Analysis

Bayesian Analysis of RNA-Seq Data Using a Family of Negative Binomial Models

Lili Zhao, Weisheng Wu, Dai Feng, Hui Jiang, and XuanLong Nguyen

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access


The analysis of RNA-Seq data has focused on three main categories: gene expression, relative exon usage, and transcript expression. Methods have been proposed independently for each category using a negative binomial (NB) model. However, counts that follow an NB distribution on one feature (e.g., exon) are not guaranteed to follow an NB distribution on the other two features (e.g., gene or transcript). In this paper we propose a family of negative binomial models that integrates gene, exon, and transcript analysis under a coherent NB model. The proposed model easily incorporates the uncertainty of assigning reads to transcripts and substantially simplifies estimation of relative usage. We developed simple Gibbs sampling algorithms for posterior inference by exploiting fully tractable closed-form computations made possible by suitable conjugate priors. The proposed models were investigated in extensive simulations. Finally, we applied our model to a real data set.
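The point that NB counts on one feature need not aggregate to NB counts on another can be checked numerically. The sketch below is an illustration only and is not the model proposed in the paper. It assumes two independent exon-level counts, each NB(r, p) in the "failures before the r-th success" parameterization, and treats the gene-level count as their sum. The function names and parameter values are hypothetical. If the sum were NB, it would have to coincide with the NB that has the same mean and variance, so the code compares the exact distribution of the sum against that moment-matched NB.

```python
import math


def nb_pmf(k, r, p):
    """P(Y = k) for Y ~ NB(r, p): failures before the r-th success,
    with success probability p (mean r(1-p)/p, variance r(1-p)/p^2)."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def sum_pmf(k, r1, p1, r2, p2):
    """P(Y1 + Y2 = k) for independent NB counts, by direct convolution."""
    return sum(nb_pmf(j, r1, p1) * nb_pmf(k - j, r2, p2) for j in range(k + 1))


def moment_matched_nb(r1, p1, r2, p2):
    """(r, p) of the NB whose mean and variance equal those of Y1 + Y2."""
    mean = r1 * (1 - p1) / p1 + r2 * (1 - p2) / p2
    var = r1 * (1 - p1) / p1 ** 2 + r2 * (1 - p2) / p2 ** 2
    p = mean / var                # for an NB, variance / mean = 1 / p
    r = mean * p / (1 - p)
    return r, p


def max_gap(r1, p1, r2, p2, kmax=100):
    """Largest pointwise difference, over k = 0..kmax, between the exact pmf
    of Y1 + Y2 and the pmf of the moment-matched NB."""
    r, p = moment_matched_nb(r1, p1, r2, p2)
    return max(abs(sum_pmf(k, r1, p1, r2, p2) - nb_pmf(k, r, p))
               for k in range(kmax + 1))


if __name__ == "__main__":
    # Hypothetical exon-level parameters, chosen only for illustration.
    print("common p:   ", max_gap(2.0, 0.3, 3.0, 0.3))
    print("different p:", max_gap(2.0, 0.3, 3.0, 0.7))
```

With a common p the gap is at the level of floating-point error, reflecting the standard fact that independent NB(r1, p) and NB(r2, p) counts sum to NB(r1 + r2, p). With different p the gap is clearly nonzero, so the aggregated count is not NB under these assumptions.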

Article information

Bayesian Anal. (2017), 26 pages.

First available in Project Euclid: 8 April 2017

Digital Object Identifier: doi:10.1214/17-BA1055

Keywords: Bayesian; RNA-Seq; Chinese restaurant table distribution; differential test; exon usage; transcript analysis

Creative Commons Attribution 4.0 International License.


Zhao, Lili; Wu, Weisheng; Feng, Dai; Jiang, Hui; Nguyen, XuanLong. Bayesian Analysis of RNA-Seq Data Using a Family of Negative Binomial Models. Bayesian Anal., advance publication, 8 April 2017. doi:10.1214/17-BA1055.


