Open Access
June 2016
Shrinkage of dispersion parameters in the binomial family, with application to differential exon skipping
Sean Ruddy, Marla Johnson, Elizabeth Purdom
Ann. Appl. Stat. 10(2): 690-725 (June 2016). DOI: 10.1214/15-AOAS871

Abstract

The prevalence of sequencing experiments in genomics has led to increased use of count-data methods for analyzing high-throughput genomic data. Shrinkage methods remain important for improving the performance of these statistical methods. A common example is gene expression data, where the counts per gene are often modeled as some form of overdispersed Poisson. Shrinkage estimates of the per-gene dispersion parameter have led to improved estimation of dispersion, particularly when the number of samples is small.

We address a different count setting introduced by the use of sequencing data: comparing differential proportional usage via an overdispersed binomial model. We are motivated by our interest in testing for differential exon skipping in mRNA-Seq experiments. We introduce a novel shrinkage method that models the overdispersion with the double binomial distribution proposed by Efron [J. Amer. Statist. Assoc. 81 (1986) 709–721].
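As a rough illustration of the double binomial distribution mentioned above, the following Python sketch evaluates a density of the form given by Efron (1986), proportional to θ^(1/2) · g(y; μ, n)^θ · g(y; y/n, n)^(1−θ), where g is the ordinary binomial probability mass function. The function names, the parametrization (θ < 1 taken to mean overdispersion) and the numerical normalization over the support 0..n are assumptions made for this sketch. It is not the WEB-Seq implementation and not the API of the DoubleExpSeq package.

```python
import math


def _xlogy(x, y):
    # Convention 0 * log(0) = 0, needed when the success probability is 0 or 1.
    return 0.0 if x == 0 else x * math.log(y)


def binom_logpmf(k, n, p):
    # Log of the ordinary binomial pmf with n trials and success probability p.
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def double_binomial_pmf(n, mu, theta):
    """Probabilities for k = 0..n under a double binomial model (sketch).

    Assumes 0 < mu < 1 and theta > 0. The normalizing constant is computed
    numerically by summing over the support rather than approximated.
    """
    log_w = [
        0.5 * math.log(theta)
        + theta * binom_logpmf(k, n, mu)                 # binomial at mean mu
        + (1.0 - theta) * binom_logpmf(k, n, k / n)      # binomial at observed proportion
        for k in range(n + 1)
    ]
    m = max(log_w)  # subtract the max before exponentiating for stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def variance(pmf):
    # Variance of the count k under the given pmf.
    mean = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
```

With θ = 1 the expression reduces to the ordinary binomial, and in this parametrization smaller values of θ spread the probability mass more widely, which is the overdispersion the abstract refers to.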

Our method (WEB-Seq) is an empirical Bayes strategy that produces a shrunken estimate of dispersion and effectively detects differential proportional usage. It has close ties to the weighted-likelihood strategy of edgeR developed for gene expression data [Bioinformatics 23 (2007) 2881–2887, Bioinformatics 26 (2010) 139–140]. We analyze its behavior on simulated data sets as well as real data and show that our method is fast, powerful and gives accurate control of the FDR compared to alternative approaches. We provide an implementation of our methods in the R package DoubleExpSeq, available on CRAN.

Citation

Sean Ruddy, Marla Johnson, Elizabeth Purdom. "Shrinkage of dispersion parameters in the binomial family, with application to differential exon skipping." Ann. Appl. Stat. 10 (2) 690-725, June 2016. https://doi.org/10.1214/15-AOAS871

Information

Received: 1 March 2015; Revised: 1 August 2015; Published: June 2016
First available in Project Euclid: 22 July 2016

zbMATH: 06625666
MathSciNet: MR3528357
Digital Object Identifier: 10.1214/15-AOAS871

Keywords: alternative splicing, dispersion estimation, empirical Bayes, mRNA-Seq, over-dispersed binomial

Rights: Copyright © 2016 Institute of Mathematical Statistics

Vol.10 • No. 2 • June 2016