- Statist. Sci.
- Volume 26, Number 1 (2011), 62-83.
Statistical Modeling of RNA-Seq Data
Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data. The model is flexible enough to accommodate both single-end and paired-end RNA-Seq data, as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator is provided. Further, it is shown that paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at fixed sequencing depth. Simulation studies are also given.
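To make the idea of maximum likelihood estimation of isoform abundance concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes a simple Poisson read-count model, which the abstract does not itself specify: reads are grouped into categories, and the count in category j is Poisson with mean equal to the sum over isoforms i of theta[i] * rates[i][j]. The function name `estimate_abundance`, the `rates` matrix, and the use of an EM iteration are all illustrative assumptions.

```python
def estimate_abundance(counts, rates, iterations=200):
    """EM sketch for the assumed model: counts[j] ~ Poisson(sum_i theta[i] * rates[i][j]).

    counts: observed read count per read category (hypothetical grouping).
    rates:  rates[i][j] is the assumed sampling rate of category j from isoform i.
    Returns a list of abundance estimates theta, one per isoform.
    """
    n_iso = len(rates)
    n_cat = len(counts)
    theta = [1.0] * n_iso                      # arbitrary positive starting point
    row_sums = [sum(rates[i]) for i in range(n_iso)]

    for _ in range(iterations):
        # Expected count in each category under the current abundances.
        expected = [
            sum(theta[i] * rates[i][j] for i in range(n_iso))
            for j in range(n_cat)
        ]
        # Multiplicative EM update: reweight each isoform by how well
        # the categories it contributes to are explained.
        theta = [
            theta[i]
            * sum(
                counts[j] * rates[i][j] / expected[j]
                for j in range(n_cat)
                if expected[j] > 0
            )
            / row_sums[i]
            for i in range(n_iso)
        ]
    return theta


# Toy example (made-up numbers): two isoforms sharing one exon,
# each with one private exon. Categories: shared, only-1, only-2.
toy_rates = [[1.0, 1.0, 0.0],
             [1.0, 0.0, 1.0]]
toy_counts = [4, 3, 1]
toy_theta = estimate_abundance(toy_counts, toy_rates)
print([round(t, 3) for t in toy_theta])
```

In this toy setup the counts are exactly consistent with abundances of 3 and 1, so the iteration settles at those values. Real data would require sampling rates derived from transcript structure, read length, and, for paired-end data, the insert length distribution.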
First available in Project Euclid: 9 June 2011
Salzman, Julia; Jiang, Hui; Wong, Wing Hung. Statistical Modeling of RNA-Seq Data. Statist. Sci. 26 (2011), no. 1, 62--83. doi:10.1214/10-STS343. http://projecteuclid.org/euclid.ss/1307626566.