Statistical Science

Statistical Modeling of RNA-Seq Data

Julia Salzman, Hui Jiang, and Wing Hung Wong

Full-text: Open access


Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data; the model is flexible enough to accommodate both single-end and paired-end RNA-Seq data as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator is provided. Further, it is shown that paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at fixed sequencing depth. Simulation studies are also given.
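To make the idea of maximum likelihood estimation of isoform abundance concrete, here is a minimal toy sketch. It is not the paper's implementation. It assumes a simple Poisson model in which the read count in category j has mean sum_i theta_i * a_ij, where theta_i is the abundance of isoform i and a_ij is a known sampling rate. The estimates are computed with a standard EM iteration for this kind of Poisson model, swapped in here for illustration. The function name, the rate matrix, and the counts are all hypothetical.

```python
def em_isoform_abundance(counts, rates, n_iter=500):
    """Toy EM estimate of isoform abundances under an assumed Poisson model.

    counts[j]   : observed number of reads in read category j
    rates[i][j] : assumed sampling rate a_ij of category j from isoform i
    Returns a list theta with one abundance estimate per isoform.
    """
    n_iso = len(rates)
    n_cat = len(counts)
    theta = [1.0] * n_iso  # arbitrary positive starting point
    # Total sampling rate per isoform (assumed positive for every isoform).
    total_rate = [sum(rates[i]) for i in range(n_iso)]

    for _ in range(n_iter):
        # Expected count in each category under the current estimate.
        mu = [sum(theta[i] * rates[i][j] for i in range(n_iso))
              for j in range(n_cat)]
        # Multiplicative EM update: reweight each isoform by how well it
        # explains the observed counts in the categories it can generate.
        theta = [
            theta[i]
            * sum(counts[j] * rates[i][j] / mu[j]
                  for j in range(n_cat) if mu[j] > 0)
            / total_rate[i]
            for i in range(n_iso)
        ]
    return theta


# Hypothetical example: two isoforms sharing one exon (category 0), each
# with one private exon (categories 1 and 2).
rates = [[1, 1, 0],
         [1, 0, 1]]
counts = [4, 3, 1]
print(em_isoform_abundance(counts, rates))
```

In this example the counts can be matched exactly by abundances of 3 and 1, so the iteration settles at those values. With real data the rate matrix would have to encode read positions, fragment lengths for paired-end reads, and any positional sampling bias, none of which this sketch attempts.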

Article information

Statist. Sci. Volume 26, Number 1 (2011), 62-83.

First available in Project Euclid: 9 June 2011


Salzman, Julia; Jiang, Hui; Wong, Wing Hung. Statistical Modeling of RNA-Seq Data. Statistical Science 26 (2011), no. 1, 62-83. doi:10.1214/10-STS343.

