Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single-end and paired-end RNA-Seq data as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator is provided. Further, it is shown that paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at fixed sequencing depth. Simulation studies are also given.
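To make the idea of maximum likelihood estimation of isoform abundance concrete, the following is a minimal sketch and not the paper's implementation. It assumes a simple Poisson read-count model, which the abstract itself does not specify: the count in read category j (for example, an exon) has mean equal to the sum over isoforms i of theta[i] * rates[i][j]. The function name, the `rates` matrix, and the toy gene with two isoforms are all hypothetical, and the sketch ignores paired-end insert sizes and sampling bias. The estimate is computed with a standard EM update for this kind of Poisson model.

```python
def estimate_isoform_abundance(counts, rates, n_iter=500):
    """EM estimate of theta under the assumed model
    counts[j] ~ Poisson(sum_i theta[i] * rates[i][j]).

    counts: observed read counts per category (e.g. per exon).
    rates:  rates[i][j] is the sampling rate of isoform i in category j
            (0 if the isoform does not contain that category).
    """
    n_iso = len(rates)
    n_cat = len(counts)
    theta = [1.0] * n_iso                  # arbitrary positive start
    row_sums = [sum(r) for r in rates]     # total sampling rate per isoform

    for _ in range(n_iter):
        # Expected count in each category under the current theta.
        mu = [sum(theta[i] * rates[i][j] for i in range(n_iso))
              for j in range(n_cat)]
        # Multiplicative EM update: reassign observed counts to isoforms
        # in proportion to their current expected contribution.
        theta = [
            theta[i]
            * sum(counts[j] * rates[i][j] / mu[j]
                  for j in range(n_cat) if mu[j] > 0)
            / row_sums[i]
            for i in range(n_iso)
        ]
    return theta


# Hypothetical gene with three exons of equal sampling rate:
# isoform 0 contains exons 1, 2, 3; isoform 1 skips exon 2.
rates = [[1.0, 1.0, 1.0],
         [1.0, 0.0, 1.0]]
counts = [5, 2, 5]   # made-up counts, consistent with abundances (2, 3)
theta_hat = estimate_isoform_abundance(counts, rates)
print(theta_hat)
```

In this toy case only exon 2 distinguishes the two isoforms, so its count determines the abundance of isoform 0, and the remaining reads in exons 1 and 3 are attributed to isoform 1.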
Statist. Sci. 26(1): 62-83 (February 2011). DOI: 10.1214/10-STS343