The Annals of Applied Statistics

GaGa: A parsimonious and flexible model for differential expression analysis

David Rossell


Abstract

Hierarchical models are a powerful tool for high-throughput data with a small to moderate number of replicates, as they allow sharing information across units, for example, genes. We propose two such models and show their increased sensitivity in microarray differential expression applications. We build on the gamma–gamma hierarchical model introduced by Kendziorski et al. [Statist. Med. 22 (2003) 3899–3914] and Newton et al. [Biostatistics 5 (2004) 155–176], addressing important limitations that may have hampered its performance and its more widespread use. The models parsimoniously describe the expression of thousands of genes with a small number of hyper-parameters, which makes them easy to interpret and analytically tractable. The first model is a simple extension that improves the fit substantially with almost no increase in complexity. The second extension uses a mixture of gamma distributions to further improve the fit, at the expense of increased computational burden. We derive several approximations that significantly reduce the computational cost. We find that our models outperform the original formulation of the model, as well as some other popular methods for differential expression analysis. The improved performance is especially noticeable for the small sample sizes commonly encountered in high-throughput experiments. Our methods are implemented in the freely available Bioconductor gaga package.
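
As a rough illustration of how a gamma–gamma hierarchy shares information across genes, the sketch below computes a shrunken estimate of one gene's mean expression. It is not the GaGa model or the gaga package interface. The parametrization (observations gamma-distributed with a common shape `alpha` and gene-specific rate `lam`, with a gamma prior on `lam` governed by `alpha0` and `nu`), the function name and all numeric values are assumptions made for illustration. In practice the hyper-parameters would be estimated from all genes rather than fixed by hand.

```python
def posterior_mean_expression(x, alpha, alpha0, nu):
    """Posterior mean of a gene's mean expression, alpha / lam.

    Assumed (illustrative) model, not taken from the paper:
      x_j | lam ~ Gamma(shape=alpha, rate=lam),  j = 1..n
      lam       ~ Gamma(shape=alpha0, rate=nu)
    By conjugacy, lam | x ~ Gamma(alpha0 + n*alpha, nu + sum(x)),
    and E[1 / lam] = rate / (shape - 1) whenever shape > 1.
    """
    n = len(x)
    shape = alpha0 + n * alpha
    rate = nu + sum(x)
    if shape <= 1:
        raise ValueError("posterior shape must exceed 1 for a finite mean")
    return alpha * rate / (shape - 1)


# Hypothetical hyper-parameters shared by every gene.
alpha, alpha0, nu = 10.0, 3.0, 20.0

# Prior mean of the expression level, i.e. the estimate with no data.
prior_mean = alpha * nu / (alpha0 - 1)

# The same gene observed with three replicates and with one replicate.
three_reps = posterior_mean_expression([50.0, 60.0, 70.0], alpha, alpha0, nu)
one_rep = posterior_mean_expression([60.0], alpha, alpha0, nu)

print(prior_mean, three_reps, one_rep)
```

Under these assumptions the estimate lies between the gene's own sample mean and the prior mean shared by all genes, and it moves closer to the shared value when fewer replicates are available. This is the sense in which a handful of hyper-parameters can stabilize inference at small sample sizes.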

Article information

Source
Ann. Appl. Stat. Volume 3, Number 3 (2009), 1035-1051.

Dates
First available in Project Euclid: 5 October 2009

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1254773277

Digital Object Identifier
doi:10.1214/09-AOAS244

Mathematical Reviews number (MathSciNet)
MR2750385

Zentralblatt MATH identifier
1257.62111

Keywords
Hierarchical model; microarray; differential expression; two-parameter gamma distribution

Citation

Rossell, David. GaGa: A parsimonious and flexible model for differential expression analysis. Ann. Appl. Stat. 3 (2009), no. 3, 1035–1051. doi:10.1214/09-AOAS244. https://projecteuclid.org/euclid.aoas/1254773277



References

  • Armstrong, S. A., Staunton, J. E., Silverman, L. B., Pieters, R., Boer, M. L., Minden, M. D., Sallan, E. S., Lander, E. S., Golub, T. R. and Korsmeyer, S. J. (2002). MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics 30 41–47.
  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
  • Coombes, K. R., Wang, J. and Baggerly, K. A. (2007). A statistical method for finding biomarkers from microarray data, with application to prostate cancer. Technical report, M.D. Anderson Cancer Center. Available at http://www.mdanderson.org/pdf/biostats_utmdabtr00704.pdf.
  • Coombes, K. R. (2007). ClassComparison: Classes and methods for “class comparison” problems on microarrays. R package version 2.5.0.
  • Damsleth, E. (1975). Conjugate classes for gamma distributions. Scand. J. Statist. 2 80–84.
  • Dennis, G., Sherman, B. T., Hosack, D. A., Yang, J., Baseler, M. W., Lane, H. C. and Lempicki, R. A. (2003). DAVID: Database for annotation, visualization, and integrated discovery. Genome Biology 4 P3.
  • Dudoit, S., Yang, H. Y., Callow, M. J. and Speed, T. P. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statist. Sinica 12 972–977.
  • Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. Y. H. and Zhang, J. (2004). Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 5.
  • Gottardo, R., Raftery, A. E., Yeung, K. Y. and Bumgarner, R. E. (2006). Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 62 10–18.
  • Gottardo, R. (2004). Bridge: Bayesian robust inference for differential gene expression. R package version 1.4.0.
  • Irizarry, R., Hobbs, B., Collin, B., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U. and Speed, T. S. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4 249–264.
  • Kendziorski, C., Newton, M. and Sarkar, D. (2005). EBarrays: Empirical Bayes for microarrays. R package version 1.2.0.
  • Kendziorski, C. M., Newton, M. A., Lan, H. and Gould, M. N. (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statist. Med. 22 3899–3914.
  • Lapointe, J., Li, C., Higgins, J. P., Rijn, M., Bair, E., Montgomery, K., Ferrari, M., Egevad, L., Rayford, W., Bergerheim, U., Ekman, P., DeMarzo, A. M., Tibshirani, R., Botstein, D., Brown, P. O., Brooks, J. D. and Pollack, J. R. (2004). Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc. Natl. Acad. Sci. 101 811–816.
  • Lo, K. and Gottardo, R. (2007). Flexible empirical Bayes models for differential gene expression. Bioinformatics 23 328–335.
  • Lönnstedt, I. and Speed, T. (2002). Replicated microarray data. Statist. Sinica 12 31–46.
  • MAQC Consortium (2006). The microarray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology 24 1151–1161.
  • Miller, R. B. (1980). Bayesian analysis of the two-parameter gamma distribution. Technometrics 22 65–69.
  • Müller, P., Parmigiani, G. and Rice, K. (2007). FDR and Bayesian Multiple Comparisons Rules. Oxford Univ. Press.
  • Müller, P., Parmigiani, G., Robert, C. and Rousseau, J. (2004). Optimal sample size for multiple testing: The case of gene expression microarrays. J. Amer. Statist. Assoc. 99 990–1001.
  • Newton, M. A. and Kendziorski, C. M. (2003). Parametric Empirical Bayes Methods for Microarrays. Springer, New York.
  • Newton, M. A., Kendziorski, C. M., Richmond, C. S., Blattner, F. R. and Tsui, K. W. (2001). On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8 37–52.
  • Newton, M. A., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture model. Biostatistics 5 155–176.
  • Pounds, S. and Morris, S. W. (2003). Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 19 1236–1242.
  • Rossell, D. (2009). Supplement to “GaGa: A simple and flexible hierarchical model for differential expression analysis.” DOI: 10.1214/09-AOAS244SUPP.
  • Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
  • Schwender, H. (2007). Siggenes: Multiple testing using SAM and Efron’s empirical Bayes approaches. R package version 1.13.2.
  • Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statist. Appl. Genet. Mol. Biol. 3 3.
  • Smyth, G. K. (2005). Limma: Linear models for microarray data. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor (R. Gentleman, V. Carey, S. Dudoit, R. Irizarry and W. Huber, eds.) 397–420. Springer, New York.
  • Stafford, P. (2008). Methods in Microarray Normalization. CRC Press, USA.
  • Tusher, V. G., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. 98 5116–5121.
  • Wu, J. and Irizarry, R., with contributions from MacDonald, J. and Gentry, J. (2007). gcrma: Background Adjustment Using Sequence Information. R package version 2.8.1.
  • Wu, Z., Irizarry, R. A., Gentleman, R., Murillo, F. M. and Spencer, F. (2004). A model based background adjustment for oligonucleotide expression arrays. Technical report, Johns Hopkins Univ., Dept. Biostatistics.
  • Yuan, M. and Kendziorski, C. (2006). A unified approach for simultaneous gene clustering and differential expression identification. Biometrics 62 1089–1098.
