The Annals of Applied Statistics

Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression

Belinda Phipson, Stanley Lee, Ian J. Majewski, Warren S. Alexander, and Gordon K. Smyth

Full-text: Open access

Abstract

One of the most common analysis tasks in genomic research is to identify genes that are differentially expressed (DE) between experimental conditions. Empirical Bayes (EB) statistical tests using moderated genewise variances have been very effective for this purpose, especially when the number of biological replicate samples is small. The EB procedures can, however, be heavily influenced by a small number of genes with very large or very small variances. This article improves the differential expression tests by robustifying the hyperparameter estimation procedure. The robust procedure has the effect of decreasing the informativeness of the prior distribution for outlier genes while increasing its informativeness for other genes. This effect has the double benefit of reducing the chance that hypervariable genes will be spuriously identified as DE while increasing statistical power for the main body of genes. The robust EB algorithm is fast and numerically stable. The procedure allows exact small-sample null distributions for the test statistics and reduces exactly to the original EB procedure when no outlier genes are present. Simulations show that the robustified tests have similar performance to the original tests in the absence of outlier genes but have greater power and robustness when outliers are present. The article includes case studies for which the robust method correctly identifies and downweights genes associated with hidden covariates and detects more genes likely to be scientifically relevant to the experimental conditions. The new procedure is implemented in the limma software package freely available from the Bioconductor repository.
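The shrinkage idea behind the moderated genewise variances can be sketched numerically. The sketch below is not the robust hyperparameter estimation algorithm of this article. It only applies the standard moderated-variance formula of Smyth (2004), in which the posterior variance is a degrees-of-freedom-weighted average of the prior variance and the sample variance, and it assumes that an outlier gene is assigned a smaller prior degrees of freedom. All numerical values and the function name `moderated_variance` are hypothetical and chosen purely for illustration.

```python
def moderated_variance(s2, df_resid, s2_prior, df_prior):
    """Weighted average of prior and sample variance (Smyth, 2004).

    s2       : genewise sample variance
    df_resid : residual degrees of freedom for that gene
    s2_prior : prior variance (hyperparameter)
    df_prior : prior degrees of freedom (hyperparameter); larger values
               mean a more informative prior and stronger shrinkage
    """
    return (df_prior * s2_prior + df_resid * s2) / (df_prior + df_resid)


# Hypothetical hyperparameters and data, for illustration only.
s2_prior = 0.05   # assumed prior variance
df_resid = 4      # assumed residual degrees of freedom (few replicates)

# A typical gene: sample variance close to the prior, shrinkage is harmless.
typical = moderated_variance(0.06, df_resid, s2_prior, df_prior=10.0)

# A hypervariable gene under one shared, informative prior: its large
# variance is pulled strongly toward the prior, which inflates its
# test statistic.
outlier_shared = moderated_variance(2.0, df_resid, s2_prior, df_prior=10.0)

# The same gene with a less informative prior (assumed smaller df_prior):
# the moderated variance stays close to the observed variance.
outlier_weak_prior = moderated_variance(2.0, df_resid, s2_prior, df_prior=1.0)

for label, value in [
    ("typical gene", typical),
    ("outlier, shared prior", outlier_shared),
    ("outlier, weak prior", outlier_weak_prior),
]:
    print(f"{label:>22}: moderated variance = {value:.4f}")
```

With a smaller prior degrees of freedom the hypervariable gene keeps a moderated variance near its observed value, so its test statistic is not inflated. This is the sense in which a less informative prior for outlier genes reduces spurious DE calls.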

Article information

Source
Ann. Appl. Stat. Volume 10, Number 2 (2016), 946-963.

Dates
Received: October 2014
Revised: December 2015
First available in Project Euclid: 22 July 2016

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1469199900

Digital Object Identifier
doi:10.1214/16-AOAS920

Mathematical Reviews number (MathSciNet)
MR3528367

Zentralblatt MATH identifier
06625676

Keywords
Empirical Bayes; outliers; robustness; gene expression; microarrays

Citation

Phipson, Belinda; Lee, Stanley; Majewski, Ian J.; Alexander, Warren S.; Smyth, Gordon K. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Ann. Appl. Stat. 10 (2016), no. 2, 946–963. doi:10.1214/16-AOAS920. https://projecteuclid.org/euclid.aoas/1469199900.



