Microarrays, Empirical Bayes and the Two-Groups Model



Statistical Science

Bradley Efron

Source: Statist. Sci. Volume 23, Number 1 (2008), 1-22.

Abstract

The classic frequentist theory of hypothesis testing developed by Neyman, Pearson and Fisher has a claim to being the twentieth century’s most influential piece of applied mathematics. Something new is happening in the twenty-first century: high-throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focused on Benjamini and Hochberg’s False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals and Bayesian competitors to the two-groups model.
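The Benjamini and Hochberg (1995) method named in the abstract is a step-up rule on ordered p-values. A minimal sketch in Python follows, assuming the standard textbook statement of the rule: with N p-values sorted in increasing order, find the largest rank i such that p(i) <= (i/N) q and reject every hypothesis ranked at or below i. The function name `benjamini_hochberg`, the level `q` and the example p-values are illustrative assumptions, not taken from the article.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q.

    Illustrative sketch: the name, the default q and the return format
    are assumptions made for this example.
    """
    n = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    # Largest rank whose ordered p-value lies at or below the line q * rank / n.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / n:
            k = rank
    # Step-up: reject everything up to rank k, even if an earlier rank
    # on its own sat above the line.
    return sorted(order[:k])


# Hypothetical p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))  # prints [0, 1]
```

The loop keeps the last rank that passes instead of stopping at the first failure. That is what makes the rule step-up: a small p-value late in the ordering can pull in earlier cases that sat above the line.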

Keywords: Simultaneous tests; empirical null; false discovery rates

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1215441276
Digital Object Identifier: doi:10.1214/07-STS236
Mathematical Reviews number (MathSciNet): MR2431866

References

Allison, D., Gadbury, G., Heo, M., Fernandez, J., Lee, C. K., Prolla, T. and Weindruch, R. (2002). A mixture model approach for the analysis of microarray gene expression data. Computat. Statist. Data Anal. 39 1–20.
Mathematical Reviews (MathSciNet): MR1895555
Aubert, J., Bar-Hen, A., Daudin, J. and Robin, S. (2004). Determination of the differentially expressed genes in microarray experiments using local FDR. BMC Bioinformatics 5 125.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
Mathematical Reviews (MathSciNet): MR1325392
Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate under dependency. Ann. Statist. 29 1165–1188.
Mathematical Reviews (MathSciNet): MR1869245
Digital Object Identifier: doi:10.1214/aos/1013699998
Project Euclid: euclid.aos/1013699998
Benjamini, Y. and Yekutieli, D. (2005). False discovery rate-adjusted multiple confidence intervals for selected parameters. J. Amer. Statist. Assoc. 100 71–93.
Mathematical Reviews (MathSciNet): MR2156820
Digital Object Identifier: doi:10.1198/016214504000001907
Broberg, P. (2005). A comparative review of estimates of the proportion unchanged genes and the false discovery rate. BMC Bioinformatics 6 199.
Do, K.-A., Mueller, P. and Tang, F. (2005). A Bayesian mixture model for differential gene expression. J. Roy. Statist. Soc. Ser. C 54 627–644.
Dudoit, S., Shaffer, J. and Boldrick, J. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci. 18 71–103.
Mathematical Reviews (MathSciNet): MR1997066
Digital Object Identifier: doi:10.1214/ss/1056397487
Project Euclid: euclid.ss/1056397487
Efron, B. (2003). Robbins, empirical Bayes, and microarrays. Ann. Statist. 31 366–378.
Mathematical Reviews (MathSciNet): MR1983533
Digital Object Identifier: doi:10.1214/aos/1051027871
Project Euclid: euclid.aos/1051027871
Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104.
Mathematical Reviews (MathSciNet): MR2054289
Digital Object Identifier: doi:10.1198/016214504000000089
Efron, B. (2005). Local false discovery rates. Available at http://www-stat.stanford.edu/~brad/papers/False.pdf.
Efron, B. (2006). Size, power, and false discovery rates. Available at http://www-stat.stanford.edu/~brad/papers/Size.pdf. Ann. Appl. Statist. To appear.
Mathematical Reviews (MathSciNet): MR2351089
Digital Object Identifier: doi:10.1214/009053606000001460
Project Euclid: euclid.aos/1188405614
Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 102 93–103.
Mathematical Reviews (MathSciNet): MR2293302
Digital Object Identifier: doi:10.1198/016214506000001211
Efron, B. and Gous, A. (2001). Scales of evidence for model selection: Fisher versus Jeffreys. Model Selection IMS Monograph 38 208–256.
Mathematical Reviews (MathSciNet): MR2000754
Digital Object Identifier: doi:10.1214/lnms/1215540972
Efron, B. and Morris, C. (1975). Data analysis using Stein’s estimator and its generalizations. J. Amer. Statist. Assoc. 70 311–319.
Mathematical Reviews (MathSciNet): MR391403
Digital Object Identifier: doi:10.2307/2285453
Efron, B. and Tibshirani, R. (1996). Using specially designed exponential families for density estimation. Ann. Statist. 24 2431–2461.
Mathematical Reviews (MathSciNet): MR1425960
Digital Object Identifier: doi:10.1214/aos/1032181161
Project Euclid: euclid.aos/1032181161
Efron, B. and Tibshirani, R. (2006). On testing the significance of sets of genes. Available at http://www-stat.stanford.edu/~brad/papers/genesetpaper.pdf. Ann. Appl. Statist. To appear.
Mathematical Reviews (MathSciNet): MR2393843
Digital Object Identifier: doi:10.1214/07-AOAS101
Project Euclid: euclid.aoas/1183143731
Efron, B. and Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23 70–86.
Efron, B., Tibshirani, R., Storey, J. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160.
Mathematical Reviews (MathSciNet): MR1946571
Digital Object Identifier: doi:10.1198/016214501753382129
Hedenfalk, I., Duggan, D., Chen, Y., et al. (2001). Gene expression profiles in hereditary breast cancer. New Engl. J. Medicine 344 539–548.
Heller, G. and Qing, J. (2003). A mixture model approach for finding informative genes in microarray studies. Unpublished manuscript.
Kerr, M., Martin, M. and Churchill, G. (2000). Analysis of variance in microarray data. J. Comp. Biology 7 819–837.
Langaas, M., Lindqvist, B. and Ferkingstad, E. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. J. Roy. Statist. Soc. Ser. B 67 555–572.
Mathematical Reviews (MathSciNet): MR2168204
Digital Object Identifier: doi:10.1111/j.1467-9868.2005.00515.x
Lee, M. L. T., Kuo, F., Whitmore, G. and Sklar, J. (2000). Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. 97 9834–9838.
Lehmann, E. and Romano, J. (2005). Generalizations of the familywise error rate. Ann. Statist. 33 1138–1154.
Mathematical Reviews (MathSciNet): MR2195631
Digital Object Identifier: doi:10.1214/009053605000000084
Project Euclid: euclid.aos/1120224098
Lehmann, E. and Romano, J. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York.
Mathematical Reviews (MathSciNet): MR2135927
Zentralblatt MATH: 1076.62018
Lewin, A., Richardson, S., Marshall, C., Glazier, A. and Aitman, T. (2006). Bayesian modeling of differential gene expression. Biometrics 62 1–9.
Mathematical Reviews (MathSciNet): MR2226550
Digital Object Identifier: doi:10.1111/j.1541-0420.2005.00394.x
Liang, C., Rice, J., de Pater, I., Alcock, C., Axelrod, T., Wang, A. and Marshall, S. (2004). Statistical methods for detecting stellar occultations by Kuiper belt objects: The Taiwanese-American occultation survey. Statist. Sci. 19 265–274.
Mathematical Reviews (MathSciNet): MR2146947
Digital Object Identifier: doi:10.1214/088342304000000378
Project Euclid: euclid.ss/1105714162
Liao, J., Lin, Y., Selvanayagam, Z. and Shih, W. J. (2004). A mixture model for estimating the local false discovery rate in DNA microarray analysis. Bioinformatics 20 2694–2701.
Newton, M., Kendziorski, C., Richmond, C., Blattner, F. and Tsui, K. (2001). On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. J. Comp. Biology 8 37–52.
Newton, M., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture model. Biostatistics 5 155–176.
Pan, W., Lin, J. and Le, C. (2003). A mixture model approach to detecting differentially expressed genes with microarray data. Functional and Integrative Genomics 3 117–124.
Parmigiani, G., Garrett, E., Anbazhagan, R. and Gabrielson, E. (2002). A statistical framework for expression-based molecular classification in cancer. J. Roy. Statist. Soc. Ser. B 64 717–736.
Mathematical Reviews (MathSciNet): MR1979385
Digital Object Identifier: doi:10.1111/1467-9868.00358
Pawitan, Y., Murthy, K., Michiels, J. and Ploner, A. (2005). Bias in the estimation of false discovery rate in microarray studies. Bioinformatics 21 3865–3872.
Pounds, S. and Morris, S. (2003). Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of the p-values. Bioinformatics 19 1236–1242.
Qiu, X., Brooks, A., Klebanov, L. and Yakovlev, A. (2005). The effects of normalization on the correlation structure of microarray data. BMC Bioinformatics 6 120.
Rogosa, D. (2003). Accuracy of API index and school base report elements: 2003 Academic Performance Index, California Department of Education. Available at http://www.cde.ca.gov/ta/ac/ap/researchreports.asp.
Schwartzman, A., Dougherty, R. F. and Taylor, J. E. (2005). Cross-subject comparison of principal diffusion direction maps. Magnetic Resonance in Medicine 53 1423–1431.
Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D’Amico, A., Richie, J., Lander, E., Loda, M., Kantoff, P., Golub, T. and Sellers, R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1 302–309.
Smyth, G. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3 1–29.
Mathematical Reviews (MathSciNet): MR2101454
Digital Object Identifier: doi:10.2202/1544-6115.1027
Storey, J. (2002). A direct approach to false discovery rates. J. Roy. Statist. Soc. Ser. B 64 479–498.
Mathematical Reviews (MathSciNet): MR1924302
Digital Object Identifier: doi:10.1111/1467-9868.00346
Storey, J., Taylor, J. and Siegmund, D. (2005). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. Roy. Statist. Soc. Ser. B 66 187–205.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. and Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102 15545–15550.
Turnbull, B. (2006). BEST proteomics data. Available at www.stanford.edu/people/brit.turnbull/BESTproteomics.pdf.
Tusher, V., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to transcriptional responses to ionizing radiation. Proc. Natl. Acad. Sci. USA 98 5116–5121.
van’t Wout, A., Lehrman, G., Mikheeva, S., O’Keeffe, G., Katze, M., Bumgarner, R., Geiss, G. and Mullins, J. (2003). Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4+ T-Cell lines. J. Virology 77 1392–1402.

2008 © Institute of Mathematical Statistics