The Annals of Statistics

Size, power and false discovery rates

Bradley Efron

Abstract

Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the false discovery rate (fdr) analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr’s, theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, the power diagnostics showing why nonnull cases might easily fail to appear on a list of “significant” discoveries.
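The distinction the abstract draws between tail-area and local false discovery rates can be illustrated with a small sketch. It is not the paper's estimator: it assumes the standard two-groups formulation, in which the tail-area rate is pi0 * P0(Z >= z) / P(Z >= z) and the local rate is pi0 * f0(z) / f(z), takes the theoretical null to be N(0, 1), sets pi0 = 1 as a conservative default, and replaces any smooth density fit with a crude one-bin histogram estimate. The function names, the bin width, and the simulated z-values are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # theoretical null, assumed N(0, 1)


def tail_area_fdr(z_values, z, pi0=1.0):
    """Right-tail Fdr estimate: pi0 * P0(Z >= z) / (observed fraction >= z)."""
    observed = sum(1 for v in z_values if v >= z) / len(z_values)
    if observed == 0.0:
        return 1.0  # empty tail: no evidence for any discovery
    null_tail = 1.0 - STD_NORMAL.cdf(z)
    return min(1.0, pi0 * null_tail / observed)


def local_fdr(z_values, z, pi0=1.0, width=0.5):
    """Local fdr estimate: pi0 * f0(z) / f_hat(z).

    f_hat is a one-bin histogram density centred at z (an assumption made
    for brevity; a smoother density estimate would normally be used).
    """
    half = width / 2.0
    count = sum(1 for v in z_values if z - half <= v < z + half)
    if count == 0:
        return 1.0  # no observations near z
    f_hat = count / (len(z_values) * width)
    return min(1.0, pi0 * STD_NORMAL.pdf(z) / f_hat)


def simulate(n=5000, nonnull_fraction=0.1, shift=3.0, seed=0):
    """Simulated z-values: null cases N(0, 1), nonnull cases N(shift, 1)."""
    rng = random.Random(seed)
    n_nonnull = int(n * nonnull_fraction)
    nulls = [rng.gauss(0.0, 1.0) for _ in range(n - n_nonnull)]
    nonnulls = [rng.gauss(shift, 1.0) for _ in range(n_nonnull)]
    return nulls + nonnulls


z_scores = simulate()
for z in (0.0, 2.0, 3.0, 4.0):
    print(f"z={z:.1f}  Fdr={tail_area_fdr(z_scores, z):.3f}  "
          f"fdr={local_fdr(z_scores, z):.3f}")
```

In this toy setting both estimates stay near 1 around z = 0, where null cases dominate, and fall toward 0 far into the right tail. Setting pi0 = 1 overstates both rates slightly, which errs on the side of fewer claimed discoveries.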

Article information

Source
Ann. Statist., Volume 35, Number 4 (2007), 1351-1377.

Dates
First available in Project Euclid: 29 August 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1188405614

Digital Object Identifier
doi:10.1214/009053606000001460

Mathematical Reviews number (MathSciNet)
MR2351089

Zentralblatt MATH identifier
1123.62008

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators
62G07: Density estimation

Keywords
Local false discovery rates; empirical Bayes; large-scale simultaneous inference; empirical null

Citation

Efron, Bradley. Size, power and false discovery rates. Ann. Statist. 35 (2007), no. 4, 1351--1377. doi:10.1214/009053606000001460. https://projecteuclid.org/euclid.aos/1188405614
