The Annals of Applied Statistics

Simultaneous inference: When should hypothesis testing problems be combined?

Bradley Efron

Full-text: Open access


Modern statisticians are often presented with hundreds or thousands of hypothesis testing problems to evaluate at the same time, generated from new scientific technologies such as microarrays, medical and satellite imaging devices, or flow cytometry counters. The relevant statistical literature tends to begin with the tacit assumption that a single combined analysis, for instance, a False Discovery Rate assessment, should be applied to the entire set of problems at hand. This can be a dangerous assumption, as the examples in the paper show, leading to overly conservative or overly liberal conclusions within any particular subclass of the cases. A simple Bayesian theory yields a succinct description of the effects of separation or combination on false discovery rate analyses. The theory allows efficient testing within small subclasses, and has applications to “enrichment,” the detection of multi-case effects.
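The combination-versus-separation effect described in the abstract can be illustrated with a small simulation (a minimal sketch, not taken from the paper: the two subclasses, the effect size of 3, and the use of the plain Benjamini–Hochberg procedure are illustrative assumptions). One subclass is purely null while the other contains many non-null cases; running a single combined analysis versus two separate ones typically changes the conclusions reached within the null subclass.

```python
import math
import numpy as np

def two_sided_p(z):
    """Two-sided p-values for z-statistics under the N(0, 1) null."""
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

def bh_reject(pvals, q=0.1):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals)
    n = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, n + 1) / n
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    reject = np.zeros(n, dtype=bool)
    reject[order[:k]] = True
    return reject

rng = np.random.default_rng(1)
# Subclass A: 1000 cases, all null.  Subclass B: 1000 cases, 300 non-null
# (illustrative effect size 3 on the z-scale).
z_a = rng.normal(0.0, 1.0, 1000)
z_b = np.concatenate([rng.normal(0.0, 1.0, 700), rng.normal(3.0, 1.0, 300)])
p_a, p_b = two_sided_p(z_a), two_sided_p(z_b)

# Combined analysis: one BH pass over all 2000 p-values.
combined = bh_reject(np.concatenate([p_a, p_b]), q=0.1)
a_hits_combined = int(combined[:1000].sum())

# Separate analyses: BH within each subclass on its own.
a_hits_separate = int(bh_reject(p_a, q=0.1).sum())
b_hits_separate = int(bh_reject(p_b, q=0.1).sum())

print(a_hits_combined, a_hits_separate, b_hits_separate)
```

In runs of this kind, subclass B's many small p-values raise the combined rejection threshold, so the pure-null subclass A tends to collect more "discoveries" under the combined analysis than it would on its own, i.e. the combined analysis is overly liberal for A, matching the danger the abstract describes.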

Article information

Ann. Appl. Stat. Volume 2, Number 1 (2008), 197–223.

First available in Project Euclid: 24 March 2008


Keywords: False discovery rates; separate-class model; enrichment


Efron, Bradley. Simultaneous inference: When should hypothesis testing problems be combined? Ann. Appl. Stat. 2 (2008), no. 1, 197–223. doi:10.1214/07-AOAS141.


