Statistical Science

Multiple Testing for Exploratory Research

Jelle J. Goeman and Aldo Solari


Motivated by the practice of exploratory research, we formulate an approach to multiple testing that reverses the conventional roles of the user and the multiple testing procedure. Traditionally, the user chooses the error criterion, and the procedure the resulting rejected set. Instead, we propose to let the user choose the rejected set freely, and to let the multiple testing procedure return a confidence statement on the number of false rejections incurred. In our approach, such confidence statements are simultaneous for all choices of the rejected set, so that post hoc selection of the rejected set does not compromise their validity. The proposed reversal of roles requires nothing more than a review of the familiar closed testing procedure, but with a focus on the non-consonant rejections that this procedure makes. We suggest several shortcuts to avoid the computational problems associated with closed testing.
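The reversal of roles described above can be illustrated with a brute-force sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Simes local tests for the intersection hypotheses, which are valid only under suitable conditions such as independence or positive dependence of the p-values. It takes the confidence bound for a chosen set to be the size of the largest subset of that set whose intersection hypothesis closed testing fails to reject. The function names `simes_rejects`, `closed_testing` and `false_rejection_bound`, and the p-values in the example, are hypothetical.

```python
from itertools import combinations


def simes_rejects(pvals, subset, alpha):
    """Simes local test of the intersection hypothesis over `subset` (assumed choice)."""
    ordered = sorted(pvals[i] for i in subset)
    n = len(ordered)
    # Reject if the k-th smallest p-value is at most k * alpha / n for some k.
    return any(p <= (k * alpha) / n for k, p in enumerate(ordered, start=1))


def closed_testing(pvals, alpha=0.05):
    """Map every non-empty index set to True if closed testing rejects it."""
    m = len(pvals)
    subsets = [frozenset(c)
               for size in range(1, m + 1)
               for c in combinations(range(m), size)]
    local = {s: simes_rejects(pvals, s, alpha) for s in subsets}
    # Closed testing rejects a set only if the local test rejects
    # the set itself and every superset of it.
    return {s: all(local[t] for t in subsets if s <= t) for s in subsets}


def false_rejection_bound(rejected, chosen):
    """Upper confidence bound on the number of false rejections in `chosen`."""
    chosen = frozenset(chosen)
    sizes = [len(s) for s, is_rejected in rejected.items()
             if s <= chosen and not is_rejected]
    return max(sizes, default=0)


# Hypothetical p-values for three hypotheses, indexed 0, 1, 2.
rejected = closed_testing([0.03, 0.03, 0.9], alpha=0.05)
print(false_rejection_bound(rejected, {0}))        # 1
print(false_rejection_bound(rejected, {0, 1}))     # 1
print(false_rejection_bound(rejected, {0, 1, 2}))  # 2
```

In this example closed testing rejects the intersection of hypotheses 0 and 1 but neither hypothesis individually, which is a non-consonant rejection. The bound of 1 for the set {0, 1} says that at least one of the two is a true discovery, even though neither can be singled out. Because the bounds are read off one closed testing procedure, they hold simultaneously for every choice of set. The enumeration visits all 2^m - 1 intersection hypotheses, so it is feasible only for small m, which is the computational problem that shortcuts are meant to avoid.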

Article information

Statist. Sci., Volume 26, Number 4 (2011), 584-597.

First available in Project Euclid: 28 February 2012

Keywords: closed testing, confidence set, false discovery proportion


Goeman, Jelle J.; Solari, Aldo. Multiple Testing for Exploratory Research. Statist. Sci. 26 (2011), no. 4, 584–597. doi:10.1214/11-STS356.


See also

  • Discussion of: “Multiple Testing for Exploratory Research” by Jelle J. Goeman and Aldo Solari.
  • Rejoinder: Multiple Testing for Exploratory Research.