Annals of Statistics

Power-enhanced multiple decision functions controlling family-wise error and false discovery rates

Edsel A. Peña, Joshua D. Habiger, and Wensong Wu


Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures such as the Šidák procedure for FWER control and the Benjamini–Hochberg (BH) procedure for FDR control is achieved by exploiting possible differences in the powers of the individual tests. Results signal the need to take into account the powers of the individual tests and to have multiple hypotheses decision functions which are not limited to simply using the individual p-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures could be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing p-value based procedures whose theoretical validity is contingent on each of these p-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. Proposed procedures are relevant in the analysis of high-dimensional “large M, small n” data sets arising in the natural, physical, medical, economic and social sciences, whose generation and creation are accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology.
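For orientation, the two p-value based baselines named in the abstract can be sketched as follows. This is a minimal illustration of the standard single-step Šidák and step-up Benjamini–Hochberg rules only, not of the power-enhanced procedures developed in the paper; the function names `sidak_reject` and `bh_reject` and the example p-values are hypothetical, and the Šidák threshold assumes independent tests.

```python
def sidak_reject(pvalues, alpha=0.05):
    """Single-step Sidak rule: reject H_m when p_m <= 1 - (1 - alpha)**(1/M).

    Controls the FWER at level alpha when the M tests are independent.
    """
    m = len(pvalues)
    threshold = 1.0 - (1.0 - alpha) ** (1.0 / m)
    return [p <= threshold for p in pvalues]


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule.

    Find the largest rank k with p_(k) <= k * alpha / M and reject the
    hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical p-values, for illustration only.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.060, 0.074, 0.205, 0.212, 0.216]
sidak_decisions = sidak_reject(example_pvalues, alpha=0.05)
bh_decisions = bh_reject(example_pvalues, alpha=0.05)
```

Both rules apply the same thresholds to every p-value regardless of how powerful the corresponding individual test is, which is the limitation the abstract points to.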

Article information

Ann. Statist., Volume 39, Number 1 (2011), 556-583.

First available in Project Euclid: 15 February 2011

Primary: 62F03: Hypothesis testing
Secondary: 62J15: Paired and multiple comparisons

Benjamini–Hochberg procedure; Bonferroni procedure; decision process; false discovery rate (FDR); family wise error rate (FWER); Lagrangian optimization; Neyman–Pearson most powerful test; microarray analysis; reverse martingale; missed discovery rate (MDR); multiple decision function and process; multiple hypotheses testing; optional sampling theorem; power function; randomized p-values; generalized multiple decision p-values; ROC function; Šidák procedure


Peña, Edsel A.; Habiger, Joshua D.; Wu, Wensong. Power-enhanced multiple decision functions controlling family-wise error and false discovery rates. Ann. Statist. 39 (2011), no. 1, 556–583. doi:10.1214/10-AOS844.


  • [1] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
  • [2] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
  • [3] Bonferroni, C. (1936). Teoria statistica delle classi e calcolo delle probabilità. Publ. R. Instit. Super. Sci. Econ. Commerc. Firenze 8 1–62.
  • [4] Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall, London.
  • [5] Dudoit, S., Gilbert, H. N. and van der Laan, M. (2007). Resampling-based empirical Bayes multiple testing procedures for controlling generalized tail probability and expected value error rates: Focus on the false discovery rate and simulation study. Technical report, Univ. California, Berkeley.
  • [6] Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci. 18 71–103.
  • [7] Dudoit, S. and van der Laan, M. J. (2008). Multiple Testing Procedures With Applications to Genomics. Springer, New York.
  • [8] Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104.
  • [9] Efron, B. (2007). Size, power and false discovery rates. Ann. Statist. 35 1351–1377.
  • [10] Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model. Statist. Sci. 23 1–22.
  • [11] Efron, B. (2008). Simultaneous inference: When should hypothesis testing problems be combined? Ann. Appl. Statist. 2 197–223.
  • [12] Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160.
  • [13] Ferkingstad, E., Frigessi, A., Rue, H., Thorleifsson, G. and Kong, A. (2008). Unsupervised empirical Bayesian multiple testing with external covariates. Ann. Appl. Statist. 2 714–735.
  • [14] Foster, D. P. and Stine, R. A. (2008). α-investing: A procedure for sequential control of expected false discoveries. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 429–444.
  • [15] Genovese, C. and Wasserman, L. (2002). Operating characteristic and extensions of the false discovery rate procedure. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 499–517.
  • [16] Genovese, C. R., Roeder, K. and Wasserman, L. (2006). False discovery control with p-value weighting. Biometrika 93 509–524.
  • [17] Guindani, M., Müller, P. and Zhang, S. (2009). A Bayesian discovery procedure. J. Roy. Statist. Soc. Ser. B 71 905–925.
  • [18] Habiger, J. and Peña, E. A. (2010). Randomized P-values and nonparametric procedures in multiple testing. J. Nonparametr. Stat. 1–22. DOI: 10.1080/10485252.2010.482154.
  • [19] Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. J. Comput. Graph. Statist. 5 299–314.
  • [20] Jin, J. and Cai, T. T. (2007). Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J. Amer. Statist. Assoc. 102 495–506.
  • [21] Kang, G., Ye, K., Liu, N., Allison, D. and Gao, G. (2009). Weighted multiple hypothesis testing procedures. Stat. Appl. Genet. Mol. Biol. 8 1–21.
  • [22] Lan, K. K. G. and DeMets, D. L. (1983). Discrete sequential boundaries for clinical trials. Biometrika 70 659–663.
  • [23] Langaas, M., Lindqvist, B. H. and Ferkingstad, E. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 555–572.
  • [24] Lehmann, E. L. (1997). Testing Statistical Hypotheses, 2nd ed. Springer, New York.
  • [25] Lehmann, E. L., Romano, J. P. and Shaffer, J. P. (2005). On optimality of stepdown and stepup multiple test procedures. Ann. Statist. 33 1084–1108.
  • [26] Müller, P., Parmigiani, G., Robert, C. and Rousseau, J. (2004). Optimal sample size for multiple testing: The case of gene expression microarrays. J. Amer. Statist. Assoc. 99 990–1001.
  • [27] Neyman, J. and Pearson, E. (1933). On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. R. Soc. Ser. A 231 289–337.
  • [28] Peña, E., Habiger, J. and Wu, W. (2010). Supplement to “Power-enhanced multiple decision functions controlling family-wise error and false discovery rates.” DOI: 10.1214/10-AOS844SUPP.
  • [29] Roquain, E. and van de Wiel, M. A. (2009). Optimal weighting for false discovery rate control. Electron. J. Stat. 3 678–711.
  • [30] Rubin, D., Dudoit, S. and van der Laan, M. (2006). A method to increase the power of multiple testing procedures through sample splitting. Stat. Appl. Genet. Mol. Biol. 5 Art. 19, 20 pp. (electronic).
  • [31] Sarkar, S. K. (1998). Some probability inequalities for ordered MTP2 random variables: A proof of the Simes conjecture. Ann. Statist. 26 494–504.
  • [32] Sarkar, S. K. (2008). Generalizing Simes’ test and Hochberg’s stepup procedure. Ann. Statist. 36 337–363.
  • [33] Sarkar, S. K., Zhou, T. and Ghosh, D. (2008). A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statist. Sinica 18 925–945.
  • [34] Schweder, T. and Spjøtvoll, E. (1982). Plots of P-values to evaluate many tests simultaneously. Biometrika 69 493–502.
  • [35] Scott, J. and Berger, J. (2006). An exploration of aspects of Bayesian multiple testing. J. Statist. Plann. Inference 136 2144–2162.
  • [36] Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. J. Amer. Statist. Assoc. 62 626–633.
  • [37] Sorić, B. (1989). Statistical “discoveries” and effect-size estimation. J. Amer. Statist. Assoc. 84 608–610.
  • [38] Spjøtvoll, E. (1972). On the optimality of some multiple comparison procedures. Ann. Math. Statist. 43 398–411.
  • [39] Stevenson, R. L. (1886). The Strange Case of Dr Jekyll and Mr Hyde, 1st ed. Longmans, Green and Co., London.
  • [40] Storey, J. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 479–498.
  • [41] Storey, J. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Statist. 31 2012–2035.
  • [42] Storey, J. (2007). The optimal discovery procedure: A new approach to simultaneous significance testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 347–368.
  • [43] Storey, J., Dai, J. and Leek, J. (2007). The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics 8 414–432.
  • [44] Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187–205.
  • [45] Sun, W. and Cai, T. (2007). Oracle and adaptive compound decision rules for false discovery rate control. J. Amer. Statist. Assoc. 102 901–912.
  • [46] Wasserman, L. and Roeder, K. (2006). Weighted hypothesis testing. Technical report, Carnegie-Mellon Univ. Available at
  • [47] Westfall, P. and Troendle, J. (2008). Multiple testing with minimal assumptions. Biom. J. 50 1–11.
  • [48] Westfall, P. and Young, S. (1993). Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Wiley, New York.
  • [49] Westfall, P. H., Krishen, A. and Young, S. S. (1998). Using prior information to allocate significance levels for multiple endpoints. Stat. Med. 17 2107–2119.

Supplemental materials

  • Supplementary material: Supplement to “Power-Enhanced Multiple Decision Functions Controlling Family-Wise Error and False Discovery Rates”. The proofs of lemmas, propositions, theorems and corollaries are provided in this supplemental article [28].