The Annals of Statistics

Asymptotic optimality of the Westfall–Young permutation procedure for multiple testing under dependence

Nicolai Meinshausen, Marloes H. Maathuis, and Peter Bühlmann


Test statistics are often strongly dependent in large-scale multiple testing applications. Most corrections for multiplicity are unduly conservative for correlated test statistics, resulting in a loss of power to detect true positives. We show that the Westfall–Young permutation method has asymptotically optimal power for a broad class of testing problems with a block-dependence and sparsity structure among the tests, when the number of tests tends to infinity.
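To make the permutation idea concrete, the following is a minimal sketch of a single-step, max-statistic variant of a Westfall–Young-style adjustment. It is an illustration under stated assumptions and is not taken from the paper: it assumes a two-group design, uses the absolute difference in group means as the test statistic, and the function name `westfall_young_maxt` and its parameters are hypothetical. Because every permutation relabels whole samples, the dependence among the test statistics is preserved in the permutation distribution of the maximum.

```python
import numpy as np


def westfall_young_maxt(X, y, n_perm=1000, seed=0):
    """Single-step max-statistic adjusted p-values (illustrative sketch).

    X : array of shape (n_samples, n_tests), one column per hypothesis.
    y : binary group labels of length n_samples (assumed two-group design).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)

    def stats(labels):
        # Absolute difference in group means, computed for all tests at once.
        # This statistic is an assumption made for the sketch.
        return np.abs(X[labels].mean(axis=0) - X[~labels].mean(axis=0))

    observed = stats(y)
    # Start at 1 so the observed labelling counts as one permutation.
    exceed = np.ones_like(observed)
    for _ in range(n_perm):
        # Permute the labels jointly for all tests, then take the maximum
        # statistic across tests under this relabelling.
        max_stat = stats(rng.permutation(y)).max()
        exceed += max_stat >= observed
    return exceed / (n_perm + 1)
```

Rejecting every hypothesis whose adjusted p-value is at most a chosen level, such as 0.05, is the usual way such adjusted p-values are used for familywise error control. A step-down or min-p variant would follow the same pattern with a different statistic inside the loop.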

Article information

Ann. Statist., Volume 39, Number 6 (2011), 3369–3391.

First available in Project Euclid: 5 March 2012

Subjects
Primary: 62F03: Hypothesis testing; 62J15: Paired and multiple comparisons

Keywords
Multiple testing under dependence; Westfall–Young procedure; permutations; familywise error rate; asymptotic optimality; high-dimensional inference; sparsity; rank-based nonparametric tests


Meinshausen, Nicolai; Maathuis, Marloes H.; Bühlmann, Peter. Asymptotic optimality of the Westfall–Young permutation procedure for multiple testing under dependence. Ann. Statist. 39 (2011), no. 6, 3369–3391. doi:10.1214/11-AOS946.

