## The Annals of Statistics

### Control of generalized error rates in multiple testing

#### Abstract

Consider the problem of testing $s$ hypotheses simultaneously. The usual approach restricts attention to procedures that control the probability of even one false rejection, the familywise error rate (FWER). If $s$ is large, one might be willing to tolerate more than one false rejection, thereby increasing the ability of the procedure to correctly reject false null hypotheses. One possibility is to replace control of the FWER by control of the probability of $k$ or more false rejections, which is called the $k$-FWER. We derive both single-step and step-down procedures that control the $k$-FWER in finite samples or asymptotically, depending on the situation. We also consider the false discovery proportion (FDP) defined as the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). The false discovery rate proposed by Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300] controls $E(\mathrm{FDP})$. Here, the goal is to construct methods which satisfy, for a given $\gamma$ and $\alpha$, $P\{\mathrm{FDP} > \gamma\} \le \alpha$, at least asymptotically. In contrast to the proposals of Lehmann and Romano [Ann. Statist. 33 (2005) 1138–1154], we construct methods that implicitly take into account the dependence structure of the individual test statistics in order to further increase the ability to detect false null hypotheses. This feature is also shared by related work of van der Laan, Dudoit and Pollard [Stat. Appl. Genet. Mol. Biol. 3 (2004) article 15], but our methodology is quite different. Like the work of Pollard and van der Laan [Proc. 2003 International Multi-Conference in Computer Science and Engineering, METMBS’03 Conference (2003) 3–9] and Dudoit, van der Laan and Pollard [Stat. Appl. Genet. Mol. Biol. 3 (2004) article 13], we employ resampling methods to achieve our goals. Some simulations compare finite sample performance to currently available methods.
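The error measures defined in the abstract can be made concrete with a small sketch. The Python below is an illustration only, not the resampling procedures of the paper: it computes the FDP for a single set of rejections and then estimates the $k$-FWER, $P\{\mathrm{FDP} > \gamma\}$ and $E(\mathrm{FDP})$ as empirical frequencies over repeated experiments. All function and variable names are hypothetical, and the sketch assumes the true null status of each hypothesis is known, as it would be in a simulation study.

```python
def count_false_rejections(rejected, is_true_null):
    """Number of rejected hypotheses whose null is actually true."""
    return sum(1 for r, n in zip(rejected, is_true_null) if r and n)


def false_discovery_proportion(rejected, is_true_null):
    """FDP = false rejections / total rejections, defined as 0 with no rejections."""
    total = sum(1 for r in rejected if r)
    if total == 0:
        return 0.0
    return count_false_rejections(rejected, is_true_null) / total


def estimate_error_rates(outcomes, is_true_null, k, gamma):
    """Empirical k-FWER, P{FDP > gamma} and E(FDP) over repeated experiments.

    `outcomes` is a list of rejection vectors, one per simulated experiment
    (an assumption of this sketch; the abstract does not fix a data layout).
    """
    n = len(outcomes)
    fdps = [false_discovery_proportion(r, is_true_null) for r in outcomes]
    # k-FWER: probability of k or more false rejections.
    k_fwer = sum(count_false_rejections(r, is_true_null) >= k for r in outcomes) / n
    # Tail probability of the FDP, the quantity to be bounded by alpha.
    fdp_tail = sum(f > gamma for f in fdps) / n
    # Mean FDP, i.e. the false discovery rate.
    fdr = sum(fdps) / n
    return k_fwer, fdp_tail, fdr


# Four hypotheses; the first two nulls are true, the last two are false.
is_true_null = [True, True, False, False]
outcomes = [
    [True, True, True, True],     # 2 false rejections out of 4
    [False, False, True, True],   # no false rejections
]
k_fwer, fdp_tail, fdr = estimate_error_rates(outcomes, is_true_null, k=2, gamma=0.25)
```

Taking $k = 1$ in `estimate_error_rates` gives the ordinary FWER, which shows how the $k$-FWER relaxes the usual criterion.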

#### Article information

Source
Ann. Statist., Volume 35, Number 4 (2007), 1378–1408.

Dates
First available in Project Euclid: 29 August 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1188405615

Digital Object Identifier
doi:10.1214/009053606000001622

Mathematical Reviews number (MathSciNet)
MR2351090

Zentralblatt MATH identifier
1127.62063

Subjects
Primary: 62J15: Paired and multiple comparisons
Secondary: 62G10: Hypothesis testing

#### Citation

Romano, Joseph P.; Wolf, Michael. Control of generalized error rates in multiple testing. Ann. Statist. 35 (2007), no. 4, 1378–1408. doi:10.1214/009053606000001622. https://projecteuclid.org/euclid.aos/1188405615

#### References

• Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
• Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
• Beran, R. (1984). Bootstrap methods in statistics. Jahresber. Deutsch. Math.-Verein. 86 14–30.
• Beran, R. (1986). Simulated power functions. Ann. Statist. 14 151–173.
• Beran, R. (1988). Balanced simultaneous confidence sets. J. Amer. Statist. Assoc. 83 679–686.
• Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. J. Amer. Statist. Assoc. 83 687–697.
• Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge Univ. Press.
• Dudoit, S., van der Laan, M. J. and Birkner, M. D. (2004). Multiple testing procedures for controlling tail probability error rates. Working Paper 166, Div. Biostatistics, Univ. California, Berkeley. Available at www.bepress.com/ucbbiostat/paper166.
• Dudoit, S., van der Laan, M. J. and Pollard, K. S. (2004). Multiple testing. I. Single-step procedures for the control of general type I error rates. Stat. Appl. Genet. Mol. Biol. 3 article 13. Available at www.bepress.com/sagmb/vol3/iss1/art13.
• Genovese, C. R. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035–1061.
• Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
• Hall, P. and Wilson, S. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics 47 757–762.
• Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6 65–70.
• Hommel, G. and Hoffman, T. (1988). Controlled uncertainty. In Multiple Hypothesis Testing (P. Bauer, G. Hommel and E. Sonnemann, eds.) 154–161. Springer, Heidelberg.
• Korn, E. L., Troendle, J. F., McShane, L. M. and Simon, R. (2004). Controlling the number of false discoveries: Application to high-dimensional genomic data. J. Statist. Plann. Inference 124 379–398.
• Lahiri, S. N. (2003). Resampling Methods for Dependent Data. Springer, New York.
• Lehmann, E. L. and Romano, J. P. (2005). Generalizations of the familywise error rate. Ann. Statist. 33 1138–1154.
• Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York.
• Lehmann, E. L., Romano, J. P. and Shaffer, J. P. (2005). On optimality of stepdown and stepup multiple test procedures. Ann. Statist. 33 1084–1108.
• Perone Pacifico, M., Genovese, C. R., Verdinelli, I. and Wasserman, L. (2004). False discovery control for random fields. J. Amer. Statist. Assoc. 99 1002–1014.
• Politis, D. N., Romano, J. P. and Wolf, M. (1999). Subsampling. Springer, New York.
• Pollard, K. S. and van der Laan, M. J. (2003). Multiple testing for gene expression data: An investigation of null distributions with consequences for the permutation test. In Proc. 2003 International Multi-Conference in Computer Science and Engineering, METMBS'03 Conference 3–9.
• Rogers, J. and Hsu, J. (2001). Multiple comparisons of biodiversity. Biom. J. 43 617–625.
• Romano, J. P. (1988). A bootstrap revival of some nonparametric distance tests. J. Amer. Statist. Assoc. 83 698–708.
• Romano, J. P. and Shaikh, A. M. (2006). On stepdown control of the false discovery proportion. In Optimality: The Second Erich L. Lehmann Symposium (J. Rojo, ed.) 33–50. IMS, Beachwood, OH.
• Romano, J. P. and Shaikh, A. M. (2006). Stepup procedures for control of generalizations of the familywise error rate. Ann. Statist. 34 1850–1873.
• Romano, J. P. and Wolf, M. (2005). Control of generalized error rates in multiple testing. Working Paper 245, IEW, Univ. Zurich. Available at www.iew.unizh.ch/wp/index.php.
• Romano, J. P. and Wolf, M. (2005). Exact and approximate step-down methods for multiple hypothesis testing. J. Amer. Statist. Assoc. 100 94–108.
• Romano, J. P. and Wolf, M. (2005). Stepwise multiple testing as formalized data snooping. Econometrica 73 1237–1282.
• Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist. 30 239–257.
• Shao, J. and Tu, D. (1995). The Jackknife and Bootstrap. Springer, New York.
• Tu, W. and Zhou, X. (2000). Pairwise comparisons of the means of skewed data. J. Statist. Plann. Inference 88 59–74.
• van der Laan, M. J., Birkner, M. D. and Hubbard, A. E. (2005). Empirical Bayes and resampling based multiple testing procedure controlling tail probability of the proportion of false positives. Stat. Appl. Genet. Mol. Biol. 4 article 29. Available at www.bepress.com/sagmb/vol4/iss1/art29.
• van der Laan, M. J., Dudoit, S. and Pollard, K. S. (2004). Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives. Stat. Appl. Genet. Mol. Biol. 3 article 15. Available at www.bepress.com/sagmb/vol3/iss1/art15.
• Westfall, P. H. and Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Wiley, New York.