Open Access
Controlling the false discovery exceedance for heterogeneous tests
Sebastian Döhler, Etienne Roquain
Electron. J. Statist. 14(2): 4244-4272 (2020). DOI: 10.1214/20-EJS1771

Abstract

Several classical methods exist for controlling the false discovery exceedance (FDX) in large-scale multiple testing problems, among them the Lehmann-Romano procedure (Lehmann and Romano 2005) ($[\mathrm{LR}]$ below) and the Guo-Romano procedure (Guo and Romano 2007) ($[\mathrm{GR}]$ below). While these two procedures are the most prominent, they were originally designed for homogeneous test statistics, that is, when the null distribution functions of the $p$-values $F_{i}$, $1\leq i\leq m$, are all equal. In many applications, however, the data are heterogeneous, which leads to heterogeneous null distribution functions. Ignoring this heterogeneity results in a loss of power. In this paper, we develop three new procedures that incorporate the $F_{i}$’s while maintaining rigorous FDX control. The heterogeneous version of $[\mathrm{LR}]$, denoted $[\mathrm{HLR}]$, is based on the arithmetic average of the $F_{i}$’s, while the heterogeneous version of $[\mathrm{GR}]$, denoted $[\mathrm{HGR}]$, is based on the geometric average of the $F_{i}$’s. We also introduce a procedure $[\mathrm{PB}]$ that is based on the Poisson-binomial distribution and that uniformly improves on $[\mathrm{HLR}]$ and $[\mathrm{HGR}]$, at the price of a higher computational complexity. Perhaps surprisingly, this shows that, contrary to the known theory of false discovery rate (FDR) control under heterogeneity, the way to incorporate the $F_{i}$’s can be particularly simple in the case of FDX control, and does not require any further correction term. The performance of the newly proposed procedures is illustrated on real and simulated data in two important heterogeneous settings: first, when the test statistics are continuous but the $p$-values are weighted by some known independent weight vector, e.g., coming from co-data sets; second, when the test statistics are discretely distributed, as is the case for data representing frequencies or counts.
Our new procedures are implemented in the R package $\mathtt{FDX}$, see Junge and Döhler (2020).
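
For orientation, the classical homogeneous $[\mathrm{LR}]$ procedure that the abstract takes as its starting point can be sketched in a few lines of Python. This is only an illustrative sketch of the step-down rule of Lehmann and Romano (2005) with critical values $\tau_{k} = \zeta(\lfloor \alpha k\rfloor + 1)/(m - k + \lfloor \alpha k\rfloor + 1)$, which aims at $\mathbb{P}(\mathrm{FDP} > \alpha)\leq \zeta$ under the dependence assumptions of that paper. It is not the heterogeneous $[\mathrm{HLR}]$, $[\mathrm{HGR}]$ or $[\mathrm{PB}]$ procedure, and it is not the code of the $\mathtt{FDX}$ package. The function names and the parameter names `alpha` (tolerated false discovery proportion) and `zeta` (exceedance probability level) are assumptions chosen for this example.

```python
import math


def lr_critical_values(m, alpha, zeta):
    """Step-down critical values tau_1 <= ... <= tau_m of the classical LR rule.

    alpha: tolerated false discovery proportion (hypothetical parameter name).
    zeta:  bound on P(FDP > alpha) (hypothetical parameter name).
    Note: floor(alpha * k) uses floating-point arithmetic, so values of alpha
    whose multiples land just below an integer may need exact arithmetic.
    """
    values = []
    for k in range(1, m + 1):
        allowed = math.floor(alpha * k) + 1  # false rejections tolerated at step k, plus one
        values.append(zeta * allowed / (m - k + allowed))
    return values


def lr_step_down(pvalues, alpha, zeta):
    """Return the sorted indices of the hypotheses rejected by the step-down rule."""
    m = len(pvalues)
    crit = lr_critical_values(m, alpha, zeta)
    # Indices of the p-values in increasing order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_rejected = 0
    for k, idx in enumerate(order):
        # Step-down: stop at the first ordered p-value above its critical value.
        if pvalues[idx] <= crit[k]:
            n_rejected += 1
        else:
            break
    return sorted(order[:n_rejected])


if __name__ == "__main__":
    # Toy input, made up for illustration only.
    print(lr_step_down([0.03, 0.001, 0.5, 0.02], alpha=0.5, zeta=0.05))
```

The sketch uses a single sequence of critical values for all hypotheses, which is exactly the homogeneous setting; according to the abstract, the new procedures instead incorporate the individual null distribution functions $F_{i}$.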

Citation

Sebastian Döhler, Etienne Roquain. "Controlling the false discovery exceedance for heterogeneous tests." Electron. J. Statist. 14(2): 4244-4272, 2020. https://doi.org/10.1214/20-EJS1771

Information

Received: 1 April 2020; Published: 2020
First available in Project Euclid: 11 December 2020

zbMATH: 07285585
MathSciNet: MR4185850
Digital Object Identifier: 10.1214/20-EJS1771

Subjects:
Primary: 62H15
Secondary: 62Q05

Keywords: discrete hypothesis testing, false discovery exceedance, heterogeneous data, step-down algorithm, type I error rate control, weighted $p$-values

Vol.14 • No. 2 • 2020