April 2022
False discovery rate control with unknown null distribution: Is it possible to mimic the oracle?
Etienne Roquain, Nicolas Verzelen
Ann. Statist. 50(2): 1095-1123 (April 2022). DOI: 10.1214/21-AOS2141

Abstract

Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations to understand the limitations caused by ignoring the null distribution, and how it can be properly learned from the same data set, when possible. We explore this issue in the setting where the null distributions are Gaussian with unknown rescaling parameters (mean and variance) whereas the alternative distributions are left arbitrary. In that case, an oracle procedure is the Benjamini–Hochberg procedure applied with the true (unknown) null distribution, and we aim at building a procedure that asymptotically mimics the performance of the oracle (AMO for short). Our main result establishes a phase transition at the sparsity boundary n/log(n): an AMO procedure exists if and only if the number of false nulls is of order less than n/log(n), where n is the total number of tests. Further sparsity boundaries are derived for general location models where the shape of the null distribution is not necessarily Gaussian. In light of our impossibility results, we also pursue the less stringent aim of building a nonparametric confidence region for the null distribution. From a practical perspective, this provides goodness-of-fit tests for the null distribution and allows one to assess the reliability of empirical null procedures via novel diagnostic graphs. Our results are illustrated with numerical experiments and real data sets, as detailed in a companion vignette (Roquain and Verzelen (2021)).
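
To make the oracle benchmark concrete, the following is a minimal Python sketch of the Benjamini–Hochberg step-up procedure applied with a Gaussian null whose mean and standard deviation are assumed known. It is an illustration under stated assumptions, not the authors' code: the function names, the two-sided p-values, the level alpha = 0.1 and the toy data (null mean 0.5, null standard deviation 2, 20 false nulls shifted by 10 standard deviations) are all hypothetical choices made for this example.

```python
import math
import random


def two_sided_pvalue(x, mean, sd):
    """Two-sided p-value of x under a N(mean, sd^2) null distribution."""
    z = abs(x - mean) / sd
    # P(|Z| >= z) for a standard Gaussian Z equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by the BH step-up procedure at level alpha."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    k_hat = 0
    # Step-up rule: keep the largest rank k with p_(k) <= alpha * k / n.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / n:
            k_hat = rank
    return sorted(order[:k_hat])


def bh_with_gaussian_null(xs, null_mean, null_sd, alpha=0.1):
    """BH applied to p-values computed from a given Gaussian null."""
    pvals = [two_sided_pvalue(x, null_mean, null_sd) for x in xs]
    return benjamini_hochberg(pvals, alpha)


# Toy data (hypothetical): true nulls are N(theta, sigma^2); the false
# nulls sit far in the right tail and are stored at the end of the list.
rng = random.Random(0)
n, n_false = 1000, 20
theta, sigma = 0.5, 2.0
xs = [rng.gauss(theta, sigma) for _ in range(n - n_false)]
xs += [theta + 10.0 * sigma] * n_false

# Oracle: BH with the true null parameters (unknown in practice).
oracle_rejections = bh_with_gaussian_null(xs, theta, sigma)

# Misspecified benchmark: BH with the "theoretical" N(0, 1) null.
naive_rejections = bh_with_gaussian_null(xs, 0.0, 1.0)

print(len(oracle_rejections), len(naive_rejections))
```

In this toy setting the oracle uses the true rescaling parameters, whereas the N(0, 1) benchmark ignores them and typically rejects many true nulls. The paper's question is whether a data-driven procedure, which must learn the mean and variance from the same sample, can match the oracle asymptotically.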

Funding Statement

This work has been supported by ANR-16-CE40-0019 (SansSouci), ANR-17-CE40-0001 (BASICS), ANR-21-CE23-0035 (ASCAI) and by the GDR ISIS through the “projets exploratoires” program (project TASTY).

Acknowledgments

We are grateful to Ery Arias-Castro and David Mary, to the anonymous referees and to the editorial team for insightful comments that helped us improve the presentation of the manuscript.

Citation


Etienne Roquain, Nicolas Verzelen. "False discovery rate control with unknown null distribution: Is it possible to mimic the oracle?" Ann. Statist. 50 (2) 1095-1123, April 2022. https://doi.org/10.1214/21-AOS2141

Information

Received: 1 December 2020; Revised: 1 July 2021; Published: April 2022
First available in Project Euclid: 7 April 2022

MathSciNet: MR4404913
zbMATH: 1486.62222
Digital Object Identifier: 10.1214/21-AOS2141

Subjects:
Primary: 62G10
Secondary: 62C20

Keywords: Benjamini–Hochberg procedure, False discovery rate, minimax, multiple testing, null distribution, phase transition, Sparsity

Rights: Copyright © 2022 Institute of Mathematical Statistics
