Estimating minimum effect with outlier selection

Alexandra Carpentier; Sylvain Delattre; Etienne Roquain; Nicolas Verzelen

doi:10.1214/20-AOS1956

Ann. Statist. 49(1): 272-294 (February 2021). DOI: 10.1214/20-AOS1956

Abstract

We introduce one-sided versions of Huber’s contamination model, in which corrupted samples tend to take larger values than uncorrupted ones. Two intertwined problems are addressed: estimation of the mean of the uncorrupted samples (minimum effect) and selection of the corrupted samples (outliers). Regarding estimation of the minimum effect, we derive the minimax risks and introduce estimators that are adaptive with respect to the unknown number of contaminations. The optimal convergence rates differ from the ones in the classical Huber contamination model. This fact uncovers the effect of the one-sided structural assumption of the contaminations. As for the problem of selecting the outliers, we formulate the problem in a multiple testing framework for which the location and scaling of the null hypotheses are unknown. We rigorously prove that estimating the null hypothesis while maintaining a theoretical guarantee on the amount of the falsely selected outliers is possible, both through false discovery rate (FDR) and through post hoc bounds. As a by-product, we address a long-standing open issue on FDR control under equi-correlation, which reinforces the interest of removing dependency in such a setting.
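To make the setting concrete, here is a minimal simulation sketch of a one-sided contamination model. It is not the authors' estimator. It assumes Gaussian noise with unit variance and a single fixed positive shift on the contaminated samples, and the function name `simulate_one_sided` and all parameter values are hypothetical choices for illustration. It shows only that two classical location estimators, the sample mean and the sample median, are both pulled upward when corrupted samples can only take larger values.

```python
import numpy as np

def simulate_one_sided(n, k, theta, shift, rng):
    """Draw n unit-variance Gaussian samples with mean theta;
    the first k samples receive a positive shift (illustrative assumption)."""
    effects = np.full(n, theta, dtype=float)
    effects[:k] += shift  # contaminated samples only move upward
    return effects + rng.standard_normal(n)

rng = np.random.default_rng(0)
n, k, theta = 10_000, 1_000, 0.0   # hypothetical sizes: 10% contamination
y = simulate_one_sided(n, k, theta, shift=3.0, rng=rng)

mean_est = y.mean()        # biased upward by roughly (k / n) * shift
median_est = np.median(y)  # more robust, but still biased upward

print(f"true minimum effect: {theta:.3f}")
print(f"sample mean:         {mean_est:.3f}")
print(f"sample median:       {median_est:.3f}")
```

In this toy setup the median is closer to the minimum effect than the mean, yet neither uses the fact that contaminations are one-sided, which is the structural assumption the abstract identifies as changing the optimal rates.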

Citation

Alexandra Carpentier, Sylvain Delattre, Etienne Roquain, Nicolas Verzelen. "Estimating minimum effect with outlier selection." Ann. Statist. 49 (1) 272-294, February 2021. https://doi.org/10.1214/20-AOS1956

Information

Received: 1 September 2018; Revised: 1 January 2020; Published: February 2021

First available in Project Euclid: 29 January 2021

Digital Object Identifier: 10.1214/20-AOS1956

Subjects:

Primary: 62G10

Secondary: 62C20

Keywords: contamination, equicorrelation, false discovery rate, Hermite polynomials, minimax rate, moment matching, multiple testing, post hoc, selective inference, sparsity
