February 2024 Adaptive novelty detection with false discovery rate guarantee
Ariane Marandon, Lihua Lei, David Mary, Etienne Roquain
Ann. Statist. 52(1): 157-183 (February 2024). DOI: 10.1214/23-AOS2338

Abstract

This paper studies the semisupervised novelty detection problem where a set of “typical” measurements is available to the researcher. Motivated by recent advances in multiple testing and conformal inference, we propose AdaDetect, a flexible method that is able to wrap around any probabilistic classification algorithm and control the false discovery rate (FDR) on detected novelties in finite samples without any distributional assumption other than exchangeability. In contrast to classical FDR-controlling procedures that are often committed to a pre-specified p-value function, AdaDetect learns the transformation in a data-adaptive manner to focus the power on the directions that distinguish between inliers and outliers. Inspired by the multiple testing literature, we further propose variants of AdaDetect that are adaptive to the proportion of nulls while maintaining the finite-sample FDR control. The methods are illustrated on synthetic datasets and real-world datasets, including an application in astrophysics.
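As a rough illustration of the two ingredients the abstract names, conformal p-values and FDR control, the following minimal sketch computes empirical p-values from a calibration sample of "typical" scores and then applies the standard Benjamini-Hochberg step-up procedure. It is not the authors' implementation: the score values are made up, the function names are hypothetical, and the scores are taken as given, whereas AdaDetect learns the score function from the data with a classifier. Larger scores are assumed to indicate more novel observations.

```python
from typing import List, Sequence


def conformal_p_values(calib_scores: Sequence[float],
                       test_scores: Sequence[float]) -> List[float]:
    """Empirical (conformal) p-value for each test score.

    Assumes larger scores mean "more novel". Each p-value is
    (1 + #{calibration scores >= test score}) / (n_calib + 1).
    """
    n = len(calib_scores)
    return [
        (1 + sum(1 for c in calib_scores if c >= s)) / (n + 1)
        for s in test_scores
    ]


def benjamini_hochberg(p_values: Sequence[float], alpha: float) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Sort indices by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    # Reject the k smallest p-values.
    return sorted(order[:k])


# Illustrative (made-up) scores: calibration points are known inliers.
calib = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
test = [0.05, 0.95, 0.99, 0.35]

pvals = conformal_p_values(calib, test)
detected = benjamini_hochberg(pvals, alpha=0.25)
print("p-values:", pvals)
print("detected novelties (test indices):", detected)
```

In this toy run the two test points whose scores exceed every calibration score receive the smallest possible p-value, 1/(n+1), and are the only ones flagged. The variants described in the abstract that adapt to the proportion of nulls would replace the plain Benjamini-Hochberg step; they are not shown here.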

Funding Statement

The authors acknowledge the grants ANR-16-CE40-0019 (project SansSouci), ANR-17-CE40-0001 (BASICS), ANR-21-CE23-0035 (ASCAI) and ANR-23-CE40-0018-01 (BACKUP) of the French National Research Agency ANR, the program Emergence of Sorbonne University (project MARS) and the GDR ISIS through the “projets exploratoires” program (project TASTY).

Acknowledgments

We would like to thank Gilles Blanchard, Will Fithian, Aaditya Ramdas, Fanny Villers and Asaf Weinstein for constructive discussions and feedback.

Citation

Ariane Marandon, Lihua Lei, David Mary, Etienne Roquain. "Adaptive novelty detection with false discovery rate guarantee." Ann. Statist. 52(1): 157-183, February 2024. https://doi.org/10.1214/23-AOS2338

Information

Received: 1 October 2022; Revised: 1 October 2023; Published: February 2024
First available in Project Euclid: 7 March 2024

Digital Object Identifier: 10.1214/23-AOS2338

Subjects:
Primary: 62G10, 62J15
Secondary: 62H30

Keywords: adaptive multiple testing, classification, conformal p-values, false discovery rate, knockoff, machine learning, neural network, novelty detection

Rights: Copyright © 2024 Institute of Mathematical Statistics

JOURNAL ARTICLE
27 PAGES

