Testing for outliers with conformal p-values

Stephen Bates; Emmanuel Candès; Lihua Lei; Yaniv Romano; Matteo Sesia

doi:10.1214/22-AOS2244

Abstract

This paper studies the construction of p-values for nonparametric outlier detection, from a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a general framework yielding p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Further, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by experiments on real and simulated data.
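
As a rough illustration of the marginal approach summarized above, the sketch below computes split-conformal p-values from a held-out calibration set of inlier scores and then applies the Benjamini-Hochberg procedure to the test p-values. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the scores are assumed to come from some pre-trained outlier detector with larger values meaning "more outlying," and Benjamini-Hochberg is used here as a standard false discovery rate procedure that the abstract itself does not name. The conditionally valid p-values introduced in the paper are not reproduced here.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Marginal conformal p-values (hypothetical helper).

    Assumes larger scores indicate more outlying points and that
    cal_scores were computed on inliers not used to fit the detector.
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # Number of calibration scores greater than or equal to each test score.
    n_ge = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with alpha * k / m.
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest index meeting the bound
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cal_scores = rng.normal(size=1000)             # inlier calibration scores
    test_scores = np.concatenate([
        rng.normal(size=90),                       # test inliers
        rng.normal(loc=4.0, size=10),              # test outliers (shifted)
    ])
    pvals = conformal_pvalues(cal_scores, test_scores)
    flagged = benjamini_hochberg(pvals, alpha=0.1)
    print("number flagged as outliers:", int(flagged.sum()))
```

The smallest attainable p-value in this construction is 1/(n+1), so the calibration set needs to be large relative to the number of test points for the step-up rule to be able to reject anything.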

Funding Statement

S. Bates was supported by a Ric Weiland fellowship. E. Candès was supported by Office of Naval Research grant N00014-20-12157, by the National Science Foundation grants OAC 1934578 and DMS 2032014, by the Army Research Office (ARO) under Grant W911NF-17-1-0304, and by the Simons Foundation under award 814641.
L. Lei gratefully acknowledges the support of the National Science Foundation Grant OAC 1934578, the Discovery Innovation Fund for Biomedical Data Sciences and the NIH Grant R01MH113078.
Y. Romano was supported by the Israel Science Foundation (grant no. 729/21) and by the Career Advancement Fellowship of the Technion.

Acknowledgments

We are grateful to the anonymous referees and Associate Editor for their helpful comments and suggestions.

Authors listed alphabetically.

Citation

Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, Matteo Sesia. "Testing for outliers with conformal p-values." Ann. Statist. 51 (1) 149-178, February 2023. https://doi.org/10.1214/22-AOS2244

Information

Received: 1 April 2021; Revised: 1 May 2022; Published: February 2023

First available in Project Euclid: 23 March 2023

MathSciNet: MR4564852

zbMATH: 07684008

Digital Object Identifier: 10.1214/22-AOS2244

Subjects:

Primary: 62G10, 62J15

Secondary: 62G15

Keywords: conformal inference, false discovery rate, out-of-distribution, positive dependence
