Robust inference with knockoffs
Rina Foygel Barber, Emmanuel J. Candès, Richard J. Samworth
Ann. Statist. 48(3): 1409-1431 (June 2020). DOI: 10.1214/19-AOS1852


We consider the variable selection problem, which seeks to identify important variables influencing a response $Y$ out of many candidate features $X_{1},\ldots ,X_{p}$. We wish to do so while offering finite-sample guarantees about the fraction of false positives—selected variables $X_{j}$ that in fact have no effect on $Y$ after the other features are known. When the number of features $p$ is large (perhaps even larger than the sample size $n$), and we have no prior knowledge regarding the type of dependence between $Y$ and $X$, the model-X knockoffs framework nonetheless allows us to select a model with a guaranteed bound on the false discovery rate, as long as the distribution of the feature vector $X=(X_{1},\dots ,X_{p})$ is exactly known. This model selection procedure operates by constructing “knockoff copies” of each of the $p$ features, which are then used as a control group to ensure that the model selection algorithm is not choosing too many irrelevant features. In this work, we study the practical setting where the distribution of $X$ can only be estimated, rather than known exactly, and the knockoff copies of the $X_{j}$’s are therefore constructed somewhat incorrectly. Our results, which are free of any modeling assumption whatsoever, show that the resulting model selection procedure incurs an inflation of the false discovery rate that is proportional to our errors in estimating the distribution of each feature $X_{j}$ conditional on the remaining features $\{X_{k}:k\neq j\}$. The model-X knockoffs framework is therefore robust to errors in the underlying assumptions on the distribution of $X$, making it an effective method for many practical applications, such as genome-wide association studies, where the underlying distribution on the features $X_{1},\dots ,X_{p}$ is estimated accurately but not known exactly.
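The "control group" idea described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes that a vector of feature statistics `W` has already been computed, where a large positive `W[j]` means feature $X_{j}$ looks more important than its knockoff copy, and it applies the standard knockoff+ selection threshold from the knockoff filter literature at a target false discovery rate `q`. The function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical, and the construction of the knockoff copies themselves (the step whose accuracy the paper studies) is not shown.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    Negative statistics act as the control group: the number of W_j <= -t
    (plus one) estimates how many of the W_j >= t are false positives.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        est_false = 1 + np.sum(W <= -t)      # estimated false positives
        n_selected = max(1, np.sum(W >= t))  # features selected at level t
        if est_false / n_selected <= q:
            return t
    # No threshold achieves the target level, so nothing is selected.
    return np.inf


def knockoff_select(W, q):
    """Return indices of features whose statistic meets the threshold."""
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(np.asarray(W, dtype=float) >= t)


if __name__ == "__main__":
    # Made-up statistics purely for illustration.
    W_example = [5, 4, 3, 2, -1, 0.5, 6, 7, 8, 9]
    print(knockoff_select(W_example, q=0.2))
```

When the knockoff copies are built from an estimated rather than exact distribution of $X$, the same selection rule is applied unchanged; the abstract's result concerns how much the achieved false discovery rate can exceed `q` in that case.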


Rina Foygel Barber, Emmanuel J. Candès, Richard J. Samworth. "Robust inference with knockoffs." Ann. Statist. 48 (3) 1409-1431, June 2020.


Received: 1 January 2018; Revised: 1 February 2019; Published: June 2020
First available in Project Euclid: 17 July 2020

zbMATH: 07241596
MathSciNet: MR4124328
Digital Object Identifier: 10.1214/19-AOS1852

Primary: 62F03, 62F35, 62G10, 62G35

Keywords: false discovery rate (FDR), high-dimensional regression, knockoffs, robustness, variable selection

Rights: Copyright © 2020 Institute of Mathematical Statistics


