Open Access
October 2019
A knockoff filter for high-dimensional selective inference
Rina Foygel Barber, Emmanuel J. Candès
Ann. Statist. 47(5): 2504-2537 (October 2019). DOI: 10.1214/18-AOS1755

Abstract

This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy—a fake variable serving as a control—for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure “discovers” important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is nonasymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype.
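
The split, screen, and knockoff steps described in the abstract can be illustrated with a minimal sketch. Much of it is an assumption rather than the paper's procedure: the marginal-correlation screening rule, the equicorrelated fixed-design knockoff construction, the statistic W_j = |X_j'y| - |X̃_j'y|, the sign estimate, the no-intercept model, and all function names are chosen here for illustration only. The sketch also omits the paper's strategies for reusing the first part of the data at the inference step.

```python
import numpy as np


def screen_top_k(X0, y0, k):
    """Hypothetical screening rule: keep the k features most correlated with y."""
    Xc = X0 - X0.mean(axis=0)
    scores = np.abs(Xc.T @ (y0 - y0.mean())) / np.linalg.norm(Xc, axis=0)
    return np.sort(np.argsort(scores)[-k:])


def fixed_x_knockoffs(X, rng):
    """Equicorrelated fixed-design knockoffs; assumes n >= 2p and full column rank."""
    n, p = X.shape
    if n < 2 * p:
        raise ValueError("need at least 2p observations for p screened features")
    X = X / np.linalg.norm(X, axis=0)            # unit-norm columns
    Sigma = X.T @ X
    Sigma_inv = np.linalg.inv(Sigma)
    s = min(2.0 * np.linalg.eigvalsh(Sigma)[0], 1.0)
    D = s * np.eye(p)
    A = 2.0 * D - D @ Sigma_inv @ D              # positive semidefinite by choice of s
    w, V = np.linalg.eigh((A + A.T) / 2.0)
    C = np.sqrt(np.clip(w, 0.0, None))[:, None] * V.T   # C.T @ C == A
    # U: p orthonormal columns orthogonal to the column space of X
    Q, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, p))]))
    U = Q[:, p:]
    X_ko = X @ (np.eye(p) - Sigma_inv @ D) + U @ C
    return X, X_ko


def knockoff_plus_select(W, q):
    """Select {j : W_j >= T}, with T the smallest t whose estimated FDP is <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)


def split_screen_knockoff(X, y, k, q, rng):
    """Screen on the first half of the rows, run the knockoff filter on the second."""
    half = X.shape[0] // 2
    keep = screen_top_k(X[:half], y[:half], k)
    X1, X1_ko = fixed_x_knockoffs(X[half:, keep], rng)
    y1 = y[half:]
    z, z_ko = X1.T @ y1, X1_ko.T @ y1
    W = np.abs(z) - np.abs(z_ko)                 # large positive W_j favors the real feature
    picked = knockoff_plus_select(W, q)
    signs = np.sign(z - z_ko)[picked]            # assumed directional estimate
    return keep[picked], signs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 400, 50
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 3.0                               # synthetic signal, illustration only
    y = X @ beta + rng.standard_normal(n)
    selected, signs = split_screen_knockoff(X, y, k=20, q=0.2, rng=rng)
    print(selected, signs)
```

Because the knockoffs are built only for the screened features, the second half needs at least twice as many observations as there are screened variables, which is why `k` must be small relative to the sample size even when the original number of features is very large.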

Citation

Rina Foygel Barber, Emmanuel J. Candès. "A knockoff filter for high-dimensional selective inference." Ann. Statist. 47 (5) 2504-2537, October 2019. https://doi.org/10.1214/18-AOS1755

Information

Received: 1 February 2016; Revised: 1 July 2018; Published: October 2019
First available in Project Euclid: 3 August 2019

zbMATH: 07114920
MathSciNet: MR3988764
Digital Object Identifier: 10.1214/18-AOS1755

Subjects:
Primary: 62F03, 62J05

Keywords: false discovery rate (FDR), high-dimensional regression, knockoffs, variable selection

Rights: Copyright © 2019 Institute of Mathematical Statistics

Vol. 47 • No. 5 • October 2019