Open Access
Relaxing the assumptions of knockoffs by conditioning
Dongming Huang, Lucas Janson
Ann. Statist. 48(5): 3021-3042 (October 2020). DOI: 10.1214/19-AOS1920

Abstract

The recent paper Candès et al. (J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 (2018) 551–577) introduced model-X knockoffs, a method for variable selection that provably and nonasymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as $\Omega (n^{*}p)$ parameters, where $p$ is the dimension and $n^{*}$ is the number of covariate samples (which may exceed the usual sample size $n$ of labeled samples when unlabeled samples are also available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. We demonstrate how to do this for three models of interest, with simulations showing the new approach remains powerful under the weaker assumptions.
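
The conditioning idea can be illustrated in a much simpler setting than the three models treated in the paper. The sketch below is not the authors' algorithm. It assumes a single covariate whose $n$ observations are i.i.d. $N(\mu, \sigma^2)$ with both parameters unknown, so the pair (sum, sum of squares) is a sufficient statistic. Conditional on that statistic, the sample is uniformly distributed on a sphere lying inside a hyperplane of $\mathbb{R}^n$, a set of zero Lebesgue measure, and a fresh draw from it requires no knowledge of $\mu$ or $\sigma^2$. The function name `resample_given_sufficient_stat` and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def resample_given_sufficient_stat(x, rng):
    """Draw a new sample with the same sum and sum of squares as x.

    Assumption (not from the paper): x is an i.i.d. N(mu, sigma^2) sample
    with mu and sigma^2 unknown, so (sum(x), sum(x_i^2)) is sufficient and
    the conditional law of x is uniform on a sphere inside the hyperplane
    {v : mean(v) = mean(x)}.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = sum(x) / n
    # Radius of the sphere: Euclidean norm of the centered sample.
    radius = math.sqrt(sum((xi - mean) ** 2 for xi in x))

    # A centered standard Gaussian vector has a uniformly distributed
    # direction within the hyperplane orthogonal to the all-ones vector.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_mean = sum(z) / n
    direction = [zi - z_mean for zi in z]
    norm = math.sqrt(sum(d * d for d in direction))

    # Rescale to the observed radius and shift back to the observed mean.
    return [mean + radius * d / norm for d in direction]


# Hypothetical usage: mu = 2 and sigma = 3 generate the data but are
# never passed to the resampling step.
rng = random.Random(0)
x = [rng.gauss(2.0, 3.0) for _ in range(50)]
x_new = resample_given_sufficient_stat(x, rng)
```

The new draw differs from the original sample yet reproduces its sum and sum of squares up to floating-point error, which is the sense in which the covariates are treated as drawn conditionally on the observed sufficient statistic. The multivariate models in the paper need considerably more care than this one-dimensional sketch.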

Citation

Dongming Huang, Lucas Janson. "Relaxing the assumptions of knockoffs by conditioning." Ann. Statist. 48 (5) 3021-3042, October 2020. https://doi.org/10.1214/19-AOS1920

Information

Received: 1 March 2019; Revised: 1 October 2019; Published: October 2020
First available in Project Euclid: 19 September 2020

MathSciNet: MR4152633
Digital Object Identifier: 10.1214/19-AOS1920

Subjects:
Primary: 62G10
Secondary: 62B05, 62J02

Keywords: false discovery rate (FDR), graphical model, high-dimensional inference, knockoffs, model-X, sufficient statistic, topological measure

Rights: Copyright © 2020 Institute of Mathematical Statistics

Vol.48 • No. 5 • October 2020