Abstract
Model-X knockoffs (J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 (2018) 551–577) allows analysts to perform feature selection using almost any machine learning algorithm while provably controlling the expected proportion of false discoveries. This procedure involves constructing synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs, but, surprisingly, we prove this procedure can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates joint dependencies between the features and knockoffs, which allow machine learning algorithms to reconstruct the effect of the features on the response using the knockoffs. To improve power, we propose generating knockoffs which minimize the reconstructability (MRC) of the features, and we demonstrate our proposal for Gaussian features by showing it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a notion of estimation error in Gaussian linear models. Through extensive simulations, we show MRC knockoffs often dramatically outperform MAC-minimizing knockoffs, and we find no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a slight margin. We implement our methods and many others from the knockoffs literature in a new Python package.
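The joint dependency described above can be checked numerically in a small case. The following is a minimal sketch, not the authors' implementation. It assumes the standard Gaussian knockoff construction, in which the pair (X, X̃) has joint covariance [[Σ, Σ − S], [Σ − S, Σ]] for a diagonal matrix S, and for simplicity it restricts attention to S = sI. The helper `joint_covariance`, the values p = 5 and ρ = 0.6, and the comparison value `s_small` are illustrative assumptions; `s_small` is not the MRC solution, only a smaller feasible choice.

```python
import numpy as np

def joint_covariance(Sigma, S):
    """Covariance of (X, X_knockoff) under the Gaussian knockoff construction."""
    return np.block([[Sigma, Sigma - S], [Sigma - S, Sigma]])

# Exchangeable (equicorrelated) features; p and rho are illustrative values.
p, rho = 5, 0.6
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

# Within the family S = s * I, the feature-knockoff correlation is 1 - s,
# so the MAC is minimized by the largest feasible s: min(1, 2 * lambda_min).
s_mac = min(1.0, 2 * np.linalg.eigvalsh(Sigma).min())
G_mac = joint_covariance(Sigma, s_mac * np.eye(p))

# A smaller, purely illustrative s (an assumption, not the MRC solution).
s_small = 0.5 * s_mac
G_small = joint_covariance(Sigma, s_small * np.eye(p))

min_eig_mac = np.linalg.eigvalsh(G_mac).min()
min_eig_small = np.linalg.eigvalsh(G_small).min()
rank_mac = np.linalg.matrix_rank(G_mac, tol=1e-8)

print(f"MAC-minimizing s = {s_mac:.2f}: min eigenvalue {min_eig_mac:.2e}, rank {rank_mac} of {2 * p}")
print(f"smaller s = {s_small:.2f}: min eigenvalue {min_eig_small:.2e}")
```

In this sketch the MAC-minimizing choice yields a singular joint covariance, so some linear combinations of the features and knockoffs are deterministic and each feature can be recovered exactly from the remaining features and the knockoffs. The smaller value of s keeps the joint covariance strictly positive definite, at the cost of a higher feature-knockoff correlation.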
Funding Statement
L. J. was partially supported by the William F. Milton Fund.
Acknowledgments
The authors would like to thank Chenguang Dai, Buyu Lin, Jun Liu, Wenshuo Wang, and Xin Xing for valuable discussions and suggestions. The authors are also grateful to the anonymous referees for helpful comments.
Citation
Asher Spector, Lucas Janson. "Powerful knockoffs via minimizing reconstructability." Ann. Statist. 50 (1) 252–276, February 2022. https://doi.org/10.1214/21-AOS2104
Information