Open Access
Which bridge estimator is the best for variable selection?
Shuaiwen Wang, Haolei Weng, Arian Maleki
Ann. Statist. 48(5): 2791-2823 (October 2020). DOI: 10.1214/19-AOS1906

Abstract

We study the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations $n$ grows at the same rate as the number of predictors $p$. We consider two-stage variable selection techniques (TVS) in which the first stage uses bridge estimators to obtain an estimate of the regression coefficients, and the second stage simply thresholds this estimate to select the “important” predictors. The asymptotic false discovery proportion (AFDP) and true positive proportion (ATPP) of these TVS are evaluated. We prove that, for a fixed ATPP, in order to obtain a smaller AFDP one should pick a bridge estimator with smaller asymptotic mean square error in the first stage of TVS. Based on this principle, we present a sharp comparison of different TVS via an in-depth investigation of the estimation properties of bridge estimators. Rather than “orderwise” error bounds with loose constants, our analysis focuses on precise error characterization. Various interesting signal-to-noise ratio and sparsity settings are studied. Our results offer new and thorough insights into high-dimensional variable selection. For instance, we prove that a TVS with Ridge in its first stage outperforms TVS with other bridge estimators in large noise settings, and that two-stage LASSO becomes inferior when the signal is rare and weak. As a by-product, we show that two-stage methods outperform some standard variable selection techniques, such as LASSO and Sure Independence Screening, under certain conditions.
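To make the two-stage idea concrete, here is a minimal simulation sketch of a TVS: stage one fits a bridge-type estimator, stage two thresholds the estimated coefficients, and the empirical FDP and TPP are reported. The scikit-learn Lasso and Ridge are used only as stand-ins for bridge estimators with $q=1$ and $q=2$; the simulation sizes, penalty levels, and threshold are illustrative assumptions, not the paper's specification.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Simulated linear model with n and p of the same order (illustrative sizes).
n, p, k = 500, 400, 20                 # observations, predictors, nonzero signals
beta = np.zeros(p)
beta[:k] = 1.0                         # the "important" predictors
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)  # unit-variance noise

def two_stage_select(estimator, threshold):
    """Stage 1: estimate coefficients with a bridge-type estimator;
    stage 2: keep predictors whose |coefficient| exceeds the threshold."""
    coef = estimator.fit(X, y).coef_
    return np.abs(coef) > threshold

def fdp_tpp(selected):
    """Empirical false discovery proportion and true positive proportion."""
    true_support = beta != 0
    n_selected = selected.sum()
    fdp = np.sum(selected & ~true_support) / max(n_selected, 1)
    tpp = np.sum(selected & true_support) / true_support.sum()
    return fdp, tpp

# Two instances of TVS: LASSO-based and Ridge-based first stages.
for name, est in [("two-stage LASSO", Lasso(alpha=0.1)),
                  ("two-stage Ridge", Ridge(alpha=1.0))]:
    sel = two_stage_select(est, threshold=0.5)
    fdp, tpp = fdp_tpp(sel)
    print(f"{name}: FDP = {fdp:.3f}, TPP = {tpp:.3f}")
```

In the paper the comparison is carried out analytically in the high-dimensional asymptotic regime rather than by simulation; the sketch above only mirrors the procedural structure of a TVS.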

Citation


Shuaiwen Wang, Haolei Weng, Arian Maleki. "Which bridge estimator is the best for variable selection?" Ann. Statist. 48 (5): 2791-2823, October 2020. https://doi.org/10.1214/19-AOS1906

Information

Received: 1 March 2019; Published: October 2020
First available in Project Euclid: 19 September 2020

MathSciNet: MR4152121
Digital Object Identifier: 10.1214/19-AOS1906

Subjects:
Primary: 62J05, 62J07

Keywords: bridge regression, debiasing, false discovery proportion, high dimension, large noise, large sample, rare signal, true positive proportion, two-stage methods, variable selection

Rights: Copyright © 2020 Institute of Mathematical Statistics
