Abstract
Variable selection has become an indispensable part of statistical analysis for high-dimensional datasets. However, classical variable selection algorithms, such as regularization methods, are computationally demanding when both the sample size and the dimension of the dataset are large. Lin, Foster and Ungar (Journal of the American Statistical Association 106 (2011) 232–247) proposed VIF regression, a variable selection algorithm for massive datasets that is more computationally efficient and can control the marginal false discovery rate. Building on the idea of VIF regression, we propose a new variable selection algorithm, Double-Gates Streamwise regression (DGS), which quickly tests whether predictors significantly reduce the prediction error in a one-pass search. DGS regression has two main appealing features. First, it is computationally efficient and has low memory requirements. Second, it can control the false discovery rate and hence improve predictive and explanatory ability. Its advantages relative to VIF regression and other popular variable selection algorithms are demonstrated in extensive simulation experiments and a real-data analysis.
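To make the idea of a one-pass streamwise search concrete, the following is a minimal sketch of a generic streamwise selection loop with an alpha-investing-style wealth rule. It is not the DGS procedure and omits both the VIF correction of Lin, Foster and Ungar and the double gates; the function names (`streamwise_select`, `make_demo_data`), the bidding rule `wealth / (2 * j)`, the default `wealth` and `payout` values, and the normal approximation for the test statistic are all assumptions made for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value of a statistic under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def streamwise_select(columns, y, wealth=0.5, payout=0.05):
    """Visit each predictor once; return the indices of those kept.

    Hypothetical illustration: each candidate is regressed on the current
    residuals only, so correlation with already selected predictors is ignored.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]          # residuals of the current model
    selected = []
    for j, x in enumerate(columns, start=1):
        dof = n - len(selected) - 2          # residual degrees of freedom
        if wealth <= 0.0 or dof <= 0:
            break
        alpha = wealth / (2.0 * j)           # assumed bidding rule
        x_mean = sum(x) / n
        xc = [v - x_mean for v in x]         # centered candidate
        sxx = sum(v * v for v in xc)
        if sxx == 0.0:
            continue                         # constant column carries no signal
        beta = sum(a * r for a, r in zip(xc, resid)) / sxx
        new_resid = [r - beta * a for r, a in zip(resid, xc)]
        sigma2 = sum(r * r for r in new_resid) / dof
        if sigma2 > 0.0:
            z = beta / math.sqrt(sigma2 / sxx)
        else:
            z = math.copysign(math.inf, beta) if beta != 0.0 else 0.0
        if two_sided_p(z) < alpha:
            selected.append(j - 1)           # keep the predictor
            resid = new_resid
            wealth += payout - alpha         # earn wealth on a discovery
        else:
            wealth -= alpha / (1.0 - alpha)  # pay for a failed test
    return selected


def make_demo_data(seed=0, n=500, p=20):
    """Synthetic data in which only predictors 0 and 5 affect the response."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    y = [3.0 * columns[0][i] + 2.0 * columns[5][i] + rng.gauss(0.0, 1.0)
         for i in range(n)]
    return columns, y


if __name__ == "__main__":
    cols, resp = make_demo_data()
    print(streamwise_select(cols, resp))
```

Each predictor is examined once and then discarded, so memory use is governed by the residual vector and the current candidate rather than by the full design matrix. This is the property that makes streamwise procedures attractive for massive data.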
Funding Statement
Bingqing Lin was supported by the National Natural Science Foundation of China (Grant No. 11701386), the Natural Science Foundation of Guangdong Province (Grant No. 2020A1515010372) and the Natural Science Foundation of Shenzhen (Grant No. 20200813151828003). Jun Zhang’s research was supported by the Natural Science Foundation of Guangdong Province (Grant No. 2020A1515010372), and the University stability support program A of Shenzhen (Grant No. 20200813151828003).
Acknowledgments
The authors would like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that improved the quality of this paper.
Citation
Bingqing Lin. Zhen Pang. Jun Zhang. Cuiqing Chen. "Fast feature selection via streamwise procedure for massive data." Braz. J. Probab. Stat. 36 (1) 81 - 102, March 2022. https://doi.org/10.1214/21-BJPS516
Information