March 2022
Fast feature selection via streamwise procedure for massive data
Bingqing Lin, Zhen Pang, Jun Zhang, Cuiqing Chen
Braz. J. Probab. Stat. 36(1): 81-102 (March 2022). DOI: 10.1214/21-BJPS516

Abstract

Variable selection has become an indispensable part of statistical analysis for high-dimensional datasets. However, classical variable selection algorithms, such as regularization methods, are computationally demanding when both the sample size and the dimension of the dataset are large. Lin, Foster and Ungar (Journal of the American Statistical Association 106 (2011) 232–247) proposed a variable selection algorithm for massive datasets, called VIF regression, which is more computationally efficient and is able to control the marginal false discovery rate. Building on the idea of VIF regression, we propose a new variable selection algorithm, Double-Gates Streamwise regression (DGS), which quickly tests whether predictors significantly reduce the prediction error in a one-pass search. DGS regression has two main appealing features. First, it is computationally efficient and has low memory requirements. Second, it can control the false discovery rate, and hence improve predictive and explanatory ability. Its advantages relative to VIF regression and some other popular variable selection algorithms are demonstrated in extensive simulation experiments and a real dataset analysis.
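
To make the idea of a one-pass search concrete, the following is a minimal sketch of generic streamwise selection. It is not the DGS or VIF regression procedure, whose details are not given in the abstract. Each predictor is examined once, in order, and kept only if its t-statistic against the current residuals exceeds a fixed cutoff. The function name `streamwise_select`, the cutoff `t_threshold`, and the simulated data are all assumptions made for illustration; a fixed cutoff does not by itself control the false discovery rate.

```python
import numpy as np

def streamwise_select(X, y, t_threshold=4.0):
    """One-pass forward selection: each column is examined once, in order.

    Illustrative only: a fixed t-statistic cutoff is an assumption here,
    not the testing rule used by DGS or VIF regression.
    """
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))      # start from an intercept-only model
    residual = y - y.mean()       # residuals of the current model
    for j in range(p):
        x = X[:, j]
        # Remove the part of the candidate explained by the current model.
        coef, *_ = np.linalg.lstsq(design, x, rcond=None)
        x_perp = x - design @ coef
        ss = x_perp @ x_perp
        if ss < 1e-12:
            continue              # candidate is numerically redundant
        # Coefficient and residuals if the candidate were added.
        beta = (x_perp @ residual) / ss
        new_residual = residual - beta * x_perp
        dof = n - design.shape[1] - 1
        sigma2 = (new_residual @ new_residual) / dof
        t_stat = beta * np.sqrt(ss / sigma2)
        if abs(t_stat) > t_threshold:
            selected.append(j)
            design = np.column_stack([design, x])
            residual = new_residual
    return selected

# Simulated example (hypothetical data): two true signals among 50 predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))
y = 2.0 * X[:, 3] - 3.0 * X[:, 17] + rng.standard_normal(500)
selected = streamwise_select(X, y)
print(selected)
```

Because every candidate is tested against the current residuals and then either kept or discarded for good, the data are scanned only once, which is the source of the speed and memory savings that streamwise methods aim for.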

Funding Statement

Bingqing Lin was supported by the National Natural Science Foundation of China (Grant No. 11701386), the Natural Science Foundation of Guangdong Province (Grant No. 2020A1515010372) and the Natural Science Foundation of Shenzhen (Grant No. 20200813151828003). Jun Zhang’s research was supported by the Natural Science Foundation of Guangdong Province (Grant No. 2020A1515010372), and the University stability support program A of Shenzhen (Grant No. 20200813151828003).

Acknowledgments

The authors would like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that improved the quality of this paper.

Citation

Bingqing Lin, Zhen Pang, Jun Zhang, Cuiqing Chen. "Fast feature selection via streamwise procedure for massive data." Braz. J. Probab. Stat. 36 (1) 81-102, March 2022. https://doi.org/10.1214/21-BJPS516

Information

Received: 1 December 2019; Accepted: 1 September 2021; Published: March 2022
First available in Project Euclid: 6 February 2022

MathSciNet: MR4377123
zbMATH: 07477296
Digital Object Identifier: 10.1214/21-BJPS516

Keywords: massive data, streamwise selection, variable selection

Rights: Copyright © 2022 Brazilian Statistical Association

JOURNAL ARTICLE
22 PAGES

