Network-based feature screening with applications to genome data

Mengyun Wu; Liping Zhu; Xingdong Feng

doi:10.1214/17-AOAS1097

Ann. Appl. Stat. 12(2): 1250-1270 (June 2018). DOI: 10.1214/17-AOAS1097

Abstract

Modern biological techniques generate many types of data, which are often used, together with appropriate statistical methods such as feature screening, to identify important biomarkers for certain diseases. Model-free feature screening has been studied extensively in the literature and is effective at selecting useful predictors from ultra-high dimensional data. Existing screening procedures, however, rely on certain marginal correlations between the predictors and a response variable, so the network structures connecting the predictors are usually ignored. Motivated by the remarkable success of Google’s PageRank algorithm, we adopt its spirit to adjust the original screening approaches by incorporating network information. An intensive simulation study demonstrates that this adjustment can significantly improve the performance of those screening methods in choosing useful biomarkers. A couple of real genome datasets, together with a biological network, are further analyzed, and the results are compared on both the accuracy of predicting responses and the stability of identifying biomarkers.
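To make the idea in the abstract concrete, the following is a minimal sketch of one way network information could be folded into marginal screening scores in the spirit of PageRank. It is not the procedure from the paper: the use of absolute Pearson correlation as the marginal utility, the personalized-PageRank update, the `damping` value, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute Pearson correlation of each column of X with y (illustrative utility)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, 1.0, denom)  # constant columns get score 0
    return np.abs(Xc.T @ yc) / denom


def network_adjusted_scores(scores, adjacency, damping=0.5, tol=1e-10, max_iter=1000):
    """Smooth marginal scores over a network with a personalized-PageRank-style iteration.

    Assumption: the restart distribution is proportional to the marginal scores,
    so a predictor's adjusted score mixes its own evidence with its neighbours'.
    """
    scores = np.asarray(scores, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    total = scores.sum()
    if total <= 0:
        raise ValueError("marginal scores must have a positive sum")
    restart = scores / total

    # Column-stochastic transition matrix: column j spreads mass to j's neighbours.
    deg = A.sum(axis=0)
    P = np.where(deg > 0, A / np.where(deg > 0, deg, 1.0), 0.0)
    isolated = np.flatnonzero(deg == 0)
    P[isolated, isolated] = 1.0  # isolated predictors keep their own mass

    r = restart.copy()
    for _ in range(max_iter):
        r_new = damping * (P @ r) + (1.0 - damping) * restart
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


def screen(X, y, adjacency, k, damping=0.5):
    """Return the indices of the k predictors with the largest adjusted scores."""
    adjusted = network_adjusted_scores(marginal_scores(X, y), adjacency, damping)
    return np.argsort(-adjusted)[:k]
```

With `damping=0` the adjusted scores reduce to the normalized marginal scores, so the parameter controls how much weight the network receives relative to the marginal evidence. How the paper itself balances the two is not stated in the abstract.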

Citation

Mengyun Wu, Liping Zhu, Xingdong Feng. "Network-based feature screening with applications to genome data." Ann. Appl. Stat. 12 (2) 1250-1270, June 2018. https://doi.org/10.1214/17-AOAS1097