Open Access
Network-based feature screening with applications to genome data
Mengyun Wu, Liping Zhu, Xingdong Feng
Ann. Appl. Stat. 12(2): 1250-1270 (June 2018). DOI: 10.1214/17-AOAS1097

Abstract

Modern biological techniques produce many types of data, which are often used with appropriate statistical methods, such as feature screening, to identify important biomarkers for certain diseases. Model-free feature screening has been studied extensively in the literature and is effective at selecting useful predictors from ultra-high dimensional data. Existing screening procedures, however, rely on marginal correlations between the predictors and a response variable, so network structures connecting the predictors are usually ignored. Adopting the spirit of Google’s PageRank algorithm, which has achieved remarkable success, we adjust the original screening approaches by incorporating network information. This adjustment significantly improves the performance of those screening methods in choosing useful biomarkers, as demonstrated in an intensive simulation study. A couple of real genome datasets, along with a biological network, are further analyzed, and the results are compared on both the accuracy of predicting responses and the stability of identifying biomarkers.
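The abstract does not spell out the adjustment itself, so the following is only a minimal sketch of one way a PageRank-style correction could be layered on a marginal screening utility; it is not the authors' procedure. It assumes absolute Pearson correlation as the marginal utility, an undirected adjacency matrix for the predictor network, and a personalized-PageRank update whose restart distribution is proportional to the utilities. The function names, the `damping` parameter, and its default value are all hypothetical choices made for illustration.

```python
import numpy as np


def marginal_utilities(X, y):
    """Absolute Pearson correlation between each column of X and y (assumed utility)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, 1.0, denom)  # constant columns get utility 0
    return np.abs(Xc.T @ yc) / denom


def network_adjusted_scores(utilities, adjacency, damping=0.5, tol=1e-10, max_iter=1000):
    """Spread marginal utilities over a network with a personalized-PageRank iteration.

    This is an illustrative assumption, not the published method.
    """
    utilities = np.asarray(utilities, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    p = utilities.shape[0]
    total = utilities.sum()
    # Restart distribution proportional to the marginal utilities.
    prior = utilities / total if total > 0 else np.full(p, 1.0 / p)
    degree = adjacency.sum(axis=0)
    safe_degree = np.where(degree > 0, degree, 1.0)
    # Column-stochastic transition matrix; isolated nodes jump back to the prior.
    transition = np.where(degree > 0, adjacency / safe_degree, prior[:, None])
    scores = prior.copy()
    for _ in range(max_iter):
        updated = damping * (transition @ scores) + (1.0 - damping) * prior
        if np.abs(updated - scores).sum() < tol:
            scores = updated
            break
        scores = updated
    return scores


# Usage on synthetic data: predictor 0 drives the response and is linked to predictor 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = X[:, 0] + 0.5 * rng.normal(size=50)
A = np.zeros((6, 6))
A[0, 1] = A[1, 0] = 1.0
scores = network_adjusted_scores(marginal_utilities(X, y), A)
ranking = np.argsort(-scores)  # predictors ordered from highest to lowest adjusted score
```

With `damping=0` the scores reduce to the normalized marginal utilities, so the parameter controls how much weight the network receives relative to the original screening statistic. Any model-free utility could be substituted for the correlation used here.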

Citation

Mengyun Wu, Liping Zhu, Xingdong Feng. "Network-based feature screening with applications to genome data." Ann. Appl. Stat. 12 (2) 1250-1270, June 2018. https://doi.org/10.1214/17-AOAS1097

Information

Received: 1 January 2017; Revised: 1 May 2017; Published: June 2018
First available in Project Euclid: 28 July 2018

zbMATH: 06980492
MathSciNet: MR3834302
Digital Object Identifier: 10.1214/17-AOAS1097

Keywords: correlation, feature screening, model-free, network, ultra-high dimension, variable selection

Rights: Copyright © 2018 Institute of Mathematical Statistics
