The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 12, Number 2 (2018), 1250-1270.
Network-based feature screening with applications to genome data
Modern biological techniques have produced various types of data, which are often used, together with appropriate statistical methods such as feature screening, to identify important biomarkers for certain diseases. Model-free feature screening has been extensively studied in the literature and is effective in selecting useful predictors for ultra-high dimensional data. Existing screening procedures are based on certain marginal correlations between each predictor and the response variable, so the network structures connecting the predictors are usually ignored. Google’s PageRank algorithm has achieved remarkable success, and we adopt its spirit to adjust the original screening approaches by incorporating network information. This significantly improves the performance of those screening methods in choosing useful biomarkers, as demonstrated in an extensive simulation study. A couple of real genome datasets, along with a biological network, are further analyzed, and the results are compared in terms of both the accuracy of predicting responses and the stability of identifying biomarkers.
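The abstract states that the screening utilities are adjusted "in the spirit" of PageRank but does not give the formula. The sketch below illustrates one plausible version under explicit assumptions. First, the marginal utility is the sample distance correlation, as in DC-SIS (suggested by the name DC-SIS-Network in the supplement). Second, the adjustment is a personalized-PageRank propagation over an undirected predictor network, with the normalized utilities as the teleport vector. The function names, the `neighbors` list-of-sets format and `damping=0.85` are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def _centered_distances(v):
    """Double-centered matrix of pairwise distances |v_i - v_j|."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]  # d is symmetric, so column means equal row means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation between two 1-D samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvarx = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvary = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    if dvarx * dvary == 0:  # a constant sample carries no information
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvarx * dvary))


def network_adjusted_scores(utilities, neighbors, damping=0.85, tol=1e-12, max_iter=1000):
    """ASSUMED adjustment: personalized PageRank with utilities as teleport vector.

    neighbors[i] is the set of predictors linked to predictor i (must be symmetric).
    Mass held by isolated (dangling) predictors is redistributed according to the
    utilities, so the scores always sum to 1.
    """
    p = len(utilities)
    total = sum(utilities)
    u = [x / total for x in utilities] if total > 0 else [1.0 / p] * p
    s = u[:]
    for _ in range(max_iter):
        dangling = sum(s[j] for j in range(p) if not neighbors[j])
        new = [
            (1 - damping) * u[i]
            + damping * sum(s[j] / len(neighbors[j]) for j in neighbors[i])
            + damping * dangling * u[i]
            for i in range(p)
        ]
        converged = sum(abs(a - b) for a, b in zip(new, s)) < tol
        s = new
        if converged:
            break
    return s


def screen(x_columns, y, neighbors, top_k, damping=0.85):
    """Rank predictors by network-adjusted utility and keep the top_k indices."""
    utilities = [distance_correlation(col, y) for col in x_columns]
    scores = network_adjusted_scores(utilities, neighbors, damping)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]
```

With an empty network, every predictor is dangling and the scores reduce to the normalized marginal utilities, so the sketch falls back to plain marginal screening. Once edges are present, a weakly marginal predictor linked to a strong one gains score relative to an isolated predictor with the same marginal utility. That is the kind of network borrowing the abstract describes.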
Received: January 2017
Revised: May 2017
First available in Project Euclid: 28 July 2018
Wu, Mengyun; Zhu, Liping; Feng, Xingdong. Network-based feature screening with applications to genome data. Ann. Appl. Stat. 12 (2018), no. 2, 1250--1270. doi:10.1214/17-AOAS1097. https://projecteuclid.org/euclid.aoas/1532743493
- Some additional tables. The Supplementary Materials include additional simulation results with different network structures, signal-to-noise ratios and types of responses; the top 100 biomarkers for the dataset GSE71729 identified by DC-SIS-Network; and the corresponding KEGG pathway analysis results.