Graphical models are a powerful and popular approach to studying high-dimensional omic data, such as genome-wide gene expression data. Nonlinear relations between genes are widely documented. However, partly because data points are sparse in high-dimensional space (the curse of dimensionality) and partly because of computational challenges, most available methods construct graphical models by testing only linear relations. We propose to address this challenge with a two-step approach: first, use a model-free approach to prioritize the neighborhood of each gene; then, apply a nonparametric conditional independence test to refine that neighborhood estimate. Our method, named "mofreds" (MOdel FRee Estimation of DAG Skeletons), uses these two steps to estimate the skeleton of a directed acyclic graph (DAG). We studied the theoretical properties of mofreds and evaluated its performance in extensive simulation settings. We found that mofreds performs substantially better than the state-of-the-art method, which is designed to detect linear relations in Gaussian graphical models. We applied mofreds to gene expression data from breast cancer patients in The Cancer Genome Atlas (TCGA) and found that it discovers nonlinear relationships among gene pairs that Gaussian graphical model methods miss.
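The two-step idea can be illustrated with a minimal sketch. It is not the mofreds implementation: the abstract does not specify the screening statistic or the conditional independence test, so this sketch assumes distance correlation for the model-free screening step and, as a crude stand-in for a nonparametric conditional independence test, a permutation test on residuals from an additive polynomial fit. All function names and parameters (`screen_neighbors`, `ci_pvalue`, `estimate_skeleton`, `k`, `alpha`) are hypothetical.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a, b = np.abs(x - x.T), np.abs(y - y.T)  # pairwise distance matrices
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max((A * B).mean(), 0.0) / denom))

def screen_neighbors(X, k):
    """Step 1 (assumed form): keep the k most dependent variables per column."""
    p = X.shape[1]
    D = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            D[i, j] = D[j, i] = distance_correlation(X[:, i], X[:, j])
    return {j: [int(i) for i in np.argsort(-D[j]) if i != j][:k] for j in range(p)}

def residualize(y, Z, degree=3):
    """Subtract an additive polynomial least-squares fit of y on the columns of Z."""
    if Z.shape[1] == 0:
        return y - y.mean()
    cols = [np.ones(len(y))]
    cols += [Z[:, c] ** d for c in range(Z.shape[1]) for d in range(1, degree + 1)]
    basis = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return y - basis @ coef

def ci_pvalue(X, i, j, cond, n_perm=200, seed=0):
    """Step 2 (stand-in test): permutation p-value for dependence of residuals."""
    rng = np.random.default_rng(seed)
    Z = X[:, np.asarray(cond, dtype=int)]
    ri, rj = residualize(X[:, i], Z), residualize(X[:, j], Z)
    observed = distance_correlation(ri, rj)
    null = [distance_correlation(ri, rng.permutation(rj)) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)

def estimate_skeleton(X, k=2, alpha=0.05):
    """Screen candidate neighbors, then keep pairs that stay dependent given the rest."""
    edges = set()
    for j, candidates in screen_neighbors(X, k).items():
        for i in candidates:
            cond = [c for c in candidates if c != i]
            if ci_pvalue(X, i, j, cond) < alpha:
                edges.add((min(i, j), max(i, j)))
    return edges

# Toy data: column 1 depends on column 0 quadratically; column 2 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
X = np.column_stack([x0, x0 ** 2 + 0.1 * rng.normal(size=200), rng.normal(size=200)])
neighbors = screen_neighbors(X, k=1)
edges = estimate_skeleton(X, k=1)
```

In the toy data the dependence between columns 0 and 1 is purely quadratic, so their population Pearson correlation is zero and a linear test would tend to miss it. This is the kind of relation that motivates model-free screening. The polynomial residualization is only a placeholder; a genuinely nonparametric conditional independence test would replace `ci_pvalue`.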
This work is supported, in part, by NIH grants GM126550 and GM105785 and NSF Grant DMS-1821231.
"Model free estimation of graphical model using gene expression data." Ann. Appl. Stat. 15 (1) 194 - 207, March 2021. https://doi.org/10.1214/20-AOAS1380