Bayesian network marker selection via the thresholded graph Laplacian Gaussian prior

Selecting informative nodes over large-scale networks has become increasingly important in many research areas. Most existing methods focus on the local network structure and incur heavy computational costs for large-scale problems. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior, which adopts the graph Laplacian matrix to characterize the conditional dependence between neighboring markers while accounting for the global network structure. Under mild conditions, we show that the proposed model enjoys posterior consistency with a diverging number of edges and nodes in the network. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation that is scalable to large-scale networks. We illustrate the advantages of the proposed method over existing alternatives via extensive simulation studies and an analysis of a breast cancer gene expression dataset from The Cancer Genome Atlas (TCGA).


Introduction
In biomedical research, complex biological systems are often modeled or represented as biological networks (Kitano 2002). High-throughput technologies such as next generation sequencing (Schuster 2007), mass spectrometry (Aebersold and Mann 2003) and medical imaging (Doi 2007) have generated massive datasets related to those biological networks. For example, in omics studies, a biological network may represent the interactions or dependencies among a large set of genes/proteins/metabolites, and the expression data are observations at each node of the network (Barabási et al. 2011). In neuroimaging studies, a biological network may refer to the functional connectivity among many brain regions or voxels, and the neural activity can be measured at each node of the network. In many biomedical studies, one important research question is to select, from tens of thousands of candidate nodes, the informative nodes that are strongly associated with the disease risk or other clinical outcomes (Greicius et al. 2003). We refer to these informative nodes as network markers (Kim et al. 2012, Peng et al. 2014, Yuan et al. 2017) and to the selection procedure as network marker selection. One promising solution is to perform network marker selection under a regression framework, where the response variable is the clinical outcome and the predictors are the nodes in the network. Classical variable selection (George and McCulloch 1993, Fan and Li 2001) in the regression model can be considered a special case of network marker selection, where the variables are the nodes of a network without edges.
For variable selection in regression models, many regularization methods have been proposed with various penalty terms, including the least absolute shrinkage and selection operator (LASSO) or L1 penalty (Tibshirani 1996, Zou 2006), the elastic-net or L1 plus L2 penalty (Zou and Hastie 2005), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001), the minimax concave penalty (MCP; Zhang 2010) and so on. Several network-constrained regularization regression approaches have been developed to improve selection accuracy and increase prediction power. One pioneering work is the graph-constrained estimation (Grace; Li and Li 2008), which adopts the normalized graph Laplacian matrix to incorporate the network-dependent structure between connected nodes. As an extension of Grace, the adaptive Grace (aGrace; Li and Li 2010) places constraints on the absolute values of weighted coefficients between connected nodes. Alternatively, an L_γ-norm group penalty (Pan et al. 2010) and a fused-LASSO-type penalty (Luo et al. 2012) have been proposed to penalize the difference of the absolute values of coefficients between neighboring nodes.
Instead of imposing constraints on the coefficients of neighboring nodes, an L0 loss penalizing their selection indicators has been proposed (Kim et al. 2013), leading to a non-convex optimization problem for parameter estimation, which can be solved by approximating the discontinuous L0 loss with the truncated lasso penalty (TLP).
In addition to frequentist approaches, Bayesian variable selection methods have received much attention recently, with many successful applications. Bayesian methods naturally incorporate prior knowledge and provide posterior inference on the uncertainty of variable selection. A variety of prior models have been studied, such as the spike-and-slab prior (George and McCulloch 1993), the LASSO prior (Park and Casella 2008), the horseshoe prior (Polson and Scott 2012), the non-local prior (Johnson and Rossell 2012), the Dirichlet-Laplace prior (Bhattacharya et al. 2015) and more. To incorporate known network information, Stingo et al. (2011) employed a Markov random field to capture network dependence and jointly select pathways and genes; Chekouo et al. (2016) adopted a similar approach for imaging genetics analysis. Zhou and Zheng (2013) proposed rGrace, a Bayesian random graph-constrained model combining network information with empirical evidence for pathway analysis. A partial least squares (PLS) g-prior was developed in Peng et al. (2013) to incorporate prior knowledge on gene-gene interactions or functional relationships for identifying genes and pathways. Chang et al. (2016) proposed a Bayesian shrinkage prior that smooths the shrinkage parameters of connected nodes to a similar degree for structured variable selection.
The Ising model is another commonly used Bayesian structural variable selection approach. It has been used as a prior model for latent selection indicators that lie on an undirected graph characterizing the local network structure. Such priors are especially successful for variable selection over grid networks motivated by applications such as the motif finding problem (Li and Zhang 2010) and imaging data analysis (Goldsmith et al. 2014, Li et al. 2015). However, fully Bayesian inference on the Ising model over a large-scale network is very challenging for at least two reasons: 1) the posterior inference can be quite sensitive to the hyperparameter specifications in the Ising priors, which are based on empirical Bayes estimates or subjective prior elicitation in some applications, while fully Bayesian inference on those parameters is difficult due to the intractable normalizing constant in the model; 2) most posterior computation algorithms, such as the single-site Gibbs sampler and the Swendsen-Wang algorithm, incur heavy computational costs for updating the massive number of binary indicators over large-scale networks with complex structures. In addition, Dobra (2009), Kundu et al. (2015), Liu et al. (2014) and Peterson et al. (2016) also proposed Bayesian variable selection approaches for predictors with unknown network structure.
To address the limitations of existing methods, we propose a new prior model, the thresholded graph Laplacian Gaussian (TGLG) prior, to perform network marker selection over large-scale networks by thresholding a latent continuous variable attached to each node. To model the selection dependence over the network, all the latent variables are assumed to follow a multivariate Gaussian distribution with mean zero and a covariance matrix constructed from a normalized graph Laplacian matrix. The effect size of each node is modeled through an independent Gaussian distribution.
Threshold priors have been proposed for Bayesian modeling of sparsity in various applications. Motivated by the analysis of financial time series data, Nakajima and West (2013a) and Nakajima and West (2013b) proposed a latent threshold approach to imposing dynamic sparsity in simultaneous autoregressive (SAR) models. Nakajima et al. (2017) further extended this type of model to the analysis of EEG data. To analyze neuroimaging data, Shi and Kang (2015) proposed a hard-thresholded Gaussian process prior for image-on-scalar regression, and Kang et al. (2018) introduced a soft-thresholded Gaussian process for scalar-on-image regression. To construct directed graphs in genomics applications, Ni et al. (2017) adopted a hard-threshold Gaussian prior in a structural equation model. However, none of the existing threshold prior models incorporates the useful network structural information, and thus they are not directly applicable to the network marker selection problem of our primary interest.
In this work, we propose to build the threshold priors using the graph Laplacian matrix, which has been used to capture the structural dependence between neighboring nodes (Li and Li 2008, Zhe et al. 2013, Li and Li 2010). Most of these frequentist methods directly specify the graph Laplacian matrix from an existing biological network. Liu et al. (2014) proposed a Bayesian regularization graph Laplacian (BRGL) approach, which utilizes the graph Laplacian matrix to specify the prior precision matrix of the regression coefficients.
However, BRGL is fundamentally different from our method: it is a type of continuous shrinkage prior for regression coefficients, with a prior support quite different from that of our TGLG prior. BRGL was developed only for linear regression, and its computational cost can be extremely heavy for large-scale networks. In addition, there is a lack of theoretical justification for BRGL when the large-scale network has a diverging number of edges and nodes.
Our method is a compelling Bayesian approach to network marker selection. The TGLG prior has at least four notable features: 1) fully Bayesian inference for large-scale networks is feasible, as the TGLG prior does not involve any intractable normalizing constants; 2) posterior computation is more efficient, since TGLG-based inference avoids updating the latent binary selection indicators and instead updates the latent continuous variables, to which many existing approximation techniques can potentially be applied; 3) the prior, based on the graph Laplacian matrix (Chung 1997, Li and Li 2008, Zhe et al. 2013), can incorporate the topological structure of the network, as has been done in genomics; and 4) the TGLG prior enjoys large support for Bayesian network marker selection over large-scale networks, leading to posterior consistency of model inference with a diverging number of nodes and edges under the generalized linear model (GLM) framework.
The remainder of the manuscript is organized as follows. In Section 2, we introduce the TGLG prior and propose our model for network marker selection under the GLM framework. In Section 3, we study the theoretical properties of the TGLG prior and show the posterior consistency of model inference. In Section 4, we discuss the hyper prior specifications and the efficient posterior computation algorithm. We illustrate the performance of our approach via simulation studies and an application to the breast cancer gene expression dataset from The Cancer Genome Atlas (TCGA) in Section 5. We conclude with a brief discussion of future work in Section 6.

The Model
Suppose the observed dataset includes a network with p_n nodes, one response variable and q confounding variables. For each node, we have n observations. For observation i, i = 1, . . ., n, let y_i be the response variable, x_i = (x_{i1}, . . ., x_{ip_n})^T be the vector of nodes and z_i = (z_{i1}, . . ., z_{iq})^T be the vector of confounding variables. Denote by D_n = {z_i, x_i, y_i}_{i=1}^n the dataset. We write the number of nodes as p_n to emphasize the diverging number of nodes in our asymptotic theory. We drop the subscript i to obtain generic notation for a response variable y, a vector of nodes x and a vector of confounders z. The generalized linear model (GLM) is a flexible regression model relating a response variable to a vector of nodes and confounding variables. The GLM density function for (y, x, z) with one natural parameter is

f(y | x, z) = exp{a(h*)y + b(h*) + c(y)}, (1)

where h* = z^T ω* + x^T β* is the linear predictor in the model, ω* and β* are the true coefficients that generate the data, and a(h) and b(h) are continuously differentiable functions. The true mean function is E(y | x, z) = g^{-1}(h*), where g^{-1}(·) is an inverse link function, which can be chosen according to the specific type of the response variable. For example, one can choose the identity link for a continuous response and the logit link for a binary response.
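As a concrete illustration of the mean function under the two link choices mentioned above, the following sketch (with made-up numbers for z, ω, x and β) evaluates g^{-1}(h) for the identity and logit links:

```python
import numpy as np

def inverse_link(h, link="identity"):
    # identity link for a continuous response, logit link for a binary one
    if link == "identity":
        return h
    if link == "logit":
        return 1.0 / (1.0 + np.exp(-h))
    raise ValueError(link)

# linear predictor h = z^T omega + x^T beta for one observation
z, omega = np.array([1.0, 2.0]), np.array([0.5, -0.25])
x, beta = np.array([0.0, 1.0, 0.0]), np.array([2.0, 0.0, -1.0])
h = z @ omega + x @ beta          # here h = 0.0
mu_linear = inverse_link(h, "identity")   # mean of a continuous response
mu_binary = inverse_link(h, "logit")      # success probability of a binary response
```

With h = 0 the identity link returns a mean of 0 and the logit link a probability of 0.5, matching the usual GLM conventions.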
In (1), the coefficient vector ω is a nuisance parameter adjusting for the confounder effects, for which we assign a Gaussian prior with mean zero and independent components, i.e. ω ∼ N(0_q, σ_ω^2 I_q) for σ_ω^2 > 0. Here I_d represents an identity matrix of dimension d for any d > 0.
The coefficient vector β represents the effects of nodes on the response variable. Here we perform network marker selection by imposing sparsity on β. To achieve this goal, we develop a new prior model for β: the thresholded graph Laplacian Gaussian (TGLG) prior. Suppose the observed network can be represented by a graph G, with each vertex corresponding to one node in the network. Let j ∼ k indicate that there exists an edge between vertices j and k in G. Let d_j represent the degree of vertex j, i.e., the number of nodes connected to vertex j in G. Denote by L = (L_{jk}) a p_n × p_n normalized graph Laplacian matrix, where L_{jj} = 1 if d_j ≥ 1, L_{jk} = −1/√(d_j d_k) if j ∼ k, and L_{jk} = 0 otherwise. For any d > 0, denote by 0_d an all-zero vector of dimension d. For any λ, ε, σ_α^2, σ_γ^2 > 0, we consider an element-wise decomposition of β for the prior specifications:

β = α ∘ T_λ(γ), α ∼ N(0_{p_n}, σ_α^2 I_{p_n}), γ ∼ N(0_{p_n}, σ_γ^2 (L + ε I_{p_n})^{−1}). (2)

Here α = (α_1, . . ., α_{p_n})^T represents the effect sizes of the nodes. The operator "∘" is the element-wise product. The vector thresholding function is T_λ(γ) = (I(|γ_1| > λ), . . ., I(|γ_{p_n}| > λ))^T, where I(A) is the event indicator with I(A) = 1 if A occurs and I(A) = 0 otherwise. The latent continuous vector γ = (γ_1, . . ., γ_{p_n})^T controls the sparsity over graph G. We refer to (2) as the TGLG prior for β, denoted as β ∼ TGLG(λ, ε, σ_γ^2, σ_α^2). The TGLG prior implies that for any two nodes j and k, γ_j and γ_k are conditionally dependent given the others if and only if j ∼ k in the graph G. In this case, their absolute values are more likely to be smaller or larger than the threshold value λ together. This further implies that nodes j and k are more likely to be selected as network markers, or not selected, together if j ∼ k.
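The construction can be sketched numerically. The snippet below builds the normalized graph Laplacian of a small chain graph and draws one realization of (γ, α, β) under the TGLG prior; the covariance form σ_γ^2(L + εI)^{-1} follows the description of Figure 1, and all hyperparameter values are illustrative:

```python
import numpy as np

def normalized_laplacian(edges, p):
    # L_jj = 1 for non-isolated nodes, L_jk = -1/sqrt(d_j d_k) if j ~ k, else 0
    A = np.zeros((p, p))
    for j, k in edges:
        A[j, k] = A[k, j] = 1.0
    d = A.sum(axis=1)
    L = np.zeros((p, p))
    for j in range(p):
        if d[j] > 0:
            L[j, j] = 1.0
        for k in range(p):
            if A[j, k]:
                L[j, k] = -1.0 / np.sqrt(d[j] * d[k])
    return L

rng = np.random.default_rng(0)
p, lam, eps, s2_g, s2_a = 5, 0.8, 1e-2, 1.0, 1.0
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]          # a chain graph on 5 nodes
L = normalized_laplacian(edges, p)
cov_gamma = s2_g * np.linalg.inv(L + eps * np.eye(p))
gamma = rng.multivariate_normal(np.zeros(p), cov_gamma)
alpha = rng.normal(0.0, np.sqrt(s2_a), size=p)
beta = alpha * (np.abs(gamma) > lam)              # element-wise thresholding
```

Nodes whose |γ_j| falls below the threshold λ receive β_j = 0, while selected nodes inherit their α_j effect size.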
Figure 1 shows an example of a graph and the corresponding correlation matrix of γ for ε = 10^{−2}, where the γ's of connected vertices are highly correlated. The threshold parameter λ controls a priori the sparsity: when λ → 0, all the nodes tend to be selected; when λ → ∞, none of them will be selected. The parameter ε determines the impact of the network on the sparsity. When ε → ∞, the γ's of connected vertices tend to be independent, while they tend to be perfectly correlated when ε → 0. The two variance parameters σ_γ^2 and σ_α^2 control the prior variability of the latent vectors γ and α, respectively. We now discuss how to specify the hyperparameters. For the variance terms σ_γ^2 and σ_α^2, we use the conjugate prior model by assigning the Inverse-Gamma distributions IG(a_γ, b_γ) and IG(a_α, b_α), respectively. We fix σ_ω^2 at a large value. We assign a uniform prior to the threshold parameter λ, i.e. λ ∼ Unif(0, λ_u) with upper bound λ_u > 0; we choose a wide range by setting λ_u = 10 in the rest of the manuscript. For the parameter ε, we can either assign a log-normal prior (log ε ∼ N(µ_ε, σ_ε^2)) or set it to a fixed small value.
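To see how ε governs the network's impact, one can compute the prior correlation of γ implied by the covariance (L + εI)^{-1} for a pair of connected nodes; for a two-node graph the correlation works out to 1/(1 + ε), so a small ε gives near-perfect correlation and a large ε near-independence. A minimal sketch:

```python
import numpy as np

def gamma_corr(L, eps):
    # prior correlation matrix of gamma under Cov = (L + eps*I)^{-1}
    C = np.linalg.inv(L + eps * np.eye(L.shape[0]))
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)

# two connected nodes: normalized Laplacian is [[1, -1], [-1, 1]]
L = np.array([[1.0, -1.0], [-1.0, 1.0]])
r_small = gamma_corr(L, 1e-2)[0, 1]   # approx 1/(1+0.01): nearly perfectly correlated
r_large = gamma_corr(L, 1e2)[0, 1]    # approx 1/(1+100): nearly independent
```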

Theoretical Properties
In this section, we examine the theoretical properties of the TGLG-prior-based network marker selection under the GLM framework. In particular, we establish posterior consistency with a diverging number of nodes in large-scale networks.
Denote by ξ a model, i.e. a subset of {1, . . ., p_n} indexing the selected nodes, and by β_ξ the corresponding coefficients of interest. Let π(ξ, dβ_ξ, dω) represent the joint prior probability measure for the model ξ, the parameters β_ξ and the confounding coefficients ω, and let π(ξ, dβ_ξ, dω | D_n) denote their joint posterior probability measure conditional on the dataset D_n. We examine asymptotic properties of the posterior distribution of the density function f with respect to the Hellinger distance (Jiang 2007, Song and Liang 2015) under some regularity conditions. The Hellinger distance d(f_1, f_2) between two density functions f_1(x, y) and f_2(x, y) is defined as d(f_1, f_2) = {∫∫ (√f_1(x, y) − √f_2(x, y))^2 dx dy}^{1/2}.
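As a sanity check on the Hellinger distance, the following sketch compares a numerical evaluation against the closed form for two unit-variance Gaussian densities; the normalization with the factor 1/2 (so the distance lies in [0, 1]) is assumed here for illustration:

```python
import numpy as np

def gaussian_pdf(x, m, s):
    return np.exp(-(x - m) ** 2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
f1, f2 = gaussian_pdf(x, 0.0, 1.0), gaussian_pdf(x, 1.0, 1.0)
# d(f1, f2)^2 = (1/2) * integral of (sqrt(f1) - sqrt(f2))^2, one common convention
d = np.sqrt(0.5 * np.sum((np.sqrt(f1) - np.sqrt(f2)) ** 2) * dx)
# closed form for equal-variance Gaussians under this convention:
# d^2 = 1 - exp(-(mu1 - mu2)^2 / (8 sigma^2))
d_exact = np.sqrt(1.0 - np.exp(-1.0 / 8.0))
```

The grid-based value agrees with the closed form to several decimal places, which is a useful check when implementing Hellinger-distance diagnostics.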
We list all the regularity conditions in the Appendix. We show that the TGLG prior and the proposed model enjoy the following properties.

Theorem 1. (Large Support for Network Marker Selection) Assume a sequence ε_n ∈ (0, 1] with nε_n^2 → ∞ and a sequence of nonempty models ξ_n. Assume that conditions (C1)-(C3) and (C7) hold. Given σ_α^2 and σ_γ^2, for any sufficiently small η > 0, there exists N_η such that for all n > N_η the prior bounds (3) and (4) hold; and there exists C_n > 0 such that, for all sufficiently large n and for any j ∈ ξ_n, the tail bound (5) holds.

This theorem shows that the TGLG prior has large support for network marker selection. In particular, (3) states that the TGLG prior selects the true network markers with a prior probability bounded away from zero, (4) ensures that the prior probability of the coefficients falling within an arbitrarily small neighborhood of the true coefficients is bounded away from zero, and (5) indicates a sufficiently small tail probability of the TGLG prior.

Theorem 2. (Posterior Consistency) Let ε_n ∈ (0, 1] be a sequence such that nε_n^2 → ∞. Assume that conditions (C1)-(C7) hold. Then we have the following results: (i) posterior consistency, as stated in (6), where f is the density function sampled from the posterior distribution and f* is the true density function.
(ii) The bound (7) holds for all sufficiently large n; and (iii) the bound (8) holds for all sufficiently large n. The probability measure P and the expectation E are both taken with respect to the data D_n generated from the true density f*.
This theorem establishes the posterior consistency of network marker selection. In particular, (6) implies that the posterior distribution of the density f concentrates on an arbitrarily small neighborhood of the true density f* under the Hellinger distance with large probability. This probability converges to one as the sample size n → ∞. Result (7) provides the convergence rate of the posterior distribution, indicating how fast the tail probability approaches zero. Result (8) gives the average convergence rate at which the posterior distribution of the density f concentrates on an arbitrarily small neighborhood of the true density f*.
Please refer to the Supplementary File 1 for proofs of Theorems 1 and 2.

Posterior Computation
Our primary goal is to make posterior inference on the regression coefficients for the network markers, i.e. β. According to the model specification, β_j is nonzero if and only if α_j is nonzero and |γ_j| exceeds the threshold, i.e. I(β_j ≠ 0) = I(α_j ≠ 0)I(|γ_j| > λ). Since α_j has a non-sparse normal prior, the posterior inclusion probability of node j is equal to the posterior probability of |γ_j| being greater than λ; and given β_j ≠ 0, the effect size can be estimated by E(β_j | |γ_j| > λ, D_n). All other parameters in the model can be estimated by their posterior expectations.
To simulate the joint posterior distribution of all parameters, we adopt an efficient Metropolis-adjusted Langevin algorithm (MALA) (Roberts and Rosenthal 1998) for posterior computation. We introduce a smooth approximation of the thresholding function, which leads to an analytically tractable first derivative.
We choose ε_0 = 10^{−8} in the simulation studies and the real data application in this article. The key steps in our posterior computation algorithm are Metropolis-Hastings updates of the model parameters using MALA or random walk proposals.
The proposal variances τ_γ^2, τ_α^2 and τ_ω^2 are all adaptively chosen by tuning the acceptance rates to 30% for random walk and 50% for MALA in the simulation studies, and to 15% for random walk and 30% for MALA in the real data analysis. Our choice of acceptance rates takes into account the general theoretical results on the optimal scaling of random walk (Roberts et al. 1997) and MALA (Roberts and Rosenthal 1998, Roberts et al. 2001). However, the log-likelihood of our model involves both smooth and discontinuous functions, which do not satisfy the regularity conditions of those general results. Thus, we have slightly adjusted the theoretically optimal acceptance rates according to our numerical experiments.
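For readers unfamiliar with MALA, the generic update can be sketched as follows; this is a simplified illustration on a toy Gaussian target, not the exact sampler used for our model (whose log-posterior also involves the smoothed thresholding function):

```python
import numpy as np

def mala_step(theta, logpost, grad, tau, rng):
    """One Metropolis-adjusted Langevin step for a smooth log-posterior."""
    drift = lambda t: t + 0.5 * tau ** 2 * grad(t)
    prop = drift(theta) + tau * rng.standard_normal(theta.shape)
    # log proposal density q(to | frm) of the Gaussian Langevin proposal
    logq = lambda to, frm: -np.sum((to - drift(frm)) ** 2) / (2 * tau ** 2)
    log_acc = logpost(prop) - logpost(theta) + logq(theta, prop) - logq(prop, theta)
    if np.log(rng.uniform()) < log_acc:
        return prop, True
    return theta, False

# toy target: a 3-dimensional standard Gaussian, log-density -t.t/2, gradient -t
rng = np.random.default_rng(1)
theta, accepts, samples = np.zeros(3), 0, []
for _ in range(3000):
    theta, acc = mala_step(theta, lambda t: -0.5 * t @ t, lambda t: -t, 0.9, rng)
    accepts += acc
    samples.append(theta)
rate = accepts / 3000
mean_est = np.mean(samples, axis=0)
```

In practice the step size tau plays the role of the proposal variances above and is tuned to hit the target acceptance rate.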
Denote by {γ^{(i)}, α^{(i)}, λ^{(i)}}_{i=1}^N the MCMC samples obtained after burn-in. We estimate the posterior inclusion probability for node j (j = 1, . . ., p_n) by the posterior frequency of |γ_j^{(i)}| exceeding λ^{(i)}. Following Barbieri et al. (2004), we select the informative nodes with at least 50% inclusion probability, and denote by M = {j : Pr(β_j ≠ 0 | D_n) > 0.5} the indices of all the informative nodes. To estimate the regression coefficients of the informative nodes, we use the estimated conditional expectation of β_j given β_j ≠ 0, for j ∈ M.
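This post-processing step can be sketched as follows, with mock MCMC draws standing in for the actual MALA output; nodes whose estimated inclusion probability exceeds 0.5 form the selected set M, and their effects are averaged over the draws in which they are selected:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 1000, 4
# mock post-burn-in draws of (gamma, alpha, lambda); in practice these come from MALA
gamma = rng.normal(0.0, 1.0, size=(N, p)) + np.array([2.0, 0.0, -2.0, 0.0])
alpha = rng.normal(1.0, 0.1, size=(N, p))
lam = np.abs(rng.normal(1.0, 0.05, size=(N, 1)))

sel = np.abs(gamma) > lam                     # draw-wise selection indicators
incl_prob = sel.mean(axis=0)                  # estimate of Pr(beta_j != 0 | D_n)
M = np.where(incl_prob > 0.5)[0]              # median probability model
# conditional effect estimate E(beta_j | beta_j != 0, D_n) for selected nodes
beta_hat = np.array([alpha[sel[:, j], j].mean() for j in M])
```

With these mock draws, nodes 1 and 3 (whose γ draws are centered at zero) fall below the 50% cutoff while nodes 0 and 2 are selected.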

Numerical Studies
We conduct simulation studies to evaluate the performance of the proposed method compared with existing methods under different scenarios.

Small simple networks
Following the settings in Li and Li (2008), Zhe et al. (2013) and Kim et al. (2013), we simulate small simple gene networks consisting of multiple subnetworks, where each subnetwork contains one transcription factor (TF) gene and 10 target genes connected to the TF gene; two of the subnetworks are set as the true network markers. We consider two types of true network markers. In the Type 1 network marker, the TF and all 10 target genes are informative nodes; see Figure 2(a). In the Type 2 network marker, the TF and half of the target genes are informative nodes; see Figure 2(b). For each informative node, the magnitude of the effect size, β, is simulated from Unif(1, 3) and its sign is randomly assigned as positive or negative.
In each subnetwork, the covariate variables for the 11 nodes, i.e., the expression levels of the TF gene and the 10 target genes, are jointly generated from an 11-dimensional multivariate normal distribution with zero mean and unit variance, where the correlation between the TF gene and each target gene is 0.5, and the correlation between any two different target genes is 0.25. We assume the covariate variables are independent across different subnetworks.
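A sketch of this covariate-generation scheme, assuming four subnetworks and the stated correlations (the number of subnetworks and the sample size are illustrative):

```python
import numpy as np

def subnetwork_cov(n_targets=10, r_tf=0.5, r_tt=0.25):
    # 11 x 11 covariance: node 0 is the TF gene, nodes 1..10 are its targets
    p = n_targets + 1
    S = np.full((p, p), r_tt)          # target-target correlation 0.25
    S[0, :] = S[:, 0] = r_tf           # TF-target correlation 0.5
    np.fill_diagonal(S, 1.0)           # unit variances
    return S

rng = np.random.default_rng(3)
n, n_sub = 100, 4
S = subnetwork_cov()
# covariates are independent across subnetworks: block-diagonal overall covariance
blocks = [rng.multivariate_normal(np.zeros(11), S, size=n) for _ in range(n_sub)]
X = np.hstack(blocks)                  # n x (11 * n_sub) design matrix
```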
(a) Type 1 (b) Type 2 Figure 2: Two types of network markers in the simulated small simple networks, where true informative nodes are marked in red. In the Type 1 network marker, the TF and all target genes are informative nodes. In the Type 2 network marker, the TF and half of the target genes are informative nodes.
To generate the response variable given the true network markers, we consider binary and continuous cases. The continuous response variable is generated from linear regression, i.e. y ∼ N(Xβ, (Σ_i β_i^2/3) I); the binary response is generated from logistic regression, i.e. logit Pr(y = 1 | x) = x^T β.
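The two response-generating mechanisms can be sketched as below; the design matrix, the number of informative nodes and the absence of an intercept in the logistic model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 44
X = rng.standard_normal((n, p))
beta = np.zeros(p)
# effect magnitudes from Unif(1, 3) with random signs, as in the simulation design
beta[:5] = rng.uniform(1.0, 3.0, size=5) * rng.choice([-1.0, 1.0], size=5)

# continuous response from linear regression; noise variance set to sum(beta^2)/3
y_lin = X @ beta + rng.normal(0.0, np.sqrt(np.sum(beta ** 2) / 3.0), size=n)
# binary response from logistic regression
prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
y_bin = rng.binomial(1, prob)
```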
For all the Bayesian methods, we run 30,000 MCMC iterations with the first 20,000 as burn-in. We also check MCMC convergence by running five chains and computing the Gelman-Rubin diagnostics. For each of the Bayesian methods, the estimated 95% CI of the potential scale reduction factor is [1.0, 1.0], indicating convergence of the MCMC algorithm. To compare the performance of different methods, we compute the true positives, false positives and area under the curve (AUC) for true network marker recovery, as well as the prediction mean squared error (PMSE) for linear regression and the classification error (CE) for logistic regression with respect to the outcome. We report the mean and standard error over 50 datasets for each metric in the result tables.
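The Gelman-Rubin diagnostic used here can be sketched as follows; this is the basic (non split-chain) potential scale reduction factor, shown on well-mixed and deliberately unmixed mock chains:

```python
import numpy as np

def gelman_rubin(chains):
    # chains: m x n array of draws for one scalar parameter from m chains
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance (times n)
    W = chains.var(axis=1, ddof=1).mean()      # average within-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)               # potential scale reduction factor

rng = np.random.default_rng(5)
mixed = rng.standard_normal((5, 2000))         # five well-mixed chains
rhat = gelman_rubin(mixed)                     # close to 1: converged
shifted = mixed + 3.0 * np.arange(5)[:, None]  # chains stuck at different modes
rhat_bad = gelman_rubin(shifted)               # far above 1: not converged
```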
TGLG-L achieves smaller numbers of false positives and classification errors than TGLG-I in most cases, which indicates that including network structure could improve model performance in logistic regression.

Large scale-free networks
We perform simulation studies on large scale-free networks, which are commonly used network models for gene networks. We simulate a scale-free network (Barabási and Albert 1999) with 1,000 nodes using the R function barabasi.game in the package igraph. In the simulated scale-free network, we set the true network markers by selecting 10 nodes out of 1,000 as the true informative nodes according to one of two criteria: 1) all the true informative nodes form a connected component (Hopcroft and Tarjan 1973) in the network; 2) all the true informative nodes are disconnected, in which case the TGLG model assumption does not hold. For each informative node, the magnitude of the effect size is simulated from Unif(1, 3) and its sign is randomly assigned as positive or negative. We apply all three aforementioned TGLG methods (TGLG-I, TGLG-F, TGLG-L) to each dataset and compare them with Lasso and Elastic-net. In addition, to evaluate the robustness to network structure mis-specification, for each simulated scale-free network we randomly select 20% of the nodes and permute their labels, and then apply TGLG-L with this mis-specified network. We refer to this approach as TGLG-M.
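This setup can be sketched without external packages (the paper uses R's igraph; here a minimal preferential-attachment generator with m = 1 stands in for barabasi.game, and a breadth-first search grows a connected set of 10 informative nodes for criterion 1):

```python
import random

def barabasi_game(n, seed=0):
    # minimal preferential-attachment tree: each new node attaches to one existing
    # node chosen with probability proportional to its degree (igraph: m = 1)
    rng = random.Random(seed)
    repeated = [0]              # each node appears once per incident edge
    edges = []
    for v in range(1, n):
        u = rng.choice(repeated)
        edges.append((u, v))
        repeated += [u, v]
    return edges

edges = barabasi_game(1000)

# adjacency lists for the generated network
adj = {v: [] for v in range(1000)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# criterion 1: grow a connected set of 10 informative nodes via breadth-first search
informative, frontier = [0], list(adj[0])
while len(informative) < 10 and frontier:
    v = frontier.pop(0)
    if v not in informative:
        informative.append(v)
        frontier.extend(adj[v])
```

By construction every node added to the informative set is adjacent to an earlier member, so the induced subgraph is connected.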
Table 4 reports the same performance evaluation metrics as Tables 1 and 2. When the informative nodes form a connected component in the network, TGLG-L achieves the best overall performance with respect to PMSE or CE and false positives. When the informative nodes are all disconnected, TGLG-L still has the best performance in linear regression, but is slightly worse than TGLG-I in logistic regression. This indicates that the TGLG approach is not sensitive to our model assumption regarding the true network markers. In both cases, TGLG-M performs worse than TGLG-L with correctly specified networks, but still better than Lasso and Elastic-net. This implies that useful network information can improve the performance of TGLG, while TGLG-L is robust to network mis-specification.

Application to breast cancer data from the Cancer Genome Atlas
In the real data application, we use the High-quality INTeractomes (HINT) database for the biological network (Das and Yu 2012). We apply our method to the TCGA breast cancer RNA-seq gene expression dataset with 762 subjects and 10,792 genes in the network. The response variable we consider here is the ER status, i.e. whether the cancer cells grow in response to estrogen. The ER status is a molecular characteristic of the cancer with important implications for prognosis. The purpose here is not prediction; rather, we intend to find genes and functional modules associated with ER status, through which biological mechanisms differentiating the two subgroups of cancer can be further elucidated. A total of 470 genes are selected as network markers by our approach. To facilitate interpretation, we conduct community detection on the network containing the selected network markers and their one-step neighbors (Clauset et al. 2004). There is a total of eight modules that contain 10 or more selected genes. The plots of the modules, together with their over-represented biological processes as identified using the 'GOstats' package (Falcon and Gentleman 2007), are listed in Supplementary File 2.
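The extraction of the subnetwork on which community detection is run, i.e. the selected markers plus their one-step neighbors, can be sketched as follows on a toy edge list (gene names are placeholders):

```python
def one_step_subnetwork(edges, selected):
    # keep the selected markers plus all of their one-step neighbors,
    # and return the edges of the induced subgraph
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    keep = set(selected)
    for s in selected:
        keep |= nbrs.get(s, set())
    sub_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, sub_edges

# toy interactome: a path A - B - C - D - E with marker "B" selected
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
keep, sub = one_step_subnetwork(edges, ["B"])
```

Community detection (e.g. the fast greedy modularity method of Clauset et al. 2004) is then applied to the induced subgraph.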
Figure 3 shows two example network modules.The first example (Figure 3(a)) contains 95 selected gene network markers, including 14 that are connected with other network markers.The top 5 biological processes associated with these 95 genes are listed in Table 5.
The most significant biological process over-represented by the selected genes in this module is regulation of cellular response to stress (p = 0.00016), with 14 of the selected genes involved in this biological process. Besides the general connection between stress response and breast cancer, ER status has specific interplay with various stress response processes. For example, breast cancer cells up-regulate hypoxia-inducible factors, which cause a higher risk of metastasis (Gilkes and Semenza 2013). Hypoxia-inducible factors can influence the expression of the estrogen receptor (Wolff et al. 2017). In addition, estrogen changes the DNA damage response by regulating proteins including ATM, ATR, CHK1, BRCA1 and p53 (Caldon 2014). Thus it is expected that the DNA damage response is closely related to ER status.
Five other genes in this module are involved in the pathway of regulation of anion transport, including the well-known mTOR gene, which is implicated in multiple cancers (Le Rhun et al. 2017). The PI3K/AKT/mTOR pathway is an anticancer target in ER+ breast cancer (Ciruelos Gil 2014). The other four genes, ABCB1 (Jin and Song 2017), SNCA (Li et al. 2018), IRS2 (Yin et al. 2017) and NCOR1 (Lopez et al. 2016), are all involved in some other types of cancer.
In ER- breast cancer cells, the lack of ER signaling triggers the epigenetic silencing of downstream targets (Leu et al. 2004), which explains the significance of the biological process "negative regulation of gene silencing". Many genes in the "cardiac muscle cell development" process are also part of the growth factor receptor pathway, which has a close interplay with estrogen signaling (Osborne et al. 2005). Four of the genes fall into the process "regulation of B cell proliferation". Among them, AHR has been identified as a potential tumor suppressor (Formosa et al. 2017). ERα is recruited in AhR signaling (Matthews and Gustafsson 2006).
IRS2 responds to interleukin 4 treatment, and its polymorphism is associated with colorectal cancer risk (Yin et al. 2017). CLCF1 signal transduction was found to play a critical role in the growth of malignant plasma cells (Burger et al. 2003). It appears that these genes are found due to their functionality in signal transduction, rather than specific functions in B cell proliferation. The second example is a much smaller module including 14 selected genes. Six of the 14 genes are involved in both hemopoiesis and immune system development (Table 5). They are all signal transducers. Among them are AGER, a member of the immunoglobulin superfamily (et al. 2015), and SVIL, which mediates the suppression of the p53 protein and enhances cell survival (Fang and Luna 2013).
Overall, the genes selected by TGLG are easy to interpret. Many known links exist between these genes and ER status, or breast cancer in general. Still, many of the selected genes have not so far been reported to be linked to ER status or breast cancer; our results indicate that they may play important roles.

Discussion
In summary, we propose a new prior model, the TGLG prior, for Bayesian network marker selection over large-scale networks. We show that the proposed prior model enjoys large prior support for network marker selection over large-scale networks, leading to posterior consistency. We also develop an efficient Metropolis-adjusted Langevin algorithm (MALA) for posterior computation. The simulation studies show that our method performs better than existing regularized regression approaches with regard to selection and prediction accuracy. Also, the analysis of the TCGA breast cancer data indicates that our method can provide biologically meaningful results.
This paper suggests several directions for future work. First, we can apply the TGLG prior for network marker selection under other modeling frameworks, such as the survival model and the generalized mixed effects model. Second, the current posterior computation can be further improved by utilizing parallel computing techniques within each iteration of the MCMC algorithm, updating the massive number of latent variables simultaneously. Third, another promising direction is to use integrated nested Laplace approximations (INLA) for approximate posterior computation, taking advantage of the fact that the TGLG prior involves high-dimensional Gaussian latent variables.

Figure 1 :
Figure 1: An example of the graph and the corresponding correlation matrix of γ that was constructed from the inverse graph Laplacian matrix
For linear regression, each dataset contains 100 training samples and 100 test samples; for logistic regression, each dataset contains 200 training samples and 200 test samples.
where N_i denotes the set of neighbor nodes of node i. For the hyper prior specifications in the Ising model, we fix a = −2 and choose b from 2, 5, 7 and 10 based on model performance. We implement a single-site Gibbs sampler for the Ising model. For BRGL by Liu et al. (2014), the network markers are selected when the posterior probability P(|β_j| > Var(β_j) | D_n) exceeds 0.5.
Table 1 summarizes the results for linear regression under different settings. In most cases, the TGLG approaches incorporating network structure achieve a smaller PMSE and a smaller number of false positives with a comparable number of true positives compared with other methods. For the Ising model, we only report the results for b = 7, since it has the best overall performance among all choices of b; in fact, the performance of the Ising model varies greatly across different choices of b, and it may perform very badly with an inappropriate value of b. Table 3 shows the mean computation time over 50 datasets for the Ising model and TGLG. It shows that our method is much more computationally efficient than the Ising model, especially for large-scale networks.

Among the three variants of the TGLG approach, TGLG-L has the best overall performance with respect to PMSE and false positives. TGLG-F tends to have more false positives than TGLG-L and TGLG-I, since the selection variables for connected nodes are highly dependent when ε is fixed at 10^{−5}; however, TGLG-F still has a smaller PMSE than TGLG-I. Compared with TGLG-I, TGLG-L has smaller false positives and PMSE in most cases. These facts show that incorporating network structure can improve the prediction performance of TGLG in linear regression.

Table 2 summarizes the results for logistic regression under different simulation settings. Here TGLG is only compared with Lasso, Elastic-net and the Ising model. For the Type 1 network, the Ising model has a smaller number of false positives than all three TGLG approaches; however, it has a larger prediction error and a smaller number of true positives. In all other scenarios, TGLG outperforms the Ising model.

Figure 3 :
Figure 3: Two example modules of selected genes.

Table 1 :
Simulation results for linear regression. PMSE: prediction mean squared error; TP: true positives; FP: false positives. The number of informative nodes in the Type 1 network is 22; the number of informative nodes in the Type 2 network is 12.

Table 2 :
Simulation results for logistic regression with sample size 200.CE: classification error (number of incorrect classification); TP: true positive; FP: false positive; number of Type 1 true network markers: 22; number of Type 2 true network markers: 12.

Table 3 :
Average computing time with standard error in seconds for the Ising model and TGLG based network marker selection. All the computations run on a desktop computer with a 3.40 GHz i7 CPU and 16 GB memory.

Covariates X are generated from a multivariate normal distribution X ∼ N(0, Σ) with Σ_{jk} = 0.3^{D_{jk}}, where D is the shortest path distance matrix between nodes in the generated scale-free network. The response variable Y is generated using

Table 4 :
Simulation results for the scale-free network. The number of true informative nodes is 10. Sample size is 200 and the number of nodes is 1,000.