High-Dimensional Single-Index Bayesian Modeling of Brain Atrophy Over Time

Abstract. We propose a model of brain atrophy as a function of high-dimensional genetic information and low-dimensional covariates such as gender, age, APOE gene, and disease status. A high-dimensional nonparametric single-index Bayesian model is proposed to study the data, with a B-spline series prior on the unknown functions and a Dirichlet process scale mixture of centered normal prior on the distributions of the random effects. The posterior rate of contraction without the random effect is established for a fixed number of regions and time points with increasing sample size. The performance of the proposed Bayesian method is compared with the corresponding least squares estimator in the linear model with LASSO and SCAD penalization on the high-dimensional covariates. The proposed Bayesian method is applied to a dataset of 748 individuals with 620,901 SNPs and 6 other covariates per individual, to identify factors associated with significant variation.


Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disease that affects approximately 5.5 million people in the United States and about 30 million people worldwide. It is believed to have a prolonged preclinical phase, initially characterized by the development of silent pathologic changes while patients appear clinically normal, followed by mild cognitive impairment (MCI) and then dementia (AD) (Petrella (2013)). Apart from its manifestation in the impairment of cognitive abilities, disease progression also produces a number of structural changes in the human brain, including the deposition of amyloid protein and the shrinkage or atrophy of certain regions of the brain over time (Thompson et al. (2003)). Previous studies have shown that the rate of brain atrophy is significantly modulated by a number of factors, such as gender, age, baseline cognitive status and, most markedly, allelic variants in the Apolipoprotein E (APOE) gene (Hostage et al. (2014)). In this paper, we examine whether any other genes are implicated in modulating the rate of brain atrophy, along with examining the effects of the low-dimensional covariates on the rate of atrophy, using data collected by the Alzheimer's Disease Neuroimaging Initiative (ADNI). We model regional brain volume, which has been measured from magnetic resonance (MR) images using a segmentation procedure and recorded in the ADNI database, from which we collect the data directly. We have volumetric measurements over six visits for thirteen disjoint brain regions and a total brain measure, a summary measure of total brain parenchyma, covering the Cerebral-Cortex, Cerebellum-Cortex, Thalamus-Proper, Caudate, Putamen, Pallidum, Hippocampus, Amygdala, Accumbens-area, VentralDC, Cerebral-White-Matter, Cerebellum-White-Matter, and WM-hypointensities. The visits were roughly six months apart, so the subjects were scanned over roughly three years.
The rate of atrophy differs across individuals and is assumed to depend on subject-specific covariates such as genetic variation, gender, and age. In this paper, we propose a model to study the effects of these covariates, along with Alzheimer's disease state, on atrophy in different brain regions. The analysis presents a technical challenge because the genomic data are high-dimensional and need to be incorporated in a model for the longitudinal progression of brain volumes measured in multiple parts of the brain in a nonparametric setup. A schematic of the regions we study is depicted in Figure 1; this image is reproduced with permission from Ahveninen et al. (2012). The pre-processed brain volume data used in this paper are obtained from the ADNI database. Hostage et al. (2014) also used a similar set of regions of interest. We consider two separate sets of unknown functions of covariates X and Z to model the volumetric measurements of the first visit and the rates of change for the different regions. These functions have two inputs: the first, X, consists of the high-dimensional single-nucleotide polymorphism (SNP) data of each individual, and the other, Z, consists of low-dimensional covariates such as gender, age, disease state, and APOE gene status. The effect of these covariates on brain volume is modeled using semiparametric functions of the form {a_{0,j}(X'β, Z'η) : j = 1, . . . , 14} for the volumetric measurements of the first visit and {a_{1,j}(X'β, Z'η) : j = 1, . . . , 14} for the rate of change in the jth region, where the coefficients β and η are unit vectors of appropriate dimensions. A finite random series prior based on tensor products of B-splines, with appropriate prior distributions on the coefficients, is put on these functions. To address the issues associated with the high dimensionality of X, the coefficient β is assumed to be sparse. We reparametrize β in polar coordinates to incorporate sparsity in its prior.
Figure 1 of the Supplementary Materials (Roy et al., 2019) shows that the logarithm of the total brain measure changes non-linearly with time. This is the case for other brain regions as well. In addition to that, Alzheimer's Disease Assessment Scale-cognitive subscale (ADAS-Cog) and Clinical Dementia Rating scale Sum of Boxes (CDR-SB) has been found to change quadratically with time (Lin et al., 2015). Thus, we incorporate the effect of time in the modeling by an increasing function which is estimated nonparametrically. Here also, we use a finite random series of B-splines with an appropriate prior on the coefficients to model this unknown function of time.
Apart from proposing a sophisticated model for studying brain atrophy, this paper develops a new estimation scheme for a general high-dimensional single-index model (SIM). Estimation for high-dimensional single-index models was previously addressed in Zhu (2009), Yu and Ruppert (2002), Wang et al. (2012), Peng and Huang (2011), Radchenko (2015) and Luo and Ghosal (2016). All of them used the ℓ1-penalty and computed the estimates using optimization techniques. In a Bayesian framework, Antoniadis et al. (2004) used the Fisher-von Mises prior on the directional vector. This cannot be easily modified for a high-dimensional covariate, as we would then need a prior which favors many zeros in the unit vector. Another paper addressing sparse Bayesian single-index model estimation is Alquier and Biau (2013). Even though their method is theoretically attractive, it is difficult to implement for a high-dimensional covariate due to its high computational complexity. Wang (2009) developed a Bayesian method for the sparse single-index model using the reversible jump Markov chain Monte Carlo (MCMC) technique, which is computationally expensive, especially in the high-dimensional setting. We introduce a new way of incorporating sparsity on a unit vector through a spike-and-slab prior via the polar form. The computation scheme is based on an efficient gradient-based Hamiltonian Monte Carlo (HMC) algorithm.
The rest of the paper is organized as follows. Section 2 discusses the dataset and the modeling in more detail. In Section 3, we describe the priors on the different parameters of the model. Section 4 describes posterior computation in this setup. We study the posterior rate of contraction in the model in Section 5, under the asymptotic regime in which the number of individuals goes to infinity while the number of time points where measurements are taken and the number of regions stay fixed; the concentration of the posterior justifies the use of the proposed Bayesian procedure from a frequentist perspective. We present a simulation study comparing the proposed Bayesian procedure with its linear counterpart in Section 6. In Section 7, we present conclusions from the ADNI data on brain atrophy described above using our proposed method. Section 8 concludes the paper with some further remarks.

Data description and modeling
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to predict the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). In the ADNI dataset, the grey matter part of the brain is divided into thirteen disjoint regions. The volumes of these regions and the summary measure of the whole brain are recorded over time for n = 748 individuals. The visits were scheduled every 6 months over a span of 36 months, but not all participants turned up at each scheduled visit, and no participant has a record of visiting after the 30th month. We collect the volume data of J = 14 regions, comprising the thirteen brain regions and the summary measure of the whole brain, over a set T_i of time points for the ith individual, i = 1, . . . , 748, where 1 ≤ |T_i| ≤ 6 and |T_i| denotes the cardinality of T_i; the time points are the original visit times in months divided by 36, to keep them bounded in [0, 1]. For some participants, the disease status changed during the span of this study; however, they were very few in number, and we do not include them in this study.
In ADNI, the subjects were genotyped using the Human 610-Quad BeadChip (Illumina, Inc., San Diego, CA), yielding a set of 620,901 SNP and copy number variation (CNV) markers. The APOE gene has been the most significant gene in GWAS of Alzheimer's disease, but the corresponding SNPs, rs429358 and rs7412, which define the epsilon 2, 3, and 4 alleles, are not on the Human 610-Quad BeadChip. APOE genotyping was therefore performed separately at the time of participant enrollment and included in the ADNI database, using DNA extracted by Cogenics from a 3 mL aliquot of ethylenediaminetetraacetic acid (EDTA) blood. These alleles are considered separately in the study.
Thus, apart from the volumetric measurements, we also have high-dimensional SNP data and data on some other covariates for each individual. The other covariates are gender, disease state, age, and alleles 2 and 4 of the APOE gene. Except for age, all the low-dimensional covariates are categorical, and we represent the categories by binary dummy variables. Since the disease status has three states, NC (no cognitive impairment), MCI (mild cognitive impairment) and AD (Alzheimer's disease), we introduce two dummy indicator variables Z_MCI and Z_AD for MCI and AD respectively, setting NC as the reference group. Similarly, a dummy variable Z_M indicating male gender is introduced, setting female as the reference group. We also introduce Z_APOE,2 and Z_APOE,4 for alleles 2 and 4 of the APOE gene, setting allele 3 as the reference group in each case. We take the age at the initial visit as a covariate as well; it is the only continuous variable. Let Z = (Z_MCI, Z_AD, Z_M, Age, Z_APOE,2, Z_APOE,4) stand for the whole vector of covariates.
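As a concrete illustration, the dummy coding of Z described above can be sketched as follows (the function and argument names are ours, not from the ADNI database):

```python
import numpy as np

def build_Z(status, sex, age, apoe_a, apoe_b):
    """Build the low-dimensional covariate vector
    Z = (Z_MCI, Z_AD, Z_M, Age, Z_APOE2, Z_APOE4),
    with NC, female, and APOE allele 3 as the reference groups."""
    z_mci = 1.0 if status == "MCI" else 0.0
    z_ad = 1.0 if status == "AD" else 0.0
    z_m = 1.0 if sex == "M" else 0.0
    alleles = (apoe_a, apoe_b)
    z_apoe2 = 1.0 if 2 in alleles else 0.0
    z_apoe4 = 1.0 if 4 in alleles else 0.0
    return np.array([z_mci, z_ad, z_m, age, z_apoe2, z_apoe4])
```

For example, an MCI female aged 70 carrying APOE alleles 3 and 4 is coded as (1, 0, 0, 70, 0, 1).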
Different brain regions change differently with time, and we study the effects of different attributes on these changes. For every individual, the volume of a brain region at a particular visit should primarily depend on the volume of that region at the baseline visit and on the rate of change of volume for that region with time. These rates of change are region-specific as well as individual-specific. Hence, it is natural to consider the baseline volume and the rate of change as functions of the subject and the brain region. For simplicity, we do not assume any form of spatial dependence between measurements across brain regions. We thus need two sets of functions, {a_{0,j}(·,·) : j = 1, . . . , 14} and {a_{1,j}(·,·) : j = 1, . . . , 14}, to model the volume at the initial visit and the rate of change for the jth region. These functions are unknown and are modeled nonparametrically. For nonparametric regression problems, single-index models provide a lot of flexibility in estimation and in the interpretation of the results. Hence we adopt a bivariate single-index model with two inputs, for the high-dimensional and low-dimensional covariates separately, for easy interpretation and computational efficiency. The effect of time is captured through an unknown increasing function F_0(·) bounded in [0, 1], which is also modeled nonparametrically. The region-specific rate-of-change function a_{1,j}(X_i'β, Z_i'η) multiplies F_0; to make this product identifiable, we take F_0 to be the same for all subjects and regions.
Let Y_ij(t) be the volume on the logarithmic scale of the jth brain region for the ith individual at time t, let X_i be the high-dimensional SNP vector of length p for the ith individual, and let Z_i be the low-dimensional covariate vector of length k, for t ∈ T_i, i = 1, . . . , n and j = 1, . . . , 14. The data generating process can then be represented through the specification

Y_ij(t) = a_{0,j}(X_i'β, Z_i'η) + a_{1,j}(X_i'β, Z_i'η) F_0(t) + ε_ij(t),  ε_ij(t) ~ iid N(0, σ²),  (2.1)

where a_{0,j}(·,·), a_{1,j}(·,·) and F_0(·) are all unknown continuous functions, N stands for the normal distribution, and "iid" signifies that the errors ε_ij(t) are independent across i, j and t. The function F_0(·) is a monotone increasing function from [0, 1] to [0, 1] with F_0(0) = 0 and F_0(1) = 1. The other two functions a_{0,j}(·,·) and a_{1,j}(·,·) map [−1, 1]² to (−∞, ∞). For identifiability of the functions along with the parameters β and η, we assume that ‖β‖ = 1 and ‖η‖ = 1, where ‖·‖ denotes the L2-norm of a vector. We also normalize the covariates X_i and Z_i for each individual i so that ‖X_i‖ and ‖Z_i‖ are one; this ensures that X_i'β and Z_i'η are bounded in [−1, 1] for each i. For nonparametric function estimation, a bounded domain is important for uniform approximation using the basis expansion. The biggest challenge for estimation in this model is the high dimensionality of β. To identify important SNPs from X, we need to perform variable selection, and to do that, we propose a sparse estimation scheme for β. First we reparametrize the two unit vectors β = (β_1, . . . , β_p) and η = (η_1, . . . , η_k) into their respective polar forms, which allows us to work with Euclidean spaces. In this setup, only the directions of β and η are identifiable; note that β and −β have the same direction. In the polar setup, for s = 1, . . . , p − 1,

β_s = cos θ_s ∏_{l<s} sin θ_l,  β_p = ∏_{l=1}^{p−1} sin θ_l,

with θ_s ∈ (0, π) for s ≤ p − 2 and θ_{p−1} ∈ [0, 2π). Similarly, let α be the vector of polar angles corresponding to η; then for s ≤ k − 1,

η_s = cos α_s ∏_{l<s} sin α_l,  η_k = ∏_{l=1}^{k−1} sin α_l.
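The polar reparametrization of a unit vector can be sketched in code as follows; this is a minimal illustration of the standard spherical-coordinate map (the function name is ours):

```python
import numpy as np

def angles_to_unit_vector(theta):
    """Map polar angles theta_1, ..., theta_{p-1} to a unit vector in R^p:
    beta_s = cos(theta_s) * prod_{l<s} sin(theta_l) and
    beta_p = prod_{l<p} sin(theta_l).
    theta_1..theta_{p-2} lie in (0, pi); theta_{p-1} lies in [0, 2*pi)."""
    p = len(theta) + 1
    beta = np.empty(p)
    sin_prod = 1.0  # running product sin(theta_1)...sin(theta_{s-1})
    for s in range(p - 1):
        beta[s] = np.cos(theta[s]) * sin_prod
        sin_prod *= np.sin(theta[s])
    beta[p - 1] = sin_prod
    return beta
```

The map always returns a unit vector, an angle of π/2 zeroes the corresponding coordinate, and a last angle at a multiple of π zeroes the last coordinate, matching the sparsity mechanism used in the prior of Section 3.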

Prior specification
In the nonparametric Bayesian setup described above, we induce prior distributions on the smooth functions a_{0,j} and a_{1,j} in (2.1) through basis expansions in tensor products of B-splines with suitable priors on the corresponding coefficients. Given the other parameters, a normal prior distribution on the coefficients of the tensor products of B-splines leads to conjugacy and fast sampling via a Gibbs sampling scheme. An inverse-gamma prior on σ² is likewise a natural choice due to conjugacy. We also put a B-spline series prior on the smooth increasing function of time F_0(·); the coefficients of this function must be increasing in the index of the basis functions and lie in (0, 1]. To put a prior on such an increasing sequence, we introduce a set of latent variables of size one less than the number of B-spline coefficients; the B-spline coefficients are then the normalized cumulative sums of these latent variables. The other two parameters, β and η, are reparametrized into polar coordinates. The parameter space of the polar angles is a hyper-rectangle, on which it is easier to put a prior. To exploit the sparsity of β, we need to put a shrinkage prior on the polar angles carefully. A polar angle of π/2 ensures that the corresponding coordinate of the unit vector equals zero, so when the unit vector is sparse, most of the polar angles will be π/2. Thus a spike-and-slab prior on each polar angle, with a spike at π/2, should be able to capture sparsity in the corresponding unit vector. The last polar angle, whose support is [0, 2π], needs spikes at all multiples of π/2, due to the special structure of the last and penultimate coordinates of a unit vector in polar form: if it is a multiple of π, then the last coordinate is zero, and if it is an odd multiple of π/2, then the penultimate coordinate is zero.
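The construction of the increasing B-spline coefficients for F_0 from the latent variables can be sketched as follows, assuming the normalization is by the total sum so that the coefficients start at 0 and end at 1, matching F_0(0) = 0 and F_0(1) = 1 (the exact normalization in the paper may differ):

```python
import numpy as np

def monotone_coefficients(delta):
    """Given latent variables delta_1, ..., delta_{K'-1} in (0, 1),
    return K' B-spline coefficients that are increasing in the index,
    obtained as normalized cumulative sums of the latent variables."""
    csum = np.cumsum(delta)
    # prepend 0 and scale so the last coefficient equals 1
    return np.concatenate(([0.0], csum / csum[-1]))
```

Because each latent variable is positive, the cumulative sums are strictly increasing, so monotonicity of the coefficient sequence (and hence of the fitted F_0, given the B-spline representation) is automatic.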
The prior is constructed to maintain this structure of spikes. Since only the directions of β and η are identifiable, the intercept and slope functions are modeled as even functions, i.e., symmetric around zero. We now describe the prior in detail. Let ⌈x⌉ denote the smallest integer greater than or equal to x. (i) Intercept and slope functions: given the number of basis functions, the coefficients of the tensor products of B-splines are given normal priors. For the latent variables generating the coefficients of F_0, we put the prior δ_l ∼ Un(0, 1) for l = 1, . . . , K′ − 1, where Un stands for the uniform distribution.
(v) Polar angles θ of β: We put a spike-and-slab prior on the polar angles, with the spike at π/2 for the first p − 2 angles and at all multiples of π/2 for the last angle. The spike-and-slab prior for θ_i, i ≤ p − 2, is a mixture of a spike component concentrated at π/2 and a uniform slab, (1 − γ_i)(spike) + γ_i Un(0, π), for 0 < θ_i < π, and the distribution of θ_{p−1} is the analogous mixture (1 − γ_{p−1})(spike) + γ_{p−1} Un(0, 2π). Here Be(M_1, M_2)_[a,b] denotes the beta distribution with shape parameters M_1 and M_2, supported within the interval [a, b], with M_1 < 1 ≤ M_2; the spike components are built from such rescaled beta distributions. The indicator variables γ_i ∼ Ber(q) independently.
The spike distributions on θ are shown in Figure 2: the first plot is for the first p − 2 angles and the second plot is for the last polar angle.
Model selection: Polar angles whose posterior probability of inclusion in the model exceeds 0.5 are retained in the model.

Computation
In the conditional log-likelihood, the additive term C involves only the hyperparameters a, M_1, M_2, K, d_1, d_2 and the observations, but not the parameters of the model.
All B-spline coefficients and σ are updated using the conjugacy structure. Using expressions for the derivatives of B-splines (De Boor, 2001), it is possible to calculate the derivatives of the log-likelihood with respect to the polar angles and the δ's of F_0(t). Thus, we implement an efficient gradient-based MCMC scheme using the Hamiltonian Monte Carlo algorithm described in Neal (2011). Since these parameters have bounded support, candidates in a Metropolis step are reflected back if they cross the boundaries: for example, if a candidate θ_i^c for θ_i exceeds π, it is re-adjusted as θ_i^c ← π − (θ_i^c − π), and a similar adjustment is made if θ_i^c falls below zero. The same strategy is adopted while updating the δ's. All the posterior updates are discussed in the Supplementary Materials.
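The boundary reflection used in the Metropolis step can be written as a small helper; a sketch (the paper describes a single reflection, repeated here until the value falls inside the interval in case the overshoot exceeds one interval width):

```python
def reflect(x, lo, hi):
    """Reflect a proposed value back into [lo, hi]: overshoot past the
    upper boundary folds back as hi - (x - hi), and undershoot below the
    lower boundary folds back as lo + (lo - x)."""
    while x < lo or x > hi:
        if x > hi:
            x = hi - (x - hi)
        else:
            x = lo + (lo - x)
    return x
```

For the polar angles, (lo, hi) would be (0, π) or (0, 2π); for the δ's, (0, 1).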
The numbers of basis functions K and K′ are chosen by minimizing the Bayesian information criterion (BIC). We generate 50 different choices of β and η from the prior and fit the non-linear model for numbers of B-spline basis functions over the grid [7, 20]. After averaging the 50 BIC values for each candidate number, we choose the number with the smallest average BIC as the optimal value. The convergence of the MCMC chain is diagnosed using trace plots.
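The BIC-based choice of the number of basis functions can be sketched as follows, assuming a Gaussian working model so that BIC = n log(RSS/n) + k log n; here `fit_fn` is a hypothetical fitting routine, and the averaging over the 50 draws of (β, η) is elided:

```python
import numpy as np

def bic(y, yhat, n_params):
    """Gaussian BIC: n * log(RSS / n) + n_params * log(n)."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

def select_num_basis(fit_fn, y, grid=range(7, 21)):
    """Fit the model for each candidate number of basis functions and
    return the candidate minimizing BIC.  fit_fn(K) must return the pair
    (fitted values, effective number of parameters)."""
    scores = {K: bic(y, *fit_fn(K)) for K in grid}
    return min(scores, key=scores.get)
```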

Large-sample properties
In this section, we examine the large-sample properties of the proposed Bayesian procedure for the model (2.1). We have observations for a fixed number J of regions and T time points. We show posterior consistency in the asymptotic regime of increasing sample size and increasing dimension p of the SNPs. For the sake of generality of the method, K, whose square is the number of terms in the tensor-product B-spline series representation, is given a prior whose probability mass function has a geometric- or Poisson-like tail, and K′, the number of terms to be used in the B-spline series for the growth function F_0, is given a prior with either a geometric (b_3′ = 0) or a Poisson (b_3′ = 1) tail. Throughout, B_m, m = 1, . . . , K, stand for the B-spline basis functions. In our computation we do not use these priors on K and K′, i.e., on the numbers of B-spline basis functions, as that would require a reversible jump MCMC strategy, which is computationally expensive.
We study the posterior contraction rate with respect to the empirical ℓ2-distance on the regression function, defined for two sets of parameters (F, a, β, η) and (F*, a*, β*, η*) as the root average squared difference of the corresponding regression functions over the observation points, where a = (a_{0,j}, a_{1,j} : j = 1, . . . , J) and a* = (a*_{0,j}, a*_{1,j} : j = 1, . . . , J). Since β is a high-dimensional parameter, sensible estimation is possible only if it is sparse, and the sparsity must be picked up by the prior. In the setting of the spike-and-slab prior for polar coordinates described in Section 3, we need to ensure sufficient concentration near π/2 by choosing a large value of the second parameter M_2 in the beta spike distribution (depending on the sample size n and the dimension p) and a small value of the slab probability γ. More precisely, we choose M_1 ≤ 1 fixed, M_2 > √np log p and γ = o(p⁻¹). The contraction rate is then determined by the smoothness of the underlying true functions a_{0,0}, a_{1,0} and F_{0,0}, the sparsity s_0 of the true high-dimensional regression coefficient β_0 and, mildly, by the parameters b_3, b_3′ in the prior distributions for the numbers K, K′ of basis elements used in the B-spline series, as shown by the following result.
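In our notation, with the model (2.1), the squared empirical ℓ2-distance between two sets of parameters takes the following form; this is a sketch written from the definitions above, and the exact normalizing constant in the paper may differ:

```latex
d_n^2\big((F,a,\beta,\eta),(F^{*},a^{*},\beta^{*},\eta^{*})\big)
 = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{t\in T_i}
   \Big\{ a_{0,j}(X_i'\beta, Z_i'\eta)
        + a_{1,j}(X_i'\beta, Z_i'\eta)\,F(t)
        - a^{*}_{0,j}(X_i'\beta^{*}, Z_i'\eta^{*})
        - a^{*}_{1,j}(X_i'\beta^{*}, Z_i'\eta^{*})\,F^{*}(t)\Big\}^2,
\qquad N = J\sum_{i=1}^{n}|T_i|.
```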
In the above result, since the observation points are not dense over the domain, posterior contraction on the function is quantified through its distance from the true function at the observation points only.
The proof of the theorem uses the general theory of posterior contraction (see Ghosal and van der Vaart (2017)) for independent, non-identically distributed observations, along with some estimates for finite random series based on B-splines. The proof is given in the Supplementary Materials.

Simulation
We compare our method with some common penalization methods based on a simplified linear model, given in (6.1) below. We generate a high-dimensional binary matrix X and a low-dimensional covariate matrix Z; the sampling schemes for both are chosen so that the data matrices of the real data lie in their support.

Data generation for the non-linear case:
• Generate a data matrix X whose elements in the ith row are drawn from the Bernoulli distribution with success probability p_i.
• Generate all the elements of the matrix Z from N(0, 1).
• Generate the sparse vector β_0 with 5% of its elements non-zero. The positions of the non-zero elements are chosen at random by sampling p/20 of the p positions, where p is the length of β_0. The non-zero elements are generated from a mixture of the two normal distributions N(2, 1) and N(−1, 1).
• Normalize each row of X and Z, along with β_0 and η_0, to unit norm.
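The data-generation steps above can be sketched as follows; the row-wise success probabilities p_i and the equal mixture weights are our assumptions where the text leaves them unspecified:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_data(n=200, p=5000, k=6):
    """Sketch of the simulation design: binary X, Gaussian Z, and a
    sparse unit-norm beta_0 with p/20 (5%) non-zero entries drawn from
    an (assumed equal-weight) mixture of N(2, 1) and N(-1, 1)."""
    p_i = rng.uniform(0.1, 0.9, size=n)          # assumed row probabilities
    X = rng.binomial(1, p_i[:, None], size=(n, p)).astype(float)
    Z = rng.standard_normal((n, k))
    beta0 = np.zeros(p)
    idx = rng.choice(p, size=p // 20, replace=False)
    comp = rng.random(len(idx)) < 0.5
    beta0[idx] = np.where(comp, rng.normal(2, 1, len(idx)),
                          rng.normal(-1, 1, len(idx)))
    eta0 = rng.standard_normal(k)
    # normalize rows of X and Z, and the coefficient vectors, to unit norm
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    beta0 /= np.linalg.norm(beta0)
    eta0 /= np.linalg.norm(eta0)
    return X, Z, beta0, eta0
```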

Data generation for the linear case:
In this case, the true model is the one given in (6.1). All the steps for generating X, Z, β_0 and η_0 are as given above. The coefficients γ_{1,j} and γ_{2,j}, j = 1, . . . , 13, are generated from N(0, 1). After generating the design matrices, the data Y_ij(t) are generated from the corresponding normal model with error standard deviation σ_0 = 1. We compare the prediction MSEs across all the methods, splitting the whole data into two equal parts and using one half for training and the other half for testing. We compare the performance of our method with the LASSO (Tibshirani (1996)), the SCAD (Fan and Li (2001)) and the horseshoe method (Carvalho et al., 2010) on the testing dataset, based on estimates of the parameters from the training dataset. We use the R package glmnet for the LASSO and ncvreg for the SCAD, and we develop and use an HMC sampler for the horseshoe prior. For sample sizes 200, 500 and 1000, we gather the mean squared error (MSE) values for the non-linear and linear cases. Among the other parameters, we consider thirteen regions in total and five time points, and we vary p over 5000, 10000 and 20000. We set M_1 = 0.1, M_2 = 10 and tune q to ensure a good acceptance rate and the desired model size (the sum of the γ_i) across MCMC samples; we aim to maintain a model size between 20 and 30 at each step of the MCMC iterations. The results are summarized below for 50 replications and 3000 post-burn-in samples, after discarding the first 1000 samples as burn-in. The number of B-spline basis functions differs across sample sizes: to fit our model, we use 8 basis functions for n = 200, 11 for n = 500, and 14 for n = 1000, chosen according to the strategy described in Section 4. Let I_0 = {i : β_{0i} ≠ 0} be the set of indices of the non-zero coefficients; there are 26 non-zero variables in total. From the posterior samples of γ, we identify the top 26 selected variables.
For the other methods, we select the variables with the 26 largest absolute values in β̂ and ignore the ones with very low magnitudes relative to the others. Let X̂ denote the final selected set of variables in each case and let X_{I_0} be the true set of variables. To examine the performance of variable selection, we report the maximum canonical correlation between X̂ and X_{I_0} for each case in brackets.
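The maximum canonical correlation between the selected and true design submatrices can be computed as the leading singular value of the product of orthonormal bases of the two centered column spaces; a minimal sketch:

```python
import numpy as np

def max_canonical_correlation(A, B):
    """Largest canonical correlation between the columns of A and B,
    computed as the top singular value of Qa' Qb, where Qa and Qb are
    orthonormal bases of the centered column spaces."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return min(1.0, s[0])  # clip tiny numerical overshoot above 1
```

When the selected set spans the same column space as the true set, the value is 1; unrelated sets give values well below 1.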
From Table 1, we infer that the performance of the proposed Bayesian method based on the high-dimensional single-index model is always much better than that of the LASSO, the SCAD, and the horseshoe method in the non-linear case. For the linear case, in Table 2, it is competitive with linearity-based methods such as the LASSO, the SCAD, and the horseshoe method. This is natural, as the LASSO, the SCAD and the horseshoe method use more precise modeling information, which the semiparametric method cannot use. However, in terms of maximum canonical correlations, our model outperforms the others in the non-linear case and remains extremely competitive in the linear case.

Table 2: Comparison of the proposed high-dimensional single-index model with the LASSO, the SCAD, and the horseshoe method in terms of MSE, with the maximum canonical correlation between the selected set of variables and the true set of variables in brackets, for the linear case.
Real-data analysis

Incorporating random effects and regions wise varying effect
As the data are longitudinal, it is reasonable to add a subject-specific random effect τ_i to the model. We also let the coefficient η of the low-dimensional covariates vary region-wise, following the linear model of Hostage et al. (2014). However, we keep the high-dimensional coefficient β fixed across all regions because of the high computational cost; thus, the selected SNPs are responsible for the change in all the brain regions. The modified model is given in (7.1). Prior on the random effects: we put a Dirichlet process scale mixture of centered normal prior on the random effect distribution.

Region-wise varying effect with no SNP
To compare the nonlinear model with the linear model of Hostage et al. (2014), we also fit the model (7.2), obtained from (7.1) by dropping the SNPs.

Corresponding linear model
We compare the performance of our method, based on the above non-linear model, with the linear model (7.3) based on Hostage et al. (2014), where t ∈ T_i with 1 ≤ |T_i| ≤ 6, j = 1, . . . , 14 and i = 1, . . . , 748. We have volumetric measurement data for the thirteen brain regions, along with the summary measure of the whole brain, over time for 748 individuals. For each individual, the covariate information is summarized in Table 3. The reference group for our analysis is a female individual of average age with no cognitive impairment.
We first fit the model in (7.2) and the linear model in (7.3) in accordance with Hostage et al. (2014), with the same set of covariates and interactions between APOE and disease states, and then compare the prediction MSEs: the prediction error measures predictive performance, and the fitted relative MSE helps to judge the reliability of inference. We consider a basis of 17 B-spline functions in the univariate case and 17² basis functions in the bivariate cases. The estimates are based on 5000 post-burn-in MCMC samples, after discarding the first 1000 samples as burn-in.

Table 3: Demographic table summarizing the data in terms of the numbers of no-cognitive-impairment (NC), Alzheimer's disease (AD) and mild-cognitive-impairment (MCI) participants, along with age, across the two gender groups, male and female.
To compare the fitted models, we calculate the prediction error of each. To do so, we divide the whole dataset into training (Tr) and testing (Te) sets, using stratified sampling with each subject-region pair as a stratum so that the training set contains every individual that appears in the testing set; this is important for prediction with a random effect in the model. The prediction error is |Te|⁻¹ Σ_{(i,j,t)∈Te} (Y_ij(t) − Ŷ_ij(t))², where |Te| denotes the total number of elements in the test set Te. The linear model gives a prediction error of 3.83, whereas our non-linear model (7.2) hugely improves it to 0.60; the model in (7.1) with SNPs further improves the prediction error to 0.45, which is about a 25% improvement. Region-wise prediction errors for the models in (7.1) and (7.3) are provided in Tables 1 and 2 of the Supplementary Materials, respectively. Figure 3 shows the estimated F_0(t) function for the model in (7.1); the estimated effect of time is indeed non-linear, suggesting a non-linear change in the logarithm of volume with time for a given individual and brain region.
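A minimal sketch of the stratified train/test split and the prediction-error formula above (one plausible implementation: a single held-out observation per subject-region stratum; the exact splitting proportions in the paper may differ):

```python
import numpy as np

def stratified_split(subject, region, rng):
    """For each (subject, region) stratum with more than one observation,
    hold out one observation for testing and keep the rest for training,
    so every individual in the test set also appears in the training set
    (needed for predicting with the subject-specific random effect)."""
    subject = np.asarray(subject)
    region = np.asarray(region)
    test = np.zeros(len(subject), dtype=bool)
    for s in np.unique(subject):
        for r in np.unique(region[subject == s]):
            idx = np.flatnonzero((subject == s) & (region == r))
            if len(idx) > 1:  # singleton strata stay entirely in training
                test[rng.choice(idx)] = True
    return ~test, test

def prediction_error(y, yhat, test):
    """|Te|^{-1} * sum_{(i,j,t) in Te} (Y_ij(t) - Yhat_ij(t))^2."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return np.mean((y[test] - yhat[test]) ** 2)
```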
After selecting the significant genes, we calculate the Bayesian information criterion (BIC) of the models obtained by leaving out one of the low-dimensional covariates at a time, with all the genes included, to compare their significance: if the BIC of the model leaving out covariate A is higher than that of the model leaving out covariate B, then covariate A is more significant than covariate B. Table 4 gives the resulting ordered list of the significance of the low-dimensional covariates for the different regions. In Table 5, we show the estimates from the linear model in (7.3) for the whole brain.
We map the significant SNPs from our analysis to the corresponding genes using the R package rsnps. We tune the prior inclusion parameter to select the 20 most significant SNPs; among those, we could map 11 to genes. The significant genes from our analysis are mentioned in Section 8, along with previous studies that found the corresponding genes significant for AD and/or cerebral atrophy.

Conclusions and discussion
We fit a bivariate single-index model to capture the volumetric changes of different cortical regions in the human brain. Both high- and low-dimensional covariates enter as inputs of the unknown functions determining the initial configuration and the rate of change of the different regions. To tackle the high-dimensional covariate within a single-index model, we provide a new technique for assigning a sparse prior and propose an efficient MCMC scheme using Hamiltonian Monte Carlo sampling. We present a result on posterior consistency of the resulting procedure. An R package accompanies this paper as supplementary material (https://github.com/royarkaprava/High-Dimensional-Single-index-model).
In our results on the real dataset, we find that allele 4 of the APOE gene is among the top three significant covariates for almost all the cases in Table 4. That allele 4 of the APOE gene is significant was established in Hostage et al. (2014), who used a linear model similar to the model in (7.3). Allele 4 is not found to be significant in the linear case for the whole brain in Table 5, nor for several other regions; however, it is always ranked among the top three most significant covariates in the non-linear model. Thus, for this dataset, the linear model is not appropriate. We identify 11 significant genes, each of which has been noted in previous studies as a possible candidate for AD and/or cerebral atrophy. The genes, along with the associated studies citing them in connection with AD and/or cerebral atrophy, are: SLC6A1 (cerebellum) (Carvill et al., 2015), KCNIP4 (Himes et al., 2013), ADGRL3 (Orsini et al., 2016), SORBS2 (Zhang et al., 2016; Lee et al., 2014; Niceta et al., 2015), LPAR3 (Yung et al., 2015), SHROOM3 (Dickson et al., 2015; Freudenberg-Hua et al., 2016), SORCS3 (Breiderhoff et al., 2013; Lane et al., 2012), NPY2R (Lin et al., 2010; Schriemer et al., 2016), CWF19L2 (Lin et al., 2013), PALLD (Nho et al., 2015) and KCNMA1 (Burns et al., 2011; Tabarki et al., 2016). In particular, SLC6A1 has been found to affect the cerebellum (Carvill et al., 2015); ADGRL3 affects the hippocampus, the prefrontal cortex, and the striatum (Orsini et al., 2016); LPAR3 affects central and peripheral nervous tissues (Yung et al., 2015); SORCS3 and PALLD affect the hippocampus (Breiderhoff et al., 2013; Nho et al., 2015); and KCNMA1 affects the olfactory bulb, cortex, basal ganglia, hippocampus, thalamus, cerebellum, vestibular nuclei, and spinal cord (Tabarki et al., 2016). Apart from this, Figure 3 suggests that the logarithm of volume changes non-linearly with time for a given individual and brain region.
The estimated function most closely resembles a cubic polynomial in shape.
We have kept the low-dimensional covariates fixed over time. One interesting future direction would be to modify the model to incorporate time-varying covariates as well. For example, the disease status of some of the participants changed during the span of this study; although they were very few in number and were excluded from this study, it would be useful to accommodate them using a more elaborate model. Also, we have not included the Mini-Mental State Exam (MMSE) scores in our analysis, and it would be interesting to study the effect of that covariate as well. Our proposed model restricts us to a single set of selected genes across all the regions; it can be modified to select separate sets of genes for different parts of the brain at the expense of additional computational burden, which would give more insight into the interdependence between genes and different parts of the brain. In our computation, we did not use the priors on K and K′ (the numbers of B-spline basis functions) mentioned in Section 5, as that would require a reversible jump MCMC strategy, which is computationally expensive; the model would be richer if these priors could be incorporated.
A package fitting the high-dimensional single-index model, as well as a linear regression model with Dirichlet-Laplace and horseshoe priors using the HMC algorithm, is available at https://github.com/royarkaprava/High-Dimensional-Single-index-model.