Heritability estimation in high dimensional sparse linear mixed models

Abstract: Motivated by applications in genetics, we propose to estimate the heritability in high-dimensional sparse linear mixed models. The heritability determines how the variance is shared between the different random components of a linear mixed model. The main novelty of our approach is to allow the random effects to be sparse, that is, to contain null components, with both their proportion and their positions unknown. The estimator that we consider is strongly inspired by the one proposed by Pirinen, Donnelly and Spencer (2013) and is based on a maximum likelihood approach. We also study the theoretical properties of our estimator, namely we establish that our estimator of the heritability is $\sqrt{n}$-consistent when both the number of observations $n$ and the number of random effects $N$ tend to infinity, under mild assumptions. We also prove that our estimator of the heritability satisfies a central limit theorem, which gives as a byproduct a confidence interval for the heritability. Monte Carlo experiments are also conducted in order to show the finite sample performance of our estimator.


Introduction
Linear mixed models (LMMs) have been widely used in various fields such as agriculture, biology, medicine and genetics. In quantitative genetics, LMMs have been used for estimating the heritability of traits and breeding values, as explained for instance by Lynch and Walsh (1998). In Genome Wide Association Studies (GWAS), which is the application field that inspired our work, Yang et al. (2011) suggested using linear mixed models, in which genotypes are measured at a large number of single nucleotide polymorphisms (SNPs) in large samples of individuals, in order to identify genetic variants that explain variations in phenotypes.
The model that we shall study in this paper is a LMM defined as
$$Y = X\beta + Zu + e, \qquad (1)$$
where $Y = (Y_1, \ldots, Y_n)'$ is the vector of observations, $X$ is a $n \times p$ matrix of predictors, $\beta$ is a $p \times 1$ vector containing the unknown linear effects of the predictors, and $u$ and $e$ correspond to the random effects. Moreover, in (1), $Z$ is a $n \times N$ random matrix which will be further described in Section 2. We shall assume that the random effects can be sparse, that is, that only a proportion $q$ of the components of $u$ are non-zero:
$$u_i \overset{\text{i.i.d.}}{\sim} (1 - q)\,\delta_0 + q\,\mathcal{N}(0, \sigma_u^2), \text{ for all } 1 \le i \le N, \quad \text{and} \quad e \sim \mathcal{N}\left(0, \sigma_e^2\, \mathrm{Id}_{\mathbb{R}^n}\right), \qquad (2)$$
where $\mathrm{Id}_{\mathbb{R}^n}$ denotes the $n \times n$ identity matrix, $q$ is in $(0, 1]$, and $\delta_0$ is the point mass at 0. Note that this corresponds to a more general situation than the usual assumption of (non-sparse) Gaussian random effects, which is recovered when $q = 1$. The use of linear mixed models to estimate heritability was proposed by Yang et al. (2011) as an alternative to the regression models usually used in GWAS. The goal is to consider the joint effect of all SNPs on a phenotype, and the heritability corresponds to the proportion of phenotypic variance explained by all SNPs.
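The following minimal R sketch illustrates how observations can be drawn from Model (1) with the sparse random effects (2). All sizes and variances are illustrative choices of ours, the fixed effects part $X\beta$ is omitted (it is projected out in Section 2), and $Z$ is a Gaussian placeholder for the genetic design described below.

```r
## Minimal sketch: draw one sample from Model (1) with sparse
## random effects (2). All parameter values are illustrative.
set.seed(1)
n <- 200; N <- 2000; q <- 0.1
sigma_u <- 0.1; sigma_e <- 1
Z <- matrix(rnorm(n * N), n, N)   # placeholder design; see (4)-(5) for the genetic case
pi <- rbinom(N, 1, q)             # indicators of the non-null components of u
u <- pi * rnorm(N, 0, sigma_u)    # u_i ~ (1 - q) delta_0 + q N(0, sigma_u^2)
e <- rnorm(n, 0, sigma_e)         # e ~ N(0, sigma_e^2 Id)
Y <- Z %*% u + e                  # X beta omitted (cf. the projection in Section 2)
```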
In the GWAS framework, $Z$ is thus a matrix having a number of rows equal to the number of individuals in the experiment, that is $n \approx 1000$, and a number of columns equal to the number of SNPs taken into account in the experiment, namely $N \approx 500{,}000$. This application motivated the framework that we chose, where $n$ and $N$ tend to infinity.
The major difference between the framework of Yang et al. (2011) and ours is that they consider that the random effects are Gaussian, while we consider a mixture model between a point mass at 0 and a Gaussian distribution. With this modeling, we assume that the SNPs are not necessarily all causal, that is, that not all SNPs explain a given phenotype.
Our main goal in this paper is to propose an estimator for the heritability in this possibly sparse framework and to establish its theoretical properties in the non-standard theoretical context where $n$ and $N$ tend to infinity.
In this paper, we prove that a strategy close to the one proposed by Pirinen, Donnelly and Spencer (2013), which was devised in the case $q = 1$, provides consistent estimators even in the case where $q < 1$. Moreover, we prove that this estimator is $\sqrt{n}$-consistent in the following asymptotic framework: $n \to \infty$ and $N \to \infty$ such that $n/N \to a > 0$, and that it satisfies, under mild assumptions, a central limit theorem in both cases $q = 1$ and $q < 1$. It has to be noticed that the classical results that exist for linear mixed models are established only in the case where $q = 1$, $n$ tends to infinity and $N$ is constant.
The paper is organized as follows. Section 2 provides a detailed description of the model and the heritability estimator that we propose. Section 3 reviews existing methods for heritability estimation. Section 4 is dedicated to the theoretical properties of our estimator. The numerical results are presented in Section 5. They have been obtained thanks to the R package HiLMM that we have developed, which is available from the Comprehensive R Archive Network (CRAN). In Section 6, we provide some additional comments on our work as well as some prospects, such as the estimation of the proportion $q$ of non-null components in the random effects. Finally, the proofs are given in Section 7.

Model
In the sequel, up to considering the projection of $Y$ onto the orthogonal of the image of $X$, and for notational simplicity, we shall focus on the following model:
$$Y = Zu + e, \qquad (3)$$
where $Y = (Y_1, \ldots, Y_n)'$ is the vector of observations and $u$ and $e$ correspond to the random effects, which are defined in (2). Moreover, $Z$ is a $n \times N$ random matrix such that the $Z_{i,j}$ are normalized random variables in the following sense: they are defined from a matrix $W = (W_{i,j})_{1 \le i \le n,\, 1 \le j \le N}$ by
$$Z_{i,j} = \frac{W_{i,j} - \overline{W}_j}{s_j}, \quad \text{where} \quad \overline{W}_j = \frac{1}{n}\sum_{i=1}^n W_{i,j} \qquad (4)$$
and
$$s_j^2 = \frac{1}{n}\sum_{i=1}^n \left(W_{i,j} - \overline{W}_j\right)^2. \qquad (5)$$
In (4) and (5), the $W_{i,j}$'s are such that, for each $j$ in $\{1, \ldots, N\}$, the $(W_{i,j})_{1 \le i \le n}$ are independent and identically distributed random variables, and the columns of $W$ are independent. With this definition, the columns of $Z$ are empirically centered and normalized.
In genetic applications, the matrix $W$ contains all the genetic information about all the individuals in the study. More precisely, for each $j$, the $(W_{i,j})_{1 \le i \le n}$ are i.i.d. binomial random variables with parameters 2 and $p_j$: $W_{i,j} = 0$ (resp. 1, resp. 2) if the genotype of the $i$th individual at locus $j$ is qq (resp. Qq, resp. QQ), where $p_j$ is the frequency of the Q allele at locus $j$.
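For illustration, the following R sketch generates such a genotype matrix and standardizes its columns as in (4) and (5); the sizes and the range of the $p_j$'s are illustrative (the latter borrowed from the experiments of Section 5).

```r
## Sketch of (4)-(5): genotype matrix W with W_ij ~ Binomial(2, p_j),
## standardized column by column so that the columns of Z are
## empirically centered and normalized.
set.seed(1)
n <- 200; N <- 2000
p <- runif(N, 0.1, 0.5)                       # allele frequencies
W <- sapply(p, function(pj) rbinom(n, 2, pj)) # n x N genotype matrix
Z <- scale(W)                                 # column-wise standardization
## scale() uses the denominator (n - 1); if (5) is taken with the
## denominator n, rescale: Z <- Z * sqrt((n - 1) / n).
```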
In Model (1) with (4), (5) and (2), one can observe that
$$\mathrm{Var}(Y_i) \approx N q \sigma_u^2 + \sigma_e^2.$$
Inspired by Pirinen, Donnelly and Spencer (2013), Model (1) can be rewritten by using the following parameters:
$$\eta^{\star} = \frac{N q \sigma_u^2}{N q \sigma_u^2 + \sigma_e^2} \quad \text{and} \quad \sigma^{\star 2} = N q \sigma_u^2 + \sigma_e^2. \qquad (6)$$
The parameter $\eta^{\star}$, which belongs to $[0, 1]$, is commonly called the heritability in the case where $q = 1$, see for instance Yang et al. (2010), and determines how the variance is shared between $u$ and $e$ when all the components of $u$ are non-zero. We propose in (6) to extend this definition to the case where $u$ may contain null components and $q$ is in $(0, 1]$. The parameter $q$ actually corresponds to the proportion of non-null components in $u$, that is, to the proportion of causal SNPs. Then, the heritability defined by $\eta^{\star}$ in (6) corresponds to the proportion of phenotypic variance explained by the causal variants.
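To see where (6) comes from, a short variance computation may help; it uses $\mathbb{E}[Z_{i,j}^2] \approx 1$, which holds since the columns of $Z$ are empirically normalized, together with $\mathrm{Var}(u_j) = q\sigma_u^2$:
$$\mathrm{Var}(Y_i) = \mathrm{Var}\Bigl(\sum_{j=1}^N Z_{i,j} u_j\Bigr) + \mathrm{Var}(e_i) \approx \sum_{j=1}^N \mathbb{E}\bigl[Z_{i,j}^2\bigr]\,\mathrm{Var}(u_j) + \sigma_e^2 = N q \sigma_u^2 + \sigma_e^2,$$
so that $\eta^{\star}$ is indeed the fraction of the variance of $Y_i$ coming from the random effects $u$.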

Heritability estimator
In the case where $q = 1$, observe that
$$Y \,\big|\, Z \sim \mathcal{N}\!\left(0,\; \sigma^{\star 2}\left(\eta^{\star} R + (1 - \eta^{\star})\, \mathrm{Id}_{\mathbb{R}^n}\right)\right), \quad \text{where } R = \frac{ZZ'}{N}$$
and $\eta^{\star}$ and $\sigma^{\star}$ are defined in (6).
Let $U$ be the orthogonal matrix ($U'U = UU' = \mathrm{Id}_{\mathbb{R}^n}$) such that $U'RU = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix having its diagonal entries equal to $\lambda_1, \ldots, \lambda_n$, the eigenvalues of $R$. Hence, in the case where $q = 1$ and conditionally to $Z$, $\widetilde{Y} = U'Y$ is a zero-mean Gaussian vector having a covariance matrix equal to $\sigma^{\star 2}\,\mathrm{diag}\left(\eta^{\star}(\lambda_1 - 1) + 1, \ldots, \eta^{\star}(\lambda_n - 1) + 1\right)$. The method proposed by Pirinen, Donnelly and Spencer (2013) consists in computing the log-likelihood of this Gaussian vector and in maximizing this function of two variables by iterative optimization techniques. Since in our case we are only interested in estimating $\eta^{\star}$, we plugged the maximum likelihood estimator of $\sigma^{\star 2}$, namely $\widehat{\sigma}^2(\eta) = n^{-1}\sum_{i=1}^n \widetilde{Y}_i^2/(\eta(\lambda_i - 1) + 1)$, into the log-likelihood. Thus, in the case $q = 1$, the maximum likelihood strategy leads to estimating $\eta^{\star}$, assumed to be in $[0, 1 - \delta]$ with $\delta > 0$, by $\widehat{\eta}$ defined as a maximizer of
$$L_n(\eta) = -\log\left[\frac{1}{n}\sum_{i=1}^n \frac{\widetilde{Y}_i^2}{\eta(\lambda_i - 1) + 1}\right] - \frac{1}{n}\sum_{i=1}^n \log\left(\eta(\lambda_i - 1) + 1\right), \qquad (7)$$
where the $\widetilde{Y}_i$'s are the components of the vector $\widetilde{Y} = U'Y$. We shall establish in Theorem 2, which is proved in Section 7.2, that this strategy produces $\sqrt{n}$-consistent estimators of $\eta^{\star}$ in both cases $q = 1$ and $q < 1$, and also that this estimator satisfies a central limit theorem, which provides as a by-product confidence intervals for $\eta^{\star}$.
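As an illustration, the criterion (7) is straightforward to implement once the spectral decomposition of $R$ is available. The R sketch below is our own illustrative code, not the HiLMM implementation, and it maximizes $L_n$ over $[0, 1 - \delta]$ with R's optimize in place of the Newton-Raphson recursion described in Section 5; Y and Z are assumed to come from the sketches above.

```r
## Sketch of the estimator defined by (7): maximize the profiled
## log-likelihood over [0, 1 - delta].
R_mat <- tcrossprod(Z) / ncol(Z)          # R = Z Z' / N
eig <- eigen(R_mat, symmetric = TRUE)
lambda <- eig$values                      # eigenvalues lambda_1, ..., lambda_n
Ytilde <- crossprod(eig$vectors, Y)       # Ytilde = U' Y
Ln <- function(eta) {
  v <- eta * (lambda - 1) + 1             # conditional variances up to sigma^2
  -log(mean(Ytilde^2 / v)) - mean(log(v))
}
delta <- 0.01
eta_hat <- optimize(Ln, c(0, 1 - delta), maximum = TRUE)$maximum
```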

Existing methods for heritability estimation
Several approaches can be used for estimating the heritability in the case where $q = 1$, but to the best of our knowledge, no theoretical results concerning the estimation of the heritability or the estimation of $\sigma_u$, $\sigma_e$ have been established in the framework where both $n$ and $N$ tend to infinity. This is one of the contributions of our paper. Among these approaches, we can quote the REML (REstricted Maximum Likelihood) approach, originally proposed by Patterson and Thompson (1971) and described for instance in Searle, Casella and McCulloch (1992), which consists in first estimating $\sigma_u$ and $\sigma_e$ and then estimating $\eta^{\star}$ by the ratio $\widehat{\eta} = N\widehat{\sigma}_u^2/(N\widehat{\sigma}_u^2 + \widehat{\sigma}_e^2)$. However, this type of approach is based on iterative procedures that require expensive matrix operations. Hence, several approximations have been proposed, such as the AI algorithm (Gilmour, Thompson and Cullis (1995)), which is used for instance in the software GCTA (Genome-wide Complex Trait Analysis) described in Yang et al. (2011). Other approximations have also been proposed in the EMMA algorithm (Kang et al. (2008)). For further details on the different approximations that could be used, we refer the reader to Pirinen, Donnelly and Spencer (2013). The latter paper proposes another methodology for estimating the heritability, which consists in rewriting Model (1) with the parameters (6) and in using an eigenvalue decomposition of the matrix $R$. Further details on their methodology are given hereafter. According to the numerical experiments conducted in Pirinen, Donnelly and Spencer (2013), their approach has the lowest computational burden among the available algorithms.
In the sparse high-dimensional framework, most papers have studied the case of linear models. Among them, we can quote Meinshausen and Bühlmann (2010) and Beinrucker, Dogan and Blanchard (2014). The high-dimensional linear mixed model where $u$ is sparse, that is the case where $q < 1$, which is the most realistic case for the applications that we have in view, has received little attention. It has been studied in two directions: detection and estimation. Concerning the detection field in this framework, we are only aware of the work of Arias-Castro, Candès and Plan (2011), in which a testing procedure for detecting a sparse vector in the high-dimensional sparse linear regression model is proposed and compared with the one proposed by Ingster, Tsybakov and Verzelen (2010). As for the procedures dedicated to heritability estimation, there exist, to the best of our knowledge, only three approaches: the approach of Yang et al. (2010), who propose to approximate the genetic correlation between every pair of individuals across the set of causal SNPs by the genetic correlation across the set of all SNPs; the approach of Golan and Rosset (2011), who propose a methodology based on the MCEM (Monte-Carlo expectation-maximization) algorithm developed by Wei and Tanner (1990); and the Bayesian approaches of Guan and Stephens (2011) and Zhou, Carbonetto and Stephens (2013). However, as far as the estimation issue in the high-dimensional linear mixed model is concerned, the authors of these papers did not establish the theoretical properties of their estimators in the framework where both $n$ and $N$ tend to infinity.

Theoretical results
Observe that another way of writing Model (3) with the parameters defined in (6) consists in writing
$$Y = \sigma^{\star}\left(\sqrt{\frac{\eta^{\star}}{Nq}}\, Z t + \sqrt{1 - \eta^{\star}}\,\varepsilon\right), \qquad (8)$$
where $\varepsilon$ is a $n \times 1$ Gaussian vector having a covariance matrix equal to identity and $t = (t_1, \ldots, t_N)'$ is a random vector such that $t_i = \pi_i w_i$, where the $w_i$'s and the $\pi_i$'s are independent, $w = (w_1, \ldots, w_N)'$ is a Gaussian vector with a covariance matrix equal to identity, and the $\pi_i$'s are i.i.d. Bernoulli random variables such that $\mathbb{P}(\pi_1 = 1) = q$.
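To check that (8) is indeed Model (3), note that $u_i = \sigma_u \pi_i w_i$ in distribution, and that (6) gives $\sigma^{\star}\sqrt{\eta^{\star}/(Nq)} = \sigma_u$ and $\sigma^{\star}\sqrt{1 - \eta^{\star}} = \sigma_e$, so that
$$\sigma^{\star}\left(\sqrt{\frac{\eta^{\star}}{Nq}}\, Z t + \sqrt{1 - \eta^{\star}}\,\varepsilon\right) = \sigma_u Z t + \sigma_e \varepsilon = Zu + e \quad \text{in distribution}.$$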
The estimator $\widehat{\eta}$ is defined as a maximizer of $L_n(\eta)$ for $\eta \in [0, 1 - \delta]$, for some small $\delta > 0$, $L_n$ being given in (7). We shall study the asymptotic properties of $\widehat{\eta}$ as $n$ and $N$ tend to infinity in a comparable way, that is, when $n/N \to a > 0$.
To understand the asymptotic behavior of $\widehat{\eta}$, we shall first prove its consistency, then use a Taylor expansion of the derivative of $L_n$ around $\eta^{\star}$ in the usual way. As can be seen in (7), the computations involve empirical means of functions of the eigenvalues $\lambda_i$ of $R = ZZ'/N$. Using Theorem 1.1 of Bai and Zhou (2008), we shall prove the almost sure convergence of such empirical quantities under a weak assumption, denoted Assumption 1, as follows.
(A1) Let $Z$ and $W$ be two matrices defined by (4) and (5). Recall that, for each $j$ in $\{1, \ldots, N\}$, the $(W_{i,j})_{1 \le i \le n}$ are independent and identically distributed random variables and that the columns of $W$ are independent (but not necessarily identically distributed). Assume that the entries $W_{i,j}$ of $W$ are uniformly bounded and have variance uniformly lower bounded, that is: there exist $W_M < \infty$ and $\kappa > 0$ such that $0 \le W_{i,j} \le W_M$ and $\sigma_j^2 = \mathrm{Var}(W_{i,j}) \ge \kappa$, for all $j$. The following lemma ensures that the result of Marchenko and Pastur (1968), which gives the empirical spectral distribution of sample covariance matrices $ZZ'/N$, holds even when the entries $Z_{i,j}$ of the matrix $Z$ are not i.i.d. random variables but $Z$ is obtained by empirical standardization of a matrix $W$ satisfying Assumption 1.
Lemma 1. Under Assumption 1, as $n, N \to \infty$ with $n/N \to a > 0$, the empirical spectral distribution $F_n(x) = n^{-1}\sum_{k=1}^n \mathbf{1}_{\{\lambda_k \le x\}}$ tends almost surely to the Marchenko-Pastur distribution, defined as the distribution function of $\mu_a$, where, for any measurable set $A$,
$$\mu_a(A) = \left(1 - \frac{1}{a}\right)_+ \delta_0(A) + \int_A \frac{\sqrt{(a_+ - x)(x - a_-)}}{2\pi a x}\, \mathbf{1}_{[a_-, a_+]}(x)\, dx, \quad \text{with } a_{\pm} = (1 \pm \sqrt{a})^2.$$

Our first main result is the $\sqrt{n}$-consistency of the estimator $\widehat{\eta}$.

Theorem 1. Assume that $Y$ is given by (8) with $\eta^{\star} > 0$ and that the entries $W_{i,j}$ of $W$ satisfy Assumption 1. Then, for all $q$ in $(0, 1]$, $\sqrt{n}(\widehat{\eta} - \eta^{\star}) = O_P(1)$ as $n, N \to \infty$ with $n/N \to a > 0$.

Such a result is a theoretical cornerstone to legitimate the use of an estimator. However, statistical inference has to be based on confidence sets. The next step is thus to find the asymptotic distribution of $\sqrt{n}(\widehat{\eta} - \eta^{\star})$. Define, for any $\eta \in [0, 1]$ and $\lambda \ge 0$,
$$g(\eta, \lambda) = \frac{\lambda - 1}{\eta(\lambda - 1) + 1} \quad \text{and} \quad \gamma^2(a, \eta) = \int g(\eta, \lambda)^2\, d\mu_a(\lambda) - \left(\int g(\eta, \lambda)\, d\mu_a(\lambda)\right)^2.$$
Define also the quantities $S(a, \eta)$ and $\tau^2(a, \eta, q)$ that appear in the limiting variance below. We are now ready to state our second main result, about the asymptotic distribution of $\sqrt{n}(\widehat{\eta} - \eta^{\star})$. For general $q$, the result only holds when the entries of $Z$, that is the random variables $Z_{i,j}$, are i.i.d. standard Gaussian. Indeed, as may be seen when computing the variances, we need to be able to find the asymptotic behavior of empirical means of functions of the eigenvalues together with the eigenvectors of the matrix $R = ZZ'/N$.

Theorem 2. Assume that $Y$ is given by (8) with $\eta^{\star} > 0$ and that the random variables $Z_{i,j}$ are i.i.d. $\mathcal{N}(0, 1)$. Then, for any $q \in (0, 1]$, as $n, N \to \infty$ with $n/N \to a > 0$, $\sqrt{n}(\widehat{\eta} - \eta^{\star})$ converges in distribution to a centered Gaussian random variable with variance $\tau^2(a, \eta^{\star}, q)$.

In the case where $q = 1$, the result holds in the general situation described in Assumption 1, and allows us to propose confidence sets with precise asymptotic confidence level.

Theorem 3. Assume that $Y$ is given by (8) with $q = 1$ and $\eta^{\star} > 0$. Assume also that the entries $W_{i,j}$ of $W$ satisfy Assumption 1. Then, as $n, N \to \infty$ such that $n/N \to a > 0$, $\gamma_n (n/2)^{1/2}(\widehat{\eta} - \eta^{\star})$ converges in distribution to $\mathcal{N}(0, 1)$, where $\gamma_n^2$ denotes the empirical counterpart of $\gamma^2(a, \eta^{\star})$, obtained by replacing $\mu_a$ with the empirical spectral distribution of $R$ and $\eta^{\star}$ with $\widehat{\eta}$.
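As a sanity check of Lemma 1, the histogram of the $\lambda_i$'s can be compared with the Marchenko-Pastur density; below is a minimal R sketch for the case $a \le 1$ with Gaussian entries (the sizes are illustrative).

```r
## Visual check of Lemma 1: eigenvalues of ZZ'/N against the
## Marchenko-Pastur density with ratio a = n/N (case a <= 1).
mp_demo <- function(n = 500, N = 2500) {
  a <- n / N
  lam <- eigen(tcrossprod(matrix(rnorm(n * N), n, N)) / N,
               symmetric = TRUE, only.values = TRUE)$values
  hist(lam, breaks = 40, freq = FALSE, main = "ESD vs Marchenko-Pastur")
  am <- (1 - sqrt(a))^2; ap <- (1 + sqrt(a))^2
  curve(sqrt(pmax((ap - x) * (x - am), 0)) / (2 * pi * a * x),
        from = am, to = ap, add = TRUE, lwd = 2)
}
mp_demo()
```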
Let us now give some additional comments on the previous results. Firstly, it has to be noticed that none of the limiting variances depends on $\sigma^{\star}$. Secondly, Theorem 2 is proved here only in the case where the $Z_{i,j}$ are i.i.d. Gaussian. This is because we used several times the fact that the matrix of eigenvectors of $ZZ'/N$ is independent of the eigenvalues and uniformly distributed on the set of orthogonal matrices. We think that the result of Theorem 2 is also valid when the $Z_{i,j}$ are defined from the $W_{i,j}$ satisfying Assumption 1, as suggested by the numerical results obtained in Section 5; to prove it would require new results in an active research topic of the random matrix theory field. We can observe in the expression of $\tau^2(a, \eta^{\star}, q)$ given in Theorem 2 that the presence of $q$ is counterbalanced by the presence of $a^2$. This will be confirmed by the numerical results given in Section 5. Finally, we can observe that $2/(n\gamma_n^2)$ corresponds to the usual inverse of the Fisher information associated with $\eta$. This result is classical in the case where $N$ is fixed and $n$ tends to infinity, but did not exist in the framework where both $n$ and $N$ tend to infinity, even though it was already used in applied biological papers for deriving standard errors and confidence intervals. Theorem 3 proves that this result still holds even in the case where both $n$ and $N$ tend to infinity.
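In the case $q = 1$, Theorem 3 translates directly into a confidence interval: an asymptotic 95% interval is $\widehat{\eta} \pm 1.96\,\sqrt{2/n}\,/\,\gamma_n$. A minimal R sketch, reusing lambda and eta_hat from the sketch of (7) above and computing $\gamma_n$ as the empirical standard deviation of the $g(\widehat{\eta}, \lambda_i)$'s:

```r
## Asymptotic 95% confidence interval for the heritability when q = 1,
## based on Theorem 3, with g(eta, lambda) = (lambda - 1)/(eta*(lambda - 1) + 1).
g <- (lambda - 1) / (eta_hat * (lambda - 1) + 1)
gamma_n <- sqrt(mean(g^2) - mean(g)^2)
half_width <- qnorm(0.975) * sqrt(2 / length(lambda)) / gamma_n
c(lower = eta_hat - half_width, upper = eta_hat + half_width)
```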
To the best of our knowledge, the effect of the presence of null components in the random effects had never been taken into account for computing the asymptotic variance of an estimator of the heritability. This is the contribution of Theorem 2. This theorem shows that the asymptotic variance contains an additional term which increases its value in the case $q < 1$ with respect to the case $q = 1$. It is shown in Section 5 how the computation of the asymptotic variance is altered if this additional term is neglected. In practical situations, computing the standard error given by Theorem 2 requires the knowledge of $q$, which is in general unknown. However, if an estimate of $q$ is available by any practical means, the result of Theorem 2 can be used for computing confidence intervals and standard errors; see Section 6 for further details.

Numerical experiments
In this section, we first explain how to implement our method and then we illustrate the theoretical results of Section 4 on finite sample size observations in both cases: $q = 1$ and $q < 1$. We also compare the results obtained with our approach to those obtained by the GCTA software described in Yang et al. (2010) and Yang et al. (2011), which is a reference in quantitative genetics.

Implementation
In order to obtain $\widehat{\eta}$, we used a Newton-Raphson approach, which is based on the following recursion: starting from an initial value $\eta^{(0)}$,
$$\eta^{(k+1)} = \eta^{(k)} - \frac{L_n'(\eta^{(k)})}{L_n''(\eta^{(k)})},$$
where $L_n'$ and $L_n''$ denote the first and second derivatives of $L_n$ defined in (7), respectively. The closed-form expressions of these quantities are given in (13) and (25), respectively. In practice, this approach converges after at most 20 iterations and is not very sensitive to the initialization, namely to the value of $\eta^{(0)}$. However, in particular cases, the value of the initialization can have an influence on the estimation of $\eta^{\star}$. This is the case, for instance, when the real value $\eta^{\star}$ is close to 1. In these situations, our algorithm can provide an estimate bigger than 1, and we constrained our method to return a value equal to 0.99. Figure 1 shows the estimations obtained on 100 replications when $a = 0.1$ and $\eta^{\star} = 0.8$. From this figure, we can see that the estimation of $\eta^{\star}$ does not depend in general on the initialization, except in some cases. Moreover, the best choice for $\eta^{(0)}$ is not constant from one replication to another. In order to limit the effect of the initialization, our algorithm uses several values for $\eta^{(0)}$ and, whenever the estimations differ, it keeps the estimation which is the farthest away from the boundaries.

Fig 1. Estimation of $\eta^{\star}$ obtained in the case $a = 0.1$ and $\eta^{\star} = 0.8$ for different values of the initialization: $\eta^{(0)} = 0.1$ (dots), $\eta^{(0)} = 0.5$ (triangles) and $\eta^{(0)} = 0.9$ (crosses). The plain line displays the estimations obtained with our method to select the best initialization value, and the x-axis is the replication number.
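A hedged R sketch of this recursion follows, with numerically differentiated $L_n'$ and $L_n''$ standing in for the closed-form expressions (13) and (25), together with the multiple-start rule described above; Ln is the function from the sketch of (7).

```r
## Newton-Raphson sketch for maximizing Ln, using numerical derivatives
## in place of the closed forms (13) and (25).
newton_eta <- function(Ln, eta0, h = 1e-5, tol = 1e-8, maxit = 20) {
  eta <- eta0
  for (k in seq_len(maxit)) {
    d1 <- (Ln(eta + h) - Ln(eta - h)) / (2 * h)            # Ln'(eta)
    d2 <- (Ln(eta + h) - 2 * Ln(eta) + Ln(eta - h)) / h^2  # Ln''(eta)
    eta_new <- min(max(eta - d1 / d2, 0), 0.99)            # clip to [0, 0.99]
    if (abs(eta_new - eta) < tol) break
    eta <- eta_new
  }
  eta
}
## Several starting values; keep the estimate farthest from the boundaries.
starts <- c(0.1, 0.5, 0.9)
cands <- vapply(starts, function(s) newton_eta(Ln, s), numeric(1))
eta_hat <- cands[which.max(pmin(cands, 1 - cands))]
```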

Results in model (2) when q = 1
We shall first consider the performance of the estimator $\widehat{\eta}$ when $q = 1$, for $\eta^{\star}$ in $\{0.3, 0.5, 0.7\}$, $n = 1000$, $\sigma_u = 0.1$ and $a$ in $\{0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1\}$, where $a = n/N$. We generated 500 data sets according to Model (1) using these parameters and $Z$ as defined in (4), where the $W_{i,j}$ are binomial random variables with parameters 2 and $p_j$. In our experiments, the $p_j$'s are uniformly drawn in $[0.1, 0.5]$. The corresponding boxplots of $\widehat{\eta}$ are displayed in Figure 2. We can see from this figure that our approach provides unbiased estimators of $\eta^{\star}$ and that the smaller $a$ is, the larger the empirical variance.
In order to illustrate the central limit theorem given in Theorem 3, we first display in Figure 3 the histograms of $\gamma_n (n/2)^{1/2}(\widehat{\eta} - \eta^{\star})$ along with the p.d.f. of a standard Gaussian random variable for $\eta^{\star} = 0.5$ and different values of $a$. We can see that the Gaussian p.d.f. fits the data well in all the considered cases. We also display in Figure 4 the values of $\sqrt{2/(n\gamma_n^2)}$ and the empirical standard deviation of $(\widehat{\eta} - \eta^{\star})$ averaged over all the experiments. As predicted by Theorem 3, we observe empirically that both quantities are very close.
In practice, the value of $\gamma_n^{-1}(n/2)^{-1/2}$ can be used for deriving confidence intervals for $\eta^{\star}$. As we can see from Figure 4, our approach leads to very accurate confidence intervals for $a$ larger than 0.1, even in finite sample size cases.
Let us now compare our results with those obtained with the software GCTA. As we can see from Figure 5, which displays the boxplots of $\widehat{\eta}$ for different values of $a$ when $\eta^{\star} = 0.7$, the results found by our approach and GCTA are very close. In both cases, we observe that when $a$ is close to 1 the estimations of $\eta^{\star}$ are very accurate, but when $a$ is small the standard error becomes very high.

Results in model (2) when q < 1
This section is dedicated to the study of the performance of $\widehat{\eta}$ when $q < 1$. We generated 500 data sets according to Model (1) for $\eta^{\star} = 0.7$, $a \in \{0.05, 0.1, 0.5, 1\}$, different values of $q$, and $Z$ defined in (4), where the $W_{i,j}$ are binomial random variables with parameters 2 and $p_j$. In our experiments, the $p_j$'s are uniformly drawn in $[0.1, 0.5]$.
Figure 6 displays the boxplots of $\widehat{\eta}$ for these parameters. We can see from this figure that, for small values of $a$, the estimators of $\eta^{\star}$ have the same behavior for $q = 1$ and $q < 1$. However, when $a = 1$ or $a = 0.5$, we can see from this figure that the presence of null components strongly alters the performance of the estimator of $\eta^{\star}$. Since in typical GWAS experiments $a = 0.01$ or even smaller, the results of Figure 6 could lead to the conclusion that considering the case $q < 1$ is not necessary for such values of the parameter $a$. However, as already noticed from Figure 2, the variance of $\widehat{\eta}$ is very large for small values of $a$; hence, considering the presence of null components and proposing a strategy for selecting only the non-null components of $u$ could be one way to increase $a$ and thus to diminish the variance of $\widehat{\eta}$.
In order to illustrate the central limit theorem given in Theorem 2, we first display in Figure 7 the histograms of $\tau_n^{-1} n^{1/2}(\widehat{\eta} - \eta^{\star})$ along with the p.d.f. of a standard Gaussian random variable for $\eta^{\star} = 0.7$, two values of $q$ ($q = 0.01$ and $q = 0.1$) with $a = 0.5$ (top), and two values of $a$ ($a = 0.2$ and $a = 0.5$) with $q = 0.5$ (bottom). Here, $\tau_n$ is the empirical version of $\tau(a, \eta^{\star}, q)$, where $\gamma$ is replaced by $\gamma_n$ and $S(a, \eta^{\star})$ is replaced by its empirical version computed with the eigenvalues of $R$. When $a$ is large ($a = 0.5$), we can see that the higher $q$ is, the better the Gaussian p.d.f. fits the histograms.
We also display in Figure 8 the values of $n^{-1/2}\tau_n$ and the empirical standard deviation of $(\widehat{\eta} - \eta^{\star})$ averaged over all the experiments for $\eta^{\star} = 0.7$ and $q = 0.5$. As predicted by Theorem 2, we observe empirically that both quantities are very close. We also display in this figure the value of $n^{-1/2}\tau_n$ with $q = 1$, which boils down to considering the asymptotic standard deviation found in the non-sparse model. We can see from this figure that neglecting the term depending on $q$ leads to underestimating the asymptotic variance of $\widehat{\eta}$, and that this difference is all the more striking as $a$ is close to 1.

Fig 7. Histograms of $\tau_n^{-1} n^{1/2}(\widehat{\eta} - \eta^{\star})$ for $a = 0.5$ and $q = 0.5$ (top left), $a = 0.1$ and $q = 0.1$ (top right), $a = 0.1$ and $q = 0.01$ (bottom left), and $a = 0.05$ and $q = 0.1$ (bottom right).

Discussion
In the course of this study, we have proposed a methodology for estimating the heritability in high-dimensional linear mixed models. This methodology has two main features. Firstly, the theoretical performance of our estimator is established in a non-standard theoretical framework where $n$ and $N$ tend to infinity and where the components of the random effect part can be equal to zero. Secondly, the computational burden of our approach is very low, which makes its use possible on real data coming from GWAS experiments.
As a byproduct of the central limit theorem that we establish for $\widehat{\eta}$, we can derive a confidence interval for the heritability. However, the confidence intervals depend on $q$, which is the proportion of non-null components in $u$ and which is in general unknown. For estimating $q$, several strategies can be considered; see the sketch after this paragraph for a crude one. One could, for instance, use a GWAS approach to compute the p-values of the correlation tests of each SNP with the observations $Y$ and then keep only the most significant ones. Such a practical approach can be used for providing a lower bound for $q$. A refinement of this approach has been proposed by Toro et al. (2014), who observed, through numerical studies, that for a fixed value of the heritability, the smaller the number of causal SNPs, the lower the minimal p-value. Hence, performing a GWAS approach on a given data set allows them to have an idea of the number of SNPs which are likely to be causal. One could also propose another practical method based on a variable selection technique. Such an approach has already been proposed by Fan and Li (2012) in the context of sparse linear mixed models. However, the framework in which their theoretical results are derived is different from the one considered in our paper. We are currently working on a paper (Bonnet et al. (2015)) which presents a variable selection method adapted to our framework and which could be used for estimating the proportion $q$ of non-null components in the random effects.
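A rough R illustration of the first strategy (our own sketch, not a method from the paper): compute marginal p-values and count the SNPs that survive a Bonferroni correction, the count divided by $N$ giving a crude lower bound for $q$.

```r
## Crude lower bound for q: proportion of SNPs whose marginal
## correlation test with Y survives a Bonferroni correction.
## Z and Y are assumed to come from the sketches of Section 2.
pvals <- apply(Z, 2, function(z) cor.test(z, as.vector(Y))$p.value)
q_lower <- mean(pvals < 0.05 / ncol(Z))
```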
Moreover, we did not take into account the linkage disequilibrium issue, which would require extending our results to the case where the columns of the random matrix are correlated. This question will be the subject of a future work.

Proofs
Let us write the singular value decomposition (SVD) of the $n \times N$ matrix $Z/\sqrt{N}$:
$$\frac{Z}{\sqrt{N}} = U \left(\sqrt{D} \;\; 0\right) V',$$
where $U$ (already introduced in Section 1) is a $n \times n$ orthogonal matrix, $V$ is a $N \times N$ orthogonal matrix and $\sqrt{D}$ is a $n \times n$ diagonal matrix having its diagonal entries equal to $\sqrt{\lambda_i}$, the $\lambda_i$'s being the eigenvalues of $R = ZZ'/N$ previously defined. Denoting by $\widetilde{V}$ the $N \times n$ matrix consisting of the first $n$ columns of $V$, so that $Z/\sqrt{N} = U\sqrt{D}\,\widetilde{V}'$, (8) rewrites as
$$\widetilde{Y} = U'Y = \sigma^{\star}\left(\sqrt{\frac{\eta^{\star}}{q}}\, \sqrt{D}\, \widetilde{V}' t + \sqrt{1 - \eta^{\star}}\,\widetilde{\varepsilon}\right), \qquad (11)$$
where $\widetilde{\varepsilon} = U'\varepsilon$ is a $n \times 1$ centered Gaussian vector having a covariance matrix equal to identity. We shall use repeatedly the following lemma, which is proved in Section 7.4.
Lemma 2. Let $\widetilde{Y}$ be defined by (11) and let $H$ be a $n \times n$ diagonal matrix. Then
$$\mathbb{E}\left[\widetilde{Y}' H \widetilde{Y} \,\middle|\, Z\right] = \sigma^{\star 2} \sum_{i=1}^n H_{i,i}\left(\eta^{\star}(\lambda_i - 1) + 1\right),$$
and $\mathrm{Var}\left[\widetilde{Y}' H \widetilde{Y} \,\middle|\, Z\right]$ admits a similar closed-form expression in terms of $H$, the $\lambda_i$'s, $\eta^{\star}$, $\sigma^{\star}$ and $q$. Another useful lemma is the following.
Lemma 3. Under Assumption 1, let $h : \mathbb{R}_+ \to \mathbb{R}_+$ be a continuous function such that there exist $\alpha > 0$ and $C$ such that $h(\lambda) \le C(1 + \lambda^{\alpha})$ for all $\lambda \ge 0$. Then, as $n, N \to \infty$ with $n/N \to a > 0$, $n^{-1}\sum_{i=1}^n h(\lambda_i)$ converges in probability to $\int h(\lambda)\, d\mu_a(\lambda)$. The proof of this lemma follows from the application of Lemma 1 to the bounded function $h\mathbf{1}_{h \le M}$ and the Markov inequality applied to the empirical mean of $h\mathbf{1}_{h > M}$.
Lemma 4. Under Assumption 1, let $n, N \to \infty$ be such that $n/N \to a > 0$. Then there exists $C$ such that, for all $n$, $\mathbb{E}\left(n^{-1}\sum_{i=1}^n \lambda_i^2\right) \le C$. To prove the lemma, notice that $n^{-1}\sum_{i=1}^n \lambda_i^2 = n^{-1}\mathrm{Tr}(R^2)$, where the values of the involved expectations may be found in the proof of Lemma 1 in Section 7.4. We thus obtain the stated bound, which ends the proof.

Proof of Theorem 1
The first step is to prove the consistency of $\widehat{\eta}$. We shall first prove that $L_n(\eta)$ converges uniformly for $\eta \in [0, 1 - \delta]$, in probability, to $L(\eta)$ given by
$$L(\eta) = -\log\left[\sigma^{\star 2} \int \frac{\eta^{\star}(\lambda - 1) + 1}{\eta(\lambda - 1) + 1}\, d\mu_a(\lambda)\right] - \int \log\left(\eta(\lambda - 1) + 1\right) d\mu_a(\lambda).$$
Using Lemma 2 with $H$ having diagonal entries $1/(\eta(\lambda_i - 1) + 1)$, we get that
$$\frac{1}{n}\sum_{i=1}^n \frac{\widetilde{Y}_i^2}{\eta(\lambda_i - 1) + 1} = \frac{\sigma^{\star 2}}{n}\sum_{i=1}^n \frac{\eta^{\star}(\lambda_i - 1) + 1}{\eta(\lambda_i - 1) + 1} + o_P(1),$$
which, together with Lemma 3, leads to the pointwise convergence of $L_n(\eta)$ to $L(\eta)$ in probability. In order to prove the uniform convergence of $L_n$ to $L$ in probability on $[0, 1 - \delta]$, we shall use the following lemma, which is proved in Section 7.4.

Lemma 5. Assume that, for any $\eta \in [0, 1 - \delta]$, $L_n(\eta)$ converges in probability to $L(\eta)$ and that $\sup_{\eta \in [0, 1 - \delta]} |L_n'(\eta)| = O_P(1)$. Then $\sup_{\eta \in [0, 1 - \delta]} |L_n(\eta) - L(\eta)| = o_P(1)$, as $n$ tends to infinity.

Let us now prove that
$$\sup_{\eta \in [0, 1 - \delta]} |L_n'(\eta)| = O_P(1). \qquad (12)$$
The study of the function $\eta \mapsto g(\eta, \lambda)$ shows that it is decreasing and that it takes negative values for $\eta \in [0, 1 - \delta]$ when $\lambda \le 1$, so that its absolute value is then maximum for $\eta = 1 - \delta$. By Lemma 2 with $H = \mathrm{Id}$, we get
$$\frac{1}{n}\sum_{i=1}^n \widetilde{Y}_i^2 = \frac{\sigma^{\star 2}}{n}\sum_{i=1}^n \left(\eta^{\star}(\lambda_i - 1) + 1\right) + o_P(1) = O_P(1),$$
where the last equality comes from Lemma 3. In the same way, we get, by using Lemma 2 with $H$ having its diagonal entries equal to $\lambda_i$ and Lemma 3, that $n^{-1}\sum_{i=1}^n \lambda_i \widetilde{Y}_i^2 = O_P(1)$. Finally, we get from Lemma 3 that $n^{-1}\sum_{i=1}^n \lambda_i = O_P(1)$, which ends the proof of (12). By Lemma 5, we thus have proved that
$$\sup_{\eta \in [0, 1 - \delta]} |L_n(\eta) - L(\eta)| = o_P(1). \qquad (14)$$
Now, using Jensen's inequality, we easily get that, for all $\eta \in [0, 1]$, $L(\eta) \le L(\eta^{\star})$, with equality if and only if $\eta = \eta^{\star}$. This, together with (14), gives
$$\widehat{\eta} = \eta^{\star} + o_P(1). \qquad (15)$$
The next step is to prove that $\sqrt{n}(\widehat{\eta} - \eta^{\star}) = O_P(1)$. Let us first note that $\widehat{\eta}$ satisfies the following equation:
$$L_n'(\widehat{\eta}) = 0. \qquad (16)$$
We first prove the asymptotic convergence of $L_n''(\widehat{\eta})$.
Let us now focus on the properties of $L_n'(\eta^{\star})$. Using the notation $g(\eta^{\star}, \lambda_i) = (\lambda_i - 1)/(\eta^{\star}(\lambda_i - 1) + 1)$, we see that $\sqrt{n} L_n'(\eta^{\star})$ can be rewritten in terms of empirical means of the $g(\eta^{\star}, \lambda_i)$'s weighted by the $\widetilde{Y}_i^2$'s. Using Lemma 2 and Lemma 3, we get that these weighted empirical means are $O_P(1)$. Moreover, by Lemma 3, $n^{-1}\sum_{i=1}^n g(\eta^{\star}, \lambda_i)$ converges in probability to $\int g(\eta^{\star}, \lambda)\, d\mu_a(\lambda)$. Using again Lemma 2 and Lemma 3, we obtain $\sqrt{n} L_n'(\eta^{\star}) = O_P(1)$.
This, together with Lemma 6 and (16), ends the proof of Theorem 1.

Proof of Theorem 2
Notice first that all previous results may be used, replacing Assumption 1 by the assumption that the $Z_{i,j}$ are i.i.d. standard Gaussian. Indeed, in this case, Lemma 1 reduces to the original result of Marchenko and Pastur (1968), Lemma 3 only involves Lemma 1 and truncation arguments, and the computations leading to Lemma 4 still hold. Thus, Theorem 1 and Lemma 6 also still hold.
Let us now prove that $\sqrt{n} L_n'(\eta^{\star})$ converges in distribution to a centered Gaussian. Define $H$ as the diagonal $n \times n$ matrix whose diagonal entries are the weights appearing in the expression of $L_n'(\eta^{\star})$. Then, using (18) and Lemma 3, $\sqrt{n} L_n'(\eta^{\star})$ reduces, up to an $o_P(1)$ term, to a centered quadratic form in $\widetilde{Y}$ associated with $H$. Now, using Lemma 2 and setting $\gamma_n^2 = \mathrm{Var}\left[\sqrt{n} L_n'(\eta^{\star}) \mid Z\right]$, we obtain a decomposition of $\gamma_n^2$ into two terms. The first term in this sum converges, as $n, N \to \infty$, to $2\sigma^{\star 4}\gamma^2(a, \eta^{\star})$. Under the assumption that the $Z_{i,j}$ are i.i.d. standard Gaussian, the matrix of eigenvectors $V$ is Haar distributed on the orthogonal matrices and is independent of $(\lambda_i)_{1 \le i \le n}$; see Bai and Silverstein (2010), Chapter 10. Conditionally to the eigenvalues $(\lambda_i)_{1 \le i \le n}$, the remaining term, whose conditional variance given $Z$ we denote by $s_{n,1}^2$, can be handled in the same way as $\gamma_n^2$, and we get that
$$s_{n,1}^2 = \sigma^{\star 4} \eta^{\star 2} \left(\frac{1}{q} - 1\right) S(a, \eta^{\star}) + o_P(1).$$
Since $\eta^{\star} > 0$, the function $\lambda \mapsto \lambda(\lambda - 1)/(\eta^{\star}(\lambda - 1) + 1)^2$ is bounded, and the variables involved satisfy the Lindeberg condition for large enough $n$. Then, by Lindeberg's Theorem, conditionally to $Z$, the corresponding normalized sum converges in distribution to $\mathcal{N}(0, 1)$.
We shall prove that, conditionally to $Z$ and $\Delta$, $(L_n' - \mathbb{E}(L_n' \mid \Delta, Z))/s_{n,2}$ converges in distribution to $\mathcal{N}(0, 1)$, and thus also unconditionally. Here, $\widetilde{V}$ is the $N \times n$ matrix which consists of the first $n$ columns of $V$. Let $\varphi$ be the characteristic function of $(L_n' - \mathbb{E}(L_n' \mid \Delta, Z))/s_{n,2}$ conditionally to $Z$ and $\Delta$. Notice first that, if $b_j$, $j = 1, \ldots, n + N$, are the eigenvalues of $B$, we may write the quadratic form as $\sum_{j=1}^{n+N} b_j (e_j^2 - 1)$ for random variables $e_j$ i.i.d. standard Gaussian, so that a Taylor expansion of $\log \varphi$ applies. We shall now prove that $\sum_{j=1}^{N+n} b_j^2$ converges in probability. Using the distribution of $V$ and its independence of $D$, together with tedious computations, we obtain the limit of $\sum_{j=1}^{N+n} b_j^2$. We shall now prove that $\sum_{j=1}^{N+n} b_j^3 = o_P(1)$. To do so, it is enough to prove that $\max_j |b_j| = o_P(\sqrt{n})$. Notice that it suffices to bound $A'BA$ for any normed vector $A$. First, since $\eta^{\star} > 0$, all entries of $H$, $D$ and $HD$ are uniformly bounded, and so are all entries of $\Delta$. We thus get $A_2' H A_2 = O(1)$ and $A_1'(\Delta' \widetilde{V} \sqrt{D} H) A_2 = O(1)$. Then, using the distribution of $V$ and its independence of $D$, we get $A'BA = O_P(1)$. We have thus proved that $\max_j |b_j| = O_P(1) = o_P(\sqrt{n})$. Thus $\varphi(t)$ converges in probability, for all $t$, to $\exp(-t^2/2)$, and the convergence may be strengthened, by contradiction, to an almost sure convergence, so that, conditionally to $Z$ and $\Delta$, $(L_n' - \mathbb{E}(L_n' \mid \Delta, Z))/s_{n,2}$ converges in distribution to $\mathcal{N}(0, 1)$. Now, conditionally to $Z$ and $\Delta$, the limit is a Gaussian random variable independent of $\Delta$. Thus, conditionally to $Z$, the two parts converge in distribution to independent Gaussian variables, so that their sum converges in distribution to a centered Gaussian with variance the sum of the variances, namely the limit of $\gamma_n^2$, and Theorem 2 is proved.
Notice first that, when $q = 1$, $(\widetilde{U}_1, \ldots, \widetilde{U}_n) \mid Z$ is a centered Gaussian vector with a covariance matrix equal to $\sigma^{\star 2}$ times the identity matrix. We shall prove that, conditionally to $Z$, $\sqrt{n} L_n'(\eta^{\star})$ converges in distribution to $\mathcal{N}(0, 2\sigma^{\star 4}\gamma^2(a, \eta^{\star}))$, so that the result still holds unconditionally. Using (18), it is only needed to prove it for the leading term. Also, setting
$$\xi_i = \left(\widetilde{U}_i^2 - 1\right)\left[g(\eta^{\star}, \lambda_i) - \int g(\eta^{\star}, \lambda)\, d\mu_a(\lambda)\right]$$
and letting $C$ be an upper bound of $|g(\eta^{\star}, \lambda)|$, we get that, for any $c > 0$, the Lindeberg condition holds for the $\xi_i$'s, where the first equality comes from the fact that the distribution of $(\widetilde{U}_1, \ldots, \widetilde{U}_n) \mid Z$ does not depend on $Z$ and is thus also the distribution of $(\widetilde{U}_1, \ldots, \widetilde{U}_n)$. Then, using Lindeberg's Theorem, conditionally to $Z$, $\sqrt{n} L_n'(\eta^{\star})$ converges in distribution to $\mathcal{N}(0, 2\sigma^{\star 4}\gamma^2(a, \eta^{\star}))$, and thus also unconditionally.
The fact that $\gamma_n^2$ converges in probability to $\gamma^2(a, \eta^{\star})$ is a straightforward consequence of a Taylor expansion, the fact that $g(\eta, \lambda)$ and its derivative with respect to $\eta$ in a neighborhood of $\eta^{\star}$ are bounded functions of $\lambda$, and Slutsky's Lemma.
In particular, for $\eta = \eta^{\star}$, the result follows from a bound on $L_n^{(3)}(\eta)$, the third derivative of $L_n(\eta)$, and a similar handling of empirical means as before. Indeed, all the functions of $\lambda$ involved are bounded as soon as $\alpha$ is such that $\eta^{\star} \ge 2\alpha$.

Fig 2. Boxplots of $\widehat{\eta}$ for different values of $a$, for $\eta^{\star} = 0.3$ (left), $\eta^{\star} = 0.5$ (middle) and $\eta^{\star} = 0.7$ (right). The horizontal line corresponds to the true value of $\eta^{\star}$. The whiskers of each boxplot correspond to the first and third quartiles.

Fig 5. Boxplots of $\widehat{\eta}$ for different values of $a$, using our method (dark gray) and GCTA (light gray). The whiskers of each boxplot are the first and third quartiles.

Fig 6. Boxplots of $\widehat{\eta}$ for different values of $q$, with $\eta^{\star} = 0.7$ and $a = 1$ (top left), $a = 0.5$ (top right), $a = 0.1$ (bottom left) and $a = 0.01$ (bottom right). The boxplots are based on 500 replications. The whiskers of each boxplot are the first and third quartiles.