Large deviations for the largest eigenvalue of matrices with variance profiles

In this article we consider Wigner matrices $X_N$ with variance profiles (also called Wigner-type matrices), which are of the form $X_N(i,j) = \sigma(i/N,j/N) a_{i,j} / \sqrt{N}$ where $\sigma$ is a symmetric, non-negative real function on $[0,1]^2$, taken either continuous or piecewise constant. We prove a large deviation principle for the largest eigenvalue of these matrices under the same sharp sub-Gaussian bound condition on the entries as in the Wigner case, together with some additional assumptions on $\sigma$. These sub-Gaussian bounds are verified for example by Gaussian variables, Rademacher variables and uniform variables on $[-\sqrt{3}, \sqrt{3}]$.


Introduction
One of the key results in random matrix theory is Wigner's theorem: it establishes the convergence of the empirical measure of the eigenvalues of Wigner matrices towards the semi-circular measure [37]. These Wigner matrices are a model of real or complex self-adjoint random matrices with independent centered subdiagonal entries of variance $1/N$ and independent centered diagonal entries of variance $O(1/N)$. Later, Füredi and Komlós proved that the largest eigenvalue of such matrices converges almost surely toward 2 [25] under an assumption of boundedness on the moments of the entries. This moment hypothesis was then relaxed to a hypothesis of boundedness of the fourth moment by Vu in [36], which was later proved to be necessary by Lee and Yin in [31]. Similar results also exist for Wishart matrices (that is, matrices of the form $\frac{1}{M} X^* X$ where $X$ is an $M \times N$ random matrix with i.i.d. centered entries of variance 1) and for matrices with variance profiles (that is, self-adjoint random matrices whose diagonal and subdiagonal entries are independent and centered but whose entry variances may not be constant up to a factor $1/N$). In that case the limit of the empirical measure depends on the profile [26].
Once one knows the limits of the empirical measure and the largest eigenvalue, one can wonder how the probability that they are away from these limits behaves. These questions are of great importance for instance in mobile communication systems [18,24] and in the study of the energy landscape of disordered systems [10,32]. In the case of matrices from the Gaussian Orthogonal Ensemble or the Gaussian Unitary Ensemble, thanks to the orthogonal or unitary invariance of the distributions, the joint law of the eigenvalues is explicitly known (see for example [33]) and the spectrum behaves like a so-called β-ensemble. By Laplace's principle, once one takes care of the singularities, those formulas lead to large deviation principles both for the empirical measure [16] and the largest eigenvalue [15].
In the case of general distributions, since eigenvalues are complicated functions of the entries, large deviations remain mysterious. Concentration of measure results were obtained in compactly supported and log-Sobolev settings by Guionnet and Zeitouni [29]. Several recent breakthroughs proved large deviation principles for matrices whose entries have distributions with tails heavier than Gaussian, both for the empirical measure and for the largest eigenvalue, by Bordenave and Caputo and by Augeri respectively [11,19]. Those results rely on the fact that the large deviation behaviour comes from a small number of large entries. These ideas are further generalized to the questions of subgraph counts and eigenvalues of random graphs in [12,21,17]. In the case of sub-Gaussian entries, a large deviation principle for the largest eigenvalue of matrices with Rademacher-distributed entries was proved by Guionnet and the author in [27] using the asymptotics of Itzykson-Zuber integrals computed by Guionnet and Maïda in [28]. Indeed, one obtains the large deviations by tilting the measure by spherical integrals. Under this tilted law, the matrix is roughly distributed as a sum $W_N + R$ where $W_N$ is a Wigner matrix and $R$ a deterministic matrix of rank one. Then, the largest eigenvalue of such a deformed model is well known and follows the phenomenon of BBP transition (coined after Baik, Ben Arous and Péché who observed it in the case of deformations of sample covariance matrices [14]). Notably, the rate function for the large deviation principle of the largest eigenvalue of a matrix with Rademacher entries is the same as for the GOE. The crucial hypothesis verified by the Rademacher law, which ensures that the upper and lower large deviation bounds both coincide with those of the Gaussian case, is so-called sharp sub-Gaussianity. This property of the Rademacher law is expressed in terms of its Laplace transform: a centered law $\mu$ of variance $v$ has a sharp sub-Gaussian Laplace transform when $\int e^{tx} d\mu(x) \le e^{v t^2/2}$ for every $t \in \mathbb{R}$. For distributions that are sub-Gaussian but not sharply so, large deviation lower and upper bounds were also proved by Augeri, Guionnet and the author in [13] for large values and values near the bulk of the limit measure. In this case though, the rate function near infinity can be strictly smaller than the rate function for the GOE.
Wigner's original approach to determine the limit of the empirical measure was to estimate the traces of moments of Wigner matrices, but a more modern approach is to estimate the resolvent $(z - W_N)^{-1}$ using the Schur complement formula. One then finds that the Stieltjes transform $m$ of the limit measure must be a solution of the so-called Dyson equation

$$m(z) = \frac{1}{z - m(z)},$$

with the convention that the Stieltjes transform of $\mu$ is $z \mapsto \int (z-x)^{-1} d\mu(x)$. Furthermore, if for $z \in \mathbb{H}^+$ (where $\mathbb{H}^+$ is $\{z \in \mathbb{C} : \Im(z) > 0\}$) we impose the condition $\Im m(z) < 0$, then the only solution to this equation for $z \in \mathbb{H}^+$ is $m(z) = (z - \sqrt{z^2 - 4})/2$, which is the Stieltjes transform of the semicircular measure $\sqrt{(4 - x^2)_+}\, dx/2\pi$. In the case of matrices with variance profiles, it can be computed again by the Schur complement formula applied to the resolvent $G(z) = (z - W_N)^{-1}$, which shows that, up to an error term, its diagonal terms satisfy the following equation, which admits only one solution with negative imaginary part:

$$G_{i,i}(z) \simeq \Big(z - \frac{1}{N}\sum_{j} \sigma^2(i/N, j/N)\, G_{j,j}(z)\Big)^{-1}.$$

Then, using that the Stieltjes transform of the empirical measure is $N^{-1} \sum_i G_{i,i}(z)$, one can find the limit measure. This equation has been used to study those matrices for instance by Girko in [26], by Khorunzhy and Pastur in [30], Anderson and Zeitouni in [8] and Shlyakhtenko in [34]. It was extensively studied in itself by Alt, Erdős, Ajanki, Krüger and Schröder in a series of articles where it is used to prove local laws and universality of the local eigenvalue statistics in the bulk, at the cusp and at the edge of the spectrum [4,22,20,1,3,2,5]. One may want to look at [23] for a more thorough review on the subject.
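To make the fixed-point structure of this Dyson equation concrete, here is a minimal numerical sketch (ours, not from the paper; the function name `dyson_density` and all parameter choices are illustrative) that iterates the discretized equation and reads the density off the imaginary part of the averaged solution.

```python
import numpy as np

def dyson_density(sigma, N=200, x_grid=None, eta=0.05, n_iter=300):
    """Iterate m_i = 1/(z - (1/N) sum_j sigma(i/N,j/N)^2 m_j) at z = x + i*eta."""
    if x_grid is None:
        x_grid = np.linspace(-3.0, 3.0, 61)
    u = np.arange(1, N + 1) / N
    S = sigma(u[:, None], u[None, :]) ** 2 / N   # discretized kernel (1/N) * sigma^2
    dens = []
    for x in x_grid:
        z = x + 1j * eta
        m = np.full(N, -1j)                      # start in the lower half-plane
        for _ in range(n_iter):
            m = 1.0 / (z - S @ m)                # the fixed-point iteration
        # with this convention Im m < 0 on H+, so the density is -Im(mean)/pi
        dens.append(-np.mean(m.imag) / np.pi)
    return x_grid, np.array(dens)

# sanity check: the constant profile sigma = 1 recovers the semicircle density
x, d = dyson_density(lambda x, y: np.ones_like(x * y))
```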
In this article, we will use the techniques developed in [27] and apply them to random matrices with variance profiles to prove a large deviation principle for the largest eigenvalue. We will place ourselves in the same context of entries with sharp sub-Gaussian laws. Such a result is new, even for matrices with Gaussian entries, and once again our rate function will not depend on the laws of the entries. We will consider a symmetric (or Hermitian) matrix model $(X_N)_{N \in \mathbb{N}}$ with independent sub-diagonal entries with a variance profile $\Sigma_N(i,j) = (N\, \mathbb{E}[|X_N(i,j)|^2])^{1/2}$. We will consider a piecewise constant case where $\Sigma_N$ is equal to some $\sigma_{k,l}$ on squares of the form $I_k^{(N)} \times I_l^{(N)}$, where $(I_k^{(N)})_{1 \le k \le n}$ is a collection of disjoint intervals covering $\{1, \dots, N\}$ and such that $I_k^{(N)}/N$ converges to some non-trivial interval $I_k$ of $[0,1]$. In this case we will define $\sigma$ to be the piecewise constant function equal to $\sigma_{k,l}$ on $I_k \times I_l$. We will also consider the case of a variance profile which converges toward a continuous function $\sigma$ in the sense that $\lim_N \sup_{i,j} |\Sigma_N(i,j) - \sigma(i/N, j/N)| = 0$. In both cases the empirical measure converges to a measure $\mu_\sigma$ characterized by the fact that its Stieltjes transform is equal to $m(z) = \int_0^1 m(x,z)\, dx$ where $m$ is the only solution of the equation (see [26]):

$$m(x,z)^{-1} = z - \int_0^1 \sigma^2(x,y)\, m(y,z)\, dy, \quad \forall z \in \mathbb{H}^+,$$

with the condition that $\Im m(x,z) < 0$ for all $x \in [0,1]$, $z \in \mathbb{H}^+$. We will then find that large deviations of the largest eigenvalue occur when we tilt our measure so that $X_N$ has roughly the same law as $\tilde{X}_N + R$, where $\tilde{X}_N$ is a random matrix with the same variance profile as $X_N$ and where $R$ is deterministic and of finite rank. Since $\tilde{X}_N$ will not be a Wigner matrix, finding the correct tilt will be more involved than in the classical case and will require some additional hypotheses on the variance profile in order for the tilt to yield the desired lower bound.
First, in section 2 we will introduce the rate function and the assumption on the variance profile we will need in order for our large deviation lower bound to coincide with our upper bound. In sections 3 to 5 we will treat the case of matrices with piecewise constant variance profile, which bears the most similarities with the models treated in [27]. In these sections we will insist on the differences with [27] while redirecting the reader to it for the parts of the proofs that stay the same. We will first prove a large deviation upper bound using an annealed spherical integral in section 4. We will then tilt our initial measure to prove the lower bound in section 5. There we will use the assumption made in section 2 to prove that we can find a good tilt. In section 6 we will approximate the case of a continuous variance profile using piecewise constant ones. We will have to prove the convergence of the rate functions of the approximations. Since the approximations will only satisfy our lower bound up to an error term, we will also prove that this error can ultimately be neglected. In section 7 we will illustrate the cases where our result applies in the simple context of a piecewise constant variance profile with four blocks. In the same section we will illustrate the limits of our approach and the necessity of making some assumptions on the variance profile, with an example of a matrix whose variance profile does not satisfy our assumptions and such that the rate function for the large deviations of the largest eigenvalue does not match our rate function. Finally, in section 8 we will discuss the explicit value of the rate function and in particular we will present a condition which, when verified, ensures that the rate function depends on the variance profile only through the limit measure of the matrix model.

Acknowledgement
The author would like to thank Alice Guionnet for her help proofreading this article and particularly the introduction, Ion Nechita for bringing to his attention the example in Remark 2.4, and the referees for their careful reading and suggestions. This work was partially supported by the ERC Advanced Grant LDRaM (ID:884584).

Variance profiles
In the rest of the article, a real number $x$ is said to be non-negative if $x \ge 0$, $\mathbb{R}^+$ is the set $\{x \in \mathbb{R} : x \ge 0\}$ and $\mathbb{R}^{+,*} = \mathbb{R}^+ \setminus \{0\}$. Our random matrix model will be of the form $W_N \odot \Sigma_N$, where $W_N$ is either a real or a complex Wigner matrix, $\Sigma_N$ is a real symmetric matrix and $\odot$ is the entrywise product. $\mathcal{P}(A)$, where $A$ is a measurable space, will denote the set of probability measures on $A$. For $n \in \mathbb{N}$ and a set $A$, we denote by $S_n(A)$ the set of symmetric matrices with entries in $A$. First of all, we describe the matrices $\Sigma_N$ we will be using. These matrices will converge, as piecewise constant functions on $[0,1]^2$, to some function $\sigma$ on $[0,1]^2$ called the variance profile. We will consider here two cases: the case where $\sigma$ is piecewise constant and the case where it is continuous.

Piecewise constant variance profile:
We consider a variance profile piecewise constant on rectangular blocks. Let $n \in \mathbb{N}^*$, let $\Sigma = (\sigma_{i,j})_{i,j \in \{1,\dots,n\}}$ be a real symmetric $n \times n$ matrix with non-negative coefficients and let $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n$ be such that for every $i$, $\alpha_i > 0$ and $\alpha_1 + \dots + \alpha_n = 1$. In this context we will consider $\Sigma_N$ defined block by block by

$$\Sigma_N(k,l) = \sigma_{i,j} \quad \text{for } k \in I_i^{(N)},\ l \in I_j^{(N)},$$

where $\{I_1^{(N)}, \dots, I_n^{(N)}\}$ is a partition of $\{1, \dots, N\}$ into intervals $I_j^{(N)} = \{a_{j-1}^N + 1, \dots, a_j^N\}$, where for every $N$ the $a_j^N$ are such that $0 = a_0^N \le a_1^N \le \dots \le a_n^N = N$, and such that for $j \in \{1,\dots,n\}$:

$$\lim_{N \to \infty} \frac{|I_j^{(N)}|}{N} = \alpha_j.$$

We then define $(\gamma_i)_{0 \le i \le n}$ and $(I_i)_{1 \le i \le n}$ by $\gamma_0 = 0$, $\gamma_i = \alpha_1 + \dots + \alpha_i$ and $I_i = [\gamma_{i-1}, \gamma_i)$. We shall also denote by $\sigma : [0,1]^2 \to \mathbb{R}^+$ the piecewise constant function defined by $\sigma(x,y) = \sigma_{i,j}$ for $(x,y) \in I_i \times I_j$. This setting will be referred to as the case of a piecewise constant variance profile associated to the parameters $\Sigma$ and $\alpha$.
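For illustration, the following hedged sketch (our code; `build_sigma_N` and `sample_X` are not from the paper) builds the block profile $\Sigma_N$ from $(\Sigma, \alpha)$ and samples one realization of $W_N \odot \Sigma_N$ with Rademacher entries ($\beta = 1$):

```python
import numpy as np

def build_sigma_N(Sigma, alpha, N):
    """Piecewise constant N x N profile from the n x n matrix Sigma and weights alpha."""
    sizes = np.diff(np.round(np.cumsum([0.0] + list(alpha)) * N).astype(int))
    idx = np.repeat(np.arange(len(alpha)), sizes)         # block index of each row
    return Sigma[np.ix_(idx, idx)]

def sample_X(Sigma_N, rng):
    """One realization of W_N entrywise-multiplied by Sigma_N, beta = 1."""
    N = Sigma_N.shape[0]
    A = np.triu(rng.choice([-1.0, 1.0], size=(N, N)), 1)  # Rademacher: sharp sub-Gaussian
    A = A + A.T
    np.fill_diagonal(A, np.sqrt(2.0) * rng.choice([-1.0, 1.0], size=N))  # variance 2
    return Sigma_N * A / np.sqrt(N)

rng = np.random.default_rng(0)
X = sample_X(build_sigma_N(np.array([[1.0, 2.0], [2.0, 0.5]]), [0.3, 0.7], 600), rng)
spec = np.linalg.eigvalsh(X)    # a histogram of spec approximates mu_sigma
```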
Continuous variance profile: In this case, we will consider a real non-negative symmetric continuous function $\sigma : [0,1]^2 \to \mathbb{R}^+$ and for every $N$ we will consider a symmetric matrix with non-negative entries $\Sigma_N$ such that the sequence $\Sigma_N$ satisfies

$$\lim_{N \to \infty} \sup_{i,j} |\Sigma_N(i,j) - \sigma(i/N, j/N)| = 0.$$

In both cases, we will call $\sigma$ the variance profile of the matrix model.

The generalized Wigner matrix model
For the Wigner matrix $W_N$, we will consider two cases: a real symmetric one when $\beta = 1$ and a complex Hermitian one when $\beta = 2$. For every $N$, we will consider a family of independent random variables $(a_{i,j}^{(\beta)})_{1 \le i \le j \le N}$ with laws $(\mu_{i,j}^N)_{1 \le i \le j \le N}$: for $\beta = 1$, $a_{i,j}^{(1)}$ is a real random variable for all $i,j$, and for $\beta = 2$, $a_{i,j}^{(2)}$ will be a complex random variable for $i \ne j$ and a real random variable for $i = j$. These will be the unrenormalized entries of $W_N$. We will assume that all the $\mu_{i,j}^N$ are centered. For $\beta = 1$ we assume that off-diagonal entries have variance 1 and diagonal entries have variance 2:

$$\mathbb{E}[(a_{i,j}^{(1)})^2] = 1 \ \text{for } i \ne j, \qquad \mathbb{E}[(a_{i,i}^{(1)})^2] = 2.$$

For $\beta = 2$, if we denote by $x$ the function $z \mapsto \Re z$ and by $y$ the function $z \mapsto \Im z$, we assume the following conditions on the variances of the entries: for $i \ne j$, $\mu_{i,j}^N(x^2) = \mu_{i,j}^N(y^2) = 1/2$ and $\mu_{i,j}^N(xy) = 0$, while $\mu_{i,i}^N(x^2) = 1$. If $\mu$ is a probability measure on some $\mathbb{R}^d$ with covariance matrix $C$, we say that $\mu$ has a sharp sub-Gaussian Laplace transform if

$$T_\mu(t) := \int e^{\langle t, x \rangle} d\mu(x) \le e^{\frac{1}{2}\langle t, Ct \rangle}, \quad \forall t \in \mathbb{R}^d.$$

For $\mu$ a measure on $\mathbb{C}$, we can identify $\mathbb{R}^2$ and $\mathbb{C}$, and then the Laplace transform $T_\mu$ can be expressed for $z \in \mathbb{C}$ as $T_\mu(z) = \int e^{\Re(\bar{z} x)} d\mu(x)$. We will need to make the following assumption on the $\mu_{i,j}^N$:

Assumption 1.1. For every $N$ and every $i \le j$, the law $\mu_{i,j}^N$ has a sharp sub-Gaussian Laplace transform.

In particular, we can notice that in the complex case it implies that for $i \ne j$, $\forall z \in \mathbb{C}$, $T_{\mu_{i,j}^N}(z) \le e^{|z|^2/4}$. Examples of distributions that satisfy this sharp sub-Gaussian bound on $\mathbb{R}$ are the (centered) Gaussian laws, the Rademacher law $\frac{1}{2}(\delta_{-1} + \delta_1)$ and the uniform laws on centered intervals. On $\mathbb{C}$, if $X$ is a random variable such that $\Re(X)$ and $\Im(X)$ are independent and have sharp sub-Gaussian Laplace transforms, then $X$ has a sharp sub-Gaussian Laplace transform.
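As a quick numeric illustration of these examples (our check, not part of the paper), one can verify the sharp sub-Gaussian bound $T_\mu(t) \le e^{t^2/2}$ directly for the two variance-1 laws mentioned above:

```python
import numpy as np

def laplace_rademacher(t):
    return np.cosh(t)                                  # E[e^{tX}] for X = +-1 w.p. 1/2

def laplace_uniform(t):
    s = np.sqrt(3.0)                                   # uniform on [-sqrt(3), sqrt(3)]
    t = np.asarray(t, dtype=float)
    safe = np.where(np.abs(t) < 1e-12, 1.0, t)
    return np.where(np.abs(t) < 1e-12, 1.0, np.sinh(s * safe) / (s * safe))

ts = np.linspace(-5.0, 5.0, 1001)
assert np.all(laplace_rademacher(ts) <= np.exp(ts ** 2 / 2) + 1e-12)
assert np.all(laplace_uniform(ts) <= np.exp(ts ** 2 / 2) + 1e-12)
```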
Remark 1.1. Under Assumption 1.1 and the normalizations above, we have a bound of the form $T_{\mu_{i,j}^N}(t) \le Ce^{C|t|^2}$ for some universal constant $C$. From this bound, we have that for every $\delta > 0$ there exists $\epsilon > 0$ that does not depend on the laws $\mu_{i,j}^N$ such that, denoting $L_{\mu_{i,j}^N} = \log T_{\mu_{i,j}^N}$, for $|t| \le \epsilon$,

$$\Big| L_{\mu_{i,j}^N}(t) - \frac{t^2}{2}\, \mathbb{E}[(a_{i,j}^{(\beta)})^2] \Big| \le \delta t^2.$$

The $T_{\mu_{i,j}^N}$ are also uniformly $\mathcal{C}^3$ in a neighbourhood of the origin: for $\epsilon > 0$ small enough, $\sup_{i,j,N} \sup_{|t| \le \epsilon} |L^{(3)}_{\mu_{i,j}^N}(t)| < \infty$. In the complex case, with the same method we have a similar result, that is, for every $\delta > 0$ there is $\epsilon > 0$ such that for every $z \in \mathbb{C}$ with $|z| \le \epsilon$ and $i \ne j$, $|L_{\mu_{i,j}^N}(z) - \frac{|z|^2}{4}| \le \delta |z|^2$. In both cases, we will need to use concentration inequalities to ensure that, at the exponential scale we consider, the empirical measure of our matrices can be approximated by its typical value. To this end, we will need the following classical assumption.

Assumption 1.2.
There exists a compact set $K$ such that the support of all the $\mu_{i,j}^N$ is included in $K$ for all $i,j \in \{1,\dots,N\}$ and every integer $N$, or all the $\mu_{i,j}^N$ satisfy a log-Sobolev inequality with the same constant $c$ independent of $N$. In the complex case, we will suppose also that for all $(i,j)$, if $Y$ is a random variable of law $\mu_{i,j}^N$, there is a complex $a \ne 0$ such that $\Re(aY)$ and $\Im(aY)$ are independent. Now, for $\beta = 1$ or $2$ and $N \in \mathbb{N}$, given the family $(a_{i,j}^{(\beta)})$, we define the Wigner matrix $W_N^{(\beta)}$ by $W_N^{(\beta)}(i,j) = a_{i,j}^{(\beta)}/\sqrt{N}$ and set $X_N^{(\beta)} = W_N^{(\beta)} \odot \Sigma_N$. If $A$ is a self-adjoint $N \times N$ matrix, we denote by $\lambda_{\min}(A) = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N = \lambda_{\max}(A)$ its eigenvalues and by $\mu_A$ its empirical measure $\mu_A := N^{-1}\sum_{i=1}^N \delta_{\lambda_i}$. We will abbreviate $\mu_{X_N^{(\beta)}}$ as $\hat{\mu}_N$ throughout the article.

Statement of the results
First of all, with this matrix model, we state the existence of a limit in probability of the empirical measure $\hat{\mu}_N$: the empirical measure $\hat{\mu}_N$ converges in probability toward a deterministic measure $\mu_\sigma$ which depends only on the limit $\sigma$ of the variance profile. This limit is described in more detail in Appendix A, where this result is proved. It is in fact an almost direct consequence of [26, Theorem 1.1]. It can also be obtained in the piecewise case, and in the continuous case with the additional assumption that $\sigma$ is $1/2$-Hölder, by applying Lemma 9.2 from [6], which itself uses stability results for the Dyson equation. Since the Dyson equation of our setting is simpler, we present a more elementary proof of this result using rougher stability results in Appendix A. We denote by $r_\sigma$ the rightmost point of the support of $\mu_\sigma$. First of all, we have the following result for the convergence of the largest eigenvalue of $X_N^{(\beta)}$.

Theorem 1.3. The largest eigenvalue $\lambda_{\max}(X_N^{(\beta)})$ converges in probability toward $r_\sigma$.

This theorem is a generalization of the result of convergence of the largest eigenvalue toward 2 in the Wigner case, which was proved by Füredi and Komlós [25] for distributions with moments such that $\mathbb{E}[|a_{i,j}^{(\beta)}|^k] \le k^{Ck}$ for some $C > 0$, and then by Vu for distributions with finite fourth moment [36]. For this result, we only need a bound of the form $\mathbb{E}[|a_{i,j}^{(\beta)}|^k] \le r_k$ for some sequence $(r_k)_{k \in \mathbb{N}}$ (this hypothesis is automatically verified with our sharp sub-Gaussian bound). There are numerous similar results of convergence for the largest eigenvalue in the literature for models similar to this one; unfortunately, to the author's knowledge, none seems to quite correspond to the level of generality we are going for in this paper (the closest to our model would be Theorem 2.7 from [7], but here we would like to allow for rectangular blocks). Therefore we will be using a stronger kind of result, namely the local laws from [2] (Corollary 2.10), in the case of a positive piecewise constant variance profile. The non-negative case as well as the continuous case will be proven by approximation; the only technicality is to prove that when we approximate a variance profile $\sigma$ by a sequence of variance profiles $(\sigma_n)$, the rightmost point of the support of $\mu_{\sigma_n}$ converges toward the rightmost point of the support of $\mu_\sigma$ (see Lemma 6.4). Although using the local law may seem excessive for the purpose of proving the convergence of the largest eigenvalue, its anisotropic version will end up being used in the large deviation lower bound in section 5.
For the following theorem, which states a large deviation principle for $\lambda_{\max}(X_N^{(\beta)})$, we will need Assumptions 2.1 and 2.3, respectively for the case of a piecewise constant variance profile and for the case of a continuous variance profile. These assumptions are more thoroughly discussed in section 2. Assumption 2.3 states that the following optimization problem over $\psi \in \mathcal{P}([0,1])$:

$$\sup_{\psi \in \mathcal{P}([0,1])} \Big\{ \frac{\theta^2}{\beta} P(\sigma, \psi) - \frac{\beta}{2} D(\mathrm{Leb}\,\|\,\psi) \Big\}$$

has a determination of its maximum argument that is continuous in $\theta$.
Similarly, Assumption 2.1 states that the following optimization problem over $\psi \in (\mathbb{R}^+)^n$ such that $\sum_i \psi_i = 1$:

$$\sup_{\psi} \Big\{ \frac{\theta^2}{\beta} \sum_{i,j} \sigma_{i,j}^2 \psi_i \psi_j + \frac{\beta}{2} \sum_i \alpha_i \log \frac{\psi_i}{\alpha_i} \Big\}$$

has a determination of its maximum argument that is continuous in $\theta$. Both assumptions are needed to obtain the large deviation lower bound.
Theorem 1.4. In the piecewise constant case under Assumption 2.1, and in the continuous case under Assumption 2.3, $\lambda_{\max}(X_N^{(\beta)})$ satisfies a large deviation principle with speed $N$ and good rate function $I^{(\beta)}$ which is infinite on $(-\infty, r_\sigma)$. In other words, for any closed subset $F$ of $\mathbb{R}$,

$$\limsup_{N \to \infty} \frac{1}{N} \log \mathbb{P}[\lambda_{\max}(X_N^{(\beta)}) \in F] \le - \inf_F I^{(\beta)},$$

whereas for any open subset $O$ of $\mathbb{R}$,

$$\liminf_{N \to \infty} \frac{1}{N} \log \mathbb{P}[\lambda_{\max}(X_N^{(\beta)}) \in O] \ge - \inf_O I^{(\beta)}.$$

The same result holds for the opposite of the smallest eigenvalue, $-\lambda_{\min}(X_N^{(\beta)})$.

The rate function
We will now define the rate function $I^{(\beta)}$ of Theorem 1.4. This is in fact done the same way as in [27], as the supremum $\sup_{\theta \ge 0} (J(\mu_\sigma, \theta, x) - F(\theta))$. In this formula, $J(\mu_\sigma, \theta, x)$ is the limit of $N^{-1} \log \mathbb{E}[\exp(N\theta \langle e, A_N e \rangle)]$, where $e$ is a unit vector taken uniformly on the sphere and $A_N$ is a sequence of matrices whose empirical measures converge weakly to $\mu_\sigma$ and whose largest eigenvalues converge to $x$. $F(\theta)$ is the limit of $N^{-1} \log \mathbb{E}[\exp(N\theta \langle e, X_N e \rangle)]$, where the expectation is taken both over $X_N$ and $e$. We will first describe the quantity $F(\theta)$.

The asymptotics of the annealed spherical integral
For $\sigma : [0,1]^2 \to \mathbb{R}^+$ a bounded measurable function and $\psi$ a probability measure on $[0,1]$, let us denote

$$P(\sigma, \psi) := \int_0^1 \int_0^1 \sigma^2(x,y)\, d\psi(x)\, d\psi(y),$$

and for $\theta > 0$:

$$F(\sigma, \theta) := \sup_{\mu \in \mathcal{P}([0,1])} \Big\{ \frac{\theta^2}{\beta} P(\sigma, \mu) - \frac{\beta}{2} D(\mathrm{Leb}\,\|\,\mu) \Big\}, \tag{1}$$

where $D(\cdot\|\cdot)$ is the Kullback-Leibler divergence, that is, for $\lambda, \mu \in \mathcal{P}([0,1])$, $D(\lambda\|\mu) = \int \log \frac{d\lambda}{d\mu}\, d\lambda$ if $\lambda$ is absolutely continuous with respect to $\mu$, and $+\infty$ otherwise. We consider here the optimization problem (1) with parameter $\theta > 0$ on the set $\mathcal{P}([0,1])$. First, let us study this problem with the following lemma: if $\sigma$ is bounded and continuous, the supremum is achieved in (1); furthermore, in both the continuous and the piecewise constant cases, the function $F$ is continuous in $\theta$.
Proof. Let us take a sequence of measures $(\mu_n)$ such that $\frac{\theta^2}{\beta} P(\sigma, \mu_n) - \frac{\beta}{2} D(\mathrm{Leb}\|\mu_n)$ converges toward $F(\sigma, \theta)$. By compactness of $\mathcal{P}([0,1])$ for the weak topology, we can assume that this sequence converges weakly to some $\mu$. Since we assume $\sigma$ continuous, $P(\sigma, \cdot)$ is continuous for the weak topology and so $\lim_n P(\sigma, \mu_n) = P(\sigma, \mu)$. Furthermore, since $(\lambda, \mu) \mapsto D(\lambda\|\mu)$ is lower semi-continuous, we have $\liminf_n D(\mathrm{Leb}\|\mu_n) \ge D(\mathrm{Leb}\|\mu)$, so that the supremum is achieved at $\mu$. Furthermore, for every $\mu \in \mathcal{P}([0,1])$, the function $\theta \mapsto \frac{\theta^2}{\beta} P(\sigma,\mu) - \frac{\beta}{2} D(\mathrm{Leb}\|\mu)$ is convex, so that $F(\sigma, \cdot)$ is convex; since $0 \le F(\sigma,\theta) - \frac{\theta^2}{\beta}P(\sigma,\mathrm{Leb}) \le \frac{\theta^2}{\beta} \sup \sigma^2$, it is finite and therefore continuous in $\theta$.

In section 4 we will prove that $F(\sigma, \theta)$ is the limit of the annealed spherical integrals introduced in section 3. In the piecewise constant case, that is, when $\sigma$ is defined by a matrix $(\sigma_{i,j})_{1 \le i,j \le n}$ and parameters $\alpha$, the optimization problem that defines $F$ is a simpler, finite-dimensional one. Indeed, if we denote, for $\psi = (\psi_1, \dots, \psi_n) \in (\mathbb{R}^+)^n$ with $\sum_i \psi_i = 1$,

$$P(\sigma, \psi) := \sum_{i,j} \sigma_{i,j}^2 \psi_i \psi_j,$$

then, replacing $\mu$ by $\sum_i \frac{\psi_i}{\alpha_i} \mathrm{Leb}_{I_i}$, where $\mathrm{Leb}_{I_i}$ is the Lebesgue measure restricted to the interval $I_i$, we easily obtain

$$F(\sigma, \theta) = \sup_{\psi} \Big\{ \frac{\theta^2}{\beta} \sum_{i,j} \sigma_{i,j}^2 \psi_i \psi_j + \frac{\beta}{2} \sum_i \alpha_i \log \frac{\psi_i}{\alpha_i} \Big\}. \tag{2}$$
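The finite-dimensional problem (2) is easy to explore numerically. The sketch below (ours; a gradient ascent in a softmax parametrization of the simplex, so it is only guaranteed to find the global maximizer in the concave case of Assumption 2.2) evaluates $F(\sigma, \theta)$ for a piecewise constant profile:

```python
import numpy as np

def F_piecewise(Sigma, alpha, theta, beta=1.0, n_steps=5000, lr=1e-2):
    """Maximize (theta^2/beta) psi' S2 psi + (beta/2) sum_i alpha_i log(psi_i/alpha_i)."""
    S2 = Sigma ** 2
    alpha = np.asarray(alpha, dtype=float)
    x = np.log(alpha)                               # softmax coordinates of psi
    for _ in range(n_steps):
        psi = np.exp(x); psi /= psi.sum()
        g = (theta ** 2 / beta) * 2.0 * S2 @ psi + (beta / 2.0) * alpha / psi
        x += lr * psi * (g - np.dot(g, psi))        # gradient through the softmax
    psi = np.exp(x); psi /= psi.sum()
    val = (theta ** 2 / beta) * psi @ S2 @ psi \
        + (beta / 2.0) * np.sum(alpha * np.log(psi / alpha))
    return val, psi

# with the constant profile (n = 1) this returns F(theta) = theta^2/beta
val, _ = F_piecewise(np.array([[1.0]]), [1.0], theta=0.7)   # val ~ 0.49
```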

Definition of the rate functions
Now, in order to introduce our rate functions, we first need to introduce the function $J$. This function is linked to the asymptotics of the following spherical integrals:

$$I_N(\theta, A) := \mathbb{E}_e[\exp(\theta N \langle e, A e \rangle)],$$

where the expectation holds over $e$, which follows the uniform measure on the sphere $\mathbb{S}^{\beta N - 1}$ of radius one (taken in $\mathbb{R}^N$ when $\beta = 1$ and $\mathbb{C}^N$ when $\beta = 2$). Denoting $J_N(\theta, A) := N^{-1} \log I_N(\theta, A)$, the following theorem was proved in [28]: if $(E_N)$ is a sequence of real symmetric matrices when $\beta = 1$ and complex Hermitian matrices when $\beta = 2$ such that:

• the sequence of the empirical measures $\mu_{E_N}$ of $E_N$ weakly converges to a compactly supported measure $\mu$,

• the sequence of the largest eigenvalues $\lambda_{\max}(E_N)$ converges to some $\lambda$,

and $\theta \ge 0$, then $\lim_N J_N(\theta, E_N) = J(\mu, \theta, \lambda)$. The limit $J$ is defined as follows. For a compactly supported probability measure $\mu$ whose support is included in $[a,b]$, we define its Stieltjes transform $G_\mu$ by

$$G_\mu(z) := \int \frac{d\mu(x)}{z - x},$$

where $G_\mu(a)$ and $G_\mu(b)$ are taken as the limits of $G_\mu(t)$ when $t \to a^-$ and $t \to b^+$. We denote by $K_\mu$ its inverse and let $R_\mu(z) := K_\mu(z) - 1/z$ be its R-transform as defined by Voiculescu in [35] (both defined on $]G_\mu(a), G_\mu(b)[$, and at $G_\mu(a)$ and/or $G_\mu(b)$ if they are finite). Let us denote by $r(\mu)$ the right edge of the support of $\mu$. $J$ is defined for any $\theta \ge 0$ and $\lambda \ge r(\mu)$ by

$$J(\mu, \theta, \lambda) := \theta\, v(\theta, \mu, \lambda) - \frac{\beta}{2} \int \log\Big(1 + \frac{2\theta}{\beta} v(\theta, \mu, \lambda) - \frac{2\theta}{\beta} y\Big)\, d\mu(y),$$

where

$$v(\theta, \mu, \lambda) := \begin{cases} R_\mu(2\theta/\beta) & \text{if } 2\theta/\beta \le G_\mu(\lambda), \\ \lambda - \beta/(2\theta) & \text{otherwise.} \end{cases}$$

In both the piecewise constant and the continuous case, we introduce our rate function $I^{(\beta)}$ as

$$I^{(\beta)}(\sigma, x) := \begin{cases} \sup_{\theta \ge 0} \{ J(\mu_\sigma, \theta, x) - F(\sigma, \theta) \} & \text{if } x \ge r_\sigma, \\ +\infty & \text{otherwise,} \end{cases}$$

where $\mu_\sigma$ is the limit measure of $X_N^{(\beta)}$, our Wigner-type matrix whose variance profile converges toward $\sigma$.
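For the semicircle law (the constant profile $\sigma \equiv 1$), $G_\mu(z) = (z - \sqrt{z^2 - 4})/2$ and $R_\mu(s) = s$, so $J$ can be evaluated numerically in closed form; the following sketch (ours, directly transcribing the definition above) computes it:

```python
import numpy as np
from scipy.integrate import quad

def J_semicircle(theta, lam, beta=1.0):
    """J(mu_sc, theta, lam) for lam >= 2, following the definition of J above."""
    G = lambda z: (z - np.sqrt(z * z - 4.0)) / 2.0
    s = 2.0 * theta / beta
    v = s if s <= G(lam) else lam - 1.0 / s          # R(s) = s for the semicircle
    integrand = lambda y: np.log(1.0 + s * v - s * y) * np.sqrt(4.0 - y * y) / (2.0 * np.pi)
    return theta * v - (beta / 2.0) * quad(integrand, -2.0, 2.0)[0]

# small-theta check: J = (beta/2) * int_0^{2 theta/beta} R(s) ds = theta^2/beta
print(J_semicircle(0.3, 2.0))   # ~ 0.09
```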
Proof. As a supremum of continuous functions, $I^{(\beta)}(\sigma, \cdot)$ is lower semi-continuous. We then want to prove that the level sets of $I^{(\beta)}(\sigma, \cdot)$ are compact; since $I^{(\beta)}(\sigma, \cdot)$ is infinite on $(-\infty, r_\sigma)$, it suffices to check that it tends to $+\infty$ at $+\infty$.

Assumptions on the variance profile σ
In order to prove the large deviation lower bound in the piecewise constant case, we will need the following assumption on $\sigma$:

Assumption 2.1. There exists a continuous function $\theta \mapsto \psi^\theta$ from $\mathbb{R}^+$ to $\{\psi \in (\mathbb{R}^+)^n : \psi_1 + \dots + \psi_n = 1\}$ such that $\psi^\theta$ is a maximal argument of the equation (2), that is:

$$F(\sigma, \theta) = \frac{\theta^2}{\beta} P(\sigma, \psi^\theta) + \frac{\beta}{2} \sum_i \alpha_i \log \frac{\psi_i^\theta}{\alpha_i}.$$

As a more practical example, the following assumption implies Assumption 2.1:

Assumption 2.2. The function $\psi \mapsto P(\sigma, \psi)$ is concave on the simplex $\{\psi \in (\mathbb{R}^+)^n : \psi_1 + \dots + \psi_n = 1\}$.

Remark 2.4.
Examples of variance profiles that satisfy this assumption are the variance profiles associated to some parameters $(\alpha_1, \dots, \alpha_n) \in (\mathbb{R}^{+,*})^n$ and $\sigma_{i,j} = \mathbb{1}_{i \ne j}$, for which $P(\sigma, \psi) = 1 - \sum_i \psi_i^2$ is concave. In the case $n = 2$ this is a linearization of a Wishart matrix as in [27].

Lemma 2.5. Assumption 2.2 implies Assumption 2.1.
Proof. The function $f_\theta : \psi \mapsto \frac{\theta^2}{\beta} P(\sigma, \psi) + \frac{\beta}{2} \sum_{i=1}^n \alpha_i \log \psi_i$ is strictly concave, and since it tends to $-\infty$ on the boundary of the domain, it admits a unique maximal argument $\psi^\theta$, which is also the unique solution of the critical point equation

$$\nabla f_\theta(\psi) \in \mathrm{Vect}(1, \dots, 1),$$

where $\mathrm{Vect}(1, \dots, 1)$ is the subspace of $\mathbb{R}^n$ spanned by the vector whose coordinates are all 1. We now want to apply the implicit function theorem to prove that $\theta \mapsto \psi^\theta$ is analytic. First of all, the equation above can be rewritten $\Pi \nabla f_\theta(\psi) = 0$, where $\Pi$ is the orthogonal projection on $\mathrm{Vect}(1, \dots, 1)^\perp$. By strict concavity, for every non-zero $u \in \mathrm{Vect}(1, \dots, 1)^\perp$ we have $\langle u, \mathrm{Hess} f_\theta(\psi) u \rangle < 0$, so that the differential of $\Pi \nabla f_\theta$ along the simplex is invertible and we can apply the implicit function theorem.
Examples of variance profiles that satisfy Assumption 2.1 but not Assumption 2.2 are provided in section 7. In the same section, we will also show that without any assumption on $\sigma$, the method used in this article may fail, as we can have a large deviation principle but with a rate function different from $I$.
In the continuous case, we will need the following assumption:

Assumption 2.3. There exists a continuous function $\theta \mapsto \psi^\theta$ from $\mathbb{R}^+$ to $\mathcal{P}([0,1])$ (endowed with the weak topology) such that $\psi^\theta$ is a maximal argument of (1), that is:

$$F(\sigma, \theta) = \frac{\theta^2}{\beta} P(\sigma, \psi^\theta) - \frac{\beta}{2} D(\mathrm{Leb}\,\|\,\psi^\theta).$$

As for the piecewise constant case, the following assumption implies Assumption 2.3:

Assumption 2.4. The function $\mu \mapsto P(\sigma, \mu)$ is concave on $\mathcal{P}([0,1])$.

Remark 2.7. A family of $\sigma$ satisfying Assumption 2.4 is given, for instance, by $\sigma^2(x,y) = f(x) + f(y) - f(x)f(y)$ with $f$ continuous and $[0,1]$-valued: writing $m = \int f\, d\mu$, we get $P(\sigma, \mu) = 2m - m^2$, and $m \mapsto 2m - m^2$ is concave and so is $P(\sigma, \cdot)$.

Scheme of the proof
The proof of Theorem 1.4 will follow a path similar to [27] for the piecewise constant case and then for σ continuous, we will approximate it by a sequence of piecewise constant profiles. In the piecewise constant case, we will insist on the differences with [27] and novelties brought by the introduction of a variance profile and we will refer the reader to the relevant parts of [27] for further details on the proofs that stay similar. First of all, we will prove that the sequence of distributions of the largest eigenvalue of X (β) N is exponentially tight.

Exponential tightness
We will prove the following lemma of exponential tightness.

Lemma 3.1. For $\beta = 1, 2$,

$$\lim_{M \to +\infty} \limsup_{N \to \infty} \frac{1}{N} \log \mathbb{P}[\lambda_{\max}(X_N^{(\beta)}) \ge M] = -\infty.$$

Similar results hold for $\lambda_{\min}(X_N^{(\beta)})$. We will in fact prove a stronger and slightly more quantitative result that will also be useful when we approximate continuous variance profiles using piecewise constant ones (we recall that $\odot$ is the entrywise product of matrices):

Lemma 3.2. Let $\beta = 1, 2$ and let us assume that the distributions of the entries $a_{i,j}^{(\beta)}$ satisfy Assumption 1.1. Let $\mathcal{A}_N$ be the set of symmetric $N \times N$ matrices $A$ whose entries all satisfy $|A(i,j)| \le 1$. For every $M > 0$ there exists $B > 0$ such that:

$$\limsup_{N \to \infty} \frac{1}{N} \log \sup_{A \in \mathcal{A}_N} \mathbb{P}[\|W_N^{(\beta)} \odot A\| \ge B] \le -M.$$

Proof. We will use a standard net argument that we recall here for the sake of completeness. For a unit vector $u$, let us denote $Y_u := \langle u, (W_N^{(\beta)} \odot A)\, u \rangle$. We next bound the probability of deviations of $Y_u$ by a Chernoff bound, where we use that the entries have a sharp sub-Gaussian Laplace transform and that $|A(i,j)| \le 1$. This completes the proof of the lemma.
With this result, we conclude that the sequence of the distributions of the largest eigenvalue of $X_N^{(\beta)}$ in Lemma 3.1 is indeed exponentially tight. Therefore it is enough to prove a weak large deviation principle. In the following we summarize the assumptions on the distributions of the entries as follows: either the $\mu_{i,j}^N$ are uniformly compactly supported, in the sense that there exists a compact set $K$ such that the support of all the $\mu_{i,j}^N$ is included in $K$, or the $\mu_{i,j}^N$ satisfy a uniform log-Sobolev inequality, in the sense that there exists a constant $c$ independent of $N$ such that for all smooth functions $f$:

$$\mathrm{Ent}_{\mu_{i,j}^N}(f^2) \le c\, \mathbb{E}_{\mu_{i,j}^N}[|\nabla f|^2], \quad \text{where } \mathrm{Ent}_\mu(f^2) := \mathbb{E}_\mu\Big[f^2 \log \frac{f^2}{\mathbb{E}_\mu[f^2]}\Big].$$

Additionally, the $\mu_{i,j}^N$ satisfy Assumption 1.1.

Large deviation upper and lower bounds
To use the result of Theorem A.8 in Appendix A, which states the convergence of the largest eigenvalue toward the edge of the support, as well as the isotropic local laws, we will need the following positivity assumption (which is mainly technical and will be relaxed later by approximation):

Assumption 3.2. In the piecewise constant case, $\forall i,j \in \{1,\dots,n\}$, $\sigma_{i,j} > 0$.
We shall first prove that we have a weak large deviation upper bound similar to Theorem 1.9 in [27]:

Lemma 3.3. For every $x \in \mathbb{R}$,

$$\lim_{\delta \to 0} \limsup_{N \to \infty} \frac{1}{N} \log \mathbb{P}[\lambda_{\max}(X_N^{(\beta)}) \in [x - \delta, x + \delta]] \le -I^{(\beta)}(\sigma, x).$$

The lower bound will however be slightly different, since we need to take into account an error term $E$.

Theorem 3.4. Suppose that there exist a function $E : \mathbb{R}^+ \to \mathbb{R}^+$ and a continuous function $\theta \mapsto (\psi_i^{E,\theta})_{i \in [1,n]}$ with values in $\{\psi \in (\mathbb{R}^+)^n : \sum_i \psi_i = 1\}$ such that

$$\frac{\theta^2}{\beta} P(\sigma, \psi^{E,\theta}) + \frac{\beta}{2} \sum_i \alpha_i \log \frac{\psi_i^{E,\theta}}{\alpha_i} \ge F(\sigma, \theta) - E(\theta).$$

Then for every $x \in \mathbb{R}$,

$$\lim_{\delta \to 0} \liminf_{N \to \infty} \frac{1}{N} \log \mathbb{P}[\lambda_{\max}(X_N^{(\beta)}) \in [x - \delta, x + \delta]] \ge -\sup_{\theta \ge 0} \{ J(\mu_\sigma, \theta, x) - F(\sigma, \theta) + E(\theta) \}.$$

Then, we will show that when Assumption 2.1 is verified, we can take $E = 0$, and the main theorem follows. However, when we deal with the continuous case, we will approximate $\sigma$ by piecewise constant functions $\sigma_n$, and for $\sigma_n$ Assumption 2.1 will be verified only up to an error term $E_n$ that can be neglected as $n$ tends to infinity.
Proving Lemma 3.3 for $x < r_\sigma$ is done as in [27, Corollary 1.12], using the following lemma and the fact that a deviation of $\lambda_{\max}(X_N^{(\beta)})$ below $r_\sigma$ implies a deviation of $\hat{\mu}_N$ (which cannot occur with probability larger than the exponential scale we are interested in).

Lemma 3.5. Assume that the $\mu_{i,j}^N$ are uniformly compactly supported or satisfy a uniform log-Sobolev inequality. Then, for $\beta = 1, 2$, there exists some sequence $\xi(N)$ converging to 0 such that, with $d$ the Dudley distance,

$$\limsup_{N \to \infty} \frac{1}{N} \log \mathbb{P}[d(\hat{\mu}_N, \mu_\sigma) > \xi(N)] = -\infty.$$

The sequence $\xi(N)$ in this lemma depends on the quantities $|I_i^{(N)}|/N$ and on how fast they converge to $\alpha_i$. The proof of this lemma is in Appendix B. The asymptotics of the spherical integrals $J_N$ are given by Theorem 2.2. We will also need the following lemma, which is a continuity result for the $J_N$, and where we denote by $\|A\|$ the operator norm of the matrix $A$:

Lemma 3.6. Given a compactly supported probability measure $\mu$ on $\mathbb{R}$, an arbitrary sequence $\xi : \mathbb{N} \to \mathbb{R}^+$ tending to 0, for every $\theta > 0$, every $M > 0$ and every $\rho \ge r(\mu)$, there exists a function $g : \mathbb{R}^+ \to \mathbb{R}^+$ going to 0 at 0 such that for any $\delta > 0$, if we denote by $B_N$ the set of real symmetric or complex Hermitian matrices $A$ such that $d(\mu_A, \mu) \le \xi(N)$, $\|A\| \le M$ and $|\lambda_{\max}(A) - \rho| \le \delta$, then for $N$ large enough, $\sup_{A \in B_N} |J_N(\theta, A) - J(\mu, \theta, \rho)| \le g(\delta)$.
Proof. Assume for contradiction that the lemma is false: then there exists a sequence of matrices $(A_N)_N$ with $A_N \in B_N$ along which $J_N(\theta, A_N)$ stays away from $J(\mu, \theta, \rho)$. But then by compactness, we can assume that up to extraction $\mu_{A_N}$ converges weakly and $\lambda_{\max}(A_N)$ converges, which contradicts Theorem 2.2.

Using Lemma 3.1 and Lemma 3.5, we have that for any $L > 0$, for $M$ large enough and for $N$ large enough, the deviations of $\lambda_{\max}(X_N^{(\beta)})$ outside the event $\{\|X_N^{(\beta)}\| \le M,\ d(\hat{\mu}_N, \mu_\sigma) \le \xi(N)\}$ have probability at most $e^{-LN}$. Therefore it is enough to study the probability of deviations on the set where $J_N$ is continuous. The last item we need to this end is the asymptotics of the annealed version of the spherical integral, defined by

$$F_N(\theta, \beta) := \frac{1}{N} \log \mathbb{E}[\exp(\theta N \langle e, X_N^{(\beta)} e \rangle)],$$

where $e$ is a unit vector uniform on the sphere and independent of $X_N^{(\beta)}$.

Theorem 3.7. For every $\theta \ge 0$, $\lim_{N \to \infty} F_N(\theta, \beta) = F(\sigma, \theta)$, where we recall that $F(\sigma, \theta)$ is defined in equation (1).
This counterpart to [27, Theorem 1.17] will be proven in section 4. We are now in position to get the upper bound. In fact, using the result of Lemma 3.6, for any $\theta \ge 0$,

$$\mathbb{P}[\lambda_{\max}(X_N^{(\beta)}) \in [x - \delta, x + \delta]] \le \mathbb{E}[\exp(\theta N \langle e, X_N^{(\beta)} e \rangle)]\, \exp(-N(J(\mu_\sigma, \theta, x) - g(\delta) - o(1)))$$

(where $o(1)$ is some quantity converging to 0 as $N \to +\infty$). Taking the log, dividing by $N$ and optimizing in $\theta \ge 0$ then gives Lemma 3.3.
To prove the complementary lower bound, we shall prove in Lemma 3.8 that under the measure tilted by the spherical integral, the largest eigenvalue concentrates around the desired value. This lemma is proved by showing in section 5 that the matrix whose law has been tilted by the spherical integral is approximately a finite rank perturbation of a matrix with the same variance profile, for which we can use the techniques developed to study the famous BBP transition [14]. The conclusion then follows from Theorem 3.7 and Lemma 3.8. Theorem 1.4 follows in the case of a piecewise constant variance profile satisfying Assumption 3.2 by noticing that if Assumption 2.1 is verified, then we can choose $E = 0$. We will relax Assumption 3.2 by approximation at the same time as we treat the continuous case.

Proof for the asymptotics of the annealed integral in Theorem 3.7
In this section we prove that the limit of $F_N(\theta, \beta)$ is $F(\sigma, \theta)$, as stated in Theorem 3.7. In fact, we prove the following refinement of this theorem, which shows that, under our assumption of sharp sub-Gaussian tails, the vectors $e$ that make the dominant contributions are delocalized.
Proof. There again, the proof is very similar to the proof of Theorem 1.17 in [27]. Denoting $L_\mu = \log T_\mu$, with $e \in \mathbb{S}^{\beta N - 1}$ fixed, expanding the scalar product $\langle e, X_N e \rangle$ and using the independence of the $(a_{i,j}^{(\beta)})$, we have in the real case (the complex case being similar):

$$\frac{1}{N} \log \mathbb{E}_{X_N}[\exp(\theta N \langle e, X_N e \rangle)] = \frac{1}{N} \sum_{i < j} L_{\mu_{i,j}^N}\big(2\theta \sqrt{N}\, \Sigma_N(i,j)\, e_i e_j\big) + \frac{1}{N} \sum_i L_{\mu_{i,i}^N}\big(\theta \sqrt{N}\, \Sigma_N(i,i)\, e_i^2\big).$$

Using the sharp sub-Gaussian bound, we deduce that this quantity is bounded from above by $\frac{\theta^2}{\beta} P(\sigma, \hat{\psi}_e) + o(1)$, where $\hat{\psi}_e := \sum_i |e_i|^2 \delta_{i/N}$. But since $e$ is taken uniformly on the sphere, the vector $\psi = (\psi_1, \dots, \psi_n)$ defined by $\psi_j = \|e^{(j)}\|^2$ follows a Dirichlet law of parameters $(\alpha_{1,N}, \dots, \alpha_{n,N})$ with $\alpha_{j,N} = \beta |I_j^{(N)}|/2$.

Lemma 4.2. The vector $(\psi_1, \dots, \psi_n)$ satisfies a large deviation principle with speed $N$ and good rate function $I(x_1, \dots, x_n) = -\frac{\beta}{2} \sum_i \alpha_i \log \frac{x_i}{\alpha_i}$.

Proof. We denote by $f_N$ and $f$ the functions defined on $D = \{x \in (\mathbb{R}^{+,*})^n : x_1 + \dots + x_n = 1\}$ given by the exponential rates of the Dirichlet densities and of their limit. For $x \in D$, let us denote $\bar{x} = (x_1, \dots, x_{n-1})$ and $\bar{D}$ the image of $D$ by this application (so that $\bar{D} = \{x \in (\mathbb{R}^{+,*})^{n-1} : x_1 + \dots + x_{n-1} < 1\}$). We have that on every compact of $\bar{D}$, $f_N$ converges uniformly to $f$ (which is continuous), and furthermore, for every $M > 0$ there is a compact $K$ of $\bar{D}$ outside of which $f_N \le -M$ for $N$ large enough. With both those remarks, using a classical Laplace method and the fact that $x \mapsto \bar{x}$ is a homeomorphism between $D$ and $\bar{D}$, the uniform convergence of $f_N$ and the continuity of the limit $f$ give a weak large deviation principle with rate function $f(x) - \sum_{i=1}^n \alpha_i \log \alpha_i$, and the bound outside compacts gives the exponential tightness. The large deviation principle is proved.
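The Dirichlet decomposition used in this proof is easy to check by simulation (our sketch, $\beta = 1$): the block masses $\psi_j = \|e^{(j)}\|^2$ of a uniform vector on the sphere follow a Dirichlet law with parameters $|I_j^{(N)}|/2$, whose first marginal is a Beta law.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n1 = 1000, 300                                    # |I_1| = 300, |I_2| = 700
psi1 = []
for _ in range(20000):
    g = rng.standard_normal(N)
    e = g / np.linalg.norm(g)                        # uniform on the sphere S^{N-1}
    psi1.append(np.linalg.norm(e[:n1]) ** 2)         # mass of the first block
psi1 = np.array(psi1)

a1, a2 = n1 / 2.0, (N - n1) / 2.0                    # Dirichlet/Beta parameters
print(psi1.mean(), a1 / (a1 + a2))                   # both ~ 0.3
print(psi1.var(), a1 * a2 / ((a1 + a2) ** 2 * (a1 + a2 + 1)))
```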
Using Lemma 4.2 and Varadhan's lemma, we have, since $P$ is continuous, that

$$\limsup_N F_N(\theta, \beta) \le \sup_\psi \Big\{ \frac{\theta^2}{\beta} P(\sigma, \psi) + \frac{\beta}{2} \sum_i \alpha_i \log \frac{\psi_i}{\alpha_i} \Big\} = F(\sigma, \theta),$$

so that we have proved the upper bound. For the lower bound, we then again follow [27] and use that if $e \in V_N$, then all the arguments of the $L_{\mu_{i,j}}$ above are small. We can then use the Taylor expansions of the $L_{\mu_{i,j}}$ near 0 to prove that for any $\delta > 0$, for $N$ large enough and $e \in V_N$,

$$\frac{1}{N} \log \mathbb{E}_{X_N}[\exp(\theta N \langle e, X_N e \rangle)] \ge \frac{\theta^2}{\beta} P(\sigma, \hat{\psi}_e) - \delta. \tag{8}$$

We shall then use the following lemma:

Lemma 4.3. $\lim_N \mathbb{P}[e \in V_N] = 1$, and the event $\{e \in V_N\}$ is independent of the vector $(\psi_1, \dots, \psi_n)$.

As a consequence, we deduce from (8) that the matching lower bound holds for any $\delta > 0$ and $N$ large enough, and then we let $\delta$ tend to 0. In order to see that the event $\{e \in V_N\}$ is independent of $\psi$ and to prove Lemma 4.3, we note that, if we denote $e^{(j)} = (e_i)_{i \in I_j^{(N)}}$, then $f^{(j)} := e^{(j)}/\|e^{(j)}\|$ is a uniform unit vector on the sphere of dimension $\beta |I_j^{(N)}| - 1$. Furthermore, all these $f^{(j)}$ together with the random vector $(\psi_1, \dots, \psi_n)$ form a family of independent variables. Indeed, if we construct $e$ as a renormalized standard Gaussian vector, that is $e = g/\|g\|$, then one can see that $f^{(j)} = g^{(j)}/\|g^{(j)}\|$ and $\psi_j = \|g^{(j)}\|^2/\|g\|^2$, where $g^{(j)}$ is defined from $g$ the same way as $e^{(j)}$ is defined from $e$. The independence of the $f^{(j)}$ and $\psi$ then comes from a classical change of variables. We notice then that, in terms of the $f^{(j)}$, the event $\{e \in V_N\}$ only depends on the $f^{(j)}$. Then, using the independence of the $f^{(j)}$, the result follows since each of the corresponding probabilities converges to 1.

Large deviation lower bound
We will now prove Theorem 3.4. For a vector $e$ of the sphere $\mathbb{S}^{\beta N - 1}$ and $X$ a random symmetric or Hermitian matrix, we denote by $P_N^{(e,\theta)}$ the tilted probability measure defined by

$$\frac{dP_N^{(e,\theta)}}{dP_N}(X) := \frac{\exp(\theta N \langle e, X e \rangle)}{\mathbb{E}[\exp(\theta N \langle e, X e \rangle)]},$$

where $P_N$ is the law of $X_N^{(\beta)}$, $E$ and $\psi^{E,\theta}$ are as in the hypotheses of Theorem 3.4 and $e^{(i)} = (e_j)_{j \in I_i^{(N)}}$ is the $i$-th block of entries of $e$.

Lemma 5.1. For any $x \ge r_\sigma$, there is $\theta_x$ such that, uniformly in $e \in V_N \cap W_N$ (with $W_N$ defined below), $\lambda_{\max}$ converges in probability toward $x$ under $P_N^{(e,\theta_x)}$, where we recall that $V_N$ is the delocalization event of section 4.

Proof that Lemma 5.1 implies Theorem 3.4. In the rest of the proof, we will abbreviate $X_N^{(\beta)}$ as $X_N$. We only need to prove that if there exist $E$ and $\psi^{E,\theta}$ that satisfy the hypotheses of Theorem 3.4, then for every $x \ge r_\sigma$ there exists $\theta_x \ge 0$ such that the lower bound of Theorem 3.4 holds at $x$. For $\delta > 0$, let

$$W_N^\delta := \{ e \in \mathbb{S}^{\beta N - 1} : \forall i,\ |\, \|e^{(i)}\|^2 - \psi_i^{E,\theta_x} | \le \delta \}.$$

We have, using Lemma 4.2, that

$$\liminf_N \frac{1}{N} \log \mathbb{P}[e \in W_N^\delta] \ge \frac{\beta}{2} \sum_i \alpha_i \log \frac{\psi_i^{E,\theta_x}}{\alpha_i} - o_\delta(1).$$

Let $(\delta_N)_{N \in \mathbb{N}}$ be a sequence converging to 0 slowly enough for this bound to remain true with $\delta_N$ in place of $\delta$, and let $W_N := W_N^{\delta_N}$. We have then, using the Taylor expansions of the $L_{\mu_{i,j}^N}$ and that $\delta_N \to 0$ as in equation (8), the following limit uniformly in $e \in W_N \cap V_N$:

$$\lim_N \frac{1}{N} \log \mathbb{E}[\exp(\theta_x N \langle e, X_N e \rangle)] = \frac{\theta_x^2}{\beta} P(\sigma, \psi^{E,\theta_x}).$$

Then, combining these estimates exactly as in [27], using that $\{e \in V_N\}$ and $\{e \in W_N\}$ are independent (since $\{e \in V_N\}$ only depends on the $f^{(j)}$ while $W_N$ only depends on the $\psi_i$) and that $\frac{1}{N} \log \inf_{e \in V_N \cap W_N} P_N^{(e,\theta_x)}[\lambda_{\max}(X_N) \in [x - \delta, x + \delta]]$ converges to 0, we obtain the desired lower bound. And so it remains to prove Lemma 5.1. More precisely, we will show that for $\epsilon \in (\frac{1}{8}, \frac{1}{4})$, for any $x > r_\sigma$ (the rightmost point of the support of $\mu_\sigma$) and $\delta > 0$, we can find $\theta_x \ge 0$ so that for $M$ large enough,

$$\lim_N \inf_{e \in V_N \cap W_N} P_N^{(e,\theta_x)}[\lambda_{\max}(X_N) \in [x - \delta, x + \delta]] = 1. \tag{9}$$
• For every δ > 0: lim Those estimates revolve around the Taylor expansion of the L µi,j and the uniform bound on their derivatives near 0 given by Remark 1.1. Here we will only expose the computation justifying that the entries of ∆ (e),N and Y (e),N tend to 0. For how to refine this estimates and obtain that ∆ (e),N and Y (e),N are negligible in operator norm, we refer the reader to the subsection 5.1 of [27]. We can express the density of P (e,θ) N as the following product: where the a (β) i,j are defined as in the introduction. Since the a (β) i,j independent (for i ≤ j), the entries X (e),N i,j remain so and their mean is given by the derivative of L µi,j : where we used that by centering and variance one, In the complex case, the notation |L (3) | just means Hence, we have Furthermore, when we identify C to R 2 when X (e),N i,j is a complex variable the covariance matrix of X (e),N i,j is given by the Hessian of L µi,j so that the variance of X (e),N i,j is given by the Laplacian of L µi,j (i.e. ∂ z ∂zL µi,j ): And so : And so, to conclude we need only to identify the limit of λ max ( X (e),N + 2θ β ESE * ). The eigenvalues of X (e),N + 2θ β ESE * satisfy the following equation in z and therefore z is an eigenvalue away from the spectrum of X (e),N if and only if Recall that if K is a field and A, B are two matrices respectively in M n,p (K) and M p,n (K) then we have det(I n + AB) = det(I p + BA). Using this, we have that the preceding equality is equivalent to where θ = 2θ/β.

Lemma 5.3.
For $i,j \in \{1,\dots,n\}$, $\eta > 0$ and $a > r_\sigma$, uniformly in $z$ with $\Re z \ge a$ and in $e \in V_N \cap W_N$, the block resolvent entries $e^{(i)*}(z - \tilde{X}^{(e),N})^{-1} e^{(j)}$ converge in probability toward $\delta_{i,j}\, m_i(z) \psi_i$. To prove this, we only need to prove the convergence for each pair of blocks $k, l \in \{1,\dots,n\}$. To that end, we want to apply the anisotropic local law from [2], but in order to do so, we need to check its assumptions.
Furthermore, following Theorem A.8, we have that for $a' \in ]r_\sigma, a[$ and $D > 0$, for $N$ large enough, $\mathbb{P}[\lambda_{\max}(\tilde{X}^{(e),N}) \ge a'] \le N^{-D}$. Let $e, f \in \mathbb{S}^{\beta N - 1}$ and $h : z \mapsto e^* G_N(z) f$ and $k : z \mapsto e^* M_N(z) f$. On the event $\{\lambda_{\max}(\tilde{X}^{(e),N}) < a'\}$, we have that $|h(z)|, |k(z)| \le \frac{1}{\Re z - a'}$ and $|h'(z)|, |k'(z)| \le \frac{1}{(\Re z - a')^2}$ for $\{z : \Re z > a'\}$, and therefore, for $\gamma < 1/10$, we can in fact assume that our bound holds for any $z$ such that $\Re(z) > a$, and in particular for $z$ real (up to some multiplicative constant $C$ before the $N^{-1/10}$). By a union bound, we have that for $N$ large enough, these bounds hold simultaneously for all the pairs of vectors we consider. Combining this again with the bounds on the derivatives of $h$ and $k$ and the bound in modulus that is derived from the bound on $\lambda_{\max}(\tilde{X}^{(e),N})$, we get a bound with some $C > 0$ for $N$ large enough; furthermore, this bound is uniform in $e$ and $f$. We then use Theorem A.6 and the bounds on the derivatives of $M_N(z)$ the same way to conclude that for any $\eta > 0$, for $N$ large enough and $e \in V_N \cap W_N$, the matrix $E^*(z - \tilde{X}^{(e),N})^{-1} E$ is within $\eta$ of $D(\theta, z)$, where $m$ is the solution of $K_\sigma$, $m_i$ is the value taken by $m$ on the interval $I_i$, and $D(\theta, z)$ denotes the diagonal $n \times n$ matrix $\mathrm{diag}(m_1(z)\psi_1^\theta, \dots, m_n(z)\psi_n^\theta)$. And so the above determinant can be rewritten in terms of $S D(\theta, z)$. From the preceding lemma we have, for $\eta > 0$, uniformly in $e \in \mathbb{S}^{\beta N - 1}$ for $N$ large enough, that this determinant is within $\eta$ of $\det(I_n - \theta' S D(\theta, z))$. So, since $\lim_{z \to \infty} \det(I_n - \theta' S D(\theta, z)) = 1$, all that remains is to solve the determinantal equation

$$\det(I_n - \theta' S D(\theta, z)) = 0, \tag{10}$$

and the largest solution $z > r_\sigma$, if it exists, will be the limit of $\lambda_{\max}$. We can rewrite this equation as follows. Let $\rho(\theta, z)$ be the largest eigenvalue of $\sqrt{D(\theta,z)}\, S\, \sqrt{D(\theta,z)}$. Then the largest solution $z$ of equation (10) is the unique solution of

$$\theta' \rho(\theta, z) = 1 \tag{11}$$

on $]r_\sigma, +\infty[$. Indeed, with $\theta$ fixed, if $\theta' \rho(\theta, z) = 1$ then $z$ is a solution of (10). Since the $z \mapsto m_i(z)$ are strictly decreasing, so is $\rho(\theta, \cdot)$. So for $z' > z$, $\theta' \rho(\theta, z') < 1$ and so $z'$ cannot be a solution of (10) for the same $\theta$. Similarly, if $z$ is a solution of (10), then $\theta' \rho(\theta, z) \ge 1$. If $\theta' \rho(\theta, z) > 1$, then since $z' \mapsto \theta' \rho(\theta, z')$ is continuous and decreasing toward 0, there exists $z' > z$ such that $\theta' \rho(\theta, z') = 1$, and $z'$ is therefore a solution of (10) strictly larger than $z$.
Therefore, it suffices to prove that for any $x > r_\sigma$ there is at least one $\theta_x$ such that $\theta_x' \rho(\theta_x, x) = 1$. Here, Assumption 2.1 is crucial: we need it to ensure that the function $\theta \mapsto D(\theta, z)$ is continuous. This continuity implies the continuity of $\theta \mapsto \rho(\theta, z)$. For $\theta = 0$ the left-hand side of (11) is 0, and since $\max_i \psi_i^\theta \ge n^{-1}$, the left-hand side tends to $+\infty$ as $\theta \to \infty$. By continuity, there is at least one $\theta_x$ such that $\theta_x' \rho(\theta_x, x) = 1$, and so Theorem 3.4 is proved.
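The determinantal equation (10)-(11) can be solved numerically once the block Stieltjes transforms are available. The sketch below (ours; `psi` stands for a candidate maximizer $\psi^\theta$ and is just a placeholder input here, and the fixed-point solver is only reliable for real $z$ above the bulk) locates the outlier by bisection on the decreasing function $z \mapsto \theta' \rho(\theta, z)$:

```python
import numpy as np

def block_stieltjes(Sigma, alpha, z, n_iter=5000):
    """Fixed point of m_i = 1/(z - sum_j alpha_j sigma_{ij}^2 m_j), real z > r_sigma."""
    S2 = (Sigma ** 2) * np.asarray(alpha, dtype=float)[None, :]
    m = np.full(len(alpha), 1.0 / z)
    for _ in range(n_iter):
        m = 1.0 / (z - S2 @ m)
    return m

def rho(Sigma, alpha, psi, z):
    """Largest eigenvalue of sqrt(D) S sqrt(D) with D = diag(m_i(z) psi_i)."""
    m = block_stieltjes(Sigma, alpha, z)
    Dh = np.sqrt(m * psi)                 # m_i(z) > 0 for real z above the support
    return np.max(np.linalg.eigvalsh(Dh[:, None] * (Sigma ** 2) * Dh[None, :]))

def outlier(Sigma, alpha, psi, theta, beta=1.0, r_sigma=2.0, z_hi=50.0, tol=1e-9):
    """Solve theta' * rho(theta, z) = 1; r_sigma must upper-bound the bulk edge."""
    tp = 2.0 * theta / beta
    f = lambda z: tp * rho(Sigma, alpha, psi, z) - 1.0
    lo, hi = r_sigma + 1e-6, z_hi
    if f(lo) < 0:
        return None                       # theta too small: no eigenvalue detaches
    while hi - lo > tol:                  # f is decreasing in z, bisect for f = 0
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return lo

Sigma, alpha, psi = np.array([[1.0, 0.5], [0.5, 1.0]]), [0.5, 0.5], np.array([0.5, 0.5])
print(outlier(Sigma, alpha, psi, theta=1.5))   # limit of lambda_max under the tilt
```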

Approximation of continuous and non-negative variance profiles
We now choose $\sigma : [0,1]^2 \to \mathbb{R}^+$ continuous and symmetric and consider the associated random matrix model $X_N^{(\beta)}$. In order to prove a large deviation principle for $X_N^{(\beta)}$, we will approximate the variance profile by piecewise constant ones. Namely, for $n \in \mathbb{N}$ we let $\sigma^n$ be the $n \times n$ matrix obtained by discretizing $\sigma$ on the squares $[\frac{i-1}{n}, \frac{i}{n}] \times [\frac{j-1}{n}, \frac{j}{n}]$ and adding the term $1/(n+1)$. The term $1/(n+1)$ is here to ensure that the approximating variance profile is positive, so that Assumption 3.2 is satisfied. Let us denote by $X_N^{(\beta),n}$ the random matrix constructed with the same family of random variables $a_{i,j}^{(\beta)}$ but with the piecewise constant variance profile associated with the matrix $\sigma^n$ and the vector of parameters $(\frac{1}{n}, \dots, \frac{1}{n})$. Let $F_n = F(\sigma^n, \cdot)$ and $\mu^n := \mu_{\sigma^n}$. We will also denote $F = F(\sigma, \cdot)$ and $I = I^{(\beta)}(\sigma, \cdot)$. Even if we suppose that Assumption 2.3 holds for the continuous variance profile $\sigma$, we do not necessarily have Assumption 2.1 for the variance profiles $\sigma^n$, and so we do not necessarily have a sharp lower bound. To this end, we will introduce an error term $E_n$ that will be negligible as $n$ tends to $\infty$.
In the first subsection, we will prove that for every $n$ there exist a function $E_n$ from $\mathbb{R}^+$ to itself and a function $\theta \mapsto \psi^{\theta, E_n}$ from $\mathbb{R}^+$ to $\{x \in (\mathbb{R}^+)^n : x_1 + \dots + x_n = 1\}$ such that

$$\frac{\theta^2}{\beta} P(\sigma^n, \psi^{\theta,E_n}) + \frac{\beta}{2} \sum_{i=1}^n \frac{1}{n} \log\big(n \psi_i^{\theta,E_n}\big) \ge F_n(\theta) - E_n(\theta),$$

and such that $\lim_{n \to \infty} \sup_{\theta \ge 0} E_n(\theta)/\theta^2 = 0$. In the second subsection, we will prove that the upper and lower large deviation bounds we get for $X_N^{(\beta),n}$ from Lemma 3.3 and Theorem 3.4 (which will be denoted respectively $I^{(n)}$ and $\tilde{I}^{(n)}$) both converge toward the rate function defined in section 2.

Existence of an error negligible toward infinity
Recalling that $F_n$ is defined the same way as $F$, replacing $\sigma$ by $\sigma^n$, we have the following result.

Lemma 6.2. If Assumption 2.3 is true, then for every $\epsilon > 0$ there are a sequence of functions $E_n$ and continuous maps $\theta \mapsto (\psi_i^{\theta,E_n})_{i \in [1,n]}$ satisfying the conditions of the previous subsection, and there is an $n_0$ such that the required bounds hold for $n \ge n_0$.

Proof. Since Assumption 2.3 is verified, there is some measure-valued continuous function $\theta \mapsto \psi^\theta$ such that $F(\theta) = \theta^2 P(\sigma, \psi^\theta)/\beta - \beta D(\mathrm{Leb}\|\psi^\theta)/2$. Let $\psi^{\theta,\epsilon} := K_\#(\psi^\theta * \tau_\epsilon)$, where $*$ is the convolution, $K_\#$ is the push-forward by the application $K$, $\tau_\epsilon$ is the probability measure whose density is a triangular function of support $[-\epsilon, \epsilon]$ and $K$ is the projection onto $[0,1]$, $K(x) = \min(\max(x,0),1)$.

Let $\eta > 0$ and let us find $\epsilon > 0$ such that $|P(\sigma, \psi^{\theta,\epsilon}) - P(\sigma, \psi^\theta)| \le \eta$ uniformly in $\theta$. Let us take $X, Y, U, V$ independent random variables of laws respectively $\psi^\theta$, $\psi^\theta$, $\tau_\epsilon$, $\tau_\epsilon$. Then we have

$$P(\sigma, \psi^{\theta,\epsilon}) = \mathbb{E}\big[\sigma^2\big(K(X+U), K(Y+V)\big)\big].$$

Using the uniform continuity of $\sigma^2$ and the fact that $|K(X+U) - X|, |K(Y+V) - Y| \le \epsilon$ almost surely, we have that there exists $\epsilon > 0$ such that the difference is smaller than $\eta$. This bound does not depend on $\theta$. Now, let us find $n_0$ such that for $n \ge n_0$, $|P(\sigma^n, \psi^{\theta,\epsilon}) - P(\sigma, \psi^{\theta,\epsilon})| \le \eta$, where we recall that $(x,y) \mapsto \sigma^n(x,y)$ is the discretized version of $\sigma$. There again, using the uniform continuity of $\sigma$, we have for every $\epsilon > 0$ the existence of $n_0$ such that for $n \ge n_0$, for all $x, y \in [0,1]$, $|(\sigma^n(x,y))^2 - \sigma^2(x,y)| \le \eta$. Combining these two inequalities, we get the first point. Then let us show the corresponding entropy bound. We denote by $g_\epsilon(x, \cdot)$ the density of $K_\#(\delta_x * \tau_\epsilon)$, so that the density of $\psi^{\theta,\epsilon}$ is $y \mapsto \int_{[0,1]} g_\epsilon(x, y)\, d\psi^\theta(x)$.

Let us notice that $\int_{[0,1]} g_\epsilon(x,y)\, dy = \int_{[0,1]} g_\epsilon(y,x)\, dy = 1$. Using the concavity of $\log$, we can compare the entropy term for $\psi^{\theta,\epsilon}$ with the one for $\psi^\theta$. Finally, using again the concavity, we obtain for every $i \in \{1,\dots,n\}$ the discrete bound on the interval $[\frac{i-1}{n}, \frac{i}{n}]$; summing over $i$ gives us the result.

Convergence of large deviation bounds toward the rate function
We can now introduce $I_\beta^n$ and $\tilde{I}_\beta^n$, defined on $[r_\sigma, +\infty[$, the rate functions for the upper and lower bounds of the piecewise constant approximations $X_N^{(\beta),n}$. To prove that those two functions converge toward $I(x) = I^{(\beta)}(\sigma, x)$, we will need the following result:

Lemma 6.4. If we denote by $r^{(n)}$ the rightmost point of the support of $\mu^n$, then $\lim_n r^{(n)} = r_\sigma$.

Proof. Using Lemma 3.2, we have that for every $\epsilon > 0$ there is $n_0$ such that for any $n \ge n_0$, the matrices $X_N^{(\beta),n}$ and $X_N^{(\beta)}$ are at distance at most $\epsilon$ in operator norm with overwhelming probability. In particular, if we denote by $\lambda_{1,N} < \dots < \lambda_{N,N}$ the eigenvalues of $X_N^{(\beta)}$, on this event we have that for any $a \in \mathbb{R}$ the number of eigenvalues below $a$ for one matrix is bounded by the number of eigenvalues below $a + \epsilon$ for the other. If we denote, for $t \in \mathbb{R}$, $F(t) := \mu_\sigma(]-\infty, t])$ and $F^{(n)}(t) := \mu^n(]-\infty, t])$, using the convergence in probability of the eigenvalue distributions, this implies that for every $t \in \mathbb{R}$, $F^{(n)}(t - \epsilon) \le F(t) \le F^{(n)}(t + \epsilon)$. This then easily implies that $|r^{(n)} - r_\sigma| \le 2\epsilon$.
This result enables us to finally prove the complete version of Theorem 1.3. Indeed, using Lemma 3.2, we have that for every $\epsilon > 0$, $\mathbb{P}[\lambda_{\max}(X_N^{(\beta)}) \ge r_\sigma + \epsilon]$ is controlled by the corresponding quantity for $X_N^{(\beta),n}$, for which we use Theorem A.8. And so Theorem 1.3 is proved.

Lemma 6.5. For every $x \ge r_\sigma$, $\lim_n I_\beta^n(x) = \lim_n \tilde{I}_\beta^n(x) = I(x)$, where we recall that $I$ is equal to $I^{(\beta)}(\sigma, \cdot)$.
Proof. For the first point of the lemma, let us first prove that for every $x \ge r_\sigma$, $\theta \mapsto J(\mu^n, \theta, x)$ converges uniformly on every compact toward the function $\theta \mapsto J(\mu_\sigma, \theta, x)$. Let $l < r$ be two reals. For $\mu$ a probability measure on $\mathbb{R}$ whose support is a subset of $]l, r[$, let $Q_\mu$ be the function defined on a domain $D_{r,\epsilon}$ of pairs $(\theta, u)$ on which the argument of the logarithm stays bounded away from 0, by $Q_\mu(\theta, u) := \theta u - \frac{\beta}{2} \int \log(1 + \frac{2\theta}{\beta} u - \frac{2\theta}{\beta} y)\, d\mu(y)$, so that $J(\mu, \theta, x) = Q_\mu(\theta, v(\theta, \mu, x))$. $Q_\mu$ is continuous in $(\theta, u)$, and for $K \subset D_{r,\epsilon}$ a compact, the function $\mu \mapsto Q_\mu|_K$ mapping $\mu$ to the restriction of $Q_\mu$ to $K$ is continuous for the weak topology on the measures $\mu$ whose support is a subset of $]l, r[$, when the arrival space is the set of continuous functions on $K$ endowed with the uniform norm (this is a consequence of Ascoli's theorem). Let $x > r_\sigma$ and $r, l$ such that $l < l_\sigma < r_\sigma < r < x$. For $n$ large enough the support of $\mu^n$ is in $]l, r[$. We have that the sequence of functions $\theta \mapsto v(\theta, \mu^n, x)$ converges for $n$ large enough, and the result is immediate.
If $\frac{2\theta}{\beta} < G_{\mu_\sigma}(x)$, then $\frac{2\theta}{\beta} < G_{\mu^n}(x)$ for $n$ large enough. $G_{\mu^n}$ converges toward $G_{\mu_\sigma}$ on $[r, +\infty[$; for $\epsilon > 0$, $K_{\mu^n}$ is defined on $]0, G_{\mu_\sigma}(r) - \epsilon]$ for $n$ large enough and $K_{\mu^n}$ converges toward $K_{\mu_\sigma}$. The case $\frac{2\theta}{\beta} \ge G_{\mu_\sigma}(x)$ is similar, and the limits in both cases are $v(\theta, \mu_\sigma, x)$.
In the estimates above we used that $y \mapsto G_{\mu_\sigma}(y)(y - r)$ is increasing. Taking $\epsilon > 0$ such that $\epsilon < G_{\mu_\sigma}(x)(x - r)$ and using the continuity in $\mu$ of $\mu \mapsto G_\mu(x)(x - r)$, we have for every compact $K \subset \mathbb{R}^+$ and $n$ large enough a uniform control of $v(\theta, \mu^n, x)$ for $\theta \in K$. Therefore, using the convergence of $v(\theta, \mu^n, x)$ and the uniform convergence of $Q_{\mu^n}$ on the compacts of $D_{r,\epsilon}$, since $J(\mu^n, \theta, x) = Q_{\mu^n}(\theta, v(\theta, \mu^n, x))$, we have that $J(\mu^n, \theta, x)$ converges toward $J(\mu_\sigma, \theta, x)$. Furthermore, since the $\theta \mapsto J(\mu^n, \theta, x)$ are continuous increasing functions, by Dini's theorem the convergence is uniform on all compacts.

Conclusion
We will now prove that the difference between $X_N^{(\beta),n}$ and $X_N^{(\beta)}$ is negligible at the exponential scale.

Lemma 6.6. For every $\epsilon > 0$ and every $A > 0$, there exists some $n_0 \in \mathbb{N}$ such that for $n \ge n_0$:

$$\limsup_{N \to \infty} \frac{1}{N} \log \mathbb{P}[\|X_N^{(\beta),n} - X_N^{(\beta)}\| \ge \epsilon] \le -A.$$

Proof. Let $M_n$ be the supremum over the entries of the difference between the two variance profiles; we have $\lim_{n \to \infty} M_n = 0$. Following Lemma 3.2, we can write that for every $n \in \mathbb{N}$ and $A > 0$ there is $B > 0$ such that $\limsup_N \frac{1}{N} \log \mathbb{P}[\|X_N^{(\beta),n} - X_N^{(\beta)}\| \ge M_n B] \le -A$. For $n_0 \in \mathbb{N}$ such that $M_n B \le \epsilon$ for all $n \ge n_0$, our upper bound is verified.
Therefore, since both $I_\beta^n(x)$ and $\tilde{I}_\beta^n(x)$ converge toward $I_\beta(x)$, we have a weak large deviation principle with rate function $I_\beta$. Furthermore, since we also have exponential tightness, we have that Theorem 1.4 holds.
It only remains to relax the positivity Assumption 3.2 in the piecewise constant case. Let $\sigma$ be a piecewise constant variance profile. We can approximate $\sigma$ by $\sigma^n := (\sigma^2 + \frac{1}{n+1})^{1/2}$. We notice then that with this choice of $\sigma^n$, $P(\sigma^n, \psi) = P(\sigma, \psi) + \frac{1}{n+1}$, so that the optimization problem (2) is only shifted by a constant and its maximizers are unchanged; in particular, if Assumption 2.1 is verified for $\sigma$, it is verified for $\sigma^n$. And so, as we have just done for the continuous case, we can prove the same way that the rate functions $I(\sigma^n, \cdot)$ converge to $I(\sigma, \cdot)$ and that the large deviation principle holds with $I(\sigma, \cdot)$.

The case of matrices with 2 × 2 block variance profiles
In this section, we will discuss the case of piecewise constant variance profiles with 4 blocks (which are not necessarily of equal sizes) and determine in which cases Assumption 2.1 holds. In particular, we will provide examples where the maximum argument of Assumption 2.1 can be taken continuous without the need for the concavity assumption. Let us take a piecewise constant variance profile defined by $\alpha = (\alpha, 1-\alpha)$ and $\sigma_{1,1} = a$, $\sigma_{2,2} = b$, $\sigma_{1,2} = \sigma_{2,1} = c$. In order to apply Theorem 1.4, we need to study, for $\theta$ fixed, the maximum argument of

$$\psi(x, \theta) := \frac{\theta^2}{\beta}\big(a^2 x^2 + 2c^2 x(1-x) + b^2 (1-x)^2\big) + \frac{\beta}{2}\big(\alpha \log x + (1-\alpha) \log(1-x)\big), \quad x \in ]0,1[,$$

where $x$ stands for the mass given to the first block (up to an additive constant coming from the entropy term). Since we can change $\alpha$ into $1-\alpha$ by switching $a$ and $b$, we can suppose without loss of generality that $\alpha \le 1/2$.
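Before going through the cases, here is a small numeric probe (ours; a grid search, purely illustrative) of Assumption 2.1 in this two-block setting: it tracks the maximizer $x^\theta$ of $\psi(\cdot, \theta)$ along a grid of $\theta$ and reports the largest jump, which stays at the grid resolution in the continuous cases and becomes macroscopic in the pathological one.

```python
import numpy as np

def argmax_path(a, b, c, alpha, beta=1.0, thetas=None):
    """Track x_theta = argmax_x psi(x, theta) over a grid of theta."""
    if thetas is None:
        thetas = np.linspace(0.01, 6.0, 600)
    x = np.linspace(1e-4, 1.0 - 1e-4, 4001)
    quad = a**2 * x**2 + 2 * c**2 * x * (1 - x) + b**2 * (1 - x)**2
    ent = alpha * np.log(x) + (1 - alpha) * np.log(1 - x)
    path = [x[np.argmax((t**2 / beta) * quad + (beta / 2) * ent)] for t in thetas]
    return thetas, np.array(path)

# concave case (a^2 + b^2 - 2c^2 <= 0): the maximizer moves continuously
_, p = argmax_path(a=1.0, b=1.0, c=1.0, alpha=0.3)
print(np.abs(np.diff(p)).max())    # ~ grid step; a macroscopic value signals a jump
```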

Case with $a^2 + b^2 - 2c^2 \le 0$

In the case $(a^2 + b^2 - 2c^2) \le 0$, we have that $\psi(\cdot, \theta)$ is strictly concave; therefore $\theta \mapsto x^\theta$ is analytic, Assumption 2.1 is satisfied and the large deviation principle applies.

Case x min ∈]α, 1/2[ and pathological cases
The graph we obtain is similar to the one of the first case. In this case, we also have a $\theta_{\mathrm{crit}}$ such that for $\theta \le \theta_{\mathrm{crit}}$ there is only one critical point $x_1^\theta$, which is in $]0, \alpha[$, followed by the appearance of two other critical points $x_2^\theta$ and $x_3^\theta$ such that $1/2 < x_2^\theta < x_3^\theta$, $\psi(x_2^\theta, \theta)$ being a local minimum and $\psi(x_3^\theta, \theta)$ a local maximum. But in this case, for high values of $\theta$, the maximum is attained near 1, and so for these high values $x_3^\theta$ is the maximum argument. We have a discontinuity in the maximum argument, and so Assumption 2.1 is not verified.
Let us now show that if x min ∈]α, 1/2[ and c = 0, the largest eigenvalue satisfies a large deviation principle but with a rate function J different from I.
Our matrix $X_N^{(\beta)}$ is then block diagonal, so that $\lambda_{\max}(X_N^{(\beta)}) = \max\{\lambda_{\max}(X'_{\alpha N}), \lambda_{\max}(X''_{(1-\alpha)N})\}$, where the two blocks are independent Wigner matrices with profiles $a$ and $b$. But both these quantities satisfy large deviation principles; more precisely, if $I_\beta$ is the rate function of the large deviation principle of [27] for a Wigner matrix, $\lambda_{\max}(X'_{\alpha N})$ satisfies a large deviation principle with speed $N$ and rate function $\alpha I_\beta(\frac{x}{a\sqrt{\alpha}})$, and similarly for the second block. Now $\lambda_{\max}(X_N^{(\beta)})$ follows a large deviation principle with rate function $J_\beta$ which is the minimum of the two. In particular, if we choose $\alpha, a, b$ such that $a^2\alpha > b^2(1-\alpha)$ and $b > a$, we notice that $J_\beta(x) = \alpha I_\beta(\frac{x}{a\sqrt{\alpha}})$ for $x$ near the right edge $2a\sqrt{\alpha}$ and $J_\beta(x) = (1-\alpha) I_\beta(\frac{x}{b\sqrt{1-\alpha}})$ for large $x$. In this case one can notice that $J_\beta$ is not a convex function and therefore cannot be obtained as $\sup_\theta \{J(x, \mu_\sigma, \theta) - F(\theta)\}$, since the latter is a supremum of convex functions. We have $J \ne I$. For $c > 0$ but small enough, we can also conclude that the large deviation principle still does not hold. Indeed, let us denote by $I_c$ the rate function we would expect using the formula of section 2. Since $I_0$ still provides a large deviation upper bound, we have $J \ge I_0$, and so let $x_0 \in \mathbb{R}^+$ be such that $J(x_0) \ge I_0(x_0) + \eta$ for some $\eta > 0$ ($x_0$ does exist since $J \ne I_0$). Using the same approximation arguments as in section 6, we have that there exists $\epsilon > 0$ such that for $c < \epsilon$, $I_c(x_0) < I_0(x_0) + \eta/3$, while the probability of deviations near $x_0$ stays of order $e^{-N J(x_0)}$. Since $I_c$ is continuous in $x_0$, we have that there cannot be a large deviation principle with the rate function $I_c$.

Looking for an expression of the rate function
In this section we will present a method to explicitly compute the rate function $I$ in the piecewise constant case, under some hypotheses on the behavior of $F$. First, let us describe $F$ in a neighbourhood of $\theta = 0$.

Theorem 8.1. Let $\sigma$ be a continuous or piecewise constant variance profile. There is $\theta_0 > 0$ such that for $\theta \le \theta_0$:

$$F(\sigma, \theta) = \frac{\beta}{2} \int_0^{2\theta/\beta} R(s)\, ds,$$

where $R$ is the R-transform of the measure $\mu_\sigma$.
Proof. This result was proved in the case of a linearization of a Wishart matrix in section 4.2 of [27]. For the sake of completeness, we reproduce the proof here. For the lower bound, we have, for $M > r_\sigma$ and $2\theta/\beta \le G(M)$ (where $G$ is the Stieltjes transform of the measure $\mu_\sigma$), a lower bound on $F_N(\theta, \beta)$ by restriction to the event $\{\lambda_{\max}(X_N) \le M\}$. This is due to the fact that for $2\theta/\beta \le G(M)$, the spherical integral is continuous in the spectrum. For the upper bound, we split the annealed integral according to whether $\lambda_{\max}(X_N) \le M$ or not, where we use that for $N$ large enough, $\mathbb{P}[\lambda_{\max}(X_N) \ge M] \le \exp(-KMN)$ for some $K > 0$, and that for $\lambda_{\max}(X_N) \le M$, $I_N(\theta, X_N) \le e^{\theta M N}$. Now, by choosing $\theta$ small enough such that $(2\theta - K) < 0$, we have the upper bound.
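As a worked instance of Theorem 8.1: for the constant profile $\sigma \equiv 1$, $\mu_\sigma$ is the semicircle law, whose R-transform is $R(s) = s$, so that for small $\theta$,

$$F(\sigma, \theta) = \frac{\beta}{2} \int_0^{2\theta/\beta} s\, ds = \frac{\theta^2}{\beta},$$

recovering the value of the annealed spherical integral for Wigner matrices from [27].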
The main result of this section is the following:

Theorem 8.2. If $F(\sigma, \cdot)$ is analytic, then the R-transform of $\mu_\sigma$ has an analytic extension on $\mathbb{R}^+$, and then the rate function $I(\sigma, \cdot)$ depends only on $\mu_\sigma$.
Proof. Since $F(\sigma, \cdot)$ is analytic and so is $R$, and since we have $F'(\sigma, \theta) = R(\frac{2\theta}{\beta})$ for small $\theta$, $F$ depends only on $R$, that is, on $\mu_\sigma$, and $x \mapsto F'(\sigma, \frac{\beta x}{2})$ extends $R$ on $\mathbb{R}^+$. Then, looking at the expression of $I(\sigma, \cdot)$, it only depends on $\mu_\sigma$.

Remark 8.3. Without any condition on the variance profile $\sigma$, we do not have that $I(\sigma, \cdot)$ depends only on $\mu_\sigma$. For instance, take $X_N$ and $X'_N$ independent matrices both with the same variance profile $\sigma$, and $\alpha, \alpha' > 0$ such that $\alpha > \alpha'$ and $\alpha + \alpha' = 1$; then the block diagonal matrix

$$Y_N := \begin{pmatrix} X_{\alpha N} & 0 \\ 0 & X'_{\alpha' N} \end{pmatrix}$$

has a variance profile, and $\lambda_{\max}(Y_N) = \max(\lambda_{\max}(X_{\alpha N}), \lambda_{\max}(X'_{\alpha' N}))$. We have that $\lambda_{\max}(Y_N)$ satisfies a large deviation principle with rate function $\alpha' I(\sigma, \cdot)$, whereas this matrix has the limit measure $\mu_\sigma$ whatever the choice of $\alpha'$.
In the case of a piecewise constant variance profile, the same concavity hypothesis as before implies the analyticity of the function F (σ, .) (this is due to the fact that with the implicit function theorem, the maximum argument is indeed analytic in θ).

A Appendix: The limit of the empirical measure
In this section, we describe the limit $\mu_\sigma$ of the empirical measure of the matrices $X_N$. We will also discuss the stability of this measure as a function of $\sigma$. Under a positivity assumption on the variance profile, we will prove that the largest eigenvalue converges toward the rightmost point of the support of $\mu_\sigma$. We denote by $\mathbb{H}^+$ the complex upper half-plane $\{z \in \mathbb{C} : \Im z > 0\}$ and by $\mathbb{H}^-$ the complex lower half-plane $\{z \in \mathbb{C} : \Im z < 0\}$. To describe the limit of the empirical measure, we need the following definition of the so-called canonical equation (also called quadratic vector equation), which takes into account both the piecewise constant and the continuous case:

$$m_x(z) = \frac{1}{z - \int_0^1 \sigma^2(x,y)\, m_y(z)\, dy}, \quad \forall x \in [0,1],\ z \in \mathbb{H}^+. \tag{$K_\sigma$}$$
Here $S$ is the following kernel operator: for $f \in \mathcal{H}$,

$$Sf(x) = \int_0^1 \sigma^2(x,y) f(y)\, dy.$$

If $w$ is a function from $[0,1]$ to $\mathbb{C}$, we denote $\|w\| = \sup_{x \in [0,1]} |w_x|$. If $S$ is an operator on the space of functions from $[0,1]$ to $\mathbb{C}$, we denote by $\|S\|$ the operator norm corresponding to the previous norm. If $m$ is a function from $\mathbb{H}^+$ to $\mathcal{H}$, we denote by $m_x$ the function $z \mapsto m(z)(x)$. We then have the following result concerning the solution of the equation:

Theorem A.1. The equation $(K_\sigma)$ admits a unique solution $m$ such that $\Im m_x(z) < 0$ for all $x \in [0,1]$ and $z \in \mathbb{H}^+$; moreover, for every $x$, $m_x$ is the Stieltjes transform of a probability measure $p_x$ on $\mathbb{R}$.

This theorem is in fact a direct application of Theorem 2.1 from [4], which states that the equation $K_\sigma$ always has a solution in a more general context where we replace $[0,1]$ by a probability space $X$ and $S$ is a symmetric, positivity-preserving operator on the space of uniformly bounded complex functions on $X$.
Remark A.2. If σ is a piecewise constant variance profile with parameters α 1 , ..., α n and (σ i,j ) 1≤i,j≤n , then the solution of (K σ ) is piecewise constant on the intervals I j . This can be viewed directly from K σ by noticing that Sf is always piecewise constant on the intervals I j .
We will denote $\mu_\sigma := \int_0^1 p_x\, dx$, where $p_x$ is given by the preceding theorem. And so we have that the Stieltjes transform of $\mu_\sigma$ is $\int_0^1 m_x\, dx$. This measure $\mu_\sigma$ will be the limit of the empirical measures $\hat{\mu}_N$ of the matrices $X_N^{(\beta)}$.

Theorem A.3. Let $\sigma_N(x,y) := \Sigma_N(\lceil Nx \rceil, \lceil Ny \rceil)$. When $N$ tends to infinity, for almost every $x$ we have, with probability 1, convergence of the diagonal resolvent entries to the corresponding $m_x$, and convergence of $\hat{\mu}_N$ toward $\mu_\sigma$.

Theorem A.4. The solution of $(K_\sigma)$ is stable with respect to uniform perturbations of $\sigma^2$.

Proof. First, since for all $N$, $\Sigma_N$ is bounded and the coefficients of $X_N^{(\beta)}$ are uniformly sub-Gaussian, we may compare the two fixed-point equations. For $\delta$ small enough, $(v^{(n)})_{n \in \mathbb{N}}$ is a Cauchy sequence for the distance considered, and the iterates of $\Phi$ converge toward $\tilde{m}$, the fixed point of $\Phi$. Therefore, for every $\epsilon > 0$ and $\eta > 0$, there is $\delta > 0$ small enough such that $\sup_{x,y} |\sigma^2(x,y) - \tilde{\sigma}^2(x,y)| \le \delta$ implies $d_{\mathbb{H}^+_\eta}(m, \tilde{m}) \le \epsilon$. Since a base of neighbourhoods of $\mu_\sigma$ for the vague topology is given by sets defined through the Stieltjes transform on compact subsets of $\mathbb{H}^+$, the convergence of the Stieltjes transforms gives the convergence of the measures.

Corollary A.5. In the continuous case, $\hat{\mu}_N$ converges in probability toward $\mu_\sigma$.

Proof. This is a consequence of Theorem A.3 and Theorem A.4, by noticing that $\lim_{N \to \infty} \sup_{x,y \in [0,1]} |(\sigma_N)^2(x,y) - \sigma^2(x,y)| = 0$.
We will also need a similar result for the piecewise constant case.
Theorem A.6. Let $s = (s_{i,j})_{i,j=1}^n \in S_n(\mathbb{R}^+)$ and let $\alpha, \beta \in (\mathbb{R}^{+,*})^n$ be two vectors of positive coordinates summing to one, and let $\gamma_i = \sum_{j \le i} \alpha_j$ and $\tilde{\gamma}_i = \sum_{j \le i} \beta_j$. Let $\sigma$ and $\tilde{\sigma}$ be the two piecewise constant variance profiles associated respectively with the couples $(s, \alpha)$ and $(s, \beta)$, and let $v$ and $\tilde{v}$ be the solutions respectively of $K_\sigma$ and $K_{\tilde{\sigma}}$. For $i \in \{1,\dots,n\}$, let also $m_i$ and $\tilde{m}_i$ be the holomorphic functions given by $v_x = \sum_{i=1}^n \mathbb{1}_{\gamma_{i-1} \le x < \gamma_i} m_i$ and $\tilde{v}_x = \sum_{i=1}^n \mathbb{1}_{\tilde{\gamma}_{i-1} \le x < \tilde{\gamma}_i} \tilde{m}_i$. Then for every $\eta > 0$ there is $\epsilon > 0$ such that if $\sup_i |\alpha_i - \beta_i| \le \epsilon$, then for all $z \in \mathbb{H}^+_\eta$ we have $\sup_i |m_i(z) - \tilde{m}_i(z)| \le \eta$. Note that if we impose $s_{i,j} > 0$ for all $i,j$, this result is a particular case of Proposition 10.1 of [6]. However, since we would like to allow the $s_{i,j}$ to possibly vanish, we present the following proof, which, while not at the same level of generality and without quantitative bounds, will be sufficient for our purposes.
Proof. We use the same notations as in the previous proof. Since $v$ is the solution of $K_\sigma$, the $m_i$ satisfy the finite-dimensional system $m_i(z) = (z - \sum_j \alpha_j s_{i,j}^2 m_j(z))^{-1}$. On the set $B_\eta := \{u : \sup_{z \in \mathbb{H}^+_\eta} \|u(z)\| \le \eta^{-1}\}$, for $\eta$ small enough compared to $(2 + \min\{\|S\|, \|\tilde{S}\|\})^{-2}$, $\Phi$ and $\tilde{\Phi}$ map $B_\eta$ onto itself. For $u \in B_\eta$, we have as before that if $\delta \ge (\sup_{i,j} s_{i,j}^2)(\sup_k |\alpha_k - \beta_k|)$, then for all $i$ the two fixed-point maps differ by at most $\delta$. Then, using the same reasoning as in the previous case, we have that for every $\eta > 0$ there is $\delta > 0$ such that if $\sup_i |\alpha_i - \beta_i| \le \delta$, then $\sup_{z \in \mathbb{H}^+_\eta} \sup_i |m_i(z) - \tilde{m}_i(z)| \le \eta$.
Remark A.7. A more elementary proof of the convergence of the measure can also be obtained, since we have bounds on the moments of the entries in our case, via a classical moment method. For any $k \in \mathbb{N}$, we consider $W_k$ the set of words $w = (w_1, \dots, w_{2k+1})$ on $\{1, \dots, k+1\}$ such that $\{w_1, \dots, w_{2k+1}\} = \{1, \dots, k+1\}$, $w_1 = w_{2k+1}$, and such that for any $i$ there is exactly one $j \ne i$ such that $\{w_i, w_{i+1}\} = \{w_j, w_{j+1}\}$. For such words $w$, we define $E_w := \{\{w_i, w_{i+1}\} : i \in \{1, \dots, 2k\}\}$. On this set, we can define an equivalence relation $\sim$ by letting $w \sim w'$ if there is a permutation $f$ of $\{1, \dots, k+1\}$ such that $f(w_i) = w'_i$ for every $i$. We can then define $\bar{W}_k$, an arbitrary set of representatives of $W_k$ for the equivalence relation $\sim$. Then, using classical arguments for the computation of the moments of $\hat{\mu}_N$, one can see that the $2k$-th moment converges to a sum over $\bar{W}_k$ of integrals of products of $\sigma^2$ over the edges of $w$, with $c_k^N = 0$ for $k$ odd. So if we denote $c_k = \lim_{N \to \infty} c_k^N$, we have that in probability $\hat{\mu}_N$ converges toward a measure $\mu_\sigma$ whose moments are the $c_k$.
In order to apply the full results of [2], we will need the positivity assumption 3.2 for the piecewise constant variance profile. We then have the following convergence result: Theorem A.8. If we are in the piecewise constant case with Assumption 3.2 satisfied, if we let l N and r N be respectively the left and right edge of the support ofμ N , that is respectively the smallest eigenvalue and largest eigenvalue of X (β) N and l σ and r σ the left and right edges of the support of µ σ , we have for every δ > 0, D > 0, P[r N ≥ r σ + δ or l N ≤ l σ − δ] ≤ N −D for N large enough.
Proof. This is in fact an application of Corollary 1.10 from [2], which states that the extreme eigenvalues cannot leave the neighborhood of the support of $\mu_{\sigma_N}$, where $\sigma_N$ is the same as in Theorem A.3. We only need to check hypotheses (A) to (D) there. Up to multiplication by a scalar, our matrix model satisfies the boundedness condition (A), and Assumption 3.2 gives us the positivity hypothesis (B). The sharp sub-Gaussian hypothesis gives (D). For the boundedness condition on the Stieltjes transform (C), we can use Theorem 6.1 from [4]: our kernel operator $S$ satisfies assumptions A1, A2 and B1 there; in particular, we can use Remarks 6.2 and 6.3 to bound $m$ respectively away from and near 0. Then, we only need to prove that $r_{\sigma_N}$ and $l_{\sigma_N}$ converge toward $r_\sigma$ and $l_\sigma$. This can be done for instance by looking at the expression of the moments of $\mu_{\sigma_N}$ and $\mu_\sigma$ given in Remark A.7. $\sigma_N$ and $\sigma$ are both piecewise constant functions, with $\sigma$ being associated with the parameters $(\sigma_{i,j})_{i,j \in \{1,\dots,n\}}$ and $(\alpha_i)_{i \in \{1,\dots,n\}}$, and $\sigma_N$ being associated with the parameters $(\sigma_{i,j})_{i,j \in \{1,\dots,n\}}$ and $(\alpha_i^N)_{i \in \{1,\dots,n\}}$, where the $\alpha_i^N$ are defined by $\alpha_i^N = |I_i^{(N)}|/N$ (we recall that the $I_i^{(N)}$ are defined in subsection 1.1). Since $\lim_{N \to \infty} \alpha_i^N = \alpha_i$, for any $\epsilon > 0$ we have, for $N$ large enough, $(1 - \epsilon)\alpha_i \le \alpha_i^N \le (1 + \epsilon)\alpha_i$ for all $i$. Then, using the formula of Remark A.7, which gives the moments of $\mu_{\sigma_N}$ and $\mu_\sigma$ in terms of the $\sigma_{i,j}$ and the $\alpha_i$ and $\alpha_i^N$, we see that for every $\epsilon > 0$, for $N$ large enough, we have that for every $k \in \mathbb{N}$, $(1 - \epsilon)^{k+1} \mu_{\sigma_N}(x^{2k}) \le \mu_\sigma(x^{2k}) \le (1 + \epsilon)^{k+1} \mu_{\sigma_N}(x^{2k})$. We conclude using that, since the $\mu_\sigma$ and $\mu_{\sigma_N}$ are symmetric, $r_{\sigma_N} = -l_{\sigma_N} = \lim_{k \to \infty} \mu_{\sigma_N}(x^{2k})^{1/2k}$ and $r_\sigma = -l_\sigma = \lim_{k \to \infty} \mu_\sigma(x^{2k})^{1/2k}$.

B Appendix: Proof of Lemma 3.5
This section is devoted to the proof of Lemma 3.5. For this, we will use concentration results from [29] together with Theorem A.3. Using Theorem A.6, we have that $\lim_{N \to +\infty} d(\mu_{\sigma_N}, \mu_\sigma) = 0$ and therefore, using Theorem A.3, we have in probability $\lim_{N \to +\infty} d(\hat{\mu}_N, \mu_\sigma) = 0$. From this, we can deduce that $\lim_{N \to \infty} d(\mathbb{E}[\hat{\mu}_N], \mu_\sigma) = 0$. Indeed, if $f$ is a continuous function bounded in absolute value by 1, we have in probability that $\lim_{N \to \infty} \hat{\mu}_N(f) = \mu_\sigma(f)$, and so, since $|\hat{\mu}_N(f)| \le 1$, by dominated convergence $\lim_{N \to \infty} \mathbb{E}[\hat{\mu}_N](f) = \mu_\sigma(f)$.