Large deviations for extreme eigenvalues of deformed Wigner random matrices

We present a large deviation principle at speed N for the largest eigenvalue of some additively deformed Wigner matrices. In particular this includes Gaussian ensembles with full-rank general deformation. For the non-Gaussian ensembles, the deformation should be diagonal, and we assume that the laws of the entries have sharp sub-Gaussian Laplace transforms and satisfy certain concentration properties. For these latter ensembles we establish the large deviation principle in a restricted range $(-\infty, x_c)$, where $x_c$ depends on the deformation only and can be infinite.


Introduction
1.1 Deformed ensembles: typical behavior. In this paper, our goal is to prove a large deviation principle (LDP) for the largest eigenvalue of the random matrix
(1.1) X_N = W_N/√N + D_N.
Here W_N/√N lies in a particular class of real or complex Wigner matrices. Specifically, we will ask that the laws of the entries of W_N have sub-Gaussian Laplace transforms with certain variances, and that these laws satisfy concentration properties. The archetypal examples of this class are the Gaussian ensembles (GOE and GUE). We also assume that D_N is a deterministic matrix whose empirical spectral measure tends to a deterministic limit µ_D and whose extreme eigenvalues tend to the edges of µ_D. In all of our proofs we will assume that D_N is diagonal, but by rotational invariance, our results hold for the deformed Gaussian models even when D_N is not diagonal. More details on our assumptions will be given in Section 2.
If µ is a compactly supported measure on R, we write l(µ) and r(µ) for the left and right endpoints, respectively, of its support. For some special cases of our model, it is known that λ_N(X_N) → r(ρ_sc ⊞ µ_D) almost surely.
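For the undeformed case D_N = 0 (so that µ_D = δ_0 and ρ_sc ⊞ µ_D = ρ_sc), this typical behavior is easy to observe numerically. The following sketch is an illustration only, not part of the paper; the matrix size, the seed, and the GOE normalization (Var(W_ii) = 2, Var(W_ij) = 1 off the diagonal) are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400
A = rng.standard_normal((N, N))
W = (A + A.T) / np.sqrt(2)   # GOE: off-diagonal variance 1, diagonal variance 2
X = W / np.sqrt(N)           # X_N = W_N/sqrt(N) + D_N, here with D_N = 0
lam_max = np.linalg.eigvalsh(X)[-1]
print(lam_max)               # close to r(rho_sc) = 2 for large N
```

With these conventions the semicircle law is supported on [−2, 2], and the largest eigenvalue sticks to the right edge r(ρ_sc) = 2 up to small fluctuations.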
A central role in our analysis is played by the normalized spherical integral
(1.2) J_N(M, θ) = (1/N) log E_e[e^{Nθ⟨e, Me⟩}].
Here M is an N × N self-adjoint matrix, θ ≥ 0 is the argument of the Laplace transform, and the integration E_e is over vectors e uniform on the unit sphere S^{N−1} (we take S^{N−1} ⊂ R^N if M is real, or S^{N−1} ⊂ C^N if M is complex, so that (1.2) is real). If M is a random matrix, then (1.2) is a random variable. This is a special case of the famous Harish-Chandra/Itzykson/Zuber integral.
For an LDP for the model (1.1), we encounter two technical challenges. If we write P_N for the law of X_N and E_{X_N} for the corresponding expectation (and define E_{W_N} in the obvious way), then the main challenge is the computation of
(1.3) lim_{N→∞} (1/N) log E_{X_N}[E_e[e^{Nθ⟨e, X_N e⟩}]].
The term E_{X_N}[E_e[e^{Nθ⟨e, X_N e⟩}]] appears as a normalization constant when tilting the measure, so its logarithmic asymptotics appear as part of the rate function. To understand these asymptotics when W_N is not Gaussian, we use the method of [Guionnet and Husson, 2018], which applies only for θ small enough. The qualifier "for θ small enough" means that, via this argument, we can only obtain large-deviations asymptotics of events that localize λ_N(X_N) below some critical threshold x_c, which depends on the deformation µ_D only. We show x_c ≥ r(ρ_sc ⊞ µ_D) with strict inequality except in degenerate cases, and that x_c can be infinite. For example, x_c = +∞ when µ_D is the uniform measure on an interval. For the Gaussian ensembles, the limit in (1.3) is directly computable for every θ ≥ 0 without recourse to this delocalization problem, so our results for those models are stronger.
The second difficulty is that we need a concentration result bounding the probability that d(μ̂_{X_N}, ρ_sc ⊞ µ_D) exceeds κ, for κ > 0 small enough, where d is defined in (1.5). With ρ_sc ⊞ µ_D replaced by E[μ̂_{X_N}], this is standard concentration of linear statistics [Guionnet and Zeitouni, 2000], easily extended to our model. To approximate E[μ̂_{X_N}] by ρ_sc ⊞ µ_D, we use local laws for deformed ensembles [Lee et al., 2016, Lee and Schnelli, 2015]. Our argument is slightly technical, since these local laws let us approximate E[μ̂_{X_N}], not directly by ρ_sc ⊞ µ_D, but by a measure close to ρ_sc ⊞ μ̂_{D_N}, so several intermediate comparisons are needed. The organization of the paper is as follows: In Section 2, we state our assumptions and main result with commentary and examples. In Section 3, we provide background on spherical integrals, introduce the tilted measures, and give a high-level overview of the technique, as well as proofs of the weak-large-deviations upper and lower bounds. These arguments rely on several key lemmas, whose proofs make up the remaining three sections. In Section 4, we address the first technical issue discussed above. In Section 5, we prove exponential tightness for our model, then address the second technical issue discussed above. In Section 6, we establish properties of the rate function. Throughout, our results are stated for both the real and complex cases, but we only give proofs in the real case. The proofs in the complex case require only minor modifications.
Conventions. We use the shorthand β for the symmetry class at hand: β = 1 refers to real symmetric matrices and β = 2 refers to complex Hermitian matrices. Our norm on matrices is the operator norm ‖M‖ = sup_{‖u‖₂=1} ‖Mu‖₂. Our metric d on probability measures will be the Dudley distance (also called the bounded-Lipschitz distance), given by
(1.5) d(µ, ν) = sup{|∫ f dµ − ∫ f dν| : ‖f‖_∞ ≤ 1 and ‖f‖_Lip ≤ 1}.
Recall that this distance metrizes weak convergence. Finally, we recall the Stieltjes transform and the Voiculescu R-transform of a compactly supported probability measure. If µ is a probability measure on R the convex hull of whose support is [a, b], then we will normalize its Stieltjes transform G_µ as
G_µ(y) = ∫ µ(dt)/(y − t), for y ∈ R \ [a, b].
If we write G_µ(a) = lim_{y↑a} G_µ(y) and G_µ(b) = lim_{y↓b} G_µ(y), then it can be shown that G_µ is a bijection from R \ [a, b] to (G_µ(a), G_µ(b)) \ {0}. We will write K_µ for its functional inverse, and write R_µ(y) = K_µ(y) − 1/y for its Voiculescu R-transform, which linearizes free convolution: R_{µ₁ ⊞ µ₂} = R_{µ₁} + R_{µ₂}.
Acknowledgements. The author would like to thank Paul Bourgade for many helpful discussions, and Alice Guionnet and Ofer Zeitouni for explaining that one assumption in an early version of this paper was superfluous.
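As a concrete instance of the transforms just defined (an illustration, not part of the paper): with our normalization the semicircle law ρ_sc on [−2, 2] has G_{ρ_sc}(y) = (y − √(y² − 4))/2 for y > 2, hence K_{ρ_sc}(y) = y + 1/y and R_{ρ_sc}(y) = y. A quick numerical sanity check:

```python
import numpy as np

# Semicircle law on [-2, 2] (our normalization): Stieltjes transform for y > 2,
# its functional inverse K, and the R-transform R(y) = K(y) - 1/y.
G = lambda y: (y - np.sqrt(y * y - 4)) / 2
K = lambda y: y + 1 / y
R = lambda y: y

y = 3.0
assert abs(K(G(y)) - y) < 1e-12                      # K inverts G outside the support
assert abs(R(G(y)) - (K(G(y)) - 1 / G(y))) < 1e-12   # R(y) = K(y) - 1/y
print("ok")
```

The identity R_{ρ_sc}(y) = y is what makes the semicircle law the free analogue of the Gaussian: free convolution with ρ_sc simply adds y to the R-transform.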
2 Main result

2.1 Assumptions. We first present our assumptions on D_N, which will be in force throughout, even though we will only state them explicitly in the presentation of the main results.
Assumption I. The matrix D_N is real, diagonal, and deterministic, and its empirical measure μ̂_{D_N} tends weakly as N → ∞ to a compactly supported probability measure µ_D. Furthermore, λ_N(D_N) → r(µ_D) and λ_1(D_N) → l(µ_D).
Assumption II. There exist C > 0 and ε_0 > 0 such that d(μ̂_{D_N}, µ_D) ≤ CN^{−ε_0}.
Remark 2.1. We emphasize that µ_D is allowed to be quite poorly behaved. For example, it can be singular with respect to Lebesgue measure. It can also have disconnected support. Notice that Assumption II is fairly mild. For example, if µ_D has a density and the entries of D_N are the 1/N-quantiles of µ_D, then in fact d(μ̂_{D_N}, µ_D) = O(N^{−1}). In fact, the proof of Lemma 5.9 below shows that, instead of Assumption II, it suffices to bound the difference between the Stieltjes transforms of μ̂_{D_N} and µ_D at distance N^{−δ} from the real line, for δ > 0 small enough.
We will write the Laplace transform of a measure µ on C as T_µ(t) = ∫ e^{Re(tx̄)} µ(dx). If in fact µ is supported on R and t is real, this reduces to the familiar T_µ(t) = ∫ e^{tx} µ(dx).
We assume that W_N/√N is a Wigner matrix, by which we mean that its entries are independent up to the self-adjointness constraint. Our assumptions on the Wigner part are named, rather than numbered, to emphasize that our results apply under either of them, rather than requiring both.
Gaussian Hypothesis. The matrix W_N/√N is distributed according to the Gaussian Orthogonal Ensemble if β = 1, or the Gaussian Unitary Ensemble if β = 2. (That is, the law of W_N on the space of symmetric/Hermitian matrices has density proportional to exp(−β tr(W_N²)/4).)
SSGC Hypothesis. (This labelling stands for "sharp sub-Gaussian and concentrates," and matches the assumptions of [Guionnet and Husson, 2018].) Write µ^N_{i,j} for the law of the (i, j)th entry of W_N.
1. Assume both of the following.
• The first and second moments match those of the relevant Gaussian ensemble. In our normalization, this means that for every N ∈ N and i, j ∈ {1, …, N}, if β = 1 we have ∫ x µ^N_{i,j}(dx) = 0, with ∫ x² µ^N_{i,j}(dx) = 1 for i ≠ j and ∫ x² µ^N_{i,i}(dx) = 2, whereas if β = 2 and i ≠ j we have ∫ x µ^N_{i,j}(dx) = 0, ∫ |x|² µ^N_{i,j}(dx) = 1, and ∫ x² µ^N_{i,j}(dx) = 0. If β = 2, then each µ^N_{i,i} is supported on R, with ∫ x µ^N_{i,i}(dx) = 0 and ∫ x² µ^N_{i,i}(dx) = 1.
• For every N ∈ N and i, j ∈ {1, …, N}, the measure µ^N_{i,j} has a sharp sub-Gaussian Laplace transform:
(2.2) T_{µ^N_{i,j}}(t) ≤ e^{t²/2} for all t ∈ R
(for the off-diagonal entries when β = 1; on the diagonal, and when β = 2, the bound is the Laplace transform of the matching Gaussian entry).
2. In addition, assume one of the following concentration-type hypotheses.
• There exists a constant c independent of N such that, for all N ∈ N and all i, j ∈ {1, …, N}, the law µ^N_{i,j} satisfies a log-Sobolev inequality with constant c.
• There exists a compact set K independent of N (real if β = 1, or complex if β = 2) such that, for all N ∈ N and all i, j ∈ {1, …, N}, the law µ^N_{i,j} is supported in K.
Remark 2.3. A list of examples satisfying the SSGC Hypothesis is provided in [Guionnet and Husson, 2018]. Among these examples are real matrices whose entries follow the Rademacher law (1/2)(δ_{−1} + δ_{+1}) or the uniform law on [−√3, √3] (appropriately rescaled on the diagonal). In the literature, it is common to call a centered measure µ on R with unit variance sub-Gaussian whenever there exists A < ∞ such that T_µ(t) ≤ e^{At²/2} for all t ∈ R. We emphasize that we are asking for more; in (2.2) we require A = 1 (off the diagonal, with appropriate modifications otherwise), and following [Guionnet and Husson, 2018] we call such measures sharp sub-Gaussian. This is a strict subclass; for example, the law of (1/√p)BG, where B ∼ Bernoulli(p) and G ∼ N(0, 1) are independent, has unit variance but A = 1/p. This example appears in [Augeri et al., 2019], which treats the general case A > 1, with zero deformation.
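These two claims are easy to check numerically (an illustration, not part of the paper; the grid, the value of p, and the evaluation point are our own choices). The Rademacher law has T_µ(t) = cosh(t) ≤ e^{t²/2}, so A = 1, while the Bernoulli–Gaussian mixture has T_µ(t) = (1 − p) + p e^{t²/(2p)}, so 2 log T_µ(t)/t² → 1/p as t grows:

```python
import numpy as np

# Rademacher law: T(t) = cosh(t) <= exp(t^2/2), i.e. sharp sub-Gaussian with A = 1.
t = np.linspace(-6, 6, 2001)
assert np.all(np.cosh(t) <= np.exp(t * t / 2) + 1e-12)

# Bernoulli(p)-Gaussian mixture (1/sqrt(p)) * B * G: unit variance, but the best
# sub-Gaussian constant is A = 1/p, seen from log T(t) for large t.
p = 0.25
big_t = 40.0
log_T = np.logaddexp(np.log(1 - p), np.log(p) + big_t ** 2 / (2 * p))
A_eff = 2 * log_T / big_t ** 2
print(A_eff)   # close to 1/p = 4
```

The log-space evaluation via `logaddexp` avoids the overflow that a direct computation of e^{t²/(2p)} would cause at large t.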

Main result.
Definition 2.4. For a compactly supported measure ν, a parameter θ ≥ 0, and a real number M ≥ r(ν), define the function J(ν, θ, M) as in (2.5). (If M = r(ν), we recall our convention G_ν(r(ν)) = lim_{y↓r(ν)} G_ν(y), which is possibly infinite.) In Section 3.1 we will explain how this function arises as the limit of appropriately normalized spherical integrals. For x ≥ r(ρ_sc ⊞ µ_D) and θ ≥ 0, we define the function I^{(β)}(x, θ), and then set
I^{(β)}(x) = sup_{θ≥0} I^{(β)}(x, θ).
We will show below that I^{(β)}(x, 0) = 0 for all measures µ_D.
To state our result, we will need the following critical threshold.
Definition 2.6. Given the compactly supported measure µ_D, we define the real number x_c. It will be shown in Proposition 6.1 below that x_c ≥ r(ρ_sc ⊞ µ_D), with equality if and only if an inequality involving the Stieltjes transform of µ_D degenerates.
The main result of the paper is the following: Theorem 2.7. Suppose that Assumptions I and II hold.
1. If the Gaussian Hypothesis holds, then the law of the largest eigenvalue λ N (X N ) satisfies a large deviation principle at speed N with the good rate function I (β) (x). By rotational invariance, we have the same result when D N is not diagonal but simply symmetric (if β = 1) or Hermitian (if β = 2) and satisfies the rest of the requirements of Assumption I.
2. If instead the SSGC Hypothesis holds, then the law of the largest eigenvalue λ_N(X_N) satisfies what we will call a "restricted large deviation principle on (−∞, x_c)" at speed N with the good rate function I^{(β)}(x). This means the following:
• For every closed set F ⊂ (−∞, x_c), we have
(2.8) lim sup_{N→∞} (1/N) log P_N(λ_N(X_N) ∈ F) ≤ −inf_{x∈F} I^{(β)}(x).
• For every open set O ⊂ (−∞, x_c), we have
(2.9) lim inf_{N→∞} (1/N) log P_N(λ_N(X_N) ∈ O) ≥ −inf_{x∈O} I^{(β)}(x).
3. In particular, if the SSGC Hypothesis holds and µ_D is such that x_c = +∞, then the law of the largest eigenvalue λ_N(X_N) satisfies a large deviation principle at speed N with the good rate function I^{(β)}(x) in the usual sense.
Remark 2.10. See Proposition 6.1 below for a more in-depth study of the function I^{(β)}(x). There, it is shown that I^{(β)}(x) has a unique minimizer at x = r(ρ_sc ⊞ µ_D), where it takes the value zero. In particular, if the Gaussian Hypothesis holds, or if the SSGC Hypothesis holds and µ_D is such that x_c = +∞, then
(2.11) λ_N(X_N) → r(ρ_sc ⊞ µ_D) almost surely.
This result appears to be new in the real case when ρ_sc ⊞ µ_D is multicut, and in the complex non-Gaussian case when ρ_sc ⊞ µ_D is multicut and (D_N)_{N=1}^∞ has "internal outliers" between the connected components of supp(µ_D) that persist as N → ∞. (Recall that we forbid "external outliers" by assuming λ_N(D_N) → r(µ_D) and λ_1(D_N) → l(µ_D).) In the literature, Equation (2.11) appears as an easy corollary of edge universality results, or as a special case of BBP results when the deforming matrix D_N has no external outliers. For example, it follows from [Capitaine and Péché, 2016] for deformed GUE, possibly multicut with internal outliers, under some assumptions about the decay rate of µ_D near its edges; from [Lee and Schnelli, 2015] for general real or complex noise if µ_D is such that ρ_sc ⊞ µ_D is supported on a single interval with square-root decay at its two edges; and from [Belinschi and Capitaine, 2017] in the complex (and possibly multicut) case with no outliers. Of course, all of these papers achieve much more.
Remark 2.12. The proof of the "restricted LDP," i.e., of Equations (2.9) and (2.8), follows in the classical way from estimates of small-ball probabilities via a weak large deviation principle and exponential tightness, except that we can only lower-bound small-ball probabilities P N (|λ N (X N ) − x| < δ) for x < x c rather than x ∈ R. However, we can upper-bound these probabilities for all x (see Theorem 3.4), so Equation (2.8) actually holds for all closed F ⊂ R, not just F ⊂ (−∞, x c ).
Remark 2.13. Of course, one would prefer to write the rate function non-variationally, and we can do this when the argument is at or above the critical threshold x_c(µ_D). Proposition 6.1 shows that, for all x > r(ρ_sc ⊞ µ_D), the supremum in the definition of I(x) is achieved at a unique θ_x. For x ≥ x_c (which is relevant for the Gaussian case), this θ_x is given explicitly. (If x_c(µ_D) < ∞, then ∫ log(r(µ_D) − y) µ_D(dy) is finite.) But for subcritical x values, θ_x is defined implicitly in the proof of Proposition 6.1 as the unique solution of the constrained problem (2.14). We have not found a way to solve this constrained problem explicitly, nor to write I^{(β)}(x, θ_x) explicitly at its solution. If the domain of θ_x in the constraint were instead (0, (β/2) G_{ρ_sc⊞µ_D}(r(ρ_sc ⊞ µ_D))), the equation would simplify to K_{ρ_sc⊞µ_D}((2/β)θ_x) = x, which has the solution θ_x = (β/2) G_{ρ_sc⊞µ_D}(x). But K_{ρ_sc⊞µ_D}(·) is not generally guaranteed to exist for arguments larger than G_{ρ_sc⊞µ_D}(r(ρ_sc ⊞ µ_D)), and even when extendable it may not be globally invertible.
Thus our rate function remains implicit for subcritical x values. Nevertheless, in some simple cases the constrained problem can be solved explicitly; two examples are given below in Sections 2.3 and 2.4.
Remark 2.15. If D_N = 0, then x_c = +∞, and we recover [Guionnet and Husson, 2018, Theorems 1.4 and 1.5], which in particular includes the classical LDP for the Gaussian ensembles. (The last equality in the above display is true by [Guionnet and Husson, 2018, Section 4.1].) Notice that we get the same rate function if D_N is not identically zero but rather ‖D_N‖ → 0 sufficiently quickly.
Remark 2.16. One wants to recover large deviations for BBP-type problems, so it is tempting to conjecture that, if the largest eigenvalue of D_N tends not to r(µ_D) but to some ρ > r(µ_D), then an LDP should hold for λ_N(X_N) at speed N with a suitably modified good rate function Ĩ^{(β)}. But, at least for certain simple situations, such a conjecture would be wrong. For example, suppose that W_N/√N is distributed according to the GOE (if β = 1) or the GUE (if β = 2), that µ_D = δ_0 (so that ρ_sc ⊞ µ_D = ρ_sc), and that D_N has N − 1 zero eigenvalues with one spike at, say, 2 for concreteness. Then it is known [Maïda, 2007, Theorem 1.2] that λ_N(X_N) satisfies an LDP at speed N with a good rate function that is finite for x ≥ 2.
(The published rate function has a typo; it is corrected in the v2 arXiv posting. We also normalize the semicircle law differently.) Notice that this rate function vanishes uniquely at x = 5/2, which lies outside supp(ρ_sc): this model is past the BBP phase transition. But in this situation one can compute Ĩ^{(β)} explicitly, and it does not match this rate function. It is likely that our method could be extended, as in [Guionnet and Maïda, 2018], to models where lim_{N→∞} λ_N(D_N) is a spike below the BBP threshold, i.e., such that still λ_N(X_N) → r(ρ_sc ⊞ µ_D) almost surely. But a new idea is needed beyond the BBP threshold.
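The location x = 5/2 in this example is the classical BBP spike formula ρ + 1/ρ evaluated at ρ = 2 (with our variance-one normalization of the semicircle law), and it is easy to observe numerically. This is an illustration only; the matrix size and seed are our own choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
A = rng.standard_normal((N, N))
W = (A + A.T) / np.sqrt(2)      # GOE normalization as before
D = np.zeros((N, N))
D[0, 0] = 2.0                   # rank-one spike at rho = 2; mu_D = delta_0
X = W / np.sqrt(N) + D
lam_max = np.linalg.eigvalsh(X)[-1]
print(lam_max)                  # close to rho + 1/rho = 5/2
```

Since ρ = 2 exceeds the BBP threshold ρ = 1, the largest eigenvalue detaches from the bulk edge 2 and sits near 5/2, exactly where Maïda's rate function vanishes.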

Second example
Figure 1: Sketch of the rate function when β = 1 and µ_D = ρ_sc (the critical threshold x_c = 3 is marked).

Here we take µ_D = (1/2)(δ_{−a} + δ_a) for some parameter a > 0. In this case x_c(µ_D) = +∞ (since G_{µ_D}(a) = +∞), so all x are subcritical; that is, we can estimate any probability P_N(λ_N ∈ A) under either the SSGC Hypothesis or the Gaussian Hypothesis. Our computations use the known result (2.17).
In the physics literature this dates back to [Zee, 1996, Equations (55), (56)]; it was established in the mathematical literature in [Bleher and Kuijlaars, 2004, Equations (3.5), (3.6)] (for a > 1), [Aptekarev et al., 2005, Section 1] (for a < 1), and [Bleher and Kuijlaars, 2007, Section 7] (for a = 1). These three papers establish that the measure ρ_sc ⊞ µ_D undergoes a phase transition at a = 1. When a > 1, the support of ρ_sc ⊞ µ_D consists of two intervals; when a = 1, these intervals meet at zero, where the density has cubic-root decay; and when a < 1 the support is a single interval, on the interior of which the density is strictly positive. (This set of three papers also establishes universality of correlation functions.) We emphasize that our results apply to all a > 0. Using the equivalent formula of [Guionnet and Maïda, 2005, Theorem 6] and the constrained equation (2.14) implicitly defining θ_x, one can compute G_{ρ_sc⊞µ_D}(y) for y > r(a), choosing branches according to the requirement that G_{ρ_sc⊞µ_D}(y) be decreasing on (r(ρ_sc ⊞ µ_D), ∞); one then takes the limit y ↓ r(a). This gives us the bounds on the constrained problem (2.14); since K_{µ_D}(y) = (√(1 + 4a²y²) + 1)/(2y), this problem can be solved explicitly. On the other hand, since ρ_sc ⊞ µ_D decays at most like a cube root near its edges [Biane, 1997, Corollary 5], we can differentiate under the integral sign; we compute the resulting second term on the right-hand side by setting x = r(ρ_sc ⊞ µ_D) in (2.18), since then I^{(β)}(x) = 0.
Question 2.19. Does the mechanism driving the deviations {λ_N(X_N) ≈ x} change as x passes the critical threshold x_c? Specifically, can one formalize and prove the notion that, with large probability, the eigenvector corresponding to λ_N is delocalized under the above event for subcritical x values, but localizes for supercritical x values?

3 Proof overview

3.1 Spherical integrals.
Given a self-adjoint N × N matrix X and θ ≥ 0, consider
I_N(X, θ) = E_e[e^{Nθ⟨e, Xe⟩}] and J_N(X, θ) = (1/N) log I_N(X, θ).
We recall that E_e only averages over the unit sphere, so if X is random then I_N(X, θ) and J_N(X, θ) are random variables.
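The definition can be illustrated with a crude Monte Carlo estimate (an illustration only, not the paper's method; the sampling scheme, sample size, and seed are our own choices). For X = I we have ⟨e, Xe⟩ = 1 for every unit vector e, so J_N(I, θ) = θ exactly, and the estimator reproduces this:

```python
import numpy as np

def J_N(X, theta, n_samples=20000, seed=0):
    """Monte Carlo estimate of J_N(X, theta) = (1/N) log E_e[exp(N theta <e, X e>)],
    with e uniform on the real unit sphere S^{N-1}."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    g = rng.standard_normal((n_samples, N))
    e = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform points on the sphere
    quad = np.einsum('si,ij,sj->s', e, X, e)           # <e, X e> for each sample
    m = (N * theta * quad).max()                       # log-sum-exp for stability
    return (m + np.log(np.exp(N * theta * quad - m).mean())) / N

N, theta = 30, 0.5
print(J_N(np.eye(N), theta))   # equals theta = 0.5 up to floating-point error
```

For matrices with spread-out spectra the naive estimator degrades quickly as Nθ grows, since the integrand concentrates near the top eigenvector; this is precisely the localization phenomenon discussed below.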
If {X_N} is such that μ̂_{X_N} has a weak limit ν, then we might hope that J_N(X_N, θ) also has a limit depending on ν and θ. This is so; but the limit also depends on λ_N(X_N) if θ is sufficiently large. This should not be surprising, since the integrand e^{Nθ⟨e,Xe⟩} is maximized near the eigenvector corresponding to λ_N(X), especially for larger θ values. Indeed, we have the following result.
Proposition 3.1. [Guionnet and Maïda, 2005, Theorem 6] Suppose that the sequence (A N ) ∞ N =1 of self-adjoint matrices is such thatμ AN → ν weakly for some compactly-supported measure ν, that λ 1 (A N ) has a finite limit, and that λ N (A N ) → M for some real number M . (Notice that we are not assuming that M is the right edge of ν, but of course we must have M ≥ r(ν).) If θ ≥ 0, then

Tilted measures and weak large deviations.
Our general strategy will be to show a weak large deviation principle, as well as exponential tightness. In the proof of the weak-large-deviations lower bound for our measure of interest, we will actually need a weak-large-deviations upper bound for the following family of measures.
Definition 3.2. Given θ ≥ 0, we consider the "tilted" measure P^θ_N on N × N matrices (symmetric if β = 1, or Hermitian if β = 2) whose density with respect to the law P_N of X_N is given by
(dP^θ_N/dP_N)(X) = I_N(X, θ)/E_{X_N}[I_N(X_N, θ)].
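With this tilted density, the change-of-measure identity that drives the small-ball lower bounds can be written out as follows (a standard manipulation, spelled out here for convenience, with $I_N$ as above and $\mathbb{E}_{X_N}[I_N(X_N,\theta)]$ the normalizing constant):

```latex
P_N(A)
= \mathbb{E}_{P^\theta_N}\!\left[\frac{dP_N}{dP^\theta_N}\,\mathbf{1}_A\right]
= \mathbb{E}_{P^\theta_N}\!\left[\frac{\mathbb{E}_{X_N}[I_N(X_N,\theta)]}{I_N(X,\theta)}\,
   \mathbf{1}_A(X)\right].
```

Thus a lower bound on P^θ_N(A), combined with an upper bound on I_N(X, θ) valid on A and the free-energy asymptotics of the normalizing constant, yields a lower bound on P_N(A).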
Notice from the definition of I N that P 0 N = P N . We will need the following asymptotics of the free energy for this measure, with proof in Section 4.
Proposition 3.3. Under the Gaussian Hypothesis, choose any θ ≥ 0; or, under the SSGC Hypothesis, choose any 0 ≤ θ < θ_c. Then the limit lim_{N→∞} (1/N) log E_{X_N}[I_N(X_N, θ)] exists.
We split up the weak-large-deviations upper and lower bounds as follows:
Theorem 3.4. Under the Gaussian Hypothesis, choose any θ ≥ 0; or, under the SSGC Hypothesis, choose any 0 ≤ θ < θ_c. Then, for every compact set F ⊂ R,
lim sup_{N→∞} (1/N) log P^θ_N(λ_N(X_N) ∈ F) ≤ −inf_{y∈F} [I^{(β)}(y) − I^{(β)}(y, θ)].
Notice that I^{(β)}(x, 0) = 0 for all measures µ_D and all x ≥ r(ρ_sc ⊞ µ_D). Thus when θ = 0 we recover the weak large deviation upper bound for the measure of primary interest, under either Hypothesis.
Theorem 3.5. Under the Gaussian Hypothesis, choose any x ∈ R; or, under the SSGC Hypothesis, choose any x < x_c. Then
lim_{δ↓0} lim inf_{N→∞} (1/N) log P_N(|λ_N(X_N) − x| < δ) ≥ −I^{(β)}(x).

3.3 Outline. When estimating (1/N) log P_N(|λ_N(X_N) − x| ≤ δ) by tilting by spherical integrals, one wants to understand J_N(X_N, θ) on the event {|λ_N(X_N) − x| ≤ δ}. To localize J_N(X_N, θ), one needs to control μ̂_{X_N}. Therefore one wants to find a set A^M_{x,δ} ⊂ {|λ_N(X_N) − x| ≤ δ} of matrices with controlled empirical measures (which will turn out to depend on some M ≫ 1) satisfying both of the following:
• On the one hand, A^M_{x,δ} is a continuity set for spherical integrals, in the sense that we have a good enough understanding of J_N(M, θ) for M ∈ A^M_{x,δ} to be able to estimate the relevant probabilities.
• On the other hand, A^M_{x,δ} carries all but an exponentially negligible fraction of the relevant probability.
The next subsection first details the continuity result of [Maïda, 2007], which helps us choose A^M_{x,δ} while satisfying the first point, then states a proposition which we need to show that our choice satisfies the second point.

Continuity of spherical integrals.
Proposition 3.6. [Maïda, 2007, Proposition 2.1] For any θ > 0 and any κ > 0, there exists a function g_{κ,θ}: R_+ → R_+ going to zero at zero such that, for any δ > 0 and N large enough, if B_N and B′_N are sequences of matrices such that d(μ̂_{B_N}, μ̂_{B′_N}) ≤ δ and |λ_N(B_N) − λ_N(B′_N)| ≤ δ (along with suitable norm bounds in terms of κ), then |J_N(B_N, θ) − J_N(B′_N, θ)| ≤ g_{κ,θ}(δ).
This suggests that we introduce the following deterministic sets of N × N symmetric matrices. Fix once and for all a κ satisfying Proposition 3.9 below, and write g_θ for g_{κ/2,θ}; then for any x ∈ R, δ > 0, and M > 0, define the set A^M_{x,δ}. In the next few results, we discretize the measure ρ_sc ⊞ µ_D so that we can apply Proposition 3.6 and control the spherical integrals on A^M_{x,δ}. Then there exists a sequence of deterministic matrices B′_N with the following properties: choose real numbers γ_1, …, γ_{N−1} discretizing ρ_sc ⊞ µ_D (this is possible since ρ_sc ⊞ µ_D admits a density [Biane, 1997, Corollary 2]), and let B′_N = diag(γ_1, …, γ_{N−1}, x). Since our distance on probability measures is defined with respect to bounded-Lipschitz test functions, it is easy to show that the required closeness holds for N sufficiently large. In addition, by Proposition 3.1 we have convergence of J_N(B′_N, θ). The result follows.
On the other hand, the result below shows that the restrictions we added to {X : |λ N (X) − x| < δ} to arrive at A M x,δ have probability negligibly close to 1 at the exponential scale. Notice that the first point is exponential tightness. The proof will make up Section 5.
Proposition 3.9. Assume either the Gaussian Hypothesis or the SSGC Hypothesis.
1. For every θ ≥ 0 we have 3. There exists γ > 0 such that, for any 0 < κ < γ and any θ ≥ 0, Theorem 2.7 follows in the classical way from the exponential tightness above, the weak LDP upper bound (Theorem 3.4), and the weak LDP lower bound (Theorem 3.5). We now prove the latter two.

The proof of the weak LDP upper bound.
Lemma 3.10. Fix y ≥ r(ρ sc ⊞ µ D ) and M > y sufficiently large. Under the Gaussian Hypothesis, choose any θ ≥ 0; or, under the SSGC Hypothesis, choose any 0 ≤ θ < θ c . Then Proof. For any θ ′ ≥ 0, we have Fix ǫ > 0. By Corollary 3.8 and Lemmas 4.1 (applied to θ ′ , which is any nonnegative number) and 4.2 (applied to θ, which is subcritical if necessary), if M > y + δ (true for small enough δ since M > y) and for N sufficiently large depending on θ, θ ′ , and ǫ, we thus have By taking N → ∞, then δ ↓ 0, then ǫ ↓ 0, we obtain lim sup which gives us the result by optimizing over θ ′ .
Proof of Theorem 3.4. We first focus on the case when x < r(ρ sc ⊞ µ D ). For such an x, if δ is so small that x + δ < r(ρ sc ⊞ µ D ) − δ, then whenever |λ N (X N ) − x| ≤ δ, the empirical spectral measureμ XN does not charge (r(ρ sc ⊞ µ D ) − δ, r(ρ sc ⊞ µ D )). Hence d(μ XN , ρ sc ⊞ µ D ) ≥ f (δ) for some positive function f . Thus for such δ and for N large enough we have which suffices in light of Proposition 3.9. Thus in the following it remains only to consider x ≥ r(ρ sc ⊞ µ D ). Fix θ ≥ 0, δ > 0, x > r(ρ sc ⊞ µ D ), and a sufficiently large M . Then we have An application of Proposition 3.9 gives us lim sup By taking δ ↓ 0 and applying Lemma 3.10, we obtain Finally we obtain the result by taking M → ∞ and applying again Proposition 3.9.
3.6 The proof of the weak LDP lower bound. The following lemma relies on results about the rate function which will be established in Section 6.
Lemma 3.11. Under the Gaussian Hypothesis, choose any x ≥ r(ρ_sc ⊞ µ_D); or, under the SSGC Hypothesis, choose any r(ρ_sc ⊞ µ_D) ≤ x < x_c. Then there exists θ_x > 0 such that, for any M sufficiently large depending on x and any δ > 0 sufficiently small depending on x, we have, for every ε > 0 and N sufficiently large depending on ε, P^{θ_x}_N(A^M_{x,δ}) ≥ e^{−Nε}. Proof. Fix x ≥ r(ρ_sc ⊞ µ_D), and let θ_x be such that I^{(β)}(x) = sup_{θ≥0} I^{(β)}(x, θ) = I^{(β)}(x, θ_x). Proposition 6.1 below shows that this exists and is unique (except at x = r(ρ_sc ⊞ µ_D), where we choose one of many possible θ_x values by convention), and that θ_x < θ_c whenever x < x_c. We claim that in fact P^{θ_x}_N(A^M_{x,δ}) = 1 − o(1); to prove this, by Proposition 3.9 it suffices to show that P^{θ_x}_N(|λ_N(X_N) − x| > δ) = o(1), and since the law of λ_N is exponentially tight under P^{θ_x}_N, we need only show that, for K large enough, P^{θ_x}_N(λ_N(X_N) ∈ [r(ρ_sc ⊞ µ_D) − 1, x − δ] ∪ [x + δ, K]) = o(1). But Theorem 3.4 shows a weak large deviation upper bound for P^{θ_x}_N with the rate function J^{(β)}_x(y) = I^{(β)}(y) − I^{(β)}(y, θ_x), which Proposition 6.1 below shows is nonnegative and vanishes uniquely at y = x. (This theorem applies, since θ_x is less than θ_c if necessary.) Since [r(ρ_sc ⊞ µ_D) − 1, x − δ] ∪ [x + δ, K] is a compact set that does not contain x, this suffices.
Proof of Theorem 3.5. If x < r(ρ sc ⊞ µ D ), then I (β) (x) = +∞, and there is nothing to prove. Thus we will assume in the following that x ≥ r(ρ sc ⊞ µ D ).
Whenever X ∈ A M x,δ , by Corollary 3.8 we have for N sufficiently large. In addition, for every ǫ > 0, Lemma 3.11 tells us that for N sufficiently large depending on ǫ we have P θx N (A M x,δ ) ≥ e −N ǫ . We wish to use Proposition 3.3 to conclude that, for N sufficiently large depending on ǫ and on θ x , we also have Under the Gaussian Hypothesis, this is permissible for every x; under the SSGC Hypothesis, our restriction x < x c tells us by Lemma 3.11 that θ x < θ c , so that Proposition 3.3 indeed applies. Thus Thus, fixing some M sufficiently large, we obtain and since this is true for every ǫ > 0 we can take the limit as δ ↓ 0 to conclude.

Free energy expansion
In this section we prove Proposition 3.3.
Proof under the Gaussian Hypothesis. In this case, the Gaussian integral can be computed directly, which gives the result for every θ ≥ 0. The proof under the SSGC Hypothesis is more involved and will take up the remainder of this section. We separate the upper and lower bounds as follows:
Lemma 4.1. Under the SSGC Hypothesis, for any θ ≥ 0, the corresponding upper bound on the free energy holds.
Lemma 4.2. Under the SSGC Hypothesis, for any 0 ≤ θ < θ_c, the corresponding lower bound holds.
The proof of the lower bound will use the following two technical results.
Lemma 4.3. Under the SSGC Hypothesis, for every δ > 0 there exists ε(δ) > 0 such that, for every N ∈ N, every i, j ∈ {1, …, N}, and every t ∈ R with |t| ≤ ε(δ) if β = 1 (or every t ∈ C with |t| ≤ ε(δ) if β = 2), the two-sided estimate (4.5) on T_{µ^N_{i,j}}(t) holds.
Proof of Lemma 4.1. For every unit vector e, the sub-Gaussian-Laplace-transform assumption of the SSGC Hypothesis gives an upper bound on E_{W_N}[e^{Nθ⟨e, X_N e⟩}] matching the Gaussian computation. To complete the proof, we integrate over S^{N−1} and apply Proposition 3.1.
Proof of Lemma 4.2. Fix δ > 0, and let ǫ = ǫ(δ) be as in Lemma 4.3, proved below. Whenever the unit vector e is such that e ∞ ≤ N −3/8 , we have for N ≥ N 0 (δ). (The proof below will work with any exponent strictly between −1/2 and −1/4; but since the exponent does not appear in the final result, we have chosen −3/8 for definiteness.) Thus the lower bound on the Laplace transform of the SSGC Hypothesis gives us, for such vectors e, Thus Lemma 4.4, which is proved below, and Proposition 3.1 give us for every δ > 0.
Proof of Lemma 4.4. This builds on the proof of Lemma 14 in [Guionnet and Maïda, 2005]. Notice that the upper bound in Equation (4.5) is for free; we only need to show the lower bound. It is well known that (e_1, …, e_N) =_d (g_1/‖g‖₂, …, g_N/‖g‖₂), where g = (g_1, …, g_N) is a standard Gaussian vector in R^N. The idea is to work in this Gaussian representation, relying on the fact that ‖g‖₂ concentrates around √N. Towards this end, we rewrite our desired inequality in Gaussian terms. Since standard Gaussian measure is isotropic, we may and will assume for the remainder of this proof that the d_i's are ordered as d_1 ≥ ⋯ ≥ d_N. Write v_N for the unique solution in (d_1 − 1/(2θ), +∞) of the associated fixed-point equation. (This solution exists and is unique because the left-hand side is a strictly decreasing positive function of v ∈ (d_1 − 1/(2θ), +∞), tending to infinity as v ↓ d_1 − 1/(2θ) and tending to zero as v → ∞.) Let us pause to collect some facts about v_N, valid for N_0 large enough [Maïda, 2007, Fact 2.4]. Furthermore, the proof of [Guionnet and Maïda, 2005, Theorem 2] shows that, since θ < θ_c, there exists some small η > 0 such that, for all i,
(4.7) 1 + 2θv_N − 2θd_i ≥ η.
By the proof of [Guionnet and Maïda, 2005, Lemma 14] (for the first inequality) and Equation (4.6) (for the second), for every 0 < κ < 1/2 and N large enough depending on κ, we have the bound (4.8). For 0 < κ < 1/2, we introduce the event
A_N(κ) = {|‖g‖₂²/N − 1| ≤ N^{−κ}}.
Now the same arguments from [Guionnet and Maïda, 2005, Lemma 14], along with Equation (4.6), give the bound (4.9), where P^{v_N}_N = P^{v_N,D_N,θ}_N is a probability measure on R^N. By Equations (4.8) and (4.9), we are done if we can show that P^{v_N}_N(A_N(κ)) is not exponentially small. The proof of [Guionnet and Maïda, 2005, Lemma 14] shows that, for our choice of v_N and since we have chosen θ < θ_c, this holds for N large enough depending on κ; the remaining error term is o(1). This concludes the proof.
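The concentration of ‖g‖₂ around √N used in this Gaussian representation is easy to see numerically (an illustration only; the dimension and seed are our own choices): ‖g‖₂²/N has mean 1 and standard deviation √(2/N).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
g = rng.standard_normal(N)
dev = abs(g @ g / N - 1)    # | ||g||_2^2 / N - 1 |, typically of size sqrt(2/N)
print(dev)
```

For N = 10⁵ the typical deviation is about 0.0045, consistent with the event A_N(κ) above having overwhelming probability.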

Concentration and exponential tightness for tilted measures
5.1 Proof overview. The proof of Proposition 3.9 is broken into the following four lemmata:
Lemma 5.1. If Proposition 3.9 holds for θ = 0, then it holds for all θ > 0. (For the last point, the same γ > 0 works for all θ ≥ 0.)
Lemma 5.2. For any K > 2d_max, the probability P_N(λ_N(X_N) ≥ K) is exponentially small in N, with an explicit bound. In particular, the first point of Proposition 3.9 is true for θ = 0.
Lemma 5.3. The second point of Proposition 3.9 is true for θ = 0.
Lemma 5.4. Under Assumption II, the third point of Proposition 3.9 is true for θ = 0: there exists γ > 0 such that, for any 0 < κ < γ, the stated concentration holds. Note that this result is the only place in the paper where we use Assumption II.
Now, whenever A = A_N is a Borel subset of the space of N × N real matrices, Equation (5.5) and Cauchy–Schwarz give us, for N sufficiently large depending on θ, a bound on P^θ_N(A_N) in terms of P_N(A_N), where the last inequality follows from Lemma 4.1. This estimate gives us the following two points, from which we can verify the various claims of Proposition 3.9 by taking various choices of {A_N} and {A_{M,N}}.

5.3 Proof of Lemmas 5.2 and 5.3. For Lemma 5.2, notice that it suffices to bound P_N(‖X_N‖ ≥ K). After a triangle-inequality bound splitting off the deterministic part D_N, the second term vanishes for K large enough, so we only need to control the first term. But this was done in [Guionnet and Husson, 2018, Lemma 1.8]. The constants are slightly worse for the β = 2 estimate, and we phrase Lemma 5.2 in terms of these worse constants. Lemma 5.3 is an immediate consequence.

Proof of Lemma 5.4.
Lemma 5.6. With C and ε_0 as in Assumption II, for any η ≤ 1 we have the following bound, uniformly in E ∈ R.
Proof. By recalling the definition of the Dudley distance and by calculating the L^∞ norm and Lipschitz constant of the function y ↦ 1/(E + iη − y), we find that, for some universal constant C′,
|G_{μ̂_{D_N}}(E + iη) − G_{µ_D}(E + iη)| ≤ (C′/η²) d(μ̂_{D_N}, µ_D),
uniformly in E ∈ R. Now we control d(ρ_sc ⊞ μ̂_{D_N}, ρ_sc ⊞ µ_D) in terms of d(μ̂_{D_N}, µ_D). Write d_L for the Lévy distance between probability measures:
d_L(µ, ν) = inf{ε > 0 : µ(A) ≤ ν(A^ε) + ε for all Borel A}.
Then it is classical [Dudley, 2002, Corollary 11.6.5 and Theorem 11.3.3] that, whenever $\mu$ and $\nu$ are probability measures on $\mathbb{R}$, the Dudley and Lévy distances control each other. On the other hand, [Bercovici and Voiculescu, 1993, Proposition 4.13] provides a matching estimate for the free convolution. Putting these together, we obtain the claimed bound. This finishes the proof by Assumption II.
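For the reader's convenience, the two constants from the first step of the proof above can be made explicit; this is a routine computation, in which the bounded-Lipschitz normalization of the Dudley distance is an assumption on our part:

```latex
% For f_z(y) = (E + i\eta - y)^{-1} with z = E + i\eta, \eta > 0:
\|f_z\|_{L^\infty} \le \frac{1}{\eta},
\qquad
\|f_z\|_{\mathrm{Lip}} = \sup_y |f_z'(y)|
  = \sup_y \frac{1}{|E + i\eta - y|^2} \le \frac{1}{\eta^2},
% so, if d(\mu,\nu) = \sup\{ |\textstyle\int f \,d\mu - \int f \,d\nu| :
%   \|f\|_{L^\infty} + \|f\|_{\mathrm{Lip}} \le 1 \}, then
|G_\mu(E + i\eta) - G_\nu(E + i\eta)|
  \le \Big( \frac{1}{\eta} + \frac{1}{\eta^2} \Big)\, d(\mu, \nu)
  \le \frac{2}{\eta^2}\, d(\mu, \nu)
  \qquad (\eta \le 1),
% uniformly in E \in \mathbb{R}.
```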
Lemma 5.7. Fix some $A > 0$ independent of $N$. If $\delta > 0$ is chosen sufficiently small, then the stated estimate holds.

Proof. Throughout, we write $z = E + i\eta$. Later, we will decide how to choose $\eta = \eta(N)$. We start by giving an informal overview of the proof. We will compare $\mathbb{E}_{X_N}[G_{\hat{\mu}_{X_N}}(\cdot)]$ and $G_{\rho_{sc} \boxplus \mu_D}(\cdot)$ via three intermediate comparisons. First, we will import a local law to show that $G_{\hat{\mu}_{X_N}}(z)$ is, with high probability and for appropriate $z$ values, close to the negative normalized trace of a matrix $M_{\mathrm{MDE}}(z) = M_{N,\mathrm{MDE}}(z)$ that exactly solves a matrix equation called the Matrix Dyson Equation (MDE). (The negatives appear since the convention in the local-law literature is to define the Stieltjes transform of a measure $\mu$ as $\int \frac{\mu(dy)}{z - y}$ instead of our $\int \frac{\mu(dy)}{y - z}$. We have preferred to stick to that convention when working in that vein, so that the reader can more easily cross-reference.) Then we will show that a matrix $M_{\mathrm{Wig}}(z) = M_{N,\mathrm{Wig}}(z)$ whose normalized trace is exactly $-G_{\rho_{sc} \boxplus \hat{\mu}_{D_N}}$ approximately solves the MDE; standard arguments about the so-called stability of the MDE will then show that the two normalized traces are close. Finally, we will use Lemma 5.6 to show $G_{\rho_{sc} \boxplus \hat{\mu}_{D_N}}(z) \approx G_{\rho_{sc} \boxplus \mu_D}(z)$.
Notice that all quantities here, except for $G_{\hat{\mu}_{X_N}}$, are deterministic. For a matrix $M \in \mathbb{C}^{N \times N}$, we define its imaginary part as $\Im(M) = \frac{1}{2i}[M - M^*]$. Whenever $\mathcal{S} : \mathbb{C}^{N \times N} \to \mathbb{C}^{N \times N}$ is a linear operator preserving the set $\{M : \Im(M) > 0\}$, it is known [Helton et al., 2007] that the constrained equation below admits a unique solution. In particular, we will be interested in the unique solutions to this equation corresponding to two operators $\mathcal{S}$. By rearranging Equation (5.8) and taking the normalized trace, one can see that $\frac{1}{N} \operatorname{tr} M_{\mathrm{Wig}}(z)$ satisfies the Pastur equation, which characterizes [Pastur, 1972] the Stieltjes transform of $\rho_{sc} \boxplus \hat{\mu}_{D_N}$. Hence (recall our sign convention) the normalized trace of $M_{\mathrm{Wig}}$ is exactly $-G_{\rho_{sc} \boxplus \hat{\mu}_{D_N}}$.

• For any $\delta > 0$, write $\mathbb{H} = \{z \in \mathbb{C} : \eta > 0\}$ and define the complex domain $D^\delta_{\mathrm{far}} = \{z \in \mathbb{H} : |z| \leq N^{100}, \eta \geq N^{-\delta}\}$. (The notation reminds us that points in this domain are relatively far from the real line; typically in local laws the optimal scale is $\eta \gg \frac{1}{N}$.) Then [Erdős et al., 2019, Theorem 2.1] tells us that there is a universal constant $c > 0$ such that, for any sufficiently small $\epsilon > 0$, there exists $C = C(\epsilon)$ such that the local-law bound holds. Since $\frac{1}{N} \operatorname{tr}(M_{\mathrm{MDE}}(z))$ is known by [Ajanki et al., 2019, Proposition 2.1] to be the Stieltjes transform of some measure, we also have the trivial bounds $|G_{\hat{\mu}_{X_N}}(E + i\eta)| \leq \frac{1}{\eta}$ and $|\frac{1}{N} \operatorname{tr}(M_{\mathrm{MDE}}(E + i\eta))| \leq \frac{1}{\eta}$. If $\eta = N^a$ for some $-c\epsilon < a < 0$, then for $N$ sufficiently large we have $\{E + i\eta : |E| \leq A\} \subset D^{c\epsilon}_{\mathrm{far}}$; thus the local-law estimate applies whenever $|E| \leq A$ and $\eta$ is as above.

• By the definition of $M_{\mathrm{Wig}}$, the equation it satisfies can be rewritten with an error term $=: E(z)$.
As the notation suggests, we will show that $E(z)$ is an error term, so that $M_{\mathrm{Wig}}(z)$ approximately solves Equation (5.8). In particular, we have the corresponding approximate equation. We will use results of [Erdős et al., 2019], which are phrased in terms of a special matrix norm $\|\cdot\|^{x,y,K}_*$, depending on $K \in \mathbb{N}$ and $x, y \in \mathbb{C}^N$; the only information we shall need about this norm is that $|\langle x, By \rangle| \leq \|B\|^{x,y,K}_*$ for every $x$, $y$, and $K$. If we choose $\eta = N^a$ for some positive or negative $a$, then [Erdős et al., 2019, Lemma 5.4] tells us that, for every $x$, $y$, and $K$, and for every $\epsilon > 0$ and $N \geq N_0(\epsilon, K)$, we have a bound uniform over $x$ and $y$. Thus [Erdős et al., 2019, Equation 8] tells us that, for every $\epsilon > 0$, there exist $\delta(\epsilon) > 0$ and $C_\epsilon > 0$ such that the stability estimate holds on $D^{\delta(\epsilon)}_{\mathrm{far}}$.
In particular, if $\{e_i\}$ are the standard basis vectors, then for every $\epsilon > 0$, $K \in \mathbb{N}$, and $N \geq N_0(\epsilon, K)$, the corresponding bound holds on $D^{\delta(\epsilon)}_{\mathrm{far}}$.
For definiteness, let us choose, say, $\epsilon = \frac{1}{400}$ and $K = 100$, and write $C' = C_{1/400}$ and $\delta = \delta(\frac{1}{400})$; then if $\eta > N^{-\delta}$, for $N$ sufficiently large we have $\{E + i\eta : |E| \leq A\} \subset D^\delta_{\mathrm{far}}$, so that the above estimates apply.

• If $\eta \leq 1$, then Lemma 5.6 gives us the third comparison.

Combining these estimates, we have the following result: if $\eta = N^{-\delta}$ and $\delta$ is sufficiently small, then every assumption we made on $\eta$ in the above bounds is satisfied and, for all sufficiently small $\epsilon > 0$, the claimed bound holds. This concludes the proof.
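As an aside, the Pastur fixed point underlying these comparisons is easy to evaluate numerically. The sketch below is illustrative only (the dimension, the $d_i$'s, and the spectral parameter $z$ are arbitrary choices); it iterates the Pastur equation in the local-law convention $m(z) = \int \frac{\mu(dy)}{z - y}$, so that $G = -m$ in our convention:

```python
import numpy as np

def pastur_stieltjes(d, z, tol=1e-12, max_iter=10_000):
    """Iterate m -> mean(1 / (z - m - d_i)) to solve the Pastur equation
    for the Stieltjes transform (convention m(z) = int mu(dy)/(z - y))
    of rho_sc boxplus the empirical measure of the entries of d."""
    m = -1j / abs(z)  # initial guess with the correct sign of Im(m)
    for _ in range(max_iter):
        m_new = np.mean(1.0 / (z - m - d))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Sanity check: with D_N = 0, rho_sc boxplus mu_D is the semicircle law,
# whose Stieltjes transform (same convention) is (z - sqrt(z^2 - 4)) / 2.
z = 2j
m = pastur_stieltjes(np.zeros(10), z)
m_exact = (z - np.sqrt(z * z - 4)) / 2
print(abs(m - m_exact))
```

The iteration is a contraction for spectral parameters this far from the real axis, which is consistent with the domain $D^\delta_{\mathrm{far}}$ staying away from the optimal local-law scale.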
Lemma 5.9. With the notation displayed here, there exists some small $\gamma > 0$ such that the stated bound holds.

Proof. In order to apply a standard technique for bounding Kolmogorov–Smirnov distances, we must first show a tail estimate. Since $\mathbb{E}_{X_N}[F_{\hat{\mu}_{X_N}}]$ and $F_{\rho_{sc} \boxplus \mu_D}$ both take values in $[0, 1]$, it suffices to find $M > 0$ beyond which both tails are negligible. Furthermore, since $\rho_{sc} \boxplus \mu_D$ is compactly supported, we may take $M$ so large that $F_{\rho_{sc} \boxplus \mu_D}(x)$ vanishes for $x < -M$ and is identically one for $x > M$. Now the first tail is controlled directly, and similarly the second, which finishes the proof of (5.10).
Thus we may import [Bai, 1993, Theorem 2.2], which says that, for any choice of $\eta > 0$ and $B > 0$, we have a three-term bound. We will control the three terms on the right-hand side in order. In the course of these estimates we shall choose the parameters $B$ and $\eta = \eta(N)$.
• Since the compactly supported measure $\rho_{sc} \boxplus \mu_D$ has a density in $L^\infty$ [Biane, 1997, Corollary 5], $F_{\rho_{sc} \boxplus \mu_D}$ is Lipschitz, so we can control the third term directly.

• Choose some $B > \max(|r(\rho_{sc} \boxplus \mu_D)|, |l(\rho_{sc} \boxplus \mu_D)|)$; then arguments as above bound the second term. Since we will ultimately choose $\eta = N^{-\delta}$ for some small $\delta > 0$, we can choose $B$ so large that this term decays exponentially fast.
• If we choose $\eta = N^{-\delta}$ for $\delta > 0$ sufficiently small, then Lemma 5.7 controls the first term.

We combine these to obtain the lemma.

Lemma 5.12. Under either the Gaussian Hypothesis or the SSGC Hypothesis, there exist positive constants $C_1$ and $C_2$ (depending on the constants in those hypotheses) such that the stated concentration bound holds.

Proof. Concentration results of this type are quite classical, using either the Herbst argument under the log-Sobolev assumption, or results of Talagrand under the compact-support assumption. Indeed, results of the former type are available "out of the box"; results of the latter type are available "out of the box" when $D_N$ vanishes, and we will explain below how to modify the existing proofs for our situation. Suppose first that we satisfy the log-Sobolev option of the SSGC Hypothesis, that is, that the laws of the entries of $W_N$ satisfy a log-Sobolev inequality with a uniform constant. Since Gaussian measure satisfies the log-Sobolev inequality, the same statement is true under the Gaussian Hypothesis. Furthermore, one can see directly from the definition of the inequality that, if the law of the real random variable $X$ satisfies the logarithmic Sobolev inequality with constant $c$, then for any deterministic $\alpha \in \mathbb{R}$ the law of $X + \alpha$ also satisfies the logarithmic Sobolev inequality with constant $c$. Thus the laws of the entries of $\sqrt{N} X_N$ satisfy a log-Sobolev inequality with a uniform constant. This uniformity allows us to import the result [Guionnet and Zeitouni, 2000, Corollary 1.4b], which tells us that there exist positive universal constants $C_1$ and $C_2$ such that, for any $\delta > 0$, the corresponding concentration estimate holds. Choosing $\delta = N^{-1/6}$ completes the proof under the Gaussian Hypothesis or under the log-Sobolev option of the SSGC Hypothesis. Next, we turn to the compact-support option of the SSGC Hypothesis.
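(Returning briefly to the log-Sobolev step: the shift-invariance claim can be spelled out in one line. The normalization $\mathrm{Ent}(f^2) \leq 2c\,\mathbb{E}[(f')^2]$ below is our assumption; other normalizations give the same conclusion.)

```latex
% Let X satisfy LSI(c): \mathrm{Ent}(f^2(X)) \le 2c\,\mathbb{E}[f'(X)^2]
% for all smooth f. For deterministic \alpha, set Y = X + \alpha and
% h(t) = f(t + \alpha), so that h'(t) = f'(t + \alpha). Then
\mathrm{Ent}\big(f^2(Y)\big)
  = \mathrm{Ent}\big(h^2(X)\big)
  \le 2c\,\mathbb{E}\big[h'(X)^2\big]
  = 2c\,\mathbb{E}\big[f'(Y)^2\big],
% i.e. the law of Y = X + \alpha satisfies LSI with the same constant c.
```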
The barrier to using existing results is that, even if the entries of $W_N$ are uniformly compactly supported, the diagonal entries of $\sqrt{N} X_N$ are supported in boxes that, while of fixed size, may have centers tending to infinity. So we modify the existing proofs for this situation. Specifically, we start by importing the following result.
Lemma 5.13 ([Guionnet and Zeitouni, 2000, Theorem 1.3a]). Fix $(a_{i,j})_{i,j \leq N} \subset \mathbb{R}$, and suppose that there exists a compact set $K \subset \mathbb{R}$ such that the $(i,j)$th entry of $\sqrt{N} X_N$ is supported on the compact set $a_{i,j} + K = \{a_{i,j} + k : k \in K\}$. Write $\delta_1(N) = 8|K|\sqrt{\pi/N}$, and define the class of test functions accordingly. Then, for any $\delta \geq 4|K|\delta_1(N)$, the stated concentration bound holds.

The authors of [Guionnet and Zeitouni, 2000] then extend this result to a supremum over all bounded Lipschitz functions, not just those that are compactly supported, but only in the case that $\mathbb{E}[X_N] = 0$. Their arguments require a bound on $\frac{1}{N} \operatorname{tr}(X_N^2)$, which we replace for our model with $\frac{1}{N} \operatorname{tr}(X_N^2) \leq \sup\{|x|^2 : x \in K\} + d_{\max}^2 + 1$, which is true for $N$ sufficiently large. Following their proofs but substituting this estimate, we obtain the following result, which is analogous to [Guionnet and Zeitouni, 2000, Corollary 1.4a]:

Lemma 5.14. Under the assumptions and notation of the previous lemma, write $S = \sup\{|x|^2 : x \in K\}$ and $M = 8(S + d_{\max}^2 + 1)$. Then for any $N$ sufficiently large and for any $\delta > 0$ satisfying the implicit inequality $\delta > (128(M + \sqrt{\delta})\delta_1(N))^{2/5}$, the stated concentration bound holds.

For $N$ sufficiently large, $\delta = N^{-1/6}$ satisfies the implicit inequality given in the lemma, and it is easy to show the remaining estimate for $N$ large enough, which gives the desired result in this case.
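The claim that $\delta = N^{-1/6}$ eventually satisfies the implicit inequality is a pure scaling computation, since $\delta_1(N) = 8|K|\sqrt{\pi/N}$:

```latex
\big(128\,(M + \sqrt{\delta})\,\delta_1(N)\big)^{2/5}
  = \big(128\,(M + N^{-1/12}) \cdot 8|K|\sqrt{\pi}\, N^{-1/2}\big)^{2/5}
  = O\big(N^{-1/5}\big)
  < N^{-1/6} = \delta
\qquad \text{for } N \text{ large enough, since } \tfrac{1}{5} > \tfrac{1}{6}.
```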
Proof of Lemma 5.4. By Lemma 5.12, if $\kappa < \frac{1}{6}$ we have a first bound, in which the last inequality follows from Lemma 5.2; similarly for the companion bound. Thus it remains to estimate the remaining term. We will do this by approximating $f$ by a test function smooth enough to integrate by parts.
More precisely, suppose first that $f$ is $C^1$, and write $\|f'\|_{L^\infty} = \sup_{x \neq y} \frac{|f(x) - f(y)|}{|x - y|}$ for its Lipschitz constant. Combining these and optimizing over $f$, we obtain a bound whose last equality follows from Lemma 5.9. Thus if we choose $0 < \kappa < \gamma$, we have the desired decay for sufficiently large $N$; in particular this gives us the needed estimate, from which point it is easy to conclude the proof.
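The integration-by-parts step here is the standard one: if $f$ is $C^1$ with, say, compact support, and $\mu, \nu$ are probability measures on $\mathbb{R}$ with distribution functions $F_\mu, F_\nu$, then

```latex
\int f \, \mathrm{d}(\mu - \nu)
  = -\int f'(x)\, \big( F_\mu(x) - F_\nu(x) \big)\, \mathrm{d}x,
% the boundary terms vanishing since F_\mu - F_\nu \to 0 at \pm\infty, so
\Big| \int f \,\mathrm{d}\mu - \int f \,\mathrm{d}\nu \Big|
  \le \|f'\|_{L^1}\, \sup_{x} \big| F_\mu(x) - F_\nu(x) \big|.
```

This is what converts the Kolmogorov-distance bound of Lemma 5.9 into control of linear statistics of Lipschitz test functions.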

Properties of the rate function
The purpose of this section is to show that the supremum in the definition of $I^{(\beta)}(x) = \sup_{\theta \geq 0} I^{(\beta)}(x, \theta)$ is achieved at a value $\theta_x$, which is unique (except for $x = r(\rho_{sc} \boxplus \mu_D)$, where it is chosen by convention) and which depends injectively on $x$. This implies that, in the large-deviation upper bound established for tilted measures in Theorem 3.4, the rate function has a unique zero; this property was crucial in the proof of Lemma 3.11 above. At the end of this section, we establish goodness of $I^{(\beta)}(\cdot)$.
We also have a further inequality, with equality if and only if $G_{\mu_D}(r(\mu_D)) = G_{\rho_{sc} \boxplus \mu_D}(r(\rho_{sc} \boxplus \mu_D))$. In addition, $\theta_x$ is given by the displayed formulas (by convention, otherwise), and if $x < x_c$ then $\theta_x < \theta_{x_c}$. Finally, the last displayed claim holds.

Proof. For the duration of this proof, we introduce some notation. It can be checked directly from the definition that, for any compactly supported measure $\nu$ and any $M \geq r(\nu)$, the corresponding identity holds. Notice that this is a continuous function of $\theta$. Furthermore, it is known [Guionnet and Maïda, 2018, Lemma 11] that
$$G_{\mu^{sc}_D}(r(\mu^{sc}_D)) \leq \min\big(G_{\mu_D}(r(\mu_D)),\, G_{\rho_{sc}}(r(\rho_{sc}))\big) = \min\big(G_{\mu_D}(r(\mu_D)),\, 1\big).$$
Since $G_\nu$ is decreasing on $(r(\nu), +\infty)$, there are three (or two) phases of $\theta$ values, where the third case disappears if $G_{\mu_D}(r(\mu_D)) = +\infty$ and the second case disappears if $x = r(\mu^{sc}_D)$ and $G_{\mu^{sc}_D}(r(\mu^{sc}_D)) = G_{\mu_D}(r(\mu_D))$. Notice that this is a continuous function of $\theta \geq 0$, and that $G_{\mu_D}(r(\mu_D))$ lies in $(G_{\mu^{sc}_D}(r(\mu^{sc}_D)), +\infty]$. For the purposes of our analysis, the endpoints of this interval are degenerate cases, and will be handled separately at the end. For now, assume that $G_{\mu_D}(r(\mu_D)) \in (G_{\mu^{sc}_D}(r(\mu^{sc}_D)), +\infty)$.
Then $\partial_\theta I^{(1)}(x, \theta)$ has three non-degenerate piecewise sections, and $x_c < \infty$, where we recall the threshold $x_c$ from above. In the course of the casework, we will show that $x_c > r(\mu^{sc}_D)$ in this nondegenerate regime.
(6.2) The function $f_x$ defined above is still strictly concave on its domain and still vanishes at the left endpoint of this domain, but now its value at the right endpoint is nonnegative; thus $I^{(1)}(x, \theta)$ is strictly increasing for $\theta \in (\frac{1}{2} G_{\mu^{sc}_D}(x), \frac{1}{2} G_{\mu_D}(r(\mu_D)))$. A simple analysis of $\partial_\theta I^{(1)}(x, \theta)$ for $\theta \geq \frac{1}{2} G_{\mu_D}(r(\mu_D))$ shows that $\theta_x$ as defined above is, as claimed, the unique $\theta$ value that maximizes $I^{(1)}(x, \theta)$, and $I^{(1)}(x) > 0$.
Thus we must have $\theta_{x_1} = \theta_{x_2}$. Now we explain the necessary adjustments in the degenerate cases.
– Degenerate Subcase b ($x > r(\mu^{sc}_D)$): Then $f_x$ as above is defined and strictly concave on a nondegenerate interval; it vanishes at the left endpoint of this interval; and it takes a positive maximum (namely $x - r(\mu^{sc}_D)$) at the right endpoint of this interval. Thus the analysis of Case 2 above applies to show that $\theta_x$ is given by Equation (6.2).
The argument above for injectivity goes through, since Equation (6.2) works for all x values.
• Degenerate Case 2 ($G_{\mu_D}(r(\mu_D)) = +\infty$): Here $x_c = +\infty$, and all $x$ values are subcritical. The function $f_x$ from Case 1 is then defined and strictly concave on the interval $(\frac{1}{2} G_{\mu^{sc}_D}(x), +\infty)$. It has a unique maximum at