Large deviations for the largest eigenvalue of the sum of two random matrices

In this paper, we consider the addition of two matrices in generic position, namely A + UBU*, where U is drawn under the Haar measure on the unitary or the orthogonal group. We show that, under mild conditions on the empirical spectral measures of the deterministic matrices A and B, the law of the largest eigenvalue satisfies a large deviation principle, in the scale N, with an explicit rate function involving the limit of spherical integrals. We cover in particular all the cases where A and B have no outliers.


Introduction
Understanding the spectrum of the sum A + B of two Hermitian matrices, knowing the spectra of A and B respectively, is a classical and difficult problem. Since the pioneering works of Voiculescu [1991], we know that free probability provides efficient tools to describe, at least asymptotically, the spectrum of the sum of two large Hermitian matrices in generic position from one another. More precisely, if A_N and B_N are two deterministic N × N Hermitian matrices and U_N is a unitary random matrix distributed according to the Haar measure, then, in the large N limit, A_N and U_N B_N U_N^* are asymptotically free and the spectral distribution of H_N := A_N + U_N B_N U_N^* is given by the free convolution of the spectral distributions of A_N and B_N. This global law, that is the convergence of the spectral distribution of H_N at macroscopic scale, has been studied in detail by Speicher [1993] and Pastur and Vasilchuk [2000], among others. The local law, that is the comparison of the spectral distribution of H_N with the free additive convolution of the spectral distributions of A_N and B_N below the macroscopic scale, was then investigated by Kargin [2012] and Bao et al. [2017]. In this paper, we will be interested in the behavior of the largest eigenvalue of H_N. As a corollary of the results of Collins and Male [2014] on strong asymptotic freeness, we know that if A_N and B_N have no outliers, then the largest eigenvalue of H_N converges to the right edge of the support of the free convolution of the spectral distributions of A_N and B_N. In this work, we investigate the large deviations of this extreme eigenvalue.
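This convergence is easy to observe numerically. The following sketch (our illustration, not part of the paper; the helper name haar_unitary is ours) samples a Haar unitary via the phase-corrected QR decomposition of a complex Ginibre matrix and conjugates a two-atom spectrum; here the two spectral measures are two-point measures that coincide up to a shift, so their free convolution is (a shift of) the arcsine law on [−1, 3].

```python
import numpy as np

rng = np.random.default_rng(0)
N = 300

def haar_unitary(n, rng):
    # QR of a complex Ginibre matrix, with the standard phase correction on the
    # diagonal of R, yields a Haar-distributed unitary matrix.
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

# Two deterministic spectra: A has eigenvalues -1 and +1, B has eigenvalues 0 and 2.
a = np.diag(np.repeat([-1.0, 1.0], N // 2))
b = np.diag(np.repeat([0.0, 2.0], N // 2))
u = haar_unitary(N, rng)
h = a + u @ b @ u.conj().T
eigs = np.linalg.eigvalsh(h)

# By Weyl's inequalities the spectrum of H lies in [-1, 3]; with no outliers the
# largest eigenvalue sits near the right edge of the free convolution, here 3.
print(eigs.min(), eigs.max())
```

With these two-atom choices the empirical spectrum fills out the arcsine shape on [−1, 3], and the extreme eigenvalues hug the edges already at moderate N.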
In the framework of random matrix theory, there are very few known large deviation results about the spectrum, basically because the eigenvalues are complicated functions of the entries. A notable exception is given by the Gaussian invariant ensembles, for which the joint law of the eigenvalues can be explicitly written as a Coulomb gas. Based on this explicit formula, large deviation principles have been established for the empirical spectral distribution at global scale by Ben Arous and Guionnet [1997] and for the largest eigenvalue by Ben Arous et al. [2001]. Another special case is given by the sum of a deterministic matrix and a matrix from a Gaussian invariant ensemble. Then the spectrum can be constructed as the realization at time one of a Hermitian (or symmetric) Brownian motion starting from a given deterministic matrix. This point of view was used by Guionnet and Zeitouni [2002] to study the large deviations of the empirical measure, and the large deviations for the process of the largest eigenvalue starting from the origin were derived by ?. One of the applications of the present paper is to provide the large deviations for the largest eigenvalue of this sum by using another approach, based on spherical integrals. Beyond these cases where specific tools are available, it was observed by Bordenave and Caputo [2014] that deviations of the spectrum of Wigner matrices whose entries have a tail heavier than Gaussian are naturally created by large entries. This key remark made it possible to obtain the large deviations for the empirical measure in [Bordenave and Caputo, 2014] (see also [Groux, 2017] for the counterpart for covariance matrices) and for the largest eigenvalue in [Augeri, 2016b]. Large deviations for the spectrum of Wigner matrices with sub-Gaussian entries are still completely open as far as the empirical measure is concerned.
One can also mention the deviation results of Augeri [2016a] for the moments of the empirical spectral distribution in several models. Concerning the deviations of the largest eigenvalue, beyond the works [Ben Arous et al., 2001, ?, Augeri, 2016b] already cited above, the following models have been studied so far: Gaussian ensembles plus a rank-one perturbation by Maïda [2007], very thin covariance matrices by Fey et al. [2008], and finite-rank perturbations of deterministic matrices or unitarily invariant ensembles by Benaych-Georges et al. [2012]. In a companion paper, Guionnet and Husson [2018] have established a large deviation principle for the largest eigenvalue of Wigner matrices with entries having sharp sub-Gaussian tails, such as Rademacher matrices. They show that the speed and the rate function of this large deviation principle are the same as in the Gaussian case.
Acknowledgments. The idea to tilt measures by the spherical integral came out magically from a discussion with M. Potters at UCLA in 2017 and we wish to thank him for this beautiful inspiration. We also benefited from many discussions with J. Husson and F. Augeri, with whom one of the authors is working on a companion project on Wigner matrices. We also thank Benjamin McKenna for pointing out a gap in the proof of Proposition 7 and the anonymous referees for very useful remarks, in particular on the Appendix. Finally, we are very grateful for stimulating discussions with O. Zeitouni and N. Cook.

Statement of the results
Let (A_N)_{N≥1} and (B_N)_{N≥1} be two sequences of deterministic real diagonal matrices, with A_N and B_N of size N × N. We denote by λ_1(A_N) ≥ ... ≥ λ_N(A_N) and λ_1(B_N) ≥ ... ≥ λ_N(B_N) their respective eigenvalues in non-increasing order, and by ‖A_N‖ and ‖B_N‖ their respective spectral radii. We define the empirical spectral measures μ̂_{A_N} := (1/N) Σ_{i=1}^N δ_{λ_i(A_N)} and μ̂_{B_N} := (1/N) Σ_{i=1}^N δ_{λ_i(B_N)}, and we make the following assumption:

Assumption 1 (H_bulk). The measures μ̂_{A_N} and μ̂_{B_N} converge weakly, as N grows to infinity, respectively to μ_a and μ_b, compactly supported on R. Moreover, λ_1(A_N) and λ_1(B_N) converge, as N grows to infinity, to ρ_a and ρ_b respectively.

We denote by m^β_N the Haar measure on the orthogonal group O_N if β = 1 and on the unitary group U_N if β = 2, and by λ^N_max the largest eigenvalue of H_N := A_N + U B_N U^*, where U is distributed according to m^β_N.
A key argument of the proof will be a tilt of the measure by a rank-one spherical integral. The rank-one spherical integral is defined as follows: for any θ ≥ 0 and any Hermitian matrix M_N of size N,

J^β_N(θ, M_N) := (1/N) log E[exp(θ N ⟨e, M_N e⟩)],

where e is uniformly distributed on the unit sphere of R^N if β = 1 and of C^N if β = 2. The rate function of our large deviation principle will crucially involve the limit J^β_μ(θ, ρ) of J^β_N(θ, H_N) as N grows to infinity, which we now describe. For μ a compactly supported probability measure on R, we denote by r(μ) the right edge of the support of μ and by G_μ the Stieltjes transform of μ, given for λ > r(μ) by

G_μ(λ) := ∫ 1/(λ − x) μ(dx).

It is decreasing on the interval (r(μ), ∞). By taking the limit as λ decreases to r(μ), one can also define G_μ(r(μ)) ∈ R_+ ∪ {∞}. As G_μ is bijective from (r(μ), ∞) to (0, G_μ(r(μ))), one can define its inverse on the latter interval, which we denote by K_μ. Then, for any z ∈ (0, G_μ(r(μ))), we define

R_μ(z) := K_μ(z) − 1/z.

The function R_μ is called the R-transform of μ. One can check that R_μ is increasing and that lim_{z→0} R_μ(z) = ∫ λ μ(dλ), so that R_μ is bijective from (0, G_μ(r(μ))) to (∫ λ μ(dλ), r(μ) − 1/G_μ(r(μ))). We denote by Q_μ its inverse on this interval. We can now define, for β = 1 or 2, θ ≥ 0, μ a compactly supported probability measure and ρ ≥ r(μ), the limiting spherical integral J^β_μ(θ, ρ). The convergence of J^β_N(θ, M_N) towards J^β_μ(θ, ρ), obtained by the authors in [Guionnet and Maïda, 2005], will be stated precisely in Lemma 10. At this point, we want to emphasize that, for θ large enough, the limit depends not only on the limiting spectral distribution μ but also on the limit ρ of the largest eigenvalue of M_N: this observation is crucial in our use of the spherical integral to produce an interesting tilt. If μ_1 and μ_2 are two probability measures compactly supported on R, we denote by μ_1 ⊞ μ_2 the free convolution of μ_1 and μ_2. It is uniquely determined as the unique probability measure whose R-transform is the sum of the R-transforms of μ_1 and μ_2 (see [Voiculescu, 1991]). For any θ ≥ 0 and x ≥ r(μ_a ⊞ μ_b), we denote by I^β(θ, x) the corresponding tilted rate function, and the rate function of our large deviation principle is

I^β(x) := sup_{θ≥0} I^β(θ, x) for x ≥ r(μ_a ⊞ μ_b), and I^β(x) := +∞ for x < r(μ_a ⊞ μ_b). (2.1)

It is easy to check the following: Lemma 1.
Let µ a , µ b , ρ a and ρ b be given as in Assumption 1. For β = 1 or 2, the function I β is a good rate function, that is for any α ∈ R, the level set {I β ≤ α} is a compact subset of R. Moreover, for any x > ρ a + ρ b , I β (x) = +∞.
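The transforms G_μ, K_μ and R_μ, and the additivity of the R-transform under free convolution, can be sanity-checked numerically. The sketch below is our illustration, not part of the paper: it treats the symmetric Bernoulli measure μ = (δ_{−1} + δ_1)/2, whose R-transform has the standard closed form (√(1 + 4z²) − 1)/(2z), and uses the standard fact that μ ⊞ μ is the arcsine law on [−2, 2]; all function names are ours.

```python
import numpy as np

def stieltjes(atoms, weights, lam):
    # G_mu(lambda) = sum_i w_i / (lambda - a_i), valid for lambda > r(mu).
    return np.sum(weights / (lam - atoms))

def k_transform(atoms, weights, z, hi=1e6):
    # Invert G_mu on (r(mu), infinity) by bisection: G_mu is decreasing there.
    lo = atoms.max() + 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if stieltjes(atoms, weights, mid) > z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def r_transform(atoms, weights, z):
    # R_mu(z) = K_mu(z) - 1/z on (0, G_mu(r(mu))).
    return k_transform(atoms, weights, z) - 1.0 / z

atoms = np.array([-1.0, 1.0])
weights = np.array([0.5, 0.5])
z = 0.3
r_num = r_transform(atoms, weights, z)
r_exact = (np.sqrt(1 + 4 * z**2) - 1) / (2 * z)  # closed form for (delta_{-1}+delta_1)/2
print(r_num, r_exact)
```

Since μ ⊞ μ is the arcsine law on [−2, 2], with G(λ) = 1/√(λ² − 4), its R-transform is √(4 + 1/z²) − 1/z, and one can check numerically that this equals 2 R_μ(z), an instance of R_{μ_1 ⊞ μ_2} = R_{μ_1} + R_{μ_2}.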
The proof will be given at the beginning of Section 4. We can now state the main results of this paper. The first result is the following large deviation upper bound:

Proposition 2. Under Assumption 1, for β = 1 or 2, for any x ∈ R,

lim_{δ→0} limsup_{N→∞} (1/N) log m^β_N(λ^N_max ∈ [x − δ, x + δ]) ≤ −I^β(x).

We will then derive the following large deviation lower bound:

Proposition 3. Under Assumption 1, for β = 1 or 2, for any x ∈ R,

lim_{δ→0} liminf_{N→∞} (1/N) log m^β_N(λ^N_max ∈ (x − δ, x + δ)) ≥ −I^β(x).

This leads to the following important corollary:

Theorem 4. Under Assumption 1 and if moreover the following condition holds:

(NoOut) G_{μ_a ⊞ μ_b}(r(μ_a ⊞ μ_b)) ≤ min(G_{μ_a}(ρ_a), G_{μ_b}(ρ_b)),

then, for β = 1 or 2, the law of λ^N_max under m^β_N satisfies a large deviation principle in the scale N with good rate function I^β. More precisely, for any closed Borel subset F of R,

limsup_{N→∞} (1/N) log m^β_N(λ^N_max ∈ F) ≤ −inf_F I^β,

and for any open Borel subset O of R,

liminf_{N→∞} (1/N) log m^β_N(λ^N_max ∈ O) ≥ −inf_O I^β.

A few remarks have to be made on the condition (NoOut). Under assumptions that are slightly stronger than Assumption (H_bulk), ? established that, whenever (NoOut) is satisfied, A_N + U B_N U^* has no outlier, that is, its largest eigenvalue converges to r(μ_a ⊞ μ_b). Another related remark is that, if A_N and B_N have no outliers, namely ρ_a = r(μ_a) and ρ_b = r(μ_b), then the condition (NoOut) is automatically satisfied. This will be stated in Lemma 13 and leads to the following corollary:

Corollary 5. Under Assumption 1, if ρ_a = r(μ_a) and ρ_b = r(μ_b), then, for β = 1 or 2, the law of λ^N_max under m^β_N satisfies a large deviation principle in the scale N with good rate function I^β.

From there, one can partly recover Theorem 3.2 in [Maïda, 2007].
Remark 6. If we choose A_N to be a rank-one deterministic matrix with eigenvalue ρ_a > 0 and U B_N U^* to be a random matrix from the Gaussian Unitary (or Orthogonal) Ensemble, one can study the largest eigenvalue of A_N + U B_N U^* by conditioning on the deviations of the largest eigenvalue of U B_N U^*. These large deviations were obtained in ? and we denote by J^β its rate function. If ρ_a ≤ √(β/2), we know that the deformed model has no outliers and one can apply Corollary 5. For any x ≥ √(2β), the rate function of the deformed model is given by K^β(x) := inf_{√(2β) ≤ y ≤ x} (J^β(y) + I^β(x)), where I^β corresponds to μ_a = δ_0, μ_b = σ_β and ρ_b = y. Standard computations allow one to identify this rate function with the function K^β_{ρ_a} in [Maïda, 2007]. To get a taste of what happens in the case with outliers, we also consider in Appendix A the following model: let (U_1^{(1)}, ..., U_1^{(p)}) be independent random vectors uniformly distributed on the unit sphere (of R^N if β = 1 and of C^N if β = 2) and let γ_1, ..., γ_p be nonnegative real numbers. We consider the following deformed model:

X_N := A_N + U B_N U^* + Σ_{i=1}^p γ_i U_1^{(i)} (U_1^{(i)})^*. (2.4)

We show in Theorem 18 that we still have a large deviation principle, for which the rate function will depend on the γ_i's.

The rest of the paper is organized as follows: in the next section, we first prove a result more general than Proposition 2, which holds not only for m^β_N but for a whole family of tilted measures. This will be helpful in the proof of Proposition 3, which will be developed in Section 5. Before getting there, we will study in Section 4 some properties of the rate function I^β. The last section will be devoted to the proof of Theorem 4 and Corollary 5, with Lemma 13 as a prerequisite. At the end of the paper, in Appendix A, we will study the deviations of the largest eigenvalue of X_N for the deformed model (2.4).

Large deviation upper bound for tilted measures
For θ ≥ 0 and β = 1 or 2, we define a tilted measure on the orthogonal group O_N if β = 1 and on the unitary group U_N if β = 2 as follows:

m^{β,θ}_N(dU) := (exp(N J^β_N(θ, A_N + U B_N U^*)) / Z^{β,θ}_N) m^β_N(dU),

where Z^{β,θ}_N := ∫ exp(N J^β_N(θ, A_N + U B_N U^*)) m^β_N(dU). It is easy to check that m^{β,θ}_N is a probability measure: indeed, for any U, exp(N J^β_N(θ, A_N + U B_N U^*)) is positive and finite. For these tilted measures, we have the following weak large deviation upper bound:

Proposition 7. Under Assumption 1, for β = 1 or 2 and any θ ≥ 0, for any x < r(μ_a ⊞ μ_b),

limsup_{N→∞} (1/N) log m^{β,θ}_N(λ^N_max ≤ x) = −∞, (3.1)

and, for any x ≥ r(μ_a ⊞ μ_b), a corresponding upper bound (3.3) holds for lim_{δ→0} limsup_{N→∞} (1/N) log m^{β,θ}_N(λ^N_max ∈ [x − δ, x + δ]).

Remark 8. Applying this proposition with θ = 0 gives Proposition 2.
As we will see in Section 5, establishing an upper bound for any θ ≥ 0 will be useful in the proof of Proposition 3. To prove Proposition 7, and in particular its first statement, we will need to check that, under m^{β,θ}_N, the empirical spectral distribution of A_N + U B_N U^* concentrates around μ_a ⊞ μ_b. More precisely, we equip the set P(R) of probability measures on R with the bounded Lipschitz distance d: for any μ and ν in P(R),

d(μ, ν) := sup { |∫ f dμ − ∫ f dν| : f 1-Lipschitz with ‖f‖_∞ ≤ 1 }.

We then have the following concentration result:

Lemma 9. Under Assumption (H_bulk), for β = 1 or 2 and any θ ≥ 0,

lim_{N→∞} (1/N) log m^{β,θ}_N(d(μ̂_N, ν^β_N) > N^{−1/4}) = −∞,

where μ̂_N denotes the empirical spectral distribution of A_N + U B_N U^* and ν^β_N its mean under the Haar measure m^β_N.

Proof. Let β = 1 or 2 and θ ≥ 0 be fixed. Observe that, for any Hermitian matrix M_N bounded by K in operator norm, we have |J^β_N(θ, M_N)| ≤ θK.

As a consequence, for any Borel subset E of the group, m^{β,θ}_N(E) ≤ e^{2NθK} m^β_N(E), where K := sup_{N≥1}(‖A_N‖ + ‖B_N‖) is assumed to be finite. Therefore it is enough to prove Lemma 9 for θ = 0, that is, for the Haar measure itself. For β = 2, Theorem 3.8 in [Meckes and Meckes, 2013] states that there exist c, C > 0 such that

m^2_N(d(μ̂_N, ν^2_N) > t) ≤ C e^{−c N² t²} for all t > 0, (3.4)

from which the lemma follows by taking t = N^{−1/4}. A careful reading of [Meckes and Meckes, 2013] shows that the exact same result as (3.4) also holds for β = 1.
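To see concretely what the bounded-Lipschitz distance d measures, one can lower-bound it on discrete measures by testing against a small subfamily of the admissible test functions. The sketch below is our illustration (the helper name is ours, and it computes only a lower bound of d, not d itself), using 1-Lipschitz "tent" functions bounded by 1.

```python
import numpy as np

def bl_lower_bound(x, wx, y, wy, centers):
    # Crude lower bound on the bounded-Lipschitz distance d between two discrete
    # measures sum_i wx_i delta_{x_i} and sum_j wy_j delta_{y_j}: we test only
    # against the 1-Lipschitz, |f| <= 1 "tent" functions
    # f_c(t) = max(0, 1 - |t - c|), a small subfamily of the admissible class.
    best = 0.0
    for c in centers:
        fx = np.maximum(0.0, 1.0 - np.abs(x - c))
        fy = np.maximum(0.0, 1.0 - np.abs(y - c))
        best = max(best, abs(np.dot(wx, fx) - np.dot(wy, fy)))
    return best

centers = np.linspace(-2.0, 3.0, 51)
gap = bl_lower_bound(np.array([0.0]), np.array([1.0]),
                     np.array([1.0]), np.array([1.0]), centers)
print(gap)  # the tent centered at 0 separates delta_0 from delta_1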
Before proving Proposition 7, we will recall some results about the convergence and the continuity of spherical integrals.
Lemma 10 (Proposition 2.1 in [Maïda, 2007] and Theorem 6 in [Guionnet and Maïda, 2005]). For any θ ≥ 0, there exists a continuous function g_θ with g_θ(0) = 0 such that, for any δ > 0, if the sequences (G_N)_{N≥1} and (G̃_N)_{N≥1} of Hermitian matrices are uniformly bounded in operator norm and satisfy d(μ̂_{G_N}, μ̂_{G̃_N}) ≤ δ and |λ_1(G_N) − λ_1(G̃_N)| ≤ δ for N large enough, then

limsup_{N→∞} |J^β_N(θ, G_N) − J^β_N(θ, G̃_N)| ≤ g_θ(δ).

If moreover μ̂_{G_N} converges weakly, as N goes to infinity, to μ and λ_1(G_N) converges to ρ, then J^β_N(θ, G_N) converges to J^β_μ(θ, ρ).

We can now prove Proposition 7. In the sequel, we will denote by ν^β_N the mean empirical spectral distribution of A_N + U B_N U^* under m^β_N.

Proof of Proposition 7. The first claim (3.1) is a direct consequence of the previous lemma. Indeed, let x < r(μ_a ⊞ μ_b) and δ_0 := (r(μ_a ⊞ μ_b) − x)/2. Using Corollary 5.4.11 for β = 2 and Exercise 5.4.18 for β = 1 in [Anderson et al., 2010], we know that ν^β_N converges weakly to μ_a ⊞ μ_b as N goes to infinity. As the distance d metrizes weak convergence, for N large enough, d(ν^β_N, μ_a ⊞ μ_b) ≤ δ_0; on the event {λ^N_max ≤ x}, the empirical spectral distribution is then at macroscopic distance from its mean, and Lemma 9 applies. We now prove (3.3). Let δ > 0 and x ≥ r(μ_a ⊞ μ_b) be fixed, and define the event E^x_{N,δ} (3.6), on which λ^N_max belongs to [x − δ, x + δ] and the empirical spectral distribution is close to μ_a ⊞ μ_b. Then we have,

To lighten the notation, we write A, B and H for A_N, B_N and A_N + U B_N U^*.
We detail the first term, the second being similar. According to ? (see also Section 4.1), it can be bounded using the continuity of the spherical integral, where at the last line we have used the second part of Lemma 10. Letting δ go to zero and then optimizing over θ ≥ 0, we get the required upper bound.
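The rank-one spherical integral J^β_N(θ, M_N) that drives the tilt can be estimated by plain Monte Carlo over uniform points on the sphere. The sketch below is ours (β = 1, with an assumed function name), directly implementing the definition J^β_N(θ, M_N) = (1/N) log E[exp(θN⟨e, M_N e⟩)]; for M_N = c·Id the quadratic form ⟨e, M_N e⟩ is deterministic, so J = θc exactly.

```python
import numpy as np

def spherical_integral_mc(theta, eigs, n_samples=20000, seed=1):
    # Monte Carlo estimate of J_N(theta, M) = (1/N) log E[exp(theta*N*<e, M e>)]
    # for a real symmetric M with eigenvalues `eigs`, where e is uniform on the
    # unit sphere of R^N (beta = 1): e is a normalized standard Gaussian vector.
    rng = np.random.default_rng(seed)
    n = len(eigs)
    g = rng.standard_normal((n_samples, n))
    e2 = g**2 / np.sum(g**2, axis=1, keepdims=True)   # squared entries of e
    w = theta * n * (e2 @ np.asarray(eigs))           # theta * N * <e, M e>
    m = w.max()                                       # stabilized log-mean-exp
    return (m + np.log(np.mean(np.exp(w - m)))) / n

# For M = 0.7 * Id the quadratic form equals 0.7 on every sample: J = 0.14.
print(spherical_integral_mc(0.2, 0.7 * np.ones(40)))
```

For large θ the estimator has high variance (the expectation is dominated by rare alignments of e with the top eigenvector, which is precisely why the limit feels the largest eigenvalue), so this sketch is only reliable for moderate θ.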

Properties of the rate function I β
We now check the properties of the rate function I β defined in (2.1).
Proof of Lemma 1. An ingredient for the proof is the following bound, valid for any compactly supported μ. On the other hand, the limit of R_μ(θ) as θ grows to G_μ(ρ) is ρ − 1/G_μ(ρ). As R_μ is nondecreasing, we get the upper bound. Moreover, it is easy to check that, for any x ≥ 0, there exist C, C′ ∈ R (depending on μ and x but not on θ) such that the corresponding estimates hold for θ large enough, so that, for any x ≥ 0, there exist c, c′ ∈ R such that the analogous bounds hold for θ large enough, by the properties of the R-transform. The function I^β is therefore nonnegative. If we denote by g the lower semi-continuous function which is equal to −∞ on [r(μ_a ⊞ μ_b), +∞) and +∞ outside, then I^β = sup(g, sup_θ I^β(θ, ·)) is lower semi-continuous as a supremum of lower semi-continuous functions. As it is infinite outside the interval [r(μ_a ⊞ μ_b), ρ_a + ρ_b], it is a good rate function.
We will now turn to the proof of the lower bound of our large deviation principle, stated in Proposition 3. To complete its proof, we will need to further study the properties of the function I^β. First, let us remark that the cases when μ_a is a Dirac mass at ρ_a (or μ_b is a Dirac mass at ρ_b) are not very interesting: in this case, the free convolution μ_a ⊞ μ_b is just a shift of μ_b by ρ_a (respectively of μ_a by ρ_b) and λ^N_max converges with probability one to ρ_a + ρ_b. Hence, the large deviations have an infinite rate function in the scale N, except at ρ_a + ρ_b where the rate function vanishes.
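This degenerate case is easy to check numerically: when A_N = ρ_a·Id, the conjugation by U is irrelevant and the spectrum of H_N is exactly the spectrum of B_N shifted by ρ_a. The sketch below is ours (β = 1 setting, Haar orthogonal matrix from the sign-corrected QR of a real Ginibre matrix).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
rho_a = 1.5
b_eigs = np.sort(rng.uniform(-1, 1, N))
b = np.diag(b_eigs)

# Haar orthogonal matrix via QR of a real Ginibre matrix (beta = 1 case).
q, r = np.linalg.qr(rng.standard_normal((N, N)))
q = q * np.sign(np.diag(r))

h = rho_a * np.eye(N) + q @ b @ q.T
lam_max = np.linalg.eigvalsh(h).max()
# When mu_a is a Dirac mass, the spectrum of H is the spectrum of B shifted
# by rho_a, so lam_max equals rho_a + max eigenvalue of B exactly.
print(lam_max, rho_a + b_eigs[-1])
```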
Consequently, in the sequel, one can assume without loss of generality that the following holds:

Assumption 2. μ_a is not a Dirac mass at ρ_a and μ_b is not a Dirac mass at ρ_b.
We have the following:

Lemma 11. Under Assumptions 1 and 2, for any x such that r(μ_a ⊞ μ_b) ≤ x < ρ_a + ρ_b, for β = 1 or 2, there exists a unique θ ≥ 0 at which the supremum defining I^β(x) = sup_{θ≥0} I^β(θ, x) is attained. We denote this maximizer by θ^β_x.

By symmetry of the problem, without loss of generality, one can assume that G_{μ_a}(ρ_a) ≤ G_{μ_b}(ρ_b). Then there exist some constants C_1, C_2 and C_3 (that may depend on μ_a, ρ_a, μ_b, ρ_b and x but not on θ) controlling the derivative of θ ↦ I^β(θ, x) for large θ. In the computation, we use the well-known fact that R_{μ_a ⊞ μ_b} = R_{μ_a} + R_{μ_b} when the three functions are well defined. Therefore, one can check that the function I^β_x := I^β(·, x) is continuously differentiable and its derivative can be computed explicitly. We now set α_x := G_{μ_a ⊞ μ_b}(x); note that G_{μ_a ⊞ μ_b}(x) and therefore K_{μ_a ⊞ μ_b}(G_{μ_a}(ρ_a)) are well defined. As K_{μ_a ⊞ μ_b} is a decreasing function, and as K_{μ_b} is also a decreasing function, one obtains an inequality equivalent to α_x ≥ G_{μ_a}(ρ_a). There are therefore two cases to consider and we claim that:

Case 1: If α_x < G_{μ_b}(ρ_b), then I^β_x reaches its maximum at a point θ^β_x characterized through Q_{μ_b}, the inverse of R_{μ_b} as defined in Section 2;

Case 2: If α_x ≥ G_{μ_b}(ρ_b), then I^β_x reaches its maximum at θ^β_x := (β/2) α_x.

Let us now prove this claim. On the interval (0, (β/2) G_{μ_a}(ρ_a)), the derivative (I^β_x)′ is nondecreasing and vanishes at zero; it is therefore nonnegative, so that I^β_x is nondecreasing on this interval.
We now distinguish the two cases.
In Case 1, there exists θ^β_x ∈ ((β/2) G_{μ_a}(ρ_a), (β/2) G_{μ_b}(ρ_b)) such that I^β_x is increasing on ((β/2) G_{μ_a}(ρ_a), θ^β_x) and then decreasing. Moreover, (I^β_x)′ is negative at (β/2) G_{μ_b}(ρ_b), so it remains negative on ((β/2) G_{μ_b}(ρ_b), ∞) and I^β_x is decreasing on this interval. The first claim holds true.
In Case 2, I^β_x is increasing on ((β/2) G_{μ_a}(ρ_a), θ^β_x) and then decreasing. One can check that the point where (I^β_x)′ vanishes is given by (β/2) α_x, and the second claim holds true. This concludes the proof of the uniqueness of θ.
Moreover, looking carefully at the definition of θ^β_x in Case 1 and Case 2, one can see that it is an increasing function of x; indeed, it is increasing on each of the two ranges of x corresponding to the two cases. As a consequence, for x ≠ y such that r(μ_a ⊞ μ_b) ≤ x, y < ρ_a + ρ_b, we have θ^β_x ≠ θ^β_y and therefore sup_{θ≥0} I^β(θ, y) > I^β(θ^β_x, y). We now have to deal with the case when y = ρ_a + ρ_b, that is, to show that I^β(ρ_a + ρ_b) = ∞. (4.2) In the degenerate case, the supremum is immediately infinite and (4.2) holds. Assume now that this is not the case. From there, we get the corresponding estimate for any u > G_{μ_a}(ρ_a) ∨ 2M_α. Therefore, there exist c, c′ ∈ R such that, for any θ ≥ G_{μ_a}(ρ_a) ∨ 2M_α, I^β(θ, ρ_a + ρ_b) is bounded below by an affine function of θ with positive slope, so that, letting θ grow to infinity, we get again that I^β(ρ_a + ρ_b) = ∞ and (4.2) holds. This concludes the proof of Lemma 11.

Large deviation lower bound
The goal of this section is to show Proposition 3. A classical strategy to get a large deviation lower bound is to tilt the measure in such a way that the rare event {λ^N_max ∈ [x − δ, x + δ]} becomes typical under the tilted measure. We now check that it is possible to make such a tilt (as for Lemma 11, Lemma 12 below holds without Assumption 2, which we add only to simplify the proof):

Lemma 12. Under Assumptions 1 and 2, for β = 1 or 2, for any x such that r(μ_a ⊞ μ_b) ≤ x < ρ_a + ρ_b and any δ > 0,

lim_{N→∞} m^{β,θ^β_x}_N(E^x_{N,δ}) = 1,

where E^x_{N,δ} was defined in (3.6) and θ^β_x in Lemma 11.
Proof of Lemma 12. Let β = 1 or 2. The first remark is that, almost surely, |λ^N_max| ≤ K, where we recall that K := sup_{N≥1}(‖A_N‖ + ‖B_N‖). We know from Proposition 7 that, for any y ∈ R, the upper bound (5.1) holds. From (5.1), for any y ∈ F_δ, there exists γ_{y,η} such that the corresponding local bound holds. As F_δ is a compact set, one can extract from the family ([y − γ_{y,η}, y + γ_{y,η}])_{y ∈ F_δ} a finite covering. Letting η go to zero, we deduce the global bound. By Lemma 11, we know that L^β_x is nonnegative and vanishes only at x, so that inf_{y ∈ F_δ} L^β_x(y) > 0. Therefore, we deduce that, for N large enough, the complement of E^x_{N,δ} has exponentially small probability under the tilted measure. But, in virtue of Lemma 9, for N large enough, we also have the concentration of the empirical spectral distribution, and Lemma 12 follows.
From there, one can easily get the large deviation lower bound.
Proof of Proposition 3. As mentioned in Section 4, without loss of generality, one can assume Assumption 2. Let β = 1 or 2 and x ∈ R be fixed. If x > ρ_a + ρ_b or x < r(μ_a ⊞ μ_b), Lemma 1 gives that I^β(x) = ∞, so that the lower bound trivially holds. Moreover, as we have seen at the end of the proof of Lemma 11, since μ_b is not a Dirac mass at ρ_b, I^β(ρ_a + ρ_b) = ∞ and the lower bound also holds for x = ρ_a + ρ_b.
Let us now assume that r(μ_a ⊞ μ_b) ≤ x < ρ_a + ρ_b and let θ^β_x be the corresponding tilt defined in Lemma 11. Then, with E^x_{N,δ} defined in (3.6) and recalling that A = A_N, B = B_N and H = A + UBU^*, using again Lemma 10, then letting δ go to zero and applying Lemma 12, we get the desired lower bound. This concludes the proof.

Proof of the main theorem and its corollary
Proof of Theorem 4. Assume that Assumption 1 and the condition (NoOut) are satisfied. Without loss of generality, one can add Assumption 2. As already stated in the proof of Lemma 12, almost surely, |λ^N_max| ≤ K, where we recall that K := sup_{N≥1}(‖A_N‖ + ‖B_N‖). In particular, the law of λ^N_max is exponentially tight. Using e.g. Theorem D.4(a) and Corollary D.6 in [Anderson et al., 2010], it is enough to establish matching upper and lower bounds for any x ∈ R: the upper bound is nothing but Proposition 2, obtained from Proposition 7 for θ = 0, and the lower bound is given by Proposition 3.
We now prove Corollary 5. Our goal is to show that if A_N and B_N have no outliers, then the condition (NoOut) is automatically satisfied. Indeed, if A_N and B_N have no outliers, it means that their respective largest eigenvalues converge to the edge of the support of the limiting measure, that is to say ρ_a = r(μ_a) and ρ_b = r(μ_b). Therefore, Corollary 5 is a direct consequence of the following lemma:

Lemma 13. For any probability measures μ and ν compactly supported on R, we have G_{μ ⊞ ν}(r(μ ⊞ ν)) ≤ min(G_μ(r(μ)), G_ν(r(ν))).
Proof. If one of the measures μ or ν is a single point mass, the additive free convolution is just a translation and we have equality. We now assume that neither of them is a single point mass. In general, we know (see e.g. [Belinschi, 2008]) that there exists a function ω, called the subordination function, which is analytic on C_+ := {z ∈ C : Im z > 0} and such that, for all z ∈ C_+, G_{μ ⊞ ν}(z) = G_μ(ω(z)).
As µ and ν play symmetric roles, this concludes the proof of Lemma 13.
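Lemma 13 can be sanity-checked on semicircle laws, for which everything is explicit: a semicircle law of variance v has right edge 2√v and Stieltjes transform 1/√v at its edge, and the free convolution of two semicircle laws is the semicircle law with the summed variance. These closed forms are standard facts, not taken from the paper; the sketch below is ours.

```python
import numpy as np

def edge_stieltjes_semicircle(v):
    # For the semicircle law of variance v, G_mu evaluated at its right
    # edge r(mu) = 2*sqrt(v) equals 1/sqrt(v).
    return 1.0 / np.sqrt(v)

v1, v2 = 0.8, 2.5
lhs = edge_stieltjes_semicircle(v1 + v2)   # G_{mu boxplus nu} at its right edge
rhs = min(edge_stieltjes_semicircle(v1), edge_stieltjes_semicircle(v2))
print(lhs, rhs)  # Lemma 13 predicts lhs <= rhs
```

Since 1/√(v1 + v2) < min(1/√v1, 1/√v2) for any positive variances, the inequality of Lemma 13 is strict in this family.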
Appendix A. Study of the deformed model (2.4)

A.1. Large deviations for the smallest eigenvalue of H_N. In order to study the deviations of the largest eigenvalue of the deformed model below its expected value, we will need a counterpart of Theorem 4 for the smallest eigenvalue of H_N. We first state the counterpart of the condition (NoOut). For any compactly supported probability measure μ, we denote by l(μ) the left edge of the support of μ. One can extend the definitions of G_μ, K_μ, R_μ and Q_μ given in Section 2: for any λ < l(μ), G_μ(λ) := ∫ 1/(λ − y) μ(dy); G_μ is decreasing from (−∞, l(μ)) onto (G_μ(l(μ)), 0), so we denote again by K_μ its inverse. For any z ∈ (G_μ(l(μ)), 0), we set R_μ(z) := K_μ(z) − 1/z, which is increasing with inverse Q_μ. We then introduce the following assumption:

(NoDown) The smallest eigenvalues of A_N and B_N converge, as N grows to infinity, to a and b respectively, and the analogue of (NoOut) holds at the left edge.

As in Lemma 13, one can check that this condition is satisfied if A_N and B_N have no outliers, this time in the sense that a = l(μ_a) and b = l(μ_b). We now extend the definition of the rate function I^β introduced in (2.1): for β = 1 or 2, θ ≤ 0, μ a compactly supported probability measure and ρ ≤ l(μ), we define the corresponding quantities and, for any θ ≤ 0 and x ≤ l(μ_a ⊞ μ_b), the associated rate function I^β_min. In other words (this is the content of Proposition 15 below), when θ and θ′ are of opposite sign, the rank-two spherical integral asymptotically factorizes at the exponential scale. As an immediate corollary, we find that the large deviations of λ^N_min and λ^N_max are asymptotically independent.

Corollary 16. Under the assumptions (H_bulk), (NoDown) and (NoOut), for β = 1 or 2, the law of (λ^N_min, λ^N_max) under m^β_N satisfies a large deviation principle in the scale N with good rate function I^β_min(x) + I^β(y).

Proof. The idea is to tilt the measure by the two-dimensional spherical integral of Proposition 15, which provides the needed estimate for θ > 0, θ′ < 0. Now, since (e_1, e_2) is independent of U and (Ue_1, Ue_2) is distributed as (e_1, e_2), we deduce the upper bound as before.
The proof of the lower bound is the same since, for any (x, y), we find a unique couple (θ_x, θ′_y) which optimizes the rate function.

Moreover, it is easy to deduce the following corollary, which is the extension of Proposition 2 to θ < 0:

Corollary 17. Under Assumption 1, for β = 1 or 2, for any θ < 0, the analogue of (3.1) holds for any x < r(μ_a ⊞ μ_b) and the analogue of (3.3) holds for any x ≥ r(μ_a ⊞ μ_b).

With Proposition 15 in hand, the proof of Corollary 17 follows the same lines as the proof of Proposition 2. We do not detail it and go directly to the proof of Proposition 15. Note that this kind of factorization property has already been shown for θ and θ′ not too far from zero; we refer the reader to [Guionnet and Maïda, 2005, Theorem 7] or [?]. Our goal here is to extend this result to any pair (θ, θ′) of opposite signs.
Proof of Proposition 15. For the sake of simplicity, we will stick to the case β = 1. Let g and g′ be two independent standard Gaussian vectors in R^N. If we denote by ‖·‖_2 the Euclidean norm and set e_1 := g/‖g‖_2, h := g′ − (⟨g, g′⟩/‖g‖_2²) g and e_2 := h/‖h‖_2, then it is well known that (e_1, e_2) is a pair of random unit vectors, orthogonal to each other, uniformly distributed among such pairs. Moreover, (e_1, e_2) is independent from (‖g‖_2, ‖g′‖_2, ⟨g, g′⟩). Indeed, one can use the following system of coordinates: r := ‖g‖_2, γ_1, ..., γ_{N−1} are the polar coordinates of g, r′ := ‖g′‖_2, η is the angle between g and g′, and γ′_1, ..., γ′_{N−2} are the angles needed to spot g′ on the cone of angle η around g. One can check that the Gaussian measure decomposes as a product measure in these coordinates: (‖g‖_2, ‖g′‖_2, ⟨g, g′⟩) is a function of (r, r′, η), whereas (e_1, e_2) is a function of the γ's and γ′'s. In particular, for any ε, if we let A^ε_N := {|⟨g, g′⟩| ≤ ε ‖g‖_2 ‖g′‖_2}, then A^ε_N is independent of (e_1, e_2). Moreover, on A^ε_N, we have ‖h‖_2² ≥ ‖g′‖_2² (1 − ε²), so that, for ε < 1/2, one can compare E[exp(Nθ⟨e_1, M_N e_1⟩ + Nθ′⟨e_2, M_N e_2⟩)] with the product of the two rank-one spherical integrals. Because of the law of large numbers, for any ε > 0, P(A^ε_N) converges to 1 as N goes to infinity. Hence, letting N go to infinity and then ε go to zero, we get the upper bound in (A.2).
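The Gram-Schmidt construction of the pair (e_1, e_2) used in this proof is easy to reproduce numerically; the sketch below (ours) builds the orthonormal pair from two independent Gaussian vectors exactly as in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500
g = rng.standard_normal(N)
gp = rng.standard_normal(N)   # g' in the text

e1 = g / np.linalg.norm(g)
h = gp - (g @ gp) / (g @ g) * g   # project g' away from g
e2 = h / np.linalg.norm(h)
# (e1, e2) is an orthonormal pair; by rotational invariance of the Gaussian law,
# it is uniformly distributed among such pairs, independently of
# (||g||, ||g'||, <g, g'>).
print(e1 @ e2)
```

Since ⟨g, g′⟩/(‖g‖‖g′‖) is of order N^{−1/2}, the event A^ε_N indeed becomes typical for large N, which is the law-of-large-numbers step of the proof.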
We now prove the lower bound. If O is an orthogonal matrix, the law of (Oe_1, Oe_2) is the same as the law of (e_1, e_2), so that we can assume without loss of generality that M_N is real diagonal, with eigenvalues that we denote by λ^N_1 ≥ λ^N_2 ≥ ... ≥ λ^N_N. We refer the reader to Proposition 16 and Lemmas 18 to 21 in [Guionnet and Maïda, 2005], in particular the proof of Lemma 19, and recall that T is the rate function appearing there. As θ ≥ 0 and θ′ < 0, one can check the corresponding bounds, with α_min ≤ α*_2 ≤ α_max. If α*_2 ∈ [α_min, α_max], we set x_2 := 0, whereas if α*_2 < α_min, we set x_2 := (λ_min − α*_1)((λ_min − α*_1)H_min − 1). We now define, for any δ > 0, the events B^δ_{x_1,x_2,y_1,y_2}, C and E^δ. Now, if σ_1, ..., σ_N are N independent Rademacher random variables, independent of g and g′, then ⟨g, g′⟩ and Σ_{i=1}^N σ_i g_i g′_i have the same law. Therefore, since the sets B^δ_{x_1,x_2,y_1,y_2}, C and E^δ are independent of the signs of the g′_i's, we may condition on these events, where the second expectation bears on the σ's only. Using the concentration properties of the Rademacher random variables (or the Azuma-Hoeffding inequality), one gets a quantitative bound. On B^δ_{x_1,x_2,y_1,y_2} ∩ C ∩ E^δ, the right-hand side is bounded above by e^{−4√N ε² δ}, so that we can conclude that, for any ε, δ > 0, P(A^ε_N | B^δ_{x_1,x_2,y_1,y_2} ∩ C ∩ E^δ) converges to one as N goes to infinity. Furthermore, since it is well known that the remaining term goes to −∞ as δ goes to zero, we only need to estimate the first term on the right-hand side of (A.6) for small enough δ. Now P(B^δ_{x_1,x_2,y_1,y_2} ∩ C) ≥ P(B^δ_{x_1,x_2,y_1,y_2} | C) P(C), where the last term goes to one as N goes to infinity. The last thing to check is the asymptotics of the conditional probability. Indeed, going back to the proofs of Lemmas 18 and 19 in [Guionnet and Maïda, 2005] (see also [?]), the proof is the same except that, in the computation of the log-Laplace transform, the integral will go from −N^{1/4} to N^{1/4} instead of running over R, and this will not change the limit.
Similarly, if α*_1 > α_max, the analogous estimate holds. Putting everything together in (A.5) and taking the limit as N goes to infinity, then as δ and ε go to zero, we get the factorization property.
A.3. Large deviations for the largest eigenvalue in the deformed model. For the sake of simplicity, when treating the deformed model, we will stick to the case β = 1. For any x > r(μ_a ⊞ μ_b), we denote by μ_x the measure defined, for any bounded measurable function f, by the formula (A.7) below. For any x > ρ ≥ r(μ_a ⊞ μ_b) and any point below l(μ_a ⊞ μ_b), we define the quantities entering the rate function, and we also extend them to the boundary cases. The quantities above can easily be extended to the case x = ρ > r(μ_a ⊞ μ_b) (only the first line of (A.7) will be relevant), and for x = ρ = r(μ_a ⊞ μ_b) we take the corresponding limiting value. For γ = (γ_1, ..., γ_p) a p-tuple of nonnegative real numbers, we now define the rate function of the deformed model. Note that this rate function should not depend on the ordering of the γ_i's, which is far from obvious from the formula above.
We can now state our main result. We recall that (U_1^{(1)}, ..., U_1^{(p)}) are independent random vectors uniformly distributed on the unit sphere. To simplify the notations, they can be viewed as the respective first column vectors of p independent matrices distributed according to m^1_N.

Theorem 18. Under the assumptions (H_bulk), (NoOut) and (NoDown), for any p ∈ N* and any γ ∈ (R_+)^p, the law of the largest eigenvalue λ^N_max of the matrix X_N defined in (2.4) satisfies a large deviation principle in the scale N, with a good rate function depending on the γ_i's.

Before proving Theorem 18, we need to state a variant of Proposition 16 in [Guionnet and Maïda, 2005]. We denote by P the standard Gaussian measure on R and we assume that (g_1, ..., g_N) follows the law P^{⊗N}. For any N-tuple of real numbers λ := (λ_1, ..., λ_N) and x ∉ {λ_1, ..., λ_N}, we denote by

v_{N,λ}(x) := (1/N) Σ_{i=1}^N g_i² / (x − λ_i).

Proposition 19 then describes the deviations of v_{N,λ^N}(x) for a triangular array (λ^N_i)_{1≤i≤N} of real numbers such that (1/N) Σ_{i=1}^N δ_{λ^N_i} converges to μ_a ⊞ μ_b as N grows to ∞, writing λ^N := (λ^N_1, ..., λ^N_N), and for x a real number staying away from {λ^N_1, ..., λ^N_N} for N large enough. We will not give a full proof of Proposition 19: it follows from an adaptation of Lemma 18 and Proposition 16 in [Guionnet and Maïda, 2005]. In Lemma 18 in particular, one can check that the deviations above the mean (which is G_{μ_a ⊞ μ_b}(x) in the present case) may involve not only the limiting empirical distribution but also the limit as N grows to ∞ of the largest particle (denoted by max_{i≤N} γ_i there), whereas the deviations below the mean may depend on the limiting smallest particle. The rest of this section is devoted to the proof of Theorem 18 in the case p = 1. For p > 1, the proof is very similar, except that instead of conditioning on the deviations of the extreme eigenvalues of H_N, we will condition on the deviations of the extreme eigenvalues of the model at step p − 1.

Proof of Theorem 18 in the case p = 1. We recall that we stick to the case β = 1. Let γ_1 > 0 be fixed.
As in the proof of Theorem 4, the exponential tightness is straightforward: for any N ≥ 1, |λ^N_max(X_N)| ≤ K + γ_1. Again, using e.g. Theorem D.4(a) and Corollary D.6 in [Anderson et al., 2010], it is enough to establish matching upper and lower bounds for any x ∈ R. For any z which does not belong to the spectrum of H_N, one can write the resolvent identity for the rank-one perturbation; therefore, z is an eigenvalue of X_N which is not an eigenvalue of H_N if and only if ⟨U_1, (z − H_N)^{-1} U_1⟩ = 1/γ_1.

This function of z is decreasing and continuous on (max_{1≤i≤N} λ_i, ∞).
Therefore, there exists a function ε_λ going to zero at zero such that, for z ∈ (max_{1≤i≤N} λ_i, ∞), the solution of f_λ(z) = 1/γ_1 can be located up to an error ε_λ(δ), for any δ > 0 small enough. Let x > r(μ_a ⊞ μ_b) be fixed. Let y be such that r(μ_a ⊞ μ_b) ≤ y < x and set η_0 := (x − y)/4. For any η < η_0, similarly to the definition of E^y_{N,η} in (3.6), we introduce the corresponding event; the analysis will be the same, except possibly for y = r(μ_a ⊞ μ_b). On this event, we have the required control for eigenvalues in [y, y + η]. Therefore, if we denote by v_N(x, y) the quantity obtained from v_{N,λ}(x) by replacing the largest eigenvalue by y, then, for any η < η_0, there exists a continuous function ε_η going to zero at zero such that the corresponding estimate holds for any δ ≤ η and N large enough. The probability measure on the right-hand side is P^{⊗N} ⊗ m^1_N, because v_N(x, y) can be seen as a function of U of law m^1_N and of (g_1, ..., g_N) of law P^{⊗N}. If we assume that η, δ < |x − y|/4 and that λ_i ≤ y + η for all i, one can choose ε_λ uniformly in (λ_1, ..., λ_N). Now, let U ∈ E^y_{N,η} be given. We denote λ^N_1 := y and, for any 2 ≤ i ≤ N, λ^N_i := λ_i(H_N). Taking the limit of the right-hand side as η goes to zero, we get, using Theorem 4, the announced lower bound, where the last inequality is obtained by optimizing over y. Assume now that r := r(μ_a ⊞ μ_b) < x < K_{μ_a ⊞ μ_b}(1/γ_1). Similarly to (3.6), we define, for y ≤ l(μ_a ⊞ μ_b), the corresponding event E^{y,−}_{N,η}. For η small enough and δ ≤ η, we can then write as above the analogue of (A.13). Then, taking the limit as η goes to zero in (A.13) and optimizing over y gives the required lower bound.
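The characterization of the outliers of X_N through the equation f_λ(z) = 1/γ_1 can be illustrated numerically. The sketch below is ours, for a rank-one perturbation H + γ uu^T with a generic deterministic-looking unit vector u (not the Haar-distributed U_1 of the text): it solves the secular equation by bisection on (max_i λ_i, ∞) and compares with a direct diagonalization.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200
lam = np.sort(rng.uniform(-1, 1, N))   # eigenvalues of H (diagonal w.l.o.g.)
u = rng.standard_normal(N)
u /= np.linalg.norm(u)
gamma = 2.0

def f(z):
    # f_lambda(z) = <u, (z - H)^{-1} u> = sum_i u_i^2 / (z - lam_i),
    # decreasing and continuous on (max_i lam_i, infinity).
    return np.sum(u**2 / (z - lam))

# z > max(lam) is an eigenvalue of H + gamma * u u^T iff f(z) = 1/gamma;
# the root lies in (max(lam), max(lam) + gamma], so bisect there.
lo, hi = lam[-1] + 1e-12, lam[-1] + gamma
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if f(mid) > 1.0 / gamma:
        lo = mid
    else:
        hi = mid
z_star = 0.5 * (lo + hi)

top = np.linalg.eigvalsh(np.diag(lam) + gamma * np.outer(u, u)).max()
print(z_star, top)
```

On (max_i λ_i, ∞) the secular function decreases from +∞ to a value below 1/γ, so it has exactly one root there, which is the outlier; this monotonicity is what makes the localization argument via ε_λ work.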
We now prove (A.14). Similarly to Lemmas 11 and 12 (by symmetry between the smallest and the largest eigenvalue), one can show that there exists a unique θ_y ≤ 0 such that, for any η > 0 and N large enough, the event E^{y,−}_{N,η} is typical under the correspondingly tilted measure. With this ingredient, the proof of (A.14) goes as in the proof of Proposition 3. The strategy to get the upper bound is similar: we know that, for N ≥ 1, λ^N_max ∈ [−K, K] and λ^N_min ∈ [−K, K]. For any δ > 0, there exist p ∈ N* and ρ_1, ..., ρ_p such that the corresponding events cover the relevant range, up to the term m^1_N(d(μ̂_N, ν^1_N) > N^{−1/4}). We then use Lemma 9 to get rid of this last term and apply the same strategy as before, combining the relation (A.12) and Proposition 19 for the main term.
Assume now that G_{μ_a ⊞ μ_b}(x) > 1/γ_1. We apply the very same strategy with E^{ρ_i,−}_{N,δ} instead of E^{ρ_i}_{N,δ} and the bound (A.14).