Large Deviations for the largest eigenvalue of rank one deformations of Gaussian ensembles

We establish a large deviation principle for the largest eigenvalue of a rank one deformation of a matrix from the GUE or GOE. As a corollary, we get another proof of the phenomenon, well known in learning theory and finance, that the largest eigenvalue separates from the bulk when the perturbation is large enough. A large part of the paper is devoted to an auxiliary result on the continuity of spherical integrals in the case when one of the matrices has rank one, as studied in (12).


Introduction
We consider in this paper rank one deformations of matrices from Gaussian ensembles, that is, matrices which can be written W_N + A_N, with W_N from the Gaussian Orthogonal (or Unitary) Ensemble and A_N deterministic of rank one, real symmetric if W_N is from the GOE and Hermitian if W_N is from the GUE.
Since the fifties, the classical Gaussian ensembles (see Mehta (19)) have been extensively studied. Various results for the global regime were established (Wigner semicircle law (22), large deviations for the spectral measure (5)...); the statistics of the spacings between eigenvalues were investigated for example in (8; 7), as well as the behaviour of extremal eigenvalues (Tracy-Widom distribution (21)). In the meantime, people got interested in the universality of some of these results. In this context, it is natural to look at various deformations of these ensembles, for example the rank one deformations we are interested in.
This so-called "deformed Wigner ensemble" was studied in (16) and (6), where the authors focused mainly on the problem of the local spacings, and in (20) and (11), where they studied the behaviour of the largest eigenvalue. In this framework, our goal in this paper will be to establish a large deviation principle for the largest eigenvalue of X_N = W_N + A_N, which we denote in the sequel by x*_N. Note that our result can also be seen as a generalization of the result established in (4) for the largest eigenvalue of a matrix distributed according to the GOE. If we denote by θ the unique non-zero eigenvalue of A_N, the joint law of the eigenvalues x_1, . . . , x_N of X_N = W_N + A_N is given by

dQ_N^θ(x_1, . . . , x_N) = (1/Z_N^{β,θ}) I_N^β(θ, X_N) ∏_{i<j} |x_i − x_j|^β exp(−(N/2) Σ_{i=1}^N x_i²) ∏_{i=1}^N dx_i,   (1)

where I_N^β is the spherical integral defined by

I_N^β(θ, X_N) := ∫ exp(N tr(U X_N U* A_N)) dm_N^β(U) = ∫ exp(Nθ (U X_N U*)_{11}) dm_N^β(U),

with m_N^β the Haar probability measure on the orthogonal group O_N of size N if β = 1 and on the unitary group U_N if β = 2, and Z_N^{β,θ} a normalizing constant. The fact that the joint law of the eigenvalues of X_N and the spherical integral I_N^β(θ, X_N) depend on A_N only through its non-zero eigenvalue θ comes from the unitary invariance of the law of W_N and of the Haar measure m_N^β, respectively.
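As a quick sanity check on these definitions, the spherical integral can be estimated numerically: the first column of a Haar-distributed orthogonal matrix has the law of a normalized standard Gaussian vector (a fact also used in Section 2 below). The following Monte Carlo sketch is ours, not part of the paper, and the function name is ours as well.

```python
import numpy as np

def spherical_integral_mc(theta, eigs, n_samples=20000, rng=None):
    """Monte Carlo estimate of I_N^1(theta, B_N) for B_N = diag(eigs).

    Relies on the fact that the first column of a Haar orthogonal matrix
    has the law of g/||g|| with g standard Gaussian in R^N, so that
    (U B_N U^T)_{11} = sum_i eigs[i] * (g_i / ||g||)^2.
    """
    rng = np.random.default_rng(rng)
    eigs = np.asarray(eigs, dtype=float)
    N = eigs.size
    g = rng.normal(size=(n_samples, N))
    # squared entries of the normalized Gaussian vector g/||g||
    u2 = g**2 / np.sum(g**2, axis=1, keepdims=True)
    return float(np.mean(np.exp(N * theta * (u2 @ eigs))))
```

For B_N equal to the identity, (U B_N U^T)_{11} = 1 deterministically, so the estimator returns exp(Nθ) exactly, which gives a simple consistency check.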
Our main result is the following.

Theorem 1.1. For β = 1 or 2 and θ ≥ 0, under Q_N^θ, the largest eigenvalue x*_N = max{x_1, . . . , x_N} satisfies a large deviation principle in the scale N, with good rate function K_θ^β defined as follows:

One can see in particular that K_θ^β differs from the rate function for the deviations of the largest eigenvalue of the non-deformed model that was obtained in (4).
Note that in the case when θ < 0, similar results would hold for the smallest eigenvalue of the deformed ensemble. We leave the precise statement to the reader and assume in the sequel that θ > 0.

Remark 1.2. Let us mention that, although we did not investigate this point in full detail, very similar results can be obtained with our techniques in the case of sample covariance matrices for the so-called "single spike model", that is, matrices of the form XX*, where X is a p × n matrix whose column vectors are iid Gaussian (real or complex) with covariance matrix diag(a, 1, 1, . . . , 1), with a single spike a > 1 (see below for references).
We have to mention an important corollary of Theorem 1.1.

Corollary 1.3. x*_N converges almost surely to the edge of the support of the semicircle law σ_β as long as θ ≤ θ_c := √(β/2), and separates from the support when θ > θ_c. In this case, it converges to θ + β/(2θ).
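A quick numerical illustration of this transition (our own informal sketch, not taken from the paper): for β = 1 we normalize the GOE so that σ_1 has support [−√2, √2], i.e. off-diagonal entries of variance 1/(2N), and take a perturbation θ well above the critical value θ_c = √(1/2).

```python
import numpy as np

# Illustration of Corollary 1.3 for beta = 1 (informal check, not from the paper).
rng = np.random.default_rng(0)
N, theta = 400, 2.0                      # theta = 2 > theta_c = sqrt(1/2)
M = rng.normal(size=(N, N))
W = (M + M.T) / (2.0 * np.sqrt(N))       # GOE normalized so the edge is at sqrt(2)
A = np.zeros((N, N))
A[0, 0] = theta                          # rank one deformation with eigenvalue theta
x_star = np.linalg.eigvalsh(W + A).max() # largest eigenvalue of the deformed matrix
print(x_star)                            # should be close to theta + 1/(2*theta) = 2.25
```

For this choice of θ the largest eigenvalue sits near θ + 1/(2θ) = 2.25, clearly detached from the bulk edge √2 ≈ 1.414.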
This allows us to give a new proof, via large deviations, of this known phenomenon, which is crucial for applications to finance and learning theory (cf. for example (15; 18)). On the mathematical level, this kind of phase transition has been pointed out and proved by several authors in the case of non-white sample covariance matrices (cf. for example (3) for the complete analysis in the complex Gaussian case, (10) and (9) for more general models, and (17) for statistical applications to PCA).
The organisation of the paper is as follows: as one can see in (1) above, the expression of the joint law Q_N^θ of the eigenvalues involves the spherical integral I_N^β in the case when one of the matrices has rank one. We obtained the asymptotics of this quantity in (12), but we will also need a precise continuity result for these spherical integrals, to which Section 2 is devoted. In Section 3, we prove Theorem 1.1. Finally, in a very short Section 4, we show how to derive Corollary 1.3 from this large deviation principle.

Continuity of spherical integrals
The question we want to address in this section is the continuity, in a topology to be prescribed, of I_N^β(θ, B_N) in its second argument B_N. The matrix B_N is supposed to be symmetric if β = 1 and Hermitian if β = 2. Due to the invariance property of the Haar measure, we can always assume that B_N is real diagonal.
We denote by λ_1(B_N), . . . , λ_N(B_N) the eigenvalues of B_N in decreasing order and we let μ̂_{B_N} := (1/N) Σ_{i=1}^N δ_{λ_i(B_N)} be its spectral measure; d is the Dudley distance defined on probability measures by

d(µ, ν) := sup { |∫ f dµ − ∫ f dν| : ‖f‖_∞ ≤ 1, ‖f‖_Lip ≤ 1 }.

The following continuity property holds.

Proposition 2.1. For β = 1 or 2, for any θ > 0 and any κ > 0, there exists a function g_κ : R_+ → R_+ going to zero at zero such that, for any δ > 0 and N large enough, if B_N and B′_N are two sequences of real diagonal matrices such that d(μ̂_{B_N}, μ̂_{B′_N}) ≤ N^{−κ} and |λ_1(B_N) − λ_1(B′_N)| ≤ δ, then

|(1/N) log I_N^β(θ, B_N) − (1/N) log I_N^β(θ, B′_N)| ≤ g_κ(δ).

Remark 2.2. According to Theorem 6 of (12), we know that, for some values of θ, the limit of (1/N) log I_N^β(θ, B_N) as N goes to infinity depends not only on the limiting spectral measure of B_N but also on the limit of λ_1(B_N). Therefore (1/N) log I_N^β(θ, B_N) cannot be continuous in the spectral measure of B_N alone: we also have to localize λ_1(B_N). That is precisely the content of Proposition 2.1 above. We also refer the reader to the remarks made in (12) on point (3) of Lemma 14 therein.
A key step in the proof of Proposition 2.1 is to get an equivalent, as explicit as possible, of I_N^β(θ, B_N). This is given by the following lemma.

Lemma 2.3. If B_N has spectral radius uniformly bounded in N, then for any δ > 0, for N large enough,

This lemma can be regarded as a generalization, to any value of θ, of the second point of Lemma 14 in (12). The remainder of this section is devoted to its proof. For the sake of simplicity, we prove the case β = 1 in full detail and leave the changes for the other cases to the reader.

Some preliminary inequalities
Notation and remarks.
• We denote by λ_1 ≥ · · · ≥ λ_N the eigenvalues of B_N in decreasing order.
• The eigenvalues λ_1 ≥ · · · ≥ λ_N being fixed, we can see that the function

Moreover, if θ > 0, one can easily see that v_N + 1/(2θ) and ṽ_N + 1/(2θ) both lie in (λ_1, ∞).
• E and V denote respectively the expectation and the variance under the standard Gaussian measure on R^N.
• As 1 + 2θv_N − 2θλ_1 > 0, we can define the probability measure P_N on R^N given by

We denote by E_{P_N} and V_{P_N} respectively the expectation and the variance under P_N.
• Similarly, we define the probability measure P̃_N on R^{N−j_0} given by

We denote by E_{P̃_N} and V_{P̃_N} respectively the expectation and the variance under P̃_N.
Before going to the proof of Lemma 2.3, we enumerate hereafter some inequalities on the quantities we have just introduced, which will be useful later.

We have the following inequality

where the right inequality comes from the fact that, for all i, v_N + 1/(2θ) − λ_i ≥ 0, and the left one is inherited from the definition of j_0. Putting the leftmost and rightmost terms together, we get the first inequality announced in point (2) above. The second one is even simpler: all the terms are positive, so that any of them is smaller than 2θN.

but the rightmost term is smaller than or equal to 2θN. Therefore, ṽ_N ≤ v_N.

Proof of Lemma 2.3
We first prove the upper bound; the starting point is the same as in (12). It is a well-known fact that the first column vector of a random orthogonal matrix distributed according to the Haar measure on O_N has the same law as a standard Gaussian vector in R^N divided by its Euclidean norm. Therefore, we can write

From concentration for the norm of a Gaussian vector (cf. (12) for details), we get that, for any κ such that 0 < κ < 1/2,

where A_N(κ) = { |‖g‖²/N − 1| ≤ N^{−κ} } and δ(κ, N) goes to one at infinity for any 0 < κ < 1/2.
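In formulas, for β = 1 and B_N = diag(λ_1, . . . , λ_N), the representation just described reads as follows (our transcription of the starting point of (12)):

```latex
I_N^1(\theta, B_N) \;=\; \mathbb{E}\!\left[\exp\!\left(N\theta\,
\frac{\sum_{i=1}^N \lambda_i\, g_i^2}{\sum_{j=1}^N g_j^2}\right)\right],
\qquad g = (g_1,\dots,g_N) \text{ a standard Gaussian vector in } \mathbb{R}^N .
```

All the estimates of this section are obtained by analyzing this Gaussian expectation on the concentration event A_N(κ) and its complement.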
From there, we have

where M is the uniform bound on the spectral radius of B_N and we use that P_N(A_N(κ)) ≤ 1. Therefore, for any δ > 0, we get that for N large enough,

For the proof of the lower bound, we have to treat two distinct cases. In both cases, the starting point, inherited from (2), is the following:

but our strategy will be different according to whether there are many eigenvalues in the vicinity of the largest eigenvalue λ_1 or not, that is, according to the size of j_0 defined above.
• First case: N is such that j_0 ≤ δN^{1−ξ/2}. In this case, the situation is very similar to what happens with a small θ, and we therefore follow the proof of (12). Indeed, we write

and show that, for N large enough, P_N(A_N(κ)) ≥ 1/2. An easy computation gives that

and our goal is to show that this variance decreases fast enough.
This gives that

Therefore, by the Chebyshev inequality,

where the last inequality holds for N large enough with our choice of ξ.
• Second case: N is such that j_0 ≥ δN^{1−ξ/2}. The strategy will be slightly different: separating the eigenvalues of B_N that are in K_N(ξ) from the others, we get

The first term is treated similarly to the first case. We can easily check that

which goes to zero with our choice of ξ. This gives that for N large enough,

We now turn to the last term. From (3) in Fact 2.4, we have that ṽ_N ≤ λ_1, so that for any i ≤ j_0,

where the last inequality is again obtained through the Chebyshev inequality.
Putting together (3) and (4), we get that for N large enough

The last step is now to prove that, for N large enough,

On one side, we have that

where we use (4) in Fact 2.4, with N large enough and ε = δ/(2θ).
Thanks to Lemma 2.3, we know that it is enough to study

We proceed as in the proof of Lemma 5.1 in (14) and define a permutation σ_N that allows us to pair all but (N^{1−κ} ∧ Nδ) of the λ_i's with a corresponding λ′_{σ_N(i)} which lies at a distance less than δ from λ_i. As in (14), we denote by J_0 the set of indices i for which we have such a pairing. Then we have

where we used once again the bound by 2θ, so that we get the required continuity in this second case.
In this case, we proceed exactly as in the second case. The only point is that establishing that v N cannot be far from v ′ N will be a bit more involved. We address this point in detail.
On one side we have from Fact 2.4 that

On the other side, as |λ_1 − λ′_1| ≤ δ, λ′_1 + 2δ is greater than λ_1, and the map B_N → H_{B_N} is continuous outside the support of all the spectral measures, so that

with the function C going to zero at zero.
and H_{B_N} being decreasing, this implies

with the function K going to zero at zero, and, together with (6), this gives that

Now the same estimates as in the second case above lead to the same conclusion. This finishes the proof of Proposition 2.1.

Large deviations for the largest eigenvalue

The goal of this section is to prove the large deviation principle for x*_N, the largest eigenvalue of a matrix from the deformed Gaussian ensemble, announced in the introduction in Theorem 1.1.
A first step will be to prove the following proposition.

Proposition 3.1. For β = 1 or 2 and θ > 0, if we define

with Z_N^β the normalizing constant in the case θ = 0, and we let

where σ_β denotes the semicircle law whose density on R is given by (βπ)^{−1} √(2β − x²) 1_{|x| ≤ √(2β)}, and where B_N has limiting spectral measure µ and limiting largest eigenvalue x, then we have the following large deviation bounds:

1. there exists a function f_θ : R_+ → R_+ going to infinity at infinity such that for all N

2. For any x, for any M such that |x| < M,

3. For any x,

which is well defined by virtue of Theorem 6 in (12) (an explicit expression for it will be given in Section 3.2 below).
It is more convenient to rewrite (7) as

Now, a well-known inequality (see for example Lemma 2.3 in (2)) gives that

where the minimum is taken over all permutations π of {1, . . . , N}. But all the a_k's are zero except one of them, say a_1, which is equal to θ. As the law of the x_j's is invariant under permutations, we can assume that π_*^{−1}(1) = 1, where π_* is the permutation for which the minimum is reached. Therefore

We can now use the very same estimates as in Lemma 6.3 in (4) to get (1). More precisely, we can write, for x large enough,

so that, for M large enough,

From the Selberg formula (cf. for example the proof of Proposition 3.1 in (5)), we can show that

This concludes the proof of (1).
• For all x < √(2β),

Indeed, we know from Theorem 1.1 in (5) that the spectral measure of W_N satisfies a large deviation principle in the scale N² with a good rate function whose unique minimizer is the semicircle law σ_β. We can check that adding a deterministic matrix of bounded rank (uniformly in N) does not affect the spectral measure at this scale, so that the spectral measure of X_N satisfies the same large deviation principle. Therefore, if we let x < √(2β), take f ∈ C_b(R) such that f(y) = 0 if y ≤ x but ∫ f dσ_β > 0, and consider the closed set F := {µ : ∫ f dµ = 0}, we have that

Furthermore, as we saw above, the same holds for P_N^θ and we immediately deduce

which gives (8) immediately.
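The claim that a bounded-rank deformation does not affect the spectral measure at this scale can be quantified by Weyl interlacing (a standard argument, sketched here by us rather than taken from the paper): a rank one perturbation shifts the eigenvalue counting function by at most one, so the cumulative distribution functions of the spectral measures of X_N and W_N satisfy

```latex
\sup_{t \in \mathbb{R}} \bigl| F_{X_N}(t) - F_{W_N}(t) \bigr| \;\le\; \frac{1}{N}.
```

Hence, on events where the spectra stay in a fixed compact set, the two spectral measures are at Dudley distance O(1/N), which is negligible in the scale N².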
• Let now x ≥ √(2β) and δ > 0. Let M > |x|, with δ small enough so that M ≥ |x + δ|. One important remark is that, by invariance under permutation, we have

We now introduce the following notation: π̂_N is the measure on R^{N−1} such that, for each Borel set E, we have

With these notations, we have

where B(σ_β, N^{−κ}) is the ball of radius N^{−κ} centered at σ_β for the Dudley distance (defined at the very beginning of Section 2).
We first show that the second term is exponentially negligible. We have

where F_{N−1} and F_β are respectively the (cumulative) distribution functions of π̂_N and σ_β. We know from the result of Bai in (1) that

But, by a concentration result of (13) (see Theorem 1.1 therein), there exists a constant C > 0 such that for all N ∈ N,

We can now come back to the first term in (9). The same computation as in the proof of Proposition 3.1 in (5), based on the Selberg formula, gives that

Applying Proposition 2.1 together with Theorem 6 of (12), we can conclude that

One can easily see that z → Φ_β(z, σ_β) is continuous on (√(2β), +∞), and the continuity of I_{σ_β}^β(·, θ) will be shown in the proof of Lemma 3.4 below. We therefore get the upper bound:

• We now conclude the proof of Proposition 3.1 by showing the lower bound (3). We proceed as in (4). Let y > x > r > √(2β). Then,

where B_r(σ_β, N^{−κ}) = B(σ_β, N^{−κ}) ∩ P([−r, r]), with P([−r, r]) the set of probability measures whose support is included in [−r, r], and g_κ going to zero at zero by virtue of Proposition 2.1. We proceed as for the upper bound to show that P_N^{N−1}(π̂_N ∈ B_r(σ_β, N^{−κ})) goes to 1. Knowing the asymptotics of C_N^β, we get

We now let y decrease to x. Φ_β(·, σ_β) and I_{σ_β}^β(·, θ) are continuous on (√(2β), +∞) (see Lemma 3.4 below), so that we have the required lower bound

This concludes the proof of Proposition 3.1.

Proof of Theorem 1.1

We first introduce a few notations that will be useful.

Definition 3.3. For µ a compactly supported measure, we define its Hilbert transform H_µ by

H_µ(z) := ∫ 1/(z − λ) dµ(λ),  for z outside co(supp µ),

with co(supp µ) the convex envelope of the support of µ.
It is easy to check that H_µ is injective, so we can define its inverse G_µ, defined on the image of H_µ, such that G_µ(H_µ(z)) = z. The R-transform R_µ is then given, for z ≠ 0, by R_µ(z) = G_µ(z) − 1/z. Moreover, one can check that l := lim_{z→0} R_µ(z) exists; we let R_µ(0) = l, so that R_µ is continuous at 0.

Proof. We know from Theorem 6 in (12) that

Therefore, we have to check that I_{σ_β}^β(x, θ) is continuous at x* satisfying H_{σ_β}(x*) = 2θ/β for a θ such that x* > √(2β). From Definition 3.3, we see that v(·, θ) is continuous at x*. Moreover, as x* > √(2β), it lies outside the support of σ_β, and x → ∫ log(x − λ) dσ_β(λ) is continuous at x*.
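For the semicircle law σ_β itself these transforms are explicit; the following computation (a consistency check we add here, using the support convention supp σ_β = [−√(2β), √(2β)]) recovers the limit θ + β/(2θ) of Corollary 1.3:

```latex
H_{\sigma_\beta}(z) = \frac{z - \sqrt{z^2 - 2\beta}}{\beta} \quad (z > \sqrt{2\beta}),
\qquad
G_{\sigma_\beta}(y) = \frac{1}{y} + \frac{\beta}{2}\,y,
\qquad
R_{\sigma_\beta}(y) = \frac{\beta}{2}\,y,
\qquad\text{so that}\qquad
G_{\sigma_\beta}\!\left(\frac{2\theta}{\beta}\right)
 = \theta + \frac{\beta}{2\theta} \;\ge\; \sqrt{2\beta}.
```

The last inequality holds since θ + β/(2θ) − √(2β) = (√θ − √(β/(2θ)))², with equality exactly at θ = θ_c = √(β/2): the candidate outlier location θ + β/(2θ) leaves the support precisely above the critical value, consistent with Corollary 1.3.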
We now turn to the proof of Theorem 1.1. If we define L_θ^β(x) := F_θ^β(x) − inf_{x∈R} F_θ^β(x), then L_θ^β is a good rate function and a direct consequence of Proposition 3.1 is that, for