Large deviations for the largest eigenvalues and eigenvectors of spiked Gaussian random matrices

We consider matrices formed by a random N × N matrix drawn from the Gaussian Orthogonal Ensemble (or Gaussian Unitary Ensemble) plus a rank-one perturbation of strength θ, and focus on the largest eigenvalue, x, and the squared component, u, of the corresponding eigenvector in the direction associated with the rank-one perturbation. We obtain the large deviation principle governing the atypical joint fluctuations of x and u. Interestingly, for θ > 1, in large deviations characterized by a small value of u, i.e. u < 1 − 1/θ, the second-largest eigenvalue pops out from the Wigner semicircle and the associated eigenvector orients in the direction corresponding to the rank-one perturbation. We generalize these results to the Wishart Ensemble, and we extend them to the first n eigenvalues and the associated eigenvectors.


Introduction
Large deviation theory for the spectral properties of random matrix models is a very active domain of research in probability theory and theoretical physics. Much work has been devoted to the statistics of the eigenvalues. Following Voiculescu's pioneering work on non-commutative entropy [23], G. Ben Arous and one of the authors derived in the late nineties a large deviation principle for the empirical measure of the eigenvalues of Gaussian ensembles (an approach known in physics as the Coulomb gas method [17]). The proof is based on the explicit joint density of the eigenvalues, and the speed of the large deviation principle is the square of the dimension of the random matrix. More than ten years later, C. Bordenave and P. Caputo [6] obtained a large deviation principle for the same empirical measure, but for Wigner matrices with heavy-tailed entries, in the sense that their tails decay more slowly than Gaussian tails at infinity. Their approach is completely different, as it relies on the idea that deviations are created by a few large entries: the rate then depends on how fast the tail decays. The large deviation principle for the spectral measure in the general sub-Gaussian case is still an open problem. Instead of considering the deviations of the empirical measure, it is also natural to try to understand the probability of deviations of a single eigenvalue. The deviations of an eigenvalue inside the bulk are closely related to those of the empirical measure, but one can also study the probability of deviations of the extreme eigenvalues. This was achieved for Gaussian ensembles in the Appendix of [5] (see also [10]), where it was shown that these large deviations occur on the scale of the dimension. Again, the proof was based on the explicit joint law of the eigenvalues. The large deviation principle for the largest eigenvalue was derived in [3] for heavy-tailed entries. In the case of sharp sub-Gaussian entries, which include Rademacher (binary) entries, it was recently proved that the large deviations of the extreme eigenvalues are the same as in the Gaussian case [15].

The probability of atypical eigenvectors has been much less studied. Again, the only results we know of concern the Gaussian ensembles: in this case, the invariance of the law under conjugation by orthogonal (resp. unitary) matrices implies that each eigenvector is uniformly distributed on the sphere. In [5], the large deviations for the empirical measure of the properly rescaled entries of an eigenvector were established; the large deviations for the supremum of the entries could also be easily derived. In this article, we address a different question: we investigate the large deviations of the component of an eigenvector in a given fixed direction. In many solvable random matrix models, eigenvectors are uniformly distributed; hence there are no meaningful atypical fluctuations or special directions to focus on. For a spiked GOE matrix, i.e. a random N × N matrix drawn from the Gaussian Orthogonal Ensemble plus a rank-one perturbation, there is instead a special direction: the one related to the perturbation. In this case an interesting phenomenon, called the BBP transition, takes place as the strength of the perturbation (called θ in the following) is varied. As shown in [12] and then proved rigorously in [4], the largest eigenvalue x pops out of the semicircle if the perturbation is strong enough. More precisely, as N → ∞, x converges almost surely to 2 for θ ≤ 1 and to θ + 1/θ for θ > 1. In the latter case, the square of the component of the associated eigenvector in the direction of the perturbation, which we shall henceforth denote u, converges almost surely to 1 − 1/θ². In this context the question raised above becomes meaningful, and it is natural to focus on the good rate function (GRF) that controls the joint atypical fluctuations of x and u.
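The BBP limits quoted above can be checked on a single large sample. The sketch below is an illustration, not part of the original analysis: it assumes the GOE normalization under which the spectrum fills [−2, 2] (off-diagonal entries of variance 1/N, diagonal entries of variance 2/N), and the helper names `sample_spiked_goe`, `top_eigenpair_stats` and `bbp_limits` are hypothetical. At finite N the outlier and its overlap only match the limits up to fluctuations of order N^{−1/2}.

```python
import numpy as np


def sample_spiked_goe(n, theta, rng):
    """Return Y = X + theta * e1 e1^T, with X a GOE matrix scaled so that
    its spectrum fills [-2, 2] (off-diagonal variance 1/N, diagonal 2/N)."""
    a = rng.standard_normal((n, n))
    y = (a + a.T) / np.sqrt(2.0 * n)
    y[0, 0] += theta  # rank-one perturbation along w = e1
    return y


def top_eigenpair_stats(y):
    """Largest eigenvalue x and squared first component u of its eigenvector."""
    vals, vecs = np.linalg.eigh(y)  # eigenvalues sorted in ascending order
    return float(vals[-1]), float(abs(vecs[0, -1]) ** 2)


def bbp_limits(theta):
    """Almost-sure limits of (x, u) as N -> infinity for the spiked GOE."""
    if theta <= 1.0:
        return 2.0, 0.0
    return theta + 1.0 / theta, 1.0 - 1.0 / theta ** 2


rng = np.random.default_rng(0)
x, u = top_eigenpair_stats(sample_spiked_goe(800, 3.0, rng))
print(x, u, bbp_limits(3.0))
```

With θ = 3 the limits are (10/3, 8/9); a sample with N = 800 should land close to these values, whereas for θ ≤ 1 the top eigenvalue stays near the edge 2.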
This GRF plays an important role in the study of the geometric properties of random high-dimensional energy landscapes, which can exhibit a number of critical points that is exponentially large in the number of dimensions, as obtained in [9,13,8] and rigorously proven and extended in [2,22]. The rigorous method developed for those studies is based on a large-dimensional version of the Kac-Rice formula [1] and is strongly related to random matrix theory, since the Hessian of the energy function at the critical points, a crucial element in the theoretical analysis, is a random matrix. In order to analyze the dynamics in those rough landscapes, it is important to know the behavior not only of typical critical points but also of atypical ones, namely those associated with index-one saddles connecting minima [21]. One therefore has to study large deviations of the Hessian: one needs to condition the critical points to have index one and the eigenvector associated with the negative eigenvalue to be oriented in the direction connecting the minima, which leads precisely to the problem discussed above. Noise dressing and cleaning of empirical correlation matrices is another context in which the kind of large deviations addressed in this paper is relevant. In this case, a model that is often used to interpret the data is that of spiked Wishart random matrices, whose eigenvalue distribution consists of a Marchenko-Pastur law plus a few eigenvalues that pop out of it. Those few eigenvalues correspond to the signal buried in the noise, and the associated eigenvectors play an important role in assessing the structure of the correlations, with important applications such as portfolio risk management [7]. A natural question in this context is to characterize the joint atypical fluctuations of the largest eigenvalues and of the associated eigenvectors that carry the signal. In this work we obtain the large deviation function that governs them.

Main results
We consider the matrix
$$Y = X + \theta\, w w^{T},$$
where X is drawn from the GOE if β = 1 (resp. from the GUE if β = 2) and θ is a non-negative real number. The vector w is a fixed unit vector, and we may assume without loss of generality that w = e_1 = (1, 0, …, 0). Let λ_N ≥ λ_{N−1} ≥ ⋯ ≥ λ_1 be the eigenvalues of Y, with respective eigenvectors v_N, …, v_1. The joint large deviations of the largest eigenvalue λ_N and of the squared component |v_N(1)|² of the associated eigenvector along w = e_1 are governed by the following theorem.
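Theorem 2.1. The joint law of (λ_N, |v_N(1)|²) satisfies a large deviation principle with speed N and good rate function I_β. In other words, for any closed set F ⊂ ℝ²,
$$\limsup_{N\to\infty}\frac{1}{N}\ln P\big((\lambda_N,|v_N(1)|^2)\in F\big)\le-\inf_{(x,u)\in F}I_\beta(x,u),$$
and for any open set O ⊂ ℝ²,
$$\liminf_{N\to\infty}\frac{1}{N}\ln P\big((\lambda_N,|v_N(1)|^2)\in O\big)\ge-\inf_{(x,u)\in O}I_\beta(x,u).$$
Moreover, I_β is a good rate function in the sense that it is non-negative and has compact level sets. More precisely, the function I_β is infinite outside of S = [2, +∞) × [0, 1] and is otherwise given by the variational formula (2.1). Here σ(dx) = √(4 − x²) dx/(2π) on [−2, 2] is the semicircle distribution and G_σ is its Cauchy transform. When (λ_N, |v_N(1)|²) is conditioned to be close to (x, u), the second largest eigenvalue converges almost surely to the maximizer y of the variational problem (2.1).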
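As a concrete check of the objects entering Theorem 2.1, the sketch below evaluates the semicircle density and its Cauchy transform G_σ(z) = ∫ σ(dt)/(z − t) for real z > 2, once through the standard closed form (z − √(z² − 4))/2 and once by numerical quadrature. The function names and the midpoint-rule discretization (after the substitution t = 2 cos φ) are illustrative choices, not taken from the paper.

```python
import math


def semicircle_density(x):
    """Density of sigma(dx) = sqrt(4 - x^2) dx / (2 pi), supported on [-2, 2]."""
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi) if abs(x) <= 2.0 else 0.0


def cauchy_transform_closed_form(z):
    """G_sigma(z) = int sigma(dt) / (z - t) for real z > 2."""
    return (z - math.sqrt(z * z - 4.0)) / 2.0


def cauchy_transform_quadrature(z, n=2000):
    """Same integral by the midpoint rule after t = 2 cos(phi),
    which turns sigma(dt) into (2 / pi) sin(phi)^2 dphi on [0, pi]."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * h
        total += (2.0 / math.pi) * math.sin(phi) ** 2 / (z - 2.0 * math.cos(phi))
    return total * h


print(cauchy_transform_closed_form(2.5), cauchy_transform_quadrature(2.5))
```

For example, G_σ(2.5) = 0.5, and both functions return this value up to rounding; the substitution removes the square-root endpoints, so the quadrature converges very quickly for z away from 2.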
We have more explicit results on the rate function, on the behavior of the second largest eigenvalue and on the component of the associated eigenvector along w = e_1:
• For θ ≥ 1 and x > θ + 1/θ: the second eigenvalue pops out of the semicircle for u < 1 − 1/θ, and is equal to … The minimum on u of the large deviation function I_β, for a given x, …
• For θ ≥ 1 and 2 ≤ x < θ + 1/θ: the second eigenvalue pops out of the semicircle for u < 1 − 1/θ, and is equal to inf(y(u), x), i.e. it increases as u decreases until it reaches the value x. The minimum of the large deviation function is reached at u_θ.
An example of the large deviation function (GRF) is shown in Fig. 1 for θ = 3.
The previous results can be extended to the joint large deviations of the n largest eigenvalues λ_N ≥ λ_{N−1} ≥ ⋯ ≥ λ_{N−n+1} and of the first components of the associated eigenvectors (Theorem 2.2); the corresponding rate function involves the same function J as in Theorem 2.1. The (n + 1)-th largest eigenvalue then converges almost surely to the maximizer y of the corresponding variational problem.
Moreover, we can also extend our results to Wishart matrices whose covariance is a finite-rank perturbation of the identity. For simplicity, let us assume that the perturbation has rank one, and consider the Wishart matrix W, where Y is an M × N random matrix with i.i.d. centered Gaussian entries of variance 1/N, and Σ = I + γ e e^* is an M × M non-negative definite matrix, with e a unit vector. We assume M ≤ N. Recall that when M/N converges towards α ∈ (0, 1], the empirical measure of the eigenvalues of Y Y^* converges towards the so-called Marchenko-Pastur law π_α [20], with support [(1 − √α)², (1 + √α)²]. We can study as well the joint large deviations of the largest eigenvalue λ_N of W and of the squared component |v_N(1)|² of the associated eigenvector in the direction e (Theorem 2.3). The corresponding rate function is expressed in terms of
$$I(y) = y - (1-\alpha)\ln y - 2\alpha\int \ln|y-t|\,d\pi_\alpha(t),$$
which is the rate function for the large deviations of the largest eigenvalue of a Gaussian Wishart matrix with covariance equal to the identity.
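The Marchenko-Pastur edges can be compared with a simulated spectrum in the unperturbed case Σ = I. This is a hypothetical sketch: it only samples Y Y^* with the normalization stated above (entries of variance 1/N), it does not reproduce the perturbed matrix W, and the function names are illustrative.

```python
import numpy as np


def marchenko_pastur_support(alpha):
    """Support [(1 - sqrt(alpha))^2, (1 + sqrt(alpha))^2] of pi_alpha, 0 < alpha <= 1."""
    s = np.sqrt(alpha)
    return float((1.0 - s) ** 2), float((1.0 + s) ** 2)


def wishart_spectrum(m, n, rng):
    """Eigenvalues (ascending) of Y Y^* for an M x N matrix Y with
    i.i.d. N(0, 1/N) entries, i.e. the unperturbed case Sigma = I."""
    y = rng.standard_normal((m, n)) / np.sqrt(n)
    return np.linalg.eigvalsh(y @ y.T)


rng = np.random.default_rng(1)
ev = wishart_spectrum(400, 800, rng)  # alpha = M / N = 0.5
print(ev[0], ev[-1], marchenko_pastur_support(0.5))
```

For α = 0.5 the edges are about 0.086 and 2.914; the extreme sample eigenvalues should lie close to them, since without a spike no eigenvalue detaches from the bulk.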

Strategy of the proof
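We next focus on the perturbed Wigner matrix Y = X + θ w w^T. The law of Y is the Gaussian law of X translated by θ w w^T. Therefore, since ⟨w, Y w⟩ = Σ_{i=1}^N λ_i |v_i(1)|² when w = e_1, the joint law P_N of (λ_N, |v_N(1)|²) can be written explicitly in terms of the eigenvalues and of the first components of the eigenvectors. Our main goal is to estimate the density of P_N when N is large and to apply Laplace's method. We infer from concentration inequalities [19] that the empirical distribution μ̂_{N−1} of the remaining eigenvalues λ_1, …, λ_{N−1} concentrates around its mean. We deduce that for x > 2 (the singularity of the logarithm can be overcome as in [5]) we can replace the empirical distribution μ̂_{N−1} with its average.

To estimate the other terms, we first observe that since (v_i(1))_{1≤i≤N} is uniformly distributed on the sphere, we can represent it as
$$v_i(1) = \frac{g_i}{\big(\sum_{j=1}^{N}|g_j|^2\big)^{1/2}}, \qquad 1\le i\le N,$$
with independent standard Gaussian variables g_i, 1 ≤ i ≤ N, which are real when β = 1 and complex when β = 2. As a consequence, the distribution U_N of |v_N(1)|² is the Beta distribution with parameters (β/2, β(N − 1)/2):
$$U_N(du) \propto u^{\beta/2-1}(1-u)^{\beta(N-1)/2-1}\,\mathbf 1_{[0,1]}(u)\,du.$$
Hence the main point of the proof is to estimate the expectation of the exponential term coming from Σ_{i≤N−1} λ_i |v_i(1)|², where the expectation is over the v_i(1), i ≤ N − 1, with |v_N(1)|² = u fixed. Thanks to (3.3), if we fix v_N(1) and denote g_{N−1} = (g_1, …, g_{N−1}), we have
$$v_i(1) = \sqrt{1-|v_N(1)|^2}\; w_i, \qquad w_i = \frac{g_i}{\|g_{N-1}\|}, \qquad 1\le i\le N-1.$$
Observe that (w_i)_{1≤i≤N−1} is independent of |v_N(1)|² (as the computation of the joint law reveals). Hence this expectation reduces to a spherical integral in dimension N − 1, with strength proportional to θ(1 − u), which we study in the next section.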
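The following worked derivation makes the exponential weight explicit. It assumes, as an illustration, that X has density proportional to exp(−βN Tr X²/4) with respect to Lebesgue measure on symmetric (resp. Hermitian) matrices, which is the normalization under which the spectrum fills [−2, 2]; the constants below are tied to this assumption and may differ from those of the original displays. Here u = |v_N(1)|² and the w_i are as defined above.

```latex
\begin{aligned}
\operatorname{Tr}(Y-\theta ww^{*})^{2}
  &= \operatorname{Tr}Y^{2}-2\theta\langle w,Yw\rangle+\theta^{2}
  \qquad(\|w\|=1),\\
dP(Y) &\propto \exp\Big(-\tfrac{\beta N}{4}\operatorname{Tr}Y^{2}
  +\tfrac{\beta N\theta}{2}\langle w,Yw\rangle\Big)\,dY,\\
\langle w,Yw\rangle &= \sum_{i=1}^{N}\lambda_i|v_i(1)|^{2}
  = \lambda_N\,u+(1-u)\sum_{i=1}^{N-1}\lambda_i|w_i|^{2}
  \qquad(w=e_1),\\
\frac1N\ln\Big[u^{\beta/2-1}(1-u)^{\beta(N-1)/2-1}\Big]
  &\xrightarrow[N\to\infty]{}\frac{\beta}{2}\ln(1-u)
  \qquad(0<u<1).
\end{aligned}
```

Under this assumption, the tilt in the second line produces, once λ_N = x and u are fixed, a spherical integral over (w_i) with parameter proportional to θ(1 − u), while the last line is the contribution of the Beta law of u on the exponential scale N.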

Asymptotics of spherical integrals
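Recall the definition of spherical integrals:
$$I_N(\theta, E_N) = \mathbb E_e\big[\exp\big(\theta N\langle e, E_N e\rangle\big)\big],$$
where e is uniformly sampled on the sphere S^{N−1} with radius one. The asymptotics of I_N(θ, E_N) were studied in [14], where the following result was proved.
Theorem 4.1 ([14]). Assume that:
• the sequence of empirical measures μ̂^N_{E_N} of the eigenvalues of E_N converges weakly to a compactly supported measure µ;
• there are two real numbers λ_min(E), λ_max(E) such that the smallest and largest eigenvalues of E_N converge to λ_min(E) and λ_max(E) respectively.
Then, for any θ ≥ 0, the limit of (1/N) ln I_N(θ, E_N) as N → ∞ exists and is given by a function J of µ, θ and λ_max(E), defined below.
The limit J is defined as follows. For a compactly supported probability measure µ ∈ P(ℝ), we define its Stieltjes transform G_µ by
$$G_\mu(z) := \int \frac{d\mu(t)}{z-t}, \qquad z\notin\operatorname{supp}(\mu),$$
where supp(µ) denotes the support of µ. In the sequel, for any compactly supported probability measure µ, we denote by r(µ) the right edge of the support of µ. Then G_µ is a bijection from (r(µ), +∞) to (0, G_µ(r(µ))), with G_µ(r(µ)) = lim_{z↓r(µ)} G_µ(z) ∈ (0, +∞]. We denote by K_µ its inverse on (0, G_µ(r(µ))) and let R_µ(z) := K_µ(z) − 1/z be the R-transform of µ as defined by Voiculescu in [24] (defined on (0, G_µ(r(µ)))). In order to define the rate function, we now introduce, for any θ ≥ 0 and λ ≥ r(µ), the function J(µ, θ, λ) … In the case where µ = σ is the semicircle law, G_σ(z) = (z − √(z² − 4))/2 for z > 2, so that G_σ(r(σ)) = G_σ(2) = 1, K_σ(z) = z + 1/z and R_σ(z) = z on (0, 1). Therefore, J(σ, θ, λ) is given explicitly by …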
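The definitions of K_µ and R_µ can be exercised numerically. The sketch below inverts G_µ on (r(µ), +∞) by bisection (our choice of method, relying only on G_µ being decreasing there and on the elementary bound G_µ(y) ≤ 1/(y − r(µ))) and recovers R_σ(z) = z for the semicircle law. The symmetric Bernoulli law (δ_{−1} + δ_{+1})/2, whose R-transform is (√(1 + 4z²) − 1)/(2z), is added only as a second test case; all function names are hypothetical.

```python
import math


def semicircle_G(y):
    """Cauchy transform of the semicircle law, for real y > 2 = r(sigma)."""
    return (y - math.sqrt(y * y - 4.0)) / 2.0


def bernoulli_G(y):
    """Cauchy transform of (delta_{-1} + delta_{+1}) / 2, for y > 1 = r(mu)."""
    return y / (y * y - 1.0)


def inverse_G(G, r, z, iters=200):
    """K_mu(z): the unique y > r with G(y) = z, for 0 < z < G(r).
    Bisection works because G is decreasing on (r, +inf) and
    G(y) <= 1 / (y - r) there, so G(r + 2/z) <= z / 2 < z."""
    lo, hi = r, r + 2.0 / z
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid) > z:
            lo = mid  # G(mid) too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def R_transform(G, r, z):
    """R_mu(z) = K_mu(z) - 1/z on (0, G_mu(r(mu)))."""
    return inverse_G(G, r, z) - 1.0 / z


print(R_transform(semicircle_G, 2.0, 0.3))  # close to 0.3, since R_sigma(z) = z
```

Bisection is slower than Newton's method but needs no derivative of G_µ and cannot leave the interval (r(µ), +∞), which keeps the square root in `semicircle_G` well defined.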

Proof of Theorem 2.1
Remark that Theorem 2.1 implies the weak large deviation principle, which states that for all (x, u) ∈ ℝ²,
$$\lim_{\delta\to0}\liminf_{N\to\infty}\frac1N\ln P\big(|\lambda_N-x|<\delta,\ \big||v_N(1)|^2-u\big|<\delta\big)=\lim_{\delta\to0}\limsup_{N\to\infty}\frac1N\ln P\big(|\lambda_N-x|<\delta,\ \big||v_N(1)|^2-u\big|<\delta\big)=-I_\beta(x,u).$$
Indeed, the weak large deviation principle is simply the restriction of the full large deviation principle to small balls. To recover the full large deviation principle from its weak version, it is enough to show that the law is exponentially tight, in the sense that deviations mostly occur in a compact set. The latter is easy to check, since |v_N(1)|² lives in the compact set [0, 1] and |λ_N| ≤ ‖X‖_∞ + θ, where ‖X‖_∞ denotes the operator norm of X, and it is known that ‖X‖_∞ ≤ M with probability greater than 1 − e^{−N c M²} for M large enough [5]. We refer the reader to [11] for more details. Hence, we only need to prove the weak large deviation principle, that is, to estimate the probability that (λ_N, |v_N(1)|²) is close to some (x, u).
By Theorem 4.1, if we assume additionally that λ_{N−1} is close to y and |v_N(1)|² is close to u, we deduce using (3.1) and (3.6) that … But λ_{N−1} satisfies an LDP under P^x_{N−1}, the law of the remaining N − 1 eigenvalues conditioned to be smaller than x [5], with a good rate function which is infinite above x and below 2, and otherwise given by … Hence we deduce, by continuity of the limiting spherical integrals [18], an estimate whose exponent is a supremum over y of an expression of the form J(σ, ·, y) − y²/4 + ∫ ln|y − t| dσ(t) + C, and therefore, plugging (3.2) and (3.4) into the above estimate, we deduce that the joint law of (λ_N, |v_N(1)|²) is approximately given by … The final result follows by Laplace's method.
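The y-dependent part −y²/4 + ∫ ln|y − t| dσ(t) of the exponent above has derivative G_σ(y) − y/2 = −√(y² − 4)/2 for y > 2, so that, up to an additive constant, it equals −½∫_2^y √(t² − 4) dt. The sketch below checks this identity numerically by comparing increments between two points, which removes the unknown constant. It is an illustration with hypothetical function names; the logarithmic potential is computed by the midpoint rule after the substitution t = 2 cos φ.

```python
import math


def log_potential_semicircle(y, n=4000):
    """int ln(y - t) sigma(dt) for real y > 2, by the midpoint rule after
    t = 2 cos(phi), i.e. sigma(dt) = (2 / pi) sin(phi)^2 dphi on [0, pi]."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * h
        total += (2.0 / math.pi) * math.sin(phi) ** 2 * math.log(y - 2.0 * math.cos(phi))
    return total * h


def exponent_y_part(y):
    """The y-dependent part -y^2/4 + int ln|y - t| sigma(dt) of the estimate."""
    return -y * y / 4.0 + log_potential_semicircle(y)


def minus_half_integral_sqrt(y):
    """-(1/2) int_2^y sqrt(t^2 - 4) dt in closed form, for y >= 2."""
    s = math.sqrt(y * y - 4.0)
    return -0.5 * (0.5 * y * s - 2.0 * math.log((y + s) / 2.0))


a, b = 2.5, 4.0
print(exponent_y_part(b) - exponent_y_part(a),
      minus_half_integral_sqrt(b) - minus_half_integral_sqrt(a))
```

The two printed increments should agree to many digits; in particular the y-dependent part is decreasing on (2, +∞), which is why large values of y are penalized.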

Proof of Theorem 2.2
The law of Y is given by … where I^n_N(x, θ, u) equals … Here we have denoted … As a consequence, one can write … and the same holds when the limsup is replaced by a liminf. We deduce the same result when the x_i are ordered but possibly equal, by taking approximating sequences x̲_i^δ and x̄_i^δ which are strictly ordered and such that … By the previous bounds we deduce that … The continuity of I in the x_i allows us to conclude by letting δ go to zero.

Study of the rate function
We can give a more explicit formula for the rate function by noticing that the supremum was already studied in [18]. In the notation of [18], we are maximizing −F_1 over [2, x]. According to [18, Section 3.2], we find that … Note that these two cases correspond to different asymptotic behaviors of λ_{N−1} when λ_N goes to x and |v_N(1)|² goes to u: if θ(1 − u) is larger than 1, λ_{N−1} goes to inf(y(u), x), and otherwise it goes to 2. We can therefore study the optimizer in u of H for a given x.
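The dichotomy above can be compared with the typical behavior. For θ > 1 the typical overlap 1 − 1/θ² exceeds the threshold 1 − 1/θ, so at typical values θ(1 − u) = 1/θ < 1 and λ_{N−1} stays at the edge 2; only deviations towards smaller u make it detach. The helpers below (hypothetical names) simply encode the criterion θ(1 − u) > 1 quoted in the text together with the BBP overlap.

```python
def second_eigenvalue_detaches(theta, u):
    """Criterion from the text: lambda_{N-1} leaves the edge 2 iff theta * (1 - u) > 1."""
    return theta * (1.0 - u) > 1.0


def overlap_threshold(theta):
    """Threshold 1 - 1/theta below which lambda_{N-1} detaches (theta > 1)."""
    return 1.0 - 1.0 / theta


def typical_overlap(theta):
    """Almost-sure limit 1 - 1/theta^2 of |v_N(1)|^2 when theta > 1."""
    return 1.0 - 1.0 / theta ** 2


for theta in (1.5, 2.0, 3.0):
    u_typ = typical_overlap(theta)
    print(theta, u_typ, overlap_threshold(theta), second_eigenvalue_detaches(theta, u_typ))
```

For every θ > 1 the last column is False: atypically small overlaps, not typical ones, are what push the second eigenvalue out of the semicircle.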
• For θ(1 − u) ≥ 1, the contribution to the GRF that depends on u and that we have to minimize reads
$$\theta\big(x-\inf(y(u),x)\big)u-\frac12\int\ln\big|\inf(y(u),x)-t\big|\,d\sigma(t)+\frac{\inf(y(u),x)^2}{4},$$
which is independent of u if x ≤ y(u). If x ≥ y(u), the total derivative of this expression is simply −½θ(x − inf(y(u), x)) ≤ 0, since the partial derivative with respect to y(u) is zero (y(u) is an extremum). As a consequence, the expression above is a non-increasing function of u in the entire available range of u.
• For θ(1 − u) ≤ 1 and x ≥ 2, the contribution to the GRF that depends on u and that we have to minimize reads … We find that the minimum in u is taken at u_θ given by … u_θ belongs to [0, 1], and is therefore admissible, for x ≤ 2θ and otherwise x ≥ max{2θ, θ + θ^{−1}}. Otherwise u_θ < 0 and the minimum is taken at zero.

Proof of Theorem 2.3
… where P^x_{M−1} is the law of the remaining M − 1 eigenvalues of the Wishart matrix conditioned to be smaller than x. The proof of Theorem 2.3 then follows exactly the same steps as before.