Concentration of the Spectral Measure for Large Random Matrices with Stable Entries

We derive concentration inequalities for functions of the empirical measure of large random matrices with infinitely divisible entries and, in particular, stable ones. We also give concentration results for some other functionals of these random matrices, such as the largest eigenvalue or the largest singular value.

There is relatively little work outside the independence or finite second moment assumptions. Let us mention Soshnikov [28] who, using the method of determinants, studied the distribution of the largest eigenvalue of Wigner matrices with entries having heavy tails. (Recall that a real (or complex) Wigner matrix is a symmetric (or Hermitian) matrix whose entries M_{i,i}, 1 ≤ i ≤ N, and M_{i,j}, 1 ≤ i < j ≤ N, form two independent families of iid (complex valued in the Hermitian case) random variables.) In particular (see [28]), for a properly normalized Wigner matrix with entries belonging to the domain of attraction of an α-stable law, lim_{N→∞} P^N(λ_max ≤ x) = exp(−x^{−α}), where λ_max is the largest eigenvalue of such a normalized matrix. Soshnikov and Fyodorov [29] further derived results for the largest singular value of K × N rectangular random matrices with independent Cauchy entries, showing that the square of the largest singular value of such a matrix is of order K^2 N^2.
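The heavy-tail phenomenon described above is easy to observe numerically. The sketch below (an illustration, not the construction of [28]: Pareto entries stand in for a generic distribution in the domain of attraction of an α-stable law, and the helper names are ours) builds such a Wigner matrix and normalizes its largest eigenvalue by b_N ~ N^{2/α}:

```python
import numpy as np

def heavy_tailed_wigner(N, alpha, rng):
    """Real symmetric N x N matrix with iid signed Pareto(alpha) entries,
    whose tails P(|X| > x) ~ x^{-alpha} put them in the domain of
    attraction of an alpha-stable law."""
    signs = rng.choice([-1.0, 1.0], size=(N, N))
    magnitudes = rng.pareto(alpha, size=(N, N)) + 1.0
    upper = np.triu(signs * magnitudes)
    return upper + np.triu(upper, 1).T  # symmetrize: upper triangle + diagonal

def normalized_lambda_max(N, alpha, rng):
    # With heavy tails the spectrum is dominated by the few largest entries,
    # so lambda_max is normalized by b_N ~ N^{2/alpha} rather than sqrt(N).
    M = heavy_tailed_wigner(N, alpha, rng)
    return np.linalg.eigvalsh(M)[-1] / N ** (2.0 / alpha)

rng = np.random.default_rng(0)
lam = normalized_lambda_max(200, alpha=1.0, rng=rng)
```

Repeating the last call over many seeds produces a histogram whose upper tail is of Fréchet type, consistent with the limit exp(−x^{−α}).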
On another front, Guionnet and Zeitouni [9] gave concentration results for functionals of the empirical spectral measure of random matrices whose entries are independent and either satisfy a logarithmic Sobolev inequality or are compactly supported. They obtained, in that context, subgaussian decay of the tails of the empirical spectral measure when it deviates from its mean (see also Ledoux [18]). Our purpose in the present work is to deal with matrices whose entries form a general infinitely divisible vector, and in particular a stable one. We obtain concentration results for functionals of the corresponding empirical spectral measure, allowing for any type of light or heavy tails. The methodologies developed here apply as well to the largest eigenvalue or the spectral radius of such random matrices.
Following the lead of Guionnet and Zeitouni [9], let us start by setting our notation and framework.
Let M_{N×N}(C) be the set of N × N Hermitian matrices with complex entries, equipped throughout with the Hilbert-Schmidt norm ‖M‖ = (tr(MM*))^{1/2} = (Σ_{i,j=1}^N |M_{i,j}|^2)^{1/2}. Let f be a real valued function on R. The function f can be viewed as mapping M_{N×N}(C) to M_{N×N}(C). Indeed, any M = (M_{i,j})_{1≤i,j≤N} ∈ M_{N×N}(C) can be written as M = UDU*, where D is a diagonal matrix with real entries λ_1, ..., λ_N and U is a unitary matrix; set f(M) = U diag(f(λ_1), ..., f(λ_N)) U*.
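This spectral definition of f(M) can be checked numerically; a minimal sketch (the function name is ours):

```python
import numpy as np

def matrix_function(M, f):
    """Functional calculus for a Hermitian matrix: write M = U D U* via the
    spectral theorem and set f(M) = U diag(f(lambda_1), ..., f(lambda_N)) U*."""
    lam, U = np.linalg.eigh(M)          # real eigenvalues, columns of U unitary
    return (U * f(lam)) @ U.conj().T    # same as U @ np.diag(f(lam)) @ U.conj().T
```

For a polynomial such as f(x) = x^2 this agrees with ordinary matrix multiplication, which provides a quick sanity check of the definition.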
Let tr(M) = Σ_{i=1}^N M_{i,i} be the trace on M_{N×N}(C) and set also tr_N(M) = tr(M)/N. For an N × N random Hermitian matrix with eigenvalues λ_1, λ_2, ..., λ_N, the empirical spectral measure is μ̂ = (1/N) Σ_{i=1}^N δ_{λ_i}. We study below the tail behavior of either the spectral measure or the linear statistic of f(M) for classes of matrices M. Still following Guionnet and Zeitouni, we focus on a general random matrix X_A with entries (X_A)_{i,j} = A_{i,j} ω_{i,j}/√N, where ω_{i,j}, 1 ≤ i ≤ j ≤ N, is a complex valued random variable with law P_{i,j} = P^R_{i,j} + √−1 P^I_{i,j}, 1 ≤ i ≤ j ≤ N, with P^I_{i,i} = δ_0 (by the Hermitian property), and ω_{j,i} = ω̄_{i,j}. Moreover, the matrix A = (A_{i,j})_{1≤i,j≤N} is Hermitian with, in most cases, non-random complex valued entries uniformly bounded, say, by a.
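A minimal sketch of this entrywise construction (Gaussian noise purely for illustration, and the helper names are ours; the full and banded patterns anticipate the ensembles discussed next):

```python
import numpy as np

def X_A(omega, A):
    """Entrywise model X_A = (A_{ij} * omega_{ij} / sqrt(N)): omega is a
    Hermitian noise matrix, A a Hermitian pattern with bounded entries."""
    N = omega.shape[0]
    return A * omega / np.sqrt(N)

def band_pattern(N, width):
    # A_{ij} = 1 when |i - j| <= width and 0 otherwise (a band-matrix pattern)
    i, j = np.indices((N, N))
    return (np.abs(i - j) <= width).astype(float)

rng = np.random.default_rng(1)
G = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
omega = (G + G.conj().T) / 2           # Hermitian noise
full = X_A(omega, np.ones((8, 8)))     # A_{ij} = 1 for all i, j: full matrix
band = X_A(omega, band_pattern(8, 2))  # zero pattern far from the diagonal
```

The Hadamard product with A keeps X_A Hermitian whenever omega and A are, so the spectral quantities studied below are well defined for every pattern choice.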
Different choices for the entries of A make it possible to cover various types of ensembles. For instance, if ω^R_{i,j}, ω^I_{i,j}, 1 ≤ i < j ≤ N, and ω^R_{i,i}, 1 ≤ i ≤ N, are iid centered Gaussian random variables (with the appropriate variances), taking A_{i,j} = 1 for all i, j gives the GUE (Gaussian Unitary Ensemble) (see [22]). Moreover, if ω^R_{i,j}, ω^I_{i,j}, 1 ≤ i < j ≤ N, and ω^R_{i,i}, 1 ≤ i ≤ N, are two independent families of real valued random variables, taking A_{i,j} = 0 for |i − j| large and A_{i,j} = 1 otherwise gives band matrices. Proper choices of non-random A_{i,j} also make it possible to cover Wishart matrices, as seen in the later part of this section. In certain instances, A can also be chosen to be random, as in the case of diluted matrices, in which case A_{i,j}, 1 ≤ i ≤ j ≤ N, are iid Bernoulli random variables (see [9]). On R^{N²}, let P^N be the joint law of the random vector X = (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N},

and E^N the corresponding expectation. Denote by μ̂^N_A the empirical spectral measure of the eigenvalues of X_A, and further note that tr_N(f(X_A)) = ∫ f dμ̂^N_A for any bounded Borel function f. For a Lipschitz function f : R^d → R, set ‖f‖_Lip = sup_{x≠y} |f(x) − f(y)|/‖x − y‖, where throughout ‖·‖ is the Euclidean norm, and we write f ∈ Lip(c) whenever ‖f‖_Lip ≤ c. The eigenvalues of a Hermitian matrix, listed in nonincreasing order according to multiplicity, form an element λ = (λ_1, ..., λ_N) of the simplex S_N = {λ ∈ R^N : λ_1 ≥ λ_2 ≥ ... ≥ λ_N}, where throughout S_N is equipped with the Euclidean norm ‖λ‖ = (Σ_{i=1}^N λ_i^2)^{1/2}. It is a classical result, sometimes called Lidskii's theorem ([24]), that the map M_{N×N}(C) → S_N which associates to each Hermitian matrix its ordered list of real eigenvalues is 1-Lipschitz ([10], [17]). For a matrix X_A under consideration with eigenvalues λ(X_A), it is then clear that the map (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N} → λ(X_A) is Lipschitz, with Lipschitz constant bounded by a√(2/N). Moreover, for any real valued Lipschitz function F on S_N with Lipschitz constant ‖F‖_Lip, the map (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N} → F(λ(X_A)) is Lipschitz as well, and classical results ([10], [17], [1]) ensure that the maximal eigenvalue λ_max(X_A) = λ_1(X_A), the spectral radius ρ(X_A) = max_{1≤i≤N} |λ_i| and tr_N(f(X_A)), where f : R → R is a Lipschitz function, are themselves Lipschitz functions with Lipschitz constants at most a√(2/N), a√(2/N) and √2 a ‖f‖_Lip/N, respectively. These observations (and our results) are also valid for real symmetric matrices, with proper modification of the Lipschitz constants.
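The 1-Lipschitz property of the ordered eigenvalue map is easy to verify numerically; a minimal sketch (helper names are ours):

```python
import numpy as np

def ordered_eigenvalues(M):
    # the map M -> lambda(M): eigenvalues listed in nonincreasing order
    return np.linalg.eigvalsh(M)[::-1]

def hilbert_schmidt(M):
    return np.sqrt((np.abs(M) ** 2).sum())  # Hilbert-Schmidt norm

rng = np.random.default_rng(2)
def random_hermitian(N):
    G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (G + G.conj().T) / 2

# Lidskii / Hoffman-Wielandt: the eigenvalue map is 1-Lipschitz from
# (Hermitian matrices, Hilbert-Schmidt norm) to (S_N, Euclidean norm).
A, B = random_hermitian(10), random_hermitian(10)
lhs = np.linalg.norm(ordered_eigenvalues(A) - ordered_eigenvalues(B))
assert lhs <= hilbert_schmidt(A - B) + 1e-9
```

The Lipschitz constants quoted in the text then follow by composing this map with the (linear) map from the vector of entries to the matrix X_A.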
Next, recall that X is a d-dimensional infinitely divisible random vector without Gaussian component, X ∼ ID(β, 0, ν), if its characteristic function is given by the Lévy–Khintchine representation, where t, β ∈ R^d and ν ≢ 0 (the Lévy measure) is a positive measure on B(R^d), the Borel σ-field of R^d, without atom at the origin and such that ∫_{R^d} (1 ∧ ‖u‖^2) ν(du) < +∞. The vector X has independent components if and only if its Lévy measure ν is supported on the axes of R^d and is thus of the form ν = Σ_{k=1}^d ν_k, where each ν_k is a one-dimensional Lévy measure carried by the k-th coordinate axis. Moreover, the ν_k are the same for all k = 1, ..., d if and only if X has identically distributed components.
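The characteristic function referred to above is presumably the standard Lévy–Khintchine representation without Gaussian part; a sketch, in the paper's notation:

```latex
\mathbb{E}\, e^{i\langle t, X\rangle}
  = \exp\!\Big( i\langle t, \beta\rangle
      + \int_{\mathbb{R}^d} \big( e^{i\langle t, u\rangle} - 1
      - i\langle t, u\rangle \mathbf{1}_{\{\|u\|\le 1\}} \big)\, \nu(du) \Big),
  \qquad t \in \mathbb{R}^d,
```

with the integrability condition $\int_{\mathbb{R}^d} (1 \wedge \|u\|^2)\,\nu(du) < +\infty$ guaranteeing that the integral converges.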
The following proposition gives an estimate on any median (or the mean, if it exists) of a Lipschitz function of an infinitely divisible vector X. It is used in most of the results presented in this paper. The first part is a consequence of Theorem 1 in [13], while the proof of the second part can be obtained as in [13].
, and for any γ > 0, let G_2(γ) be defined accordingly, where k_γ(x), x > 0, is the solution, in y, of the corresponding equation, and where the integrals ⟨e_k, y⟩ ν(dy) and ∫_{1<‖y‖≤p_γ} ⟨e_k, y⟩ ν(dy) appear, with e_1, e_2, ..., e_{N²} being the canonical basis of R^{N²}.
Our first result deals with the spectral measure of a Hermitian matrix whose entries on and above the diagonal form an infinitely divisible random vector with finite exponential moments. Below, for any b > 0, c > 0, Lip_b(c) denotes the set of functions f ∈ Lip(c) with ‖f‖_∞ ≤ b. Theorem 1.2 Let X = (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N} be a random vector with joint law P^N ∼ ID(β, 0, ν) such that E^N[e^{t‖X‖}] < +∞, for some t > 0. Let T = sup{t ≥ 0 : E^N[e^{t‖X‖}] < +∞} and let h^{−1} be the inverse of h. (ii) holds with G_2(γ) as in Proposition 1.1, C a universal constant, and with t_0 the solution, in t, of th(t) − ∫_0^t h(s)ds − ln(12b/δ) = 0.

Remark 1.3 (i)
The order of C(δ, b) in part (ii) can be made more specific. Indeed, it will be clear from the proof of the theorem (see (2.39)) that an explicit bound holds for any fixed 0 < t* ≤ T. (ii) As seen from the proof (see (2.38)), in the statement of the above theorem, G_2(γ) can be replaced by a quantity involving the components X_j, j = 1, 2, ..., N², of X. Actually, an estimate more precise than (1.6) is given by a result of Marcus and Rosiński [20], which asserts that if E[X] = 0, then a bound holds in which x_0 is the solution of an equation involving V^2(x), as before, and M(x) = ∫_{‖u‖≥x} ‖u‖ ν(du), x > 0.
(iii) As usual, one can easily pass from the mean E^N[tr_N(f)] to any median m(tr_N(f)) in either (1.4) or (1.5). Indeed, either m(tr_N(f)) ≥ E^N[tr_N(f)] or m(tr_N(f)) ≤ E^N[tr_N(f)]; without loss of generality assuming the former, otherwise dealing with the latter with −f, consider the corresponding function g. Next, recall (see [6], [17]) that the Wasserstein distance between any two probability measures µ_1 and µ_2 on R is defined by (1.9). Hence, Theorem 1.2 actually gives a concentration result, with respect to the Wasserstein distance, for the empirical spectral measure μ̂^N_A when it deviates from its mean E^N[μ̂^N_A]. As in [9], we can also obtain a concentration result for the distance between any particular probability measure and the empirical spectral measure.
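What (1.9) presumably states is the classical Kantorovich–Rubinstein dual form of the Wasserstein distance; a sketch:

```latex
d_{W}(\mu_1, \mu_2)
  = \sup_{\|f\|_{\mathrm{Lip}} \le 1}
      \Big| \int_{\mathbb{R}} f \, d\mu_1 - \int_{\mathbb{R}} f \, d\mu_2 \Big|.
```

Under this definition, a bound on |tr_N(f(X_A)) − E^N[tr_N(f(X_A))]| that is uniform over f ∈ Lip(1) is exactly a bound on d_W(μ̂^N_A, E^N[μ̂^N_A]), which is how Theorem 1.2 yields Wasserstein concentration.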
Theorem 1.4 Let X = (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N} be a random vector with joint law P^N ∼ ID(β, 0, ν) such that E^N[e^{t‖X‖}] < +∞, for some t > 0. Let T = sup{t > 0 : E^N[e^{t‖X‖}] < +∞} and let h^{−1} be the inverse of h(s) = ∫_{R^{N²}} ‖u‖(e^{s‖u‖} − 1) ν(du), 0 < s < T. Then, for any probability measure µ, the corresponding concentration bound holds. Of particular importance is the case of an infinitely divisible vector having boundedly supported Lévy measure. We then have the following corollary. Let X = (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N} be a random vector with joint law as above. (i) For any δ > 0, the stated bound holds, with G_2(γ) as in Proposition 1.1, C a universal constant, and t_0 the solution, in t, of the corresponding equation. (ii) For any probability measure µ on R and any δ > 0, the stated bound holds. Hence one can choose τ to be the solution, in x, of the corresponding equation; it then follows that C(δ, b) can be taken accordingly. Outside of the finite exponential moment assumption, an interesting class of random matrices with infinitely divisible entries are those with stable entries, which we now analyze.
Recall that an α-stable, 0 < α < 2, random vector in R^d has Lévy measure given in polar form by ν(B) = ∫_{S^{d−1}} σ(dξ) ∫_0^{+∞} 1_B(rξ) dr/r^{1+α}, where σ, the spherical component of the Lévy measure, is a finite positive measure on S^{d−1}, the unit sphere of R^d. Since the expected value of the spectral measure of a matrix with α-stable entries might not exist, we look at the deviation from a median. Here is a sample result.
(i) Let f ∈ Lip(1), and let m(tr_N(f(X_A))) be any median of tr_N(f(X_A)). Then the stated concentration inequality holds, with the constants given there. (ii) Let λ_max(X_A) be the largest eigenvalue of X_A, and let m(λ_max(X_A)) be any median of λ_max(X_A); then the corresponding inequality holds as well. Remark 1.8 Let M be a Wigner matrix whose entries M_{i,i}, 1 ≤ i ≤ N, M^R_{i,j}, 1 ≤ i < j ≤ N, and M^I_{i,j}, 1 ≤ i < j ≤ N, are iid random variables such that the distribution of |M_{1,1}| belongs to the domain of attraction of an α-stable distribution, for some slowly varying positive function L such that lim_{x→∞} L(tx)/L(x) = 1, for all t > 0. Soshnikov [28] showed that, for any δ > 0, lim_{N→∞} P^N(λ_max(M/b_N) ≤ δ) = exp(−δ^{−α}), where b_N is a suitable normalizing factor. When X is in the domain of attraction of an α-stable distribution, concentration inequalities similar to (1.14) or (1.15) can be obtained for general Lipschitz functions. In particular, this holds if the Lévy measure of X has radial part governed by a slowly varying function L on [0, +∞), and if the normalizing factor b_N is still chosen accordingly. Now, recall that for an N²-dimensional vector with iid entries, σ(S^{N²−1}) = N²(σ(1) + σ(−1)), where σ(1) is short for σ(1, 0, ..., 0) and similarly for σ(−1). Thus, for fixed N, our result gives the correct order of the upper bound for large values of δ (say, δ > 1). Moreover, in the stable case, L(δ) becomes constant and b_N = N^{2/α}. Since λ_max(N^{−2/α}M) is a Lipschitz function of the entries of the matrix M with Lipschitz constant at most √2 N^{−2/α}, for any median m(λ_max(N^{−2/α}M)) of λ_max(N^{−2/α}M) the corresponding bound holds whenever δ ≥ 2C(α)(σ(1) + σ(−1))^{1/α}. Furthermore, using Theorem 1 in [13], it is not difficult to see that m(λ_max(N^{−2/α}M)) can be upper and lower bounded independently of N. Finally, an argument as in Remark 1.15 below will give a lower bound on λ_max(N^{−2/α}M) of the same order as (1.18).
The following proposition gives an estimate on any median of a Lipschitz function of X, where X is a stable vector. It is the version of Proposition 1.1 for α-stable vectors. Proposition 1.9 Let X = (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N} be an α-stable, 0 < α < 2, random vector in R^{N²} with Lévy measure ν given by (1.13). Let f ∈ Lip(1); then the stated estimate holds, where k_{α/4(2−α)}(x), x > 0, is the solution, in y, of the corresponding equation, and where the integrals ⟨e_k, y⟩ ν(dy) are taken with e_1, e_2, ..., e_{N²} being the canonical basis of R^{N²}.
Remark 1.10 (i) When the components of X are independent, a direct computation gives, up to a constant, the expectation appearing in both J_1(α) and J_2(α). In complete similarity to the finite exponential moments case, we can obtain concentration results for the spectral measure of matrices with α-stable entries.
(ii) For any probability measure µ, the stated bound holds with the constants given there. It is also possible to obtain concentration results for smaller values of δ. The lower and intermediate ranges for the stable deviation obtained in [4] provide the appropriate tools to achieve the following result. We refer to [4] for complete arguments, and only provide below a sample result. Theorem 1.12 Let X = (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N} be an α-stable, 1 < α < 2, random vector in R^{N²} with Lévy measure ν given by (1.13). For any ε > 0, there exist η(ε) and constants D_1 = D_1(α, a, N, σ(S^{N²−1})) and D_2 = D_2(α, a, N, σ(S^{N²−1})) such that, for all 0 < δ < η(ε),
(1.24) Remark 1.13 (i) In (1.14), (1.15) or (1.23), the constant C(α) is not of the right order as α → 2. It is, however, a simple matter to adapt Theorem 2 of [12] to obtain, at the price of worsening the range of validity of the concentration inequalities, the right order in the constants as α → 2.
(ii) Let us now provide some estimates for D_1 and D_2, which are needed for comparison with the GUE results of [9] (see (1.26) below). With the choice (2 − α)/10 and with D* as in the proof of the theorem, D_1 = 24D*, while D_2 admits a similar explicit expression.
(iii) Guionnet and Zeitouni [9] obtained concentration results for the spectral measure of matrices with independent entries which are either compactly supported or satisfy a logarithmic Sobolev inequality. In particular, for the elements of the GUE, their upper bound of concentration for the spectral measure is given by (1.26), where C_1 and C_2 are universal constants. In Theorem 1.12, the order, in b, of D_1 is at most b^{(α+1)/α}, while that of D_2 is at least b^{−(α+1)/(α−1)}. This order is thus consistent with the one in (1.26) as α is close to 2. Taking into account part (ii) above, the order of the constants in (1.24) is correct when α → 2. Following [4] (see also Remark 4 in [19]), we can recover a suboptimal Gaussian result by considering a particular stable random vector X^(α) and letting α → 2. Toward this end, let X^(α) be the stable random vector whose Lévy measure has for spherical component σ the uniform measure with total mass σ(S^{N²−1}) = N²(2 − α). As α converges to 2, X^(α) converges in distribution to a standard normal random vector. Also, as α → 2, the range of δ in Theorem 1.12 becomes (0, +∞), while the constants in the concentration bound do converge. Thus, the right-hand side of (1.24) becomes a bound of the same order, in δ, as (1.26). However, our order in N is suboptimal.
(iv) In the proof of Theorem 1.12, the desired estimate in (2.56) is achieved through a truncation of order δ^{−1/α}, which, when α → 2, is of the same order as the one used in obtaining (1.26). However, for the GUE result, using Gaussian concentration, a truncation of order ln(12b/δ) gives a slightly better bound, with C_1 and C_2 absolute constants (different from those of (1.26)).
Wishart matrices are of interest in many contexts, in particular as the sample covariance matrix in statistics.
(A real Wishart matrix is defined similarly with P^I_{i,j} = δ_0 and M = Y^t Y.) Recall also that if the entries of Y are iid centered random variables with finite variance σ², the empirical distribution of the eigenvalues of Y*Y/N converges, as K → ∞, N → ∞, and K/N → γ ∈ (0, +∞), to the Marčenko–Pastur law ([3], [21]). When the entries of Y are iid Gaussian, Johansson [14] and Johnstone [15] showed, in the complex and real case respectively, that the properly normalized largest eigenvalue converges in distribution to the Tracy–Widom law ([30], [31]). Soshnikov [27] extended the result of Johnstone to Wishart matrices with non-Gaussian entries under the condition that K − N = O(N^{1/3}) and that the moments of the entries do not grow too fast. Soshnikov and Fyodorov [29] recently studied the distribution of the largest eigenvalue of the Wishart matrix Y*Y when the entries of Y are iid Cauchy random variables. We are interested here in concentration for the linear statistics of the spectral measure and for the largest eigenvalue of the Wishart matrix Y*Y, where the entries of Y form an infinitely divisible vector and, in particular, a stable one. We restrict our work to the complex framework, the real framework being essentially the same.
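The Marčenko–Pastur density mentioned above can be sketched as follows; under the stated normalization of Y*Y by N with K/N → γ (conventions for the ratio vary across references, so this form is an assumption), it reads:

```latex
p_{\gamma}(x) = \frac{\sqrt{(b_{+} - x)(x - b_{-})}}{2\pi \sigma^2 x}\,
  \mathbf{1}_{[b_{-},\, b_{+}]}(x),
\qquad b_{\pm} = \sigma^2 \left(1 \pm \sqrt{\gamma}\right)^2,
```

together with an atom of mass $1 - \gamma$ at the origin when $\gamma < 1$, accounting for the deficient rank of Y*Y in that regime.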
It is not difficult to see that if Y has iid Gaussian entries, the entries of Y*Y are infinitely divisible, each with a Lévy measure without a known explicit form. However, the dependence structure among the entries of Y*Y prevents the vector of entries from being, itself, infinitely divisible (this is a well known fact originating with Lévy; see [25]). The methodology we previously used can therefore not be directly applied to functions of the eigenvalues of Y*Y. However, concentration results can be obtained from the following facts, due to Guionnet and Zeitouni [9] and already used for that purpose in their paper: the map from the entries of Y to the ordered eigenvalues of M^{1/2} is Lipschitz, and, since the spectrum of Y*Y differs from that of YY* only by the multiplicity of the zero eigenvalue, for any function f one has tr f(Y*Y) − tr f(YY*) = (N − K) f(0), where M^{1/2} is the unique positive semi-definite square root of M = Y*Y.
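That the spectra of Y*Y and YY* differ only by the multiplicity of the zero eigenvalue is easy to check numerically; a minimal sketch (Gaussian entries purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
K, N = 3, 6
Y = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))  # K x N

big = np.linalg.eigvalsh(Y.conj().T @ Y)    # spectrum of Y*Y: N eigenvalues
small = np.linalg.eigvalsh(Y @ Y.conj().T)  # spectrum of YY*: K eigenvalues

# Nonzero eigenvalues coincide; Y*Y only carries N - K extra zeros, hence
# tr f(Y*Y) = tr f(YY*) + (N - K) f(0) for any function f.
assert np.allclose(big[N - K:], small)          # top K eigenvalues agree
assert np.allclose(big[:N - K], 0.0, atol=1e-9) # remaining ones vanish
```

This identity is what lets concentration results for functions of YY* transfer to the Wishart matrix Y*Y and conversely.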
Next, let P^{K,N} be the joint law of (Y^R_{i,j}, Y^I_{i,j})_{1≤i≤K,1≤j≤N} on R^{2KN}, and let E^{K,N} be the corresponding expectation. We present below, in the infinitely divisible case, a concentration result for the largest eigenvalue λ_max(M) of the Wishart matrices M = Y*Y. The concentration for the linear statistic tr_N(f(M)) could also be obtained using the above observations.
Let X = (Y^R_{i,j}, Y^I_{i,j})_{1≤i≤N,1≤j≤K} be a random vector with joint law P^{K,N} ∼ ID(β, 0, ν) such that E^{K,N}[e^{t‖X‖}] < +∞, for some t > 0. Let T = sup{t > 0 : E^{K,N}[e^{t‖X‖}] < +∞} and let h^{−1} be the inverse of h. Then the corresponding concentration inequality holds. Corollary 1.14 Let X = (Y^R_{i,j}, Y^I_{i,j})_{1≤i≤K,1≤j≤N} be an α-stable random vector with Lévy measure ν given by ν(B) = ∫_{S^{2KN−1}} σ(dξ) ∫_0^{+∞} 1_B(rξ) dr/r^{1+α}. Then the corresponding concentration inequality holds. Remark 1.15 (i) As already mentioned, Soshnikov and Fyodorov [29] studied the asymptotics of the largest singular value of the K × N random matrix Y, whose square is the largest eigenvalue of the Wishart matrix Y*Y, when the entries of Y are iid Cauchy random variables. They argue that although the typical eigenvalues of Y*Y are of order KN, the correct order of the largest eigenvalue of such a matrix is K²N². Our result implies that the largest eigenvalue λ_max(M) of the Wishart matrix M = Y*Y, when the entries of Y form an α-stable random vector, is of order at most σ(S^{2KN−1})^{2/α}. We also have a lower bound result, which is described next. In particular, if the entries of the matrix Y are iid α-stable random variables, the largest eigenvalue of M is of order (KN)^{2/α}. (ii) Recall (see Lemma 5.4 in [5]) that, for any x > 0 and any norm ‖·‖_λ, the corresponding estimate holds for X viewed through ‖·‖_λ, which we denote by X_λ, if X is a stable vector in R^{2KN}; here S^{2KN−1}_{‖·‖_λ} is the unit sphere relative to the norm ‖·‖_λ and σ is the spherical part of the Lévy measure corresponding to this norm. Moreover, if the components of X are independent, in which case the Lévy measure is supported on the axes of R^{2KN}, σ(S^{2KN−1}_{‖·‖_λ}) is of order KN, and so the largest eigenvalue of M^{1/2} is of order K^{1/α}N^{1/α}. (iii) For any function f such that g(x) = f(x²) is Lipschitz with Lipschitz constant ‖g‖_Lip := |||f|||_L, tr(g(X_A)) = tr(f(X_A²)). (iv) Under the assumptions of part (ii) of Corollary 1.14, the corresponding bound holds for any such function, with the constants given there. Remark 1.16 The methodology used to obtain the results of the present paper, in the absence of finite exponential moments, can be applied to any matrix whose entries on and above the main diagonal form such an infinitely divisible vector X. However, to obtain explicit estimates, we do need specific bounds on V²(r) and ν(r), which are not always available when further knowledge on the Lévy measure of X is lacking.
2 Proofs

We start with a proposition which is a direct consequence of the concentration inequalities obtained in [11] for general Lipschitz functions of infinitely divisible random vectors with finite exponential moments.
Proposition 2.1 Let X = (ω^R_{i,i}, ω^R_{i,j}, ω^I_{i,j})_{1≤i<j≤N} be a random vector with joint law P^N ∼ ID(β, 0, ν) such that E^N[e^{t‖X‖}] < +∞, for some t > 0, and let T = sup{t > 0 : E^N[e^{t‖X‖}] < +∞}. Let h^{−1} be the inverse of h. (i) For any Lipschitz function f, the stated bound holds. (ii) Let λ_max(X_A) be the largest eigenvalue of the matrix X_A; then the corresponding bound holds. Proof of Theorem 1.2: For part (i), following the proof of Theorem 1.3 of [9], without loss of generality, by shift invariance, assume that min{x : x ∈ K} = 0. Next, for any v > 0, let g_v be given by (2.33). Clearly g_v ∈ Lip(1) with ‖g_v‖_∞ = v. Next, for any function f ∈ Lip_K(1) and any ∆ > 0, define recursively f_∆(x) = 0 for x ≤ 0 and, for (j − 1)∆ < x ≤ j∆, by the corresponding piecewise construction, where g_∆ ∈ Lip(1) regardless of the function f. Now, for δ > 2∆, the resulting chain of estimates holds whenever 0 < δ < 8√2 a|K| h(T^−)/N, where the last inequality follows from part (i) of the previous proposition by taking also ∆ = δ/4.
In order to prove part (ii), for any f ∈ Lip_b(1), i.e., such that ‖f‖_Lip ≤ 1 and ‖f‖_∞ ≤ b, and any τ > 0, let f_τ be given via the corresponding truncation, with g_b given as in (2.33). Now, the stated chain of inequalities holds, where we have used Proposition 1.1 in the next-to-last inequality and where the last inequality follows from Theorem 1 in [11] (p. 1233). We want to choose τ such that (2.44) holds. Theorem 1.4 then follows from Theorem 1 in [11].