Spectral measures of powers of random matrices

This paper considers the empirical spectral measure of a power of a random matrix drawn uniformly from one of the compact classical matrix groups. We give sharp bounds on the $L_p$-Wasserstein distances between this empirical measure and the uniform measure on the circle, which show a smooth transition in behavior when the power increases and yield rates on almost sure convergence when the dimension grows. Along the way, we prove the sharp logarithmic Sobolev inequality on the unitary group.


Introduction
The eigenvalues of large random matrices drawn uniformly from the compact classical groups are of interest in a variety of fields, including statistics, number theory, and mathematical physics; see e.g. [7] for a survey. An important general phenomenon discussed at length in [7] is that the eigenvalues of an N × N random unitary matrix U, all of which lie on the circle S^1 = {z ∈ C : |z| = 1}, are typically more evenly spread out than N independently chosen uniform random points in S^1. It was found by Rains [15] that the eigenvalues of U^N are exactly distributed as N independent uniform random points in S^1; similar results hold for other compact Lie groups. In subsequent work [16], Rains found that in a sense, the eigenvalues of U^m become progressively more independent as m increases from 1 to N.
In this paper we quantify in a precise way the degree of uniformity of the eigenvalues of U^m when U is drawn uniformly from any of the classical compact groups U(N), SU(N), O(N), SO(N), and Sp(2N). We do this by bounding, for any p ≥ 1, the mean and tails of the L_p-Wasserstein distance W_p between the empirical spectral measure μ_{N,m} of U^m and the uniform measure ν on S^1 (see Section 4 for the definition of W_p). In particular, we show in Theorem 11 that
(1) $\mathbb{E}\,W_p(\mu_{N,m}, \nu) \le \frac{C p \sqrt{m \log(N/m + 1)}}{N}$.
Theorem 13 gives a subgaussian tail bound for W_p(μ_{N,m}, ν), which is used in Corollary 14 to conclude that if m = m(N), then with probability 1, for all sufficiently large N,
(2) $W_p(\mu_{N,m}, \nu) \le \frac{C p \sqrt{m \log N}}{N^{\min\{1,\, 1/2 + 1/p\}}}$.
In the case m = 1 and 1 ≤ p ≤ 2, (1) and (2) are optimal up to the logarithmic factor, since the W_p-distance from the uniform measure ν to any probability measure supported on N points is at least cN^{-1}. When m = N, Rains's theorem says that μ_{N,N} is the empirical measure of N independent uniformly distributed points in S^1, for which the estimate in (1) of order N^{-1/2} is optimal (cf. [6]). We conjecture that Theorem 11 and Corollary 14, which interpolate naturally between these extreme cases, are optimal up to logarithmic factors in their entire parameter space.

E. Meckes's research is partially supported by the American Institute of Mathematics and NSF grant DMS-0852898. M. Meckes's research is partially supported by NSF grant DMS-0902203.
In the case that m = 1 and p = 1, these results improve the authors' earlier results in [14] (where W_1(μ_{N,1}, ν) was bounded above by CN^{-2/3}) to what we conjectured there was the optimal rate; the results above are completely new for m > 1 or p > 1.
The proofs of our main results rest on three foundations: the fact that the eigenvalues of uniform random matrices are determinantal point processes, Rains's representation from [16] of the eigenvalues of powers of uniform random matrices, and logarithmic Sobolev inequalities. In Section 2, we combine some remarkable properties of determinantal point processes with Rains's results to show that the number of eigenvalues of U^m contained in an arc is distributed as a sum of independent Bernoulli random variables. In Section 3, we estimate the means and variances of these sums, again using the connection with determinantal point processes. In Section 4, we first generalize the method of Dallaporta [5] to derive bounds on mean Wasserstein distances from those data and prove Theorem 11. Then, by combining Rains's results with tensorizable measure concentration properties which follow from logarithmic Sobolev inequalities, we prove Theorem 13 and Corollary 14. We give full details only for the case of U(N), deferring to Section 5 discussion of the modifications necessary for the other groups.
In order to carry out the approach above, we needed the sharp logarithmic Sobolev inequality on the full unitary group, rather than only on SU(N) as in [14]. It has been noted previously (e.g. in [10, 1]) that such a result is clearly desirable, but that because the Ricci tensor of U(N) is degenerate, the method of proof which works for SU(N), SO(N), and Sp(2N) breaks down. In the appendix, we prove the logarithmic Sobolev inequality on U(N) with a constant of optimal order.

A miraculous representation of the eigenvalue counting function
As discussed in the introduction, a fact about the eigenvalue distributions of matrices from the compact classical groups which we use crucially is that they are determinantal point processes. For background on determinantal point processes the reader is referred to [11]. The basic definitions will not be repeated here, since all that is needed for our purposes is the combination of Propositions 1 and 5 with Proposition 2 and Lemma 6 below. The connection between eigenvalues of random matrices and determinantal point processes has been known in the case of the unitary group at least since [8]. For the other groups, the earliest reference we know of is [12]. Although the language of determinantal point processes is not used in [12], Proposition 1 below is essentially a summary of [12, Section 5.2]. We first need some terminology.
Given an eigenvalue e^{iθ}, 0 ≤ θ < 2π, of a unitary matrix, we refer to θ as an eigenvalue angle of the matrix. Each matrix in SO(2N + 1) has 1 as an eigenvalue, each matrix in SO^-(2N + 1) has −1 as an eigenvalue, and each matrix in SO^-(2N + 2) has both −1 and 1 as eigenvalues; we refer to all of these as trivial eigenvalues. Here SO^-(N) = {U ∈ O(N) : det U = −1}, which is considered primarily as a technical tool in order to prove our main results for O(N). The remaining eigenvalues of matrices in SO(N), SO^-(N), or Sp(2N) occur in complex conjugate pairs. When discussing SO(N), SO^-(N), or Sp(2N), we refer to the eigenvalue angles corresponding to the nontrivial eigenvalues in the upper half-circle as nontrivial eigenvalue angles. For U(N), all the eigenvalue angles are considered nontrivial.
Proposition 1. The nontrivial eigenvalue angles of uniformly distributed random matrices in any of SO(N), SO^-(N), U(N), Sp(2N) form a determinantal point process with respect to the uniform measure on Λ, with kernels as follows.
[Table: for each group, the kernel K_N(x, y) and the domain Λ; the orthogonal and symplectic kernels are built from terms 2 cos(jx) cos(jy) and 2 sin(jx) sin(jy) on Λ = [0, π).]

Proposition 1 allows us to apply the following result from [11]; see also Corollary 4.2.24 of [1].

Proposition 2. Let K : Λ × Λ → C be a kernel on a locally compact Polish space Λ such that the corresponding integral operator $\mathcal{K} : L^2(\Lambda) \to L^2(\Lambda)$ is self-adjoint, nonnegative, and locally trace-class with eigenvalues in [0, 1]. For D ⊆ Λ measurable, let K_D(x, y) = ½_D(x) K(x, y) ½_D(y) be the restriction of K to D. Suppose that D is such that K_D is trace-class; denote by {λ_k}_{k∈A} the eigenvalues of the corresponding operator K_D on L^2(D) (A may be finite or countable), and denote by N_D the number of particles of the determinantal point process with kernel K which lie in D. Then
$$\mathcal{N}_D \overset{d}{=} \sum_{k \in A} \xi_k,$$
where "$\overset{d}{=}$" denotes equality in distribution and the ξ_k are independent Bernoulli random variables with $\mathbb{P}(\xi_k = 1) = \lambda_k$ and $\mathbb{P}(\xi_k = 0) = 1 - \lambda_k$.

In order to treat powers of uniform random matrices, we will make use of the following elegant result of Rains. For simplicity of exposition, we will restrict attention for now to the unitary group, and discuss in Section 5 the straightforward modifications needed to treat the other classical compact groups.
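The Bernoulli decomposition in Proposition 2 can be illustrated numerically. The sketch below (a sanity check, not part of the proofs) discretizes the U(N) eigenvalue kernel in its standard sine-kernel form, K_N(x, y) = sin(N(x−y)/2) / (2π sin((x−y)/2)), taken here as an assumption since Proposition 1 states the kernels in a different but equivalent form. Restricting the kernel to an arc, the eigenvalues of the discretized operator should lie in [0, 1], their sum gives the mean count Nθ/2π, and Σλ_k(1−λ_k) gives the (logarithmic-order) variance.

```python
import numpy as np

def cue_kernel(N, x, y):
    """Sine-kernel form of the U(N) eigenvalue correlation kernel
    (standard CUE normalization; an assumption in this sketch)."""
    d = x - y
    out = np.empty_like(d)
    diag = np.isclose(np.sin(d / 2), 0.0)      # diagonal limit: N / (2 pi)
    out[diag] = N / (2 * np.pi)
    out[~diag] = np.sin(N * d[~diag] / 2) / (2 * np.pi * np.sin(d[~diag] / 2))
    return out

N, theta, M = 20, 1.0, 800
h = theta / M
x = (np.arange(M) + 0.5) * h                   # midpoint quadrature nodes on [0, theta)
K = cue_kernel(N, x[:, None], x[None, :])
A = h * K                                      # discretized integral operator K_D

lam = np.linalg.eigvalsh((A + A.T) / 2)

# Proposition 2: N_D is a sum of independent Bernoulli(lambda_k), so
#   E N_D = sum lambda_k = tr K_D   and   Var N_D = sum lambda_k (1 - lambda_k).
mean = lam.sum()
var = (lam * (1 - lam)).sum()

assert abs(mean - N * theta / (2 * np.pi)) < 1e-8   # E N_theta = N theta / (2 pi)
assert lam.min() > -1e-3 and lam.max() < 1 + 1e-3   # operator spectrum in [0, 1]
assert 0 < var < np.log(N)                          # variance of logarithmic order
print(mean, var)
```

The trace identity tr K_D = ∫_D K(x, x) dμ(x) holds exactly under the midpoint rule here, which is why the mean check passes to high precision while the eigenvalue bounds hold only up to discretization error.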
Proposition 3 (Rains, [16]). Let m ≤ N be fixed. If ∼ denotes equality of eigenvalue distributions, then
$$U(N)^m \sim \bigoplus_{0 \le j < m} U\left(\left\lceil \frac{N - j}{m} \right\rceil\right).$$
That is, if U is a uniform N × N unitary matrix, the eigenvalues of U^m are distributed as those of m independent uniform unitary matrices of sizes ⌈N/m⌉ and ⌊N/m⌋, such that the sum of the sizes of the matrices is N.
By Proposition 3, the counting function $\mathcal{N}_\theta^{(m)}$ is equal to the sum of m independent random variables X_i, 1 ≤ i ≤ m, which count the number of eigenvalue angles of smaller-rank uniformly distributed unitary matrices which lie in the interval. Propositions 1 and 2 together imply that each X_i is equal in distribution to a sum of independent Bernoulli random variables, which completes the proof of the first claim. The inequality (3) then follows immediately from Bernstein's inequality [19, Lemma 2.7.1].
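Rains's decomposition lends itself to a direct Monte Carlo check: the arc-counting statistics of U^m for Haar-distributed U ∈ U(N) should match those of a direct sum of m independent Haar unitary matrices of sizes ⌈(N−j)/m⌉, 0 ≤ j < m. In the sketch below, the block sizes and the QR-based Haar sampler (with the column-phase correction) are the assumed ingredients:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(n):
    """Haar-distributed U(n) matrix: QR of a complex Ginibre matrix,
    with column phases fixed so the distribution is exactly Haar."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def count_in_arc(eigs, theta):
    angles = np.angle(eigs) % (2 * np.pi)
    return int(np.sum(angles < theta))

N, m, theta, trials = 12, 3, np.pi, 2000
sizes = [(N - j + m - 1) // m for j in range(m)]   # ceil((N - j)/m), summing to N
assert sum(sizes) == N

counts_pow = np.array([
    count_in_arc(np.linalg.eigvals(np.linalg.matrix_power(haar_unitary(N), m)), theta)
    for _ in range(trials)])
counts_sum = np.array([
    sum(count_in_arc(np.linalg.eigvals(haar_unitary(n)), theta) for n in sizes)
    for _ in range(trials)])

# Both counting functions should have mean N * theta / (2 pi) = 6 and similar spread.
assert abs(counts_pow.mean() - 6) < 0.2
assert abs(counts_sum.mean() - 6) < 0.2
assert abs(counts_pow.var() - counts_sum.var()) < 0.5
print(counts_pow.mean(), counts_sum.mean())
```

Since the two counting functions are equal in distribution by Proposition 3, the empirical means and variances agree up to Monte Carlo error; the tolerances above are generous several-sigma margins.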

Means and variances
In order to apply (3), it is necessary to estimate the mean and variance of the eigenvalue counting function $\mathcal{N}_\theta^{(m)}$. As in the proof of Corollary 4, this reduces by Proposition 3 to considering the case m = 1. Asymptotics for these quantities have been stated in the literature before, e.g. in [18], but not with the uniformity in θ which is needed below, so we indicate one approach to the proofs. A different approach yielding very precise asymptotics was carried out by Rains [15] for the unitary group; we use the approach outlined below because it generalizes easily to all of the other groups and cosets.
For this purpose we again make use of the fact that the eigenvalue distributions of these random matrices are determinantal point processes. For the variance estimates it is more convenient to use an alternative representation to the one stated in Proposition 1 (which is itself more convenient for verifying the hypotheses of Proposition 2 and for the mean estimates). The following result essentially summarizes [12, Section 5.4]. (Note that in the unitary case, the kernels given in Propositions 1 and 5 are not actually equal, but they generate the same process.)
Proposition 5. The nontrivial eigenvalue angles of uniformly distributed random matrices in any of SO(N), SO^-(N), U(N), Sp(2N) form a determinantal point process with respect to the uniform measure on Λ, with kernels as follows.
The following lemma is easy to check using Proposition 2. For the details of the variance expression, see [9, Appendix B].

Lemma 6. Let K : I × I → R be a continuous kernel on an interval I representing an orthogonal projection operator on L^2(μ), where μ is the uniform measure on I. For a subinterval D ⊆ I, denote by N_D the number of particles of the determinantal point process with kernel K which lie in D. Then
$$\mathbb{E}\,\mathcal{N}_D = \int_D K(x, x)\, d\mu(x)$$
and
$$\operatorname{Var} \mathcal{N}_D = \int_D \int_{D^c} |K(x, y)|^2 \, d\mu(y)\, d\mu(x).$$

Proposition 7.
(1) Let U be uniform in U(N). For θ ∈ [0, 2π), let N_θ be the number of eigenvalue angles of U in [0, θ). Then $\mathbb{E}\,\mathcal{N}_\theta = \frac{N\theta}{2\pi}$.
(2) Let U be uniform in one of SO(2N), SO^-(2N + 2), SO(2N + 1), SO^-(2N + 1), or Sp(2N). For θ ∈ [0, π), let N_θ be the number of nontrivial eigenvalue angles of U in [0, θ). Then $\mathbb{E}\,\mathcal{N}_\theta = \frac{N\theta}{\pi} + O(1)$, uniformly in θ and N.

Proof. The equality for the unitary group follows from symmetry considerations, or immediately from Proposition 5 and Lemma 6.
In that case sin(θ) ≥ 2θ/π, and the claimed bound follows by combining the estimates above. The other cases are handled similarly.
As before, we restrict attention from now on to the unitary group, deferring discussion of the other cases to Section 5. For the first integral, one uses the bound sin(x) ≥ 2x/π for x ∈ [0, π/2]; if θ ≤ 1/N, there is no need to break up the integral and one obtains the bound directly.

Proof of Corollary 9. By Proposition 3, $\mathcal{N}_\theta^{(m)}$ is equal in distribution to the total number of eigenvalue angles in [0, θ) of U_0, ..., U_{m−1}, where U_0, ..., U_{m−1} are independent and U_j is uniform in $U(\lceil (N - j)/m \rceil)$; that is, $\mathcal{N}_\theta^{(m)} \overset{d}{=} \sum_{j=0}^{m-1} \mathcal{N}_{j,\theta}$, where the N_{j,θ} are the independent counting functions corresponding to U_0, ..., U_{m−1}.
The bounds in the corollary are thus automatic from Propositions 7 and 8. (Note that the N/m in the variance bound, as opposed to the more obvious ⌈N/m⌉, follows from the concavity of the logarithm.)

Wasserstein distances
In this section we prove bounds and concentration inequalities for the spectral measures of fixed powers of uniform random unitary matrices.The method generalizes the approach taken in [5] to bound the distance of the spectral measure of the Gaussian unitary ensemble from the semicircle law.
Recall that for p ≥ 1, the L_p-Wasserstein distance between two probability measures μ and ν on C is defined by
$$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int |w - z|^p \, d\pi(w, z) \right)^{1/p},$$
where Π(μ, ν) is the set of all probability measures on C × C with marginals μ and ν.
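The proof of Theorem 11 below proceeds by coupling the ordered eigenvalue angles of U^m with the equally spaced grid points e^{2πij/N}. That coupling gives an explicitly computable upper bound on W_p, sketched here for a single Haar unitary matrix. The helper wp_upper_bound is a hypothetical illustration implementing only the coupling bound, not the exact distance, and the Haar sampler is the standard QR construction:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(n):
    """Haar-distributed U(n) matrix via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def wp_upper_bound(eigs, p):
    """Upper bound on W_p(mu_N, nu): couple the j-th smallest eigenvalue angle
    with the grid point 2 pi j / N, then add W_p(nu_N, nu) <= pi / N."""
    N = len(eigs)
    theta = np.sort(np.angle(eigs) % (2 * np.pi))
    grid = 2 * np.pi * np.arange(1, N + 1) / N
    # The chordal distance |e^{ia} - e^{ib}| is at most the circular (arc)
    # distance between the angles, so the arc distances bound the cost.
    diffs = np.abs(theta - grid)
    diffs = np.minimum(diffs, 2 * np.pi - diffs)
    return (np.mean(diffs ** p)) ** (1 / p) + np.pi / N

N, p = 100, 1
bound = wp_upper_bound(np.linalg.eigvals(haar_unitary(N)), p)
assert np.pi / N <= bound < 0.5   # typical size is O(sqrt(log N)/N), about 0.03 here
print(bound)
```

Because the eigenvalue angles of a Haar unitary are rigid, the computed bound is far smaller than the corresponding bound for N independent uniform points would typically be at large N, in line with the discussion in the introduction.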
Proof. For each 1 ≤ j ≤ N and u > 0, one bounds the probability that θ_j > 2πj/N + (4π/N)u when j + 2u < N; otherwise the inequality holds trivially. The probability that θ_j < 2πj/N − (4π/N)u is bounded in the same way. Inequality (4) now follows from Corollaries 4 and 9.
Theorem 11. Let μ_{N,m} be the spectral measure of U^m, where 1 ≤ m ≤ N and U ∈ U(N) is uniformly distributed, and let ν denote the uniform measure on S^1. Then for each p ≥ 1,
$$\mathbb{E}\,W_p(\mu_{N,m}, \nu) \le \frac{C p \sqrt{m \log(N/m + 1)}}{N},$$
where C > 0 is an absolute constant.
Proof. Let θ_j be as in Lemma 10. By Fubini's theorem, the tail bound of Lemma 10 yields an estimate on $\mathbb{E}\,|\theta_j - 2\pi j/N|^p$. Let ν_N be the measure which puts mass 1/N at each of the points e^{2πij/N}, 1 ≤ j ≤ N. The estimates above bound $\mathbb{E}\,W_p(\mu_{N,m}, \nu_N)$, and it is easy to check that W_p(ν_N, ν) ≤ π/N; the triangle inequality then combines the two bounds. Applying Stirling's formula to bound Γ(p + 1)^{1/p} completes the proof.
In the case that m = 1 and p ≤ 2, Theorem 11 could now be combined with Corollary 2.4 and Lemma 2.5 from [14] in order to obtain a sharp concentration inequality for W_p(μ_{N,1}, ν). For m > 1, however, we previously could not prove an analogous concentration inequality for W_p(μ_{N,m}, ν), because the main tool needed to carry out the approach taken in [14], namely a logarithmic Sobolev inequality on the full unitary group, was not available. The appendix to this paper contains the proof of the necessary logarithmic Sobolev inequality on the unitary group (Theorem 15); the approach to concentration taken in [14], in combination with Proposition 3, can then be carried out in the present context.
The following lemma, which generalizes part of [14, Lemma 2.3], provides the necessary Lipschitz estimates for the functions to which the concentration property will be applied.

Lemma 12. Let p ≥ 1. The map A ↦ μ_A taking an N × N normal matrix to its spectral measure is Lipschitz with constant $N^{-1/\max\{p,2\}}$ with respect to W_p. Thus if ρ is any fixed probability measure on C, the map A ↦ W_p(μ_A, ρ) is Lipschitz with constant $N^{-1/\max\{p,2\}}$.
Proof. If A and B are N × N normal matrices, then the Hoffman-Wielandt inequality [3, Theorem VI.4.1] states that
(5) $\min_{\sigma \in \Sigma_N} \sum_{j=1}^N |\lambda_j(A) - \lambda_{\sigma(j)}(B)|^2 \le \|A - B\|_{HS}^2,$
where λ_1(A), ..., λ_N(A) and λ_1(B), ..., λ_N(B) are the eigenvalues (with multiplicity, in any order) of A and B respectively, and Σ_N is the group of permutations on N letters. Defining couplings of μ_A and μ_B by matching λ_j(A) with λ_{σ(j)}(B) for σ ∈ Σ_N, the claim follows from (5).

Theorem 13. Let μ_{N,m} be the empirical spectral measure of U^m, where U ∈ U(N) is uniformly distributed and 1 ≤ m ≤ N, and let ν denote the uniform probability measure on S^1. Then for each t > 0,
$$\mathbb{P}\left[ W_p(\mu_{N,m}, \nu) \ge \frac{C p \sqrt{m \log(N/m + 1)}}{N} + t \right] \le \exp\left( -\frac{N^2 t^2}{24 m} \right)$$
for 1 ≤ p ≤ 2, and
$$\mathbb{P}\left[ W_p(\mu_{N,m}, \nu) \ge \frac{C p \sqrt{m \log(N/m + 1)}}{N} + t \right] \le \exp\left( -\frac{N^{1 + 2/p} t^2}{24 m} \right)$$
for p > 2, where C > 0 is an absolute constant.
Proof. By Proposition 3, μ_{N,m} is equal in distribution to the spectral measure of a block-diagonal random matrix with blocks U_1, ..., U_m, where the U_j are independent and uniform in $U(\lceil N/m \rceil)$ and $U(\lfloor N/m \rfloor)$. Identify μ_{N,m} with this measure and define the function
$$F(U_1, \ldots, U_m) = W_p(\mu_{N,m}, \nu).$$
Applying the concentration inequality in Corollary 17 of the appendix to the function F gives that
$$\mathbb{P}\left[ F(U_1, \ldots, U_m) \ge \mathbb{E} F(U_1, \ldots, U_m) + t \right] \le e^{-N t^2 / 24 m L^2},$$
where L is the Lipschitz constant of F, and we have used the trivial estimate $\lfloor N/m \rfloor \ge \frac{N}{2m}$. Inserting the estimate of $\mathbb{E} F(U_1, \ldots, U_m)$ from Theorem 11 and the Lipschitz estimates of Lemma 12 completes the proof.
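The Lipschitz estimate of Lemma 12 can be tested numerically: for two normal matrices, the W_2 distance between their spectral measures is attained by an optimal matching of eigenvalues, so the Hoffman-Wielandt bound can be checked directly. A small brute-force sketch (N = 6, so all permutations can be enumerated; the QR-based Haar sampler is the only assumed ingredient):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)

def haar_unitary(n):
    """Haar-distributed U(n) matrix via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def w2_spectral(A, B):
    """Exact W_2 between the spectral measures of two normal matrices: for two
    N-point uniform measures the optimal coupling is a permutation matching."""
    la, lb = np.linalg.eigvals(A), np.linalg.eigvals(B)
    n = len(la)
    best = min(sum(abs(la[i] - lb[s[i]]) ** 2 for i in range(n))
               for s in permutations(range(n)))
    return np.sqrt(best / n)

N = 6
A, B = haar_unitary(N), haar_unitary(N)
lhs = w2_spectral(A, B)
rhs = np.linalg.norm(A - B) / np.sqrt(N)   # N^{-1/2} ||A - B||_{HS}, as in Lemma 12
assert lhs <= rhs + 1e-9                   # Hoffman-Wielandt guarantees this
print(lhs, rhs)
```

The assertion holds for every pair of normal matrices, not just typical random ones, since it is exactly the statement of (5) divided through by N.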
Corollary 14. Suppose that for each N, U_N ∈ U(N) is uniformly distributed and 1 ≤ m_N ≤ N. Let ν denote the uniform measure on S^1. There is an absolute constant C such that given p ≥ 1, with probability 1, for all sufficiently large N,
$$W_p(\mu_{N,m_N}, \nu) \le \frac{C p \sqrt{m_N \log N}}{N^{\min\{1,\, 1/2 + 1/p\}}}.$$

Proof. In Theorem 13, take $t = C' \sqrt{m_N \log N}\, N^{-1/2 - 1/\max\{p,2\}}$ for a sufficiently large absolute constant C', and apply the Borel-Cantelli lemma.
We observe that Corollary 14 makes no assumption about any joint distribution of the matrices $\{U_N\}_{N \in \mathbb{N}}$; in particular, they need not be independent.
As a final note, Rains's Proposition 3 above shows that, in the case m = N, μ_{N,m} is the empirical measure of N i.i.d. uniform points on S^1. By another result of Rains [15], the same is true when m > N. In particular, in all the above results the restriction m ≤ N may be removed if m is simply replaced by min{m, N} in the conclusion.

Other groups
The approach taken above can be carried out in essentially the same way for SO(N), SO^-(N), and Sp(2N), so that all the results above hold in those cases as well, with only the precise values of the constants changed.
In [16], Rains proved that the eigenvalue distributions for these groups (or rather, components, in the case of SO^-(N)) can be decomposed similarly to the decomposition described in Proposition 3, although the decompositions are more complicated in those cases (mostly because of parity issues). The crucial fact, though, is that the decomposition is still in terms of independent copies of smaller-rank (orthogonal) groups and cosets. This allows for the representation of the eigenvalue counting function in all cases as a sum of independent Bernoulli random variables (allowing for the application of Bernstein's inequality) and as a sum of independent copies of eigenvalue counting functions for smaller-rank groups. In particular the analogue of Corollary 4 holds, and it suffices to estimate the means and variances in the case m = 1. The analogue of Proposition 8 for the other groups can be proved similarly using Proposition 5 and Lemma 6.
With those tools and Proposition 7 in hand, the analogue of Theorem 11 can be proved in the same way, with a minor twist. One can bound, as in the proof of Theorem 11, the distance between the empirical measure associated to the nontrivial eigenvalues and the uniform measure on the upper half-circle. Since the nontrivial eigenvalues occur in complex conjugate pairs and there are at most two trivial eigenvalues, one gets essentially the same bound for the distance between the empirical spectral measure and the uniform measure on the whole circle.
Finally, logarithmic Sobolev inequalities, and hence concentration results analogous to Corollary 17, are already known for the other groups via the Bakry-Émery criterion, cf. [1, Section 4.4], so the analogue of Theorem 13 follows as for the unitary group.
For the special unitary group SU(N), all the results stated above hold exactly as stated for the full unitary group, cf. the proof of [14, Lemma 2.5]. Analogous results for the full orthogonal group O(N) follow from the results for SO(N) and SO^-(N) by conditioning on the determinant, cf. the proofs of Theorem 2.6 and Corollary 2.7 in [14].

Appendix: the log-Sobolev constant of the unitary group
In this section we prove a logarithmic Sobolev inequality for the unitary group with a constant of optimal order.As a consequence, we obtain a sharp concentration inequality, independent of k, for functions of k independent unitary random matrices.
Recall the following general definitions for a metric space (X, d) equipped with a Borel probability measure μ. The entropy of a measurable function f : X → [0, ∞) with respect to μ is
$$\operatorname{Ent}_\mu(f) = \int f \log f \, d\mu - \left( \int f \, d\mu \right) \log \left( \int f \, d\mu \right).$$
For a locally Lipschitz function f : X → R, let
$$|\nabla f|(x) = \limsup_{y \to x} \frac{|f(y) - f(x)|}{d(x, y)}.$$
We say that (X, d, μ) satisfies a logarithmic Sobolev inequality (or log-Sobolev inequality for short) with constant C > 0 if, for every locally Lipschitz f : X → R,
$$\operatorname{Ent}_\mu(f^2) \le 2C \int |\nabla f|^2 \, d\mu.$$

Theorem 15. The unitary group U(N), equipped with its uniform probability measure and the Hilbert-Schmidt metric, satisfies a logarithmic Sobolev inequality with constant 6/N.
If the Riemannian structure on U(N) is the one induced by the usual Hilbert-Schmidt inner product on matrix space, then the geodesic distance is bounded above by π/2 times the Hilbert-Schmidt distance on U(N) (see e.g. [4, Lemma 3.9.1]). Thus Theorem 15 implies that U(N) equipped with the geodesic distance also satisfies a log-Sobolev inequality, with constant 3π²/2N.
It is already known that every compact Riemannian manifold, equipped with the normalized Riemannian volume measure and geodesic distance, satisfies a log-Sobolev inequality with some finite constant [17]. For applications like those in this paper to a sequence of manifolds such as $\{U(N)\}_{N=1}^\infty$, however, the order of the constant as N grows is crucial. The constant in Theorem 15 is best possible up to a constant factor; this can be seen, for example, from the fact that one can recover the sharp concentration of measure phenomenon on the sphere from Corollary 17 below.
Since the map F is √3-Lipschitz, its image U(N) with the (uniform) image measure satisfies a logarithmic Sobolev inequality with constant $(\sqrt{3})^2 \cdot \frac{2}{N} = \frac{6}{N}$.
The lack of dependence on k is a crucial feature of the inequality in Corollary 17; unlike logarithmic Sobolev inequalities, concentration inequalities themselves do not tensorize without introducing a dependence on k.
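The route from Theorem 15 to concentration inequalities such as Corollary 17 is the standard Herbst argument. The following sketch uses a generic log-Sobolev constant C; the specific constants in the body of the paper then come from tensorization and the value C = 6/⌊N/m⌋. This is the usual derivation, written out for the reader's convenience, not a quotation of the appendix.

```latex
% Herbst argument: a log-Sobolev inequality with constant C implies
% subgaussian concentration for Lipschitz functions.
% Let f be 1-Lipschitz with \int f \, d\mu = 0, and set
% H(\lambda) = \int e^{\lambda f} \, d\mu.  Applying the LSI to
% g^2 = e^{\lambda f}, and using |\nabla g|^2 \le \tfrac{\lambda^2}{4} e^{\lambda f}:
\operatorname{Ent}_\mu\bigl(e^{\lambda f}\bigr)
  = \lambda H'(\lambda) - H(\lambda)\log H(\lambda)
  \le 2C \int \bigl|\nabla e^{\lambda f/2}\bigr|^2 \, d\mu
  \le \frac{C\lambda^2}{2}\, H(\lambda).
% Dividing by \lambda^2 H(\lambda), the left-hand side becomes the derivative
% of K(\lambda) = \tfrac{1}{\lambda}\log H(\lambda), so K'(\lambda) \le C/2.
% Since K(0^+) = \int f \, d\mu = 0, integrating gives
H(\lambda) \le e^{C\lambda^2/2}.
% Markov's inequality and optimizing over \lambda > 0 then yield
\mu\bigl(f \ge t\bigr)
  \le \inf_{\lambda > 0} e^{-\lambda t + C\lambda^2/2}
  = e^{-t^2/2C}.
```

For an L-Lipschitz function one applies this to f/L, giving the tail $e^{-t^2/2CL^2}$; with C ≤ 12m/N this matches the exponent $e^{-Nt^2/24mL^2}$ appearing in the proof of Theorem 13.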

If θ ≤ 1/N, there is no need to break up the integral and one simply has a bound of π². All together, Var N_θ ≤ log(N) + 11.

Corollary 9. Let U be uniform in U(N) and 1 ≤ m ≤ N. For θ ∈ [0, 2π), let $\mathcal{N}_\theta^{(m)}$ be the number of eigenvalue angles of U^m in [0, θ). Then $\mathbb{E}\,\mathcal{N}_\theta^{(m)} = \frac{N\theta}{2\pi}$.