Wigner theorems for random matrices with dependent entries: Ensembles associated to symmetric spaces and sample covariance matrices

It is a classical result of Wigner that for an hermitian matrix with independent entries on and above the diagonal, the mean empirical eigenvalue distribution converges weakly to the semicircle law as matrix size tends to infinity. In this paper, we prove analogs of Wigner's theorem for random matrices taken from all infinitesimal versions of classical symmetric spaces. This is a class of models which contains those studied by Wigner and Dyson, along with seven others arising in condensed matter physics. Like Wigner's, our results are universal in that they only depend on certain assumptions about the moments of the matrix entries, but not on the specifics of their distributions. What is more, we allow for a certain amount of dependence among the matrix entries, in the spirit of a recent generalization of Wigner's theorem, due to Schenker and Schulz-Baldes. As a byproduct, we obtain a universality result for sample covariance matrices with dependent entries.


Introduction
Classical physics-inspired random matrix theory is chiefly concerned with probability measures on what Freeman Dyson in 1962 called the "threefold way", namely, the spaces of hermitian, real symmetric, and quaternion real matrices (or their respective exponentiated, compact versions). The rationale behind this focus is Dyson's proof that any hermitian matrix (thought of as a truncated Hamiltonian of a quantum system) that commutes with a group of unitary symmetries and "time reversals" breaks down to these three constituents ([6]), which are, in structural terms, the tangent spaces to the Riemannian Symmetric Spaces (RSS) of type A, AI and AII.
During the last decade, theoretical condensed matter physicists have pointed out that matrix descriptions of systems such as mesoscopic normal-superconducting hybrid structures are outside the scope of Dyson's theorem, and that the tangent spaces to all ten infinite series of classical RSS may (and do indeed) arise. The deeper reasons are explained in [1, Section 6.4], [2], and [10]. Some of this material is summarized in [7]. Concrete matrix realizations of this "tenfold way" of "symmetry classes" are given in Section 2 below.
The task of developing random matrix theories for the full "tenfold way", i.e., studying probability measures on all ten series of matrix spaces, has been taken up in [7], where the probability measures enjoy invariance properties that guarantee an explicit analytic expression for the joint eigenvalue density, and in [4,5], where the focus is on the compact versions of the classical RSS, endowed with their natural invariant probability measure. In the present paper, we abandon invariance properties and turn to analogs of Wigner's famous result of 1958 ([15]), stating that for a symmetric matrix with independent entries on and above the diagonal, the mean empirical eigenvalue distribution converges weakly to the semicircle law as matrix size tends to infinity. This is a universality result in the sense that it only depends on certain assumptions about the moments of the matrix entries, but not on the specifics of their distributions.
Actually, our starting point is not the classical version of Wigner's result, but a recent generalization, due to Schenker and Schulz-Baldes ([14]), which allows for a certain amount of dependence among the matrix entries. Specifically, the authors consider the following set-up: For each $n \in \mathbb{N}$ write $I_n := \{1, \ldots, n\}$ and suppose that $I_n^2 = I_n \times I_n$ comes with an equivalence relation $\sim_n$. The entries of the matrix $X_n = \left(\frac{1}{\sqrt{n}}\, a_n(p,q)\right)_{p,q=1,\ldots,n}$ are complex random variables, with $a_n(p_1,q_1), \ldots, a_n(p_j,q_j)$ independent whenever $(p_1,q_1), \ldots, (p_j,q_j)$ belong to $j$ distinct equivalence classes of the relation $\sim_n$. Furthermore, it is required that $a_n(p,q) = \overline{a_n(q,p)}$ for all $n, p, q$, so that $X_n$ is hermitian. In the case that all equivalence classes of $\sim_n$ are of the form $\{(p,q),(q,p)\}$, one is back to hermitian matrices with independent entries on and above the diagonal, i.e. to the situation of Wigner's theorem. If some equivalence classes are larger, then there is some leeway for violations of independence. In the framework of Schenker and Schulz-Baldes, the following conditions on $\sim_n$ serve as a less restrictive substitute for independence:
(W1) $\max_{p} \#\{(q,p',q') \in I_n^3 : (p,q) \sim_n (p',q')\} = o(n^2)$,
(W2) $\max_{p,q,p'} \#\{q' \in I_n : (p,q) \sim_n (p',q')\} \leq B$ for some constant $B < \infty$ that does not depend on $n$,
(W3) $\#\{(p,q,p') \in I_n^3 : (p,q) \sim_n (q,p') \text{ and } p \neq p'\} = o(n^2)$.
Apart from that, one requires that for all $n, p, q$, $a_n(p,q)$ is centered and
$$E\left(a_n(p,q)\,\overline{a_n(p,q)}\right) = 1. \tag{1}$$
Furthermore, a uniform bound on the $k$-th moments is assumed:
$$\sup_{n} \max_{p,q \in I_n} E\left(|a_n(p,q)|^k\right) < \infty \quad \text{for all } k \in \mathbb{N}. \tag{2}$$
For an hermitian matrix $M \in \mathbb{C}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$ write
$$L_n(M) := \frac{1}{n} \sum_{j=1}^{n} \delta_{\lambda_j}$$
for the empirical measure of the eigenvalues of $M$. Then the main theorem of [14] can be stated as follows:
Theorem 1.1. Under conditions (W1), (W2), (W3), (1), and (2), the measure $E(L_n(X_n))$, i.e. the mean empirical eigenvalue distribution, converges weakly to the semicircle law with density
$$\frac{1}{2\pi} \sqrt{4 - x^2}\; 1_{[-2,2]}(x). \tag{4}$$
So one ends up with the same limit distribution as in the independent case.
In fact, conditions (W1), (W2), (W3) arose from a close reading of Wigner's proof, in order to understand how much independence is really needed to arrive at the semicircle law. So the approach of Schenker and Schulz-Baldes is complementary to the route taken in a recent preprint of Anderson and Zeitouni ([3]), where a different dependence structure leads to a limit which is not a semicircle.
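As a numerical illustration of the semicircle limit in the simplest (independent) case, the following sketch computes the first six moments of the empirical eigenvalue distribution as normalized traces, $\frac{1}{n}\mathrm{Tr}(X_n^k)$, and prints them next to the semicircle moments. It is not part of the argument of [14]; the choice of $\pm 1$ entries, the matrix size 80, the seed, and all function names are assumptions made purely for illustration.

```python
import random
from math import comb

def symmetric_sign_matrix(n, rng):
    """Real symmetric matrix with independent +/-1 entries on and above the diagonal."""
    a = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p, n):
            a[p][q] = a[q][p] = rng.choice((-1, 1))
    return a

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def trace_of_product(a, b):
    """Tr(ab), computed without forming the product."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def empirical_moments(n, rng):
    """Moments k = 1..6 of L_n(X_n) for X_n = A / sqrt(n), i.e. Tr(A^k) / n^(k/2 + 1)."""
    a = symmetric_sign_matrix(n, rng)
    a2 = matmul(a, a)
    a3 = matmul(a2, a)
    traces = {
        1: sum(a[i][i] for i in range(n)),
        2: trace_of_product(a, a),
        3: trace_of_product(a2, a),
        4: trace_of_product(a2, a2),
        5: trace_of_product(a3, a2),
        6: trace_of_product(a3, a3),
    }
    return {k: t / n ** (k / 2 + 1) for k, t in traces.items()}

def catalan(k):
    return comb(2 * k, k) // (k + 1)

rng = random.Random(0)  # fixed seed, chosen arbitrarily
moments = empirical_moments(80, rng)
for k in range(1, 7):
    # Semicircle moments: Catalan numbers for even k, zero for odd k.
    target = catalan(k // 2) if k % 2 == 0 else 0
    print(f"k={k}: empirical {moments[k]:.3f}, semicircle {target}")
```

For a single sample of moderate size the agreement is only approximate; the theorem concerns the mean empirical distribution in the limit $n \to \infty$.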
To establish analogs of Theorem 1.1 for all ten symmetry classes, we proceed as follows: In Section 2 we give precise definitions of the matrix spaces in question and introduce some auxiliary notation for the combinatorics of moment calculations. In Section 3 we treat those classes for which $E L_n$ converges to the semicircle distribution and for which nothing more than a slight extension of Theorem 1.1 is needed. On the other hand, substantial work has to be done for the so-called "chiral" classes, which, despite their roots in physics, are related to sample covariance matrices and hence lead to some relative of the Marčenko-Pastur distribution as limit for $E L_n$. The main step is to rework the combinatorics of moment convergence to the Marčenko-Pastur law in the spirit of Schenker and Schulz-Baldes, yielding a universality result for sample covariance matrices with dependent entries. This is the content of Section 4. In the final Section 5, this result is applied to the chiral classes.

Background and notation
We begin by listing the ten "symmetry classes", i.e. series of matrix spaces, from which our matrices are taken. In structural terms, these spaces are of the form ig, where g is the Lie algebra of a compact classical group, or ip, where p is the −1 eigenspace of a Cartan involution of g (see [7] for details). The i factor is to make sure that the matrices are hermitian. The labels A, AI, etc. in the list are those of Lie theory, but no Lie theoretic properties will be needed in what follows. X * denotes the conjugate transpose of a matrix X.
Class A: $M^{A}_n = \{X \in \mathbb{C}^{n \times n} : X \text{ hermitian}\}$
Class AI: $M^{AI}_n = \{X \in \mathbb{R}^{n \times n} : X \text{ symmetric}\}$
Class AII:
Class AIII:
Class B/D:
Class BDI:
Class DIII: : $X_i \in (i\mathbb{R})^{n \times n}$ skew symmetric
Class C:
Class CI:
Class CII:
where the space $\mathbb{H}^{s \times t}$ of quaternionic matrices is embedded into $\mathbb{C}^{2s \times 2t}$ as Generically, we write $C$ for any label A, AI,..., CII. We have $M^{C}_n \subset \mathbb{C}^{\delta n \times \delta n}$ with $\delta = \delta_C = 2$ if $C$ = AII, DIII, C, CI, CII and $\delta = 1$ otherwise. Classes AIII, BDI and CII, the "chiral" classes in physics terminology (since they are related to Dirac operators, see [8], [10]), are special in that the shape of the subblocks depends on an extra parameter $s$, and it is clear that one will have to control the relative growth of $s = s(n)$ and $t(n) = n - s(n)$ as $n \to \infty$. In fact, their large $n$ behaviour is quite different from that of the other classes.
Before getting down to business, let us review some combinatorial notation and facts that will be useful later on. Write $P(n)$ for the set of all partitions of $I_n = \{1, 2, \ldots, n\}$. If $p \in P(n)$ has $r$ blocks, write $|p| = r$. Define $P^{(i)}(n) := \{p \in P(n) : |p| = i\}$. If each of the blocks of $p$ consists of exactly two elements, we say that $p$ is a pair partition and write $p \in P_2(n)$. For $p \in P(n)$, write $\sim_p\ \subseteq I_n \times I_n$ for the corresponding equivalence relation. Given $p, q \in P(n)$, define $\sim_p \vee \sim_q\ \subseteq I_n \times I_n$ as the smallest equivalence relation which contains $\sim_p$ and $\sim_q$. The partition corresponding to $\sim_p \vee \sim_q$ is denoted by $p \vee q$. A partition $p \in P(n)$ is called crossing if there exist $a < b < c < d$ in $I_n$ such that $a \sim_p c$, $b \sim_p d$, and $a \not\sim_p b$. Otherwise, it is called noncrossing. Write $NC(n)$, $NC^{(i)}(n)$ and $NC_2(n)$ for the set of noncrossing partitions, noncrossing partitions with $i$ blocks, and noncrossing pair partitions of $I_n$, respectively. For sets $\Omega, \Omega'$ write $F(\Omega, \Omega') := \{\varphi : \Omega \to \Omega'\}$ and $F(k, n) := F(I_k, I_n)$. For $p \in P(k)$ write $F(p, \Omega)$ for the set of all $\varphi \in F(I_k, \Omega)$ which are constant on the blocks of $p$. Finally, let us give names to two important special pair partitions: It is well-known that $\# NC_2(2k)$ equals the $k$th Catalan number $C_k = \frac{1}{k+1}\binom{2k}{k}$, which in turn equals the $2k$th moment of the semicircular distribution with density given in (4) (whose odd moments vanish). On the other hand, setting for $\kappa > 0$ $(m_k)_{k \in \mathbb{N}}$ is the sequence of moments of the Marčenko-Pastur distribution with density A reference for these facts is [9] or [11].
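The identity between the number of noncrossing pair partitions of $I_{2k}$ and the $k$th Catalan number can be checked by brute force for small $k$. The following sketch enumerates all pair partitions, discards the crossing ones, and compares the count with the closed formula for the Catalan numbers; the function names are ours and serve only as illustration.

```python
from math import comb

def pair_partitions(elements):
    """Yield all pair partitions of a sorted list with an even number of elements."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail

def is_crossing(pairs):
    """Crossing means: there are blocks {a, c} and {b, d} with a < b < c < d."""
    return any(a < b < c < d for (a, c) in pairs for (b, d) in pairs)

def count_noncrossing_pair_partitions(k):
    elements = list(range(1, 2 * k + 1))
    return sum(1 for p in pair_partitions(elements) if not is_crossing(p))

def catalan(k):
    return comb(2 * k, k) // (k + 1)

counts = [count_noncrossing_pair_partitions(k) for k in range(1, 6)]
for k, count in enumerate(counts, start=1):
    print(f"k={k}: #NC_2(2k) = {count}, C_k = {catalan(k)}")
```

The enumeration visits all $(2k-1)!!$ pair partitions, so it is only practical for small $k$.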

Wigner-Dyson and Bogolioubov-de Gennes classes
These are the easy cases, because they basically reduce to Theorem 1.1. In fact, one may interpret the symmetries of a matrix from $M^{C}_n$ as an equivalence relation on pairs of indices, recalling that in the set-up of Theorem 1.1, if index pairs $(p, q)$, $(p', q')$ are equivalent, then the corresponding random variables $a_n(p, q)$, $a_n(p', q')$ may be identical. Of course, for $C$ = AII, DIII, C, CI, CII, the symmetries of $M^{C}_n$ must be realized as equivalence relations on $I_{2n} \times I_{2n}$, so one obtains the desired theorem on random elements of $M^{C}_n$ by passage to a subsequence in Theorem 1.1. A more serious caveat is the following: Some of the blocks which make up matrices from $M^{C}_n$ are skew symmetric, so their diagonal elements vanish, contradicting condition (1). While we will see that this problem can be circumvented in the cases at hand, the full blocks of zeroes in the chiral cases make it impossible to apply Theorem 1.1 for them as well.
To make the set-up for this section precise, let $C$ be a Wigner-Dyson or Bogolioubov-de Gennes (BdG) class, write $\delta = \delta_C$ as in Section 2, and set $J^{C}_{\delta n} := \{(p, q) \in I^2_{\delta n} : \mathrm{pr}_{p,q}(M^{C}_n) \neq 0\}$, where $\mathrm{pr}_{p,q}$ projects each element of $M^{C}_n$ onto its $(p, q)$-entry. Consider an equivalence relation $\sim_{\delta n}$ on $I^2_{\delta n}$ and a random matrix $X_{\delta n} = \left(\frac{1}{\sqrt{\delta n}}\, a_{\delta n}(p, q)\right)_{p,q=1,\ldots,\delta n}$ such that the centered complex random variables $a_{\delta n}(p_1, q_1), \ldots, a_{\delta n}(p_j, q_j)$ are independent whenever $(p_1, q_1), \ldots, (p_j, q_j)$ belong to $j$ distinct equivalence classes of the relation $\sim_{\delta n}$. Assume that conditions (W1), (W2), (W3) hold, with $I_{\delta n}$ in the place of $I_n$, and that the moment condition (2) is satisfied. As to (1), it is required that it holds for all $(p, q) \in J^{C}_{\delta n}$. All realizations of the matrix $X_{\delta n}$ are supposed to be elements of $M^{C}_n$. It is straightforward to verify that this assumption is compatible with conditions (W1), (W2), (W3). So we may take the symmetries of the matrices for granted, and have some leeway for extra dependence between the matrix entries. Under these conditions, the following holds:
Theorem 3.1. If $C$ is a Wigner-Dyson or Bogolioubov-de Gennes class, $E(L_{\delta n}(X_{\delta n}))$ converges weakly to the semicircle law.
It only remains to address the complication that random elements of $M_n$ may have up to $4n$ entries which are identically zero. To see that the effect of this complication is asymptotically negligible, recall from the proof of Theorem 1.1 in [14] that the $k$-th moment of $E L_n$ vanishes if $k$ is odd and is asymptotically equivalent to $\frac{1}{n^{l+1}} \sum_{p \in NC_2(2l)} \# S_n(p)$ if $k = 2l$ is even. Here the set $S_n(p)$ consists of all pairs $(\varphi, \psi)$, $\varphi, \psi \in F(k, n)$, with the following properties: (i) $\psi(j) = \varphi(j + 1)$ for all $j \in I_k$, where $k + 1$ is cyclically identified with 1.
For $n \in \mathbb{N}$ fix $E_n \subset I_n^2$ with $\#E_n = o(n^2)$. Actually, what we have in mind is that $E_n$ contains the $O(n)$ diagonal places of skew blocks. For $\nu \in I_k$ set Then the following lemma makes it possible to neglect the effect of the zero entries on the diagonals of the blocks: n (p), it suffices to show that for all $\nu$ one has $\#S^{(\nu)}_n(p) = o(n^{l+1})$. To this end, we construct an element $(\varphi, \psi) \in S^{(\nu)}_n(p)$, starting with $(\varphi(\nu), \psi(\nu))$ and proceeding cyclically. By cyclically permuting the index set, we may assume that $\nu = 1$. For $(\varphi(1), \psi(1))$ we have $o(n^2)$ choices. $\varphi(2)$ is then fixed by property (i). If $1 \sim_p 2$, then $\psi(2)$ is fixed by (ii). Otherwise, we have at most $n$ choices. Similarly, for $j \geq 2$, once we have chosen $\psi(j - 1)$, $\varphi(j)$ is fixed, and $\psi(j)$ is either fixed or we have at most $n$ choices for it, depending on whether $i \sim_p j$ for some $1 \leq i < j$ or not. The latter case occurs $l - 1$ times. So $\#S^{(\nu)}_n(p) \leq o(n^2) \cdot n^{l-1} = o(n^{l+1})$.

Sample covariance matrices
In this section we prove a limit theorem for the mean empirical eigenvalue distribution of sample covariance matrices with dependence. By these we understand matrices of the form $X^* X$, where $X$ admits a certain amount of dependence among its entries, $X^*$ is the conjugate transpose of $X$, and the entries of $X$ are not necessarily Gaussian, but subject to certain conditions on their moments. In our context, this is preparatory work for the study of the chiral classes in Section 5, but it is of interest for its own sake. In the case that $X$ has independent entries, the result is well-known, and we will take the combinatorial proof of Oravecz and Petz ([13]) as starting point for an analysis in the spirit of Schenker and Schulz-Baldes ([14]).
Theorem 4.1. As $n \to \infty$, $E(L_n(X_n^* X_n))$ converges weakly to a probability measure with $k$-th moment equal to If $\mu = 1$, this limit is the Marčenko-Pastur distribution.
The following lemma is a straightforward adaptation of a key step of [14] to the present context.
Proof. Suppose that $p$ contains a nearest neighbour pair, i.e. a block of the form $\{\nu, \nu + 1\}$. Assume that $\nu$ is odd.
Since we may argue analogously for $\nu$ even, we have shown that Since $p$ was assumed to be crossing, iterating this argument yields where $p'$ contains no nearest neighbour pair. Upon relabelling, we may regard $p'$ as an element of $P_2(2(k - r))$, where $k - r \geq 2$. Note that we may have $p = p'$, whence it is possible that $r = 0$.
The following is evident: For $p \in NC_2(2k)$, each of the blocks of $p$ consists of exactly one odd and exactly one even number.
Proof. We may identify $P(k)$ with $\{p \in P(2k) : \text{any block of } p \text{ is the union of blocks of } m\}$. So $p \mapsto p \vee m$ maps $P(2k)$ onto $P(k)$. It is easy to see that if $p \vee m$ is crossing, so is $p$. In fact $NC_2(2k)$ is mapped bijectively onto $NC(k)$. To see this, it suffices to show that $p \mapsto p \vee m$ is injective on $NC_2(2k)$, as it is known that $\# NC(k) = \# NC_2(2k)$ (see [12, Remark 9.5]). A block of $p \vee m$ is of the form $b_J = \bigcup_{\nu \in J} \{2\nu - 1, 2\nu\}$ for some $J \subseteq I_k$. We have to show that there exists precisely one $\tilde{p} \in NC_2(b_J)$ such that $\tilde{p} \vee \{\{2\nu - 1, 2\nu\} : \nu \in J\} = \{b_J\}$. Since our aim is to show that a pairing of the elements of $b_J$ with certain properties is uniquely determined, the embedding of $b_J$ into $I_{2k}$ is irrelevant, and we may assume that $b_J = I_{2r}$ for some $r \leq k$. Let us start by finding a partner for 1. By Lemma 4.7, the partner must be even, $1 \sim_{\tilde{p}} 2s$, say. Assume that $s < r$. Since we wish to construct a noncrossing $\tilde{p}$, no $1 < \nu < 2s$ can be paired with any $\nu' > 2s$, so $\{1, \ldots, 2s\}$ is a union of blocks of $\tilde{p}$. On the other hand, $2s \not\sim_m 2s + 1$, so $\{1, \ldots, 2s\}$ is also a union of blocks of $m$. Hence $2s$ and $2s + 1$ lie in distinct blocks of $\tilde{p} \vee m$, contradicting the requirement that $\tilde{p} \vee m = \{I_{2r}\}$. Consequently, we must have $1 \sim_{\tilde{p}} 2r$.
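The bijection $p \mapsto p \vee m$ from $NC_2(2k)$ onto $NC(k)$ can also be verified by machine for small $k$. The sketch below takes $m$ to be the pair partition with blocks $\{2\nu - 1, 2\nu\}$, as used in the proof above, forms the join with a small union-find structure, and identifies each block $\{2\nu - 1, 2\nu\}$ with the point $\nu \in I_k$. All function names are assumptions made for illustration; this is a finite check and does not replace the proof.

```python
from itertools import combinations
from math import comb

def pair_partitions(elements):
    """Yield all pair partitions of a sorted list with an even number of elements."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        for tail in pair_partitions(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

def is_crossing(blocks):
    """True if a < b < c < d exist with a, c in one block and b, d in another."""
    label = {x: i for i, block in enumerate(blocks) for x in block}
    for a, b, c, d in combinations(sorted(label), 4):
        if label[a] == label[c] and label[b] == label[d] and label[a] != label[b]:
            return True
    return False

def join_with_m(pairs, k):
    """Blocks of p v m, read as a partition of I_k via {2v - 1, 2v} -> v."""
    parent = list(range(2 * k + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in pairs:                      # blocks of p
        parent[find(x)] = find(y)
    for nu in range(1, k + 1):              # blocks of m
        parent[find(2 * nu - 1)] = find(2 * nu)
    blocks = {}
    for nu in range(1, k + 1):
        blocks.setdefault(find(2 * nu), set()).add(nu)
    return frozenset(frozenset(b) for b in blocks.values())

def check_bijection(k):
    elements = list(range(1, 2 * k + 1))
    nc_pairs = [p for p in pair_partitions(elements) if not is_crossing(p)]
    images = {join_with_m(p, k) for p in nc_pairs}
    injective = len(images) == len(nc_pairs)
    images_noncrossing = all(not is_crossing(image) for image in images)
    return len(nc_pairs), injective, images_noncrossing

results = {k: check_bijection(k) for k in range(1, 5)}
for k, (count, injective, images_noncrossing) in results.items():
    print(k, count, injective, images_noncrossing)
```

Since the images are pairwise distinct noncrossing partitions of $I_k$ and their number equals $\# NC(k)$, the map is onto $NC(k)$ in the cases checked.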
By Lemmata 4.9 and 4.3 we may replace $E\Sigma_n(p)$ by $E\Sigma^{\vee}_n(p)$ in (15). Recall that for $(\varphi, \psi) \in S^{\vee}_n(p)$, $\varphi$ and $\psi$ are a fortiori constant on the blocks of $p$. Comparing Lemma 4.7 with (10), one sees that this implies that given $(\varphi, \psi) \in S^{\vee}_n(p)$, a block of $p$ corresponds to a matrix element and its complex conjugate. Invoking (1) and Lemma 4.6, we see that (15) is asymptotically equivalent to which tends, as $n \to \infty$, to where the last equality follows from Lemma 4.8.

The chiral classes
In this section we apply our result about sample covariance matrices to random elements of the spaces $M^{C}_n$ from Section 2, where $C$ = BDI, AIII or CII. It is convenient to consider the AIII case first. It consists of matrices of the form
$$\begin{pmatrix} 0 & X_n \\ X_n^* & 0 \end{pmatrix}$$
with $X_n \in \mathbb{C}^{s(n) \times t(n)}$, hence $s(n) + t(n) = n$. We assume that $\lim_{n \to \infty} \frac{s(n)}{n} = \kappa \in\ ]0, 1[$, hence $\lim_{n \to \infty} \frac{t(n)}{n} = 1 - \kappa =: \mu$. Note that this framework is more restrictive than the one considered in Section 4. But these restrictions naturally arise if one considers $X_n$ as a subblock of an element of $M^{AIII}_n$, whence $n$ is the natural parameter for asymptotics. Observe that
$$\mathrm{Tr} \begin{pmatrix} 0 & X_n \\ X_n^* & 0 \end{pmatrix}^{k} = \begin{cases} 0 & \text{if } k \text{ is odd}, \\ 2\, \mathrm{Tr}\left((X_n^* X_n)^l\right) & \text{if } k = 2l \text{ is even}. \end{cases}$$
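The trace identity above is what links the chiral classes to sample covariance matrices, and it can be checked exactly on a small example. The following sketch builds the block matrix from a $3 \times 4$ matrix $X$ with Gaussian integer entries (so that all arithmetic is exact) and compares both sides for $k = 1, \ldots, 6$. The sizes, the entry distribution, the seed, and the helper names are arbitrary choices made for illustration.

```python
import random

def conjugate_transpose(x):
    return [[x[i][j].conjugate() for i in range(len(x))] for j in range(len(x[0]))]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]

def matrix_power(a, k):
    """a^k for an integer k >= 1."""
    result = a
    for _ in range(k - 1):
        result = matmul(result, a)
    return result

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def chiral_block_matrix(x):
    """Assemble the (s + t) x (s + t) matrix [[0, X], [X*, 0]] from an s x t matrix X."""
    s, t = len(x), len(x[0])
    top = [[0] * s + list(row) for row in x]
    bottom = [list(row) + [0] * t for row in conjugate_transpose(x)]
    return top + bottom

rng = random.Random(1)  # fixed seed, chosen arbitrarily
s, t = 3, 4
x = [[complex(rng.randint(-2, 2), rng.randint(-2, 2)) for _ in range(t)] for _ in range(s)]
block = chiral_block_matrix(x)
gram = matmul(conjugate_transpose(x), x)  # X* X, a t x t matrix

differences = []
for k in range(1, 7):
    lhs = trace(matrix_power(block, k))
    rhs = 2 * trace(matrix_power(gram, k // 2)) if k % 2 == 0 else 0
    differences.append(abs(lhs - rhs))
    print(k, lhs, rhs)
```

The identity reflects that the square of the block matrix is block diagonal with blocks $X X^*$ and $X^* X$, whose powers have equal traces.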
The special case where the entries of X n take purely imaginary values yields the same limit for class BDI. Since the extra symmetries of the CII case are compatible with (MP1), (MP2) and (MP3), we obtain the same limit for this class, too.
(19) Note that (18) differs from the corresponding formula in [7], since a different definition of $L_n$ is used in that paper, and since (1) above imposes a condition on complex rather than real variances. Now, the elegant approach of Haagerup and Thorbjørnsen ([9]) to the moments of the Marčenko-Pastur distribution can be easily adapted to $\mu_{ch,2}$, to the effect that the $l$-th moment of $\mu_{ch,2}$ is indeed given by (17). In fact, for $l \geq 1$, Setting $g(\theta) = \left(\sqrt{\kappa} + \sqrt{1 - \kappa}\, e^{i\theta}\right)^{l-1}$ and observing that $\sin^2 \theta = \frac{1}{2}(1 - \cos 2\theta)$, we see that (20) can be written as κ(1 − κ) π with $h(\theta) = e^{i\theta} g(\theta)$, $k(\theta) = e^{-i\theta} g(\theta)$. Invoking Parseval's formula and elementary computations with binomial coefficients, we obtain x l µ ch,2 (dx) = 2 A reference for the last equality is [12, Cor. 9.13]. In view of (16), we have
Theorem 5.1. If $C$ is a chiral class, then $E(L_{\delta n}(X_{\delta n}))$ converges to a probability measure $\mu_{ch}$ on $\mathbb{R}$ given by with $a, b$ as in (19) above.