Central limit theorem for the complex eigenvalues of Gaussian random matrices

We establish a central limit theorem for the eigenvalue counting function of a matrix of real Gaussian random variables.

1. Introduction

1.1. Main Result. This note proves a central limit theorem (CLT) for the eigenvalue counting function of a matrix of real Gaussian random variables in regions of the complex plane. While such a result is well known for matrices of complex Gaussians (see [4, Section 3.1] for a survey), to the best of our knowledge, the analogous statement for real Gaussian matrices has not previously been addressed.
We begin by defining the random matrix ensemble of interest in this work.
Definition 1.1. For all $N \in \mathbb{N}$, let $G_N = (g_{ij})_{1 \le i,j \le N}$ be a random matrix whose entries are mutually independent Gaussian random variables with mean zero and variance one. We call $G_N$ the real Ginibre matrix (GinOE) of dimension $N$. We also denote $W_N = N^{-1/2} G_N$.
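As an illustration of Definition 1.1, the following minimal sketch samples the rescaled matrix and computes its eigenvalues numerically. The function name `sample_scaled_ginibre`, the dimension 200, and the seed are illustrative assumptions and not part of the paper.

```python
import numpy as np


def sample_scaled_ginibre(n, rng):
    """Return an n x n matrix N^{-1/2} G_N with i.i.d. standard real Gaussian entries.

    Hypothetical helper for illustration; the name is not taken from the paper.
    """
    g = rng.standard_normal((n, n))  # entries have mean zero and variance one
    return g / np.sqrt(n)


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
w = sample_scaled_ginibre(200, rng)

# The matrix is real but not symmetric, so its eigenvalues are complex in general.
eigenvalues = np.linalg.eigvals(w)
spectral_radius = np.abs(eigenvalues).max()
```

Because the matrix is real, the computed spectrum is symmetric under complex conjugation, and for moderately large dimension the spectral radius is typically close to one.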
In the limit as $N$ goes to infinity, it is known that the empirical spectral distribution of $W_N$ tends to the uniform measure on the unit disk $\mathbb{D} = \{z \in \mathbb{C} : |z| < 1\}$ [2]. We note that the eigenvalues of $W_N$ come in conjugate pairs, since $W_N$ is real; if $\lambda \in \mathbb{C}$ is an eigenvalue, then so is $\bar{\lambda}$. It is therefore natural when studying the fluctuations of the eigenvalues of $W_N$ to restrict attention to the upper half disk $\mathbb{D}_+ = \{z \in \mathbb{C} : |z| < 1, \operatorname{Im} z > 0\}$. We recall that a domain is defined as a non-empty connected open subset of $\mathbb{C}$.

Definition 1.2. We say that a domain $A$ is admissible if $\overline{A} \subset \mathbb{D}_+$.

This condition is slightly stronger than requiring $A \subset \mathbb{D}_+$, since it enforces a separation between $A$ and the boundary of $\mathbb{D}_+$. We also recall that a domain is said to be Lipschitz if its boundary is locally the graph of a Lipschitz continuous function; see [25, Definition 12.9]. Given an admissible Lipschitz domain $A$, we let $\ell(\partial A)$ denote the length of its boundary.
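Denote the eigenvalues of $W_N$ by $\lambda_1, \dots, \lambda_N$, in an arbitrary order. Given an admissible domain $A$, we define $f_A : \mathbb{C} \to \mathbb{R}$ by $f_A(z) = \mathbb{1}_A(z)$, and define the ($N$-dependent) random variable

$$X_A = \sum_{i=1}^{N} f_A(\lambda_i) - \mathbb{E}\Big[\sum_{i=1}^{N} f_A(\lambda_i)\Big]. \tag{1}$$

The following theorem is our main result. We let $\mathcal{N}(0, c)$ denote a Gaussian random variable with mean zero and variance $c > 0$.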
Theorem 1.3. Let $A$ be an admissible Lipschitz domain. Then we have the weak convergence

$$\lim_{N \to \infty} N^{-1/4} X_A = \mathcal{N}\big(0,\, 2^{-1} \pi^{-3/2} \ell(\partial A)\big). \tag{2}$$

The variance of $X_A$ is of order $N^{1/2}$, which is smaller than the variance of order $N$ seen in sums of independent random variables. This is due to the strong correlations between the eigenvalues of $W_N$ [28]. Further, the variance of the Gaussian in (2) is identical to the one in the analogous theorem for complex Gaussian matrices [4, (3.9)].
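The scaling in Theorem 1.3 can be explored numerically. The sketch below is a rough Monte Carlo experiment rather than a verification: the region (a disk of radius 0.3 centered at 0.5i, which is admissible), the dimension, the number of trials, and all function names are assumptions made for illustration. The theorem is asymptotic, so only loose agreement should be expected at this matrix size.

```python
import numpy as np


def count_in_disk(eigs, center, radius):
    """Number of eigenvalues in the open disk |z - center| < radius."""
    return int(np.count_nonzero(np.abs(eigs - center) < radius))


def simulate_counts(n, center, radius, trials, seed=0):
    """Sample `trials` independent rescaled real Gaussian matrices and count eigenvalues in the disk."""
    rng = np.random.default_rng(seed)
    counts = np.empty(trials)
    for t in range(trials):
        w = rng.standard_normal((n, n)) / np.sqrt(n)
        counts[t] = count_in_disk(np.linalg.eigvals(w), center, radius)
    return counts


# Hypothetical admissible region: a disk whose closure lies in the upper half disk.
n, center, radius = 200, 0.5j, 0.3
counts = simulate_counts(n, center, radius, trials=40)

# Leading-order mean from the uniform limiting density 1/pi: N * area(A) / pi.
predicted_mean = n * radius**2
# Variance scale suggested by the theorem: N^{1/2} * length(boundary) / (2 * pi^{3/2}).
predicted_var = np.sqrt(n) * (2 * np.pi * radius) / (2 * np.pi**1.5)
```

Comparing `counts.mean()` with `predicted_mean` and `counts.var()` with `predicted_var` gives a quick sanity check. The sample variance should be far below the mean, reflecting the order $N^{1/2}$ (rather than order $N$) fluctuations.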
1.2. Background. The analogue of Theorem 1.3 for a complex Ginibre matrix (GinUE) is known. It is a consequence of a theorem that provides a CLT for a broad class of determinantal point processes proved in [35, Section 2] (see also [11]), together with the explicit computation of the asymptotic variance in [26, Corollary 1.2.1]. See [6, Corollary 1.7] for an alternative proof in the case where $A$ has a smooth boundary. Further, a local CLT for the counting function of the GinUE eigenvalues was derived in [17].
All of these works crucially rely on the fact that the eigenvalues of the GinUE form a determinantal point process.While this determinantal structure enables a precise analysis of many aspects of the GinUE, it is absent in the GinOE.Instead, the eigenvalues of the GinOE form a Pfaffian point process, and consequently they are more difficult to study [5].
Previous work on linear statistics of the GinOE has considered smooth test functions of the complex eigenvalues [23, 30], differentiable functions of the real eigenvalues [15, 23, 31], general functions of the real eigenvalues [15], and the number of real eigenvalues [12, 13, 15, 16, 22, 31]. There have also been a few recent articles proving CLTs for linear statistics of matrices of general i.i.d. random variables when the test function has at least two derivatives [7, 8, 9]. Proving a CLT for the eigenvalue counting function in this more general setting remains an open problem.
1.3. Outline. In Section 2, we collect several preliminary lemmas, and show that the Pfaffian correlations of the GinOE eigenvalues may be quantitatively approximated by determinantal correlations. In Section 3, we compute the variance and higher cumulants of $X_A$, and show that they match those of the desired Gaussian distribution, concluding the proof of Theorem 1.3. Using the results of [26], it is straightforward to extend Theorem 1.3 to all domains with finite perimeter (so-called Caccioppoli sets) and certain domains with fractal boundaries. We briefly discuss this point in Remark 3.8.

1.4. Acknowledgments. The authors thank P. Bourgade for suggesting the problem, C. D. Sinclair for helpful comments on the references [3, 32], and P. J. Forrester for comments on a preliminary draft. They are grateful to the referees for several useful suggestions. P. L. was supported by the NSF postdoctoral fellowship DMS-2202891. X. X. was supported by NSF grant DMS-1954351.

2. Preliminary Results
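Set $\mathbb{C}^* = \mathbb{C} \setminus \mathbb{R}$. We recall that for all $k \in \mathbb{N}$, the complex-complex correlation functions $\rho_k^{(N)} : (\mathbb{C}^*)^k \to \mathbb{R}$ for $G_N$ are defined by the following property [3, (5.1)]. For every compactly supported, bounded Borel-measurable function $f : (\mathbb{C}^*)^k \to \mathbb{R}$, we have

$$\mathbb{E}\Big[\sum_{(i_1, \dots, i_k) \in I_k} f(w_{i_1}, \dots, w_{i_k})\Big] = \int_{(\mathbb{C}^*)^k} f(z_1, \dots, z_k)\, \rho_k^{(N)}(z_1, \dots, z_k)\, dz_1 \cdots dz_k, \tag{3}$$

where $I_k \subset \{1, \dots, N\}^k$ is the set of pairwise distinct $k$-tuples of indices, $\{w_i\}_{i=1}^N$ are the eigenvalues of $G_N$, and we use $dz_i$ to denote the Lebesgue measure on $\mathbb{C}$. We typically write $\rho_k$ instead of $\rho_k^{(N)}$, since the value of $N$ will be clear from context. We also recall that if $M = (M_{ij})_{i,j=1}^{2n}$ is a $2n \times 2n$ skew-symmetric matrix, its Pfaffian is defined as

$$\operatorname{Pf}(M) = \frac{1}{2^n n!} \sum_{\sigma \in S_{2n}} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} M_{\sigma(2i-1), \sigma(2i)}, \tag{4}$$

where $S_{2n}$ is the symmetric group of degree $2n$.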
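For small matrices, the permutation-sum definition of the Pfaffian can be evaluated directly. The sketch below does so by brute force over all permutations, so it is only practical for very small sizes; the function names are illustrative assumptions. It can be compared against the classical identity that the square of the Pfaffian of a skew-symmetric matrix equals its determinant.

```python
import itertools
import math

import numpy as np


def permutation_sign(perm):
    """Sign of a permutation given as a tuple, computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def pfaffian(m):
    """Pfaffian of a 2n x 2n skew-symmetric matrix via the sum over all permutations.

    Cost grows like (2n)!, so this is meant only as a check for tiny matrices.
    """
    m = np.asarray(m, dtype=float)
    size = m.shape[0]
    if m.shape != (size, size) or size % 2 != 0:
        raise ValueError("expected a square matrix of even dimension")
    n = size // 2
    total = 0.0
    for perm in itertools.permutations(range(size)):
        term = float(permutation_sign(perm))
        for i in range(n):
            # Zero-based indices: pairs (perm[0], perm[1]), (perm[2], perm[3]), ...
            term *= m[perm[2 * i], perm[2 * i + 1]]
        total += term
    return total / (2**n * math.factorial(n))
```

For a 4 x 4 skew-symmetric matrix this reduces to the familiar expression $M_{12} M_{34} - M_{13} M_{24} + M_{14} M_{23}$.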
The following lemma, taken from [29, Appendix B.3], identifies the correlation functions ρ k explicitly.
Lemma 2.1. The $k$-point complex-complex correlation functions of the $N$-dimensional real Ginibre ensemble $G_N$ are given by

$$\rho_k(z_1, \dots, z_k) = \operatorname{Pf}\big(K(z_i, z_j)\big)_{1 \le i,j \le k}, \tag{5}$$

where $(K(z_i, z_j))_{1 \le i,j \le k}$ is a $2k \times 2k$ matrix composed of the $2 \times 2$ blocks and $D_N$, $I_N$, and $S_N$ are defined by where $z, w \in \mathbb{C}^*$ and

Remark 2.2. The functions $\rho_k$ were first determined explicitly in [19]. The Pfaffian form in (5) was derived in the case of even $N$ in [3]. Subsequently, a variety of methods have been used to recover this form for all $N$ [18, 32, 33] (see also [29, Section 4.6]).
By a change of variable it is straightforward to see that the $k$-th correlation function for the complex eigenvalues of $W_N$ is $N^k \rho_k(\sqrt{N} z_1, \dots, \sqrt{N} z_k)$. The following lemma is useful for controlling these functions and is proved in Section 4. We let $d_A = \inf\{|z - w| : z \in A, w \in \partial \mathbb{D}_+\}$ denote the distance between $A$ and the boundary of $\mathbb{D}_+$, and use the standard "big O" notation $O(\cdot)$ for estimates that hold in the limit $N \to \infty$.
Lemma 2.3. Let $A$ be an admissible domain. Then there exists a constant $c(d_A) > 0$ such that

where the implicit constants in the asymptotic notation depend only on $d_A$.
We next state a useful lemma about Pfaffians, proved in [21, Appendix B].
Lemma 2.4. Let $M = (M_{ij})_{i,j=1}^{2n}$ be a skew-symmetric $2n \times 2n$ matrix such that $M_{ij} = 0$ when $i \equiv j \bmod 2$. Let $\widetilde{M} = (\widetilde{M}_{ij})_{i,j=1}^{n}$ be the $n \times n$ matrix formed by setting

Finally, we require the following integral formula from [26, Corollary 3.1.4].

Lemma 2.5. Let $J : \mathbb{C} \to \mathbb{R}$ be radially symmetric (meaning $J(z) = J(|z|)$) and nonnegative. Suppose further that $\int_{\mathbb{C}} J(z) \cdot |z| \, dz = 1$. Then for any admissible Lipschitz region $A$,

3. Proof of Theorem 1.3

3.1. Variance Calculation. The following lemma follows from results proved in [23]. We sketch the proof for completeness.
Lemma 3.1.For any admissible domain A, where the implicit constant in the asymptotic notation depends only on d A .
Proof. From the definition (1) of $X_A$, we compute

Writing this expression in terms of correlation functions using (3) and (5), we obtain

The last term in (7) vanishes exponentially, by Lemma 2.3. The first term is computed in [23, Lemma 7] and equals

The second term is computed in the proof of [23, Lemma 9] and equals

Inserting (8) and (9) into (7) completes the proof.³ We observe that the asymptotic bounds in the proofs of the cited lemmas rely only on Lemma 2.3 and the estimates Lemma 4.1 and (20) stated below, whose error terms depend on $A$ only through $d_A$. This justifies the claim that the implicit constant in (6) depends only on $d_A$, even though this dependence was not made explicit in [23].

³ While the main results of [23] require the test function to be smooth, these calculations do not. Further, they were given for even $N$ in [23] using the statement of (5) for even $N$ in [3]. Their extension to odd $N$ requires only notational changes, given that (5) is now known for all $N$.

Lemma 3.2. For any admissible Lipschitz domain $A$, lim
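3.2. Higher Cumulants. We recall that given a random variable $X$, its cumulants $(\kappa_n(X))_{n \ge 1}$ are defined by the formal expansion

$$\log \mathbb{E}\big[e^{itX}\big] = \sum_{n=1}^{\infty} \kappa_n(X) \frac{(it)^n}{n!}, \tag{12}$$

and that for every $n \in \mathbb{N}$, there exists a degree $n$ polynomial $L_n$ (independent of the choice of $X$) such that

$$\kappa_n(X) = L_n\big(\mathbb{E}[X], \dots, \mathbb{E}[X^n]\big). \tag{13}$$

Let $Y$ be a point process on a subset $D \subset \mathbb{C}$ with $N$ particles $\{y_i\}_{i=1}^N$ and correlation functions $\tau_k : D^k \to \mathbb{R}$, so that

$$\mathbb{E}\Big[\sum_{(i_1, \dots, i_k) \in I_k} f(y_{i_1}, \dots, y_{i_k})\Big] = \int_{D^k} f(z_1, \dots, z_k)\, \tau_k(z_1, \dots, z_k)\, dz_1 \cdots dz_k \tag{14}$$

for all $k \in \mathbb{N}$ and all compactly supported, bounded Borel-measurable functions $f : D^k \to \mathbb{R}$. Given a bounded Borel set $A \subset D$, let $N_A$ denote the number of particles in $A$. Set $f = \mathbb{1}_{A^k}$ in (14), and note that the number of elements $(i_1, \dots, i_k) \in I_k$ such that $f(y_{i_1}, \dots, y_{i_k}) = 1$ is $N_A (N_A - 1) \cdots (N_A - k + 1)$, since there are $N_A$ choices for $i_1$, and then $N_A - 1$ choices remaining for $i_2$, and so on until $i_k$. This implies the well-known identity

$$\mathbb{E}\big[N_A (N_A - 1) \cdots (N_A - k + 1)\big] = \int_{A^k} \tau_k(z_1, \dots, z_k)\, dz_1 \cdots dz_k. \tag{15}$$

Let $J_k$ denote the integral on the right-hand side of (15). Then (15) implies that for all $n \in \mathbb{N}$, the moment $\mathbb{E}[N_A^n]$ is equal to a linear combination of the terms $J_1, \dots, J_n$, with universal coefficients (independent of $Y$). Recalling the definition of the polynomial $L_n$, we conclude that for every $n \in \mathbb{N}$, there exists a universal polynomial $H_n$ such that

$$\kappa_n(N_A) = H_n(J_1, \dots, J_n). \tag{16}$$

The following lemma is a consequence of Lemma 2.3 and Lemma 2.4. Let $A$ be an admissible domain, and for all $k \in \mathbb{N}$, set $Q^{(k)}(z_1, \dots, z_k) = (S_N(z_i, z_j))_{1 \le i,j \le k}$. We define

$$T_k = N^k \int_{A^k} \det Q^{(k)}\big(\sqrt{N} z_1, \dots, \sqrt{N} z_k\big)\, dz_1 \cdots dz_k. \tag{17}$$

Lemma 3.3. For any admissible domain $A$, there exists a constant $c(d_A) > 0$ such that $\kappa_n(X_A) = H_n(T_1, \dots, T_n) + O(e^{-cN})$ for all $n \in \mathbb{N}$. The implicit constant depends only on $n$ and $d_A$.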
Proof. We begin by computing $\rho_k(\sqrt{N} z_1, \dots, \sqrt{N} z_k)$ using the definition of $\rho_k$ in (5) and the definition of a Pfaffian in (4). By Lemma 2.3, all terms in the defining sum (4) containing a factor of $D_N$ or $I_N$ are exponentially small. We conclude that sup

where

where $(\widetilde{K}(z_i, z_j))_{1 \le i,j \le k}$ is a $2k \times 2k$ matrix composed of the $2 \times 2$ blocks

Combining (15), (18), (19), and the definition of $T_k$ in (17), we find

is the $k$-th correlation function for the complex eigenvalues of $W_N$. The conclusion follows after recalling the definition of $H_n$ from (16) and using the trivial inequality $|T_k| \le 2N^k$.

Lemma 3.3 motivates the next definition.

Definition 3.4. We define the pseudo-cumulants of $X_A$ by $\widetilde{\kappa}_n = H_n(T_1, \dots, T_n)$ for all $n \in \mathbb{N}$.
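The passage from moments to cumulants, encoded above by the universal polynomials $L_n$, can be made concrete. The sketch below uses the standard recursion $\kappa_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \kappa_k m_{n-k}$, where $m_j$ denotes the $j$-th moment. This is one common way to evaluate $L_n$ and is not necessarily the form used in the paper; the function name is an illustrative assumption.

```python
from math import comb


def cumulants_from_moments(moments):
    """Convert raw moments [m_1, ..., m_n] into cumulants [kappa_1, ..., kappa_n].

    Uses kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * m_{n-k}.
    """
    cumulants = []
    for n in range(1, len(moments) + 1):
        correction = sum(
            comb(n - 1, k - 1) * cumulants[k - 1] * moments[n - k - 1]
            for k in range(1, n)
        )
        cumulants.append(moments[n - 1] - correction)
    return cumulants


# Standard Gaussian: raw moments 0, 1, 0, 3; cumulants beyond the second vanish.
gaussian_cumulants = cumulants_from_moments([0, 1, 0, 3])

# Poisson with mean 2: raw moments 2, 6, 22, 94; every cumulant equals the mean.
poisson_cumulants = cumulants_from_moments([2, 6, 22, 94])
```

The Gaussian case is the one relevant to the proof strategy: a limit law is Gaussian when all cumulants of order three and higher vanish.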
The following cumulant identity is known for determinantal processes [34, (2.6)]. The proof in [34] works for the pseudo-cumulants without modification, since they are defined in terms of a determinantal kernel.⁴

⁴ They are precisely the cumulants of the determinantal point process defined by this kernel, if such a process exists. We do not address the question of existence here, since this claim is not needed.

Lemma 3.5. For all $n \in \mathbb{N}$, we have

with the convention that $z_{m+1} = z_1$.
The next lemma follows from the previous one by induction; see [35, Lemma 1] for the statement in the case of determinantal processes.

Lemma 3.6. For all $n \in \mathbb{N}$, there exist constants $(\alpha_{nj})_{j=2}^{n-1}$ (independent of $A$ and $N$) such that

In light of the previous lemma, we now aim to calculate the terms $R_n$.

Lemma 3.7. Fix $\delta \in (0, 1/2)$. For $k \ge 2$, we have

where the implicit constant in the asymptotic notation depends only on $d_A$, $k$, and $\delta$.
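To prepare for the proof, we recall the standard error function asymptotic

$$\operatorname{erfc}(x) = \frac{e^{-x^2}}{x \sqrt{\pi}} \big(1 + O(x^{-2})\big) \quad \text{as } x \to \infty. \tag{20}$$

Proof of Lemma 3.7. The case $k = 1$ is [23, Lemma 7], so we suppose that $k \ge 2$. Then by (20) and Lemma 4.1, for all $z, w \in A$, we have the asymptotic expansion

where 2π Im(z) Im(w) ( w − z).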
We claim that the leading order term in

To show this, we begin by illustrating how to bound one of the other terms in $R_k$ coming from (21). We note that there exists a constant $C(d_A, k) > 0$ such that

We now observe that the integral in (22) decays exponentially in $N$, since $|z|^2 - 1 - \ln |z|^2$ is positive and bounded away from zero for $z \in A$ (since $A$ is admissible). The other error terms can be treated similarly; each has an integrand that decays exponentially. Introducing the notation $g(z, w) = \operatorname{Re}(z) \operatorname{Im}(w) - \operatorname{Re}(w) \operatorname{Im}(z)$, using (21), and bounding the error terms as indicated in (22), we obtain (after observing some cancellation in the exponent) that

for some constant $c(d_A, k) > 0$. We now decompose k i=1

where $\epsilon(z_1, \dots, z_k)$ is the sum of terms containing at least one copy of $(z_i - z_{i+1})$. We claim that all integrals arising from $\epsilon(z_1, \dots, z_k)$ are negligible. The following computation demonstrates this for terms containing exactly one copy of $(z_i - z_{i+1})$; the other terms are bounded similarly (and are lower order). We have

due to the exponential decay of the integrand on the set $A^k \cap \{|z_1 - z_2| > N^{-1/2+\delta}\}$. We have

where the last inequality follows by directly evaluating the integrals in the variables $z_1$ through $z_{k-1}$, then using the fact that $\operatorname{area}(A) \le 2$.
After bounding these lower-order terms, (23) becomes

We write $R_k$ as

where

and $I_j$ for $j \ge 1$ is defined similarly to $I_0$, with the integral over $A \times \mathbb{C}^{k-1}$ replaced by one over $A^j \times A^c \times \mathbb{C}^{k-j-1}$. $I_0$ is the leading-order term, and may be computed explicitly.
After the change of variables $z_i \mapsto z_i + z_k$ for $i < k$, the variable $z_k$ disappears from the exponent and may be integrated directly. After some simplification, we obtain

Changing variables to polar coordinates by setting $z_i = r_i e^{i\theta_i}$, and using the identity

Next, we note that for every $j$ such that $1 \le j \le k - 1$, we have

Recalling (11), we have

The variables $z_{k-1}, \dots, z_3$ can then be integrated directly using (11). By Lemma 2.5,

We conclude that for $j \ge 1$,

Inserting (25) and (26) into (24) completes the proof.

3.3. Conclusion.
Proof of Theorem 1.3. By Lemma 3.3, for every $n \in \mathbb{N}$ the cumulant $\kappa_n(X_A)$ is equal to the pseudo-cumulant $\widetilde{\kappa}_n$ plus an exponentially small error term. Then by Lemma 3.2, Lemma 3.6, Lemma 3.7, and induction, we have for every $n \ge 3$ that

$$\lim_{N \to \infty} \kappa_n\big(N^{-1/4} X_A\big) = 0.$$

We conclude that the limiting cumulants of $N^{-1/4} X_A$ are the same as the cumulants of a Gaussian random variable with variance $2^{-1} \pi^{-3/2} \ell(\partial A)$. Since the cumulants of a random variable determine its moments, the limiting moments also match those of this Gaussian.
Because the Gaussian distribution is uniquely determined by its moments [1, Theorem

4. Technical Estimates
The following estimate improves [3, Lemma 9.2] by establishing a quantitative error term. It is implicit in [24, Remark 3.4]; we provide a short proof here for completeness.

Lemma 4.1. Let $A$ be an admissible domain. Define $d_A = \inf\{|z - 1| : z \in A\}$. Then there exists a constant $C(d_A) > 0$ such that for all $z \in A$, s

A standard application of Laplace's method (see [36, Section 19.2.4, Theorem 1(a)]) implies that there exists a constant $C > 0$ depending only on $c_A$, and consequently only on $d_A$, such that

$$\int_0^1 (1 - t)^{N-1} f(z; t)\, dt = \frac{1}{N(1 - z)} \big(1 + R(z; N)\big), \tag{32}$$

where $|R(z; N)| \le C N^{-1}$. Combining (30) and (32) and recalling the definition of $\tau_N$ in (28) completes the proof.

Remark 3.8. The Lipschitz hypothesis in Theorem 1.3 was used only to compute the variance in Lemma 3.2. To relax this hypothesis, one only needs to compute the integral (6) for more general domains (with rougher boundaries). This can be done for Caccioppoli sets using [26, Corollary 3.1.4] and for the Koch snowflake using [26, Theorem 3.3.2]. We note that the proof technique for the latter result is applicable to many other domains with self-similar boundaries.