Rate of Convergence for products of independent non-Hermitian random matrices

We study the rate of convergence of the empirical spectral distribution of products of independent non-Hermitian random matrices to the power of the Circular Law. The distance to the deterministic limit distribution will be measured in terms of a uniform Kolmogorov-like distance. First, we prove that for products of Ginibre matrices, the optimal rate is given by $\mathcal O (1/\sqrt n)$. Avoiding the edge, the rate of convergence of the mean empirical spectral distribution is even faster. Second, we show that also products of matrices with independent entries attain this optimal rate in the bulk up to a logarithmic factor. In the case of Ginibre matrices, we apply a saddle-point approximation to a double contour integral representation of the density and in the case of matrices with independent entries we make use of techniques from local laws.


Introduction
The Circular Law states that the empirical spectral distribution of a non-Hermitian random matrix with i.i.d. entries converges to the uniform distribution on the complex disc as the size of the matrix tends to infinity. Interestingly, for the product of m independent matrices of this type, the limit distribution is the m-th power of the Circular Law. Here we investigate the question: how fast does it converge? The case m = 1, corresponding to the Circular Law itself, was already studied in [16].
We consider the product $X = X^{(1)} \cdots X^{(m)}$ of m independent random matrices, each of size n × n. For fixed $m \in \mathbb N$, we are interested in the asymptotics as n → ∞. The empirical spectral distribution of the product is given by $\mu_n^m = \frac{1}{n} \sum_{j=1}^{n} \delta_{\lambda_j(X)}$, where $\delta_{\lambda_j(X)}$ denotes the Dirac measure at the eigenvalue $\lambda_j$ of the matrix X. In this note we are interested in two different classes of random matrices $X^{(q)}$ appearing in the product.

Definition 1.1. (i) A (complex) Ginibre matrix X is a complex non-Hermitian random matrix with independent complex Gaussian entries $X_{ij} \sim \mathcal N_{\mathbb C}(0, 1)$.
(ii) A non-Hermitian random n × n-matrix X is said to have independent entries if X ij are independent complex or real random variables, and in the complex case we additionally assume Re X ij and Im X ij to be independent.
If $X^{(1)}, \dots, X^{(m)}$ have independent entries satisfying $\mathbb E X^{(q)}_{ij} = 0$ and $\mathbb E |X^{(q)}_{ij}|^2 = 1$, then the empirical spectral distribution converges weakly to a deterministic probability measure on the complex plane as the matrix size grows. We denote by λλ the 2-dimensional Lebesgue measure, by ⇒ weak convergence of measures, and $B_r = B_r(0)$ shall be the open ball of radius r > 0 centered at z = 0. In [21], Götze and Tikhomirov showed that as n → ∞, $\mathbb P$-almost surely we have $\mu_n^m \Rightarrow \mu_\infty^m$, where the limit measure
$d\mu_\infty^m(z) = \frac{1}{m\pi} |z|^{2/m - 2} 1_{B_1}(z)\, d\lambda\lambda(z)$ (1.1)
is the m-th power of the uniform distribution $\mu_\infty = \mu_\infty^1$ on the complex disc, see also [33]. The Gaussian case has been treated in [10,2], more general models can be found in [25,17,7,1,22], for the convergence of the singular values see [5], and for local results we refer to [31,24,32,18,12].
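This limit can be checked numerically. The following sketch is our own illustration (not part of the original analysis): it samples a product of m = 3 normalized Ginibre factors and compares the empirical radial distribution with the radial CDF $r^{2/m}$ of the m-th power of the Circular Law; the helper name `product_ginibre_eigs` is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def product_ginibre_eigs(n, m, rng):
    """Eigenvalues of the normalized product X^(1)...X^(m)/sqrt(n^m),
    where each factor is a complex Ginibre matrix with N_C(0,1) entries."""
    X = np.eye(n, dtype=complex)
    for _ in range(m):
        G = (rng.standard_normal((n, n))
             + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
        X = X @ G
    return np.linalg.eigvals(X) / n ** (m / 2)

n, m = 500, 3
lam = product_ginibre_eigs(n, m, rng)

# The limit mu_infty^m has density |z|^(2/m - 2)/(m*pi) on the unit disc,
# hence the radial CDF is mu_infty^m(B_r) = r^(2/m) for 0 <= r <= 1.
r = np.linspace(0.1, 0.9, 17)
emp = np.array([(np.abs(lam) <= ri).mean() for ri in r])
max_err = np.max(np.abs(emp - r ** (2 / m)))
print(max_err)  # small, of order n^(-1/2)
```

The observed deviation shrinks like $n^{-1/2}$ when n is increased, in line with the rates discussed below.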
For m = 1, we retrieve the well-known Circular Law $\mu_n = \mu_n^1 \Rightarrow \mu_\infty$. In the case of Ginibre matrices this had been discovered much earlier in [15]. We are interested in the rate of convergence, more precisely in the uniform Kolmogorov-type distance over balls
$D(\mu_n^m, \mu_\infty^m) := \sup |\mu_n^m(B_R(z_0)) - \mu_\infty^m(B_R(z_0))|$
as n → ∞, where the supremum runs over all balls $B_R(z_0) \subseteq \mathbb C$. In the sequel, we will also consider the supremum over certain families of balls B. Convergence in the distance D coincides with weak convergence in the case of an absolutely continuous limit distribution, see [16, Lemma A.1]. Using the rotational symmetry of the Circular Law $\mu_\infty$ and of the mean empirical spectral distribution $\bar\mu_n = \mathbb E\,\mu_n$ of the Ginibre ensemble, the following optimal rate for m = 1 was shown in [16].
Lemma 1.2 ([16]). For the Ginibre ensemble, $c/\sqrt n \le D(\bar\mu_n, \mu_\infty) \le C/\sqrt n$ asymptotically, with $C = \sqrt\pi/\sqrt 2$ and $c = 1/(\sqrt 2 \pi)$.
Here and in the sequel we denote asymptotic equivalence by ∼. We write $A \lesssim B$ if the inequality holds up to an n-independent constant c > 0, and $A \asymp B$ if $c|B| \le |A| \le C|B|$ for some constants 0 < c < C. These constants c, C may differ at each occurrence. Moreover we abbreviate $\log^a b = (\log b)^a$.

The optimal rate of convergence for products of Ginibre matrices
It is natural to ask for a generalization of Lemma 1.2 to products of m ≥ 1 independent Ginibre matrices. Notably, our analogous result shows that the optimal rate of convergence does not depend on the number m.
Theorem 1.3. For the product of m independent Ginibre matrices, the mean empirical spectral distribution satisfies
$\sup_{R > 0} |\bar\mu_n^m(B_R) - \mu_\infty^m(B_R)| \asymp \frac{1}{\sqrt n}.$ (1.4)
The following more detailed estimates hold as long as the boundary of the complex disc is avoided:
$\sup_{R < 1 - \frac{m}{2}\sqrt{\log n/n}} |\bar\mu_n^m(B_R) - \mu_\infty^m(B_R)| \lesssim \frac{\log^{3/2} n}{n}$ (1.5)
and uniformly in $R > 1 + \sqrt{\log n/n}$ the difference decays exponentially fast (1.6).
Theorem 1.3 provides the optimal rate of convergence, which however is faster inside and much faster outside of the bulk. The precise constants of the upper and lower bound of (1.4) can be chosen to be $C = \sqrt\pi/\sqrt 2$ and $c = 1/(\sqrt 2\pi)$, coinciding with Lemma 1.2. The constants in (1.5) and (1.6) do depend on m. We will also see that the maximal distance in (1.4) is attained at R = 1.
The restriction of Theorem 1.3 to centered balls does not affect its character of quantifying weak convergence: since both $\bar\mu_n^m$ and $\mu_\infty^m$ have radially symmetric Lebesgue densities, it is easy to see that (1.4) already implies the weak convergence $\bar\mu_n^m \Rightarrow \mu_\infty^m$. In order to remove this restriction one may exploit monotonicity arguments for the radial part of $\bar\mu_n^m$. While the proof of Lemma 1.2 is an elementary calculation, the proof of Theorem 1.3 is more involved and relies on a saddle-point analysis of a double contour integral representation for the density of $\bar\mu_n^m$, which we will give in Section 2. An idea of the proof is given after the contours are defined, see Figure 2. Figure 1 illustrates the statements of our main results.
Pointwise convergence of the density $\rho_n^m$ of $\bar\mu_n$ or $\bar\mu_n^m$ has also been discussed in [2,4,35]. In particular, Akemann and Burda describe the asymptotics for fixed z without specifying the error. Aside from the error o(1), the appearance of erfc(·) also hints at exponential convergence, as in (1.3) and (1.6). Note that Akemann and Cikovic mention an algebraic rate of convergence for the fixed trace ensemble and conclude that the exponential rate of convergence for Ginibre matrices is rather special.
Chafaï, Hardy and Maïda studied invariant β-ensembles with external potential V instead of matrices with independent entries, see [11]. Their result implies a rate of convergence to the limiting measure with density c∆V of order $O(\sqrt{\log n/n})$ with respect to the bounded Lipschitz metric and the 1-Wasserstein distance. Similar questions in this context of log-gases, but for the non-uniform variant of D (the discrepancy), have been addressed in [34]. Moreover, for determinantal point processes, the fluctuation around the (deterministic) rate has been studied in [13].
Figure 1: Left: A sample of eigenvalues; Theorem 1.6 below shows that gaps (like the one visible at the top) and clusters do not significantly differ from the limit distribution. Right: The radial part of the densities of $\bar\mu_n^m$ for n = 15 in blue and of the limit distribution $\mu_\infty^m$ in orange. Clearly the rate of convergence in the bulk is faster than close to the edge, illustrating the statement of Theorem 1.3.

Non-averaged rate of convergence for products of Ginibre matrices
Note that we cannot expect an exponentially fast rate of convergence as in Lemma 1.2 for the non-averaged empirical spectral distribution $\mu_n^m$, because it is still affected by the individual eigenvalue fluctuations. In particular, the rough lower bound $D(\mu_n^m, \mu_\infty^m) \gtrsim 1/n$ follows from placing a disk of radius $c n^{-1/2}$ contained in $B_1(0)$ such that it does not cover any eigenvalue. Heuristically, the typical distance between n uniformly distributed eigenvalues in the Circular Law is $n^{-1/2}$, so one may vary $B_R(z_0)$ up to a magnitude of $n^{-1/2}$ without covering a new eigenvalue. Hence we should expect the non-averaged distance $D(\mu_n^m, \mu_\infty^m)$ to be of order $O(n^{-1/2})$ with overwhelming probability.
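This heuristic is easy to probe numerically for m = 1. The sketch below (our illustration, with hypothetical variable names) computes the distance over centered balls for a single normalized Ginibre matrix, using that the supremum over R is attained at the jump points of the empirical radial distribution function.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Single normalized Ginibre matrix (m = 1), entries of variance 1/n
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
radii = np.sort(np.abs(np.linalg.eigvals(G)))

# sup over centered balls B_R of |mu_n(B_R) - mu_infty(B_R)| with
# mu_infty(B_R) = min(R^2, 1); for monotone functions the sup is
# attained at the jump points, i.e. at the eigenvalue radii.
R = np.concatenate([radii, [1.0]])
F = np.minimum(R ** 2, 1.0)
F_n = np.searchsorted(radii, R, side="right") / n
D = np.max(np.maximum(np.abs(F_n - F), np.abs(F_n - 1 / n - F)))
print(D)  # expected order n^(-1/2) up to constants
```

For this determinantal (Gaussian) ensemble the observed distance is typically even somewhat smaller than the i.i.d. Kolmogorov-Smirnov statistic, consistent with the rigidity used in the proof of Theorem 1.4.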
Theorem 1.4. For products of m Ginibre matrices and any ε, Q > 0 there exists a constant c > 0 such that $D(\mu_n^m, \mu_\infty^m) \le c\sqrt{\log n/n}$ holds with probability at least $1 - n^{-Q}$.
This rate coincides exactly with the Wasserstein rate obtained in [11]. The proof makes use of the determinantal structure of the eigenvalues of Gaussian matrices in order to apply Bernstein's inequality, and will be given at the end of Section 2.

Rate of convergence for products of non-Gaussian matrices
Based on the ideas of [16], we will also prove a rate of convergence result for products of matrices with independent entries.

Definition 1.5 (Condition (C)). We say $X = X^{(1)} \cdots X^{(m)} / \sqrt{n^m}$ satisfies condition (C) if the matrices $X^{(q)}$, q = 1, …, m, have jointly independent entries satisfying $\mathbb E X^{(q)}_{ij} = 0$, $\mathbb E |X^{(q)}_{ij}|^2 = 1$ and a uniform moment bound of order 4 + ε for some ε > 0 independent of n.

In Section 3, we will show

Theorem 1.6. If condition (C) holds, then for every $m \in \mathbb N$ and τ, Q > 0 there exists a constant c > 0 such that, with probability at least $1 - n^{-Q}$, the distance $\sup_B |\mu_n^m(B) - \mu_\infty^m(B)|$ is bounded by an asymptotic error $h_m(n)$, where the supremum runs over balls $B \subseteq B_{1-\tau} \cup B^c_{1+\tau}$.

In the proof of Theorem 1.6 we will see that the m-dependent term of $h_m$ is only visible for balls touching the origin; otherwise only the error $h_1$ remains for all m. To make the statement more comprehensible when comparing with Theorem 1.3, we also state the following result.

Corollary 1.7. If condition (C) holds, then for every τ, Q > 0 the optimal rate holds up to a logarithmic factor, where the supremum runs over all balls B such that $\partial B_R(z_0) \subseteq B^c_{1+\tau} \cup B_{1-\tau} \setminus B_\tau$ avoids the edge and the origin.
We already know from the previous discussion that the optimal rate is given by $1/\sqrt n$. Corollary 1.7 shows that this rate is (almost) universal in the sense that it continues to hold for matrices with independent entries, provided edge and origin are avoided.
A weaker rate of convergence for m = 1 has already been established in [16, §2.2], where a comparison with similar results can be found as well. We would like to point out a subtle difference between Theorem 1.6 and the Local Law in [18]: the latter compares an integral over a smooth function with the limiting distribution on a shrinking support for a fixed point $z_0$, while the former allows choosing the "worst ball" $B_R(z_0)$ depending on the random sample of eigenvalues.

Product of Ginibre matrices
We start with the following double contour integral representation for the density of $\bar\mu_n^m$, which is essential for Theorem 1.3.
Lemma 2.1. The density of $\bar\mu_n^m$ admits a double contour integral representation, where γ is any closed contour that encircles the numbers 1, …, n counterclockwise and no natural number greater than n.
In [26], a similar double contour integral representation for the correlation kernel of the singular values of X was derived. This was used in [27] to prove bulk universality for singular values of products of independent Ginibre matrices. In general, double contour integrals like in Lemma 2.1 above appear regularly in the theory of products of random matrices, e.g. [26,14,23].
In the sequel we will make use of the Meijer G-function, which for $\xi \in \mathbb C \setminus \{0\}$ is defined as the Mellin inverse of a product of Gamma functions; in the case needed here,
$G^{m,0}_{0,m}\big({-\atop b_1,\dots,b_m}\,\big|\,\xi\big) = \frac{1}{2\pi i}\int_L \prod_{j=1}^m \Gamma(b_j - t)\,\xi^t\,dt.$
The contour L goes from −i∞ to i∞, but can be chosen arbitrarily as long as the poles of $\Gamma(b_j - t)$ are on the right-hand side of the path and (in the general case) the poles of $\Gamma(1 - a_j - t)$ are on the left-hand side.
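The Mellin-Barnes contour can be evaluated numerically. The following sketch is our own naive quadrature (the helper `meijer_g_m0_0m` is not a library routine) and is checked against two classical reductions of $G^{m,0}_{0,m}$ that are consistent with the weights used below.

```python
import numpy as np
from scipy.special import gamma, k0

def meijer_g_m0_0m(xi, m, c=-0.5, U=40.0, N=8001):
    """Evaluate G^{m,0}_{0,m}(-; 0,...,0 | xi) for xi > 0 by quadrature of
    the Mellin-Barnes integral along the vertical line Re t = c, which keeps
    the poles of Gamma(-t)^m (at t = 0, 1, 2, ...) to the right."""
    u = np.linspace(-U, U, N)
    t = c + 1j * u
    vals = gamma(-t) ** m * xi ** t
    # integrand decays like exp(-m*pi*|u|/2), so a plain Riemann sum suffices
    return (np.sum(vals) * (u[1] - u[0])).real / (2 * np.pi)

# m = 1 reduces to the exponential weight: G^{1,0}_{0,1}(-; 0 | xi) = exp(-xi)
err1 = abs(meijer_g_m0_0m(1.5, 1) - np.exp(-1.5))
# m = 2 gives a Bessel weight: G^{2,0}_{0,2}(-; 0, 0 | xi) = 2 K_0(2 sqrt(xi))
err2 = abs(meijer_g_m0_0m(0.8, 2) - 2 * k0(2 * np.sqrt(0.8)))
print(err1, err2)
```

Both checks agree to high precision, confirming the sign and contour conventions stated above.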
Since the Gamma function is the Mellin transform of the exponential function, we have $G^{1,0}_{0,1}\big({-\atop 0}\,\big|\,\xi\big) = e^{-\xi}$. In particular, the density of $\bar\mu_n^m$ in (2.2) is given in terms of the weight $G^{m,0}_{0,m}(-;0,\dots,0\,|\,|z|^2)$, see [2]; compare with the case m = 1, where the weight reduces to $e^{-|z|^2}$. We would like to point out that the Coulomb gas picture partially breaks down here: the potential V for m ≥ 2 is not admissible (according to [34]). In particular, V is unbounded from below at its singularity and hence $\Delta V \sim \mu_\infty^m$ fails for finite n. In this case, the rate will be faster, depending on m; in particular, for m = n we expect an even faster rate. On the other hand, $\mu_\infty^m$ converges weakly to $\mu_\infty^\infty = \delta_0$ as m → ∞, and for fixed R > 0 we see $\mu_\infty^m(B_R) \to 1$. Note that m ∼ n is also the critical scaling of Lyapunov exponents between deterministic and GUE statistics, see [3,28].
The remaining part of (2.2) is the kernel of the (monic) orthogonal polynomials with respect to the Meijer-G-weight. It can be rewritten with the help of the residue theorem.
For any closed curve γ encircling the numbers 1, …, n but no natural number greater than n, we have the residue identity (2.5), since the integrand is holomorphic except for simple poles at the natural numbers, where the cotangent has residue 1/π. Combining both contour integrals proves the claim.
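The residue identity can be verified numerically. The sketch below (our illustration; the helper `contour_count` is ours) integrates $\cot(\pi t)$ over the rectangular contour used later in the proof and recovers the number of enclosed integers.

```python
import numpy as np

def contour_count(n, pts_per_seg=20000):
    """Evaluate pi/(2*pi*i) * contour integral of cot(pi t) over the
    rectangle with vertices 3/4 - i, n + 1/4 - i, n + 1/4 + i, 3/4 + i
    (counterclockwise). Since cot(pi t) has residue 1/pi at each integer,
    the result equals the number n of enclosed integers."""
    a, b = 0.75, n + 0.25
    corners = [a - 1j, b - 1j, b + 1j, a + 1j, a - 1j]
    total = 0j
    for z0, z1 in zip(corners[:-1], corners[1:]):
        s = np.linspace(0.0, 1.0, pts_per_seg)
        t = z0 + (z1 - z0) * s
        f = np.cos(np.pi * t) / np.sin(np.pi * t)
        dt = (z1 - z0) / (pts_per_seg - 1)
        # trapezoid rule along the segment
        total += (f[0] / 2 + f[1:-1].sum() + f[-1] / 2) * dt
    return (np.pi / (2j * np.pi)) * total

print(abs(contour_count(3) - 3))
```

The rectangle keeps a distance of at least 1/4 to all poles, exactly as in the choice of γ below, so the quadrature is well conditioned.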
Asymptotic expansions of G m,0 0,m , like [29, §5.9.1.], together with heuristics for the hypergeometric kernel give rise to pointwise limits in [2]. A rigorous estimation of the error bound uniformly in z seems to be absent in the literature so far. Hence it is reasonable to study the problem by a direct analysis.
This holds in the case where s and t have distance bounded from below, which is what we will choose in the following. We will now show that shifting the vertical contour in Lemma 2.1 to L = [η − i∞, η + i∞] for another real part η ≥ 1/2 with $\eta \notin \{1, \dots, n\}$ produces an additional term; this follows from Cauchy's integral formula. We temporarily split γ into two parts $\gamma_l$ and $\gamma_r$ such that $\gamma_l$ encircles $\{1, \dots, \lfloor\eta\rfloor \wedge n\}$ and $\gamma_r$ encircles $\{\lceil\eta\rceil, \dots, n\}$. Soon we will make the path of γ more explicit. As in (2.5), continuing the integration of the right-hand side of the last equation multiplied with cot(πt) over $\gamma_l \cup \gamma_r$ yields
$\frac{\pi}{2\pi i} \int_{\gamma_l} \cot(\pi t)\, dt = \lfloor\eta\rfloor \wedge n.$
Moreover, by Cauchy's integral formula, we may artificially add the removed part $\gamma - \gamma_l - \gamma_r$ again as long as γ is symmetric around the x-axis. Let γ be the rectangular contour connecting the vertices 3/4 − i, n + 1/4 − i, n + 1/4 + i and 3/4 + i. This ensures a constant distance to the singularities of the cotangent. The scaled version $\gamma/(R^{2/m} n)$ is illustrated below in Figure 2. Furthermore, note that the integral exists, as we will show explicitly below, see (2.18). Recall Stirling's formula (2.8) for the Gamma function, which holds uniformly for Re z ≥ 1/2, cf. for instance [36, p. 249]. Thus we obtain a representation in which, for all s ∈ L (and analogously for t ∈ γ), the error term is at most O(1). We rescale the integration by $nR^{2/m}$ and denote $\gamma = \gamma/(nR^{2/m})$ and $L = L/(nR^{2/m})$, as well as the function F collecting the exponential terms.
Figure 2: The scaled contours of integration and their local parts in thicker lines. As we will see later, the main contribution comes from $\gamma_{loc}$, which lies in a box $\delta_n$-close to the saddle point at z = 1. If R > 1, there is no $\gamma_{loc}$ and the integral vanishes exponentially fast (depending on the distance |R − 1|). If R < 1, then both horizontal contours of $\gamma_{loc}$ cancel because of their symmetry. In this case we obtain a rate of convergence of microscopic order 1/n due to the discrete nature of the residues.
The maximal rate will be attained for R = 1, where the integrals do not cancel, yet the vertical part γ vert is small enough.
Observe that $O(1/(n \operatorname{Re}(t))) = O(R^{2/m})$ and $O(1/(n \operatorname{Re}(s))) = O(1/n)$. We will analyze this main formula using the method of steepest descent, hence we are interested in the saddle points of F. The saddle point equation is satisfied only for z = 1, with $F''(1) = 1 > 0$. Denoting z = x + iy, the Cauchy-Riemann equations for F imply (2.10). Hence Re F attains its local maximum F(1) = −1 in y-direction and its minimum in x-direction; in particular $\delta_n \to 0$ as n → ∞ by our assumption (2.4). Note that γ is $1/(nR^{2/m}) = O(\delta_n^2)$-close to the real axis and the vertical path L is equally close to the saddle point z = 1. The local parts of the paths are given by $L_{loc} = L \cap Q_{\delta_n}(1)$ and $\gamma_{loc} = \gamma \cap Q_{\delta_n}(1)$, while $L^c_{loc}$ and $\gamma^c_{loc}$ denote the remaining parts of the paths (under slight abuse of notation). Let us collect the necessary bounds for each part of the contour by applying a Taylor expansion: for $s \in L_{loc}$ we obtain (2.11) and similarly for $t \in \gamma_{loc}$ we obtain (2.12). On the other hand, for $s \in L^c_{loc}$, using (2.9) it holds (2.13), and for $t \in \gamma^c_{loc}$ we see (2.14) from (2.10). The nonlocal terms are negligible; e.g. we apply (2.12) and (2.13) to obtain (2.15), where the last step once more follows from the assumption (2.4). Analogously we argue using (2.11) and (2.14). Locally close to z = 1, the error term of Stirling's formula (2.8) is $O(n^{-1})$. Thus, it remains to control (2.16), where we used (2.11), (2.12) and $t/s = 1 + O(\delta_n)$. We parameterize $L_{loc}$ as a straight vertical segment. The vertical microscopic part $\gamma_{vert}$ receives the same scaling, e.g. for the right part we choose an analogous parameterization. This part of the integral (2.16) on $\gamma_{vert}$ is visible if and only if R is close to 1. In this case, we drop the part in t and the exponential function becomes $e^{-u^2/2}$. Using R ∼ 1 and $|\cot(\pi/4 + ix)| = 1$ for $x \in \mathbb R$, the integration over $\gamma_{vert}$ can then be bounded as in (2.17), where in the second step we shifted u by $v = O(1/\sqrt n)$ and used $m(n - \eta + 1/4)^2 / R^{2/m} \gtrsim 1$.
The last step follows from the asymptotics of the modified Bessel function K 0 (1/(4n)) or more elementarily by splitting the integration into |u| ≶ 1. From δ n → 0 and 1/(nR 2/m ) → 0 it follows that the left vertical path is not contained in Q δn (1).
We parameterize the remaining path of $\gamma_{loc} \setminus \gamma_{vert}$ as horizontal lines, where I′ is the part of I such that the corresponding contour overlaps γ. The integral (2.16) becomes the sum of two terms, which can be estimated as in (2.18). Recalling the correct prefactor c = −1/4π from (2.8), we conclude the upper bound. In order to obtain the lower bound of the claim, it suffices to consider R = 1. Moreover, we will only study the sum of (2.18) keeping the phase factor of the integrand, since all the other parts of the double contour integral have been proven to be of order $o(n^{-1/2})$. We have the approximation
$\cdot \big(\tan(i\pi - \pi\sqrt{n/m}\,v) + \tan(i\pi + \pi\sqrt{n/m}\,v)\big)\,du\,dv + o(n^{-1/2}).$
In the last step we first proceeded similarly to (2.18), i.e. we shifted v by a negligible value $\pm\sqrt m/(2\sqrt n)$ and u by $\pm\sqrt{m/n}$, and then inverted v → −v in the second integral in order to merge both, using cot(π/2 − z) = −tan(−z) = tan(z). Note that $\sup_{x \in \mathbb R} |\tan(i\pi - x) + \tan(i\pi + x) - 2i| =: \varepsilon < 0.01$ and that the left-hand side of the previous equation is real; hence we obtain a lower bound with a factor 1 − ε. The same upper bound holds with 1 + ε instead. For a better control of the constant, one may vary the distance of γ to the real axis from the start. The above asymptotics yield the first claim and coincide with the case m = 1 from Lemma 1.2.
If we avoid the edge by some distance $1 - R^{1/m} \gtrsim \sqrt{\log n/n} > 0$, then $\delta_n < 1/R^{2/m} - 1$. Hence $\gamma_{vert}$ is not a part of $\gamma_{loc}$ and (2.17) drops out. This is the case for $R < 1 - \frac m2 \sqrt{\log n/n}$, for which we have I′ = I. As before, we shift u and v by $O(1/\sqrt{nR^{2/m}})$, now denoting the shifted intervals by $I_1$, $I_2$, and obtain
$\cdot \big(\tan(i\pi - \pi\sqrt{nR^{2/m}/m}\,v) - \tan(-i\pi + \pi\sqrt{nR^{2/m}/m}\,v)\big)\,du\,dv.$
As we have already shown and used in (2.18), the error terms of the shifted variables in the exponential function are absorbed in the error term $O(\log^{3/2} n/n)$. Furthermore, the deviation between the integrals over I instead of $I_1$, $I_2$ is negligible as well. From the last line it follows that the horizontal contour integrals over I cancel due to symmetry, thus only the error term remains. Collecting this error together with the non-local terms, we obtain the bound claimed in (1.5). Lastly, we turn to the statement about the exponential decay for $R > 1 + \sqrt{\log n/n}$. As before, it is sufficient to consider R ≤ 2, because of $\operatorname{supp}(\mu_\infty^m) = \bar B_1$. The position of the minimum in x-direction in (2.10) and $\operatorname{Re}(t) \le (n+1)/(nR^{2/m})$ for t ∈ γ yield a lower bound via $\operatorname{Re} F(t) = \operatorname{Re} F(\operatorname{Re}(t)) + O(1/n)$. In particular, we used the fact that |t − s| behaves like $|\operatorname{Im}(s)|$ for $s \in L^c_{loc}$ and is bounded from below by $\operatorname{Re}(s - t) \gtrsim \sqrt{\log n/n} \gtrsim |\operatorname{Im}(s)|$ for $s \in L_{loc}$, since Re(t) is small. The remaining integral over t ∈ γ is finite, since γ has finite length, and the integral over s ∈ L is finite; hence we may use Bernoulli's inequality to conclude the exponential bound of (1.6), where the last inequality holds for 1 < R < 2 by a Taylor approximation of the logarithm.
Ultimately all claims are proven.
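As a sanity check on the steepest-descent machinery used above, the following toy example (our illustration, independent of the matrix model) applies the same Laplace/saddle-point reasoning behind Stirling's formula (2.8) and verifies it numerically.

```python
import numpy as np
from math import factorial

# Laplace approximation behind Stirling's formula:
# n! = n^(n+1) * Integral_0^inf exp(n*(log(s) - s)) ds, with saddle at
# s = 1 where F(s) = log(s) - s has F(1) = -1 and F''(1) = -1, giving
# n! ~ sqrt(2*pi*n) * (n/e)^n.
def stirling(n):
    return np.sqrt(2 * np.pi * n) * (n / np.e) ** n

n = 20
ratio = factorial(n) / stirling(n)

# Direct quadrature of the saddle-point integral reproduces n! as well.
s = np.linspace(1e-6, 10.0, 200001)
integral = np.sum(np.exp(n * (np.log(s) - s))) * (s[1] - s[0])
ratio2 = n ** (n + 1) * integral / factorial(n)
print(ratio, ratio2)
```

As expected, `ratio` deviates from 1 only by the first Stirling correction of order 1/(12n), while the quadrature reproduces n! essentially exactly.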
It seems to be an artifact of the method of proof that we do not obtain an exponential rate of convergence inside the bulk in the case of products of Gaussian random matrices.
Only the rate O(1/n) seems to be achievable due to the discrete nature of the residue calculus, cf. the discussion above for fixed 0 < R < 1.
Proof of Theorem 1.4. Let $t_n = c\sqrt{\log n/n}$, set p = m/4 and consider the equidistant points $r_k = k/n^p$ for k = 1, …, n. Taking the supremum over R is equivalent to taking the maximum in k, where we choose $k = \lceil R n^p \rceil$; hence the union bound reduces the claim to centered balls of fixed radii. The number of eigenvalues in $B_{r_k}$ has the same distribution as a sum of Bernoulli variables $\xi_k$, where the parameters $\eta_k$ are given by the eigenvalues of the trace class operator $K_R$. Due to rotational symmetry, as we argued for the orthogonality of the monomials, $K_R$ has eigenfunctions $\varphi_k(w) = G^{m,0}_{0,m}(\cdot)$. It then follows from Theorem 1.3 and Bernstein's inequality that the claim holds for t sufficiently large, since $\mathbb E|\xi_k|^2 \le \eta_k \le 1$. In particular, for $t = c\sqrt{\log n}/5$ with $c = 5\sqrt{3(Q + m/4)}$, we obtain (2.20) as claimed.
Note that the logarithmic factor in the rate of convergence is expected to be nonoptimal and might be removed by controlling the variance term (2.21), but this will not be pursued here.

Matrices with independent entries
Let us begin with some necessary notation. We define the linearization matrix $\mathcal W$ as the mn × mn block matrix built from the factors $X^{(1)}, \dots, X^{(m)}$, and note that $\mathcal W^m$ is a block diagonal matrix whose blocks are the cyclic products of $X^{(1)}, \dots, X^{(m)}$. Consequently, its eigenvalues consist of the eigenvalues $\lambda_j(X)$ of X, each with multiplicity m.
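The displayed block matrix did not survive in this version of the text; a standard linearization consistent with the surrounding statements (stated here up to normalization) is the cyclic block matrix

```latex
\mathcal W \;=\;
\begin{pmatrix}
0 & X^{(1)} & 0 & \cdots & 0\\
0 & 0 & X^{(2)} & & \vdots\\
\vdots & & \ddots & \ddots & 0\\
0 & & & 0 & X^{(m-1)}\\
X^{(m)} & 0 & \cdots & 0 & 0
\end{pmatrix},
\qquad
\mathcal W^m \;=\; \operatorname{diag}\!\bigl(
X^{(1)}\cdots X^{(m)},\;
X^{(2)}\cdots X^{(m)} X^{(1)},\;
\dots
\bigr),
```

so that every diagonal block of $\mathcal W^m$ is a cyclic permutation of the product and shares its nonzero spectrum with $X^{(1)}\cdots X^{(m)}$.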
Furthermore, define its shifted Hermitization
$\mathbf V(z) = \begin{pmatrix} 0 & \mathcal W - z \\ (\mathcal W - z)^* & 0 \end{pmatrix}$
for $z \in \mathbb C$. The eigenvalues of V(z) are given by $\pm s_j(\mathcal W - z)$, where $s_{\max} = s_1 \ge \dots \ge s_{mn} = s_{\min}$ are the singular values of $\mathcal W - z$. Its ESD shall be denoted by $\nu_n^z$. Similar to the role of the Stieltjes transform in the theory of Hermitian random matrices, the weak topology of measures µ on $\mathbb C$ can be expressed in terms of the so-called logarithmic potential U, which solves the distributional Poisson equation. More precisely, for every finite Radon measure µ on $\mathbb C$ the logarithmic potential is defined by
$U_\mu(z) := -\int_{\mathbb C} \log|t - z|\, d\mu(t) = (-\log|\cdot| * \mu)(z)$ (3.2)
and it satisfies
$\Delta U_\mu = -2\pi\mu$ (3.3)
in the sense of distributions. Let $U_n$ denote the logarithmic potential of the ESD of $\mathcal W$. The advantage of logarithmic potentials in non-Hermitian random matrix theory is the following identity, known as Girko's Hermitization trick:
$U_n(z) = -\int \log|x|\, d\nu_n^z(x).$ (3.4)
Under the above-mentioned conditions on the matrix entries, the logarithmic potential $U_n$ concentrates around the logarithmic potential $U_\infty$ of the Circular Law.
Proposition 3.1 ([18]). If X obeys (C), then for every τ, Q > 0 there exists a constant c > 0 such that, uniformly in the bulk, $|U_n(z) - U_\infty(z)| \le c \log^4 n / n$ with probability at least $1 - n^{-Q}$.
We remark that the restriction to the bulk in Theorem 1.6 comes directly from the restriction in Proposition 3.1.
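Girko's identity (3.4) rests on the elementary fact that $|\det(A - z)| = \prod_j |\lambda_j - z| = \prod_j s_j(A - z)$. The sketch below (our illustration for m = 1) verifies this numerically for a single random matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
X = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
z = 0.3 + 0.2j

# Logarithmic potential of the ESD via eigenvalues ...
lam = np.linalg.eigvals(X)
U_eig = -np.mean(np.log(np.abs(lam - z)))

# ... equals the same quantity via singular values of X - z, since
# |det(X - z)| = prod |lambda_j - z| = prod s_j(X - z).
s = np.linalg.svd(X - z * np.eye(n), compute_uv=False)
U_sv = -np.mean(np.log(s))
print(abs(U_eig - U_sv))
```

The two computations agree to machine precision; in the proofs, the singular-value side is the one accessible via Hermitian techniques.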
Since the statement of Proposition 3.1 is not explicitly worked out in [18], we will derive it now, based on the results proved in that paper. We directly follow the approach of [18], making use of Girko's Hermitization trick to convert the non-Hermitian problem into a Hermitian one, and apply the local Stieltjes transform estimate from [18] and the smoothing inequality from [19]. Let $s_n(z, w)$ be the Stieltjes transform of $\nu_n^z$, which converges a.s. to the solution of
$s(z, w) = -\frac{w + s(z, w)}{(w + s(z, w))^2 - |z|^2},$ (3.6)
see for instance [20]. It is known that s(z, ·) is the Stieltjes transform of a measure $\nu^z$, which turns out to be the weak limit of $\nu_n^z$ as n → ∞, cf. [18]. Moreover, $\nu^z$ has a symmetric bounded density $\rho^z$ (the bound holds uniformly in z) and compact support. We refer to [9, §4] for more information on s and $\rho^z$. Note that s will be unbounded from below for w close to the origin and z close to the edge of the support of $\mu_\infty^m$, which is the reason for the bulk constraint of Proposition 3.1.
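The self-consistent equation (3.6) can be solved by fixed-point iteration. The sketch below is our own illustration (the helper `limiting_stieltjes` is ours); the iteration appears to converge for the parameters used here, though we make no claim of convergence in general.

```python
import numpy as np

def limiting_stieltjes(z, w, iters=500):
    """Solve the self-consistent equation (3.6),
    s = -(w + s) / ((w + s)^2 - |z|^2),
    by plain fixed-point iteration, starting in the upper half plane."""
    s = 1j
    for _ in range(iters):
        s = -(w + s) / ((w + s) ** 2 - abs(z) ** 2)
    return s

z, w = 0.5, 2j
s = limiting_stieltjes(z, w)
residual = abs(s + (w + s) / ((w + s) ** 2 - abs(z) ** 2))
print(s, residual)
```

For |z| = 0 the equation reduces to the semicircle law's relation s = −1/(w + s), which the same iteration (a continued fraction) also solves; the positive imaginary part of the solution reflects that s(z, ·) is the Stieltjes transform of a measure.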
Proof of Proposition 3.1. Fix some arbitrary Q, τ > 0 and $z \in B_{1+\tau^{-1}}$ satisfying |1 − |z|| ≥ τ. As explained for Girko's Hermitization trick (3.4), we may write $U_n$ as an integral of log|·| against $\nu_n^z$. Due to the singularities of log|x| at x = 0 and x = ∞, it is not enough to control the convergence of $\nu_n^z \to \nu^z$; it is also necessary to estimate the extremal singular values. The rate of convergence of $\nu_n^z$ to $\nu^z$ shall be measured in the Kolmogorov distance, where the operator norm $\|\cdot\|$ has been estimated by the Hilbert-Schmidt norm. Thus, there exists a constant B > 0 with $\mathbb P(\Omega_1^c) \lesssim n^{-Q}$. Since $\nu^z$ has a bounded density, we obtain a corresponding estimate, and furthermore on $\Omega_2$ the smallest singular value is under control. Hence, the claimed concentration of $U_n$ holds on $\Omega_0 \cap \Omega_1 \cap \Omega_2$, implying the claim. It remains to check $\mathbb P(\Omega_2^c) \le n^{-Q}$, which has been done explicitly in [18], using the smoothing inequality (cf. [19]) and the Local Law for V(z) in terms of its Stieltjes transform.
The core of the proof of the local law for products of non-Hermitian matrices in [18] is the following identity. First, for any function $f \in C^2_c(\mathbb C)$ define $\tilde f$ by $\tilde f(z) = f(z^m)$ and note that $\int \tilde f\, d\mu_\infty^m = \int f\, d\mu_\infty^1$, which follows from the definition of $\mu_\infty^m$ in (1.1). Using the distributional Poisson equation (3.3), as usual in non-Hermitian random matrix theory, and the representation of the eigenvalues of $\mathcal W$, we obtain the identity (3.8). In Local Laws [31,18], locally shrinking functions $f_{z_0}$ have been considered, and for fixed global f Gaussian fluctuations have been proven in [24]. As was done in [16], we would like to uniformly approximate all indicator functions $1_{B_R(z_0)}$ by smooth functions, replace the right-hand side of (3.8) by a discrete random sum and use the pointwise estimate from Proposition 3.1. In contrast to [16], we cannot use the smoothing inequality [16, Theorem 2.1], since it would rely on the distance of $U_\infty$ to the logarithmic potential of the matrix X, while we can only control the logarithmic potential $U_n$ of the linearization matrix $\mathcal W$. Therefore we use a direct approach: we replace the Monte Carlo approximation of the integral (3.8) by a random grid approximation. This also yields a more precise rate of convergence in Theorem 1.6. Here and in the sequel we identify $\mathbb R^2 \cong \mathbb C$ via (u, v) ↦ u + iv in order to simplify notation.
Lemma 3.2. Let α, β > 0 and let S be a random variable uniformly distributed on $[0, 1]^2$. Fix complex numbers $\lambda_1, \dots, \lambda_n \in \mathbb C$ with corresponding logarithmic potential U of their empirical distribution $\mu_n = \frac 1n \sum_{j=1}^n \delta_{\lambda_j}$. Furthermore, define the random grid A of $n^\alpha$ shifted lattice points; then the approximation (3.9) holds with overwhelming probability. Moreover, if S is chosen independently of the random matrix $\mathcal W$, then the analogous statement holds for $U_n$.
Note that (3.9) holds uniformly in $f \in C^3_c(\mathbb C)$, hence we could choose a function depending on the positions of $\lambda_j$. In order to make the statement more intuitive, suppose we replace the logarithmic potential U by a more regular function $U \in C^1$.
Then (3.9) is nothing but a Riemann approximation of the corresponding integral. This follows directly from the mean value theorem, very similarly to what we will do in (3.12) below.
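The mechanism of Lemma 3.2 for a regular integrand can be sketched as follows (our illustration with a smooth Gaussian test function in place of the logarithmic potential; grid parameters are hypothetical stand-ins for $2\beta n^{-\alpha/2}$).

```python
import numpy as np

rng = np.random.default_rng(4)
h = 0.01                       # grid spacing, playing the role of 2*beta*n^(-alpha/2)
S = rng.uniform(0, 1, size=2)  # the single uniform random shift S of Lemma 3.2

# randomly shifted lattice covering [-4, 4]^2
xs = (np.arange(-400, 400) + S[0]) * h
ys = (np.arange(-400, 400) + S[1]) * h
X, Y = np.meshgrid(xs, ys)

f = np.exp(-(X ** 2 + Y ** 2))  # smooth test integrand, exact integral = pi
riemann = f.sum() * h ** 2
print(abs(riemann - np.pi))
```

A single random shift moves every lattice point simultaneously, which is exactly the feature exploited below: only one source of randomness has to be controlled, in contrast to independent Monte Carlo points.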
In the Monte Carlo approximation used in [35] and [24], the random points $z_i$ are not ordered in a grid but drawn independently, thus variance bounds are important for improving bounds such as (3.9). By using reference points or eigenvalue rigidity, the error estimates in [35] and [24] are stronger by a factor of 1/n for the same number of points $z_i$. On the other hand, in order to control the singularities of $U_n$, one has to handle many random effects of all $z_i$, whereas in (3.9) only a single random shift affects all points $z_i$.
Heuristically speaking, this leads to a higher probability than in previous approaches, so that the weaker error bound is negligible.
It suffices to show that with probability at least $1 - n^{-\log n - 1}$ we have (3.11) for fixed $\lambda \in \mathbb C$, since the claim then follows from summation over j and the union bound.
The situation is illustrated in Figure 3.
Figure 3: The distance of λ to the random grid A is illustrated by the gray double arrow. It equals the distance of the rescaled random shift $2\beta n^{-\alpha/2} S$ to $\lambda - z_*$ (measured in the quotient space), where $z_*$ is the reference corner of the box containing λ.
From now on we restrict ourselves to this event $\Omega_*$. Rewrite (3.11) as a sum over boxes, where we denote the boxes with corner $z_i$ by $K_i = z_i + [0, 2\beta n^{-\alpha/2}]^2$. Adding and subtracting $\Delta f(z_i) \log|\lambda - z|$, we obtain one error term bounded via the mean value theorem and the local integrability of log in $\mathbb C$. The second term can be bounded as follows: applying the mean value theorem for log yields a bound of order $O(n^{-\alpha/4})$ for the first sum, and the second sum can be bounded by performing the integration in polar coordinates, where we finally use the event $\Omega_*$. Putting all estimates together proves the first claim. The bound for $U_n$ follows from the choice of A and a trivial upper bound on the spectral radius $|\lambda|_{\max}$ of $\mathcal W$. On the one hand, $|\lambda|_{\max}$ is bounded by the largest singular value $s_{\max}$, and on the other hand we have $\mathbb P(s_{\max} \ge n^{\log n}) \le \mathbb E\|\mathcal W - z\|^2\, n^{-2\log n} \lesssim n^{-2\log n + 1}$, similar to (3.7). Therefore on the event $\tilde\Omega_* = \Omega_* \cap \{s_{\max} \le n^{\log n}\}$, which has probability at least $1 - O(n^{-\log n})$, we have (3.9) and
$\sup_{z_i \in A} |U_n(z_i)| \le \frac 1n \sum_{j=1}^n \sup_{z_i \in A} \log|\lambda_j - z_i| \lesssim (\log n + 1 + \alpha)\log n + \log|\lambda|_{\max} + 5 = O(\log^2 n).$
Moreover, we mollify the indicator function $1_{B_R(z_0) \cap V}$ appearing in (3.13) via suitable smooth approximations $f_k$, k = 0, 1, 2, where we choose $f_1 \equiv 0$ if R ≤ 2/a for smoothness reasons.
We apply $f_1 \le 1_{B_R(z_0) \cap V}$ and integration by parts to $\tilde f_1 : z \mapsto f_1(z^m)$ as explained above, see (3.14). Analogous upper bounds hold for $f_0$ and $f_2$. A rough estimate of the error of approximation for the second term yields (3.15). The probability of the annulus in (3.15) is bounded by 10 times the maximal probability of an annulus segment with outer circumferential length $2\pi R/10 \wedge 2$. By radial monotonicity of $\mu_\infty^m$, this particular "worst case" 1/10-th segment S is the one closest to the origin, and by radial symmetry we may assume it to be symmetric with respect to the x-axis. We want to bend (or rather project) S onto the y-axis. Note that the angle between the y-axis and any tangent to the circumference line of the segment S is always bounded by π/10. Split the segment S into at most ca pieces, each of outer circumferential length 10/a, and shift them in x-direction until they touch the y-axis (i.e. "project" the pieces). This transformation only increases the probability, since we may only decrease the radial part of all points in each piece. Now, since the mentioned angle is bounded, at most a constant number of the shifted segments overlap. One can show that this constant is actually equal to 2; in other words, non-neighboring shifted pieces are disjoint. Therefore, the total probability in (3.15) is bounded accordingly. Let us continue to estimate the first term of (3.14) by using our random grid approximation, Lemma 3.2. Let β = 7 and let S be a random variable, independent of X and uniformly distributed on $[0, 1]^2$. Conditioned on X, we have (3.16) with overwhelming probability. Due to our explicit choice of the functions $f_1$ and $f_2$ as products of shifted radially symmetric functions, the partial derivatives are fairly simple. Each derivative that hits one of the $\varphi_a$ produces a factor of a; more precisely, any k-th directional derivative satisfies $\|\partial^{(k)} f_1\|_\infty \lesssim a^k$. This estimate, again, is independent of the choice of the ball $B_R(z_0)$.
Together with the Riemann approximation (3.10), we conclude that for any matrix X we have, with overwhelming probability, an error of order $O((a^3 + a^2 \log^2 n) n^{-\alpha/2})$, where the supremum runs over all choices of $B \subseteq B_{1-\tau} \cup B^c_{1+\tau}$. Since we will always choose $a \lesssim n$ (in fact even smaller, cf. (3.17)), it is possible to choose α > 0 sufficiently large such that the error is arbitrarily small. For instance, α = 13 is more than enough to ensure that the error is of order $O(n^{-1})$. It should be emphasized that no randomness of X has been used so far; the only randomness is the shifted grid.
Combining the previous steps yields the corresponding bound; in the same way, the bound holds for $\mu_n^m(V^c)$ as well. Finally, we use the randomness of X by applying Proposition 3.1. Conditioning on S, i.e. freezing the lattice points $z_i$, we obtain for any Q > 0
$\mathbb P\Big( |U_n(z_i) - U_\infty(z_i)| \ge c \frac{\log^4 n}{n} \,\Big|\, S \Big) \le n^{-Q-\alpha}$
for each $i = 1, \dots, n^\alpha$. By the union bound this implies that with probability at least $1 - n^{-Q}$ the logarithmic potentials concentrate like $U_n(z_i) - U_\infty(z_i) = O(\log^4 n/n)$ simultaneously at all lattice points. Therefore, for k = 0, 1, 2, the integral of the $a^3$-Lipschitz function $|\Delta \tilde f_k|$ may be approximated by its Riemann sum. Write $\Delta = 4\partial\bar\partial$ in terms of the Wirtinger derivatives $\partial = \frac 12(\partial_x - i\partial_y)$ and $\bar\partial = \frac 12(\partial_x + i\partial_y)$. Since $g(z) = z^m$ is holomorphic, i.e. $\bar\partial g = 0$, we obtain by applying the chain rule and changing variables from z to g(z)
$\|\Delta \tilde f_k\|_{L^1} = \|4\partial\bar\partial(f_k \circ g)\|_{L^1} = \|4\,((\partial\bar\partial f_k) \circ g) \cdot \bar\partial \bar g \cdot \partial g\|_{L^1} = \|\Delta f_k\|_{L^1}.$
Since $\Delta f_k \lesssim a^2$ and has support on an area of order $a^{-1}$, we obtain the corresponding bound. Optimizing in a yields $a = \sqrt n/\log^2 n$ and the error $\log^3 n/\sqrt n$ for m = 1. The asymptotic error $h_m(n)$ for higher m follows from choosing $a = n^{m/(m+2)} \log^{-4m/(m+2)} n$.
In the proof we have seen that the maximal error for the limiting distribution $\mu_\infty^m$ is attained by balls touching the origin (technically these balls of growing size are not even admissible here). This yields the non-optimal rate in Theorem 1.6 if we do not exclude the origin. Having Theorem 1.3 in mind, however, we expect the maximizing ball to appear roughly at $B_1(0)$, where the error would be optimal again.
Proof of Corollary 1.7. As long as the origin is avoided by a fixed distance τ, the density of $\mu_\infty^m$ is bounded, as in the case m = 1. Therefore, the only adjustment to the previous proof of Theorem 1.6 is that the error term in (3.15) is now also given by $h_m(a) = O(a^{-1})$.
Since the remaining part of the proof remains untouched, Corollary 1.7 follows.