Optimal multi-resolvent local laws for Wigner matrices

We prove local laws, i.e. optimal concentration estimates for arbitrary products of resolvents of a Wigner random matrix with deterministic matrices in between. We find that the size of such products heavily depends on whether some of the deterministic matrices are traceless. Our estimates correctly account for this dependence and they hold optimally down to the smallest possible spectral scale.

and again the error terms are optimal. Note that these error terms match the differentiation procedure; indeed ( . ) can formally be obtained by "differentiating" ( . ). Such algebraic ideas, however, do not help much further if we ask for concentration of the alternating product $G(z_1)B_1G(z_2)B_2G(z_3)\cdots B_{k-1}G(z_k)$ ( . ) of resolvents and deterministic matrices $B_1, B_2, \ldots$, and more generally for $f_1(H)B_1f_2(H)B_2\cdots B_{k-1}f_k(H)$, where the $f_i$'s are arbitrary functions on $\mathbb{R}$. The product ( . ) still concentrates but its deterministic approximation, denoted by $M(z_1, B_1, z_2, \ldots, B_{k-1}, z_k)$, is non-trivial even for the Wigner case and it was identified only recently in [ , Theorem . ] (however, formulas for traces of ( . ) when the $f_i$'s are polynomials have already been obtained within free probability theory, see e.g. [ , Theorem . . ] or [ , Sect . Thm .]). The main result of the current work is to prove the optimal error term for this approximation and thus to establish the optimal local law for any product of the type ( . ) when $H$ is from the Wigner ensemble (Theorem . ). These optimal multi-resolvent local laws will then be used to establish the universality of the Gaussian fluctuations of ( . ) in subsequent works. To keep the current paper focused, we present here only one simple application of our new local law to improve our control on the thermalisation effect of the Wigner matrices (see Remark . below).
In connection with CLT for linear eigenvalue statistics, special cases of tracial local laws for ( . ) for k = 2, 3 have been proven in [ , , , , , , ]. These results, however, considered the special Bi = I case, where resolvent identities can directly reduce the number of G's. More importantly, the accurate analysis of the case with general B's must handle traceless B's separately as we explain in the next subsection.
. . The role of the traceless matrices. The major complication for the multi-resolvent local law is that the size of $M(z_1, B_1, z_2, \ldots, B_{k-1}, z_k)$ depends heavily on whether some of the matrices $B_i$ are traceless or not, and the error term must match the size of $M$ to be considered optimal. For example, if $B_1 = B_2 = \cdots = B_{k-1} = I$, then $M(z_1, I, z_2, \ldots, I, z_k) \sim (1/\eta)^{k-1}$ with $\eta := \min_i |\Im z_i|$ in the interesting regime where $\eta \lesssim 1$, and the corresponding local law $|\langle G(z_1)G(z_2)G(z_3)\cdots G(z_k) - M(z_1, I, z_2, \ldots, I, z_k)\rangle| \le N^{\epsilon}/(N\eta^{k})$ ( . ) is optimal (up to $N^{\epsilon}$) for $\eta \lesssim 1$. Note that the error term is by a factor $N^{\epsilon}/(N\eta)$ smaller than the deterministic approximation, hence ( . ) proves concentration for any $\eta \gg 1/N$. Exactly the same estimate holds for ( . ) with general deterministic matrices $B_i$ with $\|B_i\| = 1$ instead of $B_i = I$, see [ , Theorem . ]. However, if all $B_1, B_2, \ldots, B_k$ are traceless, $\langle B_i\rangle = 0$, then in the $\eta \lesssim 1$ regime typically $\langle M(z_1, B_1, z_2, \ldots, B_{k-1}, z_k)B_k\rangle \sim 1/\eta^{\lfloor k/2\rfloor - 1}$, therefore $N^{\epsilon}/(N\eta^{k})$ in ( . ) is much bigger than the deterministic approximation. This indicates that the robust error term proven in [ , Theorem . ] for general matrices is far from being optimal when traceless matrices are involved, but it does not give a hint what the optimal error term should be. The correct answer, in a heuristic form, can be formulated by the following rule of thumb that we coin the $\sqrt{\eta}$-rule (in the $\eta \lesssim 1$ regime):

$\sqrt{\eta}$-rule: Each traceless matrix $B_i$ reduces both the size of $M$ and the error term by a factor $\sqrt{\eta}$.
Establishing the $\sqrt{\eta}$-rule for $M$ is relatively straightforward given its explicit form, but for the error term it is much harder; this is the main content of the current paper. The special role of a traceless deterministic matrix even for the single resolvent local law was observed only recently in [ ], where it was shown that $|\langle (G - m)B\rangle| \le N^{\epsilon}/(N\sqrt{\eta})$ if $\langle B\rangle = 0$, in contrast to the much bigger error of order $1/(N\eta)$ for general $B$ in ( . ). In fact, $G - m$ has two different fluctuation modes, a tracial and a traceless one, expressed somewhat informally in the following two-scale central limit theorem where $\xi_1$ and $\xi_2$ are independent Gaussian variables and $\mathring{B} := B - \langle B\rangle$ is the traceless part of $B$. The asymptotics $\approx$ in ( . ) is understood in the sense of all moments and in the limit as $N\eta \gg 1$; see [ , Theorem . ] for the precise statement.
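The size difference between the identity and the traceless case can be seen numerically already for $k = 2$. The following is a minimal sketch, not part of the proof: it samples a Gaussian real symmetric Wigner matrix (one admissible choice of entry distribution) and compares $\langle G G^*\rangle \sim 1/\eta$ with $\langle G A G^* A\rangle \sim 1$ for a traceless diagonal sign matrix $A$. The comparison value $|m|^2\langle A^2\rangle$ is the known leading term for two resolvents with traceless $A$ and is used here as an assumption; the helper names are hypothetical.

```python
import numpy as np


def sample_goe_like(n, rng):
    """Real symmetric Gaussian Wigner matrix with E|w_ij|^2 = 1/n off the diagonal."""
    x = rng.standard_normal((n, n))
    return (x + x.T) / np.sqrt(2 * n)


def normalized_trace(a):
    """<A> = Tr(A) / N."""
    return np.trace(a) / a.shape[0]


def m_sc(z):
    """Stieltjes transform of the semicircle law (branch with Im m * Im z > 0)."""
    root = np.sqrt(z * z - 4 + 0j)
    m = (-z + root) / 2
    return m if m.imag * z.imag > 0 else (-z - root) / 2


rng = np.random.default_rng(0)
n, eta = 400, 0.1
z = 1j * eta
w = sample_goe_like(n, rng)
g = np.linalg.inv(w - z * np.eye(n))
g_star = g.conj().T  # equals G(conj(z)) because w is real symmetric

# Traceless test matrix: alternating signs on the diagonal, <A> = 0, <A^2> = 1.
a = np.diag(np.where(np.arange(n) % 2 == 0, 1.0, -1.0))

size_identity = normalized_trace(g @ g_star).real          # of order 1/eta
size_traceless = normalized_trace(g @ a @ g_star @ a).real  # of order 1
prediction_traceless = abs(m_sc(z)) ** 2                    # |m|^2 <A^2>

print(size_identity, size_traceless, prediction_traceless)
```

With two traceless matrices the rule predicts a reduction by $(\sqrt{\eta})^2 = \eta$ relative to the identity case, which is the order of magnitude the two printed sizes differ by.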
Tracking the influence of the traceless deterministic matrices in multi-resolvent local laws for Wigner matrices played an essential role in our proof of the Eigenstate thermalisation hypothesis [ ], and in the functional central limit theorems to understand the fluctuation modes of $f(W)$ as a matrix [ ]. However, in these papers only two- and three-resolvent local laws were necessary and a suboptimal error was sufficient. For example, a key technical ingredient in [ ] was the local law smallness effect needs to be consistently tracked along the whole expansion. For example, in the main technical Theorem . in [ ], we meticulously counted the number of "effectively" traceless $B$ factors, struggling with the complication that some $B$ factor becomes $B^2$ along the cumulant expansion, losing its smallness effect. Even suboptimal error terms for small $k$ as in ( . ) required major efforts and the general case was out of reach.
Our new method drastically simplifies this procedure using two unrelated ideas. First, the large Feynman diagrammatic representation is actually due to an overexpansion of the fluctuating error term, which can be considerably reduced if one expands "minimalistically", so to say. In the context of single resolvent averaged local laws this idea appeared first in [ ], where it was coined recursive moment estimates. We will use this philosophy for the multi-resolvent situation and also for the isotropic case.
Second, the fundamental concern in the proofs of multi-resolvent local laws is how to truncate the resulting hierarchy involving longer and longer chains of the form $GBGB\ldots G$. The cumulant expansion for a chain of length $k$ as in ( . ) will contain chains of length up to $2k$. For the single resolvent local law, $k = 1$, this problem is usually solved by the Ward identity $GG^* = \Im G/\eta$, immediately reducing longer chains to a single resolvent. If traceless matrices are in between the $G$'s, such an identity is not directly applicable. In [ ] we solved this problem by considering the positive quantity $\Lambda^2 := \langle \Im G B\, \Im G B\rangle$ for traceless $B$ and estimated all longer chains in terms of $\Lambda$, to arrive, finally, at a simple Gronwall-type inequality for $\Lambda$, roughly of the type from which $\Lambda \lesssim 1$ immediately follows. The reduction of longer chains to $\Lambda$'s involved a careful Schwarz inequality within the spectral decomposition of $H$, for example for an averaged chain involving $2k$ resolvents (using $\Im G$'s instead of $G$ for illustrational simplicity) we used Here $\lambda_i$ and $\mathbf{u}_i$ are the eigenvalues and the orthonormal eigenvectors of $H$, respectively. The size of the l.h.s., based upon its deterministic approximation ( . ), is $\eta^{-k+1}$, while the r.h.s. is of order $N^{k-1}$, hence this inequality lost a factor $(N\eta)^{k-1}$. Very roughly, each summation in ( . ) effectively runs over $N\eta$ different $i$ indices and if each summand were independent, then an effective central limit theorem would reduce the size by a factor $1/\sqrt{(N\eta)^{2k}} = (N\eta)^{-k}$; in reality this effect is weaker by a factor $N\eta$. Nevertheless, for larger $k$'s this loss in the Schwarz inequality in ( . ) cannot be recovered from the smallness of higher order cumulants, which eventually results in suboptimal error terms in the local law in [ ]. Another complication is that the bound ( . ) is also needed for $(GB)^k$.
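The Ward identity $GG^* = \Im G/\eta$ mentioned above is an exact algebraic identity for the resolvent of any Hermitian matrix, and it is easy to confirm numerically. The sketch below assumes $\Im G := (G - G^*)/(2\mathrm{i})$ and $\eta = \Im z > 0$; the matrix size and spectral parameter are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.standard_normal((n, n))
h = (x + x.T) / np.sqrt(2 * n)  # any real symmetric (Hermitian) matrix works

z = 0.3 + 0.05j
eta = z.imag
g = np.linalg.inv(h - z * np.eye(n))

# Im G is the anti-Hermitian part of G divided by i.
im_g = (g - g.conj().T) / 2j

ward_lhs = g @ g.conj().T
ward_rhs = im_g / eta
ward_error = np.max(np.abs(ward_lhs - ward_rhs))
print(ward_error)  # zero up to floating point rounding
```

The identity follows from $G - G^* = G\,[(H - \bar z) - (H - z)]\,G^* = 2\mathrm{i}\eta\, GG^*$; no such one-line reduction is available once a traceless matrix sits between the two resolvents.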
Since spectrally G is much less localized than ℑG, technically we could not do the analysis locally in the spectrum and Λ was actually defined after taking a supremum over the real parts of the spectral parameters zi in G's.
The basic objects in the current paper are the appropriately rescaled versions of the differences $\langle (GB)^k - M_kB\rangle$ between alternating chains of length $k$ and their deterministic counterparts $M_k$. More precisely, we set $\Psi_k^{\mathrm{av}} := N\eta^{k/2}\,|\langle (GB)^k - M_kB\rangle|$ ( . ) and its isotropic version $\Psi_k^{\mathrm{iso}}$ is defined similarly. The general definition allows for different spectral parameters and different $B$ matrices in the $GBGBGB\ldots$ chain but we ignore this technicality here. The rescaling is chosen such that $\Psi_k^{\mathrm{av,iso}} \lesssim 1$ corresponds to the optimal local laws to be proven. The "minimalistic" cumulant expansion applied directly to the moments of the $\Psi$'s generates further chains of alternating products of resolvents and $B$'s. Each of them is expressed as its deterministic "main term" $M$ plus the error term involving $\Psi$'s, i.e. for this purpose we write ( . ) as $\langle (GB)^k\rangle = \langle M_kB\rangle + O\big(\Psi_k^{\mathrm{av}}/(N\eta^{k/2})\big)$ and similarly for matrix elements $[(GB)^k]_{ab}$. The explicit $M_k$ terms can be directly estimated, leaving us with a nonlinear infinite hierarchy of coupled master inequalities for $\Psi_k^{\mathrm{av}}$ and $\Psi_k^{\mathrm{iso}}$ for each $k$ (Proposition . ). The estimate for $\Psi_k$ still contains terms involving $\Psi_{2k}$ since the cumulant expansion generates longer chains. This time, however, we truncate the hierarchy in the most economical way; roughly speaking a chain of length $2k$ is split into two chains of length $k$ instead of $k$ chains of length two as in ( . ). Hence many fewer $N\eta$ factors are lost in the analogue of ( . ); the loss is only $(N\eta)^2$ for the averaged bounds and $N\eta$ in the isotropic bound, independently of $k$ (see Lemma . below).
Even after the reduction of longer chains to shorter ones, the new truncated system of master inequalities cannot be closed by simple algebra, in contrast to the single inequality ( . ) derived for $\Lambda$. We first prove a non-optimal a priori bound $\Psi_k^{\mathrm{av,iso}} \lesssim \sqrt{N\eta}$ for all $k$ with a step-two induction argument, successively improving the power of $N\eta$ in each step. Then we start the procedure all over again, but now we will not use the reduction of the $\Psi_{2k}$'s back to $\Psi_k$'s that would cost us $(N\eta)$ or $(N\eta)^2$ factors; we rather use the already proven a priori bound $\Psi_{2k} \lesssim \sqrt{N\eta}$ that loses only $\sqrt{N\eta}$. It turns out that such a loss can finally be compensated by the smaller size of the higher cumulants.

Summarizing, the key conceptual novelty in the current approach compared with [ ] is twofold. First, in [ ] we operated with upper bounds on the size of the chains, like ( . ), while now we operate on the level of the much more precise $\Psi$'s measuring the fluctuations of the chains, i.e. their deviations from their deterministic counterparts. This enables us to determine the leading order term for resolvent chains of any length, and to perform a more accurate analysis purely on the level of sub-leading deviations. Second, longer chains are split only into two smaller chains, so far fewer $(N\eta)$-factors are lost. However, the price for this higher accuracy is that we need to handle a new infinite system of inequalities for the $\Psi$'s. Finally, two important technical differences are that (i) we can work locally in the spectrum and (ii) now we use the minimalistic cumulant expansion that considerably shortens the argument.

Notation and conventions.
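We introduce some notations we use throughout the paper. For integers $l, k \in \mathbb{N}$ we use the notations $[k] := \{1, \ldots, k\}$ and $[k, l] := \{k, k+1, \ldots, l\}$ for $k < l$. By $\lceil\cdot\rceil$, $\lfloor\cdot\rfloor$ we denote the upper and lower integer part, respectively, i.e. for $x \in \mathbb{R}$ we define $\lceil x\rceil := \min\{m \in \mathbb{N} : m \ge x\}$ and $\lfloor x\rfloor := \max\{m \in \mathbb{N} : m \le x\}$. For positive quantities $f, g$ we write $f \lesssim g$ and $f \sim g$ if $f \le Cg$ or $cg \le f \le Cg$, respectively, for some constants $c, C > 0$ which depend only on the constants appearing in the moment condition, see ( . ) later. We denote vectors by bold-faced lower case Roman letters $\mathbf{x}, \mathbf{y} \in \mathbb{C}^N$, for some $N \in \mathbb{N}$. Vector and matrix norms, $\|\mathbf{x}\|$ and $\|A\|$, indicate the usual Euclidean norm and the corresponding induced matrix norm. For any $N \times N$ matrix $A$ we use the notation $\langle A\rangle := N^{-1}\,\mathrm{Tr}\, A$ to denote the normalized trace of $A$. Moreover, for vectors $\mathbf{x}, \mathbf{y} \in \mathbb{C}^N$ and matrices $A \in \mathbb{C}^{N\times N}$ we define $A_{\mathbf{x}\mathbf{y}} := \langle \mathbf{x}, A\mathbf{y}\rangle$.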
We will use the concept of "with very high probability", meaning that for any fixed $D > 0$ the probability of an $N$-dependent event is bigger than $1 - N^{-D}$ if $N \ge N_0(D)$. Moreover, we use the convention that $\xi > 0$ denotes an arbitrary small constant which is independent of $N$. We introduce the notion of stochastic domination (see e.g. [ ]): given two families of non-negative random variables $X = \big(X^{(N)}(u) : N \in \mathbb{N}, u \in U^{(N)}\big)$ and $Y = \big(Y^{(N)}(u) : N \in \mathbb{N}, u \in U^{(N)}\big)$ indexed by $N$ (and possibly some parameter $u$) we say that $X$ is stochastically dominated by $Y$, if for all $\epsilon, D > 0$ we have $\sup_{u \in U^{(N)}} \mathbf{P}\big[X^{(N)}(u) > N^{\epsilon}Y^{(N)}(u)\big] \le N^{-D}$ for large enough $N \ge N_0(\epsilon, D)$. In this case we use the notation $X \prec Y$ or $X = O_\prec(Y)$.

Main results
We start with the definition of the matrix model we consider.
Definition . . We call $W$ a Wigner matrix if it is an $N \times N$ random Hermitian matrix which satisfies the following properties. The off-diagonal matrix elements below the diagonal are centred independent, identically distributed (i.i.d.) real ($\beta = 1$) or complex ($\beta = 2$) random variables with $\mathbf{E}|w_{ij}|^2 = 1/N$. Additionally, in the complex case we assume that $\mathbf{E}\, w_{ij}^2 = 0$. The diagonal elements are centred i.i.d. real random variables with $\mathbf{E}\, w_{ii}^2 = 2/(N\beta)$. Furthermore, we assume that for every $q \in \mathbb{N}$ there is a constant $C_q$ such that $\mathbf{E}|\sqrt{N}w_{ij}|^q \le C_q$. ( . )

Remark . . The assumptions $\mathbf{E}\, w_{ij}^2 = 0$ in the complex case, and $\mathbf{E}\, w_{ii}^2 = 2/(\beta N)$ are made to make the presentation clearer. All our results can be easily extended to the case where these assumptions do not hold, but we refrain from doing so for notational simplicity.
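As a concrete instance of Definition . , the following sketch samples a Wigner matrix with Gaussian entries in both symmetry classes and checks the prescribed second moments empirically. Gaussian entries are only one admissible choice (the definition allows general distributions with finite moments), and the function names `sample_wigner` and `second_moments` are hypothetical.

```python
import numpy as np


def sample_wigner(n, beta, rng):
    """Gaussian Wigner matrix: E|w_ij|^2 = 1/n, E w_ii^2 = 2/(n*beta).

    beta = 1: real symmetric; beta = 2: complex Hermitian with E w_ij^2 = 0.
    """
    if beta == 1:
        x = rng.standard_normal((n, n))
        return (x + x.T) / np.sqrt(2 * n)
    if beta == 2:
        x = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (x + x.conj().T) / (2 * np.sqrt(n))
    raise ValueError("beta must be 1 or 2")


def second_moments(w):
    """Return (n*mean|w_ij|^2, n*mean w_ii^2, n*mean w_ij^2) over i > j and i."""
    n = w.shape[0]
    off = w[np.tril_indices(n, k=-1)]
    diag = np.real(np.diag(w))
    return n * np.mean(np.abs(off) ** 2), n * np.mean(diag ** 2), n * np.mean(off ** 2)


rng = np.random.default_rng(2)
w_real = sample_wigner(200, 1, rng)
w_complex = sample_wigner(200, 2, rng)
print(second_moments(w_real))     # roughly (1, 2, 1)
print(second_moments(w_complex))  # roughly (1, 1, 0)
```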
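We set $G(z) := (W - z)^{-1}$ to be the resolvent of the Wigner matrix $W$ with spectral parameter $z \in \mathbb{C} \setminus \mathbb{R}$. The optimal local law asserts that $G(z)$ is approximately equal to $m(z)I$ down to the microscopic scale $|\Im z| \gg 1/N$, where $m(z) = m_{\mathrm{sc}}(z) := \int_{-2}^{2} \frac{\sqrt{4 - x^2}}{2\pi(x - z)}\,\mathrm{d}x$ is the Stieltjes transform of the semicircular distribution.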
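The single-resolvent statement $\langle G(z)\rangle \approx m(z)$ can be checked numerically. The sketch below uses the closed form $m(z) = (-z + \sqrt{z^2 - 4})/2$ with the branch $\Im m \cdot \Im z > 0$, which solves $m^2 + zm + 1 = 0$; the Gaussian sampling, the matrix size and the spectral parameter are illustrative assumptions.

```python
import numpy as np


def m_sc(z):
    """Semicircle Stieltjes transform: root of m^2 + z m + 1 = 0 with Im m * Im z > 0."""
    root = np.sqrt(z * z - 4 + 0j)
    m = (-z + root) / 2
    return m if m.imag * z.imag > 0 else (-z - root) / 2


rng = np.random.default_rng(3)
n = 500
x = rng.standard_normal((n, n))
w = (x + x.T) / np.sqrt(2 * n)

z = 0.5 + 0.1j
eigenvalues = np.linalg.eigvalsh(w)

# <G(z)> = (1/N) * sum_i 1 / (lambda_i - z)
trace_g = np.mean(1.0 / (eigenvalues - z))
m = m_sc(z)
print(abs(trace_g - m), 1 / (n * z.imag))  # deviation versus the scale 1/(N eta)
```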
Theorem . . For any $z \in \mathbb{C} \setminus \mathbb{R}$ with $|z| \le N^{100}$, $d := \mathrm{dist}(z, [-2, 2])$, $\eta := |\Im z|$ and any deterministic vectors $\mathbf{x}, \mathbf{y}$ it holds that

Theorem . in this form, including both the $d < 1$ and $d \ge 1$ regimes, can be found in [ , Theorem . ] even for much more general random matrix ensembles allowing for correlations. Its tracial version and its special entry-wise version (where $\mathbf{x}, \mathbf{y}$ are coordinate vectors) have already been established in [ , Lemma B. ]. However, the really interesting $d < 1$ regime was proven much earlier: the tracial version in [ ], the entry-wise version in [ ] and the isotropic version in [ ]; with many other refinements and generalisations mentioned in the introduction. The $d \ge 1$ regime, sometimes called the global law, is much easier and most papers on the local law naturally excluded it for convenience, although they could have handled this regime, too, with some minor extra effort.
In case of several spectral parameters $z_1, z_2, \ldots$ we use the abbreviation $G_i := G(z_i)$. For our main result we recall from [ ] that the deterministic approximation to $G_1B_1G_2\cdots G_{k-1}B_{k-1}G_k$ for arbitrary deterministic matrices $B_1, \ldots, B_{k-1}$ is given by where $\mathrm{NC}[k]$ denotes the non-crossing partitions of the set $[k] = \{1, \ldots, k\}$ arranged in increasing order, and $K(\pi)$ denotes the Kreweras complement of $\pi$ [ ], e.g.
Moreover, the partial trace pTr π with respect to a partition π is given by with B(k) ∈ π denoting the unique block containing k.
For more details on these notations, see [ , Section ]. As an example we have for any matrix $B_1$ and for traceless matrices $A_1, A_2, A_3$. In the sequel we follow the notational convention that general deterministic matrices are denoted by $B$, while the letter $A$ is used to denote explicitly traceless matrices. We now give bounds on the size of the deterministic term $M(z_1, B_1, \ldots, B_{k-1}, z_k)$. The proof of this lemma is presented in Appendix A.
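To make the combinatorial objects entering $M$ concrete, the following sketch enumerates the non-crossing partitions of $[k]$ and computes a Kreweras complement. It assumes the common permutation description $K(\pi) = P_\pi^{-1}\circ\gamma$, where $P_\pi$ has the blocks of $\pi$ as increasing cycles and $\gamma = (1\,2\,\ldots\,k)$; other references use a rotated convention, so this choice is an assumption and all function names are hypothetical.

```python
from itertools import combinations


def set_partitions(elements):
    """Yield all set partitions of a list as lists of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition  # `first` forms its own block
        for i in range(len(partition)):  # or joins an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def is_noncrossing(partition):
    """No a < b < c < d with a, c in one block and b, d in another."""
    for b1, b2 in combinations(partition, 2):
        for a, c in combinations(sorted(b1), 2):
            inside = any(a < x < c for x in b2)
            outside = any(x < a or x > c for x in b2)
            if inside and outside:
                return False
    return True


def noncrossing_partitions(k):
    return [p for p in set_partitions(list(range(1, k + 1))) if is_noncrossing(p)]


def kreweras_complement(partition, k):
    """Blocks of P^{-1} o gamma, with gamma the long cycle (1 2 ... k)."""
    perm = {}
    for block in partition:
        s = sorted(block)
        for i, x in enumerate(s):
            perm[x] = s[(i + 1) % len(s)]
    inverse = {v: u for u, v in perm.items()}
    comp = {x: inverse[x % k + 1] for x in range(1, k + 1)}
    seen, blocks = set(), []
    for start in range(1, k + 1):
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = comp[x]
        if cycle:
            blocks.append(sorted(cycle))
    return blocks


nc_counts = [len(noncrossing_partitions(k)) for k in range(1, 7)]
print(nc_counts)  # Catalan numbers
print(kreweras_complement([[1, 2], [3]], 3))
```

The number of terms in the sum over $\mathrm{NC}[k]$ is the $k$-th Catalan number, and the block counts satisfy $|\pi| + |K(\pi)| = k + 1$, which the sketch can be used to confirm for small $k$.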
Exactly the same result ( . ) for $k = 1$ and $f \in H^2$ was proven in [ ], where we actually even proved a CLT for $\langle f(W)A\rangle$.
We remark that in Corollary . there is a significant improvement in the error term compared to [ , Theorem . ] where the matrices $B_j$ do not necessarily have trace zero. Namely, the Sobolev norm $\|\cdot\|_{H^k}$ in the error term of [ , Theorem . ] is here replaced by $\|\cdot\|_{H^{\lceil k - a/2\rceil}}$, with $a$ denoting the number of traceless matrices. For $a = 0$ the error terms in Corollary . coincide with the ones in [ , Theorem . ].

Remark . (Thermalisation). We now specialise Corollary . to $f(x) = e^{\mathrm{i}sx}$, with $s > 0$, and define $\varphi(s) := J_1(2s)/s$, where $J_1$ is the Bessel function of the first kind. The thermalisation result from [ , Corollaries . -. ] asserts that the unitary Heisenberg evolution generated by the Wigner matrix renders deterministic observables (matrices) asymptotically independent for large times. More precisely, for any deterministic matrices $B_1, B_2$ (for simplicity we only stated the case $k = 2$). Using the optimal local law for two resolvents in ( . a), by a very similar proof to the one of Corollary . , we conclude with $\langle A_1\rangle = \langle A_2\rangle = 0$. Note the improved error term in ( . ) compared to $s^2N^{-1}$ from ( . ), which allows us to prove that for any $s \ll N^{1/4}$ (instead of $s \ll N^{1/5}$ from ( . )), where we used that $\varphi(s)^2 \sim s^{-3}$ for $s \gg 1$. We remark that by Corollary . we obtain a similar improvement for any $k \ge 3$, but we refrain from stating it for notational simplicity.
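The function $J_1(2s)/s$ is the Fourier transform of the semicircle density, $\int_{-2}^{2} e^{\mathrm{i}sx}\frac{\sqrt{4 - x^2}}{2\pi}\,\mathrm{d}x$, which is how it arises for $f(x) = e^{\mathrm{i}sx}$. The sketch below confirms this numerically by comparing two independent quadratures: Bessel's integral representation of $J_1$ and the semicircle integral after the substitution $x = 2\cos\theta$. It is an illustration under these stated formulas, with hypothetical helper names.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule on a grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


def bessel_j1(x, num=4001):
    """J_1(x) = (1/pi) * int_0^pi cos(tau - x sin(tau)) dtau."""
    tau = np.linspace(0.0, np.pi, num)
    return trapezoid(np.cos(tau - x * np.sin(tau)), tau) / np.pi


def phi(s):
    """phi(s) = J_1(2s) / s."""
    return bessel_j1(2 * s) / s


def semicircle_characteristic(s, num=4001):
    """int e^{isx} rho_sc(x) dx = (2/pi) * int_0^pi cos(2 s cos(t)) sin(t)^2 dt."""
    theta = np.linspace(0.0, np.pi, num)
    integrand = np.cos(2 * s * np.cos(theta)) * np.sin(theta) ** 2
    return 2 / np.pi * trapezoid(integrand, theta)


for s in (0.5, 1.0, 3.0, 10.0):
    print(s, phi(s), semicircle_characteristic(s))

# Decay phi(s)^2 ~ s^{-3}: the rescaled value stays bounded for large s.
print(abs(phi(50.0)) * 50.0 ** 1.5)
```

Since $\varphi(s)^2 \sim s^{-3}$, an error term is negligible compared to the main term only while it is much smaller than $s^{-3}$; for the error $s^2N^{-1}$ this gives $s^5 \ll N$, i.e. the threshold $s \ll N^{1/5}$ quoted above.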
We give a detailed proof of Theorem . for the much more involved $d \le 1$ regime; in particular, in this case $\eta \le 1$. In Appendix B we explain the necessary modifications for the $d \ge 1$ case. At a certain technical point (within the proof of Lemma . ), the proof for the $d \le 1$ regime uses ( . a) for the $d \ge 1$ regime, but this lemma is not needed for the proof in the $d \ge 1$ regime, so our argument is not circular. With the exception of Appendix B, throughout the rest of the paper we assume that $d \le 1$, hence $\eta \le 1$.
For traceless deterministic matrices $A_j$, $\|A_j\| \le 1$, $\langle A_j\rangle = 0$, deterministic bounded vectors $\mathbf{x}, \mathbf{y}$, $\|\mathbf{x}\| + \|\mathbf{y}\| \le 1$, and for $k \ge 1$ we introduce the normalized differences where For convenience we extend these definitions to $k = 0$ by and note that by the well known single-resolvent local law [ , , ]. Note that the index $k$ counts the number of traceless matrices. For notational convenience we also introduce the concept of $\epsilon$-uniform bounds.
Definition . . Fix any $\epsilon > 0$ and $l \in \mathbb{N}$. Let $k \in \mathbb{N}$, then we say that the bounds hold $(\epsilon, l)$-uniformly for some control parameters $\mathcal{E}^{\mathrm{av/iso}} = \mathcal{E}^{\mathrm{av/iso}}(N, \eta)$, depending only on $N, \eta$, if the implicit constants in ( . ) are uniform in bounded deterministic matrices $\|B_j\| \le 1$, deterministic vectors $\|\mathbf{x}\|, \|\mathbf{y}\| \le 1$, and spectral parameters $z_j$ with $1 \ge \eta := \min_j |\Im z_j| \ge lN^{-1+\epsilon}$, $|z_j| \le N^{100}$. For simplicity, we say ( . ) holds $\epsilon$-uniformly if it holds $(\epsilon, 1)$-uniformly. Moreover, we may allow for additional restrictions on the deterministic matrices, and talk about uniformity under the additional assumption that some of the matrices are traceless, or some of them are multiples of the identity matrix, etc.
Note that ( . ) is stated for each fixed choice of the spectral parameters $z_j$ in the left hand side, but in fact it is equivalent to an apparently stronger statement, when the same bounds hold with suprema over the spectral parameters $z_j$. More precisely, if $\mathcal{E}^{\mathrm{av}} \ge N^{-C}$ for some constant $C$, then ( . ) implies (and similarly for the isotropic bound), where the supremum is taken over all choices of $z_j$'s in the admissible spectral domain, i.e. with $|z_j| \le N^{100}$ and $1 \ge \min_j|\Im z_j| \ge lN^{-1+\epsilon}$. This bound follows from ( . ) by the usual grid argument. Indeed, we may apply ( . ) for a dense $N^{-10k}$-grid of $k$-tuples of complex numbers within the spectral domain. The number of such tuples is at most polynomial in $N$ and we use the standard property of stochastic domination to conclude $\max_i X_i \prec C$ from $X_i \prec C$ as long as the number of $i$'s is at most polynomial in $N$. Finally, we can use the Lipschitz continuity (with Lipschitz constant at most $\eta^{-k-1} \le N^{k+1}$) of the left hand side of ( . ) to extend the bound to all spectral parameters in the spectral domain. In the sequel we will frequently use this equivalence between ( . ) and ( . ), e.g. when we integrate such bounds over some spectral parameter.

We first establish the following key lemma which allows us to conclude multi-resolvent local laws for general deterministic matrices from the special case where each deterministic matrix is traceless.
Lemma . . Fix $\epsilon > 0$, $l \in \mathbb{N}$ and $k > 0$ and assume that for all $1 \le j \le k$ and some control parameters $\psi_j^{\mathrm{av/iso}}$ the a priori bounds have been established $(\epsilon, l)$-uniformly in traceless matrices. Then it holds that $(\epsilon, l + 1)$-uniformly in vectors $\mathbf{x}, \mathbf{y}$ and deterministic matrices $B_1, \ldots, B_k$, out of which $0 \le a \le k$ are traceless and $0 \le b \le k$ are multiples of the identity.
Using Lemma . we reduce Theorem . to the following Lemma.
Proof of Theorem . . Theorem . is equivalent to Lemma . in case when all matrices are traceless. The general case follows from Lemma . and setting ψ av/iso k = 1 due to Lemma . .
We prove Lemma . in two steps and first establish a weaker bound as stated in the following lemma.
The rest of the proof is organised as follows: First, we prove Lemma . , then in Section . we state the master inequalities on the Ψ av/iso k parameters, which we then use to prove Lemmas . and . in Section . . Finally, the proof of the master inequalities will be presented in Section .
Proof of Lemma . . We start the proof by splitting all those $k - a - b$ matrices $B_i$ that are neither traceless nor multiples of the identity as $B_i = \langle B_i\rangle + \mathring{B}_i$. Since ( . ) is multi-linear in the $B$-matrices and the error terms in ( . ) are monotonically decreasing as $a$ or $b$ are increased, it is sufficient to prove Lemma . for the special case when $a + b = k$, i.e. all matrices are either traceless or multiples of the identity.
Moreover, if $\Im z_i \Im z_j < 0$ then we use the resolvent identity $G(z_i)G(z_j) = [G(z_i) - G(z_j)]/(z_i - z_j)$ and $|z_i - z_j| \ge \eta$ repeatedly to further reduce the lemma to the special case where $\langle A_i\rangle = 0$ and $\mathrm{sgn}(\Im z_{i,1}) = \cdots = \mathrm{sgn}(\Im z_{i,k_i})$ for all $i$. We note that ( . ) satisfies the same relation since by definition. Finally, we have that ( . ) for $\sigma = \mathrm{sgn}(\Im z_i) = \cdots = \mathrm{sgn}(\Im z_{i+n})$ due to multi-linearity and the residue theorem. By using ( . ) for each product in ( . ) we obtain an alternating chain of traceless matrices and resolvents, so that the bound follows by the assumptions in ( . ).
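The two algebraic reductions used in this proof, the tracial/traceless splitting $B = \langle B\rangle + \mathring{B}$ and the resolvent identity for spectral parameters in opposite half planes, can be checked directly. This is a minimal numerical sketch with arbitrary illustrative parameters; the helper name `resolvent` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x = rng.standard_normal((n, n))
h = (x + x.T) / np.sqrt(2 * n)
identity = np.eye(n)


def resolvent(z):
    return np.linalg.inv(h - z * identity)


# Spectral parameters in opposite half planes: Im z1 * Im z2 < 0.
z1, z2 = 0.2 + 0.1j, -0.4 - 0.3j
g1, g2 = resolvent(z1), resolvent(z2)

# Resolvent identity: G(z1) G(z2) = [G(z1) - G(z2)] / (z1 - z2).
identity_error = np.max(np.abs(g1 @ g2 - (g1 - g2) / (z1 - z2)))

# Opposite signs of the imaginary parts guarantee |z1 - z2| >= eta.
eta = min(abs(z1.imag), abs(z2.imag))
gap = abs(z1 - z2)

# Splitting of a general matrix into tracial and traceless parts.
b = rng.standard_normal((n, n))
b_trace = np.trace(b) / n
b_ring = b - b_trace * identity

print(identity_error, gap >= eta, abs(np.trace(b_ring)) / n)
```

Each application of the identity trades a product of two resolvents for a single resolvent at the cost of a factor at most $1/\eta$, which is why the reduction can be repeated until all remaining spectral parameters within a product lie in the same half plane.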
. . Master inequalities and reduction lemma. From now on every deterministic matrix Ai is assumed to be traceless and uniformity is understood as uniformity in traceless matrices.

Proposition . (A priori estimates on
(ii) Now, let $k > 2$ and assume that a priori bounds have been established $(\epsilon, l)$-uniformly. Then it holds that

Since in Proposition . resolvent chains of length $k$ are estimated by resolvent chains of length up to $2k$, we will need the following reduction lemma in order to avoid an infinite hierarchy of inequalities with higher and higher $k$-indices.
The proofs of Proposition . and Lemma . will be given in Section and Section , respectively.
. . Proof of the bounds on Ψ av/iso in Lemmas . and . .
Proof of Lemma . . Within the proof we repeatedly appeal to a simple argument we call iteration. By this we mean the following procedure. Fix an $\epsilon > 0$. Suppose that for any $l \in \mathbb{N}$ whenever $X \prec x$ holding $(\epsilon, l)$-uniformly implies $(\epsilon, l + l')$-uniformly for some constants $l' \in \mathbb{N}$, $B \ge N^{\delta}$, $A, C > 0$, and exponent $0 < \alpha < 1$, and we know that $X \prec N^{D}$ $(\epsilon, 1)$-uniformly initially (here $\delta$, $\alpha$ and $D$ are $N$-independent positive constants, other quantities may depend on $N$). Then by iterating ( . ) finitely many times (depending only on $\delta$, $\alpha$ and $D$) we arrive at Here $K = K(\delta, \alpha, D)$ depends only on $\delta$, $\alpha$ and $D$, but not on $\epsilon$. In our application $B \ge (N\eta)^{1/4}$ and therefore $\delta$ is practically some order one parameter depending only on the fixed $\epsilon$ in Theorem . .
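The mechanism behind the iteration argument can be illustrated on a deterministic toy recursion. The sketch below assumes the self-improving bound has the schematic form $x \mapsto A + x/B + x^{1-\alpha}C^{\alpha}$; this specific form is an assumption made for illustration (the displayed inequality is not reproduced above), and the numerical values of $A$, $B$, $C$, $\alpha$ and the starting bound are arbitrary.

```python
def iterate_bound(x0, a, b, c, alpha, steps):
    """Iterate the assumed self-improving map x -> a + x/b + x**(1-alpha) * c**alpha."""
    history = [x0]
    x = x0
    for _ in range(steps):
        x = a + x / b + x ** (1 - alpha) * c ** alpha
        history.append(x)
    return history


n_size = 1.0e6
delta, big_d = 0.1, 3
a, c, alpha = 1.0, 5.0, 0.5
b = n_size ** delta        # B >= N^delta
x0 = n_size ** big_d       # crude initial bound X < N^D

history = iterate_bound(x0, a, b, c, alpha, steps=60)
print(history[0], history[10], history[-1], a + c)
```

Each step gains roughly a factor $B \ge N^{\delta}$ while the bound is large, so about $D/\delta$ steps bring a crude polynomial bound $N^{D}$ down to a quantity comparable to $A + C$; this is why the number of iterations depends only on $\delta$, $\alpha$ and $D$.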
For any $j \le k$ and under the assumption ( . ) the reduction inequalities ( . ) and ( . ) simplify (recall that $k$ is even) to Then together with ( . a) and ( . b) it follows that and where we used the first inequality of ( . ) to estimate $\psi^{\mathrm{iso}}_{k+1}$ in the $\psi^{\mathrm{iso}}_{k-1}\psi^{\mathrm{iso}}_{k+1}$-term with $\psi^{\mathrm{av}}_2 = \sqrt{N\eta}$. Iterating ( . ) yields and by using ( . ) in ( . ) it follows that From ( . ) we immediately conclude $\Psi^{\mathrm{av/iso}}_k \prec \sqrt{N\eta}$ and by feeding this back into ( . ) finally that concluding the induction step.
Proof of Lemma . . This follows directly from Lemma . and Proposition . and induction on k.
We recall the definition of the second order renormalisation, denoted by underlining, from [ ]. For functions $f(W), g(W)$ of the random matrix $W$ we define $\underline{f(W)Wg(W)} := f(W)Wg(W) - \widetilde{\mathbf{E}}\,(\partial_{\widetilde W}f)(W)\widetilde{W}g(W) - \widetilde{\mathbf{E}}\,f(W)\widetilde{W}(\partial_{\widetilde W}g)(W)$, ( . ) where $\partial_{\widetilde W}$ denotes the directional derivative in the direction of a GUE matrix $\widetilde{W}$ that is independent of $W$. The expectation $\widetilde{\mathbf{E}}$ is w.r.t. this GUE matrix. Note that if $W$ itself is a GUE matrix, then $\mathbf{E}\,\underline{f(W)Wg(W)} = 0$, while for $W$ with a general distribution this expectation is independent of the first two moments of $W$; in other words the underline renormalises $f(W)Wg(W)$ up to second order. We note that the underline in ( . ) is a well-defined notation only when the position of the "middle" $W$ to which the renormalisation refers is unambiguous. This is the case in all of our proofs since $f, g$ will be products of resolvents not explicitly involving monomials of $W$.
We also note that the directional derivative of the resolvent is given by $\partial_{\widetilde W}G = -G\widetilde{W}G$; furthermore, we have $\widetilde{\mathbf{E}}\,\widetilde{W}B\widetilde{W} = \langle B\rangle$. For example, in case of $f = I$ and $g(W) = (W - z)^{-1} = G$ we have $\underline{WG} = WG + \langle G\rangle G$. Similarly, for $G_i = G(z_i)$ we also have indicating that the definition of the underline in ( . ) depends on the "left" and "right" functions $f$ and $g$, and even though $f(W)Wg(W) = Wf(W)g(W) = f(W)g(W)W$, their second order renormalisations are not the same. Using this underline notation and the defining equation for $m = m_{\mathrm{sc}}$, we have $G = m - m\underline{WG} + m\langle G - m\rangle G$. ( . )

The key idea of the proof of Proposition . is using ( . ) for some $G_j$ in $G_1A_1\cdots A_{k-1}G_k$ and extending the renormalisation to the whole product at the expense of adding resolvent products of lower order. For example, where on the right-hand side only products of resolvents with one deterministic matrix need to be understood. The renormalisation of the whole product will be handled by cumulant expansion exploiting that its expectation vanishes up to second order. We note that while $WG = GW$, replacing $G_2$ by $m_2 - m_2\underline{G_2W}$ instead of $m_2 - m_2\underline{WG_2}$ in ( . ) still gives a slightly different expression: A key ingredient for the proof is the following lemma which shows that the deterministic approximation $M$ defined in ( . ) satisfies the same recursive relations as suggested by ( . ).
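The single-resolvent expansion $G = m - m\,\underline{WG} + m\langle G - m\rangle G$ with $\underline{WG} = WG + \langle G\rangle G$ is an exact algebraic identity: it only uses $WG = I + zG$ and $m^2 + zm + 1 = 0$. The sketch below verifies it numerically for one sample; the sampling and parameters are illustrative assumptions.

```python
import numpy as np


def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with Im m * Im z > 0."""
    root = np.sqrt(z * z - 4 + 0j)
    m = (-z + root) / 2
    return m if m.imag * z.imag > 0 else (-z - root) / 2


rng = np.random.default_rng(5)
n = 60
x = rng.standard_normal((n, n))
w = (x + x.T) / np.sqrt(2 * n)

z = 0.3 + 0.2j
identity = np.eye(n)
g = np.linalg.inv(w - z * identity)
m = m_sc(z)
trace_g = np.trace(g) / n

# Second order renormalisation of W G for f = I, g = G.
underline_wg = w @ g + trace_g * g

rhs = m * identity - m * underline_wg + m * (trace_g - m) * g
expansion_error = np.max(np.abs(g - rhs))
print(expansion_error)  # zero up to rounding
```

Indeed, the right-hand side equals $m - mWG - m^2G = m - m(I + zG) - m^2G = -(m^2 + zm)G = G$, so all randomness beyond the leading term $m$ is carried by the underlined term and the small factor $\langle G - m\rangle$.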

We remark that the special j = 1 case of this lemma was already proven in [ , Lemma . ]. We will present a direct combinatorial proof for the general case in Appendix A. Alternatively, Lemma . can also be deduced from the original expansions for resolvent products with the full underline term. For example, taking the expectation of ( . ) for W being a GUE matrix and letting N → ∞ removes the full underline term and the error terms. Since the local law [ , Theorem . ] asserts that G1A1G2A2G3 asymptotically equals M (z1, A1, z2, A2, z3) in the N → ∞ limit for any fixed spectral parameters, we obtain the corresponding identity ( . ) for k = 3. The argument for general k is identical.
. . Proof of Proposition . . The proofs of the averaged and isotropic bounds are done separately below. For simplicity we do not carry the dependence on the spectral parameters zj and traceless matrices Aj but instead simply write G and A.
. . . Averaged bounds ( . a), ( . b) and ( . a). Within the proof we repeatedly make use of the a priori bounds ( . ) and ( . ) for j ≤ 2k. It is important to stress that after possibly applying Lemma . no chains of length more than 2k arise along our expansion hence the a priori bounds are needed up to index 2k only.
By ( . ) for the first $G$ and using the local law $|\langle G - m\rangle| \prec 1/(N\eta)$ we obtain By assumption ( . ) and ( . ) and Lemma . we have so we can replace each resolvent chain by its deterministic $M$-value plus the error term. In particular, for the middle term in the third line of ( . ) by a telescopic summation we have where we used that by assumption $\eta \lesssim 1$, the bounds ( . ) and $\langle M_2\rangle = \langle M_1A\rangle = 0$. Together with the deterministic identity ( . ) we conclude where we used $|\langle M_2A\rangle| \lesssim 1$ and $|\langle M_kA\rangle| \lesssim \eta^{1-k/2}$ for $k \ge 3$. We recall the cumulant expansion from [ , Eq. ( )] with an error term $\Omega_R$ which for the application in ( . ) below can be easily seen to be of size $\Omega_R = O(N^{-2p})$ for $R = 12p$. Here the first fraction represents the Gaussian contribution and $\sigma = N\,\mathbf{E}\, w_{12}^2 \in \{0, 1\}$ is determined by the complex/real symmetry class of $W$ due to Definition . . The sum in ( . ) represents the non-Gaussian contribution and $\kappa^{p,q}_{ab}$ denotes the joint cumulant of $p$ copies of $\sqrt{N}w_{ab}$ and $q$ copies of $\sqrt{N}\overline{w_{ab}}$. Using ( . ) and ( . ) and distributing the derivatives we obtain where $\Xi^{\mathrm{av}}_k(l, J, J_*)$ is defined as and the summation in ( . ) is taken over tuples $l \in \mathbb{Z}^2_{\ge 0}$ and multisets of tuples $J, J_* \subset \mathbb{Z}^2_{\ge 0} \setminus \{(0, 0)\}$. Moreover, we set $\partial^{(l_1, l_2)} := \partial^{l_1}_{ab}\partial^{l_2}_{ba}$, $|(l_1, l_2)| = l_1 + l_2$ and $\sum J := \sum_{j \in J}|j|$. For the first term in the third line of ( . ) we have due to Lemma . and $|\langle M_{2k+1}\rangle| \lesssim \eta^{-k}$ from ( . ). We now turn to the estimate on $\Xi^{\mathrm{av}}$ from ( . ). Due to the Leibniz rule the derivatives can be written as a sum of products of $(aa, bb, ab, ba)$-entries of resolvent chains of the form $GAG\cdots AG(A)$, e.g.
Thus we have the naive bounds where we used that $\psi^{\mathrm{iso}}_{k_i} = \sqrt{N\eta}$ for $k_i \le k - 2$ by assumption ( . ). In the proof of the bounds ( . ) we used that by ( . ) and the norm bound in ( . ) for the deterministic term. We will use ( . ) for any $k \ne 2$; the $k = 2$ case will be done slightly differently later. For $k \ne 2$, by ( . ) we obtain Note that estimating $\Xi^{\mathrm{av}}_k$ is necessary only if $|l| + \sum(J \cup J_*) \ge 2$ by ( . ), so the $N$-prefactor in ( . ) comes with a non-positive power. In fact, if $|l| + \sum(J \cup J_*) \ge 3$, then this factor removes the $\sqrt{N\eta}$ factor from the numerator, which will be sufficient for our purpose.
In case $|l| + \sum(J \cup J_*) = 2$ we still wish to remove the $\sqrt{N\eta}$ factor, so we need to improve ( . ).
We use a standard procedure, called the Ward improvement, which relies on the fact that sums of the form $\sum_{ab}((GA)^nG)_{ab}$ can be estimated more efficiently than by just estimating each term one by one.
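The gain behind the Ward improvement is visible already in the simplest case without deterministic matrices ($n = 0$, a simplification made here for illustration). By the Schwarz inequality and the Ward identity, $\sum_{ab}|G_{ab}| \le N\big(\sum_{ab}|G_{ab}|^2\big)^{1/2} = N^{3/2}\sqrt{\langle \Im G\rangle/\eta}$, which beats the term-by-term bound of order $N^2$ as soon as $N\eta \gg 1$. The following sketch, with absolute values inside the sum and arbitrary parameters, confirms the inequality numerically.

```python
import numpy as np

rng = np.random.default_rng(6)
n, eta = 300, 0.05
x = rng.standard_normal((n, n))
w = (x + x.T) / np.sqrt(2 * n)

z = 0.1 + 1j * eta
g = np.linalg.inv(w - z * np.eye(n))
im_g_trace = np.trace((g - g.conj().T) / 2j).real / n  # <Im G>

entry_sum = np.sum(np.abs(g))                          # sum over all a, b of |G_ab|
ward_bound = n ** 1.5 * np.sqrt(im_g_trace / eta)      # Schwarz + Ward identity
naive_bound = float(n ** 2)                            # each entry bounded by O(1)

print(entry_sum, ward_bound, naive_bound)
```

The ratio between the two bounds is $\sqrt{\langle\Im G\rangle/(N\eta)}$, i.e. a gain of order $(N\eta)^{-1/2}$ per summation, which is the size of the improvement used in the text.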
Note that in ( . ), after distributing the derivatives according to the Leibniz rule, necessarily some resolvent chain $(GAG\cdots AG[A])$ appears with off-diagonal indices $(a, b)$ or $(b, a)$. Here the $[A]$ in square brackets indicates an optional matrix $A$ which may or may not be present. Indeed, an off-diagonal term comes from one of the products in ( . ) when $|j| = 1$ for some $j \in J \cup J_*$, and it comes from the $\partial^{l}((GA)^k)_{ba}$ factor when $|l| = 0$ or $|l| = 2$, by parity considerations. For such off-diagonal resolvent chains we use for $n \ge 1$. This allows us to gain a factor of $(N\eta)^{-1/2}$ compared with the naive bounds ( . ) that were used in ( . ), at the expense of replacing $1 + \psi^{\mathrm{iso}}_n/\sqrt{N\eta}$ by $1 + \psi^{\mathrm{av}}_{2n}/(N\eta)$. Thus, in case $|l| + \sum(J \cup J_*) = 2$ we can also improve upon ( . ) by a factor of $(N\eta)^{-1/2}$ and obtain where we used that $\psi^{\mathrm{av}}_{2j} = \sqrt{N\eta}$ for $j < \lfloor k/2\rfloor$ from ( . ). Combining this with the earlier discussed $|l| + \sum(J \cup J_*) \ge 3$ case, we obtain ( . ) for all cases. By plugging ( . ) and ( . ) into ( . ) we conclude and get the appropriate estimate on $\mathbf{E}|\cdots|^{2p}$ using Young inequalities. Since $p$ is arbitrary, it follows that concluding the proof of ( . a) and ( . a). Here we used that at least one factor in the $\psi^{\mathrm{av}}_j\psi^{\mathrm{av}}_{k-j}$ product from $\mathcal{E}^{\mathrm{av}}_k$ is equal to $\sqrt{N\eta}$ by using ( . ), since either $j$ or $k - j$ is smaller than or equal to $k - 2$ for $k \ne 2$. The proof of ( . b), i.e. the $k = 2$ case, is identical except that in the second line of ( . ) and in $\mathcal{E}^{\mathrm{av}}_2$ there are quadratic terms resulting in $(\psi^{\mathrm{iso}}_1)^2$, $(\psi^{\mathrm{av}}_1)^2$ in ( . b). This completes the estimates for the averaged quantities.
. . . Isotropic bounds ( . c), ( . d) and ( . b). Similarly to ( . ), for the isotropic local law we start by comparing $((GA)^kG - M_{k+1})_{\mathbf{x}\mathbf{y}}$ and $(GAW(GA)^{k-1}G)_{\mathbf{x}\mathbf{y}}$ We again replace the $G$-chains with their deterministic counterparts using where we used the upper bound on $(G^2(AG)^{k-1})_{\mathbf{x}\mathbf{y}}$ from ( . ). By a telescopic replacement we have and together with ( . ) we conclude from ( . ) that Thus, where and $\Xi^{\mathrm{iso}}_k(l, J, J_*)$ is defined as In order to estimate $\Xi^{\mathrm{iso}}_k$ we use the entrywise bounds ( . ) Note that in the second step of the first inequality we tacitly assumed that $k \ne 2$; the special case $k = 2$ will be discussed at the end of the proof. From ( . ) we directly obtain the naive bound Recalling the definition ( . ) and that we need to estimate $\Xi^{\mathrm{iso}}_k$ only when $|l| + \sum(J \cup J_*) \ge 2$ by ( . ), we claim that we can improve upon ( . ) by
(a) 4 factors of $(N\eta)^{-1/2}$ in case $|l| = 0$ and $|j| = 1$ for some $j \in J \cup J_*$ (implying $|J \cup J_*| \ge 2$),
(b) 3 factors of $(N\eta)^{-1/2}$ in case $|l| = 0$ and $|J \cup J_*| \ge 1$,
(c) 3 factors of $(N\eta)^{-1/2}$ in case $|j| = 1$ for some $j \in J \cup J_*$,
(d) 2 factors of $(N\eta)^{-1/2}$ otherwise,
at the expense of replacing a multiplicative factor of $1 + \psi^{\mathrm{iso}}_{k_i}/\sqrt{N\eta}$ by $1 + (\psi^{\mathrm{iso}}_{2k_i})^{1/2}/(N\eta)^{1/4}$ for each such improvement. Indeed, estimating gains factors of $(N\eta)^{-1/2}$ and $(N\eta)^{-1}$ respectively, compared to the naive bounds for one and two off-diagonal chains per summation index. Similar gains are possible for the summation over the $b$-index. We call a chain evaluated in $(\mathbf{x}, a)$ or $(\mathbf{y}, a)$ an $a$-chain (as in ( . a)-( . b)), and a chain evaluated in $(\mathbf{x}, b)$ or $(\mathbf{y}, b)$ a $b$-chain. We now check that, when performing the $a$ and $b$ summations, in each of the cases (a), (b), (c) and (d) the gains ( . a) and ( . b) can be used sufficiently often to obtain the claimed number of $(N\eta)^{-1/2}$ factors. Note that even if there were many $a$-chains, a gain is possible from at most two of them.
(a) Here both the $l$-factor $[(GA)_{xa}(G(AG)^{k-1})_{by}]$ (see ( . )) and the $j$-factor $\partial^{j}((GA)^k G)_{xy}$, after performing the derivative, contain exactly one $a$-chain and one $b$-chain each. Hence ( . b) can be used for both summations, and we gain four factors.
(b) Here the $l$-factor contains one $a$-chain and one $b$-chain, while the $j$-factor contains either an $a$-chain or a $b$-chain, and thus both ( . a) and ( . b) can be used once for the $a$-summation and once for the $b$-summation, gaining three factors.
(c) Due to $|j| = 1$, the $j$-factor contains one $a$-chain and one $b$-chain, while the $l$-factor contains either an $a$-chain or a $b$-chain, and thus both ( . a) and ( . b) can be used once, gaining three factors.
(d) The $l$-factor contains either one $a$-chain and one $b$-chain, or two $a$-chains, or two $b$-chains. In the first case we use ( . a) twice, and in the latter two cases we use ( . b) once in order to gain two factors in total.
Now we collect these improvements for ( . ). If $|l| + \sum(J \cup J_*) - |J \cup J_*| = 0$, then we are in case (a) and can gain four factors. If $|l| + \sum(J \cup J_*) - |J \cup J_*| = 1$, then either $|l| = 0$ and we are in case (b), or $|j| = 1$ for all $j \in J \cup J_*$ and we are in case (c), yielding three gained factors in both cases. Finally, if $|l| + \sum(J \cup J_*) - |J \cup J_*| \ge 2$, then case (d) applies with a gain of two factors. Note that the smaller number of gained factors is compensated by the higher power of $1/N$ in the prefactor in ( . ). Altogether we can conclude that ( . ) By plugging ( . ) and ( . ) into ( . ) we conclude ( . c) and ( . b).
For the special case $k = 2$, i.e. for the proof of ( . d), we note that in the first equality of ( . ) and in the estimate on $\mathcal{E}^{\mathrm{iso}}_k$ there are additional quadratic terms $(\psi^{\mathrm{iso}}_1)^2$ and $\psi^{\mathrm{iso}}_1 \psi^{\mathrm{av}}_1$, but otherwise the proof remains unchanged.
In order to prove Lemma . we first infer local laws for resolvent chains including some absolute value |G| from resolvent chains without absolute value. To formulate the precise statement, for any choices of gi(x) ∈ {1/(x − zi), 1/|x − zi|} we first generalise ( . ) to where sc• is the free cumulant function of sc[i1, . . . , in] := gi 1 · · · gi k sc. We note that the bounds ( . ) and their proofs verbatim also apply to this more generalised M . The following lemma generalises Lemma . to absolute values.
Lemma . . Fix ǫ > 0 and ℓ, k > 0 and assume that for 1 ≤ j ≤ k a priori bounds have been established (ǫ, ℓ)-uniformly in traceless matrices. Then z1, . . . , z k+1 ∈ C with η = mini|ℑzi| and Gi ∈ {G(zi), |G(zi)|} and corresponding gi( Proof. The proof is analogous to the special case given in Lemma . , with the additional step first of representing any |G| via as an integral over resolvents. Here we used the identity We note that M for g(x) = |x − E − iη| −1 satisfies the analogous identity . ) by multi-linearity. In ( . ) the lhs. is understood in the sense of ( . ), and the rhs. in the sense of ( . ).
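The integral representation of $|G|$ invoked in this proof can be sanity-checked numerically. The sketch below assumes that the representation has the standard form $|G(E + i\eta)| = \frac{2}{\pi}\int_0^\infty \Im G\big(E + i\sqrt{\eta^2 + s^2}\big)\,\frac{ds}{\sqrt{\eta^2 + s^2}}$, which follows from $\int_0^\infty \frac{ds}{a^2 + s^2} = \frac{\pi}{2a}$ with $a = |x - E - i\eta|$; the exact normalisation used in the paper is not recoverable from the text here, so this form is an assumption. The function names, the matrix size and the quadrature rule (substitution $s = \tan\theta$ with Gauss-Legendre nodes) are illustrative choices.

```python
import numpy as np


def abs_resolvent_direct(W, E, eta):
    """|G(E + i*eta)| computed from the spectral decomposition of a real symmetric W."""
    lam, U = np.linalg.eigh(W)
    # Sum over i of u_i u_i^T / |lam_i - E - i*eta|
    return (U / np.abs(lam - E - 1j * eta)) @ U.T


def abs_resolvent_integral(W, E, eta, n_nodes=200):
    """|G(E + i*eta)| via (2/pi) * int_0^inf Im G(E + i*tau(s)) / tau(s) ds,
    with tau(s) = sqrt(eta^2 + s^2) (assumed form of the representation).
    The substitution s = tan(theta) maps [0, inf) to [0, pi/2)."""
    N = W.shape[0]
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    theta = 0.25 * np.pi * (x + 1.0)   # nodes mapped from [-1, 1] to [0, pi/2]
    weights = 0.25 * np.pi * w
    identity = np.eye(N)
    total = np.zeros((N, N))
    for th, wt in zip(theta, weights):
        s = np.tan(th)
        tau = np.sqrt(eta**2 + s**2)
        G = np.linalg.inv(W - (E + 1j * tau) * identity)
        # ds = d(theta) / cos(theta)^2; Im G is the entrywise imaginary part
        # because W is real symmetric.
        total += wt * G.imag / tau / np.cos(th) ** 2
    return (2.0 / np.pi) * total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((30, 30))
    W = (B + B.T) / np.sqrt(2 * 30)    # GOE-type normalisation (assumption)
    diff = abs_resolvent_direct(W, 0.3, 0.5) - abs_resolvent_integral(W, 0.3, 0.5)
    print("max entrywise discrepancy:", np.max(np.abs(diff)))
```

The same computation with several $|G|$ factors leads to a multiple integral over resolvent chains at the shifted spectral parameters, which is the structure exploited in the estimate that follows.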
It remains to estimate the integral of the error term obtained from using ( . ) for each |G| and replacing the resulting resolvent chains by their deterministic equivalents. From now on we only consider the case a = k in the averaged version (the isotropic one is analogous). Proceeding as in Lemma . , the general case 0 ≤ a ≤ k − 1 is completely analogous and so omitted. The application of Lemma . is the only reason why ( . ) holds (ǫ, ℓ + 1)-uniformly. The proof that now follows for a = k holds (ǫ, ℓ)-uniformly. For notational simplicity in the following we denote all the deterministic matrices by A and resolvents by G (even if they are evaluated at different spectral parameters). For concreteness we assume that only two gi(x)'s are equal to |x − zi| −1 , the rest is (x − zi) −1 , i.e. k1 + k2 + 2 = k. Introducing the shorthand notations zi,s := Ei + i η 2 i + t 2 , M (z1,s, z k 1 +2,s ) := M (z1,s, A, z2, . . . , z k 1 +1 , A, z k 1 +2,s , A, z k 1 +3 , . . . , z k ), we have Note that to go from the second to the third line we used the trivial norm bound G(E + iη) η −1 to remove the very large s and t regime (and a similar bound for the deterministic term). Additionally, in the penultimate inequality we used ( . ) to bound the regime η ≤ 1, with η := mini |ℑzi|, and the averaged local law ( . a) in the regime η ≥ 1. Alternatively, we could have used [ , Theorem . ] in this latter regime.
Proof of Lemma . . Similarly to Section , to make the presentation simpler we do not carry the dependence on the spectral parameters zj and traceless matrices Aj but instead simply write G and A.
We first start with the bound in the average case and we distinguish two cases depending on whether k is even or odd. Let {λi} i∈ [N] be the eigenvalues of W , and let ui be the corresponding eigenvectors. For even k, using the shorthand notation T := A(GA) k/2−1 , we have In the last line we used Lemma . for a = k. This concludes the bound for even k.
Similarly, for odd k we have where to go to the last line we again used Lemma . for a = k. Additionally, to go from the first to the second line of ( . ) we used (with the shorthand notation T := A(GA) (k+1)/2−1 , S := A(GA) (k−1)/2−1 ) We now consider the isotropic case when k is even and j ≥ 1: Additionally, to go from the second to the third line we used that x, (GA) k/2 GA(GA) j−1 G(AG) k/2 y = ij x, (GA) k/2 ui ui, A(GA) j−1 uj uj, (AG) k/2 y . P C .
The proof of this corollary relies on the Helffer-Sjöstrand representation [ ], i.e. we express each fi(W ) in f1(W )A1 · · · f k (W ) as an integral of resolvents at different spectral parameters. Note that by eigenvalue rigidity (see e.g. [ , Theorem . ] or [ ]) the spectrum of W is contained in [−2−ǫ, 2+ǫ], for any small ǫ > 0, with very high probability. In particular this implies that it is enough to consider test functions fi ∈ H ⌈k−a/2⌉ 0 ([−3, 3]), i.e. Sobolev functions on R which are non-zero only on [−3, 3]. In fact, this can be always achieved by multiplying the original f with a smooth cut-off function without changing f (W ) up to an event of very small probability.
We present the proof only when all the matrices are traceless, i.e. when a = k. The proof in the general case is completely analogous and so omitted. −3, 3]) then we define its almost analytic extension by where χ(η) is a smooth cut-off equal to one on [−5, 5] and equal to zero on [−10, 10] c and f (j) denotes the j-th derivative. Then we have where d 2 z = dx dη denotes the Lebesgue measure on C ≡ R 2 with z = x + iη. Consider f1, . . . , Proof of Corollary . . This argument is very similar to the proof of [ , Theorem . ], hence here we only explain the main differences. Pick any ξ > 0 as a tolerance exponent in the definition of O≺(). Without loss of generality we can assume that maxi fi H ⌈k/2⌉ N 1−ξ (otherwise there is nothing to prove). We first prove the averaged case in ( . ), and then we explain the very minor changes required in the isotropic case.
We start with the bound which easily follows from ( . ). Set $\eta_0 := N^{-1+\xi/2}$; first we prove that the regime $|\eta_i| \le \eta_0$, for some $i \in [k]$, in the integral representation of $f_1(W)A_1 \cdots f_k(W)A_k$ from ( . ) is negligible. Here we only present the proof in the case when $|\eta_i| \le \eta_0$ happens only for a single index $i$; the changes when more than one of the $\eta_i$'s are small are exactly the same as explained above [ , Eq. ( . )], giving an even smaller bound. Without loss of generality we assume that $|\eta_1| \le \eta_0$. In this regime we claim that (with $z_i = x_i + i\eta_i$) To prove ( . ) we will use Stokes' theorem in the following form: for any $\eta \in [0, 10]$, and for any $\psi, h \in H^1(\mathbf{C}) \equiv H^1(\mathbf{R}^2)$ such that $\partial_{\bar z} h = 0$ on the domain of integration and for $\psi$ vanishing at the left, right and top boundary of the domain of integration. We will use ( . ) and the compact support of $(f_i)_{\mathbf{C}}$ to conclude that for any fixed $z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_k$. Using ( . ) repeatedly for the $z_2, \ldots, z_k$-variables, we conclude Additionally, we will use the following bound on products of $k$ resolvents which holds uniformly in $|\eta| \ge N^{-10k}$. For this bound we introduce $\rho(z) := \pi^{-1}|\Im m_{\mathrm{sc}}(z)|$, for any $z \in \mathbf{C} \setminus \mathbf{R}$, as the harmonic extension of the semicircle density, noting that $\rho(x + i0) = \rho_{\mathrm{sc}}(x)$.
Armed with all these ingredients, we have the following chain of inequalities in order to prove ( . ): |rhs. of ( . )| ≺ dx1 |f where in the first step we first used fi ∞ 1 for i ∈ [2, k] and after splitting the η1 integration, in the regime ηr ≤ |η1| ≤ η0 we used ( . ) together with for any |x1| ≤ 2, |η1| ≤ η0 from ( . ). In the complementary regime |η1| < ηr we used the trivial norm bound | G(z1)A1 · · · G(z k )A k | ≤ i G(zi)Ai ≤ i |ηi| −1 together with ( . ). In the penultimate inequality of ( . ) we also used that 1/ρ is finite due to the square root singularity of ρ, and that 1/ρ 2 log N thanks to the tiny N 2/3 -regularisation. This concludes the proof of ( . ). We now estimate the integration regime in ( . ) where |ηi| ≥ η0 for all i ∈ [k]. By ( . ) and the local law ( . a), we conclude that where we abbreviated M [k] = M (z1, A1, . . . , z k−1 , A k−1 , z k ). Note that in ( . ) we estimated the error term N −1 (min |ηi|) −k/2 coming from the local law ( . a) by with d 2 z := d 2 z1 . . . d 2 z k . More precisely, in ( . ) we considered the regime η1 ≤ η2 ≤ · · · ≤ η k (all the other regimes give the same contribution by symmetry) and performed k − 1 integration by parts in the zi-variables, i ∈ [2, k], as in ( . ), and then estimated the remaining ∂ z (f1)C(z1) by ( . ). The error term N −1 |η1| −k/2 from the local law together with the |η1| ⌈k/2⌉−1 bound from ( . ) and the integration in η1 yields ( . ). Finally, using that by ( . ) the regime ηi ∈ [ηr, η0] can be added back to ( . ) at the price of an error η0(N η0) k/2−1 maxi fi H ⌈k/2⌉ we conclude the proof of the averaged case in ( . ) modulo the computation of the leading deterministic term which is done exactly as in [ , Proof of Theorem . ] and so the details are omitted.
The proof of the isotropic case in ( . ) is very similar. The only differences are the following: (i) to bound the small ηi-regime we have to use ( . ) instead of ( . ), which still gives exactly the same bound ( . ); (ii) to estimate the error term coming from the isotropic local law ( . b) (used in the regime when |ηi| ≥ η0 for all i ∈ [k]) we have to replace ( . ) by The proof of ( . ) is exactly the same as the proof of ( . where Cn is the n-th Catalan number. Here we used (A. ) in the third and fourth step recalling that We note that (A. ) is sharp since (A. ) is sharp and leading order cancellations are impossible in the ultimate line. From the definition ( . ) it follows that pTr K(π) is non-zero only when no block of K(π) is a singleton {i} with Bi = 0, and therefore |K(π)| ≤ k − ⌈a/2⌉ or equivalently |π| ≥ 1 + ⌈a/2⌉. Thus ( . ) follows directly from ( . ).
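The equivalence between $|K(\pi)| \le k - \lceil a/2 \rceil$ and $|\pi| \ge 1 + \lceil a/2 \rceil$ used above rests on the identity $|\pi| + |K(\pi)| = k + 1$ for non-crossing partitions $\pi$ of $[k]$, whose number is the Catalan number $C_k$. The following sketch checks both facts by brute force for small $k$. It realises the Kreweras complement through the cycles of the permutation $\pi^{-1}\gamma$ with $\gamma = (1\,2\,\cdots\,k)$; this standard realisation is an assumption about conventions, it may differ from the paper's by a relabelling, but the number of blocks is unaffected. All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def set_partitions(elements):
    """Yield all set partitions of a list, each as a list of blocks (lists)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new singleton block.
        yield [[first]] + partition


def is_noncrossing(partition):
    """True if no two distinct blocks contain a < b < c < d with a, c in one
    block and b, d in the other."""
    for block1, block2 in combinations(partition, 2):
        for a, c in combinations(sorted(block1), 2):
            for b, d in combinations(sorted(block2), 2):
                if a < b < c < d or b < a < d < c:
                    return False
    return True


def kreweras_block_count(partition, k):
    """Number of blocks of the Kreweras complement, computed as the number of
    cycles of pi^{-1} gamma, where each block of pi is a cycle in increasing order."""
    pi = {}
    for block in partition:
        b = sorted(block)
        for x, y in zip(b, b[1:] + b[:1]):
            pi[x] = y
    pi_inv = {y: x for x, y in pi.items()}
    perm = {i: pi_inv[i % k + 1] for i in range(1, k + 1)}  # gamma(i) = i mod k + 1
    seen, cycles = set(), 0
    for start in range(1, k + 1):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return cycles


def check_kreweras_identity(k):
    """Return (number of non-crossing partitions of [k], whether
    |pi| + |K(pi)| = k + 1 holds for all of them)."""
    noncrossing = [p for p in set_partitions(list(range(1, k + 1))) if is_noncrossing(p)]
    holds = all(len(p) + kreweras_block_count(p, k) == k + 1 for p in noncrossing)
    return len(noncrossing), holds


if __name__ == "__main__":
    for k in range(1, 7):
        count, holds = check_kreweras_identity(k)
        print(k, count, comb(2 * k, k) // (k + 1), holds)
```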
Proof of Lemma . . We only prove ( . ) as the proof of ( . ) is completely analogous. We recall the alternative definition of M from [ , Eq. ( . )] where NCG[1, k] denotes the set of non-crossing graphs on the vertex set [1, k] = {1, . . . , k}, i.e. graphs without crossing edges (ab), (cd) with a < c < b < d. The graphs are identified with their edge sets E. Note that the connected components of any non-crossing graph E form a non-crossing partition of the set [1, k] that we denoted by π(E) in (A. ). For any fixed j ∈ [1, k], we now partition the set of non-crossing graphs as according to the idea that each non-crossing graph either (i) has j as an isolated vertex, or (ii) has a maximal l < j with (lj) ∈ E, and the graph can be written as the product of a graph inside and a graph outside the interval [l, j], or (iii) has no l < j with (lj) ∈ E but there is a maximal l > j with (jl) ∈ E, and the graph can be written as the product of a graph inside and a graph outside the interval [j, l]. The corresponding formal definitions used in (A. ) are given We note that for graphs E ∈ NCG[1, k] with an isolated vertex j whose edge-set is given by the edge-set E = E1 ∈ Gj of its restriction to [1, k] \ {j} we have pTr K(π(E)) (A [1,k) ) = pTr K(π(E 1 )) (A [1,j−2] , Aj−1Aj, A [j+1,k) ).

(A. )
Similarly for E = E1 ∪ E2 with E1 ∈ G i lj , E2 ∈ G o lj for some l < j we have pTr π(E) (A [1,k) ) = pTr K(π(E 1 )) (A [l,j−2] )Aj−1 pTr K(π(E 2 )) (A [1,l) , I, A [j,k) ) (A. ) since the vertices l + 1, . . . , j − 1 are necessarily in distinct connected components than the vertices 1, . . . , l − 1, j + 1, . . . , k due to the non-crossing property. Finally, for E = E1 ∪ E2 with E1 ∈ G i jl , E2 ∈ G o lj for some l > j we have by the same reasoning. Using this decomposition in (A. ), we thus obtain By (A. ) it follows directly that while for G o lj and G i jl we note that the graphs with or without the edges (lj) or (jl), respectively, give exactly the same tracial expression, and therefore The claim now follows from using (A. ) and (A. ) within (A. ) and using q lj /(1 + q lj ) = m l mj.
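The structural fact used in this proof, that the connected components of a non-crossing graph on $[1, k]$ form a non-crossing partition, can be verified by exhaustive enumeration for small $k$. The sketch below does this and also confirms that every non-crossing partition arises in this way, by counting the distinct component partitions against the Catalan number. The function names are hypothetical and the enumeration is only feasible for small $k$.

```python
from itertools import combinations
from math import comb


def edges_cross(e, f):
    """Edges (a, b), (c, d) with a < b and c < d cross if a < c < b < d
    (after ordering the two edges)."""
    (a, b), (c, d) = sorted([e, f])
    return a < c < b < d


def noncrossing_graphs(k):
    """Yield the edge sets of all graphs on {1, ..., k} without crossing edges."""
    all_edges = list(combinations(range(1, k + 1), 2))
    for r in range(len(all_edges) + 1):
        for edge_set in combinations(all_edges, r):
            if not any(edges_cross(e, f) for e, f in combinations(edge_set, 2)):
                yield edge_set


def components(edge_set, k):
    """Connected components of the graph on {1, ..., k}, via union-find."""
    parent = list(range(k + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edge_set:
        parent[find(a)] = find(b)
    blocks = {}
    for v in range(1, k + 1):
        blocks.setdefault(find(v), []).append(v)
    return list(blocks.values())


def is_noncrossing_partition(partition):
    """True if no two blocks contain interlacing elements a < b < c < d."""
    for block1, block2 in combinations(partition, 2):
        for a, c in combinations(sorted(block1), 2):
            for b, d in combinations(sorted(block2), 2):
                if a < b < c < d or b < a < d < c:
                    return False
    return True


def check_components(k):
    """Return (all component partitions are non-crossing, number of distinct
    component partitions) over all non-crossing graphs on {1, ..., k}."""
    all_ok = True
    distinct = set()
    for edge_set in noncrossing_graphs(k):
        partition = components(edge_set, k)
        all_ok = all_ok and is_noncrossing_partition(partition)
        distinct.add(frozenset(frozenset(block) for block in partition))
    return all_ok, len(distinct)


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, check_components(k), comb(2 * k, k) // (k + 1))
```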
Proof of Lemma . . Let $\epsilon > 0$ be arbitrarily small and set $J := N^{\epsilon}$. For any $|x| \le 2$, define $z(x, J) = x + i\eta(x, J)$, where $\eta(x, J)$ is uniquely defined implicitly via the equation $N\eta(x, J)\rho(z(x, J)) = J$. Note that $\eta(x, J) \gtrsim N^{-1+\epsilon}$. Denote by $\lambda_i$ the eigenvalues of $W$ and by $u_i$ the corresponding orthonormal eigenvectors. Additionally, we define the quantiles $\gamma_i$ implicitly by and we recall the rigidity bound (see e.g. [ , Theorem . ] or [ ]) Using this eigenvalue rigidity and the spectral decomposition of $W$, it is easy to see the following bound on the overlaps of the eigenvectors with a test matrix $A$: $|\langle u_i, A u_j\rangle|^2 \prec \frac{1}{N\rho(z(\gamma_i, J))\rho(z(\gamma_j, J))} \langle \Im G(z(\gamma_i, J)) A \Im G(z(\gamma_j, J)) A\rangle \prec \frac{1}{N\rho(z(\gamma_i, J))\rho(z(\gamma_j, J))}$ (A. ) for any $i, j \in [N]$. Here we neglected $N^{\epsilon}$-factors since $\epsilon > 0$ is arbitrarily small and eventually they can be incorporated in the $\prec$-notation. Note that in the last inequality of (A. ) we used ( . a) with $k = a = 2$ and that the corresponding deterministic term, a linear combination of $\langle M(z_i, A_1, z_j)A_2\rangle$, is bounded, see ( . ), where $z_i = z(\gamma_i, J)$ or $z_i = \bar z(\gamma_i, J)$.
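To illustrate the quantiles and the rigidity phenomenon recalled in this proof, the following sketch computes semicircle quantiles and compares them with the ordered eigenvalues of a sampled GOE matrix. The defining relation for the quantiles is not recoverable from the text here, so the sketch assumes the common convention $\int_{-2}^{\gamma_i} \rho_{\mathrm{sc}}(x)\,dx = i/N$; the GOE normalisation, the sample size and the function names are likewise illustrative assumptions.

```python
import numpy as np


def semicircle_cdf(x):
    """Distribution function of rho_sc(x) = sqrt(4 - x^2) / (2*pi) on [-2, 2]."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi


def semicircle_quantiles(N, iters=60):
    """Quantiles gamma_i solving cdf(gamma_i) = i / N (assumed convention),
    found by vectorised bisection on [-2, 2]."""
    target = np.arange(1, N + 1) / N
    lo = np.full(N, -2.0)
    hi = np.full(N, 2.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = semicircle_cdf(mid) < target
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)


def sample_goe_eigenvalues(N, seed=0):
    """Ordered eigenvalues of a real symmetric Gaussian matrix whose
    off-diagonal entries have variance 1/N."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((N, N))
    W = (B + B.T) / np.sqrt(2.0 * N)
    return np.linalg.eigvalsh(W)


if __name__ == "__main__":
    N = 400
    gamma = semicircle_quantiles(N)
    lam = sample_goe_eigenvalues(N, seed=1)
    print("max_i |lambda_i - gamma_i| =", np.max(np.abs(lam - gamma)))
    print("N^(-2/3) =", N ** (-2.0 / 3.0))
```

Comparing the printed maximal deviation with $N^{-2/3}$ for a few values of $N$ gives a rough feeling for the scale on which rigidity operates near the spectral edges.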
Given the overlap bound (A. ), we now present the proof of ( . ); the proof of ( . ) is completely analogous and so omitted. By spectral decomposition for each resolvent together with (A. ), using that ρ(z(x, J)) ∼ ρ(x + iN −2/3 ) for any |x| ≤ 2 (modulo N ǫ -factor), we find that where we used that Here δ > 0 is an arbitrary small constant (and we neglected N δ -factors since eventually it can be incorporated in the ≺-notation), and i0 = i0(j) is the index such that γ i 0 (j) is the closest quantile to the fixed xj = ℜzj. In the first inequality in (A. ) we used rigidity to replace λi and zj with the closest quantiles. In the last step in (A. ) we first used that ρ(γi+iN −2/3 ) and ρ(xj +iN −2/3 ) are comparable up to an N δ factor, again by rigidity, and then we used the trivial bound 1/|λi − zj| ≤ 1/|ηj | in the first sum and performed the second sum using the regular spacing of the quantiles.
The $d \ge 1$ regime is conceptually much simpler than $d \le 1$ for several reasons. First, there is no need to keep track of the traceless matrices separately. Second, the trivial norm estimate $\|G(z)\| \le 1/d$ is affordable without much loss. These two facts mean that long chains of the form $GAGA \cdots G$ can affordably be reduced to much shorter chains by estimating intermediate $A$ and $G$ factors simply by norm. This trivially takes care of the reduction problem, which is the key difficulty in the proof when $d \le 1$; in particular no analogue of Lemma . is needed. Furthermore, we will not need to introduce the quantities $\Psi^{\mathrm{iso/av}}$ and $\psi^{\mathrm{iso/av}}$ and gradually improve the estimates on them; the system of master inequalities reduces to a simple induction on the length $k$ of the resolvent chain.
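The trivial norm estimate and the resulting reduction of long chains can be illustrated numerically. The sketch below checks that, for a real symmetric matrix, the operator norm of the resolvent equals the inverse distance from $z$ to the spectrum, and that a chain $(GA)^k G$ is bounded in norm by $\|A\|^k \|G\|^{k+1}$. Here the distance is measured to the spectrum of the sampled matrix, whereas $d$ in the text is presumably measured to the limiting spectrum; this identification, the GOE normalisation and the function names are assumptions made for illustration only.

```python
import numpy as np


def resolvent(W, z):
    """G(z) = (W - z)^{-1}."""
    return np.linalg.inv(W - z * np.eye(W.shape[0]))


def resolvent_norm_and_inverse_distance(W, z):
    """Return (operator norm of G(z), 1 / dist(z, spec(W))) for symmetric W."""
    eigenvalues = np.linalg.eigvalsh(W)
    op_norm = np.linalg.norm(resolvent(W, z), 2)
    return op_norm, 1.0 / np.min(np.abs(eigenvalues - z))


def chain_norm(W, A, z, k):
    """Operator norm of the chain (G A)^k G at a single spectral parameter z."""
    G = resolvent(W, z)
    chain = G.copy()
    for _ in range(k):
        chain = G @ A @ chain
    return np.linalg.norm(chain, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 50
    B = rng.standard_normal((N, N))
    W = (B + B.T) / np.sqrt(2 * N)
    A = rng.standard_normal((N, N))
    z = 3.5 + 0.2j  # a spectral parameter away from the spectrum
    op_norm, inv_dist = resolvent_norm_and_inverse_distance(W, z)
    print("||G(z)|| =", op_norm, " 1/dist =", inv_dist)
    print("chain norm:", chain_norm(W, A, z, 3),
          " norm bound:", np.linalg.norm(A, 2) ** 3 * inv_dist ** 4)
```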
We will present the proof of the averaged law ( . a) for $d \ge 1$; the corresponding isotropic law ( . b) is completely analogous and will be omitted. The backbone of the argument is a greatly simplified form of Section . For notational simplicity, we again do not carry the precise dependence of the resolvents on the spectral parameters and we denote every deterministic matrix $A_i$ generically by $A$. Note that the $A$'s are not necessarily traceless.
We prove ( . a) by induction on $k$; the initial $k = 1$ case will be proven along the way. We now fix some $k \ge 1$ and, in the case $k \ge 2$, we assume that ( . a) has been proven for all resolvent chains of length at most $k - 1$. The starting point of the proof of ( . a) for $k$ is formula ( . ) that we repeat here

(B. )
Note that the $1/(N\eta)$ in the error term in the lhs. is replaced with $1/(Nd^2)$ since it came from the standard single resolvent local law from Theorem . . Notice that all but one of the chains in the rhs. of (B. ) have fewer than $k$ resolvents; these can be approximated by their deterministic counterparts using the induction hypothesis of the form The $k = 1$ case is particularly simple, since the first term in the rhs. of (B. ) is simply $m\langle A\rangle$ and the sum is absent. In the $k \ge 2$ case, for the remaining $(GA)^{k-1}G$ term we instead use the integral representation ( . ) and ( . ) in order to also estimate this term using the induction hypothesis as Thus, similarly to the telescopic summation ( . ) and using the deterministic identity ( . ), we obtain the following analogue of ( . ): where the error term $\mathcal{E}^{\mathrm{av}}_k$ has been appropriately redefined compared with ( . ). Now we fix any integer $p$ and compute the $2p$-th moment of the lhs. of (B. ) exactly as in ( . ) with the definition of $\Xi^{\mathrm{av}}_k$ given in ( . ). We follow the calculation from ( . ) through ( . ) but the estimates are greatly simplified as follows. Instead of ( . ) we now have by a trivial norm bound and $d \ge 1$. Note that we exploited the additional decay $|m| \lesssim 1/d$, unlike in ( . ) where $|m| \lesssim 1$ was used.
Now we turn to the estimate of $\Xi^{\mathrm{av}}_k$. The naive bounds ( . ) become as long as $j \ne 0$, and they again follow from the trivial norm estimates. Using these bounds in ( . ), we have .
Note that this bound gains a factor $1/\sqrt{N}$ compared to the trivial bound in (B. ) since the double sum now contributes only a factor $N^{3/2}$ instead of $N^2$. This gain is sufficient to improve (B. ) to
\[ \Xi^{\mathrm{av}}_k \prec \big(\mathcal{E}^{\mathrm{av}}_k\big)^{1 + \sum(J \cup J_*)}. \] (B. )
Plugging this estimate together with (B. ) into ( . ), using a Young inequality as we did when going from ( . ) to ( . ) and recalling that $p$ was arbitrary, we obtain $|\langle (GA)^k - M_k A\rangle| \prec \mathcal{E}^{\mathrm{av}}_k$, i.e. we proved ( . a) in the $d \ge 1$ regime. We omit the proof of ( . b) in the same regime since it can be obtained analogously, following a substantial simplification of the argument in Section . . along the same lines as the averaged bound was simplified following Section . . .