Power-law decay of weights and recurrence of the two-dimensional VRJP

The vertex-reinforced jump process (VRJP) is a form of self-interacting random walk in which the walker is biased towards returning to previously visited vertices with the bias depending linearly on the local time at these vertices. We prove that, for any initial bias, the weights sampled from the magic formula on a two-dimensional graph decay at least at a power-law rate. Via arguments of Sabot and Zeng, the result implies that the VRJP is recurrent in two dimensions for any initial bias.


Introduction
In this paper we study an interacting stochastic process known as the vertex-reinforced jump process (VRJP for short) in two dimensions, using an argument of Mermin-Wagner type. We start by describing VRJP, our object of study.
1.1. The vertex-reinforced jump process. VRJP was first studied in [3] as a continuous-time version of linearly edge-reinforced random walk (LRRW), a process studied earlier by Diaconis and Coppersmith (unpublished, 1987), who noted that it has an interesting property not shared by other reinforced random walks: partial exchangeability. Partial exchangeability for a discrete-time process means that the probability of any particular path depends only on the number of times each edge was crossed, and not on the order in which this happened. This property allows one to conclude, via a soft argument [5], that LRRW is in fact a random walk in random environment (RWRE) and, using a more elaborate argument, to obtain a formula for the distribution of the environment, known fondly as "the magic formula". See [16] for the history of the magic formula. For VRJP the picture is slightly different. The process is not a (continuous-time) RWRE as stated. Instead, the process becomes a RWRE after a time change and then has its own magic formula. A hint of the magic formula for VRJP appeared in [4] but the full picture was only revealed by Sabot and Tarrès [23]. The magic formula will be stated exactly below, in §1.2.
A second special property of VRJP is the connection to supersymmetry. We will not attempt to describe supersymmetry in detail in this short introduction, but roughly it postulates a symmetry between fermions and bosons. The specific supersymmetric model relevant to VRJP is the hyperbolic sigma model, defined by Zirnbauer [27,11] (see also [10,9]). The hyperbolic sigma model has two fermions and two bosons at each vertex, with an interaction that enjoys a hyperbolic symmetry. Integrating out both fermionic fields and one of the bosonic fields leads to a single field, let us denote it by $u$, but with a complicated interaction term. It was discovered in [23] that $e^{u}$ has exactly the same distribution as the environment described by the magic formula for the VRJP, establishing a link between these two topics. Supersymmetry brings a new set of tools to the problem, of which the most relevant to us is the Ward identity. It states that $\mathbb{E}\,e^{u_x} = 1$ always.
In this paper we study the VRJP in two dimensions. We show that the environment of its RWRE representation decays at least like a power law, namely like $|x|^{-c}$, where the constant $c$ may depend on the initial weight $a$ (see exact definitions below). For LRRW this was proved by Merkl and Rolles [17]. This result is not sharp for small $a$: in this case it was known [23,1] that in fact $u$ decays exponentially. The true decay rate for large $a$ thus remains open. We do not know if it is really a power law (and hence a transition of Kosterlitz-Thouless type occurs) or rather if the decay is exponential for all $a$. This is related to the question of asymptotic freedom in quantum field theory, but this introduction is too short to cover these connections.
While this paper was being written (which, unfortunately, took much too much time), Sabot gave an alternative proof of this result; see [22] (see also [6] for the quasi-one-dimensional case).

1.2. Exact definitions and statements.
Definition 1. Let $G$ be a finite graph, let $o$ be a vertex of $G$ and let $W : E(G) \to [0, \infty)$ be a function. The vertex-reinforced jump process (VRJP) on $G$ with initial vertex $o$ and weights $W$ is a continuous-time process $(Y_t)_{t \ge 0}$ on the vertices of $G$ defined as follows: $Y_0 := o$ and at every $t > 0$, $Y$ jumps from its current position $x$ to a neighbour $y$ with rate $W_{xy} L_y(t)$, where $L$ is the local time:
$$L_y(t) := 1 + \int_0^t \mathbf{1}\{Y_s = y\}\,ds. \tag{1}$$
The time-changed VRJP on $G$ with initial vertex $o$ and weights $W$ is the process
$$Z_s := Y_{D^{-1}(s)}, \qquad D(t) := \sum_{x \in G} \bigl(L_x(t)^2 - 1\bigr). \tag{2}$$
We remark at this point that the time change is not as bad as it looks at first sight: it applies at each vertex essentially independently, a fact that we use below in the proof of Lemma 7.
Convention. The u in the theorem below, and more generally any function defined on G \ {o} is considered to be zero on o.
Theorem 1 ("The magic formula"). Let G be a finite connected graph, let o be a vertex of G and let W : E(G) → (0, ∞).
(1) The function $\rho : \mathbb{R}^{G\setminus\{o\}} \to \mathbb{R}$ below is a probability density function:
$$\rho(u) := (2\pi)^{-(|G|-1)/2} \exp\Bigl(-\sum_{x \in G} u_x - \sum_{\{x,y\} \in E(G)} W_{xy}\bigl(\cosh(u_x - u_y) - 1\bigr)\Bigr)\sqrt{D(W,u)}, \tag{3}$$
where $D(W, u)$ is any diagonal minor of the matrix $A = (a_{xy} : x, y \in G)$ given by
$$a_{xy} := \begin{cases} -W_{xy}\, e^{u_x + u_y} & \{x,y\} \in E(G), \\ \sum_{z \sim x} W_{xz}\, e^{u_x + u_z} & x = y, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$
(Note that the convention $u_o = 0$ was used in the sums in (3).)
(2) Sample $u$ randomly from the density $\rho$. Let $(Z_s)_{s \ge 0}$ be a continuous-time random walk on $G$, with $Z_0 := o$, which transitions from $x$ to a neighbour $y$ with rate $\tfrac12 W_{xy} e^{u_y - u_x}$. Then $(Z_s)$ (considered after averaging over the randomness in $u$) is distributed as the time-changed VRJP on $G$ with initial vertex $o$ and weights $W$.

See Sabot and Tarrès [23, Theorem 2]. The result there is stated for $2\sinh^2(\tfrac12(u_x - u_y))$ instead of for $\cosh(u_x - u_y) - 1$ but this is of course the same. It is stated under the condition $\sum u_i = 0$ whereas our normalisation is $u_o = 0$ but, again, this is the same: the measure on $\{\sum u_i = 0\}$ used in [23] is not the volume measure but simply the measure one gets by fixing $u_x = 0$ for an arbitrary vertex $x$ (see the comment immediately after [23, Theorem 2]); and the $\rho$ is unchanged, except the term $e^{u_o}$ in [23] which becomes our $\exp(-\sum u_x)$.
The result of this paper is that in a two-dimensional graph the weights decay at least at a power-law rate. The result is local: only the structure in the vicinity of the point of interest is used. Here is the exact formulation.
Here and below, for vertices $x, y$ in a graph write $d(x, y)$ for their graph distance and, for integer $L \ge 0$, denote the closed ball of radius $L$ around $x$ by $B(x, L) := \{y : d(x, y) \le L\}$. We denote by $c$ and $C$ positive absolute constants whose value might change from line to line or even within the same line. We use $c$ for constants which are "small enough" and $C$ for constants which are "large enough". We use $c(\cdots)$ and $C(\cdots)$ for constants that depend on some parameters.
Theorem 2. There exist $C, c(a) > 0$ such that the following holds for each $a > 0$: Let $L \in \mathbb{N}$. Let $G$ be a finite connected graph with a distinguished vertex $o$ and assume that $B(o, L)$ is isomorphic to the ball $B((0, 0), L)$ in $\mathbb{Z}^2$. Let $W : E(G) \to (0, \infty)$ satisfy that $W|_{B(o,L)} \equiv a$. Let $u$ be sampled from the density (3) with respect to $G$ and $W$. Then for every $x \in B(o, 2L)$, Further, for every $a_0 > 0$ there exists $c(a_0) > 0$ such that $c(a) \ge c(a_0)/a$ for $a > a_0$.
Part of the motivation for proving a local result comes from its application in Sabot-Zeng [24]. They proved recurrence of two-dimensional VRJP, conditioned on Theorem 2 in the specific case of wired boundary conditions. Here is the exact formulation. For an integer $L \ge 1$ let $G_L$ be the graph with vertex set $\{-L, \dots, L\}^2 \cup \{\delta_L\}$, with vertices in $\{-L, \dots, L\}^2$ connected by an edge if they differ by exactly one in exactly one coordinate and with $\delta_L$ adjacent to every vertex $(x_1, x_2)$ for which either $|x_1| = L$ or $|x_2| = L$ (or both). Correspondingly, for a real $a > 0$, $W_L : E(G_L) \to \mathbb{R}$ satisfies $W_L(e) = a$ for all edges except the edges connecting $\delta_L$ with $(x_1, x_2)$ having $|x_1| = |x_2| = L$, for which $W_L(e) = 2a$ (as these edges result from two edges of $\mathbb{Z}^2$ when identifying the vertices of $\mathbb{Z}^2$ adjacent to $V_L$ into the single vertex $\delta_L$). Let $o = (0, 0) \in G_L$. Then Remark 7 in [24] says that if (5) holds for this graph then VRJP on the whole of $\mathbb{Z}^2$ is recurrent. Since this falls under our Theorem 2, this gives a proof of recurrence of VRJP. We remark that a proof of a weaker notion of recurrence was given recently in [2].
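For concreteness, the wired graph $G_L$ and the weights $W_L$ described above can be built explicitly. The following Python sketch is only an illustration of the definition; the function name `wired_box` and the label `"delta"` for the vertex $\delta_L$ are hypothetical choices made here, not notation from [24].

```python
from itertools import product

def wired_box(L, a):
    """Return (vertices, weights) for G_L.

    weights maps frozenset({v, w}) -> W_L(e) for every edge e = {v, w}.
    """
    delta = "delta"  # hypothetical label for the wired vertex delta_L
    box = list(product(range(-L, L + 1), repeat=2))
    weights = {}
    for (x1, x2) in box:
        # nearest-neighbour edges inside the box, each added once
        for nb in ((x1 + 1, x2), (x1, x2 + 1)):
            if max(abs(nb[0]), abs(nb[1])) <= L:
                weights[frozenset({(x1, x2), nb})] = a
        # boundary vertices are joined to delta; the four corners carry weight 2a
        if abs(x1) == L or abs(x2) == L:
            corner = abs(x1) == L and abs(x2) == L
            weights[frozenset({(x1, x2), delta})] = 2 * a if corner else a
    return box + [delta], weights
```

For $L = 2$ this gives $25 + 1$ vertices, $40$ edges inside the box and $16$ edges to $\delta_L$, four of which have weight $2a$.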

1.3. Overview of the proof. The core of the proof is an argument of Mermin-Wagner type, so let us start with a short discussion of this approach. For physicists, the Mermin-Wagner theorem states that continuous symmetries cannot be spontaneously broken in a system with short-range interactions in dimension 2 or lower (see e.g. [26, p. 198]). Every use of the Mermin-Wagner approach starts with a perturbation argument: a calculation (usually easy to do) shows that it is possible to take one instance of the field $u$ and then deform it so that the local deformation is small, small enough to have low energetic cost, while the overall deformation is significant. For example, take the field $u$ to have density $\exp(-\sum(\nabla u)^2)$, i.e. a two-dimensional lattice Gaussian free field. The continuous group of symmetries in this case is simply the group of shifts $u_x \mapsto u_x + C$ for a constant $C$, which preserve the density. Consider $u$ in the discrete box $\{-L, -L + 1, \dots, L\}^2$, normalized so that $u_{(0,0)} = 0$. The perturbation argument entails comparing $u_x$ to $u^{\pm}_x := u_x \pm \tau_x$, with $\tau$ chosen, e.g., as $\tau_x = \log(|x|+1)/\sqrt{\log L}$. The energy of $u$ is necessarily close to either that of $u^+$ or that of $u^-$ (since $\tfrac12((\nabla u^+)^2 + (\nabla u^-)^2) - (\nabla u)^2 = (\nabla \tau)^2$ and the sum of the last term is uniformly bounded in $L$ by the choice of $\tau$) but overall the fields diverge by $\sqrt{\log L}$, which is significant. One concludes that the fluctuations of $u$ must grow without bound as $L$ increases (specifically, $\operatorname{Var} u_x \ge c \log L$ at vertices at distance $L$ from the origin). There are multiple approaches to harness the perturbation argument; the reader may find a discussion with references in [15, page 4].
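The two competing features of the perturbation $\tau$ (bounded total energy, growing total shift) are easy to check numerically. The sketch below is an illustration only: it takes $|x|$ to be the Euclidean norm, which is an assumption made here, and the function names are hypothetical.

```python
import math

def tau(L, x1, x2):
    """tau_x = log(|x| + 1) / sqrt(log L), with |x| the Euclidean norm (an assumption)."""
    return math.log(math.hypot(x1, x2) + 1) / math.sqrt(math.log(L))

def tau_energy(L):
    """Sum of (tau_x - tau_y)^2 over nearest-neighbour edges of the box {-L..L}^2."""
    total = 0.0
    for x1 in range(-L, L + 1):
        for x2 in range(-L, L + 1):
            if x1 < L:   # edge to the right neighbour
                total += (tau(L, x1 + 1, x2) - tau(L, x1, x2)) ** 2
            if x2 < L:   # edge to the upper neighbour
                total += (tau(L, x1, x2 + 1) - tau(L, x1, x2)) ** 2
    return total
```

Evaluating `tau_energy(L)` for increasing `L` shows values that stay bounded, while `tau(L, L, 0)` is at least $\sqrt{\log L}$, in line with the heuristic in the text.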
Let us now describe how to apply the above approach to the VRJP model. Recall that the density of $u$ (3) ("the magic formula") is proportional to The idea is to use an argument of Mermin-Wagner type to lower bound the fluctuations of $u$ by (a constant multiple of) the fluctuations of the Gaussian free field, normalized to be zero at $o$, whose density is proportional to On a two-dimensional graph, with $W_{xy} \equiv a$, this yields that and corresponding Gaussian lower bounds on the tail behavior. As a separate input, we use that the field $u$ is known not to be too big. Specifically, the Ward identity of Theorem 3 shows that $\mathbb{E}(\exp(u_x)) = 1$.
In order to avoid a contradiction between (6) (and the corresponding tail bounds) and (7), the value of $u_x$ must typically be small. For a quantitative inequality, recall that if $Z$ is a Gaussian random variable with mean $m$ and variance $\sigma^2$ then $\mathbb{E}(e^Z) = e^{m + \frac12 \sigma^2}$. Thus if $u_x$ were Gaussian then (6) and (7) would imply that Theorem 2 states quantitative results of this flavor.
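The Gaussian heuristic can be made concrete: if $\mathbb{E}(e^Z) = 1$ and $Z$ is Gaussian with variance $\sigma^2$, then its mean is forced to be $-\tfrac12\sigma^2$, so large fluctuations push the typical value down. The following Python sketch checks the moment formula by plain numerical integration; the function names and the integration window are illustrative choices made here.

```python
import math

def gaussian_exp_moment(m, sigma, steps=20000):
    """Approximate E[exp(Z)] for Z ~ N(m, sigma^2) by the trapezoid rule."""
    lo, hi = m - 12 * sigma, m + 12 * sigma   # wide enough for moderate sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        density = math.exp(-0.5 * ((z - m) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(z) * density
    return total * h

def mean_forced_by_ward(sigma2):
    """Mean m for which a Gaussian Z with variance sigma2 has E[exp(Z)] = 1."""
    return -0.5 * sigma2
```

For instance, `gaussian_exp_moment(mean_forced_by_ward(4.0), 2.0)` should be numerically indistinguishable from 1.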
In a different context, the idea that fluctuation lower bounds plus an a priori input that the field is not large could be used to prove that the field is typically small was used by Schenker in [25] following a suggestion of Aizenman. A version of the Mermin-Wagner method was also applied by Merkl and Rolles [17] in proving power-law decay of the weights for the LRRW.
There are two main obstacles to the application of a Mermin-Wagner type argument to the VRJP model. First, the field $u$ does not have short-range interactions due to the presence of the determinant term. This is handled by noting that the function $F(u)$ above is log-convex (Lemma 4) and can thus be discarded in comparing the energy of $u$ with the energy of its perturbations $u^{\pm}$ (as $\sqrt{F(u + \tau)F(u - \tau)} \ge F(u)$ for any $\tau$). Second, the Mermin-Wagner method is easiest to implement for gradient fields whose interaction function $U$ is twice continuously differentiable with $\sup U'' < \infty$ (see, e.g., [15, § 1.1] or [20, § 2.6]). As the hyperbolic cosine function does not satisfy this bound we need to resort to a more sophisticated version of the argument, based on ideas of Richthammer [21] and developed in [15] (see the 'addition algorithm' of § 4). A price to pay is that an additional a priori input is required: we need to show that for some sufficiently large constant $K$, the random set of edges on which the gradient of $u$ exceeds $K$ in absolute value is sparse in an appropriate probabilistic sense. For other models, such an input was established either using reflection positivity [15] or using symmetries of the state space [19]. Here, this input is proved by the approach of [1], namely, by considering together the VRJP and RWRE pictures for the field $u$ (see § 3). The obtained constant $K$ is uniform in the weight $a$ for $a$ bounded away from zero.

Inputs on the weight distribution
Let G be a finite connected graph, W : E(G) → (0, ∞) and o ∈ G. We describe two inputs on the density (3) of the weight vector u.
The first input is known as a Ward identity.
Theorem 3. For every vertex $x \in G$, $\mathbb{E}(\exp(u_x)) = 1$.
The second input is a simple log-convexity property which will be key to the application of Mermin-Wagner type techniques in the proof of Theorem 2. This is well-known (see, e.g., Disertori-Spencer-Zirnbauer [10,Remark 2.3] where the proof is attributed to David Brydges), but for completeness we provide a proof.
Lemma 4. Let A be the matrix given by (4). Let D(W, u) be the determinant of any diagonal minor of A. Then D(W, u) is a log-convex function of u.
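Proof. Since $A$ is a Laplacian matrix, i.e. a symmetric matrix with nonpositive off-diagonal entries and rows summing to 0, we may apply the matrix-tree theorem [13, Theorem 1.19]. It gives
$$D(W,u) = \sum_{T \in \mathcal{T}} \prod_{\{x,y\} \in T} W_{xy}\, e^{u_x + u_y},$$
where $\mathcal{T}$ is the set of all spanning trees of $G$ (recall that a spanning tree of a graph is a subgraph which contains all vertices and some of the edges, and is connected and cycle-free). Rearranging gives
$$D(W,u) = \sum_{f : G \to \mathbb{Z}} a_f \exp\Bigl(\sum_{z \in G} f(z) u_z\Bigr)$$
for some coefficients $a_f \ge 0$ (all but finitely many of which are zero). Any such sum is log-convex: indeed, for each $x, y \in G$,
$$\frac{\partial^2 \log D}{\partial u_x\, \partial u_y} = \frac{1}{D^2} \sum_{f,g} a_f a_g \bigl(f(x)f(y) - f(x)g(y)\bigr)\, e^{\sum_z (f(z)+g(z))u_z}$$
and the symmetry between $f$ and $g$ allows to write the sum as
$$\frac{1}{2D^2} \sum_{f,g} a_f a_g \bigl(f(x) - g(x)\bigr)\bigl(f(y) - g(y)\bigr)\, e^{\sum_z (f(z)+g(z))u_z}.$$
To see that the resulting matrix is positive semi-definite, let $\mu$ be some test vector and write
$$\sum_{x,y} \mu_x \mu_y \frac{\partial^2 \log D}{\partial u_x\, \partial u_y} = \frac{1}{2D^2} \sum_{f,g} a_f a_g \Bigl(\sum_{x} \mu_x \bigl(f(x) - g(x)\bigr)\Bigr)^2 e^{\sum_z (f(z)+g(z))u_z}.$$
Since this is nonnegative, the lemma is proved.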
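Lemma 4 can also be sanity-checked numerically on small graphs. The Python sketch below is an illustration under an explicit assumption: it takes $A$ to be the weighted graph Laplacian with conductances $W_{xy} e^{u_x + u_y}$, and it computes the minor obtained by deleting the row and column of vertex 0. The helper names (`det`, `log_D`, `midpoint_gap`) are hypothetical.

```python
import math
import random

def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    n, sign, prod = len(a), 1.0, 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            sign = -sign
        prod *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * prod

def log_D(edges, n, u):
    """log of the diagonal minor (vertex 0 removed) of the assumed Laplacian."""
    lap = [[0.0] * n for _ in range(n)]
    for (x, y), w in edges.items():
        c = w * math.exp(u[x] + u[y])   # conductance W_xy * exp(u_x + u_y)
        lap[x][x] += c
        lap[y][y] += c
        lap[x][y] -= c
        lap[y][x] -= c
    minor = [row[1:] for row in lap[1:]]
    return math.log(det(minor))

def midpoint_gap(edges, n, u, t):
    """log D(u+t) + log D(u-t) - 2 log D(u); nonnegative if D is log-convex."""
    up = [a + b for a, b in zip(u, t)]
    um = [a - b for a, b in zip(u, t)]
    return log_D(edges, n, up) + log_D(edges, n, um) - 2 * log_D(edges, n, u)
```

On a triangle with $u \equiv 0$ the minor equals the sum over the three spanning trees of the product of their edge weights, as the matrix-tree theorem predicts, and `midpoint_gap` is nonnegative (up to rounding) for random choices of `u` and `t`.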

Comparing to percolation
Let G be a finite connected graph, W : E(G) → (0, ∞) and o ∈ G. Let a > 0 and let H ⊂ G be some induced subgraph such that W | E(H) ≡ a. Let u be sampled from the density (3). In this section we show that the (random) set of edges {x, y} where |u x − u y | is large is sparse in a suitable sense.
A random set of edges is called an ε-percolation if each edge of the underlying graph is present in the random set with probability ε, independently between different edges. Our proof entails consideration of two ε-percolations, which may be dependent among themselves.
Write $\vec{E}(H)$ for the set of directed edges of $H$. The proof revolves around an "estimator" for $\exp(u_x - u_y)$, $(x, y) \in \vec{E}(H)$, which we denote by $Q_{xy}$ (note that $Q$ is not symmetric). We define $Q$ via the $Z$ process; recall that it has two equivalent definitions, via the VRJP picture (see Definition 1) and via the RWRE picture (see Theorem 1).
Define Q xy to be the local time spent by Z at x up to the first jump from x to y.
Proposition 5 follows immediately from these two lemmas, with $K_{\text{Proposition 5}} = \log(K_1 K_2)$ and with $\varepsilon_{\text{Lemmas 6 and 7}} = \tfrac12 \varepsilon_{\text{Proposition 5}}$ (the $\tfrac12$ is needed because both lemmas give directed $\varepsilon$-percolation, while the proposition is about undirected percolation, and projecting a directed $\varepsilon$-percolation to an undirected one gives a $(2\varepsilon - \varepsilon^2)$-percolation). Let us therefore move to the proofs of these lemmas.
Proof of Lemma 6. Examine the RWRE picture. We claim that conditioned on the environment $u$, $Q_{xy} \cdot (\tfrac12 W_{xy} \exp(u_y - u_x))$ is an i.i.d. field of exponential random variables of rate 1. This implies that the unconditioned field (after integrating over $u$) satisfies the same. The lemma follows, as an exponential random variable $T$ with rate 1 satisfies $\mathbb{P}(T \le \varepsilon) = \int_0^{\varepsilon} e^{-x}\,dx \le \varepsilon$. To see the above claim we recall a method for implementing a continuous-time random walk. For each directed edge $(x, y)$, denote the jump rate from $x$ to $y$ by $\rho_{xy}$ and associate to $(x, y)$ an independent Poisson process with intensity $\rho_{xy}$. The walk is then defined by the rule that a jump from $x$ to $y$ occurs at times $t$ for which (i) the walker is at the vertex $x$ just before time $t$, and (ii) an event of the Poisson process of $(x, y)$ occurs at time $L_x(t)$, where $L_x(t)$ is the local time accumulated at the vertex $x$ by time $t$. It is a standard fact that the walk defined in this way indeed has the correct distribution. With this representation, it becomes clear that if $Q_{xy}$ is the local time spent by the walk at $x$ up to its first jump to $y$ then the $(Q_{xy})$ are independent and each $Q_{xy}$ has an exponential distribution with rate $\rho_{xy}$.
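The claim that $\rho_{xy} Q_{xy}$ is a rate-1 exponential can be checked by simulation. The Python sketch below simulates the walk in the usual way (exponential holding times, jumps chosen proportionally to the rates) rather than through the Poisson-clock construction, and estimates the mean of $\rho_{xy} Q_{xy}$, which should be close to 1. The rate table and function names are illustrative assumptions made here.

```python
import random

def local_time_until_jump(rates, start, x, y, rng):
    """Local time spent at x until the walk first jumps from x to y."""
    pos, local_at_x = start, 0.0
    while True:
        out = rates[pos]
        hold = rng.expovariate(sum(out.values()))          # holding time at pos
        nxt = rng.choices(list(out), weights=list(out.values()))[0]
        if pos == x:
            local_at_x += hold
            if nxt == y:
                return local_at_x
        pos = nxt

def mean_normalised_Q(rates, start, x, y, samples, seed=0):
    """Monte Carlo estimate of E[rho_xy * Q_xy]."""
    rng = random.Random(seed)
    total = sum(local_time_until_jump(rates, start, x, y, rng) for _ in range(samples))
    return rates[x][y] * total / samples
```

A possible usage, on a triangle with arbitrary asymmetric rates: `mean_normalised_Q({0: {1: 1.0, 2: 0.5}, 1: {0: 2.0, 2: 1.0}, 2: {0: 0.3, 1: 1.5}}, 0, 1, 2, 20000)`.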
Proof of Lemma 7. Examine the VRJP picture, and denote by $q_{xy}$ the time spent by $Y$ in $x$ up to the first jump from $x$ to $y$. The instantaneous jump rate from $x$ to $y$, $W_{xy} L_y(t)$, is always at least $a$ (as $L_y \ge 1$, see (1)). Hence $q_{xy}$ is dominated by a field of i.i.d. exponential random variables with rate $a$. In particular, We now claim that $Q_{xy} = q_{xy}^2 + 2q_{xy}$, which will then imply the lemma. Indeed, let $t_i$ be the $i$th time that $Y$ enters $x$, let $t'_i$ be the $i$th time $Y$ exits $x$, and let $k$ be the minimal $i$ for which $Y$ exits $x$ towards $y$ at time $t'_i$. The RWRE picture makes it clear that all the $t_i$ and $t'_i$, as well as $k$, are almost surely finite. Recall the time change function $D$ from (2). We have $Q_{xy} = \sum_{i=1}^{k} \bigl(D(t'_i) - D(t_i)\bigr)$. But between $t_i$ and $t'_i$ the sum defining $D$ changes only at $x$ so
$$Q_{xy} = \sum_{i=1}^{k} \bigl(L_x(t'_i)^2 - L_x(t_i)^2\bigr) = (1 + q_{xy})^2 - 1 = q_{xy}^2 + 2q_{xy}$$
as needed. The lemma follows.
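The identity $Q_{xy} = q_{xy}^2 + 2q_{xy}$ can be observed directly in a simulation of the VRJP. The Python sketch below assumes that local times start from 1 and that the time change is $D(t) = \sum_v (L_v(t)^2 - 1)$; it uses the fact that while $Y$ sits at a vertex only the local time of that vertex grows, so the outgoing rates are constant during each visit. The function name and the example weights are hypothetical.

```python
import random

def vrjp_first_jump(W, start, x, y, rng):
    """Run the VRJP Y until its first jump from x to y.

    Returns (q, Q): the time Y spent at x, and the time the time-changed
    process spent at x, measured as the increase of D during visits to x.
    """
    L = {v: 1.0 for v in W}            # local times start at 1 (assumption)
    pos, q, Q = start, 0.0, 0.0
    while True:
        # rates W[pos][v] * L[v] are frozen while Y stays at pos
        rate = {v: w * L[v] for v, w in W[pos].items()}
        hold = rng.expovariate(sum(rate.values()))
        nxt = rng.choices(list(rate), weights=list(rate.values()))[0]
        if pos == x:
            q += hold
            Q += (L[pos] + hold) ** 2 - L[pos] ** 2   # increment of D on this visit
        L[pos] += hold
        if pos == x and nxt == y:
            return q, Q
        pos = nxt
```

On any run, the returned pair satisfies `Q == q*q + 2*q` up to floating-point rounding, which is the telescoping computed in the proof.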

The addition algorithm
The final ingredient used in our proof is the following so-called addition algorithm, which was introduced in [15] following earlier work of Richthammer [21].
The input to the addition algorithm is a finite, connected graph H, a function τ : H → [0, ∞) and a constant K. Its output is two bijections T + , T − on R H such that T ± (ϕ) is an approximation of ϕ±τ , chosen in a way that preserves the gradients of ϕ whenever the latter are larger than K. The exact formulation is below.
While the explicit description of the addition algorithm is not long or difficult (see [15, § 2.2]), we refrain from giving it here and instead list the properties of the algorithm which we require. The list follows the properties in [15, § 2.1], with the exception of property (iv) for which we provide the stronger statement given in [15,Proposition 2.7], and with a few differences in formulation which are explained following the list.
Let $H$ be a finite connected graph with a distinguished vertex $o$. We sometimes write $v \sim w$ to denote that $\{v, w\} \in E(H)$. Let $\tau : H \to [0, \infty)$ with $\tau_o = 0$ and $K > 0$ be given. The addition algorithm defines a pair of measurable mappings $T^+, T^- : \mathbb{R}^{H\setminus\{o\}} \to \mathbb{R}^{H\setminus\{o\}}$ related by the equality and satisfying the following properties:
(i) (bijections) $T^+$ and $T^-$ are one-to-one and onto.
(ii) (add at most $\tau$) For every $\varphi \in \mathbb{R}^{H\setminus\{o\}}$ and every $v \in H$,
(iii) (gradient preservation) For every $\varphi \in \mathbb{R}^{H\setminus\{o\}}$ and every $(v, w) \in E(H)$,
The properties stated so far do not exclude the possibility that $T^+$ is the identity mapping (implying the same for $T^-$ by (9)). The next property shows that $T^+(\varphi) - \varphi$ is close to $\tau$ under certain restrictions on the set of edges on which $\varphi$ changes by at least $K$. We require a few definitions.
Recall that d stands for graph distance, here on the graph H. The next two definitions concern the Lipschitz properties of τ .
(the ′ in τ ′ is supposed to remind the reader of differentiation). In the following definitions we consider the connectivity properties of the subset of edges on which ϕ changes by more than K. For ϕ ∈ R H define and write, for a pair of vertices where we mean in particular v (iv) (add close to τ ) For any ϕ ∈ R H\{o} satisfying M (ϕ) L(τ, K) − 2, Our final property regards the change of measure induced by the mappings T + and T − . We bound the Jacobians of these mappings when the subgraph E (ϕ) does not contain many large connected components.
For easier comparison with [15] let us explain the few differences between the way the result is formulated here and there.
(1) In [15] there is an additional parameter $\varepsilon$. We set this $\varepsilon$ to $\tfrac12$.
(2) The parameter $K$ does not appear in [15]. There, the constant $2K$ appearing in property (iii) is replaced by 1. The version here is achieved by dividing $\varphi$ and $\tau$ by $2K$, applying the addition algorithm of [15] and then multiplying back by $2K$.
(3) Our $L$ is defined slightly differently than in [15], with L = L [15]
(4) In [15] there is no distinguished vertex $o$ on which the functions $\tau$ and $\varphi$ are assumed to be zero. In addition, the Jacobians are shown to satisfy a stronger property than (17), allowing one to fix the functions $\varphi$ to arbitrary values on vertices where $\tau$ is zero. Here, for simplicity, we restricted to the case that $\tau$ and $\varphi$ are fixed to zero at $o$ as this is the only case we will use.

Proof of the main result
In this section we combine the previous ingredients to prove Theorem 2. Let $L \in \mathbb{N}$. Let $G$ be a finite connected graph with a distinguished vertex $o$ and assume that $B(o, L)$ is isomorphic to the ball $B((0, 0), L)$ in $\mathbb{Z}^2$. Let $W : E(G) \to (0, \infty)$ satisfy that $W|_{B(o,L)} \equiv a$. Let $u$ be sampled from the density (3) with respect to $G$ and $W$. We need to show that there exist $C, c(a) > 0$ so that for any $a > 0$ and any $x \in B(o, 2L)$ the two bounds of Theorem 2, referred to below as (19) and (20), hold, and that $c(a)$ can be taken to be at least $c(a_0)/a$ for all $a > a_0$. We assume throughout the following that $d(o, x)$ (and thus also $L$) is at least a large absolute constant, as for each fixed $d(o, x)$ we may take $C$ large enough and $c(a_0)$ small enough to make (19) trivial and make (20) follow from the Ward identity (Theorem 3).
The following is our main lemma, which shows that $u_x$ must be either larger than $c(a) \log d(o, x)$ or smaller than $-c(a) \log d(o, x)$, with high probability. The theorem follows from it, see the proof of Theorem 2 at the end of this section, by a simple application of the Ward identity (which is also used in the proof of the lemma).
Lemma 8. Let $a_0 > 0$. There exist $C, c(a_0) > 0$ such that for every $a > a_0$,
The rest of the section is devoted to proving the lemma and deducing Theorem 2. Throughout we fix $a_0 > 0$ and assume that $a > a_0$.
We wish to use the addition algorithm from the previous section and to this end we need to specify the graph H, target function τ and constant K. Let H be the induced subgraph of G on the closed ball B(o, 1 2 x)}, so that H is a ball in Z 2 regardless of the choice of x. The choice to make the radius of the ball proportional to d(o, x) is made in order for the parameter M (ϕ) appearing in the addition algorithm to typically not be too large. The parameter K will be fixed using Lemma 9 below to a value depending only on a 0 . To specify τ we introduce a parameter λ which will be fixed later (following (40)) to a value of the form c(a 0 )/a. Define τ : H → [0, ∞) by The reason for taking τ to be 0 up to a large distance from o is to increase the size of the parameter L(τ, K) defined in (12). Indeed, as nearest-neighbour differences satisfy max y∼z |τ y − τ z | λd(o, x) −1/2 . For these H, τ (and the parameter K to be fixed below), the addition algorithm produces mappings T ± : R H\{o} → R H\{o} and the associated J ± : R H\{o} → [0, ∞). We define extensions of these maps on the whole of R G\{o} as follows. First,T ± : Second, the mapsJ ± : R G\{o} → [0, ∞) are defined byJ ± (u) = J ± (u| H ). It is simple to check that the extension of Property (v) of the addition algorithm holds, namely that where dϕ now stands for Lebesgue measure on R G\{o} . It is convenient to introduce a notation for the actual increments due to the addition algorithm where the reader should keep in mind that (as will be shown) i is close to τ on H in a suitable sense. In particular, by (10), i y = 0 whenever τ y = 0, i.e., We require some control over the Jacobians and increments resulting from the addition algorithm and this is provided by the following definition and lemma.
Definition 2. For a constant σ we define a "good" event G = G (σ) ⊆ R G\{o} as the set of all u satisfying that Lemma 9. Suppose λ 1. There exist absolute constants C, c, σ and a choice of K as a function solely of a 0 for which P(G (σ)) 1 − C exp(−cd(o, x) 1/4 ). Lemma 9 follows in a straightforward manner from the "two dependent percolations" picture and properties of the addition algorithm so we postpone its proof. Continuing with the proof of lemma 8, denote by H the event to be estimated in the lemma, i.e., H is the set of all u satisfying |u x | λ 3 log 1 Let ρ be the density of the field u ("the magic formula") as given in (3). The proof of Lemma 8 makes use of the Ward identity and the fact that ρ has the form with cosh(u y − u z ) (using that W | B(o,L) ≡ a by assumption), with ρ 2 a function of the gradients of u on the edge set E(G) \ E(H) and with ρ 3 a log-convex function. We take ρ 3 = exp(− u x ) D(W, u), a product of a log-linear term and a term whose log-convexity is justified by Lemma 4. Our analysis starts with the quantity for which we proceed to establish upper and lower bounds (again, u ∈ R G\{o} and the integration is with respect to the Lebesgue measure on R G\{o} ). On the one hand, by the Cauchy-Schwarz inequality and (24), (recall thatT + ,T − mean the transformations mapping u to u + , u − , i.e., the transformations defined on the whole graph G rather than just on the subgraph H.) On the other hand, by property (27) of the good event G (which contains I ), and we proceed to find a lower bound for the integrand. We study the three factors in (30) separately. First, by log-convexity and the relation (9) of the addition algorithm, = ρ 3 (u).
Second, by (23) Lastly, we calculate To obtain a simpler expression for the summands we note that by property (iii) of the addition algorithm, i y = i z when |u y − u z | 2K. A second-order Taylor expansion of cosh thus gives with C(K) > 0 solely a function of K. For simplicity, denote all constants that depend only on a 0 by C(a 0 ), in particular the C(K) above. In conclusion, Putting together (33), (34) and (35) we thus have on I that Plugging this bound back into (32) and using property (29) of the good event G , where in the second inequality we compensated for removing the +1 and the σ from the power by increasing C(a 0 ) (recall that a > a 0 and that σ is an absolute constant). Combining (36) with the upper bound (31) brings us to the key inequality We develop the right-hand side of the inequality. As I = H ∩ G we have P(u ∈T + (I )) P(u ∈T + (H )).
Further recalling that $\mathcal{H}$ is the set of $u$ satisfying $|u_x| \le \frac{\lambda}{3} \log \frac14 d(o, x)$ and that $T^+$ is given by (23) we have that Putting together (38) and (39) and making use of Markov's inequality and the Ward identity (Theorem 3) now shows that Combining this inequality with (37) we get We see that for $\lambda = c(a_0)/a$ for some positive $c(a_0)$ sufficiently small, the power becomes negative (and we may also ensure that $\lambda \le 1$, to satisfy the assumption of Lemma 9, by taking $c(a_0) \le a_0$). Fix $\lambda$ to such a value. The proof of Lemma 8 is now finished since, by Lemma 9, $\mathbb{P}(u \in \mathcal{H}) \le \mathbb{P}(u \in \mathcal{I}) + \mathbb{P}(u \notin \mathcal{G}) \le C d(o, x)^{-c(a_0)/a} + C \exp(-c\, d(o, x)^{1/4})$ and the second term is negligible.
Proof of Theorem 2. Fix a 0 > 0 and suppose that a > a 0 . By Lemma 8 there exist C, c 1 (a 0 ) > 0 so that with The probability that u x is large can be bounded by the Ward identity (Theorem 3) and Markov's inequality: Together (41) and (42) show (19). To further deduce (20) we write  The inequality (20) follows by combining the last four displayed equations and plugging the definitions of t and s.

Properties of a union of percolations
In this section we discuss two specific quantitative ways in which the union of $\varepsilon$-percolations is sparse, which are required for the proof of Lemma 9. Our analysis takes the underlying graph to be the whole square lattice as this suffices for our purposes.
Let P 1 , P 2 be two (dependent) ε-percolations on Z 2 . Write P for their union. Define the radius of connected components in P by r(y) := max{d(y, z) : z is connected to y by edges in P}, y ∈ Z 2 .
Proof. The event in question entails the existence of a simple path $\gamma$ with $k$ edges of $P$ starting from $y$. In this case there is some $i \in \{1, 2\}$ such that at least $\lceil k/2 \rceil$ of the edges of $\gamma$ are in $P_i$. For a fixed $\gamma$ and $i$ this probability can be bounded by $\varepsilon^{k/2} 2^k$. Summing over $\gamma$ (for which there are fewer than $4^k$ possibilities) and $i$ gives $\mathbb{P}(r(y) \ge k) \le 2 \cdot 8^k \cdot \varepsilon^{k/2}$.
For $\varepsilon$ sufficiently small, this is smaller than $e^{-k}$ for all $k \ge 1$.
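To see how small $\varepsilon$ needs to be in the last step, note that $2 \cdot 8^k \varepsilon^{k/2} \le e^{-k}$ for all $k \ge 1$ as soon as $8e\sqrt{\varepsilon} \le \tfrac12$. The Python sketch below checks this arithmetic; the name `EPS_0` denotes one admissible threshold computed here, not the constant $\varepsilon_0$ of the paper.

```python
import math

def union_bound(k, eps):
    """The bound 2 * 8^k * eps^(k/2) on P(r(y) >= k) from the proof."""
    return 2 * 8 ** k * eps ** (k / 2)

# One admissible threshold: chosen so that 8 * e * sqrt(eps) equals 1/2.
EPS_0 = (1 / (16 * math.e)) ** 2
```

With this choice the bound equals $2^{1-k} e^{-k}$, which is at most $e^{-k}$ for every $k \ge 1$, with equality at $k = 1$.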
Lemma 11. There exist $\varepsilon_0, C, c > 0$ such that if $\varepsilon \le \varepsilon_0$ then for all $\ell > 0$, (the value $\tfrac12$ can be improved easily, but this is not useful for us).
Proof. We assume that $\ell$ is sufficiently large as otherwise the claim is trivial. Denote the sum in (43) by $S$. We proceed to upper bound $S$ by sums involving simpler random variables. Let $M \ge 1$ be a parameter and write Denote by $\mathcal{E}(y, i, M)$ the event that there is a simple path $\gamma$ from $y$ to some $z$ with $d(y, z) = M$ with at least half of the edges of $\gamma$ in $P_i$. As in the proof of Lemma 10 we have The proof of Lemma 10 also implies that, for $\varepsilon \le \varepsilon_0$, For $M$ small we also need the fact that for every $\delta > 0$ there exists an $\varepsilon_1(\delta)$ such that $\varepsilon \le \varepsilon_1(\delta)$ implies $\mathbb{P}(\mathcal{E}(y, i, M)) \le \delta$, which holds for any $M \ge 1$. This is also proved exactly like Lemma 10.
Going back to $S_M$, we further subdivide (45) according to the value of the coordinates of $y$ modulo $3M$, defining The events in this last sum are independent (each $\mathcal{E}(y, i, M)$ depends only on $P_i$ in $B(y, M)$ and these subsets are disjoint). Hence any of the standard methods may lead to the following estimate: for every $s > 2\mathbb{E}S_{v,i,M}$, (we used exponential moments, i.e. wrote $\mathbb{P}(S > s) \le \mathbb{E}(\exp(\mu(S - \mathbb{E}(S)))) \exp(-\mu(s - \mathbb{E}(S)))$ with $\mu = c\ell^2$, but any other standard method would give a usable estimate). Summing over $i$ and $v$ and using (45) gives We use this inequality for $s = \frac{1}{36} M^{-3} \log \ell$, and note that if $\varepsilon$ is sufficiently small then the condition $s > 2\mathbb{E}S_{v,i,M}$ will be satisfied: indeed, for every $M \ge 1$, by summing (46) over $y$, $\mathbb{E}S_{v,i,M} \le Ce^{-M} \log \ell$, while for $M$ small the fact that $\mathbb{P}(\mathcal{E}(y, i, M))$ can be made as small as needed by reducing $\varepsilon$ allows to make $\mathbb{E}S_{v,i,M} \le C\delta \log \ell$ for any $\delta > 0$. We get $\mathbb{P}(S_M > \frac{1}{2M} \log \ell) \le CM^2 \exp(-cM^{-3} \ell^2 \log \ell)$. Summing over $M = 1, 2, 4, \dots, 2^k$ for $k = \lfloor \log_2 \ell^{1/2} \rfloor$ gives $\mathbb{P}\bigl(\sum_{m=0}^{k} S_{2^m} > \log \ell\bigr) \le C \exp(-c\ell^{1/2} \log \ell)$.
Finally, the probability that $S_M > 0$ for any $M \ge 1$ (in particular, for $M > 2^k$) is no more than $C\ell^4 e^{-M}$ by (44) and (46). This establishes the lemma.

Proof of Lemma 9
Fix $\varepsilon$ to be the minimum of the constants $\varepsilon_0$ from Lemma 10 and Lemma 11. Fix $K = K(a_0)$ using Proposition 5 to
$$K := \max\bigl\{\sup\{K_{\text{Proposition 5}}(\varepsilon, a) : a > a_0\},\ 1\bigr\}. \tag{47}$$
Recall from the addition algorithm the notations E (u) (13) and r(u, y) (15) which we write here as E K and r(y), respectively. We may apply to E K the probability estimates of Lemma 10 and Lemma 11 as, by Proposition 5, E K is dominated by the union of two ε-percolations and as H is a subgraph of Z 2 . The event G is comprised of 3 parts (recall Definition 2), and it will be convenient to name them G 2 , G 3 , G 4 so, for example, G 2 = {u : J + (u)J − (u) d(o, x) −σλ 2 }. We reserved G 1 to the following auxiliary event (recall the definition of M from (16)), Using Lemma 10, For the rest of the proof it is important to note that the bound (22) and the fact that λ 1 (by assumption) and K 1 (by (47)) show that This is important because properties (iv) and (v) of the addition algorithm rely on this assumption, so most of the argument will work only on G 1 . Let us start by bounding P(G 3 ) (recall (28)). Let y ∈ H satisfy d(o, y) = 1 2 d(o, x) . Property (iv) of the addition algorithm implies that on G 1 , if i y = τ y then τ ′ (y, r(y)) > 0. Since τ is constant on B(y, 1 5 d(o, x)) (see (21)), this can only be if r(y) 1 5 d(o, x). Thus, relying again on Lemma 10, P({i y = τ y } ∩ G 1 ) P r(y) 1 5 d(o, x) exp − 1 5 d(o, x) . Before estimating G 2 and G 4 let us first discuss the discrete derivative of τ . One checks that if y, z are neighbours in H then |τ y − τ z | 2λ/d(o, y). Summing this gives (recall (11)) τ ′ (y, k) 3λk d(o, y) , 0 k d(o, x), We proceed to estimate P(G 2 ) (recall (27)). By (18), (48) and (50) we have on G 1 that J + (u)J − (u) (18)  (1 + max z∼y r(z)) 2 d(o, y) 2 .
Thus, still on G 1 , we may use (49), (50) and (48)  Again, taking σ large, as an absolute constant, we deduce from Lemma 11 that P(G c 4 ∩ G 1 ) = P y,z∈H y∼z Combining the estimates on the probabilities of G 2 , G 3 and G 4 , the lemma is proved.
Gady Kozma Department of Mathematics and Computer Science, The Weizmann Institute of Science, Rehovot, 76100, Israel.

Ron Peled
School of Mathematical Sciences, Tel Aviv University, Tel Aviv, 69978, Israel.