Uniform fluctuation and wandering bounds in first passage percolation

We consider first passage percolation on certain isotropic random graphs in $\mathbb{R}^d$. We assume exponential concentration of passage times $T(x,y)$, on some scale $\sigma_r$ whenever $|y-x|$ is of order $r$, with $\sigma_r$ "growing like $r^\chi$" for some $0<\chi<1$. Heuristically this means the transverse wandering of geodesics should be at most of order $\Delta_r = (r\sigma_r)^{1/2}$. We show that in fact uniform versions of the exponential concentration and wandering bounds hold: except with probability exponentially small in $t$, there are no $x,y$ in a natural cylinder of length $r$ and radius $K\Delta_r$ for which either (i) $|T(x,y) - ET(x,y)|\geq t\sigma_r$, or (ii) the geodesic from $x$ to $y$ wanders more than distance $\sqrt{t}\Delta_r$ from the cylinder axis. We also establish that for the time constant $\mu = \lim_n ET(0,ne_1)/n$, the "nonrandom error" $|\mu|x| - ET(0,x)|$ is at most a constant multiple of $\sigma(|x|)$.


Introduction.
In i.i.d. first passage percolation (FPP) on a graph G = (V, E), i.i.d. (edge) passage times τ_e are attached to the edges e ∈ E, and for a path Γ in G, the (path) passage time T(Γ) is the sum of the times τ_e over e ∈ Γ. For x, y ∈ V, the passage time from x to y is (1.1) T(x, y) = inf{T(Γ) : Γ is a path from x to y in G}.
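To fix ideas, the definition (1.1) can be computed exactly on a finite box by Dijkstra's algorithm, since all passage times are nonnegative. The following sketch is illustrative only and not from the paper; the grid size and the Exp(1) edge-time distribution are arbitrary choices of ours.

```python
import heapq
import random

def fpp_times(n, seed=0):
    """Dijkstra from the origin on an n x n grid with i.i.d. Exp(1)
    edge passage times; returns a dict of T(0, v) for all vertices v.
    (Illustrative sketch; Exp(1) is an arbitrary continuous choice.)"""
    rng = random.Random(seed)
    tau = {}  # edge -> passage time, sampled lazily

    def edge_time(u, v):
        e = (min(u, v), max(u, v))
        if e not in tau:
            tau[e] = rng.expovariate(1.0)
        return tau[e]

    dist = {(0, 0): 0.0}
    pq = [(0.0, (0, 0))]
    done = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        x, y = u
        for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= v[0] < n and 0 <= v[1] < n:
                nd = d + edge_time(u, v)
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
    return dist

T = fpp_times(20)
print(T[(19, 0)])  # passage time T(0, 19 e1) on this small box
```

Because edge times are continuous, the minimizing path (the geodesic Γ_xy) is a.s. unique, matching the remark following (1.1).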
The geodesic from x to y is the path, denoted Γ_xy, which minimizes the path passage time; when τ_e is a continuous random variable (as we always assume), a unique geodesic exists a.s. [16]. There are two exponents of primary interest in the study of FPP. First, the fluctuations (i.e. standard deviation) of passage times T(x, y) for |y − x| of scale r in Z^d are believed to be of order r^χ for some χ = χ_d < 1/2, with χ_2 = 1/3. Second, the typical transverse wandering of a geodesic, meaning the maximum distance from any point of Γ_xy to the straight line (denoted Π_xy) from x to y, is believed to be of order r^ξ for some ξ = ξ_d. For |y − x| of order r, if Γ_xy contains a vertex z at distance of order r^ξ from Π_xy (not too near x or y), then the associated extra distance |z − x| + |y − z| − |y − x| traveled by the geodesic in order to pass through z is of order r^{2ξ−1}. For such wandering to have non-negligible probability, the passage time fluctuations r^χ should be at least as large as this extra distance; this leads to the relation χ = 2ξ − 1. There are various ways to formally define the exponents χ, ξ; these must allow for the fact that the true scales of fluctuations and wandering are not known to be pure powers of r. Chatterjee [9] gave a rigorous version of the relationship χ = 2ξ − 1, under the assumption that multiple possible definitions of each exponent actually agree.
Looking more finely than just at the level of exponents, the heuristic for χ = 2ξ − 1 says that if the fluctuation scale is σ_r for |y − x| of order r, then the scale of transverse wandering should be ∆_r = ∆(r) = (rσ_r)^{1/2}.
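The heuristic behind ∆_r can be sketched in two lines (a routine expansion, not a proof):

```latex
% A geodesic from 0 to re_1 passing through a point at transverse
% distance s from the axis, near the midpoint, travels extra distance
\[
  2\sqrt{(r/2)^2 + s^2} - r \;\approx\; \frac{2s^2}{r}, \qquad s \ll r.
\]
% Such a detour is affordable only when it is comparable to the
% fluctuation scale \sigma_r, giving the crossover value
\[
  \frac{s^2}{r} \asymp \sigma_r
  \quad\Longleftrightarrow\quad
  s \asymp (r\sigma_r)^{1/2} = \Delta_r.
\]
% With \sigma_r = r^\chi this yields \Delta_r = r^{(1+\chi)/2},
% i.e. \xi = (1+\chi)/2, equivalent to \chi = 2\xi - 1.
```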
In [2] and (for d = 2) in [14] it was shown that under natural assumptions, the transverse wandering with high probability does not exceed (rσ_r log r)^{1/2}. One of our main results here is an essentially optimal upper bound on wandering: under slightly weaker assumptions, for all d ≥ 2, the probability of wandering greater than s∆_r decays as e^{−cs²} or faster, for s ≤ r/∆_r. Previously such a result was known only for integrable cases of last passage percolation (LPP) in d = 2, from [8] (with e^{−cs} in place of e^{−cs²}) and [7].
In fact we have this bound uniformly over many geodesics simultaneously, in the following sense: consider a cylinder of length r and radius K∆_r, and let ε > 0 and s > 2K. Then under the assumptions we will make, the probability that there exists any geodesic Γ_xy, with x, y in the cylinder and |y − x| ≥ εr, which wanders farther than s∆_r from the cylinder axis decays as e^{−cs²}, for s ≤ r/∆_r.
By comparison, in [2] it was shown roughly that if there is exponential concentration on some scale σ_r growing "like a power of r," uniformly for passage times over distance r, then the probability of a transverse fluctuation of size t(log r)^{1/2}∆_r for a fixed geodesic Γ_xy is bounded by C_1 e^{−C_2 t² log t} for all t > 0. This tells us nothing, though, about transverse fluctuations of size t∆_r with 1 ≪ t ≪ (log r)^{1/2}, which should also be subject to exponential concentration, as in our present result.
It should be emphasized that in our transverse wandering (and other) results, σ_r is not necessarily the actual scale of the standard deviation; it need only be an upper bound, in the sense that exponential concentration holds for passage times T(x, y) on scale σ(|y − x|). Then the corresponding value ∆_r is what appears in the upper bound for transverse wandering.
Our uniform wandering bound will be a byproduct of another uniform-bound result for passage times; to describe it we first discuss exponential bounds. For the lattice Z^d, Kesten [17] proved that, assuming (1.2) E e^{λτ_e} < ∞ for some λ > 0, and P(τ_e = 0) < p_c(Z^d) (where p_c(Z^d) is the bond percolation threshold for Z^d), there is exponential concentration of T(x, y) on scale r^{1/2} for |x − y| ≤ r: P(|T(x, y) − ET(x, y)| ≥ tr^{1/2}) ≤ C_3 e^{−C_4 t} for all t ≤ C_5 r.
None of these bounds is near-optimal, though: an optimal bound would be on the scale of the standard deviation of T(x, y). What we will prove here is roughly as follows. Suppose passage times satisfy exponential concentration on a scale σ(·), uniformly: P(|T(x, y) − ET(x, y)| ≥ tσ(|y − x|)) ≤ C_10 e^{−C_11 t} for all x, y, for some σ(r) which "grows like r^χ" for some χ ∈ (0, 1), in a sense we will make precise. Then for G_r(K) a cylinder of length r and radius K∆_r for some fixed K, we have concentration on the same scale, uniformly over x, y ∈ G_r(K): (1.4) P(|T(x, y) − ET(x, y)| ≥ tσ_r for some x, y ∈ G_r(K) with |y − x| ≥ εr) ≤ C_12 e^{−C_13 t} for all r large and t ≥ cK². This has previously been proved for integrable models of LPP in d = 2 ([], []), but even the non-integrable part of the proof there does not carry over to FPP; see Remark 1.7. For d ≥ 3 there is no generally-agreed-upon value of χ in the physics literature. Heuristics and simulations suggest that χ should decrease with dimension; simulations in [24] for a model believed to be in the same (KPZ) universality class as FPP show a decrease from χ = .33 to χ = .054 as d increases from 2 to 7. Some have predicted the existence of a finite upper critical dimension, possibly as low as 3.5, above which χ = 0 ([13], [19]); others predict that χ is positive for all d ([3], [23]), with simulations in [18] showing χ > 0 all the way to d = 12, decaying approximately as 1/(d + 1). Our results here require χ > 0, so they only have content below the upper critical dimension, should it be finite.
In the preceding and throughout the paper, c_1, c_2, . . . and C_1, C_2, . . . and ε_0, ε_1, . . . represent unspecified constants which depend only on the graph G (or its distribution, if it is random) and the distribution of the passage times τ_e (or speeds η_e, to be given below). We use C_i for constants which occur outside of proofs and may be referenced later; any given C_i has the same value at all occurrences. We use c_i for those which do not recur and are only needed inside one proof. For the c_i's we restart the numbering with c_0 in each proof, and the values are different in different proofs.
As is standard, since passage times T(x, y) are subadditive, assumptions much weaker than (1.2) guarantee the a.s. existence (positive and finite for x ≠ 0) of the limit (1.5) g(x) = lim_n T(0, nx)/n = lim_n ET(0, nx)/n = inf_n ET(0, nx)/n a.s. and in L¹ for x ∈ Z^d; g extends to x with rational coordinates by considering only n with nx ∈ Z^d, and then to a norm on R^d by uniform continuity. To obtain the optimal uniform results for wandering and (1.4) for fluctuations, we need to understand both parts of the discrepancy
(1.6) T(0, x) − g(x) = (T(0, x) − ET(0, x)) + (ET(0, x) − g(x)).
Here in the parentheses on the right are the random part and the nonrandom part of the discrepancy.
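The existence of the limit in (1.5), with lim = inf, is Fekete's subadditive lemma. A numerical illustration (our own toy sequence, not model data: a_n = μn + √n is subadditive since √(m+n) ≤ √m + √n, and μ = 2 is an arbitrary stand-in for the time constant):

```python
import math

# Toy subadditive sequence: a_{m+n} <= a_m + a_n, so a_n / n
# converges to inf_n a_n / n, as for ET(0, n e1) in (1.5).
mu = 2.0
a = lambda n: mu * n + math.sqrt(n)

# spot-check subadditivity on a range of pairs
for m in range(1, 50):
    for n in range(1, 50):
        assert a(m + n) <= a(m) + a(n) + 1e-12

ratios = [a(n) / n for n in (1, 10, 100, 10**4, 10**6)]
print(ratios)  # decreases monotonically toward mu = 2
```

Here a_n/n = μ + n^{−1/2} decreases to its infimum μ, mirroring ET(0, nx)/n ↓ g(x).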
In [26] the error term was improved to C_14(|x| log |x|)^{1/2}, and in [14] to c_η|x|^{1/2}(log |x|)^η for all η > 0. For the Euclidean first passage percolation of [15], the analog of (1.7) was proved in [12] with an error term of C_14 Ψ(|x|) log^{(k)} |x| for arbitrary k ≥ 1, where Ψ(|x|) is a scale on which an exponential bound is known (analogous to σ(|x|) in (1.3) below) and log^{(k)} |x| is the k-times-iterated logarithm. Here we will obtain an essentially optimal bound for the nonrandom part: if σ(·) satisfies certain regularity conditions and (1.3) holds, then all the log factors in the earlier bounds are extraneous: we have (1.8) ET(0, x) − g(x) ≤ Cσ(|x|). There is a strong interdependence among this result, our uniform wandering bounds, and (1.4), as discussed in Remark 1.6. Analogs of (1.8), of (1.4), and of our uniform wandering result, with the optimal scale σ(|x|) = var(T(0, x))^{1/2} ≍ |x|^{1/3}, are known for certain integrable models of directed last passage percolation (LPP). We note that an exponential bound like (1.3), but with centering at the analog of g(x) instead of at ET(0, x), shows that (i) (1.8) must hold, and (ii) an exponential bound like (1.3) must also hold with centering at g(y − x). Such a recentered bound appears in [8] (extracted from [4]) and in [20] for LPP on Z² with exponential passage times, in [21], [22] for LPP based on a Poisson process in the unit square, and in [10] for LPP on Z² with geometric passage times. An analog of our transverse wandering bound for LPP on Z² with exponential passage times appears in [7]. All of these require integrable-probability methods, which we do not use.
Rather than work on the lattice Z^d, we will consider isotropic models, built on a random graph G = (V, E) embedded in R^d. The dilation of such an embedded graph is the least C such that for every x, y ∈ V there is a path from x to y in G for which the total (Euclidean) length of the edges is at most C|y − x|. We say that such a graph has bounded dilation if there exists C such that with probability one the dilation of G is at most C. For A ⊂ R^d, the restriction of G to A is the graph with vertex set V_A = {x ∈ V : ⟨x, y⟩ ∈ E for some y ∈ V ∩ A} and edge set E_A = {⟨x, y⟩ ∈ E : x ∈ A}.
We require that the graph G satisfy the assumptions A1 below, which are somewhat stringent and include bounded dilation, but we will construct an example that works. (We see no need to make the graph as general as possible; we simply need to know we can work with one that has certain desirable properties.) That example is built roughly as follows. We first construct a point process V to serve as the vertices, with V satisfying those parts of A1 which involve only the vertices. To make a graph from V we use the Voronoi diagram, which divides R^d into closed polyhedra {Q_x : x ∈ V} (called Voronoi cells), the interior of the cell Q_x consisting of those points which are strictly closer to x than to any other point of V. We refer to x as the center point of Q_x, and define ϕ by ϕ(y) = x for y ∈ Q_x. To produce the Delaunay graph (or Delaunay triangulation in d = 2) one places an edge between each pair x, y ∈ V for which Q_x and Q_y have a face of positive (d − 1)-volume in common. For d = 2, it is known that the dilation of the Delaunay graph of any locally finite subset of R² is at most 1.998 [27], ensuring A1 is fully satisfied, but such bounded dilation for d ≥ 3 is not known. We therefore modify the Delaunay graph by adding certain non-nearest-neighbor edges of uniformly bounded length via a deterministic local rule, and show that bounded dilation then holds. Here and in what follows, by the length of an edge e = ⟨x, y⟩ we mean the Euclidean distance |y − x|, which we denote |e|.
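The map ϕ (assigning to each point the center of the Voronoi cell containing it) is simply the nearest-point map; a minimal sketch, with an arbitrary illustrative point set of our choosing:

```python
import random

# Sketch of the Voronoi assignment phi: phi(y) is the vertex x of V
# whose cell Q_x contains y, i.e. the point of V closest to y.
# The point set and the query point below are arbitrary choices.
rng = random.Random(1)
V = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(50)]

def phi(y):
    # with continuous coordinates, ties occur with probability 0
    return min(V, key=lambda x: (x[0] - y[0])**2 + (x[1] - y[1])**2)

print(phi((5.0, 5.0)))  # center point of the cell containing (5, 5)
```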
Our FPP results on isotropic random graphs adapt to Z^d, but only by assuming two unproven properties of FPP on Z^d: first, uniform curvature of ∂B_g, and second, a kind of smoothness of the mean as the direction changes:
sup{|ET(0, x) − ET(0, y)| : max(|g(x) − r|, |g(y) − r|) ≤ C_16, |y − x| ≤ C_17 ∆_r} = O(σ_r) as r → ∞.
Here then are the assumptions G = (V, E) must satisfy.
(i) G = (V, E) is isotropic, stationary, and ergodic;
(ii) bounded hole size: every open ball in R^d of radius 1 contains at least one vertex of V;
(iii) finite range of dependence: there exists β such that if A, B are Lebesgue-measurable subsets of R^d separated by distance d(A, B) ≥ β, then the restrictions of G to A and to B are independent;
(iv) bounded dilation: the dilation of G = (V, E) is bounded a.s. (and hence equal to some nonrandom C_18 a.s., by (i));
(v) exponential bound for the local density: given r_0 > 0 there exist C_19, C_20 such that for all r > r_0 and a ≥ 1, P(|V ∩ B_r(0)| ≥ ar^d) ≤ C_19 e^{−C_20 a}.
We say a random graph with these properties is acceptable. We will show that acceptable random graphs exist. By rescaling, we may replace radius 1 in (ii) by any other positive value. Condition (v) can be weakened from exponential to stretched exponential; we use exponential only to simplify the exposition.
Conditionally on G we define a collection of i.i.d. nonnegative continuous random variables η = {η_e, e ∈ E}. Formally the pair ω = (V, η) is defined on a probability space (Ω, F, P), with G determined by V.
In contrast to the usual FPP on a true lattice, here we view η_e not as a time but as a speed. We thus define the passage time of a bond e to be η_e|e|, and proceed "as usual": for x, y ∈ V, a path Γ from x to y is a finite sequence of alternating vertices and edges of G, of the form Γ = (x = x_0, ⟨x_0, x_1⟩, x_1, . . . , x_{n−1}, ⟨x_{n−1}, x_n⟩, x_n = y). We may designate a path by specifying only the vertices. The (path) passage time of Γ is T(Γ) := Σ_{e∈Γ} η_e|e|, and the passage time from x to y is (1.9) T(x, y) := inf{T(Γ) : Γ is a path from x to y in G}.
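The speed-times-length weighting η_e|e| in (1.9) can be sketched as follows. This is our own illustrative stand-in, not the paper's construction: a random geometric graph (edges between points within distance 1.5) replaces the augmented Delaunay graph, and Uniform(0.5, 1.5) is an arbitrary continuous choice for η_e.

```python
import heapq
import math
import random

rng = random.Random(2)
V = [(rng.uniform(0, 8), rng.uniform(0, 8)) for _ in range(200)]
V += [(0.0, 0.0), (8.0, 8.0)]  # ensure our two endpoints are vertices

def dist(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

# edge e = <x, y> gets passage time eta_e * |e|
E = {}
for i in range(len(V)):
    for j in range(i + 1, len(V)):
        if dist(V[i], V[j]) <= 1.5:
            E[(i, j)] = rng.uniform(0.5, 1.5) * dist(V[i], V[j])

adj = {i: [] for i in range(len(V))}
for (i, j), w in E.items():
    adj[i].append((j, w))
    adj[j].append((i, w))

def T(src):
    """Dijkstra: passage times T(src, .) as in (1.9)."""
    d = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if du + w < d.get(v, float("inf")):
                d[v] = du + w
                heapq.heappush(pq, (du + w, v))
    return d

d0 = T(len(V) - 2)            # passage times from (0, 0)
print(d0.get(len(V) - 1))     # T((0,0), (8,8)), if connected
```

Since every η_e ≥ 0.5 here, any passage time is at least half the Euclidean distance between the endpoints, a toy analog of the norm lower bound g(x) ≥ c|x|.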
For technical convenience we do not require that paths be self-avoiding, but for the moment this is irrelevant because geodesics are always self-avoiding. For general x, y not necessarily in V, we take "a geodesic from x to y" to mean a geodesic from the center point ϕ(x) to ϕ(y). Let ζ(λ) = E e^{λη_e}. We assume the following.
(i) η_e is a continuous random variable.
Here (i) guarantees that there is at most one geodesic from x to y a.s., for each x, y.
Remark 1.1. If σ(·) is powerlike, then so is the increasing function σ̄(r) = sup_{s≤r} σ(s); by further increasing σ̄ (though by at most a constant factor) we may make it strictly increasing and continuous while preserving the powerlike property. Therefore we may, and do, without loss of generality always assume σ(·) is strictly increasing and continuous. The inverse function ∆^{−1} is well-defined, and for ξ = (1 + χ)/2 ∈ (1/2, 1) we have ∆(r) ≈ r^ξ and ∆^{−1}(a) ≈ a^{1/ξ}, in the sense of power-law upper and lower bounds as in (1.10).
For general x, y ∈ R^d not necessarily in V, we write Γ_xy for Γ_{ϕ(x),ϕ(y)}. In general we view Γ_xy as an undirected path, but at times we will refer to, for example, the first point of Γ_xy with some property. Hence when appropriate, and clear from the context, we view Γ_xy as a path from ϕ(x) to ϕ(y).
Our final standard assumption is the following.
As we have noted, for |y − x| of order r, if Γ_xy contains a vertex z at distance of order ∆_r from Π_xy (not too near x or y), then the associated extra distance g(z − x) + g(y − z) − g(y − x) traveled by the geodesic is of order ∆_r²/r = σ_r, and by Theorem 1.4 the same is true for h in place of g. Since the corresponding passage times satisfy T(x, z) + T(z, y) − T(x, y) = 0, this means that one of the three passage times must deviate from the corresponding h value by order σ_r. The assumption (1.11) says the first of these is unlikely, and Theorems 1.2 and 1.4 can be used to show it is unlikely that there exists a z for which the second or third occurs. (Not without complications, though, as we cannot assume z ∈ G_r.) This is the idea behind the following. For r, s > 0 define intervals enlarging [0, r], where C_34 "sufficiently large" will be specified later. For s > K we have G_r(K) ⊂ G_{r,s}; in this case we may view G_{r,s} as the cylinder G_r(K) fattened transversally to width s∆_r, and lengthened by an amount which varies with the size of s relative to r, but is essentially "enough to make wandering of geodesics out the cylinder end at least as unlikely as out the sides."
Theorem 1.5. Suppose G = (V, E) and {η_e, e ∈ E} satisfy A1, A2, and A3. There exist C_i such that for all K ≥ C_35, the probability of the wandering event above is at most
C_36 e^{−C_37 s²} for all C_38 K ≤ s ≤ r/∆_r, and C_36 e^{−C_37 s∆_r/σ(s∆_r)} for all s > r/∆_r. (1.16)
We include the condition |(y − x)^*| ≤ (y − x)_1 because we are primarily interested in transverse fluctuations of geodesics out the side of G_{r,s}, so we wish to avoid y − x too nearly parallel to the end of G_{r,s}.
Remark 1.6. The strategy for proving Theorems 1.2-1.5 is as follows: (1) prove Theorem 1.2 for downward deviations (this is the most difficult part); (2) use Theorem 1.2 for downward deviations to prove Theorem 1.5 restricted to a fixed (x, y). In [6], an alternate strategy was used to prove LPP analogs (in integrable cases) of Theorems 1.2 and 1.5 in d = 2.
Theorem 1.4 was already known in that context; see the comments following (1.8). The strategy for the Theorem 1.2 analog in [8] is essentially this, when translated to FPP: first the easier upward-deviations half of Theorem 1.2 is proved. For downward deviations, consider the points 0 and 3re_1, and a cylinder G_r of radius ∆_r with axis from re_1 to 2re_1. Suppose there are (random) vertices u, v ∈ G_r with u_1 < v_1 for which the passage time T(u, v) is fast, for some large t. From the upward-deviations half of Theorem 1.2, with high probability we also have T(x, u) ≤ h(|u − x|) + tσ(|u − x|) and T(v, y) ≤ h(|y − v|) + tσ(|y − v|). From this, using that σ_r is proportional to r^{1/3} and h(r) = µr + O(σ_r), and assuming t is large enough, the full passage time is fast. This has probability exponentially small in t, by (1.11), hence so does the probability of such u, v existing. This strategy does not work for FPP, however, as it requires one to already know Theorem 1.4 to obtain the second inequality in (1.17).

Existence of acceptable random graphs
We construct a random graph G = (V, E) satisfying A1. We begin by constructing the point process V of vertices. We start with a "space-time" Poisson process V_0 (which we view as a random countable set) of density 1 with respect to Lebesgue measure in R^d × [0, ∞). In other words, we keep in V the R^d coordinate v of a point of V_0 if v is the first point to appear in some ball of radius 1. Then almost surely, for each v ∈ V there is a unique point (v, t_v) ∈ V_0, and we view t_v as the time at which v appeared. With probability one, every unit ball in R^d contains a point of V. We call V the available-space point process.
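The space-time construction can be sketched numerically. Caveat: below we use the simpler sufficient test "v is the first arrival in the unit ball centered at v" (a Matérn-II-style thinning), which keeps a subset of the paper's available-space process and need not satisfy the bounded-hole property; box size and intensity are arbitrary choices of ours.

```python
import random

# Points of a Poisson-like process in R^2 x [0, 1]: (position, arrival time).
rng = random.Random(3)
L, n = 20.0, 400  # roughly density-1 intensity on [0, L]^2
pts = [((rng.uniform(0, L), rng.uniform(0, L)), rng.random())
       for _ in range(n)]

def kept(v, tv):
    """Keep v iff no strictly earlier point lies within distance 1 of v
    (sufficient for v to be first in SOME unit ball containing v,
    namely the one centered at v)."""
    return all(t >= tv or (u[0] - v[0])**2 + (u[1] - v[1])**2 >= 1.0
               for u, t in pts if u != v)

V = [v for v, t in pts if kept(v, t)]
print(len(V), "of", n, "points kept")
```

Note that any two kept points are at distance at least 1: if both were within distance 1 of each other, each would have to be strictly earlier than the other.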
For the set V we let {Q_v, v ∈ V} denote the corresponding Voronoi cells. Write Q_x for the Voronoi cell containing x (with some arbitrary convention if x is on the boundary of multiple cells), and ϕ(x) for the unique point of V in Q_x. When convenient we view e = ⟨x, y⟩ as the line segment joining x and y. Let B_r(x) denote the open Euclidean ball of radius r about x. For d = 2, the Delaunay graph of V is our graph G. For d ≥ 3 we fix 0 < δ_G < 1 and augment the edge set by a deterministic local rule, as follows. We call G = (V, E) the augmented Delaunay graph of V. The edges in E which are not in the Delaunay graph are called augmentation edges. We write x ∼ y to denote that x, y are adjacent vertices in G, and x ∼_Del y to denote adjacency in the Delaunay graph of V. If y ∼_Del z, then for all u ∈ Q_y ∩ Q_z, B_{|y−z|/2}(u) ∩ V = ∅; hence by A1(ii), (2.1) |y − z| < 2. Similarly if z ∈ Q_y then B_{|z−y|}(z) contains no point of V, so |z − y| < 1; thus every Voronoi cell has diameter at most 2 (2.2).
Remark 2.1. The purpose of augmentation is roughly the following. Consider the line segment Π_xy between x, y ∈ V. It passes through a sequence of Voronoi cells Q_x = Q_{x_0}, Q_{x_1}, . . . , Q_{x_m} = Q_y, and there is a corresponding path x = x_0 → x_1 → · · · → x_m in the Delaunay graph. If too many of the cells Q_{x_j} are "thin," then the Delaunay path length Σ_{j=1}^m |x_j − x_{j−1}| may be much greater than |y − x|, making it difficult to bound the dilation. The augmentation effectively allows paths in G that "skip over" such problematic sequences of cells, at least for a small distance, enabling us to prove bounded dilation while preserving other properties of the Delaunay triangulation. We can reduce the occurrence of augmentation to involve an arbitrarily small proportion of vertices by using a small enough δ_G > 0. We will not give details here.
Proposition 2.2. The augmented Delaunay graph of the available-space point process satisfies A1.
Proof. A1(i) for V follows from the same properties of V_0; since G is constructed from V via isotropic and translation-invariant local rules, A1(i) also holds for G. A1(ii) follows from the fact that the first point of V_0 to appear in any radius-1 ball is always a point of V.
To prove A1(iii), observe first that by (2.1), given x ∈ V the cell Q_x and all Delaunay edges ⟨x, y⟩ are determined by V ∩ B_2(x). Therefore the collection of Voronoi cells intersecting B_3(x) is determined by V ∩ B_5(x), and hence so are all augmentation edges ⟨x, y⟩.
Turning to A1(v), let q = 1/(1 + 3√d) and r > r_0 > 0. The 5^d − 3^d cubes in 𝒥_x form a shell around J_x, with a smaller shell of cubes in between, and the diameter of 𝒥_x is less than 2r_1. Then any radius-1 ball intersecting J_x must contain a cube in 𝒥_x. Letting t(J_x) = max{t_v : v ∈ V ∩ J_x}, it follows that at time t(J_x), for some J_y ∈ 𝒥_x, at least |V ∩ J_x| points of V_0 had appeared in J_x but none in J_y. Letting N_xy be the number of points of V_0 appearing in J_x before the first point of V_0 appears in J_y, this says that N_x := max_{J_y ∈ 𝒥_x} N_xy ≥ |V ∩ J_x|. Since |𝒥_x| ≤ 5^d, it follows that for λ < log 2 the exponential moment bound (2.4) holds. The cubes J_x form a lattice of cubes, and we divide it into 5^d sublattices, each consisting of a cube J_x together with all its translates by vectors in 5qr_1 Z^d which intersect [−r, r)^d. We label these sublattices of cubes I_{0,1}, . . . , I_{0,5^d}, and the cardinality satisfies (2.5) |I_{0,j}| ≤ 2(1 + r/(5qr_1))^d for each j. We denote the corresponding union of cubes as I_{0,j} = ∪_{J_x ∈ I_{0,j}} J_x. The spacing of the cubes in I_{0,j} means that the shells {𝒥_x : J_x ∈ I_{0,j}} are disjoint, so the variables {N_x : J_x ∈ I_{0,j}} are i.i.d.
For a ≥ 1, taking λ = 1/5 in (2.4) we obtain E e^{λN_x} ≤ 3 · 5^d, and hence, using (2.5), provided a is large (depending on r_1), the required exponential bound in A1(v) follows. Finally we consider A1(iv). Let x, y ∈ V. As in Remark 2.1, Π_xy passes through a sequence of Voronoi cells Q_x = Q_{x_0}, Q_{x_1}, . . . , Q_{x_m} = Q_y, and there is a corresponding path x = x_0 → x_1 → · · · → x_m in the Delaunay graph. (There is probability 0 that Π_xy intersects some Q_u in just a single point, so we will ignore this possibility, meaning that "passes through" here is unambiguous.) For j < m let a_j be the first point of Q_{x_j} in Π_xy and let a_m = y, so by convexity of cells, Q_{x_j} ∩ Π_xy = [a_j, a_{j+1}] for all 0 ≤ j < m. We select indices 0 = j(0) < j(1) < · · · < j(ℓ) = m iteratively, taking j(k + 1) to be the least index j > j(k) for which either |a_{j+1} − a_{j(k)+1}| > δ_G or j = m. Then ⟨x_{j(k)}, x_{j(k+1)}⟩ is always a Delaunay or augmentation edge, so we consider the path x = x_{j(0)} → x_{j(1)} → · · · → x_{j(ℓ)} = y. For k ≤ ℓ − 1, using (2.2) we may bound each step |x_{j(k+1)} − x_{j(k)}| in terms of the corresponding increment along Π_xy. Having ℓ ≥ 2 also ensures |y − x| ≥ |a_{j(1)} − a_{j(0)}| > δ_G, so the total length of the path is at most a constant multiple of |y − x|. On the other hand, if ℓ = 1 then the same bound holds directly. This proves bounded dilation.

Straightness of geodesics and regularity of means
For q > 0 and x ∈ R^d, let ψ_q(x) be the point of qZ^d closest to x (with ties broken arbitrarily). The bound (1.3) applies to deterministic x, y; we cannot, for example, take x, y ∈ V. Instead, for random x, y we can apply (1.3) to nearby points of qZ^d for some q, and use the following lemma. It is the only place the assumption A1(iv) of bounded dilation is used.
Lemma 3.1. Suppose G = (V, E) and {η_e, e ∈ E} satisfy A1, A2, and A3. There exist constants C_i as follows.
(i) Let r ≥ 2 and t ≥ C_39 log r; then a uniform exponential bound holds over x, y ∈ B_r(0). (ii) A corresponding bound holds for all x, y ∈ R^d.
Proof. To prove (i) we condition on V. Given x, y ∈ B_r(0) ∩ V, by A1(iv) there exists a path from x to y of Euclidean length at most C_18|y − x|. Writing η_j for the speed of the jth edge of this path, and using convexity of log M, for λ > 0 we obtain an exponential moment bound; using A2(ii) it then follows that, for some c_i, the probability in question is exponentially small, where I_e(t) = sup_{γ>0}(γt − log ζ(γ)) is the large-deviations rate function of the variables η_e. From this, using A1(v), (i) follows. To prove (ii) we apply (3.2) to ϕ(x) and ϕ(y), and use the fact that |ϕ(x) − x| ≤ 1 for all x.
Building on Lemma 3.1 we obtain the following.
Lemma 3.2. Suppose G = (V, E) and {η_e, e ∈ E} satisfy A1, A2, and A3. There exist constants C_i as follows. For all r ≥ 2, u, v ∈ R^d and t > 0 with σ(|u − v|) ≥ r and tσ(|u − v|) ≥ C_43 r log r, the passage time satisfies an exponential lower-tail bound on scale σ(|u − v|) (with constants C_44, C_45).
We write x → y_1 → · · · → y_k → y in Γ_xy to denote that the vertices y_i ∈ V appear in the order y_1, . . . , y_k in traversing Γ_xy from x to y. For a preceding b in Γ_xy we write Γ_xy[a, b] for the segment of Γ_xy from a to b. (Here we do not require a, b ∈ V.) For v in a geodesic Γ_xy and 0 < s < |v − x|, let u be the first vertex in Γ_xy ∩ V before v satisfying Γ_uv ⊂ B_s(v). We then call Γ_uv the trailing s-segment of v in Γ_xy; such u exists by (2.1). Define the hyperplanes, slabs, and halfspaces H_a = {x : x_1 = a}, H_{[a,b]} = {x : a ≤ x_1 ≤ b}, H_a^− = {x : x_1 ≤ a}, H_a^+ = {x : x_1 ≥ a}. We turn next to a weaker version of Theorem 1.4 similar to (1.7); we will need it on the way to the proof of Theorem 1.4. Recall that h(r) − µr is nonnegative by subadditivity of h. We can use radial symmetry here to give a distinctly shorter proof than that of (1.7) in [1].
Proposition 3.3. Suppose G = (V, E) and {η_e, e ∈ E} satisfy A1, A2, and A3. There exists C_46 such that for all r ≥ 2, h(r) ≤ µr + C_46 σ_r log r.
Proof. It is sufficient to prove the bound for all sufficiently large r, so we will tacitly assume r is large, as needed.
We consider the geodesic Γ_{0,nre_1} for a fixed large n. Let β be as in A1(iii). For 0 ≤ j ≤ n − 1 let v_j be the first vertex v in Γ_{0,nre_1} ∩ V with the property that the trailing (r − 2β)-segment of v in Γ_{0,nre_1} is contained in H^+_{jr}, and let u_j be the starting point of this segment. From (2.1) it is easy to see that we must have jr ≤ (u_j)_1 ≤ jr + 2. It follows that Γ_{0,nre_1}[u_j, v_j] ⊂ B_{r−2β}(v_j) ∩ H^+_{jr}; we denote this last region by W_{v_j}. Note that any two sets W_{v_j} are separated by distance at least 2β − 2.
We need to control the entropy of the collection of pairs {(u_j, v_j) : 0 ≤ j < n}. To do this, we enlarge this collection in such a way that we can put a natural tree structure on it. To that end, let v_n be the first vertex in Γ_{0,nre_1} ∩ V, if one exists, which is at distance at least r from ∪_{0≤j<n} W_{v_j}. Then let u_n ∈ V be such that Γ_{0,nre_1}[u_n, v_n] is the trailing (r − 2β)-segment of v_n in Γ_{0,nre_1}, and let W_{v_n} = B_{r−2β}(v_n), which contains Γ_{0,nre_1}[u_n, v_n]. We repeat this to obtain (u_{n+1}, v_{n+1}), . . . , (u_N, v_N), stopping when no v_{N+1} exists. Note this preserves the property of separation by distance at least 2β − 2. Also, each new v_j is within distance 2r − 2β + 2 of some already-existing v_i.
We define the discretizations û_j = ψ_q(u_j) and v̂_j = ψ_q(v_j), and then let Ŵ_j be a slight enlargement of W_{v_j}, which contains W_{v_j}; the sets Ŵ_j, 0 ≤ j ≤ N, are separated from each other by distance at least 2β − 6. We call {(û_j, v̂_j) : 0 ≤ j < n} primary pairs, and {(û_j, v̂_j) : n ≤ j ≤ N} secondary pairs.
We now make a graph with vertices {(û_j, v̂_j) : 0 ≤ j ≤ N} by placing an edge between the ith and jth pairs if |v̂_j − v̂_i| ≤ 4r. The construction ensures that the resulting graph is connected, and it is easy to see that the disjointness of the sets Ŵ_j means the number of neighbors of any pair is bounded by some c_0. We label (û_0, v̂_0) as the root, and by some arbitrary algorithm we take a spanning tree of the graph, which we denote T(Γ_{0,nre_1}). For counting purposes, we view two such trees as the same if they have the same pairs {(û_j, v̂_j) : 0 ≤ j ≤ N} and the same set of primary pairs. We define parents and offspring in this rooted tree in the usual way: for a given pair (û_j, v̂_j), its parent is the first pair after (û_j, v̂_j) in the unique path from (û_j, v̂_j) to the root, and its offspring are those pairs having (û_j, v̂_j) as parent.
The tree T(Γ_{0,nre_1}) determines what we will call an abstract tree, in which all that is specified is the number of offspring of the root, then the number of offspring of each of these offspring, etc.
The number of possible abstract trees here with N + 1 vertices is at most c_0^N, and, provided r is large, for each such abstract tree the number of corresponding actual trees T(Γ_{0,nre_1}) is at most ((8r)^d)^{2N+2}, so the number of trees T(Γ_{0,nre_1}) consisting of N + 1 pairs is at most (c_1 r)^{2d(N+1)}.
We now consider all possible trees T_0, with vertices {(û_j, v̂_j) : 0 ≤ j ≤ N} and with {(û_j, v̂_j) : 0 ≤ j < n} primary. Let v(T_0) denote the number of vertices in the tree T_0. This gives the bound (3.8). We proceed by contradiction: we want to show that for some C_46, if h(r) > µr + C_46 σ_r log r then the right side of (3.8) approaches 0 as n → ∞, which means lim sup_n T(0, nre_1)/n ≥ µr + 1 a.s., contradicting the definition of µ.
Thus suppose (3.9) h(r) > µr + 2C_46 σ_r log r, with C_46 to be specified, and fix T_0 with vertices {(û_j, v̂_j) : 0 ≤ j ≤ N}. It follows that, letting T̂_j denote the passage time associated to the jth pair, the T̂_j's are independent (since the sets W¹_j are separated from each other by distance more than 2β − 8 > β). Each T̂_j is stochastically larger than a variable to which Lemma 3.2 applies. From (1.10) and (3.7) we have for some c_2 that σ(|û_j − v̂_j|) ≤ c_2 σ_r. From (3.7), (3.9), and subadditivity we also have a corresponding lower bound on the means. Hence from Lemma 3.2 we obtain that, provided r is large, an exponential lower-tail bound holds for all t ≥ 1; by increasing C_44 we may make this valid for all t > 0. We then have, for λ = C_45/(2c_2 σ_r) and M = µr + C_46 σ_r log r, an exponential Chebyshev bound which is at most e^{−n log r}. (3.14) As we have noted, since this approaches 0 as n → ∞, it contradicts the fact that T(0, nre_1)/n → µr a.s. Thus (3.9) must be false.
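The exponential Chebyshev computation behind (3.14) has the following standard shape (a schematic sketch under the stated lower-tail bounds, with the tree-entropy factor suppressed; not the paper's exact display):

```latex
% For independent nonnegative X_j (here X_j = \hat T_j) and \lambda > 0,
% Markov's inequality applied to e^{-\lambda \sum_j X_j} gives
\[
  P\Bigl(\sum_{j=0}^{N} X_j \le (N+1)M\Bigr)
  = P\Bigl(e^{-\lambda \sum_j X_j} \ge e^{-\lambda (N+1) M}\Bigr)
  \le e^{\lambda (N+1) M} \prod_{j=0}^{N} E\, e^{-\lambda X_j}.
\]
% If each X_j has mean at least M + a with lower-tail fluctuations of
% scale \sigma_r, then for \lambda of order 1/\sigma_r each factor
% satisfies E e^{-\lambda X_j} \le e^{-\lambda(M+a) + O(\lambda^2\sigma_r^2)},
% so the product contributes e^{-(N+1)(\lambda a - O(\lambda^2\sigma_r^2))}.
% With a = C_{46}\sigma_r \log r this decays fast enough in r to beat
% the (c_1 r)^{2d(N+1)} entropy factor from the tree count.
```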
We need to use a result from [2] to the effect that "geodesics are very straight." It is proved there for FPP on a lattice, but the proof goes through essentially unchanged in the present context. The heuristics are as follows: suppose the geodesic Γ_{0,re_1} passes through a vertex u = (u_1, u^*) at distance s from Π_{0,re_1}; by symmetry we may suppose u ∈ H^-_{r/2}. The geodesic then travels a corresponding extra distance g(u) + g(re_1 − u) − g(re_1). If the angle between u and re_1 is small, this extra distance is of order |u^*|²/u_1, and from (1.11), the cost of this (meaning minus the log of the probability) is of order |u^*|²/(u_1 σ(u_1)). If instead the angle between u and re_1 is not small, the extra distance is of order |u| and the cost is of order |u|/σ(|u|). We can combine these into a single statement by saying the cost for general u should be whichever of these two costs is smaller, at least for u ∈ H_{[0,r]}.
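For the isotropic case g(x) = µ|x|, the order |u^*|²/u_1 of the extra distance follows from a Taylor expansion (a routine check, written out here for convenience):

```latex
% For u = (u_1, u^*) with |u^*| \ll u_1 and |u^*| \ll r - u_1,
% expanding \sqrt{a^2 + b^2} \approx a + b^2/(2a) gives
\[
  g(u) + g(re_1 - u) - g(re_1)
  = \mu\Bigl(\sqrt{u_1^2 + |u^*|^2}
      + \sqrt{(r-u_1)^2 + |u^*|^2} - r\Bigr)
  \approx \frac{\mu |u^*|^2}{2}
      \Bigl(\frac{1}{u_1} + \frac{1}{r-u_1}\Bigr),
\]
% which is of order |u^*|^2 / u_1 when u lies in the left half
% (u_1 \le r/2), matching the small-angle cost in the heuristic.
```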
The exact formulation of the straightness result contains extra log factors relative to the preceding heuristic, due to the need to bound the probability for all u simultaneously. It is as follows, using the constants C_i, χ_i of (1.10). Define σ^*(s) and Φ(s) as follows.
Here factoring out a power of s on the right, and the use of the sup, ensure that Φ is strictly increasing. Note that by (1.10) and (3.15), for large s, corresponding power bounds hold for σ^* and Φ. The "min" in the definition of D is in accordance with our heuristic: from [2], for large |u| and with C_23 from (1.10), each of the two costs dominates in its respective regime. Finally, define the symmetric version D_r of D, imposing the same cost relative to each of the endpoints 0 and re_1. This makes the right half of the region {u : D_r(u) ≤ c} the mirror image of the left half; this region is a "tube" (narrower near the ends) surrounding the line from 0 to re_1, bounded by the shell {u : |u^*| = c^{1/2} Ξ(u_1)}, augmented by a cylinder of radius Φ^{−1}(c) and length 2Φ^{−1}(c) around each endpoint, so we will call it a tube-and-cylinders region.
We will also consider tube-and-cylinders regions around general pairs u, v in place of 0, re_1. To that end, let Θ_uv : R^d → R^d be translation by −u followed by some unitary transformation which takes v − u to the positive horizontal axis, so that Θ_uv(u) = 0, Θ_uv(v) = |v − u|e_1. (The particular choice of unitary transformation does not matter.) Define the tube-and-cylinders regions accordingly, and note the latter contains Π_uv. The proof of the acceptable-random-graphs version of the straightness bound is little changed from the lattice-FPP version in [2]; we can readily use Lemma 3.1(i) to change the result from "point-to-point" (say, 0 to re_1) to "ball-to-ball," with a sup over x and y. We omit the details.
Proposition 3.4. Suppose G = (V, E) and {η_e, e ∈ E} satisfy A1, A2, and A3. There exist constants C_i as follows. For all r, t > 0, the bound (3.20) holds.
The following lattice-FPP result from [2] also carries over straightforwardly to the present context with the help of Lemma 3.1(i), and again we omit details.
Proposition 3.5. Suppose G = (V, E) and {η_e, e ∈ E} satisfy A1, A2, and A3. There exist constants C_i as follows. For all u, v ∈ qZ^d and all λ ≥ C_52, the corresponding bound holds.
We next prove a seemingly obvious fact: h(r) = ET(0, re_1) is approximately increasing in r.
Proof of Lemma 3.6. We prove (i), then obtain (ii) as a straightforward consequence. The idea is to show that Γ 0,(r+s)e 1 must with high probability approach (r + s)e 1 approximately horizontally, which forces T (0, (r + s)e 1 ) − T (0, re 1 ) to be near h(s) with high probability; a non-horizontal approach would force the sup in (3.20) to be large.
Suppose (3.23) holds under the added condition s ≤ r/4. Then for s > r/4 we can take n with s/n ≤ r/4 < s/(n − 1) and see that the hypotheses are satisfied with s/n in place of s. (This may require increasing C_55, but without dependence on n.) Applying (3.23) n times then yields the general case. Therefore it is sufficient to prove the lemma for s ≤ r/4. It is also sufficient to consider 0 < ε < ε_0 for any fixed ε_0 = ε_0(q) > 0. With c_0 to be specified, define the relevant quantities. Note that provided C_55 (and hence s and m) is large, which we henceforth tacitly assume, and provided ε_0 is small, we have Φ^{−1}(t) < m/2 and m ≥ 2s. Suppose w ∈ S_1. We have |w_1 − (r + s)| ≥ m − 2 > Φ^{−1}(t) (meaning w lies to the left of the cylinder around (r + s)e_1 in the tube-and-cylinders region {u : D_{r+s}(u) ≤ t}), and so (3.17) applies. Thus S_1 ⊂ S_2. We let ŵ = (r + s − m, w_*), so |w − ŵ| ≤ 2 for w ∈ S_2. Then for such w, recalling m ≥ 2s, we obtain, assuming C_55 is large and ε_0 is small, the corresponding bound. It follows that, provided we choose c_0 large enough in (3.25), the required estimate holds. With (3.27) and Proposition 3.3, this yields that, provided C_55 is large, the analogous bound holds. We need a lower bound for t, since q > q̂. Observe that for every vertex w ∈ Γ_{0,(r+s)e_1} ∩ V we have (3.30), and by subadditivity the corresponding bound. Assuming ε_0 is small, by Proposition 3.4 and (3.29) we have the stated estimate. Combining these yields the required inequality. To use (3.31) we also need a lower bound for E(T_*). For y > 0, using (3.28), we obtain (3.34). We consider the first probability on the right in (3.34); the second probability is similar. For w ∈ S_2 we have, using (3.26), that m − 2 ≤ |w − (r + s)e_1| ≤ (1 + 2ε)m, and then (1.10) applies. From these and Lemma 3.2 (see C_45 there) we obtain that if c_0 is large enough, then the stated bound holds for y ≥ c_0. With (3.34) this gives the required estimate. Combining this with (3.31) and (3.33) yields (3.23). We now prove (ii). From (i), there exists c_6 such that h(r + s) ≥ h(r) + (1 − ε)h(s) whenever r, s ≥ c_6; there then exists c_7 such that h(s) ≤ c_7 whenever 0 ≤ s ≤ c_6. We therefore have the required chain of inequalities, which proves (3.24).
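The reduction from general s to s ≤ r/4 at the start of the proof can be written out; this is a hedged sketch, assuming (3.23) takes the approximate-superadditivity form h(r + s) ≥ h(r) + (1 − ε)h(s), with error terms suppressed:

```latex
% n is chosen with s/n \le r/4 < s/(n-1); apply (3.23) repeatedly:
h(r+s) \;\ge\; h\!\Bigl(r + \tfrac{(n-1)s}{n}\Bigr) + (1-\epsilon)\,h\!\bigl(\tfrac{s}{n}\bigr)
\;\ge\;\cdots\;\ge\; h(r) + n\,(1-\epsilon)\,h\!\bigl(\tfrac{s}{n}\bigr).
```

Each application is legitimate since the increment s/n ≤ r/4 is at most a quarter of the (only growing) first argument.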

Proof of Theorem 1.2: downward deviations
We use a multiscale argument which is related to chaining. But first we dispense with simpler cases that only require Lemma 3.2 and Proposition 3.4. The first such case is pairs x, y which are close together. For technical convenience later, we prove the following also for geodesics with endpoints in the set G + r , satisfying G + r ⊃ G r (K) for K fixed and r large, given by with C 60 to be specified.
Lemma 4.1. Suppose G = (V, E) and {η_e, e ∈ V} satisfy A1, A2, and A3. There exist constants C_i such that for all r ≥ 2, K ≥ 1, and t > C_61 K^2, (4.1) holds.

Proof. We may assume t is large. We first discretize as in (4.2); then if we take c_0 small enough, Lemma 3.2 yields a bound which with (4.2) proves (4.1).

Lemma 4.1 means we need only consider x, y ∈ G_r(K) satisfying (4.4). Writing α_uv for the angle between nonzero vectors u, v ∈ R^d, this means that for large r the angle bound (4.5) holds. A second simple case is small r. For fixed r_0 and 1 ≤ r ≤ r_0, from Lemma 3.1, for all x, y ∈ G_r we have ET(x, y) ≤ c_1 r, so for t ≥ c_2 r_0 we have ET(x, y) − tσ_r ≤ c_1 r − tσ_r < 0, and hence the probability in (1.13) is 0. Therefore there exist C_29, C_30 such that (1.13) is valid for all 1 ≤ r ≤ r_0 and t > 0.
A third simple case is t ≥ c_3 log(Kr), with c_3 large enough. As in (4.2) and (4.3), for C_45 from Lemma 3.2 we then have the corresponding bound. It follows that, for C_26 from Theorem 1.2, we need only consider C_26 K^2 ≤ t < c_3 log(Kr), and therefore also K ≤ c_7 (log r)^{1/2}, which means G_r(2K) ⊂ G_r^+. A fourth simple case is pairs x, y for which Γ_xy goes well outside G_r(K), when r is large and t < c_3 log(Kr). Assuming r is large, the choice C = 2c_3, from the third simple case, is the one of interest in the following.
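The role of the threshold t ≥ c_3 log(Kr) in the third simple case can be seen from a crude union bound; this is a heuristic sketch only, assuming exponential concentration of T(x, y) on scale σ_r as in the abstract, with constants C, c', c, C' ours:

```latex
% At most (c K^{d-1} r^{d})^2 pairs x, y, each with exponential concentration:
\sum_{x,y}\, P\bigl(|T(x,y)-E\,T(x,y)| \ge t\sigma_r\bigr)
\;\le\; \bigl(c\,K^{d-1} r^{d}\bigr)^{2}\, C e^{-c' t}
\;\le\; C' e^{-c' t/2},
```

the last inequality holding once t ≥ c_3 log(Kr) with c_3 sufficiently large (depending on c' and d), so that the entropy factor is absorbed.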
Recall the transformation Θ_{x̂ŷ} which takes Π_{x̂ŷ} to Π_{0,|ŷ−x̂|e_1}. Suppose w ∈ Γ_{x̂ŷ} \ G_r^+ and let w̃ = Θ_{x̂ŷ} w; we may take w with d(w, G_r^+) ≤ 2. We claim that, for D_r from (3.19), we have D_r(w̃) ≥ C_60 log r, with C_60 from the definition of G_r^+. We consider several cases.

4.1.
Step 1. Setting up the coarse-graining. For purposes of coarse-graining and multiscale analysis of paths, we build grids inside G_r^+ on various scales, using small parameters λ, δ, β satisfying (4.12) 1 ≫ λ ≫ δ^{χ_1} ≫ δ^{(1+χ_1)/2} ≫ β, in the sense that the ratio of each term to the one following must be taken sufficiently large, in a manner to be specified. We choose these so 1/δ and 1/β are integers. There is also a fourth parameter ρ > 1, and we further require all of (4.13), which can be satisfied by taking β small enough after choosing ρ, λ, δ. For j ≥ 1, a jth-scale hyperplane is one of the form H_{kδ^j r}, k ∈ Z, and the jth-scale grid in G_r^+ is defined accordingly, where K_0 ∈ [1, 2] is to be specified. Note that larger j values correspond to smaller scales, and since 1/δ is an integer, a jth-scale hyperplane is also a kth-scale hyperplane for every k > j.
The jth-scale grid divides a jth-scale hyperplane into cubes which we call jth-scale blocks. For concreteness we take these blocks to be products of left-open-right-closed intervals. Each point u of the hyperplane then lies in a unique such block. For a point u in a jth-scale hyperplane, the jth-scale coarse-grain approximation of u is the point V j (u) which is the unique corner point in Q j (u). We abbreviate coarse-grain as CG. The definition ensures that two points with the same jth-scale CG approximation also have the same kth-scale CG approximation for all larger scales k < j.
A transverse step in the jth-scale grid is a step from some u ∈ L_j to an adjacent point of the grid, so the bound from (1.10) applies. Here ℓ_1(j) is chosen so that the typical transverse fluctuation ∆(δ^j r) for a geodesic making one longitudinal step is ℓ_1(j) transverse steps. On short enough length scales, coarse-graining is unnecessary because we can use Proposition 3.5 and Lemma 4.1. More precisely, we will need only consider j ≤ j_1 = j_1(r), where j_1 is the least j for which the stated inequality holds, meaning j_1(r) is of order log log r. Provided r is large, this means the spacings δ^{j_1} r and K_0 β^{j_1} ∆_r of the j_1th-scale grid are large, and therefore we can choose q ∈ [4, 5] so that δ^{j_1} r is an integer multiple of q, and then choose K_0 ∈ [1, 2] such that K_0 β^{j_1} ∆_r is an integer multiple of q. Then (since 1/δ and 1/β are integers) for all j ≤ j_1, the jth-scale grid in every jth-scale hyperplane is contained in qZ^d, which we call the basic grid.
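The size of ℓ_1(j) (the number of transverse steps corresponding to one longitudinal step) admits a heuristic consistency check; this sketch assumes the fluctuation scale satisfies σ(δ^j r) ≈ δ^{jχ_1} σ_r, in line with the exponent χ_1 appearing in (4.12):

```latex
% Heuristic only, assuming \sigma(\delta^j r) \approx \delta^{j\chi_1}\sigma_r:
\Delta(\delta^j r) \;=\; \bigl(\delta^j r\,\sigma(\delta^j r)\bigr)^{1/2}
\;\approx\; \delta^{j(1+\chi_1)/2}\,\Delta_r,
\qquad\text{so}\qquad
\ell_1(j) \;\approx\; \frac{\Delta(\delta^j r)}{\beta^{j}\,\Delta_r}
\;\approx\; \Bigl(\frac{\delta^{(1+\chi_1)/2}}{\beta}\Bigr)^{j}.
```

In particular ℓ_1(j) is large, since (4.12) requires β ≪ δ^{(1+χ_1)/2}: the transverse grid spacing β^j ∆_r is much finer than the typical transverse fluctuation on the corresponding longitudinal scale.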
We say an interval in R has kth-scale length if its length is between 10δ^{k+1} r and 10δ^k r. Given x, y ∈ G_r with x_1 < y_1 satisfying (4.11), we define a hyperplane collection H_xy, which depends on the geodesic Γ_xy, constructed inductively as follows. All hyperplanes H_s ∈ H_xy have s ∈ [x_1, y_1], and we view the hyperplanes as ordered by their indices. Let j_2(x, y) be the least j such that there are at least 4 jth-scale hyperplanes between x and y. Subject always to the constraint s ∈ [x_1, y_1], at scale j_2 we put in H_xy the j_2th-scale hyperplanes H_s second closest to x and to y; we call these j_2-terminal hyperplanes. The gap between the j_2-terminal hyperplanes is at least 4δ^{j_2} r and at most 5δ^{j_2−1} r. In general, when we have chosen the jth-scale hyperplanes in H_xy for some j ≥ j_2, each gap between consecutive ones is called a jth-scale interval, and the hyperplanes bounding it are the endpoint hyperplanes of the interval. For any interval I we also call {H_s : s ∈ I} an interval; which meaning is intended should be clear from the context. A jth-scale interval is short if it has jth-scale length, and long otherwise. We then add (j + 1)th-scale hyperplanes to H_xy, of 3 types.
(i) The first type consists of two hyperplanes, which are the (j + 1)th-scale hyperplanes second closest to x and to y, which we call (j + 1)-terminal hyperplanes. A (j + 1)-terminal hyperplane may also be a kth-scale hyperplane on some larger scale k < j + 1, in which case we call it an incidental kth-scale hyperplane. (ii) As a second type, for each non-incidental jth-scale hyperplane H_{kδ^j r} ∈ H_xy we put in H_xy the closest (j + 1)th-scale hyperplanes on either side of H_{kδ^j r}, that is, H_{(kδ^j − δ^{j+1})r} and H_{(kδ^j + δ^{j+1})r}, which we call sandwiching hyperplanes. (iii) The third type is (j + 1)th-scale joining hyperplanes; we place between 1 and 4 of these in each long jth-scale interval, depending on the behavior of Γ_xy in the interval in a manner to be specified below. Joining hyperplanes are always placed in the "extremal 10ths" of the long interval; more precisely, if the jth-scale interval has kth-scale length then they are placed at distance δ^ℓ r from one of the endpoints, for some k + 1 ≤ ℓ ≤ j, at most 2 at either end.
We use superscripts − and + for quantities associated with left-end and right-end joining hyperplanes, respectively. We continue through all scales from j 2 to j 1 ; after adding j 1 th-scale hyperplanes, H xy is complete and we stop.
A terminal jth-scale interval in [x 1 , y 1 ] is an interval between a terminal (j+1)th-scale hyperplane and the terminal jth-scale hyperplane closest to it; the length of such an interval is necessarily between δ j r and 2δ j r, so it is short.
We call the values µ^{±,1}_{xy}(I) outer joining points, and µ^{±,2}_{xy}(I) inner joining points; the inner ones will represent locations where a certain other path traversing I can be guided to coalesce with Γ_xy, and we call H_{µ^{±,ℓ}_{xy}(I)} the (potential) joining hyperplanes of the interval I. We say "potential" because not all are necessarily actually included in H_xy; which ones are included depends on rules to be described.
Recall that ψ_q(u) denotes the closest point to u in the basic grid, and F_y = ψ_q^{−1}(y), y ∈ qZ^d. We need only consider the case in which x, y each share a Voronoi cell with a basic grid point; we readily obtain the general case from this via Lemma 3.1. For a path γ and vertices v, w of γ, write γ_{[v,w]} for the segment of γ from v to w, and let u_s(γ) denote the entry point of γ into H_s^+, that is, the first vertex of γ in H_s^+, necessarily next to H_s (in the sense that its Voronoi cell intersects H_s). Our aim is to approximate a general geodesic Γ_xy (subject to (4.11)) by a CG one via certain marked (basic) grid points which we will designate, lying in hyperplanes H_s ∈ H_xy. In the geodesic Γ_xy, the first and last marked grid points are x̂ = ψ_q(x) and ŷ = ψ_q(y) (see (4.17)). Initially, the ones in between are the discrete approximations ψ_q(u_s(γ)) of the entry points u_s(γ) corresponding to each of the hyperplanes H_s ∈ H_xy. Here since q ≥ 4 and d(u_s(γ), H_s) ≤ 2, we always have ψ_q(u_s(γ)) ∈ H_s. Later we will remove some of these initial marked grid points, and replace others with their jth-scale CG approximations for various j. This means that not every H_s ∈ H_xy necessarily contains a marked grid point, in all our CG approximations. When a path Γ has a marked grid point in some H_s, we denote that marked grid point as m_s(Γ). For all our CG approximations, the first and last marked grid points are ψ_q(x) and ψ_q(y), and for some j the ones in between each lie in the jth-scale grid in some jth-scale hyperplane H_{kδ^j r} ∈ H_xy. If such a CG path γ has marked grid points v, w in consecutive jth-scale hyperplanes of H_xy, we say γ makes a jth-scale transition from v to w. We say a jth-scale transition is normal if |(w − v)_i| ≤ ℓ_2(j)β^j ∆_r for all 2 ≤ i ≤ d, and sidestepping otherwise.
Even on the smallest scale j_1, every jth-scale transition with v, w ∈ G_r^+ is nearly in the e_1 direction, with angle satisfying (similarly to (4.5)) the stated bound provided r is large, with C_60 from the definition of G_r^+. We refer to a path γ from x to y, together with its marked grid points, as a marked path; the path alone, without the marked grid points, is called the underlying path. We may write a marked path as ψ_q(x) = v_0 → v_1 → · · · → v_m → v_{m+1} = ψ_q(y); here the v_i are grid points. If γ is the concatenation of the geodesics Γ_{v_{i−1},v_i}, we call it a marked piecewise-geodesic path; we abbreviate piecewise-geodesic as PG. (Recall that when v, w ∉ G, Γ_{vw} denotes Γ_{ϕ(v),ϕ(w)}.) Unless otherwise specified, when we give a marked path by writing its marked points in this way, we assume the path is the (unique) marked PG path given by those marked points.
We associate four quantities to this path, given by the displays. Using the standard fact that, for some c_9 ≤ 1, the displayed inequality holds, we see that the stated bound follows provided r is large. From (4.19), subadditivity of h, and Lemma 3.6 we obtain, after reducing c_9 if necessary, the corresponding bound; in our applications, the last condition will always be satisfied due to (4.18), provided r is large. We can now define the joining points µ^{±,ℓ}_{xy}(I) in a long jth-scale interval I = [a, b] of kth-scale length, where j_2 ≤ k < j ≤ j_1 − 1. We begin with µ^{−,ℓ}_{xy}(I), ℓ = 1, 2, which lie in the left part of I. We first define a modified value of t, as in (4.23), which will appear in Lemmas 4.5 and 4.6. We consider a marked PG path with marked points in the (j + 1)th-scale grid, and for k + 1 ≤ ℓ ≤ j we let α(ℓ) and κ(ℓ) be as displayed. We may view α(ℓ)/2 as an approximation of the "extra distance at scale δ^ℓ r."

Figure 4. Starting from the left, there is an endpoint hyperplane, then one of its sandwiching hyperplanes; the rightmost two are the joining hyperplanes, containing v_{ℓ+1} and v_ℓ. In the bowed case, v_{ℓ+1} is sufficiently far from g_{ℓ+1}.
See Figure 4. Then, in view of (4.18), suppose that for some k + 1 ≤ ℓ ≤ j we have (4.28), as happens if 2^m κ(m) is maximized at m = ℓ. From (1.10), the second inequality in (4.28) ensures (4.29), and the first inequality in (4.28) tells us (4.30). It then follows from (4.27), (4.29), and (4.30) that the path from v_0 to v_ℓ is bowed in the sense of (4.31). Motivated by this we define L^−(I) via (4.32). We refer to the 3 options in (4.32) as the forward, bowed, and totally unbowed cases, respectively. They may be interpreted as follows. In the forward case the initial steps v_0 → v_1 → v_2 have little sidestepping, and we will see that this eliminates the need for bowedness. Otherwise we look for a scale δ^ℓ r, with k + 1 ≤ ℓ ≤ j, on which Γ_xy is bowed as in (4.31), by seeking a scale (the arg max) satisfying (4.28). In the bowed case such a scale exists. In the totally unbowed case there is no such scale, meaning κ(·) is maximized for essentially the full length scale of the interval I. By (4.30) this forces the extra distance α(k + 1) to be very large. We then define the inner and outer joining points accordingly. We define µ^{+,ℓ}_{xy}(I) in a "mirror image" manner to µ^{−,ℓ}_{xy}(I); otherwise the definition is the same, and the analogs of (4.27), (4.29), and (4.30) are valid for the analog L^+(I) of L^−(I).
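The quantitative content of bowedness rests on the elementary extra-distance computation already used in the introduction; here is a hedged worked version (the symbols L and h are ours, for illustration):

```latex
% Take u, v with |v-u| = L and a point z at transverse distance h from
% \Pi_{uv}, at the longitudinal midpoint; then the extra distance is
|z-u| + |v-z| - |v-u|
\;=\; 2\sqrt{(L/2)^2 + h^2} \;-\; L
\;=\; \frac{2h^2}{L}\,\Bigl(1 + O\bigl(h^2/L^2\bigr)\Bigr).
```

With L ≈ r and h ≈ ∆_r = (rσ_r)^{1/2}, the extra distance is of order ∆_r^2/r = σ_r, i.e. comparable to the passage-time fluctuation scale; this is why wandering beyond a large multiple of ∆_r, and the associated bowedness, carries a quantifiable cost.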
We now describe the rules for which (j + 1)th-scale joining hyperplanes H s with s = µ ±, xy (I), in a long jth-scale interval I, are included in H xy . We note again that the endpoint and sandwiching hyperplanes at both ends of I are always included; in some instances the sandwiching hyperplanes coincide with outer joining hyperplanes, so these criteria never rule out the inclusion of such hyperplanes.
(i) If both ± ends of I have the forward case, then we include the inner joining hyperplanes in H_xy; these are at distance δ^j r from the interval ends. The outer ones coincide with the sandwiching hyperplanes at distance δ^{j+1} r from the interval ends. (ii) If both ± ends have the bowed case, then we include both the inner and outer joining hyperplanes in H_xy. We note that when either of L^±(I) = j, the corresponding outer joining hyperplane coincides with the sandwiching hyperplane as in (i), so it is already in H_xy on that basis.
(iii) If both ± ends have the totally unbowed case, then we include only the inner joining hyperplanes. (iv) If the two ends have different cases, we determine which end of the interval is dominant according to the criterion described next. We then include the joining hyperplane(s) only at the dominant end, 1 or 2 hyperplanes in accordance with (i)-(iii) above. We call this the mixed case.
To determine the dominant end of a long jth-scale interval I = [a, b], necessarily having kth-scale length for some k < j, in the mixed case, we first select those hyperplanes which are candidates for inclusion in H_xy, in accordance with (i)-(iii) above. (For example, if the path has the bowed case with L^−(I) = j at the left end and the forward case at the right, we select the inner and outer joining hyperplanes on the left, and only the inner on the right.) For these candidate hyperplanes, along with the 4 endpoint and sandwiching hyperplanes in I, we consider the corresponding "tentative" marked PG path (part of Γ_xy) with a marked point in each of the hyperplanes: u_0 → · · · → u_n with n = 5 or 6, where u_0 = V_j(m_{(u_0)_1}(Γ_xy)) and u_i = V_{j+1}(m_{(u_i)_1}(Γ_xy)), 1 ≤ i ≤ n, are jth- and (j + 1)th-scale CG approximations. The inner joining hyperplanes contain u_ℓ, u_{ℓ+1}, with ℓ = 2 or 3. Let w_i^⊥ be the orthogonal projection of u_i onto the line Π_{u_0 u_n}; see Figure 5. We use the fact that the excess length E(u_0, . . . , u_n) = Υ_Euc(u_0, . . . , u_n) − |u_n − u_0| can be approximately split into components associated with the two ends, as in (4.33). We have the bound for the first difference, and similarly for the third difference, while the estimate for the middle one shows that the middle difference in (4.33) is only a small fraction of the whole. We designate the left end of I as dominant if the first of the 3 differences on the right in (4.33) is larger than the third difference, and the right end in the reverse case. As given in (iv) above, we include in H_xy only the candidate joining hyperplanes from the dominant end; the non-dominant end has its endpoint and sandwiching hyperplanes but no joining ones. We note that if (for illustration) the left end is dominant, then, after excluding the right-end joining hyperplanes, we are left with the marked PG path u_0 → · · · → u_{ℓ+1} → u_{n−1} → u_n, for which the contribution of the right end to the extra length can be bounded: the last inequality follows from dominance of the left end, so similarly to (4.35), the corresponding bound holds. The same bound with |u_1 − w_1^⊥| in place of |u_{n−1} − w_{n−1}^⊥| holds symmetrically when the right end is dominant.

Figure 5. The mixed case with the bowed case at the (dominant) left end of the interval, and the forward case at the right end, showing the candidate hyperplanes and tentative marked PG path. Since the right end is not dominant, the outer joining hyperplane there, containing v_4, is not included in H_xy.
and symmetrically at the right end. See Figure 4; g_{ℓ+1} there is z_{j+1} here. We have the displayed bounds from (4.30), and as in (4.29) it follows from these that (4.39) holds. The advantage of (4.39) is that it depends only on the quantities just introduced. In (1.13) we may interpret tσ_r as a reduction in the time allotted to go from x to y, relative to h(|(y − x)_1|). In place of the reduction tσ_r relative to h(|(y − x)_1|), we can consider a modified reduction, call it R_0, which is relative to Υ_h(Γ^CG). The modified reduction is larger: using (4.22) we see that R_0 ≥ tσ_r. We will need to (roughly speaking) allocate pieces of R_0 to the various transitions made by Γ^CG and certain related paths. Motivated by this, we define the jth-scale allocation A^0_j(v, w) of a transition v → w to be given by (4.41). In (4.41) the factor 7^j is used due to (4.16). For a marked PG path Γ^CG as above, with m ≤ 7^j − 1 marks in the jth-scale grid, we have from (4.20) that the relevant allocations are large provided r is large and t ≥ 1.

Remark 4.4. We now present an outline of the strategy of the proof. Each geodesic Γ_xy can be viewed as a marked PG path with marks in the basic grid in each of the hyperplanes of H_xy. The goal is to gradually coarsen this approximation on successively larger length scales until we obtain a final path Γ^CG_xy. The number of possible final paths (outside of a collection of "bad" paths having negligible probability) is small enough so that a version of (1.13) can be proved for final paths.
To perform the coarsening we iterate a two-stage process, with the exception that the first iteration has only one stage. The first iteration is on the j 1 th scale, the second on the larger (j 1 − 1)th scale, and so on. For the j 1 th-scale iteration, we perform a set of operations on the original marked PG path (essentially Γ xy ) called shifting to the j 1 th-scale grid, replacing each marked grid point (located in the basic grid) in each hyperplane in H xy with a nearby point in the j 1 th-scale grid. Proceeding to the (j 1 − 1)th scale (second) iteration, in the first stage we shift those marked points lying in (j 1 − 1)th-scale hyperplanes in H xy to the (j 1 − 1)th-scale grid. In the second stage of the iteration, we remove from the marked PG path those marked points not lying in (j 1 − 1)th-scale hyperplanes, with exceptions for points in terminal hyperplanes. See Figure 6. In general, for the jth-scale iteration (j ≤ j 1 ), at the start of the iteration all the marked points in non-terminal hyperplanes are (j + 1)th-scale grid points in (j + 1)th-scale hyperplanes; in the first stage we shift the ones in jth-scale hyperplanes to the jth-scale grid, and in the second stage we remove the ones not in jth-scale hyperplanes, again with exceptions in terminal hyperplanes.
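The two-stage shift/remove iteration above can be illustrated by a toy sketch. This is a hypothetical simplification of ours, not the authors' construction: one transverse dimension, integer coordinates, no terminal-hyperplane exceptions, and no geodesic-dependent hyperplane collection; the names `snap_up`, `coarse_grain`, and all parameter values are assumptions for illustration.

```python
import math

def snap_up(x, h):
    """Corner of the left-open-right-closed cell of width h containing x."""
    return math.ceil(x / h) * h

def coarse_grain(marks, r, delta_r, delta, beta, j1, j_stop):
    """Run the scale-j iterations for j = j1, j1-1, ..., j_stop.
    marks: list of (s, w) pairs, s the longitudinal and w the transverse
    coordinate.  Stage 1: marks on jth-scale hyperplanes (s a multiple of
    delta^j * r) are shifted to the jth-scale transverse grid of spacing
    beta^j * delta_r.  Stage 2 (absent on the first, j1th-scale, iteration,
    as in the text): marks not on a jth-scale hyperplane are removed."""
    for j in range(j1, j_stop - 1, -1):
        long_spacing = round(r * delta ** j)       # delta^j * r
        trans_spacing = round(delta_r * beta ** j)  # beta^j * Delta_r
        updated = []
        for s, w in marks:
            on_jth_scale = (s % long_spacing == 0)
            if on_jth_scale or j == j1:
                updated.append((s, snap_up(w, trans_spacing)))
            # otherwise: removed in the second stage of the iteration
        marks = updated
    return marks
```

For example, with r = 10^4, ∆_r = 10^3, δ = β = 1/10, and j_1 = 3, a mark survives from scale 3 to scale 2 only if its longitudinal coordinate is a multiple of δ^2 r = 100, and its transverse coordinate is snapped to the spacing β^2 ∆_r = 10.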
The difficulty is that as we alter the marked PG path, the length Υ_Euc(·) and corresponding h-sum Υ_h(·) ≈ µΥ_Euc(·) change, with marked-point deletions always reducing these sums, and we need to ensure that, with high probability, the corresponding sum of passage times Υ_T(·) "tracks" these changes at least partly, to within a certain allocated error related to the above-mentioned A^0_j(v, w). For shifting to a grid the tracking is not too difficult to achieve, as the allowed error turns out to be larger than the change in h-sum being tracked. But for removal of marked points the tracking requires multiple different strategies, depending on the options in (4.32) for the marked grid points in the gap between each two successive jth-scale hyperplanes in H_xy. The particular tracking needed is that, with high probability to within the allocated errors, when marked points are removed from a gap, the decrease in total passage time Υ_T(·) is at least a positive fraction δ of the decrease in h-sum. The primary difficulty in achieving this is that if the gap has a large length L, then the relevant passage-time fluctuation size σ(L) may overwhelm both the reduction in h-sum and the allocated errors; here the remedy involves joining hyperplanes. We also make use of what we call intermediate paths, which (in most cases) have total passage time and h-sum in between the values that exist before and after the marked-point removal, and are chosen so that they are relatively easy to compare to the pre-removal path.
In (1.13) the passage time reduction for the full path is tσ_r, which a priori suggests that the total of the allocated errors associated to a path should not exceed this amount. There is no natural way to work with such a small total error yet achieve bounds uniform over all Γ_xy. However, we will see that the tracking enables us to increase the total of the allocated errors by an amount proportional to the extra Euclidean length Υ_Euc(·) of the original marked PG path relative to the "horizontal" distance (y − x)_1, which gives the second term in parentheses in the formula (4.41). With this the necessary uniformity can be achieved, both for the tracking and for the fluctuation bounds on passage times of final paths Γ^CG_xy.

4.2.
Step 2. Performing the j_1th-scale (first) iteration of coarse-graining. As described in Remark 4.4, for the j_1th-scale iteration we perform a sequence of operations on marked PG paths, called shifting to the jth-scale grid (here with j = j_1), in the hyperplanes of H_xy. The j_1th-scale iteration is different from those that follow, in that there is no second stage of removing marked points, and no need for the "tracking" of Remark 4.4. In general we will refer to the marked PG path existing before a shift or removal operation as the current (marked) path, and to the modified one resulting from the operation as the updated (marked) path.
For the j 1 th scale, every H s ∈ H xy is a j 1 th-scale hyperplane.
At the start, the current path Γ^{j_1,0}_{xy} is Γ_xy with marks at the grid points x_i as displayed. Note that Γ^{j_1,0}_{xy} is a marked path, but not necessarily a marked PG path, since Γ_xy need only pass near ϕ(x_i), not necessarily through it. By near we mean that both u_{s_i}(Γ_xy) and ϕ(x_i) lie in the same cube F_{x_i} of the basic grid.
Recall that the blocks of L_{j_1} have side K_0 β^{j_1} ∆_r. The first shift to the j_1th-scale grid happens in H_{s_1}, replacing x_1 with p_1 = V_{j_1}(x_1) to produce the updated marked path Γ^{j_1,1}_{xy}: p_0 → p_1 → x_2 → · · · → x_{m+1}, with the underlying path being the concatenation of the geodesics Γ_{x p_1}, Γ_{p_1, u_{s_2}(Γ_xy)}, Γ_{u_{s_2}(Γ_xy), y}. Next we repeat this in H_{s_2}, replacing x_2 with p_2 = V_{j_1}(x_2). We continue this way, performing shifts to the j_1th-scale grid in H_{s_3}, . . . , H_{s_m}, producing the updated path with the underlying path now (in view of (4.17)) being the marked PG path given by these points.
Let us analyze the effect of these shifts on Υ_h(Γ^CG). Consider the ith shift, replacing x_i with p_i. From (4.5) and basic geometry we have the displayed bound (4.43). Consider first the "sidestepping" case: we have from (4.13), (4.14), and (4.43) that, provided r is large, the corresponding bound holds for 1 ≤ i ≤ m. Now consider the "normal" case: |(p_i − p_{i−1})_*| < ℓ_2(j_1)β^{j_1}∆_r. From (4.13), (4.14), and (4.43) we have the analogous bound. We can interchange the roles of p_i and x_i and/or replace p_{i−1} with x_{i+1}, so it follows from (4.44) and (4.45) that (4.46) holds, and then also (4.47). We claim that (4.50) holds; it is enough to show (4.51). To that end, we have (4.52). If the ith transition has very small sidestep, then provided r is large, using (4.13), the first term on the right in (4.52) satisfies the required bound. If instead the sidestep is larger, then the first term on the right in (4.52) satisfies the other required bound. Together, (4.52)-(4.57) prove (4.51), and thus also (4.50). This same proof shows that the corresponding quantities are all within a factor of 1 + ε of one another, and similarly for A^0_j. From (4.42), (4.46), (4.47), and (4.58) we get (4.59), and similarly, using (4.48)-(4.49) instead of (4.46)-(4.47), we get (4.60). In view of (4.58), the derivation of (4.59) and (4.60) is also valid with the corresponding replacement of A^0_j. For basic grid points u, v define M(u) = max{T(y, z) : y, z ∈ F_u}.
We have the displayed bounds, and a form of approximate subadditivity holds for basic grid points u, v, w, as displayed. Before proceeding we stress that m, p_i, and x_i should always be viewed as functions of (x, y, ω). From (4.42) we have the chain of inclusions (4.66). The key inclusion here is the second one, as it takes us from an event involving the passage time of the original path Γ^{j_1,0}_{xy} to an event involving the j_1th-scale CG approximation Γ^{j_1,m}_{xy}, up to the "tracking error event" given in the last line. The name is only partly suitable here: bounding the last probability in (4.66) is related to the tracking of Remark 4.4, in that we are ensuring that changing the path from Γ^{j_1,0}_{xy} to Γ^{j_1,m}_{xy} does not change Υ_T(·) too much, but the change in h-sum here is too small to require being tracked.

4.3.
Step 3. Bounding the tracking-error event for the j_1th-scale iteration. Consider next the tracking-error event from the right side of (4.66). Recalling Remark 4.4, this reflects the failure of the passage time to track well when the path changes from Γ^{j_1,0}_{xy} to its j_1th-scale CG approximation Γ^{j_1,m}_{xy}. We have the displayed estimate, and therefore, using (4.58), the inclusion (4.68). Let us consider any one of the events in the first union on the right in (4.68); we prepare to apply Proposition 3.5. We assume |p_i − p_{i−1}| ≥ |x_i − p_{i−1}|, as the opposite case is similar. Define p̂_i by the display. We observe that in (4.44) and (4.45), one can replace 1/32µ with any given positive constant, if r is large enough. Consequently we have (4.70). We split T̂(p_{i−1}, p_i) into two corresponding increments, using (4.63), and define the corresponding unions J^{(1a)}_{xy} and J^{(1b)}_{xy}, so that (4.70) says the first union in (4.68) is contained in J^{(1a)}_{xy}. Define the set of (x, y) corresponding to (4.11): X_r = {(x, y) ∈ qZ^d × qZ^d : x, y ∈ G_r(K), |y − x| > C_62 r/(log r)^{1/χ_1}, (4.17) holds}, with C_62 from (4.11), and the events J^{(1a)} = ∪_{(x,y)∈X_r} J^{(1a)}_{xy}, J^{(1b)} = ∪_{(x,y)∈X_r} J^{(1b)}_{xy}, and J^{(1c)} = {Γ_xy ⊄ G_r^+ for some (x, y) ∈ X_r}. For configurations ω ∉ J^{(1c)}, all p_{i−1}, p_i, x_i, p̂_i lie in G_r^+, so the number of possible tuples (p_{i−1}, p_i, x_i, p̂_i) arising from some (x, y) ∈ X_r is at most c_16 |G_r^+|^4. Define s, α by the displays. From this, (4.13), and the definition of j_1, on J^{(1a)}_{xy} we have for some i that the displayed inequality holds. In view of (4.69) and (4.71), we then get the corresponding probability bound from Proposition 3.5 and Lemma 4.2. Next, recalling (4.69), we observe that |p̂_i − p_i| ≤ q√(d − 1) and, provided r is large, using (4.69) and the bound on t in (4.11), we obtain the analogous bound from Lemmas 3.1 and 3.2. Hence from (4.66) we obtain a bound on the probability that the event occurs for some (x, y) ∈ X_r with ω ∉ J^{(0)} ∪ J^{(1c)}, with an added error term c_32 e^{−c_33 t}.

4.4.
Step 4. Further iterations of coarse-graining: preparation. For the (j 1 − 1)th-scale and later iterations of coarse-graining we use allocations A 1 j (Γ cur xy , u i ) associated not to a particular transition, but to the shifting of the marked grid point u i in some (j + 1)th-scale current marked PG path Γ cur xy to the jth-scale grid. Specifically, in such a shift the marked grid point u i in the (j + 1)th-scale grid is replaced by the jth-scale CG approximation V j (u i ), and the other marked grid points are left unchanged. Consider an initial marked PG path when an iteration of shifts to the jth-scale grid begins: Γ 0 : u 0 → u 1 → · · · → u n+1 , with n ≤ 7 j+1 − 1. Suppose that for some I ⊂ {1, . . . , n} not containing two consecutive integers, the marked grid points (u i , i ∈ I) are the ones shifted, one at a time, in the iteration, updating the path from Γ 0 = Γ start via a sequence of intermediate paths Γ 1 , . . . , Γ |I|−1 to a final Γ |I| . Note that since I does not contain two consecutive integers, the shifts of different u i 's do not "interact," as shifting u i only affects the path between ϕ(u i−1 ) and ϕ(u i+1 ) and only affects u i among the marks; we will refer to this aspect as noninteraction of shifts. Recall t * (·, ·) from (4.23).
Lemma 4.5. Let j, K ≥ 1. Consider a marked PG path Γ : u_0 → u_1 → · · · → u_{n+1} in G_r^+ with u_0, u_{n+1} ∈ G_r(K), n ≤ 7^j − 1, and (u_0)_1 < · · · < (u_{n+1})_1, and let I ⊂ {1, . . . , n}, not containing two consecutive integers. Define the relevant quantities, and for v, w ∈ L_j with v_1 < w_1 let A^1_j be as displayed. Then provided t/K^2 is sufficiently large, (4.76) and (4.77) hold. The bound (4.76) would be trivial if we used t instead of t^*(u_{i−1}, u_i) in the definition of A^1_j(Γ^start, i). The point of Lemma 4.5 is that when the transitions affected by the shifting start or end at points u_i which are at distance a large multiple of ∆_r from G_r(K), it increases the effective value of t by a multiple of (|(u_i)_*|/∆_r)^2, manifested in the definition of t^*(u_{i−1}, u_i).
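The quadratic gain in the effective t can be reconciled with the fluctuation scale as follows; this is a heuristic reading of ours, assuming the increment has the form δµ|(u_i)_*|^2 / (18 r 7^j) appearing in the proof of (4.77):

```latex
% Since \Delta_r^2 = r\sigma_r by definition,
\frac{\delta\mu\,|(u_i)_*|^2}{18\,r\,7^{j}}
\;=\; \frac{\delta\mu\,\sigma_r}{18\cdot 7^{j}}
\left(\frac{|(u_i)_*|}{\Delta_r}\right)^{2},
```

so measured in units of σ_r, the extra allocation per transition is indeed a multiple of (|(u_i)_*|/∆_r)^2, growing quadratically in the distance of u_i from the cylinder G_r(K) on the scale ∆_r.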

From (4.19) we also get
which with (4.80) yields (4.76). The proof of (4.77) is similar, the only (inconsequential) difference being that when we sum the terms t^*(u_{i−1}, u_i) over all i ≤ n, most terms δµ|(u_k)_*|^2 / (18 r 7^j) get counted twice, for k = i and k = i + 1, whereas each was counted at most once when we summed over i ∈ I in (4.76).
For j_2 ≤ j < j_1 and j_2 < ℓ < j − 1, a (j, ℓ)th-scale joining 4-path is a marked PG path v_0 → v_1 → v_2 → v_3 with marked points in four specified hyperplanes. In a jth-scale interval with left endpoint a, these hyperplanes represent the hyperplane at a, a sandwiching (j + 1)th-scale hyperplane, and two possible joining hyperplanes. The relevant pairs of points are displayed; note these points are collinear. Recalling Remark 4.3, we say the (j, ℓ)th-scale joining 4-path is internally bowed if (compare to (4.39)) the first displayed condition holds, and (compare to (4.40)) the second holds, or equivalently its restated form. We will need an analog of Lemma 4.6 which enables us to deal with joining 4-paths as single units. Normally for ℓth-scale links (length of order δ^ℓ r), where the fluctuations of the passage time T̂(v_{i−1}, v_i) are of order σ(δ^ℓ r), we need ℓth-scale allocations (proportional to λ^ℓ) to get a good bound from Lemma 3.2, but in the next lemma we are able to use jth-scale allocations even for much longer ℓth-scale lengths, by taking advantage of bowedness. There exist constants C_i as follows. Let j_2 ≤ j < j_1 and j_2 < ℓ ≤ j − 1.
Proof of (ii) and (iii). The proof of (iii) is a slightly simplified version of the proof of (ii), so we only prove (ii). We proceed as in Lemma 4.6. We decompose the set of joining 4-paths according to the sizes of |(v 0 ) * | and |(v 3 − v 0 ) * |, and the degree of bowedness, measured by the left side of (4.86).
From Remark 4.3, every internally bowed joining 4-path in G + r is in one of these classes; in particular we don't need classes with m 2 = 0.
Suppose Γ ∈ R r,j,ℓ (ν, m 2 , m 3 ). We have Regarding the size of R r,j,ℓ (ν, m 2 , m 3 ), the number of possible v 0 (necessarily in L j+1 ) in a given jth-scale hyperplane is at most The last upper bound in (4.93) implies so for a given v 0 the number of possible v 3 is at most The upper bound for |v 2 − z 2 | 2 /δ ℓ+1 r in (4.93) implies and for given v 0 , v 3 , the point z 2 is determined, and then the number of possible v 2 is at most Combining these, we see that Multiplying this by (4.95) and summing over ν, m 2 , m 3 gives (4.90).
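The scale comparison invoked in the setup above (ℓth-scale fluctuations versus allocations) can be made concrete in the pure-power idealization σ_r = r^χ; the text only assumes σ_r "grows like" r^χ, so this is a heuristic sketch, not the paper's hypothesis.

```python
# Heuristic behind the scale bookkeeping: assuming sigma(x) = x**chi exactly,
# an l-th-scale link of length delta**l * r has passage-time fluctuations
# sigma(delta**l * r) = delta**(l*chi) * sigma_r, i.e. the fluctuation scale
# shrinks geometrically in l at rate delta**chi.
chi, delta, r = 1.0 / 3.0, 0.1, 1e9

def sigma(x):
    return x ** chi

for l in range(4):
    length = delta ** l * r
    # ratio to the full-scale fluctuation sigma_r is delta**(l*chi)
    print(l, sigma(length) / sigma(r))
```

This geometric decay in ℓ is what makes allocations proportional to λ ℓ the natural choice, and why using the larger jth-scale allocation for an ℓth-scale link requires the extra gain coming from bowedness.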

4.5.
Step 5. First stage of the (j 1 − 1)th-scale (second) iteration of coarse-graining: shifting to the (j 1 −1)th-scale grid. The current marked PG path at the start of the (j 1 −1)th-scale iteration step is Γ j 1 ,m xy . We rename it now as We shift certain points to the (j 1 − 1)th grid; the procedure is somewhat different from the j 1 th-scale iteration step, as we are starting from a j 1 th-scale marked PG path Γ j 1 −1,0 xy . Fix (x, y) ∈ X r and, recalling H xy = {H s i , 1 ≤ i ≤ m}, let I xy = {i ∈ {1, . . . , m} : H s i is a non-incidental (j 1 − 1)th-scale hyperplane in H xy }, then relabel {s i : i ∈ I xy ∪ {1, m}} as a 1 < · · · < a n , so H a 1 = H s 1 and H an = H sm are the terminal j 1 th-scale hyperplanes, and define indices γ(N ) by a N = s γ(N ) , so the p γ(N ) , 2 ≤ N ≤ n − 1, are the marked points in non-incidental (j 1 − 1)th-scale hyperplanes. Observe that every (j 1 − 1)th-scale interval, including terminal ones, has at least one j 1 th-scale hyperplane in its interior; hence γ(N + 1) ≥ γ(N ) + 2 for all 1 ≤ N < n.
In the second stage of the iteration we will remove marked points in non-terminal j 1 th-scale hyperplanes to create the path By Lemma 4.5 we have while similarly to (4.59) and (4.61), (The only notable change from the derivation of (4.59) and (4.61) is that now, since we are shifting to the (j 1 − 1)th scale, we have the bound larger only by a constant β −1 compared to the analogous bound in (4.44)-(4.45).) Then (4.96) and (4.98) give (4.99) From (4.97)-(4.98), analogously to (4.66), we have for the event on the right in (4.74) that (4.100) Remark 4.8. In view of Remark 4.4, in the context of shifting to the grid, we can think of "tracking" as corresponding to showing that a probability of the form for some (x, y) ∈ X r , (4.101) is small, or the same with Υ h replaced by its approximation µΥ Euc . In (4.100) the particular allocations chosen are A 1 j 1 (Γ j 1 −1,0 xy , i). The event in (4.101) says that, as the shifting-to-the-grid process changes the h-length of the path, the change in the T̂ -sum fails to track even a small fraction δ of the change in h-length, to within the error given by the allocations. Further, in (4.74) and on the left in (4.100), represents roughly the accumulated error allocations used in the completed j 1 th-scale iteration.
Thus what (4.96) and (4.98) say is that for the current shift to the grid, the change in h-length (or in its approximation µΥ Euc ) is negligible in the sense of being small relative to the accumulated allocations, making tracking only a minor issue here. This negligibility will not be valid for the marked-point-removal stage of the iteration, however.
Step 6. Second stage of the (j 1 − 1)th-scale (second) coarse-graining iteration: removing marked points. For the next update of the current marked PG path Γ j 1 −1,1 xy , we remove all the marked points in non-terminal j 1 th-scale hyperplanes to create the updated marked PG path Here we recall that b 0 = x̂, b n+1 = ŷ, b 1 and b n lie in terminal j 1 th-scale hyperplanes, and b 2 , . . . , b n−1 lie in (j 1 − 1)th-scale hyperplanes. As with p i , p̂ i , etc., n and b i should be viewed as functions of x, y, ω.
From (4.97) we have and therefore From this and (4.104) we obtain the setup to establish tracking:

(4.105)
Let us consider the contribution to the difference of sums from a single (j 1 − 1)th-scale interval Removing the marked points from the hyperplanes in the interior of I N changes the marked PG path from the full path Given n ≥ 1 and Γ : u 1 → · · · → u n with all u i in some grid L j , we define From Lemma 4.5 we have (4.107), so the last probability in (4.105) is bounded above by This is the tracking-failure event (see Remark 4.4) for the marked-point-removal stage of the iteration, and our main task is to bound its probability. The last of the 3 terms on the right inside the probability can be viewed as part of the allocation of allowed errors. As noted in Remark 4.8, the quantity 4λ in the second probability in (4.105) represents the accumulated error allocations used in the 1.5 iterations completed so far; the allocations for the present stage increase the 4 and 6 to 5 and 7 in the third probability in (4.105). Our ability to bound the tracking-failure event is what allows us to replace the quantity in that second probability to obtain the third probability. We need each iteration to involve a similar such replacement, so that when the iterations are complete this term becomes δµ(Υ Euc (Γ CG xy ) − Υ Euc (Γ j 1 −1,0 xy )), which is typically negative and can in part cancel the accumulated error allocations. Let ŵ i be the point Π p̂ γ(N ) ,p̂ γ(N +1) ∩ H (p̂ i ) 1 ; note ŵ i does not necessarily lie in any jth-scale grid. Let w i ⊥ be the orthogonal projection of p̂ i into Π p̂ γ(N ) ,p̂ γ(N +1) , noting that by (4.18), |w i ⊥ − ŵ i | is much smaller than |p̂ i − ŵ i |. (Note the indexing of marked points differs here from that used in Step 1 in defining L − (I). Our ŵ i here has index i matching that of the point p̂ i in the hyperplane, whereas w ℓ in Step 1 has index ℓ corresponding to the distance δ ℓ r from the left end of the interval.)
Case 1. I N is a short standard (j 1 − 1)th-scale interval.
Here H xy includes no joining hyperplanes in the interval, so it includes exactly two maximally j 1 th-scale (sandwiching) hyperplanes there, at distance δ j 1 r from each end; see Figure 7. We introduce the intermediate path which has the same endpoints and satisfies Υ T̂ (Γ N,int ) ≥ T̂ (p̂ γ(N ) , p̂ γ(N +1) ). To bound (4.108) we use the expression on the right in (4.106). Applying Lemma 3.6 with ε = 1 − δ yields This is the essential property of the intermediate path: when we look at the bowedness of the full Similarly to (4.107), since in Case From (4.106)-(4.111) it follows that the contribution to the tracking-failure probability (4.108) from short standard intervals is bounded by (j 1 − 1)th-scale, and some (x, y) ∈ X r ; ω /∈ J (0) (c 29 ) ∪ J (1c) (4.112) The terms C 56 and c 29 log r are negligible in (4.112) because λ j 1 tσ r is of the order of a power of r, due to j 1 = O(log log r). All of the increments T̂ (u, v) in the last event have δ j 1 r ≤ (v − u) 1 ≤ δ j 1 −1 r. We can therefore bound the last probability similarly to Lemma 4.6, with the main difference being that in place of pairs (u, v) in the definitions of R * , * r,j we need to consider 4-tuples (u, v, w, z) corresponding to values of (p̂ γ(N ) , p̂ γ(N )+1 , p̂ γ(N )+2 , p̂ γ(N +1) ) ∈ (L j 1 ∩ G + r ) 4 . This means the exponents d − 1 in the bounds on |R * , * r,j | become 3(d − 1). The values w i ⊥ are determined by (p̂ γ(N ) , p̂ γ(N )+1 , p̂ γ(N )+2 , p̂ γ(N +1) ) so their presence does not increase the necessary size of R * , * r,j in the lemma. As with the sets R ν, r,j in the lemma proof, we can decompose the possible 4-tuples (p̂ γ(N ) , p̂ γ(N )+1 , p̂ γ(N )+2 , p̂ γ(N +1) ) according to the size of and |p̂ γ(N )+2 − w γ(N )+2 ⊥ | and sum over the possible size ranges. Otherwise the proof remains the same, and we get that the last probability in (4.112) is bounded by
Case 2. I N is a terminal (j 1 − 1)th-scale interval (meaning either [p̂ γ(1) , p̂ γ(2) ] or [p̂ γ(n−1) , p̂ γ(n) ]).
The proof is similar to Case 1, except that the interval includes only one maximally j 1 th-scale (sandwiching) hyperplane between the two terminal hyperplanes that are at the ends of the interval, so the full path in the interval has form p̂ γ(N ) → p̂ γ(N )+1 → p̂ γ(N +1) . We obtain for the terminal-interval contribution to the tracking-failure probability (4.108) the bound
Case 3. I N is a long standard (j 1 −1)th-scale interval. Such an interval, and thus also the middle increment p̂ γ(N +1)−1 − p̂ γ(N )+1 of the 3 comprising Γ N,f ull in Case 1, may be much longer than δ j 1 −1 r. Therefore the quantity A 2 j 1 (p̂ γ(N )+1 , p̂ γ(N +1)−1 ) used on the right side of (4.112) for that increment is no longer large enough to give a useful bound on the probability. To avoid this problem we will sometimes use a different intermediate path in I N which coincides with the full path between the inner joining hyperplanes, so differs from the full path only near the ends of I N , while preserving the property (4.109) (this preservation being the purpose of our choice of L ± (I)). Equation (4.109) represents what we may informally call deterministic tracking, a nonrandom analog of the tracking of Remark 4.4 which facilitates our desired (random) form of tracking.
Fix a long standard (j 1 − 1)th-scale interval I N , and suppose it has kth-scale length for some k < j 1 − 1. The hyperplanes of H xy in I N are the (j 1 − 1)th-scale ones at each endpoint, two sandwiching j 1 th-scale hyperplanes at distance δ j 1 r from each end, and between 2 and 4 secondary j 1 th-scale hyperplanes between these. These hyperplanes are at the joining points (p̂ γ(N ) ) 1 + δ L − (I)+1 r, (p̂ γ(N ) ) 1 + δ L − (I) r, (p̂ γ(N +1) ) 1 − δ L + (I) r, and (p̂ γ(N +1) ) 1 − δ L + (I)+1 r, with two exceptions. First, if L − (I N ) = j 1 − 1 then the first of these 4 joining hyperplanes coincides with the left-end sandwiching one, and similarly for L + (I N ), which reduces the number of j 1 th-scale joining hyperplanes to fewer than 4, as discussed in criterion (ii) after (4.32). Second, in the totally unbowed case (third option in (4.32)) at either end of I N , there is no outer joining hyperplane at that end, as in criterion (iii). We define 4 ≤ V (N ) ≤ 6 to be the number of j 1 th-scale hyperplanes in the interior of I N .
Case 3a. The bowed case (second option in (4.32)) for both L ± (I), with V (N ) = 6. Since V (N ) = 6 we must have L − (I N ) < j 1 − 1 and L + (I N ) < j 1 − 1. Here the full path is Γ N,f ull : p̂ γ(N ) → p̂ γ(N )+1 → · · · → p̂ γ(N )+6 → p̂ γ(N +1) and the direct path again is p̂ γ(N ) → p̂ γ(N +1) . Define the points ẑ i . This time the intermediate path is defined as Γ N,int : ẑ γ(N ) → ẑ γ(N )+1 → · · · → ẑ γ(N )+6 → ẑ γ(N +1) , which coincides with Γ N,f ull between the inner joining hyperplanes, which is "most" of I N . We denote the parts of the paths outside the inner joining hyperplanes by For the corresponding quantities for means, similarly to (4.109), but using ε = 1/2 in Lemma 3.6, we have (4.116) We claim that deterministic tracking holds in the sense that This and (4.116) give the full analog of (4.109) for Case 3a. To prove the claim, suppose L − (I N ) = ℓ − for some k + 1 ≤ ℓ − < j 1 − 1. Recall α(·) and θ(·) from (4.24) and (4.25), noting that in our present notation, v, w, z there have become p̂, ŵ, ẑ, with different superscript labeling. Using (4.27)-(4.29) with j = j 1 − 1, Here the second inequality uses the fact that p̂ γ(N )+2 − ẑ γ(N )+2 is nearly perpendicular to Π p̂ γ(N ) ,p̂ γ(N )+3 , by (4.18).
In the other direction, recalling I N has kth-scale length (so (p̂ γ(N )+4 − p̂ γ(N )+3 ) 1 ≥ 8δ k+1 r) and supposing L + (I N ) = ℓ + , since k + 1 ≤ ℓ + and k + 1 ≤ ℓ − we have similarly to (4.34) and (4.35) Now the first ratio on the right is α(ℓ − ), so it follows from (4.29) and (4.118) that and similarly Hence from (4.119) and the second equality in (4.116), The analog of (4.110) remains valid, and we now have the ingredients (4.116), (4.117) and (4.122) for the analog of (4.112), bounding the bowed-case contribution to the tracking-failure probability (4.108): and I N long standard (j 1 − 1)th-scale, in the bowed case of (4.32) for both of L ± (I N ), for some (x, y) ∈ X r ; ω /∈ J (0) (c 29 ) ∪ J (1c) λ j 1 tσ r for some N with V (N ) = 6 and I N long standard (j 1 − 1)th-scale, in the bowed case of (4.32) for both of L ± (I N ), and for some (x, y)   For the ℓ ± in the last event, the ℓ − applies to γ(N ) + 1 ≤ i ≤ γ(N ) + 3 and the ℓ + applies to γ(N ) + 5 ≤ i ≤ γ(N + 1). An application of Lemma 4.7(ii) and (iii) with j = j 1 − 1, followed by summing over ℓ ± , bounds the last (tracking-failure) probability in (4.123) by
Case 3b. The bowed case (second option in (4.32)) for both L ± (I), with V (N ) < 6. This means we have at least one of L ± (I N ) = j 1 − 1. If for example L − (I N ) = j 1 − 1, then what were in Case 3a the two points p̂ γ(N )+1 , p̂ γ(N )+2 are now the same point, so effectively there is no separate point p̂ γ(N )+1 . Instead we have only the equivalent of p̂ γ(N ) , p̂ γ(N )+2 , p̂ γ(N )+3 , so joining 4-paths become joining 3-paths. This has no significant effect on the arguments, including Lemma 4.7, other than some simplifications, and the bound in (4.124) still applies for the corresponding contribution to the tracking-failure probability (4.108): and I N long standard (j 1 − 1)th-scale, in the bowed case of (4.32) for both of L ± (I N ), and for some (x, y)
Case 3c. The forward case (first option in (4.32)) for both L ± (I N ).
Here we have V (N ) = 4 as there are only inner joining points, at distance δ j 1 r from each end of I N . As before the direct path is p̂ γ(N ) → p̂ γ(N +1) , and the full path is We claim that in the forward case we have By (4.126), this means that in the forward case there is no tracking failure: I N long standard (j 1 − 1)th-scale, in the forward case of (4.32) for both of L ± (I N ), and for some (x, y) ∈ X r ; ω /∈ J (0) (c 29 ) ∪ J (1c) = 0. (4.128) To prove (4.127), we first observe that from the definition (4.75), We use the same allocation for all 10 links of Γ N,f ull and Γ N,int : while from (4.19), From these we get deterministic tracking:
Case 3d. The totally unbowed case (third option in (4.32)) for both L ± (I N ). We now can establish the analog of (4.112) and (4.123) for the totally-unbowed-case contribution to the tracking-failure probability (4.108), using (4.63): Then using (1.10), (4.138) and by Lemma 3.2, for every possible value (u 1 , . . . , u 6 ) of (p̂ γ(N ) , . . . , p̂ γ(N +1) ) and every link (u i−1 , u i ), we have Similarly to the entropy bounds in Lemmas 4.6 and 4.7, and in Case 3c, we see that the number of possible choices of Γ N,f ull satisfying (4.135)-(4.137) is at most (4.140).
Case 3e. Mixed cases, which we subdivide as mixed forward case, mixed bowed case, and mixed totally unbowed case, according to the condition at the dominant end (as defined after (4.35)) of I N .
In previous cases we have assumed that the same option in (4.32) occurs at both ends of the interval I N , but this is strictly for clarity of exposition. As explained in Step 1, in mixed cases we have joining hyperplanes only at the dominant end of I N . The situation is symmetric so let us assume the left end is dominant. In mixed cases the full path in I N is always p̂ γ(N ) → p̂ γ(N )+1 → · · · → p̂ γ(N +1) , with γ(N + 1) − γ(N ) = 4 or 5. Suppose I N is a (j 1 − 1)th-scale interval of kth-scale length. The intermediate paths are analogous to those in cases 3a-3d, as follows. If the dominant left end has the forward case then there is one joining hyperplane in I N , containing p̂ γ(N )+2 , at the left end at distance δ j 1 +1 r from p̂ γ(N ) , and the intermediate path is If the (dominant) left end has the totally unbowed case then there is one (inner) joining hyperplane in I N , containing p̂ γ(N )+2 , at the left end at distance δ k+1 r from p̂ γ(N ) , and the intermediate path follows the line from p̂ γ(N ) to p̂ γ(N )+1 : If the left end has the bowed case with L − (N ) = ℓ < j 1 − 1 then there are two joining hyperplanes in I N , containing p̂ γ(N )+2 and p̂ γ(N )+3 , at the left end at distances δ ℓ+1 r and δ ℓ r from p̂ γ(N ) , and the intermediate path is If the left end has the bowed case with L − (N ) = j 1 − 1 then there is one (inner) joining hyperplane in I N , containing p̂ γ(N )+2 , at the left end at distance δ j 1 −1 r from p̂ γ(N ) , and the intermediate path is. In each case, we define the subpaths Γ N,f ull,− and Γ N,int,− between the left end of the interval and the left inner joining hyperplane. In all cases the arguments are essentially the same as in the analogous non-mixed case, with the addition that for the final transition ending at p̂ γ(N +1) , one uses (4.37) to bound |p̂ γ(N +1)−1 − w γ(N +1)−1 ⊥ |.
We will not reiterate the arguments here, but simply state that the result is again the analog of (4.112) and (4.123), bounding the mixed-case contribution to the tracking-failure probability (4.108): xy − (ŷ − x̂) 1 for some N with I N long standard (j 1 − 1)th-scale, in the mixed case of (4.32) for some (x, y) ∈ X r ; ω /∈ J (0) (c 29 ) ∪ J (1c) in the mixed totally unbowed case) for some N with I N long standard (j 1 − 1)th-scale, in the mixed case of (4.32) for some (x, y) ∈ X r ; ω /∈ J (0) (c 29 ) ∪ J (1c) , in the mixed totally unbowed case for I N . As in Cases 3a-3d, the right side of (4.142), and thus the mixed-case contribution to the tracking-failure event (4.108), is bounded above by (4.143) f (λ, δ, β) j 1 exp −c 41 λ 7δ χ 1 j 1 t for some f (λ, δ, β).
Step 7. Further iterations of coarse-graining. The further iterations proceed quite similarly to the (j 1 − 1)th-scale iteration; for the most part, to do the jth-scale iteration we simply replace j 1 − 1, j 1 throughout by j, j + 1. We will sketch the (j 1 − 2)th-scale (third) iteration to make the pattern clear.
Similarly to (4.96)-(4.98) we have These lead to a tracking bound like (4.103) for the first stage (shifting to the (j 1 − 2)th-scale grid) using Lemma 4.6: As with (4.74) and (4.104), with (4.144) this yields + c 32 e −c 33 t + c 42 exp −c 43 λ 7δ χ 1 multiplied by 4, 3, and 2. As noted in Remark 4.8 and after (4.108), these represent the accumulated error allocations from the j 1 th, (j 1 − 1)th, and (first-stage) (j 1 − 2)th-scale iterations, respectively, which we have split out here for clarity. In general after the first stage of the jth-scale iteration, the term with coefficient 3 would represent allocations from all stages j 1 − 1 through j + 1. With this in mind we define the accumulated-allocations upper bounds with A 1 j (Γ xy ) valid after the first stage of the jth-scale iteration, and A 2 j (Γ xy ) valid after the second stage. Comparing (4.150) to (4.104), here we have j = j 1 − 1, the sum in (4.150) has no terms, and the other two terms on the right in (4.104) merge into one. Comparing (4.151) to (4.144), in this instance also j = j 1 − 1, and the sum in (4.151) has one term which merges with the other term. Equation (4.151) may also be compared to (4.74), where j = j 1 and the sum in (4.151) has no terms. We can rewrite the second probability in (4.149) as for some (x, y) ∈ X r ; ω /∈ J (0) (c 29 ) ∪ J (1c) . Moving on to the second stage of the (j 1 − 2)th-scale iteration, from (4.149), using the form (4.152), we have similarly to (4.105) and (4.108) for some N and some (x, y) ∈ X r ; ω /∈ J (0) (c 29 ) ∪ J (1c) , (4.154) separately for each of the 5 cases; see for example (4.134) for the totally-unbowed case. This is then bounded by summing probabilities of the form over all possible links (v, w) of paths Γ 2,N,f ull , Γ 2,N,int for all N , using Lemmas 4.6 and 4.7.
The end result of the (j 1 − 2)th-scale iteration is that After all iterations are completed through the j 2 th scale, this becomes We claim that This says that the (typically negative) second term on the left, representing part of the cumulative effects of tracking, is enough to adequately cancel the (positive) cumulative allocations A 2 j 2 (Γ xy ); see the comments after (4.108). To prove (4.157) we make the subclaim (4.158) D := max Assuming (4.158) and assuming λ satisfies 7λ/(1 − λ) < 1/4 we have and (4.157) follows. To prove (4.158) we use (4.97) (which generalizes to all j) and the fact that Υ Euc (Γ j,2 xy ) ≤ Υ Euc (Γ j,1 xy ) for all j, which yields that for all j ≥ 1, tσ r 3 + δµD (4.160) and therefore from which (4.158) follows. The right side of (4.156) is therefore bounded above by
Step 8. Final marked CG paths. In Γ j 2 ,2 xy , only terminal hyperplanes contain marked points (excluding x̂, ŷ), one at each end of [x̂ 1 , ŷ 1 ] for each scale j 2 ≤ j ≤ j 1 . The marked point in the j-terminal hyperplane is in the grid L j for all j. As noted in Step 1, the gap between the j 2 -terminal hyperplanes is at least 4δ j 2 r and at most 5δ j 2 −1 r. Now Γ j 2 ,2 xy is our final CG path, so we rename it Γ CG xy : x̂ = u 0 → · · · → u R+1 = ŷ, where R = 2(j 1 − j 2 + 1). We also define a projected path with collinear marked points Γ CG,pr xy , where v i = (u i ) 1 e 1 is the projection onto the e 1 axis. For j 2 ≤ j ≤ j 1 − 1, the links (u j 1 −j , u j 1 −j+1 ) and (u R−j 1 +j , u R−j 1 +j+1 ) each have one end in a j-terminal hyperplane and the other in a (j + 1)-terminal hyperplane; these will be called the jth-scale links of Γ CG xy . The links (u 0 , u 1 ) and (u R , u R+1 ) are called final links, and the link (u R/2 , u R/2+1 ) between j 2 -terminal hyperplanes is called a macroscopic link. See Figure 3 in Section 4.1, where (u 3 , u 4 ) is the macroscopic link.
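The smallness condition on λ assumed above can be checked numerically. The quantity 7λ/(1 − λ) equals the geometric series Σ_{j≥1} 7λ^j; reading it as the total of geometrically decaying per-scale allocations is our interpretation of the bookkeeping, not a statement taken verbatim from the text.

```python
# Numeric check of the condition 7*lam/(1 - lam) < 1/4 assumed in the text.
# Note 7*lam/(1-lam) = sum_{j>=1} 7*lam**j, and solving the inequality
# shows it holds exactly when lam < 1/29.
def geom_bound(lam, terms=200):
    return sum(7 * lam ** j for j in range(1, terms + 1))

lam = 1.0 / 30.0
print(7 * lam / (1 - lam))        # 7/29, about 0.2414
print(geom_bound(lam) < 0.25)     # True: the series sum is below 1/4
```

So any λ < 1/29 suffices for this step; the weaker constraint λ < 1/3 appearing later in the upward-deviation argument is then automatic.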
We have Υ Euc Γ CG,pr xy = (ŷ − x̂) 1 so by Lemma 3.6, since R ≤ 2j 1 , Therefore the last probability in (4.163) is bounded above by Considering the probability for final links (v, w), the number of possible such links in G + r is at most c 46 r 2d and for each such link we have from Lemma 3.2 and (4.15) The probabilities for jth-scale and macroscopic links can be bounded using minor modifications of Lemma 4.6, showing that (4.166) (and hence also the probability on the right in (4.156)) is bounded above by With (4.156) this completes the proof of (1.13). As noted after Theorem 1.2, the downward-deviations part of (1.12) is a consequence, so the proof of the downward-deviations part of Theorem 1.2 is complete.
We move on to step (2) of the strategy in Remark 1.6. As with the proof of Theorem 1.2 (downward deviations), there are simpler cases which can be proved from Lemma 3.2 and do not require Theorem 1.2. These we can handle uniformly over (x, y); after dealing with these we will handle the main case, for the moment only for fixed (x, y), in Lemma 5.1. Figure 10. The inner box G r (K) has height 2K∆ r , and the outer box G r,s has height 2s∆ r . The dotted line shows Γ xy in transverse wandering case (i), the solid line shows case (ii), and the dashed line shows cases (iii) and (iv), for large and small s, respectively.
Define r 0 , with c 0 to be specified; we will consider separately short geodesics (meaning |ŷ − x̂| ≤ r 0 ) and longer ones. When Γ xy ̸⊂ G r,s for some x, y ∈ V ∩ G r (K) with |(y − x) * | ≤ (y − x) 1 , taking u to be the first vertex in Γ xy outside G r,s , we see that one of the following cases must hold.
Cases (i)-(iii) are the ones which do not require Theorem 1.2 and can be proved from Lemma 3.2. This is because the exponent obtained from that lemma is at least of order log r, meaning it dominates the entropy from the number of possible choices of x̂, ŷ, û.
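The entropy-versus-exponent comparison just described is a standard union bound; the sketch below uses hypothetical constants A and c purely for illustration.

```python
# If each bad event has probability at most exp(-c*t*log r) and there are
# at most r**A choices of the triple (x, y, u), the union bound gives
# r**A * exp(-c*t*log r) = exp(-(c*t - A)*log r), a negative power of r
# once t > A/c.  A and c are hypothetical constants.
import math

def union_bound(r, A, c, t):
    return math.exp(A * math.log(r)) * math.exp(-c * t * math.log(r))

r = 1e6
print(union_bound(r, A=3, c=1.0, t=5.0))  # r**(3-5) = r**-2, about 1e-12
```

This is why an exponent of order log r suffices to absorb polynomially many choices of endpoints.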
Further, for s ≤ r/∆ r we have 1/s∆ r σ(s∆ r ) ≤ c 7 /∆ 2 r or equivalently s∆ r /σ(s∆ r ) ≥ c 7 s 2 . Hence similarly to (i), provided we take c 0 large in (5.1),  To deal with (iv) and complete the proof for fixed (x, y), we have the following. We note the difference from Lemma 4.2, which covered many x, y simultaneously but considered only larger transverse wandering, of order ∆ r log r or more.
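The inequality s∆ r /σ(s∆ r ) ≥ c 7 s 2 for s ≤ r/∆ r can be sanity-checked in the pure-power idealization σ_r = r^χ (the text only assumes concentration at scale σ_r, so this is a heuristic, with c 7 = 1); in that idealization the two conditions match exactly at the boundary s = r/∆ r.

```python
# Pure-power check: with sigma(x) = x**chi and Delta_r = (r*sigma_r)**0.5
# = r**((1+chi)/2), we have s*Delta_r/sigma(s*Delta_r) = (s*Delta_r)**(1-chi),
# and this is >= s**2 precisely when s <= r/Delta_r = r**((1-chi)/2).
# chi = 1/3 (the conjectured d = 2 value) is used purely for illustration.
chi, r = 1.0 / 3.0, 1e12
Delta = r ** ((1 + chi) / 2)

def wander_ratio(s):
    return (s * Delta) ** (1 - chi)

s_max = r / Delta
print(wander_ratio(s_max) / s_max ** 2)                 # 1 at the boundary, up to rounding
print(wander_ratio(0.5 * s_max) > (0.5 * s_max) ** 2)   # True strictly inside
```

So in this idealization the restriction s ≤ r/∆ r is exactly what makes the quadratic lower bound on the exponent available.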
We move on to step (3) of Remark 1.6. We begin with an extension of a consequence of Theorem 1.2, removing the requirement that y lie in the same tube G r (K) as x. Define Λ r = [0, 2] × [−∆ r , ∆ r ] d−1 .
Lemma 6.1. Suppose G = (V, E) and {η e , e ∈ E} satisfy A1, A2, and A3. There exist constants C i such that for all r ≥ C 79 and t ≥ C 80 , (6.1) P(T (x, y) ≤ h(r) − tσ r for some x ∈ Λ r ∩ V and y ∈ H + r ) ≤ C 81 e −C 82 t .

(6.5)
We now deal with the remaining case |y * | ≤ c 0 (log r) 1/2 ∆ r . For C 26 from Theorem 1.2, we take K = (t/C 26 (d − 1)) 1/2 and subdivide H [r,r+2] into blocks. Let z m denote the center point of Y m and let M = {m ∈ Z d−1 : 0 /∈ Y m , ∃y ∈ Y m with |y * | ≤ c 0 (log r) 1/2 ∆ r }.
Since t = C 26 (d − 1)K 2 , by rotational invariance we can apply Theorem 1.2 (for downward deviations) to the cylinder C m and obtain P(T (x, y) ≤ h(r) − tσ r for some x ∈ Λ r ∩ V and y ∈ Y m ) ≤ P(T (x, y) ≤ h(|y − x|) − (t + µ(jK) 2 /6)σ r for some x ∈ Λ r ∩ V and y ∈ Y m ) ≤ C 27 exp(−C 28 (t + µ(jK) 2 /6)). Figure 11. The block Λ r in H 0 ("fattened" slightly to thickness 2), the similar block Y m of H r , and the cylinder C m containing both, for m = (0, 1). The gray blocks in H r are those with j = 1.
so the width of G + is near 2K∆(kr). For m ∈ Z d let G r,m , Λ r,m be G r , Λ r translated by (rm 1 , 2∆ r m * ), and let M = {m ∈ Z d : G r,m ⊂ G + }; see Figure 12. Then the number of small boxes comprising G + is and we have G + ⊃ G kr,K ∩ H [0,kr] . Let E m (t) be the event in (6.1) translated to G r,m , that is, we replace Λ r with Λ r,m and H + r with H + (m 1 +1)r . Fix C to be specified; when ω ∈ E m (C log k) we say Λ r,m is a fast source. Let L(r) = (h(r) − rµ)/σ r , so (6.8) 0 ≤ L(r) ≤ C 46 log r by Proposition 3.3. We now consider the passage time between the ends of the big box. If Γ 0,kre 1 ⊂ G kr,K then Γ 0,kre 1 must intersect at least k of the regions Λ r,m , one for each 0 ≤ m 1 < k. If there are no fast sources, this means T (0, kre 1 ) > krµ + kL(r)σ r − Cσ r k log k. Provided K is large, it then follows from Lemmas 5.1 and 6.1 that P(T (0, kre 1 ) ≤ krµ + kL(r)σ r − Cσ r k log k) ≤ P(Γ 0,kre 1 ̸⊂ G kr,K ) + P(∪ m∈M E m (C log k)) ≤ C 75 e −C 76 K 2 + C 81 |M|e −C 82 C log k ≤ C 75 e −C 76 K 2 + c 1 K d−1 k (d−1)χ 2 e −C 82 C log k . (6.9) By taking K large and then C large, we can make the right side of (6.9) less than 1/2 for all k ≥ 3. On the other hand, from (1.11) for large c > 1 we have P(T (0, kre 1 ) ≤ krµ + L(kr)σ(kr) + cσ(kr)) ≥ 1 − C 24 e −C 25 c > 1/2.
From this and (6.9) it follows that (6.10) (L(kr) + c)σ(kr) ≥ k(L(r) − C log k)σ r for all r large and k ≥ 3, and therefore by Theorem 1.4, using (4.12) and the assumption t ≥ C 26 K 2 , It follows using Lemma 3.2 that The same is valid for a final link, with j = j 1 . For macroscopic links it is valid with σ(2δ j r) replaced by σ r , giving The numbers of possible links inside G r (K) are as follows: (i) jth-scale links: at most c 5 K d−1 /δ 2j β 2j(d−1) , (ii) final links: at most c 6 K d−1 r d , (iii) macroscopic links: at most c 7 K d−1 /δ 2 β 2(d−1) .
Taking λ < 1/3 it follows that P(T (x, y) − ET (x, y) ≥ tσ r for some x, y ∈ G r (K) with |y − x| ≥ r) ≤ c 10 e −c 11 t , which completes the proof of Theorem 1.2 for upward deviations.
As in Section 5, this means that, for C 56 from Lemma 3.6, Together with cases (i)-(iii) in Section 5, this completes the proof of Theorem 1.5.