Route Lengths in Invariant Spatial Tree Networks

Is there a constant $r_0$ such that, in any invariant tree network linking rate-$1$ Poisson points in the plane, the mean within-network distance between points at Euclidean distance $r$ is infinite for $r>r_0$? We prove a slightly weaker result. This is a continuum analog of a result of Benjamini et al. (2001) on invariant spanning trees of the integer lattice.


Introduction
Parts of classical stochastic geometry [12], for instance Delaunay triangulations on random points, implicitly concern random spatial networks but without direct motivation as real-world network models. Substantial recent literature, surveyed in the 2018 monograph [9], concerns toy models of more specific types of real-world spatial network, studied in statistical physics style rather than theorem-proof style. Intermediate between those styles, and envisioning examples such as inter-city road networks, one can model the city positions as a Poisson point process, and one can study the trade-off between a network's cost (taken as network length) and its effectiveness at providing short routes [4,5,6]. It is often remarked that tree networks are obviously very ineffective at providing short routes, and the purpose of this article is to give one formalization, as Theorem 1.2.
As background we mention two results for lattice models. Consider $m^2$ cities at the vertices of the $m \times m$ grid. Any connected network must have length $\Omega(m^2)$, and the mean route-length between two uniform random points must be $\Omega(m)$. Observe that these orders of magnitude can be attained by a tree-network: from each vertex create a unit edge to a neighbor vertex nearer to a central root. This type of construction extends readily to the Poisson model. But this apparent "linearity of mean route lengths" is in some ways misleading, in that it depends on a finite network having a central region.
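The grid construction just described can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names `grid_tree` and `mean_route_length` are hypothetical, and the rule "move horizontally first, then vertically" is one assumed way of choosing a neighbor nearer to the central root.

```python
from collections import deque


def grid_tree(m):
    """Tree on the m x m grid: every non-root vertex gets one unit edge
    to a neighbor that is one step nearer to a central root."""
    root = (m // 2, m // 2)
    parent = {}
    for x in range(m):
        for y in range(m):
            if (x, y) == root:
                continue
            if x != root[0]:
                # Step horizontally toward the root column.
                parent[(x, y)] = (x + (1 if x < root[0] else -1), y)
            else:
                # Already in the root column: step vertically.
                parent[(x, y)] = (x, y + (1 if y < root[1] else -1))
    return root, parent


def mean_route_length(m):
    """Mean within-tree distance over ordered pairs of distinct vertices."""
    _, parent = grid_tree(m)
    adj = {(x, y): [] for x in range(m) for y in range(m)}
    for child, par in parent.items():
        adj[child].append(par)
        adj[par].append(child)
    total = 0
    for source in adj:
        # Breadth-first search gives tree distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    n = m * m
    return total / (n * (n - 1))
```

The tree has $m^2 - 1$ unit edges, so its length is of order $m^2$, and every route is at most the sum of the two distances to the root, which is of order $m$.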
Infinite tree networks with a spatial stationarity property are different, as shown by the following elegant result of Benjamini et al. [10] in the infinite lattice setting. Here invariant means the distribution of the network is invariant under the automorphisms of the lattice.
Theorem 1.1 ([10]). For any invariant random spanning tree in the infinite 2-dimensional square lattice, the (within-tree) route length $D$ between lattice-adjacent vertices satisfies
$$P(D \ge i) \ge \frac{1}{8i}, \quad i \ge 1.$$
In particular, $ED = \infty$.
A relation between finite models and infinite invariant models is provided by local weak convergence, discussed briefly in section 3.2.
The proof of Theorem 1.1 exploits symmetries of the lattice which clearly are not directly applicable in the Poisson model. So what is the analog of Theorem 1.1 in the rate-1 Poisson model on the plane? Here invariant means the distribution of the network is invariant under the Euclidean group. We would like to consider
$$\rho(r) := \text{mean route length between two Poisson points at distance } r. \tag{1.1}$$
As noted in section 3.1, the MST (minimum spanning tree) provides a model in which $\rho(r) < \infty$ for small $r$. It seems natural to conjecture that there exists a constant $r_0 < \infty$ such that, for all invariant tree networks over the rate-1 Poisson process, $\rho(r) = \infty$ for a.a. $r \ge r_0$. To avoid possible very artificial examples (see section 3.3) we actually prove a slightly weaker assertion, by considering instead the route-length $D_r$ between Poisson points at distance at most $r$.
To be precise about the meaning of tree-network, we allow Steiner points (junctions, envisaging road networks) as vertices in addition to the given Poisson points. And we take edges to be line segments between vertices. The tree property is that there are no circuits.
Theorem 1.2. There exist constants $r_0 < \infty$ and $\beta > 0$ such that, in every invariant tree-network connecting the points of a Poisson point process of rate 1 in the infinite plane, for $r \ge r_0$
$$P(D_r > d) \ge \beta r/d, \quad r \le d < \infty$$
and so $ED_r = \infty$ for $r \ge r_0$.
So this is a continuum analog of Theorem 1.1. The proof in section 2 relies on the fact that a finite tree has a centroid from which each branch contains less than half the vertices; the route between two vertices in different branches must go via the centroid, so the route length is lower bounded by the sum of distances to the centroid. Consider the partition of a very large square into a large number of large subsquares. If there are a non-negligible number of subsquares in which points from more than one branch have non-negligible relative frequency, then the point-pairs within such subsquares provide the desired long routes. Otherwise almost all subsquares have almost all points from the same branch, but therefore (and this is the key intricate technical issue, Lemma 2.2) there must be some number of pairs of adjacent subsquares for which these are different branches, and so (by the easy Lemma 2.1) some overlapping square has a substantial proportion of its points from different branches, which as before provide the desired long routes.
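The centroid fact used in this outline can be illustrated on a finite tree. The sketch below is only an illustration under assumptions: the tree is given as an adjacency dictionary, the name `tree_centroid` is hypothetical, and "centroid" is taken in the usual sense that every branch has at most half of the vertices.

```python
def tree_centroid(adj):
    """Return (v, branch_sizes) for a centroid v of a finite tree.

    `adj` maps each vertex to the list of its neighbors. A centroid is a
    vertex whose removal leaves branches of at most n/2 vertices each.
    """
    n = len(adj)
    root = next(iter(adj))
    parent = {root: None}
    order = []
    stack = [root]
    while stack:  # iterative depth-first traversal, parents before children
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                stack.append(w)
    size = {v: 1 for v in adj}
    for v in reversed(order):  # accumulate subtree sizes bottom-up
        if parent[v] is not None:
            size[parent[v]] += size[v]
    for v in adj:
        # Branches at v: one per child subtree, plus the part above v.
        branches = [size[w] for w in adj[v] if parent[w] == v]
        if parent[v] is not None:
            branches.append(n - size[v])
        if all(2 * b <= n for b in branches):
            return v, branches
    raise ValueError("input is not a finite tree")
```

Any route between vertices in two different branches passes through the returned vertex, which is what makes the sum of the two distances to the centroid a lower bound on the route length.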
Our proof is technically elementary, albeit rather intricate, using only very basic facts from percolation theory. It seems quite likely that some shorter proof could be found, using some more sophisticated percolation theory.
Remarks on analogous questions for general networks are given in section 3.5. Note also that, for Theorem 1.2 to be interesting in the sense of generality, one would like to know that there are many different ways to construct invariant tree-networks over Poisson points, and we discuss this in section 3.2.

Technical lemmas
Here we give two lemmas. The first, which is elementary, will enable reduction to a lattice percolation setting, and the second is the key technical ingredient we need in that setting. To aid intuition we state these in terms of colorings, though with different interpretations in the two lemmas, and it is not the graph-theoretic coloring notion in which adjacent vertices must have different colors.
Lemma 2.1. Suppose $S_1$ and $S_2$ each contain a balanced configuration of points. Consider a {blue, red} coloring of the points in $S_1 \cup S_2$, and suppose that neither (a) $S_1$ and $S_2$ both contain less than $0.1m^2$ blue points nor (b) $S_1$ and $S_2$ both contain more than $0.88m^2$ blue points is true. Then the number of blue-red point pairs at distance at most $2^{1/2}m$ apart is at least $0.088m^4$.
Note we are counting all such pairs, not asking for a matching where a point can be in only one pair.
Proof of Lemma 2.1. First, if either $S_1$ or $S_2$ contains between $0.1m^2$ and $0.88m^2$ blue points, say $y$ blue points, then (from the definition of balanced) there are at least $0.98m^2 - y$ red points, and so at least $y(0.98m^2 - y) \ge 0.1m^2 \times 0.88m^2$ blue-red pairs within that square. Such a pair is at most $2^{1/2}m$ apart. The only remaining case is, w.l.o.g., where $S_1$ contains less than $0.1m^2$ blue points and $S_2$ contains more than $0.88m^2$ blue points. In this case, consider the successive translated squares $[im/5, m + im/5] \times [0, m]$, $i = 0, 1, 2, \ldots, 5$. At each step the number of blue points can increase by at most $1.02m^2/5$, so in at least one of the translated squares there are between $0.1m^2$ and $0.88m^2$ blue points, and the result follows as in the first case.
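The two numerical steps in this proof can be verified with exact arithmetic. This is a small sketch with hypothetical function names: the first function minimizes $y(0.98m^2 - y)$ over integer $y$ in the stated range, and the second checks that one translation step, which changes the blue count by at most $1.02m^2/5$, cannot jump over the window from $0.1m^2$ to $0.88m^2$.

```python
from fractions import Fraction


def min_pairs_in_square(m):
    """Smallest value of y * (0.98 m^2 - y) over integers y with
    0.1 m^2 <= y <= 0.88 m^2, computed exactly."""
    m2 = m * m
    lo = -(-m2 // 10)        # ceiling of 0.1 m^2
    hi = (88 * m2) // 100    # floor of 0.88 m^2
    total = Fraction(98 * m2, 100)
    return min(y * (total - y) for y in range(lo, hi + 1))


def window_cannot_be_skipped():
    """In units of m^2: compare the maximal change per translation step
    with the width of the window [0.1, 0.88]."""
    step = Fraction(102, 100) / 5
    width = Fraction(88, 100) - Fraction(10, 100)
    return step < width
```

For $m = 10$ the minimum is attained at the endpoints $y = 10$ and $y = 88$, where the product equals $880 = 0.088m^4$, so the constant in Lemma 2.1 is exactly what this step delivers.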
For our key technical lemma, fix a large integer $k$ and consider the $k \times k$ grid graph with vertices $G_k = \{0, 1, \ldots, k-1\} \times \{0, 1, \ldots, k-1\}$.
Lemma 2.2. Given an arbitrary subset $\xi_k$ of $G_k$, let $c(\xi_k)$ be the minimum, over all {green-yellow} colorings of $G_k$ with at least $k^2/4$ vertices of each color, of the number of green-yellow adjacent pairs where neither vertex is in $\xi_k$. Then there exists $q > 0$ such that, taking $\Xi_k$ to be the random subset in which each vertex is present independently with probability $q$,
$$P(c(\Xi_k) < k/400) \to 0 \text{ as } k \to \infty.$$
As motivation, in the proof of Theorem 1.2 we will apply this where the vertices represent large squares and the two colors indicate a relatively large or relatively small number of points in a given tree-branch in the square. The proof of Lemma 2.2 is in essence just the classical Peierls contour method [16], in that it involves counting self-avoiding paths, but applied in two different ways at (2.1) and (2.7).
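To make the quantity in Lemma 2.2 concrete, here is a minimal sketch that counts, for one given coloring and one given subset, the green-yellow adjacent pairs with neither vertex in the subset. The names `count_clean_boundary_pairs` and `random_subset` are hypothetical, and the code evaluates a single coloring rather than the minimum over colorings that defines $c(\xi_k)$.

```python
import random


def random_subset(k, q, seed=0):
    """Each vertex of the k x k grid is included independently with
    probability q (a sample of the random subset in Lemma 2.2)."""
    rng = random.Random(seed)
    return {(x, y) for x in range(k) for y in range(k) if rng.random() < q}


def count_clean_boundary_pairs(color, xi):
    """Number of adjacent pairs with different colors and with neither
    vertex in xi. `color[x][y]` is the color of grid vertex (x, y)."""
    k = len(color)
    count = 0
    for x in range(k):
        for y in range(k):
            # Look right and up so that each adjacent pair is seen once.
            for dx, dy in ((1, 0), (0, 1)):
                u, v = x + dx, y + dy
                if u < k and v < k and color[x][y] != color[u][v]:
                    if (x, y) not in xi and (u, v) not in xi:
                        count += 1
    return count
```

For the half-and-half coloring of the $4 \times 4$ grid there are 4 green-yellow pairs, and each vertex of the subset that lies on the color boundary removes the pair it belongs to.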
Proof of Lemma 2.2. To recall basic percolation theory, in any coloring a green-yellow adjacent pair specifies an edge in a dual graph, and these edges form the boundaries of colored components. More precisely, as illustrated in Figure 1 (left), the set of such edges is a disjoint union of
(i) self-avoiding circuits within the $k \times k$ grid;
(ii) self-avoiding paths starting and ending on the external dual boundary.
ECP 26 (2021), paper 31.
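We write path* for "path or circuit". Fix $\ell > 4$ and consider a self-avoiding path* (in the dual graph) $\pi$ of length $\ell$ in $G_k$. Each edge separates some pair of vertices in $G_k$. A pair of adjacent vertices overlaps with 6 other pairs, and so (by listing the pairs associated with $\pi$ and greedily choosing pairs not overlapping with previously chosen pairs) we can find a set $S_\pi$ of $\lceil \ell/7 \rceil$ disjoint adjacent vertex pairs separated by some edge within $\pi$. Consider the event $A_\pi$ that at most $\ell/20$ pairs within $S_\pi$ have neither end-vertex in $\Xi_k$. This event has probability
$$P(A_\pi) = P\big(\mathrm{Bin}(\lceil \ell/7 \rceil, (1-q)^2) \le \ell/20\big).$$
The number of length-$\ell$ self-avoiding paths* $\pi$ is at most $4k^2 3^{\ell-1}$. So the expected number of events $A_\pi$ that occur is at most
$$4k^2 3^{\ell-1} \, P\big(\mathrm{Bin}(\lceil \ell/7 \rceil, (1-q)^2) \le \ell/20\big).$$
We claim that for sufficiently small $q > 0$
$$\lim_{k \to \infty} \sum_{\ell > \log k} 4k^2 3^{\ell-1} \, P\big(\mathrm{Bin}(\lceil \ell/7 \rceil, (1-q)^2) \le \ell/20\big) = 0. \tag{2.1}$$
To verify this, tidy by setting $p = 1 - (1-q)^2$ and $m = \lceil \ell/7 \rceil$, so that $A_\pi$ becomes an upper-tail event for $\mathrm{Bin}(m, p)$; now we need to prove the corresponding limit (2.2) for sufficiently small $p > 0$. The Chernoff bound says that this tail probability decays geometrically in $m$, at a rate that can be made arbitrarily fast by taking $p$ small, and (2.2) follows easily.
The impact of (2.1) is that for sufficiently small $q$ we may assume
(*) For every self-avoiding path* $\pi$ of length $\ell > \log k$ in the dual graph of $G_k$, there exist at least $\ell/20$ disjoint adjacent vertex pairs separated by some edge within $\pi$ and with neither vertex in $\Xi_k$.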
Note this is a property of $\Xi_k$, not involving any coloring. Now consider a green-yellow coloring of $G_k$ with at least $k^2/4$ vertices of each color. By an elementary argument, the length of the boundary within $G_k$ between colored regions, that is the sum of lengths of the paths* at (i,ii), is at least $k/2$. Split that sum as $S_{long} + S_{short}$ according as the path* lengths are longer or shorter than $\log k$. If $S_{long} > k/10$ then property (*) implies there are at least $k/200$ green-yellow adjacent pairs where neither vertex is in $\Xi_k$, and because a vertex can be adjacent to at most 2 paths* these contain a subset of at least $k/400$ disjoint green-yellow adjacent pairs where neither vertex is in $\Xi_k$.
So it is enough to consider only colorings in which
$$S_{long} \le k/10 \text{ and } S_{short} \ge k/2 - k/10 = 2k/5. \tag{2.3}$$
Fix such a coloring, and consider the associated paths and circuits, as described in (i,ii) above and illustrated in Figure 1. Note that a circuit splits $G_k$ into an exterior and an interior region. Also a path, which by (2.3) has length $\le k/10$, splits $G_k$ into a well-defined larger and a smaller region, where (somewhat confusingly) we designate the smaller region meeting the boundary of $G_k$ as the interior of the path. Now a circuit of length $c$ fits inside some square of side $c/2$ and so its interior has at most $c^2/4$ vertices. A path of length $c$ fits inside some square of side $c$ and so its interior has at most $c^2$ vertices. So, because the long paths* have total length at most $k/10$ by (2.3),
at most $(k/10)^2 = k^2/100$ vertices are in the interior of long paths*. (2.4)
A short path starts and ends on the boundary of the square and so the interior vertices are within a distance $\log k$ from the boundary. So by considering the number of vertices within distance $\log k$ from the boundary of the $k \times k$ square,
at most $4k \log k$ vertices are in the interior of short paths. (2.5)
We remark that the point of this argument is that, from the upper bounds above and the lower bound on $S_{short}$ in (2.3), we will next be able to lower bound the number of vertices within short circuits, as (2.6) below.
A maximal circuit is one that is not contained inside another circuit or path, and a maximal path is one that is not inside another path. Figure 1 (right) shows the 5 maximal paths and the 1 maximal circuit in that example. Note that, by definition, there is a single-color path immediately inside and a single-opposite-color path immediately outside each maximal path or circuit. Moreover the colors of these immediately-inside paths are the same (say •) for each component, because a path in $G_k$ between a vertex in each component must cross component boundaries an even number of times. Every vertex of color • at distance at least $\log k$ from the sides of $G_k$ is either inside some path, or inside some circuit and therefore inside some maximal circuit. By hypothesis there are at least $k^2/4$ vertices of each color, so using (2.4, 2.5) and considering maximal circuits we have shown that property (2.3) implies that for large $k$
there exist circuits, each of length less than $\log k$, with disjoint interiors and containing a total of at least $k^2/5$ vertices. (2.6)
So it is enough to consider only colorings with property (2.6). To analyze this case we need to set up some notation. Write $\Xi_\infty$ for the random subset of the infinite square lattice $\mathbb{Z}^2$ in which each vertex is present independently with probability $q$. Define the cost of a dual circuit $C$ in $\mathbb{Z}^2$ to be the number of edges for which neither adjacent vertex is in $\Xi_\infty$, and similarly for dual circuits in $G_k$ and $\Xi_k$. Consider the event
$A^q_\infty$ := some circuit in $\mathbb{Z}^2$ around the origin has zero cost.
By a simpler use of the Peierls contour method used for (2.1), $P(A^q_\infty) \to 0$ as $q \downarrow 0$. So we can fix $q$ sufficiently small that
$$P(A^q_\infty) \le 1/20. \tag{2.7}$$
Consider a coloring of $G_k$ (depending on $\Xi_k$) satisfying (2.6): to complete the proof of Lemma 2.2 it will suffice to show that
$N_k$ := number of green-yellow adjacent pairs in $G_k$ with neither vertex in $\Xi_k$
satisfies
$$P(N_k < k/400) \to 0 \text{ as } k \to \infty. \tag{2.8}$$
Write $\mathcal{C}_k$ for the set of circuits guaranteed by (2.6), and write $\widetilde{G}_k$ for the union of their interior vertices. For $v \in \widetilde{G}_k$ write $C_k(v)$ for the circuit in $\mathcal{C}_k$ containing $v$ and $\mathrm{area}(C_k(v))$ for its area (= number of interior vertices). Each circuit in $\mathcal{C}_k$ with non-zero cost contributes at least one pair to $N_k$, so
$$N_k \ge \sum_{v \in \widetilde{G}_k} \frac{1 - 1_{A_k(v)}}{\mathrm{area}(C_k(v))} \tag{2.9}$$
where $A_k(v)$ is the event that some circuit around $v$ in $\Xi_\infty$ with length $\le \log k$ has zero cost. Note by (2.7) that $P(A_k(v)) \le P(A^q_\infty) \le 1/20$. By (2.6), the circuit $C_k(v)$ has length at most $\log k$, and so lies within a square of side $\log k$, and so each $\mathrm{area}(C_k(v))$ is at most $\log^2 k$, so
$$N_k \ge \frac{1}{\log^2 k} \Big( |\widetilde{G}_k| - \sum_{v \in G_k} 1_{A_k(v)} \Big). \tag{2.10}$$
If $v_1$ and $v_2$ are more than $\log k$ apart, the events $A_k(v_1)$ and $A_k(v_2)$ are independent, so
$$\mathrm{var}\Big( \sum_{v \in G_k} 1_{A_k(v)} \Big) = O(k^2 \log^2 k)$$
and then Chebyshev's inequality gives
$$P\Big( \sum_{v \in G_k} 1_{A_k(v)} > k^2/10 \Big) \to 0 \text{ as } k \to \infty. \tag{2.11}$$
By (2.6) we have $|\widetilde{G}_k| \ge k^2/5$, and combining with (2.10) we find $P(N_k < k^2/(10 \log^2 k)) \to 0$ as $k \to \infty$, which is stronger than the desired bound (2.8).
To check the logic of this argument, note that the event in (2.11) involves only $\Xi_\infty$. The other inequalities are deterministic, and show that, outside event (2.11), for every coloring satisfying (2.6) and for large $k$, we have $N_k \ge k^2/(10 \log^2 k)$.

We actually need the following modification of Lemma 2.2, to say that the same result holds if we insist that we count only pairs outside an arbitrary subsquare of side $0.001k$.
Corollary 2.3. Let $\Xi_k$ be the random subset of $G_k$ in which each vertex is present independently with probability $q$. Let $\square_k$ be a subsquare of $G_k$ of side asymptotic to $0.001k$, dependent on $\Xi_k$. Let $c^{\square}(\Xi_k)$ be the minimum, over all {green-yellow} colorings of $G_k$ with at least $k^2/4$ vertices of each color, of the number of green-yellow adjacent pairs where neither vertex is in $\Xi_k$ or in $\square_k$. Then there exist $q > 0$ and $\alpha > 0$ such that
$$P(c^{\square}(\Xi_k) < \alpha k) \to 0 \text{ as } k \to \infty.$$
Outline proof. Re-color the vertices in the small subsquare to become all the same color, and apply Lemma 2.2 to the new configuration. We omit details.

Proof of Theorem 1.2
Take large integers $k$ and $m$, and set $n = km$. Consider the $n \times n$ square $[0, n]^2$ in the plane. Write $\Sigma_{m,k}$ for the index set of the natural partition of the square $[0, n]^2$ into $k^2$ subsquares $\sigma$ of side $m$, which we call the natural subsquares. The set $\Sigma_{m,k}$ is isomorphic to the $k \times k$ vertex grid $G_k$. In particular a subsquare $\square$ of $G_k$, say with $s \times s$ vertices, corresponds to a subsquare $\square^+$ of the square $[0, n]^2$, with side $sm$, consisting of $s^2$ natural subsquares.
Write $q_m$ for the probability that a realization of a rate-1 Poisson point process on an $m \times m$ square is not balanced, in the sense of Lemma 2.1. By the weak law of large numbers for the Poisson distribution,
$$q_m \to 0 \text{ as } m \to \infty. \tag{2.12}$$
Write $N$ for the number of Poisson points within the square $[0, n]^2$. The finite subtree of the network spanning these points has a centroid $v^*$ such that, for the sets of Poisson points within the square that are in the different branches from $v^*$, the largest such set has size at most $N/2$. It is then always possible to merge (if necessary) these sets into a bipartition $\{B, B^c\}$ of the points in the square such that $N/3 \le |B| \le N/2$. The key observation is that the path from any $v \in B$ to any $v' \in B^c$ must go via $v^*$. We will use this to prove the following key result, from which Theorem 1.2 will follow quite easily.

Lemma 2.4.
There exists $\beta_0 > 0$ such that, with probability $\to 1$ as $m, k \to \infty$, there are at least $\beta_0 m^4 k$ pairs of points from $B$ and $B^c$ within straight-line distance $2^{1/2}m$ but whose route-length is at least $0.001mk$.
Proof. Color red the points in B, and color blue the points in B c . The "probability" parts of the argument are the following easy consequences of the law of large numbers.
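Outside an event of probability $\to 0$ as $m, k \to \infty$:
the total numbers of blue and of red points each exceed $0.33m^2k^2$; (2.13)
the total number of points in natural subsquares that are not balanced is at most $m^2k^2\psi(m)$, for some function $\psi(m) \to 0$ as $m \to \infty$. (2.14)
The remainder of the argument is deterministic, and the precise numerical constants are not important.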
Write $\square$ for a subsquare of $G_k$ with $0.001k \times 0.001k$ vertices, and write $\square^+$ for the corresponding square of side $0.001n$ within $[0, n]^2$. The essential issue is to find a lower bound for
$$N_{pair} := \min_{\square} \big( \text{number of blue-red pairs at distance} \le 2^{1/2}m \text{ apart with neither point in } \square^+ \big).$$
Write $b_\sigma$ for the number of blue points in the natural subsquare $\sigma$. Recall that a balanced natural subsquare must have between $0.98m^2$ and $1.02m^2$ points; call this the size condition. Amongst balanced natural subsquares $\sigma$, consider
• the number $S$ for which $0.09m^2 \le b_\sigma \le 0.89m^2$,
• the number $S_<$ for which $b_\sigma < 0.09m^2$,
• the number $S_>$ for which $b_\sigma > 0.89m^2$.
If a balanced natural subsquare $\sigma$ has $0.09m^2 \le b_\sigma \le 0.89m^2$ then, by the size condition, there are at least $0.98m^2 - b_\sigma$ red points, and so at least $0.08m^4$ blue-red pairs in $\sigma$. So
$$N_{pair} \ge 0.08m^4 \times \min_{\square} \big( \text{number of such } \sigma \text{ not intersecting } \square^+ \big)$$
where (as above) $\square$ is a subsquare of $G_k$ with $0.001k \times 0.001k$ vertices. A given subsquare $\square$ cannot intersect more than $0.000001k^2$ natural subsquares, and so
$$N_{pair} \ge 0.08m^4 (S - 0.000001k^2). \tag{2.15}$$
If $S$ is indeed of order $k^2$ then this inequality gives all we require (see (2.17) below) but the key issue is to analyze the case where $S$ is small. We can lower bound the total number $N_{blue}$ of blue points by (2.13) and upper bound it by (2.14) and the definitions of $(S, S_<, S_>)$: this gives
$$0.33m^2k^2 \le 0.09m^2 S_< + 0.89m^2 S + 1.02m^2 S_> + m^2k^2\psi(m).$$
With the corresponding inequality arising from counting red points, we obtain that for $m$ sufficiently large
if $S \le 0.005k^2$ then $\min(S_<, S_>) \ge k^2/4$.
Color a natural subsquare $\sigma$ yellow if $b_\sigma < 0.1m^2$, or green otherwise. In the case $\min(S_<, S_>) \ge k^2/4$ we can apply Corollary 2.3 provided $m$ is sufficiently large (recall (2.12)), to conclude that (outside an event of probability $\to 0$ as $k \to \infty$) the number of adjacent balanced green-yellow pairs is at least $\alpha k$, and these can be taken to avoid any choice of $\square$ corresponding to a choice of $\square^+$. A given subsquare can be in at most 4 such green-yellow pairs, so we can find $\alpha k/4$ disjoint pairs. By Lemma 2.1, within each such pair there exist at least $0.088m^4$ blue-red pairs and so in this case
$$N_{pair} \ge 0.088m^4 \times \alpha k/4 := \beta_0 m^4 k.$$
In the other case $S > 0.005k^2$, inequality (2.15) gives
$$N_{pair} \ge 0.08m^4 \times 0.004999k^2, \tag{2.17}$$
which exceeds $\beta_0 m^4 k$ for large $k$.
Lemma 2.4, applied with $r = 2^{1/2}m$ and $d = 0.001mk = 0.001n$, implies the following.
(*) There are constants $\beta_1 > 0$ and $r_0, \rho_0 < \infty$ such that, for $r \ge r_0$ and $d/r \ge \rho_0$, and for any invariant tree model, the mean number of pairs of Poisson points within $[0, n]^2$ at distance $\le r$ apart and with route-length $\ge d$ is at least $\beta_1 d r^3$.
Let $\chi(r, d)$ be the probability, in a given model of an invariant tree-network over the Poisson points, that between two typical Poisson points at distance $\le r$ the route-length is $\ge d$. The mean total number of pairs within $[0, n]^2$ at distance $\le r$ apart is bounded above by $\frac{1}{2} n^2 \pi r^2$. So the mean number of such pairs with route-length $\ge d$ is bounded above by $\frac{1}{2} n^2 \pi r^2 \chi(r, d)$. This holds in particular when $d = 0.001n$, and now comparing the upper and lower bound we find
$$\chi(r, d) \ge \beta_2 r/d$$
for a constant $\beta_2$. In the notation of Theorem 1.2 we have $\chi(r, d) = P(D_r \ge d)$, and this inequality is equivalent to the form stated in Theorem 1.2.

The Euclidean MST
Consider the random geometric graph $G(r_0)$ whose vertices form the rate-1 Poisson point process and whose edges link all pairs of points at Euclidean distance at most $r_0$. Write $N(v, r_0)$ for the number of vertices in the component $C(v, r_0)$ of $G(r_0)$ containing a typical vertex $v$. It is well known [16] that
for sufficiently small $r_0$, all moments of $N(v, r_0)$ are finite. (3.1)
Consider now the Euclidean MST over the Poisson process. Fix $r_0$ satisfying (3.1). By considering Prim's algorithm started from a vertex of $C(v, r_0)$, we see that the algorithm first constructs a spanning tree within the component $C(v, r_0)$, using edges of length $\le r_0$, before using any edge of length $> r_0$ needed to escape that component. In other words, the restriction of the MST to $C(v, r_0)$ is a spanning tree within $C(v, r_0)$. So the route length in the MST between $v$ and another vertex $v'$ at distance $\le r_0$ is at most $r_0 N(v, r_0)$. We can now use a size-biasing argument to show that, for $r_0$ satisfying (3.1),
$$E D_{r_0} < \infty \tag{3.2}$$
which is essentially saying that $\rho(r_0) < \infty$.
Consider a component $C_{n,m}$ of $G(r_0)$ with $n \ge 2$ vertices and some number $m$ of pairs of vertices with inter-pair distance $\le r_0$. Note $n - 1 \le m \le \binom{n}{2}$. Such components appear at some rate $\beta(n, m)$ per unit area. In the entire Poisson process, the rate of pairs at inter-pair distance $\le r_0$ is
$$\mathrm{rate}(r_0) = \tfrac{1}{2} \pi r_0^2 = \sum_{n \ge 2} \sum_{m=n-1}^{\binom{n}{2}} m \beta(n, m). \tag{3.3}$$
The route length between such a pair within $C_{n,m}$ is at most $n r_0$, so the contribution to "sum of route lengths between such pairs in a component" from one such $C_{n,m}$ is at most $m n r_0$. So the contribution to that sum "per unit area" from such components is at most $r_0 m n \beta(n, m)$. Writing $(N_0, M_0)$ for the values $(n, m)$ in a typical (uniformly chosen) component of $G(r_0)$, we have
$$E D_{r_0} \le \frac{r_0 \sum_{n,m} m n \beta(n, m)}{\mathrm{rate}(r_0)},$$
in which the sum is proportional to $E[M_0 N_0]$. Using (3.3) we have $\beta(n, m) \le \mathrm{rate}(r_0)$ and we also know $M_0 \le \binom{N_0}{2}$.
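The claim that the MST restricted to a component of $G(r_0)$ spans that component can be checked on a finite sample. The sketch below makes several assumptions that are not in the text: it uses a fixed number of uniform random points in a square rather than a Poisson process on the plane, the function names are hypothetical, and Prim's algorithm is the simple quadratic-time version.

```python
import math


def prim_mst(points):
    """Edges (i, j, length) of the Euclidean MST, by Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    link = [-1] * n         # tree vertex realising that connection
    best[0] = 0.0
    edges = []
    for _ in range(n):
        v = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[v] = True
        if link[v] >= 0:
            edges.append((link[v], v, best[v]))
        for w in range(n):
            if not in_tree[w]:
                d = math.dist(points[v], points[w])
                if d < best[w]:
                    best[w], link[w] = d, v
    return edges


def component_labels(n, edges):
    """Union-find labels of the connected components of a graph."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def mst_spans_short_components(points, r0):
    """True if the MST edges of length <= r0 connect exactly the same
    vertex sets as the geometric graph G(r0)."""
    n = len(points)
    short = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) <= r0]
    mst_short = [(a, b) for a, b, d in prim_mst(points) if d <= r0]
    g = component_labels(n, short)
    t = component_labels(n, mst_short)
    return all((g[i] == g[j]) == (t[i] == t[j])
               for i in range(n) for j in range(n))
```

Within a component with $N$ vertices the MST route therefore uses at most $N - 1$ edges of length at most $r_0$, which is the bound $r_0 N(v, r_0)$ used in the text.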

Constructing invariant tree-networks
Some examples are given in [14]; here is our general discussion. Take an arbitrary tree-network linking $m^2$ independent uniform random vertices in the continuum square $[0, m]^2$, and write $\ell_m$ for the expectation of the average (over vertices) length of the edge from the vertex toward the centroid. Randomly re-center, that is translate the plane as $(x, y) \to (x - U, y - V)$ for $(U, V)$ uniform on $[0, m]^2$, and then apply a uniform random rotation. A sequence of such networks with $\ell_m$ bounded as $m \to \infty$ is tight in the natural "local weak convergence" topology, and any subsequential weak limit network has invariant distribution. This very general construction suggests that the class of invariant tree-networks should be very rich. But there are two issues.
In general the weak limit structure is guaranteed to be a forest with infinite tree-components, but is not guaranteed to be a single tree. The planar MST limit is known to be a tree [7] but the proof heavily exploits its explicit structure; there seem to be no useful general methods for proving that a construction via local weak convergence gives a limit tree. To illustrate a more algorithmic construction, consider "Poisson rain" on the plane: rate 1 per unit area per unit time over time $0 < t \le 1$. The construction rule "each arriving point is a child of the nearest existing point" gives a genealogical tree studied in [2]. Representing the parent-child relation by drawing a line segment, the network is not a tree because such lines may cross, but instead one can draw just the part of the segment from the child to the existing network, and the analysis in [2] implies this will be a tree. Presumably other rules for connecting arriving points to the existing network within this Poisson rain framework will also yield invariant trees.
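The "Poisson rain" rule is easy to simulate in a finite window. This is a rough sketch under stated assumptions, not the construction analysed in [2]: the number of points is fixed rather than Poisson, the window is a bounded square so boundary effects are present, arrival order is taken to be index order, and only the genealogical parent-child relation is built, without truncating segments at the existing network. The name `nearest_parent_tree` is hypothetical.

```python
import math
import random


def nearest_parent_tree(n, side, seed=0):
    """Simulate n arrivals, uniform in a square of the given side.

    Points arrive in index order; each new point becomes a child of the
    nearest point that has already arrived. Returns (points, parent),
    where parent[0] is None and parent[i] < i for every later point.
    """
    rng = random.Random(seed)
    points = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    parent = [None]
    for i in range(1, n):
        # Nearest existing point among those that arrived earlier.
        nearest = min(range(i), key=lambda j: math.dist(points[i], points[j]))
        parent.append(nearest)
    return points, parent
```

Because every point other than the first has exactly one parent that arrived earlier, the parent-child relation has $n - 1$ links and no circuits, so it is a tree in the genealogical sense; whether the drawn segments cross is a separate geometric question, as the text notes.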
The second issue is illustrated by the notion of minimal (shortest length) Steiner tree. In the finite random setting this is a.s. unique. The local weak convergence scheme produces limit random forests attaining the minimum length-per-unit area possible, but (even if one could prove that limits are trees) it is not clear how to prove there is an a.s. unique limit tree attaining that minimum.
More abstractly, infinite trees arising as local weak limits are unimodular: general theory for unimodular trees at the graph-theoretic level is given in [11,8] but is not specifically adapted to the spatial setting.

Outline of possible counter-examples to the natural conjecture
Take $r_i \to \infty$ very fast and $\delta_i \to 0$ very fast. Draw a line segment between the Poisson point pairs which are at some distance in $\bigcup_i [r_i, r_i + \delta_i]$. One can arrange that the density of intersections of these lines is arbitrarily small. Break the (rare) circuits. Then assign random arrival times and use a "Poisson rain" construction as in the section above. In this way it might be possible to construct an invariant tree-network such that $\rho(r) < \infty$ for $r \in \bigcup_i [r_i, r_i + \delta_i]$.

Possible generalizations
For Theorem 1.2, can one replace the Poisson point process by a more general point process satisfying some spatial mixing condition? The centroid argument outlined below the statement of Theorem 1.2 seems heuristically to require very little beyond invariance. A more sophisticated proof of the Poisson case might allow such generalizations, and this might be a good starting project for a student wishing to engage percolation theory.

General spatial networks
For general (i.e. non-tree) invariant networks over Poisson points, the quantity $\rho(r)$ := mean route length between two Poisson points at distance $r$ at (1.1) is a natural object of study. From [1,13] we know that under very weak assumptions (which roughly correspond to "not a tree"), not only is $\rho(r) < \infty$ for all $r$, but also (by subadditivity, heuristically) there exists the limit
$$\lim_{r \to \infty} r^{-1} \rho(r) := \rho^* < \infty.$$
That is, average route lengths are asymptotically linear in straight-line distance. But quantitative analytic study of $\rho(r)$ or $\rho^*$ seems very difficult even in simple-to-describe network models.
For reasons explained in [6], it is not always wise to use $\rho^*$ as a summary statistic for efficiency at providing short routes. Instead, in [6] we recommend the statistic $\sup_r r^{-1} \rho(r)$ to ensure that the network provides short routes on all scales. This line of thought also motivates study of exactly self-similar networks (so $r^{-1} \rho(r)$ is constant) on the continuum plane [3].