On the Estimation of Latent Distances Using Graph Distances

We are given the adjacency matrix of a geometric graph and consider the task of recovering the latent positions. We study one of the most popular approaches, which consists in using the graph distances, and derive error bounds under various assumptions on the link function. In the simplest case, where the link function is an indicator function, the bound is (nearly) optimal, as it (nearly) matches an information lower bound.


Introduction
Suppose that we observe an undirected graph with adjacency matrix W = (W_ij : i, j ∈ [n]) (where [n] := {1, ..., n} and n ≥ 3), with W_ij ∈ {0, 1} and W_ii = 0. We assume the existence of points, x_1, ..., x_n ∈ R^v, such that, for some non-increasing link function φ : [0, ∞) → [0, 1],

P(W_ij = 1) = φ(‖x_i − x_j‖), i ≠ j. (1)

The (W_ij, i < j) are assumed to be independent given the point set (x_1, ..., x_n). We place ourselves in a setting where the adjacency matrix W is observed, but the underlying points are unknown. We will be mostly interested in settings where φ is unknown (and no parametric form is known). Our most immediate interest is in the pairwise distances

d_ij := ‖x_i − x_j‖, i ≠ j. (2)

In general, when the link function is unknown, all we can hope for is to rank these distances. Indeed, the most information we can aspire to extract from W is the probability matrix P := (p_ij), where

p_ij := P(W_ij = 1), (3)

and even with perfect knowledge of P, the distances can only be known up to a monotone transformation, since p_ij = φ(d_ij) and φ is in principle an arbitrary non-increasing function. Recovering the points based on such a ranking amounts to a problem of ordinal embedding (aka non-metric multidimensional scaling), which has a long history [19,36,37,45].
Although this is true in general, we focus our attention on the 'local setting' where the link function has very small support. In that particular case, we are able to (approximately) recover the pairwise distances up to a scaling. By fixing the scale arbitrarily (since it cannot be inferred from the available data), recovering the underlying points amounts to a problem of metric multidimensional scaling [7]. Classical Scaling [42] is the most popular method for that problem, and comes with a perturbation bound [5] which can help translate an error bound for the estimation of the pairwise distances (up to scale) into an error bound for the estimation of the points (up to a similarity transformation). We thus focus our attention on the estimation of the pairwise distances (2).
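Classical Scaling itself is a short spectral computation. For reference, here is a textbook sketch in Python (our own illustration of the standard Torgerson construction, not the R code used for the paper's figures):

```python
import numpy as np

def classical_scaling(D, dim):
    """Classical (Torgerson) Scaling: embed n points in R^dim from pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # doubly centered squared distances
    vals, vecs = np.linalg.eigh(B)          # eigendecomposition (ascending order)
    idx = np.argsort(vals)[::-1][:dim]      # keep the top `dim` eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# with exact Euclidean distances, the pairwise distances are reproduced exactly
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
D = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
y = classical_scaling(D, dim=2)
D_rec = np.linalg.norm(y[:, None] - y[None, :], axis=-1)
```

The embedding is only defined up to a rigid motion, which is why the perturbation bound mentioned above is stated up to a similarity transformation.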

Related work
The model we consider in (1) is an example of a latent graph model, and the points are often called latent positions. In its full generality, the model includes the planted partition model popular in the area of graph partitioning. To see this, take r = 1 and let v denote the number of blocks and, with e_k denoting the k-th canonical basis vector, set x_i = e_k if i belongs to block k. The planted partition model is a special case of the stochastic block model of Holland et al. [18]. This is also a special case of our model, as can be seen by changing e_k to z_k chosen so that φ(‖z_k − z_ℓ‖) = p_kℓ, where p_kℓ denotes the connection probability between blocks k and ℓ. Mixed-membership stochastic block models as in [2,4,44] are also special cases of latent graph models, but of a slightly different kind. The literature on the stochastic block model is now substantial and includes results on the recovery of the underlying communities; see, e.g., [1,11,16,21,26,30,38] and references therein.
Our contribution here is of a different nature, as we focus on the situation where the latent positions are well spread out in space, forming no obvious clusters. This relates more closely to the work of Holland et al. [17]. Although their setting is more general, in that additional information may be available at each position, without that additional information their approach reduces to a logistic regression model, which is clearly a special case of (1) with the logistic function as link function. Sarkar et al. [31] consider this same model, motivated by a link prediction problem where the nodes are assumed to be embedded in space with their Euclidean distances being the dissimilarity of interest. In fact, they assume that the points are uniformly distributed in some region. They study a method based on the number of neighbors that a pair of nodes have in common, which is one of the main methods for link prediction [22,23]. Parthasarathy et al. [28] consider a more general setting where a noisy neighborhood graph is observed: if (x_i) are points in a metric space with pairwise distances (d_ij), then an adjacency matrix, W = (W_ij), is observed, where W_ij = 1 with probability 1 − p if d_ij ≤ r and with probability q if d_ij > r, where p, q ∈ [0, 1] are parameters of the model. Under fairly general conditions on the metric space and the sampling distribution, and additional conditions on (n, r, p), they show that the graph distances computed based on W provide, with high probability, a 2-approximation to the underlying distances in the case where q = 0. In the case where q > 0, the same is true, under some conditions on (n, r, p, q), if W is replaced by W̃ = (W̃_ij), where W̃_ij = 1 exactly when the number of common neighbors exceeds a carefully chosen tuning parameter τ; here N_i := #{j : W_ij = 1} (the number of neighbors of i) and N_ij := #{k : W_ik = W_jk = 1} (the number of common neighbors of i and j). Scheinerman and Tucker [32] and Young and Scheinerman [46] consider what they call a
dot-product random graph model, where p_ij = ⟨x_i, x_j⟩ and it is implicitly assumed that ⟨x_i, x_j⟩ ∈ [0, 1] for all i ≠ j. This model is a special case of (1), with φ(d) = 1 − d²/2 (for unit-norm positions). Sussman et al. [39] consider recovering the latent positions in this model with full knowledge of the link function. They devise a spectral method which consists in embedding the items {1, ..., n} as points in R^v, with v assumed known, as the row vectors of U^(v)(Θ^(v))^{1/2}, where W = UΘV^⊤ is the SVD of W and, for a matrix A = (A_ij) and an integer s ≥ 1, A^(s) := (A_ij : i ∨ j ≤ s). They analyze their method in a context where the latent positions are in fact a sample from a possibly unknown distribution. The same authors extended their work in [40] to an arbitrary link function, which may be unknown, although the focus is on a binary classification task in a setting where a binary label y_i is available for each i ∈ [n].
Alamgir and von Luxburg [3,43] consider the closely related problem of recovering the latent positions in a setting where a nearest-neighbor graph is available. They propose a method based on estimating the underlying density, denoted f. If f̂_i denotes the density estimate at x_i, a graph is defined on [n] with edge weights depending on the f̂_i, and d_ij is estimated by the graph distance between nodes i and j.
Latent positions random graph models also play a role in the literature on rankings [15,25]. A typical parametric model represents each player i ∈ [n] by a number x_i such that the probability that i wins against j in a single game is φ(x_i − x_j). Note that the link function is applied to the difference and not the absolute value of the difference. For example, the Bradley-Terry-Luce model [8,24] uses the logistic link function. Suppose that multiple games are played between multiple pairs of players. The result can be summarized as (W_ij : i ≠ j), where W_ij is the number of games in which i prevailed over j. This is the weight matrix of a directed latent positions graph where the positions are (x_1, ..., x_n). We refer the reader to [27,33] and references therein for theoretical results developed for such models.

Our contribution
Graph distances are well-known estimates for the Euclidean distances in the context of graph drawing [20,35], where the goal is to embed items in space based on an incomplete distance matrix. They also appear in the literature on link prediction [22,23] and are part of the method proposed in [43]. We examine the use of graph distances for the estimation of the Euclidean distances (2). As we shall see, the graph distances are directly useful when the link function φ is compactly supported, which is for example the case in the context of a neighborhood graph, where φ(d) = I{d ≤ r} for some connectivity radius r > 0. In fact, the method is shown to achieve a minimax lower bound in this setting (under a convexity assumption). This setting is discussed in Section 2. In Section 3, we extend the analysis to other (compactly supported) link functions. We end with Section 4, where we discuss some important limitations of the method based on graph distances and consider some extensions, including localization (to avoid the convexity assumption) and the use of the number of common neighbors (to accommodate non-compact link functions). Proofs are gathered in Section 5.

The graph distance method
Given the adjacency matrix W, the graph distance (aka shortest-path distance) between nodes i and j is defined as

δ_ij := inf{m ≥ 1 : ∃ k_0 = i, k_1, ..., k_m = j such that W(k_{s−1}, k_s) = 1 for all s ∈ [m]}, (4)

where inf ∅ = ∞ by convention. Here and elsewhere, we will sometimes use the notation W(i, j) for W_ij, d(i, j) for d_ij, etc. We propose estimating, up to a scale factor, the Euclidean distances (2) with the graph distances (4). Indeed, since φ is assumed unknown, the scale factor cannot be recovered from the data, as is the case in ordinal embedding, for example. Therefore, estimates are necessarily up to an arbitrary scaling factor, so that the accuracy of an estimator d̂ = (d̂_ij) for d = (d_ij) is measured according to how close we can make s·d̂ and d in some chosen way by choosing the scale s > 0 with (oracle) knowledge of d. For example, with mean squared error, this leads to quantifying the accuracy of d̂ as follows:

min_{s>0} Σ_{i<j} (s·d̂_ij − d_ij)².

The graph distance method is the analog of the MDS-D method of Kruskal and Seery [20] for graph drawing, which is a setting where some of the distances (2) are known and the goal is to recover the missing distances. Let E denote the set of pairs i < j for which d_ij is known. MDS-D estimates the missing distances with the distances in the graph with node set [n] and edge set E, and with edge (i, j) ∈ E weighted by d_ij. This method was later rediscovered by Shang et al. [35], who named it MDS-MAP, and coincides with the IsoMap procedure of Tenenbaum et al. [41] for isometric manifold embedding. (For more on the parallel between graph drawing and manifold embedding, see [10].) As we shall see, the graph distance method is most relevant when the positions are sufficiently dense in their convex hull, a limitation it shares with these methods. For Ω ⊂ R^v, define

Λ_Ω(x_1, ..., x_n) := sup_{z∈Ω} min_{i∈[n]} ‖z − x_i‖, (5)

which measures how dense the latent points are in Ω. We also let Λ(x_1, ..., x_n) denote (5) when Ω is the convex hull of {x_1, ..., x_n}.
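For concreteness, the estimator can be sketched in a few lines of Python (our own illustration, not code from the references; numpy and scipy are assumed available). The hop counts δ_ij are computed by breadth-first search, and the oracle scale s minimizing the least-squares criterion above has the closed form s* = Σ_{i<j} d̂_ij d_ij / Σ_{i<j} d̂_ij²:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def graph_distances(W):
    """Hop counts delta_ij (shortest-path distances) from a 0/1 adjacency matrix."""
    return shortest_path(W, unweighted=True)

def oracle_scale(d_hat, d):
    """Closed-form minimizer of sum_{i<j} (s*d_hat_ij - d_ij)^2 over s > 0."""
    iu = np.triu_indices_from(d, k=1)
    a, b = d_hat[iu], d[iu]
    return (a @ b) / (a @ a)

# toy example: 5 equally spaced points on a line, connectivity radius r = 1.5,
# so that only consecutive points are connected
x = np.arange(5.0)
d = np.abs(x[:, None] - x[None, :])
r = 1.5
W = ((d <= r) & (d > 0)).astype(float)
delta = graph_distances(W)      # hop counts; e.g. 4 hops from node 0 to node 4
d_hat = r * delta               # estimate, up to the (unknown) scale factor r
```

The scale r used above is an oracle quantity; in practice only δ is available, which is exactly why accuracy is measured up to scaling.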

Simple setting
In this section, we focus on the simple, yet emblematic, case of a neighborhood (ball) graph, that is, a setting where the link function is given by φ(d) = I{d ≤ r} for some r > 0. When the positions are drawn iid from some distribution, the result is what is called a random geometric graph [29], but here we consider the positions to be deterministic. In particular, nothing in this setting is random. We start with a performance bound for the graph distance method and then establish a minimax lower bound. Similar results are available in [6,9,28], among other places, and we provide a proof only for completeness, and also to pave the way to the more sophisticated Theorem 3.
Theorem 1. Assume that φ(d) = I{d ≤ r} for some r > 0, and define d̂_ij = rδ_ij. If the connectivity radius r is sufficiently larger than the density of the point set, specifically Λ(x_1, ..., x_n) ≤ ε with 4ε ≤ r, then for all i ≠ j,

d_ij ≤ d̂_ij ≤ (1 + 4ε/r) d_ij + r,

which in particular implies that

max_{i≠j} |d̂_ij − d_ij| ≤ 4ερ/r + r,

where ρ is the diameter of {x_1, ..., x_n}.
In the statement, d̂ is not a true estimator in general, as it relies on knowledge of r, which may not be available, nor estimable, if the link function is unknown. Nevertheless, the result says that, up to that scale parameter, the graph distances achieve a nontrivial level of accuracy. Compare with [28, Th 2.5], which, in the context of points in a Euclidean space as considered here, says that, in a stochastic setting where the points are generated iid from some distribution supported on a convex set, max_{ij} (d̂_ij − d_ij) is bounded by r in the limit where n → ∞ while r remains fixed.
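A sandwich bound of this type can be checked numerically (a sanity-check sketch under our reading of Theorem 1, with points on a regular grid so that the covering radius ε is known exactly; scipy assumed available):

```python
import itertools
import numpy as np
from scipy.sparse.csgraph import shortest_path

h = 0.05                                   # grid spacing
grid = np.arange(0.0, 1.0 + h / 2, h)
pts = np.array(list(itertools.product(grid, repeat=2)))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

eps = h * np.sqrt(2) / 2                   # covering radius of the grid in [0, 1]^2
r = 0.3                                    # connectivity radius, with 4 * eps <= r
W = ((d <= r) & (d > 0)).astype(float)
d_hat = r * shortest_path(W, unweighted=True)

off = ~np.eye(len(pts), dtype=bool)
lower_ok = bool(np.all(d[off] <= d_hat[off] + 1e-9))                       # d_ij <= r*delta_ij
upper_ok = bool(np.all(d_hat[off] <= (1 + 4 * eps / r) * d[off] + r + 1e-9))
```

The lower bound holds deterministically (each hop covers at most r), while the upper bound relies on the grid being ε-dense in its convex hull.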
For a numerical example, see Figure 1. In Figure 2 we confirm numerically that the method is biased when the underlying domain from which the positions are sampled is not convex. That said, the method is robust to mild violations of the convexity constraint, as shown in Figure 3, where the positions correspond to n = 3000 US cities. (Computations were done in R, with the graph distances computed using the igraph package, to which Classical Scaling was applied, followed by a Procrustes alignment and scaling using the package vegan.)

Remark 1. If we apply Classical Scaling to d̂, we obtain an embedding with arbitrary scaling and rigid positioning, which are not recoverable when r is unknown. Nevertheless, if we apply the perturbation bound recently established in [5, Cor 2], the recovery of the latent positions is of order at most O(ε/r + r).

It turns out that the graph distance method comes close to achieving the best possible performance (understood in a minimax sense) in this particularly simple setting. Indeed, we are able to establish the following general lower bound, which applies to any method.

Theorem 2. Assume that φ(d) = I{d ≤ r} with r > 0 known. Then there are numeric constants c_0, c_1 > 0 with the property that, for any ε > 0 and any estimator d̂, there is a point set x_1, ..., x_n such that Λ(x_1, ..., x_n) ≤ ε and, for at least half of the pairs i ≠ j,

|d̂_ij − d_ij| ≥ (c_0 ρ ε/(ε ∨ r)) ∨ (c_1 r), (6)

where ρ is the diameter of the point set.
Thus, in the strictest sense, the graph distance method is, for this particular link function, minimax optimal (in order of magnitude). It turns out that the point configurations that we consider in the proof are all embedded on the real line and thus, in principle, can be embedded in any Euclidean space. (An estimator here is simply a function on the set of n-by-n symmetric binary matrices with values in R_+^{n(n−1)/2}.) It is also the case that, for these particular configurations, it does not help to know more about the setting. We speculate that a better error rate can be achieved (in probability) under a stochastic model, for example, when x_1, ..., x_n are drawn iid from the uniform distribution on some 'nice' domain of a Euclidean space. On the other hand, we anticipate that our performance analysis of the graph distance method is essentially tight even then. To achieve a better performance, more sophisticated methods need to be considered. Methods using neighbors-of-neighbors information [28,31] are particularly compelling, but in principle require knowing (or perhaps estimating) the underlying density if it is unknown. We probe this question a little further in Section 4.3 with some simple but promising numerical experiments. (All we know about this approach is that it can lead to a 2-approximation [28].)

General setting
Beyond the setting of a neighborhood graph considered in Section 2, the graph distance method in fact performs similarly when the link function is discontinuous at the edge of its support, meaning when it drops abruptly to 0. A case in point is when φ(d) = p I{d ≤ r} for some constant p > 0, which corresponds to a random geometric graph with its edges independently deleted with probability 1 − p. See Figure 4 for a numerical example illustrating this particular case.
More generally, we establish the performance of the graph distance method when the link function is compactly supported. The bound we obtain is in terms of how fast the function approaches 0 at the edge of its support. Note that, unlike the setting of a neighborhood graph, the model is truly random when the link function is not an indicator function, so that the statement below is in probability.

Theorem 3. Assume that φ has support [0, r], for some r > 0, and define d̂_ij = rδ_ij. Assume that, for some α ≥ 0 and C_0 > 0,

φ(d) ≥ C_0 (1 − d/r)^α, for all d ∈ [0, r],

and that r/ε ≥ C_1 (log n)^{1+α} with C_1 large enough, where ε := Λ(x_1, ..., x_n). Then, with probability at least 1 − 1/n,

d_ij ≤ d̂_ij ≤ (1 + C_2 (ε/r)^{1/(1+α)}) d_ij + C_2 r, for all i ≠ j, (7)

which in particular implies that max_{i≠j} |d̂_ij − d_ij| ≤ C_2 ((ε/r)^{1/(1+α)} ρ + r), where ρ is the diameter of {x_1, ..., x_n}.
Although we believe our performance analysis in (7) to be tight, we do not know whether it is minimax optimal in any way.
For the graph distance, we expect it to be less accurate the slower the link function φ approaches 0. This is borne out in numerical experiments that we performed. In those experiments, n = 5000 points were drawn uniformly at random from [0, 1]², considered as a torus to avoid boundary effects. (Clearly, our results apply in this setting as well.) For each α ∈ {0, 0.1, ..., 0.9, 1, 2, 3, 4, 5}, we computed a realization of the adjacency matrix, with the link function chosen so that P(W_ij = 1) is the same regardless of α as long as r ≤ 0.5. In our experiments, we chose r = 0.1. This was repeated 100 times. The results are presented in Figure 5.
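A small-scale version of this experiment can be scripted as follows (a sketch with our own guess at the link-function family, φ_α(d) = (1 − d/r)_+^α, which need not match the normalization used in the paper's equation (8); n is reduced to keep the computation light):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(1)

def torus_distances(x):
    diff = np.abs(x[:, None, :] - x[None, :, :])
    diff = np.minimum(diff, 1.0 - diff)           # wrap around the unit torus
    return np.linalg.norm(diff, axis=-1)

def link(d, r, alpha):
    # hypothetical family supported on [0, r], vanishing like (1 - d/r)^alpha at the edge
    return np.clip(1.0 - d / r, 0.0, None) ** alpha

n, r, alpha = 400, 0.2, 1.0
x = rng.random((n, 2))
d = torus_distances(x)
U = rng.random((n, n))
W = np.triu(U < link(d, r, alpha), k=1).astype(float)
W = W + W.T                                        # symmetric adjacency, zero diagonal
d_hat = r * shortest_path(W, unweighted=True)

iu = np.triu_indices(n, k=1)
ok = np.isfinite(d_hat[iu])                        # pairs in the same connected component
median_rel_err = np.median(np.abs(d_hat[iu][ok] - d[iu][ok]) / d[iu][ok])
```

Increasing alpha thins out the long edges, which in this sketch tends to inflate the hop counts and hence the relative error, in line with the trend reported in Figure 5.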

Discussion
Fig 5. Here n = 5000 points were drawn uniformly at random from the 2D unit torus. The radius was set at r = 0.1 and the link function varied with α as specified in (8). The median relative error is over 100 repeats. The density ε was set at 0.025, as determined by simulation.

The method based on graph distances suffers from a number of serious limitations:
1. The positions need to span a convex set, although the method is robust to mild violations of this constraint, as exemplified in Figure 3.
2. Even in the most favorable setting of Section 2, the relative error is still of order r, as established in (7). This is clearly tight for the graph distance method, and although it matches the lower bound established in Theorem 2, this bias could potentially be avoided when the positions are nicely spread out, for example, as a random sample from some nice distribution is expected to be.
3. The link function needs to be compactly supported. Indeed, the method can be grossly inaccurate in the presence of long edges, as in the interesting case where the link function is of the form

φ(d) = p I{d ≤ r} + q I{d > r}, (9)

where 0 < q < p ≤ 1, as considered in [28].
We address each of these three issues in what follows.

Localization
A possible approach to addressing Issue 1 is to operate locally. This is well understood, and is what led Shang and Ruml [34] to suggest MDS-MAP(P), which effectively localizes MDS-MAP [35]. (As we discussed earlier, the latter is essentially a graph-distance method and thus bound by the convexity constraint.) More recent methods for graph drawing based on 'synchronization' also operate locally [12,13].
Experimentally, this strategy works well. See Figure 6 for a numerical example, which takes place in the context of the rectangle with a hole of Figure 2. We adopted a simple approach: we kept the graph distances that were below a threshold, leaving the other ones unspecified, and then applied a method for multidimensional scaling with missing values, specifically SMACOF [14] (initialized with the output of the graph distance method). For all i ≠ j, δ_ij ∈ {1, 2, 3, 4, 5}, so that the graph distances are rather coarse, yet the embedding computed by classical multidimensional scaling is surprisingly accurate.
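A bare-bones weighted SMACOF, which handles missing dissimilarities through 0/1 weights, can be written as follows (our own sketch of the standard majorization algorithm [14], not the implementation used for Figure 6):

```python
import numpy as np

def smacof_missing(delta, weights, dim=2, n_iter=200, seed=0):
    """Weighted SMACOF: decreases stress(X) = sum_{i<j} w_ij (||x_i - x_j|| - delta_ij)^2.
    Pairs with weights[i, j] = 0 (e.g. unreliable long graph distances) are ignored.
    Returns the embedding and the stress after each iteration (non-increasing)."""
    n = delta.shape[0]
    X = np.random.default_rng(seed).normal(size=(n, dim))
    V = np.diag(weights.sum(axis=1)) - weights
    Vp = np.linalg.pinv(V)                        # V is singular (constant vectors in its kernel)
    stresses = []
    for _ in range(n_iter):
        D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        ratio = np.divide(delta, D, out=np.zeros_like(delta), where=D > 0)
        B = -weights * ratio
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        X = Vp @ (B @ X)                          # Guttman transform (majorization step)
        D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        stresses.append(np.sum(np.triu(weights * (D - delta) ** 2, k=1)))
    return X, stresses

# usage: keep only short distances, as in the localization strategy described above
rng = np.random.default_rng(3)
x = rng.random((40, 2))
d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
w = (d <= 0.5).astype(float)      # pretend long distances are unreliable / unknown
np.fill_diagonal(w, 0.0)
delta = d.copy()
delta[w == 0] = 0.0               # ignored entries; value irrelevant but must be finite
X, stresses = smacof_missing(delta, w)
```

The majorization argument guarantees that the recorded stress values never increase, which makes this a convenient drop-in when only thresholded graph distances are trusted.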

Regularization
Regarding Issue 2, in numerical experiments we have found that the graph distances, although grossly inaccurate, are nevertheless useful for embedding the points using (classical) multidimensional scaling. Thus, if one is truly interested in estimating the Euclidean distances, one may use the graph distances as rough proxies for the underlying distances, apply multidimensional scaling, and then compute the distances between the embedded points. For a numerical illustration, see Figure 7. This phenomenon remains surprising to us, and we do not have a good understanding of the situation.

Number of common neighbors
A possible approach to addressing Issue 3, as well as Issue 2, is to work with the number of common neighbors, which provides an avenue to 'super-resolution' in a way, at least when the positions are sampled iid from a known distribution such as the uniform distribution on a domain (known and convex). By this we mean that, say in the simple setting of Section 2, although the adjacency matrix only tells us whether two positions are within distance r, it is possible to aggregate all this information to refine this assessment. Similarly, in the setting where (9) is the link function, it is possible to tell whether two positions are nearby or not. This sort of concentration is well-known to experts and seems to be at the foundation of spectral methods (see, e.g., [39, Prop 4.2]). We refer the reader to [28,31], where such an approach is considered in greater detail.
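As a toy illustration of the idea (a simplified variant with a fixed threshold τ, not the calibrated thresholding of [28]), one can prune the adjacency matrix using common-neighbor counts:

```python
import numpy as np

def denoise_common_neighbors(W, tau):
    """Keep an observed edge (i, j) only when i and j have at least tau common neighbors.
    N_ij = #{k : W_ik = W_jk = 1} is the (i, j) entry of W @ W."""
    N = W @ W
    W_tilde = W * (N >= tau)
    np.fill_diagonal(W_tilde, 0.0)
    return W_tilde

# toy example: two triangles {0,1,2} and {3,4,5} joined by one spurious 'long' edge (2, 3)
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
W_clean = denoise_common_neighbors(W, tau=1)
```

In this example, the spurious edge (2, 3) has no common neighbors and is removed, while every within-triangle edge is supported by one common neighbor and survives.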

Proof of Theorem 1
Fix i ≠ j, let m := ⌈d_ij/(r − 2ε)⌉ − 1, and define z_s := x_i + (s/(m + 1))(x_j − x_i) for s ∈ {0, ..., m + 1}. We have z_0 = x_i and z_{m+1} = x_j, and z_0, z_1, ..., z_{m+1} are on the line joining x_i and x_j and satisfy ‖z_s − z_{s+1}‖ ≤ r − 2ε for all s. Let x_{k_s} be such that ‖z_s − x_{k_s}‖ ≤ ε, with x_{k_0} = x_i and x_{k_{m+1}} = x_j. Note that x_{k_s} is well-defined since z_s belongs to the convex hull of {x_1, ..., x_n} and we have assumed that Λ(x_1, ..., x_n) ≤ ε. By the triangle inequality, for all s ∈ {0, ..., m},

‖x_{k_s} − x_{k_{s+1}}‖ ≤ ‖x_{k_s} − z_s‖ + ‖z_s − z_{s+1}‖ + ‖z_{s+1} − x_{k_{s+1}}‖ ≤ ε + (r − 2ε) + ε = r.

Hence, (x_{k_0}, x_{k_1}, ..., x_{k_{m+1}}) forms a path in the graph, and as a consequence,

d̂_ij = rδ_ij ≤ r(m + 1) ≤ r(d_ij/(r − 2ε) + 1) ≤ (1 + 4ε/r) d_ij + r,

using the fact that ε ≤ r/4. Resetting the notation, let k_0 = i, k_1, ..., k_ℓ = j denote a shortest path joining i and j, so that ℓ = δ_ij. By the triangle inequality,

d_ij ≤ Σ_{s=0}^{ℓ−1} ‖x_{k_s} − x_{k_{s+1}}‖ ≤ ℓr = d̂_ij, (10)

using the fact that ‖x_{k_s} − x_{k_{s+1}}‖ ≤ r for all s.

Proof of Theorem 2
First term on the RHS of (6). We construct two point configurations that yield the same adjacency matrix and then measure the largest difference between the corresponding sets of pairwise distances. Assume that r ≤ 1/2 (without loss of generality) and that m := r(n − 1) is an integer for convenience. We define two configurations of points, both in Ω := [0, 1] (so that v = 1 here).
The two configurations coincide when η = 0, but we will choose η > 0 in what follows. Under Configuration 1, the adjacency matrix W is given by W_ij = I{|i − j| ≤ m}. For the adjacency matrix to be the same under Configuration 2, it suffices that x_1 have (exactly) m neighbors (to the right) and that x_n have (exactly) m neighbors (to the left); this is because i → x_i − x_{i−1} is decreasing in this configuration. These two conditions correspond to four equations. We need only consider the first and fourth, as they imply the other two. After some simplifications, we see that the first one holds when r ≤ 1, while the fourth holds when r ≤ 1 − 2/(n − 1) and η ≤ 1/(2n − 3 + m(n − m − 3)). Since r ≤ 1/2, we have r ≤ 1 − 2/(n − 1) when n ≥ 5, and we choose η = 1/(2n + m(n − m)), for example. Then Λ_Ω(x_1, ..., x_n) ∼ 1/(2n) in Configuration 2 (same as in Configuration 1). We choose n = n_ε just large enough that Λ_Ω(x_1, ..., x_n) ≤ ε in both configurations.
In particular, ε ∼ 1/n_ε as ε → 0. Since the result only needs to be proved for ε small, we may take n as large as we need. Now that the two designs have the same adjacency matrix, we cannot distinguish them with the available information. It therefore suffices to look at the difference between the pairwise distances. Let d^(k)_ij denote the distance between x_i and x_j in Configuration k. For i < j, we have

by the fact that ηn ≤ 1/2. Also, for some universal constant C > 0,

using the fact that ηn ≤ 1/2 and ηn ≍ 1/(1 ∨ rn) ≍ ε/(ε ∨ r), the latter because ε ≍ 1/n in our construction. Since |i + j − n − 1| ≥ n/10 for most pairs of indices i < j, the following is also true:

To conclude, since the two configurations have the same adjacency matrix, they are indistinguishable solely based on that information, and so it must be that, for any estimator d̂ and for most pairs i < j, the error |d̂_ij − d_ij| is at least of the order of the first term on the RHS of (6).

Second term on the RHS of (6). We construct again two point configurations, also on the real line, that have the same adjacency matrix. We assume that ε ≤ r, for otherwise the first term in the RHS of (6) is of order 1 for most pairs of indices. In fact, we assume that q = r/ε is an integer for simplicity.
To any pattern y_0 = 0 < y_1 < ... < y_m = r with y_j − y_{j−1} ≤ 2ε for all j, associate the point set x_1 < ... < x_n where x_i = y_{i mod m} + ⌊i/m⌋ r. Note that the x point set is built by repeating the y pattern. As can readily be seen, all these point sets have the same adjacency matrix W_ij = I{|i − j| ≤ m} and have Λ bounded by ε. We now consider two particular cases. Take m even, so that m = 2q for some integer q > 0, and define the following configurations.

Let d^(k)_ij denote the distance between x_i and x_j in Configuration k. Letting b_i := i mod 2q, we have

where, for a real number a, a_+ := max(0, a). We have ε − η = qε/(q + 1) = r/(q + 1), and elementary considerations confirm that, for most i < j, the factor defined in the curly brackets above is ≥ C(q + 1) for some universal constant C > 0. Hence, |d^(1)_ij − d^(2)_ij| ≥ Cr for most pairs of indices, and this then implies, as before, that for any estimator d̂ and most pairs i < j, the error |d̂_ij − d_ij| is at least of the order of the second term on the RHS of (6).

Proof of Theorem 3
In the following, C_0, C_1, C_2 refer to the constants appearing in the statement of Theorem 3, while c_1, c_2, ... denote positive constants that depend only on (α, C_0). Since the result only needs to be proved for large r/ε, we will take this quantity as large as needed. In what follows, we connect each node in the graph to itself. This is only for convenience and has no impact on the validity of the resulting arguments.
As before, by (10), we have d̂_ij ≥ d_ij for all i ≠ j. Recall the definition of p_ij ≡ p(i, j) in (3). Let p_0 = φ(r/2) > 0 and note that p_0 ≥ C_0 (1/2)^α.

Special case. Suppose that d_kl ≤ r/2 for all k ≠ l. In that case, for all i ≠ j, p_ij = φ(d_ij) ≥ φ(r/2) = p_0. For (i, j, k) distinct, (x_i, x_k, x_j) forms a path in the graph if and only if W_ik W_kj = 1, which happens with probability p_ik p_kj ≥ p_0². Therefore, by independence,

Henceforth, we assume that

Claim 1. By choosing C_1 large enough, the following event happens with probability at least 1 − 1/n²:

Take i, j such that d_ij ≤ r/4. We first note that there is j* such that d(i, j*) > r/4, for otherwise, for all k, l ∈ [n], d_kl ≤ d_ki + d_il ≤ r/4 + r/4 = r/2, which would contradict our assumption (11). Define

where m := ⌊(r/4 − ε)/ε⌋. By construction, each z_s is on the line segment joining x_i and x_{j*}, and so belongs to the convex hull of x_1, ..., x_n; hence, by the fact that Λ(x_1, ..., x_n) ≤ ε, there is i_s ∈ [n] such that ‖x_{i_s} − z_s‖ ≤ ε. By the triangle inequality,

and

Therefore, for each s ∈ [m], (x_i, x_{i_s}, x_j) forms a path with probability at least p_0². By independence, therefore, there is such an s ∈ [m] with probability at least 1 − (1 − p_0²)^m. With the union bound and the fact that m ≥ r/(5ε) when r/ε is large enough, we may conclude that, if C_1 is chosen large enough, the event

A_2 := {d̂_ij ≤ 2r for all i ≠ j such that d_ij ≤ r/4}

has probability at least 1 − 1/n². Indeed, this holds eventually, when r/ε ≥ C_1 (log n)^{1+α} with C_1 large enough. Next, we prove that A_2 implies A_1, which will suffice to establish the claim. For this, we consider the remaining case where i, j are such that d_ij > r/4. Define z_0 = x_i and

where this time m := ⌊d_ij/ε⌋. As before, for each s ∈ [m], there is i_s ∈ [n] such that ‖x_{i_s} − z_s‖ ≤ ε. We let i_0 = i and i_m = j. The latter is possible since ‖z_m − x_j‖ ≤ ε.
We have

so that, under A_2,

implying that d(i_s, i_{s'}) ≤ 2r when |s − s'| ≤ h. Thus, by the triangle inequality, under A_2,

By the triangle inequality,

and it is not hard to verify that this vanishes when ε/r is small enough.
We have thus established Claim 1.
Claim 1, of course, falls quite short of what is stated in the theorem, but we use it in the remainder of the proof. That said, the claim takes care of all pairs (i, j) such that d_ij ≤ 2r. Thus, for the remainder of the proof, we need only focus on i, j such that d_ij > 2r. Define m and z_0, ..., z_m as before, and also the corresponding i_0, ..., i_m. As before, (12) implies that

and in particular d(i_s, j) ≤ r when s ≥ m − h.
(Note that we changed the definition of h.) Similarly, we have

For each 0 ≤ s ≤ m, define the random variable H_s by

This is a maximum since we have set W(k, k) = 1 for all k. Define

with the convention that inf ∅ = ∞. Our first objective is to bound T in probability. Given t ≥ 1, we have

On the one hand,

On the other hand, we note that, when H_{S_{t−1}} > 0, we necessarily have S_t ≥ t, so that

Thus,

In what follows, we bound each term on the right-hand side of (13).

Claim 2.1. For any a,

where H is a random variable supported on {0, 1, ..., h} with distribution function

Indeed, by independence,

and by the fact that φ is non-increasing and (12),

Claim 2.2. We have

For any 0 ≤ a ≤ h − 1, we have

Since φ is non-increasing, and using the lower bound we assume in the statement of the theorem,

This proves (14). With Claims 2.1 and 2.2, the first term on the right-hand side of (13) is bounded by

where {H̃_t} are iid copies of H. First, if t = 1, we have

Next, fix t ≥ 2 and suppose that the claim is true at t − 1. Since t ≤ S_t ≤ a implies that t − 1 ≤ S_{t−1} ≤ a, we have

where we used the fact that {S_{t−1} = k} is independent of H_k, and also the fact that the H̃_t are iid. We may assume, and we do so, that the {H̃_t} are defined on the same probability space as the {S_t}, and are independent of them. Then,

where the inequality comes from the recursion hypothesis. Thus the recursion proceeds, and the claim is proved.
Define Ū := h − H and Ū_t := h − H̃_t. By (14) and the fact that h ≤ r/ε − 2, Ū is stochastically bounded by 1 + νY, where Y is a random variable with distribution

With Claims 2.3 and 2.4, by choosing a = h − (m − h)/t, we obtain that the second term on the right-hand side of (13) is bounded by exp(−c_5 t ν^{−1} a) whenever a ≥ νb_0, which happens when t ≥ (m − h)/(h − νb_0). This is the case when t ≥ t* := 2d_ij/r, which may be seen using the fact that m − h ≤ m ≤ d_ij/ε, and that h − νb_0 ≥ r/ε − 3(r/ε)^{α/(α+1)} b_0 ≥ r/(2ε) when r/ε is large enough. In fact, when t ≥ t*, the corresponding a satisfies a ≥ r/ε − 3 − (d_ij/ε)/(2d_ij/r) ≥ r/(2ε) when r/ε is large enough, so that the right-hand side of (13) is bounded in the same way in this regime.

A joint control on V_T and T is useful because of the following. Under {T < ∞}, (i, i_{S_1}, ..., i_{S_T}) forms a path in the graph, so that d(i, i_{S_T}) ≤ Tr. We also have

This proves that (7) holds for all i, j ∈ [n] such that d_ij > 2r.

Fig 3. A numerical example illustrating the setting of Theorem 1. The latent positions are located at the coordinates of n = 3000 US cities and the connectivity radius varies (in degrees).

Fig 4. Same setting as in Figure 3. Here we set r = 5 and vary p. To ease the comparison, we coupled the different adjacency matrices, in the sense that the (p = 0.2)-matrix was built by keeping each edge of the (p = 0.5)-matrix independently with probability 0.2/0.5 = 0.4.