On a random walk that grows its own tree

Random walks on dynamic graphs have received increasing attention from different academic communities over the last decade. Despite the relatively large literature, little is known about random walks that construct the graph where they walk while moving around. In this paper we study one of the simplest conceivable discrete-time models of this kind, which works as follows: before every walker step, with probability p a new leaf is added to the vertex currently occupied by the walker. The model grows trees and we call it the Bernoulli Growth Random Walk (BGRW). We show that the BGRW walker is transient and has a well-defined linear speed c(p) > 0 for any 0 < p ≤ 1. Moreover, we show that the tree as seen by the walker converges (in a suitable sense) to a random tree that is one-ended. Some natural open problems about this tree and variants of our model are collected at the end of the paper.


Introduction
Random walks on graphs that change over time have received much attention over the past decades. Within this context, a large body of work assumes only edge (or node) weights change over time while the graph structure (i.e., edge set) remains constant. Examples include reinforced random walks and random walks in random environments [1,8,9,10,15,11,6,7]. A much smaller line of work assumes that the graph structure (edges and nodes) changes over time. However, such works generally assume graph dynamics to be independent of the walker [3,12,17] (an exception is [14]).
This work explores a novel model where the random walk constructs its own graph, mutually coupling the walker and graph dynamics. This model is defined as follows: (0) start with a finite tree with the walker sitting on one of the vertices; (1) with probability p, add and connect a new leaf to the current location of the walker; (2) let the walker take one step on the current graph; (3) go to step 1.
We refer to this model as BGRW (Bernoulli Growth Random Walk). Note that our model may be seen as a sequence of pairs (T_n, X_n)_{n∈N}, where T_n is a tree and X_n is the walker's position on T_n. A more formal definition is given in Section 2. This model is a variant of the Non-Restart Random Walk (NRRW) proposed and studied in [2,13]. There, the initial graph is a vertex with a loop edge, and new leaves are created every s steps for a fixed s > 0. Prior to [2], other models of walks creating graphs had been proposed, but they all had a periodic restart step in which the random walker would jump to a uniformly chosen vertex in the tree [5,19].
A fundamental question on dynamic graphs is the recurrent or transient nature of the walker. Figure 1 shows simulated sample paths from the BGRW model for different values of p. The depicted trees have exactly the same number of nodes (i.e., regardless of the value of p, the simulations stopped when the number of nodes reached n = 5000). For larger values of p the generated trees are very slim and long; as p decreases the trees become fatter and shorter. Intuitively, with large p the random walk can escape more easily, while for small p the random walk wanders more. Our findings show that for any fixed p, the random walk escapes with a positive speed c(p) > 0.

[Figure 1: Trees with n = 5000 nodes generated by simulating BGRW with p = 0.9, 0.5, 0.1 and 0.01.]

Figure 1 naturally raises a question regarding the tree from the perspective of the walker as it walks around. This "point of view of the particle" has proven to be powerful in the investigation of many examples of random processes in random environments; see [18] and references therein. In our case, this approach gives fine information about the speed of the walk, as well as about the topology of the environment the walker constructs around itself. More specifically, we show that the tree around the walker converges (in a suitable sense) to an infinite one-ended random tree whose law does not depend on the initial state, i.e., the initial graph is in some sense "forgotten". Remarkably, we also show that the frequency with which the walker sees any given tree of finite height around itself is strictly positive for any value of p ∈ (0, 1).

Main results
Our first main result, proved in Section 5, shows that our process has a well-defined, linear asymptotic speed. This is essentially a mixing property of the BGRW when viewed as a Markov chain over T_*. Our proof reveals that P_p is the unique stationary measure of this chain with the following property: if the process is started from P_p, then almost surely, for all ℓ ≥ 1, the walker will eventually be at the tip of a path of length ℓ. Theorem 1.2 follows from a stronger statement, Theorem 7.3 in Section 7.
The next theorem gives us some information on the probability measure P_p. Recall that an infinite rooted tree is one-ended if it contains a single infinite path starting from the root. Theorem 1.3 (Support of P_p). Given 0 < p ≤ 1, let P_p be the limit measure in Theorem 1.2. Then P_p is supported on one-ended infinite trees. If 0 < p < 1, any rooted tree [S, x] of finite height h is seen around the walker with strictly positive P_p-probability. The moments when BGRW creates a path of length ℓ and does not backtrack by more than half its length are "local regeneration times" in the following precise sense: the sequence of neighborhoods of radius ℓ/2 seen by the walker from that point on is independent of the past.
As it turns out, there are difficulties in establishing the intuitive picture drawn above. Creating paths looks easy enough: all one needs is for the walker to create a new leaf and jump to it several times in a row. However, in order for this to happen often, the walker needs to come across many nodes of low degree. For this to take place, it is necessary that nodes are usually not visited too many times. To prove that returns to a vertex are unlikely to be numerous, one needs to show that the walk moves away fast enough from any vertex. The upshot is that a weaker form of positive speed is needed before we can justify this picture.
Proving that dist_{T_n}(X_n, X_0) grows at least linearly with time for any positive p will thus be the first goal of our proof. In essence, the proof consists in establishing a coupling between the random walk in BGRW and a biased random walk on Z. While this can be readily achieved for p sufficiently large (e.g., for p = 1 see [13], but also for p > 2/3, more generally), establishing such a coupling for all positive p turns out to be surprisingly laborious. We do that by resorting to a grass-roots argument which we believe closely mimics the actual behaviour of the walker in BGRW. Given the preliminary result on the growth of dist_{T_n}(X_n, X_0), we can move on to establishing the convergence of the tree as seen by the walker. The latter result, in particular, implies the existence of a well-defined positive speed. It will be crucial for us that this tree - or rather, the corresponding empirical measure - converges almost surely to an invariant measure P_p for the dynamics on rooted trees, for all "nice" initial conditions. There is a high-level similarity to the work of Lyons, Pemantle and Peres [16] on random walks on Galton-Watson trees, where the existence of a speed relies on ergodic-theoretic arguments in the space T_*. However, in that paper the stationary measure can be described explicitly (leading to an explicit formula for the speed), which is not the case in our setting. Our analysis will also require more quantitative estimates on the convergence.

Organization and main proof steps
The remainder of the paper is organized as follows. In Section 2 we define our process formally. One important conceptual point will be to define it on locally finite trees from the start.
The proof of linear growth of dist_{T_n}(X_n, X_0) begins in Section 3, where we show that, for any positive M, it is very likely that dist_{T_m}(X_m, X_0) ≥ log^M n for some m ≤ n.
This requires comparisons with a simple one-dimensional random walk. The argument continues in Section 4, where we prove that backtracking on a long path, if it happens at all, typically takes a very long time. For this we introduce a simplified "loop process" that only keeps track of the path itself along the way. Combining polylogarithmic distances and long backtracking times, we prove in Section 5 that our process has positive drift away from X_0, and derive some consequences of this fact for the degrees in the tree.
We switch gears for the remainder of the paper, as our arguments become more abstract. In Section 6 we extend the definition of our process to the space of rooted locally finite trees. The statement in Theorem 1.2, on the convergence of the tree as seen by the walker, can then be stated as a weak convergence result for the empirical measure of this process. Tightness and weak convergence criteria are discussed in this section, and tightness is proven right away.
Section 5 begins to study the loss-of-memory mechanism described above, whereby long paths are created and never backtracked on. Since we deal with infinite trees, we need some condition on the initial distribution to ensure that this takes place. We use this in Section 6 to prove that the averages of "local functions" along the trajectories of the process always converge almost surely. This is basically the last ingredient we need to prove stronger versions of Theorems 1.2 and 1.3 in Section 7. One key point is that the limiting measure P p is a stationary distribution for our process.
The paper wraps up with Section 8, with some final comments, and an Appendix containing technical results.
Definition of the model

Preliminaries
All trees in this paper are locally finite (all vertices have finite degree). Given a tree T , we let V (T ) and E(T ) denote its vertex and edge sets. For x, y ∈ V (T ), d T (x) denotes the degree (number of neighbors) of x in T and dist T (x, y) is the shortest-path distance between x and y.
We let Ω denote the set of all pairs (T, x), where T is a locally finite tree with V(T) ⊂ N and N \ V(T) infinite, and x ∈ V(T). This set Ω can be described as a subset of a product space {0, 1}^N × {0, 1}^(N choose 2) × N with a natural σ-field; we omit the details. Given a probability measure µ over some space, we use the symbol U ∼ µ to mean that U is a random element with law µ.

Definition of the process
Let p ∈ (0, 1]. The BGRW process with parameter p is a Markov chain with transition kernel K_p. To define this kernel, given (T, x), we sample (T′, x′) ∼ K_p((T, x), ·) as follows. With probability p, obtain T′ from T by attaching a new leaf (with a label taken from N \ V(T)) to x; otherwise, set T′ := T.
Conditionally on the above, let x′ be a uniformly chosen neighbor of x in T′.
It is easy to see that this does define a valid Markov transition kernel on Ω. We use (T_t, X_t)_{t≥0} to denote a trajectory of K_p, and P_{µ,p} and E_{µ,p} to denote probabilities and expectations when (T_0, X_0) ∼ µ. If µ is a point mass on (S_0, x_0), we replace µ by S_0, x_0 in the subscripts. Remark 2.1. Most of the time we will ignore this formal definition of the process and stick with the informal version presented in the introduction. We will later need to define this process on the set of rooted trees up to isomorphism. See Section 6 for details.
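To make the dynamics concrete, the kernel K_p can be sketched in code. This is a minimal simulation, not taken from the paper; the adjacency-list representation and all names are ours:

```python
import random

def bgrw_step(adj, x, p, next_label, rng=random):
    """One BGRW transition from (T, x): with probability p attach a new
    leaf to the walker's position x, then move to a uniformly chosen
    neighbor of x in the (possibly enlarged) tree."""
    if rng.random() < p:
        adj[x].append(next_label)       # new leaf adjacent to x
        adj[next_label] = [x]
        next_label += 1
    return rng.choice(adj[x]), next_label

# a short trajectory started from a single edge
adj, x, nxt = {0: [1], 1: [0]}, 0, 2
for _ in range(1000):
    x, nxt = bgrw_step(adj, x, 0.5, nxt)
```

Since each accepted coin flip adds exactly one new leaf, the graph remains a tree at every step, matching the informal description in the introduction.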

Nontrivial distance from the root
In this section we show the following property of our process: for any y ∈ T_0 and any constant M > 0, the random walker will most likely reach distance ≥ log^M n from y in at most n steps. Lemma 3.1. For any M > 0 and 0 < p ≤ 1 there exists n_0 = n_0(p, M) ∈ N, depending only on p and M, such that, for all n ≥ n_0, all finite trees T_0 and all x_0, y ∈ T_0,

P_{T_0,x_0,p}( ∃m ≤ n : dist_{T_m}(X_m, y) ≥ log^M n ) ≥ 1 − e^{−n^{1/4}}.
For the proof of this lemma, we need the following definition.

Definition 3.2.
Say that M > 0 is admissible if the conclusion of Lemma 3.1 holds for this specific value of M. That is, M is admissible if, for any 0 < p ≤ 1, there exists n_0 = n_0(p, M) ∈ N, depending only on p and M, such that, for all n ≥ n_0, all finite trees T_0 and all x_0, y ∈ T_0,

P_{T_0,x_0,p}( ∃m ≤ n : dist_{T_m}(X_m, y) ≥ log^M n ) ≥ 1 − e^{−n^{1/4}}.

When M is admissible and x_0 is "far" from y, it is likely that the distance from the walker to y will increase by at least one unit by time n. This probability is large enough that we are likely to see many such increases in a small time window.
Claim 3.5 (Small growth in distance; proof in Subsection 3.2). Assume M ≥ 1/2 is admissible (cf. Definition 3.2). Then there exists n_1(p, M) ∈ N such that, for n ≥ n_1(p, M), the following property holds. Take a finite tree T and x, y ∈ T with dist_T(x, y) ≥ log^M n.
As we will see, the "harder" Proposition 3.4 follows from this claim and the assumption that M is admissible applied to time ≈ √ n instead of n.
Throughout this section we will use the following simple and standard lemma, which we prove in the Appendix for completeness. Lemma 3.6 (Proof in Appendix A). Suppose (I_j)_{j∈N\{0}} are indicator random variables. Assume µ is such that P(I_1 = 1) ≥ µ and, for all j > 1, P(I_j = 1 | I_1, ..., I_{j−1}) ≥ µ.
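The conclusion of the lemma (cut off in this copy) is the standard one: a sequence with these conditional lower bounds stochastically dominates an i.i.d. Bernoulli(µ) sequence, so any increasing event, such as a long run of ones, is at least as likely as under the i.i.d. law. A toy coupling illustrating the mechanism, with a made-up conditional law (every name here is ours):

```python
import random

def coupled_run(n, mu, seed=None):
    """Couple indicators I_j whose conditional success probability is
    always >= mu with i.i.d. Bernoulli(mu) variables B_j, using shared
    uniforms, so that I_j >= B_j pointwise."""
    rng = random.Random(seed)
    I, B = [], []
    ones_so_far = 0
    for j in range(n):
        # toy conditional law: mu plus a boost from past ones, always >= mu
        # (this stands in for the walker's conditional bound in the lemma)
        p_j = min(1.0, mu + 0.3 * ones_so_far / (j + 1))
        u = rng.random()
        i_j = 1 if u < p_j else 0   # indicator with conditional prob p_j >= mu
        b_j = 1 if u < mu else 0    # Bernoulli(mu) driven by the same uniform
        I.append(i_j)
        B.append(b_j)
        ones_so_far += i_j
    return I, B

I, B = coupled_run(1000, 0.4, seed=1)
```

Because p_j ≥ µ, the event {u < µ} is contained in {u < p_j}, so I_j ≥ B_j holds pointwise and every run of ones in (B_j) appears in (I_j) as well.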

The "easy" proposition
Proof of Proposition 3.3. It suffices to show that there exists n_0 = n_0(p) ∈ N such that, for all n ≥ n_0, for any initial tree T_0 and any vertices x_0, y ∈ T_0,

P_{T_0,x_0,p}( ∃t ≤ n : dist_{T_t}(X_t, y) ≥ log^{1/2} n ) ≥ 1 − e^{−n^{1/4}}.

To do this, we define, for each time 1 ≤ t ≤ n, an indicator I_t of the event that the walker's distance to y increases at step t. Notice that we have the following inclusion of events:

{ ∃t ≤ n : dist_{T_t}(X_t, y) ≥ log^{1/2} n } ⊃ { at least log^{1/2} n consecutive 1's in (I_t)_{t=1}^n }.
So we will apply Lemma 3.6. The point is that, by the Markov property,

P( I_t = 1 | I_1, ..., I_{t−1} ) ≥ p/2.    (3.1)

Indeed, the value p/2 is achieved when x is a leaf of T. In that case I_1 = 1 only occurs when a new neighbor is created for x (with probability p) and then the walker jumps to that neighbor (with probability 1/2). If x is not a leaf, then the probability of the latter event is strictly larger than 1/2.
Combining (3.1) with Lemma 3.6, we obtain a lower bound (3.2) on the probability of a long run of 1's. Choosing n_0 sufficiently large, the lower bound (3.2) is larger than 1 − e^{−n^{1/4}} for all n ≥ n_0. For later purposes we point out that n_0(p) is non-increasing as a function of p.

A key claim
We now come to the proof of Claim 3.5, which connects the "easy" and "harder" propositions in the proof.
Proof of Claim 3.5. Consider the vertex y* on the unique path from x to y with dist_T(x, y*) = ⌈log^M n⌉ − 1. We recall that the admissibility of M implies, for all 0 < p ≤ 1, the existence of n_0 depending only on M and p such that

∀n ≥ n_0 : P_{T,x,p}( ∃t ≤ n : dist_{T_t}(X_t, y*) ≥ log^M n ) ≥ 1 − e^{−n^{1/4}}.    (3.3)

Now let F_n (for "failure") denote the event that dist_{T_t}(X_t, y) ≤ dist_T(x, y) for all t ≤ n. Let τ_{y*} be the hitting time of y*:

τ_{y*} := inf{ t ∈ N : X_t = y* } ∈ N ∪ {+∞}.
Define τ_x^{+k}, the k-th return time to x. That is, we set τ_x^{+0} := 0 and then, recursively for k ∈ N\{0}, τ_x^{+k} is the first time after τ_x^{+(k−1)} at which the walker is back at x. Note that, for any k, we may upper bound the probability of F_n by three terms,

P_{T,x,p}(F_n) ≤ P_{T,x,p}( F_n ∩ {τ_{y*} > n} ) + P_{T,x,p}( F_n ∩ {τ_x^{+k} < τ_{y*} ≤ n} ) + P_{T,x,p}( {τ_x^{+k} > τ_{y*}} ∩ {τ_{y*} ≤ n} ),    (3.4)

which we will bound separately.
We start with the first term on the RHS. Note that if τ_{y*} > n, then the unique path from X_t to y passes through y* at all times t ≤ n. In particular, when F_n ∩ {τ_{y*} > n} holds, we must have

∀t ≤ n : dist_{T_t}(X_t, y*) ≤ dist_T(x, y*) < log^M n,

i.e. the event in (3.3) does not hold. Therefore,

P_{T,x,p}( F_n ∩ {τ_{y*} > n} ) ≤ e^{−n^{1/4}}.

(3.5)
We now consider the second term on the RHS of (3.4). In order for F_n ∩ {τ_x^{+k} < τ_{y*} ≤ n} to take place, it must be that X_t returns at least k times to x before visiting y*, but never gets to jump to a neighbor x′ of x with dist(x′, y) = dist(x, y) + 1. Now, at each return to x, the probability that X_t jumps to such a neighbor, conditionally on the process up to that point, is at least p/2: the probability of creating a leaf and then jumping to it, away from y (similarly to what we did in Equation (3.1)). We deduce:

P_{T,x,p}( F_n ∩ {τ_x^{+k} < τ_{y*} ≤ n} ) ≤ (1 − p/2)^k ≤ e^{−(p/2) k}.    (3.6)

Finally, we come to the third term on the RHS of (3.4). Consider the walk X_t at the sequence of time steps that it spends on the path from x to y*, up to the time τ_{y*}.
The resulting process is a simple random walk on the path with potential delays and reflecting barriers: indeed, since our graph is a tree at all times, whenever the random walk leaves the path, it must return to it (if it does return) at the same point that was last visited. Now, in order that τ_x^{+k} > τ_{y*}, it must be that this simple random walk on the path hits y* before returning k times to x. Since the path has length dist_T(x, y*) = ⌈log^M n⌉ − 1, the probability of this happening is at most k/(⌈log^M n⌉ − 1), and we obtain:

P_{T,x,p}( {τ_x^{+k} > τ_{y*}} ∩ {τ_{y*} ≤ n} ) ≤ k/(⌈log^M n⌉ − 1).    (3.7)

Combining the terms in (3.5), (3.6) and (3.7), we obtain a bound in (3.4):

P_{T,x,p}(F_n) ≤ ((log log n)^2 / log^M n) · [ e^{−n^{1/4}} · log^M n/(log log n)^2 + e^{−(p/2)(log log n)^2} · log^M n/(log log n)^2 + log^M n/(⌈log^M n⌉ − 1) ],    (3.8)

where the last expression follows by choosing k := ⌈(log log n)^2⌉. Choosing n_1 = n_1(p, M) sufficiently large in (3.8) such that

e^{−n^{1/4}} · log^M n/(log log n)^2 + e^{−(p/2)(log log n)^2} · log^M n/(log log n)^2 + log^M n/(⌈log^M n⌉ − 1) < 2,

we obtain that, for all n ≥ n_1,

P_{T,x,p}(F_n) ≤ 2 (log log n)^2 / log^M n.    (3.9)

This proves the claim. Note that n_1(p, M) is non-increasing in p.

The "harder" proposition
We now prove Proposition 3.4, thus finishing the proof of Lemma 3.1. We will use two time lengths,

t_* := ⌈log^{M+1/2} n⌉ and ℓ_* := ⌈√n⌉.    (3.10)

Both of these time lengths depend on n, but we omit this dependency from the notation to avoid clutter. The definitions of these quantities will probably seem mysterious, but we comment on them in due time; see Remarks 3.7, 3.8 and 3.9 below.
The fact that M is admissible and ℓ_* → +∞ when n → +∞ implies that, for any 0 < p ≤ 1, there exists n_2(p, M) such that, if n ≥ n_2(p, M), we have the following properties for any finite tree T_0 and any x_0, y ∈ T_0.
2. By Claim 3.5 applied with ℓ_* replacing n, if T is finite and x, y ∈ T with dist_T(x, y) ≥ 2^{−M} log^M n, then the bound (3.12) of that claim holds. Remark 3.7 (ℓ_* is large enough). In the above we used the fact that log ℓ_* = (1/2) log n to guarantee that the two events under consideration have high probability. We will later need that n/(ℓ_* t_*) is large; see Remark 3.8 below.
We now define a sequence of stopping times σ_j and indicators I_j. Intuitively, we will want that dist_{T_{σ_j}}(X_{σ_j}, y) > dist_{T_{σ_{j−1}}}(X_{σ_{j−1}}, y), and we will signal such a success by setting I_j = 1. More formally, for j = 0 we set σ_0 = 0 and I_0 = 0. We define σ_j and I_j for j > 0 with the following two choices.
We thus arrive at the following crucial observation (recall the time lengths t_*, ℓ_* in (3.10)).
Observation 3.1. Assume that there are t_* + 1 consecutive ones in the sequence (I_j). Then the walker reaches distance log^{M+1/2} n from y within n steps. Indeed, if we have t_* + 1 consecutive ones in this sequence, there exists j_0 with (j_0 + t_*) ℓ_* ≤ n and I_{j_0} = I_{j_0+1} = · · · = I_{j_0+t_*} = 1. What we are after is to show that having t_* + 1 consecutive ones is very likely. The upshot is that we may apply Lemma 3.6 above to control the probability that the random walk reaches distance log^{M+1/2} n from y in n time steps, i.e. that M + 1/2 is admissible.
Remark 3.8. Note that for the lemma to be effective we need n/ℓ_* ≫ 1, so that there are "many indicators" to consider, and also that n/(ℓ_* t_*) ≫ 1. It will later become clear that (for t_* polylogarithmic) it suffices that n/ℓ_* ≥ n^{1/4+c+o(1)} for some c > 0.
Crucially, conditions (a) and (b) in the definition of σ_1 and I_1 correspond precisely to the situations in the two bounds (3.11) and (3.12) (respectively). It follows that, for any 0 < p ≤ 1, there exists an n_3 = n_3(p, M) such that, for all n ≥ n_3, Lemma 3.6 implies a lower bound on P_{T_0,x_0,p}( at least t_* + 1 consecutive ones in (I_j) ).
What is left to show is that, for any 0 < p ≤ 1, there exists n_4 = n_4(p, M) such that Equation (3.13) below holds, which would imply, for all n ≥ n_4, a lower bound on P_{T_0,x_0,p}( at least t_* + 1 consecutive ones in (I_j) ). The latter bound, which is uniform in T_0, x_0, together with Observation 3.1, will prove that M + 1/2 is admissible.
Remark 3.9. The important thing here is that, because t_* = ⌈log^{M+1/2} n⌉, the probability of t_* + 1 consecutive ones, i.e. µ_n^{t_*+1}, goes to 0 slowly with n. Remark 3.8 shows we are considering "polynomially many indicators", so such not-too-small probabilities make a long run of ones very likely.
Let us show that, for our choices of t_* and ℓ_*, we can find an n_4 sufficiently large such that Equation (3.13) holds. We point out that we will implicitly assume n_4 > n_3, so as to guarantee that inf_{T,x} P_{T,x,p}(I_1 = 1) ≥ µ_n. It is for this reason that n_4 will depend on p (in particular, n_4 will be non-increasing in p).
Since t_* = ⌈log^{M+1/2} n⌉, for sufficiently large n we have the required bound, where we are using that (1 − b_n/a_n)^{a_n} = e^{−b_n − o(b_n)} for sufficiently large n whenever a_n, b_n → +∞ and b_n = o(a_n). Thus, Equation (3.13) holds for large enough n.

The loop process, or why it's hard to go back
We now pause the discussion of the BGRW process to introduce a simpler process which will help us understand how long the walker X stays on specific subgraphs of the random trees {T_n}_{n∈N}. Roughly speaking, a loop process on an initial graph G is a random walker that, at each step, adds a loop at its current position according to a coin flip and then chooses uniformly one edge at its current position to walk along. In other words, the process is quite similar to the BGRW, but here the walker adds loops instead of leaves, which makes it possible for it to stand still. We will be particularly interested in the loop process over specific graphs, which we define before the process itself. We call a finite graph B a backbone of length ℓ if B is a path of length ℓ having a loop attached to its (ℓ + 1)-th vertex and possibly to its other vertices; see Figure 2. In this section we will abuse graph terminology, saying "degree of a vertex" even though we do not count loops twice. We reserve the special notation deg_t(i) to denote the number of edges attached to vertex i at time t. We call attention to the index t + 1 of B in step (2) of the definition: it means that we may add a new loop in step (1) and then choose it in step (2).
Having defined the loop process, we are interested in the time it takes to go from one end of the backbone to the other. More precisely, we would like to obtain bounds on the stopping time η_0^loop, the hitting time of vertex 0, when the process starts from ℓ. The next lemma gives us some estimates.
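The loop process on a backbone can be sketched as follows. This is our own minimal rendering, assuming the only initial loop sits at the far endpoint ℓ and that a loop counts once when an edge is chosen; the function name and representation are ours:

```python
import random

def loop_process_hitting_time(ell, p, max_steps=10**6, seed=None):
    """Simulate the loop process on a backbone of length ell: vertices
    0..ell on a path, walker starting at ell. Each step, a loop is added
    at the current vertex with probability p; then one incident edge
    (loops counted once) is chosen uniformly. Returns the number of
    steps until vertex 0 is hit, or None if max_steps is exceeded."""
    rng = random.Random(seed)
    loops = [0] * (ell + 1)   # number of loop edges at each vertex
    loops[ell] = 1            # the backbone's loop at its (ell+1)-th vertex
    pos = ell
    for t in range(1, max_steps + 1):
        if rng.random() < p:
            loops[pos] += 1
        nbrs = [v for v in (pos - 1, pos + 1) if 0 <= v <= ell]
        deg = len(nbrs) + loops[pos]
        if rng.random() < len(nbrs) / deg:
            pos = rng.choice(nbrs)   # move along a path edge
        # otherwise a loop was chosen and the walker stands still
        if pos == 0:
            return t
    return None
```

Running this for moderate ℓ and p illustrates the point of the section: loops accumulate at the vertices the walker lingers on, so reaching vertex 0 from ℓ typically takes a very long time.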
Proof of Lemma 4.1. We need some notation and definitions. Consider the sequence of stopping times (τ_k)_k at which the process changes position. Observe that the probability of X^loop_t leaving its current position is at least 1/(t + C) (where C is a constant depending on B_0 only), since the degree of a vertex at time t is at most t plus the number of edges in B_0. This implies that τ_k is finite a.s. for all k. This allows us to define the process Y_k := X^loop_{τ_k}. Note that, by the strong Markov property, {Y_k}_k is a simple random walk on {0, 1, ..., ℓ} with reflecting barriers. Regarding the process {Y_k}_k, we let σ be the hitting time of 0 by Y. We prove the lemma by showing that X^loop spends at least exp{K} steps on the vertex ℓ. More precisely, we prove that the degree of ℓ at time τ_σ is at least exp{K}, w.h.p. To do this, first observe that the degree of a vertex may be written in terms of Y_k and ∆τ_k := τ_{k+1} − τ_k. Also notice that if Y_k = ℓ, then the number of steps X^loop spends on ℓ is exactly ∆τ_k, which stochastically dominates a geometric random variable of parameter 1/deg_{τ_k}(ℓ). To see this bound, consider the random variable which counts the number of steps X^loop spends on ℓ by choosing only the loops which were already attached to ℓ when X^loop arrived there; this random variable is clearly (stochastically) smaller than ∆τ_k and follows a geometric distribution of parameter 1/deg_{τ_k}(ℓ). In particular, the number of loops added to ℓ during such a visit dominates a random variable distributed as Bin(Geo(1/deg_{τ_k}(ℓ)), p). Regarding the random variables Bin(Geo(1/deg_{τ_k}(ℓ)), p), we claim that, for suitable constants ε, q ∈ (0, 1) depending on p only, each such variable is at least εp·deg_{τ_k}(ℓ) with conditional probability at least q. Proof of the claim: to simplify our writing, write d_k := deg_{τ_k}(ℓ) and let F̃_{τ_k} be the σ-algebra generated by G_k and F_{τ_k}. Now, by Chernoff bounds, we can bound the conditional probability that Bin(Geo(1/d_k), p) falls below εp·d_k. Recall that d_k is greater than or equal to 2 for all k. So, taking the conditional expectation with respect to F_{τ_k} in the above inequality yields the bound, which proves the claim.
The above claim tells us that, conditionally on the past, every time {Y_k}_k visits ℓ it has probability at least q of increasing the degree of ℓ by a factor of at least 1 + εp. This indicates that the degree of ℓ must be at least exponential in the number of visits ℓ receives from {Y_k}_k. So, let N_σ(ℓ) be the number of visits made by Y to ℓ before it reaches vertex 0. Since Y is a simple random walk on {0, 1, ..., ℓ}, N_σ(ℓ) follows a geometric distribution of parameter 1/ℓ. Moreover, the random variable W that counts how many times we have successfully multiplied the degree of ℓ by 1 + εp dominates a random variable distributed as Bin(N_σ(ℓ), q). Consequently, by Chernoff bounds, w.h.p. we have deg_{τ_σ}(ℓ) ≥ 2e^K, which implies that τ_σ is at least this amount, finishing the proof.
The following special case of Lemma 4.1 will be particularly useful for our purposes.

Coupling the BGRW and the loop process
In this subsection we construct a coupling of the BGRW and loop processes in such a way that the loop process is always closer to the root than the walker X. For this, let T be a rooted locally finite tree of height at least 2(ℓ + 1), x a vertex such that dist_T(x, root) ≥ ℓ and d_T(x) ≥ 2, and y its ancestor at distance ℓ. Since T is a tree, there exists only one path P connecting x to y. With this in mind, we define a graph operation B which associates to each pair (T, x) and ancestor y satisfying the aforementioned conditions a backbone B(T, y, x) of length ℓ, as follows: 1. delete all vertices of T whose distance from P is at least 2; 2. replace each edge x′y′ ∈ E(T) with x′ ∈ P and y′ ∉ P with a loop edge at x′ (so each edge stemming out of the path becomes a loop).
3. label the vertices on P by their distance from y (so y gets label 0, its neighbor on P gets label 1, and so on). So far, we have shown that the BGRW is capable of reaching long distances - powers of log n - away from the root. Now, we would like to argue that once it has gone so far, it takes a very long time to return. More specifically, if the initial condition is T_0, x_0 and y is an ancestor of x_0, we would like to obtain lower bounds on the stopping time η_y, the hitting time of y by the walker X. We bound η_y from below by comparing it with η_0^loop, whose definition we recall: the hitting time of vertex 0 by the loop process. The next proposition tells us that we may couple the BGRW and the loop process in such a way that η_y is greater than η_0^loop almost surely.
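The operation B can be sketched as follows. This is our own minimal rendering, assuming the tree is given as a dict of neighbor sets; only the loop counts along P matter for the loop process, and deleting the vertices at distance ≥ 2 from P (step 1) does not change those counts, so the sketch skips it:

```python
def backbone(tree, path):
    """Collapse a tree onto the path P (listed from y to x): every edge
    from a path vertex to an off-path neighbor becomes a loop at that
    vertex, and path vertices are relabelled 0..len(path)-1 by their
    distance from y. Returns loops[i] = number of loops at label i."""
    on_path = set(path)
    return [sum(1 for w in tree[v] if w not in on_path) for v in path]

# toy example: path 0-1-2 with one extra leaf at vertex 1, two at vertex 2
tree = {0: {1}, 1: {0, 2, 3}, 2: {1, 4, 5}, 3: {1}, 4: {2}, 5: {2}}
loops = backbone(tree, [0, 1, 2])   # -> [0, 1, 2]
```

Note that the condition d_T(x) ≥ 2 guarantees at least one loop at the last vertex, so the output really is a backbone in the sense defined above.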

Proposition 4.4 (Coupling)
Let T_0 be a rooted locally finite tree, x_0 one of its vertices different from the root, and y an ancestor of x_0. Then there exists a coupling with the properties below. Proof of Proposition 4.4. Let P denote the path connecting x_0 to its ancestor y on T_0 and ℓ its length. Also consider the following sequence of stopping times (ζ_k)_k. In fact, when X^loop_{k−1} = X_{ζ_{k−2}}, we know that X_{ζ_{k−2}} has jumped outside P; thus, since the graph is a tree, the only way for X to come back to P is through the last vertex of P it visited. When instead X^loop_{k−1} ≠ X_{ζ_{k−2}}, we know that X_{ζ_{k−2}} has moved on P, implying that ζ_{k−1} = ζ_{k−2} + 1. Consequently, regardless of the finiteness of ζ_{k−1} - since W_k is independent of the BGRW - B_k is obtained by adding a loop at X^loop_{k−1}, independently of the whole past and with probability p. To define X^loop_k we proceed in the following way. By definition of the above coupling, we obtain that η_y ≥ η_0^loop. The best scenario would be that in which the BGRW only walks over P; in this case the stopping times are equal.
As a straightforward consequence of the above coupling, we restate Corollary 4.3 in terms of the hitting time η y .

Positive drift and its consequences
In this section we show that the BGRW has a positive drift away from the root. In particular, we show that

lim inf_{n→∞} dist_{T_n}(X_n, root)/n > 0 almost surely,

which implies the transience of the walker X. We do that by tracking the distance of X from the root at random times and comparing it to a right-biased simple random walk on Z. Furthermore, this comparison with the right-biased simple random walk allows us to improve the results given in Proposition 3.4 and Corollary 4.5. Specifically, we can now prove that:
• the walker achieves distances of order n in n steps w.h.p.;
• the probability of the walker backtracking over long distances is exponentially small.
Intuitively, the main message of this section is that if we look at the distance of X from the root, properly normalized and at suitable "random times", we see a random walk on the line that dominates a right-biased simple random walk. Let us begin by formally defining what we mean by "random times". For a fixed positive integer r, we define three stopping conditions from a vertex x_0:
1. X increases its distance from x_0 by r;
2. X visits the ancestor y of x_0 at distance r;
3. X walks exp{√r} steps and none of the previous conditions occurs.
We say (1)−(3) occurred from X_m whenever one of the three stopping conditions occurred when we take x_0 to be X_m. We point out that if dist_{T_m}(X_m, root) < r, then stopping condition (2) cannot be attained.
We define our sequence of stopping times σ_k^{(r)} accordingly: σ_{k+1}^{(r)} is the time at which one of (1)−(3) occurs from X_{σ_k^{(r)}}. From the definition of (1)−(3) it follows that, for all k, σ_k^{(r)} is bounded from above by k · exp{√r}. To avoid clutter, when r is fixed we suppress the superscript from the notation σ_k^{(r)}.
Lemma 5.1 (Coupling to the biased random walk). Let T_0 be a rooted locally finite tree and x_0 one of its vertices. For any q ∈ (1/2, 1) there exists r = r(p, q), depending on p and q only, such that the process {dist_{T_{σ_k}}(X_{σ_k}, root)/r}_{k≥0} can be coupled with a simple random walk {S_k}_{k≥0} on Z, whose probability of jumping to the right is equal to q, in such a way that S_k ≤ dist_{T_{σ_k}}(X_{σ_k}, root)/r for all k. Proof of Lemma 5.1. We begin with some notation. To simplify our writing, put d_k := dist_{T_{σ_k}}(X_{σ_k}, root). Note that the process {d_k}_{k≥0} is a non-Markovian random walk on the half line whose increments belong to the interval [−r, r]. Let {U_k}_{k≥0} be a sequence of i.i.d. random variables, independent of the BGRW and uniformly distributed on [0, 1]. Also let F̃_k be the σ-algebra generated by the BGRW process up to time σ_k, and let H_k be the σ-algebra encoding all the information of F̃_k together with all the uniform random variables up to time k.
Regarding the increments of {d_k}_{k≥0}, we claim: Claim 5.2. There exist positive constants r and C, depending on p only, such that for all k,

inf_{T_0,x_0} P_{T_0,x_0,p}( ∆d_{k+1} = r | F̃_k ) ≥ 1 − C/√r.

Proof of the claim: First of all, observe that ∆d_{k+1} = r if, and only if, σ_{k+1} stops because of condition (1). Also, by the strong Markov property, it is enough to prove the claim for σ_1. That said, let us first assume we start from an initial condition (T_0, x_0) with dist_{T_0}(x_0, root) ≥ r. We derive the desired lower bound by proving upper bounds for the probability of the events {∆d_1 = −r} and {|∆d_1| < r}. The former occurs if, and only if, σ_1 stops because of condition (2), and the latter because of condition (3).
Observe that if σ_1 stops because of (2) (which can happen given that dist_{T_0}(x_0, root) ≥ r), then the walker X visited the ancestor y of x_0 at distance r, spending at most exp{√r} steps. So, using Corollary 4.5, we have

P_{T_0,x_0,p}( σ_1 stops because of (2) ) ≤ P_{T_0,x_0,p}( η_y ≤ e^{√r} ) ≤ C/√r.
Note that the above upper bound holds for all possible pairs of a rooted locally finite tree T_0 and one of its vertices x_0 at distance greater than r from the root.
Finally, if X stops because of (3), then X has walked for e^{√r} steps and has neither visited the ancestor y of x_0 at distance r, nor increased its distance from x_0 by r. This is the same as observing a process X̃ on the subtree T_y that in e^{√r} steps does not get to distance 2r from the root y. Applying Proposition 3.4 (page 6) with n = e^{√r} and M = 4, we obtain that there exists r_0 = r_0(p) such that, for all r ≥ r_0, the walker reaches distance log^4(e^{√r}) = r^2 from y within e^{√r} steps with probability at least 1 − exp{−e^{√r/4}}, uniformly over T_0, x_0. Choosing r large enough so that r ≥ r_0, exp{−e^{√r/4}} ≤ C/√r and r^2 > 2r, we conclude that sup_{T_0,x_0} P_{T_0,x_0,p}( σ_1 stops because of (3) ) ≤ exp{−e^{√r/4}}.
To drop the assumption that we start at distance greater than r from the root, just recall that when this is not the case, condition (2) has probability zero and the above upper bound for condition (3) still holds. This implies that inf_{T_0,x_0} P_{T_0,x_0,p}( σ_1 stops because of (1) ) ≥ 1 − C/√r, which, combined with the strong Markov property, proves the claim.
Since we may increase r, we choose it large enough so that 1 − C/√r ≥ q. Now we couple the processes {d_k/r}_{k≥1} and {S_k}_{k≥1} in the following way. Set S_0 := 0 and assume we have defined {S_j}_{j=0}^{k−1} in such a way that it has the distribution of k − 1 steps of a q-right biased random walk. Let Q_{k−1} denote
Q_{k−1} := P_{T_0,x_0,p}(∆d_k = r | F_{k−1}),
and recall from Claim 5.2 that Q_k ≥ q for all k. We then define S_k using an auxiliary uniform random variable U_k on [0, 1], independent of everything else: if ∆d_k = r and U_k ≤ q/Q_{k−1}, we set S_k := S_{k−1} + 1; otherwise we set S_k := S_{k−1} − 1. In words, if at time σ_k the walker X increased its distance from the root by r, then S_k jumps to the right with probability q/Q_{k−1}. In this way, whenever the process {d_k/r}_{k≥0} jumps back (at most one unit), the walk S also jumps back one unit.
Now we show that the process {S_k}_{k≥0} does have the distribution of a q-right biased random walk. We start by checking that the increments are 1 with probability q and −1 with probability 1 − q. Let H_{k−1} be the σ-field generated by F_{k−1} and U_1, . . . , U_{k−1}. By the definition of S we have
P_{T_0,x_0,p}(∆S_k = 1 | H_{k−1}) = Q_{k−1} · (q/Q_{k−1}) = q,   (5.1)
since U_k is independent of H_{k−1} and Q_{k−1} is measurable with respect to H_{k−1}. Moreover, the equality
Q_{k−1} = P_{T_0,x_0,p}(∆d_k = r | F_{k−1}) = P_{T_0,x_0,p}(∆d_k = r | H_{k−1})
holds, since H_{k−1} is F_{k−1} augmented with information independent of the whole process {(T_k, X_k)}_{k≥1}. Equation (5.1) allows us to derive the independence of all the increments of our right biased random walk {S_k}_{k≥1}: for any fixed set of indices k_1 < k_2 < · · · < k_j and any vector (a_1, . . . , a_j) ∈ {−1, 1}^j, the computation above factorizes over the increments, which implies the independence of the increments ∆S_k. So {S_k}_{k≥0} is distributed as a simple random walk on Z with probability q > 1/2 of jumping to the right, starting at or to the left of d_0/r. And, by construction, we have d_k/r ≥ S_k for all k.
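The coupling just constructed can be sketched in code under toy assumptions. The snippet below (all names are ours, not the paper's) replaces the BGRW-driven process {d_k/r} by a generic ±1 process D whose conditional probability Q of stepping right is always at least q, and uses one shared uniform per step, exactly as in the construction above: the biased walk S steps right precisely when the uniform is at most q, which forces D to step right as well.

```python
import random

def coupled_walks(q, n, seed=0):
    """Sketch of the coupling: D is a +-1 process whose conditional
    probability Q of stepping right always satisfies Q >= q (here Q is a
    toy function of the past); S is the q-right-biased simple random walk.
    One shared uniform U drives both: S steps right exactly when U <= q,
    and since q <= Q this forces D to step right too."""
    rng = random.Random(seed)
    D, S = [0], [0]
    for k in range(n):
        # toy conditional probability, always >= q, depending on the past
        Q = min(1.0, q + 0.3 * abs(D[-1]) / (k + 1))
        U = rng.random()
        D.append(D[-1] + (1 if U <= Q else -1))
        S.append(S[-1] + (1 if U <= q else -1))
    return D, S
```

Conditionally on D stepping right (U ≤ Q), S steps right with probability q/Q, matching the construction in the text; unconditionally, the increments of S are i.i.d. ±1 with mean 2q − 1, and D_k − S_k never decreases, which is the domination d_k/r ≥ S_k.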
As a consequence of the coupling above, we prove the ballisticity of the walker.
Proof of Proposition 5.3. Given q > 1/2, let {S_k}_{k≥0} be a q-right biased simple random walk on Z. Lemma 5.1 guarantees the existence of a positive constant r, depending on p and q, such that, for any initial condition of the BGRW, d_k/r ≥ S_k for all k. By the Strong Law of Large Numbers we also have that S_k/k → µ := 2q − 1 > 0 almost surely. This implies that
lim inf_{k→∞} dist_{T_{σ_k}}(X_{σ_k}, root)/k ≥ rµ, a.s.   (5.2)
On the other hand, by the definition of the stopping times σ_k, we have that, for all k, σ_{k+1} − σ_k ≤ e^{√r} holds almost surely, and that the distance from the root changes by at most r over each interval [σ_k, σ_{k+1}]. Together with (5.2), this yields the almost sure linear growth of dist_{T_n}(X_n, root) in n.
Remark 5.4 (A lower bound for the speed). From the proof of Proposition 5.3 one can derive a lower bound for the speed c(p) of the form exp{−C p^{−C}}, for some universal constant C. In our argument, given q > 1/2 we choose r satisfying a condition of the form (5.3) and obtain the lower bound (5.4) for lim inf_n dist_{T_n}(X_n, root)/n. Since q can be chosen arbitrarily close to 1/2 and independently of p, to obtain a lower bound for the velocity as a function of p, one must know how C depends on p. By inspection of the proof of Lemma 5.1, the constant C comes from Corollary 4.3, and it is obtained by a suitable choice of the parameter K there, where c_1 = (4 log(1 + εp))^{−1}, c_2 = 8 c_1/q, q = 1 − e^{−(1−ε)^2 p} e^{−3/2}, and ε is an auxiliary parameter in (0, 1). Choosing q = 2/3, ε = 1/2 and using that (1 − 1/x)^x ≥ e^{−3/2} for all x ≥ 2, one obtains C = 3 c_2, and Equation (5.3) is satisfied with √r = 9 c_2. Finally, replacing this expression for r in Equation (5.4) one obtains a lower bound of the form exp{−C p^{−C}}.

Controlling returns and degrees
The coupling gives us a picture of the dynamics of the walker: it is moving away from the root at linear speed. Before moving on, we use the coupling to show that vertices are not visited many times and, as a result, that degrees in our tree tend to be small.
Lemma 5.5. Let (T_0, x_0) be the initial state of the dynamics and let y ∈ T_0 be given. Then, for any 0 < p ≤ 1, the number v(y) of visits to y,
v(y) := |{0 ≤ t < +∞ : X_t = y}|,
satisfies
∀k ≥ 0 : P_{T_0,x_0,p}(v(y) ≥ k) ≤ C e^{−βk}, and P_{T_0,x_0,p}(v(y) > 0) ≤ C e^{−β dist_{T_0}(x_0,y)},
for some β, C > 0 depending only on p. Moreover, given p_0 > 0, the constants C and β can be chosen uniformly over p_0 ≤ p ≤ 1.
Proof of Lemma 5.5. Recall from the coupling we just constructed that we may choose r large enough so that the process {dist_{T_{σ_k}}(X_{σ_k}, y)/r}_k dominates a right biased random walk {S_k}_k on Z starting from s_0 := dist_{T_0}(x_0, y)/r. We count the visits to y per time interval (σ_{k−1}, σ_k] (with a possible additional visit at time 0), noting that for σ_{k−1} ≤ t ≤ σ_k we have |dist_{T_{σ_k}}(X_{σ_k}, y) − dist_{T_t}(X_t, y)| ≤ r.

As a result, if X_t = y for some σ_{k−1} ≤ t ≤ σ_k, then S_k ≤ dist_{T_{σ_k}}(X_{σ_k}, y)/r ≤ 1. In particular, since each interval (σ_{k−1}, σ_k] contains at most e^{√r} time steps,
v(y) ≤ 1 + e^{√r} |{k : S_k ≤ 1}|.
The RHS is (up to a constant) the number of visits of a right-biased random walk started from S_0 ≥ 0 to the interval (−∞, 1]. This number has an exponential tail that only depends on the bias. The first inequality follows because we can choose r = r(p_0) to guarantee a bias of 1/3 (say) for all p_0 ≤ p ≤ 1. The second inequality also follows (perhaps with changes to C, β) once we realize that S_0 ≥ dist_{T_0}(x_0, y)/r.
Lemma 5.6. Given p_0 > 0, there exist constants C, α > 0 depending only on p_0 such that, for all p_0 ≤ p ≤ 1, all finite (T_0, x_0), and all n, k ≥ 1,
Σ_{t=0}^{n} P_{T_0,x_0,p}(d_{T_t}(X_t) ≥ k) ≤ C e^{α ∆(T_0)} (n + 1) e^{−αk}.
Proof of Lemma 5.6. Assume without loss of generality that V(T_0) = {1, . . . , ℓ}. Also let ℓ + 1, ℓ + 2, . . . be the vertices that the BGRW process started from (T_0, x_0) creates, in order of creation. Finally, we let v_t(i) denote the number of visits to i up to time t, so that d_{T_t}(i) ≤ d_{T_0}(i) + v_t(i) (with the convention d_{T_0}(i) := 1 for i ≥ ℓ + 1), as the degree of i only grows at times when i is visited. Therefore, for all 0 ≤ t ≤ n and k ≥ 1, the event {d_{T_t}(X_t) ≥ k} forces v_t(X_t) ≥ k − max{∆(T_0), 1}. By Lemma 5.5, v_n(i) ≤ v(i) has an exponential tail uniformly in i, n, T_0, x_0 and p ∈ [p_0, 1]. Thus we may take α = β/2 (with β from Lemma 5.5) and adjust C to obtain our result.
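A minimal simulation of the BGRW dynamics makes Lemma 5.5 and Lemma 5.6 concrete: visits to any fixed vertex are few, so degrees stay small while the tree grows and the walker drifts away. The function below is a sketch (the function name and the parameter choices are ours, not the paper's).

```python
import random

def bgrw(p, n_steps, seed=0):
    """Minimal BGRW sketch: start from a single edge {0, 1} with the
    walker at vertex 0.  Before each walker step, with probability p
    attach a new leaf to the walker's current vertex; then move to a
    uniformly chosen neighbour.  Returns (visits to vertex 0, final
    graph distance from vertex 0, number of vertices)."""
    rng = random.Random(seed)
    adj = {0: [1], 1: [0]}
    x, visits = 0, 1          # time 0 counts as a visit to vertex 0
    for _ in range(n_steps):
        if rng.random() < p:                  # growth step
            v = len(adj)
            adj[v] = [x]
            adj[x].append(v)
        x = rng.choice(adj[x])                # walker step
        visits += (x == 0)
    # BFS distance from vertex 0 to the final position of the walker
    dist, frontier, seen = 0, [0], {0}
    while x not in seen:
        frontier = [w for u in frontier for w in adj[u] if w not in seen]
        seen.update(frontier)
        dist += 1
    return visits, dist, len(adj)
```

In line with Lemma 5.5, the number of visits to the start vertex is typically tiny compared with the length of the run, while the walker ends up far away.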

The local point of view on infinite trees
In this section we start our study of the environment as seen by the walker. From now on, we consider the pair (T t , X t ) as random elements of the space of rooted locally finite trees.

Rooted graphs and trees
We recall the definition of rooted graphs, rooted trees and the local topology on these objects. All notions we need are defined and studied in Bordenave's lecture notes [4].
A rooted graph is a pair (G, o) where G is a connected, locally finite graph and o is a vertex of G. Note that any rooted graph must have a countable vertex set, which we may assume to be a subset of N or Z. Two rooted graphs (G, o) and (G′, o′) are equivalent if there is a graph isomorphism from G to G′ mapping o to o′; we write [G, o] for the equivalence class of (G, o) and G_* for the set of all such classes. The set G_* carries the local metric ρ([G, o], [G′, o′]) := 1/(1 + α), where α is the supremum of the radii h ≥ 0 such that the balls of radius h around o and o′ are isomorphic as rooted graphs. One can show that (G_*, ρ) is a Polish metric space. The set T_* ⊂ G_* of rooted trees is the set of equivalence classes [T, o] where T is a locally finite tree. This is a closed subset of G_* and is therefore a Polish space with the metric ρ.
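The local metric ρ can be computed explicitly for finite rooted trees. The sketch below (helper names are ours) encodes the ball of radius h around the root by an AHU-style canonical string, so that two balls are isomorphic as rooted trees exactly when their strings coincide, and then returns ρ = 1/(1 + α), with the radius α capped so the computation terminates.

```python
def ball_code(adj, root, h, parent=None):
    """AHU-style canonical string for the ball of radius h around `root`
    in a tree given as an adjacency dict.  Two balls are isomorphic as
    rooted trees iff their canonical strings are equal."""
    if h == 0:
        return "()"
    kids = sorted(ball_code(adj, w, h - 1, root) for w in adj[root] if w != parent)
    return "(" + "".join(kids) + ")"

def local_distance(t1, r1, t2, r2, hmax=64):
    """Sketch of the local metric rho([G,o],[G',o']) = 1/(1 + alpha), with
    alpha the largest radius at which the rooted balls agree (capped at
    hmax so the computation terminates on isomorphic trees)."""
    alpha = 0
    while alpha < hmax and ball_code(t1, r1, alpha + 1) == ball_code(t2, r2, alpha + 1):
        alpha += 1
    return 1.0 / (1 + alpha)
```

For instance, a path rooted at an endpoint and the same path rooted at an inner vertex already differ at radius 1 (the roots have different degrees), so their local distance is 1.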
Our BGRW dynamics (see Section 2.2) may be naturally extended to the set T_* of rooted trees. The idea is that the state [T_t, X_t] describes the tree "rooted at the position of the walker". With some abuse of notation, we use K_p to denote the Markov transition kernel of our process over this space as well.

Empirical measures and local functions
Much of the remainder of the paper will be spent dealing with the tree as viewed by the walker. More precisely, we will study the empirical measure of the tree around the walker. Definition 6.1. Given a realization [T_t, X_t] of the process K_p, we let P_n denote the empirical measure, that is, the random probability measure over T_* given by:
P_n := (1/(n + 1)) Σ_{t=0}^{n} δ_{[T_t, X_t]}.
Thus, for a given element [T, v] ∈ T_* and h ∈ N, the quantity (n + 1) P_n({[S, u] : [S, u]_h = [T, v]_h}) counts the number of times 0 ≤ t ≤ n at which the ball of radius h around X_t in T_t is isomorphic to [T, v]_h.
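Definition 6.1 suggests a direct experiment: run the chain and record, at each time, the isomorphism class of the h-ball around the walker. The sketch below (names and parameter choices are ours) represents the empirical measure P_n as a dictionary from canonical ball encodings to frequencies.

```python
import random
from collections import Counter

def ball_code(adj, root, h, parent=None):
    """AHU-style canonical string for the rooted ball of radius h."""
    if h == 0:
        return "()"
    kids = sorted(ball_code(adj, w, h - 1, root) for w in adj[root] if w != parent)
    return "(" + "".join(kids) + ")"

def empirical_ball_measure(p, n, h, seed=0):
    """Empirical measure P_n of the h-ball around the walker along a BGRW
    trajectory started from a single edge: a dict mapping isomorphism
    classes (canonical codes) to their frequencies among times 0..n."""
    rng = random.Random(seed)
    adj = {0: [1], 1: [0]}
    x = 0
    counts = Counter()
    for _ in range(n + 1):
        counts[ball_code(adj, x, h)] += 1     # record the current h-ball
        if rng.random() < p:                  # growth step
            v = len(adj)
            adj[v] = [x]
            adj[x].append(v)
        x = rng.choice(adj[x])                # walker step
    return {code: c / (n + 1) for code, c in counts.items()}
```

The resulting dictionary is a finitely supported probability measure on isomorphism classes of h-balls; the convergence results of this section say that, as n grows, these frequencies stabilize.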
In this section we show the convergence of empirical averages of bounded local functions on the space T * under suitable assumptions on [T 0 , x 0 ]. A specific instance of this general result easily implies that the walker has a well-defined speed. In Section 7 we show that P n ⇒ P p almost surely, where P p is an invariant measure for the process K p .
In the proof of such results, local functions will play an important role.

Definition 6.2 (Local function). A function ψ : T_* → R is said to be h-local (for an integer h ≥ 0) if ψ([T, o]) depends only on the ball of radius h around the root, i.e. ψ([T, o]) = ψ([T, o]_h) for all [T, o] ∈ T_*.
To avoid clutter, with a slight abuse of notation, we will write ψ(G, o) instead of ψ([G, o]).
In order to prove the convergence of empirical averages we restrict ourselves to measures on T_* which have some nice properties. To define this class of measures we first define certain stopping times. Given ℓ ∈ N\{0}, let Q_ℓ be the path of length ℓ, i.e. the graph with vertex set {0, 1, . . . , ℓ} and edges {i − 1, i} (1 ≤ i ≤ ℓ). We define τ_ℓ as the first time when the ball of radius ℓ around X_t in T_t is a copy of Q_ℓ rooted at the endpoint ℓ:
τ_ℓ := inf{t ≥ 0 : [T_t, X_t]_ℓ = [Q_ℓ, ℓ]_ℓ}.
Definition 6.3. A measure µ over pairs (T, x) is called p-escapable if τ_ℓ < +∞ P_{µ,p}-almost surely for every ℓ ≥ 1.
We can now state the main theorem of this section. Theorem 6.4. For each parameter 0 < p ≤ 1, each integer h ≥ 1 and every bounded h-local function ψ : T_* → R, there exists a constant M_p ψ such that
lim_{n→+∞} P_n ψ = M_p ψ, P_{µ,p}-almost surely,
for every p-escapable distribution µ on T_*.
As we will see in Section 7, this result implies that "the tree seen by the walker" converges to an infinite random tree when n → +∞.
The proof of Theorem 6.4 relies on the following results. The first lemma shows that the empirical measures of processes started from two p-escapable measures are asymptotically the same, i.e. the initial distribution is "forgotten". The second lemma shows that the empirical average of a local function, when our process starts from a specific class of p-escapable measures, namely δ_{[T_0,x_0]} with T_0 finite, converges to a constant. Lemma 6.5 (Forgetfulness of empirical measures; proof in §6.5). Let µ, ν be two p-escapable measures. Let h be a positive integer and ψ : T_* → R be a bounded h-local function. If there exists a constant M_p ψ such that P_n ψ → M_p ψ, P_{ν,p}-almost surely, then we also have P_n ψ → M_p ψ, P_{µ,p}-almost surely.
The above statement allows us to restrict ourselves to subclasses of initial distributions and then extend the results to the whole class of p-escapable distributions. This procedure greatly reduces the amount of work in our proofs. The next lemma is a finite version of Theorem 6.4. Lemma 6.6 (proof in §6.6). Consider an initial condition (T 0 , x 0 ) with T 0 finite. Fix 0 < p ≤ 1 and a bounded h-local function ψ : T * → R (with h a positive integer). Then there exists a constant M p ψ such that P n ψ → M p ψ, P T0,x0,p -almost surely.
Combining the two lemmas gives us a straightforward proof of Theorem 6.4.
Proof of Theorem 6.4. Let [T_0, x_0] be a finite rooted tree and consider ν = δ_{[T_0,x_0]}. By Lemma 6.6, we have that P_n ψ → M_p ψ, P_{ν,p}-almost surely. But, by Lemma 5.6, ν is p-escapable. Therefore, by Lemma 6.5, P_n ψ → M_p ψ, P_{µ,p}-almost surely. Now that we know how Theorem 6.4 follows from our lemmas, we organize the remainder of this section as follows. In the next subsection we prove Theorem 1.1 as an application of Theorem 6.4. Then, in Subsection 6.4, we discuss the concept of p-escapable trees and give quantitative criteria for escapability. Finally, in Subsections 6.5 and 6.6, we prove Lemma 6.5 and Lemma 6.6, respectively.

Existence of the speed
In this section we prove a stronger version of Theorem 1.1 as an application of Theorem 6.4. We prove that the random walker in the BGRW model has a well-defined positive speed for any p-escapable initial distribution on T_*. This result illustrates how our results regarding the "point of view of the particle" can give finer information about the walk. Theorem 6.7 (Linear speed of the walker). For each 0 < p ≤ 1, there exists a constant 0 < c(p) ≤ p such that, for any p-escapable distribution µ on T_*,
lim_{n→∞} dist_{T_n}(X_n, root)/n = c(p), P_{µ,p}-almost surely.
Proof of Theorem 6.7. Our strategy is to prove that the distance from the root at time n may be written as a sum of a bounded term, a martingale, and a sum of a 1-local function computed along the trajectory. The result will then follow from Theorem 6.4.
Fix a p-escapable measure µ and define a suitable bounded 1-local function ψ so that
dist_{T_n}(X_n, root) = M_n + C_n + Σ_{t=0}^{n−1} ψ(T_t, X_t),
where {M_j}_j is a martingale with bounded increments and C_n is bounded by the total number of visits to the root, which is a.s. finite because our process is transient.
Since the martingale has bounded increments, Azuma's inequality implies that M n /n → 0 almost surely. Since C n is a.s. bounded, C n /n → 0 almost surely as well.
Theorem 6.4 gives us that (1/n) Σ_{t=0}^{n−1} ψ(T_t, X_t) → M_p(ψ), where M_p(ψ) is a constant depending on p and ψ only (but not on µ). Setting c(p) := M_p(ψ), Proposition 5.3 implies c(p) > 0.
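Theorem 6.7 can be probed numerically: the ratio dist(X_n, root)/n should stabilize at some c(p) ∈ (0, p]. The following crude Monte Carlo sketch (our construction, with the start vertex playing the role of the root) tracks depths incrementally, using the fact that in a tree each new leaf sits one level below its parent.

```python
import random

def bgrw_speed(p, n, seed=0):
    """Crude Monte-Carlo estimate of the speed c(p): run the BGRW for n
    steps from a single edge and return dist(X_n, root)/n, where the root
    is the walker's starting vertex 0."""
    rng = random.Random(seed)
    adj = {0: [1], 1: [0]}
    depth = {0: 0, 1: 1}      # graph distance from the root (vertex 0)
    x = 0
    for _ in range(n):
        if rng.random() < p:              # growth step
            v = len(adj)
            adj[v] = [x]
            adj[x].append(v)
            depth[v] = depth[x] + 1       # a leaf is one level below its parent
        x = rng.choice(adj[x])            # walker step
    return depth[x] / n
```

Averaging over several independent runs gives a rough estimate of c(p); the theorem guarantees the limit exists and is strictly positive, although the lower bound obtained in Remark 5.4 is far smaller than what simulations suggest.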

Escapable trees and mixing
The concept of escapable trees is closely related to the forgetfulness of the walker.
A tree T is p-escapable if the measure δ_{(T,x)} is p-escapable for every x ∈ V(T). Note that, by the Markov property, it is enough to show that δ_{(T,x)} is p-escapable for one particular x ∈ V(T). In this section we prove a quantitative criterion for checking "escapability". Let us note that not all trees are p-escapable; the following is a useful example to keep in mind.

Example 6.8 (A hard tree).
Consider an infinite rooted tree T_hard, with a root node x_hard, such that each node at distance h from the root has g(h) > 0 children, with 1/g(h) summable. Consider the BGRW started from (T_hard, x_hard). One can show via the Borel-Cantelli Lemma that there is a positive probability that dist_{T_t}(X_t, x_hard) = t for all t ∈ N (i.e. the walker simply "walks down the tree" for eternity). For the same reason, for any ℓ ∈ N\{0} there is a positive probability that τ_ℓ = +∞.
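The Borel-Cantelli computation behind Example 6.8 can be made numerical. If at depth h the walker has at least g(h) children available and at most two other incident edges (the parent edge and one freshly grown leaf), then it moves deeper with probability at least g(h)/(g(h) + 2), and the product of these bounds over all depths is positive precisely when Σ 1/g(h) < ∞. The sketch below evaluates this lower bound for the illustrative choice g(h) = (h + 1)², which is ours, not the paper's.

```python
import math

def escape_probability_lower_bound(g, n_levels):
    """Lower bound on the probability that the walker on T_hard steps away
    from the root at every step: at depth h it has at least g(h) children
    and at most one parent edge plus one freshly grown leaf, so it moves
    deeper with probability at least g(h) / (g(h) + 2).  Sums logarithms
    for numerical stability; the product converges to a positive limit
    exactly when sum 1/g(h) is finite."""
    log_prod = sum(math.log(g(h) / (g(h) + 2.0)) for h in range(n_levels))
    return math.exp(log_prod)
```

The partial products stabilize at a positive value, giving a positive lower bound for the probability of never backtracking; on that event, dist(X_t, x_hard) = t for all t and τ_ℓ = +∞.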
The next lemma explains one reason why p-escapability is important to us. The lemma gives a quantitative criterion for "escapability". In a way, it says that what makes the tree in Example 6.8 non-escapable is that X_t sees very large degrees along the way. Lemma 6.9 (Quantitative escapability). Assume µ is a starting measure for which there exist C, D, α > 0 such that, for all n, k ≥ 1,
Σ_{t=0}^{n} P_{µ,p}(d_{T_t}(X_t) ≥ k) ≤ C (n + D) e^{−αk}.   (6.1)
Then, for every ℓ ≥ 1, E_{µ,p}[τ_ℓ ∧ n] satisfies the upper bound (6.2) proved below. Moreover, the following event holds with probability ≥ 1 − 1/n^{16}: for all times 1 ≤ t ≤ n, the degree bound (6.3) holds. In particular, all measures µ = δ_{(T,x)}, with T a finite tree, satisfy the above inequalities, with C ≤ e^{α ∆(T)}, by virtue of Lemma 5.6.
So Lemma 6.9 guarantees not only that µ is p-escapable, but also that, with high probability, each intermediate state [T_t, X_t] satisfies a quantitative escapability property. One important point is that the assumptions of Lemma 6.9 always apply to finite initial trees, by virtue of Lemma 5.6.
Proof of Lemma 6.9. We first prove inequality (6.2) for E_{µ,p}[τ_ℓ ∧ n], noting that it implies p-escapability because, assuming it, we can let n → +∞ and conclude that τ_ℓ < +∞ almost surely. Fix some ∆ > 0 and let η_∆ be the first time the walker "sees" a node with degree ≥ ∆. By the definition of the stopping time η_∆, all degrees seen by the walker before time η_∆ are smaller than ∆. Using this observation and proceeding by induction, we obtain an upper bound for E_{µ,p}[τ_ℓ ∧ n ∧ η_∆], and consequently a bound for E_{µ,p}[τ_ℓ ∧ n] with an extra term accounting for the event {η_∆ ≤ n}; this extra term is controlled by our condition (6.1). We may choose ∆ = (2 log(n + 1) + log C)/α to finish the proof of (6.2).
To prove the last statement in the lemma, we go back to (6.5) and note that this was the only step in that proof where we used (6.1). In particular, we can go back to (6.4) and obtain a bound valid for any ∆. In particular, if ∆ is given and we consider the event G(∆) that the walker sees no vertex of degree ≥ ∆ up to time n, then, when G(∆) holds, the bound (6.3) is satisfied for each 0 ≤ t ≤ n. To finish, we show that P_{µ,p}(G(∆)) ≥ 1 − n^{−16} for an appropriate choice of ∆. Indeed, combining a union bound, Markov's inequality, the simple Markov property for 0 ≤ t ≤ n, and condition (6.1), taking
∆ := (log(2n + D) + 17 log n + log C)/α
guarantees P_{µ,p}(G(∆)) ≥ 1 − n^{−16}. Plugging this choice of ∆ into (6.6) gives (6.3) inside the event G(∆).

Forgetfulness of empirical measures: proof of Lemma 6.5
The proof of the lemma will follow as a consequence of another result that highlights the relation between "escapability" and forgetfulness. As we will see, it implies an approximate renewal property of the BGRW process, in the sense that after a random time it may be coupled with a BGRW process starting from a semi-infinite path. Both walkers then see the same tree structure around them for as long as the coupling lasts.
For this purpose, we will need additional notation. Let Q_* be a semi-infinite path with vertices v_0, v_1, v_2, . . ., rooted at v_0,   (6.7)
and consider a trajectory (T*_t, X*_t)_{t≥0} of BGRW started from (Q_*, v_0), with empirical measures P*_n. Also, for each pair of integers h and n, let π(h, n) be
π(h, n) := P_{Q_*,v_0,p}(∃m ≤ n : X*_m = v_h).
(6.8)
Lemma 6.10. Let ψ : T_* → R be h-local and bounded, and let µ be a p-escapable distribution. Consider a trajectory (T_t, X_t)_{t≥0} of BGRW starting from µ, with empirical measures P_n. Then: (a) for every ε ∈ (0, 1), there exists a coupling of the processes (T_t, X_t)_{t≥0} and (T*_t, X*_t)_{t≥0} under which, with probability at least 1 − ε, the averages P_n ψ and P*_n ψ have the same limit behavior; (b) in expectation, |E P_n ψ − E P*_n ψ| is controlled by an error term involving π(h, n) and ‖ψ‖_∞.
Proof of Lemma 6.10. Part (a). Take ℓ ∈ N bigger than 2h. Consider a chain started from measure µ. Since µ is p-escapable, the stopping time τ_ℓ is finite P_{µ,p}-almost surely. Let y_0 = X_{τ_ℓ}, y_1, . . . , y_ℓ be the vertices of the path occupied by the walker at time τ_ℓ, at distances 0, 1, 2, . . . , ℓ from X_{τ_ℓ}. Define a random time τ_{ℓ,h} as the first time after τ_ℓ at which the walker comes within distance h of y_ℓ. Notice that this time may well be infinite. In fact, since dist(X_{τ_ℓ}, y_{ℓ−h}) = ℓ − h, the second statement in Lemma 5.5, combined with the strong Markov property and the p-escapability of µ, implies that τ_{ℓ,h} = +∞ with probability at least 1 − C e^{−β(ℓ−h)}. Note that τ_ℓ is the time X_t is at the tip of a path of length ℓ, and τ_ℓ + τ_{ℓ,h} is the first time at which X_t comes within distance h of the other end of that path. Now consider the process (T*_t, X*_t)_{t≥0} and define the analogous time τ*_{ℓ,h}. This time may also be infinite. It is easy to see that one can couple (T_{τ_ℓ+t}, X_{τ_ℓ+t})_{t≥0} and (T*_t, X*_t)_{t≥0} so that the trees seen by the two walkers agree up to the respective times τ_{ℓ,h} and τ*_{ℓ,h}. Under this coupling, if τ_{ℓ,h} > n and τ_ℓ ≤ n, then P_n ψ − P*_n ψ reduces to boundary terms of the form
(ψ(T_i, X_i) − ψ(T_{n−τ_ℓ∧n+i+1}, X_{n−τ_ℓ∧n+i+1})).   (6.9)
Since τ_ℓ < +∞ almost surely and |P_n ψ| ≤ ‖ψ‖_∞, this implies that, under the coupling, P_n ψ − P*_n ψ → 0 on the event {τ_{ℓ,h} = +∞}. So, for any fixed ℓ, the coupling succeeds with probability at least that of {τ_{ℓ,h} = +∞}; in particular, this can be made greater than 1 − ε with an appropriate choice of ℓ. This proves part (a). Part (b). Put ℓ = 2h. From Equation (6.9) and the fact that under the coupling τ_{ℓ,h} = τ*_{ℓ,h}, we obtain
P_n ψ − P*_n ψ = (P_n ψ − P*_n ψ) 1{τ_{ℓ,h} ≤ n} + (P_n ψ − P*_n ψ) 1{τ_{ℓ,h} > n};
taking expected values on both sides proves part (b).
Now we see how Lemma 6.5 follows immediately from the above result.
Proof of Lemma 6.5. From part (a) of Lemma 6.10, one sees that, given a deterministic c ∈ R, P µ,p ( P n ψ → c) = 1 ⇔ P Q * ,v0,p ( P * n ψ → c) = 1. This holds for any p-escapable measure µ. In particular, if ν is also p-escapable the same must hold with ν replacing µ.

Convergence of average of local functions: proof of Lemma 6.6
The proof of Lemma 6.6 follows essentially from the lemma below, which states that, when started from a finite initial condition, the average of a local function is close to its expected value when started from the semi-infinite path Q_* introduced in (6.7). Lemma 6.11. Given p_0 > 0, there exist constants C, α > 0 depending only on p_0 such that, for all p_0 ≤ p ≤ 1, all finite (T_0, x_0), all h-local ψ with −1 ≤ ψ ≤ 1, all ε > 0, and all 1 ≤ q < n, the average P_n ψ deviates from E_{Q_*,v_0,p} P_q ψ by more than ε + err(q, h) with probability at most 2 exp{−ε²k/32} + q^{−16}, where k := ⌊n/q⌋ (cf. (6.10)).
Proof of Lemma 6.11. Let n = kq + s, with k ≥ 1 and 0 ≤ s < q. Observe that, for every k, q ≥ 1, we can decompose P_n ψ into the block averages
S_j := (1/q) Σ_{i=(j−1)q}^{jq−1} ψ(T_i, X_i), for j ∈ {1, . . . , k},
plus a remainder coming from the last, incomplete block. Let {F_j}_{j≥0} denote the filtration generated by the process up to the end of the corresponding block. By the definition of S_j, the process {S_j}_{j≥1} is adapted to the filtration {F_j}_{j≥0}. Moreover, the process {M_k}_{k≥1} defined by
M_k := Σ_{j=1}^{k} (S_j − E[S_j | F_{j−1}])
is a mean zero martingale with respect to {F_j}_{j≥0}, whose increments are bounded by 2, since each S_j is an average of random variables bounded by 1.
With all these definitions, by the triangle inequality, the deviation splits into three terms (a), (b) and (c). As far as (a) is concerned, a deterministic bound holds, since, by hypothesis, ‖ψ‖_∞ ≤ 1 and the last, incomplete block has fewer than q terms. Thus, to prove the claim, it suffices to show that
P_{T_0,x_0,p}((b) + (c) ≥ ε + err(q, h)) ≤ 2 exp{−ε²k/32} + q^{−16}.   (6.10)
We prove Equation (6.10) by bounding (b) and (c) separately. (1) To bound (b), we observe that (b) is equal to |M_k/k|; since the martingale M_k has increments bounded by 2, Azuma's inequality gives the first term on the right-hand side of (6.10). (2) To bound (c), we first observe that, by the simple Markov property, for all j ∈ {2, . . . , k}, the conditional expectation E[S_j | F_{j−1}] is the expectation of a block average started from the state reached at the end of block j − 1. Then, by part (b) of Lemma 6.10, each such expectation is within err(q, h) of E_{Q_*,v_0,p} P_q ψ, outside a bad event. Therefore, it is enough to show that this bad event happens with probability at most q^{−16}.
By Lemma 6.9 (using the constants coming from Lemma 5.6) we know that, with B denoting the bad event above, P_{T_0,x_0,p}(B) ≤ 1/q^{16}, which concludes the proof.
The next step towards the proof of Lemma 6.6 is to derive from the above lemma three small results which, combined, will give us the proof. The first corollary is a more convenient version of Lemma 6.11. Corollary 6.12. Given p_0 > 0 and (T_0, x_0) finite, there exist n_0 and a constant c, depending only on p_0 and T_0, such that, for all n ≥ n_0 and all p ∈ [p_0, 1], we can find values of h and q such that, for every h-local ψ with −1 ≤ ψ ≤ 1,
P_{T_0,x_0,p}(|P_n ψ − E_{Q_*,v_0,p} P_q ψ| > c/(2 log n)) ≤ 2/n^8.
Proof of Corollary 6.12. We use Lemma 6.11, suitably choosing the values of q, h and ε.
We begin by observing that, by Lemma 5.5, with this choice we have π(h, q) ≤ C_1/log n.
Given that the constants C, C_1, α depend only on p_0, and that T_0 is finite, there exist a constant c and a sufficiently large n_0, both depending only on p_0 and T_0, such that err(q, h) ≤ c/(6 log n).
Proof of Corollary 6.13. By a union bound, it suffices to show that there exists n_0 such that, for all n ≥ n_0 and all m, m′ ∈ {n, n + 1, . . . , n²},
P_{T_0,x_0,p}(|P_m ψ − P_{m′} ψ| > c/log n) ≤ 4/n^8.
Proof of Corollary 6.14. Corollary 6.13, together with a Borel-Cantelli argument, implies that, P_{T_0,x_0,p}-almost surely, there exists n_0 such that, for all n ≥ n_0 and all m, m′ ∈ {n, n + 1, . . . , n²}, we have |P_m ψ − P_{m′} ψ| ≤ c/log n. For any δ > 0, we can choose n_1 large enough so that c/log n_1 < δ/2 and n_1 ≥ n_0. Now, for m, m′ ≥ n_1, assuming m ≤ m′, there exists k_0 such that m′ ∈ {m^{2^{k_0}}, . . . , m^{2^{k_0+1}}}.

Thus,
Now we are able to prove Lemma 6.6.
Proof of Lemma 6.6. From Corollary 6.14 we have that {P_n ψ}_{n∈N} converges P_{T_0,x_0,p}-almost surely. Thus, in order to prove the result, we first prove that the sequence {P_n ψ}_{n∈N}, started from any finite initial condition, always converges to the same limit.
Finally, we prove that this limit must be a constant.
Consider two independent BGRW processes (T_t, X_t)_{t∈N} and (T′_t, X′_t)_{t∈N} starting from two finite initial conditions (T_0, x_0) and (T′_0, x′_0), respectively. Let {P_n ψ}_{n∈N} and {P′_n ψ}_{n∈N} be the averages associated to each process. The proof follows from (6.6) and the triangle inequality, which combined give a bound with some positive constant c depending only on p, T_0 and T′_0. Then an application of the Borel-Cantelli Lemma allows us to conclude that both sequences converge to the same limit M_p ψ. Once we have that, we finally observe that, since the BGRW processes are independent, the events {M_p ψ ∈ A} and {M_p ψ ∈ B} are independent for all Borel sets A, B of the real line. This implies that M_p ψ is independent of itself, and thus constant almost surely.

Weak convergence of empirical measures
In this section we prove Theorem 1.2 and Theorem 1.3 from the Introduction. In fact, we prove slightly stronger statements that only require the initial measure to be p-escapable, which in particular applies to all initial measures supported on finite trees (cf. Lemma 6.9). Specifically, we will show that, for any p-escapable initial measure µ, there exists a measure P_p on the space T_* such that, when n → +∞, both the empirical measures and the Cesàro averages of the laws of the chain converge weakly to P_p. For h ∈ N, let D_h(T, o) denote the maximal degree among the vertices at distance h from the root o; one can check that D_h : T_* → R is (h + 1)-local. Proposition 7.1 below provides (1) a tightness criterion in terms of the functions D_h, and (2) a countable family S of bounded local functions such that, if lim_n Q_n f exists for all f ∈ S, then Q_n ⇒ Q for some Q that is uniquely determined by these limits. We begin by addressing tightness for {P_n}_{n∈N}. The next lemma shows that Cesàro-mean-type sequences of measures generated by our Markov chain are tight when the initial measure satisfies the quantitative criterion of p-escapability defined in Lemma 6.9.
Lemma 7.2. Assume µ is a measure over T_* with the following property: there exist C, α, D > 0 such that, for all n, k ≥ 1,
Σ_{t=0}^{n} P_{µ,p}(d_{T_t}(X_t) ≥ k) ≤ C (n + D) e^{−αk}.   (7.1)
Then the sequence of Cesàro averages satisfies the tightness criterion in Proposition 7.1. In fact, the following quantitative estimate holds: for all h, n, k ≥ 1,
Σ_{t=0}^{n} P_{µ,p}(D_h(T_t, X_t) ≥ k) ≤ C_h (n + D + h) e^{−αk},
where C_h depends only on h, p and C.
Proof of Lemma 7.2. It suffices to prove the quantitative estimate above, which is true for h = 0 by our assumption (7.1). Consider h > 0 and assume that we have proven that, for each j ≤ h − 1, there exists C_j > 0, depending only on j and p, and an exponent α as in the base case, such that
(induction hyp.) ∀m ∈ N : Σ_{t=0}^{m} P_{µ,p}(D_j(T_t, X_t) ≥ k) ≤ C_j (m + D + j) e^{−αk} for all k ≥ 1
(again, this is true for j = 0 with C_0 = C). We claim that a similar statement holds for j = h with C_h = C_0 + kC_{h−1}; a simple induction would then imply our goal. To obtain our bound, we split according to whether d_{T_t}(X_t) ≥ k. The first term in the RHS is ≤ C_0 (n + D) e^{−αk} by the base case. For the second term, we observe that D_h(T_t, X_t) ≥ k, i.e. some vertex at distance h from X_t has degree ≥ k, means that D_{h−1}(T_t, v) ≥ k for one of the neighbors v of X_t in T_t. Now, if d_{T_t}(X_t) ≤ k, there is a chance of at least 1/(k + 1) that X_{t+1} = v, and thus D_{h−1}(T_{t+1}, X_{t+1}) ≥ k. Combining these observations with the induction hypothesis concludes the induction step.
In the next result, a strengthening of Theorem 1.2, we prove convergence of the empirical measures P_n and Cesàro-style convergence of the chain K_p from any initial measure that is p-escapable. We also characterize the limiting probability measure of the chain as a stationary distribution for K_p. Theorem 7.3 (Convergence of the empirical measure; proof in §7.1). Given p ∈ (0, 1], there exists a probability measure P_p on the space T_* of rooted trees such that, for any p-escapable initial measure µ,
P_n ⇒ P_p, P_{µ,p}-almost surely, and (1/(n + 1)) Σ_{t=0}^{n} µK_p^t ⇒ P_p when n → +∞.
For 0 < p 0 ≤ p ≤ 1, the measure P p satisfies: where C, α > 0 only depend on p 0 . Moreover, P p is an invariant measure for K p .
Note that the measure P_p itself is p-escapable, by Lemma 6.9 combined with (7.2). In fact, P_p is the unique p-escapable invariant measure for K_p. On the other hand, the hard tree in Example 6.8 shows that one does not have convergence to P_p from all initial measures.
Our next result extends Theorem 1.3 from the Introduction.
Theorem 7.4 (Support of P_p; proof in §7.2). Given 0 < p ≤ 1, let P_p be the limit measure in Theorem 7.3. Then P_p is supported on one-ended infinite trees. Moreover, if 0 < p < 1, then, given a finite rooted tree (S_0, x_0) of diameter ≤ h, the set of trees whose h-ball around the root agrees with (S_0, x_0) has positive P_p-measure. In particular, this shows that, if µ is such that P_n ⇒ P_p, P_{µ,p}-a.s., then P_{µ,p}(τ_ℓ < +∞) = 1 for every ℓ, i.e. µ is p-escapable. In this sense, p-escapability is a necessary condition in Theorem 7.3.

Weak convergence of empirical measures: proof of Theorem 7.3
We present here the proof of convergence to the stationary measure.
We claim that the sequences of Cesàro averages and of empirical measures P_n are tight (almost surely, in the second case). More specifically, we will apply the criterion in Proposition 7.1, part (1). In fact, fixing k, h ∈ N, we may take a suitable bounded (h + 1)-local function φ_{k,h}, built from D_h and k, so that the Cesàro averages of φ_{k,h} converge to M_p φ_{k,h}, and P_n φ_{k,h} → M_p φ_{k,h}, P_{µ,p}-a.s. Now, to evaluate the limit in the RHS, we replace the initial measure µ with a measure µ_0 supported on a finite tree with two vertices. Applying Lemma 7.2 in conjunction with Lemma 5.6, and letting n → +∞, we obtain an upper bound whose RHS goes to 0 as k → +∞, which gives the required tightness. We now prove that the weak convergence criterion of Proposition 7.1 (part (2)) is also satisfied, which asks for the convergence of expectations of all bounded local functions.
For the Cesàro averages this is immediate from (7.3). For P_n there is the slight issue that we have: for every bounded local ψ, P_{µ,p}(P_n ψ → M_p(ψ)) = 1, and we now need to "move the quantifier ∀ inside the probability". However, Proposition 7.1 shows that we only need to worry about a countable family of ψ, and there is no problem in moving the quantifier inside for a countable family. The upshot is that Proposition 7.1 assures that both sequences converge weakly (almost surely, in the second case) to the same probability measure P_p, which is uniquely characterized by the fact that P_p ψ = M_p ψ. Finally, we show that P_p is an invariant measure for K_p. To this aim, we need the following fact, assuring that if ψ is bounded Lipschitz, so is K_p(ψ). 1. X_n and v_ℓ lie on "opposite sides" of v_{ℓ−s}, so dist_{T_n}(X_n, v_ℓ) ≥ dist_{T_n}(v_{ℓ−s}, v_ℓ) = s. 2. The subtree consisting of v_{ℓ−s} and all nodes on the same side as X_n is finite, as it consists of the v_i for 0 ≤ i ≤ ℓ − s plus the new nodes added between times τ_ℓ and n. 3. The subtree consisting of v_{ℓ−s} and all nodes on the same side as v_ℓ is a.s. infinite, as it contains the tree T_0 present at time 0.
Combining these properties, we see that all long enough paths in T_n that start from X_n must pass through v_ℓ, which lies at distance at least s from X_n. This implies [T_n, X_n] ∈ O(s), since these same paths must pass through the unique vertex at distance s from X_n that lies on the path from X_n to v_ℓ. We have thus shown that if τ_ℓ ≤ n < τ_ℓ + τ_{ℓ,s}, then [T_n, X_n] ∈ O(s), as claimed.
We deduce: arguing as in the proof of Lemma 6.5, the probability that τ_{ℓ,s} < +∞ can be made at most ε/2 if ℓ is large enough. Since P_p is p-escapable, we may ensure that P_{P_p,p}(τ_ℓ > n) ≤ ε/2 by taking n large enough (in terms of ℓ). So for some choice of ℓ, n we obtain the desired inequality: P_{P_p,p}(τ_ℓ ≤ n < τ_ℓ + τ_{ℓ,s}) ≥ 1 − ε.

Final comments
Our analysis leaves open many problems about the BGRW process and its variants; we briefly discuss some of these. One natural family of variants makes the growth probability time-dependent: how fast can p_n decay while the walker remains transient? Clearly, our arguments break down in this regime.

Problem 8.3. Consider models with cycles.
A "cheap" way to create cycles would be to connect the current vertex to some uniformly random vertex at distance K, with K a random variable. If K has light tails, it should still be possible to use the local topology to study the structure of our model.

A A simple lemma on dependent indicators
We prove here Lemma 3.6.
Note that k consecutive indices j appear in each F_r, so:
P(at least k consecutive 1's in the sequence (I_j)_{j=1}^{m}) ≥ P(∪_{r=1}^{⌊m/k⌋} F_r).
Moreover, the sets of indices j involved in the events F_r, F_{r′} are disjoint for r ≠ r′.
With this in mind, one may easily show (via our assumptions) that P(F_1) ≥ µ^k and P(F_r | F_1^c ∩ · · · ∩ F_{r−1}^c) ≥ µ^k, and the lemma follows.
Then {P_n}_{n∈N} is tight.
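The conclusion of the appendix lemma can be checked by simulation. The sketch below (our construction) generates indicators whose conditional probability of success always stays at least µ but depends on the past, and compares the empirical frequency of a run of k consecutive 1's against the lower bound 1 − (1 − µ^k)^{⌊m/k⌋} implied by the argument above.

```python
import random

def longest_run_probability(mu, k, m, trials, seed=0):
    """Monte-Carlo check of the appendix lemma: generate dependent
    indicators with P(I_j = 1 | past) >= mu (here the conditional
    probability is boosted after a success, but never drops below mu)
    and estimate the chance of at least k consecutive 1's among m."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        run = best = prev = 0
        for _ in range(m):
            q = min(1.0, mu + 0.2 * prev)   # dependent, but always >= mu
            prev = 1 if rng.random() < q else 0
            run = run + 1 if prev else 0
            best = max(best, run)
        hits += best >= k
    return hits / trials

def lemma_lower_bound(mu, k, m):
    """The bound proved above: 1 - (1 - mu**k) ** (m // k)."""
    return 1 - (1 - mu ** k) ** (m // k)
```

Up to Monte Carlo noise, the empirical frequency should dominate the bound, since the bound holds for any dependence structure satisfying the conditional-probability assumption.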
Notice that, no matter what sequence we choose, we get:
∀n ∈ N : 1 − P_n(K({k_r}_{r=1}^{+∞})) = P_n(∪_r (K^{(r)}_{k_r})^c).
So it suffices to show that for each r ∈ N we can choose k_r ∈ N with sup_n P_n((K^{(r)}_{k_r})^c) ≤ ε/2^r. To do this, fix some r. By our assumption that lim_{k→+∞} lim inf_n P_n(F^{(r)}_k) = 1, we can find k*_r ∈ N and n_0 ∈ N such that ∀n ≥ n_0 : P_n(F^{(r)}_{k*_r}) ≥ 1 − ε/2^r.
On the other hand, for n < n_0 we still have P_n(F^{(r)}_k) ↗ 1 as k → +∞, due to assumption 1.
Since there are only finitely many n < n_0, we can find k**_r such that ∀n < n_0 : P_n(F^{(r)}_{k**_r}) ≥ 1 − ε/2^r. We may now choose k_r := max{k*_r, k**_r}. Since the sets F^{(r)}_k are nested, we see at once that ∀n ∈ N : P_n(F^{(r)}_{k_r}) ≥ 1 − ε/2^r, which is the property we needed. Then there exists a countable family A of bounded Lipschitz functions from M to R, which depends only on {K_k}_{k∈N}, such that, if lim_n P_n f exists for all f ∈ A, then P_n ⇒ P for some probability measure P. In that case, P is uniquely characterized by the fact that P f = lim_n P_n f for all f ∈ A.

B.1.2 Criterion for convergence
Proof of Proposition B.2. It is known that a tight sequence {P_n}_{n∈N} converges weakly if and only if lim_n P_n f exists for each function in the set A_0 := {f : M → R : ‖f‖_∞ ≤ 1 and ‖f‖_Lip ≤ 1}.
In that case, P f is uniquely defined by the fact that P f = lim_n P_n f for all f ∈ A_0. Now consider a positive rational ε. The fact that {P_n}_{n∈N} is tight implies that there exists a compact set K_k ⊂ M, with k = k(ε), such that P_n(K_{k(ε)}) ≥ 1 − ε for all large enough n. For each such set, the functions in A_0 restricted to K_{k(ε)} form a bounded equicontinuous family. Thus the Arzelà-Ascoli Theorem implies that there exists a countable subset A_ε ⊂ A_0 with
∀δ > 0 ∀f ∈ A_0 ∃f_{ε,δ} ∈ A_ε : sup_{x∈K_{k(ε)}} |f(x) − f_{ε,δ}(x)| ≤ δ.
The union A of the families A_ε over positive rationals ε is the set we are looking for. For assume that lim_n P_n f exists for each f ∈ A. For each f ∈ A_0, one can choose f_{ε,δ} ∈ A as in (B.1) and obtain:
|P_n(f − f_{ε,δ})| ≤ |P_n 1_{K_{k(ε)}}(f − f_{ε,δ})| + 2 P_n(K^c_{k(ε)}) ≤ δ + 2ε for large enough n.
Since lim_n P_n f_{ε,δ} exists, it follows that lim_n P_n f exists for all f ∈ A_0. So P_n ⇒ P. Passing to limits, we obtain from the above estimates that |P(f − f_{ε,δ})| ≤ 2ε + δ.
In particular, P is uniquely defined by the values of P f = lim n P n f for f ∈ A 0 , which are uniquely specified by the numbers P f = lim n P n f for f ∈ A.

B.2 Application to local topology over rooted graphs
The general theory of the previous section will now be applied to the space T_* of rooted, locally finite trees. We restrict the family of functions provided by Proposition B.2 accordingly, which is still countable, and obtain the desired result.