Scaling limit of linearly edge-reinforced random walks on critical Galton-Watson trees

We prove an invariance principle for linearly edge reinforced random walks on $\gamma$-stable critical Galton-Watson trees, where $\gamma \in (1,2]$ and where the edge joining $x$ to its parent has rescaled initial weight $d(\rho, x)^{\alpha}$ for some $\alpha \leq 1$. This corresponds to the recurrent regime of initial weights. We then establish fine asymptotics for the limit process. In the transient regime, we also give an upper bound on the random walk displacement in the discrete setting, showing that the edge reinforced random walk never has positive speed, even when the initial edge weights are strongly biased away from the root.


Introduction
Linearly edge-reinforced random walks (LERRW) are a classical model of self-interacting processes introduced by Coppersmith and Diaconis in 1986 [CD86] (for a generalisation, we refer to the recent work [BST21]). Given a rooted graph G endowed with initial edge weights and a reinforcement parameter ∆ > 0, the model is defined as follows. The process (X_n)_{n≥0} on G is started at the root and, given X_n = v, the next edge traversed by X is chosen from the edges incident to v with probability proportional to their weights. After X has crossed the chosen edge, its weight is increased by ∆, and the process repeats with the updated edge weights.
The non-Markovian nature of the LERRW makes it difficult to analyse. However, a remarkable result of Diaconis and Freedman [DF80] (for the recurrent case) and then Merkl and Rolles [MR07] (extension to the transient case) shows that the LERRW can be represented as a random walk in random environment (RWRE). This makes LERRW more tractable and for example leads to applications in Bayesian statistics [DR06,BCP09,Bac11,BPFT16]. On trees this RWRE representation can be understood by noticing that the weights of the edges incident to each of the vertices evolve according to a Pólya urn model, independently for each vertex, see [Pem88]. We will make this connection precise in Section 3.2.
The aim of this paper is to study LERRW on critical Galton-Watson trees by constructing its scaling limit for a range of initial weights and obtaining almost sure asymptotics for the limiting diffusion. Since critical Galton-Watson trees have a rich geometry, in particular having fractal properties, unbounded degrees, and non-uniform volume growth, we hope that these results are interesting in their own right as well as offering insight into the behaviour of LERRW on related critical random graphs such as uniform planar triangulations, high dimensional uniform spanning trees and critical percolation clusters.
Our first result, an invariance principle, extends an earlier work of Lupu, Sabot and Tarrès, who constructed the scaling limit of LERRW on Z as a diffusion on R [LST20]. In the Galton-Watson tree setting, in the special case where the initial weights are constant and identical for all edges and the critical Galton-Watson tree has finite variance, the scaling limit was previously constructed in [And21].
We assume that the underlying Galton-Watson tree is critical with offspring distribution in the domain of attraction of a γ-stable law for some γ ∈ (1, 2]. In the case γ < 2 this immediately entails that the offspring distribution ξ satisfies ξ([k, ∞)) = k^{−γ} L(k) for some slowly-varying function L. If such a ξ is fixed and T_n is a ξ-Galton-Watson tree with n vertices endowed with uniform mass measure, it is well-known that there exists a sequence (a_n)_{n=1}^∞ such that n^{−1} a_n T_n converges in distribution to the stable tree of Le Gall and Le Jan [LGLJ98] (see also [DLG02, Chapter 1]). We denote the compact stable tree by T^c_γ and its root by O. In the case γ = 2 this is simply Aldous' Continuum Random Tree (see [Ald91b] and [Ald93]). We give the formal constructions in Section 2.3.
To define our LERRW model, we fix a reinforcement parameter ∆ > 0, label the root of our Galton-Watson tree O_n and consider a class of initial weights (α^{(n)}_e)_{e∈E(T_n)} parametrised by α ≤ 1 whereby, if ←x denotes the parent vertex of x, the weight α^{(n)}_{{←x,x}} of the edge joining x to ←x is a rescaled power d_n(O_n, x)^α of the distance to the root, as given in (1), where d_n denotes the graph distance on T_n, cf. [Tak21, Section 2.3]. Note that n a_n^{−1} is the natural scale for branch lengths in T^c_γ; the addition of this term in (1) is just to ensure that the weights and inverse weights are integrable at 0. Our first result is an invariance principle for this LERRW process.
Theorem 1.1. Let X^{(n)} be a discrete-time LERRW on T_n, started at O_n, with initial weights as in (1) and reinforcement parameter ∆. Denote its law by P^{(α^{(n)})}_{O_n}. For every α ≤ 1, there exists a stochastic process X defined on T^c_γ and started at O, with law P_O, such that the following convergence holds jointly with respect to the Gromov-Hausdorff topology and the topology of weak convergence of measures on the càdlàg path space on T_n (itself endowed with the Skorohod J_1 topology):

( a_n n^{−1} T_n , P^{(α^{(n)})}_{O_n} ( ( n^{−1} a_n X^{(n)}_{⌊2 n² a_n^{−1} t⌋} )_{t≥0} ∈ · ) ) → ( T^c_γ , P_O ( X ∈ · ) ).

The law of X will be defined explicitly in Section 3.3.
To establish scaling limits for tree-indexed random walks, one generally needs tighter control on the input laws compared to the Z^d case, cf. [JM05, Theorem 2] and [Mar20, Theorem 1], which require finiteness of higher moments in order to prove convergence of tree-indexed random walks to tree-indexed Brownian motion (known as snake processes). This is because the extra branches in the tree offer more opportunities for large displacements of the random walk, leading to larger fluctuations in the absence of higher moment assumptions. However, we will see that in the LERRW setting the random environment becomes increasingly concentrated as n → ∞, without any higher moment assumptions being made.
The analogous result previously proved by Lupu, Sabot and Tarrès for the LERRW in dimension one [LST20] used a high-level representation of a LERRW as a RWRE that was established in [DV02]. They encode the corresponding random environment in a random walk process and show that this converges to a time-changed Brownian motion on R under the appropriate rescaling. The law of the limiting LERRW diffusion is analogously encoded by this Brownian motion. We will use the same strategy to prove Theorem 1.1. In contrast to [LST20] and [And21], we directly use the observation of Pemantle that the random environment can be represented by independent Dirichlet random variables at each vertex [Pem88], rather than the (more complicated) mixing measure obtained in [DV02], which makes our proof more elementary.
In the second half of this paper, we obtain asymptotics for the limiting diffusion appearing in Theorem 1.1.
To understand the long-time asymptotics of LERRW and its scaling limit it is more natural to study these processes on Galton-Watson trees conditioned to survive and their continuum counterparts, rather than on compact trees. In the discrete setting Kesten [Kes86a] showed that this conditioning can be achieved by conditioning a Galton-Watson tree to have a single branch that survives to infinity, known as the backbone, and attaching finite Galton-Watson trees to the backbone. We denote this tree by T ∞ . This construction also has a natural analogue in the continuum known as sin-trees. These were constructed by Aldous in the case γ = 2 [Ald91c, Section 2.5] and Duquesne in the case γ < 2 [Duq09]. We denote the infinite stable tree by T ∞ γ . We give the formal constructions in Sections 2.4 and 5.
It seems intuitively clear that increasing ∆ should make the LERRW "more recurrent" whereas increasing α should make it "more transient". Since the infinite tree T_∞ contains a unique path to infinity, it follows directly from the corresponding result of Takeshima [Tak00] (see also [ACT19]) for LERRW on the infinite half-line that the LERRW is almost surely recurrent if ∑_{n≥1} n^{−α} = ∞, and almost surely transient otherwise.
However, although the value of ∆ cannot change the transience and recurrence properties of X, we will see in our next theorems that both α and ∆ affect the time it takes for X to escape a ball. In the case α < 1, which we refer to as "strongly recurrent", the reinforcement gives an exponential bias towards the root, so that X only moves away from the root at logarithmic speed. In what follows, we denote by P the probability measure for Kesten's tree T ∞ and for the infinite stable tree T ∞ γ .
Theorem 1.2. [Strongly recurrent regime α < 1]. Let (X t ) t≥0 be the limiting diffusion of Theorem 1.1 on the infinite stable tree T ∞ γ when α < 1. If ∆ > 0, we have that P × P O -almost surely, where by d we denote the tree metric on T ∞ γ .
Remark 1.3. Takei [Tak21] also proved similar results for the discrete-time LERRW on Z. Lupu, Sabot and Tarrès also proved a result analogous to Theorem 1.2 in the case α = 0 for the limiting diffusion on R [LST20, Proposition 4.9]; this corresponds to taking the initial occupation profile function L 0 equal to 1 everywhere, except possibly a compact interval. Our long-time asymptotics results cover a larger class of initial occupation profiles, and we additionally have to take care of fluctuations in the underlying tree T ∞ γ .
In the critical case α = 1, the slow-down effect from the reinforcement is almost balanced by the increasing initial edge weights, so we lose the exponential attraction towards the root and no longer see the slow movement. Instead the attraction is of polynomial order, and this is reflected in the results of Theorem 1.4. Because the reinforcement effect and the volumes of balls in the tree both grow on a polynomial scale, there are two regimes depending on which one has the dominant effect. (ii) If ∆ > (2γ − 1)/(γ − 1), then for any β > 1/2, we have that P × P_O-almost surely,

In the transient regime α > 1, the resistance techniques used to prove Theorem 1.1 no longer provide the machinery to identify the LERRW scaling limit, since the rescaled resistances degenerate in this regime. However, we can work in the discrete setting and obtain an upper bound on the displacement of the random walk. It is somewhat more delicate to obtain lower bounds on the displacement, since this requires uniform control of the environment. However, we believe that it should also be possible to get a comparable lower bound using discrete analogues of the arguments used to obtain this control in the recurrent case.
Despite the bias of our initial weights away from the root, the result of Theorem 1.5 contrasts strongly with those obtained on b-regular trees with constant initial weights by Collevecchio [Col06], who proved that the speed is positive when b is large enough, or under an appropriate moment condition for the LERRW. This was later extended to all b ≥ 2 by Aidekon [Aid08]. In our case, Theorem 1.5 holds because the LERRW is slowed down by the time spent in dead ends of the tree; this effect is particularly pronounced because of the fractal properties of the tree (i.e. many dead ends en route to and from other dead ends). By contrast, regular trees have no dead ends and exhibit exponential volume growth, which gives a much stronger bias away from the root.
In the same spirit, due to the uniqueness of the path to infinity in T_∞, the LERRW on T_∞ can be viewed as a LERRW on Z_+ slowed down by extra excursions into the subtrees hanging off this path. This can be quantified: on Z_+, with the same initial weights, it is possible to show that sup_{m≤n} |Y_m| ≥ n^{1/2 − o(1)} (see Remark 8.5). Since γ > 1, the LERRW is genuinely slower on the trees we consider (note, however, that it is not true that T_∞ → Z_+ in any sense as γ ↓ 1; in fact, due to the heavy tails, the Galton-Watson trees undergo a condensation phenomenon, e.g. see [KR19]).
Here we briefly record some notation that we will use throughout the paper.
T_n : critical Galton-Watson tree with offspring distribution satisfying (2)
d_n : graph distance on T_n
µ_n : uniform probability measure on the vertices of T_n
R_n : distorted graph distance on T_n (17)
ν_n : distorted measure on the vertices of T_n (17)
O_n : root of T_n
T_∞ : critical Galton-Watson tree conditioned to survive, with offspring distribution satisfying (2)
O : root of T^c_γ or T^∞_γ
a_n : scaling factor for T_n
Z^{(n)} : continuous-time constant speed RWDE on T_n
P̃_{O_n,ω} : quenched law of Z^{(n)}
P^{(b)}_{O_n} : annealed law of Z^{(n)}
X : scaling limit of X^{(n)} as in Theorem 1.1
P_{O,φ} : quenched law of X (over the environment)

Organisation
The paper is organised as follows. In Section 2 we give background on critical Galton-Watson trees and the topologies used in this paper. In Section 3 we make the connection between LERRW and RWRE precise, and construct a candidate for the scaling limit of LERRW. In Section 4 we prove that the resistance metrics and measures characterising the associated RWRE converge to those characterising the aforementioned limit candidate, which proves Theorem 1.1. The second half of the paper is devoted to the proofs of Theorems 1.2 -1.5: in Section 5 we establish some properties of T ∞ γ and its associated Gaussian potential, then in Sections 6 -8 we respectively prove Theorems 1.2 -1.5.

Acknowledgements
We would like to thank David Croydon for introducing us to this topic, and Andrea Collevecchio for useful comments on the manuscript. The research of EA was supported by JSPS and ERC starting grant 676970 RANDGEOM.

Critical Galton-Watson trees
To prove the convergence we will view our Galton-Watson trees as plane trees, using the following Ulam-Harris labelling notation for discrete trees. To define these, we follow [Nev86] and first introduce the set of finite words U = ⋃_{n≥0} N^n, where N = {1, 2, …}. By convention, N^0 = {∅}. If u = (u_1, …, u_n) and v = (v_1, …, v_m) ∈ U, we let uv = (u_1, …, u_n, v_1, …, v_m) denote the concatenation of u and v.
We let T denote the set of all plane trees. If T ∈ T and u ∈ T we also let τ u (T ) = {v ∈ U : uv ∈ T } denote the subtree emanating from u. In this paper we are interested in random plane trees, more specifically Galton-Watson trees. To define these, first let ξ be a probability measure on Z ≥0 . We will refer to ξ as the offspring distribution.
Definition 2.2. A Galton-Watson tree with offspring distribution ξ is a plane tree T with law P_ξ satisfying the following properties: (i) The number of children k_∅ of the root satisfies P_ξ(k_∅ = j) = ξ(j) for every j ≥ 0. (ii) For every j ≥ 1 with ξ(j) > 0, the shifted trees τ_1(T), …, τ_j(T) are independent under the conditional probability P_ξ(· | k_∅ = j), with law P_ξ.
In other words, a Galton-Watson tree with offspring distribution ξ is a branching process with a single root ∅, where the trees emanating from each vertex are independently distributed according to P_ξ. It is shown in [Nev86, Section 3] that for any probability measure ξ on Z_{≥0}, there is a unique probability measure P_ξ on T satisfying the above two properties.
We will restrict to Galton-Watson trees with a critical aperiodic offspring distribution in the domain of attraction of a γ-stable law for some γ ∈ (1, 2], by which we mean that ∑_{k=0}^∞ k ξ(k) = 1 and there exists an increasing sequence a_n ↑ ∞ such that, if (ξ_i)_{i≥1} are i.i.d. with law ξ, then

a_n^{−1} ( ∑_{i=1}^n ξ_i − n ) → Z_γ in distribution (2)

as n → ∞, where Z_γ is a γ-stable random variable, i.e. can be normalised so that E[e^{−λ Z_γ}] = e^{−λ^γ} for all λ > 0. In the finite variance case we always have γ = 2. It is shown in [BGT89, Section 8.3.2] that necessarily a_n = n^{1/γ} L(n) for some slowly-varying function L. Equivalently, ξ([n, ∞)) = n^{−γ} L(n) (but not necessarily with the same L). Throughout the paper we will make the assumption that γ ∈ (1, 2], and let (a_n)_{n=1}^∞ be the sequence appearing in (2).

A Galton-Watson tree T conditioned to have n vertices can be coded by a random walk (W^T_m)_{0≤m≤n} called the Łukasiewicz path; this is defined by setting W^T_0 = 0 and then, listing the vertices u_0, u_1, …, u_{n−1} in lexicographical order, setting W^T_{m+1} = W^T_m + k_{u_m}(T) − 1 for 0 ≤ m ≤ n − 1. It is not hard to see that W^T_m ≥ 0 for all 0 ≤ m ≤ n − 1, and W^T_n = −1.

A discrete tree T conditioned to have n vertices can also be characterised by its contour function. To introduce the contour function, we imagine that the tree is embedded in the plane in such a way that each edge has length one. Consider a particle that is placed at the root of the tree at time i = 0 and then traverses the tree, moving continuously along the edges at unit speed (respecting the left-right order of the vertices), until all vertices have been explored and the particle has returned to the root. Then, we denote by C_T(i) the distance to the root of the position of the particle at time i. More specifically, letting x_i denote the i-th vertex visited by the particle, we set C_T(i) = d(O, x_i). By convention, we also set x_{2n} = x_0 and C_T(2n) = 0. Naturally, we extend C_T to non-integer times by linear interpolation.
Note that the particle visits every vertex apart from the root a number of times given by its degree.
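To make the coding concrete, here is a minimal sketch in Python computing the Łukasiewicz path of a plane tree from the offspring counts of its vertices listed in lexicographical order (the example tree is our own, chosen only for illustration):

```python
def lukasiewicz_path(offspring):
    """Lukasiewicz path W of a plane tree, given the offspring counts
    of its vertices u_0, ..., u_{n-1} in lexicographical order."""
    W = [0]
    for k in offspring:
        W.append(W[-1] + k - 1)
    return W

# Example: a root with two children, the first of which has one child.
# Lexicographical offspring counts: root = 2, first child = 1,
# grandchild = 0, second child = 0.
W = lukasiewicz_path([2, 1, 0, 0])
print(W)  # [0, 1, 1, 0, -1]
```

The two defining properties stated above (W^T_m ≥ 0 for m ≤ n − 1 and W^T_n = −1) are visible in the output.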

Gromov-Hausdorff topology and correspondences
In this paper we will be interested in pointed Gromov-Hausdorff-Prokhorov (GHP) convergence between pointed compact metric measure (mm) spaces. Accordingly, let K_c denote the set of quadruples (X, d, µ, O) such that (X, d) is a compact metric space, µ is a finite Borel measure of full support, and O is a distinguished point, which will play the role of the root. Given two pointed mm-spaces (X, d, µ, O) and (X′, d′, µ′, O′), the pointed GHP distance between them is defined in (3) as an infimum over all isometric embeddings φ : X → F, φ′ : X′ → F into a common metric space (F, δ); the quantity minimised combines the distance δ(φ(O), φ′(O′)) between the embedded roots, the Hausdorff distance d^F_H between the embedded spaces, and the Prokhorov distance d_P between the pushforward measures on F. For a definition of the latter, see [Bil99, Chapter 1]. If the Prokhorov term is omitted from the right-hand side of (3), this is known simply as the pointed Gromov-Hausdorff (GH) distance, and is denoted by d_GH. We say that two spaces (X, d, µ, O) and (X′, d′, µ′, O′) in K_c are equivalent if there exists a measure- and root-preserving isometry between them. It is well-known, see [ADH13, Theorem 2.3], that the pointed GHP distance defines a metric on the space of equivalence classes of K_c, and that this is a Polish metric space.
To prove Theorem 1.1, we will use the helpful notion of correspondences. A correspondence R between two metric spaces (M, R) and (M′, R′) is a subset of M × M′ such that for every x ∈ M there exists y ∈ M′ with (x, y) ∈ R, and similarly for every y ∈ M′ there exists x ∈ M with (x, y) ∈ R. It is straightforward to show (e.g. see [BBI01, Theorem 7.3.25]) that

d_GH( (M, R, O), (M′, R′, O′) ) = (1/2) inf_R sup_{(x,y), (x′,y′) ∈ R} | R(x, x′) − R′(y, y′) |,

where the infimum is taken over all correspondences R between (M, R) and (M′, R′) that contain the pair of distinguished points (O, O′). The supremum appearing on the right-hand side above is known as the distortion of R, and is denoted by dis(R).
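As a toy illustration of the definition, the distortion of a correspondence between two finite metric spaces can be computed directly; the two three-point spaces below are invented purely for the example:

```python
from itertools import product

def distortion(R, d, d2):
    """Distortion of a correspondence R: the supremum over pairs
    (x, y), (x2, y2) in R of |d(x, x2) - d2(y, y2)|."""
    return max(abs(d[x][x2] - d2[y][y2]) for (x, y), (x2, y2) in product(R, R))

# Two three-point path spaces 0-1-2: unit edge lengths vs edge lengths 1.5.
d  = {0: {0: 0, 1: 1, 2: 2}, 1: {0: 1, 1: 0, 2: 1}, 2: {0: 2, 1: 1, 2: 0}}
d2 = {0: {0: 0, 1: 1.5, 2: 3.0}, 1: {0: 1.5, 1: 0, 2: 1.5}, 2: {0: 3.0, 1: 1.5, 2: 0}}
R = [(0, 0), (1, 1), (2, 2)]  # identity correspondence containing the roots (0, 0)
print(distortion(R, d, d2))   # 1.0
```

By the displayed formula, this correspondence certifies that the rooted GH distance between the two spaces is at most 0.5.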
In this paper, we will prove GHP convergence by first proving GH convergence using correspondences, and then prove Prokhorov convergence between the measures on the canonical embedding induced by the correspondence.

Stable trees
If T_n denotes a discrete Galton-Watson tree conditioned to have n vertices with critical aperiodic offspring distribution ξ in the domain of attraction of a γ-stable law, endowed with the graph metric d_n, it is well-known that it admits a compact metric space scaling limit, known as the γ-stable tree. We denote this by T^c_γ. More precisely, it is shown in [Duq03a, Theorem 3.1] that n^{−1} a_n T_n → T^c_γ (4) in the GH topology as n → ∞, where a_n is defined as in (2). The limiting space T^c_γ can be formally defined from a γ-stable Lévy excursion, which plays the role of a continuum Łukasiewicz path (e.g. see [DLG05] for details). Given a spectrally positive γ-stable Lévy excursion X^{(γ)}, we define the height function H^{(γ)} to be the continuous modification of the process

H^{(γ)}_t = lim_{ε→0} (1/ε) ∫_0^t 1{ X^{(γ)}_s − inf_{s≤r≤t} X^{(γ)}_r ≤ ε } ds.

The limit above exists in probability, see [DLG02, Chapter 1].
Setting, for s, t ∈ [0, 1],

d(s, t) = H^{(γ)}_s + H^{(γ)}_t − 2 min_{r ∈ [s∧t, s∨t]} H^{(γ)}_r,

it is obvious that d is symmetric and satisfies the triangle inequality. One can introduce the equivalence relation s ∼ t if and only if d(s, t) = 0; the resulting quotient space ([0, 1]/∼, d) is the γ-stable tree T^c_γ, which can be proven to be almost surely a compact metric space [DLG05, Theorem 2.1]. In addition, this construction provides a natural way to define a canonical (non-atomic) probability measure associated with it, µ, which has full support. Denote by p_{H^{(γ)}} : [0, 1] → T^c_γ the canonical projection. For every A ∈ B(T^c_γ), we let µ(A) = ℓ(p_{H^{(γ)}}^{−1}(A)) denote the image measure on T^c_γ of Lebesgue measure ℓ on [0, 1] under the canonical projection p_{H^{(γ)}}.

We briefly outline how to prove (4), since we will use a similar strategy in Section 4. Using Skorohod's representation theorem, we can assume that we are working on a probability space under which the normalised contour function of T_n, defined by setting C^{(n)}(t) = a_n n^{−1} C_n(2nt) for t ∈ [0, 1], where C_n is the contour function of T_n, converges almost surely (with respect to the uniform norm) to the height function constructed from a spectrally positive γ-stable Lévy excursion (7). The convergence in distribution was originally shown by Duquesne in [Duq03b, Theorem 3.1]. We construct the related correspondence

R = { (x_{⌊2nt⌋}, p_{H^{(γ)}}(t)) : t ∈ [0, 1] },

where x_i is the i-th visited vertex in the exploration of the outline of T_n and p_{H^{(γ)}}(t) is the equivalence class of t with respect to the relation ∼. It is straightforward to show that the distortion of this correspondence between n^{−1} a_n T_n and T^c_γ is bounded above by 2‖C^{(n)} − H^{(γ)}‖, where ‖C^{(n)} − H^{(γ)}‖ stands for the uniform norm of C^{(n)} − H^{(γ)}. Since (7) holds, the convergence of the metric spaces follows. In fact, if µ_n denotes the uniform probability measure on its vertices, it is the case that (T_n, n^{−1} a_n d_n, µ_n) → (T^c_γ, d, µ) (9) in distribution with respect to the GHP distance between compact mm-spaces, see [LG06, Theorem 4.2], which is a corollary of the result originally proved in [Duq03a].
Although the uniform probability measure µ n on the vertices of T n was not considered in [Duq03a], it is not difficult to extend the result in (4) to include it since the work regarding the convergence in (9) has already been done.

Infinite critical trees
A critical Galton-Watson tree is almost surely finite, so to study the long-time asymptotics of LERRW on critical trees and more specifically to prove Theorems 1.2 -1.5, we will instead consider the model on a Galton-Watson tree conditioned to survive. In the discrete setting such a model is naturally described by Kesten's tree, denoted T ∞ and defined as follows.
Kesten's tree T_∞ associated to the probability distribution ξ is a two-type Galton-Watson tree distributed as follows:
• Individuals are either normal or special.
• The root of T ∞ is special.
• A normal individual produces only normal individuals according to ξ.
• A special individual produces individuals according to the size-biased distribution ξ*, given by ξ*(k) = k ξ(k) (this is a probability distribution since ξ is critical). Of these, one of them is chosen uniformly at random to be special, and the rest are normal.
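The recursive construction above can be sketched as a sampler truncated at a fixed height. In the sketch below we assume, purely for illustration, a critical Geometric(1/2) offspring distribution (truncated for sampling); the function names and data layout are our own:

```python
import random

def sample(pmf):
    """Sample from a finite pmf given as a list of (value, prob) pairs."""
    u, acc = random.random(), 0.0
    for v, p in pmf:
        acc += p
        if u < acc:
            return v
    return pmf[-1][0]

def kestens_tree_levels(xi, height):
    """Generations of Kesten's tree truncated at `height`; each level is a
    list of vertex types.  xi: a (truncated) critical offspring pmf."""
    total = sum(k * p for k, p in xi)              # ~1 when xi is critical
    xi_star = [(k, k * p / total) for k, p in xi]  # size-biased law; xi*(0) = 0
    levels = [['special']]                         # the root is special
    for _ in range(height):
        nxt = []
        for kind in levels[-1]:
            if kind == 'normal':
                nxt.extend(['normal'] * sample(xi))       # normal -> xi offspring
            else:
                k = sample(xi_star)                       # special -> xi* offspring
                j = random.randrange(k)                   # one uniform special child
                nxt.extend('special' if i == j else 'normal' for i in range(k))
        levels.append(nxt)
    return levels

random.seed(1)
# Critical Geometric(1/2) offspring, truncated at 30: xi(k) = 2^-(k+1).
xi = [(k, 0.5 ** (k + 1)) for k in range(31)]
levels = kestens_tree_levels(xi, height=20)
print([lev.count('special') for lev in levels])  # 21 ones: one special vertex per level
```

Whatever the randomness, each generation contains exactly one special vertex; these form the (truncated) backbone.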
Almost surely, the special vertices form a unique one-ended infinite backbone of T ∞ . Aldous in [Ald91a] coined the term sin-trees for such trees, since they have a single infinite spine. Although a critical Galton-Watson tree is almost surely finite, Kesten [Kes86b, Lemma 1.14] showed that T ∞ arises as the local limit of a critical Galton-Watson tree with offspring distribution ξ as its height goes to infinity.
Kesten's construction has been imitated in the continuum by Duquesne in [Duq09], who constructs continuum sin-trees and shows that these arise as the appropriate local limit of compact continuum trees conditioned on being large. Duquesne's construction involves defining two height functions from two independent Lévy processes in the same way as done for the compact case. In the stable case, we denote the infinite stable sin-tree by T ∞ γ . It is also possible to show that T ∞ γ is the scaling limit of a discrete tree constructed from a critical aperiodic γ-stable offspring distribution as constructed in Definition 2.3 (e.g. by following a similar strategy to [Arc20a]). As in the discrete case, T ∞ γ has a single infinite path to infinity, to which compact stable trees are grafted.

Connection between LERRW and a RWRE on trees
The fine mesh limit of the LERRW in dimension one, which was called linearly reinforced motion (LRM), was introduced in [LST20]. In fact, the authors construct the continuous space limit of the vertex-reinforced jump process (VRJP) out of a convergent Bass-Burdzy flow [BB99]. To obtain LRM as the continuous space limit of the LERRW, they then use the close relation between the VRJP and the LERRW that was established in [ST15], namely that the LERRW can be represented in terms of a VRJP with independent gamma conductances. The principles used in [LST20] are broadly the same as those used in this paper.
In the setting of [LST20] an appropriate potential related to the VRJP converges. This yields a characterisation of the limiting LRM as a diffusion in random potential that contains a Wiener term and a drift (the motion drifts towards the places it has already visited many times), or equivalently describes the LRM as a scale-transformed and time-changed diffusion in a random environment, cf. [Sin82, Bro86, Sei00, Car97, Pac16].
In our setting, we work directly on the trees and view them as electrical networks equipped with a so-called resistance metric and a measure that we will specify in Section 3.1. The latter provide the natural scale functions and speed measures of a RWRE associated with the LERRW in a representation of Pemantle, see Section 3.2. Then we are able to use the main result of [Cro18] to yield convergence of the RWRE as a consequence of the convergence of the resistance metric and the speed measure. Since the resistance metric and the speed measure are expressed in terms of the potential of the RWRE, their convergence can be deduced from the convergence of the aforementioned potential, see Sections 4.1 and 4.2.
The limiting resistance metric and speed measure on the limiting stable tree are distortions of its canonical metric and uniform mass measure. These distortions are expressed in terms of an exponential that includes a tree-indexed Gaussian term and a drift. The Gaussian term corresponds to a tree-indexed process that is essentially a Brownian snake on the stable tree. This yields a characterisation of the limiting diffusion X as a diffusion in random potential on the stable tree, see Section 3.3. We note here that the notion of a "scale change" on general separable real trees was formalised in [AEW13].

Random walk in a random environment
In order to study a LERRW on T n we will use a representation of a LERRW as a RWRE. We briefly introduce the formalism of a RWRE here.
In a rooted metric tree (T, d_T, O), we define for u, v ∈ T the path interval [u, v] as the set of points lying on the unique geodesic between u and v (with the analogous half-open and open intervals [u, v), (u, v] and (u, v)). We say that u and v are connected by an edge, denoted u ∼ v, if and only if u ≠ v and [u, v] = {u, v}. We also define a partial order on T by setting u ≼ v (u is an ancestor of v) if and only if u ∈ [O, v]. Given u, v ∈ T, we write u ∧ v for the ≼-maximal vertex w satisfying both w ≼ u and w ≼ v; we call u ∧ v the most recent common ancestor of u and v.

For a fixed tree T with root O, a RWRE on T can be constructed by assigning each edge e ∈ E(T) a random conductance. At each time step, if the random walk is currently at vertex v, it moves to one of the neighbours of v with probability proportional to the conductance of the corresponding incident edge. Rather than defining the full set of conductances (c(e))_{e∈E(T)} directly, we can equivalently assign a conductance to a single special "root edge" e_root and then sample a sequence of positive random variables (W_v)_{v∈T} giving the ratios of conductances of neighbouring edges, so that the conductance of the edge {←v, v} is W_v times that of the edge immediately above it. We will follow this approach as it allows us to formalise the connection to Pólya urns outlined in the introduction.
We now assume that the environment ω, given by c(e_root) together with the weights (W_v)_{v∈T}, is random; in our case it determines the transition probabilities of the walk at each vertex. We add a new vertex ←O, which we call the base, and attach it to the root with a new edge. This edge will play the role of e_root, and we call the new tree the planted tree. To keep the notation simple, even if the statements are expressed in terms of this new planted tree, say T*, we still phrase them in terms of T. It follows from (10) that the conductance of an edge {←v, v} is given by

c({←v, v}) = c(e_root) ∏_{u ∈ (O, v]} W_u. (11)

This motivates the definition of a potential on T in terms of the logarithms of the weights W_u. By defining the RWRE from conductances in this way, we have in fact defined it via an electrical network, and can therefore take advantage of electrical network theory (e.g. see [LPW17, Chapter 9] for an introduction). The conductance of an edge {←v, v} is given by (11), and consequently its electrical resistance is r({←v, v}) = c({←v, v})^{−1}. Moreover, since our graph is a tree, the electrical resistance between any two vertices u and v is given by

R(u, v) = ∑_{e ∈ E_{u,v}} r(e),

where E_{u,v} is the set of edges contained in [u, v]. Clearly, R defines a metric on T (this also follows from a more general result of Tetali [Tet91]). We will take the measure µ which assigns to each vertex the total conductance of the edges incident to it. This is the stationary reversible measure for a stochastic process associated with (T, R, O), whose generator acts on L²(T, µ) in the standard way for electrical networks (see e.g. [LPW17] for more details on the correspondence between graphs equipped with edge conductances and a measure, and stochastic processes). This stochastic process is a continuous-time random walk Z having exp(1) holding times at each non-root vertex; at each jump time, the random walk traverses an edge incident to its previous state, chosen with probability proportional to the conductances of the available edges. At the base, Z jumps to the root with probability 1. Moreover, the overall rate at which Z jumps from O to its collection of children is 1.
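As a sanity check of the network formalism, the following sketch builds conductances from per-vertex ratios W_v on a toy tree of our own choosing and verifies that the resulting resistance is additive along paths (identifying each edge with its lower endpoint is a bookkeeping convention, not notation from the paper):

```python
def conductances(parent, W, c_root):
    """c({parent(v), v}) = c_root times the product of W_u over the vertices
    u on the path from the root (exclusive) down to v (inclusive)."""
    c = {}
    for v in parent:
        prod, u = W[v], v
        while parent[u] != 'root':
            u = parent[u]
            prod *= W[u]
        c[v] = c_root * prod      # edge identified with its lower endpoint v
    return c

def path_to_root(parent, v):
    path = [v]
    while path[-1] != 'root':
        path.append(parent[path[-1]])
    return path

def resistance(parent, c, u, v):
    """Effective resistance on a tree: sum of reciprocal conductances
    over the edges of the unique path from u to v."""
    pu, pv = path_to_root(parent, u), path_to_root(parent, v)
    shared = set(pu) & set(pv)
    edges = [x for x in pu if x not in shared] + [x for x in pv if x not in shared]
    return sum(1.0 / c[x] for x in edges)

# Toy tree: root-a-b plus root-c, with invented per-vertex ratios W_v.
parent = {'a': 'root', 'b': 'a', 'c': 'root'}
W = {'a': 2.0, 'b': 0.5, 'c': 1.0}
c = conductances(parent, W, c_root=1.0)   # c: a -> 2.0, b -> 1.0, c -> 1.0
print(resistance(parent, c, 'b', 'c'))    # 2.5
```

Since the root lies on the path from b to c, R(b, c) = R(b, root) + R(root, c), illustrating that R is a (tree) metric.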
Remark 3.1. The electrical resistance between x and y can alternatively be defined by R(x, y)^{−1} = inf{ E(f, f) : f : T → R, f(x) = 1, f(y) = 0 }, where E(f, f) = ∑_{{u,v} ∈ E(T)} c({u, v}) (f(u) − f(v))² is a quadratic form on T. In fact, E is a regular Dirichlet form on L²(T, µ), and corresponds to electrical energy; for functions f and g of finite energy, E(f, g) is defined by polarisation. We will not directly use the machinery of Dirichlet forms in this paper.
The RWRE that we use in this paper will be defined by a collection of Dirichlet random variables taking the role of (W v ) v∈T .
Definition 3.2 (Dirichlet distribution). For a finite set I and positive real numbers (b_i)_{i∈I} ∈ (0, +∞)^I, the Dirichlet distribution D((b_i)_{i∈I}) is the distribution on the simplex { (x_i)_{i∈I} ∈ [0, 1]^I : ∑_{i∈I} x_i = 1 } with density

( Γ(∑_{i∈I} b_i) / ∏_{i∈I} Γ(b_i) ) ∏_{i∈I} x_i^{b_i − 1}

with respect to Lebesgue measure on the coordinates (x_i)_{i ∈ I∖{j_0}}, for any choice of j_0 ∈ I.
The Dirichlet distribution can alternatively be defined as the law of a normalised Gamma vector. One can indeed check that the following lemma holds.
Lemma 3.3. [Pit06, Section 0.3.2]. Let (W_i)_{i∈I} be independent random variables such that, for i ∈ I, W_i ∼ Gamma(b_i, 1). Set U_i = W_i / ∑_{j∈I} W_j for each i ∈ I. Then (U_i)_{i∈I} ∼ D((b_i)_{i∈I}). Furthermore, (U_i)_{i∈I} is independent of ∑_{i∈I} W_i.
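Lemma 3.3 can be checked empirically with the standard-library Gamma sampler; this is a quick Monte Carlo sanity check with arbitrary parameters of our own choosing, not part of any proof:

```python
import random

random.seed(0)
b = {'left': 1.0, 'right': 2.0, 'up': 3.0}   # Dirichlet parameters b_i
n = 20000

sums = {i: 0.0 for i in b}
for _ in range(n):
    W = {i: random.gammavariate(b[i], 1.0) for i in b}   # independent Gamma(b_i, 1)
    s = sum(W.values())
    for i in b:
        sums[i] += W[i] / s                               # U_i = W_i / sum_j W_j

means = {i: sums[i] / n for i in b}
# E[U_i] under D((b_i)) is b_i / sum_j b_j, i.e. 1/6, 1/3, 1/2 here.
print({i: round(means[i], 3) for i in b})
```

The empirical means of the normalised Gamma vector match the Dirichlet means b_i / ∑_j b_j up to Monte Carlo error.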

Representation of LERRW as a random walk in random conductances
In this section we outline the connection between LERRW, Pólya urns and Dirichlet random variables on trees. This connection was first used in the context of (non-directed) edge reinforced random walk on trees by Pemantle [Pem88] where, due to the absence of cycles, the process evolves independently at each vertex.
Given a tree T , let α = (α e ) e∈E(T ) ∈ (0, +∞) E(T ) be a sequence of positive initial weights on the edges, and let O denote the root of T . We consider edges to be undirected, so that e = {x, y} = {y, x} denotes the edge joining two vertices x and y.
Definition 3.4. The discrete-time linearly edge reinforced random walk (LERRW) on T with initial weights α and starting at O is the process (X_n)_{n≥0} on T with law P_O such that, P_O-a.s., X_0 = O and, for all n ≥ 0 and every neighbour y of X_n,

P_O( X_{n+1} = y | X_0, …, X_n ) = N_{{X_n, y}}(n) / ∑_{e : X_n ∈ e} N_e(n),

where N_e(n) = α_e + ∆ · #{0 ≤ k ≤ n − 1 : {X_k, X_{k+1}} = e} for all edges e ∈ E(T). We call ∆ > 0 the reinforcement parameter.
In other words, at time n the walk jumps along a neighbouring edge e chosen with probability proportional to its current weight N e (n). This weight is initially equal to α e and then increases by ∆ each time the edge e is traversed.
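Definition 3.4 translates directly into a simulation; the following sketch uses a small example tree and parameters of our own choosing:

```python
import random

def lerrw(neighbours, alpha, root, delta, steps, rng):
    """Discrete-time LERRW: from the current vertex, pick an incident edge
    with probability proportional to its current weight N_e(n), then add
    delta to the traversed edge's weight."""
    N = dict(alpha)                      # current edge weights N_e(n)
    X, path = root, [root]
    for _ in range(steps):
        edges = [frozenset((X, y)) for y in neighbours[X]]
        e = rng.choices(edges, weights=[N[e] for e in edges])[0]
        X = next(iter(e - {X}))          # move across the chosen edge
        N[e] += delta                    # reinforce it
        path.append(X)
    return path, N

rng = random.Random(42)
# A small rooted tree: O with children a, b; a with child c.
neighbours = {'O': ['a', 'b'], 'a': ['O', 'c'], 'b': ['O'], 'c': ['a']}
alpha = {frozenset(e): 1.0 for e in [('O', 'a'), ('O', 'b'), ('a', 'c')]}
path, N = lerrw(neighbours, alpha, 'O', delta=0.5, steps=200, rng=rng)
# Each of the 200 steps adds delta to exactly one edge:
print(sum(N.values()) - sum(alpha.values()))  # 100.0
```

The conservation check at the end reflects the defining rule: exactly one edge weight increases by ∆ per step.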
Consider a vertex v ∈ T and suppose v has #v offspring in T. Let e_0(v) denote the edge joining v to its parent, and e_1(v), …, e_{#v}(v) denote the edges joining v to each of its offspring. When the LERRW arrives at v for the first time, it must have arrived via the parent, so the weights of these edges will be respectively given by the components of the vector (α_{e_0(v)} + ∆, α_{e_1(v)}, …, α_{e_{#v}(v)}). Moreover, since T is a tree, if the LERRW exits v by edge e_i(v), it must also return to v via e_i(v), so that at the next visit to v the weight of e_i(v) will have increased by 2∆ and all other weights will have remained the same. Moreover, this holds independently of what happens to the LERRW between its successive traversals of e_i(v). The same logic applies on subsequent visits to v, with the weights updating each time.
It follows that the decisions of the LERRW process are governed by independent Pólya urns, one per vertex, where the outgoing edges play the role of colours and $\alpha_{e_i(v)}$ determines the initial number of balls of colour $i$.
The asymptotic proportion of colours in such an urn model follows a Dirichlet distribution, and, conditionally on this proportion, the successive draws are independent and identically distributed Bernoulli random variables with success probability given by the asymptotic proportion of the drawn colour. Consequently, the reinforced walk may equivalently be obtained by assigning a Dirichlet random variable $\omega_{(x,\cdot)}$ to each vertex $x$, which then defines the transition probabilities of the walk every time it is at $x$. This is the definition of the random walk in Dirichlet random environment (RWDE) that we define below. To recover the LERRW from this representation, we have to average over all the Dirichlet random variables.
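As a sanity check of this equivalence at the level of a single step, one can sample a fresh Dirichlet environment (via normalised Gamma weights, cf. Lemma 3.3), take one quenched step, and compare the annealed frequency with the LERRW first-step probability. The following Python sketch does this on a toy two-leaf star; all names and parameter values are illustrative:

```python
import random

def sample_environment(tree, alpha, rng):
    """Assign each vertex a Dirichlet transition vector over its incident
    edges, sampled by normalising independent Gamma(alpha_e, 1) weights."""
    env = {}
    for v, nbrs in tree.items():
        g = [rng.gammavariate(alpha[frozenset((v, y))], 1.0) for y in nbrs]
        s = sum(g)
        env[v] = [gi / s for gi in g]
    return env

def quenched_step(tree, env, x, rng):
    """One step of the quenched walk with frozen transition probabilities."""
    return rng.choices(tree[x], weights=env[x])[0]

tree = {"O": ["a", "b"], "a": ["O"], "b": ["O"]}
alpha = {frozenset(("O", "a")): 1.0, frozenset(("O", "b")): 2.0}
rng = random.Random(2)

# Annealed frequency of the first jump O -> b over fresh environments;
# it should match the first-step probability alpha_b / (alpha_a + alpha_b) = 2/3.
n = 30000
hits = sum(
    quenched_step(tree, sample_environment(tree, alpha, rng), "O", rng) == "b"
    for _ in range(n)
)
freq = hits / n
```

This only checks the one-step marginal; the full path-wise equivalence is the content of Lemma 3.6 below.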
Given $v \in T$, let the positive initial weights from $v$ to its neighbouring vertices $\overleftarrow{v}, v_1, \ldots, v_{\#v}$ be denoted by the vector $(\alpha_{e_0(v)}, \alpha_{e_1(v)}, \ldots, \alpha_{e_{\#v}(v)})$, with $\alpha_{e_0(v)}$ being the positive weight to the parent of $v$. We define the set of environments on $T$ as the set $\Omega_T$ of collections $\omega = (\omega_v)_{v \in T}$, where each $\omega_v$ is a probability vector indexed by the edges incident to $v$. We shall denote by $\omega$ a random environment sampled from $\Omega_T$.
Definition 3.5. For $\omega = (\omega_v)_{v \in T} \in \Omega_T$, the quenched random walk in random environment $\omega$ starting at $O$ is the Markov chain on $T$ starting at $O$ and with transition probabilities $(\omega_v)_{v \in T}$. We denote its law by $\tilde{P}_{O,\omega}$. Thus, $\tilde{P}_{O,\omega}$-a.s., $X_0 = O$ and, for all $n \geq 0$ and all vertices $y$ adjacent to $X_n$, $\tilde{P}_{O,\omega}(X_{n+1} = y \mid X_n) = \omega_{X_n}(\{X_n, y\})$.

We define the Dirichlet distribution on $T$ with parameters $(b_v = (b^e_v)_{v \in e})_{v \in T}$ as the product over the vertices $v \in T$ of the Dirichlet distributions with respective parameters $b_v$. Now let us consider the annealed law $\tilde{P}_O$, obtained by averaging $\tilde{P}_{O,\omega}$ over an environment $\omega$ sampled from this Dirichlet distribution. Recall from Definition 3.4 that $P_O$ denotes the law of the LERRW.

Lemma 3.6 ([Pem88, Lemma 2]). Fix a tree $T$. If $\alpha = (\alpha_e)_{e \in E(T)}$ is a collection of positive initial weights for a LERRW, then for any $k \geq 0$ and any $O = x_0, x_1, \ldots, x_k \in T$, the probability that the LERRW follows the path $x_0, x_1, \ldots, x_k$ under $P_O$ equals the probability that the RWDE follows this path under the corresponding annealed law $\tilde{P}_O$.

Remark 3.7. On general (non-tree) graphs this representation of non-directed LERRW is not available, since a random walk may return to a vertex $v$ via a different edge from that which it used to leave $v$; the urn models at the vertices therefore do not update independently of each other. However, on general graphs, non-directed LERRWs can still be seen as random walks in an explicit correlated random environment; see for instance [ST15]. Additionally, it is possible to represent a directed LERRW on general graphs as a RWDE, see for example [ST17, Lemma 2].
In this paper we want to consider a LERRW on the random tree $T_n$. We consider the quenched law of LERRW on $T_n$, by which we mean that we first sample $T_n$ and then run a LERRW on $T_n$ started at $\overleftarrow{O}_n$ according to Definition 3.4. In order to obtain a non-trivial scaling limit, we will consider a LERRW on $T_n$ with initial weights given by (1) and reinforcement parameter $\Delta$. In light of Lemma 3.6, we will construct the scaling limit of LERRW on $T_n$ by instead constructing the scaling limit of the corresponding RWDE on $T_n$, which can be achieved by representing it as an electrical network endowed with a resistance metric and a measure, as outlined in Section 3.1. This is equivalent to sampling a random environment $\omega = (c(e_{\mathrm{root}}), W^{(n)})$ and defining a resistance metric $R_n$ and a measure $\nu_n$ from $\omega$ as in (12) and (13). The edge $\{\overleftarrow{O}_n, O_n\}$ will play the role of $e_{\mathrm{root}}$.
To sample the random environment, we therefore define the parameters $b^{(n)} = (b^{(n)}_v)_{v \in T_n}$ according to the initial weights (1). We will sample the Dirichlet distribution with parameter $b^{(n)}_v$ using Lemma 3.3. Since we are only interested in the ratios of the Dirichlet weights, we can work directly with the non-normalised Gamma weights, so we assume that our random environment is given by (14), and without loss of generality take $c(\{\overleftarrow{O}_n, O_n\}) = 1$ for simplicity.
In the setup of Section 3.1, we therefore obtain a potential, a resistance metric and a measure from the environment. For simplicity, we henceforth suppress the dependence on $\omega_n$ and write $\rho_v$, $V^{(n)}$, $R_n$ and $\nu_n$ instead of $\rho_v(\omega_n)$, $V^{\omega_n}$, $R^{\omega_n}$ and $\nu^{\omega_n}$ respectively. Moreover, we let $Z^{(n,pl)}$ denote the stochastic process associated with the triple $(T^*_n, R_n, \nu_n)$, where $T^*_n$ is the planted tree of Section 3.1. Since $Z^{(n,pl)}$ is defined on the planted tree, we define $Z^{(n)}$ to be its trace on the unplanted tree $T_n$. It follows from our definitions that $Z^{(n)}(t)$ is a continuous-time random walk on $T_n$, with exp(1) holding times at each vertex, and that at each jump time, the transition probabilities from a vertex are given by the Dirichlet weights of (15).
Finally, in light of Lemma 3.6, in order to connect this with LERRW we need to consider the law of $Z^{(n)}$ annealed over the Dirichlet weights. We denote the resulting process again by $Z^{(n)}$ and its annealed law by $\tilde{P}$. It follows from Lemma 3.6 that under $\tilde{P}$, $Z^{(n)}$ has the law of a quenched (with respect to the randomness of $T_n$) continuous-time LERRW on $T_n$, with initial weights (1) and reinforcement parameter $\Delta$.

Candidate for the scaling limit
To construct the scaling limit of $(T_n, R_n, \nu_n)$ we need to imitate the definitions of (17) in the continuum. To this end, we take $T^c_\gamma$ as in Section 2.3, denote its root by $O$, and for all $\sigma, \sigma' \in T^c_\gamma$ we set (cf. (5)) a distance $d^{(\alpha)}(\sigma, \sigma')$, expressed in terms of the most recent common ancestor $\sigma \wedge \sigma'$ of $\sigma$ and $\sigma'$. This defines a metric on $T^c_\gamma$.
Definition 3.8 (Snake process). Let $(\phi^{(\alpha)}(\sigma))_{\sigma \in T^c_\gamma}$ be the $\mathbb{R}$-valued Gaussian process with law P whose distribution, given $T^c_\gamma$, is characterised by its covariance, expressed in terms of $d^{(\alpha)}$. Since $d^{(\alpha)}$ is a metric on $T^c_\gamma$, it can be seen by the same arguments as in [DLG05, Section 6] that such a process $\phi^{(\alpha)}$ exists, and that it has a continuous modification, with bounded sample paths, P × P-almost surely. In the continuum, $\phi^{(\alpha)}$ will play the same role as the potential $V^{\omega_n}$ in (17).
To capture that our diffusions are processes on "natural scale", it is desirable to introduce the notion of a length measure on $T^c_\gamma$ which extends Lebesgue measure on $\mathbb{R}$. For real trees this was first presented in [EPW06] and later extended to any separable 0-hyperbolic metric space in [ALW17]. The skeleton of $(T^c_\gamma, d)$ is the union of the open geodesic arcs between pairs of points of the tree. Recalling that $T^c_\gamma$ is a separable pointed metric space, observe that if $D \subset T^c_\gamma$ is a dense countable subset, then this definition is unchanged when the union is taken over pairs of points in $D$. In particular, this yields a well-defined length measure on the skeleton. Alternatively, the length measure is the trace onto the skeleton of the $\gamma$-stable tree of the one-dimensional Hausdorff measure on it.
Now fix the reinforcement parameter $\Delta > 0$, let $\mathcal{W}$ be the space of continuous functions $T^c_\gamma \to \mathbb{R}$ vanishing at the root, and let $\Omega$ be the space of processes $\mathbb{R}_+ \to T^c_\gamma$. Given a realisation $\phi^{(\alpha)} \in \mathcal{W}$, we firstly define a resistance metric $R_\phi(x, y)$ for all $x, y \in T^c_\gamma$, and secondly define $\nu_\phi$ to be the measure which is absolutely continuous with respect to the measure $\mu$ defined in (6), with density determined by $\phi^{(\alpha)}$. We let $T_{\gamma,\alpha}$ denote the random mm-space $(T^c_\gamma, R_\phi, \nu_\phi)$, let $X = (X_t)_{t \geq 0}$ denote the process canonically associated with it via the resistance form of [Cro18, Definition 2.1], and let $\tilde{P}_{O,\phi}$ denote the quenched law of $X$ when started from $O$. Finally, given $T^c_\gamma$, we denote by $P_O$ the corresponding annealed probability measure of the process on $\mathcal{W} \times \Omega$.

Remark 3.9. We can also view $((X_t)_{t \geq 0}, \tilde{P}_{O,\phi})$ as a $\nu_\phi$-Brownian motion on $(T^c_\gamma, R_\phi)$ as characterised in [AEW13, Proposition 1.9]. In this setting, $(X_t)_{t \geq 0}$ is a diffusion process with a regular Dirichlet form. As usual, by $C_\infty(T^c_\gamma)$ we refer to the space of continuous functions which vanish at infinity. A function $u$ belongs to $\mathcal{A}$ if and only if $u$ is locally absolutely continuous, and we stress that the gradients of $u$ and $v$ are to be understood with respect to the resistance metric $R_\phi$. We will not directly use the theory of Dirichlet forms in this paper (though it is hidden behind the result we apply from [Cro18]).
Remark 3.10 (Connection to 1d results). The analogue of [LST20, Proposition 1.9] (which is written with a scaling factor of $2n$ rather than $na^{-1}_n$) equally applies to the LERRW on a single branch of $n^{-1}a_n T_n$ with the initial weights of (1). Comparing with (1), we see that we can formulate our process in the setting of [LST20, Proposition 1.9] by an appropriate choice of the functions $L_0$ and $S_0$ appearing there. According to [LST20, Proposition 1.9], such a rescaled LERRW on $\mathbb{Z}$ converges to a diffusion in a random potential built from a standard Brownian motion $W$ on $\mathbb{R}$ (the final term of $\log \Delta$ appears since we have rescaled everything by $\Delta$ to fit into the framework of [LST20]). In particular, we can identify $W((|x| + 1)^{1-\alpha} - 1)$ with $\phi^{(\alpha)}(|x|)$, and $W(\log(|x| + 1))$ with $\phi^{(1)}(|x|)$, with a slight abuse of notation. Substituting the above values of $L_0(x)$ and $S_0(x)$, we obtain a limiting potential that is consistent with (21) up to the constant $\log 2$ when $\alpha = 1$ (but note that the effect of adding a constant cancels out when inserted into both (21) and (22)).
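The time-change identification above can be illustrated numerically: on a single branch, the limiting potential is a Brownian motion $W$ evaluated at $\tau(x) = (|x| + 1)^{1-\alpha} - 1$ for $\alpha < 1$, or $\tau(x) = \log(|x| + 1)$ when $\alpha = 1$. The Python sketch below simulates $W$ on a grid and reads off the potential; the grid size, seed and helper names are illustrative, and additive constants such as the $\log \Delta$ term are ignored:

```python
import math
import random

def brownian_path(tmax, dt, rng):
    """Standard Brownian motion on [0, tmax], sampled on a grid of step dt."""
    w = [0.0]
    for _ in range(int(round(tmax / dt))):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

def potential(x, alpha, w, dt):
    """Read off W(tau(x)) with tau(x) = (x+1)^(1-alpha) - 1 for alpha < 1
    and tau(x) = log(x+1) for alpha = 1 (constants ignored in this sketch)."""
    tau = math.log(x + 1.0) if alpha == 1 else (x + 1.0) ** (1.0 - alpha) - 1.0
    return w[int(tau / dt)]

rng = random.Random(3)
dt = 0.001
w = brownian_path(10.0, dt, rng)  # horizon large enough to cover tau(x)
vals = [potential(x, 0.5, w, dt) for x in (0.0, 1.0, 4.0)]
```

Since $\tau(0) = 0$ in both regimes, the potential always vanishes at the root, matching the normalisation of $\phi^{(\alpha)}$.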
4 Scaling limit of $(T_n, R_n, \nu_n)$ and the LERRW $X^{(n)}$

Our goal in this section is to prove the following proposition. The space $(T_n, R_n, \nu_n)$ is as defined just below (17).
Proposition 4.1. Under the joint law P × P and with initial weights as in (1), the following convergence holds with respect to the GHP topology as $n \to \infty$:
$$\left(T_n,\, n^{-1}a_n R_n,\, (2n)^{-1}\nu_n\right) \longrightarrow \left(T^c_\gamma,\, R_\phi,\, \nu_\phi\right).$$
Throughout the section, we will work pointwise on the probability space (Ω, F, P) on which we defined $(T_n, d_n, \mu_n)$, and on which the convergence of (4) holds almost surely. This means that most of the statements that follow should be written conditionally on $T_n$ or on $T^c_\gamma$. To make the arguments clearer to follow, we have not written this explicitly in the statements or the proofs, and instead ask the reader to keep this in mind throughout. There is one specific case (Proposition 4.8) where we need to restrict to a certain set $A_{n,\varepsilon} \subset \Omega$, and in this case we make it explicit.
We also let $V^{(n,\alpha)}(2nt) = V^{\omega_n}(x_{\lfloor 2nt \rfloor})$, where $V^{\omega_n}$ is as defined in (17); at times we will abuse notation in this way without further comment. We will prove Proposition 4.1 in two main steps. We first apply Skorohod's representation theorem and assume that (7) holds almost surely on (Ω, F, P). On this space, the set of all pairs $(x_{\lfloor 2nt \rfloor}, p_{H^{(\gamma)}}(t))$ defines a correspondence between $n^{-1}a_n T_n$ and $T^c_\gamma$ with distortion going to 0; see (8) and Section 2.2 for details. We begin by proving the following claim.
Claim 4.2. Take α ≤ 1. Then, for almost every ω ∈ Ω, the convergence (24) holds with respect to the topology of uniform convergence on C([0, 1]).
Then, we apply Skorohod's representation theorem a second time to work on a probability space where the convergence of (24) also holds almost surely. On this space, we show that the rescaled resistances and measures defined in (17) converge to the limit candidates suggested in (21) and (22).

The limiting potential
In this subsection we establish (24) above.

Proof of Claim 4.2
Throughout this section, recall that (Ω, F, P) is a probability space on which the convergence of (7) holds almost surely. This also implies that the convergence of (4) holds almost surely (see [LG06, Theorem 4.2]). For $n \geq 1$ and $x \in T_n$ we define the shorthand $\Delta_n$; note though that the reinforcement parameter for the process defined in Section 3.2 is still $\Delta$, not $\Delta_n$. If $x \in T_n$, then the Dirichlet weights for the model have parameters given by (16), so that the weight at $x$ is determined by (1). For each $x \in T_n \setminus \{O_n\}$, since $\rho_x$ is the ratio of two independent Gamma random variables, we observe that $\rho_x$ follows a beta prime distribution, and that $(\rho_x)_{x \in T_n \setminus \{O_n\}}$ is a sequence of independent random variables. Here $\beta'(a, b)$ denotes the beta prime distribution with positive parameters $a$ and $b$, which has probability density function
$$f(y) = \frac{y^{a-1}(1+y)^{-a-b}}{B(a, b)}, \qquad y > 0,$$
where $B$ is the beta function.
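The beta prime representation can be checked by direct simulation: the ratio of independent Gamma(a, 1) and Gamma(b, 1) variables has mean $a/(b-1)$ when $b > 1$. A short Python sketch (the parameter values are illustrative):

```python
import random

def beta_prime_sample(a, b, rng):
    """Ratio of independent Gamma(a, 1) and Gamma(b, 1) variables, which
    has the beta prime distribution beta'(a, b)."""
    return rng.gammavariate(a, 1.0) / rng.gammavariate(b, 1.0)

rng = random.Random(4)
a, b = 2.0, 4.0
n = 50000
mean = sum(beta_prime_sample(a, b, rng) for _ in range(n)) / n

# For b > 1 the beta prime distribution has mean a / (b - 1).
expected = a / (b - 1.0)
```

With 50000 samples the empirical mean agrees with $a/(b-1) = 2/3$ to within a few multiples of the standard error.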
We start by giving some preliminary claims about the expectation and variance of the quantities log ρ x summed along a branch. The proofs are elementary but are included in the appendix.
Claim 4.5. For all α ≤ 1 and almost every ω ∈ Ω, it holds for all x ∈ T n and all Proof. See Lemma A.3 in the Appendix.
We will prove the convergence of Claim 4.2 in two steps. Firstly we consider the recentred processes $M^{(n,\alpha)}$, defined for $0 \leq i \leq 2n$ and extended to non-integer time indices by interpolation. We then prove that $M^{(n,\alpha)}$ converges to an appropriate snake process as follows. Firstly, by applying a martingale CLT along finitely many branches of $T_n$, we deduce that the finite-dimensional marginals of $M^{(n,\alpha)}$ converge to those of the snake process. To this end, we verify the following conditions of the martingale CLT.

1. Almost surely on Ω, it holds for each t ∈ (0, 1) that the conditional variances converge as n → ∞.
2. First note that, for any t ∈ (0, 1), Also note that it follows from Claim 4.3(i) and Claim 4.5 that almost surely on Ω, we have for all Taking k = (na −1 n ) 1/2 , the latter expression in (27) is upper bounded by n sup For any ε > 0, there exists N ε < ∞ such that the minimum in the final integrand is equal to the second expression for all y ∈ (ε 2 , ∞). By dominated convergence, this integral therefore tends to 0 as n → ∞, as required. (In fact, for fixed ε > 0, this upper bound is also uniform over t ∈ (0, 1)).
3. Without loss of generality we can assume that $t_1, \ldots, t_k \in \frac{1}{2n}\mathbb{Z}$; the general case follows since we interpolate and the contribution of a single jump goes to zero in probability as $n \to \infty$ (for example by Markov's inequality). Note that, for fixed $t \in (0, 1)$, parts 1 and 2 exactly verify the conditions given in [HH14, Corollary 3.1 and Theorem 3.2], implying the one-dimensional convergence (29) as $n \to \infty$. Therefore, for each $k \geq 1$, given such a sequence $0 < t_1 < \ldots < t_k < 1$, we can define an augmented sequence obtained by adding the time indices corresponding to all the most recent common ancestors of pairs of vertices in the original sequence. Given this augmented sequence, we can then sum the contributions along the relevant branch segments between vertices of the form $x_{t_i}$ and $x_{t_j}$, where $t_i$ and $t_j$ are such that there is no $\ell \leq k$ with $t_i \preceq t_\ell \preceq t_j$. Since the process evolves independently along distinct branch segments and the sum of independent Gaussians is Gaussian, the finite-dimensional result then follows from (29).
In order to strengthen the above convergence, we will verify Kolmogorov's tightness condition. For this, for α < 1, we first define the distance $d^{(\alpha)}_n$ as in (30), and give a preliminary lemma.

Lemma 4.7. Almost surely on Ω, for all $n \geq 1$ and all $s, t \in [0, 1]$, the increments of the potential are controlled in terms of $d^{(\alpha)}_n(s, t)$.

Proof. By breaking at the most recent common ancestor, it is enough to prove this when $s \preceq t$. If α ≤ 0, the bound follows by writing out the sum directly; if instead α ∈ (0, 1), we treat two cases.

We are now ready to verify the tightness condition.
Proposition 4.8 (Kolmogorov's condition). For every ε > 0 and n ≥ 1, there exist $p > 0$, $q > 1$, $C_{\varepsilon,p,q} < \infty$ and an event $A_{n,\varepsilon} \subset \Omega$ with $P(A_{n,\varepsilon}) \geq 1 - \varepsilon$, such that on the event $A_{n,\varepsilon}$, the corresponding moment bound holds for all $s, t \in [0, 1]$. In particular, Kolmogorov's tightness condition is satisfied on the event $A_{n,\varepsilon}$ and, for almost every ω ∈ Ω, the convergence of (26) holds in distribution on the space C([0, 1]) equipped with the uniform topology.
Proof. First note that it follows from [Mar20, Lemma 1.4] and [Kor17, Theorem 2] that, for any $\gamma' < \frac{\gamma-1}{\gamma}$ and ε > 0, we can choose $D_\varepsilon, C_{\varepsilon,\gamma'} < \infty$ such that (31) holds with probability $1 - \varepsilon$. (Note that [Mar20, Lemma 1.4] actually states a result using the lexicographical ordering rather than the contour ordering that we use, but since the difference in the labels of two fixed vertices can only decrease by at most a factor of 2 in the contour ordering, this immediately implies the same result with the contour ordering, and therefore the first line of (31).)
Case α < 1. Assume for now that $2ns$ and $2nt$ are integers and that $x_{2ns}$ is an ancestor of $x_{2nt}$. This will then extend to the general case by breaking paths at the most recent common ancestor and since $M^{(n,\alpha)}$ is defined by interpolation. Note that by (28), we almost surely have a uniform bound on the relevant increments. Therefore, for all $1 \leq k \leq (na^{-1}_n)^{1/2}$, we deduce that there exists $C_\Delta < \infty$ for which the corresponding moment bound holds (using also that, given $T_n$, the sequence $(\rho_x)_{x \in T_n}$ is independent), where $d^{(\alpha)}_n(s, t)$ is given by (30). Applying Lemma 4.7, we obtain a further upper bound. Assume first that $d_n(s, t) \leq 1$. In the case α ≤ 0, we deduce from a Chernoff bound that there exists $C_{\varepsilon,\Delta,\alpha} < \infty$ such that on the event $A_{n,\varepsilon}$, a tail bound holds for all $y > 0$, where (33) was used to provide the first bound in the inequality. Moreover, by Claim 4.5 the same result holds on replacing $\rho_x$ by $\rho_x^{-1}$, meaning that we can instead consider the absolute value of the sum in the bound above.
We obtain the same upper bound in the case α > 0 (modifying the constant $C_{\varepsilon,\Delta,\alpha}$ if necessary). We deduce that, on the event $A_{n,\varepsilon}$, for all $p > 1$ there exists $C_{p,\varepsilon,\Delta,\alpha} < \infty$ such that the moment bound holds for all $s, t \in [0, 1]$ with $d_n(s, t) \leq 1$. By replacing $C_{p,\varepsilon,\Delta,\alpha}$ with $D_\varepsilon C_{p,\varepsilon,\Delta,\alpha}$, this extends to the case $d_n(s, t) > 1$ on the event $A_{n,\varepsilon}$. Finally, this extends to the general case where $2ns$ and $2nt$ are not integers and $x_{2ns}$ is no longer an ancestor of $x_{2nt}$, by breaking paths at the most recent common ancestor and since $M^{(n,\alpha)}$ is defined by interpolation.
We now suppress the dependence on $\Delta$ and $\alpha$, since these are assumed to be fixed. We deduce that, on the event $A_{n,\varepsilon}$, for all $\gamma' < \frac{\gamma-1}{\gamma}$ there exists $C_{p,\varepsilon,\gamma'}$ (also applying (31)) such that the stated bound holds for all $s, t \in [0, 1]$. This proves the result by taking $p$ large enough.
Case α = 1. Again we can assume, without loss of generality, that $2ns$ and $2nt$ are integers and that $x_{2ns}$ is an ancestor of $x_{2nt}$. This time, repeating the arguments that led to (32) gives an analogous bound in terms of $d_n(s, t)$. Again we assume that $d_n(s, t) \leq 1$. As in (34), this implies that there exists $C_{\varepsilon,\Delta} < \infty$ such that the corresponding tail bound holds for all $y > 0$. Again by Claim 4.5 we can replace $\rho_x$ by $\rho_x^{-1}$ in this argument, so we can continue as in (35) to deduce that, on the event $A_{n,\varepsilon}$, for any $p > 1$ there exists $C_{p,\varepsilon,\Delta} < \infty$ such that the moment bound holds for all $s, t \in [0, 1]$ with $d_n(s, t) \leq 1$. The proof then proceeds as in the case α < 1.

Convergence of the metric measure spaces
In Section 4.1, we showed that, almost surely on Ω, the relevant processes converge in distribution with respect to the Skorohod-J 1 topology and therefore with respect to the uniform topology since the limit process is continuous. Therefore, applying Skorohod's Representation theorem again (the space of continuous functions on [0, 1] is separable), we will work on a probability space (Ω ′ , F ′ , P ′ ) where these functionals additionally converge almost surely with respect to the uniform topology.
Proof of Proposition 4.1. Part 1: convergence of metrics. In the first part of the proof we will show that the distortion of the natural correspondence, defined in (38), converges to 0 as $n \to \infty$, almost surely on Ω′. It is enough to prove this just in the case $t = 0$, by breaking at the pair $(x_{\lfloor 2nr \rfloor}, p_{H^{(\gamma)}}(r)) \in R_n$, where $r \in [s, t]$ is any time between $s$ and $t$ at which the minimum of $H^{(\gamma)}$ is achieved, and where $r_n$ is the index of the most recent common ancestor of $x_{\lfloor 2ns \rfloor}$ and $x_{\lfloor 2nt \rfloor}$. We will therefore first establish (38) when $t = 0$, and then show that $n^{-1}a_n R_n(x_{r_n}, x_{\lfloor 2nr \rfloor}) \to 0$ uniformly over $s, t \in [0, 1]$.
By the definition in (21), we can compare the discrete and continuum resistances directly. For $r \in (0, 1)$, let us set the quantity appearing in (40). Combining (39) and (40), we deduce a bound (41) valid for any pair $(x_{\lfloor 2ns \rfloor}, p_{H^{(\gamma)}}(s)) \in R_n$. There are two steps to show that, almost surely on Ω′, the right-hand side converges to 0 uniformly over $s \in [0, 1]$. Firstly, the first term converges to 0 uniformly over $s \in [0, 1]$ by (7) and (24) (recall that we are working on a probability space where the convergences of (7) and (24) hold almost surely). Secondly, using (20) it is not hard to see that the second term also goes to 0 almost surely as $n \to \infty$, by (7) (which holds almost surely on Ω′).
This therefore shows that each individual term on the right-hand side of (41) converges to 0 uniformly in $s \in [0, 1]$, completing the first part of the proof, namely that the distortion of the correspondence $R_n$ defined in (38) converges to 0 when $t = 0$.
The second part of the proof is to show that $\sup_{s,t \in [0,1]} n^{-1}a_n R_n(x_{r^{s,t}_n}, x_{\lfloor 2nr^{s,t} \rfloor}) \to 0$, where $r^{s,t} = s \wedge t$ and $r^{s,t}_n$ is the index of the most recent common ancestor of $x_{\lfloor 2ns \rfloor}$ and $x_{\lfloor 2nt \rfloor}$. However, note that (writing $r = r^{s,t}$ and $r_n = r^{s,t}_n$)
$$n^{-1}a_n R_n(x_{r_n}, x_{\lfloor 2nr \rfloor}) = n^{-1}a_n \sum_{x \in [x_{r_n}, x_{\lfloor 2nr \rfloor}] \setminus \{x_{r_n} \wedge x_{\lfloor 2nr \rfloor}\}} e^{V^{(n,\alpha)}(x)} \leq n^{-1}a_n\, d_n(x_{r_n}, x_{\lfloor 2nr \rfloor}) \sup_{x \in T_n} e^{V^{(n,\alpha)}(x)}.$$
By (7) (which holds almost surely on Ω′), $n^{-1}a_n d_n(x_{r_n}, x_{\lfloor 2nr \rfloor}) \to 0$ uniformly over $s, t \in [0, 1]$, almost surely on Ω′. Similarly, by (24), for almost every ω ∈ Ω′ we have that $\sup_{x \in T_n} e^{V^{(n,\alpha)}(x)}$ is bounded by a constant (which may depend on ω, but which can nevertheless be upper bounded on a set of probability 1 − ε for any ε > 0). Therefore, we deduce that (43) holds almost surely on Ω′, as required.
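The series-law inequality used here, namely that the resistance along a path is at most the path length times $\sup_x e^{V(x)}$, can be illustrated numerically; the Python sketch below uses arbitrary Gaussian potential values standing in for the actual potential $V^{(n,\alpha)}$:

```python
import math
import random

def path_resistance(potential_vals):
    """Resistance of a path in a potential V: by the series law for the
    network, it is the sum of e^{V(x)} over the vertices of the path."""
    return sum(math.exp(v) for v in potential_vals)

# Arbitrary stand-in potential values along a branch of 50 vertices.
rng = random.Random(5)
vals = [rng.gauss(0.0, 1.0) for _ in range(50)]

r = path_resistance(vals)
# Bound used in the proof: R <= (number of vertices) * sup_x e^{V(x)}.
bound = len(vals) * math.exp(max(vals))
```

The bound holds for any choice of potential values, since each summand is at most the supremum.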
Part 2: convergence of measures.
Recall from (17) that the measure of a non-root vertex $x \in T_n$ with $\#x$ offspring is expressed in terms of its incident Dirichlet weights, where $t_{x_i}$ is the minimal $t$ such that $(x_i, p_{H^{(\gamma)}}(t)) \in R_n$, where $x_0$ denotes the parent of $x$ and $(x_i)_{i=1}^{\#x}$ denote its children. This determines the measure of any set $A_n$ of vertices in $T_n$. We introduce an intermediate measure $\tilde{\nu}_n$, where $u_x$ is the minimal $u$ such that $(x, p_{H^{(\gamma)}}(u)) \in R_n$. We will now show that it suffices to consider $\tilde{\nu}_n$ in place of $\nu_n$ to obtain the Prokhorov limit. For $s \in (0, 1)$, let us set $g^{(n,\alpha)}(s) = e^{-V^{(n,\alpha)}(2ns)}$.
Note that, if $u_x$ and $t_{x_i}$ are defined as above, then the quantity $\Delta_{g^{(n,\alpha)}} := \sup_{x \in T_n, x_i \sim x}$ of the corresponding increments converges to 0, by the convergence of (24) and since [0, 1] is compact. Therefore, if $A_n \subset T_n$, then the measures of $A_n$ under $\nu_n$ and $\tilde{\nu}_n$ are asymptotically equivalent as $n \to \infty$. Therefore, letting $r_n = \mathrm{dis}(R_n)$ as in (38), it is sufficient to bound the Prokhorov distance between $(2n)^{-1}\tilde{\nu}_n$ and $\nu_\phi$ by $r_n$ under the canonical Gromov-Hausdorff embedding $F_n = T_n \sqcup T^c_\gamma$, since $r_n$ converges to 0 as $n \to \infty$ (by Part 1 of this proof). We proceed as follows. First, take a subset $B \subset T^c_\gamma$, and for $s \in (0, 1)$ define the appropriate rescaled quantity (according to whether α < 1 or α = 1).
Let $I_{n,i} = [\frac{i}{2n}, \frac{i+1}{2n})$ for $0 \leq i < 2n$, and $B_n = \cup_{0 \leq i < 2n}\{x \in T_n : \exists s \in B' \text{ such that } s \in I_{n,i} \text{ and } x = x_i\}$, where $x_i$ corresponds to the $i$-th node of $T_n$ in contour exploration order. Clearly $B' \subset \cup_{i : x_i \in B_n} I_{n,i}$. Now note that if $x \in B_n$, then there exist $i \leq 2n$ and $s \in B'$ with $s \in I_{n,i}$, so that $(x_{\lfloor 2ns \rfloor}, p_{H^{(\gamma)}}(s)) \in R_n$. This further entails that $B_n \subset B^{r_n}$. We now prove the reverse statement. Let $A_n$ be a set of vertices in $T_n$. For any $x \in A''_n$, there exists $t \in A'_n$ with $x = p_{H^{(\gamma)}}(t)$ and $t \in I_{n,i}$ for some $i$ with $x_i \in A_n$. Moreover, this entails that $i = \lfloor 2nt \rfloor$, and hence $(x_{\lfloor 2nt \rfloor}, p_{H^{(\gamma)}}(t)) \in R_n$. It follows that $x \in A^{r_n}_n$, i.e. $A''_n \subset A^{r_n}_n$; hence it follows from (44), together with (45), that $d^{F_n}_P((2n)^{-1}\tilde{\nu}_n, \nu_\phi) \leq r_n$. The desired result follows.
The following corollary, obtained by applying [Cro18, Theorem 7.2], yields the quenched convergence of the LERRW to a mixture of diffusions, with the limit law of this process given by the annealed law $P_O$ as defined in (23).
Corollary 4.9. Let $X^{(n)}$ have the law of a discrete-time LERRW on $T_n$, with initial weights (1) and reinforcement parameter $\Delta$. Let $X$ be as defined by (23). Then, under $P^{(\alpha^{(n)})}_{O_n}$, the rescaled process $\big(n^{-1}a_n X^{(n)}_{\lfloor 2n^2 a^{-1}_n t \rfloor}\big)_{t \geq 0}$ converges in distribution to $X$.

Proof. Recall that we are working on a probability space where the convergence of (7) holds almost surely. It follows that, almost surely on this probability space, there exists a canonical embedding into the common metric space $n^{-1}a_n T_n \sqcup T^c_\gamma$ in which the GHP distance between the random elements in Proposition 4.1 goes to zero as $n \to \infty$. We work pointwise on this probability space, so that we only need to consider the randomness of the Dirichlet weights.
We first consider a continuous-time LERRW with exp(1) holding times at each vertex. By our choice of metric and measure, it follows that the stochastic process associated with the discrete space (T n , R n , ν n ) has the quenched law of a RWDE also with exp(1) holding times at each vertex. Therefore, by Lemma 3.6 the continuous-time LERRW corresponds to the annealed law of this RWDE exactly as considered in (18). It therefore follows directly from [Cro18, Theorem 7.2] (the non-explosion condition (39) appearing there is satisfied since all the spaces are compact) that this continuous-time LERRW converges in distribution to X as defined by (23). The extension to the discrete-time LERRW in place of the continuous-time one then follows by a straightforward application of the law of large numbers on the time index.
Clearly Theorem 1.1 follows directly from Corollary 4.9.

Properties of T ∞ γ and its Gaussian potential
In order to understand the long-time behaviour of a LERRW it is more natural to consider it on an infinite critical Galton-Watson tree T ∞ as introduced in Section 2.4. Analogously to Section 3.2, one can define a RWRE on T ∞ determined by (ρ v ) v∈T∞ , V, R and ν on T ∞ , as defined in (17). Once again, the law of the LERRW can be represented as a RWRE as in Lemma 3.6.
With offspring distribution as in (2), Kesten's tree $T^\infty$ is well known to converge under rescaling to a non-compact version of a stable tree, coded by two independent Lévy excursions and an appropriate immigration measure [Duq09]; these are known as stable sin-trees and we will denote them by $T^\infty_\gamma$ for γ ∈ (1, 2) (in fact, due to uniform re-rooting invariance, one can also construct the same objects using only the Lévy processes, but we do not explore this here). Analogously to $T^c_\gamma$, one can define a canonical metric and a measure on $T^\infty_\gamma$ using the canonical projection from the coding Lévy processes, and additionally define a snake process $\phi^{(\alpha)}$ satisfying all the properties of Definition 3.8. Given these, we can define a resistance metric $R_\phi$ and a measure $\nu_\phi$ on $T^\infty_\gamma$ exactly as in (21) and (22). (To keep the notation light, we only write "∞" explicitly on the spaces $T^\infty$ and $T^\infty_\gamma$, and not on all the metrics and measures.) For non-compact mm-spaces, GHP convergence naturally extends to Gromov-Hausdorff-vague (GHv) convergence, which is equivalent to GHP convergence of metric balls $B(O_n, r) \to B(O, r)$ for almost every $r > 0$ [ALW16, Definition 5.8]. The results of Proposition 4.1 extend straightforwardly to these balls, and we therefore deduce the following proposition. The proofs required to extend to the non-compact setting are not illuminating, but can be carried out using the same strategy as in the proof of [Arc20a, Theorem 1.2].
Exactly as in Corollary 4.9, it has the following immediate corollary, by [Cro18, Theorem 7.2]. Note that condition (40) of [Cro18, Theorem 7.2] is satisfied as a consequence of the unique spine to infinity.
Corollary 5.2. Let $X^{(n)}$ be the discrete-time LERRW on $T^\infty$, with initial weights as in (1) and reinforcement parameter $\Delta$. Let $X$ be the diffusion on $T^\infty_\gamma$ defined analogously to (23). Then $\big(n^{-1}a_n X^{(n)}_{\lfloor 2n^2 a^{-1}_n t \rfloor}\big)_{t \geq 0}$ converges in distribution to $X$.

Corollary 5.2 shows that the quenched scaling limit of LERRW, which we denote $(X_t)_{t \geq 0}$, can be represented as a mixture of diffusions in random environments parametrised by different realisations of the functional $\phi^{(\alpha)}$. In particular, for a fixed realisation of $T^\infty_\gamma$ we do not have a pointwise correspondence between realisations of the law of $\phi^{(\alpha)}$ and realisations of $(X_t)_{t \geq 0}$; instead, we must average over $\phi^{(\alpha)}$ to get the equality in distribution. Nevertheless, we can transfer almost sure results for $\phi^{(\alpha)}$ directly to $(X_t)_{t \geq 0}$ to prove Theorems 1.2 and 1.4. In this section we establish some almost sure properties of $T^\infty_\gamma$ and $\phi^{(\alpha)}$. For some of the following propositions, the stated results are well known but we could not find them explicitly written in the literature (for example Proposition 5.5). We have therefore provided an outline of the proofs but omitted some details (which would be long to justify and somewhat tangential to the main purpose of this paper).
Throughout this section, the notation $d$ and $\mu$ refer to the metric and the measure on the non-compact tree $T^\infty_\gamma$.

Proposition 5.3. P-almost surely, for any ε > 0, the stated volume bounds hold.

Proof. Set $r_m = 2^m$. By monotone convergence (applied twice), scaling invariance of $T^\infty_\gamma$, then monotone convergence again, we reduce to a statement about a single ball. By [DLG06, Theorem 1.4], the final probability on the right-hand side is 1 when considered on the compact stable tree $T^c_\gamma$ instead of $T^\infty_\gamma$. Combining again with scaling invariance, it therefore follows from [Duq09, Theorem 1.3] (which shows that $T^\infty_\gamma$ is a local limit of compact stable trees) that this probability is also 1 on $T^\infty_\gamma$. Finally, since ε > 0 was arbitrary, we can replace it with any δ > 0 and then let δ → 0, and extend more generally to $r \to 0$ since $\mu(B(O, r)) \leq \mu(B(O, 2^{\lceil \log_2 r \rceil}))$.
The second statement follows by the same proof using [DW11, Proposition 1.1] for T c γ .
It follows from [Duq09, Proposition 1.1] that we can define a local time measure $(L^{(r)})_{r > 0}$ on $T^\infty_\gamma$ such that the corresponding occupation density formula holds for any compactly supported continuous function $g : (0, \infty) \to \mathbb{R}$. Similarly, we have the following.

The construction of $T^\infty_\gamma$ in [Duq09] codes $T^\infty_\gamma$ by two Lévy processes plus two stable subordinators that represent immigration. The result of [Duq09, Proposition 1.3] shows that $T^\infty_\gamma$ is also the local limit of compact stable trees conditioned on their height going to infinity. Since compact stable trees satisfy the property of uniform re-rooting invariance, it is also possible to consider them to be coded by two halves of a Lévy bridge, rather than a Lévy excursion (by applying the Vervaat transform to the Lévy excursion that codes them). By taking a limit using this bridge representation, it follows that $T^\infty_\gamma$ can in fact be constructed solely from two Lévy processes, by imagining that time 0 corresponds to the "tip" rather than the "base" of $T^\infty_\gamma$. The argument proceeds exactly as in the proof in [Arc20a, Section 5.1]; we do not repeat it here, but just give the construction.
Construction of $T^\infty_\gamma$.

1. Let $X$ and $X'$ be independent γ-stable, spectrally positive Lévy processes.

2. Define a function $X^\infty : \mathbb{R} \to \mathbb{R}$ by setting $X^\infty_t = X_t$ for $t \geq 0$ and $X^\infty_{-t} = -X'_{t-}$ for $t > 0$.
3. Given $s < t$, let $I_{s,t} = \inf_{r \in [s,t)} X^\infty_r$. Say that $s \preceq t$ if $X^\infty_{s-} \leq I_{s,t}$, and that $s \prec t$ if $s \preceq t$ and $s \neq t$. Also set $s \wedge t = \sup\{r \in \mathbb{R} : r \preceq t \text{ and } r \preceq s\}$. Finally, define an equivalence relation ∼ on $\mathbb{R}$ by setting $s \sim t$ if and only if $d(s, t) = 0$. We define $T^\infty_\gamma$ as the corresponding quotient space, denote by π the canonical projection $\mathbb{R} \to T^\infty_\gamma$, and set the root to be equal to π(0).
We will use this construction, and in particular excursion theory for the Lévy processes, to prove some results about $T^\infty_\gamma$. (This is more convenient than using the construction of [Duq09], since we do not have to deal with the immigration.) In what follows we will let N(·) denote the Itô excursion measure for $X$. This can be thought of as the "law" of an excursion of $X$, though it is not normalisable (see [CK14, Section 3.1.1] for a concise explanation).
For a subset $A \subset T^\infty_\gamma$ and a pseudometric $\tilde{d}$ on $A$, let $D(A, \varepsilon, \tilde{d})$ denote the ε-packing number of $(A, \tilde{d})$; in other words, $D(A, \varepsilon, \tilde{d})$ is the maximal size of a collection of points $(x_i)_{i \leq D(A,\varepsilon,\tilde{d})}$ contained in $A$ such that $\tilde{d}(x_i, x_j) > \varepsilon$ for all $i \neq j$.

Proposition 5.6. For any $p > \frac{2\gamma}{\gamma-1}$, there exists a constant $C_p < \infty$ such that the stated packing bound holds for any $r, t > 0$.

Remark 5.7. This is not an optimal result, but it is sufficient for our purposes.
Proof of Proposition 5.6. Without loss of generality we can assume that $r > t$; otherwise $D(B(O, r), t, d) = 1$. By scaling invariance, it is sufficient to show the result when $r = 1$.
We claim the following: for any $x \in \mathbb{R}$ and δ > 0, there exist $C < \infty$ and $c > 0$ such that the bounds of (47) hold. The result then follows since, if $B(O, 1) \subset \pi([-\lambda, \lambda])$, we can divide the interval $[-\lambda, \lambda]$ into $\lceil 2t^{-\frac{\gamma}{\gamma-1}}\lambda^2 \rceil$ covering intervals of length $t^{\frac{\gamma}{\gamma-1}}\lambda^{-1}$, and with high probability the diameter of each of these intervals is at most $t$; in particular, the probability that this does not happen can be bounded accordingly. Any $t$-packing can have at most one point in each of these intervals (in fact, we have bounded the covering number), so we obtain the required bound on the packing number. The result then follows by performing the appropriate change of variables to apply the tail bound of (48).
By scaling invariance, for the general claim we just need to replace $t$ with $r^{-1}t$, so we see that the claim holds for any $p > \frac{2\gamma}{\gamma-1}$. We therefore just need to prove (47). We will use excursion theory for the Lévy process $X'$ coding the left side of $T^\infty_\gamma$ (i.e. on the negative real line). We therefore let $\overline{X}'_t = \sup_{s \leq t} X'_s$ denote the running supremum process of $X'$, and let $(L(t))_{t \geq 0}$ denote the local time of $\overline{X}' - X'$ at zero, normalised so that $\mathbb{E}\exp\{-\lambda X'_{L^{-1}(t)}\} = e^{-t\lambda^{\gamma-1}}$ (this is well-defined, e.g. [Ber96, Section VIII]). Moreover, we have by [Ber96, Section VIII, Lemma 1] that $L^{-1}$ is a stable subordinator of index $1 - \frac{1}{\gamma}$, and by [CK14, Proposition 3.1(ii)] that the measure $\sum_{s : \overline{X}'_s > \overline{X}'_{s-}} \delta_{(L(s), \Delta_s(X'))}$ is a Poisson point measure with intensity $dl \cdot C_\gamma x^{-\gamma} dx$.

First bound. To prove the first statement, first note that $X^\infty_{-t} = -X'_{t-}$ for $t \geq 0$. It follows that new suprema of $X'$ correspond to backwards minima of $X^\infty$ from $t = 0$. Therefore, $L^{-1}(1) = \inf\{s \geq 0 : L(s) \geq 1\}$ corresponds to the ancestor of 0 on the infinite backbone at distance 1 from the root. Let $S$ denote a stable subordinator with Lévy measure $C_\gamma x^{-\gamma} dx$; the first bound of (47) then follows from the corresponding tail estimates for $S$.
Second bound. We will prove the diameter bound for x = 0; the proof is the same for arbitrary x. Set $x_t = \inf\{s \ge 0 : d(0, -s) > t\}$. P-almost surely, there is a unique path $\Gamma_t$ from O to $\pi(x_t)$ in $T^\infty_\gamma$ of length exactly t. Moreover, the interval $[-x_t, 0]$ codes all the subtrees grafted to one side of this path. Each of these complete subtrees is coded by the Itô excursion measure N, and moreover, $x_t$ is equal to the sum of the lengths of all of these Itô excursions. In fact, the subtrees grafted to this side of the path $\Gamma_t$ form a Poisson point process on this path. Let $M_{t,\lambda}$ denote the number of subtrees of lifetime at least $t^{\gamma/(\gamma-1)}\lambda^{-1}$ grafted to $\Gamma_t$ at a point within distance $t/2$ of O. The subtrees grafted to $\Gamma_t$ are concentrated in groups at certain hubs along $\Gamma_t$, and it follows from [Ber92, Corollary 1] that $M_{t,\lambda}$ stochastically dominates a Poisson random variable with parameter $S_{t/2}\, N(\sigma > t^{\gamma/(\gamma-1)}\lambda^{-1},\, H < t/2)$, where S is a $(\gamma-1)$-stable subordinator by [GH10, Proposition 5.6] (here σ denotes the lifetime of a Lévy excursion, and H the height of the tree it codes). Moreover, this parameter is lower bounded by $c\lambda^{\frac{1}{\gamma}-p}$ with probability at least $1 - Ce^{-c\lambda^{p(\gamma-1)}}$. On the event $M_{t,\lambda} > 0$, it follows that $x_t \ge t^{\gamma/(\gamma-1)}\lambda^{-1}$, so taking $p = \gamma^{-2}$, we deduce the second result of (47), which completes the proof.
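As an aside for intuition, the Poisson point measure with intensity $C_\gamma x^{-\gamma}\,dx$ used in this proof can be simulated directly once the small jumps are truncated. The following stdlib Python sketch is our own illustration (the function names, the cutoff, and the parameter values are not from the paper); it samples the jumps of size greater than a cutoff over a unit interval of local time:

```python
import math
import random

def poisson(rng, mu):
    """Poisson sample by CDF inversion (fine for moderate mu)."""
    k, p = 0, math.exp(-mu)
    cdf, target = p, rng.random()
    while cdf < target and k < 10_000:
        k += 1
        p *= mu / k
        cdf += p
    return k

def sample_jumps(gamma_=1.5, C=1.0, cutoff=0.1, length=1.0, seed=0):
    """Sample the atoms of size > cutoff of a Poisson point measure with
    intensity dl * C * x^{-gamma} dx on [0, length] x (cutoff, inf).
    The truncation is needed: the full measure has infinitely many
    small atoms, since x^{-gamma} is not integrable at 0 for gamma > 1."""
    rng = random.Random(seed)
    # Total intensity above the cutoff:
    #   length * C * int_cutoff^inf x^{-gamma} dx
    #   = length * C * cutoff^{1 - gamma} / (gamma - 1)
    mass = length * C * cutoff ** (1 - gamma_) / (gamma_ - 1)
    jumps = []
    for _ in range(poisson(rng, mass)):
        u = rng.random()
        # Inverse-CDF sample from the normalised tail on (cutoff, inf)
        x = cutoff * (1 - u) ** (-1 / (gamma_ - 1))
        l = rng.uniform(0, length)   # local-time coordinate
        jumps.append((l, x))
    return mass, jumps
```

The finiteness of `mass` above the cutoff is exactly the feature of the intensity $C_\gamma x^{-\gamma}\,dx$ exploited in the proof: only finitely many large jumps occur per unit of local time.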
We can extend the definitions of $d^{(\alpha)}$ and $\varphi^{(\alpha)}$ from Section 3.3 to the infinite tree $T^\infty_\gamma$. Just as we did there, for $\sigma, \sigma' \in T^\infty_\gamma$, we set the analogous distance. We also define $\varphi^{(\alpha)}$ exactly as in Definition 3.8, but on $T^\infty_\gamma$ instead of $T^c_\gamma$. This means that $\varphi^{(\alpha)}$ is a centred Gaussian process and that for all $s, t \in \mathbb{R}$ the same covariance identity holds. These are well-defined on $T^\infty_\gamma$ by the same considerations as in the compact case. We now transfer the result of Proposition 5.6 to the metric $d^{(\alpha)}$.
Corollary 5.8. For any α < 1 and any η > 0, there exists P > 0 such that for all p ≥ P there exists
Proof. Case 1: 0 < α < 1. Note that, since 1 − α ∈ (0, 1) in this case, we can assume that $r > (\delta/2)^{\frac{1}{1-\alpha}}$. Firstly, suppose that $p > \frac{2\gamma q}{\gamma-1}$ for some q > 1. Then, by Jensen's inequality and Proposition 5.6, there exists ε > 0 such that (49) holds. We now set B = B(O, r), measured with respect to the metric d, and observe the following: if x, y ∈ B, then $\left(\frac{d^{(\alpha)}(x,y)}{2}\right)^{\frac{1}{1-\alpha}} \le d(x, y)$. Therefore, for any δ > 0, a δ-packing of B with respect to $d^{(\alpha)}$ is a $(\delta/2)^{\frac{1}{1-\alpha}}$-packing of B with respect to d, so that the corresponding packing numbers are comparable. Taking p and q as above and combining with (49), we therefore deduce the stated bound. In particular, we can choose p, and therefore q, large enough so that this final exponent is less than η, which proves the result.
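The metric comparison used in this proof — a packing for the snowflaked metric $d^{(\alpha)}$ is a packing for d at a transformed scale — can be checked concretely. Here is a small stdlib Python illustration on the unit interval (our own toy example, using the exact relation $d_\alpha = d^{1-\alpha}$ in place of the two-sided comparison above):

```python
def greedy_packing(points, metric, delta):
    """Greedily build a delta-packing: all pairwise distances > delta."""
    packing = []
    for p in points:
        if all(metric(p, q) > delta for q in packing):
            packing.append(p)
    return packing

# Path metric d on [0, 1] and its "snowflaked" version
# d_alpha(x, y) = d(x, y)^(1 - alpha), with alpha in (0, 1).
alpha = 0.5
d = lambda x, y: abs(x - y)
d_alpha = lambda x, y: abs(x - y) ** (1 - alpha)

points = [i / 100 for i in range(101)]   # [0, 1] discretised
delta = 0.35
# A delta-packing for d_alpha is exactly a delta^{1/(1-alpha)}-packing
# for d, so the two greedy constructions select the same points.
P1 = greedy_packing(points, d_alpha, delta)
P2 = greedy_packing(points, d, delta ** (1 / (1 - alpha)))
assert P1 == P2
```

This is the mechanism by which the packing-number bound for d in Proposition 5.6 is converted into one for $d^{(\alpha)}$.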
In the critical case the packing number instead grows logarithmically in r, as we see in the following proposition.
Proposition 5.9. Take α = 1. For any $p > \frac{1}{\gamma-1}$ and any ε > 0, there exists an event $A_\varepsilon$ with $P(A_\varepsilon^c) < \varepsilon$ and a constant $c_{\gamma,p,\varepsilon} < \infty$ such that for any r > 0,
Proof. Given δ > 0, we construct a δ-covering of B(O, r) with respect to the metric $d^{(1)}$, the size of which gives an upper bound for the δ-packing number. In particular, for i ≥ 0 we let $S^{(i)}_\delta$ denote the set of vertices at distance $e^{i\delta/2} - 1$ from the root that have descendants at distance $e^{(i+1)\delta/2} - 1$ from the root. We claim that $\bigcup_i S^{(i)}_\delta$ forms such a covering. Note that $i_x \ge 2$ since we are assuming that $d(O, x) \ge e^{\delta} - 1$. Then let $A_x$ be the ancestor of x at distance $e^{i_x \delta/2} - 1$ from the root, and let $A'_x$ be the ancestor of x at distance $e^{(i_x - 1)\delta/2} - 1$ from the root. Then, since $d(O, x) \le e^{(i_x + 1)\delta/2} - 1$, we have that the $d^{(1)}$-distance from x to its chosen ancestor is at most δ. We therefore turn to bounding $|S^{(i)}_\delta|$.

By Proposition 5.4, it follows that we can choose a dominating random variable, where the symbol s.d. denotes stochastic domination between random variables. Therefore, if p > 1 and δ > 0 is sufficiently small,
Proposition 5.10. (i) P × P-almost surely, for any α < 1 and $\beta > \frac12$, (ii) P × P-almost surely, for any $\beta > \frac12$,
Proof. Case α < 1. The result is a consequence of [vdVW96, Theorem 2.2.4 and Corollary 2.2.5]. First, fix some $p > \frac{2\gamma}{\gamma-1}$, and note that by Gaussianity, we have for any s, t > 0, P-almost surely, a p-th moment bound with a deterministic constant $C_p$. Rearranging, we deduce a corresponding bound on the packing integral. By Corollary 5.8, we can also assume that p is large enough to upper bound this further. Therefore, if $\beta > \frac12$, we can choose some $\eta < \beta - \frac12$ and apply Markov's inequality. Applying Borel–Cantelli along the subsequence $r_n = 2^n$, we deduce that the supremum over $s, t \in B(O, r_{n-1})$ is controlled as r → ∞, which proves the result.
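The Gaussianity step in the proof above uses the standard fact that every absolute moment of a centred Gaussian is a power of its standard deviation; for the reader's convenience (a textbook identity, not specific to this paper):

```latex
% For Z ~ N(0, sigma^2) and p > 0:
\mathbb{E}\,|Z|^{p} \;=\; C_p\,\sigma^{p},
\qquad
C_p \;=\; \frac{2^{p/2}\,\Gamma\!\big(\tfrac{p+1}{2}\big)}{\sqrt{\pi}}.
```

Applied to the increment $Z = \varphi^{(\alpha)}(s) - \varphi^{(\alpha)}(t)$, whose variance is determined by $d^{(\alpha)}(s,t)$, this converts the covariance structure into the p-th moment bound with deterministic constant $C_p$ used above.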
6 Reinforced strongly recurrent regime: α < 1, ∆ > 0

In this section we will prove Theorem 1.2. Recall from (23) that, given a realisation of $T^\infty_\gamma$, the limiting diffusion X has the annealed law of a diffusion in a random potential $\varphi^{(\alpha)}$, i.e.
We will in fact prove the analogue of Theorem 1.2 for the quenched law of this diffusion, i.e. $P \times \mathbb{P} \times \tilde{P}_{O,\varphi}$-almost surely. This clearly implies the same result for the annealed process, and therefore for X.

Volume and resistance growth in $T_{\gamma,\alpha}$
We start with some asymptotics for balls with respect to the distorted metric $R_\varphi$. Here $B_\varphi$ denotes the open ball measured with respect to $R_\varphi$, as defined by (21).

Proposition 6.1. P × P-almost surely, for any ε > 0 we have that for all sufficiently large r.
Proof. Take some $\beta \in (\frac12, 1)$ and some ε > 0. Also set b = 1 − α. By Proposition 5.10(i), there almost surely exists R < ∞ such that the modulus bound holds; without loss of generality, we also assume a further condition, which we will just use in the case b ≤ 1. Note that y almost surely exists and is unique, since $T^\infty_\gamma$ is a length space. Moreover, d(O, z) > R for all z ∈ [y, x] by our choice of R, which implies that $\varphi^{(\alpha)}(z)$ is controlled in terms of $d(O, z)^b$ for all such z. We therefore obtain the first inclusion. For the second inclusion, we keep β, R and y as above, and now assume that d(O, x) < r. Also set $\bar{r}_c$ accordingly. (This is always possible since $\varphi^{(\alpha)}$ is continuous with bounded sample paths, so is bounded on the compact set B(O, R), and so the supremum over $w \in \partial B(O, R)$ is finite.) We then have, for all $r > \bar{r}_c \vee R$ and all x ∈ B(O, r), a bound which in the case b > 1 we bound above using $A_\Delta b R^{b-1} \ge 1$; if b ≤ 1, we instead bound it above using the complementary estimate, which completes the proof since ε was arbitrary.
Proposition 6.2. For any α < 1, we have that P × P-almost surely, for any ε > 0, for all sufficiently large r.
Proof. Fix ε > 0, and let $M_\varphi(r)$ be the smallest cardinality of a set of points such that any path passing from O to $B_\varphi(O, r)^c$ must pass through one of the points in the set. Similarly to [BK06, Lemma 4.5], we note that since any such set is a cutset, it follows from the parallel law for resistance that the resistance is bounded below accordingly. Now choose $\tilde\varepsilon$ small enough that $(1 + \tilde\varepsilon)(1 - \varepsilon) < 1 - \tilde\varepsilon$. By Proposition 6.1, it follows that any cutset separating the corresponding unweighted balls suffices, where M(r) is the smallest cardinality of a set of points separating them. By scaling invariance of the stable tree, $M(r) \overset{(d)}{=} M(e^{A_\Delta(1+\varepsilon)})$ for all r > 0, so we bound the latter quantity, and set δ so that we are in fact counting subtrees from level 1 − δ to level 1.
The "size" of generation 1 − δ can be formally measured by the local time measure of Proposition 5.4. Moreover, conditional on the total local time measure at level 1 − δ being equal to L, it follows from the construction on page 29 that the number of subtrees emanating from level 1 − δ that reach level 1 is one more than a Poisson random variable with parameter 2LN(H > δ), where N is the Itô excursion measure, so that $N(H > \delta) = c_\gamma \delta^{-\frac{1}{\gamma-1}}$ for a deterministic $c_\gamma \in (0, \infty)$ [GH10, Proposition 5.6]. Moreover, $M(e^{A_\Delta(1+\varepsilon)})$ is equal to the number of such subtrees. Again letting L denote the total local time measure at level 1 − δ, we have from (a minor adaptation of) [DLG06, Proposition 5.2] that there exists c > 0 such that $P(L > \lambda) \le c\lambda^{-(\gamma-1)}$, uniformly over δ ∈ [0, 1]. From a Chernoff bound, we therefore deduce the tail estimate (50). We therefore deduce from Borel–Cantelli and (50) that if we take $r_n = 2^n$ and $\lambda_n = (\log r_n)^{\frac{1+\varepsilon}{\gamma-1}}$, then P × P-almost surely, the bound holds for all sufficiently large n. By monotonicity, this implies that $R_\varphi(O, B_\varphi(O, r)^c) \ge r^{1-3\varepsilon}$ for all sufficiently large r. Since ε was arbitrary and the upper bound $R_\varphi(O, B_\varphi(O, r)^c) \le r$ holds trivially, this proves the result.
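The Chernoff bound for Poisson lower tails invoked in this proof is standard; in the form typically used (a textbook fact, spelled out here for convenience): for $N \sim \mathrm{Poi}(\mu)$ and $0 < a < \mu$,

```latex
\mathbb{P}(N \le a)
\;\le\; \inf_{\theta > 0} e^{\theta a}\,\mathbb{E}\,e^{-\theta N}
\;=\; \inf_{\theta > 0} \exp\big\{\mu(e^{-\theta} - 1) + \theta a\big\}
\;=\; e^{-\mu}\Big(\frac{e\mu}{a}\Big)^{a},
```

obtained by choosing $\theta = \log(\mu/a)$; in particular $\mathbb{P}(N = 0) = e^{-\mu}$, which decays stretched-exponentially once the parameter $\mu$ grows like a power of $\log r$, as in the application above.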

Proof of Theorem 1.2
We now have all the tools to prove Theorem 1.2.
Proof of Theorem 1.2. Again set b = 1 − α. We work pointwise on the probability space Ω′ on which $T^\infty_\gamma$ and the environment $\varphi^{(\alpha)}$ are defined. Recall from (23) that, given $T^c_\gamma$ and $\varphi^{(\alpha)}$, $\tilde{P}_{O,\varphi}$ denotes the (quenched) law of the corresponding RWRE. We will show that the statement of Theorem 1.2 holds $\tilde{P}_{O,\varphi}$-almost surely on Ω′. This then transfers to the annealed law via (23). Since the quenched law of the LERRW limit is equal to the annealed law of the RWRE, this proves the result.
Throughout we assume that $T^c_\gamma$ and $\varphi^{(\alpha)}$ are fixed and therefore just write $\tilde{P}$ in place of $\tilde{P}_{O,\varphi}$. We first show that, $\tilde{P}$-almost surely, Letting $T_{B(O,r)}$ denote the exit time of X from B(O, r), we will in fact show that, almost surely, the exit time bound holds for all sufficiently large r. For notational convenience, set $t = e^{A_\Delta r^b}$ and $\tau_r = T_{B(O, r(1+\varepsilon))}$. Also let $T_r$ be a geometric random variable with parameter $p_r$, where $\tau_O$ is the hitting time of O ($p_r$ is well-defined by [AEW13, Proposition 1.9]). We can then write the exit time as a sum, where $(\xi_i)_{i=1}^{T_r - 1}$ are the times to travel from ∂B(O, 1) to O and back to ∂B(O, 1), conditional on not hitting ∂B(O, r(1 + ε)) first, from a "best case" starting point on ∂B(O, 1). Letting L(t) = log t, we then have that To bound the first term, note that since ∂B(O, 1) is a cutset separating O from ∂B(O, r(1 + ε)), we can apply Propositions 6.1 and 6.2 to deduce that P × P-almost surely for any ε > 0, we have for all sufficiently large r that Recalling that $t = e^{A_\Delta r^b}$, we therefore have for all sufficiently large r that $p_r t L(t) \le A_\Delta r^b e^{-\frac14 A_\Delta \varepsilon r^b}$, and hence converges to zero as r → ∞. It follows that for all sufficiently large r, Now take any s > 0. To bound the second term of (52), we follow the approach of [Kum04, Section 4] and use the Markov property to write the corresponding bound in terms of $\nu_\varphi(B(O, 1))$.
Note that $\nu_\varphi(B(O, 1))$ is P × P-almost surely finite and non-zero by Proposition 6.3, and the same holds for $\sup_{x \in B(O,1)} R_\varphi(x, B(O, 1)^c)$ by Proposition 6.1. It therefore follows from (54) that there P × P-almost surely exist constants $c_1 < c_2 \in (0, \infty)$ such that for all s ≥ 0, Rearranging, we see that the corresponding bound under $\tilde{P}$ holds. To return to (52), note that the sequence $(\xi_i)$ in the sum there stochastically dominates a sequence of independent copies of $T_{B(O,1)}$ (since the above bounds hold P × P-almost surely, we can work pointwise on the probability space so that the tree and the environment are fixed, so we are only considering randomness under $\tilde{P}$). Therefore, letting $\xi_i$ denote independent copies of $T_{B(O,1)}$ (conditional on our particular realisations of $T^\infty_\gamma$ and $\varphi^{(\alpha)}$), and recalling again that $t = e^{A_\Delta r^b}$, we have from [Bar98, Lemma 3.14] that P × P-almost surely there exist $c_3, c_4 \in (0, \infty)$ such that (55) holds for all sufficiently large r. In particular, substituting the bounds in (53) and (55) back into (52) we have that, P × P-almost surely, the quenched tail bound holds for all sufficiently large r. Therefore, applying Borel–Cantelli along the sequence of integer r, we deduce that $P \times \mathbb{P} \times \tilde{P}$-almost surely, the bound holds for all sufficiently large r; in other words, it holds for all sufficiently large t. Since ε was arbitrary, this proves the upper bound. To prove the lower bound, i.e. the corresponding lim sup statement, we take ε > 0 and set the analogous quantities. Then, again using [Kum04, Equations (4.5), (4.6) and (4.7)] and Markov's inequality, we have that the relevant expectation is almost surely finite. Now choose some δ = δ(ε) > 0 small enough that $(1 + \delta)(1 - \varepsilon)^b < 1$. By Propositions 6.1 and 6.2, we therefore have that, almost surely for all sufficiently large t, the quenched probability vanishes as t → ∞. Applying this along with Fatou's Lemma, we therefore deduce the lower bound. Since ε was arbitrary, this completes the proof.
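The exit-time bounds via volume and resistance used throughout this proof rest on the standard Green's function estimate for resistance forms (a standard fact, in the spirit of [Bar98] and [Kum04]; our sketch of the mechanism):

```latex
g_{B}(O, O) \;=\; R_\varphi\big(O, B^{c}\big),
\qquad
\mathbb{E}_O\big[T_{B}\big] \;=\; \int_{B} g_{B}(O, x)\,\nu_\varphi(dx)
\;\le\; R_\varphi\big(O, B^{c}\big)\,\nu_\varphi(B),
```

since $g_B(O, x) \le g_B(O, O)$ for all x. Together with a matching lower bound of the same form on a slightly smaller ball, this is what makes the volume estimates of Proposition 6.1 and the resistance estimates of Proposition 6.2 sufficient to control exit times.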
7 Reinforced critical regime: α = 1, ∆ > 0

In this section we prove Theorem 1.4. The strategy is the same as in Section 6, but due to the logarithmic factors we obtain different estimates.
In contrast to the previous section, this time where B ∆ = √ 4∆ and A ∆ = ∆ − 1.

Volume and resistance growth in $T_{\gamma,0}$
Again it is natural to start with some volume and resistance estimates.
Proposition 7.1. P × P-almost surely, for any ε > 0 we have that for all sufficiently large r.
Proof. Take some $\beta \in (\frac12, 1)$, $\beta' \in (\frac12, \beta)$ and some ε > 0. By Proposition 5.10(ii), there P × P-almost surely exists R < ∞ such that $(\log R)^\beta < \frac{\varepsilon}{2}\log R$ and Similarly to Proposition 6.1, provided that r is sufficiently large (depending only on R and ∆) we have that For the second inclusion, we keep R, y, β and β′ as above, and now assume that d(O, x) < r. Also set (This is always possible since $\varphi^{(1)}$ is continuous with bounded sample paths, so for any fixed R > 0, $\varphi^{(1)}$ is bounded on the compact set B(O, R), and so the supremum is almost surely finite.)
We then have for all sufficiently large (depending on R and ∆) r and all x ∈ B(O, r) that which completes the proof.
Proposition 7.2. For α = 1, we have for any ε > 0 that P × P-almost surely, for all sufficiently large r.
Proof. The proof is similar to that of Proposition 6.2. Fix ε > 0, and let M φ (r) be the smallest cardinality of a set of points in such that any path passing from O to B φ (O, r) c must pass through one of the points in the set. Since any such set is a cutset, it follows from the parallel law for resistance that It follows from Proposition 7.1 that M φ (r) ≤ M (r) almost surely for all sufficiently large r, where M (r) is the smallest cardinality of a set of points separating almost surely for all sufficiently large r.
Let (L (s) ) s≥0 be as in (46). The same arguments that led to (51) entail that We therefore deduce from Borel-Cantelli that if we take r n = 2 n , then P × P-almost surely, we have that Proposition 7.3. Take any β > 1 2 . Then P × P-almost surely, (r+1)) β for all sufficiently large r.
Proof. Again the proof is similar to that of Proposition 6.3. We treat the case $A_\Delta < \frac{\gamma}{\gamma-1}$ first, starting with the upper bound. Take some $\beta \in (\frac12, 1)$. By Proposition 5.10(ii) and since $\varphi^{(1)}$ is continuous, we can P × P-almost surely choose R ≥ 1 and ε, p < ∞ so that We then have from Proposition 5.5 that for any δ > 0, almost surely for all sufficiently large r, Since β, δ > 0 were arbitrary this implies the upper bound. If $A_\Delta = \frac{\gamma}{\gamma-1}$ we obtain a factor of log r from the sum, but this is dominated by the $e^{(\log r)^{\beta+\delta}}$ term, so the result still holds.
The lower bound is simpler, since the required estimate holds almost surely for all sufficiently large r by Proposition 5.3. This proves the result since β was arbitrary. Clearly the result holds trivially if $A_\Delta = \frac{\gamma}{\gamma-1}$. If $A_\Delta > \frac{\gamma}{\gamma-1}$, the calculation in (57) shows that $\nu_\varphi(T^\infty_\gamma) < \infty$ almost surely.

Proof of Theorem 1.4
We now have all the tools to prove Theorem 1.4.
Proof of Theorem 1.4. As in the proof of Theorem 1.2, we assume that $T^c_\gamma$ and $\varphi^{(1)}$ are fixed and therefore just write $\tilde{P}$ or $\tilde{P}_O$ in place of $\tilde{P}_{O,\varphi}$.
Take $\beta' \in (\frac12, 1)$ and assume first that $A_\Delta \le \frac{\gamma}{\gamma-1}$. Letting $g_B(\cdot, \cdot)$ denote the Green's function for the diffusion killed on exiting the ball B(O, r), we firstly have from Propositions 7.1 and 7.3 that, P × P-almost surely for all sufficiently large r, Therefore, for any $\beta > \frac12$ we can choose $\beta' \in (\frac12, \beta)$ so that Markov's inequality gives the required bound for all sufficiently large r. In particular, Fatou's Lemma gives that so we similarly deduce from Markov's inequality and Fatou's Lemma that For the upper bound, we use [Cro18, Lemma 4.2(b)], which states that for any t > 0 and any $\delta \in (0, R_\varphi(O, B_\varphi(O, r)^c))$, Set $\delta = re^{-(\log r)^\beta}$ for some $\beta > \frac12$, and choose $\beta' \in (\frac12, \beta)$. Note that the relevant bounds hold eventually almost surely by Proposition 7.1, eventually almost surely by Proposition 7.2, and for all sufficiently large δ by Propositions 7.1 and 7.3. If $A_\Delta \ge \frac{\gamma}{\gamma-1}$, then there exists a (random) constant for which the corresponding bound holds. In the case $A_\Delta \le \frac{\gamma}{\gamma-1}$, we take some η > 0 and set $t = r^{\frac{2\gamma-1}{(\gamma-1)(A_\Delta+1)}} e^{-(\log r)^{\beta+2\eta}}$. We then deduce from (58) that, on the almost sure events above and provided r is sufficiently large, In particular, if $r_n = 2^n$ we have from Borel–Cantelli that If $r \in [r_n, r_{n+1}]$, this implies the bound for all sufficiently large r. By inverting this relation we deduce the result. If instead $A_\Delta \ge \frac{\gamma}{\gamma-1}$, we instead take $t = re^{-(\log r)^{\beta+2\eta}}$ and (58) instead gives the analogous bound, where $C_V > 0$ is a (random) constant that depends on V. The final result can be deduced from Borel–Cantelli exactly as above.
8 Reinforced transient regime: α > 1, ∆ > 0

In the transient regime resistance does not provide the right framework to characterise the scaling limit of LERRW, since the resistance between pairs of points in the correspondence collapses to zero under any rescaling. However, we can still use resistance when working directly with the unrescaled process to understand its exit times from a ball of radius r.
Accordingly, we let $(Y_t)_{t \ge 0}$ denote a constant-speed continuous-time LERRW on $T_\infty$ as given by Definition 2.3, with offspring distribution satisfying (2) (the final result transfers directly to a discrete-time LERRW by the strong law of large numbers applied to the time index). For convenience, we assume that the function $a_n$ appearing in (2) is of the form $a_n = cn^{1/\gamma}$; however, one can also incorporate a slowly varying correction and "pull it through" in all the proofs that follow. More precisely, $(Y_t)_{t \ge 0}$ has the annealed law of a continuous-time random walk in the Dirichlet random environment of (16) on the infinite tree $T_\infty$, with exp(1) holding times at each vertex and initial weights given by (16). We will work primarily with the RWRE in this section and denote the law of the environment by P. The discrete setup of Section 4.2 still holds, so that for $y \in T_\infty$, cf. (17), we can write Let $O_\infty = b_0, b_1, \ldots$ denote the backbone vertices of $T_\infty$ (these are the special vertices of Definition 2.3), ordered by their distance from the root. Given i ≥ 1, also let $T_i$ denote the (infinite) subtree of $T_\infty$ rooted at $b_i$, and for any integer r > 0 let $B_{T_i}(b_i, r)$ denote the ball of radius r around $b_i$ in this subtree, defined with respect to the graph metric. We use the notation $B_R$ to denote a ball defined with respect to the metric R defined above.
We start with a brief lemma on the structure of T ∞ \ T r .
Proposition 8.1. P-almost surely, for any ε > 0, Proof. Let T denote an unconditioned Galton–Watson tree with the same offspring distribution as $T_\infty$. By conditioning on the heights of the subtrees grafted to the infinite backbone, it follows from Definition 2.3, [Arc20b, Lemma A.2], [Sla68, Theorem 2] and a union bound that there exists c < ∞ such that for any p > 0, r ≥ 1, λ ≥ 1, Taking $p = \frac{1}{\gamma(\gamma-1)}$ gives an upper bound of order $\lambda^{-1/\gamma}$. Now set $\bar\varepsilon = \varepsilon/2$, apply Borel–Cantelli with $\lambda_r = (\log r)^{\gamma+\bar\varepsilon}$ along the subsequence $r_n = 2^n$, then use monotonicity and divide through by $(\log r)^{\bar\varepsilon}$ for the result.
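The dyadic Borel–Cantelli pattern used in this proof (and repeatedly in Sections 6 and 7) runs as follows in general form (a standard argument, spelled out once for convenience):

```latex
\text{If } \sum_{n \ge 1} \mathbb{P}\big(Z_{2^{n}} > \lambda_{2^{n}}\big) < \infty
\ \text{ and } r \mapsto Z_r \text{ is non-decreasing, then almost surely,}
\quad
Z_r \;\le\; Z_{2^{n+1}} \;\le\; \lambda_{2^{n+1}} \;\le\; C\,\lambda_{r}
\quad \text{for } r \in [2^{n}, 2^{n+1}] \text{ and all large } n,
```

where the last step uses that $\lambda_r$ varies regularly (here $\lambda_r = (\log r)^{\gamma+\bar\varepsilon}$, so $\lambda_{2r} \le C\lambda_r$).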
The difference with the recurrent regime is that typical terms of the form log ρ v will now be negative. In particular, we show in the Appendix in Lemmas A.4 and A.5 that, as d T∞ (O ∞ , y) → ∞, We start with the following bound on the behaviour of the potential on T ∞ .
Lemma 8.2. For any δ > 0, P-almost surely, the following bounds hold. Proof. (i) It follows from Chebyshev's inequality and the bounds of (61) that there exists a constant c such that for any $y \in T_\infty$ and any admissible scale, Applying Borel–Cantelli, we therefore deduce that for any δ > 0, Moreover, if $r \in [2^m, 2^{m+1}]$ and δ < 1, then as m → ∞ by (61) and Chebyshev's inequality. It therefore follows that the claimed convergence holds, where the last equality holds by (63). This proves (i) since δ > 0 was arbitrary.
(ii) We first prove (ii) when y is on the backbone. First take m ≥ 1 and note that as m → ∞ by part (i). Now note that if $y \in T_\infty$ with $b_k \prec y$ for some k ≥ m and y is not on the backbone, then it follows from the Dirichlet construction of Section 3.2 that $R(b_k, y)$ and $(\sum_{u \prec v} \log \rho_u)_{v \in [b_m, y]}$ are distributed as $R(b_k, b_{d_{T_\infty}(O_\infty, y)})$ and $(\sum_{u \prec v} \log \rho_u)_{v \in [b_m, b_{d_{T_\infty}(O_\infty, y)}]}$ respectively. Part (ii) follows.
Recall that our aim in this section is to prove Theorem 1.5. Our strategy to do this will be to divide the ball B(O ∞ , r) up into smaller sets which must all be traversed by the RWRE of (16) before it can exit B(O ∞ , r). We then apply a well-known chaining technique to bound the sum of all the exit times from these smaller sets. We then transfer this back to the LERRW using (18).
Before doing this, we give some definitions of good and typical vertices and establish some of their properties.
Definition 8.3. Fix some δ > 0 and then fix some $A' > A > \frac12 + \delta$. To ease notation in the rest of this section, we view these parameters as fixed and do not record them as indices. 2. We say that a vertex $y \in T_\infty$ is m-good if $b_m \prec y$ and the stated condition holds, and that y is m-bad otherwise.
3. For c, C, m ∈ [0, ∞), r > 2m and λ > 1, we say that an index 0 ≤ i < λ is (m, r, c, C, λ)-typical if We will use the fact that for typical indices i, we can control (in probability) the time for the RWRE to exit a ball of the form $B_{T_{r_i}}(b_{r_i}, r)$. Moreover, it is highly likely that most indices are typical. We make this precise in the following lemma.
Lemma 8.4. Take A ′ > A as in Definition 8.3, and A ′′ > A ′ .
Proof. (i) To prove point (i), note from Lemma 8.2(i) that we almost surely have for all sufficiently large r and all λ > 1 that The calculation for the upper bound follows similarly.
(ii) For the second point, take some i ≥ 1 and some $y \in B_{T_{r_i}}(b_{r_i}, r)$ and choose some $0 < \varepsilon \ll \frac{1}{96}$. Note that, if y is m-good, then similarly to (64), it holds that Therefore, by Lemma 8.2(ii), which says that y is m-good with high probability provided $d_{T_\infty}(O_\infty, y)$ and m are sufficiently large, we P-almost surely have that for all sufficiently large m, if we fix some r > 2m, i ≥ 1 and $y \in B_{T_{r_i}}(b_{r_i}, r)$, then Now fix m large enough satisfying the above, and take any r > 2m. By Markov's inequality and the inequalities above, it P-almost surely holds for all sufficiently large r and any i ≥ 1 that Therefore the result follows by a union bound over the relevant events above.
These events are not independent for distinct i. However, by Lemma 8.2 we can choose m so that the probability appearing in Lemma 8.2(ii) is at least 1 − ε, and define the event $A_\varepsilon$ to be the corresponding high-probability event. Then on the event $A_\varepsilon$ we can control $V(b_{r_i})$ and $R(O_\infty, b_{r_i})$ for all i (for sufficiently large r), conditionally on which the tail bounds above hold independently for each i. Hence, if N = |{i : i is (m, r, c, C, λ)-typical}|, then N stochastically dominates a $\mathrm{Bin}(2\lambda, \frac{7}{8})$ random variable. In particular this establishes the claim.
In the proof of Theorem 1.5 we assume that ε > 0 has been fixed and c, C are as in Lemma 8.4(iii).
Proof of Theorem 1.5. We proceed in three steps: 1. Firstly we couple $T_\infty$ and its Dirichlet weights with a modified model $\hat{T}_\infty = \hat{T}_\infty(r, \lambda)$ such that (using a hat to denote analogous quantities in this model), letting $\hat\tau_i$ denote the time to hit $b_{r_{i+1}}$ starting from $b_{r_i}$, each $\hat\tau_i$ is stochastically dominated by its analogous quantity in $T_\infty$. We then show that, with probability at least 1 − ε, we can obtain comparable upper and lower bounds for $E[\hat\tau_i]$ in the modified model for all typical indices i. We use these bounds to show that there P-almost surely exists a constant $\kappa_\alpha \in (0, \infty)$ and an event $A_\varepsilon$ such that $P(A_\varepsilon^c) < \varepsilon$ (the same event as in Lemma 8.4(iii)) and such that, provided $r^{\frac{\gamma}{\gamma-1}}\lambda^{-\frac{\gamma}{\gamma-1}} e^{-\frac{\gamma}{\gamma-1}(\log r)^{A'}} \ge r\lambda^{-1}$, the bound (68) holds for each (m, r, c, C, λ)-typical i.

2. We then apply a chaining result of [Bar98] to deduce that, P-almost surely, the corresponding sum of exit times satisfies a tail bound. 3. Using the stochastic domination, we then transfer this result to the analogous sum $\sum_{i=1}^{N} \tau_i$ on the original tree. Using this tail bound, we bound the hitting time of $b_r$ with a Borel–Cantelli argument.
Step 1. Fix ε > 0 and δ > 0, and choose m ′ large enough that the probability in Lemma 8.2 is at least 1 − ε. Let A ε be the corresponding high probability event. For any r > 2m ′ , we can proceed as follows.
Set $m = \lfloor r/2 \rfloor$. The bound of Lemma 8.2(ii) still holds since this can only increase the value of m, and for the rest of the proof we work on the event $A_\varepsilon$, which in particular implies the required bounds. Given r > 1, λ > 1 and $T_\infty$, we define $\hat{T}_\infty = \hat{T}_\infty(r, \lambda)$ from $T_\infty$ as follows: • First remove all m-bad vertices and their incident edges from $T_\infty$.
• Then also remove all vertices in the specified set. (The vertices and edges that are retained in $\hat{T}_\infty$ retain the measures and resistances that they previously had in $T_\infty$.) The resulting structure is $\hat{T}_\infty$. Note that the definition of m-good and Lemma 8.2 ensure that $\hat{T}_\infty$ is really a connected tree for all sufficiently large r.
We use hat notation to denote all analogous quantities in $\hat{T}_\infty$. For each i < λ, we consider the subtree $\hat{T}_{r_i}$ (rooted at $b_{r_i}$) and let $\hat\tau_i$ denote the exit time from $\hat{T}_{r_i} \setminus \hat{T}_{r_{i+1}}$ in the subtree $\hat{T}_{r_i}$ for a random walk started at $b_{r_i}$ (this means that the random walk can only exit $\hat{T}_{r_i} \setminus \hat{T}_{r_{i+1}}$ through $b_{r_{i+1}}$; since the random walk eventually has to pass through $b_{r_i}$ to exit $\hat{T}_\infty \setminus \hat{T}_{r_i}$, and further excursions back towards the root and in subtrees can only slow the random walk down, it is clearly sufficient to restrict to this set, and clearly $\hat\tau_i$ is stochastically dominated by $\tau_i$ for all i).
Step 2. Since (68) holds for all (m, r, c, C, λ)-typical indices i, and recalling that N = N(m, r, c, C, λ) denotes the number of (m, r, c, C, λ)-typical indices, we therefore have from [Bar98, Lemma 3.14] that on the event $A_\varepsilon$, the corresponding tail bound holds for all t > 0. In particular, taking $\lambda = e^{(\log r)^{A'+\delta}}$ (which satisfies the conditions required in Step 1 provided that r is sufficiently large) and $t = r^{\,\cdots}\lambda^{1-\delta}$ gives, for all sufficiently large r, a bound which, combined with Lemma 8.4(iii) and a union bound, shows that on the event $A_\varepsilon$ the probability is at most $\exp\{-\tfrac{1}{12}e^{(\log r)^{A'+\delta}}\} + \exp\{-e^{(1-\frac{\delta}{4})(\log r)^{A'+\delta}}\}$.
Step 3. Since adding back the m-bad vertices and then adding the times to traverse balls corresponding to non-typical i can only increase the total exit time, the same tail bound clearly holds for the exit time of $T_\infty \setminus T_{r_i}$. By Proposition 8.1, we can also choose R large enough that the event $B_\varepsilon := \{\sup_{r \ge R} \mathrm{Diam}(T_\infty \setminus T_r)/(r(\log r)^{\gamma+\varepsilon}) < 1\}$ satisfies $P(B_\varepsilon) \ge 1 - \varepsilon$.
Consequently, it follows from Borel–Cantelli along the subsequence $r_n = 2^n$ and monotonicity of $T_r$ that, almost surely, the corresponding quantity multiplied by $e^{-(\log r)^{A'+2\delta}}\mathbb{1}\{A_\varepsilon, B_\varepsilon\}$ is at most $r(\log r)^{\gamma+\varepsilon}$ for all sufficiently large r. Since $A_\varepsilon$ and $B_\varepsilon$ both have P × P-probability at least 1 − ε, this implies that with overall probability at least 1 − 2ε, the bound holds for all sufficiently large t. Since $A' > \frac12$, δ > 0 and ε > 0 were arbitrary, this implies the result for the annealed law $\tilde{P}$, and therefore for the LERRW.
Remark 8.5 (Application to LERRW on ℤ₊). We can also apply these results to LERRW on the infinite half line with these initial weights. Note that Lemma 8.2(i) also applies in this setting, to give that, for any δ > 0, $P\big(\big|\sum_{0 < i \le r} \log \rho_i + \alpha \log r\big| \ge (\log r)^{\dots}\big)$ is controlled. In particular, this implies that P × P-almost surely, there exist constants c > 0, C < ∞ such that for all sufficiently large r, the relevant quantities are of order $r^{\alpha+1}e^{-(\log r)^{\dots}}$. By Markov's inequality, Borel–Cantelli along the sequence $r_n = 2^n$ and monotonicity, we therefore get that $T_r \le r^2 e^{(\log r)^{\dots}}$ for all sufficiently large r.
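Remark 8.5 lends itself to direct simulation. The following stdlib Python sketch is our own illustration (the weight scheme $w_i = i^\alpha$ for the edge (i−1, i), the reinforcement step, and all names are ours, instantiating the model of the introduction on the half line); it returns the number of steps the LERRW takes to first hit level r:

```python
import random

def lerrw_hitting_time(r, alpha=1.5, delta=1.0, seed=0, max_steps=10**6):
    """Simulate a linearly edge-reinforced random walk on Z_+ started at 0.
    Edge (i-1, i) has initial weight i**alpha (the paper's weight scheme
    restricted to the half line); each traversal adds delta to the weight
    of the crossed edge. Returns the number of steps to first hit vertex
    r, or None if the step cap is exceeded."""
    rng = random.Random(seed)
    w = {i: i ** alpha for i in range(1, r + 1)}  # w[i] = weight of edge (i-1, i)
    pos, steps = 0, 0
    while pos < r and steps < max_steps:
        if pos == 0:
            nxt = 1                      # only one incident edge at the root
        else:
            wl, wr = w[pos], w[pos + 1]  # weights of the two incident edges
            nxt = pos + 1 if rng.random() < wr / (wl + wr) else pos - 1
        w[max(pos, nxt)] += delta        # reinforce the traversed edge
        pos, steps = nxt, steps + 1
    return steps if pos == r else None
```

The simulation is only illustrative: the logarithmic corrections appearing in the remark are invisible at scales accessible to direct simulation.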

A Appendix
In the appendix we prove Claims 4.3, 4.4 and 4.5, regarding the expectation, variance and moment generating function of the random variables (log ρ x ) x∈Tn .
In the appendix we always work conditionally on T n . In particular, we will work pointwise on the probability space (Ω, F , P) (on which we defined (T n , d n , µ n )), and on which the convergence of (4) holds almost surely. This means that most of the statements that follow should really be written conditionally on T n . To make the arguments clearer to follow, we have not written this explicitly in the statements or the proofs, and instead ask the reader to keep this in mind throughout.
Moreover, a statement of the form $a_n = a + o(1)$ almost surely on Ω means that $a_n \to a$ almost surely, and therefore that the o(1) term is not necessarily bounded uniformly on Ω. However, it will always be true that for any ε > 0, the o(1) term can be bounded uniformly on a set of probability at least 1 − ε. The same holds for O(·) terms. Recall that we simplified notation by writing $\Delta_n = \Delta(na_n^{-1})^{-(1-\alpha)}$ if α < 1, and $\Delta_n = \Delta$ if α = 1, and $|x| = d_n(O_n, x)$.

A.1 Properties of the digamma function
The mean and variance of terms of the form log ρ x can be expressed in terms of the digamma function, defined for z > 0 by ψ(z) = Γ ′ (z)/Γ(z), where Γ is the gamma function.
Note that, for each n ≥ 1 and each $x \in T_n$, we have that $\rho_x = \frac{1 - p_x}{p_x}$, where $p_x$ has the beta distribution with some positive parameters $(a_x, b_x)$. This entails that $\rho_x \sim \beta'(b_x, a_x)$ and that $\frac{1}{\rho_x} \sim \beta'(a_x, b_x)$, where β′ is the beta prime distribution.
In our case, we have for each n ≥ 1 and each x ∈ T n , that and (ρ x ) x∈Tn\{On} is a sequence of independent random variables. Next, we record three properties of the digamma function that we will use throughout this section.
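The identity behind these computations can be checked numerically: if $p \sim \mathrm{Beta}(a, b)$ then $\mathbb{E}[\log p] = \psi(a) - \psi(a+b)$ and $\mathbb{E}[\log(1-p)] = \psi(b) - \psi(a+b)$, so $\mathbb{E}[\log \rho] = \psi(b) - \psi(a)$ for $\rho = (1-p)/p$. A stdlib Python sketch (the digamma implementation and the parameter values are our own, not from the paper):

```python
import math
import random

def digamma(x):
    """Digamma via the recurrence psi(x + 1) = psi(x) + 1/x, followed by
    the asymptotic expansion for large arguments (accurate to ~1e-8)."""
    acc = 0.0
    while x < 10:
        acc -= 1.0 / x
        x += 1.0
    return acc + math.log(x) - 1/(2*x) - 1/(12*x**2) + 1/(120*x**4)

# Monte Carlo check of E[log rho] = psi(b) - psi(a) for rho = (1 - p)/p,
# p ~ Beta(a, b), with illustrative parameters a, b.
a, b = 2.0, 3.5
rng = random.Random(0)
n = 200_000
mc = sum(math.log((1 - p) / p)
         for p in (rng.betavariate(a, b) for _ in range(n))) / n
exact = digamma(b) - digamma(a)
assert abs(mc - exact) < 0.05
```

The recurrence pushes the argument into the range where the expansion $\psi(x) \approx \log x - \frac{1}{2x} - \frac{1}{12x^2} + \frac{1}{120x^4}$ applies; this is the same expansion used for the asymptotics in (73)–(75).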
A.5 Expectation of the potential when α > 1

In the transient regime α > 1 considered in Section 8, we do not rescale the initial weights, so for each $x \in T_\infty$ we have Lemma A.4. Assume α > 1. Then for almost every ω ∈ Ω, the following holds. As |y| → ∞, we have that Proof. First note that it follows from (72) that for any $y \in T_\infty$, For the first term in (79), we use (74), which implies that, as |y| → ∞, For the second term, since $\frac{i^\alpha}{2\Delta} \to \infty$ as i → ∞, it follows from (73) that the corresponding expansion holds as i → ∞.
A.6 Variance of the potential when α > 1

Lemma A.5. Assume α > 1. Then for almost every ω ∈ Ω, as |y| → ∞ we have that Proof. It follows from (72) that for all t ∈ [0, 1], Since both arguments in the derivatives of the digamma function blow up as i → ∞, it follows from (75) that the corresponding expansion holds as i → ∞.

This implies the corresponding asymptotics for the sum over $O_\infty \prec x \preceq y$ as |y| → ∞.