A branching process with coalescence to model random phylogenetic networks

We introduce a biologically natural, mathematically tractable model of random phylogenetic network to describe evolution in the presence of hybridization. One of the features of this model is that the hybridization rate of the lineages correlates negatively with their phylogenetic distance. We give formulas and characterizations for quantities of biological interest that make these quantities straightforward to compute in practice. We show that the appropriately rescaled network, seen as a metric space, converges to the Brownian continuum random tree, and that the uniformly rooted network has a local weak limit, which we describe explicitly.


Biological context
Random trees play a central role in evolutionary biology: ultimately, much of what we know about evolution relies on a random tree being used as a null model. Meanwhile, the genomic revolution of the past decades has shown that phenomena once thought to play a minor role in large-scale evolution, such as hybrid speciation [36,37] or horizontal gene transfers [8,26,40], are in fact widespread and crucial to our understanding of evolutionary processes. As a result, there have been growing calls by biologists to replace trees by networks when studying phylogenies [6,17,18], which led to the emergence of the flourishing field of phylogenetic networks (see e.g. [32] for a recent review).
Despite this, there is still a notable lack of biologically relevant, mathematically tractable models of random phylogenetic networks. To the best of our knowledge, so far only two models of random phylogenetic networks have been studied extensively from a probabilistic standpoint: uniform ranked tree-child networks [9] and uniform level-k networks [48]. Uniform ranked tree-child networks are generated by a biologically natural process where species split at a constant rate and pairs of species hybridize at a constant rate. They turn out to be highly tractable [9,12,25]. However, they fail to take into account the fact that phylogenetically distant species are less likely to hybridize than closely related ones, which results in a very non-tree-like structure whose biological relevance is questionable. By contrast, uniform level-k networks have a tree-like large-scale structure [48]; but they do not have a biological interpretation that would justify their relevance as a model of random phylogenetic network, and they are not as mathematically tractable (at least for generic values of k).
In this work, we introduce a model of random phylogenetic network that has a natural biological interpretation while remaining mathematically tractable. The idea of this model is to consider species that (1) speciate and go extinct at constant rates and (2) hybridize, subject to some constraints: each species has a type, which can be thought of as a proxy for the genetic distance, and species of the same type hybridize at a constant rate. Types are created at a constant rate in an infinite-allele fashion, and inherited by descendants. The formal description of the model is given in the next section, along with an overview of our main results.

Setting and main results
Starting from one colored lineage at time t = 0, consider the continuous-time interacting particle system where:
• each lineage splits into two lineages at rate 1 (branching);
• each lineage dies at rate α > 0 (death);
• each pair of lineages of the same color merges at rate 2β > 0 (coalescence);
• each lineage takes a new, never-seen-before color at rate µ > 0 (mutation).
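To fix ideas, here is a minimal Gillespie-style simulation sketch of these dynamics (our own illustrative code, not from the paper; all names are ours). It tracks only the number of lineages of each color, which is all that the four rates depend on:

```python
import random

def simulate_network_counts(alpha, beta, mu, t_max=50.0, max_events=100000, rng=None):
    """Exact simulation of the color-count dynamics of the model.

    State: dict mapping color -> number of lineages of that color.
    Rates per the model: branching 1 per lineage, death alpha per lineage,
    mutation mu per lineage, coalescence 2*beta per same-colored pair.
    """
    rng = rng or random.Random()
    colors = {0: 1}            # one lineage of color 0 at time 0
    next_color, t = 1, 0.0
    for _ in range(max_events):
        n = sum(colors.values())
        if n == 0 or t >= t_max:
            break
        # total coalescence rate: 2*beta per pair, k*(k-1)/2 pairs per color
        pair_rate = beta * sum(k * (k - 1) for k in colors.values())
        total = n * (1 + alpha + mu) + pair_rate
        t += rng.expovariate(total)
        u = rng.random() * total
        if u < n:                            # branching
            colors[_pick(colors, rng)] += 1
        elif u < n * (1 + alpha):            # death
            _decrement(colors, _pick(colors, rng))
        elif u < n * (1 + alpha + mu):       # mutation: fresh color
            _decrement(colors, _pick(colors, rng))
            colors[next_color] = 1
            next_color += 1
        else:                                # coalescence within a color
            _decrement(colors, _pick_pair(colors, rng))
    return t, colors

def _pick(colors, rng):
    # choose a color with probability proportional to its number of lineages
    r = rng.random() * sum(colors.values())
    for c, k in colors.items():
        r -= k
        if r < 0:
            return c
    return c

def _pick_pair(colors, rng):
    # choose a color with probability proportional to its number of pairs
    weights = [(c, k * (k - 1)) for c, k in colors.items()]
    r = rng.random() * sum(w for _, w in weights)
    for c, w in weights:
        r -= w
        if r < 0:
            return c
    return c

def _decrement(colors, c):
    colors[c] -= 1
    if colors[c] == 0:
        del colors[c]
```

Small α and large µ tend to produce many coexisting colors, while large α leads to quick extinction.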
As illustrated in Figure 1, this process defines a time-embedded random network which can be seen as a random metric measure space ( G, d G , λ G ). Formally, a point x ∈ G corresponds to a lineage ℓ and a time t at which that lineage is alive. Since the lineages can be seen as segments, G can be seen as a collection of segments glued together at their endpoints, and λ G as the usual Lebesgue measure on this union of segments.

[Figure 1. The vertical axis is the time, flowing from top to bottom, and the vertical lines represent the lineages. Dots correspond to mutations (i.e. to a lineage changing color) and crosses correspond to deaths (i.e. to a lineage stopping). Horizontal lines correspond to either branching or coalescence, and serve to indicate the genealogical relationship between lineages; they should be treated as having length 0.]
There is a natural metric d G on G, obtained by defining d G (x, y) to be the length of a shortest path between two points x, y ∈ G. Since genetic material cannot be transmitted back in time, a more biologically relevant notion of distance between two points x, y ∈ G would consist in considering only the paths that lie in the past of the focal points. Letting h(x) = t denote the height of a point x = (ℓ, t) ∈ G and x ∧ y the most recent common ancestor of x and y, this notion of distance can be expressed as h(x) + h(y) − 2h(x ∧ y).
However, this does not define a distance in the mathematical sense, as the triangle inequality is not satisfied when the network contains coalescence points.
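A toy computation makes this failure concrete. In the following sketch (our own code, on a made-up four-point configuration), a hybrid point c created by the coalescence of two lineages is close, in the sense of h(x) + h(y) − 2h(x ∧ y), to points x and y sitting on each of its parent lineages, while x and y remain far from each other:

```python
def past_distance(h, ancestors, x, y):
    """The 'biological' dissimilarity h(x) + h(y) - 2 h(x ^ y), where
    x ^ y is the common ancestor of x and y of maximal height."""
    common = (ancestors[x] | {x}) & (ancestors[y] | {y})
    mrca = max(common, key=lambda v: h[v])
    return h[x] + h[y] - 2 * h[mrca]

# Hypothetical toy network: a root r branches into two lineages at time 0,
# which coalesce into a hybrid lineage c at time 10; x and y sit on the two
# parent lineages just before the coalescence.
h = {"r": 0.0, "x": 9.9, "y": 9.9, "c": 10.0}
ancestors = {
    "r": set(),
    "x": {"r"},
    "y": {"r"},
    "c": {"r", "x", "y"},   # the hybrid c descends from both lineages
}
```

Here past_distance gives 0.1 for the pairs (x, c) and (c, y), but 19.8 for (x, y), so the triangle inequality fails at the coalescence point.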
In this document, we are mostly interested in the structure of G conditioned on being large. More specifically, we consider ( G n , d Gn , λ Gn ), the metric measure space having the law of ( G, d G , λ G ) conditioned on having n colors, and we study various limits of G n as n goes to infinity.
For this, it will be convenient to see G as a decorated Galton-Watson tree. For each color k, let X k denote the subnetwork of G formed by the lineages of color k, endowed with the information of which endpoint corresponds to the creation of the color k (henceforth referred to as the root of X k ) and of which of the other endpoints correspond to mutations as opposed to deaths. Let T denote the genealogical tree of the colors - that is, the ordered tree whose vertices are the colors of the lineages and where k ′ is a child of k if and only if k ′ was created by the mutation of a lineage of color k, the children of a color being ordered according to the order of appearance of the corresponding mutations. Finally, let T ⋆ denote the tree T where each vertex k is decorated by the corresponding network X k . Note that G and T ⋆ contain the same information, since to recover G from T ⋆ it suffices to glue, for each color k and each index i, the root of the decoration of the i-th child of k to the endpoint of X k corresponding to the i-th mutation of k.
In Section 2, we list miscellaneous results that are used throughout the document, starting with properties of the process describing the dynamics of the number of lineages of a given color (the so-called logistic branching process). We then study the random variable M giving the number of new colors that a color produces over its lifetime, i.e. the offspring distribution of the Galton-Watson tree T. We show that its expected value is given by

E(M ) = Σ k⩾1 µ / (ρ 1 ρ 2 · · · ρ k ),

where ρ k = α + µ + (k − 1)β. Since every color has an almost surely finite lifetime, the process generating G goes extinct with probability 1 if and only if E(M ) ⩽ 1.
We also show that the probability generating function of M can be expressed as the continued fraction

g(z) = (µz + α) / ( 1 + ρ 1 − (µz + α + β) / ( 1 + ρ 2 − (µz + α + 2β) / ( 1 + ρ 3 − · · · ))).

This expression makes it straightforward to numerically compute the probability of extinction of the process - that is, the smallest fixed point of g in [0, 1]. Similarly, we give a characterization of the asymptotic growth rate of the total number of lineages that makes it possible to compute it in practical applications.
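Numerically, such a continued fraction can be evaluated by a backward recursion, truncating its tail at a large depth (a heuristic truncation; rigorous two-sided bounds are discussed in Section 2). The following sketch (our own code, based on the level-k recursion g k (z) = (µz + α + (k − 1)β)/(1 + ρ k − g k+1 (z)) behind Theorem 2.5) also computes E(M ) from its series expression and the extinction probability by fixed-point iteration:

```python
import math

def g_cf(z, alpha, beta, mu, depth=400):
    # Evaluate the PGF of M via the backward recursion
    #   g_k(z) = (mu*z + alpha + (k-1)*beta) / (1 + rho_k - g_{k+1}(z)),
    # rho_k = alpha + mu + (k-1)*beta, truncating with tail g_{depth+1} ~ 1.
    val = 1.0
    for k in range(depth, 0, -1):
        rho = alpha + mu + (k - 1) * beta
        val = (mu * z + alpha + (k - 1) * beta) / (1 + rho - val)
    return val

def expected_offspring(alpha, beta, mu, terms=400):
    # E(M) = sum_{k >= 1} mu / (rho_1 * ... * rho_k)
    total, prod = 0.0, 1.0
    for k in range(1, terms + 1):
        prod *= alpha + mu + (k - 1) * beta
        total += mu / prod
    return total

def extinction_probability(alpha, beta, mu, tol=1e-12, max_iter=10**5):
    # Smallest fixed point of g in [0, 1]: iterate s <- g(s) from s = 0,
    # which increases monotonically to the extinction probability.
    s = 0.0
    for _ in range(max_iter):
        t = g_cf(s, alpha, beta, mu)
        if abs(t - s) < tol:
            return t
        s = t
    return s
```

Under the expressions above, for α = β = µ = 1 the series gives E(M ) = e − 2 ≈ 0.718 < 1, so the process is subcritical and the extinction probability is 1; lowering α and raising µ (say α = 0.1, β = 1, µ = 3) makes E(M ) > 1 and the extinction probability nontrivial.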
In Sections 3 and 4, we study the geometry of the network G n . Section 3 deals with the global, large-scale structure of G n as n goes to infinity. This structure is tree-like: in Theorem 3.12 we show that, letting | G n | = λ Gn ( G n ), for some well-characterized constant C the rescaled space ( G n , (C/√n) d Gn , (1/| G n |) λ Gn ) converges to Aldous' Brownian continuum random tree in distribution for the Gromov-Hausdorff-Prokhorov topology. Finally, Section 4 focuses on the local structure of G n : we show that G n rooted at a uniform point has a local weak limit, which we describe explicitly.

Comments and perspectives
Our proof of the convergence to the CRT is based on [48], where most of the ideas that we use in Section 3 can already be found. Nevertheless, some specificities of our model -in particular the fact that the number of new colors produced by a color during its lifetime and the total length of the corresponding subnetwork are not bounded random variables -require a different treatment and have necessitated a fine-grained study of the logistic branching process with mutation. The existence of various local weak limits was also obtained in [48]. The ideas are similar, insofar as we are also dealing with local weak limits of blow-ups of Galton-Watson trees. However, there are some notable differences -such as the fact that our focal point is chosen uniformly with respect to the length measure of our time-embedded network (as opposed to uniformly on the vertices of the underlying graph) and that we are able to give a more explicit description of the local geometry of the limit.
In an effort to make this paper accessible to mathematical biologists who do not have specific knowledge about Galton-Watson trees or Gromov-Hausdorff-Prokhorov convergence, we have strived to make it as self-contained as possible by (1) providing detailed reminders about most of the notions and results that are used and (2) whenever possible, expressing our results as general statements that are not tied to our particular setting. In particular, Proposition 3.4 provides a general recipe for proving Gromov-Hausdorff-Prokhorov convergence to a random R-tree, and Lemma 3.10 makes it straightforward to apply this proposition to decorated Galton-Watson trees.
We close this introduction by mentioning an interesting line of research: our study hinges on the fact that our model can be seen as a decorated Galton-Watson tree. This crucial connection stems from the fact that the hybridization rate is an all-or-nothing function of the phylogenetic distance: two lineages hybridize at a constant rate if they have the same color, and not at all otherwise. However, from a modelling point of view it would be more natural to use a more nuanced notion of phylogenetic distance, and to let the hybridization rate be a gradually decreasing function of that distance.
For instance - as a first step and in keeping with the idea of colors representing incompatibility alleles - one could let lineages carry several colors, and make the hybridization rate between two lineages a decreasing function of the number of colors that differ between these lineages. Based on the biological interpretation, one might expect such a model to have properties that are very similar to those of our model. However, because the link with Galton-Watson trees is lost, it is not clear whether this is the case, nor how to study it. Therefore, studying such models of phylogenetic networks - whose large-scale geometry is expected to be tree-like, even though there is no immediate, rigorous connection with branching processes - seems like an interesting and challenging problem that will likely require developing new tools and methods.

The logistic branching process
Throughout this document, we denote by X = (X t : t ⩾ 0) the process counting the number of lineages of the first color. It is a birth-death process started from X 0 = 1, killed in 0, and with transition rates:
• k → k + 1 at rate k;
• k → k − 1 at rate kρ k , where ρ k = α + µ + (k − 1)β.
This process has been called the branching process with logistic growth (or, more succinctly, the logistic branching process) and has been studied, e.g., in [34,43]. It is also a special case of a branching process with interactions, see [13,29,42].
The qualitative behaviour of X can be described as transient fluctuations in a potential well. Indeed, letting K = 1 + (1 − α − µ)/β, when X is smaller than K it tends to increase whereas when it is greater than K it tends to decrease. Thus, in particular when K is large, typical trajectories of X quickly relax towards a quasi-stationary distribution and then fluctuate until they eventually hit 0, which happens in finite time with probability 1.
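These dynamics are straightforward to simulate exactly with Gillespie's algorithm. The following sketch (our own illustrative code) uses rate k for upward jumps and kρ k , with ρ k = α + µ + (k − 1)β, for downward jumps, and returns the extinction time together with the trajectory:

```python
import random

def simulate_logistic_bp(alpha, beta, mu, x0=1, rng=None, max_steps=10**6):
    # Exact (Gillespie) simulation of the logistic branching process:
    # from state k, jump to k+1 at rate k and to k-1 at rate k*rho_k,
    # where rho_k = alpha + mu + (k-1)*beta.
    rng = rng or random.Random()
    t, k = 0.0, x0
    path = [(0.0, x0)]
    for _ in range(max_steps):
        if k == 0:
            break
        up = k
        down = k * (alpha + mu + (k - 1) * beta)
        t += rng.expovariate(up + down)
        k += 1 if rng.random() < up / (up + down) else -1
        path.append((t, k))
    return t, path  # extinction time T and trajectory (t_i, X_{t_i})
```

Averaging the returned extinction time over many runs gives a Monte Carlo estimate of E(T); starting from a large x0 illustrates the quick relaxation towards the quasi-stationary regime described above.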
Although this qualitative behaviour is well understood, the quadratic term in the death rate makes it difficult - and, to some extent, impossible - to obtain exact quantitative results. For instance, a classic approach to study birth-death processes consists in using the Kolmogorov forward equations to obtain a characterization of the probability generating function f (z, t) = E(z Xt ) as the solution of a partial differential equation. Here, standard calculations show that f is the unique analytic solution on [0, 1] × R + of

∂ t f (z, t) = (1 − z) ( βz ∂ 2 z f (z, t) + (α + µ − z) ∂ z f (z, t) ),   f (z, 0) = z.

However, this partial differential equation is known not to have a closed-form solution - see Proposition 1.2 in [1]. Another powerful approach to study birth-death processes is the integral representation of the transition probabilities using orthogonal polynomials introduced by Karlin and McGregor [30,31], but to our knowledge in the case of the logistic branching process this does not yield useful explicit expressions.
One of the important properties of the logistic branching process is that it comes down from infinity (meaning that there is a unique way to start it from X 0 = ∞ and yet have X t < ∞ for any t > 0), as shown by Lambert in [34]. We denote by E ∞ the expectation under the initial condition X 0 = ∞. A recurring quantity throughout this paper is the extinction time T = inf{t ⩾ 0 : X t = 0}. In his Theorem 2.3, Lambert gives the Laplace transform of T under E ∞ as a function of the solution of a Riccati equation, and shows that its expected value is finite. In fact, T also has finite exponential moments under E ∞ . This can be deduced, e.g., from [24, Proposition 2.4] or [5, Proposition 2.2], and will play an important role in our study - even though, for reasons that will become clear, we actually need a variant of this result (namely Lemma 3.6 in Section 3.3).
Because in our setting the mutations associated to the logistic branching process play a crucial role, the following change of measure will be useful. In what follows, we fix the parameters α and β, and we denote by E µ the expectation under a logistic branching process with mutation rate µ.
Proof. For this proof, it will be convenient to use the "extended" chain X̂ which, in addition to the trajectory of X, contains the information of which transitions correspond to mutations. In other words, X̂ is a continuous-time Markov chain where there are two distinct types of transitions from i to j: mutation transitions, with rate q • ij , and non-mutation transitions, with rate q ◦ ij , where

q • i,i−1 = µi,   q ◦ i,i+1 = i,   q ◦ i,i−1 = (α + (i − 1)β) i,

and all other rates are zero. For convenience, we also use the notation q i = Σ j q ij , i.e. in our case q i = (1 + ρ i ) i.
First, note that X̂ almost surely has a finite number of jumps before hitting 0. Thus, for any n ∈ N, to any trajectory γ starting from 1 and ending in 0 after n jumps, encoded as a sequence of i →• j and i →◦ j transitions and a vector x = (x 1 , . . . , x n ) of corresponding holding times, we can associate its probability density, where γ k denotes the position of γ before the k-th transition and dx = dx 1 · · · dx n . Noting that M (γ) is the number of i →• j transitions of γ, the claimed identity follows by summing over all sequences γ that end in 0 after a finite number of steps, the integrals being over R n + , with n the number of jumps of γ. This concludes the proof.
Finally, it will also be useful to describe the trajectory of X as seen from a uniform mutation time. For this, we first need to introduce some notation for yet another type of change of measure that will appear several times in the paper.

Notation 2.2. Let A and B be random variables defined on the same probability space such that B is almost surely nonnegative and 0 < E(B) < ∞. We write L(A † B) for the distribution of A biased by B, that is, under the probability measure defined by P( · † B) = E(1 {·} B) / E(B). ⋄

With this notation, by "the process X as seen from a uniform mutation time" we rigorously mean the process X m obtained by recentering the trajectory of X at a point U chosen uniformly at random among the atoms of the point process M giving the times of the mutations associated to the trajectory of X, under the measure biased by the number of atoms M of M; note that U need not be defined when M is empty, because P(M = 0 † M ) = 0. Equivalently, the distribution of X m is characterized by

E(F (X m )) = E( Σ t∈M F (θ t X) ) / E(M )

for any measurable bounded functional F , where θ t X = (X t+s : s ⩾ −t) denotes the trajectory of X recentered at time t.
It turns out that it is also possible to obtain X m by a simple construction. For this, we need to introduce one last definition.
The back-to-back pasting of f to g is the càdlàg function f ≀ g obtained by shifting f so that it ends at time 0 and appending g after it on the positive half-line.

Proposition 2.4. Let ν • be the probability distribution on the positive integers defined in Equation (1), with C the corresponding normalizing constant. Let K ∼ ν • and, conditional on K, let X ′ and X ′′ be two independent realizations of the logistic branching process X started from X ′ 0 = K and X ′′ 0 = K − 1. Then, X m is distributed as X ′ ≀ X ′′ .

The proof uses general results about the decomposition of trajectories of Markov chains that are recalled in Appendix A.1, and therefore is deferred to the end of that appendix.

Offspring distribution and extinction probability of T
In this section, we focus on the law of the random variable M giving the number of mutations of a color (that is, the number of new colors that it produces; also the offspring distribution of the Galton-Watson tree T). Our main result is the following theorem, on which much of our study relies.
Theorem 2.5. Let M be the offspring distribution of T, and let g be its probability generating function. Then, letting ρ k = α + µ + (k − 1)β,

E(M ) = Σ k⩾1 µ / (ρ 1 ρ 2 · · · ρ k )

and, for |z| < 1,

g(z) = (µz + α) / ( 1 + ρ 1 − (µz + α + β) / ( 1 + ρ 2 − (µz + α + 2β) / ( 1 + ρ 3 − · · · )))

which, using Gauss's notation for continued fractions, can be written

g(z) = (µz + α) / (1 + ρ 1 −) (µz + α + β) / (1 + ρ 2 −) (µz + α + 2β) / (1 + ρ 3 −) · · ·

Moreover, g is meromorphic on C. The radius of convergence of its power series expansion around 0 is R > 1, and g has a pole at R.
Proof. Let X̃ be the embedded chain of X, that is, X̃ i = X τ i where τ 0 = 0 and τ i+1 = inf{t > τ i : X t ̸= X τ i }. Note that, conditional on the trajectory of X̃, each step from k to k − 1 corresponds to a mutation with probability p k := µ/ρ k , independently of everything else. Let us refer to a trajectory of X̃ started from k and killed when it first hits k − 1 as a k-excursion of X̃. Every k-excursion of X̃ can be decomposed into N k independent (k + 1)-excursions, followed by a single step from k to k − 1. By the strong Markov property, N k follows a geometric distribution on {0, 1, . . .} with parameter θ k := ρ k /(1 + ρ k ). Therefore, letting M k have the distribution of the number of mutations along a k-excursion of X̃, we have the distributional identity

M k =d Σ i=1..N k M (i) k+1 + Ber(p k ),   (2)

where the M (i) k+1 are independent copies of M k+1 that are also independent of N k , and Ber(p k ) is a Bernoulli variable that is independent of everything else.
Applying Wald's formula to Equation (2) gives

E(M k ) = ( µ + E(M k+1 ) ) / ρ k ,

and solving this first-order linear recurrence yields the formula for the expected value of M =d M 1 .
Let us now turn to the generating function of M , and let g k (z) := E(z M k ), with R k the radius of convergence of its power series expansion around 0. Note that since M k stochastically dominates M k+1 , the R k are nondecreasing. Then, for all z such that |(1 − θ k ) g k+1 (z)| < 1 - note that this is true for all |z| < 1 - we have

g k (z) = (µz + α + (k − 1)β) / (1 + ρ k − g k+1 (z)).

This gives the representation of g = g 1 as the continued fraction of the theorem.
We now show that g extends to a meromorphic function on all of C. Note that M k is stochastically dominated by H k , the hitting time of 0 by the simple random walk started from 1 that goes up with probability 1 − θ k and down with probability θ k , independently of its current position. A standard calculation (see e.g. [11, Section 6.4]) shows that the probability generating function of H k is

h k (z) = ( 1 − √(1 − 4θ k (1 − θ k ) z 2 ) ) / ( 2(1 − θ k ) z ),

whose power series expansion around zero has a radius of convergence equal to (4θ k (1 − θ k )) −1/2 . Moreover, P(H k < ∞) = min(1, θ k /(1 − θ k )) is equal to 1 for all k large enough, and P(H k < ∞) = 1 implies that E(z H k ) = h k (z) inside the disk of convergence of E(z H k ). Since E(z M k ) ⩽ E(z H k ) for z ⩾ 1 and since (4θ k (1 − θ k )) −1/2 → +∞ as k goes to infinity, this shows that for any r > 0 there exists k r such that g k r is analytic on D r = {z : |z| < r}. It then follows by induction that g k r −1 , . . . , g 1 are meromorphic on D r .
Finally, recall that the dominant singularities of a function that is analytic at 0 are those singularities that are closest to the origin. To see that the dominant pole of g is at some R > 1, note that since the power series representation of g around the origin has nonnegative coefficients, Pringsheim's theorem (see e.g. [23, Theorem IV.6]) ensures that it has a dominant singularity in ]0, +∞[. Since g(1) = 1 is finite and since all singularities of g are poles, this means that g has no singularity at 1. Hence, g has a dominant pole at some R > 1.
One of the advantages of the expression of g as a generalized continued fraction is that this makes its numerical evaluation straightforward and very efficient. Indeed, modified convergents of this continued fraction provide us with upper and lower bounds on g, as the next proposition shows. The rapid convergence of these bounds is illustrated in Figure 2.
Proposition 2.6. For every n ⩾ 1, the modified convergents G n and Ḡ n of the continued fraction of Theorem 2.5, defined in the proof below, satisfy

G n (z) ⩽ g(z) ⩽ Ḡ n (z) for all z ∈ [0, 1],

and sup z∈[0,1] |Ḡ n (z) − G n (z)| = O(n c β −n /n!).

Proof. We give a probabilistic proof. Although it is possible to give a shorter analytic proof, we think that the probabilistic one is more instructive.
Let M (n) denote the number of mutations associated to a modified version X (n) of the process X, where coalescences happen at rate n(n − 1)β instead of k(k − 1)β whenever X (n) = k ⩾ n. Let M (n) denote the number of mutations that correspond to transitions from k to k − 1 with k ⩽ n in the original process X. For all n ⩾ 1, we have the stochastic dominations (3). Let G n (z) and Ḡ n (z) denote the left- and right-hand sides of the inequality of the proposition, respectively. The same reasoning as in the proof of Theorem 2.5 shows that G n and Ḡ n are, respectively, the generating functions of M (n) and of M (n) . Note however that, in the case of M (n) , for small values of n there can be a positive probability that X (n) never hits 0 - in which case it is not possible to decompose its trajectory into finite excursions. Nevertheless, letting A k be the event that X (n) started from k never hits k − 1 and N (n) k a geometric variable with parameter ρ k∧n /(1 + ρ k∧n ), the analogue of the decomposition (2) holds, with copies of the level-(k + 1) quantities that are independent of N (n) k . Since, up to a negligible event, A k and {M (n) k = ∞} are equal, this means that Equation (2) holds for M (n) k , mutatis mutandis, even when P(M (n) k = ∞) > 0. Finally, the expression of ḡ n is obtained by solving the corresponding fixed-point equation. Being generating functions, G n and Ḡ n are analytic at 0 with radius of convergence at least 1, and we have E(z M (n) ) = G n (z) and E(z M (n) ) = Ḡ n (z) for all z ∈ [0, 1[. Combining this with (3) and taking the limit z → 1 − proves the inequality of the proposition for all z ∈ [0, 1].
For z > 1, since M (n) is stochastically dominated by M and since M is almost surely finite, for all n we have G n (z) = E(z M (n) ) ⩽ E(z M ) = g(z) for all z ∈ [1, R[, where R is the radius of convergence of g around 0. Similarly, for n large enough P(M (n) < +∞) = 1 and thus g(z) ⩽ Ḡ n (z) for all z ∈ [1, R n [, where R n is the radius of convergence of Ḡ n around 0. Note however that for small n we can have Ḡ n (1) = P(M (n) < +∞) < 1, and thus Ḡ n (z) < g(z) for z ∈ [1, R n [. Finally, to see that sup z∈[0,1] |Ḡ n (z) − G n (z)| = O(n c β −n /n!), note that the two continued fractions differ only through their tails, so that for all z, A, B ∈ [0, 1] the effect of replacing a tail value A by B propagates through the successive levels. Since |ḡ n (z) − 1| ⩽ 1, an immediate induction gives the announced bound, finishing the proof.
Besides numerical evaluation, the bounds of Proposition 2.6 can be used to obtain rigorous bounds on the probability of extinction of the model. For instance, taking n = 2 for the left-hand side, n = 3 for the right-hand side, and finding the corresponding fixed points, we get simple explicit bounds on the extinction probability. In fact, it is possible to get one such upper bound up to n = 9. However, the resulting expression, although very sharp, is too complex to be of any practical use - so we do not reproduce it here.
Let us now point out two immediate consequences of Theorem 2.5 that will be useful in the rest of this document.

Corollary 2.7.
(i) M has finite exponential moments: there exists a > 0 such that E(e aM ) < ∞.
(ii) There is an exponential tilt of M with mean 1: there exists s ∈ ]0, R[ such that the variable M s defined by P(M s = k) = s k P(M = k) / g(s) satisfies E(M s ) = 1.
Proof. Point (i) is merely saying that the radius of convergence of g is greater than 1; point (ii) is a classic consequence of the fact that g(s) → +∞ as s ↑ R, see e.g. point (iv) of Lemma 3.1 in [28]. For the sake of completeness, we recall the argument: for any s ∈ ]0, R[, the tilted variable M s has expected value s g ′ (s)/g(s). Since this quantity is a continuous function of s that vanishes at s = 0 and tends to +∞ as s ↑ R, it takes the value 1 for some s ∈ ]0, R[.
The main consequence of Corollary 2.7 is that, for all α, β, µ > 0, when conditioned to have n vertices T is distributed as a critical Galton-Watson tree conditioned to have n vertices. We will come back to this in Section 3.
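The exponential tilt can be explored numerically. The sketch below (our own code) samples M from the embedded-chain description used in the proof of Theorem 2.5 - at state k, go up with probability 1/(1 + ρ k ), otherwise step down, the down-step being a mutation with probability µ/ρ k - and then locates, by bisection on the empirical distribution, a tilt parameter s at which the tilted mean E(M s M )/E(s M ) equals 1:

```python
import random

def sample_offspring(alpha, beta, mu, rng):
    # One sample of M: run the embedded chain of X from state 1 until it
    # hits 0, counting the down-steps that are mutations.
    k, mutations = 1, 0
    while k > 0:
        rho = alpha + mu + (k - 1) * beta
        if rng.random() < 1 / (1 + rho):       # up-step
            k += 1
        else:                                   # down-step...
            if rng.random() < mu / rho:         # ...which is a mutation
                mutations += 1
            k -= 1
    return mutations

def tilt_with_unit_mean(samples, lo=1.0, hi=50.0, iters=200):
    # Bisection for s such that the empirically tilted mean
    # E(M s^M) / E(s^M) equals 1. Assumes the tilted mean is < 1 at lo
    # and > 1 at hi; it is increasing in s, so bisection is safe.
    def tilted_mean(s):
        num = sum(m * s**m for m in samples)
        den = sum(s**m for m in samples)
        return num / den
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tilted_mean(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For subcritical parameters (empirical mean of M below 1), the unit-mean tilt sits at some s > 1, matching point (ii) of the corollary.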
Finally, we close this section with a brief remark about M, the point process of mutation times. We state it as a proposition for ease of reference, but it is not specific to our setting and follows readily from the infinitesimal definition of a continuous-time Markov chain -so we omit the proof.
Proposition 2.8. Let M be the point process on R + giving the birth times of the children of the first color (that is, every atom t ∈ M corresponds to a mutation of a lineage of the first color). Then the intensity measure of M is m(t) dt, where m(t) = µ E(X t ).
In particular, E(M ) = µ ∫ 0 ∞ E(X t ) dt.

Growth rate of the number of lineages
Let us start by focusing on the number of colors; we will turn to the number of lineages at the end of the section. Let Z t denote the number of colors alive at time t. The process Z = (Z t : t ⩾ 0) is a Crump-Mode-Jagers process, or CMJ for short, where individuals give birth according to a point process distributed as M, the point process of mutations of the first color, and die after a time distributed as T , the extinction time of the logistic branching process started from 1.
The next proposition is an application of standard results from the theory of CMJ processes [14,15,27]. Essentially, CMJ processes grow / decrease exponentially with a growth rate known as their Malthusian parameter. Proposition 2.9 recalls the precise meaning of this "exponential growth" and gives the usual, generic characterization of the growth rate. Another characterization -one that is specific to our setting and makes it possible to compute the growth rate numerically -will be given in Proposition 2.10.

Proposition 2.9. Let λ be the unique solution of

E( Σ t∈M e −λt ) = 1,

or, equivalently, of µ ∫ 0 ∞ e −λt E(X t ) dt = 1. Then, λ has the same sign as E(M ) − 1. Moreover,

(i) if λ > 0, then the process Z of the number of colors satisfies

e −λt Z t −→ W as t → ∞,

where W is a random variable with E(W ) = 1 that is almost surely positive on non-extinction, i.e. on the event {Z t > 0 for all t}, and where the convergence holds both almost surely and in mean square.
Proof. The fact that the two characterizations of λ are equivalent follows from Proposition 2.8 and Campbell's formula. The uniqueness of λ is standard, and so is the fact that λ is guaranteed to exist whenever E(M ) ⩾ 1, see [14,Section 6].
To see that λ also exists when E(M ) < 1, let τ denote the time of the first jump of X and consider the random variable Y η that takes the value e −ητ if the first jump of X is a mutation, and 0 otherwise, so that Y η ⩽ Σ t∈M e −ηt . A straightforward calculation then shows that E(Y η ) = µ ∫ 0 ∞ e −(1+α+µ+η)t dt = µ/(1 + α + µ + η), which tends to +∞ as η decreases to −(1 + α + µ) and can therefore be made greater than 1 by decreasing η. By the same reasoning as in the proof of the meromorphy of g in Theorem 2.5, η ↦ E( Σ t∈M e −ηt ) cannot jump to infinity. Since it is equal to E(M ) < 1 when η = 0, the existence of λ follows by continuity.
Since Theorem 2.5 entails that E(M 2 ) < ∞, the mean-square convergence in point (i) follows immediately from [15, Theorem 3.1]. Similarly, the almost sure convergence follows from [15, Theorem 3.2], provided that the intensity function of M, namely m : t ↦ µ E(X t ), is sufficiently regular. Since m ′ can be expressed in terms of the transition rates of X and X t is integer-valued, we have |m ′ (t)| < K E(X 2 t ) for some constant K. Thus, by Jensen's inequality, to complete the proof of point (i) it suffices to show that ∫ 0 ∞ E(X p t ) dt < ∞ for some p > 2. Standard calculations, again using the decomposition of the trajectory of X into excursions, as in the proof of Theorem 2.5, give a bound on E(X p t ) under which, as already seen in the proof of Proposition 2.6, the integral is finite for all p, finishing the proof of point (i).
Finally, letting T denote the extinction time of X, 1 {T >t} ⩽ X t and therefore P(T > t) ⩽ E(X t ), which gives the remaining claim of the proposition.

We now give another characterization of λ, which makes use of the measure ν • introduced in Proposition 2.4. Here, we let E • ( · ) denote the expectation for the process X started from a random state with distribution ν • .

Proposition 2.10. Let T denote the extinction time of X. Then, the growth rate λ is the unique solution of

E(M ) E • (e −λT ) = 1.

Furthermore, E • (e −λT ) has an explicit expression in terms of the functions f k (λ), where f k (λ) is given by a continued fraction obtained, as in Theorem 2.5, from the excursion decomposition of X.

The interest of this proposition is that, since the functions f k (λ) can be evaluated efficiently, so can E(M ) E • (e −λT ). This makes it straightforward to determine λ numerically, for instance using the bisection method.
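As a rough alternative to the continued-fraction route, the Malthusian equation can also be solved by Monte Carlo: simulate mutation-time samples once, then bisect on λ using the empirical version of E( Σ t∈M e −λt ) = 1. The sketch below is our own code; reusing the same samples for all values of λ makes the empirical function decreasing in λ, so bisection is safe, and the bracket [0, 8] assumes a supercritical regime with a moderate growth rate:

```python
import math
import random

def mutation_times(alpha, beta, mu, rng):
    # Times of the mutations along one trajectory of the logistic
    # branching process (Gillespie simulation started from one lineage).
    t, k, times = 0.0, 1, []
    while k > 0:
        rho = alpha + mu + (k - 1) * beta
        t += rng.expovariate(k * (1 + rho))
        if rng.random() < 1 / (1 + rho):     # birth
            k += 1
        else:                                 # death, mutation or coalescence
            if rng.random() < mu / rho:       # mutation
                times.append(t)
            k -= 1
    return times

def malthusian_mc(alpha, beta, mu, n_samples=6000, lo=0.0, hi=8.0, seed=0):
    # Bisection on the empirical version of E(sum_{t in M} e^{-lam t}) = 1.
    # Assumes the root lies in [lo, hi]; each sample's contribution is
    # decreasing in lam, so the empirical function is decreasing too.
    rng = random.Random(seed)
    samples = [mutation_times(alpha, beta, mu, rng) for _ in range(n_samples)]
    def h(lam):
        return sum(math.exp(-lam * t) for ts in samples for t in ts) / n_samples
    for _ in range(100):
        mid = (lo + hi) / 2
        if h(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

In the subcritical regime the empirical value of E(M ) falls below 1 and the root of the equation is negative, in which case the bracket above no longer applies.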
Proof. The first part of the proposition is a consequence of the standard characterization of λ, which is recalled in Proposition 2.9, and of the construction of the process X m given in Proposition 2.4. Indeed, first note that E( Σ t∈M e −λt ) = E(M ) E(e −λU † M ), where U is a uniform atom of M, and also corresponds to minus the infimum of the times for which X m is defined. Second, recall that X m is distributed as X ′ ≀ X ′′ , where X ′ is distributed as X started from ν • , and that in this construction U corresponds to the extinction time of X ′ . As a result, E(e −λU † M ) = E • (e −λT ). By the strong Markov property, E • (e −λT ) can be expressed in terms of the quantities E k (e −λT k−1 ), where T k−1 denotes the hitting time of k − 1 by X started from k. From the expression of E(M ) in Theorem 2.5, we see that the normalizing constant in Equation (1) cancels out in E(M ) E • (e −λT ). Therefore, to finish the proof it only remains to prove the expression of f k (λ) given in the proposition. The reasoning is exactly the same as for the expression of the generating function of M in Theorem 2.5, so we do not detail it.
So far, we have been focusing on the growth rate of Z t , the number of colors at time t. But from a biological point of view it is arguably more natural to consider Υ t , the number of lineages at time t. We therefore close this section with a proposition showing that the asymptotic growth rate of the number of lineages is the same as that of the number of colors. For simplicity we do not try to state the results in full generality.
Proposition 2.11. Let λ be the growth rate of Z, as given in Propositions 2.9 and 2.10, and let Υ t be the number of lineages alive at time t. If λ > 0, then

e −λt Υ t −→ Ξ as t → ∞,

where Ξ is a random variable that is almost surely positive on non-extinction.
Proof. Again, this is a standard application of general results for CMJ processes counted with a random characteristic, see e.g. [41]. More specifically, let the characteristic associated to each color be the number of lineages of that color. Note that the characteristic of a color is not independent of its lifespan and of its reproduction, but that the characteristics of different colors are independent. Since E(M ) < ∞, Condition 5.1 in [41] holds with g(t) = e −λt . Moreover, by using the same argument as for M it is straightforward to show that the total number of jumps of X has finite exponential moments. Since X has bounded jumps, this implies that E(sup t X t ) < ∞, and so Condition 5.2 in [41] holds with h(t) = e −λt . As a result, the proposition follows from [41,Theorem 5.4].

Convergence to the CRT
In this section, we study the large-scale geometry of G. We will show that, after being conditioned to have n colors and appropriately rescaled, as n goes to infinity G converges in distribution to the Brownian continuum random tree (CRT) for the rooted Gromov-Hausdorff-Prokhorov topology.
The Brownian CRT, introduced by Aldous in [3], is the universal scaling limit of critical Galton-Watson trees when the offspring distribution has finite variance. Since its first description as a random subset of ℓ 1 obtained by successively glueing segments of random lengths along orthogonal directions, it has become standard to construct it from a normalized Brownian excursion e = (e t : t ∈ [0, 1]), as follows. Letting d e (x, y) = e x + e y − 2 min {e z : z ∈ [x ∧ y, x ∨ y]} for x, y ∈ [0, 1]:
• Let ( C, d) be the quotient metric space obtained by identifying the points of [0, 1] at distance zero for d e , and let the root r ∈ C be the equivalence class of 0.
• Let λ be the pushforward on C of the Lebesgue measure on [0, 1].
The rest of this section is organized as follows: first, we give a brief reminder about convergence in the rooted Gromov-Hausdorff-Prokhorov topology. Coming back to our model, we then detail how to condition G on having n colors, and we introduce some notation. Finally, we prove a series of technical lemmas which, when put together, readily give us the desired convergence to the CRT.

The rooted Gromov-Hausdorff-Prokhorov distance
Here we recall, mostly without proof, the minimal set of notions about convergence of metric probability spaces that are needed to state and prove our results. More detailed treatments can be found, e.g., in [39, Section 6] or in [22, Section 4]. In particular, Proposition 3.4 below provides a general-purpose, simple way to establish convergence to the CRT by following the approach used in [48]. See also [44] for related results.
Since our network G has a distinguished point, namely the point that corresponds to the first lineage at time 0, it is natural to work with a rooted version of the Gromov-Hausdorff-Prokhorov distance. We adapt the definition of [39, Section 6.2] to the rooted setting: let M be the set of isometry classes of rooted compact metric probability spaces X = (X, r, d, λ), where r ∈ X is called the root of X; d is a metric on X; and λ is a probability measure on X. The rooted Gromov-Hausdorff-Prokhorov distance d_GHP(X, X′) between two elements (X, r, d, λ) and (X′, r′, d′, λ′) of M is defined as the infimum of the ε > 0 such that there exists a metric δ on the disjoint union Y ··= X ⊔ X′ satisfying:
• the restrictions of δ to X × X and to X′ × X′ coincide with d and d′, respectively;
• δ(r, r′) ⩽ ε;
• the Hausdorff distance between X and X′ in (Y, δ) is at most ε;
• the Prokhorov distance between λ and λ′, viewed as measures on (Y, δ), is at most ε.
With this definition, (M, d_GHP) is a complete separable metric space (see e.g. [39, Theorem 6 and Proposition 8] for a proof in the unrooted setting; we let the interested reader check that the proof carries over to the rooted setting, and refer them to [22, Section 4.3.3], where this is done for the Gromov-Hausdorff distance).
Because our metric spaces are tree-like, in our setting it will be more convenient to work with height processes than to manipulate d GHP directly. Let us start by recalling how one can obtain a metric space from a càdlàg function, and introducing some notation. Note that this construction is simply a generalization of the construction of the Brownian CRT recalled at the beginning of this section, but with more general functions as contour processes.
Definition 3.1. Given a càdlàg function h : [0, 1] → R, let d_h be the pseudo-distance on [0, 1] defined by d_h(x, y) ··= h(x) + h(y) − 2 inf_{u∈[x,y]} h(u), where, as previously, [x, y] is shorthand for [x ∧ y, x ∨ y]. We then denote by T_h the rooted compact metric probability space obtained by: (1) identifying points x, y ∈ [0, 1] such that d_h(x, y) = 0; (2) taking the completion of the space with respect to d_h; (3) taking the equivalence class of 0 as the root; and (4) endowing the resulting rooted metric space with the pushforward of the Lebesgue measure on [0, 1]. This metric space is a subset of an R-tree and consists of a countable number of connected components - see Figure 3 for an illustration, and e.g. [22] for an introduction to R-trees. ⋄
The interest of working with R-trees and their height processes comes from the following lemma, which is a straightforward extension of [35, Lemma 2.4]. Let us denote by D the space of càdlàg functions from [0, 1] to R that are also continuous at 1, endowed with the usual Skorokhod topology [10].
Lemma 3.2. Let h, h_1, h_2, ... be functions in D such that h_n → h in the Skorokhod topology. If h is continuous, then T_{h_n} → T_h for the rooted Gromov-Hausdorff-Prokhorov topology.
A self-contained proof can be found in the appendices.
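As a quick sanity check of Definition 3.1 (an illustration only, not used in the proofs), the pseudo-distance d_h(x, y) = h(x) + h(y) − 2 min_{u∈[x,y]} h(u) can be evaluated on a discrete grid and verified to be a pseudo-metric; the height function used below is an arbitrary reflected random walk.

```python
import random

def d_h(h, x, y):
    # excursion-type pseudo-distance:
    # d_h(x, y) = h(x) + h(y) - 2 * (min of h over [x /\ y, x \/ y])
    lo, hi = min(x, y), max(x, y)
    return h[lo] + h[hi] - 2 * min(h[lo:hi + 1])

random.seed(0)
n = 200
# an arbitrary nonnegative "height function" on a grid of n + 1 points:
# a simple random walk reflected at 0
h = [0] * (n + 1)
for i in range(1, n + 1):
    h[i] = max(0, h[i - 1] + random.choice([-1, 1]))

# d_h is symmetric, vanishes on the diagonal and satisfies the triangle
# inequality, so it defines a pseudo-metric on the grid
for _ in range(1000):
    x, y, z = (random.randrange(n + 1) for _ in range(3))
    assert d_h(h, x, y) == d_h(h, y, x)
    assert d_h(h, x, x) == 0
    assert d_h(h, x, y) <= d_h(h, x, z) + d_h(h, z, y)
```

The triangle inequality holds for any h, by a short case analysis on the position of the middle point relative to [x, y].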
From Lemma 3.2, we get the upcoming Proposition 3.4, which provides a general recipe for proving convergence to the CRT in the rooted Gromov-Hausdorff-Prokhorov topology, and is going to be our main tool for the rest of this section.

Definition 3.3.
Let (X, r, d, λ) be a random rooted compact metric probability space. A random càdlàg function φ : [0, 1] → X is called an admissible parametrization of X if it can be written as φ = Φ(X, Θ), where Φ is a deterministic functional and Θ is a random variable with values in [0, 1] that is independent from X, in such a way that the functions t → λ(φ([0, t])) and t → d(r, φ(t)) are well-defined random variables in the Skorokhod space D. ⋄
We require our parametrizations to be admissible for measurability issues - namely, we need this assumption in order to use a variant of Skorokhod's representation theorem in the proof of Proposition 3.4 below. In practice, admissibility should not be a restrictive requirement. In our case, we will define a parametrization of our network G through a randomized traversal algorithm where, conditional on the network, the additional randomness that is needed amounts to a finite number of coin tosses; such a parametrization is readily checked to be admissible.
Proposition 3.4. For n ⩾ 1, let X_n = (X_n, r_n, d_n, λ_n) be a random rooted compact metric probability space admitting an admissible parametrization φ_n, and let h_n : t → d_n(r_n, φ_n(t)) be the associated height process. Let h be a random element of D with continuous trajectories, and assume that:
(i) sup_{s,t∈[0,1]} |d_n(φ_n(s), φ_n(t)) − d_{h_n}(s, t)| → 0 in probability;
(ii) sup_{t∈[0,1]} |λ_n(φ_n([0, t])) − t| → 0 in probability;
(iii) h_n → h in distribution in D.
Then, X_n d−→ T_h for the rooted Gromov-Hausdorff-Prokhorov topology.
Again, this proposition is proved in the appendices.

Conditioning on the number of colors
We now introduce some notation for conditioning G on its number of colors. This notation will also be used in Section 4, where we study the local weak limit of G conditioned to have n colors. First, recall that G can be viewed as the decorated Galton-Watson tree T ⋆ obtained as follows: 1. Sample a Galton-Watson tree T with offspring distribution M .
2. Conditional on T, decorate each vertex k with the network X k associated to an independent realization of the process X conditioned on having M k mutations (where M k denotes the number of children of k in T).
Let A_n be the event { G has n colors}, i.e. {T has n vertices}. Since A_n is a deterministic function of T and the networks (X_k) depend on T only, G_n ∼ ( G | A_n) can be obtained by replacing T with T_n ∼ (T | A_n) in step 2 of the construction above - i.e. by decorating a Galton-Watson tree with offspring distribution M conditioned to have n vertices.
Note that we have not assumed that E(M) = 1. Thus, T is not necessarily critical. However, we know from Corollary 2.7 that there exists ζ > 0 such that E(M ζ^M) = E(ζ^M). Thus, letting M̂ be a ζ-tilt of M, i.e. a random variable whose distribution is characterized by P(M̂ = k) = ζ^k P(M = k)/E(ζ^M), k ⩾ 0, by considering a Galton-Watson tree with offspring distribution M̂ we get a critical Galton-Watson tree T̂: indeed, E(M̂) = E(M ζ^M)/E(ζ^M) = 1. It is classic - and straightforward to check by writing down the probability distributions explicitly - that T_n has the same distribution as T̂ conditioned to have n vertices.
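Numerically, the tilting is easy to carry out; the sketch below (with an arbitrary supercritical geometric offspring law, not the M of the model) finds ζ by bisection and checks that the ζ-tilt has mean one.

```python
# Offspring law: an arbitrary supercritical example, M ~ Geometric(p)
# on {0, 1, 2, ...} with p = 0.4, so E(M) = (1 - p)/p = 1.5 > 1.
p = 0.4
K = 2000  # truncation level; the neglected tail is astronomically small
pmf = [p * (1 - p) ** k for k in range(K)]

def tilted_mean(zeta):
    # mean of the zeta-tilt of M: E(M zeta^M) / E(zeta^M)
    num = sum(k * zeta ** k * q for k, q in enumerate(pmf))
    den = sum(zeta ** k * q for k, q in enumerate(pmf))
    return num / den

# tilted_mean increases from 0 to E(M) = 1.5 as zeta goes from 0 to 1,
# so E(M zeta^M) = E(zeta^M) has a unique root in (0, 1): bisect for it.
lo, hi = 1e-9, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if tilted_mean(mid) < 1.0:
        lo = mid
    else:
        hi = mid
zeta = (lo + hi) / 2

# the zeta-tilt is critical (mean one); for this geometric example the
# root is also known in closed form: zeta = 1 / (2 (1 - p))
assert abs(tilted_mean(zeta) - 1.0) < 1e-9
assert abs(zeta - 1 / (2 * (1 - p))) < 1e-6
```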
Since we will be conditioning on A_n and (T | A_n) ∼ (T̂ | Â_n), it may not be clear at this point what the interest of working with T̂ instead of T is; this will become apparent later - see e.g. Remark 3.11 - but for now let us simply note that for any nonnegative function f we have E(f(T̂) | Â_n) ⩽ E(f(T̂))/P(Â_n) and that, as the following classic proposition shows, it is straightforward to get an asymptotic equivalent of P(Â_n) as n goes to infinity.

Proposition 3.5.
Let T̂ be a critical Galton-Watson tree whose offspring distribution has a finite variance σ² > 0 and is not supported on kN, for any k ⩾ 2. Let Â_n denote the event {T̂ has n vertices}. Then, as n → ∞, P(Â_n) ∼ n^{−3/2}/(σ√(2π)). This result is well-known (see e.g. [33, Lemma 1.11] for a more general statement), but since it is central to our study we recall a short proof in the appendices.
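For critical Poisson(1) offspring, the n^{−3/2} decay can even be checked in closed form (an illustration only, not part of the proof): the total progeny then follows the Borel distribution, and Stirling's formula recovers the classical asymptotic n^{−3/2}/(σ√(2π)) with σ² = 1.

```python
import math

def borel_pmf(n):
    # total progeny of a Galton-Watson tree with critical Poisson(1)
    # offspring follows the Borel distribution:
    # P(|T| = n) = e^{-n} n^{n-1} / n!
    return math.exp(-n + (n - 1) * math.log(n) - math.lgamma(n + 1))

def otter_asymptotic(n, sigma=1.0):
    # classical n^{-3/2} asymptotic with offspring variance sigma^2
    return n ** -1.5 / (sigma * math.sqrt(2 * math.pi))

# Stirling's formula shows the ratio is 1 + O(1/n)
ratio = borel_pmf(1000) / otter_asymptotic(1000)
assert abs(ratio - 1) < 1e-3
```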

Technical lemmas
To clarify the proof of the convergence to the CRT given in Section 3.4, we gather some of the more technical details here. Lemma 3.6 is a result about the logistic branching process with mutation. Once we have recognized that we are working with a decorated Galton-Watson tree, this lemma is the key specificity of our model for the convergence to the CRT. Lemma 3.10 is a streamlined, model-agnostic synthesis of the approach developed in [48,Section 4]. It provides generic concentration inequalities for sums of random variables associated to the vertices/edges of a size-conditioned Galton-Watson tree.
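Size-conditioned critical Galton-Watson trees such as those appearing in Lemma 3.10 can be sampled exactly via their Łukasiewicz walk and the cycle lemma; the following sketch (critical geometric offspring is an arbitrary choice) returns the preorder out-degree sequence of the conditioned tree together with the vertex heights.

```python
import random

def geom_offspring():
    # critical offspring law: Geometric(1/2) on {0, 1, 2, ...}, mean 1
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

def conditioned_tree_degrees(n):
    # Preorder out-degree sequence of a critical Galton-Watson tree
    # conditioned to have n vertices.
    # 1. Rejection: draw i.i.d. offspring counts until the Lukasiewicz
    #    walk S_k = (M_1 - 1) + ... + (M_k - 1) ends at S_n = -1.
    # 2. Cycle lemma: rotating the increments to start just after the
    #    first global minimum yields the unique rotation whose walk
    #    first hits -1 at step n, i.e. the walk of a tree of size n.
    while True:
        steps = [geom_offspring() - 1 for _ in range(n)]
        if sum(steps) == -1:
            break
    s, smin, argmin = 0, 0, -1
    for i, x in enumerate(steps):
        s += x
        if s < smin:
            smin, argmin = s, i
    rot = steps[argmin + 1:] + steps[:argmin + 1]
    return [x + 1 for x in rot]

def preorder_heights(degrees):
    # heights of the vertices (in preorder), rebuilt with a stack of
    # "children still to visit" counters
    heights, stack = [], []
    for i, d in enumerate(degrees):
        if i == 0:
            heights.append(0)
        else:
            while stack[-1] == 0:
                stack.pop()
            heights.append(len(stack))
            stack[-1] -= 1
        stack.append(d)
    return heights

random.seed(1)
n = 50
degrees = conditioned_tree_degrees(n)
assert sum(degrees) == n - 1            # n vertices, n - 1 edges
walk = 0
for i, d in enumerate(degrees):         # the walk stays >= 0 before
    walk += d - 1                       # hitting -1 exactly at step n
    assert walk >= 0 or i == n - 1
heights = preorder_heights(degrees)
assert heights[0] == 0 and len(heights) == n
```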
Before stating our first lemma, recall the notation for the quantities associated to a generic color: • X = (X t ) t⩾0 denotes the trajectory of the number of individuals of that color, starting from a single individual at time t = 0.
• T = inf{t ⩾ 0 : X t = 0} denotes the time of extinction of the color.
• L ··= ∫_0^∞ X(t) dt denotes the total length of the corresponding subnetwork.
• M denotes the number of offspring of the color, i.e. the number of new colors that it produces by mutation.
Finally, recall that ζ > 0 is the unique real number -whose existence is guaranteed by Corollary 2.7 -such that E(M ζ M ) = E(ζ M ).
Lemma 3.6. There exists η > 0 such that E(M ζ^M e^{ηL}) < ∞.
Proof. Let us fix s > ζ ∨ 1 such that E(s^M) < ∞ - such an s exists by Theorem 2.5. Note that to prove the lemma it is sufficient to show that there exists η > 0 such that E(s^M e^{ηL}) < ∞; and that since 0 ∈ A_s ··= {η ∈ R : E(s^M e^{ηL}) < ∞}, it in fact suffices to show that A_s is an open subset of R.
Recall from Proposition 2.1 that for any numbers µ and s and any nonnegative measurable function f, Now, on the one hand, by applying (4) to f(X) = e^{ηL}, we get that for any η, On the other hand, for η < µ, by taking f(X) = 1 and replacing (µ, s) with (µ − η, sµ/(µ − η)) in (4), we get Combining these two equalities, we see that for η < µ, Now, from the explicit expression of the probability generating function of M given in Theorem 2.5, we see that if this quantity is finite at η = 0, then it is also finite in a neighborhood of 0. This concludes the proof.

We now give a simple Chernoff-type subpolynomial bound on the tail probabilities of a sum of independent random variables with finite exponential moments. The reasoning is classic - see e.g. [48], where it is used repeatedly - but we could not find a generic statement in the literature; so to streamline some of our proofs we state it as a lemma here.

Lemma 3.9. Let Z_1, Z_2, ... be i.i.d. copies of a random variable Z such that E(Z) = 0 and such that there exists η > 0 for which E(e^{η|Z|}) < ∞. Then there exists C > 0 such that, for all ε > 0 and all n ⩾ 1,

Proof. We write u_n = n^{1/2+ε} to ease the notation. Let us start by focusing on positive deviations. For any θ < η, by taking the exponential of the sum, applying Markov's inequality and using the independence of the Z_i's, we get:

Now, since E(Z) = 0, taking K > E(Z²)/2 we have E(e^{θZ}) ⩽ 1 + Kθ² for all θ small enough. Thus, with θ_n = n^{−1/2} we get The negative deviations are treated similarly (or, more directly, by applying this bound to the variables −Z_1, −Z_2, ...), yielding

is an exchangeable real vector of length m. In particular, note that for each edge e, G_e is a real-valued random variable; and that for a given k the family (G_{k→ℓ}, ℓ child of k) is not assumed to be independent. Note also that, because of exchangeability, for any edge e = k → ℓ the law of G_e depends only on M_k, and so with a slight abuse of notation we write (M_k, G_{k→ℓ}) for a typical pair - for instance, the pair of variables corresponding to the edge from the root to a uniformly chosen child of the root.
(i) There exists C > 0 such that
(ii) Letting v_1, v_2, ... denote the vertices of T, labeled in depth-first order: conditional on A_n, for all ε > 0, More precisely, P(∆ ⩾ n^{1/2+ε} | A_n) ⩽ C e^{−cn^ε} for some constants C, c > 0 that may depend on ε, and all n ∈ N.
Let (G_e)_{e∈E(T)} be edge decorations of T such that there exists η > 0 for which we have E(e^{η|G_{k→ℓ}|} M_k ζ^{M_k}) < ∞. Then, letting Γ(v) denote the path from the root of T to its vertex v: (iii) Conditional on A_n, for all ε > 0, More precisely, P(∆* ⩾ n^{1/4+ε} | A_n) ⩽ C e^{−cn^a} for some constants C, c, a > 0 that may depend on ε, and all n ∈ N.
Proof. First, note that the main difficulty comes from the fact that, under P( · | A n ), the decorations are not independent.
Point (ii) is proved similarly: we fix ε > 0 and use a union bound to get Under the unconditional probability P, the random variables F̂_{v_i} are i.i.d. and their expected value is E(F̂) = m̂. Therefore, by applying Lemma 3.9 we get for some C > 0. Since n/P(Â_n) = Θ(n^{5/2}), this implies P(∆ ⩾ n^{1/2+ε} | A_n) = O(e^{−cn^ε}) for some c > 0.
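The Chernoff computation behind Lemma 3.9 can be traced numerically (an illustration only): for Rademacher (±1) variables, θ_n = n^{−1/2} and u_n = n^{1/2+ε}, the argument gives the bound e^{K − n^ε} on P(Z_1 + ··· + Z_n ⩾ u_n), which the exact binomial tail comfortably satisfies.

```python
import math

def log_binom_tail(n, s):
    # log P(S_n >= s) for S_n a sum of n i.i.d. Rademacher (+/-1) signs,
    # computed from the exact binomial law via a log-sum-exp
    lo_k = (n + s + 1) // 2  # S_n = 2k - n >= s  <=>  k >= (n + s)/2
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            - n * math.log(2) for k in range(lo_k, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))

n, eps = 10_000, 0.25
u_n = n ** (0.5 + eps)   # = 1000 here
theta = n ** -0.5        # the theta_n of the proof of Lemma 3.9
K = 0.6                  # any K > E(Z^2)/2 = 1/2 works for small theta

# E(e^{theta Z}) = cosh(theta) <= 1 + K theta^2 at this theta
assert math.cosh(theta) <= 1 + K * theta ** 2
# Chernoff: log P(S_n >= u_n) <= n log(1 + K theta^2) - theta u_n <= K - n^eps
log_chernoff = n * math.log(1 + K * theta ** 2) - theta * u_n
assert log_chernoff <= K - n ** eps
# the exact tail is far below the bound
assert log_binom_tail(n, round(u_n)) <= log_chernoff
```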
The proof of point (iii) requires a few extra ingredients. As for (i) and (ii), we start from P(∆* ⩾ u_n | A_n) = P(∆* ⩾ u_n | Â_n). Next, we recall that the maximal distance between a vertex and the root in a Galton-Watson tree T_n conditioned to have n vertices is of order n^{1/2}. More specifically, if we denote by H(t) the maximal distance to the root in a tree t, then n^{−1/2} H(T_n) converges in distribution as n → ∞, see e.g. [2]. Therefore, for every ε > 0 there exists c > 0 such that P(H(T̂) > c√n | Â_n) < ε for all n large enough - which in turn entails As a result, we can pick an integer sequence (w_n) such that √n = o(w_n) and assume in what follows that, conditional on Â_n, we have H(T̂) ⩽ w_n.
Let us denote by T̂|_h the set of vertices at distance h from the root in T̂ and, to keep notation light, set S(v) ··= |∑_{(k→ℓ)∈Γ(v)} (G_{k→ℓ} − m*)|. For any sequence (u_n), Now, let (T̂^{(h)}, v*) be the random pointed tree obtained in the following way: • Let v_1 be the root, and start with a path v_1, v_2, ..., v_{h+1} = v* from v_1 to v*. This path will be referred to as the spine of the tree.
• For k = 1 to h, add M*_k − 1 children to v_k, where M*_k is an independent copy of the size-biasing of M̂, i.e. a random variable whose distribution is P(M*_k = i) = i ζ^i P(M = i)/E(M ζ^M). • Let each of the vertices added at the previous step, as well as v*, be the root of an independent Galton-Watson tree with offspring distribution M̂.
It is classic (see e.g. [48, Section 4.2]) and readily checked that for any fixed tree t and each vertex v ∈ t|_h, As a result, for any function f on pointed trees, Applying this identity to f(T, v) = 1_{S(v)⩾u_n} in (6), we get By construction, on the spine of T̂^{(h)} the number of children of the vertices is distributed as the vector (M*_1, ..., M*_h, M̂_{h+1}), whose components are independent. As a result, letting G*_k be the first component of the vector G(M*_k, Θ_k) and m* = E(G_{k→ℓ} M ζ^M)/E(M ζ^M) its expected value, if u_n → ∞ as n → ∞, then Moreover, we then also have, as n → ∞, Finally, taking u_n = n^{1/4+ε} for some ε > 0 and w_n = ⌊n^{1/2+δ}⌋ for some δ > 0 such that (1/2 + δ)² < 1/4 + ε, Lemma 3.9 yields for some positive constants a and C. Plugging this back in (7) and using that w_n P(Â_n)^{−1} = Θ(n^{2+δ}) concludes the proof.
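The spine decomposition used above is an instance of the many-to-one lemma; its simplest case (f ≡ 1) states that a critical Galton-Watson tree has on average exactly one vertex in each generation, which the following simulation sketch illustrates (Poisson(1) offspring is an arbitrary choice).

```python
import math
import random

def poisson1():
    # Poisson(1) sampler (Knuth's product-of-uniforms method)
    L = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def generation_size(h):
    # number of vertices at height h in a critical Galton-Watson tree,
    # simulated generation by generation
    z = 1
    for _ in range(h):
        z = sum(poisson1() for _ in range(z))
        if z == 0:
            break
    return z

random.seed(2)
N, h = 200_000, 3
mean = sum(generation_size(h) for _ in range(N)) / N
# many-to-one with f = 1: E(Z_h) = 1 for every h in the critical case
assert abs(mean - 1.0) < 0.05
```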
Remark 3.11. This proof illustrates the point of working with a critical Galton-Watson tree: for instance, even though we also have because in the non-critical case P(A n ) decays exponentially, the mere fact that F has finite exponential moments would not have been sufficient to get an adequate upper bound on the expression above. ⋄

Proof of the convergence to the CRT
We are now ready to prove the main theorem of this section. Recall that ζ > 0 is the unique real number such that E(M ζ M ) = E(ζ M ).
Theorem 3.12. Let ( G_n, r_n, d_{G_n}, λ_{G_n}) denote the random rooted metric probability space ( G, r, d_G, λ_G) conditioned to have n colors, and let C be the Brownian CRT. Then, as n → ∞, for the rooted Gromov-Hausdorff-Prokhorov topology, with where U* is sampled uniformly at random among the mutation times of the biased process X* ∼ L(X † M ζ^M), and σ̂² is the variance of M̂ ∼ L(M † ζ^M). Moreover,

Remark 3.13. Using the probability measure ν• introduced in Proposition 2.4, the constant C can also be expressed in terms of T, the extinction time of the logistic branching process. Indeed, as explained after the proof of Theorem 3.12,

Remark 3.14. Addario-Berry et al. [2] have shown that, for critical Galton-Watson trees with finite-variance offspring distribution, the normalized height (and width) of the tree conditioned to have n vertices satisfy uniform sub-Gaussian tail bounds. More precisely, letting H(T_n) denote the height of the Galton-Watson tree T_n, there exist K, k > 0 such that for every n ∈ N and every x ∈ R+ we have Now consider, as in Lemma 3.10, any family of edge decorations (G_e)_{e∈E(T)} such that E(e^{η|G_{k→ℓ}|} M_k ζ^{M_k}) < ∞ for some η > 0. Let us write With the notation of Lemma 3.10 (iii), we have ∆_n ⩽ (∆* + H(T)m*)/√n. Thus, applying (8) and Lemma 3.10 (iii) with ε = 1/4 yields for some positive constants K, k and a that do not depend on n. This uniform tail bound implies that for any p ⩾ 1, the sequence (∆_n^p)_{n⩾1} is tight. In our context, if we let G_{k→ℓ} be the distance between the root of X_k and the mutation corresponding to the vertex ℓ, then ∆_n is the height of ( G_n, r_n, d_{G_n}/√n). Therefore, we conclude that, in addition to the convergence in distribution of Theorem 3.12, all moments of the height, diameter and related quantities of the network converge to the corresponding moments of the properly rescaled Brownian CRT. ⋄

Proof of Theorem 3.12. The proof is an application of Proposition 3.4. Set
and let us define an admissible parametrization φ_n : [0, 1] → ( G_n, r_n, d_n, λ_n). Recall that, in the forward-time process defining G, a branching point is a point where a lineage splits and a coalescence point is a point where two lineages merge. Using some arbitrary procedure, distinguish one of the two outgoing lineages of each branching point of G_n and one of the two incoming lineages of each coalescence point. Note that by (1) disconnecting the tip of each of the distinguished lineages that correspond to coalescence points from those coalescence points and (2) drawing distinguished lineages that correspond to branching points to the right of their undistinguished counterparts, we get a rooted plane R-tree G#_n (not to be confused with T_n, the combinatorial tree encoding the genealogy of the colors of G_n). Now, pick a depth-first ordering of the vertices of T_n, and visit the points of G_n as follows: • Visit the subnetworks corresponding to the vertices of T_n in depth-first order.
• Within each subnetwork X k , do a depth-first traversal of the corresponding "unreticulated" R-tree X # k , that is: starting from the root, travel along the lineages at constant speed nL k = nλ Gn (X k ), in depth-first order and visiting the "left" subtree first when encountering a branching point.
This construction is illustrated in Figure 4. Note that each jump of φ n corresponds to either the tip of a lineage or the second visit of a coalescence point, and that those jumps can be negative (typical case) or positive (which can only happen when finishing the exploration of a color and moving to a new one). Moreover, each point of G n is visited exactly once, except for: • The tips of lineages, which -with the exception of φ n (1) -correspond to the left-limits of some of the jumps of φ n .
• Branching points, which are visited twice.
As a result, φ_n([0, 1]) is dense in G_n and φ_n is an admissible parametrization of G_n. Figure 4: Illustration of the construction of the admissible parametrization φ_n used in the proof. Left, a realization of G_n for n = 5, with the same drawing conventions as in Figure 1. The distinguished lineages associated to coalescences are indicated by asterisks and, to avoid cluttering, the distinguished lineages associated to branchings are taken to be the lineages drawn to the right. Right, the rooted plane R-tree G#_n obtained by "disconnecting" coalescence points of G_n. This tree provides us with a natural order in which to visit the lineages of G_n. Bottom, the height function h_n : t → d_n(r_n, φ_n(t)) associated to φ_n. The speed of travel along the lineages of the subnetwork corresponding to a given color is proportional to the total length of that subnetwork, ensuring that each color is allotted the same amount of time by φ_n.
Next, let us show that φ_n satisfies assumptions (i-iii) of Proposition 3.4. Starting with (i), pick s, t ∈ [0, 1] with s < t, and let (x, y) = (φ_n(s), φ_n(t)) be the corresponding points of G_n. Let then x ∧ y be the most recent common ancestor of x and y in G_n, i.e. the (unique) oldest point in a (not necessarily unique) shortest path between x and y, and let X_c be the subnetwork containing x ∧ y. Let z_c be the root of X_c and, for i ∈ {x, y}, let z_i be: z_c if i ∈ X_c; otherwise, the root of the subnetwork through which every path from x ∧ y to i exits X_c. These definitions are illustrated in Figure 5. Figure 5: Graphical depiction of some of the notation used in the proof. The black lines correspond to shortest paths between various points of G_n, and the colored blobs to the subnetworks associated to the colors. Although there can be several shortest paths between x and y, each of these paths goes through z_x and z_y. Finally, note that within each subnetwork X_k the distances are bounded above by 2T_k, where T_k is the extinction time of the corresponding process X_k.
With this notation, and recalling that h_n(t) = d_n(r_n, φ_n(t)), observe that Now, d_{G_n}(z_x, z_y) < 2T_c, where T_c is the extinction time of the logistic process X_c associated to the color c. Therefore, Moreover, since by construction of φ_n the vertices of T_n are visited in depth-first order, for all u ∈ [s, t] the point φ_n(u) belongs to either X_c or one of its descendants, which implies that h_n(u) ⩾ d_n(r_n, z_c); and since there exists u ∈ [s, t] such that φ_n(u) is at distance 0 from X_c (indeed, if y ∈ X_c one can take u = t, and if y ∉ X_c then one can take u such that φ_n(u) = z_y), which in turn implies h_n(u) ⩽ d_n(r_n, z_c) + Cn^{−1/2} T_c, we get d_n(r_n, z_c) ⩽ inf_{[s,t]} h_n ⩽ d_n(r_n, z_c) + Cn^{−1/2} T_c.
Similarly, for i ∈ {x, y} we have d_n(r_n, z_c) ⩽ d_n(r_n, z_i) ⩽ d_n(r_n, z_c) + Cn^{−1/2} T_c, so that Plugging (10) and (11) in (9), we get |d_n(x, y) − d_{h_n}(s, t)| < 4Cn^{−1/2} T_c. Therefore, Applying point (i) of Lemma 3.10 to the extinction times T_1, ..., T_n - which is made possible by the fact that we know from Lemma 3.6 that T has finite exponential moments under L(· † Mζ^M) - we get max(T_1, ..., T_n) = o_p(n^ε) for any ε > 0, which in turn implies thereby proving that φ_n satisfies assumption (i) of Proposition 3.4.
Let us now turn to assumption (ii). Let X_1, ..., X_n be the subnetworks of G_n, in order of their visit by φ_n. By construction of φ_n, for all t ∈ [0, 1] we have φ_n(t) ∈ X_{c_t}, where c_t ··= (⌊tn⌋ + 1) ∧ n. Moreover, where L_k = λ_{G_n}(X_k). Applying point (ii) of Lemma 3.10 to L_1, ..., L_n, which again is made possible by Lemma 3.6, we get that for any ε > 0, uniformly in t ∈ [0, 1] and with ℓ = E(Lζ^M)/E(ζ^M). Taking t = 1, we see that | G_n | = nℓ + o_p(n^{1/2+ε}), as claimed in the statement of the theorem. From there, we get which shows that φ_n satisfies assumption (ii).
To show that φ_n satisfies assumption (iii), let h_{T_n} be the height process of T_n, that is where v_1, ..., v_n are the vertices of T_n, in order of their visit by φ_n, and d_{T_n}(u, v) is the number of edges of the path joining u and v. Since T_n ∼ (T̂ | Â_n) is a critical Galton-Watson tree conditioned to have n vertices, it is well-known - see e.g. Corollary 1 in [38], from which this readily follows - that, as n → ∞, in the Skorokhod space D, where (e(t))_{t∈[0,1]} is a standard Brownian excursion, σ̂² = Var(M̂) is the variance of the offspring distribution of T̂, and c_t = (⌊tn⌋ + 1) ∧ n. Therefore, to conclude the proof of Theorem 3.12 it suffices to show that where h_{G_n}(t) = d_{G_n}(r_n, φ_n(t)) and U* is the time of a mutation sampled uniformly at random among the mutations of the process X* ∼ L(X † M ζ^M).
For this, for all k ∈ {1, ..., n}, let us denote by z_k the root of X_k. Then, letting Γ(k) = (i_1 → ... → i_p) be such that (v_{i_1} = v_1, ..., v_{i_p} = v_k) is the path from v_1 to v_k in T_n, and recalling that φ_n(t) ∈ X_{c_t}, we see that, for all t ∈ [0, 1],
As we have already seen, for any ε > 0, d Gn (z ct , φ n (t)) < T ct = o p (n ε ), uniformly in t. Since, by definition of h Tn , the number of edges of Γ(c t ) is h Tn (c t ), we get that for any constant κ, Moreover, along Γ(c t ) each d Gn (z i , z j ) is the time elapsed between the creation of X i and that of X j -which, conditional on X i , is distributed as the random functional U i = U (X i ) giving the time of a mutation sampled uniformly at random among the mutations of X i . As a result, applying point (iii) of Lemma 3.10, we get does indeed correspond to the expected value of the time of a mutation sampled uniformly at random among the mutations of the biased process X * ∼ L(X † M ζ M ). Putting the pieces together, this proves (12), thereby concluding the proof of Theorem 3.12.
Finally, before closing this section, let us justify the expression of E(U*) given in Remark 3.13. To make things simpler, we work with the "extended" process X̄, which, in addition to the trajectory of X, contains the information of which jump corresponds to a mutation. Thus, M = M(X̄) is a deterministic function of X̄. First, note that and that, considering the shift operators (Θ_t)_{t∈R} defined by Θ_t X̄ ··= (X̄_{t+s})_{−t⩽s⩽T−t} and the function Recalling the definition of the process X_m introduced in Section 2.1, this is also Therefore, using the decomposition X̄_m =d X̄′ ≀ X̄′′ given in Proposition 2.4, together with the fact that where T(X̄′) denotes the extinction time of X̄′, we get Putting the pieces together, this yields the expression given in Remark 3.13.

Local weak limit
In this section, we describe the structure of G_n around a uniformly chosen point. More specifically, we give an algorithmic construction of the local weak limit of G_n around a focal point distributed according to the normalized measure λ_{G_n}/| G_n |. The notion of local weak convergence after uniform rooting was introduced by Benjamini and Schramm in [7], and is therefore also known as Benjamini-Schramm convergence; see e.g. [47, Section 2.2] or [16, Section 1.2] for a general introduction. Throughout this section, unless specified otherwise, the term local weak limit will always refer to the Benjamini-Schramm limit.
This section is organized as follows: first, we briefly lay out the topological notions that are used in our proof of the local convergence. We then describe the local weak limit ( G † , x † ) as a decorated random tree. This random tree is a biased -that is, non-uniformly rooted -local weak limit of the size-conditioned Galton-Watson tree T n giving the genealogy of the colors of G n (see Section 3.2), and the decorations are modifications of the subnetwork X corresponding to a generic color. We close the section by describing the geometry of these various modifications of X.

Local topology
In order to define our local topology, we first need to specify a space of decorated trees. A locally finite pointed rooted plane tree - henceforth simply referred to as a pointed tree for brevity - is a pair (T, v*) where T is a rooted plane tree in which every vertex has a finite degree, and v* is a vertex of T known as the focal vertex. Note that in the case where T is infinite, its root can be located at infinity: in that case, instead of corresponding to a vertex, the root corresponds to a topological end of T. Another way to see this is that T being rooted actually means that for any pair of adjacent vertices (u, v) we know which is the parent and which is the child.
A decorated pointed tree (T, v*, (D_v)_{v∈T}) is a pointed tree where each vertex v ∈ T is associated to a random variable D_v taking values in a Polish space D. In our setting, D will be a space in which the color networks (X_v) used in the construction of G as a decorated tree are well-defined Borel-measurable random variables; but for now let us view it simply as an abstract Polish space. We denote by T_loc the space of decorated pointed trees.
The local topology on T_loc is the topology generated by the following basis of open sets: where r runs over the positive integers, t over the finite pointed trees, and (V_u)_{u∈t} over the open sets of D. The notation B_T(v*, r) stands for the ball of radius r centered at v* in T.
To make our description of T loc fully explicit, we would need to give a formal definition of the space D of decorations. While this is relatively straightforward to do, this is not only tedious but also uninformative. Therefore, we leave it to the reader to convince themself that this can be done while ensuring that the following properties hold: • The decorations are pointed networks, i.e. pairs (X, x * ) where x * ∈ X and, as previously, the network X -which is meant to represent the subnetwork of G that corresponds to a given color -can be seen as a collection of segments glued together at their endpoints (see Section 1.2). We denote by λ X the Lebesgue measure on X.
To keep the notation light, the fact that the decorations are pointed will be considered implicit: we occasionally write X instead of (X, x * ) when the focal point x * is irrelevant.
• The map X → L_X ··= ∫_X dλ_X is continuous.
• For all continuous bounded maps F : D → R, the map X → ∫_X F(X, x) λ_X(dx) is continuous.

Construction of the limit as a decorated tree
First, recall from Section 3.2 that if T̂ is a Galton-Watson tree whose offspring distribution M̂ is given by P(M̂ = k) = ζ^k P(M = k)/E(ζ^M), k ⩾ 0, where ζ is as in Corollary 2.7, then T̂ conditioned to have n vertices has the same distribution as the tree T_n used to construct G_n as a decorated tree.
Next, let us describe (T * , v * ), the local weak limit of T n . The local weak limit of size-conditioned critical Galton-Watson trees after random rooting is the invariant random sin-tree introduced by Aldous in [4] -see [46] for a detailed presentation. With our notation, this pointed tree (T * , v * ) can be constructed as follows: • Let v * be the focal vertex and let (v * , v 1 , v 2 , . . . ) be the spine of T * , i.e. an infinite path going towards the root (thus, v 1 is the parent of v * , v 2 is the parent of v 1 , etc).
• For each k ⩾ 1, add M*_k − 1 children to v_k, where (M*_k)_{k⩾1} is an i.i.d. sequence with the size-biased distribution of M̂.
• Let v*, as well as each of the vertices added at the previous step, be the root of a Galton-Watson tree with offspring distribution M̂, and call T* the resulting infinite random tree.
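A truncated spine of this sin-tree is straightforward to sample; in the sketch below (with critical Geometric(1/2) offspring, an arbitrary stand-in for M̂) the size-biased offspring counts along the spine are generated via the identity that the size-biasing of a Geometric(1/2) variable is distributed as X + Y + 1 with X, Y independent Geometric(1/2).

```python
import random

def geom_half():
    # Geometric(1/2) on {0, 1, 2, ...}: a critical offspring law (mean 1)
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

def size_biased_geom_half():
    # size-biasing: P(M* = k) = k P(M = k) / E(M) = k 2^{-(k+1)}.
    # If X, Y are i.i.d. Geometric(1/2), then P(X + Y = m) = (m+1) 2^{-(m+2)},
    # so X + Y + 1 has exactly the size-biased law.
    return geom_half() + geom_half() + 1

def spine_offspring_counts(depth):
    # offspring counts of the spine vertices v_1, ..., v_depth of the
    # sin-tree: each v_k gets M*_k children (M*_k - 1 of them off-spine)
    return [size_biased_geom_half() for _ in range(depth)]

random.seed(3)
spine = spine_offspring_counts(20_000)
# a size-biased count is always >= 1, so the spine can continue forever
assert min(spine) >= 1
# E(M*) = E(M^2)/E(M) = 3 for Geometric(1/2) (mean 1, variance 2)
mean = sum(spine) / len(spine)
assert abs(mean - 3.0) < 0.3
```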
Now, let us consider the infinite random network G * obtained by decorating T * using the same procedure as when decorating T n to obtain G n : conditional on T * , let us decorate each vertex v, independently of everything else, with a random network X v having the distribution of the generic color network X conditioned to have a number of mutations equal to the number of children of v. Note that the subnetwork X * corresponding to the focal vertex v * plays a special role in G * : we refer to that subnetwork as the focal network.
The pair ( G * , X * ) is not the local weak limit of G n : indeed, it describes the limit of neighborhoods of a color network picked uniformly at random in G n , rather than the limit of the neighborhoods of a point picked uniformly at random in G n . To see why the two differ, note in particular that picking the focal point x * according to the normalized length measure λ Gn /| G n | biases the focal network by its total length L * = λ X * (X * ).
To construct the local weak limit of G_n, conditional on ( G*, X*) let x* ∼ λ_{X*}/L* be a random point of X*. Note that - to fall back on the topological framework of Section 4.1 - the pointed network ( G*, x*) can be seen as an element of T_loc by identifying it with a copy of (T*, v*) where the focal vertex v* is decorated with the pointed network (X*, x*), and the decorations of the other vertices are arbitrarily pointed. Now recall from Notation 2.2 that L(· † L*) denotes the distribution under the L*-biased probability measure E(1_{·} L*)/E(L*), and let ( G†, x†) be the random pointed network rooted at infinity characterized by ( G†, x†) ∼ L(( G*, x*) † L*).

Theorem 4.1. As n → ∞, the uniformly rooted network ( G_n, x_n) converges in distribution to ( G†, x†) for the local topology.

Before proving this theorem, let us point out that ( G†, x†) can also be constructed as follows; we leave it to the reader to convince themself of the equivalence of the definitions: • Let v† be the focal vertex, and let (v†, v_1, v_2, ...) be an infinite spine going towards the root.
• For each k ⩾ 1, add M * k −1 children to v k , where (M * k ) k⩾1 is an i.i.d. sequence with the size-biased distribution ofM .
• Let X† ∼ L(X † Lζ^M), where X is a generic color network and M and L are respectively its number of mutations and total length. Write M† and L† for the corresponding quantities in X†, and add M† children to v†.
• Let each child of v†, as well as each of the children that were added to the nodes (v_k)_{k⩾1}, be the root of a Galton-Watson tree with offspring distribution M̂. Let T† be the resulting tree.
• Decorate each node v ∈ T†, v ≠ v†, with a network X_v having the law of a generic color network X conditioned to have as many mutations as the number of children of v; the decoration of v† is X†.
Proof of Theorem 4.1. Let x_n be a uniformly chosen point of G_n, and let v*_n be the vertex of T_n such that x_n ∈ X_{v*_n}. Remember that, since here we view G_n as the tree T_n decorated with pointed networks, if we let x_n be the focal point of X_{v*_n} then (G_n, x_n) and (T_n, v*_n) can be seen as the same object. Thus, we will use these two notations interchangeably.
To prove the theorem, it suffices to show that for any function F : T_loc → R of the form where r is a positive integer; t is a finite pointed rooted plane tree; and (F_v)_{v∈t} is a family of nonnegative continuous bounded maps D → R such that for all Note that any such map F is continuous for the local topology and bounded, with Moreover, since the map X ↦ L_X giving the total length of a network is continuous, we can restrict ourselves to functions F for which there exists ℓ > 0 such that, for all v ∈ t, Let us show that to finish the proof it suffices to show (14), where L̂ ∼ L(L † ζ^M) is the total length of a generic decoration of the critical Galton-Watson tree T̂ such that T_n ∼ (T̂ | T̂ has n vertices). By definition of x_n, Thus, if (14) holds we have where the map is continuous (and bounded by ∥F∥_∞ ℓ) for the local topology, because the maps X ↦ ∫ F_v(X, x) λ_X(dx) are continuous on D. Since the pointed tree (T*, v*) used in the construction of (G†, x†) is the local weak limit of T_n, we get where the last equality holds because, by definition, (G†, x†) ∼ L((G*, x*) † L*) where L* = ∫ dλ_{X_{v*}} ∼ L̂. Putting the pieces together, this proves (13).
Let us now prove (14), i.e. show that E(Y_n) → 0, where For this, on the one hand, note that F is bounded and that on the other hand, since we have assumed that if the total length of the subnetwork containing x is greater than ℓ then F(G, x) = 0, we also have As a result, Y_n ⩽ ∥F∥_∞ (1 + ℓ/E(L)). Thus, by dominated convergence, to prove that E(Y_n) → 0 it suffices to show that Y_n → 0 in probability. Using (15) again, Finally, by Lemma 3.6 we can apply point (ii) of Lemma 3.10 to the random variables (L_v)_{v∈T_n} to get that for any ε > 0, concluding the proof.

Geometry of the focal and spinal networks
In order to complete the picture of the local weak limit of (G_n)_{n⩾1}, let us zoom in on the decorations composing G† and describe their distributions more finely than in the previous section. Specifically, we are interested in:
• (X†, x†), the focal network and its distinguished point;
• (X⋄, x⋄), which we call a spinal network. This network is distributed as the color network that is the parent of the focal network, and its distinguished point is the mutation point that corresponds to the root of the focal network.
Recall that, by the construction of G † given in Section 4.2, these objects satisfy, for any positive measurable functional F on pointed color networks: where, by a slight abuse of notation, M denotes the point process of mutations on the space X (previously, M denoted the point process on R corresponding to the mutation times).
Our next result shows that the focal and spinal networks can be constructed by "gluing" two half-networks that are independent conditional on their number of tips. Moreover, there is an explicit procedure to build these networks from their profile, i.e. from the process giving their number of lineages as a function of time. Let us start by introducing some notation.
Let I = [t_0, t_1], with t_0 < 0 ⩽ t_1, and let γ = (γ_t)_{t∈I} be a càdlàg, integer-valued trajectory that is positive except at time t_1, consists of a finite number of ±1 jumps, starts at 1, and ends with a jump to 0. As usual, let X denote a generic color network, and let X be the corresponding logistic branching process. With a slight abuse of notation, we will write {X = γ} for the event on which the trajectory of the Markov chain X, started from 1 at time t_0, is exactly γ. Note that for any t_0 ∈ R, it makes sense to consider a random network X started from a single individual at time t_0, for which the logistic branching process X started from 1 at time t_0 is the "number of lineages" process. The distribution of X does not depend on t_0.

Definition 4.2.
The random pointed network X[γ] is defined by the following procedure, and its focal point is chosen uniformly among the points of the network that correspond to lineages alive at time 0; see Figure 6 for an illustration.
• For each jump from k to k + 1 at time t, pick a lineage uniformly at random among the lineages alive at time t, and let it split into two lineages.
• For each jump from k to k − 1 at time t, choose one of the following possibilities: with probability μ/ρ_k, pick a lineage alive at time t uniformly at random, and let it mutate; with probability α/ρ_k, pick a lineage similarly and let it die; and with probability 1 − (α + μ)/ρ_k, pick a pair of lineages uniformly at random and let them merge together.
In the case where γ has a downward jump at time 0, we also define the network X_m[γ], which is obtained similarly but with the additional constraint that the jump from k to k − 1 at time 0 is a mutation. Now, recall the following notation, introduced in Section 2.
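To make the sampling steps of this definition concrete, here is a minimal Python sketch of their combinatorial part, which takes a list of ±1 jumps and assigns a split, a mutation, a death or a coalescence to each of them. The function name, the encoding of lineages as integer labels, and the parametrization of ρ_k used in the test are illustrative assumptions; the actual object of the definition is a network, which would additionally require recording the genealogy of the lineages.

```python
import random

def decorate_profile(jumps, mu, alpha, rho):
    """Assign an event of the decoration procedure to each jump of a
    profile trajectory.

    `jumps` is a list of (t, delta) pairs with delta in {+1, -1}, and
    `rho` maps k to rho_k.  Only the combinatorial skeleton of the
    construction is produced, as a list of events.
    """
    lineages = [0]      # labels of the lineages currently alive
    next_label = 1
    events = []
    for t, delta in jumps:
        k = len(lineages)
        if delta == +1:
            # upward jump: a uniformly chosen lineage splits in two
            parent = random.choice(lineages)
            lineages.append(next_label)
            events.append(("split", t, parent, next_label))
            next_label += 1
        else:
            # downward jump: mutation w.p. mu/rho_k, death w.p. alpha/rho_k,
            # coalescence of a uniform pair otherwise
            u = random.random() * rho(k)
            if u < mu:
                v = random.choice(lineages)
                lineages.remove(v)
                events.append(("mutation", t, v))
            elif u < mu + alpha:
                v = random.choice(lineages)
                lineages.remove(v)
                events.append(("death", t, v))
            else:
                v, w = random.sample(lineages, 2)
                lineages.remove(w)
                events.append(("coalescence", t, v, w))
    return events
```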

Proposition 4.3.
For each k ⩾ 0, let X ′ k and X ′′ k be independent realizations of X that are started from k and also independent of everything else, and let K ∼ ν • .
Remark 4.4. This construction of the focal/spinal networks makes it possible to get expressions for some characteristics of the local weak limit (G†, x†). For instance, if we let N be the number of lineages of the same color as x† that are alive at the same time as x†, then by (i) we have, for a normalizing constant C ⩾ 0, where E_k denotes the expectation conditional on {X_0 = k}. The limitation comes from the fact that if ζ ≠ 1, then the expressions E_k(ζ^M) are not explicit. However, they can be expressed as continued fractions, which makes it possible to compute them numerically (see Theorem 2.5).
Similarly, if T denotes the time since the last mutation in the ancestry of x†, then for any bounded measurable function F : R → R, where T_0 denotes the hitting time of 0 for the process X. ⋄
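Besides the continued-fraction expansions of Theorem 2.5, the quantities E_k(ζ^M) can also be estimated by simulating the embedded jump chain of X until absorption at 0, since the holding times play no role in M. The sketch below assumes, hypothetically, logistic-type down rates ρ_j = α + μ + c(j − 1); it is meant as a sanity check rather than as a substitute for the exact expansions.

```python
import random

def expected_zeta_M(k, zeta, mu, alpha, c, n_samples=20000, rng=None):
    """Monte Carlo estimate of E_k(zeta^M), where M is the number of
    mutations before the chain X, started from k, is absorbed at 0.

    Assumed dynamics: from state j, the embedded jump chain goes up with
    probability 1/(1 + rho_j) and down with probability rho_j/(1 + rho_j),
    where rho_j = alpha + mu + c*(j - 1); a down jump is a mutation with
    probability mu/rho_j.  Only the jump chain is simulated.
    """
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_samples):
        j, weight = k, 1.0
        while j > 0:
            rho = alpha + mu + c * (j - 1)
            if rng.random() < 1.0 / (1.0 + rho):
                j += 1                      # birth
            else:
                if rng.random() < mu / rho:
                    weight *= zeta          # this down jump is a mutation
                j -= 1
        total += weight
    return total / n_samples
```

For ζ = 1 the estimator returns exactly 1, which provides a cheap consistency check of the simulation.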

Proof of Proposition 4.3.
The proof is very similar to that of Proposition 2.4, and also relies on the path decompositions of Markov chains presented in Section A.1.
Let Y be the birth-death chain on N that goes from k to k + 1 at rate 1 and from k to k − 1 at rate ρ_k. Note that Y is distributed as the chain X slowed down by a factor k when in state k, and resurrected at rate 1 when it hits 0. Thus, Y is positive recurrent, and therefore reversible (as is any positive recurrent birth-death chain). Moreover, it is straightforward to check that the stationary distribution of Y is the probability distribution π defined by where C is a normalizing constant. Note that by definition of ν•, if K_0 is a random variable with distribution π, then its conditional distribution given {K_0 ⩾ 1} is ν•.
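The computation behind this claim is elementary detailed balance: π_k · 1 = π_{k+1} ρ_{k+1}, so π_k is proportional to 1/(ρ_1 ⋯ ρ_k). A small sketch, with a generic down-rate function ρ and a truncation level (both illustrative choices, not part of the statement):

```python
def birth_death_stationary(rho, n_max):
    """Stationary distribution, truncated at n_max, of a birth-death
    chain that goes from k to k+1 at rate 1 and from k to k-1 at rate
    rho(k).  Detailed balance pi_k * 1 = pi_{k+1} * rho(k+1) gives
    pi_k proportional to 1 / (rho(1) * ... * rho(k))."""
    weights = [1.0]
    for k in range(1, n_max + 1):
        weights.append(weights[-1] / rho(k))
    total = sum(weights)
    return [w / total for w in weights]
```

Conditioning the result on {k ⩾ 1} then gives (up to truncation) the distribution ν• mentioned in the proof.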
Let Y be started from 1, and denote by T_0 the hitting time of 0. Conditional on T_0, let U be uniform on [0, T_0], and set K := Y_U. Now, from the trajectory of (Y_t)_{t∈[0,T_0]}, construct a path with the same distribution as X by speeding up time by a factor k when in state k. Let V be the point corresponding to U in the new timescale. Note that this shows that T_0, the hitting time of 0 by Y, has the same distribution as L = ∫_0^∞ X_t dt, since in this construction the two quantities are equal. Now, consider the biased probability measure where Y′ and Y′′ are independent copies of Y started from K. In other words, for any measurable positive functional F of trajectories: Using our coupling of X and Y, and recalling that T_0 = L, this yields: where τ_0 is the extinction time of X, and X′_K and X′′_K are independent copies of X started from K, as they are defined in the statement of the proposition.
Let us now define a pointed network (X̂, x*) as follows: conditional on the trajectory of X constructed above, using Definition 4.2 let (X̂, x*) be the corresponding pointed network. Thus, X̂ is distributed as a standard color network whose root is located at time −V. Recall that by definition, conditional on X̂, the focal point x* is chosen uniformly at random among the points that correspond to the K lineages alive at time 0.
By construction, x* is uniform on X̂ with respect to its length measure. Therefore, for any measurable positive functional F on pointed networks: where M is the number of mutations of X̂. Moreover, by applying Equation (18) to the functional γ ↦ E(ζ^{M(X[γ])} F(X[γ])), we get where M† is the total number of mutations of X[X′_K ≀ X′′_K]. Therefore, Finally, taking F ≡ 1, we get E(ζ^{M†}) = E(Lζ^M)/E(L), and so comparing the previous display with Equation (16) characterizing the law of (X†, x†), we see that finishing the proof of point (i).
Point (ii) is proved similarly, but using Proposition A.2 instead of Proposition A.1 to view the network from a uniform mutation point instead of from a uniform point.

A.1 Path decompositions of Markov chains
In this appendix, we give a description of the trajectory of a Markov chain as seen from a random point in time. The ideas are standard, but we could not find the two propositions below in the literature. Their proofs are elementary but somewhat tedious; since they are very similar, we present only the more involved of the two and leave the other to the reader. This appendix also contains the proof of Proposition 2.4.
Let E be a countable set and let Y be a continuous-time Markov chain on E with transition rate matrix Q = (q_ij)_{i,j∈E}, started from the initial state 0 ∈ E. Let us write τ for the first jump time of the chain and T_0 for the return time to 0, i.e. T_0 = inf{t ⩾ τ : Y_t = 0}. Assume that Y is positive recurrent, with stationary distribution π, and define the reversed chain Y′ as the continuous-time Markov chain with transition rate matrix Q′ = (q′_ij)_{i,j∈E} given by q′_ij = π_j q_ji / π_i. We will consider time-shifted trajectories γ of the Markov chain Y killed upon reaching 0. For this purpose, let us formally define a convenient Skorokhod-like space of trajectories. Let E′ denote E ∪ {∆}, where ∆ ∉ E is arbitrary, and let Γ denote the set of càdlàg functions γ : R → E′ such that γ(t) = ∆ for all t with |t| large enough. With a slight abuse, for any a < b ∈ R and γ : The space Γ can be endowed with the metric d defined by: seen as a random variable in Γ. This trajectory can be conveniently described by decomposing it into its left and right parts. For this, recall the "back-to-back pasting" operation introduced in Definition 2.3, which to two càdlàg functions f and g associates the function f ≀ g obtained by pasting them back to back.
Proposition A.1. With the definitions above, the trajectory of Y between times 0 and T_0, seen from a uniform random time, decomposes as Y′ ≀ Y′′, where Y′ is the reversed chain and Y′′ has the same transitions as Y. Both chains are started from Y′_0 = Y′′_0 ∼ π*, where π*_i = π_i/(1 − π_0) for all i ∈ E \ {0}, and stopped upon reaching 0. Conditional on their common starting point, they are independent.
As discussed above, the proof is left to the reader.
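For a positive recurrent chain in stationarity, the reversed chain has rates q′_ij = π_j q_ji / π_i, which is the standard construction underlying Proposition A.1. A minimal sketch that builds this rate matrix on a finite state space (the dict-of-dicts encoding is an illustrative choice), and that can also be used to check reversibility numerically:

```python
def reversed_rates(Q, pi):
    """Rate matrix of the time-reversed chain: q'_ij = pi_j * q_ji / pi_i.
    Q is a dict-of-dicts of off-diagonal rates, with every state of the
    finite state space appearing as a key; pi is the stationary
    distribution, given as a dict."""
    Qrev = {i: {} for i in Q}
    for i in Q:
        for j, rate in Q[i].items():
            # each i -> j transition of Y becomes a j -> i transition of Y'
            Qrev[j][i] = pi[i] * rate / pi[j]
    return Qrev
```

For a reversible chain, such as a birth-death chain, the output coincides with the input rates.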
Assume now that the transitions of the process Y are associated with weights: each transition i → j has weight w_ij ⩾ 0. Define a random measure W by where the sum is over all (finitely many) (i, j, t) such that Y jumps from i to j at time t along a trajectory started from 0 and stopped upon reaching 0. Define W = ∫ dW as the total weight accumulated along the trajectory, and assume that 0 < E(W) < ∞. We are now interested in the distribution of where the conditional distribution of U given W is (1/W)W. In other words, the distribution of Z_w is characterized by for any measurable bounded functional F.
Proposition A.2. With the definitions above, we have Z_w = Y′ ≀ Y′′ in distribution, where Y′ is the reversed chain and Y′′ has the same transitions as Y. Conditional on their starting points, Y′ and Y′′ are independent. They are started from a pair of states (Y′_0, Y′′_0) chosen according to the probability and stopped upon reaching 0. If one of the chains is started from 0, its trajectory is reduced to a single point.
Proof. The proof is a series of elementary Markov chain calculations. We use the standard notation q_i = ∑_{j≠i} q_ij.
Consider two starting states i ≠ j, a trajectory f from i to 0 and a trajectory g from j to 0. Let γ_0, …, γ_{n_f+1} and ξ_0, …, ξ_{n_g+1} be the successive states visited by f and g, respectively, and let x_0, …, x_{n_f} and y_0, …, y_{n_g} be the corresponding holding times. Writing P(Z_w ∈ dh) for the probability density of Z_w evaluated at a specific trajectory h, by definition of Z_w we have where dx = dx_0 ⋯ dx_{n_f} and dy = dy_0 ⋯ dy_{n_g}. Rearranging the terms, we get which concludes the proof.
We close this appendix by proving Proposition 2.4 from the main text, whose statement we reproduce here for convenience. Recall that X_m denotes the process X "as seen from a uniform mutation time", that is, where U is chosen uniformly at random among the atoms of the point process M giving the times of the mutations associated to the trajectory of X.

Proposition 2.4. Let ν • be the probability distribution on the positive integers defined by
with C the corresponding normalizing constant. Let K ∼ ν• and, conditional on K, let X′ and X′′ be two independent realizations of the logistic branching process X started from X′_0 = K and X′′_0 = K − 1. Then,
Proof. Let X• be a Markov chain started from 0 with the same transition rates as X, except for an additional "rebirth" transition from 0 to 1 at an arbitrary positive rate. Thus, X• is positive recurrent, and it is straightforward to check that its stationary distribution (π_i)_{i⩾0} satisfies, for i ⩾ 1, Now, since the excursions of X• away from 0 are distributed as the restriction of X to [0, T], we have where τ is the first jump time of X•; T• its time of first return to 0; M• its number of mutations on [τ, T•]; and U the time of a mutation chosen uniformly at random among the mutations on [τ, T•]. Moreover, each downward jump of X• from i to i − 1 corresponds to a mutation with probability μ/ρ_i, independently. Therefore, if conditional on X• we let W be the measure defined by where the sum is over all (i, t) such that X• goes from i to i − 1 at time t ∈ [τ, T•]. As a result, conditional on X•, letting V ∼ (1/W)W we have Therefore, by applying Proposition A.2 to (X•, W), we get that X_m = X′ ≀ X′′ in distribution, where:
• X′ and X′′ are independent, X′′ has the same transition rates as X•, and X′ has the same transition rates as the time-reversed chain of X•.
• (X ′ , X ′′ ) is started from (i, i − 1) with probability where C is the normalization constant (and we have used the expression of π i given in (19) to get the right-hand side).
Finally, since every positive recurrent birth-death chain is reversible (a standard fact that follows from Kolmogorov's criterion for time-reversibility), X′ in fact has the same transition rates as X•; and since X′ and X′′ are both killed upon reaching 0, these two chains also have the same transition rates as X.

A.2 Proofs for Section 3.1
In this appendix, we prove Lemma 3.2 and Proposition 3.4, whose statements we will reproduce below for convenience.
Let us start by recalling how the notions of correspondence and distortion can be used to tackle Gromov-Hausdorff-Prokhorov convergence more conveniently than by working directly with the definition of the metric. Note that, in order to deal with the Prokhorov component of the metric, we will use definitions that differ slightly from those traditionally used in the Gromov-Hausdorff setting.
Let (X, r, d, λ) and (X′, r′, d′, λ′) be two rooted compact metric probability spaces. Since we view a subset R ⊂ X × X′ as a binary relation, we write x R x′ to indicate that (x, x′) ∈ R. For any A ⊂ X, we let AR = {x′ ∈ X′ : ∃x ∈ A with x R x′}, and we define RB similarly for any subset B ⊂ X′. In what follows, we use the term correspondence from X to X′ to refer to any nonempty subset R ⊂ X × X′. Note that R is sometimes required to satisfy XR = X′ and X = RX′ in order to be called a correspondence, but in our setting it will be more convenient to drop this restriction.
We now introduce a modified version of the notion of distortion of a correspondence. In what follows, A^ε denotes the ε-neighborhood of a set A.
(iii) For any Borel set A ⊂ X, λ′((AR)^{ε/2}) + ε ⩾ λ(A). ⋄ The usual notion of distortion only takes (i) into account: (ii) is added to be able to relax the usual definition of correspondence, as discussed above; and (iii) controls the Prokhorov part of the Gromov-Hausdorff-Prokhorov topology. It may seem that replacing the ε/2 with ε would yield a more natural definition; however, this ε/2 makes several calculations neater. Finally, note that because we have imposed only one inequality in (iii), this definition is not symmetric: if we let R^{-1} = {(x′, x) : (x, x′) ∈ R}, then a priori dis(R) ≠ dis(R^{-1}).
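On finite metric spaces, the classical part (i) of the distortion can be computed by brute force. The sketch below implements only that part, with R given as a list of pairs (an illustrative encoding; conditions (ii) and (iii) involve neighborhoods and measures and are not covered):

```python
import itertools

def distortion_part_i(R, d, d_prime):
    """Classical distortion (condition (i) only) of a correspondence R
    between two finite metric spaces:
        sup |d(x, y) - d'(x', y')| over (x, x'), (y, y') in R.
    R is a list of pairs; d and d_prime are distance functions."""
    return max(
        abs(d(x, y) - d_prime(xp, yp))
        for (x, xp), (y, yp) in itertools.product(R, repeat=2)
    )
```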
As the next lemma shows, correspondences and their distortions provide a simple characterization of the (rooted) Gromov-Hausdorff-Prokhorov convergence.
It is readily checked that δ is indeed a metric. Moreover, for any (x, x′) ∈ R, we have δ(x, x′) = ε/2. This implies that δ(r, r′) ⩽ ε/2, and that for each Borel set A ⊂ X, (AR)^{ε/2} ⊂ A^ε. Therefore, it follows from point (ii) of Definition A.3 that the Hausdorff distance between X and X′ in (X ⊔ X′, δ) is at most ε. Similarly, it follows from point (iii) of Definition A.3 that the Prokhorov distance between the extensions of λ and λ′ to X ⊔ X′ is also at most ε.
Therefore, d GHP (X, X ′ ) ⩽ ε and the proof is complete.
We are now ready to prove Lemma 3.2. First, recall from Definition 3.1 how to obtain a rooted compact metric probability space T_h from a nonnegative càdlàg function h such that h(0) = 0.
Proof. It is classical [10] that the Skorokhod topology can be metrized by the following metric: for two càdlàg functions f, g : [0, 1] → R, define where θ runs over the set of continuous increasing bi-Lipschitz bijections from [0, 1] onto itself, and Note that, since θ(0) = 0 and θ(1) = 1 for every such bijection θ, if Lip(θ) < 1 + ε then ∥θ − Id∥_∞ < ε, where Id is the identity map.
By Lemma A.4, to show that d_GHP(T_f, T_g) ⩽ 4ε, it is sufficient to check that dis(R) ⩽ 4ε and that r_f R r_g. The latter point is trivial since r_f = φ(0) and r_g = ψ(0) = ψ(θ(0)); therefore, it remains to check that the following three points hold: (i) (T_f R)^{2ε} = T_g and T_f = (R T_g)^{2ε}; this is also immediate since T_f R = ψ([0, 1]) is dense in T_g and since R T_g = φ([0, 1]) is dense in T_f.
To prove (ii), consider s < t ∈ [0, 1]. We need to show that This is readily seen, since To show (iii), consider a Borel subset A ⊂ T_f, and let ℓ denote the Lebesgue measure on [0, 1], so that λ_f(A) = ℓ(φ^{-1}(A)) and λ_g(A) = ℓ(ψ^{-1}(A)). Notice that, by definition, AR = ψ ∘ θ(φ^{-1}(A)). Therefore, λ_g(AR) = ℓ(ψ^{-1}(ψ ∘ θ(φ^{-1}(A)))) ⩾ ℓ(θ(φ^{-1}(A))).
And so, by the Borel-Cantelli lemma, there almost surely exists m* such that for all m ⩾ m*, there is a unique index i_m such that U is in the m-good interval J^m_{i_m}. This implies that, almost surely, for all n′ ⩾ n ⩾ N_{m*} and writing i = i_{m_n} to avoid clutter, U ∈ I^{m_n}_{n,i} ∩ I^{m_n}_{n′,i}, and so f(x_n), f(x_{n′}) ∈ B^{m_n}_i with diam(B^{m_n}_i) ⩽ ε_{m_n}. This shows that (f(x_n))_n is almost surely a Cauchy sequence, concluding the proof.

A.4 Tail of the size of critical Galton-Watson trees
In order to make this article as self-contained as possible, we provide a short proof of Proposition 3.5, which gives an asymptotic equivalent of the probability that a critical Galton-Watson tree has size n and is a key element of our study. As previously, we repeat the statement of the proposition here for convenience.
The first equality is easily seen by marking the root as to-visit, and then, at each step, removing a vertex from the to-visit pile and adding its children to it: the procedure ends when there are no vertices left to visit, which happens after exactly n steps, where n is the total number of vertices in the tree; and if we let ξ_i denote the number of vertices added to the pile at step i (one vertex also being removed at each step), then the number of vertices on the pile after step i is exactly S_i + 1.
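The exploration described above is easy to implement: the pile size after i steps is S_i + 1, so the tree size is the first time the walk S hits −1. A sketch, with a critical Geometric(1/2) offspring distribution as an illustrative example:

```python
import random

def explore_gw_tree(offspring_sampler, rng, max_steps=10**6):
    """Explore a Galton-Watson tree with a to-visit pile: start with the
    root on the pile; at each step, remove one vertex and add its xi_i
    children.  The pile size after step i is S_i + 1 where
    S_i = sum_{j <= i} (xi_j - 1), so the tree size is the first step at
    which the walk S hits -1."""
    pile, walk, steps = 1, 0, 0
    while pile > 0 and steps < max_steps:
        xi = offspring_sampler(rng)
        pile += xi - 1
        walk += xi - 1
        steps += 1
    return steps, walk

def geom_offspring(rng):
    """Critical offspring distribution: Geometric(1/2) on {0, 1, 2, ...},
    with mean 1 (an illustrative choice)."""
    n = 0
    while rng.random() < 0.5:
        n += 1
    return n
```

Whenever the exploration terminates, the walk has value exactly −1, which is the identity used in the proof.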