$k$-cut model for the Brownian Continuum Random Tree

To model the destruction of a resilient network, Cai, Holmgren, Devroye and Skerman introduced the $k$-cut model on a random tree, as an extension of the classic problem of cutting down random trees. Berzunza, Cai and Holmgren later proved that the total number of cuts in the $k$-cut model needed to isolate the root of a Galton--Watson tree with a finite-variance offspring law, conditioned to have $n$ nodes, when divided by $n^{1-1/(2k)}$, converges in distribution to a random variable defined on the Brownian CRT. We provide here a direct construction of this limit random variable, relying upon the Aldous--Pitman fragmentation process and a deterministic time change.


Introduction
Let $k \in \mathbb N$ and let $T$ be a rooted tree. The following procedure is considered by Cai, Holmgren, Devroye and Skerman [9]. To each vertex $v$ of $T$, we associate an independent Poisson process $N_v = (N_v(t))_{t\ge 0}$ of rate 1. Imagine that each time $N_v$ increases, the vertex $v$ is cut once, and it is eventually removed when it has received $k$ cuts. The procedure ends when the root is removed. We are interested in the total number of cuts, denoted by $X_k(T)$. Let us observe that for $k = 1$, the above procedure reduces to the classic problem of cutting down random trees introduced by Meir and Moon [14]; see in particular [1,2,5,7,10,12] for some recent progress on the classical version.
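The procedure above can be sketched in a short simulation. The tree encoding (a parent array) and the function name below are ours; the dynamics follow the description directly: since the inter-arrival times of a rate-1 Poisson process are i.i.d. Exp(1) variables, the $r$-th cut on $v$ falls at the sum of $r$ such variables, and a cut is counted only if $v$ is still connected to the root at that instant.

```python
import random

def simulate_k_cut(parent, k, rng):
    """Total number of cuts X_k(T) in one run of the k-cut procedure.
    parent[v] is the parent of vertex v; parent[0] is None (0 is the root)."""
    n = len(parent)
    events = []  # (time, vertex, number of cuts v has received so far)
    for v in range(n):
        t = 0.0
        for r in range(1, k + 1):
            t += rng.expovariate(1.0)  # rate-1 Poisson inter-arrival times
            events.append((t, v, r))
    events.sort()
    removed = [False] * n

    def connected(v):
        # v is attached to the root iff no vertex on its root path is removed
        while v is not None:
            if removed[v]:
                return False
            v = parent[v]
        return True

    cuts = 0
    for t, v, r in events:
        if not connected(v):
            continue  # cuts falling on detached vertices do not count
        cuts += 1
        if r == k:  # k-th cut: the vertex is removed
            removed[v] = True
            if v == 0:  # the root is gone: the procedure ends
                break
    return cuts

# Example: the one-vertex tree receives exactly k counted cuts,
# since the root stays connected until its own removal.
rng = random.Random(0)
print(simulate_k_cut([None], 4, rng))  # always 4
```

In particular, for any tree the root contributes exactly $k$ counted cuts, so $X_k(T) \ge k$.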
Let $\xi = (\xi(p))_{p\ge 0}$ be a probability measure on the set of non-negative integers which satisfies $\sum_{p\ge 1} p\,\xi(p) = 1$ and $0 < \sigma^2 := \sum_{p\ge 2} p(p-1)\,\xi(p) < \infty$. For $n \ge 1$, let $\mathsf T_n$ be a Galton--Watson tree with offspring distribution $\xi$ conditioned on having $n$ vertices. Berzunza, Cai and Holmgren show in [6] that
$$ n^{-(1-\frac{1}{2k})}\, X_k(\mathsf T_n) \xrightarrow[n\to\infty]{(d)} \sigma^{1/k}\, Z_k, \qquad (1) $$
where $Z_k$ is a non-degenerate random variable, defined on the Brownian Continuum Random Tree $\mathcal T$, whose distribution is characterised via its moments. In this note, we provide a direct construction of the limit in the general setting of $k \ge 1$, thus answering a question in [6] on the construction of $Z_k$. To that end, let us start with a brief introduction to the Aldous--Pitman fragmentation process, which can be viewed as the analogue of the $1$-cut procedure for the Brownian continuum random tree (CRT). First, we need to construct this CRT. Let us take $e = (e_s)_{0\le s\le 1}$, where $\frac12 e$ is distributed as the standard normalised Brownian excursion of duration 1. For $s, t \in [0, 1]$, define $d(s, t) = e_s + e_t - 2\,b(s, t)$, where $b(s, t) = \min_{s\wedge t \le u \le s\vee t} e_u$.
It turns out that the function $d$ is non-negative, symmetric and satisfies the triangle inequality. To turn it into a metric, let $s \sim t$ if and only if $d(s, t) = 0$. Then $d$ induces a metric on the quotient space $\mathcal T := [0, 1]/\!\sim$, which we still denote by $d$. In the sequel, we will refer to the (random) metric space $(\mathcal T, d)$ as the Brownian CRT. Note that it has "tree-like" features: each pair of points in $\mathcal T$, say $\sigma$ and $\sigma'$, is joined by a unique path, denoted by $[\![\sigma, \sigma']\!]$, which turns out to be a geodesic. Metric spaces with such properties are called $\mathbb R$-trees. Interested readers can check Evans [11] and Le Gall [13] for more background on $\mathbb R$-trees and the CRT.
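The construction can be made concrete on a discretised excursion. The snippet below (names ours) computes the pseudo-distance $d$ on a grid; we use a deterministic tent-shaped "excursion" rather than a genuine Brownian one, which is enough to see the gluing $d(s,t) = 0$ and to check the triangle inequality numerically.

```python
def crt_dist(e, s, t):
    """d(s,t) = e_s + e_t - 2 * min over [s∧t, s∨t] of e,
    where e is a list of excursion values on a grid and s, t are grid indices."""
    lo, hi = (s, t) if s <= t else (t, s)
    return e[s] + e[t] - 2 * min(e[lo:hi + 1])

# A toy excursion on a grid of 7 points, with e_0 = e_6 = 0.
e = [0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 0.0]

# d(0, 6) = 0: the two endpoints are glued into the root rho = p(0) = p(1).
# d(1, 5) = 0 as well: these grid points project to the same point of T.
print(crt_dist(e, 0, 6), crt_dist(e, 1, 5), crt_dist(e, 2, 4))  # 0.0 0.0 2.0
```

The triangle inequality holds for every triple of grid points, as it does for any continuous excursion.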
Let us also introduce the following notation on $(\mathcal T, d)$, which will be useful later. We denote by $p : [0, 1] \to \mathcal T$ the canonical projection which sends every $t \in [0, 1]$ to its equivalence class with respect to $\sim$. The root of $(\mathcal T, d)$ is then the point $\rho = p(0) = p(1)$. In addition, the map $p$ also induces a probability measure on $\mathcal T$: the mass measure, denoted by $\mu$, is the push-forward of the uniform measure on $[0, 1]$ by $p$. On the other hand, the length measure $\ell$ is a $\sigma$-finite measure on $\mathcal T$, characterised by the relation $\ell([\![\sigma, \sigma']\!]) = d(\sigma, \sigma')$, for all $\sigma, \sigma' \in \mathcal T$.
We introduce a Poisson point measure $\mathcal P = \sum_{i\ge 1} \delta_{(t_i, x_i)}$ on $[0, \infty) \times \mathcal T$ of intensity $\mathrm dt \otimes \ell(\mathrm dx)$. One can imagine the $(t_i, x_i)$'s as cuts on $\mathcal T$: at time $t_i$, the point $x_i$ is removed from $\mathcal T$, which disconnects the tree. As time moves on, more cuts arrive and $\mathcal T$ fragments into finer and finer connected components. The Aldous--Pitman fragmentation consists in describing the time evolution of the collection of $\mu$-masses of these connected components. It is also known that the above cutting process of $\mathcal T$ using points from $\mathcal P$ appears as the scaling limit of the $1$-cut procedure on $\mathsf T_n$. On the other hand, the key element in our construction is the following time-changed version of $\mathcal P$: for $k \in [1, \infty)$, define
$$ \tilde{\mathcal P} := \sum_{i\ge 1} \delta_{((\Gamma(k+1)\, t_i)^{1/k},\; x_i)}, \quad \text{so that} \quad \tilde{\mathcal P}([0, t] \times A) = \mathcal P\Big(\big[0, \tfrac{t^k}{\Gamma(k+1)}\big] \times A\Big). \qquad (2) $$
Here, $\Gamma(\cdot)$ stands for the Gamma function. Let us denote by $\mathcal T_t = \{\sigma \in \mathcal T : \mathcal P([0, t] \times [\![\rho, \sigma]\!]) = 0\}$ the subtree connected to the root at time $t$. Similarly, denote by $\tilde{\mathcal T}_t = \{\sigma \in \mathcal T : \tilde{\mathcal P}([0, t] \times [\![\rho, \sigma]\!]) = 0\}$ the remaining subtree in the time-changed cutting process. We define
$$ X_k(\mathcal T) := k \int_0^\infty \mu(\tilde{\mathcal T}_t)\, \mathrm dt. \qquad (3) $$
For $k = 1$, $X_1(\mathcal T)$ appears in [1,2,5] as the scaling limit of $X_1(\mathsf T_n)$. Let us also recall that Aldous and Pitman [4] have shown that the process $(\mu(\mathcal T_t))_{t\ge 0}$ has the same distribution as $((1 + L_t)^{-1})_{t\ge 0}$, with $(L_t)_{t\ge 0}$ a $\frac12$-stable subordinator. Combined with a Lamperti time change, this implies that $X_1(\mathcal T)$ has the Rayleigh distribution ([5]). Note that since $\tilde{\mathcal T}_t = \mathcal T_{t^k/\Gamma(k+1)}$, we also have the following bound from (3):
$$ \mathbb E[X_k(\mathcal T)] = k \int_0^\infty \mathbb E\big[(1 + L_{t^k/\Gamma(k+1)})^{-1}\big]\, \mathrm dt < \infty. \qquad (4) $$
So in particular, X k (T ) < ∞, a.s. Let us also point out that even though the discrete model is only defined for k ∈ N, the above definition of X k (T ) makes sense for all k ∈ [1, ∞). Here is our main result.
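The effect of the time change can be probed numerically on a single branch. Under the time-changed cutting process, the first cut on a path of length $\ell$ arrives at a time $\tau$ with $\mathbb P(\tau > t) = \exp(-\ell\, t^k/\Gamma(k+1))$, i.e. $\tau^k/\Gamma(k+1)$ is exponential with rate $\ell$. A minimal Monte Carlo sketch (function name, parameter values, sample size and seed are all ours):

```python
import math
import random

def first_cut_time(branch_len, k, rng):
    """First arrival on a branch of length branch_len under the time-changed
    process: map the first atom time of P (an Exp(branch_len) variable)
    through t -> (Gamma(k+1) * t)**(1/k)."""
    t = rng.expovariate(branch_len)
    return (math.gamma(k + 1) * t) ** (1.0 / k)

rng = random.Random(2024)
k, ell, t0 = 2.0, 1.5, 1.0
n_samples = 200_000
tail = sum(1 for _ in range(n_samples)
           if first_cut_time(ell, k, rng) > t0) / n_samples
predicted = math.exp(-ell * t0 ** k / math.gamma(k + 1))  # exp(-0.75)
# tail and predicted agree up to Monte Carlo error
```

For $k = 1$ the map is the identity and one recovers the usual exponential first-cut time of the Aldous--Pitman fragmentation.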
Theorem 1. For all k ∈ N, conditional on (T , d), X k (T ) has the same distribution as Z k .
We give two proofs of the theorem. In Section 2, we give a first proof by identifying the conditional moments of $X_k(\mathcal T)$ given $\mathcal T$ with those of $Z_k$, which were computed in [6]. In Section 3, we give a second proof via weak convergence arguments. Even though it takes a bit more space, the second proof is perhaps more helpful in explaining the motivation for the definition (3), and it also provides an alternative proof of the convergence in (1).
2 Conditional expectation of $X_k(\mathcal T)$ given $\mathcal T$
We will need the following notation. For $q \in \mathbb N$ and $s = (s_1, s_2, \ldots, s_q) \in [0, 1]^q$, we set $\Delta^e_1(s) = e_{s_1}$, and more generally for $2 \le r \le q$,
$$ \Delta^e_r(s) = e_{s_r} - \max_{1\le j\le r-1} b(s_j, s_r). $$
Note that $\Delta^e_1(s) + \cdots + \Delta^e_r(s)$ is the total length (i.e. $\ell$-mass) of the reduced subtree of $\mathcal T$ spanned by $p(s_1), \ldots, p(s_r)$, for all $r \le q$. Our goal is to prove the following formulas for the moments of $X_k(\mathcal T)$.
For $\sigma \in \mathcal T$, let $E_\sigma := \inf\{t \ge 0 : \tilde{\mathcal P}([0, t] \times [\![\rho, \sigma]\!]) > 0\}$ be the first time a cut of the time-changed process falls on the path from the root to $\sigma$. Then $v \in \tilde{\mathcal T}_t$ if and only if $E_v > t$. Therefore, we can rewrite $X_k(\mathcal T)^q$ as follows.
where we have used in the last line the definition that $\mu$ is the push-forward of the Lebesgue measure on $[0, 1]$. The above expresses $\mathbb E[X_k(\mathcal T)^q \mid \mathcal T]$ as an integral over $(t_1, \ldots, t_q) \in \mathbb R_+^q$ and $(s_1, \ldots, s_q) \in [0, 1]^q$. We then split $\mathbb R_+^q$ into $q!$ subdomains according to the $q!$ possible orderings of $(t_i)_{1\le i\le q}$. However, $(s_i)_{1\le i\le q}$ is sampled in an i.i.d. fashion and is therefore exchangeable, so that the integration over each subdomain contributes equally. Hence, it suffices to integrate over $\{t_1 > t_2 > \cdots > t_q\}$ and multiply by $q!$. Let $R_q$ be the reduced subtree of $\mathcal T$ spanned by $v_1 = p(s_1), \ldots, v_q = p(s_q)$, i.e. the smallest connected subspace of $\mathcal T$ containing these $q$ points and the root $\rho$.
Figure 1: An illustration of $R_q$ with $q = 2$. Here, $R_q$ has the shape of a binary tree with 2 leaves, one branch point and three edges (depicted by the line segments in bold). The edge lengths correspond to the lengths of these line segments.
Note that $R_q$ is a "finite" tree in the
sense that it only has a finite number of branch points and leaves. Here, it will be convenient to think of it as a (graph) tree $(V_q, E_q)$, where the vertex set $V_q$ consists of the root, the leaves and the branch points of $R_q$, and each edge $e \in E_q$ is equipped with an edge length $l(e) \in (0, \infty)$. These edge lengths are consistent with the distance $d$ in the following way:
$$ d(\rho, v) = \sum_{e \in P(v)} l(e), \qquad v \in V_q, $$
where $P(v)$ stands for the set of edges on the path from the root $\rho$ to $v$. See also Fig. 1 for an example of $R_q$. Now to each edge $e$ in this tree, we associate an independent exponential variable $E_e$ of mean $1/l(e)$. It follows from the definition (2) of $\tilde{\mathcal P}$ that $(E_{v_r})^k/k!$ is distributed as an exponential random variable of mean $1/d(\rho, v_r) = 1/e_{s_r}$. It is then straightforward to check the required identity in distribution. Bearing in mind that $t_1 > t_2 > \cdots > t_q$, we then find the desired expression. By the previous arguments, this completes the proof.
Proof 1 of Theorem 1. Comparing (5) with equations (8) and (9) in [6], we see that the conditional moments coincide. Applying Theorem 2 and Lemma 8 there, we conclude that, conditional on $e$, $X_k(\mathcal T)$ has the same distribution as $Z_k$.
3 Scaling limit of $X_k(\mathsf T_n)$
Here, we give a second proof of the theorem by showing that $X_k(\mathcal T)$ is the scaling limit of $X_k(\mathsf T_n)$. Throughout this section, we assume $k \in \{2, 3, \ldots\}$.

Convergence of random trees
We briefly recall Aldous's theorem on the convergence of the conditioned Galton--Watson tree $\mathsf T_n$, and provide some necessary background on the Gromov--Hausdorff topology. Further details on these topics can be found in [3,8,11,13,15].
The Gromov--Hausdorff distance between two compact metric spaces $(X, d_X)$ and $(Y, d_Y)$ is the following quantity:
$$ d_{\mathrm{GH}}(X, Y) = \inf\, d_{Z, \mathrm{Haus}}\big(\phi(X), \varphi(Y)\big), $$
where the infimum is over all the isometric embeddings $\phi : X \to Z$ and $\varphi : Y \to Z$ into a common metric space $(Z, d_Z)$, and $d_{Z,\mathrm{Haus}}$ stands for the usual Hausdorff distance between the compact sets of $Z$.
In our application, we often need to keep track of specified points in the initial spaces. To that end, let $x = (x_1, \ldots, x_p)$ and $y = (y_1, \ldots, y_p)$ be $p \in \mathbb N$ points of $X$ and $Y$, respectively. Then the marked Gromov--Hausdorff distance between $(X, d_X, x)$ and $(Y, d_Y, y)$ is defined as
$$ d_{p,\mathrm{GH}}\big((X, x), (Y, y)\big) = \inf\Big\{ d_{Z, \mathrm{Haus}}\big(\phi(X), \varphi(Y)\big) \vee \max_{1\le i\le p} d_Z\big(\phi(x_i), \varphi(y_i)\big) \Big\}, $$
where the infimum is again over all the isometric embeddings of $X$ and $Y$ into a common metric space. For each $p \ge 1$, it turns out that the space of metric spaces with $p$ marked points is a Polish space with respect to $d_{p,\mathrm{GH}}$ ([15]). Now the convergence of $\mathsf T_n$ mentioned earlier can be given a precise meaning. Let us recall that the Brownian CRT $(\mathcal T, d)$ is a metric space by definition, and that $\rho \in \mathcal T$ stands for its root. Equipping its vertex set with the graph distance, we can also view the tree $\mathsf T_n$ as a metric space. Let us denote by $\frac{\sigma}{\sqrt n} \mathsf T_n$ the rescaled metric space where the graph distance is multiplied by a factor $\frac{\sigma}{\sqrt n}$, and by $\rho_n$ its root. We have
$$ \Big(\tfrac{\sigma}{\sqrt n} \mathsf T_n,\ \rho_n\Big) \xrightarrow[n\to\infty]{(d)} (\mathcal T, \rho) \qquad (6) $$
in the weak topology of the marked Gromov--Hausdorff distance. We note that $\mathcal T$ is further equipped with a probability measure $\mu$. Let us define its discrete counterpart: for $n \ge 1$, let $\mu_n$ be the uniform probability measure on the vertex set of $\mathsf T_n$. In fact, Aldous's theorem in [3] also implies the following convergence of reduced trees. Given $\mathcal T$, let $(V_i)_{i\ge 1}$ be an i.i.d. sequence of points in $\mathcal T$ sampled with $\mu$. For $p \in \mathbb N$, denote by $R_p$ the reduced tree of $\mathcal T$ spanned by $V_1, \ldots, V_p$. Similarly, we sample an i.i.d. sequence $(V^n_i)_{i\ge 1}$ from $\mathsf T_n$ with law $\mu_n$. Let $R^n_p$ be the reduced subtree of $\mathsf T_n$ spanned by $V^n_1, \ldots, V^n_p$, namely, the smallest subgraph of $\mathsf T_n$ (an edge of the subgraph is also an edge of $\mathsf T_n$) containing $V^n_1, \ldots, V^n_p$ and the root $\rho_n$. As above, we denote by $\frac{\sigma}{\sqrt n} R^n_p$ the metric space obtained from $R^n_p$ by equipping its vertex set with $\frac{\sigma}{\sqrt n}$ times the graph distance. Then we have
$$ \Big(\tfrac{\sigma}{\sqrt n} R^n_p,\ (V^n_i)_{1\le i\le p}\Big) \xrightarrow[n\to\infty]{(d)} \big(R_p,\ (V_i)_{1\le i\le p}\big) \qquad (7) $$
with respect to the marked Gromov--Hausdorff topology.
We have seen that $R_p$ can be viewed as a (graph) tree with edge lengths; so can $\frac{\sigma}{\sqrt n} R^n_p$, where each edge length is simply $\frac{\sigma}{\sqrt n}$. In fact, the convergence in (7) amounts to saying that the "shape" of $R^n_p$ coincides with that of $R_p$ for large $n$, and
$$ \frac{\sigma}{\sqrt n}\, \#(R^n_p) \xrightarrow[n\to\infty]{(d)} \ell(R_p), \qquad (8) $$
where $\#$ stands for the counting measure on the vertex set of $R^n_p$ and $\ell$ is the length measure of $\mathcal T$. Let us recall that the Poisson point measure $\mathcal P$ has intensity $\mathrm dt\, \ell(\mathrm dx)$. Since $\ell(R_p) < \infty$, there is a finite number of "cuts" $(t_i, x_i)$ from $\mathcal P$ which fall on $R_p$ before time $t$. So a convenient approach to studying the cutting of $\mathcal T$ is to first look at those cuts on $R_p$, $p \ge 1$. The convergences in (7) and (8) will be our starting point for proving the convergence of $X_k(\mathsf T_n)$.

Convergence of the cutting process
For each vertex $v$ of $\mathsf T_n$, let us denote by $\eta_v = \inf\{t : N_v(t) = k\}$ the time when $v$ is removed from $\mathsf T_n$. We show here that the point measure $\mathcal P_n := \sum_{v\in \mathsf T_n} \delta_{(\eta_v, v)}$ converges in an appropriate sense to $\tilde{\mathcal P}$. Let us start with the following observation.
Lemma 3. For each $m \in \mathbb N$, suppose $a_m \in (0, \infty)$ and let $(G_{m,i})_{1\le i\le m}$ be independent Gamma$(k, \frac{1}{a_m})$ random variables whose probability density function is given by
$$ f_m(x) = \frac{a_m^k\, x^{k-1}}{\Gamma(k)}\, e^{-a_m x}, \qquad x > 0. $$
If $m\, a_m^k \to a \in (0, \infty)$ as $m \to \infty$, then, writing $N_m(t) := \#\{1 \le i \le m : G_{m,i} \le t\}$, we have
$$ (N_m(t))_{t\ge 0} \xrightarrow[m\to\infty]{(d)} \big(N(t^k/k!)\big)_{t\ge 0} \quad \text{in } D(\mathbb R_+, \mathbb R), $$
where $(N(t))_{t\ge 0}$ is a Poisson process on $\mathbb R_+$ of rate $a$ and $D(\mathbb R_+, \mathbb R)$ is the space of càdlàg functions endowed with the Skorokhod topology.
Proof. Let $G$ denote a Gamma$(k, 1)$ random variable and let $X$ be a Poisson random variable of mean $t$. We note that
$$ \mathbb P(G \le x) = \frac{x^k}{k!}\,\big(1 + x R(x)\big), $$
where $R(\cdot)$ is bounded on any finite interval. Let $T > 0$. For all $t \le T$ and $p \ge 0$, noting that $\mathbb P(G_{m,1} \le t) = \mathbb P(G \le a_m t)$, we deduce that
$$ \mathbb P(N_m(t) = p) = \binom{m}{p}\, \mathbb P(G \le a_m t)^p\, \big(1 - \mathbb P(G \le a_m t)\big)^{m-p} \longrightarrow e^{-a t^k/k!}\, \frac{(a t^k/k!)^p}{p!}. \qquad (9) $$
We now extend this to multidimensional marginals. Let $l \ge 2$, $0 \le t_1 \le t_2 \le \cdots \le t_l$, and let $p_1 \le p_2 \le \cdots \le p_l$ be a sequence of non-negative integers. Then for $m \ge p_l$, we apply (9) again to identify the limit of the joint probabilities. Combined with an induction argument, this readily yields the distributional convergence of $(N_m(t_i), 1 \le i \le l)$ to $(N(t_i^k/k!), 1 \le i \le l)$ for all $(t_i)_{1\le i\le l}$, $l \ge 1$. Since $t \mapsto N_m(t)$ is non-decreasing, we conclude with the convergence in $D(\mathbb R_+, \mathbb R)$.
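The mechanism of Lemma 3 is elementary: $N_m(t)$ is Binomial$(m, \mathbb P(G \le a_m t))$, and $m\,\mathbb P(G \le a_m t) \to a\, t^k/k!$. For integer $k$ the Gamma$(k,1)$ distribution function is explicit, $\mathbb P(G \le x) = 1 - e^{-x}\sum_{j<k} x^j/j!$, so the limit of the binomial mean can be checked numerically (the parameter choices below are ours):

```python
import math

def gamma_cdf_int(k, x):
    """P(G <= x) for G ~ Gamma(k, 1) with integer shape k (Erlang):
    1 - e^{-x} * sum_{j<k} x^j / j!."""
    return 1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(k))

k, a, t = 2, 1.0, 1.0
for m in (10**2, 10**4, 10**6):
    a_m = (a / m) ** (1.0 / k)               # chosen so that m * a_m**k = a
    mean_Nm = m * gamma_cdf_int(k, a_m * t)  # E[N_m(t)] for the binomial count
    print(m, mean_Nm)  # approaches a * t**k / k! = 0.5 as m grows
```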
Recall the reduced trees $R^n_p$ and $R_p$. Let us take the vertices $v \in R^n_p$ and rank them in increasing order of the $\eta_v$'s. We write the ranked sequence as $(v^{n,p}_i)_{1\le i\le \#R^n_p}$, so that $\eta_{v^{n,p}_1} < \eta_{v^{n,p}_2} < \cdots$. Similarly, since $\tilde{\mathcal P}([0, t] \times R_p) = \#\{(s_i, x_i) : x_i \in R_p, s_i \le t\} < \infty$ for each $t > 0$, we can rank the elements of $\{(s_i, x_i) : x_i \in R_p\}$ in increasing order of their first coordinates and write the ranked (infinite) sequence as $(\tau^p_1, \chi^p_1), (\tau^p_2, \chi^p_2), \ldots$. Let us also denote $\delta_n := (\sigma/\sqrt n)^{1/k} = \sigma^{1/k}\, n^{-\frac{1}{2k}}$.
Proposition 4. For each $p \ge 1$, as $n \to \infty$, we have for all $j \ge 1$,
$$ \Big(\tfrac{\sigma}{\sqrt n} R^n_p,\ \big(v^{n,p}_i\big)_{1\le i\le j},\ \big(\delta_n^{-1} \eta_{v^{n,p}_i}\big)_{1\le i\le j}\Big) \xrightarrow[n\to\infty]{(d)} \Big(R_p,\ \big(\chi^p_i\big)_{1\le i\le j},\ \big(\tau^p_i\big)_{1\le i\le j}\Big), $$
where the convergence of the first coordinates is with respect to the marked Gromov--Hausdorff topology.
Proof. Since the $\eta_v$'s are i.i.d., the law of $(v^{n,p}_1, \ldots, v^{n,p}_j)$ is that of a uniform sampling without replacement, and is further independent of $(\eta_{v^{n,p}_i})_{1\le i\le j}$. Combined with the convergence in (7), this implies that $(v^{n,p}_1, \ldots, v^{n,p}_j)$ converges in distribution to $j$ independent uniform points in $R_p$, which is precisely the distribution of $\chi^p_1, \ldots, \chi^p_j$. So it remains to check the convergence of the $\eta_{v^{n,p}_i}$'s. Let us define
$$ N_{n,p}(t) := \#\{v \in R^n_p : \eta_v \le \delta_n t\}, \qquad t \ge 0. $$
Since each $\delta_n^{-1} \eta_v$ is distributed as an independent Gamma$(k, \frac{1}{\delta_n})$ variable, applying Lemma 3 with $m = \#R^n_p$ and $a_m = \delta_n$, we obtain from (8) that $(N_{n,p}(t))_{t\ge 0}$ converges in distribution to $(N(t^k/k!))_{t\ge 0}$, where $N$ is a Poisson process of rate $\ell(R_p)$. By (2), the latter has the same law as $(\tilde{\mathcal P}([0, t] \times R_p))_{t\ge 0}$. Standard results on point processes then allow us to complete the proof.
Let $\mathsf T_n(t)$ be the subtree of $\mathsf T_n$ formed by the vertices connected to the root at time $t$. Note that a vertex $v \in \mathsf T_n(t)$ if and only if neither $v$ itself nor any of its ancestors has been removed by time $t$. Let us denote $\mu_n(t) = \mu_n(\mathsf T_n(t))$. Recall that $\tilde{\mathcal T}_t$ is the subtree of $\mathcal T$ connected to the root at time $t$ in the cutting process driven by $\tilde{\mathcal P}$. Proposition 4 implies the following.
Lemma 5. As $n \to \infty$, jointly with the convergence in (7), we have $(\mu_n(\delta_n t))_{t\ge 0}$ converging to $(\mu(\tilde{\mathcal T}_t))_{t\ge 0}$ in distribution with respect to the Skorokhod topology on $D(\mathbb R_+, \mathbb R)$.
Proof. The arguments are similar to the ones in Section 2.3 of [4], so we only sketch the proof. Recall that $(V^n_i)_{i\ge 1}$ (resp. $(V_i)_{i\ge 1}$) is a sequence of i.i.d. uniform vertices of $\mathsf T_n$ (resp. i.i.d. points of $\mathcal T$ with law $\mu$). By the Law of Large Numbers, we have for each $t > 0$,
$$ \frac{1}{j} \sum_{i=1}^{j} \mathbf 1_{\{V^n_i \in \mathsf T_n(t)\}} \xrightarrow[j\to\infty]{} \mu_n(t) \quad \text{almost surely}. $$
On the other hand, $V^n_i \in \mathsf T_n(t)$ if and only if the first $\eta_v$ among the $v$'s in the path from the root to $V^n_i$ arrives after $t$. Therefore, according to Proposition 4, for each $j \ge 1$,
$$ \big(\mathbf 1_{\{V^n_i \in \mathsf T_n(\delta_n t)\}}\big)_{1\le i\le j} \xrightarrow[n\to\infty]{(d)} \big(\mathbf 1_{\{V_i \in \tilde{\mathcal T}_t\}}\big)_{1\le i\le j}. $$
It follows that we can find a sequence $k_n \to \infty$ slowly enough such that the previous convergence still holds with $j = k_n$, jointly with (7). Invoking the Law of Large Numbers again, we deduce that $\mu_n(\delta_n t) \to \mu(\tilde{\mathcal T}_t)$ in distribution, jointly with (7). These arguments can also be adapted to prove the convergence of the multidimensional marginals. The functional convergence then follows thanks to monotonicity.
By the Skorokhod representation theorem, we can assume from now on that, jointly with (7), we have the almost sure convergence $(\mu_n(\delta_n t))_{t\ge 0} \to (\mu(\tilde{\mathcal T}_t))_{t\ge 0}$ in $D(\mathbb R_+, \mathbb R)$.

Records and numbers of cuts
Recall the Poisson process $N_v$ associated to each vertex $v \in \mathsf T_n$. Let us write $\eta_{v,r} = \inf\{t : N_v(t) = r\}$ for the $r$-th jump of $N_v$; in particular, $\eta_{v,k} = \eta_v$. For $r = 1, \ldots, k$, we say that $v$ is an $r$-record if $v$ is still connected to the root at time $\eta_{v,r}$. Denote by $X_{k,r}(\mathsf T_n)$ the total number of $r$-records in $\mathsf T_n$. Clearly, $X_k(\mathsf T_n) = \sum_{1\le r\le k} X_{k,r}(\mathsf T_n)$. On the other hand, as pointed out in Lemma 6 of [9], the numbers of $r$-records for different $r$ have the same asymptotic behaviour, so that we only need to look for the scaling limit of $X_{k,1}(\mathsf T_n)$. To that end, let us introduce $a_n(t) = \#\{v \in \mathsf T_n(t) : N_v(t) = 0\}$. Standard tools from stochastic analysis yield the following.
Lemma 6. For all $n \ge 1$, we have
$$ \mathbb E\big[X_{k,1}(\mathsf T_n)\big] = \mathbb E\Big[\int_0^\infty a_n(s)\, \mathrm ds\Big]. $$
Proof. For $t > 0$, let us denote by $X_n(t)$ the number of 1-records which have occurred by time $t$. Clearly, $X_n(\infty) = X_{k,1}(\mathsf T_n)$. Note that $\eta_{v,1}$ is distributed as an exponential variable with mean 1. It is then classic that
$$ M_t := X_n(t) - \int_0^t a_n(s)\, \mathrm ds, \qquad t \ge 0, $$
is a martingale, which further satisfies $\mathbb E[M_t^2] = \mathbb E[\int_0^t a_n(s)\, \mathrm ds]$. In the terminology of point processes, this says that $(\int_0^t a_n(s)\, \mathrm ds)_{t\ge 0}$ is the compensator of $(X_n(t))_{t\ge 0}$. On the other hand, for each fixed $n$, one can easily convince oneself that $\mathbb E[\int_0^\infty a_n(s)\, \mathrm ds] < \infty$. Therefore, $(M_t)_{t\ge 0}$ is also bounded in $L^2$. Taking $t \to \infty$ yields the desired result.
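The compensator identity behind Lemma 6 (equal expectations for the count of 1-records and for $\int_0^\infty a_n(s)\,\mathrm ds$) can be tested on a toy example of ours: a two-vertex tree, root plus one child, with $k = 2$. Both quantities admit closed expressions per realisation, and their Monte Carlo averages should agree.

```python
import random

def one_record_stats(k, rng):
    """Two-vertex tree: root 0 with child 1. Returns (number of 1-records,
    integral of a_n(s) ds) for one realisation of the cutting clocks."""
    # eta_{v,1}: first cut time; eta_v: removal time (sum of k Exp(1) variables)
    first = [rng.expovariate(1.0) for _ in range(2)]
    removal = [f + sum(rng.expovariate(1.0) for _ in range(k - 1)) for f in first]
    # The root's first cut is always a 1-record; the child's first cut counts
    # iff the root has not yet been removed at time eta_{child,1}.
    records = 1 + (1 if first[1] < removal[0] else 0)
    # a_n(s) counts uncut vertices still connected to the root: the root
    # contributes until first[0], the child until min(first[1], removal[0]).
    integral = first[0] + min(first[1], removal[0])
    return records, integral

rng = random.Random(7)
n_runs = 100_000
tot_rec = tot_int = 0.0
for _ in range(n_runs):
    r, i = one_record_stats(2, rng)
    tot_rec += r
    tot_int += i
# Both averages approach 1 + P(Exp(1) < Gamma(2,1)) = 1.75.
```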
Proof. Conditional on $\mu_n(t)$, $a_n(t)$ is distributed as a Binomial$(n \mu_n(t), e^{-t})$ variable. Hence,
$$ \mathbb E\Big|\frac{1}{n}\, a_n(\delta_n t) - \mu_n(\delta_n t)\Big| = \mathbb E[\mu_n(\delta_n t)]\, \big(1 - e^{-\delta_n t}\big) \le \delta_n t \to 0, \quad \text{as } n \to \infty. $$
Proof. The first part of the proof is identical to that of Lemma 3 in [5]. We include it here for the sake of completeness. Let $p(t) = \mathbb P(\eta_v > t)$ be the probability that $v$ has not been removed by time $t$. We note that $v \in \mathsf T_n(t)$ if and only if $\eta_w > t$ for every vertex $w$ in the path from the root to $v$. Letting $\mathrm{ht}(v)$ be the number of vertices in that path, we can write
$$ n\, \mathbb E[\mu_n(t) \mid \mathsf T_n] = \sum_{v \in \mathsf T_n} p(t)^{\mathrm{ht}(v)} = \sum_{m\ge 1} Z_m(\mathsf T_n)\, p(t)^m, $$
where $Z_m(\mathsf T_n) = \#\{v \in \mathsf T_n : \mathrm{ht}(v) = m\}$. Now according to Theorem 1.13 in [12], there exists some constant $C \in (0, \infty)$, which only depends on the offspring distribution $\xi$, such that $\mathbb E[Z_m(\mathsf T_n)] \le C m$ for all $n$ and $m$. It follows that
$$ n\, \mathbb E[\mu_n(t)] \le C \sum_{m\ge 1} m\, p(t)^m = \frac{C\, p(t)}{(1 - p(t))^2}. $$
On the other hand, since $\eta_v$ has the same distribution as the sum of $k$ independent exponential variables of mean 1, we deduce the bound $p(t) \le k \exp(-t/k)$. For small values of $t$, we will use instead the bound $1 - p(t) \ge e^{-t}\, t^k/k!$. Let $t_0$ be such that $k \exp(-t_0/k) < 1$. Applying the previous bounds, we find that for $n$ large enough,
$$ \mathbb E\Big[\int_t^\infty \mu_n(\delta_n s)\, \mathrm ds\Big] \le \frac{C}{n} \int_t^\infty \frac{p(\delta_n s)}{(1 - p(\delta_n s))^2}\, \mathrm ds \le \frac{C}{n} \int_t^{t_0/\delta_n} \frac{\mathrm ds}{e^{-2\delta_n s}\, (\delta_n s)^{2k}/(k!)^2} + \frac{C}{n} $$
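The tail bound $p(t) \le k \exp(-t/k)$ is a union bound: if a sum of $k$ Exp(1) variables exceeds $t$, at least one summand exceeds $t/k$. Since $\eta_v \sim$ Gamma$(k, 1)$, $p(t)$ is explicit for integer $k$, and the bound can be verified on a grid (the grid and parameter values are ours):

```python
import math

def p_tail(k, t):
    """p(t) = P(eta_v > t) for eta_v a sum of k independent Exp(1) variables,
    i.e. the Gamma(k,1) upper tail: e^{-t} * sum_{j<k} t^j / j!."""
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(k))

# Check p(t) <= k * exp(-t/k) on a grid of t values for several k.
for k in (1, 2, 3, 5):
    for i in range(1, 101):
        t = 0.1 * i
        assert p_tail(k, t) <= k * math.exp(-t / k)
```

For $k = 1$ the two sides coincide; for $k \ge 2$ the inequality is strict for $t > 0$.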