Random recursive trees: A boundary theory approach

We show that an algorithmic construction of sequences of recursive trees leads to a direct proof of the convergence of random recursive trees in an associated Doob-Martin compactification; it also gives a representation of the limit in terms of the input sequence of the algorithm. We further show that this approach can be used to obtain strong limit theorems for various tree functionals, such as path length or the Wiener index.


INTRODUCTION
A tree with node set $[n] := \{1, \ldots, n\}$ is recursive if the node numbers along the unique path from 1 to $j$ increase for $j = 2, \ldots, n$. Trees with this property may be encoded by a sequence $(j_1, \ldots, j_{n-1})$, where $j_k \in [k]$ denotes the direct ancestor of $k + 1$ (the next node on the way to the 'root' 1). Such a sequence also gives a recipe for growing the corresponding tree: Starting with the unique recursive tree of size (number of nodes) 1, which consists of the root node 1 only, we obtain the respective next tree by joining node $k$ to node $j_{k-1}$, $k = 2, \ldots, n$. Choosing the ancestor of the next node uniformly at random among the nodes of the current tree we obtain a sequence $Y_1, Y_2, \ldots$ of random recursive trees, which we collect into a stochastic process $Y = (Y_n)_{n \in \mathbb{N}}$.
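The growth recipe can be illustrated by a minimal Python sketch. The function names `random_encoding`, `parents_from_encoding` and `path_to_root` are hypothetical helpers introduced here for illustration only; the sketch assumes nothing beyond the encoding described above.

```python
import random

def random_encoding(n, rng):
    """Return (j_1, ..., j_{n-1}) with j_k chosen uniformly from {1, ..., k}."""
    return [rng.randint(1, k) for k in range(1, n)]

def parents_from_encoding(enc):
    """Map each node k + 1 to its direct ancestor j_k (nodes are numbered from 1)."""
    return {k + 1: j for k, j in enumerate(enc, start=1)}

def path_to_root(node, parent):
    """Node numbers on the unique path from the given node to the root 1."""
    path = [node]
    while path[-1] != 1:
        path.append(parent[path[-1]])
    return path
```

For any encoding produced this way, the node numbers along `path_to_root(j, parent)` decrease towards the root, which is the defining property of a recursive tree read in the opposite direction.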
A survey of random recursive trees and their applications is given in [SM94]; for a more recent reference see [Drm09, Chapter 6]. Various functionals of these structures have been considered by different authors, a representative but not exhaustive list being node degrees [Szy90b, GS02, Jan05], height [Pit94], path length [Mah91, DF99], profiles [FHN06], spectra [BES12], and various 'topological' indices, such as the Wiener and the Zagreb indices [Nei02, FH11]. Often the results are limit theorems, with (strong) convergence of the random variables or convergence of their distributions as n → ∞. This, in the authors' view, naturally raises the question of convergence of the trees themselves, with the aim of developing a systematic approach to the strong asymptotics of tree functionals. The Doob-Martin compactification, initiated by the fundamental paper [Doo59], is a general tool that can be used in this context; see [Woe09] for a recent textbook introduction. In particular, using concepts from discrete potential theory it provides an enlargement of the state space of a Markov chain such that the variables converge almost surely. This approach has been used in [EGW12] to obtain convergence results for a class of randomly growing discrete structures that includes various random trees.
If we use the encoding explained in the opening paragraph then it is possible to retrace the full sequence $Y_1, \ldots, Y_{n-1}$ of previous trees from the current tree $Y_n$. In such a case the discrete potential theory approach leads to convergence in the sense of projective (or inverse) limits, which is of little help for proving convergence of functionals. Noting that the functionals of interest are often invariant under relabelling (a phrase that has to be made precise) we therefore choose a model that is coarser in the sense that it 'forgets the labels' but retains the Markov property. This partial loss of information turns the sequence Y into a sequence $X = (X_n)_{n \in \mathbb{N}}$ of randomly growing subsets of a fixed infinite tree. For this chain, the Doob-Martin compactification has been determined in [EGW12]. The first of our aims here is to show that the convergence result provided by the general theory can be obtained more directly by using a suitable algorithmic construction, and that this approach has the advantage of leading to a description of the limit $X_\infty$ in terms of the input sequence of the algorithm. The representation serves as the basis for the analysis of tree functionals such as different notions of path length and the Wiener index; indeed, our second objective is to obtain strong limit theorems for such functionals. A similar strategy has been used in [Grü14] for binary search trees.
In the next section we first take care of a variety of formal details, including some terminology and notation, and then give a new 'constructive' proof of the basic limit result. In Section 3 we discuss various tree functionals and comment on the connections to related work.

THE LIMIT TREE AND ITS DISTRIBUTION
We introduce Harris trees and the Harris chain generated by the RRT process; in view of its confounding potential we spell out the details of the transition from recursive to Harris trees. From the RRT sequence Harris chains inherit a useful decomposition property. Next, we recall from [EGW12] the Doob-Martin compactification of the Harris chain. We then describe an algorithm and use it to give a new proof of that part of [EGW12, Theorem 6.1] that is relevant for our present purposes, together with a representation of the limit. Finally, we collect some auxiliary results on the distribution of the limit that will be useful in the next section when we analyze tree functionals.
2.1. From recursive trees to Harris trees. We regard the set $V = \mathbb{N}^{\star}$ of finite sequences of natural numbers as the set of potential tree nodes and write $u + v = (u_1, \ldots, u_k, v_1, \ldots, v_l)$ for the concatenation of the nodes $u = (u_1, \ldots, u_k)$ and $v = (v_1, \ldots, v_l)$, abbreviating $u + (i)$ to $ui$, $i \in \mathbb{N}$. By a Harris tree we mean a finite subset x of V with the properties
(H1) if $u + v \in x$, then $u \in x$,
(H2) if $ui \in x$ with $i > 1$, then $uj \in x$ for $j = 1, \ldots, i - 1$.
Condition (H1) is prefix stability if we regard nodes as words with letters from the alphabet $\mathbb{N}$. In a family tree interpretation, condition (H2) means that a non-root node must either be the first child of its ancestor node or that it must have earlier-born siblings. Harris trees are also known as Ulam-Harris trees; they may be seen as rooted planar trees with a specific labelling of nodes.
We write H for the set of Harris trees and H n for the subset of those trees that have n nodes.In order to relate Harris trees to recursive trees we map the nodes j of a recursive tree to words u( j) = (u 1 ( j), . . ., u k ( j)) ∈ V as follows: The length k of the word is the distance to the root of (the node labelled) j, and u k ( j) is the number of nodes i ∈ [ j] that have the same direct ancestor as j.The prefix sequences similarly encode the nodes from the root to j.This corresponds to an embedding of recursive trees into the plane where new nodes are placed to the right of their siblings.
Clearly, there are $(n - 1)!$ possibilities for the encoding sequences for recursive trees with n nodes, hence this is also the number of recursive trees with n nodes. Figure 1 shows the five elements of $H_4$. Of the $(4 - 1)! = 6$ recursive trees with four nodes, encoded by (1, 2, 3), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 1, 3) and (1, 1, 1) respectively, the second and third are mapped to the same Harris tree. The figure also offers an opportunity to comment on the informal expression of 'forgetting the labels' that we used above and that often appears in the literature: It is tempting to regard this as passing from graphs to isomorphism classes, but this is not what is happening here; indeed, the second and the fourth Harris tree in Figure 1 are isomorphic as rooted trees. A compatible notion of equivalence and isomorphism in the present situation can be obtained on the basis of the above planar embedding of recursive trees.
Figure 1. The Harris trees with four nodes.
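The passage from encoding sequences to Harris trees can be traced with a short Python sketch. The names `harris_tree` and `encodings` are hypothetical and chosen only for this illustration; nodes are represented as tuples, with the empty tuple standing for the root.

```python
from itertools import product

def harris_tree(enc):
    """Map the encoding (j_1, ..., j_{n-1}) of a recursive tree to its Harris tree.

    Each node is returned as a tuple of positive integers; the root is ().
    """
    word = {1: ()}         # node number -> Harris word
    n_children = {1: 0}    # node number -> number of children so far
    for k, j in enumerate(enc, start=1):
        n_children[j] += 1                       # node k + 1 is the next child of j
        word[k + 1] = word[j] + (n_children[j],)
        n_children[k + 1] = 0
    return frozenset(word.values())

def encodings(n):
    """All (n - 1)! encoding sequences with j_k in {1, ..., k}."""
    return list(product(*(range(1, k + 1) for k in range(1, n))))
```

Applied to the six encodings for four nodes, this yields five distinct Harris trees, with (1, 1, 2) and (1, 2, 1) both mapped to the tree with nodes (), (1), (2) and (1, 1).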
Writing Ψ for the function that maps recursive trees to Harris trees, we define $X = (X_n)_{n \in \mathbb{N}}$ by $X_n := \Psi(Y_n)$ for all $n \in \mathbb{N}$, where Y is the RRT chain introduced in Section 1. In the original process, $Y_n$ is uniformly distributed on its range, but $X_n$ is not uniformly distributed on $H_n$, as explained above for n = 4. In this new process, it is no longer possible to 'trace back' to previous values. As Ψ does not change the number of nodes, the process X is adapted to the combinatorial family H in the sense that $P(X_n \in H_n) = 1$ for all $n \in \mathbb{N}$. To see that it retains the Markov property and to obtain the corresponding transition probabilities we argue as follows: Let $y_n$, $y'_n$ be recursive trees with n nodes and let $y_{n+1}$ be a recursive tree with n + 1 nodes. Suppose that $\Psi(y_n) = \Psi(y'_n) =: x_n$ and let $x_{n+1} := \Psi(y_{n+1})$. If $x_n \subset x_{n+1}$ then there is a unique recursive tree $z_{n+1}$ extending $y_n$ such that $\Psi(z_{n+1}) = x_{n+1}$ and
$$P(X_{n+1} = x_{n+1} \mid Y_n = y_n) = P(Y_{n+1} = z_{n+1} \mid Y_n = y_n) = \frac{1}{n},$$
and similarly there is a $z'_{n+1}$ with the same property for $y'_n$. Clearly, if $x_n \not\subset x_{n+1}$, then these probabilities will be 0. This shows that
$$P(X_{n+1} = x_{n+1} \mid Y_n = y_n) = P(X_{n+1} = x_{n+1} \mid Y_n = y'_n)$$
whenever $\Psi(y_n) = \Psi(y'_n)$, and further that
$$P(X_{n+1} = x_{n+1} \mid X_n = x_n) = \begin{cases} 1/n, & \text{if } x_n \subset x_{n+1}, \\ 0, & \text{otherwise.} \end{cases}$$
By [LPW09, Lemma 2.5] the first of these implies that X is a Markov chain; the second shows that, as with Y, we select the ancestor for the new node uniformly at random in the step from $X_n$ to $X_{n+1}$.

2.2. A tree decomposition. We associate with a node $u = (u_1, \ldots, u_k) \in V$ its 'flat' and 'raised' version and lift this to trees $x \in H$; the resulting trees $x^\flat$ and $x^\sharp$ are the subtree of x rooted at (1) and the shifted tree that remains if this subtree is taken out. It is well known that $K_n := \#X_n^\flat$ is uniformly distributed on $[n - 1]$ and that, conditionally on $K_n = k$, the random trees $X_n^\flat$ and $X_n^\sharp$ are independent and have the same distribution as $X_k$ and $X_{n-k}$ respectively. An interesting combinatorial proof of the corresponding statement for the Y process, based on a bijection between permutations and random recursive trees, is given in [DF99]. An alternative proof can be obtained by using the algorithmic background to be given in Section 2.4 below.
2.3. The Doob-Martin compactification of the Harris chain. The paths of the stochastic process X are sequences of growing subsets of the set V of all potential nodes. We may regard V itself as the infinite Harris tree (note that this tree is not locally finite). It can be shown that, in the X-sequence, every potential node will eventually be an element of the tree. Hence, if we embed H into $\{0, 1\}^V$ via the node indicators, then $X_n$ converges almost surely to this infinite tree, which is represented by the function that is constant 1. This, however, does not capture the 'true' asymptotics of X. In contrast, Markov chain boundary theory provides a state space completion (compactification) $\bar{H}$ of H with the properties
(L) $X_n \to X_\infty \in \partial H := \bar{H} \setminus H$ with probability 1 as n → ∞,
(T) $X_\infty$ generates the tail σ-field associated with X, up to null sets.
For (T), we require that the chain has the space-time property, meaning that the time parameter n is a function of the state x. For the Harris sequence this is the case, so (T) implies that the Doob-Martin compactification captures the persisting randomness of the sequence, whereas for any one-point compactification the σ-field generated by the limit will always be trivial in the sense that only 0 and 1 arise as probabilities of its elements. The Doob-Martin compactification $\bar{H}$ of H with respect to the Harris chain has been identified in [EGW12]. Let In words: V consists of all finite and infinite sequences of natural numbers, plus all infinite sequences $u = (u_i)_{i \in \mathbb{N}} \subset \mathbb{N} \cup \{\infty\}$ with the property that, for some $k \in \mathbb{N}$, Let V be the σ-field on V generated by the sets $A_u$, u ∈ V, let $\bar{H}$ be the set of probability measures µ on (V, V ), and endow $\bar{H}$ with the coarsest topology that makes the functions $\mu \mapsto \mu(A_u)$, u ∈ V, continuous. Finally, embed H into $\bar{H}$ by identifying x ∈ H with the uniform distribution on x as a subset of V. Then $\bar{H}$ is the Doob-Martin compactification of H induced by the chain X, up to homeomorphism.
2.4. The algorithmic construction. For x ∈ H and u ∈ V let $x(u) := \{v \in V : u + v \in x\}$ be the subtree of x rooted at u. Then the embedding of H into $\bar{H}$ may be written as
$$H \ni x \mapsto \mu_x \quad\text{with}\quad \mu_x(A_u) = \frac{\#x(u)}{\#x} \quad\text{for all } u \in V,$$
and we can restate the convergence in the Doob-Martin topology of a sequence $(x_n)_{n \in \mathbb{N}}$ of Harris trees to $\mu \in \bar{H}$ as
$$\lim_{n \to \infty} \frac{\#x_n(u)}{\#x_n} = \mu(A_u) \quad\text{for all } u \in V.$$
Our plan is to prove the almost sure convergence of the Harris chain X in this topology by using an algorithm that generates X if the input is chosen appropriately. The recursive tree algorithm maps an input sequence $t = (t_n)_{n \in \mathbb{N}}$ of pairwise distinct positive real numbers to an output sequence $(x_n, \phi_n)_{n \in \mathbb{N}}$ of labelled trees, with $x_n \in H_n$ and $\phi_n : x_n \to \mathbb{R}$.
The algorithm works sequentially, starting with $x_1 = \{\emptyset\}$ and the label $\phi_1(\emptyset) = t_0 := 0$ for the root node. As explained in Section 1 and at the end of Section 2.1, we need to specify the direct ancestor of the new node v to be added in the step from $x_n$ to $x_{n+1}$: We attach v as the next (resp. the first) child to the node with label $\max\{t_j : j = 0, \ldots, n - 1,\ t_j < t_n\}$ and then label v by $t_n$. Figure 2 shows an example where new children are positioned to the right of their older siblings. By RT(t) we mean the sequence $(x_n)_{n \in \mathbb{N}}$, i.e. we ignore the labels.
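The following Python sketch follows the verbal description of the algorithm. The function name `recursive_tree_algorithm` is hypothetical, the input is assumed to be a finite list of pairwise distinct positive numbers, and the linear search for the ancestor is chosen for readability rather than speed.

```python
def recursive_tree_algorithm(t):
    """Return the Harris trees x_1, x_2, ... generated from the input sequence t.

    t holds pairwise distinct positive reals t_1, t_2, ...; the root () carries
    the label t_0 = 0. Labels are used internally and dropped from the output.
    """
    label_to_node = {0.0: ()}   # label -> Harris word of the node carrying it
    n_children = {(): 0}        # Harris word -> number of children so far
    trees = [frozenset(n_children)]
    for t_n in t:
        # The direct ancestor is the node with the largest label below t_n.
        parent_label = max(s for s in label_to_node if s < t_n)
        parent = label_to_node[parent_label]
        n_children[parent] += 1
        child = parent + (n_children[parent],)   # placed to the right of its siblings
        label_to_node[t_n] = child
        n_children[child] = 0
        trees.append(frozenset(n_children))
    return trees
```

With the illustrative input `[0.5, 0.2, 0.7, 0.3]` (not the example of Figure 2), the values 0.5 and 0.2 become the first and second child of the root, 0.7 is attached below the node labelled 0.5, and 0.3 below the node labelled 0.2.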
Clearly, if the trees converge, then the limit must be a function of the input sequence.In order to be able to specify this relationship we need some more notation: Given an increasing sequence (x n ) n∈N of Harris trees, let be the (augmented) increasing order statistics associated with the first n values t 1 , . . .,t n , and let κ(u) := #{1 ≤ i ≤ τ(u) : t i ≤ t τ(u) } be the rank of t τ(u) in t 1 , . . .,t τ(u) , so that t (τ(u):κ(u)) = t τ(u) .In Figure 2 for example, the node u = (2, 2) has τ(u) = 4, κ(u) = 2 and t τ(u) = 0.22.
The following result relates the algorithm and the limit object. Let unif(0, 1) be the uniform distribution on the unit interval. We write $\mathcal{L}(Y)$ for the distribution (law) of the random quantity Y and sometimes use $Y \sim \mu$ instead of $\mathcal{L}(Y) = \mu$.
Theorem 1. Let $\eta_i$, $i \in \mathbb{N}$, be independent random variables, with $\eta_i \sim \mathrm{unif}(0, 1)$ for all $i \in \mathbb{N}$.
(a) The algorithm RT generates the RRT chain in the sense that RT(η) and X are identical in distribution.
(b) Suppose that X = RT(η). Then $X_n$ converges almost surely to $X_\infty$ in the Doob-Martin topology as n → ∞, where on a set of probability 1 the limit $X_\infty$ is given by
Proof. Part (a) belongs to the folklore of the subject. Due to its importance for the present paper we recall the argument: the rank of $\eta_n$ in $\eta_1, \ldots, \eta_n$ is uniformly distributed on $\{1, \ldots, n\}$, and rank i means that the node labelled $\eta_n$ is attached to the node with label $\eta_{(n:i-1)}$ (which is the root if i = 1).
With each node u we associate the interval From the definition of the RT algorithm, nodes added to the tree at a time n > τ(u) will have prefix u if and only if η n ∈ I(u).The random variables η τ(u)+n , n ∈ N, are independent and uniformly distributed on the unit interval, hence (1) follows with the Glivenko-Cantelli theorem.
Theorem 1 can be related to the corresponding result [Grü14, Theorem 1] for binary search trees via the natural or rotation correspondence between Harris trees and binary trees [Knu97, Section 2.3.2][FS09, p.73]; details are given in [Mic14].
In addition to the convergence of the trees we also obtain the distribution of the limit $X_\infty$, which takes its values in the set of probability measures µ on ( V, V ). As a preliminary step we extend the tree decomposition introduced in Section 2.2 to $\bar{H}$ as follows: for all $u \in V \setminus \{\emptyset\}$, and µ
Proposition 2. Let $X_\infty$ be as in Theorem 1. Then the random variables $\eta := X_\infty(A_{(1)})$, $X_\infty^\flat$ and $X_\infty^\sharp$ are independent. Further, $\eta \sim \mathrm{unif}(0, 1)$, and $X_\infty^\flat$ and $X_\infty^\sharp$ have the same distribution as $X_\infty$.
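Let $\Sigma_\infty \subset [0, 1]^\infty$ be the infinite-dimensional probability simplex, that is, the set of all sequences $(\rho_i)_{i \in \mathbb{N}}$ with $\rho_i \ge 0$ for all $i \in \mathbb{N}$ and $\sum_{i=1}^{\infty} \rho_i = 1$. An atom-free and diffuse µ associates with each u ∈ V an element $\rho(\mu, u) = (\rho_i(\mu, u))_{i \in \mathbb{N}}$ of $\Sigma_\infty$ via $\rho_i(\mu, u) := \mu(A_{ui}) / \mu(A_u)$. For later use we note that, for such µ, Clearly, µ can be reconstructed from ρ(µ, u), u ∈ V. In fact, for $u = (u_1, \ldots, u_k)$,
$$\mu(A_u) = \prod_{i=1}^{k} \rho_{u_i}\bigl(\mu, (u_1, \ldots, u_{i-1})\bigr). \tag{2}$$
Of course, for random input both the τ- and the κ-values will be random, as will be µ.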
We say that a random variable ξ = (ξ i ) i∈N with values in Σ ∞ has the (standard) GEM (Griffiths-Engen-McCloskey) distribution if its components can be written as At each level k, the sets A u with |u| = k provide a partition of V \ N k−1 .The corresponding values X ∞ (A u ) are related to the kth nested decomposition of the unit interval into descending records of the input sequence.This interpretation suggests the following result, which gives a description of the distribution of X ∞ .
Theorem 3. Let X ∞ be as in Theorem 1. Then the random variables ρ(X ∞ , u), u ∈ V, are independent and GEM distributed.
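To get a feeling for the GEM-distributed splits in Theorem 3, one may simulate a finite prefix of such a sequence. The sketch below assumes the usual stick-breaking convention $\xi_1 = \zeta_1$, $\xi_i = \zeta_i \prod_{j<i} (1 - \zeta_j)$ with independent unif(0, 1) variables $\zeta_i$; this convention is an assumption made for illustration and may differ from the one in the text by the substitution $\zeta \mapsto 1 - \zeta$, which does not change the distribution. The function name `gem_prefix` is hypothetical.

```python
import random

def gem_prefix(m, rng):
    """Return the first m stick-breaking components and the unbroken remainder.

    Assumed convention: xi_i = zeta_i * prod_{j < i} (1 - zeta_j) with
    independent unif(0, 1) variables zeta_i.
    """
    remaining = 1.0
    xi = []
    for _ in range(m):
        zeta = rng.random()
        xi.append(zeta * remaining)   # break off a uniform fraction of the rest
        remaining *= 1.0 - zeta
    return xi, remaining
```

The components returned together with the remainder always sum to 1, and the remainder decays geometrically fast in m, so a moderate prefix already carries almost all of the mass.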
), X ∞ and X ∞ are independent. Repeating the decomposition with the respective raised part, we obtain that the variables with ρ o (X ∞ , ∅) := 1, are independent and unif(0, 1)-distributed, that these are independent of the random probability measures X ∞,i , i ∈ N, defined by and that these measures are independent and identical in distribution to X ∞ . (It is easy to see that X ∞,i arises as the ♭-part of the ith iteration of the decomposition.) In particular, ρ(X ∞ , ∅) ∼ GEM. Taken together, this proves the case k = 0 of the following statement: We can apply the same reasoning used for k = 0 separately to each of the nodes at level k + 1 to obtain the induction step from k to k + 1.
This shows that the above compound statement holds for all k ∈ N; clearly, (i) and (ii) imply the assertion of the theorem.
In view of the fact that X ∞ (A u ) is a function of the variables ρ(X ∞ , v) with |v| < |u| we obtain that X ∞ (A u ) and ρ(X ∞ , u) are independent, for all u ∈ V.

Conditional distributions.
In order to be able to use the general limit theorem for the analysis of tree functionals in the next section we need the conditional distribution of X ∞ given X n .For this we rely on the results in [EGW12, Section 6]; we also need some more notation.
The distribution Beta(α, β ) with parameters α, β > 0 is given by its density For later use we recall that (5) and, clearly, Beta(1, 1) = unif(0, 1).For a = (a 1 , . . ., a k ) ∈ N we write GEM(a) for the distribution of the Σ ∞ -valued random sequence ξ = (ξ i ) i∈N given by ( 6) where ζ i , i ∈ N, are independent and ( 7) Interestingly, the marginals of such random sequences are again beta distributed (of course, they are no longer independent).
Theorem 5.The conditional distribution of ρ(X ∞ , u) given X n is GEM(a), where a = (a 1 , . . ., a k ) with Further, the random sequences ρ(X ∞ , u), u ∈ V, are conditionally independent given X n .
Proof.By the general theory of Markov chain boundaries the conditional distribution Q 2 of X ∞ given X n = x ∈ H n has density K(x, •) with respect to the (unconditional) distribution Q 1 of X ∞ , where K denotes the extended Martin kernel.Note that Q 1 and Q 2 are probability measures on the set H of probability measures µ on ( V, V ), where the σ -field on H is the one generated by the evaluation maps µ → µ(A u ), u ∈ V.The extended Martin kernel has been determined in [EGW12]: It can be written as the product of 'local extended kernels', where u(x) = (a 1 , . . ., a k ) ∈ N is given by (8) with x instead of X n , and (10) x with respect to its corresponding unconditional counterpart Q 1,u , which we know to be the GEM distribution.The product form (9) implies that the independence of the sequences ρ(X ∞ , u), u ∈ V, remains intact in the transition from This proves the second part of the theorem.Now let T : Σ ∞ → [0, 1] ∞ be given by This is the inverse of the transition from ζ to ξ in (6).We know that the push-forward Q T 1,u of Q 1,u under T is the infinite product of uniforms.The first part of the theorem refers to the push-forward is given by for almost all t = (t i ) i∈N ∈ [0, 1] ∞ , with f as in (4).With all this notation in place it remains to check that g • T = K u (x, •), with K u as in (10).This, however, is a bookkeeping task.
The embedding of H into H, which maps X n to the uniform distribution on its nodes, leads to an interpretation of X n as a real-valued random function on V via u → #X n (u)/n.Similarly, the limit X ∞ can be seen as the random function u → X ∞ (A u ) on V. Obviously, all these functions are bounded and, if we endow V with the discrete topology, they are continuous.This displays X n , n ∈ N, and X ∞ as random elements of an infinite-dimensional separable Banach space.
due to the Markov property.Further, the notion of infinite-dimensional martingale, see e.g.[Nev75, Section-V.2], in the present context means that we have to check that Let u = (u 1 , . . ., u k ) ∈ V be given and let From (2) we obtain X ∞ (A u ) = ∏ k i=1 ξ i , and by Theorem 5 the factors are conditionally independent given X n .Hence, using Lemma 4 and (5), This result may be seen as a consequence of the general Doob-Martin construction.

TREE FUNCTIONALS
Let $Y = (Y_n)_{n \in \mathbb{N}}$ be the RRT chain and let $X = (X_n)_{n \in \mathbb{N}}$, with $X_n = \Psi(Y_n)$ for all $n \in \mathbb{N}$, be the associated Harris chain. In this section we consider functionals of the recursive trees that are invariant under Ψ and hence can be written as functions $V_n = \Phi(X_n)$ of the X-variables. A typical example is the total path length, which is the sum of the depths of all nodes in the tree. The methods discussed below can be applied to fairly general functions Φ, but here we will restrict ourselves to the real-valued case.
There are two main probabilistic methods to obtain distributional or even strong limit results for suitably standardized versions of the V-variables. In the first of these, we try to find a suitable martingale and then apply a martingale limit theorem. In the second, we use the internal structure of the X-variables to find a recursion for the V-variables and then apply Banach's fixed point theorem with a suitably chosen metric space of probability measures. The prototypical example is the number of comparisons needed by the Quicksort algorithm, which can be related to the total path length of binary search trees: The martingale approach is carried out in [Rég89], whereas [Rös91] employed the second approach, which since then has come to be known as the contraction method. The two methods may fruitfully be combined, as exemplified by [DF99] in connection with the total path length of random recursive trees.
On its own the martingale method does not say anything about the limit, and the contraction method may miss the fact that the random variables themselves converge. The method suggested in the present paper and in [Grü14] needs some additional investment in connection with proving the convergence of the discrete structures themselves but then provides a unifying approach: In view of the fact that $X_\infty$ generates the tail σ-field associated with the Harris chain, see property (T) in Section 2.3, any almost sure limit $Y_\infty$ must be a functional $Y_\infty = \Psi(X_\infty)$ of $X_\infty$, up to null sets. Projecting $Y_\infty$ on the natural filtration we obtain a convergent martingale, which often turns out to be a simple transformation of the variables $V_n$. Below we carry this out for two versions of the total path length and for the Wiener index.
3.1. Total path length. This is simply the sum of all node depths and can be written in terms of subtree sizes as
$$\mathrm{TPL}(x) := \sum_{u \in x} |u| = \sum_{u \in x,\, u \neq \emptyset} \#x(u).$$
Here we have written |u| for the length (or depth) k of $u = (u_1, \ldots, u_k) \in V$. We need the auxiliary function The harmonic numbers will appear repeatedly; we will write H(n) instead of $H_n$ whenever this is typographically more convenient. We collect some auxiliary statements.
In particular, EC(ξ Proof.For the proof of the first part we assume that a = / 0 and use the representation of ξ by a sequence (ζ i ) i∈N of independent random variables with distribution unif(0, 1), see (6).Then, for each i ∈ N, p , where we have used independence and this shows that ξ i log ξ i p decreases at an exponential rate as i → ∞.The generalization to an arbitrary a ∈ N is straightforward.
For the proof of (b) we first note that, for ζ ∼ Beta(i, j) with i, j ∈ N, Suppose now that ξ ∼ GEM(a) with a = (a 1 , . . ., a k ) ∈ N and let b := ∑ k j=1 a j .We have ξ i ∼ Beta(a i , 1 + b − a i ) for j = 1, . . ., k by Lemma 4, hence Using the second part of Lemma 4 we see that for i > k we may write ξ i = α i β i with α i and β i independent, α i ∼ Beta(1, b) and β i the product of i − k independent unif(0, 1)distributed random variables.This gives, using (12) again, Putting pieces together we finally arrive at (11).
We may then connect the root / 0 =: ) are independent and unif(0, 1)-distributed, by (3) for a step to the right and by Theorem 3 for a downstep.
The transition u → ū in the proof corresponds to the transition to the direct ancestor (next node on the path to the root) in the infinite binary tree {0, 1} associated with V by the natural correspondence mentioned after the proof of Theorem 1.
Proof.Let p > 1.We introduce the local abbreviations with some constant that depends on p only.Unconditioning and ( 14) lead to upper bounds for both sums that decrease at an exponential rate κ k for some κ < 1/2.This offsets the cardinality 2 k of V[k], and we conclude that be the limit in Lemma 9.
almost surely and in L p for every p > 0.
Proof.We project the prospective limit on the natural filtration introduced in Corollary 6: Using the remark after Theorem 5 and Lemma 7 we obtain where a telescope effect simplified the sums.The statement of the theorem now follows with the well-known martingale convergence theorems; see e.g.[Nev75, Theorem IV-1-2, Proposition IV-2-7].
It is easy to see that EY ∞ = 0, hence it follows from the calculations in the proof that the mean of the total path length is given by ETPL(X n ) = nH n − n for all n ∈ N.
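The formula for the mean can be checked exactly for small n by enumerating all encoding sequences, which are equally likely under the RRT model. The sketch below uses exact rational arithmetic; the helper names `total_path_length`, `mean_total_path_length` and `harmonic` are hypothetical. Since the total path length is invariant under Ψ, it is computed directly from the recursive tree.

```python
from fractions import Fraction
from itertools import product

def total_path_length(enc):
    """Sum of node depths of the recursive tree encoded by (j_1, ..., j_{n-1})."""
    depth = {1: 0}
    for k, j in enumerate(enc, start=1):
        depth[k + 1] = depth[j] + 1
    return sum(depth.values())

def mean_total_path_length(n):
    """Exact mean over all (n - 1)! equally likely recursive trees with n nodes."""
    encs = list(product(*(range(1, k + 1) for k in range(1, n))))
    return Fraction(sum(total_path_length(e) for e in encs), len(encs))

def harmonic(n):
    """The n-th harmonic number as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))
```

For n = 4 the six trees have total path lengths 6, 4, 4, 5, 4 and 3, so the mean is 26/6 = 13/3, in agreement with $4 H_4 - 4$.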
The formula for the mean and the almost sure and $L^p$-convergence, p > 0, of the standardized total path length of random recursive trees have already been obtained in [Mah91] and [DF99] respectively; we augment this by the representation of the limit variable in terms of the Doob-Martin limit $X_\infty$. The technical difficulty in the proof of almost sure and $L^2$-convergence in [Mah91], as in its analogue for search trees in [Rég89], consists of showing that the respective martingales (which have to be found first) are bounded in $L^2$. Here we obtain the martingale as a projection of a variable with finite second (or pth) moment onto the natural filtration of the Harris chain, which implies the desired boundedness by Jensen's inequality for conditional expectations.
3.2. Horizontal total path length. We may regard the depth |u| of a node u as its vertical position; it is the number of downward moves (if this is the direction of tree growth, from ancestor to child in familial terms) on the way from the root to u. The (vertical) total path length of a tree, considered in Section 3.1, is the sum of these positions, taken over all nodes in the tree. By the horizontal position of u we mean the number of moves to the right (if this is where new nodes are added to an existing family) on the way from the root to u. In the Harris encoding of nodes the horizontal position of the node $u = (u_1, \ldots, u_k)$ is given by $|u|_1 - |u|$, where $|u|_1 := u_1 + \cdots + u_k$, and the horizontal total path length of a tree is the sum of these positions over all nodes of the tree,
$$\mathrm{HPL}(x) := \sum_{u \in x} \bigl(|u|_1 - |u|\bigr).$$
The horizontal position of a node can be seen as a recursive tree analogue of the notion of vertical position in a binary tree; see [Drm09, Chapter 5] for the latter. The total horizontal path length does not seem to have been considered before, but a close relative is the total path degree length investigated in [Szy90a].
We proceed as in the previous section, now using the auxiliary function Using similar arguments as in the proof of Lemma 9 we obtain that the series (16) almost surely and in L p for every p > 0.
Proof.We project Z ∞ on the natural filtration.Using (17) we get Now we proceed as in the proof of Theorem 10.
As in the vertical case, see the remark after the proof of Theorem 10, we may use the calculations in the proof to obtain an explicit formula for the mean horizontal path length, ( 18) so that we may rewrite the Wiener index for x ∈ H n in terms of total path length and subtree sizes as Again, we will show that a suitably standardized version converges almost surely if we insert for x the random variables X n of the Harris chain.In addition to Y ∞ as in (15) we need Arguments similar to those used for Y ∞ in the proof of Lemma 7 show that this series converges almost surely and that the limit has moments of all orders.
, almost surely and in L p for every p > 0.
Proof.As in the proof of the corresponding results for the other tree functionals, we project the right hand side of the formula on the natural filtration.For Y ∞ this has been done in Section 3.1.For W ∞ , we proceed as follows: For u = (u 1 , . . ., u k ) and i = 1, . . ., k let ξ i := ρ u i X ∞ , (u 1 , . . ., u i−1 ) .Then, as in the proof of Corollary 6, X ∞ (A u ) 2 = ∏ k i=1 ξ 2 i , so that, using (5) and the conditional independence from Theorem 5, whenever u ∈ X n .
In order to deal with the nodes not in X n we use the operation v → v =: φ (v) introduced in the proof of Lemma 8. Let be the set of external nodes of X n and put where φ 0 (u) Conditionally on #X n ( ṽ1) = a 1 , . . ., #X n ( ṽ j) = a j , j := v k − 1, the distribution of ξ k is equal to the distribution of Y Z, with Y, Z independent and In view of ∑ k−1 j=1 a j = #X n ( ṽ) we thus obtain and hence, for For the contribution of the nodes not in X n to the conditional expectation of W ∞ this gives Putting pieces together we arrive at and we can now proceed as in the proof of Theorem 10.
Again, we can use the proof to obtain expected values, This agrees with Neininger's result [Nei02, Theorem 1.2].
3.4.Distributional considerations.Let X ∞,i , i ∈ N, be as in the proof of Theorem 3.For the total path length the representation Y with Y ∞,i := Φ(X ∞,i ).Note that this is an equality for random variables (strictly speaking, it refers to the underlying probability measure as we may have to discard a null set for X ∞ to be atom-free and diffuse).In terms of distributions this may be rewritten as ∞ , . . .independent, and ρ ∼ GEM, Y On the other hand, it is known [DF99] that the limiting total path length also satisfies What is the connection between the two equations?Suppose that ξ ∼ GEM and let ζ = (ζ i ) i∈N be related to ξ as in (3).Consider the shifted sequence ζ = ( ζi ) i∈N with ζi = ζ i+1 for all i ∈ N. Clearly, ζ is again a sequence of independent, unif(0, 1)-distributed random variables, and it is independent of ζ 1 .This implies that the corresponding ξ is GEM distributed, and we have

Using (20) we now get, with ζ
Together with (21) this leads to the distributional equation (22). It is instructive to compare this with a proof of (22) that is based on the 'musical decomposition' in Section 2.2. The limit version of the decomposition given in Proposition 2 transforms $X_\infty$ into independent components $\eta = X_\infty(A_{(1)})$, $X_\infty^\flat$ and $X_\infty^\sharp$ with the properties that and with $\rho(X_\infty^\sharp, \emptyset) = \tilde{\xi}$, where $\tilde{\xi}$ is constructed from $\xi = \rho(X_\infty, \emptyset)$ as explained above. With this construction,
Figure 3. Joint distribution of the vertical and horizontal total path length for the trees in $H_7$.
Once again, we note that the decomposition takes place on the level of the random quantities themselves; there is no '= d '-sign.As in the transition from Section 3.1 to Section 3.2 the detailed consideration of the vertical case now makes it easy to treat the horizontal path length.With Ψ(X ∞ ) = Y ∞ + Z ∞ the limit in Theorem 11 we just replace C by C + D to obtain the decomposition Clearly, G(η) and G(η) are equal in distribution, which implies that the limit distributions arising in the vertical and horizontal case satisfy the same fixed point equation.It is straightforward to set up a metric space of probability distributions which contains these limit distributions and that turns the right hand side of (22) into a contraction, hence the limit distributions arising for the vertical and horizontal path length of random recursive trees are identical.
The above argument depends on the limit version of the decomposition. With some additional work the finite version in Section 2.2 can be used directly to obtain the convergence in distribution of the standardized path length; see [Rös91] for the Quicksort situation. As pointed out at the beginning of this section, the contraction method may miss the fact that the random variables themselves converge. On the other hand, as the above path length example shows, the approach via a fixed point relation for the limit distribution may lead to the direct recognition of the equality of two limit distributions, which may not be apparent from the representation of the respective limit random variables in terms of the limit tree (indeed, the representations $Y_\infty$ and $Y_\infty + Z_\infty$, given in Theorems 10 and 11 respectively, seem to suggest that the limit distributions are different).
Equality of the limit distributions naturally raises the question whether there is a relation between the respective distributions for finite trees. Figure 3 shows the pair (i, j) of values i for the vertical and j for the horizontal total path length for all 6! = 720 recursive trees with 7 nodes, where the sizes of the black dots correspond to the multiplicities of the pairs and the blue dots represent pairs that do not appear. The picture suggests that, up to a shift that is apparent from (18), the joint distribution of total vertical and total horizontal path length is symmetric. Clearly, this would imply that the limit distributions are the same.
We now define $T : V \to V$ by $T(\emptyset) = \emptyset$, $T((1)) = (1)$ and, if $u = (u_1, \ldots, u_k)$ and $T(u) = v$ with $v = (v_1, \ldots, v_j)$, by
$$T(u_1, \ldots, u_k, 1) := (v_1, \ldots, v_{j-1}, v_j + 1), \qquad T(u_1, \ldots, u_{k-1}, u_k + 1) := (v_1, \ldots, v_j, 1). \tag{23}$$
It is easy to see that T is bijective; in fact, $T^{-1} = T$ (T can be related to the natural correspondence mentioned after the proof of Theorem 1; see [Mic14]). The recursive part (23) translates a move downwards into a move to the right and vice versa. Further, T is compatible with tree growth: If we add a node u to a tree x as a first child of v ∈ x, then T(u) is the next child to the parent of T(u) and, again, vice versa. In particular, writing T(x) for $\{T(u) : u \in x\}$, we may lift T to a bijective map on H with the property that $T(H_n) = H_n$ for all $n \in \mathbb{N}$. This construction proves
$$\mathcal{L}\bigl(\mathrm{TPL}(X_n) - (n - 1)\bigr) = \mathcal{L}\bigl(\mathrm{HPL}(X_n)\bigr) \quad\text{for all } n \in \mathbb{N},\ n \ge 2,$$
if we can show that the distribution of the Harris chain $(X_n)_{n \in \mathbb{N}}$ is invariant under T and that
$$\mathrm{HPL}\bigl(T(x)\bigr) = \mathrm{TPL}(x) - (\#x - 1) \quad\text{for all } x \in H,\ \#x > 1. \tag{24}$$
The first of these is an immediate consequence of the tree growth mechanism. To obtain (24) it is enough to show that
$$|T(u)|_1 - |T(u)| = |u| - 1 \quad\text{for all } u \in V \setminus \{\emptyset\}.$$
This, however, can easily be proved by induction, considering the two cases in (23) separately.
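The bijection can be checked mechanically on small trees. The sketch below encodes each non-root node by the word of moves leading to it from (1), with 'D' for a step to a first child and 'R' for a step to the next sibling, so that T simply swaps the two letters; this move-word representation is an illustrative reformulation of the recursion, and all function names are hypothetical. The check confirms that T is an involution and that the horizontal path length of T(x) and the vertical path length of x differ by #x − 1, the shift in the distributional identity.

```python
from itertools import product

def moves(u):
    """Word of moves from (1) to the non-root node u: 'D' = first child, 'R' = next sibling."""
    word = []
    for i, u_i in enumerate(u):
        if i > 0:
            word.append("D")
        word.extend("R" * (u_i - 1))
    return word

def node_from_moves(word):
    u = [1]
    for m in word:
        if m == "D":
            u.append(1)
        else:
            u[-1] += 1
    return tuple(u)

def T(u):
    """Swap downward moves and moves to the right; the root is left fixed."""
    if u == ():
        return ()
    return node_from_moves(["R" if m == "D" else "D" for m in moves(u)])

def harris_tree(enc):
    """Harris tree (set of tuples) of the recursive tree encoded by enc."""
    word, n_children = {1: ()}, {1: 0}
    for k, j in enumerate(enc, start=1):
        n_children[j] += 1
        word[k + 1] = word[j] + (n_children[j],)
        n_children[k + 1] = 0
    return frozenset(word.values())

def tpl(x):
    return sum(len(u) for u in x)            # vertical total path length

def hpl(x):
    return sum(sum(u) - len(u) for u in x)   # horizontal total path length

def check(n):
    """Verify the identities for all Harris trees with n >= 2 nodes."""
    for enc in product(*(range(1, k + 1) for k in range(1, n))):
        x = harris_tree(enc)
        tx = frozenset(T(u) for u in x)
        assert all(T(T(u)) == u for u in x)        # T is an involution
        assert hpl(tx) == tpl(x) - (len(x) - 1)    # shift by #x - 1
    return True
```

For example, the path {(), (1), (1, 1)} is mapped to the star {(), (1), (2)}, whose horizontal path length 1 equals the vertical path length 3 of the path minus 2.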
In view of this simple bijective proof one may naturally wonder what the advantage of the boundary theory approach might be.Almost sure convergence of the standardized vertical and horizontal path lengths implies the convergence of any linear combinations, for example.This is of interest in connection with the analysis of the recursive tree algorithm RT introduced in Section 2.4: The number C n of comparisons needed to build the tree X n for n − 1 data is given by the sum of the horizontal and the vertical path length of X n , hence with Y ∞ and Z ∞ as in Sections 3.1 and 3.2.While the mean can be obtained from the symmetry and the individual results for the two versions of path length, we would need their joint distribution in order to obtain the limit result for the sum.
2n + 1 for all n ∈ N. as a measure of spread of an arbitrary finite connected graph G with node set V .Here d • denotes the canonical graph distance, i.e. d • (u, v) is the minimum length of a path connecting u and v in G. Let u ∧ v be the longest common prefix of u, v ∈ V.For trees we then have d • (u, v) = |u| + |v| − 2|u ∧ v| and, as in the case of binary trees [Grü14, eq.(34) corrected],