Novel Characteristics of Split Trees by use of Renewal Theory

We investigate characteristics of random split trees introduced by Devroye; split trees include, for example, binary search trees, $m$-ary search trees, quadtrees, median of $(2k+1)$-trees, simplex trees, tries and digital search trees. More precisely, we introduce the use of renewal theory in the study of split trees, and use this theory to prove several results about them. A split tree of cardinality $n$ is constructed by distributing $n$ "balls" (which often represent "key numbers") in a subset of vertices of an infinite tree. One of our main results is a relation between the deterministic number of balls $n$ and the random number of vertices $N$. Devroye has found a central limit law for the depth of the last inserted ball, so that most vertices are at depth $\frac{\ln n}{\mu}+\mathcal{O}\big(\sqrt{\ln n}\big)$, where $\mu$ is a constant depending on the type of split tree; we sharpen this result by finding an upper bound for the expected number of vertices with depths $\geq\frac{\ln n}{\mu}+\ln^{0.5+\epsilon} n$ or depths $\leq\frac{\ln n}{\mu}-\ln^{0.5+\epsilon} n$ for any choice of $\epsilon>0$. We also find the first asymptotic of the variances of the depths of the balls in the tree.


1 Introduction

1.1 Preliminaries
In this paper we consider random split trees introduced by Devroye [3]. Some important examples of split trees are binary search trees, $m$-ary search trees, quadtrees, median of $(2k+1)$-trees, simplex trees, tries and digital search trees. As shown in [3], the split trees belong to the family of so-called $\log n$ trees, i.e., trees with height (maximal depth) a.a.s. $O(\log n)$. (For the notation a.a.s., see [13].) The (random) split trees constitute a large class of recursively generated random trees. Their formal definition is given in the "split tree generating algorithm" below. To ease the reader into this rather complex algorithm, we first provide a brief heuristic description.
A skeleton tree $S_b$ of branch factor $b$ is an infinite rooted tree in which each vertex has exactly $b$ children, numbered $1, 2, \ldots, b$. A split tree is a finite subtree of a skeleton tree $S_b$. The split tree is constructed recursively by distributing balls one at a time to a subset of the vertices of $S_b$. We say that the tree has cardinality $n$ if $n$ balls are distributed. Since many of the common split trees come from algorithms in computer science, the balls often represent "key numbers" or other data symbols. There is also a so-called vertex capacity $s > 0$, which means that each node can hold at most $s$ balls. We say that a vertex $v$ is a leaf in a split tree if the node itself holds at least one ball but no descendant of $v$ holds any balls. The split tree consists of the leaves and all the ancestors of the leaves, in particular the root of $S_b$, but no descendant of a leaf is included. In this way the definition of leaves in split trees is equivalent to the usual definition of leaves in trees. See Figure 1 and Figure 2, where two examples of split trees are illustrated (the parameters $s_0$ and $s_1$ in the figures are introduced in the formal "split tree generating algorithm").
The first ball is placed in the root of $S_b$. A new ball is added to the tree by starting at the root and then letting the ball fall down to lower levels in the tree until it reaches a leaf. Each vertex $v$ of $S_b$ is given an independent copy of the so-called random split vector $\mathcal{V} = (V_1, V_2, \ldots, V_b)$ of probabilities, where $\sum_i V_i = 1$ and $V_i \ge 0$. The split vectors control the path that the ball takes until it finally reaches a leaf; when the ball falls down one level from vertex $v$ to one of its children, it chooses the $i$-th child of $v$ with probability $V_i$, i.e., the $i$-th component of the split vector associated with $v$. When a full leaf (i.e., a leaf which already holds $s$ balls) is reached by a new ball, it splits. This means that some of the $s+1$ balls are given to its children, leading to new leaves, so that more nodes are included in the tree. When all $n$ balls are distributed we end up with a split tree with a finite number of nodes, which we denote by the parameter $N$.
The split tree generating algorithm: The formal, comprehensive "split tree generating algorithm" is as follows, with the following introductory notation. The (random) split tree has the parameters $b$, $n$, $s$ and $\mathcal{V}$ as described above; there are also two other parameters, $s_0$ and $s_1$ (related to the parameter $s$), that occur in the algorithm below. Let $n_v$ denote the total number of balls that the vertices in the subtree rooted at vertex $v$ hold together, and let $C_v$ be the number of balls held by $v$ itself. Thus, a vertex $v$ is a leaf if and only if $C_v = n_v > 0$. Also note that a vertex $v \in S_b$ is included in the split tree if, and only if, $n_v > 0$; if $n_v = 0$, the vertex $v$ is not included and is called useless.
Below is a description of the algorithm which determines how the $n$ balls are distributed over the vertices (a runnable sketch follows the list below). Initially there are no balls, i.e., $C_v = 0$ for each vertex $v$. Choose an independent copy $\mathcal{V}_v$ of $\mathcal{V}$ for every vertex $v \in S_b$. Add balls one by one to the root by the following recursive procedure for adding a ball to the subtree rooted at $v$.
1. If $v$ is not a leaf, choose child $i$ with probability $V_i$, and recursively add the ball to the subtree rooted at child $i$, by the rules given in steps 1, 2 and 3.

2. If $v$ is a leaf and $C_v = n_v < s$ ($s$ is the capacity of the vertex), then add the ball to $v$ and stop. Thus, $C_v$ and $n_v$ increase by 1.

3. If $v$ is a leaf and $C_v = n_v = s$, the ball cannot be placed at $v$, since $v$ already holds the maximal number of balls it can hold. In this case let $n_v = s + 1$ and $C_v = s_0$, by placing $s_0 \le s$ randomly chosen balls at $v$ and $s + 1 - s_0$ balls at its children. This is done by first giving $s_1$ randomly chosen balls to each of the $b$ children. The remaining $s + 1 - s_0 - bs_1$ balls are placed by choosing a child for each ball independently according to the probability vector, and then using the algorithm described in steps 1, 2 and 3 applied to the subtree rooted at the selected child. Note that if $s_0 > 0$ or $s_1 > 0$, this procedure does not need to be repeated, since no child can reach the capacity $s$, whereas in the case $s_0 = 0$ the procedure may have to be repeated several times.

From step 3 it follows that the integers $s_0$ and $s_1$ have to satisfy the inequalities $0 \le s_0 \le s$ and $0 \le bs_1 \le s + 1 - s_0$.

Note that every nonleaf vertex has $C_v = s_0$ balls and every leaf has $0 < C_v \le s$ balls.
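To make the recursive procedure concrete, here is a minimal Python sketch of the generating algorithm, as promised above. The class `SplitTree` and the helper `draw_split_vector` are our own illustrative names, not notation from the paper; the sketch also creates all $b$ children of a splitting vertex eagerly, so when counting $N$ one must, as in the definition above, count only vertices $v$ with $n_v > 0$.

```python
import random

class SplitTree:
    """Sketch of the split tree generating algorithm (illustrative only)."""

    def __init__(self, b, s, s0, s1, draw_split_vector):
        # The parameters must satisfy 0 <= s0 <= s and 0 <= b*s1 <= s+1-s0.
        assert 0 <= s0 <= s and 0 <= b * s1 <= s + 1 - s0
        self.b, self.s, self.s0, self.s1 = b, s, s0, s1
        self.draw = draw_split_vector
        self.root = self._new_vertex()

    def _new_vertex(self):
        # Each vertex stores its ball count C_v, its split vector V_v and,
        # once it has split, a list of b children.
        return {"C": 0, "V": self.draw(), "children": None}

    def add_ball(self, v=None):
        v = self.root if v is None else v
        if v["children"] is not None:
            # Step 1: internal vertex -- choose child i with probability V_i.
            i = random.choices(range(self.b), weights=v["V"])[0]
            self.add_ball(v["children"][i])
        elif v["C"] < self.s:
            # Step 2: non-full leaf -- the ball stays here.
            v["C"] += 1
        else:
            # Step 3: full leaf splits: s0 balls stay at v, each child gets
            # s1 balls, and the remaining s+1-s0-b*s1 balls fall recursively.
            v["children"] = [self._new_vertex() for _ in range(self.b)]
            v["C"] = self.s0
            for c in v["children"]:
                c["C"] = self.s1
            for _ in range(self.s + 1 - self.s0 - self.b * self.s1):
                i = random.choices(range(self.b), weights=v["V"])[0]
                self.add_ball(v["children"][i])

    def num_vertices(self):
        """N: count only vertices whose subtree holds at least one ball."""
        def count(v):
            if v["children"] is None:
                return 1 if v["C"] > 0 else 0
            sub = sum(count(c) for c in v["children"])
            return 1 + sub if v["C"] > 0 or sub > 0 else 0
        return count(self.root)
```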
We can assume that the components $V_i$ of the split vector $\mathcal{V}$ are identically distributed; if this were not the case, they could be made identically distributed by using a random permutation, see [3]. Let $V$ be a random variable with this distribution. This gives (because $\sum_i V_i = 1$) that $E(V) = \frac{1}{b}$. We use the notation $T_n$ to denote a split tree with $n$ balls. However, note that even conditioned on the fact that the split tree has $n$ balls, the number of nodes $N$ is still random. The only parameters that are important in this work (and in general these parameters are the important ones for most results concerning split trees) are the cardinality $n$, the branch factor $b$ and the split vector $\mathcal{V}$; this is illustrated in Section 1.4. As an example, in the binary search tree considered as a split tree, $b = 2$ and the split vector $\mathcal{V}$ is $(U, 1-U)$, where $U$ is a uniform $U(0,1)$ random variable; this is a $\operatorname{Beta}(1,1)$ random variable, and in fact for many important split trees $V$ is beta-distributed. (The other parameters for the binary search tree considered as a split tree are $s = 1$, $s_0 = 1$ and $s_1 = 0$.) For the binary search tree the number of balls $n$ is the same as the number of vertices $N$; this is not true for split trees in general.
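As a usage example of the sketch above (not code from the paper), the binary search tree corresponds to $b = 2$, $s = 1$, $s_0 = 1$, $s_1 = 0$ and split vector $(U, 1-U)$; since every vertex then holds exactly one ball, the printed value of $N$ equals $n$.

```python
import random

def bst_split_vector():
    # Binary search tree as a split tree: V = (U, 1 - U), U uniform on (0,1).
    u = random.random()
    return [u, 1.0 - u]

tree = SplitTree(b=2, s=1, s0=1, s1=0, draw_split_vector=bst_split_vector)
n = 1000
for _ in range(n):
    tree.add_ball()
print(tree.num_vertices())  # for the binary search tree N = n, so 1000
```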

1.2 Notation
In this section some of the notation that we use in the present study is collected.
Let $T_n$ denote a split tree with $n$ balls; for simplicity we often write $T$. Let $V_T$ denote the set of vertices in a rooted tree $T$. We write $|S|$ for the number of vertices in a set $S$; note that for the number of vertices $N$ we have $N = |V_{T_n}|$. Let $T_v$ be the subtree rooted at $v$. Let $n_v$ denote the number of balls in the subtree rooted at vertex $v$ and let $N_v$ denote its number of vertices. Let $D_n$ denote the depth of the last inserted ball in the tree and $D_k$ the depth of the $k$-th inserted ball when all $n$ balls have been added. Let $D^*_n$ be the average depth, i.e., $D^*_n = \frac{\sum_{k=1}^{n} D_k}{n}$. We also use the notation $D^f_k$ for the depth of the node of ball $k$ when it is added to the tree; this could differ from $D_k$ since the ball can move during the splitting process. Note that $D^f_k$ is distributed as the depth of the last inserted ball in a split tree with $k$ balls. Let $d(v)$ denote the depth of a vertex $v$; sometimes we just write $d$.
Let p(v) denote the parent of a vertex v.
There are at least two different types of total path lengths in a tree T that are of interest: the sum of all depths (distances to the root) of the balls in T , and the sum of all the depths of the vertices in T .We denote the former by Ψ(T ) and the latter by Υ(T ).
We use the standard notations $N(\mu, \sigma^2)$ for a normal distribution with expected value $\mu$ and variance $\sigma^2$, and $\operatorname{Bin}(m, p)$ for a random variable $X$ with a binomial distribution with parameters $m$ and $p$. We also use the notation mixed binomial distribution $(X, Y)$, or for short $\operatorname{mBin}(X, Y)$, for a binomial distribution where at least one of the parameters $X$ and $Y$ is a random variable (the other one could be deterministic). Let $H_n$ denote the height of a split tree with $n$ balls.
Let $T_{v_i}$, $i \in \{1, \ldots, b^L\}$, be the subtrees rooted at depth $L = \beta \log_b \ln n$ for some constant $\beta$. For simplicity we just write $T_i$. We write $d_{T_w}(v)$ for the depth of a vertex $v$ in the subtree $T_w$; in particular, we write $d(v) := d_{T_n}(v)$. Recall that $V$ is a random variable with the distribution of the identically distributed components $V_i$, $i \in \{1, \ldots, b\}$, in the split vector. Let $\Delta := V_S$ be the size-biased component of the split vector, i.e., given $(V_1, \ldots, V_b)$, $\Delta = V_j$ with probability $V_j$, and let
$$\mu := E(-\ln\Delta) = bE(-V\ln V), \qquad \sigma^2 := \operatorname{Var}(-\ln\Delta) = bE\big(V\ln^2 V\big) - \mu^2. \qquad(1)$$
Note that the second equalities for $\mu$ and $\sigma^2$ imply that they are bounded. Similarly, all moments of $-\ln\Delta$ are bounded.
For a given $\epsilon > 0$, we say that a vertex $v$ in $T_n$ is "good" if
$$\mu^{-1}\ln n - \ln^{0.5+\epsilon} n \;\le\; d(v) \;\le\; \mu^{-1}\ln n + \ln^{0.5+\epsilon} n, \qquad(2)$$
and "bad" otherwise. We write $V^*_{T_n}$ for the set of good vertices in $T_n$, and $|V^*_{T_n}|$ for the number of good vertices. We use two slightly unusual types of order notation: let $a_m$ be a positive number and $Y_m$ a random variable; by $Y_m := O_{L^p}(a_m)$ we mean that $\big(E(|Y_m|^p)\big)^{1/p} \le C a_m$ for some constant $C$, and by $Y_m := o_{L^p}(a_m)$ we mean that $\big(E(|Y_m|^p)\big)^{1/p}/a_m \to 0$ as $m \to \infty$. We use the notation $\Omega_j$ for the $\sigma$-field generated by $\{n_v,\, d(v) \le j\}$. Finally, we write $G_j$ for the $\sigma$-field generated by the $\mathcal{V}$-vectors of all vertices $v$ with $d(v) \le j$.

1.3 A weak law and a central limit law for the depth
In [3] Devroye presents a weak law of large numbers and a central limit law for $D_n$ (the depth of the last inserted ball). If $P(V = 1) = 0$ and $P(V = 0) < 1$, then
$$\frac{D_n}{\ln n} \;\overset{p}{\longrightarrow}\; \frac{1}{\mu} \qquad(4)$$
and
$$\frac{D^*_n}{\ln n} \;\overset{p}{\longrightarrow}\; \frac{1}{\mu}. \qquad(5)$$
From the following lemma it follows easily (as we explain below) that (4) also holds for the average depth $D^*_n$, i.e., that (5) holds. Recall that $D_k$ is the depth of the $k$-th ball in the tree when all $n$ balls are added.

Lemma 1.1. For $i \le j$, we have $D_i \le D_j$ in the stochastic sense.
Proof. We show that for an arbitrary $i \in \{1, \ldots, n-1\}$, $D_i \le D_{i+1}$, where the inequalities and equalities below are in the stochastic sense only; this is done by coupling arguments.
First consider two identical copies $T$ and $T'$ of the split tree when $i-1$ balls have been added, where we let $v'$ in $T'$ denote the vertex corresponding to $v$ in $T$. More precisely, we consider two split trees $T$ and $T'$ with the same split vectors in all vertices of the infinite skeleton tree, and if a ball $k$, $k \le i-1$, is added to $v$ in $T$, then ball $k$ is added to $v'$ in $T'$. We now add the two balls $i$ and $i+1$ to $T$ and $T'$.
If ball $i$ and ball $i+1$ are added to different leaves $l_1$ and $l_2$ in $T$, then in $T'$ we let them switch positions, i.e., ball $i$ is added to $l_2'$ and ball $i+1$ is added to $l_1'$. (Recall the notation $D^f_k$ from Section 1.2.) Hence, it is obvious for reasons of symmetry that $D^f_i \overset{d}{=} D^f_{i+1}$. When the balls in $\{i+2, \ldots, n\}$ are added, we add them to the corresponding vertices in $T$ and $T'$. Thus, the two trees are identical in the whole process, except that ball $i$ and ball $i+1$ always have switched positions in $T$ and $T'$. Hence, by symmetry $D_i \overset{d}{=} D_{i+1}$.

If ball $i$ and ball $i+1$ are added to the same leaf $l$ in $T$, then there are three different cases.

If $n_l \le s-2$, so that $l$ does not split when balls $i$ and $i+1$ have been added, then $T$ and $T'$ are still identical, since ball $i$ and ball $i+1$ stay in $l$. When more balls are added we can again assume that ball $i$ and ball $i+1$ have switched positions in $T$ and $T'$ at every step of the recursive construction until all $n$ balls are added. Hence, by symmetry $D_i \overset{d}{=} D_{i+1}$.

If $n_l = s-1$, so that $l$ gets $s+1$ balls when the new balls are added, $l$ splits according to the usual splitting process when ball $i+1$ is added. Again we let ball $i$ and ball $i+1$ switch positions in $T$ and $T'$: if ball $i$ is added to $v_1$ and ball $i+1$ is added to $v_2$ in $T$, then in $T'$ ball $i$ is added to $v_2'$ and ball $i+1$ is added to $v_1'$. Thus, again by symmetry, $D^f_i \overset{d}{=} D^f_{i+1}$, and by using the same type of argument as in the cases above we get $D_i \overset{d}{=} D_{i+1}$.

If $n_l = s$, so that $l$ in $T$ gets $s+2$ balls when the new balls are added, we let $l$ split according to the usual splitting process, where $l$ keeps $s_0$ balls and sends the other balls to its children.
If ball $i$ is one of the $s_0$ balls that stay at $l$, then it is obvious, without using the coupling, that $D^f_i \le D^f_{i+1}$ and also $D_i \le D_{i+1}$. If ball $i$ is not one of the $s_1$ balls given to the children of $l$ in $T$, and ball $i$ is added to $v_1$ and ball $i+1$ is added to $v_2$, then in $T'$ we can again assume that ball $i$ is added to $v_2'$ and ball $i+1$ is added to $v_1'$. Thus, in the stochastic sense, $D_i \overset{d}{=} D_{i+1}$. If ball $i$ is one of the $s_1$ balls in $T$, we use a related but not identical type of coupling argument. In this case ball $i$ is added by uniformly choosing one of the $b$ children of $l$, each with probability $\frac{1}{b}$, while ball $i+1$ is added by using the probabilities given by the components of the split vector $\mathcal{V}_l$ of $l$. Again $T$ and $T'$ are identical until $i-1$ balls are added, barring the possibility of variation in the split vectors of the vertices above the leaves, as described below. If ball $i$ in $T$ goes to a child $v_1$ of $l$ related to a component $V_j$ of $\mathcal{V}_l$, then we add ball $i+1$ in $T'$ to $v_1'$ with probability $\min\big\{1, \frac{V_j}{1/b}\big\}$, and to one of the other children, related to a component $V_k > \frac{1}{b}$, with probability $\max\big\{0, 1 - \frac{V_j}{1/b}\big\}$, so that the sum of the probabilities gives the right marginal distribution. Assume that ball $i$ is added to the child $v$ of $l$ in $T$ and ball $i+1$ is added to the child $w'$ of $l'$ in $T'$. This means that $w$ relates to a component of the split vector of $l$ at least as large as the component related to $v$. Now we can assume that the split vectors in the vertices of the subtree rooted at $v$ correspond to the split vectors in the vertices of the subtree rooted at $w'$. This means that we can assume that when ball number $j$ in the subtrees is added, it goes to the corresponding vertex in both subtrees. However, note that the balls could have different labels if we consider their original labels in the whole tree, since the subtree rooted at $w'$ could have more balls than the subtree rooted at $v$. Thus, as long as the subtrees have the same number of balls, new balls are added to corresponding positions in these subtrees, and ball $i$ and ball $i+1$ are also held by vertices at corresponding positions. This construction shows that if the subtrees rooted at $w'$ and $v$ have $k$ and $l$ balls, respectively, where $k > l$, and ball $i$ in $T_v$ is in vertex $h$, then ball $i+1$ in $T'$ is in a subtree of $T_{w'}$ whose root corresponds to the position of $h$. This shows that in the stochastic sense $D_i \le D_{i+1}$. Hence, in all cases, $D_i \le D_{i+1}$ stochastically, and thus for $i < j$ it follows that $D_i \le D_j$ in the stochastic sense.

This means in particular that for all $k \le n$ we have $D_k \le D_n$ in the stochastic sense; combining this with (4) yields the corresponding law (5) for the average depth $D^*_n$.
Furthermore, see [3, Theorem 1], if $\sigma > 0$, and assuming that $V$ is not monoatomic, i.e., that we do not have $V \equiv \frac{1}{b}$,
$$\frac{D_n - \mu^{-1}\ln n}{\sqrt{\sigma^2\mu^{-3}\ln n}} \;\overset{d}{\longrightarrow}\; N(0,1), \qquad(6)$$
where $N(0,1)$ denotes the standard normal distribution and $\overset{d}{\to}$ denotes convergence in distribution. Tries are special forms of split trees with a random permutation of deterministic components $(p_1, p_2, \ldots, p_b)$ and are therefore not as random as many other examples. (In the literature tries have also been treated separately from other random trees of logarithmic height.) Of all the most common examples of split trees, only some special cases of tries (the symmetric tries and symmetric digital search trees) have a monoatomic distribution of $V$. From (6) it follows that "most" nodes lie at depth $\mu^{-1}\ln n + O(\sqrt{\ln n})$.
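The normalization in (6) can be checked by simulation. The following is a hedged Monte Carlo sketch for the binary search tree, where one computes $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{4}$, so that both the mean and the variance of $D_n$ should be close to $2\ln n$; for speed it uses a direct binary search tree insertion routine, which for this example is equivalent to the split tree construction.

```python
import math
import random

def bst_last_depth(n):
    """Insert n uniform keys into a binary search tree and return the
    depth of the last inserted key, i.e. one sample of D_n."""
    root, depth = {}, 0
    for _ in range(n):
        key, node, d = random.random(), root, 0
        while "key" in node:
            node = node["left"] if key < node["key"] else node["right"]
            d += 1
        node["key"], node["left"], node["right"] = key, {}, {}
        depth = d
    return depth

n, reps = 10_000, 200
samples = [bst_last_depth(n) for _ in range(reps)]
mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / (reps - 1)
print(mean, var, 2 * math.log(n))  # all three should be close together
```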

1.4 Subtrees
For a split tree where the number of balls $n > s$, there are $s_0$ balls in the root vertex and the cardinalities of the $b$ subtrees rooted at its children are distributed as $(s_1, \ldots, s_1)$ plus a multinomial vector $\operatorname{Mult}(n - s_0 - bs_1,\, V_1, \ldots, V_b)$. Thus, conditioning on the random $\mathcal{V}$-vector that belongs to the root, the subtrees rooted at the children have cardinalities close to $nV_1, \ldots, nV_b$. This is often used in applications of random binary search trees; in particular, we used this fact frequently in [11]. The split tree generating algorithm described above, and the fact that an $\operatorname{mBin}(X, p_1)$ in which $X$ is $\operatorname{Bin}(m, p_2)$ is distributed as a $\operatorname{Bin}(m, p_1p_2)$, give, in a stochastic sense, an upper bound on the number of balls $n_v$ in a subtree rooted at a vertex $v$: for $v$ at depth $d$, conditioning on $G_d$ (i.e., the $\sigma$-field generated by the $\mathcal{V}$-vectors of all vertices $v$ with $d(v) \le d$) gives
$$n_v \;\le\; \operatorname{Bin}\Big(n, \prod_{j=1}^{d} W_{j,v}\Big) + \sum_{k=1}^{d} \operatorname{Bin}\Big(s_1, \prod_{j=k+1}^{d} W_{j,v}\Big), \qquad(7)$$
where $W_{j,v}$, $j \in \{1, \ldots, d\}$, are i.i.d. random variables given by the split vectors associated with the nodes on the unique path from $v$ to the root (and the empty product equals $1$). This means in particular that $W_{j,v} \overset{d}{=} V$. However, we note that the terms in (7) are not independent. Also observe that $G_d$ is equivalently the $\sigma$-field generated by $W_{j,v}$, $j \in \{1, \ldots, d\}$, for all $v$ with $d(v) = d$. Similarly, we also have a lower bound for $n_v$: for $v$ at depth $d$, conditioning on $G_d$, in a stochastic sense,
$$n_v \;\ge\; \operatorname{Bin}\Big(n, \prod_{j=1}^{d} W_{j,v}\Big) - \sum_{k=1}^{d} \operatorname{Bin}\Big(s, \prod_{j=k}^{d} W_{j,v}\Big); \qquad(8)$$
here we can replace the term $s$ by $s_0 + bs_1 \le s$ for a sharper bound. As in (7), the terms in (8) are not independent.
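The composition fact for mixed binomials quoted above is easy to check by simulation; in the following sketch (names are ours) both sampling schemes have mean $m\,p_1p_2 = 21$.

```python
import random

m, p1, p2, trials = 50, 0.6, 0.7, 100_000

def mixed():
    # X ~ Bin(m, p2), then Y | X ~ Bin(X, p1): the mBin(Bin(m, p2), p1).
    x = sum(random.random() < p2 for _ in range(m))
    return sum(random.random() < p1 for _ in range(x))

def direct():
    # Y ~ Bin(m, p1 * p2) directly.
    return sum(random.random() < p1 * p2 for _ in range(m))

print(sum(mixed() for _ in range(trials)) / trials)   # ~ 21
print(sum(direct() for _ in range(trials)) / trials)  # ~ 21
```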
Recall that for a $\operatorname{Bin}(m, p)$ distribution, the expected value is $mp$ and the variance is $mp(1-p)$. Thus, Chebyshev's inequality applied to the dominating term $\operatorname{Bin}\big(n, \prod_{j=1}^{d} W_{j,v}\big)$ in (7) gives that $n_v$, for $v$ at depth $d$, is close to
$$n\prod_{j=1}^{d} W_{j,v}. \qquad(9)$$
More precisely, by using (7) and (8), the Chebyshev and Markov inequalities give, for $v$ with $d(v) = d$ and $n$ large, a concentration bound for $n_v$ around $n\prod_{j=1}^{d} W_{j,v}$ (10). Since the $n_v$'s (conditioned on the split vectors) for all $v$ at the same depth are identically distributed, we sometimes skip the vertex index of $W_{j,v}$ and just write $W_j$.

1.5 Renewal Theory
Renewal theory is a widely used branch of probability theory that generalizes Poisson processes to arbitrary holding times. A classic in this field is Feller [5] on recurrent events. First we recollect some standard notation. Let $X_0 = 0$ a.s., let $X_k$, $k \ge 1$, be i.i.d. nonnegative random variables distributed as $X$, and let $S_m = \sum_{k=0}^{m} X_k$ be the partial sums. Let $F$ denote the distribution function of $X$, and let $F_m$ be the distribution function of $S_m$, $m \ge 0$. Thus, for $x \ge 0$,
$$F_m(x) = \int_0^x F_{m-1}(x-y)\,dF(y),$$
i.e., $F_m$ equals the $m$-fold convolution of $F$ with itself. The "renewal counting process" $\{N(t),\, t \ge 0\}$ is defined by
$$N(t) := \max\{m : S_m \le t\},$$
which one can think of as the number of renewals before time $t$ of an object with a lifetime distributed as the random variable $X$. In the specific case when $X \overset{d}{=} \operatorname{Exp}(\lambda)$, $\{N(t),\, t \ge 0\}$ is a "Poisson process". An important, well-studied function is the so-called "standard renewal function", defined as
$$V(t) := \sum_{m=1}^{\infty} F_m(t), \qquad(11)$$
which one can easily show is equal to $E(N(t))$. The renewal function $V(t)$ satisfies the so-called renewal equation
$$V(t) = F(t) + \int_0^t V(t-x)\,dF(x).$$
For a broader introduction to renewal theory see, e.g., [1], [6], [7] and [9]. One of the main purposes of this study is to introduce renewal theory in the context of split trees. Recall from (9) in Section 1.4 that the subtree size $n_v$, for $v$ at depth $k$, is close to $nW_1W_2\cdots W_k$, where $W_j$, $j \in \{1, \ldots, k\}$, are independent random variables distributed as $V$. Now let $Y_k := -\sum_{j=1}^{k}\ln W_j$, and for simplicity we also write $r := -\ln W_j$ for the summands. Note that $nW_1W_2\cdots W_k = ne^{-Y_k}$. Recall that in a binary search tree the split vector is $(U, 1-U)$, where $U$ is a uniform $U(0,1)$ random variable. For this specific case of a split tree the sum $Y_k$ (where the $W_j$, $j \in \{1, \ldots, k\}$, in this case are i.i.d. uniform $U(0,1)$ random variables) is distributed as a $\Gamma(k,1)$ random variable. This fact is used by, e.g., Devroye in [4] to determine the height of a binary search tree. For general split trees there is no simple common distribution function of $\sum_{j=1}^{k}\ln W_j$; instead, renewal theory can be used. Let
$$\nu_k(t) := b^k\,P(Y_k \le t).$$
We define the renewal function
$$U(t) := \sum_{k=1}^{\infty} \nu_k(t) = \sum_{k=1}^{\infty} b^k\,P(Y_k \le t). \qquad(12)$$
We also denote $\nu(t) := \nu_1(t) = bP(r \le t)$. For $U(t)$ we obtain the following renewal equation:
$$U(t) = \nu(t) + \int_0^t U(t-x)\,d\nu(x). \qquad(13)$$
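For the binary search tree the renewal function $U(t)$ can be evaluated numerically, since $Y_k \sim \Gamma(k,1)$ and, by Gamma-Poisson duality, $P(Y_k \le t) = P(\operatorname{Poisson}(t) \ge k)$. The following sketch (our own illustration, not code from the paper) computes the truncated sum $\sum_k b^k P(Y_k \le t)$ and compares it with the first-order growth $e^t/\mu$, $\mu = \frac{1}{2}$, given by Lemma 3.1 below.

```python
import math

def U(t, b=2, k_max=120):
    """Truncated U(t) = sum_k b^k P(Y_k <= t) for the binary search tree,
    using P(Gamma(k,1) <= t) = P(Poisson(t) >= k)."""
    total = 0.0
    for k in range(1, k_max + 1):
        term = math.exp(-t) * t ** k / math.factorial(k)  # Poisson pmf at k
        p, j = 0.0, k
        while term > 1e-300:  # sum the Poisson tail from j = k upwards
            p += term
            j += 1
            term *= t / j
        total += b ** k * p
    return total

mu = 0.5  # mu = E(-ln Delta) = 1/2 for the binary search tree
for t in (5.0, 10.0, 15.0):
    print(t, U(t) * mu / math.exp(t))  # the ratio should approach 1
```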

2 Main Results
In this section we present the main theorems of this work.
(A1) In this work we assume, as in Section 1.3, that $P(V = 1) = 0$; we also assume for simplicity that $P(V = 0) = 0$ and that $-\ln V$ is non-lattice.
The reason for the non-lattice assumption in (A1) is that we use renewal theory, where it often becomes necessary to distinguish between lattice and non-lattice distributions. Note that the assumption in Section 1.3 that $V$ is not monoatomic is included in the assumption that $-\ln V$ is non-lattice. Again, of the common split trees, only for some special cases of tries and digital search trees does $-\ln V$ have a lattice distribution. Our first main result concerns the relation between the number of vertices $N$ (recall that this is a random variable) and the number of balls $n$.
Theorem 2.1. There is a constant $\alpha$, depending on the type of split tree, such that
$$E(N) = \alpha n + o(n) \qquad(14)$$
and
$$\operatorname{Var}(N) = o(n^2). \qquad(15)$$

A small simulation illustrating (14) is given after Theorem 2.2 below. Recall that there is a central limit law for the depth $D_n$ in (6), so that most vertices are close to depth $\frac{\ln n}{\mu} + O(\sqrt{\ln n})$; our next result sharpens this fact. Recall that for any constant $\epsilon > 0$, we say that a vertex $v$ in $T_n$ is "good" if (2) holds, and "bad" otherwise.

Theorem 2.2. For any choice of $\epsilon > 0$, the number of bad nodes in $T_n$ is bounded by $O_{L^1}\!\big(\frac{n}{\ln^k n}\big)$ for any constant $k$.
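As promised, Theorem 2.1 can be illustrated numerically with the `SplitTree` sketch from Section 1.1. For a toy split tree with $b = 2$, $s = 2$, $s_0 = s_1 = 0$ and $\mathcal{V} = (U, 1-U)$, the ratio $N/n$ stabilises as $n$ grows; the printed limiting value is only a simulation estimate of $\alpha$ for this particular example, not a constant taken from the paper.

```python
for n in (10_000, 40_000, 160_000):
    t = SplitTree(b=2, s=2, s0=0, s1=0, draw_split_vector=bst_split_vector)
    for _ in range(n):
        t.add_ball()
    print(n, t.num_vertices() / n)  # N/n should be nearly constant in n
```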
In the third main result we sharpen the limit laws in (4) and (5) for the expected value of the depth of the last ball $D_n$ and of the average depth $D^*_n$. We also find the first asymptotic of the variance of the depth of the $k$-th ball $D_k$ for all $k$, $\frac{n}{\ln n} \le k \le n$.

Theorem 2.3. For the expected value of the depth of the last ball we have
$$E(D_n) = \mu^{-1}\ln n + o\big(\sqrt{\ln n}\big), \qquad(16)$$
and the same result holds for the average depth $D^*_n$, i.e.,
$$E(D^*_n) = \mu^{-1}\ln n + o\big(\sqrt{\ln n}\big). \qquad(17)$$
Furthermore, for the variance of the depth of the $k$-th ball we have, for all $\frac{n}{\ln n} \le k \le n$,
$$\operatorname{Var}(D_k) = \big(1 + o(1)\big)\,\sigma^2\mu^{-3}\ln n. \qquad(18)$$

We complete this section by stating two corollaries of Theorem 2.3. Recall that we write $V^*_{T_n}$ for the set of good vertices in $T_n$, i.e., those with depths that belong to the strip in (2).
Corollary 2.1. Summing over all vertices gives (19), and for the good vertices we also have (20). We write $V^*_{T_i}$ for the set of good vertices in $T_i$.
Corollary 2.2. Let $L = \beta \log_b \ln n$ for some large constant $\beta$. Then, summing over all vertices gives (21), and for the good vertices we also have (22).

3 Some Fundamental Renewal Theory Results

The main goal of this section is to present a renewal theory lemma and a corollary of this lemma, both of which are frequently used in this study. In contrast to standard renewal theory, the distribution function $\nu(t)$ in (13) is not a probability measure. However, to solve (13) we can apply [1, Theorem VI.5.1], which deals with non-probability measures. The result we get is presented in the following lemma.
Lemma 3.1. The renewal function $U(t)$ in (12) satisfies
$$U(t) = \frac{e^t}{\mu}\big(1 + o(1)\big).$$

Proof. Since the distribution function $\nu(t)$ is not a probability measure, we define another ("conjugate" or "tilted") measure $\omega$ on $[0, \infty)$ by
$$\omega(dx) := e^{-x}\,\nu(dx). \qquad(24)$$
Recall from Section 1.2 that $\Delta = V_S$ is the size-biased distribution of $(V_1, \ldots, V_b)$. We note that $\omega(x)$ is the distribution function of the random variable $-\ln\Delta$ since
$$P(-\ln\Delta \le x) = E\Big(\sum_{i=1}^{b} V_i\,\mathbf{1}\{-\ln V_i \le x\}\Big) = bE\big(V\,\mathbf{1}\{-\ln V \le x\}\big) = \int_0^x e^{-u}\,\nu(du).$$
Thus, $\omega$ is a probability measure. Further, recalling $\mu := E(-\ln\Delta)$ and $\sigma^2 := \operatorname{Var}(-\ln\Delta)$ gives
$$\int_0^\infty u\,\omega(du) = \mu, \qquad \int_0^\infty u^2\,\omega(du) = \sigma^2 + \mu^2.$$
Define $\tilde U(t) := e^{-t}U(t)$ and $\tilde\nu(t) := e^{-t}\nu(t)$; from (13),
$$\tilde U(t) = \tilde\nu(t) + \int_0^t \tilde U(t-x)\,\omega(dx). \qquad(25)$$
We shall apply [1, Theorem VI.5.1], but first we need to show that the condition that $\tilde\nu(t)$ is "directly Riemann integrable" (d.R.i.) is satisfied. Note that $\tilde\nu(t) \le be^{-t}$, and thus, since $\tilde\nu(t)$ is also continuous almost everywhere, it is d.R.i. [1]. Hence [1, Theorem VI.5.1] gives
$$\lim_{t\to\infty} \tilde U(t) = \frac{1}{\mu}\int_0^\infty \tilde\nu(u)\,du, \qquad(26)$$
where we use that $\omega(t)$ is a probability measure with mean $\mu$. Integration by parts now gives
$$\int_0^\infty \tilde\nu(u)\,du = \int_0^\infty e^{-u}\,\nu(du) = bE\big(e^{\ln V}\big) = bE(V) = 1. \qquad(27)$$
Thus,
$$U(t) = \frac{e^t}{\mu}\big(1 + o(1)\big).$$

The following result is a very useful corollary of Lemma 3.1. We write, for $v$ at depth $d(v)$, $M_{n_v} := n\prod_{j=1}^{d(v)} W_j$. Recall from (9) in Section 1.4 that this is close to the real subtree size $n_v$.

Corollary 3.1. Let $\frac{n}{K} \to \infty$. Then, by taking the sum over vertices $v$ with $d(v) = k$ and summing over $k$, the expected number of nodes with $M_{n_v} \ge K$ is equal to $\frac{n}{\mu K}\big(1 + o(1)\big)$.
Proof. By using Lemma 3.1 we get
$$\sum_{k=1}^{\infty} b^k\,P\Big(Y_k \le \ln\frac{n}{K}\Big) = U\Big(\ln\frac{n}{K}\Big) = \frac{n}{\mu K}\big(1 + o(1)\big).$$

We complete this section with a more general result in renewal theory, and a corollary of a more specific result that is valid for the renewal function $U(t)$ in (12).

Theorem 3.1. Let $F$ be a non-lattice probability measure and suppose that
$$Z(t) = z(t) + \int_0^t Z(t-x)\,dF(x), \qquad(33)$$
where $z(t)$ is a nonnegative function such that $a := \int_0^\infty z(u)\,du < \infty$. Then the convergence result (32) holds.

Proof. Let $V(t)$ be the standard renewal function in (11); the solution of (33) can be written
$$Z(t) = z(t) + \int_0^t z(t-u)\,dV(u) = z(t) + \int_{-\infty}^{t} z(t-u)\,dV(u),$$
where the last equality follows because $V(t) = 0$ for $t \le 0$. By applying (33) and Fubini's theorem we get (34); hence (35). From [1, Proposition VI.4.1] we have that $V(t) - \frac{t}{\mu_F}$ converges to a finite constant as $t \to \infty$, where $\mu_F$ denotes the mean of $F$. Hence, the Lebesgue dominated convergence theorem applied to the last integral in (35) gives (36). Note that for every fixed $x$, $\int_x^\infty z(u)\,du \to 0$ as $x \to \infty$, and the convergence result in (32) follows. If $\int_0^\infty u\,z(u)\,du = \infty$, then we have a special case of (32), i.e., $\lim_{x\to\infty} G(x) = -\infty$.
We define the function $W(x)$ in (37). The next result is a corollary of Theorem 3.1.
Corollary 3.2. The function $W(x)$ in (37) satisfies (38).

Proof. We apply Theorem 3.1 to $Z(t) = \tilde U(t) = e^{-t}U(t)$, defined in the proof of Lemma 3.1 (recall that $\tilde U(t)$ satisfies the renewal equation in (25)). Now the constant $a$, as defined in Theorem 3.1, satisfies $a = \int_0^\infty \tilde\nu(u)\,du$; thus from (26) and (27) we get $a = 1$. Using (24) and (26)-(27) then gives (38).

4 Proofs of the Main Results

4.1 Proof of Theorem 2.1

4.1.1 Some Crucial Lemmas

We present below some crucial lemmas by which we can then prove Theorem 2.1. The proofs of these lemmas are given in Section 4.1.4 below. The first lemma is fundamental for the proof.
Lemma 4.1. For the first moment of the number of vertices $N$ we have
$$E(N) = O(n), \qquad(39)$$
and for the second moment of $N$ we have
$$E(N^2) = O(n^2). \qquad(40)$$

Lemma 4.2. Adding $K$ balls to a tree will only affect the expected number of nodes in a split tree by $O(K)$ nodes.
Let $R$ be the set of vertices $r$ such that, conditioned on the split vectors, $M_{n_r} := n\prod_{j=1}^{d(r)} W_j < B$ while $M_{n_{p(r)}} := n\prod_{j=1}^{d(r)-1} W_j \ge B$; recall that $p(r)$ is the parent of $r$. For now we just let $B$ be large; later our choice of $B$ will be more precise. To show (14) we consider all subtrees rooted at some vertex $r \in R$. We denote these subtrees by $T_{r,B}$, $r \in R$. Recall from (9) that with "large" probability the cardinality $n_r$ is "close" to $M_{n_r}$. We will show that in fact we can replace $n_r$ by $M_{n_r}$ in our calculations. Let $n_r$ be the number of balls and let $N_r$ be the number of nodes in the $T_{r,B}$ subtree. Corollary 3.1 implies that most vertices are in the $T_{r,B}$ subtrees, i.e., the expected number of vertices that do not belong to any $T_{r,B}$ subtree is $\frac{n}{\mu B}\big(1+o(1)\big)$. The next lemma shows that the expected number of vertices in the $T_{r,B}$ subtrees with subtree sizes $n_r$ that differ significantly from $M_{n_r}$ is bounded by a "small" error term for large $B$. Since the variance of a $\operatorname{Bin}(m, p)$ distribution is $m(p - p^2)$, the Chebyshev and Markov inequalities give, similarly as in (10), that for large $B$,
$$P\big(|n_r - M_{n_r}| \ge B^{0.6}\big) = O\big(B^{-0.1}\big). \qquad(41)$$
From (41) we obtain the following lemma.

Lemma 4.3. The expected number of nodes in those $T_{r,B}$, $r \in R$, subtrees whose subtree size $n_r$ differs from $M_{n_r}$ by at least $B^{0.6}$ balls is $O\big(\frac{n}{B^{0.1}}\big)$.

We also sub-divide the $T_{r,B}$, $r \in R$, subtrees into smaller classes, wherein the $M_{n_r}$'s in each class are close to each other. Choose $\gamma := \epsilon^2$ and let $Z := \{B, B - \gamma B, B - 2\gamma B, \ldots, \epsilon B\}$, where $\epsilon = \frac{1}{k}$ for some positive integer $k$. We write $R_z \subseteq R$, $z \in Z$, for the set of vertices $r \in R$ such that $M_{n_r} \in [z - \gamma B, z)$ and $M_{n_{p(r)}} \ge B$. (Note that the intervals are of length $\gamma B$ and that the set $Z$ contains at most $\frac{1}{\gamma}$ elements.) We write $|R_z|$ for the number of nodes in $R_z$. The next lemma is a result that we get by the use of renewal theory applied to the renewal function $U(t)$ in (12).
Lemma 4.4. Let $S := \{1, 1-\gamma, 1-2\gamma, \ldots, \epsilon\}$, where $\gamma = \epsilon^2$. Choose $\alpha \in S$ and let $\frac{n}{B} \to \infty$; then
$$E\big(|R_{\alpha B}|\big) = \big(c_\alpha + o(1)\big)\,\frac{n}{B} \qquad(46)$$
for a constant $c_\alpha$ (only depending on $\alpha$), and also $\sum_{\alpha\in S} c_\alpha = O(1)$, where the constant in $O$ does not depend on $\epsilon$.
Before proving these lemmas we show how their use leads to the proof of Theorem 2.1.

4.1.2 Proof of (14) in Theorem 2.1
Proof. For showing (14) it is enough to show that for two arbitrary values of the cardinality, $n$ and $n'$, where $n' \ge n$, we have
$$\Big|\frac{E(N')}{n'} - \frac{E(N)}{n}\Big| = o(1), \qquad(47)$$
where $N'$ denotes the number of vertices of a split tree with $n'$ balls. Since (47) implies that $\frac{E(N)}{n}$ is Cauchy, it follows that $\frac{E(N)}{n}$ converges to some constant $\alpha$ as $n$ tends to infinity; hence we deduce (14).
Recall from Section 4.1.1 that we consider the subtrees $T_{r,B}$, $r \in R$, rooted at $r$; these are defined such that $M_{n_r} < B \le M_{n_{p(r)}}$. Lemma 4.3 shows that we only need to consider the vertices in the subtrees rooted at $r \in R'$, where $R' \subseteq R$ is the set of vertices such that $r \in R'$ if $r \in R$ and $|n_r - M_{n_r}| \le B^{0.6}$. Note that since $\sum_{\alpha\in S} c_\alpha = O(1)$, we have $\sum_{\alpha\in S} c_\alpha\,\frac{O(B\gamma)}{B} = O(\gamma)$; recall that $\gamma := \epsilon^2$. Thus, for constants $c_\alpha$ (depending on $\alpha$) and numbers $a_{\alpha B} = O(1)$, we get from (52) and (53) an expression for $\frac{E(N)}{n}$ in terms of $\sum_{\alpha\in S} c_\alpha a_{\alpha B}$, up to an error $O(\gamma) + o(1)$. In analogy we also get the same expression for $n' \ge n$. Thus, (47) follows, which shows (14).

4.1.3 Proof of (15) in Theorem 2.1
Proof. First note that (40) in Lemma 4.1 implies that $\operatorname{Var}(N) = O(n^2)$. The purpose is to use the variance formula
$$\operatorname{Var}(Y) = E\big(\operatorname{Var}(Y \mid \mathcal{G})\big) + \operatorname{Var}\big(E(Y \mid \mathcal{G})\big),$$
where $Y$ is a random variable and $\mathcal{G}$ is a sub-$\sigma$-field, see e.g. [10, Exercise 10.17-2]. We consider the subtrees $T_i$, $1 \le i \le b^D$, at depth $D = c\ln n$, choosing the constant $c$ small enough so that the number of nodes $Z_D$ between depth $D$ and the root is $O(n^{\epsilon})$ for some arbitrarily small $\epsilon$. Let $n_i$ be the number of balls and $N_i$ the number of nodes in $T_i$. Conditioned on $\Omega_D$, the $N_i$, $1 \le i \le b^D$, are independent, and it follows that
$$\operatorname{Var}(N \mid \Omega_D) = \sum_{i=1}^{b^D} \operatorname{Var}(N_i \mid \Omega_D). \qquad(57)$$
Taking expectations in (57) gives
$$E\big(\operatorname{Var}(N \mid \Omega_D)\big) = \sum_{i=1}^{b^D} E\big(\operatorname{Var}(N_i \mid \Omega_D)\big). \qquad(58)$$
Recall that $\Omega_D$ is the $\sigma$-field generated by $\{n_v,\, d(v) \le D\}$.
Lemma 4.5. For $D = c\ln n$ there is a $\delta > 0$ such that
$$\sum_{i=1}^{b^D} E\big(n_i^2\big) = O\big(n^{2-\delta}\big). \qquad(59)$$

Proof. The representation of subtree sizes in split trees described in (7) in Section 1.4 gives in particular that, conditioning on $G_D$, the size $n_i$ for $i$ at depth $D$ is bounded from above (in stochastic sense) by
$$\operatorname{Bin}\Big(n, \prod_{j=1}^{D} W_j\Big), \qquad(60)$$
where $W_j$, $j \in \{1, \ldots, D\}$, are i.i.d. random variables distributed as $V$. The fact that the second moment of a $\operatorname{Bin}(m, p)$ is $m^2p^2 + mp - mp^2$ and the bound of $n_i$ in (60) give
$$E\big(n_i^2 \mid G_D\big) \le n^2\Big(\prod_{j=1}^{D} W_j\Big)^2 + n\prod_{j=1}^{D} W_j. \qquad(61)$$
Note that $E(W_j^2) < E(W_j) = \frac{1}{b}$, since $W_j \in (0,1)$. Hence, there is an $\epsilon > 0$ such that
$$\sum_{i=1}^{b^D} E\Big(\prod_{j=1}^{D} W_{j,i}^2\Big) = \big(bE(V^2)\big)^D \le b^{-\epsilon D},$$
and thus there is a $\delta > 0$ such that
$$\sum_{i=1}^{b^D} E\big(n_i^2\big) \le n^2 b^{-\epsilon D} + n = O\big(n^{2-\delta}\big),$$
which shows (59).
Remark 4.1. The proof shows that if we can improve the result (14) in Theorem 2.1 to $E(N) = \alpha n + O(n^{1-c_1})$ for some constant $c_1 > 0$, we will also get a sharper result for the variance, i.e., $\operatorname{Var}(N) = o(n^{2-c_2})$ for some constant $c_2 > 0$.

4.1.4 Proofs of the Lemmas of Theorem 2.1
Proof of Lemma 4.1. (Note that if $s_0 > 0$ it is always true that $N \le n$, and if $s_1 > 0$ we always have $N \le 2n$.) For $s_0 = s_1 = 0$ we can argue as follows. When a new ball is added to the tree, the expected number of additional nodes is bounded by the expected number of nodes one gets from a splitting node. Let $Z$ be the number of nodes that one gets when a node of $s+1$ balls splits. Note that once a node gives balls to at least $2$ children, the splitting process ends; the process continues past a child only if all $s+1$ balls are sent to that child. Thus,
$$P(Z \ge bk) \le \big(bE(V^{s+1})\big)^{k-1}. \qquad(65)$$
Hence, (65) implies
$$E(Z) \le \sum_{k=1}^{\infty} b\,\big(bE(V^{s+1})\big)^{k-1}. \qquad(66)$$
There is a $\delta > 0$ such that $bE(V^{s+1}) \le 1 - \delta$, since $V \in (0,1)$. Thus, (66) gives
$$E(Z) = O(1).$$
This shows (39). Now we show (40). Note that (40) obviously holds if $s_1 > 0$ or $s_0 > 0$, since then $N \le 2n$. Recall that $Z$ is the number of nodes one gets when a node of $s+1$ balls splits, and let $Z_k$ denote the number of nodes created when ball $k$ is added. Then, by the well-known Minkowski inequality,
$$\big(E(N^2)\big)^{1/2} \le \sum_{k=1}^{n}\big(E(Z_k^2)\big)^{1/2}. \qquad(69)$$
By similar calculations as in (66)-(68) we get that for some constant $\delta > 0$,
$$E(Z^2) \le \sum_{k=1}^{\infty} (bk)^2\,(1-\delta)^{k-1} = O(1). \qquad(70)$$
Thus, (40) follows from (69) and (70).
Proof of Lemma 4.2. The proof of this lemma is in analogy with the proof of (39) in Lemma 4.1. Adding one ball to the tree will only increase the number of vertices if it is added to a leaf with $s$ balls. Recall that $Z$ is the number of nodes that one gets when a node of $s+1$ balls splits. Hence, (66) gives $E(Z) = O(1)$, implying that $K$ balls can only create $O(K)$ additional nodes.
Proof of Lemma 4.3. By applying (42) we get that, with probability at least $1 - \frac{1}{B^{0.1}}$, the bound $|n_r - M_{n_r}| \le B^{0.6}$ holds (71). We split the expectation in question into two parts, $E_1 + E_2$ (72). Hence, the facts that $\sum_r M_{n_r} = O(n)$ and that the bound in (71) holds with probability $1 - \frac{1}{B^{0.1}}$ give
$$E_1 = O\Big(\frac{n}{B^{0.1}}\Big).$$
Recall that $R$ is the set of vertices such that $r \in R$ if $r$ is the root of a $T_{r,B}$ subtree. By summing over vertices $v$ at depth $k$ we get (73). We write $F$ for the expected value in (73). The conditional Cauchy-Schwarz and the conditional Markov inequalities then give (74). From (7) we have that for all $v$ with $d(v) = d$, conditioned on $G_d$ (i.e., the $\sigma$-field generated by $W_{j,v}$, $j \in \{1, \ldots, d\}$), $n_v \le n_v' + n_v''$, where
$$n_v' = \operatorname{Bin}\Big(n, \prod_{j=1}^{d} W_{j,v}\Big), \qquad n_v'' = \sum_{k=1}^{d}\operatorname{Bin}\Big(s_1, \prod_{j=k+1}^{d} W_{j,v}\Big).$$
Thus, (74) gives for $M_{n_v} \ge 1$ the bound (75), where in the last equality we apply that $E(n_v') = M_{n_v}$. For $M_{n_v} < 1$ we instead apply (74) to obtain (76). By applying the fact that the variance of a $\operatorname{Bin}(m, p)$ distribution is $m(p - p^2)$ we get $\operatorname{Var}(n_v' \mid G_d) \le M_{n_v}$, and from the Minkowski inequality we easily deduce that $E\big((n_v'')^2\big) = O(1)$. Hence, by using that we can bound $F$ as in (75) for $M_{n_v} \ge 1$, and by the bound in (76) for $M_{n_v} < 1$, we get that $F = O(1)$. Thus, from (73) we get (77). Hence, by applying Corollary 3.1 for $K = B$, we get from (77) that $E_2 = O\big(\frac{n}{B}\big)$. By applying Lemma 4.1 in combination with (72) and using the bounds of $E_1$ and $E_2$, Lemma 4.3 follows.

Proof of Lemma 4.4. By using the key renewal theorem [9, Theorem II.4.3] applied to $U(t)$ we get (79). Note that $\lim_{q\to\infty} \tilde Z(q) =: c_\alpha$ for some constant $c_\alpha$ only depending on $\alpha$. Thus, by using $Z(x) = e^{-x}\tilde Z(x)$, we get that for the constant $c_\alpha$ (only depending on $\alpha$),
$$E\big(|R_{\alpha B}|\big) = \big(c_\alpha + o(1)\big)\,\frac{n}{B},$$
which shows (46). Also note that we have $\sum_{\alpha\in S} c_\alpha = O(1)$.

4.2 Proof of Theorem 2.2
Proof. We use large deviations to show this theorem (in fact we get a sharper bound on the number of bad nodes). Note that a vertex $v$ belongs to the tree if and only if $n_v \ge 1$. Recall the upper bound on $n_v$ for $v$ with $d(v) = d$ in (7) above, i.e., conditioning on $G_d$, in a stochastic sense,
$$n_v \le \operatorname{Bin}\Big(n, \prod_{j=1}^{d} W_{j,v}\Big) + \sum_{k=1}^{d}\operatorname{Bin}\Big(s_1, \prod_{j=k+1}^{d} W_{j,v}\Big), \qquad(84)$$
where $W_{j,v}$, $j \in \{1, \ldots, d\}$, are i.i.d. random variables distributed as $V$. It is enough to consider only the first term $\operatorname{Bin}\big(n, \prod_{j=1}^{d} W_{j,v}\big)$ in (84), and prove that the number of bad nodes $v$ with $\operatorname{Bin}\big(n, \prod_{j=1}^{d} W_{j,v}\big) \ge 1$ is bounded by $O_{L^1}\big(\frac{n}{\ln^{k+1} n}\big)$, where we choose $k$ large enough. If $s_1 = 0$, then $\operatorname{Bin}\big(n, \prod_{j=1}^{d} W_{j,v}\big)$ is the only term in (84). We now explain why we can ignore the terms in $n_v$ that occur because of the parameter $s_1$. Assume that for split trees with $s_1 = 0$ the number of bad nodes is bounded by $O_{L^1}\big(\frac{n}{\ln^{k+1} n}\big)$. We first consider the vertices with $d \le \mu^{-1}\ln n - \ln^{0.5+\epsilon} n$. If $s_1 > 0$, we assume that we first add the $n$ balls as in the construction of a split tree with the parameter $s_1 = 0$. Hence, the number of vertices $v$ with $d \le \mu^{-1}\ln n - \ln^{0.5+\epsilon} n$ is bounded by $O_{L^1}\big(\frac{n}{\ln^{k+1} n}\big)$. We now repay the subtrees for their potential loss of balls because of $s_1 > 0$. A vertex $v$ at depth $d$ can have lost at most $s_1 d$ balls in the subtree rooted at $v$. These balls cannot give more than $s_1 b d$ nodes to the tree (since only if $s_0 = s_1 = 0$ is an increment of more than $b$ nodes possible when a new ball is added to the tree). Thus, since $d \le \mu^{-1}\ln n$ and since we assume that we have $O_{L^1}\big(\frac{n}{\ln^{k+1} n}\big)$ nodes before the repayment of the loss of balls, these additional balls cannot give more than $O_{L^1}\big(\frac{n}{\ln^{k} n}\big)$ nodes. Now we consider the vertices with $d \ge \mu^{-1}\ln n + \ln^{0.5+\epsilon} n$. Again we first distribute the $n$ balls assuming that $s_1 = 0$, and then repay for the potential loss of balls in the subtrees if $s_1 > 0$. First note that for $d = O(\ln n)$ we can argue as in the previous case; this means that the number of nodes with $\mu^{-1}\ln n + \ln^{0.5+\epsilon} n \le d \le K\ln n$, for an arbitrary constant $K$, is bounded by $O_{L^1}\big(\frac{n}{\ln^{k} n}\big)$. For larger $d$ we argue as follows. For any constant $K_1$, the Markov inequality (applied by first conditioning on $G_d$ and then taking expectations twice) shows that the expected number of vertices that get a repayment of at least $K_1 s_1\ln n + 2$ balls is bounded by $O\big(\frac{n}{b^{K_1\ln n}}\big)$. Since $s_1 > 0$, we can assume that $d \le n$. Hence, the expected number of balls of this contribution is $O\big(\frac{n^2}{b^{K_1\ln n}}\big)$; choosing $K_1$ large enough, this number is just $o(1)$ and can thus be ignored.
It remains to prove that if $s_1 = 0$, the number of vertices $v$ with $d(v)$ outside the strip in (2) is $O_{L^1}\big(\frac{n}{\ln^{k+1} n}\big)$ for any constant $k$. Note that an upper bound for the expected number of vertices at depth $d$ is given by
$$b^{d}\,P\big(n_v \ge s+1\big), \qquad(86)$$
where $v$ is a vertex at depth $d-1$. Note that this is true even in the case $s_0 = 0$, since for all internal nodes $n_v \ge s+1$. Choosing $t > 0$, an application of the Markov inequality implies that
$$P\big(n_v \ge s+1\big) \le \frac{E\big(n_v^t\big)}{(s+1)^t}. \qquad(87)$$
Thus, an upper bound for the expected profile of the vertices at depth $d$ is
$$b^{d}\,\frac{E\big(n_v^t\big)}{(s+1)^t}, \qquad(88)$$
where $v$ is a vertex at depth $d-1$.
First we show that the number of vertices $v$ (assuming $s_1 = 0$) with $d(v) \ge \mu^{-1}\ln n + \ln^{0.5+\epsilon} n$ is bounded by $O_{L^1}\big(\frac{n}{\ln^{k+1} n}\big)$. We prove this by choosing $t = \frac{1+\epsilon(n)}{2}$, where $\epsilon(n) > 0$ is a decreasing function of $n$ that we specify below, and showing that the resulting bound (89) holds. Let $X_d$ be a mixed binomial $\big(n, \prod_{j=1}^{d} W_j\big)$, where $W_j$, $j \in \{1, \ldots, d\}$, are i.i.d. random variables distributed as $V$. To show (89) it is enough to show that the expected value of $b^d X_d^t$, summed over the relevant depths, is $O\big(\frac{n}{\ln^{k+1} n}\big)$; that this is enough follows from the bound on $n_v$ in (84), since we assume that $s_1 = 0$. Suppose that $\epsilon(n) < 1$; then the Lyapounov inequality (which is a special case of the well-known Hölder inequality) gives
$$E\big(X_d^t\big) \le E\Big(\Big(n\prod_{j=1}^{d} W_j\Big)^{t}\Big) = n^t\big(E(V^t)\big)^d. \qquad(91)$$
Hence, to show (89) we deduce from the right-hand side of the second inequality in (91) that it is enough to bound $\sum_d b^d\,n^t\big(E(V^t)\big)^d$ over the relevant depths (92). Taylor expansion gives (93); taking expectations in (93) we get (94). Hence, by choosing $\epsilon(n) = \delta\ln^{-0.5+\epsilon} n$ for some constant $\delta$ (that we choose small enough), we get from the last inequality in (94) that for some constant $B > 0$ and any constant $k$,
$$\sum_{d \ge \mu^{-1}\ln n + \ln^{0.5+\epsilon} n} b^{d}\,E\big(X_d^t\big) = O\big(ne^{-B\ln^{2\epsilon} n}\big) = O\Big(\frac{n}{\ln^{k+1} n}\Big). \qquad(95)$$
We argue similarly for the vertices $v$ with $d(v) \le \mu^{-1}\ln n - \ln^{0.5+\epsilon} n$. In (88) let $t = \frac{1-\epsilon(n)}{2}$, where $\epsilon(n) = \delta\ln^{-0.5+\epsilon} n$ for some constant $\delta$ as above. In analogy with (89), an upper bound for the expected number of vertices $v$ with $d(v) \le \mu^{-1}\ln n - \ln^{0.5+\epsilon} n$ is given by (96). We use similar calculations as in (92)-(95) to show (97). This implies, in analogy with (89)-(92), that for some constant $B$ and any constant $k$, this expected number is $O\big(ne^{-B\ln^{2\epsilon} n}\big) = O\big(\frac{n}{\ln^{k+1} n}\big)$ (98). Hence, if $s_1 = 0$, the number of bad vertices is bounded by $O_{L^1}\big(\frac{n}{\ln^{k+1} n}\big)$ for any constant $k$. Thus, it follows from our previous explanation that the number of bad vertices for arbitrary $s_1 \ge 0$ is bounded by $O_{L^1}\big(\frac{n}{\ln^{k} n}\big)$.
Remark 4.2. We note from (94), (95) and (98) that we in fact get a sharper bound for the number of bad nodes, namely $O\big(ne^{-B\ln^{2\epsilon} n}\big)$ for some constant $B > 0$.
Remark 4.3. From the calculations in the proof of Theorem 2.2, in particular (94), we see that we get a much smaller error term for larger depths: for any constant $r$ there is a constant $C > 0$ such that the number of nodes with $d(v) \ge C\ln n$ is bounded by $O_{L^1}\big(\frac{1}{n^r}\big)$.

4.3 Proof of Theorem 2.3
Proof. We write
$$Z_n := \frac{D_n - \mu^{-1}\ln n}{\sqrt{\sigma^2\mu^{-3}\ln n}}. \qquad(99)$$
By a classical result in probability theory, see e.g. [10, Theorem 5.5.4], the limit law in (6) implies that (16) holds if $Z_n$ is uniformly integrable. In particular, this is true if $Z_n^2$ is uniformly integrable. This uniform integrability also gives
$$E\big(Z_n^2\big) \to 1. \qquad(100)$$
Furthermore, the convergence results in (16) and (100) imply (18) for $k = n$. By using the same coupling argument as for (5), it is easy to show that the convergence result for the expected depth in (16) implies the convergence result for the expected average depth in (17). Thus, it remains to show that $Z_n^2$ is uniformly integrable and that (18) for $k = n$ implies that (18) also holds for $\frac{n}{\ln n} \le k < n$. By a standard argument, see e.g. [10, Theorem 5.5.4], $Z_n^2$ is uniformly integrable if, for some $p > 1$ and $n_0$ large enough, $\sup_{n\ge n_0} E\big(|Z_n|^{2p}\big)$ is uniformly bounded. We choose $p = \frac{3}{2}$. We show this by using similar calculations as Devroye used in [3] for proving the limit law of $D_n$ in (6). First, consider an infinite random path $u_1, u_2, \ldots$ in the skeleton tree $S_b$, where $u_1$ is the root. Given $u_i$ and the split vector $\mathcal{V}_{u_i} = (V_1, \ldots, V_b)$ of $u_i$, the vertex $u_{i+1}$ is the $j$-th child of $u_i$ with probability $V_j$. Construct a random split tree with $n$ balls and let $u^*$ be the unique leaf on the infinite path. Then, by using a natural coupling, letting the $n$-th ball follow the random path, $D_n$ is in the stochastic sense less than or equal to the distance between $u^*$ and the root. In the coupling, $D_n$ is less than this distance if the $n$-th ball is sent to a leaf which splits and does not send this ball to one of its children (i.e., the $n$-th ball is one of the $s_0$ balls). If the $n$-th ball is one of the $s_1$ balls, it is added to a child of $p(u^*)$ (the parent of $u^*$), i.e., it ends up at the same depth as $u^*$. Recall that $H_n$ denotes the height of a split tree with $n$ balls. Writing $n(u_k)$ for the number of balls in the subtree rooted at $u_k$, for all $\beta > 0$ we have
$$P(D_n > k + \beta) \le P\big(n(u_k) \ge \beta\big) + P\big(H_\beta > \beta\big), \qquad(102)$$
and a corresponding bound (103) for $P(D_n < k)$. Recall that $\Delta = V_S$, where, given $(V_1, \ldots, V_b)$, $S = j$ with probability $V_j$.
Then $n(u_k)$ is stochastically bounded by
$$\operatorname{mBin}\Big(n, \prod_{j=1}^{k} \Delta_j\Big) + \sum_{i=1}^{k}\operatorname{mBin}\Big(s_1, \prod_{j=i+1}^{k} \Delta_j\Big), \qquad(104)$$
where the $\Delta_j$ are i.i.d. random variables distributed as $\Delta$.
Consider the probability $P\big(D_n > \mu^{-1}\ln n + x\sqrt{\ln n}\big)$ for $x \in \mathbb{R}_+$. We bound this by bounding the probabilities on the right-hand side of (102), choosing $\beta = \frac{x}{2}\ln^{0.2}(n)$. First note that the bound on $n(u_k)$ in (104) implies a corresponding stochastic bound (105). Thus, we can bound the first probability on the right-hand side of (102) by (106). For bounding the first probability on the right-hand side of the inequality in (106) we use [3, Lemma 4], which states a general result for bounding tail probabilities of mixed binomial $(m, Z)$ distributions, where $Z$ is a random variable; we thus obtain (107). From (107) we deduce (108) for $n$ large enough. Recall the notations $c := E(\Delta)$, $\mu = E(-\ln\Delta)$ and $\sigma^2 = \operatorname{Var}(\ln\Delta)$; note that $c < 1$. Since the $\Delta_j$, $j \in \{1, \ldots, k\}$, are i.i.d. random variables, we can use the Marcinkiewicz-Zygmund inequalities, see e.g. [10, Corollary 3.8.2], which give for $q \ge 2$,
$$E\Big(\Big|\sum_{j=1}^{k}\big(\ln\Delta_j + \mu\big)\Big|^q\Big) \le B_q\,k^{q/2}\,E\big(|\ln\Delta + \mu|^q\big), \qquad(109)$$
where $B_q$ is a constant only depending on $q$. By using the Markov inequality and (109), we get from (108) that (110) holds for $n$ large enough, with a constant $C < \infty$ (recall from Section 1.2 that all moments of $|\ln\Delta|$ are bounded). The Markov inequality then implies (111).

We now consider the other probability, i.e., $P(H_\beta > \beta)$. (Note that this probability is $0$ if $s_0 > 0$ or $s_1 > 0$.) By applying (86) we get
$$P\big(H_\beta > \beta\big) \le b^{\beta}\,P\big(n_v \ge s+1\big), \qquad(112)$$
where $v$ is a vertex at depth $\beta - 1$. From (87) we deduce for $t = 0.75$,
$$P\big(n_v \ge s+1\big) \le \frac{E\big(n_v^{0.75}\big)}{(s+1)^{0.75}}. \qquad(113)$$
Let $X_\beta$ be a mixed binomial $\big(n, \prod_{j=1}^{\beta} W_j\big)$, where $W_j$, $j \in \{1, \ldots, \beta\}$, are i.i.d. random variables distributed as $V$. Note, similarly as in (90), that (113) is bounded by the expectation of $\frac{X_\beta^{0.75}}{(s+1)^{0.75}}$ (114). We note, similarly as in (91), that the Lyapounov inequality gives
$$E\big(X_\beta^{0.75}\big) \le \Big(E\big(X_\beta\big)\Big)^{0.75}. \qquad(115)$$
Again, the fact that $E(W_j^2) < E(W_j) = \frac{1}{b}$ (since $W_j \in (0,1)$) gives that there is a $\delta > 0$ such that (116) holds.

We now consider the probability $P(D_n < k)$, where $k = \mu^{-1}\ln n - x\sqrt{\ln n}$ for $x \in \mathbb{R}_+$, and use the bound of the larger probability in (103). We have (117). Again, by applying [3, Lemma 4] and using similar calculations as in (107)-(110), we get (118) for $n$ large enough; thus $Z_n^2$ is uniformly integrable, so that (100) holds, which shows (18) for $k = n$.
From this result it is now easy to show, as we explain below, that (18) also holds for all $k$ with $\frac{n}{\ln n} \le k < n$. Recall that we denote the depth of ball $k$, when it is added to the tree, by $D^f_k$. As we argued when proving (5), in a stochastic sense, for $k \le n$,
$$D^f_k \le D_k. \qquad(120)$$
From (6) it follows that for all $k$ with $\frac{n}{\ln n} \le k \le n$ the depth $D^f_k$ obeys the same central limit law as $D_n$, since $\ln k = \ln n + O(\ln\ln n)$. By using this and (120), (18) follows for all $\frac{n}{\ln n} \le k < n$; as for $D_n$, this follows if, for $n_0$ large enough, the corresponding supremum of moments is uniformly bounded.

Proof of Corollary 2.1. By Corollary 3.1, the expected number of (good) vertices in $T_n$ that are not in the $T_{r,B}$ subtrees, for $B = \ln^{0.3} n$, is $O\big(\frac{n}{\ln^{0.3} n}\big)$. Hence, this bound implies that the expected value of the last equality in (126) is equal to the right-hand side of (19), which proves the corollary.

Proof of Corollary 2.2. As in Corollary 2.1, we only show the result for the good vertices, i.e., (22); from the proof it is obvious that (21) also holds, by applying Theorem 2.2 and showing that the number of bad vertices is covered by the error term. We observe the obvious fact that the sum of those $n_i$, $i \in \{1, \ldots, b^L\}$, which are less than $\frac{n}{b^{kL}}$ for large enough $k$, is bounded as in (127). (Note that by choosing $k$ large enough in (127), the power of the logarithm can be made arbitrarily large.) Using (128) we may thus assume that $n_i$ is at least $\frac{n}{b^{kL}}$ for a large enough constant $k$, since the smaller $n_i$ are covered by an error term $o\big(\frac{n}{\ln^2 n}\big)$. Recall that $V^*_{T_i}$ is the set of good vertices in $T_i$ and that $\Omega_L$ is the $\sigma$-field generated by $\{n_v,\, d(v) \le L\}$. Let the random variables $Z_i$ be defined as in (130). Assuming that $n_i$ is at least $\frac{n}{b^{kL}}$, a Taylor expansion gives (129). Since only good vertices are considered, and since the random variables, conditioned on $\Omega_L$, are independent for $i \in \{1, \ldots, b^L\}$,
$$\operatorname{Var}\Big(\sum_{i=1}^{b^L} Z_i\,\Big|\,\Omega_L\Big) = \sum_{i=1}^{b^L}\operatorname{Var}\big(Z_i\mid\Omega_L\big) \le \mu^3\sum_{i=1}^{b^L}\operatorname{Var}\Big(\sum_{v\in V^*_{T_i}} \frac{1}{\ln^{2-2\epsilon} n_i}\,\Big|\,\Omega_L\Big). \qquad(131)$$
Thus, the well-known Minkowski inequality and the fact that $E(N^2) = O(n^2)$ imply a bound on the right-hand side of (131), and Chebyshev's inequality then gives (22).

5 Results on the Total Path Lengths
We complete this study with some results, and a conjecture, concerning the "total path length" random variables. Recall from Section 1.2 the definitions of the two types of total path length, $\Psi(T)$ and $\Upsilon(T)$, i.e., the sum of the depths of the balls and the sum of the depths of the nodes, respectively. From (17) we have
$$E\big(\Psi(T_n)\big) = \mu^{-1}n\ln n + n\,q(n),$$
where $q(n) = o\big(\ln^{0.5} n\big)$ is a function that depends on the type of split tree.
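Both total path lengths are easy to compute on simulated trees. The following sketch (our own helper, reusing the `SplitTree` instance `tree` from Section 1.1) accumulates the depths of balls and of included vertices; for the binary search tree every vertex holds exactly one ball, so the two path lengths coincide.

```python
def total_path_lengths(v, depth=0):
    """Return (n_v, Psi, Upsilon) for the subtree rooted at v, where Psi
    sums depths of balls and Upsilon sums depths of non-useless vertices."""
    n, psi, ups = v["C"], depth * v["C"], depth
    for c in (v["children"] or []):
        cn, cp, cu = total_path_lengths(c, depth + 1)
        if cn > 0:  # vertices with n_v = 0 are useless and excluded
            n, psi, ups = n + cn, psi + cp, ups + cu
    return n, psi, ups

balls, Psi, Upsilon = total_path_lengths(tree.root)
print(balls, Psi, Upsilon)  # for the binary search tree Psi == Upsilon
```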
For proving Theorem 5.1 we will show that $\frac{\Gamma_n}{n}$ converges to a constant. We write $\sum^*_v$ for a sum over all vertices $v \in T_n$ except the root, i.e., $v \ne \sigma$. First we recall that the total path length for the balls is equal to the sum of all subtree sizes (except for that of the whole tree) counted in balls, i.e.,
$$\Psi(T_n) = \sum{}^{\!*}_{v}\; n_v, \qquad(136)$$
where $\sigma$ is the root of $T_n$. Similarly, we recall that the total path length for the nodes is equal to the sum of all subtree sizes (except for that of the whole tree) counted in nodes, i.e.,
$$\Upsilon(T_n) = \sum{}^{\!*}_{v}\; N_v, \qquad(137)$$
where $\sigma$ is the root of $T_n$. Hence, assuming (A3), we get from (137) the representation (139). We will again consider the $T_{r,B}$, $r \in R$, subtrees from the proof of Theorem 2.1 (defined such that $M_{n_r} := n\prod_{j=1}^{d(r)} W_j < B$ and $M_{n_{p(r)}} := n\prod_{j=1}^{d(r)-1} W_j \ge B$). However, here we choose $B$ differently, namely $B = \epsilon^{-20}$.

Lemma 5.1. Assume that (A1)-(A3) hold. Then (140) and (141) hold. Furthermore, (142) holds, with the quantities involved defined in (143).

Proof. Assuming (A3), we get from (140) that
$$\Gamma_n = O(n), \qquad(144)$$
where we applied Corollary 3.1 in the last equality. In the same way, for the vertices which are not in the $T_{r,B}$, $r \in R$, subtrees (ignoring the root $\sigma$), we deduce (145). Hence, (142) follows from (144) and (145).
Proof of Theorem 5.1. We use the same type of proof as for (14) in Theorem 2.1. We start with two arbitrary values of the cardinality, $n$ and $n'$, where $n' \ge n$, and show that
$$\Big|\frac{\Gamma_{n'}}{n'} - \frac{\Gamma_n}{n}\Big| = o(1). \qquad(146)$$
Since (146) implies that $\frac{\Gamma_n}{n}$ is Cauchy, it also converges to some constant as $n$ tends to infinity; hence we deduce Theorem 5.1. Recall that a main ingredient in the proof of Theorem 2.1 was (39) in Lemma 4.1; here we use the analogous application of (144) in Lemma 5.1, i.e., $\Gamma_n = O(n)$.
Recall that we prove Lemma 4.3 by showing
$$E\Big(\sum_{r} n_r\, I\{|n_r - M_{n_r}| \ge B^{0.6}\}\Big) = O\Big(\frac{n}{B^{0.1}}\Big), \qquad(147)$$
and then applying (39) in Lemma 4.1. In the same way, by using (147) and (141), as well as (142) in Lemma 5.1, we get that
$$\Gamma_n = E\Big(\sum_{r\in R} \Gamma_{n_r}\, I\{|n_r - M_{n_r}| \le B^{0.6}\}\Big) + o(n) + O\Big(\frac{n}{B^{0.1}}\Big). \qquad(148)$$
Note that, conditioned on $\Omega_L$, the summands $\Upsilon(T_i)$, $i \in \{1, \ldots, b^L\}$, are independent. By applying the Cauchy-Schwarz inequality, and using the facts that $E(N^2) = O(n^2)$ and that $E(D_k^2) = O(\ln^2 n)$ for all $k$, we deduce that
$$\operatorname{Var}\Big(\sum_{i=1}^{b^L} \Upsilon(T_i)\,\Big|\,\Omega_L\Big) = \sum_{i=1}^{b^L}\operatorname{Var}\big(\Upsilon(T_i)\mid\Omega_L\big) \le \sum_{i=1}^{b^L} E\big(\Upsilon(T_i)^2\mid\Omega_L\big) = \sum_{i=1}^{b^L} O\big(n_i^2\ln^2 n_i\big). \qquad(160)$$
Similarly as in (133), for any constant $k$ (and choosing the constant $\beta$ in $L$ large enough), taking expectations in (160) yields the bound (162) on $E\big(\operatorname{Var}\big(\sum_{i=1}^{b^L}\Upsilon(T_i)\mid\Omega_L\big)\big)$. Applying (162), the Chebyshev inequality gives that, conditioning on $\Omega_L$,
$$\sum_{i=1}^{b^L} \frac{\Upsilon(T_i)}{\mu^{-2}\ln^2 n_i} = \sum_{i=1}^{b^L} \frac{\alpha n_i}{\mu^{-1}\ln n_i} + \sum_{i=1}^{b^L} \frac{n_i\,r(n_i)}{\mu^{-2}\ln^2 n_i} + o_p\Big(\frac{n}{\ln^2 n}\Big). \qquad(163)$$
By applying Theorem 5.1, (128) and (129), we bound the sum $\sum_{i=1}^{b^L} \frac{n_i\,r(n_i)}{\mu^{-2}\ln^2 n_i}$ in (164), and (158) follows from (163) and (164).