On the independence number of some random trees

We show that for many models of random trees, the independence number divided by the size converges almost surely to a constant as the size grows to infinity; the trees that we consider include random recursive trees, binary and $m$-ary search trees, preferential attachment trees, and others. The limiting constant is computed, analytically or numerically, for several examples. The method is based on Crump-Mode-Jagers branching processes.


Introduction
The independence number, i.e., the maximum size of an independent set of nodes, is a quantity that has been studied for various models of random trees (and other random graphs, not considered here). In the present paper we consider rooted trees that can be constructed as family trees of a Crump-Mode-Jagers branching process stopped at a suitable stopping time; this includes, for example, random recursive trees, preferential attachment trees, fragmentation trees, binary search trees and $m$-ary search trees; see Section 2.1 and [7] for details, and the examples in Sections 4-8 below.
We denote the independence number of $T$ by $I(T)$. Our main result, Theorem 3.1, gives a strong law of large numbers for $I(T)$; more precisely, it shows convergence almost surely (a.s.) of $I(T_n)/|T_n|$, the fraction of nodes that belong to a maximum independent set, for a sequence $T_n$ of random trees. The limit $\nu$ is a constant depending on the random tree model; the theorem expresses this limit in terms of the solution $p(t)$ of the functional equation (3.3). We show in Sections 4-8 how this equation can be solved and $\nu$ found explicitly (at least numerically) in some important examples, viz. random recursive trees, binary search trees, preferential attachment trees, extended binary search trees and $m$-ary search trees (in particular $m = 3$).
Note that the cases of random recursive trees and binary search trees have been studied before. For random recursive trees, the expectation was found already by Meir and Moon [11,14]. More recently, both Dadedzi [4] and Fuchs et al. [5] prove (independently, and with different methods) the weak version (i.e., convergence in probability) of (3.2) below for random recursive trees and binary search trees, with explicit $\nu$, and also a much stronger central limit theorem. Nevertheless, we think that the present approach is of interest, since it is quite general; moreover, it gives convergence a.s. Furthermore, although we only prove a law of large numbers in the present paper, we hope that future development of our methods will also lead to a central limit theorem.

Date: 19 March, 2020; revised 20 March, 2020.
2010 Mathematics Subject Classification. 60C05; 05C05; 05C69.
Partly supported by the Knut and Alice Wallenberg Foundation.
Similar results have also been proved for other types of random trees. For simply generated trees, see e.g. Meir and Moon [10,12] and Banderier, Kuba and Panholzer [2]; for uniform unlabelled trees (rooted or unrooted), see Meir and Moon [13].

Remark 1.1. As is well known, for trees, several other quantities are determined by the independence number by linear relations, and our results thus immediately transfer to these quantities. These include, for example:
(i) The matching number, i.e., the maximum size of a partial matching. This equals $|T| - I(T)$.
(ii) The minimum size of a vertex cover, i.e., of a vertex set that contains at least one end-point of every edge. This equals the matching number, i.e., $|T| - I(T)$.
(iii) The nullity, i.e., the dimension of the kernel of the adjacency matrix, or the multiplicity of the eigenvalue 0 of the adjacency matrix. This equals $2I(T) - |T|$. (The results in [4] referred to below are actually stated for the nullity.)
See e.g. [5] for further examples.
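These linear relations are easy to verify by brute force on a tiny example. The following sketch (our own illustration, not part of the paper) checks (i)-(iii) on the star with root 0 and three leaves:

```python
import itertools
import numpy as np

# Star K_{1,3}: root 0 joined to leaves 1, 2, 3 (hypothetical toy example).
edges = [(0, 1), (0, 2), (0, 3)]
n = 4

def independence_number():
    # largest subset of nodes containing no edge
    best = 0
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            if all(not (u in S and v in S) for u, v in edges):
                best = max(best, r)
    return best

def matching_number():
    # largest set of pairwise vertex-disjoint edges
    best = 0
    for r in range(len(edges) + 1):
        for M in itertools.combinations(edges, r):
            used = [x for e in M for x in e]
            if len(used) == len(set(used)):
                best = max(best, r)
    return best

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
# nullity = number of (numerically) zero eigenvalues of the adjacency matrix
nullity = int(sum(abs(ev) < 1e-9 for ev in np.linalg.eigvalsh(A)))

I = independence_number()
print(I, matching_number(), nullity)  # prints: 3 1 2
```

Here the matching number $1$ equals $|T| - I(T) = 4 - 3$, and the nullity $2$ equals $2I(T) - |T|$, as in (i) and (iii).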

Preliminaries
We give some definitions and notation, together with some known results that will be used.
The number of nodes of a tree $T$ is denoted $|T|$.
If $T$ is a rooted tree, and $v \in T$ (i.e., $v$ is a node in $T$), then $T_v$ denotes the fringe subtree of $T$ at $v$, i.e., the subtree consisting of $v$ and all its descendants; $T_v$ is defined as a rooted tree with root $v$.
2.1. Family trees of branching processes. We follow [7, Section 5], to which we refer for further details. Let $T_t$ be the family tree of all individuals born up to time $t \ge 0$ in a given Crump-Mode-Jagers (CMJ) process, starting at time $t = 0$ with a single individual (the root). Let the children of the root be born at (random) times $(\xi_i)_1^N$, where $0 \le N \le \infty$ and $0 < \xi_1 \le \xi_2 \le \dots$. We regard the (multi)set of birth times as a point process $\Xi$; formally, $\Xi$ is the random (discrete) measure $\sum_i \delta_{\xi_i}$, where $\delta_t$ is the Dirac measure (point mass) at $t$. Moreover, each individual $x$ has its own copy $\Xi_x$ of $\Xi$; the processes $\Xi_x$ are i.i.d. (independent and identically distributed). Let $\sigma_x$ be the time individual $x$ is born. For simplicity we assume that all individuals live forever.
Let $Z_t$ be the number of individuals at time $t$. In the simplest, and most common, case, we define the stopping time
$\tau(n) := \inf\{t : Z_t \ge n\},$  (2.1)
the first time the number of individuals is at least $n$, and $T_n := T_{\tau(n)}$, the family tree at that time. (By the assumptions below, $\tau(n) < \infty$ a.s.) Thus $T_n$ is a random tree with $|T_n| \ge n$. Typically, the birth times $\xi_i$ are continuous random variables and a.s. no two births are simultaneous, and then $|T_n| = n$. More generally, we fix a weight $\psi(s)$. This is assumed to be a characteristic, i.e., a random function $\psi(s) \ge 0$ associated to the root and its point process $\Xi$, and we assume that each individual $x$ is equipped with its own copy $\psi_x(s)$ of $\psi$; the simplest case is that $\psi_x(s)$ is a deterministic function of the point process $\Xi_x$. (More generally, $\psi_x$ may also depend on the entire tree of descendants of $x$, and possibly also on some extra randomness, see [7, in particular Remark 5.10].) We assume $\psi_x \in D[0,\infty)$, and we exclude the trivial case $\psi(t) = 0$ for all $t \ge 0$ a.s. The argument $s \ge 0$ of $\psi_x(s)$ should be interpreted as the current age of $x$, which is $t - \sigma_x$ at time $t$. Let
$Z_t^\psi := \sum_{x: \sigma_x \le t} \psi_x(t - \sigma_x)$
be the total weight at time $t \ge 0$. We then let
$\tau(n) := \inf\{t : Z_t^\psi \ge n\},$
the first time the total weight is at least $n$. (We define $\inf \emptyset = \infty$.) Finally, as before, we define $T_n := T_{\tau(n)}$. Note that the choice $\psi(s) = 1$ gives $Z_t^\psi = Z_t$, and thus the simple definition (2.1) of $T_n$.
Examples of common random trees that can be constructed as T n in this way are given in Sections 4-8; see further [7].
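As a concrete illustration (our own sketch, with hypothetical helper names, not code from [7]): when $\Xi$ is a rate-1 Poisson process and $\psi = 1$, the construction can be simulated event by event with a priority queue of pending births, stopping at $\tau(n)$; by Section 4 below, the resulting family tree is then a random recursive tree.

```python
import heapq
import random

def cmj_tree(n, seed=1):
    # CMJ family tree with Xi a rate-1 Poisson process and weight psi = 1:
    # every individual begets children at Exp(1) interarrival times, and we
    # stop at tau(n), the first time Z_t >= n.
    rng = random.Random(seed)
    parent = {0: None}                       # root is individual 0
    events = [(rng.expovariate(1.0), 0)]     # (next birth time, parent id)
    while len(parent) < n:
        t, v = heapq.heappop(events)
        child = len(parent)
        parent[child] = v
        # schedule v's next child and the new child's first child
        heapq.heappush(events, (t + rng.expovariate(1.0), v))
        heapq.heappush(events, (t + rng.expovariate(1.0), child))
    return parent

tree = cmj_tree(10)
print(len(tree))  # prints: 10
```

Since a rate-1 Poisson process has Exp(1) interarrival times, it suffices to keep, for each individual, only the time of its next potential birth.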
Let $\mu := E\,\Xi$ be the intensity of the point process $\Xi$. In other words, $\mu$ is the (deterministic) measure on $[0,\infty)$ such that, for any Borel set $A$, $\mu(A)$ is the expected number of children of the root born at times $t \in A$. In particular, with $N \le \infty$ as above the (random) total number of children of the root, $\mu[0,\infty) = E\,N \le \infty$.
We use the following assumptions throughout the paper:
(A1) $\xi_1 > 0$, i.e., no children are born immediately at their parent's birth. (Equivalently, $\mu\{0\} = 0$.)
(A2) $\mu$ is not concentrated on any lattice $h\mathbb{Z}$, $h > 0$. (The results extend to the lattice case with suitable modifications, but we do not know any interesting examples and ignore this case.)
(A3) $N \ge 1$ a.s. and $E\,N > 1$. (Thus, every individual has at least one child, so the process never dies out, and $Z_\infty = \infty$ a.s.)
(A4) There exists a real number $\alpha$ (the Malthusian parameter) such that
$\int_0^\infty e^{-\alpha t}\, \mu(dt) = 1.$  (2.5)
2.2. Independence numbers. We collect here some simple and well-known properties of independence numbers of (rooted) trees; see e.g. [10]. For a tree $T$, let $I(T)$ be the independence number of $T$, i.e., the maximum size of an independent set of nodes. For a rooted tree $T$, let further $I_1(T)$ be the maximum size of an independent node set containing the root, and let $I_0(T)$ be the maximum size of an independent node set not containing the root. Thus
$I(T) = \max\big(I_0(T), I_1(T)\big).$  (2.6)
Furthermore, if the children of the root are $v_1, \dots, v_d$, then it is easily seen that
$I_0(T) = \sum_{i=1}^{d} I(T_{v_i}),$  (2.7)
$I_1(T) = 1 + \sum_{i=1}^{d} I_0(T_{v_i}).$  (2.8)
Since $I_0 \le I$ by (2.6), it follows that $I_1(T) \le I_0(T) + 1$, and thus
$I_0(T) \le I(T) \le I_0(T) + 1.$  (2.9)
Define the toll function
$\iota(T) := I(T) - I_0(T) \in \{0, 1\}.$  (2.10)
Then (2.7) yields
$I(T) = \iota(T) + \sum_{i=1}^{d} I(T_{v_i}),$  (2.11)
which shows that the independence number $I(T)$ is an additive functional on rooted trees with toll function $\iota(T)$. As is well known, (2.11) is equivalent to
$I(T) = \sum_{v \in T} \iota(T_v).$  (2.12)
Furthermore, by (2.10), (2.6) and (2.7)-(2.8),
$\iota(T) = \mathbf{1}\{I_1(T) > I_0(T)\} = \mathbf{1}\Big\{\sum_{i=1}^{d} \iota(T_{v_i}) = 0\Big\}.$  (2.13)
Say that a node $v \in T$ is essential if it belongs to every maximum independent set of $T_v$. This is equivalent to $I_1(T_v) > I_0(T_v)$, and thus to $\iota(T_v) = 1$. In other words,
$v$ is essential $\iff$ $\iota(T_v) = 1.$  (2.14)
In particular, $\iota(T)$ equals the indicator that the root is essential in $T$. Note also that, by (2.14) and (2.13), a node is essential if and only if none of its children is:
$v$ is essential $\iff$ no child of $v$ is essential.  (2.15)

Remark 2.1. By (2.12) and (2.14), the independence number $I(T)$ equals the number of essential nodes in $T$. Moreover, (2.15) implies that the set of essential nodes is independent, and thus an independent set of maximum size.
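The recursions (2.7)-(2.8) and the characterization of essential nodes translate directly into code. The following sketch (our own illustration, on a hypothetical six-node rooted tree given by child lists) checks that the number of essential nodes equals $I(T)$, as in Remark 2.1:

```python
# Rooted tree on nodes 0..5: root 0 has children 1, 2; node 1 has
# children 3, 4; node 4 has child 5 (hypothetical example tree).
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: [5], 5: []}

def I01(v):
    # returns (I_0(T_v), I_1(T_v)) via the recursions (2.7)-(2.8)
    i0, i1 = 0, 1
    for c in children[v]:
        c0, c1 = I01(c)
        i1 += c0               # root of T_v is used, so children are excluded
        i0 += max(c0, c1)      # I(T_c) = max(I_0, I_1), as in (2.6)
    return i0, i1

i0, i1 = I01(0)
I = max(i0, i1)                # (2.6)
# v is essential iff I_1(T_v) > I_0(T_v), i.e., iota(T_v) = 1, as in (2.14)
essential = [v for v in children if I01(v)[1] > I01(v)[0]]
print(I, sorted(essential))  # prints: 3 [2, 3, 5]
```

The essential nodes here are exactly the three leaves, which indeed form a maximum independent set, and none of them is a child of another, in accordance with (2.15).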

Main result
We next state our main theorem. Recall that the Laplace functional of the point process $\Xi$ is defined, for (measurable) functions $f \ge 0$ on $[0,\infty)$, as
$L(f) := E\, e^{-\int f \, d\Xi} = E\, e^{-\sum_i f(\xi_i)}.$  (3.1)

Theorem 3.1. Let $T_n$, $n \ge 1$, be random trees that can be defined as stopped family trees of Crump-Mode-Jagers processes as in Section 2.1, for some point process $\Xi = (\xi_i)$ satisfying assumptions (A1)-(A5) and some weight $\psi$. Then, as $n \to \infty$,
$I(T_n)/|T_n| \overset{\mathrm{a.s.}}{\longrightarrow} \nu := \alpha \int_0^\infty e^{-\alpha t}\, p(t)\, dt,$  (3.2)
where $\alpha$ is the Malthusian parameter and $p(t)$ is the unique function $[0,\infty) \to (0,1]$ satisfying
$p(t) = E \prod_{i: \xi_i \le t} \big(1 - p(t - \xi_i)\big) = L(f_t), \qquad f_t(s) := -\ln\big(1 - p(t-s)\big)\, \mathbf{1}\{s \le t\}.$  (3.3)

Note that the result does not depend on the choice of weight $\psi$.

Proof. By (2.12) and (2.14), $I(T_n)/|T_n|$ is the fraction of nodes in $T_n$ that are essential. We apply [7, Theorem 5.14(ii)] to the property that a node is essential. (This theorem is a special case of deep results by Jagers and Nerman [8; 15], see also Aldous [1].) Then [7, (5.23)-(5.24)] yield (3.2), with
$p(t) := P(\text{the root of } T_t \text{ is essential}).$  (3.4)
To see (3.3), condition on $\Xi = (\xi_i)_i$, i.e., on the sequence of times that the root gives birth. Then, the children of the root of $T_t$ are the individuals $i$ born at times $\xi_i \le t$. Each such child $i$ has grown a tree $T_t^i$ that has the same distribution as $T_{t - \xi_i}$, and thus
$P(\text{child } i \text{ is essential at time } t \mid \Xi) = p(t - \xi_i), \qquad \xi_i \le t.$  (3.5)
Furthermore, still conditioned on $\Xi$, the events in (3.5) for different $i$ are independent, and thus, using (2.13),
$P(\text{the root of } T_t \text{ is essential} \mid \Xi) = \prod_{i: \xi_i \le t} \big(1 - p(t - \xi_i)\big);$  (3.6)
taking the expectation yields (3.3). To see uniqueness, suppose that $p_1(t)$ is another solution of (3.3), and let $\Delta p(t) := |p_1(t) - p(t)|$. Since the product in (3.3) changes by at most $|p_1(t-\xi_i) - p(t-\xi_i)|$ when a single factor is changed, (3.3) implies
$\Delta p(t) \le E \sum_{i: \xi_i \le t} \Delta p(t - \xi_i) = \int_0^t \Delta p(t-s)\, \mu(ds).$  (3.7)
Fix $\beta > \alpha$, and define $h(t) := \sup_{s \le t} e^{-\beta s} \Delta p(s) \in [0, 1]$. Then, (3.7) yields
$e^{-\beta t} \Delta p(t) \le \int_0^t e^{-\beta(t-s)} \Delta p(t-s)\, e^{-\beta s}\, \mu(ds) \le h(t) \int_0^\infty e^{-\beta s}\, \mu(ds),$  (3.8)
and, since $\beta > \alpha$, (A4) implies
$\int_0^\infty e^{-\beta s}\, \mu(ds) < 1.$  (3.9)
Hence (3.8) yields $h(t) \le c\,h(t)$ with $c < 1$ given by (3.9), and thus (3.9) implies $h(t) = 0$ for any $t \ge 0$. Thus $p_1(t) = p(t)$, and the solution to (3.3) is unique.
Note that $T_0$ consists of the root only, and thus (3.4) yields the initial condition, also a trivial special case of (3.3),
$p(0) = 1.$

Remark 3.2. An explanation for the formula (3.2) for $\nu$ is that a random fringe tree of $T_n$ converges in distribution to $T_\tau$, the tree obtained by stopping the branching process at a time $\tau \sim \mathrm{Exp}(\alpha)$ independent of the branching process; thus $\nu$ is the probability that the root of $T_\tau$ is essential. See further [7].
Remark 3.3. It is sometimes convenient to define $p(u) := 0$ for $u < 0$; then the product in (3.3) can be written $\prod_i (1 - p(t - \xi_i))$, taking the product over all children (born yet or not). We will use this a couple of times, but note that all formulas for $p(t)$ assume $t \ge 0$.

The random recursive tree
The random recursive tree is an example of a random tree that can be constructed as in Section 2.1, taking $\Xi$ to be a Poisson process with constant intensity 1 on $[0,\infty)$ and the trivial weight $\psi(t) = 1$, see [7, Example 6.1]. Thus Theorem 3.1 applies and shows $I(T_n)/|T_n| \overset{\mathrm{a.s.}}{\longrightarrow} \nu$ as $n \to \infty$. To find the limit $\nu$, note first that by the standard formula [9, Theorem 3.9] for the Laplace functional of a rate 1 Poisson process,
$E \prod_i g(\xi_i) = \exp\Big(-\int_0^\infty \big(1 - g(s)\big)\, ds\Big),$  (4.1)
and thus (3.3) becomes
$p(t) = \exp\Big(-\int_0^t p(t-s)\, ds\Big).$  (4.2)
This can also be seen directly as follows. The number of children of the root at time $t$ has the Poisson distribution $\mathrm{Po}(t)$. Furthermore, a child born at time $s \le t$ has probability $p(t-s)$ of being essential at time $t$, and thus, by the independence properties of the branching process, the children of the root that are essential at time $t$ are born according to a random thinning of the rate 1 Poisson process; this thinning is a Poisson process on $[0,t]$ with intensity $p(t - \cdot)$. In particular, the number of children of the root of $T_t$ that are essential is $\mathrm{Po}(\lambda(t))$ with $\lambda(t) = \int_0^t p(t-s)\, ds$. By (2.15), the root is essential if and only if this number is 0, which has probability $e^{-\lambda(t)}$, and (4.2) follows.
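As a hedged numerical check (our own, not quoted from the text): differentiating (4.2) gives $p'(t) = -p(t)^2$ with $p(0) = 1$, hence $p(t) = 1/(1+t)$; with Malthusian parameter $\alpha = 1$, (3.2) then gives $\nu = \int_0^\infty e^{-t}/(1+t)\, dt$, which can be evaluated by quadrature:

```python
import numpy as np

# nu = int_0^infty e^{-t} p(t) dt with p(t) = 1/(1+t); we truncate at t = 60
# (the tail is below e^{-60}) and use the trapezoid rule on a fine grid.
t = np.linspace(0.0, 60.0, 600001)
f = np.exp(-t) / (1.0 + t)
nu = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))
print(round(nu, 4))  # prints: 0.5963
```

The value $\nu \approx 0.5963$ matches the constant reported for random recursive trees in the literature cited above.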

The binary search tree
The binary search tree is another example where Theorem 3.1 applies. Now each node gets two children, after waiting times that are independent and Exp(1). Again, ψ(t) = 1.
We proceed to find the limit $\nu$. In this case, (3.3) yields
$p(t) = \Big(1 - \int_0^t p(t-s)\, e^{-s}\, ds\Big)^2.$  (5.1)
This can also, perhaps more easily, be seen as follows. As always, if a child is born at time $s \le t$, then the probability that this child is essential at time $t$ is $p(t-s)$. Hence, the probability that the left child of the root is born and is essential at time $t$ is $\int_0^t p(t-s)\, e^{-s}\, ds$, and thus the probability that there is no left child that is essential equals $1 - \int_0^t p(t-s)\, e^{-s}\, ds$. The same holds for the right child, and since the two children appear and develop independently, (2.15) yields (5.1).
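Numerically (our own sketch, under the assumption that (5.1) reads $p(t) = (1 - \int_0^t e^{-s} p(t-s)\, ds)^2$): writing $q(t) := \int_0^t e^{-(t-s)} p(s)\, ds$, the probability that a fixed child of the root is born and essential at time $t$, gives the ODE $q' = (1-q)^2 - q$ with $q(0) = 0$ and $p = (1-q)^2$; since the Malthusian parameter is $\alpha = 1$ here, (3.2) gives $\nu = \int_0^\infty e^{-t}(1-q(t))^2\, dt$:

```python
import math

# Euler march for q' = (1-q)^2 - q, accumulating nu = int e^{-t} (1-q)^2 dt.
# q(t) tends to the fixed point (3 - sqrt(5))/2 of (1-q)^2 = q.
dt, T = 1e-4, 40.0
q, nu, t = 0.0, 0.0, 0.0
for _ in range(int(T / dt)):
    p = (1.0 - q) ** 2
    nu += math.exp(-t) * p * dt          # left Riemann sum
    q += ((1.0 - q) ** 2 - q) * dt       # Euler step
    t += dt
print(round(nu, 3))
```

This gives $\nu \approx 0.54$; the limit $q(\infty) = (3-\sqrt5)/2$ is the root of $(1-q)^2 = q$ in $(0,1)$.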

Preferential attachment trees
Consider now a preferential attachment tree, where nodes are added one by one, and each new node chooses a parent at random, with the probability of choosing a node $v$ as the parent proportional to $\chi d_v + \rho$, where $d_v$ is the current outdegree of $v$, and $\chi$ and $\rho$ are given constants. (Here $\rho > 0$ and either $\chi \ge 0$ or $\rho/|\chi|$ is an integer. Only the ratio $\chi/\rho$ is significant.) This random tree can be constructed by a CMJ process where an individual that already has $k$ children gets the next child with rate $\chi k + \rho$; see [7, Example 6.4]. Again the weight $\psi(t) = 1$. Thus Theorem 3.1 applies and shows $I(T_n)/|T_n| \overset{\mathrm{a.s.}}{\longrightarrow} \nu$ as $n \to \infty$. To find $p(t)$ and $\nu$, we use instead of (3.3) the following (closely related) argument. Consider also, for $\lambda > 0$, a modified branching process $X_\lambda$, where the starting individual (= the root) is special, and gets children with the rate $\chi k + \lambda$, where $k$ is the current number of children. All other individuals are as before, with rate $\chi k + \rho$. (If $\chi < 0$, we assume that $\lambda/|\chi|$ is an integer; this case will be enough below.) Let $p_\lambda(t)$ be the probability that the root is essential in the family tree of $X_\lambda$ at time $t$. Note that if $\lambda = \rho$, then the modified process equals the original one, and thus $p(t) = p_\rho(t)$.
Consider again the original process, let $t > 0$, and condition on the first child of the root being born at time $s \le t$. The probability that this child is not essential at time $t$ is $1 - p(t-s)$. Furthermore, if we ignore this child and its descendants, the rest of the tree evolves after time $s$ as the modified process $X_{\chi+\rho}$. Hence, the probability that no child of the root after the first is essential at time $t$ is $p_{\chi+\rho}(t-s)$. Consequently, conditioned on the first child being born at time $s \le t$, the probability that the root has no essential child at time $t$ is $(1 - p(t-s))\, p_{\chi+\rho}(t-s)$. By (2.15), this is the conditional probability that the root is essential at time $t$, and since the time the first child is born has the distribution $\mathrm{Exp}(\rho)$, we have
$p(t) = e^{-\rho t} + \int_0^t \rho e^{-\rho s}\, \big(1 - p(t-s)\big)\, p_{\chi+\rho}(t-s)\, ds.$  (6.1)
If we have two independent modified processes $X_{\lambda_1}$ and $X'_{\lambda_2}$, then we may merge them by identifying the two roots. This yields a modified process $X_{\lambda_1+\lambda_2}$ with parameter $\lambda = \lambda_1 + \lambda_2$. There are no essential children of the root in the combined process if and only if there are none in both modified processes taken separately; hence, it follows that, for any $t \ge 0$ and $\lambda_1, \lambda_2 > 0$,
$p_{\lambda_1+\lambda_2}(t) = p_{\lambda_1}(t)\, p_{\lambda_2}(t).$  (6.2)
Fix $t > 0$. Since $0 < p_\lambda(t) \le 1$, (6.2) implies that $\lambda \mapsto p_\lambda(t)$ is decreasing, which in turn implies that (6.2) has the solution $p_\lambda(t) = e^{-C(t)\lambda}$ for some $C(t) \ge 0$. Hence, since $p(t) = p_\rho(t)$,
$p_{\chi+\rho}(t) = p(t)^{(\chi+\rho)/\rho} = p(t)^{1+\chi/\rho}.$  (6.3)
Combining (6.1) and (6.3) yields the functional equation
$p(t) = e^{-\rho t} + \int_0^t \rho e^{-\rho s}\, \big(1 - p(t-s)\big)\, p(t-s)^{1+\chi/\rho}\, ds.$  (6.4)
Again, $p$ is infinitely differentiable, and taking the derivative yields
$p'(t) = -\rho\Big(p(t) - \big(1 - p(t)\big)\, p(t)^{1+\chi/\rho}\Big).$  (6.5)
Define
$h(x) := x - (1-x)\, x^{1+\chi/\rho}, \qquad 0 \le x \le 1.$  (6.6)
Then (6.5) can be written
$p'(t) = -\rho\, h\big(p(t)\big).$  (6.7)
Let $q := \max\{x \in [0,1) : h(x) = 0\}$; by continuity and $h(0) = 0$, this maximum always exists. Furthermore, if $\chi \ge 0$, then (6.6) implies $h(x) > 0$ on $(0,1]$, and thus $q = 0$. On the other hand, if $\chi < 0$, then $h(x) < 0$ for small positive $x$, and thus $0 < q < 1$. The function $p(t)$ is continuous on $[0,\infty)$, with $p(0) = 1$. Suppose that $p(t) = q$ for some $t < \infty$, and let $t_0$ be the smallest such $t$. Then $p(t) \in [q, 1]$ for $t \in [0, t_0]$.
However, $h(x)$ is continuously differentiable and thus Lipschitz on $[q,1]$, as is seen by considering the cases $\chi \ge 0$ and $\chi < 0$ separately, and thus the differential equation (6.7) has at most one solution for $t \in [0, t_0]$ with $p(t) \in [q,1]$ and $p(t_0) = q$. Since $p(t) \equiv q$ is another such solution of (6.7), this is a contradiction. Hence, $p(t) \ne q$ for every finite $t$, and thus, by continuity, $p(t) > q$ for all $t \ge 0$. This further implies, by (6.7) again, that $p(t)$ is strictly decreasing on $[0,\infty)$. Hence the limit $p(\infty) := \lim_{t\to\infty} p(t)$ exists. Then (6.7) implies $p'(t) \to -\rho h(p(\infty))$ as $t \to \infty$, and thus $p(\infty) > q$ is impossible; hence,
$p(\infty) = q.$  (6.8)
Define
$\Psi(x) := \int_x^1 \frac{dy}{h(y)}, \qquad q < x \le 1.$  (6.10)
Then (6.7) yields $\frac{d}{dt}\Psi(p(t)) = -p'(t)/h(p(t)) = \rho$, and thus, since $\Psi(p(0)) = \Psi(1) = 0$,
$\Psi\big(p(t)\big) = \rho t, \qquad \text{i.e.,} \qquad p(t) = \Psi^{-1}(\rho t).$  (6.12)
It follows from (6.12), letting $t \to \infty$, that the limit $\Psi(q) := \lim_{x \searrow q} \Psi(x) = \infty$, which also easily can be seen directly from (6.10). The Malthusian parameter $\alpha = \chi + \rho$, see [7, (6.20)], and thus (3.2) and (6.12) yield
$\nu = (\chi+\rho) \int_0^\infty e^{-(\chi+\rho)t}\, \Psi^{-1}(\rho t)\, dt.$  (6.14)
The change of variables $s = \Psi(x)$ and an integration by parts yield the formulas
$\nu = \Big(1 + \frac{\chi}{\rho}\Big) \int_q^1 \frac{x}{h(x)}\, e^{-(1+\chi/\rho)\Psi(x)}\, dx$  (6.16)
$\;\; = 1 - \int_q^1 e^{-(1+\chi/\rho)\Psi(x)}\, dx.$  (6.17)
These integrals can be evaluated numerically.

Example 6.2. Let $\chi = \rho = 1$; this yields the standard preferential attachment random tree. (This is the same as the plane oriented recursive tree [16]; it is a special case of the preferential attachment graphs [3], [6, Chapter 8].) We have $\chi/\rho = 1$, and thus (6.6) yields
$h(x) = x - (1-x)x^2 = x(1 - x + x^2).$
We find from (6.10), by a partial fraction expansion,
$\Psi(x) = -\ln x + \tfrac12 \ln(x^2 - x + 1) + \tfrac{1}{\sqrt3}\Big(\tfrac{\pi}{6} - \arctan\tfrac{2x-1}{\sqrt3}\Big),$
and thus (6.17) yields, with an integral that magically has an elementary primitive function,
$\nu = 1 - \int_0^1 \frac{x^2}{x^2 - x + 1}\, e^{-\frac{2}{\sqrt3}\left(\frac{\pi}{6} - \arctan\frac{2x-1}{\sqrt3}\right)}\, dx = 1 - e^{-2\pi/(3\sqrt3)}.$
Thus, Theorem 3.1 shows that for the standard preferential attachment tree,
$I(T_n)/|T_n| \overset{\mathrm{a.s.}}{\longrightarrow} \nu = 1 - e^{-2\pi/(3\sqrt3)} \approx 0.7016.$
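The case $\chi = \rho = 1$ can also be checked numerically (our own sketch, assuming the differentiated equation takes the form $p' = -\rho h(p)$ with $h(x) = x - (1-x)x^{1+\chi/\rho}$): here $p' = -(p - (1-p)p^2)$, $p(0) = 1$, the Malthusian parameter is $\alpha = \chi + \rho = 2$, and (3.2) gives $\nu = 2\int_0^\infty e^{-2t} p(t)\, dt$:

```python
import math

# Euler march for p' = -(p - (1-p) p^2) = -p (1 - p + p^2), p(0) = 1,
# accumulating nu = 2 int_0^infty e^{-2t} p(t) dt by a left Riemann sum.
dt, T = 1e-4, 30.0
p, nu, t = 1.0, 0.0, 0.0
for _ in range(int(T / dt)):
    nu += 2.0 * math.exp(-2.0 * t) * p * dt
    p += -(p - (1.0 - p) * p ** 2) * dt
    t += dt
print(round(nu, 3))
```

The result, $\nu \approx 0.70$, is consistent with the elementary primitive mentioned in Example 6.2, which under our reading evaluates to $1 - e^{-2\pi/(3\sqrt3)} \approx 0.7016$.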

Extended binary search trees
An extended binary search tree is a binary search tree where we have added further leaves at all possible places; thus the original nodes (called internal nodes) each have two children, and the new nodes (called external nodes) have no children. This can be constructed by a CMJ process where each individual gets twins after an Exp(1) time (and no further children). Note that in the tree $T_t$, the internal nodes are the ones that have had children, while the others are external nodes.
We may choose to measure the size of an extended binary search tree in three different ways: the total number of nodes, the number of internal nodes, or the number of external nodes. (These are related in simple ways, since in the binary case treated here, the number of external nodes is always 1 + the number of internal nodes.) We obtain these three versions as our $T_n$ by choosing different weight functions $\psi$; $\psi(t) = 1$ as usual gives the total number of vertices, while the number of internal vertices is given by $Z_t^\psi$ with $\psi(t) := \mathbf{1}\{\xi_1 \le t\}$, and the number of external vertices by $\psi(t) := \mathbf{1}\{\xi_1 > t\}$. Theorem 3.1 applies to all three versions, and gives the same limit $\nu$.
The $m$-ary search tree

The $m$-ary search tree can also be constructed as in Section 2.1; now the weight $\psi(s)$ represents the number of keys in the node, and each individual (node) starts by gaining weight. It starts with $\psi(0) = 1$, and $\psi$ then increases by 1 after successive independent waiting times $Y_2, \dots, Y_{m-1}$ with $Y_i \sim \mathrm{Exp}(i)$. At time $S := \sum_{i=2}^{m-1} Y_i$ the weight thus reaches $m-1$; this marks puberty, and the node becomes fertile and gets $m$ children after further independent waiting times $X_i \sim \mathrm{Exp}(1)$. (Thus, child $i$ is born at $S + X_i$.) Theorem 3.1 thus applies. To find $\nu$, we condition on $S$ and find that if $0 \le s \le t$, then (with $p(u) = 0$ for $u < 0$, see Remark 3.3)
$P(\text{the root is essential at time } t \mid S = s) = \Big(1 - \int_0^{t-s} e^{-u}\, p(t-s-u)\, du\Big)^m.$
These equations can (as far as we know) only be solved numerically, and then $\nu$ can be computed numerically by (3.2), with the Malthusian parameter $\alpha = 1$ [7]. We obtain $\nu =$
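For the ternary case $m = 3$, a hedged numerical sketch is possible (the system below is our own reading of the conditioning on $S$, not equations quoted from the text): here $S = Y_2 \sim \mathrm{Exp}(2)$, and writing $g(t) := \int_0^t e^{-u} p(t-u)\, du$ for the probability that a fixed child of the root is born and essential at time $t$, conditioning on $S$ and differentiating leads to the ODE system $g' = p - g$, $p' = 2((1-g)^3 - p)$ with $p(0) = 1$, $g(0) = 0$; then $\nu = \int_0^\infty e^{-t} p(t)\, dt$ since $\alpha = 1$:

```python
import math

# Euler march for g' = p - g, p' = 2((1-g)^3 - p), accumulating
# nu = int_0^infty e^{-t} p(t) dt by a left Riemann sum.
# As t -> infinity, g tends to the root of g = (1-g)^3, about 0.3177.
dt, T = 1e-4, 40.0
p, g, nu, t = 1.0, 0.0, 0.0, 0.0
for _ in range(int(T / dt)):
    nu += math.exp(-t) * p * dt
    p, g = p + 2.0 * ((1.0 - g) ** 3 - p) * dt, g + (p - g) * dt
    t += dt
print(round(nu, 3))
```

The same scheme extends to general $m$ by tracking the distribution of $S$ as a sum of exponentials; we stress that this block is an illustration of the numerical route, not the paper's own computation.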