Simply generated trees, conditioned Galton--Watson trees, random allocations and condensation

We give a unified treatment of the limit, as the size tends to infinity, of simply generated random trees, including both the well-known result in the standard case of critical Galton--Watson trees and similar but less well-known results in the other cases (i.e., when no equivalent critical Galton--Watson tree exists). There is a well-defined limit in the form of an infinite random tree in all cases; for critical Galton--Watson trees this tree is locally finite but for the other cases the random limit has exactly one node of infinite degree. The proofs use a well-known connection to a random allocation model that we call balls-in-boxes, and we prove corresponding theorems for this model. This survey paper contains many known results from many different sources, together with some new results.


Svante Janson

Simply generated trees and Galton-Watson trees
We suppose that we are given a fixed weight sequence $w = (w_k)_{k \ge 0}$ of non-negative real numbers. We then define the weight of a finite rooted and ordered (a.k.a. plane) tree $T$ by
$$w(T) := \prod_{v \in T} w_{d^+(v)},$$
taking the product over all nodes $v$ in $T$, where $d^+(v)$ is the outdegree of $v$. Trees with such weights are called simply generated trees and were introduced by Meir and Moon [24]. We let $\mathcal{T}_n$ be the random simply generated tree obtained by picking a tree with $n$ nodes at random with probability proportional to its weight. (To avoid trivialities, we assume that $w_0 > 0$ and that there exists some $k \ge 2$ with $w_k > 0$. We consider only $n$ such that there exists some tree with $n$ vertices and positive weight.)
One particularly important case is when $\sum_{k=0}^\infty w_k = 1$, so the weight sequence $(w_k)$ is a probability distribution on $\mathbb{Z}_{\ge 0}$. (We then say that $(w_k)$ is a probability weight sequence.) In this case we let $\xi$ be a random variable with the corresponding distribution: $P(\xi = k) = w_k$. It is easily seen that the simply generated random tree $\mathcal{T}_n$ equals the conditioned Galton-Watson tree with offspring distribution $\xi$, i.e., the random Galton-Watson tree defined by $\xi$ conditioned on having exactly $n$ vertices.
One of the reasons for the interest in these trees is that many kinds of random trees occurring in various applications (random ordered trees, unordered trees, binary trees, ...) can be seen as simply generated random trees and conditioned Galton-Watson trees, see e.g. Aldous [3,4], Devroye [9] and Drmota [10].
It is easily seen that if $a, b > 0$ and we change $w_k$ to
$$\tilde w_k := a b^k w_k, \tag{2}$$
then the distribution of $\mathcal{T}_n$ is not changed. Indeed, every tree with $n$ nodes has total outdegree $n - 1$, so the weight of each such tree is multiplied by the same factor $a^n b^{n-1}$. In other words, the new weight sequence $(\tilde w_k)$ defines the same simply generated random trees $\mathcal{T}_n$ as $(w_k)$. (This is essentially due to Kennedy [19], who did not consider trees but showed the corresponding result for Galton-Watson processes. See also Aldous [4].) We say that weight sequences $(w_k)$ and $(\tilde w_k)$ related by (2) (for some $a, b > 0$) are equivalent. In many cases it is possible to change the weight sequence $(w_k)$ to an equivalent probability weight sequence; in this case $\mathcal{T}_n$ can thus be seen as a conditioned Galton-Watson tree. Moreover, in many cases this can be done such that the resulting probability distribution has mean 1. In such cases it thus suffices to consider the case of a probability weight sequence with mean $E\xi = 1$; then $\mathcal{T}_n$ is a conditioned critical Galton-Watson tree. It turns out that this is a nice and natural setting, with many known results proved by many different authors. We extend here some of these results to the general case, including cases where no equivalent probability weight sequence exists.
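As a small illustration of the definitions above, the following sketch enumerates all plane trees with $n$ nodes, computes their weights and the resulting distribution of $\mathcal{T}_n$, and checks on one example that an equivalent weight sequence gives the same distribution. The tree encoding (a tree is the tuple of its subtrees), the function names and the example weights are assumptions made for this illustration only, and brute-force enumeration is feasible only for small $n$.

```python
from fractions import Fraction
from itertools import product


def compositions(total, parts):
    """Yield all tuples of `parts` positive integers summing to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def plane_trees(n):
    """Yield all rooted ordered trees with n nodes; a tree is the tuple of its subtrees."""
    if n == 1:
        yield ()  # a single leaf
        return
    for d in range(1, n):                     # outdegree of the root
        for sizes in compositions(n - 1, d):  # sizes of the d subtrees
            for subtrees in product(*(list(plane_trees(s)) for s in sizes)):
                yield subtrees


def weight(tree, w):
    """Weight of a tree: product over all nodes v of w[outdegree of v]."""
    result = w[len(tree)]
    for child in tree:
        result *= weight(child, w)
    return result


def tree_distribution(n, w):
    """Distribution of the random tree with n nodes: probability proportional to weight."""
    trees = list(plane_trees(n))
    weights = [weight(t, w) for t in trees]
    total = sum(weights)
    return {t: wt / total for t, wt in zip(trees, weights)}


# Hypothetical example weights w_0..w_3 and an equivalent sequence a * b**k * w_k.
w = [Fraction(x) for x in (1, 2, 3, 1)]
a, b = Fraction(3), Fraction(1, 2)
w_tilde = [a * b**k * wk for k, wk in enumerate(w)]
print(tree_distribution(4, w) == tree_distribution(4, w_tilde))
```

Exact rational arithmetic is used so that the two distributions can be compared with `==` rather than up to rounding.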

Notation
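We consider a fixed weight sequence $w = (w_k)_{k \ge 0}$; we let
$$\Phi(z) := \sum_{k=0}^\infty w_k z^k$$
be the generating function of the given weight sequence, and let $\rho \in [0, \infty]$ be its radius of convergence. We further define, for $t$ such that $\Phi(t) < \infty$,
$$\Psi(t) := \frac{t\Phi'(t)}{\Phi(t)} = \frac{\sum_{k=0}^\infty k w_k t^k}{\sum_{k=0}^\infty w_k t^k},$$
and we let $\nu := \Psi(\rho)$, interpreted as $\lim_{t \nearrow \rho} \Psi(t)$ when $\Phi(\rho) = \infty$. In particular, if $\Phi(\rho) < \infty$, then
$$\nu = \frac{\rho\Phi'(\rho)}{\Phi(\rho)} \le \infty.$$
Then $\nu = 0 \iff \rho = 0$, and if $\rho > 0$, then
$$\nu = \lim_{t \nearrow \rho} \Psi(t) \in (0, \infty].$$
If $\rho > 0$, then $\nu$ is the supremum of the means of all probability weight sequences equivalent to $(w_k)$.

Main result for simply generated random trees
Our main limit theorem for simply generated random trees is the following. The case when $\nu \ge 1$ and $\sigma^2 < \infty$ was shown implicitly by Kennedy [19] (who considered Galton-Watson processes and not trees), and explicitly by Aldous and Pitman [5], see also Grimmett [14], Kolchin [21], Kesten [20] and Aldous [4]. Special cases with $0 < \nu < 1$ and $\nu = 0$ are given by Jonsson and Stefánsson [18] and Janson, Jonsson and Stefánsson [17], respectively.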
The limit (in distribution) in the theorem is for a topology where convergence means convergence of outdegree for any fixed node; it thus really means local convergence close to the root. (It is for this purpose convenient to regard the trees as subtrees of the infinite Ulam-Harris tree.) See [16] for details.
Theorem 3.1 Let $w = (w_k)_{k \ge 0}$ be any weight sequence with $w_0 > 0$ and $w_k > 0$ for some $k \ge 2$.

Remark 3.4 If we replace $(w_k)$ by an equivalent weight sequence $(\tilde w_k)$, we obtain the same distribution $(\pi_k)$.
Remark 3.5 If $\rho > 0$, then $\tau > 0$ and the distribution $(\pi_k)$ is a probability weight sequence equivalent to $(w_k)$. There are other equivalent probability weight sequences, but Theorem 3.1 shows that $(\pi_k)$ has a special role and therefore is a canonical choice of a weight sequence in its equivalence class. Furthermore, $(\pi_k)$ is the unique probability distribution with mean 1 that is equivalent to $(w_k)$, if any such distribution exists. If no such distribution exists but $\rho > 0$, then $(\pi_k)$ is the probability distribution equivalent to $(w_k)$ that has the maximal mean.
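The canonical distribution $(\pi_k)$ can be computed numerically in simple cases. The sketch below makes several assumptions that are not restated in the text above: that $\Psi(t) = t\Phi'(t)/\Phi(t)$, that for $\nu \ge 1$ the number $\tau$ solves $\Psi(\tau) = 1$, and that $\pi_k = \tau^k w_k / \Phi(\tau)$. It only handles a finitely supported weight sequence with $w_0 > 0$ and $w_k > 0$ for some $k \ge 2$, for which $\Phi$ is a polynomial, $\rho = \infty$ and $\Psi$ increases to the largest index in the support. The function names and example weights are hypothetical.

```python
def phi(t, w):
    """Generating function Phi(t) = sum_k w[k] * t**k (finite support)."""
    return sum(wk * t**k for k, wk in enumerate(w))


def psi(t, w):
    """Psi(t) = t * Phi'(t) / Phi(t), the mean of the tilted sequence w[k] * t**k."""
    return sum(k * wk * t**k for k, wk in enumerate(w)) / phi(t, w)


def tau_and_pi(w, iterations=200):
    """Solve Psi(tau) = 1 by bisection and return (tau, pi).

    Assumes w[0] > 0 and w[k] > 0 for some k >= 2, so that Psi is
    increasing from 0 to a limit larger than 1 and a root exists.
    """
    hi = 1.0
    while psi(hi, w) < 1.0:   # find an upper bracket for the root
        hi *= 2.0
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if psi(mid, w) < 1.0:
            lo = mid
        else:
            hi = mid
    tau = hi
    norm = phi(tau, w)
    pi = [wk * tau**k / norm for k, wk in enumerate(w)]
    return tau, pi


# Hypothetical example: w = (1, 2, 1), i.e. Phi(t) = (1 + t)**2.
tau, pi = tau_and_pi([1.0, 2.0, 1.0])
mean = sum(k * pk for k, pk in enumerate(pi))
print(tau, pi, mean)
```

In line with Remark 3.4, applying `tau_and_pi` to an equivalent sequence `a * b**k * w[k]` should return the same `pi` up to rounding, with `tau` divided by `b`.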
Remark 3.6 When $\nu \ge 1$, the quantity $\sigma^2$ is a natural parameter of the weight sequence $(w_k)$, which frequently occurs in asymptotic results. In this case a calculation yields also the formula $\sigma^2 = \tau^2 \Phi''(\tau)/\Phi(\tau)$ [4].

The infinite limit tree
Let $(\pi_k)_{k \ge 0}$ be a probability distribution on $\mathbb{N}_0$ and let $\xi$ be a random variable on $\mathbb{N}_0$ with distribution $(\pi_k)_{k=0}^\infty$: $P(\xi = k) = \pi_k$. We assume that the expectation $\mu := E\xi = \sum_k k\pi_k \le 1$ (the subcritical or critical case). We then define (based on Kesten [20] and Jonsson and Stefánsson [18]) a modified Galton-Watson tree $\hat{\mathcal{T}}$ as follows: There are two types of nodes: normal and special, with the root being special. Normal nodes have offspring (outdegree) according to independent copies of $\xi$, while special nodes have offspring according to independent copies of $\hat\xi$, where
$$P(\hat\xi = k) := \begin{cases} k\pi_k, & k = 0, 1, 2, \dots, \\ 1 - \mu, & k = \infty. \end{cases} \tag{14}$$
(Note that this is a probability distribution on $\overline{\mathbb{N}}_1 := \{1, 2, \dots\} \cup \{\infty\}$.) Moreover, all children of a normal node are normal; when a special node gets an infinite number of children, all are normal; when a special node gets a finite number of children, one of its children is selected uniformly at random and is special, while all other children are normal.
Since each special node has at most one special child, the special nodes form a path from the root; we call this path the spine of $\hat{\mathcal{T}}$. We distinguish two different cases:
(T1) If $\mu = 1$ (the critical case), then $\hat\xi < \infty$ a.s. so each special node has a special child and the spine is an infinite path. Each outdegree $d^+(v)$ in $\hat{\mathcal{T}}$ is finite, so the tree is infinite but locally finite.
In this case, the distribution of $\hat\xi$ in (14) is the size-biased distribution of $\xi$, and $\hat{\mathcal{T}}$ is the size-biased Galton-Watson tree defined by Kesten [20], see also Aldous [4], Aldous and Pitman [5] and Lyons, Pemantle and Peres [23]. The underlying size-biased Galton-Watson process is the same as the Q-process studied in Athreya and Ney [6, Section I.14], which is an instance of Doob's h-transform.
(See Lyons, Pemantle and Peres [23] for further related constructions in other contexts and Geiger and Kauffmann [13] for a generalization.) An alternative construction of the random tree $\hat{\mathcal{T}}$ is to start with the spine (an infinite path from the root) and then at each node in the spine attach further branches; the number of branches at each node in the spine is a copy of $\hat\xi - 1$ and each branch is a copy of the Galton-Watson tree $\mathcal{T}$ with offspring distributed as $\xi$; furthermore, at a node where $k$ new branches are attached, the number of them attached to the left of the spine is uniformly distributed on $\{0, \dots, k\}$. (All random choices are independent.) Since the critical Galton-Watson tree $\mathcal{T}$ is a.s. finite, it follows that $\hat{\mathcal{T}}$ a.s. has exactly one infinite path from the root, viz. the spine.
(T2) If $\mu < 1$ (the subcritical case), then a special node has with probability $1 - \mu$ no special child. Hence, the spine is a.s. finite and the number $L$ of nodes in the spine has a (shifted) geometric distribution $\mathrm{Ge}(1 - \mu)$,
$$P(L = \ell) = (1 - \mu)\mu^{\ell - 1}, \qquad \ell = 1, 2, \dots. \tag{15}$$
The tree $\hat{\mathcal{T}}$ has a.s. exactly one node with infinite outdegree, viz. the top of the spine. $\hat{\mathcal{T}}$ has a.s. no infinite path.
In this case, an alternative construction of $\hat{\mathcal{T}}$ is to start with a spine of random length $L$, where $L$ has the geometric distribution (15). We attach as in (T1) further branches that are independent copies of the Galton-Watson tree $\mathcal{T}$; at the top of the spine we attach an infinite number of branches and at all other nodes in the spine the number we attach is a copy of $\xi^* - 1$ where $\xi^* \overset{d}{=} (\hat\xi \mid \hat\xi < \infty)$ has the size-biased distribution $P(\xi^* = k) = k\pi_k/\mu$. The spine thus ends with an explosion producing an infinite number of branches, and this is the only node with an infinite degree. This is the construction by Jonsson and Stefánsson [18].
Example 4.1 In the extreme case $\mu = 0$, or equivalently $\xi = 0$ a.s., i.e., $\pi_0 = 1$ and $\pi_k = 0$ for $k \ge 1$, (14) shows that $\hat\xi = \infty$ a.s. Hence, every normal node has no child and is thus a leaf, while every special node has an infinite number of children, all normal. Consequently, the root is the only special node, the spine consists of the root only (i.e., its length $L = 1$), and the tree $\hat{\mathcal{T}}$ consists of the root with an infinite number of leaves attached to it, i.e., $\hat{\mathcal{T}}$ is an infinite star. (This is also given directly by the alternative construction in (T2) above.) In contrast, the Galton-Watson tree $\mathcal{T}$ consists of the root only, so $|\mathcal{T}| = 1$. In this case there is no randomness in $\mathcal{T}$ or $\hat{\mathcal{T}}$.
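The offspring law (14) of the special nodes and the spine length in case (T2) are easy to simulate. The following is a minimal sketch under the assumption that $(\pi_k)$ has finite support and is given as a Python list; the function names and the example distribution are hypothetical. The spine sampler terminates only when $\mu < 1$, since in the critical case the spine is infinite.

```python
import random


def sample_xi_hat(pi, rng):
    """Sample the special-node offspring number: k with probability k * pi[k],
    and infinity with the remaining probability 1 - mu."""
    u = rng.random()
    acc = 0.0
    for k, pk in enumerate(pi):
        acc += k * pk
        if u < acc:
            return k
    return float("inf")


def sample_spine_length(pi, rng):
    """Number of nodes in the spine; requires mu = sum(k * pi[k]) < 1.

    The root is special. A special node with finitely many children has
    exactly one special child, and the spine stops at the first special
    node with infinitely many children.
    """
    length = 1
    while sample_xi_hat(pi, rng) != float("inf"):
        length += 1
    return length


rng = random.Random(0)
pi = [0.5, 0.5]  # hypothetical subcritical example with mu = 1/2
samples = [sample_spine_length(pi, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # should be close to 1 / (1 - mu)
```

For the degenerate distribution of Example 4.1, `pi = [1.0]`, the sampler always returns a spine of length 1, matching the infinite star.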

Remark 4.2 In case (T1), if we remove the spine, we obtain a random forest that can be regarded as coming from a Galton-Watson process with immigration, where the immigration is described by an i.i.d. sequence of random variables with the distribution of $\hat\xi - 1$, see Lyons, Pemantle and Peres [23]. (In the Poisson case, Grimmett [14] gave a slightly different description of $\hat{\mathcal{T}}$ using a Galton-Watson process with immigration.) In case (T2), we can do the same, but now the immigration is different: at a random (geometric) time, there is an infinite immigration, and after that there is no more immigration at all.
Remark 4.3 Some related modifications of Galton-Watson trees having a finite spine have been considered previously, see Sagitov and Serra [29], Addario-Berry, Devroye and Janson [1], Geiger [12] and Aldous [2]. Kurtz, Lyons, Pemantle and Peres [22] and Chassaing and Durhuus [8] have constructed related trees with infinite spines using multi-type Galton-Watson processes.

Remark 4.4 In case (T1), the random variable $\hat\xi$ is a.s. finite and has mean
$$E\hat\xi = \sum_{k} k \cdot k\pi_k = E\xi^2 = \sigma^2 + 1,$$
where $\sigma^2 := \operatorname{Var}\xi \le \infty$. In case (T2), we have $P(\hat\xi = \infty) > 0$ and thus $E\hat\xi = \infty$. This suggests that in results that are known in the critical case (T1), and where $\sigma^2$ appears as a parameter, the correct generalization of $\sigma^2$ to the subcritical case (T2) is not $\operatorname{Var}\xi$ but $E\hat\xi - 1 = \infty$.

Three different types of weights
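Although Theorem 3.1 has only two cases, it makes sense to treat the case $\rho = 0$ separately. We thus have the following three (mutually exclusive) cases for the weight sequence $(w_k)$:
I. $\nu \ge 1$. Then $0 < \tau < \infty$ and $\tau \le \rho \le \infty$. The weight sequence $(w_k)$ is equivalent to $(\pi_k)$, which is a probability distribution with mean $\mu = \Psi(\tau) = 1$ and probability generating function $\sum_{k=0}^\infty \pi_k z^k$ with radius of convergence $\rho/\tau \ge 1$.
II. $0 < \nu < 1$. The weight sequence $(w_k)$ is equivalent to $(\pi_k)$, which is a probability distribution with mean $\mu < 1$.
III. $\nu = \rho = 0$. The weight sequence $(w_k)$ is not equivalent to any probability distribution.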
If we consider the modified Galton-Watson tree in Theorem 3.1, then III is the case discussed in Example 4.1; excluding this case, I and II are the same as (T1) and (T2) in Section 4.
We can reformulate the partition into three cases in more probabilistic terms. We say that $\xi$, or the distribution $(p_k)$, has some finite exponential moment if $E R^\xi < \infty$ for some $R > 1$, or equivalently, $E e^{r\xi} < \infty$ for some $r > 0$; this is equivalent to the probability generating function $\sum_{k=0}^\infty p_k z^k$ having radius of convergence strictly larger than 1. The cases I-III can then be described as follows:
I. $\nu \ge 1$. Then $(w_k)$ is equivalent to a probability distribution with mean $\mu = 1$ (with or without some exponential moment). Moreover, $(\pi_k)$ in (8) is the unique such distribution.
II. $0 < \nu < 1$. Then $(w_k)$ is equivalent to a probability distribution with mean $\mu < 1$ and no finite exponential moment. Moreover, $(\pi_k)$ in (8) is the unique such distribution.
III. $\nu = 0$. Then $(w_k)$ is not equivalent to any probability distribution.
Case I may be further subdivided. From an analytic point of view, it is natural to split I into two subcases:
Ia. $\nu > 1$; equivalently, $0 < \tau < \rho \le \infty$. The weight sequence $(w_k)$ is equivalent to a probability distribution with mean $\mu = 1$ and some finite exponential moment. (Then $(\pi_k)$ is the unique such distribution.) (This case is called generic in [11] and [18].)
Ib. $\nu = 1$; then $0 < \tau = \rho < \infty$. The weight sequence $(w_k)$ is equivalent to a probability distribution with mean $\mu = 1$ and no finite exponential moment. (Then $(\pi_k)$ is the unique such distribution.)
Case Ia is convenient when using analytic methods, since it says that the point $\tau$ is strictly inside the domain of convergence of $\Phi$, which is what methods involving contour integrations in the complex plane require. (See e.g. Drmota [10] for several such results of different types.) For that reason, many papers using such methods consider only case Ia. However, it has repeatedly turned out, for many different problems, that results proved by such methods often hold assuming only that we are in case I with finite variance of $(\pi_k)$. (In fact, as shown in [15], it is at least sometimes possible to use complex analytic methods also in this case.) Consequently, it is often more important to partition case I into the following two cases:
Iα. $\nu \ge 1$ and $(\pi_k)$ has variance $\sigma^2 < \infty$. In other words, $(w_k)$ is equivalent to a probability distribution $(\pi_k)$ with mean $\mu = 1$ and finite variance $\sigma^2$.
Iβ. $\nu \ge 1$ and $(\pi_k)$ has variance $\sigma^2 = \infty$.
Note that Ia is a subcase of Iα, since a finite exponential moment implies that the second moment is finite.
Remark 5.1 We have seen that except in case III, we may without loss of generality assume that the weight sequence $(w_k)$ is a probability weight sequence. If this distribution is critical, i.e. has mean 1, we are in case I with $\pi_k = w_k$, so we do not have to change the weights.
If the distribution $(w_k)$ is supercritical, then $\nu > 1$ and we are in case Ia; we can change to an equivalent critical probability weight. Hence we never have to consider supercritical weights.
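If the distribution $(w_k)$ is subcritical, we can only say that we are in case I or II. We can often change to an equivalent critical probability weight, but not always.

Node degrees
Theorem 6.1 Let $(w_k)_{k \ge 0}$ and $(\pi_k)_{k \ge 0}$ be as in Theorem 3.1, and let $o$ denote the root of $\mathcal{T}_n$. Then, as $n \to \infty$,
$$P\bigl(d^+_{\mathcal{T}_n}(o) = d\bigr) \to d\pi_d, \qquad d \ge 0. \tag{17}$$
Consequently, regarding $d^+_{\mathcal{T}_n}(o)$ as a random number in $\overline{\mathbb{N}}_0 := \mathbb{N}_0 \cup \{\infty\}$,
$$d^+_{\mathcal{T}_n}(o) \overset{d}{\longrightarrow} \hat\xi,$$
where $\hat\xi$ is a random variable in $\overline{\mathbb{N}}_0$ with the distribution given in (14).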
Note that the sum $\sum_{d=0}^\infty d\pi_d = \mu$ of the limiting probabilities in (17) may be less than 1; in that case we do not have convergence to a proper finite random variable, which is why we regard $d^+_{\mathcal{T}_n}(o)$ as a random number in $\overline{\mathbb{N}}_0 = \mathbb{N}_0 \cup \{\infty\}$.
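The root-degree limit can be checked exactly for moderate $n$ by computing total weights recursively. The sketch below is an illustration only: it uses the decomposition of a tree into its root and an ordered forest of subtrees, and the example takes $w_k = 1$ for all $k$ (uniformly random plane trees), for which the tilted distribution is assumed to be $\pi_k = 2^{-k-1}$ (that is, $\tau = 1/2$), so that the limit in (17) would be $d\,2^{-d-1}$. The function names are hypothetical.

```python
from fractions import Fraction


def tree_partition_sums(n, w):
    """Return (Z, F) where Z[m] is the total weight of plane trees with m nodes
    and F[d][m] is the total weight of ordered forests of d trees with m nodes.
    Requires len(w) >= n."""
    Z = [0] * (n + 1)
    F = [[0] * n for _ in range(n)]  # indices 0 <= d, m <= n - 1
    F[0][0] = 1                      # the empty forest
    for m in range(1, n + 1):
        # A tree with m nodes is a root of outdegree d plus a forest of d trees
        # with m - 1 nodes in total.
        Z[m] = sum(w[d] * F[d][m - 1] for d in range(m))
        if m < n:
            for d in range(1, n):
                # Split off the last tree of the forest, of size s.
                F[d][m] = sum(F[d - 1][m - s] * Z[s] for s in range(1, m + 1))
    return Z, F


def root_degree_distribution(n, w):
    """Exact distribution of the root outdegree in the random tree with n nodes."""
    Z, F = tree_partition_sums(n, w)
    return [Fraction(w[d] * F[d][n - 1], Z[n]) for d in range(n)]


n = 60
probs = root_degree_distribution(n, [1] * n)
for d in range(1, 5):
    print(d, float(probs[d]), d * 2.0 ** (-d - 1))
```

With integer weights all intermediate quantities are integers, so the probabilities are exact rationals and only the final comparison with the limit uses floating point.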
If we instead take a random node, we obtain a different limit distribution, viz. $(\pi_k)$. (When $\nu > 1$, this was proved by Otter [27], see also Minami [26].)
Theorem 6.2 Let $(w_k)_{k \ge 0}$ and $(\pi_k)_{k \ge 0}$ be as in Theorem 3.1, and let $v$ be a uniformly random node in $\mathcal{T}_n$. Then, as $n \to \infty$,
$$P\bigl(d^+_{\mathcal{T}_n}(v) = d\bigr) \to \pi_d, \qquad d \ge 0.$$

The maximum degree
Results for the maximum degree are more complicated, and we discuss the different cases separately. We denote the maximum outdegree in the tree $\mathcal{T}_n$ by $Y_{(1)}$.
Case Ia: $\nu > 1$. In this case $0 < \tau < \rho \le \infty$, and we have a logarithmic bound due to Meir and Moon [25]: if further $w_k^{1/k} \to 1/\rho$ as $k \to \infty$, then
$$\frac{Y_{(1)}}{\log n} \overset{p}{\longrightarrow} \frac{1}{\log(\rho/\tau)}.$$
In particular, if $\rho = \infty$, then $Y_{(1)} = o_p(\log n)$. Moreover, if $w_{k+1}/w_k \to a > 0$ as $k \to \infty$, then $Y_{(1)} = k(n) + O_p(1)$ for some deterministic sequence $k(n)$, so $Y_{(1)}$ is essentially concentrated in an interval of length $O(1)$. The distribution of $Y_{(1)}$ is asymptotically given by a discretised Gumbel distribution, but different subsequences may have different limits and no limit distribution exists.
Similarly, if $w_{k+1}/w_k \to 0$, then $Y_{(1)} \in \{k(n), k(n) + 1\}$ with high probability, so $Y_{(1)}$ is concentrated on at most two values, and often (but not always) on a single value.
Case Iα: $\nu \ge 1$ and $\sigma^2 < \infty$. The maximum outdegree $Y_{(1)}$ is asymptotically distributed as the maximum $\xi_{(1)}$ of $n$ i.i.d. copies of $\xi$; this holds in the strong sense that the total variation distance tends to 0. Since $E\xi^2 < \infty$, this implies in particular
$$Y_{(1)} = o_p(n^{1/2}).$$
Case Iβ: $\nu \ge 1$ and $\sigma^2 = \infty$. In this case
$$Y_{(1)} = o_p(n),$$
and this is (more or less) best possible.
Case II: $0 < \nu < 1$. In this case, if further $(w_k)$ satisfies an asymptotic power-law $w_k \sim c k^{-\beta}$ as $k \to \infty$, then Jonsson and Stefánsson [18] showed that
$$Y_{(1)} = (1 - \nu)n + o_p(n), \tag{24}$$
while the second largest node degree $Y_{(2)} = o_p(n)$. However, if the weight sequence is more irregular, this is no longer always true; it is possible (at least along a subsequence) that $Y_{(1)} = o_p(n)$, which can be seen as incomplete condensation; it is also possible (at least along a subsequence) that $Y_{(2)}$ too is of order $n$, meaning condensation to two or more giant nodes.
Case III: $\nu = \rho = 0$. This is similar to case II. In some regular cases we have (24), which now says $Y_{(1)} = n + o_p(n)$, and then necessarily $Y_{(2)} = o_p(n)$, but there are exceptions in other cases with an irregular weight sequence.

Balls-in-boxes
The balls-in-boxes model is a model for random allocation of $m$ (unlabelled) balls in $n$ (labelled) boxes. The set of possible allocations is thus
$$\mathcal{B}_{m,n} := \Bigl\{(y_1, \dots, y_n) \in \mathbb{N}_0^n : \sum_{i=1}^n y_i = m\Bigr\},$$
where $y_i$ counts the number of balls in box $i$. We suppose again that $w = (w_k)_{k=0}^\infty$ is a fixed weight sequence, and we define the weight of an allocation $y = (y_1, \dots, y_n)$ as
$$w(y) := \prod_{i=1}^n w_{y_i}.$$
Given $m$ and $n$, we choose a random allocation $B_{m,n}$ in $\mathcal{B}_{m,n}$ with probability proportional to its weight, see e.g. Bialas, Burda and Johnston [7]. We can replace the weight sequence by an equivalent weight sequence for the balls-in-boxes model just as we did for the random trees above.
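For small $m$ and $n$ the balls-in-boxes distribution can be tabulated by brute force. The sketch below is only an illustration with hypothetical function names and example weights; it also checks on one example that replacing $w_k$ by $a b^k w_k$ leaves the distribution unchanged, since every allocation in $\mathcal{B}_{m,n}$ then has its weight multiplied by the same factor $a^n b^m$.

```python
from fractions import Fraction
from math import prod


def allocations(m, n):
    """Yield all tuples (y_1, ..., y_n) of non-negative integers with sum m."""
    if n == 0:
        if m == 0:
            yield ()
        return
    for first in range(m + 1):
        for rest in allocations(m - first, n - 1):
            yield (first,) + rest


def allocation_weight(y, w):
    """Weight of an allocation: product of w[y_i] over the boxes."""
    return prod(w[yi] for yi in y)


def allocation_distribution(m, n, w):
    """Distribution of the random allocation: probability proportional to weight."""
    weights = {y: allocation_weight(y, w) for y in allocations(m, n)}
    total = sum(weights.values())
    return {y: wt / total for y, wt in weights.items()}


# Hypothetical weights w_0..w_3 and an equivalent sequence.
w = [Fraction(1), Fraction(2), Fraction(0), Fraction(5)]
a, b = Fraction(7), Fraction(2, 3)
w_tilde = [a * b**k * wk for k, wk in enumerate(w)]
print(allocation_distribution(3, 3, w) == allocation_distribution(3, 3, w_tilde))
```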
Example 8.1 (probability weights) In the special case when $(w_k)$ is a probability weight sequence, let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with the distribution $(w_k)$. Then, $B_{m,n}$ has the same distribution as $(\xi_1, \dots, \xi_n)$ conditioned on $\sum_{i=1}^n \xi_i = m$. (This construction of a random allocation $B_{m,n}$ is used by Kolchin [21] and there called the general scheme of allocation.)
The connection to random trees is that if $T$ is a tree with $|T| = n$, then its degree sequence (in depth-first order, say) is an allocation in $\mathcal{B}_{n-1,n}$, with the same weight as the tree. Moreover, a converse holds by the following lemma, see e.g. Takács [30], Wendel [31], Pitman [28].
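The correspondence between trees and allocations can be made concrete as follows. This sketch encodes a plane tree as the tuple of its subtrees (an assumption of the illustration), lists its outdegrees in depth-first order, and checks that the result lies in $\mathcal{B}_{n-1,n}$ with the same weight. For the converse direction it assumes that the lemma in question is the classical cycle lemma, i.e. that exactly one of the $n$ cyclic shifts of an allocation in $\mathcal{B}_{n-1,n}$ is the depth-first degree sequence of a tree, and it uses the standard partial-sum characterisation of such sequences. All function names and example weights are hypothetical.

```python
from itertools import product
from math import prod


def degree_sequence(tree):
    """Outdegrees in depth-first (preorder) order; a tree is the tuple of its subtrees."""
    seq = [len(tree)]
    for child in tree:
        seq.extend(degree_sequence(child))
    return seq


def tree_weight(tree, w):
    """Product of w[outdegree] over all nodes of the tree."""
    return w[len(tree)] * prod(tree_weight(child, w) for child in tree)


def is_tree_sequence(y):
    """True if y is the preorder degree sequence of a plane tree: the partial
    sums of (y_i - 1) stay non-negative before the end and finish at -1."""
    total = 0
    for j, yj in enumerate(y, start=1):
        total += yj - 1
        if j < len(y) and total < 0:
            return False
    return total == -1


def cyclic_shifts(y):
    """All cyclic shifts of the tuple y."""
    return [tuple(y[i:] + y[:i]) for i in range(len(y))]


# Hypothetical example: the root has a leaf child and a child with two leaves.
tree = ((), ((), ()))
w = [1, 2, 3]
seq = degree_sequence(tree)
print(seq, sum(seq), tree_weight(tree, w), prod(w[d] for d in seq))

# Cycle lemma check on all allocations of 3 balls in 4 boxes.
for y in product(range(4), repeat=4):
    if sum(y) == 3:
        assert sum(is_tree_sequence(s) for s in cyclic_shifts(y)) == 1
```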