On general subtrees of a conditioned Galton-Watson tree

We show that the number of copies of a given rooted tree in a conditioned Galton-Watson tree satisfies a law of large numbers under a minimal moment condition on the offspring distribution.


Introduction
Let $T_n$ be a random conditioned Galton-Watson tree with $n$ nodes, defined by an offspring distribution $\xi$ with mean $E\xi = 1$, and let $t$ be a fixed ordered rooted tree. We are interested in the number of copies of $t$ as a (general) subtree of $T_n$, which we denote by $N_t(T_n)$. For details of these and other definitions, see Section 2. Note that we consider subtrees in a general sense. (Thus, e.g., not just fringe trees; for them, see similar results in [9].)

The purpose of the present paper is to show the following law of large numbers under minimal moment assumptions. Let $n_t(T)$ be the number of rooted copies of $t$ in a tree $T$, i.e., copies with the root at the root of $T$. Further, let $\Delta(t)$ be the maximum outdegree in $t$.

Theorem 1.1. Let $t$ be a fixed ordered tree, and let $T_n$ be a conditioned Galton-Watson tree defined by an offspring distribution $\xi$ with $E\xi = 1$ and $E\xi^{\Delta(t)} < \infty$. Also, let $T$ be a Galton-Watson tree with the same offspring distribution. Then, as $n \to \infty$,
$$N_t(T_n)/n \xrightarrow{L^1} E\, n_t(T), \tag{1.1}$$
where the limit is finite and given explicitly by (3.2) below. Equivalently,
$$N_t(T_n)/n \xrightarrow{p} E\, n_t(T), \tag{1.2}$$
and
$$E\, N_t(T_n)/n \to E\, n_t(T). \tag{1.3}$$

The fact that (1.1) is equivalent to (1.2)-(1.3) is an instance of the general fact that for any random variables, convergence in $L^1$ is equivalent to convergence in probability together with convergence of the means of the absolute values (i.e., in this case, with non-negative variables, the means); see e.g. [6, Theorem 5.5.4]. We nevertheless state both versions for convenience.
Chyzak, Drmota, Klausner and Kok [1] (see also [2, Section 3.3]) considered patterns in random trees; their patterns differ from the subgraph counts above in that some external vertices are added to $t$, and that one only considers copies of $t$ in a tree $T$ such that each internal vertex in the copy has the same degree in $T$ as in $t$ (counting also edges to external vertices); equivalently, each vertex in $t$ is equipped with a number, and one considers only copies of $t$ where the vertex degrees match these numbers. (Another difference is that [1] consider unrooted trees, but the proof proceeds by first considering rooted [planted] trees. Furthermore, only uniformly random labelled trees are considered in [1], but the proofs extend to suitable more general conditioned Galton-Watson trees, as remarked in [1] and shown explicitly in [11; 12].) It was shown in [1] that the number of occurrences of such a pattern is asymptotically normal, with asymptotic mean and variance both of the order $n$ (except that the variance might be smaller in at least one exceptional degenerate case), which of course entails a law of large numbers. Moreover, [1] briefly discuss generalizations, including subtrees without further degree conditions as in the present paper; they expect asymptotic normality to hold in this case too, but it seems that their method, which is based on setting up and analyzing a system of functional equations for generating functions, would in general require extensions to infinite systems, which as far as we know have not been pursued. (See [4] for a related problem.) See further Section 5.

Date: 9 November, 2020. Supported by the Knut and Alice Wallenberg Foundation.
Our method is probabilistic, and quite different from the analysis of generating functions in [1].

Notation
All trees are rooted and ordered. The root of a tree $T$ is denoted $o = o_T$. The size $|T|$ of a tree $T$ is defined as the number of vertices in $T$.
The degree $d(v)$ of a vertex $v \in T$ always means the outdegree, i.e., the number of children of $v$. The degree sequence of $T$ is the sequence of all degrees $d(v)$, $v \in T$, for definiteness in depth first order. Let $\Delta(T) := \max_{v \in T} d(v)$ be the maximum (out)degree in $T$.
A (general) subtree $T'$ of a tree $T$ is a non-empty connected subgraph of $T$; we regard a subtree as a rooted tree in the obvious way, with the root being the vertex in $T'$ that is closest to the root in $T$. Note that for any vertex $v \in T'$, its set of children in $T'$ is a subset of its set of children in $T$; the order of the children of $v$ in $T'$ is (by definition) the same as their relative order in $T$.
If $v \in T$, the fringe subtree $T_v$ is the subtree of $T$ consisting of $v$ and all its descendants; this is thus a subtree with root $v$.
If $t$ and $T$ are ordered rooted trees, let $N_t(T)$ be the number of (general) subtrees of $T$ that are isomorphic to $t$ (as ordered trees), and let $n_t(T)$ be the number of such subtrees that furthermore have root $o_T$. Then $n_t(T_v)$ is the number of subtrees with root $v$ isomorphic to $t$, and thus
$$N_t(T) = \sum_{v \in T} n_t(T_v).$$
In other words, $N_t(T)$ is an additive functional with toll function $n_t(T)$, see e.g. [9].

Let $T$ be a random Galton-Watson tree defined by an offspring distribution $(p_i)_0^\infty$, and let $T_n$ be the conditioned Galton-Watson tree defined as $T$ conditioned on $|T| = n$ (tacitly considering only $n$ such that $P(|T| = n) > 0$); see e.g. [8] for a survey. We let $\xi$ be a random variable with the distribution $(p_i)_0^\infty$; we call both $(p_i)_0^\infty$ and (with a minor abuse) $\xi$ the offspring distribution. We will only consider offspring distributions with $E\xi = 1$ (i.e., $\xi$ is critical). (We often repeat this for emphasis.) Let $\sigma^2 := \operatorname{Var}\xi \le \infty$; we tacitly assume $\sigma^2 > 0$, but do not require $\sigma^2 < \infty$ unless we say so.

$C$ and $c$ denote unspecified constants that may vary from one occurrence to the next. They may depend on parameters such as the offspring distribution or the fixed tree $t$, but they never depend on $n$.
Convergence in probability and distribution is denoted $\xrightarrow{p}$ and $\xrightarrow{d}$, respectively. Unspecified limits are as $n \to \infty$.
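The counts $n_t(T)$ and $N_t(T)$ defined in this section can be computed directly for small trees. The following is a minimal sketch, not part of the original argument; it assumes a tree is encoded as a nested tuple (each vertex is the tuple of its children, a leaf is `()`), and the names `rooted_copies` and `all_copies` are illustrative.

```python
from functools import lru_cache

# Assumed encoding: a vertex is the tuple of its children, in order; a leaf is ().

@lru_cache(maxsize=None)
def rooted_copies(t, T):
    """n_t(T): number of copies of t in T whose root is the root of T."""
    d = len(t)
    if d > len(T):
        return 0
    # ways[j] = number of ways to map the first j children of t's root,
    # in order, to children of T's root seen so far, each with its subtree.
    ways = [1] + [0] * d
    for child in T:
        for j in range(d, 0, -1):
            ways[j] += ways[j - 1] * rooted_copies(t[j - 1], child)
    return ways[d]

def all_copies(t, T):
    """N_t(T): sum of n_t(T_v) over all vertices v (additive functional)."""
    return rooted_copies(t, T) + sum(all_copies(t, child) for child in T)

cherry = ((), ())          # a root with two leaf children
star3 = ((), (), ())       # a root with three leaf children
print(all_copies(cherry, star3))   # 3, i.e. binom(3, 2)
```

The second function is a direct transcription of the additive-functional identity: the toll $n_t$ is evaluated at every fringe subtree and summed.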

Proof
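We begin by finding the expectation of $n_t$ for both unconditioned and conditioned Galton-Watson trees. Let $S_n := \sum_{i=1}^n \xi_i$, where $\xi_1, \xi_2, \dots$ are independent copies of $\xi$.

Lemma 3.1. Let $t$ be a fixed tree with degree sequence $d_1, \dots, d_k$, and suppose that $E\xi = 1$.
(i) For the Galton-Watson tree $T$,
$$E\, n_t(T) = \prod_{i=1}^k E\binom{\xi}{d_i}. \tag{3.2}$$
(ii) For the conditioned Galton-Watson tree $T_n$ with $n > k$, writing $m := \sum_{i=1}^k m_i$,
$$E\, n_t(T_n) = \sum_{m_1,\dots,m_k \ge 0} \prod_{i=1}^k p_{m_i}\binom{m_i}{d_i} \cdot \frac{n(m-k+1)}{n-k} \cdot \frac{P(S_{n-k} = n-m-1)}{P(S_n = n-1)}. \tag{3.3}$$

Proof. (i): We try to construct a copy $t'$ of $t$ in $T$, with the given root $o$. Let $m_1$ be the root degree of $T$. Then there are $\binom{m_1}{d_1}$ ways to choose the $d_1$ children of the root that belong to $t'$. Fix one of these choices, say $v_{11}, \dots, v_{1d_1}$.

Next, let $m_2$ be the number of children of $v_{11}$ in $T$. Given $m_2$, there are $\binom{m_2}{d_2}$ ways to choose the $d_2$ children of $v_{11}$ that belong to $t'$. Fix one of these choices. Continuing in the same way, taking the vertices of $t'$ in depth first order, we find for every sequence $m_1, \dots, m_k$ of non-negative integers a total of $\prod_{i=1}^k \binom{m_i}{d_i}$ choices, and each of these gives a tree $t' \cong t$ provided the selected vertices in $T$ have degrees $m_1, \dots, m_k$, which occurs with probability $\prod_{i=1}^k p_{m_i}$. Hence,
$$E\, n_t(T) = \sum_{m_1,\dots,m_k \ge 0} \prod_{i=1}^k p_{m_i}\binom{m_i}{d_i} = \prod_{i=1}^k \sum_{m \ge 0} p_m \binom{m}{d_i} = \prod_{i=1}^k E\binom{\xi}{d_i}.$$

(ii): The $k$ vertices of $t'$ have in total $m = \sum_i m_i$ children in $T$, of which $k-1$ belong to $t'$; the remaining $m-k+1$ children lie outside $t'$. Condition on $m_1, \dots, m_k$ and one of the corresponding choices of $t'$. The probability that the $m-k+1$ children above and their descendants are $n-k$ vertices is the probability that a Galton-Watson process (with offspring distribution $\xi$) started with $m-k+1$ individuals has total progeny $n-k$, which by the Otter-Dwass formula [5] (see also [17] and the further references there) is given by
$$\frac{m-k+1}{n-k}\, P(S_{n-k} = n-m-1).$$
Multiplying with $\prod_{i=1}^k p_{m_i}$, the probability that the vertices in $t'$ have the right degrees in $T$, and summing over all possibilities, we obtain
$$E\bigl[n_t(T);\, |T| = n\bigr] = \sum_{m_1,\dots,m_k \ge 0} \prod_{i=1}^k p_{m_i}\binom{m_i}{d_i} \cdot \frac{m-k+1}{n-k}\, P(S_{n-k} = n-m-1).$$
By the Otter-Dwass formula again (this time the original case in [15]),
$$P(|T| = n) = \frac{1}{n}\, P(S_n = n-1),$$
and (3.3) follows by division.

We need estimates of the probabilities $P(S_n = n-m)$. The estimate (3.8) below is standard; we expect that also (3.9) is known, but we have not found a reference, so we give a proof. (It is related to more difficult estimates in e.g. [16] assuming more moments, see Remark 3.3 below.)

Lemma 3.2. Suppose that $E\xi = 1$ and $E\xi^2 < \infty$. Then, uniformly for all $n \ge 1$ and $m \in \mathbb{Z}$,
$$P(S_n = n-m) \le C n^{-1/2}, \tag{3.8}$$
$$|m|\, P(S_n = n-m) \le C. \tag{3.9}$$

Proof. (3.8): This is well-known. In fact, the classical local limit theorem, see e.g. [16, Theorem VII.1], gives the much more precise result that, uniformly in $m \in \mathbb{Z}$ as $n \to \infty$,
$$P(S_n = n-m) = \frac{h}{\sqrt{2\pi\sigma^2 n}}\Bigl(e^{-m^2/(2\sigma^2 n)} + o(1)\Bigr), \tag{3.10}$$
where $h$ is the span of the offspring distribution. (Provided $h \mid (n-m)$; otherwise the probability is 0.)

(3.9): Let $\varphi(t) := E\, e^{it(\xi-1)}$ be the characteristic function of $\xi - 1 = \xi - E\xi$; note that $\varphi(t)$ is twice differentiable because $E\xi^2 < \infty$. Then, by Fourier inversion,
$$P(S_n = n-m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{imt}\varphi(t)^n \, dt. \tag{3.11}$$
Hence, using an integration by parts (the boundary terms cancel since the integrand is $2\pi$-periodic),
$$m\, P(S_n = n-m) = \frac{i n}{2\pi}\int_{-\pi}^{\pi} e^{imt}\varphi(t)^{n-1}\varphi'(t) \, dt, \tag{3.12}$$
and thus
$$|m|\, P(S_n = n-m) \le \frac{n}{2\pi}\int_{-\pi}^{\pi} |\varphi(t)|^{n-1}|\varphi'(t)| \, dt. \tag{3.13}$$
The assumptions yield $\varphi'(0) = i\,E(\xi - 1) = 0$ and $\sup_t |\varphi''(t)| = |\varphi''(0)| = \operatorname{Var}\xi = C < \infty$, and thus
$$|\varphi'(t)| \le C|t|. \tag{3.14}$$
Assume for simplicity that the span of $\xi$ is 1 (the general case is similar, with standard modifications). Then, as is well-known, it is easy to see that there exists $c > 0$ such that
$$|\varphi(t)| \le e^{-ct^2}, \qquad |t| \le \pi. \tag{3.15}$$
Using (3.14) and (3.15) in (3.13), and noting that $e^{-c(n-1)t^2} \le C e^{-cnt^2}$ for $|t| \le \pi$, we obtain
$$|m|\, P(S_n = n-m) \le C n \int_{-\infty}^{\infty} |t|\, e^{-cnt^2} \, dt = C,$$
which proves (3.9).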
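The Fourier inversion step in the proof above can be checked numerically in a case where $P(S_n = n - m)$ is known in closed form. The sketch below is only an illustration: it assumes $\xi \sim \mathrm{Bi}(2, 1/2)$ (so that $S_n \sim \mathrm{Bi}(2n, 1/2)$ and $\varphi(t) = (1 + \cos t)/2$), and the function names and the grid size are choices made here, not taken from the paper.

```python
import cmath
from math import comb, cos, pi

def phi(t):
    # Characteristic function of xi - 1 for xi ~ Bi(2, 1/2):
    # xi - 1 takes the values -1, 0, 1 with probabilities 1/4, 1/2, 1/4.
    return (1 + cos(t)) / 2

def prob_by_inversion(n, m):
    """Riemann sum for (1 / 2 pi) * integral over [-pi, pi] of e^{imt} phi(t)^n dt.

    The integrand is a trigonometric polynomial of degree at most n + |m|,
    so an equispaced sum with more than n + |m| points is exact up to
    rounding; the grid below is large enough when |m| <= n.
    """
    N = 4 * n + 8
    total = 0
    for j in range(N):
        t = -pi + 2 * pi * j / N
        total += cmath.exp(1j * m * t) * phi(t) ** n
    return (total / N).real

def prob_exact(n, m):
    # S_n ~ Bi(2n, 1/2), hence P(S_n = n - m) = binom(2n, n - m) / 4^n.
    k = n - m
    return comb(2 * n, k) / 4 ** n if 0 <= k <= 2 * n else 0.0

for m in (-3, 0, 3):
    print(m, prob_by_inversion(20, m), prob_exact(20, m))
```

The two columns should agree to rounding error, which is a convenient sanity check on the sign convention $e^{imt}$ used for $P(S_n - n = -m)$.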
Remark 3.3. In the same way, taking two derivatives inside (3.11), one obtains  and for $n > k$, with as above $m := \sum_i m_i =: |\mathbf{m}|$ (and $C = 1$, actually), Denote the summand in (3.20) by $b_{\mathbf{m},n}$. By the local limit theorem (3.10), as is well-known, which is summable by (3.19). Consequently, dominated convergence shows that which together with (3.20) yields the result $n^{-1/2} E\, n_t(T_n) \to 0$.
We will see in Example 4.4 below that the estimate $o(n^{1/2})$ in Lemma 3.4 is best possible in general. However, if we assume another moment on $\xi$, we can improve the estimate to $O(1)$, and furthermore show that $E\, n_t(T_n)$ converges. We next show this, although it is not required for our main result.
Lemma 3.5. Let $t$ be a fixed tree with degree sequence $d_1, \dots, d_k$, and suppose that $E\xi = 1$. Then, as $n \to \infty$, Proof. Define again $a_{\mathbf{m}}$ by (3.18), and denote the summand in (3.3) by $b'_{\mathbf{m},n}$, where as above $\mathbf{m} = (m_1, \dots, m_k) \in \mathbb{Z}_{\ge 0}^k$. It follows from the local limit theorem (3.10) that for every fixed $\mathbf{m}$, as $n \to \infty$, and it remains only to evaluate the limit. Since $t$ is a tree, we have $\sum_{i=1}^k d_i = k - 1$, and thus $m - k$ which equals the right-hand side of (3.26) because ( . This completes the proof by (3.32).
Remark 3.6. Assume only $E\xi = 1$. If $\hat T$ is the infinite size-biased Galton-Watson tree defined by Kesten [10], see also [8, Section 5], then $T_n \xrightarrow{d} \hat T$ in a local topology (i.e., close to the root), see [8, Theorem 7.1], and it follows that
$$n_t(T_n) \xrightarrow{d} n_t(\hat T). \tag{3.34}$$
It is not difficult to see that $E\, n_t(\hat T)$ equals the right-hand side of (3.26), which thus says that $E\, n_t(T_n) \to E\, n_t(\hat T)$. (This could presumably be used to give an alternative proof of Lemma 3.5, but we prefer the direct proof above.) In particular, if $E\xi^{\Delta(t)+1} = \infty$, then $E\, n_t(\hat T) = \infty$, and thus (3.34) and Fatou's lemma yield $E\, n_t(T_n) \to \infty$. Hence, the last sentence in Lemma 3.5 holds also without the assumption $E\xi^2 < \infty$.
We proceed to the proof of Theorem 1.1. The case $\Delta(t) \le 1$ is special, since we then do not assume $E\xi^2 < \infty$; on the other hand, this case is simple and rather trivial, so we discuss it separately in the following example.
Example 3.7. Consider the case $\Delta(t) \le 1$. This means that $t$ is a path $P_k$ with $k \ge 1$ vertices, and thus length $k - 1$. A copy of $t$ in a tree $T$ is thus a path consisting of $k$ vertices $v_1, \dots, v_k$ such that $v_{i+1}$ is a child of $v_i$; such a path is determined by its endpoint $v_k$, and every vertex of depth (= distance from the root) at least $k - 1$ is the endpoint of a copy of $t$. Hence, if $\nu_i(T)$ is the number of vertices in $T$ of depth $i$, then
$$N_{P_k}(T) = \sum_{i \ge k-1} \nu_i(T) = |T| - \sum_{i=0}^{k-2} \nu_i(T). \tag{3.35}$$
In particular, $N_{P_1}(T_n) = n$ and $N_{P_2}(T_n) = n - 1$ are deterministic; these are trivially just the numbers of vertices and edges. Moreover, as said in Remark 3.6, assuming $E\xi = 1$, the random tree $T_n$ converges locally in distribution as $n \to \infty$, see [8, Theorem 7.1]; in particular each $\nu_i(T_n)$ converges in distribution (to $\nu_i(\hat T)$) and thus $\nu_i(T_n) = O_p(1)$ (i.e., is bounded in probability). Hence, for every $k \ge 1$, (3.35) implies
$$N_{P_k}(T_n) = n - O_p(1). \tag{3.36}$$
In particular, $N_{P_k}(T_n)$ is more strongly concentrated than the dispersion of order $n^{1/2}$ typically seen in similar statistics, see e.g. Example 4.2 and Section 5.
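The counting argument of Example 3.7 (one copy of $P_k$ for each vertex of depth at least $k-1$) is easy to check on a small tree. A minimal sketch, assuming the same nested-tuple encoding of ordered trees as a hypothetical convention (a vertex is the tuple of its children); the function names are illustrative.

```python
def depth_profile(T):
    """Return [nu_0, nu_1, ...]: the number of vertices of T at each depth."""
    profile, level = [], [T]
    while level:
        profile.append(len(level))
        # the next level consists of all children of the current level
        level = [child for v in level for child in v]
    return profile

def path_copies(T, k):
    """N_{P_k}(T): number of vertices of depth at least k - 1."""
    return sum(depth_profile(T)[k - 1:])

# A root with two children, the first of which has two leaf children.
T = (((), ()), ())
print(depth_profile(T))                       # [1, 2, 2]
print([path_copies(T, k) for k in (1, 2, 3)])  # [5, 4, 2]
```

For $k = 1$ and $k = 2$ this returns the numbers of vertices and edges, as noted in the example.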

Examples
We give some simple but illuminating examples. Recall also Example 3.7.
Example 4.1. Let $t = t_{q,r}$ consist of two paths with $q + 1$ and $r + 1$ vertices, joined at the root; here $q, r \ge 1$. We have $k = 1 + q + r$ and $d_1 = 2$ while $d_i \le 1$ for $i > 1$ (the two leaves have degree 0); thus $\Delta(t) = 2$. Since $E\xi = 1$, (3.2) yields
$$E\, n_t(T) = E\binom{\xi}{2}\,(E\xi)^{q+r-2} = \tfrac12 E\,\xi(\xi - 1) = \frac{\sigma^2}{2}. \tag{4.1}$$
Hence, Theorem 1.1 yields, for any $q, r \ge 1$,
$$N_{t_{q,r}}(T_n)/n \xrightarrow{L^1} \frac{\sigma^2}{2}. \tag{4.2}$$

Example 4.2. Consider the special case $q = r = 1$ of Example 4.1. Then $t_{1,1}$ is a cherry, i.e., a root with two children. If a vertex $v$ in a tree $T$ has degree $d(v)$, then the number of cherries rooted at $v$ is $\binom{d(v)}{2}$, and thus
$$N_{t_{1,1}}(T) = \sum_{v \in T} \binom{d(v)}{2} = \sum_{r \ge 0} \binom{r}{2} X_r(T), \tag{4.3}$$
where $X_r(T)$ is the number of vertices of degree $r$ in $T$. It is known that $X_r(T_n)/n \xrightarrow{p} p_r$, see e.g. [8, Theorem 7.11]. Hence, (4.2) (with $q = r = 1$) is what we would get by dividing (4.3) by $n$ and taking the limit inside the sum; if the degree distribution is bounded, the sum is finite so this is rigorous and (4.2) (still with $q = r = 1$) follows from (4.3).
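The limit in Examples 4.1 and 4.2 can be illustrated by simulation. The sketch below is not from the paper: it assumes the product formula $E\,n_t(T) = \prod_i E\binom{\xi}{d_i}$ for the limit, samples the depth-first degree sequence of $T_n$ by rejection plus the standard cyclic-lemma rotation, and uses $\xi \sim \mathrm{Bi}(2, 1/2)$, for which $\sigma^2/2 = 1/4$. All function names are illustrative.

```python
import random
from math import comb

def expected_rooted_copies(p, degrees_of_t):
    """Product over the degree sequence of t of E binom(xi, d_i)."""
    out = 1.0
    for d in degrees_of_t:
        out *= sum(pm * comb(m, d) for m, pm in enumerate(p))
    return out

def sample_degrees(p, n, rng):
    """Depth-first degree sequence of a conditioned Galton-Watson tree T_n."""
    while True:  # rejection: n i.i.d. degrees must sum to n - 1
        deg = rng.choices(range(len(p)), weights=p, k=n)
        if sum(deg) == n - 1:
            break
    # Cyclic lemma: rotate so the sequence starts right after the first
    # time the walk with steps d - 1 reaches its overall minimum.
    walk, low, start = 0, 0, 0
    for i, d in enumerate(deg):
        walk += d - 1
        if walk < low:
            low, start = walk, i + 1
    return deg[start:] + deg[:start]

def cherry_count(deg):
    # Each vertex v contributes binom(d(v), 2) cherries rooted at v.
    return sum(comb(d, 2) for d in deg)

p = [0.25, 0.5, 0.25]                      # Bi(2, 1/2), sigma^2 = 1/2
n = 2000
deg = sample_degrees(p, n, random.Random(1))
print(cherry_count(deg) / n, expected_rooted_copies(p, [2, 0, 0]))
```

For this offspring distribution the rotation does not affect the cherry count, which depends only on the multiset of degrees; it is included so that the output is a genuine depth-first degree sequence.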
In this case we can say much more than (4.2). It was proved in [13], see also [3], that $X_r(T_n)$ is asymptotically normal, with
$$\frac{X_r(T_n) - n p_r}{\sqrt{n}} \xrightarrow{d} N(0, \gamma_r^2) \tag{4.4}$$
for some explicit $\gamma_r^2$. This was extended to joint convergence for all $r$ in [7], provided $E\xi^3 < \infty$. Hence, at least if $\xi$ is bounded, it follows from (4.3) that $N_{t_{1,1}}(T_n)$ is asymptotically normal, with
$$\frac{N_{t_{1,1}}(T_n) - n\sigma^2/2}{\sqrt{n}} \xrightarrow{d} N(0, \gamma^2) \tag{4.5}$$
for some explicit $\gamma^2 \ge 0$. There are degenerate cases where $\gamma^2 = 0$. For example, for full binary trees ($P(\xi = 2) = P(\xi = 0) = \tfrac12$), all degrees are 0 or 2, and then each $X_r(T)$ is a deterministic function of $|T|$; hence (4.3) shows that $N_{t_{1,1}}(T_n)$ is deterministic. More generally, the same happens for full $m$-ary trees, with $\xi \in \{0, m\}$ a.s., for any $m \ge 2$. But it can be seen from the covariances given in [7] that $\gamma^2 > 0$ in all other cases with bounded $\xi$. See further Section 5.

Example 4.3. Let $\ell \ge 1$, and let $\varpi_\ell(T)$ be the number of paths of length $\ell$ in $T$. For definiteness, we count undirected paths, so this equals the number of unordered pairs $(v, w)$ of vertices of distance $\ell$. There are two cases:
(i) $v$ is an ancestor of $w$, or conversely; the number of such pairs is $N_{P_{\ell+1}}(T)$, since a path of length $\ell$ has $\ell + 1$ vertices.
(ii) Neither $v$ nor $w$ is an ancestor of the other. Then $v$ and $w$ are the two leaves in a copy of $t_{q,r}$ with $q, r \ge 1$ and $q + r = \ell$. For given $q$ and $r$, the number of such pairs equals $N_{t_{q,r}}(T)$.

Consequently,
$$\varpi_\ell(T) = N_{P_{\ell+1}}(T) + \sum_{q=1}^{\ell-1} N_{t_{q,\ell-q}}(T). \tag{4.6}$$
Hence, Examples 3.7 and 4.1 yield
$$\varpi_\ell(T_n)/n \xrightarrow{p} 1 + (\ell - 1)\frac{\sigma^2}{2}. \tag{4.7}$$
For example, taking $\xi \sim \mathrm{Po}(1)$ we obtain (forgetting the ordering) a uniformly random unordered labelled tree; we have $\sigma^2 = 1$ and thus (4.7) yields
$$\varpi_\ell(T_n)/n \xrightarrow{p} \frac{\ell + 1}{2}. \tag{4.8}$$
Similarly, taking $\xi \sim \mathrm{Ge}(1/2)$ we obtain a uniformly random ordered tree; we have $\sigma^2 = 2$ and thus (4.7) then yields
$$\varpi_\ell(T_n)/n \xrightarrow{p} \ell. \tag{4.9}$$
Taking $\xi \sim \mathrm{Bi}(2, 1/2)$ we obtain a uniformly random binary tree; we have $\sigma^2 = 1/2$ and thus (4.7) now yields
$$\varpi_\ell(T_n)/n \xrightarrow{p} \frac{\ell + 3}{4}. \tag{4.10}$$

The following example shows that the estimate $o(n^{1/2})$ in Lemma 3.4 is best possible.
Example 4.4. For simplicity, let the tree $t$ be a star, where the root has degree $\Delta \ge 2$ and its children are leaves with degree 0. (The argument is easily modified to any tree $t$ with $\Delta(t) \ge 2$.) Thus $k := |t| = \Delta + 1$. Assume that the span of $\xi$ is 1.
The local limit theorem (3.10) implies that if $n$ is large and $m \le n^{1/2}$, then $P(S_{n-k} = n - m - 1) \ge c\, n^{-1/2}$, and thus, using (3.21), $P(S_{n-k} = n - m - 1)/P(S_n = n - 1) \ge c$. If $\varepsilon > 0$, and we let $p_m = m^{-\Delta-1-\varepsilon}$ for large $m$, then $E\xi^{\Delta} < \infty$, and (4.13) yields, for large $n$, Hence, for any $\varepsilon > 0$, $E\, n_t(T_n)$ can grow faster than $n^{1/2-\varepsilon}$. Similarly, we can find an offspring distribution $(p_m)_0^\infty$ satisfying the conditions such that $E\, n_t(T_n) = n^{1/2-o(1)}$; we omit the details. Moreover, for any given sequence $\delta(n) \searrow 0$, we can find $(p_m)_0^\infty$ such that $E\, n_t(T_n) \ge \delta(n) n^{1/2}$, at least for a subsequence. To see this, take an increasing sequence $(m_j)_1^\infty$ with $\sum_{j=1}^\infty j\,\delta(m_j^2) < 1$. Let $p_{m_j} := j\,\delta(m_j^2)\, m_j^{-\Delta}$, and $p_m = 0$ for all other $m \ge 2$, choosing $p_0$ and $p_1$ such that $\sum_i p_i = \sum_i i p_i = 1$. Also, let $n_j := m_j^2$. Then (4.13) implies that, for large $j$,

Asymptotic normality?
We showed in Example 4.2 that if $\xi$ is bounded, then $N_{t_{1,1}}(T_n)$ is asymptotically normal in the sense that (4.5) holds (although $\gamma^2 = 0$ is possible). In fact, this holds for any fixed tree $t$.

Proposition 5.1. Assume that $\xi$ is bounded. Then, for any fixed tree $t$,
$$\frac{N_t(T_n) - n\mu_t}{\sqrt{n}} \xrightarrow{d} N(0, \gamma_t^2) \tag{5.1}$$
for $\mu_t := E\, n_t(T)$ and some $\gamma_t^2 \ge 0$.
Proof. This follows from the result by Chyzak, Drmota, Klausner and Kok [1] on patterns discussed in Section 1 (extended to conditioned Galton-Watson trees [1; 11; 12]); the assumption on ξ means that vertex degrees are bounded by some constant, and thus there is a finite number of patterns that correspond to subtrees isomorphic to t; hence N t (T n ) is a linear combination of pattern counts, and the result follows from the joint asymptotic normality of the latter. (See also [14] for a special case.) Alternatively, this is an application of [9, Theorem 1.13]: the functional n t is local (as defined in [9]) and for trees with degrees bounded by some constant K, n t is bounded. Hence (5.1) follows from [9, Theorem 1.13].
We conjecture that this behaviour is typical, and that Proposition 5.1 holds for every $\xi$ with $E\xi = 1$ that satisfies a suitable moment condition. However, it seems that substantial additional work would be required to show this. As said in the introduction, this was briefly discussed in [1], but it seems that the method there requires extensions to infinite systems of functional equations. Similarly, the application of [9, Theorem 1.13] requires $n_t(T_n)$ to be bounded, which is not the case when $\xi$ is unbounded. It is possible that this may be overcome by truncations and some variance estimates, but again more work is needed. (The extension in [18] applies to the case when $t$ is a star with root degree $\Delta$ (including Example 4.2 with $\Delta = 2$) and $E\xi^{2\Delta+1} < \infty$; this might suggest further extensions.) This problem is thus left for future research.
Note also that there are degenerate cases in which the asymptotic variance $\gamma_t^2$ in (5.1) is 0; see Examples 3.7 and 4.2. (Then (5.1) does not give asymptotic normality, only a concentration result.) However, we conjecture that this is an exception, occurring only in a few special cases.