Recursive tree processes and the mean-field limit of stochastic flows

Interacting particle systems can often be constructed from a graphical representation, by applying local maps at the times of associated Poisson processes. This leads to a natural coupling of systems started in different initial states. We consider interacting particle systems on the complete graph in the mean-field limit, i.e., as the number of vertices tends to infinity. We are not only interested in the mean-field limit of a single process, but mainly in how several coupled processes behave in the limit. This turns out to be closely related to recursive tree processes as studied by Aldous and Bandyopadhyay in discrete time. We here develop an analogous theory for recursive tree processes in continuous time. We illustrate the abstract theory on an example of a particle system with cooperative branching. This yields an interesting new example of a recursive tree process that is not endogenous.

1 Introduction and main results

Introduction
Let Ω and S be Polish spaces, let r be a finite measure on Ω with total mass |r| := r(Ω) > 0, and let γ : Ω × S^{N_+} → S be measurable, where N_+ := {1, 2, . . .}. Let T be the operator acting on probability measures on S defined as

T(µ) := the law of γ[ω](X_1, X_2, . . .), (1.1)

where ω is an Ω-valued random variable with law |r|^{-1} r and (X_i)_{i≥1} are i.i.d. with law µ. In this paper, we will be interested in the differential equation

∂/∂t µ_t = |r| ( T(µ_t) − µ_t ) (t ≥ 0). (1.2)

In Theorem 1 below, we will prove existence and uniqueness of solutions to (1.2) under the assumption that there exists a measurable function κ : Ω → N such that

γ[ω](x_1, x_2, . . .) depends only on the first κ(ω) ∈ N coordinates, (1.3)

and

∫_Ω r(dω) κ(ω) < ∞. (1.4)

Our interest in equation (1.2) stems from the fact that, as we will prove in Theorem 5 below, the mean-field limits of a large class of interacting particle systems are described by equations of the form (1.2). In view of this, we call (1.2) a mean-field equation. The analysis of such equations is commonly the first step towards understanding a given interacting particle system. Some illustrative examples of mean-field equations in the literature are [DN97, (1.1)], [NP99, (1.2)], and [FL17, (4)].
In the special case that κ(ω) = 1 for all ω ∈ Ω, we observe that T(µ) = ∫_S µ(dx) K(x, ·), where K is the probability kernel on S defined as K(x, A) := P[γ[ω](x) ∈ A] (x ∈ S, A ⊂ S measurable). (1.5) In view of this, if ω_1, ω_2, . . . are i.i.d. with law |r|^{-1} r and X_0 has law µ, then setting X_k := γ[ω_k](X_{k−1}) (k ≥ 1) inductively defines a Markov chain with transition kernel K, such that X_k has law T^k(µ), where T^k denotes the k-th iterate of the map T. Also, (1.2) describes the forward evolution of a continuous-time Markov chain where random maps γ[ω] are applied with Poisson rate r(dω). A representation of a probability kernel K in terms of a random map γ[ω] as in (1.5) is called a random mapping representation. More generally, when the function κ is not identically one, Aldous and Bandyopadhyay [AB05] have shown that the iterates T^k of the map T from (1.1) can be represented in terms of a Finite Recursive Tree Process (FRTP), which is a generalization of a discrete-time Markov chain where time has a tree-like structure. More precisely, they construct a finite tree of depth k where the state of each internal vertex is a random function of the states of its offspring. If the states of the leaves are i.i.d. with law µ, they show that the state at the root has law T^k(µ). They are especially interested in fixed points of T, which generalize the concept of an invariant law of a Markov chain. They show that each such fixed point ν gives rise to a Recursive Tree Process (RTP), which is a process on an infinite tree where the state of each vertex has law ν. One can think of such an RTP as a generalization of a stationary backward Markov chain (. . . , X_{−2}, X_{−1}, X_0). A fixed point equation of the form T(ν) = ν is called a Recursive Distributional Equation (RDE). Studying RDEs and their solutions is of independent interest, as they appear naturally in many applications; see for example [AB05, Als12].
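To make the random mapping representation concrete, here is a minimal sketch (our own toy example, not from the paper): S = {0, 1}, two unary maps γ[flip](x) = 1 − x and γ[keep](x) = x chosen with equal probability. Iterating X_k := γ[ω_k](X_{k−1}) with i.i.d. ω_k simulates the Markov chain with transition kernel K from (1.5), whose invariant law here is uniform on {0, 1}.

```python
import random

# Toy random mapping representation (hypothetical example, names are ours):
# gamma[omega] flips the state when omega == "flip", keeps it otherwise.
def gamma(omega, x):
    return 1 - x if omega == "flip" else x

def step(x, rng):
    # omega drawn from the normalized rate measure |r|^{-1} r (here: uniform)
    omega = rng.choice(["flip", "keep"])
    return gamma(omega, x)

rng = random.Random(0)
x = 0
counts = {0: 0, 1: 0}
for _ in range(10000):
    x = step(x, rng)
    counts[x] += 1
# The chain spends roughly half its time in each state (invariant law uniform).
print(counts[1] / 10000)
```

Since the flip probability is exactly 1/2 here, each step in fact resamples the state uniformly, so the empirical fraction of time in state 1 concentrates around 1/2.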
In the present paper, we develop an analogous theory in continuous time, generalizing the concept of a continuous-time Markov chain to chains where time has a tree-like structure. Let (T_t)_{t≥0} be the semigroup defined by T_t(µ) := µ_t where (µ_t)_{t≥0} solves (1.2) with µ_0 = µ. (1.6) In Theorem 6, we show that T_t has a representation similar to (1.1), namely T_t(µ) = the law of G_t((X_i)_{i∈T}), (1.7) where T is a countable set, G_t : S^T → S is a random map, and the (X_i)_{i∈T} are i.i.d. with law µ and independent of G_t. Similar to what we have in (1.3), the map G_t does not depend on all coordinates in T but only on a finite subcollection (X_i)_{i∈∇S_t}. Here (∇S_t)_{t≥0} turns out to be a branching process, and condition (1.4) (which is not needed in the discrete-time theory) guarantees that the offspring distribution of this branching process has finite mean. Similarly to (1.5), we can view (1.7) as a random mapping representation of the operator in (1.6).
As we have already mentioned, in Theorem 5 below, we prove that the mean-field limits of a large class of interacting particle systems are described by equations of the form (1.2). These interacting particle systems are constructed by applying local maps at the times of associated Poisson processes, which are introduced in detail in Section 1.3.
We are not only interested in the mean-field limit of a single process, but mainly in the mean-field limit of n coupled processes that are constructed from the same Poisson processes. For each n ≥ 1, a measurable map g : S^k → S gives rise to an n-variate map g^{(n)} : (S^n)^k → S^n defined as g^{(n)}(x_1, . . . , x_k) = g^{(n)}(x^1, . . . , x^n) := (g(x^1), . . . , g(x^n)) (x^1, . . . , x^n ∈ S^k), (1.8) where we denote an element of (S^n)^k as (x^m_i)^{m=1,...,n}_{i=1,...,k} with x_i = (x^1_i, . . . , x^n_i) and x^m = (x^m_1, . . . , x^m_k). Let P(S) denote the space of probability measures on a space S. Letting γ^{(n)}[ω] denote the n-variate map associated with γ[ω], then, in analogy to (1.1), T^{(n)}(µ) := the law of γ^{(n)}[ω](X_1, . . . , X_{κ(ω)}), (1.9) defines an n-variate map T^{(n)} : P(S^n) → P(S^n), which as in (1.2) gives rise to an n-variate mean-field equation, which describes the mean-field limit of n coupled processes. If X is an S-valued random variable whose law ν := P[X ∈ ·] is a fixed point of T, then ν^{(n)} := P[(X, . . . , X) ∈ ·] is a fixed point of T^{(n)} that describes n perfectly coupled processes. We will be interested in the stability (or instability) of ν^{(n)} under the n-variate mean-field equation. In other words, for our mean-field interacting particle systems, we fix the Poisson processes used in the construction and ask whether small changes in the initial state lead to small (or large) changes in the final state. Aldous and Bandyopadhyay [AB05] define an RTP to be endogenous if the state at the root is a measurable function of the random maps attached to all vertices of the tree. They showed, in a precise sense (see Theorem 10 below), that endogeny is equivalent to stability of ν^{(n)}. In Theorem 11, we generalize their result to the continuous-time setting.
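The n-variate construction in (1.8) can be sketched in a few lines (our own illustration; the function names are ours). The same map g and the same realization of the randomness are applied to all n coupled copies at once; below we use the cooperative branching map cob(x_1, x_2, x_3) = x_1 ∨ (x_2 ∧ x_3) from (1.13) below.

```python
# Sketch of the n-variate map g^(n) from (1.8): apply one map g coordinatewise
# to n coupled copies.  xs is a list of k arguments, each a tuple of length n:
# xs[i-1][m-1] is the i-th argument seen by the m-th coupled copy.
def n_variate(g, xs):
    n = len(xs[0])
    return tuple(g(*(x[m] for x in xs)) for m in range(n))

def cob(x1, x2, x3):
    return x1 | (x2 & x3)

# Two coupled copies (n = 2), three arguments each (k = 3):
# copy 1 sees (1, 1, 0), copy 2 sees (0, 1, 1).
out = n_variate(cob, [(1, 0), (1, 1), (0, 1)])
print(out)  # (cob(1, 1, 0), cob(0, 1, 1)) = (1, 1)
```

Because both copies are driven by the same map, a perfectly coupled input (all copies equal) stays perfectly coupled; the interesting question, addressed by endogeny, is whether small initial discrepancies shrink or grow.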
The n-variate map T^{(n)} is well-defined even for n = ∞, and T^{(∞)} maps the space of all exchangeable probability laws on S^{N_+} into itself. Let ξ be a P(S)-valued random variable with law ρ ∈ P(P(S)), and conditional on ξ, let (X^m)_{m=1,2,...} be i.i.d. with common law ξ. Then the unconditional law of (X^m)_{m=1,2,...} is exchangeable, and by De Finetti's theorem, each exchangeable law on S^{N_+} is of this form. In view of this, T^{(∞)} naturally gives rise to a map Ť : P(P(S)) → P(P(S)), which is the higher-level map defined in [MSS18], and which analogously to (1.2) gives rise to a higher-level mean-field equation. For any ν ∈ P(S), let P(P(S))_ν denote the set of all ρ ∈ P(P(S)) with mean ∫ ρ(dµ) µ = ν. In [MSS18] it is shown that if ν is a fixed point of T, then the corresponding higher-level map Ť has two fixed points ν̲ and ν̄ in P(P(S))_ν that are minimal and maximal with respect to the convex order, defined in Theorem 14 below. Moreover, ν̲ = ν̄ if and only if the RTP corresponding to ν is endogenous.
We will apply the theory developed here as well as in [MSS18] to the higher-level mean-field equation for a particular interacting particle system with cooperative branching and deaths; see also [SS15, Mac17, BCH18] for several different variants of the model. To formulate this properly, it is useful to introduce some more general notation. Recall that for each ω ∈ Ω, γ[ω] is a map from S^{κ(ω)} into S. We let G := {γ[ω] : ω ∈ Ω} (1.10) denote the set of all maps that can be obtained by varying ω. Here, elements of G are measurable maps g : S^k → S where k = k_g ≥ 0 may depend on g. If k = 0, then S^0 is defined to be a set with just one element, which we denote by ∅ (the empty sequence, which we distinguish notationally from the empty set ∅). We equip G with the final σ-field for the map ω ↦ γ[ω] and let π denote the image of the measure r under this map. Then the mean-field equation (1.2) can be rewritten as ∂/∂t µ_t = ∫_G π(dg) ( T_g(µ_t) − µ_t ) (t ≥ 0), (1.11) where for any measurable map g : S^k → S, T_g(µ) := the law of g(X_1, . . . , X_k), where (X_i)_{i=1,...,k} are i.i.d. with law µ. (1.12) In the concrete example that we are interested in, S := {0, 1} and G := {cob, dth} each have just two elements. Here cob : S^3 → S and dth : S^0 → S are the maps defined as cob(x_1, x_2, x_3) := x_1 ∨ (x_2 ∧ x_3) and dth(∅) := 0. (1.13) We choose π({cob}) := α ≥ 0 and π({dth}) := 1. (1.14) Then the mean-field equation (1.11) takes the form ∂/∂t µ_t = α ( T_cob(µ_t) − µ_t ) + ( T_dth(µ_t) − µ_t ), (1.15) which describes the mean-field limit of a particle system with cooperative branching (with rate α) and deaths (with rate 1). We will see that for α > 4, (1.11) has two stable fixed points ν_low, ν_upp, and an unstable fixed point ν_mid that separates the domains of attraction of the stable fixed points.
In Theorem 17 below, we find all fixed points of the corresponding higher-level mean-field equation and determine their domains of attraction. Note that solutions of the higher-level mean-field equation take values in the probability measures on P({0, 1}) ≅ [0, 1]. As mentioned before, each fixed point ν of the original mean-field equation gives rise to two fixed points ν̲, ν̄ of the higher-level mean-field equation, which are minimal and maximal in P(P(S))_ν with respect to the convex order. Moreover, ν̲ = ν̄ if and only if the RTP corresponding to ν is endogenous. In our example, we find that the stable fixed points ν_low, ν_upp give rise to endogenous RTPs, but the RTP associated with ν_mid is not endogenous. The higher-level equation has no other fixed points in P(P(S))_{ν_mid} except for ν̲_mid and ν̄_mid, of which the former is stable and the latter unstable. Numerical data for the nontrivial fixed point ν̲_mid (viewed as a probability measure on [0, 1]) are plotted in Figure 2.

The mean-field equation
In this subsection, we collect some basic results about the mean-field equation (1.2) that form the basis for all that follows. We interpret (1.2) in the following sense: letting ⟨µ, φ⟩ := ∫ φ dµ, we say that a process (µ_t)_{t≥0} solves (1.2) if for each bounded measurable function φ : S → R, the function t ↦ ⟨µ_t, φ⟩ is continuously differentiable and ∂/∂t ⟨µ_t, φ⟩ = |r| ( ⟨T(µ_t), φ⟩ − ⟨µ_t, φ⟩ ) (t ≥ 0). (1.16) Our first result gives sufficient conditions for existence and uniqueness of solutions to (1.2).
Theorem 1 (Mean-field equation) Let S and Ω be Polish spaces, let r be a nonzero finite measure on Ω, and let γ : Ω × S N + → S be measurable. Assume that there exists a measurable function κ : Ω → N such that (1.3) and (1.4) hold. Then the mean-field equation (1.2) has a unique solution (µ t ) t≥0 for each initial state µ 0 ∈ P(S).
Theorem 1 allows us to define a semigroup (T_t)_{t≥0} of operators T_t : P(S) → P(S) as in (1.6). It is often useful to know that solutions to (1.2) are continuous as a function of their initial state. The following proposition gives continuity w.r.t. the total variation norm ‖·‖ and moreover shows that if the constant K from (1.18) is negative, then the operators (T_t)_{t≥0} form a contraction semigroup.
Proposition 2 (Continuity in total variation norm) Under the assumptions of Theorem 1, one has ‖T_t(µ) − T_t(ν)‖ ≤ e^{Kt} ‖µ − ν‖ (t ≥ 0, µ, ν ∈ P(S)), where K := ∫_Ω r(dω) κ(ω) − |r|. (1.18) Continuity w.r.t. weak convergence needs an additional assumption.

Under an additional assumption, stated in (1.19), the operator T in (1.1) and the operators T_t (t ≥ 0) in (1.6) are continuous w.r.t. the topology of weak convergence.

The mean-field limit
In this subsection, we show that equations of the form (1.2) arise as the mean-field limits of a large class of interacting particle systems. In order to be reasonably general, and in particular to allow for systems in which more than one site can change its value at the same time, we will introduce quite a bit of notation that will not be needed anywhere else in Section 1, so impatient readers can just glance at Theorem 5 and the discussion surrounding (1.36) and skip the rest of this subsection.
Let S be a Polish space as before, and let N ∈ N_+. We will be interested in continuous-time Markov processes taking values in S^N, where N is large. Denoting an element of S^N by x = (x_1, . . . , x_N), we will focus on processes with a high degree of symmetry, in the sense that their dynamics are invariant under permutations of the coordinates. It is instructive, though not necessary for what follows, to view {1, . . . , N} as the vertex set of a complete graph, where all vertices are neighbors of each other. The basic ingredients we will use to describe our processes are: (i) a Polish space Ω′ equipped with a finite nonzero measure q, (ii) a measurable function λ : Ω′ → N_+, as well as (iii) for each ω ∈ Ω′ and 1 ≤ i ≤ λ(ω), a measurable map γ_i[ω] : S^{λ(ω)} → S and a set K_i(ω) ⊂ {1, . . . , λ(ω)} such that γ_i[ω] depends only on the coordinates in K_i(ω).
We now use the ingredients Ω′, q, λ, and γ to define the class of Markov processes we are interested in. We construct these processes by applying local maps that affect only finitely many coordinates, at the times of associated Poisson processes. In the context of interacting particle systems, such constructions are called graphical representations.
For any N ∈ N_+ we set [N] := {1, . . . , N}. We let [N]_ℓ denote the set of all sequences i = (i_1, . . . , i_ℓ) for which i_1, . . . , i_ℓ ∈ [N] are all different. Note that [N]_ℓ has N_ℓ := N(N − 1) · · · (N − ℓ + 1) elements. We will consider Markov processes X = (X(t))_{t≥0} with values in S^N that evolve in the following way: (i) At the times of a Poisson process with intensity |q| := q(Ω′), an element ω ∈ Ω′ is chosen according to the probability law |q|^{-1} q. (ii) An index i = (i_1, . . . , i_{λ(ω)}) ∈ [N]_{λ(ω)} is chosen uniformly at random, and the process jumps from its current state x to the new state x′ given by x′_{i_j} := γ_j[ω](x_{i_1}, . . . , x_{i_{λ(ω)}}) (1 ≤ j ≤ λ(ω)) and x′_m := x_m for all m ∉ {i_1, . . . , i_{λ(ω)}}.
Here ω represents some external input that tells us that we need to apply the maps γ_j[ω]. The coordinates where and the time when these maps need to be applied are given by i and t, respectively. It is easy to see that the random maps (X_{s,u})_{s≤u} form a stochastic flow, i.e., X_{s,s} = 1 and X_{t,u} ∘ X_{s,t} = X_{s,u} (s ≤ t ≤ u), (1.30) where 1 denotes the identity map. Moreover, (X_{s,u})_{s≤u} has independent increments in the sense that X_{t_1,t_2}, . . . , X_{t_{k−1},t_k} are independent (1.31) for each t_1 < · · · < t_k. It is well known (see, e.g., [SS18, Lemma 1]) that if X(0) is an S^N-valued random variable, independent of the Poisson set Π, then setting X(t) := X_{0,t}(X(0)) (t ≥ 0) (1.32) defines a Markov process X = (X(t))_{t≥0} with values in S^N. Note that (X(t))_{t≥0} has piecewise constant sample paths, which are right-continuous because of the way we have defined Π_{s,u}. We now formulate our result about the mean-field limit of Markov processes as defined in (1.32). For any x ∈ S^N, we define an empirical measure µ{x} on S by µ{x} := (1/N) Σ_{i=1}^N δ_{x_i}. (1.33) Below, µ^{⊗n} := µ ⊗ · · · ⊗ µ denotes the product measure of n copies of µ. Theorem 5 (Mean-field limit) Let S be a Polish space, let Ω′, q, λ, and γ be as above, and assume (1.23). For each N ∈ N_+, let (X^{(N)}(t))_{t≥0} be Markov processes with state space S^N as defined in (1.32), and let µ^N_t := µ{X^{(N)}(t)} denote their associated empirical measures. Let d be any metric on P(S) that generates the topology of weak convergence. Fix some (deterministic) µ_0 ∈ P(S) and assume that (at least) one of the following two conditions is satisfied.
(i) E‖(µ^N_0)^{⊗n} − µ_0^{⊗n}‖ → 0 as N → ∞ for all n ≥ 1, where ‖·‖ denotes the total variation norm.
(ii) The coordinates X^{(N)}_1(0), . . . , X^{(N)}_N(0) of the initial state are i.i.d. with common law µ_0.
Then, for each t ≥ 0, d(µ^N_{Nt}, µ_t) → 0 in probability as N → ∞, (1.34) where (µ_t)_{t≥0} is the unique solution to the mean-field equation (1.22) with initial state µ_0. Note that in (1.34), we rescale time by a factor N.
(1.35) Then the particle system in (1.32) has the following description. Let us say that a site i is occupied at time t if X_i(t) = 1. With rate α, three sites (i_1, i_2, i_3) ∈ [N]_3 are selected at random. If the sites i_2 and i_3 are both occupied, then the particles at these sites cooperate to produce a third particle at i_1, provided this site is empty. In addition, with rate 1, a site i is selected at random, and any particle that is present there dies. It is not hard to see that for our choice of Ω′, q, λ, and γ, the mean-field equation (1.22) simplifies to (1.15). Note that since γ_2[1] and γ_3[1] are the identity map, they drop out of (1.22), so only γ_1[1] = cob and γ_1[2] = dth remain. Since γ_1[2](x_1) = 0 regardless of the value of x_1, we can choose for K_1(2) the empty set and view γ_1[2] = dth as a function dth : S^0 → S.
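For readers who want to experiment, the following sketch (our own minimal implementation, with illustrative parameter choices) simulates the cooperative branching system on the complete graph, with time already rescaled by N as in (1.34): events occur at total rate (α + 1)N, each picking cob with probability α/(α + 1) and dth otherwise. For α = 8 > 4 and a fully occupied initial state, the empirical density should approach the upper stable fixed point (1 + √(1 − 4/α))/2 ≈ 0.854.

```python
import random

# Complete-graph cooperative branching with deaths (sketch; names are ours).
def simulate(N, alpha, T, rng):
    x = [1] * N                       # start fully occupied
    t = 0.0
    total_rate = (alpha + 1.0) * N    # rate after rescaling time by N
    while True:
        t += rng.expovariate(total_rate)
        if t > T:
            break
        if rng.random() < alpha / (alpha + 1.0):
            # cob: sites i2, i3 cooperate to fill i1
            i1, i2, i3 = rng.sample(range(N), 3)
            x[i1] = x[i1] | (x[i2] & x[i3])
        else:
            # dth: any particle at a uniformly chosen site dies
            x[rng.randrange(N)] = 0
    return sum(x) / N

rng = random.Random(1)
density = simulate(N=2000, alpha=8.0, T=10.0, rng=rng)
print(density)  # should be near 0.854 for these parameters
```

Starting instead below the middle fixed point (for example with a small initial density), the same dynamics drain to the empty configuration, illustrating the bistability claimed for α > 4.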
Solutions of (1.15) take values in the probability measures on S = {0, 1}, which are uniquely characterized by their value at 1. Rewriting (1.15) in terms of p_t := µ_t({1}) yields the equation ∂/∂t p_t = α p_t^2 (1 − p_t) − p_t (t ≥ 0). (1.36) The right-hand side vanishes if and only if p_t = 0 or α p_t(1 − p_t) = 1. Consequently, for α < 4 the only fixed point is z_low := 0, while for α ≥ 4 there are additionally z_mid := (1 − √(1 − 4/α))/2 and z_upp := (1 + √(1 − 4/α))/2, (1.37) which coincide when α = 4. For α > 4, the fixed points z_low and z_upp are stable and z_mid is unstable.
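The fixed-point structure of the equation dp/dt = α p²(1 − p) − p can be checked numerically; the sketch below (our own code) solves α p(1 − p) = 1 for the nonzero fixed points and verifies that the drift vanishes there.

```python
import math

# Fixed points of dp/dt = alpha * p^2 * (1 - p) - p: p = 0, and the roots of
# alpha * p * (1 - p) = 1, which are real exactly when alpha >= 4.
def fixed_points(alpha):
    pts = [0.0]                       # z_low
    disc = 1.0 - 4.0 / alpha
    if disc >= 0:
        pts.append((1 - math.sqrt(disc)) / 2)   # z_mid
        pts.append((1 + math.sqrt(disc)) / 2)   # z_upp
    return pts

def drift(alpha, p):
    return alpha * p * p * (1 - p) - p

alpha = 8.0
pts = fixed_points(alpha)
print(pts)
print([abs(drift(alpha, p)) < 1e-9 for p in pts])
```

For α = 8 this gives z_low = 0, z_mid ≈ 0.146, z_upp ≈ 0.854, and the drift is zero (up to floating-point error) at all three, while for α = 2 only z_low survives.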

A recursive tree representation
In this subsection we formally introduce Finite Recursive Tree Processes (FRTPs) and state the random mapping representation of solutions to the mean-field equation (1.2) anticipated in (1.7). For d ∈ N_+, let T_d denote the space of all finite words i = i_1 · · · i_n (n ∈ N) made up from the alphabet {1, . . . , d}, and define T_∞ similarly, using the alphabet N_+. If i, j ∈ T_d with i = i_1 · · · i_m and j = j_1 · · · j_n, then we define the concatenation ij ∈ T_d by ij := i_1 · · · i_m j_1 · · · j_n. We denote the length of a word i = i_1 · · · i_n by |i| := n and let ∅ denote the word of length zero. We view T_d as a tree with root ∅, where each vertex i ∈ T_d has d children i1, i2, . . ., and each vertex i = i_1 · · · i_n except the root has precisely one ancestor ←i := i_1 · · · i_{n−1}. For n ∈ N, we let T_d(n) := {i ∈ T_d : |i| < n} denote the tree truncated at depth n, and for any subtree U ⊂ T_d we let ∂U := {i ∈ T_d \ U : ←i ∈ U} denote its outer boundary, so that ∂T_d(n) = {i ∈ T_d : |i| = n}, (1.38) and use the convention ∂∅ := {∅}, so that (1.38) holds also for n = 0. We return to the set-up of Subsection 1.1, i.e., S and Ω are Polish spaces, r is a nonzero finite measure on Ω, and γ : Ω × S^{N_+} → S and κ : Ω → N are measurable functions such that (1.3) holds. We fix some d ∈ N_+ ∪ {∞} such that κ(ω) ≤ d for all ω ∈ Ω and set T := T_d. Let (ω_i)_{i∈T} be an i.i.d. collection of Ω-valued r.v.'s with common law |r|^{-1} r. Fix n ≥ 1 and assume that (i) the (X_i)_{i∈∂T(n)} are i.i.d. with common law µ and independent of (ω_i)_{i∈T(n)}, and (ii) X_i = γ[ω_i](X_{i1}, X_{i2}, . . .) for all i ∈ T(n). (1.39) Then it is easy to see that the law of X_∅ is given by T^n(µ), where T^n is the n-th iterate of the operator in (1.1). We call the collection of random variables (ω_i, X_i)_{i∈T(n)∪∂T(n)} (1.40) a Finite Recursive Tree Process (FRTP). We can think of (X_i)_{i∈T(n)∪∂T(n)} as a generalization of a Markov chain, where time has a tree-like structure. We now aim to give a similar representation of the semigroup (T_t)_{t≥0} from (1.6). To do this, we let (σ_i)_{i∈T} be i.i.d. exponentially distributed random variables with mean |r|^{-1}. We interpret σ_i as the lifetime of the individual with index i and let the sums of the lifetimes along the line of ancestry of i, excluding resp. including σ_i itself, (1.41) denote the times when the individual i is born and dies, respectively.
Then T_t := {i ∈ T : i has died before time t} and ∂T_t := {i ∈ T : i is alive at time t} (1.42) are the (random) subtrees of T consisting of all individuals that have died before time t, resp. are alive at time t. If the function κ from (1.3) is bounded, then we can choose T := T_d with d < ∞. Now it is easy to check that (∂T_t)_{t≥0} is a continuous-time branching process where each particle is with rate |r| replaced by d new particles. In particular, T_t is a.s. finite for each t > 0. On the other hand, when κ is unbounded, we need to choose T := T_∞, and this has the consequence that T_t is a.s. infinite for each t > 0. Nevertheless, under the assumption (1.4), it turns out that only a finite subtree of T_t is relevant for the state at the root X_∅, as we explain now. Let (ω_i)_{i∈T} be i.i.d. with common law |r|^{-1} r, independent of the lifetimes (σ_i)_{i∈T}, and let S be the random subtree of T defined as S := {i = i_1 · · · i_n ∈ T : i_k ≤ κ(ω_{i_1 · · · i_{k−1}}) for all 1 ≤ k ≤ n}, (1.43) and for each subtree U ⊂ S, let ∇U := {i ∈ S \ U : ←i ∈ U} denote the outer boundary of U relative to S, where again we use the convention that ∇U := {∅} if U is the empty set. Then, under condition (1.4), the sets S_t := S ∩ T_t and ∇S_t (1.44) are a.s. finite for all t ≥ 0. Indeed, (∇S_t)_{t≥0} is a branching process where for each individual i, with Poisson rate r(dω), an element ω ∈ Ω is selected and i is replaced by new individuals i1, . . . , iκ(ω). The condition on the rates (1.4) guarantees that this branching process has finite offspring mean and in particular does not explode, so that S_t is a.s. a finite subtree of S. For any finite rooted subtree U ⊂ S and for each (x_i)_{i∈∇U} = x ∈ S^{∇U}, we can inductively define x_i for i ∈ U by x_i := γ[ω_i](x_{i1}, . . . , x_{iκ(ω_i)}) (i ∈ U). (1.45) Then the value x_∅ we obtain at the root is a function of (x_i)_{i∈∇U}. Let us denote this function by G_U : S^{∇U} → S, i.e., G_U((x_i)_{i∈∇U}) := x_∅. (1.46)

Figure 1: A particular realization of the branching process ∇S_t for a system with cooperative branching and deaths. The random map G_{S_t} is the concatenation of random maps attached to the vertices of the family tree S_t of the individuals alive at time t. In this example, ∇S_t = {22, 23, 313, 322, 323, 332} and the maps cob and dth are as defined in (1.13).
We can think of G_U as the "concatenation" of the maps (γ[ω_i])_{i∈U}. We will in particular be interested in the random maps G_t := G_{S_t} (t ≥ 0), (1.47) with S_t as in (1.44). For our running example of a system with cooperative branching and deaths, these definitions are illustrated in Figure 1.
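The recursive tree representation can be tested by simulation. The following sketch (our own code, for the cooperative branching example with illustrative parameter choices) grows the random tree downward from the root for a time t, evaluates the concatenated map with i.i.d. Bernoulli(p) values on the individuals still alive at time t, and compares the mean of the root state with a numerical solution of the mean-field ODE dp/dt = α p²(1 − p) − p.

```python
import random

# Monte Carlo check of the recursive tree representation for cooperative
# branching with deaths.  Each individual lives an Exp(alpha + 1) time; at
# death it is replaced via cob (3 children, prob alpha/(alpha+1)) or dth
# (0 children).  Individuals still alive at time t carry i.i.d. Bernoulli(p).
def sample_root(alpha, t, p, rng):
    def value(remaining):
        sigma = rng.expovariate(alpha + 1.0)
        if sigma >= remaining:
            return 1 if rng.random() < p else 0      # leaf alive at time t
        if rng.random() < alpha / (alpha + 1.0):     # cob applied at death
            x1 = value(remaining - sigma)
            x2 = value(remaining - sigma)
            x3 = value(remaining - sigma)
            return x1 | (x2 & x3)
        return 0                                     # dth applied at death
    return value(t)

def ode_density(alpha, t, p, dt=1e-3):
    # crude forward Euler solution of dp/dt = alpha*p^2*(1-p) - p
    while t > 0:
        p += dt * (alpha * p * p * (1 - p) - p)
        t -= dt
    return p

rng = random.Random(2)
alpha, t, p = 8.0, 0.25, 0.9
mc = sum(sample_root(alpha, t, p, rng) for _ in range(10000)) / 10000
ode = ode_density(alpha, t, p)
print(mc, ode)  # the two estimates of p_t should agree up to Monte Carlo error
```

We keep t small here because the relevant subtree grows exponentially in t (at rate 2α − 1 in this example), so large t makes the direct tree evaluation expensive.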
Let (F_t)_{t≥0} (1.48) be the natural filtration associated with our evolving marked tree, which contains information about which individuals are alive at time t, as well as the random elements ω_i and lifetimes σ_i associated with all individuals that have died by time t. In particular, G_t is measurable w.r.t. F_t. The following theorem is a precise formulation of the random mapping representation of solutions of the mean-field equation (1.2), anticipated in (1.7).
Theorem 6 (Recursive tree representation) Let S and Ω be Polish spaces, let r be a nonzero finite measure on Ω, and let γ : Ω × S^{N_+} → S and κ : Ω → N be measurable functions satisfying (1.3) and (1.4). Let (ω_i)_{i∈S} be i.i.d. with common law |r|^{-1} r and let (σ_i)_{i∈S} be an independent i.i.d. collection of exponentially distributed random variables with mean |r|^{-1}. Fix t ≥ 0 and let G_t and F_t be defined as in (1.47) and (1.48). Conditional on F_t, let (X_i)_{i∈∇S_t} be i.i.d. S-valued random variables with common law µ. Then G_t((X_i)_{i∈∇S_t}) has law T_t(µ), (1.49) where T_t is defined in (1.6).
Recalling the definition of G_t, we can also formulate Theorem 6 as follows. With (ω_i, σ_i)_{i∈S} as above, fix t > 0 and let (X_i)_{i∈S_t∪∇S_t} be random variables such that (i) conditional on F_t, the r.v.'s (X_i)_{i∈∇S_t} are i.i.d. with common law µ, and (ii) X_i = γ[ω_i](X_{i1}, . . . , X_{iκ(ω_i)}) for all i ∈ S_t. (1.50) Then (1.49) says that the state at the root X_∅ has law T_t(µ). This is a continuous-time analogue of the FRTP (1.39).
In our proofs, we will first prove Theorem 6 and then use this to prove Theorem 5 about the mean-field limit of interacting particle systems. Recall that these particle systems are constructed from a stochastic flow (X_{s,t})_{s≤t} as in (1.32). To find the empirical measure of X(t) = X_{0,t}(X(0)), we pick a site i ∈ [N] at random and ask for its type X_i(t), which via X_{0,t} is a function of the initial state X(0). When N is large, X_i(t) does not depend on all coordinates (X_j(0))_{j∈[N]} but only on a random subset of them, and indeed one can show that the map that gives X_i(t) as a function of these coordinates approximates the map G_t from Theorem 6, in an appropriate sense. The heuristics behind this are explained in some more detail in Subsection 4.1 below.
Remark Another way to write (1.49) is µ_t = E[T_{G_t}(µ_0)] (t ≥ 0), (1.51) where T_{G_t} is defined as in (1.12) for the random map G_t and (µ_t)_{t≥0} is a solution to (1.2). One can check that (∇S_t, G_t)_{t≥0} is a Markov process. Let us informally denote this process by (G_t)_{t≥0} and its state space by G. Then equation (1.49) can be understood as a (generalized) duality relationship between (G_t)_{t≥0} and (µ_t)_{t≥0} with (generalized) duality function H : G × P(S) → P(S) given by H(g, µ) := T_g(µ). (1.52) With this definition, using the fact that G_0 is the identity map, (1.51) reads E[H(G_t, µ_0)] = E[H(G_0, µ_t)] (t ≥ 0), (1.53) and we can obtain a family of usual (real-valued) dualities by integrating against a test function φ.

Recursive tree processes
Recall the definition of the operator T in (1.1) and the semigroup (T_t)_{t≥0} in (1.6). It is clear from (1.2) that for a measure ν ∈ P(S), the following two conditions are equivalent: (i) T_t(ν) = ν for all t ≥ 0, and (ii) T(ν) = ν. (1.54) We call such a measure ν a fixed point of the mean-field equation (1.2). Condition (ii) is equivalent to saying that a random variable X with law ν satisfies X =_d γ[ω](X_1, X_2, . . .), (1.55) where =_d denotes equality in distribution, X_1, X_2, . . . are i.i.d. copies of X, and ω is an independent Ω-valued random variable with law |r|^{-1} r. Equations of this type are called Recursive Distributional Equations (RDEs).
FRTPs as in (1.39) are consistent in the sense that if (X_i)_{i∈∂T(n)} are as in (1.39), then for any 1 ≤ m ≤ n, (i) the (X_i)_{i∈∂T(m)} are i.i.d. with common law T^{n−m}(µ) and independent of (ω_i)_{i∈T(m)}, and (ii) X_i = γ[ω_i](X_{i1}, X_{i2}, . . .) for all i ∈ T(m). (1.56) The following lemma states a similar consistency property in the continuous-time setting.
Using the consistency relation (1.56) and Kolmogorov's extension theorem, it is not hard to see that if ν solves the RDE (1.54), then it is possible to define a stationary recursive process on an infinite tree such that the state of each vertex has law ν. This was already observed in [AB05]. The following lemma is a slight reformulation of their observation.
Lemma 8 (Recursive Tree Process) Let ν be a solution to the RDE (1.54). Then there exists a collection (ω_i, X_i)_{i∈T} of random variables whose joint law is uniquely characterized by the following requirements.
(i) The (ω_i)_{i∈T} are i.i.d. with common law |r|^{-1} r, and X_i = γ[ω_i](X_{i1}, X_{i2}, . . .) for all i ∈ T.
(ii) For each finite rooted subtree U ⊂ T, the r.v.'s (X_i)_{i∈∂U} are i.i.d. with common law ν and independent of (ω_i)_{i∈U}. (1.58)
We call a collection of random variables (ω_i, X_i)_{i∈T} as in Lemma 8 the Recursive Tree Process (RTP) corresponding to the map γ and the solution ν of the RDE (1.54). We can view such an RTP as a generalization of a stationary backward Markov chain. For most purposes, we will only need the random variables ω_i, X_i with i ∈ S, the random subtree defined in (1.43). The following proposition shows that by adding independent exponential lifetimes to an RTP, we obtain a stationary version of (1.57).
Proposition 9 (Continuous-time RTP) Let (ω_i, X_i)_{i∈T} be an RTP corresponding to a solution ν of the RDE (1.54), and let (σ_i)_{i∈T} be an independent i.i.d. collection of exponentially distributed random variables with mean |r|^{-1}. Then, for each t ≥ 0, (i) conditional on F_t, the r.v.'s (X_i)_{i∈∇S_t} are i.i.d. with common law ν, and (ii) X_i = γ[ω_i](X_{i1}, . . . , X_{iκ(ω_i)}) for all i ∈ S_t. (1.59) At the end of Subsection 1.3 we have seen that in our example of a system with cooperative branching, the RDE (1.54) has three solutions when the branching rate satisfies α > 4, two solutions for α = 4, and only one solution for α < 4. For α > 4, the solutions to the RDE are ν_low, ν_mid, and ν_upp, where ν_• denotes the probability measure on {0, 1} with mean ν_•({1}) = z_• (• = low, mid, upp), as defined around (1.37). By Lemma 8, each of these solutions to the RDE defines an RTP.

Endogeny and bivariate uniqueness
In [AB05, Def 7], an RTP (ω_i, X_i)_{i∈T} corresponding to a solution ν of the RDE (1.54) is called endogenous if X_∅ is a.s. measurable w.r.t. the σ-field generated by the random variables (ω_i)_{i∈T}. In Lemma 46 below, we will show that this is equivalent to X_∅ being a.s. measurable w.r.t. the σ-field generated by the random variables S and (ω_i)_{i∈S}, where S is the random tree defined in (1.43). Aldous and Bandyopadhyay have shown that endogeny is equivalent to bivariate uniqueness, which we now explain.
Let P_sym(S^n) denote the space of probability measures on S^n that are symmetric with respect to permutations of the coordinates. Let π_m : S^n → S denote the projection on the m-th coordinate, i.e., π_m(x_1, . . . , x_n) := x_m, and let µ^{(n)} ∘ π_m^{-1} denote the m-th marginal of a measure µ^{(n)} ∈ P(S^n). For any µ ∈ P(S), we define P(S^n)_µ := {µ^{(n)} ∈ P(S^n) : µ^{(n)} ∘ π_m^{-1} = µ for all 1 ≤ m ≤ n} (1.60) to be the set of probability measures on S^n whose one-dimensional marginals are all equal to µ, and we denote P_sym(S^n)_µ := P_sym(S^n) ∩ P(S^n)_µ. Finally, we define a "diagonal" set S^n_diag := {(x_1, . . . , x_n) ∈ S^n : x_1 = · · · = x_n} (1.61) and given a measure µ ∈ P(S), we let µ^{(n)} denote the unique element of P(S^n)_µ ∩ P(S^n_diag), i.e., µ^{(n)} := P[(X, . . . , X) ∈ ·], where X has law µ. (1.62) Recall the definition of the n-variate map T^{(n)} in (1.9). The following theorem has been proved in [MSS18, Thm 1], and in a slightly weaker form in [AB05, Thm 11]. Below, ⇒ denotes weak convergence of probability measures.
Theorem 10 (Endogeny and n-variate uniqueness) Let ν be a solution of the RDE (1.54). Then the following statements are equivalent.
(i) The RTP corresponding to ν is endogenous.
(ii) For each n ≥ 1 and µ^{(n)} ∈ P_sym(S^n)_ν, one has (T^{(n)})^k(µ^{(n)}) ⇒ ν^{(n)} as k → ∞.
(iii) ν^{(2)} is the only fixed point of T^{(2)} in the space P_sym(S^2)_ν.
We remark that bivariate uniqueness as introduced in [AB05] refers to ν^{(2)} being the only fixed point of T^{(2)} in the space P(S^2)_ν. The equivalences in the above theorem tell us that bivariate uniqueness already follows from the weaker condition (iii), since it implies (ii), which implies n-variate uniqueness for any n ≥ 1.
We will prove a continuous-time extension of Theorem 10, relating endogeny to solutions of the n-variate mean-field equation ∂/∂t µ^{(n)}_t = |r| ( T^{(n)}(µ^{(n)}_t) − µ^{(n)}_t ) (t ≥ 0), (1.63) where we have replaced T in (1.2) by T^{(n)} and we write µ^{(n)}_t to remind ourselves that this is a measure on S^n, rather than on S. This equation has the following interpretation. As in Subsection 1.3, let (X_{s,u})_{s≤u} be a stochastic flow on S^N constructed from a Poisson point set Π. Let (X^1(0), . . . , X^n(0)) be a random variable with values in (S^N)^n, independent of (X_{s,u})_{s≤u}. Then setting (X^1(t), . . . , X^n(t)) := (X_{0,t}(X^1(0)), . . . , X_{0,t}(X^n(0))) (t ≥ 0) (1.64) defines a Markov process (X^1(t), . . . , X^n(t))_{t≥0} that consists of n Markov processes with initial states X^1(0), . . . , X^n(0) that are coupled in such a way that they are constructed using the same stochastic flow. Applying Theorem 5 to this n-variate Markov process, we see that the mean-field equation for the n-variate process takes the form (1.63). We note that if µ^{(n)}_0 ∈ P(S^n_diag), then µ^{(n)}_t ∈ P(S^n_diag) for all t ≥ 0. We now formulate a continuous-time extension of Theorem 10. Note that in view of (1.54), a measure ν^{(2)} is a fixed point of the bivariate mean-field equation (i.e., (1.63) with n = 2) if and only if it is a fixed point of T^{(2)}. Therefore, the equivalence of points (i) and (iii) from Theorem 10 immediately implies an analogous statement in the continuous-time setting.
Theorem 11 (Endogeny and the n-variate mean-field equation) Under the assumptions of Theorem 10, the following conditions are equivalent.
(i) The RTP corresponding to ν is endogenous.
(ii) For any µ (n) 0 ∈ P(S n ) ν and n ≥ 1, the solution (µ Theorem 11 motivates us to study the bivariate mean-field equation in our example of a particle system with cooperative branching. Recall that in this example, G := {cob, dth} with cob and dth as in (1.13), and π is defined in (1.14). In line with (1.15) we write the bivariate mean-field equation as (1.65) For simplicity, we restrict ourselves to symmetric solutions, i.e., solutions that take values in P sym ({0, 1} 2 ). For any probability measure µ (2) ∈ P sym ({0, 1} 2 ), we let µ (1) denote its one-dimensional marginals, which are equal by symmetry. We let ν low , ν mid , ν upp denote the probability measures on mid , ν mid , and ν (2) upp .
(1.66), which are uniquely characterized by their respective marginals ν_low, ν_mid, ν_mid, ν_upp, together with the fact that ν̄^(2)_mid is concentrated on the diagonal of {0,1}^2 while ν̲^(2)_mid is not.

Proposition 12 (The bivariate mean-field equation) For any µ^(2)_0 ∈ P_sym({0,1}^2), the solution to (1.65) started in µ^(2)_0 converges as t → ∞ to one of the fixed points in (1.66), the respective domains of attraction being given in (1.67). For α = 4, there are two fixed points ν^(2)_low and ν^(2)_upp, while for α < 4 all solutions converge to ν^(2)_low.

Combining Proposition 12 with Theorem 11, we see that the RTPs corresponding to ν_low and ν_upp are endogenous, but for α > 4, the RTP corresponding to ν_mid is not. In view of [AB05], this yields an interesting new example of a recursive tree process that is not endogenous.
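In this example the one-dimensional mean-field equation reduces to a scalar ODE. The following sketch assumes the drift α p²(1 − p) − p (cooperative branching at rate α, deaths at rate 1), a form reconstructed from the fixed-point structure stated here rather than quoted from (1.15); it computes the fixed points and illustrates the domains of attraction numerically:

```python
import math

def drift(p, alpha):
    # hypothetical one-dimensional mean-field drift for cooperative
    # branching (rate alpha) and deaths (rate 1): a cob event creates a
    # particle from an occupied pair, a dth event removes a particle
    return alpha * p**2 * (1 - p) - p

def fixed_points(alpha):
    # p = 0 is always a fixed point; nonzero fixed points solve
    # alpha * p * (1 - p) = 1, which has solutions iff alpha >= 4
    pts = [0.0]
    if alpha >= 4:
        d = math.sqrt(1 - 4 / alpha)
        pts += [(1 - d) / 2, (1 + d) / 2]   # p_mid <= p_upp
    return pts

def flow(p0, alpha, t=200.0, dt=1e-3):
    # crude explicit Euler integration of the ODE
    p = p0
    for _ in range(int(t / dt)):
        p += dt * drift(p, alpha)
    return p

alpha = 6.0
p_zero, p_mid, p_upp = fixed_points(alpha)
hi = flow(0.9, alpha)   # started above p_mid: attracted to p_upp
lo = flow(0.2, alpha)   # started below p_mid: attracted to 0
```

For α > 4 the nonzero fixed points p_mid < p_upp merge at p = 1/2 as α ↓ 4, matching the trichotomy α < 4, α = 4, α > 4 described above.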

The higher-level mean-field equation
Following [MSS18, formula (1.1)], if S is a Polish space and g : S^k → S is a measurable map, then we define a measurable map ǧ : P(S)^k → P(S) by ǧ(µ_1, . . . , µ_k) := the law of g(X_1, . . . , X_k), where (X_1, . . . , X_k) are independent with laws µ_1, . . . , µ_k.
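To make the definition concrete, here is a sketch for the cooperative branching map cob(x_1, x_2, x_3) = x_1 ∨ (x_2 ∧ x_3) acting on Bernoulli laws, where ǧ can be written in closed form; the Monte Carlo comparison is only a sanity check, not part of the theory:

```python
import random

def cob(x1, x2, x3):
    # cooperative branching map: the first coordinate becomes occupied
    # if it already is, or if both partner coordinates are occupied
    return x1 | (x2 & x3)

def cob_check(p1, p2, p3):
    # the push-forward map for g = cob on Bernoulli laws: under
    # independence, P[cob = 1] = p1 + (1 - p1) * p2 * p3
    return p1 + (1 - p1) * p2 * p3

def cob_monte_carlo(p1, p2, p3, n=200_000, seed=1):
    # empirical estimate of the same probability
    rng = random.Random(seed)
    hits = sum(
        cob(rng.random() < p1, rng.random() < p2, rng.random() < p3)
        for _ in range(n)
    )
    return hits / n

exact = cob_check(0.5, 0.3, 0.6)        # = 0.5 + 0.5 * 0.18 = 0.59
approx = cob_monte_carlo(0.5, 0.3, 0.6)
```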
ρ^(n) := E[µ^⊗n], where µ is a random probability measure with law ρ. (1.72)

(Here E[ · ] denotes the expectation of a random measure; see the remark above Theorem 5.) Our notation for moment measures is on purpose similar to our earlier notation for solutions to the n-variate equation, because of the following proposition.
Proposition 13 (Moment measures) If (ρ_t)_{t≥0} solves the higher-level mean-field equation (1.71), then its n-th moment measures (ρ^(n)_t)_{t≥0} solve the n-variate mean-field equation (1.63).

Similarly to Proposition 13, it has been shown in [MSS18, Lemma 2] that Ť(ρ)^(n) = T^(n)(ρ^(n)), and this formula holds even for n = ∞. In view of this, as discussed in Subsection 1.1, the higher-level map Ť is effectively equivalent to the ∞-variate map T^(∞) : P_sym(S^∞) → P_sym(S^∞). It follows from Proposition 13 that if ρ solves the higher-level RDE (1.73), then its n-th moment measures solve the n-variate RDE T^(n)(ρ^(n)) = ρ^(n), with T^(n) as in (1.9).

If X is an S-valued random variable defined on some probability space (Ω, F, P) and H ⊂ F is a sub-σ-field, then P[X ∈ · | H] is a random probability measure on S. As a consequence, the law of P[X ∈ · | H] is an element of P(P(S)). In the following theorem, which is based on [Str65, Thm 2] and which in its present form we cite from [MSS18, Thm 13], we use the fact that each Polish space S has a metrizable compactification S̄ [Bou58, §6 No. 1, Theorem 1]. Moreover, we naturally identify P(S) with the space of all probability measures on S̄ that are concentrated on S.
Theorem 14 (The convex order for laws of random probability measures) Let S be a Polish space, let S̄ be a metrizable compactification of S, and let C_cv(P(S̄)) denote the space of all convex continuous functions φ : P(S̄) → R. Then, for ρ_1, ρ_2 ∈ P(P(S)), the following statements are equivalent.
Proposition 15 (Extremal solutions in the convex order) If (ρ^i_t)_{t≥0} (i = 1, 2) are solutions to the higher-level mean-field equation (1.71) such that ρ^1_0 ≤_cv ρ^2_0, then ρ^1_t ≤_cv ρ^2_t for all t ≥ 0. If ν solves the RDE (1.54), then there exist solutions ν̲ and ν̄ of the higher-level RDE (1.73) such that (1.75) holds, where ⇒ denotes weak convergence of measures on P(S), equipped with the topology of weak convergence. Any solution ρ ∈ P(P(S))_ν to the higher-level RDE (1.73) satisfies (1.76).

The following result, which we cite from [MSS18, Prop. 4], describes the higher-level RTPs associated with the solutions ν̲ and ν̄ of the higher-level RDE.

Proposition 16 (Higher-level RTPs) Let ν be a solution of the RDE (1.54) and let ν̲ and ν̄ as in (1.76) be the corresponding minimal and maximal solutions to the higher-level RDE, with respect to the convex order. Let (ω_i, X_i)_{i∈T} be an RTP corresponding to γ and ν and set (1.77). Then (ω_i, ξ_i)_{i∈T} is an RTP corresponding to γ̌ and ν̲. Also, (ω_i, δ_{X_i})_{i∈T} is an RTP corresponding to γ̌ and ν̄.
Proposition 16 gives a more concrete interpretation of the solutions ν̲ and ν̄ to the higher-level RDE from (1.76). Indeed, if (ω_i, X_i)_{i∈T} is an RTP corresponding to ν, then

ν̄ = the law of δ_{X_∅}, (1.78)

which corresponds to "perfect knowledge" about the state X_∅ of the root, while

ν̲ = the law of P[X_∅ ∈ · | (ω_i)_{i∈T}] (1.79)

corresponds to the knowledge about X_∅ that is contained in the random variables (ω_i)_{i∈T}. Since X_∅ is a measurable function of (ω_i)_{i∈T} if and only if its conditional law given (ω_i)_{i∈T} equals δ_{X_∅}, it follows from (1.78) and (1.79) that the RTP corresponding to ν is endogenous if and only if ν̲ = ν̄.
It is instructive to demonstrate the general theory on our concrete example of a system with cooperative branching and deaths. Recall that for α > 4, the mean-field equation (1.15) has three fixed points ν_low, ν_mid, ν_upp. We denote the corresponding minimal and maximal solutions to the higher-level RDE in the sense of (1.76) by ν̲_… and ν̄_… (… = low, mid, upp). The following theorem lifts the results from Proposition 12 about the bivariate equation to a higher level. Indeed, using the theorem below, it is easy to see that the measures in (1.66) arise as the second moment measures of the solutions to the higher-level RDE listed in (1.81).

Theorem 17 (Higher-level equation for cooperative branching) Let ν_low, ν_mid, and ν_upp denote the fixed points of the mean-field equation (1.15) defined above Proposition 12. Then we have for the corresponding minimal and maximal solutions to the higher-level RDE that ν̲_low = ν̄_low and ν̲_upp = ν̄_upp, while ν̲_mid ≠ ν̄_mid. For α > 4, the higher-level RDE (1.73) has four solutions, namely ν̄_low, ν̲_mid, ν̄_mid, and ν̄_upp. (1.81) Any solution (ρ_t)_{t≥0} to the higher-level mean-field equation (1.71) converges as t → ∞ to one of the fixed points in (1.81), the respective domains of attraction being given in (1.82). For α = 4, there are two fixed points ν̄_low and ν̄_upp with respective domains of attraction given analogously, while for α < 4 all solutions converge to ν̄_low.
Lemma 18 (Nontrivial solution of the higher-level RDE) Let α > 4 and let η be a random variable with law ν̲_mid. Then (1.88) holds.

It is not too hard to obtain numerical data for ν̲_mid, see Figure 2. These data suggest that apart from the atom at 0, the measure ν̲_mid has a smooth density with respect to the Lebesgue measure, but we have no proof for this. We have tried to find an explicit formula for the density but have not been successful.
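Since the displayed formula (1.88) is not reproduced here, the following sketch records what the fixed-point property of ν̲_mid under the higher-level map amounts to in this example, identifying P({0,1}) with [0,1]; the precise statement of (1.88) may differ. Here η_1, η_2, η_3 are i.i.d. copies of η, independent of the discrete choice, and the weights come from the rates α of cob and 1 of dth:

```latex
\eta \;\overset{d}{=}\;
\begin{cases}
0 & \text{with probability } \tfrac{1}{1+\alpha},\\[2pt]
\eta_1 + (1-\eta_1)\,\eta_2\eta_3 & \text{with probability } \tfrac{\alpha}{1+\alpha},
\end{cases}
```

where the second line is the higher-level map cǒb applied to Bernoulli parameters, as computed above.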

Lower and upper solutions
In this and the next subsection we collect a few further results on endogeny and the uniqueness of solutions to RDEs. In the present subsection, we show that the endogeny of the RTPs corresponding to ν low and ν upp follows from a general principle, discovered in [AB05], that says that RDEs that are defined by monotone maps always have a minimal and maximal solution with respect to the stochastic order, and that the RTPs corresponding to these solutions are always endogenous.
Let S be a compact metrizable space that is equipped with a partial order ≤ that is closed in the sense that {(x, y) ∈ S^2 : x ≤ y} is a closed subset of S^2, equipped with the product topology. Recall that a function f from one partially ordered space into another is monotone if x ≤ y implies f(x) ≤ f(y). It is known that for two probability measures µ_1, µ_2 ∈ P(S), the following statements are equivalent: (i) there exists a coupling (X_1, X_2) of µ_1 and µ_2 such that X_1 ≤ X_2 a.s.; (ii) ∫ f dµ_1 ≤ ∫ f dµ_2 for all bounded continuous monotone f : S → R.
If µ_1, µ_2 satisfy the above conditions, then one says that they are stochastically ordered, denoted as µ_1 ≤ µ_2. This defines a partial order on P(S); in particular, by Lemma 50 below, µ_1 ≤ µ_2 and µ_2 ≤ µ_1 imply µ_1 = µ_2.

The proposition below is a variant of [AB05, Lemma 15]. As in our usual setting, we assume that S and Ω are Polish spaces, r is a nonzero finite measure on Ω, and γ : Ω × S^{N+} → S and κ : Ω → N are measurable functions such that (1.3) and (1.4) hold. If S is equipped with a partial order, then we equip S^k with the product partial order. Recall that Proposition 3 gives sufficient conditions for T to be continuous w.r.t. the topology of weak convergence.
Proposition 19 (Lower and upper solutions to RDE) Assume that S is compact and equipped with a closed partial order. Assume that S has minimal and maximal elements, denoted by 0 and 1. Assume that γ[ω] is monotone for each ω ∈ Ω and that the operator T in (1.1) is continuous w.r.t. the topology of weak convergence. Then there exist solutions ν_low, ν_upp to the RDE (1.54) that are minimal and maximal with respect to the stochastic order, in the sense that any solution ν to the RDE (1.54) must satisfy ν_low ≤ ν ≤ ν_upp. Moreover, if (µ^low_t)_{t≥0} and (µ^upp_t)_{t≥0} denote the solutions to the mean-field equation (1.2) with initial states µ^low_0 = δ_0 and µ^upp_0 = δ_1, then µ^low_t ⇒ ν_low and µ^upp_t ⇒ ν_upp as t → ∞, where ⇒ denotes weak convergence. Finally, the RTPs corresponding to ν_low and ν_upp are endogenous.
We can view the solutions ν low and ν upp to the RDE (1.54) as mean-field versions of the lower and upper invariant laws of monotone particle systems; compare [Lig85, Thm III.2.3].
In our example of a system with cooperative branching, the maps cob and dth are monotone, so Proposition 19 is applicable. Since the measures we called ν_low and ν_upp before are the t → ∞ limits of the solutions of the mean-field equation started in δ_0 and δ_1, our earlier notation agrees with the more general notation of Proposition 19. The endogeny of the RTPs corresponding to ν_low and ν_upp, which we previously proved through an analysis of the bivariate equation using Proposition 12 and Theorem 11, alternatively follows from Proposition 19.
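The monotonicity claim for cob and dth is a finite check; a minimal brute-force sketch over the coordinatewise order on {0,1}^k (the map xor is included only as a non-monotone contrast and is not part of the model):

```python
from itertools import product

def cob(x):  # ternary cooperative branching map on {0,1}^3
    return x[0] | (x[1] & x[2])

def dth(x):  # nullary death map
    return 0

def xor(x):  # a non-monotone map, for contrast only
    return x[0] ^ x[1]

def is_monotone(g, k):
    # check g(x) <= g(y) whenever x <= y coordinatewise on {0,1}^k
    pts = list(product((0, 1), repeat=k))
    return all(
        g(x) <= g(y)
        for x in pts
        for y in pts
        if all(a <= b for a, b in zip(x, y))
    )

mono_cob = is_monotone(cob, 3)   # True
mono_dth = is_monotone(dth, 0)   # True (vacuously: no inputs)
mono_xor = is_monotone(xor, 2)   # False
```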

Conditions for uniqueness
In the present subsection, we prove some results of varying generality that allow one to conclude that a given RDE has a unique solution. In our example of a system with cooperative branching and deaths, this happens if and only if α < 4. We will see that there are some general results that can be applied to prove uniqueness in the whole regime α < 4. We also make a connection with a general duality for monotone particle systems described in [SS18]. Although duality plays only a minor role in our paper, the original motivation for the work that led to it was to understand this duality in the mean-field limit.
We return to our usual set-up from Subsection 1.1 with S and Ω Polish spaces and γ, κ, and r satisfying (1.3) and (1.4). We also recall the random subtrees S_t ⊂ S ⊂ T defined in (1.43), as well as the fact that the sets S_t are a.s. finite for all t ≥ 0 by (1.4). The tree S is the family tree of the branching process (∇S_t)_{t≥0}. In view of this, by well-known facts about branching processes, S is a.s. finite if and only if the mean offspring number is at most one, i.e., if and only if ∫_Ω r(dω) κ(ω) ≤ |r|. Recall that G_t = G_{S_t}, where for any finite subtree U ⊂ S that contains the root, G_U : S^{∇U} → S is the map defined in (1.46). If S is a.s. finite, then ∇S_t = ∅ for t sufficiently large and hence G_t : S^{∇S_t} → S is eventually constant.
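The finiteness criterion can be illustrated numerically. In our running example the offspring distribution puts mass 1/(1+α) on 0 children (the map dth) and α/(1+α) on 3 children (the map cob), so S is a.s. finite iff 3α/(1+α) ≤ 1, i.e. iff α ≤ 1/2; the survival probability in the supercritical case is one minus the smallest fixed point of the offspring generating function. A sketch:

```python
def gen(q, alpha):
    # generating function of the offspring distribution:
    # 0 children w.p. 1/(1+alpha) (dth), 3 children w.p. alpha/(1+alpha) (cob)
    return (1 + alpha * q**3) / (1 + alpha)

def extinction_prob(alpha, iters=200):
    # the extinction probability is the smallest fixed point of gen in [0,1],
    # obtained by iterating from 0
    q = 0.0
    for _ in range(iters):
        q = gen(q, alpha)
    return q

q_sub = extinction_prob(0.4)   # mean offspring 1.2/1.4 < 1: extinction is certain
q_sup = extinction_prob(6.0)   # mean offspring 18/7 > 1: extinction prob < 1
```

For α = 6 the fixed-point equation 6q³ − 7q + 1 = 0 factors as (q − 1)(6q² + 6q − 1) = 0, giving extinction probability (√60 − 6)/12 ≈ 0.145.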
More generally, if U is a finite subtree of S that contains the root ∅, then we say that U is a root determining subtree if the map G_U : S^{∇U} → S is constant. Note that this can happen even if ∇U ≠ ∅. It is easy to see that if V ⊂ U and V is root determining, then the same is true for U. We say that U is a minimal root determining subtree if U is root determining but there exists no V ⊂ U with V ≠ U that is root determining. By our previous remark, it suffices to check this for such V that differ from U by a single element.
Lemma 20 (Root determining subtrees) The following conditions are equivalent: (i) There a.s. exists a t < ∞ such that G s is constant for all s ≥ t.
(ii) S a.s. contains a root determining subtree.
(iii) S a.s. contains a minimal root determining subtree.
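A sketch of the definition for the cooperative branching example: represent a finite subtree with maps attached as a nested tuple, mark the boundary vertices in ∇U as "free", and test by brute force whether G_U is constant (the tuple encoding is an illustration, not notation from the text):

```python
from itertools import product

def evaluate(node, bits):
    # bits: iterator supplying states for the "free" boundary vertices,
    # consumed in depth-first order; dth has no children and outputs 0,
    # cob has three children and outputs x1 or (x2 and x3)
    if node == "free":
        return next(bits)
    if node[0] == "dth":
        return 0
    x1, x2, x3 = (evaluate(c, bits) for c in node[1:])
    return x1 | (x2 & x3)

def n_free(node):
    if node == "free":
        return 1
    if node[0] == "dth":
        return 0
    return sum(n_free(c) for c in node[1:])

def is_root_determining(tree):
    # G_U is constant iff it takes a single value over all boundary states
    k = n_free(tree)
    vals = {evaluate(tree, iter(b)) for b in product((0, 1), repeat=k)}
    return len(vals) == 1

# cob(0, x2, x3) = x2 and x3: the root still depends on the boundary
open_tree = ("cob", ("dth",), "free", "free")
# cob(0, 0, x3) = 0: constant, so this subtree determines the root
closed_tree = ("cob", ("dth",), ("dth",), "free")
```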
If U is a subtree of S, then we denote by Ξ_U the set of all x = (x_i)_{i∈U∪∇U} that satisfy (1.45). We say that U is uniquely determined if x, y ∈ Ξ_U imply x_i = y_i (i ∈ U). The following lemma is inspired by [AB05, Lemma 14], where it was shown that condition (i) below implies that the RDE (1.54) has a unique solution and the corresponding RTP is endogenous.
(i) S a.s. contains a finite, uniquely determined subtree that contains the root ∅.
(ii) The equivalent conditions of Lemma 20 are satisfied.
(iv) The RDE (1.54) has at most one solution and any corresponding RTP is endogenous.
(v) The RDE (1.54) has a solution ν that is globally attractive in the sense that any solution (µ_t)_{t≥0} to the mean-field equation (1.2) satisfies ‖µ_t − ν‖ → 0 as t → ∞, where ‖ · ‖ denotes the total variation norm.
The following lemma illustrates these ideas on our example of a system with cooperative branching and deaths. Below, |U ∩ {i2, i3}| denotes the cardinality of the set U ∩ {i2, i3}. See Figure 3 for an example. (1.94)

Lemma 22 shows that in our example of a system with cooperative branching and deaths, the conditions of Lemma 20 are in fact equivalent to uniqueness of solutions to the RDE. As the next lemma shows, this is a consequence of monotonicity.
Lemma 23 (Uniqueness for monotone systems) Assume that S is a finite partially ordered set that contains a minimal and maximal element, and assume that γ[ω] is monotone for each ω ∈ Ω. Then the RDE (1.54) has a unique solution if and only if the equivalent conditions of Lemma 20 are satisfied.
In the remainder of this subsection, we focus on the case that S = {0, 1} and γ[ω] is monotone for all ω ∈ Ω, which allows us to make a connection to a general duality for monotone particle systems described in [SS18]. Recall from Section 1.4 that (∇S_t, G_t)_{t≥0} is a Markov process. If S = {0, 1} and γ[ω] is monotone for all ω ∈ Ω, then the random map G_t : {0, 1}^{∇S_t} → {0, 1} is monotone for each t ≥ 0. In view of this, by (1.96), G_t is uniquely characterized by Y_{G_t} and hence (∇S_t, Y_{G_t})_{t≥0} is a Markov process too. For a system with cooperative branching and deaths, this process has been defined before in [Mac17, Section I.2.1.2]. As explained in more detail there, it can be seen as the mean-field limit of a general dual for monotone particle systems described in [SS18].

Let γ[ω] be monotone for all ω ∈ Ω, and let U be a subtree of S that contains the root ∅. Borrowing terminology from percolation theory, we say that U is open if the condition stated in (1.98) holds, where we use the convention that a maximum over the empty set equals 1. We note that formula (1.98) can be generalized to more general finite partially ordered sets S, see Lemma 64 below.

Again, it will be useful to illustrate our definitions on the concrete example of a system with cooperative branching and deaths. To make the example more interesting, we add a birth map bth : S^0 → S, which is defined similarly to the death map as bth(∅) := 1. (1.100) The following lemma describes open subtrees for a system described by the maps cob, dth, bth; see Figure 4 for an illustration.
We can think of open subtrees as a generalization of the open paths from oriented percolation. Outside of the mean-field setting, using ideas from [SS18, Section 5.2], one can characterize the upper invariant law of quite general monotone particle systems in terms of "open structures" that in general are neither paths nor trees.

Discussion
This section is divided into four subsections. In Subsection 2.1, we discuss the relation of our work to [BCH18], who in parallel to our work have studied Moran models that generalize our running example of a system with cooperative branching and deaths. In Subsection 2.2, we compare our results and methods with the existing literature on mean-field limits. In Subsection 2.3, we state open problems and we conclude in Subsection 2.4 with an outline of the proofs.
This equation has an interpretation in terms of a Moran model describing a fixed population of N individuals which can be of two types, 0 and 1, where type 1 is fitter than type 0. The parameter γ is the frequency-dependent selection rate, s is the selection rate, u is the mutation rate, and ν_0, ν_1 are mutation probabilities. The frequency-dependent selection is of a type that is especially appropriate to describe an advantageous, (partially) recessive gene in a diploid population.
In parallel to our work, Moran models of this form have been studied by Ellen Baake, Fernando Cordero, and Sebastian Hummel in [BCH18]. A notational difference between their work and the discussion here is that they denote the fitter type by 0, so their [BCH18, formula (2.1)] is our (2.3) rewritten in terms of y(t) = 1 − p_t and with the roles of ν_0 and ν_1 reversed. They prove that (2.3) describes the mean-field limit of a class of Moran models [BCH18, Prop. 4.1] and that in the limit N → ∞, the genealogy of a single individual is described by an Ancestral Selection Graph (ASG) A_t, which in our notation corresponds to the random tree with maps attached to its branch points depicted in Figure 1. The authors of [BCH18] define a duality function H(A_t, p) which corresponds to the duality function in (1.52) after the identification µ({1}) = p. (Here we have slightly rephrased things compared to the different conventions in [BCH18], where 0 denotes the fitter type and y is the frequency of the unfit type.) In [BCH18, Lemma 4.4], they show that H(A_t, p) can be calculated by concatenating the higher-level maps γ̌[ω_i] with i ∈ S_t. For example, the equation y = y_1[y_2 + y_3 − y_2y_3] in [BCH18, Lemma 4.4 (4)] can be rewritten in terms of p = 1 − y as p = cǒb(p_1, p_2, p_3) with cǒb as in (1.84).
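The translation between the two conventions is a one-line computation. Writing p_i = 1 − y_i and using that cob(x_1, x_2, x_3) = x_1 ∨ (x_2 ∧ x_3) equals 1 with probability p_1 + (1 − p_1)p_2p_3 for independent Bernoulli inputs:

```latex
\begin{aligned}
y &= y_1\bigl[y_2+y_3-y_2y_3\bigr]
   = y_1\bigl[1-(1-y_2)(1-y_3)\bigr]
   = (1-p_1)\,(1-p_2p_3),\\
p &= 1-y = p_1+(1-p_1)\,p_2p_3 = \check{\mathrm{cob}}(p_1,p_2,p_3).
\end{aligned}
```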
In [BCH18,Section 5], it is shown that the ASG A t can be simplified a lot, while retaining all information necessary to calculate the duality function H(A t , p). This is done in three steps, I, IIa, and IIb.
In step I, the ASG is pruned. This is a process in which parts of the tree that are irrelevant for the map G_t are cut off. In particular, if the function G_t is constant, then the pruned ASG consists of a single edge ending in one of the maps dth or bth. In the remaining case, the pruned ASG is a finite tree where each branch point is marked with one of the maps cob and bra.
In steps IIa and IIb, the pruned ASG is stratified. In step IIa, the tree structure is changed in such a way that starting at the root, one first sees a ternary tree containing only the map cob, and then at the leaves of this ternary tree, there are attached binary trees containing only the map bra. In step IIb, each binary tree is replaced by an integer n ≥ 0 which records the number of leaves of the binary tree.
The result of this is a simplified process, the stratified ASG T_t, which contains all necessary information about the ASG A_t in the sense that there exists a function H(T_t, p) with H(T_t, p) = H(A_t, p). In particular, solutions of (2.3) can be represented in terms of the stratified ASG. One can now check (compare Lemma 48 below) that ρ_t := P[H(T_t, p) ∈ · ] solves the higher-level mean-field equation with initial state ρ_0 = δ_p, where we use the identification P({0, 1}) ≅ [0, 1]. In [BCH18, Thm 6.5], it is observed that M_t := H(T_t, p_0) is a bounded sub- or supermartingale for each p_0 ∈ [0, 1] and hence converges to an a.s. limit H_∞(p_0). In [BCH18, Prop. 6.6], it is proved that if p_0 is not an unstable fixed point of (2.3), then H_∞(p_0) is a Bernoulli random variable with parameter lim_{t→∞} p_t.
Our Propositions 15 and 16 imply that if p 0 is a fixed point of (2.3), then H ∞ (p 0 ) is a Bernoulli random variable if and only if the RTP corresponding to p 0 is endogenous. Thus, [BCH18,Prop. 6.6] implies that for the model in (2.2), RTPs corresponding to stable fixed points are always endogenous. Since all stable fixed points of (2.3) are in fact lower or upper solutions, this alternatively also follows from our Proposition 19.
In the special case s = 0 and ν_1 = 0, [BCH18, Prop. 6.6] follows alternatively from our Theorem 17, which completely describes the long-time behaviour of solutions to the higher-level mean-field equation not just for initial states of the form ρ_0 = δ_p, but for general initial states.

Mean-field limits
If N Markov processes interact in a way that is symmetric under permutations of the N coordinates, then it is frequently possible to obtain a nontrivial limit as N → ∞. Such limits are generally called mean-field limits. In the mean-field limit, the individual processes behave asymptotically independently, but with transition probabilities that depend on the average behavior of all processes. For systems of interacting diffusions, this principle was demonstrated by McKean in his analysis of the Vlasov equation [McK66]. Consequently, mean-field limits are also called McKean-Vlasov limits. There exists an extensive literature on the topic. Most work has focused on interacting diffusions, but jump processes have also been studied [ST85,ADF18]. An elementary introduction to mean-field limits for interacting particle systems is given in [Swa17,Chapter 3].
In a biological setting, well-mixed populations converge in the mean-field limit to the solution of a deterministic ODE. Similarly, spatial populations with strong local mixing can be expected to converge, after an appropriate rescaling, to the solution of a deterministic PDE. For interacting particle systems whose dynamics have an exclusion process component with a large rate, this intuition was made rigorous by De Masi, Ferrari and Lebowitz [DFL86, Thm 2]. They state their theorem only for processes whose state space S consists of two points, and only prove the theorem for one particular one-dimensional example, but sketch how the proof should be adapted to the general case. In [DN94, Thm 1], a version of the theorem is stated where S can be any finite set; it is claimed that the proof is again the same.
In our running example of a particle system with cooperative branching and deaths, the limiting PDE takes the form of a reaction-diffusion equation (2.5), obtained by adding a diffusion term to the right-hand side of the ODE (1.36). This PDE was used in [Nob92] to derive asymptotic properties of the associated spatial particle system with strong mixing. We can view (2.5) as a spatial version of the ODE (1.36); in particular, if p_0(x) = p_0 does not depend on x, then p_t = p_t(x) solves (1.36). The intuition behind (2.5), and more general PDEs of this type, is easily explained. In the strong mixing limit, the genealogy of a single site should be described by a branching process as in Figure 1 where, in addition, each particle has a position in R which moves according to an independent Brownian motion. Convergence to the PDE should then follow from, on the one hand, convergence of the genealogies to a system of branching Brownian motions with random maps attached to their branching events, and, on the other hand, a representation in the spirit of Theorem 6 of solutions of the PDE (2.5) in terms of such a system of branching Brownian motions.
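This intuition can be explored with a crude explicit finite-difference scheme. The exact form of (2.5) is not reproduced here; the sketch below assumes the reaction term α p²(1 − p) − p from the mean-field ODE and, as a guess consistent with Brownian genealogies, diffusion constant 1/2 (both are assumptions, not quotations from the text):

```python
def step(p, alpha, dx, dt):
    # one explicit Euler step for a hypothetical discretization of
    # d/dt p = (1/2) d^2/dx^2 p + alpha p^2 (1 - p) - p,
    # with periodic boundary conditions
    n = len(p)
    q = p[:]
    for i in range(n):
        lap = (p[(i - 1) % n] - 2 * p[i] + p[(i + 1) % n]) / dx**2
        q[i] = p[i] + dt * (0.5 * lap + alpha * p[i] ** 2 * (1 - p[i]) - p[i])
    return q

alpha, dx, dt = 6.0, 0.5, 0.05
# a localized patch of high density in a near-empty background
p = [0.8 if 20 <= i < 30 else 0.0 for i in range(50)]
for _ in range(400):
    p = step(p, alpha, dx, dt)
```

Because the reaction term vanishes quadratically at p = 0 (an Allee-type effect), small or narrow patches can die out even in the regime α > 4 where the ODE has a stable upper fixed point.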
The proof of [DFL86, Thm 2] is indeed based on this sort of dual approach, although one would wish that they had given a more explicit statement of the stochastic representation of solutions of their general PDE. Our proof of Theorem 5 follows the same strategy, i.e., we first prove the stochastic representation of solutions to the mean-field equation (Theorem 6) and then use this to prove our convergence result (Theorem 5).

Open problems
In the present paper, we have adapted results from [AB05, MSS18] about discrete-time Recursive Tree Processes and endogeny to the continuous-time setting, and applied our general results to a concrete system with cooperative branching and deaths. Among other things, we proved that for α > 4, the RTPs corresponding to ν_low and ν_upp are endogenous but the RTP corresponding to ν_mid is not. The proof was based on an analysis of the bivariate mean-field equation. Here, it was convenient to be able to analyse a differential equation, as an analysis of the associated discrete-time bivariate evolution would have been possible, but messier.
Our work leaves a number of questions unanswered, both in the general setting and more specifically for our running example with G := {cob, dth} and π as in (1.14). Concerning the latter, we pose the following questions.
Open Problem 1 Not every measure µ^(n) ∈ P_sym({0, 1}^n) is the n-th moment measure of a measure ρ ∈ P(P({0, 1})). Determine all symmetric solutions of the n-variate RDE, for general n ≥ 3, and their domains of attraction.
Open Problem 2 Same as Open Problem 1 but without the symmetry assumption and for general n ≥ 2.
Open Problem 3 Prove that apart from the atom at zero, the law ν̲_mid, viewed as a probability law on [0, 1], has a smooth density with respect to the Lebesgue measure.
Open Problem 4 Determine the asymptotics of the distribution function F of ν̲_mid near 0 and 1.
Open Problem 5 For the more general model in (2.2), is it true that unstable fixed points of the mean-field equation that separate the domains of attraction of two stable fixed points correspond to nonendogenous RTPs? Is the picture for the higher-level RDE the same?
Partly inspired by our concrete example, we pose the following problems in the general setting.
Open Problem 6 Can (1.4) be relaxed to allow for branching processes (∇S t ) t≥0 that are nonexplosive but have infinite mean?
Open Problem 7 Are there general results linking the (in)stability of fixed points of the mean-field equation to (non)endogeny of the related RTP?

Open Problem 8 In our example, the higher-level RDE has two solutions ν̲_mid and ν̄_mid with mean ν_mid, of which the former is stable and the latter is unstable. Is this a general phenomenon in the nonendogenous case? Can one prove nonendogeny of an RTP corresponding to a solution ν of the RDE by showing that ν̄ is unstable?
Open Problem 10 Is the higher-level RTP (ω_i, ξ_i)_{i∈T} from Proposition 16 always endogenous?
Finally, we mention the problem of proving nonendogeny for the frozen percolation of [Ald00], which to our knowledge is still open. Although we did not attempt to solve this problem here, one might hope that the methods of the present paper can provide a useful new point of view on this old problem.

Outline of the proofs
In the remainder of the paper, we prove all results stated so far, except for Theorems 10 and 14 as well as Proposition 16, which we cite from [MSS18, Thm 1, Thm 13, and Prop 4].
In Section 3 we prove Theorem 1, Propositions 2 and 3, and Lemma 4, which state elementary properties of solutions of the mean-field equations (1.22) and (1.2), as well as Theorem 6, which gives a stochastic representation of solutions of the mean-field equation in terms of finite recursive tree processes. In Section 4, we use this stochastic representation to prove Theorem 5 about convergence of finite systems to a solution of the mean-field equation.
In Section 5, we prove our main results about RTPs with continuous time, which are largely analogous to known results from the discrete-time setting. Basic results are Lemma 7 and Proposition 9, as well as Lemma 8 which deals with discrete time and is a slight reformulation of known results. Following [AB05], Theorem 11 links the n-variate equation to endogeny, while Propositions 13 and 15 are concerned with the higher-level equation, and closely follow ideas from [MSS18].
In Section 6 we prove some additional results about RTPs: first Proposition 19, which generalizes [AB05, Lemma 15] and shows that upper and lower solutions of a monotone RDE are always endogenous, and then Lemmas 20, 21, 23, and 24, which give conditions for uniqueness in a general setting and then more specifically for monotone systems.
In Section 7, finally, we have collected all proofs that deal specifically with our running example of a system with cooperative branching and deaths. The first such result is Proposition 12 about the bivariate equation, which is a two-dimensional ODE for which we find, by elementary means, all fixed points and their domains of attraction. By combining Proposition 12 with ideas involving the convex order, we then prove the much stronger Theorem 17, which gives all fixed points and domains of attraction for the higher-level equation. The picture is then completed by the proofs of Lemma 18, which gives some properties of the nontrivial fixed point of the higher-level equation, as well as Lemmas 22 and 25, which illustrate ideas from Section 6 in the concrete set-up of our example.

The mean-field equation
In this section, we prove Theorems 1 and 6, which state that the mean-field equation (1.2) has a unique solution and can be represented in terms of a random tree generated by a branching process, with random maps attached to its vertices. In addition, we also prove Propositions 2 and 3, as well as Lemma 4.
In Subsection 3.1, we start with some preliminaries, showing, in particular, that the integral in (1.16) is well-defined, and prove Lemma 4, which says that mean-field equations of the form (1.22) can be rewritten in the simpler form (1.2).
Next, in Subsection 3.2, we prove uniqueness of solutions of (1.2), which yields the uniqueness part of Theorem 1. To prove existence, we show in Subsection 3.3 that the right-hand side of (1.49) solves (1.2), which not only completes the proof of Theorem 1 but also yields the stochastic representation that is Theorem 6.
The proofs of Propositions 2 and 3, finally, can be found in Subsection 3.4.

Preliminaries
Recall that we interpret the mean-field equation (1.2) as in (1.16), where, by (1.12), T_{γ[ω]}(µ) is the law of γ[ω](X_1, . . . , X_{κ(ω)}) with (X_i)_{i≥1} i.i.d. with law µ. (3.1) Since by assumption γ[ω](x_1, . . . , x_k) is jointly measurable in ω and x_1, . . . , x_k, the right-hand side of (3.1) is measurable as a function of ω and hence the integral in (1.16) is well-defined.
Proof of Lemma 4 Recall from Subsection 1.3 that the basic ingredients that go into the equation (1.22) are the measure space (Ω ′ , q) and function λ, as well as, for each ω ∈ Ω ′ and 1 ≤ i ≤ λ(ω), the function γ i [ω] and set K i (ω). Also, κ i (ω) := |K i (ω)|. In terms of these basic ingredients we need to define Ω, r, κ, and γ as in Subsection 1.1 so that (1.22) takes the simpler form (1.2).
Since we want to replace the integral and sum in (1.22) by a single integral, we put Ω := {(ω, i) : ω ∈ Ω′, i ∈ [λ(ω)]}, (3.2) where as before Ω′_l := {ω ∈ Ω′ : λ(ω) = l} and [l] := {1, . . . , l}, and we equip Ω with the measure r given by (3.3). In general, Ω need not be a Polish space, as required in Subsection 1.1. We will fix this problem at the end of our proof, but for the sake of the presentation we neglect it for the time being. We define κ : Ω → N as in Subsection 1.1 by κ(ω, i) := κ_i(ω), where the right-hand side is the function from Subsection 1.3, and we define γ accordingly.

We still have to fix the problem that Ω, as defined in (3.2), is in general not a Polish space. There are several possible ways to fix this. The solution we will choose is to replace Ω by the Polish space Ω̄ := ⋃_l Ω̄′_l × [l], where Ω̄′_l denotes the closure of Ω′_l in Ω′. We view r as a measure on Ω̄ that is concentrated on Ω and extend κ and γ in a measurable way to the larger space, which is possible since Ω is a measurable subset of Ω̄. Since r is concentrated on Ω, it does not matter how we extend κ and γ, as this has no effect on (3.6).

Uniqueness
In the present section, we prove that under the assumption (1.4), solutions to (1.2) are unique, which settles the uniqueness part of Theorem 1.
Below, we let M(S) denote the space of all finite signed measures on S. The total variation norm has already been mentioned several times. There are two conventional definitions, which differ by a factor 2. We will use the definition ‖µ‖ := sup_A |µ(A)| (µ ∈ M(S)), where the supremum runs over all measurable A ⊂ S; for µ, ν ∈ P(S) this means that ‖µ − ν‖ = min P[X ≠ Y], with the minimum over all couplings (X, Y) of µ and ν. (3.9)

Lemma 26 (Lipschitz continuity) Let g : S^k → S be measurable and let T_g be defined as in (1.12). Then

‖T_g(µ) − T_g(ν)‖ ≤ k ‖µ − ν‖ (µ, ν ∈ P(S)). (3.10)

Moreover, if T is defined as in (1.1), then

‖T(µ) − T(ν)‖ ≤ |r|^{-1} ∫_Ω r(dω) κ(ω) ‖µ − ν‖ (µ, ν ∈ P(S)). (3.11)

Proof By (3.9) we can find an S^2-valued random variable (X, Y) such that ‖µ − ν‖ = P[X ≠ Y]. Let (X_1, Y_1), . . . , (X_k, Y_k) be i.i.d. copies of (X, Y). Then, by (1.12),

‖T_g(µ) − T_g(ν)‖ ≤ P[g(X_1, . . . , X_k) ≠ g(Y_1, . . . , Y_k)] ≤ P[(X_i)_{i≤k} ≠ (Y_i)_{i≤k}] ≤ k P[X ≠ Y]. (3.12)

This proves (3.10). Formula (3.11) follows by integrating over ω.
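In the simplest case S = {0, 1}, (3.10) can be checked numerically: in the convention ‖µ‖ = sup_A |µ(A)|, the total variation distance between Bernoulli(a) and Bernoulli(b) is |a − b|, and for g = cob the push-forward of Bernoulli(p)^⊗3 has success probability p + (1 − p)p², so the Lipschitz ratio should stay below k = 3. A sketch:

```python
def tv_bernoulli(a, b):
    # total variation distance between Bernoulli(a) and Bernoulli(b)
    # in the convention ||mu|| = sup_A |mu(A)|
    return abs(a - b)

def T_cob(p):
    # success probability of the push-forward of Bernoulli(p)^{x3}
    # under cob(x1, x2, x3) = x1 or (x2 and x3)
    return p + (1 - p) * p * p

grid = [i / 200 for i in range(201)]
ratios = [
    tv_bernoulli(T_cob(a), T_cob(b)) / tv_bernoulli(a, b)
    for a in grid for b in grid if a != b
]
worst = max(ratios)  # stays below k = 3, the arity of cob
```

In fact the worst ratio here is sup_p |1 + 2p − 3p²| = 4/3 (attained at p = 1/3), comfortably below the crude coupling bound k = 3.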
Our next lemma gives equivalent formulations of the mean-field equation (1.2) that will also be useful in the next subsection, where we prove existence of solutions. Below, we interpret an integral of a measure-valued integrand in the usual way, i.e., ⟨∫_0^t µ_s ds, φ⟩ := ∫_0^t ⟨µ_s, φ⟩ ds for all bounded measurable φ : S → R.

Lemma 27 (Equivalent formulations) For a function [0, ∞) ∋ t ↦ µ_t ∈ P(S) that is continuous with respect to the total variation norm, the following conditions are equivalent:

(i) For each bounded measurable φ : S → R, the function t ↦ ⟨µ_t, φ⟩ is continuously differentiable and ∂_t ⟨µ_t, φ⟩ = |r| ⟨T(µ_t) − µ_t, φ⟩ (t ≥ 0).

(ii) µ_t = µ_0 + |r| ∫_0^t {T(µ_s) − µ_s} ds (t ≥ 0).

(iii) µ_t = e^{−|r|t} µ_0 + |r| ∫_0^t e^{−|r|(t−s)} T(µ_s) ds (t ≥ 0).

Proof Integrating the equation in (i) from time 0 until time t, we see that (i) implies (ii). Also, we can equivalently write the equation in (i) as ∂_t (e^{|r|t} ⟨µ_t, φ⟩) = |r| e^{|r|t} ⟨T(µ_t), φ⟩; integrating this from time 0 until time t, we see that (i) implies (iii). If t ↦ µ_t ∈ P(S) is continuous with respect to the total variation norm, then Lemma 26 together with (1.4) imply that also t ↦ T(µ_t) ∈ P(S) is continuous with respect to the total variation norm. It follows that t ↦ ⟨µ_t, φ⟩ and t ↦ ⟨T(µ_t), φ⟩ are continuous for each bounded measurable φ : S → R. As a result, the right-hand side of (ii), integrated against any bounded measurable φ, is continuously differentiable as a function of t, and (ii) implies (i). By the same argument, rewriting (iii) as

e^{|r|t} µ_t = µ_0 + |r| ∫_0^t e^{|r|s} T(µ_s) ds (3.15)

and differentiating, we see that (iii) implies (i).
We now prove the promised uniqueness of solutions to (1.2). Proposition 2, which will be proved in Subsection 3.4 below, shows that the constant L from (3.17) is not optimal and can be replaced by the constant K from (1.18).

The stochastic representation
In this section, we prove the following proposition, which settles the existence part of Theorem 1. Together with Lemma 28, this completes the proof of Theorem 1 and at the same time also proves Theorem 6. We work in our usual set-up, where S and Ω are Polish spaces, κ : Ω → N is measurable, γ is as in Subsection 1.1, and r is a nonzero finite measure on Ω satisfying (1.4). We fix T as in Section 1.4 and let (ω i ) i∈T be i.i.d. with common law |r| −1 r. We let (σ i ) i∈T be an independent i.i.d. collection of exponentially distributed random variables with mean |r| −1 and define S, S t , ∇S t , and G t as in (1.43), (1.44), and (1.47).

Proof of Proposition 29
The condition (1.4) guarantees that (∇S t ) t≥0 is a branching process with finite means; more precisely, by standard theory, (3.28) Fix µ 0 ∈ P(S) and define µ t and µ t,(n) as in (3.19) and (3.22). Then the total variation distance between these measures can be bounded by Since ∥µ t+ε − µ t ∥ ≤ 2 P[S t+ε ≠ S t ], (3.32) using the fact that the branching process (∇S t ) t≥0 a.s. does not jump at deterministic times, we see that [0, ∞) ∋ t → µ t is continuous with respect to the total variation norm. Using this and (3.31), we see from Lemma 27 that (µ t ) t≥0 solves the mean-field equation (1.2).
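The finite-mean property can be illustrated with a small simulation. The kernel below (|r| = 1, κ ≡ 2) and the growth formula E|∇S t | = e t(∫ κ dr − |r|) are our own toy reading of (3.28), not taken from the text:

```python
import math
import random

def simulate_boundary(t, rate, offspring, rng):
    """Population |grad S_t|: each individual dies at `rate` and is
    replaced by offspring() children (Gillespie-style simulation)."""
    n, clock = 1, 0.0
    while n > 0:
        clock += rng.expovariate(n * rate)  # time of the next death among n individuals
        if clock > t:
            break
        n += offspring(rng) - 1  # one parent is replaced by its children
    return n

rng = random.Random(0)
# toy kernel: |r| = 1 and kappa(omega) = 2 always, so E|grad S_t| = e^t
t, runs = 1.0, 10000
mean = sum(simulate_boundary(t, 1.0, lambda r: 2, rng) for _ in range(runs)) / runs
print(mean)  # sample mean, close to e ≈ 2.718 for this toy kernel
```

Each individual of ∇S t dies at rate |r| and is replaced by κ(ω) children, so the boundary is a continuous-time Galton-Watson process; the Monte Carlo mean matches the exponential growth rate of the first moment.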

Continuity in the initial state
In this subsection, we prove Propositions 2 and 3.
Proof of Proposition 2 It follows from Theorem 6 and Lemma 26 that where F t is the filtration defined in (1.48).
Proposition 3 follows from the following two lemmas.
Lemma 31 (Continuity of T ) Under the condition (1.19), the operator T in (1.1) is continuous w.r.t. the topology of weak convergence.
Proof If µ n ∈ P(S) converge weakly to a limit µ ∞ , then by Skorohod's representation theorem there exist random variables X n with laws µ n that converge a.s. to a limit X ∞ with law µ ∞ . Let (X n i ) i≥1 (n ∈ N ∪ {∞}) be i.i.d. copies of such a sequence (X n ) n∈N∪{∞} and let ω be an independent random variable with law |r| −1 r. Then by (1.19), and hence T (µ n ) converges weakly to T (µ ∞ ) by (1.1).
Lemma 32 (Continuity in the initial state) Assume that the operator T in (1.1) is continuous w.r.t. the topology of weak convergence. Then the same is true for the operators T t (t ≥ 0) defined in (1.6).
Proof We need to show that solutions of the mean-field equation (1.2) are continuous in their initial state, in the sense that if (µ k t ) t≥0 (k ∈ N ∪ {∞}) are started in initial states such that µ k 0 ⇒ µ ∞ 0 , then µ k t ⇒ µ ∞ t for all t ≥ 0. To see this, inductively define µ k t,(n) as in (3.22) with µ 0 replaced by µ k 0 . Using the continuity of T , by induction, we see that µ k t,(n) ⇒ µ ∞ t,(n) as k → ∞ for all n ≥ 1 and t ≥ 0. By (3.29), for each bounded continuous φ : S → R, the quantity ⟨µ k t,(n) , φ⟩ converges to ⟨µ k t , φ⟩ as n → ∞ uniformly in k ∈ N ∪ {∞}, which allows us to conclude that ⟨µ k t , φ⟩ → ⟨µ ∞ t , φ⟩ as k → ∞ for all t ≥ 0.

Approximation by finite systems

4.1 Main line of the proof
In this section, we prove Theorem 5. The basic idea, which already goes back to [DFL86], is that in the mean-field limit, the genealogy of a site converges to a branching process, and sites are independent in the limit. More precisely, consider n sites, sampled uniformly at random from [N ]. To find out what their states are at time t, we follow the sites back until the last time when a random map is applied that has the potential to change the state of one of our sites. At this point, we stop following that given site but replace it by the sites that are relevant for the outcome of the map at the given site, and we continue in this way. When N is large, the new sites that are added in each step are with high probability sites we have not been following before, so that in the limit we obtain a branching process with random maps attached to its branch points. Making this idea precise yields the following proposition, which will be proved in Subsection 4.2 below.
Proposition 33 (State at sampled sites) For each N ∈ N + let (X (N ) (t)) t≥0 be a process as in Theorem 5 started in a deterministic initial state X (N ) (0). Fix t ≥ 0 and let T t be defined as in (1.6) but with the mean-field equation (1.2) replaced by (1.22). Fix n ≥ 1 and let I 1 , . . . , I n be i.i.d. uniformly distributed on [N ] and independent of X (N ) (t). Then (4.1) holds, where ∥ · ∥ denotes the total variation norm, and the convergence in (4.1) is uniform w.r.t. the initial state X (N ) (0).
Proposition 33 allows us to control the mean and variance of µ N N t , which is enough to prove the convergence of µ N N t to µ t for fixed times t. To boost this up to pathwise convergence, we use the following lemma, which will be proved in Subsection 4.3 below.
Lemma 34 (Tightness in total variation) For each N ∈ N + let (X (N ) (t)) t≥0 be a process as in Theorem 5 started in a deterministic initial state X (N ) (0), and let µ N t := µ X (N ) (t) denote the empirical measure of X (N ) (t). Then there exist random processes (τ N t ) t≥0 such that τ N : R → R is a.s. nondecreasing with τ N 0 = 0 and where ∥ · ∥ denotes the total variation norm and L := ∫ Ω q(dω) λ(ω).
In Subsection 4.4, we will derive Theorem 5 from Proposition 33, Lemma 34, and some abstract considerations.

The state at sampled sites
In this subsection we prove Proposition 33. We start with two preparatory lemmas. The following lemma says that for large N , the map in (4.2) can be approximated by the map in (4.3).
Lemma 35 (Coupling of maps) For each t ≥ 0, it is possible to couple the random maps M̃ N t and M N t with N ∈ N + in such a way that

Proof The essence of the proof can be summarized as follows: since for large N , sampling with or without replacement from [N ] is almost the same, the genealogy of a given site is approximately given by a branching process. In spite of this simple idea, the proof is quite long, mainly because we have to take care of a lot of definitions, such as the way Ω, r, γ, and κ are defined in terms of Ω ′ , q, γ i [ω], and K i (ω) in the proof of Lemma 4.
We start by recalling that the random map G t from (1.47) can be seen as the concatenation of random maps assigned to the branch points of a branching process. We then embed this branching process in the set [N ] and prove that what we obtain is a good approximation for the genealogy of a given site.
We observe that in order to construct the map G t : S ∇St → S from (1.47), it suffices to know S t , (ω i ) i∈St , (4.5) where S t is defined in (1.44). Indeed, from the information in (4.5) we can determine ∇S t , since (4.6) and the map G t : S ∇St → S is obtained by concatenating the maps γ[ω i ] with i ∈ S t according to the tree structure of S t . The object in (4.5) is in fact a Markov chain as a function of t. Starting from the initial state S 0 = ∅ and ∇S 0 = {∅}, its evolution is as follows: independently for each i ∈ ∇S t , with rate |r|, we add i to S t and assign to it a value ω i chosen according to the probability law |r| −1 r; at the same time, i is removed from ∇S t and replaced by the new elements i1, . . . , iκ(ω i ).
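The two-stage construction just described — grow (S t , ∇S t ) forward in time, then concatenate the maps γ[ω i ] backward from the leaves to the root as in (1.47) — can be sketched in code. The state space {0, 1}, the AND/OR maps, and the rates below are illustrative choices of ours, not the maps of the paper:

```python
import random

def grow_tree(t, R, draw_omega, kappa, rng):
    """Grow (S_t, grad S_t): each boundary element i becomes internal at rate R,
    receives a mark omega_i, and is replaced by children i1, ..., i kappa(omega_i)."""
    internal, boundary, clock = {}, [()], 0.0  # root is the empty tuple
    while boundary:
        clock += rng.expovariate(len(boundary) * R)
        if clock > t:
            break
        i = boundary.pop(rng.randrange(len(boundary)))
        omega = draw_omega(rng)
        internal[i] = omega
        boundary.extend(i + (k,) for k in range(1, kappa(omega) + 1))
    return internal, boundary

def evaluate(internal, boundary, gamma, kappa, leaf_values):
    """The map G_t: plug values into the leaves of grad S_t, work back to the root."""
    values = dict(zip(boundary, leaf_values))
    def val(i):
        if i in values:
            return values[i]
        omega = internal[i]
        return gamma(omega, [val(i + (k,)) for k in range(1, kappa(omega) + 1)])
    return val(())

rng = random.Random(1)
internal, boundary = grow_tree(1.0, 1.0, lambda r: r.choice(["and", "or"]),
                               lambda w: 2, rng)
root = evaluate(internal, boundary,
                lambda w, xs: min(xs) if w == "and" else max(xs),
                lambda w: 2, [rng.randrange(2) for _ in boundary])
print(root)  # the value G_t assigns to the root
```

Labels are Ulam-Harris tuples, so the child ik of i is literally the tuple i + (k,), mirroring the tree structure of S t .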
We will be interested in the process in (4.5) in the special case when Ω, r, κ, and γ are defined in terms of Ω ′ , q, λ, K i (ω), and γ i [ω] as in the proof of Lemma 4. In this case, elements of Ω are pairs (ω, n) where ω ∈ Ω ′ and 1 ≤ n ≤ λ(ω), so we denote the process in (4.5) as where ω i ∈ Ω ′ and 1 ≤ n i ≤ λ(ω i ). The set ∇S t is now given by (4.8) Defining r as in (3.3), the process in (4.5) now evolves in such a way that independently for each i ∈ ∇S t , with rate |r|, we add i to S t and assign values (ω i , n i ) to it that are chosen according to the probability law |r| −1 r. Let α ∈ [N ] be fixed. Our next aim is to "embed" the process from (4.7) in the set [N ], in such a way that it approximates the genealogy of the site α. To this end, we define, for each time t, a random function ψ N t : S t ∪ ∇S t → [N ]. Initially, we set ψ N 0 (∅) := α. We let the function ψ N t evolve in a Markovian way together with the process in (4.7) in the following way. Recall that when we add an element i to S t and assign values (ω i , n i ) to it, this element is at the same time removed from ∇S t and replaced by new elements i1, . . . , iκ n i (ω i ). We assign labels ψ N t (ik) (k = 1, . . . , κ n i (ω i )) to these new elements as follows. First, we choose (I l ) l=1,...,λ(ω i ) in such a way that I n i := ψ N t (i) and (I l ) l≠n i are i.i.d. uniformly chosen from [N ], (4.9) and next, we set ψ N t (ik) := I j k , where, as in (3.4), we order the elements of K n i (ω i ) ⊂ {1, . . . , λ(ω i )} as K n i (ω i ) = {j 1 , . . . , j κ n i (ω i ) } with j 1 < · · · < j κ n i (ω i ) . (4.10) Note that this has the effect that if n i is an element of K n i (ω i ), say n i = j k , then the corresponding element ik gets the same label as i, i.e., ψ N t (ik) = ψ N t (i). Otherwise, we assign new i.i.d. labels to all new elements of ∇S t .
Using the function ψ N t that embeds the process in (4.7) in the set [N ], we define a function (4.11) We now consider the corresponding maps, where α = ψ N 0 (∅) is the label initially assigned to the root. We claim that (4.13) holds, where ∥ · ∥ denotes the total variation norm. In particular, if α is chosen uniformly distributed in [N ] and independent of everything else, then (ψ N t (i)) i∈∇St are i.i.d. uniformly distributed in [N ] and independent of the map G t , so (4.13) implies (4.4).
To prove (4.13), we construct a process similar to the process in (4.7), together with an embedding in [N ], that describes the true genealogy of the site α, and show that the error we make by replacing this true genealogy by the process we had before is small. We denote this process as S̃ t , (ω i , n i ) i∈S̃ t , ψ̃ N t . (4.14) At each time, ∇S̃ t is defined in terms of this process in the same way as ∇S t is defined in (4.8). We also define G̃ t and φ̃ N t : S N → S ∇S̃ t as before, i.e., G̃ t is the concatenation of the random maps γ n i [ω i ] with i ∈ S̃ t according to the tree structure of S̃ t , and φ̃ N t : S N → S ∇S̃ t is defined in terms of ψ̃ N t as in (4.11). Recall that M̃ N t (x) = X N −N t,0 (x) α . As for our previous process, we start with S̃ 0 = ∅, ∇S̃ 0 = {∅}, and ψ̃ N 0 (∅) = α. In Subsection 1.3, the stochastic flow (X N s,t ) s≤t is constructed from a Poisson point set Π. We will construct the process in (4.14) in terms of Π in such a way that (4.15) holds, which expresses the fact that the process in (4.14) describes the "true genealogy" of the site α.
The Poisson set Π consists of triples (ω, i, t) which express the fact that at time t the random map γ[ω] should be applied to the coordinates i = (i 1 , . . . , i λ(ω) ). Note that we are interested in X N −N t,0 (x) α , which means that we look at negative times and need to rescale time by a factor N . For each (ω, i, −N t) ∈ Π and j ∈ ∇S̃ t such that ψ̃ N t (j) = i l for some 1 ≤ l ≤ λ(ω), we update the process in (4.14) as follows: (i) We remove j from ∇S̃ t and add it to S̃ t .
It is straightforward to check that these rules guarantee that (4.15) holds and hence the process in (4.14) describes the true genealogy of the site α. By way of further explanation: we follow a site β back in time until the first time when a map is applied that may change the value of β. From that moment on, we follow back all sites that are relevant for the outcome of the map at β, and we number them according to the convention in (3.4). This defines a family structure, i.e., i = i 1 i 2 i 3 is the i 3 -th child of the i 2 -th child of the i 1 -th child of the original site α. The map ψ̃ N t applied to i tells us where this ancestor lives in the set [N ]. There may be some overlap, i.e., it is possible that ψ̃ N t (i) = ψ̃ N t (j) for some i, j ∈ S̃ t ∪ ∇S̃ t . For i, j ∈ ∇S̃ t , however, the probability that two ancestors live at the same site in [N ] tends to zero as N → ∞, as we will see in a moment.
In view of (4.15), to prove (4.13), it suffices to prove that the Markov process in (4.14) is close in total variation distance to the process with S̃ t and ψ̃ N t replaced by S t and ψ N t . Since the latter process is nonexplosive by (3.28), it suffices to prove convergence for the processes stopped at the first time when the cardinality of ∇S̃ t resp. ∇S t exceeds a certain value, and then at the end send this value to infinity. We will prove convergence of the stopped processes in a number of steps, by making small changes in the jump rates. Here we use the fact that if the transition kernels of two continuous-time Markov chains are close in total variation norm, uniformly in the starting point, then by standard arguments the two processes can be coupled so that their laws at each fixed time are close in total variation norm.
Let Ψ̃ N t := ψ̃ N t (∇S̃ t ) denote the image of ∇S̃ t under the map ψ̃ N t . As a first step, we change the dynamics of the (stopped) process from (4.14) in such a way that elements (ω, i, −N t) ∈ Π have no effect if {i 1 , . . . , i λ(ω) } intersects Ψ̃ N t in more than one point. Then the modified process is still Markovian; we claim that the change in jump rates compared to the original process is of order N −1 . Indeed, for fixed l, if i 1 , . . . , i l are chosen uniformly without replacement from [N ], then the probability that one, resp. two or more, of them lie in a set A of fixed cardinality is of order N −1 resp. N −2 as N → ∞. Taking into account the fact that we rescale time by a factor N , as well as the summability condition (1.23) (i), this translates into a change in jump rates of order N −1 for the modified process, stopped at the first time when the cardinality of ∇S̃ t exceeds a fixed value.
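The order-of-magnitude claim in this step can be checked exactly with hypergeometric probabilities (the values a = 3, l = 4 below are arbitrary illustrations, not parameters from the paper):

```python
from math import comb

def hits_at_least(N, a, l, m):
    """P[at least m of l draws without replacement from [N] land in a fixed set of size a]."""
    return sum(comb(a, j) * comb(N - a, l - j)
               for j in range(m, min(a, l) + 1)) / comb(N, l)

a, l = 3, 4
for N in (100, 1000, 10000):
    p1 = hits_at_least(N, a, l, 1)  # one or more hits: order 1/N
    p2 = hits_at_least(N, a, l, 2)  # two or more hits: order 1/N^2
    print(N, N * p1, N ** 2 * p2)
```

As N grows, N·p1 approaches l·a = 12 while N²·p2 stays bounded (near C(l, 2)·a(a − 1) = 36), matching the claimed orders N −1 and N −2 .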
Recall that by (4.15), X N −N t,0 (x) α is a function only of (x β ) β∈Ψ̃ N t . The modified process we have just constructed has the property that ψ̃ N t : ∇S̃ t → Ψ̃ N t is a bijection, i.e., each element β ∈ Ψ̃ N t corresponds to only a single place (ψ̃ N t ) −1 (β) in the family tree. The dynamics of the modified process can be described as follows: (i) Independently for each β ∈ Ψ̃ N t , with rates described by the measure r from (3.3), we choose a pair (ω, n) with 1 ≤ n ≤ λ(ω).
(iv) If some of the (β ′ k ) k≠n are elements of Ψ̃ N t , we do nothing.
(vi) If j = (ψ̃ N t− ) −1 (β) is the place of β in the family tree immediately prior to time t, then we assign to each new element of Ψ̃ N t a place in the family tree by setting (ψ̃ N t ) −1 (β j k ) := jk.
Removing the restrictions in points (ii) and (iv) above, and performing sampling with replacement instead of sampling without replacement in point (iii), we only make changes of order N −1 in the transition rates, and arrive at a process whose family tree evolves as the process in (4.7) and where the sites in [N ] assigned to new members of the family tree are chosen uniformly with replacement, as described by the process ψ N t .

In the proof of Lemma 35, we have seen that in the mean-field limit N → ∞, the genealogy of a single site can be approximated by a branching process with random maps attached to its branch points. Similarly, the genealogy of n randomly chosen sites can be approximated by n independent branching processes, which leads to the following extension of Lemma 35. (4.16)

Proof The proof is the same as the proof of Lemma 35, except that instead of following back the genealogy of one site, one follows the genealogies of n sites. By the same arguments as given in the proof of Lemma 35, when N is large, the genealogies with high probability do not intersect, and hence can be approximated by independent branching processes. Although writing down all objects involved is notationally complicated, no new ideas are needed, so we omit the details.
Proof of Proposition 33 Let x := X (N ) (0) be the (deterministic) initial state and, using notation as in (1.33), let µ N 0 = µ {x} denote its empirical measure. Define maps M̃ N t and M N t as in Lemma 36. Then (X (N ) I 1 (N t), . . . , X (N ) I n (N t)) is equally distributed with M̃ N t (x), while the coordinates of M N t (x) are i.i.d. with a law that by Theorem 6 equals T t (µ N 0 ). In view of this, the claim follows from Lemma 36.

Tightness in total variation
In this subsection we prove Lemma 34.

Proof of Lemma 34
The process (X (N ) (t)) t≥0 is defined in (1.32) in terms of a stochastic flow which is in turn defined in terms of a Poisson set Π. Elements of Π are triples (ω, i, s) which tell us that at time s the map γ[ω] should be applied to the coordinates i = (i 1 , . . . , i λ(ω) ). We let where L := ∫ Ω q(dω) λ(ω), which is finite by (1.23). Then (i) follows from a functional law of large numbers. Since for any s ≤ t, the fraction of sites in [N ] that change their type is bounded from above by L(τ N t − τ N s ), in view of (3.9), we also obtain (ii).
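The law-of-large-numbers step can be illustrated numerically. After speeding time up by a factor N , the number of relevant Poisson events up to time N t is of order N , and the λ-weighted event count divided by N concentrates around Lt. The kernel below (|q| = 1, λ uniform on {1, 2}, so L = 3/2) is a toy assumption of ours:

```python
import random

def weighted_event_count(N, t, rate, draw_lambda, rng):
    """(1/N) times the sum of lambda(omega) over the events of a
    Poisson process with rate N*rate, run until time t."""
    total, clock = 0, 0.0
    while True:
        clock += rng.expovariate(N * rate)  # inter-event times are Exp(N*rate)
        if clock > t:
            return total / N
        total += draw_lambda(rng)

rng = random.Random(2)
tau = weighted_event_count(200_000, 1.0, 1.0, lambda r: r.choice((1, 2)), rng)
print(tau)  # concentrates around L*t = 1.5 for this toy kernel
```

The fluctuations around Lt are of order N −1/2 , which is what makes the functional law of large numbers applicable.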

Convergence to the mean-field equation
In this subsection, we prove Theorem 5. The proof is split into a number of lemmas. We start by proving convergence at fixed times. This part of the proof is based on Proposition 33. At the end of the proof, we use Lemma 34 to obtain pathwise convergence.
Proof Fix t ≥ 0. Let φ : S → [−1, 1] be measurable. Let I 1 and I 2 be uniformly distributed on [N ] and independent of each other and of X (N ) (t). Since (4.20) Assume for the moment that X (N ) (0) is deterministic. Then applying Proposition 33 with n = 1, 2 we find that where we take the supremum over all measurable φ : S → [−1, 1]. It follows that and hence (4.18) follows by Chebyshev's inequality. To obtain (4.18) more generally when X (N ) (0) is random, we condition on the initial state to get, for each ε > 0 and measurable ψ : (4.23) Since the integrand on the right-hand side does not depend on ψ and tends to zero in a bounded pointwise way as a function of x ∈ S N , (4.18) follows.
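In our reading, the elided displays record the first two moments of ⟨µ N N t , φ⟩: since I 1 , I 2 are independent and uniform on [N ],

```latex
\mathbb{E}\bigl[\langle\mu^N_{Nt},\varphi\rangle\bigr]
 =\mathbb{E}\bigl[\varphi\bigl(X^{(N)}_{I_1}(Nt)\bigr)\bigr],
\qquad
\mathbb{E}\bigl[\langle\mu^N_{Nt},\varphi\rangle^{2}\bigr]
 =\mathbb{E}\bigl[\varphi\bigl(X^{(N)}_{I_1}(Nt)\bigr)\,\varphi\bigl(X^{(N)}_{I_2}(Nt)\bigr)\bigr].
```

By Proposition 33 with n = 1 and n = 2, these converge, uniformly in φ, to ⟨T t (µ N 0 ), φ⟩ and its square, so the variance of ⟨µ N N t , φ⟩ tends to zero and Chebyshev's inequality gives (4.18).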
Our next aim is to prove that if in addition to the assumptions of Lemma 37, condition (i) or (ii) of Theorem 5 is satisfied, then where d is any metric on P(S) that generates the topology of weak convergence. Applying the following well-known fact to the Polish space P(S), we see that if (4.24) holds for one such metric, then it holds for all of them.
Lemma 38 (Convergence in probability) Let X n be random variables taking values in a Polish space S, let x ∈ S be deterministic, and let d be a metric generating the topology on S. Then one has (4.25) if and only if P[X n ∈ · ] =⇒ n→∞ δ x , (4.26) where ⇒ denotes weak convergence of probability measures on S.
Proof It is easy to see that (4.25) implies E[φ(X n )] → φ(x) for all bounded continuous φ : S → R, so (4.25) implies (4.26). Conversely, if (4.26) holds, then by Skorohod's representation theorem it is possible to couple the random variables X n such that X n → x a.s., which implies (4.25).
The following lemma gives sufficient conditions for the type of convergence of (4.24).
Lemma 39 (Convergence to a deterministic measure) Let S be a Polish space, let µ ∈ P(S) be deterministic, and let µ N be random variables with values in P(S). Let d be a metric on P(S) generating the topology of weak convergence. Then the following conditions are equivalent.
Proof We equip P(S) with the topology of weak convergence, making it into a Polish space. Then by Lemma 38, condition (i) is equivalent to

We will prove (i)'⇒(ii)⇒(iii)⇒(i)'.

(i)'⇒(ii). By Skorohod's representation theorem, (i)' implies that the µ N can be coupled such that µ N ⇒ µ a.s. as N → ∞, which implies (ii).
(iii)⇒(i)'. Since S is Polish, it has a metrizable compactification, i.e., there exists a compact metrizable space S̄ such that S is a dense subset of S̄ and the topology on S is the topology induced from S̄ [Cho69, Theorem 6.3]. It is known that this implies that S is a G δ -subset of S̄ [Bou58, §6 No. 1, Theorem 1]. In particular, S is a Borel measurable subset of S̄ and we can identify P(S) with the space of probability measures on S̄ that are concentrated on S. If we equip P(S̄) with the topology of weak convergence, then the induced topology on P(S) is also the topology of weak convergence (this follows, e.g., from [EK86, Thm 3.3.1]), and in fact P(S̄) (being compact by Prohorov's theorem) is a metrizable compactification of P(S).
We view µ N and µ as probability measures on S̄. Since S̄ is compact, so are P(S̄) and P(P(S̄)), so by going to a subsequence if necessary, we can assume that the laws P[µ N ∈ · ] converge weakly to some limit ρ ∈ P(P(S̄)). Since the restriction to S of a continuous function φ : S̄ → R is a bounded continuous function on S, condition (ii) implies that (4.28) holds for general n ≥ 1 and continuous functions φ i : S̄ → R (i = 1, . . . , n). By the Stone-Weierstrass theorem, the linear span of functions of the form ν ↦ ∏ n i=1 ⟨ν, φ i ⟩ is dense in the space of continuous functions on P(S̄), and hence (4.28) implies ρ = δ µ .
We now prove (4.24) under either of the conditions (i) and (ii) of Theorem 5.
Lemma 40 (Continuity argument) In addition to the assumptions of Lemma 37, assume that condition (i) of Theorem 5 is satisfied. Then (4.24) holds.
Proof Fix t ≥ 0. In view of Lemma 39 (ii), it suffices to show that for any bounded continuous φ : S → R. By Lemma 37, it suffices to show that (4.30) holds. By the second part of condition (i), Lemma 4, and Proposition 3, the operator T t is continuous w.r.t. weak convergence. In view of this, (4.30) is implied by the first part of condition (i).
Lemma 41 (Moment argument) In addition to the assumptions of Lemma 37, assume that condition (ii) of Theorem 5 is satisfied. Then (4.24) holds.
Proof Fix t ≥ 0. In view of Lemma 39 (iii), it suffices to show that (4.32) By Proposition 33 applied to the process conditioned on X (N ) (0), there exist ε N → 0 such that P X (N ) In view of (3.8), it follows that Combining this with (4.32) and taking the expectation, we obtain that In view of this, to prove (4.31), it suffices to show that If µ ∈ P(S) is deterministic, then Theorem 6 tells us that (4.38) If we replace the expectation on the right-hand side by a conditional expectation given (∇S i t , G i t ) i=1,...,n , then this is the integral of a measurable [−1, 1]-valued function with respect to the expectation of a product measure of the form where the (X i j ) i=1,...,n, j∈T are i.i.d. with common law µ 0 and independent of (∇S i t , G i t ) i=1,...,n , and R N is a random error term that by condition (ii) can be estimated as where lim N →∞ ε N m = 0 for each m. Note that moreover |R N | ≤ 2 since the φ i 's take values in [−1, 1]. Integrating over the randomness of (∇S i t , G i t ) i=1,...,n and using bounded convergence together with (4.37) and (4.38), we obtain (4.36).
With Lemmas 40 and 41 proved, most of the work needed for proving Theorem 5 is done. The only remaining task is to improve the convergence at fixed times in (4.24) to pathwise convergence as in (1.34). Our first aim is to show that the condition (1.34) does not depend on the choice of the metric d. This follows from the following lemma, applied to the Polish space P(S).
Lemma 42 (Convergence in path space) Let S be a Polish space and let d be a metric generating the topology on S. Let D S [0, ∞) be the space of cadlag functions x : [0, ∞) → S, equipped with the Skorohod topology. Let X n = (X n (t)) t≥0 be random variables with values in D S [0, ∞) and let x : [0, ∞) → S be a continuous function. Then one has

Before the proof of Theorem 5 we need one more lemma.
Lemma 43 (Weak convergence and convergence in total variation norm) Let S be a Polish space. Then there exists a metric d on P(S) such that d generates the topology of weak convergence and d(µ, ν) ≤ ∥µ − ν∥ (µ, ν ∈ P(S)), where ∥ · ∥ denotes the total variation norm.
Proof Let r be a metric generating the topology on S. Replacing r(x, y) by r(x, y) ∧ 1 if necessary, we can assume without loss of generality that r ≤ 1. Let L be the space of all functions φ : S → R such that |φ(x) − φ(y)| ≤ r(x, y) (x, y ∈ S), i.e., these are the Lipschitz continuous functions with Lipschitz constant ≤ 1. Then is the 1-Wasserstein metric on P(S), which is known to generate the topology of weak convergence. Let L ′ := {φ ∈ L : sup x∈S |φ(x)| ≤ 1}. Since r ≤ 1, each function φ ∈ L can be written as φ = 1/2 φ ′ + c with φ ′ ∈ L ′ and c ∈ R. In view of this and (3.8), (4.47)

Proof of Theorem 5 Lemmas 40 and 41 show that either of the conditions (i) and (ii) implies (4.24). We will use Lemma 34 to improve (4.24) to pathwise convergence as in (1.34). By Lemma 42 it suffices to prove (1.34) for one particular metric d on P(S) that generates the topology of weak convergence. We choose a metric d as in Lemma 43. Let µ t := T t (µ 0 ) (t ≥ 0) denote the solution to the mean-field equation (1.22) with initial state µ 0 . Lemma 34 implies that (4.48) Taking the limit N → ∞, using the fact that d(µ, ν) ≤ ∥µ − ν∥ and (4.24), it follows that Since for any s, t ≥ 0, using Lemma 34, (4.24), and (4.49), we see that for each T > 0 and t ∈ [0, T ], Combining this with the fact that by (4.24), for any n ≥ 1, (4.53) Since ε and n are arbitrary, this implies (1.34).

Recursive Tree Processes
In this section, we prove our main results about RTPs with continuous time. For completeness, we also prove Lemma 8, which deals with discrete time and says that each solution to the RDE (1.54) gives rise to an RTP. This is done in Subsection 5.1. Our basic results about continuous-time RTPs are Lemma 7 and Proposition 9. Lemma 7 describes the evolution of the law of the process that is constructed by assigning independent values X i to elements i ∈ ∇S t and then calculating backwards. Proposition 9 says that adding exponential lifetimes to the elements of an RTP yields a stationary version of the process in (5.1). These results are proved in Subsection 5.2. In Subsection 5.3, we prove continuous-time analogues of known discrete-time results related to endogeny. Following [AB05], Theorem 11 links the n-variate mean-field equation to endogeny, while Propositions 13 and 15 are concerned with the higher-level mean-field equation and closely follow ideas from [MSS18].

Construction of RTPs
Proof of Lemma 8 For each finite subtree U ⊂ T that contains the root, we can construct random variables (ω i ) i∈U and (X i ) i∈U∪∂U such that the (ω i ) i∈U are i.i.d. with common law |r| −1 r, the (X i ) i∈∂U are i.i.d. with common law ν and independent of the (ω i ) i∈U , and the (X i ) i∈U are inductively defined by (5.2). The joint law of (ω i ) i∈U and (X i ) i∈U∪∂U is a probability law P U on Ω U × S U∪∂U . Since Ω and S are Polish spaces, we can apply Kolmogorov's extension theorem. The statement of the lemma then follows provided we can show that the laws P U are consistent in the sense that if V ⊂ U is another subtree that contains the root, then the projection of P U on Ω V × S V∪∂V equals P V . It suffices to prove this when U and V differ by one element only, say U = V ∪ {i} where i ∈ ∇V. It follows from (5.2) and the fact that ν solves the RDE (1.54) that X i has law ν and is independent of (X j ) j∈∇V\{i} , and from this we see that the projection of P U is indeed P V .
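The finite-subtree construction in this proof can be made concrete. As a purely illustrative (hypothetical) choice, take S = {0, 1} and let γ[ω] be the AND or the OR of two arguments, each with probability 1/2. Every Bernoulli(p) law then solves the associated RDE, since ½p² + ½(2p − p²) = p, so evaluating the recursion (5.2) on a finite binary tree with i.i.d. Bernoulli(p) boundary values must again return a Bernoulli(p) root:

```python
import random

def rtp_root(depth, p, rng):
    """X at the root of the full binary tree of the given depth: boundary values
    i.i.d. Bernoulli(p), internal maps i.i.d. AND/OR with probability 1/2 each."""
    def value(level):
        if level == depth:
            return 1 if rng.random() < p else 0  # boundary value
        left, right = value(level + 1), value(level + 1)
        return min(left, right) if rng.random() < 0.5 else max(left, right)
    return value(0)

rng = random.Random(3)
p, runs = 0.3, 10000
freq = sum(rtp_root(6, p, rng) for _ in range(runs)) / runs
print(freq)  # root frequency, close to p = 0.3
```

Consistency of the laws P U in the proof corresponds here to the fact that the root law does not depend on the depth at which the boundary is placed.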
It will be useful in what follows to have a somewhat stronger version of Lemma 8 that applies also to certain random subtrees U ⊂ T. Let T denote the set of all finite subtrees U ⊂ T such that either ∅ ∈ U or U = ∅. Let us define a stopping tree to be a random variable U with values in T such that (5.3) holds. In the special case that κ ≡ 1 and T = N, a stopping tree is just a stopping time w.r.t. the filtration generated by ω ∅ , ω 1 , ω 11 , . . .
Lemma 44 (RTPs and stopping trees) Let (ω i , X i ) i∈T be an RTP corresponding to a map γ and a solution ν to the RDE (1.54), and let U ⊂ T be a stopping tree. Then conditional on U, the random variables (X i ) i∈∂U are i.i.d. with common law ν and independent of (ω i ) i∈U .
Proof For each fixed V ∈ T , by Lemma 8, conditional on (ω i ) i∈V , the random variables (X i ) i∈∂V are i.i.d. with common law ν. By (5.3), it follows that conditional on the event {U = V} and (ω i ) i∈V , the random variables (X i ) i∈∂V are i.i.d. with common law ν. Since this holds for all V ∈ T , and since U ∈ T a.s., the claim follows.

Continuous-time RTPs
In this subsection, we prove Lemma 7 and Proposition 9. We work in our usual set-up as described above Proposition 29. We start with a preparatory lemma that says that if we condition on the σ-field F t defined in (1.48), then the subtrees of S rooted at i ∈ ∇S t are i.i.d. with the same distribution as S. To formulate this properly, we need some notation. We call the object S, (ω i , σ i ) i∈S
(5.4) a marked branching tree. For each i ∈ S, let S i describe the subtree of S that is rooted at i, i.e., S i := {j ∈ T : ij ∈ S}. (5.5) We set ω i j := ω ij (i, j ∈ T), so that ω i j is the random element of Ω that "belongs" to j ∈ S i . Fix t ≥ 0. For each i ∈ ∇S t , let σ i,t j describe the lifetime of an individual j ∈ S i after time t, i.e., where t − τ * i is the age of the individual i at time t.

Lemma 45 (Memoryless property) For each t ≥ 0, conditional on the σ-field F t , the marked branching trees S i , (ω i j , σ i,t j ) j∈S i , i ∈ ∇S t , (5.7) are i.i.d. with the same distribution as the marked branching tree in (5.4).
Proof Let T be as defined above (5.3). Then, for each V ∈ T , the event {S t = V} is measurable w.r.t. the σ-field generated by the random variables in (5.8). Note that here ∇V = {ij : i ∈ V, j ≤ κ(ω i )} is measurable w.r.t. the σ-field generated by (ω i ) i∈V , while for each i ∈ ∇V, the random variable τ * i is measurable w.r.t. the σ-field generated by (σ i ) i∈V .
Conditional on {S t = V} and the random variables in (5.8), the random variables (ω i ) i∈T\V are still i.i.d. with their original law and independent of (σ i ) i∈T\V . The latter are also still independent of each other and the (σ i ) i∈T\(V∪∇V) still have their original law, but the laws of (σ i ) i∈∇V are changed since conditioning on {S t = V} entails conditioning on σ i > t − τ * i for each i ∈ ∇V.
Since this holds for each V ∈ T , we see that if we condition on F t as in (1.48), then under the conditional law the random variables ω i and σ i with i ∈ T\S t are still independent, and all of these random variables still have their original laws, except the σ i with i ∈ ∇S t , whose laws are conditioned on the events {σ i > t − τ * i }. From this observation, using the memoryless property of the exponential distribution, the claim of the lemma follows.
For each s ≥ 0 and i ∈ ∇S s , within the marked branching tree S i , (ω i j , σ i,s j ) j∈S i rooted at i, we define the birth and death times τ i, * j and τ i, † j as in (1.41), with σ j replaced by σ i,s j , and we use this to define S i,s t and ∇S i,s t (t ≥ 0) as in (1.44). Finally, we define G i,s t = G S i,s t as in (1.46) and (1.47).

Proof of Lemma 7
We fix a marked branching tree as in (5.4) and times 0 ≤ s ≤ t. Conditional on F t , we assign i.i.d. (X i ) i∈∇St with common law µ 0 to the leaves of S t and define (X i ) i∈St inductively as in (1.50).
We observe that ∇S t is given by the disjoint union Conditioning on F t is the same as first conditioning on ∇S s , (ω j , σ j ) j∈Ss , (5.10) and then conditioning on which by Lemma 45 are conditionally independent given the random variable in (5.10). Set In view of this, by Theorem 6, conditional on the random variable in (5.10), i.e., conditional on F s , the random variables (X i ) i∈∇Ss are i.i.d. with common law µ t−s , where (µ s ) s≥0 denotes the solution of the mean-field equation (1.2) with initial state µ 0 .
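The elided display (5.9) presumably expresses the boundary at time t as a disjoint union over the boundary at time s; in our reading (an assumption based on the surrounding definitions),

```latex
\nabla S_t=\bigcup_{i\in\nabla S_s}\bigl\{\,ij : j\in\nabla S^{i,s}_{t-s}\,\bigr\}
\qquad\text{(disjoint union)},
```

so that the value X i at each i ∈ ∇S s is obtained by applying the map G i,s t−s to the values (X ij ) assigned inside the subtree rooted at i.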
Proof of Proposition 9 Since (σ i ) i∈T and (ω i , X i ) i∈T are independent, the conditional law of (ω i , X i ) i∈T given (σ i ) i∈T is the same as the unconditional law. We claim that under the conditional law given (σ i ) i∈T , the random finite subtree S t is a stopping tree in the sense of (5.3). Indeed, S t = V if and only if for each i ∈ V and j ∈ N + (resp. j ∈ [d], depending on how T is chosen), one has ij ∈ V if and only if (5.14) Here the event in (i) is clearly measurable w.r.t. σ((ω i ) i∈V ), while under the conditional law given (σ i ) i∈T , (ii) is just a deterministic condition. We can therefore apply Lemma 44 to conclude that conditional on (σ i ) i∈T , S t , and (ω i ) i∈St , the random variables (X i ) i∈∂St are i.i.d. with common law ν. We observe that ∇S t is a function of S t and (ω i ) i∈St . Therefore, if we condition on F t = σ(∇S t , (ω i , σ i ) i∈St ), the random variables (X i ) i∈∇St are i.i.d. with common law ν. This proves (1.59) (i). Condition (1.59) (ii) is also clearly fulfilled by the definition of an RTP.

Endogeny, bivariate uniqueness, and the higher-level equation
In this subsection, we prove Theorem 11 and Propositions 13 and 15.
Recall that an RTP (ω i , X i ) i∈T is endogenous if X ∅ is measurable with respect to the σ-field generated by the random variables (ω i ) i∈T . In general, if X is a random variable taking values in a Polish space and F is a sub-σ-field, then it is not hard to see that X is a.s. equal to an F-measurable function if and only if the conditional law P[X ∈ · |F] is a.s. a delta-measure. In view of this, the following lemma implies that an RTP is endogenous if and only if X ∅ is a.s. measurable w.r.t. the σ-field generated by the random variables S and (ω i ) i∈S .
Lemma 46 (Relevant randomness) Let (ω i , X i ) i∈T be an RTP corresponding to a solution ν of the RDE (1.54). Let F̄ be the σ-field generated by the random variables (ω i ) i∈T and let F be the σ-field generated by the random variables S and (ω i ) i∈S . Then P[X ∅ ∈ · |F̄ ] = P[X ∅ ∈ · |F] a.s. (5.16)

Proof Since F̄ is generated by F and the random variables (ω i ) i∈T\S , formula (5.16) says that conditional on F, the random variables (ω i ) i∈T\S are independent of X ∅ . Let U (n) be deterministic finite rooted subtrees of T that increase to T. Let F̄ (n) be the σ-field generated by (ω i ) i∈U (n) and let F (n) be the σ-field generated by S ∩ U (n) and (ω i ) i∈S∩U (n) . Conditional on F (n) , the state at the root X ∅ is a deterministic function of (X i ) i∈∇(S∩U (n) ) . Therefore, by point (ii) in the definition of an RTP in Lemma 8, X ∅ is conditionally independent of (ω i ) i∈(T\S)∩U (n) given F (n) , or equivalently, P[X ∅ ∈ A | σ(F (n) , (ω i ) i∈(T\S)∩U (n) )] = P[X ∅ ∈ A | F (n) ] a.s. for each measurable A ⊂ S. Letting n → ∞, using martingale convergence, we arrive at (5.16).
The following lemma prepares for the proof of Theorem 11.
Lemma 47 (Successful coupling) Let (ω i , X i ) i∈T be an endogenous RTP corresponding to a solution ν of the RDE (1.54) and let (σ i ) i∈T be an independent i.i.d. collection of exponential random variables with mean |r| −1 . Furthermore, let (Y i ) i∈T be an i.i.d. collection of S-valued random variables with common law ν, independent of (ω i , X i , σ i ) i∈T . For each t > 0, define random variables X t ∅ := G t ((Y i ) i∈∇St ). Then the law of (X ∅ , X t ∅ ) converges weakly as t → ∞ to the law of (X ∅ , X ∅ ). (5.19)

Proof The following argument is a continuous-time version of the proofs of [AB05, Thm 11 (c)] and [MSS18, Lemma 6]. Let F t be the filtration defined in (1.48). We add a final element F ∞ := σ(⋃ t≥0 F t ) to the filtration, which is the σ-algebra generated by the random tree S and the random variables (ω i , σ i ) i∈S . Let f, g : S → R be bounded and measurable functions. Since X ∅ and X t ∅ are conditionally independent and identically distributed given F t , we have E[f (X ∅ )g(X t ∅ )] = E[ E[f (X ∅ ) | F t ] E[g(X ∅ ) | F t ] ] −→ t→∞ E[ E[f (X ∅ ) | F ∞ ] E[g(X ∅ ) | F ∞ ] ] = E[f (X ∅ )g(X ∅ )], (5.20) where we used martingale convergence and in the last equality also endogeny and Lemma 46. Since (5.20) holds in particular for any bounded continuous f and g, we conclude that the law of (X ∅ , X t ∅ ) converges weakly to the law of (X ∅ , X ∅ ), which implies (5.19).

Proof of Theorem 11
If (ii) holds, then ν (2) is the only fixed point in P(S 2 ) ν of the bivariate mean-field equation. Since a measure is a fixed point of the bivariate mean-field equation if and only if it is a fixed point of the map T (2) , by Theorem 10, it follows that the RTP corresponding to ν is endogenous.
Assume, conversely, that the RTP corresponding to ν is endogenous. Let (Y 1 i , . . . , Y n i ) i∈T be a collection of i.i.d. S n -valued random variables with common law µ (n) 0 , independent of the RTP (ω i , X i ) i∈T and the exponential lifetimes (σ i ) i∈T . For each 1 ≤ m ≤ n and t > 0, define random variables (X m,t i ) i∈St∪∇St as in (1.50), with boundary values X m,t i := Y m i (i ∈ ∇S t ). Then, by Theorem 6 applied to the n-variate map γ (n) , we see that (X 1,t ∅ , . . . , X n,t ∅ ) has law µ (n) t . By endogeny, we get from Lemma 47 that (X 1,t ∅ , . . . , X n,t ∅ ) converges in law to (X ∅ , . . . , X ∅ ) as t → ∞. (5.22) This completes the proof since the right-hand side of (5.22) has law ν (n) as defined in (1.62).

Proof of Proposition 13
The fact that (ρ t ) t≥0 solves the higher-level mean-field equation (1.71) means that the corresponding integral identity holds for any bounded measurable φ : P(S) → R. In particular, we can apply this to functions of the form φ(µ) := ∫ µ ⊗n (dx)f (x), where f : S n → R is bounded and measurable. A direct computation then shows that the n-th moment measures of (ρ t ) t≥0 solve the n-variate mean-field equation, proving the claim.

Lemma 48 (Conditional law of the root) Let (ω i , X i ) i∈T be an RTP corresponding to a solution ν of the RDE (1.54), let (σ i ) i∈T be an independent i.i.d. collection of exponentially distributed random variables with mean |r| −1 , and let (F t ) t≥0 be the filtration defined in (1.48). Then the measures ρ t := P[ P[X ∅ ∈ · |F t ] ∈ · ] (t ≥ 0) (5.27) solve the higher-level mean-field equation (1.71) with initial state ρ 0 = δ ν .
Proof Conditional on F t , the map G t : S ∇St → S is a deterministic map, and (X i ) i∈∇St are i.i.d. with common law ν. Therefore, applying [MSS18, Lemma 8] to the case that the σ-fields H k there are all trivial and the probability measure P there is replaced by the conditional law given F t , we see that P[X ∅ ∈ · |F t ] = Ǧ t ((ν) i∈∇St ), where Ǧ t denotes the higher-level map associated with G t . Now, by Theorem 6, (ρ t ) t≥0 solves the higher-level mean-field equation (1.71) with initial state ρ 0 = δ ν .
Proof of Proposition 15
Let (ρ i t ) t≥0 (i = 1, 2) be solutions to the higher-level mean-field equation (1.71) such that ρ 1 0 ≤ cv ρ 2 0 . Define ρ i t,(n) as in (3.22), with T replaced by the higher-level map Ť from (1.73). It has been shown in [MSS18, Prop. 3] that Ť is monotone w.r.t. the convex order, so by induction we obtain from (3.22) that ρ 1 t,(n) ≤ cv ρ 2 t,(n) for all n ≥ 1 and t ≥ 0. Letting n → ∞, using (3.30), we see that ρ 1 t ≤ cv ρ 2 t for all t ≥ 0. Let ν be a solution of the RDE (1.54). It has been shown in [MSS18, Prop. 3] that ν̄ solves the higher-level RDE (1.73) and there exists a (necessarily unique) solution ν̌ of (1.73) such that (1.76) holds. It has moreover been shown in [MSS18, Prop. 4] that ν̌ is given by (1.79). In view of this, to complete the proof, it suffices to show that the solution (ρ t ) t≥0 to the higher-level mean-field equation (1.71) with initial state ρ 0 = δ ν converges to the measure in (1.79).
We apply Lemma 48. As in the proof of Lemma 47, we add a final element F ∞ := σ(⋃ t≥0 F t ) to the filtration, which is the σ-algebra generated by the random tree S and the random variables (ω i , σ i ) i∈S . Then, by martingale convergence, P[X ∅ ∈ · |F t ] =⇒ t→∞ P[X ∅ ∈ · |F ∞ ] a.s., (5.30) and hence the measures ρ t in (5.27) satisfy ρ t ⇒ P[ P[X ∅ ∈ · |F ∞ ] ∈ · ] as t → ∞, (5.31) where ⇒ denotes weak convergence of probability measures on P(S), which is in turn equipped with the topology of weak convergence of probability measures on S. Since the exponentially distributed random variables (σ i ) i∈T are independent of the RTP (ω i , X i ) i∈T , we have P[X ∅ ∈ · |F ∞ ] = P[X ∅ ∈ · |F] a.s., where as in Lemma 46, F denotes the σ-field generated by the random variables S and (ω i ) i∈S , and the equality follows from the independence together with that lemma. Inserting this into (5.31) we see that ρ t converges weakly to ν̌ as defined in (1.79).

Further results
In this section, we prove some additional results about RTPs. In Subsection 6.1, we prove Proposition 19 about the upper and lower solutions of a monotone RDE. In Subsection 6.2 we prove Lemmas 20, 21, and 23 which give conditions for uniqueness of solutions to an RDE. Subsection 6.3 is devoted to the proof of Lemma 24.

Monotonicity
In this subsection, we prove Proposition 19. We start with a number of simple lemmas.
Lemma 49 (A continuous monotone function) Let S be a compact metrizable space that is equipped with a closed partial order in the sense of (1.90), and let d be a metric that generates the topology. Then f (x, y) := inf{ d(x ′ , y ′ ) : x ′ , y ′ ∈ S, x ′ ≤ x, y ′ ≥ y } defines a continuous function f : S 2 → [0, ∞) such that f (x, y) = 0 if and only if x ≥ y and moreover f (x, y) is decreasing in x and increasing in y.
Proof Since f (x, y) ≤ d(x, y) < ∞ for any (x, y), the function f is well-defined and finite. Assume that (x n , y n ) ∈ S 2 converge to a limit (x, y). Since the infimum of a family of continuous functions is upper semi-continuous, we have lim sup n→∞ f (x n , y n ) ≤ f (x, y). To prove that f is actually continuous, assume the contrary. Then there exists a sequence (x n , y n ) → (x, y) such that lim inf n→∞ f (x n , y n ) ≤ f (x, y) − ε (6.4) for some ε > 0. By the definition of f , there exist x ′ n ≤ x n and y ′ n ≥ y n such that d(x ′ n , y ′ n ) ≤ f (x n , y n ) + ε/2. Since S is compact, we can select a subsequence such that (6.4) still holds and the (x ′ n , y ′ n ) converge to a limit (x ′ , y ′ ). Since the partial order is closed in the sense of (1.90), we have x ′ ≤ x and y ′ ≥ y, so f (x, y) ≤ d(x ′ , y ′ ) ≤ lim inf n→∞ f (x n , y n ) + ε/2, (6.5) which contradicts (6.4). We conclude that f : S 2 → [0, ∞) is continuous. If x ≥ y, then setting (x ′ , y ′ ) = (x, x) shows that f (x, y) = 0. Conversely, if f (x, y) = 0 then there exist x n ≤ x and y n ≥ y such that d(x n , y n ) → 0. Using the compactness of S, by going to a subsequence, we can assume that the (x n , y n ) converge to a limit (z, z). Since the partial order is closed in the sense of (1.90), y ≤ z ≤ x and hence x ≥ y.
If x ≤ x ∗ and y ≥ y ∗ , then f (x ∗ , y ∗ ) ≤ f (x, y), since the infimum defining f (x, y) is taken over a smaller set, showing that f (x, y) is decreasing in x and increasing in y.
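The conclusions of Lemma 49 can be checked by brute force on a small finite example. The following sketch (our own illustration, not part of the proof) takes S to be the subsets of {0, 1, 2} ordered by inclusion, with d the symmetric-difference metric, and verifies that f (x, y) = 0 iff x ≥ y and that f is decreasing in x and increasing in y.

```python
from itertools import combinations

# Finite poset: subsets of {0, 1, 2} ordered by inclusion,
# metric d = size of the symmetric difference.
base = {0, 1, 2}
S = [frozenset(c) for r in range(4) for c in combinations(sorted(base), r)]

def leq(a, b):          # partial order: a <= b iff a is a subset of b
    return a <= b

def d(a, b):            # metric: cardinality of the symmetric difference
    return len(a ^ b)

def f(x, y):            # f(x, y) = min{ d(x', y') : x' <= x, y' >= y }
    return min(d(xp, yp) for xp in S for yp in S if leq(xp, x) and leq(y, yp))

# f(x, y) = 0 if and only if x >= y
for x in S:
    for y in S:
        assert (f(x, y) == 0) == leq(y, x)

# f is decreasing in its first and increasing in its second argument
for x in S:
    for xs in S:
        if leq(x, xs):
            for y in S:
                assert f(xs, y) <= f(x, y)
                assert f(y, x) <= f(y, xs)
```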
Lemma 50 (Comparison principle) Let S be a compact metrizable space that is equipped with a partial order that is closed in the sense of (1.90). Let X, Y be S-valued random variables such that X ≤ Y a.s. and P[X ∈ · ] ≥ P[Y ∈ · ]. Then X = Y a.s.

Proof We will prove the lemma by showing that if X, Y are S-valued random variables such that X ≤ Y a.s. and P[(X, Y ) ∈ S 2 < ] > 0, where S 2 < := {(x, y) ∈ S 2 : x ≤ y, x ≠ y}, then P[X ∈ · ] ≥ P[Y ∈ · ] must fail. Since for each (x, y) ∈ S 2 < , one has f x (x) = 0 but f x (y) > 0, where f is as in Lemma 49 and f x := f (x, · ), the set S 2 < is covered by sets of the form (6.8). We now use the inner regularity of measures on Polish spaces w.r.t. compacta, which follows from the regularity and tightness of any probability measure on a Polish space [Par05, Thm. 1.2 and 3.2]. Thus, we can find a compact set K ⊂ S 2 < such that P[(X, Y ) ∈ K] > 0. Since K is compact, it is covered by finitely many sets of the form (6.8), so there must exist a z ∈ S and δ > 0 such that the corresponding set has positive probability, which contradicts P[X ∈ · ] ≥ P[Y ∈ · ].

Lemma 51 (Compatibility of the stochastic order) Assume that S is equipped with a partial order that is closed in the sense of (1.90). Then the stochastic order on P(S) is closed with respect to the topology of weak convergence.
Proof We need to show that if µ 1 n ≤ µ 2 n for all n ∈ N and the µ i n ∈ P(S) converge weakly as n → ∞ to a limit µ i ∞ (i = 1, 2), then µ 1 ∞ ≤ µ 2 ∞ . Since µ 1 n ≤ µ 2 n , for each n, we can couple X i n with laws µ i n (i = 1, 2) such that X 1 n ≤ X 2 n . Since µ 1 n and µ 2 n converge as n → ∞, the joint laws of (X 1 n , X 2 n ) are tight, so by going to a subsequence we may assume that they converge. Then, by Skorohod's representation theorem, we can couple the random variables (X 1 n , X 2 n ) for different n in such a way that they converge a.s. to a limit (X 1 ∞ , X 2 ∞ ). Since the partial order on S is closed, we have X 1 ∞ ≤ X 2 ∞ a.s., proving that µ 1 ∞ ≤ µ 2 ∞ .
Lemma 52 (Monotonicity of T ) Assume that S is equipped with a partial order that is closed and that γ[ω] is monotone for all ω ∈ Ω. Then the operator T in (1.1) is monotone w.r.t. the stochastic order.
In practice, Lemma 52 is the usual way to prove monotonicity of a map of the form (1.1). Nevertheless, it is known that there are maps of the form (1.1), in particular, probability kernels, that are monotone yet cannot be represented in terms of monotone maps [FM01, Example 1.1].
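On S = {0, 1}, the stochastic order on P(S) reduces to comparing success probabilities, so Lemma 52 can be illustrated concretely. The sketch below assumes the forms γ[cob](x 1 , x 2 , x 3 ) = x 1 ∨ (x 2 ∧ x 3 ) and γ[dth] ≡ 0 for the maps of the running example, with an illustrative choice of r that gives both maps equal weight.

```python
from itertools import product

def cob(x1, x2, x3):
    # assumed form of the cooperative branching map (monotone in each argument)
    return x1 | (x2 & x3)

def T(p):
    """Success probability of T(Bernoulli(p)) when omega is cob or dth
    with probability 1/2 each (an illustrative choice of r)."""
    q_cob = sum(
        cob(x1, x2, x3)
        * (p if x1 else 1 - p) * (p if x2 else 1 - p) * (p if x3 else 1 - p)
        for x1, x2, x3 in product((0, 1), repeat=3)
    )
    q_dth = 0.0  # dth is the constant map 0
    return 0.5 * q_cob + 0.5 * q_dth

# monotone maps give a monotone T: p <= p' implies T(p) <= T(p')
ps = [i / 100 for i in range(101)]
assert all(T(a) <= T(b) for a, b in zip(ps, ps[1:]))
```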
Lemma 57 (Random maps applied to extremal elements) Under the assumptions of Proposition 19, if (ω i ) i∈T are i.i.d. with common law |r| −1 r, then there exist random variables X upp ∅ and X low ∅ with laws ν upp and ν low that are given by the decreasing, resp. increasing limits X upp ∅ := lim U (n) ↑S G U (n) (1, . . . , 1) and X low ∅ := lim U (n) ↑S G U (n) (0, . . . , 0), (6.11) where the limit does not depend on the choice of the sequence U (n) ∈ T such that U (n) ↑ S.
Here T denotes the set of all finite subtrees U ⊂ T such that either ∅ ∈ U or U = ∅, and for each U ∈ T , the random map G U : S ∇U → S is defined in (1.46).
Proof By symmetry, it suffices to prove the statement for X low ∅ . Since γ[ω] is monotone for each ω ∈ Ω, the map G U is monotone for each U ∈ T . Define X U ∅ := G U (0, . . . , 0) (U ∈ T ). (6.12) Then X U ∅ ≤ X V ∅ for all U ⊂ V and hence if U (n) ∈ T increase to S, then the X U (n) ∅ increase to a limit X low ∅ that does not depend on the choice of the sequence U (n) . Let (σ i ) i∈T be an independent i.i.d. collection of exponential random variables with mean |r| −1 . Define S t ∈ T as in (1.44). Then by Theorem 6, X St ∅ has law µ low t while by what we have already proved X St ∅ increases to X low ∅ . Since µ low t ⇒ ν low , it follows that X low ∅ has law ν low .

Proof of Proposition 19
In view of Lemma 56, it only remains to prove the statement about endogeny. Let (ω i , X i ) i∈T be an RTP corresponding to γ and some solution ν to the RDE (1.54). Then X ∅ = G U ((X i ) i∈∇U ) ≥ X U ∅ for all U ∈ T , with X U ∅ as in (6.12). So letting U ↑ S, using the fact that the partial order is closed, we obtain that X ∅ ≥ X low ∅ . In particular, if ν = ν low , then since X low ∅ also has law ν low , Lemma 50 tells us that X ∅ = X low ∅ a.s. Since the latter is measurable w.r.t. the σ-field generated by the (ω i ) i∈T , this proves the endogeny of the RTP corresponding to γ and ν low .

Conditions for uniqueness
In this subsection, we prove Lemmas 20, 21, and 23.
Proof of Lemma 20
If G t is constant then S t is a root determining subtree, proving the implication (i)⇒(ii). Conversely, if there a.s. exists a root determining subtree U, then, since S t ↑ S, there a.s. exists a (random) t < ∞ such that S t ⊃ U and hence G s is constant for all s ≥ t. The implication (iii)⇒(ii) is trivial. Conversely, if S contains a root determining subtree U, then by the finiteness of the latter we can keep removing elements from U as long as this is still possible while retaining the property that U is root determining.
Proof of Lemma 21
(i)⇒(ii): This is clear, since a finite uniquely determined subtree is root determining.
(ii)⇒(iii): For each i ∈ S, let S i , defined in (5.5), denote the subtree of S that is rooted at i. Since S i is equally distributed with S, by (ii), for each i ∈ S, there a.s. exists a root determining subtree U i ⊂ S i . Since x ∈ Ξ S implies x i = G U i (x ij ) j∈∇U i and G U i is constant, it follows that S is a.s. uniquely determined.
(ii)⇒(v): Since G U i is constant, we can define X i := G U i ((x ij ) j∈∇U i ) (i ∈ S), (6.14) where the right-hand side does not depend on the choice of (x ij ) j∈∇U i . It is straightforward to check that (ω i , X i ) i∈T satisfies conditions (i)-(iii) of Lemma 8 and hence is an RTP corresponding to γ. It follows that ν := P[X ∅ ∈ · ] solves the RDE (1.54). Let (Y i ) i∈T be an independent i.i.d. collection of S-valued random variables with common law µ 0 , let (σ i ) i∈T be an independent i.i.d. collection of exponential random variables with mean |r| −1 , and define X t ∅ := G t ((Y i ) i∈∇St ). Then X t ∅ has law µ t by Theorem 6. Since G t = G St with S t ↑ S, we see from (6.14) that P[X t ∅ ≠ X ∅ ] → 0 as t → ∞, proving that ‖µ t − ν‖ → 0.
(iii)⇒(iv): We note the following general principle: if S 1 , S 2 , S 3 are Polish spaces and (X 1 , X 2 ) and (X ′ 1 , X 3 ) are random variables taking values in S 1 × S 2 resp. S 1 × S 3 such that X 1 and X ′ 1 are equal in law, then we can couple (X 1 , X 2 ) and (X ′ 1 , X 3 ) such that X 1 = X ′ 1 . To see this, let µ denote the law of X 1 , let K i (x 1 , dx i ) denote a regular version of the conditional law of X i given X 1 resp. X ′ 1 (i = 2, 3), and define the joint law of X 1 , X 2 , X 3 as µ(dx 1 )K 2 (x 1 , dx 2 )K 3 (x 1 , dx 3 ), i.e., make X 2 and X 3 conditionally independent given X 1 . Applying this general principle, we see that if ν 1 , ν 2 are solutions to the RDE (1.54), then we can couple the associated RTPs (ω i , X 1 i ) i∈T and (ω ′ i , X 2 i ) i∈T in such a way that ω i = ω ′ i for all i ∈ T. Since S is a.s. uniquely determined, it follows that X 1 ∅ = X 2 ∅ a.s. and hence ν 1 = ν 2 . The same argument also shows that any solution to the bivariate RDE is concentrated on the diagonal, which by Theorem 10 implies endogeny.
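The gluing construction in this general principle can be made completely explicit for finite state spaces. In the following sketch the two pairwise laws are arbitrary illustrative choices with a common first marginal.

```python
from collections import defaultdict

# Joint pmfs of (X1, X2) and (X1', X3); both have the same first marginal.
p12 = {(0, 'a'): 0.2, (0, 'b'): 0.3, (1, 'a'): 0.4, (1, 'b'): 0.1}
p13 = {(0, 'u'): 0.1, (0, 'v'): 0.4, (1, 'u'): 0.25, (1, 'v'): 0.25}

mu = defaultdict(float)              # common law of X1
for (x1, _), w in p12.items():
    mu[x1] += w

# regular conditional laws K2(x1, .) and K3(x1, .)
K2 = {(x1, x2): w / mu[x1] for (x1, x2), w in p12.items()}
K3 = {(x1, x3): w / mu[x1] for (x1, x3), w in p13.items()}

# joint law mu(dx1) K2(x1, dx2) K3(x1, dx3):
# X2 and X3 are conditionally independent given X1
joint = {(x1, x2, x3): mu[x1] * K2[(x1, x2)] * K3[(x1, x3)]
         for x1 in mu for x2 in ('a', 'b') for x3 in ('u', 'v')}

# both original pairwise laws are recovered as marginals
for (x1, x2), w in p12.items():
    assert abs(sum(joint[(x1, x2, x3)] for x3 in ('u', 'v')) - w) < 1e-12
for (x1, x3), w in p13.items():
    assert abs(sum(joint[(x1, x2, x3)] for x2 in ('a', 'b')) - w) < 1e-12
```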
(iii) and S finite imply (ii): Since V ⊂ U and V root determining imply that U is root determining, we see that P[G t not constant] decreases to P[G t not constant ∀t ≥ 0]. Assume that this event has positive probability and condition on it. Choose t(n) → ∞. Then there exist x n , y n ∈ Ξ S t(n) such that x n ∅ ≠ y n ∅ . Since S is finite, the sequences x n and y n have subsequences that converge pointwise for each i ∈ S to limits x ∞ , y ∞ . It is easy to see that x ∞ , y ∞ ∈ Ξ S . Moreover, x ∞ ∅ ≠ y ∞ ∅ . This shows that on the event {G t not constant ∀t ≥ 0}, the tree S is not uniquely determined.
(ii) and S = {0, 1} imply (i): It suffices to show that each root determining subtree U of S contains a uniquely determined subtree. For any i ∈ T, let T (i) := {ij : j ∈ T} denote i and its descendants, let Ξ U,i denote the set of all (x j ) j∈(U∪∇U)∩T (i) that satisfy the defining relations of Ξ U restricted to T (i) , and set χ i := {x i : x ∈ Ξ U,i }. We claim that: If V ⊂ U is a subtree, then any x ∈ Ξ V that satisfies x i ∈ χ i for all i ∈ ∇V can be extended to an x ∈ Ξ U . (6.18) Indeed, this follows from the fact that the sets (U∪∇U)∩T (i) for different i ∈ ∇V are mutually disjoint, which allows us to choose x ∈ Ξ U,i independently for each i ∈ ∇V.
Note that χ i = S for i ∈ ∇U and |χ ∅ | = 1 if U is root determining. Let V be the connected component of {i ∈ U : |χ i | = 1} that contains ∅. Since S has only two elements, χ i = S for i ∈ ∇V. Using (6.18), it follows that V is uniquely determined.

Proof The subtree U := {∅, 1} is root determining, since G U (x) = (x − 2) ∨ 0 = 0 for all x ∈ S ∇U = S. On the other hand, if V = {∅, 1, 11, . . . , 1 (n) } is a finite subtree of S that contains the root, then there exist x, y ∈ Ξ V with x 1 (n) = 0 and y 1 (n) = 1, which shows that V is not uniquely determined.
Example 59 ((iii) does not imply (ii)) Let S = N and define γ as in the proof below. Then S is a.s. uniquely determined but S a.s. contains no root determining subtree.
Proof If x ∈ Ξ S satisfies x 1 (n) = m ≠ 0 for some n, then x 1 (n+k) ≠ 0 and x 1 (n+k) = m − k for all k ≥ 0, which leads to a contradiction. It follows that Ξ S contains a single element, which is given by x 1 (n) = 0 for all n ≥ 0. In particular, S is uniquely determined. On the other hand, for each finite subtree U = {∅, 1, 11, . . . , 1 (n−1) } that contains the root, the function G U is of the form G U (0) = 0 and G U (x) = x + n (x ≥ 1), which is clearly not constant.
Proof Since the continuous-time Markov chain that jumps from x to 1 − x with rate π({g}) is ergodic, the RDE (1.54) has a solution ν that is globally attractive. On the other hand, if U = {∅, 1, 11, . . . , 1 (n−1) } is a finite subtree that contains the root, then G U (x) = x if n is even and G U (x) = 1 − x if n is odd, so G U is not constant.
Proof of Lemma 23
By Lemma 21, it suffices to prove that if the RDE (1.54) has a unique solution, then G t is constant for t large enough. By Proposition 19, the RDE (1.54) has a unique solution if and only if ν low = ν upp . Let 0 and 1 denote the minimal and maximal elements of S. By Lemma 57, G t (0, . . . , 0) and G t (1, . . . , 1) converge as t → ∞ to a.s. limits with laws ν low and ν upp , respectively. Since γ[ω] is monotone for each ω, the maps G t are monotone, and hence G t (0, . . . , 0) ≤ G t (x) ≤ G t (1, . . . , 1) (6.20) for all x ∈ S ∇St . Since S is finite, if the laws of the left- and right-hand sides of (6.20) converge to the same limit, then lim t→∞ P[G t (0, . . . , 0) = G t (1, . . . , 1)] = 1, proving that G t is constant for t large enough.
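The sandwich (6.20) is easy to test on a concrete finite tree. The sketch below uses the (assumed) monotone maps cob(x 1 , x 2 , x 3 ) = x 1 ∨ (x 2 ∧ x 3 ) and the constant map dth ≡ 0 from the running example, and checks G U (0, . . . , 0) ≤ G U (x) ≤ G U (1, . . . , 1) over all boundary configurations x.

```python
from itertools import product

# A node is ('cob', c1, c2, c3), the string 'dth', or ('leaf', k) for the
# k-th boundary input; cob and dth are our assumed monotone maps.
tree = ('cob',
        ('cob', ('leaf', 0), ('leaf', 1), 'dth'),
        ('leaf', 2),
        ('cob', 'dth', ('leaf', 3), ('leaf', 4)))

def G(node, x):
    """Evaluate the tree map at boundary configuration x (a 0/1 tuple)."""
    if node == 'dth':
        return 0
    if node[0] == 'leaf':
        return x[node[1]]
    _, c1, c2, c3 = node
    return G(c1, x) | (G(c2, x) & G(c3, x))

n = 5                                    # number of boundary leaves
lo = G(tree, (0,) * n)                   # G_U(0, ..., 0)
hi = G(tree, (1,) * n)                   # G_U(1, ..., 1)
for x in product((0, 1), repeat=n):
    assert lo <= G(tree, x) <= hi        # the sandwich (6.20)
```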

Duality
In this subsection, we prove Lemma 24. For a start, we will generalize quite a bit and assume that S is a finite partially ordered set and that γ[ω] : S κ(ω) → S is monotone for all ω ∈ Ω, where S κ(ω) is equipped with the product partial order. As in Subsection 5.1, we let T denote the set of all finite subtrees U ⊂ T such that either ∅ ∈ U or U = ∅. For each U ∈ T , we define G U : S ∇U → S as in (1.46), with ∇U defined as before. For any U ∈ T , we let Σ U denote the set of all (y i ) i∈U∪∇U such that for each i ∈ U, the vector (y i1 , . . . , y iκ(ω i ) ) is a minimal element of {x ∈ S κ(ω i ) : γ[ω i ](x) ≥ y i }.

Lemma 61 (Monotone duality) For any U ∈ T , x ∈ S ∇U , and z ∈ S, one has G U (x) ≥ z if and only if there exists a y ∈ Σ U such that y ∅ = z and x ≥ y on ∇U.
Proof Fix z ∈ S. For each U ∈ T , let us write Y U := {(y i ) i∈∇U : y ∈ Σ U , y ∅ = z}. (6.22) Then we need to show that G U (x) ≥ z if and only if x ≥ y for some y ∈ Y U . (6.23) The proof is by induction on the number of elements of U. If U = ∅, then G U is the identity map, Y U = {z}, and the statement is trivial. We will show that if the statement is true for U and if j ∈ ∇U, then the statement is also true for V := U ∪ {j}. Let x ∈ S ∇V and inductively define x i for i ∈ V as in (1.45). By the induction hypothesis, x ∅ ≥ z if and only if (x i ) i∈∇U ≥ y for some y ∈ Y U . (6.24) Here ∇V = (∇U\{j}) ∪ {j1, . . . , jκ(ω j )} and Y V = { y ∈ S ∇V : ∃y ′ ∈ Y U s.t. y i = y ′ i ∀i ∈ ∇U\{j} and (y j1 , . . . , y jκ(ω j ) ) is a minimal element of {x ∈ S κ(ω j ) : γ[ω j ](x) ≥ y ′ j } }.
(6.25) It follows that (6.24) is equivalent to the condition that (x i ) i∈∇V ≥ y for some y ∈ Y V , which completes the induction step of the proof.
Lemma 62 (Minimal elements) Assume that for all ω ∈ Ω, there do not exist z, z ′ ∈ S and minimal elements y, y ′ of {y : γ[ω](y) ≥ z} resp. {y ′ : γ[ω](y ′ ) ≥ z ′ } such that z ≰ z ′ but y ≤ y ′ . Fix z ∈ S. For any U ∈ T , define Y U as in (6.22) dependent on z. Then Y U = { y ∈ S ∇U : y is a minimal element of G −1 U ({z}) }. (6.27)

Proof By Lemma 61, (6.28) holds. In view of this, it suffices to prove that Y U does not contain two elements y, y ′ with y ≠ y ′ and y ≤ y ′ . (6.29) The proof is by induction on the number of elements of U. If U = ∅, then ∇U = {∅} and Y U consists of a single element that has y ∅ = z, so (6.29) is satisfied. Assume that (6.29) holds for U and let V := U ∪ {i} for some i ∈ ∇U. Then (6.25) and the assumption of the lemma imply that (6.29) holds for V.
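For a single cooperative branching vertex, the minimal elements appearing in Lemmas 61 and 62 can be enumerated directly. The sketch below assumes the concrete form cob(x 1 , x 2 , x 3 ) = x 1 ∨ (x 2 ∧ x 3 ) for γ[cob] and verifies the one-vertex case of Lemma 61.

```python
from itertools import product

def cob(x):
    # assumed form of the cooperative branching map
    x1, x2, x3 = x
    return x1 | (x2 & x3)

def minimal(elems):
    """Minimal elements of a set of 0/1 triples under the coordinatewise order."""
    def leq(a, b):
        return all(ai <= bi for ai, bi in zip(a, b))
    return [y for y in elems if not any(leq(z, y) and z != y for z in elems)]

cube = list(product((0, 1), repeat=3))
Y1 = minimal([x for x in cube if cob(x) >= 1])   # minimal elements of {cob >= 1}
Y0 = minimal([x for x in cube if cob(x) >= 0])   # minimal elements of {cob >= 0}

assert sorted(Y1) == [(0, 1, 1), (1, 0, 0)]
assert Y0 == [(0, 0, 0)]                         # unique minimal element

# One-vertex case of Lemma 61: cob(x) >= 1 iff x dominates an element of Y1.
for x in cube:
    assert (cob(x) >= 1) == any(all(xi >= yi for xi, yi in zip(x, y)) for y in Y1)
```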
Proof If z ≰ z ′ then we must have z = 1 and z ′ = 0, so we must show that there do not exist minimal elements y, y ′ of {y : γ[ω](y) ≥ 1} resp. {y ′ : γ[ω](y ′ ) ≥ 0} such that y ≤ y ′ . Clearly, the unique minimal element of {y ′ : γ[ω](y ′ ) ≥ 0} is the all-zero configuration, which does not dominate any minimal element of {y : γ[ω](y) ≥ 1}.

Lemma 64 (Lower and upper solutions) Assume that S is a finite partially ordered set that contains minimal and maximal elements, denoted by 0 and 1. Assume that γ[ω] is monotone for all ω ∈ Ω. Then, for all z ∈ S, ν upp {x : x ≥ z} = P[ ∃y ∈ Σ S s.t. y ∅ = z ] and ν low {x : x ≥ z} = P[ ∃y ∈ Σ S s.t. y ∅ = z and {i ∈ S : y i ≠ 0} is finite ]. (6.30)

Proof By Lemma 61, G U (1, . . . , 1) ≥ z if and only if Σ z U := {y ∈ Σ U : y ∅ = z} is not empty. If U ⊂ V, then Σ z V ≠ ∅ implies Σ z U ≠ ∅, so the events {Σ z U (n) ≠ ∅} decrease to a limit. We claim that this is the event {Σ z S ≠ ∅}. Since the restriction of an element y ∈ Σ z S to U yields an element of Σ z U , it is clear that {Σ z U (n) ≠ ∅ ∀n} ⊃ {Σ z S ≠ ∅}. (6.31) Conversely, if for each n there exists some y(n) ∈ Σ z U (n) , then by the finiteness of S we can select a subsequence of the y(n) that converges pointwise to a limit y. Since y ∈ Σ z S , this proves the other inclusion. By Lemma 57, it follows that (6.32) holds. By Lemma 61, G U (0, . . . , 0) ≥ z if and only if Σ z U contains an element y such that y i = 0 for all i ∈ ∇U. Since for each ω ∈ Ω, the zero configuration (0, . . . , 0) is the unique minimal element of {x ∈ S κ(ω) : γ[ω](x) ≥ 0}, we observe that if y ∈ Σ U satisfies y i = 0 for all i ∈ ∇U, then y can uniquely be extended to an element of Σ V for any V ⊃ U by putting y i := 0 for i ∈ (V ∪ ∇V)\(U ∪ ∇U). In view of this, by Lemma 57, P[X low ∅ ≥ z] = P[ ∃y ∈ Σ z S s.t. {i ∈ S : y i ≠ 0} is finite ].

Cooperative branching
In this section we prove all results that deal specifically with our running example of a system with cooperative branching and deaths. In Subsection 7.1, we prove Proposition 12 about the bivariate mean-field equation. In Subsection 7.2, we prove Theorem 17 and Lemma 18 about the higher-level mean-field equation. In Subsection 7.3, finally, we prove Lemmas 22 and 25 which illustrate the concepts of minimal root determining subtrees and open subtrees in the concrete set-up of our example.
In view of Lemma 65 and the remarks that precede it, Proposition 12 follows from the following proposition.
Proposition 66 (Bivariate differential equation) For α > 4, the equation (7.3) has four fixed points in the space D defined in (7.2), which are of the form (z low , z low ), (z mid , z mid ), (z mid , r mid ), and (z upp , z upp ), (7.8) with z low , z mid , z upp as in (1.37) and z mid < r mid . Solutions to (7.3) started in D converge to one of these fixed points, the domains of attraction being identified in the proof below. For α < 4, (z low , z low ) is the only fixed point in D and its domain of attraction is the whole space D.
Proof In Section 1.3 we have found all fixed points of (7.3) (i) and determined their domains of attraction. It is clear from (7.3) that if z is a fixed point of (7.3) (i), then (z, z) is a fixed point of (7.3), so (z low , z low ) and for α ≥ 4 also (z mid , z mid ) and (z upp , z upp ) are fixed points of (7.3). If α ≥ 4 and p 0 < z mid or if α < 4 and p 0 is arbitrary, then we have seen in Section 1.3 that solutions to (7.3) (i) satisfy p t → 0 = z low as t → ∞. Since 0 ≤ r t ≤ 2p t , it follows that also r t → 0. This proves the statements of the proposition about the domain of attraction of (z low , z low ) for all values of α. Let P α (p) := αp 2 (1 − p) − p and R α,p (r) := α r 2 − 2(r − p) 2 (1 − r) − r (7.12) denote the drift functions of p t and r t , respectively. We observe that R α,p (r) ≤ P α (r) (p ∈ R, r ≤ 1) and P α (r) < 0 for all z upp < r ≤ 1, which implies that sup p∈R R α,p (r) < 0 (z upp < r ≤ 1). (7.13) It follows that solutions of (7.3) satisfy lim sup t→∞ r t ≤ z upp . (7.14) If α > 4 and p 0 > z mid or if α = 4 and p 0 ≥ z mid , we have seen in Section 1.3 that solutions to (7.3) (i) satisfy p t → z upp as t → ∞. Combining this with (7.14) and the fact that p t ≤ r t , we see that (p t , r t ) → (z upp , z upp ).
To complete the proof, we must investigate the long-time behavior of solutions of (7.3) when α > 4 and p 0 = z mid . In this case p t = z mid for all t ≥ 0 and r t takes values in [z mid , 2z mid ] and solves the differential equation ∂ ∂t r t = R α,z mid (r t ) (t ≥ 0). (7.15) It is clear that r t = z mid for all t ≥ 0 is a solution. Since z mid < 1/2, in view of (7.2), we must prove that all solutions with z mid < r 0 ≤ 2z mid converge to a nontrivial fixed point. We write R α,z mid (r) = P α (r) − 2α(r − z mid ) 2 (1 − r). (7.16) Since the first term has a positive slope at r = z mid while the second term has zero slope, we conclude that R α,z mid has a positive slope at r = z mid . Since solutions to (7.3) do not leave the domain D, we must have R α,z mid (2z mid ) ≤ 0. Since R α,z mid (r) = αr 3 + O(r 2 ) as r → ∞, we must have R α,z mid (r) > 0 for r sufficiently large. These observations imply that the cubic function R α,z mid has three zeros r low < r mid < r upp with r low = z mid < r mid ≤ 2z mid < r upp , (7.17) and R α,z mid > 0 on (z mid , r mid ) and R α,z mid < 0 on (r mid , r upp ). It follows that solutions to (7.15) started with z mid < r 0 ≤ 2z mid satisfy r t → r mid as t → ∞.
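The convergence statements of Proposition 66 can be checked numerically. The following sketch Euler-integrates the drifts (7.12) for the illustrative choice α = 5, using z mid = (1 − √(1 − 4/α))/2 and z upp = (1 + √(1 − 4/α))/2, the nonzero roots of P α .

```python
import math

alpha = 5.0  # illustrative choice with alpha > 4
z_mid = (1 - math.sqrt(1 - 4 / alpha)) / 2
z_upp = (1 + math.sqrt(1 - 4 / alpha)) / 2

def P(p):                       # drift of p_t from (7.12)
    return alpha * p**2 * (1 - p) - p

def R(p, r):                    # drift of r_t from (7.12)
    return alpha * (r**2 - 2 * (r - p)**2) * (1 - r) - r

def flow(p, r, T=200.0, dt=0.01):
    """Forward Euler integration of the bivariate equation up to time T."""
    for _ in range(int(T / dt)):
        p, r = p + dt * P(p), r + dt * R(p, r)
    return p, r

# p0 > z_mid: convergence to (z_upp, z_upp)
p, r = flow(0.5, 0.6)
assert abs(p - z_upp) < 1e-3 and abs(r - z_upp) < 1e-3

# p0 < z_mid: convergence to (z_low, z_low) = (0, 0)
p, r = flow(0.2, 0.3)
assert p < 1e-3 and r < 1e-3
```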

The higher-level mean-field equation
In this subsection we prove Theorem 17 and Lemma 18. We start with two preparatory lemmas.
Lemma 67 (Convex order and second moments) Let S be a Polish space and let ρ 1 , ρ 2 ∈ P(P(S)) satisfy ρ 1 ≤ cv ρ 2 and ρ (2) 1 = ρ (2) 2 . Then ρ 1 = ρ 2 .

In the next lemma, we use the notation µ̄ := P[δ X ∈ · ] defined in Subsection 1.7.
Proof of Theorem 17
It follows from their definition that the measures ν̄ low , ν̄ mid , ν̄ upp and ν̌ low , ν̌ mid , ν̌ upp solve the higher-level RDE and their first moment measures are ν low , ν mid , ν upp , respectively.
By Proposition 13, the second moment measure of ν̌ mid solves the bivariate RDE. Since ν̌ mid ≠ ν̄ mid , Lemma 68 tells us that the second moment measure of ν̌ mid is different from that of ν̄ mid . It follows that the measure ν (2) mid from Proposition 12 is indeed the second moment measure of ν̌ mid .
Let (ρ t ) t≥0 be a solution to the higher-level mean-field equation. Assume that α > 4. Then Propositions 12 and 13 tell us that, depending on the initial state, the second moment measures of ρ t converge to one of the fixed points listed in Proposition 12. To prove that in fact ρ t converges to ν̄ low , ν̄ mid , ν̌ mid , or ν̄ upp , respectively, in each of these cases, by the compactness of P(P({0, 1})), it suffices to prove that if ρ tn ⇒ ρ * along a sequence of times t n → ∞, then ρ * is the right limit point. In the cases (i), (iii) and (iv) this is clear from Lemma 68.
This completes the proof for α > 4. The cases α = 4 and α < 4 are similar, but simpler.
To calculate ∫ ν̌ mid (dη) η 2 , we use that 1 − ν (2) mid (0, 0) = r mid , where r mid is the second largest solution r of the equation R α,z mid (r) = 0, with R α,z mid defined as in (7.12). The smallest solution of the cubic equation R α,z mid (r) = 0 is r = z mid . Dividing by (r − z mid ) yields a quadratic equation of which r mid is the smallest solution. Since these are straightforward but tedious calculations, we omit them.
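The computation indicated here can be carried out numerically: expanding R α,z mid from (7.12) gives the cubic αr 3 − α(1 + 4p)r 2 + (α(4p + 2p 2 ) − 1)r − 2αp 2 with p = z mid a root, and synthetic division by (r − z mid ) leaves a quadratic whose smaller root is r mid (α = 5 is an illustrative choice).

```python
import math

alpha = 5.0
p = (1 - math.sqrt(1 - 4 / alpha)) / 2          # z_mid

# coefficients of the expanded cubic R_{alpha, z_mid}(r)
c3 = alpha
c2 = -alpha * (1 + 4 * p)
c1 = alpha * (4 * p + 2 * p**2) - 1
c0 = -2 * alpha * p**2

# synthetic (Horner) division of the cubic by (r - p)
b2 = c3
b1 = c2 + p * b2
b0 = c1 + p * b1
remainder = c0 + p * b0
assert abs(remainder) < 1e-12                   # z_mid is indeed a root

# the two remaining zeros of R_{alpha, z_mid}
disc = math.sqrt(b1**2 - 4 * b2 * b0)
r_mid, r_upp = (-b1 - disc) / (2 * b2), (-b1 + disc) / (2 * b2)
assert p < r_mid <= 2 * p < r_upp               # the ordering (7.17)
```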

Root-determining and open subtrees
In this subsection we prove Lemmas 22 and 25. In our example, each vertex of S has mean offspring number 3α/(α + 1), which is ≤ 1 if and only if α ≤ 1/2. At the end of Subsection 1.3, we have seen that in our example the RDE (1.54) has a unique solution if and only if α < 4. By Lemma 23 this is equivalent to condition (ii) of Lemma 21. Since S = {0, 1}, Lemma 21 tells us that in our example, conditions (i)-(iii) are equivalent.
We claim that a finite subtree U ⊂ S satisfying (1.94) is uniquely determined and in fact x ∈ Ξ U implies x i = 0 for all i ∈ U. To prove this, let A = {i ∈ U : x i = 0 ∀x ∈ Ξ U }. Since U is finite, if U\A is not empty then we can find some i ∈ U\A such that ij ∈ A for j = 1, 2, 3. (Here we take T to be the set of all words made from the alphabet {1, 2, 3}.) If γ[ω i ] = dth, then x i = 0 for all x ∈ Ξ U which contradicts the fact that i ∈ U\A. But if γ[ω i ] = cob, then (1.94) and the fact that ij ∈ A for j = 1, 2, 3 again imply x i = 0 for all x ∈ Ξ U , so we see that U\A must be empty. In particular, this shows that U is root determining.
To see that U is a minimal root determining subtree, assume that V ⊂ U is a smaller one. Then there must be some i ∈ V such that γ[ω i ] = cob and either i1 ∉ V or V ∩ {i2, i3} = ∅. (Here we use that by definition, minimal root determining subtrees contain the root, so V is not empty.) But then either i1 ∈ ∇V or {i2, i3} ⊂ ∇V. Define x ∈ Ξ V inductively by (1.45) with x j = 1 for all j ∈ ∇V. Then x i = 1. Either i is the root or its predecessor ← i satisfies x ← i = 1 by (1.94), so by induction we see that x ∅ = 1. Since the all-zero configuration is also an element of Ξ V , this proves that V is not root determining and hence U is minimal.
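Both halves of this argument can be checked mechanically on a small tree. The sketch below assumes γ[cob](x 1 , x 2 , x 3 ) = x 1 ∨ (x 2 ∧ x 3 ) with κ(cob) = 3 and γ[dth] ≡ 0 with κ(dth) = 0: a subtree whose cob vertices have all their children inside it is uniquely determined (all zeros), while dropping two children leaves a non-constant root map.

```python
from itertools import product

# Vertices are words over {1, 2, 3}, represented as tuples; () is the root.
def children(i):
    return [i + (j,) for j in (1, 2, 3)]

def G(U, omega, x_boundary):
    """Evaluate the root value of the tree map, given boundary values."""
    def val(i):
        if i in x_boundary:
            return x_boundary[i]
        if omega[i] == 'dth':
            return 0                       # dth is the constant map 0
        a, b, c = (val(j) for j in children(i))
        return a | (b & c)                 # assumed form of cob
    return val(())

root = ()
omega = {root: 'cob', (1,): 'dth', (2,): 'dth', (3,): 'dth'}

# U: root is cob, all three children lie in U and are dth, so the boundary
# is empty and the root value is forced to be 0.
U = [root] + children(root)
assert G(U, omega, {}) == 0

# V: drop vertices 2 and 3; now G_V(x_2, x_3) = x_2 & x_3 is not constant,
# so V is not root determining.
V = [root, (1,)]
outputs = {G(V, omega, {(2,): a, (3,): b}) for a, b in product((0, 1), repeat=2)}
assert outputs == {0, 1}
```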