Multiple Scale Analysis of Spatial Branching Processes under the Palm Distribution

We consider two types of measure-valued branching processes on the lattice $Z^d$. These are, on the one hand, a particle system, called branching random walk, and, on the other hand, its continuous mass analogue, a system of interacting diffusions also called super random walk. It is known that the long-term behavior differs sharply in low and high dimensions: if $d\leq 2$ one gets local extinction, while for $d\geq 3$ the systems tend to a non-trivial equilibrium. Due to Kallenberg's criterion, local extinction goes along with clumping around a 'typical surviving particle.' This phenomenon is called clustering. A detailed description of the clusters has been given for the corresponding processes on $R^2$ in Klenke (1997). Klenke proved that, with the right scaling, the mean numbers of particles over certain blocks are asymptotically jointly distributed like marginals of a system of coupled Feller diffusions, called a system of tree indexed Feller diffusions, provided that the initial intensity is appropriately increased to counteract the local extinction. The present paper takes a different remedy against the local extinction, one which also allows for state-dependent branching mechanisms. Instead of increasing the initial intensity, the systems are described under the Palm distribution. Together with the results in Klenke (1997), it will turn out that the change to the Palm measure and the multiple scale analysis commute, as $t\to\infty$. The method of proof is based on the fact that the tree indexed systems of the branching processes and of the diffusions in the limit are completely characterized by all their moments. We develop a machinery to describe the space-time moments of the superprocess effectively and explicitly.


Introduction
In this paper we consider interacting branching models on the lattice $Z^d$, where the interaction is due to migration between the sites of $Z^d$ and is hence linear. Their basic ergodic theory is the same as that of a wide class of interacting processes with components indexed by $Z^d$, by $R^d$, or by the hierarchical group. On the one hand, this class includes interacting particle models, for example the voter model (Holley and Liggett (1975) [23]), branching random walk (Kallenberg (1977) [25], Durrett (1979) [14]), or branching Brownian motion (Fleischman (1978) [19], Gorostiza and Wakolbinger (1991) [22]); on the other hand, it includes interacting diffusions, for instance the Fisher-Wright stepping stone model (Shiga (1980) [36]), the Ornstein-Uhlenbeck process (Deuschel (1988) [13]), or super Brownian motion (Dawson (1977) [7], Etheridge (1993) [17]). For all these processes the long-term behavior depends on whether the underlying migration is recurrent or transient; hence it differs sharply in high and low dimensions. In high dimensions each process has a one-parameter family of invariant measures indexed by the 'intensity' of the system, which is preserved. In low dimensions the invariant measures are 'degenerate', that is, steady states are concentrated on traps: the systems cluster. For branching models this means that mass becomes locally extinct while the surviving mass piles up at spatially rare sites.
The present paper describes the cluster formation of the branching models in the critical dimension $d_c = 2$. It has turned out that the growth rate of a cluster is determined by the Green function of the migration kernel. In two dimensions the Green function grows very slowly, namely on a logarithmic scale. This implies the phenomenon of diffusive clustering, which means that the clusters extend in space as $t^{\alpha/2}$, where $\alpha\in[0,1)$ is a random order. Clusters of branching models in the diffusive regime are investigated in [19], [14], Lee (1991) [32], Dawson and Greven (1996) [11], and Klenke (1997) [28], (1998) [29]. While the first three papers mentioned attempt to describe the rate of growth of clusters, the latter three give insight into the spatial profile by considering space-time or purely spatially renormalized systems.
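The logarithmic growth of the Green function in two dimensions can be checked numerically. The following Python sketch assumes, for illustration only, the discrete-time nearest-neighbour walk on $Z^2$ rather than a general kernel, and uses the classical identity $P[\xi_{2n} = 0 \mid \xi_0 = 0] = \big(\binom{2n}{n}4^{-n}\big)^2$; the function name `green_2d_truncated` is hypothetical.

```python
import math


def green_2d_truncated(n_max):
    """Truncated Green function G_N = sum_{n=0}^{N} P[xi_{2n} = 0 | xi_0 = 0]
    for the discrete-time nearest-neighbour random walk on Z^2 (assumed kernel).

    Uses P[xi_{2n} = 0] = (binom(2n, n) / 4^n)^2.
    """
    c = 1.0      # binom(2n, n) / 4^n at n = 0
    total = 1.0  # n = 0 term: the walk starts at the origin
    for n in range(1, n_max + 1):
        c *= (2 * n - 1) / (2 * n)   # recursion for binom(2n, n) / 4^n
        total += c * c
    return total


for N in (10**3, 10**4, 10**5):
    # G_N - log(N) / pi should settle down to a constant
    print(N, green_2d_truncated(N), green_2d_truncated(N) - math.log(N) / math.pi)
```

Since the summands behave like $1/(\pi n)$, the truncated Green function grows like $\frac{1}{\pi}\log N$; for instance, the increment from $N$ to $4N$ approaches $\frac{1}{\pi}\log 4 \approx 0.441$.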
The main points of our paper are twofold. First, in comparison with [28], in addition to the spatial structure, we formulate and prove more detailed statements about how clusters evolve in time and about the family structures within a cluster. Secondly, we exhibit how the clustering phenomena can be studied by zooming into a cluster in a way which also allows for state-dependent branching mechanisms. In the context of models with the branching property, this means using techniques from the theory of infinitely divisible systems. We now describe these two points.
First, to follow the concept of multiple space-time scaling, we have to rescale the space-time correlation structure among $n$ components situated at sites $y^i_t$ and observed at times $s^i_t$, $i = 1,\dots,n$, as the age $t$ of the system tends to infinity. Hence, one of our main tools is to investigate the asymptotic behavior of space-time mixed moments. Dynkin (1989) [16] has systematized moment formulas in terms of possible genealogical trees. We shall refine these calculations to cover the multiple-scale situation. Our aim will be to show that, due to the diffusive clustering regime, the correlation structure reduces to the knowledge of the pairwise space-time distances, expressed through a function of scaling exponents $A_{i,j}$. In order to describe the diffusive clustering with these exponents, we construct a process which we call a system of tree indexed Feller diffusions. Since branching is naturally connected with genealogical trees, or more generally with genealogical groves, we also introduce the grove indexed systems which are lurking in the background.
In the theory of measure-valued processes a treatment using a multiple scale analysis via tree indexed diffusions can be found in the following situations: for a compact state space model in Fleischmann and Greven (1996) [20], for Gaussian fields which allow an explicit calculation in [31], and for super Brownian motion in [28], [29].
Secondly, branching systems, which have components in $[0,\infty)$, become locally extinct. This means that 'clusters at ∞' have a probability going to 0. In contrast to processes with components in $[0,1]$ ([20]) or in $(-\infty,\infty)$ ([31]), for processes with components in $[0,\infty)$ it is not possible to find a suitable renormalization such that the renormalized components tend to a non-trivial limit. Hence, we additionally need a trick to be able to observe a cluster.
An obvious way to focus on clusters is to condition the branching model on local non-extinction. This approach has been chosen by Lee (1991) [32] for the corresponding particle system on $R^2$, and by Dawson and Greven (1996) [11] for interacting diffusions (including the case of branching diffusions) with components indexed by the hierarchical group. Unfortunately, none of the techniques used in [32] and [11] works for the lattice. It is not even understood what the suitable condition for local non-extinction on the lattice is.
A way to avoid conditioning is to blow up the initial intensity to ensure that one finds surviving mass in a particular window in space. This was first done by Klenke ([28], [29]) for branching models on $R^2$, and can be extended to the lattice. But by blowing up we observe a random number of families in every bounded set, and hence we lose some information on the family structure. Moreover, the method would fail in models with state-dependent branching.
We therefore choose another approach: instead of concentrating on a 'typical site with surviving mass', we rather describe the space-time picture from the perspective of a 'typical surviving member of the population'. In translation invariant and shift ergodic situations this means rescaling the systems under the Palm distribution. It will turn out that under the Palm measure we are in a position to focus on a single family.
Consequently, the main emphasis of this paper will be to use the concept of size-biasing and then to make explicit calculations by exploiting the branching property. We consider branching particle systems and their continuous mass analogues simultaneously. In particular, we rely on rescaling the closed moment hierarchy for superprocesses.

The models
We start by presenting an intuitive description of the two classes of models considered in this paper, namely the particle model in subsection 1.1.2 and the diffusion model in subsection 1.1.3. Finally, in subsection 1.1.4, we state how these two models are connected via a countable system of ODEs. For convenience we first recall the basic notation dealing with random measures.

Branching random walk
The basic ingredient for our processes is a time homogeneous random walk $\xi = (\xi_t)_{t\geq0}$, which we introduce as follows: let $(\xi_n)_{n\in N}$ be a discrete-time random walk on $Z^d$ with transition kernel $a(x,y) := P[\xi_1 = y \mid \xi_0 = x]$. The transition probability of its continuous-time version $\xi$ is then given by poissonizing, i.e.,
$$a_t(x,y) := P[\xi_t = y \mid \xi_0 = x] = \sum_{n=0}^{\infty} e^{-t}\frac{t^n}{n!}\,a^{(n)}(x,y), \qquad (1.4)$$
where $a^{(n)}(\cdot,\cdot)$ denotes the $n$-step transition probability. We make the following assumptions on the discrete-time kernel $a(\cdot,\cdot)$:
(i) The matrix $(a(x,y))_{x,y\in Z^d}$ is invariant under translation in space and symmetric.
(ii) The kernel $a(\cdot,\cdot)$ has finite second moments,
$$\sum_{x\in Z^d}\|x\|^2\,a(0,x) < \infty, \qquad (1.5)$$
where $\|\cdot\|$ is the maximum norm.
(iii) The covariance matrix of the one-dimensional marginals of the distribution $a(0,\cdot)$,
$$Q = (q_{ij})_{i,j=1,\dots,d}, \qquad q_{ij} := \sum_{x\in Z^d} x_i x_j\,a(0,x), \qquad (1.6)$$
is assumed to be invertible, i.e., $\det Q \neq 0$. Hence, the matrix $a(\cdot,\cdot)$ is irreducible.
In the following, the random walk ξ is referred to as the basic process.
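To make assumptions (i)-(iii) concrete, the sketch below checks them for one hypothetical example kernel, the nearest-neighbour walk on $Z^2$ with $a(0,x) = 1/4$ for the four neighbours of the origin. The kernel is represented by its step law $a(0,\cdot)$, so translation invariance is built in; all names are illustrative.

```python
from itertools import product

# Hypothetical example: step law a(0, .) of the nearest-neighbour walk on Z^2.
# Translation invariance is built in: a(x, y) := a(0, y - x).
STEP_LAW = {(1, 0): 0.25, (-1, 0): 0.25, (0, 1): 0.25, (0, -1): 0.25}


def is_symmetric(step_law, tol=1e-12):
    """(i) symmetry: a(0, x) = a(0, -x)."""
    return all(abs(p - step_law.get((-dx, -dy), 0.0)) < tol
               for (dx, dy), p in step_law.items())


def second_moment(step_law):
    """(ii) sum_x ||x||^2 a(0, x) with the maximum norm ||.||."""
    return sum(max(abs(dx), abs(dy)) ** 2 * p for (dx, dy), p in step_law.items())


def covariance(step_law):
    """(iii) Q_ij = sum_x x_i x_j a(0, x); the mean is zero by symmetry."""
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for step, p in step_law.items():
        for i, j in product(range(2), repeat=2):
            Q[i][j] += step[i] * step[j] * p
    return Q


Q = covariance(STEP_LAW)
det_Q = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
```

For this kernel $Q = \frac12 I$, so $\det Q = \frac14 \neq 0$.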
We now define the Branching Random Walk on $Z^d$ with lifetime parameter $V$ (BRW) by the following procedure:
• Migration: Each particle starting from $x\in Z^d$ moves according to the law of $\xi$.
• Branching: After an exponentially distributed lifetime with mean $1/V$, the particle either dies or is replaced by 2 new particles. Each case occurs with probability 1/2.
Both mechanisms act independently for all particles, independently of each other and independently of the initial configuration. In particular, the branching is critical in the sense that $E[K] = 1$, where $K$ is the random number of new particles. The offspring behave as $K$ independent copies of the one-particle system started from the parent particle's final site. In this way, the initial particle generates a random population at time $t > 0$, described by an atomic random measure $\eta_t \in N_f(Z^d)$.
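The particle dynamics can be simulated directly. This is a minimal Gillespie-type sketch under stated assumptions: the walk is the poissonized nearest-neighbour walk on $Z^2$ with jump rate 1, the branching rate is $V$, and `simulate_brw` is a hypothetical helper, not code from the cited papers.

```python
import random

# Hypothetical choice of migration kernel: nearest-neighbour walk on Z^2.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def simulate_brw(t_end, V=1.0, start=(0, 0), rng=None):
    """Gillespie-type simulation of critical binary BRW started from one particle.

    Each particle jumps at rate 1 to a uniformly chosen neighbour and, at rate V,
    dies or splits into two particles with probability 1/2 each.
    Returns the list of particle positions at time t_end.
    """
    rng = rng or random.Random()
    particles = [start]
    t = 0.0
    while particles:
        n = len(particles)
        t += rng.expovariate(n * (1.0 + V))   # time of the next event
        if t > t_end:
            break
        i = rng.randrange(n)                   # particle affected by the event
        if rng.random() < 1.0 / (1.0 + V):     # migration step
            dx, dy = rng.choice(STEPS)
            x, y = particles[i]
            particles[i] = (x + dx, y + dy)
        elif rng.random() < 0.5:               # death: swap with last, then pop
            particles[i] = particles[-1]
            particles.pop()
        else:                                  # binary split at the same site
            particles.append(particles[i])
    return particles


rng = random.Random(1)
sizes = [len(simulate_brw(1.0, rng=rng)) for _ in range(20000)]
print(sum(sizes) / len(sizes))  # should be close to E[K] = 1 (criticality)
```

Averaging the population size over many runs illustrates criticality, although for moderately large $t$ most runs are already extinct.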
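We denote by $(S_{s,t})_{t\geq s}$ the transition kernels of $\xi$, which describe the expected value at time $t$ of a function of the walk started at time $s$ in some site, say $x$. That is,
$$S_{s,t}f(x) := E[f(\xi_t)\mid\xi_s = x] = \sum_{y\in Z^d} a_{t-s}(x,y)\,f(y).$$
We use the abbreviation $S_t := S_{0,t}$, which defines a semigroup. Notice that due to (1.4) the generator $\Delta_{RW}$ of $(S_t)_{t\geq0}$ on $C_{b,0}(Z^d)$ is given by
$$\Delta_{RW}f(x) = \sum_{y\in Z^d} a(x,y)\,\big(f(y) - f(x)\big). \qquad (1.9)$$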
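The semigroup and its generator can be compared numerically. The sketch below assumes the nearest-neighbour kernel on $Z^2$ and truncates the Poisson series defining $a_t$ at a fixed number of jumps; `a_t`, `S` and `generator` are hypothetical names. For $f(z) = \|z\|_2^2$ one has $\Delta_{RW}f(0) = 1$ and $S_tf(0) = t$, since each jump increases the expected squared distance by one.

```python
import math

# Hypothetical example kernel: nearest-neighbour walk on Z^2, a(x, x + e) = 1/4.
STEP_LAW = {(1, 0): 0.25, (-1, 0): 0.25, (0, 1): 0.25, (0, -1): 0.25}


def one_step(dist, step_law=STEP_LAW):
    """Push a finitely supported law on Z^2 forward by one step of the discrete walk."""
    out = {}
    for (x, y), p in dist.items():
        for (dx, dy), q in step_law.items():
            z = (x + dx, y + dy)
            out[z] = out.get(z, 0.0) + p * q
    return out


def a_t(t, x0=(0, 0), n_max=30):
    """Poissonized kernel sum_n e^{-t} t^n / n! a^(n)(x0, .), truncated at n_max jumps."""
    dist, result, weight = {x0: 1.0}, {}, math.exp(-t)
    for n in range(n_max + 1):
        for z, p in dist.items():
            result[z] = result.get(z, 0.0) + weight * p
        dist = one_step(dist)
        weight *= t / (n + 1)          # Poisson(t) weight of n + 1 jumps
    return result


def S(t, f, x0=(0, 0)):
    """S_t f(x0) = E[f(xi_t) | xi_0 = x0]."""
    return sum(p * f(z) for z, p in a_t(t, x0).items())


def generator(f, x0=(0, 0), step_law=STEP_LAW):
    """Delta_RW f(x0) = sum_y a(x0, y) (f(y) - f(x0))."""
    x, y = x0
    return sum(q * (f((x + dx, y + dy)) - f(x0)) for (dx, dy), q in step_law.items())
```

For small $h$, the difference quotient $(S_hf(0) - f(0))/h$ should be close to $\Delta_{RW}f(0)$, which is how (1.9) follows from (1.4).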

Super random walk
The second class of models we look at arises as the diffusion limit of the particle model discussed above or, alternatively, may be introduced as a system of interacting diffusions constructed via SDEs. We now give both constructions.
First we consider the short-lifetime, high-intensity limit of our particle system BRW. The appropriate scaling is the following: consider a sequence $(\eta^N)_{N\geq1}$, where in $\eta^N$ each particle has mass $\frac1N$, the lifetime parameter is $NV$, and the initial populations $\eta^N_0$ converge to $\mu$. Then the process $\eta^N := (\eta^N_t)_{t\geq0}$ converges in law, as $N\to\infty$, to the Super Random Walk on $Z^d$ with lifetime parameter $V$ (SRW), i.e., to a Markov process $X := (X_t)_{t\geq0}$ with values in $M_f(Z^d)$ and with $\mathcal L[X_0] = \delta_\mu$ (see Dawson (1993), Section 4.4 [8]).
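As a consistency check of this scaling (a short computation added here, not part of the cited construction), consider the total mass. A family founded by one particle that branches critically at rate $r$ with offspring variance $\sigma^2$ has size $K_t$ with $E[K_t] = 1$ and $\frac{d}{dt}E[K_t^2] = r\sigma^2E[K_t]$, hence $\operatorname{Var}[K_t] = r\sigma^2t$. For $\eta^N$ we have $r = NV$, $\sigma^2 = 1$, $N\eta^N_0(Z^d)$ independent families and particle mass $1/N$; migration does not change the total mass. Hence, given $\eta^N_0$,

$$E\big[\eta^N_t(Z^d)\big] = \eta^N_0(Z^d), \qquad \operatorname{Var}\big[\eta^N_t(Z^d)\big] = \frac{1}{N^2}\cdot N\eta^N_0(Z^d)\cdot NVt = Vt\,\eta^N_0(Z^d).$$

The variance does not depend on $N$, which is why a non-degenerate diffusion limit appears; it coincides with the variance $Vtz$ of a Feller diffusion $dZ_t = \sqrt{VZ_t}\,dW_t$ started in $z$, i.e., with diffusion coefficient $g(z) = Vz$.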
So far, SRW is defined for finite initial configurations $\mu\in M_f(Z^d)$. In principle we can extend this definition by independently superposing single-ancestor processes. However, to get a well-behaved Markov process, we want to extend the state space to a Borel space $E\subset M(Z^d)$ such that $E$ is invariant under the dynamics. To introduce $E$, we impose a regularity condition on the initial measure $\mu$. We assume that $\mu\in l^p_\gamma$ for some $p\geq1$, where $l^p_\gamma$ is constructed as follows: choose a positive and summable sequence $\{\gamma_x;\ x\in Z^d\}$ satisfying a growth condition which involves a finite constant $\Gamma > 1$ and a second positive and summable sequence $\{\beta_x;\ x\in Z^d\}$, and then define the space $l^p_\gamma$ in terms of these weights. For $\mathcal L[X_0] = \delta_\mu$ with $\mu\in l^p_\gamma$ we define $X$ as the increasing limit of $(X^n)_{n\geq1}$ with finite initial states $\mu_n\uparrow\mu$, as $n\to\infty$. It then turns out that for any $t > 0$, $X_t$ takes values in $l^p_\gamma\subset M(Z^d)$ a.s. The same construction guarantees that $\eta_t\in l^p_\gamma\subset N(Z^d)$ for any $t > 0$ a.s. iff $\eta_0\in l^p_\gamma$. The state space $l^p_\gamma$ was first introduced in the context of particle systems by Liggett and Spitzer (1981) [33].
Due to the construction, the law of SRW is infinitely divisible, a fact which justifies calling $X$ a superprocess. To obtain an infinitely divisible law also for the particle model, choose the initial configuration $\Phi = \sum_k\delta_{x_k}$ random with a Poisson distribution (compare with (1.3)).
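We now come to the second description of the system given by the SRW dynamics. For $\mu\in l^1_\gamma$, the infinite system is known as Interacting Feller Diffusions (IFD). A version is given by the unique strong solution of the system of stochastic differential equations
$$dX_t(x) = \sum_{y\in Z^d} a(x,y)\big(X_t(y) - X_t(x)\big)\,dt + \sqrt{g\big(X_t(x)\big)}\,dw_x(t), \qquad x\in Z^d.$$
Here the diffusion coefficient is $g(z) := Vz$, and $\{w_x(t);\ x\in Z^d\}$ is a collection of independent Brownian motions on the real line.
We know from Shiga (1992) [37] that IFD is strongly Markovian, and its infinitesimal generator $G$ is given by
$$Gf(z) = \sum_{x\in Z^d}\Big(\sum_{y\in Z^d} a(x,y)(z_y - z_x)\Big)\frac{\partial f}{\partial z_x}(z) + \frac12\sum_{x\in Z^d} g(z_x)\,\frac{\partial^2 f}{\partial z_x^2}(z) \qquad (1.14)$$
for $f\in C^2_0(l^1_\gamma)$; with this generator $G$, in a third approach, IFD can be constructed via semigroups.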
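For intuition, the system of SDEs can be discretized by an Euler-Maruyama scheme. This is a rough sketch that ignores the subtleties of discretizing square-root diffusions near 0: it uses a finite $L\times L$ torus instead of $Z^2$, the hypothetical nearest-neighbour kernel, the diffusion coefficient $g(z) = Vz$ entering as $\sqrt{g}$, and clips negative values to 0. The helper `ifd_euler` is hypothetical.

```python
import math
import random


def ifd_euler(x0, V, t_end, dt, rng):
    """Euler-Maruyama sketch for interacting Feller diffusions on an L x L torus.

    dX(i) = sum_j a(i, j) (X(j) - X(i)) dt + sqrt(V X(i)) dW_i, with the
    hypothetical nearest-neighbour kernel a(i, j) = 1/4; values are clipped at 0.
    """
    L = len(x0)
    x = [row[:] for row in x0]
    for _ in range(int(round(t_end / dt))):
        new = [[0.0] * L for _ in range(L)]
        for i in range(L):
            for j in range(L):
                neigh = (x[(i + 1) % L][j] + x[(i - 1) % L][j]
                         + x[i][(j + 1) % L] + x[i][(j - 1) % L])
                drift = 0.25 * neigh - x[i][j]
                noise = math.sqrt(V * x[i][j] * dt) * rng.gauss(0.0, 1.0)
                new[i][j] = max(x[i][j] + drift * dt + noise, 0.0)
        x = new
    return x
```

With $V = 0$ only migration acts, so the total mass is conserved and the configuration flattens out towards the uniform one; with $V > 0$ the components fluctuate and may hit 0.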

Analytical connection between BRW and SRW
Both classes of models just defined are connected in the sense that their laws are governed by the same dual process, which is deterministic and given by a countable system of ODEs. This system describes the Laplace transforms of $\eta_t$ and $X_t$. It is, therefore, the discrete-space analogue of the PDE known from Branching Brownian Motion (BBM) and Super Brownian Motion (SBM) on $R^d$, respectively. Namely, for test functions $f\in M^+_0(Z^d)$ and $s\geq0$, let $t\mapsto u^s_\cdot(t;f)\in l^1_\gamma$ be non-negative solutions of the reaction diffusion equation
$$\frac{\partial}{\partial t}u^s_x(t;f) = \Delta_{RW}u^s_x(t;f) - \frac V2\big(u^s_x(t;f)\big)^2, \qquad t\geq s,\ x\in Z^d, \qquad (1.15)$$
with the initial condition
$$u^s_\cdot(s;f) = f. \qquad (1.16)$$
Unlike for SBM on $R^d$, where in the reaction diffusion equation the generator $\Delta_{RW}$ is replaced by the one-half Laplacian $\frac12\Delta$, on the lattice no explicit sub- or super-solutions of (1.15) can be constructed in terms of a general kernel $a(i,j)$. A simple renewal argument for BRW and the construction of SRW as a short-lifetime, high-density limit show that for $f\in M^+_0(Z^d)$ solutions of (1.15) are given by
$$1 - u^s_x\big(t;1-e^{-f}\big) = E\big[e^{-\langle\eta_t,f\rangle}\,\big|\,\eta_s = \delta_x\big], \qquad e^{-u^s_x(t;f)} = E\big[e^{-\langle X_t,f\rangle}\,\big|\,X_s = \delta_x\big]. \qquad (1.17)$$
For this reason, (1.15) is often referred to as the log-Laplace equation of the process $X$.
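Applying (1.9), (1.15) can be rewritten as the integral equation
$$u^s_x(t;f) = S_{s,t}f(x) - \frac V2\int_s^t S_{r,t}\Big(\big(u^s_\cdot(r;f)\big)^2\Big)(x)\,dr. \qquad (1.19)$$
Observe that by the branching property we have a disintegration formula (1.20), where $Q^{\cdot RW}_t$ denotes the canonical measure of either $\eta_t$ or $X_t$.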
Recall that, since a particle is not divisible, (1.21) holds. Moreover, we derive from (1.17) that the particle model is embedded in the continuous mass model via Poissonizing, i.e., in terms of canonical measures the relation (1.22) holds.
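The log-Laplace equation (1.15) can be integrated numerically on a finite torus. The sketch below assumes the reaction term $-\frac V2u^2$ (the form consistent with binary branching at rate $V$ in the BRW definition), the nearest-neighbour kernel and a classical fourth-order Runge-Kutta scheme; `log_laplace_torus` is a hypothetical name. For a constant test function $f\equiv c$ the migration term vanishes and the exact solution is $u(t) = c/(1 + \frac V2ct)$, which serves as a check.

```python
def log_laplace_torus(f, V, t_end, n_steps=2000):
    """Classical RK4 for du/dt = Delta_RW u - (V/2) u^2 on an L x L torus, u(0) = f.

    f is an L x L list of non-negative numbers; Delta_RW uses the hypothetical
    nearest-neighbour kernel a(x, y) = 1/4.
    """
    L = len(f)

    def rhs(u):
        return [[0.25 * (u[(i + 1) % L][j] + u[(i - 1) % L][j]
                         + u[i][(j + 1) % L] + u[i][(j - 1) % L])
                 - u[i][j] - 0.5 * V * u[i][j] ** 2
                 for j in range(L)] for i in range(L)]

    def shifted(u, k, c):
        """Elementwise u + c * k."""
        return [[u[i][j] + c * k[i][j] for j in range(L)] for i in range(L)]

    h = t_end / n_steps
    u = [row[:] for row in f]
    for _ in range(n_steps):
        k1 = rhs(u)
        k2 = rhs(shifted(u, k1, h / 2))
        k3 = rhs(shifted(u, k2, h / 2))
        k4 = rhs(shifted(u, k3, h))
        u = [[u[i][j] + h * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j]) / 6
              for j in range(L)] for i in range(L)]
    return u
```

Under the assumed normalization, $e^{-u_x(t;f)}$ is the Laplace transform of $\langle X_t, f\rangle$ for SRW started from a unit mass at $x$; the quadratic absorption term makes $\sum_x u_x(t;f)$ decrease in $t$.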

Basic ergodic theory
If we start the branching processes in a finite measure, it is easy to see that the total mass tends to 0. However, as described in (1.10) to (1.12), we have used these finite mass processes to construct processes started from certain infinite measures. The question of the long-time behavior is then more interesting, because it depends on the relative strength of the two competing mechanisms: branching and spatial migration. In low dimensions branching dominates, while in higher dimensions migration does. This dichotomy goes back to two papers which have become 'classics': Dawson (1977) [7] dealing with SBM, and Kallenberg (1977) [25] dealing with a time-discrete branching particle model. Both papers yield the same result, which can be stated as a 'metatheorem': a recurrent (symmetrized) particle/mass migration goes along with local extinction, while a transient (symmetrized) migration allows the construction of a non-trivial equilibrium with finite intensity. However, the techniques used in these papers are quite different: the first uses analytic and the second probabilistic techniques.
Dawson analyzed the log-Laplace equation (1.18) (using characteristic functions instead of Laplace transforms at the time). In the transient case, he constructs a non-trivial equilibrium from a series solution of (1.18). In establishing convergence he obtains upper bounds on the rate of growth of its coefficients. This approach can be transferred to yield an elementary proof for SRW. For a wide class of measure-valued branching processes with recurrent migration, including SRW, this idea is pursued further in Etheridge (1993) [17] by establishing lower bounds on the rate of growth of second order terms. While in the recurrent case Dawson used the precise knowledge of the transition density of BM, Etheridge's estimates lead to a proof of local extinction also for SRW.
Kallenberg's main criterion is that local extinction appears iff the locally size-biased population in a bounded set (or in a lattice site) converges to infinity almost surely as time tends to infinity, i.e., local extinction goes along with clumping around a 'typical surviving particle'. Moreover, he presents the method of backward trees which describe the genealogy of a randomly sampled particle of generation n, and which allows the Palm distribution of the n-th generation to be computed. In Gorostiza and Wakolbinger (1991) [22] this method is extended to a class of continuous-time branching particle models containing BBM on R d . This again extends to a proof for BRW.
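Kallenberg's backward tree has a simple non-spatial counterpart, the size-biased (spine) construction of a Galton-Watson tree. The sketch below ignores migration, uses discrete generations, and assumes critical binary offspring (0 or 2 children with probability 1/2 each, as in BRW). Size-biasing generation $n$ by its size gives the same law as following a distinguished ancestral line along which every individual has exactly two children, one continuing the line and one founding an ordinary, independent family. All function names are hypothetical; exact rational arithmetic makes the comparison exact.

```python
from fractions import Fraction


def convolve(p, q):
    """Law of the sum of two independent N-valued variables (lists of probabilities)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def gw_law(n):
    """Exact law of generation n of a critical binary Galton-Watson process (one ancestor)."""
    law = [Fraction(0), Fraction(1)]          # Z_0 = 1
    for _ in range(n):
        # the ancestor has 0 or 2 children; each child starts an independent copy
        law = [Fraction(1, 2) * c for c in convolve(law, law)]
        law[0] += Fraction(1, 2)
    return law


def size_biased(law):
    """Size-bias a law on {0, 1, 2, ...} by its value k."""
    mean = sum(k * p for k, p in enumerate(law))
    return [k * p / mean for k, p in enumerate(law)]


def spine_law(n):
    """Law of generation n under the spine construction.

    The spine individual always has two children: one continues the spine,
    the other founds an ordinary family.
    """
    law = [Fraction(0), Fraction(1)]          # the spine individual itself
    for k in range(n):
        law = convolve(law, gw_law(k))        # sibling family observed k generations after its birth
    return law
```

For every $n$, `size_biased(gw_law(n))` and `spine_law(n)` agree exactly, which is the non-spatial shadow of computing the Palm distribution via the backward tree.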
A direct approach to BRW can be found in Durrett (1979) [14]. The proofs of Durrett's results are based on a very similar criterion by Liemant (see Matthes (1972) [34], Theorem 4.3) to the effect that a necessary and sufficient condition for local extinction is that the expected number of particles at time 0 which have offspring in some bounded region at time t tends to 0, as t → ∞.
For a more detailed review of the classical theorem we introduce some notation. Recall that a random measure is said to be translation invariant if its probability law is spatially homogeneous. The intensity measure of a translation invariant random measure is spatially homogeneous and hence a constant $\theta$ times the counting measure $\lambda$; $\theta$ is called the intensity.
A random measure on $Z^d$ with law $P$ is said to possess an asymptotic intensity function if there is a function $\gamma$:
To obtain a common theorem for both models, we use the following convention:
• In a statement which applies to both models, i.e., to the particle as well as to the diffusion model, let $X_t$ denote either BRW or SRW.
The following statements are taken from the above-mentioned papers.
(i) Fix $\theta\in[0,\infty)$. Then the weak limit exists and is spatially homogeneous. The parameter $\theta$ corresponds to the preserved intensity.
(iii) If $\Phi$ is a random measure with law $R$ and asymptotic intensity function $\gamma$ which satisfies $\int\gamma(m)\,R(dm) < \infty$, then
Hence, the set of extreme points of spatially homogeneous and ergodic probability laws with finite intensity is exactly
If $\Phi$ is a random measure with law $R$ and with asymptotic intensity function $\gamma$ which satisfies $\int\gamma(m)\,R(dm) < \infty$, then

$$\mathcal L[X_t] \Longrightarrow \delta_{\mathbf 0}, \quad t\to\infty, \qquad (1.28)$$
where $\mathbf 0$ denotes the zero measure on $Z^d$.
In the transient case, [4] yields the same results as stated above. The main tool is a successful coupling. Since it is easy to check that the second moments are bounded as long as $g$ satisfies (1.29), this coupling argument extends to all diffusion coefficients fulfilling (1.29), in particular to $g(x) = x$ (see Shiga (1992) [37]).
In the recurrent case, [4] establishes the analogue of (1.28), that is, convergence to $(1-\theta)\delta_0 + \theta\delta_1$, where $\theta$ is the initial intensity. Starting from the result for interacting Fisher-Wright diffusions (IFW), that is, $g(x) = x(1-x)$, obtained by a duality of IFW to delayed coalescing random walks, the main tool is comparison with IFW. Based on the intuition that a larger diffusion coefficient leads to a process whose distribution is more 'dispersed', [3] gives a general comparison argument for the expectations of some functionals. This generalizes (1.28) to all diffusion coefficients $g:[0,\infty)\to[0,\infty)$ fulfilling (1.29).

Cluster formation in two dimensions
Part (b) of Theorem 0 states that in low dimensions $X_t$ becomes locally extinct. On the other hand, mass piles up at spatially rare sites. We call this phenomenon clustering. It contains the observation that for large times all components $X_t(\{x\})$ agree locally, which means that the components either are extinct or grow to infinity (the latter occurs with a probability going to zero, but is the case worth investigating).
Concerning the structure and the genealogy of a non-empty cluster there are a number of obvious questions:
• What is the height of a cluster, i.e., at what rate does a not yet extinct component grow?
• How fast does a cluster of surviving components expand spatially, i.e., at what spatial rates do components of a cluster remain correlated, which here actually means dependent, even as $t\to\infty$? Since the correlation can be seen as relying on common ancestry only, rescaling the correlation structure describes the genealogy of a cluster by giving insight into the family structure.
• How old is a cluster, i.e., how far back in time does the correlation structure among components of a cluster reach, compared with the age of the system?
Before we answer these questions, we first need to discuss the different approaches, mentioned in the introduction, for dealing with clustering in branching models, namely how to ensure that we really observe a 'non-empty' cluster.
In Lee's approach ([32]) BBM on $R^2$ is conditioned on at least one surviving particle in a finite set. There is more than one reason not to follow this idea in the context of BRW or SRW, that is, in a discrete-space situation. All are of a technical nature. On the one hand, Lee studied sub- and super-solutions of the partial differential equation for the log-Laplace functional (similar to (1.15)). This method makes use of the scaling property of Brownian motion. Therefore it is not clear how to transfer it to BRW, where the log-Laplace functional is given by the harder-to-handle difference equation (1.15). On the other hand, in the continuous mass situation it is not clear what local non-extinction could mean. Dawson and Greven ([11]) condition interacting branching diffusions with components indexed by the hierarchical group on the event that one of the components has mass at least $\varepsilon > 0$. To do so, they study the interaction chain, but the construction of that tool makes use of a hierarchical mean field limit. Hence the hierarchical structure is essential, and a similar renormalization approach still has to be constructed for the two-dimensional lattice. Besides the lack of a suitable tool for conditioning on local non-extinction on the lattice, it is still not clear what the right condition on the lattice should be.
Klenke ([28], [29]) came up with a trick which avoids conditioning. He investigated BBM on $R^2$ and the respective superprocess, starting with more and more densely populated initial configurations. This serves to obtain a non-trivial limiting probability of non-extinction. His method also works for the lattice, but with it we would observe a random number of families in each bounded set. We prefer instead to observe one typical family only.
We therefore introduce another concept which works for systems going to extinction and which, in the context of branching systems, is well suited to explicit calculations. The idea is to describe the process as seen from a 'typical surviving particle' at a given time $t$, which automatically places the observer in a non-empty cluster. Mathematically, in translation invariant and shift ergodic systems this amounts to a local change of the law of $X_t$, namely to the law of $X_t$ size-biased with one of its components.
To explain this, let $X$ be a random variable taking values in some arbitrary space with distribution $P$, say, and let $h(X)$ be a non-negative measurable functional of $X$ with $0 < E[h(X)] < \infty$.
Then the distribution of $X$, size-biased with $h(X)$, is given by
$$(P)_h(dx) := \frac{h(x)}{E[h(X)]}\,P(dx).$$
More generally, the distribution of the random variable $F(X)$, size-biased with $h(X)$, is the distribution of $F$ under $(P)_h$.
If $X$ is a random measure on a discrete space $E$, say, and one puts $h(X) := X(\{x\})$ for a fixed $x\in E$, then $(P)_x := (P)_h$ coincides with the notion of the Palm distribution.
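A small numerical illustration of size-biasing for a discrete law: for a Poisson variable $X$ with mean $\lambda$ and $h(X) = X$, the size-biased law is the law of $1 + X$. This is the one-site version of the well-known fact that the Palm distribution of a Poisson random measure is the original law with an added atom at the sampled point. The helpers below are hypothetical and truncate the Poisson law at a finite value.

```python
import math


def size_bias(pmf, h):
    """Return the pmf of X under (P)_h, i.e. h(x) P[X = x] / E[h(X)].

    pmf: dict value -> probability; h: non-negative function with 0 < E[h(X)] < inf.
    """
    norm = sum(h(x) * p for x, p in pmf.items())
    if not norm > 0:
        raise ValueError("need 0 < E[h(X)]")
    return {x: h(x) * p / norm for x, p in pmf.items()}


def poisson_pmf(lam, k_max):
    """Poisson(lam) probabilities for k = 0, ..., k_max (truncated)."""
    return {k: math.exp(-lam) * lam ** k / math.factorial(k) for k in range(k_max + 1)}


pmf = poisson_pmf(2.0, 60)
sb = size_bias(pmf, lambda k: k)
# sb[k] agrees with pmf[k - 1] for k >= 1: size-biased Poisson = 1 + Poisson
```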

Definition 1 (Family of Palm distributions)
Let $E$ be locally compact and Polish, and let $X$ be a random measure on $(E,\mathcal B(E))$ with law $P$ and locally finite intensity measure $\Lambda_P$. The associated Palm distributions are a family $\{(P)_x;\ x\in E\}$ of probability laws satisfying
$$\int\Lambda_P(dx)\int(P)_x(d\mu)\,f(x,\mu) = \int P(d\mu)\int\mu(dx)\,f(x,\mu)$$
for all measurable and bounded $f: E\times M(E)\to R$.
Remark. Assume that the law of $X$ is a translation invariant and shift ergodic atomic random measure. Let each atom represent a particle in a random configuration. The Palm distribution then has the following interpretation: take a very large block in $E$ and sample one particle from the block. Record the site, say $x\in E$, it is taken from, and shift $X$ by $-x$. Then $(P)_x$ is the weak limit of the distribution of the resulting measures, as the blocks approximate $E$.
The advantage of this concept is that branching systems are infinitely divisible, and the Palm measure of their canonical measures possesses a nice representation as a genealogical tree (see Kallenberg (1977) [25], Chauvin, Rouault and Wakolbinger (1991) [2], and Gorostiza, Roelly and Wakolbinger (1992) [21]), i.e. (formulated here for the superprocess and only heuristically), for each $x\in Z^d$ the representation (2.3) holds, where the processes involved form a family of independent SRW starting with $\mu$, and $\xi^x := (\xi^x_t)_{t\geq0}$ is a RW with kernel $\bar a_t(x,0) := a(0,x)$. We would like to point out that we integrate the right hand side of (2.3) with respect to the increasing process which is given due to the monotonicity in the initial measure.
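The sampling interpretation in the Remark can be tried out on a finite torus, which here replaces the 'very large block' (an assumption of the sketch). For an i.i.d. Poisson($\theta$) particle configuration, sampling a particle uniformly and shifting it to the origin should give, at the origin, the size-biased count $1 + \text{Poisson}(\theta)$, with mean $1 + \theta$. All names are hypothetical, and the Poisson sampler is Knuth's elementary multiplication method.

```python
import math
import random


def poisson_sample(rng, lam):
    """Knuth's multiplication method for a Poisson(lam) variate (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def shift(counts, x, L):
    """Configuration seen from site x on the L x L torus: new site y holds counts[x + y]."""
    return {((s[0] - x[0]) % L, (s[1] - x[1]) % L): c for s, c in counts.items()}


def palm_count_at_origin(theta, L, n_samples, seed=0):
    """Monte Carlo mean of the count at the origin after sampling a particle and shifting."""
    rng = random.Random(seed)
    total, used = 0, 0
    for _ in range(n_samples):
        counts = {(i, j): poisson_sample(rng, theta) for i in range(L) for j in range(L)}
        if sum(counts.values()) == 0:
            continue                                  # no particle to sample
        sites = list(counts)
        # choosing a site with weight = number of particles there picks a uniform particle
        x = rng.choices(sites, weights=[counts[s] for s in sites], k=1)[0]
        total += shift(counts, x, L)[(0, 0)]
        used += 1
    return total / used
```

On a torus with $L^2$ sites the estimate carries a small finite-size bias of order $1/L^2$.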
Example. Let us illustrate the above definition by the examples which are relevant for our results. Fix a non-empty finite ordered set $T$ and a time vector $\mathbf t := (t_e;\ e\in T)\in[0,\infty)^T$. We choose $E := T\times Z^d$ and consider the random variable $X := (X_{t_e};\ e\in T)$.
(2.4) as a shorthand notation for the law of $X$, size-biased with $X_{t_e}(\{x\})$.
s., for a given $e_0\in T$, we use $(P)_{e_0}$ (2.6) as a shorthand notation for the law of $(X_{t_e}(Z^d);\ e\in T)$, size-biased with $X_{t_{e_0}}(Z^d)$.
In this paper we specifically investigate the questions stated above for the critical dimension $d = 2$. We are going to rescale the process under the Palm distribution, which, as it turns out, means that we rescale the genealogy of the relatives of a sampled particle. As we shall see, the specific feature of the critical dimension is that, due to diffusive clustering, all populations grow on the same scale. This relies mainly on the fact that the sampled particle's relatives are the superposition of family clans of different ages.
From now on let d = 2.

Spatial scaling (Theorem 1)
The main aim of this subsection is to answer the questions about the cluster's height and its spatial shape. The analysis of the space-time picture is dealt with in the next subsection.
We now start to renormalize. The precise statement for BRW, $\eta$, goes back to Durrett (1979) [14]; it is formulated in (2.7) in terms of $\Psi(\cdot)$ from (1.24) and the covariance matrix $Q$ from (1.6). Roughly speaking, (2.7) states that with probability of the order $1/\log t$ we see $Z\log t$ surviving particles at a given site, where $Z$ is exponentially distributed with mean 1.
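The non-spatial analogue of (2.7) is the classical Kolmogorov-Yaglom limit theorem. For critical binary branching at rate $V$ without migration, started from one particle, the population $N_t$ satisfies $P[N_t > 0] = 1/(1 + Vt/2)$ and, given survival, $N_t$ is geometric with mean $1 + Vt/2$, so $N_t/(Vt/2)$ is asymptotically exponential with mean 1; in (2.7) the role of $t$ is played by $\log t$. The Monte Carlo sketch below (hypothetical helpers, seeded for reproducibility) checks the exact formulas at one time point.

```python
import random


def total_population(t_end, V, rng):
    """Population at time t_end of critical binary branching at rate V (no space), one ancestor."""
    n, t = 1, 0.0
    while n > 0:
        t += rng.expovariate(n * V)           # next branching event among n particles
        if t > t_end:
            break
        n += 1 if rng.random() < 0.5 else -1  # split into two or die
    return n


def yaglom_check(t_end, V, runs, seed=0):
    """Monte Carlo survival probability and conditional mean population."""
    rng = random.Random(seed)
    sizes = [total_population(t_end, V, rng) for _ in range(runs)]
    survivors = [s for s in sizes if s > 0]
    return len(survivors) / runs, sum(survivors) / len(survivors)


p, m = yaglom_check(4.0, 1.0, 20000)
# compare with the exact values 1 / (1 + 2) and 1 + 2
```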
In this subsection we state a similar result for SRW, which simultaneously gives more insight into the origin of the exponential law on the right hand side of (2.7), and hence, a better image of the spatial structure of the growth. To do this, we start by introducing the first two concepts to describe phenomena concerning clustering.
Renormalizing. The first step is to renormalize our processes by the growth rate of a surviving mass at a particular site, i.e., at time $t > 1$ we consider the renormalized processes (2.8).
Block averaging. Secondly, in order to get the spatial structure into view, we analyze the clustering from the point of view of averaging over large blocks: for $\alpha\in[0,1]$, $x\in R^2$ and $t\geq1$, we define the α-block mean as the average over a space block around $x$ whose size depends on $t$ and $\alpha$. For statements common to both models we use script letters, in line with the convention agreed above, and analogously for the scaled versions. Notice that this includes the case without spatial averaging ($\alpha = 0$).
In order to analyze the objects just introduced we need two new ingredients, which we now present. These are the diffusions related to non-spatial branching.
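The block averaging step can be sketched as follows. The sketch makes a hypothetical choice of block: a square of side $s = \max(1,\lfloor t^{\alpha/2}\rfloor)$, matching the spatial extent $t^{\alpha/2}$ of clusters mentioned in the introduction, with lower-left corner $\lfloor xs\rfloor$. For $\alpha = 0$ the block is a single site, so there is no spatial averaging. The function `block_mean` is illustrative only.

```python
import math


def block_mean(config, x, t, alpha):
    """Hypothetical alpha-block mean of a lattice configuration at time t.

    config: dict site -> mass (missing sites carry mass 0).
    The block is assumed to be the square of side s = max(1, floor(t**(alpha/2)))
    with lower-left corner floor(x * s); alpha = 0 gives a single site.
    """
    s = max(1, math.floor(t ** (alpha / 2)))
    corner = (math.floor(x[0] * s), math.floor(x[1] * s))
    total = sum(config.get((corner[0] + i, corner[1] + j), 0.0)
                for i in range(s) for j in range(s))
    return total / (s * s)
```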
Feller diffusion $Z$. Let $Z := (Z_t)_{t\geq0}$ denote the Feller diffusion (FD), i.e., $Z$ is Markovian and has 'generator': Notice that the law of FD is infinitely divisible, and for each $\lambda\in R_+$, and hence, is the uniquely determined canonical measure.
Size-biased Feller diffusion $Y$. In order to introduce a size-biased version of $Z$, recall from Theorem 1 in Roelly-Coppoletta and Rouault (1989) [35] that suitably conditioned versions of $Z$ converge in distribution, as $T\to\infty$, to a process $Y := (Y_t)_{t\geq0}$ whose distribution is characterized by size-biasing the law of $Z$. $Y$ is therefore referred to as the size-biased FD, and we let $(P_\theta)$ denote its law when started in $\theta$. Furthermore, by the $(P_\theta)$-martingale problem formulated in Theorem 2 in [35], it turns out that $Y$ is an FD with immigration, i.e., a diffusion whose 'generator' is that of FD plus an immigration term, and one obtains a 'cluster decomposition' of $Y$. Thus, the law $(P_0)$ of $Y$ started in 0 is non-trivial and coincides with the Palm canonical measure of FD. Before we state a limit law of the type (2.7), notice that, by Lemma 10.6 in [26], the Palm distribution of the infinitely divisible law of $X_t$ can be expressed through $Q^{\cdot RW}_t$, the canonical measure of $X$ started in $\Psi(1)$. Since by Theorem 0, under $P^\theta_t$, $X_t$ tends to local extinction, the probability to observe more than one family goes to zero as $t\to\infty$, and the following theorem states that the suitably rescaled genealogy of the observed family, described in terms of Kallenberg's backward tree, converges in distribution to a limiting genealogy (recall (2.3) and (2.21)).
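The relation between $Z$ and $Y$ can be illustrated by simulation. The sketch assumes the normalization $\frac c2x\frac{d^2}{dx^2}$ for the FD generator (the constant used in the paper may differ); the size-biased process then has generator $\frac c2x\frac{d^2}{dx^2} + c\frac{d}{dx}$, i.e., it is an FD with immigration at rate $c$. Size-biasing predicts $E[Y_t] = E[Z_t^2]/E[Z_t] = z + ct$ for $Y_0 = Z_0 = z$. The Euler-Maruyama helper with clipping at 0 is a rough, hypothetical sketch.

```python
import math
import random


def euler_paths(x0, c, immigration, t_end, n_steps, n_paths, seed=0):
    """Euler-Maruyama sketch for dX = immigration * dt + sqrt(c X) dW, clipped at 0.

    immigration = 0: Feller diffusion with (assumed) generator (c/2) x f''.
    immigration = c: its size-biased version, generator (c/2) x f'' + c f'.
    Returns the terminal values of n_paths independent paths.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    out = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x = max(x + immigration * dt + math.sqrt(c * x * dt) * rng.gauss(0.0, 1.0), 0.0)
        out.append(x)
    return out


z = euler_paths(1.0, 1.0, 0.0, 1.0, 100, 5000, seed=0)
y = euler_paths(1.0, 1.0, 1.0, 1.0, 100, 5000, seed=1)
# sample means should be close to E[Z_1] = 1 and E[Y_1] = 1 + 1 = 2
```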
Recall from (1.24) the initial law $\Psi(\theta)$, from (2.5) the notation of the Palm distribution $(P)_y$, and from (2.20) the definition of the size-biased FD $Y$.

Theorem 1 (One space scale)
(a) For every $\varepsilon > 0$,
Remark. First notice that for $\alpha = 0$ the limit law in (2.27) describes the growth of one component; it is exactly the statement (2.7) and corresponds to Theorem 4.1 in Fleischman (1978) [19], stated for BBM.
(2.26) is the starting point for different approaches to describing the clusters, which yield different branching diffusions, $Z$ and $Y$, in the limit that are related via size-biasing. On the one hand, Theorem 2 in Klenke (1997) [28] yields a limit in terms of $Z$; on the other hand, we obtain (2.27) in terms of $Y$. In particular, part (b) tells us that size-biasing and rescaling 'commute'.
Notice that, since under $(P^\theta_t)_0$ we observe one typical family only, the right hand side of (2.27) does not depend on $\theta$. Hence, summarizing both approaches, together with (2.23), one expects the limits obtained in the two approaches to be related by size-biasing. Moreover, (2.26) indeed suggests a further type of result, which is stated in Theorems 5(a) and 5(b) in Dawson and Greven (1996) [11], where SRW on the hierarchical group is conditioned to have components with values at least $\varepsilon > 0$. In this context, a third branching diffusion appears in the limit, which is now time-inhomogeneous; its distribution is that of a Feller diffusion $Z$ conditioned on survival until the unit time.
We have learned from the genealogical tree that the mass observed from a randomly sampled particle, called ego, is the superposition of the different family clans which have branched off ego's ancestral line. Let now $\alpha\in(0,1)$. Since the total mass of a family which branched off at time $t - t^{\alpha'}$, $\alpha'\in[0,\alpha)$, is of smaller order than $t^\alpha$, the family clans younger than $t^\alpha$ do not have any effect on the density of ego's relatives in an α-block. The fact that this phenomenon translates in the limit into a cut of the domain of integration of the limiting genealogical tree indicates that the offspring of each family clan which branched off at time $t - t^\gamma$, $\gamma\in(\alpha,1)$, and lives at time $t$ in an α-block, is at time $t$ uniformly distributed on that block. In a suitable block, ego therefore sees all sites growing on the same scale.

Multiple space-time scaling (Proposition 1 and Theorem 2)
In order to describe the structure of the complete space-time process, in particular the parts where survival is observed, there is a third concept for discussing clustering phenomena in addition to renormalizing and block-averaging (compare (2.8) and (2.10)). It is called the multiple space-time scale analysis.
Our next goal is to give a more detailed description of the space-time picture of cluster formation via multiple space-time scales. That is, we describe the joint law of surviving masses/particles which are simultaneously located at spatially rescaled sites and observed at rescaled times. In order to explain the suitable scales, we need to introduce an object which contains all information about a cluster's genealogy. With the particle picture in mind, a positive correlation between two components observed at different times is due to a common ancestor. Hence the historical paths of two particles coincide up to the death of their most recent common ancestor, and their increments are independent of each other afterwards. We therefore expect the rescaling analysis to be described in terms of genealogical trees.
To define these trees we proceed in several steps. (i) We start by defining a binary tree as a special graph without loops. (ii) We label the tree's vertices by a function which reflects the exponents of the rescaled degrees of kinship of each pair of individuals. (iii) We choose space-time scales associated with this labeled tree. (iv) Based on the genealogy so described, in a final step we introduce the limiting objects.
(i) Binary tree T. A binary tree T is a finite set of words consisting of finitely many letters of the alphabet {1, 2}, subject to the following compatibility conditions (see Figure 1(a)):
With each space of scaling exponents (T, A) we can associate a corresponding multiple space-time scale. Namely, given (T, A), a family of sequences of space-time points is said to be on a (T, A)-scale iff for all $e^+, f^+ \in T^+$ the following two conditions hold:
Recall the definition of the α-block means $B_{(\cdot,\cdot),\alpha}$ from (2.13). We are now interested in the joint distribution of several of these objects, i.e., we need to investigate the asymptotics of
We expect the genealogies of the limiting objects to be the limits of the genealogies corresponding to $(B_{r^{e^+}_t,\alpha};\ e^+ \in T^+)$, and hence to be associated with the same space of scaling exponents. The candidates for these objects are the so-called tree indexed diffusions, which we introduce next. They describe subfamilies of different ages and play a role similar to that of Kallenberg's backward tree.
is a diffusion on $\mathbb{R}^T$ with the following dynamics: two branches $Z_{e_i}$, $i = 1, 2$,
- are FD starting in the same value at time t = 0,
- run together until $A(e_1 \wedge e_2)$,
- and after that time their increments run independently of each other.
We abbreviate
Analogously, we define for a given leaf $e^+_0 \in T^+$ the (T, A)-indexed Feller diffusion with size-biased trunk $e^+_0$.
is a diffusion on $\mathbb{R}^T$ with the following dynamics: the trunk $\{Y_{(e^+_0),e};\ e \le e^+_0\}$ is a size-biased FD, but for $e_1, e_2 \le e^+_0$, two branches $Y_{(e^+_0),e_i}$, $i = 1, 2$, evolve as follows:
- they run together until $A(e_1 \wedge e_2)$,
- and after that time their increments run independently of each other and of the trunk.
We again abbreviate
The name of Y is justified by the following property. Recall from (2.6) the notation for the Palm distributions, $(P)_e$.

Proposition 1 (Family of Palm distributions of tree indexed FD)
Fix θ > 0, and let (2.41)
(b) $P^{\theta}$ is infinitely divisible, and the family of Palm distributions of its canonical measure,
Before stating the next theorem we give some motivation. Assume for the moment that α = 0, and recall from Theorem 1 that the $e^+_0$-th marginal in (2.35) equals $P_0[Y_1 \in \cdot\,]$ in law. Each other component, say $B_{(y t^{\gamma/2},\, t \pm t^{\beta}),0}$ with $e^+ \neq e^+_0$, measures ego's relatives which live at time $t^{\beta}$ before or after ego at a spatial distance of order $t^{\gamma/2}$ from ego. By the central limit theorem for the underlying motion, the contributions to $B_{(y t^{\gamma/2},\, t \pm t^{\beta}),0}$ come only from those family clans which branched off at least time $t^{\gamma \vee \beta}$ before. Moreover, after the splitting time both components evolve independently of each other; but, in contrast to ego's ancestral line, the other component receives no further mass immigrating from the family clans.
If we look at block means rather than single sites, we once more observe that no family clan younger than $t^{\alpha}$ contributes to the densities. In fact, we expect the limiting genealogical tree to be cut at time 1 − α.
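The two-leaf case of the tree indexed FD defined above, with or without size-biased trunk, can be illustrated by a crude simulation. The Python sketch below is only an illustration under assumptions of our own. FD is normalized as $dZ = \sqrt{bZ}\,dW$ with a hypothetical branching rate b. The size-biased FD is taken to be Doob's h-transform with h(z) = z, which adds the drift b. A branch leaving the trunk is assumed to continue as an ordinary FD. The Euler-Maruyama scheme with truncation at 0 is chosen for simplicity rather than accuracy. The parameter `split` plays the role of $A(e_1 \wedge e_2)$, and all names are hypothetical.

```python
import numpy as np

def feller_step(z, drift, b, dt, rng):
    """One Euler-Maruyama step of dZ = drift dt + sqrt(b Z) dW, truncated at 0."""
    z_new = z + drift * dt + np.sqrt(b * max(z, 0.0) * dt) * rng.standard_normal()
    return max(z_new, 0.0)

def two_leaf_fd(theta, split, b=1.0, n_steps=1000, size_biased_trunk=False, seed=0):
    """Crude simulation of a two-leaf tree indexed FD on the time interval [0, 1].

    Row 0 is the trunk (leaf e1), row 1 the other leaf (e2). Both rows coincide
    up to time `split`, which plays the role of A(e1 ^ e2); afterwards row 1
    evolves with independent increments as an ordinary FD (an assumption).
    With size_biased_trunk=True the trunk gets the extra drift b of the
    h-transform with h(z) = z (assumed normalization dZ = sqrt(b Z) dW).
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    k_split = int(round(split * n_steps))
    trunk_drift = b if size_biased_trunk else 0.0
    path = np.empty((2, n_steps + 1))
    path[:, 0] = theta
    for k in range(n_steps):
        path[0, k + 1] = feller_step(path[0, k], trunk_drift, b, dt, rng)
        if k < k_split:
            path[1, k + 1] = path[0, k + 1]      # still on the common ancestral line
        else:
            path[1, k + 1] = feller_step(path[1, k], 0.0, b, dt, rng)
    return path

# Size-biased trunk started in 0; the second leaf splits off at time 0.4.
paths = two_leaf_fd(0.0, 0.4, size_biased_trunk=True, seed=1)
print(paths[:, -1])
```

Starting the size-biased trunk in 0, as in the example call, mirrors the law $P_0[Y_1 \in \cdot\,]$ that appears in the motivation above.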
We are now able to state the result on a cluster's space-time structure as t → ∞. Recall from (1.24) the initial law Ψ(θ), and from (2.4) the notation for the Palm distribution, $(P)_{(e,y)}$.

Theorem 2 (Multiple scaling)
then the following holds as t → ∞:
Remark. Let us discuss the clustering phenomena described by Theorem 2.
Consider first the spatial aspect of our theorem. For fixed $t \equiv s_t$, (2.43) corresponds to the statement of Theorem 2 in Klenke (1997) [28] for SBM/BBM on $\mathbb{R}^2$. Since Klenke blows up the initial state instead of passing to the Palm measure, he obtains a (T, A)-indexed system of FD, $Z^{(T,A)}$, in the limit.
By Corollary 1, stated at the very end of section 4, (2.43) remains true if we replace the sequence of centers (y
As a consequence, since X is translation invariant, size-biasing with the mass at a single site (the center of the block) has asymptotically the same effect as size-biasing with a whole block mean. This confirms that, from ego's perspective, all sites in a suitable block grow on the same scale.
Relation (2.43) merely asserts convergence for a given α ∈ [0, 1). Whether (2.43) actually holds in the sense of weak convergence on path space remains an open problem. Given the genealogical representation, however, there is little doubt that weak convergence indeed takes place. We defer this question to further study. □
Theorems 1 and 2 make essential use of the fact that rescaling the branching model under the Palm distribution is the same as rescaling the relatives of a 'typically sampled' particle. Hence whatever contributes to the densities of two different components has a common history up to some time and independently evolving increments afterwards. While the common ancestral line is of course responsible for the common history, the independent increments admit more than one interpretation. So far we have had in mind the one due to the branching mechanism of the model.
However, the independent increments may also come not from a single branching model but from a collection of branching models with the property just described, i.e., any two of them run together for a certain time and have independent increments afterwards. This occurs, for instance, in two-level branching models, in which branching acts not only on the level of individuals but also allows whole groups of individuals to be reproduced or to disappear. Since the proofs of the theorems do not depend on the origin of the independent parts, we state a generalization of Theorem 2, Theorem 2('), in subsection 3.1.

Outline of the strategy for the proof
We close this section by outlining the strategy for the proofs of our results, which determines the structure of the rest of this paper.
First, section 3 is devoted to a systematic representation of the moment formulae for SRW in terms of the underlying genealogical structure. It will turn out that these formulae are based on systems of random walks indexed by the possible genealogical groves. This suggests that, instead of rescaling functionals of a single branching process, we have to deal with systems of branching models which are themselves indexed by a grove. This requires an extension of Theorem 2, which is stated as Theorem 2(') in subsection 3.1. From there we derive the corresponding moment formulae in subsection 3.2. Finally, in subsection 3.3 we prove the representation of the size-biased grove indexed FD given in Proposition 1.
Section 4 is devoted to the asymptotic behavior of the moments of tree indexed systems of random walks, for which, according to the techniques of section 3, the moment formula for tree indexed SRW is the key. The calculations in section 4 are therefore crucial for the proof of our results for SRW.
Finally, in section 5 we combine all the tools mentioned above to prove Theorems 1 and 2(').

Genealogical representation of SRW-functionals
The main objective of this section is to give explicit moment formulae for the processes appearing in the theorems, observed simultaneously at different scales in space and time. That is, for $t_1, \dots, t_n \in \mathbb{R}_+$ and $x_1, \dots, x_n \in \mathbb{Z}^d$, we are interested in expressions of the form:
To be able to handle the rather complicated formula for SRW, we develop a machinery which describes collections of such moments graphically. This ansatz was first used by Dynkin (1988) [16], where the space-time mixed moments are listed systematically in terms of special binary graphs.
Obviously, the family structure of a branching population is represented by a genealogical tree, or more generally, in the case of collections of independent families, by what we shall call a genealogical grove, i.e. a collection of trees.
• For this reason, in subsection 3.1 we start with the construction of groves. Due to migration, these groves are naturally connected with a grove indexed system of RW, which contains the information about how long two particles/masses have followed the same path due to a common ancestor. This concept of grove indexed RW extends to more general grove indexed systems incorporating the branching property, namely grove indexed systems of the branching models under consideration and grove indexed FD. Finally, we use the latter to extend the multiple scale analysis to systems of grove indexed SRW/BRW (Theorem 2(')).
• In subsection 3.2 (Proposition 2) we derive the moment formulae for grove indexed SRW.
• Finally, in subsection 3.3 we give the probabilistic representation of the random object which is the size-biased grove indexed FD. In particular, we prove Proposition 1.

χ-grove; grove indexed systems (Theorem 2('))
The representation of the moments of SRW given in subsection 3.2 relies on the underlying genealogy, which is due to both the branching and the migration. It may therefore be described by systems of RW indexed by binary groves. In order to introduce these and other grove indexed systems, we proceed in this subsection as follows. (i) Given a sample from the population, we define deterministic binary graphs which contain the information about whether or not two components of the population are positively correlated due to common ancestor mass. (ii) We label these genealogical 'groves' by the splitting times. (iii) We introduce grove indexed systems, which in particular take into account that during its lifetime a particle/mass migrates according to a random walk.
(i) χ-grove G. Recall from Figure 1 that rooted binary trees may be represented as sets of words consisting of finitely many letters from the alphabet {1, 2}. We then make the following agreements:
- A grove is a finite collection of rooted binary trees.
- Let χ be a non-empty ordered set containing the names of a sample of individuals, which in the following we call leaves. Given χ, a χ-grove G is a grove whose leaves are marked by the elements of χ.
- We define the following equivalence relation: two marked trees are the same if they consist of the same words and if each leaf of the one tree has the same mark as the corresponding leaf of the other. Two marked groves are then equivalent if they are collections of the same trees. That is, different families are exchangeable, but the individuals within a family are fixed.
- For a grove G, let $G^+$ ($G^-$, $G^0$) denote the set of its leaves (roots, internal vertices).
- Notice that each root $e \in G^-$ corresponds to a binary tree rooted at e, i.e.
(compare also the basic notation concerning trees given in section 2.2).
- Recall from (2.31) that on each (family) tree of G a partial order relation is defined. It extends to a partial order relation $\le_G$ on G by identifying the roots, i.e., for $e, f \in G$, $e \le_G f$ iff either $e \in G^-$, or e and f belong to the same family tree $G|_{e'}$, $e' \in G^-$, and $e \le_{G|_{e'}} f$.
) be the set of distinguishable χ-groves (which consist of exactly k trees, resp. of the k trees $\chi_1, \dots, \chi_k$). In particular, for χ = (1, ..., n) we abbreviate $G_n := G_{(1,\dots,n)}$ and call $G_n$ ($G^{(1)}_n$) the set of n-groves (n-trees). See Figure 2 for an example. We need to control the number of binary n-groves, in particular from above. For this reason we state an explicit formula for the cardinality of the set of n-groves:

Lemma 1 (Cardinality of binary n-groves)
In particular, for n = 1, 2, 3, 4, as mentioned in [18],
Proof. We follow a standard counting argument based on generating functions. It is not difficult to find an explicit formula for the cardinality of $G^{(1)}_n$. Namely, we obtain each element of $G^{(1)}_{n+1}$ by adding a new splitting point to one of the $2n - 1$ edges of an n-tree and letting the new edge turn either to the right or to the left (see Figure 3). Then, since G
Now recall that permuting the labels of the exit vertices within a tree yields a different n-grove, while the trees themselves are unordered. Hence, by (3.6),
The proof of (3.4) now relies on a straightforward calculation via exponential generating functions. Fix $z \in \mathbb{C}$ with $|z| < 1/4$; then
Comparison of the coefficients via Taylor's formula yields
So far, for a given sample of individuals from the space-time population, we can describe the possible genealogical records, in the sense of specifying the exact kinship between the sample's individuals. Now suppose that the corresponding genealogical grove is known. Since branching occurs at random times, different generations may overlap. Hence we next look at the splitting times of the ancestors, i.e., we label the grove by the monotone increasing time points at which the branchings occurred.
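The counting argument in the proof of Lemma 1 can be checked by brute force. The Python sketch below rests on one reading of the definitions, which is an assumption. An n-tree is taken to be an ordered binary tree (the two children of a splitting point are distinguished, as the letters 1 and 2 suggest) with leaves labeled 1, ..., n. An n-grove is taken to be an unordered collection of such trees whose leaf sets partition {1, ..., n}. Inserting a new splitting point on one of the 2n − 1 edges, with the new leaf hanging to the left or to the right, gives $\#G^{(1)}_{n+1} = 2(2n-1)\,\#G^{(1)}_n$, hence $\#G^{(1)}_n = (2n-2)!/(n-1)!$. Groves are then counted by choosing the tree that contains leaf 1 (the exponential formula). All function names are hypothetical.

```python
from functools import lru_cache
from math import comb, e, factorial, sqrt

def insertions(tree, leaf):
    """Trees obtained from `tree` by a new splitting point on one of its edges.

    A tree is a leaf label (int) or an ordered pair (left, right). The new
    leaf may hang to the left or to the right of the split edge.
    """
    yield (leaf, tree)            # split the edge above the current (sub)tree
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)

def enumerate_trees(n):
    """All ordered binary trees with leaves 1, ..., n, built leaf by leaf."""
    trees = [1]
    for new_leaf in range(2, n + 1):
        trees = [t for tree in trees for t in insertions(tree, new_leaf)]
    return trees

def n_trees(n):
    """Closed form of the recursion #G(1)_{n+1} = 2(2n - 1) #G(1)_n, #G(1)_1 = 1."""
    return factorial(2 * n - 2) // factorial(n - 1)

@lru_cache(maxsize=None)
def n_groves(n):
    """Number of n-groves: choose the tree containing leaf 1, then recurse."""
    if n == 0:
        return 1
    return sum(comb(n - 1, k - 1) * n_trees(k) * n_groves(n - k)
               for k in range(1, n + 1))

# The brute-force enumeration agrees with the closed form (all trees distinct).
for n in range(1, 7):
    trees = enumerate_trees(n)
    assert len(trees) == len(set(trees)) == n_trees(n)

print([n_groves(n) for n in range(1, 5)])   # [1, 3, 19, 193]
# Upper bound #G_n <= (sqrt(e) - 1) * 4**n * n!, checked for n < 40.
assert all(n_groves(n) <= (sqrt(e) - 1) * 4 ** n * factorial(n) for n in range(1, 40))
```

Under this reading the exponential generating function of the groves is $\exp\bigl((1-\sqrt{1-4z})/2\bigr) - 1$, which is consistent with the radius of convergence 1/4 used in the proof. Its coefficients are non-negative and the series is finite at z = 1/4, so $\#G_n \le (\sqrt{e} - 1)\,4^n\, n!$. Bounds of this kind are what is needed to control the number of groves from above.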
(ii) Labeled grove (G, S). Fix a grove G and a starting time $s_\emptyset \ge 0$.
- We call S the labeling function if S is monotone increasing on trees and fulfills the initial condition on the set of roots, $G^-$,
- The tuple (G, S) is called a labeled grove.
Up to now, for a given sample of individuals from the space-time population, we can retrieve all information which is due to branching. Finally, we are interested in the different spatial paths which the sampled individuals and all their ancestors have followed due to migration on $\mathbb{Z}^d$. This leads to a system of RW which is indexed by the nodes, i.e., the ancestors, of the underlying genealogical grove, and whose dynamics rely on the information about all branching times. This is a special example of a grove indexed system of Markov processes, which we introduce now.
(iii) Grove indexed system of Markov processes W. Fix a labeled grove (G, S), and let W be a Markov process on E. Consider the following collection of such processes, in which the branches $W_e$, $e \in G$, are versions of W, and the joint distribution of two branches $W_{e_1}$ and $W_{e_2}$ depends on whether or not $e_1$ and $e_2$ are related:
• if $e_1, e_2 \in G$ belong to the same tree, the branches $W_{e_i}$, $i = 1, 2$, coincide up to time $S(e_1 \wedge e_2)$, and their increments are independent of each other afterwards;
• if $e_1, e_2 \in G$ belong to different trees, the branches $W_{e_i}$, $i = 1, 2$, start at time $s_\emptyset$ in the same point of E, but their increments are independent of each other.

Definition 2 (Grove indexed system of Markov processes (GI-W))
This defines a Markov process on $E^G$, which is called the grove indexed system of W (GI-W).
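To make Definition 2 concrete, the Python sketch below samples a grove indexed system of random walks, the GI-RW of Example (i). It relies on simplifying assumptions of our own. The migration is the rate-1 nearest-neighbour walk on $\mathbb{Z}^2$, whereas the text allows a general symmetric kernel. A tree is stored as a dictionary mapping its words over {1, 2} (the root being the empty word) to the labels S(e). The branch of each vertex e is recorded at time S(e). Each tree starts at time $s_\emptyset$ in $x_\emptyset$, and along every edge a child continues its parent's path with a fresh independent increment. Two branches of the same tree therefore coincide up to their most recent common ancestor, while branches in different trees share only the starting point. All names are hypothetical.

```python
import numpy as np

# Jumps of a nearest-neighbour walk on Z^2 (hypothetical choice of kernel).
MOVES = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])

def rw_increment(tau, rng):
    """Displacement of a rate-1 nearest-neighbour walk on Z^2 over a time span tau."""
    n_jumps = rng.poisson(tau)
    if n_jumps == 0:
        return np.zeros(2, dtype=int)
    return MOVES[rng.integers(0, 4, size=n_jumps)].sum(axis=0)

def grove_indexed_rw(grove, s0, x0, seed=0):
    """Sample W_e(S(e)) for every vertex e of a labeled grove.

    `grove` is a list of trees; each tree maps words over {"1", "2"} (root "")
    to labels S(e). All trees start at time s0 in x0 but use independent
    increments; inside a tree a child continues its parent's path.
    """
    rng = np.random.default_rng(seed)
    positions = []
    for tree in grove:
        pos = {}
        for word in sorted(tree, key=len):          # parents before children
            if word == "":
                start_pos, start_time = np.asarray(x0), s0
            else:
                parent = word[:-1]
                start_pos, start_time = pos[parent], tree[parent]
            tau = tree[word] - start_time
            if tau < 0:
                raise ValueError("labels must increase along each tree")
            pos[word] = start_pos + rw_increment(tau, rng)
        positions.append(pos)
    return positions

# Two families: a tree with leaves "1", "21", "22" and a single-vertex tree.
grove = [{"": 1.0, "1": 3.0, "2": 2.5, "21": 4.0, "22": 5.0}, {"": 0.5}]
for pos in grove_indexed_rw(grove, s0=0.0, x0=(0, 0), seed=3):
    print({word: tuple(int(c) for c in xy) for word, xy in pos.items()})
```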
Examples. We illustrate the special cases of GI-W which play a role in the following sections.
(i) Let $E := \mathbb{Z}^d$, and choose for W our basic process ξ. Then we obtain the grove indexed systems of RW (GI-RW) motivated above, which are illustrated in Figure 4. GI-RW is the object on which the moment formula in Proposition 2 is based. We carry out the rescaling analysis for ξ in Proposition 4 in section 4.
(ii) Let $E := \ell^1_\gamma \subset \mathcal{M}(\mathbb{Z}^d)$, and let W stand for our branching models, SRW or BRW, respectively. We then obtain a system of grove indexed SRW/BRW (GI-SRW, GI-BRW), which is the main object considered in subsection 3.2.
(iii) Let $E := \mathbb{R}_+$, and let W stand for FD, Z. Then the system of grove indexed FD (GI-FD) is the generalization of the tree indexed systems of FD introduced in subsection 2.2. Notice that its size-biased version,
Next we consider sequences of labeled groves, $(G, S_t)$, which can be rescaled in a suitable way.

For a given space of scaling exponents, (T, A), recall from (2.33) the definition for a sequence of space-time points being on a (T, A)-scale.
(iv) Extended multiple scale analysis. Fix a labeled grove (G, A) such that $A|_{G^-} \equiv 0$. Then a family of sequences of space-time points, $R^{\mathrm{ext}}$, is said to be on an extended (G, A)-scale iff the following three conditions hold: for all $e^+, f^+ \in$ (3.19) and, in addition to (3.19), for $e \in G \setminus G^-$, and for $e, f \in G$,
Recall the definition of the α-block means $B_{(\cdot,\cdot),\alpha}$ from (2.13). We are now interested in the joint distribution of several of the objects indexed by a grove, i.e., we need to investigate the asymptotics of
Based on this, we state the following theorem. Below we define the property of labeled trees needed to obtain Theorem 2 as a special case of Theorem 2(').

then the following holds as t → ∞:
Notice that the concept of grove indexed branching processes includes the description of the finite-dimensional marginals of a single branching process. To see this, we define labeled trees with a special structure:

Definition 3 (Linearly ordered tree)
A labeled tree (T, S) is referred to as linearly ordered iff for each subtree $T' \subseteq T$,

$S\bigl(\bigwedge_{e \in T'} e\bigr) = \bigwedge_{e \in T'} S(e)$. (3.26)
Remark. Consider a sequence of linearly ordered trees, $(T, S_t)$. Since its labels are such that
Furthermore, if a family of space-time points $R_t$ is on a (T, A)-scale, it is always possible to extend it to $R^{\mathrm{ext}}_t$ by adding a family of time points $\{s^e_t;\ e \in T^0\}$ such that T, equipped with $S_t$ given by (3.22), is linearly ordered. Thus Theorem 2 follows immediately from Theorem 2(').

Moment formula for the tree indexed SRW
Fix a labeled binary tree (T, S) and a starting time $s_\emptyset$, and let X be a version of a (T, S)-indexed SRW. Our main aim is a graphical representation of an explicit formula for the space-time mixed moments of $X^{(T,S)}$ (recall (3.17)).
Due to the underlying genealogical structure, the moment formula for (T, S)-indexed SRW relies on the corresponding moments for (G, U)-indexed RW. The labeled groves (G, U) run through all possible genealogical records, i.e., in particular $G \in G_{T^+}$ (recall (3.3)). Each such G is equipped with monotone labels $\{U(e);\ e \in G\}$ such that the splitting time $U(e)$ of a possible common ancestor $e = e^+ \wedge_T f^+$ of two individuals $e^+, f^+ \in T^+$ occurred no later than the time at which the branches $X_{e^+}$ and $X_{f^+}$ started to follow independent dynamics.
To describe this formally, we need more notation.
- For any grove with the same leaves as T, i.e., $G \in G_{T^+}$, we introduce the labeling function on G prescribed by (T, S) as follows:
We then define the domain prescribed by (T, S).
, for all $e \in G^0$. (3.31)
With the notation (3.29) to (3.31) we are now in a position to state the moment formula for $X^{(T,S)}$. It will be illustrated below once more in terms of the finite-dimensional marginals of a single SRW:

(b) Specifically, the first and second moments are
Hence from (3.32) we obtain precisely the moment formula for the finite-dimensional marginals of a single SRW, as can be found, e.g., in Dynkin (1988) [16]. Here the $dU(\cdot)$-integrals are restricted only by monotonicity, i.e.,
Proof of Proposition 2. In the special case where the labeled tree (T, S) is linearly ordered, the proof goes back to a finite-dimensional marginal representation of the log-Laplace equation (1.18) and can be found, e.g., in [16]. For arbitrary binary trees we refer to Winter (1999) [39].

Fix a labeled grove (G, A).
In this subsection we prove (a generalization of) Proposition 1, i.e., the analogous statement for labeled groves rather than trees. Consider a random vector $(Z_1, \dots, Z_n)$ as well as the vector $(\dots, Z_i, \dots, Z_n)$.
Coming back to a (G, A)-indexed system of FD, we now have the situation that the increments of two branches are independent after the time of their most recent common ancestor. It turns out that the cases for grove indexed systems discussed above translate as follows: for each $e^+_0 \in G^+$,
To see the latter, by Definition 1 we need to verify that for each $\lambda := (\lambda_{e^+};\ e^+ \in G^+) \in (\mathbb{R}_+)^{G^+}$ and for all $e^+_0 \in G^+$,
where we have used the induction hypothesis. Since the trunk $Y_{(e^+_0),e^+_0}$ is a size-biased martingale, its inverse is a martingale too. By independence of the branches belonging to different trees and independence of the increments after $A(\wedge e^+)$, the right hand side is equal to (3.40)
(b) Infinite divisibility follows from the branching property. Hence, for each θ ≥ 0,

Moment asymptotics in the critical dimension
In this section we give the asymptotics of the moments of suitably rescaled components, or blocks of components, of SRW $(X_t)_{t \ge 0}$ in the critical dimension d = 2. We proceed as follows.
• The main result is stated in Proposition 3 in subsection 4.1. Its proof treats two cases separately: the case without and the case with block averaging.
• In subsection 4.2 we look at the single components only.
• In subsection 4.3 we then leave this microscopic perspective and focus on averaging the components over large blocks. The transfer is based on the fact that, on the lattice, a typical tuple of components within a block of side length $t^{\alpha/2}$ has mutual distances of order $t^{\alpha/2}$. Hence blocks of side length $t^{\alpha/2}$ behave like single components, associated with a scaling function truncated at 1 − α.
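A back-of-the-envelope computation shows where the truncation at 1 − α comes from. The Python sketch below is a heuristic illustration, not part of the proof, and rests on assumptions of our own. The transition probability $p_s$ is replaced by the Gaussian kernel $g_s(r) = e^{-r^2/(2s)}/(2\pi s)$, so Q is the identity. The collision weight of two components at distance $r_t = t^{\alpha/2}$ is compared with that of a single component through the ratio $\int_1^t g_s(r_t)\,ds \big/ \int_1^t g_s(0)\,ds$. After the substitution $s = e^u$ this ratio equals $\frac{1}{\log t}\int_0^{\log t}\exp\bigl(-\tfrac{r_t^2}{2}e^{-u}\bigr)\,du$. The integrand is essentially 0 for $u < \alpha \log t$ and essentially 1 afterwards, so the ratio tends to 1 − α. The function name is hypothetical.

```python
import math

def truncated_log_fraction(t, alpha, n_grid=50_000):
    """Ratio int_1^t g_s(r) ds / int_1^t g_s(0) ds with r**2 = t**alpha,
    where g_s(r) = exp(-r**2 / (2 s)) / (2 pi s) is a Gaussian kernel.

    With s = exp(u) both integrals run over u in [0, log t]; the denominator's
    integrand becomes the constant 1 / (2 pi), which cancels in the ratio.
    """
    big_l = math.log(t)
    half_r2 = t ** alpha / 2.0
    h = big_l / n_grid

    def integrand(u):
        return math.exp(-half_r2 * math.exp(-u))

    # Composite trapezoidal rule on [0, log t].
    total = 0.5 * (integrand(0.0) + integrand(big_l))
    total += sum(integrand(i * h) for i in range(1, n_grid))
    return total * h / big_l

for alpha in (0.0, 0.25, 0.5, 0.75):
    print(alpha, round(truncated_log_fraction(1e12, alpha), 3))
```

Convergence is only logarithmic in t, which is why the example uses t = 10^12.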

Results for tree indexed systems of SRW (Proposition 3)
For a given (T, A) and $G \in G_{T^+}$, we introduce the coefficients prescribed by (T, A) on G,
(4.1)
where the $U(e)$, $e \in G^0$, run through the domain of integration given in (3.31). In particular, if G has length 1, we let $c^{(T,A)}(G) := 1$.
Our main result is then the following:

Proposition 3 (Asymptotics for the moments of $(T, S_t)$-indexed systems of SRW)
Let λ be the counting measure.

Remarks.
Recall the moment formulae given in Proposition 2. Obviously, if the initial state has finite total mass, we can drop the requirement that the test functions have compact support. For a given extended space of scaling exponents (G, A) and θ > 0, inserting $\mu = \theta\delta_0$ and $F \equiv 1$ in (3.32) yields (4.4)
Hence the limit on the right hand side of (4.2) may be thought of as the limit of the moments of a system of tree indexed FD. Namely,
See [39] for more explicit formulae for $c^{(T,A)}(G)$ and for the moments of tree indexed systems of FD.

Moments of the single components
The aim of this subsection is to prove Proposition 3 in the case without any averaging, i.e., α = 0. According to the moment formula for tree indexed systems of SRW we proceed as follows.
• We begin with the analysis of tree indexed RW giving the rescaling analysis of the underlying migration. This is formulated in Proposition 4 in subsection 4.2.1.
• Applying Proposition 4, in subsection 4.2.2 we prove Proposition 3, for the time being for single components only. Recall from (3.32) that we have to sum, over all possible genealogical groves, expressions of the form (3.29). To do this, we use the representation of TI-FD as the total mass process of TI-SRW. We need to ensure that the possible error terms are summable, which we establish using the exact cardinalities of binary groves given in Lemma 1.
• The largest part of this subsection, subsection 4.2.3, is devoted to the proof of Proposition 4. We proceed by induction on the length of a genealogical tree.

Rescaling of tree indexed RW (Proposition 4)
Fix a space of scaling exponents (T, A) and a set of sequences of space-time points, $R^{\mathrm{ext}}_t$, on a (T, A)-scale (compare (3.18)). This situation induces a labeling function $S_t$ (compare (3.22)) and a collection of trees with leaves $T^+$. Fix one of these trees, $G \in G^{(1)}_{T^+}$. Our aim in this subsection is to analyze the grove indexed system of RW, $\xi^{(G,\cdot)}$ (given in (3.13)), via the multiple space-time scales $R^{\mathrm{ext}}_t$. Recall (3.29) to (3.31). We then have two tasks. First, we need to find a suitable renormalizing function f such that the sequence
(4.7)
converges to a nontrivial limit as t → ∞.
Second, we need to sum over the sites $x^\emptyset \in \mathbb{Z}^2$ and to find a second suitable renormalizing function F such that the sequence
converges once more to a non-trivial limit as t → ∞. It turns out that the latter limit is given by the coefficients $c^{(T,A)}(G)$ prescribed by (T, A) (compare (4.1)).
Notice that the renormalizing functions, f and F , lead back to the basic rescaling analysis of RW, i.e., for x ∈ R 2 , lim and hence for x ∈ R 2 and α ∈ [0, 1), Recall from (1.6) the covariance matrix Q of the migration. Then ϕ(·) denotes the density of a normal distribution with mean zero and covariance matrix Q, i.e., with Q-norm, We introduce now the basic notation, which is necessary to state the analogies of (4.9) and (4.10). (4.13) The restriction on the Λ (T,S) · -integral has to be taken from (3.31).

(4.15)
In the following we suppress the superscript whenever no confusion can arise, i.e., we abbreviate $m^{(T,A,R^{\mathrm{ext}}_t)}$ and $m^{(T,A,R^{\mathrm{ext}}_t)}$ by m and m, respectively.

Proposition 4 (Multiple scaling; grove indexed systems of RW)
(a) Depending on the order of magnitude of the time period during which all random walk branches run together, we distinguish two cases.
(i) (Trunk comprising a long time period) the following holds:
and uniformly in the sequences $(x^\emptyset_t, u^\emptyset_t)$ in $\mathbb{R}^2 \times \mathbb{R}_+$ such that, in addition to (4.18) and (4.19), for all $e^+ \in T^+$ also,

Proof of Proposition 3 for single components
Proposition 4 allows us to give a simple proof of Proposition 3 in the case without block averaging.
Proof of Proposition 3. The case α = 0. Recall from (2.14) that the definition of block means guarantees, for α = 0, $x \in \mathbb{Z}^d$ and t ≥ 1,
On the other hand, by the representation of FD as the total mass process of SRW,
For a given set of space-time points $R^{\mathrm{ext}} = \{y_{e^+},\ e^+ \in T^+;\ s_e,\ e \in T^0\}$, we fix $t \ge \bigvee_{e^+ \in T^+} s_{e^+} + 3$. Once more by the moment formula for tree indexed systems of SRW given in Proposition 2, and by part (b) of Proposition 4,
Hence it is sufficient to find a Γ < ∞ such that, for $\#T^+ \ge 1$,
Recall that the cardinality of the set of $T^+$-groves is calculated explicitly in Lemma 1. By the following rough estimate,

Inductive proof of Proposition 4
The proof of Proposition 4 follows the idea of the proof of Theorem 8.1 in Durrett (1979) [14]: we proceed by induction on the length l(G) of G.
To carry out this induction properly, we need a further technical lemma providing uniformity in the central limit theorem, which later allows explicit calculations.

Lemma 2 (One space scale; RW)
(a) Uniformly in sequences $(x_t, s_t)$ such that, as t → ∞,
uniformly in $u \in [0, d_t s_t]$ and uniformly in $z \in \mathbb{Z}^2$ with $\|z\|_Q \le \sqrt{u}\, K_u$, the following holds:
$s_t)$ such that, as t → ∞, in addition to (4.38),
In the proof of (4.42) we make use of a very precise expansion of the migration kernel given by Corollary 22.3 in Bhattacharya and Rao (1976) [1]. It states that, if the discrete kernel is symmetric and possesses finite second moments, the following holds as n → ∞:
By a straightforward calculation, (4.43) transfers to the continuous-time analogue: as t → ∞,
But this implies that for each ε > 0
which is independent of $(x_t, s_t)$, u and z, which proves (4.42).
(a) Furthermore, notice that (4.46)
Hence our task is to prove that, under the assumptions on the sequences $(a_t)$, $(C_t)$, $(d_t)$ and $(K_t)$, uniformly in $(x_t, s_t)$, u and z fulfilling (4.37) and (4.38),
This, however, holds because
it remains to prove the analogue of (4.47), i.e., that uniformly in $(x_t, s_t)$ fulfilling (4.38) and (4.40), u and z,
This follows because
where the first inequality holds since $e^x \ge 1 + x$ for x ≥ 0. □

Induction step: the basic recursion relation
Now let l(G) ≥ 2, and suppose that the assertions are true for trees of length n < l(G). The induction step is fairly involved. We begin with the basic recursion relation and then outline the three parts into which we split the induction step.
The induction is based on the following recursive formula:
Figure 6 shows the tree $G_e$ decomposing into $G_1$ and $G_2$.
To be in a position to apply the induction hypotheses, we need the following three items:
A. We truncate the domain of integration from $u_{e_1} \in [u^\emptyset_t, U^{(T,S_t)}(e_1)]$ to $u_{e_1} \in I_t$, for some $I_t \subseteq [u^\emptyset_t, U^{(T,S_t)}(e_1)]$, in such a way that the terms left out are small enough.

B. For $u_{e_1} \in I_t$ we restrict the spatial summation from $z_{e_1} \in \mathbb{Z}^2$ to $z \in D_{u_{e_1}}$, for some $D_{u_{e_1}} \subseteq \mathbb{Z}^2$, again such that the terms left out are small enough. Here we make use of the induction hypotheses.

C. Having done A and B, it remains to evaluate
We treat parts A to C below, after first preparing some tools.

Induction step: preparation (including proof of Proposition 4(b))
To carry out A and B, we need upper bounds for the terms we want to remove. For this we need uniform bounds on M (recall (4.52) together with (4.13)). Therefore we first prove the following uniform estimates, which immediately imply Proposition 4(b):
It remains to show (4.61). We proceed once more by induction on the number of leaves of $G_2$.
Now let $\#G_2^+ \ge 2$, and suppose that (4.61) is true for trees with fewer leaves than $G_2$. Once more by the induction hypothesis (4.56), (4.66)
Thus we have proved (4.56) and (4.57). In particular, this implies part (b) of Proposition 4. □

Induction step: part A
- Fix a non-negative sequence $(C_t) \uparrow \infty$.
- Fix a bounded sequence $(d_t)$ such that $d_t \in [0, 1/2]$ with $(d_t) \downarrow 0$, but so slowly that
and a sequence $(D_t) \uparrow \infty$, but so slowly that
as t → ∞.
- To avoid too many indices, we set for
i.e., the above integrals are small enough to be neglected.
Recall m(·, ·, ·) from (4.14). Then applying (4.56) yields
Since, for k ≥ 0 and $u \ge e^k$, the function $u \mapsto (\log u)^k/u$ is decreasing, we obtain, again by (4.56), for t sufficiently large, (4.74)
By a similar argument the following holds: (4.75)
An argument like (4.72) also works for estimating the second summand:
This finishes the proof of (4.70). Analogous calculations, using an estimate like (4.75), finish the proof of (4.71).
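The monotonicity used in this step follows from a one-line derivative computation. For k = 0 the function is simply 1/u, and for k ≥ 1 and u > 1,

```latex
\frac{d}{du}\,\frac{(\log u)^{k}}{u}
  = \frac{k(\log u)^{k-1} - (\log u)^{k}}{u^{2}}
  = \frac{(\log u)^{k-1}\,\bigl(k - \log u\bigr)}{u^{2}}
  \le 0 \qquad \text{whenever } u \ge e^{k},
```

since then $\log u \ge k \ge 1$, so that the first factor in the numerator is non-negative and the second is non-positive.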
- In the following we therefore choose:
To see this, we start by replacing summation over subsets of the lattice by integration over subsets of the plane with respect to Lebesgue measure.
- The extension is defined by
Note that
that is, $D_{U^{(T,S_t)}(e_1) - u_{e_1}}$ exhausts the whole plane in the limit as t → ∞.
Observe furthermore that, uniformly in $u_{e_1} \in I^1_t$,
Case (i(a)). Suppose that $u_{e_1} \in I^1_t$, i.e., in particular
This is the only case in which the main terms do not vanish in the limit, i.e., in this case the principal calculations take place.
In particular, (4.112) implies that, even for $e_{1i}$, $i = 1, 2$, … . Hence, for each tree $G$ and $\beta \geq 1 - A(\wedge_{e^+ \in G^+} e^+)$, … . Thus the right-hand side of (4.114) is equal to … . Observe that (4.134) is essential at this point, in the sense that (4.136) would not appear in the case where the trunk spans a long time period, i.e., $U^{(T,S_t)}(e_1) - s_\emptyset > C_t\,(\Delta R^{\mathrm{ext}}_t(G) \vee 1)$. Then, by an estimate like (4.75), we can even show a bit more than required in (4.135), namely … (4.137).

Case (ii.i(b)). Let $(t_n) \uparrow \infty$ be a subsequence such that … . Since this implies in particular that $\Delta R^{\mathrm{ext}}_{t_n}(G)$ is dominated by the spatial distances, we may assume w.l.o.g. that $e^+ \in G_1^+$ and $f^+ \in G^+$ are such that $\Delta R^{\mathrm{ext}}_{t_n}(G) = |y^{e^+}_{t_n} - y^{f^+}_{t_n}|$ is on an extended $(T, A \wedge (1-\alpha))$-scale.
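The condition on $\beta$ involves $\wedge_{e^+ \in G^+} e^+$. Assuming, as is usual for tree-indexed systems, that nodes are encoded as words (Ulam-Harris labels) and that $\wedge$ denotes the longest common prefix, i.e. the most recent common ancestor of all leaves, this operation can be sketched as follows. The tuple encoding and the name `meet` are hypothetical choices for this illustration.

```python
from typing import Iterable, Tuple

# Hypothetical encoding: a node is a finite word over the positive integers,
# stored as a tuple; the empty tuple () is the root.
Word = Tuple[int, ...]


def meet(words: Iterable[Word]) -> Word:
    """Longest common prefix (most recent common ancestor) of a non-empty family of words."""
    it = iter(words)
    try:
        common = next(it)
    except StopIteration:
        raise ValueError("the meet of an empty family is undefined")
    for w in it:
        n = 0
        # Advance while both words agree letter by letter.
        while n < min(len(common), len(w)) and common[n] == w[n]:
            n += 1
        common = common[:n]
    return common


if __name__ == "__main__":
    leaves = [(1, 1), (1, 2, 1), (1, 2, 2)]
    print(meet(leaves))  # the common ancestor (1,)
```

Under this encoding the time label $A(\wedge_{e^+} e^+)$ would be the label attached to the last branch point shared by all leaves.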
Fix $\theta > 0$. We are now in a position to formulate the following result, which we prove in Subsection 4.3.2 and which immediately allows us to complete the proof of Proposition 3 in Subsection 4.3.1.

Lemma 4 (Asymptotics of block means; tree indexed systems of SRW)
Uniformly in the initial times $s_\emptyset \in \mathbb{R}_+$ such that … as $t \to \infty$, the following holds: …

Secondly, since $B_{\cdot,\alpha}$ and $Z_{1-\alpha}$ are non-negative with $E_{\theta\lambda}[B_{\cdot,\alpha}] \equiv \theta < \infty$ and $E_\theta[Z_{1-\alpha}] \equiv \theta < \infty$, the functions $\varphi^X_t$ and $\varphi^Z_t$ are analytic in the half-space $D$. By Theorem 1.1 in [16], the following power series expansion holds in some non-trivial neighborhood of the origin $0 \in \mathbb{C}^{T^+}$: …

To apply Proposition 3 to the right-hand side of (5.6), note that the multiplicities appearing in (5.6) cause no difficulty: we can always find another genealogical grove that satisfies the assumptions of Proposition 3, i.e., that avoids multiplicities, but yields the same mixed moments (compare Figure 7). Then, by (4.2) and the dominated convergence theorem, for all $\kappa \in D$, the right-hand side of (5.6) tends to … , where, for a given labeled (sub)tree $(T, A)$ and for $\nu \in \mathbb{N}^{T^+}$ with $|\nu| \geq 1$, the labeled $\nu$-tree $(T^\nu, A^\nu)$ is illustrated in Figure 7. … the right-hand side of (5.11) is equal to … (5.13), which proves (2.26).
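The expansion invoked via Theorem 1.1 in [16] expresses a transform through mixed moments near the origin. As a one-dimensional illustration, assuming the transform has the Laplace form $\kappa \mapsto E[e^{-\kappa X}]$ (an assumption for this sketch; the precise definition of $\varphi^X_t$ is not reproduced here), the following Python code recovers the Laplace transform of a Poisson($\theta$) variable from its moments $E[X^n] = \sum_k S(n,k)\,\theta^k$, where $S(n,k)$ are Stirling numbers of the second kind. The Poisson example and all function names are hypothetical and are not the tree-indexed systems of the paper.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling numbers of the second kind via S(n, k) = k S(n-1, k) + S(n-1, k-1)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def poisson_moment(n: int, theta: float) -> float:
    """E[X^n] for X ~ Poisson(theta), given by the Touchard polynomial."""
    return sum(stirling2(n, k) * theta**k for k in range(n + 1))


def laplace_from_moments(kappa: float, theta: float, terms: int = 40) -> float:
    """Truncated power series sum_{n < terms} (-kappa)^n E[X^n] / n!."""
    return sum((-kappa) ** n * poisson_moment(n, theta) / math.factorial(n)
               for n in range(terms))


def laplace_exact(kappa: float, theta: float) -> float:
    """Closed form E[exp(-kappa X)] = exp(theta (e^{-kappa} - 1)) for X ~ Poisson(theta)."""
    return math.exp(theta * (math.exp(-kappa) - 1.0))


if __name__ == "__main__":
    for theta in (1.0, 2.0):
        print(theta, laplace_from_moments(0.5, theta), laplace_exact(0.5, theta))
```

Only the moments enter the series, which is the mechanism by which convergence of all mixed moments yields convergence of the transforms near the origin.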
(b) Fix $\kappa := (\kappa_{e^+};\, e^+ \in G^+) \in \mathbb{R}^{G^+}$, and let … . Recall that size-biasing a random vector with independent components by one of its components affects only the distinguished component. Hence, by Corollary 1 applied to the $\nu$-space of scaling exponents $(G^\nu, A^\nu)$ (recall Figure 7) and the dominated convergence theorem, the right-hand side of (5.15) converges to … .

In order to prove the theorems for $(G, S_t)$-indexed systems of BRW, it suffices to compare their space-time mixed moments with those of $(G, S_t)$-indexed systems of SRW. Since the diffusion limit makes the underlying motion deterministic, and since, due to common ancestor particles or mass, the components in branching models are positively correlated, it is clear that, for each labeled grove $(G, S)$, each family of test functions $F := (F_{e^+};\, e^+ \in G^+)$ with $F_{e^+} \in M^+_0$ for each $e^+ \in G^+$, and each $\mu \in l^1_\gamma$ (compare (1.12)), … . On the other hand, for the particular translation-invariant initial states, we find the following:
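The size-biasing fact used above can be checked on a small discrete example. The following Python sketch, using exact rational arithmetic, size-biases an independent pair $(X_1, X_2)$ by $X_1$ and confirms that $X_2$ keeps its law and remains independent of the size-biased $X_1$. The distributions and function names are hypothetical illustrations, not objects from the paper.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

Pmf = Dict[int, Fraction]
Joint = Dict[Tuple[int, int], Fraction]


def size_bias_first(p1: Pmf, p2: Pmf) -> Joint:
    """Joint law of (X1, X2) size-biased by X1, where X1 ~ p1 and X2 ~ p2 are independent.

    The new weight of (x1, x2) is x1 * p1[x1] * p2[x2] / E[X1].
    """
    mean1 = sum(x * w for x, w in p1.items())
    return {(x1, x2): x1 * w1 * w2 / mean1
            for (x1, w1), (x2, w2) in product(p1.items(), p2.items())}


def marginals(joint: Joint) -> Tuple[Pmf, Pmf]:
    """Marginal laws of the first and second coordinate."""
    m1: Pmf = {}
    m2: Pmf = {}
    for (x1, x2), w in joint.items():
        m1[x1] = m1.get(x1, Fraction(0)) + w
        m2[x2] = m2.get(x2, Fraction(0)) + w
    return m1, m2


if __name__ == "__main__":
    p1 = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
    p2 = {1: Fraction(1, 3), 5: Fraction(2, 3)}
    m1, m2 = marginals(size_bias_first(p1, p2))
    print(m1)  # size-biased law of X1
    print(m2)  # unchanged law of X2
```

The weight $x_1\,p_1(x_1)\,p_2(x_2)/E[X_1]$ factorizes into a function of $x_1$ times $p_2(x_2)$, which is exactly why only the distinguished component is affected.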

Lemma 5 (A moment comparison estimate)
Let $(G, S)$ be a labeled grove. For each family of test boxes $L := (L_{e^+};\, e^+ \in G^+)$ such that $L_{e^+}$ is a finite subset of $\mathbb{Z}^d$ for each $e^+ \in G^+$, …

Proof of the theorems ($(G, S_t)$-BRW). Fix $\kappa := (\kappa_{e^+};\, e^+ \in G^+) \in \mathbb{R}^{G^+}$, and recall $X^{(G,S_t),\alpha}(\kappa)$ from (5.14). So far we have shown (5.18). Since the right-hand side of (5.18) does not depend on $\theta$, and since the intensity measures of SRW started in $\delta_{\theta\lambda}$ and in $\mathrm{POIS}_{\theta\lambda}$ (recall (1.3)) coincide, we also have the corresponding statement under $E_{\mathrm{POIS}_{\theta\lambda}}$: … . The same reasoning works for part (a) of Theorem 1. $\square$

It remains to prove Lemma 5.
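The coincidence of the intensity measures can be made explicit at time $0$. Assuming, for this sketch only, that $\lambda$ is counting measure on $\mathbb{Z}^d$ and that under $\mathrm{POIS}_{\theta\lambda}$ the initial occupation numbers $N_x$, $x \in \mathbb{Z}^d$, are independent Poisson($\theta$) variables (see (1.3) for the actual definition), one has for every finite $A \subset \mathbb{Z}^d$:

```latex
E_{\mathrm{POIS}_{\theta\lambda}}\Big[\sum_{x \in A} N_x\Big]
  = \sum_{x \in A} E[N_x]
  = \theta\,|A|
  = (\theta\lambda)(A),
```

which is precisely the mass that the deterministic initial state $\theta\lambda$ assigns to $A$.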
Proof of Lemma 5. We proceed by induction on the length $l(G)$ of the grove, i.e., the length of the longest word in $G$. First let $G$ be of length $l(G) = 1$. Then, for a given initial state, all branches are independent. Hence, by conditioning on the initial state, … . Inserting the latter into the right-hand side of (5.31) gives the claimed assertion.
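The induction in the proof of Lemma 5 runs over the length $l(G)$, while earlier inductions run over the number of leaves $\#G^+$. Assuming, for this sketch only, that a grove is encoded as a prefix-closed set of words with the empty word as common root, both quantities can be computed as follows; the names `leaves` and `length` are hypothetical.

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding: a node is a word over the positive integers (a tuple);
# the grove is a non-empty, prefix-closed set of such words.
Word = Tuple[int, ...]


def leaves(G: FrozenSet[Word]) -> FrozenSet[Word]:
    """G^+: the words of G that are not a proper prefix of another word in G."""
    return frozenset(
        w for w in G
        if not any(len(v) > len(w) and v[:len(w)] == w for v in G)
    )


def length(G: FrozenSet[Word]) -> int:
    """l(G): the length of the longest word in G (G must be non-empty)."""
    return max(len(w) for w in G)


if __name__ == "__main__":
    # Length 1: every branch leaves the root directly, so branches share only
    # the initial state, as in the base case of the induction.
    G1 = frozenset({(), (1,), (2,), (3,)})
    G2 = frozenset({(), (1,), (1, 1), (1, 2), (2,)})
    print(length(G1), sorted(leaves(G1)))
    print(length(G2), sorted(leaves(G2)))
```

In this encoding a grove of length $1$ has no branch point below the root, which matches the observation that its branches are independent given the initial state.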