Rescaling limits of the spatial Lambda-Fleming-Viot process with selection

We consider the spatial Lambda-Fleming-Viot process model [BEV10] for frequencies of genetic types in a population living in $\mathbb{R}^d$, with two types of individuals (0 and 1) and natural selection favouring individuals of type 1. We consider two cases, one in which the dynamics of the process are driven by purely 'local' (fixed radius) events and one incorporating large-scale extinction-recolonisation events whose radii have a polynomial tail distribution. In both cases, we consider a sequence of spatial Lambda-Fleming-Viot processes indexed by n, and we assume that the fraction of individuals replaced during a reproduction event and the relative frequency of events during which natural selection acts tend to 0 as n tends to infinity. We choose the decay of these parameters in such a way that when reproduction is only local, the measure-valued process describing the local frequencies of the less favoured type converges in distribution to a (measure-valued) solution to the stochastic Fisher-KPP equation in one dimension, and to a (measure-valued) solution to the deterministic Fisher-KPP equation in more than one dimension. When large-scale extinction-recolonisation events occur, the sequence of processes converges instead to the solution to the analogous equation in which the Laplacian is replaced by a fractional Laplacian (again, noise can be retained in the limit only in one spatial dimension). We also define the process of 'potential ancestors' of a sample of individuals taken from these populations, which takes the form of a system of branching and coalescing symmetric jump processes. We show their convergence in distribution towards a system of Brownian or stable motions which branch at some finite rate. In one dimension, in the limit, pairs of particles also coalesce at a rate proportional to the local time at zero of their separation.


Introduction
The spatial Λ-Fleming-Viot process (SLFV) was introduced in [Eth08,BEV10]. In fact, it is not so much a process as a general framework for modelling frequencies of different genetic types in populations which evolve in a spatial continuum. For example, it is readily adapted to include things like the large-scale extinction/recolonisation events which have dominated the demographic history of many species. In this paper we shall be concerned with extending the framework to incorporate an important form of natural selection and investigating various rescaling limits which capture the resultant patterns of genetic diversity over large spatial and temporal scales. In particular, we recover the Fisher-KPP equation and, in one dimension, its stochastic counterpart. In the presence of large-scale demographic events we obtain analogous equations with the Laplacian replaced by the fractional Laplacian, but, intriguingly, no other trace of the large-scale events survives. The limits obtained here assume that the 'neighbourhood size', which corresponds to the local population density, is high, thus complementing results of [EFS14] which address the interaction of natural selection and genetic drift when neighbourhood size is small.

In the SLFV, reproduction takes place in events, each of which affects a region of space: during an event, a fraction $u$ (the 'impact') of the individuals in the affected region are replaced by offspring of a parent chosen from the population immediately before the event. Events occur according to a Poisson point process, and this Poisson structure renders the process particularly amenable to analytic study. In the neutral setting, which is the one studied thus far in the literature, the parent is chosen uniformly at random from the affected region, irrespective of type. There are many possible ways to incorporate natural selection. Here we shall focus on one of the simplest, but also most important, in which individuals are weighted according to their genetic type when the parent is chosen.
To motivate our definition of the process with (fecundity) selection, suppose that there are two possible types in the population, which we shall denote by 0 and 1. In order to give a slight selective advantage to type 1, we fix a selection coefficient $s > 0$ and suppose that, when an event falls, if the proportion of type 0 individuals in the affected region immediately before the event is $\bar{w}$, then the probability of picking a type 0 parent is $p(\bar{w}, s) = \bar{w}/(1 + s(1 - \bar{w}))$. Typically one is interested in weak selection, so that $s \ll 1$, and in this case, expanding to first order in $s$, we can approximate this probability by $(1 - s)\bar{w} + s\bar{w}^2$. Here again we reap the benefit of the Poisson structure of events: we can think of events as being of one of two types. A proportion $(1 - s)$ of events are 'neutral': the parent is selected exactly as in the neutral setting and is of type 0 with probability $\bar{w}$. On the other hand, a proportion $s$ of events are 'selective', and then the probability of a type 0 parent is $\bar{w}^2$. One way to achieve this is to dictate that at selective events we choose two potential parents, and only if both are of type 0 will the offspring be of type 0. The Poisson structure allows us to view neutral and selective events as being driven by independent Poisson processes. This approach exactly parallels that usually adopted to incorporate genic selection into the classical Moran model.
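The claim that the two-event-type mechanism reproduces the weak-selection approximation can be checked directly. The sketch below compares $p(\bar{w}, s)$ with $(1-s)\bar{w} + s\bar{w}^2$ and simulates the type passed on at a single event; the function names are hypothetical and chosen for illustration only. Writing $a = s(1-\bar{w})$, a line of algebra gives the exact gap $p(\bar{w},s) - [(1-s)\bar{w} + s\bar{w}^2] = \bar{w}a^2/(1+a)$, which lies between $0$ and $s^2$.

```python
import random

def p_exact(w_bar: float, s: float) -> float:
    """Probability of a type-0 parent under fecundity weighting: w_bar / (1 + s(1 - w_bar))."""
    return w_bar / (1.0 + s * (1.0 - w_bar))

def p_two_event_types(w_bar: float, s: float) -> float:
    """Neutral events (proportion 1 - s) give w_bar; selective events (proportion s) give w_bar**2."""
    return (1.0 - s) * w_bar + s * w_bar ** 2

def parent_type(w_bar: float, s: float, rng: random.Random) -> int:
    """Simulate the type (0 or 1) passed on at a single event."""
    if rng.random() < s:
        # Selective event: two potential parents; the offspring is type 0 only if both are type 0.
        both_zero = rng.random() < w_bar and rng.random() < w_bar
        return 0 if both_zero else 1
    # Neutral event: a single parent, of type 0 with probability w_bar.
    return 0 if rng.random() < w_bar else 1

if __name__ == "__main__":
    w_bar, s = 0.4, 0.01
    rng = random.Random(1)
    trials = 200_000
    freq0 = sum(parent_type(w_bar, s, rng) == 0 for _ in range(trials)) / trials
    print(p_exact(w_bar, s), p_two_event_types(w_bar, s), freq0)
```

The empirical frequency of type-0 offspring matches $(1-s)\bar{w} + s\bar{w}^2$ up to Monte Carlo error, and for small $s$ this is within $s^2$ of the exact weighted probability.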
Of course there are many ways to modify the selection mechanism. For example, in Definition 1.1 below, we allow both the distribution of the size of the region affected and of the impact to differ between selective and neutral events.
Let us turn to a precise definition. First we describe the state space of the process. We suppose that the population is evolving in a geographical space $E \subseteq \mathbb{R}^d$. We borrow some results from [VW13] in the special case when the space of genetic types is $K = \{0, 1\}$. At each time $t$, the population will be represented by a random function $\{w_t(x), x \in E\}$, defined up to a Lebesgue null set of $E$ by
$$w_t(x) := \text{proportion of type 0 at spatial position } x \text{ at time } t.$$
Similarly, if $(t, x, r, u) \in \Pi^S$, a selective event occurs at time $t$, within the closed ball $B(x, r)$:
1. Choose two 'potential' parental locations $z, z'$, independently and uniformly at random within $B(x, r)$, and at each of these sites 'potential' parental types $\kappa, \kappa'$, according to $w_{t-}(z), w_{t-}(z')$ respectively.
Since we only consider this particular form of selection in this paper, there should be no ambiguity in simply calling this process the SLFV with selection, but we emphasize that, although this is certainly one of the most natural, there are many alternative models. For example, one could modify the construction so that one first selects a parental type and then an impact depending on that type, or one could 'kill' with differential weights (cf. [BP13,Fou13] in the non-spatial setting).
Two different constructions of SLFV are given in [EK14]. Both incorporate forms of selection. The first establishes, under somewhat weaker conditions than (2), the existence of an SLFV with differential killing. The second, under precisely conditions (2), allows for the sort of selection which we consider here. Notwithstanding the proof of existence provided by that paper, in §2.2 we shall provide a direct proof of existence. Although our approach does not yield the 'lookdown' construction of [EK14], nor does it allow us to relax conditions (2), it is extremely flexible and can, under conditions analogous to (2), be readily modified to incorporate models with many different forms of selection. Our first result is the following.
Theorem 1.2. Let $E$ be a connected subset of $\mathbb{R}^d$, and let $\mu, \mu', \nu$ and $\nu'$ satisfy (2). Write $L$ for the generator of the SLFVS acting on test functions from $D(L) = \{F(\langle w, f\rangle) : f \in C_c(E),\ F \in C^1(\mathbb{R})\}$. Then the martingale problem for $(L, D(L))$ is well-posed.

Statement of the main results
Our main results concern the patterns of variation that we see under this model if we look over large spatial and temporal scales. We are concerned with the regime of large neighbourhood size, corresponding to small impact. We shall concentrate on the particular case in which $E = \mathbb{R}^d$ and $\mu'(dr)\nu'_r(du) = s\,\mu(dr)\nu_r(du)$ for a small parameter $s > 0$. This corresponds to the weighting of the selection of the parent which motivated our definition of the SLFVS. In fact we shall choose very special forms for the measures $\mu(dr)$ and $\nu_r(du)$. Our results will certainly hold under much more general conditions, but the proofs become obscured by notation.
More precisely, for each $n \in \mathbb{N}$, we fix a number $u_n \in (0, 1)$ and assume that all events (neutral and selective) have impact $u_n$, that is, $\nu_r(du) = \delta_{u_n}(du)$ for every $r > 0$, and that $\mu' = s_n\mu$ for some $s_n > 0$. We consider the regime in which $u_n$ and $s_n$ tend to 0 as $n \to \infty$. That is, the neighbourhood size (or local population density) tends to infinity while selection is weak. This mirrors the usual assumptions in the Moran and Wright-Fisher models without spatial structure, in which one is interested in the scaling limits obtained as the population size $N$ tends to infinity while $Ns_N$ remains $O(1)$. We shall find scalings of time and space for which the rescaled SLFVS converges to a non-trivial limit as $n \to \infty$. Specifically, we identify a parameter regime in which the SLFVS behaves like the Fisher-KPP equation (with noise in $d = 1$), or its analogue with long-range dispersal. In particular, of course, when neighbourhood size is large, this will tell us how strong selection must be (relative to neighbourhood size) if we are to see its effect over large spatial and temporal scales. With this in mind, let us assume that, for some $\gamma, \delta > 0$, $u_n = u/n^\gamma$ and $s_n = \sigma/n^\delta$. Further, let $V_r$ denote the volume of a ball of radius $r$ (which, of course, depends on the dimension $d$, but we suppress that in our notation).
• Stable radii: For some $\alpha \in (1, 2)$, we set

Not surprisingly, we recover the parameters for the fixed radius case from those for stable radii on setting $\alpha = 2$. In §3 we provide an informal argument which explains why this is the appropriate choice of the parameters $\beta, \gamma, \delta$.
To simplify notation, we write $\mathcal{M}$ for $\mathcal{M}_\lambda(\mathbb{R}^d \times \{0, 1\})$, and $D_{\mathcal{M}}[0, \infty)$ for the set of all càdlàg paths with values in $\mathcal{M}$. We also write $C_c^\infty(\mathbb{R}^d)$ for the set of smooth compactly supported functions on $\mathbb{R}^d$. Our main results are as follows.
Theorem 1.3. (Fixed radius) Suppose that $\bar{w}^n_0$ converges weakly to some $w_0 \in \mathcal{M}$. Then, as $n \to \infty$, the process $(\bar{w}^n_t)_{t\ge0}$ converges weakly in $D_{\mathcal{M}}[0, \infty)$ towards a process $(w^\infty_t)_{t\ge0}$ with initial value $w^\infty_0 = w_0$. Furthermore, (i) when $d = 1$, $(w^\infty_t)_{t\ge0}$ is the unique process for which, for every $f \in C_c^\infty(\mathbb{R})$,

is a zero-mean martingale with quadratic variation

where $\Gamma_R > 0$ depends only on $R$ and is defined in (24).
where, again, $\Gamma_R > 0$ depends only on $R$ and is defined in (24).
In other words, in one space dimension, the limiting process $(w^\infty_t)_{t\ge0}$ is a weak solution to the stochastic partial differential equation

with $w^\infty_0 = w_0$, and $W$ a space-time white noise. In dimension $d \ge 2$, on the other hand, the noise term disappears in the limit and $(w^\infty_t)_{t\ge0}$ is a weak solution of the deterministic Fisher-KPP equation

Remark 1.4. We emphasize that we are taking the limit of $\bar{w}^n$, not that of $w^n$.
To state the corresponding result for stable radii, we need some more notation. We write $V_r(x, y)$ for the volume of $B(x, r) \cap B(y, r)$ and define

We shall check in Lemma 5.1 that this defines the generator of a symmetric stable process (that is, it is a constant multiple of the fractional Laplacian). Our result for stable radii is then as follows.
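For concreteness, here is a small sketch of the intersection volume $V_r(x, y)$, which enters the definition (4) of $D_\alpha$ (that definition is not reproduced in this text). It is restricted to $d = 1$ and $d = 2$, and the function name is hypothetical. Note that $V_r(x, y)$ is also the volume of the set of centres $z$ for which $B(z, r)$ covers both $x$ and $y$, so it vanishes once $|x - y| \ge 2r$: two points can only be affected by the same event of radius $r$ if they are within distance $2r$ of each other.

```python
import math

def ball_intersection_volume(x, y, r):
    """Volume of B(x, r) ∩ B(y, r) for points of R^1 or R^2, given as tuples.

    Equivalently, the volume of the set of centres z whose ball B(z, r) covers both x and y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must lie in the same dimension")
    dist = math.dist(x, y)
    if dist >= 2 * r:
        return 0.0  # the two balls do not overlap (or touch at a single point)
    if len(x) == 1:
        # Two intervals of length 2r overlapping on a segment of length 2r - |x - y|.
        return 2 * r - dist
    if len(x) == 2:
        # Area of the symmetric lens formed by two discs of radius r whose centres are dist apart.
        return 2 * r ** 2 * math.acos(dist / (2 * r)) - 0.5 * dist * math.sqrt(4 * r ** 2 - dist ** 2)
    raise NotImplementedError("this sketch only covers d = 1 and d = 2")
```

For example, `ball_intersection_volume((0.0, 0.0), (0.0, 0.0), 1.0)` returns the area $\pi$ of the unit disc (up to rounding), i.e. $V_r(x, x) = V_r$.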
Theorem 1.5. (Stable radii) Suppose that $\bar{w}^n_0$ converges weakly to some $w_0 \in \mathcal{M}$. Then, as $n \to \infty$, the process $(\bar{w}^n_t)_{t\ge0}$ converges weakly in $D_{\mathcal{M}}[0, \infty)$ towards a process $(w^\infty_t)_{t\ge0}$ with initial value $w_0$. Furthermore, if $D_\alpha$ denotes the generator of the symmetric α-stable process defined in (4), then (i) when $d = 1$, $(w^\infty_t)_{t\ge0}$ is the unique process for which, for every $f \in C_c^\infty(\mathbb{R})$,

is a zero-mean martingale with quadratic variation

Remark 1.6. Observe from (4) that, here again, the deterministic component of the limiting motion is proportional to $u$ and the quadratic variation is proportional to $u^2$, so that $u$ can be thought of as scaling time (cf. Remark 3.1). Moreover, the limiting process that we have obtained under this rescaling is a weak solution to a (stochastic) PDE which differs from that obtained in the fixed radius case only in that the Laplacian has been replaced by the generator of a symmetric stable process. This is, perhaps, rather surprising at first sight. The only effect of the large-scale events is on the spatial motion of individuals in the population, and we see no trace of the correlations in their movement, or of the selection or 'genetic drift' acting over large scales, that we have in the prelimiting model. Notice also that the scaling of $s_n$ (relative to $u_n$) that leads to a nontrivial limit is independent of spatial dimension. This is in contrast to the case of bounded neighbourhood size considered in [EFS14].
As remarked above, we would obtain the same results under much more general conditions. For example, in selecting the regions to be affected by events, not only could one take more general measures $\mu$ (it is the tail behaviour of $\mu(dr)$ that we see in our limits), but reproduction events also need not be based on discs. We anticipate that this robustness will also be maintained if one replaces our selection mechanism with any other in which one type is favoured over the other (with appropriate modification if the strength of selection is 'density dependent', that is, if the parameter $s_n$ depends on the local frequencies of the different types in the population), and it should be clear how to modify our proofs in such cases.

Duality
In the same way as the Wright-Fisher diffusion is dual to a Kingman coalescent, the neutral SLFV is dual to a system of coalescing random walks that describe the ancestry of a sample from the population. In the presence of selection, the Kingman coalescent is replaced by the branching and coalescing structure known as the ancestral selection graph (ASG), [KN97,NK97]. In §2 we shall see that the SLFVS is dual to a spatial analogue of the ASG, described in terms of a system of 'branching and coalescing' random walks (although the mechanism of branching and coalescence differs substantially from that in the classical ASG). Indeed, this duality will guarantee uniqueness of the SLFVS. We shall use knowledge of the limiting forwards in time model to recover corresponding limiting results for our rescaled branching and coalescing duals. These will be stated in Theorems 2.5 and 2.6. The difficulty with proving these results directly stems from problems with identifying the limiting coalescence mechanism in one dimension. This contrasts with the situation of uniformly bounded neighbourhood size, considered in [BEV12] in the neutral case and in [EFS14] in the selective case, where it is the ability to identify the limiting behaviour of the (analytically tractable) coalescent dual that allows us to prove results about the forwards in time model.

Structure of the paper
The rest of the paper is laid out as follows. In §2 we provide a very direct proof of existence of the SLFVS and prove the duality with an analogue of the ASG. We also state the counterparts of Theorems 1.3 and 1.5 for the rescaled dual processes. In §3, we provide heuristic arguments to explain our rescalings. In §4, we turn to proving Theorem 1.3, the scaling limit in the case of fixed radii, and Theorem 2.5, which provides the corresponding result for the rescaled duals. In §5, we prove Theorems 1.5 and 2.6, the analogous results for stable radii. In Appendix A, we obtain continuity estimates for the rescaled SLFVS of §5. In particular, these rather technical estimates are a key ingredient in (and a nice complement to) the proof of Theorem 1.5.

2 The spatial Λ-Fleming-Viot process with selection

2.1 A dual process of branching and coalescing jump processes

In this subsection, assuming existence of the SLFVS, we identify a dual system of branching and coalescing lineages. The idea is that we sample a finite collection of points at time zero. At any given time in the past, our dual describes the smallest subset of spatial points at which we must know the distribution of types in the ancestral population, in order to determine the current distribution of types at our sampled points. Put another way, it can be understood as the collection of 'potential ancestors' of a sample from the current population. Only by knowing the types of all these potential ancestors are we able to extract the true ancestry of the sample.
The dynamics of the dual are driven by the same Poisson processes of events $\Pi^N$ and $\Pi^S$ that drove the forwards-in-time process. These processes are reversible, and we shall abuse notation by indexing events by 'backwards time' when discussing our dual. We suppose that at time 0, 'the present', we sample $k$ individuals from locations $x_1, \ldots, x_k$, and we write $\xi^1_s, \ldots, \xi^{N_s}_s$ for the locations of the $N_s$ 'potential ancestors' that make up our dual at time $s$ before the present.
Definition 2.1 (Branching and coalescing dual). The branching and coalescing dual process $(\Xi_t)_{t\ge0}$ is the $\bigcup_{n\ge1} E^n$-valued Markov process with dynamics defined as follows. At each event $(t, x, r, u) \in \Pi^N$:
1. for each $\xi^i_{t-} \in B(x, r)$, independently mark the corresponding potential ancestral lineage with probability $u$;
2. if at least one lineage is marked, all marked lineages disappear and are replaced by a single potential ancestor, whose location is drawn uniformly at random from within $B(x, r)$.
At each event $(t, x, r, u) \in \Pi^S$:
1. for each $\xi^i_{t-} \in B(x, r)$, independently mark the corresponding potential ancestral lineage with probability $u$;
2. if at least one lineage is marked, all marked lineages disappear and are replaced by two potential ancestors, whose locations are drawn independently and uniformly from within $B(x, r)$.
In both cases, if no particles are marked, then nothing happens.
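To make Definition 2.1 concrete, here is a minimal simulation sketch. To keep the event rate finite it assumes (unlike the text) that space is a circle of circumference `length`, and it takes $\mu = \delta_R$, $\mu' = s\delta_R$ and $\nu_R = \nu'_R = \delta_u$, so that event centres arrive at rate `length * (1 + s)` and a fraction $s/(1+s)$ of them are selective. All function and parameter names are hypothetical. Each lineage inside the affected ball is marked with probability $u$; marked lineages are replaced by one (neutral) or two (selective) potential ancestors placed uniformly in the ball.

```python
import random

def periodic_distance(a: float, b: float, length: float) -> float:
    """Distance between two points on a circle of circumference `length`."""
    d = abs(a - b) % length
    return min(d, length - d)

def wrap(x: float, length: float) -> float:
    """Map x into [0, length), guarding against floating-point round-up to `length`."""
    y = x % length
    return 0.0 if y >= length else y

def simulate_dual(sample, t_max, radius, impact, s, length, seed=0):
    """Hypothetical sketch of the branching and coalescing dual (Definition 2.1) on a circle.

    Assumes mu = delta_R, mu' = s * delta_R and nu_R = nu'_R = delta_u, with radius < length / 2.
    Returns the locations of the potential ancestors at (backwards) time t_max.
    """
    rng = random.Random(seed)
    lineages = [wrap(x, length) for x in sample]
    t = 0.0
    total_rate = length * (1.0 + s)  # neutral centres at rate 1, selective at rate s, per unit length
    while True:
        t += rng.expovariate(total_rate)
        if t > t_max:
            return lineages
        centre = rng.uniform(0.0, length)
        selective = rng.random() < s / (1.0 + s)
        # Step 1: mark each lineage inside B(centre, radius) independently with probability `impact`.
        marked = {i for i, x in enumerate(lineages)
                  if periodic_distance(x, centre, length) <= radius and rng.random() < impact}
        if not marked:
            continue  # no lineage marked: nothing happens
        # Step 2: marked lineages disappear and are replaced by one (neutral) or two (selective)
        # potential ancestors, each placed uniformly at random in the ball.
        lineages = [x for i, x in enumerate(lineages) if i not in marked]
        for _ in range(2 if selective else 1):
            lineages.append(wrap(centre + rng.uniform(-radius, radius), length))
```

With `s = 0` the number of lineages can only decrease (coalescence without branching), and in all cases it never drops to zero, in line with the remark below that the dual never becomes extinct.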
Since we only consider finitely many initial individuals in the sample, the integrability condition (2) guarantees that the jump rate in this process is finite and so this description gives rise to a well-defined process. Note that the process never becomes extinct, since any death is accompanied by the birth of at least one new potential ancestor.
The difficulty that we face in establishing a duality between this system of branching and coalescing lineages and the SLFVS is that, just as in the neutral setting, the SLFVS will only be defined, as a function, Lebesgue a.e., and so the usual test functions used to establish such dualities in population genetics, which take the form $\prod_{i=1}^k w_t(x_i)$ for fixed points $x_1, \ldots, x_k \in E$, will not make sense. However, if, instead of taking deterministic points $x_1, \ldots, x_k$, we take random points, with a distribution which has a density $\psi$ with respect to Lebesgue measure on $E^k$, then the test function is replaced by
$$\int_{E^k} \psi(x_1, \ldots, x_k) \prod_{i=1}^k w_t(x_i)\, dx_1 \cdots dx_k,$$
which is well defined.

Assuming that the spatial Λ-Fleming-Viot process with selection exists, we have the following property.
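Proposition 2.2. The spatial Λ-Fleming-Viot process with selection is dual to the process $(\Xi_t)_{t\ge0}$ in the sense that for every $k \in \mathbb{N}$ and $\psi \in C(E^k) \cap L^1(E^k)$, we have
$$\mathbb{E}_{w_0}\Big[\int_{E^k} \psi(x_1, \ldots, x_k) \prod_{i=1}^k w_t(x_i)\, dx_1 \cdots dx_k\Big] = \int_{E^k} \psi(x_1, \ldots, x_k)\, \mathbb{E}_{\{x_1, \ldots, x_k\}}\Big[\prod_{j=1}^{N_t} w_0(\xi^j_t)\Big]\, dx_1 \cdots dx_k.$$

Remark 2.3. Of course, we are abusing notation here: the expectations on the left and right of this equation are taken with respect to different measures. The subscripts on the expectations are the initial values for the processes on each side.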
Proof of Proposition 2.2. By linearity, there is no loss of generality in supposing that $\psi$ is a probability density on $E^k$, which we shall think of as the distribution of $\Xi_0$. Again by linearity, setting $\prod_{j=1}^{N_t} w_0(\xi^j_t)$ to be zero on the Lebesgue-null set where it is not defined, we then have

Now note that if the distribution of $\Xi_0$ has a density with respect to Lebesgue measure on $E^k$, then conditional on $N_s = l$, the distribution of $\Xi_s$ has a density with respect to Lebesgue measure on $E^l$. Thus, for such random initial conditions for $\Xi_0$, and any times $0 < s < t$, $\mathbb{E}[\prod_{j=1}^{N_s} w_{t-s}(\xi^j_s)]$ will be well defined. To complete the proof, it suffices to evaluate the generator, $G$, of $(\Xi_t)_{t\ge0}$ on an appropriate class of test functions. To simplify notation, in the calculations below, we write $I$ for the set of indices of the lineages in $\Xi_s$ that lie in the region affected by an event. In addition, $|D|$ denotes the cardinality of the set $D$ and $V_{x,r}$ denotes the volume of the ball $B(x, r) \subseteq E$. Fix $w \in \mathcal{M}_\lambda(E \times \{0, 1\})$. Then for any $l \in \mathbb{N}$ and $x_1, \ldots, x_l \in E$,

On the other hand, the generator $L$ of the SLFVS applied to the function

which, by Fubini's Theorem, is equal to the first term of $G\prod_{j=1}^l w(x_j)$ integrated against the measure $\phi(x_1, \ldots, x_l)\, dx_1 \cdots dx_l$. In the same way, expanding the product, we see that the 'selection' term in $LF_\phi$ is equal to the integral of the second term of $G\prod_{j=1}^l w(x_j)$ against the same measure.
For each $0 \le s \le t$, we now partition on the events $\{N_s = l\}$ and apply the calculation above with $\phi \in C(E^l)$ the density function of $(\xi^1_s, \ldots, \xi^l_s)$ conditional on $N_s = l$, to deduce that for every $t \ge 0$, provided that the distribution of $\Xi_0$ has a density with respect to Lebesgue measure,
$$\frac{d}{ds}\,\mathbb{E}\Big[\prod_{j=1}^{N_s} w_{t-s}(\xi^j_s)\Big] = 0, \qquad 0 < s < t,$$
(where the expectation is with respect to the joint distribution of $w_{t-s}$ and $\Xi_s$). This proves the duality between $(w_t)_{t\ge0}$ and $(\Xi_t)_{t\ge0}$.
Later it will be convenient to use a different set of test functions to characterise the SLFVS, namely the set of all functions of the form $F(\langle w, f\rangle)$, with $f \in C_c(E)$ and $F \in C^3(\mathbb{R})$. To ease notation, we define

These quantities describe the value of the process immediately after an event $(t, x, r, u)$ if the parent is of type 0 or type 1 respectively. The generator of the SLFVS on our new set of test functions then takes the form

To see that this corresponds to our previous expression for the generator, observe first that, by a density argument, in Proposition 2.2 we could restrict our attention to functions $\psi$ of the product form $\psi_1(x_1) \cdots \psi_k(x_k)$, with $\psi_i \in C_c(E)$ for every $i$. The corresponding test function then reads $\prod_{i=1}^k \langle \omega, \psi_i\rangle$. But by polarization, such a function can be written as a linear combination of functions of the form $\langle \omega, f\rangle^m$, with $m \in \mathbb{N}$ and $f \in C_c(E)$; for instance, $\langle \omega, \psi_1\rangle\langle \omega, \psi_2\rangle = \frac{1}{4}\big(\langle \omega, \psi_1 + \psi_2\rangle^2 - \langle \omega, \psi_1 - \psi_2\rangle^2\big)$. An easy calculation then shows that our two expressions for $L$ coincide on these test functions. Either of these two sets of test functions characterizes the law of the SLFVS (see e.g. Lemma 2.1 in [VW13] for the first set, while the second assertion is a standard result on random measures), and so we can use them interchangeably. In particular, this guarantees the following corollary to Proposition 2.2.
Corollary 2.4. Suppose that $(\omega_t)_{t\ge0}$ is a Markov process with generator given by equation (8) for every $f \in C_c(E)$ and $F \in C^1(\mathbb{R})$. Then $(\omega_t)_{t\ge0}$ is dual to the system of potential ancestral lineages $(\Xi_t)_{t\ge0}$ with parameters $\mu, \mu', \nu, \nu'$.

Theorems 1.3 and 1.5 have counterparts for the corresponding rescaled dual processes.
Theorem 2.5. For every $n \in \mathbb{N}$, let $(\xi^1_t, \ldots, \xi^{N_t}_t)_{t\ge0}$ be the process of branching and coalescing jump processes which is dual to the unscaled process $(w_t)_{t\ge0}$ with parameters $\mu = \delta_R$, $\mu' = s_n\delta_R$, $\nu_R = \nu'_R = \delta_{u_n}$, where $s_n = \sigma/n^{2/3}$ and $u_n = u/n^{1/3}$. Define the rescaled process $(\Xi^n_t)_{t\ge0}$ so that for every $t \ge 0$,
$$\Xi^n_t = \big(\xi^{n,1}_t, \ldots, \xi^{n,N^n_t}_t\big) := \big(n^{-1/3}\xi^1_{nt}, \ldots, n^{-1/3}\xi^{N_{nt}}_{nt}\big).$$
Then, if $d \ge 2$, as $n \to \infty$, $(\Xi^n_t)_{t\ge0}$ converges in distribution (as a càdlàg process) to a branching Brownian motion $(\Xi^\infty_t)_{t\ge0}$, in which individuals follow independent Brownian motions with variance parameter $u\Gamma_R$ and branch at rate $u\sigma V_R$ into two new particles, started at the location of the parent. When $d = 1$, the corresponding object is a branching and coalescing system with the same diffusion constant and branching rate, but, in addition, each pair of particles, independently, also coalesces at rate $4R^2u^2$ times their local time together.
The result for stable radii has the same flavour.

Theorem 2.6. For every $n \in \mathbb{N}$, let $(\xi^1_t, \ldots, \xi^{N_t}_t)_{t\ge0}$ be the system of branching and coalescing jump processes which is dual to the unscaled process $(w_t)_{t\ge0}$ corresponding to the case of stable radii, with parameters $u_n = u/n^\gamma$ and $s_n = \sigma/n^\delta$. Define the rescaled process $(\Xi^n_t)_{t\ge0}$ in such a way that for every $t \ge 0$,

Then, if the initial condition $\Xi^n_0$ converges weakly towards some $\Xi_0$ as $n \to \infty$, $(\Xi^n_t)_{t\ge0}$ converges in distribution (as a càdlàg process) to a system $(\Xi^\infty_t)_{t\ge0}$ of independent symmetric α-stable processes, which branch at rate $u\sigma V_1/\alpha$ into two particles starting at the location of their parent. The motion of a single particle is described by the generator $D_\alpha$ defined in (4). In addition, when $d = 1$, each pair of particles, independently, coalesces at rate $4u^2/(\alpha - 1)$ times their local time together.
We shall prove these results in §4 and §5 respectively.
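The polarization step used earlier in this subsection, which writes $\prod_{i=1}^k \langle \omega, \psi_i\rangle$ as a linear combination of powers $\langle \omega, f\rangle^m$, can be made explicit through the standard identity $\prod_{i=1}^k a_i = \frac{1}{2^k k!}\sum_{\varepsilon \in \{\pm1\}^k} \big(\prod_i \varepsilon_i\big)\big(\sum_i \varepsilon_i a_i\big)^k$. Applied with $a_i = \langle \omega, \psi_i\rangle$ and using linearity of $f \mapsto \langle \omega, f\rangle$, it gives $m = k$ and $f = \sum_i \varepsilon_i \psi_i \in C_c(E)$. The sketch below checks this numerically; approximating $\langle \omega, f\rangle$ by a Riemann sum on a grid is an assumption made purely for illustration, and the function names are hypothetical.

```python
import itertools
import math

def pairing(w, f, dx):
    """Riemann-sum stand-in for <w, f> = ∫ w(x) f(x) dx on a uniform grid (illustrative assumption)."""
    return sum(wi * fi for wi, fi in zip(w, f)) * dx

def product_via_powers(w, psis, dx):
    """Compute prod_i <w, psi_i> using only powers <w, f>^k with f = sum_i eps_i psi_i."""
    k = len(psis)
    total = 0.0
    for eps in itertools.product((1, -1), repeat=k):
        # By linearity, <w, sum_i eps_i psi_i> = sum_i eps_i <w, psi_i>.
        f = [sum(e * psi[j] for e, psi in zip(eps, psis)) for j in range(len(w))]
        total += math.prod(eps) * pairing(w, f, dx) ** k
    return total / (2 ** k * math.factorial(k))

if __name__ == "__main__":
    grid = [j / 50 for j in range(50)]
    dx = 1 / 50
    w = [0.5 + 0.4 * math.sin(3 * x) for x in grid]  # a frequency profile with values in [0, 1]
    psis = [[math.cos((i + 1) * x) for x in grid] for i in range(3)]
    direct = math.prod(pairing(w, psi, dx) for psi in psis)
    print(direct, product_via_powers(w, psis, dx))
```

Both printed numbers agree up to rounding, which is the content of the polarization argument: the product test functions lie in the linear span of the power test functions.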

Proof of Theorem 1.2
In this section we provide a rather elementary proof of existence of the SLFVS as the unique solution to a martingale problem. We include it here because what it lacks in sophistication, it makes up for in flexibility.
If $E$ has finite volume and $\mu$ and $\mu'$ have finite masses, then events fall at a finite overall rate and the SLFVS is well defined. To extend to arbitrary measures that satisfy condition (2), it is convenient to proceed in two steps:
(i) We show existence when $E$ is relatively compact but $\mu$ and $\mu'$ are only σ-finite.
(ii) Given (i), we extend to (an infinite subset of) $\mathbb{R}^d$ by proving tightness of a sequence of processes obtained by restricting to an increasing family of sets $(E_n)_{n\ge1}$ which exhaust the space.
Proof of (i).
The strategy of the proof is as follows. First we show that the sequence $((w^{(n)}_t)_{t\ge0})_{n\ge1}$ of approximating processes is tight. Now if there is a solution to the martingale problem associated with $(L, D(L))$, then by Corollary 2.4 it is dual to the system of branching and coalescing lineages of Proposition 2.2 and (since, by Lemma 1.1 of [VW13], the test functions of Proposition 2.2 form a separating class) it is unique. This uniqueness, together with the uniform (in $w$) convergence of $L^{(n)}F(\langle w, f\rangle)$ to $LF(\langle w, f\rangle)$, is enough to apply Theorem 4.8.10 of [EK86] and deduce that $(w^{(n)}_t)_{t\ge0}$ in fact converges as $n \to \infty$ to a solution to the $(L, D(L))$-martingale problem. Let us then check tightness and uniform convergence.
The state space $\mathcal{M}_\lambda(E \times \{0, 1\})$, equipped with the topology of vague convergence, is compact (cf. Lemma 1.1 in [VW13]). Therefore, by the Aldous-Rebolledo criterion, we only have to show that for every $f \in C_c(E)$ and every $F \in C^1(\mathbb{R})$, both the finite variation part $\Phi_n$ and the quadratic variation part $\Psi_n$ of the real-valued processes $((F(\langle w^{(n)}_t, f\rangle))_{t\ge0})_{n\ge1}$ form tight sequences. The finite variation part is

and, recalling the definitions of $\Theta^+_{x,r,u}(w)$ and $\Theta^-_{x,r,u}(w)$ from (7), the quadratic variation is

where, as before, $V_{x,r}$ is the volume of $B(x, r)$. Using the expression for $L^{(n)}$ given in (8), a Taylor expansion of $F$ (of class $C^1$), and the fact that the increments of $\langle w^{(n)}, f\rangle$ during an event are proportional to $u$, we obtain that for every $w \in \mathcal{M}_\lambda(E \times \{0, 1\})$,

where we have used that $E$ has finite volume and, by assumption, $\mu_n(dr)\nu^n_r(du) \nearrow \mu(dr)\nu_r(du)$ (and the corresponding statement with primes). By condition (2), the expression on the right-hand side is finite (and independent of $w$), and so we conclude that for every $\varepsilon > 0$ there exists $\delta > 0$ such that, for every $T > 0$,

where $\omega'$ is the Skorokhod modulus of continuity. Similarly, the increments of $\Psi_n(t)$ are bounded by a constant depending only on $F'$ and $f$ times $u^2 r^d$. But $u^2 \le u$, and so the same reasoning shows that the sequence $((\Psi_n(t))_{t\ge0})_{n\ge1}$ is also tight. Tightness of $((w^{(n)}_t)_{t\ge0})_{n\ge1}$ then follows.

Convergence of $L^{(n)}$ to $L$.
Using (9) and the Dominated Convergence Theorem, we see that $L^{(n)}$ converges to the operator $L$ given by

Applying the estimate (10) with the measures $\mu(dr)\nu_r(du) - \mu_n(dr)\nu^n_r(du)$ and $\mu'(dr)\nu'_r(du) - \mu'_n(dr)\nu'^n_r(du)$, we see that the convergence is uniform, as required, and the proof of (i) is complete.
The proof of (ii) follows exactly the same pattern, but now the task of bounding the integrals defining $\Phi_n$ and $\Psi_n$ becomes more delicate. The resolution is to exploit the fact that $f$ has compact support $S_f$. Fix $\mu, \mu', \nu$ and $\nu'$ satisfying (2) and let $\{E_n\}_{n\ge1}$ be a sequence of compact sets increasing to $E$.

In this way, we have a sequence of SLFVS processes, which by an abuse of notation we also denote $(w^{(n)}_t)_{t\ge0}$, with generators $L^{(n)}$ given by

The key observation is that

and Vol{x :

where $C_1$ and $C_2$ are independent of $r$ and depend only on the support of $f$. Moreover, the estimate (14) is uniform in $w$ and, in particular, the same bound holds if we replace $w$ by $1 - w$.
To see how to apply this, consider the neutral part of (13). We split the integral over $(0, \infty)$ at some radius $R_0 > 1$. We have that

To control the second part of the integral corresponding to the neutral part, notice that a simple estimate, using the fact that the corresponding events have radius bounded above by $R_0$, yields

Exactly the same arguments control the selection part of the generator $L^{(n)}$ and, combined with the above, this gives the tightness of the finite variation parts of the processes. As in (i), since $u^2 < u$, tightness of the quadratic variation parts follows easily, and the Aldous-Rebolledo criterion yields tightness of the sequence of processes $((w^{(n)}_t)_{t\ge0})_{n\ge1}$.

To show that $L^{(n)}$ converges to $L$ (given by (13) but with $E_n$ replaced by $E$ in the domain of integration), notice that by condition (2), by taking $R_0$ sufficiently large, the right-hand side of (16) can be made arbitrarily small, independently of $w$, and this is enough to ensure that $L^{(n)}F(\langle w, f\rangle) - LF(\langle w, f\rangle)$ converges to zero, uniformly in $w$, as $n \to \infty$. By the same duality argument (based on Corollary 2.4) that we used in (i), the martingale problem associated with $L$ has at most one solution, and so Theorem 4.8.10 in [EK86] yields the desired convergence. Theorem 1.2 is proved.

Heuristics
In this section we provide an informal justification of our choices for the parameters β, γ and δ. To understand why they lead to nontrivial limits, it is convenient to think about our branching and coalescing dual process.
First consider the case of fixed (or, more generally, bounded variance) radius events. Ignoring for a moment the selective (branching) events, a single ancestral lineage in the rescaled dual makes mean zero, finite variance jumps of size of order $1/n^\beta$ at rate proportional to $nu_n = un^{1-\gamma}$. Thus, provided that $1 - \gamma = 2\beta$, its spatial motion will converge to a Brownian motion as $n \to \infty$. Now consider what happens at a selective event. The two new lineages are born at a separation of order $1/n^\beta$. If we are to 'see' the event, they must move apart to a separation of order one before (perhaps) coalescing. The number of excursions they must make away from the region in which they can both be affected by an event (and thus coalesce) before we can expect to see such a 'long' excursion is of order 1 in $d \ge 3$, of order $\log n$ in $d = 2$ and of order $n^\beta$ in $d = 1$. On the other hand, when they are sufficiently close together that they can be hit by the same event, given that one of them jumps, there is a probability of order $u_n = u/n^\gamma$ that the other one is affected by the same event and so they coalesce. So the number of times they come close to one another before they coalesce is of order $n^\gamma$. Thus, in the limit as $n \to \infty$, for each branching event in the dual, in dimensions at least 2, the probability that there is a long excursion before coalescence (and so we 'see' the event) tends to one. Moreover, the same argument tells us that we will never see coalescence of any other lineages in our system. In one dimension, we can expect to see both branching and coalescence provided that the number of excursions we expect to wait before seeing a coalescence and the number we expect to wait before the lineages escape to a distance of order one, that is $1/u_n$ and $n^\beta$, are comparable. This gives $\beta = \gamma$ and, combining with the condition $1 - \gamma = 2\beta$ above, we find $\beta = \gamma = 1/3$.
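The one-dimensional excursion count can be illustrated with a gambler's-ruin calculation. As a simplification (our assumption, not the SLFV dynamics themselves), replace the separation of two lineages by a simple symmetric random walk on $\{0, 1, \ldots, N\}$ started at $1$, where $0$ stands for 'close enough to be hit by the same event' and $N$ for 'separation of order one after rescaling', so that $N$ plays the role of $n^\beta$. Such a walk reaches $N$ before $0$ with probability $1/N$, so the number of excursions needed before one escapes is geometric with mean $N$, matching the order $n^\beta$ count in $d = 1$. The sketch compares the exact value with a Monte Carlo estimate; the function names are hypothetical.

```python
import random

def escape_probability_exact(n_target: int) -> float:
    """Probability that a simple symmetric walk from 1 hits n_target before 0 (gambler's ruin)."""
    return 1.0 / n_target

def escape_probability_mc(n_target: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    escapes = 0
    for _ in range(trials):
        pos = 1
        while 0 < pos < n_target:
            pos += 1 if rng.random() < 0.5 else -1
        if pos == n_target:
            escapes += 1
    return escapes / trials

if __name__ == "__main__":
    for n_target in (5, 10, 20):
        p = escape_probability_mc(n_target, trials=20_000, seed=1)
        # The number of excursions from 0 until the first escape is geometric with mean 1/p = n_target.
        print(n_target, escape_probability_exact(n_target), p)
```

In $d \ge 3$ the analogous walk is transient, so the escape probability stays bounded away from zero and only order one excursions are needed, which is the contrast drawn in the paragraph above.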
Finally, selection events occur at a rate proportional to $nu_ns_n$ in the rescaled process, and so we choose $\delta = 2/3$ to make this rate of order one.
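The exponent bookkeeping of the fixed radius heuristic can be summarised in a few lines. This is only a restatement of the argument above, assuming, as the requirement that $nu_ns_n$ be of order one indicates, that $s_n$ is proportional to $n^{-\delta}$ (with $u_n = u/n^{\gamma}$ as above):

```latex
\begin{aligned}
&\text{Brownian limit of one lineage:} && n^{1-\gamma}\,\big(n^{-\beta}\big)^2 = n^{1-\gamma-2\beta} \asymp 1 \iff 1-\gamma = 2\beta,\\
&\text{branching vs.\ coalescence } (d=1): && n^{\beta} \asymp 1/u_n = n^{\gamma}/u \iff \beta = \gamma,\\
&\text{hence:} && 1-\beta = 2\beta \iff \beta = \gamma = \tfrac13,\\
&\text{selection of order one:} && nu_ns_n \asymp n^{1-\gamma-\delta} = n^{2/3-\delta} \asymp 1 \iff \delta = \tfrac23.
\end{aligned}
```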
We now turn to the stable case. As before, we first consider the motion of a single rescaled lineage, and we see that if we choose $nu_n = n^{\alpha\beta}$, then in the limit as $n\to\infty$ its motion will converge to a symmetric stable process. Now consider selection. Since $u_n\to0$ as $n\to\infty$, although two lineages can now always be affected by the same event, 'most of the time' they will not be, and their motions are almost independent. Moreover, since 'small' events are so much more frequent than 'big' events, selection events are almost always 'small', and lineages only have a realistic chance of coalescing when they are close together. We now use the same argument as before. The number of excursions away from each other before they are 'visible' under our rescaling is of order $1$ in $d\geq2$ and of order $n^{(\alpha-1)\beta}$ in $d=1$. Equating this to the number of visits together before we expect to see a coalescence event yields $\gamma = (\alpha-1)\beta$. In order to see any selection events at all, we need $nu_ns_n$ to be of order one, so $1-\gamma-\delta=0$. We now have three equations in three unknowns (in one dimension), and solving them gives the values in equation (3).
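The three one-dimensional conditions of the stable case are linear in $(\beta,\gamma,\delta)$ once $\alpha$ is fixed: substituting $\gamma=(\alpha-1)\beta$ into $1-\gamma=\alpha\beta$ gives $\beta = 1/(2\alpha-1)$, hence $\gamma = (\alpha-1)/(2\alpha-1)$ and $\delta = \alpha/(2\alpha-1)$. The Python sketch below (the function name is ours, not the paper's) computes these exponents in exact rational arithmetic and checks them against the three relations; formally setting $\alpha=2$ recovers the fixed radius values $\beta=\gamma=1/3$, $\delta=2/3$. Equation (3) itself is not reproduced here, so this is a consistency check of the heuristic rather than a restatement of the paper's result.

```python
from fractions import Fraction


def stable_exponents(alpha):
    """Solve the heuristic scaling relations of the stable case (d = 1):

        1 - gamma = alpha * beta        (stable motion of one lineage)
        gamma = (alpha - 1) * beta      (branching vs. coalescence)
        1 - gamma - delta = 0           (selection of order one)
    """
    alpha = Fraction(alpha)
    if not 1 < alpha <= 2:
        raise ValueError("expected 1 < alpha <= 2")
    # Substituting the second relation into the first: 1 = (2*alpha - 1) * beta.
    beta = 1 / (2 * alpha - 1)
    gamma = (alpha - 1) * beta
    delta = 1 - gamma
    # All three relations hold exactly, since Fraction arithmetic is exact.
    assert 1 - gamma == alpha * beta
    assert gamma == (alpha - 1) * beta
    assert 1 - gamma - delta == 0
    return beta, gamma, delta


if __name__ == "__main__":
    for a in (Fraction(3, 2), Fraction(7, 4), 2):
        print(a, stable_exponents(a))
```

For example, $\alpha = 3/2$ gives $(\beta,\gamma,\delta) = (1/2, 1/4, 3/4)$.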
Remark 3.1. At first sight, these scalings may not look altogether natural. The reason for this is that the 'timescale' of the SLFV process is not one of generations. Suppose that one thinks of a generation as being the time that it takes for an 'individual' in the SLFV to be affected by a reproduction event. Then a generation is proportional to $1/u$ units of SLFV time. In the 'generation timescale', we are speeding up time by a factor of $nu_n$, and we then recognise the scaling in the fixed radius case as exactly the diffusive rescaling, and the scaling in the stable case as its natural analogue when we have long-range dispersal.
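To make the 'diffusive rescaling' of Remark 3.1 concrete in the fixed radius case: one unit of rescaled time contains of order $nu_n$ generations, and in rescaled space a lineage is displaced by a distance of order $n^{-\beta}$ per generation. The back-of-the-envelope check below uses only relations already stated above:

```latex
\underbrace{nu_n}_{\text{generations per unit time}}\times\underbrace{\big(n^{-\beta}\big)^2}_{\text{squared displacement per generation}}
= n^{1-\gamma-2\beta} = 1
\quad\text{when } 1-\gamma = 2\beta,\ \text{e.g. } \beta=\gamma=\tfrac13 .
```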
Note also that by choosing $1-\gamma = \alpha\beta$ and $1-\gamma-\delta=0$ but with $\gamma > (\alpha-1)\beta$, we could eliminate the coalescence term in one dimension, corresponding to removing the noise term in the forwards-in-time description of allele frequencies.
It turns out to be highly non-trivial to turn these heuristics into a rigorous proof and so, instead, we work with the forwards in time model and deduce convergence of the dual processes as a corollary.
4 Convergence of the rescaled SLFVS and its dual – the fixed radius case

In this section, we prove Theorem 1.3 and, from it, deduce Theorem 2.5. From now on, we shall be concerned with $E=\mathbb{R}^d$, and so there will be no ambiguity in writing $V_r$ for the volume of a ball of radius $r$. Recall our notation $V_r(x,y)$ for the volume of the intersection $B(x,r)\cap B(y,r)$, and that in the fixed radius case all reproduction events have the same radius $R>0$.

Proof of Theorem 1.3.
The proof proceeds in the usual way. First we show that the sequence of rescaled processes is tight, then we identify the possible limit points and finally uniqueness of the limit point guarantees that the whole sequence in fact converges.
1) Tightness. Recall the notation $\Theta^+$ and $\Theta^-$ from (7). Before scaling, the generator of the SLFVS with reproduction events of fixed radius $R$ and parameters $u_n$, $s_n$, acting on functions of the form $F(\langle w,\phi\rangle)$ with $\phi\in C_c(\mathbb{R}^d)$ and $F\in C^1(\mathbb{R})$, takes the form To identify the generator of the rescaled process $(w_{nt}(n^{1/3}\cdot))_{t\geq0}$, observe that for a reproduction event with centre $x$, the value of $w^n_t(y)$ will change iff $n^{1/3}y\in B(x,R)$. Since we also accelerate time by a factor $n$, the generator of $(w^n_t)_{t\geq0}$ is given by where we have performed several changes of variables and we have set $B_n(x) = B(x,n^{-1/3}R)$.
The generator of the process $(\bar w^n_t)_{t\geq0}$ in which we are interested, acting on $F(\langle w,f\rangle)$ for $f\in C_c(\mathbb{R}^d)$ and $F\in C^1(\mathbb{R})$, can be recovered by evaluating the expression above for the generator of $(w^n_t)_{t\geq0}$ with $\phi$ of the form and observing that, by Fubini's Theorem,

Remark 4.1. Assuming that $f$ is twice continuously differentiable, and writing $S_f$ for its compact support, we have where the error term is bounded uniformly in $x$. Consequently, we can write We shall sometimes use this approximation to simplify our computations.
First we must show that the sequence of processes $((w^n_t)_{t\geq0})_{n\geq1}$ is tight. Since the state space of the processes is compact (in the topology of vague convergence), as in Section 2.2, we use the Aldous–Rebolledo criterion to reduce the problem to tightness of the finite variation and quadratic variation parts of the sequences $(F(\langle w^n_t,f\rangle))_{t\geq0}$ for every $F\in C^3(\mathbb{R})$ and $f\in C^2_c(\mathbb{R}^d)$. Fix such $F$ and $f$. Since $(w^n_t)_{t\geq0}$ is a jump process, it is a simple matter to identify the finite variation part, $(\Phi_n(t))_{t\geq0}$, of $(F(\langle w^n_t,f\rangle))_{t\geq0}$.
and its quadratic variation $(\Psi_n(t))_{t\geq0}$ is given by so that both increments are of order $u_nn^{-d/3}$. Moreover, we can use the bounds (14) and (15) to control their integrals.
Using this observation, we first show that $|\Phi_n(t)|$ is bounded by a constant independent of $n$ and $w$. To this end, we write it as the sum of a neutral term and a selective term and perform a Taylor expansion of $F$ (truncating at second order in the neutral term and at first order in the selective term). This yields To control these expressions, one takes a Taylor expansion of $\phi_f$. We illustrate with the term $A_n(s)$. In fact, in identifying the limiting process we shall need a precise expression for the limit of $A_n(s)$, and so we perform the expansion slightly more carefully than would be required simply to conclude boundedness.
Let us write $D\phi_f$ for the vector of first derivatives of $\phi_f$ and $H\phi_f$ for the corresponding Hessian ($H\phi_f = DD\phi_f$). Then Consider the first term on the right. Integrating first with respect to $x$ (using Fubini's Theorem), this term becomes an integral with respect to $dz\,dy$ whose dependence on $z$ enters only through $\mathrm{Vol}(B_n(y)\cap B_n(z))\,D\phi_f(y)\cdot(z-y)$. Since $\mathrm{Vol}(B_n(y)\cap B_n(z))$ depends on $z$ only through $|z-y|$, this is an odd function of $z-y$, and so the integral with respect to $z$ vanishes.
Similarly, the integrals corresponding to the off-diagonal terms in the Hessian vanish, leaving plus a lower order term. Now observe that (from Remark 4.1) In particular, we conclude that $|A_n(s)|\leq C_A$ for every time $s$ and every large $n$. Very similar arguments allow us to control the other terms: and, again by the same arguments, Consequently, for every $s<t$ we have which shows that the finite variation part of $((F(\langle w^n_t,f\rangle))_{t\geq0})_{n\geq1}$ is tight. (In fact, its modulus of continuity is uniformly bounded, and so we actually have tightness in the topology of uniform convergence over compact sets.)
Similarly, an elementary application of (14) yields Notice that this bound is independent of the value of $w^n_s$. Substituting into the definition of $\Psi_n(t)$, we obtain that for every $s<t$, and so the sequence of quadratic variations of $((F(\langle w^n_t,f\rangle))_{t\geq0})_{n\geq1}$ is not only tight but, when $d\geq2$, also tends to $0$ uniformly over compact time intervals.
Combining these results with the Aldous–Rebolledo criterion, we conclude that $((w^n_t)_{t\geq0})_{n\geq1}$ is tight in $D_{\mathcal{M}}[0,\infty)$, as required.
2) Limiting process. We now identify the limiting process. We begin with the case $d\geq2$. Since the quadratic variation vanishes (uniformly in $w$) as $n\to\infty$, any limit of $((w^n_t)_{t\geq0})_{n\geq1}$ should be deterministic. It remains to identify the limit of the finite variation part. Specialising the computation of $\Phi_n$ above to the case $F=\mathrm{Id}$, we have where the constant appearing there depends only on $R$ and $d$. Next, Combining (23) and (25), and using the fact that the terms $B_n(s)$, $C_n(s)$ and $E_n(s)$ tend to zero uniformly in $w$, we conclude that any limit point of $((w^n_t)_{t\geq0})_{n\geq1}$ satisfies, for every $f\in C^\infty_c(\mathbb{R}^d)$ and every $t\geq0$, As a consequence, such a limit is continuous, and since the solution to this problem is unique, we conclude that $((w^n_t)_{t\geq0})_{n\geq1}$ converges in distribution to the process described in (ii) of Theorem 1.3.
We now turn to the case $d=1$. Observe first that (23) and (25) still hold. Consequently, we know that for every $f\in C^\infty_c(\mathbb{R}^d)$ (taking again $F=\mathrm{Id}$, so that $F''=0$), is a zero-mean martingale with quadratic variation Recall that the sequence $((w^n_t)_{t\geq0})_{n\geq1}$ is tight. Theorem 6.3.3 in [EK86] implies that if some subsequence $((w^{\varphi(n)}_t)_{t\geq0})_{n\geq1}$ converges in distribution to a process $(w_t)_{t\geq0}$, then the sequence $((M^{\varphi(n)}_t(f))_{t\geq0})_{n\geq1}$ converges to the (time-changed Brownian motion) solution to where $(B^f_t)_{t\geq0}$ denotes standard Brownian motion. As a consequence, any limit point $(w_t)_{t\geq0}$ of $((w^n_t)_{t\geq0})_{n\geq1}$ satisfies the following system of stochastic differential equations: with initial condition $w_0$. Since this system has a unique solution, Theorem 1.3 is proved.

Proof of Theorem 2.5.
We divide the proof into two parts. The first, and simpler, shows that the only possible limit for $((\Xi^n_t)_{t\geq0})_{n\geq1}$ is the system of branching and coalescing Brownian motions $(\Xi^\infty_t)_{t\geq0}$. The second part, tightness of the sequence $((\Xi^n_t)_{t\geq0})_{n\geq1}$, is rather more involved and will be broken into a number of smaller steps.
Lemma 4.2. The finite-dimensional distributions of the system of scaled processes $((\Xi^n_t)_{t\geq0})_{n\geq1}$ converge as $n\to\infty$ to those of the system of branching and coalescing Brownian motions $(\Xi^\infty_t)_{t\geq0}$ described in the statement of Theorem 2.5. In particular, the only possible limit point for the sequence $((\Xi^n_t)_{t\geq0})_{n\geq1}$ is $(\Xi^\infty_t)_{t\geq0}$.
Proof. By Theorem 1.3, the rescaled forwards-in-time processes $(w^n_t)_{t\geq0}$ converge to the process $(w^\infty_t)_{t\geq0}$ for which, for every $f\in C^\infty_c(\mathbb{R}^d)$, is a martingale, with quadratic variation $0$ when $d\geq2$, and when $d=1$. By Itô's formula, this description suffices to characterize the evolution of any product of the form $\langle w^\infty_t,f_1\rangle\langle w^\infty_t,f_2\rangle\cdots\langle w^\infty_t,f_k\rangle$, for any $k\geq1$ and $f_1,\ldots,f_k\in C^\infty_c(\mathbb{R}^d)$. Now, in Chapter 7 of [Lia09], Liang shows that when $\sigma=0$, any solution to these equations is dual, through the set of functions (5), to a system of independent Brownian motions with variance parameter $u\Gamma_R$, in which, when $d=1$, individuals coalesce (pairwise) at a rate proportional to $u^2V_R^2$ times the local time at $0$ of their separation. This is easily modified to $\sigma>0$, in which case individuals branch into two at rate $u\sigma V_R$.
To be slightly more rigorous, let us describe a particle configuration $\Xi$ by an element of $\mathcal{N}(\mathbb{R}^d)$, the set of all point measures on $\mathbb{R}^d$, through the identification where $f$ is measurable and takes values in $(0,1]$, is sufficient to characterize the law of $\Xi$. Now we use the duality formula (5) and the approximation (20) to write that, for every $k\in\mathbb{N}$ and $\psi_1,\ldots,\psi_k\in C^2_c(\mathbb{R}^d)$, Letting $n\to\infty$ and applying the Dominated Convergence Theorem, we deduce that the only possible limit for $((\Xi^n_t)_{t\geq0})_{n\geq1}$ is the system of particles $(\Xi^\infty_t)_{t\geq0}$. In addition, since (26) holds for every $\psi\in C^2_c((\mathbb{R}^d)^k)$ and every $w_0$, we also obtain convergence of the one-dimensional distributions of $(\Xi^n_t)_{t\geq0}$ to those of $(\Xi^\infty_t)_{t\geq0}$. The generalization to the finite-dimensional distributions is straightforward, since the duality formula (5) holds on any time interval $[s,t]$ (if we replace $w_0$ by $w_s$ and $\xi^j_t$ by $\xi^j_{t-s}$).

Tightness
We now show tightness of the sequence $((\Xi^n_t)_{t\geq0})_{n\geq1}$ for every initial value $x=\{x_1,\ldots,x_k\}\in(\mathbb{R}^d)^k$. We apply Aldous's criterion based on stopping times (see Theorem 1 in [Ald78]). Fix $T>0$ and $f\in C^2_c(\mathbb{R}^d)$ with values in $[0,1]$, and suppose that $(\tau_n)_{n\geq1}$ is any sequence of stopping times bounded by $T$. We must show that for every $\varepsilon>0$ there exists $\delta=\delta(f,T,x)$ such that We shall proceed in a number of steps. First we control the maximum number of particles in $\Xi^n_t$ up to time $T$. Conditional on this, it is easy to control the probability that there is a branching event in an interval of length $\delta$. If we can also show that with high probability there is no coalescence (so that the number of particles in the system does not change), then the problem is reduced to controlling the jumps in a random walk. The most involved step, which is the substance of Proposition 4.4, is showing that there is no accumulation of coalescence events.

Lemma 4.3. Given $\varepsilon>0$, there is $K>0$ such that

Proof
Recall that two particles are created when at least one of the extant particles is affected by a selective event. For a given particle of $\Xi^n_t$, this happens at rate $ns_nV_Ru_n = u\sigma V_R$. Furthermore, the presence of more than one particle in the area affected by the event does not speed up the branching. Consequently, the number of particles in $(\Xi^n_t)_{t\geq0}$ is stochastically bounded by the number of particles in a continuous-time branching process in which particles split (independently of one another) into two offspring at rate $u\sigma V_R$. Since the initial value, $\Xi^n_0$, has finitely many particles, we conclude that there exists $K\in\mathbb{N}$ such that for every $n\in\mathbb{N}$, as required.
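The branching-process domination in the proof of Lemma 4.3 can be made quantitative. In a pure-birth (Yule) process in which each particle splits into two at rate $\lambda$ (here $\lambda = u\sigma V_R$), the population started from one particle is geometric at time $T$ with success probability $e^{-\lambda T}$, so that, started from $k$ particles, it is negative binomial; since the population never decreases, its supremum over $[0,T]$ is its value at time $T$. The Python sketch below (the helper names and the numerical values in the usage line are hypothetical) finds the smallest $K$ whose tail probability is at most $\varepsilon$; it only illustrates why such a $K$ exists.

```python
import math


def yule_tail(k, rate, T, K):
    """P(N_T > K) for a Yule process with split rate `rate` started from k particles.

    N_T is a sum of k independent geometric variables on {1, 2, ...} with
    success probability p = exp(-rate * T), i.e. a negative binomial variable.
    """
    p = math.exp(-rate * T)
    cdf = sum(
        math.comb(j - 1, k - 1) * p**k * (1 - p) ** (j - k)
        for j in range(k, K + 1)
    )
    return max(0.0, 1.0 - cdf)


def smallest_K(k, rate, T, eps):
    """Smallest K >= k such that P(sup_{t <= T} N_t > K) = P(N_T > K) <= eps."""
    K = k
    while yule_tail(k, rate, T, K) > eps:
        K += 1
    return K


if __name__ == "__main__":
    # Hypothetical values: 3 initial lineages, u * sigma * V_R = 1, horizon T = 1.
    print(smallest_K(3, 1.0, 1.0, 0.01))
```

With $k=1$ and $\lambda T = \log 2$ the tail is exactly $2^{-K}$, so $\varepsilon = 0.01$ requires $K=7$.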
From now on, all our calculations will proceed conditional on the event $A_n = \{\sup_{0\leq t\leq T}|\Xi^n_t|\leq K\}$. From our calculations above, we already see that for any $t\in[0,T]$, conditional on $A_n$, the probability that at least one particle is created during the time interval $[t,t+\delta]$ is bounded by $K\,\mathbb{P}[\text{a given particle branches in } [t,t+\delta]]$. This bound is uniform, and so there exists $\delta_1>0$ such that $\mathbb{P}_x[\text{at least 1 particle created in } [\tau_n,\tau_n+\delta_1];\,A_n]\leq\varepsilon/4$.
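Spelled out, the bound used for $\delta_1$ is the following elementary estimate, in which each of the at most $K$ particles carries its own exponential branching clock of rate $u\sigma V_R$ (as in the proof of Lemma 4.3). The explicit value of $\delta_1$ is our choice and is only one admissible option:

```latex
\mathbb{P}_x\big[\text{at least 1 particle created in } [\tau_n,\tau_n+\delta];\,A_n\big]
\le K\big(1-e^{-u\sigma V_R\,\delta}\big) \le Ku\sigma V_R\,\delta \le \frac{\varepsilon}{4}
\quad\text{for } \delta \le \delta_1 := \frac{\varepsilon}{4Ku\sigma V_R}.
```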
We also want to control the probability of coalescence events. Because of the calculation above, it is enough to do so in the absence of branching.
Proposition 4.4. Let $B^c_\delta$ denote the event that there is no branching event in $[\tau_n,\tau_n+\delta]$. There exists $\delta_3>0$ such that

Before proving Proposition 4.4, let us turn to the final ingredient in the proof and control the jumps of a single lineage.
Conditional on the number of individuals not changing during a time interval of length $\delta$, we can index the particles of $\Xi^n_{\tau_n}$ and $\Xi^n_{\tau_n+\delta}$ by a common indexing set, which we denote $I_n$. Under this assumption, recalling that $f\leq1$, a Taylor expansion yields for some $C>1$. Since, on the event $A_n$, $|I_n|\leq K$, it suffices to consider the motion of each lineage separately. From the description in Section 2.1, after rescaling of time and space, $\xi^{n,1}$ jumps at rate $nu_nV_R(1+s_n) = n^{2/3}uV_R(1+o(1))$ to a new location whose distribution is symmetric about its current location. Since the locations of the lineage both before and after the jump belong to the same ball of radius $Rn^{-1/3}$, the length of the jump is bounded by $2Rn^{-1/3}$. Moreover, since the distribution of the new location depends on $n$ only through the spatial scaling by $n^{-1/3}$, it follows that the motion $(\xi^{n,1}_t)_{t\geq0}$ converges to Brownian motion. Doob's Maximal Inequality then implies that there exists $C_1>0$ such that for $n$ sufficiently large and any $\delta,\eta>0$, where we have used the strong Markov property of $\xi^{n,1}$ at time $\tau_n$. Together with (31) and the choice $\eta = \varepsilon/(KC\|\nabla f\|)$, this shows that there exists $\delta_2>0$ such that for $n$ large enough, writing $C^c_\delta$ for the event that there is no coalescence in $[\tau_n,\tau_n+\delta]$,

Remark 4.5. Before proving Proposition 4.4, let us remark that it is not enough to consider lineages at an initial separation of order $O(1)$ (or $O(n^{1/3})$ before rescaling). In particular, when two particles are created through a selective event, their (rescaled) initial distance is of order $O(n^{-1/3})$, and so we also need to control the coalescence of particles starting from very small initial separations.
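One way to see the Doob step in the single-lineage argument above, sketched under the description of the lineage given there (jumps at rate $nu_nV_R(1+s_n)$, symmetric about the current position, of length at most $2Rn^{-1/3}$): after the stopping time $\tau_n$, the displacement $M_t = \xi^{n,1}_{\tau_n+t}-\xi^{n,1}_{\tau_n}$ is a martingale with orthogonal increments, so Doob's $L^2$ maximal inequality applied to the submartingale $|M_t|$ gives a bound in which $n$ cancels:

```latex
\mathbb{P}\Big[\sup_{0\le t\le\delta}|M_t|>\eta\Big]
\le \frac{\mathbb{E}|M_\delta|^2}{\eta^2}
\le \frac{\delta\,nu_nV_R(1+s_n)\,\big(2Rn^{-1/3}\big)^2}{\eta^2}
= \frac{4R^2uV_R(1+s_n)\,\delta}{\eta^2}
\le \frac{C_1\,\delta}{\eta^2}.
```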

Proof of Proposition 4.4.
It suffices to consider just two particles and to find $\delta_3>0$ such that the probability that they coalesce in a time interval of length $\delta_3$ is bounded by $\varepsilon/(2K(K-1))$, irrespective of their initial separation. Once this bound has been established, we can write since, on the event $A_n$, there are at most $K(K-1)/2$ pairs of particles at any time.
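The union bound behind this step is simply the following count, using the per-pair bound $\varepsilon/(2K(K-1))$ just stated:

```latex
\mathbb{P}_x\big[\text{some pair coalesces in } [\tau_n,\tau_n+\delta_3];\,B^c_{\delta_3};\,A_n\big]
\le \frac{K(K-1)}{2}\cdot\frac{\varepsilon}{2K(K-1)} = \frac{\varepsilon}{4}.
```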
Recall that before scaling, each lineage jumps at rate proportional to $u_n$. This makes it convenient to work on the timescale $(n^{1/3}t,\,t\geq0)$ and without rescaling space. We shall write $\tilde\xi^{n,i}_t = \xi^i_{n^{1/3}t}$, $i\in\{1,2\}$. When $\tilde\xi^{n,1}$ and $\tilde\xi^{n,2}$ are separated by more than $2R$, they cannot be contained in the same reproduction event, and so they evolve independently of one another. The $i$th lineage jumps at rate $n^{1/3}u_nV_R(1+s_n) = uV_R(1+o(1))$ to a new location, which is uniformly distributed over the ball $B(Z,R)$, where $Z$ itself is chosen uniformly at random from $B(\tilde\xi^{n,i},R)$. In what follows, we only need that the jump made by each lineage is an independent realization of a random variable $X$ taking values in $B(0,2R)$, whose distribution is symmetric about the origin.
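For intuition, here is a minimal Monte Carlo sketch of the jump of a single unscaled lineage as described above: the centre $Z$ of the event is uniform in $B(\xi,R)$ and the new location is uniform in $B(Z,R)$. The helper names are hypothetical, and uniform points in a ball are drawn by the standard method (Gaussian direction, radius $R\,U^{1/d}$). In $d=1$ the jump is the sum of two independent uniforms on $[-R,R]$, so it lies in $[-2R,2R]$, is symmetric and has variance $2R^2/3$, which the sample should reproduce approximately.

```python
import math
import random


def uniform_in_ball(center, R, rng):
    """Uniform point in the Euclidean ball B(center, R) in dimension len(center)."""
    d = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))   # non-zero with probability one
    radius = R * rng.random() ** (1.0 / d)    # P(radius <= r) = (r / R)^d
    return [c + radius * x / norm for c, x in zip(center, g)]


def lineage_jump(xi, R, rng):
    """New position of a lineage at xi that is affected by a radius-R event."""
    z = uniform_in_ball(xi, R, rng)    # centre of the reproduction event
    return uniform_in_ball(z, R, rng)  # location of the parent within the event


if __name__ == "__main__":
    rng = random.Random(1)
    R = 1.0
    jumps = [lineage_jump([0.0], R, rng)[0] for _ in range(50_000)]
    mean = sum(jumps) / len(jumps)
    var = sum(x * x for x in jumps) / len(jumps) - mean**2
    print(f"max |jump| = {max(map(abs, jumps)):.3f}, mean = {mean:.4f}, var = {var:.4f}")
```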
On the other hand, when $|\tilde\xi^{n,1}-\tilde\xi^{n,2}|<2R$, the two particles can both lie in a region affected by a given reproduction event, and their jumps become correlated. In particular, if they are both affected by this event, they merge. The generator of $((\tilde\xi^{n,1}_t,\tilde\xi^{n,2}_t))_{t\geq0}$ takes the form $\cdots\frac{1}{V_R}\big(f(z,\tilde\xi^{n,2})+f(\tilde\xi^{n,1},z)-2f(\tilde\xi^{n,1},\tilde\xi^{n,2})\big)\,dz\,dx$ We can think of this as composed of two parts: the process $((\tilde\xi^{n,1}_t,\tilde\xi^{n,2}_t))_{t\geq0}$ whose generator is determined by the first three lines above, on top of which a coalescence event occurs at instantaneous rate $u^2n^{-1/3}(1+s_n)V_R(0,\tilde\xi^{n,1}_t-\tilde\xi^{n,2}_t)$ (recall that $V_R(0,a)$ is the volume of the intersection $B(0,R)\cap B(a,R)$).
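The coalescence rate above is proportional to the overlap volume $V_R(0,a)$ of two balls of radius $R$ at separation $|a|$. For reference, the standard closed forms in $d=1,2,3$ (overlap of intervals, circular lens, spherical lens) are sketched below in Python; both functions are hypothetical helpers, not part of the paper. The overlap vanishes for $|a|\geq2R$, which is why only the time spent by the difference walk in $B(0,2R)$ matters.

```python
import math


def intersection_volume(d, R, a):
    """Volume of B(0, R) ∩ B(a, R) in dimension d = 1, 2, 3, with a = |a| >= 0."""
    a = abs(a)
    if a >= 2 * R:
        return 0.0  # the balls are at most tangent: no overlap
    if d == 1:
        return 2 * R - a
    if d == 2:
        # Area of the symmetric lens formed by two discs of radius R.
        return 2 * R**2 * math.acos(a / (2 * R)) - 0.5 * a * math.sqrt(4 * R**2 - a**2)
    if d == 3:
        # Volume of the symmetric lens formed by two balls of radius R.
        return math.pi * (4 * R + a) * (2 * R - a) ** 2 / 12
    raise ValueError("this sketch only covers d = 1, 2, 3")


def coalescence_rate(d, R, a, u, n, s_n):
    """Instantaneous rate u^2 n^(-1/3) (1 + s_n) V_R(0, a) quoted in the text."""
    return u**2 * n ** (-1.0 / 3.0) * (1 + s_n) * intersection_volume(d, R, a)
```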
With this description, the probability that the two particles have not coalesced by time $\delta n^{2/3}$ (which corresponds to a time span of $\delta$ on the timescale of $\xi^{n,i}$) is given by where we have written $\tilde T$ for the coalescence time of the two particles.
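Because coalescence is superimposed on the non-coalescing pair motion at the instantaneous rate $\lambda_n(a) = u^2n^{-1/3}(1+s_n)V_R(0,a)$, the survival probability has the usual killing (Feynman–Kac) form. The sketch below records it, together with the elementary splitting that underlies choosing $\eta$ through $\mathbb{P}[\mathrm{Exp}(1)\leq\eta]$ in the argument that follows; the expectation is with respect to the pair process without coalescence, and $\lambda_n$, $\Lambda_t$ are notation introduced only for this sketch:

```latex
\begin{aligned}
\Lambda_t &:= \int_0^{t}\lambda_n\big(\tilde\xi^{n,1}_s-\tilde\xi^{n,2}_s\big)\,ds,
\qquad \mathbb{P}_x\big[\tilde T>t\big] = \mathbb{E}_x\big[e^{-\Lambda_t}\big],\\
\mathbb{P}_x\big[\tilde T\le\delta n^{2/3}\big] &= \mathbb{E}_x\big[1-e^{-\Lambda_{\delta n^{2/3}}}\big]\\
&\le \mathbb{P}_x\Big[\int_0^{\delta n^{2/3}}V_R\big(0,\tilde\xi^{n,1}_s-\tilde\xi^{n,2}_s\big)\,ds>\frac{\eta\,n^{1/3}}{u^2(1+s_n)}\Big]
+\mathbb{P}\big[\mathrm{Exp}(1)\le\eta\big].
\end{aligned}
```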
Since $V_R(0,x)=0$ when $|x|\geq2R$, it just remains to establish how much time $\tilde\xi^{n,1}-\tilde\xi^{n,2}$ spends in the ball $B(0,2R)$ by time $\delta n^{2/3}$. To do this, we define two sequences of stopping times, $(\sigma^n_k)_{k\geq1}$ and $(\tau^n_k)_{k\geq1}$, by and for every $k\geq1$, Now, we have the following result.
Lemma 4.6. There exists $C>0$ such that for every $n,k\geq1$, In words, although the two particles are correlated when they are close together, each 'incursion' of $\tilde\xi^{n,1}-\tilde\xi^{n,2}$ inside $B(0,2R)$ lasts only $O(1)$ units of time, uniformly in $n$. The proof of Lemma 4.6 is similar to that of Lemma 6.6 in [BEV10]: it is based on the facts that the difference walk jumps at a rate bounded from below by a positive constant, independent of its current value, and that the probability that this jump increases the separation enough for $\tilde\xi^{n,1}_t-\tilde\xi^{n,2}_t$ to leave $B(0,2R)$ is also bounded from below by a positive constant. We therefore omit it here.
Outside $B(0,2R)$, the difference $\tilde\xi^{n,1}_t-\tilde\xi^{n,2}_t$ has the same law as a symmetric random walk, with jumps of size at most $2R$, jumping at rate $2uV_R(1+s_n)$. Its behaviour will be determined by the spatial dimension.

$d\geq3$: When $d\geq3$, transience of the random walk guarantees that the number of times $\tilde\xi^{n,1}-\tilde\xi^{n,2}$ returns to $B(0,2R)$ is a.s. finite. Since the parameter $n$ appears only in the jump rates and not in the embedded chain of locations (during an excursion outside $B(0,2R)$), the probability that the difference walk enters $B(0,2R)$ at least $k$ times decays to $0$, uniformly in $n$, as $k\to\infty$. Together with Lemma 4.6 and the fact that $V_R(0,\cdot)$ is bounded, this shows that for every $\eta>0$,
$$\lim_{n\to\infty}\mathbb{P}_x\Big[\int_0^{\delta n^{2/3}}V_R\big(0,\tilde\xi^{n,1}_s-\tilde\xi^{n,2}_s\big)\,ds>\frac{\eta\,n^{1/3}}{u^2(1+s_n)}\Big]=0.$$
As a consequence, coming back to (34) and choosing $\eta$ small enough that $\mathbb{P}[\mathrm{Exp}(1)\leq\eta]\leq\varepsilon/12$, we can conclude that for any $\delta>0$, uniformly in $x\in(\mathbb{R}^d)^2$.

$d=2$: When $d=2$, we claim that there exists $C'>0$, independent of $n$, such that for every $x_1,x_2$ with $|x_1-x_2|>2R$, The proof of this claim is very similar to the beginning of the proof of Lemma 4.2 in [BEV12], and so we only sketch the main ideas. We can a.s. embed the trajectories of the difference process $\tilde\xi^{n,1}_t-\tilde\xi^{n,2}_t$ into the trajectories of a two-dimensional Brownian motion, in the same spirit as Skorokhod's embedding in one dimension (see e.g. [Bil95]). Now, since the jumps of the difference process (when outside $B(0,2R)$) are rotationally invariant, we have
$$\inf_{|x_1-x_2|>2R}\mathbb{P}_{\{x_1,x_2\}}\big[\tilde\xi^{n,1}-\tilde\xi^{n,2}\text{ leaves }B(0,4R)\text{ before entering }B(0,2R)\big]>0,$$
and the result then follows from that for Brownian motion, namely Theorem 2 in [RR66] applied with $a=2R$ and $r\geq4R$. As a consequence, the number $N^n_E$ of excursions outside $B(0,2R)$ that the difference walk makes before starting an excursion of (time) length at least $\delta n^{2/3}$ is stochastically bounded by a geometric random variable with success probability $C/\log(\delta n^{2/3})$.
Now, once the difference walk has started such a long excursion (say, the $k$th one), it is certain not to come back within $B(0,2R)$ before time $\delta n^{2/3}$, and the number of incursions in $B(0,2R)$ in the time interval $[0,\delta n^{2/3}]$ is bounded by $k$. Thus, fixing $\eta>0$ as before, we obtain that where the last inequality uses first the stochastic bound on $N^n_E$ and then Markov's inequality. Choosing $C^n_E=\log n$, for instance, we deduce that for any $\delta>0$,
$$\lim_{n\to\infty}\mathbb{P}_x\Big[\int_0^{\delta n^{2/3}}V_R\big(0,\tilde\xi^{n,1}_s-\tilde\xi^{n,2}_s\big)\,ds>\frac{\eta\,n^{1/3}}{u^2(1+s_n)}\Big]=0,$$
and we conclude as in (35).

$d=1$: Finally, when $d=1$ it is shown in [PS71] that there exists $C'>0$ such that for every Proceeding as before, and with the same notation, we therefore have Choosing $C^n_E$ to be a constant large enough for the first term to be less than $\varepsilon/12$, and then $\delta_3>0$ small enough for the second term to be less than $\varepsilon/12$, we obtain that for any $\delta\leq\delta_3$, We have now proved the desired bound for (30) in any dimension, and the proof of Proposition 4.4 is complete.

5 Convergence of the SLFVS and its dual – the stable radii case
Recall the notation $V_r$ for the volume of a ball of radius $r$ in $\mathbb{R}^d$, and $V_r(x,y)$ for the volume of the intersection $B(x,r)\cap B(y,r)$.
Proof of Theorem 1.5.

1) Tightness.
We shall use the same method as in the proof of Theorem 1.3, but the computations required will be different. Recall the notation $\Theta^+_{x,r,u_n}(w)$ and $\Theta^-_{x,r,u_n}(w)$ from (7). The generator of the unscaled process is To make the expressions easier to read, we retain the notation $\beta$, $\gamma$ and $\delta$ from (3). The generator of the rescaled process is then given by To simplify notation, we shall show the tightness of $((F(\langle w^n_t,f\rangle))_{t\geq0})_{n\geq1}$. This implies that the sequence $((F(\langle\bar w^n_t,f\rangle))_{t\geq0})_{n\geq1}$ in which we are interested is tight, on replacing $f$ by $\phi_f$ defined by and exploiting the bound where $|\delta_n(w,f)|$ is bounded by a constant $\eta(f)>0$ uniformly in $n$ and $w$. Just as in the previous section, we write $(\Phi_n(t))_{t\geq0}$ for the finite variation part of $(F(\langle w^n_t,f\rangle))_{t\geq0}$ and $(\Psi_n(t))_{t\geq0}$ for its quadratic variation. As before, it is convenient to split $G_nF(\langle w,f\rangle)$ into its neutral and selective components. Using a Taylor expansion of the function $F$, we obtain that the neutral part is equal to where Consider the first term on the right-hand side of (40). Since $1-\alpha\beta-\gamma=0$, we have $n^{1-\alpha\beta-\gamma}=1$. We split the integral over the radii into the sum of the integrals over $[n^{-\beta},1]$ and $[1,\infty)$. By using a Taylor expansion of $f$ and a symmetry argument to cancel the integral of $(z-y)\,dz$, we obtain that for some constants $C,C',C''>0$. To control the integral over radii in $[1,\infty)$, the cruder bound $|f(y)-f(z)|\leq2\|f\|$ suffices and, using (15), we have again for some constants $C$, $C'$, $C''$ and $C'''$ which depend only on $d$, $F$ and $f$.
To control the second term on the right-hand side of (40), we use the bounds (14) and (15) to see that it is bounded by When $d\geq2$, $d-\alpha>0$, and so this bound tends to $0$ as $n\to\infty$. When $d=1$, $(\alpha-1)\beta-\gamma=0$, and so this term is bounded by a constant as $n\to\infty$. The same calculation shows that $\varepsilon_n\to0$, uniformly in $w$, as $n\to\infty$. As a consequence, in any dimension the absolute value of the neutral term of $G_nF(\langle w,f\rangle)$ is bounded by a constant independent of $n$ and $w$.
Proceeding in the same way as for the second term above, we obtain that the selection term of the generator is bounded by (recall that $1-\alpha\beta-\gamma=0$) since $\alpha\beta-\delta=0$. Together with our bounds on the neutral term, just as in the corresponding part of the proof of Theorem 1.3, this shows the tightness of the sequence of finite variation parts of $(F(\langle w^n_t,f\rangle))_{t\geq0}$.
For the quadratic variation part, a similar analysis yields that the integrand in $\Psi_n(t)$ is bounded by which is bounded by a constant independent of $n$ and $w$. As before, we conclude that the quadratic variation part of $(F(\langle w^n_t,f\rangle))_{t\geq0}$ is tight. By the same arguments as in the proof of Theorem 1.3, we conclude that the sequence $((w^n_t)_{t\geq0})_{n\geq1}$ is tight in $D_{\mathcal{M}}[0,\infty)$.

2) Identifying the limit.
Recall that the generator of $(\bar w^n_t)_{t\geq0}$ applied to the test function $\langle\cdot,f\rangle$ can be obtained by applying the generator of $(w^n_t)_{t\geq0}$ (see (37)) to the function $\langle\cdot,\phi_f\rangle$, where $\phi_f$ is given by (38). First, let us find the limit of its neutral component. Since $1-\alpha\beta-\gamma=0$, the prelimit takes the form But, just as in Remark 4.1, a simple Taylor expansion gives us that where $B_n(\cdot)=B(\cdot,n^{-\beta})$ and the error term is uniform in $y$ and $z$. Since as $n\to\infty$, we can conclude that, up to a vanishing error term, the neutral part of the generator is given by Now, our computations in the paragraph on tightness imply that the function $a_n(y):=$ is a continuous function, uniformly bounded in $y$ and $n$. Hence, up to a vanishing error term, we can first replace $w^n$ by $\bar w^n$ in (41) and, second, use dominated convergence to pass to the limit as $n\to\infty$ in the expression for $a_n$. Doing so, we obtain that the limit of the neutral term in the generator of $(\bar w^n_t)_{t\geq0}$, applied to some $\bar w$, is equal to where, as in (4),

Lemma 5.1. Writing $D^\alpha$ for the operator defined in (42), $D^\alpha$ is the generator of a symmetric $\alpha$-stable process $(\zeta_t)_{t\geq0}$.
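As an illustration of the scaling argument used in the proof of Lemma 5.1, consider the model case of a rotationally invariant operator with Lévy density $c\,|y|^{-d-\alpha}$, $\alpha\in(0,2)$. This explicit form is an assumption made only for this sketch (the operator $D^\alpha$ of (42) is written in terms of the radii of events). Writing $\mathcal{D}_b$ for the generator of $(b^{-1/\alpha}\zeta_{bt})_{t\geq0}$, the change of variables $w=b^{1/\alpha}v$ shows that the powers of $b$ cancel:

```latex
\begin{aligned}
\mathcal{D}f(x) &= \frac12\int_{\mathbb{R}^d}\big(f(x+w)+f(x-w)-2f(x)\big)\,\frac{c\,dw}{|w|^{d+\alpha}},\\
\mathcal{D}_bf(x) &= b\cdot\frac12\int_{\mathbb{R}^d}\big(f(x+b^{-1/\alpha}w)+f(x-b^{-1/\alpha}w)-2f(x)\big)\,\frac{c\,dw}{|w|^{d+\alpha}}\\
&= b\cdot b^{d/\alpha}\cdot b^{-(d+\alpha)/\alpha}\;\frac12\int_{\mathbb{R}^d}\big(f(x+v)+f(x-v)-2f(x)\big)\,\frac{c\,dv}{|v|^{d+\alpha}}
= \mathcal{D}f(x).
\end{aligned}
```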

Proof
It is reassuring to first check that this is the generator of a well-defined Lévy process: To verify that the associated Lévy process is a symmetric stable process, we check the scaling property. The generator of $(b^{-1/\alpha}\zeta_{bt})_{t\geq0}$ is given by But a simple change of variables gives us that and so $D^\alpha_b=D^\alpha$ for all $b>0$. This shows the desired property of $D^\alpha$.

Having identified the neutral part of the limit, we now turn to the selection part. It is given by Now, the term which is linear in $w^n$ is easy to deal with: by Fubini's theorem, it is equal to where the last equality uses the fact that $\alpha\beta-\delta=0$. Similar calculations show that the 'quadratic' term in (43) is equal to Suppose for now that we have the following lemma (whose proof is quite technical and is given in Appendix A).
Lemma 5.2. Under the conditions of Theorem 1.5, for every $x\in\mathbb{R}^d$, $t\geq0$ and $r\in[n^{-\beta},n^{-\beta}\log n]$,

From this result, we can conclude, by a simple dominated convergence argument and a Taylor expansion of $\phi_f$, that along any trajectory of $w^n$, the 'quadratic' part of (43) is equal to So far we have shown that is a martingale. As a consequence, any limit point $(w^\infty_t)_{t\geq0}$ of $(w^n_t)_{t\geq0}$ should satisfy: for every $f\in C^\infty_c$ and every $t\geq0$, where $(M_t(f))_{t\geq0}$ is a zero-mean martingale. As in the fixed radius case, when $d\geq2$ our computations in the paragraph on tightness show that $M(f)\equiv0$, and (45) characterizes any limit point of $((w^n_t)_{t\geq0})_{n\geq1}$. When $d=1$, the instantaneous quadratic variation of $\langle w^n_t,f\rangle$ is given by where we have used the estimates obtained in the paragraph on tightness, which show that selective events and neutral events with radii greater than $n^{-\beta}\log n$ do not contribute.
Writing $\bar w^{n,r}(x)$ for the average value of $w^n$ over $B(x,r)$ (so that $\bar w^{n,n^{-\beta}}(x)=\bar w^n(x)$), essentially the same techniques show that the integral above is equal to But the first term on the right-hand side of (46) is equal to while the other two terms tend to $0$ by Lemma 5.2 and dominated convergence. As a consequence, any limit of $((w^n_t)_{t\geq0})_{n\geq1}$ satisfies the martingale problem stated in Theorem 1.5. Uniqueness follows from duality with the limiting process of Theorem 2.6, as before. This completes the proof of Theorem 1.5.
Finally, let us prove the convergence of the rescaled dual process.
Proof of Theorem 2.6.
Most of the proof is identical to that of Theorem 2.5. That the only possible limit for $(\Xi^n_t)_{t\geq0}$ is the system of branching (and, in one dimension, coalescing) symmetric $\alpha$-stable processes described in the theorem again follows from an adaptation of Chapter 7 of [Lia09], in which the only change is that Brownian motion is replaced by the stable process generated by $D^\alpha$ (see (42)).
Next, we have to show that the sequence $((\Xi^n_t)_{t\geq0})_{n\geq1}$ is tight. As in the proof of Theorem 2.5, we shall use Aldous' criterion based on stopping times. That is, we fix an initial value $x$, a time $T>0$ and a function $f\in C^2_c(\mathbb{R}^d)$ with values in $[0,1]$, and show that if $(\tau_n)_{n\geq1}$ is a sequence of stopping times bounded by $T$, then for every $\varepsilon>0$ there exists $\delta=\delta(f,T,x)>0$ such that Again, we proceed in three steps. First, by exactly the same arguments as in the proof of Theorem 2.5, and retaining the notation from that setting, there exists $K>0$ such that for every $n\in\mathbb{N}$, Furthermore, there exists $\delta_1>0$, independent of the subinterval of $[0,T]$ considered, such that $\mathbb{P}_x[\text{at least 1 particle created in } [\tau_n,\tau_n+\delta_1];\,A_n]\leq\varepsilon/4$.
As before, the difficulty will be to control the coalescence, but suppose for a moment that there is no change in the number of particles in the interval $[\tau_n,\tau_n+\delta]$, and write $I_n$ for the indexing set of the particles in $\Xi^n_{\tau_n}$. Then, exactly as before, and so it suffices to consider the motion of a single lineage. This is slightly more involved than in the fixed radius case. Let $(Z^n_t)_{t\geq0}$ be a Lévy process, independent of $(\xi^n_t)_{t\geq0}$ and with generator then the process $(X_t)_{t\geq0}$ defined by $X_t=\xi^n_t+Z^n_t$ has generator $(1+s_n)D^\alpha$, where $D^\alpha$ was shown in Lemma 5.1 to be the generator of a symmetric stable process. Using the strong Markov property and standard results on the growth of Lévy processes (see e.g. [Pru81]), we have, for $\eta,\delta>0$, for a constant $C$ which is independent of $\eta$ and $\delta$. Since $Z^n_{\tau_n+t}-Z^n_{\tau_n}>\eta\to0$, as $\delta\to0$. Now, by construction, $(Z^n_t)_{t\geq0}$ is a Lévy process whose generator $D_n$ satisfies, for $f\in C^2(\mathbb{R}^d)$, where the Taylor expansion is justified since $V_r(x,y)=0$ if $|x-y|>2r$ and we are concentrating on radii $r\leq n^{-\beta}$, and the first integral on the right-hand side of (52) vanishes by rotational symmetry. The process $(Z^n_t)_{t\geq0}$ has finite quadratic variation, whose time derivative when $Z^n_t=x$ is obtained by integrating $|y-x|^2\,dy\,dr$ against the jump intensity of $Z^n$, and equals $C'n^{-\beta(2-\alpha)}$.
Hence, we can also conclude that for a fixed $\eta$ and any $\delta$, Coming back to (51), this means that we can find $\delta_2>0$ such that for $n$ large enough, Choosing $\eta=\varepsilon/(KC\|\nabla f\|)$ and recalling (50), we obtain that for all sufficiently large $n$, Exactly as in the fixed radius case, tightness will be proven if we can show that coalescence events cannot accumulate. In particular, since we have controlled the total number of particles and the probability of branching, we just need to control the probability that two lineages coalesce. The result will be based on the following lemma.
Lemma 5.3. Let $(\xi^1_{n^\gamma t})_{t\geq0}$ and $(\xi^2_{n^\gamma t})_{t\geq0}$ be two independent copies of the motion of a single (unscaled) lineage on the timescale $(n^\gamma t,\,t\geq0)$, and let $\zeta^n_t=\xi^2_{n^\gamma t}-\xi^1_{n^\gamma t}$ denote their difference. Then, for every $t\geq0$ we have:
(i) When $d=1$, there exists $C(t)>0$ such that Furthermore, the function $t\mapsto C(t)$ can be chosen such that $C(t)\downarrow0$ as $t\to0$.
(ii) When $d\geq2$,
$$\lim_{n\to\infty}\mathbb{E}\Big[\frac{1}{n^\gamma}\int_0^{n^{1-\gamma}t}\frac{1}{2^\alpha\vee|\zeta^n_s|^\alpha}\,ds\Big]=0.$$
We defer the proof of Lemma 5.3 until after the end of the proof of Theorem 2.6. Suppose that we start with two lineages at some (unscaled) separation $z_0\in\mathbb{R}^d$. As before, we work on the timescale $n^\gamma$, so that a single lineage jumps at rate $O(1)$, and suppose the two lineages $\xi^1$ and $\xi^2$ are currently at locations $0$ and $z$ (in fact, only their separation matters). Then the generator $A$ of the difference walk $(\xi^2_{n^\gamma t}-\xi^1_{n^\gamma t})_{t\geq0}$ is equal to where $\Delta$ is a cemetery state, corresponding to the two walks having coalesced. As a consequence, until coalescence we can couple the difference walk (on the timescale $n^\gamma$) with the difference $(\zeta^n_t)_{t\geq0}$ between two independent random walks, each jumping according to the law of a single lineage, but with each jump $z\to y$ 'cancelled' with a probability given by an integral over the radii $r$ involving $V_r(y,z)/V_r$.
(One can check that these two descriptions give rise to the same jump times and embedded chain.) Each time we cancel a jump, with probability one half it corresponds to a coalescence in the original system; the key point, however, is that if there are no cancelled jumps, then there was no coalescence.
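The geometric input behind the cancellation probability is the overlap ratio $V_r(0,z)/V_r$. If the centre of an event of radius $r$ affecting a lineage at $0$ is uniform on $B(0,r)$ (an assumption made here for illustration), this ratio is the chance that the same event also covers a lineage at $z$. It vanishes once $|z| \ge 2r$, so cancellations require the two lineages to be close relative to the event radius. The sketch below evaluates the ratio in $d=1$ and $d=2$ (lens area of two discs) and checks the planar formula by Monte Carlo; all names are hypothetical.

```python
import math
import random


def overlap_ratio(rho: float, r: float, d: int) -> float:
    """V_r(0, z) / V_r for |z| = rho, in dimension d = 1 or d = 2.

    V_r is the volume of a ball of radius r and V_r(0, z) is the volume of
    B(0, r) ∩ B(z, r); the ratio is zero as soon as rho >= 2r.
    """
    if d not in (1, 2):
        raise ValueError("this sketch only covers d = 1 and d = 2")
    if rho >= 2.0 * r:
        return 0.0
    if d == 1:
        return (2.0 * r - rho) / (2.0 * r)
    # d = 2: area of the lens formed by two discs of radius r at distance rho
    lens = 2.0 * r * r * math.acos(rho / (2.0 * r)) - 0.5 * rho * math.sqrt(4.0 * r * r - rho * rho)
    return lens / (math.pi * r * r)


def overlap_ratio_mc(rho: float, r: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check in d = 2: fraction of uniform points of B(0, r) lying in B((rho, 0), r)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        while True:  # uniform point of the disc by rejection from the square
            x, y = rng.uniform(-r, r), rng.uniform(-r, r)
            if x * x + y * y <= r * r:
                break
        if (x - rho) ** 2 + y * y <= r * r:
            hits += 1
    return hits / n


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(rho, overlap_ratio(rho, 1.0, 1), overlap_ratio(rho, 1.0, 2))
```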
It therefore suffices to show that we can find $\delta_3 > 0$ such that, for sufficiently large $n$, the probability that an event is cancelled in the interval $[0, \delta_3 n^{1-\gamma}]$ is smaller than $\varepsilon/(4K(K-1))$. Now, according to the right-hand side of (54), when the two lineages lie at separation $z \in \mathbb{R}^d$, a cancelled event occurs at an explicit instantaneous rate. Hence, using the coupling with $(\zeta^n_t)_{t\ge 0}$, the probability of having no cancelled event up to time $n^{1-\gamma}t$ (corresponding to time $nt$ in original units) can be expressed in terms of this rate integrated along the path of $\zeta^n$, and Lemma 5.3 shows that we can find $\delta_3 > 0$ for which the probability of a cancellation is below $\varepsilon/(4K(K-1))$. This completes the proof of tightness and therefore of Theorem 2.6.
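If, conditionally on the path of $(\zeta^n_t)_{t\ge 0}$, cancellations arrive as a Poisson process with the instantaneous rate above (our reading of the coupling), then the probability of no cancellation on $[0,T]$ is $\mathbb{E}[\exp(-\Lambda_T)]$, with $\Lambda_T$ the integrated rate, and the inequality $1-e^{-x} \le x$ bounds the probability of at least one cancellation by $\mathbb{E}[\Lambda_T]$. This is why an estimate on the expected integrated rate, of the type given by Lemma 5.3, is enough. The sketch below fixes a hypothetical deterministic rate $\lambda(t) = 0.3(1+t)^{-3/2}$, simulates the cancellations by Lewis-Shedler thinning, and compares the empirical no-cancellation frequency with $e^{-\Lambda_T}$.

```python
import math
import random

LAMBDA_MAX = 0.3  # upper bound for rate(t) on t >= 0, needed for thinning


def rate(t: float) -> float:
    """Hypothetical instantaneous cancellation rate along a fixed path (illustration only)."""
    return 0.3 * (1.0 + t) ** -1.5


def integrated_rate(t_end: float) -> float:
    """Lambda_T = integral of rate over [0, t_end] = 0.6 * (1 - (1 + t_end)^(-1/2))."""
    return 0.6 * (1.0 - (1.0 + t_end) ** -0.5)


def no_cancellation(t_end: float, rng: random.Random) -> bool:
    """Lewis-Shedler thinning: propose points at rate LAMBDA_MAX and keep a point at
    time t with probability rate(t) / LAMBDA_MAX; report whether none is kept."""
    t = 0.0
    while True:
        t += rng.expovariate(LAMBDA_MAX)
        if t > t_end:
            return True
        if rng.random() < rate(t) / LAMBDA_MAX:
            return False


if __name__ == "__main__":
    rng = random.Random(7)
    trials = 50_000
    p_hat = sum(no_cancellation(10.0, rng) for _ in range(trials)) / trials
    lam = integrated_rate(10.0)
    print(p_hat, math.exp(-lam))        # empirical frequency vs exp(-Lambda_T)
    print(1.0 - math.exp(-lam) <= lam)  # P(at least one cancellation) <= Lambda_T
```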
Proof of Lemma 5.3. As before, we shall exploit the fact that $(\zeta^n_t)_{t\ge 0}$ is 'nearly' a symmetric $\alpha$-stable process. Indeed, the intensity at which $(\zeta^n_t)_{t\ge 0}$ jumps by a vector $y$ does not depend on its current location. Write $(Z_t)_{t\ge 0}$ for a jump process, independent of $(\zeta^n_t)_{t\ge 0}$, whose jump intensity is chosen so that the generator of the process $(X_t)_{t\ge 0}$, where $X_t = \zeta^n_t + Z_t$, is precisely $2(1+s_n)$ times the operator $D_\alpha$ defined in (42), which we already checked corresponds to a symmetric $\alpha$-stable process. Once again, the idea is that the jumps of $(Z_t)_{t\ge 0}$, which are bounded by 2, contribute little to the evolution of $(X_t)_{t\ge 0}$. More precisely, let us establish the bound (57), which holds with a constant $C > 0$ for every $n$ large enough and every $s \ge 1$. To this end, observe first that, since the law of $Z_s$ is invariant under rotation, the tail of $|Z_s|$ can be controlled by that of $Z^{(1)}_s$, the first coordinate of $Z_s$. Now, $(Z^{(1)}_s)_{s\ge 0}$ is again a symmetric Lévy process with jumps bounded by 2, and so Theorem 25.3 in [Sat99] shows that $\mathbb{E}[\exp(qZ^{(1)}_s)] < \infty$ for every $s, q \ge 0$. In this case, it is known that the characteristic exponent $\Psi$ of $(Z^{(1)}_s)_{s\ge 0}$, given here by a Lévy-Khintchine formula with Lévy measure $\nu$, has an analytic extension to the half-plane with negative imaginary part. As a consequence, Markov's inequality applied to $\exp(qZ^{(1)}_s)$ yields a tail bound in terms of $\psi$. Since the measure $\nu$ has support in $[-2,2]$, we can expand for small $q$, and the first term of this expansion is zero by symmetry. Consequently, there exists a constant $C > 0$ such that $\psi(1/\sqrt{s}) \le C/s$ for every $s \ge 1$. Together with (58) and (59), this gives (57). It will be convenient to suppose that $\zeta_0 = 0$; there is no loss of generality in doing so, since for $n$ sufficiently large $\zeta_0$ is bounded by $(\log n)^2$ and so, for $s > 1$, can be absorbed into our bound for $Z_s$. Similarly, we can, and do, replace $2^\alpha \vee |\zeta^n_s|^\alpha$ by $1 \vee |\zeta^n_s|^\alpha$ in the denominator of our integrand.
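The passage through $\mathbb{E}[\exp(qZ^{(1)}_s)]$ is a Chernoff (exponential Markov) bound. Here is a sketch for a hypothetical symmetric compound Poisson process whose Lévy measure has four atoms in $[-2,2]$ (not the measure arising from the SLFV). It computes $\psi(q) = \log\mathbb{E}[e^{qZ_1}] = \int(\cosh(qy)-1)\,\nu(dy)$, which plays the role of $\psi$ above, checks that $s\,\psi(1/\sqrt{s})$ stays bounded (that is, $\psi(1/\sqrt{s}) \le C/s$), and compares the resulting bound $\mathbb{P}[Z_s \ge \lambda\sqrt{s}] \le \exp(-\lambda + s\,\psi(1/\sqrt{s}))$ with a Monte Carlo estimate.

```python
import itertools
import math
import random

# Hypothetical symmetric Lévy measure with support in [-2, 2]: (jump size, rate) pairs.
NU = [(1.0, 0.5), (-1.0, 0.5), (2.0, 0.25), (-2.0, 0.25)]
TOTAL_RATE = sum(w for _, w in NU)
SIZES = [y for y, _ in NU]
CUM_WEIGHTS = list(itertools.accumulate(w for _, w in NU))


def psi(q: float) -> float:
    """log E[exp(q Z_1)] = sum of w * (exp(q y) - 1); by symmetry this equals
    the sum of w * (cosh(q y) - 1), which is of order q^2 for small q."""
    return sum(w * (math.cosh(q * y) - 1.0) for y, w in NU)


def chernoff_tail(s: float, lam: float) -> float:
    """Markov's inequality for exp(q Z_s) with q = 1/sqrt(s):
    P(Z_s >= lam * sqrt(s)) <= exp(-lam + s * psi(1/sqrt(s)))."""
    return math.exp(-lam + s * psi(1.0 / math.sqrt(s)))


def sample_z(s: float, rng: random.Random) -> float:
    """One draw of the compound Poisson variable Z_s with Lévy measure NU."""
    t, z = 0.0, 0.0
    while True:
        t += rng.expovariate(TOTAL_RATE)
        if t > s:
            return z
        z += rng.choices(SIZES, cum_weights=CUM_WEIGHTS)[0]


if __name__ == "__main__":
    for s in (1.0, 10.0, 100.0, 1000.0):
        print(s, s * psi(1.0 / math.sqrt(s)))  # stays bounded as s grows
```

Since $\cosh u - 1 \le \tfrac{u^2}{2}\cosh u$ and $|y|/\sqrt{s} \le 2$ for $s \ge 1$, one may take $C = \tfrac12\cosh(2)\int y^2\,\nu(dy)$ in this toy example.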
Based on these considerations, let us return to the integral of interest when $d \ge 2$. Fixing $a \in (0,\gamma)$ and splitting the time integral as $\int_0^{n^a} + \int_{n^a}^{n^{1-\gamma}t}$, we obtain a bound consisting of three terms. Since the first two terms tend to $0$ as $n \to \infty$, it now suffices to show that the last term remains bounded when $n$ is large. By Lemma 5.3 in [BW98], if $(p^\alpha_s)_{s\ge 0}$ denotes the transition density of $(X_t)_{t\ge 0}$, we have an identity valid for every $s > 0$ and $x \in \mathbb{R}^d$, and there exists $C_{d,\alpha} > 0$ (independent of $x$) such that $p^\alpha_s$ satisfies a corresponding upper bound. Hence, for any $s \ge n^a$ and any $z \in \mathbb{R}^d$ such that $|z| \le (\log n)^2\sqrt{s}$, the expectation splits into three terms. Since $s \ge n^a$ and $|z| \le (\log n)^2\sqrt{s}$, the second term is bounded (after a change to polar coordinates) by an expression involving only these constants, and so is the third term. Since none of the constants depends on $z$ (in the range considered) or on $s$, we deduce that the right-hand side of (60) is bounded, which proves (ii).
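A heuristic numerical check of why $d \ge 2$ helps (this is not the argument used above, which goes through the density bounds of [BW98]): for an isotropic $\alpha$-stable process in $d=2$ with characteristic function $\exp(-s|\theta|^\alpha)$ (a normalisation we assume), $\mathbb{E}[1/(1\vee|X_s|^\alpha)]$ decays like $1/s$, because $|x|^{-\alpha}$ is integrable at the origin when $d > \alpha$. Integrating over $s \in [1, n^{1-\gamma}t]$ then gives $O(\log n)$, which is negligible against the prefactor $n^{-\gamma}$. The sketch samples $X_1$ by subordination, $X_1 = \sqrt{2A}\,G$ with $A$ positive $(\alpha/2)$-stable (Kanter's representation) and $G$ standard Gaussian, and uses $X_s \overset{d}{=} s^{1/\alpha}X_1$; the function names are hypothetical.

```python
import math
import random


def positive_stable(a: float, rng: random.Random) -> float:
    """Kanter's representation of a positive a-stable variable, 0 < a < 1,
    with Laplace transform E[exp(-lam * A)] = exp(-lam**a)."""
    u = math.pi * (1.0 - rng.random())  # uniform on (0, pi]
    e = rng.expovariate(1.0)
    return (math.sin(a * u) / math.sin(u) ** (1.0 / a)) * (math.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a)


def isotropic_stable_2d(alpha: float, rng: random.Random) -> tuple[float, float]:
    """X = sqrt(2A) * G with A positive (alpha/2)-stable and G standard Gaussian in R^2,
    so that E[exp(i theta.X)] = exp(-|theta|**alpha)."""
    scale = math.sqrt(2.0 * positive_stable(alpha / 2.0, rng))
    return scale * rng.gauss(0.0, 1.0), scale * rng.gauss(0.0, 1.0)


def scaled_pressure(s: float, alpha: float, samples: list[tuple[float, float]]) -> float:
    """Monte Carlo value of s * E[1 / max(1, |X_s|^alpha)], using X_s = s**(1/alpha) * X_1."""
    total = 0.0
    for x, y in samples:
        norm = s ** (1.0 / alpha) * math.hypot(x, y)
        total += 1.0 / max(1.0, norm ** alpha)
    return s * total / len(samples)


if __name__ == "__main__":
    rng = random.Random(11)
    alpha = 1.2
    samples = [isotropic_stable_2d(alpha, rng) for _ in range(20_000)]
    for s in (10.0, 100.0, 1000.0):
        print(s, scaled_pressure(s, alpha, samples))  # roughly constant in s
```

As $s \to \infty$ the printed quantity approaches $\mathbb{E}|X_1|^{-\alpha} = 2^{-\alpha}\Gamma(1-\alpha/2)/\Gamma(1+\alpha/2)$ in $d=2$, a standard moment formula for isotropic stable laws.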

A Continuity estimates in the stable radii case
Our aim in this section is to prove Lemma 5.2. In fact, we establish a stronger result, of interest in its own right, from which Lemma 5.2 easily follows.
In particular, (ii) implies uniform continuity of the limiting process of allele frequencies. That is, under the conditions of Theorem 1.5, for every $|z_1 - z_2| < 1$, $T > 0$ and $\epsilon \in (0,1)$: Before proving Proposition A.1, let us show how it implies Lemma 5.2.
Proof of Lemma 5.2. Set $\epsilon_n = n^{-\beta}$ and $\epsilon'_n \in [n^{-\beta}, n^{-\beta}\log n]$ in (i). This yields a bound in which, as is straightforward to check, the exponent of $n$ is negative for any $\alpha \in (1,2)$. Moreover, for some $a > 0$ we obtain a second bound of the same type, and again one can check that the exponent of $n$ is negative in all dimensions. Thus the right-hand side of (63) tends to zero and the lemma follows.
The rest of this section is devoted to the proof of Proposition A.1. The lemmas that appear in this proof are proved in Appendix A.3.

The quantity in (68) is equal to an explicit squared integral. Substituting in the martingale problem in the usual way, we obtain an expression in which $S^n_t$ is the sum of the remaining three terms comprising the squared integral on the first line. Since all these terms behave in the same way, we shall only bound the first one. Writing, as before, $V_r(y,y')$ ($\le C_d r^d$) for the volume of $B(y,r) \cap B(y',r)$ and using Fubini's theorem, we can replace the integral over $x$ by $V_r(y,y')$. Next, as in our estimates of the drift, we split the integrals over $y$ and $y'$ according to whether or not $y, y' \in B(z,\log n)$. When both lie in $B(z,\log n)$, Lemma A.3(iii) gives a first bound, and integrating over $t$ and $r$ we obtain (78). Secondly, in the case where $y \in B(z,\log n)$ and $y' \in B(z,\log n)^c$, Lemma A.3(iii) and (iv) bound the corresponding integral, which we again integrate over $t$ and $r$. The case where $y \in B(z,\log n)^c$ and $y' \in B(z,\log n)$ is treated in the same way. Finally, if […] For $t \in (T-\tau_1, T)$, we apply Lemma A.7 to obtain […] and correspondingly $q^{n,I}_t(\theta)$ for $I \subset [0,\infty)$, as well as $q^n_t(\theta) = q^{n,[0,\infty)}_t(\theta)$.

Recall the representation of $X^n$ using random walks in (71). As $X^n$ has independent and stationary increments, the Lévy-Khintchine formula (see e.g. Theorems 2.7.10 and 2.8.1 of [Sat99]) expresses its characteristic function in terms of a Lévy measure and an exponent $\psi_n$. Similarly, we define the limiting Lévy measure, as well as the corresponding function $\psi$. We observe that for all $t > 0$, $|e^{t\psi_n(\theta)}| \le 1$, and hence $|e^{t\psi(\theta)}| \le 1$.
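Because the jump measures involved are symmetric, the Lévy-Khintchine exponent is real and non-positive: $\psi(\theta) = \int(\cos(\theta\cdot x)-1)\,\nu(dx) = -2\int\sin^2(\theta\cdot x/2)\,\nu(dx)$. This is what gives $|e^{t\psi_n(\theta)}| \le 1$, and it is also where the factor $\sin^2(\theta\cdot x/2)$ comes from. The sketch below illustrates this for a hypothetical finite symmetric Lévy measure on $\mathbb{R}^2$ (for a finite measure no compensating drift is needed, so the process is compound Poisson); it compares the two forms of the exponent and checks $e^{t\psi(\theta)}$ against a Monte Carlo estimate of the characteristic function.

```python
import cmath
import math
import random

# Hypothetical finite symmetric Lévy measure on R^2: (jump vector, rate) pairs.
NU = [((1.0, 0.0), 0.4), ((-1.0, 0.0), 0.4), ((0.5, 0.5), 0.3), ((-0.5, -0.5), 0.3)]
JUMPS = [x for x, _ in NU]
RATES = [w for _, w in NU]
TOTAL_RATE = sum(RATES)


def dot(a, b) -> float:
    return a[0] * b[0] + a[1] * b[1]


def psi(theta) -> complex:
    """Lévy-Khintchine exponent of the compound Poisson process with Lévy measure NU."""
    return sum(w * (cmath.exp(1j * dot(theta, x)) - 1.0) for x, w in NU)


def psi_sin2(theta) -> float:
    """The same exponent written as -2 * sum of w * sin^2(theta.x / 2), using the symmetry of NU."""
    return -2.0 * sum(w * math.sin(dot(theta, x) / 2.0) ** 2 for x, w in NU)


def empirical_char(theta, t: float, n: int, rng: random.Random) -> complex:
    """Monte Carlo estimate of E[exp(i theta.X_t)] by simulating X_t directly."""
    acc = 0j
    for _ in range(n):
        s, z = 0.0, (0.0, 0.0)
        while True:
            s += rng.expovariate(TOTAL_RATE)
            if s > t:
                break
            dx = rng.choices(JUMPS, RATES)[0]
            z = (z[0] + dx[0], z[1] + dx[1])
        acc += cmath.exp(1j * dot(theta, z))
    return acc / n


if __name__ == "__main__":
    theta = (1.3, -0.7)
    print(psi(theta), psi_sin2(theta))
    print(empirical_char(theta, 2.0, 20_000, random.Random(5)), cmath.exp(2.0 * psi(theta)))
```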
The calculations above can easily be repeated for $X$ and $\psi$; the resulting lower bound then involves a double integral of $\sin^2(\theta\cdot x/2)$ with respect to $dx\,dr$.
Since $\theta_1 = |\theta| \le n^\beta$, the double integral above is bounded below by a constant. Therefore we arrive at the same estimate as in (85), and we have proved (ii).