The spatial Lambda-Fleming-Viot process with fluctuating selection

We are interested in populations in which the fitness of different genetic types fluctuates in time and space, driven by temporal and spatial fluctuations in the environment. For simplicity, our population is assumed to be composed of just two genetic types. Short bursts of selection acting in opposing directions drive to maintain both types at intermediate frequencies, while the fluctuations due to 'genetic drift' work to eliminate variation in the population. We consider first a population with no spatial structure, modelled by an adaptation of the Lambda (or generalised) Fleming-Viot process, and derive a stochastic differential equation as a scaling limit. This amounts to a limit result for a Lambda-Fleming-Viot process in a rapidly fluctuating random environment. We then extend to a population that is distributed across a spatial continuum, which we model through a modification of the spatial Lambda-Fleming-Viot process with selection. In this setting we show that the scaling limit is a stochastic partial differential equation. As is usual with spatially distributed populations, in dimensions greater than one, the 'genetic drift' disappears in the scaling limit, but here we retain some stochasticity due to the fluctuations in the environment, resulting in a stochastic p.d.e. driven by a noise that is white in time but coloured in space. We discuss the (rather limited) situations under which there is a duality with a system of branching and annihilating particles. We also write down a system of equations that captures the frequency of descendants of particular subsets of the population and use this same idea of 'tracers', which we learned from Hallatschek and Nelson (2008) and Durrett and Fan (2016), in numerical experiments with a closely related model based on the classical Moran model.


Introduction
A fundamental challenge in population genetics is to understand the balance between adaptive processes (selection) and random neutral processes (genetic drift). The most studied example of adaptation is directional selection acting on a single genetic locus.
In the simplest model, each individual is either of type a or type A at the locus under selection, and the relative fitnesses of individuals carrying the two types is 1 + s 0 : 1, for some small parameter s 0 . At least provided that random fluctuations don't eliminate the favoured type before it can become established, natural selection will act to remove variability from the population until, in the absence of mutation, everyone is of the favoured type. However, there are other forms of selection that act to maintain genetic variation. In this paper we are concerned with populations that are subject to changing environmental conditions, that cause relative fitnesses of different genotypes to fluctuate in time and space. To quote GILLESPIE (2004, [20]), "If fitnesses do depend on the state EJP 26 (2021), paper 25. Evolution in a fluctuating environment of the environment, as they surely must, then they must just as assuredly change in both time and space, driven by temporal and spatial fluctuations in the environment." We shall suppose that our population occurs in just two types (alleles), {a, A} and that the environment fluctuates between two states, in the first of which a, and in the second of which A, is favoured. We suppose that selection is sufficiently strong that if the environment did not fluctuate, the favoured type would rapidly fix in the population, but that there is a 'balance' between the two environments so that both types can be maintained at non-trivial frequencies for long periods of time. If the population has no spatial structure, then over large timescales the frequency of a-alleles can be modelled by a stochastic differential equation: where B 1 and B 2 are independent Brownian motions, the first (as we shall explain in Section 3) capturing the randomness due to genetic drift (that is the randomness due to reproduction in a finite population), the second encoding the random fluctuations in the environment (which are assumed to happen quickly on evolutionary timescales). The constant s is a scaled selection coefficient (see Section 3). For a population distributed across a one-dimensional spatial continuum, one can write down an analogous stochastic partial differential equation: x))W (dt, dx), (1.2) where W is space-time white noise (capturing genetic drift) and the independent noise W is white in time, but may be coloured in space reflecting spatial correlations in the environmental fluctuations. In the biologically most relevant case of two dimensions, this equation has no solution, and we must find a different approach. The difficulties with modelling genetic drift in populations evolving in higher dimensional spatial continua, often referred to as 'the pain in the torus', are well known; see BARTON, ETHERIDGE and VÉBER (2013, [4]) for a review. They can be overcome using the spatial Lambda-Fleming-Viot process, introduced in ETHERIDGE (2008, [14]) and rigorously constructed in BARTON, ETHERIDGE and VÉBER (2010, [3]), and here we adapt that model to incorporate fluctuating selection.
Our first result deals with the non-spatial case. We take a scaling limit of the Lambda-Fleming-Viot process and recover (1.1), which coincides with that obtained by GILLESPIE (2004, [20]) as a scaling limit of a Wright-Fisher type model. We then turn to the scaling limit of the spatial Lambda-Fleming-Viot process with fluctuating selection. In dimension one, the limiting process coincides (up to constants) with the stochastic p.d.e. (1.2). In higher dimensions, the term corresponding to genetic drift vanishes in the limit, but the effects of the fluctuations in the environment can still persist, resulting in a stochastic p.d.e. driven by (spatially) coloured noise.
Our ultimate aim is to find ways to distinguish the effects of spatial and temporal environmental fluctuations on genetic data. This would involve understanding the genealogical trees relating individuals in a sample from the population. Although for our prelimiting model we can write down an analogue of the ancestral selection graph of KRONE and NEUHAUSER (1997, [28]), NEUHAUSER and KRONE (1997, [38]), which tracks all 'potential ancestors' of individuals in a sample from the population, this process seems to be rather unwieldy. Moreover, when we apply our rescaling, the scaled ancestral selection graphs do not converge and we have not found a satisfactory way to extract genealogies for the limiting model. When selection does not fluctuate, the original ancestral selection graph can be thought of as a moment dual to the forwards EJP 26 (2021), paper 25. in time diffusion describing allele frequencies in the population. It is natural to ask whether there are other dual processes that we could exploit when selection fluctuates. Our attempts to find a useful dual for the equation (1.2) have met with limited success, but in Section 5 we show that (after an affine transformation) there are circumstances in which a branching and annihilating dual exists.
In the absence of a useful dual process, instead we take a first step towards understanding ancestry in the population by following an interesting approach of HAL-LATSCHEK and NELSON (2008, [23]) and, more recently, DURRETT and FAN (2016, [13]) which uses the idea of 'tracers' to explore the way in which descendants of a subpopulation of the type a individuals evolve forwards in time. In Section 6 we write down the system of stochastic p.d.e.'s that will determine the tracer dynamics. This idea is exploited further in our numerical experiments of Section 7.
The rest of the article is laid out as follows. In Section 2 we very briefly outline some of the biological background. In Section 3 we consider the case in which the population has no spatial structure. To prepare the ground for the case of spatially structured populations, we work with the Lambda-Fleming-Viot process (also sometimes known as the generalised Fleming-Viot process) that was introduced in DONNELLY and KURTZ (1999, [12]), BERTOIN and LE GALL (2003, [6]). In particular, we investigate different scaling limits, reflecting longtime behaviour of the process for different balances between the rate of changes of environment and the strength of selection. In Section 4 we define the spatial Lambda-Fleming-Viot process with fluctuating selection and give a precise statement of our scaling result for this model. In Section 5, we discuss the situations in which we can investigate the limiting process through duality with a system of branching and annihilating particles. Tracers are introduced in Section 6 and then explored numerically (for a Moran model of a subdivided population) in Section 7. The proof of our main scaling limit is in Section 8. The appendices contain some (important) technical results that we require in the course of the proofs.

Biological background
In this section we outline the biological context for this work. Although not a prerequisite for understanding the mathematics of subsequent sections, it explains our motivation for tackling this particular scaling limit.
Suppose that a gene occurs in just two forms that, because of environmental fluctuations, each finds itself subject to short alternating bursts of positive and negative selection. Even if these changes are happening on a much faster scale than neutral evolution, they may influence gene frequencies. For example, in diploid individuals (carrying two copies of the gene), a heterozygote (carrying one allele of each type) may have higher mean fitness, when we take account of different environments, than either homozygote, and so allelic variation can be maintained for long periods, even though at any given time the population is subject to directional selection. This marginal overdominance is an example of balancing selection. In equation (1.1) we see this in the deterministic term (sp(1 − p)(1 − 2p)dt) on the right hand side. WRIGHT (1969, [50]) observed that spatial heterogeneity in the direction of selection combined with density dependent reproduction can also lead to balanced polymorphism, that is adaptive alleles are held at intermediate frequencies for long periods (see also DELPH and KELLY (2014, [10]) and references therein).
One of the recurring arguments in evolutionary biology is whether evolution occurs principally through natural selection or through neutral processes, in which no particular genetic type is favoured, such as genetic drift. A data set that has sat at the heart of this debate for the last 70 years is a time series of changes in the genotype frequency EJP 26 (2021), paper 25. of a polymorphism of the Scarlet Tiger Moth, Callimporpha (Panaxia) dominula, in an isolated population at Cothill Fen near Oxford, UK. FISHER and FORD (1947, [17]) found that the proportion of a certain medionigra allele in the population increased significantly between 1929 and 1941 from 1.2% to 11.1%, and decreased to 5.2% between 1941 and 1946. They concluded, ". . . the observed fluctuations in generations are much greater than could be ascribed to random survival only. Fluctuations in natural selection must therefore be responsible for them.". Fisher (a strong proponent of the importance of selection) was challenged by Wright (a champion of genetic drift) who argued that multiple factors could be at play and, moreover, Fisher may have underestimated the strength of genetic drift. O'HARA 2005 (2005, [41]) analysed the, by then 60 year long, time series of data from Cothill and concluded that most of the pattern of variation in the population should be attributed to genetic drift. Moreover, although selection is acting, mean fitness barely increased.
It is unusual to have such a long time series of data, especially in conjunction with information about the environment. In general it will also be far from clear which genes are undergoing selection, rather one tries to infer the action of selection through studying neutral diversity. For most populations, it may be very difficult to distinguish fluctuating selection from genetic drift. To see why, we recall a model due to Gillespie that captures the effect of a series of 'selective sweeps' through a population. Suppose that a selectively favoured mutation arises at some point on the genome and rapidly increases in frequency (until the whole population carries it). Because genes are arranged on chromosomes, different genes do not evolve independently of one another. As a result of a process called recombination, correlations between genes decrease as a function of the distance between them on the chromosome. Nonetheless, a neutral allele fortunate enough to be on the same chromosome as the selectively favoured mutation will itself receive a boost in its frequency (even if as a result of recombination it doesn't exhaust the whole population). This boost to the type at the neutral locus is known as 'genetic hitchhiking', a term introduced by MAYNARD SMITH and HAIGH (1974 [35]). Of course, correspondingly, a neutral allele associated with an unfavoured type will decrease in frequency. GILLESPIE (2000, [18]), GILLESPIE (2001, [19]) investigated a model in which strongly selected mutations which give rise to hitchhiking events occur at the points of a Poisson process. He assumes that selection is strong enough that the duration of the sweeps causing the hitchhiking events that affect a given locus is small compared to the time between them so that we can ignore the possibility that a locus will be subject to two simultaneous hitchhiking events. He establishes that the first two moments of the change in allele frequency at the neutral locus over the course of a hitchhiking event take exactly the same form as if they had been produced by genetic drift over a single generation of reproduction. It is not hard to see that we will see the same hitchhiking effects under 'partial sweeps' driven by environmental fluctuations. This process of 'genetic draft' induced by selection, strongly resembles genetic drift and it may be hard to distinguish the two. BARTON (2000, [2]) also considers the 'genetic drift' induced by hitchhiking.
Not surprisingly, the impact of environmental fluctuations on genetic variation has been extensively studied. Nonetheless, even in the absence of spatial structure, it remains an open question to characterise situations under which fluctuating environmental conditions can maintain genetic variation; see e.g. NOVAK and BARTON (2017, [40]). Moreover, the effects of genetic drift have been largely ignored. This is perhaps because acting in isolation, genetic drift typically impacts gene frequencies over periods of (tens of) thousands of generations, much longer than the time scales of climatic fluctuation. However, once a particular genotype becomes rare, perhaps as a result of a run of unfavourable environments, stochastic fluctuations will be dominated by genetic drift, EJP 26 (2021), paper 25. through which the genotype can be lost.
A further challenge in identifying genes that are subject to fluctuating selection is that, even if we can disentangle the effects of drift, numerous selection schemes lead to forms of balancing selection. For example, in the absence of spatial structure, allele frequency dynamics under fluctuating selection are identical to those under within-generation fecundity variance polymorphism. In this setting, TAYLOR (2013, [47]) shows that the effects on the genealogy at a linked neutral locus will differ. FIJARCZYK and BABIK (2015, [16]) and the references therein provide an overview of theoretical and empirical evidence for various forms of balancing selection and methods for their detection.
Recently, BERGLAND, BEHRMAN, O'BRIEN, SCHMIDT and PETROV (2014, [5]) reported hundreds of polymorphisms in Drosophila melanogaster whose frequencies oscillate among seasons and they attribute this to strong, temporally variable selection. They also cite evidence that genetic (and phenotypic) variation is maintained by temporally fluctuating selection for a variety of other organisms. GOMPERT (2016, [21]) proposes an approach to quantifying variable selection in populations experiencing both spatial and temporal variations in selection pressure. In spite of this combination of theoretical and empirical evidence for the importance of fluctuating selection, we have only a limited understanding of some basic questions: how many loci are subject to temporally fluctuating selection? How strong is that selection? What is the relationship between temporally and spatially varying selection?
Since natural environments are never truly constant, it is clearly important to understand the implication of temporally and spatially varying selection pressures. CVIJOVIC, GOOD, JERISON and DESAI (2015, [9]) examines some of the implications of temporal fluctuations. Our work here is a step towards a tractable framework in which to consider the combined effects of spatial and temporal fluctuations.

The (non-spatial) model
We first consider a population without spatial structure. Although we would obtain exactly the same scaling limits if we were to use the classical Moran or Wright-Fisher models as the basis of our approach, c.f. GILLESPIE (2004, [20]), for consistency with what follows, we shall work with the (non-spatial) Lambda-Fleming-Viot process. The key ideas that will be required in the spatial setting already appear here, where they are not obscured by notational complexity. An analogous scaling limit is obtained for the Wright-Fisher model with fluctuating selection (using similar reasoning) in HUTZENTHALER, PFAFFELHUBER and PRINTZ (2018, [25]).
We shall restrict ourselves to the special case of the Lambda-Fleming-Viot process in which reproduction events fall at a finite rate, determined by a Poisson process. We shall also suppose that there are just two types of individual, {a, A}. In each event, a parent is chosen from the population immediately before the event, and a portion u of the population is replaced by offspring of the same type as the parent. In general the quantity u, which we shall call the impact of the event, may be random. Selection (on fecundity) can be incorporated by weighting the choice of parent, to favour one type or the other, and we shall extend previous versions of the model to allow the direction of selection to fluctuate. More precisely, we have the following definition.  Let Π be a Poisson process defined on R × (0, 1) × (0, 1) with intensity measure dt ⊗ ν(du) ⊗ σ(ds), where ν and σ are some probability measures. Moreover, let Π env be a rate τ env Poisson process, independent of Π (where τ env ∈ (0, ∞)). The state of the environment is a random variable ζ(t) ∈ {−1, 1}. At the times of the Poisson process Π env , ζ is resampled uniformly from {−1, 1}.
The dynamics of {p(t)} t≥0 can be described as follows. If (t, u, s) ∈ Π, a reproduction event occurs. Then: 1. select a parental type κ ∈ {a, A} according to if ζ = 1.

2.
A proportion u of the population immediately before the event dies and is replaced by offspring of the chosen type, that is Poisson process, we resampled it at each reproduction event, by choosing ν to be distributed as the proportion of hitchhikers when a selective sweep occurs at a random distance from our chosen locus, we would recover Gillespie's model of genetic draft at a neutral locus linked to loci undergoing a sequence of selective sweeps.
We shall see that the rate of resampling of the environment (relative to the strength of selection in each event) plays a key role in the long term behaviour of the population.

Scaling limits
In order to simplify the notation still further, we specialise to the case in which the Poisson point process Π of Definition 3.1 has intensity dt ⊗ δū ⊗ δ s for some fixedū and s; in other words we fix the impact and the strength of selection in each event. A general result can be obtained from our calculations below by integration.
In order to obtain a diffusion approximation, we shall speed up the rate of reproduction events by a factor n, but scale down both the impact and the strength of the selection. We write u n , s n for the impact and strength of selection at the nth stage of our rescaling. We shall also scale Π env to have rate n γ , with γ > 0 to be chosen. We shall need the joint generator L (n) of the pair (p, ζ) at the nth stage of this rescaling. We write π for the uniform measure on {−1, 1} and E π for the corresponding expectation. In an obvious notation, for suitable test functions f , we have Expanding the ratios involving s n as geometric series, and using Taylor's Theorem to expand f (as a function of p), we obtain In order to obtain a diffusion limit, we see that we should take nu 2 n to be O(1). If the environment didn't change, then we would require nu n s n to be O(1) and on passage to the limit recover the classical Wright-Fisher diffusion with selection, whose generator, if ζ = −1 say, takes the form Since we are modelling short bursts of strong selection, we set u n = n −1/2ū , s n = n −1/2+α s, (3.2) for some α ∈ (0, 1/4). The restriction α < 1/4 ensures that the error term ns 2 n u n in the expression (3.1) is negligible as n → ∞.
We can then write the rescaled generator in the form To see how we should choose γ, we employ a 'separation of timescales' trick due to KURTZ (1973, [29]). We apply the generator (3.3) to test functions of the form For this choice, we obtain where we have used the fact that L env f (p) = 0 (since f does not depend on ζ) and Evidently, to obtain a non-trivial limit we should take γ = δ + α. The most interesting case is when δ = α and so γ = 2α. In that case, letting n → ∞, in the limit the equation Noting that ζ 2 ≡ 1, equation (3.6) then reads There are other limits that can be obtained when γ > 2α. For example if α = 1/4 and we resample the environment at every reproduction event, corresponding to γ > 1, then [36] shows that, under the same scaling of u n , the frequency of type a alleles in the population converges weakly to the solution of dp = 1 2 for a standard Brownian motion {B t } t≥0 . The deterministic drift here arises from the term of order nu n s 2 n that under our previous scaling we were able to neglect in (3.1). Based on these calculations, the following proposition follows easily from Theorem 2.1 of [30], which we recall later as Theorem 8.1. In the interests of space, we omit the details of the proof, which follows from exactly the same arguments as those that we employ in the spatial setting.

Definition and scaling of the SLFVFS
In this section we first extend the Lambda-Fleming-Viot model with fluctuating selection of Section 3 to the spatial setting. The idea is simple: reproduction events are EJP 26 (2021), paper 25. still driven by a Poisson point process, but now, in addition to specifying the strength of selection and the impact associated with each event, we must also specify the spatial region in which it takes place. As has become usual in this framework, we shall take those regions to be closed balls (indeed for simplicity we shall take our events to be of a fixed radius), but the same results will hold under much more general conditions, subject to some symmetry and boundedness assumptions. Having defined the model, we state our main scaling result for the spatial model.

Spatial Lambda-Fleming-Viot process with fluctuating selection
We suppose that the population, which is distributed across R d , is subdivided into two genetic types {a, A}. As explained in detail in ETHERIDGE, VÉBER and YU (2020, [15]), which in turn borrows results from VÉBER and WAKOLBINGER (2015, [49]), formally, at each time the state of the population is described by a measure M t on R d × K, where K = {a, A}, whose first marginal is Lebesgue measure on R d . At any fixed time there is a density w(t, ·) : Of course w(t, x), which one should interpret as the proportion of the population at the location x at time t that is of type a, is only defined up to a Lebesgue null set. We shall topologize M λ by declaring that w n converges to w if for any continuous and compactly supported function f The space M λ with this topology is metrizable and compact, c.f. Section 2 of VÉBER and WAKOLBINGER (2015, [49]). A brief discussion of the topology can be found in Appendix C.
In what follows, we shall consider a representative of the density of M t . It will be convenient to fix a representative w(0, ·) of M 0 and then update it using the procedure described in the definition below, but the reader should bear in mind that the fundamental object is the measure-valued evolution. This becomes important when we talk about convergence of our rescaled processes; tightness will be immediate in the space of measures, but we will need to work harder to identify the dynamics of the density of the limit.
In what follows, for every f ∈ C c (continuous functions of compact support on R d ) we shall use the notation Recall that the limit that we obtained in Section 3 corresponded to our throwing away the terms of order ns 2 n u n in (3.1). In other words we approximated (1 + s)p/(1 + sp) by p + sp(1 − p) = p(1 − s) + s 1 − (1 − p) 2 and similarly p/(1 + s(1 − p)) was approximated by p − sp(1 − p) = p(1 − s) + sp 2 . Under this approximation, since reproduction events are based on a Poisson process of events, we can think of splitting those events into two types: neutral events and selective events. In the non-spatial setting, neutral events occur at rate (1 − s) and, for such an event, the chance that the parent is type a is p. Selective events fall at rate s. One then selects two 'potential' parents. If ζ = −1, EJP 26 (2021), paper 25. then the offspring are type a provided not both potential parents are type A, which has probability 1 − (1 − p) 2 , whereas if ζ = 1, the offspring are type a only if both potential parents are type a (probability p 2 ).
To avoid additional algebra, we shall define the spatial version of our model using this approximation. In our main scaling result, we shall indeed choose our scaling in such a way that ns 2 n u n → 0 as n → ∞.
We shall assume that the random field which specifies the environment is of the form 2X1 B − 1, for a random closed subset B ⊆ R d and X a Bernoulli random variable with parameter 1/2. Our main result will require some additional regularity which guarantees that changes in the environment are typically seen over larger spatial scales than reproduction events. This is made precise in (4.3) below. Let µ be a measure on (0, ∞) and for each r ∈ (0, ∞), let ν r be a probability measure on [0, 1], such that the mapping r → ν r is measurable and Further, fix s ∈ [0, 1] and let Π neu , Π f sel , be independent Poisson point processes on Let Π env be a Poisson process, independent of Π neu , Π f sel , with intensity τ env , dictating the times of the changes in the environment. Let {ξ (m) (·)} m≥0 be a family of independent identically distributed random fields of the form ξ where the covariance function g(x, y) will be assumed to be an element of C b R d × R d . Set τ 0 = 0 and write {τ m } m≥1 for the points in Π env and define In other words, the environment ζ(t, ·) is resampled, independently, at the times of the Poisson process Π env .
The spatial Lambda-Fleming-Viot process with fluctuating selection (SLFVFS) with driving noises Π neu , Π f sel , Π env , is the M λ -valued process M t t≥0 with dynamics described as follows. Let w(t − , ·) be a representative of the density of M t− immediately before an event (t, x, r, u) from Π neu or Π f sel . Then the measure M t immediately after the event has density w(t, ·) determined by: 1. If (t, x, r, u) ∈ Π neu , a neutral event occurs at time t within the closed ball B(x, r).  (c) A proportion u of the population within B(x, r) dies and is replaced by offspring with type κ. Therefore, for each point y ∈ B(x, r), 2. If (t, x, r, u) ∈ Π f sel , a selective event occurs at time t within the closed ball B(x, r).
Then (a) Choose two parental locations l 0 , l 1 independently, according to the uniform distribution on B(x, r). (b) Choose the two parental types, κ 0 , κ 1 , independently, according to (c) A proportion u of the population within B(x, r) dies and is replaced by offspring with type chosen as follows: i. If ζ(t, x) = 1, their type is set to be a if κ 0 = κ 1 = a, and A otherwise.
Thus, for each y ∈ B(x, r), ii. If ζ(t, x) = −1, their type is set to be a if κ 0 = κ 1 = a or κ 0 = κ 1 and A otherwise, so that for each y ∈ B(x, r), Existence of the SLFVFS is guaranteed by the methods of ETHERIDGE, VÉBER and YU (2020, [15]). Indeed, we could have taken different measures µ and ν r according to whether events are selective or neutral. Although it is convenient to take the strength of selection to be constant in space and have its direction determined by the variable ζ ∈ {−1, +1}, we could, of course, have defined a much more general model. For example, one could allow s to vary in space, or even resample sζ from a suitable random field whenever the environment is resampled. However, this would be at the expense of considerably more complicated notation and it would become more involved to exploit the Poisson structure of our model. See Remark 4.4 below for some comments on when our scaling result would generalise. One of the key tools in the study of the neutral SLFV is the dual process of coalescing random walkers which traces out the genealogical trees relating individuals in a sample from the population. An ancestral lineage doesn't move until it is both in the region affected by an event and is among the offspring of that event, at which time it jumps to the location of the parent of the event (which is uniformly distributed on the affected region). Things are more complicated in the presence of selection. Whereas in the neutral case we can always identify the distribution of the location of the parent of each event, now, at a selective event, even knowing the state of the environment, we are unable to identify which of the 'potential parents' is the true parent of the event without knowing their types. These can only be established by tracing further into the past. The resolution is to follow all potential ancestral lineages backwards in time. This results in a system of branching and coalescing walks in which branching and coalescence events are 'tagged' according to the state of the environment at the time at which they occur.
Just as in the neutral case, the dynamics of the dual are driven by the time reversals It is convenient to extend time to the whole of R. The distribution of these Poisson point processes is then invariant under the time reversal. We emphasize that time for the process of ancestral lineages runs in the opposite direction to that for the allele frequencies. Our dual will relate the distribution of allele frequencies in a sample from the population at a time T , to allele frequencies at time 0. More precisely, suppose that we know the frequencies w(0, ·) of a-alleles at time 0. At time T , which we think of as 'the present', we sample j individuals from locations χ 1 0 , . . . , χ j 0 . Tracing backwards in time, we write χ 1 s , . . . , χ Ns s for the locations of the N s potential ancestors that make up our dual at time s before the present.
, enriched by 'environmental tags' as follows: , independently mark the corresponding potential ancestor with probability u; 2. if at least one lineage is marked, all marked lineages disappear and are replaced by a single potential ancestor, whose location is drawn uniformly at random from within B(x, r).
At each event (t, x, r, u) ∈ ← − Π sel : 1. for each χ i t− ∈ B(x, r), independently mark the corresponding potential ancestor with probability u; 2. if at least one lineage is marked, all marked lineages disappear and are replaced by two potential ancestors, whose locations are drawn independently and uniformly from within B(x, r). The type of the environment is recorded.
In both cases, if no particles are marked, then nothing happens.
To determine the distribution of types of a sample of the population w(T, ·), taken from locations χ 1 0 , . . . , χ j 0 , knowing the distribution of w(0, ·) at time T before the present, first evolve the process of branching and coalescing lineages until time T . At time T assign types to χ 1 T , . . . , χ N T T using independent Bernoulli random variables such that . Tracing back through the system of branching and coalescing lineages (χ i t ) Nt i=1 , we define types recursively: at each neutral event the lineages that coalesced during the event are assigned the type of the parent; at a selective event, if the state of the environment is equal to 1 then all coalescing lineages are type a if and only if both parents are type a, otherwise they are type A whereas if the state of the environment is equal to −1, all coalescing lineages are type A if and only if both parents are type A, otherwise they are type a. The distribution of types at time zero is the desired quantity.
Since we only consider finitely many initial individuals in the sample, the jump rate in this process is finite and so this description gives rise to a well-defined process.
This dual process is the analogue for the SLFVFS of the Ancestral Selection Graph (ASG), introduced in the companion papers KRONE and NEUHAUSER (1997, [28]), NEUHAUSER and KRONE (1997, [38]), which describes all the potential ancestors of a sample from a population evolving according to the Wright-Fisher diffusion with selection. Indeed we could be more careful and use this process to extract the genealogy of a sample from the population. However, in this setting, this object seems to be rather unwieldy and, under the scalings in which we are interested, it will not converge to a well-defined limit.  4.3. Informally, the procedure described above allows us to write down an expression, in terms of the marked process of branching and coalescing ancestral lineages ; that is the probability that j individuals, sampled from the present day population at locations χ 1 0 , . . . , χ j 0 , are all of type a. More formally, since the density of the SLFVFS is only defined Lebesgue-almost everywhere, the quantities w(T, χ i 0 ) are only defined for Lebesgue almost every choice of χ 1 0 , . . . , χ j 0 and, just as in ETHERIDGE, VÉBER and YU (2020, [15]) Section 1.2, this duality must be defined 'weakly', that is by integrating against a suitable test function ψ(χ 1 0 , . . . , χ j 0 ). Also mirroring that setting, the resulting 'moment duality' is sufficient to guarantee uniqueness of the SLFVFS. Since we do not use the duality in what follows, we refer the reader to ETHERIDGE, VÉBER and YU (2020, [15]) for details.

Scaling the SLFVFS
We are interested in the effects of fluctuating selection over large spatial and temporal scales and so we shall consider a rescaling of our model. ETHERIDGE, VÉBER and YU (2020, [15]) consider the corresponding process in which selection does not fluctuate with time, but instead always favours type A (say). In that setting it is shown that if impact scales as u n =ū/n 1/3 and selection scales as s n = s/n 2/3 , then as n → ∞, w(nt, n 1/3 x) (or rather a local average of this quantity) converges in d ≥ 2 to the solution to the deterministic Fisher-KPP equation, and to the solution of the corresponding stochastic p.d.e. in which a 'Wright-Fisher noise' term, corresponding to genetic drift, has been added in d = 1. Here we wish to consider short periods of stronger selection and so, by analogy with what we did in Section 3, we choose s n = sn α /n 2/3 for some α > 0, but we change the favoured type at times of mean 1/n 2α . This is of course the scaling suggested by the Central Limit Theorem (and is the natural analogue of our results in Section 3). In order to be able to ignore terms of order ns 2 n u n (as we did in the nonspatial setting) it is easy to see that we must take α ∈ (0, 1/3). In fact, our proof of our scaling result will require α ∈ (0, 1/6).
We must also scale the environment in a consistent way. In the examples that we have in mind, environmental correlations can be expected to extend over very large scales and so we actually fix the correlations in the limiting environment by fixing the distribution of a random field ξ and at the nth stage of the scaling sampling the environment according to ξ n determined by At the nth stage of the rescaling, the environment will be resampled at points of a rate n 2α Poisson process.

Remark 4.4 (Extensions).
We have taken selection to be constant in magnitude and just to vary in sign. This is not necessary, even for our scaling result. As an obvious extension, we could fix the distribution of s(x)ξ(x) and, at the nth stage of the scaling define At the expense of introducing an additional truncation of s(x) at the nth stage of the scaling, to ensure that |s n (x)ξ n (x)| < 1, it is enough to insist that plus some regularity to reflect equation (4.3) below. To avoid a proliferation of notation, we omit this somewhat artificial generalisation of our results. Our definition of the SLFVFS is still rather general. We include it to underline the possibility of extending our results. However, in the interest of avoiding even more complex expressions than those that follow, from now on we shall specialise to fix the radius and impact of reproduction events. Assumption 4.5. From now on, fix R ∈ (0, ∞) andū ∈ (0, 1) and take Just as in ETHERIDGE, VÉBER and YU (2020, [15]), we shall prove convergence, not of the sequence of densities of the SLFVFS, but of a sequence of local averages. We require some notation. Let R be the fixed radius of events. Set and define the sequence of rescaled processes where Bn(x) denotes an average integral over the ball B n (x). We write V R for the volume of a ball of radius R.
. Further, fix α ∈ (0, 1/6), set s n = sn α /n 2/3 , u n =ū/n 1/3 and suppose that the environment is resampled at the times of a Poisson process of rate n 2α . We assume that the correlation function g(x, y) that is determined by the environment is a continuous martingale; EJP 26 (2021), paper 25.
2. in dimension d ≥ 2, w ∞ is the process for which, for every F ∈ C ∞ c (R) and for is a continuous martingale. Moreover, the solution to this martingale problem is unique and so {M n } n≥1 actually converges.
The constant Γ R depends only on R and is defined in (8.19).
The proof of uniqueness in d ≥ 2 uses a uniqueness result of HU, NUALART and SONG [24] for a corresponding stochastic p.d.e.. In Appendix B, we follow the approach of KURTZ (2010, [32]), which uses the Markov Mapping Theorem, to show that any solution to the martingale problem (4.5) is actually a weak solution to the stochastic p.d.e.: where the noise W is white in time and coloured in space, with quadratic variation given The corresponding equation for dimension d = 1 is where W is white in time and coloured in space as above and W is a space-time white noise. As in d ≥ 2, any solution to the martingale problem (4.4) will be a weak solution to this stochastic p.d.e., but we only have a proof of uniqueness of (4.8) in the special case in which W is also space-time white noise (in which case we can invoke the duality of Section 5).

Duality
In general, we have been unable to identify a useful dual process for our limiting equation. The exceptions are the non-spatial setting and the special case of one spatial dimension, with both noises in the stochastic p.d.e. being white in space as well as time.
Consider first the non-spatial case. We rewrite equation (3.7) by making the substitu- The duality relationship takes the form where the expectation on the left is with respect to the law of {X t } t≥0 started from initial condition X 0 , and that on the right is with respect to the law of {N t } t≥0 , started from N 0 .
The proof is an application of Itô's formula. It is easy to see that started from an even number of particles, the dual process will die out in finite time (count the number of pairs of particles and compare to a subcritical birth-death process), corresponding to the process {p t } t≥0 of allele frequencies being absorbed in either zero or one. Of course, this is also readily checked directly for the diffusion (1.1) using the theory of speed and scale, but if we could find an analogous dual for a spatially extended population, where the theory of one dimensional diffusions is no longer helpful, we might be able to exploit it to study the behaviour of allele frequencies.
In one spatial dimension, if the noises W and W are both white in space as well as time, then we can extend this.
The duality is expressed for points χ 1 (0), . . . χ N0 (0) ∈ R and any continuous function where the expectation on the left is with respect to the law of the stochastic p.d.e. and that on the right with respect to the law of the dual system of branching and annihilating lineages.
TRIBE (1995, [48]) gives a construction of the analogous system of coalescing Brownian motions, which is dual to the stochastic heat equation with Wright-Fisher noise, as discussed in SHIGA (1988 [46]). DOERING, MUELLER and SMEREKA (2002, [11]) provide a complete derivation in that context; see also LIANG (2009, [34]). We also note that BIRKNER (2003, [7]) considers a similar system of branching random walkers on Z d , in which particles reproduce at a rate that depends on the number of other particles within the same site.

Remark 5.3.
If we consider subdivided populations (i.e. an analogous model on a lattice), then the analogous system of stochastic (ordinary) differential equations satisfies a duality of this form with a system of branching and annihilating random walks. Moreover, we can apply the analogous transformation to the stochastic p.d.e. with coloured noise, but now, when we seek a branching annihilating dual process, in addition to the branching annihilating term when dual particles meet, we obtain cross terms of the form We can rearrange the factors involving X as If g(x, y) > 0, then we can interpret the first two terms in (5.3) as branching and annihilating terms in a putative dual; if g(x, y) < 0, then the last two terms can be interpreted as one particle jumping to the location of another. However, we have not, in either case, found a way to interpret both terms simultaneously. The obvious approach is to follow ATHREYA and TRIBE (2000, [1]) and introduce an additional 'marker' that switches sign every time we have an event of 'the wrong sign'. This leads to a Feynman-Kac correction term in the duality relation (5.2), which it turns out is infinite.

Tracer dynamics
On passing to a stochastic p.d.e. limit, we have lost sight of the way in which individuals in the population were related to one another and the ancestral selection graphs which encoded that information in the prelimiting models do not converge. However, some information about heredity can be recovered using the notion of 'tracers'. The idea, which finds its roots in the statistical physics literature, has more recently found application in models of population genetics, notably in HALLATSCHEK and NELSON (2008, [23]), or, for a more mathematical approach, see DURRETT and FAN (2016, [13]). The idea is simple: one labels some portion of the population of type a individuals, say, at time zero not just according to their type at the selected locus, but EJP 26 (2021), paper 25. also with a 'neutral marker' that is passed down from parent to offspring. Individuals in the population at time t that carry the neutral marker are precisely the descendants of our original marked individuals.
To introduce this in our setting, let us label a portion of the a population at time zero by a neutral marker. We shall use v(t, x) to denote the proportion of the (total) population that are both type a and labelled, and we shall use a * to denote that combined type. Thus The dynamics are driven by the same Poisson point processes of events as before, but now we modify our description of inheritance to include the extra label. Definition 6.1 (The SLFVFS with tracers). Let ζ, Π neu , Π f sel , Π env be exactly as in Definition 4.1. The dynamics of the pair (v, w) can be described as follows. Write a † to denote individuals of type a, but not a * .
2. If (t, x, r,ū) ∈ Π f sel , a selective event occurs at time t within the closed ball B(x, r). Then: (a) Choose the two parental locations l 0 , l 1 independently, according to the uniform distribution on B(x, r). (b) Choose the two parental types, κ 0 , κ 1 , according to (c) i. If ζ(t, x) = 1, offspring inherit type κ 0 if κ 0 and κ 1 are both type a (with or without the neutral marker), otherwise they are type A; so ii. If ζ(t, x) = −1, offspring inherit type κ 0 if κ 0 is type a, and they inherit type Assume that the sequence of measures of the initial states of the marked population converges weakly in M λ to a measure with a density v ∞ (0, x) = lim n→∞ v n (0, x). Then as n → ∞, the corresponding sequence of rescaled measure-valued processes is tight in D([0, ∞), M 2 λ ), and any limit point is a weak solution to a system of stochastic p.d.e.'s which in d ≥ 2 takes the form where the noise W is as before.
For dimension d = 1, the limiting process is a weak solution to the system of stochastic partial differential equations where, as before, W is white in time and coloured in space and and W i , i = 0, 1, 2, are independent space-time white noises.
The proof of this result is an even longer version of the proof of Theorem 4.6, but it follows exactly the same strategy. First we show that the limit points are solutions to appropriate martingale problems, then we show that any solution to the martingale problem provides a weak solution to the system of stochastic p.d.e.'s. Rather than providing details of the proof, we indicate why this result is to be expected. We once again use the trick of KURTZ (1973, [29]). In the case of one dimension and genic selection, DURRETT and FAN (2016, [13]) obtain a pair of stochastic p.d.e.'s of the form where W 0 , W 1 and W 2 are independent space-time white noises. Writing the corresponding generator acting on test functions f (w, l) as L neu + L sel as in Section 3, to identify the generator in the limit of (appropriately scaled) rapidly fluctuating selection we must evaluate L sel (L sel f (·, ·))(w, v) which, up to constants, is From this we see that the stochastic p.d.e's in Theorem 6.2 are of precisely the form that we should expect.

Numerical results
In order to gain a little more intuition about the effects of fluctuating selection on allele frequencies, in this section we present the results of a simple numerical experiment. It is certainly not an exhaustive study, but it points to some of the challenges that face us in distinguishing causes of patterns of allele frequencies. Our simulations are not of the spatial Lambda-Fleming-Viot models, but of natural extensions of the classical Moran model to incorporate spatial structure and fluctuating selection. After suitable scaling, we expect allele frequencies under these models to converge to the same limiting stochastic p.d.e. as our scaled SLFVFS. In the experiments that follow, we take the lattice L to be a circle of 100 demes with nearest neighbour migration at rate 1. We set N d = 400, s = 0.1 and α = 10. The environment variables in demes 0 − 50 all take the same value, as do those in demes 51 − 100. We consider four different scenarios: To ensure comparability of experiments, the same events are used for all the scenarios, with the only difference lying in the value of the environment variable. Thus, in the neutral case, either individual is equally likely to be the parent in 'selective events', irrespective of type. A more precise description of the code used for simulations can be found in Appendix D.
As a first comparison, Figure 1 shows the proportion of type a individuals across the whole population. This is just a single realisation of the experiment. There is certainly no dramatic divergence from neutrality. However, as we illustrate below, this can mask some more interesting effects at the local level.  In Figure 2, we have used a greyscale to record the proportion of type a individuals in each deme -the darker the colour, the greater the proportion of type a. In the top left, selection is fixed, and we clearly see the effect of type a being favoured in demes 50 − 100. In the next two frames (top right and bottom left), the environment fluctuates, but whereas on the top right demes 0 − 50 always favour the opposite type to demes 51 − 100, on the bottom left all demes always favour the same type. The neutral model is the bottom right. When we repeat over many realisations, we see a greater concentration of types than for the neutral model, but it is certainly not easy to distinguish between the two frames.
Just following the overall proportion of types in a deme is throwing away a lot of information, which may be available in genetic data, about the distribution of families. To explore this we used 'tracers', further investigated in the context of Spatial Lambda-Fleming-Viot processes in Section 6. In Figure 3 we mark individuals descended from the population in particular demes at time zero. The darker the colour, the higher the proportion of marked individuals. In a constant environment, see the left panel, the surviving family is well adapted to the environment in demes 50 − 100, but has difficulty invading demes 1 − 50, where it is not favoured. It also struggles to expand beyond deme 80. This turns out to be because of competition with the equally well adapted family that is descended from individuals that were in deme 84 at time zero (see the right hand panel).
In the left panel of Figure 4, we see a successful family in a fluctuating environment   Figure 5 shows the 'thinning' of the family resulting from the periods of time when it is not favoured.
Finally, Figure 6 shows the trace of descendants of ancestors in demes 12 and 16 for the neutral model. There appears to be a barrier between the two families, which could easily be mistaken for a change in the environment somewhere between demes 12 and 16, on the lower side of which the family descended from deme 12 is better adapted and on the upper side of which the family from deme 16 is better adapted. In fact this is due to competition between two equally fit families.   The environment is fluctuating, with its value in demes 0 − 50 perfectly anticorrelated with that in demes 51 − 100.

Proof of Theorem 4.6
The proof of Theorem 4.6 will rest on Theorem 2.  . Let E 1 , E 2 be complete separable metric spaces, and set E = E 1 × E 2 . For each n, let {(X n , Y n )} be a stochastic process with sample paths in D E ([0, ∞)) adapted to a filtration {F n t }. Assume that {X n } satisfies the compact containment condition, that is, for each > 0 and T > 0, there exists a and assume that {Y n (t) : t ≥ 0, n = 1, 2, . . .} is relatively compact (as a collection of E 2 -valued random variables). Suppose that there is an operator A : Af (X n (s), Y n (s))ds + f n (t) is an {F n t }-martingale. Let D(A) be dense in C(E 1 ) in the topology of uniform convergence on compact sets. Suppose that for each f ∈ D(A) and each T > 0, there exists p > 1 such that Af (X(s), y)Γ(ds × dy) is a {G t }-martingale for each f ∈ D(A).
As a particular case, KURTZ (1992, [30]) provides the following example. Let us briefly discuss how our setup fits into that of Theorem 8.1. The sequence of measure-valued evolutions M n t t≥0 corresponds to the process X n , while the sequence of environments corresponds to the process Y n . The sequence X n takes values in M λ . The state space of the sequence Y n can be identified with the product of {−1, 1} with the collection of closed subsets of R d . Taking {E k } k≥1 to be an increasing sequence of closed and bounded subsets with R d = ∞ k=1 E k , and writing H for Hausdorff distance, , to recover a metric under which this latter space is complete and separable. Notice that Y n defined in this way satisfies the assumptions of Example 8.2 with n replaced by n 2α , and therefore those of Theorem 8.1. The domain of the generator A is given by the closure of Since the state space of M n t t≥0 is compact, the compact containment condition (8.1) is satisfied.
The key step in applying Theorem 8.1 is the identification of the generator A, which leads to the decomposition (8.2). This consists of two steps. First we approximate the part of the SLFVFS generator corresponding to neutral events, and second we deal with EJP 26 (2021), paper 25. the terms corresponding to fluctuating selection. These two steps are accomplished in Subsection 8.1.
Our approach mirrors the strategy of Section 3. Having transformed the generator of the SLFVFS into a form which allows us to consider the process of averaged densities w n t , we use the 'separation of timescales' trick of KURTZ (1973, [29]), to (up to a small error) split the generator into parts corresponding to neutral and selective events in a convenient way; see equation (8.16). We then consider the two terms in this decomposition. The analysis of the neutral part of the generator will closely follow ETHERIDGE, VÉBER and YU (2020, [15]); the novel step is the second, which deals with the terms corresponding to fluctuating selection. Our analysis will not only identify the correct form of A and f n (see (8.28) and (8.29) respectively), but also provides a uniform bound on A, which shows that (8.3) is satisfied.

Identifying the limit
In what follows, C is a constant that depends on certain fixed quantities such as the functions f and F that appear in our test functions, c.f. (8.6), and the quantities s andū appearing in our scaled selection and impact parameters in the statement of Theorem 4.6. The quantity C may vary from line to line; its exact value is unimportant.
Throughout this section we shall be manipulating expressions pertaining to a local average of a version of the density of the SLFVFS. These manipulations are insensitive to changing that density on a set of Lebesgue measure zero and so should really be thought of as results for the measure-valued evolution itself.
Our result concerns the locally averaged process w n . However, before passage to the limit, this process is not Markov (in contrast to w itself). In order to overcome this, we follow ETHERIDGE, VÉBER and YU (2020, [15]) and define f (y)dy.
To alleviate notation, we have suppressed the dependence on n in this quantity. Notice that w n , φ f = where S f denotes the support of f . Here, and throughout, we have used | · | to denote the Euclidean norm. This yields the useful approximation It will be convenient to have some notation.

Notation 8.3.
Suppose that immediately before a reproduction event the unscaled process takes the value w, then immediately after the event, its state will be θ + x,R,ū (w) if the parent of the event was of type a, and θ − x,R,ū (w) if the parent was of type A, where (8.9) Using this notation, The first step will be to evaluate the generator on test functions of the form Notice that these are independent of the environment ζ. Recall that the analogous functions were our starting point in Section 3. Since the support S f of f is compact, it is only overlapped by events at a finite rate and so, writing Lψ F,f , which will be a function of the state ζ(·) of the environment, for the generator of the unscaled process acting on test functions of this form, we have We recognise the generator of the neutral spatial Lambda-Fleming-Viot process (which does not depend on the environment) plus two terms reflecting the selection events. If EJP 26 (2021), paper 25. we set the environment to be identically equal to 1, say, then the surviving selection term is precisely that considered in ETHERIDGE, VÉBER and YU (2020, [15]) in modelling selection in favour of type A individuals. After scaling (and a change of variables), the generator of w n acting on test functions of this form is given by By analogy with (3.3), we write this as (8.14) and, writing s n = n αs n = n α s/n 2/3 , Just as in the non-spatial case, in order to reduce the generator to the form (8.2), we appeal to the approach of KURTZ (1973, [29]). Entirely analogously to our calculations of Section 3 (see (3.4)), we evaluate the generator on test functions of the form G F,f (w n , ζ) = ψ F,f (w n ) + n −α L f sel n ψ F,f (w n , ζ), (8.15) where F ∈ C ∞ (R) and f ∈ C ∞ c (R d ). The second term on the right hand side will form part of f n in (8.2). Evidently we must now account for the changing environment, but this is just the generator of a pure jump process in which a new state is sampled from a (mean zero) stationary distribution at the times of a Poisson process of rate n 2α τ env . Writing the full generator as (1 − s n )L neu n + n α L f sel n + n 2α L env n , as in (3.3), we observe that L env n ψ F,f = 0 (since ψ F,f is independent of ζ). In (8.23) below, we show that Calculating exactly as in the non-spatial case (see (3.5)), we find a spatial analogue of (3.6), that is where we have used that α < 1/6 < 1/3 to absorb E π [L f sel n ψ F,f ] into the error on the left hand side. In this notation, for F ∈ C ∞ (R) and f ∈ C ∞ c (R d )

Neutral part
First we find an approximation to the part of the generator corresponding to neutral events. This mirrors the proof of Theorem 1.8 of ETHERIDGE, VÉBER and YU (2020, [15]) and so we recall the strategy, but omit many of the details.
Taking the expression (8.14) for L neu n , we perform a Taylor expansion of F about w n , φ f to obtain where we have used (8.10) and (8.11) and C n corresponds to the third order term in the Taylor expansion.
The key to controlling A n and B n is a Taylor expansion of φ f . First we use Fubini's Theorem to write  (8.17) where H denotes the Hessian. Using anti-symmetry of z − y, the term involving the gradient vanishes, as do the off-diagonal terms in the Hessian, so that (8.17) is equal to plus a lower order term which is bounded uniformly in the time variable s by Vol(S f )u n C f (where C f is a bound on the third derivative of f and S f is the support of f ). The analogue of (8.7) with φ f replaced by and (using (8.8) with ∆f in place of f ) we deduce that (8.19) and z 1 is the first component of the vector z ∈ R d . The error term tends to zero uniformly in s and since | w n (s), f | ≤ f ∞ Vol(S f ), we have that |A n (s)| is uniformly bounded in s and n. The contribution coming from B n (s) can be treated similarly: EJP 26 (2021), paper 25. (8.20) which implies that the sequence of quadratic variations is not only tight, but also, for dimensions d ≥ 2, it tends to 0 uniformly on compact time sets. In other words, any randomness remaining in the limit in d ≥ 2 will be due to the fluctuations in the environment.
In d = 1, using (8.7) and a further Taylor expansion to approximate φ f (y) by f (x) for y ∈ B n (x) (up to an error of order n −1/3 ) and that n 1/3 u n = u, It is this term that (via an application of Lemma 8.4) leads to the final term in (4.4).

Fluctuating part
In order to tackle the part of the generator which describes the fluctuating selection, we need to understand the action of the operator L f sel n on L f sel n ψ F,f (w n , ζ). It is here that we must diverge from ETHERIDGE, VÉBER and YU (2020, [15]).
As before, although the test function ψ F,f (w n ) does not depend on the environment, the result of applying L f sel n to it is a function that does depend on the environment, to which we must then apply L f sel n . Recall thats n = s/n 2/3 . We take a Taylor expansion of F about w n , φ f to obtain We now need to evaluate L f sel n on the product in (8.23). ζ). (8.24) As in the non-spatial case, these two terms will correspond to the diffusion and drift terms respectively in the stochastic p.d.e..
The difficulty that we now face is that H depends on w 2 , f as well as w, f and it will no longer suffice to use the test function φ f and integrate by parts. Let us consider the effect on w n (x) 2 of an event in the ball B n (y) in which the parent is of type a. After the event, w n (x) 2 is replaced by The corresponding change in w n (x)(1 − w n (x)) is then If the parent is type A, then the change is To evaluate the second term in (8.24), we calculate the effect of an event in B n (y) and integrate with respect to y, taking into account only the first order change in w 2 n (x) as given in (8.25). We can now observe that the second term in (8.24) can be written as performed a Taylor expansion of f around y for |y − x| < R n , and used Fubini's Theorem. Since F ( w n , φ f ) is independent of ζ, the first term in (8.24) can be read off immediately from our previous calculations, and combining all the above, we see that we should like to take A in Theorem 8.1 to be × w n (s, y) (1 − w n (s, y)) f (y) dxdyds + CtO(n −α ), (8.29) where the term of order n −α comes from the second term on the right of (8.15) and we have used that α < 1/6 to absorb a term of order n −1/3 . The additional term in AF ( ·, f ) in d = 1 stems from the term B n (s) in (8.21). Since AF ( ·, f ) is uniformly bounded, condition (8.3) is obviously satisfied. This is where we benefit from our choice of topology on M λ , which guarantees that the operator A maps into the space of continuous functions. To check that E[sup t≤T | f n (t)|] → 0 as n → ∞, we must control the integral terms in (8.29). We shall require the following lemma. The proof is given in Appendix A.
The expectation of the supremum of the second integral in (8.29) is immediately controlled by (8.30).
Since α < 1/6, we may choose δ > 0 in such a way that the right hand side tends to zero, and so we conclude that for fixed T , as n → ∞, as required.
Remark 8.5. This is the only part of the argument where the condition α < 1/6 is required. For all other arguments α < 1/3 would suffice.
Remark 8.6. The estimate on E sup t≤T | f n (t)| depends on the Lipschitz bound on the correlation kernel g(x, y). This is precisely the place where our argument breaks down if we allow the environmental noise to be 'white'. Remark 8.7. In our arguments we have implicitly used the fact that there exists a continuous (in time) modification of the solutions to the limiting martingale problem. Let us briefly sketch an argument which shows that this is indeed the case. (with similar statements for w 2 n and w 3 n then following immediately, since |w n | ≤ 1). In our case, since at each event the measure is changed only within a ball of radius R n , and the size u n of the jump is of order n −1/3 , which tends to 0 as n tends to infinity.

Uniqueness of solutions for the limiting equation
In order to establish uniqueness of the limit points in d ≥ 2, and hence convergence of our rescaled SLFVFS, we consider the corresponding stochastic partial differential equation. The first task is to show that any solution to the martingale problem (4.5) is a weak solution to the stochastic p.d.e. (4.6). We establish this in Appendix B, using a technique of KURTZ (2010, [32]). This then reduces the question to that of the weak uniqueness of the solution to the corresponding stochastic p.d.e., which we deduce from a result of HU, NUALART and SONG (2013, [24]).
In one dimension, although any solution to the martingale problem does indeed give a solution to the stochastic p.d.e., we do not have an analogue of the result of HU, NUALART and SONG (2013, [24]). The only case in which we have a proof of uniqueness of the equation is when W is also space-time white noise, in which case we can use the duality of Section 5. We note, however, that the proof of our main result would need to be modified to capture this form of environmental noise, see Remark 8.6. We do not currently know how to do this.
with initial data X 0 , where b, σ are real-valued, Lipschitz functions and the noise W is white in time and coloured in space with quadratic variation given by Suppose that the correlation kernel g(x, y) satisfies where p t is defined as Then, for any bounded initial data X 0 , there exists a unique adapted process X satisfying Let f * k denote the k-fold convolution of f . Writing w n as an integral and using the notation Rn (x − y) w n (y) − w n (x) dy dx.
Analogously, by setting F (x) = x in (8.24) and using (8.26), we obtain where the function v n in the integrand is a uniformly bounded function dependent on ζ and w n (in a neighbourhood of x) whose exact value will not be important. The main idea is to find a set of time-dependent test functions φ n such that For this purpose, we define a family of functions φ y n (0, A set of test functions which satisfy (A.2) is given by solutions to with initial condition φ y n (0, x). Notice that the initial condition is the density of a uniform distribution over the ball of radius R n centred at y. With this family of test functions we consider a time-dependent martingale problem of the form w n (T, ·), φ y n (0, ·) is a mean-zero martingale and we have suppressed the dependence of v n on ζ and w n in our notation. The question of regularity of w can therefore be translated into a question of regularity of the test function. To be a little more precise, notice that w n (T, y) − w n (T, z) = w n (T, ·), φ y n (0, ·) − w n (T, ·), φ z n (0, ·) = w n (T, ·), φ y n (0, ·) − φ z n (0, ·) and therefore, using the time dependent martingale problem, Remark A.1. We have defined the generator of the process for functions which are twice differentiable in the spatial coordinates. The characteristic function of a ball clearly does not satisfy this assumption. However, we may approximate such a function with a smooth function, making sure that the approximation does not lead to an error of order greater than n −1/3 . We shall ignore this error in what follows.
Observe that φ y n has a probabilistic interpretation. It is the density of the continuous time random walk with jump rate n 2/3 for which the distribution of an increment is given by that of the sum of two independent uniform random variables on B Rn (0), and for which the initial position is a point picked uniformly from the ball of radius R n about y. It will be convenient to write Λ Rn = * 2 Rn . The density of the random walk above, when started from the point y, which we shall denote by ψ y n , can be written as ψ y n (t, x) = ∞ k=0 e −n 2/3 t (n 2/3 t) k k! * 2k As n tends to infinity, ψ n tends to the density of a Brownian motion.
We shall require the following Lemma, which is an analogue of Lemma 3 of MUELLER and TRIBE (1995, [37]). ii) For any y, z ∈ R d , iii) For |x − y| > 1 and λ > 0, ψ y n (t, x) < C λ,T e −λ|x−y| ; iv) For |y − z| < 1 and λ > 0, We defer a sketch of the proof of this Lemma until the end of this Appendix. Armed with Lemma A.2, we may now sketch the proof of (8.30). Since all parts of the estimates have a similar flavour, we focus on the integral term of (A.4). As above, we shall write v n (s, x) for a uniformly bounded function, whose exact value, which depends on ζ and w n (s, ·) in a neighbourhood of x, will not be important. With the intention of using the bounds in Lemma A.2 we write We estimate these three terms separately. First observe that where we have again used Part iv) of Lemma A.2. Finally, I 3 can be bounded by Combining the estimates above, and observing that the term which involves w n (0, ·) can be tackled analogously, we conclude that for |y − z| < 1, Choosing τ = max{|y − z| 1/(d+1) , n −1/3 }, the deterministic term on the right hand side is at most C |y − z| 1/2(d+1) + n −1/6 e λ|y| .
It remains to bound the martingale difference. We use the Burkholder-Davis-Gundy inequality in the form see HALL and HEYDE, (1980, [22]), Theorem 2.11 for the proof in the discrete time setting, the continuous time case follows by approximation. The first term can be controlled in the same way as above, by writing it as an integral of (φ y n − φ z n ) times a bounded function and the jumps of the martingale are of size O(n −1/3 ), so combining with (A.6) (recall that α < 1/6) leads to, for any λ > 0 and |y − z| < 1, which concludes the proof of the Lemma.

2
Proof of Lemma A.2. To prove the first part of Lemma A.2, we first observe that if P t is a Poisson process with rate n 2/3 , then for a > 0, where p(t, z) := 1 (2πt) d/2 exp − |z| 2 /2t , and the error e k is bounded by C/ √ k with C independent of k and x. Now using that the Λ * k Rn in the kth summand in (A.5) is the density function of √ k/n 1/3 times the random variable in (A.9), a change of variable allows us to express it as p(Γk/n 2/3 , x − y) plus an error of order (k/n 2/3 ) d/2 1/ √ k. To estimate the error term in i), we split the sum in (A.5) into two parts, {k ≤ 2eT n 2/3 } and {k > 2eT n 2/3 }. For the first, (k/n 2/3 ) d/2 is uniformly bounded and exploiting (A.8) with a = 1/2 yields a term of the form of the error in i).
(A.10) Evidently this can also be absorbed into the error term. An identical argument controls the part of the sum over normal densities corresponding to k > 2eT n 2/3 , and the proof of i) is complete. For ii), we use i) to approximate both terms by sums of normal densities plus an error of order n −1/3 t −1/2 . Now observe that and sum using the estimate (A.8) to conclude this part of the proof.
For iv), we begin by observing that We focus on bounding the supremum of the difference of ψ. Recall that if a, b > 0, min(a, b) ≤ a 1/(d+1) b d/(d+1) . We use ii) and iii) to bound The result follows immediately from (A.12).

B Equivalence of martingale problem and stochastic p.d.e. formulations
Suppose that d ≥ 2. Write w for a representative of the density of the M λ -valued process, which is characterised as a solution to the martingale problem (4.5). Our aim is to show that w is also a weak solution to the stochastic p.d.e. (4.6). This follows as a special case from the following result.
is a continuous martingale with quadratic variation Then w t is a (stochastically and analytically) weak solution to the stochastic p.d.e.
where the noise W is white in time and coloured in space with quadratic variation The converse of Theorem B.1 is a simple application of Itô's formula. We closely follow KURTZ (2010, [32]), who provides a powerful proof of the corresponding result for stochastic (ordinary) differential equations, not through the classical approach of explicitly constructing the driving noise, but via the Markov Mapping Theorem. The proof of our case requires only very minor modifications of the proof from KURTZ (2010, [32]). This proof is very flexible and could be adapted further without any major difficulties to cover a wider class of equations, e.g. with more singular coefficients Suppose that E is a complete separable metric space and that the operator B ⊆ C(E) × C(E) is separable and a pre-generator and that its domain D(B) is closed under multiplication and separates points in E. Let (E 0 , r 0 ) be a complete, separable, metric space and γ : E → E 0 be Borel measurable. Let P(E) denote the space of probability measures over E. Let α be a transition function from E 0 into E (that is, assume that y ∈ E 0 → α(y, ·) ∈ P(E) is Borel measurable) satisfying α(y, γ −1 (y)) = 1. Define Let µ 0 ∈ P(E 0 ), and define ν 0 = α(y, ·)µ 0 (dy). If U is a solution of the martingale problem for (C, µ 0 ), then there exists a solution V of the martingale problem for (B, ν 0 ) such that U has the same distribution on M E0 [0, ∞) (the space of measurable functions mapping [0, ∞) to E 0 , with topology given by convergence in Lebesgue measure) as where F U t is the completion of σ r 0 h(U (s))ds : r ≤ t) ∨ σ(U 0 ) (for bounded and measurable h) and T U is the set of times for which U (t) is F U t -measurable.
The proof of Theorem B.1 is based on the introduction of an auxiliary process Z, whose generator plays the role of B in the statement of Theorem B.2. This process is introduced in such a way that we can evaluate the conditional distributions of the driving EJP 26 (2021), paper 25. noise given the state of the process w. Those conditional distributions are given in terms of stationary processes.
First we describe the representation of the driving noise in terms of a countable collection of stationary processes. By our assumptions, the noise can be realised as an element of the Hilbert space W −1−d,2 (R d ) (see, e.g. RIPPL (2012, [44])), that is on the dual of the Sobolev space W 1+d,2 (R d ). Therefore there exists an orthonormal basis {φ i } i≥1 such that the noise W is completely characterised by a countable sequence of independent, one-dimensional Brownian and, for any adapted process H, the integral with respect to the noise can be written in We can recover W t (φ i ) (and hence W ) from Y i , by observing the increments. Indeed, Remark B.3. Our definition ensures that Y i (t) are independent. Identifying 0 and 2π, we note that Y = {Y i } i≥1 is a Markov process with compact state space [0, 2π) ∞ . If Y i (0) is uniformly distributed on [0, 2π) and independent of W , then Y i is stationary.
We define an auxiliary process Z by where w is a solution of the martingale problem (B.1) and Y = {Y i } i≥1 . Let A be the generator associated with the martingale problem (B.1). Its domain is given by where L i is the generator of Y i and the last term (which is finite due to our construction of the noise) is determined by the Meyer process between w and Y i . The exact value of the c i 's will not concern us. We write it in this form to stress the dependence on the derivative of f i .
Suppose D(A) is an algebra and that f • X is càdlàg for each f ∈ D(A). Let f 0 · · · , f m ∈ D(A) and h 0 , · · · , h m ∈ B(E). Then is a square integrable martingale with Meyer process where we have usedȳ to denote the vector (y i ) i≥1 . This choice of functions guarantees that N (t) defined as in Lemma B.4 is a continuous martingale with quadratic variation equal to 0. This implies that N (t) = 0. Using (B.4) and rearranging, we observe that this implies that This, in turn, implies that w is a weak solution to (B.2).
Proof of Theorem B.1. We begin by observing that D( A) is closed under multiplication and separable. Separability of the operator follows from separability of its domain. The operator A is a pre-generator and separates points. With our choice of topology on M λ , A maps into the space of continuous functions. We proceed to construct the transition function α. Let η y ∈ P([0, 2π) ∞ ) be the product of independent uniform distributions on [0, 2π). We define α(w, ·) = δ w × η y ∈ P C b (R d ) × [0, 2π) ∞ .
The function γ can be defined as a projection into the first coordinate, that is γ(x, y) = x.
We observe that αL x f (x, y) = L x αf (x, y). (B.6) Since η y has been chosen in a way which guarantees that it is a stationary distribution for Y , we have, by definition, αL y f (x, y) = 0.
An application of the disintegration theorem leads to existence of a density w : R d → [0, 1].
The density can be recovered from the formula M t (dx, dκ) = w(t, x)δ a (dκ) + (1 − w(t, x))δ A (dκ) and is unique up to sets of Lebesgue measure 0. The usual choice of topology on M λ is the topology of vague convergence. To check that the sequence of measures M n ∈ M λ with densities w n converges vaguely to M with density w, it is sufficient to show that for every f ∈ C c R d w n (x)f (x)dx → w(x)f (x)dx.
As Theorem 8.1 requires that the limiting operator maps into the space of continuous functions, it is more convenient for us to consider the topology given by the joint EJP 26 (2021), paper 25. convergence of the first three powers of w n . To be more precise, we require joint convergence of We shall denote the topology generated by this convergence by τ . Note that the topology τ is a priori stronger than the topology of vague convergence.
The following discussion, whose main aim is to show that our choice does not alter compactness of the space, is a minor adaptation of Appendix A of KALLENBERG (2002, [27]). It is summarized by the following theorem. Observe that d(µ, ν) defines a metric on M λ . In particular, observe that since w n (x) and w(x) are uniformly bounded and non-negative, the convergence of w n f (x)dx implies convergence of w 2 n f (x)dx. Separability follows. Completeness will be clear once we prove ii).
Since integration against an element of M λ is continuous for f ∈ C c , the relative compactness of A implies (C.1). For the other direction, assume that the set A ∈ M λ satisfies (C.1). Consider a family of compact subsets of R d × K such that for each n ∈ N the set K n is contained in the interior of the set K n+1 and ∪ n K n = R d × K. Consider a family of continuous cut-off functions g n ∈ C c , satisfying We observe that for each n, the set { g n · µ : µ ∈ A} is uniformly bounded and therefore, by Prochorov's Theorem, it is sequentially relatively compact. The compactness of M λ follows by a diagonal argument.
Remark C.2. Our consideration can be extended to a wider class of spaces and topologies without any major changes. For example, one could take K to be any compact subset of R or require convergence of any higher power of the densities.

D Detailed description of simulations D.1 A general description
We provide a more detailed description of the simulation.

Initialization
The simulation begins by choosing random seeds, separately for the times of reproduction/migration events and for the selection of individuals during those events. The user specifies the parameters of the simulation: the number of demes, their spatial structure (for the simulations reported in Section 7 this is always a onedimensional torus), the number of individuals per deme, a correlation function for the environments, the selection coefficient, the rate of changes in the environment, the frequency of creating records and the time of the simulation. The rates of neutral and migration rates are determined by the number of individuals per deme, and the rate of selection events depends additionally on the selection coefficient. The basic objects in the simulation are initialized. Each deme is assigned an identification number and a list of neighbours, determined by the selected spatial structure (for the one-dimensional torus, each deme has two neighbours). Each individual is randomly assigned a type according to a Bernoulli (1/2) distribution. Each individual is assigned an identification number, according to the deme in which it lives. This number allow us to track the ancestral origin of the population forward in time. The initial state of the environment in each deme is specified. A maximal time of the simulation is set.
The object that determines the times of the events, 'the Clock', is initialised. For each deme, for each type of event, the next time at which an event occurs is encoded. A separate variable with the time of the next environment change is also kept. The Clock is initialised with the exponential distribution with rates specified for each type of an event. The current time is set to 0.
Simulation As long as the time of the simulation is smaller than the maximal time of the simulation the following is repeated.
The clock is asked to return the time, deme identification number α, and type κ of the next event (chosen to be the smallest number on the list of all times stored). This time is saved as the current time in the simulation. A random number is drawn from an exponential distribution with rate determined by the type κ of the event. This new time is added to the current time and passed to the Clock to specify the next time an event of type κ occurs in deme α.
The event of type κ is executed in deme with identification number α. With a given frequency (specified by the user with respect to the number of events that have taken place), a record of the current state of the population is created.

Events
There are four type of events: two types of reproduction events (neutral and selective), migration events and environmental events.
Whenever a reproduction event takes place in deme α, two individuals are chosen uniformly from deme α. One of the selected individuals is then randomly chosen to be a potential parent, using a Bernoulli (1/2) distribution. If the event is neutral, the potential parent is selected as the parent. If the event is selective, the types of the two selected individuals are checked. If they are the same, the potential parent is selected as the parent. If the types are different, the state of the environment is checked. Then the individual with type favoured by the environment is selected as the parent. The type and the ancestral origin of the parent are assigned to the second individual.
Whenever a migration event occurs in deme α, one of the neighbouring demes is randomly selected according to the uniform distribution. A single individual is chosen EJP 26 (2021), paper 25. uniformly from each of deme α and the randomly selected neighbour and the type and ancestral origin of those individuals are swapped.
Whenever an environmental event occurs, a new state of the environment is randomly chosen. For the environmental regimes considered in Section 7, a single number (1 or −1) is drawn randomly from a Bernoulli (1/2) distribution. In the cases of neutral/constant selection, the environment is not changed. In the fluctuating cases, the environment is changed in all demes (consistent with the specified correlation function).
Recording Two separate types of record are created, the 'proportion file' and the collection of 'ancestral origin files'. The number of ancestral origin files is equal to the number of demes, β. The proportion file is a .txt file with β + 3 columns, separated by a single space. The first column contains the information of the time at which records are created. The second column contains the global proportion of type a at the given time. Columns 3, . . . , (β + 3) record the proportion of type a in each deme. Each 'ancestral origin' file is a .txt file with β + 1 columns, separated by a single space. File number γ contains information on the proportion of individuals in each deme with ancestral origin equal to γ. The first column contains the time of creation of the records. Columns 2, . . . , (β + 1) record the proportion of individuals with ancestral origin equal to γ in each deme.

D.2 Implementation
The simulation was implemented in C++14. The Mersenne Twister pseudorandom number generator was used. Figures were created using R, (2015, [43]).