ON CONVERGENCE OF POPULATION PROCESSES IN RANDOM ENVIRONMENTS TO THE STOCHASTIC HEAT EQUATION WITH COLORED NOISE

We consider the stochastic heat equation with a multiplicative colored noise term on $\mathbb{R}^d$ for $d \ge 1$. First, we prove convergence of a branching particle system in a random environment to this stochastic heat equation with linear noise coefficients. For this stochastic partial differential equation with more general non-Lipschitz noise coefficients we show convergence of associated lattice systems, which are infinite dimensional stochastic differential equations with correlated noise terms, provided that uniqueness of the limit is known. In the course of the proof, we establish existence and uniqueness of solutions to the lattice systems, as well as a new existence result for solutions to the stochastic heat equation. The latter are shown to be jointly continuous in time and space under some mild additional assumptions.


Introduction
The stochastic heat equation considered in this paper is a stochastic partial differential equation (SPDE), which can formally be written as
$$\frac{\partial}{\partial t} u(t,x) = \Delta u(t,x) + f(t,x,u(t,x)) + \sigma(t,x,u(t,x))\,\dot{W}(t,x). \tag{1}$$
Here, $u$ is a random function on $\mathbb{R}_+ \times \mathbb{R}^d$, where $\mathbb{R}_+ \equiv [0,\infty)$, and the operator $\Delta$ denotes the Laplacian acting on $\mathbb{R}^d$. $W$ is a noise on $\mathbb{R}_+ \times \mathbb{R}^d$ that is white in time and colored in space, for example a spatially homogeneous noise. The coefficients $f$ and $\sigma$ are real valued continuous functions on $\mathbb{R}_+ \times \mathbb{R}^d \times \mathbb{R}$. They are typically nonlinear, and we pay particular attention to coefficients which are not Lipschitz continuous in $u$.
We are concerned with convergence of rescaled branching particle systems in a random environment and associated lattice systems, which are infinite systems of stochastic differential equations (SDE), to solutions of (1). Intimately connected to convergence are questions of existence and uniqueness, for the lattice systems as well as for the SPDE. For the more delicate case of non-Lipschitz coefficients, a new existence result is established through the approximation procedure. In this case, uniqueness has to be shown separately to assure convergence.
The choice of SPDE and the study of convergence of associated systems to that equation have been motivated by three factors:
(i) The heat equation with a multiplicative noise term that is white in space and time arises in studying the diffusion limit of a large class of spatially distributed (for the most part branching) particle systems. It describes, for example, the weak limit of branching Brownian motion as well as of lattice systems of reproducing populations. Spatially colored noise reflects spatial correlations of solutions to the SPDE. Given the recent focus on interacting particle systems, it is an intriguing question how the stochastic heat equation with colored noise relates to such systems or, as an intermediate step, to infinite systems of SDEs with correlated noise terms.
(ii) Stochastic heat equations of the form (1), where W is white in space and time, have function valued solutions only in dimension one. Thus, connections of these SPDEs to population systems are restricted to a one dimensional state space. Some conditions on the coefficients and the noise are known so that the heat equation with colored noise has function valued solutions in all dimensions. This class of equations can therefore be expected to offer a description for population processes in more general settings. Biologically interesting are in particular the dimensions two and three.
(iii) The particle picture and the approximation by systems of related SDEs provide a representation of a general class of SPDEs that also arise in other areas of application, for example in filtering theory. In our case, the approximation by a system of SDEs leads to a new existence result for the stochastic heat equation with colored noise and non-Lipschitz noise coefficients. Both representations may be exploited further for numerical purposes or the study of properties of these SPDEs.
In the following we elucidate these points a bit further and point out connections to related work. One of the classical examples in the study of spatially structured branching processes is super Brownian motion, also called the Dawson-Watanabe superprocess (see Watanabe [44] and Dawson [6]). It is a process, $X$, that describes the mass distribution of branching particles on a state space and thus takes values (for example) in $M_f(\mathbb{R}^d)$, the space of finite measures on $\mathbb{R}^d$ equipped with the topology of weak convergence. The measure valued process $X$ can be characterised by the following martingale problem: Let $\Phi \in C_b^2(\mathbb{R}^d)$, the space of bounded continuous functions which are twice continuously differentiable, and let $X(\Phi) \equiv \int_{\mathbb{R}^d} \Phi(x)\,X(dx)$. Then
$$X_t(\Phi) = X_0(\Phi) + \int_0^t X_s(\Delta \Phi)\,ds + M_t(\Phi), \tag{2}$$
where $M(\Phi)$ is a martingale with quadratic variation
$$\langle M(\Phi) \rangle_t = \rho \int_0^t X_s(\Phi^2)\,ds, \tag{3}$$
and $\rho$ is a constant. The Dawson-Watanabe superprocess can be obtained as the diffusion limit of branching Brownian motion. In this population model, individuals independently follow Brownian paths during their exponentially distributed lifetime, leaving a random number of offspring after their death according to a fixed offspring distribution. As approximations one then considers the empirical measure when particle mass and lifetime are rescaled appropriately,
$$X_t^n = \frac{1}{n} \sum_{\alpha \sim_n t} \delta_{Y_t^{\alpha,n}}, \tag{4}$$
where the sum is over all particles $\alpha$ alive at time $t$. In the $n$-th approximation, the branching rate is increased by a factor of $n$, and each particle contributes a mass $\frac{1}{n}$ at its position $Y_t^{\alpha,n}$ in the state space. In the limit, the Laplacian in (2) corresponds to the spatial motion, while the quadratic variation (3) reflects the reproduction, with the constant $\rho$ depending on the variance of the offspring distribution as well as on the branching rate. One may take another step back from these approximating population models. Branching Brownian motion itself is the diffusion limit of a branching random walk on a lattice, for example on $\mathbb{Z}^d$.
As considered by Dawson [7], one may change the order of limits, first taking the diffusion limit for the reproduction, and then rescaling the motion. The intermediate step can now be described by the lattice system (5). Here, $x_i$ describes the population size at lattice point $i$, and $m_{ij}$ the migration between sites $i$ and $j$. In the special case relating to the Dawson-Watanabe superprocess, the migration is given by the generator of a simple random walk for which $m_{ij} = \frac{1}{2d}$ if $|i - j| = 1$ and zero otherwise. Reflecting that branching is a local property, the $W_i$ are independent Brownian motions, and the noise coefficients $\sigma_i(t,x) = \sqrt{\rho x}$ take the same shape as in the one dimensional Feller diffusion, see [15]. The latter is the diffusion limit of a Galton-Watson branching process without a spatial component. While the measure valued process $X$ satisfying (2) and (3) is well defined in any dimension, it has a density, which we denote by $u$, only in dimension one. It has been shown (see Konno and Shiga [21], Reimers [35]) that $u$ is a solution to the stochastic heat equation
$$\frac{\partial}{\partial t} u(t,x) = \Delta u(t,x) + \sqrt{\rho\, u(t,x)}\,\dot{W}(t,x), \tag{6}$$
where $W$ is a one dimensional space-time white noise. Moreover, one can show (see Blount [1] and Kotelenez [22]) that (approximate) densities of the particles in a branching random walk converge directly to the SPDE (6).
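To make the single-site noise coefficient concrete, the following is a minimal sketch of an Euler-Maruyama discretisation of the Feller diffusion $dx = \sqrt{\rho x}\,dW$. It is an illustration only and not part of the paper's construction: the function name `feller_euler`, the step size, and the clamping at zero (used so that the square root stays defined after a discrete step) are assumptions made here.

```python
import math
import random


def feller_euler(x0, rho, dt, steps, rng):
    """Euler-Maruyama path for dx = sqrt(rho * x) dW (hypothetical helper)."""
    path = [x0]
    x = x0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x = x + math.sqrt(rho * x) * dw
        # Assumption: clamp at zero so the next square root is defined.
        # Once the path hits zero it stays there, since the coefficient vanishes.
        x = max(x, 0.0)
        path.append(x)
    return path


rng = random.Random(0)
path = feller_euler(x0=1.0, rho=1.0, dt=1e-3, steps=1000, rng=rng)
```

The scheme has no drift term, in line with the martingale character of the Feller diffusion; the clamping introduces a small bias near zero that a more careful scheme would have to control.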
The area of superprocesses, in general, has expanded rapidly with the main interest focused on interacting particle systems. A number of variations of (6), also for white noise and d = 1, have been linked to generalisations of the Dawson-Watanabe superprocess and other particle systems. We refer to [11,9,13,30] for an overview and further references.
Apart from its connections to population processes, the stochastic heat equation is naturally a prominent example within the area of SPDEs, and thus the references given here will be far from complete. Function valued solutions of the heat equation with multiplicative white noise have been studied in one dimension in a multitude of settings, see for example Dawson [6], Walsh [43], DaPrato and Zabczyk [34], Shiga [38], Pardoux [29], Gyöngy [17], and references therein. In higher dimensions, function valued solutions for the stochastic heat equation with multiplicative colored noise have been investigated. The case of a linear noise coefficient has been treated by Dawson and Salehi [10] and Noble [28]. Amongst others, Kotelenez [23], Peszat and Zabczyk [31,32], Brzeźniak and Peszat [2], Dalang [5], Manthey and Mittmann [25], Tindel and Viens [41] and Sanz-Solé and Sarrà [37] investigate solutions with Lipschitz coefficients. For some results on equations with non-Lipschitz coefficients see Viot [42], DaPrato and Zabczyk [34], Krylov [24] and Kallianpur and Sundar [20]. However, these earlier results are not directly applicable to the setting considered here due to various assumptions such as boundedness of the domain, compactness of the differential operator, Lipschitz continuity of the coefficients, or nuclear or spatially homogeneous noise.
The paper is organized as follows. In Section 2 we give some notation and state the main results. In Section 3 we rigorously construct a particle system in a random environment and show that it converges to a solution of the martingale problem associated to (1) when $W$ is colored noise and the noise coefficient $\sigma$ is linear. In Section 4 we consider related lattice systems with non-Lipschitz noise coefficients and correlated noise terms, and establish existence and uniqueness of their solutions in weighted $l^p$ spaces. We then prove existence of solutions to the corresponding stochastic heat equations with non-Lipschitz noise coefficients on weighted $L^p$ spaces. Convergence of approximate densities of the rescaled lattice systems is shown provided that uniqueness holds for the limit. Section 5 shows that, under some additional assumptions, the solutions constructed in Section 4 are jointly continuous in time and space.

Formulation of the main results
Let $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ be a complete probability space. We use $C$ as a generic constant, which may change its precise value from line to line. Frequently, we list the quantities that the constant $C$ depends on in parentheses. Let $C_c^\infty$ be the infinitely differentiable functions with compact support. The space $D(\mathbb{R}_+, E)$ denotes the càdlàg functions from $\mathbb{R}_+$ to $E$, endowed with the Skorohod topology, and $C(\mathbb{R}_+, E)$ the closed subspace of continuous functions endowed with the supremum norm.
The noises $W$ considered in this work are Gaussian martingale measures on $\mathbb{R}_+ \times \mathbb{R}^d$ in the sense of Walsh [43]. They can be characterized by their covariance functional $J_k$, see (7). We call the function $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ the correlation kernel of $W$. We remark that sufficient conditions for the existence of a martingale measure $W$ corresponding to $k$ are that $J_k$ is symmetric, positive definite and continuous. Thus, necessarily, $k(x,y) = k(y,x)$ for all $x, y \in \mathbb{R}^d$. Continuity on $C_c^\infty$ is implied, for example, if $k$ is integrable on compact sets.
We also note that a general class of martingale measures, spatially homogeneous noises, can formally be described by (7). Here, $k(x,y) = \tilde{k}(x - y)$, and one can show that all spatially homogeneous noises are of this form if we allow $\tilde{k}$ to be a generalised function on $\mathbb{R}^d$. White noise is probably the most prominent example of this class, which we recover for $\tilde{k} = \delta_0$, the delta function. Also, $J_k$ describes a nuclear martingale measure if and only if $k \in L^2(\mathbb{R}^d \times \mathbb{R}^d)$. See Sturm [40] pp. 18 for more detail.
Here, we focus on colored noises for which $k \in C_b(\mathbb{R}^d \times \mathbb{R}^d)$. In this case, $W$ is a random field on $\mathbb{R}_+ \times \mathbb{R}^d$. We remark that by letting $\tilde{k}$ approach a $\delta_0$-function, this case may still, in some sense, be considered as a "smoothing out" of white noise.
Throughout this work we consider solutions to the formal equation (1) in the mild form in the sense of the following Definition 2.1. Let $p$ be the $d$-dimensional heat kernel,
$$p(t,x) = (4\pi t)^{-d/2} \exp\left(-\frac{\|x\|^2}{4t}\right). \tag{8}$$
We will sometimes abuse notation and abbreviate $p(t, x - y) = p(t, x, y)$.
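Definition 2.1 A stochastic process $u : \Omega \times \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$, which is jointly measurable and $\mathcal{F}_t$-adapted, is said to be a (stochastically) weak solution to the stochastic heat equation (1) with initial condition $u_0$, if there exists a martingale measure $W$, defined on $\Omega$, such that a.s. for almost all $x \in \mathbb{R}^d$,
$$u(t,x) = \int_{\mathbb{R}^d} p(t,x,y)\,u_0(y)\,dy + \int_0^t \int_{\mathbb{R}^d} p(t-s,x,y)\,f(s,y,u(s,y))\,dy\,ds + \int_0^t \int_{\mathbb{R}^d} p(t-s,x,y)\,\sigma(s,y,u(s,y))\,W(dy,ds). \tag{9}$$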
The process u is called a (stochastically) strong solution to (1) if (9) is fulfilled a.s. for almost all x ∈ R d for a given W.
We assume that the coefficients f, σ : R + × R d × R → R are continuous and satisfy the following growth condition. For all T ≥ 0, there exists a constant c(T ), such that for all 0 ≤ t ≤ T, x ∈ R d , and u ∈ R, As solution spaces we consider L p -spaces on R d for p ≥ 2 with some weight function γ : and define We write L p (R d ) if γ ≡ 1. As a choice for the weight function we consider γ λ (x) = e −λ||x|| for λ > 0. However, other integrable weight functions, in particular any positive continuous function that equals γ λ outside a bounded region could be used (cf. the remarks in Section 4.5). It can be shown by standard methods (see for example Walsh [43] and Sturm [40] Proposition 3.2.3 for detail in this specific case) that mild solutions as in Definition 2.1 satisfy the corresponding martingale problem, Here, M t (Φ) is a continuous square integrable martingale with given quadratic variation for a class of test functions Φ, that depends on the regularity of the solution sought. With the appropriate class of test functions, solving the martingale problem is indeed equivalent to finding a stochastically weak solution to (9), see Sturm [40] pp 103.
Interpreting u once again as the density of a measure, we note that the martingale problem (13) to (14) makes sense for measure valued solutions if σ(t, x, u) = C σ u and f (t, x, u) = C f u are linear in the solution u, where C σ and C f are constants, possibly dependent on t and x. In Section 3 we define a branching particle system that converges to this solution in a measure sense. We do so in a more general setting since the arguments are identical, but take C f ≡ 0 and C σ = 1 for notational convenience.
In the model we consider, the particles move independently from each other on a locally compact Polish space E with their motion given by a Feller generator (A, D(A)), where D(A) is the domain of A. At given times, each particle may branch into two particles or die, or just live on. The main difference to the Dawson-Watanabe superprocess lies in the fact that the distribution of the branching behavior is dependent on a random environment which is correlated in space but independent in time: At each branch time we consider an independent copy of a random field ξ on E. We assume that ξ is symmetric, and that for some > 0, uniformly in all x ∈ E. The correlation of ξ at different points in space is given by vanishing at infinity. The probabilities for a birth/death event to happen at a branch time are given by the positive/negative part of the (appropriately truncated) random field evaluated at the location of the particle. These birth/death probabilities are rescaled by 1 √ n in the n-th diffusion approximation. For the rescaled empirical measure X n (defined as in (4)) we can then establish the following result: is the unique solution of the following martingale problem. For all Φ ∈ D(A), is a continuous square integrable F X t − martingale with The model is inspired by Mytnik [27], who considers a related branching mechanism. In comparison, branching is a rather rare event in our model: As n → ∞ the particles just live on for the majority of branch times. As a result, the branching does not give rise to the white noise term which is present in most superprocesses, in the archetypical Dawson-Watanabe superprocess as well as in Mytnik's limiting superprocess. For E = R d and A = ∆, any density u of X is a solution to (13) and (14) with σ linear and f ≡ 0, corresponding to a weak solution of the linear heat equation with no drift. 
For E = Z d and A the discrete Laplacian we see that solutions X to the martingale problem are solutions (in l 1 (Z d )) to the lattice system (5) for f i ≡ 0 and σ i (t, x) = x, and correlated Brownian motions As in the work by Mueller and Perkins [26], nonlinear noise coefficients can be expected to arise from the above particle picture by an additional density dependence of the branching mechanism. In Section 4 we consider such nonlinear noise coefficients which may, in particular, be non-Lipschitz. Since in this case we need to show convergence of approximate densities directly to the solution of the limiting SPDE (rather than in a measure sense), it is convenient to start with the corresponding lattice systems instead of the particle model itself (see Funaki [16] and Gyöngy [18] for this approach applied to related systems). Thus, in Section 4, we first consider existence and uniqueness questions of the following system: for all i in a countable index set S. We write X for ( where k ij are constants. The lattice system that interests us in particular is contained in this description by setting for each i ∈ S, In analogy to the definition of solution spaces in the continuous setting, (11) and (12), we consider solutions with continuous paths in the space is a weighted l p -norm on the index set S and Γ = (γ i ) i∈S ∈ l 1 (S). We define the following growth and Lipschitz conditions on f m and σ i : For any T ≥ 0 there exists a constant c(T ) so that for all 0 ≤ t ≤ T, Strong existence and uniqueness of lattice systems of the form (20) with independent noise terms W i have, for example, been investigated by Shiga and Shimizu [39]. With correlated Brownian motions they have not been considered in such detail. 
However, as we show in Section 4, existence and uniqueness results for solutions in the space l p Γ carry over from the uncorrelated systems, leading to the following Theorem: Assume also 0 ≤ k ij ≤ K for all i, j ∈ S, and some constant K > 0, as well as Γ = (γ i ) i∈S ∈ l 1 (S) with all γ i > 0. Let f m and σ i for i ∈ S be continuous in their components, f m with respect to the product topology on R S . Suppose that the growth condition (23) holds for f m and (25) holds for σ i and all i ∈ S.
Then there exists for each initial condition X(0) ∈ l p Γ a solution X to the infinite dimensional system (20) with paths in C(R + , l p Γ ). We have the bound If we furthermore assume that f m satisfies (24) for p = 1 and σ i satisfies (26) for all i ∈ S, then solutions to (20) are pathwise unique.
We remark that, as in the result for finite dimensional SDEs, see [45], pathwise uniqueness together with existence of (stochastically) weak solutions implies the existence of (stochastically) strong solutions. The following corollary demonstrates that the infinite dimensional SDEs, that will subsequently be used as approximations to the stochastic heat equation, are covered by Theorem 2.3 and gives some conditions on positivity of solutions. The latter property is of interest if the system and its limit is interpreted as representing particle densities. We continue to show that the approximate densities of the appropriately rescaled systems converge to solutions of the stochastic heat equation with colored noise (9) in the spaces L p γ λ (R d ). We start with the following system defined on the rescaled lattice 1 n Z d . For z n = (z n 1 , . . . , z n d ) ∈ 1 n Z d we define du n (t, z n ) = ∆ n u n (t, z n )dt + f (t, z n , u n (t, z n ))dt + n d σ(t, z n , u n (t, z n ))W (dt, I n z n ), (28) where, denoting the unit vectors on R d by e i , the operator in the first term is given by This discrete Laplacian ∆ n is the generator of a simple random walk, Y n , on the rescaled lattice, for which time has been speeded up by a factor 2dn 2 . Hence, jumps to any neighboring site happen independently at rate n 2 . The noises W (t, I n z n ) are derived from a colored noise W on R d with covariance given by (7) and k ∈ C b (R d × R d ). Specifically, where the intervals I n z n are defined by Hence, W (t, I n z n ) are correlated one dimensional Brownian motions and we note that for k bounded we have W (t, I n z n ) ≈ n −2d , explaining the factor n d in definition (28). For putting (28) in its mild form, we define heat kernel approximations with the help of the random walk Y n . Set for z n ,z n ∈ 1 n Z d , We will extend the lattice systems to all of R d as step functions. For this define the projection κ n (x) ≡ z n for x ∈ I n z n . 
The associated heat kernels on all of R d are given bȳ Note thatp n is not any more a function of x − y. Instead, we will later use the translation invariance of p n and the fact that κ n (x) − κ n (y) = κ n (x − κ n (y)) for all x, y ∈ R d . In order to simplify notation we abbreviate occasionallyp n (t, x) ≡ p n (t, 0, κ n (x)), and writep n d , p n d and p d to indicate the dimension if necessary.
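The rescaled lattice system (28) can be illustrated numerically. The following is a minimal sketch for $d = 1$, not the construction used in the proofs. It rests on several assumptions introduced here: the lattice is cut down to a finite periodic window, the rescaled noise increments $n^d W(dt, I^n_{z^n})$ are replaced by Gaussian vectors with covariance $dt \cdot k(z, z')$ (the midpoint approximation of the double integral of $k$ over two cells), the kernel $k(x,y) = e^{-|x-y|}$ is merely one bounded continuous example, and all function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor low with low * low^T = a (a symmetric, positive definite)."""
    m = len(a)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][r] * low[j][r] for r in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def discrete_laplacian(u, n):
    """n^2 * (u[j+1] + u[j-1] - 2 u[j]) on a periodic window, d = 1."""
    m = len(u)
    return [n * n * (u[(j + 1) % m] + u[j - 1] - 2.0 * u[j]) for j in range(m)]


def euler_step(u, t, n, dt, f, sigma, chol, rng):
    """One explicit Euler step for the lattice system with correlated noise."""
    m = len(u)
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # Correlated Gaussian increments standing in for n^d * W(dt, I_z).
    dw = [math.sqrt(dt) * sum(chol[i][r] * g[r] for r in range(i + 1))
          for i in range(m)]
    lap = discrete_laplacian(u, n)
    z = [j / n for j in range(m)]  # lattice points in (1/n) Z
    return [u[j] + (lap[j] + f(t, z[j], u[j])) * dt + sigma(t, z[j], u[j]) * dw[j]
            for j in range(m)]


n, m, dt = 4, 8, 0.01  # dt * n^2 <= 1/2 keeps the explicit scheme stable
points = [j / n for j in range(m)]
kernel = [[math.exp(-abs(a - b)) for b in points] for a in points]
chol = cholesky(kernel)
rng = random.Random(1)
u = [1.0] * m
for step in range(100):
    u = euler_step(u, step * dt, n, dt,
                   lambda t, x, v: 0.0,  # drift f = 0
                   lambda t, x, v: v,    # linear noise coefficient
                   chol, rng)
```

With $f \equiv 0$ and $\sigma \equiv 0$ the step reduces to the discrete heat equation, which conserves total mass on the periodic window; this gives a simple sanity check for the Laplacian part of the scheme.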
The rescaled lattice systems, u n (t, x) ≡ u n (t, κ n (x)), can now be written in the mild form for Observe that if u n understood as a function on the lattice 1 n Z d has paths in C(R + , l p Γ λ ( 1 n Z d )) it follows from the properties of γ λ (see (66)) that the extension to R d has paths in C(R + , L p γ λ (R d )). We can now state the main theorems proven in Section 4. The first is an existence result for solutions to (1) with non-Lipschitz coefficients.
Theorem 2.5 Assume that the coefficients f (t, x, u) and σ(t, x, u) are real valued functions on R + × R d × R that are continuous in x and u, and satisfy the growth condition (10). Assume further that E [||u 0 || p γ λ ,p ] < ∞, for some p > 2. Let W be a colored noise of the form (7) such that ||k|| ∞ ≤ K < ∞. Then there exists a (stochastically) weak solution, u ∈ C(R + , L p γ λ (R d )), to the stochastic heat equation (1) with respect to W. For any T > 0 there exists a constant C(T ), so that The second result additionally states convergence of the associated lattice systems to solutions of (1) -provided that uniqueness is known for the limit.
Theorem 2.6 Let f, σ, u 0 and k satisfy the conditions of Theorem 2.5. Assume further that there exist (stochastically) strong solutions, u n , to the approximating lattice systems (34). If in addition pathwise uniqueness holds for the heat equation (1) then convergence in probability of u n to u on the space C(R + , L p γ λ (R d )) holds. If weak uniqueness holds for (1) we obtain weak convergence of u n to u on the space C(R + , L p γ λ (R d )).
Not surprisingly, pathwise uniqueness, and thus convergence of the approximations, holds if the coefficients satisfy Lipschitz conditions (see Peszat and Zabczyk [32]). But it can also be shown that pathwise uniqueness holds for the lattice systems if the drift coefficients are Lipschitz continuous and $\sigma$ satisfies the conditions of Yamada and Watanabe [45]. The latter condition is for example fulfilled for $\sigma(t,x,u) = u^\theta$ for $\theta \ge \frac{1}{2}$. For the special case $\sigma(t,x,u) = \sqrt{u}$ and some conditions on $u_0$, arguments inspired by those of Yamada and Watanabe show pathwise uniqueness for the limiting equation (1) on all of $\mathbb{R}^d$, see Sturm [40] Chapter 3.3. As the colored noise analogue to the Dawson-Watanabe superprocess, for which pathwise uniqueness is an open question, this is a particularly interesting case.
Finally, we comment on the setting and conditions of Theorem 2.5 and Theorem 2.6 in Section 4.5 before proving continuity of the constructed solutions in Section 5. Let C γ λ (R d ) be the space of continuous functions on R d , endowed with the weighted supremum norm, Theorem 2.7 Let u be a solution of (1) with coefficients that satisfy the growth condition (10). Let p > 2, d < p − 2, and assume that E[||u 0 || p ∞,γ λ p ] < ∞, as well as (35). Then , and for any T > 0 there exists a constant, C(T ), so that The arguments follow along the lines of those given in Brzeźniak and Peszat [2], who restrict themselves to coefficients f and σ which are Lipschitz continuous, but consider more general unbounded correlation kernels. In fact, Lipschitz continuity is not crucial for this result as becomes apparent in the proof -which is for bounded k particularly accessible and therefore given.
We finally remark that Hölder continuity for the stochastic heat equation with colored noise has recently been investigated with similar means. We have just become aware of a result by Sanz-Solé and Sarrà [37] whose setting is close to ours. It differs through the assumption of Lipschitz coefficients and of a stronger (uniform) bound on the solutions, focusing instead on the most general conditions on the correlation kernel.

A particle system in a random environment
In the following, we rigorously construct the branching particle system in a random environment in Section 3.1 and give the proof of Theorem 2.2 in Section 3.2.

Construction of the particle system
In keeping track of the particles and their genealogy we follow the construction of Walsh [43], which has been used by Perkins [30], and, in a setting similar to ours, by Mytnik [27]. Let all particles be labeled by multi-indices $\alpha = (\alpha_0, \ldots, \alpha_N)$ from an index set $I$. The quantity $|\alpha| = N$ specifies the generation of the particle. The unique ancestor of $\alpha = (\alpha_0, \ldots, \alpha_N)$ $k$ generations back is denoted by $\alpha - k \equiv (\alpha_0, \ldots, \alpha_{N-k})$. We note that $I$ is the index set for all possible particles since in our model there are at most two offspring. Which particles really exist is decided by the offspring distribution.
In the n-th approximation, branching events happen at times i n for i ∈ N. For t ∈ R + , we set t n = [nt] n , the last branch time before t. Now let {Ỹ α,n } α∈I be a collection of independent Feller processes with generator A andỸ α 0 = 0. The path of a particular particle and its ancestors is then given by Here, x α 0 is the initial position of the first particle, and Λ is a "cemetery"-state. Thus, each particle moves independently during its lifetime, starting from its birth place. The branching behavior is dependent on the random environment. Let ξ be as in (15) to (17). In order to define the offspring distribution of the approximating particle systems we need to truncate the random fields. For all x ∈ E set Analogously to (17), we now define Let (ξ n i ) i∈N be independent copies of the truncated random field ξ n on E. Now let (N α,n ) α∈I be a family of random variables so that {N α,n , |α| = i} are conditionally independent given ξ n i , and the position of the particles in the i-th generation at the end of their lifespan. Denoting by ξ n+ i and ξ n− i the positive and negative part of ξ n i respectively, the conditional offspring probabilities are given by According to the offspring distribution we trim the branching tree down to its existent particles. For any particle α = (α 0 , . . . , α N ) we write α ∼ n t whenever the particle α is alive at time t, which is the case if and only if α had an unbroken line of ancestors. Thus, α ∼ n t for all t with Lastly, we need to define a filtration. It will be the natural filtration generated by the process, We remark that the environment is not a part of the filtration, and will therefore be averaged in each step. In the studies of random media this is called the "annealed" case in contrast to the "quenched" case, where one considers statements for almost all random environment. 
The quenched case of a similar model to the one considered here, called the parabolic Anderson model, has been studied in some detail, see for example Carmona and Molchanov [4]. For the branching times t n , we also define a discrete filtration, that will be used later in conditioning. Note thatF n tn = F n (tn+n −1 )− includes the sigma-algebra generated by the motion of the particles born at time t n , but not that generated by their offspring distribution or the random environment at time t n + n −1 .
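The conditional offspring law described in the construction above can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's definition: the truncation is taken here to be clipping of the environment value to $[-\sqrt{n}, \sqrt{n}]$ so that the probabilities stay in $[0, 1]$, and the function names `offspring_probs` and `sample_offspring` are hypothetical. What the sketch does reproduce from the text is that birth and death probabilities are the positive and negative parts of the truncated field rescaled by $1/\sqrt{n}$, so that the conditional mean of $N^{\alpha,n} - 1$ equals the truncated field value divided by $\sqrt{n}$.

```python
import math
import random


def offspring_probs(xi, n):
    """Return (P[N=0], P[N=1], P[N=2]) given the environment value xi."""
    root = math.sqrt(n)
    xi_n = max(-root, min(root, xi))  # assumed truncation to [-sqrt(n), sqrt(n)]
    p_birth = max(xi_n, 0.0) / root   # positive part, rescaled by 1/sqrt(n)
    p_death = max(-xi_n, 0.0) / root  # negative part, rescaled by 1/sqrt(n)
    return p_death, 1.0 - p_birth - p_death, p_birth


def sample_offspring(xi, n, rng):
    """Draw the number of offspring (0, 1 or 2) at one branch time."""
    p0, p1, _ = offspring_probs(xi, n)
    v = rng.random()
    if v < p0:
        return 0
    if v < p0 + p1:
        return 1
    return 2


rng = random.Random(0)
draws = [sample_offspring(0.5, 100, rng) for _ in range(1000)]
```

For large $n$ almost all draws equal one, which matches the remark that the particles just live on for the majority of branch times.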
Before proceeding to the proof of Theorem 2.2 we put the rescaled empirical measure X n (see (4)) into a form which gives an intuitive idea how the limit emerges. For Φ ∈ D(A), α ∼ n t n and t ∈ [t n , t n + n −1 ) we define the F n t -martingales Thus, we have for t ∈ [t n , t n + n −1 ), For the difference of measures between two branch times we get thus By adding all the differences from (47) and (48) we finally obtain The term M b,n tn (Φ) is a discrete martingale with respect to the filtrationF n tn , as can be seen easily by conditioning appropriately and using the fact that for α ∼ n t n , because of the symmetry condition (15). The term M r,n this by calculating M b,n . We note first that where we have, for notational brevity, not always explicitly stated where ξ is evaluated. By conditioning we obtain with (50) and (51), The quadratic variation of M b,n is thus given by In the following, we will show that the first two terms of this expression vanish in the limit while the third term converges to M as given in (19).

Proof of Theorem 2.2
The proof of convergence proceeds in several well known steps. First, we show tightness of the sequence in $D(\mathbb{R}_+, M_f(\hat{E}))$, where $\hat{E}$ is the one point compactification of the space $E$. This implies relative compactness and thus the existence of a convergent subsequence in the compactified state space. We then show that all limit points of the sequence are in $C(\mathbb{R}_+, M_f(E))$ and that they are solutions to the martingale problem given by (18) and (19). The uniqueness of the martingale problem finally implies convergence of the particle system. In order to show tightness of the measures $X^n$ in $D(\mathbb{R}_+, M_f(\hat{E}))$ we start with several lemmas. We define for a process $Y$ in $D(\mathbb{R}_+, \mathbb{R})$, $\delta Y_t \equiv Y_t - Y_{t-}$, specifying the height of the jumps of the process $Y$. The first lemma gives us moment bounds on $X^n$ and the branching martingale $M^{b,n}$, and states that the jumps of the latter vanish in the limit.
PROOF. We first obtain an L p (Ω)-estimate on M b,n (Φ) tn as given in (52). For 0 ≤ t ≤ T and p ≥ 1 we have for C where we have used the fact that, for all Let T (j) be the semigroup on E j of j independent motions with generator A, and define for x, y ∈ E the function d n (x, y) = Φ(x)Φ(y)k n (x, y). Then, Note that the last line follows from ||d n || ∞ ≤ ||Φ|| 2 ∞ K and T (j) being a contraction semigroup as well as Jensen's Inequality. Similarly, We also need a bound on the jumps of M b,n (Φ). For doing this we define the decomposition δM b,n tn (Φ) = δB 1,n tn (Φ) + δB 2,n tn (Φ), where Indexed lexicographically by α ∼ n t n and conditioned on σ(F n tn ∪ξ n |α| ), each δB 1,n tn (Φ) is a discrete martingale with respect to its natural filtration since E[(N α,n − 1)|ξ n |α| (x)] = 1 √ n ξ n |α| (x). Using the martingale properties, we obtain for C where in the third inequality we have used Burkholder's Inequality for discrete martingales (see [3]) resulting in some constants C. For the fourth inequality note that |N α,n −1− 1 √ n ξ n |α| (Y α,n tn+n −1 )| ≤ 2. We are left to estimate δB 2,n (Φ) : where we have first applied Jensen's Inequality to the particle sum. Now, because A is a conservative operator we have for Φ ≡ 1 that AΦ ≡ 0. Thus we obtain, from (46) and (49), X n t (1) = X n 0 (1) + M b,n tn (1). By the same version of Burkholder's Inequality as above and for small enough as in (16), setting p = 1 + 2 , it now follows that The last line and the choice of the constant C(T, K), which is independent of n, followed from (53) to (55) and (56) as well as (57). But the function T → E sup 0≤t≤T X n t (1) 2+ < E[(X n 0 (1)2 nT ) 2+ ] is a bounded measurable function, and thus we can apply Gronwall's Lemma to obtain the bound E sup 0≤t≤T X n t (1) 2+ ≤ C(T, K). This completes the proof of (i) since E sup 0≤t≤T X n t (Φ) 2+ ≤ ||Φ|| 2+ ∞ E sup 0≤t≤T X n t (1) 2+ . 
Property (ii) follows now from the calculations in (58) with an additional constant depending on ||Φ|| ∞ and the boundedness of the mass shown in (i).
Property (iii) follows from (56) and (57) combined with the boundedness of the total mass shown in (i). $\Box$
Using the above, specifically Lemma 3.1(i), we can now show that both $M^{r,n}(\Phi)$ and the terms in (52) indeed become negligible.
Since the motion of the particle system is no different from that of the Dawson-Watanabe superprocess considered by Perkins [30] in the same set-up, we may simply refer to his Lemma II.4.3 for the proof. To show convergence of the remaining terms we need the following lemma, which is a consequence of condition (16):
Since, for some $\epsilon > 0$, $E[|\xi(x)|^{2(1+\epsilon)}]$ and certainly also $E[|\xi(x)|]$ are uniformly bounded on E by (16), the above converges to zero uniformly on E × E. □
With this property we deduce: We now show that the martingale as well as the integral part of X n are C-tight, meaning that they are tight and have continuous limit points. For this we use the following criterion (see Theorem 8.6 of Chapter 3 in Ethier and Kurtz [14] and Proposition VI.3.26 of Jacod and Shiryaev [19]).
Theorem 3.5 Let (G, r) be a complete and separable metric space and let X n be a sequence of processes with paths in D([0, ∞), G). The sequence is tight in D([0, ∞), G) if the following conditions hold: (i) X n t is tight for every rational t ≥ 0, Furthermore, a sequence of processes is C-tight in D([0, ∞), R) if it is tight and satisfies for all N ∈ N and > 0. PROOF. Lemma 3.1(ii), together with Markov's Inequality and Burkholder's Inequality, implies that M b,n tn (Φ) and M b,n (Φ) tn are tight in R for any fixed t ≥ 0. Hence, condition (i) of Theorem 3.5 is satisfied. To complete the tightness proof we show condition (ii) of the theorem. We note first that for any fixed n, 2n and t ≥ 0. In order to obtain the uniformity of estimate (59) in n consider for 0 ≤ t ≤ T and 0 ≤ u ≤ δ, using (52) and the calculations in (53) to (55), Due to Lemma 3.1(i) there exists, for all > 0, an n such that for all n > n and δ small enough the last quantity is bounded by . Theorem 3.5 now implies the tightness of M b,n (Φ) . The tightness of M b,n (Φ) follows with analogous arguments and the observation that It remains to show C-tightness of X n , which is done by showing C-tightness of its components.
Given their tightness we just need to show (60) according to Theorem 3.5. For the quadratic variation we just need to observe that which converges to zero by (61). For M b,n (Φ) itself, the same condition has already been shown in Lemma 3.1(iii). The arguments for C n t (Φ) follow the same pattern using the boundedness of ||AΦ|| ∞ and Lemma 3.1(i). □
Denote by Ê = E ∪ Θ the one-point compactification of E. The generator A and its semigroup T t are extended to Ê by setting for f ∈ C(Ê),
Proposition 3.7 The X n form a tight sequence in D(R + , M f (Ê)) and all limit points are continuous.
Here, the first term converges weakly by assumption, and the branching part M b,n (Φ) and C n (Φ) are C-tight in D(R + , R) by Lemma 3.6. The martingale M r,n (Φ) converges to zero in C(R + , R) in L 2 (Ω) by Lemma 3.2, so certainly also in law. Thus X n (Φ) is C-tight in D(R + , R) for Φ in a dense subset of C b (Ê). As M f (Ê) is compact, Theorem 2.1 of Roelly-Coppoletta [36] now implies tightness in D(R + , M f (Ê)). All limit points X must have continuous sample paths since X(Φ) is continuous for Φ in a dense subset of C b (E). □
Now, let X n k be a subsequence which converges weakly in the space D(R + , M f (Ê)). By Skorohod's Representation Theorem we can find a probability space and on it a sequence $\tilde X^{n_k}$ such that $\mathcal{L}(\tilde X^{n_k}) = \mathcal{L}(X^{n_k})$, with $\tilde X^{n_k}$ converging to $\tilde X$ almost surely in D(R + , M f (Ê)).
Lemma 3.8 For any a.s. convergent subsequence $\tilde X^{n_k}$, $\tilde M^{b,n_k}_t(\Phi)$ converges in probability for each t ≥ 0 and Φ ∈ D(Â). The limit is a square integrable continuous martingale, M(Φ), with quadratic variation given by (19).
PROOF. By the continuity of the limit, for all t ≥ 0, $\sup_{0\le s\le t} |\tilde X^{n_k}_s(\Phi) - \tilde X_s(\Phi)| \to 0$ a.s., and so $\int_0^t \tilde X^{n_k}_s(\hat A\Phi)\,ds \to \int_0^t \tilde X_s(\hat A\Phi)\,ds$ a.s. By Lemma 3.2, $\sup_{0\le s\le t} M^{r,n}_s(\Phi) \to 0$ in $L^2(\Omega)$. Thus, the expression in (62) converges in probability on D(R + , Ê). The limit is a square integrable martingale because of Lemma 3.1(ii). It is continuous since all the terms in (62) have continuous limits. It only remains to show that the quadratic variation converges to the appropriate expression. Thus, consider a.s.
Here, the first term converges to zero by Lemma 3.3 and Lemma 3.1(i  [11]). The remainder terms of the quadratic variation, see (52), converge to zero in L 1 (Ω) due to Lemma 3.4. Thus, the expression (19) is the a.s. limit of M b,n k (Φ) t . □
Lemma 3.9 The limit takes values in the space C(R + , M f (E)).
PROOF. Consider $\hat\Phi_l \in C_b(\hat E)$ such that $\hat\Phi_l \to 1_\Theta$ and $\hat A\hat\Phi_l \to 0$, both boundedly pointwise. Choose, for example, $\hat\Phi_l(x) = \int_0^1 T_s e^{-l\|x-\Theta\|}\,ds$. Since $X_t \in M_f(\hat E)$ a.s. for all t ≥ 0, this implies that $X_t(\hat A\hat\Phi_l) \to 0$ a.s. for all t. Now, all terms are uniformly bounded by Lemma 3.1(i), and the first and last converge to zero by the Dominated Convergence Theorem. The second term is bounded by $K\int_0^T E[\sup_{0\le s\le t} X_s(\hat\Phi_l)^2]\,dt$. Thus, as it is again bounded for each l by Lemma 3.1(i), we obtain $E[\sup_{0\le t\le T} X_t(\hat\Phi_l)^2] \to 0$ by Gronwall's Lemma. By Lebesgue's Dominated Convergence Theorem we obtain $\sup_{0\le t\le T} X_t(\Theta) = 0$ a.s., and the process X indeed takes values in M f (E), which implies C-tightness in D(R + , M f (E)). □
The proof of Theorem 2.2 is now complete upon noting the following result, which is contained in the main theorem of Mytnik [27].
Theorem 3.10 Solutions to the martingale problem (18) and (19) are unique in distribution.
The proof is based on the observation that X would be dual to itself if it was sufficiently regular: If u and v are the densities of two independent processes that satisfy the martingale problem (18) to (19), then it would follow that For proving uniqueness, it suffices then to construct a suitably regular approximation to X, and apply an approximate duality argument.

Convergence to the heat equation
We start with an outline of the proof of Theorem 2.3 in Section 4.1. It closely follows the arguments of Shiga and Shimizu [39], and we will therefore only be explicit about the necessary modifications. We then prove Corollary 2.4 in Section 4.2, which shows that the lattice systems that approximate the heat equation fulfill the conditions of Theorem 2.3.
Subsequently, in Section 4.3, we cite some auxiliary lemmas, whose proofs are postponed to the appendix. The lemmas are crucial in the following Section 4.4, where we give the proof of Theorem 2.5 and Theorem 2.6 by showing tightness and convergence of subsequences of the approximations. We conclude with some remarks on the obtained results in Section 4.5.

Proof of Theorem 2.3
We approximate the solution X to (20) by finite dimensional diffusions. Choose finite sets S n ⊂ S such that S n ↑ S as n → ∞, and let X n be the solution of the diffusion for i ∈ S n . Set $x^n_i(t) = x_i(0)$ for $i \notin S_n$. Note that the W n i can be represented as linear combinations of at most n independent Brownian motions. Thus, existence of weak solutions with continuous sample paths is a classical result; see, for example, Theorem 3.10 of Chapter 5 in [14].
The key of the proof is to obtain a uniform bound on the approximating finite dimensional solutions X n in the norm || · || Γ,p , which can then be used to bound temporal differences in the same norm. This implies tightness in C([0, ∞), R S ) where R S is equipped with the product topology, and limit points satisfy (20). That the solutions in fact live on C([0, ∞), l p Γ ) is obtained by transferring the uniform bound from the approximations to the limit points by Fatou's Lemma. In order to obtain a uniform bound we apply Gronwall's Lemma combined with a stopping time argument. Define T (N,n) = inf{t ≥ 0 | ||X n (t)|| Γ,p ≥ N } and consider .
While the first term is bounded by assumption the next two can be bounded by the term C(c, T, K)(1 + The estimates (64) and (65) combined with Theorem 3.5 show that each coordinate is tight in C(R + , R). By a diagonalisation argument one can then find a weakly convergent subsequence in C(R + , R S ), where R S is equipped with the product topology. Using Skorohod's Representation Theorem and the continuity of the coefficients one can show that all limit points solve (20) for each i ∈ S, see Sturm [40] pp 69 for detail. This completes the proof of existence. As remarked earlier this argument does not imply any convergence on C(R + , l p Γ ), and thus (27) needs to be verified separately for the solution to the infinite dimensional lattice system. By (64) and Fatou's Lemma we obtain first sup 0≤t≤T E[||X(t)|| p Γ,p ] < ∞. From this, (27) follows by a calculation analogous to that of (63). The a.s. continuity of the sample paths in the space l p Γ follows from this bound with a calculation similar to that in (65).
Pathwise uniqueness now follows by the same calculations as for the uniform bound (see (63) and the estimates following it) if the Lipschitz conditions on the coefficients are assumed. Here we deduce that the difference of two solutions with respect to the same noise must be zero by using the Lipschitz conditions (24) and (26) where we previously used the growth conditions (23) and (25).
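The Gronwall step behind both the uniform bound and the uniqueness argument can be sketched as follows. This is a generic worked statement rather than the paper's own display: $g$, $h$ and $C$ are placeholder names, and the integral inequalities are assumed to take the form shown (an inequality of the first type, $g_n(t) \le C(1 + \int_0^t g_n(s)\,ds)$, appears in the proof of Proposition 4.4).

```latex
% Assumption: g, h : [0,T] -> [0,\infty) are bounded and measurable, and C > 0 is a constant.
\begin{align*}
g(t) \le C\Big(1 + \int_0^t g(s)\,ds\Big) \ \text{for all } t \le T
  &\quad\Longrightarrow\quad g(t) \le C e^{Ct} \ \text{for all } t \le T, \\
h(t) \le C \int_0^t h(s)\,ds \ \text{for all } t \le T
  &\quad\Longrightarrow\quad h \equiv 0 \ \text{on } [0,T].
\end{align*}
% Sketch for the first line: with G(t) = \int_0^t g(s)\,ds one has G' \le C + C G, hence
% (e^{-Ct} G(t))' \le C e^{-Ct}, so G(t) \le e^{Ct} - 1 and g(t) \le C + C G(t) \le C e^{Ct}.
% Sketch for the second line: iterating the inequality k times gives
% h(t) \le \sup_{s \le T} h(s) \, (Ct)^k / k!, which tends to 0 as k \to \infty.
```

The boundedness hypothesis is presumably what the stopping time $T^{(N,n)}$ provides: the stopped norm is bounded by construction, and since the resulting bound does not depend on N, the localisation can be removed afterwards.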

Proof of Corollary 2.4
For existence of solutions we merely have to verify (23).
Here, we have first applied Jensen's Inequality and the growth condition (25). Then we used that γ λ (·) is summable over S. Finally, we note that the term in parentheses is bounded by a constant since, for $|i - j| \le C_m$, $\gamma_\lambda(j)/\gamma_\lambda(i) \le e^{\lambda C(C_m, d)}$.
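The ratio bound used in the last step can be checked directly. The following is a sketch under the assumption that the weight has the form $\gamma_\lambda(x) = e^{-\lambda\|x\|}$ as in Lemma 4.2. The constant written $C(C_m, d)$ in the text is assumed to absorb the comparison between the norm used for $|i - j|$ and the norm in the weight; it is replaced by the placeholder $c$ here.

```latex
% Assumption: \gamma_\lambda(x) = e^{-\lambda \|x\|}, and \|i - j\| \le c whenever |i - j| \le C_m.
\begin{align*}
\frac{\gamma_\lambda(j)}{\gamma_\lambda(i)}
  = e^{-\lambda(\|j\| - \|i\|)}
  \le e^{|\lambda|\,\bigl|\|j\| - \|i\|\bigr|}
  \le e^{|\lambda|\,\|i - j\|}
  \le e^{|\lambda| c}.
\end{align*}
% The second inequality is the reverse triangle inequality.
```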
For uniqueness we have to verify (24), which works in an analogous way, using the Lipschitz condition (26) instead of the growth condition (25).
Positivity follows with arguments identical to those in [39].

Auxiliary lemmas
In this section, we state a number of technical lemmas, proofs of which can be found in the appendix. Lemma 4.1 estimates spatial and temporal differences of the heat kernelsp n , as well as of the differences ofp n and p (see (8) and (33) for definitions). Lemma 4.2 provides an estimate for the heat kernelsp n and p integrated against the weight function γ λ . In order to show tightness of the approximations we need a compactness criterion on L p γ λ (R d ), which is stated in Lemma 4.3. This is an adaptation of the Frechet-Kolmogorov Theorem to our setting. (i) R dp n (t, x, y)dy = R d p(t, x, y)dy = 1 for all x ∈ R d , t ≥ 0.
(iv) sup 0≤h≤δ sup x∈R d R d |p n (t + h, x, y) −p n (t, x, y)| dy → 0 uniformly in n as δ → 0 for each t > 0. The analogous result holds for p.
Lemma 4.2 Let γ λ (x) = e −λ||x|| , and λ ∈ R. Then there exists a constant C(δ, λ) → 1 as δ → 0 such that Also, for all T ≥ 0, there exists a constant C(T, λ) independent of n such that for all x ∈ R d and 0 ≤ t ≤ T, R dp and likewise is relatively compact if and only if the following conditions hold, where B α is the ball with radius α.
is relatively compact if the above conditions hold for Lebesgue measure replaced by γ λ (x)dx.

Proof of Theorems 2.5 and 2.6
We first show tightness of the rescaled systems defined in equation (34). In the case of Theorem 2.5, we consider approximating systems for which the coefficients f and σ in definition (34) are replaced by Lipschitz continuous approximationsf n andσ n , and can therefore apply the same arguments as in the case of Theorem 2.6.
Thus, through the convergence of subsequences we are able to prove existence of weak solutions to the heat equation with colored noise for continuous coefficients that obey a linear growth bound, see Theorem 2.5. When strong existence for the approximating systems and uniqueness for the SPDE are known (uniqueness is well known for Lipschitz coefficients and is investigated for non-Lipschitz coefficients in Sturm [40]), Theorem 2.6 establishes convergence of the approximations.
By the assumptions in 2.6, we have existence of (stochastically) strong solutions to the system (34) with initial conditions u 0 . In this case, we setf n (t, x, u) = f (t, κ n (x), u) and likewise forσ n . By the continuity of f and σ we obtain pointwise convergence: For all (t, x, u) ∈ R + × R d × R as n → ∞,σ n (t, x, u) → σ(t, x, u) andf n (t, x, u) → f (t, x, u).
In order to obtain approximations driven by a given noise W to the SPDE of Theorem 2.5 we exploit the continuity of f and σ and definef n andσ n which converge pointwise as in (69), and satisfy in addition to the growth condition (10) the Lipschitz condition for all t ∈ R + , x ∈ R d , and u, v ∈ R. Corollary 2.4 now implies pathwise uniqueness and thus existence of strong solutions to (34) with initial condition u 0 and coefficientsf andσ, and so we define in this casef n (t, x, u) =f (t, κ n (x), u) and likewiseσ n . Note that these functions also satisfy (69).
The proof of tightness now proceeds by showing condition (i) and (ii) of Theorem 3.5. First, Proposition 4.4 gives a uniform bound on u n in the || · || γ λ ,p -norm using Lemma 4.2. Proposition 4.5 estimates temporal and spatial differences of u n in this norm following a similar line of arguments. Here, we exploit the fact that, since u n is a mild solution, we can estimate temporal and spatial differences of the heat kernelsp n instead. Their properties are more amenable and the necessary results have been given in Lemma 4.1.
Subsequently we prove compact containment, condition (i) of Theorem 3.5, in Proposition 4.6. Here, we use the compactness criterion of Lemma 4.3, which states that, apart from a uniform bound and an estimate of spatial differences (shown previously), we also need a uniform estimate of the tails of the u n . Finally, tightness follows since condition (ii) is fulfilled by the convergence of temporal differences already established in Proposition 4.5.
For A 3 we use the factorisation method first introduced by DaPrato, Kwapień and Zabczyk [33], which is based on the fact that for 0 < α < 1, .
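The identity in question is presumably the classical Beta-integral identity that underlies the factorisation method. It is stated here in its textbook form; the variable names $s$, $r$, $t$, $v$ are placeholders, and the normalisation may differ from the one used in (71) to (73).

```latex
% Textbook form of the factorisation identity, for 0 < \alpha < 1 and s < t.
\begin{align*}
\int_s^t (t-r)^{\alpha-1}(r-s)^{-\alpha}\,dr
  &= \int_0^1 (1-v)^{\alpha-1} v^{-\alpha}\,dv
     && \text{(substituting } r = s + (t-s)v\text{)} \\
  &= B(1-\alpha,\alpha) = \Gamma(1-\alpha)\,\Gamma(\alpha)
   = \frac{\pi}{\sin(\pi\alpha)}.
\end{align*}
```

Because the right hand side does not depend on $s$ or $t$, multiplying by $\sin(\pi\alpha)/\pi$ inserts a factor of one into a convolution integral. This is what allows a stochastic convolution to be rewritten as a deterministic convolution applied to a second stochastic integral.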
We then define for some function v : as well as so that by the stochastic Fubini Theorem (see Theorem 2.6 of Walsh [43]), R dp n (t − s, x, y)σ n (s, y, u n (s, y))W (dy, ds). Thus Here, we apply Burkholder's Inequality and the growth condition onσ n , as well as ||k|| ∞ ≤ K and Jensen's Inequality (twice) and obtaiñ · (1 + |u n (s, y))|)(1 + |u n (s, z)|)dydzds R dp n (t − s, x, y) · (1 + |u n (s, y)|)dy since α < 1 2 . Thus, we obtain with several applications of Hölder's Inequality, x R dp n (s − s , x, y)dy where we have used in the last inequality that α > 1 p as well as the estimate in (74). Taken together, we obtain that there is a constant C = C(T, c, K, p, λ, u 0 ) independent of n such that for all t ≤ T, g n (t) ≤ C(1 + t 0 g n (s)ds). But each g n is bounded according to (27). Thus, sup n g n (t) ≤ Ce CT =: C(T ) for all t ≤ T by Gronwall's Lemma. 2 Using this bound we can prove the following approximation of differences. For the difference of spatial translations we obtain for all 0 ≤ t ≤ T, PROOF. In order to show (76) we use the decomposition (34) and split the integral into four parts. Here, the stochastic integral part is represented via the factorisation method introduced in (71) to (73) in the proof of Proposition 4.4. Abbreviate the differencep n h (t, x, y) ≡p n (t + h, x, y) −p n (t, x, y). Because the paths of u n are in C(R + , L p γ λ (R d )) we can define pathwise h n,δ = h n,δ (T ) ≤ δ ≤ 1 so that R dp n h n,δ (t − s, ·, y)f n (s, y, u n (s, y))dyds|| p γ λ ,p +E sup 0≤t≤T || t+h n,δ t R dp n (t + h n,δ − s, ·, y)f n (s, y, u n (s, y))dyds|| p γ λ ,p +E sup 0≤t≤T ||J α−1,n J n α u n (t + h n,δ , ·) − J α−1,n J n α u n (t, ·)|| p γ λ ,p For bounding B 1 let us first assume that E [||∆u 0 || p γ λ ,p ] < ∞. Then, Here, we have applied Jensen's Inequality before using Lemma 4.2. For general u 0 consider u 0 (x) ≡ R d p( , x − y)u 0 (y)dy. Observe that ∆p( , x, y) = ( 1 2 ||x − y|| 2 − d )p( , x, y). 
Thus, with arguments almost identical to those of Lemma 4.2 we obtain R d ∆p( , x, y)γ λ (y)dy ≤ C( )γ λ (x), as well as Notice that the first term bounding B 1 is itself bounded by C(T + 1)E [||u 0 − u 0 || p γ λ ,p ] by Lemma 4.2 and |p n h n,δ (t, x, y)| ≤p n (t + h n,δ , x, y) +p n (t, x, y) for all t ∈ R + , x, y ∈ R d . Also by Lemma 4.2 we obtain that for all ≥ 0 bounded, a.s. ||u 0 || p γ λ ,p ≤ C||u 0 || p γ λ ,p < ∞. Thus, by first applying Lebesgue's Differentiation Theorem and then using the above bound and Lebesgue's Dominated Convergence Theorem we obtain E [||u 0 − u 0 || p γ λ ,p ] → 0 as → 0. It follows that B 1 converges to 0 uniformly in n as δ → 0 since we can make the right hand side arbitrarily small by choosing and then δ small enough. Similarly to the previous calculations, we obtain for B 2 , uniformly in n as δ → 0 according to Lemma 4.1 (i) and (iv) and Proposition 4.4. Similarly, we can bound (1 + |u n (s, y))|) p γ λ (y)dyds ≤ δ p c p 2C(T + 1)T sup 0≤t≤T E || 1 + |u n (t, y)| || p γ λ ,p → 0 as δ → 0 uniformly in n because of Proposition 4.4. The term B 4 we split into three parts and obtain Assume again that 1 p < α < 1 2 . Thus, by Proposition 4.4 the calculations in (74) render that We can hence estimate the outer integrals of the components of B 4 analogously to the calculations in (75). For B 4,1 this leads to where we have applied Hölder's Inequality and subsequently used (79) in the last step. The quantity now converges to zero by Lemma 4.1 (iv) and Lebesgue's Dominated Convergence Theorem. The term B 4,2 is estimated similarly to B 4,1 , Here, we have used that the function t → t α−1 is monotone. Because of its continuity, the integrand converges pointwise to zero on the interval (0, T ] as δ → 0. It is further bounded by 2 t (α−1) p p−1 , which is an integrable dominating function due to α > 1 p . Thus, convergence to zero follows by Lebesgue's Dominated Convergence Theorem. 
In estimating the term $B_{4,3}$ we use that, for any h ≥ 0 and t > 0, $(t + h)^{\alpha-1} \le t^{\alpha-1}$ (since α < 1). The integrand is integrable around 0 because $\alpha > \frac{1}{p}$, and so we obtain convergence to zero as δ → 0. Taken together, we have now shown that $B_4$ converges to zero, and so (76) follows.
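The role of the condition $\alpha > \frac{1}{p}$ in the estimates of $B_{4,2}$ and $B_{4,3}$ can be made explicit. The following is a sketch assuming that the relevant integrand behaves like $t^{(\alpha-1)\frac{p}{p-1}}$ near zero, which is the exponent produced by Hölder's Inequality with conjugate exponent $\frac{p}{p-1}$.

```latex
% Assumption: p > 1, 0 < \alpha < 1, and T > 0 arbitrary.
\begin{align*}
\int_0^T t^{(\alpha-1)\frac{p}{p-1}}\,dt < \infty
  &\iff (\alpha-1)\frac{p}{p-1} > -1 \\
  &\iff (\alpha-1)\,p > -(p-1) \\
  &\iff \alpha p > 1
   \iff \alpha > \frac{1}{p}.
\end{align*}
% Monotonicity: t \mapsto t^{\alpha-1} is decreasing on (0,\infty) for \alpha < 1,
% so (t+h)^{\alpha-1} \le t^{\alpha-1} for all h \ge 0.
```

Together with the requirement $\alpha < \frac{1}{2}$ from the stochastic integral estimate, an admissible α exists only if $p > 2$.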
For proving (77), define pathwise x n,δ (t) ∈ R d such that ||x n,δ (t)|| ≤ δ ≤ 1 as well as sup ||x ||≤δ ||u n (t, · + x ) − u n (t, ·)|| p γ λ ,p = ||u n (t, · + x n,δ (t)) − u n (t, ·)|| p γ λ ,p . Since the shift operator is continuous on L p γ λ (R d ), such a x n,δ (t) does exist. Setp n x (t, x, y) =p n (t, x + x , y) −p n (t, x, y). According to (34) we have to bound the following terms, R dp n x n,δ (t−s) (t − s, x, y)f n (s, y, u n (s, y))dyds|| p γ λ ,p +E || t 0 R dp n x n,δ (t−s) (t − s, x, y)σ n (s, y, u n (s, y))W (dy, ds)|| p γ λ ,p ) We now estimate the first term similarly to B 1 , In the second inequality we have used that ||x n,δ (t)|| ≤ δ, together with a shift of variable and (66) of Lemma 4.2 for the first term in the sum. In the third inequality we have estimated γ λ (y) ≤ Cγ −λ (x − κ n (y)) with (66). We have then performed the variable shifts x = x − κ n (y) as well as y = y − κ n (y) , and exploited the shift invariance of p n (see (33)). For fixed n the supremum converges to zero as δ → 0 for almost all x due to Lemma 4.1(v). Since it is bounded by 2 andp n (t, x )γ −λ (x ) is integrable by Lemma 4.2 the result follows for fixed n and any t > 0 by Lebesgue's Dominated Convergence Theorem. Similarly, using the growth conditions onf n andσ n as well as Burkholder's Inequality for the stochastic integral, we obtain that C 2 and C 3 are bounded by where the expectation is bounded according to Proposition 4.4. Thus, with the same arguments as for (80), plus an additional application of Lebesgue's Dominated Convergence Theorem for the time integral, convergence follows for each n fixed as δ → 0.
To obtain convergence uniformly in n we note that the arguments in (80) and (81) are true, uniformly in n, if p replacesp n . Furthermore, when |p n − p| replaces the spatial differences |p n x | in the C i we obtain convergence to zero as n → ∞. For example, the stochastic integral is bounded by which converges to zero according to Lemma 4.1(iii) and Proposition 4.4. Inserting p(·, x, y) and p(·, x + x , y) and using (66) as well as a 3 argument now implies convergence uniformly in n. 2 We obtain the compact containment condition of the approximating sequence of solutions.
Proposition 4.6 Assuming the conditions of Proposition 4.4 we obtain that for each t ≥ 0 and ε > 0 there exists a compact set C K = C K (t, ε) in the space L p γ λ (R d ) so that for all n,
PROOF. We start by showing that for each ε > 0, Using Markov's Inequality, the convergence is implied by (77) of Proposition 4.5. We will also need to show that for all ε > 0 We define an auxiliary function which, as an immediate consequence of Lemma 4.2, also satisfies (67). Thus, we obtain as in the proof of Proposition 4.4, Since the first term is independent of n and converges to zero as α → ∞ by Lebesgue's Dominated Convergence Theorem, we obtain uniform convergence of (86) to zero by Gronwall's Inequality.
λ , and so (84) follows by Markov's Inequality. Now, by (83) and (84) we can for any > 0 and k ∈ N choose δ k and α k such that sup n P sup Also choose N such that P [||u n (t, ·)|| p γ λ ,p > N ] ≤ 3 , and define the sets By Lemma 4.3, C K is a compact set in L p γ λ (R d ), and by the above definitions we obtain that Finally putting the pieces together, we can now show tightness and identify the limit pointsestablishing the existence statement of Theorem 2.5-and prove the convergence result of Theorem 2.6. For the latter we require another Lemma (see Lemma 4.4 of [18]).
Lemma 4.7 Let E be a Polish space equipped with its Borel σ-algebra. A sequence of E-valued random elements u n converges in probability if and only if for every pair of subsequences u l and u m there exists a subsequence v k ≡ (u l(k) , u m(k) ) converging weakly to a random element v supported on the diagonal $\{(u, u') \in E \times E : u = u'\}$.
PROOF OF THEOREM 2.5 and THEOREM 2.6. Taking together the tightness condition for each t ≥ 0, that has been shown in Proposition 4.6, and the estimation of the differences in time given by (76) of Proposition 4.5, we obtain tightness of u n in D(R + , L p γ λ (R d )) according to Theorem 3.5. Since all u n are continuous in time (Theorem 2.3), they are relatively compact in C(R + , L p γ λ (R d )). This implies that we can find a subsequence which converges weakly on C(R + , L p γ λ (R d )) to a process u. By Skorohod's Representation Theorem we can find another probability spaceΩ, and on it a further subsequence,ũ n , as well as a noiseW equivalent in law to u n and W, so thatũ n converges almost surely toũ in C(R + , L p γ λ (R d )). We now show that, by taking a further subsequence if necessary, the right hand side of (34) converges a.s. for all t ≥ 0 in L p γ λ (R d ) to the appropriate expressions for the limit processũ. This implies thatũ satisfies (9) and is thus a solution to the heat equation with colored noise as in Definition 2.1. Following the calculations for B 1 in the proof of Proposition 4.5, we obtain for any t ≤ T, Here, the first term converges to zero as n → ∞ by Lemma 4.1(iii), and the second integral is bounded a.s. by assumption. We consider next − p(t − s, x, y)σ(s, y,ũ(s, y)) W (dy, ds) Here, we split the integrand into a term, D 1 , involving the differences of the convolution kernels, and one, D 2 , involving the differences of the solutions. Applying Burkholder's Inequality and then following calculations analogous to those for B 2 in the proof of Proposition 4.5 we obtain that E[D 1 ] is bounded by |p n (t − s, x, y) − p(t − s, x, y)|σ n (s, y,ũ n (s, y))dy p ds γ λ (x)dx E ||1 +ũ n (s, y))|| p γ λ ,p , which converges to zero by Proposition 4.4 and Lemma 4.1(iii). By choosing a further subsequence if necessary, a.s. convergence follows. 
To estimate the second difference, D 2 , we define V T ≡ sup n sup s≤T ||ũ n (s, y)|| p γ λ ,p , which is bounded a.s. because of the convergence of theũ n in C(R + , L p γ λ (R d )). As a consequence, we have lim N →∞ P[V T > N ] = 0. Since, by Markov's Inequality, , it suffices to show for any fixed N, With a similar calculation as for D 1 , we bound this expectation by C(p, K, T ) t 0 E ||σ n (s, y,ũ n (s, y)) − σ(s, y,ũ(s, y))|| p γ λ ,p V T ≤ N ds.
Since ũ n (s, ·) → ũ(s, ·) in L p γ λ (R d ) a.s. for each s, the right hand side, and so also the left hand side, of (89) is uniformly integrable in L p γ λ (R d ) a.s. for each s. Therefore, the norm converges a.s. for each s. The conditioning on the event {V T ≤ N } and Lebesgue's Dominated Convergence Theorem now imply that (88) converges to zero. Thus, D 2 → 0 in probability as n → ∞, and a further subsequence converges a.s.
Taking the two estimates together, we have proven that, for a further subsequence if necessary, (87) converges to zero a.s. for t ∈ [0, T ] and so, since T is arbitrary, for all t ≥ 0. We can perform essentially the same, albeit slightly simpler, calculation to show that for the chosen subsequencẽ u n , R dp n (t − s, x, y)f n (s, y,ũ n (s, y)) − p(t − s, x, y)f (s, y,ũ(s, y))dyds as n → ∞ a.s. for all t ≥ 0. Thus,ũ is a solution to (1), which by Proposition (4.4) and Fatou's Lemma also satisfies E[sup 0≤t≤T ||ũ(t, ·)|| p γ λ ,p ] for any T > 0. By repeating the calculations in the proof of Proposition 4.4 we finally obtain (35). Since (ũ,W ) have the same distribution as (u, W ) we have shown the existence result of Theorem 2.5.
It remains to complete the proof of Theorem 2.6. The weak convergence result follows immediately from weak uniqueness of the limit. For convergence in probability when pathwise uniqueness of the limit is known, we consider a pair of subsequences u l and u m . By the tightness on C(R + , L p γ λ (R d )) we can find further subsequences u l(k) and u m(k) that converge weakly on C(R + , L p γ λ (R d )). The above calculation shows that both limit points satisfy the heat equation with respect to W. Thus, pathwise uniqueness implies that they are equal a.s., so the joint limit is supported on the diagonal of E × E. Theorem 2.6 now follows by Lemma 4.7.

Remarks
We finish with some remarks on the setting and proof of Theorem 2.5 and Theorem 2.6. First, we note that we could have considered different function spaces or regularity conditions for solutions to (1) and their approximations. For example, an analogous convergence result of the form sup as n → ∞ can be obtained, at least if the coefficients f and σ are Lipschitz continuous and p ≥ 1 (see Sturm [40] Chapter 4). Here, solutions to the lattice system are established via a Picard iteration scheme, and convergence is shown with similar arguments by directly considering (90). The setting and proof is inspired by Dalang [5], who shows existence and uniqueness of the solution u under the above conditions. However, these proof techniques, in particular Picard iterations, cannot be used directly for non-Lipschitz coefficients. In order to proceed via tightness arguments, an appropriate function space (instead of uniform moment bounds) is needed. While we could have considered approximations and convergence in a space of continuous functions, we have found C(R + , L p γ λ (R d )) to be convenient. That the solutions constructed in this function space nonetheless live in the appropriately weighted space of continuous functions, C(R d , C γ λ (R d )), is shown in the next section (under some additional conditions on p and d).
The space C(R + , L p γ λ (R d )) has been used repeatedly as a solution space in the context of the stochastic heat equation with colored noise, see for example Peszat and Zabczyk [32], who consider existence and uniqueness for the case of Lipschitz coefficients. The space C(R + , l p Γ ), where Γ is simply summable, is used frequently for lattice systems. This prompts us to remark that weight functions other than γ λ could have been chosen. In our calculations, apart from the integrability of the weight function, we have primarily used the properties of Lemma 4.2. For a sufficiently smooth function these are conditions on the tail behavior. Hence, among others, any positive continuous function that equals γ λ outside a bounded region can certainly be used.
Finally, we remark on our rather stringent boundedness assumptions on the correlation kernel k. Both, Dalang [5] and Peszat and Zabczyk [32] cited above, investigate translation invariant k which is singular at the origin. In these works, the smoothness of the heat kernel p is offset against the singularity of k. In our approximations (see for example the calculation in the proof of Proposition 4.4), the approximate heat kernels p n integrated against k need to be estimated. Accordingly, we would need stronger results in Lemma 4.1 that involve a singular k. Difficulties arise from the fact that p n -unlike p-is not known explicitly, and further since statements need to be made uniformly in n. Note that Lemma 4.2 and E[||u 0 || p ∞,γ λp ] < ∞ bound S 1 . To bound S 2 and S 3 we use the factorisation method already introduced in the proof of Proposition 4.4. To demonstrate the argument we focus on the stochastic integral S 3 and define J α−1 and J α as in (71) and (72) withp n ,σ n and u n replaced by p, σ and u. Thus, we obtain Here, we have first used Jensen's and the Cauchy-Schwartz Inequality. We have then used (68) of Lemma 4.2 and (93) of the proof of Lemma 4.1 to see that R d p(t − s, x, y) 2 γ −λ (y)dy ≤ C(T )(t − s) − d 2 γ −λ (x). Subsequently we have used Lemma 4.2 and Hölder's Inequality. In the last step the expectation has been bounded by a calculation as in (74) requiring α < 1 2 . Thus, by (35), (91) is bounded provided that α < 1 2 and (α − 1 − d 2p ) p p−1 > −1, which can be fulfilled if and only if d < p − 2. The term S 2 works similarly, implying the same conditions on α.
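The equivalence between the two constraints on α and the dimension condition $d < p - 2$ can be verified by elementary algebra. The following sketch uses only the two inequalities quoted in the text and assumes $p > 1$.

```latex
% Assumption: p > 1, d \ge 1, and the two constraints on \alpha quoted in the text.
\begin{align*}
\Big(\alpha - 1 - \frac{d}{2p}\Big)\frac{p}{p-1} > -1
  &\iff \alpha p - p - \frac{d}{2} > -(p-1) \\
  &\iff \alpha > \frac{d+2}{2p}.
\end{align*}
% An \alpha with \frac{d+2}{2p} < \alpha < \frac{1}{2} exists
% if and only if \frac{d+2}{2p} < \frac{1}{2}, that is, d < p - 2.
```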
In order to see that u(t, ·) ∈ C γ λp for any 0 ≤ t ≤ T, consider a.s. |u(t, x) − u(t, x + x )| for ||x || < 1. The difference can again be bounded by three terms according to (9). The term involving the initial condition converges as ||x || → 0 due to Lemma 4.1(v) and Lemma 4.2. We focus again on the stochastic integral, which may be approximated analogously to (91), and is thus bounded by ||J α u(s, ·)|| p γ λ ,p · ( R d |p(t − s, x + x , y) − p(t − s, x, y)|γ −λ (y)dy)ds By Lemma 4.2 the integral of the heat kernel differences is bounded by C(T, x). Since J α u ∈ L p ([0, T ], L p γ λ (R d )), a.s. it is sufficient by Lebesgue's Dominated Convergence Theorem to note that the integral of the heat kernel differences converges to zero for each s ≤ t. This is again a consequence of Lebesgue's Theorem combined with Taylor's Theorem and Lemma 4.2.
We end the proof by showing that u is in C([0, T ], C γ λp (R d )) for any T > 0, and thus in C(R + , C γ λp (R d )). Once again, we use the definition in (9) and show continuity of the stochastic integral (cf (78)). We note that the drift term can be treated similarly and that the first term converges according to Lemma 4.1(iv) and Lemma 4.2. Hence, consider a.s. ||J α−1 J α u(t + h, ·) − J α−1 J α u(t, ·)|| ∞,λp ≤ || Arguments analogous to those in (91) explain the second inequality. We observe for the first term that the inner integral is bounded by Lemma 4.2 and converges pointwise for each s > 0. Pointwise convergence to zero for s > 0 is also true for the integrand of the first integral in the second term because of continuity. Recall also that T 0 ||J α u(s, y)|| p γ λ ,p ds is bounded a.s. by (74) and Proposition 4.4. Thus, all three terms converge to zero by Lebesgue's Dominated Convergence Theorem as h → 0.
Therefore, combining (92) and (94) implies for all n > K 0 π t − 1 which converges to zero uniformly over 0 ≤ h ≤ δ as δ → 0. Lebesgue's Dominated Convergence Theorem now implies statement (iv) for p. Forp n we use a decomposition as in (96) as well as property (i) to obtain that sup 0≤h≤δ sup x∈R d R d |p n (t + h, x, y) −p n (t, x, y)| dy But by the definition ofp n the term in absolute values equals t+h t n 2 2 (p n 1 (t, κ n (x i ), κ n (y i ) + 1 n ) + p n 1 (t, κ n (x i ), κ n (y i ) − 1 n ) − 2p n 1 (t, κ n (x i ), κ n (y i )) ) dt.
Hence, by property (i), the quantity in (98) is bounded by $2Cd\delta n^2$, and so converges to zero as δ → 0 for each t > 0, which proves (iv) for each fixed n. That the convergence is uniform in n now follows by an $\epsilon/3$ argument from the statement for p and the appropriate convergence shown in (iii).
For the first statement of (v) we merely note that, for all x in the interior of the intervals I n (see the definition of κ n ), the spatial differences of the approximate heat kernels are identically zero for δ small enough, and the boundaries of these intervals form a null set. To show (v) for p we use arguments analogous to those in (97). For all $\epsilon, \delta > 0$, find a compact set C such that, for all $\|x'\| \le \delta$ and $t \le T$, $\int_{\mathbb{R}^d \setminus C} p(t, x', y)\,dy < \epsilon$. Thus, $\sup_{\|x'\| \le \delta} \int_C |p(t, x', y) - p(t, 0, y)|\,dy \to 0$ as δ → 0. Because of shift invariance in x this establishes the convergence result for p.
Let Y n be a simple random walk as in the definition (32) ofp n . Using the norm equivalence on R d we obtain R dp n (t, x, y)e −λ(||y||−||x||) dy ≤ y n ∈ 1 n Z d In the first inequality we have used the symmetry in x as well as (66) and subsequently Lemma 4.1(ii). By similar arguments (68) follows, see Sturm [40] p. 75 for detail.