Distance Estimates for Poisson Process Approximations of Dependent Thinnings

Abstract: It is well known that, under certain conditions, gradual thinning of a point process on R^d_+, accompanied by a contraction of space to compensate for the thinning, leads in the weak limit to a Cox process. In this article, we apply discretization and a result based on Stein's method to give estimates of the Barbour-Brown distance d_2 between the distribution of a thinned point process and an approximating Poisson process, and we evaluate the estimates in concrete examples. We work in terms of two somewhat different thinning models. The main model is based on the usual thinning notion of deleting points independently according to probabilities supplied by a random field. In Section 4, however, we use an alternative thinning model, which can be more straightforward to apply if the thinning is determined by point interactions.


Introduction
Thinning is one of the fundamental operations to construct a new point process from a given one. Consider a point process ξ on R^d_+, and a [0, 1]-valued measurable random field π on R^d_+. A π-thinning ξ_π of ξ can then be obtained in the following way: for any realizations ξ(ω) (a point measure on R^d_+) and π(ω, ·) (a function R^d_+ → [0, 1]), look at each point s_i of ξ(ω) in turn, and retain it with probability π(ω, s_i), or delete it with probability 1 − π(ω, s_i), independently of any retention/deletion decisions of other points. Note that multiple points s_{i_1} = ... = s_{i_k} at the same location are retained independently of one another with equal probabilities. Regard the points left over by this procedure as a realization of the thinned point process ξ_π (for a more formal definition see Section 2). We will usually refer to ξ as "the original process", and to π as "the retention field".
Consider now a sequence (π_n)_{n∈N} of retention fields such that sup_{s∈R^d_+} π_n(s) →_D 0, corresponding to the idea of ξ being gradually thinned away. To compensate for this effect, we contract the Euclidean space by κ_n: R^d_+ → R^d_+, x ↦ (1/n)x. Then, Theorem 1.A is a standard result (see e.g. Daley and Vere-Jones (1988), Theorem 9.3.III). Convergence in distribution of random measures, and in particular of point processes, is defined via the convergence of expectations of bounded continuous functions, where continuity is in terms of the vague topology on the corresponding space of (all, respectively only Z_+ ∪ {∞}-valued) boundedly finite measures (for details see Kallenberg (1986), Section 4.1).
Theorem 1.A. We obtain convergence in distribution of the thinned and contracted process ξ_{π_n} κ_n^{-1} to a point process η if and only if, with Λ_n(A) := ∫_{κ_n^{-1}(A)} π_n(s) ξ(ds) for every measurable set A ⊂ R^d_+, we have

    Λ_n →_D Λ  for n → ∞     (1.1)

for some random measure Λ on R^d_+. In this case η ∼ Cox(Λ), i.e. η is a Cox process with directing measure Λ.
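To make Theorem 1.A concrete, here is a minimal simulation sketch (our illustration, not part of the paper) for d = 1: the original process ξ is a homogeneous Poisson process of intensity μ, the retention field is the constant p_n = c/n, and the contraction is κ_n(x) = x/n, so that Λ_n →_D cμ Leb and the limit η is a Poisson process with mean cμ on J = [0, 1]. All variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, c, n, reps = 5.0, 3.0, 50, 20000

counts = np.empty(reps, dtype=int)
for i in range(reps):
    N = rng.poisson(mu * n)                    # xi([0, n]), i.e. xi on kappa_n^{-1}(J)
    points = rng.uniform(0.0, n, size=N)       # locations of the points of xi
    retained = points[rng.random(N) < c / n]   # independent thinning with pi_n = c/n
    counts[i] = retained.size                  # points of the thinned, contracted process in J

# compare the law of the point count in J with Po(c*mu) via total variation
from math import exp, factorial
lam = c * mu
ks = np.arange(counts.max() + 1)
emp = np.bincount(counts, minlength=ks.size) / reps
pois = np.array([exp(-lam) * lam**k / factorial(k) for k in ks])
tv = 0.5 * np.abs(emp - pois).sum() + 0.5 * (1.0 - pois.sum())
print("empirical mean:", counts.mean(), "  target:", lam, "  TV distance:", tv)
```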
The main goal of the present article is to examine the rate of convergence in the above theorem. For that purpose, we use the Barbour-Brown distance d_2 between the distributions of point processes, which is a very natural choice (see Section 2 for the definition and some elementary properties). Due to the method of proof that we use, we will always assume that a mixing condition holds for the random measures Λ_n (see Assumption 2 in Section 3). Since Condition (1.1) can be interpreted as the statement of a weak ergodic theorem, it is natural, in view of the usual chain "mixing implies ergodic implies constant limit in the ergodic theorem", that we get a deterministic limiting measure Λ = λ, and hence even a Poisson process as the limiting process η of the contracted thinning. These heuristics can easily be made rigorous if ξ is stationary and π_n = (1/n^d) π̃ for a stationary random field π̃.
The method of our proof is a combination of discretization and an application of the Barbour-Brown theorem for distance estimates for discrete Poisson process approximations where the approximated process has a local dependence property. The same method has been used successfully in Schuhmacher (2005), and in fact, by applying Theorem 2.A from that paper to a suitable randomization of ξ, we directly obtain upper bounds for the distance we seek here. While these bounds are quite good in special cases (e.g. if our retention field is deterministic, and even more so if it is constant), a direct application of the above method yields considerably better and more intuitive results in the general case.
The thinning of point processes was first studied in Rényi (1957): a renewal process on R_+ was subjected to an independent thinning with constant retention probability p, and a Poisson limit theorem was obtained for p → 0 upon a change of time scale by a factor 1/p. There have been many generalizations within the theory of point processes on the real line since then, with some of the most comprehensive found in Jagers and Lindvall (1974), Serfozo (1977), and Böker and Serfozo (1983).
Also in Kallenberg (1975) (alternatively, see Kallenberg (1986), Section 8.3), independent thinnings with constant retention probability p were considered, but this time the processes to be thinned were arbitrary point processes on general locally compact, second countable, Hausdorff spaces. Necessary and sufficient conditions were derived for the convergence of thinnings of increasingly "dense" point processes to a Cox limit. This result was generalized in Brown (1979) to position dependent, random retention probabilities, which yielded, up to some negligible details in the setting, exactly the statement of Theorem 9.3.III in Daley and Vere-Jones (1988), from which Theorem 1.A is a direct consequence.
A result regarding distance estimates in thinning theorems may also be found in Daley and Vere-Jones (1988). In Proposition 9.3.IV, the authors give a quite abstract upper bound for the total variation distance between the distributions of the point counts ξ_{π_n} κ_n^{-1}(A) and η(A) for any bounded Borel set A. By a rather similar argument, it is possible to obtain a corresponding upper bound for the d_2-distance between the distributions of the restricted point processes ξ_{π_n} κ_n^{-1}|_K and η|_K, where K ⊂ R^d_+ is an arbitrary compact set. Write d_W for the Wasserstein distance on R^d_+ with respect to the Euclidean distance truncated at one (which is denoted by d_0 in this paper), that is

    d_W(P, Q) = sup_f | ∫ f dP − ∫ f dQ |

for probability measures P and Q on R^d_+, where the supremum ranges over all functions f: R^d_+ → R with |f(s_1) − f(s_2)| ≤ |s_1 − s_2| ∧ 1 (see e.g. Barbour, Holst and Janson (1992), Appendix A.1 for the general definition and results). The proof of the following proposition can be found in Appendix A.2.
While such an upper bound is very nice from a theoretical point of view, because it is of a comparatively simple form, and because it can be shown that it goes to zero under the general Condition (1.1) if K satisfies P[Λ(∂K) > 0] = 0, it is usually not so easy to calculate. In what follows, we are interested in more explicit upper bounds for the thinning approximation, i.e. upper bounds that can be calculated directly in terms of certain characteristic quantities of the point process ξ and the retention field π. Since, in the (still quite general) case where ξ and π satisfy mixing conditions, and the approximating process is Poisson, we have available well-developed tools from the field of Stein's method for point process approximation, we prefer to make use of these, rather than trying to reduce the right hand side of Inequality (1.2) to more fundamental characteristics of ξ and π.
We start out in Section 2 by providing the necessary definitions and notation along with some technical background. Section 3 contains the main results, namely upper bounds for the d_2-distance between L(ξ_{π_n} κ_n^{-1}|_K) and a Poisson process law under different conditions, as well as an application of these results. Finally, in Section 4, a slightly different notion of a thinning, called Q-thinning, is introduced. A corresponding upper bound is given, and the benefit of the new definition is demonstrated in another application.
The difference between the two thinning models in Sections 1-3 and in Section 4 is conceptual rather than a difference in terms of the modeled objects (see Remarks 4.B and 4.E for details on how the resulting thinnings differ). Under π-thinning, points are more or less likely to be deleted according to the conditions they encounter in a random environment (which itself may respond to the point configuration); under Q-thinning, points are more or less likely to be deleted according directly to the interactions among themselves. This difference is nicely illustrated by the two applications in Subsection 3.3 and Subsection 4.2. In both situations, we start with a point process ξ on R^d_+ having "reasonable" first and second factorial moment measures, and obeying an appropriate mixing condition. We have a "basic" thinning effect given by the constant retention probability q_0^{(n)}, and an additional "characteristic" thinning effect for each of the two examples. The basic and the characteristic effect are combined, and R^d_+ is contracted by a factor 1/n to obtain a point process that is compared to a Poisson process.
In the first example (Subsection 3.3), we consider a random environment given by a union Ξ of balls whose centers form a stationary Poisson process on R^d, and whose radii are i.i.d. and bounded with L^d-norm r_n. The characteristic thinning effect is then provided by deleting all the points that are covered by Ξ (we illustrate this situation with the image of stars that are covered by clouds). In the second example (Subsection 4.2), we give point interactions by assigning i.i.d. real-valued marks to the points. The characteristic thinning effect is then provided by deleting all the points whose distance to some other point having a larger mark is at most r_n (we illustrate this situation with the scenario of competition within a plant population).
Roughly speaking, Poisson approximation is good in both examples if the retention probabilities q_0^{(n)} become appropriately small, or the ranges r_n become appropriately large.
It should be noted that the illustrations of the examples as "visibility of stars" and "plant competition", respectively, may well provide inspiration for modeling similar situations, but are not meant to be serious modeling attempts in themselves.
We end this section by giving an indication of the type of results that are obtained in Section 3. The following proposition covers the important special case when π_n = p_n is non-random, which follows directly from either Theorem 3.C or Theorem 3.F. The situation corresponds to an independent thinning of the points of ξ with deterministic retention probabilities, each of which may depend on the location of the point it is attached to.
Proposition 1.C (Non-random retention field). Let ξ be a point process on R^d_+ which has an expectation measure μ_1 that has a bounded density with respect to Lebesgue measure, and a second factorial moment measure μ_2 that is bounded on the set of all unit cubes in R^{2d}_+. Let (p_n)_{n∈N} be a sequence of functions R^d_+ → [0, 1], and let p̄_n := sup_{s∈R^d_+} p_n(s). Suppose that "long range" covariances in ξ are controlled by a decreasing function β̃: R_+ → R_+, such that for every open cube A_int = (a, a + h·1) of side length h ∈ (0, 1] and with minimal corner in a ∈ R^d_+, and every surrounding set A^{(t)}_ext, where the supremum ranges over all D ∈ σ(ξ|_{A^{(t)}_ext}) and Z ∈ L_2(σ(ξ|_{A_int})) with 0 ≤ Z ≤ ξ(A_int)/|A_int|. Set furthermore J := [0, 1]^d. Then we obtain, for arbitrarily chosen m := m(n) ∈ N,

    d_2( L(ξ_{p_n} κ_n^{-1}|_J), Po(ν_n κ_n^{-1}|_J) ) = O( n^d p̄_n ( m^d p̄_n ∨ β̃(m) ) )  for n → ∞,

with ν_n(·) := ∫_· p_n(s) μ_1(ds). In the most natural case, where p̄_n = O(1/n^d), this implies a bound of order O( m^d/n^d ∨ β̃(m) ). Note that we maintain a great deal of flexibility, and avoid evaluating awkward distances between general probability measures, by approximating the thinning with a Poisson process whose intensity measure depends on n. If we must have a fixed approximating Poisson process, we get by Inequality (A.3) from Appendix A.2 the additional term (1.3), which is, in most cases, still much more convenient to estimate than the two corresponding terms in (1.2), inasmuch as the quantities appearing in (1.3) are no longer random.

Preliminaries
Assume that the underlying probability space (Ω, F, P) is complete in order to avoid measurability problems. Denote by B^d_+ the Borel σ-field on R^d_+, and by B_A, for any set A ⊂ R^d_+, the trace σ-field B^d_+|_A. Furthermore, write M for the space of boundedly finite measures on R^d_+, that is, measures μ with μ(B) < ∞ for any bounded B ∈ B^d_+, and equip it with the usual topology (i.e. the vague one, see Kallenberg (1986), Section 15.7) and the smallest σ-field M that renders the masses of bounded Borel sets measurable (see Kallenberg (1986), Section 1.1). Do the same for the subspace N ⊂ M of boundedly finite point measures, and denote the corresponding σ-field by N. A random measure is a random element of M, and a point process a random element of N. By Po(λ) we denote the distribution of the Poisson process with intensity measure λ if λ ∈ M, and the Poisson distribution with parameter λ if λ is a positive real number.
In view of the Barbour-Brown distance d_2 that we use, we always restrict our point processes to the unit cube J = [0, 1]^d. Write furthermore J_n := κ_n^{-1}(J) = [0, n]^d, and C_k := ∏_{i=1}^d [k_i − 1, k_i) for k = (k_1, ..., k_d) ∈ {1, 2, ..., n}^d for the unit hypercubes that make up J_n. We make use of a simplified multi-index notation by denoting properties of all the individual components as though they were properties of the whole multi-index, for example writing J_n = ⋃_{k=1}^n C_k instead of J_n = ⋃_{k: k_1,...,k_d=1}^n C_k. Furthermore, we use the distance between multi-indices that is defined by |l − k| := max_{1≤i≤d} |l_i − k_i|. As a last convention, all equations and inequalities between random variables in this article hold almost surely if it is not explicitly stated that they hold pointwise.
In what follows, let ξ always be a point process on R^d_+, and π := (π(·, s); s ∈ R^d_+) a [0, 1]-valued random field on R^d_+ that satisfies the following measurability condition: for any bounded rectangle R that is open in R^d_+, the mapping π*_R: Ω × R → [0, 1], (ω, s) ↦ π(ω, s) is σ(π|_R) ⊗ B_R-measurable (we say in this case that π is locally evaluable). Assuming this technical condition simplifies part of the notation and the arguments in Section 3 considerably. However, if one accepts a more involved presentation, in particular larger and more complicated σ-fields in Assumption 2, and a stronger independence property for Theorem 3.F, all that is needed to prove the corresponding theorems is the measurability of π as a mapping Ω × R^d_+ → [0, 1]. The latter is necessary to ensure that for any random point S in R^d_+, π(S) is a random variable. Note that local evaluability is satisfied for many desirable random fields, as for example for random fields which are pathwise continuous, or for indicators of (fully) separable random closed sets (see Appendix A.3 for a more detailed discussion of locally evaluable random fields with proofs).
The π-thinning of ξ can now be defined as follows.
Definition (Thinning). First, assume that ξ = σ = ∑_{i=1}^v δ_{s_i} and π = p are non-random, where v ∈ Z̄_+ := {0, 1, 2, ...} ∪ {∞}, s_i ∈ R^d_+, and p is a function R^d_+ → [0, 1]. Then, a π-thinning of ξ is defined as ξ_π = ∑_{i=1}^v X_i δ_{s_i}, where the X_i are independent indicator random variables with expectations p(s_i), respectively. Under these circumstances, ξ_π has a distribution P_(σ,p) that does not depend on the chosen enumeration of σ. We obtain the general π-thinning from this by randomization, that is, by the condition P[ξ_π ∈ · | ξ, π] = P_(ξ,π) (under our conditions on π it is straightforward to see that P_(ξ,π) is a σ(ξ, π)-measurable family in the sense that P_(ξ,π)(D) is σ(ξ, π)-measurable for every D ∈ N). Note that the distribution of ξ_π is uniquely determined by this procedure.
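As a small illustration of this definition (our own sketch, not from the paper), the following Python function performs a π-thinning of a finite point pattern given the retention field evaluated at the points; the example field below is deterministic, but a random field would simply be sampled first and then evaluated at the points in exactly the same way. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def pi_thinning(points, retention_field, rng):
    """points: (N, d) array; retention_field: maps an (N, d) array to retention
    probabilities in [0, 1]; returns the retained sub-pattern."""
    probs = retention_field(points)
    keep = rng.random(len(points)) < probs       # independent retention decisions X_i
    return points[keep]

# Example: a binomial point pattern on [0, 10]^2, thinned with pi(s) = exp(-|s|^2 / 50).
pts = rng.uniform(0, 10, size=(200, 2))
field = lambda s: np.exp(-np.sum(s**2, axis=1) / 50.0)
thinned = pi_thinning(pts, field, rng)
print(len(pts), "points before thinning,", len(thinned), "after")
```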
This definition of thinning can be found in Kallenberg (1986) for non-random π, and was generalized in Serfozo (1984) to random π. The definition of Brown (1979), and the less formal definitions of Stoyan, Kendall and Mecke (1987) and Daley and Vere-Jones (1988) also yield the same distribution for ξ π .
The following remark simplifies the presentation of the proofs.
Remark 2.A (Numberings of points / definition of retention decisions). Given a countable partition (B_j)_{j∈N} of R^d_+ into bounded measurable sets, there is always a representation of ξ as ∑_i δ_{S_i} with σ(ξ)-measurable random elements S_i such that S_1, ..., S_{ξ(B_1)} ∈ B_1, S_{ξ(B_1)+1}, ..., S_{ξ(B_1)+ξ(B_2)} ∈ B_2, and so on (see e.g. the proof of Lemma 2.3 in Kallenberg (1986)). We will make tacit use of this fact in connection with the thinning definition on various occasions. For example, for a point process ξ and a bounded Borel set A, we may write ξ_π(A) = ∑_{i=1}^{ξ(A)} X_i, and hereby imply that we define σ(ξ)-measurable "point random elements" S_i for ξ in such a way that the first ξ(A) points S_1, ..., S_{ξ(A)} always lie in A, and all other points in A^c, and define "retention decisions" X_i which, conditional on ξ and π_n, are independent with expectations π_n(S_i), respectively.
We measure distances between distributions of point processes on a compact subset K ⊂ R^d by means of the Barbour-Brown distance d_2, a variant of a Wasserstein distance, which has proved to be a useful metric between distributions of point processes in many examples. It can be defined in the following way. Let d_0 be the minimum of the usual Euclidean distance on R^d and 1. Denote by N(K) the set of all finite point measures on K, set F_1 := {k: K → R ; |k(s_1) − k(s_2)| ≤ d_0(s_1, s_2)}, and define the d_1-distance (w.r.t. d_0) between ϱ_1, ϱ_2 ∈ N(K) as

    d_1(ϱ_1, ϱ_2) := 1 if ϱ_1(K) ≠ ϱ_2(K);  (1/n) sup_{k∈F_1} | ∫ k dϱ_1 − ∫ k dϱ_2 | if ϱ_1(K) = ϱ_2(K) = n ≥ 1;  0 if ϱ_1(K) = ϱ_2(K) = 0.

It can be seen that (N(K), d_1) is a complete, separable metric space and that d_1 is bounded by 1. Furthermore, following Dudley (1989), Section 11.8, we define the d_2-distance (w.r.t. d_1) between probability measures P and Q on N(K) (distributions of point processes on K) as

    d_2(P, Q) := sup_{f∈F_2} | ∫ f dP − ∫ f dQ |,  where F_2 := {f: N(K) → R ; |f(ϱ_1) − f(ϱ_2)| ≤ d_1(ϱ_1, ϱ_2) for all ϱ_1, ϱ_2 ∈ N(K)}.

By the Kantorovich-Rubinstein theorem, one obtains that

    d_2(P, Q) = min_{Ξ_1∼P, Ξ_2∼Q} E d_1(Ξ_1, Ξ_2),

where the minimum is taken over all couplings of P and Q. Furthermore, because of the bound on the d_1-distance, the d_2-distance can also be interpreted as a variant of a bounded Wasserstein distance. Hence Theorem 11.3.3 in Dudley (1989) yields that d_2 metrizes the weak convergence of point process distributions. In other words, for point processes ξ, ξ_1, ξ_2, ... on K, we have ξ_n →_D ξ if and only if d_2(L(ξ_n), L(ξ)) → 0 as n → ∞, where the convergence in distribution for point processes is defined in terms of the vague topology (see also the explanation before Theorem 1.A), which, since K is compact, is the same as the weak topology. The crucial fact here is that, for d_0 as defined, the topology generated by the metric d_1 on N(K) is equal to the weak topology. For further information on the d_2-distance see Barbour, Holst and Janson (1992), Section 10.2. For applications of d_2 upper bounds see also Schuhmacher (2005), Section 3.
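For readers who want to experiment numerically, the following sketch (ours) computes the d_1-distance between two finite point patterns in its optimal-matching form: it equals 1 if the cardinalities differ and otherwise the minimal average truncated Euclidean distance over all pairings of the points, which is equivalent to the supremum over F_1 by Kantorovich-Rubinstein duality. It assumes SciPy is available.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def d1(pattern1, pattern2):
    """pattern1, pattern2: (n_i, d) arrays of point locations in K."""
    n1, n2 = len(pattern1), len(pattern2)
    if n1 != n2:
        return 1.0
    if n1 == 0:
        return 0.0
    # pairwise truncated Euclidean distances d_0(x_i, y_j)
    cost = np.minimum(np.linalg.norm(pattern1[:, None, :] - pattern2[None, :, :], axis=2), 1.0)
    rows, cols = linear_sum_assignment(cost)     # optimal pairing of the points
    return cost[rows, cols].mean()               # minimal average d_0-distance

x = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.1]])
y = np.array([[0.12, 0.22], [0.45, 0.55], [0.2, 0.8]])
print(d1(x, y))
```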

The main results
For this whole section, let ξ be a point process on R^d_+, and, for each n ∈ N, let π_n := (π_n(s); s ∈ R^d_+) := (π_n(·, s); s ∈ R^d_+) be a [0, 1]-valued, locally evaluable random field on R^d_+.

Results
Recall that the expectation measure μ_1 of ξ is given by μ_1(B) := E ξ(B) for any B ∈ B^d_+, and that the second factorial moment measure μ_2 of ξ is the measure on R^{2d}_+ determined by μ_2(A × B) := E(ξ(A)ξ(B)) − E ξ(A ∩ B) for A, B ∈ B^d_+. The following two assumptions, in one or the other form, are used several times in this article.

Assumption 1 (Control of moment measures).
a) ξ has an expectation measure μ_1 ≪ Leb^d with bounded density h_1: R^d_+ → R_+; set h̄_1 := ‖h_1‖_∞;
b) ξ has a second factorial moment measure μ_2 which is bounded on the set of unit cubes in R^{2d}_+, i.e. there is an h̄_2 ≥ 0 such that μ_2(C) ≤ h̄_2 for any unit cube C ⊂ R^{2d}_+.
Assumption 2 (Mixing property). For each n ∈ N, let β̄_n: R_+ → R_+ be a decreasing function such that for every cube A_int = (a, a + h·1) of side length h ∈ (0, 1] and with minimal corner in a ∈ R^d_+, and every surrounding set A^{(t)}_ext.

Remark 3.A. Since π_n is a measurable random field and F is complete, it can be shown by a standard argument involving analytic sets (using the Lusin-Choquet-Meyer theorem from Kallenberg (2002), Appendix A1) that sup_{s∈A} π_n(s) is a random variable for any A ∈ B^d_+.
We now state the main theorem, first in its most general form, and then in weaker but less involved versions. In all the results, we use the notation O(f_1(n), ..., f_j(n)) as shorthand for O(max{f_1(n), ..., f_j(n)}). Quantitative versions of the upper bounds can be found in the proofs in Subsection 3.2.
Theorem 3.B. Suppose that the point process ξ and the sequence (π_n)_{n∈N} of retention fields satisfy Assumptions 1 and 2 above. Set ... and let ν_n(·) := E ∫_· π_n(s) ξ(ds), which is the expectation measure of ξ_{π_n}. Then we obtain for arbitrary m:

The next few results represent different attempts to simplify the above theorem by separating ξ from π_n in the various terms involved. For the first result only a slight modification in the proof of Theorem 3.B is necessary.
Theorem 3.C (L^∞-version: upper bound and convergence). Suppose that the prerequisites of Theorem 3.B hold, but now replace Π in the mixing condition 2 by Π̃ := ξ(A_int)/|A_int|, and write β̄^{(∞)}_n, instead of β̄_n, for the function bounding the covariances. Furthermore set ... Then we obtain for arbitrary m:

Convergence Condition: The right hand side goes to 0 if, for example, p^{(∞)}_n ...

Remark 3.D (Convergence towards a fixed Poisson process). If in fact ξ and π_n are such that, for some finite measure λ on J, then we obtain by Theorem 1.A that, on J, which implies under the convergence condition in Theorem 3.C that also for any Borel set A ⊂ J with λ(∂A) = 0. Thus, by Inequality (1.3) and Theorem 3.C, we get an upper bound for d_2(L(ξ_{π_n} κ_n^{-1}|_J), Po(λ)) that goes to zero under the convergence condition. Of course, the conditions are stronger than the ones for Proposition 1.B, but in return the upper bound is much more explicit and easier to apply.
The next result is a direct consequence of Theorem 3.B. This time, we try to obtain a convergence rate that goes to zero under the weaker assumption on the sequence (π_n) that was used for Theorem 1.A, namely that sup_{s∈R^d_+} π_n(s) →_D 0. It should be noted that the simple but rather crude estimates we use here are based on the assumption that the ξ(C_k) have a generous number of moments. It is obviously by no means the best result attainable based on Theorem 3.B.
Convergence Condition: Provided that ξ(C_k) has sufficiently many moments that are bounded uniformly in k, we can choose m := m(n), Q_1 := Q_1(n), and Q_2 := Q_2(n) in such a way that the right hand side goes to 0. For example, under the assumptions that ...

We now examine how certain independence properties can be exploited. The main benefit of this is a more convenient mixing condition that takes only the point process into account, allowing the retention field to be dealt with separately.
Assumption 2′ (Mixing property). Let β̄^{(ind)}: R_+ → R_+ be a decreasing function such that for every open cube A_int = (a, a + h·1) of side length h ∈ (0, 1], and with minimal corner in a ∈ R^d_+, and every surrounding set A^{(t)}_ext.

Theorem 3.F (Independent thinning: upper bound and convergence). Suppose that the prerequisites of Theorem 3.B hold with Assumption 2′ instead of Assumption 2. Let ξ and π_n be independent for any n ∈ N. Note that we now have ν_n(·) = E ∫_· π_n(s) ξ(ds) = ∫_· E π_n(s) μ_1(ds).
Choose m := m(n) ∈ N in such a way that π_n|_{A_int} and π_n|_{A^{(m)}_ext} are independent for every unit cube A_int = (a, a + 1) and every surrounding set A^{(m)}_ext. Then

Convergence Condition: The right hand side goes to 0 if, for example, p^{(1)}_n ... and π_n|_{A_int} and π_n|_{A^{(m(n))}_ext} are independent for all sets A_int and A^{(m(n))}_ext of the above form.

Proofs
A complete proof is presented only for Theorem 3.B. For the other statements the corresponding modifications are given.
Proof of Theorem 3.B. Let η_n ∼ Po(ν_n). Our processes ξ_{π_n} κ_n^{-1}|_J and η_n κ_n^{-1}|_J are discretized in the following way. By Assumption 1 we may suppose that ξ(ω)(J_n \ J_n) = η_n(ω)(J_n \ J_n) = 0 for every ω ∈ Ω, without changing the distributions of the point processes. Then, subdivide J_n in the domain of the contraction κ_n into the hypercubes C_k, which were introduced in Section 2. Choose an arbitrary ñ ∈ N, and further subdivide every C_k into cubes C_{kr} := ∏_{i=1}^d [k_i − 1 + (r_i − 1)/ñ, k_i − 1 + r_i/ñ) of side length 1/ñ for r = (r_1, ..., r_d) ∈ {1, 2, ..., ñ}^d. The concrete "inner" and "outer" sets used for Assumption 2 are given by ... for k ∈ {1, 2, ..., n}^d and r ∈ {1, 2, ..., ñ}^d, where C̊ denotes the interior of C ⊂ R^d. We denote by α_{kr} the center of C_{kr}, and by θ_{kr} := κ_n(α_{kr}) the center of the contracted hypercube κ_n(C_{kr}). Set furthermore ...

Construct the discretized processes (note that for the Poisson process our discretization is only one "in distribution") as

Note that the r-sum over q_{kr} can be estimated from above as

where we used the "tacit numbering of points" and the "tacit definition of the retention decisions X_i" as announced in Remark 2.A. In the analogous way, the same sum can be estimated from below as

The initial distance is split up as follows:

We first attend to the discretization errors, which are represented by the first and third terms on the right hand side. We have by Equations (2.3) and (2.2)

The first summand in the last line is obtained because, for estimating the d_1-distance on {ξ_{π_n}(C_{kr}) = Ξ_n(C_{kr}) ∀ k, r}, we can pair every point of ξ_{π_n} κ_n^{-1}|_J with the center of that cube κ_n(C_{kr}) in which it lies; this center is at most √d/(2nñ) (half a body diagonal of κ_n(C_{kr})) apart, and a point of Ξ_n κ_n^{-1}. A similar argument is used to obtain √d/(2nñ) in the fourth line of the next formula, with the difference that now there might be more than one point pairing in each κ_n(C_{kr}). Thus

where d_TV denotes the total variation distance between probability distributions (see e.g. Barbour, Holst and Janson (1992), Appendix A.1 for definition and results). Since the sum over r in the second term can be estimated as

we obtain, as an overall bound for the discretization error terms,

    √d/(nñ) + n^d w_{[2]}.     (3.4)

Next we consider the left over term in (3.2), which is estimated by application of the Barbour-Brown theorem A.A (see Appendix A.1). We choose m := m(n) ∈ N arbitrarily, and set ... In the notation of the appendix we can apply Theorem A.A and obtain (once more in the notation of the appendix):

Further estimation of the various terms yields

and, using part of Inequality (3.3) for the second line,

where our numbering in Inequality (3.8) is such that S_1, ..., S_{ξ(C_k)} lie in C_k, and S_{ξ(C_k)+1}, ..., S_{ξ(C_k)+ξ(C_l)} in C_l.
A little more work is needed to estimate e_{kr}. First, note that by Assumption 1, the probability that any points of ξ lie on the grid G := ⋃_{k,r} ∂C_{kr} is zero. Since we are only interested in distributional properties of ξ_π, we may therefore assume w.l.o.g. that ξ(ω)(G) = 0 for all ω ∈ Ω. For any set U, denote its power set by P(U), and write F^w_k := P({0, 1}^{Θ^w_{kr}}) and W_k := (I_{ls})_{(l,s)∈Θ^w_{kr}} (note that Θ^w_{kr} does not depend on r). If E(ξ(C_{kr}) sup_{s∈C_{kr}} π_n(s)) is zero, it is evident that e_{kr} = 0, so assume that the expectation is non-zero. We use the "formula of total covariance", that is, the relation

    Cov(X, Y) = E( Cov(X, Y | Z) ) + Cov( E(X | Z), E(Y | Z) )

for random variables X, Y ∈ L_2 and an arbitrary random variable Z, along with the conditional independence of the retention decisions, to obtain (3.9)

Since no realization of ξ has any points in G, the arguments of the covariance are conditional probabilities of the form P ext (k), respectively. We then may condition as well only on ξ and π_n restricted to the corresponding set A (the proof of this is rather technical and has therefore been placed in Appendix A.4). Hence

where, for the inequality, the factor E(ξ(C_{kr}) sup_{s∈C_{kr}} π_n(s)) was extracted from the first argument of the covariance. Note that in the second argument we do not have an indicator as required, but a general [0, 1]-valued random variable. The upper bound from Assumption 2 still holds, as can be seen from the proof of Equation (1 ) in Doukhan (1994), Section 1.1. We assemble the different parts from Result (3.4), Inequalities (3.5) to (3.8), and Inequality (3.10), and let ñ go to infinity to obtain the overall estimate

which is of the required order for n → ∞.
Proof of Theorem 3.C. Use, in the upper bound of Theorem 3.B, the estimates w_1 ≤ h̄_1 p^{(∞)}_n and w_{[2]} ≤ h̄_2 (p^{(∞)}_n)^2, and modify the estimation of the e_{kr} by extracting p^{(∞)}_n |C_{kr}| instead of E(ξ(C_{kr}) sup_{s∈C_{kr}} π_n(s)) in Inequality (3.10), such that we get

Proof of Corollary 3.E. In the upper bound of Theorem 3.B, use the estimates

from which we obtain the required order. For the convergence condition, set m(n) := n^y, Q_1(n) := n^{(sy−ε)d} and Q_2^{(ε)}(n) := n^{(2s(x+1)y−ε)d}, where y := x/(2s(x + 1) + 1) and ε > 0 is arbitrary. Use that E(Y 1_{Y>Q}) = o((1/Q)^{r−1}) for any non-negative random variable Y with finite r-th moment, in order to show that for any r > 2 + (2s + 1)/(sx) there is still a choice of ε > 0 such that the upper bound in Corollary 3.E goes to zero.
Proof of Theorem 3.F. Use in the upper bound of Theorem 3.B the estimates w_1 ≤ h̄_1 p^{(1)}_n and w_{[2]} ≤ h̄_2 p^{(2)}_n, and modify the estimation of the e_{kr} in the following way: for better readability, suppress k and r in the expressions A_int(k, r), A ... Then, because of the independence of (ξ, π_n|_{A_int}, π_n|_{A^{(m)}_ext}),

which yields the desired result.

Example of thinning according to a random environment (clouds in the starry sky)
Suppose that the stars within a certain distance r̄ to earth, and lying in a large window J_n ⊂ R^2_+ of the night sky, are part of a point process ξ on R^2_+ that fulfills Assumptions 1 and 2′. Whether you can actually see a particular star in this window depends, among other things, upon the distance and the brightness of the star, and upon whether any other object (here, a cloud) covers the star. Suppose, for the sake of simplicity, that the distributions of distance and brightness of stars do not depend on their position in the window, so that there is a basic probability q_0 = q_0^{(n)} that you can see any fixed star in that part of the window that is not covered by clouds. Suppose furthermore that the clouds in the sky as seen from earth, in the upper right area of some reference point, form a separable RACS (random closed set) Ξ ⊂ R^2_+ that is independent of ξ. By Remark A.E from the appendix, Theorem 3.F may then be applied for the retention field given by π_n(ω, s) = q_0 (1 − 1_{Ξ(ω)}(s)). In the following, we admit a general dimension d ∈ N for the sky.
In order to make things more concrete, let us consider a toy example. Suppose the cloud RACS Ξ := Ξ_n is a very simple Boolean model consisting only of discs of positive i.i.d. radii whose centers form a homogeneous Poisson process on R^d: Denote by ⊕ the (Minkowski) addition of subsets of R^d, i.e. A ⊕ B = {a + b; a ∈ A, b ∈ B}, and set

    Ξ_n := ( ⋃_{i∈N} ( Y_i ⊕ B(0, R^{(n)}_i) ) ) ∩ R^d_+,     (3.11)

where (Y_i)_{i∈N} are the points of a Po(λ Leb^d)-process on R^d, λ > 0, and B(0, R^{(n)}_i) is the closed Euclidean ball with center at 0 and radius R^{(n)}_i; the radii are i.i.d. and bounded, with ‖R^{(n)}_1‖_{L^d} =: r_n and ‖R^{(n)}_1‖_{L^∞} =: r^{(∞)}_n. Note that Ξ_n is a separable RACS. Its capacity functional is given by

    T_{Ξ_n}(K) = 1 − exp( −λ E Leb^d( K ⊕ B(0, R^{(n)}_1) ) )     (3.12)

for any compact subset K ⊂ R^d; see e.g. Stoyan, Kendall and Mecke (1987), Section 3.1. Let Ξ̌ be the RACS that is obtained from Ξ_n by decreasing the radii R^{(n)}_i ...

where α_d := π^{d/2}/Γ(d/2 + 1) is the volume of the d-dimensional unit ball. Therefore

Furthermore, we obtain for the approximating expectation measure in Theorem 3.F

where the second equality is obtained by Equation (3.12) with K = {s}, yielding in total the following result.
Proposition 3.G. Let ξ be a point process on R^d_+ that satisfies Assumptions 1 and 2′. Furthermore, let π_n(ω, s) = q_0 (1 − 1_{Ξ_n(ω)}(s)) for all s ∈ R^d_+, ω ∈ Ω, where q_0 = q_0^{(n)} ∈ [0, 1] and Ξ_n is the Boolean model given by Equation (3.11), i.e. the R^d_+-part of a union of balls in R^d whose centers form a Po(λ Leb^d)-process and whose radii R^{(n)}_i are i.i.d. and bounded. We then obtain, for m = m(n) ∈ N with m(n) ≥ 2 r^{(∞)}_n, that

where ν_n = q_0 exp(−λ α_d r_n^d) μ_1, and α_d is the volume of the d-dimensional unit ball.
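The measure ν_n in Proposition 3.G is driven by the void probability of the Boolean model: a fixed location is uncovered with probability exp(−λ α_d r_n^d), where r_n^d = E(R^{(n)}_1)^d. The following Monte Carlo sketch (our illustration, assuming d = 2 and uniformly distributed radii, neither of which comes from the paper) checks the corresponding mean retention probability E π_n(s) = q_0 exp(−λ π E R^2).

```python
import numpy as np

rng = np.random.default_rng(2)
lam, q0, R_max, reps = 0.3, 0.5, 1.0, 40000
s = np.array([0.0, 0.0])                 # the fixed location (a "star")

uncovered = 0
for _ in range(reps):
    # ball centers of the Boolean model that could possibly cover s: a Poisson process
    # on the square [-R_max, R_max]^2 around s (the radii are bounded by R_max)
    N = rng.poisson(lam * (2 * R_max) ** 2)
    centers = rng.uniform(-R_max, R_max, size=(N, 2))
    radii = rng.uniform(0.0, R_max, size=N)          # i.i.d. bounded radii (illustrative law)
    covered = np.any(np.sum((centers - s) ** 2, axis=1) <= radii ** 2)
    uncovered += not covered

E_R2 = R_max ** 2 / 3.0                              # E R^2 for R ~ Uniform(0, R_max)
print("simulated mean retention prob.:", q0 * uncovered / reps)
print("formula q0*exp(-lam*pi*E R^2): ", q0 * np.exp(-lam * np.pi * E_R2))
```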
From an asymptotic point of view, clearly the interesting cases are those in which ν_n κ_n^{-1}|_J does not fade away to the zero measure as n tends to infinity, giving us an artificial benefit for our distance estimate. In order to prevent this behavior, we must avoid choosing r_n of a higher than logarithmic order. To get a quick idea, consider the special case in which there is an n_0 ∈ N such that R^{(n)}_1 > √d for n ≥ n_0, and E(R^{(n)}_1)^d ... This situation allows us to choose an arbitrary ζ > 0, and we still find n_1 ∈ N such that ř_n^d ≥ (1 − ζ) r_n^d for all n ≥ n_1. Let us furthermore arrange for a constant non-zero ν_n κ_n^{-1}: let μ_1 := μ_0 Leb^d with μ_0 > 0, choose q_0 ≥ 1/n^d and set

    r_n := ( log(q_0 n^d) / (λ α_d) )^{1/d},

such that ν_n κ_n^{-1} = μ_0 Leb^d for every n ∈ N. The result is as follows.
Corollary 3.H. Under the conditions of Proposition 3.G, as well as the additional conditions above, we have for any ζ > 0

The upper bound in the above proposition goes to zero under an appropriate choice of m ≥ 2 r^{(∞)}_n and ζ > 0 if q_0 = O(n^{−ε_1 d}) as n → ∞ and β̄^{(ind)}(t) = O(t^{−ε_2 d}) as t → ∞ for some ε_1, ε_2 > 0. Note that it is always possible to choose m appropriately, provided that r^{(∞)}_n = O(n^{ε_3}) for some ε_3 ≤ ε_1. In the case of convergence of the above bound to zero, we obtain furthermore by Result (2.4) that

An alternative thinning definition
In this section, we consider a different thinning concept, where in place of a random retention field, we have deterministic retention kernels by which we directly model dependences between retention decisions. This will lead us, by means of the same method of proof as before, to a theorem that is similar to Theorem 3.B, less appealing from a theoretical and a typographical point of view, but sometimes more intuitively applied, because it permits us to look at the situation from a different angle: rather than thinking of a thinning as the result of point deletions according to a (potentially inscrutable) random environment, we now understand it as the result of point deletions according to (potentially more transparent) point interactions.

Definition, requirements and results
We first present the new thinning definition. Let ξ be a point process on R^d_+. To simplify the presentation we assume, for this whole section, that all realizations of ξ have infinitely many points in R^d_+.

Definition ("Q-Thinning"). For any u ∈ N let D_u := {(s_1, s_2, ..., s_u; σ) ∈ (R^d_+)^u × N; ∑_{i=1}^u δ_{s_i} ≤ σ}, equipped with the trace σ-field of (B^d_+)^u ⊗ N, and denote by D_u(σ) := {s ∈ (R^d_+)^u; (s; σ) ∈ D_u} the section of D_u at σ ∈ N. Let Q_u be a probability kernel from D_u to {0, 1}^u. We call (Q_u)_{u∈N} an admissible sequence of retention kernels if (a) the Q_u are "simultaneously symmetrical" in the sense that for any permutation τ on {1, ..., u} and its corresponding linear transformations

Assume now that Q = (Q_u)_{u∈N} is an admissible sequence of retention kernels. In analogy with Section 2, the Q-thinning of ξ can now be defined as follows. First, assume that ξ = σ = ∑_{i=1}^∞ δ_{s_i} is non-random, and define a Q-thinning of ξ in this case as ξ_Q := ∑_{i=1}^∞ X_i δ_{s_i}, where the X_i are indicator random variables whose joint distribution is given by the fidi-distributions L(X_1, ..., X_u) = Q_u((s_1, ..., s_u; σ), ·) for every u ∈ N. It is easy to see that, due to Properties (a) and (b) from above, ξ_Q has a distribution P_σ that is well-defined and does not depend on the enumeration of σ. We obtain the general Q-thinning from this by randomization, as in Section 2.
for every σ ∈ N. Such functions exist by Remark 2.A. With any such sequence of functions it is enough to define Q_u((f_1(σ), ..., f_u(σ); σ), A) for every σ ∈ N, every A ⊂ {0, 1}^u, and every u ∈ N, in such a way that Properties (a) and (b) from the thinning definition are satisfied, and the above term is an N-measurable mapping in σ and a probability measure in A. There is then a unique admissible sequence (Q̃_u)_{u∈N} of retention kernels such that Q̃_u extends Q_u to the whole of D_u for every u ∈ N. A short proof of this statement is the topic of Appendix A.5.
Remark 4.B. The new Q-thinning concept generalizes the thinning concept from Section 2. That is to say, for any combination of a point process ξ and a locally evaluable random field π, the thinning ξ_π can be modeled as a Q-thinning ξ_Q. As in Remark 4.A, let f_1, f_2, ... be N-B^d_+-measurable functions with σ = ∑_i δ_{f_i(σ)} for every σ ∈ N. Define then for any u ∈ N, and for e_1, ..., e_u ∈ {0, 1},

(4.1)

for almost every σ, which, upon adaptation on a Pξ^{-1}-null set, yields by Remark 4.A a well-defined sequence of retention kernels. It follows then from (4.1) that for every

and hence that ξ_Q has the same distribution as ξ_π.
One can prove a theorem corresponding to Theorem 3.B, which now relies on separate control of a mixing coefficient with respect to ξ alone and of the conditional covariances between functions of retention decisions. This seems intuitively more appealing, but is also quite a bit more inconvenient to formulate than the more abstract way via the σ-fields F^{(n)}_int and F^{(n,t)}_ext used for Theorem 3.B. To keep things reasonably neat, we only state a special case here, which is basically the analogue of Theorem 3.C, the L^∞-version.

We first present the additional assumptions we need. Let Q^{(n)} := (Q^{(n)}_u)_{u∈N} for each n ∈ N be an admissible sequence of retention kernels, and set p̄_n := ess sup sup ... where we define the supremum over the empty set as 0, and the infimum as 1. Note that the role of π_n(ω, s) is now always taken by Q^{(n)}_1((s; ξ(ω)), {1}), and that D_1(σ) is just the set of all points of σ. The additional assumptions are as follows.

Assumption 5 (Control of the long range dependence between the retention decisions given ξ). For each n ∈ N, let γ̄_n: R_+ → R_+ be a decreasing function such that the following property holds: for every cube A_int = (a, a + h·1) with a ∈ R^d_+, h ∈ (0, 1], for t ∈ R_+, and for every (s_1, ..., s_l, s_{l+1}, ..., s_u; σ) ∈ D_u with l ≥ 1, u > l,

Remark 4.C. Assumptions 4 and 5 amount to the statement that, for any t ∈ R_+ and for any set A_int of the above form, given a representation ∑_{i=1}^∞ δ_{f_i(ξ)} of ξ with measurable functions f_1, f_2, ... which first enumerate all the points in A_int and have images s_i := f_i(σ), and given an associated sequence (X_i) of retention decisions with respect to Q^{(n)}, we have, with i ≠ j,

We are now in the position to formulate the theorem. Again, the corresponding quantitative upper bound can be found in the proof.
Theorem 4.D (L^∞-version: upper bound and convergence). Let ξ be a point process on R^d_+ that satisfies Assumptions 1 and 2 above, and for each n ∈ N, let Q^{(n)} := (Q^{(n)}_u) be an admissible sequence of retention kernels that satisfies Assumptions 3, 4, and 5. Set ϕ := inf_{k∈N^d} P[ξ(C_k) ≥ 1] ∈ [0, 1], and let ν̃_n(·) := E ∫_· Q^{(n)}_1((s; ξ), {1}) ξ(ds), which is the expectation measure of ξ_{Q^{(n)}}. Then we obtain for any m := m(n) ∈ N with m ≥ 2 t̄_n

    d_2( L(ξ_{Q^{(n)}} κ_n^{-1}|_J), Po(ν̃_n κ_n^{-1}|_J) ) = O( n^d p̄_n δ̄_n(0), m^d n^d ∧ log↑(n^d p̄_n) ϕ p_n p̄_n (p̄_n ∨ δ̄_n(m)), ... )

The right hand side goes to 0 if, for example, p̄_n = O(1/n^d), δ̄_n(0) = o(1), and there is a sequence (m(n))_n with m(n) ≥ 2 t̄_n, m(n) = o 1 p

Remark 4.E. The main ideas of Theorem 4.D and Theorem 3.C are closely related. For example, we have formulated in two different ways - once in Assumption 2 and once in Assumptions 2′, 3, and 5 - what is essentially the decreasing dependence between (ξ|_{A_int}, X_1, ..., X_{ξ(A_int)}) and (ξ|_{A_ext}, X_{ξ(A_int)+1}, ..., X_{ξ(A_int)+ξ(A_ext)}) with increasing distance between an inner set A_int and an outer set A_ext and increasing n (where X_1, ..., X_{ξ(A_int)} and X_{ξ(A_int)+1}, ..., X_{ξ(A_int)+ξ(A_ext)} denote the retention decisions for the points in A_int and in A_ext, respectively).
Nevertheless, there are also substantial differences between the two theorems. We mention briefly two of the more important ones, leaving the confirmation of the formal details to the reader. First, in Theorem 3.C the thinning is location based (i.e. based on the location in the state space), whereas in Theorem 4.D the thinning is point based (i.e. based on the point of the original process ξ). That is to say that two or more points of ξ which occupy the same location in the state space must be thinned with the same dependence on other retention decisions and with equal probabilities if we want to apply Theorem 3.B, but they can be thinned much more generally if we want to apply Theorem 4.D.
Secondly, in Theorem 4.D the conditional distribution L(X_1, ..., X_u | ξ = σ) is only allowed to be a function of σ|_{⋃_{i=1}^u B(s_i, t̄_n)}, whereas in Theorem 3.C it may be a function of all of σ. As an example, consider the π-thinning of a homogeneous Poisson process on the real half line with points S_1 < S_2 < ..., using a retention field of the form

where the idea is that h_1(n) → 0 and/or h_2(n) → ∞ as n → ∞. Since π_n is a function of ω only via ξ, this situation is easily translated into a Q-thinning model. Obviously, Condition 3 is not satisfied by this model, because the retention probabilities may depend on arbitrarily distant regions. On the other hand, this long range dependence is very weak, and it can in fact be shown that the conditions for Theorem 3.C are met with a mixing coefficient β̄^{(∞)}(t) that does not depend on n and that goes to zero exponentially fast as t → ∞.
Proof of Theorem 4.D. Let η_n ∼ Po(ν̃_n), and choose an arbitrary ñ ∈ N. We use the notation and the conventions from Subsection 3.2, replacing only ξ_{π_n} by ξ_{Q^{(n)}}, and define the concrete "inner" and "outer" sets needed by

for t̄, t ∈ Z_+, k ∈ {1, 2, ..., n}^d, and r ∈ {1, 2, ..., ñ}^d. Most of the estimates from the proof of Theorem 3.B are still valid for the new thinning ξ_{Q^{(n)}} if we condition on ξ alone instead of ξ and π_n together and replace in the bounds w̄_1 by h̄_1 p̄_n and w_1 by ϕ p_n. Thus, e.g. ϕ p_n ≤ ∑_r q_{kr} ≤ h̄_1 p̄_n, and also M_1(λ), M_2(λ) and ∑_{k,r} q_{kr} E Z_{kr} from the Barbour-Brown theorem follow exactly this pattern.
The remaining terms can be estimated in a very similar fashion as in the proof of Theorem 3.B, namely

and in the Barbour-Brown theorem

    + h̄_2 n^d p̄_n δ̄_n(0) ≤ h̄_2 (2m + 1)^d n^d p̄_n δ̄_n(m),

using the same partition of the index set Θ as in Subsection 3.2.
Again we have to argue a bit more carefully for the estimation of the e_{kr}. If we again set F^w_k := P({0, 1}^{Θ^w_{kr}}) and W_k := (I_{ls})_{(l,s)∈Θ^w_{kr}}, we obtain, in a similar fashion as in the proof of Theorem 3.B,

where this time Lemma A.F(ii) from Appendix A.4 was used. Compared with the earlier proofs, we now have a second term that is in general not zero, because this time the retention decisions need not be independent under the conditioning we have. In the first summand, we extract p̄_n |C_{kr}| from the first argument of the covariance and use Assumption 2; in the second summand we use Assumption 5, and obtain altogether

    (1/2) ∑_r e_{kr} ≤ p̄_n ( β̄^{(∞)}(m − 2 t̄_n) + h̄_1 γ̄_n(m) ).

Assembling once more the different parts, and letting ñ go to infinity, gives the overall estimate

    ≤ h̄_2 n^d p̄_n δ̄_n(0) + ( 1 ∧ 2/(ϕ n^d p_n) ) ( 1 + 2 log^+( h̄_1 n^d p̄_n / 2 ) ) (2m + 1)^d n^d ( h̄_1^2 p̄_n + h̄_2 δ̄_n(m) ) p̄_n + ( 1 ∧ 1.65

which is of the required order for n → ∞.

Example of thinning according to point interactions (competition in a plant population)
Suppose that the individuals of a certain kind of plant that grow in a large piece J_n ⊂ R^2_+ of soil are part of a point process ξ = ∑_{i=1}^∞ δ_{S_i} on R^2_+ which has σ(ξ)-measurable points S_i with realizations s_i, and fulfills Assumptions 1 and 2 above. As before, we will do our analysis for a general dimension d ∈ N. Assume that the plants have certain "fitness parameters" Ψ_i, one per plant, which are i.i.d. (0, 1]-valued random variables, independent also of everything else and following a continuous distribution function. Whether a given plant survives until some time t_0 depends firstly on the overall environmental conditions, which we require to be the same for all plants (say, each plant has a basic survival probability q_0 = q_0^{(n)}), and secondly on the influence of other plants in its immediate surroundings. Suppose that the competition is such that an individual plant survives it, independently of whether it survives the environmental effect, if there are no plants with a higher degree of fitness within a radius of r_n > 0. We model this situation by assuming that we have for each S_i a retention decision

    X^{(n)}_i := Y^{(n)}_i Z^{(n)}_i,

where the Y^{(n)}_i are i.i.d. Be(q_0)-random variables that are independent of everything else and determine survival due to the environmental effect, and the Z^{(n)}_i are defined by

    Z^{(n)}_i := 1{ Ψ_j ≤ Ψ_i for every j ≠ i with |S_j − S_i| ≤ r_n }

and determine survival due to the competition effect. This second thinning effect is the same one used for the construction of the Matérn hard core process (see Stoyan, Kendall and Mecke (1987), Section 5.4). We obtain from symmetry considerations

and for i ≠ j,     (4.5)

For the second result a bit of computation is necessary, which basically consists of counting the number of orderings of the fitness values of the individuals in B(s_i, r_n) ∪ B(s_j, r_n) that leave the values at s_i and s_j as highest in their respective "competition ball". Note that for |s_i − s_j| > 2 r_n the retention decisions X_i and X_j are independent given ξ, because the "competition balls" B(s_i, r_n) and B(s_j, r_n) are disjoint. By Remark 4.A we can define probability kernels by

for almost every σ, in such a way that (Q^{(n)}_u)_{u∈N} is an admissible family of retention kernels for the sequence (X_i)_{i∈N}.
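The following simulation sketch (ours, with illustrative parameter values and a generic point pattern standing in for ξ) implements exactly this two-stage retention mechanism: an environmental Bernoulli(q_0) decision Y_i and a competition decision Z_i that equals 1 iff no point within distance r_n carries a larger fitness mark.

```python
import numpy as np

rng = np.random.default_rng(3)
q0, r_n = 0.7, 0.5

pts = rng.uniform(0, 10, size=(300, 2))      # stand-in for the plant locations S_i
psi = rng.random(len(pts))                   # i.i.d. fitness marks with a continuous law

# Z_i = 1 iff no other point within r_n carries a larger mark
dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
competitor = (dists <= r_n) & (psi[None, :] > psi[:, None])
Z = ~competitor.any(axis=1)

Y = rng.random(len(pts)) < q0                # environmental retention decisions
X = Y & Z                                    # overall retention decisions X_i = Y_i * Z_i
print("survivors:", X.sum(), "of", len(pts))
```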
In view of Theorem 4.D, Condition 3 is satisfied with t̄_n = r_n, and for m(n) ≥ 2 r_n we can always set γ̄_n(m(n)) to zero in Condition 5. Furthermore, we have by Equation

Obviously, for most point processes ξ, the norms on the right hand side of the above equations are one, and we then do not get very interesting upper bounds for the d_2-distance between the law of the thinned process and a Poisson process law, because the bounds take into account only the environmental thinning effect and not the competition effect.
There are two ways in which this can be rescued. First, we could use the L^1-version of Theorem 4.D (i.e. the Q-thinning analogue of Corollary 3.E), which in this case is much more promising, because we have expectations instead of the L^∞-norms above. However, as we have not formulated this more involved version, we do not pursue this idea any further here. Secondly, we can look at a specific ξ-process where we can be sure of having a certain number of competitors in every competition ball: let ξ be the point process on all of R^d (in order to avoid edge effects) that has in every unit square [k, k + 1), k ∈ Z^d, exactly one point, which is uniformly distributed over the square and independent of the locations of all the other points. We might think of a gardener who sows seeds, exactly one per square, by just throwing each carelessly over its square; the fact that the distribution over each square is uniform is by no means crucial to the essence of the following explanations.
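A quick sketch (ours) of this "gardener" process in d = 2, together with a numerical check of the deterministic bounds on ξ(B(s, r_n)) that are used in the next paragraph; the grid size and the radius are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
grid = 40                                             # unit squares covering [0, 40)^2
kx, ky = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
pts = np.stack([kx.ravel(), ky.ravel()], axis=1) + rng.random((grid * grid, 2))

r_n = 6.0
for _ in range(5):
    s = rng.uniform(r_n, grid - r_n, size=2)          # keep the ball inside the grid
    count = np.sum(np.linalg.norm(pts - s, axis=1) <= r_n)
    lower, upper = (r_n / 2) ** 2, (3 * r_n) ** 2     # simplified bounds from the text, d = 2
    print(f"count={count}, bounds=({lower:.0f}, {upper:.0f})", lower <= count <= upper)
```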
We then have, for all s ∈ J_n, ξ(B(s, r_n)) ≥ ((r_n − 1)/√2)^d ≥ (r_n/2)^d for r_n ≥ 4, and ξ(B(s, r_n)) ≤ (2(r_n + 1))^d ≤ (3 r_n)^d for r_n ≥ 2. Therefore, in the notation of Theorem 4.D,

Furthermore, we obviously have E ξ = Leb^d, and we can calculate for the second factorial moment measure

such that μ_2(C) ≤ 1 for every unit cube C ⊂ R^{2d}. Finally, the mixing property 2 is also met with β̄^{(∞)}(t) = 0 for t ≥ 1. With m(n) := 2 r_n + 1, this yields the following result.
In Brown, Weinberg and Xia (2000), a method was proposed to dispose of the factor log^+(λ/2) at some cost (compare also Brown and Xia (2001), Section 5, where the estimates needed were considerably simplified). Since this requires a more specialized treatment of the terms obtained in the upper bound, and since the logarithm above is negligible for almost all practical purposes, we do not pursue this method here much further. We only give an adaptation of Proposition 4.1 from Brown, Weinberg and Xia (2000), which is needed for the proof of Proposition 1.B, where a logarithmic term would be slightly annoying.
Proposition A.C. Suppose that (I_k)_{k∈Θ} is an independent sequence of indicators, and set q_k := E I_k, Ξ := ∑_{k∈Θ} I_k δ_{α_k}, λ := ∑_{k∈Θ} q_k δ_{α_k}, and λ := ∑_{k∈Θ} q_k > 0 as above. Then

Proof. Suppose that λ > 10. By applying Theorem 3.1 from Brown, Weinberg and Xia (2000) in the same way as the authors did to obtain the first inequality of Proposition 4.1, we get

Note that this estimate does not need any requirements on the underlying space (i.e. the positions of the points α_k and the metric d_0), or on the individual q_k, and the only requirement on λ is that λ > q_k for every k. Using q_k ≤ 1 < λ/10 for the last term, we thus obtain d_2(L(Ξ), Po(λ)) ≤ (8.45/λ) ∑_{k∈Θ} q_k^2.
For λ ≤ 10 we simply use Theorem A.A with Θ^s_k = ∅ and Θ^w_k = Θ_k to obtain

A.2 Proof of Proposition 1.B

We use the notation and some results from Section 2. Let n ∈ N be fixed, and let ∑_i δ_{S_i} be a representation of ξ with σ(ξ)-measurable random elements S_1, S_2, ... numbered in such a way that all the points in κ_n^{-1}(K) come first. Let (X_i)_{i∈N}, (Y_i)_{i∈N} be sequences of random variables such that, given ξ and π_n, the X_i are independent indicators with expectations π_n(S_i), and the Y_i are independent and Po(π_n(S_i))-distributed (i.e. the X_i are retention decisions as usual, and the Y_i are "Poissonized" variants of retention decisions). We have

Choosing an arbitrary Cox(Λ_n)-distributed point process η_n, the first term is estimated as

where for the last line we used the fact that, given ξ and π_n, each of the sequences (X_i) and (Y_i) is independent, so that we can apply Proposition A.C. For the second term, we just use an upper bound for the d_2-distance between two general Cox processes. Let Λ and M be arbitrary finite random measures on K, and write Φ(λ, μ) := d_2(Po(λ), Po(μ)) for finite measures λ, μ on K. Then, by conditioning on Λ and M, Inequality (2.8) in Brown and Xia (1995) tells us that

    (A.3)

(note that the first d_1 in Inequality (2.8) should be d_2; also, there is no formal difference between the two measures in Inequality (2.8), so they can be exchanged in the upper bound). Combining the estimates (A.1), (A.2), and (A.3) yields the desired result.

A.3 Locally evaluable random fields
In Section 2, the notion of a locally evaluable random field was introduced. Essentially, any measurable random field which owes its measurability to some local feature (such as continuity of paths) has this property. In what follows we show local evaluability for two important classes of random fields. We first give the corresponding definitions.
Definition. Let f: R^d_+ → [0, 1] be a function, and π := (π(·, s); s ∈ R^d_+) a [0, 1]-valued random field on R^d_+. (a) Let A ⊂ R^d_+ be a closed convex cone, that is, a closed convex set for which v ∈ A implies αv ∈ A for every α ∈ R_+. We call f continuous from A at a point s ∈ R^d_+ if lim_{n→∞} f(s_n) = f(s) for any sequence (s_n) in s + A with s_n → s (n → ∞). We call f continuous from A if it is continuous from A at every point.
(b) We say f is lower semicontinuous at a point s ∈ R^d_+ if lim inf_{n→∞} f(s_n) ≥ f(s) for every sequence (s_n) in R^d_+ with s_n → s (n → ∞). We say f is upper semicontinuous at s if −f is lower semicontinuous at s. Furthermore, f is called lower [resp. upper] semicontinuous if it is lower [resp. upper] semicontinuous at every point.
(c) Let C be a class of subsets of R. We call π separable for C if there exists a countable set Σ ⊂ R^d_+, and a fixed set N ∈ F with P(N) = 0, such that for any set C ∈ C and for any rectangle R that is open in R^d_+, we have

    {ω ∈ Ω; π(ω, s) ∈ C for all s ∈ R} Δ {ω ∈ Ω; π(ω, s) ∈ C for all s ∈ R ∩ Σ} ⊂ N.
In this case we call Σ a separant for π. We call π fully separable for C and accordingly Σ a full separant for π if the above property holds with N = ∅. Accordingly, every separable random field π can be made fully separable by adjustment on a set of probability zero. Note that such an adjustment does not change the distribution of the thinning ξ π .
Proposition A.D. The random field π := (π(·, s); s ∈ R d + ) is locally evaluable under each of the following conditions: (a) There is a closed convex cone A ⊂ R d + of positive volume, such that the paths π(ω, ·) are all continuous from A; (b) the paths π(ω, ·) are all lower semicontinuous, and π is fully separable with respect to the class {(y, ∞); y ∈ R}; (c) the paths π(ω, ·) are all upper semicontinuous, and π is fully separable with respect to the class {(−∞, y); y ∈ R}.
Remark A.E. A special case of a random field with all upper semicontinuous paths is the indicator of a random closed set (RACS) F. It is straightforward to see that such an indicator is separable with respect to the class {(−∞, y); y ∈ R} if and only if F is a separable RACS, meaning that there is a countable, dense subset Σ of R^d_+ such that F = cl(F ∩ Σ) a.s. Hence Proposition A.D guarantees us, if π(ω, s) := 1_{F(ω)}(s) for a separable RACS F, that by adjusting π on a null set, we obtain a locally evaluable random field. For more details on the relationship between indicator random fields and RACS see Matheron (1975), Section 2-4.

Proof of Proposition A.D

(a) The family {E^{(N)}_v ; v ∈ L^{(N)}} is, for any N, a partition of R^d. We may now, for any open set R ⊂ R^d_+, approximate π*_R by the random fields π_N given as

    π_N(ω, s) :=

for every ω ∈ Ω and every s ∈ R. It is easy to see that π_N is (σ(π|_R) ⊗ B_R)-measurable. Furthermore, π_N converges pointwise to π*_R for N → ∞, because for any (ω, s), the sequence (π_N(ω, s))_N has (with the possible exception of the first finitely many elements) the same values as (π(ω, s_N))_N, where the s_N are defined via the condition s ∈ E^{(N)}_{s_N} for every N ∈ N, such that s_N → s from the cone A. Hence, π*_R is (σ(π|_R) ⊗ B_R)-measurable, which completes the proof of part (a).

A.4 Thinnings are determined locally
We show here that "local conditioning" is enough to determine the "local distribution" of the thinnings, a result which is required for the estimation of the e_{kr}-terms in both Section 3 and Section 4. For any set A ⊂ R^d_+ write N(A) := σ({{ϱ ∈ N ; ϱ(B) = l} ; B ∈ B_A, l ∈ Z_+}).
We then obtain
Lemma A.F. Suppose that ξ is a point process on R^d_+.
(i) For the π-thinning: Let π be a locally evaluable [0, 1]-valued random field on R^d_+. Then we have for any bounded open set A ⊂ R^d_+, and for any D ∈ N(A), that P[ξ_π ∈ D | ξ, π] = P[ξ_π ∈ D | ξ|_A, π|_A] a.s.
(ii) For the Q-thinning: Let (Q_u)_{u∈N} be an admissible sequence of retention kernels which satisfy Assumption 3 from Section 4. Then we have for any bounded measurable set A ⊂ R^d_+, and for any D ∈ N(A), that P[ξ_Q ∈ D | ξ] = P[ξ_Q ∈ D | ξ|_{H^{(t̄_n)}(A)}] a.s., where H^{(t̄_n)}(A) := {s ∈ R^d_+ ; ∃ s′ ∈ A s.t. |s − s′| ≤ t̄_n} denotes the t̄_n-halo set of A.