A functional central limit theorem for the empirical Ripley’s K-function

Abstract: We establish a functional central limit theorem for the empirical Ripley's K-function of Gibbs point processes and point processes with fast decay of correlations. Our theorem greatly extends past results, which were restricted to the Poisson case, and allows us to determine the asymptotic behaviour of statistics based on the K-function, which may be used, for example, to develop goodness-of-fit tests. We illustrate this in a simulation study.


Introduction
Ripley's K-function [21] is a classical way of summarizing the second order structure of a stationary point process in spatial statistics. For a given r > 0, ρK(r) is the expected number of further points of the process within distance r of a typical point, where ρ is the intensity. Apart from being geometrically intuitive, an important reason for studying the K-function is that for isotropic point processes it contains all information about the pair correlation function, see e.g. [5]. Consequently, Ripley's K-function is commonly used for statistical inference in spatial statistics for point processes arising in various fields such as medicine [17], wireless communication [25], forestry [10] and neuroinformatics [15]. A review of the statistical aspects of Ripley's K-function may be found in [1].
A common problem in this field is to assess how well an observed point pattern x fits an assumed model P_0 based on a chosen summary statistic, e.g. Ripley's K-function or the pair correlation function. A standard way to perform goodness-of-fit tests is to use confidence bands derived from central limit theorems of the chosen summary statistic under P_0. However, in the case of Ripley's K-function, such theorems have only been established when P_0 is a Poisson point process with known or unknown intensity, see [12]. When no confidence bands are available, pointwise envelope tests, see e.g. [1], and their extension to global envelope tests have been a popular alternative [20]. However, pointwise envelope tests are restricted to comparison of the summary statistics evaluated at a given argument, while global envelope tests require several thousands of simulations of point patterns following P_0 with, on average, the same number of points as in the observed pattern x. This allows for goodness-of-fit tests of more models than the Poisson point process but may be infeasible when the number of observed points is large.
Thus, establishing central limit theorems for the K-function of P_0 when it is not necessarily a Poisson point process would allow for goodness-of-fit tests which are not based on envelope methods, as in [12]. In Theorem 4.2 of the present paper, we provide a first necessary step in that direction by establishing central limit theorems for Ripley's K-function when P_0 does not depend on unknown parameters. However, a problem arises when P_0 belongs to a parametric family of models that contains several unknown parameters, e.g. the intensity as described in [1]. As observed in [12] for Poisson processes, replacing the intensity by an estimate changes the limiting distribution. For more general point process models, generalising Theorem 4.2 in the presence of unknown parameters appears to be challenging and requires a further development of our main result in Theorem 4.2. We thus leave this open for future work. To the best of the authors' knowledge, the only previous contributions to a functional central limit theorem for the K-function are [12, 13], where the framework is limited to stationary Poisson processes. Other recent results on functional central limit theorems for geometric summary statistics are given in [19, 24].
The key to our results is to rewrite the estimator for Ripley's K-function in the form

(1/n) Σ_{x ∈ P ∩ W_n} ξ(x, P ∩ W_n),

where P is the point process, W_n is an observation window, and ξ is a so-called score function. There has been a lot of recent activity in establishing the asymptotic behaviour of summary statistics of this form [4, 26] when either P has fast decay of correlations or it belongs to a suitable class of Gibbs processes. This immediately leads to point-wise laws of large numbers for the mean and variance and, when a certain variance lower bound is satisfied, these papers also provide point-wise central limit theorems. This variance bound can be shown for the K-function by applying techniques of [2, 26]. To obtain a functional central limit theorem, tightness is shown by applying a machinery developed in [2] for persistence diagrams and adapting it to Gibbs point processes.
The paper is structured as follows. In Section 2 we introduce the K-function and its most common edge corrected estimators. In Section 3 we introduce the classes of point processes to which our results apply. The main results for the Kfunction are stated in Section 4 and corollaries for related functionals are given in Section 4.1. We investigate the statistical performance of a test based on the central limit theorem in Section 5 before proving the main results in Sections 6-7. Appendix A contains two required results on the conditional variance of random variables and Appendix B presents some background on Gibbs point processes.

The K-function
Let P ⊆ R^d be a simple stationary point process of intensity ρ > 0 and A ⊆ R^d a set of positive and finite volume |A|. For all r ≥ 0, the K-function is defined by

K(r) = 1/(ρ²|A|) E Σ_{x ∈ P ∩ A} #{y ∈ P \ {x} : |x − y| ≤ r} = ρ^{-1} E_o #{y ∈ P \ {o} : |y| ≤ r},

where E_o denotes the Palm expectation given o ∈ P. This definition is independent of the set A. Typically, the point process is only observed inside a bounded observation window. Throughout this paper, we will consider a square observation window W_n = [−(1/2)n^{1/d}, (1/2)n^{1/d}]^d of volume n and write P_n = P ∩ W_n. The most naive estimator for the K-function based on P_n is

K̂_n(r) = 1/(nρ²) Σ_{x ∈ P_n} Σ_{y ∈ P_n \ {x}} 1{|x − y| ≤ r}.    (1)

However, this estimator is downward biased due to points y close to the edges of W_n not being counted. This bias tends to zero when the volume of the window goes to infinity, see Theorem 3.5 below, but for finite window sizes, an edge corrected estimator is typically used to avoid the bias [22]. This is a weighted estimator of the form

K̂_{e,n}(r) = 1/(nρ²) Σ_{x ∈ P_n} Σ_{y ∈ P_n \ {x}} e_n(x, y) 1{|x − y| ≤ r},    (2)

where e_n is an edge correction factor depending on the window. Several edge correction factors have been proposed in the literature, see e.g. [22]. In the case of stationary point processes, the most commonly used edge corrections are:
• No correction: e_{1,n}(x, y) = 1. This corresponds to the uncorrected estimator K̂_n.
• Translation correction: e_{2,n}(x, y) = |W_n| / |W_n ∩ (W_n + x − y)|, where | · | denotes volume.
• Rigid motion correction:

e_{3,n}(x, y) = |W_n| / ∫_{SO(d)} |W_n ∩ (W_n + η(x − y))| ν(dη),

where SO(d) is the space of rotations equipped with the normalized Haar measure ν. This is the inverse of the proportion of all rigid motions keeping x in W_n that also keep y in W_n.
• Border correction (minus sampling): e_{4,n}(x, y) = 1_{W_n ⊖ B_r(o)}(x) n / |W_n ⊖ B_r(o)|, where B_r(x) is the ball around x of radius r and ⊖ denotes Minkowski set difference. This edge correction is equivalent to sampling x from a smaller window such that all points within distance r from x can be observed in W_n.
• Isotropic correction: e_{5,n}(x, y) is the inverse of the fraction of the sphere ∂B_{|x−y|}(x) that is contained in W_n.
The edge corrections e_{2,n} and e_{4,n} lead to unbiased estimators for all stationary point processes, while e_{3,n} and e_{5,n} also require that the point process is isotropic to yield unbiasedness.
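For a square window, the translation and border corrections have simple closed forms: the overlap |W ∩ (W + x − y)| of a box with its translate factorizes over coordinates, and the erosion W ⊖ B_r(o) of a box is the inner box shrunk by r on each side. The following is a minimal Python sketch of these two factors (function names are ours; the paper's own simulations use spatstat in R).

```python
import numpy as np

def translation_correction(x, y, L):
    """e_2(x, y) = |W| / |W ∩ (W + x - y)| for the box window W = [0, L]^d.
    For a box, the overlap volume factorizes over the coordinates."""
    shift = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.any(shift >= L):
        raise ValueError("points are further apart than the window size")
    return L**len(shift) / np.prod(L - shift)

def border_correction(x, r, L):
    """e_4(x, y) = 1{x in W ⊖ B_r(o)} * |W| / |W ⊖ B_r(o)| for W = [0, L]^d.
    The erosion of a box by a ball of radius r is the inner box [r, L-r]^d,
    so only points x at distance at least r from the boundary contribute."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    inside = bool(np.all((x >= r) & (x <= L - r)))
    return float(inside) * L**d / (L - 2.0 * r)**d

# Near the boundary the translation correction inflates the pair count:
print(translation_correction((0.0, 0.0), (5.0, 0.0), 10.0))  # 2.0
print(border_correction((5.0, 5.0), 1.0, 10.0))              # 1.5625
```

Both factors are 1 or close to 1 deep inside the window and grow as points approach the boundary, compensating for unobserved neighbours.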
In the proofs below, we write the estimators as

K̂_n(r) = (1/n) Σ_{x ∈ P_n} ξ_r(x, P_n)  and  K̂_{e,n}(r) = (1/n) Σ_{x ∈ P_n} ξ_{e,n,r}(x, P_n),

where ξ_r and ξ_{e,n,r} are the so-called score functions defined for a locally finite point pattern X and a point x ∈ X as

ξ_r(x, X) = ρ^{-2} Σ_{y ∈ X \ {x}} 1{|x − y| ≤ r},    (3)

ξ_{e,n,r}(x, X) = ρ^{-2} Σ_{y ∈ X \ {x}} e_n(x, y) 1{|x − y| ≤ r}.

Note that, for all the non-trivial edge corrections listed above, the associated score functions depend on n. Moreover, some of them are not invariant with respect to translations (x, X) → (x + y, X + y).
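As an illustration of the uncorrected estimator K̂_n, the following minimal Python sketch (assuming numpy; not the spatstat code used for the simulation study later) simulates a homogeneous Poisson process and compares the estimate with the Poisson value K(r) = κ_d r^d, i.e. πr² in the plane.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a homogeneous Poisson process of intensity rho on the window [0, L]^2.
rho, L = 1.0, 30.0
n = L**2  # window volume
N = rng.poisson(rho * n)
pts = rng.uniform(0.0, L, size=(N, 2))

def K_naive(points, r, volume, intensity):
    """Uncorrected estimator: counts ordered pairs x != y with |x - y| <= r,
    normalized by volume * intensity^2."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude the pair x == y
    return (dist <= r).sum() / (volume * intensity**2)

r = 2.0
est = K_naive(pts, r, n, rho)
# For a planar Poisson process, K(r) = pi * r^2; the estimate is slightly
# downward biased because of edge effects, as discussed above.
```

The O(N²) pairwise distance matrix is fine at this scale; for large patterns one would use a spatial index instead.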

Classes of point processes
In this section, we introduce the two main types of point processes that we are going to consider. One is the class of conditionally m-dependent point processes having fast decay of correlations as considered in [2]. The second is a class of Gibbs point processes considered in [23]. In Section 3.3, we state a formula from [4, 26] for the limiting mean and covariance of K̂_n(r) for fixed value(s) of r when the volume of the observation window goes to infinity. A point process P is formally defined as a random variable taking values in the space N of locally finite subsets of R^d, endowed with the smallest σ-algebra N such that the number of points in any given Borel set is measurable. We assume that P is simple with intensity ρ and stationary.
We will assume that all factorial moment measures exist and are absolutely continuous, that is, the pth factorial moment density ρ^{(p)} is determined via the identity

E[P(A_1) ··· P(A_p)] = ∫_{A_1} ··· ∫_{A_p} ρ^{(p)}(x_1, ..., x_p) dx_1 ··· dx_p

for all pairwise disjoint bounded Borel sets A_1, ..., A_p ⊆ R^d. Here P(A_i) denotes the number of points of P in A_i.

Conditionally m-dependent point processes
The first class of point processes we will consider satisfies a set of conditions that we introduce in this section, namely fast decay of correlations, conditional m-dependence, and Conditions (M) and (R) below. We say that a function φ : [0, ∞) → [0, 1] is fast decreasing if lim_{t→∞} t^m φ(t) = 0 for all m ≥ 1.
Definition 3.1. Let P be a simple stationary point process in R^d such that the pth factorial moment density ρ^{(p)} exists for all p ≥ 1. Then P exhibits fast decay of correlations if there exists a fast decreasing function φ and constants c_n, C_n > 0 for all n ∈ N such that for any p, q ∈ N and x_1, ..., x_{p+q} ∈ R^d,

|ρ^{(p+q)}(x_1, ..., x_{p+q}) − ρ^{(p)}(x_1, ..., x_p) ρ^{(q)}(x_{p+1}, ..., x_{p+q})| ≤ C_{p+q} φ(c_{p+q} dist({x_1, ..., x_p}, {x_{p+1}, ..., x_{p+q}})).

Here dist denotes the distance between two point sets, i.e. dist(A, B) = inf{|x − y| : x ∈ A, y ∈ B}.
For p ∈ N, let P^{=p} denote the set of p-tuples of pairwise distinct points in P and recall that the p-point Palm distribution P_{x_1,...,x_p} is determined by

E Σ_{(x_1,...,x_p) ∈ P^{=p}} f(x_1, ..., x_p, P) = ∫_{R^{pd}} E_{x_1,...,x_p}[f(x_1, ..., x_p, P)] ρ^{(p)}(x_1, ..., x_p) dx_1 ··· dx_p

for any bounded measurable f : R^{pd} × N → R, where E_{x_1,...,x_p} denotes expectation with respect to P_{x_1,...,x_p}. With this notation, we make the following condition on the Palm moments:

(M) For all p, q ≥ 1 and r > 0, sup_{x_1,...,x_p ∈ R^d} E_{x_1,...,x_p}[P(B_r(x_1))^q] < ∞.

We summarize for later reference the conditions of [4] that are satisfied by the point processes and score functions we consider.

Lemma 3.2.
Suppose that P has fast decay of correlations and let ξ be a linear combination of score functions ξ_{r_1}, ..., ξ_{r_p} of the form (3). Then (P, ξ) is admissible of class (A1) in the sense of [4] and ξ(x, X) depends only on X ∩ B_r(x), where r = max_{i=1,...,p} r_i. If, moreover, P satisfies the moment condition (M), then the p-moment condition [4, (1.19)] is satisfied, i.e. for all p > 0 there is an M_p > 0 such that

sup_n sup_{x_1,...,x_p ∈ W_n} E_{x_1,...,x_p}[|ξ(x_1, P)|^p] ≤ M_p.

Proof. It suffices to show the lemma for ξ = ξ_r, the extension to general linear combinations being trivial.
Since P has fast decay of correlations by assumption and since ξ_r(x, P) has the form of a U-statistic, i.e. a sum of bounded indicator functions over the points of P, the conditions for class (A1) in [4] are readily verified.
The central limit theorems in Section 4 require a variance lower bound. To obtain this, we make two further assumptions, formulated under the standing assumption that we study the K-function on an interval [r_0, R].

Definition 3.3.
A point process P is said to be conditionally m-dependent if there exists a random measure Λ such that P ∩ C and P ∩ C′ are conditionally independent given σ(Λ, P ∩ A) for any bounded Borel sets C, C′, A ⊆ R^d such that the distance between C and C′ is larger than m. Here, σ(Λ, P ∩ A) denotes the σ-algebra generated by Λ and P ∩ A. For such a process, we let R̄ = max(m, R).
The condition says that, conditionally on a suitable random measure Λ, the point process P is m-dependent. This should also hold if we further condition on P inside any set A. The latter is a technical assumption needed in the proof of Proposition 6.1 below. In fact, we only need it to hold for sets A of a specific form; however, it seems to be no big restriction to require it for all Borel sets. The condition is e.g. satisfied if P is Poisson.
A natural example of a conditionally m-dependent process is the Cox process. A Cox process P is given in terms of a stationary random measure Λ on R^d: conditionally on Λ, P is a Poisson process with intensity measure Λ. This clearly satisfies Definition 3.3.
Another example is given by cluster processes with bounded clusters. Such a process is built from a stationary parent point process and a sequence of a.s. finite i.i.d. point clusters contained in the ball B_r(o). Each parent point x is then replaced by one of the clusters translated by x, forming a new point process P. It follows that Definition 3.3 is satisfied if we take the parent process as the random measure Λ and m = 2r.
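A minimal sketch of such a cluster process, a planar Matérn-type cluster process (assuming numpy; parameter names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def matern_cluster(kappa, mu, r, L):
    """Matérn cluster process on [0, L]^2: parents form a Poisson process of
    intensity kappa; each parent is replaced by a Poisson(mu) number of
    offspring placed uniformly in the disc of radius r around it.
    Conditionally on the parent configuration (playing the role of Lambda),
    points in sets more than m = 2r apart are independent (Definition 3.3)."""
    n_parents = rng.poisson(kappa * L**2)
    parents = rng.uniform(0.0, L, size=(n_parents, 2))
    clusters = []
    for p in parents:
        k = rng.poisson(mu)
        theta = rng.uniform(0.0, 2.0 * np.pi, k)
        rad = r * np.sqrt(rng.uniform(0.0, 1.0, k))  # uniform in the disc
        clusters.append(p + np.c_[rad * np.cos(theta), rad * np.sin(theta)])
    return np.concatenate(clusters) if clusters else np.empty((0, 2))

pts = matern_cluster(kappa=0.05, mu=20, r=1.0, L=50.0)
# The intensity of the resulting process is kappa * mu = 1 here; offspring
# of parents near the boundary may fall slightly outside the window.
```

This is the process used again in the simulation examples of [2]; the choice kappa·mu = 1 matches the unit intensity used throughout the simulation study.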
For a conditionally m-dependent process in the sense of Definition 3.3, we introduce the following condition, which essentially guarantees that certain point configurations occur with a probability which is bounded from below by a positive constant. This will be used to derive a lower bound on the variance in the proof of Proposition 6.1.
(R) Let P be a conditionally m-dependent point process. For any r_0 ≤ r_1 < r_2 ≤ R, define events F_1, F_2 ∈ N. Then it must hold that the corresponding configurations occur with conditional probability given Λ bounded from below by a positive constant, where Λ is the measure from the definition of conditional m-dependence.
An example of a point process satisfying all assumptions in this section is the log-Gaussian Cox process. This is a Cox process with Λ(A) = ∫_A exp(Y(x)) dx, where {Y(x)}_{x ∈ R^d} is a stationary Gaussian process. It was shown in [2] that if Y has a compactly supported covariance function, then P is a conditionally m-dependent process with fast decay of correlations satisfying Conditions (M) and (R). In fact, it is proved that a stronger version of (R), called (AC) in [2], holds. Note that the proof in [2] easily generalizes to fast decaying covariance functions such as the exponential covariance function, which will be used for the simulations in Section 5.
Another example is the Matérn cluster process. This is a cluster process where the parent process is Poisson and the clusters are i.i.d. homogeneous Poisson in B_r(o). Fast decay of correlations and Conditions (M) and (R) are again satisfied, as shown in [2]. The proof easily generalizes to Neyman-Scott processes, which are defined similarly to Matérn cluster processes, except that the clusters are inhomogeneous Poisson with an intensity function of bounded support.

Gibbs point processes
The second class of point processes we shall consider is a class of Gibbs point processes which we call Ψ*. This will be almost the same as the class Ψ* in [23, 26] but with a few restrictions. We consider an energy functional Ψ defined on finite point sets X and write

Δ_Ψ(x, X) = Ψ(X ∪ {x}) − Ψ(X)

for the local energy of inserting a point x, with the convention ∞ − ∞ = 0. We say that Ψ has finite range if there is a radius r_Ψ such that

Δ_Ψ(x, X) = Δ_Ψ(x, X ∩ B_{r_Ψ}(x))

for all (x, X). For a finite range energy functional Ψ, we may consider the infinite volume Gibbs point process with inverse temperature β and activity τ satisfying

τ κ_d (r_Ψ)^d < 1,    (4)

where κ_d is the volume of the d-dimensional unit ball. Condition (4), together with the finite range, ensures existence and uniqueness of the infinite volume Gibbs process [6, Thm. 4]. This is the point process P satisfying that for any bounded domain D, conditionally on P ∩ D^c = X_0, P ∩ D is absolutely continuous with respect to a Poisson process Q on D of intensity τ with density

f(X | X_0) = exp(−β Δ_Ψ(X, X_0)) / E[exp(−β Δ_Ψ(Q, X_0))],    (5)

where expectation is taken with respect to Q, and for X ⊆ D,

Δ_Ψ(X, X_0) = Ψ(X ∪ X_0) − Ψ(X_0).

The class Ψ* consists of all infinite volume Gibbs point processes satisfying (4), where the energy functional has one of the following forms:
(i) Pair potential: There is a pair potential function φ : (0, ∞) → (−∞, ∞] such that φ has compact support, φ^{-1}(∞) ⊆ [0, r_0], and φ is bounded on compact subintervals of (r_0, ∞). Then

Ψ(X) = Σ_{{x,y} ⊆ X} φ(|x − y|).

(ii) Area interaction process: Let K ⊆ R^d be a deterministic compact convex set. Then

Ψ(X) = |∪_{x ∈ X} (x + K)|.

Note that all these energy functionals have finite range. It was shown in [23] that the point processes in Ψ* can be constructed by a backwards oriented perfect simulation technique, which is recalled in Appendix B for reference in the proofs. It follows from this construction that all Gibbs point processes of class Ψ* have fast decay of correlations, as noted in [4]. Remark 3.4. 
The energy functionals here are essentially the same as in [26, 23], except that 1) we modified the pair potentials in (i) to have finite range and 2) some of the energy functionals in [23] allowed an extra term αn, where n is the number of points in X. However, removing the term αn yields the same point process if τ is replaced by τ̃ = τ exp(−βα), see [6, Def. 8]. The requirement in [23] then becomes equivalent to our condition τ̃ κ_d (r_Ψ)^d < 1.

Laws of large numbers
For both types of point processes we consider, the literature provides formulas for the limiting mean and variance when n → ∞.
Theorem 3.5 (LLN for K̂_n(r)). Let P be either a simple stationary point process in R^d that exhibits fast decay of correlations as in Definition 3.1 and satisfies Condition (M), or a Gibbs process of class Ψ*. Further, let K̂_n be defined as in (1). Then, for any r > 0, there is a constant C_r such that

|E K̂_n(r) − K(r)| ≤ C_r n^{−1/d}.    (6)

Moreover, for r_1, r_2 > 0,

lim_{n→∞} n Cov(K̂_n(r_1), K̂_n(r_2)) =: C_P(r_1, r_2)    (7)

exists. The limit (7) is finite. In particular, K̂_n(r) converges in probability towards K(r).
Proof. In the case of fast decay of correlations, the inequality (6) and the limit (7) for r_1 = r_2 follow directly from [4, Thm. 1.12] and Lemma 3.2. The case r_1 ≠ r_2 in (7) follows from the identity

Cov(K̂_n(r_1), K̂_n(r_2)) = (1/2)(Var(K̂_n(r_1) + K̂_n(r_2)) − Var(K̂_n(r_1)) − Var(K̂_n(r_2)))

together with [4, Thm. 1.12] and Lemma 3.2 applied to ξ_{r_1} + ξ_{r_2}. For Gibbs point processes, the first statement follows from a direct computation valid for any stationary point process, while the limiting covariance follows from [26, Thm. 1.1].

Main results
In this section, we state the main results of this paper, which are central limit theorems for K̂_{e,n}(r) when r is restricted to a bounded interval [r_0, R]. Throughout the paper, r_0 and R will denote the constants in Condition (R) in the case of conditionally m-dependent processes, and the constants from the definition of the energy functional for Gibbs point processes.
We first state a central limit theorem for the finite dimensional distributions ofK e,n (r) as described in (2). The proof is given in Section 6.
Theorem 4.1. Let P be either a conditionally m-dependent point process with fast decay of correlations satisfying Conditions (M) and (R), or a Gibbs process of class Ψ*. Then, for any r_0 ≤ r_1 < ··· < r_p ≤ R, the vector

√n (K̂_{e,n}(r_1) − E K̂_{e,n}(r_1), ..., K̂_{e,n}(r_p) − E K̂_{e,n}(r_p))

converges in distribution to a multivariate Gaussian variable with mean zero and covariance structure given by Theorem 3.5.
The next theorem is a functional central limit theorem for K̂_{e,n}(r). The proof is given in Section 7.

Theorem 4.2. Under the assumptions of Theorem 4.1, the process

{√n (K̂_{e,n}(r) − E K̂_{e,n}(r))}_{r ∈ [r_0, R]}

converges weakly in the Skorokhod topology to a centered Gaussian process with covariance structure given by Theorem 3.5. The limiting process has a modification that is Hölder continuous for any exponent γ < 1/2.

Generalization to other summary statistics
The arguments used to prove Theorems 4.1 and 4.2 apply to other geometric functionals given in terms of score functions. The key ingredient in the proofs is that the score function satisfies the inequalities in Lemmas 6.2, 7.2, and 7.3. However, the proofs of these lemmas, especially Lemma 7.2, rely on the geometric properties of the specific score function. Thus, a direct generalization of the proofs seems only possible for score functions similar in flavour to the K-function. The most obvious such functional is the pair correlation function, given for isotropic point processes by

g(r) = ρ^{(2)}(o, x) / ρ²  for any x with |x| = r.

The pair correlation function relates to the K-function by d κ_d r^{d−1} g(r) = K′(r). A kernel estimator for g(r) is given in [5] by

ĝ_n(r) = 1/(n ρ² d κ_d r^{d−1}) Σ_{x ∈ P_n} Σ_{y ∈ P_n \ {x}} (1/b) k((r − |x − y|)/b),    (8)

where k is a compactly supported kernel function which is C¹ on its support and b > 0 is a bandwidth. Another related functional is the nearest neighbour function given by

D(r) = P_o((P \ {o}) ∩ B_r(o) ≠ ∅),

which can be estimated [5] by

D̂_n(r) = 1/(nρ) Σ_{x ∈ P_n} 1{(P \ {x}) ∩ B_r(x) ≠ ∅}.    (9)

Under the assumptions of Theorem 4.2, let ĝ_n be as in (8) and D̂_n be as in (9). The processes √n(ĝ_n(r) − E ĝ_n(r)) and √n(D̂_n(r) − E D̂_n(r)) converge weakly in the Skorokhod topology to centered Gaussian processes, which have modifications that are Hölder continuous for any exponent γ < 1/2.
In the case of D̂_n, the score function ξ = 1{(P \ {x}) ∩ B_r(x) ≠ ∅} can be bounded by the score function (3) of the K-function, which can be used to generalize Lemmas 7.2-7.3. Only Lemma 6.2 needs a new proof, which can be given in a way similar to Lemma 7.1.
The case of ĝ_n can also be shown by a straightforward generalization of Lemmas 6.2, 7.2, and 7.3. Alternatively, the result could be derived directly from Theorem 4.2 by writing

ĝ_n(r) = 1/(d κ_d r^{d−1}) ∫ (1/b) k((r − u)/b) dK̂_n(u)

and applying the continuous mapping theorem, noting that the functional

β ↦ ( r ↦ 1/(d κ_d r^{d−1}) ∫ (1/b) k((r − u)/b) dβ(u) ),

where β is a cadlag function on [r_0, R], is continuous in the Skorokhod topology, see the arguments of [2, Cor. 3.4].
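The relation d κ_d r^{d−1} g(r) = K′(r) can be checked numerically in the simplest case: for a planar Poisson process K(r) = πr², so g ≡ 1. A small sketch assuming numpy:

```python
import numpy as np

# For a planar Poisson process K(r) = pi r^2, and the identity
# d * kappa_d * r^(d-1) * g(r) = K'(r), with d = 2 and kappa_2 = pi,
# gives g(r) = K'(r) / (2 pi r) = 1 for all r > 0.
d, kappa_d = 2, np.pi
r = np.linspace(0.1, 5.0, 500)
K = np.pi * r**2
dK = np.gradient(K, r)                # finite-difference approximation of K'
g = dK / (d * kappa_d * r**(d - 1))
# Central differences are exact for a quadratic, so g equals 1 away from
# the endpoints of the grid (where one-sided differences are used).
```

In practice one differentiates an *estimated* K-function, which is why the smoothed estimator (8) with a kernel k is preferred over raw finite differences.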
In [13], a functional limit theorem was shown in the Poisson case for the more general multiparameter K-function, in which the ball B_r(o) is replaced by an axis-parallel box with side lengths given by a parameter vector. The results of this paper easily generalize to the multiparameter K-function, using a multiparameter version of the tightness criterion [14, Lem. 3]. A functional central limit theorem for persistence diagrams was shown in [2] for conditionally m-dependent processes. The proofs easily generalize to Gibbs point processes using Lemma 7.3 of the present paper. The only case where one has to be careful is when a hardcore radius is present, since this may rule out the possibility of a variance lower bound in certain parts of the persistence diagram.

Goodness-of-fit tests
Consider the problem of how well an observed point pattern x fits an assumed model P_0. We present how the results of Theorem 4.2 can be used for the construction of goodness-of-fit tests by considering a Kolmogorov-Smirnov type test based on Ripley's K-function. We assume that all parameters are known. Let H_0 be the hypothesis that x is a realisation of P_0. If H_0 is true, then for a given R > 0,

sup_{r ∈ [0, R]} √n |K̂_{e,n}(r) − E K̂_{e,n}(r)| converges in distribution to sup_{r ∈ [0, R]} |Y(r)|,    (10)

where Y(r) is a centered Gaussian process with covariance C_{P_0} given by the limiting covariance function in (7). Let q_α denote the (1 − α)-quantile of sup_{r ∈ [0, R]} |Y(r)|; we reject H_0 at level α if the observed value of the left-hand side of (10) exceeds q_α. Note that in place of the supremum in (10), we could have used the integral over [0, R]. Alternatively, one might use either the Cramér-von Mises type test or the χ²-test presented in [12].
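The testing procedure can be sketched in a few lines of Python (assuming numpy; the covariance used in the illustration is made up and is not the true C_{P_0} of any of the models considered):

```python
import numpy as np

rng = np.random.default_rng(2)

def ks_type_test(K_hat, K_mean, C, n, alpha=0.05, n_sim=10000):
    """Kolmogorov-Smirnov type test on a grid r_1 < ... < r_k.
    K_hat:  observed K estimates on the grid,
    K_mean: their expectations under the null model P_0,
    C:      limiting covariance matrix (C_{P_0}(r_i, r_j))_{i,j},
    n:      window volume.
    Rejects H_0 when sup_i sqrt(n) |K_hat_i - K_mean_i| exceeds the empirical
    (1 - alpha)-quantile of sup_i |Y_i| with Y ~ N(0, C)."""
    stat = np.sqrt(n) * np.max(np.abs(np.asarray(K_hat) - np.asarray(K_mean)))
    Y = rng.multivariate_normal(np.zeros(len(C)), C, size=n_sim)
    q = np.quantile(np.abs(Y).max(axis=1), 1.0 - alpha)
    return stat, q, stat > q

# Illustration only: a made-up covariance, NOT the true C_{P_0} of any model.
k = 20
r = np.linspace(0.1, 2.0, k)
C = np.exp(-np.abs(r[:, None] - r[None, :]))  # hypothetical covariance matrix
K_mean = np.pi * r**2
stat, q, reject = ks_type_test(K_mean, K_mean, C, n=100.0)  # stat = 0 here
```

In the simulation study below, C_{P_0} and E K̂_{e,n}(r) are themselves estimated by simulating the null model; only the Gaussian quantile step above is cheap enough to run for very large patterns.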

Simulation study
We illustrate the goodness-of-fit test presented above in a simulation study where we estimate the rejection rate of H 0 when x is indeed a realisation of P 0 and when it is a realisation of an alternative model having the same intensity as P 0 . All point processes considered have intensity 1. Each rejection rate has been estimated, with α = 5%, by simulating 10000 point patterns x on W n for n = 20 2 , 50 2 , 100 2 , 200 2 . Moreover, the statistic in (10) has been computed for R = 1, 2, 3, 4, 5. All the simulations have been done in R with the package spatstat and the border edge correction was used.
To shorten notation, we denote a Poisson point process by Poi, a log-Gaussian Cox process (LGCP) with exponential covariance function, variance σ², and scale parameter a by LGCP_a(σ²), and a Strauss point process with interaction parameter γ and interaction radius 0.4 by Str_0.4(γ). The dependence on the activity parameter β of the Strauss process is omitted in the notation, as we always determine it by simulation so that the intensity of the process is 1. The first row in Figure 1 shows examples of realisations of the point processes considered for our simulation study: Poi(1), Str_0.4(0.2), and LGCP_2(0.2), simulated on W_n = [0, 100]². The second and third rows in Figure 1 show a zoom of the first row of point configurations on [0, 50]² and [0, 20]², respectively. Note that, depending on the scale, it is not visually easy to distinguish the different processes. For example, on [0, 100]², there is no clear distinction between Poi(1) and Str_0.4(0.2). This distinction is however clearer at small scale, as seen in the third row, where Str_0.4(0.2) appears to be more regular.

C. A. N. Biscio and A. M. Svane
We have discretized the segment [0, R] in steps of 0.1 and estimated the quantile q_α in (10) empirically from 100000 realisations of a multivariate centered Gaussian random variable with covariance matrix (C_{P_0}(r_i, r_j))_{i,j}, where the r_i range over the grid points. First, we let P_0 be a Poisson process with intensity 1 and estimate the rejection rate of H_0 over 10000 realisations of P_0 and 10000 realisations of an LGCP_2(0.2) model. The results are reported in Table 1. Second, we repeat the same procedure but with P_0 ∼ LGCP_2(σ²) for σ² = 0.2, 1, and P_0 ∼ Str_0.4(γ) for γ = 0.2, 0.5, 0.8. Using 100000 realisations, the intensity of the point process is estimated to be 1 when β = 1.556, 1.298, 1.107 for γ = 0.2, 0.5, 0.8, respectively. Note that Inequality (4) holds for these values. Then, C_{P_0} and E K̂_{e,n}(r) have been estimated using 10000 realisations of LGCP_2(σ²) and Str_0.4(γ). The rejection rates have been estimated over 10000 realisations of P_0 and 10000 realisations of a Poisson process with intensity 1. Due to limitations in computational power, we have only run perfect simulations of the Strauss process up to n = 100². We report the results for LGCP_2(0.2) and Str_0.4(0.2) in Tables 2 and 3, respectively, and comment on the remaining cases below.
Finally, to better illustrate our results, we report in Table 4 the output of the test of the null hypothesis P 0 ∼ P oi(1) for the three point process realisations in Figure 1 for different windows W n and intervals [0, R].

Discussion
The results reported in Tables 1-3 are in agreement with Theorem 4.2. When P_0 is a Poisson point process or an LGCP with variance σ² = 0.2, we recover the correct type I error rate as soon as n ≥ 100² and R ≥ 2. When P_0 is an LGCP with the larger variance σ² = 1, the type I error was always estimated at around 2%, and we actually need n ≥ 300² to recover the correct type I error, meaning that results in this setting would hold only when a very large number of points is observed. In the case of the Strauss process, the correct type I error is always found when n ≥ 50².
When tested against an alternative hypothesis, the test is good at detecting deviations from the Poisson point process, see Table 1. When P_0 ∼ LGCP_2(0.2) or LGCP_2(1), the power is poor. This is due to the large variance of K̂_{e,n}(r) in these cases, leading to a large value of the quantile q_α. Consequently, better results are obtained for small values of R and large windows. In the case of the Strauss process with γ = 0.2, the hypothesis that Ripley's K-function is that of the Strauss process is correctly rejected only for small values of R and n ≥ 50². As we increase γ, the power decreases, as the null model comes closer to the Poisson process. Hence, depending on the alternative hypothesis considered, we suggest using another functional than the supremum in (10).
In the case where the number of points is very large, i.e. more than a few million, it may not be feasible to estimate C_{P_0} and E K̂_{e,n}(r) through simulation of realisations of P_0, as done in our simulation study for the LGCP and Strauss processes. In this case, both C_{P_0} and E K̂_{e,n}(r) need to be known, which, although relying on numerical integration, is the case for LGCPs. Therefore, when the null model is assumed to be a Poisson point process or an LGCP, our goodness-of-fit test only requires the estimation of the quantile q_α, which can easily be obtained through simulation of Gaussian paths. Thus, assuming a known intensity, the goodness-of-fit test proposed in Section 5.1 only takes a few minutes to return an output.
Looking at Table 4, we observe that the realisation of LGCP_2(0.2) is correctly rejected except when W_n = [0, 100]² and R = 3 or R = 5. This may look counterintuitive, as the realisation visually appears more clustered than Poi(1). However, this may be due to the supremum in (10) and the cumulative nature of the K-function, which hide the clustered behaviour visible in Figure 1 (top right panel) when R is too large. Indeed, the test correctly rejects for the small value R = 1, for all sizes of the window W_n.
Similarly, the difference between Poi(1) and Str_0.4(0.2) shows best at small scales, i.e. in the last row of Figure 1. Therefore, R in the test statistic (10) should be chosen small to distinguish the two processes. This is confirmed in Table 4, where the null hypothesis is rejected only when R = 1 and W_n is larger than [0, 50]². Note that the test fails to detect the bottom middle panel in Figure 1 as non-Poisson, although the pattern visually appears too regular for a Poisson process.
As an example of an incorrectly rejected null hypothesis, we see in Table 4 that the Poisson point process realisation from Figure 1 is wrongly rejected when tested on [0, 50]². This may be due to the point configuration being too clustered in this region of space.

The case of conditionally m-dependent processes
In this section, we prove Theorem 4.1 for conditionally m-dependent processes. When no edge corrections are present, the proof follows directly from [4, Thm. 1.13] once we can show the following variance lower bound.

Proposition 6.1. Let P be a conditionally m-dependent point process with fast decay of correlations satisfying Conditions (M) and (R). Then, for any p ∈ N, s_1, ..., s_p ∈ R not all zero, and r_0 ≤ r_1 < ··· < r_p ≤ R,

lim inf_{n→∞} n Var(Σ_{j=1}^p s_j K̂_n(r_j)) > 0.

To obtain the proof in the edge corrected case, we need to consider the deviation

E_{e,n}(r) = (K̂_n(r) − E K̂_n(r)) − (K̂_{e,n}(r) − E K̂_{e,n}(r))    (11)

between the edge corrected and uncorrected centered K-functions. We show an upper bound on the variance of E_{e,n}(r) in the following lemma, together with a variance bound that we are going to need later in the proof of Theorem 4.2. To state these results, we define for an interval I = (r_1, r_2] the increment K̂_{e,n}(I) = K̂_{e,n}(r_2) − K̂_{e,n}(r_1).

Lemma 6.2. There is a constant C > 0 such that for all n > (2R)^d and all r, r_1, r_2 ∈ [r_0, R] with r_1 < r_2,

n Var(K̂_{e,n}(I)) ≤ C(r_2^d − r_1^d),    (12)

n Var(E_{e,n}(r)) ≤ C n^{−1/d}.    (13)
We first show how Theorem 4.1 follows from these two results and then give their proofs.
Proof of Theorem 4.1(m-dependent case). We first consider the uncorrected case and show the joint convergence of (K n (r 1 ), . . . ,K n (r p )). By the Cramér-Wold device, it suffices to show a pointwise central limit theorem for all linear combinations ofK n (r 1 ), . . . ,K n (r p ). This follows directly from [4, Thm. 1.13], where the assumptions are satisfied by Lemma 3.2 and the asymptotic variance lower bound in Proposition 6.1.
Moreover, by Lemma 6.2, Var(√n E_{e,n}(r_j)) = n Var(E_{e,n}(r_j)) ≤ C n^{−1/d} → 0. Therefore, for each j = 1, ..., p, we have that √n(K̂_n(r_j) − E K̂_n(r_j)) − √n(K̂_{e,n}(r_j) − E K̂_{e,n}(r_j)) = √n E_{e,n}(r_j) converges to 0 in probability, so the statement of Theorem 4.1 in the edge corrected case follows from the uncorrected case.
It remains to prove Proposition 6.1 and Lemma 6.2.
Proof of Proposition 6.1. Let p ∈ N, s_1, ..., s_p ∈ R and r_0 ≤ r_1 < ··· < r_p ≤ R be given. For any Borel set A ⊂ R^d, we introduce the notation T_n(A) = Σ_{x ∈ P ∩ A} Σ_{j=1}^p s_j ξ_{r_j}(x, P_n). As a special case, T_n(W_n) = n Σ_{j=1}^p s_j K̂_n(r_j), so what we need to show is

lim inf_{n→∞} n^{−1} Var(T_n(W_n)) > 0.    (14)

For any Borel set A ⊂ R^d, we let Λ_A^P = σ(Λ, P ∩ A) to shorten notation and recall that R̄ = max(m, R). For t > 0, let C_t be the union of cubes of side length t centered at the vertices of the lattice 6R̄Z^d, and let A_t = W_n \ C_t. From the law of total variance, it follows that

Var(T_n(W_n)) ≥ E[Var(T_n(W_n) | Λ_{A_{3R̄}}^P)].

We have that T_n(W_n) = T_n(A_{3R̄}) + T_n(C_{3R̄}), and T_n(A_{3R̄}) depends only on the points of P within distance R ≤ R̄ from A_{3R̄}. The cubes in C_{3R̄} are separated by at least distance 3R̄, and T_n(C_{3R̄}) only depends on the points of P within distance R ≤ R̄ from C_{3R̄}. Thus, combining the conditional m-dependence with (15)-(16), and noting that for all z ∈ Z^d, P ∩ A_{R̄} ⊂ P ∩ (6R̄z + W_{R̄^d})^c, Lemma A.2 yields (17), where the last inequality uses stationarity and the fact that T_n(6R̄z + W_{(3R̄)^d}) only depends on P ∩ (6R̄z + W_{(5R̄)^d}).
We may assume s_p ≠ 0. With I_1 = {0} and I_2 = [|s_p|, ∞), applying Lemma A.1 and then Condition (R) gives a positive lower bound on each conditional variance, uniformly in n. This, together with (17), shows (14), since the number of terms in (17) is of order n.

Proof of Lemma 6.2. For I = (r_1, r_2], let

A_I(x) = B_{r_2}(x) \ B_{r_1}(x)

denote the annulus centered at x and having inner and outer radius r_1 and r_2, respectively. Note that |A_I(x)| = κ_d(r_2^d − r_1^d) and that y ∈ A_I(x) is equivalent to x ∈ A_I(y).
Both nK̂_{e,n}(I) and nE_{e,n}(r) take the form

Σ_{x ∈ P_n} Σ_{y ∈ P_n \ {x}} ρ^{−2} e(x, y) 1{y ∈ A_I(x)},    (19)

where nK̂_{e,n}(I) corresponds to e = e_{i,n}, and nE_{e,n}(r) corresponds to e = 1 − e_{i,n} and I = (0, r]. We may write the variance of (19) as I_1 + I_2 + I_3, where I_1, I_2, and I_3 collect the contributions from pairs of point pairs sharing two, one, and no points, respectively. First consider nK̂_{e,n}(I). Definition 3.1 implies that ρ^{(p)} is bounded on R^{pd}, see also [4, (1.11)]. Moreover, the edge correction factors e_{i,n} are bounded whenever n > (2R)^d. Thus, there exists a constant C_1 > 0 such that I_1 ≤ C_1 n |A_I(o)|. Since r_1, r_2 ∈ [0, R], there exists a constant C_2, depending only on R and d, such that I_1 ≤ C_2 n (r_2^d − r_1^d). Similarly, there exist constants C_4, C_5 > 0 such that I_2 ≤ C_4 n (r_2^d − r_1^d)² ≤ C_5 n (r_2^d − r_1^d). By Definition 3.1, there exist constants C_6, C_7, C_8 > 0 and a function φ(t) ≤ C_6(t^{−(d+1)} ∧ 1) such that I_3 is bounded by

C_7 n ∫ φ(C_8 dist({x, y}, {u, v})) dx dy du dv.
Next consider Var(nE_{e,n}(r)). For e = 1 − e_{i,n}(x, y), i = 4, 5, we note that e is again bounded. Moreover, e(x, y) vanishes outside (W_{n,2r})², where we use the notation W_{n,s} for the set of points of W_n that are within distance s from the boundary of W_n. This implies that the factor n = |W_n| in the variance bound can be replaced by |W_{n,2r}|. Note that |W_{n,2r}| ≤ C r n^{(d−1)/d} for a constant C depending only on d. For e = 1 − e_{2,n}, we note that whenever |x − y| ≤ r, we have W_n ⊖ B_r(o) ⊆ W_n ∩ (W_n + x − y), so for n large enough,

|1 − e_{2,n}(x, y)| = e_{2,n}(x, y) − 1 ≤ n/|W_n ⊖ B_r(o)| − 1.    (21)

Thus, for some C_14 > 0, |1 − e_{2,n}(x, y)| ≤ C_14 n^{−1/d}. Using this and proceeding as in the case of K̂_{e,n}(I) yields (13).
The case e = 1 − e 3,n is very similar. For any rotation η ∈ SO(d), apply the inequality (21) to |W n + η(x − y)|. Averaging over all rotations, the inequality is preserved. Proceeding as before, we obtain (13).

Proof of Theorem 4.1 in the Gibbs case
Proof of Theorem 4.1 (Gibbs case). We first consider the case without edge corrections. The modification for edge corrections follows from Lemma 6.2 as in the case of conditionally m-dependent point processes.
We again use the Cramér-Wold device, that is, we need to show a central limit theorem for all linear combinations of K n (r 1 ), . . . , K n (r p ), p ≥ 1, 0 ≤ r 1 ≤ · · · ≤ r p ≤ R. Such a linear combination corresponds to a score function of the type The last inequality holds because P can be given as a thinning of a Poisson process of intensity τ , which has finite moments.
To show 4., we take s = R d and note that by Lemma A.2, and since ξ has stabilization radius R, Place two small open balls, A 1 and A 2 , inside W R d at least distance r p−1 apart and both contained inside a ball of diameter r p > r p−1 . Define the events Then where I 1 = {0} and I 2 = [|a p |, ∞), and hence by Lemma A.1, Since E is measurable with respect to P ∩ W c R d , we find We now apply [26, Lem. 2.2] to show that We need to check that the latter expectation is strictly positive. Recall that the Gibbs point process has the density (5) with respect to the Poisson process Q on W R d when conditioning on P ∩ W c R d = X 0 . This density is bounded from below on E 2 by some c > 0 uniformly in all values of X 0 ∈ E . This is because the denominator in the density is bounded from above by 1, and the numerator is bounded from below on E 2 ∩ E . Indeed, the maximal number of points in W (3R) d is two, and these points are at distance of at least r 0 from each other and at least R from any other points. Hence Δ Ψ (X , X 0 ) is bounded from above, resulting in a lower bound on the numerator. With this bound on the density, we have on E Positivity of (22) now follows because P(P ∈ E ) ≥ P(Q ∈ E ) since P can be given as a thinning of a Poisson process Q, see Appendix B.

Proof of Theorem 4.2
The proof of Theorem 4.2 is based on Lemma 7.1 below, which provides a bound on the fourth cumulant c 4 . To state Lemma 7.1, let r ∈ [r 0 , R] and I = (r 1 , r 2 ] ⊆ [r 0 , R] and introduce the notation K̄ e,n (r) = K̂ e,n (r) − E K̂ e,n (r) and K̄ e,n (I) = K̄ e,n (r 2 ) − K̄ e,n (r 1 ).
Two intervals are called neighboring if they share exactly one end point. We recall the definition of the fourth joint cumulant of centered random variables X, Y, Z, W : c 4 (X, Y, Z, W ) = E[XY ZW ] − E[XY ]E[ZW ] − E[XZ]E[Y W ] − E[XW ]E[Y Z]. Proof of Theorem 4.2. Let k r denote the limit of √ n K̄ e,n (r) and let I = (r 1 , r 2 ]. Since by Theorem 4.1, √ n K̄ e,n (I) converges in distribution to (k r2 − k r1 ) as n tends to infinity, the Portmanteau theorem and Lemma 6.2 yield According to [14, Lem. 3], convergence in the Skorokhod topology is ensured if we can show three properties. The first is convergence of finite-dimensional distributions, which follows from Theorem 4.1. The second property we need to check is that lim for any δ > 0. This follows from (24) and the Chebyshev inequality. Finally, we need to show that there exist an ε > 0 and a constant C > 0 such that for all neighboring intervals I 1 and I 2 , To see this, note that the definition (23) of c 4 yields Applying Lemmas 6.2 and 7.1 with X = √ n K̄ e,n (I 1 ) and Y = √ n K̄ e,n (I 2 ), we obtain (25) with ε = 1/4.
Hölder continuity of (k r ) r∈[r0,R] follows from the Kolmogorov continuity theorem [16]. Indeed, we know from Theorem 4.1 that for any r 0 ≤ r 1 < r 2 ≤ R, (k r2 − k r1 ) is Gaussian with mean 0. It follows from (24) and Jensen's inequality that for any integer m ≥ 1, Hence (k r ) r∈[r0,R] has a Hölder continuous modification for any exponent γ < 1/2 − 1/(2m). It remains to show Lemma 7.1. The proof is based on decompositions of cumulant measures into mixed-moments and semi-clusters. The necessary background is given in Section 7.1. Two preliminary lemmas are shown in Sections 7.2 and 7.3, respectively, before the proof is presented in Section 7.4.
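To spell out the exponent arithmetic (a standard computation, included here as a sketch): since k r2 − k r1 is centered Gaussian and, by (24), its variance is bounded by a constant times r 2 − r 1 , the Gaussian moment formula gives

```latex
\mathbb{E}\bigl[(k_{r_2}-k_{r_1})^{2m}\bigr]
  \;=\; \frac{(2m)!}{2^{m}\,m!}\,
        \operatorname{Var}(k_{r_2}-k_{r_1})^{m}
  \;\le\; C_m\,(r_2-r_1)^{m}.
```

Kolmogorov's criterion with α = 2m and 1 + β = m then yields a Hölder continuous modification for every exponent γ < β/α = 1/2 − 1/(2m), and letting m → ∞ gives all exponents γ < 1/2.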

Background on decomposition of cumulant measures
In this section, we recall the necessary background on decomposition of cumulant measures into mixed-moments and semi-clusters. We refer the reader to [7] for a detailed presentation.

Moment and cumulant measures
Let P be a simple point process on R d . We define a marked version P = P × {1, 2} on Ȓ d = R d × {1, 2} and let P n = P n × {1, 2}. For any extended score function ξ : Ȓ d × N → R d , we define the random measure on Ȓ d . Then, following [7, Section 3.1], the k-th moment measure M k (μ n ) = M k n is defined as for any non-negative measurable function. Similarly, the k-th cumulant measure c k (μ n ) = c k n is given by We have the following expression for c k n in terms of moment measures
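For reference, the standard expression of cumulant measures in terms of moment measures (in the spirit of [7]; our restatement) is

```latex
c^{k}_{n}
  \;=\; \sum_{\{T_1,\dots,T_p\}}
        (-1)^{p-1}\,(p-1)!\;
        M^{|T_1|}_{n}\otimes\cdots\otimes M^{|T_p|}_{n},
```

where the sum runs over all unordered partitions {T 1 , . . . , T p } of {1, . . . , k} and each factor acts on the coordinates indexed by its block.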

Semi-cluster decomposition
The cumulant measures further decompose into the so-called semi-cluster measures. These are defined for any disjoint non-empty sets S, T ⊂ {1, . . . , k}, and where a S,T,T 1 ,...,T p ∈ R and the sum runs over all partitions of {1, . . . , k}, such that S and T are non-empty subsets of S and T , respectively.

Decomposition with respect to the diagonal
The maximal separation distance of x ∈ (Ȓ d ) k into two disjoint subsets is defined as For S, T a non-trivial partition of {1, . . . , k}, we let On the set σ(S, T ) we can use (30) to decompose dU S,T as where we have written dx S∪T = dx S 1 · · · dx S q dx T 1 · · · dx T r . Note here that there are terms in the definition (30) of dM S∪T n corresponding to partitions that do not split into a partition of S and a partition of T , but these vanish on σ(S, T ).
As in [7, (3.28)], we have the following decomposition for any non-negative measurable function f : f (x)dc k n (x). (33)
Define the extended score function ξ e,n : and let the associated random measure μ n be as in (26). In this section, we establish a bound on the mixed ξ-moments (29) that is used in the proof of Lemma 7.1. Recall the map π : Δ T1,...,Tp → R p that ignores repeated points and let Similarly, for T ⊆ {1, 2, 3, 4}, (W where the mixed ξ-moments are defined via the score function ξ e,n in (34), and Proof. It is enough to show the result in the case without edge corrections since the edge correction factors are bounded and hence there is a C > 0 such that ξ e,n,Ii ≤ Cξ e1,n,Ii , i = 1, 2. On (W Then, by stationarity, the integral in (35) is bounded by The same bound obviously holds for (36) when V = B K (x 0 ). Using the Campbell formula and the definition of ξ Ii , we rewrite the integral in the bound as Multiplying out the sums over y j i , we obtain a sum over y ∈ P |T | , where the points in y are not necessarily all different and could equal some of the points in x. We now apply the Campbell formula backwards. To do so, we must write the above sum as a linear combination of sums over distinct points x 1 , . . . , x p , y j 1 , . . . , y j q . To illustrate how each such term is bounded, we consider the case p = 3, l 1 = l 2 = l 3 = k 1 = 1 and y 1 1 = x 2 , y 1 2 = x 3 and y 2 1 = x 3 , while y 1 3 is different from the points in x. This yields where we used that ρ (4) is bounded in the first inequality. It is a straightforward check that all other integrals can be bounded similarly. Indeed, the y j i 's that are free can always be integrated out. If the obtained bound involves the necessary factors of |I 1 | or |I 2 |, we bound the remaining integral by |B 2K (o) p |. Otherwise, it is convenient to bound one or more of the indicator functions in the integral by 1, leaving at most one involving I 1 and one involving I 2 . As I 1 and I 2 are disjoint, a product of indicator functions for these two sets must involve at least three different points to be non-zero.
Thus, these can be integrated out one after the other to obtain the necessary bound.

Fast decay of mixed ξ-moments
In this section, we show an analogue of the fast decay of correlations for a slightly more general version of the ξ-weighted measures. Throughout the section, we use the notation x = (x 1 , . . . , x k ) = (x 1 , x 2 ), where x 1 = (x 1 , . . . , x p ) and x 2 = (x p+1 , . . . , x k ).
Proof. First we consider a point process having fast decay of correlations and satisfying Condition (M). By the Hölder inequality, where L = L 1 + · · · + L k . By Condition (M) and boundedness of ρ (k) (x), this is bounded, which shows (i).
In [4, Thm. 1.11], (ii) was shown for the case where ξ is a score function that does not depend on n, satisfies (37), and is invariant under translation, i.e. ξ(x, X ) = ξ(x + y, X + y).
Our case differs in several ways. First of all, not all score functions are identical. However, this is not a problem for the proof of [4, Thm. 1.11], since the only properties needed in the proof were a factorization [4, (3.21)] of the form and a common bound of the form (37) for all the involved score functions. The proof yields a bound of the form (ii), where the fast decreasing function ω depends on the score functions involved only via the constant ĉ.
Moreover, the score functions were assumed translation invariant in [4], which is not assumed here. However, the proof of [4, Thm. 1.11] does not use translation invariance, since the factorial moment expansion [3], on which the proof is built, does not require it. As long as the bound (37) holds for all (x, X ), the proof goes through.
Next, we consider the Gibbs case. Assume that P is constructed by thinning a free birth-death process γ(t) of intensity τ as described in Appendix B. Let Q be the Poisson process γ(0) and P = γ where the last inequality follows from (i), which has been proved for point processes satisfying Condition (M) and thus for Poisson point processes. It follows that m L1,...,L k n (x, P n ) must be bounded by C for almost all x, where ξ jlj (y j , X ).
Let E 1 and E 2 be the events that all ancestors of points in The first equality follows from independence between disjoint parts of γ(t). We need to bound the last four terms. Since the argument is the same in all four cases, we just consider E[f (P n )1 E c 1 ]E[g(P n )]. By (i), E[g(P n )] ≤ C|A 2 |. By the Cauchy-Schwarz inequality, The latter bound follows from (41) in Appendix B and because where ỹ = (y 1 , y p+1 , . . . , y p+l ). The last inequality follows from the Hölder inequality and boundedness of moments for the Poisson process. The above shows that Finally, (ii) is shown by the Lebesgue differentiation theorem when A tends to x.

Proof of Lemma 7.1
Proof of Lemma 7.1. We decompose the fourth cumulant into integrals with respect to the fourth cumulant measure. According to (27) We bound each term separately. The strategy is to apply either the moment decomposition (28) and (30) or the semi-cluster decomposition (31)-(32) of the cumulant measure c 4 n to write each integral as a sum of integrals of ξ-weighted moments with respect to the singular differentials. Lemmas 7.2 and 7.3 provide bounds on the ξ-weighted moments, which lead to the bound claimed in Lemma 7.1. The details are given below.
We first consider the diagonal term. In (28) We bound the integral over D K and σ(S, T )\D K separately using the semi-cluster decomposition and moment decomposition, respectively.
We first consider the integral over D K . Here we apply the semi-cluster decomposition of c 4 n given in (31) To bound this, we apply the bound on the mixed ξ-moments in Lemma 7.3 (ii) to obtain a fast decreasing function ω not depending on n, I 1 , and I 2 such that Let m = 12d. By Definition 3.1, there exists a constant c > 0 such that ω(t) ≤ c t −m so that Since we are on σ(S, T ), D(x S∪T ) ≥ dist(x S , x T ). Moreover, noting that 1 ≤ |S|, |T | ≤ 3, all points in x S and x T , respectively, must be within distance 2 dist(x S , x T ) of each other.
Letting δ(x) denote the maximal pairwise distance between points in x, we get We now recall the decomposition (30) We note that on σ(S, T ) only terms with all T j ⊆ S or T j ⊆ T contribute, since the measure dx T j is concentrated on the diagonal of Ȓ |T j | . Moreover, we know from Lemma 7.3 that all m where the maximum is taken over i = 1, . . . , q, j = 1, . . . , s, k = 1, . . . , r, l = 1, . . . , t. The remaining integrals are bounded similarly.
Next, we consider the integral over σ(S, T )\D K . On this set, each of the four points inx ∈ (Ȓ d ) 4 must be within distance 3K from each other. Thus, Using the moment decomposition (28) and (30) and stationarity of P, we obtain a factorization of this integral into factors of the same form as the ones in Lemma 7.2. This yields the bound c 4 n (W 1,2 n ∩ D c K ) ≤ CnK 3d |I 1 ||I 2 | = Cn(|I 1 ||I 2 |) 3/4 .
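The two bounds match under a suitable choice of the cutoff K; a sketch of the arithmetic, assuming the choice K = (|I 1 ||I 2 |) −1/(12d) (our reconstruction, consistent with m = 12d above):

```latex
K = (|I_1|\,|I_2|)^{-1/(12d)}
\;\Longrightarrow\;
K^{3d}\,|I_1|\,|I_2|
  = (|I_1|\,|I_2|)^{-1/4}\,|I_1|\,|I_2|
  = (|I_1|\,|I_2|)^{3/4},
\qquad
K^{-m} = K^{-12d} = |I_1|\,|I_2| .
```

Since |I 1 ||I 2 | ≤ C(|I 1 ||I 2 |) 3/4 for intervals contained in [r 0 , R], both contributions are of order n(|I 1 ||I 2 |) 3/4 , as claimed in Lemma 7.1.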
Remark 7.4. The machinery of decompositions of the cumulant measures into moment and semi-cluster measures used in the proof of Lemma 7.1 is general and applies to any functional of the form Σ x∈Pn ξ n (x, P n ). The only properties used in the proof that are specific to the K-function are Lemma 7.2 and Lemma 7.3. Lemma 7.3 is quite general, requiring only a common bound of the form (37) for all ξ n , while the proof of Lemma 7.2 relies on the specific form of the score function and would need a new proof for a different score function.

Appendix A: Results on conditional variances
This appendix contains two lemmas referred to in the proof of Proposition 6.1.
This is a conditional version of [26, Lemma 2.3]. The second lemma contains some results on conditional variances. The proofs are included for completeness.
We first note that all ancestor clans A(x) are almost surely finite. Here, the idea of [8] was to dominate the clan of ancestors by a Galton-Watson branching process, where each split has a Poisson distributed number of branches with mean λ = τ κ d (r Ψ ) d . By standard branching process theory [11, Chapter 1.6], the branches die out almost surely if λ < 1, which is ensured by (4). This ensures that the thinning procedure above can also be applied to the full free birth-death process γ(t), resulting in the process γ Ψ (t).
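The domination argument is easy to check numerically; a minimal sketch, simulating a subcritical Galton-Watson process with Poisson offspring (the offspring mean 0.5 below stands in for λ = τ κ d (r Ψ ) d < 1; all names are illustrative):

```python
import math
import random

def poisson(mean, rng):
    """Sample from Poisson(mean) by CDF inversion (fine for small mean)."""
    u, p, k = rng.random(), math.exp(-mean), 0
    cdf = p
    while u > cdf:
        k += 1
        p *= mean / k
        cdf += p
    return k

def total_progeny(offspring_mean, rng, cap=100_000):
    """Total size of a Galton-Watson tree with Poisson offspring,
    started from a single ancestor; None if it exceeds `cap`."""
    alive, total = 1, 1
    while alive > 0:
        children = poisson(offspring_mean, rng)  # offspring of one individual
        alive += children - 1
        total += children
        if total > cap:
            return None
    return total

rng = random.Random(0)
lam = 0.5  # plays the role of tau * kappa_d * (r_Psi)^d < 1
sizes = [total_progeny(lam, rng) for _ in range(2000)]
mean_size = sum(sizes) / len(sizes)
# Subcritical (lam < 1): every clan is a.s. finite, with mean total
# progeny 1/(1 - lam); here 1/(1 - 0.5) = 2.
```

Every simulated clan dies out, and the average total progeny is close to the theoretical value 1/(1 − λ) = 2.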
Next, we consider the spatial diameter of B(x) = proj(A(x)) ⊕ B r Ψ (o), where proj(A(x)) is the projection of the ancestor clan to R d , so that B(x) is the projection to R d of all the balls we need to search in order to determine the acceptance probability of x. Let Z n be the number of branches in the nth generation of the dominating branching process. Then This means that for any bounded D, P Wn ∩ D = γ Ψ Wn (0) ∩ D coincides with γ Ψ (0) ∩ D with a probability that tends to 1 exponentially fast as n → ∞. This implies the convergence in the local bounded topology of P Wn (and P Wn ) to γ Ψ (0), which must thus be the infinite volume Gibbs process.