Estimation of the asymptotic variance of univariate and multivariate random fields and statistical inference

Correlated random fields are a common way to model dependence structures in high-dimensional data, especially for data collected in imaging. One important parameter characterizing the degree of dependence is the asymptotic variance, which adds up all autocovariances in the temporal and spatial domain. In particular, it arises in the standardization of test statistics based on partial sums of random fields, and thus the construction of tests requires its estimation. In this paper we propose consistent estimators for this parameter for strictly stationary ϕ-mixing random fields with arbitrary dimension of the domain and taking values in a Euclidean space of arbitrary dimension, thus allowing for multivariate random fields. We establish consistency, provide central limit theorems and show that distributional approximations of related test statistics based on sample autocovariances of random fields can be obtained by the subsampling approach. As in applications the spatial-temporal correlations are often quite local, so that a large number of autocovariances vanish or are negligible, we also investigate a thresholding approach where sample autocovariances of small magnitude are omitted. Extensive simulation studies show that the proposed estimators work well in practice and, when used to standardize image test statistics, can provide highly accurate image testing procedures.


Introduction
Random fields are a natural approach to model high-dimensional data. They arise in a natural way when analyzing digitized image data as arising in medical imaging, e.g. MRI or CT images, or in industrial quality control, e.g. images of materials obtained from cameras capturing the visual or infrared spectrum or electroluminescence images of solar cells, in order to analyze the structure of the material of interest and to detect defects. Although the main results go beyond that scope and even allow us to treat, for example, voxel-by-voxel multivariate brain data collected from a sample of individuals, let us first stick to the following setting, in order to motivate the approach, outline basic ideas and fix notation: Consider a sequence of n_1 images represented by n_2 × n_3 dimensional matrices (Y_{(i_1,i_2,i_3)})_{(i_2,i_3)∈(1:n_2)×(1:n_3)}, i_1 = 1, . . . , n_1, of real-valued random variables, where for any q ∈ N we put a:b = ×_{i=1}^q {a_i, . . . , b_i} for a = (a_1, . . . , a_q), b = (b_1, . . . , b_q) ∈ N^q and 1 = (1, . . . , 1) ∈ N^q. The sequence (1) of matrices can be seen as the corresponding subset {Y_i}_{i∈1:n} of a three-dimensional random field {Y_i : i ∈ Z^3} of random variables defined on a common probability space. A statistical test to check whether or not one observes a reference image m^{(0)} = (m^{(0)}_{(i_2,i_3)})_{(i_2,i_3)∈1:(n_2,n_3)} can be based on the partial sum of the centered observations. In order to test for the presence of a known reference series of images m, one uses the fact that the partial sum scaled by the square root of |n| = ∏_i n_i converges weakly (in distribution) to a Gaussian law, as n → ∞, where from now on n → ∞ is understood as min_j n_j → ∞. Here B(x, y, z), x, y, z ≥ 0, is a standard Brownian motion in dimension 3. For example, (2) has been shown for weakly stationary linear processes, see [19] and [20], strictly stationary ϕ-mixing fields on which we shall focus in the present paper, see [11], or, using other notions of weak dependence, in [7], [3], [27], [14] and [8].

A. Prause and A. Steland
Here a ≤ b for vectors a, b is understood component-wise, a ≤* b means a ≤ b with a ≠ b, ‖·‖ is the Euclidean vector norm and ‖j‖_0 denotes the number of non-zero coordinates of a vector j with components j_i.
In order to use the central limit theorem (2), it is crucial to be in a position to estimate σ² consistently, and the lack of such estimators, contrary to the time series literature, motivated this paper. For example, for the above mentioned image testing problem an asymptotic level α test, α ∈ (0, 1), for H_0 can be devised by rejecting H_0 when the standardized partial sum statistic exceeds the quantile Φ^{-1}(1 − α), where Φ denotes the distribution function of the standard normal law, as long as σ̂²_n is a consistent estimator of σ². In the same vein, one can construct a statistical test for a single image, in order to test for departures from a reference model m^{(0)} for the expectation of the two-dimensional random field {ξ_i : 0 < i ≤ n} representing the image of pixel resolution n_1 × n_2. In our simulations we do not only investigate the proposed estimators in their own right, but also examine the accuracy of such an image test in terms of the type I error rate, see Section 4.3. It turns out that our approach allows for highly accurate test procedures for images as required by present day image analysis.
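To make the testing recipe concrete, here is a minimal sketch in Python (the function names and the two-sided variant are our own illustration, not the authors' code). It assumes a consistent estimate `sigma2_hat` of σ² is already available and standardizes the partial sum of the centered image stack:

```python
import numpy as np
from statistics import NormalDist

def image_test_stat(Y, m0, sigma2_hat):
    """Standardized partial sum: sum of (Y - m0) over all frames and
    pixels, divided by sqrt(|n| * sigma2_hat) with |n| = n1 * n2 * n3."""
    S = (Y - m0[None, :, :]).sum()
    return S / np.sqrt(Y.size * sigma2_hat)

def image_test(Y, m0, sigma2_hat, alpha=0.05):
    """Asymptotic level-alpha (two-sided) test of H0: E(Y) = m0;
    rejects when |T| exceeds the (1 - alpha/2) standard normal quantile."""
    T = image_test_stat(Y, m0, sigma2_hat)
    return abs(T) > NormalDist().inv_cdf(1 - alpha / 2)
```

Any consistent estimator of σ², such as the ones developed below, can be plugged in for `sigma2_hat`.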
The aims and contributions of the present paper are therefore as follows: Going beyond the above scope of sequences of images, we consider general random fields of arbitrary dimension q of the domain and taking values in the p-dimensional Euclidean space R^p. We propose and study a nonparametric estimator of the asymptotic variance, which directly generalizes the class of Bartlett-type long run variance estimators studied in time series analysis. In addition, we provide a central limit theorem for that estimator and establish the consistency of subsampling under weak conditions. A concise model when capturing a sample of images of a constant scenery, e.g. in order to estimate the true underlying object from the noisy images and to estimate the spatial-temporal correlations, is to assume that the data are given by a superposition of a time series and a spatial random field. Then one may center the observed images at their temporal average. We show that the corresponding estimator of σ² is still consistent under fairly weak conditions. Lastly, as spatial (or time) correlations in image data resp. sequences of images are often of a local nature, so that autocovariances corresponding to larger lags are negligible or even vanish, we introduce and study a class of cut-off or thresholding estimators, which resemble to some extent thresholding estimators studied in high-dimensional statistics. Those cut-off estimators aim at reducing the estimation variability by neglecting autocovariances of small order. We present extensive simulation results, in order to shed some light on the accuracy of the proposed estimators, to identify situations where the thresholding estimator improves upon the classical lag-truncation approach, and to investigate how the estimators perform when used in statistical image testing.
Consider a general q-dimensional real-valued random field {ξ_n : n ∈ Z^q} with E(ξ_n) = 0 for all n ∈ Z^q, where, as above, the first dimension typically represents time. Then the asymptotic variance of the random field is defined as σ² = Σ_{j∈Z^q} E(ξ_0 ξ_j), provided the series converges. When the random field attains values in R^p for some p ∈ N, i.e. ξ_n : (Ω, F, P) → (R^p, B^p) for n ∈ Z^q, where (Ω, F, P) denotes the underlying probability space and B^p the usual Borel σ-field on R^p, the asymptotic variance is the matrix σ² = Σ_{j∈Z^q} E(ξ_0 ξ_j'), since now σ²_n = Var(|n|^{-1/2} S_n) = |n|^{-1} E(S_n S_n'). Observe that σ² adds up all (cross-) autocovariances in the temporal and spatial domain. Obviously, for q = 1 we are given a univariate time series and σ² is the well known long run variance, for which an extensive literature on its estimation exists, e.g. smoothing window type estimators, cf. [17], or estimators based on batched means, cf. [1] and [28]. But for a random field of dimension q > 1 there are only a few results which address certain special cases. Indeed, most of the existing results are motivated from an economic point of view and concern the spatial heteroscedasticity and autocorrelation consistent estimation of covariance matrices with applications in two dimensions as in [13], [9], [15] and [16]. [13] obtain consistent estimates of the matrix of cross-sectional correlations by averaging over the time dimension, i.e. they require that the time dimension grows while the size of the cross-sectional dimension stays fixed. They construct an estimator that relies on the standard Newey and West estimator of the time series literature, see [18], and show that it is robust to very general forms of spatial and temporal dependence as the time dimension becomes large. [15] as well as [16] study spatial heteroscedasticity and autocorrelation consistent estimators of covariance matrices of parameter estimators, where the spatial dependence is measured by a so-called economic distance.
If this economic distance d_{ij,n} between two units i and j is small, these units are highly dependent; if it is large, however, the units are nearly independent. Examples of an economic distance include geographic distances as well as transportation costs. Both papers also allow for errors in the measurement of the distance. The main disadvantage is, however, that both papers focus on linear processes with i.i.d. innovations, which include certain non-stationary models, but exclude non-linear models.
A slightly different approach is that of [9], who considers the estimation of the asymptotic variance for strictly stationary and α-mixing random fields in two dimensions. He constructs an estimator as a weighted average of products of the observations and shows its consistency.
However, to the best of our knowledge, no further estimators of the asymptotic variance for arbitrary stationary mixing random fields in q dimensions exist in the literature so far. Thus, in Section 2 we propose an estimator defined as a weighted sum of the sample autocovariances of the random field, show its consistency for mixing random fields and also provide results about the asymptotic distribution. In Section 3 we show how one can improve the estimator and propose a data-adaptive procedure, based on subsampling, to select the remaining unknowns. Section 4 is devoted to an extensive simulation study of both estimators. The simulations study the behavior of the estimators for several models and demonstrate that the thresholding estimator is preferable. We also investigate the statistical properties of the image test and show that the thresholding estimator leads to very accurate image tests. For its application and accuracy in change detection, see [23] and [25]. Rigorous proofs of all results are provided in Section 5.
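Before proceeding, the definition of σ² can be made concrete in the simplest case q = 1, p = 1: for an MA(1) time series the sum of all autocovariances collapses to a closed form. The following sanity check is our own illustration, not taken from the paper:

```python
# For the MA(1) series X_t = eta_t + theta * eta_{t-1} with unit-variance
# innovations: gamma(0) = 1 + theta^2, gamma(1) = gamma(-1) = theta, and
# gamma(j) = 0 for |j| > 1.  Summing all autocovariances yields the long
# run variance sigma^2 = (1 + theta)^2.
def ma1_long_run_variance(theta):
    return (1.0 + theta**2) + 2.0 * theta  # gamma(0) + gamma(1) + gamma(-1)
```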

Nonparametric estimation of the asymptotic variance
In this section, we introduce the proposed nonparametric estimator for the asymptotic variance of a random field. The estimator is based on the formula (4) and belongs to the class of lag-truncation estimators. We provide an interesting extension to the case of multiplicative random fields which are well suited to analyze image samples of a constant scenery.

Estimation for general random fields
Denote by γ(j), j ∈ Z^q, the autocovariances of {ξ_i}. Define the sample autocovariances γ̂_n(j) as the averages of the products ξ_i ξ_{i+j}' over all i with 1 ≤ i and i + j ≤ n, and set σ̂²_n = Σ_{|j|≤m} w_m(j) γ̂_n(j), see (4). Recall that |j| ≤ m is understood component-by-component. The weights w_m(j), for fixed j ∈ Z^q and m = m_n ∈ N^q, arising in (4) are assumed to satisfy the following assumptions.
independently of j and m. The weights w_m(j) can be chosen as the product of one-dimensional weights, i.e. we can put w_m(j) := ∏_{i=1}^q w_{m_i}(j_i), where the w_{m_i}(j_i), 1 ≤ i ≤ q, are weights satisfying (W1) and (W2). Examples of such weights are the Bartlett weights, w_{m_i}(j_i) = 1 − |j_i|/m_i for 1 ≤ i ≤ q, or the Tukey-Hanning weight sequence. For multivariate random fields, i.e. for p > 1, σ² and its estimator are matrix-valued, such that the evaluation of an estimator's accuracy requires the selection of a matrix norm. For simplicity of presentation, in the sequel the matrix maximum norm on R^{p×p}, defined by ‖A‖_∞ = max_{1≤ν,μ≤p} |a_{νμ}|, will be used. One could also employ the frequently used Frobenius norm ‖A‖_F = (Σ_{i,j=1}^p |a_{ij}|²)^{1/2}, which, however, satisfies ‖A‖_∞ ≤ ‖A‖_F ≤ p ‖A‖_∞, such that all results can be easily reformulated in terms of ‖·‖_F instead of ‖·‖_∞, or any other matrix norm, since all norms on R^{p×p} are equivalent.
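The product construction and the two weight sequences can be sketched as follows. A caveat: the Tukey-Hanning display did not survive extraction, so the form (1 + cos(π j/m))/2 below is the common parameterization, supplied as an assumption:

```python
import math

def bartlett(j, m):
    """Bartlett weight: 1 - |j|/m for |j| <= m, and 0 otherwise."""
    return max(0.0, 1.0 - abs(j) / m)

def tukey_hanning(j, m):
    """Tukey-Hanning weight: (1 + cos(pi * j / m)) / 2 for |j| <= m."""
    return 0.5 * (1.0 + math.cos(math.pi * j / m)) if abs(j) <= m else 0.0

def product_weight(j, m, w1d=bartlett):
    """Product weight w_m(j) = prod_i w_{m_i}(j_i) for multi-indices j, m."""
    w = 1.0
    for ji, mi in zip(j, m):
        w *= w1d(ji, mi)
    return w
```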
Our main results require the random field to be ϕ-mixing. For related and other notions of weak dependence for random fields we refer to [12] and [6] and the discussion at the end of this section. Let us briefly recall the definition of ϕ-mixing.
Putting ϕ(0) = 1, we observe that {ϕ(r)} is a non-increasing sequence of real numbers. The random field {ξ_n : n ∈ Z^q} is called ϕ-mixing if ϕ(r) → 0 as r → ∞.
We are now in a position to formulate the following theorem on the consistency of the estimator σ̂²_n under mild regularity conditions for a general multivariate random field.

Theorem 2.1. Suppose that Assumption 1 holds and that the weights w_m(j) fulfill (W1) and (W2). Furthermore, assume that m → ∞, as n → ∞, such that the rate condition (10) holds, where m̄ = max_{1≤i≤q} m_i and n̲ = min_{1≤i≤q} n_i. Then the estimator σ̂²_n is weakly consistent, i.e. ‖σ̂²_n − σ²‖_∞ = o_P(1).
It is interesting to note that the consistency can be strengthened to L²-consistency resp. L¹-consistency, as shown by the following theorem.

Theorem 2.2. Let the conditions of Theorem 2.1 be satisfied. Then σ̂²_n is L^r-consistent for σ², where r = 2 if p = 1 and r = 1 otherwise.
A direct consequence of Theorem 2.1 is the consistency of the autocovariance estimators γ̂_n(j) used to define the estimator of σ², which are, of course, of independent interest when analyzing random field data.
Corollary 2.1. The estimator γ̂_n(j) of the lag-j autocovariance of the random field, for j ∈ Z^q, is L^r- and thus also weakly consistent for the lag-j autocovariance γ(j), where r = 2 for p = 1 and r = 1 otherwise.
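For a univariate field on a two-dimensional domain (p = 1, q = 2), the sample autocovariances and the lag-truncation estimator (4) can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it assumes the field has already been centered to mean zero:

```python
import numpy as np

def sample_autocov(xi, j):
    """Sample autocovariance gamma_hat_n(j) of a mean-zero 2-d field xi:
    sum of xi[i] * xi[i + j] over all i such that both i and i + j lie
    inside the observation window, divided by |n| = n1 * n2."""
    n0, n1 = xi.shape
    j0, j1 = j
    s = xi[max(0, -j0):min(n0, n0 - j0), max(0, -j1):min(n1, n1 - j1)]
    t = xi[max(0, j0):min(n0, n0 + j0), max(0, j1):min(n1, n1 + j1)]
    return float((s * t).sum()) / (n0 * n1)

def sigma2_est(xi, m, weight=lambda j, m: 1.0):
    """Lag-truncation estimator: weighted sum of the sample
    autocovariances over all lags |j| <= m, component-wise."""
    return sum(weight((j0, j1), m) * sample_autocov(xi, (j0, j1))
               for j0 in range(-m[0], m[0] + 1)
               for j1 in range(-m[1], m[1] + 1))
```

With `weight` left at the constant weights one obtains the CW variant used later in the simulations; Bartlett or Tukey-Hanning product weights can be passed in instead.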

Asymptotic distribution theory
Let us now turn to the asymptotic distribution theory, which we study for univariate as well as multivariate random fields. We show that, under weak regularity conditions, the sample autocovariances are asymptotically Gaussian and provide an associated limit theorem for the estimator σ̂²_n. The latter is interesting in its own right, but is also needed to establish the subsampling central limit theorem to be discussed and applied in the next section.
We need the following approximation result, which shows that the sample autocovariances can be scaled with | Γ n (j)| or |n|.
We have the following result providing the weak convergence of the sample autocovariances and the estimator σ̂²_n when appropriately centered and scaled. Let us denote by ⇒ weak convergence in the Euclidean space R^l, l ∈ N.
Recall that a matrix A = (a_{νμ})_{νμ} of dimension p × p defines the quadratic form λ ↦ λ'Aλ, which is positive definite if it is positive for all λ ≠ 0. The limit law is Gaussian with mean zero and covariances given by (11), for |j| ≤ m and |j'| ≤ m, provided (11) induces a positive definite quadratic form.
For a multivariate random field the autocovariances γ(j) are p × p matrices with elements γ(j)_{νμ}, 1 ≤ ν, μ ≤ p. We have the following multivariate extension of Theorem 2.4.

Theorem 2.5. Suppose that Assumption 1 holds true.
Then the limit is Gaussian with mean zero and covariances given by the multivariate analogue of (11). It is worth mentioning that, by Lemma 5.1 and Lemma 5.4 (a), the asymptotic variance of the random field ξ_i ξ_{i+j} − γ(j), i ∈ Z^q, exists for each j ∈ Z^q. Next we aim at studying the centered and scaled sample autocovariances as a process indexed by the lag j, thus extending the above results. Observe that for fixed n the estimator γ̂_n(j) is only well defined for j ≤ n, and it is common to put γ̂_n(j) = 0 if j_i > n_i for some i ∈ {1, . . . , q}. This motivates us to consider the multivariate random field indexed by Z^q and thus taking values in the space S = (R^{p×p})^{Z^q} of mappings Z^q → R^{p×p}. The space S is a separable and complete metric space when equipped with the metric ρ(x, y) = Σ_{j∈Z^q} 2^{−|j|} d_0(x_j, y_j), where d_0(a, b) = min(‖a − b‖, 1). The question arises whether G_n converges weakly in the space (S, ρ), as Theorem 2.5 already provides the convergence of the finite-dimensional distributions. Since weak convergence in (S, ρ) is not elaborated in the literature, let us briefly discuss some details. First note that convergence with respect to ρ is pointwise convergence. For x ∈ S and t_1, . . . , t_k ∈ Z^q, k ∈ N, the projection π_{t_1,...,t_k}(x) = (x_{t_1}, . . . , x_{t_k}) ∈ R^{p×pk} is continuous. The finite-dimensional distributions of a random element X = (X_k : k ∈ Z^q) taking values in (R^{p×p})^{Z^q} are given by the laws of the random matrices (X_{t_1}, . . . , X_{t_n}) of dimension p × (pn), or, equivalently, by the laws of the p²n-dimensional random vectors ((vec X_{t_1})', . . . , (vec X_{t_n})')', for t_1, . . . , t_n ∈ Z^q, where vec A denotes the vector obtained by stacking the columns of a matrix A. The finite-dimensional sets, π^{−1}_{t_1,...,t_k}(H) = {z ∈ S : (z_{t_1}, . . . , z_{t_k}) ∈ H}, H ⊂ R^{p×pk} measurable, are a π-system. For x ∈ S and ε > 0 let A_{x,ε} be the system of sets A whose boundary is contained in B(x, ε), the open ball around x with radius ε. Choose k ∈ Z^q such that Σ_{j∈Z^q} 2^{−|j|} − Σ_{|j|≤k} 2^{−|j|} < ε/2, and consider the uncountably many disjoint boundaries of the sets A_η = {y ∈ S : ρ(x, y) ≤ η}, 0 < η < ε/(2 · 3^q). Since Σ_{j∈Z^q} 2^{−|j|} = 3^q and d_0 is bounded by 1, we have A_η ⊂ B(x, ε) for any such η. This shows that the system of boundaries of the sets in A_{x,ε} contains uncountably many disjoint sets, such that by [5] weak convergence in (S, ρ) coincides with the convergence of the finite-dimensional distributions.
Therefore, Theorem 2.5 implies the following result.

Subsampling
Let us briefly review how subsampling for random fields works. For more details we refer to [22], especially Chapter 5.3 therein. Define E_n = 1:n and suppose that, given the random field {X(t), t ∈ E_n}, we have a consistent estimator θ̂_n = θ̂_n(X(t), t ∈ E_n) of an unknown real-valued parameter θ(P).
If we define J_n(P) as the sampling probability law of τ_n(θ̂_n − θ(P)) under P, where τ_n is a normalizing constant, and write J_n(x, P) for the corresponding sampling probability distribution function, i.e.
J_n(x, P) := P{τ_n(θ̂_n − θ(P)) ≤ x}, x ∈ R, the aim of subsampling is to find an approximation of J_n(x, P) by recomputing the statistic θ̂_n over random fields of smaller size than n and by considering the empirical distribution function of these subsampled values. Thus, for b, h ∈ Z^q we define these smaller random fields by Y_j = {X(t), t ∈ E_{j,b,h}}, where E_{j,b,h} stands for the rectangle containing the points t with h ∘ (j − 1) + 1 ≤ t ≤ h ∘ (j − 1) + b, ∘ denoting the component-wise product. The point b represents the size of the smaller random field defined on E_{j,b,h}, and the point h determines how many of these smaller random fields are taken into account, as Y_j is only defined if E_{j,b,h} ⊆ E_n. This means the vector h defines a grid, and at each point of that grid a rectangle (block) of size b is located. Depending on the choice of h and b, those rectangles may or may not overlap. Now, one can easily see that the number of considered subrandom fields decreases when h increases and increases when h decreases, with a maximum for h = (1, 1, . . . , 1).
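In two dimensions the block extraction can be sketched as follows (grid spacing `h`, block size `b`; an illustration of the scheme above, not the authors' implementation):

```python
import numpy as np

def subsample_blocks(X, b, h):
    """All subrandom fields Y_j for a 2-d field X: blocks of shape b,
    placed on the grid with spacing h, kept only while the block still
    fits inside the observation window E_n."""
    n0, n1 = X.shape
    return [X[i0:i0 + b[0], i1:i1 + b[1]]
            for i0 in range(0, n0 - b[0] + 1, h[0])
            for i1 in range(0, n1 - b[1] + 1, h[1])]
```

With h = (1, 1) the blocks overlap maximally and their number is largest; larger h yields fewer, more widely spaced blocks.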
We further define the subsample value θ̂_{n,b,i} as the statistic θ̂_b evaluated at the smaller random field Y_i, i.e. θ̂_{n,b,i} := θ̂_b(Y_i). The desired subsampling approximation of the sampling probability distribution function J_n(x, P) is then defined as the empirical distribution function of the values τ_b(θ̂_{n,b,i} − θ̂_n). The question arises under which conditions this approximation is consistent. The main assumption for this to hold is that J_n(P) converges weakly to a limit law J(P) with corresponding distribution function J(x, P), as n → ∞; the precise assumptions are as in [22, Theorem 5.3.1] and are listed in the theorem below, which is adapted to our setting. The assumptions mainly control the growth rate of b with respect to n and the mixing coefficients of the underlying random field. A common choice, studied in greater detail in Section 4.2, is b = n^γ for γ ∈ (0, 1).
The following theorem establishes the consistency of subsampling for statistics calculated from the underlying random field or, going beyond that case, statistics which depend on (collections of) the terms arising in the lag-j sample autocovariances. Define Y_i(j) = ξ_i ξ_{i+j} − γ(j), i ∈ Z^q, for any j ∈ Z^q.

Theorem 2.7. Let θ̂_n be a real-valued statistic depending on {Y_i(j) : 1 ≤ i ≤ n, |j| ≤ m}, used for estimating the unknown real-valued parameter θ(P), such that J_n(P) converges weakly to J(P), as n → ∞. Assume that h is a non-zero constant and that conditions (13) and (14) hold, as n → ∞. Then, in continuity points x of J(x, P), the subsampling approximation converges in probability to J(x, P), and the empirical subsampling γ-quantile converges in probability to q(γ) = inf{x : J(x, P) ≥ γ}, as n → ∞.

Extensions to multiplicative models
An important subclass of spatial-temporal random fields arises as the additive superposition of a time series, introducing the serial dependencies, and a spatial univariate random field.
As discussed in the introduction, this is a reasonable model when capturing, say, n_1 images of a fixed experimental situation, in order to estimate the image, its spatial dependence structure and the serial correlations of the data acquisition process. For related detection procedures we refer to [24] and [25], Section 4.2.
Obviously, the above model can also be formulated as a multiplicative model, and thus we assume from now on that ξ_i = ε^{(T)}_{i_1} ε^{(S)}_{(i_2,...,i_q)}, i ∈ Z^q, where ε^{(T)} = {ε^{(T)}_i : i ∈ Z} is a strictly stationary second order ϕ-mixing univariate time series and ε^{(S)} = {ε^{(S)}_j : j ∈ Z^{q−1}} is a strictly stationary spatial random field. We further assume that ε^{(T)} and ε^{(S)} have ϕ-mixing coefficients satisfying (9) for dimension 1 and q − 1, respectively, and are independent of each other.
In the present setting, one may center the ξ_i at their temporal average. Writing i = (i_1, ι) with ι = (i_2, . . . , i_q), we therefore define ξ̌_i = ξ_i − ξ̄_{·,ι}, where ξ̄_{·,ι} := (1/n_1) Σ_{l=1}^{n_1} ξ_{l,ι}, and introduce the estimator σ̌²_n for σ², defined as σ̂²_n with ξ_i replaced by ξ̌_i. We then obtain the following theorem.
Theorem 2.8. Suppose the noise process satisfies, in addition to the assumptions of Theorem 2.1, the multiplicative model (15). Then σ̌²_n is a weakly consistent estimator for σ².
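Centering at the temporal average is a one-liner for an image stack whose first axis indexes time; a sketch (σ̌²_n is then computed from the centered field exactly as σ̂²_n):

```python
import numpy as np

def temporal_center(xi):
    """Subtract, pixel-wise, the temporal average over the first axis:
    xi_check[l, iota] = xi[l, iota] - (1/n1) * sum over l of xi[l, iota]."""
    return xi - xi.mean(axis=0, keepdims=True)
```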

Discussion of the mixing assumptions
The above results assume that the random field is ϕ-mixing. Inspecting the proofs shows that this condition can be replaced by other weak dependence conditions ensuring that (16) holds, as n → ∞, cf. Lemma 5.4. For example, a natural condition is to assume that ξ_k is ρ*-mixing, which implies that the strictly stationary random field Y_k(j) = ξ_k ξ_{k+j} − γ(j) has a continuous spectral density function, cf. Lemma 5.2, where from now on we put n1 = (n, . . . , n) for n ∈ N.
The supremum is taken over all nonempty disjoint sets A_1, A_2 ⊆ Z^q with distance dist(A_1, A_2) ≥ r, and the ρ-mixing coefficient ρ(G, H) for two sub-σ-fields G and H of F is defined as the supremum of |Corr(U, V)| over all G-measurable random variables U and all H-measurable random variables V, both with finite second moments. The distance between two sets A_1, A_2 is dist(A_1, A_2) = inf{‖i − j‖ : i ∈ A_1, j ∈ A_2}, where ‖·‖ is the usual Euclidean vector norm. It is also worth mentioning that [3] establish a strong invariance principle for partial sums of α-mixing random fields taking values in R^p, and (16), if the mixing coefficients, defined for any two disjoint non-empty sets E_1, E_2 ⊆ N^q, can be bounded in terms of the distance. The result of [3] is limited to upper summation limits n satisfying a constraint. That constraint is, however, satisfied for the important special case of proportional sample sizes, i.e. if n_j = c_j n_1, j = 2, . . . , q, holds for constants c_2, . . . , c_q. The strong invariance principle of [3] and (16) have also been established by [8] for random fields which are weakly dependent in the sense that the covariance between f(ξ_i : i ∈ I) and g(ξ_i : i ∈ J), for any pair of disjoint finite sets I, J with sup-distance r, see (46) for a definition of the latter distance, and any pair of bounded Lipschitz functions f and g, is of the order θ_r, for a sequence θ_r decaying exponentially fast.

Threshold cut-off estimation of the asymptotic variance
As the simulation studies in Section 4.1 will show, the RMSE of the variance estimator σ̂²_n increases for larger values of m. Moreover, the smallest achievable RMSE is in most situations still quite high. This problem often arises when the spatial autocovariance matrix is sparse, i.e. the number of non-vanishing entries is much smaller than the number of null entries, or when it is close to sparse. The latter typically occurs when the autocovariances decrease quickly. Estimating such null entries increases the variance but does not reduce the bias. By thresholding sample autocovariances which are small in magnitude, we may reduce the variance.
We allow for parameterized thresholding functions and propose to determine the remaining unknown parameters by a subsampling procedure, which implicitly determines an approximating m-dependent random field. For that purpose, we adapt a known result about the consistency of subsampling for random fields, see [22], to the setting of this paper.

Cut-off estimation
These observations motivate a thresholding estimator for σ², where the idea is to multiply γ̂_n(j) by some weight function g. This leads to the following estimator σ̂²_{n,th}, defined as

σ̂²_{n,th} = Σ_{|j|≤m} w_m(j) g(γ̂_n(j), c_n(j)) γ̂_n(j),   (17)

where g is a function depending on γ̂_n(j) and some sequence c_n(j) and which satisfies the following assumption.
The next theorem states sufficient conditions for σ̂²_{n,th} to be weakly consistent.

Theorem 3.1. Under the conditions of Theorem 2.1 and Assumption 2, the thresholding estimator σ̂²_{n,th} defined as in (17) is a weakly consistent estimator for σ².
For the special choice g(x, c) = 1(|x| > c) we obtain the so-called cut-off estimator

σ̂²_{n,c} = Σ_{|j|≤m} w_m(j) γ̂_n(j) 1(|γ̂_n(j)| > c_n(j)),   (18)

which belongs to the subclass of hard-thresholding estimators, where only autocovariances are taken into account whose absolute value exceeds the threshold c_n(j). Theorem 3.1 then simplifies to the following corollary.

Corollary 3.1. Under the conditions of Theorem 3.1, and if for each j ∈ Z^q the thresholds satisfy c_n(j) → c(j) with c(j) < |γ(j)| whenever γ(j) ≠ 0, the cut-off estimator defined in (18) is a weakly consistent estimator for σ².
The extension to multivariate random fields is as follows. Define the matrix-valued threshold estimator

σ̂²_{n,th} = Σ_{|j|≤m} w_m(j) g(γ̂_n(j), c_n(j)) ∘ γ̂_n(j),   (19)

where for a p × p matrix A = (a_{νμ})_{1≤ν,μ≤p} and a real number c ≥ 0 we extend the function g to the domain R^{p×p} × [0, ∞) by setting g(A, c) := (g(a_{νμ}, c))_{1≤ν,μ≤p}. In (19) the symbol ∘ denotes the Hadamard product, i.e. the element-wise multiplication of two matrices of the same dimension. This means we threshold all elements of the variance-covariance matrix using the same threshold c_n(j), depending on the lag j. Under the conditions of Theorem 3.1, the estimator defined in (19) is a weakly consistent estimator for σ², i.e. ‖σ̂²_{n,th} − σ²‖_∞ = o_P(1).
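Given the sample autocovariances as a mapping from lag to value, the hard-thresholding step is straightforward; a univariate sketch with hypothetical names, where the threshold rule `c_n` is passed in as a function of the lag:

```python
def cutoff_sigma2(gamma_hat, c_n, w=lambda j: 1.0):
    """Cut-off estimator: keep the lag-j sample autocovariance only if
    its magnitude exceeds the threshold c_n(j).  gamma_hat maps lags
    (tuples) to sample autocovariances; w is an optional weight w_m(j)."""
    return sum(w(j) * g for j, g in gamma_hat.items() if abs(g) > c_n(j))
```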
Motivated by our empirical studies presented in the next section, we propose to use threshold rules of the form (20), with α ≥ 0 and δ = 0.0001. For this rule we have c_n(j) → −δ =: c(j) for all j ∈ Z^q as n → ∞, such that the condition c(j) < |γ(j)| of Corollary 3.1 is fulfilled in any case. Let us consider a special choice of the lag truncation constants m for the case q = 2 corresponding to image data, in order to clarify, for a valid choice of the lag truncation constants m, possible values for α and the relationship of the lag-dependent truncation rule c_n(j) to a constant-in-lag truncation rule typically used in time series analysis. Hence, let us assume that n_2 = f n_1 for some known f > 0, the aspect ratio of the image, and let m_i = ⌊c_i n_1^γ⌋, i = 1, 2, for constants c_1, c_2 and some 0 < γ < 1/3, such that the sufficient condition (10) of Theorem 2.1 is satisfied. Clearly, for larger γ a larger number of spatial autocovariances is taken into account. If α = 2/γ, then at the boundary (j = m) a threshold that is constant in the sample size applies, whereas for α < 2/γ the boundary threshold converges to −δ, as n_1 → ∞. For γ slightly smaller than 1/3, in order to take into account a large number of autocovariances, the parameter α can be selected slightly larger than 6. This is in nice agreement with our empirical findings concerning the choice of α when aiming at precise estimation of the asymptotic variance σ². However, when the estimated asymptotic variance is used to standardize an image test statistic, our simulations indicate that smaller values of α (around 3.6) are preferable, which are admissible for all admissible values of γ.

Data-adaptive estimation of lag truncation constants and optimal thresholds
The proposed estimators require the selection of the lag truncation constants m and, for the cut-off threshold estimator, the thresholds c_n(j). Let us assume that the thresholds are given by a parametric family of functions with parameter α, i.e. c_n(j) = c_n(j; α), e.g. as in (20). To determine m and α we propose to proceed as follows: To determine the optimal value m_opt for the discrete-valued parameter m from the set {k1 : k ∈ N}, we apply a sequential testing procedure which analyzes the sample autocovariances on the spheres with respect to the maximum norm and stops when they are no longer statistically different from zero. The statistical tests are based on subsampling in order to determine critical values. This procedure has the nice property that it is consistent for m-dependent random fields. For other fields, applying this procedure may serve as a guide, in order to approximate the random field under investigation by an m-dependent one, e.g. a spatial moving average model. A reasonable measure to evaluate σ̂²_{n,c} is, of course, the root mean squared error (RMSE), which is a function of m and α. We propose to plug in m_opt and then to determine α by minimizing an estimate of the RMSE, obtained by a subsampling procedure, too.
The subsampling-based proposal for dealing with the estimators (4) and (18) is as follows. In accordance with the notation introduced in Section 2.3, Y_j now stands for the subrandom field {ξ_t, t ∈ E_{j,b,h}}, and we write σ̂²_{n,b,i} for the subsample value that equals the statistic σ̂²_b with constant weights equal to one, evaluated at the random field Y_i. We choose constant weights here as these do not depend on m (and thus not on n), which avoids additional difficulties when working with the smaller random fields of size b.
In order to find a subsampling approximation for the RMSE we need to find a reasonable centering term. A natural choice is σ̂²_n itself. However, as the summation in (4) still depends on m, we need to choose m first. For random fields with γ(j) ≥ 0 we propose the following sequential testing procedure. For each subrandom field of size b we start by considering the test statistic R(m), which estimates r(m) = Σ_{j : |j_k| = m_k for some k} γ(j). We reject H_0 : r(m) = 0 if the 90% confidence interval for r(m) obtained by subsampling does not contain 0. We propose to determine m_opt by the following sequential testing procedure: Starting with m = 1 and proceeding with k1, k = 1, 2, . . . , we apply the above test until it fails to reject the associated null hypothesis H_0 : r(m') = 0 for the first time. Then we put m_opt = m' − 1.
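The stopping rule can be sketched abstractly, with the subsampling test wrapped into a predicate (a hypothetical interface of our own; `rejects_h0(k)` would return True when the autocovariances on the sphere of radius k are found statistically different from zero):

```python
def select_m_opt(rejects_h0, k_max=50):
    """Sequential rule: for k = 1, 2, ..., test H0: r(k*1) = 0 via the
    subsampling confidence interval and stop at the first non-rejection;
    the selected truncation constant is then k - 1 (per component)."""
    for k in range(1, k_max + 1):
        if not rejects_h0(k):
            return k - 1
    return k_max
```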
The consistency of the proposed subsampling testing procedure follows from Theorem 2.7 and Theorem 2.4, since the latter ensures that the appropriately scaled root R(m) − r(m) converges weakly to a Gaussian random vector, provided (13) and (14) hold. If q(γ) denotes the γ-quantile of K(·, P), the law of that limit, and q_{n,b}(γ) the γ-quantile of the empirical subsampling distribution for the statistic R(m), then the asymptotic coverage of the resulting confidence interval for r(m) equals the nominal level.
Once we have determined m_opt, we can evaluate σ̂²_n for m_opt and use σ̂²_{n,m_opt} as the centering term for the subsampling approximation of the RMSE, given a block size b, whose selection is discussed and studied in Section 4.2. This approach finally leads to approximating the RMSE by the root of the average of the squared deviations of the subsampling replicates σ̂²_{n,b,i}, i ∈ 1:N, from σ̂²_{n,m_opt}, where N = (N_1, . . . , N_q). That approximation is then minimized to determine the tuning parameter α.
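Schematically, the RMSE approximation and the minimization over α read as follows (our own sketch; `replicates_for(alpha)` is a hypothetical callback that recomputes the subsample values of the cut-off estimator for that α):

```python
import math

def rmse_approx(center, replicates):
    """Root of the average squared deviation of the subsampling
    replicates from the centering term sigma2_hat at m_opt."""
    return math.sqrt(sum((r - center) ** 2 for r in replicates) / len(replicates))

def select_alpha(alphas, replicates_for, center):
    """Tuning parameter alpha minimizing the approximated RMSE."""
    return min(alphas, key=lambda a: rmse_approx(center, replicates_for(a)))
```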

Simulation studies
In this section we present simulations for the proposed methods. Sections 4.1 and 4.2 provide a thorough analysis of the variance estimators of Sections 2 and 3, where for the latter we focus our attention on the cut-off estimator. The results show that thresholding can improve in terms of the RMSE and, especially, is more robust with respect to the choice of the lag truncation constants m. Then we investigate the proposed data-adaptive procedure to estimate the remaining unknown tuning parameters of these estimators via subsampling. Section 4.3 provides results about the accuracy of significance tests for image data based on partial sums standardized with the proposed cut-off estimator. The simulations show that highly accurate testing is achieved by our proposal, even in the presence of substantial image correlations.

Simulation results for the variance estimators
Having in mind applications in imaging, where spatial dependencies are quite often local, we investigate the statistical behavior of the variance estimators σ²_n and σ²_{n,c} for selected stationary two- and three-dimensional random fields, including an autoregressive model to take into account serial correlations present in sequences of images.
Further, we investigate both estimators for different weighting functions and different values of m and n and analyse the resulting behaviour of the root mean square error and the bias. Finally, we will also analyse the second estimator dependent on the sequence c n (j) and make a concrete proposal for its selection.

Spatial moving average models of order one in two dimensions
The first model for the random field for which we want to analyse the behaviour of the estimators is a spatial moving average model of order one in two dimensions. For that purpose, take i.i.d. innovations η_{i,j} ∼ N(0, 1) for all i and j and define ε_{i,j} as the weighted sum, with real weights a_k, k = 1, …, 9, of the innovations in the 3 × 3 neighbourhood of (i, j); we refer to this model as (M1). We first take a_5 = 1 and a_k = 0.3 for k ≠ 5. We can now calculate the theoretical asymptotic variance as follows. If we write a = (a_1, …, a_9) for the column vector of the weights a_k, k = 1, …, 9, and η_{i,j} for the column vector collecting the innovation η_{i,j} and all the innovations located around it, we can rewrite (M1) as a scalar product of the vectors a and η_{i,j}, i.e. as ε_{i,j} = a^⊤ η_{i,j}. Thus, we obtain σ² = Σ_{(h_1,h_2)∈Z²} γ(h_1, h_2) = (Σ_{k=1}^{9} a_k)², as only 9² = 81 summands in the summation over (h_1, h_2) ∈ Z² are non-zero, namely one for each combination of a_i and a_j, i, j = 1, …, 9. For our concrete vector a one can thus calculate that the theoretical variance is σ² = 3.4² = 11.56.
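As a quick sanity check of the value σ² = 11.56: the sum of all 81 non-zero autocovariances equals the squared sum of the nine MA coefficients. A minimal sketch:

```python
# Sketch: the asymptotic variance of the MA field (M1) with i.i.d. N(0,1)
# innovations is the sum of all 81 products a_i * a_j, i.e. the squared
# sum of the nine coefficients.
a = [0.3] * 9
a[4] = 1.0                     # a_5 = 1 (center weight), a_k = 0.3 otherwise

sigma2 = sum(ai * aj for ai in a for aj in a)   # = (sum(a))**2
print(round(sigma2, 2))                         # -> 11.56
```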
Throughout this study we choose the weights w_m(j), needed for the calculation of σ²_n in (4), as the product of one-dimensional weights, where we take the w_{m_i}(j_i), 1 ≤ i ≤ 2, either as constant weights (CW) with w_{m_i}(j_i) = 1, 1 ≤ i ≤ 2, or as the Quadratic Spectral weight sequence (QS), i.e. w_{m_i}(j_i) = 25/(12π²x²) ( sin(6πx/5)/(6πx/5) − cos(6πx/5) ) with x = j_i/b_w, where b_w is a bandwidth parameter that still has to be chosen. For dimension one, [2] showed that the QS kernel with a properly chosen bandwidth b_w is the optimal choice with respect to the asymptotic MSE, see Theorem 2 in [2]. In his simulation studies, however, the author also showed that in many situations the constant weights lead to more efficient estimates than the quadratic spectral weights. The disadvantage of the constant weights is that, in contrast to the QS weights, they do not necessarily generate nonnegative variance estimates. Now, we first consider a random field of size (30, 40) and investigate our estimator in (4) for m ranging over a set M_2 of lag pairs. Note that M_2 mainly consists of pairs with equal components. These pairs are quite natural choices for m, as (M1) is a symmetric model where the influence of the innovations around η_{i,j} is the same in both directions. Nevertheless, we also consider some pairs with differing components, including the cases (10, 13) and (15, 20), which correspond to a third and a half of the random field of size (30, 40), respectively.
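The product weights can be sketched as follows, assuming the one-dimensional QS kernel of [2] evaluated at x = j_i/b_w (the function names are ours):

```python
import math

# Sketch of the Quadratic Spectral (QS) weights; the two-dimensional
# weight is taken as the product of one-dimensional ones.
def qs(x):
    """One-dimensional QS kernel; k(0) = 1 by continuity."""
    if x == 0.0:
        return 1.0
    y = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi**2 * x**2) * (math.sin(y) / y - math.cos(y))

def w(j, b_w):
    """Product weight w_m(j) = prod_i k_QS(j_i / b_w)."""
    return math.prod(qs(ji / b_w) for ji in j)

print(round(qs(0.0), 4), round(w((0, 0), 6.4), 4))  # -> 1.0 1.0
```

Constant weights (CW) correspond to replacing `qs` by the function that is identically one on the truncation region.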
Table 1 shows the values of the estimator and its RMSE for selected choices of m ∈ M_2 for the constant kernel based on 10000 repetitions. It shows that the RMSE, seen as a function of m, is convex with a minimum at m = (2, 2), which corresponds to the largest lag for which the theoretical autocovariance in (M1) is non-zero. We also see that for values of m larger than (2, 2) the RMSE increases quite quickly while the bias does not vary much. This implies that the variance increases.
Next, we also examine our estimator for the QS kernel, for which we still have to choose the bandwidth parameter b_w; we take it equal in both dimensions. We want to choose this parameter in such a way that the RMSE becomes as small as possible. Figure 1 depicts, for different values of b_w, the optimal RMSE with respect to m (i.e. when m ranges over the values of M_2) and the corresponding bias.
To abbreviate the notation we write m + b_w for (m_1, m_2) + (b_w, b_w). We see that the smallest RMSE is achieved for b_w = 6.4, which corresponds to m = (2, 2). Thus, we will fix the bandwidth at this value. Furthermore, we can also see that for b_w ≥ 4 the RMSE stays nearly fixed, so that choices of b_w near the optimal one would lead to nearly as good estimates as the optimal bandwidth b_w = 6.4. The jags in the bias for small choices of the bandwidth parameter arise from the fact that we allow m to range over all the values of M_2. A jag occurs when the value of m for which the smallest RMSE is attained changes. For example, for b_w = 0 the optimal m is (4, 5), whereas for b_w ≥ 2.3 it is (2, 2), which is again the largest lag for which the theoretical autocovariance in (M1) is non-zero. For 0 < b_w ≤ 2.3 the optimal value of m varies. Table 2 shows the values of the estimator and its RMSE in the above scenario of (M1) for selected choices of m for the QS kernel with b_w = 6.4 for 10000 repetitions.
The smallest RMSE is now 1.7793 and is again achieved for m = (2, 2). Similarly as in Table 1, the RMSE increases for values of m larger than (2, 2), but not as rapidly as in Table 1. Thus, we see that in this scenario the QS kernel with a reasonably chosen bandwidth is nearly 4% more efficient than the constant kernel, which is in accordance with the simulation results in [2].
As this simulation study shows, the smallest RMSE is still quite high and, what is even more problematic, it depends quite heavily on the proper choice of m. As already explained in the motivation for the improved variance estimator, this problem often appears when a lot of 'zeros' have to be estimated. This is the case here, as the dependencies of the random field are only weak: the autocovariances vanish for |j| > 2. The estimation of many such null entries considerably worsens the RMSE. Thus we shall now investigate the improved variance estimator σ²_{n,c} in (18), which is designed to circumvent this issue. For that we need to choose an appropriate cutting rule c_n(j).
We focus on the rule c n (j) introduced in the previous section, which is parameterized by α > 0. The parameter α is a further tuning parameter that we want to choose in such a way that the RMSE of σ 2 n,c attains the smallest possible value when m ranges again over the values of M 2 .
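A minimal two-dimensional sketch of the estimators (4) and (18) with constant weights; the constant threshold `c` below is only a stand-in for the paper's lag-dependent rule c_n(j):

```python
# Sketch of the weighted variance estimator (4) and its thresholded
# version (18) for a 2-D field, with constant weights.  Terms with
# |gamma_hat(j)| <= c are dropped; c = 0 recovers estimator (4).
def gamma_hat(y, j1, j2):
    """Sample autocovariance at lag (j1, j2) of a 2-D field given as a
    list of lists (mean subtracted, normalized by the field size)."""
    n1, n2 = len(y), len(y[0])
    mu = sum(map(sum, y)) / (n1 * n2)
    s = 0.0
    for i1 in range(max(0, -j1), min(n1, n1 - j1)):
        for i2 in range(max(0, -j2), min(n2, n2 - j2)):
            s += (y[i1][i2] - mu) * (y[i1 + j1][i2 + j2] - mu)
    return s / (n1 * n2)

def var_est(y, m, c=0.0):
    """Sum of sample autocovariances over |j_k| <= m_k, thresholded at c."""
    total = 0.0
    for j1 in range(-m[0], m[0] + 1):
        for j2 in range(-m[1], m[1] + 1):
            g = gamma_hat(y, j1, j2)
            if abs(g) > c:          # thresholding step of (18)
                total += g
    return total

y = [[1.0, 2.0], [3.0, 4.0]]
print(var_est(y, (0, 0)))  # -> 1.25 (the sample variance at lag (0, 0))
```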
In the following, we investigate the same situation as before, the spatial moving average model (M1) with weights a_5 = 1 and a_i = 0.3 for i ≠ 5. Figure 2 shows for both weighting schemes the curves of the optimal RMSE and the corresponding bias as a function of α.
We can see that the curves of the RMSE and the bias are very similar to each other. If α is too small, the thresholds c_n(j) do not lead to an improvement of the optimal RMSE. However, for α ∈ [5.0, 6.2] with constant weights, and for α ∈ [5.6, 6.0] with QS weights, one obtains slight improvements of the optimized RMSE. Here the optimal choice is α = 5.8 for both kernels. For α > 6.2 and α > 6.0, respectively, the optimized RMSE becomes worse than without cutting. The jags in the bias arise again from the fact that we do not fix m but allow it to vary over M_2. A jag occurs when the value of m for which the optimal RMSE is attained changes.
The improvement of the estimator σ²_{n,c} upon σ²_n can, however, be seen more easily by inspecting tables similar to Tables 1 and 2, but with σ²_n replaced by σ²_{n,c} and a cutting rule of the form (20) with α = 5.8. Tables 3 and 4 show that the RMSE for both kernels levels off at approximately 1.76, which is in both cases an improvement over the RMSE of the estimator without cutting rule, though this improvement is larger for the estimator with constant weights. This shows that the proper choice of m is now less important than before, since each choice of m greater than or equal to the largest lag j for which the theoretical autocovariances are non-zero leads to nearly the same RMSE. If we had chosen an α different from the optimal one, but close to it, we would not have improved the optimal RMSE, but the stabilizing effect for m larger than (2, 2) would have been the same.
Table 4 Simulation results for (M1) for different values of m with QS kernel, α = 5.8.
We can observe a further interesting fact by inspecting Table 5. We have already obtained the results of the last two columns of Table 5 before; new are the results of the first column. If we take b_w = 0, the optimal lag is m = (4, 5). If we now optimize the cutting rule (20) for this situation as above, we obtain again α = 5.8 as the optimal value for α. This leads to a smallest possible RMSE of 1.7593 for m = (28, 28). Thus we see that the estimator σ²_{n,c} with α = 5.8 and a QS kernel with bandwidth b_w = 0 leads to a smaller RMSE than the estimator σ²_n with a QS kernel and bandwidth b_w = 6.4 or than the estimator σ²_n with constant weights, and to an equally good RMSE if we further use the cutting rule. Similar observations also hold true for larger sample sizes.
Finally, we also want to investigate the behaviour of both estimators, for both kernels, for different sample sizes when we choose different weights a_k. So, in the following let a_5 = 1 and a_k = a for k ≠ 5. We again fix α = 5.8 and b_w = 6.4. This time we do not only calculate the optimal RMSE (denoted by RMSE_opt), the corresponding bias and the value of m for which RMSE_opt is attained, but also the quotient RMSE_29/RMSE_opt, where RMSE_29 is the RMSE of the estimator for m = (29, 29). This gives an impression of how much the RMSE can vary when m is chosen too large in the case that no cutting rule is used. Table 6 shows the corresponding simulation results.
The table shows that both kernels compete reasonably well with each other. The estimator σ²_n with a QS kernel, for example, ranges from being 6% less efficient to 4% more efficient than the estimator σ²_n with a constant kernel in the case of a random field of size (30, 40) with different weights a. Similar observations hold true for the other cases. In some scenarios the optimized RMSE of σ²_{n,c} is worse than the one of σ²_n, as we do not vary the value of α but keep it fixed at α = 5.8, which was the value of α adapted to the situation n = (30, 40) with a_k = 0.3 for k ≠ 5. However, the main advantage of the estimator σ²_{n,c} is again that it stabilizes the RMSE for values of m larger than (2, 2). This can be seen from the fact that for σ²_{n,c} the quotient RMSE_29/RMSE_opt is close to one in all cases. For example, for n = (30, 40), a_k = 0.7 for k ≠ 5, and the constant kernel, the optimized RMSE worsens from 6.7235 for σ²_n to 6.8246 for σ²_{n,c}, while the quotient RMSE_29/RMSE_opt improves from 33.6324 to 1.1006. So regarding the RMSE we see that for σ²_{n,c} it does not make a big difference whether we choose m as (2, 2), as (29, 29) or as some value in between, whereas for σ²_n it does. The same holds in the case of a QS kernel, with the only difference that for σ²_n the quotient RMSE_29/RMSE_opt equals 14.7478, which is a lot smaller than 33.6324 in the case of a constant kernel. This observation also holds true for all other scenarios, i.e. for σ²_n this quotient is larger if we use the constant kernel instead of the QS kernel. Naturally, we could further improve the simulation results of the estimators with QS weights by choosing the bandwidth b_w differently in each scenario. Similarly, we could improve the simulation results of σ²_{n,c} by choosing α differently in each scenario.

Spatial moving average models of order three in two dimensions
We also investigate the behaviour of the estimators for stronger dependence structures, such as a spatial moving average model of order three. The model used here, (M2), is similar to (M1), but now with weights 1, a_1, a_2 and a_3 for the center pixel and the pixels in the first, second and third ring around it, instead of only giving weights to the center pixel and the pixels in the first ring around it. If H_{i,j} is the (7 × 7)-matrix with the column-by-column entries η_{i−3,j−3}, …, η_{i,j}, …, η_{i+3,j+3}, and we write vec(·) for the vectorization of a matrix, which transforms a matrix into a vector by writing the columns on top of one another, we can formulate model (M2) as ε_{i,j} = a^⊤ vec(H_{i,j}) with the corresponding weight vector a. Our findings are in agreement with model (M1), in particular the threshold estimator again stabilizes the RMSE for large values of m, and are therefore deferred to the supplement.

Autoregressive spatial moving average mixture models in three dimensions
The last model that we consider is a mixture of an autoregressive model of order one in the time domain and a moving average model in the spatial domain. More specifically, for ρ ∈ (−1, 1) we combine an AR(1) recursion in the time direction, driven by innovations u_{t,i,j} which are i.i.d. ∼ N(0, 1) for all t, i, j ∈ Z, with an additive spatial component v_{t,i,j}; the v_{t,i,j} follow for each fixed t model (M1) and are uncorrelated for different values of t. Moreover, we suppose that u_{t_1,i_1,j_1} and v_{t_2,i_2,j_2} are uncorrelated for all t_1, t_2, i_1, i_2, j_1, j_2 ∈ Z. We refer to this model as (M4).
We now calculate the asymptotic variance as follows. Note that by the one-dimensional theory for AR(1) models from the time series literature, the autoregressive part of X_{t,i,j} has a representation as a linear process in the innovations u. The sum over all autocovariances then decomposes into four terms E_1, …, E_4, where the cross terms E_3 and E_4 equal zero, as u_{t_1,i_1,j_1} and v_{t_2,i_2,j_2} are uncorrelated by assumption. E_1 is non-zero only if h_i = h_j = 0, in which case its sum over the time lags equals 1/(1 − ρ)². E_2, on the contrary, is non-zero only if h_t = 0 and then equals the autocovariances of model (M1). We thus obtain σ² = 1/(1 − ρ)² + σ²_{(M1)}, which corresponds to the sum of the variance caused by the AR(1) model in the time domain and the asymptotic variance of (M1). In our first simulation setting for (M4) we take ρ = 0.2, a_5 = 1 and a_i = 0.3 for i ≠ 5, and consequently have σ² = 1.5625 + 11.56 = 13.1225. Now, we first consider a random field of size (20, 30, 40) and let m range over a set M_3 of lag triples. Note that this time we include the triple (1, 2, 2), as this is the largest lag with a significant contribution to the asymptotic variance of model (M4). Table 7 shows the corresponding simulation results for the constant kernel using 10000 repetitions. We can see that the smallest RMSE is attained for m = (1, 2, 2), which corresponds in the spatial domain to the largest lag with non-zero autocovariances and in the time domain to the largest lag with a significant contribution to the asymptotic variance, as the autoregressive model is of order one. Greater values of m lead to a rapid increase of the RMSE while the bias stays stable.
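The variance decomposition for (M4) can be checked numerically; a minimal sketch for ρ = 0.2 and the (M1) weights used above:

```python
# Sketch of the variance decomposition for (M4): the AR(1) part in time
# contributes 1/(1 - rho)^2 and the spatial part contributes the (M1)
# asymptotic variance, i.e. the squared sum of the MA coefficients.
rho = 0.2
sigma2_ar = 1.0 / (1.0 - rho) ** 2      # = 1.5625
sigma2_ma = (1.0 + 8 * 0.3) ** 2        # = 11.56, as for (M1)
print(round(sigma2_ar + sigma2_ma, 4))  # -> 13.1225
```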
In the case of QS weights we first have to determine the bandwidth parameter b_w. This time we allowed b_w to vary from 0 to 40, but for these values no minimum is attained (except on the boundary for b_w = 40); instead, the RMSE decreases slowly as b_w increases. As, however, for b_w ≥ 20 the RMSE decreases so slowly that it is visually constant, we take b_w = 20 from now on. Note that in the previous three simulation models a minimal RMSE with respect to b_w was always attained, but in all these cases the RMSE was also nearly constant around the minimum. Table 8 shows the corresponding simulation results.
We see that also for the QS kernel the smallest RMSE is achieved for m = (1, 2, 2); it is now slightly larger than for the constant kernel. However, if we chose a larger value for b_w, we could probably obtain a smaller RMSE, as in the previous simulation settings.
Next, we want to investigate how the improved variance estimator σ²_{n,c} with the cutting rule, with α ≥ 0 and δ = 0.0001, behaves in this model. If we again optimize the RMSE with respect to α, we obtain for the constant as well as for the QS weights the optimal value α = 9.4. Due to very high computational costs we allowed m only to vary over values of M_3 smaller than m = (7, 7, 7), which does not affect the results when using constant weights, but does not lead to the smallest possible RMSE when using QS weights. The corresponding results are gathered in Tables 9 and 10. We see that for both weighting schemes we again have the stabilization property of the RMSE for values of m larger than m = (2, 2, 2). For the constant weights the RMSE levels off at approximately 0.71, and for the QS weights between 0.73 and 0.76. What is even more appealing now is that we can actually improve the optimized RMSE, by almost 15% in the case of constant weights and by nearly 14% in the case of QS weights. The reason for this is that in this model the autocovariances for lags larger than (1, 2, 2) are close to zero, but not identically zero as in the previous simulation settings.
Finally, we want to investigate whether we still obtain an improvement of the optimized RMSE if we take the cutting rule with the optimal α = 9.4 in cases where we change the autoregressive parameter ρ in model (M4). All other parameters are chosen as before. Table 11 shows that this is indeed the case. For ρ = 0.4, for example, we have an improvement of around 17% for both kernels. If we optimized α for this situation, we would probably obtain an even larger improvement. Also the other values of ρ lead to an improvement of the optimized RMSE of nearly or even more than 10%.
Table 11 Simulation results for (M4) for constant / QS kernel, α = 9.4 and different values of ρ.

Data-adaptive variance estimation using subsampling
In the previous section we performed an extensive simulation study of the behaviour of the RMSE of the variance estimators (4) and (18). We have seen that, depending on the dependence structure of the underlying random fields, it can be quite useful to consider the improved variance estimator in (18) instead of the one in (4), as the former is very robust with respect to the proper choice of m and can also lead to improvements of the optimized RMSE of more than 15% in some situations.
In the following simulation study we want to check how well the data-adaptive subsampling procedure proposed in Section 3.2 works. For this purpose, we consider model (M1) and choose b = ⌊n^γ⌋ = (⌊n_1^γ⌋, ⌊n_2^γ⌋) with γ ∈ {0.7, 0.8, 0.9} as well as h = 1, so that we consider all possible subrandom fields of size b. Moreover, we do not only investigate the subsampling approximation of the RMSE, but also the one of the mean, which is defined analogously.

A. Prause and A. Steland
The general simulation setting of model (M1) stays the same as in Subsection 4.1.1; in particular we consider a random field of size (30, 40), leading to subrandom fields of size (10, 13), (15, 19) and (21, 27) for γ = 0.7, 0.8 and 0.9, respectively. Table 12 shows the simulation results for different values of m and 10000 repetitions. We can see that the subsampling approximation of the mean works very well for all values of γ and m. Regarding the subsampling approximation of the RMSE, we see that it is most adequate if we choose γ = 0.9. Therefore, we fix this value from now on. As our main goal is, however, to find a good estimate for the parameter α used in the cutting rules of the improved variance estimator (18), we now want to investigate how well we can imitate the graph of the RMSE in Figure 2 by the subsampling approximation. We compute the (optimal) subsampling approximation of the RMSE of σ²_{n,c} with respect to m (i.e. when m ranges over the values of M_2 in (24)) analogously to the one of σ²_n in (22), whereby we now have to replace c_n(j) by c_b(j).
Table 12 Subsampling approximation of the mean and the RMSE in model (M1).
Figure 3 shows the optimal and subsampled RMSE in model (M1) for random fields of size (30, 40) and (45, 60) using constant weights. We can see that the shape of the curves of the subsampling approximation of the RMSE coincides quite well with the shape of the optimal curves of the RMSE. Moreover, the difference between the approximated and the optimal curves becomes smaller for larger sample sizes. The only problem is that the curves of the approximation attain their minimum at α = 0; however, this is not very surprising, as the minimum of the curves of the optimal RMSE is not very marked. To deal with such situations, we choose α as the largest value for which the corresponding subsampled RMSE differs by less than 1%, say, from the subsampled RMSE for α = 0.
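The block sizes quoted above follow from b = ⌊n^γ⌋ applied componentwise; a quick check:

```python
# Sketch: subsampling block sizes b = floor(n^gamma) componentwise for
# the field of size n = (30, 40) used in the study.
n = (30, 40)
for gamma in (0.7, 0.8, 0.9):
    b = tuple(int(ni ** gamma) for ni in n)
    print(gamma, b)
# -> 0.7 (10, 13) / 0.8 (15, 19) / 0.9 (21, 27)
```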
In the considered situation this leads to α = 5.4 compared to the optimal value of α = 5.8 for a random field of size (30,40) and to α = 5.7 compared to the optimal value of α = 6.4 for a random field of size (45,60). For a tolerance of 3% one would obtain α = 5.7 for the smaller and α = 6.1 for the larger random field. This shows that the proposed method leads to reasonable estimates for α if one chooses an appropriate tolerance level. Even if the optimal and the estimated values for α do not match exactly, they are still close enough to each other to guarantee good results and to profit from the advantages that one achieves when using σ 2 n,c instead of σ 2 n .
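The tolerance rule for selecting α can be sketched as follows; the RMSE values in the example are purely illustrative, not simulation output:

```python
# Sketch of the tolerance rule: take the largest alpha whose subsampled
# RMSE exceeds the one at alpha = 0 by less than a given tolerance
# (1% in the study; 3% was also considered).
def select_alpha(rmse, tol=0.01):
    """rmse: dict alpha -> subsampled RMSE, must contain alpha = 0."""
    base = rmse[0.0]
    ok = [a for a, r in rmse.items() if r < (1.0 + tol) * base]
    return max(ok)

# Illustrative values only:
rmse = {0.0: 1.80, 2.0: 1.80, 4.0: 1.81, 5.4: 1.815, 5.8: 1.83, 7.0: 2.10}
print(select_alpha(rmse))  # -> 5.4
```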

Accuracy for testing image data
Lastly, we analyzed the accuracy of the proposed estimator when used to standardize the image test statistic based on the partial sums S_n, in order to test for the presence of deviations from a null or reference model for the image, as introduced and discussed in the introduction. Concretely, we investigated the test which rejects the null hypothesis H_0 if the standardized partial sum exceeds the corresponding critical value. Observe that we use the threshold cut-off estimator with constant weights to standardize the partial sum. We used the threshold rule c_n(α) investigated above in detail and examined how one should select α to obtain accurate tests in terms of the type I error rate. We simulated image data from spatial moving average models of order d, defined analogously to (M1), where the innovations are i.i.d. N(0, 1). The first model is model (M2), i.e. a spatial moving average model of order d = 3 with weights a_1 = 0.5, a_2 = 0.3 and a_3 = 0.1, cf. Section 4.1.2. The second model allows for much stronger dependencies and is of order d = 40 with coefficients parameterized by ρ, |ρ| < 1. The degree was chosen as 40 in order to approximate a spatial linear process with d = ∞ and those coefficients, while keeping the computational costs at a reasonable level. Depending on the parameter ρ the correlations can be substantial. Table 13 provides the results for random fields following those models with sizes ranging between n_1 = n_2 = 50 and n_1 = n_2 = 250. For the second model the parameter ρ was chosen as 0.1, 0.3 and 0.5, in order to evaluate the procedure for weak, intermediate and stronger dependence structures. Indeed, visual inspection of random fields with ρ = 0.5 shows that the covariances induce an interesting texture-like random pattern which comes close to those observed in certain inhomogeneous materials. For each case the type I error rate was simulated based on 5,000 simulation runs.
Compared to the results addressing the estimation accuracy, where values of α around 5.8 led to optimal RMSE values, it can be seen that smaller values around 3.6 are preferable in order to obtain accurate type I error rates for the settings studied in this simulation, whatever the degree of correlation of the random field. This is, however, in good agreement with the findings of the previous simulations, especially the RMSE and bias curves shown in Figure 2. Indeed, according to those results, for values around 3.6 the variance is slightly larger (leading to the slightly higher RMSE values), but the bias is smaller. Our simulations reveal the interesting observation that for the accuracy in terms of the type I error rate it seems to be better to have a smaller bias.

Concluding remarks
In all the settings of Subsection 4.1 we presented results for positive values of the MA- and AR-parameters. However, in our simulation studies we also considered negative values and mixtures of positive and negative values for these parameters. As the results are qualitatively the same as for only positive weights, we do not present them here.
Moreover, the question arises whether cutting rules different from those in (20) and (26), respectively, would lead to similar or even better results regarding the RMSE of the improved variance estimator. One could, for example, also think of a cutting rule of the form (29), or even of using a constant cutting rule, i.e. a rule that does not depend on the lag j and the sample size n. However, the simulation studies showed that a constant cutting rule in general cannot improve the RMSE as much as a cutting rule depending on j and n. Moreover, rules as in (29) or of similar forms lead to comparable but not better results than the cutting rules in (20) and (26) concerning the optimized RMSE. That is why we focused our analysis of σ²_{n,c} on sequences c_n(j) as in (20) and (26). Lastly, we may conclude that the proposed estimators work well in various models of practical importance and lead to highly accurate results, both in terms of estimation accuracy and in terms of the type I error rates when considering significance testing for image data, provided the parameters are selected appropriately. The latter can be achieved in a data-adaptive way by the proposed subsampling-based selection procedure.

Proofs
This section is devoted to rigorous proofs of the presented theorems. The consistency proofs for the case p = 1, except Theorem 2.8, are part of the first author's Ph.D. thesis, see [23]. The extensions to multivariate fields (p > 1) as well as Theorems 2.4, 2.5, 2.6 and 2.7 are newer and due to the second author.

Proofs of Section 2
Frequently we shall use the following result, see [4], which holds by definition of ϕ. For the proof of Theorem 2.1 we need the following lemma.
It follows now from Theorem 28.21 in [6] that Y_i(j) possesses a continuous spectral density function, since for any ρ*-mixing random field the linear dependence measure κ(r), defined with the supremum taken over all pairs of nonempty, finite and disjoint sets Q, S such that d(Q, S) ≥ r, see [6, p. 148], satisfies the necessary and sufficient condition κ(r) = o(1), as r → ∞.
For p = 1 the following basic result for ϕ-mixing random fields can be found in [11] without a proof. Lemma 5.3. Suppose p = 1. Let condition (9) be satisfied. Set S_n := Σ_{1≤j≤n} ξ_j for all n ≥ 1. Then the following three assertions hold true.
The following result generalizes Lemma 5.3 to a general random field taking values in R^p, p ∈ N. Lemma 5.4. Assume that condition (9) is satisfied. Let S_n := Σ_{1≤j≤n} ξ_j, n ≥ 1.
To verify (b), we have to show the asserted convergence as n → ∞. In what follows, ℓ = (ℓ_1, …, ℓ_q) ∈ N_0^q, ℓ_{−k} = (ℓ_j)_{j=1,…,q, j≠k}, and (ℓ_{−k}, i_k) is obtained from ℓ by replacing the kth entry ℓ_k by i_k. We shall show that the second term is o(1) and remains o(1) when propagated through the outer summations. To show that, it suffices to consider the case I = 1, i.e. we may neglect this indicator; the remaining case can be handled analogously. Indeed, we will show the corresponding convergence for all k ∈ {1, …, q}. Observe that for any j ∈ {1, …, q}\{k} the inner averages converge, as n → ∞, by a Cesàro sum argument. Similarly, using again a Cesàro sum argument, the complementary averages converge as n → ∞. By combining the results for r_k^+ and r_k^− we also obtain the convergence of the corresponding sums scaled by n_j^{−1}, as n → ∞. More generally, by iterating the above argument, we have, firstly, convergence in the ‖·‖_∞-norm for any pairwise different j_1, …, j_r ∈ {1, …, q}\{k} and, secondly, the analogous statement for the full average, as n → ∞. Now we may conclude that (35) yields the assertion as n → ∞. This shows (b). Assertion (c) can be shown by summing over all q-dimensional spheres with respect to the maximum norm, where c(r) = #{−n ≤ i ≤ n : max_j |i_j| = r} ≤ 2q(2r + 1)^{q−1}. Using again (34), we obtain the corresponding bound for any ℓ ∈ 1 : n with |ℓ| = r. Hence the resulting series converges and we arrive at the claimed bound. We are now in a position to prove the consistency of the estimator σ²_n for multivariate random fields.
Proof of Theorem 2.1. Define the p × p matrix σ̃²_n as the deterministic counterpart of the estimator σ̂²_n, with the sample autocovariances replaced by their population versions. We show that ‖σ̂²_n − σ̃²_n‖_∞ → 0 in probability and ‖σ̃²_n − σ²‖_∞ → 0 as n → ∞. Notice that the second summand converges to zero as n → ∞ (which implies m → ∞), by Lemma 5.3(a). For the first summand this follows by dominated convergence: clearly, f_m(j) → 0 for each fixed j ∈ Z^q, as m → ∞, with Σ_{j∈Z^q} |g(j)| < ∞ by Lemma 5.3(a). Thus, the first summand also converges to zero and hence we obtain ‖σ̃²_n − σ²‖_∞ → 0. We now show that ‖σ̂²_n − σ̃²_n‖_∞ → 0 in probability. By the Markov inequality, we shall bound the right-hand side by an expression which is o(1). Combined with ‖σ̃²_n − σ²‖_∞ = o(1), as n → ∞, this then shows the second assertion of Theorem 2.2. By boundedness of the weights w_m(j), observe that for each j ∈ Z^q we can apply Lemma 5.3(c) to the expectation of the squared sum of the random field {Y_i^{(ν,μ)}(j)}, as condition (32) is fulfilled by Lemma 5.1. This leads to a bound where the last inequality is fulfilled by Lemma 5.1 and the definition of A(q, ϕ) in Lemma 5.3(c). Combining (36) with (37), to estimate the right-hand side we consider the decomposition into two sums S_1 and S_2. Since the ϕ-mixing coefficients are decreasing with ϕ(0) = 1, we obtain S_1 ≤ Σ_{r=1}^{2|j|_∞} (2r + 1)^{q−1} ≤ 2|j|_∞ (4|j|_∞ + 1)^{q−1}.
As the sum in (38) only ranges over j ∈ Z^q with |j| ≤ m, we know that |j|_∞ ≤ m_∞. This implies S_1 ≤ 2m_∞ (4m_∞ + 1)^{q−1} ≤ c_1 (m_∞)^q for some finite constant c_1 ∈ R depending only on the dimension q of the random field. For the estimation of S_2 we use again (33) and get S_2 ≤ Σ_{r=1}^{∞} (2(r + 2|j|_∞) + 1)^{q−1} ϕ_ξ^{1/2}(r). Since Σ_{r=1}^{∞} (2r + 2)^{q−1} ϕ_ξ^{1/2}(r) is finite by assumption, we can find a constant c_2 ∈ R such that for all |j| ≤ m we have S_2 ≤ c_2 (m_∞)^{q−1}.
Recall that a sequence of distribution functions converges in all continuity points of the limit function if and only if the associated sequence of quantile functions converges in all continuity points of the limiting quantile function, see [26, Lemma 21.2]. Consequently, by continuity of J(x, P), x ∈ R, we obtain q_{n,b}(γ) → q(γ) as n → ∞. With R ∼ J(·, P), this implies P(|n|^{1/2}(θ̂_n − θ(P)) ≤ q_{n,b}(γ)) → P(R ≤ q(γ)) = γ, as n → ∞, which proves the remaining assertions.

Proofs of Section 3
Let us now establish the consistency of the thresholding estimators of the asymptotic variance.
Proof of Corollary 3.1. We only have to show that the special choice of g in Corollary 3.1 fulfills Assumption 2; the assertion then follows directly from Theorem 3.1. This means that we have to show that E(1 − g(γ̂_n(j), c_n(j))) = P(|γ̂_n(j)| ≤ c_n(j)) → 0 for all j ∈ Z^q. For that, observe that by Lemma 5.3 in [25] and the assumptions of the corollary we have |γ̂_n(j)| − c_n(j) → |γ(j)| − c(j) =: X in probability, as n → ∞. This implies convergence in distribution, i.e. if F_{|γ̂_n(j)|−c_n(j)} denotes the distribution function of |γ̂_n(j)| − c_n(j) and F_X the one of X, we have P(|γ̂_n(j)| ≤ c_n(j)) = F_{|γ̂_n(j)|−c_n(j)}(0) → F_X(0), as n → ∞, if zero is a continuity point of F_X. But since X is the constant |γ(j)| − c(j) and c(j) < |γ(j)| by assumption, this is fulfilled. Moreover, the last inequality also implies that F_X(0) = 0. Thus we have P(|γ̂_n(j)| ≤ c_n(j)) → 0 for all j ∈ Z^q as n → ∞, which completes the proof.