Parameter estimation for SPDEs based on discrete observations in time and space

Parameter estimation for a parabolic linear stochastic partial differential equation in one space dimension is studied observing the solution field on a discrete grid in a fixed bounded domain. Considering an infill asymptotic regime in both coordinates, we prove central limit theorems for realized quadratic variations based on temporal and spatial increments as well as on double increments in time and space. Resulting method of moments estimators for the diffusivity and the volatility parameter inherit the asymptotic normality and can be constructed robustly with respect to the sampling frequencies in time and space. Upper and lower bounds reveal that in general the optimal convergence rate for joint estimation of the parameters is slower than the usual parametric rate. The theoretical results are illustrated in a numerical example.


Introduction
Stochastic partial differential equations (SPDEs) combine the ability of deterministic PDE models to describe complex mechanisms with the key feature of diffusion models, namely a stochastic signal which evolves within the system. While SPDEs have been intensively studied in stochastic analysis, their statistical theory is only at its beginnings. Since we first need a thorough statistical understanding of basic SPDEs before more complex models can be studied, let us consider the prototype of the large class of parabolic SPDEs given by the stochastic heat equation on [0, 1]:
$$\mathrm{d}X_t(x) = \vartheta_2 \frac{\partial^2}{\partial x^2} X_t(x)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t(x), \qquad X_0 = \xi, \tag{1}$$
where $\mathrm{d}W$ denotes white noise in space and time, ξ is some independent initial condition and we impose Dirichlet boundary conditions. More generally, we will later also incorporate a first and a zero order term in the differential operator. The statistical aim is to infer on the diffusivity parameter $\vartheta_2 > 0$ and the diffusion or volatility parameter $\sigma^2 > 0$.
In the seminal works by Huebner et al. [13] as well as Huebner and Rozovskii [14] a spectral approach has been considered where the processes $t \mapsto u_\ell(t) := \langle X_t, e_\ell \rangle_{L^2}$ are observable for the eigenfunctions $e_\ell$ of the underlying differential operator. These so-called Fourier modes $u_\ell$ are independent and satisfy Ornstein-Uhlenbeck dynamics. Consequently, classical results from statistics for stochastic processes can be applied directly. While the spectral approach is studied in numerous papers, see Lototsky [21] or Cialenco [6] for a review, this specific observation scheme is limiting and too restrictive in potential applications. Especially for more general equations the eigenfunctions will depend on unknown parameters, which is already the case if we add a first order term $\vartheta_1 \frac{\partial}{\partial x} X_t(x)\,\mathrm{d}t$ with unknown $\vartheta_1 \in \mathbb{R}$ in (1).
Complementary to this spectral approach, the canonical problem of parameter estimation based on discrete observations of the solution field of the SPDE has recently attracted increased research activity. Assuming X is observed on a discrete grid $(t_i, y_k)_{i=0,\ldots,N,\,k=0,\ldots,M} \subset [0,T]\times[0,1]$, approximate maximum likelihood estimators have first been investigated by Markussen [22] for $T \to \infty$. For various linear SPDEs, central limit theorems for method of moment type estimators based on realized quadratic variations have been studied by Torres et al. [29], Cialenco and Huang [7], Bibinger and Trabs [3,2], Chong [4,5], Shevchenko et al. [28], as well as Kaino and Uchida [18]. However, all these works only give partial answers to the estimation problem. Even for the stochastic heat equation, there is neither a sharp analysis for joint estimation of $\vartheta_2$ and $\sigma^2$, nor has the case where the number of spatial observations M dominates the number of temporal observations N been explored in general.
Therefore, in this relatively young research field basic and elementary questions even for simple (linear, parabolic) SPDEs still need to be answered. This becomes most important with regard to an increasing number of SPDE models in applications, e.g., in neurobiology [31], for the description of oceans [25,11], for climate modelling [12] or for modelling interest rates [8,27].
In order to provide a complete statistical analysis of parametric estimation for linear parabolic SPDEs in dimension one based on discrete observations on a finite time horizon T > 0, our main contributions reveal that: (i) $\vartheta_2$ and $\sigma^2$ cannot be jointly estimated if N or M is fixed. (ii) The optimal convergence rate for estimating $(\vartheta_2, \sigma^2)$ is $1/\sqrt{M^3 \wedge N^{3/2}}$, which generally is slower than the parametric rate $1/\sqrt{MN}$. (iii) Realized space-time quadratic variations can be used to construct estimators which are robust with respect to the sampling frequencies N and M in time and space, respectively.
In view of (i), we will consider the double asymptotic regime $M, N \to \infty$ in our analysis, which results in infill asymptotics in time and space. Since the vector of observations $(X_{t_i}(y_k))_{i=0,\ldots,N,\,k=0,\ldots,M}$ is normally distributed with only two unknown parameters in equation (1), it might be surprising that there is no estimator with parametric rate for $(\vartheta_2, \sigma^2)$. Indeed, our lower bound verifies that the parametric rate can only be achieved if N and $M^2$ are of the same order of magnitude. In view of the scaling invariance of the stochastic heat equation, this particular asymptotic regime $N \asymp M^2$ implies that we add the same amount of information in time and space as N and M increase. In this sense we have a balanced design. An unbalanced regime $N = o(M^2)$ or $M = o(\sqrt{N})$ causes a deterioration of the convergence rate.
Our statistical analysis also gives insights into the relation between the spectral and the discrete observation scheme. While both are heuristically comparable in view of the discrete Fourier transform, it turns out that there are important differences. In particular, the fully discrete observation scheme is not statistically equivalent (in the sense of Le Cam) to time discrete observations of the first M Fourier modes in general.
Our estimators rely on realized quadratic variations, taking into account time and space increments
$$(\Delta_i^N X)(y_k) := X_{t_{i+1}}(y_k) - X_{t_i}(y_k), \qquad (\delta_k^M X)(t_i) := X_{t_i}(y_{k+1}) - X_{t_i}(y_k), \tag{2}$$
respectively, as well as space-time increments or double increments
$$D_{ik}X := X_{t_{i+1}}(y_{k+1}) - X_{t_i}(y_{k+1}) - X_{t_{i+1}}(y_k) + X_{t_i}(y_k). \tag{3}$$
In contrast to the maximum likelihood approach, which requires inversion of the large $MN \times MN$ covariance matrix, method of moments type estimators based on (2) and (3) are easy to implement. As observed in [3], a central limit theorem for realized temporal quadratic variations requires that the observation frequency in time dominates the observation frequency in space; more precisely, $M = \mathcal{O}(N^\rho)$ for some $\rho < 1/2$ is necessary. Complementarily, we show that the realized spatial quadratic variation satisfies a central limit theorem if $N = o(M)$. The remaining gap can be filled by double increments, and the corresponding realized space-time quadratic variation turns out to be robust with respect to the sampling frequencies M and N. Based on these statistics, we construct method of moments estimators for $\vartheta_2$ and $\sigma^2$ (as well as $\vartheta_1$ from a first order term). Hereby, the rate optimal method for joint estimation of all identifiable parameters is an M-estimator relying on double increments. Our proofs directly employ the Gaussian distribution of X, which allows for an explicit covariance condition for asymptotic normality of quadratic forms of Gaussian triangular schemes. Let us remark that our estimators could be directly generalized to a nonparametric model with time dependent coefficients, as indicated in [3,4]. Note that the solution process X to the SPDE (1) admits continuous trajectories only in one spatial dimension. In the multi-dimensional case one could consider noise processes which are more regular in space, as studied by Chong [4]. Alternatively, Kriz and Maslowski [20] as well as Altmeyer and Reiß [1] generalize the spectral approach to the observation of functionals $\langle X_t, K \rangle$ for some (localizing) kernel K.
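To make the increment notation concrete, the following minimal sketch (in Python; the function name and array layout are our own choices) computes the three families of increments from an observation matrix. The realized quadratic variations studied below are sums of squares of these arrays, up to the normalizations $\sqrt{\Delta}$, δ and $\Phi_\vartheta(\delta,\Delta)$ discussed in Section 3.

```python
import numpy as np

def increments(X):
    """Time, space and double increments of a field observed on a grid,
    where X[i, k] = X_{t_i}(y_k) for i = 0, ..., N and k = 0, ..., M."""
    time_inc = np.diff(X, axis=0)           # (Delta_i^N X)(y_k), shape (N, M+1)
    space_inc = np.diff(X, axis=1)          # (delta_k^M X)(t_i), shape (N+1, M)
    double_inc = np.diff(time_inc, axis=1)  # D_ik X, shape (N, M)
    return time_inc, space_inc, double_inc
```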
This work is organized as follows: In Section 2 we give a precise definition of the model and study probabilistic properties of the solution field. In Section 3 we present the central limit theorems for realized quadratic variations based on space and double increments. The resulting method of moments estimators are constructed in Section 4. Lower bounds are derived in Section 5. In Section 6 we illustrate our results with a numerical example. The proofs of the main results are collected in Section 7 while auxiliary results are postponed to the appendix.
Throughout, we restrict the parameter space such that all eigenvalues $-\lambda_\ell$ of $A_\vartheta$ are negative and $A_\vartheta$ is a negative self-adjoint operator. Consequently, the weak solution to the SPDE (4) exists and is given by the variation of constants formula
$$X_t = e^{tA_\vartheta}\xi + \sigma\int_0^t e^{(t-s)A_\vartheta}\,\mathrm{d}W_s,$$
where $(e^{tA_\vartheta})_{t\ge0}$ denotes the strongly continuous semigroup generated by $A_\vartheta$, see [26, Theorem 5.4].
Since $(e_\ell)_{\ell\ge1}$ is a complete orthonormal system, the cylindrical Brownian motion W can be realized via $W_t = \sum_{\ell\ge1}\beta_\ell(t)e_\ell$ in the sense of $W_t(\cdot) = \sum_{\ell\ge1}\beta_\ell(t)\langle\cdot, e_\ell\rangle$ for a sequence of independent standard Brownian motions $(\beta_\ell)_{\ell\ge1}$. In terms of the projections or Fourier modes $u_\ell(t) := \langle X_t, e_\ell\rangle$, $t\ge0$, $\ell\in\mathbb{N}$, we obtain the representation
$$X_t = \sum_{\ell\ge1} u_\ell(t)\, e_\ell, \tag{5}$$
where $(u_\ell)_{\ell\ge1}$ are one dimensional independent processes satisfying the Ornstein-Uhlenbeck dynamics $\mathrm{d}u_\ell(t) = -\lambda_\ell u_\ell(t)\,\mathrm{d}t + \sigma\,\mathrm{d}\beta_\ell(t)$, or equivalently
$$u_\ell(t) = e^{-\lambda_\ell t}u_\ell(0) + \sigma\int_0^t e^{-\lambda_\ell(t-s)}\,\mathrm{d}\beta_\ell(s)$$
in the sense of the usual finite dimensional stochastic integral. For simplicity, we will assume throughout that $\{\beta_\ell, u_\ell(0), \ell\in\mathbb{N}\}$ is an independent family and $u_\ell(0)\sim\mathcal{N}(0, \sigma^2/(2\lambda_\ell))$ such that each coefficient process $u_\ell$ is stationary with covariance
$$\mathrm{Cov}(u_\ell(s), u_\ell(t)) = \frac{\sigma^2}{2\lambda_\ell}e^{-\lambda_\ell|t-s|}, \qquad s,t\ge0.$$
The reduction of more general conditions on $X_0$ to the stationary case is discussed in [3].
From representation (5) it is evident that X is a two parameter centered Gaussian field. Therefore, the model is completely specified by its covariance structure
$$\mathrm{Cov}(X_s(x), X_t(y)) = \sigma^2\sum_{\ell\ge1}\frac{e^{-\lambda_\ell|t-s|}}{2\lambda_\ell}\,e_\ell(x)e_\ell(y), \qquad s,t\ge0,\ x,y\in[0,1].$$
While $\sigma^2$ is only a multiplicative factor, the covariance structure depends on ϑ through $\lambda_\ell$ and $e_\ell$. By Kolmogorov's criterion there is a continuous version of the process $(X_t(x), t\ge0, x\in[0,1])$, cf. [26, Chapter 5.5]. In particular, point evaluations $X_t(x)$ for fixed values of t and x are well defined.
For a fixed spatial location x the sample paths of the process $X_\cdot(x)$ are not semimartingales. In fact, $t\mapsto X_t(x)$ is only Hölder continuous of order almost 1/4 [26, Theorem 5.22] and thus has infinite quadratic variation over any time interval. On the other hand, regarding X as a function of space at a fixed point in time substantially simplifies the probabilistic structure of the process: By Proposition 2.1, for fixed $t>0$ the process $x\mapsto X_t(x)$, $x\in[0,1]$, is an Itô diffusion. In particular, it admits a semimartingale representation driven by a standard Brownian motion $B(\cdot) = B_t(\cdot)$, with diffusion coefficient proportional to $\sigma/\sqrt{2\vartheta_2}$.
Note the similarity of the covariance structures of $X_t(\cdot)$ and of the Brownian bridge, especially in the case Γ = 0. This resemblance is in line with the Dirichlet boundary conditions $X_t(0) = X_t(1) = 0$ in our model.

Remark 2.2. For $N\ge2$ and fixed $0\le t_1 < t_2 < \ldots < t_N$ the multi-dimensional process $x\mapsto(X_{t_1}(x),\ldots,X_{t_N}(x))$ is not an Itô diffusion. Indeed, it is not even a Markov process: Take N = 2 and let s < t. It is a well-known fact that for Markov processes past and future are independent, given the present state. For $x<y<z$, on the other hand, using the Gaussianity of X, the (Gaussian) conditional distribution of $(X_s(x), X_t(z))$ given $(X_s(y), X_t(y))$ can be computed explicitly. From here, independence is easily disproved by checking the non-diagonal entries of the conditional covariance matrix.
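The disproof sketched in Remark 2.2 can also be carried out numerically. The following sketch (a hypothetical illustration for the stochastic heat equation with $\vartheta_1 = \vartheta_0 = 0$ and $\sigma = \vartheta_2 = 1$; the evaluation points and the truncation level are arbitrary choices of ours) builds the covariance matrix of $(X_s(x), X_t(z), X_s(y), X_t(y))$ from the stationary Fourier mode covariances and evaluates the conditional covariance; its non-vanishing off-diagonal entry disproves the Markov property.

```python
import numpy as np

L_trunc = 2000
ell = np.arange(1, L_trunc + 1)
lam = np.pi ** 2 * ell ** 2                          # eigenvalues for theta_2 = 1
e = lambda x: np.sqrt(2) * np.sin(np.pi * ell * x)   # sine eigenfunctions

def cov(s, x, t, y):
    """Cov(X_s(x), X_t(y)) = sum_l 1/(2 lam_l) e^{-lam_l |t-s|} e_l(x) e_l(y)."""
    return np.sum(np.exp(-lam * abs(t - s)) / (2 * lam) * e(x) * e(y))

s, t, x, y, z = 0.10, 0.11, 0.3, 0.5, 0.7
pts = [(s, x), (t, z), (s, y), (t, y)]
Sigma = np.array([[cov(a, u, b, v) for (b, v) in pts] for (a, u) in pts])
S11, S12, S22 = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]
cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
print(cond_cov[0, 1])   # nonzero: X_s(x), X_t(z) are conditionally correlated
```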

Central limit theorems for realized quadratic variations
We will now study central limit theorems for realized quadratic variations based on the space and double increments from (2) and (3), respectively. To fix assumptions and notation, let X be given by (5) and suppose we have $(M+1)(N+1)$ time and space discrete observations
$$X_{t_i}(y_k), \qquad t_i = i\Delta = \frac{iT}{N}, \qquad y_k = b + k\delta, \qquad \delta = \frac{1-2b}{M},$$
for some fixed $b\in[0,1/2)$. The spatial locations $y_k$ are thus equidistant inside the (possibly proper) subinterval $[b, 1-b]$ of the spatial domain [0,1]. Note that whenever $M\to\infty$ or/and $N\to\infty$, we obtain infill asymptotics in space $\delta\to0$ or/and time $\Delta\to0$, respectively. Throughout, $M,N\to\infty$ should be understood in the sense of $\min(M,N)\to\infty$. For two sequences $(a_n), (b_n)$, we write $a_n\lesssim b_n$ to indicate that there exists some $c>0$ such that $|a_n|\le c\,|b_n|$ for all $n\in\mathbb{N}$, and we write $a_n\asymp b_n$ if $a_n\lesssim b_n\lesssim a_n$. If $a_n = a$ for some $a\in\mathbb{R}$ and all $n\in\mathbb{N}$, we write $(a_n)\equiv a$. Moreover, $\|\cdot\|_2$ denotes the spectral norm and $\|\cdot\|_F$ the Frobenius norm for matrices.
The realized quadratic variations can be regarded as sums of squares of certain Gaussian random vectors. Hence, our central limit theorems embed into the literature on quadratic forms in random variables and their asymptotic properties, see e.g. [23]. Our key tool for proving asymptotic normality is the following proposition which is tailor made for the situation present in this work and which gives an explicit covariance condition that ensures convergence to the normal distribution.
Proposition 3.1. Let $(Z_{i,n}, 1\le i\le d_n, n\in\mathbb{N})$ be a triangular array which satisfies $(Z_{1,n},\ldots,Z_{d_n,n})\sim\mathcal{N}(0,\Sigma_n)$ for a covariance matrix $\Sigma_n\in\mathbb{R}^{d_n\times d_n}$, $n\in\mathbb{N}$, let $(\alpha_{i,n}, 1\le i\le d_n, n\in\mathbb{N})$ be a deterministic triangular array with values in $[-1,1]$, and set $S_n := \sum_{i=1}^{d_n}\alpha_{i,n}Z_{i,n}^2$. If $\|\Sigma_n\|_2^2/\mathrm{Var}(S_n)\to0$, then
$$\frac{S_n - \mathbb{E}[S_n]}{\sqrt{\mathrm{Var}(S_n)}}\xrightarrow{d}\mathcal{N}(0,1).$$
The proof relies on the fact that $S_n$ can be represented as a linear combination of independent $\chi^2(1)$-distributed random variables. $\|\Sigma_n\|_2^2/\mathrm{Var}(S_n)\to0$ then implies that the corresponding Lyapunov condition is fulfilled. In this section we only require $\alpha_{i,n}=1$ for all i and n, i.e. $S_n = \|Z_{\bullet,n}\|_2^2$. The general case will be necessary to verify asymptotic normality of the M-estimator in Section 4. It is worth noting that Proposition 3.1 reveals a quite elementary proof strategy to verify several central limit theorems in [3,7,28,29] instead of advanced techniques from Malliavin calculus or mixing theory.
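The covariance condition of Proposition 3.1 is easy to check numerically. A minimal Monte Carlo illustration with $\alpha_{i,n}\equiv1$ (the AR(1)-type covariance matrix below is a stand-in of our own choosing, not the increment covariance of this paper): $\|\Sigma_n\|_2$ stays bounded while $\|\Sigma_n\|_F^2$ grows linearly in $n$, so the standardized $S_n = \|Z\|_2^2$ is approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, rho = 500, 2000, 0.5
idx = np.arange(n)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))   # AR(1) covariance matrix
mean_S = np.trace(Sigma)
var_S = 2 * np.linalg.norm(Sigma, "fro") ** 2        # Isserlis: Var(S) = 2 ||Sigma||_F^2
Z = rng.multivariate_normal(np.zeros(n), Sigma, size=reps)
S = np.sum(Z ** 2, axis=1)
standardized = (S - mean_S) / np.sqrt(var_S)
print(standardized.mean(), standardized.std())       # approx. 0 and 1
```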
Remark 3.2.
1. If $\alpha_{i,n}=1$ for all i, n, it follows from Isserlis' theorem [17] that $\mathrm{Var}(S_n) = 2\|\Sigma_n\|_F^2$ and thus the condition for asymptotic normality may be written as $\|\Sigma_n\|_2/\|\Sigma_n\|_F\to0$. This condition is essentially optimal: In case of independent observations it is in fact equivalent to asymptotic negligibility of the individual normalized and centered summands and hence equivalent to Lindeberg's condition.
2. The spectral norm is bounded by the maximum absolute row sum. Writing $\Sigma_n = (\sigma^{(n)}_{ij})_{i,j}$, asymptotic normality thus holds under the sufficient condition
$$\max_{1\le i\le d_n}\sum_{j=1}^{d_n}\big|\sigma^{(n)}_{ij}\big|\Big/\sqrt{\mathrm{Var}(S_n)}\to0. \tag{7}$$

So far, the double asymptotic regime $M,N\to\infty$ has only been studied for time increments $(\Delta_i^N X)(y_k) = X_{t_{i+1}}(y_k) - X_{t_i}(y_k)$: If b > 0 and if there exists $\rho\in(0,1/2)$ such that $M = \mathcal{O}(N^\rho)$, then the rescaled realized temporal quadratic variation $V^t$ is asymptotically normal with rate $1/\sqrt{MN}$ and an explicit asymptotic variance, cf. [3, Thm. 3.4]. Note that this result is only valid under the condition $M = o(\sqrt{N})$, i.e., the observation frequency in time is much higher than in space. This constraint is due to a non-negligible correlation of realized temporal quadratic variations at two neighboring points in space if the distance δ of these points is small compared to ∆ or, equivalently, if M is large compared to N.
In the situation where the number of spatial observations dominates the number of temporal observations, the above result is not applicable. In this case, spatial increments $(\delta_k^M X)(t_i) = X_{t_i}(y_{k+1}) - X_{t_i}(y_k)$ and the corresponding rescaled realized spatial quadratic variations $V^{sp}(t_i)$ at times $t_i$ turn out to be useful. In contrast to squared time increments, which have to be renormalized by $\sqrt{\Delta}$ due to the roughness of $t\mapsto X_t(y)$, squared space increments have to be renormalized by δ due to the semimartingale nature of $y\mapsto X_t(y)$.
In the extreme case where observations are only available at one point t in time (and assuming $\vartheta_1 = \vartheta_0 = 0$ as well as $X_0 = 0$), Cialenco and Huang [7] showed that $V^{sp}(t)$ is asymptotically normal with $1/\sqrt{M}$-rate of convergence. An analogous result has been proved by Shevchenko et al. [28] for the wave equation. Proposition 2.1 reveals that $V^{sp}(t)$ is in fact a rescaled realized quadratic variation of the Itô diffusion $y\mapsto X_t(y)$. Hence, asymptotic normality follows from standard theory on quadratic variation for semimartingales. In order to generalize this central limit theorem to the double asymptotic regime $M,N\to\infty$, we define the time average of the rescaled realized spatial quadratic variations:
$$\bar V^{sp} := \frac{1}{N}\sum_{i=1}^{N}V^{sp}(t_i).$$

Remark 3.4. The condition $N/M\to0$ is necessary in order to neglect the bias: The proof of the theorem reveals that the bias of each $V^{sp}(t_i)$ is of order $\delta\asymp1/M$ and consequently, after rescaling with the rate $\sqrt{MN}$, the overall bias is of the order $\sqrt{N/M}$. We conclude that the central limit theorem for realized temporal quadratic variations $V^t$ holds when (roughly) $M = o(\sqrt{N})$, whereas the central limit theorem for realized spatial quadratic variations $\bar V^{sp}$ is fulfilled if $N = o(M)$. To close the remaining gap, we finally study the space-time increments $D_{ik}$ from (3). The corresponding rescaled realized quadratic variations are robust with respect to the sampling regime, as indicated by their representation in terms of the series expansion (5).
In contrast to the case of space increments (and in line with the result for time increments), we impose b > 0 for the remainder of this section. Inspection of the proofs suggests that this condition may be relaxed to $b\to0$ as long as the decay is sufficiently slow. As a first step, we calculate the expectation of the double increments.

Proposition 3.5. Let $b\in(0,1/2)$. Then:
(i) It holds $\mathbb{E}[\tilde D_{ik}^2] = \sigma^2\,\Phi_\vartheta(\delta,\Delta)$ up to a uniform remainder, where $\Phi_\vartheta(\delta,\Delta)$ can be expressed through the function
$$F_{\vartheta_2}(\delta,\Delta) := \sum_{\ell\ge1}\frac{1-e^{-\pi^2\vartheta_2\ell^2\Delta}}{\pi^2\vartheta_2\ell^2}\cos(\pi\ell\delta).$$
(ii) Assuming that $r = \lim\delta/\sqrt{\Delta}\in[0,\infty]$ exists, $\Phi_\vartheta$ admits three different asymptotic regimes; if moreover $\delta/\sqrt{\Delta}\equiv r\in(0,\infty)$, we have the first order approximation $\Phi_\vartheta(\delta,\Delta)\approx e^{-\kappa\delta/2}\psi_{\vartheta_2}(r)\sqrt{\Delta}$.

Remark 3.6. The first order constants appearing in the asymptotic expressions in (ii) stem from the first derivative of $F_{\vartheta_2}(\cdot,\Delta)$ in 0 in the case r = 0 and from a Riemann sum approximation of $F_{\vartheta_2}(\delta,\Delta)$ in the case $r = \infty$, respectively. Assuming for simplicity that κ = 0, the proof of Proposition 3.5 shows a more precise expression for the remainder terms in the cases $r\in\{0,\infty\}$. Thus, if our analysis of the remainder terms is sharp (which we believe is the case), the first order approximations have a poor quality if $\delta/\sqrt{\Delta}$ converges slowly.
Proposition 3.5 suggests renormalizing double increments with δ if $\delta/\sqrt{\Delta}\to0$ and with $\sqrt{\Delta}$ otherwise, which is in line with the renormalization of $\bar V^{sp}$ and $V^t$, respectively. However, this approach might not be feasible: Firstly, it requires knowing which asymptotic regime is present, i.e., whether or not $\delta/\sqrt{\Delta}\to0$. Especially for one given set of observations this information may be inaccessible. In this case renormalizing with $\Phi_\vartheta(\delta,\Delta)$ automatically captures the correct asymptotic regime. Secondly, if $r\in\{0,\infty\}$, the previous remark shows that the asymptotic expressions for $\Phi_\vartheta(\delta,\Delta)$ may lead to an undesirably large bias. In fact, in order to obtain a central limit theorem with $1/\sqrt{MN}$-rate of convergence, we would have to impose the assumptions $N^2/M\to0$ and $M^5/N\to0$, respectively. These constraints are even more restrictive than the ones required for time or space increments. Therefore, we renormalize with $\Phi_\vartheta(\delta,\Delta)$ and introduce the rescaled realized space-time quadratic variation V, whose asymptotic variance involves a bounded continuous function $C(\cdot)$ on $[0,\infty]$, given by (25). The condition $\delta/\sqrt{\Delta}\equiv r\in(0,\infty)$ can be relaxed to $\delta/\sqrt{\Delta}\to r\in(0,\infty)$ as long as the convergence is fast enough, which we omit for the sake of simplicity. If $\delta/\sqrt{\Delta}\equiv r\in(0,\infty)$ holds, (13) shows that the renormalization $\Phi_\vartheta(\delta,\Delta)$ and its first order approximation are close enough to be exchanged in the previous theorem. In this case we obtain a central limit theorem with a simpler renormalization which in particular does not depend on the model parameters: the correspondingly rescaled statistic satisfies a central limit theorem with asymptotic variance determined by $\psi_{\vartheta_2}(r)$ from (12) and $C(\cdot)$ from (25).

To end this section, we compare the realized quadratic variations $V^t$, $\bar V^{sp}$ and V and their asymptotic variances. For this purpose, we scale the statistics in such a way that they are asymptotically centered around the same mean, say $\sigma^2$. For simplicity, let κ = 0. Plugging in the asymptotic expressions for $\Phi_\vartheta(\delta,\Delta)$ from Proposition 3.5 shows that V approximately coincides with $\bar V^{sp}$ and $V^t$ for $r\in\{0,\infty\}$, respectively, except for the factor 1/2 and for using double increments instead of space or time increments, respectively. Further, denoting the asymptotic variances of $V^t$, $\bar V^{sp}$ and V by $S^t$, $S^{sp}$ and $S(r)$, respectively, we observe the relations $S(\infty) = \frac{3}{2}S^t$ and $S(0) = \frac{3}{2}S^{sp}$, where the factor 3/2 occurs since each double increment consists of two time or space increments, respectively.

Parameter estimation
In view of the covariance structure of the observation vector and the fact that the value of $\vartheta_0$ is irrelevant from a statistical point of view (cf. Proposition 2.3), we consider the parameter vector $\eta = (\sigma^2, \vartheta_2, \kappa)$. It is straightforward to use the results from the previous section to construct method of moments estimators for the volatility parameter $\sigma^2$ or the diffusivity parameter $\vartheta_2$, provided that the other two parameters in $(\sigma^2, \vartheta_2, \kappa)$ are known, respectively. Doing so, we generalize the spatial increments based estimator from [7] to the double asymptotic regime and we complement the time increments based methods in [3,4]. Our estimators do not hinge on $\vartheta_0$ (or Γ), so that knowledge of its true value is not required.
Assuming firstly that $\vartheta_2$ and κ are known, we obtain volatility estimators $\hat\sigma^2_{sp}$ and $\hat\sigma^2_t$ by the method of moments, where $\bar V^{sp}$ and $V^t$ have been introduced in (15) and (10), respectively. As discussed above, the double increments estimator has a larger variance than the single increments estimators. Hence, if one of the regimes $N = o(M)$ or $M = o(\sqrt{N})$ certainly applies, the single increments estimators are preferable. If none of the regimes is present or the situation is unclear, one can profit from the robustness of the double increments estimator with respect to the sampling regime.
If $N = o(M)$, the situation is close to that of N independent semimartingales (cf. Proposition 2.1) and the asymptotic variance $2\sigma^4$ of the spatial increments estimator equals the Cramér-Rao lower bound for estimating $\sigma^2$, as can be seen by a simple calculation. Consequently, $\hat\sigma^2_{sp}$ is an asymptotically efficient estimator. The efficiency loss of the other estimators is due to the fact that for increasingly more temporal observations the infinite dimensional nature of the process X becomes apparent, leading to non-negligible covariances between increments.
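For illustration, a minimal sketch of the spatial increments estimator in the special case κ = 0 (the general exponential weighting in κ is omitted here): it uses that the time-averaged rescaled realized spatial quadratic variation estimates $\sigma^2/(2\vartheta_2)$, cf. the expectation computed in Section 7.

```python
import numpy as np

def sigma2_hat_spatial(X, theta2, delta):
    """Method of moments estimator for sigma^2 from spatial increments,
    assuming kappa = 0 and known theta2. X[i, k] = X_{t_i}(y_k) and
    delta is the spatial mesh; E[(delta_k X)^2] ~ delta * sigma^2/(2 theta2)."""
    space_inc = np.diff(X, axis=1)
    M = space_inc.shape[1]
    V_sp = np.mean(np.sum(space_inc ** 2, axis=1) / (M * delta))
    return 2 * theta2 * V_sp
```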
If $\sigma^2$ and κ are known, the diffusivity $\vartheta_2$ can be estimated by analogous method of moments estimators using $\bar V^{sp}$ and $V^t$ from (11) and (8), respectively. Due to the non-trivial dependence of the renormalization $\Phi_\vartheta(\delta,\Delta)$ on ϑ, it is not apparent how to construct a method of moments estimator for $\vartheta_2$ based on Theorem 3.7 in general. However, if $\sqrt{N}/M\equiv r_0>0$, the renormalization can be decoupled from the unknown parameter as exploited in Corollary 3.8. Since the function $\vartheta_2\mapsto\psi_{\vartheta_2}(r)$ has range $(0,\infty)$ and is monotonic, there is an inverse $H_r(\cdot)$ and we can define the method of moments estimator $\hat\vartheta_2$ from (14), with $r = \delta/\sqrt{\Delta} = r_0(1-2b)/\sqrt{T}$. As a direct consequence of the delta method and the above central limit theorems, we obtain asymptotic normality of these estimators.

We now consider parameter estimation when $(\sigma^2,\vartheta)$ is unknown. Recall from Proposition 2.3 and its subsequent discussion that $\vartheta_0$ cannot be estimated consistently on a finite time horizon. Moreover, it is not possible to estimate other parameters than $(\sigma^2/\sqrt{\vartheta_2},\kappa)$ or $(\sigma^2/\vartheta_2,\kappa)$ only based on the temporal or the spatial covariance structure, respectively. Estimation of $(\sigma^2/\sqrt{\vartheta_2},\kappa)$ via a least squares procedure based on temporal increments is discussed in [3]. To estimate all three identifiable parameters $\eta = (\sigma^2,\vartheta_2,\kappa)$, we employ a least squares approach based on double increments. Due to the highly nontrivial dependence of the normalization $\Phi_\vartheta(\delta,\Delta)$ on ϑ, a direct application of Theorem 3.7 is impossible. Assuming, however, a balanced design in the sense of $\delta/\sqrt{\Delta}\equiv r\in(0,\infty)$, we can use Corollary 3.8 where the normalization is decoupled from the unknown parameter ϑ.
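The inverse $H_r$ in the definition of $\hat\vartheta_2$ above is typically not available in closed form. A minimal sketch of the numerical inversion, assuming $\psi_{\vartheta_2}(r)$ is implemented as a callable `psi` (a hypothetical name) and that the bracketing interval contains the root:

```python
from scipy.optimize import brentq

def theta2_hat(stat, psi, lo=1e-6, hi=1e3):
    """Invert the monotone map theta2 -> psi(theta2) = psi_{theta2}(r):
    returns the theta2 matching the observed rescaled statistic `stat`.
    Requires psi(lo) - stat and psi(hi) - stat to have opposite signs."""
    return brentq(lambda th: psi(th) - stat, lo, hi)
```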
Remark 4.4. Based on $\hat\eta$, we can define $\hat\vartheta_1 := \hat\eta_2\hat\eta_3 = \hat\vartheta_2\hat\kappa$ to estimate $\vartheta_1$. The delta method then yields a central limit theorem for $(\hat\sigma^2,\hat\vartheta_2,\hat\vartheta_1)$. Even when $\delta/\sqrt{\Delta}\equiv r>0$ does not hold, there are always subsets of the data having the balanced sampling design. Hence, the estimation procedure treated in Theorem 4.3 can be generalized to an arbitrary set $\{X_{t_i}(y_k), i\le N, k\le M\}$ of discrete observations by considering an averaged version of the above contrast process. To that aim, choose $v,w\in\mathbb{N}$ such that $v\asymp\max(1,N/M^2)$ and $w\asymp\max(1,M/\sqrt{N})$. Then $\tilde\Delta := v\Delta$ and $\tilde\delta := w\delta$ satisfy $\tilde r := \tilde\delta/\sqrt{\tilde\Delta}\asymp1$.
Using double increments on the coarser grid, we obtain an averaged contrast with target functions $f^\nu_\eta(z) = 2\sigma^2\psi_{\vartheta_2}(\tilde r/\sqrt{\nu})e^{-\kappa z}$ for $\nu = 1,2$. The final estimator $\hat\eta_{v,w}$ for η is then defined as the minimizer of this averaged contrast. The rate of convergence of this estimation procedure is inherited from the coarser grid, which comprises $(N/v)\cdot(M/w)$ observations and has a balanced design by construction. Therefore, Theorem 4.3 implies the convergence rate $1/\sqrt{N^{3/2}\wedge M^3}$. Compared to the thinning method of [18], this rate is a considerable improvement. Indeed, it is (almost) optimal in the minimax sense, as shown in Section 5. A sketch of the choice of v and w is given below.
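A minimal sketch of the coarsening step, following the recipe $v\asymp\max(1,N/M^2)$ and $w\asymp\max(1,M/\sqrt{N})$ from Remark 4.4 (the rounding rule is our own choice):

```python
import numpy as np

def coarse_grid_factors(N, M):
    """Subsampling factors in time (v) and space (w) so that the coarser
    grid with mesh sizes v*Delta and w*delta has a balanced design,
    i.e. (w*delta)/sqrt(v*Delta) is of order one."""
    v = max(1, int(round(N / M ** 2)))
    w = max(1, int(round(M / np.sqrt(N))))
    return v, w

print(coarse_grid_factors(10_000, 50))  # (4, 1): thin out time by a factor 4
```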

Lower bounds
Our next theorem proves that the estimator $\hat\eta_{v,w}$ from (17) for $\eta = (\sigma^2,\vartheta_2,\kappa)$ is optimal in the minimax sense, up to a logarithmic factor. To obtain a lower bound, it suffices to consider the sub-problem where $\vartheta_1 = \vartheta_0 = 0$ and only $(\sigma^2,\vartheta_2)$ has to be estimated.
where the infimum $\inf_T$ is taken over all estimators T of $(\sigma^2,\vartheta_2)$.

Remark 5.2. The lower bound for the case $M/\sqrt{N}\gtrsim1$ is also valid for estimators based on $\{X_{t_i}(y_k), i\le N, k\le M\}$ instead of the increments. We conjecture that this is also true for the case $M/\sqrt{N}\to0$. This lower bound shows that, in general, $(\sigma^2,\vartheta_2)$ cannot be estimated with the parametric rate $1/\sqrt{MN}$, in contrast to a conjecture in [7]. Instead, we observe a phase transition in the rate depending on the sampling frequency. The parametric rate can only be attained for a balanced design $N\asymp M^2$.
The proof of Theorem 5.1 relies on the standard lower bound technique, cf. Tsybakov [30]. Using an inequality by Ibragimov and Has'minskii [16], we will bound the Hellinger distance of the laws of the observations in terms of the corresponding Fisher information for suitably chosen reparametrizations of (σ 2 , ϑ 2 ). For each sampling regime we choose a reparametrization (γ 1 , γ 2 ) of (σ 2 , ϑ 2 ) in such a way that γ 1 can be estimated with parametric rate, even without knowledge of γ 2 . Bounding the Fisher information for γ 2 , we then obtain a lower bound for the simpler problem of estimating the one dimensional parameter γ 2 , assuming that γ 1 is known. Clearly, the resulting lower bound for γ 2 carries over to (γ 1 , γ 2 ) and consequently to (σ 2 , ϑ 2 ). The main effort, noting that the observations are significantly correlated, is to derive sharp upper bounds for the Fisher information in the different sampling regimes.
In the case M/ √ N 1 we apply the following bound on the Fisher information for discrete observations of the first M coefficient processes. Thanks to the Markov property, the probability density function for discrete observations of an Ornstein-Uhlenbeck process is provided by the transition density and allows for explicit computations.
1. If $M\lesssim\sqrt{N}$ and $\sigma^2$ is known, Proposition 5.3 suggests a lower bound of $M^{-3/2}$ for estimation of $\vartheta_2$ in the spectral approach. Indeed, this rate is achieved by the maximum likelihood estimator for time continuous observations of the coefficient processes, cf. [21]. 2. The reparametrization was chosen since $\sigma^2$ can be computed from the quadratic variation of any coefficient process $u_\ell$ when $N\to\infty$, while $\rho^2$ can be computed from the empirical variance of $u_\ell(t_i)$, $\ell\le M$, for a fixed $t_i$ as $M\to\infty$, even without knowledge of the other parameter, respectively.
Letting $M\to\infty$, Proposition 5.3 suggests that based on observations of the coefficient processes it is not possible to estimate $\sigma^2$ (and in particular $(\sigma^2,\vartheta_2)$) at a rate faster than $N^{-3/4}$. Further, assuming $\vartheta_1 = 0$, the eigenfunctions $e_\ell(\cdot)$ do not depend on unknown parameters and hence the space-time discrete observations of the SPDE may be reconstructed from the coefficient processes. Consequently, the lower bound $N^{-3/4}$ carries over to discrete observations of the SPDE.
Although the lower bounds resulting from Proposition 5.3 and Theorem 5.1 are almost the same, their proofs require a very different reasoning if $M/\sqrt{N}\to0$: In this case, if $\sigma^2$ is known, Proposition 4.2 shows that it is possible to estimate $\vartheta_2$ with parametric rate of convergence based on discrete observations of the SPDE, whereas Proposition 5.3 suggests that $\vartheta_2 = \sigma^2/\rho^2$ cannot be estimated at a faster rate than $M^{-3/2}$ based on the coefficient processes. In particular, the two observation schemes are not asymptotically equivalent in the sense of Le Cam.
To derive the lower bound in the case $M/\sqrt{N}\to0$, we consider the situation where observations are recorded at rational positions $y_k = k/M$, $k = 1,\ldots,M-1$, where we work with $M-1$ instead of M spatial observations for ease of notation. Thus, we potentially add spatial observations on the margin $[0,b)\cup(1-b,1]$, which can only increase the amount of information contained in the data. Since $e_\ell(\cdot) = \sqrt{2}\sin(\pi\ell\,\cdot)$ is the sine basis, trigonometric identities imply that the vectors $\bar e_k := (e_k(y_1),\ldots,e_k(y_{M-1}))\in\mathbb{R}^{M-1}$, $k\in\mathbb{N}$, satisfy $\bar e_{k+2M} = \bar e_k$ for all $k\in\mathbb{N}$. Equivalently, $(\bar e_k)_{k=1,\ldots,M-1}$ form an orthonormal basis with respect to the empirical scalar product, and the relations for $(\bar e_k)_{k\ge1}$ follow from the symmetry of the sine. Therefore, observing the field on the discrete spatial grid is equivalent to observing aggregated coefficient processes $U_k$, $k\le M-1$, built from the Fourier modes with indices in the sets $I_k = I_k^+\cup I_k^-$. Since the sets $I_k$ are disjoint for different values of k, the processes $\{U_1,\ldots,U_{M-1}\}$ are independent, which simplifies the calculation of the Fisher information considerably. Based on their spectral densities and Whittle's formula (34) for the asymptotic Fisher information of a stationary Gaussian time series, we obtain the following result for the increment processes $\bar U_k$, $k\le M-1$, defined as the processes of time increments of $U_k$. Hereby, the reparametrization allows for estimation of $\sigma_0^2 = \sigma^2/\sqrt{\vartheta_2}$ with parametric rate based on time increments in the regime $M/\sqrt{N}\to0$, even when $\vartheta_2$ is unknown. We have considered $\bar U_k$ instead of $U_k$ for the technical reason that the N-th order Fourier approximation of the spectral density of the increment process is positive and hence a spectral density as well. We conjecture that the same bound holds for the Fisher information of $U_k$.

Simulations
The following numerical example illustrates the asymptotic results for the estimators derived in Section 4. In order to simulate X on a grid in time and space, we have considered the approximation of the series representation (5) truncated at the first K Fourier modes, where K is a large number. Moreover, the Ornstein-Uhlenbeck processes $u_\ell$ are simulated exactly, exploiting their AR(1)-structure (see the sketch below). Figure 1 shows the resulting Monte Carlo mean squared errors, based on 500 iterations, with $r = \sqrt{N}/M$. Using the same value for r, the simplified double increments estimator for $\sigma^2$ is computed by replacing the normalization $\Phi_\vartheta(\delta,\Delta)$ by $e^{-\kappa\delta/2}\psi_{\vartheta_2}(r)\sqrt{\Delta}$.

As expected, the estimators based on temporal increments only achieve the parametric rate of convergence as long as M is not too large, whereas estimators based on space increments only work well when M is not too small. The estimators based on double increments perform very well throughout any regime depicted in the plot. Even the simplified versions work surprisingly well, although their applicability is only supported by our theory as long as $M\asymp\sqrt{N}$. In particular, the double increments estimator for $\sigma^2$ can barely be distinguished from the simplified one. The theory suggests that the estimators based on space increments or time increments should have a smaller mean squared error than the double increments estimators in the regimes $\sqrt{N}/M\to0$ or $\sqrt{N}/M\to\infty$, respectively. The simulation confirms this effect for time increments, while we would require larger values of M to see the asymptotic behavior for space increments. However, to simulate the spatial increments estimator for large M, a considerably larger value of K turns out to be crucial, since otherwise the statistical bias of the estimator is amplified by a numerical bias.

The above estimators require that all but one of the parameters $(\sigma^2,\vartheta_2,\kappa)$ are known. In the more difficult statistical problem where all parameters are unknown, $\eta = (\sigma^2,\vartheta_2,\kappa)$ can be estimated by $\hat\eta$ from (16) and by $\hat\eta_{v,w}$ from (17). Figure 2 shows their mean squared error, again based on 500 Monte Carlo iterations. For the averaged estimator $\hat\eta_{v,w}$, we set $v = \max(1,\lceil N/M^2\rceil)$ and $w = \max(1,\lceil M/\sqrt{N}\rceil)$, where $\lceil\cdot\rceil$ indicates rounding to the next integer. Since minimizing a functional of the type $\|F(\eta)\|^2$ for some function F on a compact set is a hard numerical task, we have considered the corresponding ridge regression problem, that is, we minimize $\|F(\eta)\|^2 + \lambda\|\eta\|^2$ instead. Regularizing with the squared inverse of the expected rate of convergence, i.e. $\lambda = 1/(N^{3/2}\wedge M^3)$ for $\hat\eta_{v,w}$ and $\lambda = 1/(NM)$ for $\hat\eta$, produced reasonable results, respectively.
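Returning to the simulation of X itself: a minimal sketch of the truncated spectral scheme described at the beginning of this section, for the heat equation with $\vartheta_1 = \vartheta_0 = 0$ (all parameter values, grid sizes and the truncation level K are illustrative choices of ours). Each Fourier mode is a stationary Ornstein-Uhlenbeck process and can be simulated exactly through its AR(1) recursion, after which the field is assembled via the truncated series (5).

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K, theta2, sigma = 1000, 100, 1000, 0.5, 1.0
ell = np.arange(1, K + 1)
lam = theta2 * np.pi ** 2 * ell ** 2       # eigenvalues lambda_l
a = np.exp(-lam / N)                       # AR(1) coefficients e^{-lambda_l Delta}, T = 1
sd = sigma / np.sqrt(2 * lam)              # stationary standard deviations

u = np.empty((N + 1, K))
u[0] = sd * rng.standard_normal(K)         # stationary initial condition
for i in range(N):                         # exact Ornstein-Uhlenbeck transition
    u[i + 1] = a * u[i] + sd * np.sqrt(1 - a ** 2) * rng.standard_normal(K)

y = np.linspace(0, 1, M + 2)[1:-1]                  # interior spatial grid (b = 0)
E = np.sqrt(2) * np.sin(np.pi * np.outer(y, ell))   # e_l(y_k)
X = u @ E.T                                # X[i, k] approximates X_{t_i}(y_k)
```

As noted above, the truncation level K has to be chosen considerably larger when spatial increment statistics for large M are of interest, since otherwise the statistical bias is amplified by a numerical one.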
In contrast to the double increments estimators for single parameters, $\hat\eta$ only produces good results as long as $M\asymp\sqrt{N}$, which is in line with its theoretical foundation. The averaged version $\hat\eta_{v,w}$ works well throughout. Furthermore, we see that it is only possible to profit from an increasing number of spatial observations up to a certain degree. Indeed, for $M\ge\sqrt{N}$ the optimal rate is $N^{-3/2}$ and the Monte Carlo mean squared error does not improve further. Covering also the regime $\sqrt{N}/M\to\infty$ for sufficiently large values of M and N would require costly simulations, which are not part of this simulation study. The Monte Carlo mean squared error of $\hat\eta_{v,w}$ is not everywhere monotonic in M, since the effective sampling frequency ratio $\tilde r$ on the coarser grid where the double increments are computed is only approximately constant throughout the plot. Finally, we remark that our choice of v and w results in $v = w = 1$ for the two smallest values of M and hence the two estimators are the same.
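For completeness, a minimal sketch of the ridge-regularized contrast minimization described above; `F` is an assumed callable returning the vector of moment discrepancies for $\eta = (\sigma^2,\vartheta_2,\kappa)$, and the starting value and choice of optimizer are our own.

```python
import numpy as np
from scipy.optimize import minimize

def ridge_contrast_minimizer(F, eta0, lam):
    """Minimize ||F(eta)||^2 + lam * ||eta||^2, e.g. with the ridge weight
    lam = 1 / min(N**1.5, M**3) for the averaged estimator, as suggested
    in the text."""
    obj = lambda eta: np.sum(np.asarray(F(eta)) ** 2) + lam * np.sum(np.asarray(eta) ** 2)
    return minimize(obj, np.asarray(eta0, dtype=float), method="Nelder-Mead").x
```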

Proofs for the central limit theorems for realized quadratic variations
First, we prove the generic central limit result in Proposition 3.1. Afterwards, we can verify the central limit theorems for realized quadratic variations based on spatial increments (Theorem 3.3) and double increments (Theorem 3.7).
Thus, the asymptotic mean can be expressed in terms of a function f. Owing to its closed form expression in (40) below, we see that $f\in C_b^\infty([0,2])$ and $f(0) = -\frac{1}{4\vartheta_2}$. Hence, for $y = y_k$ we obtain the asymptotic mean $\mathbb{E}(V^{sp}) = \frac{\sigma^2}{2\vartheta_2}+\mathcal{O}(\delta)$ and in particular, the bias is negligible under the condition $N/M\to0$.

Step 2. We calculate the asymptotic variance. By Isserlis' theorem [17] and the symmetry $\mathrm{Cov}(\tilde S_{ik},\tilde S_{jl}) = \mathrm{Cov}(\tilde S_{jk},\tilde S_{il})$, the variance decomposes into the terms $v_1,\ldots,v_4$. We have already identified the contribution of the variance terms. In the sequel, we show that the remaining covariances do not contribute to the asymptotic variance.
For $v_2$ we define $\omega := \vartheta_2(\pi^2\wedge(\pi^2+\Gamma))>0$ such that $\lambda_\ell\ge\omega\ell^2$ for all $\ell$, where the last step follows by Riemann summation. To bound $v_3$ we follow the same strategy as for the mean: since (21) consists exclusively of second order differences, the corresponding covariances can be bounded accordingly. To estimate $v_4$, we deduce from (21) for $k<l$ and $J = |i-j|\ge1$ a bound involving the series $\sum_{\ell\ge1}\frac{e^{-\lambda_\ell t}}{2\lambda_\ell}\cos(\pi\ell y)$.
By Riemann summation we have $f_t(y)\lesssim\sum_{\ell\ge1}e^{-\lambda_\ell t}\lesssim\frac{1}{\sqrt{t}}$. On the other hand, a complementary bound follows by Lemma A.7.
Similarly, corresponding bounds for $f_t(y)$ in terms of $B(t,y)$ can be shown. We conclude the bound on $v_4$, where the last step follows from the previous estimates.

Step 3. To prove asymptotic normality, we interpret the numbers of temporal and spatial observations as sequences $M = M_n$, $N = N_n$ indexed by $n\in\mathbb{N}$ and consider the corresponding triangular array, verifying criterion (7) uniformly in $j<N$, $l<M$. The covariance bounds in Step 2 yield the required estimates uniformly in $j<N$, $k<M$, where we have used the Cauchy-Schwarz inequality to obtain the last bound. It remains to note $N/M\to0$ and $N\Delta\asymp1$.
The proof of Theorem 3.7 is similar to the previous one, but the more complex covariance structure of the double increments has to be taken into account carefully, see Section A.1. The (asymptotic) mean of the realized quadratic space-time variation is provided by Proposition 3.5, which we prove first. In the following, we write $\tilde D_{ik} := e^{\kappa y_k/2}D_{ik}$.
Proof of Proposition 3.5.
Using the cosine series formula (40), we can compute $H_\Delta$ explicitly, from which it easily follows that $H_\Delta(x)\lesssim\Delta$ and $H_\Delta'(x)\lesssim\sqrt{\Delta}$. The derivatives of $G_\Delta$ can be bounded summand-wise, where the bounds follow from the Riemann sum approximations in Lemma A.8, owing to $xh(x)|_{x=0} = x^2h(x)|_{x=0} = 0$.
Step 3. We show the asymptotic expressions in (ii). Due to a Riemann sum argument, we have $\|F_{\vartheta_2}(\cdot,\Delta)\|_\infty\lesssim\sqrt{\Delta}$. In the case $\delta/\sqrt{\Delta}\to0$, Taylor's formula yields an expansion at 0 with some intermediate point $\eta\in[0,\delta]$. We employ the representation $F_{\vartheta_2}(\cdot,\Delta) = H_\Delta + G_\Delta$ from Step 2; since $\sin(0) = 0$, the corresponding boundary terms vanish. Finally, we derive the asymptotic expression for the case $\delta/\sqrt{\Delta}\equiv r$, while $\delta/\sqrt{\Delta}\to r$ can be handled similarly. Since $1-\cos(0) = 0$, Lemma A.9 applies, and it remains to compute the integral. By substituting $\tilde r = r/\sqrt{\vartheta_2}$ we can pass to a parameter-free integral; to compute $h_1$, note the corresponding identity for $S(z)$, and the claim thus follows.

Proof of Theorem 3.7. Asymptotic normality follows just like in the proof of Theorem 3.3, using the notation from the proof of the latter theorem (with space increments replaced by double increments). To determine the asymptotic variances, we have to treat the three different sampling regimes separately.
Case $\delta/\sqrt{\Delta}\to0$. By Lemmas A.1 and A.2 we obtain the asymptotic expressions for the variances and covariances of the double increments; the latter covariances are negligible for the asymptotic variance.

Case $\delta/\sqrt{\Delta}\to\infty$. By Lemmas A.1 and A.3 we obtain the corresponding expansions. Note that the $\mathcal{O}(\Delta^3)$-term is negligible for the asymptotic variance, and the remaining covariances do not contribute to the asymptotic variance by the decay available for $|k-l|\ge2$. The claim is now proved by inserting $\Phi^2_\vartheta(\delta,\Delta) = \frac{4}{\pi\vartheta_2}e^{-\kappa\delta}\Delta + o(\Delta)$.

Case $\delta/\sqrt{\Delta}\equiv r\in(0,\infty)$. We show that the asymptotic variance is given by the claimed expression. Since each term $\xi^\Delta_{J,L}$ is a Riemann sum multiplied by $\sqrt{\Delta}$, the limit exists for $J,L\ge0$, and by symmetry of the cosine it also holds for negative L. Hence, we can write the limit for all $L\in\mathbb{Z}$ and $J\ge0$ with G from (26). By dominated convergence and taking Cesàro limits twice, we conclude the convergence as $M,N\to\infty$, and rescaling by $\vartheta_2$ yields the claimed asymptotic variance.

Proofs for the estimators
Propositions 4.1 and 4.2 follow immediately from the central limit theorems for the realized quadratic variations and the delta method. Before proving Theorem 4.3, we introduce some notation that will be used throughout the proof and state the asymptotic covariance matrix explicitly. Recall the definition of $\Lambda_{i,k}(\cdot)$ from (25) for any $i,k\in\mathbb{Z}$. We will prove that the asymptotic covariance matrix equals $V^{-1}UV^{-1}$, where $U = U(\eta)$ and $V = V(\eta)$ are defined below.

Proof of Theorem 4.3. The proof uses the classical theory on minimum contrast estimators, see e.g. [9]. In particular, the mean value theorem yields the standard representation as soon as $[\hat\eta,\eta]\subset H$, where $\dot K_{N,M}$ and $\ddot K_{N,M}$ denote gradient and Hessian with respect to η, respectively. In the sequel, we will verify that $K_{N,M}$ is associated with the contrast function (Steps 1-2), show consistency of $\hat\eta$ (Step 3), prove asymptotic normality of $\dot K_{N,M}(\eta)$ with covariance matrix U (Steps 4-7) and deduce stochastic convergence of $\int_0^1\ddot K_{N,M}(\eta+\tau(\hat\eta-\eta))\,\mathrm{d}\tau$ to the invertible matrix V (Steps 8-9). The result then follows from Slutsky's lemma.

Step 1. We show that K is a contrast function in the sense that for each η the function $\tilde\eta\mapsto K(\eta,\tilde\eta)$ attains its unique minimum in $\tilde\eta = \eta$.
Equality holds if and only if $\kappa = \tilde\kappa$ and $\sigma^2\psi_{\vartheta_2}(r_i) = \tilde\sigma^2\psi_{\tilde\vartheta_2}(r_i)$ for $i = 1,2$. Therefore, in order to prove identifiability, it is sufficient to show that $\vartheta_2\mapsto\psi_{\vartheta_2}(r_1)/\psi_{\vartheta_2}(r_2)$ is injective, which in turn is implied by strict monotonicity of $H(r_1z)/H(r_2z)$ in $z>0$. We show that the corresponding derivative or, equivalently, the function $z\mapsto H'(r_1z)H(r_2z)r_1 - H'(r_2z)H(r_1z)r_2$ is strictly negative for all $z>0$: For $x>0$ define $p(x) = \int_x^\infty e^{-z^2}\,\mathrm{d}z$ and $q(x) = 1-e^{-x^2}$. A simple calculation shows that
$$H'(r_1z)H(r_2z)r_1 - H'(r_2z)H(r_1z)r_2 = \frac{32}{\pi}\,r_1r_2z\,\big(p(r_1z)q(r_2z)r_1z - p(r_2z)q(r_1z)r_2z\big),$$
which is strictly negative if we can show that $p(b)q(a)b - p(a)q(b)a < 0$ for all $0<a<b$. Now, a substitution yields $p(x) = x\int_1^\infty e^{-x^2t^2}\,\mathrm{d}t$ and $q(x) = 2x^2\int_0^1 se^{-x^2s^2}\,\mathrm{d}s$ and therefore, the claim follows from negativity of the integrand.
In the sequel we follow the series of arguments from Theorem 5.1 of [3].
Step 2. K is the contrast function associated with the process $K_{N,M}$ in the sense that
$$\xi^\Delta_{i,k} = \mathcal{O}\Big(\frac{\sqrt{\Delta}}{(|i|+1)(|k|+1)}\Big) \tag{31}$$
and $K_{N,M}(\eta,\tilde\eta)$ converges to $K(\eta,\tilde\eta)$ in probability. Decomposing $K^1_{N,M}$, we can write it as a sum of three terms. Clearly, the first summand converges to $K^1(\eta,\tilde\eta)$. To prove that the other two summands are negligible, note the corresponding moment bounds. By Markov's inequality and boundedness of $\phi(\cdot) = f^1_\eta(\cdot) - f^1_{\tilde\eta}(\cdot)$, we have for any $\varepsilon>0$ that the second summand in (32) converges to zero in probability. For the third summand the same conclusion holds since $L^1$-convergence implies convergence in probability. $K^2_{N,M}$ can be handled similarly by considering a decomposition into two sums of non-overlapping increments.

Step 3. Consistency of $\hat\eta$ follows from uniform convergence in probability of the contrast process.
The same argument applies to $K^2_{N,M}$ and the result follows.
Step 5. We show that the asymptotic covariance matrix of $\sqrt{NM}\,\dot K_{N,M}(\eta)$ is given by U: We compute $\dot K^1_{N,M}(\eta)$ and similarly $\dot K^2_{N,M}(\eta)$. From Isserlis' theorem and (30), for any $1\le e,f\le3$, the first summand in the expansion can be identified. Like in the proof of Theorem 3.7, the covariances may be replaced by their asymptotic expressions due to dominated convergence. Further, using $(h^i_\eta)_e(z) = e^{-\kappa z}(g^i_\eta)_e(z)$ and Step 4, we obtain the corresponding limits. Analogous computations for the remaining terms and insertion into (33) yield the claimed asymptotic covariance matrix.
Step 6. U is strictly positive definite: It is sufficient to show that $C_r<\sqrt{A_rB_r}$; then the claim follows for any direction, where we may assume $\langle H^1_\alpha, H^2_\alpha\rangle<0$ since otherwise $\alpha^\top U\alpha>0$ follows immediately from the first equality. Now, consider $(A^r_{i,k})$ and $(B^r_{i,k})$ as elements in the Hilbert space $\ell^2$ of square summable sequences; a direct calculation shows that $C_r = \langle(A^r_{i,k}),(B^r_{i,k})\rangle_{\ell^2}$. Thus, by the Cauchy-Schwarz inequality we have $C_r\le\sqrt{A_rB_r}$, and equality is ruled out by the fact that $(A^r_{i,k})$ and $(B^r_{i,k})$ are not linearly dependent.
Step 7. In view of the Cramér-Wold device, we establish asymptotic normality of $\alpha^\top\sqrt{NM}\,\dot K_{N,M}(\eta)$ for any $\alpha\in\mathbb{R}^3$. Let $s_{ik}$ and $Z_{ik}$ be given by the corresponding decomposition, where $s_{ik}\in\{-1,1\}$ is deterministic. Analogously, define $\tilde s_{ik}$ and $Z^2_{i,k}$. Then, $Z_{N,M} = (Z_{ik},\tilde Z_{j,l})_{i,j,k,l}$ is a Gaussian vector, and boundedness of the weights follows from Proposition 3.5. From Steps 5 and 6 we can deduce that $\mathrm{Var}(S_{N,M})\to\alpha^\top U\alpha>0$ as $N,M\to\infty$ and thus, in view of criterion (7), asymptotic normality follows if the absolute row sums of the covariance matrix of $Z_{N,M}$ vanish uniformly. This in turn is a simple consequence of (30) and (31).
Step 8. In order to prove the convergence of the Hessian, we argue for $\ddot K^1_{N,M}$ and analogously for $\ddot K^2_{N,M}$. By using $P_\eta(\eta_{N,M}\in H)\to1$ and the uniform continuity of $f^i_\eta(z)$ and its derivatives in the parameter $(z,\eta)\in[0,1]\times H$, it is straightforward to show that $\ddot K_{N,M}(\eta_{N,M})-\ddot K_{N,M}(\eta)$ vanishes asymptotically. Clearly, the first summand of $\ddot K^1_{N,M}(\eta)$ converges to $2V^1$, while the calculations of Step 2 show that the second summand converges to 0 in probability. The same reasoning holds for $\ddot K^2_{N,M}(\eta)$ and the result follows.
Step 9. V is strictly positive definite: Being Gram matrices, $V^1$ and $V^2$ are positive semi-definite and consequently, the same holds for V. Clearly, the only way V can be singular is if there exists $\alpha\ne0$ such that the corresponding linear combination vanishes for both $i\in\{1,2\}$. From the particular form of the functions $(g^i_\eta)_e$ it is apparent that this would imply $\alpha_1\psi_{\vartheta_2}(r_i)+\alpha_2\sigma^2\frac{\partial\psi_{\vartheta_2}(r_i)}{\partial\vartheta_2}=\alpha_3=0$ for both $i\in\{1,2\}$, which is impossible.
Proof of Proposition 4.5. We have to prove the corresponding central limit theorem for the averaged estimator. Similar calculations as in Theorem 4.3 show that Steps 1-3 and 8-9 of the corresponding proof remain valid. Consequently, we have the analogous representation, where the limit of the Hessian is an invertible deterministic matrix; in particular, the relevant set has asymptotically full probability. The second summand becomes arbitrarily small as $M,N\to\infty$. For the first summand, let $\gamma(\eta) = \|V(\eta)^{-1}\|_2+1$; then the claim follows from Markov's inequality.

Proofs of the lower bounds
Before we prove Theorem 5.1, we verify its ingredients Proposition 5.3 and Proposition 5.5.
Proof of Proposition 5.3. By setting $a = k^2$, $\mu = \pi^2\vartheta_2$ and $\nu^2 = \sigma^2/(\pi^2\vartheta_2)$ in Lemma A.4 and using independence of $(u_\ell, \ell\in\mathbb{N})$, we get the Fisher information matrix I for the parameters $(\mu,\nu^2)$. The Fisher information matrix $J = J_{M,N}$ for the parameters $(\sigma^2,\rho^2)$ can be computed via the change of variables formula $J = A^\top IA$, where
$$A = \begin{pmatrix}\pi^2/\rho^2 & -\pi^2\sigma^2/\rho^4\\ 0 & 1/\pi^2\end{pmatrix}$$
is the Jacobian of the function transforming $(\sigma^2,\rho^2)$ to $(\mu,\nu^2)$. Hence, the diagonal entries of J can be computed explicitly. If $M\sqrt{\Delta}$ is bounded away from 0, then $I_{11}$ can be interpreted as a Riemann sum. On the other hand, if $M\sqrt{\Delta}\to0$, it follows from Lemma A.10 and $g_{11}(0) = \frac{1}{2\mu^2} = \frac{\rho^4}{2\pi^4\sigma^4}$, $g_{12}(0) = \frac{1}{2\mu\nu^2} = \frac{1}{2\sigma^2}$ as well as $g_{11}'(0) = g_{12}'(0) = 0$ that the leading terms in $J_{22}$ cancel, and the claimed bound follows.

Proof of Proposition 5.5. For a discrete time, centered, stationary Gaussian process $(Z_j)_{j\in\mathbb{Z}}$ whose covariance function depends on an unknown parameter $\theta\in\mathbb{R}$, we denote the Fisher information of a sample $(Z_0,\ldots,Z_{n-1})$ with respect to θ by $I_n(Z)$. A particularly useful result to calculate $I_n(Z)$ for the above class of Gaussian processes is given by Whittle [32]:
$$\lim_{n\to\infty}\frac{I_n(Z)}{n} = \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big(\frac{\partial_\theta f_\theta(\omega)}{f_\theta(\omega)}\Big)^2\,\mathrm{d}\omega, \tag{34}$$
where $f_\theta$ is the spectral density of Z. Setting $\theta = \pi^2\vartheta_2$, (34) cannot be directly applied to the process $Z = \bar U_k$ for $1\le k\le M-1$, since $\bar U_k$ arises from high-frequency increments of the continuous time process $U_k$. In this case, the spectral density $\Phi^\Delta_k$ of $\bar U_k$ hinges on $\Delta = 1/N$ and therefore, even for large N, $I_N(\bar U_k)/N$ is not necessarily close to the asymptotic Fisher information defined in (34).
In order to verify (38), we only have to consider the integral over $[0,\pi]$ by symmetry. From Lemma A.6 we can deduce the required bound for $\omega\ge k^2\Delta$; for $\omega\le k^2\Delta$, Lemma A.6 together with a substitution yields the complementary bound. We can now conclude the main lower bound.

Proof of Theorem 5.1. The proof of the lower bound relies on the fact that if $(P_\gamma)_{\gamma\in G}$ is a dominated family of distributions with a convex parameter space $G\subset\mathbb{R}$, then the Hellinger distance H can be bounded in terms of the Fisher information J: Let ν be a dominating measure, $p(\cdot,\gamma) = \mathrm{d}P_\gamma/\mathrm{d}\nu$ and $g = \sqrt{p}$. Then, as shown in [16, Theorem I.7.6], Jensen's inequality yields the desired bound. Combining this bound of the Hellinger distance (in the setting of Theorem 5.1) with Theorem 2.2 by Tsybakov [30], it suffices to show that for each sampling regime there is a reparametrization $(\gamma_1,\gamma_2)$ of $(\sigma^2,\vartheta_2)$ such that the corresponding Fisher information is suitably bounded.

(i) If $\min(M,N)$ remains finite and $M/\sqrt{N}\gtrsim1$, then N necessarily remains finite and the result follows from (ii). On the other hand, if $M/\sqrt{N}\to0$, then M must remain finite. Like in the proof of (ii), extend the set of spatial locations to $\{z_k, k<qM\}$ and consider the corresponding processes $U_k$, $k = 1,\ldots,qM-1$, from (19). A similar calculation as in the proof of Proposition 2.3 shows that for any $k<qM$, the laws of the independent continuous processes $\{U_k(t), t\le1\}$ are absolutely continuous for different parameter values $(\sigma^2,\vartheta_2)$ and $(\tilde\sigma^2,\tilde\vartheta_2)$ as long as $\sigma^2/\sqrt{\vartheta_2} = \tilde\sigma^2/\sqrt{\tilde\vartheta_2}$, and hence consistent estimation of $(\sigma^2,\vartheta_2)$ based on continuous or discrete observations is impossible, where $h_{(\sigma^2,\vartheta_2)}$ is defined in the proof of Proposition 2.3. Now, a Riemann sum midpoint approximation, cf. Lemma A.9, shows the required asymptotics as $u\to\infty$. Since $h_{(\sigma^2,\vartheta_2)}$ is symmetric around 0, we obtain the claimed identity, from which equivalence follows as in Proposition 2.3.
The claimed formulas now follow by inserting the closed expressions for $x\in[0,1]$ and again applying (39) and $\sinh(\alpha)\sinh(\beta) = \frac{1}{2}(\cosh(\alpha+\beta)-\cosh(\alpha-\beta))$, respectively. To prove the second statement we use the ansatz $Z(x) = u(x)B(v(x))$ with u, v positive and v non-decreasing, which is the general form of a Gaussian Markov process, cf. [24]. Comparison of covariance functions yields explicit expressions for u and v. Further, $u(x)B(v(x))$ vanishes at $x = 0$ for $v(0) = 0$, and the claimed semi-martingale representation follows from Itô's formula.
Proof of Proposition 2.3. The necessity of the conditions on the parameters follows from the fact that (i) the parameter $\sigma^2/\sqrt{\vartheta_2}\,e^{-\kappa x_0}$ may be consistently estimated using time increments, see [3], and (ii) the parameters $\sigma^2/\vartheta_2$ and κ may be consistently estimated by computing the quadratic variation of the process $x\mapsto X_t(x)$ on two different sub-intervals of [0,1] in view of Proposition 2.1. It remains to prove sufficiency of the conditions on the parameters: (i) is a simple consequence of [19, Proposition 1]: Set $\lambda_\ell = \vartheta_2(\pi^2\ell^2+\Gamma)$ and $\tilde\lambda_\ell = \tilde\vartheta_2(\pi^2\ell^2+\tilde\Gamma)$, where Γ and $\tilde\Gamma$ are finite. Thanks to (i) and due to the one to one correspondence between Γ and $\vartheta_0$ we may assume $\Gamma = \tilde\Gamma = 0$ for the remainder of the proof.
(ii) follows from the fact that $\mathrm{Cov}(X_{t_0}(x), X_{t_0}(y))$ only depends on $\sigma^2/\vartheta_2$ and κ, in view of the Gaussianity of X.

A.1. Covariances of double increments
The following three lemmas are used to calculate the asymptotic variance of V. Recall the definition of $\tilde D_{ik}$ from (23).
Hence, as in previous lemmas it is sufficient to establish the corresponding bound. For J = 0 this was already proven in Proposition 3.5. The case J = 1 follows from the case J = 0 since (41) shows the required identity. For $J\ge2$ we argue again via a Riemann sum approximation with lag $(J-1)\Delta$. The bound on the first derivative is provided by Lemma A.7, i.e. $\|h_J'\|_\infty\lesssim J^{-3/2}$. In view of Lemma A.8 this shows the claim.

Lemma A.2. For $J\in\mathbb{N}_0$ and $z\in(0,2)$ the stated expansions hold.

Proof. (i) The validity for the case J = 0 follows from the proof of Proposition 3.5 (ii), the case J = 1 follows from (42). For $J\ge2$ we have by Taylor's theorem an expansion with some intermediate point $\xi\in[0,\delta]$. Now, the claim is proved by inserting $F_{J,\Delta}(0) = 0$ and using (43). (ii) As in previous lemmas it suffices to establish the analogous bound; since $g_J(0) = 0$, this follows from Lemma A.9. Finally, (ii) is a direct consequence of (i).

A.2. Auxiliary results for the lower bounds
For the proofs of Propositions 5.3 and 5.5 we require the following auxiliary lemmas.
Lemma A.4. Consider a discrete sample $(u(i\Delta), i = 0,\ldots,N)$ of an Ornstein-Uhlenbeck process given by
$$\mathrm{d}u(t) = -\mu a\,u(t)\,\mathrm{d}t + \nu\sqrt{\mu}\,\mathrm{d}\beta(t)$$
and assume $\Delta = 1/N$. Then the Fisher information $I = I_N$ for the parameter $(\mu,\nu^2)$ can be computed explicitly.

Proof. By the Markov property of u, the log-likelihood function of $(\mu,\nu^2)$ is given by the sum of the log-transition densities, where
$$p_t(x,y) = \frac{1}{\sqrt{\pi\nu^2(1-e^{-2\mu at})/a}}\,\exp\Big(-\frac{(y-xe^{-\mu at})^2}{\nu^2(1-e^{-2\mu at})/a}\Big)$$
is the transition density of u and $\pi_0$ is the density of the initial distribution $\mathcal{N}(0,\frac{\nu^2}{2a})$. By stationarity of u, the Fisher information simplifies accordingly, where we write $D^2g$ for the Hessian of a function g. This expression can be computed explicitly, yielding the claimed formulas.
(ii) can be shown by writing $G(\cdot,\omega)$ as a sum of monotonic functions and noting that for a monotonic function $g:\mathbb{R}_+\to\mathbb{R}$ it holds that $\|g'\|_{L^1} = |\lim_{x\to\infty}g(x) - \lim_{x\to0}g(x)|$.

A.3. Bounds on Fourier series and Riemann summation
The Lemmas in this section provide bounds for Fourier series and Taylor expansions for Riemann sums. Similar results are stated in Lemma 7.2 of [3].
Lemma A.7. Let $(a_n)$ be a real sequence and $\tau\in\{\sin,\cos\}$. Then
$$\Big|\sum_{k=1}^N a_k\tau(ky)\Big| \le \frac{1+2K_N}{y\wedge(2\pi-y)}\,\sup_{n\le N}|a_n|$$
holds for any $y\in(0,2\pi)$, where $K_N$ is the number of monotone sections of $(a_n)_{1\le n\le N}$.