Smoothness estimation of nonstationary Gaussian random fields from irregularly spaced data observed along a curve

Abstract: This article considers estimating the smoothness parameter of a class of nonstationary Gaussian random fields on R^d using irregularly spaced data observed along a curve. The class of covariance functions includes a nonstationary version of the Matérn covariance function as well as the isotropic Matérn covariance function. Smoothness estimators are constructed via higher-order quadratic variations. Under mild conditions, these estimators are shown to be strongly consistent, and upper bounds on their convergence rates are established with respect to fixed-domain asymptotics. Simulations indicate that the proposed estimators perform well for moderate sample sizes.

The aim of this article is to estimate ν in (5) using a sample of observations X(t_1), . . . , X(t_n), where the design sites t_i, 1 ≤ i ≤ n, are irregularly spaced on a sufficiently smooth curve γ : [0, L] → R^d for some constant L > 0. Selecting the design sites on a (1-dimensional) curve in R^d is commonly called curved line transect sampling. This is a generalization of the usual line transect sampling (cf. Chapter 17 of Thompson [27]). Hiby and Krishna [12] presented convincing arguments for the use of curved line transect sampling, replacing the straight line in line transect sampling by a curve. Constantine and Hall [8] wrote that "in a variety of practical problems often data are only available through one-dimensional line transect 'samples' of the surface". Adler and Pyke [1] considered first-order quadratic variations using design sites on a curve in [0, 1]^2 when the underlying Gaussian random field is in some sense 'like' Brownian motion on R^2. Loh [20] discussed second-order quadratic variations from a sample of Gaussian random field observations taken along a smooth curve in R^2. However, the supporting theory behind curved line transect sampling appears to be rather underdeveloped, possibly because the theoretical extension from line transects to curved line transects is nontrivial. We hope to work out a theoretical justification for curved line transect sampling in the setting of this article.
Even though the likelihood is Gaussian, likelihood methods (such as maximum likelihood estimation) appear to be analytically intractable under fixed-domain asymptotics. The latter asymptotics imply that, as the sample size n → ∞, the n sites become increasingly dense in a compact set in R^d. In this article, the dependence among the observations X(t_1), . . . , X(t_n) remains strong as n → ∞. Furthermore, since |X(t_i) − X(t_j)| → 0 as t_i − t_j → 0, the n × n covariance matrix of the observations tends to a singular matrix and its determinant tends to 0. This is exacerbated by the t_i's being irregularly spaced. All this indicates that any theoretical analysis of the MLE for ν is a formidable (or even intractable) task with respect to fixed-domain asymptotics. Indeed, as far as we know, the consistency of the MLE for ν is still an open problem.
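The determinant degeneracy described above can be checked numerically in a simple special case: a unit-variance process with exponential covariance (Matérn with ν = 1/2) at equispaced sites in [0, 1], whose Markov structure yields a closed-form determinant. This is an illustrative sketch under these simplifying assumptions, not part of the paper's nonstationary, irregularly spaced setting:

```python
import math

def exp_cov_det(n):
    """Determinant of the covariance matrix of a unit-variance process with
    exponential covariance exp(-|s - t|) (Matern, nu = 1/2) observed at n
    equispaced sites in [0, 1].  The process is Markov, so the determinant
    factorizes as prod_{i=2}^{n} (1 - rho^2) with rho = exp(-1/(n-1))."""
    rho = math.exp(-1.0 / (n - 1))
    return (1.0 - rho * rho) ** (n - 1)

# As the sites densify, the determinant collapses toward 0 -- the
# fixed-domain degeneracy that makes likelihood analysis so delicate.
```

Already at n = 100 the determinant is astronomically small, which illustrates why direct likelihood computations become numerically and analytically fragile as the design densifies.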
This article is motivated by [20], where the idea of using higher-order quadratic variations V_{θ,ℓ}, θ ∈ {1, 2}, ℓ ∈ Z_+, to construct smoothness estimators is proposed for irregularly spaced data. However, [20] considers stationary, isotropic Gaussian random fields, whereas this article is concerned with nonstationary Gaussian random fields. Convergence rates are not available in [20], whereas upper bounds on the convergence rates of the proposed smoothness estimators are established here. The results in this article complement those in [21] with regard to the estimation of the smoothness parameter of an isotropic Gaussian random field with a Matérn covariance function. The difference lies in the choice of the design sites. While this article is concerned with design sites chosen on a curve, [21] chooses design sites randomly on [0, 1]^d. Consequently, if d ≥ 2, the results of [21] are not applicable to this article.
Quadratic variations started with [19], and the area is currently rather active; some examples are [1,2,5,17]. However, most higher-order quadratic variations in the literature use data on a regular grid in R^d; cf. [7,15] and the references cited therein.
The estimation of the smoothness parameter ν of a stationary Gaussian random field has been addressed in the literature under various conditions by many authors. [14] proposes a semiparametric method of estimating ν using irregularly spaced observations. The estimators in [14] appear to be analytically intractable under fixed-domain asymptotics. Furthermore, in its simulations [14] uses 200 independent realizations of a Gaussian random field, whereas this article is concerned with estimating ν based on observations from a single realization of the underlying Gaussian random field.
In the case of equally spaced data on an interval of the real line, [10] considers a box-counting estimator while [8,16] study estimators based on process increments. [8,10,16] all assume that ν ∈ (0, 1). Another example with equally spaced data on an interval is [15], where higher-order quadratic variations are used to construct a consistent estimator of ν given that ν ∈ (D, D + 1) for some known integer D. In contrast, this article assumes only that ν > 0; a known upper bound for ν is not required.
Finally, [6] considers smoothness estimation for a class of locally stationary Gaussian processes using irregularly spaced data observed on a compact interval in R. Assuming that the smoothness parameter ν ∉ Z_+, [6] proposes estimators for ⌊ν⌋ and ν − ⌊ν⌋ and proves, in the almost sure sense, an O(n^{−1/2} log^a(n)) convergence rate for estimating ν − ⌊ν⌋, where a > 0 is some constant. The design sites in [6] are deterministic, like those in [20], and the results do not apply to random designs.
The remainder of this article is organized as follows. Section 2 contains remarks on notation and preliminary technical results that are needed in the sequel.
For θ ∈ {1, 2} and ℓ ∈ Z_+, Section 3 presents the construction of the ℓth-order quadratic variations V_{θ,ℓ} for stratified design sites. Theorem 1 proves a number of fixed-domain asymptotic results on V_{θ,ℓ}. These results are needed for the construction of the smoothness estimators for ν in this section. Two estimators ν̂_{n,ℓ} and ν̂_n are proposed. Theorems 2 and 3 prove the strong consistency of ν̂_{n,ℓ} and ν̂_n and establish upper bounds on the convergence rates of these estimators.
Section 4 adapts the results on stratified design in Section 3 to random design, i.e. the t_i's are i.i.d. random variables on the curve segment γ : [0, L] → R^d.
Section 5 considers the case where the design sites t_1, . . . , t_n are deterministic points on the curve segment γ : [0, L] → R^d whose relative spacing is governed by a nonrandom strictly increasing mapping ϕ ∈ C^2(R). Two estimators ν̂^D_{n,ℓ} and ν̂^D_n for ν are proposed. Theorems 6 and 7 prove the strong consistency of ν̂^D_{n,ℓ} and ν̂^D_n and establish convergence rate upper bounds for these estimators. In particular, under mild conditions, E|ν̂^D_n − ν| = O(n^{−1/2}) as n → ∞.
Section 6 presents Monte Carlo simulations to study the finite-sample accuracy of the smoothness estimators ν̂_n and ν̂^D_n. Since the random design can be reduced to a stratified design, the simulations are carried out only for the stratified design and the deterministic design.
Appendices A to F contain the proofs of all the results in this article.

Some preliminary results
For a function f : R^k → R, k ∈ Z_+, we write f^{(u_1,...,u_k)}(x) for the partial derivative ∂^{u_1+···+u_k} f(x)/(∂x_1^{u_1} · · · ∂x_k^{u_k}) if the latter exists and its value does not depend on the order of differentiation, where u_1, . . . , u_k are nonnegative integers. For M ∈ Z_+, let C^M(S) be the set of functions f : S → R that are M times continuously differentiable (i.e. all Mth-order partial derivatives of f exist and are continuous). ⌊.⌋ and ⌈.⌉ denote the greatest integer function and the least integer function respectively. a_n ≍ b_n means 0 < lim inf_{n→∞} a_n/b_n ≤ lim sup_{n→∞} a_n/b_n < ∞, and a_n ∼ b_n means lim_{n→∞} a_n/b_n = 1. If A is a matrix, then A' is its transpose, and if A is a square matrix, |A| denotes its determinant. For x, y ∈ R, x ∧ y = min{x, y} and x ∨ y = max{x, y}. Conditions 1 and 2 below will be needed in the sequel. Condition 1. X(t), t ∈ R^d, is a Gaussian random field with covariance function K(., .) as in (5) and mean function m(t) = EX(t), where m(.) ∈ C^N(R^d) for some integer N ≥ 2ν + 6.
for all x, y ∈ D and for all x, y ∈ D satisfying x ≠ y.

Proposition 1. Condition 2 is satisfied for the covariance function
In particular, Condition 2 holds for an isotropic Matérn covariance function.
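For a concrete reference point, the isotropic Matérn covariance has closed forms at half-integer smoothness under the parametrization K(h) = σ² 2^{1−ν}(αh)^ν K_ν(αh)/Γ(ν). The sketch below is ours, evaluates only ν ∈ {1/2, 3/2, 5/2} to avoid a Bessel-function dependency, and merely borrows the parameter names σ², α, ν from the paper:

```python
import math

def matern(h, sigma2=1.0, alpha=1.0, nu=0.5):
    """Isotropic Matern covariance at lag h >= 0; closed forms for
    half-integer smoothness nu (the general case needs the modified
    Bessel function K_nu)."""
    if h == 0.0:
        return sigma2
    x = alpha * h
    if nu == 0.5:                       # exponential covariance
        return sigma2 * math.exp(-x)
    if nu == 1.5:
        return sigma2 * (1.0 + x) * math.exp(-x)
    if nu == 2.5:
        return sigma2 * (1.0 + x + x * x / 3.0) * math.exp(-x)
    raise NotImplementedError("only nu in {0.5, 1.5, 2.5} implemented")
```

Larger ν gives a smoother field: near h = 0 the ν = 5/2 covariance is flat (twice differentiable at the origin) while the ν = 1/2 covariance has a kink, and it is exactly this local behavior that quadratic variations detect.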
This article assumes that the observations of X are taken along a fixed curve γ in R^d and that γ(.) satisfies Condition 3 below. Condition 3. The curve γ : R → R^d is a simple C^N-curve parametrized by its arc length for some integer N ≥ 2ν + 6. In particular, (ii) writing γ(t) = (γ_1(t), . . . , γ_d(t))' and its kth derivative γ^(k)(t) = (γ_1^(k)(t), . . . , γ_d^(k)(t))',
be as in (9). Then we have ρ̃_0(x, y) ∈ C^{2⌈ν⌉+1}(R^2) and, for M ≤ N, for all x, y ∈ [0, L]. We end this section with Proposition 3 below on the smoothness of the Gaussian random fields X(.) and X(γ(.)). Definition 1. Following Section 3.1 of [9], two random fields X_t and Y_t defined on a common probability space and indexed by a common set T are said to be equivalent versions of each other if, for every fixed t ∈ T, P(X_t = Y_t) = 1. (i) X(γ(t)), t ∈ R, has a jth-order mean square derivative if and only if j < ν.
(ii) For any bounded open interval T_1 ⊂ R, X(γ(t)) restricted to t ∈ T_1 has an equivalent version which possesses, with probability 1, a C^{⌈ν⌉−1}(T_1) sample path. (iii) X(t), t ∈ R^d, has all jth-order mean square partial derivatives if and only if j < ν. (iv) For any bounded open set T ⊂ R^d, X(t) restricted to t ∈ T has an equivalent version which possesses, with probability 1, a C^{⌈ν⌉−1}(T) sample path.

Stratified design
Sections 3 and 4 assume that the curve segment γ : [0, L] → R^d is known. Let the observed sample be X(γ(t_{n,1})), . . . , X(γ(t_{n,n})), where 0 ≤ t_{n,1} < . . . < t_{n,n} ≤ L. For brevity, we write t_i = t_{n,i} and X_i = X(γ(t_{n,i})). Denote by ‖γ(t_i) − γ(t_j)‖ the Euclidean distance between γ(t_i) and γ(t_j). We consider the following stratified design, where t_i satisfies the design relation with 0 ≤ δ_i < 1, i = 1, . . . , n. In this section, the δ_i's are assumed to be (nonrandom) constants, though they may vary with n.
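A common way to realize such a stratified design, and the form this sketch assumes, is t_i = (i − 1 + δ_i)L/n, i.e. one site per stratum [(i−1)L/n, iL/n):

```python
import random

def stratified_sites(n, L=1.0, deltas=None, rng=None):
    """Stratified design: one site t_i = (i - 1 + delta_i) * L / n in each
    stratum [(i-1)L/n, iL/n), with 0 <= delta_i < 1.  (Assumed form of the
    design relation, for illustration only.)"""
    rng = rng or random.Random(0)
    if deltas is None:
        # the case of independent uniform delta_i's used in the simulations
        deltas = [rng.random() for _ in range(n)]
    return [(i + deltas[i]) * L / n for i in range(n)]
```

With δ_i ≡ 0 the design is the regular grid; random δ_i's perturb each site within its own stratum, so neighboring sites can never coalesce or drift more than 2L/n apart.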

Higher-order quadratic variations
This section introduces a new class of higher-order quadratic variations that are needed in the construction of smoothness estimators for ν. This is accomplished by using these higher-order quadratic variations to filter out the asymptotic contributions of ν. Let ω_n be a positive integer such that ω_n ≍ n^ξ for some constant ξ ∈ (0, 1). Define, for ℓ ∈ Z_+ and θ ∈ {1, 2}, the ℓth-order quadratic variation V_{θ,ℓ} based on X_1, . . . , X_n. The properties of V_{θ,ℓ} depend crucially on Lemma 1, and this lemma is the motivation for using the term "ℓth-order" for V_{θ,ℓ}.
where we use the convention 0^0 = 1. Lemma 1 goes back to at least [26]; see also Section 1.2.3, problem 33, of [18]. For t > 0, define the relevant quantities and denote the (n − ω_nθℓ) × 1 vector accordingly. Denote by ‖.‖_2 and ‖.‖_F the spectral norm and the Frobenius norm of matrices respectively. Then, where Z ∼ N_{n−ω_nθℓ}(0, I). Theorem 1 is crucial to the construction of the estimators for ν in Section 3.2.
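Lemma 1 is the classical binomial identity Σ_{k=0}^{ℓ} (−1)^k C(ℓ,k) k^j = 0 for j = 0, . . . , ℓ−1 (with the convention 0^0 = 1), while the sum equals (−1)^ℓ ℓ! for j = ℓ; it is what lets an ℓth-order difference filter annihilate polynomial trends of degree below ℓ. A quick numerical check (our own sketch, not the paper's code):

```python
from math import comb, factorial

def diff_coeffs(l):
    """Coefficients (-1)^k * C(l, k), k = 0, ..., l, of the l-th order
    difference filter underlying l-th order quadratic variations."""
    return [(-1) ** k * comb(l, k) for k in range(l + 1)]

def moment(l, j):
    """sum_k (-1)^k C(l, k) k^j as in Lemma 1 (Python's 0**0 == 1 matches
    the convention)."""
    return sum(c * (k ** j) for k, c in enumerate(diff_coeffs(l)))
```

Because the filter kills monomials of degree < ℓ and returns (−1)^ℓ ℓ! at degree ℓ, squared filtered increments isolate the roughest part of the field, which is the part carrying information about ν.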
(a) Suppose ν < ℓ. Then, and for all s > 0, there exist constants C, C_1 such that

Estimating the smoothness parameter ν
Motivated by Theorem 1(a), we now construct a consistent estimator ν̂_{n,ℓ} for the smoothness parameter ν, assuming ν ≤ ℓ for some known ℓ ∈ Z_+. After that, we introduce the estimator ν̂_n, which no longer relies on a known upper bound for ν. Let ν̂_{n,ℓ} be defined from the quadratic variations V_{θ,ℓ}. Theorem 2 shows that ν̂_{n,ℓ} is a strongly consistent estimator for ν provided ν < ℓ. We now propose the estimator ν̂_n, which is of more practical interest as its computation does not require a known upper bound for ν. ν̂_n is motivated by a construction in [6]. Let M_n be a positive integer such that M_n = o(n/ω_n) and M_n → ∞ as n → ∞. Define, for ℓ = 1, . . . , M_n, the events Ω_ℓ, and set ℓ_0 = j if Ω_j occurs, where j ∈ {0, . . . , M_n}. ℓ_0 is a well-defined integer-valued random variable as Ω_0, . . . , Ω_{M_n} form a partition of the sample space. Define ν̂_n as ν̂_{n,ℓ_0}, where ν̂_{n,0} = M_n.

Theorem 3. Suppose Conditions 1 to 3 hold. Then ν̂_n → ν almost surely and
as n → ∞, uniformly over δ_i ∈ [0, 1), 1 ≤ i ≤ n. By taking ω_n ≍ n^{1/3}, we obtain a convergence rate of O(n^{−1/3}). The O(.)'s in Section 3 are uniform over δ_i ∈ [0, 1), 1 ≤ i ≤ n. Hence the theorems in this section hold when the δ_i's are random and independent of X(.).

Random design
Let t_1, . . . , t_n be a sequence of i.i.d. random variables, where t_1 has probability density function p(t), t ∈ [0, L], satisfying inf_{t∈[0,L]} p(t) = p_0 > 0 for some unknown p_0. We assume that t_1, . . . , t_n are independent of the Gaussian random field X(.). Set n_τ = ⌊n/(τ log^2(n))⌋ for any constant τ ≥ 1. It then follows from the Borel–Cantelli lemma that, with probability 1, there exists a (random) integer N_0 such that for n ≥ N_0, every interval [(i − 1)L/n_τ, iL/n_τ) contains at least one t_j ∈ {t_1, . . . , t_n}.
We propose the following estimator for ν. Let τ̂ be the smallest real number greater than or equal to 1 such that every interval [(i − 1)L/n_τ̂, iL/n_τ̂) contains at least one design site. Denote such a t_j ∈ [(i − 1)L/n_τ̂, iL/n_τ̂) by t(i), so that t(i) satisfies the stratified-design relation with 0 ≤ δ_i < 1, where the δ_i's are random variables independent of X(.). If there is more than one t_j in [(i − 1)L/n_τ̂, iL/n_τ̂), we choose any one of these t_j's to be t(i). It follows from the above argument that τ̂ = 1 for all sufficiently large n almost surely. This random design now reduces to the stratified design of Section 3, where the observed sample is X(γ(t(1))), . . . , X(γ(t(n_τ̂))). We note that the effective sample size correspondingly reduces from n to n_τ̂ ≍ n/log^2(n). Let ν̂^R_{n_τ̂} be as ν̂_{n_τ̂} in Section 3 but based on the sample X(γ(t(1))), . . . , X(γ(t(n_τ̂))).
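The reduction from random to stratified design can be sketched numerically: split [0, L] into roughly n/(τ log² n) equal intervals and keep one observed site per interval. The function below is our illustration of that idea, not the paper's procedure; in particular, returning None stands in for "increase τ and retry":

```python
import math
import random

def reduce_to_strata(ts, L=1.0, tau=1.0):
    """Bin i.i.d. design sites ts in [0, L] into m = floor(n/(tau*log^2 n))
    equal intervals and keep the first site in each (Section 4 reduction,
    sketched).  Returns None if some interval is empty, in which case a
    larger tau would be tried."""
    n = len(ts)
    m = max(1, int(n / (tau * math.log(n) ** 2)))
    chosen = [None] * m
    for t in sorted(ts):
        i = min(int(t / L * m), m - 1)
        if chosen[i] is None:
            chosen[i] = t
    return None if None in chosen else chosen
```

With n i.i.d. uniform sites, all m ≍ n/log²(n) intervals are occupied with probability tending to one, so for large n the selected subsample is a genuine stratified design (with random δ_i's independent of X) of size n/log²(n).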

Theorem 6.
Suppose Conditions 1 to 4 hold. Let ν̂^D_{n,ℓ} be such that, as n → ∞,

Simulation study
Following a suggestion by the referee, Figure 1 illustrates what a Gaussian random field on a curve looks like. The left column presents a random field on the quarter circle γ(t) = (cos(t), sin(t))' for t ∈ [0, π/4] in R^2, and the right column presents a helix γ(t) in R^3. From top to bottom, the random fields are simulated from the stratified design, the random design and the deterministic design respectively, as a Gaussian random field with mean 0 and Matérn covariance function K_M with σ^2 = α = 1 and ν = 0.1. For the stratified design, the δ_i's are independent uniform random variables; for the random design, the location sampling distribution is a uniform distribution on the corresponding curve; and for the deterministic design, ϕ(s) = s(s+1)^3/(π/2+1)^3 for the curve in R^2 and ϕ(s) = s(s+1)^2/(4π√2+1)^2 for the curve in R^3. We observe that, as the curve is traced from upper left to bottom right (for the curve in R^2) and from top to bottom (for the curve in R^3), the points of the deterministically sampled random field gradually get closer to each other, the points of the stratified design stay neither too close to nor too separated from each other, and the spacings of the random design are the most irregular.
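Figure 1 itself cannot be reproduced here, but a panel of this kind can be generated by evaluating a Matérn covariance at chordal distances between sites on the curve and coloring white noise with a Cholesky factor. Everything in the sketch below is our illustrative choice rather than the authors' code; in particular we take ν = 1/2 (closed-form exponential covariance) instead of the ν = 0.1 used in the figure, which would require a Bessel-function evaluation:

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def simulate_on_quarter_circle(n, seed=0):
    """Mean-zero Gaussian field with covariance exp(-||p - q||) (Matern,
    nu = 1/2, sigma^2 = alpha = 1) at stratified sites on the curve
    gamma(t) = (cos t, sin t), t in [0, pi/4]."""
    rng = random.Random(seed)
    ts = [(i + rng.random()) * (math.pi / 4) / n for i in range(n)]
    pts = [(math.cos(t), math.sin(t)) for t in ts]
    cov = [[math.exp(-math.dist(p, q)) for q in pts] for p in pts]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return ts, [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
```

Plotting the returned values against the sites (or against the curve points) produces a panel analogous to the stratified-design panels of Figure 1.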
Since Section 4 shows that the random design can be reduced to a stratified design, Monte Carlo simulations are carried out only for the stratified design and the deterministic design. This section assumes that (i) X(t), t ∈ R^d, is a Gaussian random field with mean function m(.) and covariance function K_P(., .) as in (1), (ii) ω_n = 2 ∨ ⌈n^{1/3}⌉ and M_n = ⌈log(n)⌉, (iii) the observed sample is X_1, . . . , X_n.
The choice ω_n ≍ n^{1/3} follows from Theorem 3. Simulations indicate that the estimators are not much affected by the value of M_n as long as M_n is sufficiently large. The optimal choices of ω_n and M_n are not addressed in this article but are left to future work.
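A minimal helper, assuming the settings in (ii) read ω_n = 2 ∨ ⌈n^{1/3}⌉ and M_n = ⌈log(n)⌉:

```python
import math

def tuning(n):
    """omega_n = max(2, ceil(n^(1/3))) and M_n = ceil(log n), the tuning
    choices assumed for the simulation study."""
    return max(2, math.ceil(n ** (1.0 / 3.0))), math.ceil(math.log(n))
```

For the sample sizes used in the experiments this gives, e.g., ω_n = 5 and M_n = 5 at n = 100, and ω_n = 10 and M_n = 7 at n = 1000.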
Let ν̂_n and ν̂^D_n be as in Sections 3 and 5 respectively. Experiments 1, 2 and 3 study the finite-sample accuracy of ν̂_n, and Experiments 4, 5 and 6 study the finite-sample accuracy of ν̂^D_n. For each experiment, we carry out ten sets of simulations with sample sizes n = 100, 200, . . . , 1000, with 400 Monte Carlo repetitions each time. The estimated mean absolute errors of the estimators and their standard errors are computed. The results of Experiments 1 to 3 are reported in Tables 1, 2 and 3 respectively, and the results of Experiments 4 to 6 are reported in Tables 4, 5 and 6 respectively.
In summary, ν̂_n and ν̂^D_n perform well in this simulation study, with ν̂^D_n significantly more accurate than ν̂_n when ν is not close to 0. This is consistent with the convergence rates O(n^{−1/3}) and O(n^{−1/2}) of ν̂_n and ν̂^D_n as reported in Theorems 3 and 7 respectively.
The other parameters in (1) are as in Experiment 3.
As noted in [20], accurately simulating a Gaussian process on [0, 1] when n and ν are large is a difficult problem; cf. [25,29]. This is especially so when the data are irregularly spaced, and it is the reason for setting 2.5 as the upper limit for ν in Experiments 1 to 6.

Lemma 3.
Let M ≥ 1 be an arbitrary but fixed integer and let k, k_1, k_2 be integers. Suppose (12) and Condition 3 hold. Then we have the following approximations as n → ∞, uniformly over the indices concerned and for any 1 ≤ l ≤ N, where Q_{i+ω_nk_1, i+ω_nk_2} is as in (20). Proof. Using Taylor's expansion, it follows from Condition 3 that, where the last equality follows from (7). Taking square roots, we have, as n → ∞. Setting k_1 = 0 in (25), we have, which together with (25) implies (21). (22) follows from (12) and (25) by observing that, as n → ∞. (23) follows from the observation that, as n → ∞. To show (24), we observe that, where ⟨x, y⟩ is the inner product of vectors x and y. From (26), we have, where the c_j are constants originating from the coefficients of the Taylor expansion of the function x ↦ √(1 + x). This further yields, where, as n → ∞. This proves Lemma 3.

Appendix B: Proofs of Propositions 1 to 3
Proof of Proposition 1 (3) can be written as in (5) with Q ν When ν ∈ Z + , K P (x, y) in (4) can be written as in (5) with In both of the above cases, The latter is as in (2). We observe that where A(x, y) is the adjugate of (Σ x + Σ y ). The (i, j)th entry of A(x, y) is the (j, i)th cofactor of (Σ x + Σ y ) and is therefore a polynomial of the entries of the In the following, we let D ⊂ R 2d be an arbitrary compact set and D 0 = D\{(x, y) : x = y}. We note that there exist ψ min,D , ψ max,D > 0 such that where ψ min,D , ψ max,D can be chosen as the positive lower bound of the smallest eigenvalue of A(x, y) and upper bound of the largest We observe that where P l1,l2 is a polynomial of entries of Σ x and Σ y .
Since Σ x is positive definite and N times continuously differentiable over R d , there exist positive constants Let C M,D be a positive constant depending on M and D whose value may be different from line to line. Then, we can see that, for any Applying similar arguments that proveρ ν (., .) ∈ C 2ν −1 (R 2 ) in Proposition 3, it is not difficult to show that ρ 0 (., .) ∈ C 2ν +1 (R 2d ). The details are omitted here.
Next we compute appropriate bounds on the partial derivatives of ρ 0 (x, y). We observe that the leading term in Q (u1,...,u d ,v1,...,v d ) α (x, y) has the same order as that ofQ where the constant C M,D is independent of α. Consequently, When ν / ∈ Z + , we observe from (28) that ρ 0 (x, y) can be expressed as where c k ,c k are constant coefficients such that for any M and compact D, When ν ∈ Z + , we have Let Q k+ν (x, y) = Q k+ν (x, y) log{Q(x, y)}. By applying similar arguments leading to (34) and (35), it follows that Since k+ν is an integer and Σ x is positive definite and M times continuously differentiable, the partial derivatives of the term Q k and Q k+ν (x, y) log(|Σ x +Σ y |) up to M th-order are bounded by C k+1 M,D and C k+ν M,D respectively for all (x, y) ∈ D. Moreover for any M and compact D, This shows that the ρ 0 (x, y) part of K P (x, y) in (1) satisfies the bound in Condition 2. The N times continuously differentiability conditions on β ν (., .) and A(., .) follows from (29), (30), (31), (32) and (33). This proves Proposition 1.
To show the required bounds on K̃^{(u,v)}(x, y), it suffices to show the same bounds on ρ̃_ν^{(u,v)}(x, y) when x − y is close to 0. When ν ∉ Z_+, the desired bounds follow from (42) and the fact, from (8), that |x − y| is of the same order as ‖γ(x) − γ(y)‖ when x − y → 0. When ν ∈ Z_+, the first bound of (12) and the first bound of (13) follow from (44), and the second bound of (13) follows from arguments similar to those leading to (44) together with the observation that when M/2 > ν, the leading term no longer has log terms and is of the same order as |x − y|^{2ν−M} (and thus of the same order as ‖γ(x) − γ(y)‖^{2ν−M}). This proves Proposition 2.

Proof of Proposition 3
We show in the order (i) → (iii) → (iv) → (ii). Since N is large, it suffices to consider the Gaussian random field X(t) − m(t). Therefore, we assume without loss of generality in the following that E{X(t)} = 0 for all t ∈ R d .
Proof of (i). We observe from Lemma 7 that to prove (i), it suffices to show thatK(., .) ∈ C 2ν −1 (R 2 ) and does not exist for any t ∈ R. We recall thatρ 0 (x, y) = ρ 0 (γ(x), γ(y)) and ρ ν (x, y) = ρ ν (γ(x), γ(y)). According to the differentiability assumption of ρ 0 in Condition 2, to showK(., .) ∈ C 2ν −1 (R 2 ) and (37) does not exist, it suffices to show thatρ ν (., .) ∈ C 2ν −1 (R 2 ) and does not exist. In particular, we shall prove that (38) diverges to either −∞ or +∞. The proof for ν ≤ 1 is straightforward and we assume ν > 1 in the following. First, we introduce some notation. Let x, y ∈ R and define We observe thatβ, Q ∈ C N (R 2 ) by assumption and Ψ x > 0 for all x ∈ R since γ (1) (x) is a unit length vector and A(γ(x), γ(x)) is a positive definite matrix. In the following, we focus on a bounded open set, say U ⊂ R, and derive some asymptotic results on the partial derivatives ofρ ν when x, y are close together in U . Then, by saying x − y → 0, we mean x − y tends to 0 with x, y restricted to U and x = y. All the o(.), O(.) terms below are uniform for all x, y in U as x − y → 0. We observe that as x − y → 0, Case 1. Suppose ν / ∈ Z + . We see thatρ ν (., .) is N times continuously differentiable at every point (x, y) for x = y. Differentiatingρ ν (x, y) for x = y, we obtaiñ where . . . represents the negligible terms as x − y → 0. We observe from (39) that as x − y → 0, |ρ ν (x, y)| |x − y| 2ν since |β| 1, Q |x − y| 2 , each first-order partial derivative of Q has the same order as |x − y| and each second and higher-order partial derivative has the same order as 1. From this observation and (40), we conclude that when a differentiation with respect to either x or y occurs on one of the terms Q, Q (1,0) , Q (0,1) the order of the resulting term decreases by |x − y| but if the differentiation occurs onβ or Q (u,v) with u + v ≥ 2, then the resulting term has the same order as the original term. 
For example, when we are differentiating νβQ ν−1 Q (1,1) with respect to x, we have The three terms in the right hand side of (41) are of order |x − y| 2ν−2 , |x − y| 2ν−3 and |x − y| 2ν−2 respectively. This indicates that as x − y → 0, νβ (1,0) Q ν−1 Q (1,1) and νβQ ν−1 Q (2,1) are negligible compared to This shows that for any u, v ∈ Z + such that u + v ≤ N as x − y → 0, the leading terms ofρ (u,v) ν (x, y) are those for which the differentiation only occurred on Q, Q (1,0) , Q (0,1) . This implies that for any u, v ∈ Z + such that u + v ≤ N , as x − y → 0, where . . . represents the negligible terms. If we continue differentiatingρ ν (x, y), x = y, we observe from (39) that which implies that Next we shall prove thatρ ν (., .) ∈ C 2ν −1 (R 2 ). We observe from (42) that ρ (u,v) ν (x, y) → 0 as x − y → 0 given u + v ≤ 2ν − 1. Therefore, to show ρ ν (., .) ∈ C 2ν −1 (R 2 ), we only need to showρ (u,v) ν (x, x) exists and equals to 0. We use induction to show the result. Observe that x) exists and equals to 0. We verify that the same holds forρ We observe from (42) and (43) that , x), the result follows similarly. This completes the proof that ρ (u,v) ν (x, x) exists and equals to 0 when u + v ≤ 2ν − 1. Since x can be any real number,ρ ν (., .) ∈ C 2ν −1 (R 2 ) follows. Now we show (38) diverges to ∞, where ∞ can be either +∞ or −∞ depending on ν. By using the resultρ as x − y → 0, where . . . represents negligible terms. Using arguments similar to the ν / ∈ Z + case, we observe that as x − y → 0. The final result follows from the same remaining steps as for the ν / ∈ Z + case. This completes the proof of (i).
Proof of (iii). The result follows from choosing the curve γ(t) in (i) to be each axis of R d .
Proof of (iv). We observe from Corollary 1.15(b) of [4] that X(t) has an equivalent version with a continuous path on T provided
for all t and h = (h 1 , . . . , h d ) with h sufficiently small. Here r > 3 and C > 0 are constants. One can see that (45) is equivalent to Let a = (a 1 , . . . , a 2d ) where the a i 's are nonnegative integers and |a| = 2d i=1 a i . We define , y), where x j = z j and y j = z j+d , j = 1, . . . , d. Denote e i = (0, . . . , 1, . . . , 0) as the ith standard basis of R 2d for i = 1, . . . , 2d. Case 1. Suppose ν > 1. LetẊ i (t), denote the first-order mean square partial derivative of X(t) in the direction of e i . First, we check that for any bounded open set T ⊂ R d ,Ẋ i (t) when restricted on T has an equivalent versionẊ i (t) which possesses, with probability 1, a continuous sample path. From Lemma 7, we observe that the covariance function ofẊ i (t) is K (ei+e i+d ) (x, y). It follows from the condition ρ 0 (x, y) ∈ C 2ν +1 (R 2d ) that as h → 0 uniformly for all t ∈ T . For ρ , we see that the leading term of ρ (ei+e i+d ) ν (x, y) has the same order as as h → 0 uniformly for all t ∈ T . (47) and (48) show that K (ei,e i+d ) satisfies (46), which means that on T ,Ẋ i (t) has an equivalent versionẊ i (t) which possesses, with probability 1, a continuous sample path. Denote t = (t 1 , . . . , t d ) . In the following, e i is understood to be the ith standard basis of R 2d or R d provided no ambiguity occurs. We have Let r ∈ T . We have from Cauchy-Schwarz inequality that as h → 0, Using the arguments of (47) and (48), one can easily verify that on T , X(t) has an equivalent version, say X(t), possessing, with probability 1, a continuous sample path.
Let r 1 be such that (s, t 2 , . . . , t d ) ∈ T for all s ∈ [r 1 , t 1 ]. Define It is clear that on T , with probability 1, the sample path of Y 1 (t) has continuous partial derivative in the direction of e 1 . Meanwhile, using (49) and Fubini's theorem, we have In the same way, It then follows from (50) and (51) that (52) implies for any fixed t ∈ T , If we define analogously Y i (t) for i = 2, . . . , d, using similar arguments we can conclude that for any fixed t ∈ T Since Y i (t), i = 1, . . . , d are continuous with probability 1 in T , it follows that P (53) shows that on T each of Y 1 (t), . . . , Y d (t) is an equivalent version of X(t) possessing, with probability 1, continuous first-order partial derivatives in all directions e 1 , . . . , e d .
Applying the arguments above inductively, we conclude that restricted to T , X(t) has an equivalent version possessing, with probability 1, a C ν −1 (T ) sample path.
Case 2. Suppose that ν ≤ 1. It is easy to verify that X(t) is mean square continuous and, when restricted to T, has an equivalent version possessing, with probability 1, a continuous sample path.
Proof of (ii). Since X(γ(t)) is the Gaussian random field X(t) restricted to the C N -curve γ(t), (ii) follows directly from (iv). This proves Proposition 3.

Appendix C: Proof of Theorem 1
(a) By using (24), Lemma 1, Lemma 2 and Lemma 3, we obtain When ν ∈ Z + , it follows from Lemma 1 and Lemma 3 that a θ, ;i;k1 a θ, ;i;k2 as n → ∞. It follows from (54), (55) and (56) that as n → ∞. Writingm(t) = m(γ(t)) and using (24), Lemma 1 and Lemma 3, we obtain as n → ∞. Consequently, it follows from (57) and (58) that as n → ∞ since N ≥ 2ν + 6 (Condition 1). To conclude that we need only verify that because EV θ, ≥ 0. We observe that β ν (x, y) = 0 for all x, y ∈ R d implies that β ν (x, y) does not change sign, namely, it is either positive or negative for all x, y ∈ R d . From the definition of Ψ and that A(x, y) is positive definite for all x, y ∈ R d , we conclude that Ψ(s) > 0 for all s ∈ [0, L]. All these results together with H (ν) = 0 (see Section E) imply that (61) is true. Recall the notation introduced in equation (15). To prove that V θ, /EV θ, → 1 as n → ∞ almost surely, it suffices to show that and as n → ∞ almost surely. We observe from (15) that Applying the Hanson-Wright inequality [11], it follows from (64) that there exists some constant C > 0 such that We shall now evaluate the orders of Σ abs 2 and Σ 2 F . We observe from Lemma 2 that k1=0 k2=0
By applying the Borel–Cantelli lemma, we get that, as n → ∞ almost surely. The proof of (a) is complete.
We observe from the proof of Proposition 3(iii) that X ( ) (t) is continuous on t ∈ (−ε, L + ε) and is an equivalent version of the th-order mean square derivative of X(γ(t)). Consequently, it follows from Lemma 7 in Section F that the covariance function of for all h such that |h| > 0 is small and t ∈ [0, L], where C > 0 is a constant. Forρ ν (., .), we observe in the proof of Proposition 3(i) that the leading term ofρ ( , ) ν (x, y) has the same order as Thus we conclude that for all h such that |h| > 0 is small and t ∈ [0, L], where C > 0 is a constant. Applying the result on page 186 of [9], (93) and (94) show that for any < (ν − ) ∧ 1, X ( ) (t) can be chosen such that its sample path is Lipschitz continuous of order . In particular, with probability 1, there exist constants C, δ > 0 (may be random) such that if |h| < δ, for all t, t + h ∈ [0, L]. We extend the definitions of f w (t) in (24) by f 1 (t) = 1 and f 2 (t) = 0 for any t ∈ (−ε, L + ε). Using Taylor expansion, (24) and Lemma 1, we have with probability 1, as n → ∞. Hence it follows from (95) that with probability 1, as n → ∞. Since f 1 (t), . . . , f (t) are all bounded, using the arguments in the proof of Proposition 3 (see Section B), it is not difficult to verify that is a Gaussian process with mean 0 and bounded variance for all t ∈ [0, L]. Since the limit is the same for both θ = 1, 2, it follows that with probability 1. Now, the final task is to verify (96). Define Suppose the probability that (96) is true is strictly less than 1. Then from the sample path continuity of Y (s) on [0, L], it follows that Consequently with probability 1, we obtain For simplicity of notation, we denote It follows from the definitions of f w1 , . . . , f w that c j (t) ∈ C N − (0, L). From the proof of Proposition 3 in Section B, the mean square derivatives of X (1) (t), . . . , X ( −1) (t) coincide with their sample path derivatives which are X (2) (t), . . . , X ( ) (t) respectively. 
Using (97), it follows that X ( ) (t) is mean square differentiable on (0, L). DenoteẊ ( ) (t) be the mean square derivative of X ( ) (t). One can verify that the product rule in mean square sense applies and yieldṡ Plugging (97) into (98), we can inductively differentiate X ( ) (t) in the mean square sense N − times. However this contradicts Proposition 3 which asserts that X(t) can be differentiated in mean square at most ν −1 times. This proves that (96) must be true with probability 1. Now we show that By Fatou's Lemma, it follows that Also, we observe that as n → ∞. Thus (99) follows from (100) and (101). Next we prove the result by contradiction. We observe that It is straightforward to see that Var(V θ, ) = O(n 2 ) and thus it follows from (99) that Var(V θ, /EV θ, ) = O (1). To show lim inf n→∞ Var(V θ, /EV θ, ) > 0, we show that the integral in (103) is strictly positive. It suffices to prove that the integrand is not identically 0 since the integrand is a continuous function. We note that if the integrand is identically 0, u1,u2≥1 u1+u2=2 This implies that the 2 -th partial derivative ofK(., .) can be expressed as a linear combination of its lower order partial derivatives. Hence due to the differentiability ofK(., .), it can be further differentiated iteratively. This contradicts to the fact thatK(., .) is at most 2ν times differentiable. Therefore, the integral in (103) is strictly positive and the proof of (102) is done. We further observe that To evaluate Σ 2 F , we observe that It thus follows that Using Hanson-Wright inequality, we have for sufficiently large n, ≤ 2 exp{−C min(q 1,ωn s, q 2,ωn s 2 )} + min{1, C 1 s −1 q −1/2 3,ωn exp(−Cs 2 q 3,ωn )}, for all s > 0. The proof of (c) is complete.
The result in (d) follows directly from (a), (b) and (c). This completes the proof of Theorem 1.

Appendix D: Proofs of Lemmas 4 to 6 and Theorems 2 to 4
Define W_θ = V_{θ,ℓ}/EV_{θ,ℓ}.

Lemma 4. Suppose ν ≤ ℓ and Conditions 1 to 3 hold. For any j ∈ Z_+, there exists a constant C_j > 0 such that the asserted moment bound holds.

Proof. We recall the notation from (15) for the case ν ≤ ℓ. Let C_j be a generic constant depending on j that can take different values at different occurrences. We observe from the proof of Theorem 1(a) and (b) that the bound holds for every j ∈ Z_+. This proves Lemma 4.
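Lemma 4 bounds moments of the normalized quadratic variation W_θ = V_{θ,ℓ}/EV_{θ,ℓ} (the exact bound is displayed in the paper). Its qualitative content, that E|W_θ − 1|^j shrinks as n grows, can be illustrated in a toy case: the second-order quadratic variation of Brownian motion (ν = 1/2), for which EV is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

def W_samples(n, reps):
    """Normalized second-order quadratic variation of Brownian motion on [0, 1]."""
    h = 1.0 / n
    inc = rng.standard_normal((reps, n)) * np.sqrt(h)
    x = np.concatenate([np.zeros((reps, 1)), np.cumsum(inc, axis=1)], axis=1)
    d2 = x[:, 2:] - 2 * x[:, 1:-1] + x[:, :-2]  # second differences
    V = np.sum(d2**2, axis=1)
    EV = 2 * h * (n - 1)  # each of the n-1 second differences has variance 2h
    return V / EV

# Second moment of W - 1 at increasing sample sizes.
moments = {n: np.mean(np.abs(W_samples(n, 4000) - 1) ** 2) for n in (50, 200, 800)}
print(moments)
```

The second moment decays roughly like a constant over n, consistent with the concentration of W_θ around 1 that drives the rest of the proof.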
We observe that when 0 < W_θ ≤ 1/2, log(W_θ) ≤ 0, and hence, for any j ∈ Z_+, (104) holds. Now we establish lower and upper bounds for λ_1 and an upper bound for μ_1. It follows from (74) and (92) that (105) holds for all large n. Let C_j denote a generic constant that can take different values at different occurrences. Consequently, using (105) together with (104) and Lemma 4, we obtain (106) for all large n.

Next, we improve the bound (106). Recall that W_θ = V_{θ,ℓ}/EV_{θ,ℓ}. We observe from (16) that there exist constants C, C_1 > 0 such that for all large n,

P(|W_θ − 1| ≥ 1/2) ≤ 2 exp(−C min(q_{1,ω_n}, q_{2,ω_n})) + min{1, C_1 q_{3,ω_n}^{−1/2} exp(−C q_{3,ω_n})}.

Suppose ν < ℓ. Applying the Cauchy-Schwarz inequality and (106), we obtain (107). Suppose ν = ℓ. Applying Hölder's inequality and (106), we can choose a sufficiently large number r > 1 such that, for some constant ε > 0, (108) holds for all large n. By the mean value theorem, there exists some c ∈ (1/2, 3/2) such that (109) holds; the last inequality in (109) follows from Lemma 4. The desired result then follows from (107), (108), (109) and the fact that (107) (or (108) if ν = ℓ) is negligible compared to (109) as n → ∞. This proves Lemma 5.
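The Cauchy-Schwarz step above controls E|log W_θ| on the small-probability event {W_θ ≤ 1/2} via E[|log W_θ| I{W_θ ≤ 1/2}] ≤ (E log² W_θ)^{1/2} P(W_θ ≤ 1/2)^{1/2}. A sketch with a mean-one normalized chi-square as a toy stand-in for W_θ:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for W_theta: a normalized chi-square (mean one, concentrating as df grows).
df, m = 20, 200000
W = rng.chisquare(df, m) / df

lhs = np.mean(np.abs(np.log(W)) * (W <= 0.5))             # E|log W| on {W <= 1/2}
rhs = np.sqrt(np.mean(np.log(W) ** 2) * np.mean(W <= 0.5))  # Cauchy-Schwarz bound
print(lhs, rhs)
```

Because P(W ≤ 1/2) is already small, the right-hand side is small even though E log² W is only moderate; this is exactly why the truncated term is negligible compared with (109).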

Lemma 6.
Let Ω_ℓ be as in (18). There exist constants C_1, C_2 > 0 such that the stated bound holds for n sufficiently large.

Proof. Case 1. Suppose that 1 ≤ ℓ < ν. We observe from Theorem 1(c) and (16) that the bound holds for sufficiently large n. Case 2. Suppose ℓ = ν. We observe from Theorem 1(b) and (d) that the bound holds for sufficiently large n. Case 3. Suppose ℓ ≥ ν + 2. We observe from Theorem 1(a) that the required convergence holds as n → ∞. This implies the bound for sufficiently large n.
Using Lemma 6 and Theorem 2, and then Lemma 6 again for sufficiently large n, we conclude from (130) a bound on E(|ν̂_{n,ℓ_0} − ν| I{Ω_ℓ}), and the result follows similarly. Finally, we shall prove the almost sure convergence of ν̂_n. We claim that, as n → ∞, almost surely the index ℓ_0 equals either ⌈ν⌉ when ν < ⌈ν⌉ − 1/4, or ⌈ν⌉ + 1 when ν ≥ ⌈ν⌉ − 1/4. Then the desired result follows from Theorem 2. We consider the case ν ≥ ⌈ν⌉ − 1/4. To verify the latter, it suffices to prove the corresponding almost sure statement. We observe that ⋃_{n=j}^∞ Ω_{⌈ν⌉+1} is a decreasing sequence of sets as j increases. Consequently, using Lemma 6, we obtain the required bound. The case ν < ⌈ν⌉ − 1/4 follows in a similar manner.
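The final almost-sure statement follows the Borel-Cantelli pattern: Lemma 6 makes the failure probabilities summable in n, so with probability 1 only finitely many failures occur. A toy simulation with independent events of summable probabilities p_n = 2^{−n} (purely illustrative; these are not the events Ω_ℓ of the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

# Independent events A_n with P(A_n) = 2^{-n}; the probabilities are summable,
# so by Borel-Cantelli only finitely many events occur along almost every path.
reps, N = 5000, 30
p = 0.5 ** np.arange(1, N + 1)
hits = rng.random((reps, N)) < p  # indicator of each event on each sample path
counts = hits.sum(axis=1)         # number of events occurring per path
print(counts.mean(), counts.max())
```

Even across thousands of paths, the number of occurrences stays uniformly small, mirroring how the exponentially small bounds from Lemma 6 force the selected index to stabilize almost surely.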

imply that (137) exists. This completes the proof of the "only if" part. To prove the "if" part, that is, that X_h(t) converges in the mean square sense as h → 0, by the completeness of the L² space it suffices to show that (139) holds as (h, l) → (0, 0). We observe that (140) holds. Then (139) follows from (140) and the assumption that (137) exists. Suppose h, l are small enough and ∂²K/∂x∂y exists in a neighborhood of (t, t) and is continuous at (t, t). Define A(x) = K(x, t + l) − K(x, t). Since ∂K/∂x(x, y) exists in a neighborhood of (t, t), A(x) is differentiable in a neighborhood of t. Hence, by the mean value theorem, there exists some θ_1 ∈ (0, 1) such that (141) holds. Define B(y) = ∂K/∂x(t + θ_1 h, y). Since ∂²K/∂x∂y(x, y) exists in a neighborhood of (t, t), B(y) is differentiable in a neighborhood of t. It follows from the mean value theorem and (141) that there exists some θ_2 ∈ (0, 1) such that (142) holds. The existence of (137) then follows from (142) and the continuity of ∂²K/∂x∂y(x, y) at (t, t). If ∂²K/∂y∂x(x, y) exists in a neighborhood of (t, t) and is continuous at (t, t), the result follows similarly.
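Lemma 7 characterizes mean square differentiability of X at t through the mixed partial ∂²K/∂x∂y(t, t), obtained as the limit of the second-order difference quotient appearing in (141) and (142). A quick numerical check with the squared-exponential kernel K(x, y) = exp(−(x − y)²/2) (a smooth stand-in, not a kernel from the paper), for which ∂²K/∂x∂y(t, t) = 1:

```python
import math

def K(x, y):
    # Squared-exponential covariance; a smooth stand-in for the kernels in the text.
    return math.exp(-0.5 * (x - y) ** 2)

t = 0.3
for h in (1e-1, 1e-2, 1e-3):
    l = h
    # Second-order difference quotient approximating d^2 K / dx dy at (t, t).
    D = (K(t + h, t + l) - K(t + h, t) - K(t, t + l) + K(t, t)) / (h * l)
    print(h, D)
```

The quotient converges to 1, which under Lemma 7 is E[Ẋ(t)²]; for a kernel that is not twice differentiable at the diagonal (e.g. exp(−|x − y|)), the same quotient would diverge as h → 0.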
Next, we calculate the mean function and covariance function of Ẋ(t). It follows that for t, x, y ∈ R the corresponding identities hold. This proves Lemma 7.