Pathwise least-squares estimator for linear SPDEs with additive fractional noise

Abstract: This paper deals with the drift estimation in linear stochastic evolution equations (with emphasis on linear SPDEs) with additive fractional noise (with Hurst index ranging from 0 to 1) via least-squares procedure. Since the least-squares estimator contains stochastic integrals of divergence type, we address the problem of its pathwise (and robust to observation errors) evaluation by comparison with the pathwise integral of Stratonovich type and using its chain-rule property. The resulting pathwise LSE is then defined implicitly as a solution to a non-linear equation. We study its numerical properties (existence and uniqueness of the solution) as well as statistical properties (strong consistency and the speed of its convergence). The asymptotic properties are obtained assuming fixed time horizon and increasing number of the observed Fourier modes (space asymptotics). We also conjecture the asymptotic normality of the pathwise LSE.


Introduction
Least-squares type estimators of an unknown drift parameter have recently become very popular in the setting of (semi)linear SPDEs driven by an infinite-di- * The work of PK was supported by the Czech Science Foundation project No. 19-07140S and the work of JŠ was supported by the grant LTAIN19007 Development of Advanced Computational Algorithms for Evaluating Post-surgery Rehabilitation. 1562 P. Kříž and J.Šnupárková mensional Brownian motion, because they have advantageous asymptotic properties (they basically coincide with MLE estimators in this case) and are relatively easy to implement. For recent theoretical works, the reader may check [27], [7], [2] or [3] to name just a few. The theory for models with Brownian motion has reached certain level of maturity so that it enables several interesting practical applications, such as [1] or [26].
Wide range of randomly evolving phenomena are forced with auto-correlated noise, which can be effectively modeled by a fractional Brownian motion (fBm). Stochastic evolution equations driven by a fBm appear in diverse fields such as biology, neuroscience, hydrology, climatology, finance and many others (see e.g. monographs [5] or [23] for discussion). Least-squares estimators of the drift parameter for continuously observed trajectories of solutions to one-dimensional linear equations with additive fractional noise have been studied in [13] for Hurst index H ≥ 1/2 and [14] for general H ∈ (0, 1).
The available literature on parameter estimation for infinite-dimensional equations driven by an (infinite-dimensional) fBm is rather sparse. A least-squares estimator for a one-dimensional projection of the solution has been studied in long-span asymptotic regime in [22] assuming the regular case H ≥ 1/2. Drift estimation in spectral setting (Fourier modes are observed) for infinite-dimensional equations with fractional noise has been studied in [8] (MLE approach, H ≥ 1/2) and [17] (ergodic-type estimator, H ∈ (0, 1)). We are not aware of any study on LSE in spectral setting (Fourier modes are observed) for these models.
LSE-type estimators typically contain stochastic integrals, which makes them difficult to apply. For equations with Brownian noise, evaluations of Itô stochastic integrals may be sensitive to small perturbations of trajectories. A robust version of LSE can be obtained by application of Itô formula (for simple equations) or rough path theory (more complex equations), see [11]. The least-squares estimators for equations with fBm incorporate divergence-type integrals (sometimes called the Skorokhod integrals). These are not defined pathwise, but rather as the adjoint operators to Malliavin derivatives, which makes them extremely difficult (if not impossible) to evaluate from the observed trajectory in practice or in simulations, which prevents them from wider use. To avoid these complications, the authors in [14] switched to the ergodic-type estimator for onedimensional fractional Ornstein-Uhlenbeck process, which contains only Riemann integrals. Estimators of ergodic type for more general equations driven by a finite-dimensional fBm were recently studied in [25]. Application of this ergodic approach in the spectral setting for infinite-dimensional equations corresponds to the weighted minimum-contrast estimator (weighted MCE) introduced in [17]. Such estimator is plausible for stationary processes, but fails for processes at far-from-stationary state and is difficult to generalize to more complex equations, because (in contrast to LSE) it requires ergodicity and precise knowledge of the limiting covariance operator.
In this paper, we start with derivation of a new spectral version of the leastsquares estimator (theoretical LSE) for an unknown drift parameter in linear SPDEs driven by an additive infinite-dimensional fractional Brownian motion and prove its strong consistency in space (incl. speed of convergence) by the tools of Malliavin calculus. We choose to work in spectral setting (observations in spectral space domain rather than in physical space domain are available) and to study asymptotic properties under increasing number of the observed Fourier modes while time horizon is fixed. Such approach makes it possible to consistently (with error approaching zero) estimate the unknown parameter based on observations in fixed time horizon, but with sufficient detail in space (number of Fourier modes). This is particularly useful for observations having limited time horizon but high space resolution/detail. It contrasts to more typical time asymptotics, which assumes increasing time horizon and which is not studied here. Moreover, sever complications with its numerical evaluation has led us to construct a pathwise version of the LSE, which is defined (and computable) from the observed trajectory and which is robust to small perturbations in the observations.
Our construction of the robust pathwise LSE is inspired by the work [11]. It is based on calculation of pathwise Stratonovich integral, which can be expressed explicitly in our case, and compensation for difference between Stratonovich and divergence-type integrals. In our setting with fBm, this leads to an implicitly defined estimator. We prove its existence and uniqueness, as well as its strong consistency in space. Surprisingly, if H > 1/2, it has slightly better performance than the theoretical LSE. To our best knowledge, this is the first attempt to define a pathwise robust LSE for equations driven by fBm. We believe that this pathwise least-squares approach is applicable also for different models with fBm (such as those in [22] or [14]) and may offer estimators applicable in practice or in simulations.
Main purpose of this paper is to make the first step in the development of a theoretical framework necessary for studying spectral asymptotic properties as well as performing effective enumeration of the least-squares estimators for stochastic evolution equations with infinite-dimensional auto-correlated noise. The assumed linearity of the operators ensures reasonable clarity and simplicity of the exposition. The next step would be to study this problem in the setting of semilinear equations with non-linear lower order perturbation with numerous practical applications. This, however, requires fine analysis of space regularity of solutions and is beyond the scope of this work.
The paper is organized as follows. In section 2 we introduce the studied problem in more detail. The two new estimators -theoretical and pathwise LSE -are derived in section 3. Existence and uniqueness of the implicitly defined pathwise LSE is studied in section 4. Section 5 is devoted to the strong consistency of the estimators. A simulation study is presented is section 6. Section 7 summarizes main results and findings of this paper. In Appendix, we briefly describe some elements of Malliavin calculus, which are useful for the asymptotic analysis and outline the connection between cylindrical fractional Brownian motion and Gaussian noise that is white in space and fractional in time.

The model
Consider the evolution equation in a real separable Hilbert space V where A, B are densely defined linear operators, B H is a cylindrical fractional Brownian motion on V , X 0 ∈ V and λ is a positive unknown parameter, which we want to estimate. Throughout this paper we assume that the equation (1) is diagonalizable, i.e. there exists an orthonormal basis {e k } ∞ k=1 in V such that for any k ∈ N we have where α k ≥ 0 and β k ≥ 0. Furthermore, set Remark 2.1. For existence of the solution to (1) in an appropriate interpolation space, constructed via spectral decomposition, it suffices that for some γ ∈ R holds. For more details and proof, see e.g. Theorem 2.1. (and Remark 2.1.) in [17]. We assume (throughout the paper) that the existence condition (2) is satisfied. It also implies that lim Denote the Fourier modes (projections to eigenvectors) of the solution by x k (t) = X(t), e k V . By diagonality of the equation (1) these projections satisfy the system of independent one-dimensional equations where {B H k (t), t ≥ 0} are mutually independent real-valued fractional Brownian motions with the same Hurst index H ∈ (0, 1).
The solutions to the equations (4) are mutually independent real-valued fractional Ornstein-Uhlenbeck processes given by a formula for any k ∈ N.
Remark 2.2. If the operators A, B from (1) are negative definite self-adjoint elliptic linear differenetial operators of even orders 2m 1 , 2m 2 (m 1 , m 2 ∈ N), respectively, acting on a compact d-dimensional manifold, then (see [30]) where Such operators are of great interest as they frequently appear in various stochastic PDEs.

Statistical problem
To estimate the value of λ, we have continuous sample trajectories of the first N Fourier modes observed on a fixed time-window [0, T ] at our disposal, i.e. our data are x k (t), t ∈ [0, T ] , k = 1, . . . , N. We aim at deriving an estimator that would be computable from this data and that would be consistent with increasing number of Fourier modes N → ∞ (the so-called space asymptotics), the time horizon T being fixed.
Remark 2.3. In this paper we assume that the Hurst parameter H is known.
If not, it can be consistently determined from a single continuous trajectory of the process x 1 (t), t ∈ [0, T ] by some of various techniques developed for estimating H for real-valued processes sampled under in-fill asymptotics regime (discrete time observation with fixed time horizon and decreasing time step), cf. [15], [12], [9], or [29]) to name just a few. If, moreover, the noise processes for individual Fourier modes B H k have different intensities (volatilities, denote by σ k ), we can simultaneously and consistently estimate σ k and H from the observed modes under in-fill asymptotics regime using the powers of the second order variations (see [4], Chapter 3.3) and subsequently eliminate different noise intensities by appropriate rescaling of the observed processes x k .

Theoretical LSE
We derive the estimator for the drift parameter by applying the least-squares concept to the SDEs understood as a system of linear regression models. In particular, we find λ which minimizes (for fixed time horizont T ) the formal sum Note that this is only a heuristic technique, since time derivativesẋ k (t) do not exist (in the classical sense). However, we can rewrite the minimizer back in terms of the well-defined stochastic integrals and Riemann integrals. It readŝ where the stochastic integral is understood in the Skorokhod sense (sometimes referred to as the divergence type integral). Applying the stochastic differential from the equation (4) we obtain This form is convenient for proving the asymptotic properties of the estimatorλ N whereas the form (8) is better for derivation of the pathwise version of the LSE for λ. Remark 3.1. We did not specify the type of stochastic integral with respect to the fBm in the equations (1) or (4), because all integration concepts coincide in case of additive noise. However, the choice of the appropriate type of stochastic integration in (9) (and in (8)) is critical, since the integrands are stochastic processes here. The choice of the Skorokhod type integral is motivated by the fact that its expectation is zero, which is advantageous for the error term in (9) and will be utilized in the sequel. If we choose the pathwise Stratonovich integral, for example, its nonzero expectation would generate unwanted bias in (9) and the corresponding estimator would not be consistent.

Robust pathwise LSE
Although the Skorokhod type integral in (8) has advantageous probabilistic properties, it is not constructed pathwise, but rather as the adjoint operator to Malliavin derivative. This imposes sever complications to numerical evaluation of the estimatorλ N for the observed single trajectory. To overcome this issue, we derive below its robust pathwise version. The strategy, inspired by the paper [11], is to express the estimator in terms of the pathwise Stratonovich integral (for definition, see [14]) with appropriate compensation for difference between Stratonovich and Skorokhod integrals.
where T 0 x k (t)• dx k (t) stands for the pathwise Stratonovich integral and Δ(μ k ) is the compensation. Since x k is in the first Wiener chaos (it is Gaussian), the compensation Δ(μ k ) takes rather simple form (see [14], formulas (3.4)-(3.6)): where Moreover, since Stratonovich integral satisfies the rules of the first order calculus, we can easily evaluate (without need for rough path lift): This provides a simple pathwise formula for evaluation of the Skorokhod-type integral and the corresponding formula for LSÊ Unfortunately, μ k = α k λ + β k in this evaluation formula depend on the unknown value of parameter λ. A natural workaround here is to define new estimator (the pathwise LSE) as a solution to the following equation (in unknown Λ): This follows from the fact that the right-hand side of (11) is constant (in Λ) in this case, which is shown below.

Existence and uniqueness of pathwise LSE
We can not expect the existence and uniqueness of the positive solution to (11) in general. This can be seen on the simplest example of LSE for one-dimensional Ornstein-Uhlenbeck process (driven by a Wiener process) , which can provide negative estimates. We address this problem below by taking positive part of the solution and specifying, which one to choose if there are multiple solutions. To derive the conditions for the existence and uniqueness of the positive solution to equation (11), we need to understand the properties of the function on its right-hand side. Set with function Δ defined in (10). Direct calculations lead to the following formula for its first derivative (after cancelling some terms) (13) and the second derivative in view of Derivatives of R are immediate consequence of (13) and (14) and Direct calculations yield The above properties of R N lead us to the following conclusion: Theorem 4.1. Fix N and consider the function R N defined in (12) with R N (0) defined in (16) and let H ∈ (0, 1). Sufficient condition for the existence and uniqueness of the positive solution to equation (11), which can be rephrased as If H ≥ 1/2, this condition is also necessary.
Note that the condition R N (0) > 0 can be easily verified from the data by direct evaluation of R N at Λ = 0 (if β k are positive) or by calculating the limit for Λ → 0 + (if β k are zero). We are now in a position to define the pathwise LSE correctly: (11), if it exists and is unique; 0, if there is no positive solution to (11); the greater solution to (11), if there are two positive solutions.  Monotonicity and convexity (if H > 1/2) or concavity (if H < 1/2), respectively, ensure that the Newton-Raphson numerical method is well suited for numerical approximation ofλ N .

Strong consistency of theoretical LSE
We start with strong consistency of the theoretical LSEλ N defined in (8), because it will be helpful to show strong consistency of the pathwise LSE in the sequel. For simplicity, we assume throughout this section that The self-similarity of a fBm enables us to study the effect of different drifts μ k on the distributions of the processes x k . Its combination with the (longspan) asymptotic behavior of a real-valued fractional Ornstein-Uhlenbeck process, studied in [14], forms the strategy of the proof of the strong consistency (in space) ofλ N .
Using integration-by-parts in the formula (5) it is possible to show that the solutions {x k (t), t ≥ 0} can be expressed as for any k ∈ N. This can also be verified by direct substitution of (18) into the integral form of equation (4).
given by the formulax for any k ∈ N.
Using the self-similarity of a fractional Brownian motion in the expression (20) we get the equality of distributions Moreover, let {y(t), t ≥ 0} be the stationary and ergodic solution to (19) (for k = 1) given by a formula for any t ≥ 0 and Lemma 5.1. The following convergence Obviously, the first and third term tend to zero as T goes to infinity. Also the second term tends to zero due to [16], Lemma 3.1, and the limit properties of for H = 1/2 (see [6], Theorem 2.3) and for the Wiener case H = 1/2.
Define the weak asymptotic equivalence for sequences (23) Note that this relation is weaker than the equivalence relation ∼ defined in (7).
Recall the formula (9) (suitable for probabilistic analysis) and denote For the needs of Lemma A.1 we explore the limit behavior of the variance of Proposition 5.1. The expectations of C N and D N satisfy Proof. The first part is obvious. By Lemma 5.1 k∈N is positive and bounded. Using (21) and substitution we get To study the time asymptotics of var T 0x 2 1 (t) dt , we utilize the results from [14]. Denote By (3.9) in [14] this equals Further define 3 4 . Then by [14], Lemma 17, there exists a constant 0 < C H < ∞ such that As a consequence, we obtain: Proof. In the first step we show that var x 2 In the second step we apply (25) and the first step to equality (24) and yield the statement.

P. Kříž and J.Šnupárková
Proof. Using again (3.9) in [14], (21) and the fact that {x k (t), t ≥ 0} have the same law for all k ∈ N, we get equality of laws: for any k ∈ N. This implies that where the last equality follows from the definition of F T . By (25) and Lemma 5.2 to obtain that the sequence Q H (T k ) var Note that if for some c, δ > 0 and for all N ∈ N var so that the strong consistency of the estimatê follows.
It is difficult to verify the sufficient conditions (27) in general and it should be done on case-by-case basis. However, if we adopt assumption (6), often satisfied for parabolic SPDEs, the calculations of asymptotic variances simplify substantially. Proof. Denote the growth exponent of α k by M 1 := 2m 1 /d. Then using formula (26) we obtain Thus (27) is satisfied for any m 1 , d, so the conclusion follows from Lemma A.1.

Theorem 5.2.
Assume the power growth of eigenvalues α k and β k given in (6): thenλ N is strongly consistent.
Proof. (i) Applying again formula (26) These estimates give us the required relationship between m 1 , m 2 and d to meet (27). The rest follows from Lemma A.1.
Remark 5.1. The case m 1 = m 2 is reduced to the case of Theorem 5.1. Remark 5.2. The expressions for var C N E D N given in formulas (28) and (29) show the speed of the convergence ofλ N to λ.

Strong consistency of pathwise LSE
In this subsection, strong consistency (as N → ∞) ofλ N is demonstrated. We continue to denote the true value of the unknown parameter by λ. Recall that the theoretical LSE satisfiesλ N = R N (λ). Hence, ifλ N is strongly consistent (e.g. if the conditions of Theorem 5.1 or Theorem 5.2 are satisfied), we have: To show that the pathwise LSEλ N = R N (λ N ) is arbitrarily close to λ for sufficiently large N , the limiting behavior of derivatives of R N is studied. The following elementary observation will be useful in the sequel: Proof. It is a simple consequence of the Stolz-Cesaro theorem.
Next lemma bounds the derivatives of R N on a compact neighborhood of λ from above by a constant smaller than one (for sufficiently large N): Then, almost surely, if we take arbitrary where the right-hand sides equal one if c = ∞.
Proof. Recall the formula for the derivative of R N in (15) and the notation from previous subsection D N = N k=1 α 2 k T 0 x 2 k (t) dt. It follows from the proof of Proposition 5.1 that
Recall that condition (6) Fix arbitrary 0 < Λ L < Λ U < ∞. Obviously, due to (3) and (34), and, due to (31) inf Similarly to proof of Lemma 5.3 we can show For the last term we can use the upper bound: Assume first lim k→∞ Application of Lemma 5.3 with and Λ = Λ L or Λ = λ, respectively, and where ∞ k=1 Next, assume instead c = ∞, i.e. lim k→∞ α k β k = 0. We can proceed similarly to previous case To summarize, we have The limit for infimum can be obtained accordingly, since

P. Kříž and J.Šnupárková
Theorem 5.3. Assume that the eigenvalues α k and β k satisfy the power growth conditions (6) and the asymptotic conditions (30), (31) and (3). Further assume that the theoretical LSEλ N is strongly consistent (see e.g. Theorem 5.1 and Theorem 5.2). Then the pathwise LSE is strongly consistent as well: Moreover,λ N andλ N have the same speed of convergence (up to a constant): Proof. Recall that if H = 1/2, the theorem holds true trivially, becauseλ N = λ N in this case. In the rest of the proof, consider H = 1/2. First, take Λ L < λ so that and arbitrary Λ U > λ. Lemma 5.4 guarantees that, for almost all ω and some for sufficiently large N . Fix > 0 such that Λ L < λ− < λ+ < Λ U and denote δ = Ψ. The almost sure convergence lim N →∞ R N (λ) = λ, which is guaranteed by strong consistency of the theoretical LSE, enables us to take N large enough in order to λ − R N (λ) < δ.
Combination of the lower bound on derivative of Λ − R N (Λ) and its closeness to zero at λ implies existence of its root inside (λ − , λ + ), which is unique therein and it equalsλ N . This proves (35).
To demonstrate (36), note that for sufficiently large N where Λ 0 N is betweenλ N and λ and therefore converges to λ almost surely. Next, fix arbitrary > 0 and take Λ L < λ < Λ U so that the right-hand sides of (32) are close enough to one and for sufficiently large N . In particular, for N large enough we have almost surely This proves the desired almost sure convergence Remark 5.3. Immediate, and possibly surprising, consequence of (36) is that the pathwise LSE performs asymptotically better than the theoretical LSE whenever H > 1/2.

Example
As an example of the use of Theorem 5.3, consider a stochastic heat equation with additional mean reversion, with a distributed noise autocorrelated in time and with the Dirichlet boundary condition. Formally, we can write this equation as where Δ ξ is Laplace operator in variable ξ, O ⊂ R d is a d-dimensional bounded domain with smooth boundary ∂O, λ 1 ≥ 0 is the diffusivity parameter determining the speed of diffusion of a modeled substance along the domain O, λ 2 ≥ 0 is the rate of mean reversion related to the speed of reversion of the amount of substance to the zero mean. The process {η H (t, ξ); t ≥ 0, ξ ∈ O} is a Gaussian noise, which is fractional (autocorrelated) in time with Hurst parameter H ∈ (0, 1) and white in space. Formally it can be modelled as the differential of a cylindrical fractional Brownian motion dB H (t) (see Appendix B for details). This formal equation can be rewritten rigorously as a stochastic evolution equation (1) being Dirichlet Laplace operator defined on a standard Sobolev space (cf. [31]) and with B being the identity operator on L 2 (O), should λ 1 be the estimated parameter (or vice-versa, should we estimate λ 2 ). Consequently, this equation is diagonalizable with eigenvalues satisfying the power growth condition (6).
Firstly, assume we want to estimate the diffusivity parameter λ 1 . If λ 2 = 0, we have a stochastic heat equation with α k ∼ k 2/d , and β k = 0. Theorem 5.1 guarantees strong consistency of the theoretical LSEλ N . If λ 2 > 0, there is an additional mean reversion in the process and we have α k ∼ k 2/d , and β k ∼ 1. We can apply Theorem 5.2 (i) to get strong consistency of the theoretical LSÊ λ N . In both cases, Theorem 5.3 implies strong consistency of the pathwise LSE λ N with the speed of convergence (in terms of RMSE) given by Secondly, let λ 2 be the parameter of interest. If λ 1 = 0 we have α k ∼ 1 and β k = 0, and the existence condition (2) is violated. Hence, we consider the case λ 1 > 0, when the process combines diffusion and mean reversion. For the eigenvalues, we have α k ∼ 1, and β k ∼ k 2/d . Theorem 5.2 (ii) provides us with strong consistency ofλ N if d > 2 for H < 3/4 and d > 4(2H − 1) otherwise. To show strong consistency of the pathwise LSE by Theorem 5.3, we need in addition that d ≥ 4H so that (31) is satisfied. If the conditions for consistency are satisfied, the speed of convergence is (39)

Simulation study
To illustrate the actual performance of the studied pathwise LSE we perform a Monte Carlo analysis for two specific equations -a 1D heat equation (the simplest setting) and a 2D heat equation with coupling with surroundings (a model popular e.g. in oceanography). All simulations were performed in statistical software R.
The observed Fourier modes take the following form For simulations we used the Spectral Galerkin Method (see [20], chapter 10.7.), which is very efficient in our setting (space discretization provides system of independent linear stochastic ODEs, which, in addition, can be solved explicitly). Heat maps of four simulated sample solutions for various scenarios are shown in Figure 1.
We test performance of the pathwise LSE, defined in (17), and compare it with the weighted MCE, an alternative estimator defined and studied in [17], cf. formulas (3.21) -(3.23) and Theorem 3.2 therein. Recall that the weighted MCE is strongly consistent and asymptotically normal with increasing number of observed modes.
Simulation results were obtained from 200 runs for each scenarios producing sample of 200 estimates for each setting. Figure 2 illustrates the convergence of the estimates to the true value of the parameter with increasing number of Fourier modes by showing selected sample quantiles. It confirms the convergence for all settings. In addition, non-zero initial condition significantly improves the performance of the pathwise LSE. Figure 3 shows convergence of the root mean square error (RMSE) for samples of the two types of estimators (pathwise LSE and weighted MCE) and its comparison with the theoretical speed of convergence for RMSE from (38) (applicable only for solutions with zero initial condition). In case of zero initial condition, RMSE for the pathwise LSE and the weighted MCE practically coincide and their speed of convergence is consistent with the theoretical one. For solutions with far-from-zero initial conditions RMSE for the pathwise LSE is close to zero even for small number of   Fourier modes, whereas the weighted MCE converges significantly slower. Figure 4 shows comparison of sample quantiles of pathwise LSE (40 Fourier modes considered) with quantiles of normal distribution including 95% confidence envelope. Although not proved theoretically, these Q-Q plots suggest asymptotic normality of the pathwise LSE for all settings considered.

2D stochastic heat equation with coupling with surroundings
Equations of this type are popular (not only) in physical oceanography, where they model fluctuations of the temperature of the top layer of the ocean around its long-time average, see [28] or [21]. These equations fall into the setting of Example in 5.3 with d = 2 (a two-dimensional domain being the surface of the ocean), where the 2D Laplace operator models thermal conduction along the water surface, the mean-reversion term reflects atmosphere and ocean coupling and the noise term represents random forcing (e.g. vertical heat fluxes due to turbulent environment) -for more details see [28]. The authors in [28] and [21] (Section 6.1.1) discuss estimation of parameters in these equations based on observations in spectral domain under the assumption that the noise process is white in time and space (generated by a standard cylindrical Brownian motion). In contrast, we study parameter estimation assuming more general noise, which can be autocorrelated in time (generated by a cylindrical fractional Brownian motion). In particular, we simulate equation (37) with the following setting: • d = 2, O = (0, 1) 2 and time horizon T = 1, • λ 1 = 1 (the parameter to be estimated), λ 2 = 1, • Hurst index H = 0.3, zero initial condition.
The simulated distribution of the heat along the water surface at specific time instants is shown in Figure 5. Results of the Monte Carlo study of the behavior of the pathwise LSE obtained from 200 runs are depicted in Figure 6. They confirm our theoretical findings on the almost-sure convergence of the pathwise LSE and its speed of convergence (in terms of RMSE). Moreover, Q-Q plot indicate its asymptotic normality, although we do not provide theoretical reasoning (for conciseness).

Discussion
Results of the simulations confirm our findings on strong consistency of the pathwise LSE and on the speed of the convergence. In case of zero initial condition, it behaves similarly to the weighted MCE. However, for non-zero initial condition, pathwise LSE significantly outperforms the weighted MCE. This is in accordance with similar findings from finite-dimensional models (cf. [18] or [19]), where LSE-type estimators improve with far-from-stationary initial conditions (the drift part dominates the noise part in the dynamics of the process in such case), whereas the ergodic-type estimators (such as the weighted MCE) reflecting long-term stationary behavior are ruined.
The theory for the weighted MCE, developed in [17], covers only singleoperator equations, so it is not directly applicable for the second example (2D stochastic heat equation with coupling with surroundings). To our best knowledge, the pathwise LSE is the first efficient tool for parameter estimation in this  equation (important in oceanography, for example) when the noise is generated by a cylindrical fractional Brownian motion with general Hurst parameter 0 < H < 1.
Although we did not address the asymptotic normality of the pathwise LSE in this paper (to keep it concise), we conjecture that it holds. Our conjecture is based on simulation results and asymptotic similarity with the weighted MCE, whose asymptotic normality has been proven in [17].

Conclusions
Using the least-squares formalism we have derived a spectral version of the least-squares estimator of the drift parameter in linear SPDEs with additive fractional noise for arbitrary H ∈ (0, 1). This estimator is strongly consistent in space, however, it can not be directly implemented pathwise, since it contains divergence-type stochastic integral w.r.t. a fractional process. This fact prevents its simulations and its use in practice. We addressed this issue by eliminating the stochastic integral getting a novel pathwise LSE. This estimator is constructed trully pathwise and, moreover, is robust to minor perturbations of the observed process (it does not contain differentiation of the observed trajectory), which makes it particularly useful in practice.
Using the standard tools from classical probability and analysis, as well as modern tools from Malliavin calculus, we demonstrated that the newly developed pathwise LSE is strongly consistent in space and we conjecture that it is also asymptotically normal. Simulations reveal that it can efficiently utilize information about the unknown parameter from both non-stationary and stationary parts to the observed process.
The presented pathwise least-squares estimation procedure appears to be promising approach for various equations driven by fractional Brownian motion. The potentially interesting directions of further research may include (but are not limited to) proving asymptotic normality, studying different observation/sampling schemes, different asymptotic regimes, or generalization to semilinear stochastic equations driven by fractional Brownian motion.
Note that this limit, understood in L 2 -sense, is well-defined because for any h = (h (1) , h (2) , . . .), g = (g (1) , g (2) , . . .) ∈ H the process W (h), h ∈ H is an isonormal Gaussian. Denote by CH (k) n the n-th Wiener chaos associated with W (k) for any k ∈ N, i.e. CH (k) n is the closed (in L 2 (Ω)) linear subspace generated by where H n is the n-th Hermite polynomial. Similarly, let CH n be the n-th Wiener chaos associated with W , i.e. CH n is the closed linear subspace generated by H n W (h) , h H = 1 . Consider X = H n W (k) (h (k) ) ∈ CH (k) n . Then X = H n W (h) and X ∈ CH n . By the linearity and L 2 (Ω)-closedness of CH n it follows that CH (k) n ⊂ CH n ∀k ∈ N. (40)

A.2. Selected functionals of the solution in the second chaos
It can be shown (see e.g. [14]) that Using the construction of the infinite-dimensional isonormal Gaussian process above and the chaos embedding (40), we easily get for

A.3. Almost sure convergence on a fixed chaos
The following lemma is an important tool to demonstrate strong consistency of our estimators. It turns the L 2 -convergence of a sequence of variables on a fixed chaos into the almost sure convergence. This idea, based on the combination of hypercontractivity with Borel-Cantelli 0-1 law, has already been used before, but we decided to formulate it as a separate lemma, because we believe it is interesting on its own. Proof. Consider parameters ζ and η so that 0 < ζ < δ/2 and η > 1 δ/2−ζ . Denote by C a positive constant (independent of N ), which may change from line to line and calculate where Chebyshev's inequality and hypercontractivity property on the fixed Wiener chaos (see e.g. [24], Theorem 2.7.2) were used. Application of Borel-Cantelli lemma yields the almost sure convergence.
negatively correlated increments and for H > 1/2 we have positively correlated increments. Formal derivative of fBm (which exists only in distributional sense) thus represents a good model for fractional noise in time.
To conclude, take the family of independent real-valued fractional Brownian motions {B H k (t), t ≥ 0}, k = 1, 2, . . . and construct the cylindrical fBm on V = L 2 (O) using an orthonormal basis {e k } ∞ k=1 as a formal sum Increments (differentials) of this cylindrical fBm can serve as a model for the noise that is white in space and fractional in time. Indeed, for a fixed 0 < s < t we have which is the (scaled) white-in-space noise. Next, for a fixed measurable D ⊂ O with the finite volume take which itself is a (scaled) fractional Brownian motion, since ∞ k=1 w 2 k = 1 D 2 V < ∞ and so it represents an integrated fractional-in-time noise.