The nonparametric LAN expansion for discretely observed diffusions

Consider a scalar reflected diffusion $(X_t:t\geq 0)$, where the unknown drift function $b$ is modelled nonparametrically. We show that in the low frequency sampling case, when the sample consists of $(X_0,X_\Delta,...,X_{n\Delta})$ for some fixed sampling distance $\Delta>0$, the model satisfies the local asymptotic normality (LAN) property, assuming that $b$ satisfies some mild regularity assumptions. This is established by using the connections of diffusion processes to elliptic and parabolic PDEs. The key tools will be regularity estimates from the theory of parabolic PDEs as well as a detailed analysis of the spectral properties of the elliptic differential operator related to $(X_t:t\geq 0)$.


Introduction
Consider a scalar diffusion, described by a stochastic differential equation (SDE) of the form $dX_t = b(X_t)\,dt + \sqrt{2}\,dW_t$, where $(W_t : t \geq 0)$ is a standard Brownian motion and $b$ is the unknown drift function that is to be estimated. We investigate the so-called low frequency observation scheme, where the data consist of the states
$$X^{(n)} = (X_0, X_\Delta, \dots, X_{n\Delta}) \qquad (1)$$
of one sample path of $(X_t : t \geq 0)$, where $\Delta > 0$ is the fixed time difference between measurements. To ensure ergodicity and to limit technical difficulties, we follow [12] and [24] and consider a version of the model where the diffusion takes values in $[0,1]$ with reflection at the boundary points $\{0, 1\}$; see Section 2.1 for the precise definition.
The nonparametric estimation of the coefficients of a diffusion process has attracted a great deal of attention. For the low-frequency sampling scheme (1), Gobet, Hoffmann and Reiss [12] determined the minimax rate of estimation for both the drift and the diffusion coefficient and also devised a spectral estimation method which achieves this rate. Thereafter, Nickl and Söhl [24] proved that the Bayesian posterior distribution contracts at the minimax rate, giving a frequentist justification for the use of Bayesian methods. For other sampling schemes, various methods have been studied; see e.g. [15] for a frequentist approach, [25,26] for MCMC methodology for the computation of the Bayesian posterior, as well as [13,16,28,1,22] for recent results on posterior consistency and contraction rates in the Bayesian setting.
However, often one desires a more detailed understanding of the performance of both frequentist and Bayesian methods, e.g. by establishing semi-parametric efficiency bounds or by proving a nonparametric Bernstein-von Mises theorem (BvM), which would give a frequentist justification for the use of Bayesian credible sets as confidence sets (see [10], Chapter 7.3). Nonparametric BvMs have been explored by Castillo and Nickl [5,6] and have recently been proven for a number of statistical inverse problems [21,23,20,22] by Nickl and co-authors, of which the first two are non-linear inverse problems.
In order to achieve this detailed understanding, a key step lies in studying the local information geometry of the parameter space, which in terms of semiparametric efficiency theory (see e.g. [27], Chapter 25) involves finding the LAN expansion and the corresponding (Fisher) information operator. This in turn determines the Cramér-Rao lower bound for estimating a certain class of functionals of the parameter of interest. While in the Gaussian white noise model with direct observations the LAN expansion of the log-likelihood ratio is exact and given by the Cameron-Martin theorem, in inverse problems proving the LAN property is often not straightforward.
In a finite-dimensional (parametric) model for multidimensional diffusions which are sampled at high frequency, where the sample consists of the states $X^{(n)} = (X_0, X_{\Delta_n}, \dots, X_{n\Delta_n})$ with asymptotics such that $\Delta_n \to 0$ and $n\Delta_n \to \infty$, the LAN property was shown by Gobet [11] by use of Malliavin calculus.
The main contribution of this paper is to prove that also with low frequency observations, the reflected diffusion model satisfies the LAN property, under mild regularity assumptions on the drift $b$. If the transition densities of the Markov chain $(X_{i\Delta} : i \in \mathbb N)$ are denoted by $p_{\Delta,b}$, then the log-likelihood of the sample (1) is approximately equal to
$$\sum_{i=1}^n \log p_{\Delta,b}(X_{(i-1)\Delta}, X_{i\Delta}),$$
from which one can see that two ingredients are needed to show the LAN expansion:
• The first is a result on the differentiability of the transition densities $b \mapsto p_{\Delta,b}(x,y)$, which guarantees that we can form the second-order Taylor expansion of the log-likelihood in certain 'directions' $h/\sqrt{n}$ with sufficiently good control over the remainder. See Theorem 1 for the precise statement, where we importantly also obtain an explicit form for the first derivative, which determines the 'score operator' $A_b$.
• The second main ingredient consists of two well-known limit theorems, the central limit theorem for martingale difference sequences [4] and the ergodic theorem, which ensure the right limits for the first and second order terms in the Taylor expansion, respectively.
In view of this, the main work done in this paper lies in establishing the regularity needed for $p_{\Delta,b}(x,y)$, see Theorem 1 below. As there is no explicit formula for $p_{\Delta,b}(x,y)$ in terms of $b$, our approach relies on techniques from the theory of parabolic PDEs and spectral theory. We use a PDE perturbation argument, based on the fact that the transition densities of a diffusion process can naturally be viewed as the fundamental solution to a related parabolic PDE. The main difficulty in the proofs lies in the singular behaviour of $p_{t,b}(x,y)$ as $(t,x)$ approaches $(0,y)$, which is why standard PDE results cannot be applied directly, but only in a regularised setting. Thus the arguments will first be carried out for any fixed regularisation parameter $\delta > 0$, where the analysis needs to be done carefully in order to ensure that the estimates obtained are uniform in $\delta > 0$ and hence remain valid in the limit $\delta \to 0$.
In the context of a statistical inverse problem for the (elliptic) Schrödinger equation [21,18], where the above singular behaviour is not present, PDE perturbation arguments have previously been used to linearize the log-likelihood.
We also remark that more probabilistic proof techniques, as in [11], would have been conceivable, too. However, we found the PDE approach employed here to be more naturally suited to dealing with boundary conditions, and it avoids dealing with pathwise properties of the diffusions by working with the transition densities directly, which are ultimately the objects of interest for analyzing the likelihood.
Potential applications of the LAN expansion presented in Theorem 2 include the study of semiparametric efficiency for a certain class of functionals of $b$, which is implicitly defined by the range of the 'information operator' $A_b^* A_b$ (where $A_b$ is the score operator (9)), as well as an infinite-dimensional Bernstein-von Mises theorem similar to [20,21,22,23]. However, studying the properties of $A_b^* A_b$ needed for this poses a highly non-trivial challenge which still has to be overcome; see Section 2.4 for a more detailed discussion.
In Section 2, we state and prove the LAN expansion. Section 3 is devoted to proving Theorem 1. Finally, in Section 4, we derive the spectral properties of the differential operator $L_b$ and the transition semigroup $(P_{t,b} : t \geq 0)$ needed throughout the proofs.

A reflected diffusion model
We shall work with boundary reflected diffusions on the interval $[0,1]$, following [12,23]. Consider the stochastic process $(X_t : t \geq 0)$ whose evolution is described by the stochastic differential equation (SDE)
$$dX_t = b(X_t)\,dt + \sqrt{2}\,dW_t + dK_t(X), \qquad t \geq 0. \qquad (2)$$
Here $(W_t : t \geq 0)$ is a standard Brownian motion, $(K_t(X) : t \geq 0)$ is a non-anticipative finite variation process that only changes when $X_t \in \{0,1\}$, and $b : [0,1] \to \mathbb R$ is the unknown drift function. We note that $K(X)$, which accounts for the reflecting boundary behaviour, is part of a solution to (2) and is in fact given by the difference of the local times of $X$ at 0 and 1. For any integer $s \geq 0$, let $C^s = C^s((0,1))$ and $H^s = H^s((0,1))$ denote the spaces of $s$-times continuously differentiable functions and of $s$-times weakly differentiable functions with $L^2$-derivatives, respectively, endowed with the usual norms, and define the subspace
$$C^1_0 := \{f \in C^1 : f(0) = f(1) = 0\}.$$
For drifts $b$ in this subspace, (2) admits a pathwise solution $(X_t : t \geq 0)$, which can be constructed by a reflection argument, see e.g. Section I.§23 in [9] or [24]. For some $\Delta > 0$, which we assume to be fixed throughout the paper, our sample consists of measurements $X^{(n)} = (X_0, X_\Delta, \dots, X_{n\Delta})$ of one sample path, with asymptotics $n \to \infty$.
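To make the observation scheme concrete, here is a minimal Python sketch that generates approximate low-frequency data $X^{(n)}$ from (2). It is not the pathwise reflection construction cited above: it assumes a plain Euler-Maruyama discretisation with fine step size of about `dt`, each step being folded back into $[0,1]$, which is only a crude approximation of the reflected dynamics. The function names `reflect` and `simulate_low_frequency` and all parameter values are hypothetical.

```python
import math

import numpy as np


def reflect(x):
    """Map a real number into [0, 1] by repeated reflection at the points 0 and 1."""
    y = abs(x) % 2.0
    return 2.0 - y if y > 1.0 else y


def simulate_low_frequency(b, delta, n, x0=0.5, dt=1e-3, seed=0):
    """Hypothetical sketch: approximate X^(n) = (X_0, X_Delta, ..., X_{n Delta}) for
    dX_t = b(X_t) dt + sqrt(2) dW_t + dK_t on [0, 1].

    Between two observations, Euler-Maruyama steps of size h (about dt) are taken and each
    step is folded back into [0, 1]; only the states at the observation times are kept.
    """
    rng = np.random.default_rng(seed)
    steps = max(1, round(delta / dt))          # fine steps per observation interval
    h = delta / steps                          # fine step size with steps * h = delta
    noise = math.sqrt(2.0 * h) * rng.standard_normal((n, steps))
    obs = np.empty(n + 1)
    x = float(x0)
    obs[0] = x
    for i in range(n):
        for z in noise[i]:
            x = reflect(x + b(x) * h + z)
        obs[i + 1] = x
    return obs


# Usage: drift b(x) = sin(pi x), unit sampling distance, 20 low-frequency observations.
data = simulate_low_frequency(lambda x: math.sin(math.pi * x), delta=1.0, n=20)
```

Only the coarse states are returned, mirroring the low-frequency design in which the path between two observation times is not observed.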
The process $(X_t : t \geq 0)$ forms an ergodic Markov process with invariant distribution $\mu_b = \mu$, whose Lebesgue density (which we also denote by $\mu$) is given by
$$\mu(x) = \frac{e^{\int_0^x b(y)\,dy}}{\int_0^1 e^{\int_0^z b(y)\,dy}\,dz}, \qquad x \in [0,1], \qquad (4)$$
see e.g. Chapter 4 in [3]. Moreover, we denote the Lebesgue transition densities and the semigroup associated to $(X_t : t \geq 0)$ by $p_{t,b}$ and $P_{t,b}$, respectively:
$$\Pr(X_t \in A \mid X_0 = x) = \int_A p_{t,b}(x,y)\,dy \quad \text{for Borel sets } A \subseteq [0,1], \qquad (5)$$
$$P_{t,b} f(x) = \int_0^1 p_{t,b}(x,y) f(y)\,dy. \qquad (6)$$
Here, by Proposition 9 in [24], the transition densities are well-defined as well as bounded above and below for each $t > 0$, so that (6) is well-defined, too. Let $P_b$ denote the law of $(X_{i\Delta} : i \geq 0)$ on $[0,1]^{\mathbb N}$. For ease of exposition, we assume throughout that $X_0 \sim \mu_b$ under $P_b$, a common assumption (cf. [12,24]) which we make due to the uniform spectral gap over $b \in \Theta$ guaranteed by Lemma 12 below, which yields exponentially fast convergence of the law of $X_t$ to $\mu_b$. Then under any $P_b$, $b \in \Theta$, the law of $X^{(n)}$ from (1) on $[0,1]^{n+1}$ is absolutely continuous with respect to the $(n+1)$-dimensional Lebesgue measure, and the log-density, which also constitutes the log-likelihood (when viewed as a function of $b$), is given by
$$\log \mu_b(X_0) + \sum_{i=1}^n \log p_{\Delta,b}(X_{(i-1)\Delta}, X_{i\Delta}). \qquad (7)$$
We note that some of the above assumptions can be relaxed at the expense of further technicalities in the proofs. Firstly, the assumption $X_0 \sim \mu_b$ could be replaced by $X_0 \sim \pi_b$ (under $P_b$), for any measures $\pi_b$ with Lebesgue densities such that for all $b \in \Theta$, $\log d\pi_b(\ldots)$. Secondly, it is conceivable that the main Theorems 1 and 2 below can be generalized to all $b \in H^1$ and $h \in \{f \in H^1 : f(0) = f(1) = 0\}$, which we shall not pursue further here, however.
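Formula (4) and the likelihood (7) are easy to evaluate numerically once a transition density is available. The sketch below (hypothetical helper names; trapezoidal quadrature is an assumed discretisation) computes $\mu_b$ on a grid and assembles (7) from user-supplied callables `log_mu` and `log_p`. It also illustrates that the constants in the two-sided bound on $\mu_b$ can be taken as $e^{-2\|b\|_\infty}$ and $e^{2\|b\|_\infty}$, since $|\int_0^x b| \leq \|b\|_\infty$ for $x \in [0,1]$.

```python
import numpy as np


def invariant_density(b, n_grid=1001):
    """Evaluate (4), mu_b(x) = exp(int_0^x b) / int_0^1 exp(int_0^z b) dz, on a uniform grid.

    Both integrals are approximated with the trapezoidal rule (an assumption of this sketch);
    `b` must accept a NumPy array.
    """
    x = np.linspace(0.0, 1.0, n_grid)
    dx = x[1] - x[0]
    bx = b(x)
    # cumulative trapezoid: B[k] approximates int_0^{x_k} b(y) dy
    B = np.concatenate(([0.0], np.cumsum(0.5 * (bx[1:] + bx[:-1]) * dx)))
    w = np.exp(B)
    Z = np.sum(0.5 * (w[1:] + w[:-1])) * dx
    return x, w / Z


def log_likelihood(obs, log_mu, log_p):
    """Log-likelihood (7): log mu_b(X_0) + sum_i log p_{Delta,b}(X_{(i-1)Delta}, X_{i Delta})."""
    return log_mu(obs[0]) + sum(log_p(u, v) for u, v in zip(obs[:-1], obs[1:]))


x, mu = invariant_density(lambda z: np.sin(np.pi * z))
b_sup = 1.0                                   # sup-norm of sin(pi x) on [0, 1]
print(mu.min() >= np.exp(-2 * b_sup), mu.max() <= np.exp(2 * b_sup))
```

For a constant drift $b \equiv c$, (4) reduces to $c\,e^{cx}/(e^c - 1)$, which gives a quick sanity check of the quadrature.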

Differentiability of the transition densities
In order to prove the LAN property, we need to differentiate the log-likelihood (7) at any drift parameter $b \in \Theta$, and the following theorem shows that for any $x, y \in [0,1]$, maps of the form $b \mapsto p_{\Delta,b}(x,y)$ are infinitely differentiable in 'directions' $h \in C^1_0$ (and in fact, Fréchet differentiable). For $b, h \in C^1_0$, $\eta \in \mathbb R$ and $x, y \in [0,1]$, for convenience we introduce the notation
$$\Phi(\eta) := p_{\Delta, b+\eta h}(x,y).$$
Theorem 1. For all $b, h \in C^1_0$ and $x, y \in [0,1]$, $\Phi$ is a smooth (in fact, real analytic) function on $\mathbb R$, and we have
$$\Phi'(0) = \int_0^\Delta \int_0^1 p_{\Delta-s,b}(x,z)\,h(z)\,\partial_z p_{s,b}(z,y)\,dz\,ds. \qquad (8)$$
Moreover, for each integer $k \geq 1$, we have the following bound on the $k$-th derivative of $\Phi$ at 0:
Section 3 is devoted to the proof of Theorem 1. Heuristically, the right-hand side of (8) has the form of a solution to an inhomogeneous parabolic PDE (cf. Proposition 4), and this PDE perspective will be key in the proofs. However, one has to be careful with such an interpretation, as the singular 'source term' $h\,\partial_1 p_{t,b}(\cdot, y)$ does not fall within the scope of classical PDE theory. Therefore, the above intuition needs to be made rigorous via a regularisation argument, see Section 3.
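Formula (8) is a Duhamel (variation-of-constants) formula for the derivative of a semigroup with respect to its generator, and a finite-dimensional analogue can be checked numerically. The sketch below assumes a finite-volume discretisation of $L_b f = f'' + bf' = e^{-B}(e^{B}f')'$ (with $B' = b$ and zero flux at the boundary) and compares a central finite difference of $\eta \mapsto e^{\Delta L_{b+\eta h}}$ with $\int_0^\Delta e^{(\Delta-s)L_b}\,E\,e^{sL_b}\,ds$, where $E$ is the $\eta$-derivative of the discretised generator (the discrete counterpart of the first order operator $f \mapsto hf'$); the time integral is evaluated exactly in the eigenbasis of $L_b$. Dividing either matrix by the cell width approximates $\Phi'(0)$ on the grid. The discretisation, the helper names and the test functions $b(x) = \sin(\pi x)$, $h(x) = \sin(2\pi x)$ are illustrative assumptions, not the method of this paper.

```python
import numpy as np


def differences(n, hx):
    """Forward differences f' at the n - 1 interior faces of n cells of width hx."""
    return (np.eye(n, k=1) - np.eye(n))[:-1] / hx


def generator_eig(Bx, Bf, hx):
    """L = P diag(lam) Pinv for the finite-volume generator f -> exp(-B) (exp(B) f')'
    (= f'' + b f' with b = B') with zero-flux boundary conditions; Bx, Bf are B at the
    cell centres and at the interior faces."""
    D = differences(Bx.size, hx)
    K = D.T @ (np.exp(Bf)[:, None] * D)
    s = np.exp(-0.5 * Bx)
    lam, U = np.linalg.eigh(-(s[:, None] * K * s[None, :]))
    return lam, s[:, None] * U, U.T / s[None, :]


def semigroup(Bx, Bf, hx, t):
    lam, P, Pinv = generator_eig(Bx, Bf, hx)
    return P @ (np.exp(t * lam)[:, None] * Pinv)


def duhamel_derivative(Bx, Bf, Hx, Hf, hx, t):
    """int_0^t e^{(t-s)L} E e^{sL} ds, where E is the derivative in eta of the generator built
    from B + eta H at eta = 0; the time integral is exact in the eigenbasis of L."""
    D = differences(Bx.size, hx)
    E = ((Hx * np.exp(-Bx))[:, None] * (D.T @ (np.exp(Bf)[:, None] * D))
         - np.exp(-Bx)[:, None] * (D.T @ ((Hf * np.exp(Bf))[:, None] * D)))
    lam, P, Pinv = generator_eig(Bx, Bf, hx)
    ek = np.exp(t * lam)
    diff = lam[:, None] - lam[None, :]
    same = np.abs(diff) <= 1e-12 * (1.0 + np.abs(lam))[:, None]
    with np.errstate(divide="ignore", invalid="ignore"):
        G = np.where(same, t * ek[:, None], (ek[:, None] - ek[None, :]) / diff)
    return P @ ((Pinv @ E @ P) * G) @ Pinv


n, delta, eps = 60, 0.5, 1e-4
hx = 1.0 / n
x, faces = (np.arange(n) + 0.5) * hx, np.arange(1, n) * hx


def B(z):
    return (1.0 - np.cos(np.pi * z)) / np.pi                # antiderivative of b(x) = sin(pi x)


def H(z):
    return (1.0 - np.cos(2.0 * np.pi * z)) / (2.0 * np.pi)  # antiderivative of h(x) = sin(2 pi x)


fd = (semigroup(B(x) + eps * H(x), B(faces) + eps * H(faces), hx, delta)
      - semigroup(B(x) - eps * H(x), B(faces) - eps * H(faces), hx, delta)) / (2.0 * eps)
dk = duhamel_derivative(B(x), B(faces), H(x), H(faces), hx, delta)
rel_err = np.max(np.abs(fd - dk)) / np.max(np.abs(dk))
```

Evaluating the time integral through divided differences of the exponential avoids any quadrature error, so the remaining discrepancy `rel_err` reflects only the finite difference in $\eta$.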

LAN expansion
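By Lemma 15, for each $b \in \Theta$, $p_{\Delta,b}(\cdot,\cdot)$ is bounded above and bounded away from zero. Hence by Theorem 1 and the chain rule, the score operator is given by
$$A_b h(x,y) := \frac{d}{d\eta}\Big|_{\eta=0} \log p_{\Delta,b+\eta h}(x,y) = \frac{1}{p_{\Delta,b}(x,y)}\int_0^\Delta\int_0^1 p_{\Delta-s,b}(x,z)\,h(z)\,\partial_z p_{s,b}(z,y)\,dz\,ds, \qquad (9)$$
for $h \in C^1_0$ and $x, y \in [0,1]$. For any $f, g \in L^2([0,1]\times[0,1])$, we write $\langle f, g\rangle_{L^2(p_{\Delta,b}\mu_b)} := \int_0^1\int_0^1 f(x,y)\,g(x,y)\,p_{\Delta,b}(x,y)\,\mu_b(x)\,dy\,dx$, and we define the corresponding 'LAN inner product' and 'LAN norm' as follows:
$$\langle h_1, h_2\rangle_{LAN} := \langle A_b h_1, A_b h_2\rangle_{L^2(p_{\Delta,b}\mu_b)}, \qquad \|h\|_{LAN}^2 := \langle h, h\rangle_{LAN}, \qquad h, h_1, h_2 \in C^1_0.$$
Here is our main result; the proof can be found in Section 2.5.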
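Theorem 2 (LAN expansion). For any $b, h \in C^1_0$, writing $\ell_n(\cdot)$ for the log-likelihood (7) of $X^{(n)}$, we have that, as $n \to \infty$,
$$\ell_n\Big(b + \frac{h}{\sqrt n}\Big) - \ell_n(b) = \frac{1}{\sqrt n}\sum_{i=1}^n A_b h(X_{(i-1)\Delta}, X_{i\Delta}) - \frac12\|h\|_{LAN}^2 + o_{P_b}(1)$$
and
$$\frac{1}{\sqrt n}\sum_{i=1}^n A_b h(X_{(i-1)\Delta}, X_{i\Delta}) \xrightarrow{d} N\big(0, \|h\|_{LAN}^2\big) \quad \text{under } P_b.$$
Note that due to the nature of the non-i.i.d. Markov chain data at hand, $A_b$ necessarily needs to map into a function space of two variables, as the overall log-likelihood cannot be written as a sum of functions of single states of the chain, but only of pairs of consecutive states.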
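The score (9) and the LAN norm can be approximated in a discretised model. The sketch below assumes the same kind of finite-volume generator for $f'' + bf'$ with zero flux at the boundary, differentiates $\log p_{\Delta,b+\eta h}$ in $\eta$ by central finite differences (standing in for formula (9)), and checks that $\int_0^1 A_bh(x,y)\,p_{\Delta,b}(x,y)\,dy = 0$, which holds because $\int_0^1 p_{\Delta,b+\eta h}(x,y)\,dy = 1$ for all $\eta$. The squared LAN norm is computed as the second moment of the score under the stationary law $\mu_b(x)p_{\Delta,b}(x,y)\,dx\,dy$. Helper names and test functions are hypothetical.

```python
import numpy as np


def transition_matrix(B, n_cells, t):
    """Hypothetical finite-volume model (generator f'' + b f' with b = B', zero-flux boundary):
    returns the cell width hx, the stationary density mu and p[i, j] ~ p_t(x_i, x_j)."""
    hx = 1.0 / n_cells
    x = (np.arange(n_cells) + 0.5) * hx
    faces = np.arange(1, n_cells) * hx
    D = (np.eye(n_cells, k=1) - np.eye(n_cells))[:-1] / hx
    K = D.T @ (np.exp(B(faces))[:, None] * D)
    s = np.exp(-0.5 * B(x))
    lam, U = np.linalg.eigh(-(s[:, None] * K * s[None, :]))
    semigroup = (s[:, None] * U) @ (np.exp(t * lam)[:, None] * (U.T / s[None, :]))
    mu = np.exp(B(x)) / (np.sum(np.exp(B(x))) * hx)
    return hx, mu, semigroup / hx


def score_and_lan_norm(B, H, n_cells=80, delta=1.0, eps=1e-4):
    """Central-difference approximation of A_b h(x, y) = d/d eta log p_{Delta, b + eta h}(x, y)
    at eta = 0 (H is an antiderivative of h), its conditional mean given x, and ||h||_LAN^2."""
    hx, mu, p = transition_matrix(B, n_cells, delta)
    _, _, p_plus = transition_matrix(lambda z: B(z) + eps * H(z), n_cells, delta)
    _, _, p_minus = transition_matrix(lambda z: B(z) - eps * H(z), n_cells, delta)
    score = (np.log(p_plus) - np.log(p_minus)) / (2.0 * eps)
    cond_mean = np.sum(score * p, axis=1) * hx                # int A_b h(x, y) p(x, y) dy
    lan_sq = np.sum(mu[:, None] * p * score ** 2) * hx * hx   # E_b[(A_b h(X_0, X_Delta))^2]
    return score, cond_mean, lan_sq


score, cond_mean, lan_sq = score_and_lan_norm(
    lambda z: (1.0 - np.cos(np.pi * z)) / np.pi,                # b(x) = sin(pi x)
    lambda z: (1.0 - np.cos(2.0 * np.pi * z)) / (2.0 * np.pi))  # h(x) = sin(2 pi x)
```

The vanishing conditional mean is exactly what makes the first-order term of the expansion a martingale in the proof below.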

Potential statistical applications of Theorem 2
The LAN expansion can be used to obtain semiparametric lower bounds for the estimation of certain linear functionals $L(b)$ for which there exists a Riesz representer $\Psi \in C^1_0$ such that $L(\cdot) = \langle\Psi, \cdot\rangle_{LAN}$, and can potentially be used further to prove a nonparametric Bernstein-von Mises theorem.
To make this more precise, we define the 'information operator' (which generalizes the Fisher information) by $I_b := A_b^* A_b$, where $A_b$ from (9) is viewed as a densely defined operator on $L^2$ with domain $C^1_0$, and $A_b^*$ is the adjoint of $A_b$ with respect to the inner products $\langle\cdot,\cdot\rangle_{L^2}$ and $\langle\cdot,\cdot\rangle_{L^2(p_{\Delta,b}\mu_b)}$. Then, for example, to study semiparametric Cramér-Rao lower bounds for functionals of the form $L(b) = \langle\psi, b\rangle_{L^2}$, one needs to know whether $\psi$ lies in the range $R(I_b)$ of $I_b$, see pp. 372-373 in [27] for a detailed discussion. Assuming the injectivity of $I_b$, the 'optimal asymptotic variance' for estimators of $L(b)$ is then given by
$$\|I_b^{-1}\psi\|_{LAN}^2 = \langle\psi, I_b^{-1}\psi\rangle_{L^2},$$
which may intuitively be understood as an 'inverse Fisher information', in analogy to the parametric setting.
When $R(I_b)$ is known to contain at least a 'nice' subspace of functions, e.g. $C^\infty_c$, $I_b$ can be inverted on that subspace, and if key mapping properties of $I_b^{-1}$ are known, then along the lines of [21,23,22,20], one can further try to prove a nonparametric BvM. This would assert the convergence of infinite-dimensional posterior distributions to a Gaussian limit measure $G$ whose covariance is given by the LAN inner product via $\mathrm{Cov}[G(\psi_1), G(\psi_2)] = \langle\Psi_1, \Psi_2\rangle_{LAN}$, cf. (28) in [21]. The identification of $R(I_b)$ in the present case of diffusions sampled at low frequency, as well as the study of mapping properties of $I_b$, remain challenging open problems.

Proof of the LAN expansion
We now give the proof of Theorem 2, assuming the validity of Theorem 1 which is proven in Section 3 below. Besides Theorem 1, the other key ingredient for Theorem 2 is the following CLT for martingale difference sequences, which is due to Brown [4] (building on ideas of Billingsley and Lévy).
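Proposition 3. Let $(Y_i : i \geq 1)$ be square-integrable martingale differences with respect to a filtration $(\mathcal F_i : i \geq 0)$, and set $S_n := \sum_{i=1}^n Y_i$, $\sigma_i^2 := E[Y_i^2 \mid \mathcal F_{i-1}]$, $V_n^2 := \sum_{i=1}^n \sigma_i^2$ and $s_n^2 := E[V_n^2] = E[S_n^2]$. Suppose that $V_n^2 s_n^{-2} \to 1$ in probability as $n \to \infty$, and that for all $\epsilon > 0$,
$$s_n^{-2}\sum_{i=1}^n E\big[Y_i^2\,\mathbf 1\{|Y_i| \geq \epsilon s_n\} \mid \mathcal F_{i-1}\big] \xrightarrow{n\to\infty} 0 \qquad (13)$$
in probability. Then, as $n \to \infty$, we have $S_n/s_n \xrightarrow{d} N(0,1)$.
Proof of Theorem 2. Fix $b, h \in C^1_0$. Due to the spectral gap of the generator $L_b$ (see Lemma 12), the Markov chain $(X_{n\Delta} : n \in \mathbb N)$ originating from the diffusion (2) with initial distribution $X_0 \sim \mu_b$ is stationary and geometrically ergodic; we will use this fact repeatedly.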
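For notational convenience, we write
$$f(\eta, x, y) := \log p_{\Delta, b+\eta h}(x,y), \qquad \eta \in \mathbb R,\ x, y \in [0,1].$$
By Theorem 1, $f$ is smooth in $\eta$ on a neighbourhood of 0, and for some $C < \infty$, the second order Taylor remainder satisfies, for all sufficiently small $|\eta|$ and all $x, y \in [0,1]$,
$$\Big| f(\eta,x,y) - f(0,x,y) - \eta\,\partial_\eta f(0,x,y) - \frac{\eta^2}{2}\,\partial_\eta^2 f(0,x,y)\Big| \leq C|\eta|^3. \qquad (14)$$
Thus, Taylor-expanding the log-likelihood (7) in direction $h/\sqrt n$ yields that
$$\log\mu_{b+h/\sqrt n}(X_0) + \sum_{i=1}^n f\big(n^{-1/2}, X_{(i-1)\Delta}, X_{i\Delta}\big) - \log\mu_b(X_0) - \sum_{i=1}^n f\big(0, X_{(i-1)\Delta}, X_{i\Delta}\big) = A_n + B_n + C_n + D_n, \qquad (15)$$
where $A_n := \log\mu_{b+h/\sqrt n}(X_0) - \log\mu_b(X_0)$, $B_n := n^{-1/2}\sum_{i=1}^n \partial_\eta f(0, X_{(i-1)\Delta}, X_{i\Delta})$, $C_n := (2n)^{-1}\sum_{i=1}^n \partial_\eta^2 f(0, X_{(i-1)\Delta}, X_{i\Delta})$, and $D_n$ is the sum of the $n$ Taylor remainders. For the remainder term $D_n$, we immediately see from (14) that $|D_n| \leq C n \cdot n^{-3/2} = C n^{-1/2} \to 0$.
For $C_n$, observe that the function $\partial_\eta^2 f(0,\cdot,\cdot)$ is bounded by Theorem 1, so that the almost sure ergodic theorem yields that
$$C_n \to \frac12 E_b\big[\partial_\eta^2 f(0, X_0, X_\Delta)\big] \qquad P_b\text{-a.s.},$$
where $E_b$ denotes the expectation with respect to $P_b$. Moreover, we have
$$\partial_\eta^2 f(0,x,y) = \frac{\partial_\eta^2 p_{\Delta,b+\eta h}(x,y)|_{\eta=0}}{p_{\Delta,b}(x,y)} - \big(A_b h(x,y)\big)^2,$$
and by interchanging differentiation and integration (which is possible by Theorem 1), we see that
$$E_b\Big[\frac{\partial_\eta^2 p_{\Delta,b+\eta h}(X_0,X_\Delta)|_{\eta=0}}{p_{\Delta,b}(X_0,X_\Delta)}\Big] = \int_0^1 \mu_b(x)\,\frac{d^2}{d\eta^2}\Big|_{\eta=0}\int_0^1 p_{\Delta,b+\eta h}(x,y)\,dy\,dx = 0,$$
since $\int_0^1 p_{\Delta,b+\eta h}(x,y)\,dy = 1$ for all $\eta$. Hence $C_n \to -\frac12\|h\|_{LAN}^2$ $P_b$-a.s.
We next treat $B_n$. Let $(\mathcal F_n : n \geq 0)$ denote the natural filtration of $(X_{n\Delta} : n \geq 0)$. In view of Proposition 3, we write
$$Y_i := \partial_\eta f(0, X_{(i-1)\Delta}, X_{i\Delta}) = A_b h(X_{(i-1)\Delta}, X_{i\Delta}), \qquad M_n := \sum_{i=1}^n Y_i = \sqrt n\,B_n.$$
Then, using dominated convergence and the Markov property, we see that $M_n$ is a martingale with stationary increments: for any $n \geq 1$, we have
$$E_b[Y_n \mid \mathcal F_{n-1}] = \int_0^1 \partial_\eta p_{\Delta,b+\eta h}(X_{(n-1)\Delta}, y)\big|_{\eta=0}\,dy = \frac{d}{d\eta}\Big|_{\eta=0}\int_0^1 p_{\Delta,b+\eta h}(X_{(n-1)\Delta}, y)\,dy = 0.$$
Moreover, we have that $\sigma_n^2 = \tilde\sigma^2(X_{(n-1)\Delta})$ for the bounded measurable function $\tilde\sigma^2(x) := \int_0^1 (A_b h(x,y))^2\,p_{\Delta,b}(x,y)\,dy$, and by the stationarity of $(X_{i\Delta} : i \geq 0)$, we have $s_n^2 = n\,E_b[\tilde\sigma^2(X_0)] = n\|h\|_{LAN}^2$. If $\|h\|_{LAN} = 0$, then $Y_i = 0$ $P_b$-a.s. for all $i$ and there is nothing to prove; otherwise the ergodic theorem yields that, $P_b$-a.s.,
$$\frac{V_n^2}{s_n^2} = \frac{1}{n\|h\|_{LAN}^2}\sum_{i=1}^n \tilde\sigma^2(X_{(i-1)\Delta}) \to 1.$$
Lastly, as the $Y_i$'s are bounded random variables, condition (13) is fulfilled, so Proposition 3 yields that $B_n \xrightarrow{d} N(0, \|h\|_{LAN}^2)$. Finally, we observe that the term $A_n$ in (15) stemming from the invariant measure is of order $o_{P_b}(1)$, as it can be bounded uniformly using (4): for all $x \in [0,1]$,
$$\big|\log\mu_{b+h/\sqrt n}(x) - \log\mu_b(x)\big| \leq \frac{2\|h\|_\infty}{\sqrt n}.$$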
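The mechanics of this proof can be traced in a toy model where every quantity is explicit. The sketch below replaces the diffusion by a hypothetical three-state Markov chain with transition matrix $P_\eta = \mathrm{softmax}(A + \eta H)$, taken row by row (an illustrative assumption, not part of this paper). It checks three facts used above: the score has conditional mean zero (so $M_n$ is a martingale), the stationary mean of $\partial_\eta^2 \log P_\eta$ equals minus the squared LAN norm (which gives the limit of $C_n$), and the variance of $B_n$ matches $\|h\|_{LAN}^2$ (the limit in Proposition 3).

```python
import numpy as np


def softmax_rows(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)


def stationary_distribution(P):
    """Normalised left eigenvector of P for the eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def toy_chain(A, H):
    """Hypothetical toy model P_eta = softmax_rows(A + eta * H).  Returns P = P_0, its
    stationary law pi, the score d/d eta log P_eta and d^2/d eta^2 log P_eta at eta = 0."""
    P = softmax_rows(A)
    m = (P * H).sum(axis=1, keepdims=True)           # conditional mean of H given the state
    score = H - m
    second = np.broadcast_to(m ** 2 - (P * H ** 2).sum(axis=1, keepdims=True), H.shape)
    return P, stationary_distribution(P), score, second


def simulate_B_n(P, pi, score, n, reps, seed=0):
    """Monte Carlo draws of B_n = n^{-1/2} sum_i score(X_{i-1}, X_i) for the stationary chain."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(P, axis=1)
    state = rng.choice(P.shape[0], size=reps, p=pi)
    total = np.zeros(reps)
    for _ in range(n):
        u = rng.random(reps)
        nxt = np.minimum((u[:, None] > cum[state]).sum(axis=1), P.shape[0] - 1)
        total += score[state, nxt]
        state = nxt
    return total / np.sqrt(n)


A = np.array([[0.0, 1.0, 0.5], [0.3, 0.0, -0.2], [1.0, 0.2, 0.0]])
H = np.array([[1.0, -1.0, 0.0], [0.0, 2.0, -1.0], [-0.5, 0.0, 1.0]])
P, pi, score, second = toy_chain(A, H)
lan_sq = np.sum(pi[:, None] * P * score ** 2)        # stationary second moment of the score
draws = simulate_B_n(P, pi, score, n=500, reps=4000)
```

Because the score increments are uncorrelated under stationarity, the variance of $B_n$ equals $\|h\|_{LAN}^2$ for every $n$ in this toy model; the Monte Carlo draws should reproduce it up to sampling error.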

Local approximation of transition densities
In this section, we study the differentiability properties of $p_{t,b}(x,y)$ as a function of the drift $b$, and the main goal is to prove Theorem 1. For technical reasons, we first prove a regularized version of it (Lemma 8 in Section 3.2) and then let the regularization parameter $\delta > 0$ tend to 0 to obtain Theorem 1 (Section 3.3).

Preliminaries and notation
We begin by introducing some notation and important classical results.

Some function spaces
For any integer $s \geq 0$, we equip the Sobolev space $H^s = H^s((0,1))$ with the inner product
$$\langle f, g\rangle_{H^s} := \sum_{k=0}^s \langle f^{(k)}, g^{(k)}\rangle_{L^2},$$
where $L^2$ is the usual space of square integrable functions with respect to Lebesgue measure. Occasionally it will be convenient to replace the $L^2$-inner product above by the $L^2(\mu)$-inner product, where $\mu$ is the invariant measure of $(X_t : t \geq 0)$; by (21), this induces a norm which is equivalent to the one induced by the standard inner product on $H^s$.
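We will also use the fractional Sobolev spaces $H^s$ for real $s \geq 0$, which are obtained by interpolation, see [17]. For $s > \frac12$, the Sobolev embedding (18) implies that any function $f \in H^s$ extends uniquely to a continuous function on $[0,1]$. The following standard interpolation results and embeddings (see [17], pp. 44-45) will be used throughout. For all $s_1, s_2 \geq 0$ and $\theta \in (0,1)$, we have
$$\|f\|_{H^{\theta s_1 + (1-\theta)s_2}} \lesssim \|f\|_{H^{s_1}}^{\theta}\,\|f\|_{H^{s_2}}^{1-\theta},$$
and for each $s > 1/2$, we have the multiplicative inequality
$$\|fg\|_{H^s} \lesssim \|f\|_{H^s}\,\|g\|_{H^s}, \qquad (17)$$
as well as the continuous embedding
$$H^s \subseteq C([0,1]), \qquad \|f\|_\infty \lesssim \|f\|_{H^s}, \qquad (18)$$
where $C([0,1])$ denotes the space of continuous functions on $[0,1]$. Moreover, for any $s > 0$, we define the negative order Sobolev spaces $H^{-s}$ as the topological dual space of $H^s$, where for any $f \in L^2$, the norm can be written as
$$\|f\|_{H^{-s}} = \sup_{\varphi \in H^s,\ \|\varphi\|_{H^s} \leq 1} |\langle f, \varphi\rangle_{L^2}|.$$
For any positive number $T > 0$, any Banach space $(X, \|\cdot\|)$ and any integer $k \geq 0$, we denote by $C^k([0,T], X)$ the space of $k$-times continuously differentiable functions $f : [0,T] \to X$, with norm $\|f\|_{C^k([0,T],X)} := \sum_{i=0}^k \sup_{t \in [0,T]}\|f^{(i)}(t)\|$. For $\alpha > 0$ with $\alpha \notin \mathbb N$, we denote the space of $\alpha$-Hölder continuous functions $f : [0,T] \to X$ by $C^\alpha([0,T], X)$ and equip it with the usual norm
$$\|f\|_{C^\alpha([0,T],X)} := \|f\|_{C^{\lfloor\alpha\rfloor}([0,T],X)} + \sup_{0 \leq s < t \leq T}\frac{\|f^{(\lfloor\alpha\rfloor)}(t) - f^{(\lfloor\alpha\rfloor)}(s)\|}{|t-s|^{\alpha-\lfloor\alpha\rfloor}}.$$
We will frequently, without further comment, interpret functions $f : [0,T]\times[0,1] \to \mathbb R$ as maps $t \mapsto f(t,\cdot)$ from $[0,T]$ to $L^2$, and vice versa.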
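For functions expanded in the Neumann cosine basis $e_0 = 1$, $e_j(x) = \sqrt2\cos(j\pi x)$, the interpolation inequality and the duality pairing can be checked directly on the coefficients. The sketch assumes the weights $(1 + j^2\pi^2)^s$, which define the Hilbert scale of the Neumann Laplacian; this scale agrees with $H^s$ only up to boundary conditions (for $s = 2$, for instance, the weighted norm is finite only for functions in the Neumann domain, cf. Lemma 13), so the example illustrates the structure of the inequalities rather than the exact $H^s$ norms. For these weights the interpolation inequality is Hölder's inequality with exponents $1/\theta$ and $1/(1-\theta)$. The helper names are hypothetical.

```python
import numpy as np


def spectral_norm(coef, s):
    """Norm of f = sum_j coef[j] e_j in the Neumann cosine basis e_0 = 1, e_j = sqrt(2) cos(j pi x),
    using the (assumed) weights (1 + (j pi)^2)^s; negative s gives the dual-type norm."""
    j = np.arange(coef.size)
    return float(np.sqrt(np.sum((1.0 + (j * np.pi) ** 2) ** s * coef ** 2)))


def interpolation_gap(coef, s1, s2, theta):
    """rhs - lhs in ||f||_{theta s1 + (1 - theta) s2} <= ||f||_{s1}^theta ||f||_{s2}^(1 - theta)."""
    lhs = spectral_norm(coef, theta * s1 + (1.0 - theta) * s2)
    rhs = spectral_norm(coef, s1) ** theta * spectral_norm(coef, s2) ** (1.0 - theta)
    return rhs - lhs


rng = np.random.default_rng(0)
decay = (1.0 + np.arange(64)) ** 2
f_coef = rng.standard_normal(64) / decay        # coefficients of a smooth test function f
psi_coef = rng.standard_normal(64) / decay      # coefficients of a test function psi
gaps = [interpolation_gap(f_coef, 0.0, 2.0, t) for t in (0.25, 0.5, 0.75)]
# duality: |<f, psi>_{L^2}| <= ||f||_{-1} ||psi||_{1}  (Parseval plus Cauchy-Schwarz)
pairing = abs(float(np.dot(f_coef, psi_coef)))
bound = spectral_norm(f_coef, -1.0) * spectral_norm(psi_coef, 1.0)
```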

The differential operator L b
For any drift function $b \in H^1$, we define the differential operator
$$L_b f := f'' + b f', \qquad f \in D := \{f \in H^2 : f'(0) = f'(1) = 0\}.$$
It is well-known that $L_b$ is the infinitesimal generator of the semigroup $(P_{t,b} : t \geq 0)$ defined in (6), so that by the usual functional calculus $P_{t,b} = e^{tL_b}$ for all $t \geq 0$ (with the convention $e^{0 \cdot L_b} = \mathrm{Id}$). The fact that the domain $D$ of $L_b$ is equipped with Neumann boundary conditions corresponds to the diffusion being reflected at the boundary, see [14] for a detailed discussion. We equip $D$ with the graph norm $\|f\|_D := \|f\|_{L^2} + \|L_b f\|_{L^2}$, which by Lemma 13 is equivalent to the $H^2$-norm on $D$. Moreover, for $h \in H^1$, we define the first order differential operator
$$L_h f := h f'. \qquad (19)$$
The operator $L_b$ has a purely discrete spectrum $\mathrm{Spec}(L_b) \subseteq (-\infty, 0]$ (see [8], Theorem 7.2.2). We will denote by $(u_{j,b})_{j\geq0}$ the $L^2(\mu_b)$-orthonormal basis of $L^2(\mu_b)$ consisting of the eigenfunctions $u_{j,b} \in D$ of $L_b$, ordered such that the corresponding eigenvalues $(\lambda_{j,b})_{j\geq0}$ are non-increasing. When there is no ambiguity, we will often simply write $\lambda_j$ and $u_j$. We will use throughout the spectral decomposition
$$p_{t,b}(x,y) = \sum_{j=0}^\infty e^{\lambda_j t}\,u_j(x)\,u_j(y)\,\mu_b(y), \qquad t > 0,\ x, y \in [0,1], \qquad (20)$$
see e.g. p. 101 in [2], and the spectral representations
$$P_{t,b} f = \sum_{j\geq0} e^{\lambda_j t}\langle f, u_j\rangle_{L^2(\mu_b)}\,u_j, \qquad L_b f = \sum_{j\geq0}\lambda_j\langle f, u_j\rangle_{L^2(\mu_b)}\,u_j.$$
We also note that (4) immediately yields that there exist constants $0 < C < C' < \infty$ such that for all $b \in \Theta$ and all $x \in [0,1]$,
$$C \leq \mu_b(x) \leq C'. \qquad (21)$$
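The spectral objects above can be approximated numerically. The following sketch assumes a cell-centred finite-volume discretisation of $L_b f = \mu_b^{-1}(\mu_b f')'$ (which uses $\mu_b' = b\mu_b$, a consequence of (4)) with zero flux at $0$ and $1$. Symmetrising with $\mu_b^{1/2}$ yields a symmetric eigenproblem, mirroring the self-adjointness of $L_b$ on $L^2(\mu_b)$; the sketch returns eigenvalues ordered as $0 = \lambda_0 > \lambda_1 > \dots$, $L^2(\mu_b)$-orthonormal eigenvectors, and a discrete version of (20). For $b = 0$ the leading eigenvalues should be close to $-j^2\pi^2$. Function names and grid sizes are hypothetical.

```python
import numpy as np


def neumann_spectral(B, n_cells=200):
    """Hypothetical finite-volume approximation of the eigenpairs of L_b f = f'' + b f' on [0, 1]
    with Neumann (zero-flux) boundary conditions; B is an antiderivative of b, so mu_b is
    proportional to exp(B).

    Returns cell centres x, the density mu (sum(mu) * hx = 1), eigenvalues lam in
    non-increasing order and eigenvectors V (columns), orthonormal in the discrete inner
    product <f, g> = sum(mu * f * g) * hx.
    """
    hx = 1.0 / n_cells
    x = (np.arange(n_cells) + 0.5) * hx
    faces = np.arange(1, n_cells) * hx                      # interior cell faces
    Z = np.sum(np.exp(B(x))) * hx
    mu, mu_face = np.exp(B(x)) / Z, np.exp(B(faces)) / Z
    D = (np.eye(n_cells, k=1) - np.eye(n_cells))[:-1] / hx  # f' at the interior faces
    K = D.T @ (mu_face[:, None] * D)                        # weighted stiffness matrix
    s = 1.0 / np.sqrt(mu)
    lam, U = np.linalg.eigh(-(s[:, None] * K * s[None, :]))  # symmetrised version of -K / mu
    lam, U = lam[::-1], U[:, ::-1]                          # eigh sorts in ascending order
    V = s[:, None] * U / np.sqrt(hx)
    return x, mu, lam, V


def transition_density(t, mu, lam, V):
    """Discrete version of (20): p_t(x_i, x_j) = sum_k exp(t lam_k) V[i, k] V[j, k] mu[j]."""
    return (V * np.exp(t * lam)) @ V.T * mu[None, :]


x, mu, lam, V = neumann_spectral(lambda z: 0.0 * z)          # b = 0: reflected Brownian motion
print(lam[1:4] / (-np.pi ** 2 * np.arange(1, 4) ** 2))       # ratios should be close to 1
```

With a non-zero drift, the same helpers reproduce discrete analogues of the facts used later: rows of the transition density integrate to one, $\mu_b(x)p_{t,b}(x,y)$ is symmetric, and the first eigenfunction is constant.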

A key PDE result
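For any $f \in C([0,T], L^2)$ and $u_0 \in D$, consider the inhomogeneous parabolic equation
$$\partial_t u(t) = L_b u(t) + f(t), \quad 0 < t \leq T, \qquad u(0) = u_0. \qquad (22)$$
We say that a function $u : [0,T] \to L^2$ is a solution to (22) if $u \in C([0,T], D) \cap C^1([0,T], L^2)$ and (22) holds. The following Proposition regarding the existence, uniqueness and regularity properties of solutions to (22) will play a key role for the proofs in the rest of Section 3. To state the result, we need the interpolation spaces $D(\alpha)$, $0 \leq \alpha \leq 1$, between $L^2$ and $D$.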
Proposition 4. Suppose $0 < \alpha < 1$, $f \in C^\alpha([0,T], L^2)$ and $u_0 \in D$. Then there exists a unique solution $u$ to (22), given by the Bochner integral
$$u(t) = e^{tL_b}u_0 + \int_0^t e^{(t-s)L_b} f(s)\,ds, \qquad t \in [0,T]. \qquad (24)$$
If also $f(0) + L_b u_0 \in D(\alpha)$, and there exists $C < \infty$ so that for all such $f$ and $u_0$,
Proof. This is a special case of Theorem 4.3.1 (iii) in [19] with $X = L^2(\mu_b)$ and $A = L_b$, where we note that the integral formula (24) is given by Proposition 4.1.2 in the same reference. We also note that $D(\alpha)$ coincides with the space $D_A(\alpha, \infty)$ from [19] with equivalent norms, see Proposition 2.2.4 in [19]. It therefore suffices to verify that the general theory for parabolic PDEs developed in [19] applies to our particular case. For that, we need to check that $(P_{t,b} : t \geq 0)$ is an analytic semigroup of operators on $L^2$ in the sense of [19], p. 34, which requires the following.
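1. The resolvent set $\rho(L_b)$ of $L_b$ contains a sector $S_{\theta,\omega} \subseteq \mathbb C$ for some $\theta > \pi/2$ and $\omega \in \mathbb R$, where $S_{\theta,\omega} := \{\lambda \in \mathbb C : \lambda \neq \omega,\ |\arg(\lambda - \omega)| < \theta\}$.
2. There exists some $M < \infty$ such that we have the resolvent estimate $|\lambda - \omega|\,\|(\lambda - L_b)^{-1}\|_{L^2 \to L^2} \leq M$ for all $\lambda \in S_{\theta,\omega}$.
As $L_b$ is self-adjoint on $L^2(\mu_b)$ and has a discrete spectrum which is contained in the non-positive half line, both of the above properties are easily checked with $\omega = 0$.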
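The solution formula (24) can be tested in a discretised setting. The sketch below assumes a finite-volume generator for $f'' + bf'$ with zero flux at the boundary (built from a hypothetical antiderivative `B` of the drift), takes the Hölder continuous source $f(t) = \cos(t)\,g$, evaluates (24) exactly mode by mode in the eigenbasis of the discrete $L_b$, and checks with a central difference in time that the result solves $\partial_t u = L_b u + f$ with $u(0) = u_0$. In each mode, $\int_0^t e^{\lambda(t-s)}\cos s\,ds = (\lambda e^{\lambda t} - \lambda\cos t + \sin t)/(1+\lambda^2)$. The initial value and the source profile are arbitrary illustrative choices.

```python
import numpy as np


def fv_generator(B, n_cells):
    """Hypothetical finite-volume generator f'' + b f' (b = B', zero-flux boundary) in the
    form L = P diag(lam) Pinv; also returns the cell centres."""
    hx = 1.0 / n_cells
    x = (np.arange(n_cells) + 0.5) * hx
    faces = np.arange(1, n_cells) * hx
    D = (np.eye(n_cells, k=1) - np.eye(n_cells))[:-1] / hx
    K = D.T @ (np.exp(B(faces))[:, None] * D)
    s = np.exp(-0.5 * B(x))
    lam, U = np.linalg.eigh(-(s[:, None] * K * s[None, :]))
    return x, lam, s[:, None] * U, U.T / s[None, :]


def mild_solution(t, lam, P, Pinv, u0, g):
    """Formula (24) for u' = L u + cos(t) g, u(0) = u0.  In the eigenbasis every mode solves
    y' = lam y + cos(t) c, whose Duhamel integral int_0^t exp(lam (t - s)) cos(s) ds equals
    (lam exp(lam t) - lam cos t + sin t) / (1 + lam^2)."""
    a, c = Pinv @ u0, Pinv @ g
    e = np.exp(lam * t)
    conv = (lam * e - lam * np.cos(t) + np.sin(t)) / (1.0 + lam ** 2)
    return P @ (e * a + conv * c)


x, lam, P, Pinv = fv_generator(lambda z: (1.0 - np.cos(np.pi * z)) / np.pi, 30)
L = P @ (lam[:, None] * Pinv)
u0 = np.cos(np.pi * x) ** 2          # arbitrary initial value
g = x * (1.0 - x)                    # arbitrary spatial source profile
t, tau = 0.3, 1e-5
u = mild_solution(t, lam, P, Pinv, u0, g)
du_dt = (mild_solution(t + tau, lam, P, Pinv, u0, g)
         - mild_solution(t - tau, lam, P, Pinv, u0, g)) / (2.0 * tau)
rhs = L @ u + np.cos(t) * g
residual = np.max(np.abs(du_dt - rhs)) / np.max(np.abs(rhs))
```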

Approximation of regularized transition densities
The main result of this section is Lemma 8, which can be viewed as a regularized version of Theorem 1. The main tools used to prove it are Proposition 4 as well as the spectral analysis of $L_b$ and $P_{t,b}$ from Section 4. In order to apply Proposition 4, we view the transition densities $p_{t,b}(x,y)$, for fixed $y \in [0,1]$, as functions of the two variables $(t,x) \in [0,T]\times[0,1]$, where $T$ is an arbitrary constant with $T > \Delta > 0$ (with the convention that $p_{0,b}(\cdot,y)$ is the point mass at $y$). Due to the singular behaviour of $p_{t,b}(x,y)$ for $(t,x) \to (0,y)$, a regularisation argument is needed. For any $\delta > 0$ and $d \in C^1_0$, define the $\delta$-regularized transition densities by
$$u^\delta_d(t,x) := P_{t,d}\,\varphi_\delta(x), \qquad \varphi_\delta := p_{\delta,0}(\cdot, y), \qquad (t,x) \in [0,T]\times[0,1],$$
where $p_{\delta,0}$ is the transition density of the semigroup $(P_{t,0} : t \geq 0)$ for $b = 0$, which corresponds to reflected Brownian motion started at some point $y \in [0,1]$.

Recursive definition of approximations
We now implicitly define the 'candidate' local approximations to $u^\delta_d$ as solutions to certain parabolic PDEs. To that end, we note that using (6), one easily checks that for all $t \geq 0$, Hence we can give the following crucial PDE interpretation to $u^\delta_d$.
…, and $u^\delta_d$ is the unique solution to the initial value problem
$$\partial_t u = L_d u \ \text{ on } (0,T], \qquad u(0) = \varphi_\delta.$$
Proof. We check that Proposition 4 applies with $\alpha = 1/2$. For this, we need that $\varphi_\delta \in D$ and that $L_d\varphi_\delta \in D(1/2)$. Using the spectral decomposition (20) and the fact that $\mu_b = \mathrm{Leb}([0,1])$ for $b = 0$, we see by differentiating under the sum that $\varphi_\delta \in D$. This is possible by Lemma 12 and the dominated convergence theorem. The same argument yields that $\varphi_\delta \in H^3$. Thus, we have that $L_d\varphi_\delta \in H^1$, which is a subset of $D(1/2)$ by Lemma 17.
We now recursively define the functions $R^\delta_k[h]$ and $v^\delta_k[h]$, $k \geq 0$. The norm estimates in Section 3.2.2 justify that they are the correct remainder and approximating terms, respectively, in the $k$-th order Taylor expansion of $\eta \mapsto u^\delta_{b+\eta h}$. Definition 7. Let $b, h \in C^1_0$ and $\delta > 0$. 1. For $k = 0$, we define the '0-th order local approximation' of $\eta \mapsto u^\delta_{b+\eta h}$ at 0, and the remainder of this approximation, by where $S$ is the solution operator defined in (25) and $L_h$ was defined in (19).
We should justify why the definition (27) is admissible. We argue by induction. By Lemma 6, we have $R^\delta$ … Hence, using Proposition 4 and the definition of $R^\delta_k[h]$ inductively, one obtains that for each $k \geq 1$, (27) is well-defined. From the preceding definitions, we clearly have that Again, this is seen by induction. For $k = 0$, we have that $v^\delta$ … where we have used that for each $k \in \mathbb N \cup \{0\}$,

Regularity estimates
We now derive norm estimates for the remainders $R^\delta_k[h]$ from (27) and (29), using Proposition 4 and the results from Section 4.
The following Lemma is the main result of Section 3. It can be viewed as a regularised version of Theorem 1. Crucially, the estimate below is uniform in $\delta > 0$, so that it is preserved in the limit $\delta \to 0$.
The rest of this section is concerned with proving Lemma 8. In what follows, when we write that an inequality is 'uniform' without further comment, or when we use the symbols $\lesssim$, $\gtrsim$, $\simeq$, we mean that the constants involved can be chosen uniformly over $b, h, y, k$ and $\delta$ as in the statement of Lemma 8.
The proof of Lemma 8 consists of two separate Lemmas, which establish an $L^2$-estimate (34) and an $H^1$-estimate (37) for $R^\delta_k[h](\Delta)$, respectively. Given these two estimates, Lemma 8 then immediately follows from interpolating:
The $L^2$-estimate
To obtain estimates which are uniform in $\delta > 0$, we 'regularise' $R^\delta_k$ further by integrating in time. For $k \geq 0$, define Here is the $L^2$-estimate.
Lemma 9. 1. Let $b, h \in C^1_0$, $\delta > 0$ and recall the definition (25) of $S$. Then we have that and for $k \geq 1$, we have that 2. For all $\alpha < 1/4$, there exists $C < \infty$ such that for all $b, h, y, k, \delta$ as in Lemma 8, In particular, we have that Proof. We first show (31). Using Riemann sums to approximate the integrals below, the closedness of the operators $L_b$ and $L_h$ as well as (28), we obtain that Moreover, we have $Q^\delta_0(0) = 0$ and $Q^\delta$ …, so that (31) follows from Proposition 4. For $k \geq 1$, (32) is proved in the same manner.
Next, we prove (33) for $k = 0$. Let $\alpha < 1/4$, $\delta > 0$, $b \in \Theta$, $h \in C^1_0$ with $\|h\|_{H^1} \leq 1$, and let us write In view of (31) and Proposition 4, and noting that $hf(0) = 0$, it suffices to show that $\|f\|_{C^\alpha([0,T],L^2)} \leq C$ for some uniform constant $C$. For all $t < t' \in [0,T]$, we have by the definition of $u^\delta_{b+h}$ and Fubini's theorem that For convenience, let us for now write $\mu$ for $\mu_{b+h}$ and $(\lambda_j, u_j)_{j\geq0}$ for the eigenpairs of $L_{b+h}$. Using the spectral decomposition (20) with $b+h$ in place of $b$ and Fubini's theorem, integrating each summand separately yields that where Fubini's theorem is applicable due to Lemma 12: From Lemma 14 and (36), it follows that Using this, (57), the self-adjointness of $P_{t,b+h}$ with respect to $\langle\cdot,\cdot\rangle_{L^2(\mu)}$ and Lemma 17, we obtain that Hence, Proposition 4 and (31) imply (33) for $k = 0$. For $k \geq 1$, (33) now follows from inductively using Proposition 4, (32) and Finally, (34) follows upon differentiating (33) in $t$.
The $H^1$-estimate
The $H^1$-estimate reads as follows.
Lemma 10. Let $k \geq 0$ be an integer and $\Delta > 0$. Then there exists $C < \infty$ such that for all $b, h, y, k, \delta$ as in Lemma 8, To prove Lemma 10, we express $R^\delta_k[h]$ using (24) and decompose the integral into times close to 0 and times bounded away from 0. The following Lemma allows us to control the respective integrals.
Lemma 11. Let T > 0 and 0 < η < T . Then there exists C < ∞ such that for all b, h, y, k, δ as in Lemma 8, the following estimates hold.
1. For all $\tilde T \in (0,T]$, we have Proof. We first show (38). By Lemma 13, we can estimate the $(-L_b)^{1/2}$-graph norm instead of the $H^1$-norm. Using Lemma 12, we have A similar calculation yields that Combining the last two displays completes the proof of (38).
To prove (39), we use Lemma 16 and the boundary condition $h(0) = h(1) = 0$ to integrate by parts, which yields that
Proof of Lemma 10. The case $k = 0$ follows from Lemma 15. For $k \geq 1$, we split Taking $C$ to be the larger of the two constants from (39) and (34), we have by (39) and (34) that For the second term, we apply (38) to obtain In order to further estimate $II$, we iterate the preceding argument $k-1$ times by writing $R^\delta_j(s)$, $j \in \{1, \dots, k-1\}$, in the form (24), splitting the integral and estimating the two terms analogously, which yields where we have used (59) in the last step. This completes the proof.

Proof of Theorem 1
We now prove Theorem 1 by letting the regularisation parameter $\delta > 0$ in Lemma 8 tend to 0. Let us fix $b \in \Theta$, $h \in C^1_0$ with $\|h\|_{H^1} \leq 1$ and $x, y \in [0,1]$, and recall the notation $\Phi(\eta) := p_{\Delta,b+\eta h}(x,y)$ for $\eta \in \mathbb R$. For notational convenience, for any $\delta > 0$, $\eta \in \mathbb R$ and integer $k \geq 0$, we define Then Lemma 8 implies that there is $C < \infty$ such that for all $\delta > 0$ and $\eta \in \mathbb R$, Hence, for all $\delta > 0$, $\Phi^\delta$ is given by the power series We divide the rest of the proof into three steps. The first two steps imply an analogous power series for $\Phi$, and the third proves the integral formula (8).
1. Convergence of $\Phi^\delta(\eta)$. Note that by the definition of $u^\delta_{b+\eta h}$, we have that Moreover, by Lemma 15, for any $B > 0$ we have that Thus, using Lemma 17 and choosing $B$ large enough, it follows that for any $\alpha < 1/4$, there is $M < \infty$ such that for all $b \in \Theta$, $h \in C^1_0$ with $\|h\|_{H^1} \leq 1$ and $|\eta| \leq 1$,
2. Convergence of $a^\delta_k$. Fix some $\eta = 0$ and some sequence $\delta_n > 0$ tending to 0 as $n \to \infty$. Using (40), it is easily seen inductively that for all $k \geq 0$, the sequence $(a^{\delta_n}_k : n \in \mathbb N)$ is bounded. Hence, by a diagonal argument there exists a subsequence $(\delta_{n_l} : l \in \mathbb N)$ and some sequence $a_k \in \mathbb R$ such that for all $k$, $a^{\delta_{n_l}}_k \to a_k$ as $l \to \infty$. Defining the polynomials we see that (40) still holds with $\Phi$ and $p_k$ in place of $\Phi^\delta$ and $p^\delta_k$, whence we have proved that $\Phi$ is analytic and $\Phi(\eta) = \sum_{k=0}^\infty a_k\eta^k$ for $\eta \in \big(-\frac{1}{2C}, \frac{1}{2C}\big)$.
3. Proof of (8). It remains to show the integral formula (8) for $\Phi'(0)$. By what precedes, we know that the constants $a_0, a_1$ from (43) satisfy Moreover, by definition of $v^\delta_1[h]$, we have for all $\delta > 0$ that Therefore, (8) is proven if we can show that the following expression converges to 0 as $\delta \to 0$: showing that the $dz\,ds$-integrand in $I$ tends to 0 pointwise. By the heat kernel estimate (61) and (59), we can also bound the $dz\,ds$-integrand uniformly in $\delta$ by where $C$ is the constant from (61), whence Lemma 17 yields that Moreover, the $ds$-integrand is bounded by (cf. Lemma 15) an integrable function. Hence by dominated convergence, we have $|II| \to 0$ as $\delta \to 0$, which completes the proof.
In this section, we collect some properties of the generator $L_b$, the differential equation related to $L_b$ and the transition semigroup $(P_{t,b} : t \geq 0)$ which are needed for the proofs of Section 3. Although some results can be obtained using well-known, more general theory, our proofs are based on more or less elementary arguments, using the spectral analysis of $L_b$ in Section 4.1.

Bounds on eigenvalues and eigenfunctions of L b
The following Lemma summarizes some key properties of the eigenpairs $(u_j, \lambda_j)$ of $L_b$. Note that the estimate (45) is an improvement on the bound in Lemma 6.6 of [12], and that (45) moreover coincides with the intuition from the eigenvalue equation $L_b u_j = \lambda_j u_j$ that "two derivatives of $u_j$ correspond to one order of growth in $\lambda_j$".
Lemma 12. Let s ≥ 1 be an integer and B > 0.
Proof. Using that $u_j \in D \subseteq H^2$ and (17), we obtain that for all $j \geq 0$, $u_j'' = \lambda_j u_j - b u_j' \in H^1$. Differentiating this equation $s-1$ times and bootstrapping this argument yields that $u_j^{(s+1)} \in H^1$. Next, we prove (44) by adapting arguments from Chapter 4 of [8]. The standard Laplacian $L_0 f = f''$ with domain $D$ is a non-positive operator, self-adjoint with respect to the $L^2$-inner product, with spectrum $\{-j^2\pi^2 : j = 0, 1, 2, \dots\}$ and associated quadratic form
$$\|(-L_0)^{1/2} f\|_{L^2}^2 = \int_0^1 |f'(x)|^2\,dx, \qquad f \in \mathrm{Dom}\big((-L_0)^{1/2}\big) = H^1,$$
where the fact that $\mathrm{Dom}((-L_0)^{1/2}) = H^1$ is shown in Chapter 7 of [8]. Similarly, using (4) and integrating by parts using $f'(0) = f'(1) = 0$, we have that $L_b$, with domain $D$, is self-adjoint with respect to the $L^2(\mu_b)$-inner product, and that for any $f \in D$, the associated quadratic form is given by
$$\langle -L_b f, f\rangle_{L^2(\mu_b)} = \int_0^1 |f'(x)|^2\,\mu_b(x)\,dx. \qquad (46)$$
For any finite-dimensional subspace $\mathcal L \subseteq D$, define Then by Theorem 4.5.3 of [8], the eigenvalues of $L_0$ and $L_b$ are given by respectively. This, combined with (46) and (21), yields (44). We now prove (45). Iterating the equation $L_b u_j = \lambda_j u_j$, we have Note that in each summand above, except for the first one, the sum of the orders of all derivatives is at most 3. This generalizes to $n \geq 3$, in that there exist polynomials $P_{n,j}$ such that for which one can check the following properties by induction: 1. For all $n \geq 1$ and $m \leq 2n-1$, $P_{n,m}$ has degree at most $n$.
For the odd order derivatives of $u_j$, there similarly exist polynomials $\tilde P_{n,m}$ of degree at most $n$ such that $\tilde P_{n,2n} = nb$ and where the only summand containing the factor $b^{(2n-1)}$ is $u_j' b^{(2n-1)}$. We now show (45) by an induction argument which consists of the base case and two induction steps.
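The min-max argument behind (44) has an exact finite-dimensional counterpart. In the hypothetical finite-volume discretisation below, $-L_b$ corresponds to the symmetric pencil $K_b v = \kappa\,\mathrm{diag}(\mu_b)v$, where $K_b$ is the stiffness matrix weighted by $\mu_b$ at the cell faces, a discrete analogue of (46). Since both the quadratic form and the mass term differ from those for $b = 0$ by factors between the extreme values of $\mu_b$, the Courant-Fischer theorem confines every ratio $\lambda_j(b)/\lambda_j(0)$, $j \geq 1$, to the interval $[\min\mu_{b,\mathrm{face}}/\max\mu_b,\ \max\mu_{b,\mathrm{face}}/\min\mu_b]$, in analogy with combining (46) and (21). The drift $b(x) = 2\sin(\pi x)$ is an arbitrary choice.

```python
import numpy as np


def fv_eigenvalues(mu_cell, mu_face, hx):
    """Non-increasing eigenvalues of the finite-volume operator f -> mu^{-1} (mu f')' with zero
    flux at 0 and 1, i.e. of the pencil -K v = lam diag(mu_cell) v, K = D^T diag(mu_face) D."""
    n = mu_cell.size
    D = (np.eye(n, k=1) - np.eye(n))[:-1] / hx
    K = D.T @ (mu_face[:, None] * D)
    s = 1.0 / np.sqrt(mu_cell)
    return np.linalg.eigvalsh(-(s[:, None] * K * s[None, :]))[::-1]


def B(z):
    """Antiderivative of the (hypothetical) drift b(x) = 2 sin(pi x)."""
    return 2.0 * (1.0 - np.cos(np.pi * z)) / np.pi


n = 100
hx = 1.0 / n
x = (np.arange(n) + 0.5) * hx
faces = np.arange(1, n) * hx
mu_c, mu_f = np.exp(B(x)), np.exp(B(faces))              # mu_b up to a constant that cancels
lam_b = fv_eigenvalues(mu_c, mu_f, hx)
lam_0 = fv_eigenvalues(np.ones(n), np.ones(n - 1), hx)   # b = 0
ratio = lam_b[1:] / lam_0[1:]
lower, upper = mu_f.min() / mu_c.max(), mu_f.max() / mu_c.min()
print(ratio.min() >= lower, ratio.max() <= upper)
```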

Characterisation of Sobolev norms in terms of $(\lambda_j, u_j)$
Using Lemma 12, we now prove that the graph norms of the non-negative self-adjoint operators $(-L_b)^\theta$, $\theta \in \{0, \frac12, 1\}$, on the respective domains, are equivalent to standard Sobolev norms. For any Banach space $(X, \|\cdot\|_X)$ and any linear operator $T : \mathcal D_T \to X$ with domain $\mathcal D_T \subseteq X$, we denote the graph norm of $T$ by Then for any $f \in L^2$, we have and for any $f \in \mathrm{Dom}((-L_b)^\theta)$, we have 2. There exists $0 < C < \infty$ such that for any $\theta \in \{0, \frac12, 1\}$, we have 3. There exists $0 < C < \infty$ such that for all $f \in L^2$, Proof. 1. Let $\ell^2 = \ell^2(\mathbb N \cup \{0\})$ denote the usual space of square-summable sequences. We first prove (51) for $\theta = 1$. Define the dense linear subspace Then by Lemma 1.2.2 in [8], we know that the restriction of $L_b$ to $\mathcal D$, which we shall denote by $L^{\mathcal D}_b$, is an essentially self-adjoint operator on $L^2(\mu_b)$. Also, it is elementary to see that under the unitary operator Thus, the unique self-adjoint extensions of both operators (cf. [8], Theorem 1.2.7), which we denote by $L_b$ and $M$, are also unitarily equivalent. Hence, for The above condition defines the domain of the self-adjoint extension of $M_{\mathcal D}$ (see [8], Theorem 1.3.1). Moreover, we have the norm equality To see (51) for $\theta \in [0,1]$, we note that the fractional power $(-L_b)^\theta$ is unitarily equivalent to multiplication with $(|\lambda_j|^\theta : j \geq 0)$, and that $f \in \mathrm{Dom}((-L_b)^\theta)$ if and only if 2. We now show (53). For $\theta = 0$, there is nothing to prove. For $\theta = 1/2$, note that by Theorem 7.2.1 in [8] and (46), we have $\mathrm{Dom}((-L_b)^{1/2}) = H^1$ and The case $\theta = 1/2$ now follows from (21). Finally, let $\theta = 1$. It is clear that the graph norm of $L_b$ is bounded by a multiple of the $H^2$-norm, so it remains to prove the converse inequality. For this, we use Cauchy's inequality with $\epsilon$ to obtain for some $c_1$ that Hence, integrating by parts and using Cauchy's inequality with $\epsilon$ again yields proving that $\|f\|_{H^2}$ is bounded by a multiple of the graph norm of $L_b$. 3. For any $f \in L^2$ and any test function $\psi \in H^1$, let us write $f_j = \langle f, u_j\rangle_{L^2(\mu_b)}$ and $\psi_j = \langle\psi, u_j\rangle_{L^2(\mu_b)}$, $j \geq 0$, respectively. Then by (52)-(53), we have (55)

Basic norm estimates for the one-dimensional Neumann problem
From the preceding Lemma, we can immediately derive some basic properties of the (elliptic) boundary value problem
$$L_b u = f \ \text{ on } (0,1), \qquad u'(0) = u'(1) = 0, \qquad (56)$$
needed in the proof of Lemma 9. Let us denote the $L^2(\mu_b)$-orthogonal complement of the first eigenfunction $u_0 \equiv 1$ of $L_b$ by $u_0^\perp = \{f \in L^2 : \int_0^1 f\,d\mu_b = 0\}$.
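The solvability structure of (56) can be mirrored in a discretised model: a right-hand side $f$ is admissible exactly when its $L^2(\mu_b)$-coefficient on the constant eigenfunction $u_0 \equiv 1$ vanishes, and the solution orthogonal to constants is obtained by dividing the remaining spectral coefficients by $\lambda_j$. The sketch below assumes a finite-volume discretisation with zero flux at the boundary and hypothetical helper names; it also checks the bound $\|u\|_{L^2(\mu_b)} \leq |\lambda_1|^{-1}\|f\|_{L^2(\mu_b)}$, which expresses that the inverse of $L_b$ on $u_0^\perp$ is bounded on $L^2$.

```python
import numpy as np


def fv_neumann_setup(B, n_cells=100):
    """Hypothetical finite-volume model of L_b f = mu^{-1} (mu f')' (b = B', mu proportional to
    exp(B)) with zero-flux boundary conditions.  Returns cell centres, cell width, the normalised
    density mu, the generator matrix L, eigenvalues (non-increasing) and L^2(mu)-orthonormal
    eigenvectors (columns of V)."""
    hx = 1.0 / n_cells
    x = (np.arange(n_cells) + 0.5) * hx
    faces = np.arange(1, n_cells) * hx
    Z = np.sum(np.exp(B(x))) * hx
    mu, mu_face = np.exp(B(x)) / Z, np.exp(B(faces)) / Z
    D = (np.eye(n_cells, k=1) - np.eye(n_cells))[:-1] / hx
    K = D.T @ (mu_face[:, None] * D)
    L = -K / mu[:, None]
    s = 1.0 / np.sqrt(mu)
    lam, U = np.linalg.eigh(-(s[:, None] * K * s[None, :]))
    lam, U = lam[::-1], U[:, ::-1]
    V = s[:, None] * U / np.sqrt(hx)
    return x, hx, mu, L, lam, V


def solve_neumann(f, hx, mu, lam, V, tol=1e-10):
    """Solve L u = f with sum(mu * u) * hx = 0, provided f is orthogonal to constants in L^2(mu)."""
    coef = V.T @ (mu * f) * hx                       # spectral coefficients <f, v_k>_{L^2(mu)}
    if abs(coef[0]) > tol * max(1.0, np.linalg.norm(coef)):
        raise ValueError("f is not orthogonal to constants in L^2(mu)")
    return V[:, 1:] @ (coef[1:] / lam[1:])


def l2_mu_norm(g, mu, hx):
    return float(np.sqrt(np.sum(mu * g ** 2) * hx))


x, hx, mu, L, lam, V = fv_neumann_setup(lambda z: (1.0 - np.cos(np.pi * z)) / np.pi)
f = np.cos(np.pi * x)
f = f - np.sum(mu * f) * hx                          # project onto the complement of constants
u = solve_neumann(f, hx, mu, lam, V)
```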
Lemma 14. For every $f \in u_0^\perp$, there exists a unique function $u \in D \cap u_0^\perp$ such that $L_b u = f$, for which we use the notation $u = L_b^{-1}f$. Moreover, for every $B > 0$ there exists $C < \infty$ such that for all $b \in C^1_0$ with $\|b\|_\infty \leq B$ and $f \in u_0^\perp$, Proof. It follows immediately from the spectral representation (52) that $L_b$ is a bijection from $D \cap u_0^\perp$ onto $L^2 \cap u_0^\perp$, and that $L_b^{-1}$ is unitarily equivalent to multiplication by $(\lambda_j^{-1}\mathbf 1_{j\geq1} : j \geq 0)$ in the spectral domain, so that the $L^2 \to L^2$ norm of $L_b^{-1}$ is finite. Hence, using the norm equivalence (53), the case $s = 2$ then follows from The case $s = 0$ is obtained by duality: Using that $L_b^{-1}$ is a self-adjoint operator on $u_0^\perp$ and the previous case $s = 2$, we have that Finally, for $s = 1$, Lemmas 12 and 13 imply that
Estimates on $p_{t,b}(\cdot,\cdot)$ and $P_{t,b}$
Using Lemmata 12 and 13, we now state and prove some (partially well-known) results on the transition operators $P_{t,b}$ and the Lebesgue transition densities $p_{t,b}(\cdot,\cdot)$, as defined in (6) and (5).
1. There exist constants $0 < C < C' < \infty$ such that for all $t \geq t_0$, $b \in C^1_0$ with $\|b\|_{C^1} \leq B$ and $x, y \in [0,1]$, Proof. For a proof of (58), we refer to Proposition 9 in [24]. Let us now prove the first part of (59); the second is obtained analogously. Let $n \leq s+2$, $m \leq s$. Then (4) yields that sup Using the multiplicative inequality (17), the spectral decomposition (20) and Lemma 12, we have where Lemma 12 implies that the constants above are uniform in $\|b\|_{H^s} \leq B$.
The following Lemma summarizes some basic properties of $(P_{t,b} : t \geq 0)$.
Finally, we collect some properties of $P_{t,b}$ and $p_{t,b}$ in the limit $t \to 0$.