STEIN’S METHOD, SMOOTHING AND FUNCTIONAL APPROXIMATION

Stein’s method for Gaussian process approximation can be used to bound the differences between the expectations of smooth functionals h of a càdlàg random process X of interest and the expectations of the same functionals of a well-understood target random process Z with continuous paths. Unfortunately, the class of smooth functionals for which this is easily possible is very restricted. Here, we provide an infinite-dimensional Gaussian smoothing inequality, which enables the class of functionals to be greatly expanded (examples are Lipschitz functionals with respect to the uniform metric, and indicators of arbitrary events), in exchange for a loss of precision in the bounds. Our inequalities are expressed in terms of the smooth test function bound, an expectation of a functional of X that is closely related to classical tightness criteria, a similar expectation for Z, and, for the indicator of a set K, the probability P(Z ∈ K^θ \ K^{−θ}) that the target process is close to the boundary of K.


INTRODUCTION
Stein's method [Stein, 1972, 1986] is a powerful method of obtaining explicit bounds on the distance between a probability distribution L(X) of interest and a well-understood approximating distribution L(Z) on some metric space (S, dist). Here, L(X) denotes the distribution of the random variable X, and "distance" is represented by a bound on the differences |Eh(X) − Eh(Z)|, for all functions h in some class H of test functions; when H is the class of functions that are Lipschitz with constant at most 1 with respect to dist, the distance is the Wasserstein metric. The general method was treated in monograph form in [Stein, 1986], its application to approximation by the Poisson and normal distributions is described in the books [Barbour, Holst, and Janson, 1992] and [Chen, Goldstein, and Shao, 2011], respectively, and its many uses in combination with the Malliavin calculus are presented in the monograph [Nourdin and Peccati, 2012]. Stein's method is not restricted to approximating the distributions of real-valued random variables, but can be used for multivariate distributions, as introduced in [Barbour, 1988] for the Poisson and [Götze, 1991] for the normal, as well as for entire processes, as developed by [Barbour, 1988] and [Arratia, Goldstein, and Gordon, 1989] for Poisson processes and [Barbour, 1990] for Brownian motion. A feature of Stein's method is that, in applications, there is often a class of functions H that is particularly well adapted for use with the method, resulting in a distance that is easily bounded. For normal approximation in one dimension, the family of (bounded) Lipschitz functions is typically amenable, leading to approximation with respect to a (bounded) Wasserstein distance. This distance is very natural in the context of weak convergence, but is not well suited for approximating tail probabilities, where the appropriate test functions are indicators of half lines, and hence are not Lipschitz.
Nonetheless, by approximating the indicator functions above and below by Lipschitz functions with steep gradient, a (bounded) Wasserstein distance of ε easily implies an approximation bound of order O(ε^{1/2}) for the probability of a half line. Thus smoothing the indicator function, and then using the error bound for smooth functions, immediately results in bounds for the probabilities of half lines, albeit at the cost of an inferior rate of approximation. If better rates of approximation are required for tail probabilities, then (much) more work usually has to be done.
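The trade-off just described can be made concrete. Below is a minimal numerical sketch (our own illustration, not taken from the paper; the function names and the values of ε and the density bound c are illustrative) of the optimization that converts a Wasserstein bound of size ε into a Kolmogorov-type bound of order ε^{1/2}:

```python
import math

def kolmogorov_bound(eps, c, delta):
    # Smoothing argument: replace the indicator of a half line by a
    # piecewise-linear surrogate with slope 1/delta.  The Wasserstein bound
    # contributes eps/delta (the surrogate has Lipschitz constant 1/delta),
    # and swapping the indicator for the surrogate costs at most c*delta
    # when the target distribution has density bounded by c.
    return eps / delta + c * delta

def optimal_delta(eps, c):
    # Minimising eps/delta + c*delta over delta > 0 gives delta = sqrt(eps/c).
    return math.sqrt(eps / c)

eps, c = 1e-4, 0.4                  # illustrative values
delta_star = optimal_delta(eps, c)
best = kolmogorov_bound(eps, c, delta_star)   # equals 2*sqrt(c*eps)
```

The optimized bound 2√(cε) is of order ε^{1/2}, matching the rate stated above.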
For process approximation by Brownian motion, the classes of 'smooth' test functions M_c^0, c > 0, used in [Barbour, 1990] and [Kasprzak, 2020a,b], and given in (1.2) and (1.5) below, are particularly well adapted for use with Stein's method. However, the classes are not rich enough to directly imply bounds for the distributions of functionals, such as the supremum, that have immediate practical application. This limits the usefulness of the results obtained. As an example, it would be advantageous to know that, if X belonged to the space D of càdlàg processes indexed by [0, T], bounds for test functions in M_c^0 implied corresponding bounds for the distributions of such functionals of X. The smoothing inequality proved here makes such deductions possible and useful, being designed to give error bounds in situations that are not amenable to other more direct approaches. In the context of the general version of Stein's method for Gaussian (not necessarily Brownian) process approximation, introduced in [Barbour, Ross, and Zheng, 2021], it has already proved successful in deriving bounds for the error in approximations to the distributions of useful functionals, based on those that can be established for functions in the class M_c^0. The ideas are also fundamental to the Gaussian smoothing techniques recently derived in [Balasubramanian, Goldstein, Ross, and Salim, 2023], and applied to the analysis of wide random neural networks.

Related approaches for process approximation
There is an enormous literature on process approximation in classical settings, such as random walks and martingales, with the best results using strong embeddings. As is typical for Stein's method, our focus is on non-classical settings where strong embeddings are not available, and so this literature is not relevant here. There are other general approaches to Gaussian process approximation in the Stein's method literature. These approaches either suffer from a lack of applicability, or are developed in function spaces, such as L²[0, 1], equipped with metrics that are too weak to see natural statistics of the process, such as the maximum, or the finite-dimensional distributions; even convergence for such statistics cannot be established using such metrics. Regarding applicability, the approach of [Barbour, 1990] is the most flexible, because it is a natural extension of the methods previously used for approximating the distributions of random variables using Stein's method, and many of the techniques that have found great success there can be adapted to it; see, for example, [Döbler and Kasprzak, 2021] and [Kasprzak, 2020a,b]. The results of this paper show that rates of convergence from the approach of [Barbour, 1990] can be relatively easily adapted to imply rates of convergence for many natural statistics that are continuous with respect to the Skorokhod topology.
In more detail, [Shih, 2011] develops an approach to Stein's method for Gaussian measures on separable Banach spaces. When approximating continuous processes, this setting is rich enough to include most natural statistics, because C[0, 1] equipped with the sup norm is such a space. However, the bounds developed there are complicated, being expressed in terms of associated Hilbert norms and embeddings, and their evaluation in concrete settings seems to be too difficult to have been widely used. The next step was taken in [Coutin and Decreusefond, 2013]. Here, Stein's method is developed for Brownian motion, now viewed as an abstract Wiener measure on Hilbert space. The corresponding inner products are of integral type, and do not see finite dimensional distributions. Since the inner product determines the metric on the underlying space, the rates of convergence are not transferable to many natural statistics. Their approach has been further applied and refined in [Besançon, Decreusefond, and Moyal, 2020] and [Bourguin and Campese, 2020], to make it somewhat more applicable, but without removing the drawback inherent in the weak metric.
Finally, in a recent paper [Coutin and Decreusefond, 2020], a rate of convergence is derived that is expressed in terms of the bounded Wasserstein distance with respect to the fractional Sobolev norm, but only in the special setting of Donsker's theorem. This metric is much stronger. However, their technique involves applying Stein's method to a finite-dimensional discretization of the process, and then using bounds on maximal fluctuations to handle the error in the discretization. In the setting of Donsker's theorem, the growth of the error in dimension when applying Stein's method is well controlled, and sharp maximal inequalities are classically available. Both of these are crucial, if good bounds are to be obtained using their method. Its applicability in more general settings has not yet been established. Their bounds, in the limited context of Donsker's theorem, are better than ours, as discussed below in Example 1.8; both are rather worse than those obtained using strong approximation [Komlós, Major, and Tusnády, 1975, 1976]; see the discussion in Remark 1.9. As the highlight of this article, the bounds that we derive are applicable to functionals that need not be Lipschitz, the limiting process can be quite general, and the process to be approximated may have an arbitrary dependence structure. All of these features were needed for the queueing application in our companion paper [Barbour et al., 2021, Theorem 1.2].

Test functions
In order to state the main result, we need some further definitions. Let D := D([0, T]; R^d) be the set of functions from [0, T] to R^d that are right continuous with left limits. We assume, with little loss of generality, that T ≥ 1, to simplify forthcoming bounds. The space D endowed with the sup norm ‖·‖ is a Banach space (though not separable), and we denote the Fréchet derivatives of functions h : D → R by Dh, D²h, . . ..
As in [Barbour, 1990] and [Kasprzak, 2020a], let M^0 be the set of functions h : D → R whose norm, defined at (1.2) in terms of h and its Fréchet derivatives, is finite, where we write ‖A‖ := sup_{w : ‖w‖ = 1} |A[w, . . . , w]| for any k-linear form A. Letting I_t ∈ D([0, T]; R) be defined by I_t(s) := 1{t ≤ s}, we are interested in functions h ∈ M^0 satisfying the smoothness condition (1.5) for all r, s, t ∈ [0, T] and all admissible arguments x_1, x_2. For θ > 0 and a Skorokhod-measurable set K ⊂ D, we define the θ-enlargement and θ-shrinkage, with respect to the usual sup norm, by K^θ := {w ∈ D : dist(w, K) ≤ θ} and K^{−θ} := D \ (D \ K)^θ, where dist(w, K) := inf_{v ∈ K} ‖w − v‖.
For w ∈ D with ‖w‖ < ∞, we can define the ε-regularized versions of w as follows: for ε > 0,

w_ε(t) := E[w(t + εU)] = (2ε)^{−1} ∫_{t−ε}^{t+ε} w(u) du,   (1.6)

where U is uniformly distributed over the interval (−1, 1), and we define w(t) = w(T) for t > T and w(t) = w(0) for t < 0. In other words, we follow the convention that, for a function s ∈ [0, T] ↦ w(s) and for any x ∈ R, the function w(• + x) is understood as s ↦ w(min(max(s + x, 0), T)).   (1.7)
It is easy to see that the path w_ε, defined in (1.6), is absolutely continuous, so that, by Rademacher's theorem, ∇w_ε is well defined almost everywhere.
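As a sanity check on the regularization (1.6), the following sketch (our own illustration, with a hypothetical step path and helper names that are not from the paper) computes w_ε numerically for a càdlàg step function and confirms that the regularized path averages the jump:

```python
def w_eps(w, T, eps, t, m=2000):
    # Midpoint-rule approximation of (1.6): average w over [t - eps, t + eps],
    # extending w by w(0) to the left of 0 and by w(T) to the right of T,
    # as in the convention (1.7).
    total = 0.0
    for k in range(m):
        u = t - eps + 2 * eps * (k + 0.5) / m
        total += w(min(max(u, 0.0), T))
    return total / m

# A cadlag step path on [0, 1]: jumps from 0 to 1 at time 1/2.
def step(s):
    return 1.0 if s >= 0.5 else 0.0

T, eps = 1.0, 0.1
mid = w_eps(step, T, eps, 0.5)    # window [0.4, 0.6] straddles the jump
left = w_eps(step, T, eps, 0.2)   # window entirely before the jump
right = w_eps(step, T, eps, 0.9)  # window entirely after the jump
```

The jump of height 1 is replaced by a linear ramp over a window of width 2ε, which is why w_ε is Lipschitz, with gradient of order ‖w‖/ε.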
(1.16) Thus, with this condition in addition to κ_i^{(n)} → 0, i = 1, 2, it follows that X_n converges weakly to Z. Now, by the definition of X_{n,ε}, we have ‖X_{n,ε} − X_n‖ ≤ ω_{X_n}(ε)[0, T], where

ω_x(ε)[0, T] := sup{|x(s) − x(t)| : s, t ∈ [0, T], |s − t| ≤ ε}   (1.17)

denotes the uniform modulus of continuity of x on [0, T]. It is well known that any sequence (X_n, n ∈ N) ⊂ D that converges weakly to a limit with continuous sample paths satisfies the tightness condition

lim_{ε → 0} lim sup_{n → ∞} P(ω_{X_n}(ε)[0, T] > θ) = 0 for every θ > 0.

Hence, given κ_i^{(n)} → 0, i = 1, 2, the condition (1.16) is a necessary and sufficient condition for weak convergence to Z.
The Lévy-Prokhorov distance between X and Z can be defined as

d_LP(X, Z) := inf{θ > 0 : P(X ∈ K) ≤ P(Z ∈ K^θ) + θ and P(Z ∈ K) ≤ P(X ∈ K^θ) + θ for all Skorokhod-measurable subsets K},

and the bounded Wasserstein distance by

d_BW(X, Z) := sup{|Eh(X) − Eh(Z)| : sup_w |h(w)| ≤ 1 and |h(w) − h(v)| ≤ ‖w − v‖ for all v, w ∈ D}.   (1.19)

Theorem 1.1 bounds both of these distances, for any positive δ, ε, θ and γ. The main use for the bound given in Theorem 1.1 is to obtain explicit bounds on the error in approximating probabilities and expectations of functionals involving the process X by the corresponding values for the process Z. These follow from (1.14) and (1.15) by optimizing the choice of ε, δ, γ, θ. In the case of a sequence of processes indexed by n, rates of convergence can be deduced, as illustrated in Examples 1.8 and 1.10 below. The following lemma provides a useful bound for probabilities of the form P(‖Y_ε − Y‖ ≥ θ). It is a quantitative version of the classical condition of [Chentsov, 1956].
Lemma 1.4. Suppose that Y = (Y(t), t ∈ [0, T]) ∈ D is a random process, and that, for some β > 1 and γ > 0, the increment condition (1.21) holds for all s < u < t such that (1/2)n^{−1} ≤ t − s ≤ 1. Then, for ϕ_n(·) defined in (1.22), it holds that, for any ε ∈ (n^{−1}, 1), the bound (1.23) is satisfied.
Remark 1.5. (i) The condition (1.21) can be replaced by a weaker one. (ii) If (1.21) is true for all t > s, then the term ϕ_n in the bound (1.23) can be dropped.
A standard setting in which the modulus of continuity can be bounded is that of normalized sums of mixing random variables. Suppose that Y(t) := n^{−1/2} Σ_{j=1}^{⌊nt⌋} X_j, where X_1, X_2, . . . , X_N is a sequence of centred random variables such that, for some p > 2, E[|X_j|^p]^{1/p} ≤ c_p uniformly in 1 ≤ j ≤ N. Suppose also that the sequence is strongly mixing, with mixing coefficients satisfying a suitable decay condition. Lemma 1.6. Under the above mixing conditions, for ε > 1/(2n) and for T ≤ N/n, with ω_Y(ε)[0, T] as defined in (1.17) and with r determined by p and the mixing rate, a corresponding moment bound on ω_Y(ε)[0, T] holds. For a process Y for which the differences √n{Y(j/n) − Y((j − 1)/n)} satisfy the same conditions as the X_j, but Y(t) may vary on intervals of the form ((j − 1)/n, j/n], the bound (2.43) in the proof of Lemma 1.6 can be used to show that (1.21) in Lemma 1.4 is satisfied, with β = r/2 and γ = r. A counterpart of (1.22) is then needed to control the variation on intervals of length 1/n.
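To illustrate the objects in this setting, here is a small sketch (our own, with Rademacher summands standing in for the X_j, and helper names that are not from the paper) of the normalized partial-sum path and the discrete analogue of its uniform modulus of continuity over windows of width ε:

```python
import random
random.seed(0)  # fixed seed so the illustration is reproducible

def partial_sum_path(n):
    # Y(j/n) = n^{-1/2} * (X_1 + ... + X_j), with centred Rademacher X_j
    # standing in for a general centred, uniformly p-integrable sequence.
    s, path = 0.0, [0.0]
    for _ in range(n):
        s += random.choice([-1.0, 1.0])
        path.append(s / n ** 0.5)
    return path

def modulus(path, window):
    # Discrete analogue of omega_Y(eps)[0, T]: the largest fluctuation over
    # any stretch of at most `window` grid steps.
    return max(abs(path[j] - path[i])
               for i in range(len(path))
               for j in range(i, min(i + window + 1, len(path))))

n = 200
path = partial_sum_path(n)
m = modulus(path, 20)  # window of width eps = 20/n = 0.1
```

Each grid step changes the path by exactly n^{−1/2}, so the modulus over a window of w steps lies between n^{−1/2} and w·n^{−1/2}; the content of Lemma 1.6 is that, under mixing, it is in fact of order (ε log(1/ε))^{1/2} in probability rather than of the worst-case order.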
for some positive constants k and τ, then, for any γ ≥ 2, Lemma 1.4 and Remark 1.5-(iii) imply the bound (1.28) on P(‖Z_ε − Z‖ ≥ θ), where G ∼ N(0, 1) is standard normal. We are not aware of any general theory for bounding the final term P(Z ∈ K^θ \ K^{−θ}), even for restricted classes of sets and continuous Gaussian processes. For finite-dimensional Gaussian measures and convex sets, such boundary regions have probability of order θ as θ → 0; see, for example, [Ball, 1993] and [Götze, Naumov, Spokoiny, and Ulyanov, 2019, Section 1.1.4]. For Gaussian processes with values in a Hilbert space, there are some results when K is an open ball [Götze et al., 2019]. That being said, for certain K and Z, it may nonetheless be possible to obtain quantitative results; see the following two examples.
(i) If (Z_t : t ∈ [0, 1]) is a Brownian motion on R^d and g : D([0, 1]; R^d) → R is a measurable function that is Lipschitz on C([0, 1]; R^d) and such that g(Z) has a bounded density (for example, if d = 1 and g(w) = sup_{0 ≤ s ≤ 1} w(s)), then, for K = {w ∈ D : g(w) ≤ y}, it is easy to see that P(Z ∈ K^θ \ K^{−θ}) ≤ c′θ, where c′ is a constant depending on the density bound and the Lipschitz constant of g. In such an example, Theorem 1.1 can be used to provide bounds on the Kolmogorov distance between L(g(X)) and L(g(Z)).
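For the supremum example with d = 1, the boundary probability can be computed exactly via the classical reflection principle, which gives P(sup_{0≤s≤1} B(s) ≤ y) = 2Φ(y) − 1 for y ≥ 0. The sketch below (our own verification; the helper names are not from the paper) confirms the linear-in-θ behaviour:

```python
import math

def Phi(x):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_bm_cdf(y):
    # Reflection principle: P(sup_{0<=s<=1} B(s) <= y) = 2*Phi(y) - 1, y >= 0.
    return max(2.0 * Phi(y) - 1.0, 0.0)

def boundary_prob(y, theta):
    # P(Z in K^theta \ K^{-theta}) for K = {w : sup w <= y}: the probability
    # that the supremum falls within theta of the level y.
    return sup_bm_cdf(y + theta) - sup_bm_cdf(y - theta)

y, theta = 1.0, 0.01
p = boundary_prob(y, theta)
# The density of the supremum is 2*phi(y) <= sqrt(2/pi), so
# p <= c' * theta with c' = 2*sqrt(2/pi).
```

This is exactly the bounded-density mechanism invoked in part (i), with c′ explicit for this choice of g.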
(ii) We can also obtain quantitative results for finite-dimensional distributions as follows. Setting d = 1, let 0 < t_1 < · · · < t_k ≤ T, let K be a convex set in R^k, and suppose that (Z_{t_1}, . . . , Z_{t_k}) has non-singular covariance. Then Gaussian isoperimetry or anti-concentration (e.g., [Ball, 1993] and [Götze et al., 2019, Section 1.1.4]) implies that the corresponding boundary probability is at most c_k θ, where c_k is a constant depending on the dimension k and on the covariance kernel of Z.
Theorem 1.1 and the discussion following it should be compared to [Barbour, 1990, Theorem 2] and [Kasprzak, 2020a, Proposition 2.3], which give criteria for weak convergence assuming a bound of the form (1.13) for all functions h in the larger class M^0, which are not assumed to satisfy the smoothness condition (1.4) (the statement in [Barbour, 1990, Theorem 2] is not correct as given, and the bound must hold for functions without the smoothness condition). Under the additional smoothness condition (1.4), |D²h(w)[x, y_r]| → 0 for some sequences y_r such that ‖y_r‖ = 1 for all r, in which the functions y_r become 'small' in the sense that |{u ∈ [0, T] : y_r(u) ≠ 0}| → 0. A minor advantage of working with the smaller class of functions M_c^0 is that a discretization step can be avoided, which in turn can remove a log-term from the convergence rate; see [Barbour, 1990, Remark 2 and (2.29)]. More importantly, applying Stein's method using only the test functions in the smaller class has wider applicability; for instance, [Barbour et al., 2021, Theorem 1.2] gives a Gaussian process approximation to the GI/GI/∞ queue, using condition (1.4) in the proof in an essential way.
Example 1.8. As a proof of concept, we explore the quality of the result that can be obtained with Theorem 1.1 in the classical case, where X_n(t) := n^{−1/2} Σ_{j=1}^{⌊nt⌋} W_j for i.i.d. random variables W_1, W_2, . . . with E[W_1] = 0, E[W_1²] = 1, and E|W_1|^p < ∞ for some p ≥ 3. Donsker's theorem implies that the limiting process Z is a standard Brownian motion. First, [Barbour, 1990, Theorem 1 and Remark 2] implies that, for p ≥ 3, the bound (1.13) holds with a rate explicit in n, for a universal constant C, and then the first two terms of (1.14) are bounded accordingly, where C_0 = C_0(ε, δ) and c_1 = c_1(ε, δ) are as in (1.11), (1.9), and (1.12). Note that, for the Brownian motion Z, we can deduce from Remark 1.7 that, for any l ≥ 2, the bound (1.31) on P(‖Z_ε − Z‖ ≥ θ) holds, for some constant K_Z depending on L(Z) and l. Moving to P(‖X_{n,ε} − X_n‖ ≥ θ), it is possible to use Doob's L^p-inequality and Rosenthal's inequality to bound the maximal fluctuations of X_n, for some constant K_W depending on L(W_1) and p, and from there a standard argument leads to a bound of the required type. However, we can get a bound of a similar quality by applying Lemma 1.4 and Remark 1.5-(ii), which we do here to illustrate their use. We first verify (1.21) for all 0 < s < t ≤ T. If |t − s| < 1/n, then (1.21) holds trivially, since at least one term in the minimum must be zero. If |t − s| ≥ 1/n, then, for p > 2, Rosenthal's inequality [Rosenthal, 1970, Theorem 3] implies a moment bound on the increments in which C_p is a constant depending only on p, and (1.21) thus holds, for β = p/2 − 1 > 0 and γ = p, by Markov's inequality. In order to use Remark 1.5-(ii), we also note the corresponding control of the increments over intervals of length less than 1/n. Altogether, we deduce from Lemma 1.4 with β = p/2 − 1 > 0 and γ = p that the bound (1.32) holds, for some constant K_W depending on L(W_1) and p.
Balancing θ and Tε^{p/2−1}θ^{−p} gives θ = T^{1/(p+1)} ε^{(p−2)/(2(p+1))}, and then balancing the final two terms in ε and δ determines the remaining choices. As a result, we have established a rate of convergence in the Lévy-Prokhorov distance. Assuming finite third moments (p = 3) and T = 1, the rate is O(n^{−1/56} √(log n)), and we can obtain the rate O(n^{−1/20 + a}) for arbitrarily small a > 0, if we assume that W_1 has all its moments.
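The balancing step can be checked numerically. The sketch below (our own verification; it assumes, as above, that the two competing terms are θ and Tε^{p/2−1}θ^{−p}) confirms that the stated choice of θ equalizes them:

```python
def balanced_theta(eps, T, p):
    # Solving theta = T * eps**(p/2 - 1) * theta**(-p) for theta gives
    # theta**(p + 1) = T * eps**(p/2 - 1), i.e.
    # theta = T**(1/(p+1)) * eps**((p-2)/(2*(p+1))).
    return T ** (1.0 / (p + 1)) * eps ** ((p - 2) / (2.0 * (p + 1)))

p, T, eps = 3.0, 1.0, 1e-3
theta = balanced_theta(eps, T, p)
term1 = theta
term2 = T * eps ** (p / 2 - 1) * theta ** (-p)
```

With p = 3 and T = 1 this gives θ = ε^{1/8}, and raising both sides of the balancing equation to the power p + 1 recovers the displayed exponent (p − 2)/(2(p + 1)).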
Remark 1.9. Fix T = 1. For the bounded Wasserstein distance, similar calculations can be carried through, based on the bound given in Corollary 1.3. From (1.32), it follows by integration that E‖X_{n,ε} − X_n‖ = O(ε^{(p−2)/(2p)}), so that the order of the bound (1.20) in ε and δ is easily determined. Balancing the terms gives ε = n^{−p/(7p−6)}, and hence an overall bound. Thus, assuming finite third moments (p = 3), the rate is O(n^{−1/30}), and, if W_1 has all its moments, the rate is O(n^{−1/14 + a}), for any a > 0. In this example, the approach of [Coutin and Decreusefond, 2020], discussed in Section 1.1, can also be applied. It gives the rates O(n^{−1/18 + a}) and O(n^{−1/6 + a}) for any a > 0, respectively, for bounded Lipschitz functionals, which are better; however, no bounds are given by their method for the Lévy-Prokhorov distance. In the strong approximation theorems of [Komlós, Major, and Tusnády, 1975, 1976], copies of X_n and Z are constructed on the same probability space, in such a way that the distribution of the random variable ‖X_n − Z‖ is tightly controlled. In particular, with the moment assumptions above, their bounds on E‖X_n − Z‖ imply corresponding rates for bounded Lipschitz functionals of orders O(n^{−1/6}) and O(n^{−1/2 + a}) for any a > 0, respectively (see Csörgő and Révész [1981, Theorem 2.6.7]), and, if W_1 has a finite moment generating function, a rate of order O(n^{−1/2} log n) [Komlós, Major, and Tusnády, 1976, Theorem 1]; these are much better still.
Example 1.10. In [Döbler and Kasprzak, 2021, Section 6], the joint distribution of the processes counting edges and two-stars in the Bernoulli random graph G(n, p) is shown, after appropriate centering and normalization, to converge weakly to a Gaussian limit. In this example, we complement Döbler and Kasprzak's result with a convergence rate, and we refer interested readers to their paper for an overview of the relevant literature. The two-dimensional process that they considered is X_n := (X_n^{(1)}, X_n^{(2)}), defined in terms of independent indicator random variables E_ij, 1 ≤ i < j ≤ n, with fixed expectation p ∈ (0, 1). With B a standard real Brownian motion, and with Y defined in terms of B as in (1.34) and (1.35), the limiting random process is the degenerate two-dimensional process Z := (Y, 2pY).
In this particular example, we obtain rates of functional convergence in the Lévy-Prokhorov distance and in the bounded Wasserstein distance, as stated in (1.36), for any a > 0. To establish the claim by invoking Corollary 1.3, we need to bound probabilities like

P(‖X_{n,ε} − X_n‖ > θ) and P(‖Z_ε − Z‖ > θ)   (1.37)

for any θ, ε > 0 and for any n ≥ 2. For this purpose, we use Lemma 1.4 together with Remark 1.5-(i) to bound the first term in (1.37), and Remark 1.7 to bound the second term in (1.37). For the latter, it is immediate from (1.34), (1.35), and the independence of Brownian increments that, for 0 ≤ v < u ≤ 1, the covariance condition of Remark 1.7 holds, so that the bound (1.28) can be used with T = τ = 1 and with any choice of γ ≥ 2. In other words, in view of (1.35), we have (1.38), which is of the same order as the bound in (1.31). For P(‖X_{n,ε} − X_n‖ > θ], considering the first component, note that, for 0 ≤ s < t ≤ 1, the increment X_n^{(1)}(t) − X_n^{(1)}(s) is a normalized sum of centred indicator variables, say. Now, for U ∼ Binomial(m, p), it follows that E{(U − mp)^{2r}} ≤ C_r(p) m^r, for a suitable constant C_r(p), for any r ∈ N. Hence it follows, after a little calculation, that, for 0 ≤ s < t ≤ 1 such that (t − s) ≥ (1/2)n^{−1}, we have a moment bound in which the constants K_1, K_2, and K in (1.39) below do not depend on (t, s, n). Note that the lower bound on t − s is used to accommodate the rounding error: ⌊nt⌋ − ⌊ns⌋ ≤ nt − ns + 1 ≤ 3n(t − s) when (t − s) ≥ (1/2)n^{−1}. It now follows easily that, in the same range of s and t, and for any r ∈ N, (1.39) holds. For the second component, observe first that, for 0 ≤ s < t ≤ 1, the increment can be expanded as a sum over index triples. Now, in computing E{(X_n^{(2)}(t) − X_n^{(2)}(s))^{2r}}, the expectations E{∏_{l=1}^{2r} E_{i_l, j_l, k_l}} are zero unless each index set {i_l, j_l, k_l} overlaps with another index set {i_{l′}, j_{l′}, k_{l′}} in at least two elements. The dominant contribution to the sum making up E{(X_n^{(2)}(t) − X_n^{(2)}(s))^{2r}} is seen to come from collections of index sets consisting of r pairs that overlap in two elements.
Each such pair has 4 distinct indices, the largest of which lies between ⌊ns⌋ and ⌊nt⌋, so that there are O({n⁴(t − s)}^r) such collections of index sets (when t − s ≥ (1/2)n^{−1}), and each gives a contribution of order O({n^{−2}}^{2r}). The contribution from all other arrangements of index sets is of smaller order. Hence E{(X_n^{(2)}(t) − X_n^{(2)}(s))^{2r}} = O((t − s)^r) for a suitable constant K_3, and it follows that, for 0 ≤ s < t ≤ 1 such that (t − s) ≥ (1/2)n^{−1} and for any r ∈ N, we have (1.40). Since the process X_n has only one jump in any interval of length 1/n, the bounds (1.39) and (1.40) can be used with t − s = 1/n to bound ϕ_n(η), defined at (1.22). That is, we can find a constant K_3 such that ϕ_n(η) ≤ K_3 n^{1−r} η^{−2r}.
Invoking Lemma 1.4, it now follows that, for any r ∈ N, a bound analogous to (1.32) holds, for a suitable constant K′_r. Note that the above bound is of exactly the same order as (1.32) in the case of sums of i.i.d. random variables, so that the same choices of ε, δ, θ, and γ = p = 2r (for any r ≥ 1) can be made as in Example 1.8 and Remark 1.9, so as to verify our claim (1.36).
The example illustrates a strength of our approach that is typical of Stein's method: it applies in situations with non-trivial dependencies, where rates of convergence are not otherwise available; see [Barbour et al., 2021] for another application where Theorem 1.1 is needed. Being able to explicitly incorporate a time interval of length T that may depend on n is also very useful. Note that the error estimates given above still converge to zero as n → ∞, if T = T_n grows like a small enough power of n.
The key to proving Theorem 1.1 is the following lemma on Gaussian smoothing, for which we need the (ε, δ)-smoothing of h, defined in (1.8). The lemma is an infinite-dimensional analog of finite-dimensional Gaussian smoothing inequalities found, for example, in [Raič, 2018, Section 4.2]. The result is closely related to [Kuo, 1975, Theorem 6.2, Chapter II].
Lemma 1.11. Let h : D → R be bounded and measurable with respect to the Skorokhod topology, and let ε and δ be positive. Then the function h_{ε,δ}, defined in (1.8), is also Skorokhod-measurable, and has infinitely many bounded Fréchet derivatives (with respect to the uniform norm), satisfying the bounds (1.41)-(1.45) below.
An expression for the Fréchet derivatives and bounds can be found at (2.11) and (2.24). They are not complicated, but require some set-partition notation, stemming from Faà di Bruno's formula for the derivatives of an exponential. The proof begins with the easy fact that w ε belongs to the Cameron-Martin space of the sum of a d-dimensional Brownian motion and an independent Gaussian vector. (We must add the Gaussian vector because w ε may not satisfy w ε (0) = 0, and thus we present a variant of the Cameron-Martin theorem in Theorem 2.1 below.) As a consequence, we can write h ε,δ (w + x) − h ε,δ (w) as a single expectation with respect to the Gaussian process (Brownian motion plus an independent Gaussian vector). Roughly speaking, such a difference is smooth in x due to the change of measure formula.

PROOFS
Let us first state a variant of the Cameron-Martin-Girsanov theorem [Cameron and Martin, 1944], when the Gaussian process is the sum of a Brownian motion and an independent Gaussian random variable.
Theorem 2.1. Let Θ be a standard Gaussian random vector on R^d that is independent of B. Then, for any bounded measurable function Φ on C([0, T]; R^d), the change of measure identity (2.2) holds, where ⟨·, ·⟩ denotes the inner product on R^d.

Note that the Wiener integral ∫_0^T ∇g(t) · dB(t) is normally distributed with mean zero and variance ∫_0^T |∇g(t)|² dt. The usual Cameron-Martin theorem asserts that the probability measure induced by B + g on the path space C([0, T]; R^d) is equivalent to that of B, when g satisfies the condition (2.1) and g(0) = 0; see also pages 333-335 in [Revuz and Yor, 1999]. For our purpose, we need a process whose law is equivalent to that of its shift by w_ε from (1.6), which may not begin at zero. Since the law of a + Θ is equivalent to the law of Θ for any a ∈ R^d, we use the additional Gaussian smoothing by Θ in (1.8).
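The distributional statement about the Wiener integral can be illustrated with a deterministic computation (our own sketch with an illustrative integrand; the helper name is not from the paper): for a piecewise-constant approximation of ∇g, the integral is a finite Gaussian sum whose variance is the corresponding Riemann sum of |∇g|²:

```python
def wiener_integral_variance(grad_g, partition):
    # For a piecewise-constant integrand, int_0^T grad_g(t) dB(t) is the sum
    # of independent Gaussians grad_g(t_k) * (B(t_{k+1}) - B(t_k)), whose
    # variance is sum_k grad_g(t_k)^2 * (t_{k+1} - t_k): a Riemann sum for
    # int_0^T |grad_g(t)|^2 dt (the Ito isometry).
    return sum(grad_g(partition[k]) ** 2 * (partition[k + 1] - partition[k])
               for k in range(len(partition) - 1))

T, n = 1.0, 100000
grid = [T * k / n for k in range(n + 1)]
var = wiener_integral_variance(lambda t: 2.0 * t, grid)  # grad_g(t) = 2t
# Exact value: int_0^1 (2t)^2 dt = 4/3.
```

Passing to the L² limit of such piecewise-constant integrands is exactly how the Wiener integral, and the variance formula above, are defined.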
Proof of Theorem 2.1. Given a bounded measurable function Φ on C([0, T]; R^d), we first integrate over the distribution of Θ, conditionally on B, to obtain (2.4), where the third equality in (2.4) follows from a simple change of variable y = g(0) + z. It is clear that the function so obtained is also a real bounded measurable function on C([0, T]; R^d). Then, we deduce from the independence of B and Θ, the Cameron-Martin theorem, and (2.4), with g* = g − g(0), exactly the equality (2.2).
With the above change of measure formula, we are ready to prove Lemma 1.11.
Proof of Lemma 1.11. To establish the measurability of h_{ε,δ}, note that x ↦ x_ε is continuous (and hence measurable with respect to the Skorokhod topology), and then that (x, y) ↦ h(x_ε + δy) is measurable with respect to the product topology. Therefore, h_{ε,δ} is also measurable. We first give a formal computation to indicate where the formulas below come from. Note that, for w ∈ D, w_ε is absolutely continuous from [0, T] to R^d, and (2.5) holds, where the function w(• + ε) is defined according to the convention (1.7). Thus, we can apply the formula (2.2) to write h_{ε,δ} as in (2.6), where Ψ(w) =: Ψ_Θ(w) + Ψ_B(w) is a random element given by (2.7), such that the random variable e^{Ψ(w)} has mean one and finite moments of all orders, for any w ∈ D. Now, formally, we ought to have (2.8), and then D^k exp(Ψ(w))[x_1, . . . , x_k] can be understood as exp(Ψ(w)) times a polynomial in the derivatives of Ψ(w), motivated by Faà di Bruno's formula. Looking at the expression for DΨ, with v_ε defined according to (1.6), we can deduce from (2.5) that (2.9) holds. And it is also easy to see that (2.10) holds, and these higher derivatives are no longer random. The above discussion, together with Faà di Bruno's formula, leads to the following claim:

(2.11)

where
• P_{n,2} is the set of all partitions of {1, . . . , n} whose blocks have at most 2 elements;
• b ∈ π means that b is a block of π, whose cardinality is denoted by |b|;
• for a block b = {i_1, . . . , i_{|b|}}, the corresponding factor is D^{|b|}Ψ(w)[x_{i_1}, . . . , x_{i_{|b|}}]; see (2.9) and (2.10).
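The cardinality of P_{n,2} grows like the classical involution numbers; a quick enumeration check (our own, not from the paper):

```python
def card_P_n2(n):
    # Number of partitions of {1, ..., n} into blocks of size at most 2.
    # The element n is either a singleton, or is paired with one of the
    # remaining n - 1 elements, giving the involution-number recursion
    # a(n) = a(n - 1) + (n - 1) * a(n - 2).
    if n <= 1:
        return 1
    return card_P_n2(n - 1) + (n - 1) * card_P_n2(n - 2)

counts = [card_P_n2(n) for n in range(1, 7)]  # 1, 2, 4, 10, 26, 76
```

This is the combinatorial quantity card(P_{k,2}) entering the constants C_k in the derivative bounds below.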
Let us first verify the claim (2.11) for n = 1, that is, (2.12). By (1.8), (2.6), and (2.7), we can write, for w, z ∈ D, the expression (2.13). Now, we deduce from (2.8), (2.9), and (2.5) that (2.14) holds. It is thus straightforward to see from a Taylor expansion and simple Gaussian computations that (2.15) holds. Here and in what follows, the big-O is to be understood in the L^p(Ω)-sense; for example, U(z) = O(‖z‖) means that U(z) is a random variable that depends on z, and (E|U(z)|^p)^{1/p} = O(‖z‖), in the usual sense of O, for any p ∈ [2, ∞). Therefore, the equality in (2.12) follows from (2.13) and (2.15). That is, the claim (2.11) is verified for n = 1. For later use, let us recall from (2.9)-(2.10) that D²Ψ(w)[x, y] does not depend on w, and that (2.16) holds. Next, we assume that the formula (2.11) holds for all n ≤ k, and we want to show that (2.11) holds for n = k + 1. For z ∈ D([0, T]; R^d), we obtain (2.17). Let us deal with the above two sums now.
(i) The first sum in (2.17) can be rewritten as (2.19), which is a consequence of (2.15), (2.16), and the fact that DΨ(w)[z] = O(‖z‖).
(ii) In (2.18), the expectation vanishes if π ∈ P_{k,2} is a partition with all blocks having exactly 2 elements, since D²Ψ(y) does not depend on y. Suppose now that the partition π ∈ P_{k,2} contains ℓ blocks with exactly one element, for some ℓ ∈ {1, . . . , k} (say the blocks {1}, . . . , {ℓ}). Then, it follows from (2.16) that the corresponding term can be expanded accordingly. As a consequence, we can write the second sum (2.18) as a sum over P′_{k+1,2}, where P′_{k+1,2} is the set of all partitions of {1, . . . , k, k + 1} (with x_{k+1} = z) whose blocks have at most 2 elements, and such that k + 1 (which corresponds to z) belongs to a block of size 2. Since the first term (2.19) accounts for the partitions where k + 1 is in a block of size 1, we have just established formula (2.11) for n = k + 1. Hence, by induction, the formula (2.11) holds true for all n ≥ 1. In particular, this proves that the function h_{ε,δ} has infinitely many bounded Fréchet derivatives.
It remains to prove the bounds on the derivatives given in (1.41), (1.42), (1.43), (1.44), and (1.45). Since the random variable e^{Ψ(w)} in (2.11) is not uniformly bounded in L^p(Ω) for p > 1, which makes a direct proof of the bounds more awkward, we undo the change of measure, and work with an equivalent version of (2.11), namely (2.20), with the modified first derivative defined in (2.21) and with D²Ψ[x, y] as in (2.10). The equivalence between (2.20) and (2.11) essentially follows from the Cameron-Martin formula. However, there is a stochastic integral with respect to B − δ^{−1}w_ε in the modified derivative (which should become an integral with respect to B under the change of measure, leading to DΨ), and so some additional justification may be desired. We provide a proof in Appendix A. Let us now apply the formula (2.20) to establish the bounds (1.41), (1.42), (1.43), (1.44), and (1.45). Without loss of generality, we assume that |h(y)| ≤ 1 for any y ∈ D. Let us first prove the bound (1.41). Using (2.14), (2.5), and (2.21), we obtain (2.22) and (2.23). Therefore, we can deduce from (2.22), (2.23), and (2.20) the bound (2.24). Thus, the bound (1.41) is proved, where we can choose the constant C_k to be E|G|^k × card(P_{k,2}) with G ∼ N(0, 1), and hence C_1 = √(2/π), and C_2 = 3. (2.25) Next, we prove the bound (1.42). Using (2.24) directly yields the bound (2.26) on sup_{w ∈ D} ‖Dh_{ε,δ}(w)‖. For k = 2, 3, we can do better than simply applying (2.24). We first deduce from the formula (2.20) with n = 2, and (2.21), the expression (2.27), involving ∫ ∇y_ε(s) · dB(s) + ⟨Θ, y_ε(0)⟩. From this, and because |h(y)| ≤ 1 for all y ∈ D, it easily follows from Cauchy-Schwarz that (2.28) holds. Now, for (U, V) bivariate normal with mean zero, variance 1, and correlation ρ, Var(UV) = 1 + ρ² ≤ 2.
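The variance identity for the product of correlated standard Gaussians can be verified by direct numerical integration (our own check; by Isserlis' theorem, E[U²V²] = 1 + 2ρ², and E[UV] = ρ, so Var(UV) = 1 + ρ²):

```python
import math

def e_u2v2(rho, n=400, L=6.0):
    # Midpoint-rule integration of u^2 v^2 against the standard bivariate
    # normal density with correlation rho, over [-L, L]^2 (the tails beyond
    # L = 6 are negligible at this accuracy).
    h = 2.0 * L / n
    det = 1.0 - rho ** 2
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))
    total = 0.0
    for i in range(n):
        u = -L + h * (i + 0.5)
        for j in range(n):
            v = -L + h * (j + 0.5)
            dens = norm * math.exp(-(u * u - 2.0 * rho * u * v + v * v)
                                   / (2.0 * det))
            total += u * u * v * v * dens
    return total * h * h

rho = 0.5
var_uv = e_u2v2(rho) - rho ** 2   # Var(UV) = E[(UV)^2] - (E[UV])^2
# Isserlis: Var(UV) = 1 + rho^2 = 1.25 <= 2.
```

The bound Var(UV) ≤ 2, uniform in ρ ∈ [−1, 1], is what is used in the estimate that follows.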
Hence, and since, from (2.14) and (2.5), for any z ∈ D, we have (2.29), it follows that (2.30) holds. For the Lipschitz constant of the second derivative, we claim (2.31). Using the definition of the derivative, (2.20) (in which the modified first derivative does not depend on w), (2.22), and (2.23), we obtain the claim. Next, we show the bound (1.44), under the additional assumption that ‖D^n h‖ < ∞. First, we note that the sum inside the expectation in (2.20), displayed in (2.33), does not depend on w. We now show by induction that, for 0 ≤ r ≤ n, (2.34) holds, where z_{j,ε} = (z_j)_ε is defined according to (1.6), the case r = 0 being just (2.20). Then, assuming that (2.34) is true for r, we obtain (2.35). For r < n, since ‖D^n h‖ < ∞, the relevant derivative is bounded by a polynomial of degree n − r − 1 in δ‖B‖ + |Θ|, and hence its product with |T_k| is integrable, by Fernique's theorem (see [Bogachev, 1998, Theorem 2.8.5]). Hence, we deduce from the dominated convergence theorem that (2.36) holds, establishing (2.34) for r + 1 also. The bound (1.44) follows from (2.34) with r = n, using (2.14) and (2.5), as in proving (2.24). Finally, we point out that the inequality (1.45) follows from (2.34) with k = r = 1, (2.22), and the fact that DΨ(w)[x] is Gaussian, with G ∼ N(0, 1) appearing in the resulting bound; this concludes our proof. Now we present the proof of Theorem 1.1.
Since E‖B‖_{[0,T]} = T^{1/2} E‖B‖_{[0,1]} and E|Θ| ≤ √d, the above bounds lead us to the desired estimate (1.15). Hence, the proof is complete.
The rest of this section is devoted to the proofs of Lemma 1.4 and Lemma 1.6.
Then, defining R := R_{n,ε} := ⌈log₂(nε)⌉, we first establish (2.42) for s ≤ u ≤ s + ε. The argument to show (2.42) is based on the following two observations. First, the triangle inequality can be used to bound the minimal change in the value of y when going from an argument of the form x_{j,r} to one of the form x_{j′,r−1}, which is no more than max_{1 ≤ j < 2^r} δ_{j,r}. Secondly, the change when going from any value in [s, s + ε] to the next smaller value s + j2^{−R}ε is bounded by 2 max_{0 ≤ j < nT} δ*_{j,n}. As a result of these observations, for any s ≤ u ≤ s + ε, there is a path from u to x_{j₀,0} of the form (u, x_{j_R,R}, x_{j_{R−1},R−1}, . . . , x_{j₁,1}, x_{j₀,0}), where x_{j₀,0} ∈ {s, s + ε}, along which the value of y changes in total by no more than the bound in (2.42). We call such a path (u, x_{j_R,R}, . . . , x_{j₁,1}, x_{j₀,0}) admissible. Then, let J denote the maximal value of j such that there is an admissible path from x_{j,R} to s with |y(x_{j,R}) − y(s)| ≤ K. It is immediate that |y(x_{j,R}) − y(s)| ≤ 2K for all 0 ≤ j ≤ J, because an admissible path from x_{j,R} to s + ε has to cross an admissible path from x_{J,R} to s in this case, and can be modified to follow the admissible path from x_{J,R} to s thereafter. For each j > J, we can find an admissible path Γ_j from x_{j,R} to s + ε.
• If Γ_j crosses the admissible path from x_{J,R} to s, then from the triangle inequality it follows immediately that |y(x_{j,R}) − y(s)| ≤ 2K.
Proof. First, recalling the definitions of the derivatives of Ψ in (2.9) and (2.10), we rewrite the right-hand side of (2.11), using the fact that D²Ψ(w)[x, y] is deterministic. Suppose b = {i₁, i₂, . . . , i_ℓ}. Note that the Wiener integral is not defined pathwise, which prevents us from applying the change of measure directly to h(δB + δΘ)DΨ(w)[x]. However, we can proceed by an approximation argument.
(i) For each j ∈ {1, . . . , ℓ}, one can find a sequence of uniformly bounded piecewise constant functions {F_{j,n} : n ≥ 1} such that ‖F_{j,n} − δ^{−1}∇(x_{i_j})_ε‖ → 0 as n tends to infinity; denote the time instants at which the function F_{j,n} jumps by t_k^{j,n}, 1 ≤ k ≤ N_{j,n}. (ii) By dominated convergence and the Itô isometry for Wiener integrals, the Riemann sums Σ_{k=1}^{N_{j,n}} F_{j,n}(t_k^{j,n})(B(t_{k+1}^{j,n}) − B(t_k^{j,n})) converge in L^p(Ω), for any finite p ≥ 1, to the corresponding Wiener integral.
Then, defining w*_ε(s) := w_ε(s) − w_ε(0), writing Π_t v = v(t) for the canonical evaluation map of v ∈ C([0, T]; R^d) at time t, and recalling the definitions of Ψ_B(w) and Ψ_Θ(w) in (2.7), the expectation in (A.1) can be rewritten (noting that Ψ_B(w) = Ψ_B(w*)) as in (A.2). The first and third equalities in (A.2) follow from (ii), and the second follows by applying the Cameron-Martin change of measure formula for the Brownian motion B with respect to B + δ^{−1}w*_ε. The final equality in (A.2) follows from the same change of measure as in (2.4). Therefore, we have verified the equivalence between (2.20) and (2.11).