Donsker's theorem in Wasserstein-1 distance

We compute the Wasserstein-1 (or Kolmogorov-Rubinstein) distance between a random walk in $R^d$ and Brownian motion. The proof is based on a new estimate of the Lipschitz modulus of the solution of Stein's equation. As an application, we can evaluate the rate of convergence towards the local time at 0 of Brownian motion.


Motivations
For a complete, separable metric space $X$, the topology of convergence in distribution is metrizable [8] by considering the so-called Kolmogorov-Rubinstein or Wasserstein-1 distance:
$$\operatorname{dist}_{KR}(\mu,\nu)=\sup_{f\in\operatorname{Lip}_1(X)}\left(\int_X f\,d\mu-\int_X f\,d\nu\right).\qquad(1)$$
The formulation (1) is well suited to evaluating distances by Stein's method. When $X=R$, there is no particular difficulty in evaluating the K-R distance when $\mu$ is the Gaussian distribution. When $X=R^d$, it is only recently (see [9,12,15] and references therein) that improvements of the standard Stein's method have been proposed to obtain the K-R distance to the Gaussian measure on $R^d$. The bottleneck is the estimate of the Lipschitz modulus of the second-order derivative of the solution of Stein's equation when $f$ is only assumed to be Lipschitz continuous. Namely, for $f:R^d\to R$ and any $t>0$, consider the function
$$P_tf(x)=\int_{R^d}f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)d\mu_d(y),\qquad(2)$$
where $\mu_d$ is the standard Gaussian measure on $R^d$. In dimension 1, Stein's equation reads
$$h'(x)-x\,h(x)=f(x)-\int_R f\,d\mu_1,$$
and the subsequent computations only require an estimate of the Lipschitz modulus of $h'$. For $f\in L^1(\mu)$, it is classical that $P_tf$ is infinitely differentiable and that
$$(P_tf)^{(k)}(x)=\left(\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\right)^{k}\int_R f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)H_k(y)\,d\mu_1(y),\qquad(3)$$
where $H_k$ is the $k$-th Hermite polynomial. On the other hand, if $f$ is $k$-times differentiable, we have
$$(P_tf)^{(k)}=e^{-kt}\,P_t\bigl(f^{(k)}\bigr).\qquad(4)$$
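As an elementary numerical illustration (not part of the proofs), the commutation relation (4) can be checked in dimension 1. The Python sketch below approximates Gaussian expectations by Gauss-Hermite quadrature and compares a finite-difference derivative of $P_tf$ with $e^{-t}P_t(f')$; the test function $f=\sin$ and the evaluation point are arbitrary choices of ours.

```python
import numpy as np

# Gauss-Hermite quadrature for expectations against the standard Gaussian measure
nodes, weights = np.polynomial.hermite.hermgauss(60)

def gauss_expect(g):
    # E[g(Z)] for Z ~ N(0,1)
    return float(np.sum(weights * g(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi))

def P(t, f):
    # the semigroup of eq. (2), in dimension 1
    a, b = np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))
    return lambda x: gauss_expect(lambda z: f(a * x + b * z))

t, x, eps = 0.3, 0.7, 1e-5
lhs = (P(t, np.sin)(x + eps) - P(t, np.sin)(x - eps)) / (2.0 * eps)  # (P_t f)'(x)
rhs = np.exp(-t) * P(t, np.cos)(x)                                   # e^{-t} P_t(f')(x)
assert abs(lhs - rhs) < 1e-6
```

The same quadrature scheme can be reused to check (3) for small $k$.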
According to (3) applied with $k=1$, we get
$$(P_tf)'(x)=\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_R f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)H_1(y)\,d\mu_1(y).$$
It is apparent that the Lipschitz modulus of $h'$ simply depends on the Lipschitz modulus of $f$. However, in higher dimension, Stein's equation becomes
$$\Delta h(x)-x\cdot\nabla h(x)=f(x)-\int_{R^d}f\,d\mu_d,\qquad(5)$$
whose solution is formally expressed through the semigroup (2). The form of (5) entails that we need to estimate the Lipschitz modulus of $\Delta h$, which requires using (3) for $k=2$. Unfortunately, we then have to realize that the resulting factor $e^{-2t}/(1-e^{-2t})$ is not integrable in a neighborhood of $t=0$. Hence, until the very recent papers [9,15], the strategy was to assume that $\nabla f$ is Lipschitz, apply (4) once to compute the first derivative of $P_tf$, and then apply (3) to this expression: the singularity in $t$ then behaves as $(1-e^{-2t})^{-1/2}$, which is integrable at $t=0$. This means that instead of computing the supremum in the right-hand side of (1) over Lipschitz functions, it is computed over functions whose first derivative is Lipschitz. This also defines a distance, which does not change the induced topology, but the accuracy of the bound is degraded.

In infinite dimension, a new problem arises, which is best explained by going back to the roots of Stein's method in dimension 1. Suppose that we want to estimate the K-R distance in the standard Central Limit Theorem. Let $(X_n,\ n\ge1)$ be a sequence of independent, identically distributed random variables with $E[X]=0$ and $E[X^2]=1$, and let $T_n=n^{-1/2}\sum_{j=1}^nX_j$. The Stein-Dirichlet representation formula [6] states that
$$E[f(T_n)]-\int_R f\,d\mu_1=\int_0^\infty E\left[LP_tf(T_n)\right]dt,$$
with obvious notations. Now,
$$E\left[T_n\,(P_tf)'(T_n)\right]=\frac{1}{\sqrt n}\sum_{j=1}^nE\left[X_j\,(P_tf)'(T_n)\right].$$
The trick, which amounts to an integration by parts for a Malliavin structure on independent random variables (see [7]), is to write
$$E\left[X_j\,(P_tf)'(T_n)\right]=E\left[X_j\left((P_tf)'(T_n)-(P_tf)'(T_n^{\neg j})\right)\right]$$
in view of the independence of the random variables, where $T_n^{\neg j}=T_n-X_j/\sqrt n$. Then, we use the fundamental theorem of calculus in this expression around the point $T_n^{\neg j}$, which produces a second derivative of $P_tf$ together with an additional factor $X_j/\sqrt n$. This confirms that the crux of the matter is to estimate uniformly the Lipschitz modulus of $(P_tf)''$. It also shows how we get the order of convergence.
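The resulting $n^{-1/2}$ rate can be observed numerically. The following Python sketch (our choices: Rademacher steps, a truncated grid, and the sample sizes) computes the one-dimensional K-R distance as the $L^1$ distance between cumulative distribution functions, and checks that the distance from $T_n$ to the Gaussian decreases with $n$.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
grid = np.linspace(-6.0, 6.0, 2001)
dx = grid[1] - grid[0]
Phi = np.array([0.5 * (1.0 + erf(g / sqrt(2.0))) for g in grid])  # N(0,1) CDF

def kr_to_gaussian(sample):
    # in dimension 1, the K-R distance is the L^1 distance between the CDFs
    F = np.searchsorted(np.sort(sample), grid, side="right") / len(sample)
    return float(np.sum(np.abs(F - Phi)) * dx)

def dist_Tn(n, copies=20000):
    # T_n = n^{-1/2} sum_j X_j, with Rademacher steps: E[X] = 0, E[X^2] = 1
    X = rng.choice([-1.0, 1.0], size=(copies, n))
    return kr_to_gaussian(X.sum(axis=1) / sqrt(n))

d4, d100 = dist_Tn(4), dist_Tn(100)
assert d100 < d4  # the distance shrinks, consistent with the n^{-1/2} rate
```

The empirical CDF of 20000 copies of $T_n$ stands in for the law of $T_n$; the residual Monte Carlo error is well below the distances being compared.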
We have one occurrence of $n^{-1/2}$ in the definition of $T_n$, which appears in the expression of $L_1$. The same factor appears a second time when we proceed to the Taylor expansion, and a third time when we plug (3) into (7). This means that we have a factor $n^{-3/2}$ which is summed $n$ times, hence the rate of convergence, which is known to be $n^{-1/2}$. Now, if we are interested in the Donsker theorem, the process whose limit we would like to assess is
$$S_n=\sum_{j=1}^nX_j\,h^n_j,\quad\text{where }h^n_j(t)=\sqrt n\int_0^t\mathbf 1_{[(j-1)/n,\,j/n)}(s)\,ds.$$
For reasons that will be explained below, the analog of the second-order derivative will involve the quantities $\langle\nabla^{(2)}P_tf,\,h^n_j\otimes h^n_j\rangle$, where $\nabla$ is the Malliavin derivative and $I_{1,2}$ is the Cameron-Martin space of absolutely continuous functions with square-integrable derivative. Recall that in the context of Malliavin calculus, this space is identified with its dual, which means that the dual of $L^2$ is not identified with $L^2$ itself. The difficulty is then that we do not have an $n^{-1/2}$ factor in the definition of $S_n$, and it is easily seen that $\|h^n_j\|_{I_{1,2}}=1$; hence no multiplicative factor will pop up in (8). In [4], we bypassed this difficulty by assuming enough regularity on $f$ so that $\nabla^{(2)}P_tf$ belongs to the dual of $L^2$. Then, in the estimate of terms such as those appearing in (8), it is the $L^2$-norm of $h^n_j$ which appears, and it turns out that $\|h^n_j\|_{L^2}\le c\,n^{-1/2}$; hence the presence of a factor $n^{-1}$, which saves the proof.
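The two norms of $h^n_j$ are easy to check numerically. In the Python sketch below, the normalization $h^n_j(t)=\sqrt n\int_0^t\mathbf 1_{[(j-1)/n,j/n)}(s)\,ds$ is an assumption of ours, chosen so that $\|h^n_j\|_{I_{1,2}}=1$; the $L^2$ norm then comes out of order $n^{-1/2}$.

```python
import numpy as np

n, j = 100, 37
t = np.linspace(0.0, 1.0, 200001)
dt = t[1] - t[0]

# assumed normalization: h^n_j(t) = sqrt(n) * int_0^t 1_{[(j-1)/n, j/n)}(s) ds
hdot = np.sqrt(n) * ((t >= (j - 1) / n) & (t < j / n))
h = np.cumsum(hdot) * dt

cameron_martin_sq = float(np.sum(hdot**2) * dt)  # ||h^n_j||_{I_{1,2}}^2 = int (h')^2 dt
l2_sq = float(np.sum(h**2) * dt)                 # ||h^n_j||_{L^2}^2
assert abs(cameron_martin_sq - 1.0) < 1e-2       # unit Cameron-Martin norm
assert l2_sq < 1.0 / n                           # so ||h^n_j||_{L^2} <= n^{-1/2}
```

Exactly: $\|h^n_j\|_{L^2}^2=(1-j/n)/n+1/(3n^2)$, since $h^n_j$ climbs linearly to $n^{-1/2}$ on its support and stays constant afterwards.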
The goal of this paper is to weaken the hypothesis on $f$, so as to upper-bound the true K-R distance between the distribution of $S_n$ and the distribution of a Brownian motion, that is,
$$\sup_{f\in\operatorname{Lip}_1(X)}\bigl(E[f(S_n)]-E[f(B)]\bigr).$$
The space $X$ is a Banach space which we can choose arbitrarily, as long as it can be equipped with the structure of an abstract Wiener space and it contains the sample paths of $S_n$ and $B$.
The main technical result of this article is Theorem 4.4, which gives a new estimate of the Lipschitz modulus of $\nabla^{(2)}P_tf$ for $t>0$. The main idea is to introduce a hierarchy of approximations. There is a first scale induced by the time discretization coming from the definition of $S_n$. Then, we consider a coarser discretization onto which we project our approximations, in order to benefit from the averaging effect of the ordinary CLT. It turns out that the optimal ratio is obtained when the mesh of the coarser subdivision is roughly the cubic root of the mesh of the reference partition. Moreover, after [3] and [4], we are convinced that it is simpler, and just as efficient, to stick to finite dimension as long as possible. To this end, we consider the affine interpolation of the Brownian motion as an intermediate process. The distance between Brownian sample paths and their affine interpolation is well known. This reduces the problem to estimating the distance between $S_n$ and the affine interpolation of $B$, a task which can be handled by Stein's method. It turns out that the bottleneck is in fact the rate of convergence of the Brownian interpolation to the Brownian motion.
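The distance between a Brownian path and its affine interpolation can be illustrated by simulation. The Python sketch below (our discretization of $[0,1]$; the rate quoted in the comment is the classical sup-norm one) compares the mean uniform error for two interpolation meshes.

```python
import numpy as np

rng = np.random.default_rng(1)
fine = 4096                        # fine grid of [0, 1] carrying the reference path
tt = np.linspace(0.0, 1.0, fine + 1)

def mean_sup_error(m, trials=200):
    # average sup-norm distance between B and its affine interpolation on D_m
    errs = []
    for _ in range(trials):
        B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / fine), fine))])
        knots = np.linspace(0.0, 1.0, m + 1)
        Bm = np.interp(tt, knots, B[:: fine // m])
        errs.append(np.max(np.abs(B - Bm)))
    return float(np.mean(errs))

e8, e256 = mean_sup_error(8), mean_sup_error(256)
assert e256 < e8 < 2.0   # the error decays, roughly like sqrt(log m / m)
```

Here $m$ divides the fine grid size, so the interpolation knots sit exactly on grid points.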
This paper is organized as follows. In Section 2, we show how to view fractional Sobolev spaces as Wiener spaces. In Section 3, we explain the line of thought we follow. The proofs are given in Section 4.

Preliminaries
2.1. Fractional Sobolev spaces. As in [5,11], we consider the fractional Sobolev spaces $W_{\eta,p}$, defined for $\eta\in(0,1)$ and $p\ge1$ as the closure of $C^1$ functions with respect to the norm
$$\|f\|_{\eta,p}^p=\int_0^1|f(t)|^p\,dt+\iint_{[0,1]^2}\frac{|f(t)-f(s)|^p}{|t-s|^{1+\eta p}}\,ds\,dt.$$
For $\eta=1$, $W_{1,p}$ is the completion of $C^1$ for the norm
$$\|f\|_{1,p}^p=\int_0^1|f(t)|^p\,dt+\int_0^1|f'(t)|^p\,dt.$$
These spaces are known to be Banach spaces and to satisfy the Sobolev embeddings [1,10]: for $\eta-1/p>0$, $W_{\eta,p}$ is continuously embedded into the space of $(\eta-1/p)$-Hölder continuous functions. As a consequence, since $W_{1,p}$ is separable (see [2]), so is $W_{\eta,p}$. We need to compute the $W_{\eta,p}$ norm of primitives of step functions.
There exists $c>0$ such that for any $s_1,s_2\in[0,1]$, we have
$$\Bigl\|\int_0^\cdot\mathbf 1_{[s_1,s_2]}(u)\,du\Bigr\|_{\eta,p}\le c\,|s_2-s_1|^{1-\eta}.$$
Proof. Remark that for any $s,t\in[0,1]$,
$$\Bigl|\int_s^t\mathbf 1_{[s_1,s_2]}(u)\,du\Bigr|\le\min\bigl(|t-s|,\,|s_2-s_1|\bigr).$$
The result then follows from the definition of the $W_{\eta,p}$ norm.
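Although the constant $c$ is not explicit, the scaling in $|s_2-s_1|$ can be probed numerically. In the Python sketch below (not part of the proofs), the exponent $1-\eta$ used in the assertion is our reading of the lemma; the double integral of the $W_{\eta,p}$ seminorm is approximated by Riemann sums on a grid.

```python
import numpy as np

eta, p = 0.2, 4.0
t = np.linspace(0.0, 1.0, 801)
dt = t[1] - t[0]

def primitive_of_step(s1, s2):
    # F(t) = int_0^t 1_{[s1, s2]}(u) du
    return np.clip(t - s1, 0.0, s2 - s1)

def seminorm(F):
    # Riemann-sum approximation of the double integral in the W_{eta,p} norm
    D = np.abs(F[:, None] - F[None, :])
    G = np.abs(t[:, None] - t[None, :])
    np.fill_diagonal(G, np.inf)          # the diagonal contributes nothing
    return float(np.sum(D**p / G**(1.0 + eta * p)) * dt * dt) ** (1.0 / p)

# doubling the width |s2 - s1| should multiply the seminorm by about 2^{1 - eta}
r = seminorm(primitive_of_step(0.3, 0.5)) / seminorm(primitive_of_step(0.3, 0.4))
assert 1.2 < r < 2.1
```

The observed ratio is close to $2^{1-\eta}\approx1.74$ for $\eta=0.2$, consistent with the claimed exponent.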
We denote by W 0,∞ the space of continuous (hence bounded) functions on [0, 1] equipped with the uniform norm.
2.2. Fractional spaces $W_{\eta,p}$ as Wiener spaces. Let $\Lambda$ denote the set of admissible pairs $(\eta,p)$; in what follows, we always choose $\eta$ and $p$ in $\Lambda$. Consider $(Z_n,\ n\ge1)$ a sequence of independent, standard Gaussian random variables and let $(z_n,\ n\ge1)$ be a complete orthonormal basis of $I_{1,2}$. Then, we know from [13] that
$$B=\sum_{n\ge1}Z_n\,z_n\qquad(10)$$
converges almost surely in $W_{\eta,p}$, where $B$ is a Brownian motion. We clearly have the diagram
$$W_{\eta,p}^*\ \hookrightarrow\ I_{1,2}\ \xrightarrow{\ e_{\eta,p}\ }\ W_{\eta,p},\qquad(11)$$
where $e_{\eta,p}$ is the embedding from $I_{1,2}$ into $W_{\eta,p}$. The space $I_{1,2}$ is dense in $W_{\eta,p}$ since polynomials do belong to $I_{1,2}$. Moreover, Eqn. (10) and the Parseval identity entail that for any $z\in W^*$,
$$E\left[\langle z,B\rangle^2\right]=\|z\|_{I_{1,2}}^2.\qquad(12)$$
We denote by $\mu_{\eta,p}$ the law of $B$ on $W_{\eta,p}$. Then, the diagram (11) and the identity (12) mean that $(I_{1,2},W_{\eta,p},\mu_{\eta,p})$ is a Wiener space.
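The series (10) can be illustrated at the level of variances. In the Python sketch below, the basis $(z_n)$ is one classical orthonormal basis of $I_{1,2}$ (a choice of ours; the text does not fix one): its derivatives $\sqrt2\cos((n-\tfrac12)\pi t)$ are orthonormal in $L^2[0,1]$. The partial sums of $\sum_nz_n(t)^2$ then recover $\operatorname{Var}B(t)=t$.

```python
import numpy as np

def z(n, t):
    # z_n(t) = sqrt(2) sin((n - 1/2) pi t) / ((n - 1/2) pi), an orthonormal
    # basis of the Cameron-Martin space I_{1,2}
    w = (n - 0.5) * np.pi
    return np.sqrt(2.0) * np.sin(w * t) / w

t0 = 0.37
var = sum(z(n, t0) ** 2 for n in range(1, 20001))
assert abs(var - t0) < 1e-3   # Var B(t) = sum_n z_n(t)^2 = t, as for Brownian motion
```

The tail of the series is of order $1/N$ after $N$ terms, which is why a large truncation is used.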
Definition 2.1 (Wiener integral). The Wiener integral, denoted by $\delta_{\eta,p}$, is the isometric extension of the map $h\mapsto\langle h,\cdot\rangle_{I_{1,2}}$, defined on $e_{\eta,p}(I_{1,2})$, to $L^2(\mu_{\eta,p})$.

Definition 2.2 (Ornstein-Uhlenbeck semi-group). For any Lipschitz function $F$ on $W_{\eta,p}$, for any $\tau\ge0$,
$$P_\tau F(x)=\int_{W_{\eta,p}}F\left(e^{-\tau}x+\sqrt{1-e^{-2\tau}}\,y\right)d\mu_{\eta,p}(y).$$
The dominated convergence theorem entails that $P_\tau$ is ergodic: for any $x\in W_{\eta,p}$, with probability 1,
$$P_\tau F(x)\xrightarrow[\tau\to\infty]{}\int_{W_{\eta,p}}F\,d\mu_{\eta,p}.$$

Moreover, the invariance by rotation of Gaussian measures implies that
$$\int_{W_{\eta,p}}P_\tau F\,d\mu_{\eta,p}=\int_{W_{\eta,p}}F\,d\mu_{\eta,p}\quad\text{for all }\tau\ge0.$$

Otherwise stated, the Gaussian measure on $W_{\eta,p}$ is the invariant and stationary measure of the semi-group $P=(P_\tau,\ \tau\ge0)$. For details on the Malliavin gradient, we refer to [14,17].
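In dimension 1, the invariance of the Gaussian measure under $P_\tau$ can be verified by quadrature. The sketch below uses the Mehler-type formula for the one-dimensional Ornstein-Uhlenbeck semigroup; the test function is an arbitrary choice of ours.

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(80)

def gauss_expect(g):
    # E[g(Z)] for Z ~ N(0,1), by Gauss-Hermite quadrature
    return float(np.sum(weights * g(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi))

def P(tau, F):
    # Mehler formula for the one-dimensional Ornstein-Uhlenbeck semigroup
    a, b = np.exp(-tau), np.sqrt(1.0 - np.exp(-2.0 * tau))
    return lambda x: gauss_expect(lambda z: F(a * x + b * z))

F = lambda x: np.cos(x) + x ** 4
for tau in (0.1, 1.0, 10.0):
    lhs = gauss_expect(np.vectorize(P(tau, F)))   # int P_tau F dmu
    assert abs(lhs - gauss_expect(F)) < 1e-6      # = int F dmu: invariance
```

The identity holds exactly because $e^{-\tau}X+\sqrt{1-e^{-2\tau}}\,Y$ is again standard Gaussian when $X,Y$ are independent standard Gaussians, which is precisely the rotation invariance invoked above.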
We consider cylindrical functions of the form
$$F(x)=\sum_{j=1}^kf_j\bigl(\delta_{\eta,p}h_1(x),\cdots,\delta_{\eta,p}h_k(x)\bigr)\,x_j,$$
where for any $j\in\{1,\cdots,k\}$, $f_j$ belongs to the Schwartz space on $R^k$, $(h_1,\cdots,h_k)$ are elements of $I_{1,2}$ and $(x_1,\cdots,x_k)$ belong to $X$. The set of such functions is denoted by $C(X)$.
For $h\in I_{1,2}$,
$$\langle\nabla F(x),h\rangle_{I_{1,2}}=\frac{d}{d\varepsilon}F(x+\varepsilon h)\Big|_{\varepsilon=0},$$
which is equivalent to saying that $\nabla F(x)$ is the gradient of $F$ at $x$ along the directions of the Cameron-Martin space. A representation of the corresponding second-order identity is proved in [16, Theorem 4.8], in which $(h_n,\ n\ge1)$ is a complete orthonormal basis of $H$.
Note that a non-trivial part of this theorem is to prove that the terms are meaningful: that $\nabla P_tf$ takes values in $W^*_{\eta,p}$ rather than $I_{1,2}$, and that $\nabla^{(2)}P_tf(x)$ is trace-class. Actually, we only need a finite-dimensional version of this identity, in which none of these difficulties appears. Consider
$$S_m=\sum_{a\in A_m}X_a\,h^m_a,$$
where $(X_a,\ a\in A_m)$ is a family of independent, identically distributed, $R^d$-valued random variables. We denote by $X$ a random variable which has their common distribution. Moreover, we assume that $E[X]=0$ and $E[X^2]=1$. For any $m>0$, the map $\pi_m$ is the orthogonal projection from $H:=I_{1,2}^d$ onto $V_m$. Let $0<N<m$. For $f\in\operatorname{Lip}_1(W_{\eta,p})$, we write
$$E[f(S_m)]-E[f(B)]=A_1+A_2+A_3,\qquad(14)$$
where
$$A_1=E[f(S_m)]-E[f(\pi_NS_m)],\quad A_2=E[f(\pi_NS_m)]-E[f(\pi_NB^m)],\quad A_3=E[f(\pi_NB^m)]-E[f(B)],$$
and $B^m$ is the affine interpolation of the Brownian motion along $D_m$.

The two terms $A_1$ and $A_3$ are of the same nature: we have to compare two processes which live on the same probability space. Since $f$ is Lipschitz, we can proceed by comparing their sample paths. The term $A_2$ is different, as the two processes involved live on different probability spaces. It is for this term that Stein's method will be used. We know from [11] that:

Theorem 3.1. For any $(\eta,p)\in\Lambda$, there exists $c>0$ such that

Moreover, we have

This upper bound is far from optimal, and it is likely that it could be improved to obtain a factor $N^{1-\eta}$. However, in view of (15), this would bring no improvement to our final result.

Theorem 3.3. Let $(\eta,p)\in\Lambda$ and let $X_a$ belong to $L^p(W;R^d,\mu_{\eta,p})$ for some $p\ge3$. Then, there exists $c>0$ such that for any $f\in\operatorname{Lip}_1(W_{\eta,p})$,

The global upper bound for (14) is proportional to the sum of these estimates. Viewing $N$ as a function of $m$, this expression is minimal for $N\sim m^{1/3}$. Plugging this choice into the previous expressions, we obtain the main result of this paper:

Theorem 3.4. Assume that $X\in L^p(W;R^d,\mu_{\eta,p})$. Then, there exists a constant $c>0$ such that

As an application of the previous considerations, we obtain as a corollary an approximation theorem for the local time of the Brownian motion.
The reflected Brownian motion is defined as
$$R(t)=B(t)+\sup_{0\le s\le t}\max\bigl(0,-B(s)\bigr),$$
and the reflected linear interpolation of the random walk is
$$R^m(t)=S_m(t)+L^m_0(t),\quad\text{where }L^m_0(t):=\sup_{0\le s\le t}\max\bigl(0,-S_m(s)\bigr).$$
The process $L_0(t):=\sup_{0\le s\le t}\max(0,-B_s)$ is an expression of the local time of the Brownian motion at 0. Note that the map
$$f\longmapsto\Bigl(t\mapsto f(t)+\sup_{0\le s\le t}\max\bigl(0,-f(s)\bigr)\Bigr)$$
is Lipschitz continuous from any $W_{\eta,p}$ into $W_{0,\infty}$. One interest of our new result is that we can then apply the previous theorem in $W_{0,\infty}$ to $L^m_0$ and $L_0$. We get:

Corollary 3.5. Assume that the hypotheses of Theorem 3.4 hold. Then there exists a constant $c>0$ such that
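The Lipschitz continuity of the reflection map in the uniform norm is elementary to test on discretized paths. In the following Python sketch (random walks as test inputs are our choice; the constant 2 comes from the triangle inequality), `reflect` implements $f\mapsto f+\sup_{s\le\cdot}\max(0,-f(s))$ on a grid.

```python
import numpy as np

def reflect(f):
    # f(t) + sup_{s <= t} max(0, -f(s)): the reflection map, on a discrete grid
    return f + np.maximum.accumulate(np.maximum(0.0, -f))

rng = np.random.default_rng(2)
n = 1000
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, n ** -0.5, n))])
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, n ** -0.5, n))])

assert reflect(B).min() >= 0.0   # the reflected path is nonnegative
gap = float(np.max(np.abs(reflect(B) - reflect(W))))
assert gap <= 2.0 * float(np.max(np.abs(B - W))) + 1e-12   # Lipschitz in sup norm
```

Nonnegativity follows since $f(t)+\max(0,-f(t))=\max(f(t),0)$, and the running supremum only increases the correction term.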

Proofs
In what follows, $c$ denotes an insignificant constant which may vary from line to line. We borrow from current usage in rough path theory the notation
$$f_{s,t}:=f(t)-f(s).$$
As a preparation for the proof of Theorem 3.2, we need the following lemma.
Lemma 4.1. For all $p\ge2$, there exists a constant $c_p$ such that for any sequence of independent, identically distributed random variables $(X_i,\ i\in N)$ with $X\in L^p$, any finite set $A$ and any sequence $(\alpha_i,\ i\in N)$,
$$E\Bigl[\Bigl|\sum_{i\in A}\alpha_iX_i\Bigr|^p\Bigr]\le c_p\,E[|X|^p]\;|A|^{p/2-1}\sum_{i\in A}|\alpha_i|^p,$$
where $|A|$ is the cardinality of the set $A$.
Proof. The Burkholder-Davis-Gundy inequality applied to the discrete martingale $M_k=\sum_{i\le k,\,i\in A}\alpha_iX_i$ yields
$$E\Bigl[\Bigl|\sum_{i\in A}\alpha_iX_i\Bigr|^p\Bigr]\le c_p\,E\Bigl[\Bigl(\sum_{i\in A}\alpha_i^2X_i^2\Bigr)^{p/2}\Bigr].$$
Using Jensen's inequality, we obtain
$$E\Bigl[\Bigl(\sum_{i\in A}\alpha_i^2X_i^2\Bigr)^{p/2}\Bigr]\le|A|^{p/2-1}\sum_{i\in A}|\alpha_i|^p\,E[|X|^p].$$
The proof is thus complete.
Proof of Theorem 3.2. Actually, we already proved in [4] that

Assume that $s$ and $t$ belong to the same sub-interval: there exists $l\in\{1,\ldots,N\}$ such that $(l-1)/N\le s\le t\le l/N$. Using Lemma 4.1, there exists a constant $c$ such that

Note that $|(h^m_k,h^N_l)_{I_{1,2}}|\le\sqrt{N/m}$, and there are at most $m/N+2$ indices $k$ for which $(h^m_k,h^N_l)_{I_{1,2}}$ is non-zero. Thus the corresponding term is controlled as $m/N$ tends to infinity. Since $|t-s|\le1/N$,

For $0\le s\le t\le1$, let $s^N_+:=\min\{l,\ s\le l/N\}$ and $t^N_-:=\max\{l,\ t\ge l/N\}$. We have

Note that for all $f\in W_{\eta,p}$, $\pi_N(f)$ is the linear interpolation of $f$ along the subdivision $D_N$; hence, for $s,t\in D_N$, $\pi_N(S_m)_{s,t}=(S_m)_{s,t}$. Thus the median term vanishes and we obtain

From (20), we deduce that

A straightforward computation shows (24). The result follows from (23) and (24).
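The key algebraic fact used above, namely that $\pi_N(S_m)$ and $S_m$ have the same increments along $D_N$, is easy to check numerically. In this Python sketch (our discretization; $m$ a multiple of $N$ for simplicity), $\pi_N$ is implemented as affine interpolation along $D_N$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, N = 512, 8
tt = np.linspace(0.0, 1.0, m + 1)
# a rescaled random walk playing the role of S_m on D_m
S = np.concatenate([[0.0], np.cumsum(rng.choice([-1.0, 1.0], m) / np.sqrt(m))])

def pi_N(f):
    # affine interpolation along the subdivision D_N = {l/N, l = 0..N}
    knots = np.linspace(0.0, 1.0, N + 1)
    return np.interp(tt, knots, np.interp(knots, tt, f))

PS = pi_N(S)
step = m // N
increments_match = all(
    abs((PS[(l + 1) * step] - PS[l * step]) - (S[(l + 1) * step] - S[l * step])) < 1e-12
    for l in range(N)
)
assert increments_match   # pi_N(S_m)_{s,t} = (S_m)_{s,t} for s, t in D_N
```

The property holds because interpolation along $D_N$ leaves the values at the points of $D_N$ unchanged.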

Stein's method.
We wish to estimate $A_2$ using Stein's method. For the sake of simplicity, we set

The Stein-Dirichlet representation formula [6] states that, for any $\tau_0>0$,

where

The following is straightforward (see [4, Lemma 4.1]): for any $(\eta,p)\in\Lambda$, there exists a constant $c>0$ such that for any sequence of independent, centered random vectors $(X_a,\ a\in A_m)$,

We now show that, as usual, the rate of convergence in Stein's method is governed by the Lipschitz modulus of the second-order derivative of the solution of Stein's equation. Namely, we have

Since the $X_a$'s are independent, we can expand each term according to the Taylor formula. Since $E[X_a^2]=1$, we have

The result follows by difference.
The main difficulty, and hence the main contribution of this paper, is to find an estimate of this quantity which is valid for any $\varepsilon$.
Theorem 4.4. There exists a constant $c$ such that for any $\tau>0$, any $v\in V_m$ and any $f\in\operatorname{Lip}(W_{\eta,p})$,

Proof of Theorem 4.4. We know from [16,4] that we have the following representation: for any $h\in I_{1,2}$,

where $\tilde B$ is an independent copy of $B$. Since the map $v$ is linear with respect to each of its three arguments,

Hence,

From Lemma 4.7, we know that
$$\operatorname{Var}\Bigl(E\bigl[\delta_{\eta,p}h(B)\ \big|\ \pi_NB^m\bigr]\Bigr)\le c\,\frac{N}{m}\qquad(28)$$
for $m>8N$, and the same holds for the other conditional expectation. Using the Cauchy-Schwarz inequality in (27) and taking (28) into account, we obtain (29). We already know that (30) holds. Thus, plugging estimate (30) into estimate (29) yields estimate (25).
According to (25) and Lemma 4.3, since the cardinality of A m is dm, we obtain the following theorem.
Theorem 4.5. If $X_a$ belongs to $L^p$, then for any $\tau_0>0$, there exists $c>0$ such that

Combining Lemma 4.2 and (31), we get

Optimizing with respect to $\tau_0$ yields Theorem 3.3.
It remains to prove (28). For the sake of simplicity, we give the proof for $d=1$; the general situation is similar but notationally heavier.
We recall that the matrix $\Gamma:=\bigl((\pi_mh^N_b,\pi_mh^N_c)_{I_{1,2}},\ b,c=0,\cdots,N-1\bigr)$ is invertible and satisfies the announced bound.

Proof. Since the $h^m_a$ are orthogonal in $L^2$, for any $b,c\in\{0,\cdots,N-1\}$,
$$\Gamma_{b,c}=\sum_{a\in A_m}(h^N_b,h^m_a)_{I_{1,2}}\,(h^m_a,h^N_c)_{I_{1,2}}.\qquad(34)$$
Since a sub-interval of $D_m$ intersects at most two sub-intervals of $D_N$, the matrix $\Gamma$ is tridiagonal. Furthermore, we know that $|(h^m_a,h^N_b)_{I_{1,2}}|\le\sqrt{N/m}$, and for each $b$ there are at least $(m/N-3)$ terms of this kind which are equal to $\sqrt{N/m}$. Hence the diagonal entries of $\Gamma$ are bounded from below. Since $\Gamma$ is tridiagonal, this implies that it is invertible. Moreover, let $D$ be the diagonal matrix extracted from $\Gamma$, and write $\Gamma=D+S$. We have proved that $\|D\|_\infty\ge3/4$. For $|b-c|=1$, there is at most one term of the sum (34) which yields a non-zero scalar product, hence $\|S\|_\infty$ is small. The matrix $D^{-1}S$ has at most two non-null entries per row, and $\|D^{-1}S\|_\infty<1$ if $m>8N$. By iteration, we get a geometric bound on $\|(D^{-1}S)^k\|_\infty$ for any $k\ge1$. Thus, the Neumann series $\Gamma^{-1}=\sum_{k\ge0}(-D^{-1}S)^kD^{-1}$ converges. The proof is thus complete.
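The structure of $\Gamma$ can be illustrated numerically. In the sketch below, the inner products $(h^m_a,h^N_b)_{I_{1,2}}=\sqrt{mN}\cdot|[a/m,(a+1)/m)\cap[b/N,(b+1)/N)|$ follow from our normalization of the $h^m_a,h^N_b$ as primitives of normalized indicator functions; the assertions check tridiagonality and invertibility for one choice of $m>8N$.

```python
import numpy as np

m, N = 100, 8   # m need not be a multiple of N; here m > 8N

def overlap(a, b):
    # length of the intersection of [a/m, (a+1)/m) and [b/N, (b+1)/N)
    return max(0.0, min((a + 1) / m, (b + 1) / N) - max(a / m, b / N))

# assumed normalization: (h^m_a, h^N_b)_{I_{1,2}} = sqrt(m N) * overlap(a, b)
G = np.array([[np.sqrt(m * N) * overlap(a, b) for b in range(N)] for a in range(m)])
Gamma = G.T @ G   # Gram matrix of the projected vectors pi_m h^N_b

assert float(np.max(np.abs(np.triu(Gamma, 2)))) < 1e-12   # Gamma is tridiagonal
assert float(np.min(np.linalg.eigvalsh(Gamma))) > 0.5      # hence invertible here
```

Tridiagonality reflects the fact that each sub-interval of $D_m$ meets at most two consecutive sub-intervals of $D_N$, so $\Gamma_{b,c}=0$ whenever $|b-c|\ge2$.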