Optimal transport bounds between the time-marginals of a multidimensional diffusion and its Euler scheme

In this paper, we prove that the time supremum of the Wasserstein distance between the time-marginals of a uniformly elliptic multidimensional diffusion with coefficients bounded together with their derivatives up to the order $2$ in the spatial variables and H{\"o}lder continuous with exponent $\gamma$ with respect to the time variable and its Euler scheme with $N$ uniform time-steps is smaller than $C \left(1+\mathbf{1}_{\gamma=1} \sqrt{\ln(N)}\right)N^{-\gamma}$. To do so, we use the theory of optimal transport. More precisely, we investigate how to apply the theory by Ambrosio, Gigli and Savar{\'e} to compute the time derivative of the Wasserstein distance between the time-marginals. We deduce a stability inequality for the Wasserstein distance which finally leads to the desired estimate.


Introduction
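Consider the $\mathbb{R}^d$-valued Stochastic Differential Equation (SDE):
$$X_t = x_0 + \int_0^t b(r, X_r)\,dr + \int_0^t \sigma(r, X_r)\,dW_r, \quad t \le T, \tag{1.1}$$
with $x_0 \in \mathbb{R}^d$, $W$ a $d$-dimensional standard Brownian motion, $\sigma : [0,T] \times \mathbb{R}^d \to \mathcal{M}_d(\mathbb{R})$ and $b : [0,T] \times \mathbb{R}^d \to \mathbb{R}^d$. In what follows, $\sigma$ and $b$ will be assumed to be Lipschitz continuous in the spatial variable uniformly for $t \in [0,T]$ and such that $\sup_{t \in [0,T]} (|\sigma(t,0)| + |b(t,0)|) < +\infty$, so that trajectorial existence and uniqueness hold for this SDE.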
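We now introduce the Euler scheme. To do so, we consider for $N \in \mathbb{N}^*$ the regular time grid $t_i = \frac{iT}{N}$. We define the continuous-time Euler scheme by the following induction for $i \in \{0, \dots, N-1\}$:
$$\bar X_t = \bar X_{t_i} + b(t_i, \bar X_{t_i})(t - t_i) + \sigma(t_i, \bar X_{t_i})(W_t - W_{t_i}), \quad t \in [t_i, t_{i+1}], \tag{1.2}$$
with $\bar X_{t_0} = x_0$. By setting $\tau_t = \lfloor \frac{Nt}{T} \rfloor \frac{T}{N}$, we can also write the Euler scheme as an Itô process:
$$\bar X_t = x_0 + \int_0^t b(\tau_r, \bar X_{\tau_r})\,dr + \int_0^t \sigma(\tau_r, \bar X_{\tau_r})\,dW_r. \tag{1.3}$$
The goal of this paper is to study the Wasserstein distance between the laws $\mathcal{L}(X_t)$ and $\mathcal{L}(\bar X_t)$ of $X_t$ and $\bar X_t$. We first recall the definition of the Wasserstein distance. Let $\mu$ and $\nu$ denote two probability measures on $\mathbb{R}^d$ and $\rho \ge 1$. The $\rho$-Wasserstein distance between $\mu$ and $\nu$ is defined by
$$W_\rho(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^\rho \, \pi(dx, dy) \right)^{1/\rho}, \tag{1.4}$$
where $\Pi(\mu, \nu)$ is the set of probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with respective marginals $\mu$ and $\nu$.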
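As an illustration of the scheme just defined, here is a minimal Python sketch that simulates the Euler scheme at the grid times $t_i = iT/N$ and computes the last grid time $\tau_t$. The function names `last_grid_time` and `euler_path`, and the convention that `b` returns a list and `sigma` a list of lists, are assumptions made for this example and do not come from the paper.

```python
import math
import random


def last_grid_time(t, T, N):
    """Return tau_t = floor(N t / T) * T / N for the uniform grid t_i = i T / N."""
    return math.floor(N * t / T) * T / N


def euler_path(b, sigma, x0, T, N, rng):
    """Simulate the Euler scheme at the grid times t_0, ..., t_N.

    b(t, x) returns a list of length d and sigma(t, x) a d x d list of lists
    (hypothetical calling convention chosen for this sketch).
    Returns the list of states [X_{t_0}, ..., X_{t_N}].
    """
    d = len(x0)
    h = T / N
    x = list(x0)
    path = [list(x)]
    for i in range(N):
        t = i * h
        drift = b(t, x)
        vol = sigma(t, x)
        # Brownian increment W_{t_{i+1}} - W_{t_i}, Gaussian with covariance h * I_d
        dw = [rng.gauss(0.0, math.sqrt(h)) for _ in range(d)]
        x = [
            x[k] + drift[k] * h + sum(vol[k][j] * dw[j] for j in range(d))
            for k in range(d)
        ]
        path.append(list(x))
    return path
```

With a vanishing diffusion coefficient the scheme reduces to the explicit Euler method for an ordinary differential equation, which gives a simple sanity check of the implementation.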
In this paper, we will work with the Euclidean norm on $\mathbb{R}^d$, i.e. $|x| = \left( \sum_{i=1}^d x_i^2 \right)^{1/2}$ for $x \in \mathbb{R}^d$.
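We are interested in $\sup_{t \in [0,T]} W_\rho(\mathcal{L}(X_t), \mathcal{L}(\bar X_t))$. Thanks to the Kantorovich duality (see Corollary 2.5.2 in Rachev and Rüschendorf [15]), we know that for $t \in [0,T]$,
$$W_1(\mathcal{L}(X_t), \mathcal{L}(\bar X_t)) = \sup_{f} \left| E[f(X_t)] - E[f(\bar X_t)] \right|,$$
where the supremum is taken over the functions $f : \mathbb{R}^d \to \mathbb{R}$ such that $|f(x) - f(y)| \le |x - y|$ for all $x, y \in \mathbb{R}^d$. From the weak error expansion given by Talay and Tubaro [17] when the coefficients are smooth enough, we deduce that $W_1(\mathcal{L}(X_T), \mathcal{L}(\bar X_T)) \ge \frac{C}{N}$ for some constant $C > 0$. Since, by Hölder's inequality, $\rho \mapsto W_\rho$ is non-decreasing, we cannot hope the order of convergence of $\sup_{t \in [0,T]} W_\rho(\mathcal{L}(X_t), \mathcal{L}(\bar X_t))$ to be better than one. On the other hand, as remarked by Sbai [16], a result of Gobet and Labart [10] supposing uniform ellipticity and some regularity on $\sigma$ and $b$ that will be made precise below implies that $\sup_{t \in [0,T]} W_1(\mathcal{L}(X_t), \mathcal{L}(\bar X_t)) \le \frac{C}{N}$.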
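To make the definition of $W_\rho$ and its monotonicity in $\rho$ concrete, the following Python sketch computes the $\rho$-Wasserstein distance between two empirical measures on the real line with the same number of equally weighted atoms. In that special case, and for $\rho \ge 1$, matching the sorted samples is an optimal coupling. The function name `wasserstein_1d` is an assumption of this example, and the sketch only covers dimension one, not the multidimensional setting of the paper.

```python
def wasserstein_1d(xs, ys, rho):
    """rho-Wasserstein distance between two empirical measures on the real line.

    Both samples must have the same size; each atom carries weight 1/n.
    For rho >= 1 the monotone (sorted) matching is an optimal coupling.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if rho < 1:
        raise ValueError("rho must be at least 1")
    n = len(xs)
    # Pair the i-th smallest atom of xs with the i-th smallest atom of ys
    cost = sum(abs(x - y) ** rho for x, y in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / rho)
```

Evaluating the function on the same pair of samples for increasing values of `rho` returns non-decreasing values, in line with the consequence of Hölder's inequality used above.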
In a recent paper [1], we proved that in dimension $d = 1$, under uniform ellipticity and for coefficients $b$ and $\sigma$ time-homogeneous, bounded together with their derivatives up to the order 4, one has
$$\sup_{t \in [0,T]} W_\rho(\mathcal{L}(X_t), \mathcal{L}(\bar X_t)) \le \frac{C \sqrt{\ln(N)}}{N} \tag{1.5}$$
for any $\rho > 1$. For the proof, we used that in dimension one, the optimal coupling measure $\pi$ between the measures $\mu$ and $\nu$ in the definition (1.4) of the Wasserstein distance is explicitly given by the inverse transform sampling: $\pi$ is the image of the Lebesgue measure on $[0, 1]$ by the couple of pseudo-inverses of the cumulative distribution functions of $\mu$ and $\nu$. Our main result in the present paper is the generalization of (1.5) to any dimension $d$ when the coefficients $b$ and $\sigma$ are time-homogeneous, $C^2$, bounded together with their derivatives up to the order 2, and uniform ellipticity holds. We also generalize the analysis to time-dependent coefficients $b$ and $\sigma$ that are Hölder continuous with exponent $\gamma$ in the time variable. For $\gamma \in (0, 1)$, the rate of convergence worsens, i.e. the right-hand side of (1.5) becomes $\frac{C}{N^\gamma}$, whereas it is preserved in the Lipschitz case $\gamma = 1$. These results are stated in Section 2 together with the remark that the choice of a non-uniform time grid refined near the origin for the Euler scheme makes it possible to get rid of the $\sqrt{\ln(N)}$ term in the numerator in the case $\gamma = 1$. To our knowledge, they provide a new estimate of the weak error of the Euler scheme when the coefficients $b$ and $\sigma$ are only Hölder continuous in the time variable. The main difficulty in proving them is that, in contrast with the one-dimensional case, the optimal coupling between $\mathcal{L}(X_t)$ and $\mathcal{L}(\bar X_t)$ is only characterized in an abstract way. We want to apply the theory by Ambrosio et al. [2] to compute the time derivative $\frac{d}{dt} W_\rho^\rho(\mathcal{L}(X_t), \mathcal{L}(\bar X_t))$.
To do so, we have to interpret the Fokker-Planck equations giving the time derivatives of the densities of $X_t$ and $\bar X_t$ with respect to the Lebesgue measure as transport equations: the contribution of the Brownian term has to be written in the same way as that of the drift term. This requires some regularity properties of the densities. In Section 3, we give a heuristic proof of our main result without caring about these regularity properties. This allows us to present the main arguments in a pedagogical way and to introduce the notations related to optimal transport theory. In the obtained expression for $\frac{d}{dt} W_\rho^\rho(\mathcal{L}(X_t), \mathcal{L}(\bar X_t))$, it turns out that, somehow because of the first order optimality condition on the optimal transport maps at time $t$, the derivatives of these maps with respect to the time variable do not appear (see Equation (3.11) below). The contribution of the drift term is similar to the one that we would obtain when computing $\frac{d}{dt} E(|X_t - \bar X_t|^\rho)$, i.e. when working with the natural coupling between the SDE (1.1) and its Euler scheme. To be able to deal with the contribution of the Brownian term, we first have to perform a spatial integration by parts. Then the uniform ellipticity condition enables us to apply a key lemma on pseudo-distances between matrices to see that this contribution is better behaved than the corresponding one in $\frac{d}{dt} E(|X_t - \bar X_t|^\rho)$ and to derive a stability inequality for $W_\rho^\rho(\mathcal{L}(X_t), \mathcal{L}(\bar X_t))$ analogous to the one obtained in dimension $d = 1$ in [1]. As in that paper, we conclude the heuristic proof by a Gronwall-type argument using estimates based on Malliavin calculus. In [1], our main motivation was to analyze the Wasserstein distance between the pathwise laws $\mathcal{L}((X_t)_{t \in [0,T]})$ and $\mathcal{L}((\bar X_t)_{t \in [0,T]})$. This then gives an upper bound of the error made when one approximates the expectation of a pathwise functional of the diffusion by the corresponding one computed with the Euler scheme.
We were able to deduce from the upper bound on the Wasserstein distance between the marginal laws that the pathwise Wasserstein distance is upper bounded by $C N^{-2/3+\varepsilon}$, for any $\varepsilon > 0$. This improves the $N^{-1/2}$ rate given by the strong error analysis by Kanagawa [12]. To do so, we established, using the Lamperti transform, a key stability result for one-dimensional diffusion bridges in terms of the couple of initial and terminal positions. So far, we have not been able to generalize this stability result to higher dimensions. Nevertheless, our main result can be seen as a first step towards improving the estimate of the pathwise Wasserstein distance deduced from the strong error analysis.
In Section 4, we give a rigorous proof of the main result. The theory of Ambrosio et al. [2] has been recently applied to Fokker-Planck equations associated with linear SDEs and SDEs nonlinear in the sense of McKean by Bolley et al. [3,4] in the particular case $\sigma = I_d$ of an additive noise and for the quadratic Wasserstein distance $\rho = 2$ to study the long-time behavior of their solutions. In the present paper, we want to estimate the error introduced by a discretization scheme on a finite time-horizon with a general exponent $\rho$ and a non-constant diffusion matrix $\sigma$. It turns out that, due to the local Gaussian behavior of the Euler scheme on each time-step, it is easier to apply the theory of Ambrosio et al. [2] to this scheme than to the limiting SDE (1.1). The justification of the spatial integration by parts performed on the Brownian contribution in the time derivative of the Wasserstein distance is also easier for the Euler scheme. That is why we introduce a second Euler scheme with time step $T/M$ and estimate the Wasserstein distance between the marginal laws of the two Euler schemes. We conclude the proof by letting $M \to \infty$ in this estimate thanks to the lower-semicontinuity of the Wasserstein distance with respect to the narrow convergence. The computation of the time derivative of the Wasserstein distance between the time-marginals of two Euler schemes can be seen as a first step to justify the formal expression of the time derivative of the Wasserstein distance between the time-marginals of the two limiting SDEs. We plan to investigate this problem in a future work. Section 5 is devoted to technical lemmas including the already mentioned key lemma on the pseudo-distances between matrices and estimates based on Malliavin calculus.

Notations
• Unless explicitly stated, vectors are considered as column vectors.
• The set of real d × d matrices is denoted by M d (R).
• For n ∈ N, we introduce .
• For f : R d → R d , we denote by ∇f the Jacobian matrix (∂ x i f j ) 1≤i,j≤d and by ∇ * f its transpose.
, the partial gradient of f with respect to its d last variables.
• For two density functions p andp on R d , if there is a measurable function f : R d → R d such that the image of the probability measure p(x)dx by f admits the densityp, we write p#f =p.
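The index convention for $\nabla f$ in the list above places the differentiation variable in the row index, so that the entry $(i, j)$ is $\partial_{x_i} f_j$. The following Python sketch approximates this matrix by central finite differences. The function name `gradient_matrix` and the step size `eps` are assumptions made for this illustration.

```python
def gradient_matrix(f, x, eps=1e-6):
    """Approximate the matrix (d f_j / d x_i)_{i,j} by central differences.

    Row index i is the differentiation variable and column index j the
    component of f, following the convention stated in the notations.
    f maps a list of length d to a list of length d.
    """
    d = len(x)
    grad = [[0.0] * d for _ in range(d)]
    for i in range(d):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for j in range(d):
            grad[i][j] = (f_plus[j] - f_minus[j]) / (2.0 * eps)
    return grad
```

For $f(x) = (x_1^2, x_1 x_2)$ at the point $(1, 2)$, the first row of the matrix contains the derivatives with respect to $x_1$, namely $(2, 2)$, and the second row those with respect to $x_2$, namely $(0, 1)$.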

The main result
Our main result is the following theorem.
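where $C$ is a positive constant that only depends on $\rho$, the uniform ellipticity constant $\underline{a}$, $(\|\partial_\alpha a\|_\infty, \|\partial_\alpha b\|_\infty, 0 \le |\alpha| \le 2)$, and the coefficients $K$, $q$ involved in the $\gamma$-Hölder time regularity of $a$ and $b$. In particular, $C$ does not depend on the initial condition $x_0 \in \mathbb{R}^d$.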
Remark 2.2 Under the assumptions of Theorem 2.1 with $\gamma = 1$, by discretizing the SDE (1.1) with the Euler scheme on the non-uniform time grids refined near the origin $\left( t_i = \left( \frac{i}{N} \right)^\beta T \right)_{0 \le i \le N}$ with $\beta > 1$, one gets rid of the $\sqrt{\ln(N)}$ term in the numerator:
$$\sup_{t \in [0,T]} W_\rho(\mathcal{L}(X_t), \mathcal{L}(\bar X_t)) \le \frac{C}{N}.$$
For $\gamma < 1$, the choice of such non-uniform time grids does not lead to an improvement of the convergence rate in (2.1). For more details, see Remark 3.2 below.
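The refined grid of Remark 2.2 is straightforward to build. The Python sketch below constructs the grid $t_i = (i/N)^\beta T$ and locates the last grid time before a given $t$ by bisection rather than by a closed-form expression. The function names `refined_grid` and `last_grid_time_refined` are hypothetical and chosen for this example only.

```python
import bisect


def refined_grid(T, N, beta):
    """Return the grid t_i = (i / N) ** beta * T for i = 0, ..., N.

    beta = 1 gives the uniform grid; beta > 1 refines the grid near the origin.
    """
    return [(i / N) ** beta * T for i in range(N + 1)]


def last_grid_time_refined(grid, t):
    """Return the largest grid time that is at most t (assumes grid[0] <= t)."""
    index = bisect.bisect_right(grid, t) - 1
    return grid[index]
```

For $\beta > 1$ the first step $T N^{-\beta}$ is much smaller than the uniform step $T/N$, while the last steps are of order $\beta T / N$, so the total number of steps is unchanged and only their distribution over $[0, T]$ differs.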
To our knowledge, Theorem 2.1 is a new result concerning the weak error of the Euler scheme, for coefficients σ, b only γ-Hölder continuous in the time variable with γ < 1. For γ = 1, as remarked by Sbai [16], a result of Gobet and Labart [10] supposing uniform ellipticity and that Compared to this result, we have a slightly less accurate upper bound due to the $\sqrt{\ln(N)}$ term, but Theorem 2.1 requires slightly weaker assumptions on the diffusion coefficients and, most importantly, concerns any ρ-Wasserstein distance. Using Hölder's inequality and the well-known boundedness of the moments of both $X_t$ and $\bar X_t$ for t ∈ [0, T], one deduces that
Remark 2.4 We have stated Theorem 2.1 under assumptions that lead to a constant C that does not depend on the initial condition $x_0$. This is a nice feature that we used in [1] to bound the Wasserstein distance between the pathwise laws $\mathcal{L}((X_t)_{t \in [0,T]})$ and $\mathcal{L}((\bar X_t)_{t \in [0,T]})$ from above. However, Theorem 2.1 still holds with a constant C depending in addition on $x_0$ if we relax the assumptions on b and σ as follows:
• b and σ are globally Lipschitz with respect to x, i.e.
• b and σ are twice continuously differentiable in x and γ-Hölder in time, and such that we have the following polynomial growth
• $a(t, x) = \sigma(t, x) \sigma(t, x)^*$ is uniformly elliptic.
Since, by Hölder's inequality, $\rho \mapsto W_\rho$ is non-decreasing, it is sufficient to prove Theorem 2.1 for $\rho$ large enough. In fact, we will assume throughout the rest of the article, without loss of generality, that $\rho \ge 2$. The main reason for this assumption is that the function $\mathbb{R}^d \times \mathbb{R}^d \ni (x, y) \mapsto |x - y|^\rho$, which appears in the definition (1.4) of $W_\rho$, becomes globally $C^2$. This will be convenient when studying the second order optimality condition. Furthermore, note that by the uniform ellipticity and regularity assumptions in Theorem 2.1, for $t \in (0, T]$, $X_t$ and $\bar X_t$ admit densities respectively denoted by $p_t$ and $\bar p_t$ with respect to the Lebesgue measure. By a slight abuse of notation, we still denote by $W_\rho(p_t, \bar p_t)$ the $\rho$-Wasserstein distance between the probability measures $p_t(x)dx$ and $\bar p_t(x)dx$ on $\mathbb{R}^d$.

Heuristic proof of the main result
The heuristic proof of Theorem 2.1 is structured as follows. First, we recall some optimal transport results about the Wasserstein distance and its associated optimal coupling, and we make some simplifying assumptions on the optimal transport maps that will be removed in the rigorous proof. Then, we can heuristically calculate $\frac{d}{dt} W_\rho^\rho(p_t, \bar p_t)$ and get a sharp upper bound for this quantity. Last, we use a Gronwall-type argument to conclude the heuristic proof.

Preliminaries on the optimal transport for the Wasserstein distance
We introduce some notations that are rather standard in the theory of optimal transport (see [2,15,18]) and which will be useful to characterize the optimal coupling for the ρ-Wasserstein distance. We will say that a function ψ : In this case, we know from Proposition 3.3.5 of Rachev and Rüschendorf [15] that (3.1) We equivalently have, This result can be seen as an extension of the well-known Fenchel-Legendre duality for convex functions which corresponds to the case ρ = 2. We then introduce the ρ-subdifferentials of these functions. These are the sets defined by Let t ∈ [0, T ]. According to Theorem 3.3.11 of Rachev and Rüschendorf [15], we know that there is a couple (ξ t ,ξ t ) of random variables with respective densities p t andp t which attains the ρ-Wasserstein distance : . Such a couple is called an optimal coupling for the Wasserstein distance. Besides, there exist two ρ-convex functions ψ t andψ t satisfying the duality property (3.1) and such that ξ t ∈ ∂ ρ ψ t (ξ t ) and ξ t ∈ ∂ ρψt (ξ t ), a.s. Now that we have recalled this well-known result of optimal transport, we can start our heuristic proof of Theorem 2.1. To do so, we will assume that the ρ-subdifferentials ∂ ρ ψ t (x) and ∂ ρψt (x) are non-empty and single-valued for any x ∈ R d , i.e.
The functions $T_t(x)$ and $\bar T_t(x)$ depend on $\rho$, but we do not state this dependence explicitly for notational simplicity. Now, we clearly have (3.5) Besides, we can write the Wasserstein distance as follows: Since on the one hand $\bar \xi_t = T_t(\xi_t)$ and $\xi_t = \bar T_t(\bar \xi_t)$ almost surely, and on the other hand $p_t(x) \bar p_t(x) > 0$ thanks to the uniform ellipticity assumption, we have, dx a.e.,
$$\bar T_t(T_t(x)) = T_t(\bar T_t(x)) = x. \tag{3.7}$$
In the remainder of Section 3, we will perform heuristic computations without caring about the actual smoothness of the functions $\psi_t$, $\bar \psi_t$, $T_t$ and $\bar T_t$. In particular, we suppose that where the two last equations are the first order Euler conditions of optimality in the minimization problems (3.2).
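The inverse relation (3.7) between the two optimal transport maps can be checked by hand in a simple one-dimensional example that is not taken from the paper: between two Gaussian laws on the real line, the monotone rearrangement is affine, and in dimension one the monotone rearrangement is optimal for the cost $|x - y|^\rho$ with $\rho \ge 1$. The Python sketch below builds both maps and composes them. The function name `gaussian_transport_map` and the parameter names are assumptions of this illustration.

```python
def gaussian_transport_map(mean_from, std_from, mean_to, std_to):
    """Monotone transport map from N(mean_from, std_from^2) to N(mean_to, std_to^2).

    In dimension one this affine map is the monotone rearrangement of the
    source law onto the target law. Standard deviations must be positive.
    """
    if std_from <= 0 or std_to <= 0:
        raise ValueError("standard deviations must be positive")

    def transport(x):
        return mean_to + (std_to / std_from) * (x - mean_from)

    return transport
```

Composing the map from the first law to the second with the map from the second law back to the first returns the identity, which is the one-dimensional Gaussian counterpart of (3.7).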

A formal computation of
We now make a heuristic differentiation of (3.6) with respect to t. A computation of the same kind for the case ρ = 2 and with identity diffusion matrix σ is given by Bolley et al. : see p.2437 and Remark 3.6 p.2445 in [3] or p.431 in [4].
where we used (3.9) for the second equality and (3.5) for the fourth. Since the image of the probability measure p t (x)dx by the map T t is the probability measurep t (x)dx, which we write asp Let us assume now that the following Fokker-Planck equations for the densities p t andp t hold in the classical sense 14) The first equation is the usual Fokker-Planck equation for the SDE (1.1). For the second one, we also use the result by Gyöngy [11] that ensures that the SDE with coefficientsb andā 1 2 has the same marginal laws as the Euler scheme. Now, plugging these equations in (3.11), we get by using integrations by parts and assuming that the boundary terms vanish. We now usē which is deduced from (3.8), (3.9) and (3.10), to get This formula looks very nice but, due to the lack of regularity of ψ t andψ t , which are merely semiconvex functions, it is only likely to hold with the equality replaced by ≤ and the ∇ 2 ψ t and ∇ 2ψ t replaced by the respective Hessians in the sense of Alexandrov of ψ t andψ t . See Proposition 4.4, where such an inequality is proved rigorously for the Wasserstein distance between the time marginals of two Euler schemes.

Derivation of a stability inequality for
In (3.16), the contribution of the drift terms only involves the optimal transport and is equal to To obtain this term, it was enough to use the first order optimality conditions (3.9) and (3.10).
To deal with the Hessians ∇ 2 ψ t and ∇ 2ψ t which appear in the contribution of the diffusion terms, we will need the associated second order optimality conditions. Differentiating (3.15) with respect to x, we get By symmetry and (3.8), By differentiation of (3.8), we get that In order to make the diffusion contribution of the same order as the drift one, we want to upper-bound the trace term by the square of a distance between a(t, x) andā(t, T t (x)). The key Lemma 5.2 permits to do so. To check that its hypotheses are satisfied, we remark that the second order optimality condition for (3.2) computed at y = T t (x) combined with (3.8) and (3.17) gives that is a positive semidefinite matrix. It is in fact positive since it is the product of two invertible matrices. We can then apply the key Lemma 5.
x) and M defined just above and get: Finally, using thatp t = T t #p t , we get gether with the assumptions on a and b to get that there is a constant C depending only on ρ, a and the spatial Lipschitz constants of a and b such that illustrates the difference between the weak error and the strong error analysis. To study the strong error between X t andX t , one would typically apply Itô's formula and take expectations to get (3.21) The diffusion contribution is very different from the one in (3.20) : indeed, the absence of conditional expectation in the quadratic factor (σ(t,X t )−σ(τ t ,X τt ))(σ(t,X t )−σ(τ t ,X τt )) * in the trace term does not permit cancellations like in As an aside remark, we see that when σ is constant, the diffusion contribution disappears in Equation (3.21) and is non-positive in Equation (3.16). In this case, can be upper bounded by C/N γ where γ denotes the Hölder exponent of the coefficient b in the time variable. For γ = 1, this leads to the improved bound sup t∈[0,T ] W ρ (p t ,p t ) ≤ C/N .

The argument based on Gronwall's lemma
Starting from (3.20), we can conclude by applying a rigorous Gronwall type argument, which is analogous to the one used in the one-dimensional case in [1]. For the sake of completeness, we nevertheless repeat these calculations since we consider here in addition coefficients which are not time-homogeneous but γ-Hölder continuous in time.
We set ζ ρ (t) = W 2 ρ (p t ,p t ) and define for any integer k ≥ 1, Since h k is C 1 and non-decreasing, we get from (3.20) and Hölder's inequality Since (h ′ k ) k≥1 is a non-decreasing sequence of functions that converges to x → 2 ρ x 2 ρ −1 as k → ∞, we get by the monotone convergence theorem and (3.14) Let us focus for example on the diffusion term. First, Now, we use Jensen's inequality together with the boundedness of b and the boundedness and Lipschitz property of By the boundedness of σ and b, one easily checks that With Lemma 5.5 and the spatial Lipschitz continuity of σ, we deduce that As a similar bound holds for the drift contribution, we finally get: and we obtain Theorem 2.1 by Gronwall's lemma.
Remark 3.2 In case γ = 1, choosing β > 1 and replacing the uniform time-grid by the grid t i = ( i N ) β T 0≤i≤N refined near the origin, one may take advantage of (3.24) which is still valid with the last discretization time τ t before t now equal to ⌊N Adapting the above argument based on Gronwall's lemma, one obtains the statement in Remark 2.2. Indeed, one has Expanding the term between square brackets in powers of 1/k, one easily checks that this term behaves like O(k −3 ). Now One concludes that We can conclude that (2.1) still holds with a constant C depending on x 0 by using that the moments of the Euler scheme are uniformly bounded i.e. 1. The ρ-subdifferentials ∂ ρ ψ t (x) and ∂ ρψt (x) are single valued.
2. The optimal transport and the densities p t andp t are smooth enough to get the time derivative of the Wasserstein distance (3.11).
4. The functions ψ t andψ t are smooth enough and the integration by parts leading to (3.16) are valid.
Let us now comment how we will manage to prove our main result without using these simplifying hypotheses. The first one was mainly used to get that the optimal transport maps are inverse functions (see (3.8) above). Still, the optimal transport theory will give us the existence of optimal transport maps that are inverse functions of each other.
The second point is more crucial and is related to the third. Let us assume that there are Borel vector and the so-called transport equations hold in the sense of distributions. This means that for any C ∞ function ϕ with compact support on (0, T ) × R d , .∇ϕ(t, x)) p t (x)dxdt = 0, and the same forp t . Then, it can be deduced from Ambrosio, Gigli and Savaré [2] that t → W ρ ρ (p t ,p t ) is absolutely continuous and such that dt a.e., For the details, see the second paragraph called "The time derivative of the Wasserstein distance" in Subsection 4.3.1.
Thus, it would be sufficient to show that the Fokker-Planck equations may be reformulated as the transport equations (4.2). Concerning p t , for the integrability condition (4.1) to be satisfied by the natural deduced from (3.12), one typically needs For ρ = 2, one may generalize the argument given by Bolley et al. p.2438 [3] in the particular case σ = I d . Using (3.12) and an integration by parts for the last equality, one obtains formally to deduce with the uniform ellipticity condition and the positivity of the relative entropy When a ∈ C 0,2 b (M d (R)) and b ∈ C 0,1 b (R d ) with spatial derivatives of respective orders 2 and 1 globally Hölder continuous in space, the Gaussian bounds for p t and ∇ x p t deduced from Theorems 4.5 and 4.7 in [9], ensure that the estimation (4.4) should hold for ρ = 2 as soon as the time integral is restricted to the interval [t 0 , T ] with t 0 > 0. To our knowledge, even with such a restriction of the time-interval, (4.4) is not available in the literature for ρ > 2.
In fact, we are going to replace the diffusion by another Euler schemeX with time step T /M and estimate the Wasserstein distance between the marginal laws of the two Euler schemes. We take advantage of the local Gaussian properties of the Euler scheme on each time-step to check that (4.4) holds when p t is replaced byp t and to get rid of the boundary terms when performing spatial integration by parts. Finally, we obtain an estimation of the Wasserstein distance between the marginal laws of the diffusion and the Euler scheme by letting M → ∞. Note that we need less spatial regularity on the coefficients σ and b than in Theorem 2.2 in [1] which directly estimates W ρ (p t ,p t ) in dimension d = 1 by using the optimal coupling given by the inverse transform sampling.  In what follows, we denote the probability density ofX t for t ∈ (0, T ] byp t and also set W ρ (p t ,p t ) = W ρ (L(X t ), L(X t ))) even for t = 0 when there is no density.
From the strong error estimate given by Kanagawa [12] in the Lipschitz case and Proposition 14 of Faure [7] for coefficients Hölder continuous in time (see also Theorem 4.1 in Yan [19]), we 0, and then deduce Theorem 2.1 from (4.5). Note that since the Wasserstein distance is lower semicontinuous with respect to the narrow convergence, the convergence in law ofX t towards X t would be enough to obtain the same conclusion.
Concerning the fourth simplifying hypothesis introduced at the beginning of Subsection 4.1, we see that the equation (4.3) given by the results of Ambrosio Gigli and Savaré already gives "for free" the first of the two spatial integrations by parts needed to deduce (3.16) from (3.11). We will not be able to prove the second integration by parts on the diffusion terms as in (3.16), but the regularity of the optimal transport maps is sufficient to get an inequality instead of the equality in (3.16) and to go on with the calculations.
The proof is structured as follows. First, we state the optimal transport results between the two Euler schemesX andX. Then, we show the Fokker-Planck equation for the Euler scheme and deduce an explicit expression for d dt W ρ (p t ,p t ). Next, we show how we can perform the integration by parts. Last, we put the pieces together and conclude the proof.

The optimal transport for the Wasserstein distance W ρ (p t ,p t )
From (1.2) and since σ does not vanish, it is clear that, for t > 0,X t andX t admit positive densitiesp t andp t with respect to the Lebesgue measure. By Theorem 6.2.4 of Ambrosio, Gigli and Savaré [2], for t ∈ (0, T ], there exist measurable optimal transport maps :T t ,T t : R d → R d such thatT t (X t ) andT t (X t ) have respective densities p t andp t and Moreover, the positivity of the densitiesp t andp t , combined with Theorem 3.3.11 and Remark 3.3.14 (b) of Rachev and Rüschendorf [15] ensure that dx a.e.,T t (x) ∈ ∂ ρψt (x) andT t (x) ∈ ∂ ρψt (x), whereψ t andψ t : R d → [−∞, +∞] are two ρ-convex (see (3.1)) functions satisfying the duality equationψ We recall that Let us stress thatT t (x) now denotes the optimal transport from the law ofX t to the law ofX t , while, in Section 3.1, it denoted the optimal transport from the law ofX t to the one of X t . However, there is no possible confusion since we will only work in the remainder of Section 4 with the coupling betweenX t andX t . By the uniqueness in law of the optimal coupling, see e.g. Theorem 6.2.4 of Ambrosio, Gigli and Savaré [2], (X t ,T t (X t )), (T t (X t ),X t ), (T t (X t ),T t (T t (X t ))) and (T t (T t (X t )),T t (X t )) have the same distribution. The equality of the laws of (X t ,T t (X t )) and (T t (T t (X t )),T t (X t )) implies thatp t (y)dy a.e. L(X t |T t (X t ) = y) and L(T t (T t (X t ))|T t (X t ) = y) are both equal to the Dirac mass atT t (y) so thatX t =T t (T t (X t )) a.s. By positivity of the densities and symmetry we deduce that dx a.e., x =T t (T t (x)) =T t (T t (x)).
From Theorem 14.25 of Villani [18] also known as Alexandrov's second differentiability theorem, we deduce that there is a Borel subset A(ψ t ) of R d such that R d \A(ψ t ) has zero Lebesgue measure and for any x ∈ A(ψ t ),ψ t is differentiable at x and there is a symmetric matrix ∇ 2 Aψ t (x) ∈ M d (R) called the Hessian ofψ t such that Besides, according to Dudley [6] p.167, ∇ 2 Aψ t (x)dx coincides with the absolutely continuous part of the distributional Hessian ofψ t , and, by [6], the singular part is positive semidefinite in the following sense : for any C ∞ function φ with compact support on R d with values in the subset of M d (R) consisting in symmetric positive semidefinite matrices, From (4.12), we can write the second order optimality condition for the minimization of y → |x − y| ρ +ψ t (y) and get that i.e. it is a positive semidefinite matrix. By Lemma 5.1, (4.14) We deduce that dx a.e., ∇ 2 (4.15) and similarly, dx a.e., ∇ 2 Remark 4.2 One may wonder whether the optimal transport mapsT t (x) andT t (x) satisfy additional regularity properties allowing to proceed as in the heuristic proof, for example to obtain the optimality conditions (3.9) and (3.10). We were not able to prove rigorously those conditions. In particular, the assumptions (C) and (STwist) made in Chapter 12 [18] to get smoothness results are not satisfied by our cost function c(x, y) = |x− y| ρ for ρ > 2. Fortunately, the regularity and optimality properties of the optimal transport maps that we have stated from the beginning of Section 4.2 will be enough to complete the proof of Theorem 2.1.
We set The rest of Section 4 will consist in proving the following result. and assume uniform ellipticity : there exists a positive constant a such that a(t, x) − aI d is positive semidefinite for any (t, x) ∈ [0, T ] × R d . Then, t → W ρ ρ (p t ,p t ) is absolutely continuous and such that dt a.e., where the finite constant C does not depend on t ∈ [0, T ], x 0 ∈ R d and N, M ≥ 1.
With this result, we can repeat the arguments of Subsection 3.4, and obtain Proposition 4.1 and thus Theorem 2.1.

Proof of Proposition 4.3
The proof is based on the second of the two next propositions which estimates the time-derivative of the Wasserstein distance under gradually stronger assumptions on the coefficients a and b.
Proposition 4. 4 We assume ellipticity : a(t, x) is positive definite for any t ∈ (0, T ], x ∈ R d . We also suppose that ∃K ∈ [0, +∞), ∀x ∈ R d , sup t∈[0,T ] |σ(t, x)| + |b(t, x)| ≤ K(1 + |x|). Then t → W ρ ρ (p t ,p t ) is absolutely continuous and such that dt a.e., Remark 4.6 Notice that these two propositions still hold with whenX t is the Euler scheme with step T /M for the stochastic differential equation the bounds on the first derivatives of a and b andT t #p t =p t .
The proofs of Propositions 4.4 and 4.5 are given in the two next sections.

Proof of Proposition 4.4
The proof of Proposition 4.4 is split into the next three paragraphs. We first make explicit the time evolution of the probability density of the Euler scheme. This enables us to apply the results of Ambrosio, Gigli and Savaré and get a formula for $\frac{d}{dt} W_\rho^\rho(\bar p_t, \tilde p_t)$ in (4.24). Last, we show that we have the desired inequality by a spatial integration by parts. Of course, we work under the assumptions of Proposition 4.4 in these three paragraphs.
The Fokker-Planck equation for the Euler scheme. We focus on the Euler schemeX and use the notations given in the introduction.
Proof . Let ϕ be a C ∞ function with compact support on (0, T ) × R d . From (1.3), we apply Ito's formula to ϕ(t,X t ) between 0 and T and then take the expectation to get from the tower property of the conditional expectation. This then leads to: By performing one integration by parts with respect to x, we get that holds in the sense of distributions in (0, T ) × R d . It remains to check that From the assumption on b and σ, the Euler scheme has bounded moments, and therefore We can then focus on the second term in (4.20). We notice that for t ∈ (t k , t k+1 ), we have by Jensen's inequality and usingp t ( and max z≥0 z ρ/2 e −αz = ρ 2αe ρ/2 for α > 0, we get whereλ(a) denotes the largest eigenvalue of the matrix a. Therefore, since by assumptionλ(a(t, x)) ≤ K(1 + |x|) 2 for some K < +∞, and we deduce that √ N T and the boundedness of the moments of the Euler scheme, we The time derivative of the Wasserstein distance. To compute d dt W ρ ρ (p t ,p t ), we are going to adapt to the differentiation of the Wasserstein distance between two absolutely continuous curves the proof of Theorem 8.4.7 of Ambrosio, Gigli and Savaré [2] where one of these curves is constant. We also need to introducẽ whereτ t is defined in (4.17) andμτ t (dy) denotes the law ofXτ t . Note that the conclusion of Lemma 4.7 is also valid with (p t ,v t ) replaced by (p t ,ṽ t ). By the last statement in Theorem 8.3.1 [2], t →p t (x)dx and t →p t (x)dx are absolutely continuous curves in the set of probability measures on R d with bounded moment of order ρ endowed with W ρ as a metric. By the triangle inequality, one deduces that t → W ρ (p t ,p t ) is an absolutely continuous function, which, with the continuous differentiability of w → |w| ρ on R, ensures the absolute continuity of t → W ρ ρ (p t ,p t ). By the first statement in Theorem 8.3.1 and Proposition 8.4.6 [2], there exist Borel and dt a.e. on (0, T ), where i(x) = x denotes the identity function on R d . 
Note that these vector fields characterized (up to dt a.e. equality) by (4.23) together with dt a.e.
are called in Proposition 8.4.5 [2] the tangent vectors to the absolutely continuous curves t → Using (4.11), plugging the expressions ofv t andṽ t then (4.10) andT t #p t =p t , we get that, dt a.e., τt,t (y, x)dxμ τt (dy) The integration by parts inequality. The aim of this paragraph is to prove the following inequality To do so, we introduce cutoff functions to use the inequality (4.13). We recall that B(r) denotes the closed ball in R d centered in 0 with radius r > 0. For ℓ ≥ 1, we consider a C ∞ function ϕ ℓ : R d → [0, 1] such that: One has From (4.11) and (4.6), we have . By (4.22) and Hölder's inequality, we deduce that We also have Using the dominated convergence theorem, we obtain On the other hand we use the inequality (4.13) to get for any y ∈ R d , and thus where we used the definition ofā for the equality. Using this definition again, we get With (4.15), we deduce that Tr(∇ 2 Aψ t (x)ā(t, x))p t (x) is the sum of a non-negative and an integrable function. Using Fatou's Lemma for the contribution of the non-negative function and Lebesgue's theorem for the contribution of the integrable function in (4.26), we finally obtain (4.25). By symmetry, we have UsingT t #p t =p t in the right-hand-side of (4.25) leads to Plugging the two last inequalities in (4.24) gives Proposition 4.4.
Using (4.11), we get Plugging the above identities in (4.28), we obtain We set M (x) = 1 ρ |x −T t (x)| 2−ρ ∇ 2 Aψ t (x) + A(x) for x ∈ E such that the right-hand-side makes sense. By (4.15), Lemma 5.1 and (4.10), M (x) is a positive semidefinite matrix dx a.e. on E. Moreover, Using this equality in the right hand side of (4.29), we get Therefore dx a.e. on E, every element of R d in the kernel of the matrix M (x) belongs to the kernel of the invertible matrix A(x) so that M (x) is invertible. We finally have Plugging this equality in (4.18), we obtain that When ρ > 2 and x ∈ E, we have from (4.10), (4.15), (4.16) and Lemma 5.1 that ∇ 2 Aψ t (x) and ∇ 2 Aψ t (T t (x)) are positive semidefinite dx a.e. on R d \ E and therefore Therefore the third term in the right-hand-side of (4.30) is nonpositive. Using Lemma 5.2 for the second term, we conclude that (4.19) holds by remarking that the definition of E ensures that

Technical Lemmas

Transport of negligible sets

Lemma 5.1
LetT (x) andT (x) be measurable optimal transport maps for W ρ with ρ ≥ 2 between two probability measures with positive densitiesp andp with respect to the Lebesgue measure on R d :p =T #p andp =T #p. For any Borel subset A of R d such that R d \ A has zero Lebesgue measure, dx a.e.T (x) ∈ A andT (x) ∈ A.
Proof. SinceT #p =p and R d \ A has zero Lebesgue measure, By positivity ofp, one concludes that dx a.e.T (x) ∈ A.

A key Lemma on pseudo-distances between matrices
The next Lemma holds as soon as ρ > 1 and not only under the assumption ρ ≥ 2 made from Section 3.1 on.

Lemma 5.2
For $v \in \mathbb{R}^d$ such that $|v| = 1$, let $A$ denote the positive definite matrix $I_d + (\rho - 2)vv^*$. Let $M, a_1, a_2 \in \mathcal{M}_d(\mathbb{R})$ be positive definite symmetric matrices. Then for any $a > 0$ such that $a_i - aI_d$ is positive semidefinite for $i \in \{1, 2\}$, one has
Notice that the left-hand side of the inequality is linear in $a_1$ and $a_2$, whereas thanks to the positivity of $a$ we obtain the quadratic factor $\mathrm{Tr}((a_1 - a_2)^2)$ in the right-hand side.
the quantity to be estimated. We have, using the cyclicity of the trace for the third equality below, Since for all $\lambda \in \mathbb{R}$, On the one hand, by the Cauchy-Schwarz and Young inequalities, for symmetric matrices $S_1, S_2$, which implies that On the other hand, we recall that $\mathrm{Tr}(S_1 S_2) \ge c\,\mathrm{Tr}(S_1)$ when $S_1, S_2$ are symmetric positive semidefinite matrices such that $S_2 - cI_d$ is positive semidefinite. Since the smallest eigenvalue of $A$ is $1 \wedge (\rho - 1)$, $A^{1/2} a_1 A^{1/2} - a(1 \wedge (\rho - 1))I_d$ is positive semidefinite and we get and similarly $\le 0$, we finally get that: We have used for the last inequality the cyclicity of the trace and $\mathrm{Tr}(AS) \le (1 \vee (\rho - 1))\,\mathrm{Tr}(S)$ for any positive semidefinite matrix $S$, since the largest eigenvalue of $A$ is $1 \vee (\rho - 1)$.
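The spectral facts used in this proof can be checked numerically. The following is a minimal sketch in dimension $d = 3$, not part of the original argument: the vectors `v`, `w`, `z`, the matrices `S1`, `S2` and the values of `rho` and `c` are arbitrary illustrative choices. It verifies that $A = I_d + (\rho - 2)vv^*$ has eigenvalue $\rho - 1$ along $v$ and eigenvalue $1$ on the orthogonal complement of $v$, and it tests the two trace inequalities recalled above on one example.

```python
import math

def mat_vec(m, x):
    """Matrix-vector product for a square matrix stored as nested lists."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]

def mat_mul(m1, m2):
    """Product of two square matrices stored as nested lists."""
    n = len(m1)
    return [[sum(m1[i][k] * m2[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def build_A(v, rho):
    """A = I_d + (rho - 2) v v^* for a unit vector v."""
    d = len(v)
    return [[(1.0 if i == j else 0.0) + (rho - 2.0) * v[i] * v[j]
             for j in range(d)] for i in range(d)]

# Illustrative data (assumptions, not taken from the paper).
rho = 3.5
v = [2.0 / 3.0, -1.0 / 3.0, 2.0 / 3.0]               # unit vector
w = [1.0 / math.sqrt(2.0), 0.0, -1.0 / math.sqrt(2.0)]  # unit vector orthogonal to v

A = build_A(v, rho)
Av = mat_vec(A, v)  # should equal (rho - 1) * v
Aw = mat_vec(A, w)  # should equal w

# S1 is symmetric positive semidefinite; S2 = c I + z z^* so that S2 - c I is too.
S1 = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
c = 0.7
z = [1.0, 2.0, -1.0]
S2 = [[(c if i == j else 0.0) + z[i] * z[j] for j in range(3)] for i in range(3)]

# Tr(S1 S2) >= c Tr(S1): the gap below should be non-negative.
lower_gap = trace(mat_mul(S1, S2)) - c * trace(S1)
# Tr(A S1) <= (1 v (rho - 1)) Tr(S1): the gap below should be non-negative.
upper_gap = max(1.0, rho - 1.0) * trace(S1) - trace(mat_mul(A, S1))

print("A v - (rho-1) v:", [Av[i] - (rho - 1.0) * v[i] for i in range(3)])
print("A w - w:", [Aw[i] - w[i] for i in range(3)])
print("lower_gap:", lower_gap, "upper_gap:", upper_gap)
```

Such a check on one example is of course no substitute for the proof; it only confirms that the eigenvalues $1 \wedge (\rho - 1)$ and $1 \vee (\rho - 1)$ enter the trace bounds in the stated direction.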

Remark 5.3
1. In dimension $d = 1$, the only eigenvalue of $A$ is $\rho - 1$, and we get the slightly better bound
2. Inequality (5.1) still holds with $\mathrm{Tr}((a_1 - a_2)^2)$ replaced by $\mathrm{Tr}((a_1 - a_2)(a_1 - a_2)^*)$ in the right-hand side for all $a_1, a_2 \in \mathcal{M}_d(\mathbb{R})$ such that $a_1 + a_1^* - 2aI_d$ and $a_2 + a_2^* - 2aI_d$ are positive semidefinite.
3. Since the second and third terms in the right-hand-side of (5.2) are non-positive, applying the Cauchy-Schwarz inequality to the first term, one obtains that $\forall a_1, a_2 \in \mathcal{M}_d(\mathbb{R})$,
Proof. We do the proof forψ t and follow the arguments of Figalli and Gigli [8]. Let r ∈ (0, +∞). We consider the set Let us check that the existence of a finite constant K r,ρ depending on r and ρ such that sup y∈A min x∈B(r) |x − y| ≤ K r,ρ ensures that the conclusion holds. We have A ⊂ B(K ′ r,ρ ) with K ′ r,ρ = K r,ρ + r. This gives that We also remark that for a constant C r large enough, x → −|x − y| ρ + C r (|x| 2 + |x| ρ ) is convex for any y ∈ B(K ′ r,ρ ). In fact, the Hessian matrix is positive semidefinite for C r large enough since for any y ∈ B(K ′ r,ρ ) and . Thus, for x ∈ B(r),ψ t (x) + C r (|x| 2 + |x| ρ ) is convex as it is the supremum of convex functions.
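The convexity claim at the end of this proof can be illustrated in dimension one. The sketch below is only a numerical sanity check under stated assumptions: the constant returned by `semiconvexity_constant` is an explicit choice made here (it is not the constant $C_r$ of the paper), derived from the elementary bound $|x - y|^{\rho-2} \le c_\rho(|x|^{\rho-2} + |y|^{\rho-2})$ with $c_\rho = 1 \vee 2^{\rho-3}$ for $\rho \ge 2$. Convexity is tested through centered second differences on a grid.

```python
def semiconvexity_constant(rho, radius_y):
    """Illustrative constant C (an assumption of this sketch) such that
    x -> -|x - y|**rho + C * (x**2 + |x|**rho) is convex on the real line
    for every |y| <= radius_y, when rho >= 2."""
    c = max(1.0, 2.0 ** (rho - 3.0))  # |x-y|^(rho-2) <= c (|x|^(rho-2) + |y|^(rho-2))
    return max(c, 0.5 * rho * (rho - 1.0) * c * radius_y ** (rho - 2.0))

def f(x, y, rho, C):
    """The function whose convexity in x is claimed, in dimension one."""
    return -abs(x - y) ** rho + C * (x * x + abs(x) ** rho)

def min_second_difference(y, rho, C, lo, hi, n, h):
    """Smallest centered second difference of x -> f(x, y) on a uniform grid.
    For a convex function every such difference is non-negative."""
    worst = float("inf")
    for k in range(n + 1):
        x = lo + (hi - lo) * k / n
        diff = f(x + h, y, rho, C) - 2.0 * f(x, y, rho, C) + f(x - h, y, rho, C)
        worst = min(worst, diff)
    return worst

# Illustrative parameters: x ranges over [-3, 3] and y over [-2, 2].
rho, radius_y, radius_x = 3.0, 2.0, 3.0
C = semiconvexity_constant(rho, radius_y)
ys = (-2.0, -1.0, 0.0, 0.5, 2.0)

worst_with_correction = min(
    min_second_difference(y, rho, C, -radius_x, radius_x, 600, 1e-3) for y in ys)
worst_without_correction = min(
    min_second_difference(y, rho, 0.0, -radius_x, radius_x, 600, 1e-3) for y in ys)

print("C =", C)
print("min second difference with correction:", worst_with_correction)
print("min second difference without correction:", worst_without_correction)
```

Without the correction term the second differences are negative, since $x \mapsto -|x - y|^\rho$ is concave. With the correction they are non-negative up to rounding, which is the one-dimensional analogue of the positive semidefiniteness of the Hessian invoked in the proof.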

Estimations using Malliavin calculus
Lemma 5.5 Under the assumptions of Theorem 2.1, we have for all $\rho \ge 1$:
Proof of Lemma 5.5. By Jensen's inequality, Let us now check that the left-hand-side is also smaller than . To do this, we will study where $g : \mathbb{R}^d \to \mathbb{R}^d$ is any smooth function.
In order to continue, we need various estimates on the Euler scheme, its limit and their Malliavin derivatives, which we denote by $D^i_u \bar X^j_t$ and $D^i_u X^j_t$. Let $\eta_t = \min\{t_i ; t \le t_i\}$ denote the discretization time just after $t$. We have $D^i_u \bar X^j_t = 0$ for $u > t$, $i, j = 1, \dots, d$ and for $u \le t$, Let us define $D\bar X := (D^i \bar X^j)_{ij}$. Then by induction, one clearly obtains that for $u \le t$, Here $\nabla b := (\partial_{x_k} b^j)_{kj}$, $\sigma' = (\partial_{x_k} \sigma^{j\cdot})_{kj}$ and $\prod_{i=1}^n A_i := A_1 \cdots A_n$. Therefore the above product between $\sigma'$ and the increment of $W$ is to be interpreted as the inner product between vectors once $k$ and $j$ are fixed.
Note that $\bar E$ satisfies the following properties: 1. $\bar E_{u,t} = \bar E_{\eta(u),t}$ and 2. $\bar E_{t_i,t_j} \bar E_{t_j,t} = \bar E_{t_i,t}$ for $t_i \le t_j \le t$.
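A scalar toy computation may help to visualise these properties. The sketch below rests on several assumptions that are not in the text: dimension one, time-homogeneous coefficients `b` and `sigma` chosen arbitrarily, and a matrix $\bar E$ that, between two grid times, is taken to be the product of the one-step factors $1 + b'(\bar X_{t_k})h + \sigma'(\bar X_{t_k})\Delta W_k$, as the product notation above suggests. Under these assumptions it checks the multiplicative property 2 at grid times, and it checks that $\bar E_{0,t_n}$ coincides with the derivative of the Euler scheme with respect to its initial value along a fixed sequence of Brownian increments.

```python
import math
import random

# Hypothetical smooth coefficients with bounded derivatives (illustration only).
def b(x): return math.sin(x)
def db(x): return math.cos(x)
def sigma(x): return 1.0 + 0.5 * math.cos(x)
def dsigma(x): return -0.5 * math.sin(x)

def euler_path(x0, h, dw):
    """Euler scheme at the grid times for given Brownian increments dw."""
    xs = [x0]
    for inc in dw:
        x = xs[-1]
        xs.append(x + b(x) * h + sigma(x) * inc)
    return xs

def e_bar(xs, h, dw, i, j):
    """Product of the one-step factors between grid indices i <= j
    (assumed scalar form of the matrix called E-bar in the text)."""
    prod = 1.0
    for k in range(i, j):
        prod *= 1.0 + db(xs[k]) * h + dsigma(xs[k]) * dw[k]
    return prod

random.seed(0)
n_steps, T, x0 = 50, 1.0, 0.3
h = T / n_steps
dw = [random.gauss(0.0, math.sqrt(h)) for _ in range(n_steps)]
xs = euler_path(x0, h, dw)

# Property 2 at grid times: E-bar(t_5, t_20) * E-bar(t_20, t_50) = E-bar(t_5, t_50).
semigroup_gap = abs(e_bar(xs, h, dw, 5, 20) * e_bar(xs, h, dw, 20, 50)
                    - e_bar(xs, h, dw, 5, 50))

# E-bar(0, T) versus a centered finite difference of the scheme in its initial value.
eps = 1e-6
finite_difference = (euler_path(x0 + eps, h, dw)[-1]
                     - euler_path(x0 - eps, h, dw)[-1]) / (2.0 * eps)
derivative_gap = abs(finite_difference - e_bar(xs, h, dw, 0, n_steps))

print("semigroup gap:", semigroup_gap)
print("derivative gap:", derivative_gap)
```

The second check reflects the fact that differentiating one Euler step in the current state produces exactly one factor of the product, so that the discrete first variation is obtained by the chain rule.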
We also introduce the process $E$ as the $d \times d$-matrix solution to the linear stochastic differential equation The next lemma, whose proof is postponed to the end of the present proof, states some useful properties of the processes $E$ and $\bar E$. From now on, for $A \in \mathcal{M}_d(\mathbb{R})$, $|A| = \sqrt{\mathrm{Tr}(A^* A)}$ denotes its Frobenius norm.
Lemma 5.6 Let us assume that $b, \sigma \in C^2_b$. Then, we have: where $C$ is a positive constant depending only on $\rho$ and $T$.
Proof of Lemma 5.6. The finiteness of $\sup_{0 \le s \le t \le T} \mathbb{E}[|E_{s,t}|^\rho] + \sup_{0 \le s \le t \le T} \mathbb{E}[|\bar E_{s,t}|^\rho]$ is obvious since $\nabla b$ and $\sigma'$ are bounded. The upper bound for $\sup_{0 \le s \le t \le T} \mathbb{E}[|E^{-1}_{s,t}|^\rho]$ is obtained using the same method of proof as in Theorem 48, Section V.9, p.320 in [14], together with Gronwall's lemma.
The estimate (5.6) on $D_u E$ is given, for example, by Theorem 2.2.1 in [13] for time independent coefficients. The same method of proof works for our case. In fact, let us remark that $E$ satisfies (5.4) and that $\bar E$ satisfies $\bar E_{\eta_u,t} = I + \int_{\eta_u}^t \bar E_{\eta_u,\tau_s} \sigma'(\tau_s, \bar X_{\tau_s})\,dW_s + \int_{\eta_u}^t \bar E_{\eta_u,\tau_s} \nabla b(\tau_s, \bar X_{\tau_s})\,ds$.
Furthermore, (5.7) can be easily obtained by noticing that $(\bar X_t, \bar E_{0,t})$ is the Euler scheme for the SDE satisfied by $(X_t, E_{0,t})$, which has coefficients Lipschitz continuous in space and $\gamma$-Hölder continuous in time, and by using the strong convergence order of $\frac{1}{2} \wedge \gamma$ (see e.g. Proposition 14 [7]).