Total variation distance between two diffusions in small time with unbounded drift: application to the Euler-Maruyama scheme

We give bounds for the total variation distance between the solutions to two stochastic differential equations starting at the same point and with close coefficients, which applies in particular to the distance between an exact solution and its Euler-Maruyama scheme in small time. We show that for small $t$, the total variation distance is of order $t^{r/(2r+1)}$ if the noise coefficient $\sigma$ of the SDE is elliptic and $\mathcal{C}^{2r}_b$, $r\in \mathbb{N}$ and if the drift is $C^1$ with bounded derivatives, using multi-step Richardson-Romberg extrapolation. We do not require the drift to be bounded. Then we prove with a counterexample that we cannot achieve a bound better than $t^{1/2}$ in general.


Introduction
The convergence properties of Euler-Maruyama schemes to approximate the solution of a Stochastic Differential Equation (SDE) have been extensively studied, in particular for L p distances. However, the literature seems to lack some results about the convergence in total variation in small time. More specifically, in this paper we consider the two following SDEs in R d starting at the same point: where W is a Brownian motion. We generally assume that for i = 1, 2, b i is Lipschitz continuous and that σ i is elliptic, bounded and Lipschitz continuous, but we do not assume that b i is bounded. Our objective is to give bounds of the total variation distance between the law of X x t and the law of Y x t , denoted d TV (X x t , Y x t ), as t → 0. In particular, we apply our results to the case where Y x =X x is the one-step Euler-Maruyama scheme associated to the SDE X, given by Such bounds are well known for L p distances and their associated Wasserstein distances and are known to be of order t as t → 0. Yet the literature seems to lack results as it comes to d TV . If σ 1 = σ 2 is constant, then it is classical background that d TV (X x t , Y x t ) is of order t, using a Girsanov change of measure (see for example [PP20,Proposition 4.1]) but this strategy cannot be applied to non-constant σ. The difficulty of the total variation distance in small time is the following: considering its representation formula and comparing it with the L 1 -Wasserstein distance, if x and y ∈ R d are close to each other and if f : R d → R is Lipschitz continuous, then we can bound |f (x)−f (y)| by [f ] Lip |x−y|; whereas if f is simply measurable and bounded, then we cannot directly bound |f (x) − f (y)| in terms of |x − y|. Moreover the regularizing properties of the semi-group cannot be used in small time for the total variation distance. Results in the literature focus on the Euler-Maruyama scheme. In [BT96] is proved the convergence for a fixed time horizon T > 0 and as N → ∞, where N is the number of steps in the Euler-Maruyama scheme on a finite horizon. More precisely, if σ 1 is elliptic and if b 1 and σ 1 are C ∞ with bounded derivatives (but b 1 and σ 1 are not supposed bounded themselves), then ([BT96, Theorem 3.1]) whereX x,N T stands for the Euler scheme with N steps, where Q and q are positive exponents and where K is a non-decreasing function depending on b 1 and σ 1 . The common strategy of proof for such bounds is to use Malliavin calculus in order to perform an integration by parts and to use bounds on the derivatives of the density. However, we cannot infer a bound as T → 0 since we do not know whether K(T )/T q → 0 as T → 0 in general. In [GL08] are given bounds in small time and as N → ∞. Assuming that σ 1 is uniformly elliptic and that b 1 and σ 1 are bounded with bounded derivatives up to order 3, then ([GL08, Theorem 2.3]) where p andp N denote the transition densities of X x andX x,N respectively and where C is a positive constant depending on d and on the bounds on b 1 and σ 1 and on their derivatives. However, we cannot directly use this result for the total variation distance: taking N = 1 yields giving a bound in t −1/2 which does not converge to 0 as t → 0. Moreover, [GL08] assumes that b 1 is bounded. [BJ22] focuses on the case where b 1 is bounded and measurable but not necessarily regular and where σ 1 is constant; it proves that the convergence in total variation of the Euler scheme on a finite horizon which is regularized with respect to the irregular drift b 1 and with step h, is of order √ h. In the present paper, we first prove a convergence rate of order t 1/3 for d TV (X x t , Y x t ), provided that for i = 1, 2, σ i is elliptic, σ i and b i are Lipschitz-continuous with respect to their time variable and that σ i is C 2 b and b i is C 1 and Lipschitz-continuous with respect to their space variable. More generally, if we furthermore assume that σ is C 2r b , then we obtain a convergence rate of order t r/(2r+1) . Letting r → ∞, we also prove that if σ ∈ C ∞ b with some technical condition on the derivatives of the densities of the random variables X x t and Y x t , then the convergence rate is of order t 1/2 exp(C log(1/t)) which is "almost" t 1/2 . Moreover, we provide an example using the geometric Brownian motion where the convergence rate is exactly t 1/2 , thus showing that we cannot achieve better bounds in general. To prove the bound in t r/(2r+1) , we use a multi-step Richardson-Romberg extrapolation [RG11] [LP17], which is a method imported from numerical analysis that we use in our case for theoretical purposes. It relies on a Taylor expansion with null coefficients up to some high order. Such method can be used in more general settings with regularization arguments in order to improve the convergence rates (in our case, we improve t 1/3 into t r/(2r+1) ).
Interestingly, the difference between the drift coefficients b 1 − b 2 does not need to be small for our bounds to be valid. This is because the dominant term in d TV (X x t , Y x t ) comes from the the diffusion part.
Recent results (see [Cle21]) establish a convergence in small time at rate t 1/2 for the Euler scheme of certain classes of diffusions driven by stable Lévy processes, not directly including the Brownian case. This approach relies on Malliavin calculus techniques. In this work the "standard" drift is replaced in the Euler scheme by the flow of the associated (noiseless) ODE. This seems to be specific to Lévy driven SDEs. Adapting this approach to our general continuous framework is not as straightforward as could be expected and would deserve further investigations for future work.
The total variation distance is closely related to the estimation of the density of the solution to an SDE and this density satisfies a Fokker-Planck Partial Differential Equation PDE (3.2). If the drift is bounded, then the density and its partial derivatives admit sub-gaussian Aronson's bounds (see [Fri64] and Section 3.1). However, giving estimates and bounds for the solution of the PDE in the case of unbounded drift appears to be more difficult, see [Lun97], [Cer00], [BL05]. Recent improvements have been made in [MPZ21] using the parametrix method. Studying this case is useful to study the convergence in total variation of SDE's with unbounded drift, in particular for the Langevin equation, very popular in stochastic optimization, and which reads where in many cases, V : R d → R has quadratic growth and ∇V has linear growth (see for example [BP21]).
In order to deal with unbounded b i , we propose two different methods. First, we use a localization argument and "cut" the drift b i intob i outside a compact set, so that we can use bounds from [Fri64] for the bounded drift case. We use the Girsanov formula to explicit the dependence of these bounds in b i ∞ . A second method consists in using the density estimates from [MPZ21, Section 4] to improve the dependency with respect to the bounds in x. However this second approach relies on advanced parametrix methods which require further regularity assumptions on the coefficients of the SDE and which are not fully detailed for higher order derivatives. Our first method is clearly much more elementary, starting from a quite general bound established for any pair of integrable random vectors (see Theorem 2.7) and calling upon a standard regularization strategy which combined with a multistep procedure, seems to be at least quasi-optimal in a very general framework.

Notations
We endow the space R d with the canonical Euclidean norm denoted by | · |. For x ∈ R d and for R > 0, we denote B(x, R) = {y ∈ R d : |y − x| ≤ R}.
We denote M d (R) the set of d × d matrices with real coefficients. For M ∈ (R d ) ⊗k , we denote by M its operator norm, i.e. M = sup u∈R d×k , |u|=1 M · u. If M : For k ∈ N and if f : x) is C k with respect to x, we still denote by ∇ k f its differential with respect to x.
We denote the total variation distance between two distributions π 1 and π 2 on R d : Without ambiguity, if Z 1 and Z 2 are two R d -valued random vectors, we also write d TV (Z 1 , Z 2 ) to denote the total variation distance between the law of Z 1 and the law of Z 2 . We have as well Moreover, we recall that if π 1 and π 2 admit densities with respect to some measure λ, then We denote by W 1 the L 1 -Wasserstein distance. For x ∈ R d , we denote by δ x the Dirac mass at x. If Z is a Markov process with values in R d , we denote, when it exists, its transition probability from x to y ∈ R d between times s < t, p Z (s, t, x, y).
In this paper, we use the notation C and c to denote positive constants, which may change from line to line.

Main results
We consider the two following SDEs in R d : To allievate notations, we also define x) ∈ R q and r ∈ N, let us define the following assumptions: • Lip t (g): g is Lipschitz continuous with respect to t, uniformly in x.
• g ∈ C r : g is differentiable with respect to x with continuous partial derivatives up to the order r.
• g ∈ C r b : g ∈ C r and is bounded with bounded partial derivatives up to the order r. • g ∈ C r b : g ∈ C r and has partial bounded derivatives up to the order r, but we do not assume that g is bounded itself.
• For σ : (2.5) Theorem 2.1. Let X and Y be the solutions of the SDEs (2.1) and (2.2). For i = 1, 2, assume where the positive constants C and c only depend on d, Theorem 2.2. Let X and Y be the solutions of the SDEs (2.1) and (2.2). For i = 1, 2, assume where the positive constants C and c only depend on d, T ,σ 0 , on σ i ∞ , on the bounds on the derivatives of b i and σ i and on their Lipschitz constants. In particular, if Y =X we have Remark 2.3. In Theorems 2.1 and 2.2, we can actually improve the dependency of the constants in x in small time since we have more precisely: Remark 2.4. We can adapt the framework of Theorem 2.2 to the study of SDEs with homogeneous vanishing noise, i.e. where for a > 0, and we identify the dependency in a as a → 0. Namely, with the same assumptions, for a > 0 small enough we have where the constant C does not depend on a. This bound is obtained adapting the proof of Theorem 2.2 in Section 3.4 and using that if Z x is the martingale dZ We can also improve the dependency in the initial condition x ∈ R d using [MPZ21], however at the expense of further regularity assumptions on b i and σ i , i = 1, 2.
Theorem 2.5. Let X and Y be the solutions of the SDEs (2.1) and (2.2). For i = 1, 2, assume and we obtain different convergence rates as t → ∞ and s → 0, depending on (t, s). A noticeable example of this is the Langevin-simulated annealing SDE [BP21]. Furthermore we remark that in Theorems 2.1 and 2.2, the bounds do not depend on ∆b, enhancing that the dominant term in the total variation comes from the diffusion part.
To improve the rate of convergence from t 1/3 in Theorem 2.1 to t r/(2r+1) in Theorem 2.2, we rely on a Richardson-Romberg extrapolation [RG11] [LP17]; this argument can also be applied in a more general framework. The following proposition gives bounds on the total variation between two random vectors, knowing bounds on the L 1 -Wasserstein distance and bounds on the partial derivatives of the densities up to some order.
Theorem 2.7. Let Z 1 and Z 2 be two random vectors in L 1 (R d ) and admitting densities p 1 and p 2 respectively with respect to the Lebesgue measure. Assume furthermore that p 1 and p 2 are C 2r with r ∈ N and that ∇ k p i ∈ L 1 (R d ) for i = 1, 2 and k = 1, . . . , 2r. Then we have where the constant C d,r depends only on d and on r.
If σ ∈ C ∞ b , then we also prove that we can "almost" get a convergence rate of order t 1/2 .
Theorem 2.8. Let X and Y be the solutions of the SDEs (2.1) and (2.2). For i = 1, 2, assume , that σ i is elliptic and that ∆σ(x) = 0. Assume furthermore that if Z and V are the martingales dZ t = σ 1 (t, Z t )dW t and dV t = σ 2 (t, Z t )dW t , then (2.12) (see Theorem 3.1). Then where the positive constants C and c only depend on d, T ,σ 0 , on σ i ∞ , on the bounds on the derivatives of b i and σ i and on their Lipschitz constants.
Remark 2.9. Assumption (2.12) is satisfied in the case of a Brownian motion, which suggests that this assumption is satisfied in general provided that σ is "regular enough". Indeed, if dZ t = σdW t with σ ∈ M d (R) being non degenerate, then with Σ := σσ ⊤ we have for t > 0 and x, y ∈ R d : Moreover for every r ∈ N and u ∈ R d we have where He r is the r th probabilist Hermite polynomial. Following [Kra04] we have using the Stirling formula for the last inequality. On the other hand, using [AS64, 22.14.15] we have ∀u ≥ 0, | He 2r (u)|e −u 2 /4 ≤ 2 r+1 r!.
Remark 2.10. For the Euler-Maruyama scheme (2.3), with a slight abuse of notation, x is used both for the starting point and in the definition of the drift and diffusion coefficients. The transition density should be considered for constant drift and diffusion coefficients in this case. However the results remain valid as the Euler scheme is simply a Brownian process.

Recalls on density estimates for SDEs with bounded drift
We recall results on the bounds for the density of the solution of the SDE using the theory of partial differential equations. Let us consider a generic SDE: Then under regularity assumptions on b Z and on σ Z , the transition probability p Z exists and is solution of the backward Kolmogorov PDE: Moreover, p Z and its derivatives satisfy sub-gaussian Aronson's bounds: Theorem 3.1 ([Fri64], Chapter 9, Theorem 7). Let Z be the solution of (3.1) and let T > 0. Assume Lip t (b Z ) and Lip t (σ Z ), that b Z , σ Z ∈ C r b and that σ Z is elliptic. Then for every m 0 = 0, 1 and for every 0 ≤ m 1 + m 2 ≤ r, ∇ m 0 +m 1 x ∇ m 2 y p Z exists and Theorem 3.2. Let Z be the solution of (3.1) and let T > 0. Assume Lip t (b Z ) and Lip t (σ Z ), that b Z ∈ C r b , σ Z ∈ C r+1 b and that σ Z is elliptic. Then for every 0 ≤ m ≤ r, ∇ m y p Z exists and where the constants C and c only depend on the bounds on b Z and σ Z and on their derivatives and on their Lipschitz constants, on the modulus of ellipticity of σ Z , on d and on T .

Preliminary results
In order to apply the bounds on the densities from Theorem 3.1 to Theorem 2.7, we first "cut" the drifts b 1 and b 2 on a compact set. That is, we instead consider the processes X and Y defined by whereb i , i = 1, 2 is defined as follows. We choose R > 0 and we consider a C ∞ decreasing function ψ : R + → R + such that ψ = 1 on [0, R 2 ] and ψ = 0 on [(R + 1) 2 , ∞) and we defineb x i (t, y) := b i (t, y)ψ(|y − x| 2 ), so thatb x i is bounded: Proof. Let f : R d → R be measurable and bounded. We remark that on the event {sup s∈[0,t] |X x s −x| 2 ≤ R 2 }, we have X x t = X x t , so that But using the inequality |u + v| 2 ≤ 2|u| 2 + 2|v| 2 we have Moreover using Doob's martingale inequality we have Then we define the non-decreasing deterministic process S t := E sup s∈[0,t] |X x s − x| 2 and we get the differential inequality (using t ≤ T ) so the Gronwall lemma yields Using Markov's inequality, we have then We can now apply Theorem 3.1 to X and to Y however the constants arising depend on the bound on b x i ∞ and thus on x. In order to deal with the dependency in b x i ∞ , we apply the Girsanov formula and reduce to the null drift case.
Proposition 3.4. Let Z x be the solution of where X is defined in (3.5) and U x is defined as Proof. First, note that since σ 1 is elliptic and sinceb x 1 , σ 1 ∈ C 1 b , then p X and p Z exist as well as ∇ x p Z (Theorem 3.1). We then use [QZ04, Theorem 2.4] extended to non-homogeneous diffusion processes. Following [QZ04, Remark 2.5], since σ 1 is elliptic and bounded, the assumptions of [QZ04, Theorem 2.4] hold.
We also have the following bounds on the process U x .
Lemma 3.5. With the same assumptions as in Proposition 3.4, for every p ≥ 2, x ∈ R d and t ∈ [0, T ] we have Proof. We recall that for every q ≥ 1, the process (U x ) q is a martingale with

Thus, Doob's martingale inequality yields
Lemma 3.6. With the same assumptions as in Proposition 3.4, we have for every x, y ∈ R d and (3.14) Proof. We use Theorem 3.1 on the process Z x , which yields bounds with constants depending on σ 1 but not onb x 1 . We obtain for every q ≥ 1 and for every s ∈ [0, t]: where we used Lemma A.1 in the appendix. Then for p −1 + q −1 = 1 and p ≥ 2, using the Hölder inequality we have The integral in ds converges under the condition q < d/(d − 1) if d > 1, and for any value of q > 1 if d = 1. Then performing the change of variable s = tu we obtain

Proof of Theorem 2.1
Lemma 3.7. We have for every x ∈ R d and t ∈ [0, T ]: Proof. Let us write We now prove Theorem 2.1.
Proof. Let us introduce an artificial regularization. For ε > 0 and using Lemma 3.7 we have where ζ ∼ N (0, I d ) and is independent of the Brownian motion W .
• Let f : R d → R be measurable and bounded and let us define Then ϕ is C 2 with Moreover, using Theorem 3.1, we have where the constants C and c do not depend onb x 1 . This implies that for every y ∈ R d , Then using the Taylor formula, for every y ∈ R d there existsỹ ∈ (0, y) such that and then for some randomζ ∈ (0, ζ) we have where we used that E[∇ϕ(0) · ζ] = ∇ϕ(0) · E[ζ] = 0. This way we obtain Let ϕ be as defined in (3.16) replacing Z x t by Z 1 . Then, ϕ is differentiable up to the order 2r and for all k = 0, 1, . . . , 2r: Using the Taylor formula up to order 2r, for every y ∈ R d there exitsỹ ∈ (0, y) such that Moreover, we have Then there exists a randomζ ∈ (0, ζ) such that because if k is odd, then E[ζ ⊗k ] = 0. We now rely on a multi-step Richardson-Romberg extrapolation [LP17, Appendix A]. Let us denote the refiners n i = 2 i−1 and the auxiliary sequences and weights Then we have where we used (3.24) in the last equation. Now, using (3.21) we have r i=1β Since u k → u ∞ = ℓ≥1 (1 − 2 −ℓ ) −1 < ∞, the weights satisfy As a consequence and since r i=1 w i = 1, we may write from (3.25) The same way, we obtain On the other side, using (3.18) we have Moreover, for every i = 1, . . . , r, Thus considering (3.20), we obtain for every ε > 0, Optimizing in ε gives and then .
We now prove Theorem 2.2.

Then we have
The same way we have Applying Theorem 2.7 with Lemma A.3 yields

Proof of Theorem 2.5
For the proof of Theorem 2.5, we do not use Lemma 3.7; instead we directly apply Theorem 2.7. Using Theorem 3.2, ∇ k y p X and ∇ k y p Y exist for k = 0, 1, . . . , 2r and satisfy the same bounds as previously. Then using Theorem 2.7 with Lemma A.3 we obtain

Proof of Theorem 2.8
Proof. We use Lemma 3.7 again and rework the bound on d TV (Z x t , V x t ) by paying attention to the dependency of the constants in r in the proof of Theorem 2.7 with Z 1 := Z x t and Z 2 := V x t . Since σ 1 ∈ C 2r b for every r ∈ N, we write (3.22) for any r ∈ N and we have where C 2r and c 2r are defined in (2.12) and where Using (3.26) we get r i=1β and we obtain as in (3.27): On the other hand, considering (3.28) and (3.29) with Lemma A.3 with ∆σ(x) = 0 we have (3.31) We now minimize κ 1 ε r t −r + κ 2 ε −1/2 t in ε, giving and then with as r → ∞: , lim sup where we used Assumption (2.12), so that Then we have d TV (Z x t , V x t ) ≤ C2 r r −1/2 t r/(2r+1) and we choose r(t) = ⌊log 1/2 (1/t)⌋ so that as t → 0,

Counterexample
In this section we give a counter-example showing that we cannot achieve a bound better than t 1/2 in general. More specifically, we show that we cannot achieve a bound better than t 1/2 for the total variation between an SDE and its Euler-Maruyama-scheme in general. For x > 0 and σ > 0, let us consider the one-dimensional process Y x t = xe σWt , (4.1) where W is a standard Brownian motion. The process Y is solution of the SDE dY x t = (σ 2 /2)Y x t dt + σY x t dW t and its associated Euler-Maruyama schemes reads Y x t = x + (σ 2 /2)xt + σxW t ∼ N x(1 + tσ 2 /2), σ 2 x 2 t . (4.2) Proposition 4.1. Let Y be the process defined in (4.1). Then for small enough t we have Proof. We have p Y (t, x, y) = 1 √ 2πσ 2 t exp − 1 2σ 2 t log 2 (y/x) y 1 y≥0 (4.4) so that dy.
But we have as (t, y) → 0: Thus there exists ǫ > 0 and t 0 such that for every t ≤ t 0 : is of order t 1/2 as t → 0.
However, the process Y does not satisfy the assumptions of Theorem 2.2 as its noise coefficient is not elliptic neither bounded on (0, ∞). We then prove the following result.
Proposition 4.2. There exists a diffusion process X on R with C 1 and Lipschitz continuous drift, with C ∞ b and elliptic diffusion coefficient σ and there exists T > 0 and ε ∈ (0, 1) such that ∀t ∈ [0, T ], ∀x ∈ (ε, ε −1 ), d TV (X x t ,X x t ) ≥ C x t 1/2 whereX is the Euler-Maruyama scheme of X and where the positive constant C x depends on x.