Approximation of Markov semigroups in total variation distance

The first goal of this paper is to prove that regularization properties of a Markov semigroup make it possible to establish convergence in total variation distance for approximation schemes of the semigroup. Moreover, using an interpolation argument, we obtain estimates for the error in the sense of distributions (at the level of the densities of the semigroup with respect to the Lebesgue measure). In a second step, we build an abstract Malliavin calculus based on a splitting procedure, which turns out to be the appropriate tool for proving the above-mentioned regularization properties. Finally, we use these results to estimate the error in total variation distance for the Ninomiya-Victoir scheme (an approximation scheme of order 2 for diffusion processes).


Introduction
In this paper we study the total variation distance between two discrete time Markov semigroups and we give applications to the speed of convergence of approximation schemes. In order to do so we use an abstract Malliavin-type calculus based on a splitting procedure, which enables us to prove regularization properties of the semigroup; it turns out that such regularization properties are crucial in order to deal with measurable test functions. Moreover, we take a step further and give estimates for the distance between the density function of the Markov semigroup and the density function of the approximation scheme. At this level we have to use an interpolation argument which has been recently obtained in [8]. Let us be more specific and describe the different steps of our approach. We consider the d-dimensional Markov chain

X^n_{k+1} = ψ_k(X^n_k, Z_{k+1}/√n),    (1)

where ψ_k : R^d × R^N → R^d is a smooth function such that ψ_k(x, 0) = x and Z_k ∈ R^N, k ∈ N, is a sequence of independent random variables. The semigroup of the Markov chain X^n_k is denoted by P^n_k and the transition probabilities are µ^n_k(x, dy) = P(X^n_{k+1}(x) ∈ dy | X^n_k = x). Moreover, we consider a Markov process in continuous time (X_t)_{t≥0} with semigroup P_t and we denote ν^n_k(x, dy) = P(X_{t_{k+1}} ∈ dy | X_{t_k} = x), where t_k = kδ = k/n with δ = 1/n. A first standard result is the following: let us assume that there exist h > 0 and p ∈ N such that for every f ∈ C^p(R^d), k ∈ N and x ∈ R^d,

|µ^n_k(x, f) − ν^n_k(x, f)| ≤ C n^{−(h+1)} ‖f‖_{p,∞},    (2)

where ‖f‖_{p,∞} denotes the supremum norm of f and of its derivatives up to order p. Then, for every T > 0,

sup_{t_k ≤ T} ‖P_{t_k} f − P^n_k f‖_∞ ≤ C n^{−h} ‖f‖_{p,∞}.    (3)

It means that (X^n_k)_{k∈N} is an approximation scheme of weak order h for the Markov process (X_t)_{t≥0}. In the case of the Euler scheme for diffusion processes, this result, with h = 1, was initially proved in the seminal papers of Milstein [26] and of Talay and Tubaro [32] (see also [17]). Similar results were obtained in various situations: diffusion processes with jumps (see [31], [15]) or diffusion processes with boundary
conditions (see [12], [11], [13]). See [16] for an overview of this subject. More recently, approximation schemes of higher order (e.g., h = 2), based on a cubature method, have been introduced and studied by Kusuoka [21], Lyons [25], Ninomiya and Victoir [27], Alfonsi [1], Kohatsu-Higa and Tankov [18].
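The notion of weak order can be checked by hand on a toy example (our illustrative choice, not taken from the paper): for the Ornstein-Uhlenbeck equation dX = −aX dt + σ dW, both the exact second moment and that of the Euler scheme obey closed recursions, so the weak error for f(x) = x² is computable without simulation, and halving the time step roughly halves the error (order h = 1). All parameter values below are arbitrary.

```python
import math

def euler_second_moment(n, a=1.0, sigma=1.0, x0=1.0, T=1.0):
    """E[(X_T^n)^2] for the Euler scheme of dX = -a X dt + sigma dW.

    X_{k+1} = (1 - a*delta) X_k + sigma*sqrt(delta)*Z_{k+1} with E Z = 0, E Z^2 = 1,
    so the second moment obeys a closed deterministic recursion.
    """
    delta = T / n
    m = x0 * x0
    for _ in range(n):
        m = (1.0 - a * delta) ** 2 * m + sigma ** 2 * delta
    return m

def exact_second_moment(a=1.0, sigma=1.0, x0=1.0, T=1.0):
    # X_T ~ N(x0 e^{-aT}, sigma^2 (1 - e^{-2aT}) / (2a))
    return x0 ** 2 * math.exp(-2 * a * T) + sigma ** 2 * (1 - math.exp(-2 * a * T)) / (2 * a)

def err(n):
    return abs(euler_second_moment(n) - exact_second_moment())

ratio = err(100) / err(200)  # halving the step roughly halves the error: weak order 1
```

The ratio err(100)/err(200) is close to 2, the signature of a first-order weak scheme.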
Another result concerns convergence in total variation distance: we want to obtain (3) with ‖f‖_{p,∞} replaced by ‖f‖_∞ when f is a measurable function. In the case of the Euler scheme for diffusion processes, a first result of this type was obtained by Bally and Talay [6], [7] using the Malliavin calculus (see also Guyon [14]). Afterwards Konakov, Menozzi and Molchanov [19], [20] obtained similar results using a parametrix method. Recently Kusuoka [22] obtained estimates of the error in total variation distance for the Ninomiya-Victoir scheme (which corresponds to the case h = 2). We will obtain a similar result using our approach. Moreover, we give estimates of the rate of convergence of the density function and its derivatives.
Regularization properties. We first remark that the crucial property used in order to replace ‖f‖_{p,∞} by ‖f‖_∞ in (3) is the regularization property of the semigroup. Let us be more precise: let η > 0 and p ∈ N be fixed. Given the time grid t_k = kδ, we say that a semigroup (P_k)_{k∈N} satisfies R_{p,η} if

‖P_k f‖_{p,∞} ≤ C t_k^{−ηp} ‖f‖_∞.

We also introduce a dual regularization property: we consider the dual semigroup P*_k (i.e. ⟨P*_k g, f⟩ = ⟨g, P_k f⟩ with the scalar product in L²(R^d)) and we assume that

‖P*_k f‖_{p,1} ≤ C t_k^{−ηp} ‖f‖_1,

where ‖f‖_{p,1} denotes the L¹ norm of f and of its derivatives up to order p. Finally, we consider the following stronger regularization property: for every multi-index α, β, P_k(x, dy) = p_k(x, y) λ(dy) with (x, y) → ∂^α_x ∂^β_y p_k(x, y) bounded, where λ is the Lebesgue measure. If the law of Z_k is bounded below by a multiple of the Lebesgue measure on a small ball (property (9) below), then a "splitting method" can be used in order to represent Z_k as

Z_k = χ_k U_k + (1 − χ_k) V_k,

where χ_k, U_k, V_k are independent random variables, χ_k is a Bernoulli random variable and √n U_k ∼ ϕ_{r_*}(u) du with ϕ_{r_*} ∈ C^∞(R^N). Then we use the abstract Malliavin calculus based on U_k, developed in [5] and [3], in order to obtain integration by parts formulae. The crucial point is that the density ϕ_{r_*} of √n U_k is smooth and we control its logarithmic derivatives. Using this, we construct integration by parts formulae and obtain relevant estimates for the weights which appear in these formulae. It is worth mentioning that a variant of the Malliavin calculus based on a similar splitting method has already been used by Nourdin and Poly [29] (see also [28] and [23]). They use the so-called Γ calculus introduced by Bakry, Gentil and Ledoux [2]. Roughly speaking, the difference between the approach in our paper and the one in [2] is the following: our construction is similar to the "simple functionals" approach in Malliavin calculus and has the derivative operator as basic object. In contrast, in the Γ calculus, the basic object is the Ornstein-Uhlenbeck operator. In order to state the main result of our paper, we introduce some additional assumptions:

∀r ∈ N*, sup_k E(|Z_k|^r) < ∞,    and    ∃λ_* > 0, ∀k ∈ N, inf_{x,z} inf_{|ξ|=1} ⟨∂_z ψ_k(x, z) ∂_z ψ_k(x, z)* ξ, ξ⟩ ≥ λ_*.
Moreover, we introduce the following regularized version of the approximation scheme X^n_k:

X^{n,θ}_k = X^n_k + n^{−θ} G,

with G a standard normal random variable independent of X^n_k and θ > h + 1. Here X^n_k(x) is the Markov chain which starts from x: X^n_0(x) = x. We denote P^{n,θ}_k(x, dy) = P(X^{n,θ}_k(x) ∈ dy) = p^{n,θ}_k(x, y) dy.
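The role of the Gaussian perturbation n^{−θ} G can be seen on a toy example (our illustrative choice, not the paper's scheme): convolving any law with a small Gaussian produces a smooth density while moving expectations of Lipschitz test functions by at most Lip(f) · σ · E|G|. The sketch below takes f = |·|, a deterministic starting point, and an exaggerated σ so the shift is visible; all numerical choices are ours.

```python
import math

def smoothed_expectation(f, x, sigma, lo=-10.0, hi=10.0, steps=20000):
    """E[f(x + sigma * G)], G standard normal, by trapezoidal quadrature."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        g = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * f(x + sigma * g) * math.exp(-g * g / 2.0) / math.sqrt(2.0 * math.pi)
    return total * h

sigma = 0.5  # plays the role of n^{-theta}, exaggerated so the effect is visible
smoothed = smoothed_expectation(abs, 1.0, sigma)  # E|1 + sigma G|, a smooth functional
shift = abs(smoothed - 1.0)                       # bounded by Lip(f) * sigma * E|G|
```

With σ = n^{−θ} and θ > h + 1, the induced shift is of order n^{−θ} and therefore negligible with respect to the approximation error n^{−h}.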
Theorem 1.2. Consider a Markov semigroup (P_t)_{t≥0} and the approximation Markov chain (P^n_k)_{k∈N} defined in (1). We fix T, h > 0, p ∈ N and we assume that the short time estimates (2) and (7) hold (with this p and h). Moreover, we assume (9), (10), (11) and (12). A. For every 0 < S ≤ T, we have the estimate (15). B. For every t > 0, P_t(x, dy) = p_t(x, y) dy with (x, y) → p_t(x, y) smooth. C. For every R, ε > 0 and every multi-index α, β, we have the estimate (16). We notice that (15) gives the total variation distance between the semigroups (P_t)_{t≥0} and (P^n_k)_{k∈N}. Once the appropriate regularization properties are obtained (using the abstract Malliavin calculus), the proof of (15) is rather elementary. In contrast, the estimate (16) is based on a nontrivial interpolation result recently obtained in [8]. Notice, however, that the estimate (16) is sub-optimal (because of ε > 0). We will illustrate (15) by taking X^n_k to be the Ninomiya-Victoir scheme of a diffusion process. This is a variant of the result already obtained by Kusuoka [22] in the case where Z_k has a Gaussian distribution (so that the standard Malliavin calculus is available). Since in our paper Z_k has an arbitrary distribution (except for the property (9)), our result may be seen as an invariance principle as well.
The paper is organized as follows. In Section 2, we prove Theorem 1.1. In Section 3, we settle the abstract Malliavin calculus based on the splitting method, we use it in order to prove the regularization properties for the approximation scheme X^n_k (in fact for the regularization X^{n,θ}_k), and we prove Theorem 1.2. Finally, in Section 4, we use the previous results in order to give estimates of the total variation distance for the Ninomiya-Victoir approximation scheme. In the Appendix, we prove some technical estimates concerning the Sobolev norms of X^n_k.

The distance between two Markov semigroups
Throughout this section the following notations prevail. We fix T > 0, the horizon of the underlying processes, and we denote by n ∈ N* the number of time steps between 0 and T. Then we set δ := δ_n = 1/n and introduce the homogeneous time grid t_k = kTδ = kT/n. Notice that all the results of this paper remain true with non-homogeneous time steps but, for the sake of simplicity, we will not consider this case. First, we state some results for smooth test functions.

Regular test functions
We consider a sequence of finite transition measures µ_k(x, dy), k ∈ N, from R^d to R^d. This means that for each fixed x and k, µ_k(x, dy) is a finite measure on R^d with the Borel σ-field, and for each bounded measurable function f : R^d → R, the application

x → µ_k(x, f) := ∫ f(y) µ_k(x, dy)

is measurable. We also denote by |µ_k|(x, dy) the total variation of µ_k(x, dy) and we assume that all the sequences of measures we consider in this paper satisfy sup_{k,x} |µ_k|(x, R^d) < ∞. Although the main application concerns the case where µ_k(x, dy) is a probability measure, we do not assume this here: we allow µ_k(x, dy) to be a signed measure of finite (but arbitrary) total mass. This is because one may use the results from this section not only in order to estimate the distance between two semigroups but also in order to obtain a development of the error. To the sequence µ_k, k ∈ N, we associate the discrete semigroup P_k = P_{0,k}, where more generally, for r ≥ k, we define P_{k,r} by P_{k,k} = Id and P_{k,r+1} = µ_{r+1} P_{k,r}. We include the multi-index α = (0, ..., 0), and in this case ∂_α f = f. We use the norms

‖f‖_{p,∞} = Σ_{|α| ≤ p} ‖∂_α f‖_∞.

In particular ‖f‖_{0,∞} = ‖f‖_∞ is the usual supremum norm.
We will consider the following hypothesis: let p ∈ N and C ≥ 1 be fixed; we assume that for every 0 ≤ k ≤ r and every f ∈ C^p(R^d),

‖P_{k,r} f‖_∞ ≤ C ‖f‖_∞,    (19)
‖P_{k,r} f‖_{p,∞} ≤ C ‖f‖_{p,∞}.    (20)

We consider now a second sequence of finite transition measures ν_k(x, dy), k ∈ N, and the corresponding semigroup Q_k defined as above. Our aim is to estimate the distance between P_k f and Q_k f in terms of the distance between the transition measures µ_k(x, dy) and ν_k(x, dy), so we denote

∆_k f(x) = ∫ f(y) (µ_k − ν_k)(x, dy).

P_k can be seen as a semigroup in continuous time considered on the time grid t_k, k ∈ N, while Q_k would be its discrete approximation semigroup. Let p ∈ N, h ≥ 0 be fixed. We introduce a short time error approximation assumption: there exists a constant C > 0 (depending on p only) such that for every k ∈ N, we have

‖∆_k f‖_∞ ≤ C n^{−(h+1)} ‖f‖_{p,∞}.    (21)

Proposition 2.1. Let p ∈ N be fixed. Suppose that µ_k and ν_k satisfy (20) and (21) with p = 0. Then for every m ≤ n and f ∈ C^p(R^d),

‖P_m f − Q_m f‖_∞ ≤ C n^{−h} ‖f‖_{p,∞}.    (22)

Proof. We have the telescoping decomposition

P_m f − Q_m f = Σ_{k=0}^{m−1} Q_{k+1,m} ∆_{k+1} P_k f.

Using (19) for Q, (21), and (20) for P, we obtain ‖Q_{k+1,m} ∆_{k+1} P_k f‖_∞ ≤ C n^{−(h+1)} ‖f‖_{p,∞}. Summing over k = 0, ..., m − 1, we conclude.
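The telescoping decomposition in this proof can be checked concretely: for two time-homogeneous transition matrices P and Q on a finite state space, the same algebra gives P^m − Q^m = Σ_{k=0}^{m−1} Q^{m−1−k}(P − Q)P^k, so the global error is a sum of m one-step errors propagated by the two semigroups. The 2-state matrices below are arbitrary illustrative choices.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(A, m):
    n = len(A)
    R = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(m):
        R = matmul(R, A)
    return R

P = [[0.9, 0.1], [0.2, 0.8]]    # toy transition matrices (arbitrary)
Q = [[0.85, 0.15], [0.25, 0.75]]
m = 5
D = [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]  # one-step difference

# Lindeberg telescoping: P^m - Q^m = sum_k Q^{m-1-k} (P - Q) P^k
S = [[0.0, 0.0], [0.0, 0.0]]
for k in range(m):
    T = matmul(matmul(matpow(Q, m - 1 - k), D), matpow(P, k))
    S = [[S[i][j] + T[i][j] for j in range(2)] for i in range(2)]

Pm, Qm = matpow(P, m), matpow(Q, m)
gap = max(abs(Pm[i][j] - Qm[i][j] - S[i][j]) for i in range(2) for j in range(2))
```

The identity holds exactly (gap at machine precision); each summand is then bounded by the one-step error times the norms of the two propagators, exactly as in the proof above.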

Measurable test functions (convergence in total variation distance)
The estimate (22) requires a lot of regularity for the test function f. Our aim is to show that, if the semigroups at work have a regularization property, then we may obtain estimates of the error for measurable test functions. In order to state this result we have to make some hypotheses on the adjoint semigroup. Let p ∈ N. We assume that there exists a constant C ≥ 1 such that for every measurable function f and any g ∈ C^p(R^d) the duality estimate (23) holds, where ⟨g, f⟩ = ∫ g(x) f(x) dx is the scalar product in L²(R^d).
Our regularization hypothesis is the following. Let p ∈ N, S > 0 and η ≥ 0 be given. We assume that there exists a constant C ≥ 1 such that

R_{p,η}(S):  ‖P_{k,r} f‖_{p,∞} ≤ C (t_r − t_k)^{−ηp} ‖f‖_∞.    (25)

We also consider the "adjoint regularization hypothesis". We assume that there exists an adjoint semigroup P*_{k,r}, that is, ⟨P*_{k,r} g, f⟩ = ⟨g, P_{k,r} f⟩ for every bounded measurable function f and every function g ∈ C^∞_c(R^d). We assume that P*_{k,r} satisfies (20) and moreover

R*_{p,η}(S):  ‖P*_{k,r} g‖_{p,1} ≤ C (t_r − t_k)^{−ηp} ‖g‖_1.    (26)

Notice that a sufficient condition for R*_{p,η}(S) to hold is an integrable bound, for every multi-index α with |α| ≤ p, on the y-derivatives ∂^α_y of the density of the kernel; indeed, this follows by duality.

Proposition 2.2. Let p ∈ N, η > 0, h ≥ 0 and 0 < S ≤ T/2 be fixed. We suppose that (20), (21) and (24) hold for P_m and Q_m. We also suppose that P satisfies R_{p,η}(S) (see (25)) and Q satisfies R*_{p,η}(S) (see (26)). Then, for t_m ≥ 2S,

‖P_m f − Q_m f‖_∞ ≤ C S^{−ηp} n^{−h} ‖f‖_∞.

Proof. Using a density argument we may assume that f ∈ C^p(R^d). Moreover, by (23), it is sufficient to prove the estimate for the pairing against smooth functions. Since t_m ≥ 2S, for each k ≤ m − 1 we have t_k ≥ S or t_m − t_{k+1} ≥ S. Suppose first that t_k ≥ S: using (19) for Q, (21) and (25) for P, we bound the corresponding term of the telescoping sum. Suppose now that t_m − t_{k+1} ≥ S. We take φ_{ε,x_0}(x) = ε^{−d} φ((x − x_0)/ε), an approximation of the identity. Using (24), (26) and then (20), we obtain the corresponding bound, and since ‖φ_{ε,x_0}‖_1 = ‖φ‖_1 ≤ C, the proof is complete.
In concrete applications the following slightly more general variant of the above proposition will be useful.
Proposition 2.3. Let p ∈ N, η > 0, h ≥ 0 and 0 < S ≤ T/2 be fixed. We assume that (20), (21) and (24) hold for P and Q with these p, η, h and S. Moreover, we assume that there exist some kernels P_{k,r} which satisfy R_{p,η}(S) (see (25)) and Q_{k,r} which satisfy R*_{p,η}(S) (see (26)). We also assume that (30) holds for every 0 ≤ k ≤ r. Then the estimate of Proposition 2.2 remains valid.

Remark 2.1. Notice that P_{k,r} and Q_{k,r} are not supposed to satisfy the semigroup property and are not directly related to µ_k and ν_k.
Proof. The proof follows the same lines as that of the previous proposition. Suppose first that t_k ≥ S; then we use (19), the fact that P_{k,r} verifies R_{p,η}(S), the estimate (21) and finally (30). Suppose now that t_m − t_{k+1} ≥ S. In order to bound ‖Q_{k+1,m} ∆_{k+1} P_k f‖_∞ we use the same reasoning as in the proof of the previous proposition, and the second term is bounded using (30).

Convergence of the density functions
In this section we consider a Markov semigroup (P_t)_{t≥0} and we give an approximation result and a regularity criterion for it. The regularization property that we assume for the approximation processes is stronger than the one considered in the previous section and, instead of Proposition 2.2, we use a general approximation result based on an interpolation inequality proved in [8]. We recall that we have fixed T > 0, that δ = δ_n = 1/n, and we denote t^n_k = t_k = kT/n. For k ∈ N, we consider µ^n_k(x, dy) = µ^n(x, dy) = P_{Tδ}(x, dy), the homogeneous sequence of finite transition measures, which satisfies (20). Moreover, we introduce a sequence of transition probability measures ν^n_k(x, dy), k ∈ N, and the corresponding discrete semigroups defined by P^n_{k,k} = Id and P^n_{k,r+1} = ν^n_{r+1} P^n_{k,r}. We recall that P^n_k f = P^n_{0,k} f. We assume that for f ∈ C^p(R^d), P^n_k f is well defined and verifies (20). For h > 0 and p ∈ N, we assume that for all n ∈ N the short time estimates (33) and (34) hold. We introduce now a modification of (P^n_k)_{k∈N} in the sense that for every measurable and bounded function f : R^d → R the two semigroups are uniformly close, see (35). We assume that this modified semigroup satisfies the following strong regularization property R_{q,η}(S): we fix q ∈ N and S, η > 0, and we assume that for every multi-index α, β with |α| + |β| ≤ q and f ∈ C^q(R^d), one has (36). Notice that if R_{q+d,η}(S) holds, then the modified semigroup admits a density with respect to the Lebesgue measure. Moreover, the regularization properties R_{p,η}(S) and R*_{p,η}(S) of the previous section hold when the strong property R_{p,η}(S) is satisfied.
Remark 2.2. The inequality (38) is essentially a consequence of Proposition 2.3. However, we may not use this result directly, because we do not assume that the semigroup (P_t)_{t≥0} has the regularization property (25). This is convenient, because we only have to check the regularization property on the approximation scheme (P^n_k)_{k∈N}.
Remark 2.3. The estimate (39) is sub-optimal because of ε > 0. One may wonder if optimal estimates (with n^{−h} instead of n^{−h(1−ε)}) may be obtained, as was the case in the paper of Bally and Talay [6] concerning the Euler scheme. Notice that, in the above paper, specific properties related to the dynamics of the diffusion process which gives the semigroup are used, in particular properties of the tangent flow ∇_x X_t(x), where X_t(x) denotes the diffusion process starting from x. Such properties are crucial in the above paper, but are difficult to express in terms of general semigroups.
Proof. Let m = ζn, ζ ∈ N*. Using (33) and (34), we obtain the announced estimate, so the sequence (P^n_m f)_{n∈N} is Cauchy and converges with rate C S^{−ηp} n^{−h} ‖f‖_∞. It remains to identify the limit. By Proposition 2.1, this limit is (P_t f)_{t≥0} for f ∈ C^p_b, so we conclude.
Let us prove B. We are going to use a result from [8]. First, we introduce some notation. Let d_p be the distance defined by

d_p(µ, ν) = sup { |∫ f dµ − ∫ f dν| : ‖f‖_{p,∞} ≤ 1 }.

We have the following result, which is Theorem 2.11 from [8].
Theorem 2.2. Let q, p, l, m ∈ N and r > 1 be given and let r* be the conjugate of r. Consider some measures µ(dx, dy) and µ_n(dx, dy) = g_n(x, y) dxdy with sup_n ‖g_n‖_{q+2m,2m,r} < ∞. Suppose that (40) holds for some α > (q + p + d/r*)/m. Then µ(dx, dy) = g(x, y) dxdy with g ∈ W^{q,r}(R^d), together with the corresponding estimate. In particular, if (40) is satisfied for all m ∈ N* and any α > (q + p + d/r*)/m, then for every ε > 0 we have the conclusion with a constant C which depends on ε and may go to infinity as ε ↓ 0.
We come back to our framework. We fix R > 0, 2S ≤ t ≤ T, and choose k(n, t) ∈ N such that t_{k(n,t)} ≤ t < t_{k(n,t)+1}. We use the result above for the sequence g_n := g^{n,R}_{k(n,t)}, n ∈ N, and µ(dx, dy) := P_t(x, dy) dx. In our specific case, (35) and (38) give d_0(g, g_n) ≤ C n^{−h}, and the hypothesis (36) ensures that sup_n ‖g_n‖_{q+2m,2m,r} < ∞, so (40) holds for every α ∈ R_+ and r > 1. Using Sobolev's embedding theorem for u ≤ q − d/r, we conclude.

Integration by parts using a splitting method
In this section we consider a sequence of independent random variables Z_k ∈ R^N, k = 1, ..., n. The number n is fixed throughout this section (so there is no asymptotic procedure going on; but morally n is large, because we are interested in estimating the error as n → ∞). Our aim is to settle an integration by parts formula based on the law of Z. The basic assumption is the following: there exist z_{*,k} ∈ R^N and ε_*, r_* > 0 such that for every Borel set A ⊂ R^N and every k,

P(Z_k ∈ A) ≥ ε_* λ(A ∩ B_{r_*}(z_{*,k})),    (43)

where λ is the Lebesgue measure on R^N. We also define M_p(Z), the supremum over k of the moments of order p of Z_k, and assume that M_p(Z) < ∞ for every p ≥ 1.
It is easy to check that (43) holds if and only if there exist some non-negative measures µ_k with total mass µ_k(R^N) < 1 and a lower semi-continuous function ϕ such that the decomposition (44) holds. Notice that the random variables Z_1, ..., Z_n are not assumed to be identically distributed. However, the fact that r_* > 0 and ε_* > 0 are the same for all k represents a mild substitute of this property. In order to construct ϕ we have to introduce the following function: for a > 0, set ϕ_a : R^N → R as defined in (45). Then 0 ≤ ϕ_a ≤ 1, and we have the following crucial property: for every p, q ∈ N there exists a universal constant C_{q,p} such that for every z ∈ R^N the derivatives of order q of ln ϕ_a are controlled as in (46), with the convention ln ϕ_a(z) = 0 for |z| ≥ 2a. As an immediate consequence of (43), for every non-negative function f : R^N → R_+ the lower bound (47) holds. By a change of variable we obtain the analogous bound for the law of (1/√n) Z_k. We denote by φ_n the corresponding lower-bound density and we notice that ∫ φ_n(z) dz = m_* ε_*^{−1}. We consider a sequence of independent random variables χ_k, U_k, V_k, k = 1, ..., n, as in (48)-(49). Notice that (48) guarantees that P(V_k ∈ dz) ≥ 0. Then a direct computation shows that

(1/√n) Z_k =_{law} χ_k U_k + (1 − χ_k) V_k.

This is the splitting procedure for (1/√n) Z_k. From now on we will work with this representation of the law of (1/√n) Z_k; so we always take Z_k/√n = χ_k U_k + (1 − χ_k) V_k.

Remark 3.1. The above splitting procedure has already been widely used in the literature: in [30] and [24], it is used in order to prove convergence to equilibrium of Markov processes. In [9], [10] and [33], it is used to study the Central Limit Theorem. Last but not least, in [29], the above splitting method (with a specific choice of ϕ) is used in a framework which is similar to the one in this paper.
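A minimal numerical instance of this splitting (our illustrative choice of laws, not the paper's): the uniform law on [−1, 1] dominates m_* ψ with m_* = 1/4 and ψ a triangle density on [−1/2, 1/2] (a piecewise-linear stand-in for the smooth bump ϕ_a), so Z can be sampled as a mixture of a regular piece U and a remainder V, exactly as in the decomposition above.

```python
import random

random.seed(0)

def sample_split():
    """Sample Z ~ Uniform[-1, 1] via the splitting Z = chi*U + (1-chi)*V.

    p_Z(z) = 1/2 on [-1, 1] dominates m_* * psi(z), where psi is the triangle
    density on [-1/2, 1/2] (peak 2 at 0) and m_* = 1/4, so the remainder
    q = (p_Z - m_* * psi) / (1 - m_*) is a genuine probability density.
    """
    if random.random() < 0.25:                       # chi ~ Bernoulli(m_*)
        # U ~ triangle on [-1/2, 1/2]: sum of two Uniform[-1/4, 1/4]
        return random.uniform(-0.25, 0.25) + random.uniform(-0.25, 0.25)
    while True:                                      # V ~ q, by rejection from Uniform[-1, 1]
        w = random.uniform(-1.0, 1.0)
        psi = max(0.0, 2.0 * (1.0 - 2.0 * abs(w)))   # triangle density at w
        if random.random() < (0.5 - 0.25 * psi) / 0.5:
            return w

n_samples = 200_000
xs = [sample_split() for _ in range(n_samples)]
mean = sum(xs) / n_samples            # E[Z] = 0 for Uniform[-1, 1]
m2 = sum(x * x for x in xs) / n_samples   # E[Z^2] = 1/3 for Uniform[-1, 1]
```

The empirical moments of the mixture match those of the original uniform law, confirming that the splitting reproduces the law of Z while isolating a component with a (piecewise) smooth density.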
In the following we denote χ = (χ_1, ..., χ_n), U = (U_1, ..., U_n), V = (V_1, ..., V_n), and we consider the class S of simple functionals, that is, of random variables of the form F = f(χ, U, V). We construct now a differential calculus based on the laws of the random variables U_k, k = 1, ..., n, which mimics the Malliavin calculus, following the ideas from [5], [3] and [4]. In order to be self-contained, we shortly present the results that we need. For F = f(χ, U, V) ∈ S we define the Malliavin derivatives D_{(k,i)} F = ∂_{U^i_k} f(χ, U, V). We denote by ⟨•, •⟩ the usual scalar product on R^N × R^n. The Malliavin covariance matrix of a multi-dimensional functional F = (F^1, ..., F^d) is σ^{i,j}_F = ⟨DF^i, DF^j⟩. The higher order derivatives are defined by iterating D. Now we define the Ornstein-Uhlenbeck operator L : S → S, built from the logarithmic derivatives of the density of the U_k's (see (60)).

Remark 3.2. The basic random variables in our calculus are Z_k, k = 1, ..., n, so let us make precise the way in which the differential operators act on them. Since Z_k/√n = χ_k U_k + (1 − χ_k) V_k, we have D_{(k,j)} Z^i_k = √n χ_k δ_{i,j}. In our framework, the duality formula in Malliavin calculus reads as follows: for each F, G ∈ S,

E(⟨DF, DG⟩) = E(F LG) = E(G LF),    (61)

which follows immediately using the independence structure and standard integration by parts with respect to the density of the U_k's. We have the standard chain rule (62): Dφ(F) = ∇φ(F) DF. Moreover, one may prove, using (62) and the duality relation (or a direct computation), the computational rules (64) for L, in particular for products FG with F, G ∈ S. We are now able to give the Malliavin integration by parts formula: for F ∈ S^d with invertible covariance matrix σ_F and γ_F = σ_F^{−1},

E(∂_i φ(F) G) = E(φ(F) H_i(F, G)),

and moreover, for every multi-index α, E(∂_α φ(F) G) = E(φ(F) H_α(F, G)), with H_α(F, G) defined by the recurrence relation H_α(F, G) = H_{α_q}(F, H_{(α_1,...,α_{q−1})}(F, G)).

Proof. Using the chain rule Dφ(F) = ∇φ(F) DF, we have ⟨Dφ(F), DF⟩ = ∇φ(F) σ_F. It follows that ∇φ(F) = γ_F ⟨Dφ(F), DF⟩. Then, using (64) and the duality formula (61), we obtain the formula for H_i(F, G). We use once again (64) in order to obtain H(F, G) in (66).
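The duality between D and L is the abstract counterpart of the classical Gaussian integration by parts E[f′(F)] = E[F f(F)] for F ∼ N(0, 1) (Stein's identity), where the weight F is exactly minus the logarithmic derivative of the Gaussian density. A quadrature sketch (our illustrative choice f = exp; none of these objects come from the paper):

```python
import math

def gauss_expectation(g, lo=-12.0, hi=12.0, steps=48000):
    """E[g(F)] for F ~ N(0, 1): trapezoidal quadrature of g against the Gaussian density."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * g(x) * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return total * h

lhs = gauss_expectation(math.exp)                    # E[f'(F)] with f = exp
rhs = gauss_expectation(lambda x: x * math.exp(x))   # E[F f(F)]
# both sides equal e^{1/2}
```

Both sides agree (with e^{1/2}), and the weight produced by the integration by parts, here simply F, is the prototype of the weights H_α(F, G) above.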
We give now estimates of the weights H_α(F, G) which appear in the above integration by parts formulas. We will work with the Sobolev-Malliavin norms ‖F‖_{q,p} introduced in (67) and (68).

Proposition 3.1. For each m, q ∈ N, there exists a universal constant C ≥ 1 (depending on d, m, q only) such that for every multi-index α with |α| ≤ q and every F ∈ S^d and G ∈ S, one has the corresponding estimate on ‖H_α(F, G)‖. The proof is long but straightforward, so we skip it. The reader may find the detailed proof in [5] and in [3], Proposition 3.3.
We finish this section with an estimate of ‖LZ^i_k‖_{q,p}. A. ‖LZ^i_k‖_p is bounded in terms of the moments of Z. B. For every q ∈ N and p ≥ 2 there exists a constant C depending on q, p only such that the corresponding bound on ‖LZ^i_k‖_{q,p} holds.

Proof. A. Using the duality relation, we obtain the announced bound. In order to prove B, we recall (see (60)) the expression of LZ^i_k. Let Λ_{k,q} be the set of multi-indices α = (α_1, ..., α_q) such that α_j = (k, i_j). Notice that for a multi-index α of length q such that α ∉ Λ_{k,q}, we have D_α LZ^i_k = 0. Suppose now that α ∈ Λ_{k,q} and let α = (i_1, ..., i_q). Then D_α LZ^i_k involves only derivatives of the logarithmic density; using (46), we obtain the announced estimate.

Localization
In the following, we will not work under P, but under a localized probability measure defined as follows. We fix M ≤ n and we consider the localization set introduced in (74). Using Hoeffding's inequality and the fact that E[χ_k] = m_*, it can be checked that the probability of its complement is exponentially small, see (75). We consider also the localization function ϕ_{n^{1/4}/2}, defined in (45), and we construct the random variable Θ = Θ_{M,n} as in (76). Since Z_k has finite moments of any order, the following inequality can be shown: for every q ∈ N there exists C such that (77) holds. We define the probability measure

dP_Θ = (E[Θ])^{−1} Θ dP,    (78)

and the corresponding integration by parts formula holds: for every multi-index α, with H^Θ_α(F, G) defined by the recurrence relation (82), one has (79). Moreover, there exists a universal constant C such that for every multi-index α with |α| = q the estimate (83) holds.

Proof. Using (66) with G replaced by GΘ, we obtain the localized formula. So (79) is proved and (82) follows by recurrence. Notice that by (46) we have a bound on the logarithmic derivatives of Θ; then (83) follows from (71).

Markov chains
Throughout this section, n ∈ N will still be fixed: it is the number of time steps between 0 and T and also the number of increments that we consider in our abstract Malliavin calculus. We consider two sequences of independent random variables Z_k ∈ R^N, κ_k ∈ R, k ∈ N, and we assume that Z_k verifies (43). We also assume that Z_k has finite moments of any order, and we recall that M_p(Z) < ∞ for every p. We construct the R^d-valued Markov chain

X^n_{k+1} = ψ_{κ_{k+1}}(X^n_k, Z_{k+1}/√n).    (85)

We denote ψ_k(x, z) = ψ(κ_k, x, z) and, for C, r ≥ 1, we define the norm ‖ψ‖_{1,r,∞} as in (87). Since |β| ≥ 1 in that definition, we take at least one derivative with respect to z. All our estimates will be done in terms of ‖ψ‖_{1,r,∞}, so we may assume without loss of generality that E(Z_k) = 0. Indeed, if this is not true, we denote m_k = E(Z_k) and we work with Z_k − m_k instead of Z_k and with ψ(κ, x, z + m_k) instead of ψ. Since this change does not affect ∇_z ψ, all the results remain true.
Remark 3.3. The reason for considering the random variables κ_k is the following: in the Ninomiya-Victoir scheme, at each time step k, one tosses a coin κ_k ∈ {1, −1} and employs a different form of the function ψ according to whether κ_k equals 1 or −1.
In order to simplify the notation, we write the one-step map in a compact form. Our aim is to give sufficient conditions under which the above Markov chain has the regularization property (36). In order to do so, we consider the following new representation of X_k. Using a first-order Taylor expansion, we write the increment X_{k+1} − X_k as its principal part, linear in Z_{k+1}/√n, plus a remainder. Moreover, we denote by X^n_m(x) the Markov chain which starts from x (i.e. X^n_0(x) = x) and by ∂_α X^n_m the derivative with respect to the starting point x. We will use the results from the previous section for X^n_m; in order to do so we have to estimate the Sobolev norms of X^n_m.

Theorem 3.2. For every q, q′ ∈ N with q ≤ q′ and p ≥ 2 there exist constants C ≥ 1 and l ∈ N (depending on r_*, ε_*, m_*, q, p and the moments of Z, but not on n) such that the Sobolev norms of X^n_m up to order q are bounded accordingly, uniformly in n and m ≤ n. The proof is long and technical, so we postpone it to the Appendix.
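The derivative ∂_x X^n_m with respect to the starting point, which enters these Sobolev norm estimates, can be computed by the chain-rule recursion that also defines the tangent flow used below. A one-dimensional sketch with an arbitrary smooth one-step map ψ satisfying ψ(x, 0) = x and a frozen noise sequence (all choices ours, not the paper's scheme), validated against a central finite difference:

```python
import math

zs = [0.05, -0.02, 0.03, 0.01, -0.04]  # frozen noise values (arbitrary)

def psi(x, z):
    # illustrative smooth one-step map with psi(x, 0) = x; not the paper's scheme
    return x + z * (1.0 + 0.1 * math.sin(x)) - z * z * x

def chain_and_tangent(x0):
    """Run x_{k+1} = psi(x_k, z_k) together with the tangent flow y_k = d x_k / d x0."""
    x, y = x0, 1.0
    for z in zs:
        y *= 1.0 + 0.1 * z * math.cos(x) - z * z  # chain rule: multiply by d psi / d x
        x = psi(x, z)
    return x, y

x0 = 0.7
_, tangent = chain_and_tangent(x0)
eps = 1e-6
fd = (chain_and_tangent(x0 + eps)[0] - chain_and_tangent(x0 - eps)[0]) / (2 * eps)
```

The recursion multiplies one-step Jacobians in order, which is exactly the product structure of the tangent flow Y_m used in the next subsection.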

The Malliavin covariance matrix
We turn now to the covariance matrix. We will work under the probability P_Θ defined in (78). We recall that M ≤ n is given and that the localization set has been introduced above. The localization random variable Θ = Θ_{M,n} is defined in (76) and we have proved in (77) that, for every q ∈ N, the corresponding moment estimates hold. Using the computational rules for k ∈ {0, ..., m − 1} and m ≤ n, we obtain the expression of D_{(k+1,i)} X^n_m in terms of the coefficients c^{i,j,q}_k from (96) and of the d × N dimensional matrices J_l. We first aim to express D_{(k+1,i)} X^n_m using the variation of constants method. We consider the tangent flow Y_m = ∇_x X^n_m(x), which is the d × d dimensional matrix solution of the linear equation Y_{m+1} = (I + J_{m+1}) Y_m, Y_0 = I, where I is the identity matrix. The explicit solution of the above equation is Y_m = (I + J_m) ⋯ (I + J_1). If ‖J_k‖ ≤ 1/2, then the lowest eigenvalue of I + J_k is larger than 1/2, so we have the invertibility property. We denote by Ŷ_m the inverse of Y_m, and it is easy to check that Ŷ_m solves the equation Ŷ_{m+1} = Ŷ_m (I + J_{m+1})^{−1}, Ŷ_0 = I. The representation of the Malliavin derivative known as the "variation of constants" formula then follows. We will use the following estimate.
Lemma 3.2. Let p ≥ 1. There exist constants C_1 ≥ 1, C_2 ≥ 2 and C_3 ≥ 1, which depend on M_8(Z) and ‖ψ‖_{1,3,∞}, such that the following holds: suppose that M and n are sufficiently large in order to have (101); then the moment estimates (102) for Y_m and Ŷ_m hold under P_Θ.

Proof.
Step 1. We notice that on the set {Θ ≠ 0} we have Y_l = Ỹ_l, where Ỹ_l is the solution of the localized equation. It follows that the first bound holds; the last inequality is a consequence of (77) and is true under the hypothesis (101). So our task is now to estimate the remaining term.

Step 2. Using the Hölder inequality, and since Y_l is F_l measurable, we obtain the conditional estimate.

Step 3. Using the above estimate, we write (I + J_l)^{−1} − I = −J_l (I + J_l)^{−1} and we notice that ‖(I + J_l)^{−1}‖ ≤ 2 (because the lowest eigenvalue of I + J_l is larger than 1/2). The same reasoning as above gives the analogous bound.

Step 4. We are now ready to conclude. By (104) we have n‖θ_k‖ ≤ C‖Y_k‖ and, using the triangle inequality, we deduce the corresponding estimate. Moreover, M_m is a martingale, so, using Burkholder's inequality (see (144)), we obtain the moment bound. Finally, Gronwall's lemma yields the conclusion, where C depends on ‖ψ‖_{1,3,∞} and the moments of Z. The estimate of E_Θ[‖Ŷ_m‖^p] is similar but simpler, so we leave it out.
We have the following estimate for the covariance matrix of X^n_m.

Proposition 3.2. Suppose that there exists λ_* > 0 such that the ellipticity condition (106) holds. Assume also that M and n are sufficiently large so that (101) and (107) hold. Let σ_{X^n_M} be the Malliavin covariance matrix of X^n_M defined in (55). There exists a universal constant C such that the lower bound (108) holds, with C_3 defined in (102).
Proof. By (100), σ_{X^n_M} admits the representation given there. We now estimate the lower eigenvalue of σ_{X^n_M}. Recall that I_{k,i} is given in (96).
Since we are on the set {Θ ≠ 0}, we have

The regularization property
We still fix n and we consider the Markov chain X^n_M, M ∈ N, defined in (85). We also recall that Θ_{M,n} is defined in (76) and we introduce

P^{Θ,n}_M f(x) = E(Θ_{M,n} f(X^n_M(x))).    (110)

Notice that P^{Θ,n}_M, M ∈ N, is not a semigroup, but this is not necessary. We will not be able to prove the regularization property for P^n_M, but for P^{Θ,n}_M.

Proposition 3.3. A. Assume that (106) holds true. There exist constants C_1 ≥ 1, C_2 ≥ 2 such that the following holds: suppose that n and M are sufficiently large in order to have (101) and (107). Then for every q ∈ N and multi-indices α, β with |α| + |β| ≤ q, there exist l ∈ N* and C depending on m_*, r_* and M_l(Z) such that the derivative estimate (111) holds. In particular, P^{Θ,n}_M(x, dy) = p^{Θ,n}_M(x, y) dy with (x, y) → p^{Θ,n}_M(x, y) a smooth function. B. For every l ∈ N there exists C ≥ 1 such that the corresponding estimate holds.

Remark 3.4. Recall that t_M = MT/n. Then (111) means that the strong regularization property R_{q,η}, with η = 1, holds for P^{Θ,n}_M.

Proof. We fix M and n, and we denote Θ = Θ_{M,n}. A. We have the representation (113). Using the integration by parts formula (79) and the estimate (83), we obtain

We use now (108) and we obtain
So we have proved A.

B. We have:
By (77) we have, for every l ∈ N, the corresponding estimate on P(Θ_{M,n} = 0), which completes the proof. We give now an alternative way to regularize the semigroup P^n_k (by convolution). We consider a d-dimensional standard normal random variable G which is independent of Z_k, k ∈ N, and for θ > 0 we introduce

X^{θ,n}_k(x) = X^n_k(x) + n^{−θ} G.

We denote by p^{θ,n}_k(x, y) the density of the law of X^{θ,n}_k(x) and we define P^{θ,n}_k f(x) = E(f(X^{θ,n}_k(x))).

Corollary 3.2. Under the hypotheses of the previous proposition we have: A. For every q ∈ N* and every multi-index α, β with |α| + |β| ≤ q, there exist C, l ≥ 1 such that the derivative estimate holds, with C_{ψ,l} given in (111). B. For every q ∈ N*, there exist C, l ≥ 1 such that the corresponding bound holds.

Proof. We fix M and n, and we denote Θ = Θ_{M,n}.
A. As in (113), we write the decomposition into a localized part and a remainder. The reasoning from the previous proof shows the estimate for the first term. Since G follows the standard normal law, standard integration by parts gives the bound for the second term, the last inequality being a consequence of (77). B. Let q ∈ N*. Using (77) and (111), there exist C, l ≥ 1 such that the announced estimate holds.

Approximation result
In this section we give the approximation result for a Markov semigroup (P_t)_{t≥0}. For T > 0 and n ∈ N, we denote δ_n = 1/n, t_k = kTδ_n and µ^n_k(x, dy) = P_{Tδ_n}(x, dy) for all k ∈ N. We consider now an approximation scheme based on the Markov chain introduced in the previous section. So we consider two sequences of independent random variables Z_k ∈ R^N and κ_k, k ∈ N, such that the Z_k verify (43) and have finite moments of any order: M_p(Z) < ∞ for every p ≥ 1. Moreover, we take

X^n_{k+1} = ψ_{κ_{k+1}}(X^n_k, Z_{k+1}/√n).

We denote ν^n_{k+1}(x, dy) = P(X^n_{k+1} ∈ dy | X^n_k = x) and we construct the discrete semigroup P^n_{k+1} = ν^n_{k+1} P^n_k. We recall that the notation ‖ψ‖_{1,r,∞} is introduced in (87) and we assume that ‖ψ‖_{1,r,∞} < ∞ for every r ∈ N. We also assume that there exists λ_* > 0 such that the ellipticity condition (121) holds. Now we are able to prove our main result.
Theorem 3.3. A. Consider a Markov semigroup P_t, t ≥ 0, and the approximation Markov chains P^n_k, k ∈ N, defined above. We fix 0 < S ≤ T/2, p ∈ N and h > 0, and we assume that (20) holds for P and that (32), (33), (34), (119), (120) and (121) hold for this p and h, and for every n ∈ N. There exist l, n_* ∈ N and C, which depends on ‖ψ‖_{1,p+3,∞}, m_*, r_* and M_l(Z), such that for n ≥ n_* the total variation estimate (122) holds. B. Moreover, for every t > 0, P_t(x, dy) = p_t(x, y) dy with (x, y) → p_t(x, y) smooth. C. We recall that P^{Θ,n}_k is defined in (110) and verifies P^{Θ,n}_k(x, dy) = p^{Θ,n}_k(x, y) dy. For every R, ε > 0 and every multi-index α, β we have the estimate (123), with a constant C which depends on R, S, ε and on |α| + |β| (and may go to infinity as ε ↓ 0). D. Let θ > h + 1. We recall that P^{θ,n}_k is defined in (116) and verifies P^{θ,n}_k(x, dy) = p^{θ,n}_k(x, y) dy. For every R, ε > 0 and every multi-index α, β we have the analogous estimate (124).

Proof. A-B. We use Proposition 2.3: we have proved in Proposition 3.3 that P^{Θ,n}_k verifies the regularization properties. The proof of (122) and (123) is then an immediate consequence of Theorem 2.1. C. In order to prove (124), one employs Corollary 3.2 instead of Proposition 3.3.
Remark 3.5.The simulation of an approximation scheme given by P Θ,n may be cumbersome, so the estimate obtained in (123) is not very useful.This is why we propose the regularized scheme X θ,n k which is easier to simulate.

The Ninomiya-Victoir scheme
We illustrate this theorem when X^n is the Ninomiya-Victoir scheme for a diffusion process. This is a variant of the result already obtained by Kusuoka [22] in the case where Z_k has a Gaussian distribution (so that the standard Malliavin calculus is available). Since in our paper Z_k has an arbitrary distribution (except for the property (43)), our result may be seen as an invariance principle as well. We consider the d-dimensional diffusion process

X_t = x + ∫_0^t V_0(X_s) ds + Σ_{i=1}^N ∫_0^t V_i(X_s) ∘ dW^i_s,

where W = (W^1, ..., W^N) is a Brownian motion and ∘dW^i_t denotes the Stratonovich integral with respect to W^i. The infinitesimal operator of this Markov process is

A = V_0 + (1/2) Σ_{i=1}^N V_i^2,

with the notation Vf(x) = ⟨V(x), ∇f(x)⟩. Let us define exp(V)(x) := Φ_V(x, 1), where Φ_V solves the deterministic equation

∂_t Φ_V(x, t) = V(Φ_V(x, t)),  Φ_V(x, 0) = x.

By a change of variables one obtains Φ_{εV}(x, t) = Φ_V(x, εt), so we have exp(εV)(x) = Φ_V(x, ε). We also notice that the semigroup P^V_t f(x) = f(Φ_V(x, t)) of the deterministic flow has the infinitesimal operator A_V f(x) = Vf(x). In particular, using Dynkin's formula m times we obtain a development of P^V_t f in powers of t. We present now the Ninomiya-Victoir scheme. We consider a sequence ρ_k, k ∈ N, of independent Bernoulli random variables and we define ψ_k : R^d × R^{N+1} → R^d in (130) as a composition of the flows of the vector fields, the order depending on ρ_k. Here w = (w^0, w^1, ..., w^N) and the Z^i_k, k ∈ N, are independent random variables which verify (43) and moreover satisfy the moment conditions (131). In the original paper of Ninomiya and Victoir, the random variables Z^i_k are standard normal and thus verify (43). The new point here is that we no longer require Z_k to follow this particular law, but only the weaker assumptions (43) and (131). We also denote t_k = Tk/n. One step of our scheme is given by the above composition. We have the following first result.
There exist universal constants C, q ≥ 1 such that for every

Remark 4.1. A slightly more precise estimate has already been proved by Alfonsi [1]: he obtained (133) with f 6,∞ instead of f 6N,∞ . Since in the following theorem we will replace it by f ∞ , the estimate in (133) is sufficient for us (and the proof is simpler).
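To make the structure of one step of the scheme concrete, here is a minimal sketch in Python. Assumptions not in the text: Gaussian draws stand in for the more general laws allowed by (43) and (131); each flow exp(tV) is approximated by a fixed-step RK4 integrator rather than computed in closed form; the names flow and nv_step are ours.

```python
import math
import random

def flow(V, x, t, steps=100):
    # RK4 approximation of the flow Phi_V(x, t) of dx/ds = V(x);
    # a stand-in for the closed-form exp(tV)(x) used in the text.
    h = t / steps
    for _ in range(steps):
        k1 = V(x)
        k2 = V([xi + 0.5 * h * k for xi, k in zip(x, k1)])
        k3 = V([xi + 0.5 * h * k for xi, k in zip(x, k2)])
        k4 = V([xi + h * k for xi, k in zip(x, k3)])
        x = [xi + (h / 6.0) * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

def nv_step(x, V0, Vs, delta, rng):
    # One Ninomiya Victoir step of size delta: the Bernoulli sign rho
    # chooses the order in which the flows exp(sqrt(delta) Z^i V_i) are
    # composed; half-steps of V0 at both ends give a symmetric splitting.
    rho = 1 if rng.random() < 0.5 else -1
    Z = [rng.gauss(0.0, 1.0) for _ in Vs]  # any law with moments (43), (131)
    x = flow(V0, x, 0.5 * delta)
    idx = range(len(Vs)) if rho == 1 else reversed(range(len(Vs)))
    for i in idx:
        x = flow(Vs[i], x, math.sqrt(delta) * Z[i])
    return flow(V0, x, 0.5 * delta)
```

Averaging over the Bernoulli sign ρ is what symmetrizes the composition of the flows and underlies the weak order 2 proved below.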
Under an ellipticity condition we are able to give an estimate of the total variation distance between a diffusion process of the form (125) and its Ninomiya Victoir scheme.
Then for every 0 < S ≤ T/2 and every bounded and measurable function f :

Remark 4.2. This estimate has already been proved by Kusuoka [22] (with a different approach). He considers much more general non-degeneracy assumptions (of Hörmander type) and uses Malliavin calculus in order to prove his result. Here the noise Z i k is no longer Gaussian, so the standard Malliavin calculus does not work anymore; but, since we have the property (43), we may use the abstract integration by parts formula introduced in the first section.
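Before the proof, it may help to record the standard splitting heuristic behind the order-2 property. The identity below is a well-known sketch and is not part of the argument that follows; the notation A i := (1/2)V i 2 is ours.

```latex
% Writing L = A_{V_0} + \sum_{i=1}^{N} A_i with A_i := \tfrac12 V_i^2,
% the average over the Bernoulli sign \rho of the forward and backward
% orderings cancels the leading commutator terms:
\frac12 \Big(
    e^{\frac{\delta}{2} A_{V_0}} \, e^{\delta A_1} \cdots e^{\delta A_N} \, e^{\frac{\delta}{2} A_{V_0}}
  + e^{\frac{\delta}{2} A_{V_0}} \, e^{\delta A_N} \cdots e^{\delta A_1} \, e^{\frac{\delta}{2} A_{V_0}}
\Big)
= e^{\delta L} + O(\delta^3).
```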
Proof of Theorem 4.1. In order to simplify the notation, we fix T = 1 without loss of generality. We denote

Notice that, with the notation introduced in the beginning of this section,

so that, using (128) with t = 1 and V = U i , we obtain

with

and we recall that

For i = 0 or N + 1, we have a similar development with U 0 = U N +1 = 1 2n V 0 . Our aim is to give a development of order 3 (with respect to T ) for

We replace each T i with the development of order m = 5 given above and compute the products. All the terms containing 1/n r , r ≥ 3, go into the remainder. Moreover, one notices that E[(Z i ) r ] = 0 for odd r, so many terms cancel. Finally E[(Z i ) 2 ] = 1 and E[(Z i ) 4 ] = 6, and this permits us to complete the computation and to obtain:

The remainder R is a sum of terms of the following form:

It is easy to check that for every g ∈ C k+p (R d ) one has

for some constants C, q ≥ 1. So

We turn now to the diffusion process X t . We have the development

We take t = δ and we compute Af and A 2 f . Then we take the difference between (142) and (138). All the terms cancel except for the remainders, so we obtain

We clearly have R f ∞ ≤ C × C q 6 (V ) f 6,∞ . This together with (141) proves that hypothesis (21) is verified. So (133) is a consequence of Proposition 2.1 with p = 6(N + 1) and constant C × C q 6 (V ).

Proof of Theorem 4.2. This will be a consequence of Theorem 3.3 as soon as we check that the ellipticity assumption (106) holds true. We fix k and we look at ψ k (x, w) defined in (130). We suppose that ρ k = 1 (the proof for ρ k = −1 is similar). We denote w = (w 1 , . . . , w N ) and T k = k, and we consider the process x t (w), 0 ≤ t ≤ T N +2 , solution of the following equation:

Then ψ k (x, w) = x T N +2 (w) and consequently, for r ∈ {1, . . . , N },

It follows that

Notice that T r+1 − T r = 1. Then we have

and then, by (134),

Notice that

So, for n sufficiently large, we obtain

Sobolev Norms
We consider a separable Hilbert space U , we denote by |a| U the norm of U and, for a random variable F ∈ U , we denote F U,p = (E[|F | p U ]) 1/p . Moreover we consider a martingale M n ∈ U , n ∈ N, and we recall Burkholder's inequality in this framework: for each p ≥ 2 there exists a constant b p ≥ 1 such that

As an immediate consequence,

We consider the scheme defined in the previous sections (see (91)):

with

We also denote

Our aim is to obtain estimates on the Sobolev norms of X k . Before doing so, we give some abstract estimates. As before, U is a separable Hilbert space. We say that a U -valued random variable F belongs to S(U ) if for every h ∈ U we have h, F ∈ S (see (53)), and we define DF by h, DF = D h, F . Then we define the norms (see (69) and (70))

The Hilbert space U being given, we denote

Notice that we do not discuss existence and uniqueness of the solution of such an equation. We just suppose that the process Y at hand satisfies this equation (which naturally appears in our calculus). Our aim is to estimate the Sobolev norms of Y m . Let q ∈ N and p ≥ 2. We denote

Proposition 5.1. For every q ∈ N and p ≥ 2 there exist some constants C ≥ 1, l ∈ N (depending on q and p) such that

with N ψ (C, l) and M l (Z) defined in (88) and (44).

Proof.
Step 1. Let q = 0, so that Y m V,q,p = Y m V,p . We will check that

We study the terms which appear in the right-hand side of (146). Notice that

Since LH i k and β i k are independent, using (73) we obtain

Finally, using the triangle inequality,

We gather all the terms and we obtain

Using Gronwall's lemma we obtain (149).
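The discrete Gronwall lemma invoked here can be stated as follows (a standard form recorded for convenience; the names a k , b k are ours and do not appear in the paper):

```latex
% Discrete Gronwall: if a_{k+1} \le (1 + C/n)\, a_k + b_k with a_k, b_k \ge 0, then
a_m \;\le\; \Big(1 + \frac{C}{n}\Big)^{m} a_0
      + \sum_{k=0}^{m-1} \Big(1 + \frac{C}{n}\Big)^{m-1-k} b_k
   \;\le\; e^{C} \Big( a_0 + \sum_{k=0}^{m-1} b_k \Big),
   \qquad 0 \le m \le n.
```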
Step 2. For h ∈ H we denote

Iterating this formula over k we obtain

with Γ m (k, i) = 0 for k > m and, for k ≤ m,

One has

Step 3. We estimate the derivatives of Y m , solution of (146). We have

Notice that DY m is a process with values in H × V . We will prove that

Once (151) is proved, the whole proof is concluded. Indeed, using (151) and the result from the first step (that is, (148) with q = 0 and Y m replaced by DY m ), we obtain (148) with q = 1.
Using the same reasoning recursively, we obtain (148) for every q ∈ N.
We estimate each of the terms which appear in the right-hand side of (151). To begin we write C(M

Using (149), we obtain

We have

and a similar estimate holds for β i k q,p . Moreover, we have Γ m = N i,j=1 m−1 k=0 γ i,j k , so we have to analyse each of the terms in γ i,j k . We look first at

LH i k+1 H j k+1 b i,j k (κ k , X n k , H k+1 ) q,p ≤ LH i k+1 H j k+1 q,2p b i,j k (κ k , X n k , H k+1 ) q,2p ≤ LH i k+1 q,4p H j k+1 q,4p ψ 1,q+2,∞ ( X n k l q,2p + H j k l q,2p ) ≤ C n (1 + ψ l 1,q+2,∞ ) exp(C ψ 2 1,3,∞ ).
so the same reasoning as above proves that the previous inequality holds for M m (with m 1/p * r −1 * replaced by M 1/p p (Z) and β i k V,p replaced by α i k V,p ). We use the same reasoning for M

(display estimate involving |β i k | 2 V , M 2p (Z) and C 0,2p (α, β, Γ), and similarly Dβ i k H×V,p )

We conclude that sup m≤n ( α i k U,p + β i k U,p ) ≤ C 1,p (α, β, Γ).

We analyse now Γ m . We treat first I m := m−1 k=0 β i k DLH i k+1 . Since β i k D p,j LH i k+1 = 0 if p ≠ k + 1, using (73) and the independence of LH k+1 and β k , we have |I m | H×V p = |I m |

Since DH i k has properties which are similar to those of DLH i k , the same reasoning as above gives

For r ∈ {1, • • • , N }, we have ∂ wr ψ k (x, w) = ∂ wr x T N +2 (w). Moreover ∂ wr x t (w) = 0 for t ≤ T r and ∂ wr x t (w) = ∂ wr x T r+1 (w) + • • • , in particular for t = T N +1 . For T r < t ≤ T r+1 , ∂ wr x t (w) solves the equation

∂ wr x t (w) = x l ∂ xr a i k (κ k , X k ) (DX n k ) r , (DX n k ) l , x l ∂ xr b i,j k (κ k , X n k ) (DX n k ) l , (DX n k ) r + z j b i,j k (κ k , X n k , H k+1 ) + H j k+1 ∂ z i b i,j k (κ k , X n k , H k+1 ) .