Derivative Estimates on Distributions of McKean-Vlasov SDEs

By using the heat kernel parameter expansion with respect to the frozen SDEs, the intrinsic derivative is estimated for the law of McKean-Vlasov SDEs with respect to the initial distribution. As an application, the total variation distance between the laws of two solutions is bounded by the Wasserstein distance between the initial distributions. These results extend some recent ones proved for distribution-free noise by using the coupling method and Malliavin calculus.


Introduction
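Let $\mathcal P_2$ be the set of all probability measures on $\mathbb R^d$ with finite second moment, which is called the Wasserstein space under the metric

$$W_2(\mu,\nu) := \inf_{\pi\in\mathcal C(\mu,\nu)}\bigg(\int_{\mathbb R^d\times\mathbb R^d}|x-y|^2\,\pi(dx,dy)\bigg)^{\frac12},\quad \mu,\nu\in\mathcal P_2,$$

where $\mathcal C(\mu,\nu)$ is the set of all couplings of $\mu$ and $\nu$. Consider the following distribution dependent SDE on $\mathbb R^d$:

$$dX_t = b_t(X_t,\mathcal L_{X_t})\,dt+\sigma_t(X_t,\mathcal L_{X_t})\,dW_t, \tag{1.1}$$

where $W_t$ is an $m$-dimensional Brownian motion on a complete filtered probability space $(\Omega,\{\mathcal F_t\}_{t\ge 0},\mathbb P)$, $\mathcal L_{X_t}$ is the law of $X_t$, and

$$b:[0,\infty)\times\mathbb R^d\times\mathcal P_2\to\mathbb R^d,\quad \sigma:[0,\infty)\times\mathbb R^d\times\mathcal P_2\to\mathbb R^d\otimes\mathbb R^m$$

are measurable. Equations of this type, known as McKean-Vlasov or mean field SDEs, have been intensively investigated and applied; see for instance the monograph [3] and references therein.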
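As a numerical illustration of the two objects just introduced, the following minimal sketch computes $W_2$ between one-dimensional empirical measures and approximates a distribution dependent SDE by an interacting particle system in which the law is replaced by the empirical measure of the particles. The coefficients `example_drift` and `example_diffusion`, the function names, and the restriction to $d=m=1$ are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np


def w2_empirical_1d(x, y):
    """W_2 between two 1-D empirical measures with the same number of atoms.

    In one dimension the optimal coupling pairs the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    return float(np.sqrt(np.mean((x - y) ** 2)))


def simulate_particles(x0, drift, diffusion, t_end, n_steps, rng):
    """Euler-Maruyama scheme for an interacting particle system (d = m = 1).

    drift(x, particles) and diffusion(x, particles) receive the current
    positions both as the state and as a sample of the current law.
    """
    x = np.array(x0, dtype=float)
    dt = t_end / n_steps
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = x + drift(x, x) * dt + diffusion(x, x) * dw
    return x


# Hypothetical coefficients, chosen only for illustration.
def example_drift(x, particles):
    return -(x - particles.mean())


def example_diffusion(x, particles):
    return np.sqrt(1.0 + particles.var()) * np.ones_like(x)


mu0 = np.random.default_rng(0).normal(0.0, 1.0, size=2000)
nu0 = 0.5 * mu0 + 1.0  # a second initial sample with a different law

mu_t = simulate_particles(mu0, example_drift, example_diffusion,
                          1.0, 200, np.random.default_rng(1))
nu_t = simulate_particles(nu0, example_drift, example_diffusion,
                          1.0, 200, np.random.default_rng(1))

print("W2 at time 0:", w2_empirical_1d(mu0, nu0))
print("W2 at time 1:", w2_empirical_1d(mu_t, nu_t))
```

Both runs are driven by the same noise (same seed), so the printed distances compare how the two initial samples evolve under a synchronous coupling; this gives an upper estimate for the distance between the empirical laws and is not a substitute for the estimates of the paper.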
To characterize the regularity of the law $\mathcal L_{X_t^\mu}$ with respect to the initial distribution $\mu$, where $X_t^\mu$ denotes the solution of (1.1) with $\mathcal L_{X_0}=\mu$, we investigate the derivative estimate of the functions

$$\mathcal P_2\ni\mu\mapsto P_tf(\mu):=\mathbb E[f(X_t^\mu)],\quad t>0,\ f\in\mathcal B_b(\mathbb R^d).$$

When the noise coefficient $\sigma_t(x,\mu)$ does not depend on $\mu$, the Harnack inequality and derivative formula have been established in [13,10] for $P_tf$ by using coupling by change of measures and Malliavin calculus, respectively. See also [2,7,8,12] for extensions to distribution-path dependent SDEs/SPDEs, singular distribution dependent SDEs, and distribution dependent SDEs with jumps; among these, [12] allows the noise to be distribution dependent as well and establishes the gradient estimate on $P_tf(x):=(P_tf)(\delta_x)$, i.e. when the initial distribution is a Dirac measure. In this paper, we estimate the derivative of $P_tf(\mu)$ in $\mu$ by using the heat kernel parameter expansion with respect to the frozen SDE

$$dX_t^{z,\mu}=b_t(z,\mu_t)\,dt+\sigma_t(z,\mu_t)\,dW_t \tag{1.2}$$

for fixed $(z,\mu)\in\mathbb R^d\times\mathcal P_2$, where $\mu_t:=\mathcal L_{X_t^\mu}$. Since the coefficients of this SDE do not depend on the state variable, the solution has a Gaussian heat kernel which can be easily analyzed.
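To make the last remark concrete: a solution of the frozen SDE started at $x$ is Gaussian with mean $x+\int_0^t b_u\,du$ and covariance $\int_0^t\sigma_u\sigma_u^*\,du$, where $b_u$ and $\sigma_u$ stand for the frozen coefficients. The sketch below evaluates this density numerically. The time-dependent coefficients `b_vals` and `sigma_vals` are hypothetical stand-ins for $b_u(z,\mu_u)$ and $\sigma_u(z,\mu_u)$, and the helper names are assumptions, not notation from the paper.

```python
import numpy as np


def gaussian_kernel(x, y, mean_shift, cov):
    """Density at y of the normal law N(x + mean_shift, cov) on R^d."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.atleast_1d(np.asarray(y, dtype=float))
    m = np.atleast_1d(np.asarray(mean_shift, dtype=float))
    a = np.atleast_2d(np.asarray(cov, dtype=float))
    d = x.size
    diff = y - x - m
    quad = diff @ np.linalg.solve(a, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(a))
    return float(np.exp(-0.5 * quad) / norm)


def integrate_trapezoid(values, times):
    """Trapezoidal rule along the first axis of `values`."""
    values = np.asarray(values, dtype=float)
    dt = np.diff(np.asarray(times, dtype=float))
    mid = 0.5 * (values[1:] + values[:-1])
    return np.tensordot(dt, mid, axes=(0, 0))


# Hypothetical frozen coefficients on [0, 1] (illustration only, d = m = 2).
times = np.linspace(0.0, 1.0, 201)
b_vals = np.stack([np.cos(times), np.ones_like(times)], axis=1)
sigma_vals = np.zeros((times.size, 2, 2))
sigma_vals[:, 0, 0] = 1.0 + times
sigma_vals[:, 1, 0] = 0.5
sigma_vals[:, 1, 1] = 1.0

mean_shift = integrate_trapezoid(b_vals, times)
cov = integrate_trapezoid(sigma_vals @ sigma_vals.transpose(0, 2, 1), times)
density = gaussian_kernel(np.zeros(2), np.array([0.5, 1.0]), mean_shift, cov)
print(mean_shift, cov, density)
```

The design point is that only two deterministic integrals are needed; no simulation of the frozen SDE is required to evaluate its transition density.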
Before introducing the main result, we first recall the intrinsic derivative and the L-derivative for functions on $\mathcal P_2$. These go back to [1], where the intrinsic derivative on the configuration space was introduced; see [11] for the links between different derivatives for measures.
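(1) A function $f:\mathcal P_2\to\mathbb R$ is called intrinsically differentiable, if for any $\mu\in\mathcal P_2$,

$$L^2(\mathbb R^d\to\mathbb R^d;\mu)\ni\phi\mapsto D^L_\phi f(\mu):=\lim_{\varepsilon\downarrow 0}\frac{f(\mu\circ(\mathrm{Id}+\varepsilon\phi)^{-1})-f(\mu)}{\varepsilon}\in\mathbb R$$

is a well defined bounded linear functional. In this case, the unique map

$$\mathcal P_2\ni\mu\mapsto D^Lf(\mu)\in L^2(\mathbb R^d\to\mathbb R^d;\mu)$$

such that

$$\langle D^Lf(\mu),\phi\rangle_{L^2(\mu)}=D^L_\phi f(\mu)$$

holds for any $\mu\in\mathcal P_2$ and $\phi\in L^2(\mathbb R^d\to\mathbb R^d;\mu)$ is called the intrinsic derivative of $f$, and we denote

$$\|D^Lf(\mu)\|:=\|D^Lf(\mu)\|_{L^2(\mu)}.$$

If moreover

$$\lim_{\|\phi\|_{L^2(\mu)}\to 0}\frac{|f(\mu\circ(\mathrm{Id}+\phi)^{-1})-f(\mu)-D^L_\phi f(\mu)|}{\|\phi\|_{L^2(\mu)}}=0,$$

we call $f$ L-differentiable, and in this case $D^Lf$ is also called the L-derivative of $f$.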
(4) A vector- or matrix-valued function is said to belong to one of the classes defined above if all of its component functions do.
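For orientation, consider the simplest example $f(\mu)=\int g\,d\mu$ with $g\in C_b^1(\mathbb R)$. Assuming the usual convention that the directional derivative of $f$ along $\phi$ is the derivative at $\varepsilon=0$ of $f(\mu\circ(\mathrm{Id}+\varepsilon\phi)^{-1})$, one gets $\int g'\phi\,d\mu$, so the intrinsic derivative is $g'$. The sketch below checks this by a finite difference on an empirical measure; the choices $g=\sin$, $\phi=\tanh$, the sample, and all function names are assumptions for illustration.

```python
import numpy as np


def linear_functional(g, samples):
    """f(mu) = integral of g against the empirical measure of `samples`."""
    return float(np.mean(g(samples)))


def directional_derivative_fd(g, samples, phi, eps=1e-6):
    """Finite-difference quotient of f along the direction phi.

    Pushing the empirical measure forward by Id + eps * phi simply moves
    every atom x to x + eps * phi(x).
    """
    pushed = samples + eps * phi(samples)
    return (linear_functional(g, pushed) - linear_functional(g, samples)) / eps


samples = np.random.default_rng(0).normal(size=1000)

# Hypothetical test function g = sin (so g' = cos) and direction phi = tanh.
fd_value = directional_derivative_fd(np.sin, samples, np.tanh)
exact_value = float(np.mean(np.cos(samples) * np.tanh(samples)))
print(fd_value, exact_value)
```

The two printed numbers should agree up to the discretization error of the difference quotient, which is of order `eps` for this smooth bounded example.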
To estimate the intrinsic derivative of $P_tf(\mu)$, we need the following condition. Let $|\cdot|$ and $\|\cdot\|$ denote the norm in $\mathbb R^d$ and the operator norm for linear operators, respectively.
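(H) For any $t\ge 0$, $b_t,\sigma_t\in C^{1,1}(\mathbb R^d\times\mathcal P_2)$, and there exists an increasing function $K:[0,\infty)\to[0,\infty)$ such that for any $t\ge 0$, $x,y\in\mathbb R^d$ and $\mu\in\mathcal P_2$,

It is well known that SDE (1.1) is well-posed under assumption (H), so that $P_tf$ is well defined on $\mathcal P_2$ for any $t\ge 0$ and $f\in\mathcal B_b(\mathbb R^d)$. In general, for any $s\ge 0$ and $X^\mu_{s,s}\in L^2(\Omega\to\mathbb R^d,\mathcal F_s,\mathbb P)$ with $\mathcal L_{X^\mu_{s,s}}=\mu$, let $X^\mu_{s,t}$ be the unique solution of (1.1) for $t\ge s$:

$$dX^\mu_{s,t}=b_t(X^\mu_{s,t},\mathcal L_{X^\mu_{s,t}})\,dt+\sigma_t(X^\mu_{s,t},\mathcal L_{X^\mu_{s,t}})\,dW_t,\quad t\ge s,\ \mathcal L_{X^\mu_{s,s}}=\mu\in\mathcal P_2. \tag{1.3}$$

We denote $P^*_{s,t}\mu=\mathcal L_{X^\mu_{s,t}}$ and investigate the regularity of

$$P_{s,t}f(\mu):=\int_{\mathbb R^d}f\,d(P^*_{s,t}\mu),\quad f\in\mathcal B_b(\mathbb R^d),\ t\ge s\ge 0,$$

in $\mu\in\mathcal P_2$. By the uniqueness, we have the flow property $P^*_{s,t}=P^*_{r,t}P^*_{s,r}$ for $0\le s\le r\le t$.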
However, due to the distribution dependence, $P_{s,t}$ is no longer a semigroup, i.e. in general $P_{s,t}\ne P_{r,t}P_{s,r}$, so that the regularity of $P_tf(\mu)$ in $\mu\in\mathcal P_2$ cannot be deduced from that of $P_tf(x):=P_tf(\delta_x)$ for $x\in\mathbb R^d$; see for instance [13] for details. We now state the main result of the paper as follows.
Consequently, for any $t>s\ge 0$, $\mu,\nu\in\mathcal P_2$,

Remark 1.1. We may also apply Malliavin calculus to establish a derivative formula for $D^LP_{s,t}f(\mu)$ as in [12], where the usual derivative in initial points (rather than in initial distributions) is studied. However, this approach requires stronger conditions on the coefficients, namely that $b_t(x,\mu)$ and $\sigma_t(x,\mu)$ also have bounded second order derivatives in $x$. Let us explain this in more detail. Firstly, under (H), the Malliavin matrix for some constant $c(t)>0$, see [10, Proposition 3.2].
Then for any $f\in C_b^1(\mathbb R^d)$, by the chain rule and the integration by parts formula for the Malliavin gradient $D$, we have where $D^*$ is the Malliavin divergence. To make the above calculations meaningful, we need to verify that (M −1 belongs to the domain of $D^*$, for which the second order derivatives of the coefficients will be involved. For instance, as shown in [10, Proposition 3.2], $v^\phi_{s,t}$ solves an SDE involving the first order derivatives of $b$ and $\sigma$; taking the Malliavin derivative of this SDE, we see that $Dv^\phi_{s,t}$ solves an SDE containing the second order derivatives of the coefficients.

The remainder of the paper is organized as follows. In Section 2, we formulate $P_{s,t}f(\mu)$ using classical SDEs with parameter $\mu$ and the parameter expansion of heat kernels with respect to the frozen SDE (1.2), and estimate the L-derivative for functions of $P^*_{s,t}\mu$. With these preparations, we prove Theorem 1.1 in Section 3.

Preparations
We first represent $P_{s,t}f(\mu)$ by using a Markov semigroup $P^\mu_{s,t}$ with parameter $\mu$, then introduce the heat kernel expansion of $P^\mu_{s,t}$ with respect to the frozen SDEs. Since the frozen SDE has an explicit Gaussian heat kernel, this enables us to calculate the intrinsic derivative of $P_{s,t}f(\mu)$ with respect to $\mu$.

A representation of $P_{s,t}$
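For any $s\ge 0$, $x\in\mathbb R^d$ and $\mu\in\mathcal P_2$, consider the decoupled SDE

$$dX^{x,\mu}_{s,t}=b_t(X^{x,\mu}_{s,t},P^*_{s,t}\mu)\,dt+\sigma_t(X^{x,\mu}_{s,t},P^*_{s,t}\mu)\,dW_t,\quad X^{x,\mu}_{s,s}=x,\ t\ge s. \tag{2.1}$$

In this SDE, the measure variable $P^*_{s,t}\mu$ is fixed, so that it reduces to a classical time inhomogeneous SDE. Let $P^\mu_{s,t}$ be the associated Markov semigroup, i.e.

$$P^\mu_{s,t}f(x):=\mathbb E[f(X^{x,\mu}_{s,t})],\quad t\ge s,\ x\in\mathbb R^d,\ f\in\mathcal B_b(\mathbb R^d).$$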
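Since $X^\mu_{s,t}$ solves (2.1) with the random initial value $X^\mu_{s,s}$ replacing $x$, and since $\mathcal L_{X^\mu_{s,s}}=\mu$, by the standard Markov property of solutions to (2.1), we have

$$P_{s,t}f(\mu)=\int_{\mathbb R^d}P^\mu_{s,t}f(x)\,\mu(dx),\quad f\in\mathcal B_b(\mathbb R^d).$$

Since for any holds for some constant $C>0$.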
Proof. Since (H) implies that $P^*_{s,t}\mu$ is Lipschitz continuous in $\mu\in\mathcal P_2$, see for instance [13], the desired assertions follow from (H) and the Bismut formula To prove (2.4), for fixed $t>s$, take Then the Malliavin derivative w x,µ , w x,µ s,s = 0, see for instance [10, Proposition 3.5]. It is easy to see from (2.5) that $\tilde v_r:=(r-s)v^{x,\mu}_{s,r}$ solves the same equation. By the uniqueness we obtain $(t-s)v^{x,\mu}_{s,t}=D_hX^{x,\mu}_{s,t}$, so that the chain rule and the integration by parts formula yield

Consequently, there exists a constant $C>0$ such that for any $f\in\mathcal B_b(\mathbb R^d)$ and $\mu\in\mathcal P_2$,

Proof. Obviously, (2.8) is implied by (2.3) and (2.7). So, we only need to prove that $P_{s,t}f(\mu)$ is L-differentiable and satisfies (2.7).
(1) We first prove that P s,t f (µ) is intrinsically differentiable and satisfies (2.7). For any So, for any µ ∈ P 2 , the function Combining this with (2.2), (2.3) and (2.6), and using the dominated convergence theorem, we conclude that the map is a bounded linear functional, so that by definition, P s,t f (µ) is intrinsically differentiable in µ ∈ P 2 , and the formula (2.7) holds true.
(2) By (2.7), for any $\phi\in L^2(\mathbb R^d\to\mathbb R^d;\mu)$, we have Combining this with Lemma 2.1, (2.6), and the L-differentiability of $P^\mu_{s,t}f(x)$ in $\mu$, we may apply the dominated convergence theorem to derive

According to Lemma 2.2, to estimate $\|D^LP_{s,t}f(\mu)\|$, it remains to investigate the L-derivative of $P^\mu_{s,t}f(x)$ in $\mu$. To this end, we let $p^\mu_{s,t}(x,y)$ be the heat kernel of $P^\mu_{s,t}$ for $t>s$, which exists and is differentiable in $x$ and $y$ under condition (H). We have

$$P^\mu_{s,t}f(x)=\int_{\mathbb R^d}p^\mu_{s,t}(x,y)f(y)\,dy,\quad f\in\mathcal B_b(\mathbb R^d).$$

So, to investigate the L-derivative of $P^\mu_{s,t}f(x)$, we need to study that of $p^\mu_{s,t}(x,y)$, for which we will use the heat kernel parameter expansion.

Parameter expansion for $p^\mu_{s,t}$
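Since the heat kernel $p^\mu_{s,t}$ is less explicit, we make use of its parameter expansion with respect to the heat kernel of the Gaussian process

$$X^{x,\mu,z}_{s,r,t}:=x+\int_r^tb_u(z,P^*_{s,u}\mu)\,du+\int_r^t\sigma_u(z,P^*_{s,u}\mu)\,dW_u,\quad t\ge r\ge s,$$

for fixed $z\in\mathbb R^d$ and $\mu\in\mathcal P_2$. For any $t\ge r\ge s\ge 0$, let

$$m^{\mu,z}_{s,r,t}:=\int_r^tb_u(z,P^*_{s,u}\mu)\,du,\quad m^{\mu,z}_{s,t}:=m^{\mu,z}_{s,s,t},\qquad a^{\mu,z}_{s,r,t}:=\int_r^t(\sigma_u\sigma_u^*)(z,P^*_{s,u}\mu)\,du,\quad a^{\mu,z}_{s,t}:=a^{\mu,z}_{s,s,t}. \tag{2.11}$$

By (H), we have

$$|m^{\mu,z}_{s,r,t}|+|a^{\mu,z}_{s,r,t}|\le(t-r)K_t,\quad t\ge r\ge s\ge 0. \tag{2.12}$$

Obviously, the law of $X^{x,\mu,z}_{s,r,t}$ is the $d$-dimensional normal distribution centered at $x+m^{\mu,z}_{s,r,t}$ with covariance matrix $a^{\mu,z}_{s,r,t}$, i.e. the distribution density function is

$$p^{\mu,z}_{s,r,t}(x,y)=\frac{\exp\big[-\frac12\big\langle(a^{\mu,z}_{s,r,t})^{-1}(y-x-m^{\mu,z}_{s,r,t}),\,y-x-m^{\mu,z}_{s,r,t}\big\rangle\big]}{(2\pi)^{\frac d2}(\det a^{\mu,z}_{s,r,t})^{\frac12}},\quad x,y\in\mathbb R^d.$$

When $r=s$, we simply denote $p^{\mu,z}_{s,t}=p^{\mu,z}_{s,s,t}$, so that

For any $0\le s\le r<t$ and $y,z\in\mathbb R^d$, let

$$H^\mu_{s,r,t}(y,z):=\big\langle b_r(z,P^*_{s,r}\mu)-b_r(y,P^*_{s,r}\mu),\,\nabla p^{\mu,z}_{s,r,t}(\cdot,z)(y)\big\rangle+\frac12\mathrm{tr}\Big[\big\{(\sigma_r\sigma_r^*)(z,P^*_{s,r}\mu)-(\sigma_r\sigma_r^*)(y,P^*_{s,r}\mu)\big\}\nabla^2p^{\mu,z}_{s,r,t}(\cdot,z)(y)\Big]. \tag{2.15}$$

By the parameter expansion, see for instance [9, .14), to estimate $D^LP^\mu_{s,t}f$, it suffices to study the L-derivative of $b_r(y,P^*_{u_1,u_2}\mu)$ and $(\sigma_r\sigma_r^*)(y,P^*_{u_1,u_2}\mu)$ in $\mu$ for $r\ge 0$ and $u_2\ge u_1\ge 0$. So, we present the following lemma.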
Proof. It suffices to prove the first assertion. We first prove the intrinsic differentiability. Let $\mu\in\mathcal P_2$ and $\phi\in L^2(\mathbb R^d\to\mathbb R^d;\mu)$. Since $\mathcal L_{X^\mu_{s,s}}=\mu$ implies $\mathcal L_{X^\mu_{s,s}+\varepsilon\phi(X^\mu_{s,s})}=\mu\circ(\mathrm{Id}+\varepsilon\phi)^{-1}$, $\varepsilon\ge 0$, we have $\mathcal L_{X^\varepsilon_{s,t}}=P^*_{s,t}(\mu\circ(\mathrm{Id}+\varepsilon\phi)^{-1})$ for $X^\varepsilon_{s,t}$ solving (1.3) with initial value $X^\varepsilon_{s,s}=X^\mu_{s,s}+\varepsilon\phi(X^\mu_{s,s})$. By [10, Proposition 3.1] for η = φ(X µ 0 ) and [10, (4.21)] for time $s$ replacing 0, for ; P) for any $T>0$, and solves the linear SDEs: v φ,δ s,s = φ(X 0 ), $t\ge s$. From (H) we see that $v^{\phi,\varepsilon}_{s,t}$ is continuous in $\varepsilon$ and By the chain rule, see for instance [10, Proposition 3.1], we have Combining this with (H) and (2.20), we obtain Therefore, $F(P^*_{s,t}\mu)$ is intrinsically differentiable in $\mu$ such that (2.18) holds.

It remains to verify the L-differentiability. By the chain rule and (2.21), we obtain Combining this with $F\in C^1(\mathcal P_2)$ with bounded $D^LF$, the continuity of $v^{\phi,\varepsilon}_{s,t}$ in $\varepsilon$, (2.20), and that $X^\varepsilon_{s,t}\to X^\mu_{s,t}$ when $\|\phi\|_{L^2(\mu)}\to 0$, by the dominated convergence theorem we prove

Proof of Theorem 1.1
According to Lemma 2.2, (2.10) and (2.16), to estimate $\|D^LP_{s,t}f(\mu)\|$, it suffices to handle the derivative of $p^\mu_{s,t}$ and $H^{\mu,m}_{s,r,t}$ in $\mu$. To this end, for fixed $T>0$, we introduce the Gaussian heat kernel which satisfies the Chapman-Kolmogorov equation By (H), there exists a constant $K_1(T)$, which increases in $T$, such that p µ,z s,r,t (y, z) ≤ K 1 (T )h T (t − r, y − z)e − |y−z| 2 Consequently, there exists a constant $K_2(T)$, which increases in $T$, such that

There exists a constant $\tilde K_T>0$ which increases in $T>0$, such that for any $0\le s\le r<t\le T$, $y,z\in\mathbb R^d$ and $m\ge 1$, $p^{\mu,z}_{s,r,t}(y,z)$ and $H^{\mu,m}_{s,r,t}$ are L-differentiable in $\mu\in\mathcal P_2$ satisfying for some constant $C_1(T)>0$ increasing in $T$, and all $0\le s\le r<t\le T$, $\mu\in\mathcal P_2$ and $y,z\in\mathbb R^d$. Combining this with (H), (2.13), (3.7) and applying Lemma 2.3, we prove the L-differentiability of $p^{\mu,z}_{s,r,t}(y,z)$ in $\mu\in\mathcal P_2$ and the estimate (3.4).
Next, by (H), (2.13), (2.15) and (3.7), we find constants $C_2(T),C_3(T)>0$ increasing in $T>0$ such that for any $0\le s\le r<t\le T$, $\mu\in\mathcal P_2$ and $y,z\in\mathbb R^d$, |H µ s,r,t (y, z)| ≤ C 2 (T )p µ,z s,r,t (y, z)|y − z| (3.8) Assume that for some $k\ge 1$ we have In conclusion, for any $m\ge 1$, we have .
for some constant c(t) > 0. Then Therefore, P s,t f (γ) is continuous in γ ∈ P 2 and the proof is then finished.