Backward propagation of chaos

This paper develops a theory of propagation of chaos for a system of weakly interacting particles whose terminal configuration is fixed, as opposed to the initial configuration as is customary. Such systems are modeled by backward stochastic differential equations. Under standard assumptions on the coefficients of the equations, we prove propagation of chaos results and quantitative estimates on the rate of convergence in Wasserstein distance of the empirical measure of the interacting system to the law of a McKean-Vlasov type equation. These results are accompanied by non-asymptotic concentration inequalities. As an application, we derive rate of convergence results for solutions of second order semilinear partial differential equations to the solution of a partial differential equation written on an infinite dimensional space.


Introduction
The theory of propagation of chaos takes its origin in the work of M. Kac [39], whose initial aim was to investigate particle system approximations of some nonlocal partial differential equations (PDEs) arising in thermodynamics. The intuitive idea is the following: Consider a large number n of (random) particles starting from n given independent and identically distributed random variables and whose respective dynamics interact. Because there is no deterministic pattern for the starting position of the particles, one says that the initial configuration is chaotic. Kac's insight was that if the interaction between the particles is "sufficiently weak" and the particles are "symmetric", then as the size of the system increases, there is less and less interaction and in the limit the particles "become independent". That is, the initial chaotic configuration propagates over time. This intuition was put on firm mathematical footing notably by [48], [53] and [26] and has generated a rich literature with a variety of fundamental applications. We refer for instance to [47,41,7,52,35,37,36] for a few recent developments and applications. In particular, the theory of propagation of chaos has undoubtedly motivated (and benefited from) the more recent and very active theory of mean-field games introduced by [43] and [33].
The basic question motivating the present work is to ask whether Kac's intuition carries over to systems of particles with chaotic terminal configurations. There are numerous such examples, for instance in quantitative finance where different parties independently set investment goals which need to be met at a prescribed future date, but with inter-temporal trading decisions that are correlated. More precisely, we ask whether a chaotic terminal configuration will propagate to past configurations as the size of the system becomes large. As mentioned above, an important application at the origin of the theory of propagation of chaos is the particle system approximation of some nonlocal PDEs. We also analyze such an application in the present setting and use the backward propagation of chaos viewpoint to derive a particle system approximation of a semilinear PDE written on an infinite dimensional space (akin to the master equation in the theory of mean-field games). The interest here lies in the fact that, being written on a finite dimensional space, the approximating PDEs are much easier to handle analytically. For instance, well-developed theories of weak solutions and interior estimates for the gradients are available for such equations. The main idea leading to this approximation result is the probabilistic representation of solutions of some parabolic PDEs, especially due to [20], which allows us to transform the problem of approximating PDE solutions into a purely probabilistic question.
In the present paper, we model backward particles by solutions of backward stochastic differential equations (BSDEs) as introduced by [50]: where W 1 , . . . , W n are independent Brownian motions and δ x is the Dirac mass at a vector x. That is, the interaction is through the empirical distribution of the system.
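The interacting system referred to here can be sketched in the following form (a hedged reconstruction from the notation Y^{i,n}, Z^{i,j,n} and L_n used throughout the paper; the authoritative display is equation (2.1) of the paper):

```latex
Y^{i,n}_t \;=\; G^i \;+\; \int_t^T F\big(u,\, Y^{i,n}_u,\, Z^{i,i,n}_u,\, L_n(\mathbf{Y}_u)\big)\,\mathrm{d}u
\;-\; \sum_{j=1}^n \int_t^T Z^{i,j,n}_u\,\mathrm{d}W^j_u,
\qquad i = 1, \dots, n,
```

where L_n(Y_u) = (1/n) Σ_{k=1}^n δ_{Y^{k,n}_u} is the empirical distribution of the system at time u.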
In the main contributions of the paper we derive various convergence results of the n-particle system to solutions of McKean-Vlasov BSDEs of the form (1.2). A similar equation is derived in related work; we recover their result by a different argument based on functional inequalities for BSDEs. In [30] (where the term "backward propagation of chaos" was first coined) a convergence result for the empirical measure of the interacting particles is obtained; however, nothing is said concerning the rate of convergence. Another somewhat related article is the work by [8] on the approximation of BSDEs with normal constraints in law.
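In the same notation, a McKean-Vlasov BSDE of the type described, labeled (1.2) in the paper, can be sketched as follows (a hedged reconstruction: the law L(Y_u) replaces the empirical measure of the particle system):

```latex
Y_t \;=\; G \;+\; \int_t^T F\big(u,\, Y_u,\, Z_u,\, \mathcal{L}(Y_u)\big)\,\mathrm{d}u
\;-\; \int_t^T Z_u\,\mathrm{d}W_u, \qquad t \in [0, T].
```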
The ideas and results of the present paper are also connected to the theory of mean field games, which has recently attracted a surge of interest. In fact, BSDEs of mean-field type arise naturally in optimality conditions for mean field games (MFG) with interactions through the controls, which are sometimes referred to as "extended MFG" or "MFG of controls" and have been introduced by [27]. Such models are particularly relevant in economics and finance, cf. e.g. [19,15]. The connection with mean-field BSDEs stems from Pontryagin's maximum principle and has been stressed by [16, Section 4.7.1] and more recently by [1]. We provide a more extensive discussion on the applications of our results to large population games and mean-field games in [44]. Another application of the limit theorems investigated here concerns the study of particle systems with known terminal positions. We refer for instance to works on crowd motion with tagged pedestrians by Aurell and Djehiche in [4,3]. Here, Y^{i,n} represents the position of a given pedestrian (in a population of size n) who must be at position G^i at time T. Our results show that the path followed in the infinite population limit is given by (1.2).
Concerning the approximation problem of PDEs on the Wasserstein space by PDEs on finite dimensional Euclidean spaces, let us mention that a similar question was first analyzed by [14] (see also [42,12]) based on PDE estimations they derive for the finite dimensional system. Their results concern the quasilinear form of the master equation. Our contribution here is mainly methodological, as we obtain a convergence result by purely probabilistic techniques. However, our setting also differs from that of [14,12] in a number of ways, the most important difference being the type of nonlinearities in the measure argument that we consider.
In the rest of the paper, we dedicate Section 2 to the presentation of the precise setting of the work and its main results. The proofs are postponed to Section 3.

Setting and notation
Let d, m ∈ N be fixed. Unless otherwise specified, R d , R m and R d×m are endowed with the Euclidean norm denoted by | · | in all cases. Let us denote by (Ω, F, P ) a probability space carrying a sequence (W i ) i≥1 of d-dimensional Brownian motions. As usual, equalities and inequalities between random variables will be understood to hold up to null sets of the Wiener measure P . Denote by F n := (F n t ) t∈[0,T ] the completion of the raw filtration of W 1 , . . . , W n . Let us equip Ω with the filtration F n . We will always use the identification W ≡ W 1 and F ≡ F 1 .
Given a vector x := (x_1, . . . , x_n) ∈ (R^m)^n, denote by L_n(x) := (1/n) Σ_{k=1}^n δ_{x_k} the empirical measure associated to x. Then L_n(x) ∈ P_p(R^m), the set of probability measures on R^m with finite p-th moment. Let us be given a function F : [0, T] × Ω × R^m × R^{m×d} × P_2(R^m) → R^m and a family of F_T-measurable i.i.d. random variables G^1, . . . , G^n. We are interested in the asymptotic behavior (as n becomes large) of a family of weakly interacting processes (Y^{1,n}, . . . , Y^{n,n}) evolving backward in time and given by (2.1), where we use the notation Y := (Y^{1,n}, . . . , Y^{n,n}). Here, as in the remainder of the article, we assume that for every (y, z, µ) ∈ R^m × R^{m×d} × P_2(R^m) the stochastic process F(·, ·, y, z, µ) : (t, ω) → F(t, ω, y, z, µ) is progressively measurable. In analogy to weakly interacting particles evolving forward in time, in the limit the above family will be intrinsically linked to the so-called McKean-Vlasov BSDE (2.2). Hereby (and henceforth) L(X) denotes the law of the random variable X with respect to the probability measure P. Since under our assumptions on F and G^i the processes (Y^i)_i will be i.i.d., we will often omit the superscript i and simply write L(Y) for the law of Y^i. We equip the space P_p(R^m) with the p-th order Wasserstein distance W_p, defined via an infimum over probability measures π on R^m × R^m with first and second marginals µ and ν, respectively. Given p ∈ [1, 2], we will often consider the condition (Lip_p): the function F is L_F-Lipschitz continuous and of linear growth, in the sense that there is a constant L_F ≥ 0 such that the corresponding estimates hold for all t ∈ [0, T], y, y′ ∈ R^m, z, z′ ∈ R^{m×d} and µ, µ′ ∈ P_p(R^m). Throughout, we denote by Y := (Y^{1,n}, . . . , Y^{n,n}) the value process of the solution of (2.1) and by Y that of (2.2), say with i = 1.
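The p-th order Wasserstein distance used throughout admits the standard display:

```latex
\mathcal{W}_p(\mu, \nu) \;:=\; \inf_{\pi \in \Pi(\mu, \nu)}
\Big( \int_{\mathbb{R}^m \times \mathbb{R}^m} |x - y|^p \, \pi(\mathrm{d}x, \mathrm{d}y) \Big)^{1/p},
```

where Π(µ, ν) denotes the set of probability measures on R^m × R^m with first and second marginals µ and ν, respectively.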
Having made precise the probabilistic setting governing the paper, let us now present its main results. Most of them pertain to the limiting behavior of Y^{i,n}. As explained in the introduction, we also deduce approximations of parabolic PDEs on the Wasserstein space. The focus is put on quantitative (i.e. non-asymptotic) estimates of convergence rates. All proofs are postponed to Section 3.

Convergence of empirical distributions
We start by showing that the empirical distribution L n (Y t ) of the system converges to the law L(Y t ) of the McKean-Vlasov BSDE.
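As a purely numerical illustration of this mode of convergence (a sketch under simplifying assumptions: one-dimensional i.i.d. standard Gaussian samples stand in for the particle values at a fixed time, and `w1_empirical` is a helper introduced here, not an object from the paper), the Wasserstein-1 distance between two independent empirical measures of the same law shrinks as the sample size n grows:

```python
import random

def w1_empirical(x, y):
    # 1-D Wasserstein-1 distance between two empirical measures with the
    # same number of atoms: in dimension one the optimal coupling is
    # monotone, so the distance is the mean absolute difference of the
    # sorted samples.
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed for reproducibility
dists = {}
for n in (100, 10_000):
    # two independent n-samples of N(0, 1), playing the role of the
    # empirical measure L_n(Y_t) and of samples from the limit law L(Y_t)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    reference = [rng.gauss(0.0, 1.0) for _ in range(n)]
    dists[n] = w1_empirical(particles, reference)

print(dists)  # the distance for n = 10_000 is markedly smaller
```

The observed decay is consistent with the quantitative rates discussed in this section, though the one-dimensional setup is of course much simpler than the interacting BSDE system.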
It is well-known that the Wasserstein topology is much stronger than the weak topology. Thus, Theorem 2.2 shows, in particular, that the sequence of (random) measures (L_n(Y_t))_n converges to the (deterministic) measure L(Y_t) in the weak topology. This can be seen as a type of quantitative law of large numbers. As a direct application we obtain the following strong law of large numbers for the sequence Y^{i,n}. Remark 2.3. The reader may wonder whether the Lipschitz-continuity condition (Lip_p) on the generator F can be generalized to allow quadratic generators. The main issue here is that (2.1) is actually a system of coupled BSDEs and, as is well-known, the well-posedness of multi-dimensional quadratic BSDEs cannot be guaranteed in general. More delicate structural conditions need to be imposed. We refer for instance to [40,21,31,38,55,46] for ample discussions on this issue.
However, if the condition (Lip_p) is replaced by a quadratic growth condition in z, the same conclusions hold: the argument of proof is the same, and it will be explained after the proof of Theorem 2.2, see Section 3.1.

Corollary 2.4.
Let p ∈ [1, 2). Assume that E[|G^i|^2] < ∞ and that F satisfies (Lip_p). Then we have the corresponding L^1(Ω, P)-limit. Proof. By the Kantorovich-Rubinstein duality, we have E[|∫ f dL_n(Y_t) − ∫ f dL(Y_t)|] ≤ C r_{n,m,q,p} for some p < q < 2 and for every 1-Lipschitz function f : R^m → R. In particular, taking f to be the coordinate maps x → x_j yields the result.
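The Kantorovich-Rubinstein duality invoked in the proof above is the standard identity

```latex
\mathcal{W}_1(\mu, \nu) \;=\; \sup\Big\{ \int f\,\mathrm{d}\mu - \int f\,\mathrm{d}\nu
\;:\; f \colon \mathbb{R}^m \to \mathbb{R} \ \text{1-Lipschitz} \Big\},
```

so that any single 1-Lipschitz test function yields a lower bound on the Wasserstein-1 distance.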

Remark 2.5.
Under a stronger integrability condition, namely that E[|G^i|^k] < ∞ for some k > m + 5, the argument of the above theorem allows us to obtain the bound (2.5) for some constant C depending only on T, m, L_F, G and E[|G^i|^k].
The estimates (2.3) and (2.5) are uniform in time in the sense that the convergence rate is time-independent, but the supremum (in t) can be taken only outside the expectation on the left hand side. A stronger uniform estimate can be obtained, at the cost of stronger integrability conditions and a worse convergence rate.
However, this condition is by no means a restrictive one, since it has been shown to hold in many classical cases. For instance, it holds when G^i = G(W^i_T) for a bounded and Lipschitz continuous function G. Alternatively, under conditions on the Malliavin differentiability of G and F, it can be shown that Z is even bounded, see [22,40] for details. The results of these papers apply, for instance, when G is Lipschitz continuous on the path space equipped with the supremum norm and F is deterministic; in this case, the integrability condition on G also follows. Furthermore, a similar condition is also used in [10, Theorem 4.3] to obtain rates of convergence, and the authors provide a sufficient condition as well as an example in [10, Lemma 4.4 and Example 4.5]. Lastly, in some cases, Z can also be bounded as a random variable, see [9].

Concentration estimates
Given two probability measures Q 1 and Q 2 on Ω, let us denote the p th order Wasserstein distance on Ω equipped with the supremum norm by where the infimum is over probability measures π on Ω × Ω with first and second marginals Q 1 and Q 2 , respectively. The following result gives concentration estimates for the interacting family Y. We consider concentration for the time t marginal as well as for the law of the entire process. Theorem 2.8. Let p ∈ [1,2]. Assume that E[|G i | k ] < ∞ for some k > 2p, and that F satisfies (Lip p ). Then it holds that, for all ε ∈ (0, ∞) and ε F,T := ε/ exp(T e L F T ), with b n,k,ε := n(nε) −(k−δ)/p and a n,ε := for three positive constants δ ∈ (0, k), C and c depending on p, m, k, T , L F and E[|G i | k ]. Moreover, if the functions F t and G i are also Lipschitz continuous as functions on (Ω, || · || ∞ ), that is, for a constant C depending only on L F , L G and T .
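The path-space Wasserstein distance introduced at the beginning of this subsection can be written in display form (a standard transcription of the verbal definition above):

```latex
\mathcal{W}_{p,\|\cdot\|_\infty}(Q_1, Q_2) \;:=\; \inf_{\pi}
\Big( \int_{\Omega \times \Omega} \|\omega - \omega'\|_\infty^p \, \pi(\mathrm{d}\omega, \mathrm{d}\omega') \Big)^{1/p},
```

the infimum being over probability measures π on Ω × Ω with first and second marginals Q_1 and Q_2, respectively.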
If in addition F does not depend on z, then there is n 0 ∈ N such that for all n ≥ n 0 we have for some constant C depending on T, L F , L G , m and k.
The proof of Theorem 2.8 relies on quadratic transportation inequalities for BSDEs investigated in [5] and on standard results from the theory of concentration of measure, see Section 3.2.

Interacting particles approximation of McKean-Vlasov BSDE
This section is concerned with the convergence of the sequence of stochastic processes (Y^{i,n}, Z^{i,i,n}). The estimates hold for all q ∈ (2, k) and for some constant C depending on T, m, L_F, L_G and E[|Y^1_t|^k], where r_{n,m,q,2} is defined by (2.4).
Remark 2.10. The above result shows that, in general, the sequences (Y^{i,n}) and (Z^{i,i,n}) converge at the same rate as L_n(Y_t). In the special case of particles in "linear" interaction, such a convergence result has been analyzed in [11]. More precisely, [11] considers the case when Y = (Y^{1,n}, . . . , Y^{n,n}) solves the system where W is a given Brownian motion, and (G^1, . . . , G^n) are functions of the terminal values of a system of interacting (forward) particles. In this case, the rate of convergence of the n-particle system to the McKean-Vlasov equation can be improved and does not depend on the dimension. Interestingly, we can slightly generalize the result of [11] using different arguments. We consider a system that often appears in applications, see e.g. [32,6,18] for linear-quadratic mean-field models and [30] for a contract theory problem. We obtain the rate 1/√n, which is typically optimal, for this more general system.
In fact, consider the McKean-Vlasov equation (2.13) and the following Lipschitz continuity and linear growth conditions: the functions F and f are respectively L_F-Lipschitz and L_f-Lipschitz continuous and of linear growth in (y, z, a) and (y_1, y_2, z), uniformly with respect to (t, ω) and t respectively. That is, there are constants L_F, L_f ≥ 0 such that the corresponding estimates hold for all t ∈ [0, T], a, a′, y, y′, y_1, y_2, y′_1, y′_2 ∈ R^m and z, z′ ∈ R^{m×d}.
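The linearly interacting system described here can be sketched as follows (a hedged reconstruction consistent with Example 2.14 and the notation of (2.12); the authoritative display is in the paper):

```latex
Y^{i,n}_t \;=\; G^i \;+\; \int_t^T F\Big(u,\, Y^{i,n}_u,\, Z^{i,i,n}_u,\,
\frac{1}{n}\sum_{j=1}^n f_u\big(Y^{i,n}_u, Y^{j,n}_u, Z^{i,i,n}_u\big)\Big)\,\mathrm{d}u
\;-\; \sum_{j=1}^n \int_t^T Z^{i,j,n}_u\,\mathrm{d}W^j_u,
```

with the McKean-Vlasov counterpart (2.13) obtained by replacing the average over particles by an integral ∫_{R^m} f_u(Y_u, y′, Z_u) dL(Y_u)(y′) against the limit law.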
for some constant C depending only on T, L F , L f and L G .
Direct consequences of Theorem 2.9 and Proposition 2.12 are the following quantitative propagation of chaos results. Corollary 2.13. Put θ_{k,n} := Law(Y^{1,n}, . . . , Y^{k,n}) and let L(Y)^{⊗k} be the k-fold product of the law L(Y). If (Y^{i,n}, Z^{i,j,n}) and (Y^i, Z^i) solve (2.1) and (2.2) respectively, then under the conditions of Theorem 2.2 we have, for all n ∈ N and all k ≤ n, the stated bound for some constant C depending on T, L_F, L_G and m. If (Y^{i,n}, Z^{i,j,n}) and (Y^i, Z^i) solve (2.12) and (2.13) respectively, then under the conditions of Proposition 2.12 we have, for all (t, n) ∈ [0, T] × N and all k ≤ n, the stated bound for some constant C depending on T, L_F, L_G, L_f and m.

Example 2.14 (Convolution interaction).
In relation to a principal-agent problem of mean-field type, [30] investigated the case of the generator F_t(y, z, µ) := ϕ * µ(y) for some function ϕ : R^m → R^m, where the convolution ϕ * µ is defined as ϕ * µ(x) := ∫_{R^m} ϕ(x − y) dµ(y). This case falls within the scope of Proposition 2.12 (with F(t, y, z, a) = a and f(y, y′, z) = ϕ(y − y′)), and Corollary 2.13 additionally gives a sharp convergence rate.
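For concreteness, a convolution-type generator evaluated at an empirical measure reduces to a finite average over the particle positions. A minimal sketch (the scalar kernel `phi` below is a hypothetical choice for illustration, not taken from [30]):

```python
import math

def phi(v):
    # hypothetical scalar interaction kernel (illustration only)
    return math.tanh(v)

def generator(y, particles):
    # (phi * L_n)(y) = (1/n) * sum_k phi(y - y_k), where L_n is the
    # empirical measure of the particle positions y_1, ..., y_n
    return sum(phi(y - yk) for yk in particles) / len(particles)

# evaluate the generator at y = 0.5 against four particle positions
value = generator(0.5, [0.0, 1.0, -1.0, 2.0])
```

Since tanh is odd and the chosen positions are symmetric around y = 0.5, the four kernel terms cancel pairwise here; in general the average simply discretizes the convolution integral.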

Finite dimensional approximation of parabolic PDEs on the Wasserstein space
In this subsection, we assume that F does not depend on (t, ω). Given the four functions B, σ, F and G entering the PDE (2.17), ∂_µ V denotes the so-called Wasserstein derivative (also called L-derivative) of the function V in the direction of the probability measure µ, see e.g. [2,45] or [16, Chapter 5] for details. The goal of this section is to show that the solution V of the PDE (2.17), written on the infinite dimensional space [0, T] × R^d × P_2(R^d), can be approximated by a sequence of solutions of PDEs written on the finite dimensional space [0, T] × (R^d)^n. More precisely, we will be interested in a system of PDEs indexed by n. The following condition is copied almost verbatim from [20]; it guarantees the existence of a unique classical solution V to (2.17).
(PDE) The functions σ, B, F and G satisfy the following: (PDE1) The function σ is bounded, and the functions B, σ, F and G are three times continuously differentiable in w = (x, y, z) and µ, with bounded and Lipschitz-continuous first and second derivatives (with common bound and Lipschitz constant denoted L_F).
Under the condition (PDE), we then have the announced convergence of v^{1,n} to V. More precisely, we have: Theorem 2.15. Assume that F does not depend on (t, ω) and that the condition (PDE) is satisfied. Then the sequence (v^{1,n})_n converges to V in the sense that, for every i.i.d. family ξ_1, . . . , ξ_n of square integrable random variables, the estimate (2.20) holds,
where ε n is defined as and C L F ,T,k is a constant depending on L F , T and E[|ξ 1 | k ]. Moreover, for every n ∈ N and every t ∈ [0, T ] it holds that E |v i,n (t, ξ 1 , . . . , ξ n ) − V (t, ξ i , L n (ξ))| 2 ≤ C L F ,T (ε n + r n,d,k,2 ) (2.21) with ξ := (ξ 1 , . . . , ξ n ), where C L F ,T depends on the Lipschitz constant L F of B, F and G and on T , and r n,d,k,2 is defined by (2.4).

Proofs
In this final section we give detailed proofs of the results presented above. We start by justifying well-posedness of (2.2) and (2.1). For simplicity of notation, we will put Z^{i,n} := Z^{i,i,n} whenever this does not cause confusion.
Proof of Remark 2.1. By [16, Theorem 4.23], the equation (2.2) admits a unique solution (Y, Z) in the space S^2(R^m) × H^2(R^{m×d}), where we use the following notation for each integer k ≥ 1, with H^0(R^k) being the space of all R^k-valued progressively measurable processes.

Proofs for Subsection 2.2
We begin with two moment estimates for the solution of the McKean-Vlasov BSDE.
Given a square integrable progressive process q, we will denote by E_{s,t}(q · W) the associated stochastic exponential. Proof. If q = 2, there is nothing to prove because the result follows from Remark 2.1. Let us assume q < k. Since F is Lipschitz continuous in the z-variable, it is almost everywhere differentiable. Thus, it follows from the mean-value theorem, together with Girsanov's theorem, with Q being the probability measure with density dQ/dP := E_{0,T}(γ · W), that the value process can be estimated. Hence, using the linear growth of F we obtain a bound with a constant C_{F,k,q} depending only on k and F, and we conclude by Gronwall's inequality. Note that, since γ is a bounded process, the random variable E_{t,T}(γ · W) has moments of all orders. Furthermore, sup_{u∈[0,T]} M_2(L(Y_u)) < ∞ since Y ∈ S^2(R^m), see Remark 2.1.
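The stochastic exponential appearing here is presumably the Doléans-Dade exponential; for a bounded progressive process γ it reads

```latex
\mathcal{E}_{s,t}(\gamma \cdot W) \;:=\; \exp\Big( \int_s^t \gamma_u \,\mathrm{d}W_u
\;-\; \frac{1}{2} \int_s^t |\gamma_u|^2 \,\mathrm{d}u \Big),
```

so that dQ/dP := E_{0,T}(γ · W) defines the Girsanov change of measure used in the proof.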
Therefore, taking expectation on both sides and applying again Hölder's inequality concludes the argument.
for some C ≥ 0 and for all t 1 ≤ r < s < t ≤ t 2 .
(ii) E[|Y t − Y s | k ] ≤ C|t − s| for some C ≥ 0 and for every t 1 ≤ s ≤ t ≤ t 2 .
Proof. Let us start with the proof of (i). A direct estimation and repeated applications of Hölder's inequality yield Since k ≥ 3 and t − r ≤ 1, we can conclude from the above that where C k,T,F is a constant depending on k, T , L F and the 2k th moments of Y and Z. This proves the first claim. Let us turn to the proof of the second claim, which is similar (and simpler). In fact, arguing as above we get where we used the facts that t − s ≤ 1 and k ≥ 3.
In the case that F is bounded in z the argument is exactly the same. Since the terms |Z t | 2k will not appear in the estimates we can conclude without assumptions on the moments of Z.
Next, we will adapt to BSDEs a well-known coupling technique that will allow us to use some known quantitative bounds for i.i.d. samples in our interacting particles setting. This coupling technique, which originated in the work of [53], is by now standard in SDE theory, see e.g. [23,14] for recent references. Hence, let (Y, Z) be the solution of the McKean-Vlasov BSDE (2.2), and let (Ỹ^i, Z̃^i), i = 1, . . . , n, be independent copies of (Y, Z). Such copies can be found because the McKean-Vlasov BSDE has a unique solution, and thus we have uniqueness in law. We let Ỹ := (Ỹ^1, . . . , Ỹ^n).
Proof. Let i ∈ {1, . . . , n} be fixed. It follows by the mean-value theorem that ). Note that since F is Lipschitz continuous, the derivative ∂ z F can be defined almost everywhere, and is bounded. Thus, the density process defines an equivalent probability measure Q. Due to Girsanov's theorem and square integrability of Z i,j,n andZ i , taking the expectation above with respect to Q yields Again by Lipschitz continuity of F , it holds that so that by Gronwall's inequality, we have Hence, using the definition of the p th -Wasserstein distance, we obtain the estimate Now, combine this with the triangle inequality to obtain Applying again Gronwall's inequality yields the desired result.
With the above lemmas at hand, we are ready to prove quantitative estimates for the convergence of the empirical measure of Y_t to the law L(Y_t) of the McKean-Vlasov BSDE.
Proof of Theorem 2.2 and Proposition 2.6. The proofs begin with Lemma 3.3. In fact this lemma implies that for a constant C depending on L F , T, m, p and k. Therefore, the estimate (2.3) is obtained due to (3.6).
Since p ∈ [1,2], it follows by Jensen's inequality and the inequality W p ≤ W 2 that for all n ∈ N, Therefore, we deduce from (3.8) and (3.7) that E sup This concludes the proof, since N can be chosen less than T + 1.
Proof of Remark 2.3. In the quadratic case, the estimate of Lemma 3.1 for q = 2 is guaranteed by assumption. The proof of Lemma 3.3 remains the same, up to replacing the process γ u therein by the process By the quadratic growth assumption in z and square integrability assumption on Z i,i,n andZ i , the process (γ t ) t is square integrable. The rest of the proof is the same.

Proof of Theorem 2.8
The proofs of Theorem 2.8 and Proposition 2.12 partially rely on functional inequalities that we now recall for the reader's convenience; see [54, Chapters 21 & 22] for further details.
Let W_{2,δ} denote the Wasserstein distance of order 2 with respect to a distance δ on a Polish space E. A probability measure µ ∈ P(E) is said to satisfy Talagrand's T_2 inequality with constant C if W_{2,δ}(µ, ν) ≤ √(C H(ν|µ)) for every probability measure ν, where H is the Kullback-Leibler divergence, defined as H(ν|µ) := E_ν[log(dν/dµ)] when ν ≪ µ, with the convention E[X] := ∞ whenever E[X^+] = ∞, and H(ν|µ) := ∞ otherwise. Below, we will exploit the efficiency of Talagrand's inequality in deriving concentration inequalities, but also the fact that it implies other functional inequalities, notably the T_1 inequality W_{1,δ}(µ, ν) ≤ √(C H(ν|µ)) for every probability measure ν, and Poincaré's inequality for every (weakly) differentiable function f : E → R, where Var(f) is the variance with respect to the probability measure µ.
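In display form, the inequalities just recalled read as follows (constant conventions vary across references; the square roots are the standard normalization):

```latex
\mathcal{W}_{2,\delta}(\mu, \nu) \le \sqrt{C\, H(\nu \mid \mu)}, \qquad
\mathcal{W}_{1,\delta}(\mu, \nu) \le \sqrt{C\, H(\nu \mid \mu)}, \qquad
\operatorname{Var}_\mu(f) \le C \int_E |\nabla f|^2 \,\mathrm{d}\mu,
```

with H(ν|µ) := E_ν[log(dν/dµ)] when ν ≪ µ and H(ν|µ) := ∞ otherwise; the T_1 inequality and the Poincaré inequality both follow from T_2.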
Proof of Theorem 2.8. The proof of the first concentration bound also uses Lemma 3.3.
For simplicity, let C F,T = exp(T e L F T ) denote the constant factor appearing in the right hand side of (3.3). It follows by (3.3) that and thus we can apply [25,Theorem 2], to obtain the inequality where ε F,T = ε/C F,T . This concludes the proof of (2.7).
As for the proof of (2.8), consider the representation of the system (2.1) given in (3.1) on the probability space (Ω^n, F^n, P). Since G^i is Lipschitz continuous for each i, it is easily checked that the random variable G is again L_G-Lipschitz continuous; this follows from a direct computation for given ω, θ ∈ Ω^n, where for every ω ∈ Ω^n we set ||ω||_∞ := sup_{t∈[0,T]} (Σ_{i=1}^n |ω_i(t)|^2)^{1/2}. Similarly, one shows that the function F is L_F-Lipschitz continuous. In particular, the Lipschitz constants of F and G do not depend on n. By [5, Theorem 1.2], the law of Y_t satisfies Talagrand's T_2-inequality with an explicit constant. Thus, it follows by [28, Theorem 1.3] that there is a constant C > 0 such that the corresponding deviation bound holds for every 1-Lipschitz continuous function f : Ω^n → R. The function ω := (ω_1, . . . , ω_n) → √n W_{2,||·||∞}(L_n(ω), L(Y)) is 1-Lipschitz continuous on Ω^n. Thus, we have (3.9), from which we deduce (2.8).
Lastly, we turn to the proof of (2.9). If F does not depend on z, we do not need the change of measure to get (3.4). In fact, a direct estimation, together with the Lipschitz continuity of F and Gronwall's inequality, yields (3.10). Thus, it follows by the triangle inequality and the definition of the Wasserstein distance with respect to the supremum norm that the path-space distance is controlled, where the last inequality follows from (3.10). Hence, it follows by Doob's maximal inequality, for some q ∈ (p, k), together with Fubini's theorem and Theorem 2.2, that the required moment bound holds for n ≥ ñ_0. Now, by (3.9), in view of (3.11) and the fact that r_{n,m,k,p} ↓ 0 as n goes to infinity, we can choose n_0 ≥ ñ_0 large enough such that, for all n ≥ n_0, P(E[W_{2,||·||∞}(L_n(Ỹ), L(Y))] ≥ ε/2) = 0.
This concludes the proof of (2.9).

Proofs for Subsection 2.4
We begin with the proof of Theorem 2.9. As we will see below, this result is obtained as a consequence of Theorem 2.2.
Proof of Theorem 2.9. Since Y^{1,n} and Y^1 satisfy (2.1) and (2.2) respectively, the Lipschitz continuity of F and Gronwall's inequality yield a first estimate. On the other hand, applying Itô's formula to the process |Y^{1,n} − Y^1|^2 and using the Lipschitz continuity of F, we obtain a bound in which the last inequality follows by Young's inequality with some constant α > 0. Choosing α = L_F + 1 and applying Gronwall's inequality, we arrive at (3.14). Now, integrating on both sides and invoking Theorem 2.2 yields the claimed bound with q ∈ (2, k).
To get the convergence estimate for the control processes Z 1,n , notice that by (3.13) (with the choice α = L F + 1), we have Now combine this with the inequalities (3.12), (3.16) and Theorem 2.2 to conclude.
We now turn to the particular case of systems with linear interaction. Unlike Theorem 2.9, the proof of Proposition 2.12 does not follow from the convergence of the empirical measure of Y, but by a direct argument which seems tailor-made for "linear interaction functions". Before presenting the proof, let us justify the well-posedness of the system.
Proof of Remark 2.11. The existence and uniqueness of square integrable solutions follows, as in the non-linear interaction case, from [24] for the system (2.12) and [16, Theorem 4.23] for the McKean-Vlasov BSDE (2.13). It suffices to show that the generators are Lipschitz continuous (with Lipschitz constant independent of n). We give only the argument for the McKean-Vlasov equation. Let y, y′ ∈ R^m, z, z′ ∈ R^{m×d} and µ, µ′ ∈ P(R^m). By (Lip) we obtain the desired estimate; to derive the penultimate inequality, we used the Kantorovich-Rubinstein duality formula.
Proof of Proposition 2.12. It follows by an application of Itô's formula, by controlling the error terms (1/n) Σ_{j=1}^n [f_u(Y^1_u, Y^{j,n}_u, Z^{1,n}_u) − f_u(Y^1_u, Y^j_u, Z^{1,n}_u)], and by coming back to (3.18) after taking conditional expectations, that the quantity of interest is bounded by (C_{T,F,f}/n^2) ∫_0^T Σ_{j=1}^n E[Var(f_u(Y^1_u, Y^j_u, Z^1_u))] du.
The equality before the last one follows from the fact that Y^j_t, j = 1, . . . , n, are i.i.d. for all t. Since the law L(Y^1_u) of Y^1_u satisfies Talagrand's T_2 inequality with a constant C_{F,f,G,T} which depends only on the Lipschitz constants of F, f and G and on T (and which does not depend on the dimensions), see [5, Theorem 1.3], it follows e.g. by [54, Theorem 22.17] (see also [49, Section 7]) that L(Y^1_u) satisfies the Poincaré inequality with the same constant. That is, it holds that Var(f_u(x, Y^j_u, z)) ≤ C_{F,f,G,T} ∫_{R^m} |∂_y f_u(x, y, z)|^2 dL(Y^j_u)(y) for all fixed x, z.
Since f is Lipschitz continuous, L_f^2 is an upper bound for the integral on the right hand side above (uniformly in x, z). Therefore, we obtain the claimed estimate for some constant C depending on T, L_F, L_f and the Lipschitz constant of G. Turning to the proof of Theorem 2.15, recall the identity ∂_{x_i} B(·, x) = (1/n) ∂_µ B(·, L_n(x))(x_i), i = 1, . . . , n, and consider the particle system

X^{i,n,t,x}_s = x_i + ∫_t^s B(X^{i,n,t,x_i}_u, L_n(X^{t,x}_u)) du + ∫_t^s σ(X^{i,n,t,x_i}_u, L_n(X^{t,x}_u)) dW^i_u,
Y^{i,n,t,x}_s = G(X^{i,n,t,x_i}_T, L_n(X^{t,x}_T)) + ∫_s^T F(X^{i,n,t,x_i}_u, Y^{i,n,t,x}_u, Z^{i,n,t,x}_u, L_n(X^{t,x}_u), L_n(Y^{t,x}_u)) du − ∫_s^T Z^{i,n,t,x}_u dW^i_u, i = 1, . . . , n, (3.23)

where the second inequality in the resulting estimate follows by choosing α > 1, for some constant C_{L_F,T} > 0. This yields E[|Y^{1,n,t,ξ}_t − Y^{t,ξ}_t|^2] ≤ C_{L_F,T} ε_n for some constant C_{L_F,T} > 0. Combining this with (3.24) leads to (2.19).
This concludes the proof.