DONSKER-TYPE THEOREM FOR BSDES

Key words: backward stochastic differential equation (BSDE), stability of BSDEs, weak convergence of filtrations, discretization.

Abstract. This paper is devoted to the proof of Donsker's theorem for backward stochastic differential equations (BSDEs for short). The main objective is to give a simple method to discretize a BSDE in time. Our approach is based upon the notion of "convergence of filtrations" and covers the case of a (y, z)-dependent generator.


Introduction
We consider in this paper the following backward stochastic differential equation (BSDE for short):

$$Y_t = \xi + \int_t^T f(Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s, \qquad 0 \le t \le T, \tag{1}$$

where W is a standard Brownian motion. The unknowns are the processes Y and Z, adapted w.r.t. the filtration $\mathcal{F}^W_\cdot$ generated by W; $(y, z) \longmapsto f(y, z)$ is a Lipschitz function and ξ is a square integrable random variable, measurable w.r.t. $\mathcal{F}^W_T$. It is by now well known that the BSDE (1) has a unique square integrable solution under the usual assumptions described above; see e.g. the original work of E. Pardoux and S. Peng [12] or the survey paper by N. El Karoui, M.-C. Quenez and S. Peng [7].

Unlike SDEs, for which many approximations are available, the time discretization of BSDEs appears to be a difficult problem: only a few authors have contributed in this direction. Let us mention the works of V. Bally [1], D. Chevance [2,3], J. Douglas, J. Ma and Ph. Protter [6] and, more recently, J. Ma and J. Yong [10]. In the last two works, the authors proposed numerical schemes to compute the solution of a forward-backward SDE. Their method relies strongly on the relationship between forward-backward SDEs and quasilinear PDEs, in the spirit of the "four-step scheme" introduced by J. Ma, Ph. Protter and J. Yong [9], and requires the numerical resolution of a quasilinear PDE.

V. Bally and D. Chevance proposed time discretizations of the BSDE (1) that avoid the resolution of a PDE. D. Chevance, in his PhD thesis [3] and in his paper [2], proposed a discretization when the function f does not depend on z. The main point is to remark that, in this case, Y in the BSDE (1) is given by the equation

$$Y_t = \mathbb{E}\Big( \xi + \int_t^T f(Y_s)\,ds \,\Big|\, \mathcal{F}^W_t \Big),$$

which can be discretized in time with step-size h = T/n by solving backwards in time

$$y_k = \mathbb{E}\big( y_{k+1} + h f(y_{k+1}) \,\big|\, \mathcal{F}^n_k \big), \qquad k = n-1, \dots, 0,$$

and setting $Y^n_t = y_{[t/h]}$. This works when f does not depend on z and, under reasonable assumptions, the convergence of Y^n to Y is proved in [2,3], together with a rate of convergence. D. Chevance also gives a space discretization in order to obtain a numerical scheme for solving the BSDE. Independently of the work [3], F. Coquet, V. Mackevičius and J. Mémin proved the convergence of the sequence Y^n using the tool of convergence of filtrations; see [4].

The method developed by V. Bally in [1] applies to the case where the function f depends on both variables y and z. The time discretization is performed on a random net, namely the jump times of a Poisson process. This random net appears to be one of the main arguments to overcome the difficulties due to the dependence of f on the variable z: it avoids dealing with evaluations of the process Z at the points of the net. In fact, the discretization concerns only the term $\int_t^T f(Y_s, Z_s)\,ds$, while the Brownian motion, and thus the stochastic integral $\int_t^T Z_s\,dW_s$, are not discretized.
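As an illustration of Chevance's recursion $y_k = \mathbb{E}(y_{k+1} + h f(y_{k+1}) \mid \mathcal{F}^n_k)$, one can run it on the recombining binomial tree of a scaled Bernoulli random walk. The following sketch assumes a Markovian terminal condition ξ = xi_leaf(W^n_T), which is a simplifying assumption made here for illustration only:

```python
import math

def chevance_scheme(f, xi_leaf, T, n):
    """Backward recursion y_k = E[y_{k+1} + h*f(y_{k+1}) | F^n_k] on the
    binomial tree of a scaled random walk (f independent of z).

    xi_leaf maps the terminal walk value W^n_T to the terminal condition
    (a hypothetical Markovian choice, for illustration only)."""
    h = T / n
    sqh = math.sqrt(h)
    # nodes at step k carry walk values sqh*(2*j - k), j = 0..k
    y = [xi_leaf(sqh * (2 * j - n)) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        # conditional expectation = average over eps_{k+1} = +/-1
        y = [0.5 * ((y[j] + h * f(y[j])) + (y[j + 1] + h * f(y[j + 1])))
             for j in range(k + 1)]
    return y[0]  # y_0, the approximation of Y_0
```

With f = 0 the scheme reduces to the martingale expectation of the terminal condition, and with f constant it adds T·f over the horizon, which gives a quick consistency check.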
The contribution of this paper is to prove the convergence of one of the most naive methods to discretize the BSDE (1) in the case when f depends on both variables y and z. This method consists in replacing the Brownian motion W by a scaled random walk W^n, so that the stochastic integral is also discretized. This leads to a discrete-time BSDE. This approach does not depend on the dimension (of W or Y) and, for simplicity, we deal with real-valued processes.
To be more precise, the first step is to solve the discrete-time BSDE (h stands for T/n)

$$y_k = y_{k+1} + h f(y_k, z_k) - \sqrt{h}\, z_k\, \varepsilon_{k+1}, \qquad y_n = \xi^n, \tag{2}$$

where {ε_k}_{1≤k≤n} is an i.i.d. symmetric Bernoulli sequence, and ξ^n is a square integrable random variable, measurable w.r.t. $\mathcal{G}_n$, where $\mathcal{G}_k = \sigma(\varepsilon_1, \dots, \varepsilon_k)$. By a solution, we mean a discrete process {y_k, z_k}_{0≤k≤n−1}, adapted w.r.t. $\mathcal{G}_k$. To solve (2), one first chooses

$$z_k = h^{-1/2}\, \mathbb{E}\big( y_{k+1}\, \varepsilon_{k+1} \,\big|\, \mathcal{G}_k \big)$$

and then takes y_k as the solution of (2). Notice that, for n large enough, y_k is well-defined since f is Lipschitz w.r.t. y, and that it is $\mathcal{G}_k$-measurable since it is $\mathcal{G}_{k+1}$-measurable and orthogonal to ε_{k+1} (this would not happen if ε were chosen Gaussian, and it is the reason why W^n is not chosen as the discretization of W along the regular net with step-size h). We define two continuous-time processes by setting, for 0 ≤ t ≤ T, $Y^n_t = y_{[t/h]}$ and $Z^n_t = z_{[t/h]}$. The aim of the paper is to prove the convergence of the pair (Y^n, Z^n) to (Y, Z); see Theorem 2.1 and its corollary below. This point of view is attractive because it consists in solving a discrete BSDE in both y and z. As in [4], weak convergence of filtrations will be a useful tool.
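The two-step procedure just described (project y_{k+1} on ε_{k+1} to get z_k, then solve a fixed-point equation for y_k, contracting as soon as Kh < 1) can be sketched for a general terminal condition ξ^n, viewed as a function of the sign sequence (ε_1, ..., ε_n); the fixed-point iteration count below is an arbitrary choice of this sketch:

```python
import itertools
import math

def solve_discrete_bsde(f, xi, T, n, fp_iter=60):
    """Backward solution of the discrete BSDE
        y_k = y_{k+1} + h f(y_k, z_k) - sqrt(h) z_k eps_{k+1},  y_n = xi,
    where xi is any function of the signs (eps_1, ..., eps_n).
    z_k is the projection of y_{k+1} on eps_{k+1}; y_k then solves a
    fixed-point equation, contracting as soon as K h < 1."""
    h = T / n
    sqh = math.sqrt(h)
    y = {s: xi(s) for s in itertools.product((-1, 1), repeat=n)}
    for k in range(n - 1, -1, -1):
        prev = {}
        for s in itertools.product((-1, 1), repeat=k):
            up, down = y[s + (1,)], y[s + (-1,)]
            z = (up - down) / (2.0 * sqh)  # z_k = h^{-1/2} E(y_{k+1} eps_{k+1} | G_k)
            m = 0.5 * (up + down)          # E(y_{k+1} | G_k)
            yk = m
            for _ in range(fp_iter):       # y_k = E(y_{k+1} | G_k) + h f(y_k, z_k)
                yk = m + h * f(yk, z)
            prev[s] = yk
        y = prev
    return y[()]  # y_0 = Y^n_0
```

The cost is 2^n terminal evaluations, so this is only a conceptual sketch, not a practical scheme for large n.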

Statement of the result
Let (Ω, F, P) be a probability space carrying a Brownian motion (W_t)_{0≤t≤T} and a family of i.i.d. symmetric Bernoulli sequences {ε^n_k}_{1≤k≤n}, n ∈ N*. We consider, for n ∈ N*, the scaled random walks

$$W^n_t = \sqrt{\tfrac{T}{n}}\, \sum_{k=1}^{[nt/T]} \varepsilon^n_k, \qquad 0 \le t \le T. \tag{4}$$

We will work under the following assumptions:

(H1) f is a Lipschitz function with constant K;

(H2) ξ is $\mathcal{F}^W_T$-measurable and, for all n, ξ^n is $\mathcal{G}^n_n$-measurable, where $\mathcal{G}^n_k = \sigma(\varepsilon^n_1, \dots, \varepsilon^n_k)$, with $\mathbb{E}[\xi^2] + \sup_n \mathbb{E}[(\xi^n)^2] < \infty$;

(H3) ξ^n converges to ξ in L¹ as n → ∞.
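For concreteness, the scaled random walk W^n can be simulated directly from Bernoulli signs; this is an illustration only, with Python's random module and a fixed seed used purely for reproducibility:

```python
import math
import random

def scaled_random_walk(T, n, rng):
    """Grid values of W^n_t = sqrt(T/n) * sum_{k <= [nt/T]} eps^n_k."""
    h = T / n
    eps = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    path = [0.0]
    for e in eps:
        path.append(path[-1] + math.sqrt(h) * e)
    return path  # path[k] = value of W^n at time k*h

rng = random.Random(0)
w = scaled_random_walk(1.0, 100, rng)
```

Each increment of the walk has absolute value sqrt(h), so its quadratic variation over [0, T] is exactly T, matching that of the Brownian motion it approximates.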
Since f is Lipschitz, we can solve (for n large enough) the discrete BSDE (2), where the sequence {ε_k}_k is replaced by {ε^n_k}_k. If {y^n_k, z^n_k}_k is the solution of this equation, we set, for 0 ≤ t ≤ T, $Y^n_t = y^n_{[t/h]}$ and $Z^n_t = z^n_{[t/h]}$. In addition, let {Y_t, Z_t}_{0≤t≤T} be the solution of the BSDE (1). We will prove the following

Theorem 2.1 Let the assumptions (H1), (H2) and (H3) hold, and consider the scaled random walks W^n defined in (4). Then (Y^n, Z^n) converges to (Y, Z) in the following sense:

$$\sup_{0\le t\le T} |Y^n_t - Y_t| + \int_0^T |Z^n_s - Z_s|^2\, ds \longrightarrow 0 \quad \text{in probability, as } n \to \infty. \tag{5}$$
Remark. Theorem 2.1 can be extended to the case when f depends on t. In that case, (2) has to be replaced by

$$y_k = y_{k+1} + h f(kh, y_k, z_k) - \sqrt{h}\, z_k\, \varepsilon_{k+1},$$

and f has to be continuous in t. □

Method for the proof. The key point is to use the decomposition

$$Y^n - Y = (Y^n - Y^{n,p}) + (Y^{n,p} - Y^{\infty,p}) + (Y^{\infty,p} - Y),$$

and similarly for Z, where the superscript p refers to the approximation of the solution of the BSDE by the Picard method. More precisely, we set Y^{∞,0} = 0, Z^{∞,0} = 0, y^{n,0} = 0, z^{n,0} = 0 and define (Y^{∞,p+1}, Z^{∞,p+1}) as the solution of the BSDE

$$Y^{\infty,p+1}_t = \xi + \int_t^T f(Y^{\infty,p}_s, Z^{\infty,p}_s)\, ds - \int_t^T Z^{\infty,p+1}_s\, dW_s \tag{8}$$

(so that (Y^{∞,p+1}, Z^{∞,p+1}) is the solution of a BSDE with random coefficients), and similarly

$$y^{n,p+1}_k = y^{n,p+1}_{k+1} + h f(y^{n,p}_k, z^{n,p}_k) - \sqrt{h}\, z^{n,p+1}_k\, \varepsilon^n_{k+1}, \qquad y^{n,p+1}_n = \xi^n. \tag{9}$$

The discrete processes are extended to continuous time, piecewise constantly on the intervals [kh, (k+1)h), in such a way that Y^{n,p} is càdlàg and Z^{n,p} càglàd (càdlàg means right continuous with left limits, and càglàd left continuous with right limits). We shall prove in Lemma 4.1 that the convergence of (Y^{n,p}, Z^{n,p}) to (Y^n, Z^n) as p → ∞ is uniform in n for the classical norm used for BSDEs, which is stronger than the convergence in the sense of (5); this part consists of standard manipulations. We shall then prove that, for any p, the convergence of (Y^{n,p}, Z^{n,p}) to (Y^{∞,p}, Z^{∞,p}) holds in the sense of (5); this is the difficult part of the proof, and we shall need the results of section 3.
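The discrete Picard construction can be sketched on the binomial tree: each iterate solves an explicit backward recursion in which f is frozen at the previous iterate, so no fixed-point solve is needed. The Markovian terminal condition and the exact placement of the frozen iterate are assumptions of this sketch:

```python
import math

def picard_discrete_bsde(f, xi_leaf, T, n, iterations):
    """Picard iterates (y^{n,p}, z^{n,p}) of the discrete BSDE, starting
    from the zero processes; f is evaluated at the previous iterate, so
    each backward sweep is explicit."""
    h = T / n
    sqh = math.sqrt(h)
    # trees[k][j]: value at step k after j upward moves
    y = [[0.0] * (k + 1) for k in range(n + 1)]
    z = [[0.0] * (k + 1) for k in range(n)]
    for _p in range(iterations):
        ynew = [[0.0] * (k + 1) for k in range(n + 1)]
        znew = [[0.0] * (k + 1) for k in range(n)]
        ynew[n] = [xi_leaf(sqh * (2 * j - n)) for j in range(n + 1)]
        for k in range(n - 1, -1, -1):
            for j in range(k + 1):
                down, up = ynew[k + 1][j], ynew[k + 1][j + 1]
                znew[k][j] = (up - down) / (2.0 * sqh)   # martingale part of the new iterate
                ynew[k][j] = 0.5 * (up + down) + h * f(y[k][j], z[k][j])
        y, z = ynew, znew
    return y[0][0]
```

When Kh < 1 the map is a contraction, so the iterates converge geometrically to the solution of the implicit discrete BSDE (2).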

Remark.
Let us now consider the case when ξ^n = E(ξ | G^n_n). The convergence of ξ^n to ξ in L¹ then comes from Theorem 3.1. In this situation, the convergence in probability actually implies the convergence in L¹, meaning the convergence of (Y^n, Z^n) to (Y, Z) for the norm used in the framework of BSDEs. Standard manipulations on BSDEs show that we can assume w.l.o.g. that ξ is in L^∞. Indeed, if this is not the case, we have an estimate comparing the solutions of (1) and (2) with the "tilde" solutions obtained by replacing ξ and ξ^n with ξ1_{|ξ|≤k} and E(ξ1_{|ξ|≤k} | G^n_n); the constant C in this estimate depends only on T and on the Lipschitz constant K, and the last two terms can be made as small as needed provided we choose k large enough. But if ξ is bounded then, since f(0, 0) is also bounded, we can prove that the solutions are bounded in L^p uniformly in n for any p ≥ 2, and thus we have the convergence in L¹ provided we have the convergence in probability. □

In Theorem 2.1, the BSDE (1) and the discrete BSDEs were solved on the same probability space. But we can also consider these equations on different probability spaces and obtain the convergence of solutions in law instead of in probability. This approach is in the spirit of Donsker's theorem. Let us consider a standard Brownian motion W defined on a probability space and a symmetric Bernoulli sequence {ε_k}_{k≥1} defined on a possibly different probability space. We define, for each n, the scaled random walks

$$S^n_t = \sqrt{\tfrac{T}{n}}\, \sum_{k=1}^{[nt/T]} \varepsilon_k, \qquad 0 \le t \le T.$$

We denote by D the space of càdlàg functions (right continuous with left limits) from [0, T] to R, endowed with the topology of uniform convergence, and we assume that:

(H4) g : D −→ R is continuous and has polynomial growth.
Let {Y_t, Z_t}_{0≤t≤T} be the solution of the BSDE (1) with ξ = g(W), and let {Y^n_t, Z^n_t}_{0≤t≤T} be the piecewise constant processes associated with the solution of the discrete BSDE (2) with ξ^n = g(S^n). We have the following corollary.

Corollary 2.2 Let the assumptions (H1) and (H4) hold. Then the sequence {Y^n}_n converges to Y in law on the space D endowed with the topology of uniform convergence.

Proof. Let us first notice that the laws of the solution (Y, Z) of (1) and of the solution (y_k, z_k) of (2) depend only on the law of W and on that of the sequence {ε_k}_k. So, as far as convergence in law is concerned, we can consider the equations (1) and (2) on any probability space. But, from Donsker's theorem and the Skorokhod representation theorem, there exists a probability space carrying a Brownian motion W and a family of i.i.d. symmetric Bernoulli sequences (ε^n)_n such that the corresponding walks S^n converge to W uniformly on [0, T], in probability as well as in L^p, for any 1 ≤ p < ∞. It remains to solve the equations (1) and (2) on this space and to apply Theorem 2.1 to obtain the convergence of (Y^n, Z^n) to (Y, Z) in the sense of (5). This convergence implies the convergence of {Y^n}_n to Y in law for the topology of uniform convergence on D. □
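In the spirit of this corollary, the quantity E[g(S^n)] (which is exactly Y^n_0 when f = 0) can be estimated by simulation and converges to E[g(W)] by Donsker's theorem under (H4). The sketch below is a sanity check only; the path functional g and the sample sizes are arbitrary choices of the illustration:

```python
import math
import random

def donsker_estimate(g, T, n, samples, rng):
    """Monte Carlo estimate of E[g(S^n)] for a path functional g applied
    to the scaled random walk; for f = 0 this is Y^n_0, which converges
    to E[g(W)] as n -> infinity by Donsker's theorem and (H4)."""
    h = T / n
    total = 0.0
    for _ in range(samples):
        path, s = [0.0], 0.0
        for _ in range(n):
            s += math.sqrt(h) * rng.choice((-1.0, 1.0))
            path.append(s)
        total += g(path)
    return total / samples
```

For instance, with g(path) = path[-1] (the terminal value, a continuous functional of polynomial growth), the estimate fluctuates around E[W_T] = 0 at the usual Monte Carlo rate.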

Convergence of filtrations
Let us consider a sequence of càdlàg processes W^n = (W^n_t)_{0≤t≤T} and a Brownian motion W = (W_t)_{0≤t≤T}, all defined on the same probability space (Ω, G, P); T is finite. We denote by (F^n_t) (resp. (F_t)) the right continuous filtration generated by W^n (resp. W). Let us finally consider a sequence X^n of F^n_T-measurable integrable random variables and an F_T-measurable integrable random variable X, together with the càdlàg martingales $M^n_t = \mathbb{E}(X^n \mid \mathcal{F}^n_t)$ and $M_t = \mathbb{E}(X \mid \mathcal{F}_t)$. We then have the following theorem.

Proof. For the first part, we have, from Proposition 2 in F. Coquet, J. Mémin and L. Słomiński [5], the weak convergence of the filtrations F^n_· to F_·: this means that, for every A ∈ F_T, the martingales E(1_A | F^n_·) converge in probability to E(1_A | F_·) in the sense of the J₁-Skorokhod topology on the space D of càdlàg functions. Applying now the second point of Remark 1 of [5] to X^n and X, we get the convergence in probability of M^n to M in the sense of the J₁-Skorokhod topology. But, since M is a continuous martingale, this convergence is also uniform in t. From assumption (A3), we deduce that $\sup_n \mathbb{E}\big[\sup_{0\le t\le T} |M^n_t|^2\big]$ is finite, and thus $\sup_n \mathbb{E}\big[\sup_{0\le t\le T} |\Delta M^n_t|\big] < \infty$; by assumption the same is true for the jumps of W^n. The conclusion then follows from a result of J. Jacod.

Corollary 3.2 asserts that there exists a sequence (Z^n_t)_{0≤t≤T} of F^n_·-predictable processes and an F_·-predictable process (Z_t)_{0≤t≤T} such that the martingales M^n and M can be represented as stochastic integrals and the corresponding convergences hold.

Proof. The first part is completely classical: it is the predictable representation of F^n_·-martingales in terms of stochastic integrals w.r.t. W^n, and of F_·-martingales in terms of stochastic integrals w.r.t. the Brownian motion W. Setting A^n_t := h[t/h] and applying the first part of the previous theorem, we obtain the announced convergences of the martingales. From these uniform (in t) convergences, we deduce the convergence of the corresponding stochastic integrals. Extracting a subsequence (still indexed by n), we obtain convergence for almost every ω, which implies the convergence of Z^n_·(ω) to Z_·(ω) weakly in L²([0, T], λ).
Proof of Theorem 2.1

Equations (6) and (7), together with the following lemma, proved in the appendix,

Lemma 4.1 With the notations following (8) and (9), the convergence of (y^{n,p}, z^{n,p}) to (y^n, z^n) as p → ∞ holds uniformly in n for the classical norm used for BSDEs,

imply that it remains to prove the convergence to zero of the processes Y^{n,p} − Y^{∞,p} and Z^{n,p} − Z^{∞,p}. This will be done by induction on p. For the sake of clarity, we drop the superscript p, write the time index as a subscript and write everything in continuous time, so that equations (8) and (9) become

$$\widehat{Y}_t = \xi + \int_t^T f(Y_s, Z_s)\, ds - \int_t^T \widehat{Z}_s\, dW_s,$$

$$\widehat{Y}^n_t = \xi^n + \int_{]t,T]} f(Y^n_{s-}, Z^n_s)\, dA^n_s - \int_{]t,T]} \widehat{Z}^n_s\, dW^n_s,$$

where A^n_s = [s/h]h, Y_− denotes the càglàd process associated with Y, and the "hat" processes stand for the Picard iterates of order p + 1. The induction assumption is that {Y^n_t, Z^n_t}_{0≤t≤T} converges to {Y_t, Z_t}_{0≤t≤T} in the sense of (5), and we have to prove that {Ŷ^n_t, Ẑ^n_t}_{0≤t≤T} converges to {Ŷ_t, Ẑ_t}_{0≤t≤T} in the same sense. The process defined by

$$M^n_t = \widehat{Y}^n_t + \int_{]0,t]} f(Y^n_{s-}, Z^n_s)\, dA^n_s$$

satisfies the martingale property; hence M^n is an F^n_·-martingale and, since Ŷ^n_T = ξ^n,

$$M^n_T = \xi^n + \int_{]0,T]} f(Y^n_{s-}, Z^n_s)\, dA^n_s.$$

If we want to apply Corollary 3.2, we have to prove the L¹ convergence of M^n_T. But, since Y^n and Z^n are piecewise constant, the difference between $\int_{]0,T]} f(Y^n_{s-}, Z^n_s)\, dA^n_s$ and $\int_0^T f(Y^n_s, Z^n_s)\, ds$ tends to zero in probability, and then in L¹ by L²-boundedness. This, together with equations (12) and (13) and Corollary 3.2, implies that M^n converges to M, and the desired convergence of {Ŷ^n, Ẑ^n} to {Ŷ, Ẑ} follows, since we have just proved the convergence of the martingale parts.

Discrete BSDEs and PDEs
In this section, we give an application of Theorem 2.1 to BSDEs in a Markovian framework, which are related to semilinear PDEs. Let us first recall the relations between BSDEs and PDEs. The setup is the following: b and σ are two functions defined on [0, T] × R with values in R; f is a function defined on [0, T] × R³ with values in R, and g maps R to R. We assume that these functions are K-Lipschitz continuous w.r.t. all their variables. Let us introduce the unique (in the class of continuous functions with polynomial growth) viscosity solution U to the PDE on [0, T] × R,

$$\partial_t U(t,x) + b(t,x)\,\partial_x U(t,x) + \tfrac{1}{2}\,\sigma^2(t,x)\,\partial^2_{x} U(t,x) + f\big(t, x, U(t,x), \sigma(t,x)\,\partial_x U(t,x)\big) = 0, \qquad U(T, x) = g(x). \tag{14}$$

It is a very well known fact -- we refer to S. Peng [14] for classical solutions and to E. Pardoux and S. Peng [13] for viscosity solutions -- that U is related to the following BSDE:

$$Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\, ds - \int_t^T Z_s\, dW_s, \tag{15}$$

for which the terminal condition is of the special form g(X_T), where {X_t}_{0≤t≤T} is the solution to the SDE

$$X_t = x + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dW_s.$$

By the nonlinear Feynman-Kac formula, we have: $Y_t = U(t, X_t)$ and $Z_t = \sigma(t, X_t)\, \partial_x U(t, X_t)$.

In the remainder of this section, we will use the result of Theorem 2.1 to discretize the solution of the BSDE (15) and then to construct an approximation of the solution U of the PDE (14) which solves a discrete PDE. The framework is the same as in section 2: (Ω, F, P) is a probability space carrying a Brownian motion (W_t)_{0≤t≤T} and a family of i.i.d. symmetric Bernoulli sequences {ε^n_k}_{1≤k≤n}, n ∈ N*. We consider, for n ∈ N*, the scaled random walks W^n defined in (4), and we assume that sup_{0≤t≤T} |W^n_t − W_t| converges to 0 in probability, as well as in L^p for all real p ≥ 1. This is not a restriction, as explained in Corollary 2.2. We also define $\mathcal{G}^n_k = \sigma(\varepsilon^n_1, \dots, \varepsilon^n_k)$. We consider the time discretization of the interval [0, T] with step-size T/n; we pick n such that KT/n < 1 and we set h = T/n, so that Kh < 1. We fix a real x. The discrete process {χ^n_i}_i is defined by the relation

$$\chi^n_0 = x, \qquad \chi^n_{i+1} = \chi^n_i + h\, b(ih, \chi^n_i) + \sqrt{h}\, \sigma(ih, \chi^n_i)\, \varepsilon^n_{i+1}.$$

To define this process in continuous time, we set, ∀t ∈ [0, T], $X^n_t = \chi^n_{[t/h]}$. It is worth noting that the process {X^n_t}_{0≤t≤T} is the strong solution of an SDE driven by W^n. Hence, we are in the classical context of convergence of solutions of SDEs. We refer to L. Słomiński [15] for general results in this area.

We solve the discrete BSDE

$$y^n_k = y^n_{k+1} + h f\big(kh, \chi^n_k, y^n_k, z^n_k\big) - \sqrt{h}\, z^n_k\, \varepsilon^n_{k+1}, \qquad y^n_n = g(\chi^n_n). \tag{18}$$

Taking conditional expectations, (18) can be rewritten as

$$y^n_k = \mathbb{E}\big( y^n_{k+1} \,\big|\, \mathcal{G}^n_k \big) + h f\big(kh, \chi^n_k, y^n_k, z^n_k\big), \qquad z^n_k = h^{-1/2}\, \mathbb{E}\big( y^n_{k+1}\, \varepsilon^n_{k+1} \,\big|\, \mathcal{G}^n_k \big).$$

Let D^n_− and D^n_+ be the following discrete operators:

$$D^n_\pm u(k+1, x) = \tfrac{1}{2}\Big[ u\big(k+1,\, x + h\, b(kh,x) + \sqrt{h}\,\sigma(kh,x)\big) \pm u\big(k+1,\, x + h\, b(kh,x) - \sqrt{h}\,\sigma(kh,x)\big) \Big].$$

We have the following result:

Theorem 5.1 Let u^n be the solution of the discrete PDE

$$u^n(k, x) = D^n_+ u^n(k+1, x) + h f\big(kh, x, u^n(k, x), h^{-1/2} D^n_- u^n(k+1, x)\big),$$

with the terminal condition u^n(n, x) = g(x). Then we have, ∀k = 0, ..., n−1,

$$y^n_k = u^n(k, \chi^n_k), \qquad z^n_k = h^{-1/2}\, D^n_-\, u^n(k+1, \chi^n_k).$$

Proof. Suppose that y^n_{k+1} = u^n(k+1, χ^n_{k+1}) for some k ∈ {0, ..., n−1}. From the equation (18), we have

$$y^n_k = \mathbb{E}\big( u^n(k+1, \chi^n_{k+1}) \,\big|\, \mathcal{G}^n_k \big) + h f\big(kh, \chi^n_k, y^n_k, z^n_k\big),$$

and then, since $\mathbb{E}\big( u^n(k+1, \chi^n_{k+1}) \,\big|\, \mathcal{G}^n_k \big) = D^n_+ u^n(k+1, \chi^n_k)$, y^n_k and u^n(k, χ^n_k) solve the same equation. Noting that f is K-Lipschitz and that Kh < 1, this equation has a unique solution, and we get y^n_k = u^n(k, χ^n_k). The proof is thus complete by backward induction, since obviously u^n(n, χ^n_n) = g(χ^n_n) = y^n_n. □

We define a new sequence of functions by setting $U^n(t, x) = u^n([t/h], x)$, and we are interested in the convergence of the sequence {U^n}_n.
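The discrete PDE for u^n lends itself to a direct recursive evaluation. The sketch below assumes the two Euler successors x + h·b ± √h·σ as the points at which D^n_± sample u^n(k+1, ·), and solves the implicit equation in y by fixed-point iteration (valid since Kh < 1); the memoization and the iteration count are implementation choices of this illustration:

```python
import math
from functools import lru_cache

def discrete_pde_u0(b, sigma, f, g, T, n, x0, fp_iter=60):
    """Evaluate u^n(0, x0) for the discrete PDE
       u^n(k,x) = D^n_+ u^n(k+1,x) + h f(kh, x, u^n(k,x), h^{-1/2} D^n_- u^n(k+1,x)),
    with u^n(n,x) = g(x), where D^n_+/- average / half-difference
    u^n(k+1, .) at the two Euler successors x + h*b +/- sqrt(h)*sigma."""
    h = T / n
    sqh = math.sqrt(h)

    @lru_cache(maxsize=None)
    def u(k, x):
        if k == n:
            return g(x)
        bk, sk = b(k * h, x), sigma(k * h, x)
        up = u(k + 1, x + h * bk + sqh * sk)
        down = u(k + 1, x + h * bk - sqh * sk)
        dplus = 0.5 * (up + down)        # D^n_+ u^n(k+1, x)
        z = 0.5 * (up - down) / sqh      # h^{-1/2} D^n_- u^n(k+1, x)
        y = dplus
        for _ in range(fp_iter):         # implicit in y; contraction since K h < 1
            y = dplus + h * f(k * h, x, y, z)
        return y

    return u(0, x0)
```

For b = 0, σ = 1, f = 0 and g(x) = x, the scheme reproduces the martingale value u^n(0, x0) = x0; for f(t, x, y, z) = z with the same data it produces x0 + T, consistent with the solution U(t, x) = x + (T − t) of the corresponding semilinear PDE.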

Theorem 5.2
For each x ∈ R, U^n(0, x) converges to U(0, x), U being the solution to the semilinear PDE (14). This convergence is uniform on compact sets.

Proof.
We fix x ∈ R. We have U^n(0, x) = Y^n_0 and also U(0, x) = Y_0. As a consequence, the proof of the first statement will be finished if we are able to prove that Y^n_0 converges to Y_0, which is a consequence of Theorem 2.1.

For the proof of the uniform convergence on compact sets, we first remark that, since f is Lipschitz, for each compact set K there exists a constant C such that, for all x, x′ in K, the solutions started from x and x′ satisfy $|Y^n_0(x) - Y^n_0(x')| \le C\, |x - x'|$, and the same is true for Y(x) in place of Y^n(x). This inequality is derived from the similar inequality for X^n(x) (see [15]). Let us fix a compact set K. For each k ∈ N*, we can find a finite set of points of K, say K_k, such that each point x ∈ K is within distance 1/k of a point of K_k, and thus, K_k being finite,

$$\sup_{x \in K} |U^n(0, x) - U(0, x)| \le \frac{2C}{k} + \max_{x' \in K_k} |U^n(0, x') - U(0, x')|,$$

where the maximum tends to 0 by the first part of the proof; this gives the result since k is arbitrary. □

Appendix: proof of Lemma 4.1

Proof. For notational convenience, let us write y, z in place of y^{n,p+1} − y^{n,p}, z^{n,p+1} − z^{n,p}, and u, v in place of y^{n,p} − y^{n,p−1}, z^{n,p} − z^{n,p−1}. Let us pick β > 1, to be chosen later. With these notations in hand, we have, for k = 0, ..., n−1, since y_n = 0, a weighted estimate of $\beta^k y_k^2$. We write

$$y_i^2 - y_{i+1}^2 = 2 y_i \big( y_i - y_{i+1} \big) - \big( y_i - y_{i+1} \big)^2,$$

in order to use the equation (9). Since f is Lipschitz in (y, z) with constant K, we have, for each ν > 0, a bound on the cross terms, and moreover (20) easily implies a corresponding bound on $(y_i - y_{i+1})^2$. As a byproduct of these inequalities, we deduce an estimate for k = 0, ..., n−1, and, setting ρ = (ν + 2K²h)βh, we get a discrete Gronwall-type inequality. Thus, if 1 − β + 2K²hβ/ν ≤ 0, we have the desired contraction estimate for k = 0, ..., n−1; in particular, taking the expectation of the previous inequality for k = 0, we get the conclusion.