The Steepest Descent Method for Forward-Backward SDEs

This paper aims to open a door to Monte-Carlo methods for numerically solving Forward-Backward SDEs, without computing over all Cartesian grids as usually done in the literature. We transform the FBSDE to a control problem and propose the steepest descent method to solve the latter one. We show that the original (coupled) FBSDE can be approximated by decoupled FBSDEs, which further boils down to computing a sequence of conditional expectations. The rate of convergence is obtained, and the key to its proof is a new well-posedness result for FBSDEs. However, the approximating decoupled FBSDEs are non-Markovian. Some Markovian type of modification is needed in order to make the algorithm efficiently implementable.


Introduction
Since the seminal work of Pardoux-Peng [19], there have been numerous publications on Backward Stochastic Differential Equations (BSDEs) and Forward-Backward SDEs (FBSDEs). We refer the reader to the book of Ma-Yong [17] and the references therein for details on the subject. In particular, FBSDEs of the following type have been studied extensively:
$$X_t = x + \int_0^t b(s, X_s, Y_s, Z_s)\,ds + \int_0^t \sigma(s, X_s, Y_s)\,dW_s; \quad Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s; \tag{1.1}$$
where $W$ is a standard Brownian motion, $T > 0$ is a deterministic terminal time, and $b, \sigma, f, g$ are deterministic functions. Here, for notational simplicity, we assume all processes are 1-dimensional. It is well known that FBSDE (1.1) is related to the following parabolic PDE on $[0, T] \times \mathbb{R}$ (see, e.g., [13], [20], and [7]):
$$u_t + \tfrac12 \sigma^2(t, x, u)\,u_{xx} + b(t, x, u, \sigma(t, x, u)u_x)\,u_x + f(t, x, u, \sigma(t, x, u)u_x) = 0; \quad u(T, x) = g(x); \tag{1.2}$$
in the sense that (if a smooth solution $u$ exists)
$$Y_t = u(t, X_t), \quad Z_t = u_x(t, X_t)\,\sigma(t, X_t, u(t, X_t)). \tag{1.3}$$
Due to their importance in applications, numerical methods for BSDEs have received considerable attention in recent years. Bally [1] proposed an algorithm using a random time discretization. Based on a new notion of $L^2$-regularity, Zhang [21] obtained the rate of convergence for deterministic time discretization and transformed the problem to computing a sequence of conditional expectations. In the Markovian setting, significant progress has been made in computing the conditional expectations. The following methods are of particular interest: the quantization method (see, e.g., Bally-Pagès-Printems [2]), the Malliavin calculus approach, the linear regression method or the Longstaff-Schwartz algorithm (see Gobet-Lemor-Waxin [10]), and the Picard iteration approach (see Bender-Denk [3]). These methods work well in reasonably high dimensions. There are also many publications on numerical methods for non-Markovian BSDEs (see, e.g., [5], [6], [12], [15], [24]). In general, however, these methods do not work when the dimension is high.
Numerical approximations for FBSDEs, however, are much more difficult. To our knowledge, there are only very few works in the literature. The first one was Douglas-Ma-Protter [9], based on the four step scheme. Their main idea is to numerically solve the PDE (1.2). Milstein-Tretyakov [16] and Makarov [14] also proposed some numerical schemes for (1.2). Recently Delarue-Menozzi [8] proposed a probabilistic algorithm. Note that all these methods essentially need to discretize the space over regular Cartesian grids, and thus are not practical in high dimensions.
In this paper we aim to open a door to truly Monte-Carlo methods for FBSDEs, without computing over all Cartesian grids. Our main idea is to transform the FBSDE to a stochastic control problem and propose the steepest descent method to solve the latter one. We show that the original (coupled) FBSDE can be approximated by solving a certain number of decoupled FBSDEs. We then discretize the approximating decoupled FBSDEs in time and thus the problem boils down to computing a sequence of conditional expectations. The rate of convergence is obtained.
We note that the idea of approximating with a corresponding stochastic control problem is somewhat similar to the approximating solvability of FBSDEs in Ma-Yong [18] and the near-optimal control in Zhou [25]. However, in those works the original problem may have no exact solution and the authors try to find a so-called approximating solution. In our case the exact solution exists and we want to approximate it with numerically computable terms. More importantly, those works are concerned only with the existence of the approximating solutions, while here for practical reasons we need an explicit construction of the approximations as well as the rate of convergence.
The key to the proof is a new well-posedness result for FBSDEs. In order to obtain the rate of convergence of our approximations, we need the well-posedness of some adjoint FBSDEs, which are linear but with random coefficients. It turns out that none of the existing methods in the literature works in our case.
We should point out that, unfortunately, our approximating decoupled FBSDEs are non-Markovian (that is, the coefficients are random), and thus we cannot directly apply the existing methods for Markovian BSDEs. In order to make our algorithm efficiently implementable, some further modification of Markovian type is needed.
Although in the long term we aim to solve high-dimensional FBSDEs, as a first attempt and for technical reasons (in order to apply Theorem 1.2 below), in this paper we assume all the processes are one-dimensional. We also assume that $b = 0$ and $f$ is independent of $Z$. That is, we will study the following FBSDE:
$$X_t = x + \int_0^t \sigma(s, X_s, Y_s)\,dW_s; \quad Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s)\,ds - \int_t^T Z_s\,dW_s. \tag{1.4}$$
In this case, PDE (1.2) becomes
$$u_t + \tfrac12 \sigma^2(t, x, u)\,u_{xx} + f(t, x, u) = 0; \quad u(T, x) = g(x). \tag{1.5}$$
Moreover, in order to simplify the presentation and to focus on the main idea, throughout the paper we assume the following.
Assumption 1.1 All the coefficients $\sigma, f, g$ are bounded, smooth enough with bounded derivatives, and $\sigma$ is uniformly nondegenerate.
Under Assumption 1.1, it is well known that PDE (1.5) has a unique solution u which is bounded and smooth with bounded derivatives (see [11]), that FBSDE (1.4) has a unique solution (X, Y, Z), and that (1.3) holds true (see [13]). Unless otherwise specified, throughout the paper we use (X, Y, Z) and u to denote these solutions, and C, c > 0 to denote generic constants depending only on T, the upper bounds of the derivatives of the coefficients, and the uniform nondegeneracy of σ. We allow C, c to vary from line to line.
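The relation (1.3) can be checked formally with Itô's formula. The computation below is a sketch under the assumption that, with $b = 0$ and $f$ independent of $Z$, (1.4) reads $dX_t = \sigma(t,X_t,Y_t)\,dW_t$, $dY_t = -f(t,X_t,Y_t)\,dt + Z_t\,dW_t$, $Y_T = g(X_T)$, and that (1.5) reads $u_t + \tfrac12\sigma^2 u_{xx} + f = 0$. Let $X$ solve $dX_t = \sigma(t, X_t, u(t,X_t))\,dW_t$ and define $Y_t := u(t,X_t)$, $Z_t := u_x(t,X_t)\,\sigma(t,X_t,Y_t)$. Then:

```latex
\begin{align*}
dY_t &= \Big[u_t + \tfrac12\,\sigma^2(t,X_t,Y_t)\,u_{xx}\Big](t,X_t)\,dt
        + u_x(t,X_t)\,\sigma(t,X_t,Y_t)\,dW_t \\
     &= -f(t,X_t,Y_t)\,dt + Z_t\,dW_t, \qquad Y_T = u(T,X_T) = g(X_T).
\end{align*}
```

Hence $(X, Y, Z)$ solves the FBSDE, and uniqueness of its solution identifies this triple with the one in (1.3).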
Finally, we cite a well-posedness result from Zhang [23] (or [22] for a weaker result) which will play an important role in our proofs.

Theorem 1.2 Consider the following FBSDE
Assume that b, σ, f, g are uniformly Lipschitz continuous with respect to (x, y, z); that there exists a constant c > 0 such that (1.7) and that Then FBSDE (1.6) has a unique solution (X, Y, Z) such that where C is a constant depending only on T, c and the Lipschitz constants of the coefficients.
The rest of the paper is organized as follows. In the next section we transform FBSDE (1.4) to a stochastic control problem and propose the steepest descent method; in §3 we discretize the decoupled FBSDEs introduced in §2; and in §4 we transform the discrete FBSDEs to a sequence of conditional expectations.

The Steepest Descent Method
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, $W$ a standard Brownian motion, $T > 0$ a fixed terminal time, and $\mathbf{F} = \{\mathcal{F}_t\}_{0 \le t \le T}$ the filtration generated by $W$ and augmented by the $P$-null sets. Let $L^2(\mathbf{F})$ denote the space of square integrable $\mathbf{F}$-adapted processes. From now on we always assume Assumption 1.1 is in force.

The Control Problem
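In order to numerically solve (1.4), we first formulate a related stochastic control problem. Given $y_0 \in \mathbb{R}$ and $z^0 \in L^2(\mathbf{F})$, consider the following 2-dimensional (forward) SDE with random coefficients ($z^0$ being considered as a coefficient):
$$X^0_t = x + \int_0^t \sigma(s, X^0_s, Y^0_s)\,dW_s; \quad Y^0_t = y_0 - \int_0^t f(s, X^0_s, Y^0_s)\,ds + \int_0^t z^0_s\,dW_s. \tag{2.1}$$
Our first result is
Proof. The idea is similar to the four step scheme (see [13]).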
Step 1. Denote is bounded. Note that $\Delta Y_T = Y^0_T - g(X^0_T)$. By standard arguments one can easily get
Step 2. where $\alpha^i_t$ are defined in an obvious way and are uniformly bounded. Note that $\Delta X_0 = 0$. Then by standard arguments we get which, together with (2.3), implies (2.4).
Step 3. We now prove the theorem.
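The control formulation can be illustrated numerically. The following is a minimal sketch, not the authors' algorithm: it assumes the forward system has the form $dX^0 = \sigma(t,X^0,Y^0)\,dW$, $dY^0 = -f(t,X^0,Y^0)\,dt + z^0\,dW$ with $Y^0_0 = y_0$, and that the cost is $V(y_0,z^0) = \tfrac12 E|Y^0_T - g(X^0_T)|^2$ (the factor $\tfrac12$ is our normalization). The function name `estimate_cost` and the use of a feedback rule `z(t, x, y)` in place of a general adapted process are hypothetical simplifications.

```python
import numpy as np

def estimate_cost(y0, z, sigma, f, g, x0=0.0, T=1.0, n=50, n_paths=20000, seed=0):
    """Monte Carlo estimate of 0.5 * E|Y_T - g(X_T)|^2 via an Euler scheme.

    z, sigma, f take (t, x, y) with x, y arrays of shape (n_paths,);
    g takes x. All four are user-supplied, vectorized callables.
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.full(n_paths, x0, dtype=float)
    y = np.full(n_paths, y0, dtype=float)
    for i in range(n):
        t = i * dt
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increments
        # Evaluate every coefficient at the left endpoint before updating.
        s, drift, zi = sigma(t, x, y), f(t, x, y), z(t, x, y)
        x = x + s * dw
        y = y - drift * dt + zi * dw
    return 0.5 * np.mean((y - g(x)) ** 2)

if __name__ == "__main__":
    ones = lambda t, x, y: np.ones_like(x)
    zeros = lambda t, x, y: np.zeros_like(x)
    # Toy check with sigma = 1, f = 0, g(x) = x: the pair (y0, z) = (x0, 1)
    # reproduces Y = X along every path, so the estimated cost vanishes.
    print(estimate_cost(0.0, ones, ones, zeros, lambda x: x))
    # A poor guess z = 0 leaves the terminal mismatch W_T unhedged.
    print(estimate_cost(0.0, zeros, ones, zeros, lambda x: x))
```

The linear terminal function in the toy check is unbounded and so falls outside Assumption 1.1; it is used only because the exact solution is explicit.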

The Steepest Descent Direction
Our idea is to modify $(y_0, z^0)$ along the steepest descent direction so as to decrease $V$ as fast as possible. First we need to find the Fréchet derivative of $V$ along some direction $(\Delta y, \Delta z)$, where $\Delta y \in \mathbb{R}$, $\Delta z \in L^2(\mathbf{F})$. For $\delta \ge 0$, denote
$$y_0^\delta = y_0 + \delta\,\Delta y; \quad z^{0,\delta}_t = z^0_t + \delta\,\Delta z_t;$$
and let $X^{0,\delta}, Y^{0,\delta}$ be the solution to (2.1) corresponding to $(y_0^\delta, z^{0,\delta})$. Denote: where $\varphi^0_s = \varphi(s, X^0_s, Y^0_s)$ for any function $\varphi$. By standard arguments, one can easily show that where the two limits in the first line are in the $L^2(\mathbf{F})$ sense.

Lemma 2.2
For any $(\Delta y, \Delta z)$, we have
Proof. Note that Applying Itô's formula one can easily check that Then That proves the lemma.
Recall that our goal is to decrease $V(y_0, z^0)$. Very naturally one would like to choose the following steepest descent direction: which depends only on $(y_0, z^0)$ (not on $(\Delta y, \Delta z)$). Note that if $\nabla V(y_0, z^0) = 0$, then we gain nothing in decreasing $V(y_0, z^0)$. Fortunately this is not the case.
(2.8)
One may consider (2.8) as an FBSDE with solution triple $(\bar Y_t, \tilde Y_t, \tilde Z_t)$, where $\bar Y_t$ is the forward component and $(\tilde Y_t, \tilde Z_t)$ are the backward components. Then $(\bar Y^0_0, \bar Z^0_t)$ are considered as (random) coefficients of the FBSDE. One can easily check that FBSDE (2.8) satisfies condition (1.7) (with both sides equal to 0). Applying Theorem 1.2 we get In particular, which, combined with (2.7), implies the lemma.
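The descent direction can be made concrete in a toy case where everything is explicit. This is our own illustration, not taken from the paper. It assumes a cost of the form $V(y_0, z^0) = \tfrac12 E|Y^0_T - g(X^0_T)|^2$ and takes $\sigma \equiv \sigma_0$ constant, $f \equiv 0$ and $g(x) = ax$, which is unbounded and therefore outside Assumption 1.1. Then $X^0_T = x + \sigma_0 W_T$ and $Y^0_T = y_0 + \int_0^T z^0_t\,dW_t$, and by the Itô isometry:

```latex
\begin{align*}
Y^0_T - g(X^0_T) &= (y_0 - a x) + \int_0^T (z^0_t - a\sigma_0)\,dW_t, \\
V(y_0, z^0) &= \tfrac12\,(y_0 - a x)^2 + \tfrac12\,E\int_0^T |z^0_t - a\sigma_0|^2\,dt, \\
\nabla V(y_0, z^0) &= (y_0 - a x)\,\Delta y + E\int_0^T (z^0_t - a\sigma_0)\,\Delta z_t\,dt .
\end{align*}
```

Choosing $\Delta y = -(y_0 - ax)$ and $\Delta z_t = -(z^0_t - a\sigma_0)$ gives $\nabla V = -2V$, which vanishes only at the exact solution $(y_0, z^0) = (ax, a\sigma_0)$. This is the same type of bound, $\nabla V \le -cV$, that appears in discrete form in Lemma 3.3.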

Iterative Modifications
We now fix a desired error level $\varepsilon$ and pick a pair $(y_0, z^0)$. If we are lucky enough that $V(y_0, z^0) \le \varepsilon^2$, then we may use $(X^0, Y^0, z^0)$ defined by (2.1) as an approximation of $(X, Y, Z)$. Otherwise we want to modify $(y_0, z^0)$. From now on we assume where $K_0 \ge 1$ is a constant. We note that one can always assume the existence of $K_0$ by letting, for example, $y_0 = 0$, $z^0_t = 0$.
Following the proof of Lemma 2.2, we have
Step 2. First, one can easily show that s dW s ; where $\alpha^{i,\theta}, \beta^{i,\theta}$ are defined in an obvious way and are bounded. Thus, by (2.16), which is bounded. For any constants $a, b > 0$ and $0 < \lambda < 1$, applying Young's inequality we have Noting that the value of $\lambda$ we will choose is less than 1, we have
Step 3. Denote thanks to (2.17), (2.10), and (2.16). In particular,
Step 4. Note that Then, by (2.9) we have Choose $c_1 = \frac{c}{2C}$ for the constants $c, C$ as above, and $\lambda = c_1 \varepsilon$. Then by (2.10) we get
For $k = 0, 1, \cdots$, let $(X^k, Y^k, \bar Y^k, \tilde Y^k, \bar Z^k, \tilde Z^k)$ be the solution to the following FBSDE: We note that (2.22) is decoupled, with forward components $(X^k, Y^k)$ and backward components $(\bar Y^k, \tilde Y^k, \bar Z^k, \tilde Z^k)$. Denote where $c_1, C_0$ are the constants in Lemma 2.4.
For c, C as above, choose N to be the smallest integer such that We get which obviously proves the theorem.
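The mechanics of the iteration can be seen on a toy problem in which every quantity is explicit. This is our own simplification and not the scheme (2.22). It assumes $\sigma \equiv \sigma_0$, $f \equiv 0$, $g(x) = ax$, a cost of the form $\tfrac12 E|Y_T - g(X_T)|^2$, and a deterministic piecewise-constant control $z$ on a uniform grid, so that $V = \tfrac12 (y_0 - ax)^2 + \tfrac12 \sum_i (z_i - a\sigma_0)^2 \Delta t$. The names `toy_cost` and `descend` and the fixed step size `lam` are hypothetical.

```python
import numpy as np

def toy_cost(y0, z, a, x, sigma0, dt):
    """Explicit cost of the toy model for deterministic piecewise-constant z."""
    return 0.5 * (y0 - a * x) ** 2 + 0.5 * np.sum((z - a * sigma0) ** 2) * dt

def descend(y0, z, a, x, sigma0, dt, lam, eps, max_iter=100000):
    """Take steepest descent steps until the cost is at most eps**2.

    Returns the final (y0, z) and the number of steps taken.
    """
    z = np.asarray(z, dtype=float).copy()
    for k in range(max_iter):
        if toy_cost(y0, z, a, x, sigma0, dt) <= eps ** 2:
            return y0, z, k
        # Move against the gradient (y0 - a*x, z - a*sigma0) of the toy cost.
        y0 = y0 - lam * (y0 - a * x)
        z = z - lam * (z - a * sigma0)
    return y0, z, max_iter

if __name__ == "__main__":
    y, z, steps = descend(0.0, np.zeros(10), a=2.0, x=1.0, sigma0=0.5,
                          dt=0.1, lam=0.1, eps=0.01)
    print(steps, y, z[:3])
```

In this toy model each step multiplies the cost by $(1-\lambda)^2$, so the number of steps needed to reach $\varepsilon^2$ grows like $\log(1/\varepsilon)/\lambda$. In the coupled setting of the paper the step size and the number of iterations $N$ are tied to $\varepsilon$ through the constants of Lemma 2.4.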

Time Discretization
We now investigate the time discretization of FBSDEs (2.22). Fix $n$ and denote $t_i = \frac{i}{n} T$; $\Delta t = \frac{T}{n}$; $i = 0, \cdots, n$.
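For concreteness, the uniform grid and the Brownian increments over it can be generated as follows. This is a small sketch; the helper names `time_grid` and `brownian_increments` are hypothetical and not part of the paper.

```python
import numpy as np

def time_grid(T, n):
    """Return the grid t_i = i*T/n, i = 0..n, together with the mesh size T/n."""
    return np.linspace(0.0, T, n + 1), T / n

def brownian_increments(T, n, n_paths, seed=0):
    """Sample W_{t_{i+1}} - W_{t_i} for i = 0..n-1 on n_paths independent paths."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(T / n), size=(n_paths, n))

if __name__ == "__main__":
    t, dt = time_grid(1.0, 4)
    dw = brownian_increments(1.0, 4, n_paths=3)
    print(t, dt, dw.shape)
```

Summing the increments along a path recovers $W_T$, whose sample variance should be close to $T$ for a large number of paths.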

Discretization of the FSDEs
Given $y_0 \in \mathbb{R}$ and $z^0 \in L^2(\mathbf{F})$, denote (3.1) Note that we do not discretize $z^0$ here. For notational simplicity, we denote First we have Then
$$I_{n,0} \le C V_n(y_0, z^0) + \frac{C}{n}. \tag{3.4}$$
We note that (see, e.g., Zhang [21]), Then one can easily show the following estimates:
Proof of Theorem 3.1. Recall (2.1). For $i = 0, \cdots, n$, denote where $\alpha^j_i, \beta^j_i \in \mathcal{F}_{t_i}$ are defined in an obvious way and are uniformly bounded. Then and, similarly, Then $A_0 = 0$, and By the discrete Gronwall inequality we get Next, note that Applying the Burkholder-Davis-Gundy inequality and by (3.5) we get which, together with Theorem 2.1, implies that Finally, note that
$$V(y_0, z^0) \le C V_n(y_0, z^0) + C E\big\{|\Delta X_n|^2 + |\Delta Y_n|^2\big\} = C V_n(y_0, z^0) + C A_n.$$
We get Thus Choosing $n \ge 2C$ for $C$ as above, (3.4) follows immediately from (3.3).

Discretization of the BSDEs
Define the adjoint processes (or, equivalently, discretize BSDE (2.5)) as follows.
i ) for any function $\varphi$. We note again that $\bar Z^{n,0}, \tilde Z^{n,0}$ are not discretized. Denote $\Delta W_{i+1} = W_{t_{i+1}} - W_{t_i}$, $i = 0, \cdots, n-1$. Following the direction $(\Delta y, \Delta z)$, by (3.1) we have the following gradients: Repeating the same arguments and by induction we get From now on, we choose the following "almost" steepest descent direction: We note that $\Delta z$ is well defined here. Then we have
Lemma 3.3 Assume (3.9). Then for $n$ large, we have $\nabla V_n(y_0, z^0) \le -c V_n(y_0, z^0)$.
Proof. We proceed in several steps.
Step 3. By (3.9) we have where we used (3.10) for the last inequality. Then, Assume $n$ is large. By (3.11) we get
Step 4. It remains to estimate $I^{n,0}_i$. First, by standard arguments and recalling (3.9), (3.12), and (3.10), we have Recall (3.8). Combining the above inequality with (3.13) we prove the lemma for large $n$.

Iterative Modifications
We now fix a desired error level $\varepsilon$. In light of Theorem 3.1, we set $n = \varepsilon^{-2}$. So it suffices to find $(y, z)$ such that $V_n(y, z) \le \varepsilon^2$. As in §2.3, we assume
Lemma 3.4 Assume (3.16). There exist constants $C_0, c_0, c_1 > 0$, which are independent of $K_0$ and $\varepsilon$, such that