A Note on Linearization Methods and Dynamic Programming Principles for Stochastic Discontinuous Control Problems

Using the linear programming approach to stochastic control introduced in [6] and [10], we establish a semigroup property for certain sets of probability measures, leading to dynamic programming principles for stochastic control problems. An abstract principle is provided for general bounded costs. Linearized versions are obtained under further (semi)continuity assumptions.


Introduction
Linear programming tools have been efficiently used to deal with stochastic control problems (see [3], [4], [11], [12], [13], [14] and references therein). An approach relying mainly on Hamilton-Jacobi(-Bellman) equations has been developed in [9] for deterministic control systems. This approach has been generalized to controlled Brownian diffusions (cf. [6] for infinite horizon, discounted control problems and [10] for Mayer and optimal stopping problems). The control processes and the associated solutions (and, possibly, stopping times) can be embedded into a space of probability measures satisfying a convenient condition. This condition is given in terms of the coefficient functions. Using Hamilton-Jacobi-Bellman techniques, it is proven in [6] and [10] that minimizing continuous cost functionals with respect to the new set of constraints leads to the same value. These formulations turn out to provide the generalized solution of the (discontinuous) Hamilton-Jacobi-Bellman equation. For further details, the reader is referred to [10] and references therein.
For regular cost functionals, the dynamic programming principle has been extensively studied (e.g. [7], [15]). In general, dynamic programming principles are rather difficult to prove if the regularity of the value function is not known a priori. An alternative is to use a weak formulation in which the value function is replaced by a test function (cf. [5]). This short paper aims at giving another approach to dynamic programming principles, based on linear programming techniques (cf. [6], [10]).
We begin by briefly recalling, in Section 2.1, the linearization techniques from [10] for Lipschitz-continuous cost functionals. As a by-product, this allows us to characterize the set of constraints as the closed convex hull of the occupational measures associated to control processes. In Section 2.2, we give the linear programming formulation of the value in the case of lower and upper semicontinuous cost functionals and specify the connections between the classical value function and the primal and dual values. Using the characterization of the set of constraints, we prove a semigroup property (Section 3.1). This property follows naturally from the structure of the sets of constraints. We derive a dynamic programming principle for general bounded cost functionals in Section 3.2. In the upper semicontinuous (or convex, lower semicontinuous) setting, we provide a further linearized inequality. This inequality becomes an equality, providing a linearized dynamic programming principle, if the cost functionals are continuous.

Linear formulations for finite horizon stochastic control problems
We let (Ω, F, P) be a complete probability space endowed with a filtration F = (F_t)_{t≥0} satisfying the usual assumptions and W be a standard, d-dimensional Brownian motion with respect to this filtration. We denote by T > 0 a finite time horizon and we let U be a compact metric space. We consider the stochastic control system (2.1), under the standing assumption (2.2), required to hold for all (s, t, x, y, u) ∈ R^2 × R^{2N} × U and for some positive constant δ_0 > 0. We recall that an admissible control process is any F-progressively measurable process with values in the compact metric space U. We let U denote the class of all admissible control processes on [0, T] × Ω. Under assumption (2.2), for every (t, x) ∈ [0, T] × R^N and every admissible control process u ∈ U, there exists a unique solution to (2.1) starting from (t, x), denoted by X^{t,x,u}.
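For the reader's orientation, a standard form of such a controlled diffusion, which we only assume here as an illustration (the precise displays (2.1)-(2.2) are those of [10]), is

\[
X_s^{t,x,u} = x + \int_t^s b\big(r, X_r^{t,x,u}, u_r\big)\,dr + \int_t^s \sigma\big(r, X_r^{t,x,u}, u_r\big)\,dW_r, \qquad s \in [t, T],
\]

with coefficients b : [0, T] × R^N × U → R^N and σ : [0, T] × R^N × U → R^{N×d} bounded, continuous and Lipschitz-continuous in (t, x) uniformly with respect to u; the constant δ_0 appearing in (2.2) presumably quantifies bounds of this type.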

Lipschitz continuous cost functionals
In this subsection, we recall the basic tools that allow us to identify the primal and dual linear formulations associated to (finite horizon) stochastic control problems. The results can be found in [10] (see also [6] for the infinite time horizon).
To any (t, x) ∈ [0, T) × R^N and any u ∈ U, we associate the (expectation of the) occupational measures given by (2.3). Here, P(X) stands for the set of probability measures on the metric space X. Due to assumption (2.2), there exists a positive constant C_0 such that, for every T > 0, every (t, x) ∈ [0, T) × R^N and every u ∈ U, one has (2.4). We then define the set of constraints Θ(t, T, x). Moreover, the set Θ(t, T, x) is convex and a closed subset of P([t, T] × R^N × U) × P(R^N). For further details, the reader is referred to [10].
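To fix ideas, occupational couples of the type (2.3) are usually defined, up to normalization conventions which may differ from those of [10], by

\[
\gamma_1^{t,x,u}(A \times B \times C) = \frac{1}{T-t}\,\mathbb{E}\left[\int_t^T \mathbf{1}_A(s)\,\mathbf{1}_B\big(X_s^{t,x,u}\big)\,\mathbf{1}_C(u_s)\,ds\right],
\qquad
\gamma_2^{t,x,u}(B) = \mathbb{P}\big(X_T^{t,x,u} \in B\big),
\]

and, by Itô's formula, every such couple satisfies, for all smooth test functions φ with bounded derivatives,

\[
\int_{\mathbb{R}^N} \varphi(T, y)\,\gamma_2(dy) - \varphi(t, x)
= (T-t)\int_{[t,T]\times\mathbb{R}^N\times U}\Big(\partial_s\varphi + \big\langle b, \partial_y \varphi\big\rangle + \tfrac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^*\,\partial^2_{yy}\varphi\big]\Big)(s, y, u)\,\gamma_1(ds, dy, du).
\]

Constraints of this type (over a suitable class of test functions) are what one expects to define the set Θ(t, T, x); for the precise definition we refer to [10].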
Let us suppose that f : R × R^N × U → R and g : R^N → R are bounded and uniformly continuous, and satisfy condition (2.6) for all (s, t, x, y, u) ∈ R^2 × R^{2N} × U and for some positive constant c > 0. We introduce the usual value function V, the primal linearized value function Λ (cf. (2.8)) and the dual value function µ*, for all (t, x) ∈ [0, T] × R^N. The following result is a slight generalization of Theorem 4 in [10]. The proof is very similar and will be omitted.

Theorem 2.1 (see [10], Theorem 4). Under the assumptions (2.2) and (2.6), the value functions V, Λ and µ* coincide.

Since this result holds true for arbitrary (regular) functions f and g, a standard separation argument yields:

Corollary 2.2. The set of constraints Θ(t, T, x) is the closed, convex hull of the set Γ(t, T, x) of occupational couples associated to admissible controls:
Θ(t, T, x) = co Γ(t, T, x). (2.10)
The closure is taken with respect to the usual (narrow) convergence of probability measures.
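With the normalization above, the value functions under comparison typically read as follows (this is a sketch of our own; the exact displays, including (2.8), are those of [10]):

\[
V(t, x) = \inf_{u \in \mathcal{U}}\, \mathbb{E}\left[\int_t^T f\big(s, X_s^{t,x,u}, u_s\big)\,ds + g\big(X_T^{t,x,u}\big)\right],
\]
\[
\Lambda(t, x) = \inf_{(\gamma_1, \gamma_2) \in \Theta(t,T,x)} \left\{ (T-t)\int f \, d\gamma_1 + \int g \, d\gamma_2 \right\},
\]

while the dual value µ*(t, x) is obtained by maximizing φ(t, x) over smooth subsolution-type test functions φ of the associated Hamilton-Jacobi-Bellman equation, in the spirit of [6] and [10].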
Remark 2.3. 1. Due to inequality (2.4), Prohorov's theorem yields that co Γ(t, T, x) is relatively compact and, thus, Θ(t, T, x) is compact. Moreover, a second-order moment estimate holds true for the elements of Θ(t, T, x) (cf. (2.11)). 2. In fact, similar estimates can be obtained for any p ≥ 2. The set Θ(t, T, x) can then be defined by taking C^2 test functions whose derivatives have at most quadratic growth. Moreover, one can impose a bounded moment of order 2 + δ, for some δ > 0. In this case, the set of constraints Θ(t, T, x) is also the closed, convex hull of Γ(t, T, x) with respect to the Wasserstein distance W_2.
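The second-order moment estimate alluded to above (cf. (2.11)) is typically of the following form, for some constant C depending only on the data:

\[
\int_{[t,T]\times\mathbb{R}^N\times U} |y|^2\,\gamma_1(ds, dy, du) + \int_{\mathbb{R}^N} |y|^2\,\gamma_2(dy) \le C\big(1 + |x|^2\big), \qquad \text{for all } (\gamma_1, \gamma_2) \in \Theta(t, T, x).
\]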

Semicontinuous cost functionals
In the case when f and g are only semicontinuous, one can still define V , Λ and µ * .
One can also (formally) consider the associated Hamilton-Jacobi-Bellman equation (2.12) for all (t, x) ∈ (0, T) × R^N, with the terminal condition V(T, ·) = g(·) on R^N, where the Hamiltonian H is defined for all (t, x, p, A) ∈ R × R^N × R^N × S^N. We recall that S^N stands for the family of symmetric N × N matrices. The following theorem states the connection between these various functions and equation (2.12). Proofs of these results can be found in Section 4 of [10]. They are based on inf/sup-convolutions and the compactness of Θ(t, T, x).
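In the present setting, equation (2.12) and its Hamiltonian are expected to take the usual Bellman form (our sketch; the sign conventions of the original display may differ):

\[
-\partial_t V(t, x) + H\big(t, x, \partial_x V(t, x), \partial^2_{xx} V(t, x)\big) = 0, \quad (t, x) \in (0, T)\times\mathbb{R}^N, \qquad V(T, \cdot) = g(\cdot),
\]
with
\[
H(t, x, p, A) = \sup_{u \in U}\Big\{ -\big\langle b(t, x, u), p \big\rangle - \tfrac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^*(t, x, u)\, A\big] - f(t, x, u) \Big\}.
\]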
Theorem 2.4. If the functions f and g are bounded and lower semicontinuous, then the primal and dual value functions coincide: Λ = µ*. The common value is the smallest bounded l.s.c. viscosity supersolution of (2.12).
If the functions f and g are bounded and upper semicontinuous, then V coincides with Λ. The common value is the largest bounded u.s.c. viscosity subsolution of (2.12).
In the l.s.c. case, the assertions are given by Theorem 7 in [10]. The u.s.c. case is covered by Theorem 8 in [10].
Remark 2.5. If the functions f and g are bounded and lower semicontinuous (but not continuous), V and Λ might not coincide. Under usual convexity assumptions, Λ coincides with the weak formulation of V (i.e. the weak value function V_w introduced in Section 3.2).
Remark 2.6. In the case of control problems with semicontinuous cost governed by a deterministic equation, the standard assumption is the convexity of the dynamics. In this case, the value function is also semicontinuous (see, for instance, [2, 8]). Moreover, if the cost is continuous, the value function associated to the original dynamics coincides with the value function associated to the convexified differential inclusion because of Filippov's theorem (see, for instance, [1]). However, when the dynamics are not convex and the cost is l.s.c., these value functions might not coincide. Consequently, in order to obtain the existence of optimal trajectories/controls and to characterize the l.s.c. value function as the generalized solution (smallest bounded l.s.c. viscosity supersolution) of the associated Hamilton-Jacobi system, one would have to consider the convexified differential inclusion. This definition is similar to (2.8).
In the stochastic framework, one can reason in a similar way, by taking the weak limit of solutions (i.e. the closed convex hull of occupational measures). This set turns out to be the set Θ introduced in the present paper. We emphasize that the actual procedure is somewhat reversed: we consider the explicit set and deduce that it is the closed convex hull of occupational measures. It is our opinion that this is the natural method allowing one to obtain good properties of the value function whenever the cost is semicontinuous (or just bounded).

A linear formulation for the dynamic programming principle

A semigroup property
We fix x_0 ∈ R^N. Let us consider t_1, t_2 ≥ 0 such that 0 < t_1 + t_2 ≤ T, where T > 0 is a fixed terminal time. By analogy with Θ(t, T, x) for x ∈ R^N, we define sets of constraints Θ(t_1, t_1 + t_2, γ) starting from measures γ (rather than from points x). The reader is invited to notice that x can be identified with γ^{t,t,x,u}, for all u ∈ U.
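A plausible reading of this extension (an assumption on our part, consistent with the identification of x with γ^{t,t,x,u}) is that, for a couple γ = (γ_1, γ_2), the set Θ(t_1, t_1 + t_2, γ) collects the couples η = (η_1, η_2) satisfying the same linear constraints as before, with the initial condition averaged against γ_2:

\[
\int_{\mathbb{R}^N} \varphi(t_1 + t_2, y)\,\eta_2(dy) - \int_{\mathbb{R}^N} \varphi(t_1, y)\,\gamma_2(dy)
= t_2 \int_{[t_1, t_1+t_2]\times\mathbb{R}^N\times U}\Big(\partial_s\varphi + \big\langle b, \partial_y\varphi\big\rangle + \tfrac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^*\,\partial^2_{yy}\varphi\big]\Big)\,d\eta_1,
\]

for all suitable test functions φ, together with the corresponding moment bounds.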
Proof. We only need to prove that Θ(t_1, t_1 + t_2, γ) is nonempty. Convexity and closedness are obvious from the definition, while the second-order moment inequalities guarantee relative compactness (due to Prohorov's theorem). We consider an arbitrary test function. We begin by assuming that γ = γ^{0,t_1,x_0,u}, for some admissible control process u ∈ U, and we define a couple of probability measures (η^1, η^2). One then obtains (3.1). Therefore, combining (3.1) with (2.3), the assertion of our proposition holds true for occupational couples.
Proof. First, let us notice that, whenever x ∈ B^ε_n, the required estimate holds. Due to (2.11), one has a similar bound, and analogous estimates hold true for η^{ε,1}. Prohorov's theorem yields the relative compactness. Moreover, if η is the limit of some subsequence as ε → 0, one obtains the constraint inequality for some constant c depending only on φ. We conclude that η ∈ Θ(t_1, t_1 + t_2, γ).

Dynamic programming principles
Let us now return to the primal value function introduced by (2.8). If γ = (γ_1, γ_2) ∈ Θ(t_0, t_0, x_0), then γ_2 = δ_{x_0} and it seems natural to impose Λ(t_0, γ) = Λ(t_0, x_0).

Proposition 3.7. If f and g are l.s.c. and bounded, then the restriction of the functional γ → Λ(t, γ) to Θ(t_0, t, x_0) is convex, for all t_0 < t ≤ T and all x_0 ∈ R^N.
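The functional γ → Λ(t, γ) appearing in Proposition 3.7 extends (2.8) to measure-valued initial data; a natural candidate (an assumption on our part, consistent with the identity used in the deterministic discussion at the end of the paper) is

\[
\Lambda(t, \gamma) = \inf_{\eta = (\eta_1, \eta_2) \in \Theta(t, T, \gamma)} \left\{ (T - t) \int f \, d\eta_1 + \int g \, d\eta_2 \right\},
\]

which reduces to Λ(t, x) when γ_2 = δ_x.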
We can now state and prove the dynamic programming principle(s) for the value function Λ. In the bounded case, an abstract DPP is obtained. Under further assumptions, the DPP has a linear form. We only obtain an inequality if the cost functions are semicontinuous, while a linearized DPP holds in the uniformly continuous framework. This is just a first step in studying the equations that would characterize the discontinuous value function.
Theorem 3.8 (Dynamic Programming Principles). 1. Let us suppose that the functions f and g are bounded. Then, the following equality holds true for all x_0 ∈ R^N and all t ∈ (t_0, T); a sketch of its expected form is given after the statement.
2. If f and g are bounded and u.s.c., then the corresponding linearized inequality holds. The same holds true if {(σσ*(t, x, u), b(t, x, u), f(t, x, u)) : u ∈ U} is convex and f and g are bounded and l.s.c.
3. If f and g are bounded and Lipschitz-continuous, the linearized DPP holds with equality, for all x_0 ∈ R^N and all t ∈ (t_0, T).
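For intuition, dynamic programming principles of the type stated in Theorem 3.8 are expected to take the following shape (our own hedged sketch; the precise displays are those of the original statement): the abstract principle in assertion 1 reads

\[
\Lambda(t_0, x_0) = \inf_{\gamma = (\gamma_1, \gamma_2) \in \Theta(t_0, t, x_0)} \left\{ (t - t_0) \int f \, d\gamma_1 + \Lambda(t, \gamma) \right\},
\]

while the linearized versions in assertions 2 and 3 replace Λ(t, γ) by the averaged value ∫_{R^N} Λ(t, y) γ_2(dy), with an inequality in assertion 2 and equality in assertion 3.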
Remark 3.9. 1. If f and g are bounded and Lipschitz-continuous, Λ coincides with the classical value function V (cf. Theorem 4 in [10]) and is, therefore, uniformly continuous. The method can be extended to uniformly continuous cost functions.
2. In the upper semicontinuous case, due to [10], Proposition 11, Λ coincides with the weak value function V_w for all (s, x) ∈ [0, T] × R^N. Here, V_w is defined over the family of weak control processes U_w = {π : π = (Ω, F, (F_t)_{t≥0}, P, W, u)}. It is known from the classical optimality principles (see, for example, [15], Chapter 4, Lemma 3.2 and Theorem 3.3) that V_w satisfies a dynamic programming inequality (3.5). To every weak control π we can associate, as before, an occupational couple γ^{0,t,x_0,π}; as in the strong case, γ^{0,t,x_0,π} ∈ Θ(0, t, x_0). With this notation, the inequality (3.5) yields the desired estimate and we conclude using (3.4). In the l.s.c. case, one uses [10], Proposition 12.
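The classical optimality principle invoked here (cf. [15], Chapter 4) is, in a standard form (a sketch; the original (3.5) may be stated as an inequality),

\[
V_w(0, x_0) = \inf_{\pi \in \mathcal{U}_w} \mathbb{E}\left[\int_{0}^{t} f\big(s, X_s^{0, x_0, \pi}, u_s\big)\,ds + V_w\big(t, X_t^{0, x_0, \pi}\big)\right], \qquad 0 \le t \le T;
\]

rewritten in terms of the occupational couple γ^{0,t,x_0,π} = (γ_1, γ_2), the expectation above becomes (with our normalization) t ∫ f dγ_1 + ∫_{R^N} V_w(t, y) γ_2(dy), which is the form in which it can be combined with the membership γ^{0,t,x_0,π} ∈ Θ(0, t, x_0).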
In the deterministic framework (σ = 0), if γ is an occupational couple associated to x_0 and u ∈ U, then Θ(t, T, γ) = Θ(t, T, x^{t_0,x_0,u}_t) and, by definition, ∫_{R^N} Λ(t, x) γ_2(dx) = Λ(t, x^{t_0,x_0,u}_t) = Λ(t, γ). It follows from the second assertion of the previous theorem that we have equality for u.s.c. costs, and the proof of assertion 3 is much simpler.