Nonlinear Continuous Semimartingales

In this paper we study a family of nonlinear (conditional) expectations that can be understood as a continuous semimartingale with uncertain local characteristics. Here, the differential characteristics are prescribed by a set-valued function that depends on time and path in a non-Markovian way. We provide a dynamic programming principle for the nonlinear expectation and we link the corresponding value function to a variational form of a nonlinear path-dependent partial differential equation. In particular, we establish conditions that allow us to identify the value function as the unique viscosity solution. Furthermore, we prove that the nonlinear expectation solves a nonlinear martingale problem, which confirms our interpretation as a nonlinear semimartingale.


Introduction
In this paper we study a family of nonlinear (conditional) expectations which we call nonlinear continuous semimartingales and which we consider as continuous semimartingales with uncertain local characteristics. This line of research started with the seminal work of Peng [36,37] on the G-Brownian motion. In recent years, there have been several extensions to construct larger classes of nonlinear (Markov) processes, see [12,18,33,34]. At this point we highlight the articles [11,32,35], which establish general abstract measure-theoretic concepts to construct nonlinear expectations. Next to this approach, nonlinear Markov processes have also been constructed via nonlinear semigroups, see [7,17,27,28] and, in particular, [17, Chapter 4] for a comparison of the methods. This paper investigates nonlinear processes with non-Markovian dynamics. We define a nonlinear expectation via
\[
\mathcal{E}_t(\psi)(\omega) := \sup_{P \in \mathcal{C}(t,\omega)} E^P[\psi], \quad (t, \omega) \in \mathbb{R}_+ \times C(\mathbb{R}_+; \mathbb{R}^d),
\]
where ψ : C(R_+; R^d) → R is an upper semianalytic function and C(t, ω) is a set of probability measures on the Wiener space which give point mass to the path ω till time t and afterwards coincide with the law of a semimartingale with absolutely continuous characteristics. In this paper, we parameterize drift and volatility by a compact parameter space F and two functions b : F × R_+ × C(R_+; R^d) → R^d and a : F × R_+ × C(R_+; R^d) → S^d_+ such that
\[
\mathcal{C}(t, \omega) := \big\{ P \in \mathcal{P}^{ac}_{sem}(t) : P(X^t = \omega^t) = 1,\ (\lambda \otimes P)\text{-a.e. } (dB^P_{\cdot + t}/d\lambda, dC^P_{\cdot + t}/d\lambda) \in \Theta(\cdot + t, X) \big\},
\]
where
\[
\Theta(t, \omega) := \big\{ (b(f, t, \omega), a(f, t, \omega)) : f \in F \big\}
\]
and P^ac_sem(t) denotes the set of semimartingale laws after t with absolutely continuous characteristics. This framework includes nonlinear Lévy processes as introduced (with jumps) in [33] and the class of nonlinear affine processes as studied in [12]. Furthermore, our setting can also be used to model path-dependent dynamics such as stochastic delay equations under parameter uncertainty.
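To make the object E_t(ψ) concrete, the following sketch approximates the supremum over a finite family of parameterized coefficients by Monte Carlo simulation of Euler-scheme laws in dimension d = 1. All names, the finite parameter family and the discretization are illustrative assumptions, not the paper's construction.

```python
import random

def euler_paths(b, a, x0, T=1.0, n_steps=50, n_paths=2000, seed=0):
    """Euler scheme paths of dX = b(t, X) dt + sqrt(a(t, X)) dW (dimension d = 1)."""
    rng = random.Random(seed)
    dt = T / n_steps
    paths = []
    for _ in range(n_paths):
        x, path = x0, [x0]
        for k in range(n_steps):
            t = k * dt
            x = x + b(t, x) * dt + (a(t, x) * dt) ** 0.5 * rng.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths

def nonlinear_expectation(psi, coeffs, x0):
    """E(psi) ~ sup over finitely many parameters f of E^{P_f}[psi(path)]."""
    values = []
    for b, a in coeffs:
        ps = euler_paths(b, a, x0)
        values.append(sum(psi(p) for p in ps) / len(ps))
    return max(values), values
```

For a convex terminal functional such as ψ(path) = X_T², the supremum is attained at the largest volatility parameter, in line with the intuition of volatility uncertainty.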
For this nonlinear expectation we prove the dynamic programming principle (DPP), i.e., we prove the tower property E_σ(ψ) = E_σ(E_τ(ψ)) for all finite stopping times τ ≥ σ. To prove the DPP we use an abstract theorem from [11]. The work lies in the verification of its prerequisites. To check them we extend certain results from [32] on the measurability of the semimartingale property and the behavior of the characteristics to a dynamic framework.
In a second step we identify two properties of E which confirm our interpretation as a nonlinear continuous semimartingale. First, we relate the value function
\[
v(t, \omega) := \mathcal{E}_t(\psi)(\omega) = \sup_{P \in \mathcal{C}(t, \omega)} E^P[\psi] \tag{1.1}
\]
to a path-dependent Kolmogorov type partial differential equation and, in the second part, we show that E solves a type of nonlinear martingale problem. Let us discuss our contributions in more detail.
Under mere continuity and linear growth conditions on the drift b and the volatility a, and under the hypothesis that the set {(b(f, t, ω), a(f, t, ω)) : f ∈ F} is convex for every (t, ω) ∈ R_+ × C(R_+; R^d), we show that the value function v is a weak sense viscosity solution, i.e., a path-dependent Crandall-Lions type viscosity solution without regularity properties, to the following path-dependent partial differential equation (PPDE):
\[
\dot\partial v(t, \omega) + G(t, \omega, v) = 0, \tag{1.2}
\]
where
\[
G(t, \omega, \phi) := \sup\big\{ \langle \nabla \phi(t, \omega), b(f, t, \omega) \rangle + \tfrac{1}{2} \operatorname{tr}\big[ \nabla^2 \phi(t, \omega)\, a(f, t, \omega) \big] : f \in F \big\}.
\]
The proof of the viscosity property is split into two parts, i.e., we prove the sub- and the supersolution property separately. The ideas of proof are based on applications of Berge's maximum theorem, Skorokhod's existence theorem for stochastic differential equations and Lebesgue's differentiation theorem. In contrast to the proofs from [12,33] for the viscosity subsolution property in Lévy and continuous affine frameworks, respectively, we do not work with explicit moment estimates. This allows us to extend the class of test functions in our framework to C^{1,2} in comparison to the class C^{2,3} as used in [12,33].
Further, we investigate when the value function is not only a weak sense viscosity solution but has in addition some regularity properties. By virtue of the last term in (1.1) and Berge's maximum theorem, it is a natural idea to deduce regularity properties of v from corresponding properties of the set-valued map (t, ω) ↦ C(t, ω). To the best of our knowledge, the idea to deduce regularity properties of value functions from related properties of set-valued maps traces back to the seminal paper [10] that investigates a controlled diffusion framework. Due to the appearance of P^ac_sem(t) and (dB^P_{·+t}/dλ, dC^P_{·+t}/dλ) in the definition of the set C(t, ω), regularity properties of (t, ω) ↦ C(t, ω) seem at first glance to be difficult to verify. To get a more convenient condition, we show that
\[
v(t, \omega) = \sup_{P \in \mathcal{R}(t, \omega)} E^P\big[ \psi(\omega \otimes_t X) \big],
\]
where
\[
\mathcal{R}(t, \omega) := \big\{ P \in \mathcal{P}^{ac}_{sem} : P \circ X_0^{-1} = \delta_{\omega(t)},\ (\lambda \otimes P)\text{-a.e. } (dB^P/d\lambda, dC^P/d\lambda) \in \Theta(\cdot + t, \omega \otimes_t X) \big\}
\]
and
\[
\omega \otimes_t \omega' := \omega \mathbb{1}_{[0, t)} + \big( \omega(t) + \omega'(\cdot - t) - \omega'(0) \big) \mathbb{1}_{[t, \infty)}.
\]
This reformulation of the value function v explains that it suffices to investigate the correspondence (t, ω) ↦ R(t, ω) and it connects the two (seemingly closely related) approaches from [11] and [35] for the construction of nonlinear expectations. We show that (t, ω) ↦ R(t, ω) is upper hemicontinuous and compact-valued, which establishes upper semicontinuity of v. This requires a profound analysis of the limiting behaviour of semimartingale characteristics and hinges on the continuity of the correspondence (t, ω) ↦ Θ(t, ω). We also present a counterexample which shows that mere continuity and linear growth of b and a are insufficient for lower hemicontinuity of (t, ω) ↦ R(t, ω).
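The concatenation operator used above is elementary to implement. The following sketch (paths represented as callables, d = 1; an illustration only) checks that ω ⊗_t ω′ follows ω before t and afterwards continues continuously with the increments of ω′.

```python
def concat(omega, omega2, t):
    """(omega ⊗_t omega')(s) = omega(s) for s < t, and
    omega(t) + omega'(s - t) - omega'(0) for s >= t (continuous gluing)."""
    def path(s):
        if s < t:
            return omega(s)
        return omega(t) + omega2(s - t) - omega2(0.0)
    return path
```

For instance, gluing ω(s) = s² and ω′(s) = 3 + s at t = 1 yields a path with value 1 at s = 1 and value 2 at s = 2: the level 3 of ω′ is discarded and only its increments are kept, so no jump occurs at the gluing time.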
Provided b and a are additionally locally Lipschitz continuous, we prove lower hemicontinuity of (t, ω) → R(t, ω). To this end, we combine arguments based on an implicit function theorem, a strong existence property for stochastic differential equations with random locally Lipschitz coefficients and Gronwall's lemma. To the best of our knowledge, this is the first result regarding lower hemicontinuity in a path-dependent setting related to nonlinear stochastic processes.
Under a uniform Lipschitz continuity condition on the coefficients b and a and the terminal function ψ, we also show that the value function v has a certain Lipschitz property w.r.t. a suitable (pseudo)metric d. This observation allows us to invoke a novel uniqueness result from [44] that identifies v as the unique viscosity solution to the PPDE (1.2) that is bounded and Lipschitz continuous w.r.t. d.
Next, we discuss the martingale problem related to E. In case the coefficients b and a are compactly parameterized independently of each other, and under linear growth and mere continuity assumptions, we show that, for suitable test functions φ, the corresponding (stopped) test process is a local E-martingale. This seems to be a novel connection between nonlinear martingale problems and nonlinear processes in a path-dependent setting. Our proof is based on applications of a measurable maximum theorem and Skorokhod's existence theorem.
This paper is structured as follows: in Section 2 we introduce our setting. In Section 3 we state the DPP and in Section 4 we discuss the relation of the correspondence R to the PPDE (1.2). In Section 5 the nonlinear martingale problem is stated. The proofs for our main results are given in the remaining sections. More precisely, the DPP is proved in Section 6, the viscosity property is proved in Section 7, regularity of R and v are established in Section 8 and the martingale problem is proved in Section 9.

The Setting
Let d ∈ N be a fixed dimension and define Ω to be the space of continuous functions R_+ → R^d endowed with the local uniform topology. The Euclidean scalar product and the corresponding Euclidean norm are denoted by ⟨·, ·⟩ and ‖·‖. We write X for the canonical process on Ω, i.e., X_t(ω) = ω(t) for ω ∈ Ω and t ∈ R_+. It is well-known that F := B(Ω) = σ(X_t, t ≥ 0). We define F := (F_t)_{t≥0} as the canonical filtration generated by X, i.e., F_t := σ(X_s, s ≤ t) for t ∈ R_+. Notice that we do not make the filtration F right-continuous. The set of probability measures on (Ω, F) is denoted by P(Ω) and endowed with the usual topology of convergence in distribution. Let F be a metrizable space and let b : F × R_+ × Ω → R^d and a : F × R_+ × Ω → S^d_+ be Borel functions such that (t, ω) ↦ b(f, t, ω) and (t, ω) ↦ a(f, t, ω) are predictable for every f ∈ F. Here, S^d_+ denotes the space of all real-valued symmetric positive semidefinite d × d matrices. We define the correspondence, i.e., the set-valued mapping,
\[
\Theta(t, \omega) := \big\{ (b(f, t, \omega), a(f, t, \omega)) : f \in F \big\}, \quad (t, \omega) \in \mathbb{R}_+ \times \Omega.
\]

Standing Assumption 2.1. Θ has a measurable graph, i.e., the graph
\[
\operatorname{gr} \Theta := \big\{ (t, \omega, \theta) \in \mathbb{R}_+ \times \Omega \times (\mathbb{R}^d \times \mathbb{S}^d_+) : \theta \in \Theta(t, \omega) \big\}
\]
is a measurable set.

In Lemma 2.8 below we will see that this standing assumption holds once F is compact and b and a are continuous in the F variable.
We call an R^d-valued continuous process Y = (Y_t)_{t≥0} a (continuous) semimartingale after a time t* ∈ R_+ if the process Y_{·+t*} = (Y_{t+t*})_{t≥0} is a d-dimensional semimartingale for its natural right-continuous filtration. Notice that it comes without loss of generality that we consider the right-continuous version of the filtration (see [32, Proposition 2.2]). The law of a semimartingale after t* is said to be a semimartingale law after t* and the set of them is denoted by P_sem(t*). Notice also that P ∈ P_sem(t*) if and only if the coordinate process is a semimartingale after t*, see Lemma 6.4 below. For P ∈ P_sem(t*) we denote the semimartingale characteristics of the shifted coordinate process X_{·+t*} by (B^P_{·+t*}, C^P_{·+t*}). Moreover, we set
\[
\mathcal{P}^{ac}_{sem}(t^*) := \big\{ P \in \mathcal{P}_{sem}(t^*) : P\text{-a.s. } (B^P_{\cdot+t^*}, C^P_{\cdot+t^*}) \ll \lambda \big\}, \qquad \mathcal{P}^{ac}_{sem} := \mathcal{P}^{ac}_{sem}(0),
\]
where λ denotes the Lebesgue measure. For ω, ω′ ∈ Ω and t ∈ R_+, we define the concatenation
\[
\omega \otimes_t \omega' := \omega \mathbb{1}_{[0, t)} + \big( \omega(t) + \omega'(\cdot - t) - \omega'(0) \big) \mathbb{1}_{[t, \infty)}.
\]
Finally, for (t, ω) ∈ R_+ × Ω, we define C(t, ω) ⊂ P(Ω) by
\[
\mathcal{C}(t, \omega) := \big\{ P \in \mathcal{P}^{ac}_{sem}(t) : P(X^t = \omega^t) = 1,\ (\lambda \otimes P)\text{-a.e. } (dB^P_{\cdot+t}/d\lambda, dC^P_{\cdot+t}/d\lambda) \in \Theta(\cdot + t, X) \big\},
\]
where we use the standard notation X^t := X_{·∧t}. To lighten our notation, let us further define, for two stopping times S and T, the stochastic interval
\[
[[S, T]] := \big\{ (t, \omega) \in \mathbb{R}_+ \times \Omega : S(\omega) \le t \le T(\omega) \big\}.
\]
Our second standing assumption is that C(t, ω) ≠ ∅ for all (t, ω) ∈ R_+ × Ω. In Lemma 2.10 below we will see that this standing assumption holds under continuity and linear growth conditions on b and a.
In the following we state and discuss some conditions needed to formulate our main results.

Condition 2.6 (Continuity). The functions (f, t, ω) ↦ b(f, t, ω) and (f, t, ω) ↦ a(f, t, ω) are continuous.
Before we present our main results, let us shortly show that our standing assumptions hold under some of the above conditions. We start with Standing Assumption 2.1.
Lemma 2.8. If F is a compact metrizable space and Condition 2.5 holds, then the correspondence Θ has a measurable graph, i.e., Standing Assumption 2.1 holds.
The previous lemma is a direct consequence of the following general observation. Lemma 2.9. Let (Σ, G) be a measurable space, let F be a compact metrizable space, let E be a separable metrizable space, and finally let g : F ×Σ → E be a Carathéodory function, i.e., g is continuous in the first and measurable in the second variable. Then, the correspondence ϕ defined by ϕ(σ) := {g(f, σ) : f ∈ F } has a measurable graph.
Moreover, our second standing assumption also follows from some of the conditions above.

Lemma 2.10. If F is a compact metrizable space and the Conditions 2.4 and 2.6 hold, then C(t, ω) ≠ ∅ for all (t, ω) ∈ R_+ × Ω, i.e., our second standing assumption holds.

Proof. This follows from Lemma 7.11 below.
The Dynamic Programming Principle

The value function can be interpreted as a nonlinear expectation E given by
\[
\mathcal{E}_t(\psi)(\omega) := \sup_{P \in \mathcal{C}(t, \omega)} E^P[\psi] = v(t, \omega), \quad (t, \omega) \in \mathbb{R}_+ \times \Omega.
\]
The DPP in Theorem 3.1 provides the tower rule for E. Namely, (3.1) means that
\[
\mathcal{E}_\sigma(\psi) = \mathcal{E}_\sigma(\mathcal{E}_\tau(\psi)). \tag{3.2}
\]
By its pathwise structure, the equality (3.2) also holds for all finite stopping times τ ≥ σ. To prove Theorem 3.1 we use a general theorem from [11]. The work lies in the verification of the prerequisites, which are (i) a measurable graph property, (ii) a stability property for conditioning, and (iii) a stability property for pasting. The proof is given in Section 6 below.
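In discrete time the tower property can be checked directly. The following toy sketch (a finite drift set, binary shocks and a small horizon are assumptions for illustration, not the paper's continuous-time setting) computes a sublinear expectation by backward induction and verifies E_0(ψ) = E_0(E_m(ψ)).

```python
def value(path, k, horizon, terminal, drifts=(-0.5, 0.0, 0.5)):
    """Backward induction for a discrete-time sublinear expectation:
    at each step, take the sup over drifts of the mean over two shocks."""
    if k == horizon:
        return terminal(path)
    best = float("-inf")
    for f in drifts:
        avg = sum(
            value(path + (path[-1] + f + shock,), k + 1, horizon, terminal)
            for shock in (-1.0, 1.0)
        ) / 2.0
        best = max(best, avg)
    return best

N, m = 4, 2
psi = lambda path: max(path)                  # terminal functional
direct = value((0.0,), 0, N, psi)             # E_0(psi)
inner = lambda path: value(path, m, N, psi)   # path -> E_m(psi)(path)
towered = value((0.0,), 0, m, inner)          # E_0(E_m(psi))
```

Here `towered` first evaluates the conditional expectation at the intermediate time m as a function of the path and then takes the expectation at time 0; the DPP asserts that this two-stage evaluation agrees with the direct one.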
Example 3.2. In the following we mention some examples of stochastic models that are covered by our framework. We stress that it includes many previously studied frameworks but also some new ones which are of interest for future investigations.
(i) The case where Θ(t, ω) ≡ Θ is independent of time t and path ω corresponds to the generalized G-Brownian motion as introduced in [38], cf. also [33] for a nonlinear Lévy setting with jumps.

(ii) The situation where Θ(t, ω) ≡ Θ(ω(t)) depends on (t, ω) only through the value ω(t) corresponds to a Markovian setting that has, for instance, been studied in [5,17].
(iii) Our framework also covers a class of linear stochastic delay differential equations with parameter uncertainty (that is captured by the F-dependence of b and a). Control problems for such a path-dependent setting were studied in [14]. From a modeling perspective, natural assumptions such as sign-constraints and ellipticity can be incorporated through the functions b_0, b_1 and a_0, respectively. More references involving control problems for delay equations can be found in [3]. Our framework seems to be the first that captures stochastic delay equations from the perspective of nonlinear stochastic processes.
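As an illustration of such delay dynamics, here is a minimal Euler sketch for a linear stochastic delay equation of the assumed form dX_t = (b_0 X_t + b_1 X_{t−r}) dt + a_0 dW_t; the concrete equation, the constant initial segment and all parameter values are illustrative assumptions only.

```python
import random

def simulate_delay_sde(b0, b1, a0, delay=1.0, T=5.0, dt=0.01, x0=1.0, seed=1):
    """Euler scheme for dX_t = (b0*X_t + b1*X_{t-delay}) dt + a0 dW_t,
    with X_t = x0 frozen for t <= 0 (constant initial segment)."""
    rng = random.Random(seed)
    lag = int(round(delay / dt))
    n = int(round(T / dt))
    path = [x0]
    for k in range(n):
        x = path[-1]
        x_lag = path[k - lag] if k >= lag else x0   # look up the delayed value
        drift = b0 * x + b1 * x_lag
        path.append(x + drift * dt + a0 * (dt ** 0.5) * rng.gauss(0.0, 1.0))
    return path
```

Parameter uncertainty in the sense of the example would correspond to letting (b_0, b_1, a_0) range over a compact set and taking a supremum over the resulting laws.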
Given the DPP, we proceed studying more properties of v and E. In the following section we identify v as a viscosity solution to a certain nonlinear path-dependent partial differential equation (PPDE) and, in Section 5, we identify E as a solution to a nonlinear martingale problem.

The nonlinear Kolmogorov Equation
In the following we discuss the relation of the value function to a path-dependent Kolmogorov type partial differential equation. This section is structured as follows. In Section 4.1 we define the class of test functions for the concept of Crandall-Lions type viscosity solutions in our path-dependent setting. The Kolmogorov type equation is introduced in Section 4.2. Finally, we present our main results in Section 4.3.

4.1.
The class of test functions. In the first part of this section we introduce the set of test functions for the concept of Crandall-Lions type viscosity solutions in our path-dependent setting. The following definitions are adapted from [3,4]. Let T > 0 and t_0 ∈ [0, T). Further, let D(R_+; R^d) be the space of càdlàg functions from R_+ into R^d. We define Λ(t_0) := [t_0, T] × D(R_+; R^d) and, on [0, T] × D(R_+; R^d), we further define the pseudometric d as
\[
\mathbf{d}\big( (t, \omega), (s, \alpha) \big) := |t - s| + \sup_{r \in [0, T]} \big\| \omega(r \wedge t) - \alpha(r \wedge s) \big\|.
\]
We denote the restriction of d to Λ(t_0) again by d. For a map F : Λ(t_0) → R we say that F admits a horizontal derivative at (t, ω) ∈ Λ(t_0) with t < T if
\[
\dot\partial F(t, \omega) := \lim_{h \searrow 0} \frac{F(t + h, \omega) - F(t, \omega)}{h}
\]
exists. At t = T, the horizontal derivative is defined as
\[
\dot\partial F(T, \omega) := \lim_{h \searrow 0} \frac{F(T, \omega) - F(T - h, \omega)}{h}.
\]
Further, we say that F admits a vertical derivative at (t, ω) ∈ Λ(t_0) if the limits
\[
\partial_i F(t, \omega) := \lim_{h \to 0} \frac{F\big(t, \omega + h e_i \mathbb{1}_{[t, \infty)}\big) - F(t, \omega)}{h}, \quad i = 1, \dots, d,
\]
exist, where e_1, ..., e_d are the standard unit vectors in R^d. Accordingly, the second vertical derivatives ∂²_{ij}F are defined as the vertical derivatives of the ∂_j F. We write ∇F := (∂_1 F, ..., ∂_d F) for the vertical gradient and ∇²F := (∂²_{ij}F)_{i,j=1,...,d} for the vertical Hessian matrix.
Next, we denote by C^{1,2}(Λ(t_0); R) the set of functions F : Λ(t_0) → R, continuous with respect to d, such that the derivatives ∂̇F, ∇F, ∇²F exist everywhere on Λ(t_0) and are continuous with respect to d.
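The vertical and horizontal derivatives can be sanity-checked by finite differences. In this sketch (d = 1, paths as callables; the bump-and-freeze construction mirrors the definitions above and everything else is an illustrative assumption) we verify them for the smooth functional F(t, ω) = t + ω(t)².

```python
def vertical_derivative(F, t, omega, h=1e-6):
    """Central difference for the vertical derivative: bump the path by ±h on [t, ∞)."""
    up = lambda s: omega(s) + (h if s >= t else 0.0)
    dn = lambda s: omega(s) - (h if s >= t else 0.0)
    return (F(t, up) - F(t, dn)) / (2.0 * h)

def horizontal_derivative(F, t, omega, h=1e-6):
    """Forward difference for the horizontal derivative: move time, path frozen after t."""
    frozen = lambda s: omega(min(s, t))
    return (F(t + h, frozen) - F(t, frozen)) / h

F = lambda t, w: t + w(t) ** 2      # smooth path functional
omega = lambda s: s                  # the identity path
```

At (t, ω) = (1, id) the exact values are ∂̇F = 1 and ∂F = 2ω(1) = 2, which the finite differences reproduce up to discretization error.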
4.2.
The Kolmogorov type equation. In this case, for φ ∈ C^{1,2}(Λ(t_0); R), we define
\[
G(t, \omega, \phi) := \sup\big\{ \langle \nabla \phi(t, \omega), b(f, t, \omega) \rangle + \tfrac{1}{2} \operatorname{tr}\big[ \nabla^2 \phi(t, \omega)\, a(f, t, \omega) \big] : f \in F \big\}.
\]
One of our goals is the identification of the value function v as a so-called viscosity solution to the nonlinear PPDE
\[
\dot\partial u(t, \omega) + G(t, \omega, u) = 0 \ \text{ on } [[0, T[[, \qquad u(T, \cdot) = \psi, \tag{4.2}
\]
where ψ : Ω → R is a bounded continuous function such that ψ(ω) = ψ(ω(· ∧ T)). For notational convenience, we fix the terminal function ψ from now on. In contrast to the classical case, where the solution runs over time and space, we consider viscosity solutions which run over time and path. Let us provide a precise definition of a viscosity solution in our setting. A function u : [[0, T]] → R is said to be a weak sense viscosity subsolution to (4.2) if the following two properties hold:
(a) u(T, ·) ≤ ψ;
(b) for every (t, ω) ∈ [[0, T[[ and every test function φ ∈ C^{1,2} such that φ(t, ω) = u(t, ω) and φ ≥ u, we have ∂̇φ(t, ω) + G(t, ω, φ) ≥ 0.
Moreover, a function u : [[0, T]] → R is said to be a weak sense viscosity supersolution to (4.2) if the following two properties hold:
(a) u(T, ·) ≥ ψ;
(b) for every (t, ω) ∈ [[0, T[[ and every test function φ ∈ C^{1,2} such that φ(t, ω) = u(t, ω) and φ ≤ u, we have ∂̇φ(t, ω) + G(t, ω, φ) ≤ 0.
Further, u is called a weak sense viscosity solution if it is a weak sense viscosity sub- and supersolution. Finally, a continuous weak sense viscosity solution is called a viscosity solution.
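For a fixed (t, ω), evaluating G amounts to a maximization over the parameter set. A minimal sketch for d = 1, with F discretized to a finite grid and interval-type bounds chosen purely as illustrative assumptions:

```python
def G_value(grad_phi, hess_phi, b_of_f, a_of_f, params):
    """G(t, ω, φ) = sup over f of <∇φ, b(f)> + (1/2) tr(∇²φ a(f)), here with d = 1."""
    return max(grad_phi * b_of_f(f) + 0.5 * hess_phi * a_of_f(f) for f in params)

# interval uncertainty b ∈ [-1, 1], a ∈ [1, 2], parameterized by f = (f1, f2) ∈ [0, 1]^2
params = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]
b_of_f = lambda f: -1.0 + 2.0 * f[0]
a_of_f = lambda f: 1.0 + f[1]
```

Since the objective is affine in (b, a), the supremum is attained at extreme points of the parameter rectangle, e.g. at the largest volatility when ∇²φ > 0.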

4.3.
Main Results. Before we present our main results, we need a last bit of notation. For ω, ω′ ∈ Ω and t ∈ R_+, recall the concatenation ω ⊗_t ω′ := ω1_{[0,t)} + (ω(t) + ω′(· − t) − ω′(0))1_{[t,∞)} and define
\[
\mathcal{R}(t, \omega) := \big\{ P \in \mathcal{P}^{ac}_{sem} : P \circ X_0^{-1} = \delta_{\omega(t)},\ (\lambda \otimes P)\text{-a.e. } (dB^P/d\lambda, dC^P/d\lambda) \in \Theta(\cdot + t, \omega \otimes_t X) \big\}. \tag{4.3}
\]
We are in the position to present the main results of this section. Let us start with the viscosity solution part. For a first result regarding upper semicontinuity of v in a Markovian framework beyond the Lévy case (albeit under uniform boundedness and global Lipschitz assumptions) we refer to [17, Lemma 4.42]. The thesis [17] contains no explicit conditions for lower semicontinuity, which appears to be related to martingale problems with possibly non-regular coefficients; these are difficult to study, see also [17, Remark 4.43], [27, Remark 3.4] and [28, Remark 5.4] for comments in this direction.
In the following, we prove more general conditions for upper and lower hemicontinuity of the correspondence (t, ω) → R(t, ω) which lead to explicit conditions for the continuity of the value function and thereby identify it as a viscosity solution to the nonlinear PPDE (4.2).
Theorem 4.4 (Upper hemicontinuity of R). Assume that F is a compact metrizable space and that the Conditions 2.4, 2.6 and 2.7 hold. Then, the correspondence (t, ω) → R(t, ω) is upper hemicontinuous with compact values.
The following example shows that the conditions from Theorem 4.4 are not sufficient for lower hemicontinuity of (t, ω) → R(t, ω).
Next, we show that the correspondence (t, ω) ↦ R(t, ω) is lower hemicontinuous in case we impose an additional local Lipschitz condition. For a matrix A, we denote its transpose by A*. The following theorem is seemingly the first result regarding lower hemicontinuity and, together with Theorem 4.3, the first result on lower semicontinuity of the value function in a path-dependent framework related to nonlinear stochastic processes.
Theorem 4.7 (Lower hemicontinuity of R). Assume that F is a compact metrizable space and that the Conditions 2.4, 2.6, 2.7 and 4.6 hold. Then, the correspondence (t, ω) → R(t, ω) is lower hemicontinuous.
In the following, we consider these choices of b and a. Evidently, the convexity assumption given by Condition 2.7 is satisfied. Further, the linear growth Condition 2.4 holds once the functions \underline{b}, \overline{b}, \underline{a} and \overline{a} themselves satisfy linear growth conditions, i.e., in case for every T > 0 there exists a constant C = C_T > 0 such that, for all t ∈ [0, T] and ω ∈ Ω,
\[
|\underline{b}_t(\omega)| + |\overline{b}_t(\omega)| + |\underline{a}_t(\omega)| + |\overline{a}_t(\omega)| \le C \Big( 1 + \sup_{s \in [0, t]} |\omega(s)| \Big).
\]
Similarly, the continuity Condition 2.6 is implied by (joint) continuity of \underline{b}, \overline{b}, \underline{a} and \overline{a}. Under these conditions, i.e., continuity and linear growth, Theorem 4.4 implies that the correspondence (t, ω) ↦ R(t, ω) is upper hemicontinuous with compact values, while Theorem 4.3 implies that the value function v is upper semicontinuous. Moreover, if \underline{b}, \overline{b}, \underline{a} and \overline{a} satisfy the local Lipschitz condition given by Condition 4.6, then Theorem 4.7 shows that the correspondence (t, ω) ↦ R(t, ω) is lower hemicontinuous. In particular, the value function v is then continuous. For instance, the local Lipschitz conditions hold in case for every T, M > 0 there exists a constant C = C_{T,M} > 0 such that \underline{a}_t(ω) ≥ 1/C and
\[
|\underline{b}_t(\omega) - \underline{b}_t(\alpha)| + |\overline{b}_t(\omega) - \overline{b}_t(\alpha)| + |\underline{a}_t(\omega) - \underline{a}_t(\alpha)| + |\overline{a}_t(\omega) - \overline{a}_t(\alpha)| \le C \sup_{s \in [0, t]} |\omega(s) - \alpha(s)|
\]
for all t ∈ [0, T] and all ω, α ∈ Ω with \sup_{s \in [0, t]} |\omega(s)| \vee \sup_{s \in [0, t]} |\alpha(s)| \le M.
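In the Markovian special case d = 1 with constant interval bounds a ∈ [a̲, ā] and zero drift, the PPDE reduces to the classical G-heat equation ∂_t v + sup_{a∈[a̲,ā]} (a/2) ∂_xx v = 0. The following explicit finite-difference sketch is an illustrative assumption (grid, frozen boundary values and parameters chosen for the test), not the paper's path-dependent analysis.

```python
def g_heat_solve(psi, xs, T, nt, a_lo, a_hi):
    """Explicit backward scheme for  v_t + sup_{a in [a_lo, a_hi]} (a/2) v_xx = 0,
    v(T, x) = psi(x); boundary values are kept frozen at psi."""
    dx, dt = xs[1] - xs[0], T / nt
    assert dt * a_hi <= dx * dx              # CFL-type stability condition
    v = [psi(x) for x in xs]
    for _ in range(nt):
        new = v[:]
        for i in range(1, len(v) - 1):
            d2 = (v[i + 1] - 2.0 * v[i] + v[i - 1]) / (dx * dx)
            new[i] = v[i] + dt * 0.5 * max(a_lo * d2, a_hi * d2)  # sup over [a_lo, a_hi]
        v = new
    return v
```

Because the Hamiltonian is linear in a, the supremum over [a̲, ā] is attained at an endpoint, which is exactly the `max` over the two extreme slopes; for the convex terminal condition ψ(x) = x² the scheme always selects ā, so v(0, 0) = ā·T.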
In the following corollary we summarize our main observations concerning the correspondence R and the value function v. It is a natural question whether the value function is the unique viscosity solution to the nonlinear PPDE (4.2). Uniqueness results for certain Hamilton-Jacobi-Bellman PPDEs were recently proved in [4,44]. In the following we apply a theorem from [44] and show that, under global Lipschitz conditions, our value function is the unique viscosity solution among all solutions that satisfy a certain Lipschitz property that we explain now. We define d = d_T to be the pseudometric from Section 4.1 restricted to [[0, T]].

Condition 4.11 (Global Lipschitz continuity). There exists a dimension r ∈ N and a B(F) ⊗ P-measurable function σ : F × R_+ × Ω → R^{d×r} such that a = σσ* and such that b and σ are Lipschitz continuous in the path variable, uniformly in the remaining variables, for all ω, α ∈ Ω.

Theorem 4.12. Suppose that Condition 4.11 holds and that ψ is bounded and Lipschitz continuous. Then, the value function is d-Lipschitz continuous.
Estimates similar to Theorem 4.12 are standard in stochastic optimal control, see, e.g., [44,Theorem 2.11] where a version of Theorem 4.12 for a path-dependent control framework is given.
When it comes to mere continuity, Theorem 4.12 requires stronger regularity assumptions in the path variable than Corollary 4.10. In particular, also the input function ψ has to be Lipschitz continuous in Theorem 4.12. Contrary to Corollary 4.10, Theorem 4.12 shows that the value function is Lipschitz continuous in space for fixed, but arbitrary, times with a uniform (in time) Lipschitz constant. We think this observation is of independent interest.
We are in the position to present a uniqueness result for the value function.
for all ω, α ∈ Ω. Then, the value function v is the unique viscosity solution to (4.2) that is bounded and d-Lipschitz continuous.
Proof. Thanks to the Theorems 4.3 and 4.12, the value function v is a viscosity solution to (4.2) that is d-Lipschitz continuous. Hence, the uniqueness statement follows from [44, Theorem 6.2].
Remark 4.14. Under more assumptions on the coefficients, a uniqueness result in a larger class of continuous functions has been proved in the recent paper [4].
(i) In the Markovian case where b(f, t, ω) and a(f, t, ω) depend on (t, ω) only through the value ω(t), assumptions in the spirit of those from Theorem 4.13 imply that the (point-dependent) value function is unique among all bounded viscosity solutions, see [5] for a precise result in this direction.

(ii) Let us again comment on the situation from Example 4.8, i.e., the case where d = 1 and Θ(t, ω) = [\underline{b}_t(ω), \overline{b}_t(ω)] × [\underline{a}_t(ω), \overline{a}_t(ω)]. As explained in Example 4.8, this situation can be included in our framework with a suitable compact parameter space F. In this setting, Condition 4.6 holds in case there exists a constant C > 0 such that
\[
|\underline{b}_t(\omega) - \underline{b}_t(\alpha)| + |\overline{b}_t(\omega) - \overline{b}_t(\alpha)| + |\underline{a}_t(\omega) - \underline{a}_t(\alpha)| + |\overline{a}_t(\omega) - \overline{a}_t(\alpha)| \le C \sup_{s \in [0, t]} |\omega(s) - \alpha(s)|
\]
for all t ∈ [0, T] and ω, α ∈ Ω.

The martingale problem
As a final main result, we show that E solves a type of nonlinear martingale problem. This result supports our interpretation of E as a nonlinear continuous semimartingale. We restrict our attention to the one-dimensional situation, i.e., we presume that d = 1. Let M_icx be the set of all φ ∈ C²(R; R) such that φ′, φ″ ≥ 0. Furthermore, for n > 0, we set
\[
\tau_n := \inf\{ t \in \mathbb{R}_+ : |X_t| \ge n \}.
\]

Definition 5.1. We say that E solves the martingale problem associated to G if for all n ∈ N and φ ∈ M_icx the corresponding stopped test process is a local E-martingale.

Of course, the acronym icx stands for increasing and convex. Although this set of test functions looks non-standard at first sight, it is perfectly fine in the linear setting, because it is well-known to be measure-determining for the set of Borel probability measures on R with finite first moment. In [15] a related result for generalized G-Brownian motion was proved. Let us discuss the relation of Theorem 5.2 and the approach from [15] in more detail. In [15] it is shown that a generalized G-Brownian motion (see [38]) solves a nonlinear martingale problem that is defined not only via test functions but also via a class of nonlinear test generators. This approach uses the power of G-Itô calculus in a crucial manner. In our framework we cannot rely on such a stochastic calculus. Instead, we only identify a suitable class of test functions, namely M_icx, for which we can prove a nonlinear martingale property. In that sense our treatment of the martingale problem is certainly less complete than the one for the generalized G-Brownian motion from [15]. We think that Theorem 5.2 is a good indicator for our interpretation of E as a nonlinear continuous semimartingale.

The proof of Theorem 3.1 is based on an application of [11, Theorem 2.1], which provides three abstract conditions on the correspondence C which imply the DPP. The work lies in the verification of these conditions. For the reader's convenience, let us restate them.
For a probability measure P on (Ω, F), a kernel Ω ∋ ω ↦ Q_ω ∈ P(Ω), and a finite stopping time τ, we define the pasting measure
\[
(P \otimes_\tau Q)(A) := \iint \mathbb{1}_A(\omega \otimes_{\tau(\omega)} \omega')\, Q_\omega(d\omega')\, P(d\omega), \quad A \in \mathcal{F}.
\]
We are in the position to formulate the assumptions from [11, Theorem 2.1].
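For measures on finitely many discrete paths, the pasting operation can be written out explicitly. The following sketch (dict-based measures, integer-time paths as tuples, continuous gluing as for the concatenation operator) is an illustrative assumption, not the paper's measure-theoretic construction.

```python
def paste(P, kernel, tau):
    """Pasting P ⊗_τ Q: follow omega under P up to time tau(omega), then glue on a
    path drawn from the kernel Q_omega, shifted to start at the value omega[tau]."""
    out = {}
    for omega, p in P.items():
        t = tau(omega)
        for omega2, q in kernel(omega).items():
            glued = omega[: t + 1] + tuple(
                omega[t] + x - omega2[0] for x in omega2[1:]
            )
            out[glued] = out.get(glued, 0.0) + p * q
    return out
```

The total mass of the pasted measure is again one, and the mass of each glued path is the product of the mass of its prefix under P and of its continuation under the kernel.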
In the following three sections we check these properties. In the fourth (and last) section, we finalize the proof of Theorem 3.1.
Before we start our program let us shortly comment on our strategy of proof and relate it to existing literature. Dynamic programming principles for nonlinear expectations related to nonlinear Lévy processes, and more general Markovian semimartingales, have been proved in [17,33]. In contrast to our approach, which has also been used in [11] for controlled martingale problems, the proofs in [17,33] are based on an abstract result from [35]. In general, the methodologies are different in the sense that the uncertainty sets of measures in [17,33] consist of laws of semimartingales that start at zero, while time and path dependence persists in both the set Θ and the test function ψ. In our setting, however, the dependence on (t, ω) only enters via the set C(t, ω). This construction seems to us closer to that of a classical linear conditional expectation.
Compared to [17,33], the main difference in our setting is the time-dependence t → P ac sem (t) for which we have to prove new measurability and stability properties. While this differs from [17,33] on a technical level, we closely follow, however, their general strategy of proof.
6.1. Measurable graph condition. The proof of the measurable graph condition is split into several parts. For t ∈ R + , we define the usual shift θ t : Ω → Ω by θ t (ω) = ω(· + t) for all ω ∈ Ω. The next two lemmata give some rather elementary observations.
The proof is complete.
Proof. The first claim is obvious and the second follows from the first and Lemma 6.7.

Lemma 6.9. Take a predictable process H = (H_t)_{t≥0}, t ∈ R_+ and ω ∈ Ω. Then, H_{·+t}(ω ⊗_t X) is predictable for the (right-continuous) filtration generated by X_{·+t}.
The next lemma is a partial restatement of [42, Theorem 1.2.10].

Lemma 6.10. If M − M^{t*} is a P-martingale, then there exists a P-null set N such that, for all ω ∉ N,

The following lemma should be compared to [33, Theorem 3.1].
This observation yields the formula for the second characteristic.
Proof. Let T > t * , and take a sequence (H n ) n∈Z+ of simple predictable processes on [[t * , T ]] such that H n → H 0 uniformly in time and ω. Then, by Lemma 6.14, we get, for every i = 1, 2, . . . , d, The first term converges to zero since P ∈ P sem (t * ) and thanks to the Bichteler-Dellacherie (BD) Theorem ([40, Theorem III.43]). The second term converges to zero by dominated convergence, the assumption that P -a.s. Q ∈ P sem (τ ) and, by virtue of Lemma 6.9, again the BD Theorem. Consequently, invoking the BD Theorem a third time yields that P ∈ P sem (t * ).
Finally, as Q_ω(τ = τ(ω)) = 1 for P-a.a. ω ∈ Ω, Corollary 6.13 and Fubini's theorem yield the claim. This completes the proof.

In this section we prove that the value function is a weak sense viscosity solution to the PPDE (4.2) and we discuss some of its regularity properties. By its very definition, regularity of the value function is closely linked to the regularity of the correspondence (t, ω) ↦ C(t, ω). Due to the appearance of P^ac_sem(t) and (dB^P_{·+t}/dλ, dC^P_{·+t}/dλ) in the set C(t, ω), certain regularity properties of (t, ω) ↦ C(t, ω) seem at first glance to be difficult to verify. To get a more convenient condition, in Section 7.1, we show that
\[
v(t, \omega) = \sup_{P \in \mathcal{R}(t, \omega)} E^P\big[ \psi(\omega \otimes_t X) \big].
\]
This reformulation of the value function v explains that it suffices to investigate the correspondence (t, ω) ↦ R(t, ω) from (4.3). Thereby, we shift the main (t, ω) dependence to the explicitly given correspondence Θ, as the remaining parts of R(t, ω) only depend on P^ac_sem = P^ac_sem(0) and (dB^P/dλ, dC^P/dλ). In Sections 7.2 and 7.3, we show that v is a viscosity sub- and supersolution to (4.2). The ideas of proof are based on applications of Berge's maximum theorem, Skorokhod's existence theorem for stochastic differential equations and Lebesgue's differentiation theorem. In contrast to the proofs from [12,33] for the viscosity subsolution property in Lévy and continuous affine frameworks, respectively, we do not work with explicit moment estimates. This allows us to extend the class of test functions to C^{1,2} in comparison to the class C^{2,3} as used in [12,33]. Further, this extension also enables us to apply the uniqueness result from [44] to deduce Theorem 4.13. Given the above mentioned results, the proof of Theorem 4.3 is finalized in Section 7.4.

7.1. Some preparations. To execute the program outlined above, we require more notation. First, for t ∈ R_+, we define another shift operator γ_t : Ω → Ω by γ_t(ω) := ω((· − t)^+) for all ω ∈ Ω.
For P ∈ P(Ω), we denote P_t := P ∘ γ_t^{-1}. Moreover, for (t, ω) ∈ [[0, ∞[[, we write ξ_{t,ω}(ω′) := ω ⊗_t ω′ and define
\[
\mathcal{Q}(t, \omega) := \big\{ P \in \mathcal{P}^{ac}_{sem}(t) : P(X_t = \omega(t)) = 1,\ (\lambda \otimes P)\text{-a.e. } (dB^P_{\cdot+t}/d\lambda, dC^P_{\cdot+t}/d\lambda) \in \Theta(\cdot + t, \omega \otimes_t X) \big\}.
\]
Further, recall the definition of R as given in (4.3).
Proof. Lemma 6.5 shows the inclusion {P_t : P ∈ Q(t, ω)} ⊂ R(t, ω), as P_t ∘ X_0^{-1} = P ∘ X_t^{-1} = δ_{ω(t)} for every P ∈ Q(t, ω). Conversely, given P ∈ R(t, ω), the measure P_t is contained in Q(t, ω). To see this, note first that P_t ∘ X_t^{-1} = P ∘ X_0^{-1} = δ_{ω(t)}. Second, P_t ∈ P^ac_sem(t) since (P_t)^t = P ∈ P^ac_sem as θ_t ∘ γ_t = id. Note that the canonical process X is a semimartingale after time t if and only if the process ω ⊗_t X is a semimartingale after time t. Thus, Lemma 6.4 implies, together with the identity ξ_{t,ω} ∘ ξ_{t,ω} = ξ_{t,ω}, that P ∈ Q(t, ω) if and only if P ∘ ξ_{t,ω}^{-1} ∈ Q(t, ω). This completes the proof. Summarizing the above, we may conclude the following corollary, which provides, for our framework, the connection between the approaches in [11] and [35] for the construction of nonlinear expectations on path spaces.
Proof. Using Lemma 7.2 for the first, the identity ξ t,ω = (ω ⊗ t X) • θ t for the second, and Lemma 7.1 for the final equality, we obtain sup P ∈C(t,ω) The proof is complete.
By the Arzelà-Ascoli theorem, any relatively compact set G ⊂ Ω is N-bounded for every N > 0. Let K ⊂ R^d be compact, N > 0 and let G ⊂ Ω be relatively compact. The next lemma shows that the set R* of all P ∈ P^ac_sem with P(X_0 ∈ K) = 1 and (λ ⊗ P)-a.e. (dB^P/dλ, dC^P/dλ) ∈ Θ(· + t, ω ⊗_t X) for some (t, ω) ∈ [0, N] × G is relatively compact in P(Ω). Moreover, for every p ≥ 1,
\[
\sup_{P \in \mathcal{R}^*} E^P\Big[ \sup_{s \in [0, T]} \|X_s\|^{2p} \Big] < \infty. \tag{7.1}
\]
Proof. Thanks to Prohorov's theorem, we need to show that R* is tight, which we do by an application of Kolmogorov's criterion ([26, Theorem 21.42]), i.e., we show the following two conditions: (a) the family {P ∘ X_0^{-1} : P ∈ R*} is tight; (b) for each T > 0 there are numbers C, α, β > 0 such that, for all s, t ∈ [0, T], we have
\[
\sup_{P \in \mathcal{R}^*} E^P\big[ \|X_t - X_s\|^\alpha \big] \le C |t - s|^{1 + \beta}.
\]
Part (a) follows easily from the fact that sup_{P∈R*} E^P[‖X_0‖] < ∞, which uses the boundedness of K. We now show (b). Fix T > 0 and p ≥ 1. For a moment, take P ∈ R* and let (b^P, a^P) be the Lebesgue densities of the semimartingale characteristics of X under P. Using the Burkholder-Davis-Gundy inequality, Hölder's inequality and the linear growth assumption, i.e., Condition 2.4, for all t ∈ [0, T], we obtain a moment estimate whose constant might depend on d, p, T, N, G and K but is independent of n, t and P. Finally, using Gronwall's lemma and Fatou's lemma, we conclude that (7.1) holds. It remains to finish the proof for (b). Take 0 ≤ s ≤ t ≤ T. Using again the Burkholder-Davis-Gundy inequality, the linear growth assumption and (7.1), we get, for every P ∈ R*, that
\[
E^P\big[ \|X_t - X_s\|^4 \big] \le C |t - s|^2,
\]
where the constant C > 0 is independent of s, t and P. Consequently, we conclude that (b) holds with α = 4 and β = 1. The proof is complete.
We collect more technical observations. Recall that D(R + ; R d ) denotes the space of càdlàg functions from R + into R d . In the following we endow D(R + ; R d ) with the Skorokhod J 1 topology, see [23, Section VI.1] or [16, Section XV.1] for details on this topology. For ω, α ∈ Ω and t ∈ R + , we define the concatenation (ω ⊗ t α)(s) := ω(s) 1 {s<t} + α(s) 1 {s≥t} , s ∈ R + . Notice that ω ⊗ t α ∈ D(R + ; R d ). Furthermore, recall that we endow Ω with the local uniform topology.
Evidently, λ n is a strictly increasing continuous function such that λ n (0) = 0 and λ n (s) → ∞ as s → ∞. Furthermore, for every N > 0, we have sup s∈[0,N ] |λ n (s) − s| → 0 as n → ∞. Notice that (ω n ⊗ t n α n )(λ n (s)) = ω n ( t n s / t )1 {s<t} + α n (s + t n − t)1 {s≥t} , s ∈ R + . Thus, for every N > 0 and some T = T N > 0 large enough, we can bound the uniform distance on [0, N ]. Thanks to the Arzelà-Ascoli theorem, we conclude that all terms on the r.h.s. converge to zero as n → ∞, which implies that sup s∈[0,N ] |(ω n ⊗ t n α n )(λ n (s)) − (ω ⊗ t α)(s)| → 0 as n → ∞. Consequently, by virtue of [16, Theorem 15.10], ω n ⊗ t n α n → ω ⊗ t α in the Skorokhod J 1 topology. The proof is complete.
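The time changes λ n are not written out above; a hypothetical reconstruction consistent with the displayed identity for (ω n ⊗ t n α n ) • λ n is:

```latex
\lambda_n(s) := \frac{t_n}{t}\, s \,\mathbb{1}_{\{s < t\}}
  + (s + t_n - t)\,\mathbb{1}_{\{s \ge t\}}, \qquad s \in \mathbb{R}_+ .
% Indeed, \lambda_n is continuous and strictly increasing with \lambda_n(0) = 0
% and \lambda_n(s) \to \infty as s \to \infty; moreover, \lambda_n(s) < t_n
% if and only if s < t, which matches the two cases in the display above.
```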
In the following lemma we take care of the case t = 0.
Proof. For every T > 0, we estimate the uniform distance on [0, T ]. By the Arzelà-Ascoli theorem, the r.h.s. tends to zero as n → ∞. This completes the proof.
Hence, the map (7.4) is continuous. We conclude that the claimed continuity properties of F, ∂F, ∇F, ∇ 2 F follow from the continuity of (7.4) together with the assumed continuity properties of φ, ∂φ, ∇φ, ∇ 2 φ. The proof is complete.
7.2. Subsolution Property. In this section we prove that the value function is a weak sense viscosity subsolution to the nonlinear PPDE (4.2).
Lemma 7.9. Assume that F is a compact metrizable space and that Conditions 2.4, 2.6 and 2.7 hold. Then the value function v is a weak sense viscosity subsolution to (4.2).
We fix P ∈ R(t, ω) and denote the Lebesgue densities of the P -characteristics of X by (b P , a P ). The pathwise Itô formula given by [3, Theorem 2.2] yields, together with Lemma 7.8, a semimartingale decomposition of φ along X. By virtue of the linear growth condition, the polynomial growth of ∇φ and the moment bound from Lemma 7.4, the local martingale part of the stochastic integral above is a true martingale; taking expectations, we obtain an identity for the difference quotient in u. We investigate the right hand side as u ց 0, namely the convergence 1/u ∫ 0 u sup P ∈R(t,ω) E P [G(s, X)] ds → sup P ∈R(t,ω) E P [G(0, X)] , u ց 0.
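For orientation, the decomposition used here can be sketched as follows; the display uses generic notation (M P for the local martingale part) and is a schematic version of the pathwise Itô formula, not the paper's exact display.

```latex
% Schematic Itô decomposition of \varphi along X under P \in \mathcal{R}(t,\omega):
\varphi(X_u) - \varphi(X_0)
  = \int_0^u \big\langle \nabla\varphi(X_s), \mathrm{d}M^P_s \big\rangle
  + \int_0^u \Big( \big\langle \nabla\varphi(X_s), b^P_s \big\rangle
  + \tfrac{1}{2}\,\mathrm{tr}\big[\nabla^2\varphi(X_s)\, a^P_s\big] \Big)\,\mathrm{d}s .
% Taking \mathbb{E}^P removes the martingale part and leaves the drift integral.
```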
Proof. By Lebesgue's differentiation theorem, it suffices to prove that the function s → sup P ∈R(t,ω) E P [G(s, X)] is continuous. By the compactness of R(t, ω), which follows from Theorem 4.4, and Berge's maximum theorem ([1, Theorem 17.31]), continuity of s → sup P ∈R(t,ω) E P [G(s, X)] is implied by the continuity of the map (s, P ) → E P [G(s, X)] on [0, T ] × R(t, ω). Take a sequence (P n , s n ) n∈Z + ⊂ R(t, ω) × [0, T ] such that (P n , s n ) → (P 0 , s 0 ). By Skorokhod's coupling theorem, on some probability space, there are random variables (X n ) n∈Z + with laws (P n ) n∈Z + such that X n → X 0 almost surely. Due to the compactness of F , the continuity assumptions on b and a and Corollary 7.7, we deduce from Berge's maximum theorem that G is continuous. Thus, we get a.s. G(s n , X n ) → G(s 0 , X 0 ). Thanks to the linear growth conditions on b and a, the polynomial growth assumptions on the derivatives of φ, and Lemma 7.4, we obtain a uniform moment bound with a suitable power p ≥ 4 (which depends on the polynomial bounds of the derivatives of φ). This estimate shows that the sequence (G(s n , X n )) n∈Z + is uniformly integrable and we conclude that E[G(s n , X n )] → E[G(s 0 , X 0 )], i.e., E P n [G(s n , X)] → E P 0 [G(s 0 , X)]. This is the claimed continuity and therefore, the proof is complete.
Let us fix an f ∈ F . By Lemma 7.11, there exists a probability measure P ∈ R(t, ω) such that the P -characteristics of X have Lebesgue densities (b(f, · + t, ω ⊗ t X), a(f, · + t, ω ⊗ t X)). As in the proof of Lemma 7.9, we obtain a representation of the difference quotient in terms of these densities. With (7.8) and (7.9), and as s → E P [K(s, X)] is continuous (see the proof of Lemma 7.10), we conclude, letting u ց 0, a pointwise inequality involving b(f, t, ω) and a(f, t, ω). As f ∈ F is arbitrary, taking the sup over all f ∈ F shows that v is a weak sense viscosity supersolution. The proof is complete.

7.4. Proof of Theorem 4.3. It follows from Lemmata 7.9 and 7.12 that v is a weak sense viscosity solution to the PPDE (4.2). We now discuss its regularity properties. Recall that ψ is bounded and continuous. Thanks to Corollary 7.7, the map (t, ω, α) → ψ(ω ⊗ t α) is continuous. Thus, by [2, Theorem 8.10.61], the map (t, ω, P ) → E P [ψ(ω ⊗ t X)] is continuous, too. Furthermore, by Theorem 4.4, R is compact-valued. From Corollary 7.3 we get that v(t, ω) = sup P ∈R(t,ω) E P [ψ(ω ⊗ t X)]. Hence, under the continuity hypothesis on R, the continuity (resp. upper, lower semicontinuity) of v follows from the continuity of the latter map.

In the following section we prove Theorems 4.4, 4.7 and 4.12. By a careful refinement of some key arguments, Theorem 4.4 extends [5, Proposition 3.8] beyond the Markovian case. Using the implicit function theorem, strong existence properties of stochastic differential equations with random coefficients and Gronwall arguments, we establish Theorem 4.7, which is seemingly the first result regarding lower hemicontinuity of the correspondence (t, ω) → R(t, ω) and thus the first result on lower semicontinuity of the value function in a fully path-dependent framework related to nonlinear stochastic processes. The proof of Theorem 4.12 on the uniform continuity of v with respect to d combines some observations made in the proof of the lower hemicontinuity of R with an application of the DPP and a Gronwall argument, cf. also the proof of [43, Lemma 3.6].
Before we start our program, we need a last bit of notation. For each n ∈ N, denote the P n -characteristics of X by (B n , C n ). Set Ω * := Ω × Ω × C(R + ; R d×d ) and denote the coordinate process on Ω * by Y = (Y (1) , Y (2) , Y (3) ). Further, set F * := σ(Y s , s ≥ 0) and let F * = (F * s ) s≥0 be the right-continuous filtration generated by Y .
Step 1. First, we show that the family {P n • (X, B n , C n ) −1 : n ∈ N} is tight (when seen as a sequence of probability measures on the measurable space (Ω * , F * )). Since P n → P , it suffices to prove tightness of {P n • (B n , C n ) −1 : n ∈ N}. We use Aldous' tightness criterion ([23, Theorem VI.4.5]), i.e., we show the following two conditions: (a) for every T, ε > 0, there exists a K ∈ R + such that sup n∈N P n ( sup s∈[0,T ] |B n s | + tr C n T ≥ K ) ≤ ε; (b) for every T, ε > 0, it holds that lim θց0 lim sup n→∞ sup P n ( |B n L − B n S | + tr C n L − tr C n S ≥ ε ) = 0, where the sup is taken over all stopping times S, L ≤ T such that S ≤ L ≤ S + θ. By Lemma 7.4, we have the uniform moment bound (8.1). Using the linear growth assumption, we also obtain that P n -a.s. tr C n s ≤ C ( 1 + sup r∈[0,s] |X r | ) s, where the constant C > 0 is independent of n. By virtue of (8.1), this bound yields (a). For (b), take two stopping times S, L ≤ T such that S ≤ L ≤ S + θ for some θ > 0. Then, using again the linear growth assumptions, we get a P n -a.s. bound,
which yields (b) by virtue of (8.1). We conclude that {P n • (X, B n , C n ) −1 : n ∈ N} is tight. Up to passing to a subsequence, from now on we assume that P n • (X, B n , C n ) −1 → Q weakly.
This shows that (λ \ ⊗ Q)-a.e. (dY (2) /dλ \ , dY (3) /dλ \ ) ∈ Θ(· + t, ω ⊗ t Y (1) ).

Step 4. In the final step of the proof, we show that P ∈ P ac sem and we relate (Y (2) , Y (3) ) to the P -semimartingale characteristics of the coordinate process. Thanks to [42, Lemma 11.1.2], there exists a dense set D ⊂ R + such that ρ M • Φ is Q-a.s. continuous for all M ∈ D. Take some M ∈ D. Since P n ∈ P ac sem , it follows from the definition of the first characteristic that the process X ·∧ρM − B n ·∧ρM is a local P n -F + -martingale. Furthermore, by the definition of the stopping time ρ M and the linear growth Condition 2.4, we see that X ·∧ρM − B n ·∧ρM is P n -a.s. bounded by a constant independent of n, which, in particular, implies that it is a true P n -F + -martingale. Now, it follows from [23, Proposition IX.1] that the martingale property passes to the weak limit. Recalling that Y (2) is Q-a.s. locally absolutely continuous by Step 2, this means that Y (1) is a Q-F * -semimartingale with first characteristic Y (2) . Similarly, we see that the second characteristic is given by Y (3) . Finally, we need to relate these observations to the probability measure P and the filtration F + . We denote by A p,Φ −1 (F + ) the dual predictable projection of a process A, defined on (Ω * , F * ), to the filtration Φ −1 (F + ). Recall from [20, Lemma 10.42] that, for every s ∈ R + , a random variable Z on (Ω * , F * ) is Φ −1 (F s+ )-measurable if and only if it is F * s -measurable and Z(ω (1) , ω (2) , ω (3) ) does not depend on (ω (2) , ω (3) ). Thanks to Stricker's theorem (see, e.g., [21, Lemma 2.7]), Y (1) is a Q-Φ −1 (F + )-semimartingale. Notice that each ρ M • Φ is a Φ −1 (F + )-stopping time and recall from Step 3 that (λ \ ⊗ Q)-a.e. (dY (2) /dλ \ , dY (3) /dλ \ ) ∈ Θ(· + t, ω ⊗ t Y (1) ). Hence, by definition of ρ M and the linear growth assumption, for every M ∈ D and i, j = 1, . . . , d, the variations Var(Y (2),i ) and Var(Y (3),ij ) are bounded up to time ρ M • Φ, where Var(·) denotes the variation process. By virtue of this, we get from [20, Proposition 9.24] that the Q-Φ −1 (F + )-characteristics of Y (1) are given by (Y (2) , Y (3) ).
Hence, thanks to Lemma 6.4, the coordinate process X is a P -F + -semimartingale whose characteristics (B P , C P ) satisfy Q-a.s.
Step 1: On the structure of P . Recall that P denotes the predictable σ-field on [[0, ∞[[ and let (b P , a P ) be the Lebesgue densities of the P -F-characteristics of the coordinate process X.
where (b P , a P ) denote the Lebesgue densities of the P -F-characteristics of the coordinate process X.
Let f = f(P ) be as in Lemma 8.1. As Condition 4.6 is assumed in Theorem 4.7, we have a decomposition a = σσ * for an R d×r -valued function σ. By a standard integral representation result for continuous local martingales (see, e.g., [19, Theorem II.7.1 ′ ]), possibly on an extension of the filtered probability space (Ω, F , F, P ), there exists an r-dimensional standard Brownian motion W such that P -a.s.
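In display form, the representation used here can be sketched as follows; σ r is shorthand for the (assumed) square root evaluated along the relevant arguments, and the display is a schematic version of the resulting dynamics, not the paper's exact formula.

```latex
% Schematic form of the dynamics under P, with \sigma_r \sigma_r^* = a^P_r:
X_s = X_0 + \int_0^s b^P_r \,\mathrm{d}r + \int_0^s \sigma_r \,\mathrm{d}W_r,
\qquad s \ge 0, \quad P\text{-a.s.},
% where W is the r-dimensional Brownian motion provided by the
% integral representation theorem on the (possibly extended) space.
```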
For simplicity, we ignore the standard extension in our notation.
Next, we show that the laws of {Y n : n ∈ N} form a candidate for an approximation sequence of P .

Lemma 8.2. On some filtered probability space, let ζ be an F -valued measurable process and let Y be a continuous semimartingale starting at ω 0 (t 0 ) whose semimartingale characteristics are absolutely continuous with densities (b(ζ, · + t 0 , ω 0 ⊗ t 0 Y ), a(ζ, · + t 0 , ω 0 ⊗ t 0 Y )). Then, the law of Y is an element of R(t 0 , ω 0 ).
Proof. Let P be the probability measure of the underlying filtered space and let Q := P • Y −1 be the law of Y . Further, denote the natural right-continuous filtration of Y by (G t ) t≥0 . Thanks to Stricker's theorem (see, e.g., [21, Lemma 2.7]), Y is a P -(G t ) t≥0 -semimartingale. Furthermore, by virtue of Condition 2.4, we deduce from [20, Proposition 9.24] and [16, Theorem 5.25] that the (G t ) t≥0 -characteristics of Y are absolutely continuous with the densities from above. By Lemma 6.4, the coordinate process X is a Q-F + -semimartingale and its characteristics (B, C) are Q-a.s. absolutely continuous and such that, P -a.s. for λ \ -a.a. times, the densities take the prescribed form.
Using the convexity assumption given by Condition 2.7 and [9, Corollary 8, p. 48], we obtain that the densities of (B, C) take values in Θ(· + t 0 , ω 0 ⊗ t 0 X), so that Q ∈ R(t 0 , ω 0 ). This completes the proof.
Thanks to Lemma 8.2, for every n ∈ N, the law of Y n is an element of R(t n , ω n ). Consequently, the laws of {Y n : n ∈ N} are candidates for an approximation sequence of the measure P .
Step 3: ucp convergence of Y n to X. In this step we prove that Y n → X in the ucp topology, i.e., uniformly on compact time sets in probability. Here we use some ideas we learned from the proof of [30, Theorem, p. 88]. Take T, ε > 0 and define the stopping times (T M n ) n∈N . Thanks to the moment bound from Lemma 7.4, we obtain a bound on the probability that T M n < T , where the constant C > 0 is independent of n and m. Thus, we can take M = M ε > 0 large enough such that this probability is at most ε uniformly in n. Next, using the Burkholder-Davis-Gundy inequality, Hölder's inequality and (8.4), we obtain, for every t ∈ [0, T ] and n ∈ N, a Gronwall-type estimate. Hence, from Gronwall's lemma, we get, for all n ∈ N, a bound where the constant C > 0 is independent of n. Clearly, the first term on the r.h.s. converges to zero as n → ∞. The second term converges to zero by Condition 2.6, Corollary 7.7 and the dominated convergence theorem, which can be applied thanks to the linear growth Condition 2.4 and either the definition of the sequence (T M n ) n∈N , or the uniform moment bound from Lemma 7.4. We conclude that (8.6) holds. We are now in the position to complete the proof of ucp convergence. Namely, using (8.5) and (8.6), we get, for every δ > 0, that P ( sup t∈[0,T ] |Y n t − X t | ≥ δ ) → 0 as n → ∞. This proves that Y n → X in the ucp topology. For continuous processes, ucp convergence implies weak convergence (which follows, for instance, from [24, Corollary 23.5]). Hence, the proof of lower hemicontinuity is complete.
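The Gronwall step in this argument has the familiar shape below; h n and ε n are placeholder names (not the paper's notation) for the quantity being bounded and the vanishing error term, and the second power is an assumption.

```latex
% If the BDG/Hölder estimate produces, for t \in [0,T],
h_n(t) \le \varepsilon_n + C \int_0^t h_n(s)\,\mathrm{d}s,
\qquad h_n(t) := \mathbb{E}^P\Big[\sup_{r \le t \wedge T^M_n} |Y^n_r - X_r|^2\Big],
% then Gronwall's lemma gives
h_n(T) \le \varepsilon_n\, e^{CT} \longrightarrow 0 \qquad (n \to \infty).
```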
Thanks to Condition 4.11, by a standard existence result for stochastic differential equations with random locally Lipschitz coefficients of linear growth, there exists a continuous adapted process Y such that P -a.s.
The proof is complete.

9. Proof of the nonlinear martingale problem: Theorem 5.2

In this section we work under the assumptions of Theorem 5.2, i.e., we assume that d = 1, that F = F 0 × F 1 for two compact metrizable spaces F 0 and F 1 , that b depends on F only through the F 0 variable, that a depends on F only through the F 1 variable, and that Conditions 2.4 and 2.6 hold. In the following, a key role is played by measures P whose densities attain the pointwise suprema, i.e., b P ·+t = sup { b(f 0 , · + t, ω ⊗ t X) : f 0 ∈ F 0 }, a P ·+t = sup { a(f 1 , · + t, ω ⊗ t X) : f 1 ∈ F 1 }, where (b P ·+t , a P ·+t ) denote the Lebesgue densities of the P -characteristics of X ·+t (for its right-continuous natural filtration).
Proof of Theorem 5.2. Fix ω ∈ Ω, n ∈ N, φ ∈ M icx and s, h ≥ 0. Take P ∈ C(s, ω), denote the Lebesgue densities of the P -characteristics of X ·+s by (b P ·+s , a P ·+s ) and denote the local P -martingale part of X ·+s by M P ·+s . Itô's formula yields that φ(X s+h ) − φ(X s ) = ∫ 0 h φ ′ (X r+s ) dM P r+s + ∫ s s+h ( φ ′ (X r ) b P r + 1/2 φ ′′ (X r ) a P r ) dr.
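Taking expectations and assuming, as the notation M icx suggests, that φ is increasing and convex (so that φ ′ ≥ 0 and φ ′′ ≥ 0), the drift integrand can be compared against the pointwise supremum densities, here written with the hypothetical shorthands b̂ and â; a sketch:

```latex
\mathbb{E}^P\big[\varphi(X_{s+h})\big] - \mathbb{E}^P\big[\varphi(X_s)\big]
 = \mathbb{E}^P\Big[\int_s^{s+h} \Big(\varphi'(X_r)\, b^P_r
   + \tfrac{1}{2}\varphi''(X_r)\, a^P_r\Big)\,\mathrm{d}r\Big]
 \le \mathbb{E}^P\Big[\int_s^{s+h} \Big(\varphi'(X_r)\, \widehat{b}_r
   + \tfrac{1}{2}\varphi''(X_r)\, \widehat{a}_r\Big)\,\mathrm{d}r\Big],
% since \varphi' \ge 0, \varphi'' \ge 0 and, pointwise,
% b^P \le \widehat{b} := \sup_{f_0} b(f_0,\cdot) and
% a^P \le \widehat{a} := \sup_{f_1} a(f_1,\cdot).
```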
The proof is complete.
Appendix A. Continuity of R in the Lévy case This appendix is dedicated to the fact that the correspondence (t, ω) → R(t, ω) is continuous if the uncertainty set Θ(t, ω) ≡ Θ is independent of (t, ω) and convex and compact. This result is covered by our more general Theorems 4.4 and 4.7. The purpose of this section is to explain a substantially simpler proof for this Lévy setting.
Indeed, Lemmata 6.3 and 6.4 show that the map P → P • (ω(t) ⊗ 0 X) −1 leaves P(Θ) invariant. We stress that this part of the argument is special for the Lévy case. Providing an intuition, it corresponds to the fact that Lévy processes have an additive structure in their initial values, i.e., if L x is a Lévy process with starting value x, then L x − x is a Lévy process (with the same Lévy-Khinchine triplet) starting at zero. Finally, we are in the position to prove that (t, ω) → R(t, ω) is continuous. By virtue of [1, Theorem 17.23], R is lower hemicontinuous, being the composition of the lower hemicontinuous correspondence (t, ω) → ϕ(t, ω) := {(t, ω)} × P(Θ) and the (single-valued) continuous correspondence (t, ω, P ) → {P • (ω(t) ⊗ 0 X) −1 }. Here, lower hemicontinuity of ϕ follows from [1, Theorem 17.28]. We point out that this part of the argument hinges on the fact that Θ is constant. For the upper hemicontinuity of R we need to argue separately. Let F ⊂ P(Ω) be closed. We need to show that the lower inverse R l (F ) = {(t, ω) : R(t, ω) ∩ F ≠ ∅} is closed. Assume that (t n , ω n ) n∈N ⊂ R l (F ) converges to (t, ω) ∈ [[0, ∞[[. For each n ∈ N, there exists a probability measure P n ∈ R(t n , ω n ) ∩ F . As {ω n (t n ) : n ∈ N} ⊂ R d is bounded, the set {P n : n ∈ N} is relatively compact by Lemma 7.4. Hence, passing to a subsequence if necessary, we can assume that P n → P weakly for some P ∈ cl(P(Θ) ∩ F ) = P(Θ) ∩ F . As P • X −1 0 = lim n→∞ P n • X −1 0 = lim n→∞ δ ω n (t n ) = δ ω(t) , we conclude that P ∈ R(t, ω) ∩ F , which implies (t, ω) ∈ R l (F ).
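The additivity in the initial value can be written in display form as follows; this is an illustration of why the shift map preserves P(Θ) in the Lévy case, not a statement from the proof above.

```latex
% If L^x is a Lévy process with L^x_0 = x and triplet independent of x, then
L^x \;\overset{d}{=}\; x + L^0 ,
% so the map P \mapsto P \circ (\omega(t) \otimes_0 X)^{-1} only replaces the
% initial law \delta_{X_0} by \delta_{\omega(t)} and leaves the differential
% characteristics, and hence membership in \mathcal{P}(\Theta), unchanged.
```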