On the Innovations Conjecture of Nonlinear Filtering with Dependent Data

We establish the innovations conjecture for a nonlinear filtering problem in which the signal to be estimated is conditioned by the observations. The approach uses only elementary stochastic analysis, together with a variant due to J.M.C. Clark of a theorem of Yamada and Watanabe on pathwise-uniqueness and strong solutions of stochastic differential equations.


Introduction
A continuing challenge in the theory of nonlinear filtering is the identification of natural conditions under which the so-called innovations conjecture is valid. The essence of the problem is as follows: on a filtered probability space $(\Omega, \mathcal{F}, P; \{\mathcal{F}_t\})$ one is given an $\mathbb{R}^D$-valued $\mathcal{F}_t$-standard Wiener process $\{W_t\}$, together with an $\mathbb{R}^D$-valued $\mathcal{F}_t$-progressively measurable signal process $\{\beta_t\}$ subject to the "finite energy" condition
(1.1) $E \int_0^T |\beta_t|^2 \, dt < \infty$.
The observation process $\{Y_t\}$ and the innovations process $\{I_t\}$ are then given by
(1.2) $Y_t := \int_0^t \beta_s \, ds + W_t$, $t \in [0, T]$,
(1.3) $I_t := Y_t - \int_0^t \hat{\beta}_s \, ds$, where $\hat{\beta}_t := E[\beta_t \mid \mathcal{F}^Y_t]$, $t \in [0, T]$.
In this rather informal introduction we disregard such measure-theoretic technicalities as the need to deal with the usual augmentations of the filtrations $\{\mathcal{F}^Y_t\}$ and $\{\mathcal{F}^I_t\}$, and for $\{\hat{\beta}_t\}$ to be interpreted as the optional projection of $\{\beta_t\}$ onto the usual augmentation of the filtration $\{\mathcal{F}^Y_t\}$; all these technicalities will be given due consideration from Section 2 onwards. One sees from (1.3) that $\{I_t\}$ is $\mathcal{F}^Y_t$-adapted, hence $\mathcal{F}^I_t \subset \mathcal{F}^Y_t$. The innovations conjecture states that one actually has equality of the σ-algebras, namely $\mathcal{F}^I_t = \mathcal{F}^Y_t$ for each $t \in [0, T]$.
The innovations conjecture is in fact false under the "minimal" conditions above, as follows from results of Benes ([2], Sect. 8), which are based upon a subtle example due to Tsirel'son [25] of a functional stochastic differential equation (SDE) which fails to have a strong solution. It is therefore necessary to formulate supplementary conditions under which the innovations conjecture can be shown to hold. One of the earliest results of this kind is due to Clark [5], who established the conjecture in the case where the signal $\{\beta_t\}$ is uniformly bounded and independent of $\{W_t\}$, using an argument based on the Kallianpur-Striebel representation for the conditional expectation $\hat{\beta}_t$ (the main result of [5] appears as Theorem 11.4.1 of Kallianpur ([11], p. 284), as well as in Meyer ([19], pp. 244-246)).
A powerful approach for dealing with the innovations problem was introduced in the late 1970s with the recognition by Allinger, Benes, Clark, Erzhov and Mitter that a theorem of Yamada and Watanabe [27] on pathwise uniqueness and strong solutions of SDEs was key to an alternative and very elegant approach for establishing the innovations conjecture. The central idea is the following: when $\{\beta_t\}$ and $\{W_t\}$ are independent, the Kallianpur-Striebel formula yields an $\mathbb{R}^D$-valued non-anticipative function $\{\gamma_t\}$ on $C[0, T; \mathbb{R}^D]$ (the space of continuous functions from $[0, T]$ into $\mathbb{R}^D$, with the raw canonical filtration) such that $\hat{\beta}(t, \omega) = \gamma_t(Y(\omega))$ $(P \otimes \lambda)$-a.e., and therefore the defining relation for the innovation process in (1.3) can be written in the differential form
(1.4) $dY_t = dI_t + \gamma_t(Y) \, dt$, $Y_0 = 0$.
Since $\{I_t\}$ is a standard $\mathcal{F}^Y_t$-Wiener process (see e.g. Rogers and Williams [22], Theorem VI(8.4)(i), p. 323) it follows that the pair $\{(Y_t, I_t)\}$ on the filtered probability space $(\Omega, \mathcal{F}, P; \{\mathcal{F}^Y_t\})$ is a solution of (1.4) in the "weak" or "usual" sense (see e.g. Revuz and Yor [20], Definition IX(1.2), p. 350). Consequently, if it can be verified that (1.4) has the property of pathwise uniqueness, then the theorem of Yamada and Watanabe (see e.g. Theorem IX(1.7)(ii) of [20], p. 352) asserts that $\{(Y_t, I_t)\}$ is a strong solution of (1.4), that is, $\{Y_t\}$ is $\mathcal{F}^I_t$-adapted, and hence $\mathcal{F}^Y_t \subset \mathcal{F}^I_t$, as required for the innovations conjecture. This is the approach used in [1], in which the innovations conjecture is established when $\{\beta_t\}$ is independent of $\{W_t\}$ and satisfies the finite energy condition (1.1).
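The argument above can be made concrete with a minimal sketch, under assumptions that are not taken from the source: $D = 1$ and a constant signal $\beta_t \equiv \beta$, where $\beta$ is a random sign with $P[\beta = \pm 1] = 1/2$, independent of $\{W_t\}$. For this toy signal the Kallianpur-Striebel formula gives $\hat{\beta}_t = \tanh(Y_t)$, so the drift in (1.4) is $\gamma_t(y) = \tanh(y(t))$, which is Lipschitz-continuous; pathwise uniqueness then holds and $Y$ can be rebuilt from $I$ alone. The function names and the Euler discretization are illustrative choices.

```python
import math
import random

def simulate_observation(n_steps, dt, rng):
    """Euler scheme for Y_t = int_0^t beta ds + W_t, with beta = +/-1 drawn once."""
    beta = rng.choice([-1.0, 1.0])      # bounded toy signal, independent of W
    y = [0.0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        y.append(y[-1] + beta * dt + dw)
    return beta, y

def innovations(y, dt):
    """Discretized (1.3): I_t = Y_t - int_0^t tanh(Y_s) ds (left-point rule)."""
    i = [0.0]
    for k in range(len(y) - 1):
        i.append(i[-1] + (y[k + 1] - y[k]) - math.tanh(y[k]) * dt)
    return i

def reconstruct_observation(i, dt):
    """Discretized (1.4): solve dY = dI + tanh(Y) dt, Y_0 = 0, from I alone."""
    y = [0.0]
    for k in range(len(i) - 1):
        y.append(y[-1] + (i[k + 1] - i[k]) + math.tanh(y[-1]) * dt)
    return y

rng = random.Random(0)
beta, y = simulate_observation(1000, 1e-3, rng)
i_path = innovations(y, 1e-3)
y_rec = reconstruct_observation(i_path, 1e-3)
# Largest gap between the observed path and the path rebuilt from innovations.
max_err = max(abs(a - b) for a, b in zip(y, y_rec))
```

In this discretization the reconstruction agrees with the original path up to floating-point rounding, which is the discrete counterpart of $\{Y_t\}$ being $\mathcal{F}^I_t$-adapted.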
The preceding results rely completely on the basic condition that the given processes $\{\beta_t\}$ and $\{W_t\}$ in (1.2) are independent. When these processes are correlated it is not possible to establish the innovations conjecture unless specific models which define the structure of the correlation are postulated. The simplest of these is the linear Gaussian model, for which the innovations conjecture may be established by functional-analytic methods which depend on the underlying linearity (see e.g. Kallianpur [11], Section 10.2). In the genuinely nonlinear and non-Gaussian case the problem of establishing the innovations conjecture with correlated data becomes much more challenging. The most significant results on this problem are due to Krylov [15], who considers the following model defined on the finite interval $t \in [0, T]$:
(1.5) (1) $dX_t = b_1(t, X_t, Y_t) \, dt + \sigma_{11}(t, X_t, Y_t) \, dW^1_t + \sigma_{12}(t, X_t, Y_t) \, dW^2_t$, $X_0 = \xi_0$,
(1.5) (2) $dY_t = b_2(t, X_t, Y_t) \, dt + dW^2_t$, $Y_0 = 0$
(in fact, Krylov considers a rather more general model, see (1) of [15], p. 773, which we have simplified to (1.5) in order to focus on the main essentials). In this model $\{X_t\}$ is an $\mathbb{R}^d$-valued "state" process (typically not available for exact measurement), $\{Y_t\}$ is an $\mathbb{R}^D$-valued observation process, $\{(W^1_t, W^2_t)\}$ is a standard Wiener process independent of the initial state $\xi_0$, and $b_1$, $b_2$, $\sigma_{11}$, $\sigma_{12}$ are vector or matrix-valued Borel measurable functions (of appropriate dimension) defined and uniformly bounded on $[0, T] \times \mathbb{R}^d \times \mathbb{R}^D$, and globally Lipschitz-continuous in $(x, y)$. With $\beta_t := b_2(t, X_t, Y_t)$ and a suitable filtration $\{\mathcal{F}_t\}$, $\{W^2_t\}$ is a standard $\mathcal{F}_t$-Wiener process, and $\{\beta_t\}$ is a uniformly bounded and $\mathcal{F}_t$-adapted process. Thus (1.5)(2) corresponds to the observation equation in (1.2) (with $W^2$ interpreted as $W$), and we are therefore in the setting of the first paragraph of the present section, and can pose the question of the validity of the innovations conjecture for the model (1.5). This was established by Krylov [15], using an approach which relies on the normalized (Kushner-Stratonovich) filter equation written as a "forward equation" in density form (due to Krylov and Rozovskii [16], Theorem 1.2, pp. 341-342), together with an ingenious recursive approximation of the optimal nonlinear filter, each approximation being adapted to the innovations filtration; this effectively yields that the optimal nonlinear filter itself is adapted to the innovations filtration, from which the innovations conjecture follows. The associated hypotheses are enumerated in ([15], pp. 773-774), as well as in Kallianpur ([11], pp. 302-303), and in particular include (1) the law of $\xi_0$ has a continuous density which belongs to an appropriate Sobolev space of functions on $\mathbb{R}^d$; (2) the least eigenvalue of the symmetric matrix $\sigma_{11} \sigma_{11}'(t, x, y)$ is lower-bounded by some $\mu > 0$ uniformly in $(t, x, y)$; (3) $\sigma_{11}(t, x, y)$ and $\sigma_{12}(t, x, y)$ are third-order smooth with respect to $x$, and $b_1(t, x, y)$ and $b_2(t, x, y)$ are second-order smooth with respect to $x$; and (4) the "sensor function" $b_2(t, x, y)$ is square-integrable in $x$ with respect to Lebesgue measure, in the sense that there is some Borel-measurable $g$ for which the bound (1.6) holds (see eqn. (4) of [15], p. 774, or (v) of [11], p. 302). This latter condition is fairly strong, since it supplements the previous postulate of uniform boundedness of the sensor function $b_2(t, x, y)$ with respect to $(t, x, y) \in [0, T] \times \mathbb{R}^d \times \mathbb{R}^D$ with the further condition that $b_2(t, x, y)$ be a member of $L^2(\mathbb{R}^d)$, when regarded as a function of the "state" parameter $x \in \mathbb{R}^d$. In addition, the requirement that the initial state $\xi_0$ have a continuous density is also quite strong, since it excludes for example the simple case where the initial state is non-random.
In view of the preceding discussion, an obvious goal is to try to extend the basic approach used for resolving the innovations conjecture when $\{\beta_t\}$ and $\{W_t\}$ in the observation equation (1.2) are independent, to models such as (1.5) for which this independence no longer holds. It is this problem that we address here. The advantages of this approach are that it is conceptually quite simple, and, as will be seen, it also enables one to remove several of the stronger hypotheses that appear to be necessary when resolving the innovations conjecture for the model (1.5) by methods based on the density form of the normalized filter equation. In fact we shall study a model which is a non-Markov variant of (1.5) with path-dependent coefficients and hypotheses which do not entail uniform boundedness, smoothness, or square-integrability of the coefficients, or require the initial state distribution $\xi_0$ to have a density. The approach relies only on elementary stochastic analysis, together with a modification due to Clark [6] of the classical theorem of Yamada and Watanabe [27] on the relation between strong solutions of SDEs and pathwise uniqueness.
In Section 2 we formulate the model and the basic hypotheses, and state the main result (Proposition 2.6) on equality of the observation and innovation σ-algebras. In Section 3 we summarize a number of preliminaries that we shall need, in particular a "pathwise" Bayes formula and the aforementioned result of Clark [6], and in Section 4 we establish the innovations conjecture for the model of Section 2. Finally, in Section 5, we specialize the nonlinear filtering problem of Section 2, and obtain as a by-product of the innovations conjecture that the corresponding measure-valued normalized filter equation has the property of pathwise-uniqueness; this supplements some earlier results of this kind of Kurtz and Ocone [17] and the authors [18]. Throughout the exposition we isolate in numbered Remarks various facts and observations which we shall need for later reference.

Conditions, Model and Main Result
To facilitate easy reference we summarize all of our notation as follows:
Notation 2.1.
(I) For a positive integer $q$ write $\mathbb{R}^q$ for the space of all real $q$-dimensional column vectors, write $x^i$ or $[x]^i$ for the $i$-th scalar entry and $|x| := [\sum_{i=1}^q (x^i)^2]^{1/2}$ for the Euclidean norm of $x \in \mathbb{R}^q$. Likewise, for positive integers $q$ and $r$, write $\mathbb{R}^{q \times r}$ for the space of all $q$ by $r$ matrices with real entries, write $A^{ij}$ or $[A]^{ij}$ for the $(i, j)$-th scalar entry, $A'$ for the transpose, and $\|A\| := \max_{x \in \mathbb{R}^r, |x| = 1} |Ax|$ for the operator norm of $A \in \mathbb{R}^{q \times r}$.
(II) For a fixed $T \in (0, \infty)$ let $C[0, T; \mathbb{R}^q]$ denote the space of all continuous mappings from $[0, T]$ into $\mathbb{R}^q$, with the usual supremum norm. For notational brevity we also denote this space by $C^q$ when there is no possibility of confusion, and put $\mathcal{B}_t(C^q) := \sigma\{\psi(s), s \in [0, t]\}$, $t \in [0, T]$, for the canonical filtration on $C^q$ (with $\psi$ a generic member of $C^q$).
(III) $\mathcal{B}(S)$ denotes the Borel σ-algebra of a separable metric space $(S, \rho)$, and, for an $\mathcal{F}/\mathcal{B}(S)$-measurable mapping $\xi$ from a probability space $(\Omega, \mathcal{F}, P)$ into $S$, let $P\xi^{-1}$ denote the probability measure on $\mathcal{B}(S)$ given by $P\xi^{-1}(A) := P[\xi \in A]$ for each $A \in \mathcal{B}(S)$.
(IV) For an $\mathbb{R}^q$-valued process $\{\eta_t\}$ put $\mathcal{F}^\eta_t := \sigma\{\eta_s, s \in [0, t]\}$ for its raw filtration.
(V) For a continuous semimartingale $\{M_t\}$ put $\mathcal{E}(M)_t := \exp\{M_t - (1/2)\langle M \rangle_t\}$.
(VI) For a measure space $(E, \mathcal{S}, \mu)$ (not necessarily complete) write $\mathcal{Z}^\mu[\mathcal{S}]$ for the collection $\{N \subset E : N \subset H \text{ for some } H \in \mathcal{S} \text{ with } \mu(H) = 0\}$. For a σ-algebra $\mathcal{M} \subset \mathcal{S}$ write $\mathcal{M} \vee \mathcal{Z}^\mu[\mathcal{S}]$ for the minimal σ-algebra on $E$ which includes all members of $\mathcal{M}$ and $\mathcal{Z}^\mu[\mathcal{S}]$.
(VII) Let $C^\infty(\mathbb{R}^q)$ denote the set of all infinitely smooth $\mathbb{R}$-valued mappings on $\mathbb{R}^q$, and let $C^\infty_c(\mathbb{R}^q)$ denote the set of all members of $C^\infty(\mathbb{R}^q)$ with compact support.
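As an illustration of Notation 2.1(V), the following minimal sketch evaluates the exponential $\exp\{M_t - (1/2)\langle M \rangle_t\}$ along a discretized path of $M_t = \int_0^t h_s \, dW_s$, for which $\langle M \rangle_t = \int_0^t h_s^2 \, ds$. The scalar setting, the left-point sums, and the function names are assumptions made for illustration; they are not part of the paper.

```python
import math
import random

def stochastic_exponential(h, dw, dt):
    """Path of exp(M_t - <M>_t / 2) for M = sum of h_k * dW_k (left-point sums)."""
    m, qv, path = 0.0, 0.0, [1.0]
    for hk, dwk in zip(h, dw):
        m += hk * dwk            # increment of the martingale M
        qv += hk * hk * dt       # increment of the quadratic variation <M>
        path.append(math.exp(m - 0.5 * qv))
    return path

def mean_terminal_value(h_const, T, n_steps, n_paths, seed=0):
    """Monte Carlo average of the terminal value for a constant integrand."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
        total += stochastic_exponential([h_const] * n_steps, dw, dt)[-1]
    return total / n_paths
```

For a bounded integrand the exponential is a martingale started at 1, so `mean_terminal_value` should return a value close to 1 up to Monte Carlo error.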
To specify a model for the filtering problem, fix a constant T ∈ (0, ∞), which determines a finite "time-horizon" t ∈ [0, T ] of interest, together with an R d -valued "state" process {X t , t ∈ [0, T ]} and an R D -valued observation process {Y t , t ∈ [0, T ]} determined by the coupled SDE (2.7), which is subject to the following Conditions 2.2 and 2.3: ) is an R q+D -valued standard Wiener process with respect to the filtration {F t } on a given filtered probability space (Ω, F, P ; {F t }) satisfying the "usual conditions" (see e.g.[21], Definition II(67.1),p.172), and and h : Ξ → R D in (2.7) have the following properties: (i) if α denotes a generic scalar entry from a, b, c or h, then the mapping (t, x, y) → α t (x, y) : Ξ → R is continuous, and the the mapping (x, y) → α t (x, y) : Remark 2.4.The third term on the right of (2.7)(1 ) represents "feedback" of the observation {Y t } to the dynamics of the state process {X t }.When (2.7)(2 ) is used to remove " dY t " from (2.7)(1 ) then we get state-dynamics in a form similar to (1.5)(1 ) (but for pathdependent coefficients).From Condition 2.3 and standard results on SDE's with Lipschitzcontinuous "functional" coefficients (e.g.Kallianpur [11], Theorem 5.1.1,p.97) we know that the processes {X t } and {Y t } are uniquely determined to within indistinguishability, and adapted to the filtration σ{ξ 0 , W Comparing this result with that summarized in Section 1, we see that (2.7) admits coefficients with "functional" dependence (in contrast to the "diffusion-type" coefficients in (1.5)), without any postulated smoothness, allows unbounded coefficients in the "state" equation (2.7)(1 ), and does not require that the initial state ξ 0 in (2.7) have a density function.We do need uniform boundedness of the "sensor function" h t in the observation equation (2.7)(2 ), but do not postulate that this function enjoy any square-integrability properties comparable to (1.6) (which would of course not make sense in the setting 
of path-dependent coefficients).

A Pathwise Bayes Formula and Other Preliminaries
Essential to the proof of Proposition 2.6 is a "pathwise" Bayes representation for the process { βt } (recall Remark 2.5) which is given in the present section.We begin by recalling a Bayes formula for the conditional expectation E [ β t | Y t ] which is established in Elliott ([7], (18.24), p.290) in the case of Markov dynamics for the state/observation pair (X, Y ).We briefly summarize the derivation of the formula, since our setting is somewhat different from that of [7], and because later on we shall need some of the ideas which arise in the derivation.The process {β t } is continuous, F t -adapted and uniformly bounded (see (2.8), Remark 2.4, and Condition 2.3(i)(iii)), and hence (see Notation 2.1(V)) the process given by defines a continuous F t -martingale on (Ω, F, P ).It follows that defines a probability measure on (Ω, F) equivalent to P , and for A ∈ F 0 , we have 2) it follows in particular that ξ 0 has the same law relative to both P and P 0 , namely Moreover, from (2.8) and (2.7)(2 ), we have and it follows from (3.10), (3.11), (3.13), and the Girsanov theorem that (where we have used (3.13) at the second equality).From (3.11), for each t ∈ [0, T ] we have In the sequel it will be essential to have at hand a "pathwise" representation of the Bayes formula (3.16) which is due to Bhatt and Karandikar [3].As preparation for stating this representation we summarize in the following Theorem 3.1 two fundamental results of Karandikar ([13], Theorem 3) and ( [12], Theorem 4.3) on pathwise representations for stochastic integrals and for solutions of SDEs with Lipschitz-continuous coefficients.These results are needed to state the "pathwise" representation of the Bayes formula (3.16) and will also be used for the proof of Proposition 2.6 in Section 4.
Theorem 3.1.The following "pathwise" representations hold: (I) There exists a universal "stochastic integral" mapping J : adapted process and {η t } is an R-valued continuous semimartingale on a filtered probability space (E, A, µ; {A t }) satisfying the usual conditions, then the process {J t (ρ, η)} (defined pathwise) is indistinguishable from the (A t -adapted) stochastic integral process (II) Suppose that Condition 2.3 holds for the coefficients a, b and c in (2.7)(1 ).Then there exists a universal "solution" mapping e : R d × C q × C D → C d with the following properties: (a) (ξ, w, y) → e(ξ, w, y) : } is an R q+D -valued continuous semimartingale on a filtered probability space (E, A, µ; {A t }) satisfying the usual conditions, and χ 0 is some R d -valued A 0 -measurable random vector, then the process {e t (χ 0 , η 1 , η 2 )} (defined pathwise) is indistinguishable from the (R d -valued, continuous, and A t -adapted) process {ζ t } defined by (existence and uniqueness -to within indistinguishability -of solutions for (3.17) follows from Picard iterations and Condition 2. 3) The mapping J in Theorem 3.1(I) is "universal" in the sense that it does not depend in any way on the laws of the continuous process {ρ t } and the continuous semimartingale {η t }.
Similarly, the solution mapping e in Theorem 3.1(II) is "universal" in that it is determined by the coefficients a, b and c only, and does not depend on the laws of χ 0 or the semimartingale {(η 1 t , η 2 t )}.It will be useful to define a "canonical set-up" as follows: recalling Notation 2.1(II)(III) put in which π 0 is the law of the initial state ξ 0 for the model (2.7) (see (3.12)).
Then the filtered probability space (E, A, µ; {A t }) satisfies the usual conditions (see e.g.[21], Theorem (II)(68.4),p.175).If (ξ, w, y) is a generic member of E then {e t (ξ, w, y)} is an R d -valued, continuous and A t -adapted process on (E, A, µ; {A t }), and then, for we see from Condition 2.3(i) that {ϑ t (ξ, w, y)} is an R D -valued, continuous and A t -adapted process on (E, A, µ; {A t }).From this it follows that {J t (ϑ k (ξ, w, y), y k )} is R-valued, continuous and A t -adapted for each k = 1, 2, . . ., D, and hence the (0, ∞)-valued process {ρ t (ξ, w, y)} defined on (E, A, µ; {A t }) by Then the filtered probability space With these preliminaries in place, we can state the following representation formulae of Bhatt and Karandikar ([3], p.45, p.46):The (0, ∞)-valued process {G t (y)} defined by (3.24) G t (y) := and it is easily seen from the monotone-class argument leading to (3.29) that γ 0 is subject to the same uniform bound, namely Remark 3.4.For later reference we next recall the notions of solution and strong solution in the context of the particular SDE arising from (3.31).Although these ideas are quite standard in the theory of SDE's, there is some variation in terminology and formulation from one reference to another.In view of their considerable importance it seems appropriate to recall the definitions clearly at this point: (I) A pair {( Ω, F, P ; { Ft }), ( Ȳt , Īt )} is a solution of the SDE with drift function {γ 0 t } and unit covariance when ( Ω, F, P ; { Ft }) is a filtered probability space satisfying the usual conditions, { Ȳt } is an R D -valued continuous Ft -adapted process, and { Īt } is an R D -valued standard Ft -Wiener process on ( Ω, F, P ), such that Furthermore, a solution {( Ω, F, P ; { Ft }), ( Ȳt , Īt )} is said to be a strong solution when { Ȳt } is adapted to the filtration { Īt } (which is the usual augmentation of the raw filtration {F Ī t } of the Wiener process { Īt }), or equivalently when Ȳt ⊂ Īt , t ∈ [0, T 
], where { Ȳt } is the usual augmentation of the raw filtration {F Ȳ t } of the process { Ȳt }. (II) For the innovation process {I t } and observation filtration {Y t } defined at Remark 2.5 it is a standard result that {I t } is an R D -valued Y t -Wiener process on (Ω, F, P ) (see e.g.Rogers and Williams [22], Theorem VI(8.4)(i), p.323).It then follows from (3.31) that {(Ω, F, P ; {Y t }), (Y t , I t )} is a solution of the SDE with drift function {γ 0 t } and unit covariance (in the sense of (I)).It remains to show that it is a strong solution, for then we have that Y t ⊂ I t , which gives Proposition 2.6.To this end, we shall use a variant of the theorem of Yamada and Watanabe [27] due to Clark ([6], Proposition and Corollary p.157), which, in the context of the SDE with drift function {γ 0 t } and unit covariance, states the following: Theorem 3.5.Suppose that the SDE with drift function {γ 0 t } and unit covariance has the property of pathwise-uniqueness in the following restricted sense: given any pair of solutions {( Ω, F, P ; { Ft }), ( Ȳ i t , Īt )}, i = 1, 2, each having joint-law identical to that of the solution {(Ω, F, P ; Remark 3.6.Some remarks on the relationship between the classical theorem of Yamada and Watanabe [27] and Clark's modification of this theorem are in order.The theorem of Yamada and Watanabe asserts (among other things) that if pathwise-uniqueness holds among all pairs of postulated solutions of an SDE (on an arbitrary common filtered probability space, with common initial value and common "driving" Wiener process) then each and every solution of the SDE is a strong solution (see e.g.Theorem IX(1.7)(ii) of [20], p.352, for a very nice rendition of this result).In contrast, Clark's modification [6] of this theorem postulates the weaker hypothesis of pathwise-uniqueness among all pairs of postulated solutions of the SDE having the same joint law as some designated solution, and in return gives the weaker conclusion that just the 
designated solution is strong. For many applications this conclusion is all that is wanted (the present one being a case in point), and the weaker hypothesis is often easier to verify, since pathwise-uniqueness need be established only among putative solutions with the same law as the solution whose strength must be demonstrated, rather than among all postulated solutions. Although Theorem 3.5 is a statement of Clark's result only for the specific case of the SDE with drift function $\{\gamma^0_t\}$ and unit covariance, the result in fact pertains to completely general SDEs with functional coefficients (see Clark [6], Proposition and Corollary on p. 157).

Proof of Proposition 2.6
In the present section our goal is to show (3.41); as noted at the end of the previous section, this establishes (3.32) and hence Proposition 2.6.To this end, and recalling Remark 3.7 and Theorem 3.1(II), define (i.e.{ Ỹt } is the usual augmentation of the filtration {H t }; the set-inclusion at (4.43) follows because the { Ỹ i } are Ft -adapted and { Ft } satisfies the usual conditions).To establish (3.41) we shall need the following result, the proof of which is deferred to later in this section: Proposition 4.1.Suppose that Conditions 2.2 and 2.3 hold.Then there exists a sequence of Ỹt -stopping times {T n , n = 1, 2, . ..} on ( Ω, F, P ), together with a sequence of constants Proof of (3.41):For each t ∈ [0, T ] and n = 1, 2, . .., put (4.48) where T n and K(n) are as asserted in Proposition 4.1.Then (4.45) gives (4.49) for each n = 1, 2, . . .From (4.47) it follows in particular that t → α n t is integrable on t ∈ [0, T ], and t → β n t is of course integrable on t ∈ [0, T ] (in view of (3.30) and (3.39)).Hence (4.49) and Gronwall's inequality (Kallianpur [11], Proposition 5.1.1,p.94-95) give, for all t ∈ [0, T ], (4.50) where , and we have used the fact that the mapping t → β n t is nondecreasing at the second inequality of (4.50).In light of (4.50) and (4.48), we obtain the following integral inequalities: for each n = 1, 2, . .., it follows from this inequality and (4.52) that there is a constant and therefore, from (4.53), (4.48) and the Gronwall inequality, for each t ∈ [0, T ] we have Next, we need the following result, which follows immediately from Bhatt and Karandikar (see Theorem 4.1 and eqn.(4.4) on p.47 of [3]): (the latter following from (4.55), (4.57) and Q 2 (N ) = 0).Then, from (4.59), (4.56), (3.39), (4.60) In view of (4.58) we see that (4.61) are Ỹt -stopping times for all n = 1, 2, . 
.., and Then it follows from (4.60), (4.61) and (4.63) that, for P -almost all ω, (4.64) Step To simplify the notation, put (4.66) (thus h i t is D-dimensional row-vector), and observe from (4.65) and (3.38), together with the uniform boundedness of h and γ 0 (see Condition 2.3(iii) and (3.30)), that there is a constant (recall Notation 2.1(V), and the fact that We next upper-bound the quantity on the right side of (4.70).To this end, put (4.71) From the elementary upper-bound |e x − e y | ≤ (1/2)(e x + e y )|x − y|, x, y ∈ R, together with (4.71), (4.66), (4.65), and (3.40), we find we can use (3.38) and the Cauchy-Schwarz inequality for the dτ -integrals on the right side of (4.72) to get From (4.75) and the Cauchy-Schwarz inequality for conditional expectations (Chow and Teicher [4], Theorem 7.2.4,p.219) we obtain the constant K 2 ∈ [0, ∞) depending only on the uniform bound h on {h i t } and {γ 0 t ( Ỹ 1 )}.We next establish a similar upper-bound on | l2 t |.From (4.42) and an argument identical to that which led to (4.69) we obtain Then it follows from (4.63) that and then, from Jensen's inequality, for each t ∈ [0, T ] we get for a constant K 3 ∈ [0, ∞) depending only on the uniform bound on h (Condition 2.3(iii)).
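The elementary bound $|e^x - e^y| \le (1/2)(e^x + e^y)|x - y|$ used in Step 2 follows from convexity of the exponential: $e^x - e^y$ is the integral of $e^s$ between $y$ and $x$, and the trapezoid rule overestimates the integral of a convex function. A minimal numerical sanity check is sketched below; the grid, the tolerance, and the function name are illustrative assumptions.

```python
import math

def exp_difference_bound_holds(x, y, rel_tol=1e-12):
    """Check |e^x - e^y| <= (1/2)(e^x + e^y)|x - y| up to rounding."""
    lhs = abs(math.exp(x) - math.exp(y))
    rhs = 0.5 * (math.exp(x) + math.exp(y)) * abs(x - y)
    return lhs <= rhs * (1.0 + rel_tol)

# Grid of points from -5 to 5 in steps of 0.25.
grid = [-5.0 + 0.25 * k for k in range(41)]
all_hold = all(exp_difference_bound_holds(x, y) for x in grid for y in grid)
```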
Step 3: In this step we complete the construction of the stopping times $T_n$ in Proposition 4.1 (by introducing stopping times for the martingale $\{\psi^1_t\}$ at (4.75) and for the martingale $\{\psi^2_t\}$ to be defined at (4.86)).
Step 5: In this step we establish (4.45) for some constants K(n) ∈ [0, ∞) (with the stopping times T n given by (4.89)).From (4.46), (3.40) and (3.38), In view of this, together with (4.42), (4.85), and the Cauchy-Schwarz inequality, we find a constant K 7 ∈ [0, ∞) (depending only on T ) such that, for each t ∈ [0, T ], n = 1, 2, . .., Now upper-bound each term on the right of (4.101).It follows from Condition 2.3, reasoning identical to that used at (4.98), and (4.99), that As for the second and third terms on the right of (4.101), put where the second inequality of (4.104) follows exactly as at (4.92).For the third term on the right of (4.101) we clearly have a bound identical to that of (4.104) but with ∆c τ and Ĩ in place of ∆b τ and W respectively.For the fourth term on the right of (4.101), it follows from the uniform-boundedness of {γ 0 t ( Ỹ 1 )} and the same reasoning which led to (4.102), that for some constant C ∈ [0, ∞) (depending only on T and the uniform bound on γ 0 ).Upon combining (4.101), (4.102), (4.104), and (4.105), we find a constant K 8 ∈ [0, ∞) such that, for each t ∈ [0, T ] and n = 1, 2, . 
.., We next take expectations on each side of (4.106).To this end, from Doob's L 2 -maximal inequality, (3.35), and the Itô isometry, we get and, exactly as at (4.98) together with Fubini's theorem, we obtain and an identical upper-bound holds for the expectation of the fourth term on the right side of (4.106).As for the fifth term on the right of (4.106), for each t ∈ [0, T ] we have where we have used the Ỹt -measurability of I[0, T n )(t) Ẽ ζI[0, T n )(t) (in place of (2.8)).From Proposition 2.6 we then have Our goal is to establish pathwise-uniqueness for the measure-valued SDE giving the optimal nonlinear filter for the model (5.111).For completeness we briefly recall this equation together with the related ideas of a solution, pathwise-uniqueness, and uniqueness-in-law.From (5.111) we see that {X t } is a diffusion process given by (5.113) dX for which the corresponding linear second-order differential operator is ) is convergence-determining (Ethier and Kurtz [8], Problem 3.11.11,p.151), it follows that {π t , t ∈ [0, T ]} is a continuous P(R d )-valued and Y t -adapted process.
Remark 5.10.Proposition 5.9 postulates fourth-power integrability of the initial-value probability measures πi 0 , and for this reason gives a restricted form of pathwise-uniqueness when compared with Definition 5.8 (1), in which there is no comparable integrability condition on the initial-value measures.Proposition 5.9 should be compared with Theorem 4.5 of Kurtz and Ocone ( [17], p.99), which establishes a partial pathwise-uniqueness for the normalized filter equation in the following sense: if {µ t , t ∈ [0, T ]} is a P(R d )-valued continuous process on the probability space (Ω, F, P ) on which the random data of the nonlinear filtering problem is specified, adapted to the observation filtration {Y t } and satisfying a relation exactly analogous to the normalized filter equation, then {µ t } and the optimal nonlinear filter {π t } are P -indistinguishable.The hypotheses for this result are quite general (see (4.19)(i)-(iv) in [17], p.98), and in particular do not entail the uniform-boundedness and Lipschitz-continuity of the sensor function h(•) or fourth-power integrability of the initial laws that are needed for Proposition 5.9.On the other hand, in return for these stronger hypotheses, Proposition 5.9 establishes pathwise uniqueness in the genuine "Yamada-Watanabe" sense, that is among pairs of candidate solutions on an arbitrary common filtered probability space ( Ω, F, P ; { Ft }) and "driven" by an arbitrary Ft -Wiener Process { Īt }, rather than for candidate solutions {µ t } on the particular filtered probability space (Ω, F, P ; {Y t }), adapted specifically to the observation filtration, and "driven" specifically by the innovations process.
Proof of Proposition 5.9: From the hypotheses we have By a simple conditioning argument (see Ikeda and Watanabe [10], Remark IV.1.4,p.149) it is enough to establish pathwise-uniqueness in the special case where the πi 0 are non-random, and thus, without loss of generality, we shall suppose that (5.119) π1 0 = π2 0 for some πi 0 ∈ P(R d ) such that πi 0 ψ < ∞ (for ψ given by (5.118)).
Proposition 2.6. Suppose that Conditions 2.2 and 2.3 hold for the coupled SDE (2.7). Then, with reference to Remark 2.5, we have $\mathcal{Y}_t = \mathcal{I}_t$ for each $t \in [0, T]$.
dτ .Remark 4.2.From (3.35) and (3.38), we see that {( Wt , Ỹ i t )} is an R q+D -valued continuous Ft -semimartingale, hence, for { Xi t } defined by (4.42), it follows from Theorem 3.1(II)(b) that 2, 3, . ..Now lim n→∞ T n = T ( P −a.s.) and monotonically (by Proposition 4.1), and therefore, in view of (4.54) and the monotone convergence theorem, we obtain Ẽ| lt | 2 = 0 for each t ∈ [0, T ); now (3.41) follows from this and the Fubini theorem.Since the proof divides quite naturally into five distinct steps, we choose to present it as such (rather than breaking it up into a number of independent lemmas, propositions, etc).Step 1: In this step we begin construction of the stopping times T n asserted in Proposition 4.1.From Remark 3.2 and (3.37), Proof of Proposition 4.1: The proof involves constructing a sequence of Ỹt -stopping times T n and constants K(n) with the stated properties, and relies of course on the structure of the drift functional {γ 0 t } (defined by (3.21), (3.22), (3.24), (3.25), (3.28) and (3.29)).i ) −1 and Q 2 are equivalent probability measures on B(C D ) for i = 1, 2, 2, (as follows from (4.55) and (3.23)), thus each set A∈ B t (C D ) Q 2 has the form A = B C for some B ∈ B t (C D ) and C ∈ Z P ( Ỹ i ) −1 [B(C D )],and from this together with (4.43) it is seen that Ft ( Ỹ i ) and Gt ( Ỹ i ) are Ỹt -measurable, that is 2: In this step we work out upper-bounds for the quantities | l1 t | and | l2 t | defined at (4.63) (these upper-bounds are given by (4.79) and (4.82)).By (4.42) and (3.21) we have {ϑ t ( ξ0 , W , Ỹ i )} = {h t ( Xi , Ỹ i )}, and this process is continuous Ft -adapted (by Condition 2.2 and the fact that {( Xi valued Ft -Wiener process -see (3.35)).Since {h i t } is uniformly bounded, continuous, and Ft -adapted, we see that {E(h i • Ĩ) t } is a continuous L p -bounded Ft -martingale for each p ∈ [1, ∞), hence Doob's maximal inequality and (4.67) establish t∈[0,T ] predictable and { Ỹ i t } are continuous Ỹt -adapted, we see from 
(3.39) that { lt } is Ỹt -predictable, and hence t 0 | lτ | 2 dτ is Ỹt -measurable.Then, upon taking Ỹtconditional expectations on each side of (4.73) and using the uniform-boundedness of {h i t } and {γ 0 t ( Ỹ 1 )}, for each t ∈ [0, T ] we get Now substitute (4.77) -(4.78) into (4.74) and square both sides.From this, together with the upper-bound [ m i=1 a i ] 2 ≤ m 2 m i=1 a 2 i (for a i ∈ R) and (4.70), for each t ∈ [0, T ] we get Ỹt , P − a.s.
), P − a.s.for each n = 1, 2, . . .and t ∈ [0, T ].We next use the inequalities (4.79) and (4.82) obtained in Step 2 to upper-bound each term on the right of (4.90).From (4.89) and (4.83) we have the upper-bound ψ 1 t (ω)I[0, T n (ω))(t) ≤ n for all (t, ω) ∈ [0, T ] × Ω, thus, from (4.79) and the Ỹt -measurability of [3]]ombining (4.110), (4.108), and (4.106), we see that there is a constant K(n) ∈ [0, ∞) such that (4.45) holds for each t ∈ [0, T ] and n = 1, 2, ...Remark 4.4.As we have noted at Remark 3.4(II), establishing the innovations conjecture is really a matter of showing that {(Ω, F, P ; {Y t }), (Y t , I t )} is a strong solution of the SDE with drift function {γ 0 t } and unit covariance (see Remark 3.4(I)), and this in turn follows from Theorem 3.5 once we have shown that this SDE has the property of pathwise uniqueness.Although global Lipschitz-continuity is postulated for the coefficients of the state/observation equation (2.7) (see Condition 2.3) it is well to note that the complexity of the drift term {γ 0 t } means that standard results on pathwise-uniqueness for classical Itô SDEs (see e.g.Theorem 5.1.1 of [Kallianpur[11], p.97]) do not apply directly to the SDE with unit covariance and drift {γ 0 t }.In fact, the global Lipschitz-continuity of Condition 2.3 is essential for securing the representation of the drift {γ 0 t } due to Bhatt and Karandikar[3], but is otherwise used only rather indirectly in the proof of pathwise-uniqueness (at (4.98), (4.102), (4.105) and (4.108)).Remark 5.2.The model(5.111) is just a special case of (2.7), and therefore Remark 2.4 continues to hold for (5.111).We define the observation filtration {Y t }, the innovations process {I t }, and the innovations filtration {I t }, exactly as in Remark 2.5, except that In this section we shall use Proposition 2.6 to establish pathwise-uniqueness for the measurevalued SDE which gives the nonlinear filter for a simplified version of the model (2.7).To this end we shall suppose 
from now on that $\{X_t, t \in [0, T]\}$ is an $\mathbb{R}^d$-valued "state" process and $\{Y_t, t \in [0, T]\}$ is an $\mathbb{R}^D$-valued observation process given by
(5.111) (1) $dX_t = a(X_t) \, dt + b(X_t) \, dW^1_t + c(X_t) \, dY_t$, $X_0 = \xi_0$,
(5.111) (2) $dY_t = h(X_t) \, dt + dW^2_t$, $Y_0 = 0$,
subject to Condition 2.2 as well as
Condition 5.1. The mappings $a : \mathbb{R}^d \to \mathbb{R}^d$, $b : \mathbb{R}^d \to \mathbb{R}^{d \times q}$, $c : \mathbb{R}^d \to \mathbb{R}^{d \times D}$ and $h : \mathbb{R}^d \to \mathbb{R}^D$ are globally Lipschitz-continuous, $h$ is uniformly bounded, and the $d \times d$ matrix $b(x)b'(x)$ is strictly positive definite for each $x \in \mathbb{R}^d$.
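For intuition about the model (5.111), the following minimal sketch simulates the coupled state/observation pair by an Euler-Maruyama scheme in the scalar case $d = q = D = 1$. The particular coefficients used in the example call ($a(x) = -x$, $b \equiv 1$, $c \equiv 0.5$, $h = \tanh$) are hypothetical choices that satisfy Condition 5.1; they do not come from the paper, and the discretization is only a rough approximation of the SDE.

```python
import math
import random

def simulate_model(a, b, c, h, x0, T, n_steps, rng):
    """Euler-Maruyama paths (X, Y) for the scalar version of (5.111)."""
    dt = T / n_steps
    x, y = [x0], [0.0]
    for _ in range(n_steps):
        dw1 = rng.gauss(0.0, math.sqrt(dt))
        dw2 = rng.gauss(0.0, math.sqrt(dt))
        # Observation equation (2): dY = h(X) dt + dW2.
        dy = h(x[-1]) * dt + dw2
        # State equation (1): the observation increment dY feeds back via c.
        dx = a(x[-1]) * dt + b(x[-1]) * dw1 + c(x[-1]) * dy
        x.append(x[-1] + dx)
        y.append(y[-1] + dy)
    return x, y

# Hypothetical coefficients: Lipschitz, h bounded, b(x) * b(x) > 0.
xs, ys = simulate_model(lambda x: -x, lambda x: 1.0, lambda x: 0.5,
                        math.tanh, 0.3, 1.0, 500, random.Random(1))
```

The observation increment is computed first and then reused in the state update, which mirrors the "feedback" role of the $c(X_t) \, dY_t$ term.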