
The paper studies the filtering problem for a non-classical framework: we assume that the observation equation is driven by a signal dependent noise. We show that the support of the conditional distribution of the signal is on the corresponding level set of the derivative of the quadratic variation process. Depending on the intrinsic dimension of the noise, we distinguish two cases: In the first case, the conditional distribution has discrete support and we deduce an explicit representation for the conditional distribution. In the second case, the filtering problem is equivalent to a classical one defined on a manifold and we deduce the evolution equation of the conditional distribution. The results are applied to the filtering problem where the observation noise is an Ornstein-Uhlenbeck process.


Introduction
Let $(\Omega, \mathcal{F}, P)$ be a probability space on which we have defined a homogeneous Markov process $X$. We can obtain information on $X$ by observing an associated process $y$ which is a function of $X$ plus random noise $n$:
$$y_t = h(X_t) + n_t. \qquad (1.1)$$
In most of the literature $n_t$ is modeled by white noise, which does not exist in the ordinary sense, but rather as a distribution, the generalized derivative of a Brownian motion. That is, the process $W$ defined formally as $W_t = \int_0^t n_s\,ds$ is a Brownian motion and the (integrated) observation model turns into one of the (classical) form
$$Y_t = \int_0^t h(X_s)\,ds + W_t,$$
where $Y$ is the accumulated observation. In this case, the observation $\sigma$-field is $\mathcal{Y}_t = \sigma\{Y_s,\ s \le t\}$ (suitably augmented) and the desired conditional distribution satisfies the Kushner-Stratonovich or Fujisaki-Kallianpur-Kunita equation
$$d\pi_t(\varphi) = \pi_t(L\varphi)\,dt + \big(\pi_t(h\varphi) - \pi_t(h)\pi_t(\varphi)\big)\,d\nu_t,$$
where $L$ is the generator of $X$ and $\nu$, defined as $\nu_t = Y_t - \int_0^t \pi_s(h)\,ds$, $t \ge 0$, is a Brownian motion called the innovation process.
Balakrishnan (see Kallianpur and Karandikar [8], p. 3) states that this (integrated) approach is not suitable for applications, since the results obtained cannot be instrumented. Kunita [9], Mandal and Mandrekar [11] and Gawarecki and Mandrekar [4] studied the model (1.1) when $n$ is a general Gaussian process. The most important example is the case when $n$ is an Ornstein-Uhlenbeck process given by
$$dO_t = -\beta O_t\,dt + \beta\,dW_t, \qquad (1.3)$$
where $W$ is a standard $m$-dimensional Wiener process. The observation model becomes
$$y_t = h(X_t) + O_t \qquad (1.4)$$
or the corresponding integral form $Y_t = \int_0^t y_s\,ds$. To fix the ideas, we let the signal $X$ be an $\mathbb{R}^d$-valued process given by
$$dX_t = b(X_t)\,dt + c(X_t)\,dB_t,$$
where $b, c$ are $\mathbb{R}^d$-, respectively $\mathbb{R}^{d\times d}$-valued functions and $B$ is a standard $d$-dimensional Wiener process independent of $W$. The optimal filter $\pi$ is given by $\pi_t(\varphi) = E\big[\varphi(X_t)\mid \mathcal{Y}_t\big]$. In the aforementioned papers, a Kallianpur-Striebel formula is given, the filtering equation for $\pi$ is derived, and it is proved that $\pi$ converges to the solution of the (classical) FKK equation as $\beta$ tends to infinity. However, the conditions imposed in these papers [9], [11], [4] are very restrictive. Most notably, the authors assume that the map $t \mapsto h(X_t)$ is differentiable. To remove this restrictive condition, Bhatt and Karandikar [1] consider a variant observation model, depending on a parameter $\alpha > 0$, and obtain the same results for this modified model.
In this paper, we deal with the original model (1.4) but no longer assume differentiability of the map $t \mapsto h(X_t)$. As we will see, this leads to an observation model with signal dependent observation noise:
$$dY_t = h(X_t)\,dt + \sigma(X_t)\,dW_t.$$
In turn, the filtering problem with signal dependent observation noise will be converted into a classical one (with signal independent observation noise) via a suitably chosen stochastic flow mapping. This article is organized as follows. In Section 2 we set up the filtering problem with signal dependent observation noise and the framework for transforming this singular filtering problem into the classical one. Then, we discuss this transformation in two cases. In Section 3 we consider the case when the signal dimension is small, so that (under mild regularity) the level set $M_z = \{x \in \mathbb{R}^d : \sigma(x) = z\}$ is discrete for each positive definite matrix $z$. In Section 4 we study the transformation when the $M_z$ are manifolds. In this case, we decompose the vectors in $\mathbb{R}^d$ into their tangent and normal components, and study the signal according to this decomposition. In Section 5, we convert the filtering problem with OU noise to a special case of the general singular filtering model.
The methods and results presented here benefited a lot from the work of Ioannides and LeGland (see [6] and [7]). In particular, in [7], they study the filtering problem with perfect observations. That is, in their set-up, the observation process $Y$ is a deterministic function of the signal $X$. Here we show that the filtering problem that we are interested in can be reduced to one where the observation process has two components: one that is perfect (in the language of Ioannides and LeGland) and one that is of the classical form (see Lemma 2.3 below).

A general singular filtering model
Motivated by the filtering problem with an OU process as noise, we will consider a general filtering problem with signal and observation given by
$$dX_t = b(X_t)\,dt + c(X_t)\,dW_t + \bar{c}(X_t)\,dB_t, \qquad dY_t = h(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad (2.1)$$
where $B$ and $W$ are two independent Brownian motions in $\mathbb{R}^d$ and $\mathbb{R}^m$ respectively, and $b, c, \bar{c}, h, \sigma$ are functions defined on $\mathbb{R}^d$ with values in $\mathbb{R}^d$, $\mathbb{R}^{d\times m}$, $\mathbb{R}^{d\times d}$, $\mathbb{R}^m$, $\mathbb{R}^{m\times m}$, respectively. We will assume that $Y_0 = 0$ and that $X_0$ is independent of $B$ and $W$. We will denote the law of $X_0$ by $\pi_0$, assume that it is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$, and denote by $\tilde\pi_0$ its density.
We will also assume that $\sigma(x)$ is symmetric and positive definite for each $x$. If not, we can define the $m$-dimensional process $\tilde{W}$ by $d\tilde{W}_t = \big(\sigma\sigma^T\big)^{-1/2}(X_t)\,\sigma(X_t)\,dW_t$. Then the pair process $(\tilde{W}, B)$ is a standard $(m+d)$-dimensional Brownian motion and the system (2.1) is equivalent to a system of the same form, with $\sigma$, $c$ and $W$ replaced by $(\sigma\sigma^T)^{1/2}$, $c\sigma^{-1}(\sigma\sigma^T)^{1/2}$ and $\tilde{W}$, respectively. The analysis can be easily extended to cover the case when all the terms in (2.1) depend on both $X$ and $Y$. We make the following assumptions throughout the rest of the paper.

Condition (BC):
The functions $b$, $c$ and $\bar{c}$ are Lipschitz continuous, and $h$ is bounded and measurable.
Condition (ND): For any $x \in \mathbb{R}^d$, the $m \times m$ matrix $\sigma(x)$ is invertible.

Condition (S):
The partial derivatives of the function σ up to order 2 are continuous.

Condition (X0):
The law of $X_0$ has a continuous density $\tilde\pi_0$ with respect to the Lebesgue measure on $\mathbb{R}^d$.
Let $\langle Y\rangle$ be the quadratic covariation process of the $m$-dimensional semimartingale $Y$. From (2.1) it follows that $\langle Y\rangle_t = \int_0^t \sigma^2(X_s)\,ds$. Hence, the process $Z_t = \sigma(X_t) = \big(\frac{d}{dt}\langle Y\rangle_t\big)^{1/2}$ is $\mathcal{Y}_t$-measurable. We decompose the observation $\sigma$-field $\mathcal{Y}_t$ into two parts: one generated by $\sigma(X_t)$ and another generated by an observation process of the classical form.
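As a quick numerical illustration (a hypothetical scalar example, not part of the paper): with constant coefficients, the realized quadratic variation of the observation path recovers $\sigma^2$, which is the mechanism that makes $\sigma(X_t)$ observable.

```python
import numpy as np

# Illustrative sketch (constant coefficients assumed for simplicity):
# simulate increments of dY_t = h dt + sigma dW_t on a fine grid and
# recover sigma from the realized quadratic variation of Y over [0, T],
# which should be close to sigma^2 * T.
rng = np.random.default_rng(0)
T, n = 1.0, 200_000
dt = T / n
sigma, h = 0.7, 1.3                 # true (constant) coefficients
dW = rng.normal(0.0, np.sqrt(dt), n)
dY = h * dt + sigma * dW            # observation increments
qv = np.sum(dY ** 2)                # realized quadratic variation
sigma_hat = np.sqrt(qv / T)         # estimate of sigma
print(round(sigma_hat, 2))
```

The drift contributes only $O(dt)$ to the quadratic variation, so the estimate converges to $\sigma$ as the grid is refined.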

Denote by $S_m^+$ the set of symmetric positive definite $m \times m$ matrices. Then for $x \in \mathbb{R}^d$, we have $\sigma(x)$ and $\sigma^2(x) \in S_m^+$. Next, we define the mapping $a$ from $S_m^+$ to $\mathbb{R}^{\bar m}$ with $\bar m = \frac{m(m+1)}{2}$ as the list of the diagonal entries and those above the diagonal in lexicographical order, i.e., for any $r \in S_m^+$, $a(r)$ is defined as
$$a(r)_1 = r_{11},\ a(r)_2 = r_{12},\ \dots,\ a(r)_m = r_{1m},\ a(r)_{m+1} = r_{22},\ \dots,\ a(r)_{2m-1} = r_{2m},\ a(r)_{2m} = r_{33},\ \dots,\ a(r)_{\bar m} = r_{mm}. \qquad (2.2)$$
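A minimal sketch in code of the vectorization map $a$ of (2.2) (illustrative only; the helper name `a` mirrors the paper's notation):

```python
import numpy as np

# The map a: S_m^+ -> R^{m(m+1)/2} lists the diagonal and above-diagonal
# entries of a symmetric matrix row by row (lexicographical order of the
# index pairs (i, j) with i <= j), as in (2.2).
def a(r):
    r = np.asarray(r)
    m = r.shape[0]
    return np.array([r[i, j] for i in range(m) for j in range(i, m)])

r = np.array([[2.0, 0.5, 0.1],
              [0.5, 3.0, 0.4],
              [0.1, 0.4, 1.0]])
print(a(r).tolist())  # [2.0, 0.5, 0.1, 3.0, 0.4, 1.0]
```

For $m = 3$ the image lives in $\mathbb{R}^6$, matching $\bar m = m(m+1)/2$.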
We see now that the framework is truly non-classical, as part of the observation process is noiseless. It follows that, given the observation, $X_t$ takes values in the level set $M_{Z_t}$ as defined in the introduction. Hence, $\pi_t$ is supported on $M_{Z_t}$. Therefore, $\pi_t$ will not have full support (unless $\sigma$ is constant) and will be singular with respect to the Lebesgue measure on $\mathbb{R}^d$.
As was seen, only the diagonal entries and those above the diagonal of the process $Z_t$ (in other words, $a(Z_t)$) are required to generate $\mathcal{Z}_t$. Hence, we only need to take into account the properties of the mapping $a_\sigma = a \circ \sigma : \mathbb{R}^d \to \mathbb{R}^{\bar m}$ and of its Jacobian $q = \nabla a_\sigma$, an $\bar m \times d$ matrix.
Definition 2.4. The vector $x \in \mathbb{R}^d$ is a regular point for $a_\sigma$ if and only if the matrix $q(x)$ has full rank $\min(d, \bar m)$. We shall denote the collection of all regular points by $\mathcal{R}$.
We will study the optimal filter π t in the next two sections according to the type of the level set M z .

The case d ≤ m
In this section, we consider the case when $d \le m$. We will assume that $M_z$ consists of countably many points and that its connected components do not branch or coalesce. Namely, we assume

Condition (R1): There exists a countable set $I$ such that for any $z$,
$$M_z = \{x_i(z) : i \in I\},$$
where $x_i$, $i \in I$, are continuous functions such that $x_i(z) \neq x_j(z)$ for any $i, j \in I$, $i \neq j$.

This condition holds true if $a_\sigma$ satisfies the following assumption.

Condition (R): Every $x \in \mathbb{R}^d$ is a regular point for $a_\sigma$.

In this case, for $x \in \mathbb{R}^d$, by the inverse function theorem (see, for example, Rudin [12]) there is a continuous bijection between an open neighborhood of $w(a_\sigma(x))$ and an open neighborhood of $x$, where $w : \mathbb{R}^{\bar m} \to \mathbb{R}^d$ is the projection of $\mathbb{R}^{\bar m}$ onto $\mathbb{R}^d$ corresponding to those coordinates which give a minor of maximal rank for the matrix $q(x)$. The composition of this continuous bijection with the projection $w$ is, in effect, one of the continuous functions $x_i$ appearing in Condition (R1). In particular, if $x \in M_z$, then there is no other element of $M_z$ within the corresponding open neighborhood; in other words, $M_z$ contains only isolated points. Thus, Condition (R) implies that all the level sets of $\sigma$ (respectively, of $a_\sigma$) are discrete, hence finite on any compact set and therefore countable. Consequently, there is a countable set of continuous functions describing the level sets. These continuous functions do not coalesce or branch, as that would contradict the existence of the bijection at the point of coalescence/branching (in topological language, the number of connected components of the level sets is locally constant). Hence Condition (R) implies Condition (R1).
By Condition (R1) and the continuity of the process $X_t$, we see that if $X_0$ is non-random and $X_0 = x_i(Z_0)$ for some $i \in I$, then $X_t = x_i(Z_t)$ for the same value of $i \in I$ (the process does not 'jump' between connected components). In other words, the noiseless component of the observation uniquely identifies the conditional distribution of the signal given the observation: $\pi_t = \delta_{x_i(Z_t)}$. Next, we consider the case where $X_0$ is not constant. We need to take into account the additional information that may arise from observing the quadratic variation of the process $a(Z_t)$ and the covariation process between $a(Z_t)$ and $Y_t$. This will not influence the trajectory of $\pi$: from the above we already know that it is deterministic given its initial value $\pi_0$ and the process $a(Z_t)$. However, this additional information may restrict $\pi_0$.
Applying Itô's formula to $a_\sigma(X_t)$, with $X_t$ given by (2.1), we get
$$da_\sigma(X_t) = La_\sigma(X_t)\,dt + qc(X_t)\,dW_t + q\bar c(X_t)\,dB_t, \qquad (3.1)$$
where $L$ is the second order differential operator given by
$$Lf = \sum_{i=1}^d b_i\,\partial_i f + \frac12\sum_{i,j=1}^d\big(cc^T + \bar c\bar c^T\big)_{ij}\,\partial^2_{ij}f.$$
It follows from (3.1) and (2.1) that $\frac{d}{dt}\langle a(Z)\rangle_t = q\big(cc^T+\bar c\bar c^T\big)q^T(X_t)$ and $\frac{d}{dt}\langle a(Z), Y\rangle_t = qc\sigma(X_t)$; since $\sigma(X_0)$ is $\mathcal{Y}_{0+}$-measurable and invertible, the random variable
$$\tilde Z_0 = \Big(q\big(cc^T+\bar c\bar c^T\big)q^T(X_0),\ qc(X_0)\Big) \qquad (3.2)$$
is $\mathcal{Y}_{0+}$-measurable. The analysis that follows hinges upon an explicit characterization of the information contained in $\mathcal{Y}_{0+}$. Such a characterization may not be available in general. However, we present two cases where it is possible and note that other cases may be deduced in an inductive manner. From Remark 2.1 we know that $\sigma^2(X_0)$, which is the derivative of the quadratic variation of the process $Y$ at $0$, is $\mathcal{Y}_{0+}$-measurable. Moreover, from the discussion following Lemma 2.2, $\sigma(X_0)$ and $a_\sigma(X_0)$ are also $\mathcal{Y}_{0+}$-measurable. In Case 1, no additional information is available from $\tilde Z_0$. This means that the $\mathcal{Y}_{0+}$-measurable random variables obtained by differentiating at zero the quadratic variation of $a_\sigma(X)$ and the quadratic covariation process between $a_\sigma(X)$ and $Y$ are functions of $a_\sigma(X_0)$. In Case 2, these two new processes do offer new information (they are not functions of $a_\sigma(X_0)$), but their corresponding quadratic variations and quadratic covariation processes do not have informative derivatives. In subsequent cases, which can be treated in a similar manner to the first two, more and more of the processes constructed by computing quadratic variations and quadratic covariations and differentiating at zero offer information (they are not functions of the already computed $\mathcal{Y}_{0+}$-measurable random variables).
Case 1: The matrices $q\big(cc^T+\bar c\bar c^T\big)q^T$ and $qc$ are functions of $a_\sigma$; in other words, there exist two Borel measurable functions $H_1$ and $H_2$ from $a(S_m^+)$ to $\mathbb{R}^{\bar m\times\bar m}$ and $\mathbb{R}^{\bar m\times m}$ respectively such that
$$q\big(cc^T+\bar c\bar c^T\big)q^T = H_1(a_\sigma), \qquad qc = H_2(a_\sigma). \qquad (3.3)$$
Let us note that $x \in \mathbb{R}^d$ is a regular point if and only if $J(x) = \det\big(q^Tq\big)(x) > 0$.
Proof: We consider the filter $\hat\pi_t = P\big(X_t \in \cdot \mid \hat{\mathcal{Y}}_t \vee \mathcal{Z}_t\big)$. Note that $\hat\pi_0 = \pi_0$, the law of $X_0$, and From (3.2) and (3.3) we get that $\tilde Z_0 = \big(H_1(a(Z_0)), H_2(a(Z_0))\big)$ brings no new knowledge, hence we can ignore it. Following the proof of Theorem 2.8 in [6], we let $\mu_z$ be the conditional probability distribution of $X_0$ given $Z_0 = z$, i.e., By the area formula (cf. Evans and Gariepy [2], p. 99, Theorem 2), we have where $\mathcal{H}^d$ is the $d$-dimensional Hausdorff measure (cf. Evans and Gariepy [2], p. 60). Taking $\varphi = 1$, we get Comparing with (3.4), we get Therefore, Hence $\mu_z$ has support on the set $M_z$ and Following the case with constant $X_0$, we then have Before we consider the second case, we give an example for which the conditions of Theorem 3.1 are satisfied.
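For orientation, here is a sketch of the resulting representation (our reconstruction, consistent with the area-formula computation sketched in the proof and with the notation of Condition (R1); it should not be read as the paper's verbatim statement):

```latex
\mu_z(dx) \;=\;
\Big(\sum_{j\in I}\frac{\tilde\pi_0(x_j(z))}{J(x_j(z))}\Big)^{-1}
\sum_{i\in I}\frac{\tilde\pi_0(x_i(z))}{J(x_i(z))}\,\delta_{x_i(z)}(dx),
\qquad
\pi_t \;=\;
\Big(\sum_{j\in I}\frac{\tilde\pi_0(x_j(Z_0))}{J(x_j(Z_0))}\Big)^{-1}
\sum_{i\in I}\frac{\tilde\pi_0(x_i(Z_0))}{J(x_i(Z_0))}\,\delta_{x_i(Z_t)} .
```

That is, the filter is a discrete measure on $M_{Z_t}$ whose weights are fixed at time $0$ by the initial density $\tilde\pi_0$ and the Jacobian factor $J$.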
where $\mathbb{Z}$ denotes the set of integers and the continuous functions $x_i : \mathbb{R}^3 \to \mathbb{R}$ are defined as This proves Condition (R1). Condition (3.3) is also satisfied, as The other conditions are easy to verify. Now we consider the other case.
Case 2: If $q\big(cc^T+\bar c\bar c^T\big)q^T$ and $qc$ are not functions of $a_\sigma$, then $\tilde Z_0$ as defined in (3.2) offers new information; that is, $\mathcal{Y}_{0+}$ is larger than the $\sigma$-field generated by $Z_0$. Hence the support of the distribution of $X_0$ given $\mathcal{Y}_{0+}$ may be smaller than that given $\sigma(Z_0)$. To handle the new information, we need to impose additional constraints. Let $\sigma_0 = \sigma$, $q_0 = q$, and let $\sigma_1$ be the matrix-valued function given by (3.5); let $\bar m_1$ be the dimension of the image of the mapping $a_{\sigma_1}$ and $q_1 = \nabla a_{\sigma_1}$. We replace Conditions (S), (R1) and (3.3) with the following.

Condition ($\tilde S$): The partial derivatives of $\sigma_1$ up to order 2 are continuous.

Condition ($\tilde R$1): There exists a countable set $\tilde I$ such that for any $z$,
where $x_i : \mathbb{R}^{\bar m_1} \to \mathbb{R}^d$, $i \in \tilde I$, are continuous functions such that $x_i(z) \neq x_j(z)$ for any $i, j \in \tilde I$, $i \neq j$.
Condition (IN$_k$): $q_kc$ and $q_k\big(cc^T + \bar c\bar c^T\big)q_k^T$ are functions of $a_{\sigma_k}$.
Then, the following analogue of Theorem 3.1 holds true.
where $X^x_t = x_i(Z_t)$ with $i \in \tilde I$ such that $x = x_i(Z_0)$, and . We now give an example for which the conditions of Theorem 3.3 are satisfied.
In this case $\bar m_1 = 3$, is a function of $a_{\sigma_1}$, so (IN$_1$) holds, and the level sets $M_z$ are described by where $\mathbb{Z}$ denotes the set of integers and the continuous functions $x_i : \mathbb{R}^3 \to \mathbb{R}$ are defined as

This proves Condition ($\tilde R$1). The other conditions are easy to verify.
The analysis can continue in this manner: if (IN$_1$) is not satisfied by $q_1$, we can define $\sigma_2$ in a similar manner, with $\sigma$, $q$ in (3.5) replaced by $\sigma_1$ and $q_1$ respectively; the above procedure is then continued until Condition (IN$_k$) is satisfied.

The case d > m
In this section, we consider the case when $d > m$. We will show that $M_z$ is no longer a discrete set but rather a surface (manifold), and that the optimal filter $\pi_t$ is a probability measure on the manifold $M_z$ which is absolutely continuous with respect to the surface measure. For this, we follow closely the analysis in [7].
Note that, if $d > m$, then $x \in \mathbb{R}^d$ is a regular point if and only if $J(x) = \sqrt{\det\big(qq^T\big)(x)} > 0$. We shall use $T_xM_z$ to denote the space consisting of all the tangent vectors to the manifold (surface) $M_z$ at the point $x \in M_z$; $T_xM_z$ is called the tangent space of $M_z$ at $x$. We will use $N_xM_z$ to denote the orthogonal complement of $T_xM_z$ in $\mathbb{R}^d$; we call $N_xM_z$ the normal space of $M_z$ at $x$. We list in the following lemma some well-known facts about the transformation $a_\sigma$ without giving their proofs: ii) For any $x \in M_z$, the rows of the matrix $q$ generate the normal space $N_xM_z$ to the manifold $M_z$ at the point $x$.
Define $p(x) = q(x)^T\big(q(x)q(x)^T\big)^{-1}q(x)$. Then $p(x)$ is the orthogonal projection matrix from $\mathbb{R}^d$ onto the subspace $N_xM_z$.
For simplicity of the presentation, we make an assumption which is slightly stronger than (3.3).

Condition (IN):
There exist two Borel measurable functions $H_1$ and $H_2$ from $a(S_m^+)$ to $\mathbb{R}^{\bar m\times m}$ and $\mathbb{R}^{\bar m\times d}$ respectively such that $qc = H_1(a_\sigma)$ and $q\bar c = H_2(a_\sigma)$. To demonstrate this condition and the lemma above, we give an example.
It is clear that the row vector of q generates the normal space N x M z = {λ(1, 0) : λ ∈ R}.
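A quick numerical check of this (an illustrative sketch: we take $\sigma(x) = e^{x_1}$ on $\mathbb{R}^2$ as in Example 4.2, so that $q(x) = (e^{x_1}, 0)$, and assume $p = q^T(qq^T)^{-1}q$ for the orthogonal projection onto the row space of $q$):

```python
import numpy as np

# For sigma(x) = exp(x_1) on R^2 the Jacobian of a_sigma is
# q(x) = (exp(x_1), 0), so p = q^T (q q^T)^{-1} q should be the
# orthogonal projection onto the normal direction (1, 0).
x1 = 0.3
q = np.array([[np.exp(x1), 0.0]])        # 1 x 2 Jacobian row
p = q.T @ np.linalg.inv(q @ q.T) @ q     # 2 x 2 projection matrix
print(np.round(p, 6).tolist())           # [[1.0, 0.0], [0.0, 0.0]]
```

Note that $p$ is idempotent and symmetric, as an orthogonal projection must be, and $I - p$ then projects onto the tangent direction $(0, 1)$.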
We also assume: Condition (C): Both $c$ and $\bar c$ are continuously differentiable.
Throughout the rest of this section, the assumptions (R, S, BC, X0, IN, ND, C) will be in force. The following theorem gives us the conditional distribution of $X_0$ given $Z_0$. Since, in this case, $\mathcal{Y}_{0+}$ coincides with the $\sigma$-field generated by $Z_0$, it gives, in effect, $\pi_{0+}$ in the case $d > m$.
We introduce $\lambda_u$, the surface measure on the level set $M_{a^{-1}(u)}$ for $u \in \mathbb{R}^{\bar m}$, and $\mu_z$, the conditional probability distribution of $X_0$ given $Z_0 = z$, i.e., $\mu_z(\cdot) = P\big(X_0 \in \cdot \mid Z_0 = z\big)$. The following theorem shows that $\mu_z$ is absolutely continuous with respect to $\lambda_u$.
Theorem 4.3. Suppose that the density $\tilde\pi_0$ is not identically zero on $M_{a^{-1}(u)}$ and satisfies the following integrability condition:
Proof: For any test function $\varphi$ defined on $\mathbb{R}^d$, and any Borel set $D$ in $\mathbb{R}^{m\times m}$, define Then, by the co-area formula (cf. Evans and Gariepy [2], Chapter 3), we have By taking $\varphi(x) \equiv 1$, we get The result follows from the definition of conditional expectation.
We now decompose the vector fields in the SDE satisfied by the signal according to their components in the spaces $T_xM_z$ and $N_xM_z$. It is more convenient to use the Stratonovich form for the signal process. That is, the signal $X$ satisfies the following SDE in Stratonovich form: where Recall from (3.1) that the $\mathcal{Y}_t$-measurable process $a(Z_t)$ satisfies where $V_t$ is the $\bar m$-dimensional continuous martingale and where $H_1 = qc$ and $H_2 = q\bar c$.
Finally, we arrive at the main decomposition result.
Let $\{\xi_{t,s} : 0 \le s \le t\}$ be the stochastic flow associated with the SDE: Lemma 4.6. The flow $\xi_{t,s}$ maps $M_{Z_s}$ to $M_{Z_t}$. Further, the process $\xi_t$ is $\mathcal{Z}_t$-adapted.
Proof: Applying the Stratonovich form of Itô's formula, we get Thus, if $\sigma(\xi_s) = Z_s$, then $\sigma(\xi_t) = Z_t$. The second conclusion follows from the uniqueness of the solution to the SDE (4.9).
Denote the column vectors of $(I - p)c$ and It is well known that the Jacobian matrix $\xi'_{t,s}$ of the stochastic flow $\xi_{t,s}$ is invertible (cf. Ikeda and Watanabe [5]). The operator $(\xi^{-1}_{t,s})^*$ defined below pulls a vector at $\xi_{t,s}(x)$ back to a vector at $x$. Definition 4.7. Let $g$ be a vector field on $\mathbb{R}^d$. The random vector field $(\xi^{-1}_{t,s})^*g$ is defined as for any $a_\sigma$-regular point $x \in \mathbb{R}^d$.
We consider the following SDE on $\mathbb{R}^d$: Proof: Note that $(\xi'_{t,0})^{-1}$ satisfies a diffusion SDE with bounded coefficients (cf. Ikeda and Watanabe [5]). By Gronwall's inequality, it is easy to show that Therefore, we can prove that there is a constant $K$ such that Similarly, the above inequality holds with $b_0$ replaced by $g_i$, $1 \le i \le m$, or $\bar g_j$, $1 \le j \le d$. By standard arguments, we see that (4.11) has a unique strong solution.
The next theorem gives the decomposition of the signal. Theorem 4.9. For almost all $\omega \in \Omega$, we have Proof: Denote the right-hand side of (4.12) by $\tilde X_t(\omega)$. Applying Itô's formula, we get By the uniqueness of the solution to (4.10), we see that the representation (4.12) holds.
The optimal filter then satisfies Note that $\xi_{t,0}$ is $\mathcal{Z}_t$-measurable. Thus, we may regard $\xi_{t,0}$ as known, and the singular filtering problem can be transformed into a classical one as follows. For $U_t = P\big(\kappa_t \in \cdot \mid \mathcal{Y}_t\big)$, $U_t$ is the optimal filter with the signal process $\kappa_t$ being given by (4.11). Define the process $\tilde Z_t$ and the function $\tilde h$ by (4.16). Then $\tilde Z_t$ is observable and Note that (4.11) can be written in Itô form, for suitably defined coefficients $\tilde b$, $\tilde\sigma_1$ and $\tilde\sigma_2$, while the unnormalized filter $\tilde U_t$ satisfies the Zakai equation (4.17). Let $\xi_{t,s}$ be the stochastic flow given by (4.9). Let $\tilde U_t$ be the unique $\mathcal{F}(M_{Z_0})$-valued solution to the Zakai equation (4.19) and $U_t = (\tilde U_t 1)^{-1}\tilde U_t$. Then the optimal filter $\pi_t$ is given by Finally, we indicate that an analogue of the Kalman filter for the singular filtering setup is part of a work in progress by Liu [10]. We indicate the model and the anticipated result. For simplicity, we take $m = 1$.
Let the signal be given by the linear equation: Equations for the conditional means and variances, as well as for the weight process in the linear combination, will be derived.

The filtering model with Ornstein-Uhlenbeck noise
In this section, we consider the filtering problem with an OU process as the observation noise. As we indicated in the introduction, the OU process is an approximation of white noise, which exists only in the sense of generalized functions. We transform this filtering problem with OU noise into a singular filtering problem of the form studied in the previous sections. To be more general, we assume that the OU process is given by then the pair signal/observation process is written as Hence the filtering model (5.1) is of the form (2.1).
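To see where the signal-dependent noise comes from, here is a sketch of the reduction under the simplifying normalization $dO_t = -b_1O_t\,dt + b_1\,dW_t$ (our assumption for illustration, matching (1.3) with $\beta = b_1$; general OU parameters only change the constants). Differentiating the observation $y_t = h(X_t) + O_t$ with Itô's formula gives

```latex
dy_t \;=\; (\cdots)\,dt \;+\; \nabla h\, c(X_t)\,dB_t \;+\; b_1\,dW_t ,
```

so the martingale part of the new observation has the signal-dependent covariance $\big(b_1^2 I + \nabla h c(\nabla h c)^T\big)(X_t)$, which is why the function $\tilde\sigma = \big(b_1^2I + \nabla hc(\nabla hc)^T\big)^{1/2}$ appears in the transformed system.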
Next, we give an example to demonstrate that, for the filtering problem with OU observation noise, both discrete and continuous singularity can occur. This example is a special case of those considered by Liu [10], so we omit the details here.
Example 5.1. Let the signals $X^i_t$, $i = 1, 2$, be governed by SDEs where $\sigma_1$ and $\sigma_2$ are two constants, and $B^1_t$ and $B^2_t$ are two independent one-dimensional Brownian motions. Suppose that the observation model is where $O_t$ is a real-valued OU process with $a_1 = b_1 = 1$.

Example 3.4. Let $d = m = 1$. The coefficients of the system are $b$

Example 4.2. Let $d = 2$ and $m = 1$. Define the coefficients by $b(x) = x$, $c(x) = 0$, $\bar c(x) = I$ and $\sigma(x) = e^{x_1}$. Then $J(x) = e^{x_1} > 0$ for all $x \in \mathbb{R}^2$ and hence Condition (R) is satisfied. Further,

Then the pair process $(V, \tilde V)$ is an $(m+d)$-dimensional Brownian motion and
$$dB_t = \big(b_1^2 I + \nabla hc(\nabla hc)^T\big)^{-1}(\nabla hc)^T\big(b_1^2 I + \nabla hc(\nabla hc)^T\big)^{-1/2}(X_t)\,dV_t - \big(b_1^2 I + (\nabla hc)^T\nabla hc\big)^{-1/2}(X_t)\,d\tilde V_t.$$
Hence, if we define the following functions:
$$\tilde\sigma = \big(b_1^2 I + \nabla hc(\nabla hc)^T\big)^{1/2}, \quad c_1 = c\big(b_1^2 I + (\nabla hc)^T\nabla hc\big)^{-1}(\nabla hc)^T\big(b_1^2 I + \nabla hc(\nabla hc)^T\big)^{1/2}, \quad \bar c_1 = -c\big(b_1^2 I + (\nabla hc)^T\nabla hc\big)^{-1/2},$$
$$a(r)_{2m-1} = r_{2m},\ a(r)_{2m} = r_{33},\ \dots,\ a(r)_{\bar m} = r_{mm}.$$

Suppose $\sigma \in \mathbb{R}^{m\times m}$ is symmetric. Then $\sigma \in S_m^+$ if and only if $\det(\sigma_k) > 0$, $k = 1, 2, \dots, m$, where $\sigma_k$ is the $k \times k$ submatrix obtained from $\sigma$ by removing the last $m-k$ rows and $m-k$ columns. Note that $\det(\sigma_k)$ is a polynomial in the entries of $\sigma$. Thus the image $a(S_m^+)$ consists of the points of $\mathbb{R}^{\bar m}$ at which these polynomials of its coordinates are positive. This implies that $a(S_m^+)$

Proof. (4.11) $La_\sigma(\xi_{t,0}(\kappa_t))\,dt + H_1(a(Z_t))\,dW_t + H_2(a(Z_t))\,dB_t$. (4.14) Note that the filtering problem with signal (4.11) and observations (4.13) and (4.14) is classical. We sketch below the derivation of the Zakai equation for the unnormalized version $\tilde U_t$ of the process $U_t$, and leave the details to the reader. Once this is done, we note that $H_2(a(Z_t))^T = 0$ a.s. Define independent Brownian motions $\tilde B_t$ and $\bar B_t$ of dimensions $m$ and $d-m$ respectively by (4.11), with the observation $(\hat Y_t, a(Z_t))$ being given by $d\hat Y_t = Z_t^{-1}h(\xi_{t,0}(\kappa_t))\,dt + dW_t$ (4.13) and $da(Z_t) =$

$dX_t = bX_t\,dt + dB_t$, and the observation is $dY_t = h^TX_t\,dt + |Q^TX_t|\,dW_t$, where $(B, W)$ is the $(d+1)$-dimensional Brownian motion, $b$ is a $d \times d$ matrix and $h, Q \in \mathbb{R}^d$. It is anticipated that $\pi_t$ will be a linear combination of two normal measures on the planes $\{x \in \mathbb{R}^d : Q^Tx = \pm Z_t\}$, where $Z_t = |Q^TX_t|$ is observable.