On Parameter Estimation of the Hidden Gaussian Process in a Perturbed SDE

We present results on parametric and non-parametric estimation for a linear partially observed Gaussian system of stochastic differential equations. We propose new one-step estimators which have the same asymptotic properties as the MLE but are much simpler to compute; these estimators are so-called "estimator-processes". The construction of the estimators is based on the equations of Kalman-Bucy filtering, and the asymptotics correspond to small noise in the observation and state (hidden process) equations. We give conditions which provide the consistency, asymptotic normality and asymptotic efficiency of the estimators.


Introduction
Let us consider the problem of parameter estimation for a partially observed linear system. The observed process is

    dX_t = f(ϑ, t) Y_t dt + ε σ(t) dW_t,   X_0 = x_0,   0 ≤ t ≤ T,   (1.1)

where the hidden (state) process satisfies

    dY_t = a(ϑ, t) Y_t dt + ε b(t) dV_t,   Y_0 = y_0.   (1.2)

Here W_t and V_t are independent Wiener processes and ϑ ∈ Θ is the unknown one-dimensional parameter.

In this paper we discuss the asymptotic properties of the estimators in the "small noise" case, that is, for ε → 0. In [15] the well-known maximum likelihood estimator (MLE) ϑ̂_ε and Bayes estimator (BE) ϑ̃_ε were introduced. Denote by ϑ_0 ∈ Θ the true value of the unknown parameter. The construction of these estimators is based on the likelihood ratio function

    L(ϑ, X^T) = exp{ ∫_0^T f(ϑ, t) m(ϑ, t) / (ε² σ(t)²) dX_t − ∫_0^T f(ϑ, t)² m(ϑ, t)² / (2 ε² σ(t)²) dt }.

Here m(ϑ, t) = E_ϑ(Y_t | X_s, 0 ≤ s ≤ t) is the conditional expectation, which satisfies the Kalman-Bucy filtering equations

    dm(ϑ, t) = a(ϑ, t) m(ϑ, t) dt + f(ϑ, t) γ(ϑ, t) / (ε² σ(t)²) [dX_t − f(ϑ, t) m(ϑ, t) dt],   (1.3)
    ∂γ(ϑ, t)/∂t = 2 a(ϑ, t) γ(ϑ, t) − f(ϑ, t)² γ(ϑ, t)² / (ε² σ(t)²) + ε² b(t)²,   (1.4)

with initial values m(ϑ, 0) = y_0 and γ(ϑ, 0) = 0. The Kalman-Bucy filter enables us to estimate online, i.e., to track, the unobservable signal. The equations are linear, and the Gaussianity of the observed perturbed SDE yields the asymptotic properties of the estimators. The MLE and BE are given by the relations

    L(ϑ̂_ε, X^T) = sup_{ϑ∈Θ} L(ϑ, X^T),   ϑ̃_ε = ∫_Θ ϑ p(ϑ) L(ϑ, X^T) dϑ / ∫_Θ p(ϑ) L(ϑ, X^T) dϑ,   (1.5)

where p(ϑ) is some known prior density. Denoting by I(ϑ_0) the Fisher information (details in Section 3), it is shown in [15] that under regularity conditions both estimators are consistent, asymptotically normal,

    ε^{-1}(ϑ̂_ε − ϑ_0) ⟹ N(0, I(ϑ_0)^{-1}),   ε^{-1}(ϑ̃_ε − ϑ_0) ⟹ N(0, I(ϑ_0)^{-1}),   (1.6)

and asymptotically efficient. Here and in what follows "⟹" denotes convergence in distribution. The computation of the MLE and BE according to the relations (1.5) is hard, because the solutions of the system (1.3)-(1.4) are required for all ϑ ∈ Θ; therefore the numerical realization of these constructions is difficult.
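For intuition, here is a minimal numerical sketch of the filtering equations (1.3)-(1.4) by Euler discretization, assuming the scalar model (1.1)-(1.2) as reconstructed above, with the coefficients evaluated at a fixed ϑ; the function names and the discretization scheme are illustrative, not taken from the paper.

```python
import numpy as np

def kalman_bucy(X, dt, eps, f, a, sigma, b, y0):
    """Euler scheme for (1.3)-(1.4): m[i] approximates the conditional mean
    m(theta, t_i), g[i] the filtering error gamma(theta, t_i); f, a, sigma, b
    are scalar functions of time (theta is already fixed inside them)."""
    n = len(X) - 1
    m = np.empty(n + 1); g = np.empty(n + 1)
    m[0], g[0] = y0, 0.0
    for i in range(n):
        t = i * dt
        gain = f(t) * g[i] / (eps * sigma(t)) ** 2       # f*gamma/(eps^2*sigma^2)
        innov = (X[i + 1] - X[i]) - f(t) * m[i] * dt     # dX - f*m dt
        m[i + 1] = m[i] + a(t) * m[i] * dt + gain * innov
        g[i + 1] = g[i] + (2 * a(t) * g[i]
                           - (f(t) * g[i]) ** 2 / (eps * sigma(t)) ** 2
                           + (eps * b(t)) ** 2) * dt
    return m, g
```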
In the present work our goal is to propose other estimators which are much easier to calculate and have the same asymptotic properties (1.6). Moreover, we construct estimator-processes, i.e., estimators which evolve in time. The one-step estimator in general is well known: such a one-step procedure was first proposed by Fisher [6] and was then used by many authors, see, e.g., [8], [14], [12], [21], [18].
The construction is carried out in two steps. In Section 2 we construct a preliminary consistent estimator θ̄_{τ_ε} from the observations X^{τ_ε} = (X_t, 0 ≤ t ≤ τ_ε), where τ_ε = ε^δ with δ ∈ (0, 1). Note that τ_ε → 0 as ε → 0. This means that the preliminary estimator depends only on a small initial part of the observations, whose length converges to zero, but more slowly than ε. In Sections 3 and 4 we propose the One-step MLE and then the estimator-process, using the preliminary estimator and the score function

    Δ_ε(ϑ, X^T) = ∫_{τ_ε}^T Ṁ(ϑ, s) / σ(s)² [dX_s − f(ϑ, s) m(ϑ, s) ds],

where the dot denotes differentiation w.r.t. ϑ, m(ϑ, t) is the conditional expectation of Y_t given {X_s, 0 ≤ s ≤ t}, and M(ϑ, s) = f(ϑ, s) y(ϑ, s), with y(ϑ, s) defined in Section 3. In Section 5 we propose an efficient estimator of the conditional expectation m(ϑ_0, t); this quantity enters the definition of the estimator, but its direct substitution can be troublesome. In Section 6 two examples are given, together with a numerical realization.

Preliminary estimator
If the function h(·, ·) is continuously differentiable w.r.t. ϑ or t, we denote the derivative w.r.t. ϑ by ḣ(ϑ, t) and the derivative w.r.t. t by h′(ϑ, t), respectively.
Let us introduce the Conditions R:
Recall that R_4 is a necessary condition for the existence of a consistent estimator in the model of observations (1.1)-(1.2). Indeed, if y_0 = 0, then we can put X̂_t = ε^{-1} X_t, Ŷ_t = ε^{-1} Y_t and rewrite the system (1.1)-(1.2) as follows:

    dX̂_t = f(ϑ, t) Ŷ_t dt + σ(t) dW_t,   dŶ_t = a(ϑ, t) Ŷ_t dt + b(t) dV_t,   Ŷ_0 = 0.

Hence this system does not depend on ε and there is no consistent estimation (see Khasminskii [13]). The key condition here is R_3, which provides the identifiability of the unknown parameter. In Proposition 2.1 we consider how to construct the preliminary estimator when this condition is replaced by another one. Without loss of generality we suppose that the condition (2.1) holds. Let us denote by x_t(ϑ) and y_t(ϑ) the solutions of the equations (1.1), (1.2) for ε = 0:

    y_t(ϑ) = y_0 exp{ ∫_0^t a(ϑ, s) ds },   x_t(ϑ) = x_0 + ∫_0^t f(ϑ, s) y_s(ϑ) ds.

Hence we can write

    X_t = x_t(ϑ_0) + ε ξ_t,   Y_t = y_t(ϑ_0) + ε η_t,

where ξ_t and η_t are Gaussian processes with E_{ϑ_0} ξ_t = 0, E_{ϑ_0} η_t = 0 and, for any t ∈ [0, T],

    E_{ϑ_0} ξ_t² ≤ C_1,   E_{ϑ_0} η_t² ≤ C_2.

The constants C_1 > 0, C_2 > 0 do not depend on ϑ_0 and t ∈ [0, T] (see [23]).
In the vicinity of the point t = 0 the function x_t(ϑ) is monotonically increasing in ϑ according to (2.1). Further, following [14] and [20], we put τ_ε = ε^δ, δ > 0, and introduce three sets according to the position of X_{τ_ε} with respect to the range of x_{τ_ε}(·) over Θ. On the central set we define the estimator

    θ̄_{τ_ε} = μ_ε,   (2.3)

where μ_ε is the solution of the equation

    x_{τ_ε}(μ_ε) = X_{τ_ε},

and on the two remaining sets θ̄_{τ_ε} is set to the corresponding endpoint of Θ. This is what we call the preliminary estimator, which will be used in the next section for the construction of an asymptotically efficient estimator. The definition of this preliminary estimator depends on the solution of the equation x_{τ_ε}(μ_ε) = X_{τ_ε}, but only at the fixed time τ_ε, which is small; this does not bring difficulties in the numerical realization, as we show in Section 6 (see also the sketch below).

Theorem 2.1. Suppose that the conditions R are fulfilled and δ ∈ (0, 2). Then the estimator θ̄_{τ_ε} is uniformly consistent on compacts K ⊂ Θ, i.e., for any ν > 0 and any compact K,

    lim_{ε→0} sup_{ϑ_0∈K} P_{ϑ_0}( |θ̄_{τ_ε} − ϑ_0| > ν ) = 0.

Moreover, for any p > 0, the corresponding moments converge.

Proof. By the conditions R_2 and R_3, there exists κ* > 0 such that for sufficiently small t the derivative ẋ_t(ϑ) admits a lower estimate with the constant κ*. With the help of (2.2) we estimate the probability of the first set; a similar estimate can be obtained for the probability P_{ϑ_0}(B_ε^+). Further, since X_{τ_ε} = x_{τ_ε}(ϑ_0) + ε ξ_{τ_ε} and x_{τ_ε}(μ_ε) = X_{τ_ε}, the mean value theorem gives

    ε ξ_{τ_ε} = x_{τ_ε}(μ_ε) − x_{τ_ε}(ϑ_0) = ẋ_{τ_ε}(μ̃_ε) (μ_ε − ϑ_0),

with μ̃_ε a certain value between μ_ε and ϑ_0. For the moments, the same representation yields the convergence for δ ∈ (0, 2).
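As a concrete illustration, here is a sketch of the numerical computation of the preliminary estimator, assuming the constant-coefficient limit x_t(ϑ) = x_0 + f y_0 (e^{aϑt} − 1)/(aϑ) used in the simulations of Section 6; the function names and the clipping to Θ = [α, β] reflect our reading of (2.3) and are not the authors' code.

```python
import math
from scipy.optimize import brentq

def x_limit(theta, t, x0=0.0, f=1.0, a=1.0, y0=1.0):
    """Limit (eps = 0) observation mean for constant coefficients:
    x_t(theta) = x0 + f*y0*(exp(a*theta*t) - 1)/(a*theta)."""
    return x0 + f * y0 * (math.exp(a * theta * t) - 1.0) / (a * theta)

def preliminary(X_tau, tau, alpha, beta):
    """Solve x_tau(mu) = X_tau for mu in Theta = (alpha, beta); clip to an
    endpoint when X_tau is outside the (monotone) range of x_tau(.)."""
    g = lambda th: x_limit(th, tau) - X_tau
    if g(alpha) >= 0.0:
        return alpha
    if g(beta) <= 0.0:
        return beta
    return brentq(g, alpha, beta)   # monotonicity gives a unique root
```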
Consider now the case where the condition R_3 is not fulfilled. We introduce new conditions as follows.

Proposition 2.1. Suppose these conditions hold. Then the estimator θ̄_{τ_ε} is uniformly consistent on compacts K ⊂ Θ, and the corresponding moment convergence holds.

Proof. The proof of Theorem 2.1 can be applied here; the only difference is in the estimate of ẋ_t(ϑ). Following the same steps we obtain the corresponding bound.

One-step MLE
Let us return to the equations (1.3)-(1.4). It can be shown that the random process m(ϑ, t), 0 ≤ t ≤ T, is continuously differentiable w.r.t. ϑ with probability 1 (see, e.g., [16]), and the derivative ṁ(ϑ, t) satisfies the corresponding linear equation. Therefore, for ϑ = ϑ_0 and ε = 0 we obtain the deterministic function ẏ(ϑ_0, t) ≡ ṁ(ϑ_0, t)|_{ε=0}. Here y(ϑ, t) denotes the limit (ε = 0) of the conditional expectation m(ϑ, t), computed under the true value ϑ_0. Note that y(ϑ, t) ≠ y_t(ϑ) except at the point ϑ = ϑ_0, and that ẏ(ϑ_0, t) ≠ ẏ_t(ϑ_0). We define the Fisher information as

    I(ϑ_0) = ∫_0^T Ṁ(ϑ_0, t)² / σ(t)² dt,   M(ϑ, t) = f(ϑ, t) y(ϑ, t),

which is obtained as the limit of the suitably normalized Fisher information of this statistical experiment. The family of measures of this experiment is locally asymptotically normal; therefore we have the Hajek-Le Cam lower bound on the mean square risks of all estimators ϑ̄_ε (see, e.g., [16]):

    lim_{ν→0} liminf_{ε→0} sup_{|ϑ−ϑ_0|<ν} ε^{-2} E_ϑ (ϑ̄_ε − ϑ)² ≥ I(ϑ_0)^{-1}.

We call the estimator ϑ*_ε asymptotically efficient if for all ϑ_0 ∈ Θ this bound is attained:

    lim_{ν→0} lim_{ε→0} sup_{|ϑ−ϑ_0|<ν} ε^{-2} E_ϑ (ϑ*_ε − ϑ)² = I(ϑ_0)^{-1}.

Let us introduce the estimator

    ϑ*_ε = θ̄_{τ_ε} + I(θ̄_{τ_ε})^{-1} Δ_ε(θ̄_{τ_ε}, X^T),   (3.4)

with the score function Δ_ε introduced in Section 1.

Theorem 3.1. Suppose that the conditions R_1, R_2, R_3, R_4 are fulfilled and δ ∈ (0, 1). Then the One-step MLE ϑ*_ε has the following properties:

1. It is uniformly consistent on compacts K ⊂ Θ.
2. It is uniformly asymptotically normal: ε^{-1}(ϑ*_ε − ϑ_0) ⟹ N(0, I(ϑ_0)^{-1}).
3. The moments converge: for any p > 0,

    ε^{-p} E_{ϑ_0} |ϑ*_ε − ϑ_0|^p → E |ζ|^p,   ζ ~ N(0, I(ϑ_0)^{-1}).
Proof. Here we denote by R̃_ε and R̂_ε remainder random variables satisfying suitable moment estimates. With their help we obtain the representation of ε^{-1}(ϑ*_ε − ϑ_0) as a Gaussian leading term plus remainders, and thus the uniform convergence of moments on compacts.

This uniform convergence gives us the asymptotic efficiency.

Consider now the case where the condition R_2 is replaced by the condition R*_2. The estimator θ̄_{τ_ε} is defined by the same equation (2.3), the Fisher information is defined accordingly, and we define the estimator just as in (3.4), putting ḟ(θ̄_{τ_ε}, t) = 0.

Proposition 3.1. Suppose that the conditions R_1, R*_2, R_3, R_4 are fulfilled and δ ∈ (0, 1/3). Then the One-step MLE ϑ*_ε has the following properties:

It is uniformly consistent, uniformly asymptotically normal, and asymptotically efficient.
Proof. The proof of this proposition is similar to that of Theorem 3.1; the only difference is in the corresponding estimate.
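A minimal sketch of the one-step correction, assuming the reconstruction of (3.4) given above and approximating m(ϑ, s) by its limit y(ϑ, s), so that f(ϑ, s) m(ϑ, s) ≈ M(ϑ, s); the function names and the Riemann-Itô discretization are illustrative.

```python
import numpy as np

def one_step_mle(X, dt, tau_idx, theta_bar, M, M_dot, sigma):
    """theta* = theta_bar + I(theta_bar)^{-1} * score, where the score is
    int_{tau}^{T} Mdot/sigma^2 [dX - M dt] and I = int Mdot^2/sigma^2 dt.
    M, M_dot, sigma are vectorized callables; M(theta,t) = f(theta,t)*y(theta,t)."""
    s = np.arange(tau_idx, len(X) - 1) * dt              # left endpoints on [tau, T)
    w = M_dot(theta_bar, s) / sigma(s) ** 2
    info = np.sum(M_dot(theta_bar, s) ** 2 / sigma(s) ** 2) * dt
    score = np.sum(w * (np.diff(X)[tau_idx:] - M(theta_bar, s) * dt))
    return theta_bar + score / info
```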

One-step MLE-process
Let us now consider a slightly different problem. We have the model of observations (1.1)-(1.2) with unknown parameter ϑ ∈ Θ, and we are interested in the construction of an adaptive Kalman-Bucy filter approximating m(ϑ_0, t) = E_{ϑ_0}(Y_t | X_s, 0 ≤ s ≤ t). We cannot use m(ϑ*_ε, t), because the estimator ϑ*_ε depends on the whole sample X^T = (X_s, 0 ≤ s ≤ T), in which (X_s, t < s ≤ T) are future observations. Therefore we need an estimator-process {ϑ*_{t,ε}, 0 < t ≤ T}, where the estimator ϑ*_{t,ε} has the following properties:
• it is adaptive, i.e., depends only on (X_s, 0 ≤ s ≤ t);
• it can be easily calculated;
• it is asymptotically efficient.
We prefer not to use the MLE ϑ̂_{t,ε} obtained, for each t, as the solution of the maximization problem analogous to (1.5), since this would require solving the system (1.3)-(1.4) for all ϑ ∈ Θ; the numerical realization of such a procedure would be too complicated.
We propose the One-step MLE ϑ*_{t,ε}, τ ≤ t ≤ T, called the One-step MLE-process, defined as follows:

    ϑ*_{t,ε} = θ̄_τ + I_t(θ̄_τ)^{-1} ∫_τ^t Ṁ(θ̄_τ, s) / σ(s)² [dX_s − f(θ̄_τ, s) m(θ̄_τ, s) ds],   I_t(ϑ) = ∫_τ^t Ṁ(ϑ, s)² / σ(s)² ds.

Here τ is fixed, which means that we construct the preliminary estimator from the observations {X_s, 0 ≤ s ≤ τ}.
Proposition 4.1. Let the conditions R_1, R*_2, R_3, R_4 be fulfilled and δ ∈ (0, 1). Then the One-step MLE-process ϑ*_{t,ε}, τ < t ≤ T, is uniformly consistent and asymptotically normal, and the random process η_ε(t) = ε^{-1}(ϑ*_{t,ε} − ϑ_0), τ ≤ t ≤ T, for any τ ∈ (0, T) converges in distribution to a Gaussian random process η(t), which can be written as a stochastic integral with respect to a Wiener process W(·).

Proof. For any t ∈ (τ, T], the consistency and asymptotic normality follow from Theorem 3.1. Consider the vector (η_ε(t_1), ..., η_ε(t_k)). The representation obtained in the proof of Theorem 3.1 allows us to verify the estimate (4.2), where the constant C > 0 does not depend on ε; the calculations are direct but cumbersome. The convergence of the finite-dimensional distributions and the estimate (4.2) ensure that the measures induced in the space of continuous functions C[τ, T] by the processes η_ε(·) converge weakly to the measure of the process η(·) (see the details of the proof in a similar situation in [18], [25]). This weak convergence yields the uniform consistency of the One-step MLE-process: for any τ ∈ (0, T) and any ν > 0 we have

    P_{ϑ_0}( sup_{τ≤t≤T} |ϑ*_{t,ε} − ϑ_0| > ν ) → 0.

We note that the initial time τ can be fixed or can depend on ε, say τ_ε = ε^δ → 0; in the latter case the length of the preliminary observation interval converges to zero as ε → 0, but more slowly than ε. We define the modified One-step MLE-process ϑ**_{t,ε}, τ_ε ≤ t ≤ T, in the same way, with τ replaced by τ_ε. Then the modified One-step MLE-process ϑ**_{t,ε}, τ_ε < t ≤ T, is consistent, asymptotically normal, and asymptotically efficient.
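Below is a sketch of the estimator-process computed cumulatively in t, under the same assumptions and with the same illustrative names as one_step_mle above; each value uses only the observations up to the current time, which is the adaptivity required of the process.

```python
import numpy as np

def one_step_mle_process(X, dt, tau_idx, theta_bar, M, M_dot, sigma):
    """Cumulative one-step correction: est[i] approximates theta*_{t_i, eps}
    for t_i > tau and depends only on X up to t_i."""
    s = np.arange(tau_idx, len(X) - 1) * dt
    w = M_dot(theta_bar, s) / sigma(s) ** 2
    incr = w * (np.diff(X)[tau_idx:] - M(theta_bar, s) * dt)       # score increments
    info = np.cumsum(M_dot(theta_bar, s) ** 2 / sigma(s) ** 2) * dt
    est = theta_bar + np.cumsum(incr) / np.maximum(info, 1e-12)    # guard near t = tau
    return s + dt, est
```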

On efficient estimation of m(ϑ_0, t)
Recall that m(ϑ_0, t) is the mean-square optimal estimator of Y_t. The random process m*_ε(t) = m(ϑ*_{t,ε}, t), τ_ε ≤ t ≤ T, can be considered as an estimator of the random function m(ϑ_0, t), τ_ε ≤ t ≤ T. It is interesting to study the asymptotically efficient estimators in this problem. For any estimator m̄_ε(t) of the random function m(ϑ, t) we have the following lower bound (5.1) on the mean square error.
Proof. The proof of this inequality follows the main steps of the proof of the van Trees inequality in [7] and [22]; let us recall them. For a given density function p(ϑ), ϑ_0 − ν < ϑ < ϑ_0 + ν, such that p(ϑ_0 ± ν) = 0, we introduce the Fisher information of the prior,

    I_p = ∫_{ϑ_0−ν}^{ϑ_0+ν} ṗ(ϑ)² / p(ϑ) dϑ,

and then we can write the corresponding Bayesian bound.

Note that E denotes the double mathematical expectation (over the parameter and the observations) defined by the last equality. After changing the measure according to E_{ϑ_0} L(ϑ, ϑ_0, X^t)(·) = E_ϑ(·), we apply the Cauchy-Schwarz inequality and the standard properties of the stochastic integral. Therefore we obtain the van Trees inequality (5.1).
The asymptotically efficient estimator of the conditional expectation m(ϑ_0, t) is defined as an estimator m̄_ε(t) which attains the lower bound (5.1) asymptotically. To construct an asymptotically efficient estimator we have to modify slightly the estimator m*_ε(t). The solutions of the equations (1.3) and (4.4) can be written in integral form. We cannot simply put ϑ*_{t,ε} in place of ϑ_0 in m(ϑ_0, t), because the stochastic integral ∫_{τ_ε}^t F(ϑ*_{t,ε}, s) dX_s is not well defined for F(ϑ, s) = Q(ϑ, s) N(ϑ, s)^{-1}: the estimator ϑ*_{t,ε} depends on the whole observation path up to time t, which makes the integrand anticipative. We therefore use the so-called robust version of the integral, given by the integration-by-parts formula

    ∫_{τ_ε}^t F(ϑ, s) dX_s = F(ϑ, t) X_t − F(ϑ, τ_ε) X_{τ_ε} − ∫_{τ_ε}^t F′(ϑ, s) X_s ds,   (5.7)

where F′ denotes the derivative w.r.t. s. Let us denote the right-hand side of (5.7) by G(ϑ, X^t, t) and introduce the estimator m**_ε(t) = G(ϑ*_{t,ε}, X^t, t), which we compare with m*_ε(t). Note that we understand F(ϑ*_{t,ε}, s) as written in (5.8); the equality (5.7) is no longer valid when ϑ is replaced by ϑ*_{t,ε}, which means that the estimator in (5.9) is not exactly m(ϑ*_{t,ε}, t). It is interesting to see whether the proposed estimator m**_ε(t) is optimal in some sense.

Therefore, for any p > 0, we obtain the corresponding moment estimate, and we can write the formal expansion of the error of m**_ε(t).
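A sketch of the robust computation in (5.7); F and its time derivative F_prime are illustrative callables. The point of the integration-by-parts form is that no Itô integral appears, so a fitted estimator may be substituted for ϑ without the measurability problems described above.

```python
import numpy as np

def robust_integral(F, F_prime, theta, X, dt, i0, i1):
    """int over [t_{i0}, t_{i1}] of F(theta, s) dX_s computed via (5.7):
    F(t)X_t - F(tau)X_tau - int F'(theta, s) X_s ds (ordinary Riemann sum)."""
    s = np.arange(i0, i1) * dt
    riemann = np.sum(F_prime(theta, s) * X[i0:i1]) * dt
    return F(theta, i1 * dt) * X[i1] - F(theta, i0 * dt) * X[i0] - riemann
```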

Two examples
We now show the construction of the preliminary estimators, which are simple and can easily be realized numerically.

Example 1.
In this example we have the following system. For simplicity of exposition we suppose that the functions f_t, a_t, σ_t, b_t, t ∈ [0, T], are positive and bounded, and that f_t is continuously differentiable in t. The observations are X^T = (X_t, 0 ≤ t ≤ T) and the process {Y_t, 0 ≤ t ≤ T} is hidden. The unknown parameter is ϑ ∈ Θ = (α, β), α > 0.
The limit (ε = 0) system is obtained by putting ε = 0 in these equations. We put τ_ε = ε^δ and define the preliminary estimator by the relation (6.1).

Example 2. In the second example the partially observed system is such that we have Y_t = y_t(ϑ_0) + ε ξ_t, and the limit equation is

    y_t(ϑ_0) = y_0 + ϑ_0 ∫_0^t a_s y_s(ϑ_0) ds.
Therefore we can write the estimator explicitly, and the error of estimation admits the corresponding bound. Hence, if δ ∈ (0, 2/3), then this estimator is consistent; the optimal choice, which minimizes the right-hand side of this inequality, is δ = 2/5. In the simulation we first take constant coefficient functions: f_t = f, a_t = a, σ_t = σ, b_t = b for constants f, a, σ and b. Thus we have

    y_t(ϑ) = y_0 e^{aϑt},   x_t(ϑ) = x_0 + (f y_0 / (aϑ)) (e^{aϑt} − 1).
With fixed constants and ε, and the true value ϑ_0 = 1, we simulate the processes X_t and Y_t and then calculate the preliminary estimator θ̄_{τ_ε} according to (6.1), with τ_ε = ε^{1/6}; from it we obtain the estimator-process. In the first plot, Figure 1, we show the consistency of the preliminary estimator θ̄_{τ_ε} and of the One-step estimator ϑ*_{T,ε} w.r.t. ε: the estimators converge to the true value as ε goes to zero. In the second plot, Figure 2, we show the consistency of the One-step MLE-process ϑ*_{t,ε} w.r.t. t, for fixed ε = 0.2. The process begins at the preliminary estimator and converges to the true value after some fluctuation; the fluctuation just after the preliminary estimator is caused by the small value of the initial Fisher information. In the third plot, Figure 3, we present the normal approximation of ε^{-1}(ϑ*_{t,ε} − ϑ_0) for fixed ε.
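For completeness, here is a self-contained sketch re-creating this experiment under our assumptions: constant coefficients, the reconstructed one-step formula, and y(ϑ, t) approximated by the limit solution y_t(ϑ) = y_0 e^{aϑt}. The constants, the seed and the parameter interval (α, β) are arbitrary choices, not the authors' settings.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
f = a = sigma = b = 1.0; x0, y0 = 0.0, 1.0
theta0, eps, T, n = 1.0, 0.2, 1.0, 20_000
dt = T / n

# Euler-Maruyama simulation of (1.1)-(1.2) with a(theta, t) = theta * a
X = np.empty(n + 1); Y = np.empty(n + 1); X[0], Y[0] = x0, y0
for i in range(n):
    Y[i+1] = Y[i] + theta0*a*Y[i]*dt + eps*b*np.sqrt(dt)*rng.standard_normal()
    X[i+1] = X[i] + f*Y[i]*dt + eps*sigma*np.sqrt(dt)*rng.standard_normal()

# preliminary estimator (6.1): solve x_tau(mu) = X_tau on Theta = (alpha, beta)
tau = eps ** (1 / 6); k = int(tau / dt); alpha, beta = 0.1, 5.0
g = lambda th: x0 + f*y0*(np.exp(a*th*tau) - 1)/(a*th) - X[k]
theta_bar = alpha if g(alpha) >= 0 else beta if g(beta) <= 0 else brentq(g, alpha, beta)

# one-step correction on [tau, T] with M(theta, t) ~ f*y0*exp(a*theta*t)
s = np.arange(k, n) * dt
M = f * y0 * np.exp(a * theta_bar * s)
Mdot = a * s * M                       # theta-derivative of f*y0*exp(a*theta*t)
info = np.sum(Mdot**2) / sigma**2 * dt
score = np.sum(Mdot / sigma**2 * (np.diff(X)[k:] - M * dt))
theta_star = theta_bar + score / info
print(f"preliminary {theta_bar:.4f}, one-step {theta_star:.4f}")
```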