Online data processing: comparison of Bayesian regularized particle filters

The aim of this paper is to compare three regularized particle filters in an online data processing context. We carry out the comparison in terms of hidden states filtering and parameters estimation, considering a Bayesian paradigm and a univariate stochastic volatility model. We discuss the use of an improper prior distribution in the initialization of the filtering procedure and show that the regularized Auxiliary Particle Filter (APF) outperforms the regularized Sequential Importance Sampling (SIS) and the regularized Sampling Importance Resampling (SIR).


Introduction
The analysis of phenomena, which evolve over time is a common problem to many fields like engineering, physics, biology, statistics, economics and finance.A time varying system can be represented through a dynamic model, which is constituted by an observable component and an unobservable internal state.The hidden states (or latent variables) represent the informations we want to extrapolate from the observations.
In time series analysis, many approaches have been used for the estimation of dynamics models.The seminal works of Kalman (1960) and Kalman and Bucy (1960) introduce filtering techniques (the Kalman-Bucy filter) for continuous valued, linear and Gaussian dynamic systems.Maybeck (1982) motivates the use of stochastic dynamic systems in engineering and examines the estimation problems for state space models, in both a continuous and a discrete time framework.In economics, Harvey (1989) studies the state space representation of dynamic structural models and uses Kalman filter for hidden states filtering.Hamilton (1989) analyzes nonlinear time series models and introduces a filter (Hamilton-Kitagawa filter) for discrete time and discrete valued dynamic systems with a finite number of states.
In this paper, the online data processing problem is considered.In these situations, as pointed out by Liu and Chen (1998), Markov Chain Monte Carlo (MCMC) samplers are too much time demanding.To overcome this difficulty, some sequential Monte Carlo techniques have been recently developed.Doucet et al. (2001) provide the state of the art on these methods.They discuss both applications and theoretical convergence of the algorithms.
This work is structured as follow.Section 2 introduces the general representation of a Bayesian dynamic model and presents the stochastic volatility model.Section 3 reviews some regularized particle filters.Finally, Section 4 presents the results.

Bayesian Dynamic Models
We introduce the general formulation of a Bayesian dynamic model and show some fundamental relations for Bayesian inference on it.Our definition of dynamic model is general enough to include the models analyzed in Kalman (1960), Hamilton (1994), Carter and Kohn (1994), Harrison and West (1989) and in Doucet et al. (2001).Throughout this work, we use a notation similar to that one commonly used in particle filters literature (see Doucet et al. (2001)).
We denote by {x t ; t ∈ N}, x t ∈ X ⊆ R nx , the hidden states of the system, by {y t ; t ∈ N 0 }, y t ∈ Y ⊆ R ny , the observable variables and by {θ t ; t ∈ N}, θ t ∈ Θ ⊆ R n θ , the parameters of the model.We denote by x 0:t = (x 0 , . . ., x t ) the collection of hidden states up to time t and with x −t = (x 0 , . . ., x t−1 , x t+1 , . . ., x T ) the collection of all hidden states without the t-th element.We use the same notations for the observable variables and parameters.
The Bayesian state space representation of a dynamic model is given by: y t ∼ p(y t |x t , θ t , y 1:t−1 ) measurement density , (x t , θ t ) ∼ p(x t , θ t |x 0:t−1 , θ 0:t−1 , y 1:t−1 ) transition density , prior density , for t = 1, . . ., T .We suppose that p(x t , θ t |x 0:t−1 , θ 0:t−1 , y 1:t−1 ) = p(x t , θ t |x t−1 , θ t−1 , y 1:t−1 ).We also assume that the parameters are constant over time: the transition density of the parameters is then δ θ t−1 (θ t ) with initial value θ 0 = θ, δ x (y) denotes the Dirac's mass centered in x.In that case, the joint transition of hidden states and parameters is: Let us denote by z t = (x t , θ t ) the parameter-augmented state vector and by Z the corresponding augmented state space.For such models, we are interested in the prediction, filtering and smoothing densities which are given by: Due to the high number of integrals that must be solved, previous densities may be difficult to evaluate with general dynamics.Some Monte Carlo simulation methods, such as particle filters, allow us to overcome these difficulties.
As an example, let us consider the stochastic volatility model.Two of the main features of the financial time series are time varying volatility and clustering phenomena in volatility.Stochastic volatility models widely used in finance have been introduced, in order to account for these features.Let y t be the observable variable with time varying volatility and x t the stochastic log-volatility process.An example of stochastic volatility model is: where θ = (α, log((1 + φ)/(1 − φ)), log(σ 2 )).The choice of π(θ) will be discussed in Section 4. Fig. 1 shows two simulated paths of y t and x t .
In the next section, we deal with the problem of parameters and states joint estimation in a kernel-regularized sequential Monte Carlo framework.

Regularized particular filters
For making inference on the Bayesian dynamic model given in Section 2 in an online data processing context, MCMC algorithms are too much time demanding.Sequential importance sampling and more advanced sequential Monte Carlo algorithms called Particle Filters (Doucet et al., 2001) represent a promising alternative.The main advantage in using particle filters is that they can deal with nonlinear models and non-Gaussian innovations.In contrast to Hidden Markov Model filters, which work on a state space discretized to a fixed grid, particle filters focus sequentially on the higher density regions of the state space.This feature is common to one of the early sequential methods, the Adaptive Importance Sampling algorithm due to West (1992West ( , 1993)).
Different particle filters exist in the literature and different simulation approaches like rejection sampling, MCMC and importance sampling, can be used for the construction of a particle filter.In this work, we present some kernel-regularized particle filters, which combine the importance sampling reasoning with a suitable modification of the importance weights.The regularization approach we use is the same than the one of Liu and West (2001) and Musso et al. (2001).This approach relies upon a kernel-based reconstruction of the empirical filtering densities which produces a systematic modification of the true importance weights.Upper plot: daily dataset (α = 0, φ = 0.99 and σ 2 = 0.01).Bottom plot: weekly dataset (α = 0, φ = 0.9 and σ 2 = 0.1).

Regularized SIS
Let us start from the non-regularized SIS.We assume that at iteration t > 0 a properly weighted particle set {x i t , θ i t , γ i t } N i=1 , approximating the filtering density p(x t , θ t |y 1:t ), is available.The empirical distribution corresponding to this approximation is: The particles set, {x i t , θ i t , γ i t } N i=1 , can be viewed as a random discretisation of the state space X × Θ with associated probability weights {γ i t } N i=1 .Thanks to this discretisation, it is possible to approximate the prediction and filtering densities given in (1) and (2): The goal is now to obtain N particles {x i t+1 , θ i t+1 , γ i t+1 } N i=1 from the filtering density in (2).It is proposed to sample (x i t+1 , θ i t+1 ) according to the importance density q(•|x i t , θ i t , y 1:t+1 ).The importance weight of particle (x i t+1 , θ i t+1 ) is then calculated using the recursive formula: The choice of an optimal importance density q(•|x i t , θ i t , y 1:t+1 ), that is, a density which minimizes the variance of the importance weights is discussed in Pitt and Shephard (1999) and Crisan and Doucet (2000).In many cases, it is not possible to use this optimal importance density as the weight updating associated to the this density does not admit a closed-form expression.In that case, the transition density of the parameter-augmented state vector represents a natural alternative for the importance density.Indeed, the transition density represents a sort of prior at time t for the parameter-augmented state vector (x i t+1 , θ i t+1 ).In our case, due to the presence of the Dirac point mass at the numerator of the weights it is impossible to modify over the filtering iterations the particle values for the parameters.In practice due to the loss of particle diversity in the parameter space, the weights will tend to zeros and of course stay zero for ever, so we are facing a problem of degeneracy of the empirical filtering distribution.This scenario motivates particle filtering methods known as regularized particle filters.In order to avoid the degeneracy problem and to force the exploration of the parameter space toward regions which are not covered by the prior distribution, Liu and West (2001) and Musso et al. (2001) propose to use a regularized version of the filtering density.This approach results in the modification of the weights in (4) and the definition of a new set of weights: ) is a regularization kernel, K being a positive function defined on R n θ and h a positive smoothing factor (bandwidth).
The modification of the importance weights defined in (4) results from two steps.The first one is the regularization of the empirical density in (3) by a kernel estimator: The second one is the application of an importance sampling argument to the approximated filtering density: The convergences results associated with this type of approximation are recalled in Musso et al. (2001) and Oudjane (2000).Under some usual conditions on the kernel, when the number of particles increases to infinity, the regularized empirical density converges to the right one for various criteria.For instance, we have p R N −→ L 2 p. Thanks to this approximation, the regularization kernel becomes the natural choice for the parameters proposal distribution.Thus, we sample (x i t+1 , θ i t+1 ) according to: In that case, we have: .
In Algorithm 1, we give a pseudo-code representation of this method.

Regularized SIR
As it is well known in the literature (see for example Arulampalam et al. (2001)), basic SIS algorithms have a degeneracy problem.After some iterations the empirical distribution degenerates into a Dirac's mass on a single particle.This due to the fact that the variance of the importance weights is non-decreasing over time (see Doucet et al. (2000)).In order to solve this degeneracy problem, Gordon et al. (1993) introduce the SIR algorithm.This algorithm belongs to a wider class of bootstrap filters.At each iteration, a resampling step is used to generate a new set of particles.After this resampling step, the weights of the resampled particles are uniformly distributed over the particle set.
In the initial SIR, the resampling step is done at each iteration of the algorithm.This systematic resampling can introduce extra Monte Carlo variations, see Liu and Chen (1998).This can be reduced be doing resampling only when the Effective Sample Size (ESS) is small.The ESS measures the overall efficiency of an importance sampling algorithm.The ESS is a function of the coefficient of variation of the importance weights.At iteration t, the empirical EES is In Algorithm 2, we give a pseudo-code representation of this method.
Algorithm 2 -Regularized SIR Particle Filter -• At time t = 0, for i = 1, . . ., N , simulate z i 0 ∼ p(z 0 ) and set 3. Update the weights: The value of κ < N is calibrated depending on the problem.

Regularized APF
Due to the resampling step, the basic SIR algorithm produces a progressive impoverishment (loss of diversity) of the information contained in the particle set.To overcome this difficulty, many solutions have been proposed in the literature.We refer to the APF due to Pitt and Shephard (1999) and to the regularized APF algorithm due to Liu and West (2001).In order to avoid the resampling step, the APFs use the particle index (auxiliary variable) to select most representative particles in the proposal of the new particles.The regularized joint distribution of parameter-augmented state vector and the particle index is: A sample approximating that distribution can be obtained by using the proposal: where q(j i |y 1:t+1 ) ∝ p(y t+1 |µ j i t+1 , m j i t+1 , y 1:t )w j i t , µ j i t+1 and m j i t+1 are evaluated using the initial particle set.Therefore, the importance weight of particle (x i t+1 , θ i t+1 , j i ) is: .
In Algorithm 3 we give a pseudo-code representation of the regularized APF.
We can say that, in the APF, the selection step is done before simulating the hidden states.This selection depends on the current value of the observable.Therefore, • the APF is a standard way to construct a proposal distribution for the hidden states that depends on the current of the particle; • as we will see after, to use this selection step and the transition distribution as proposal distribution for the hidden states, results in a good proposal distribution.
In the next section, we compare the performances of some regularized SIS, SIR and APF for the stochastic volatility model.

Application to the stochastic volatility model and comparison
In this section, we apply the three regularized particle filters to the stochastic volatility model presented in Section 2. We assume that the initial value of the SV process follows the stationary distribution: x 0 ∼ N (0, σ 2 /(1 − φ 2 )) .
For the parameters β, σ and φ, we assume the prior where β = e α .We constrain the parameter φ to take values in the open interval (−1, 1) in order to impose the usual stationarity condition.As we use an improper prior, it is not possible to use the prior distribution for initializing the three particle filters.We need to start with a proper weighted sample {x i 0 , θ i 0 , ω i 0 } N i=1 .We propose to: 1) start the sequential filtering procedure at least at the value n of t, such that the posterior distribution of the parameters given all the observations up to time n is well defined: for the considered SV model, this corresponds to set n ≥ 2; 2) use a Markov Chain Monte Carlo (MCMC) algorithm to create a sample with uniform weights.
For the prior given above, the full conditional distributions are The full conditional distributions of φ and x t are not conventional and the standard Gibbs does not apply.We propose to use the Metropolis-Hastings within Gibbs algorithm studied in Celeux et al. (2006).A detailed description of the proposal distributions for φ and x t can be found in Celeux et al. (2006).In that paper, the authors compare this MCMC scheme to an iterated importance sampling one.Note that one could alternatively use this iterated importance sampling algorithm to create a first weighted sample.
Given the initial weighted random sample ), if we use the transition density as proposal distribution for the hidden states, the regularized SIS performs the following steps: For n ≤ t ≤ T − 1 and for i = 1, . . ., N : where V t and θt are the empirical covariance matrix and the empirical mean respectively, a ∈ [0, 1] and b 2 = (1 − a 2 ), (iii) Update the weights as follow In the following, we call the previous scheme SIS.Instead of the transition density, we can use a proposal distribution which depends on the current value of the observable.For instance, we can resort to the proposal distribution used for the hidden states, in the MCMC initialization step.This proposal has been introduced by Shephard and Pitt (1997).It is based on a quadratic Taylor expansion of exp(x t ), more details can be found in Shephard and Pitt (1997) or Celeux et al. (2006).It defines a new SIS which is called SIS-p in the following.
The results of a typical run of the regularized SIS on the synthetic dataset in Fig. 1, with N = 10, 000 particles and n = 100 for the Gibbs initialization, are given in Fig. from 2 to 6.We can see (last row in Fig. 2) that after a few iterations the filtered log-volatility does not fit well to the true log-volatility.We measure sequentially the filtering performance of the regularized SIS by evaluating the cumulated Root Mean Square Error (RMSE).It measures the distance between the true and the filtered states and is defined as: , where ẑt is the filtered state, which includes also the parameter sequential estimate.The RMSEs cumulate rapidly over time in both daily and weekly datasets (see upper and bottom plots in Fig. 3).The poor performance of the regularized SIS is due to the fact that the empirical posterior of the states and parameters degenerates into a Dirac's mass after a few iterations.The ESSs in Fig. 4 show that the regularized SIS degenerates after 30 iterations in both the daily and weekly datasets.We give some results on SIS-p in the following.
If we use the transition density as proposal distribution for the hidden states, the regularized SIR performs the following step: For n ≤ t ≤ T − 1 and for i = 1, . . ., N : θt , b 2 V t where V t and θt are the empirical covariance matrix and the empirical mean respectively and a ∈ [0, 1] and b 2 = (1 − a 2 ), (iii) Update the weights ) and set w i t+1 = 1/N .
If κ = N , the resampling step is done all the time.In that case, we call SIR the previous scheme.After some numerical experiments, we have found that a good value for κ is κ = 0.9 × N .In that case, the resampling step is done at regular time intervals and we called SIR-r the resulting algorithm.Moreover, as for the SIS, we can resort to the proposal of Shephard and Pitt (1997) for the hidden states.In that case, with κ = N , the corresponding algorithm called SIR-p.With κ = 0.9 × N the corresponding algorithm is called SIR-r-p.
Note that, following Pitt and Shephard (1999), one could alternatively use in the selection step a value of µ k t+1 based on the Taylor expansion of the likelihood at time t + 1.For the three regularized particle filters, we have used a Gaussian kernel where the parameter a is fixed following the usual optimal criterion.
We apply the regularized SIR-r-p and APF with N = 10, 000 and n = 100 to the weekly and daily datasets of Fig. 1 and obtain the results given in Fig. from 2 to 6.The regularized SIR-r-p and APF outperform the regularized SIS in terms of ESSs and cumulated RMSEs.The ESSs can detect the degeneracy in the particle weights, but is not useful to determine the presence of another form of degeneracy, that is the absence of diversity in the particle values.The histogram of the empirical filtering distribution allows us to detect this second form of degeneracy.
As our work deals with the sequential estimation of the parameters, we choose to show the histogram of the parameters posterior.Fig. 5 and 6 exhibit the evolution over the filters iterations of the posterior of the parameters α, φ and σ 2 .In both the daily and the weekly datasets, after a few iterations the empirical posterior of the regularized SIR-r-p degenerates into a Dirac's mass.
To confirm the previous results, we have done ten independent runs of the seven algorithms: SIS, SIS-p, SIR, SIR-p, SIR-r, SIR-r-p and APF.The weekly and daily datasets vary across the 10 experiments.Our simulation study confirms the results of the single-run experiment.Fig. 7 and 8 show a comparison between the regularized schemes in terms of RMSEs.The RMSEs are estimated over the 10 independent runs of the algorithms.The regularized SIR-r-p and APF outperform the others algorithms in both the daily and the weekly datasets.The estimated Mean Square Errors for the parameters α, φ and σ 2 (see Table 1 and 2), based on 10 independent runs of the filters, show that the regularized APF outperforms all the others schemes in term of parameters estimation.
Fig. 9 and 10 show a comparison of the filters in terms of ESS.As one could expect, in all the independent runs the regularized SIS and SIS-p weights degenerate after a few iterations.We also observe that the ESSs of the regularized SIR, SIR-r and APF are substantially equivalent.On the other hand, the average ESSs of the SIR-p and SIR-r-p are lower than the ones of the SIR, SIR-r and APF.

Conclusion
In this work we bring into action the kernel regularization technique for particle filters and deal with the online parameter estimation problem.While the regularized APF has been already used for the parameter estimation, the regularized versions of SIS and SIR have not been considered to that aim.We focus on the joint estimation of the states and parameters and compare some algorithms on a Bayesian nonlinear model: the Bayesian stochastic volatility model.As we expected, we find evidence of the degeneracy of two different regularized SIS.Finally, we find that, in terms of parameters estimation, the regularized APF outperforms all the others schemes.

Figure 3 :Figure 4 :
Figure 3: Daily (upper plot) and weekly (bottom plot) Root Mean Square Errors for the regularized APF (solid line), SIR-r-p (dashed line) and SIS (dotted line) over iterations.

Figure 5 :
Figure 5: Evolution on daily dataset of the empirical posterior distributions of α, φ and σ 2 .APF for α APF for φ APF for σ 2

Figure 6 :
Figure 6: Evolution on weekly dataset of the empirical posterior distributions of α, φ and σ 2 .