Sequential Monte Carlo Smoothing with Parameter Estimation

We propose two new Bayesian smoothing methods for general state-space models with unknown parameters. The first approach is based on the particle learning and smoothing algorithm, but with an adjustment in the backward resampling weights. The second is a new method combining sequential parameter learning and smoothing algorithms for general state-space models. This method is straightforward yet effective, and in our comparisons it is the best-performing sequential Monte Carlo algorithm for the joint Bayesian smoothing problem. We first illustrate the methods on three benchmark models using simulated data, and then apply them to a stochastic volatility model for daily S&P 500 index returns during the financial crisis.


Introduction
State-space models are an extremely general class of models for time series. These models allow for nonlinear and non-Gaussian features that occur in many fields, including finance, ecology, biology and engineering. Over the past few decades, sequential Monte Carlo (SMC) methods have become extremely popular for online state and parameter estimation in state-space models. However, these methods have been largely ignored for Bayesian smoothing, which involves retrospective estimation of states and parameters. A paper by Carvalho, Johannes, Lopes, and Polson (2010) has brought renewed attention to this topic. Here, following on that paper, we propose two new SMC smoothing algorithms.
SMC methods, also known as particle filters, were developed for online state estimation in state-space models. The idea was introduced by Gordon, Salmond, and Smith (1993) with the name Bootstrap Filter. Pitt and Shephard (1999) then proposed an improved approach called the Auxiliary Particle Filter. However, the problem of estimating unknown parameters with SMC methods has generally received less attention. Kitagawa (1998) proposed augmenting the state vector to include the unknown parameters, and applying a particle filter on the augmented state. To avoid over-dispersion problems in the estimation of states and parameters, Liu and West (2001) used kernel density estimation for the static parameters. This algorithm remains the most general method for sequential Bayesian state and parameter estimation. Storvik (2002) and Fearnhead (2002) discussed sequential Bayesian inference for parameters in situations where sufficient statistics for the parameters are available. Lopes and Tsay (2011) and Kantas, Doucet, Singh, Maciejowski, and Chopin (2015) provide excellent reviews of existing parameter learning approaches.
There is a large literature on SMC smoothing when parameters are known. Smoothing involves estimation of states conditional on observations up to the final time period. Kitagawa (1996) introduced the idea of particle smoothing by storing the state vector. This is a forward-only algorithm where the smoothed state trajectories are generated recursively by reweighting and resampling the filtered particles within a smoothing (time) window. But as time evolves and the size of the smoothing window increases, the smoothed samples at the start of the time series will degenerate to a single path. Godsill, Doucet, and West (2004) proposed the forward-backward smoother, in which a backward recursion is included and the forward filter particles are reweighted. Other smoothing algorithms include the two-filter smoother of Kitagawa (1996), the generalized two-filter smoother of Briers, Doucet, and Maskell (2010), and the O(N^2) and O(N) smoothing algorithms of Fearnhead, Wyncoll, and Tawn (2010). These methods are collectively known as particle smoothing.
The literature on particle smoothing with unknown parameters is quite limited. To our knowledge, there are only two existing SMC approaches for smoothing with parameter estimation. The first is the self-organizing state-space model of Kitagawa (1998), which uses a fixed-lag smoother with state augmentation. The second is the particle learning and smoothing (PLS) algorithm of Carvalho, Johannes, Lopes, and Polson (2010). PLS is a two-stage algorithm with a particle filtering and learning step, followed by a separate backward reweighting step for each sample of the parameters. However, the PLS smoothing algorithm does not account for the dependence between states and parameters. In this paper, we propose an adjusted PLS algorithm that takes this dependence into account in defining the backward resampling weights. In addition, we propose a new smoothing algorithm, in which we apply a forward-backward smoother to each parameter drawn from the last filter step and obtain a smoothed sample of the states, while accounting for parameter uncertainty.
Markov chain Monte Carlo (MCMC) methods have also been used for Bayesian smoothing. Carlin, Polson, and Stoffer (1992) introduced the first MCMC smoothing algorithms for non-normal and nonlinear state-space models. Carter and Kohn (1994) and Frühwirth-Schnatter (1994) proposed the forward-filtering, backward-sampling (FFBS) algorithm and de Jong and Shephard (1995) developed the related simulation smoother for conditionally Gaussian models. The FFBS is a highly efficient block sampler that draws the states jointly given the parameters for linear, Gaussian state-space models. Shephard and Pitt (1997) and Gamerman (1998) provided block sampling algorithms for dynamic exponential family and generalized linear models, respectively. Other MCMC smoothers include Scipione and Berliner (1992), Geweke and Tanizaki (2001), Stroud, Müller, and Polson (2003) and Niemi and West (2010). Andrieu, Doucet, and Holenstein (2010) introduced another approach, called Particle Markov chain Monte Carlo (PMCMC), for off-line state smoothing and parameter estimation. This method combines MCMC and SMC approaches. A popular PMCMC algorithm is the Particle Marginal Metropolis-Hastings (PMMH) sampler. At each Markov chain iteration, this method simulates a parameter from a proposal density, and uses it within a SMC algorithm to create a high-dimensional proposal distribution for the smoothed states. One uses the approximate likelihood from the SMC and the prior density for the parameters, to accept or reject the proposed states and parameters. The advantage of PMMH is that it integrates out the states when updating the parameters and generally reduces autocorrelation in the sampler. The disadvantage is that this method is computationally intensive, since one must run a new SMC algorithm at each MCMC iteration.
Our new approaches have a number of advantages relative to existing methods. Refiltering provides samples from the correct posterior distribution as the number of particles goes to infinity. Unlike MCMC methods, our smoothing methods are not iterative, do not require convergence of a Markov chain, and can be easily parallelized. Finally, in contrast to MCMC approaches, our methods provide accurate estimates of the marginal likelihood, which is used for Bayesian model evaluation and selection.
The paper is organized as follows. In Section 2, we give a short review of particle filtering and smoothing algorithms. Two new smoothing algorithms are proposed in Section 3. In Section 4, we compare the two new algorithms to PLS, MCMC and PMMH on three simulated examples: an autoregressive plus noise model, a nonlinear growth model, and a chaotic model. In Section 4.4, we compare the algorithms using a stochastic volatility model for daily S&P 500 returns. Finally, Section 5 concludes.

Filtering and Smoothing with SMC
Consider a general state-space model defined at discrete times t = 1, ..., T, where y_t are the observations, x_t are the unobserved states, θ are the static parameters, and p(·|·) denotes a generic conditional distribution:

y_t ∼ p(y_t | x_t, θ),    (1)
x_t ∼ p(x_t | x_{t−1}, θ),    (2)
x_0 ∼ p(x_0 | θ).    (3)

The measurement distribution (1) relates the states to the observations; the transition distribution (2) describes the evolution of the states over time; and the initial distribution (3) summarizes beliefs about the initial state x_0. The Bayesian model is completed with a prior distribution on the unknown parameters, θ ∼ p(θ). The state-space model (1)-(3) is quite general, making two main assumptions: x_t follows a first-order Markov process, and the y_t's are conditionally independent given the states.
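As a concrete illustration of the structure (1)-(3), the following sketch simulates a state-space model, instantiated here with an AR(1)-plus-noise example; the function name and parameter values are illustrative, not tied to the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ssm(T, phi=0.75, W=1.0, V=1.0, x0=0.0):
    """Simulate states via the transition distribution (2) and
    observations via the measurement distribution (1)."""
    x = np.empty(T + 1)
    y = np.empty(T)
    x[0] = x0                                              # initial state (3)
    for t in range(1, T + 1):
        x[t] = phi * x[t - 1] + rng.normal(0, np.sqrt(W))  # transition (2)
        y[t - 1] = x[t] + rng.normal(0, np.sqrt(V))        # measurement (1)
    return x, y

x, y = simulate_ssm(100)
```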
Bayesian inference in state-space models requires calculation of the joint posterior distribution of the states and parameters. This can either be done sequentially or retrospectively. Let x^s = (x_0, ..., x_s) and y^s = (y_1, ..., y_s) denote the states and observations up to time s. There are two main types of inference in state-space models. The filtering problem involves sequential calculation of p(x_t, θ | y^t), for t = 1, ..., T. The smoothing problem requires calculation of p(x^T, θ | y^T), retrospectively. Here, the objective is to solve the joint smoothing problem, which requires computing the joint posterior distribution for the states and parameters, p(x^T, θ | y^T). For most models, this distribution is unavailable in closed form, so Monte Carlo methods are required to sample from it. In particular, we introduce two novel algorithms that draw samples from the joint posterior smoothing distribution p(x^T, θ | y^T). Furthermore, we compare the performance of these algorithms to other existing smoothing methods, such as PLS, MCMC and PMMH, using simulation studies and real data examples.
Traditionally, SMC methods assume that θ is known and are designed to approximate p(x_t | y^t, θ) with a set of weighted samples or particles. The subsections below give a brief overview of sampling importance resampling (SIR) particle filters, particle filters with unknown parameters, and the particle learning and smoothing algorithm.

Particle Filtering
The particle filter was first introduced by Gordon, Salmond, and Smith (1993) to conduct state estimation in nonlinear, non-Gaussian state-space models. Based on importance sampling, the filter propagates a set of weighted particles {x_t^(i), i = 1, ..., N} forward in time. The filtering density p(x_t | y^t) can then be approximated by a discrete empirical distribution based on these particles. The algorithms to implement the SIR particle filter, along with information about other algorithms, are given in the online supplementary materials (Yang et al., 2017).
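As a sketch of the SIR mechanics described above (propagate through the transition density, weight by the likelihood, resample), here is a minimal bootstrap filter for an AR(1)-plus-noise model; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sir_filter(y, N=1000, phi=0.75, W=1.0, V=1.0):
    """Bootstrap/SIR particle filter: returns filtered means E[x_t | y^t]."""
    T = len(y)
    x = rng.normal(0, 1, N)                     # particles from the prior p(x_0)
    means = np.empty(T)
    for t in range(T):
        x = phi * x + rng.normal(0, np.sqrt(W), N)   # propagate via transition
        logw = -0.5 * (y[t] - x) ** 2 / V            # log-likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * x)                     # weighted filtered mean
        x = x[rng.choice(N, N, p=w)]                 # multinomial resampling
    return means
```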

Particle Filtering with Unknown Parameters
To deal with particle filtering with unknown parameters, Kitagawa (1998) introduced the idea of augmenting the state by the parameters as z_t = (x_t, θ), then applying a bootstrap filter to the augmented state vector z_t. Kitagawa and Sato (2001) extended the idea by adding artificial noise to the parameters in the evolution equation to avoid particle degeneracy (i.e., collapse of samples to a single value) as time progresses. Liu and West (2001) proposed an improvement to Kitagawa's method by drawing samples of the parameters at each filter step t from a kernel density of the form

θ_{t+1}^(i) ∼ N(m_t^(i), h² V_t),  where  m_t^(i) = a θ_t^(i) + (1 − a) θ̄_t.

Here θ̄_t and V_t are the sample mean and variance-covariance matrix of the posterior samples of θ_t at time t, and a = √(1 − h²) is a smoothing parameter between 0 and 1. Notice that a = 1 implies the evolution equation θ_{t+1} = θ_t, which corresponds to state augmentation with no evolution noise. For the case of a = 0, each θ_{t+1}^(i) becomes an independent draw from N(θ̄_t, V_t). In practice a typical value for a is between 0.95 and 0.99, which implies strong persistence in the evolution of θ_t.
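The Liu-West kernel step can be sketched as follows for a scalar parameter: each particle is shrunk toward the cross-particle mean and then jittered with variance h² V_t, which preserves the first two moments of the particle cloud. A minimal sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def liu_west_step(theta, a=0.98):
    """One kernel-density refresh of a scalar parameter particle cloud."""
    h2 = 1.0 - a ** 2
    mean = theta.mean()                        # theta-bar_t
    var = theta.var()                          # V_t (scalar case)
    m = a * theta + (1 - a) * mean             # kernel locations m_t^(i)
    return rng.normal(m, np.sqrt(h2 * var))    # draws from N(m_t^(i), h^2 V_t)

theta = rng.normal(5.0, 2.0, 100000)
theta_new = liu_west_step(theta)
```

Because Var(a θ + (1 − a) θ̄) + h² V_t = (a² + h²) V_t = V_t, the refreshed cloud keeps the same mean and variance while avoiding degeneracy.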
In situations where p(θ | x^t, y^t) depends on a low-dimensional set of sufficient statistics s_t = S(x^t, y^t), such that p(θ | x^t, y^t) = p(θ | s_t) and s_t can be updated recursively via s_t = f(s_{t−1}, x_t, y_t), the method of Storvik (2002) can be applied to draw samples from the filtering distribution of the parameters. We include the sufficient statistics s_t in the state vector and draw samples of θ based on the sufficient statistics at each time t in the filter. By doing this, the impoverishment problem is mitigated and the exact posterior p(θ | y^t) is gradually and better approximated through the filtering process. The approach is based on the decomposition

p(x^t, θ | y^t) ∝ p(x^{t−1} | y^{t−1}) p(θ | s_{t−1}) p(x_t | x_{t−1}, θ) p(y_t | x_t, θ).

Storvik's algorithm generates samples from this distribution at each time t = 1, ..., T, by drawing θ^(i) ∼ p(θ | s_{t−1}^(i)), propagating x_t^(i) given (x_{t−1}^(i), θ^(i)), reweighting by the likelihood, and updating the sufficient statistics s_t^(i) = f(s_{t−1}^(i), x_t^(i), y_t).

Particle Learning and Smoothing (PLS)
For particle smoothing with unknown parameters, we are interested in estimating the states and parameters conditional on the entire dataset y^T by drawing samples (x^T(i), θ^(i)) from the joint posterior p(x^T, θ | y^T), where T is the total number of time steps. Carvalho et al. (2010) showed that after running the filtering and learning algorithm, the filtered particles can be resampled to obtain draws from the smoothing distribution. The procedure is based on Bayes' rule and the decomposition of the joint posterior smoothing distribution written as

p(x^T, θ | y^T) = p(θ | y^T) p(x_T | θ, y^T) ∏_{t=1}^{T−1} p(x_t | x_{t+1}, θ, y^t),    (4)

where

p(x_t | x_{t+1}, θ, y^t) ∝ p(x_{t+1} | x_t, θ) p(x_t | θ, y^t).

The steps of this algorithm are now presented:

PLS Algorithm
1. (Forward Filter) Run the particle learning algorithm to generate samples {(x_t^(i), θ^(i)), i = 1, ..., N} for t = 1, ..., T.

2. (Backward Smoother) Select a pair (x_T^(i), θ^(i)) from Step 1, and simulate backwards: for t = T − 1, ..., 1, resample the particles {x_t^(j)} from Step 1 with weights proportional to

ω_t^(j) ∝ p(x_{t+1}^(i) | x_t^(j), θ^(i)).

Repeating Steps 1-2 for i = 1, ..., N yields draws (x^T(i), θ^(i)) from the smoothing distribution. Carvalho et al. (2010) claim that PLS is an extension of Godsill, Doucet, and West (2004) to state-space models with unknown parameters. However, note that in the backward smoothing step of PLS, a fixed θ^(i) is selected first and the resampling weights are evaluated proportional to p(x_{t+1}^(i) | x_t^(j), θ^(i)). Thus, we should use samples drawn from p(x_t | θ^(i), y^t), i.e., the filter samples with respect to this fixed θ^(i), but this is not the case for PLS.
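The PLS backward pass can be sketched as follows, assuming an AR(1) transition density and stored forward-filter particles; the array names and shapes are hypothetical, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(3)

def pls_backward(xf_t, x_next, phi, W):
    """One backward step: resample time-t filter particles with weights
    proportional to the transition density p(x_{t+1} | x_t, theta)."""
    logw = -0.5 * (x_next - phi * xf_t) ** 2 / W
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return xf_t[rng.choice(len(xf_t), p=w)]

def pls_trajectory(xf, phi, W):
    """Draw one smoothed trajectory given forward particles xf (T x N)
    and one selected parameter pair theta = (phi, W)."""
    T, N = xf.shape
    traj = np.empty(T)
    traj[-1] = xf[-1, rng.integers(N)]        # select x_T^(i)
    for t in range(T - 2, -1, -1):            # simulate backwards in time
        traj[t] = pls_backward(xf[t], traj[t + 1], phi, W)
    return traj
```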
The particles from the forward pass of PLS are from the marginal density p(x_t | y^t), not from the conditional density p(x_t | θ, y^t). Reweighting these particles using the transition density ignores the dependence between states and parameters, which causes inaccurate smoothing estimates when that dependence is strong. Figure 1 shows the dependence between the filtered samples of the states and parameters for the AR(1) plus noise model presented in Section 4. Correlations between x_t and θ greater than 0.5 can be observed at the beginning of the time series. The simulation studies presented in Section 4 show that PLS gives poor smoothing estimates, particularly in the early periods of the time series.
In the following section we introduce two new smoothing algorithms. The first algorithm relies on a transformation of (4) and an adjustment of the weights in the backward smoothing pass to refine PLS. The second algorithm involves a separate forward filtering-backward sampling pass for each sampled parameter.

PLS with Adjustment (PLSa)
The backward reweighting scheme in the PLS algorithm assumes that the filtering algorithm provides samples from the conditional distribution, p(x_t | θ, y^t), when in fact the samples come from the joint distribution, p(x_t, θ | y^t), and hence, marginally, from p(x_t | y^t). Thus, PLS does not provide samples from the target smoothing distribution. We consider the following rearrangement of (7):

ω_t^(j) ∝ p(x_{t+1}^(i) | x_t^(j), θ^(i)) · p(x_t^(j) | θ^(i), y^t) / p(x_t^(j) | y^t),

where the additional ratio p(x_t | θ, y^t)/p(x_t | y^t) corrects for the fact that the filtered particles are drawn from the marginal rather than the conditional distribution. These are the resampling weights for the filtered particles in the backward smoothing pass. Note that in general, we cannot compute these resampling weights exactly, since the joint filtering distribution p(x_t, θ | y^t) is not available in closed form. Therefore, we propose to use a multivariate normal approximation for p(x_t, θ | y^t) based on the filtered particles {(x_t^(i), θ^(i))}, in combination with appropriate transformations of the states and/or parameters if necessary.
The particle learning and smoothing algorithm with adjustment (PLSa) proceeds in the same way as PLS, but uses adjusted weights in the backward pass. Based on the simulation results in Section 4, we find that the adjustment of the weights matters: the adjusted version outperformed the original one, especially at the beginning of the time series, where PLS usually has problems recovering the true states and/or parameters.

1. (Forward Filter) Run a filtering and learning algorithm to obtain samples {(x_t^(i), θ^(i)), i = 1, ..., N} for t = 1, ..., T. Use the filtered samples to construct a multivariate normal approximation to p(x_t, θ | y^t) at each time t, where the mean vector and covariance matrix are estimated with sample moments from {(x_t^(i), θ^(i))}. This implies that the marginal and conditional distributions, p̂(x_t | y^t) and p̂(x_t | θ, y^t), are also normal, with conditional mean and covariance given by the well-known formulas for the multivariate normal distribution.

2. (Backward Smoother) Select a pair (x_T^(i), θ^(i)) from Step 1, and simulate backwards: for t = T − 1, ..., 1, resample the particles {x_t^(j)} from Step 1 with weights proportional to

ω̃_t^(j) ∝ p(x_{t+1}^(i) | x_t^(j), θ^(i)) · p̂(x_t^(j) | θ^(i), y^t) / p̂(x_t^(j) | y^t).

There are two points to note here. First, although our description above does not explicitly mention it, when Storvik's algorithm or Particle Learning is used, the sufficient statistics s_t are stored and sampled at each time in the forward filter (Step 1). However, the PLSa algorithm is actually more general, and can be used with other sequential filtering and learning algorithms that do not involve sufficient statistics, such as Liu and West (2001). Second, although we have chosen to approximate the joint posterior using a normal distribution, other density estimators could also be used, for example, a kernel density estimator as in Silverman (1986). The main point is that we need some analytical approximation to compute the additional density ratio in the weights.
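The density-ratio adjustment can be sketched as follows for a scalar state and a scalar parameter, fitting a bivariate normal to the filtered cloud and evaluating the log of the conditional-to-marginal ratio; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def log_adjustment(x_part, theta_part, theta_i):
    """log[ p-hat(x_t | theta_i, y^t) / p-hat(x_t | y^t) ] for each
    time-t filter particle, under a bivariate normal approximation."""
    mx, mth = x_part.mean(), theta_part.mean()
    vx, vth = x_part.var(), theta_part.var()
    cov = np.cov(x_part, theta_part)[0, 1]
    # conditional moments of x_t given theta = theta_i (normal formulas)
    mc = mx + cov / vth * (theta_i - mth)
    vc = vx - cov ** 2 / vth
    log_cond = -0.5 * np.log(vc) - 0.5 * (x_part - mc) ** 2 / vc
    log_marg = -0.5 * np.log(vx) - 0.5 * (x_part - mx) ** 2 / vx
    return log_cond - log_marg
```

When states and parameters are nearly independent in the filter cloud, the adjustment is close to zero and PLSa reduces to PLS, consistent with the behavior reported later in the time series.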

Refiltering Smoothing Algorithm
In addition to the PLSa modification, we propose the following new smoothing algorithm. The idea behind this algorithm is the following decomposition of the joint posterior distribution:

p(x^T, θ | y^T) = p(x^T | θ, y^T) p(θ | y^T).

Therefore, Storvik's forward filter, particle learning, or a more general filtering method such as Liu and West (2001) can be implemented to obtain samples of the parameters at the last time step, i.e., θ^(i) ∼ p(θ | y^T). Then for each sample θ^(i), a forward-backward smoothing algorithm is applied as in Godsill et al. (2004) to get one state trajectory x^T(i) from p(x^T | y^T, θ^(i)). Repeating this process for each θ^(i) produces samples of the states from the marginal smoothing density p(x^T | y^T).
Since the run time for the forward filter is negligible compared to the backward smoother (O(N) vs. O(N^2), respectively), we found in simulation studies that this algorithm runs at almost the same speed as PLS, but with differences in the accuracy of the state estimates. The algorithm is presented below:

1. (Forward Filter) Use Storvik's algorithm, Particle Learning or Liu and West to run a forward filtering and learning algorithm and generate θ^(i) ∼ p(θ | y^T), for i = 1, ..., N.

2. (Backward Smoother) For each θ^(i), run the forward-backward smoother of Godsill et al. (2004) to draw one trajectory x^T(i) ∼ p(x^T | θ^(i), y^T).
Note that this algorithm has a complexity of O(N^2), the same as PLS, but it can be made an O(N) algorithm in two ways. The first way is to choose a small number of states n_0 ≪ N for the forward-backward smoother in Step 2. The second is to use a small number of parameter draws of size N_0 in Step 2. Our simulation studies show that both ideas provide a substantial speed-up with only a minor loss of accuracy.
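A minimal sketch of the refiltering loop for the AR(1)-plus-noise model: the parameter draws are passed in directly (in practice they come from the forward learning step, which is omitted here), and n_0 plays the role of the reduced particle count just described. Names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def forward_filter(y, phi, W, V, n0):
    """Conditional SIR filter given theta; stores particles at each time."""
    T = len(y)
    parts = np.empty((T, n0))
    x = rng.normal(0, 1, n0)
    for t in range(T):
        x = phi * x + rng.normal(0, np.sqrt(W), n0)
        logw = -0.5 * (y[t] - x) ** 2 / V
        w = np.exp(logw - logw.max()); w /= w.sum()
        x = x[rng.choice(n0, n0, p=w)]
        parts[t] = x
    return parts

def backward_draw(parts, phi, W):
    """One Godsill-style backward pass through the stored particles."""
    T, n0 = parts.shape
    traj = np.empty(T)
    traj[-1] = parts[-1, rng.integers(n0)]
    for t in range(T - 2, -1, -1):
        logw = -0.5 * (traj[t + 1] - phi * parts[t]) ** 2 / W
        w = np.exp(logw - logw.max()); w /= w.sum()
        traj[t] = parts[t, rng.choice(n0, p=w)]
    return traj

def refilter(y, theta_draws, n0=200):
    """One smoothed trajectory per parameter draw (phi, W, V)."""
    return np.array([backward_draw(forward_filter(y, phi, W, V, n0), phi, W)
                     for (phi, W, V) in theta_draws])
```

Since each parameter draw is processed independently, the loop in `refilter` parallelizes trivially across processors.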
In cases where the state-space model is linear and the errors are Gaussian, i.e., y_t = F_t x_t + v_t and x_t = G_t x_{t−1} + w_t with v_t ∼ N(0, V) and w_t ∼ N(0, W), Step 2 of the refiltering algorithm can be implemented using the forward filtering, backward sampling (FFBS) algorithm (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994). A Kalman filter forward pass is first run, and then a sample is generated backwards in time. Note that p(x_t | y^{t−1}) = N(a_t, R_t) is the prior and p(x_t | y^t) = N(m_t, C_t) is the posterior of the state at each time point t, both of which depend on the parameters θ = (F_t, G_t, V, W).
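A scalar FFBS sketch matching the notation above (p(x_t|y^{t−1}) = N(a_t, R_t), p(x_t|y^t) = N(m_t, C_t)), assuming time-invariant F and G for simplicity; a sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(6)

def ffbs(y, F, G, V, W, m0=0.0, C0=1.0):
    """Scalar forward filtering, backward sampling for
    x_t = G x_{t-1} + w_t, y_t = F x_t + v_t."""
    T = len(y)
    a = np.empty(T); R = np.empty(T); m = np.empty(T); C = np.empty(T)
    mp, Cp = m0, C0
    for t in range(T):                        # Kalman forward pass
        a[t] = G * mp
        R[t] = G * G * Cp + W                 # prior variance
        Q = F * F * R[t] + V                  # predictive variance of y_t
        K = R[t] * F / Q                      # Kalman gain
        m[t] = a[t] + K * (y[t] - F * a[t])
        C[t] = R[t] - K * F * R[t]            # posterior variance
        mp, Cp = m[t], C[t]
    x = np.empty(T)                           # backward sampling pass
    x[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(T - 2, -1, -1):
        B = C[t] * G / R[t + 1]
        h = m[t] + B * (x[t + 1] - a[t + 1])  # conditional mean
        H = C[t] - B * G * C[t]               # = C[t] - B^2 R[t+1]
        x[t] = rng.normal(h, np.sqrt(H))
    return x
```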

Refiltering with FFBS
We note that similar simplifications in Step 2 can be made for other models where p(x T |θ, y T ) is available in closed form and can be simulated directly, for example, for hidden Markov models using forward-backward samplers (e.g., Scott, 2002).
The refiltering algorithm consists of a filtering and a smoothing step to generate from θ ∼ p_N(θ|y) and x ∼ p_N(x|θ, y), respectively. The combined algorithm provides samples from p_N(θ, x|y) = p_N(θ|y) p_N(x|θ, y). As N → ∞, the cumulative distribution function (CDF) of p_N(θ|y) converges to the CDF associated with p(θ|y). In addition, the CDF of p_N(x|θ, y) converges to the CDF of p(x|θ, y), implying that the joint CDF of p_N(θ, x|y) converges to the joint CDF of p(θ, x|y). The refiltering algorithm can be used with any filtering and smoothing algorithm, and its asymptotic properties depend on the choice of algorithms. In this paper, we consider only filtering algorithms based on sufficient statistics (Storvik, 2002; Carvalho et al., 2010). These methods provide asymptotic draws from the marginal posterior p(θ|y). For the smoothing step, we use either the FFBS algorithm or the forward-backward smoother of Godsill et al. (2004). FFBS provides exact draws from p(x|θ, y) and the Monte Carlo smoother provides asymptotic draws from it (see proof in Godsill et al., 2004). Thus, we view the refiltering algorithm as asymptotically exact for the examples considered in the paper.

AR(1) Plus Noise Model
We first consider the AR(1) plus noise model: x_t = φ x_{t−1} + w_t and y_t = x_t + v_t, with w_t ∼ N(0, W) and v_t ∼ N(0, V). This is a common benchmark model for testing state-space algorithms (Storvik, 2002; Polson, Stroud, and Müller, 2008). For this model, FFBS can be implemented, and a long MCMC chain with 150,000 iterations is used as a standard for comparison with the other smoothing algorithms. The simulation study under this model is based on T = 100 observations with parameter values V = 1, W = 1, φ = 0.75 and x_0 = 0.
The model includes three unknown parameters θ = (φ, W, V). We assume conjugate priors, with φ|W ∼ N(b_0, B_0 W), W ∼ IG(ν_0, δ_0) and V ∼ IG(n_0, d_0), where IG(a, b) denotes the inverse-gamma distribution with shape and rate parameters a and b, respectively. We choose hyperparameter values n_0 = 2, ν_0 = 2, d_0 = 2, δ_0 = 2, b_0 = 0.5 and B_0 = 1, implying fairly diffuse prior distributions. The conjugate priors for the parameters allow us to apply Storvik's algorithm based on the sufficient statistics s_t = (b_t, B_t, n_t, d_t, ν_t, δ_t) and their recursive updates. Figure 2 shows the parameter learning plots and the posterior distributions at the last time period T = 100, corresponding to Storvik's filtering algorithm with 50,000 particles and MCMC with 150,000 iterations, respectively. From the figure, we see that the true parameter values are learned properly and the samples of the parameters at the last time step are well concentrated around the true values. The samples from Storvik's filter also agree well with those from a long MCMC run. State smoothing by refiltering and the results of a long MCMC are also presented in Figure 2. We notice that the mean, 2.5th and 97.5th quantiles of the smoothing samples almost coincide for the two methods at each time step t.
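One plausible form of the recursions for the sufficient statistics s_t = (b_t, B_t, n_t, d_t, ν_t, δ_t), reconstructed here from the conjugate normal-inverse-gamma structure rather than copied from the paper; treat the exact formulas as an assumption.

```python
# Hedged reconstruction: (b_t, B_t) index the conditional posterior of phi,
# (nu_t, delta_t) that of W from the state residuals, and (n_t, d_t) that
# of V from the observation residuals. Not the paper's exact recursions.

def update_stats(s, x_prev, x_t, y_t):
    b, B, n, d, nu, delta = s
    B_new = 1.0 / (1.0 / B + x_prev ** 2)          # regression scale for phi
    b_new = B_new * (b / B + x_prev * x_t)         # regression mean for phi
    nu_new = nu + 1                                # W: one more state residual
    delta_new = delta + (x_t - b * x_prev) ** 2 / (1 + B * x_prev ** 2)
    n_new = n + 1                                  # V: one more obs residual
    d_new = d + (y_t - x_t) ** 2
    return (b_new, B_new, n_new, d_new, nu_new, delta_new)
```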
To show that PLSa outperforms PLS at the same computation time, we ran 500 simulations for each of these two methods and compared the standardized absolute errors over time, i.e., ê*_t = |x̂_t − x̂_t^true| / σ(x_t | y^T) for t = 1, ..., T, where x̂_t^true and σ(x_t | y^T) are the smoothed mean and standard deviation for x_t computed from the long MCMC run with 150,000 iterations, and x̂_t is the smoothed mean from the other algorithms. The results are shown in Figure 3. From this plot, we can see that the main difference between the two smoothing algorithms appears at the beginning of the series, where the dependence between states and parameters is strong, so the adjustment matters. As time progresses, the dependence between states and parameters decreases, and the results from PLSa coincide with those from PLS. In addition, the results from the refiltering smoothing algorithm are also shown in this figure. For this AR(1) model, refiltering substantially outperforms the other two methods, and its accuracy in terms of absolute standardized errors is fairly constant over time. Note that the number of particles for the three smoothing algorithms was adjusted to ensure similar computation times for all methods.
To compare the performance of all of the smoothing algorithms, we implemented long and short MCMC runs, pseudo-marginal Metropolis-Hastings (PMMH), PLS, PLSa, refiltering, O(N) refiltering, and refiltering with FFBS on 500 simulated datasets. To allow PMMH to provide the best possible results, the last draw and the standard deviations of the parameter samples from a very long MCMC run are used as the initial parameter values and to choose the covariance of the random walk proposal, respectively. For each iteration, an SMC algorithm is run with 500 particles. All SMC-based smoothing algorithms are run in parallel on 16 cores of a single node. Based on similar run times, the mean standardized absolute errors over time, MAE* = (1/T) Σ_{t=1}^T ê*_t, are listed in Table 1.

Non-stationary Growth Model
We next consider the non-stationary growth model:

x_t = α x_{t−1} + β x_{t−1} / (1 + x_{t−1}²) + γ cos(1.2 t) + w_t,
y_t = x_t² / 20 + v_t,

where w_t ∼ N(0, W) and v_t ∼ N(0, V). This benchmark nonlinear time series model has been used by Carlin et al. (1992) to test MCMC smoothing, by Gordon et al. (1993) to test the bootstrap filter, and by Briers et al. (2010) to test forward-backward smoothing with known parameters. The new smoothing methods are tested on this model with the parameters θ = (α, β, γ, W, V) treated as unknown.
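A simulation sketch of this model in its standard benchmark form; the parameter values shown are the common defaults in the literature and are illustrative, not necessarily those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(9)

def simulate_growth(T, alpha=0.5, beta=25.0, gamma=8.0, W=10.0, V=1.0, x0=0.1):
    """Simulate the non-stationary growth model; note y_t depends on
    x_t only through x_t^2, which produces the sign ambiguity and
    multimodal state posteriors discussed below."""
    x = np.empty(T + 1); x[0] = x0
    y = np.empty(T)
    for t in range(1, T + 1):
        x[t] = (alpha * x[t - 1] + beta * x[t - 1] / (1 + x[t - 1] ** 2)
                + gamma * np.cos(1.2 * t) + rng.normal(0, np.sqrt(W)))
        y[t - 1] = x[t] ** 2 / 20 + rng.normal(0, np.sqrt(V))
    return x, y

x, y = simulate_growth(100)
```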
Furthermore, comparisons for the refiltering smoothing algorithm were made with N_0 = 10,000 and n_0 = 1,000, against a long MCMC run with N = 150,000 iterations. The smoothing plot is also presented in Figure 4. The results from the two smoothing algorithms closely agree with each other.
Using 200 simulated datasets, Table 2 gives a summary of the overall performance of the three smoothing algorithms compared to a long MCMC run. A significant decrease in the mean absolute error for the new smoothing methods relative to PLS is observed. The plot of the standardized absolute errors over time for the three smoothing algorithms (not shown) exhibits the same patterns as for the AR(1) model: the main improvement of the two new smoothing algorithms over PLS is evident at the beginning of the time series. Comparing the results from Refiltering, PMMH, and short MCMC, we notice that Refiltering dominates the latter two for both state and parameter estimation.
Note that for this model, the posterior distribution of the states is often multimodal, and it is difficult to distinguish between the positive and negative sign of the states based on the data. Thus, it is difficult to assign initial values for the states for the MCMC algorithm based only on the observations. With bad starting values for the states, the MCMC chain takes much longer to converge. In contrast, the SMC smoothing algorithms are not based on Markov chains and thus avoid issues with starting values.

Chaotic Model
The third example is the nonlinear Ricker model (Fasiolo, Pya, and Wood, 2016):

N_t = r N_{t−1} exp(−N_{t−1} + z_t),  z_t ∼ N(0, σ²),
y_t ∼ Pois(φ N_t).

This model is used in ecology, where N_t stands for the population density at time t, r is the growth rate, and y_t is the observed population size at time t. This model is characterized by its sensitivity to parameter variations: small increments in r lead to significant oscillations in the likelihood function. As a result, the likelihood function can be intractable in certain areas of the parameter space, and parameter estimation via maximum likelihood methods is challenging. Fasiolo, Pya, and Wood (2016) described the pathological likelihood function for this model and compared various maximum likelihood and Bayesian parameter estimation approaches. A time series of 100 observations is generated from this model with true parameter values r = e^3.8, σ² = 0.3 and φ = 10.
To estimate this model within our framework, we first make the transformations x_t = log(N_t) and μ = log(r). The observation and evolution equations then become

y_t ∼ Pois(φ e^{x_t}),
x_t = μ + x_{t−1} − e^{x_{t−1}} + z_t,  z_t ∼ N(0, σ²).

The model includes three unknown parameters, θ = (φ, μ, σ²). We assume conjugate priors, with φ ∼ Gamma(a_0, b_0), μ|σ² ∼ N(m_0, c_0^{−1} σ²) and σ² ∼ IG(n_0, d_0), with hyperparameters chosen to give fairly diffuse priors. The smoothed states, parameter learning bands and posterior distributions at time T = 100 are summarized in Figure 5. A total of N = 100,000 particles were used for the filtering. In this simulation, the true parameter values were learned quickly, and the posterior distributions concentrate around the true values. Figure 5 also provides a comparison of refiltering with N_0 = 10,000 and n_0 = 1,000 to a long MCMC run with 150,000 iterations for smoothing. The smoothing results from the two methods shown in the figure are almost identical. When comparing Refiltering, MCMC and PMMH, we find that Refiltering dominates the other two methods in terms of state estimation. To summarize, the simulation results for the three models studied in this section show that Refiltering is as accurate as or more accurate than MCMC and PMMH for both state and parameter estimation.
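A sketch of simulating the log-transformed Ricker model; the dynamics follow the transformed equations above, with the initial state chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_ricker(T, mu=3.8, sigma2=0.3, phi=10.0, x0=None):
    """Simulate x_t = mu + x_{t-1} - exp(x_{t-1}) + z_t and
    y_t ~ Poisson(phi * exp(x_t)); x_t = log(N_t), mu = log(r)."""
    x = np.empty(T + 1)
    x[0] = mu if x0 is None else x0      # arbitrary illustrative start
    y = np.empty(T, dtype=int)
    for t in range(1, T + 1):
        x[t] = mu + x[t - 1] - np.exp(x[t - 1]) + rng.normal(0, np.sqrt(sigma2))
        y[t - 1] = rng.poisson(phi * np.exp(x[t]))
    return x, y

x, y = simulate_ricker(100)
```

The simulated path exhibits the characteristic boom-bust cycles: the log-density climbs by roughly μ per step while the population is small, then crashes once the term e^{x_{t−1}} dominates.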

Marginal Likelihood Selection
As noted by Carvalho et al. (2010), the marginal likelihood can be computed trivially from the output of SMC-based Bayesian filtering and learning algorithms (e.g., Storvik, 2002; Carvalho et al., 2010). By comparison, many different MCMC-based estimates of the marginal likelihood have been proposed. One of the most commonly used and straightforward methods is the harmonic mean estimator (Newton and Raftery, 1994), which can be computed from the posterior draws as

p̂(y) = [ (1/N) Σ_{j=1}^N 1 / p(y | ψ_j) ]^{−1},

where y = (y_1, ..., y_T) are the observations, N is the total number of MCMC iterations, and ψ_j = (x_j, θ_j), j = 1, ..., N, are the posterior draws of the states and parameters.
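Because the harmonic mean involves reciprocals of likelihoods, it is numerically delicate; a sketch computed on the log scale via the log-sum-exp trick looks like this (the function name is illustrative).

```python
import numpy as np

def log_harmonic_mean(loglik):
    """log of the harmonic mean estimate of p(y), given
    loglik[j] = log p(y | psi_j) for posterior draws psi_j."""
    loglik = np.asarray(loglik)
    N = len(loglik)
    neg = -loglik
    m = neg.max()                        # log-sum-exp stabilization
    # log p-hat(y) = -log( (1/N) * sum_j exp(-loglik_j) )
    return -(m + np.log(np.exp(neg - m).sum()) - np.log(N))
```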
An important advantage of SMC over MCMC is that the estimation of the marginal likelihood from SMC output is numerically stable. In our examples, the SMC estimate converges quickly as N increases, while for MCMC, the harmonic mean estimator is unstable even for N as large as 500,000 in all three models.

Stochastic Volatility Model for S&P 500 Returns
To illustrate the proposed methods on real data, we analyze daily returns on the Standard & Poor's 500 (S&P 500) index from January 2008 to March 2009, during the financial crisis. We consider the following stochastic volatility model:

y_t = μ + exp(x_t / 2) ε_t,  ε_t ∼ N(0, 1),
x_t = α + β x_{t−1} + w_t,  w_t ∼ N(0, W),

where y_t = log(P_t / P_{t−1}) are the daily returns and P_t are the daily prices of the S&P 500 index, μ is the expected return, and x_t is the unobserved log-variance state variable, which follows an AR(1) process with constant term α. The coefficient β measures the autocorrelation in the logged squared returns, and W is the so-called volatility of volatility. This model has been widely used to analyze financial time series with volatility clustering (for example, Jacquier, Polson, and Rossi, 1994; Kim, Shephard, and Chib, 1998). Here, we compare smoothing and parameter estimation results from the PLS, PLSa, Refiltering and MCMC algorithms.
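A simulation sketch of this stochastic volatility model; the parameter values below are illustrative choices producing persistent volatility, not estimates from the S&P 500 data.

```python
import numpy as np

rng = np.random.default_rng(8)

def simulate_sv(T, mu=0.0, alpha=-0.3, beta=0.97, W=0.05):
    """Simulate returns y_t = mu + exp(x_t/2) eps_t with AR(1)
    log-variance x_t = alpha + beta x_{t-1} + w_t."""
    x = np.empty(T + 1)
    x[0] = alpha / (1 - beta)            # start at the stationary mean
    y = np.empty(T)
    for t in range(1, T + 1):
        x[t] = alpha + beta * x[t - 1] + rng.normal(0, np.sqrt(W))
        y[t - 1] = mu + np.exp(x[t] / 2) * rng.normal()
    return x, y

x, y = simulate_sv(300)
```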
The sequential learning plots in Figure 6 show an abrupt change in the parameters, particularly α and β, in September 2008. The timing corresponds to the days following the collapse of Lehman Brothers, when stock prices dropped and volatility increased sharply (i.e., the start of the financial crisis). Figure 7 shows the filtered and smoothed volatilities for each algorithm. These plots show clear evidence that PLS and MCMC do not match, especially from September to November 2008, when the volatility changes abruptly. PLSa more closely follows the MCMC results, and among Refiltering, PLS and PLSa, Refiltering is by far the most accurate relative to the long MCMC run. The smoothed posterior state estimates of stochastic volatility obtained with the PMMH smoother also closely match those obtained with the long MCMC run.
Figures 8-10 in the Supplemental Material show results for a situation where T = 3000. The refiltering algorithm provides the estimates of the log-volatility state that most closely resemble those obtained from a long MCMC run.

Conclusions
In this paper, we have proposed two new particle smoothing algorithms that simultaneously deal with state and parameter learning in general state-space models. The first is a modification of the PLS algorithm of Carvalho et al. (2010), which includes an adjustment term in their backward resampling weights. The second, called Refiltering, is a two-step algorithm that includes a parameter learning step followed by a forward-backward algorithm for smoothing. The Refiltering algorithm is well suited for parallel implementation, since the smoothing step requires essentially no communication between processors.
We implemented the new methods on four examples: a benchmark autoregressive plus noise model, a nonlinear growth model, a chaotic model, and a stochastic volatility model from finance. We compared estimates for states and parameters with the widely-used PLS algorithm, Particle MCMC, and a full MCMC implementation. For all examples, the new smoothing methods showed significant improvement over PLS, and the Refiltering algorithm was shown to be highly competitive with smoothing algorithms based on MCMC and Particle MCMC methods. Overall, our proposed methods are quite general and apply to a wide class of nonlinear, non-Gaussian state-space models for retrospective state and parameter estimation. Finally, we note that all of the examples considered in this paper are low-dimensional, with one-dimensional states and observations. It is well known that particle filters (or sequential Monte Carlo methods based on importance sampling) fail with high-dimensional states and observations, leading to unbalanced weights and particle degeneracy (see Snyder, Bengtsson, Bickel, and Anderson, 2008). Similar degeneracy problems are likely to occur for high-dimensional parameters. An important area of current research is designing state and parameter estimation algorithms for high-dimensional models. We believe that "exact" methods based on importance sampling will fail, and suggest that approximate methods such as the ensemble Kalman filter (EnKF; Evensen, 1994; Katzfuss, Stroud, and Wikle, 2016) will provide more accurate and stable results. These methods are already widely used in geophysics, and EnKF state and parameter estimation methods have proved successful in many high-dimensional examples (e.g., Stroud and Bengtsson, 2007; Stroud, Stein, Lesht, Schwab, and Beletsky, 2010; Stroud, Katzfuss, and Wikle, 2017).