Sequential Monte Carlo Samplers with Independent Markov Chain Monte Carlo Proposals

Sequential Monte Carlo (SMC) methods for sampling from the posterior of static Bayesian models are flexible, parallelisable and capable of handling complex targets. However, it is common practice to adopt a Markov chain Monte Carlo (MCMC) kernel with a multivariate normal random walk (RW) proposal in the move step, which can be both inefficient and detrimental to the ability to explore challenging posterior distributions. We propose a flexible copula-type independent proposal which uses the population of particles to adapt the kernel in a more sophisticated way than the RW. The result is fewer likelihood evaluations, improved exploration of challenging posterior distributions and an improved ability to perform likelihood evaluations in parallel. Another consequence of using independent proposals is that all candidates generated in the SMC process can be used to estimate the marginal likelihood and posterior expectations. We consider various importance sampling (IS) estimators that use all candidates and devise a novel IS marginal likelihood estimator. We demonstrate through several examples that more precise estimates of posterior expectations and the marginal likelihood can be obtained using fewer likelihood evaluations than the more standard RW approach.


Introduction
Sequential Monte Carlo (SMC, Chopin (2002); Del Moral et al. (2006)) methods for static Bayesian models are naturally adaptive, easily parallelisable and are capable of dealing with targets that are multimodal or have complicated landscapes (see e.g. Del Moral et al. (2006) and Cappé et al. (2007)). The basic SMC method involves moving a population of N particles through a sequence of distributions which can be chosen by smoothly introducing either the data (data annealing) or the effect of the likelihood (likelihood annealing). Many of the computations associated with the N particles can be performed in parallel. As a useful by-product, SMC produces an estimate of the normalising constant of the posterior distribution (e.g. Del Moral and Miclo (2000)), the so-called evidence, which is useful for Bayesian model choice.
SMC propagates the particles through the sequence of distributions using three types of steps: reweighting, resampling and moving (or mutation importance sampling (IS) to adjust the weights of particles from the current distribution to obtain a properly weighted set targeting the next distribution. The resampling step is used to focus on promising regions that will then be diversified using a move step, which is the most computationally intensive aspect of SMC. The move step is commonly chosen to be several iterations of a Markov chain Monte Carlo (MCMC, Metropolis et al. (1953)) kernel, often with a random walk proposal. Despite their commonplace use, random walk proposals can be inefficient at exploring the target and this can have a detrimental effect on evidence and posterior expectation estimates.
We develop new and efficient SMC methods using independent MCMC proposals. Independent proposal distributions have the advantage that they can result in uniformly ergodic Markov chains, as opposed to typical geometric ergodicity achieved by random walk proposals (Tierney, 1994). They are highly parallelisable and have the potential to explore complex and multimodal targets in fewer iterations than local proposals. Some early SMC algorithms proposed basic independent proposals like multivariate normal (Chopin, 2002) and more recently an independent proposal for multivariate binary spaces has been used in SMC (Schäfer and Chopin, 2013). However independent proposals are often dismissed in the SMC literature for being too restrictive (see e.g. Del Moral et al. (2006)). Silva et al. (2010) and Schmidl et al. (2013) develop adaptive MCMC methods that use copula (Sklar, 1959) type models to form independent proposals. Copulas provide flexible multivariate distributions as they allow for separate modelling of marginals and dependence structure between components. We extend this idea to the SMC setting, taking advantage of the available population of particles.
We demonstrate that when independent proposals are used, all candidates generated in the SMC process can be used in evidence estimation and posterior inference. Throughout this paper, "candidates" refers to all samples from the prior and all MCMC proposals. We compare several estimators from the IS literature with existing methods for SMC in terms of bias and precision. We also propose a novel evidence estimator which is simple and computationally efficient to calculate from our independent SMC output. These new recycling schemes for SMC can lead to increases in the effective sample size (ESS) targeting the posterior, improved sampling from complex posterior distributions and significant variance reductions when compared to no recycling and the recycling method of Nguyen et al. (2014). In the examples considered, the novel evidence estimator is up to five orders of magnitude more efficient than the standard SMC estimator. Section 2 of this paper gives a brief review of likelihood annealing SMC, the existing literature on recycling in SMC and the copula models which form the basis of the independent proposals. The main contributions of this paper, developing independent MCMC proposals for SMC and exploiting these proposals, are described in Section 3. In Section 4, we compare our methods with a more standard SMC implementation on applications of varying complexity. A final summary and a discussion of possible limitations and extensions of this work are given in Section 5.

Background
The focus of this article is Bayesian inference for a statistical model parameterised by θ ∈ Θ ⊆ R p where p is the dimension of vector θ and the collected data is denoted y ∈ Y ⊆ R d . The posterior distribution is defined as where f (y|θ) is the likelihood, π(θ) is the prior and Z(y) = Θ f (y|θ)π(θ)dθ is the normalising constant of the posterior often referred to as the evidence. The evidence is an important quantity in Bayesian model selection because it is used in the calculation of Bayes factors. However, it is a p dimensional integral which makes it difficult to estimate.
Here we are interested in estimating both the evidence, Z, and posterior functionals, Θ ψ(θ)π(θ|y)dθ, and we use SMC to do so.

Sequential Monte Carlo
SMC traverses a set of N weighted samples or 'particles', , through a sequence of distributions, π t (θ|y) for t = 0, . . . , T . SMC consists of reweighting, resampling and move steps, each of which are described in further detail below.
In this paper we focus on the likelihood annealing sequence which is useful for exploration of complex targets (Neal, 2001) and allows for additional benefits in terms of recycling otherwise wasted samples. Interest is in sampling from the sequence of power posteriors, π t (θ|y) = f (y|θ) γt π(θ)/Z t where γ t is referred to as the temperature and 0 = γ 0 ≤ · · · ≤ γ t ≤ · · · ≤ γ T = 1.

Reweighting Step
The reweighting step uses IS to weight particles targeting the previous distribution appropriately for the next distribution. Following Del Moral et al. (2006), we obtain is an unnormalised weight and W i 0 = 1/N for i = 1, . . . , N. The weights are normalised and we set θ i t+1 = θ i t for i = 1, . . . , N to obtain the set of particles providing a discrete approximation of π t+1 (θ|y).
The temperature γ t+1 is chosen adaptively by approximately maintaining a specified effective sample size (ESS), the number of independent samples from the target that would be required to achieve the equivalent variance in the estimator, of ρN (see Jasra et al. (2011) for details). Following Kong et al. (1994), the ESS at target t + 1 is

Resampling Step
After reweighting, the particles are resampled with probabilities given by their corresponding normalised weights. This resampling has the effect of eliminating particles with negligible weight and replicating the particles with larger weights. See Gerber et al. (2017) for a recent review of the theoretical properties of resampling methods. We use multinomial resampling for simplicity.

Move Step
To diversify the particles and explore π t , we apply R t iterations of a Metropolis Hastings (MH) MCMC kernel on each particle. For a single MCMC iteration with proposal distribution q φ t and parameter φ t , a candidate θ i, * t ∼ q φ t (·|θ i t ) is proposed and we set otherwise we retain the current θ i t . A major advantage of SMC is that the parameter φ t can be estimated from the population of particles prior to the move step. It is common to use multivariate normal random walk (RW) proposals and to estimate the covariance from the population of particles (Chopin, 2002). The proposal distribution using this scheme is q h) and N (θ; μ, Σ) denotes the multivariate normal probability density with mean μ and covariance Σ evaluated at θ. Here we set h = 2.38/ √ p (Gelman et al., 1996). Our alternative proposal is described in Section 3 and compared to the RW empirically in Section 4.
We propose to choose R t adaptively based on the current move step at t so that there is a theoretical probability of 1 − c (with c set small) that the particle is moved at least once. The number of repeats is where · denotes the ceiling function andp t acc = 1 is the acceptance rate based on a trial MCMC iteration on the N particles. This method is similar to Drovandi and Pettitt (2011), who predict p t acc with the MCMC acceptance rate at t − 1, p t−1 acc . We do not rely on p t acc ≈ p t−1 acc and we do not require an initial choice for R 1 . Only a further R t − 1 MCMC iterations are required per particle in our method since the first iteration is already performed. In the discussion in Section 5, we show how an improved Rao-Blackwellised estimate of R t and a particle specific R i t choice of the MCMC repeats can be obtained.

Estimating Posterior Expectations and the Evidence
The standard SMC estimator for posterior expectations is given bŷ The standard SMC estimator of the evidence is This estimator is based on the identity Z T /Z 0 = T t=1 Z t /Z t−1 where Z 0 = 1. It requires little additional implementation or computational effort. Intermediate normalising constants can also be estimated by taking fewer terms in the product,

Recycling in SMC
Typically the final samples {θ i T } N i=1 from an SMC run are used for inference and all other proposed particles are wasted because they do not target the posterior. Particle recycling schemes, such as those presented for general importance tempering in Gramacy et al. (2010) and for SMC in Nguyen et al. (2014) and Finke (2015), use IS to reweight the final particles from π t for t = 0, . . . , T to target the posterior.
To recycle particles {θ i t } N i=1 from the t-th power posterior, the target is the posterior π T and the importance distribution is the t-th power posterior, π t = f (y|θ) γt π(θ)/ Z t . The unnormalised IS weights to use the t-th power posterior samples are therefore where the term Z t is unknown. Normalising the weights to Posterior expectations can be estimated by normalising the weights and applying standard Monte Carlo integration, Gramacy et al. (2010) propose a method to combine multiple IS estimators in order to maximise the ESS. Denote the ESS targeting the posterior using {θ i t } N i=1 from the t-th power posterior as ESS t and denote the unbiased estimate of some posterior functional using those samples asψ t . The combined estimate iŝ where λ t = ESS t / T l=0 ESS l . By linearity of expectation, this combined estimate is unbiased if the λ values are fixed from a separate run. The motivation behind this combined estimator is that importance distributions which have poor performance (as measured by ESS) should be given less weight in the combined estimator. The ESS targeting the posterior after recycling is T t=0 ESS t in the adaptive case (see Appendix A of the Supplementary Materials (South et al., 2018)). We refer to the general approach of combining IS estimators based on ESS as combined importance sampling (CIS) and we denote the combination of power posterior samples through CIS as CIS PP . CIS PP has been applied in the SMC context by Nguyen et al. (2014) and Finke (2015).
Another method for combining samples from multiple importance distributions is to treat the samples as coming from a mixture of distributions, with the mixture weights based on the proportion of samples coming from that distribution (Veach and Guibas, 1995;Owen and Zhou, 2000). Following Owen and Zhou (2000), we refer to this method as deterministic mixture sampling or DeMix for short. We refer to the use of DeMix to recycle particles from the power posteriors as DeMix PP . Nguyen et al. (2016) use DeMix PP in calculating posterior expectations. The unnormalised DeMix weights in this context are for i = 1, . . . , N and t = 0, . . . , T . Calculating the deterministic multiple mixture weights requires that π t be normalised so, like Nguyen et al. (2016), we use the SMC estimate Z t for the normalising constant of π t . Nguyen et al. (2016) compare DeMix PP and CIS PP for posterior approximation and found empirically that the DeMix PP scheme performs only marginally better.
Normalising constant estimation using power posterior recycling schemes has been restricted since the importance density involves the normalising constants Z t for π t . The requirement to estimate intermediate normalising constants in order to estimate the overall normalising constant and the fact that only an additional N particles are used when compared to the standard SMC estimator makes these evidence estimators seem unappealing when compared to the simple SMC estimator in (3). For completeness, we implement CIS PP and DeMix PP estimators of the evidence using the SMC estimates of the intermediate normalising constants. We find empirically in Section 4 that the CIS PP and DeMix PP estimators perform similarly to the SMC estimator of the evidence.
In Section 3.2, we propose evidence estimators that do not require estimates of the intermediate normalising constants and can take advantage of all candidate parameter values generated in the SMC process.

Copulas
Copulas form the basis of the independent proposals described in Section 3.1. Sklar's theorem (Sklar, 1959) states that a multi-dimensional distribution can be described entirely by its marginal cumulative distribution functions and a copula that captures the dependence between these marginals. Any p-dimensional random vector V = (V 1 , . . . , V p ) T with cumulative distribution function H and continuous marginal distribution functions G j (v) = P (V j ≤ v) for j = 1, . . . , p can be described by a unique copula C such that Copulas describe the dependence between univariate uniform marginals but this can easily be extended to non-uniform marginals using transformation methods.
A simple and popular family of copulas is the Gaussian copula (see e.g. Fang et al. (2002)). Denote random vectors on the U(0, 1) scale by U and random vectors after transformation to N (0, 1) by X. Denote the j-th dimension of X as X[j] for j = 1, . . . , p. The Gaussian copula can be written as where Φ D is the joint cumulative distribution function of the multivariate Gaussian distribution with correlation matrix D. The probability density function is In practice, the correlation matrix D is estimated by the empirical correlation matrix of a set of samples on the N (0, 1) scale.
A more flexible way to model dependence is through a multivariate mixture model, as described in Tran et al. (2014). These models are not strictly copula models because the marginals are not preserved, but they offer a flexible way to model dependence separately from the marginals. We consider the most simple and computationally inexpensive mixture, a multivariate Gaussian mixture model (MGMM, Pearson (1894); Dempster et al. (1977)) but we note that other mixtures considered by Tran et al. (2014), such as mixtures of multivariate t's, may help to improve tail coverage at the cost of computation time.
MGMMs consist of a weighted mixture of multivariate Gaussian distributions. The parameters to be estimated in an MGMM are the mean, μ k , and covariance, Σ k , of the k-th component and a set of weights c k for k = 1, . . . , K where K is the number of components and K k=1 c k = 1. The density of an MGMM for parameter X is defined as An independent MCMC proposal using copulas has been explored before in MCMC (Silva et al., 2010;Schmidl et al., 2013), where it is limited by the requirement of finding an approximation of the target distribution to fit the copula and the marginals. In SMC, an approximation from the current target is already available. To our knowledge, independent MCMC proposals using copulas have not yet been developed for general use in an SMC framework. This is the focus of Section 3.1.

Independent Proposals in SMC
The main contributions of this work are to develop efficient independent MCMC proposals for SMC and to exploit them in several ways, for example by harnessing all generated candidates to obtain reduced variance estimators of posterior expectations and the evidence. We also present a new evidence estimator that is conveniently computed from the output of independent SMC.

Copula-Type Independent Proposals Estimating the Copula-Type Model
Here we describe a method for estimating the parameters of the copula-type independent proposals. A univariate distribution is first chosen for each of the marginals. Different distributions, including univariate beta and Gaussian mixture distributions, are used in this work. When individual parameters are bounded above and below a priori, beta marginals may be a sensible choice as the prior limits can be transformed to [0, 1]. The univariate distributions with parameters η j are fitted to {θ i t [j]} N i=1 for j = 1, . . . , p and the cumulative distribution function of the j-th marginal is denoted by Gη j j (·).
. . , N transforms each of the p dimensions of the particle population to be approximately U(0, 1) distributed random variates provided that Gη j j fits well. The quantile function of the standard normal distribution, Φ −1 , can then be used to transform the marginals to be approximately standard normal through . With approximately N (0, 1) marginals, it is easier to model the dependencies. Here we consider an MGMM as described in Section 2.3. The parametersĉ,μ andΣ for the mixture model are obtained using the expectation maximisation (EM) algorithm (Dempster et al., 1977). There are methods to help choose the number of components in a mixture model (McLachlan and Peel, 2000), for example based on the Bayesian information criterion (Schwarz, 1978;Keribin, 2000) or using variational Bayes methods as in Tran et al. (2014), but for simplicity here we fix the number of components.
The process of fitting a copula-type model, including the transformations to approximately N (0, 1) marginals, is given in Algorithm 1. An illustration of this process for the example in Section 4.2 is shown in Figure 1.
Algorithm 1: Fitting the copula-type MGMM from the population of particles at π t .
Input : A population of particles from the current power posterior {θ i t } N i=1 , the number K of components in the MGMM and the type of marginals to be fitted (problem dependent). Output: Marginal distribution parameters {η j } p j=1 , mixture model parameters (ĉ,μ,Σ) and the transformed population on marginal N (0, 1) scale using the EM algorithm.

Making Proposals
Once the copula-type model has been fitted, candidates can be drawn from the mixture model which simply involves simulating from the k-th component with probability c k .
Transformation of candidates X * on the N (0, 1) scale to the approximately U(0, 1) scale is done through the cumulative distribution function of the standard normal distribution, Φ, such that U * [j] = Φ(X * [j]), for j = 1, . . . , p. The candidate on the original scale is θ * [j] = Qη j j (U * [j]), for j = 1, . . . , p, where Q j ≡ G −1 j is the quantile distribution of the j-th fitted marginal distribution. The proposal density for a candidate θ * t with fitted parameters φ t = (η,ĉ,μ,Σ) needs to account for these transformations. The proposal density (7) for X * is adjusted using transformation methods (see Appendix B of the Online Resources) to obtain where gη j j denotes the probability density function of the j-th marginal with estimated parameterη j . The proposal density in (8) is required for the MH ratio and also for recycling methods.
Algorithm 2 outlines the details of performing a single MCMC step for each particle using our MGMM independent proposals. This step is repeated multiple times depending on R t . Parallelisation is straightforward as one can draw all candidates from q φ t (·) and calculate {f (y|θ i, for the candidates in parallel.

Recycling All Candidates
In the context of our independent SMC method, it is possible to recycle all candidates from all temperatures by using the q φ t (·) as importance distributions. This means that R t times as many samples per temperature can be used in estimating posterior expectations and estimating the evidence when compared to the PP recycling methods described previously.
The importance density q φ t can be computed pointwise for any candidate when an independent MCMC proposal is used, for example q φ t for the MGMM copula-type proposal is given in (8). This means that the SMC estimates of the normalising constants are not required for any calculations, making these recycling methods much more appeal-Algorithm 2: One iteration of an MCMC kernel on all particles using independent MGMM copula-type proposals.
Input : A population of particles from the current target {θ i t } N i=1 , the transformed standard normal version of the population of particles , fitted mixture model parameters (ĉ,μ,Σ), the current temperature γ t and the prior distribution π(θ). Output: Particles after move step, ing for estimating the evidence when compared to the PP recycling evidence estimators detailed in Section 2.2.
Posterior expectations using these weights can be combined using the CIS method from (5). We also derive the following novel estimator of the evidence, where λ t = ESS t / T l=0 ESS l and ESS t is the ESS targeting the posterior using the candidates drawn from independent proposal q φ t (·). This estimator is in essence an efficient combination of multiple IS estimators for the evidence which, by linearity of expectation, is unbiased if the λ values are fixed from a separate run. To the best of our knowledge, this is the first time that the ESS-based approach has been used for evidence estimation. We refer to the CIS estimators which reuse all candidates as CIS IP where the IP standards for independent proposal recycling.
We also propose the use of the DeMix scheme to reuse all candidates in SMC and we refer to this method as DeMix IP . The DeMix IP weights arẽ for i = 1, . . . , N and t = 0, . . . , T , where q φ l (·) represents the independent proposal used to draw the candidates for π l . The associated evidence estimator is .
This requires an extra T proposal density calculations per particle when compared to CIS but no further target evaluations are required, so this additional cost is generally not prohibitive. When a large number of temperatures are used, methods which limit the computational complexity of DeMix as in Elvira et al. (2015) may be useful. As a result of the mixture term, there tend to be fewer extreme weights. DeMix intuitively seems more natural for candidate recycling (DeMix IP ) than power posterior recycling (DeMix PP ); in the latter, the terms in the mixture differ only by the likelihood powers and normalising constants which need to be estimated.

Simulation Studies
The main comparators which we consider in our simulation studies are summarised in Table 1. The two MCMC kernels being compared are our independent proposal (IND) and the ubiquitous multivariate normal random walk (RW). We compare our methods for recycling all candidates in posterior and evidence estimation (CIS IP and DeMix IP ) with the standard practice and with the current state of the art for recycling methods (CIS PP and DeMix PP ).
It is well known that the standard SMC estimator of the evidence in (3) is not guaranteed to be unbiased when the sequence of distributions or proposals are adapted online (Beskos et al., 2016). Similarly, the CIS PP and DeMix PP methods are not unbiased if the intermediate normalising constants are estimated from the same SMC run, and the CIS PP and CIS IP methods are not unbiased if the λ (or ESS) values are estimated from the same run. One aim here is to compare the bias of the estimators empirically when adaptive runs are used. If an unbiased estimate is required, the proposal distributions, temperatures, intermediate normalising constant estimates and ESS values from an adaptive run can be used in a fixed run.
MCMC kernels and recycling methods are compared on the basis of estimates of posterior quantiles and estimates of the evidence. We would ideally like to take into account computational effort and statistical efficiency (accuracy and precision) in these Kernel Recycling Samples inẐ Samples inψ New a IND refers to the MGMM copula proposal described in Section 3.1 Table 1: The kernels and recycling methods considered with information regarding how many samples are used in the estimators and whether they are novel.
comparisons. We measure computational effort by the number of likelihood evaluations (TLL) because this comprises a large proportion of the computational effort in most applications. When a gold standard of approximation is available, for example through Monte Carlo estimates based on a very long MCMC or SMC run, we can measure both accuracy and precision by looking at the mean square error (MSE) relative to the gold standard. We resort to measuring the variance of 100 estimates (VAR) when no gold standard is available. Our overall measure of efficiency is MSE · TLL for RW with no recycling divided by MSE · TLL for the method in question. When no gold standard is available, MSE is replaced with VAR. RW with no recycling has an efficiency value of 1 and larger values are preferred.
For likelihood annealing SMC, temperatures are chosen with ρ = 0.5 so that the ESS in the next iteration is at least N/2. The number of MCMC repeats is chosen so that samples are moved at least once with theoretical probability 1−c = 0.99. Regularisation is used when estimating the parameters of the Gaussian mixture models, which may assist with tail coverage and helps with the numerical stability of the EM algorithm.
In addition to the applications given in the main paper, three examples are given in the Online Resources. Appendix E contains an 11 dimensional example and Appendix F contains a three dimensional example where an unbiased likelihood estimator is used. In these examples, IND outperforms RW by some efficiency measures before recycling and all efficiency measures after recycling. A challenging application for which our MGMM copula-based IND proposal does not fully cover the tails of the target distribution is given in Appendix G.

Factor Analysis Example
This model choice example illustrates the potential benefits and limitations of independent SMC for sampling from complex posterior distributions. The model choice aspect of this example also makes the utility of the evidence estimators clear.
Monthly exchange rates of six currencies relative to the British pound were collected from January 1975 to December 1986 (West andHarrison, 1997). Lopes and West (2004) seek to describe the covariance structure of the standardised (by their sample mean and standard deviation) differences in these exchange rates using factor analysis (FA) models. FA models assume that Y ∼ N (0, Ω) where the covariance Ω is restricted to the form Ω = ββ T + Λ, β is a 6 × k lower triangular matrix with positive diagonal elements and Λ is a 6 × 6 diagonal matrix with positive elements. Here k is the number of factors and the total number of parameters is 6(k + 1) − k(k − 1)/2. We consider k ≤ 3 to avoid over-parameterising Ω, which leads to models with dimension at most p = 21.
For this example, 100 SMC runs with N = 5, 000 particles are performed. The IND proposals are formed using 5 component Gaussian mixture model marginals and a 6 component MGMM for dependence. Matlab code for this example is available at https://github.com/LeahPrice/Independent-SMC. Given that we are using this example for illustrative purposes and the likelihood is inexpensive, we also do 100 gold standard SMC runs. These gold standard runs use a RW proposal with 50,000 particles and a conservative choice of the tempering adaptation parameter, ρ = 0.99, to avoid sudden gaps in modes which discourage mixing.

Posterior Inference
Due to the number of figures required to show the quality of the posterior approximations for the three models, we show some of the more complex marginals here and discuss the results. Appendix C of the Supplementary Materials shows posterior estimates and measures of efficiency for quantiles of the posterior marginals in the three models.

One factor model (p = 12)
The marginals in the one factor model are simple and can easily be captured with our independent proposals. In terms of efficiency, IND outperforms RW before recycling and the best results can be achieved by recycling all candidates. The median ESS targeting the posterior is 5,000 before recycling and up to 55,000 with DeMix IP recycling.

Two factor model (p = 17)
The introduction of a second factor leads to a more complex posterior. The marginals for β 32 , β 42 , β 52 and β 62 are all bimodal with well separated modes and some marginals have heavy tails. Posterior approximations of two marginals are shown in Figure 2. The RW proposal fails to achieve consistent proportions in the well separated modes over multiple runs, whereas the IND proposal can easily move between modes. For this reason, IND is significantly more efficient than RW and recycling offers further improvements (see Appendix C of the Supplementary Materials). We note that the well separated mode was not captured in the reversible jump MCMC approach of Lopes and West (2004).

Three factor model (p = 21)
The three factor model pushes the limits of what can currently be achieved with our IND proposal due to complex and trimodal marginal distributions ( Figure 3) and highly complex bivariate distributions (available in Appendix C). The ESS after recycling all candidates is consistently less than N for both CIS IP and DeMix IP recycling, which indicates a lack of tail coverage of the IND proposals with respect to the power posteriors. It is not surprising given the complex dependencies that a 6 component MGMM could not adequately describe the dependence.  Evidence Figure 4 shows boxplots of the estimated model probabilities which are calculated using the model evidence estimates. It is clear that the two factor model is preferred. The three factor model has some known issues with posterior approximation so it is not surprising that there is some discrepancy between the model probabilities estimated from the independent proposal and the probabilities estimated from the gold standard. However, the bias when recycling all independent proposals is small and the estimates are precise. Table 2 shows the evidence estimates and their efficiency, and this supports the claim that the IND proposals with CIS IP or DeMix IP recycling is the most efficient method for estimating the evidence in this example.

Econometrics Example
In the previous example, our method outperformed RW in terms of the number of likelihood evaluations and the precision of estimators, but it also required more time due to the overhead of constructing the independent proposal, simulating from it and evaluating its density. Here we consider an example more suited for the independent proposal since it has a non-trivial cost in evaluating the likelihood function so we are able to achieve reduced run times.
The "bad environment -good environment" (BEGE) model of Bekaert et al. (2015) is a flexible model used to describe the time-varying, non-Gaussian innovations in financial asset return data. The innovations in returns are modelled using a linear combination of a "bad environment" component and a "good environment" component, r t is the return at time t and u t is the innovation in the return at time t.Γ(k, h) is the so-called de-meaned gamma distribution with probability density functioñ for x > −kh. The shape parameters for the good and bad environments, p t and n t respectively, are modelled using The parameters which we wish to estimate are θ = (p 0 , n 0 , ρ p , ρ n , φ + p , φ + n , φ − p , φ − n , σ p , σ n ) and we assume that μ t = 0.
We use the Matlab code available from Bekaert et al. (2015) which uses numerical integration and differentiation to estimate the likelihood. This procedure makes the likelihood expensive to evaluate, at 2-4 seconds per single likelihood evaluation on an Intel Core i7-4790 CPU @ 3.60GHz.
Daily closing prices for the S&P 500 Composite Index for the period 31/12/1989 to 29/7/2016 were obtained from Bloomberg (Bloomberg, 2017). Log returns were calculated and we apply the BEGE model to the resulting dataset of 6690 daily return observations.
The priors are which, in addition to the requirements related to stationarity that ρ p + 1 2 φ + p + 1 2 φ − p ≤ 0.995 and ρ n + 1 2 φ + n + 1 2 φ − n ≤ 0.995, impose constraints on the parameters. Proposals made using a RW which do not satisfy the prior constraints are rejected without any likelihood evaluations which lowers the acceptance probability and therefore inflates the number of MCMC repeats. On the other hand, it is possible with an IND proposal to continue drawing candidates without any likelihood or proposal density evaluations until a candidate which satisfies the constraints is made.
We perform 100 runs using N = 1, 000 particles. For this example, we use beta marginals and a 2 component MGMM for dependence in the IND proposal. The gold standard in this example is a long MCMC run with 400,000 iterations, taking a 4,000 iteration burn-in and retaining every 40th sample.

Posterior Inference
The posterior marginals in this example are relatively simple and can be explored with a RW or IND kernel (see Appendix D for figures), but the RW SMC runs are significantly more expensive to perform. RW uses on average 3.6 times more log-likelihood evaluations than IND and the run times are 3.2 times longer.
Posterior median estimates and measures of efficiency are given in Table 3

Evidence
Boxplots of the logged evidence estimates based on 100 runs are shown in Figure 5, where it is clear that at least one of the kernel types results in biased SMC and PP recycling evidence estimates. The PP based estimators are not visibly different to the standard SMC estimators, whereas the IP estimators which use all candidates are remarkably precise and are closer to the SMC RW results. Despite the additional density computations required by DeMix IP , the performance of CIS IP and DeMix IP is similar.

Discussion
A flexible independent MCMC proposal for SMC has been developed and the potential advantages of this approach have been demonstrated. The implementation of independent proposals in the SMC context is highly parallelisable and offers the ability to reuse all MCMC candidates in estimating the posterior and evidence. We have described a general method for forming these proposals based on modelling the marginals and the dependence separately, and the specific proposals applied here are based on modelling dependence through MGMMs. Our results are competitive with the multivariate normal random walk kernel in the examples presented here, even before recycling is applied. We find that if the independent proposals cover the tails of the target well, then significant improvements are achieved by recycling the candidates.
The three factor model in Section 4.1 and the example in Appendix G are applications with significant posterior complexity that push the boundaries of what can currently be achieved with our independent proposal. In the example in Appendix G, the MGMM copula-based independent proposal does not provide sufficient tail coverage yet we obtain very precise IS estimates of the evidence and we are able to improve the SE for the lower and upper 95% quantiles using recycling. We found that it is sometimes possible to detect lack of tail coverage through a small ESS after recycling. However, finding a proposal which covers the tails remains challenging for complex targets and a subject which we wish to tackle in further research. Future work may consider a mixture of student's t copulas, which models symmetric tail dependence, or defensive mixture distributions (Hesterberg, 1995), which use a mixture of some approximation q φ t of the target with a sampling distribution that improves tail coverage at the cost of efficiency. We hope that these examples will motivate further research on this topic to increase the capabilities of SMC with an independent proposal.
The concept presented in Section 3.1 for transforming the marginal distributions of a multivariate random variable to approximately N (0, 1) is a powerful tool to allow for separate and straightforward modelling of dependence. These transformations may also be relevant to other proposals, including the multivariate normal random walk proposal. However, this method relies on a reasonable fit of the chosen marginals and the ability to simulate from the resulting marginal distributions via the inversion method. If an analytical expression for the quantile function is not available, the bisection method can be used to determine the quantiles, though this may require non-negligible computation time.
The use of independent proposals allows for the novel possibilities of improving acceptance probability estimates through Rao-Blackwellisation or estimating particlespecific acceptance probabilities using a computational effort that is similar to only a single MCMC iteration on all N particles. For particles {θ i t } N i=1 and candidates {θ j, * t } N j=1 , we can compute the acceptance probability α(θ i t , θ j, * t ), from which we can obtain a Rao- N j=1 α(θ i t , θ j, * t ) or a particle specific estimatê p i,t acc = 1 N N j=1 α(θ i t , θ j, * t ) of the acceptance probabilities. These acceptance probabilities can be applied in choosing the number of MCMC repeats. Using a particle specific number of repeats, R i t , removes the restrictive assumption of having a constant probability of moving a particle regardless of its location in the parameter space. It may also be useful to set an upper limit on the number of move steps per particle, R i t ≤ R max for all i = 1, . . . , N to ensure that no one particle is assigned a large amount of computational load.
The candidates from the independent proposals in the empirical studies of Section 4 were drawn using pseudo-random numbers. As a variance reduction technique, the method known as randomised quasi-Monte Carlo (RQMC, e.g. Cranley and Patterson (1976)) could be used to improve the CIS IP and DeMix IP estimators. Using quasirandom samples instead of pseudo-random samples in Monte Carlo integration leads to estimators that have a faster convergence rate, but some randomisation must be introduced to maintain the unbiasedness property of the estimator.
If some initial approximation of the posterior h(θ) is available, it can be used as the initial SMC distribution with targets following the geometric path π t (θ|y) ∝ [f (y|θ)π(θ)] γt h(θ) 1−γt in order to reduce computation (Donnet and Robin, 2017). Independent proposals were used in SMC here via MCMC proposals in the move step but independent proposals could be used in the move step in several ways. We describe an alternative method in Appendix H which is based on skipping the resampling stage and using independent proposals with an IS correction to diversify the particles. This method is similar to adaptive importance sampling (e.g. Oh and Berger (1992)) but with annealing of the targets. This may be a useful alternative but it is harder to implement in practice due to the difficulty in controlling the weight degeneracy and for other reasons which are discussed in Appendix H of the Online Resources.
In general, the proposed methods for applying independent MCMC kernels in SMC and reusing all candidates have performed well in comparison to the multivariate normal random walk kernel. Order of magnitude improvements in the ESS targeting the posterior can be achieved using these independent proposals, and the proposed evidence West, M. and Harrison, J. (1997). Bayesian forecasting and dynamic models. New York: Springer-Verlag New York. MR1482232. 784