Ensemble MCMC: Accelerating Pseudo-Marginal MCMC for State Space Models using the Ensemble Kalman Filter

Particle Markov chain Monte Carlo (pMCMC) is now a popular method for performing Bayesian statistical inference on challenging state space models (SSMs) with unknown static parameters. It uses a particle filter (PF) at each iteration of an MCMC algorithm to unbiasedly estimate the likelihood for a given static parameter value. However, pMCMC can be computationally intensive when a large number of particles in the PF is required, such as when the data is highly informative, the model is misspecified and/or the time series is long. In this paper we exploit the ensemble Kalman filter (EnKF) developed in the data assimilation literature to speed up pMCMC. We replace the unbiased PF likelihood with the biased EnKF likelihood estimate within MCMC to sample over the space of the static parameter. On a wide class of different non-linear SSM models, we demonstrate that our new ensemble MCMC (eMCMC) method can significantly reduce the computational cost whilst maintaining reasonable accuracy. We also propose several extensions of the vanilla eMCMC algorithm to further improve computational efficiency and allow for approximate posterior hidden state inference.


Introduction
Particle Markov chain Monte Carlo (pMCMC, Andrieu et al., 2010) is now a popular method for performing Bayesian statistical inference on challenging state space models (SSMs) with unknown static parameters. The appeal of pMCMC is that it is a pseudomarginal method (Andrieu and Roberts, 2009), which attempts to mimic the ideal sampler that proposes directly over the space of the static parameters and integrates out the hidden states. Furthermore it is an exact approximation, exactly targeting the true posterior distribution.
Each static parameter proposal in pMCMC is evaluated using a particle filter (Gordon et al., 1993). Particle filters were originally proposed to solve the state space filtering problem: inferring the state parameters at a given time under known static parameters. To do so they propagate a set of particles through the state space model, and use a weighting and resampling process to concentrate on the particles with significant posterior weights. Using particle filters in pMCMC is costly. Firstly, each particle filter involves processing the entire dataset. Secondly, a particle filter can require a large number of particles, especially when the data are highly informative and/or the model is misspecified. This is because there must be enough particles to randomly propagate forwards to produce good matches to unlikely data. Thus, despite the popularity of pMCMC, it is generally a highly computationally intensive method.
Data assimilation (DA) is a field of research originating in the geosciences, initially based on the problem of numerical weather prediction. The task most commonly addressed in this field is the estimation of the state of a dynamical system, based on a dynamic model (usually a system of partial differential equations) and noisy and/or indirect measurements of this state. In this paper we take inspiration from the DA literature to propose a new approach to estimating the posterior distribution of static parameters in SSMs.
The field of DA has evolved in parallel to other fields in which SSMs play an important role, such as target tracking, economics and statistical ecology. The distinguishing feature of problems in DA is the large dimension of the state space. For example, in numerical weather prediction the state space consists of a representation of the state of the atmosphere across the globe, which for modern applications can have dimension d x of the order of 10 9 (van Leeuwen, 2015). The traditional approach to estimating the dynamic state in DA is to use approaches that solely estimate the mode of the state posterior (e.g. 4DVar) or Kalman filters that make use of approximations so as to avoid storing the full state covariance, whose size scales quadratically in the state dimension. Such methods have huge practical importance and are still deployed in DA applications, but more recent research has focussed on methods that improve the accuracy of state estimation when using nonlinear dynamics. As in other fields where this the case, particle filters are an important methodology.
Particle filters are not the usual method of choice in DA. The reason is their degeneracy when used on states of high dimension (Snyder et al., 2008). This degeneracy arises due to the limitations of importance sampling in high dimensions: the variance of importance sampling estimators depends on the distance between the target and proposal distributions, and this distance grows with dimension such that the variance is only controlled by using a number of importance samples that is exponential in the dimension (see Agapiou et al., 2017 for a review). To combat this degeneracy, the approach usually taken in the particle filtering literature is to introduce diversity into the sample through MCMC updates of the state (Beskos et al., 2014). However, in many problems in DA, MCMC updates are often not available due to the use of an intractable dynamic model, and where available may have a low acceptance rate. In this case of an intractable dynamic model, most advanced particle filtering techniques are not available, and hence the class of models to which pMCMC is practicable for inferring the posterior distribution of static parameters is relatively limited: roughly speaking the dimension of both the state space and the static parameter must be less than 10, the length of the observed data must be less than 100,000, and simulations from the model cannot be radically different from the observed data. In such situations approximate Bayesian computation (Sisson et al., 2018) or Bayesian synthetic likelihood (Price et al., 2018) can be used as an alternative, but this approach necessarily conditions the inference on summary statistics of the data which can result in posterior distributions that are very different to the truth (see Fasiolo et al., 2016 for an introduction to these alternative methods and a comparison of their performance with pMCMC). The focus in the current paper is on estimating the posterior distribution of static parameters in situations where pMCMC, for the reasons outlined above, becomes too computationally demanding when implemented on a desktop or laptop computer, and when we wish to avoid reducing the full dataset to a set of summary statistics.
An alternative means of maintaining diversity to using MCMC updates in a particle filter is given by the ensemble Kalman filter (EnKF) (Evensen, 1994; see Katzfuss et al., 2016 for a tutorial). This approach propagates a set of particles (often referred to as "ensemble members") through the dynamic model in the same way as the bootstrap particle filter, but instead uses these particles to approximate a Gaussian representation of the state distribution. This method approximates the Kalman filter and provides a means to avoid storing and manipulating the state covariance matrix. Further, it has also been shown to perform well when applied to nonlinear dynamic models. Its performance is often superior to the particle filter in cases where the particle filter suffers from degeneracy, including but not limited to the case of a state space of high dimension.
Methods from DA have also been applied to the situation of inferring an unknown static parameter simultaneously with the state. The standard approach is to augment the state vector with the static parameter, then to apply one of the previously mentioned filters to this augmented state (see, for example, Evensen, 2007). In this case, the EnKF assumes that both parameters and states follow a linear Gaussian state space model. When this assumption is unreasonable, another approach is to combine the EnKF likelihood with a particle representation of the static parameter (Stroud et al., 2018;Katzfuss et al., 2019). To mitigate degeneracy, the static parameter is allowed to dynamically vary, by adding Gaussian noise to each parameter particle. This step can be further refined via the kernel resampling strategy of Liu and West (2001). However, additional tuning parameters must be specified (e.g. to control the smoothness of the kernel) and the particle approximation can be sensitive to these choices, and in particular, the number of particles used (Vieira and Wilkinson, 2016). We note that maximum likelihood methods are also possible (see e.g. Mitchell and Houtekamer, 2000;Stroud et al., 2010;Carrassi et al., 2017). Katzfuss et al. (2019) propose the use of the EnKF as a substitute for the particle filter in pMCMC, where the filter in each case is run with fixed static parameters to produce a likelihood estimate, which is used within a Metropolis-Hastings algorithm (Minvielle et al., 2014 developed a similar approach in the Physics literature). By analogy with pMCMC we refer to this approach as "ensemble MCMC" (eMCMC). The contributions of this paper are: several extensions of eMCMC to further improve computational efficiency and reduce the bias in the EnKF estimate of the likelihood; and a more extensive empirical study than that of Katzfuss et al. (2019) where the focus is mainly on other EnKF-based approaches. This demonstrates that the method can be successful on a wider variety of more challenging applications. Moreover, we compare pMCMC with eMCMC, investigating the reduction in the number of particles/members required and the ability of eMCMC to cope better with informative or surprising data. We find that eMCMC exhibits a reduced computational cost relative to pMCMC and also improves the accuracy relative to a state augmentation approach using the EnKF where static parameters are treated as time-varying.
The rest of this article is structured as follows. In Section 2 we provide the necessary background on state space models, pMCMC and EnKF to understand our method. The eMCMC approach together with extensions is described in Section 3. Section 4 shows the results of our approach on a wide class of different models. We discuss limitations, further extensions and possible future research in Section 5.

Background
This section describes relevant existing work. Section 2.1 defines state space models. Section 2.2 describes pseudo-marginal Metropolis Hastings and the bootstrap particle filter, which can be used to perform inference for these models. Section 2.3 introduces the EnKF.

State Space Models
A state space model is a model for sequential data. It introduces a Markov chain of latent states x 1 , . . . , x T . Independent noisy observations y t are available that depend on the state x t . Let x = (x 1 , . . . , x T ) denote the collection of all latent states and y = (y 1 , . . . , y T ) the collection of all observations. The model can be defined using an evolution distribution for x t+1 |x t , θ and an observation distribution y t |x t , θ. Here θ is a vector of parameters controlling the model's behaviour. We also specify a distribution for an initial state x 0 . For more background on state space models see for example Särkkä (2013).
Throughout the paper we will make some standard assumptions about state space models. We will assume that each x t and y t are random vectors with support R dx and R dy respectively. In this section we assume the distributions above -evolution, observation and initial state -have densities p(x t+1 |x t , θ), p(y t |x t , θ) and p(x 0 ). The material immediately generalises to the case where some or all of these distributions have probability mass functions instead (by interpreting these as densities with respect to the counting measure). This is required in several of our examples. A point mass can be used for the initial state distribution if the initial state is known.
As we shall see, the EnKF is restricted to certain observation models. Hence in this paper we focus on one particular case, where P is a d y × d x matrix and S is a variance matrix, possibly a function of θ. (We assume conditional independence of the y t 's given x and θ.) The EnKF can also be used where P is replaced by a time dependent matrix P t . The case where P = I gives a complete observation regime, in the sense that all components of x t have a corresponding noisy observation. In contrast, a partial observation regime only allows observation of a subset of the components e.g. by taking P to be a projection matrix.
In practice we may wish to model states at a finer time discretisation than that at which the observation data are available. For example, consider the case where we only have observations y t at t = k, 2k, . . . , kL. This can easily be converted into the framework described above, by defining a state space model with x * τ = x τk and y * τ = y τk for τ = 1, 2, . . . , L.
The joint density of the latent states and observations in a state space model is: The likelihood can be found by marginalisation i.e. integrating out the latent states x, (If there is an observation y 0 , a factor p(y 0 |x 0 , θ) can easily be included in (2) and (3).) Bayesian inference assigns a prior p(θ) to the parameters and targets the posterior p(θ|y) ∝ p(θ)L(θ). The likelihood L(θ) typically cannot be evaluated as it is a high dimensional integral. One strategy to perform inference is to instead consider an augmented target density (often of interest in its own right), the joint posterior p(θ, x|y) ∝ p(θ)p(x, y|θ). The posterior for θ can then be obtained by marginalisation.

Pseudo-marginal MCMC, Particle Filters, and Particle MCMC
Monte Carlo algorithms are designed to sample from a target distribution, often a Bayesian posterior distribution. Markov chain Monte Carlo (MCMC) does so using a Markov chain which converges to the target distribution in the long run. Performing each update in MCMC typically requires likelihood calculations, which are not possible for models with intractable likelihoods. However it is often possible to produce unbiased likelihood estimates. Algorithm 1, pseudo-marginal Metropolis Hastings (PMMH) (Andrieu and Roberts, 2009), makes use of these to perform parameter inference. Unbiased likelihood estimates for state space models can be produced by particle filter algorithms. Algorithm 2 presents the basic bootstrap particle filter (BPF) used in this paper, but there are many variations. For more details see for example Johansen (2011), Särkkä (2013) and Fearnhead and Künsch (2018). For proof that the particle filter likelihood estimate is indeed unbiased see Del Moral (2004) and Pitt et al. (2010).
Combining the PMMH algorithm with a particle filter can target p(θ|y) for state space models. Andrieu et al. (2010) extend this approach to give particle MCMC (pM-CMC), which targets the joint posterior p(θ, x|y); we refer the reader to this paper for a full description of pMCMC.
St N (or, if a y 0 observation is available, take the product from t = 0.).

PMMH Tuning
PMMH using BPF likelihood estimates has several tuning choices. This section sets out the approach we use to make these choices in this paper. Our choices are consistent with the theoretical analyses of Sherlock et al. (2015) and Doucet et al. (2015), who derive tuning recommendations under two different sets of simplifying assumptions.
We select the number of particles N for the BPF prior to running Algorithm 1 so that the estimated log-likelihood at a representative parameter value has a standard deviation of roughly 1.5. The parameter value used should have good support under the posterior; we typically use marginal posterior medians from exploratory analyses.
We use a normal random walk proposal distribution: θ * ∼ N (θ i−1 , Σ RW ). We take Σ RW to be an estimate of the posterior variance, again taken from exploratory analyses. Sherlock et al. (2015) and Doucet et al. (2015) provide guidance for scaling the variance matrix by a scalar to improve performance, finding that this was helpful for high dimensional target distributions for instance. We did not find this necessary for our analyses of low dimensional targets, but make use of this approach in Section 4.6 where an 11-dimensional target is considered.

Ensemble Kalman Filter
Here, we give a brief overview of the ensemble Kalman filter (Evensen, 1994) and refer the reader to Katzfuss et al. (2016) and the references therein for further details.
Consider the task of generating samples according to the filtering density p(x t |y 1:t ) where y 1:t = (y 1 , . . . , y t ). (We omit explicit conditioning on the parameter vector θ throughout this section.) The EnKF generates approximate draws from p(x t |y 1:t ) via a sequence of forecasting and updating steps. Suppose that a sample {x The forecast density p(x t |y 1:t−1 ) is then approximated by where N (·; μ, Σ) denotes the multivariate Gaussian density with mean μ and variance matrix Σ. The quantitiesμ t|t−1 andΣ t|t−1 are typically taken to be the sample mean and variance computed from the forecast ensemble (some extensions of the EnKF use alternative estimates; see Katzfuss et al., 2016 for some common approaches). Now, given the linear Gaussian form of (1), the joint distribution of X t and Y t (given y 1:t−1 ) can be obtained approximately as Hence, conditioning on Y t = y t gives p enkf (x t |y 1:t ) = N (x t ;μ t|t ,Σ t|t ) whereμ t|t =μ t|t−1 +K t (y t − Pμ t|t−1 ),Σ t|t = (I dx −K t P )Σ t|t−1 andK t is an estimate of the Kalman gain, that isK It is then straightforward to generate samples from (5) to be used as the filtering ensemble at the next time point. However, rather than explicitly calculate the filtering density in (5), the standard implementation of the EnKF (see e.g. Katzfuss et al., 2016) performs a shifting step, which is equivalent under the Gaussianity assumption (4) (and a Gaussian prior for x 0 ). For each particle (known in this context as an ensemble member ), we compute x is a pseudo-observation. Note that the shifting step only requires a draw from a d y variate Gaussian distribution per particle, rather than a draw from a d x variate Gaussian if (5) is sampled directly. Moreover, the shifting approach does not make the strong assumption that the forecast ensemble is Gaussian distributed. We note that there are other schemes for performing the shifting, but we do not consider these here.

Given a sample {x
(1) 0 , . . . , x (N ) 0 } from the state prior, the EnKF recursively alternates between computing the forecast ensemble and shifting each ensemble member, giving approximate draws from the filtering density p(x t |y 1:t ) for t = 1, . . . , T . We state a version of the EnKF based on the shifting step described above as Algorithm 3.
The EnKF is most easily understood in the context of a linear Gaussian state space model. In this special case, the filtering distribution p enkf (x t |y 1:t−1 ) converges to the true filtering distribution as the number of ensemble members N → ∞. Essentially, the EnKF converges to the Kalman filter. For finite N and a linear state space model, the EnKF approximates the Kalman filter by replacing the mean and variance of the forecast distribution with their sample equivalents. The resulting dimension reduction (that only requires storing and manipulating d x -vectors) avoids the potentially expensive calculation and storage of the forecast variance matrix. Moreover, several studies (e.g. Lei et al., 2010;Houtekamer et al., 2014;Katzfuss et al., 2019) have found that the EnKF shifting step works well for non-Gaussian evolution densities. We therefore consider the use of the EnKF likelihood inside a Metropolis-Hastings scheme. We provide a motivation and give details of the proposed approach in the next section.

Ensemble MCMC
It is well known that as the variance of the likelihood estimator increases, the acceptance probability of the pseudo-marginal MH scheme rapidly decreases to 0 (Pitt et al., 2012), resulting in slow mixing behaviour of the parameter chains. As discussed in Section 2.2, a value of N (the number of particles) can be chosen to balance mixing performance and computational cost. Nevertheless, in scenarios where the stochasticity inherent in the state process dominates the observation variance, the number of particles required to maintain a reasonable likelihood variance is likely to render BPF-driven PMMH computationally infeasible. Methods that aim to alleviate this problem include the use of an auxiliary particle filter (see e.g. Golightly and Wilkinson, 2015), which requires careful exploitation of the model structure in order to propagate particles conditional on the observations. By comparison, the algorithm studied here is simple to implement and, for the simplest implementation, does not require the specification of any additional tuning parameters.
Here we outline the ensemble MCMC (eMCMC) algorithm proposed in Katzfuss et al. (2019). In essence, this is PMMH using the EnKF as a fast replacement for the BPF to estimate the likelihood L(θ). In the following subsections we discuss some extensions to improve its efficiency.
First we derive a likelihood estimate based on EnKF calculations. Recall that the (marginal) likelihood can be factorised as From (4) it follows that an EnKF approximation of p(y t |y 1: which can easily be computed for each t = 1, . . . , T , with the notational convention that p(y 1 |θ) = p(y 1 |y 1:0 , θ). The need for the explicit dependence on N for this likelihood estimate will become clearer later in this section. The overall approximation to the likelihood is given byL The EnKF including likelihood estimation is given by Algorithm 3. The eMCMC scheme is then implemented by running Algorithm 1 withL replaced byL N enkf . One issue in implementing eMCMC is how to perform tuning. Due to the absence of specialised theory, we use the same tuning guidance as for PMMH with BPF likelihood estimates, described above in Section 2.2.
It is worth emphasising that, unlike pMCMC, the eMCMC posterior does not in general equal the posterior π(θ|y) exactly. The reason is that, unlike the BPF, the EnKF gives a biased estimator of L(θ), precluding its use for exact approximate inference. Nevertheless, as noted by Stroud et al. (2010), Stroud et al. (2018) and Katzfuss et al. (2019) among others, the variance of the likelihood estimator under the EnKF can be relatively small, suggesting that use of EnKF inside a Metropolis-Hastings scheme is likely to be of practical use, particularly in scenarios when the BPF is computationally prohibitive.
In fact, even when the forecast ensemble is exactly Gaussian distributed for all t, the EnKF posterior still does not target the exact posterior, since N (y t ; Pμ t|t−1 , PΣ t|t−1 P + S) is a biased estimate of the idealised normal density should we be able to take N → ∞.

Algorithm 3 Ensemble Kalman filter.
Input: number of ensemble members N Initialise.
Thus, for finite N , the eMCMC target is not the idealised eMCMC target, p ∞ enkf (θ|y). However, we find empirically that our method appears to be weakly dependent on N . Given this, we suggest to choose N to maximise the computational efficiency by borrowing similar advice from the pseudo-marginal literature (as described in Section 2.2). Interestingly, there is an exactly unbiased estimator of a normal density given a sample from it, and we exploit this in Section 3.3. We discuss the unbiased version and other extensions below.

Randomised Quasi-Monte Carlo
For this subsection, all quantities are conditioned on θ so we drop it for notational convenience. At iteration t of the EnKF we estimate μ t|t−1 and Σ t|t−1 , so that we can approximate the conditional likelihood p(y t |y 1:t−1 ) with a Gaussian density. These moments are estimated via firstly performing the shifting step at t − 1 and conditional on the result simulating from the forward evolution density. This is effectively an approximate sample from the joint distribution p(x t , x t−1 |y 1:t−1 ) = p(x t |x t−1 )p(x t−1 |y 1:t−1 ). Then μ t|t−1 and Σ t|t−1 are estimated from the N ensemble members.
Often it is possible to write the simulation from a standard statistical distribution as a function of a uniform random number. For example, to simulate a random draw y from a normal distribution, N (μ, σ 2 ), we can compute the following, y = μ + σ · Φ −1 (u) where u ∼ U(0, 1) and Φ −1 (u) is the quantile function of the standard normal density. Assume that we can write the evolution density as a function of m uniform random variates. Then, we require d y + m uniform random numbers to approximately simulate from x t , x t−1 |y 1:t−1 (d y for the shifting step and m for simulating the evolution density). Given N particles, we use N × (d y + m) uniform random numbers for estimating μ t|t−1 and Σ t|t−1 . The naive approach is to draw these via pseudo-random numbers. However, significant variance reduction could be achieved by simulating from the (d y + m)dimensional object N times using randomised quasi-Monte Carlo (RQMC, e.g. L'Ecuyer and Lemieux, 2005). QMC is well known to generate a sequence of numbers that have superior space filling properties in the unit hypercube compared to pseudo-random numbers. The randomised component ensures that expectations can be estimated unbiasedly. Random numbers from the joint distribution of interest, x t , x t−1 |y 1:t−1 , can be achieved via transforming the RQMC numbers as recently discussed. We use this approach to bring down the variance of the estimators of μ t|t−1 and Σ t|t−1 , which hopefully reduces the variance of the estimator for p(y t |y 1:t−1 ). For generating the RQMC numbers in this paper, we use the scrambled Sobol's net, i.e. the scrambled (t, m, s)-net in base b = 2.
RQMC has recently received increasing attention in the statistics community. Tran et al. (2017) document a faster convergence in their variational Bayes updating procedure when the noisy gradient is computed using RQMC, Drovandi and Tran (2018) use it to reduce the variance of expected utility estimation within Bayesian optimal design and Gerber and Chopin (2015) show the efficiency of RQMC in particle filtering. However their application to particle filtering requires considerable ingenuity (using a Hilbert curve method to perform resampling). It is interesting to note the ease with which RQMC can be exploited in the EnKF in comparison.

Correlated eMCMC
As mentioned above, the EnKF requires generating random numbers for the shifting step and simulating the evolution density. The former can be generated by standard normal random variates and the Cholesky factorisation of the covariance matrix. We assume in this section that the evolution density can be simulated either directly or indirectly via a suitable transformation with standard normal random numbers. Denote the collection of these random numbers required in the EnKF as u. Deligiannidis et al. (2018) and Dahlin et al. (2015) develop the correlated pseudomarginal MCMC method where they consider the joint target density p(θ, u|y) where u are random numbers required to estimate the likelihood unbiasedly, p (y|θ, u). It is easy to show that the θ-marginal of the joint distribution is the posterior of interest, p(θ|y). Assume that u are independent standard normal random variates. The idea of the correlated pseudo-marginal method is to induce correlation in successive likelihood estimates in MCMC by correlating the u random numbers. This can have the effect of mitigating "sticky" behaviour often seen in pseudo-marginal chains since, in the correlated scheme, if the likelihood is overestimated at the current iteration, it is also likely to be overestimated at the next. The joint proposal distribution of the correlated pseudo-marginal method is given by where I is the identity matrix. The proposal for u is the Crank-Nicolson proposal and it is invariant with respect to the marginal distribution of u. σ 2 u is an additional tuning parameter that is typically set to be small so that u * is highly correlated with u.
Here we consider applying this correlated pseudo-marginal approach to our eMCMC method, with the motivation that a smaller ensemble size N can be used, reducing computational cost. Note that BPF driven pMCMC requires additional modification to accommodate this approach, as it did for RQMC. Essentially, the resampling step has the effect of breaking down correlation between successive likelihood estimates. To alleviate this problem, the particles can be sorted before propagation e.g. using a Hilbert sorting procedure (Deligiannidis et al., 2018) or simple Euclidean sorting (Choppala et al., 2016). The random numbers used in the resampling step itself should also be updated using the Crank-Nicolson proposal. Since the eMCMC scheme does not use resampling, incorporating correlation is straightforward.

Unbiased Ensemble Kalman Filter Likelihood
As mentioned earlier, even if the sample from the forecast distribution was exactly Gaussian for some t, the corresponding EnKF likelihood estimate for y t would not be unbiased. In general, for some data y and a sample of size N from a Gaussian distribution, x = x 1 , . . . , x N ∼ N (μ, Σ), the density estimator N (y; μ N , Σ N ) is not an unbiased estimator of N (y; μ, Σ) where μ N and Σ N are the sample mean and covariance computed from the sample x. Given the bias present in the EnKF likelihood estimate, even when the Gaussian assumption is correct, the EnKF posterior, unlike standard pseudo-marginal, theoretically depends on N .
Even though we demonstrate empirically in Section 4 that the eMCMC posterior seems to be only weakly dependent on N , we present a new approach now that will likely be less sensitive to N . Interestingly, there does exist an unbiased estimator of a Gaussian density given only an iid sample from the same Gaussian density. Using the notation of Ghurye and Olkin (1969), let , and for a square matrix A write ψ(A) = |A| if A > 0 and ψ(A) = 0 otherwise, where |A| is the determinant of A and A > 0 means that A is positive definite. The result of Ghurye and Olkin (1969) shows that an exactly unbiased estimator of N (y; μ, Σ) is (in the case where y is Gaussian and N > d + 3 where d is the dimension of y) We propose to replace the standard Gaussian density estimator in the EnKF likelihood estimator with this alternative estimator. Note that this estimator has also been used in Price et al. (2018) for approximating intractable likelihoods in simulation-based likelihood-free estimation problems.
We refer to the method when we use the unbiased Gaussian density estimator in the EnKF likelihood estimator as ueMCMC. We stress that this approach still does not target the true posterior, but at least it will not depend on the number of ensemble members N when the Gaussian assumption is correct, i.e. the target is exactly the idealised approximation, p ∞ enkf (θ|y). Even though the forecast density is unlikely to be exactly Gaussian in practice, we do expect ueMCMC to be less sensitive to N compared with eMCMC. We note that this method might be particularly useful when combined with the correlated approach in Section 3.2, since it might be sufficient to use a very small N to achieve reasonable computational efficiency, but the small N may produce bias in the eMCMC posterior compared to the idealised eMCMC posterior. Prangle et al. (2018) apply pMCMC in the setting of approximate Bayesian computation (ABC). They outline a method for rejecting proposed values of θ that have a small estimated likelihood without running the whole particle filter. In Everitt and Sibly (2019) it is shown that this approach can be extended to pMCMC when using the BPF. A similar approach may be used in eMCMC. Suppose that the likelihood estimate from the EnKF is implemented sequentially, as the EnKF is running. Recall from (8) that the EnKF likelihood estimate is a product,L N enkf (θ) = T t=1 α t . Here α t = N y t ; Pμ t|t−1 , P Σ t|t−1 P + S is calculated in iteration t of the EnKF. We are guaranteed that an upper bound on α t is given by B(θ * ) := N (0; 0, S(θ * )), where 0 represents a vector of zeros of appropriate dimension. (See Algorithm 4 for more details on notation.) Thus α t /B ≤ 1. This fact ensures thatr (τ ) enkf := τ t=1 α t /B is an upper bound onL N enkf (θ)/B T which can be calculated at iteration τ of the EnKF. We use this property to propose an "early rejection" algorithm. The idea is that during an EnKF run, as soon asr enkf drops below a certain threshold, we are sure that the MCMC proposal θ * will not be accepted. Hence we can save time by immediately terminating the EnKF run. No time is saved for an accepted proposal; only for those that are rejected. Therefore the computational savings are largest in cases where the acceptance probability is low, as we would find, for example: in a higher dimensional parameter space when the proposal variance is not reduced accordingly; or when the likelihood estimate has a high variance. Algorithm 4 describes a single iteration of the resultant MCMC algorithm, which involves reorganising the order of calculation of the acceptance probability and likelihood estimate from our standard eMCMC algorithm. This early rejection approach is employed in Section 4.6, where a computationally expensive model is studied.

Results
Here we demonstrate the potential of our method on several examples with different kinds of complexity. We select the number of particles N and MCMC proposal variance as described in Section 2.2. Given that the different methods have different target distributions, we tune the random walk covariance matrix individually for each method.
In terms of accuracy we compare the approximate eMCMC and the 'exact' pMCMC approach visually. We note that in many applications it might not be critical to obtain samples from the exact posterior given the potential for model misspecification and/or high accuracy not being important for the analysis aims. When we deem the eMCMC approximation to be reasonable enough, we compare the statistical efficiency of the Algorithm 4 An iteration of early-rejection eMCMC.

Likelihood update.
Compute estimates of the forecast mean and variance: B T (θ * ) then reject θ * and break. end if 4. Shift ensemble. Compute the approximate Kalman gain,K t , given by (6).
two methods using effective sample size (ESS). ESS is the number of independent and identically distributed samples from the target that would produce an estimate with the same variance as the autocorrelated MCMC output. We generally use the multivariate ESS estimate of Vats et al. (2019) which takes posterior dependence into account (in one case where this performs poorly we instead use the univariate ESS of Plummer et al. 2006). We also consider overall efficiency -ESS per second -which incorporates both statistical efficiency and computing time.
In Section 5 we provide suggestions on how 'exact' posterior sampling can be achieved whilst still using the EnKF. However, the statistical efficiency gains of these approaches will be reduced compared to eMCMC.
Unless otherwise stated, we use the standard normal density estimator in eMCMC as opposed to the unbiased version in Section 3.3. We find empirically they produce similar results and there is a small amount of overhead and implementation effort with the unbiased normal density estimator. However, practitioners may decide to use the unbiased version given the theoretical benefit it brings.

Population Ecology Example
Model and Inference Task Peters et al. (2010) consider a set of competing nonlinear state space population models in ecology and apply them to several datasets. Denoting the observation at time t as y t and the corresponding hidden state as n t , the four models we consider are defined below: 1. Ricker model: log n t+1 = log n t + β 0 + β 1 n t + t .
Here t ∼ N(0, σ 2 w ). The observation process is assumed to be Gaussian, y t |n t ∼ N (log n t , σ 2 e ). See Peters et al. (2010) for a justification and some qualitative analyses of these models. The parameters are assumed independent a priori and have the following specifications: β 0 , β 1 , β 3 , β 5 ∼ N (0, 1), β 4 , σ w , σ e ∼ Exp(1) and log n 0 has an improper uniform prior over the real line.
Here we re-analyse the nutria dataset, a time series of female nutria abundance in East Anglia at monthly intervals, considered in Peters et al. (2010) and some references therein. The data is shown in Figure 1. It is useful to consider this class of models since various quantities of the model are tractable to compute, permitting more advanced particle filters to compare with. For example, the fully adapted auxiliary particle filter (see e.g. Pitt et al., 2010) pre-weights the particles by (using generic notation) p(y t |x t−1 , θ) = xt p(y t |x t , θ)p(x t |x t−1 , θ)dx t and propagates resampled particles by p(x t |y t , x t−1 , θ). This particle filter results in particle weights that remain uniform throughout the algorithm. It is the optimal onestep look-ahead particle filter. The fully adapted particle filter works extremely well in this application. It requires only N = 5 particles for superior performance compared to the bootstrap filter using N = 50,000 particles.
Although the quantities required for the fully adapted auxiliary particle filter can be obtained for this example, they cannot for a wide class of complex models. For many state space models, the transition density p(x t |x t−1 , θ) cannot be evaluated. In these cases, it is convenient to propagate particles according to the transition law so that intractable terms cancel in the importance weights. Thus we consider another particle filter for comparison, referred to here as the partially adapted particle filter, where p(y t |x t−1 , θ) is used to pre-weight particles and the transition density is used to propagate the resampled particles. However, in many models evaluating p(y t |x t−1 , θ) is also intractable. Often it is approximated with p(y t |μ t|t−1 , θ) where μ t|t−1 is given by some location measure of the x t |x t−1 , θ (e.g. mean). We find that the partially adapted approach produces similar results to the standard bootstrap filter.
For the results presented below, we assume that it is only feasible to simulate the transition density, and thus compare with pMCMC using the bootstrap filter, given that the partially adapted filter produces similar results.

Inference
It is likely that all the considered models are misspecified but we would like a robust method for fitting them in order to compare the models and investigate possibilities for extending the models. We find that all models have particular difficulty in capturing the sudden drop in abundance between months 107 and 108. Further, there appears to be only small observation error. The consequence for the bootstrap filter is a very small ESS and high variance estimates of the likelihood unless a very large number of particles is used.
For eMCMC, we only require N = 250 (Ricker, Flexible-Allee, theta-logistic) and N = 200 (mate-limited) particles. In contrast, we use N = 50,000 for pMCMC. For some of the models, the standard deviation of the estimated log likelihood is still larger than 1.5 even with this large number of particles. However, we find that when these occur the distribution of the log-likelihood estimator with the BPF has a skew-left distribution, which is less problematic for pMCMC getting stuck at overestimated log likelihood values. We find that the pMCMC acceptance rates remain reasonable with N = 50,000 particles.
The MCMC acceptance rates for the four models are 15%, 4%, 11% and 10% (eM-CMC), and 8%, 3%, 6% and 5% (pMCMC), respectively. The acceptance rates are lower for the theta-logistic model as the posterior distribution is far more irregular compared to the other three models (see Figure 3).
Based on Figures 2, 4 and 5, eMCMC obtains estimated univariate posterior distributions that are remarkably similar to pMCMC. There is more difference for the theta-logistic model ( Figure 3) but they remain broadly similar. Further, the Monte Carlo error is greater for this model, potentially exaggerating the differences. The two methods are compared in terms of computational efficiency on the four models in Table 1. It is evident that the eMCMC approach produces a two order of magnitude improvement in terms of computational efficiency and still produces reasonable approximations of the posterior.
We test eMCMC for a range of N values in 100-1000 and find the univariate posteriors to show little sensitivity to N (results not shown). For N = 100, the MCMC acceptance rate drops substantially and reducing N further is likely to significantly reduce the statistical efficiency of MCMC due to the high-variance likelihood estimates. Therefore, it is difficult to test the sensitivity of the results to small N .
However, using the correlated extension (with σ u = 0.1) allows us to use small N and maintain statistically efficient results. Similar MCMC acceptance rates as eMCMC with 250 particles can be achieved using only N = 25 particles. The efficiency results can be seen in Table 1. It is evident that the correlation further improves the computational efficiency in this example. The resulting approximate marginal posteriors compared to eMCMC with N = 1000 are shown for the four models in the figures of Appendix A of the Supplementary Materials (Drovandi et al., 2020). It is clear that similar approximate posteriors are obtained even with vastly different N values. However, for all models there is a noticeable bias in the approximate posterior for σ w . We also run the unbiased version    of Section 3.3 with the correlated extension, again for N = 25. The same figures in the appendix demonstrate that ueMCMC is able to reduce the bias in the approximate posterior for σ w . The largest difference between the results for N = 1000 and N = 25 occurs for the theta-logistic model. For N = 25, the unbiased version seems to offer some correction for θ w and θ e but produces similar results to the biased version for the other parameters. We find that with the unbiased version the ESS remains similar, but the overall efficiency is slightly reduced. The reduction in computational efficiency mainly comes here from the extra time to compute the unbiased multivariate normal density estimator (the ESS is roughly the same). We note that for applications where simulating the transition density consumes the majority of the computation, the additional time associated with computing the unbiased estimator will be significantly less noticeable.
Finally, we investigate improvements that can be obtained in this example when using the RQMC extension. Here we use N = 50 ensemble members for each model. It is evident that the RQMC extension produces similar marginal posteriors compared to eMCMC with N = 1000 particles (see Appendix B of the Supplementary Materials). The ESS values for the four models are roughly 2500, 450, 1700, and 2100, which are competitive with standard eMCMC using a significantly larger, N = 200-250, number of particles (see Table 1). However, the ESS/Time scores for the four models are only roughly 900, 130, 500 and 620 with the RQMC extension. Given that simulation of the transition density is trivial in this example, the cost associated with generating the RQMC samples is significant and consequently the ESS/Time score with the RQMC extension is substantially reduced. However, in complex examples where simulating the transition density is expensive, the cost associated with RQMC will be far less noticeable.

Model and Inference Task
The Lorenz 63 dynamical system (Lorenz, 1963) is a classic low dimensional example of chaotic behaviour. An Itô stochastic differential equation (SDE) version from Vrettas et al. (2015) is Here X t is a vector of the random variables X 1,t , X 2,t , X 3,t , and W t is a vector of standard uncorrelated Brownian motion processes of the same length as X t . Note that Σ 1/2 is interpreted as a matrix square root. We assume each X i,t at a grid of prespecified t values has a corresponding observation Y i,t ∼ N (X i,t , σ 2 obs ), and that these are independent.
Exact simulation of SDEs is extremely challenging, so it is common to work with an Euler-Maruyama discretisation (see e.g. Wilkinson, 2018). For the Lorenz model above this gives, where each z i+1 is an independent N (0, I 3 ) realisation. Then x i is an approximation to X t for t = iΔt.

Log Likelihoods
First we compare log likelihood estimates produced by the EnKF and BPF. We run each method 5 times for θ 1 = 1, 2, . . . , 20. The other parameters are held constant at their true values. We use 100 particles for both filtering methods. The average run-times were roughly half as long for EnKF -0.019 s -compared to BPF 0.046 s. Appendix C of the Supplementary Materials shows the results. For any θ 1 value, the log likelihood estimates are more variable under BPF than EnKF. Variability becomes particularly large under BPF when θ 1 is far from its true value. Such high variance is problematic in PMMH, as it is likely to cause chains to become stuck. The figure suggests that the EnKF and BPF produce similar expected likelihood estimates when θ 1 is close to its true value. It is hard to draw any conclusions for other θ 1 values, as the BPF expected likelihood will be strongly driven by the upper tail of its log-likelihood estimates, and this would take a very large number of simulations to estimate well.

Inference
Here we assume σ obs is known, and attempt to infer θ i and σ i for i = 1, 2, 3. We assume these parameters have independent exponential prior distributions with rate 0.1. We ran the EnKF and BPF at the true parameter 30 times for each of various choices of N and calculated empirical log-likelihood variances. Based on these values we select N = 500 for eMCMC and N = 2500 for pMCMC.
We ran our algorithms targeting the log transformed parameters. Both pMCMC and eMCMC achieve acceptance rates in the range 10% to 20% indicating reasonable mixing. Trace plots also suggested good mixing, with no evidence of chains becoming stuck in the same state for a large number of iterations. The ESS values for the MCMC outputs were 390 (eMCMC) and 197 (pMCMC). Run times were 689 s and 10,992 s for eMCMC and pMCMC respectively. Interestingly, the pMCMC run time is roughly 15 times that of eMCMC despite using only 5 times as many particles. Figure 7 shows the resulting marginal posterior estimates. The eMCMC posterior approximation is similar to the gold standard pMCMC results, but there are some noticeable differences for some parameters e.g. the θ 1 and θ 3 posterior marginals are shifted downwards. Posterior correlations were small for both MCMC methods (all below 0.35 in magnitude).
To validate our ensemble size, we ran eMCMC again using N = 25, 50, . . . , 600, with other tuning choices unchanged. ESS per second was maximised by N = 125 and was 62% larger than using N = 500 as above. This shows that our tuning diagnostic was reasonably accurate. (We found multivariate ESS behaved poorly for MCMC output with low acceptance rates e.g. when N = 25. So here we used the univariate ESS of Plummer et al. 2006 instead, averaged over parameters.) We also ran RQMC and correlated variants of eMCMC (using σ u = 0.1 for the latter). For RQMC, initial tuning based on variance of the log likelihood selected N = 500, as for eMCMC. For correlated eMCMC we used a reduced number of particles, N = 100. Posterior marginals are shown in Figure 7 and are extremely similar to eMCMC results. RQMC eMCMC produced a similar acceptance rate and ESS value (305) to eMCMC, but the cost of QMC sampling increased the run time to 3,073 s (roughly a 5 times increase). Correlated eMCMC increased the acceptance rate (to 26%) and ESS (to 417) while also reducing the run time (to 195 s).
Finally, we implemented the particle EnKF (pEnKF) method of Katzfuss et al. (2019) (see their Algorithm 4). This algorithm avoids the need for MCMC. Instead, it evolves a population of static parameters (particles) over time, where each particle has an associated ensemble for the hidden state. The approach uses the EnKF approximation of the likelihood to re-weight the particles. The ensemble of latent states is propagated by the transition density and the particles are propagated via resampling and a jittering step. Our implementation used 10,000 particles, each with 100 ensemble members, which we found gave reasonable performance. However a difficulty of this algorithm is that it is unclear how to make these tuning choices optimally. Our run time was 2,529 s, slower than eMCMC. Figure 7 shows that the resulting posterior marginal approximations are significantly less accurate than eMCMC.

Lorenz 96 Example
The Lorenz 96 dynamical system (Lorenz, 1996) is often used to test the performance of data assimilation methods in high dimensions, and we use it for this purpose here to test the eMCMC algorithm. We introduce an SDE version. This again follows the general SDE (9), but now the ith entry of α(X t , θ) is defined as for i = 1, 2, . . . , d x , and Σ = σ 2 I. Here X t is a vector of the d x random variables X 1,t , X 2,t , . . . , X dx,t . Addition and subtraction of indices above are interpreted modulo d x so that, for example, when i = 1 then X i−1,t is X dx,t . We work with the discretised SDE (10). We simulated a dataset with d x = 50, θ = (1, 1, 8), σ 2 = 10, σ 2 obs = 25 and performed inference on θ and σ. Other details are as in the Section 4.2.
Our tuning diagnostic for the number of particles suggested using N = 5,000 for EnKF. We ran eMCMC for 10,000 iterations. We got an acceptance rate of 18% indicating good mixing. The ESS was 321 and the compute time was 79,097 s. We could not compare with gold standard pMCMC results here, as this algorithm was not computationally feasible for this high dimensional example (as even with 10,000 particles, log likelihood estimates had standard deviations far above our diagnostic target of 1.5, while MCMC runtimes were in the order of days). However, Appendix D of the Supplementary Materials shows that the approximate posterior marginals from eMCMC are consistent with the true parameter values. This demonstrates that eMCMC can produce sensible results for problems where the state has (moderately) high dimension.

Model and Inference Task
The Lotka-Volterra predator-prey model (e.g. Boys et al., 2008) describes the continuous time evolution of a non-negative integer-valued process X t = (X 1,t , X 2,t ) where X 1,t denotes prey and X 2,t denotes predator. Starting from an initial value, X t evolves according to a Markov jump process (MJP) parameterised by rate constants c = (c 1 , c 2 , c 3 ) and characterised by the instantaneous rate or hazard function h(x t , c) = (h 1 (x t , c 1 ), h 2 (x t , c 2 ), h 3 (x t , c 3 )) . Transitions over (t, t + dt] take the form of one of three types (prey reproduction, prey death / predator reproduction, predator death) with associated probabilities given by

The hazard function for this system is
It is then relatively simple to generate realisations of this process via Gillespie's direct method (Gillespie, 1977), where at time t, the dwell time between transition events is drawn from an exponential distribution with rate h 0 (x t , c) = 3 i=1 h i (x t , c i ) and the transition is type i with probability proportional to h i (x t , c i ).
We assume that the MJP is observed with Gaussian error so that As all parameters of interest must be strictly positive, we consider inference for θ = (log c 1 , log c 2 , log c 3 , log σ 1 , log σ 2 ) .
We consider two synthetic data sets (D 1 and D 2 ) simulated with rate parameters c = (0.5, 0.0025, 0.3) and initial condition x 0 = (71, 79) . We further assume σ 1 = σ 2 = 1 to be unknown. To allow the analysis of two data-poor scenarios, dataset D 1 has 51 equally spaced observations on [0, 50] and dataset D 2 is constructed by thinning D 1 to give 26 equally spaced observations on [0,25].

Inference
We compare the performance of EnKF to the gold standard auxiliary particle filter (APF) driven pMCMC scheme described in Golightly and Wilkinson (2015). In brief, state particles are propagated using Gillespie's direct method, with the hazard function replaced by an approximate conditioned hazard, derived from a linear Gaussian approximation to the MJP. Full details of this approach, including the calculation of the particle filter weights can be found in Golightly and Wilkinson (2015).
We follow the practical advice given in Section 2.2 to choose the number of particles / ensemble members N and the scaling of the innovation variance in the random walk proposal distribution. We assumed independent uniform U (−8, 8) priors for the components of θ and ran both eMCMC and pMCMC for 10 5 iterations. Since the EnKF treats the state as continuous, eMCMC used a reflecting barrier at 0 to avoid the state of the system going negative.
The results are summarised by Table 2 and Figure 8, with additional results shown in Appendix E of the Supplementary Materials. We see that for both data sets, the output of eMCMC is consistent with the true values that produced the data and, more importantly, the ground truth posterior based on the output of pMCMC. For dataset D 1 , eMCMC required more particles than pMCMC but gives better overall efficiency (as measured by the ESS per second) since sampling from the propagation construct in the auxiliary particle filter is relatively expensive. We see an increase of about a factor of 3. For dataset D 2 , the number of particles required by pMCMC must be increased, since the propagation construct is based on a linear Gaussian approximation of the true (but unknown) hazard function of the conditioned MJP. The construct breaks down as observations are made sparsely in time (and the dynamics of the conditioned process are nonlinear between observations). Ensemble MCMC on the other hand seems to work well, requiring even fewer particles than for D 1 . We see an increase in overall efficiency (compared to pMCMC) of a factor of around 55.

Model and Inference Task
A commonly used mechanism for auto-regulation in prokaryotes which has been wellstudied and modelled is a negative feedback mechanism whereby dimers of a protein repress its own transcription (e.g. Arkin et al., 1998). A simplified model for such a prokaryotic auto-regulation, based on this mechanism of dimers of a protein coded for by a gene repressing its own transcription into RNA, can be found in Golightly and Wilkinson (2005) (see also Golightly and Wilkinson, 2011).
Let X t = (X 1,t , X 2,t , X 3,t , X 4,t , X 5,t ) denote the number of copies of the unbound gene X 1,t , bound gene X 2,t , RNA X 3,t , protein X 4,t and dimers of the protein X 5,t . We assume that X t evolves according to a Markov jump process. The possible transitions can be succinctly described by the pseudo-reaction list where, for example, occurrence of R 1 at time t reduces X 1,t and X 5,t by 1, increases X 2,t by 1, and leaves the remaining components unchanged. The associated hazard function is h(x t , c) = (c 1 x 1,t x 5,t , c 2 x 2,t , c 3 x 1,t , c 4 x 3,t , c 5 x 4,t (x 4,t − 1)/2, c 6 x 5,t , c 7 x 3,t , c 8 x 4,t ) .

Inference
We again compare the performance of eMCMC to the gold standard auxiliary particle filter driven pMCMC scheme described in Golightly and Wilkinson (2015). The number of particles / ensemble members N was chosen as in Section 2.2. We assigned independent Gamma Ga(1, 0.5) priors to each unknown rate constant and ran eMCMC and pMCMC for 2 × 10 5 iterations. Note that when running eMCMC for dataset D 2 , the (assumed known) values of σ 1 and σ 2 result in no shifting of the ensemble members, rendering this step ineffectual. We therefore ran eMCMC for this scenario by setting the measurement error variance to be "small" throughout the algorithm's execution. Specifically, we found that setting σ 2 1 = σ 2 2 = 0.01 gave reasonable mixing, at the expense of introducing additional bias into the eMCMC posterior. Table 3 and Figure 9 summarise the results. Appendix F of the Supplementary Materials shows posterior estimates for dataset D 2 . It is clear that eMCMC gives output that is consistent with the true values that produced the data and output from pMCMC, which exactly targets the posterior of interest. We therefore compare overall efficiency of eMCMC and pMCMC in terms of ESS per second, as reported in Table 3. For dataset D 1 , eMCMC requires half the number of particles of pMCMC and gives a comparable ESS value. In terms of overall efficiency, eMCMC outperforms pMCMC by around a factor of 4. For dataset D 2 , pMCMC requires around 2000 particles, due to the strict requirement of particle trajectories having to "hit" the observations to receive a non

Neuroscience Example Model
We investigate the following realistic Neural Population Model (NPM) for brain activity. This model (see e.g. Bojak and Liley, 2005) is known as the Liley model, and Bayesian inference for the parameters of this model has previously been described by Maybank et al. (2017). Here a high-level description of the model is presented; more detail can be found in this latter paper. The model is of two neural populations: one population that has an excitatory effect on the activity of connected neurons, the other that has an inhibitory effect. The model consists of the following differential equations, where k = e, i for excitatory and inhibitory contributions: where the Kronecker delta δ ek admits only excitatory noise input p (white noise with zero mean and fixed standard deviation) to this stochastic differential equation system, and where S is a sigmoidal activation function. The 14 state variables are timeevolving properties of the two populations: h e , h i (the mean soma membrane potentials); I ee , I ei , I ii , I ii (local reaction to synaptic inputs) and their differentials; and Φ ee , Φ ei (long-range propagation of activity) and their differentials. The parameters are Γ ee , Γ ei , Γ ie , Γ ii (the synaptic input peak amplitude) γ ee , γ ei , γ ie , γ ii (shape parameters of the post synaptic potentials) andp ee = 6603.4,p ei = 2625.7, σ = 0.01 (the mean rate of the extracortical inputs). The parameters Λ (decay scale of long-range connectivity) v (axonal conduction velocity) and d ee , d ei , d ie , d ii (rise times to peak of the post synaptic potentials) are fixed.
We model an electroencephalogram (EEG) time-series as noisy observations of the h e variable of this NPM, assuming that the EEG observations are linearly proportional to h e with some added observational noise: where Δt is some constant time-step and the z i are iid normal random variables z i ∼ N 0, σ 2 for i = 0, . . . , n − 1.
In this paper an input noise of variance 10 8 was used to simulate data, and the dynamics were simulated using the Euler-Maruyama method with step size 2.5 × 10 −3 . Appendix G of the Supplementary Materials gives the prior distributions for the parameters that were treated as unknown, giving the uniform priors that restrict the parameters to ranges found to be plausible in Bojak and Liley (2005); other parameters were fixed to values chosen from the ranges given by Bojak and Liley (2005).

Results
We compared the performance of eMCMC and pMCMC on data simulated from the Liley model for parameters that result in quasi-linear dynamics about a stable fixed point. The work in Maybank et al. (2017) suggests that the accuracy of the parameter posterior is likely to be improved by using a method that is suitable for non-linear systems (such as pMCMC) compared to using a linearised approach such as the extended Kalman filter or the approach introduced in Maybank et al. (2017). Both MCMC approaches are forms of Metropolis-Hastings, with a truncated multivariate normal proposal for the 11 parameters and covariance chosen to be 2.562 2 /11 times the estimated posterior covariance from pilot runs (this scaling being recommended by Sherlock et al., 2015). We considered a situation that is challenging for a particle filter, with a relatively small measurement noise of σ = 0.01.
We ran 40 chains of 1000 iterations of pMCMC and eMCMC on this data. All were initialised from the parameters at which the data was generated and used a burn in of 500 iterations. Based on the scheme described in Section 2.2, we chose 1000 particles for the BPF in pMCMC, and 100 ensemble members for the EnKF in eMCMC. Both algorithms were implemented with early rejection schemes, as detailed in Section 3.4. In both cases the early rejection results in a reduction in computational cost of approximately a factor of two; with this scheme each iteration of pMCMC took an average of 1415 s, compared to the average of 91 s for eMCMC. The mean acceptance rate for pMCMC was 0.33%, compared to 0.93% for eMCMC, indicating that eMCMC is (by this measure) is approximately three times as efficient whilst being more than 15 times faster. Pilot runs on longer simulated time series suggest that the efficiency of eMCMC (relative to pMCMC) improves as the length of the time series increases, but in these cases the computational cost of pMCMC was too large to permit a rigorous comparison. Kernel density estimates of the marginal posterior of each parameter are shown in Figure 10: we observe that the posteriors obtained by both methods are similar.

Discussion
In this paper we replace the BPF with the EnKF within a pMCMC algorithm. We have demonstrated on a variety of examples that significant computational gains can be achieved without sacrificing much on posterior accuracy.
We expect eMCMC to work relatively well on applications with characteristics that may reduce the performance of particle filters such as: intractable transition densities for complex models, sparse observations, small observation variance, surprising observations and/or model misspecification. Our methods may perform comparatively less well for models that have some level of tractability that permit more advanced particle filters. However, for the Markov jump process applications we obtained competitive results even when particle filters exploited cleverly-designed guided proposals and where eMCMC simply simulated directly according to the evolution density. In some sense, eMCMC is a middle ground between exact-approximate particle MCMC and pure data assimilation methods; we attempt to gain speed improvements over pMCMC via some data assimilation ideas and obtain more accurate posterior approximations compared to pure data assimilation by using a similar algorithm structure to pMCMC.
If exact posterior inferences are essential, there are likely to be ways to exploit our EnKF approach to improve computational performance. For example, it could be used as the cheap approximate likelihood within a delayed-acceptance MCMC algorithm (e.g. Sherlock et al., 2017 and or importance sampling scheme (Franks and Vihola, 2017). Alternatively, we might bridge our approximate posterior with the true posterior using sequential Monte Carlo. Further, our approach could be used in pilot MCMC runs to more quickly identify the regions of the parameter space with non-negligible posterior support and assist MCMC tuning generally. Particularly in the posterior tails we find that the EnKF likelihood estimator has significantly lower variance than the BPF likelihood estimator.
It is important to note that there will likely be many applications where the EnKF approximation may not be appropriate. The approach relies on being able to approximate the filtering distribution reasonably well with a Gaussian density. However, our paper illustrates that there are a wide class of models where our approach can provide reasonable accuracy. Further, Katzfuss et al. (2019) present a hierarchical approach for allowing non-Gaussian observation densities with EnKF methods, which would also be applicable to our approach.
We chose the EnKF to approximate the likelihood in this paper since it respects non-linear evolution models. However, we note that other approximate KF methods could be adopted. For example, the extended KF and its variations/extensions (see, for example, Law et al., 2015, chap. 4 andAsch et al., 2016, chap. 6). Further, for problems with higher state space dimensions, innovations in the EnKF literature such as localisation and inflation could be considered (see, for example, Evensen, 2009, chap. 15). Investigating these different approximate filters is an avenue for future research.
An alternative to MCMC is state augmentation: processing the data once to infer both θ and x. In Section 4.2 we implemented one such method, particle EnKF, and found that eMCMC performed better in terms of speed and accuracy. Nonetheless, this or other versions of state augmentation may well perform better in other settings. Further, our method is strictly an offline method whereas particle EnKF can also be applied in an online setting (see Table 1 of Katzfuss et al. (2019)). Alternatively, we suggest that our approach could be incorporated into the SMC 2 algorithm of Chopin et al. (2013), which uses an MCMC kernel for jittering particles and thus preserves the current target. We leave that for further research.
We did not consider posterior inference for the hidden states in this paper. It is possible to combine our method with the ensemble Kalman smoother of van Leeuwen and Evensen (1996), as proposed in Katzfuss et al. (2019). The framework described in Katzfuss et al. (2019) may also be used to extend the methods in this paper to nonlinear non-Gaussian measurement models through the use of a transformation.
In this paper we compared the most commonly used particle filter (the BPF, except in the Markov jump process examples) and EnKF (the stochastic EnKF) within MCMC algorithms. In future work it would be interesting to compare extensions to both approaches. Extensions to the BPF are familiar to many in computational statistics (e.g. adaptive resampling, MCMC rejuvenation moves, the auxiliary PF) and the improvements they can bring to particle MCMC algorithms are relatively well understood. In the paper we have seen how ideas previously used in the pMCMC context (i.e. using Quasi Monte Carlo, and the correlated approach) can also be exploited in the EnKF case. Other extensions and alternatives to the stochastic EnKF from the DA literature are also possible and have the potential to provide further improvements in the pM-CMC setting. Examples are: the deterministic EnKF (Tippett et al., 2003), which uses a deterministic rather than a stochastic transformation in the shift step, which may