Generalised linear mixed model analysis via sequential Monte Carlo sampling

We present a sequential Monte Carlo sampler algorithm for the Bayesian analysis of generalised linear mixed models (GLMMs). These models support a variety of interesting regression-type analyses, but performing inference is often extremely difficult, even when using the Bayesian approach combined with Markov chain Monte Carlo (MCMC). The Sequential Monte Carlo sampler (SMC) is a new and general method for producing samples from posterior distributions. In this article we demonstrate use of the SMC method for performing inference for GLMMs. We demonstrate the effectiveness of the method on both simulated and real data, and find that sequential Monte Carlo is a competitive alternative to the available MCMC techniques.


Introduction
Effective strategies for generalised linear mixed model (GLMM) analysis continues to be a vibrant research area.Reasons include: • GLMMs have become an indispensable vehicle for analysing a significant portion of contemporary complex data sets.• GLMMs are inherently difficult to fit compared with ordinary linear mixed models and generalised linear models.
• Existing strategies involve a number of trade-offs concerning, for example, approximation accuracy, computational times and Markov chain convergence.
Overviews of the usefulness and difficulties of GLMM-based analysis may be found in, for example, [23,27] and [28].
Most practical GLMM methodology falls into two categories: analytic approximations (e.g.[2]) and Monte Carlo methods (e.g.[6]).Monte Carlo methods have the advantage of providing direct approximations to quantities of interest [1].On the other hand, analytic approximations, such as Laplace approximation, are indirect and prone to substantial bias (e.g.[3]).The most common Monte Carlo approach is Markov Chain Monte Carlo (MCMC), where approximation accuracy is associated with Markov chain convergence.
[30] is a recent example of research concerned with practical GLMM analysis via Markov chain Monte Carlo.Those authors explored use of the MCMC computing package WinBUGS and showed it to exhibit good performance for a number of examples.
One of the major difficulties associated with using MCMC is the need to assess convergence.Popular methods for convergence assessment often rely on the comparison of multiple sample output; see [7] for a comparative review.These methods can invariably fail to detect a lack of convergence and one needs to be cautious when taking such an approach.Another major drawback of MCMC is the difficulty in designing efficient samplers for complex problems.The use of historical information from MCMC sample paths has to be treated very carefully, so that the equilibrium distribution of the Markov chain is not disturbed.Various methods have been proposed in the literature, (see [14]), however the practical applicability of these so-called adaptive methods can be limited.
Both problems associated with MCMC discussed above are inherently due to the Markovian nature of the MCMC sampler.Sequential Monte Carlo (SMC) methods provide an alternative framework for posterior sampling, which is not dependent on the convergence of a Markov chain as in the MCMC sampler case.Though careful assessment of posterior samples is also applicable in the SMC case, these are more in line with the standard Monte Carlo methods.SMC samplers can be seen as an extension of the well known importance sampling method.The fact that SMC methods do not rely on Markov chain theory means that it is a more flexible sampler.In the sense that for example, if the historical sample path of the SMC sampler is informative for the design of an efficient algorithm, this can be done quite easily in the SMC framework.In this article we show that sequential Monte Carlo methods provide an effective means of Bayesian GLMM analysis.We provide a general yet simple framework for efficient design of the sampler, and demonstrate that this approach is a viable alternative to MCMC, and since SMC samplers require a number of user-specified inputs, we will give recommendations in the GLMM framework on how these are chosen.
Section 2 contains a brief summary of Bayesian approaches to generalised linear mixed models.In Section 3 we provide details on analysis for such models via sequential Monte Carlo sampling.In Section 4 we present two examples.In a simulated Poisson regression example, we compare the efficiencies of the SMC sampler with alternative Monte Carlo methods, and then demonstrate the effectiveness of the SMC sampler in a binary logistic regression example involving real data.Some comparisons of algorithm efficiencies for the two examples are carried out in Section 5 and concluding remarks are given in Section 6.The software used for this paper is available from the authors on request.

Bayesian generalised linear mixed models
GLMMs for canonical one-parameter exponential families (e.g.Poisson, logistic) and Gaussian random effects take the general form [u|G] ∼ N (0, G) where here, and throughout, the distribution of a random vector x is denoted by [x] and the conditional distribution of y given x is denoted by [y|x].In the Poisson case b(x) = e x , while in the logistic case b(x) = log(1 + e x ).An important special case of ( 1)-( 2) is the variance components model [u|σ 2 u1 , . . ., σ 2 uL ] ∼ N (0, blockdiag 1≤ℓ≤L (σ 2 uℓ I q ℓ )).
(3) where q ℓ is the number of elements in u ℓ .While (3) is not as general as (1)-(2) it still handles many important situations such as random intercepts and generalised additive models [30].With simplicity in mind, we will focus on this GLMM for the remainder of the paper.However, the methodology of Section 3 is quite general and is extendable to more elaborate GLMMs.
In this study we have worked with diffuse conjugate priors although, once again, the methodology extends to other types of priors.To ensure scale-invariance all continuous predictors are standardised at the start of the Bayesian analysis.The prior on β β β is a diffuse Gaussian: for some large σ 2 β > 0. The prior for (σ 2 u1 , . . ., σ 2 uL ) is assumed to have independent components; i.e.
uℓ ] could be considered [18].These include an inverse gamma distribution, a uniform distribution, and a folded Cauchy distribution.In this paper we use a conditionally conjugate inverse gamma distribution: [ This prior distribution was advocated by [30] for A uℓ = 0.01.The prior is therefore fairly non-informative, yet results in a slightly simpler sampling procedure.It will be convenient to introduce some additional notation to enable the model to be described more succinctly.We start by writing We also write q β for the number of elements in β, and for the prior covariance of ν ν ν.Writing σ σ σ 2 for (σ 2 u1 , . . ., σ 2 uL ), we can then combine (3), ( 4) and ( 5) to give the joint density of all parameters and data: From this, and noting that ν ν ν , it is clear that the density of the posterior distribution of the parameters is simply proportional to the function In Section 3, we will develop a sequential Monte Carlo sampler to produce samples from the distribution proportional to π.

Sequential Monte Carlo sampling
The Monte Carlo approach to GLMM analysis performs inference by drawing samples from the joint posterior distribution of the parameters θ θ θ = (β β β, u, σ 2 u1 , . . ., σ 2 uL ).We write π(θ θ θ) for the (unnormalised) density of this posterior distribution.Instead of using a Markov chain with π as its stationary distribution to produce these samples, the sequential Monte Carlo (SMC) sampling method is a generalisation of importance sampling that produces a weighted sample from π while retaining some of the benefits of MCMC analysis [8].
The use of SMC for static problems (as opposed to particle filters for dynamic problems; [12] requires the introduction of auxiliary distributions π 0 , π 1 ,. . ., π S−1 .At stage s of the sampler we use a (weighted) sample from the previous distribution π s−1 to produce a (weighted) sample from π s .We set π S = π so that after S stages we have a sample from the posterior distribution of interest.The introduction of the intermediate distributions allows the initial distribution π 0 to be gradually corrected to resemble the target distribution π, and can often overcome problems such as particle depletion where, if the two consecutive distributions are too dissimilar, then a small number of particles carry all the weight in the final sample.
The auxiliary distributions can be constructed in several ways: [5] introduces the observations incrementally to evolve the distribution from the prior to the posterior; [13] uses a similar technique, but increases the size of the state space as more observations are added; [8] use and π 0 is chosen to be the prior distribution for the parameters.In this article, due to the diffuse nature of the prior distribution, the initial distribution π 0 is instead chosen to be a multivariate Normal distribution with mean and covariance matrix chosen based on estimates obtained using classical methods for fitting GLMMs.
The SMC sampler algorithm starts by sampling N samples, termed "particles", from the initial distribution π 0 .Denote by θ θ θ 0 i the ith particle at initial stage s = 0, and allocate weight w 0 i ≡ 1 to each of the N particles, so that {θ θ θ 0 i , w 0 i } is a weighted sample from π 0 .The SMC sampling technique uses the weighted particles from distribution π s−1 to produce particles from distribution π s through moving, reweighting and (possibly) resampling; see [8].For simplicity, the formulation we use is that described in detail in Section 3.3.2.3 of that paper, which essentially results in the resample-move algorithm used by [5] and [19].This is also similar to the annealed importance sampling method of [24], but the use of resampling within the algorithm greatly improves the efficiency of the method.Writing θ θ θ s i for the ith particle at stage s, at each stage 0 < s ≤ S of the algorithm we perform the following steps: .
, w s i } is now a weighted sample from π s .Resample If the effective sample size (ESS, [20]), defined as ( where k is some constant typically taken to be 1/2, then we perform stratified resampling [21].ESS estimates the number of simple random samples from the target distribution that are required to obtain an estimate with the same Monte Carlo variation as the estimate using the N weighted particles.Resampling refers to a suite of techniques that replicate the particles in such a way that the expected value of particle-based estimators is retained, but particles with low weights are discarded and particles with high weights multiplied; see [11] for a summary and comparison of several such approaches.This standard practise in the sequential Monte Carlo literature allows further computational effort to focus on samples that are likely to contribute non-negligibly to the final estimate.Finally, resampled particle weights are reset to {w s i } ≡ 1. Move Let { θ θ θ s , w s i }, i = 1, . . ., N denote samples from at the current distribution π s after reweighting and (possibly) resampling.To increase particle diversity we replace each sample according to where K s is an MCMC transition kernel that admits π s as stationary distribution.[15] provides detail on MCMC transition kernels.
It is known [8] that this particular formulation of the SMC sampling is suboptimal, in terms of the variance of the importance weights {w s i }, especially if the distributions on consecutive stages are too far apart.However, since the optimal formulation is intractable, and for the static problem we have here it is easy to ensure that the difference between π s−1 and π s is small, we use this simpler formulation.(Contrast this situation with that of an SMC algorithm for a dynamic problem, or the technique of [5] where data arrive over time and there is no control over the distance between π s−1 and π s .) The "parameters" of the algorithm that must be chosen when implementing this sampler are therefore: • the initial distribution π 0 , • the sequence of values γ s that govern the rate of transition from the initial distribution π 0 to the posterior distribution π, • the transition kernels K s , used to move the particles within the distribution proportional to π s , and • the number of particles N .
• the total number of distributions S. Specific choices of these parameters used in this paper are discussed in the following subsections.We give a more algorithmic description of our method in the Appendix.

Initial distribution π 0
As previously observed, using the prior distribution as an initial distribution is flawed in this case, since the prior is highly diffuse.Instead we use the penalised quasi-likelihood (PQL) method [2] to obtain an approximate fit of the model.Let ν ν ν PQL and σ σ σ 2 PQL be the estimate of ν ν ν and σ σ σ 2 obtained using PQL.We will calculate a normal approximation of the posterior distribution of ν ν ν centred at this approximate maximum likelihood estimate, which can then be used to construct an initial distribution π 0 for the SMC sampling procedure.Note from (6) that where f is some function that does not depend on ν ν ν.It is a simple calculation to see that the matrix of second derivatives with respect to components of ν ν ν is we therefore initialise our algorithm by taking a normal distribution for ν ν ν with mean ν ν ν PQL and covariance matrix where the entries in V PQL are taken from σ σ σ 2 PQL .It remains to specify a distribution for the variance vector σ σ σ 2 .We have found it convenient to specify this conditional on ν ν ν, and of a form that is consistent with the posterior distribution π.We take .e. the σ 2 uℓ are conditionally independent given ν ν ν, and each has an inverse gamma distribution depending on the corresponding components of u.Hence, an initial sample from π 0 can easily be generated by first sampling from the normal distribution for ν ν ν then sampling the σ 2 uℓ from their conditional distributions.Furthermore, we will see in Sections 3.2 and 3.3 that this results in simple conditional distributions for σ 2 uℓ at all stages of the sampler.Putting together the initial distributions of ν ν ν and σ σ σ 2 , we see that In the GLMM examples of this paper, there is no reason to suspect that the posterior distribution is particularly spread out or multi-modal.Hence this π 0 is sufficient, as demonstrated by the fact that several different Monte Carlo methods provide identical inference in the examples of Section 4. In other examples where these complications are likely to occur, π 0 should be chosen with an inflated variance, or to be a t distribution, to help dominate the posterior distribution.Note that multi-modality is less of a problem for the sequential Monte Carlo approach than it would be for MCMC, since there is no difficulty in having samples in both modes simultaneously, whereas an MCMC approach must move between the nodes through areas of low posterior probability.

Sequence of intermediary distributions
In this section we describe the sequence of distributions used to transition from π 0 to π S = π.Recall that we choose to use the formulation (7).Using (10) and ( 6) it is clear that the intermediate distributions are proportional to π s where In the absence of any additional information about the shapes of these distributions, it is difficult to specify a sensible generic sequence of γ s values.Hence for the rest of the paper we choose to increase γ s from γ 0 = 0 to γ S−5 = 1 in a linear fashion, that is, values of γ s are sequentially incremented by the same amount.Additionally, we append γ S−4 = • • • = γ S = 1 to this sequence to give five stages at the end of the sampler on which the particles are not resampled.This means that the final sample is well spread out over the distribution π (it was found that if resampling happened too close to the end of the sampler then several samples might be identical, resulting in poor density estimates being produced using the standard techniques).
It is an interesting and open research question as to whether the sequence γ s can be chosen in a more principled manner.One option would be to choose the sequence in advance using some properties of the distributions π 0 and π.An alternative would be to choose the next γ s adaptively while the sampler proceeds through the sequence of distributions; however it is not straightforward to generalise the proofs of validity of the sampler in this case.

Transition kernels
For this paper we choose to use Metropolis-Hastings transition kernels for the parameters in ν ν ν.The choice of inverse gamma distributions for the components of σ σ σ 2 within π 0 means that we can simply use Gibbs sampling steps, [17], to update those components.At each step s we use a Metropolis-Hastings transition kernels K s .Since π 0 is an approximation to π, and π s is in some sense between π 0 and π, we use the same proposal distributions at each step s.These proposal distributions are derived from π 0 as described in this section.
We form a partition {I 1 , . . ., I J } of {1, . . ., P }, where P is the number of columns in C, so that [C I1 • • • C IJ ] is the matrix C, but with columns possibly re-ordered; and is the corresponding partition of ν ν ν.(The case J = 1 corresponds to no partitioning.)On each move step of the algorithm we move through the series of subsets I j , for j = 1, . . ., J. We apply a Metropolis-Hastings transition kernel to the components ν ν ν Ij = (ν i ) i∈Ij .
To describe the transitions we introduce the matrices Σ Σ Σ Ij , where Σ Σ Σ Ij is the conditional covariance under π 0 of ν ν ν Ij given the values of ν ν ν −Ij = (ν i ) i / ∈Ij .These can be calculated at the start of the algorithm.Recall that since π 0 is an approximation of π, the Σ Σ Σ Ij matrices therefore correspond to approximations of the conditional covariance of ν ν ν Ij given ν ν ν −Ij under the posterior distribution π.Note that we are here assuming that the approximate covariance matrices Σ Σ Σ Ij are close enough to the truth to be useful as proposal distributions in the random walk Metropolis-Hastings kernel.
The proposal distribution for ν ν ν Ij is then a normal distribution centered on the current value of ν ν ν Ij with covariance τ ν j Σ Σ Σ Ij .The acceptance probability for the move, applied after reweighting to get a weighted distribution from π s , is simply calculated from the ratio of π s values for the proposed and current values.
The scaling parameters τ ν j are by default chosen to be 2.4/ |I j | following the heuristic of [25].However in practice they are usually chosen, based on several runs of the algorithm, to ensure that the acceptance rates remain close to 0.23 (again following [25]).Details of specific choices used are given in the examples.
To update the variance parameters σ σ σ 2 , a Gibbs sampling step can be applied.Note from (11) that for each s the full conditional distribution of σ 2 uℓ is simply an inverse gamma distribution, depending on the corresponding vector of regression coefficients u ℓ .However if a different prior is used for σ σ σ 2 then Gibbs sampling will not be available and a Metropolis-Hastings update should be performed for each σ 2 uℓ in turn.

Examples
In this section we will demonstrate the methodology on two examples.The first example is a semiparametric Poisson regression model, with simulated data so that fair comparisons can be drawn with alternative MCMC approaches.The second example is a binary logistic regression involving respiratory infection in Indonesian children, with both a semiparametric component and random effects.All computations were carried out in the R language [29], using a single core AMD Opteron 2.0GHz, this is similar to running the program on a standard PC.

Semiparametric Poisson regression
In this section, we generate n = 500 Poisson random variables y i , i = 1, . . ., n from where x 1i is 0 or 1 with probability 0.5, and x 2i is uniformly sampled from the interval [0, 1].We fit model (3), with The radial cubic basis function is used to model the function f (x 2i ) = cos(4πx 2i ).This implies modelling f (x 2i ) = β x2 x 2i +Z x2i u, where for knot points κ k , chosen to be the ( k+1 K+2 )th quantile of the unique predictor values, for k = 1, . . ., K, K = 10, The glmmPQL method of the R statistical package gives an approximate MLE for the regression coefficients ν ν ν PQL , and the variance parameters σ σ σ 2 PQL .We follow the general algorithm given in Section 3.There are 13 regression coefficients to be estimated for this model, and one variance parameter.We updated each of the regression coefficients ν ν ν singly using random walk Metropolis-Hastings (RWMH) updates for the move step of the algorithm and a Gibbs update for the variance parameter.As with MCMC, the tuning of this kernel is crucial to the success of the algorithm; to achieve an acceptance rate in the MCMC step between 20-30% we set τ ν I = 1/3.We also choose the number of steps S = 105 and the number of particles N = 1000 based on preliminary runs.We will discuss the choices of N and S in more detail in Section 5.
We compare the performance of the SMC sampler simulations by monitoring the QQ-plots of samples from a simple importance sampler, a single-variable slice sampler which updates one parameter at a time and a standard RWMH sampler with the same transition kernels as used in the SMC sampler (i.e.those described in Section 3.3).Figure 1(a) shows the QQ-plot for the β 1 parameter, and the corresponding density estimates for β 1 is given in (b).With the exception of the importance sampler, which can perform badly on different simulated data sets, the remaining samplers achieved good concordance.This required 1,000 particles with 105 steps for the SMC sampler.For comparison, we used 20,000 iterations of both slice sampler and RWMH MCMC scheme, with the first 10,000 discarded as burn-in.These took approximately 1394 and 2430 seconds respectively, whereas the SMC sampler took approximately 700 seconds.The majority of the gains in computational time for the SMC sampler come from the fact that the particles at each step can be updated simultaneously without the need to cycle through a loop, compared with MCMC samplers where each iteration has to be updated sequentially depending on the value of the parameter at the previous step.In the R programming language used for this research, as with many other high level programming languages, this provides a very significant computational advantage.
The nonparametric fits of the model, calculated using the estimated posterior mean of u, are displayed in Figure 2. The model has successfully recovered the nonlinearity in the dependency on x 2 and fits the data well.

Example: Respiratory infection in Indonesian children
Here we apply sequential Monte Carlo algorithm to an example involving respiratory infection in Indonesian children (see [10,22]).The data contain longitudinal measurements on 275 Indonesian children, where the indicator for respiratory infection is the binary response.The covariates include age, height, indicators for vitamin A deficiency, sex, stunting and visit numbers (one to six).
Previous analyses have shown the effect of age of the child to be non-linear, hence we use a logistic additive mixed model of the form logit{P (respiratory for 1 ≤ i ≤ 275 children and 1 ≤ j ≤ n i repeated measures within a child. U ) is a random child effect, x ij is the measurement on a vector of the remaining 9 covariates, and f is modelled using penalized splines with spline basis coefficients u k i.i.d.N (0, σ 2 u ).As recommended by [16], we use hierarchical centering of random effects.All continuous covariates are standardised to have zero mean and unit standard deviation, so that the choices of hyperparameters can be independent of scale.where with κ k chosen to be the ( k+1 K+2 )th quantile of the unique predictor values.We take K = 20 in this example.
We use a vague prior N (0, 10 8 ) for the fixed effects.For both variance components, we use the conjugate Inverse Gamma prior IG(0.01,0.01).Other prior choices are available, see [30].Here, random walk Metropolis-hastings updates were carried out for each regression coefficients separately, with Gibbs sampling used for the variance parameters.The tuning parameters τ ν Ij for the Metropolis-Hastings update are again chosen to achieve an acceptance rate in the MCMC coeff.density summary vit.A defic.step between 20-30%, we used τ ν Ij = 3 for the fixed effect coefficients, τ ν Ij = 6 for the random effect coefficients, and τ ν Ij = 5 for the spline coefficients.Figure 3 show the results from simulation, using 1000 particles and 305 intermediate steps.The Figure shows borderline positive effect of Vitamin A deficiency, sex and some visit numbers on respiratory infection.These results are in keeping with previous analyses.Figure 4(a) shows the nonlinear effect of age; 4(b) shows the effective sample size at each of 300 sequential steps of the simulation, vertical lines indicate the occurrence of resampling.
Again, we compare the performance of the SMC sampler with the importance sampler, slice sampler and RWMH sampler with the same transition kernel as Step 3 of the SMC sampler.Results for 5,000 samples of the importance sampler, 1,000 SMC particles and 5,000 slice samples with 5,000 burn-in and 5,000 RWMH samples with 5,000 burn-in are plotted in Figure 5, good agreements are found between the SMC, slice and MCMC samplers.The sampler which performed badly appears to be the importance sampler, where in this case, the sampler appears to suffers from particle depletion where one or few particles from an area of high posterior density is dominating the other particles.Finally, the SMC sampler took approximately the same amount of time as the slice sampler at 2.8 hours and the RWMH took about 9 hours (similarly 9 hours was required in WinBUGS).In the next section we discuss the effect of the sample size and step size specifications on the efficiency of the SMC sampler.

Improving sampler performance
In this section we investigate the SMC sampler performance by looking at the effects of user-defined specifications such as the number of sequential steps (S); the number of particles to sample (N ) and block updating strategies.
Here we base efficiency comparisons on the effective sample size diagnostic calculation of [4].(Note this is different from the ESS [20] used in determining whether resampling is performed.)This diagnostic is essentially an analysis of variance approach based on several parallel runs of the algorithm, which provides the number of independent samples from the posterior distribution that would be required to gain the same degree of accuracy: higher numbers are obviously preferrable.This estimate of effective sample size is not affected by resampling and in addition, can be used to compare SMC sampler with MCMC approaches under a consistent framework.Thus, for a given parameter, we calculate the  In experimenting with block/simultaneous updating the parameters for the move step of the SMC sampler, we found that in some cases, particularly for the logistic example, naïve blocking updating (i.e., blocking regression coefficients, random effects coefficients, and spline coefficients) can in fact adversely affect the performance of the sampler.We also found that careful tuning of acceptance probabilities in the RWMH step to be between 20-30% can be crucial to the performance.Finally, we found that increasing the number of sequential distributions S, as well as increasing the number of particles N can greatly improve the effective sample size.
Figure 6 shows the effective sample size for β 1 calculated from a total of 10 independent runs of the sampler for the Poisson regression example.We calculated the diagnostic for the SMC sampler over S = 10, 20, 50, 100, 200.For each S, we implemented the sampler with a single block update using N = 1000 and N = 2000 particles, and a sampler without blocking with N = 1000.For comparison, the diagnostic for β 1 was also calculated for a (single-variable) slice sampler and a RWMH sampler, each of 20,000 iterations with 10,000 burnin.The RWMH algorithm used the same transition kernels as Step 3 of the SMC sampler, with and without block updating.
Clearly, the effective sample size of the sampler with no blocking is larger than that of the sampler with one block for all S, and increases with larger values of S and N .We can see that the effective sample size of the slice sampler is comparable to the SMC sampler with S = 50 and N = 1000, while the RWMH sampler achieves the smallest effective sample size.For the logistic regression example of Section 4.2, we found that by updating regression parameters singly, and choosing S = 200 and N = 2000 achieved an effective sample size of 2604, and S = 300 and N = 1000 achieved an effective sample size of 2377.The two samplers took about 3.8 and 2.9 hours to run respectively.As a comparison, the slice sampler with 10,000 iterations with 5,000 burn in took 2.8 hours to run and achieves an effective sample size of 1948, while a RWMH sampler of length 10,000 with 5,000 burn in achieves an effective sample size of only 177, and taking 9 hours to run.

Conclusion
In this paper we presented a general sequential Monte Carlo algorithm to produce samples from the posterior distribution for Bayesian analysis of generalised linear mixed models.The algorithm is an alternative to the popular Markov chain Monte Carlo methods.We have demonstrated that the algorithm can handle problems where the number of parameters to be estimated in the model is high.For example, in the spline formulation of the Indonesian children example, there were over 300 parameters in the model.In addition, the algorithm is generally easy to apply.We have also demonstrated that it can have substantial gains in computational time over traditional MCMC in both a simulated poisson example and a real data binomial example.Finally, perhaps the biggest advantage of SMC over MCMC samplers is the fact convergence of SMC samplers does not rely on convergence of Markov chains, which can be problematic in designing more efficient algorithms in complex problems.
We have found that in the context of Bayesian GLMM analysis, the design of the initial sequential Monte Carlo distribution may be helped by using approximate parameter estimates from classical GLMM analysis, such as using the PQL method to find MLE of the likelihood.Note that in the case where such estimates cannot be easily found, and the only sensible choice is a diffuse prior, then a SMC sampler with many more particles and sequential distributions will be needed to obtain good results.In choosing the schedule for the tempering sequence γ s in (7), we have found no substantial difference between the different types of schedules currently used in the literature, hence we recommend that a simple linear schedule be adopted in the GLMM context.We have also found that by tuning the acceptance rate of the Random-Walk Metropolis-Hastings kernel in the Move step of the SMC sampler to around 20-30% significantly improves the performance of the sampler, see [9], although this practical finding does not yet have rigorous theoretical support.
Finally, in implementing the Move step of the SMC sampler, one has some degree of flexibility when the Markov chain Monte Carlo update is used.For example, one may consider a better choice of proposal distributions for the Metropolis-Hastings algorithm, by allowing the algorithm to automatically scale a proposal distribution, see for example [5].Here a major advantage over the traditional MCMC is that the algorithm does not suffer from the restrictions associated with a Markov chain, and information from previous samples can be freely used to obtain future samples.Furthermore, one is not restricted to only MCMC type of moves in this step, other move types are possible, see [8].
However, sequential Monte Carlo algorithms are not black-box algorithms, requiring a certain amount of tuning and user input.In particular, one needs to set the number of sequential distributions (S) the number of particles to sample (N ) and tuning parameters for the Metropolis-Hastings kernels in the move step of the algorithm.
Initial sample from π 0 • Produce a sample of size N from π 0 : for each i = 1, . . ., N sample ν i from the normal distribution with mean ν ν ν PQL and covariance Σ Σ Σ, then sample σ σ σ 2 i from the conditional inverse gamma distributions (9).• Set the weights w i = 1/N for each i = 1, . . ., N .

Sequential sampling from each π s
For each s = 1, . . ., S in turn, Reweight For each i = 1, . . ., N , update w i according to γs−γs−1 then normalise the weights by setting w i ← w i / N j=1 w j .To avoid overflow and underflow problems it is recommended that logarithms be used in this step.Resample Calculate the effective sample size (ESS) using If ESS < N/2 (or if s = min{s : γ s = 1}) then resample the particles.The naive version of resampling, which introduces unnecessary Monte Carlo variation into the scheme, simply samples (with replacement) from the pool of particles, with particle i selected with probability w i .However in our implementation we use stratified resampling [21] to reduce the Monte Carlo variation.After resampling set w i = 1/N for all i = 1, . . ., N .

Move
• For each j = 1, . . ., J and each i = 1, . . ., N , generate proposals (ν ν ν i ) Ij ∼ N ((ν ν ν i ) I j , τ ν j Σ Σ Σ Ij ), 1 ≤ i ≤ N .With probability accept the proposal and set (ν ν ν i ) I j = (ν ν ν i ) I j .Otherwise reject the proposal and leave (ν ν ν i ) I j unchanged.Again, it is recommended that logarithms be used when calculating α to avoid overflow and underflow problems.Note that several parts of the ratio in the calculation of α are the same in both the numerator and denominator and need not be calculated.
• For each ℓ = 1, . . ., L, and for each i = 1, . . ., N , sample (σ σ σ 2 i ) ℓ from the inverse gamma distribution with shape A uℓ + q ℓ /2 and rate A uℓ + u ℓ 2 /2.Note that if inverse gamma distributions are not used as the prior distribution for σ σ σ 2 then sampling from inverse gamma distributions here would not result in a transition kernel that admits π s as a stationary distribution.Instead further Metropolis-Hastings can be used for each σ 2 uℓ in turn.Note that the decision to resample on the first step at which γ s = 1 means that the final sample is an unweighted sample from π. Hence standard techniques for dealing with samples from posterior distributions can be used.However the plug-in rule for the bandwidth used in the density estimates performed poorly for resampling close to step S, since some particles were identical.This is the reason that we generally set γ S−5 = 1 and finish with five applications of the transition kernel to the unweighted sample, resulting in a suitably diverse sample from π.

Fig 1 .Fig 2 .
Fig 1. QQ-plots of SMC sampler output against simple importance sampler, the slice sampler and the RW Metropolis-Hastings sampler for β 1 (a).The corresponding density estimates (b).

Fig 3 .
Fig 3. Summary of coefficients in respiratory infections in Indonesian children example.

Fig 4 .
Fig 4. Respiratory infections in Indonesian children example.(a) Posterior mean of the estimated probability of respiratory infection f (age) with all other covariates set to their average values.(b) Effective sample size over 300 distributions, vertical lines indicate instances of resampling.

Fig 5 .
QQ-plots of SMC sampler output against simple importance sampler, the slice sampler and the RW Metropolis-Hastings sampler for the coefficient of vitamin A deficiency (a).The corresponding density estimates (b).

Fig 6 .
Fig 6.Comparison of effective sample size for β 1 from the SMC sampler (over increasing number of sequential distributions (S), and the slice and RWMH samplers in the Poisson regression examples.
Radial cubic basis functions are used to fit the covariate age, where f (age) = β age age + Z age u