Bayesian Parametric Bootstrap for Models with Intractable Likelihoods

. In this paper it is demonstrated how the Bayesian parametric bootstrap can be adapted to models with intractable likelihoods. The approach is most appealing when the computationally eﬃcient semi-automatic approximate Bayesian computation (ABC) summary statistics are selected. The parametric bootstrap approximation is used to form a proposal distribution in ABC algorithms to improve the computational eﬃciency. The new approach is demonstrated through the sequential Monte Carlo and the ABC importance and rejection sampling algorithms. We found eﬃciency gains in two simulation studies, the univari-ate g-and-k quantile distribution, a toggle switch model in dynamic bionetworks, and in a stochastic model describing expanding melanoma cell colonies.


Introduction
For many complex models in biological, medical and ecological sciences, the likelihood functions are not available in an analytical form and are computationally intractable, so computing the posterior distribution for these models is challenging. To overcome this limitation, approximate Bayesian computation (ABC), a class of Bayesian "likelihoodfree" techniques, has emerged, which avoids direct evaluation of the likelihood through repeated simulation of data from the model. As such, ABC methods permit Bayesian inference for models with intractable likelihoods, when simulation from the model for a range of parameter values is feasible.
ABC methods have been successfully applied in a wide range of problems such as population genetics (Beaumont et al., 2002), infectious diseases (Drovandi and Pettitt, 2011c;Tanaka et al., 2006), astronomical model analysis (Cameron and Pettitt, 2012) and cell biology (Vo et al., 2015a,b). ABC is often implemented using one of the three main algorithms: importance and rejection sampling (ABC IS), Markov chain Monte Carlo (MCMC ABC) or sequential Monte Carlo (SMC ABC). In these ABC algorithms, a good proposal is crucial to improve the computational efficiency.
It has been shown that, for some cases, bootstrap methods are useful for numerical calculation of Bayes posterior distributions (Newton and Raftery, 1994;Efron, 2012 Rubin, 1981). In particular, Efron (2012) proposed the use of a parametric bootstrap and a re-weighting scheme to approximate posterior distributions and their expectations. This approach is efficient and computationally straightforward. However, it depends upon an analytical expression for the sampling density of a statistic and a point estimate of the parameter, which generally requires the availability of the likelihood function. We show in this article that nevertheless parametric bootstrap samples can be used to construct a good proposal distribution in the context of ABC.
The main purpose of this paper is to present new and efficient ABC algorithms by introducing a good proposal distribution. The proposal distribution is constructed using the semi-automatic approach to ABC of Fearnhead and Prangle (2012) and the Bayesian parametric bootstrap (PB) of Efron (2012). The semi-automatic ABC is chosen because it provides summary statistics as point estimators, which are then used in the Bayesian PB process. In this paper, the resulting PB distributions, which require very few model simulations, are used as an initial importance distribution for SMC ABC algorithms and ABC IS. Hereafter, we refer to these new ABC algorithms as PB SMC ABC and PB ABC IS algorithms, respectively. We also note that, in some cases, the PB distribution can be a useful posterior approximation in its own right.
We first apply the methodology to simulated datasets from two examples: (i) the g-and-k quantile distribution of Rayner and MacGillivray (2002) and (ii) the toggle switch model of Bonassi et al. (2011) to validate the approach and compare it with the standard SMC ABC algorithm in Vo et al. (2015a).
We then apply the new collection of methods to a stochastic model that describes the expansion of human malignant melanoma cells (MM127) in a circular barrier assay (Vo et al., 2015a;Treloar et al., 2013). The cell expanding populations are governed by three parameters: cell motility, cell proliferation and cell-to-cell adhesion. Simulation of cell experiments from the stochastic model is highly computationally intensive, especially for a relatively large cell proliferation rate. Thus, inference requires an ABC algorithm that is efficient in terms of the number of model simulations.
This article is organized as follows. Background of ABC methodology is briefly reviewed in Section 2, followed by the Bayesian PB of Efron (2012) in Section 3. We demonstrate how the Bayesian PB can be efficiently adapted to likelihood-free problems and how to use this result to form a proposal for ABC algorithms in Section 4. Sections 5 and 6 contain the results from the two simulation studies, while Section 7 illustrates results from experimental data of human melanoma cells. The article is concluded with a discussion in Section 8.

Approximate Bayesian computation
Let θ ∈ Θ be the parameter of a statistical model with a prior distribution p(θ) and an intractable likelihood, p(y|θ), where y is the observed data. ABC avoids likelihood evaluations by simulating data x from the model and comparing it with the observed data. In ABC, direct comparison between the observed and the simulated datasets is often inefficient, especially when the data is high dimensional (Blum, 2010). Thus, we consider a vector of summary statistics s(·) = {s 1 (·), . . . , s d (·)} and denote s obs = s(y) and s = s(x). To measure the closeness between x and y, via the closeness between s and s obs , we use a distance metric ρ(s, s obs ) and a kernel weighting function K (ρ(s, s obs )), where > 0 is a bandwidth that is referred to as the ABC tolerance. The ABC posterior is constructed as with p (s obs |θ) = K (ρ(s obs , s))p(s|θ)ds.
The general guidance for choosing the summary statistic is that it should be low dimensional but still incorporates most of the information contained in the data (Blum et al., 2013). In additional to the loss of information introduced by the use of an insufficient summary statistic, the ABC posterior is also affected by the tolerance , which is necessary to ensure computational feasibility. The smaller leads to a more accurate approximation in (2) but more computation. In practice, the kernel weighting function K (ρ(s obs , s)) is often chosen as an indicator function, 1 {ρ(s obs ,s)≤ } , that is unity if the condition involving the discrepancy is satisfied and is zero otherwise.
ABC algorithms for sampling from (1) generally consist of three major steps: sampling a proposed parameter θ , simulating data x as per the observed data structure from the model p(·|θ ) and comparing x with the observed data y. Different ABC algorithms are distinguished by the process of sampling proposed parameters. Some of these are discussed next. Despite on-going research on developing more efficient ABC algorithms, often many hundreds of thousands or millions of model simulations are required.
We suggest that the efficiency of ABC algorithms can be improved if there is a good analytical approximation to the posterior p(θ|s obs ) that can be obtained quickly. For example, such an approximation can be used to form importance distributions for importance sampling (IS) or sequential Monte Carlo (SMC) based algorithms. In Section 3, we describe an adaptation of the Bayesian parametric bootstrap that can be used to form this initial approximation. The remainder of this section briefly discusses the ABC IS algorithm (Fearnhead and Prangle, 2012) and the SMC ABC algorithm of Vo et al. (2015a). Fearnhead and Prangle (2012) provide an importance and rejection sampling implementation of ABC for which the output is a weighted sample of values from the ABC posterior distribution (Algorithm 1). For simplicity, we set up the acceptance-rejection step (line 6) using the indicator function 1 {ρ(s obs ,s)≤ } .

ABC importance and rejection sampling
In Algorithm 1, a proposed parameter θ is drawn from an importance distribution, g(θ). Each proposed value θ is assigned a weight proportional to p(θ )/g(θ ) if it produces simulated data that satisfies the discrepancy condition, otherwise its weight is zero. When g(θ) = p(θ), this algorithm becomes ABC rejection sampling which is similar to the algorithm of Beaumont et al. (2002). (Fearnhead and Prangle, 2012).

Algorithm 1: ABC importance and rejection sampling (ABC IS)
1 Given observed data y, N > 0, summary statistics s(·); a proposal density g(θ), with g(θ) > 0 when prior p(θ) > 0; a density kernel K(·), with max{K(·)} = 1 and a bandwidth > 0 2 compute s obs = s(y) The advantage of this algorithm is that it generates independent samples and the algorithm can easily be run in parallel. However, if a good importance distribution is not available and the prior distribution is substantially different from the posterior, this algorithm results in low acceptance rates and is computationally inefficient.

SMC ABC
In this paper, we use the SMC ABC algorithm of Vo et al. (2015a) (Algorithm 2), which, for the motivating application in cell biology, was shown to have improvements over the algorithms of Sisson et al. (2009);Beaumont et al. (2009);Drovandi and Pettitt (2011a). For a non-increasing sequence of tolerances { t } T t=1 , the SMC ABC algorithm aims to obtain a set of N weighted particles from the following sequence of targets In brief, the SMC ABC algorithm of Vo et al. (2015a) integrates the advantages of automatically determining tolerance values from Pettitt (2011a), Del Moral et al. (2012), and the advantage of geometric sampling from a proposal distribution until an acceptable parameter value is obtained (Sisson et al., 2009;Beaumont et al., 2009). In this SMC ABC algorithm, N (·, ·) denotes the multivariate normal distribution, and N α = αN is the number of particles to keep at each iteration among the N particles, with α ∈ [0, 1]. The stopping criterion is either the minimal acceptance rate, p accmin , or a target tolerance, T . For many applications of ABC, the most computationally intensive procedure is the model simulation process. Therefore, we aim to develop efficient ABC algorithms that can achieve a low tolerance value within a manageable number of model simulations. To achieve this, we incorporate an importance distribution at the initial iteration, t = 0, of the SMC ABC algorithm. Section 3 will describe how to obtain such an importance distribution while the detail of the algorithms will be provided in Section 4.
set Σ t as twice the weighted empirical covariance using (θ N −Nα and the number of trials, N trials = 0

Summary statistics and discrepancy function
In the literature, various approaches have been proposed to choose useful summary statistics including a sequential scheme based on the principle of approximate sufficiency (Joyce and Marjoram, 2008), partial least-squares regression (Wegmann et al., 2009), indirect inference (Drovandi et al., 2011) and machine learning methods (Aeschbacher et al., 2012). In this paper, we implement the method proposed by Fearnhead and Prangle (2012) who use the estimates of the posterior means of θ as the summary statistics. These posterior means are obtained via regression.
are made from the prior distribution. If the prior p(θ) is diffuse then draws of θ i can be restricted to regions of non-negligible posterior density found using a pilot run of ABC. Each parameter θ i is then used to simulate a dataset x i from the model, We fit the regression model with zero mean error ξ i and f (·) is a vector-valued function of the data (or a set of summary statistics if using the full data is not feasible). Different choices of f (·) could be considered to obtain a better fit in the regression. In this paper, to find the best regression model, we employ a stepwise (bidirectional) regression method and the Bayesian information criterion (BIC) for model selection.
The expected value of θ i given the simulated data where the intercept parameterα and the vector of regression coefficientsβ is estimated from the best regression model. The derived summary statistic, s F P (y) =α +β T f (y), is then interpreted as the estimated posterior mean of θ obtained from the regression procedure. Thus, using this dimension reduction approach, we have only one summary statistic per parameter. In practice, if the parameter θ is vector valued, then a multiple linear regression model (3) is fitted to each component of θ in turn, with possibly a different function f (·) and different estimatesα andβ.
The derived summary statistics s F P can have different scales and correlations between summaries. Thus, we consider the Mahalanobis distance to compare the summary statistics of the observed and the simulated data. This discrepancy function is given by where W is an estimate of the covariance matrix of the summary statistics s F P . To estimate W , we generate 100 simulated datasets {x i |θ} 100 i=1 , using our point estimatê θ = s F P (y), obtained from the regression step above. For each simulated dataset x i , we compute the vector of summary statistics

Bayesian parametric bootstrap
In this section, the summary statistics s(·) are assumed to be an estimator of θ. Given an observed data set, we can compute an estimate of θ,θ, as a function of s obs . For simplicity we denoteθ = s obs .
The Bayesian PB of Efron (2012) independently generates B values of the statistic s j = s(x j ), j = 1, . . . , B where x j is a simulated data set from the model p(·|θ). Each sample estimate of θ, θ j = s j , j = 1, . . . , B, is a PB replication ofθ. By re-weighting these points with an importance weight we obtain a weighted sample from the posterior distribution of θ givenθ (Efron, 2012).
If the likelihood function of the summary statistics p(s|θ) can be evaluated then the importance weights (4) can be found. However, for models with intractable likelihoods, p(s|θ) cannot be evaluated.
We consider a special case where the weights in (4) can be simplified. If the likelihood for s is symmetric in s−θ (s and θ must be the same dimension), there exists a symmetric density h such that for j = 1, . . . , B. Therefore, the importance weights for the posterior become just the prior evaluated at the bootstrap replication s j Without the assumption of exact symmetry, the PB sample may give a poor approximation to the likelihood [p(s|θ)] s=s obs ,θ=sj , j = 1, . . . , B. The weighted samples (6), where θ j = s j , j = 1, . . . , B, may give a poor approximation to the posterior, p(θ|s obs ).

Coupling Bayesian parametric bootstrap with ABC
This section demonstrates two main innovations: how to obtain the PB distribution for models with intractable likelihoods and how to incorporate this PB distribution in ABC algorithms to improve their efficiency.
To apply the Bayesian PB idea for models with intractable likelihoods, it is computationally too intensive to takeθ as a point estimate of the ABC posterior p (θ|s obs ) obtained from the standard ABC IS or SMC ABC (Algorithms 1 and 2 above). The main idea here is to perform the Bayesian PB withθ derived from the semi-automatic approach of Fearnhead and Prangle (2012) since it is very fast to obtain.
Sampling simulated data x for ABC requires different values of θ while obtaining the PB distribution only requires sampling x for fixedθ = s F P (y) obtained from the regression approach in Section 2.3. Assuming that the likelihood for the summary statistic p(s|θ) has the symmetry property so that the following holds then the weighted samples {θ j , W j } B j=1 give an approximation to the posterior p(θ|s obs ). Here θ j = s j and the importance weights W j are proportional to the prior density (6). This approximation is extremely computationally efficient having used only (N pilot + M + B) simulations from the model p(·|θ). Here, N pilot is the number of model simulations from the ABC pilot run. Pseudo code to perform the Bayesian parametric Algorithm 3: Likelihood-free Bayesian PB algorithm.
1 Given observed data y, prior distribution p(θ) and integers M, B > 0 2 Optional: Perform an ABC pilot run to obtain a training region of θ 3 Generate M synthetic data sets for the regression: Simulate θ i from the truncated region as appropriate, and generate x i ∼ p(·|θ i ), i = 1, . . . , M 4 Perform a regression analysis: Compute the weight W j ∝ p(θ j ) 10 end bootstrap in this section is provided in Algorithm 3. For the ABC pilot run, one could adopt any ABC algorithm.
The appeal of likelihood-free Bayesian PB is that once the regression model has been fitted, all the samples generated through the PB approach are kept. Li and Fearnhead (2016b) demonstrated that in standard ABC the ABC tolerance must reduce with an increase to the size of the sample, so that low acceptance rates are still required even in large sample settings to obtain estimators with good asymptotic properties. One exception as determined by Li and Fearnhead (2016a) is that when a good proposal distribution is used we are able to accept more of the proposals as the sample size increases if a post-hoc regression adjustment is applied. However, in finite sample scenarios that we are likely to encounter in practice, ABC methods typically suffer from very low acceptance rates. We show in the Supplementary Material (Vo et al., 2019) that the standard ABC IS in Algorithm 1 would require 70 times more model simulations than the PB ABC IS algorithm when a good proposal obtained from the PB distribution is used. Thus, the Bayesian PB approach in Algorithm 3 will be significantly more computationally efficient than standard ABC algorithms, however the PB distribution is likely to differ in some way to the standard ABC posterior. Furthermore, Li and Fearnhead (2016b) show that the asymptotic efficiency of ABC does at least improve with a good proposal distribution. Therefore, we consider harnessing the PB ABC approximation, which can be obtained quickly, to form a proposal distribution for ABC algorithms.
We denote an analytic approximation of the weighted sample {θ j , W j } B j=1 resulting from the Bayesian PB algorithm above as g(θ), which can be taken as a parametric distribution such as a multivariate normal or a kernel density estimate. It is important that the tails of the proposal distribution are thicker than that of the target distribution (for an argument of this in the ABC context, see Li and Fearnhead (2016b)). This can be achieved by extending the multivariate normal proposal to a multivariate t by choosing a small number of degrees of freedom or inflating the variance of the multivariate normal Algorithm 4: PB SMC ABC algorithm. SMC ABC algorithm with initial proposal density g(θ). In this paper we form g(θ) using the PB method, creating the PB SMC ABC algorithm. 1 Given N , N α , p accmin , T , a summary statistic function s(·) and s obs = s(y). 2 Obtain the Bayesian PB distribution, g(θ), as described in Algorithm 3 3 Set p acc = 1, t = 0 Set Σ t as twice the weighted empirical covariance using (θ N −Nα and the number of trials, N trials = 0 proposal. In this paper we choose the latter and scale the sample covariance matrix by a factor of 2. The approximation g(θ) is used as a proposal density in the ABC IS (see Algorithm 1, PB ABC IS) or an initial importance distribution for the SMC ABC algorithm (see Algorithm 4, PB SMC ABC), and we discuss other options in Section 8. Figure 1: Estimated probability density based on the simulated dataset from the g-and-k distribution.

Model and data
We now validate our new collection of methods using synthetically generated data from the g-and-k quantile distribution (Rayner and MacGillivray, 2002). The g-and-kdistribution is a class of quantile distributions and it is defined by its quantile function, the inverse cumulative distribution function where z(u) is the u-quantile of the standard normal distribution and θ = (a, b, c, g, k) is the unknown parameter. Given a fixed value of c, c = 0.8 (Rayner and MacGillivray, 2002), the g-and-k distribution consists of four unknown parameters, a, b, g and k, which are related to location, scale, skewness and kurtosis, respectively. The simulated dataset consists of n = 10 4 independent draws from the g-and-k distribution with parameters θ = (a, b, g, k) = (3, 1, 2, 0.5). A uniform prior (0, 10) 4 is used for the parameters. This is similar to the example used in Fearnhead and Prangle (2012); Drovandi and Pettitt (2011b); Allingham et al. (2009). A plot of the estimated probability density function based on this dataset is shown in Figure 1. The data shows significant skewness and kurtosis.

Results
For the ABC pilot run, we use the SMC ABC approach in Algorithm 2, the set of octiles as summary statistics (Drovandi and Pettitt, 2011b) with the Euclidean distance between summary statistics and set N = 1, 000. After 18 iterations, we find that the training regions for a, b, g and k are given by ( respectively. The number of model simulations for the pilot run is 25,012 and the probability of acceptance in the last iteration is approximately 27%. For the regression procedure, we simulate M = 5, 000 datasets from the parameters that are drawn from the training regions above. For the covariates in the regression model (3), we consider a set of summary statistics, {L i } 19 i=1 , where L i , i = 1, . . . , 19 is the (0.05 × i)th quantile, rather than the entire dataset. A bidirectional stepwise regression is then fitted to determine a 4-dimensional summary statistic s F P . The point estimatê θ obtained from the regression isθ = (â,b,ĝ,k) = (2.9970, 1.0064, 2.0426, 0.4965). It should be noted that the summary statistics {L i } 19 i=1 are highly informative for all parameters a, b, g, k, since the adjusted R-square values of the regressions for the four parameters are 85.1%, 91.5%, 92.5% and 95.6%, respectively.
Using this value ofθ, we perform the Bayesian PB with B = 1, 000 (Algorithm 3). The densities of the Bayesian PB samples are plotted in Figure 2 (dashed red). We fit a multivariate normal distribution to the PB samples and use it as an initial importance distribution, g(θ), in the PB SMC ABC algorithm. In order to help ensure coverage of the tails, the covariance matrix of g(θ) is set as twice the empirical covariance matrix based on the PB samples. The ABC posterior distributions of a, b, g and k from the PB SMC ABC algorithm are plotted in Figure 2  It should be noted that, in the standard SMC ABC approach, we use an initial importance distribution formed from the pilot, a multivariate normal distribution with covariance matrix estimated from the pilot run samples inflated by a factor of two, rather than the prior distribution to avoid the extrapolation issue in the regression procedure (Fearnhead and Prangle, 2012).
For all four parameters, the posteriors resulting from standard SMC ABC and PB SMC ABC are well-defined and quite similar, as expected. However, PB SMC ABC required 35% fewer model simulations, about 303,000 simulations less than the standard SMC ABC algorithm. The PB distributions are also very close to the ABC posteriors given a very small number of model simulations (31,012).
In both algorithms, we set N = 2, 000 and a tolerance = 0.78. The effective sample size, ESS, for the PB SMC ABC and the standard SMC ABC algorithms are 1,413 and 1,390, respectively. We also compare ABC posterior distributions with the true posteriors obtained from the numerical evaluation of the likelihood function (Rayner and MacGillivray, 2002) (results not shown). The results for a and b are very accurate, suggesting that the summary statistics s F P are close to sufficient for these parameters. The results for g and k obtained from the two ABC algorithms show slight deviation from the true posteriors and also a small loss of precision. 6 Toggle switch model

Model
The performance of the new methods was also investigated on a toggle switch model describing gene expressions. Details of the model and its background were previously described in Gardner et al. (2000); Bonassi et al. (2011). Motivated from a study of dynamic cellular networks using E.coli bacterial cells (Gardner et al., 2000), Bonassi et al. (2011) proposed a discrete time stochastic model to capture the behaviour of a network with two genes in a synthetic toggle switch design. In brief, for each cell c, c = 1, . . . , 2000, we let u c,t and v c,t be the expressions of genes u and v for cell c at time t, respectively. Given an initial state (u c,0 , v c,0 ) and a small discrete time step h, the states of u c,t and v c,t are updated using the equations below (Bonassi et al., 2011): where t = 0, . . . , T , with T is the assumed time to reach steady-state. The terms ξ c,u,t and ξ c,v,t are independent standard normal random variables and they represent the intrinsic noise within cell c.
For a particular cell c, the stochastic process is only observed for gene u, at the steady-state time T . Let y c be the noisy measurements of u c,T , which is modelled by Bonassi et al. (2011) as follows:  320, 0.25, 0.15, 25, 4, 15, 4). Subfigure (A) demonstrates a time series of u c,t for a single cell c, over T = 300 time steps. Subfigure (B) shows the histogram of the synthetic dataset used in this example.
where the measurement errors η are drawn from a standard normal distribution. The model consists of seven unknown parameters, which are denoted as θ =(μ, σ, γ, α u , β u , α v , β v ). For all model simulations, we use h = 1 and T = 300 time steps as suggested in Bonassi et al. (2011). An example of the time series data for u c,t over 300 time steps is shown in Figure 3A.

Results
For the pilot run, we use the SMC ABC in Algorithm 2 with N = 200 particles, the minimum probability of acceptance at 20% and a 19-dimensional summary statistic To perform the regression procedure, we simulate M =10,000 datasets from the fitted multivariate normal distribution of the resulting posterior distributions from the pilot run. For each simulated dataset, we fit a Gaussian mixture model with three components and use the estimated means, standard deviations and the mixture proportions (after accounting for label switching by ordering the means) for the regression input. This provides eight summary statistics, together with L i and L 2 i , for i = 1, . . . , 19, producing a vector of 46 variables which are then used as explanatory variables in the regression stage. We use a bidirectional stepwise regression and fit each parameter in turn.
The point estimateθ obtained from the regression isθ = (μ,σ,γ,α u ,β u ,α v ,β v ) = (317.18, 0.25, 0.14, 24.86, 4.17, 22.90, 4.33). We found that the set of 46 explanatory variables above is only informative for μ, σ, γ and α u , explaining about 90.5%, 91.9%, 96.9% and 94.9% of the variability of these parameters, respectively. However, for β u , α v and β v , the point estimates from the regression models are very similar to the posterior means from the ABC pilot run and using the same explanatory variables only accounts for 34.2%, 33.7% and 30.7% of the variability in these three parameters, respectively. The low values of the adjusted R-square for these parameters has a strong effect in the Bayesian PB procedure. In particular, the PB distributions for these parameters have an extremely small variance. Thus, we suggest that the PB method is more suitable when the regression adjusted R-square is high, roughly 80% or above. In this case, for the PB procedure, we use the point estimate obtained from the regression for the first four parameters, and for the last three parameters, we discard the regression models and use the posterior means from the pilot run instead.
The resulting PB distributions (dashed red) with B = 1, 000 samples are presented in Figure 4 together with the final ABC posterior distributions resulting from the PB SMC ABC algorithm (solid green) and the standard SMC ABC algorithm (dotted blue). It should be noted that, in the PB SMC ABC algorithm, we use the PB samples to form an importance distribution g(θ) for μ, σ, γ, α u , while for the parameters β u , α v and β v , the proposal is formed from the pilot run instead.
For the parameters μ, σ, γ and α u , it can be seen that the ABC posterior distributions are well-defined and have much smaller variance compared to the results from Bonassi et al. (2011). The PB distributions also provide a good approximation to the marginal ABC posterior distributions, given a very small number, 18,068, of model simulations. Furthermore, the posteriors resulting from the standard SMC ABC and the PB SMC ABC algorithms are similar, as expected. Both the PB SMC ABC and the standard SMC ABC algorithms use the same summary statistics, final tolerance = 0.7 and N = 2, 000 particles. However, standard SMC ABC required 220,320 model simulations, while PB SMC ABC only used 165,210 model simulations, which is about 25% less model simulations. The reason is that, for four parameters μ, σ, γ and α u , PB SMC ABC starts from the importance distribution g(θ) which is very close to the actual ABC posteriors, whereas standard SMC ABC starts from an importance distribution formed from the pilot run.
The results in Figure 4(e)-(g) indicate that we obtain far less information for β u , α v and β v regardless of the tolerance values and the summary statistics used in the ABC analysis, indicating that the data may be not informative for these parameters. This is consistent with the results previously shown in Bonassi et al. (2011). To overcome this, one could incorporate additional information from the network such as including the information from both genes at the steady-state time T .

Application to a collective cell spreading model
We now present our main application involving a discrete stochastic model describing the expansion of melanoma cell populations (Vo et al., 2015a). Melanoma is a cancer that begins in the melanocytes and is the most dangerous form of skin cancer (Garbe et al., 2012). Melanoma is less common, approximately 5% of all skin cancer occurrences, but accounts for approximately 75% of skin cancer deaths (Australian Institute of Health and Welfare and Australasian Associate of Cancer Registries, 2012). The spatial expansion of melanoma cells is governed by various mechanisms including cell motility, cell proliferation and cell-to-cell adhesion. Estimating these mechanisms can improve our understanding of melanoma biology and its response to treatment.

Data
We applied the new ABC algorithms to analyse an experiment of human malignant melanoma cells (MM127) (Pope et al., 1979;Whitehead and Little, 1973) in a circular barrier assay. Details of the experimental protocol were described in Treloar et al. (2013). In brief, the experiment was conducted using a 24-well tissue culture plate, where each well has a diameter of 15.6 mm. Initially, 20,000 cells were evenly distributed within a metal-silicone barrier, of a diameter 6.0 mm, which was located in the centre of the well. The tissue culture plate was kept for one hour to allow the cells to attach to the surface. Subsequently, the barrier was lifted and the plate was incubated for two time durations of 24 or 48 hours. The experiment was repeated in triplicate. For each experiment, we obtained two types of images: (i) a population-scale image which shows the entire melanoma cell colony and (ii) individual-scale images which show the location of each cell in the population. For the application in this paper, we only analyse the experiments that were terminated at 24 hours. Details of the ABC analyses for experiments at 48 hours and experiments with different initial cell densities can be found in Vo et al. (2015a).
Initially, we summarise the experimental data using a high dimensional summary statistics including three radii of the entire expanding melanoma colonies, {r i } 3 i=1 , the total number of cells, {c i } 6 i=1 , and the number of isolated cells, {p i } 6 i=1 , in six subregions of the cell population. We compute {r i } 3 i=1 by locating the position of the leading edge, measuring the area of the spreading cell population and converting this area into an equivalent circular radius. We average the {c i } 6 i=1 and {p i } 6 i=1 over three replicates, to produce a total of 15 summary statistics. These processes were performed using a segmentation algorithm written with the Matlab Image Processing Toolbox (Vo et al., 2015a) and were repeated for images that were produced by the discrete model described in Section 7.2. For more details on the image analysis and how the summary statistics were obtained see Vo et al. (2015a). Table 1 shows 15 observed summary statistics that were used for the ABC analysis in this section.

Model
To describe the spatial expansion of the melanoma cell population, we use a discrete stochastic model that incorporates cell motility (unbiased random walk), cell prolif- eration and cell-to-cell adhesion on a two-dimensional square lattice with spacing Δ (Treloar et al., 2013;Vo et al., 2015a,b). Each lattice site can be occupied by at most one agent. This model is also referred to as an agent based model or individual based model.
To simulate the experiments, we use a two-dimensional square lattice of size 867 × 867, with lattice spacing Δ =18 μm, so that the width of the lattice corresponds to the diameter of the well, 15.6 mm (15,600 μm/18 μm = 867). Let C(t) be the number of agents in the discrete model at time t, P m ∈ [0, 1] be the probability that an isolated agent will attempt to step a distance Δ within a time step of duration τ , and let P p ∈ [0, 1] represent the probability that an agent will attempt to proliferate and deposit a daughter within a time step of duration τ . The strength of cell-to-cell adhesion is represented by q ∈ [0, 1]. To step from time t to time t + τ , C(t) agents are sampled, with replacement, and given the opportunity to move with probability P m × (1 − q) n , where 0 ≤ n ≤ 4 is the number of occupied nearest neighbour sites. If an agent is at position (x, y) and has an opportunity to move, it will attempt to step to either (x±Δ, y) or (x, y ± Δ), with each target site chosen with equal probability. For increasing values of q, neighbour agents adhere more tightly to each other and it is difficult for an agent to move away from its neighbours. A similar mechanism is employed for proliferation events. A proliferative agent at position (x, y) will attempt to deposit a daughter agent at (x ± Δ, y) or (x, y ± Δ), with each target site chosen with equal probability.
In this model, the cell motility rate is quantified in terms of the cell diffusivity, D, D = P m Δ 2 /4τ , and the cell proliferation rate, λ, is related by λ = P p /τ (Simpson et al., 2010). A uniform prior U (0, 1) is used for all three parameters (P m , q, P p ). For all model simulations, we use a time step duration τ as 0.04 h (Vo et al., 2015a). We apply ABC algorithms to obtain joint posterior distributions for (P m , q, P p ), then use the values of Δ and τ to rescale posterior distributions of P m and P p into posterior distributions of the biological parameters of interested D and λ, respectively.

Parameter inferences
A pilot run was conducted using the SMC ABC algorithm of Vo et al. (2015a), incorporating all 15 summary statistics (Table 1) , and using the Mahalanobis distance to compute the distance between the observed and the simulated summary statistics. We set N = 500 particles and p accmin = 0.2 as the stopping criterion. We obtain the training regions for P m , q and P p as (0.07, 0.14), (0.14, 0.43) and (0.0010, 0.0018), respectively, using only 13,925 model simulations.   Table 2: ABC posterior summary for D, q and λ for two different ABC algorithms and the bootstrap distribution. Results shown include the posterior mean (the 90% CI in the parentheses) and the coefficient of variation, CV.
A regression analysis (3) was performed for each parameter in turn, using M = 5,000 datasets that were generated by parameters in these training regions. We found that using a vector of 30 explanatory variables {s, s 2 } can produce reasonable accuracy in the regression models, where the adjusted R-square values of the regression for P m , q and P p are 95.8%, 94.4% and 97.5%, respectively. Furthermore, we find that all elements of the initial summary statistics s are informative about P m . However, to obtain estimates for q and P p , the two largest radii of the expanding cell colonies were not significant in the regression. From the regression analysis, we obtain the point estimate (P m ,q,P p ) = (0.1217, 0.2477, 0.0015). Using the values of Δ =18 μm and τ = 0.04 h, we obtain estimates of the biological parameters D and λ,D = 246 μm 2 h −1 andλ = 0.038 h −1 . A Bayesian PB procedure with B = 1, 000 particles is then performed with the point estimate (P m ,q,P p ). The resulting PB distributions are given in Figure 5 together with the posterior distributions from the PB SMC ABC and the standard SMC ABC algorithms. A numerical summary is given in Table 2. Figure 5 show that the Bayesian PB distributions (dashed red) are very close to the ABC posterior distributions for q and λ, whereas there is some deviation between the PB distribution and the ABC posterior for D. This suggests that the PB distributions produce good approximations to the posterior distributions of q and λ, and a good enough approximation for D to produce a useful initial importance distribution for the PB SMC ABC algorithm. This PB distribution is produced using only 19,925 models simulations.

Results in
The ABC posterior distributions resulting from the two ABC algorithms, with sample size N = 2, 000, are very similar as expected given that we use the same summary statistics s F P and the same final target tolerance. However, given the same target tolerance T = 0.4, the PB SMC ABC (solid green curves) only requires 135,080 model simulations, whereas the standard SMC ABC (dotted blue curves) using an initial importance distribution formulated from the pilot run needed more than 210,000 model simulations.

Discussion
In this paper, we have presented a novel approach to perform a Bayesian PB for models with intractable likelihoods and newly developed ABC algorithms that aim to minimise the number of model simulations. The main idea is to use the PB distribution as an initial importance distribution for SMC ABC (named PB SMC ABC), and ABC IS (named PB ABC IS) algorithms. The new algorithms were validated on two test examples, the g-and-k quantile distribution and the toggle switch model. Both of these examples have shown that the PB SMC ABC algorithm outperformed the standard SMC ABC by 25-35% in terms of the number of model simulations. We also found efficiency gains when incorporating the PB approximation in ABC IS. However, for brevity, results from the PB ABC IS and standard ABC IS algorithms were not shown.
The main application of the new method was to obtain Bayesian inference for the key parameters governing the expansion of melanoma cell colonies. The simulation procedure from the stochastic model is computationally intensive for some regions of the parameter space (a high proliferation rate). Thus, using the PB approximation as an importance distribution is efficient as it is reasonably close to the ABC posterior and does not propose additional parameter values in parameter regions where it is expensive to simulate.
To speed up the SMC ABC method we used the PB approximation g(θ) as the initial importance distribution as opposed to the prior. Another approach would define a sequence of distributions formed by a geometric path that connects g(θ) with the ABC target of interest, p T (θ|s obs ).
The PB distributions can also be embedded within MCMC ABC algorithms. While Fearnhead and Prangle (2012) used the results from the pilot run to choose a starting value of the chain and to form a proposal distribution for MCMC ABC algorithms, one could use an analytical approximation to the PB distribution to form a proposal distribution and use the point estimate obtained from the regression procedure as a starting value. For the tolerance value in MCMC ABC algorithms, one could use a particular quantile of the discrepancies produced from the PB replications.
It should also be noted that the quality of the PB distributions relies very much on the quality of the multiple linear regression procedure to obtain a point estimate of θ.
Investigating the output from the linear model can help to identify which parameters were poorly estimated and as such one could modify the explanatory variables or the training region of model parameters to obtain more accurate results. Here, we suggest that the PB method is more appropriate when the regression model is able to explain a large amount of the variation in the parameters, with adjusted R-square above 80%. There are also several alternative approaches to linear regression such as non-linear regression methods (Blum and François, 2010), artificial neural networks (Chen, 2012) or partial least squares (Beaumont, 2012). These more advanced techniques may be useful in obtaining better fitting regressions. Here, we demonstrate that the PB method is more suitable for low to moderate dimensional θ, where the model simulation is expensive, as in the melanoma application considered in this paper.
To examine whether the assumption for the symmetry property for the summary statistics in the Bayesian PB is reasonable, we could apply a synthetic likelihood evaluation to estimate the log-likelihood ratio in (4). If there is no extreme value in the estimated ratios, then the assumption of symmetry is reasonable. The other assumption that our method makes is that the regression produces accurate point estimates for the parameters. If a particular point estimate is biased then the corresponding posterior approximation is likely to be biased too.
We also examined the possibility of integrating a non-parametric bootstrap procedure for models with intractable likelihoods. The Bayesian version of the non-parametric bootstrap was introduced by Rubin (1981) and later was extended by Newton and Raftery (1994) who named it the weighted likelihood bootstrap (WLB). Rubin (1981) used non-parametric bootstrapping of the maximum likelihood estimate which relies on re-sampling the data, and as such this approach may be applicable for datasets of independent observations, such as the g-and-k example in this paper, but cannot be easily applied if there is a complex dependence structure in the data, such as in the melanoma example.
Instead of re-sampling the data as in Rubin (1981), the WLB randomly weights the components of a likelihood function then maximises this weighted likelihood function to provide a bootstrap replication of the parameter. For a certain weight distribution, the WLB samples can provide an approximation to the posterior distribution, and as such it can be used to form a good starting point for an adaptive importance sampling algorithm, similar in spirit to what we do in this paper. This approach is straightforward to apply, however it relies on being able to explicitly write the likelihood function as a product of components so different weights for each component can be easily applied. Thus, the WLB is not applicable for models of interest in this paper. In conclusion, we suggest that the PB approach is the only bootstrap method generally applicable for models with intractable likelihoods.