Edinburgh Research Explorer Adaptive Approximate Bayesian Computation Tolerance Selection

. Approximate Bayesian Computation (ABC) methods are increasingly used for inference in situations in which the likelihood function is either computationally costly or intractable to evaluate. Extensions of the basic ABC rejection algorithm have improved the computational eﬃciency of the procedure and broad-ened its applicability. The ABC – Population Monte Carlo (ABC-PMC) approach has become a popular choice for approximate sampling from the posterior. ABC-PMC is a sequential sampler with an iteratively decreasing value of the tolerance, which speciﬁes how close the simulated data need to be to the real data for acceptance. We propose a method for adaptively selecting a sequence of tolerances that improves the computational eﬃciency of the algorithm over other common techniques. In addition we deﬁne a stopping rule as a by-product of the adaptation procedure, which assists in automating termination of sampling. The proposed automatic ABC-PMC algorithm can be easily implemented and we present several examples demonstrating its beneﬁts in terms of computational eﬃciency.


Introduction
Approximate Bayesian Computation (ABC) provides a framework for inference in situations where the relationship between the data and the parameters does not lead to a tractable likelihood function, but where forward simulation of the data-generating process is possible. ABC has been used in many areas of science such as biology (Thornton and Andolfatto, 2006), epidemiology (McKinley et al., 2009;Numminen et al., 2013), ecology (Beaumont, 2010), population modeling (Toni et al., 2009), modeling the population effects of a vaccine , dark matter direct detection (Simola et al., 2019), and astronomy (Cameron and Pettitt, 2012;Cisewski-Kehe et al., 2019;Ishida et al., 2015;Schafer and Freeman, 2012;Weyant et al., 2013). The basic ABC algorithm (Pritchard et al., 1999;Rubin, 1984;Tavaré et al., 1997) can be explained in four steps. Suppose the parameter vector θ ∈ R p is the target of inference, then (i) draw the model parameters from the prior distribution, θ prop ∼ π(θ), (ii) produce a synthetic sample of the data by using θ prop in the forward simulation model, y prop ∼ f (y | θ prop ), (iii) compare the true data, y obs , with the generated sample, y prop , using a distance function, ρ(·, ·), and defining the distance as d = ρ(s(y obs ), s(y prop )) where s(·) is some (possibly multi-dimensional) summary statistic of the data, (iv) if the distance, d, is less than or equal to a fixed tolerance, , then θ prop is retained, otherwise it is discarded. This is repeated until a desired particle sample size, N , is achieved.
Following the notation of Marin et al. (2012), the resulting ABC posterior can be written as π (θ | y obs ) = f (y prop | θ)π(θ)I A ,y obs (y prop ) where I A ,y obs (·) is the indicator function for the set A ,y obs = {y prop | ρ(s(y obs ), s(y prop )) ≤ }. There are many extensions to the basic ABC algorithm (e.g., Blum 2010;Blum et al. 2013;Ratmann et al. 2013;Csilléry et al. 2010;Del Moral et al. 2012;Drovandi and Pettitt 2011;Fearnhead and Prangle 2012;Joyce and Marjoram 2008;Marin et al. 2012), but here we focus on the ABC -Population Monte Carlo (ABC-PMC) approach introduced by Beaumont et al. (2009). However, the proposed methodology could be used in other sequential versions of ABC that require selecting a sequence of tolerances. The proposed adaptive approximate Bayesian computation tolerance selection algorithm (aABC-PMC) targets the same kind of approximate posterior sampling problems as the original ABC-PMC algorithm, and may be subject to the same limitations in the case of high-dimensional parameter spaces. ABC has been successfully used in numerous situations where the likelihood function is intractable and the number of parameters varies from 2 to 5 (e.g. Beaumont et al. 2009;Cisewski-Kehe et al. 2019;Csilléry et al. 2010;Cornuet et al. 2008;Del Moral et al. 2012;Gutmann and Corander 2016;Järvenpää et al. 2016;Jennings and Madigan 2016;Numminen et al. 2013;Silk et al. 2013;Simola et al. 2019;Sisson et al. 2007;Toni et al. 2009). Our algorithm is designed to significantly improve upon the original ABC-PMC method under similar circumstances.
The ABC-PMC algorithm by Beaumont et al. (2009) is based on an adaptive importance sampling approach, where, given a series of decreasing tolerances 1 > 2 > · · · > T (T being the final iteration), the proposal distribution is sequentially updated in order to improve the efficiency of the algorithm. This is done by constructing a series of intermediate proposal distributions, with the details of the steps presented in Algorithm 1. The first iteration of the ABC-PMC algorithm uses tolerance 1 and draws proposals from the specified prior distribution(s); the corresponding ABC posterior is denoted by π 1 . Rather than starting the rejection sampling over using a smaller , the algorithm proceeds sequentially by drawing proposals from the ABC posterior approximated in the previous iteration. After a parameter value, typically referred to as a particle, is selected from the set of available particles from the previous iteration, it is also translocated according to some kernel function (e.g. a Gaussian kernel) to avoid degeneracy of the sampler. Since the proposals are not drawn directly from the prior π, importance weights are used. The importance weight for a particle J = 1, . . . , N at iteration t is: (1.1) Algorithm 1 ABC-PMC algorithm for θ. Given a series of decreasing tolerances 1 > 2 > · · · > T if t = 1 then for J = 1, . . . , N do Set d where φ(·) is the density function of a standard normal distribution, 1 τ 2 t−1 is the variance (twice the weighted sample variance of the particles from iteration t − 1 is used, as recommended in Beaumont et al. 2009), and π(·) is the prior distribution. We note that the definition for the importance weight provided in (1.1) is up to a normalization constant. In fact each importance weight is normalized such that While the particles are drawn from a sequentially improving proposal distribution, the tolerances also decrease such that 1 > 2 > · · · > T , to increase the fidelity of the resulting approximation to the underlying posterior. The common strategies for selecting this sequence adaptively, highlighted in Section 1.1, can lead to inefficient sampling as well as avoiding relevant regions of the parameter space (Silk et al., 2013). The key contributions of this article are (i) a method for selecting the 1:T = ( 1 , 2 , . . . , T ) in a 1 The probability density function of a Q-dimensional standard normal distribution is φ(X) = (2π) − Q 2 exp − 1 2 X T X with the expected value of the random vector X is E[X] = 0 (where 0 is a Q-dimensional vector of zeros) and its covariance matrix is Var[X] = I Q , where I Q is the Q × Q identity matrix. manner that results in improved computational efficiency, and (ii) a rule for determining when the algorithm terminates (i.e. determining T ).

Selecting the Tolerance Sequence and Stopping Rules
There are three common approaches for selecting the tolerance sequence, 1:T : (i) fixing the values in advance (Beaumont et al., 2009;McKinley et al., 2009;Sisson et al., 2007;Toni et al., 2009), (ii) adaptively selecting t based on some quantile of {d , the distances of the accepted particles from iteration t − 1 (Cisewski-Kehe et al., 2019;Ishida et al., 2015;Lenormand et al., 2013;Simola et al., 2019;Weyant et al., 2013), or (iii) adaptively selecting t based on some quantile of the effective sample size (ESS) values (Del Moral et al., 2012;Numminen et al., 2013). These approaches can lead to inefficient sampling as discussed below and demonstrated in the simulation study in Section 3. It turns out that selecting tolerances using a predetermined quantile can, if not selected wisely, lead to the particle system getting stuck in local modes (Silk et al., 2013). Hence the exact sequence of tolerances has an impact not only on the computational efficiency of the algorithm but also on convergence towards the true posterior. We emphasize, however, that obtaining a high-fidelity approximation to the true posterior using ABC is not guaranteed, as this depends on a number of conditions to be met, including a careful selection of summary statistics. Silk et al. (2013) propose an adaptive approach for selecting the tolerance sequence at each iteration by estimating the threshold-acceptance rate curve (TAR curve), which is used to balance the amount of shrinkage of the tolerance with the acceptance rate. This approach requires the estimation of the TAR curve at each iteration of the algorithm. The naive, but computationally impractical approach to estimating the TAR curve (noted as such in Silk et al. 2013), is to simulate a Monte Carlo estimate of the acceptance rate at a range of different tolerances using the ABC forward model, which would have to be repeated at each iteration of the ABC algorithm. Instead, they suggest a more practical method for estimating the TAR curve by building an approximation to the forward model (in their example, using a mixture of Gaussians and the unscented transform of Julier et al. 2000). The TAR curve approach is able to avoid local optima values, but requires the extra step of building a fast approximation of the ABC data-generating model. Our proposed algorithm is similarly able to avoid local modes, but uses quantities that are directly available in the algorithm. More details are presented in Section 3.
After determining the sequence of tolerances, it is also necessary to determine when to stop a sequential ABC sampling algorithm. An ABC algorithm is often stopped when either a desired (low) tolerance is achieved (Sisson et al., 2007) or after a fixed number of iterations T (Beaumont et al., 2009). Ishida et al. (2015) showed that once the ABC posterior stabilizes, further reduction of the tolerance leads to low acceptance rates without meaningful improvement in the ABC approximation to the posterior. They stop the algorithm once the acceptance rate drops below a threshold set by the user.
The first main contribution of this paper is to extend the ABC-PMC algorithm so that the quantile used to update the tolerance in each iteration, q t , is automatically and efficiently selected, rather than being fixed in advance to a quantile that is used for each iteration. It is worth noticing that efficiency is not only a matter of having a high acceptance rate, as this can be easily accomplished by using larger quantiles, but rather a balance between the acceptance rate and a suitable amount of shrinkage of the tolerance. Moreover the series of tolerances needs to be selected in such a way that the algorithm avoids getting stuck in local modes. As the second contribution, we develop an automatic stopping rule directly based on the behavior of the sequential ABC posterior.
The rest of the paper is organized as follows. In Section 2 the adaptive selection of q t for determining the tolerance sequence is presented along with the proposed stopping rule. Section 3 is dedicated to a simulation study to compare quantile-based selection of tolerances using ABC-PMC with the proposed procedure. The final example considered uses real data on colonizations of the bacterium Streptococcus pneumoniae (Numminen et al., 2013). Concluding remarks are given in Section 4.

Methodology
Using the same quantile to update the tolerance at each iteration can be computationally inefficient and results in the particle system getting stuck in local modes (see the example in Section 3.2). In this section we introduce a method for adaptively selecting the quantile such that each iteration has its own quantile, q t , set based on the online performance of the algorithm.

Initial Sampling and Automatic Tolerance Selection Rule
In order to initialize the tolerance sequence we use the following approach. Let N be the desired number of particles to approximate the posterior. The initial tolerance 1 can be adaptively selected by sampling N init = kN draws from the prior, for some k ∈ Z + (Cisewski-Kehe et al., 2019). Then the N particles of the N init total particles with the smallest distances are retained, and 1 = max(d are the N smallest distances of the N init particles sampled. This initialization procedure effectively selects a distance quantile for the first step by the selection of an appropriate k, but making this first step adaptive is easier than trying to guess a good 1 . Trying to specify a reasonable 1 can be especially challenging when testing different summary statistics or distance functions because the scale of the distances can be different. It is important to note that k must be large enough to result in a satisfactory initial exploration of the parameter space, otherwise the algorithm might get stuck in local regions of the parameter space. This challenge also holds true in general for other ABC algorithms, including when 1 is predefined (i.e. not set adaptively). Providing a general and suitable value for k regardless of the problem that is considered is challenging, since this choice depends on a number of factors such as the definition of the prior distribution(s), the forward model and where relevant regions of the parameter space are (the latter being unknown). Therefore the parameter k has to be suitably tuned by the user once the forward model and the prior distribution(s) have been defined. The problem of selecting k is further discussed in Section 3.
For the subsequent tolerances, 2:T , the general idea is to gauge the amount of shrinkage for iteration t + 1 by determining the value of t+1 based on the amount of improvement betweenπ t−1 andπ t . In particular, we can use the estimated ABC posteriors to select a quantile to update the tolerance for the next iteration, and adjust the next tolerance based on how slowly or rapidly the sequential ABC posteriors are changing. More specifically, after each iteration t > 1, the following ratio can be estimated using the weighted particles:ĉ . (2.1) Sinceπ t−1 (θ) andπ t (θ) from (2.1) are both proper densities, they will be either exactly the same, makingĉ t = 1, or there must be a place whereπ t (θ) >π t−1 (θ), makinĝ c t > 1. Then the proposed quantile for iteration t (in order to determine t+1 ) is which varies between 0 and 1. Small values of q t imply q t−1 lead to a large improvement betweenπ t−1 andπ t , which then results in a larger percentage reduction of the tolerance for the coming iteration, t + 1. On the other hand, once the ABC posterior stabilizes, q t tends to 1 asπ t−1 andπ t become more similar.
The form of (2.2) was motivated by the Accept-Reject (A/R) algorithm (Andrieu et al., 2003;Robert and Casella, 2013). The A/R algorithm has a target distribution, a proposal distribution, and a rule to decide whether or not an element coming from the proposal distribution should be accepted as an element coming from the target distribution. If the form of the ABC posterior distribution was known, A/R sampling would work as follows. A candidate, θ * , would be proposed fromπ t−1 (θ|y obs ), and would be accepted with probabilityπ t (θ * |y obs ) is a positive real constant number selected such thatπ t (θ|y obs ) ≤ c ·π t−1 (θ|y obs ) (Robert and Casella, 2013). In A/R sampling, the unconditional acceptance probability is 1 c (Hesterberg, 1988). The constant c acts as a proxy for the difference between the proposal and the target distributions (e.g., if they are the same distribution, then c = 1 and all proposals would be accepted).
The ABC algorithm does not follow the A/R sampling scheme, but the notion of 1/c relating to the sampling efficiency in the A/R algorithm inspired the proposed adaptive tolerance selection idea. For some future iteration, say iteration t+1, the ABC posterior distribution is unknown so the previous two ABC posteriors at iterations t − 1 and t are used as the proposal and target distributions, respectively, so thatĉ t can be computed in (2.1). If there was a substantial change betweenπ t−1 andπ t , thenĉ t would be larger resulting in a smaller quantile, q t , for specifying t+1 . Asπ t−1 andπ t become more similar, larger quantiles q t are assigned. The proposed form of c t allows the tolerance selection to be based on changes in the ABC posterior from the previous iteration where substantial changes between iterations t − 1 and t result in a substantial decrease in the proposed tolerance for t+1 . This continues until a substantial decrease in the proposed tolerance does not result in a substantial change in the ABC posterior, at which point the amount of shrinkage in the tolerance becomes smaller. πt−1 , withĉ t defined according to (2.1) and used for setting q t , as defined in (2.2). (right) The (arbitrary) distribution of distances is from the accepted distances at iteration t, {d We found the rule based on (2.2) to work well empirically. One challenge with a theoretical evaluation of the proposed algorithm, and other algorithms designed to optimize the tolerance shrinkage and acceptance rate, is that the acceptance rate depends on the forward simulation model. In general ABC settings, the forward simulation model does not have a closed-form expression.
An illustration of the proposed quantile selection procedure is provided in Figure 1. Ifπ t−1 was used as the proposal for iteration t + 1 (instead ofπ t ), then q t would be the percentage decrease in the acceptance rate from iteration t, i.e. if acc t is the acceptance rate for iteration t, then acc t+1 would be approximately q t × acc t . However, we are not proposing fromπ t−1 , but ratherπ t so the decrease in the acceptance rate is mitigated by the improvement in the proposed particles from iteration t. When there is a large improvement in the ABC posterior fromπ t−1 toπ t , then q t is smaller, allowing for a larger drop in the tolerance. This larger percentage drop in tolerance does not result in an equal percentage drop in acceptance rate because the new proposal distribution, π t , is better thanπ t−1 . Conversely, ifπ t−1 is close toπ t , then the improvement in the ABC posterior is not enough to allow for a large decrease in the acceptance rate and consequently q t is closer to 1.
The evaluation of (2.1) relies on the calculation of the ratio between the (possibly multidimensional) density functions, defined here as r. A naive solution would be to separately calculate the density forπ t andπ t−1 using some Kernel Density Estimate (KDE) method (see Silverman 2018 for a review), and then estimate the ratio from those estimates. Then, the supremum of the previously calculated ratio can be obtained, for example, through an optimization procedure that computes the density over a grid of values. However, this is not a reliable solution, in particular for high-dimensional cases for which division by an estimated quantity can magnify the estimation error (Sugiyama et al., 2008). In order to address the problem of properly estimating r witĥ r, and therefore solving (2.1), alternatives to the KDE solution are available, such as ratio estimation methods (REM) (Sugiyama et al., 2012). The main advantage of using REM is that the calculation of the desired ratio does not include density estimation, which would involve dividing by an estimated KDE. Additionally, when using a KDE, kernel and bandwidth need to be selected, which can affect the result. Poorly estimating the density of the denominator of r, in particular, can potentially increase the error of the estimated ratio (Sugiyama et al., 2010). There are several different REM frameworks (e.g. Bickel et al. 2007;Gretton et al. 2009;Sugiyama et al. 2008Sugiyama et al. , 2010, but we use the ratio matching approach of Sugiyama et al. (2008) discussed in more detail next.
In order to introduce the REM framework, consider θ ∈ R p and two generic sam- where L and M are the sample sizes for the first and the second sample, respectively. The sample . The basic idea of the ratio matching approach is to match a density ratio modelr(θ) with the true density ratio r(θ) under some divergence (Sugiyama et al., 2010). Several divergences can be used to comparer(θ) with r(θ). A common divergence is the Bregman divergence (Bregman, 1967), along with some of its related divergences such has the unnormalized Kullback-Leibler divergence and the squared distance. In particular, the unnormalized Kullback-Leibler divergence minimizes the divergence between p L (θ) andp L (θ) =r(θ)p M (θ) by means of the following criterion: By decomposing the Kullback-Leibler divergence defined in (2.3),r(θ) can be estimated by solving the objective function maxr p L (θ) logr(θ)dθ (Hido et al., 2011;Sugiyama et al., 2010). Further details on the unnormalized Kullback-Leibler divergence and on other REM approaches are found in Sugiyama et al. (2012). As pointed out by Sugiyama et al. (2010), a further non-negligible advantage of using REM, and in particular the ratio matching approach, is the applicability of gradient-based algorithms and quasi-Newton methods for optimization overr(x).
In the analyses of the present work we use the ratio matching approach and the Kullback-Leibler importance estimation procedure (KLIEP) (Hido et al., 2011;Sugiyama et al., 2010Sugiyama et al., , 2008 in order to estimate, at the end of each iteration t, the ratio of densities defined in (2.1). Recall that the densities involved in (2.1) areπ t (θ) andπ t−1 (θ). Once the ratio betweenπ t (θ) andπ t−1 (θ) has been estimated, the supremum of (2.1) is calculated by using an optimizer over the parameter space, such as the one proposed by Brent (2013). The quantile used to reduce the tolerance for the coming iteration is finally retrieved by using (2.2). The steps discussed above are performed at the end of each iteration as long as the stopping rule, defined in (2.5) and discussed below, is not satisfied. Estimation ofr is carried out by using the densratio package, 2 which is freely available in the R software (R Core Team, 2019).
The acceptance rate is also useful for evaluating the computational burden of the ABC-PMC algorithm, defined as: where D t is the number of draws done at iteration t in order to produce N accepted values. Equation (2.4) generally decreases with each iteration because as the tolerance decreases, the number of elements D t required to get N accepted particles generally increases (Lintusaari et al., 2017).

Stopping Rule
There are several published ideas in the literature on how to determine the number of iterations in an ABC-PMC algorithm. Often one picks some T based on the computational resources available, but this can be needlessly inefficient. Ishida et al. (2015) proposed to stop the algorithm once the acceptance rate is smaller than some specified, fixed tolerance. The proposed stopping rule is directly based on the estimated sequential ABC posterior distributions, which avoids unnecessary additional iterations of the algorithm.
The ABC-PMC algorithm produces a sequence of T posterior distributions,π t , where t identifies the tolerance used in iteration t, with t = 1, . . . , T and 1 > 2 > · · · > T . When defining a stopping rule, it turns out that (2.2) can be used not only to adaptively selecting the quantile used to reduce the tolerance across the iterations, but also to indicate when to stop the procedure once the sequential ABC posterior stops changing significantly.
The series of quantiles defined through (2.2) generally increases as the tolerance decreases. In particular, since the quantile used to reduce the tolerance is based on the online performance of the ABC posterior distribution, once the ABC posterior has stabilized, q t ≈ 1. This follows directly from (2.1) because once the ABC posterior has stabilizedĉ t ≈ 1, and further reductions of the tolerance (i.e. additional iterations) do not necessarily lead to an improvement by the ABC posterior distribution. In other words, once the ABC posterior stabilizes, the series of the quantiles defined through (2.2) stops increasing and the upper bound of 1 implies that no further reduction will improve the ABC posterior distribution. This leads to an automatic and simple stopping rule, which is employed starting from the third iteration, i.e. once the transformation kernel has been used twice to avoid premature stopping. Our algorithm is stopped at time t when q t > 0.99 for t ≥ 3. (2.5) Hence, the algorithm is stopped once the quantile used to reduce the tolerance suggests that further reduction is not necessary since the ABC posterior has stabilized.
Using (2.2) as an automatic rule to shrink the tolerance and (2.5) as the stopping rule, the ABC-PMC algorithm is stopped once additional iterations with smaller tolerances do not lead to significant changes in the ABC posterior. 3

Illustrative Examples
Next we provide a comparison between the original ABC-PMC algorithm and our extension proposed in Section 2, the aABC-PMC, by using three examples. In the first example the Gaussian mixture model by Sisson et al. (2007) is used in order to demonstrate the computational efficiency of the proposed aABC-PMC procedure. Then the aABC-PMC algorithm is used for a model from Silk et al. (2013), which has local modes, in order to illustrate how the proposed automatic tolerance selector is able to avoid getting stuck in local regions of the parameter space. The final example, originally presented in Numminen et al. (2013), uses data on colonizations of the bacterium Streptococcus pneumoniae and represents a computationally expensive forward model. Expensive forward models are a challenge for ABC methods because the computational cost can be prohibitive for practical applications, and in these cases selecting an appropriate sequence of tolerances is crucial. A fourth example, the Lotka-Volterra model by Toni et al. (2009), is presented (see Appendix A in Supplementary Material, Simola et al., 2020).
In order to compare the proposed procedure with the original ABC-PMC algorithm, both the computational time and the total number of draws until the stopping criterion is satisfied are considered. The Hellinger distance is used for evaluating the similarity between the 1-dimensional marginal ABC posterior distributions at the final iteration, π T , and a benchmark, π true , which is defined as: (3.1) The benchmark, π true , is the true posterior distribution if it is available in closed form, which is the case in the first two presented examples (see Sections 3.1 and 3.2). In the final example, since the true posterior distribution is not available, the ABC posteriors from Numminen et al. (2013), are used as benchmarks (see Section 3.3).
In order to estimate the 1-dimensional marginal ABC posterior distributions from the samples and their corresponding importance weights, a KDE (Silverman, 2018) is used with a Gaussian kernel and a smoothing bandwidth parameter h. The bandwidth is selected using Silverman's rule-of-thumb (Silverman, 1986).
Finally, unless otherwise noted, the number of particles in the ABC procedures is set to N = 1, 000. Sisson et al. (2007) aABC (The displayed results were obtained by running the procedure 21 times and using the run that produced the median number of total draws.) For the aABC-PMC algorithm, the quantile automatically selected through the iterations is displayed under q t . The procedure stopped once the quantile q 5 = 0.999 was proposed. For the ABC-PMC algorithm a total of 1, 421, 283 (243 sec.) draws were required, while our aABC-PMC takes 81, 230 (88 sec.) draws overall.

Gaussian Mixture Model
The first application of the aABC-PMC is an example from Sisson et al. (2007), which is also analyzed by Beaumont et al. (2009). It is a Gaussian mixture model with two Gaussian components with known variances and mixture weights, but an unknown common mean, f (y | θ) = 0.5N (θ, 1) + 0.5N (θ, 0.01) and prior π(θ) ∼ Unif(−10, 10). With a single observation y obs = 0, the true posterior distribution is π(θ | y obs ) ∼ 0.5N (0, 1) + 0.5N (0, 0.01).  Table 1. To evaluate the reliability of the aABC-PMC, a comparison with the ABC-PMC is done both in terms of computational time and total number of draws. The results of the analysis are shown in Table 1 and are based on 21 independent runs with the same dataset, y obs = 0. The table includes the values for the run that produced the median number of total draws. The aABC-PMC outperforms ABC-PMC in the terms of total draws (81,230 vs 1,421,283) and a faster computational time (88 seconds vs The q t 's (black circles) generally increase through the iterations until the ABC posterior has stabilized. The acceptance rate (blue triangles) decreases throughout the iterations, which is why it is desirable to stop the algorithm once the ABC posterior has stabilized.
243 seconds). The final ABC posteriors for each method are displayed in Figure 2a. Though the aABC-PMC method is computationally more efficient than the ABC-PMC approach, the final ABC posteriors are very similar. This suggests that after a suitable tolerance is achieved, decreasing the tolerance further does not necessarily lead to a better approximation of the posterior distribution.
From Table 1, we note that the final tolerance for Sisson et al. (2007) is 10 = 0.0025 (H dist = 0.20) while the automatic stopping rule of aABC-PMC leads to 4 iterations with a final tolerance of 4 = 0.035 (H dist = 0.20). In Figure 2b, the q t 's retrieved by using (2.2) are displayed (black circles), which increase until the final iteration, while the acceptance rate (blue triangles) decreases. Neglecting to stop the algorithm once the ABC posterior has stabilized can be inefficient since the number of draws needed in order to complete further iterations can drastically increase, as evidenced by the increasing D t for later iterations displayed in Table 1.
Next, we show the behavior of the aABC-PMC algorithm for different choices of the number of proposed values from the prior distribution at the first iteration of the procedure. Initial particle sample sizes, N init , of N, 2N, 5N, and 10N are considered (with N = 1, 000), and the results are displayed in Table 2. The initial particle sample size that seems to best balance the total number of draws and the time required to satisfy the stopping rule in this example is 5N , with similar final ABC posterior distributions based on H dist (see Table 2); the posteriors are displayed in Figure 3.  (N, 2N, 5N, 10N ) for the Gaussian mixture model example.  (N, 2N, 5N, 10N ) for the Gaussian mixture model example.
Using the EasyABC R package 4 we carried out the same analysis for the ABC-SMC algorithm by Del Moral et al. (2012). The ABC-SMC algorithm by Del Moral et al. (2012) is discussed in Section 3.3. For each initial particle sample size, N init , of N, 2N, 5N, and 10N, 21 independent runs with the same dataset are performed and the runs that produced the median number of total draws are compared to the corresponding run obtained by our adaptive approach. Our choices for setting the parameters required by the ABC-SMC algorithm (see Section 3.4) are: N = 1, 000, = 0.035, α = 0.5, M = 1 and nb threshold = N/2. We note that for the last three parameters, the default values are used, according to the suggestions by Del Moral et al. (2012). The results of the analysis are summarized in Table 3 and the corresponding posterior distributions are displayed in Figure 4. For all four N init values considered, the final tolerances returned by the ABC-SMC algorithm are comparable with the one obtained by our approach with N init = 5N ( 4 = 0.035). However the corresponding ABC-SMC posterior distributions do not match the true posterior distribution as well as our proposed approach. In particular, the ABC-SMC algorithm does not seem to capture the (low) variance coming from the second component of the Gaussian mixture model. Similar results were also obtained by the ABC-SMC sampler proposed in Bonassi and West (2015). A further comparison when using the ABC-SMC algorithm by Del Moral et al. (2012) (N, 2N, 5N, 10N ) for the Gaussian mixture model example.  (N, 2N, 5N, 10N ) for the Gaussian mixture model example.

Presence of a Local Mode
The sequence of tolerances has an impact not only on the computational efficiency of the algorithm, but also on its ability to find the true posterior (Silk et al., 2013), noting again that convergence to the true posterior using ABC is not guaranteed. To demonstrate the performance of aABC-PMC in the presence of local modes, we consider an example proposed in Silk et al. (2013). The (deterministic) forward model is g(θ) = (θ − 10) 2 − 100 exp(−100(θ − 3) 2 ). The input value is set to θ = 3 leading to a single observation y obs = −51. The true posterior distribution is a Dirac function at 3. The specifications for the distance function (L 1 norm), the prior distribution (a normal distribution with mean of 10 and variance of 10), and the desired number of particles (N = 1, 000) are taken from Silk et al. (2013). Figure 5 displays the locations of the accepted particles (orange x's) against the distances for a range of θ's, which highlights the challenge for ABC with this model. There is a local minimum distance around θ = 10, but the global minimum distance occurs at the true value of θ = 3. Initial steps of the ABC algorithm will find the local minimum, but the algorithm can easily get stuck around θ = 10 if the sequential tolerances are not selected carefully. The series of plots in Figure 5 shows the behavior of the aABC-PMC algorithm by focusing on the values for θ that were accepted (orange x's). After 6 iterations, the aABC-PMC algorithm has found the global minimum distance around  Table 4, where 384, 347 total particles were used by the proposed aABC-PMC algorithm. The table includes the values for the run that produced the median number of total draws. It is apparent from Figure 5 that the third iteration was an important step in which the large reduction of the tolerance allowed the algorithm to consider those few particles coming from the global optimal value at θ = 3. Although the raw tolerance hardly decreases between the first and the second iteration ( 1 = 51.59 and 2 = 51.02), there is a substantial change between the ABC posteriors, fromπ 2 toπ 3 . The majority of the accepted values from t = 2 are sampled near the local mode at θ = 10, but the reduction resulting from the slightly smaller 3 leads to the majority of values proposed near θ = 3 to be accepted.
In order to compare the proposed aABC-PMC algorithm with the ABC-PMC approach of Silk et al. (2013) (see Section 1.1), we estimated the TAR curve and the corresponding thresholds (Silk et al., 2013). The TAR curve is obtained by plotting on the x-axis several thresholds that might be picked for the next iteration of ABC simulations and on the y-axis their corresponding acceptance rates. The threshold recommended for the next ABC-PMC iteration is then selected by locating the "elbow" of the estimated TAR curve (Silk et al., 2013). Since the forward model is computationally cheap, an approximation to the forward model was not needed. Instead, the TAR curve was estimated at each iteration by setting arbitrary grid points of tolerances having range in (0, t−1 ), running the ABC-PMC algorithm (for t > 1 the previous iteration's particle system and the Gaussian perturbation kernel are used), and then calculating the acceptance rate according to (2.4). This procedure was repeated 100 times and the resulting average TAR curve was used to retrieve the tolerance for the coming iteration, as was done in Figure 2(left) of Silk et al. (2013). As result, a plot of acceptance rate vs. TAR curve (Silk et al., 2013) a  Table 4: The number of draws needed in each iteration to reach N = 1, 000 accepted values for the ABC-PMC with the TAR curve-selected tolerances and the aABC-PMC algorithm. (The displayed results were obtained by running the procedure 21 times and using the run that produced the median number of total draws.) For the aABC-PMC algorithm, the quantile automatically selected through the iterations is displayed under q t . The procedure stopped once the quantile q 7 = 0.9991 was calculated. For the ABC-PMC algorithm a total of 1, 415, 600 (310 sec.) draws are required, while our aABC-PMC takes 384, 347 (258 sec.) draws overall. The number of draws listed for Silk et al. (2013) does not include the draws required to build the TAR curve; however, we did include the TAR curve construction in the computational time.
tolerances was obtained; the tolerance is set at the value corresponding to the elbow of the TAR curve. The series of tolerances, displayed together with the number of draws in Table 4, is 1:3 = (150, 51.26, 50.84) and the corresponding ABC posterior distributions are displayed in Figure 6a. The number of draws listed for Silk et al. (2013) does not include the draws required to build the TAR curve; however, we did include the TAR curve construction in displayed computational time. We note that the true posterior distribution, which is a Dirac function centered in θ = 3, is not suitably approximated by Silk et al. (2013) (H dist = 0.17).
In order to calculate the Hellinger distance in this example, we approximate the true posterior (i.e., a Dirac function at θ = 3) with an N -dimensional vector with all elements equal to 3.
For t = 4, the estimated TAR curve did not have an elbow and, consequently, there was no additional shrinkage of the tolerance resulting in an ABC posterior that was not a suitable approximation to the true posterior distribution; the final tolerance 3 was too high. We tried making adjustments to the TAR curve grid to see if this could be improved. When using fewer grid points (e.g. 10) for the TAR curve, we were able to improve the performance. However, this improved performance was due to poorer approximation to the TAR curve. In general, it would be preferable if a better estimate of the TAR curve lead to better performance. A higher resolution TAR curve grid with 1000 grid points also was not able to find the global optimal solution. In contrast, as shown in Figure 6b, the proposed aABC-PMC approach provides a better approximation of the true posterior distribution although the number of draws required by the simulator is only of 384, 347 (compared to 1,415,600 draws required by ABC-PMC with the 100 point TAR curve grid). Silk et al. (2013) note that if the particles are sampled from a large region of the parameter space that has a negligible mass in the posterior distribution, there is a risk of getting stuck in this parameter region if the tolerance is not selected carefully. In other words, the parameter space needs to be sufficiently explored in order to get enough particles in regions near the global optimal value. In the first iteration of the aABC-PMC algorithm the number of particles sampled directly from the prior was kN with k = 5, which seems to work well in the examples considered. We emphasize that moving toward relevant regions of the parameter space needs to happen in the first few iterations of the ABC-PMC procedure, since uniformly small reductions in the tolerance sequence (e.g. using a fixed q t ≥ 0.25) could end up removing those few important particles near the global optimal value, even if the number of particles sampled directly from the prior is 5N .
The initial exploration of the parameter space and the definition of small enough quantiles in the first iterations appears to be why in the procedure based on the TAR curve, the total number of draws needed by the ABC-PMC algorithm is large, making it very expensive computationally. In fact, at the end of the second iteration, the majority of the previous iteration's accepted particles are drawn near the local minimum. Moreover, since their N init = N , only few candidates close to the global optimum are available. This means that when a particle is resampled, it will likely come from regions near to the local minimum and therefore it may be easily rejected during the third iteration of the ABC-PMC algorithm, for which the selected tolerance is  (N, 2N, 5N, 10N ) for the Silk et al. (2013)'s model. The proposed aABC-PMC algorithm allows for small q t 's early on, when larger improvements occur between the sequential ABC posteriors. By doing so, larger reductions in the tolerance sequence can be taken in the first iterations of the ABC-PMC, which results in moving away from local optimal values into better regions of the parameter space. If a sufficient reduction of the tolerance is not made early on, achieving a good approximation of the true posterior distribution is unlikely because the distances associated with the local optimal values will overwhelm the particle system so that it gets stuck in the local region.
As previously done for the Gaussian Mixture Model example presented in Section 3.1, we conclude the analysis of this model by performing a comparison between our adaptive aABC-PMC approach and the ABC-SMC algorithm by Del Moral et al. (2012). Again, four initial particle sample sizes of N init are considered (N, 2N, 5N, and 10N) and 21 independent runs with the same dataset are performed. The results include the runs that produced the median number of total draws and are compared to the corresponding results obtained by our adaptive approach. The 5 parameters required by the ABC-SMC algorithm have been fixed as follows: N = 1, 000, = 0.00025, α = 0.5, M = 1 and nb threshold = N/2. We note again that default values are used for the last three parameters, following the suggestions by Del Moral et al. (2012). The results of the analysis are summarized in Table 5 and the corresponding posterior distributions are displayed in Figure 7. From Table 5 with k = 5, although the number of total draws of the ABC-SMC algorithm is smaller than the corresponding total number of draws obtained by the adaptive aABC-PMC, our procedure is faster in terms of computational time. Moreover, the final ABC posterior distribution obtained by the aABC-PMC algorithm (H dist = 0.064) matches the true posterior distribution better than the one obtained by the ABC-SMC sampler (H dist = 0.388). On the other hand, the ABC-SMC sampler successfully explores relevant regions of the parameter space for k = 1 and k = 2, while our aABC-PMC failed to reach the global mode for k = 1, 2 because too few particles from the global mode were drawn in the first iteration of the procedure. However, the final ABC posterior distribution obtained by the aABC-PMC algorithm with the recommended k = 5, and for which H dist = 0.064, better matches the true posterior compared to any ABC posterior distribution obtained by the ABC-SMC algorithm (Figure 7).

Bacterial Infection in Day Care Centers Example
The final model we consider, discussed by Numminen et al. (2013), uses data on colonizations of the bacterium Streptococcus pneumoniae. Discussion about mathematical models for such scenarios, known as household models, can be found in Hoti et al. (2009) or Brooks-Pollock et al. (2011). According to the specifications provided in Numminen et al. (2013), the transmission process is modeled with four parameters. Two parameters, β and Λ, account for the hazards of infection from the day care center and from the community, respectively. Another parameter, θ, scales the probability of co-infection. Finally, the parameter γ corresponds to the rate of clearance of an infection. In the following analyses we considered γ = 1 fixed and known, to be consistent with the analysis in Numminen et al. (2013).
The observed data consists of the identified pneumococcal strains in a total of 611 children from 29 day care centers, with varying numbers of sampled attendees per day (Vestrheim et al., 2008(Vestrheim et al., , 2010. For each of the 29 day care centers, a binary matrix with varying number of sampled attendees is available. For each sampled attendee, the state of carrying one of the 33 different pneumococcal strains or not is indicated by a 1 or 0, respectively, in the binary matrix. As pointed out in Gutmann and Corander (2016) statistical inference is challenging in this setting since the data represent a snapshot of the state of the sampled attendees at a single time point only. Moreover, the modeled system involves infinitely many correlated unobserved variables, since the modeled process evolves in continuous time. Using the observed colonizations with bacterial strains, the following four summary statistics are obtained for each of the 29 day care centers: the Shannon index of diversity of the distribution of the observed strains, the number of different strains, the prevalence of carriage among the observed individuals, and the prevalence of multiple infections among the observed individuals. By doing so, the dimensionality of the problems reduces from a 611 · 33 · 29 = 584, 727 dimensional space to a 4 · 29 = 116 dimensional space. Numminen et al. (2013) use the four summary statistics and four tolerances, = ( 1 , 2 , 3 , 4 ), for each iteration of their procedure. Instead, we use the approach of Gutmann and Corander (2016). Each of the four summary statistics is rescaled so that the maximum value for each of the four the summary statistics is one. Then the summary statistics are vectorized in order to obtain a single vector of dimension 116. Finally the L 1 distance between the vector corresponding to y prop and the vector corresponding to y obs is calculated, with the result divided by 116. By doing so, only one tolerance is used in the ABC procedure.
The series of tolerances used in Numminen et al. (2013) is the importance weight for particle J = 1, . . . , N at iteration t as defined in (1.1). Once the ESS is estimated by using (3.3), the new tolerance t+1 is obtained by solving the following for t+1 : where q t is some pre-selected quantile which varies between 0 and 1. Numminen et al. (2013) had to adjust this to work for their setting with four tolerances. We note that our aABC-PMC approach does not require the specification of a quantile q t , nor other parameters such as the number M of simulations performed for each particle, the minimal effective sample size threshold below which a resampling of particles is performed, nb threshold , and the final tolerance level, final . Further details on the ABC-SMC algorithm and discussions on how to properly select its required parameters can be found in Del Moral et al. (2012).
The prior distributions for the three parameters of interest are β ∼ Unif(0, 11), Λ ∼ Unif(0, 2), and θ ∼ Unif(0, 1). Starting from the second iteration of the ABC-PMC algorithm, proposals are perturbed with Gaussian kernels, using the specifications of Beaumont et al. (2009). The desired particle sample size was set at N = 10, 000. For the aABC-PMC algorithm, the initial number of draws sampled from the prior distributions is set to N init = 5 × 10, 000, in order to appropriately explore the parameter space.
The results of the analysis are summarized in Table 6, where the proposed adaptive rule for selecting the quantile performs better than the ABC-SMC algorithm both in terms of the computational time (3 days and 5 hours vs. 4 days and 12 hours using a cluster computer) and the total number of draws (1, 085, 696 draws vs. 2, 199, 760 draws). Because the proposed sampling procedure stops after t = 4 iterations, the expensive forward model is used fewer times, achieving final posterior distributions in a shorter amount of time. We note that the number of particles sampled in the first iteration has an important role in the performance of the algorithm. In fact, having sampled from the priors D 1 = 50, 000 particles allowed the aABC-PMC algorithm to initiate with a smaller tolerance 1 = 1.26 compared to the ABC-SMC algorithm ( 1 = 3.91 by fixing D 1 = 10, 000 particles). Numminen et al. (2013) aABC  Table 6: Bacterial infection in day care centers results. The number of draws needed in each iteration to reach N = 10, 000 accepted values for the ABC-SMC as presented in Gutmann and Corander (2016) and the proposed aABC-PMC algorithm. In the aABC-PMC algorithm also the quantile automatically selected through the iterations is available. The procedure stopped once the quantile q 5 = 0.993 was calculated. For the ABC-SMC algorithm a total of 2, 199, 760 (4 days and 12 hours on a cluster with 200 cores) draws are required, while our aABC-PMC takes 1, 085, 696 draws (3 days and 5 hours on a cluster with 200 cores).

Concluding Remarks
The ABC-PMC algorithm of Beaumont et al. (2009) has lead to great improvements over the basic ABC rejection algorithm in terms of sampling efficiency. However, to use ABC-PMC it is necessary to define a sequence of tolerances along with the total number of iterations. We propose an approach leveraging ratio estimating methods for shrinking the tolerances by adaptively selecting a suitable quantile based on the progression of the estimated ABC posteriors. The proposed adjustment to the existing algorithm is shown to be able to deal with the possible presence of local modes and shrinks the tolerance in such a way that fewer draws are needed from the forward model compared to commonly used techniques for selecting the tolerances. A simple criterion for stopping the algorithm based on the behavior of the sequential ABC posterior distribution is also presented. The empirical performance in the examples considered suggests the proposed aABC-PMC algorithm is superior to the other options considered in terms of computational time and the number of draws from the forward model. Based on the computational experiments we envisage that the proposed aABC-PMC algorithm performs generally well when dealing with small to moderate dimensional problems for which the original ABC-PMC algorithm was developed. It remains as a challenge for the future research to generalize these samplers to higher dimensional models.
inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty." The Astrophysical Journal , 764: 116. 397, 400