Error Control of the Numerical Posterior with Bayes Factors in Bayesian Uncertainty Quantification

In this paper, we address the numerical posterior error control problem for the Bayesian approach to inverse problems, recently known as Bayesian Uncertainty Quantification (UQ). We generalize the results of Capistrán et al. (2016) to (a priori) expected Bayes factors (BF) and to a more general, infinite-dimensional setting. In this inverse problem, the unavoidable numerical approximation of the Forward Map (FM, i.e., the regressor function), arising from the numerical solution of a system of differential equations, demands error estimates of the corresponding approximate numerical posterior distribution. Our approach is to make such comparisons in the setting of Bayesian model selection and BFs. The main result of this paper is a bound on the absolute global error tolerated by the numerical solver of the FM in order to keep the BF of the numerical versus the theoretical posterior near one. For two examples, we provide a detailed analysis of the computation and implementation of the introduced bound. Furthermore, we show that the resulting numerical posterior turns out to be nearly identical to the theoretical posterior, given the control of the BF near one.


Introduction
Bayesian Uncertainty Quantification (UQ) is a recently coined term that has received considerable attention in the last ten years from researchers in many fields, as an alternative solution to some classical inverse problems (see Kaipio and Somersalo, 2005; Fox et al., 2013; Dashti and Stuart, 2016, for some reviews on the subject).
Many applications are found in image processing (Giovannelli and Idier, 2015; Zhu et al., 2011; Cai et al., 2011; Chama et al., 2012; Nissinen et al., 2011; Kozawa et al., 2012, for example), but there is a great variety of applications in other fields, such as geothermal prospection (Cui et al., 2011, 2019), network analysis (Hazelton, 2010; Sun et al., 2015), heat transfer (Kaipio and Fox, 2011), pollution and ecology (Keats et al., 2010; Hutchinson et al., 2017), fusion physics (Osthus et al., 2019), and tumor growth (Collis et al., 2017; Kahle et al., 2019), to mention a few. As with most terms, "UQ" has a level of arbitrariness: indeed, all of statistics is partly concerned with quantifying uncertainty. However, in the inverse problems community UQ mostly refers to the theory and practice of probabilistic propagation of uncertainty and probabilistic (Bayesian) inference in the context of complex regressors involving systems of differential equations. The subject is indeed relevant to the statistical, and especially the Bayesian, community. We address in this work what we believe is an important issue of Bayesian UQ.
In more concrete words, Bayesian UQ is Bayesian inference on a possibly infinite-dimensional parameter θ with data y, e.g. y = H(F(θ)) + ε; ε ∼ N_m(0, σ²I), where the regressor F(θ), or Forward Map (FM), is commonly a complex non-linear map, with input parameters θ, defined by an initial/boundary value problem for a system of ODEs or PDEs. Consequently, to evaluate F(θ) at a single θ value we must solve the corresponding system of ODEs or PDEs (H is an observation functional or link function; formal details are given in Section 2). However, the analytical solution of such a problem is seldom available and we must work with a numerical approximation F_{α(n)}(θ), in the classic numerical analysis sense (Quarteroni and Valli, 2008; Iserles, 2009). Here α(n) represents a discretization used to approximate the FM; as n increases, the discretization becomes finer and the approximation error becomes smaller. This commonly involves the use of serious numerical methods and large computing power to evaluate F_{α(n)}(θ).
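To make the setting concrete, the following is a minimal sketch of a numerical FM F_{α(n)}: the ODE (logistic growth), its initial condition, and the observation times are hypothetical stand-ins, not taken from this paper; only the role of n (a fixed-step solver with h = 1/n, so the discretization refines as n grows) mirrors the discussion above.

```python
import numpy as np

def forward_map(theta, n, t_obs=(0.5, 1.0, 1.5, 2.0)):
    """Evaluate H(F_alpha(n)(theta)): solve u' = theta*u*(1-u), u(0) = 0.1,
    with a fixed-step RK4 solver of step h = 1/n, then apply the observation
    functional H, here the solution values at the times t_obs."""
    f = lambda u: theta * u * (1.0 - u)
    h, u, t = 1.0 / n, 0.1, 0.0
    out = []
    for t_next in t_obs:
        while t < t_next - 1e-12:   # integrate up to the next observation time
            step = min(h, t_next - t)
            k1 = f(u); k2 = f(u + 0.5 * step * k1)
            k3 = f(u + 0.5 * step * k2); k4 = f(u + step * k3)
            u += (step / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
            t += step
        out.append(u)
    return np.array(out)   # a vector in R^m, m = len(t_obs)
```

Refining the discretization is then simply a matter of increasing n, e.g. comparing forward_map(0.8, 10) against forward_map(0.8, 1000).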
Bayesian UQ refers to the solution of the above problem in terms of inference on the unknown parameter θ, by stating a prior π(θ) and seeking its posterior distribution (via MCMC or any other standard or tailor-made method).
In this paper, we are concerned with the numerical error induced in the numerical posterior with respect to the theoretical posterior when considering the approximate FM F_{α(n)}(θ) instead of the exact, theoretical FM F(θ). We are mostly interested in establishing guidelines for choosing a discretization level α(n) for the FM: a critical first problem to be considered when working with numerical FMs, or any other computer model.
Controlling the numerical error induced in the posterior distribution is an important issue in Bayesian UQ (Cotter et al., 2010; Yan and Guo, 2015; Lie et al., 2018; Dashti and Stuart, 2011). Capistrán et al. (2016) (CA in the following) present an approach to address the problem in the finite-dimensional case using Bayes Factors (BF) of the numerical model vs. the theoretical model. In an ODE framework, these odds are guaranteed to converge to one at the same order as the numerical solver used. For high-order solvers, CA illustrate, by reducing the step size in the numerical solver, that there should exist a point at which the BF is nearly one but the discretization α(n) (step size) is still strictly greater than zero. This threshold, in turn, depends on the numerical FM error compared to the data noise σ, the sample size, and other factors. This is the main point made by CA: it may be possible to calculate a threshold on the numerical error of the FM such that the numerical posterior is basically equal to the theoretical posterior; although we are using an approximate FM, the resulting posterior is then nearly free of numerical error. CA show, with some examples, that such an optimal solver discretization leads to negligible differences between the numerical and the theoretical posterior, since the BF is close to one, potentially saving CPU time by choosing a coarser solver.
However, the approach of CA still has some shortcomings. Firstly, it depends crucially on estimating normalizing constants from Monte Carlo samples of the unnormalized posterior, for a range of discretizations α(n). This is a complex problem and the subject of current research, since it is difficult to reliably estimate these normalizing constants in high-dimensional problems (Moral et al., 2017; de Valpine, 2008). Secondly, CA's approach is as yet incomplete, since one would need to decrease α(n) systematically, calculating the normalization constant of the corresponding numerical posterior, to eventually estimate the normalization constant of the theoretical posterior (see figure 2 in CA), which in turn pinpoints a discretization at which both models are nearly identical. The main difficulty here is that only after calculating the posterior for several small step sizes can we establish that a larger step size would have been sufficient, with an estimated BF close to one. That is, one has already spent the CPU effort of calculating the posterior distribution for a grid of small step sizes, rendering the selection of the optimal step size useless.
To improve on CA, the idea of this paper is to consider the expected value of the BFs, before data is observed. We will try to bound this expected BF to find general guidelines to establish error bounds on the numerical solver, depending on the specific problem at hand and the sample design used, but not on particular data. These guidelines will be solely regarding the FM and, although perhaps conservative, they may represent useful bounds to be used in practice. Moreover, we generalize CA to an infinite-dimensional setting.
To fix ideas, we present an example. Imagine we have a metal rod subject along its length x to a heat profile described by the function f(x) = sin(πx) (its forcing), and whose edges are kept at a reference temperature value (e.g. zero). The thermal conductivity a(x) of the rod along its length is unknown. The forcing is applied and the system is left to reach equilibrium, at which point temperature measurements are taken at fixed points along the rod. This system is commonly described by the stationary heat equation −(a(x) u′(x))′ = f(x), x ∈ (0, 1), subject to Dirichlet boundary conditions u(0) = u(1) = 0, with forcing term f(x) = sin(πx) and thermal conductivity a(x) > 0. The Forward Map in this case is: given a(x), solve the above differential equation to find u, namely F[a] = u.
Given fixed measurement locations x_1, x_2, …, x_m, the observation functional is H(u) = (u(x_1), u(x_2), …, u(x_m)). Under additive Gaussian noise, the data y = (y_1, y_2, …, y_m) satisfy y_j = u(x_j) + σε_j, with ε_j ∼ N(0, 1) independent. Bayesian inference is required for the unknown parameter a, the thermal conductivity. A Finite Element numerical solution (see Brenner and Scott, 2008; Reddy, 2006, for example) is required to find a numerical approximation F_{α(n)}[a] of the theoretical FM F[a]. If prior information tells us that the rod is homogeneous, then we may set a(x) = θ ∈ R⁺ and the problem is parametric inference of dimension one. In other circumstances, prior information might lead us to think that the thermal conductivity increases linearly with length, that is a(x) = θ_0 + θ_1 x, etc. On the other hand, little may be known about a(x) and the whole function may need to be inferred, in which case the parameter is infinite-dimensional, belonging to a functional space. A particular representation may be chosen, for example a(x) = exp(b(x)) with b(x) expanded in a (possibly truncated) set of basis functions, leading to exact and approximate parameterizations and their corresponding priors π and π_k. There are several results published recently proving that the numerical versions of the posterior tend to the theoretical posterior (see Section 2.2 for details). That is, under some regularity conditions, as n increases the numerical posterior becomes a better approximation to the theoretical posterior, where the limit has been adequately defined using the Hellinger or Total Variation measure norms. That is, we already know how and in what sense the approximations are consistent. However, in applications, we will still need to choose a particular (finite) n.
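As a minimal sketch of this FM, the following uses a conservative finite-difference scheme as a simple stand-in for the FEM solver employed later in Section 4.1; the grid size, measurement locations, and the conductivity in the usage comment are illustrative.

```python
import numpy as np

def heat_forward_map(a_func, n, x_obs):
    """Solve -(a(x) u'(x))' = sin(pi x) on (0,1), u(0) = u(1) = 0, on a grid
    with n interior nodes, and return u at the measurement locations x_obs
    (the observation functional H). Conservative finite differences, with
    a(x) evaluated at cell midpoints."""
    h = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)
    a_mid = a_func((x[:-1] + x[1:]) / 2.0)     # a at the n+1 cell midpoints
    main = (a_mid[:-1] + a_mid[1:]) / h**2     # tridiagonal stiffness matrix
    off = -a_mid[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    f = np.sin(np.pi * x[1:-1])
    u = np.zeros(n + 2)
    u[1:-1] = np.linalg.solve(A, f)
    return np.interp(x_obs, x, u)

# Homogeneous rod a(x) = theta, observed at m = 30 regularly spaced points:
# heat_forward_map(lambda x: 2.0 + 0 * x, n=150, x_obs=np.linspace(0.05, 0.95, 30))
```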
Accordingly, the problem we try to address in this paper is: How can we control the numerical error in the posterior distribution? Specifically, how can we choose a suitable discretization n to control such numerical error properly?
The basic idea is to establish the relative merit of the numerical posterior vs. the theoretical posterior using Bayesian model selection as a function of n, keeping both models at a Bayes Factor of nearly one. We prove how this is possible for finite n, resulting in a posterior distribution with negligible numerical error w.r.t the theoretical posterior. Indeed, this depends on how significant the numerical error in the FM is in comparison to the data error σ, the sample size, etc. We do not discuss here the use of an approximate prior π_k; only a brief comment is given in this respect in Section 3.1.
After explaining our formal setting and notation in Section 2, we prove our main result in Section 3 for Banach spaces and considering any location-scale family for the distribution of the data. In Section 4 we present two examples to illustrate our approach, and in Section 5, a discussion of the paper is given.

Setting and preliminary lemmas
For completeness, and to establish the theoretical setting needed for our main theorems in Section 3, we discuss here the existence and consistency of the posterior distribution in general Banach spaces. Note that in other contexts beyond the standard UQ literature, the foundation of Bayesian theory in abstract spaces is very well known (here we outline a particular version of this general Bayesian theory; see, for example, Schervish, 1997; Ghosal and van der Vaart, 2017). We adhere to the standard Bayesian approach in which, among many other details, the aim is to define a joint measure on both observables and unknowns, from which a posterior measure is obtained. In contrast, in the UQ literature the posterior measure is commonly postulated as a Radon-Nikodym derivative w.r.t the prior, set equal to the likelihood (e.g. Stuart, 2010; Dashti and Stuart, 2016; Bui-Thanh and Ghattas, 2014), and then proved to exist and be well defined; in standard Bayesian theory, such a Radon-Nikodym derivative w.r.t the prior is instead a consequence of the existence and regularity of the posterior measure (see, for example, section 1.3 and eq. (1.1) of Ghosal and van der Vaart, 2017).
In the case when θ is a function, we need to establish a proper setting for this Bayesian inference problem in which the parameter θ belongs to a general Banach space. The same setting also applies to the finite-dimensional case since R d is also a Banach space. A general enough and convenient setting is as follows.
Let Y ∈ Y ⊂ R^m be the data at hand and {P_θ : θ ∈ Θ} a family of probability models for Y. We assume that for each θ ∈ Θ the observable Y has a density f_θ(y) w.r.t a σ-finite measure λ, namely a product of the Lebesgue and/or counting measures in R^m to accommodate, possibly, continuous and/or discrete observations. That is, P_θ(A) = ∫_A f_θ(y) λ(dy) for all measurable A. For example, Y is a product space of subsets of R or Z, leading to continuous and/or discrete data. This is the usual setting in parametric inference.
In any case, with the usual topological considerations, we assume Y is a Polish space, that is, a separable and completely metrizable space. Y is viewed as a Polish space with the standard metric in R and the discrete metric in Z, and λ then results in a Borel σ-finite measure on Y. P_θ is then a Radon measure for all θ ∈ Θ, since any Borel probability measure on a Polish space is Radon.
Until now, the parameter space Θ is arbitrary. In order to define a probability measure π on Θ (namely, a prior distribution) we need a measurable space (Θ, ℬ(Θ)). So far f_θ(y) cannot be considered a conditional distribution: f_θ(y) is λ-measurable in y, and we assume it is not only measurable in θ but also, for fixed y, continuous in θ. More precisely, assume that Θ is a separable Banach space, ℬ(Θ) its Borel σ-algebra, and θ ↦ f(y|θ) is continuous for all y ∈ Y. Therefore, since Y is a Polish space, the posterior measure is well defined (see, for example, Christen and Pérez-Garmendia, 2020, and references therein). Namely, for any π-measurable g, the posterior measure Q_y[π] is defined by

∫ g(θ) Q_y[π](dθ) = Z(y)^{-1} ∫ g(θ) f(y|θ) π(dθ),  with  Z(y) = ∫ f(y|θ) π(dθ),

where now we use the more common notation f(y|θ) = f_θ(y). Equivalently, the Radon-Nikodym derivative of the posterior measure w.r.t the prior is proportional to the likelihood, i.e.

dQ_y[π]/dπ (θ) = f(y|θ)/Z(y).
To make the dependency on the prior explicit, we adopted above the notation Q_y[π] for the posterior measure. When substituting the likelihood with a numerical approximation f_n(y|θ), we denote the corresponding posterior by Q^n_y[π], with Z_n(y) its normalization constant.

The inverse problems setting
We follow the general Bayesian UQ setting, as explained in, for example, Scheichl et al. (2017); Dashti and Stuart (2016). Let Θ and V be separable Banach spaces, let F : Θ → V be the Borel measurable forward map (FM) and H : V → A ⊆ R^{m+s} the Borel measurable observation operator. The composition H ∘ F defines a Borel measurable mapping from the parameter space Θ to the data sample space in R^m, plus possibly s additional parameters. Going beyond Gaussian noise, assume that f_o(y|η) is a density for the data y w.r.t λ for all η ∈ A. The parametric family of sample models, as in Section 2, is then defined with the family of λ-densities f_θ(y) = f_o(y|H(F(θ))). To fix ideas, we elaborate on the usual independent Gaussian noise case, namely

f(y|θ) = ∏_{j=1}^m (2πσ²)^{-1/2} exp(−(y_j − H_j(F(θ)))²/(2σ²)).

If σ is unknown, we may take s = 1 and include it as a parameter; the same holds if we had an unknown variance-covariance matrix. We do not discuss this case further in the main part of the paper; some notes are added in Section 5 regarding the case when σ is unknown.
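In the Gaussian case, evaluating the likelihood given a (numerical) FM is then direct; a small sketch, where the forward_map callable is a placeholder for θ ↦ H(F(θ)) or θ ↦ H(F_{α(n)}(θ)):

```python
import numpy as np

def log_likelihood(y, theta, forward_map, sigma):
    """log f(y|theta) for independent Gaussian noise y_j = H_j(F(theta)) + sigma*eps_j;
    forward_map(theta) returns H(F(theta)) as a vector of the same length as y."""
    eta = forward_map(theta)
    m = len(y)
    return -0.5 * np.sum((y - eta)**2) / sigma**2 - m * np.log(sigma * np.sqrt(2.0 * np.pi))
```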
Let F_{α(n)} be a discretized version of the forward map F, for some discretization α that depends on an integer refinement n. This is the actual numerical version of the forward map defined in our computers. Let f_n(y|θ) = f_o(y|H(F_{α(n)}(θ))) be the resulting discretized numerical likelihood. In the rest of the paper we take the following assumption.

Assumption 2.1. For all y ∈ Y, η ↦ f_o(y|η) is uniformly Lipschitz continuous, and H ∘ F and H ∘ F_{α(n)} are continuous.

If H ∘ F and H ∘ F_{α(n)} are continuous, then θ ↦ f(y|θ) and θ ↦ f_n(y|θ) are continuous. Therefore all requirements are met for the posterior measures Q_y[π] and Q^n_y[π] to be well defined (as explained in Section 2), considering either the theoretical or the numerical likelihood, respectively (with Z(y) and Z_n(y) their corresponding normalization constants). In Dashti and Stuart (2016) and Scheichl et al. (2017) it is also assumed that H(F(θ)) is continuous.
Note that if we consider independent data with a location-scale model, y_j = H_j(F(θ)) + σε_j, where the ε_j are i.i.d. with density ρ, ρ(x) is uniformly Lipschitz continuous and σ is known, then the first part of assumption 2.1 is met and we only require to establish that H ∘ F and H ∘ F_{α(n)} are continuous. Assume a global error control for this numerical FM of the form

max_{1≤j≤m} |H_j(F(θ)) − H_j(F_{α(n)}(θ))| ≤ K_0 |α(n)|^p   (3)

for all θ ∈ Θ and some functional |·| of the discretization. Note that (3) is a global bound, valid for all θ ∈ Θ, and already includes the observational operator H. That is, it is a global bound, but it is only a statement at the locations H_j where each y_j is observed, j = 1, …, m.
Regarding the functional |·|, if we have a discretization in space and time with Δx = h_x/n, Δt = h_t/n, then α(n) = (Δx, Δt) and it could be the case that |α(n)| = Δx + Δt, depending on the numerical method used to solve F(θ). There is a great variety of methods, but it is assumed that n represents the discretization level and |α(n)| → 0 as n → ∞. We assume that |α(n)| is properly defined so that p signifies the method's order.
From assumption 2.1, f_o(y|η) is uniformly Lipschitz continuous for any given y, with Lipschitz constant L, say. Combined with (3), this yields

|f(y|θ) − f_n(y|θ)| ≤ K_1 |α(n)|^p   (4)

for all θ ∈ Θ, which is also a global error bound, now for the numerical likelihood, since the constant K_1 = LK_0 is independent of θ. This is the general setting needed to formally state the Bayesian UQ problem.

Consistency results
Once the existence of the theoretical and numerical versions of the posterior measure has been assured, the natural next question is the consistency of the approximations: whether, how, and in what sense the numerical posterior measure Q^n_y[π] tends to the theoretical posterior measure Q_y[π] as the discretization α(n) becomes finer (i.e. as n increases).
For example, using (4), Hosseini and Nigam (2017) prove that ||Q^n_y[π] − Q_y[π]||_H → 0, adding regularity conditions on −log f(y|θ) (see Hosseini and Nigam, 2017, and references therein), where ||·||_H is the Hellinger norm for measures. Christen and Pérez-Garmendia (2020) prove that ||Q^n_y[π] − Q_y[π]||_TV → 0 using (4), where ||·||_TV is the Total Variation norm (note that convergence in the Hellinger and the Total Variation norms are equivalent, i.e. one implies the other; Gibbs and Su, 2002), and also that Z_n(y) → Z(y). That is, for a given problem we may safely assume, after verifying the required regularity conditions, that as n increases (the discretization becomes finer),

||Q^n_y[π] − Q_y[π]|| → 0  and  Z_n(y)/Z(y) → 1,   (5)

in a suitable measure norm ||·||. As mentioned before, we still need to fix a discretization level to compute the posterior distribution. The previously mentioned results provide a sound theoretical basis for Bayesian UQ, but as such they do not address the latter problem, which is the primary concern of this paper. Accordingly, we turn next to our main result on choosing the discretization level n using Bayesian model comparison.

Expected a priori error bounds and Bayes Factors
As in Capistrán et al. (2016), in order to find reasonable guidelines for choosing a discretization level n, we compare the numerical posterior Q^n_y[π] with the theoretical posterior Q_y[π] using Bayesian model selection, namely Bayes Factors (BF). Assuming an equal prior probability for both models, the BF is the posterior probability odds of one model against the other, that is p/(1−p), where p = Z_n(y)/(Z_n(y) + Z(y)) is the posterior probability of the numerical model. Indeed, the BF is the ratio of the normalization constants Z_n(y)/Z(y), and in our case it is the ratio of the posterior probabilities p and 1−p.
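As a concrete illustration of the odds Z_n(y)/Z(y) and of p, the following one-dimensional sketch uses a hypothetical scalar parameter, toy theoretical and numerical FMs, Gaussian noise, and a proper Gaussian prior; only the structure (two normalization constants, computed here by quadrature) reflects the text.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma, y = 0.1, np.array([0.62, 0.71, 0.58])        # hypothetical data
F  = lambda th: np.array([th, th**2, np.sin(th)])   # toy "theoretical" FM
Fn = lambda th: F(th) + 1e-3                        # toy numerical FM, error ~ K_0 |alpha(n)|^p

prior = norm(loc=0.5, scale=1.0).pdf                # proper prior pi(theta)
like  = lambda th, G: np.prod(norm.pdf(y, loc=G(th), scale=sigma))

Z  = quad(lambda th: like(th, F)  * prior(th), -5.0, 5.0)[0]   # Z(y)
Zn = quad(lambda th: like(th, Fn) * prior(th), -5.0, 5.0)[0]   # Z_n(y)
BF = Zn / Z
p  = Zn / (Zn + Z)          # posterior probability of the numerical model
print(f"BF = {BF:.6f}, p = {p:.6f}")
```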
The use of Bayesian model selection and BFs has not been free of debate, leading to some stylized alternatives discussed in many specific contexts (Hoeting et al., 1999; Berger and Pericchi, 1996; Kass and Raftery, 1995; O'Hagan, 1997; Xu et al., 2019; van der Linden and Chryst, 2017). Some of these concerns relate to prior specification and prior sensitivity. In the extreme case, improper priors cannot be used, since they are established up to an arbitrary constant and Z(y) is indeterminate; additional provisions need to be taken in this context (e.g. Berger and Pericchi, 1996). Moreover, when the models are diverse (e.g. having different numbers of variables), it is in general a complicated task to specify a prior distribution that 1) conveys, in either case, the correct prior information for each model and 2) does not bias one model over the other. Even trying to establish "diffuse" priors can be problematic, since the meaning of "diffuse" might change from model to model. Moreover, two different models might have a BF equal to one if they have the same predictive power; thus equal posterior probabilities for two models do not mean that the two models are the same, etc. (see Clyde et al., 2007, for a discussion and references therein).
Fortunately, the above-mentioned common concerns in the use of Bayesian model comparison are not present when comparing Q^n_y[π] with Q_y[π]. Note first that, from the convergence and error control of the FM in (3), we may conclude (with further assumed regularity conditions) that the models converge as explained in Section 2.2, namely ||Q^n_y[π] − Q_y[π]|| → 0. The numerical posterior Q^n_y[π] is not of some different nature but only a numerical approximation to Q_y[π], and since we are assuming Z(y) is the correct sampling distribution, any lack of predictive power of Q^n_y[π] can only be attributed to its approximate nature (we do not discuss here model fit/model adequacy; a comment is given in this respect in Section 5). Note also that the prior measure π is assumed proper and Z_n(y)/Z(y) → 1 as in (5). The criterion for choosing n is therefore sought so that the numerical approximation Q^n_y[π] has nearly the same expected predictive power as Q_y[π], all other pieces being equal and knowing that ||Q^n_y[π] − Q_y[π]|| → 0.

The expected Absolute BF
In terms of model equivalence, an alternative expression conveying the same BF odds Z_n(y)/Z(y) is (1/2)|Z_n(y)/Z(y) − 1|, comparing Q_y[π] with Q^n_y[π]. We now try to control the Bayes Factor between the discretized model and the theoretical model, Z_n(y)/Z(y), through the use of what we call the Absolute BF (ABF),

ABF(y) = (1/2) |Z_n(y)/Z(y) − 1|.   (6)

In order to do that, independently of the specific data at hand, we try to bound the expected ABF (the EABF) in terms of estimates of the error in the numerical forward map, as in (3). The idea is to keep the EABF below a small threshold (e.g. 1/20) so that the BF is close to one and the evidence in favor of the theoretical model over the numerical model is "not worth more than a bare mention" (Kass and Raftery, 1995; Jeffreys, 1961).
Note, in the last part of the above proof, how the global error bound is needed for all j as in (3); nonetheless, we could restrict this bound to θ ∈ supp{π} ⊂ Θ, since we are integrating w.r.t π(·).
We may now attempt to calculate the remaining double integral by changing the order of integration, to calculate ∫_Θ M(H(F(θ))) π(dθ), where M(η) is given in (8). In general this leads to no simplification; however, if it happens that M(η) does not depend on η, it can be very useful. This is the case for any location-scale family, which we present next.

Theorem 3.2. With the setting of Theorem 3.1, assuming independent data arising from a location-scale family, namely

f_o(y|η) = ∏_{j=1}^m σ^{-1} ρ((y_j − η_j)/σ),

then

EABF ≤ ρ(0) (K_0 |α(n)|^p / σ) m.   (9)

Proof. From (8), note that M(η) ≤ ρ(0) (K_0 |α(n)|^p / σ) m, and we obtain the result since the latter does not depend on η.
Since K_0 |α(n)|^p is the error in the FM (with the observation operator, as in (3)), measured in the same units as the y_j, note from (9) that K_0 |α(n)|^p / σ is the relative error of the numerical FM with respect to the standard error in the observations σ. It makes much sense to measure K_0 |α(n)|^p with respect to σ, i.e. the proportion of numerical error in the FM w.r.t the error in the observations; note that K_0 |α(n)|^p / σ is unit free. In the usual case of independent Gaussian errors with known variance σ², ρ(x) = (1/√(2π)) e^{−x²/2} and ρ(0) = 1/√(2π). If we let EABF ≤ b, with for example b = 1/20 = 0.05, we expect nearly no difference between the numerical and the theoretical posterior. If we set the error in the FM to K = K_0 |α(n)|^p, then we require ρ(0) (K/σ) m < b; that is, we need the numerical error in the FM in (3) to satisfy

K < (b σ)/(ρ(0) m)   (10)

(equal to √(2π) b σ / m in the Gaussian case). Our suggested procedure is to run the solver including an after-the-fact error estimate (or a posteriori error estimate; we use "after-the-fact" given the conflict of terms with the Bayesian jargon, see Quarteroni and Valli, 2008; Iserles, 2009, for example). That is, after running our solver to evaluate F_{α(n)}(θ), an upper estimate K̂ of K is obtained for that particular θ value. If the error in the FM does not comply with the bound in (10), we then run the solver again with a finer discretization α(n). In passing, we guarantee (3) for all θ ∈ Θ. Note that for ODEs the RK45 method (the Runge-Kutta order 5 method of Cash and Karp, 1990, for example) produces after-the-fact error estimates. More recently, the discontinuous Galerkin method for PDEs may include high-order solvers with after-the-fact error estimates (Di Pietro and Ern, 2011; Hesthaven and Warburton, 2007). In general, error estimates for PDEs are much harder to obtain, and the usual strategy is to consider adjoint-based methods.
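The bound (10), together with the suggested refine-until-compliant procedure, is straightforward to implement; a sketch, where solver is a hypothetical interface returning the FM evaluation together with its after-the-fact error estimate K̂:

```python
import numpy as np

def fm_error_tolerance(sigma, m, b=0.05, rho0=1.0 / np.sqrt(2.0 * np.pi)):
    """Bound (10): largest admissible absolute FM error so that EABF <= b.
    For Gaussian noise rho(0) = 1/sqrt(2 pi), giving K < sqrt(2 pi)*b*sigma/m."""
    return b * sigma / (rho0 * m)

def solve_fm_with_bound(theta, solver, tol, n0=50, dn=50):
    """Run the solver, read its after-the-fact error estimate K_hat, and refine
    the discretization until K_hat <= tol; solver(theta, n) is a hypothetical
    interface returning (H(F_alpha(n)(theta)), K_hat)."""
    n = n0
    sol, K_hat = solver(theta, n)
    while K_hat > tol:
        n += dn
        sol, K_hat = solver(theta, n)
    return sol, n

# With the numbers of the example in Section 4.1 (sigma = 0.0005, m = 30):
# fm_error_tolerance(0.0005, 30)  ->  about 2.1e-06, as required there.
```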
The numerical posterior distribution, including the approximate likelihood f_n(y|θ) and complying with the EABF bound, is therefore Q^n_y[π], with dQ^n_y[π]/dπ (θ) = f_n(y|θ)/Z_n(y). A common problem in Bayesian UQ is that we also need to approximate the prior π with an approximating sequence of priors π_k; for example, π is a theoretical infinite-dimensional prior and the π_k are finite-dimensional approximations. To choose a specific prior π_k, Theorem 3.1 may be easily extended, resulting in the same bound for the EABF plus the additional term ||π_k − π||_TV. However, many details need to be considered before this bound is of any theoretical or practical use, for example the validity of the total variation metric, the sense in which π_k tends to π, etc. These problems involve many technical details and are matters of current research (Stuart, 2010; Dashti and Stuart, 2016; Bui-Thanh and Ghattas, 2014; Hairer et al., 2014; Zhou et al., 2017; Scheichl et al., 2017; Ghosal and van der Vaart, 2017).

Examples
In Sections 4.1 and 4.2 we present two examples considering Bayesian UQ problems for 1D and 2D heat equations, respectively. Using our bound in (10), we compare the resulting posterior distributions to illustrate the performance of our approach.

A 1D heat equation inferring the thermal conductivity
Let us return to the thermal conductivity problem for the stationary heat equation in 1D, briefly mentioned in Section 1, namely

−(a(x) u′(x))′ = f(x),  x ∈ (0, 1),   (11)

subject to Dirichlet boundary conditions u(0) = u(1) = 0, with forcing term f(x) = sin(πx) and thermal conductivity a(x) that varies with the space variable x. Here, the functions a and f are assumed to be continuous on [0, 1] and 0 < α_0 ≤ a(x) ≤ α_1 < ∞.
In this example, the FM is not available analytically, and a numerical FM is used. We use an error estimation in the FM to bound the EABF.
The numerical solution of (11) is computed using the Finite Element Method (FEM), which allows us to calculate a local, residual-based error estimate in the L² norm (see Babuška and Rheinboldt, 1978, for more details), denoted (12); here n is the number of elements, u_h is the numerical solution with step size h, and R = f + (a u_h′)′ is the corresponding residual. The error estimate K̂_0 is then computed by aggregating the local estimates in (12). The inference problem is the estimation of the function a(x) = exp(b(x)), given observations of u_j = u(x_j) at fixed locations x_j, j = 1, …, m. Certainly, the theoretical and the numerical FMs are continuous.
We simulate a synthetic data set with a true thermal conductivity a(x) defined by the parameters k_0 = 5, r = 0.9, a = 20, s = 2, and the error model y_j = u(x_j) + σε_j, where ε_j ∼ N(0, 1) and σ = 0.0005 (to maintain a 0.01 noise-to-signal ratio). The data are plotted in Figure 1(b). We consider m = 30 observations at locations x_j regularly spaced between 0 and 1.
In order to define the parameter space, the function a is represented as a third-order b-spline that passes through a set of control points {(x_i, a_i)}_{i=0}^k. In this case, since the FEM used is numerically demanding, we keep the prior truncation fixed to k + 1 terms, namely the control points {a_i}_{i=0}^k at locations {x_i}_{i=0}^k of the third-order b-spline. In this example we take k = 20.
Regarding the prior distribution for the parameters {b_i}_{i=0}^k, we define their prior using a Gaussian Markov random field (GMRF) with zero mean and sparse precision matrix (inverse covariance), encoding statistical assumptions regarding the value of each element b_i based on the values of its neighbors (see details in Bardsley and Kaipio, 2013). We restrict the support of b(x) to −∞ < β_1 ≤ b(x) ≤ β_2 < ∞, which implies that a_i = exp(b_i) ∈ [exp(β_1), exp(β_2)]. The parameter space is then compact and there exists a global bound for (12), complying with (3).
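A minimal sketch of such a GMRF log-prior, assuming (as one common construction, not necessarily the exact one of Bardsley and Kaipio, 2013) a sparse precision matrix built from second differences, with a hypothetical smoothness weight lam:

```python
import numpy as np

def gmrf_log_prior(b, lam=100.0):
    """Zero-mean GMRF log-density (up to an additive constant) for the b-spline
    coefficients b = (b_0, ..., b_k), with precision Q = lam * D'D, where D is
    the (k-1) x (k+1) second-difference matrix; lam is a hypothetical weight."""
    D = np.diff(np.eye(len(b)), n=2, axis=0)
    Q = lam * D.T @ D                 # sparse, banded precision matrix
    return -0.5 * b @ Q @ b
```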
With the standard error and sample size used, calculating the error bound for the Forward Map (FM) as stated in (10), we require K̂_0 < 2.1 × 10⁻⁶. To sample from the posterior distribution, we use the t-walk (Christen and Fox, 2010).
Regarding the numerical solver, we begin with a relatively large step size h = 0.02 (considering n = 50 elements in the FEM) and start the MCMC. At each iteration, the FM is first computed along with its error estimate K̂_0. If the solution u_h does not satisfy the estimated global bound, i.e. K̂_0 > 2.1 × 10⁻⁶, we increase the number of elements by 50 (h = 1/(n + 50)) until the bound is met. For h = 0.0066, with n = 150 elements in the FEM, the bound is achieved at all iterations. For comparison, a finer grid is also considered, with h = 0.002 and n = 500 elements. The results are shown in Figure 1; shaded areas represent the uncertainty in the model fit, as draws from the posterior distribution, using 150 elements (blue) and 500 elements (yellow). Note that if we use a smaller step size than that required by the bound in (10), the results are basically the same, simply adding CPU time.
We took 50,000 iterations of the t-walk, and the MCMC mixes quite well. With n = 150 the sampling took 3 min, and with n = 500, 16 min, on a standard 2.6 GHz processor computer. As seen in Figure 1, the conductivity is recovered, and taking n = 500 elements in the FEM results in basically the same posterior as with only n = 150, which already complies with the EABF bound; the finer grid only results in unnecessary CPU effort.

A 2D heat equation inferring the initial condition
In this example, we consider a more complex 2D PDE inverse problem. The FM is available analytically, and a numerical FM is also used for comparisons. In this case, the numerical error is directly calculated, and we infer only two parameters.
We present a 2D heat equation problem: determine the initial condition from observations of transient temperature measurements taken within the domain at a time t = t_1. The heat transfer PDE is given by

∂u/∂t = α (∂²u/∂x² + ∂²u/∂y²),  (x, y) ∈ D = (0, 1)², t > 0,   (13)

with u(x, y, t) = 0 on ∂D and u(x, y, 0) = f(x, y).
Taking f(x, y) = b sin(πx) sin(πy) + c sin(2πx) sin(πy) as the initial condition, the PDE has the analytical solution u(x, y, t) = b exp(−2απ²t) sin(πx) sin(πy) + c exp(−5απ²t) sin(2πx) sin(πy). A numerical solution of the boundary value problem in (13) is also computed using the Finite Element Method (FEM) within FEniCS (Martin et al., 2015), which allows us to calculate the error in the numerical solver using the exact solution.
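Since the analytical solution is available, the numerical error K_0 can be computed exactly on the solver's grid; a sketch, where the diffusivity value alpha is illustrative (it is not specified above) and b, c are as in the synthetic data below:

```python
import numpy as np

alpha, b, c = 1.0, 3.0, 5.0   # alpha hypothetical; b, c as in the synthetic data

def u_exact(x, y, t):
    """Analytical solution of (13) for the given initial condition."""
    return (b * np.exp(-2.0 * alpha * np.pi**2 * t) * np.sin(np.pi * x) * np.sin(np.pi * y)
          + c * np.exp(-5.0 * alpha * np.pi**2 * t) * np.sin(2.0 * np.pi * x) * np.sin(np.pi * y))

def max_abs_error(u_num, X, Y, t1=0.3):
    """K_0: maximum absolute error of the numerical solution u_num, given as an
    array on the grid (X, Y) at time t1, against the exact solution."""
    return np.max(np.abs(u_num - u_exact(X, Y, t1)))
```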
The inferential problem is to estimate θ = (b, c) given measurements of u at time t 1 = 0.3. A priori we took independent truncated Gamma distributions for b and c with parameters (2, 0.7) and (2, 0.4) respectively, both restricted to [0,8]. Certainly, the theoretical and the numerical FMs are continuous, and since the support is compact we may conclude that the error bound in (3) exists for all θ.
We simulate a synthetic data set with the error model y_i = u(x_i, y_i, t_1) + σε_i, where ε_i ∼ N(0, 1), i = 1, …, m, and σ = 0.3 (a 5% noise-to-signal ratio), with b = 3 and c = 5. The data are plotted in Figure 2(b). We consider m = 25 observations at locations (x_i, y_i), i = 1, 2, …, m, regularly spaced on D. Since we have an analytic solution, when running the PDE solver we may calculate the maximum absolute error K_0 exactly. The error bound for the FM as stated in (10) is 0.0015. To sample from the posterior distribution, we again use the t-walk (Christen and Fox, 2010).
Regarding the numerical solver, we start with a large step size of Δx = Δy = 0.1 and Δt = 0.268, and calculate K_0. If the solution does not comply with the bound, that is, if K_0 > 0.0015, a new solution is attempted by reducing the step sizes Δx, Δy, and Δt by half, until the global absolute error is within the bound, i.e. K_0 ≤ 0.0015. The resulting mesh has Δx = Δy = 0.025 and Δt = 0.067. We compare the above FEM numerical FM with the exact FM using 250,000 iterations of our MCMC. The results are shown in Figure 3 and in Table 1; the differences observed in both results may be attributed to the Monte Carlo sampling.

Discussion
The generalization of the results of Capistrán et al. (2016) to a priori statements and Banach parameter spaces makes our posterior error control strategy more feasible, general, and applicable. Our results lead to the bound on the FM's error in (10), which is quite simple to calculate, provided after-the-fact error estimates of the FM numerical approximation are available. Here we presented two examples; we have experimented with several others. In all cases, using (10), the numerical error in the posterior was controlled successfully, with only a negligible gain in posterior numerical precision when a more precise FM was considered. This, in turn, may result in savings of CPU time, as cheaper/rougher solvers may be used.
We have not discussed the scenario where the error parameter σ is unknown and/or there is correlation in the data. In this case, we may consider that a priori θ and σ are independent, together with a correlation structure for y, and equivalent results should follow; this was discussed in a previous, unpublished version of this manuscript but is not included here. We only need to prove that the new likelihood satisfies assumption 2.1, in particular that it is bounded λ-a.s.
In general, one needs to decide, sooner or later, on a discretization level n. Although there are multilevel methods that work with several discretizations simultaneously (Cliffe et al., 2011; Katsiolides et al., 2018), even in that case one nevertheless needs, a priori, a sense of what is a large or a small FM error. Our bound in (10) is an attempt to provide precisely that: a sense of what is a large or a small FM error from the perspective of the Bayesian inverse problem at hand, considering the noise level in the data (σ) and the sample size.
Since the parameter θ in many UQ applications is an unknown function, one may use a Bayesian nonparametric approach to estimate θ (BNP, see, for example Müller et al., 2015, chap. 4), using a prior measure on a set of regressors with the machinery of BNP (see Giordano and Kekkonen, 2020, and references therein).
We have not discussed here the problem of model fit/model adequacy, since we assumed from the outset that the actual distribution of the data y is Z(y). The question of model adequacy is always present in applications since, indeed, "all models are wrong", and it is certainly a concern in inverse problems and computer experiments (Kennedy and O'Hagan, 2001). Generalizing our results to the scenario where the actual sampling distribution is Z*(y), different from Z(y), is an interesting next step but is left for future research.
Note that decreasing the FM solver precision can only be done within the stable regime of the solver used. Moreover, in real applications, increasing the mesh size or any mesh refinement may come at the cost of great coding effort, for example in a large-scale 3D geothermal inversion (Cui et al., 2011). Our approach only makes sense when working with different discretizations of the FM is computationally feasible.