Designing truncated priors for direct and inverse Bayesian problems

The Bayesian approach to inverse problems with functional unknowns has received significant attention in recent years. An important component of the developing theory is the study of the asymptotic performance of the posterior distribution in the frequentist setting. The present paper contributes to the area of Bayesian inverse problems by formulating a posterior contraction theory for linear inverse problems with truncated Gaussian series priors, under general smoothness assumptions. Emphasis is on the intrinsic role of the truncation point, both for the direct and for the inverse problem, which are related through the modulus of continuity, as recently highlighted by Knapik and Salomond (2018).


Outline
We study the problem of recovering an unknown function f from a noisy and indirect observation Y^n. In particular, we consider a class of inverse problems in Hilbert space, given as

Y^n = A f + (1/√n) ξ.   (1)

Here A : X → Y is a linear mapping between two separable Hilbert spaces X and Y, termed the forward operator. For our analysis, we shall assume that the mapping A is compact and injective; it will be clear from the assumptions made later that the injectivity can easily be relaxed, and these assumptions will also entail the compactness of A. The observational noise is assumed to be additive, modeled as a Gaussian white noise ξ in the space Y, scaled by 1/√n. The problem of recovering the unknown f from the observation Y^n is assumed to be ill-posed, in the sense that A is not continuously invertible on its range R(A) ⊂ Y. In particular, this means that R(A) is not contained in a finite-dimensional subspace. Notice that although the white noise ξ can be defined by its actions in the space Y, it almost surely does not belong to Y. Rigorous meaning can be given to model (1) using the theory of stochastic processes, see Section 6.1.1 in [14].
In the Bayesian approach to such inverse problems, we postulate a prior distribution Π on f and combine it with the (Gaussian) data likelihood P^n_f to obtain the posterior distribution Π(·|Y^n) of f|Y^n, see [9] for a comprehensive overview of the area. We are interested in studying the frequentist performance of the posterior distribution in the small noise asymptotic regime n → ∞, equivalently 1/√n → 0. More specifically, we consider observations generated from a fixed underlying element f_0 ∈ X, Y^n ∼ P^n_{f_0}, and study rates of contraction of the resulting posterior distribution around f_0, as n → ∞.
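As an aside for intuition, model (1) can be discretized in the singular basis of A, where it decouples into scalar observations. The following sketch uses hypothetical singular values and a hypothetical truth (not taken from the paper; the names `observe`, `s`, `f0` are our own) to illustrate the coordinate-wise model and the 1/√n noise scaling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Coordinate-wise (sequence space) discretization of Y^n = A f + (1/sqrt(n)) xi:
# in the singular basis of A the model decouples into
#   Y_j = s_j f_j + (1/sqrt(n)) xi_j,   xi_j ~ N(0, 1) i.i.d.
J = 200                            # number of retained coordinates (illustrative)
n = 10_000                         # inverse noise level
j = np.arange(1, J + 1)
s = j ** -1.0                      # hypothetical singular values of A (power decay)
f0 = j ** -1.5                     # hypothetical truth with decaying coefficients

def observe(f):
    """One realization of the discretized observation Y^n."""
    return s * f + rng.standard_normal(J) / np.sqrt(n)

# The coordinate-wise noise standard deviation is 1/sqrt(n), so the expected
# squared observation error over the J retained coordinates is J/n.
err2 = np.mean([np.sum((observe(f0) - s * f0) ** 2) for _ in range(200)])
```

Averaged over repetitions, `err2` concentrates around J/n, reflecting the small-noise asymptotics as n grows.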
The study of rates of posterior contraction for inverse problems has received great attention in the last decade, initiated by [22]. The authors of that study considered Gaussian priors which were conjugate to the Gaussian likelihood. This results in Gaussian posteriors, having explicitly known posterior mean and covariance operator. Moreover, by assuming that the prior covariance operator and the linear map A are mutually diagonalizable, the infinite dimensional inverse problem was reduced to an infinite product of one-dimensional problems. In this way, posterior contraction rates could be determined using explicit calculations both for moderately, and in the subsequent studies [23] and [4], for severely ill-posed linear forward operators. This approach was surveyed and extended to general ill-posedness of the linear operator by the present authors in [3], using techniques from regularization theory.
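The conjugate diagonal computation just described can be made concrete in a small simulation. The sketch below (illustrative decays, and `coordinate_posterior` is our own name) applies the standard one-dimensional Gaussian conjugate formulas coordinate-by-coordinate, assuming the prior covariance and A are simultaneously diagonal.

```python
import numpy as np

# Coordinate model: Y_j = s_j f_j + xi_j / sqrt(n), prior f_j ~ N(0, lam_j).
# Standard one-dimensional Gaussian conjugate formulas, applied per coordinate.
def coordinate_posterior(Y, s, lam, n):
    """Posterior means and variances in the jointly diagonal conjugate setting."""
    v = lam / (1.0 + n * lam * s ** 2)               # posterior variances
    m = n * s * lam * Y / (1.0 + n * lam * s ** 2)   # posterior means
    return m, v

# Illustrative (hypothetical) decays:
J, n = 100, 10_000
j = np.arange(1, J + 1)
s = j ** -1.0        # moderately ill-posed forward operator
lam = j ** -2.0      # trace-class prior variances
rng = np.random.default_rng(1)
f0 = j ** -1.5
Y = s * f0 + rng.standard_normal(J) / np.sqrt(n)
m, v = coordinate_posterior(Y, s, lam, n)
```

Each posterior variance is strictly smaller than the corresponding prior variance, and for well-observed coordinates the posterior mean is close to the truth, which is the mechanism behind the explicit contraction calculations.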
Several works extended the diagonal linear Gaussian-conjugate setting to various other directions, for example [2] and [25] studied linear forward operators which are not simultaneously diagonalizable with the covariance operator of the Gaussian prior, and [20] studied linear hypoelliptic pseudo-differential forward operators with Gaussian priors.
More recently, there has been a wealth of contributions on more complex inverse problems, including non-linear ones arising in PDE models, see for example [32,15,1]. Another line of progress has been the consideration of more general priors, so far for linear inverse problems, see [34] and [16]. The idea underlying all of these works is to first establish rates of contraction for the related direct problem with unknown g = Af,

Y^n = g + (1/√n) ξ,   (2)

in which the data Y^n are generated from g_0 = Af_0. Once such rates are established, the strategy is to control distances on the level of f by distances on the level of g, when restricting to a sieve set S_n on which the inversion of A is well-behaved. This enables translating rates for the direct problem into rates for the inverse problem when the posterior is restricted to the sieve set S_n. If the posterior mass concentrates on S_n, then these rates are also valid for the unrestricted posterior. In order to establish direct rates, the authors of the above-mentioned studies use the testing approach, see [13].
Here we shall explore the methodology proposed by [21], which explicitly uses the modulus of continuity (function) in order to translate rates for the direct problem into rates for the inverse problem. This approach is in principle general; however, so far it has been applied to certain linear inverse problems, with moderately and severely ill-posed forward operators, under Sobolev-type smoothness assumptions on the truth f_0. Our work is also related to [16], in that both works use approximation-theoretic techniques to control the inversion of A.
We consider (centered) Gaussian priors on f , arising by truncating the series representation of an underlying infinite-dimensional prior on the separable Hilbert space X, see e.g. [9,Sect. 2.4]. We develop a comprehensive theory for establishing rates of contraction for general linear inverse problems, under general smoothness conditions, with a particular focus on the optimal choice of the truncation level. Truncated priors are both practically relevant since when implementing one needs to truncate, but also can lead to optimal rates of contraction for a smoothness-dependent choice of the truncation level as a function of n, see e.g. [36]. Furthermore, in [34], it was shown that putting a hyperprior on the truncation level can lead to adaptation to unknown smoothness. This was done in the context of inverse problems with specific types of smoothness (Sobolev) and degree of ill-posedness of the operator (power or exponential type). See also [5], where direct models are studied. The extension of adaptation to the general framework which we consider here, is interesting but beyond the scope of this work.
Contraction rates for the problems (1) and (2) are related through the modulus of continuity of the mapping A^{−1}. Thus, knowing a contraction rate, say δ_n, for the direct problem (2), and knowing the behavior of the modulus of continuity ω_{f_0}(A^{−1}, S_n, δ), δ > 0, where S_n is the (finite-dimensional) support of the prior, we obtain a contraction rate for f_0 as ω_{f_0}(A^{−1}, S_n, δ_n), n → ∞. In this program, the role of the truncation level k = k(n) is most important. There is a level k^{(1)} = k^{(1)}(n) that should be used for the inverse problem, a level k^{(2)} = k^{(2)}(n) which works for the direct problem, and finally a level k^{(3)} = k^{(3)}(n) used in the modulus of continuity. For the plan outlined above to work, we need to establish that a universal choice k = k(n) is suited for all three problems.
In Section 2 we shall introduce the overall setting of the study, and we shall formulate Theorem 1, which comprises the main achievements of this study. The rest of the study is composed of four parts.
In Section 3 we will develop the tools needed to analyze the direct problem (2) and obtain k^{(2)}(n), depending on the underlying prior covariance and the smoothness of g_0. Due to linearity, Gaussian priors on f induce Gaussian priors on g = Af, which streamlines the analysis of problem (2). However, the smoothness of the induced true element in the direct problem, g_0 = Af_0, depends on the smoothing properties of the operator A; in particular, g_0 might not belong to any of the standard smoothness classes. For this reason, we shall study rates of contraction in the (direct) white noise model, given a Gaussian prior on g and under general smoothness assumptions on g_0. Emphasis will be given to the construction of the prior. We shall analyze truncated Gaussian priors posed directly on g, obtained by truncating the series representation of an underlying infinite-dimensional Gaussian prior (called 'native' below), but also priors that are obtained as linear transformations of truncated Gaussian priors chosen for some f (called 'inherited' below). The former is relevant in the context of (2) when A commutes with the covariance operator of the underlying Gaussian prior on f. In the latter, non-commuting case the analysis is more involved and restrictive. This section is self-contained and may be of independent interest. The main result is Theorem 2, and it includes a way of assessing the optimality of the obtained bounds.
In Section 4 we introduce the modulus of continuity and discuss its behavior, as δ → 0, from an approximation-theoretic perspective. The main result here will be Theorem 3, indicating the choice k^{(3)}(n).
In Section 5 we shall show that the choice k^{(2)}(n) yields optimal behavior also of the modulus of continuity, such that we may let k^{(2)} = k^{(3)}. Therefore, letting k^{(1)}(n) = k^{(2)}(n) = k^{(3)}(n) yields a contraction rate for the inverse problem, allowing us to establish the main result, Theorem 1. In Section 6 we exemplify the obtained (general) bounds at 'standard instances', with forward operators which have a moderate (power type) decay of singular numbers, a (severe) exponential decay, but also a (mild) logarithmic decay. Many examples of such instances are known. The Radon transform is prototypical for a power type decay of singular numbers, see the monograph [31]. The heat equation is known to exhibit an exponential decay of singular numbers, see [12], which is also a good resource for further examples. In particular, we explicitly derive (minimax) contraction rates under Sobolev-type smoothness, both for the direct and the inverse problem, in the above-mentioned instances.
In order to streamline the presentation the proofs of the results are given separately in Section 7.

Setting and main result
We next define certain concepts that will be needed for the development of the paper. After establishing some notation, we introduce rates of posterior contraction for the direct and inverse problem, links between the main operators pertaining to our analysis, as well as the concept of smoothness that will be used throughout the paper. We formulate the main result in Section 2.5.

Notation
We shall agree upon the following notation. We denote by ‖·‖_X, ‖·‖_Y the norms in X, Y, respectively. When there is no confusion we will plainly use ‖·‖, and the same notation will be used for the operator norm in X or Y. For a (compact self-adjoint) linear operator, say G : X → X, we denote by s_j(G), j = 1, 2, …, the non-increasing sequence of its singular numbers. We reserve the notation s_j = s_j(H), j = 1, 2, …, for the operator H := A^∗A, the self-adjoint companion of the mapping A.
Furthermore, according to whether we study the inverse problem (1) or the related direct problem (2) we shall denote elements by f or g (f 0 , g 0 for the corresponding true elements).
For two sequences (a_n) and (b_n) of real numbers, a_n ≍ b_n means that |a_n/b_n| is bounded away from zero and infinity, while a_n ≲ b_n means that a_n/b_n is bounded from above.

Prior distribution and posterior contraction
We shall use priors Π which are truncations of a Gaussian prior N (0, Λ), for a self-adjoint, trace-class and positive definite covariance operator Λ. Such priors are characterized by the underlying covariance Λ and the truncation level k.
Below we shall use the notation Λ = Λ f for Gaussian priors on f in the context of (1) and Λ = Λ g for Gaussian priors on g in the context of (2).
Let us fix a prior distribution Π on the unknown f, and consider data Y^n generated from the model (1) for a fixed true element f_0 ∈ X, Y^n ∼ P^n_{f_0}. We are interested in deriving rates of contraction of the posterior Π(·|Y^n) around f_0, in the small noise limit n → ∞. In particular, we find sequences ε_n → 0 such that, for an arbitrary sequence M_n → ∞, it holds

E_0 Π( f : ‖f − f_0‖_X ≥ M_n ε_n | Y^n ) → 0, as n → ∞.

Here E_0 denotes expectation with respect to P^n_{f_0}. One can also derive rates of posterior contraction around Af_0, that is, sequences δ_n → 0 such that, for arbitrary M_n → ∞,

E_0 Π( f : ‖Af − Af_0‖_Y ≥ M_n δ_n | Y^n ) → 0, as n → ∞.

Such rates δ_n and ε_n will be called rates of contraction for the direct and the inverse problem, respectively. We are going to derive rates of contraction for the inverse problem by deriving rates of contraction for the direct problem and using the modulus of continuity, as was proposed in [21]. These rates of contraction will be obtained by means of the squared posterior contraction, a concept which will be introduced in detail in § 3.

Relating operators in Hilbert space
As highlighted in the introduction we deal with several operators. In order to facilitate our analysis we need to relate these operators and to this end we introduce the following concept.
The primary operator we deal with is the forward operator A : X → Y which governs equation (1). Its self-adjoint companion H = A * A : X → X will have the central role in our analysis. We mention the following identity: Furthermore, in order to obtain rates of contraction for inverse problems from rates of contraction for direct problems, we will need to link the underlying (untruncated) covariance operator Λ f of the Gaussian prior for f to the operator H. We will study two cases. Initially we shall assume that Λ f and H commute. Precisely, we impose the following assumption: Assumption 1 (prior in scale). There is an index function χ such that This commutative case allows for a general analysis, but has limited applicability, as it may be hard to design a truncated Gaussian prior, because the singular basis of H (and hence Λ f ) may not be known.
The primary operator we deal with is the forward operator A : X → Y which governs equation (1). Its self-adjoint companion H = A^∗A : X → X will have the central role in our analysis. We mention the following identity:

‖Af‖_Y = ‖H^{1/2} f‖_X, f ∈ X.   (5)

Furthermore, in order to obtain rates of contraction for inverse problems from rates of contraction for direct problems, we will need to link the underlying (untruncated) covariance operator Λ_f of the Gaussian prior for f to the operator H. We will study two cases. Initially we shall assume that Λ_f and H commute. Precisely, we impose the following assumption:

Assumption 1 (prior in scale). There is an index function χ such that Λ_f = χ(H).

This commutative case allows for a general analysis, but has limited applicability, as it may be hard to design a truncated Gaussian prior, because the singular basis of H (and hence of Λ_f) may not be known. Instead, we may relax the commutativity assumption and impose a corresponding link condition.
Assumption 2 (prior linked to scale). There is some exponent a ≥ 1/2 such that

‖Λ_f^{1/2} h‖_X ≍ ‖H^{a/2} h‖_X, h ∈ X.

The requirement a ≥ 1/2 has a natural reason: we need to link the operators A and Λ_f in several places, and by virtue of (5) this can be done via H^{1/2}. Therefore, the case a = 1/2 needs to be covered in the assumption.
We mention that within the non-commuting case we confine ourselves to power type links. Also, notice that χ(t) = t^a in Assumption 1 yields a special instance of Assumption 2. One may extend to more general index functions, but for the sake of simplicity we do not pursue this direction here.
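A minimal finite-dimensional illustration of the commuting power-type link: taking Λ_f = χ(H) with χ(t) = t^a on a diagonal H, commutativity is exact and Λ_f^{1/2} acts as H^{a/2}. The numbers below are hypothetical.

```python
import numpy as np

# Diagonal sketch of the commuting link Lambda_f = chi(H), chi(t) = t^a.
a = 0.75
h = np.array([1.0, 0.5, 0.1, 0.01])    # hypothetical eigenvalues of H
H = np.diag(h)
Lam_f = np.diag(h ** a)                # Lambda_f = chi(H) = H^a

comm = H @ Lam_f - Lam_f @ H           # commutator vanishes by construction

# The power link identifies Lambda_f^{1/2} with H^{a/2}:
x = np.array([1.0, -2.0, 3.0, 0.5])
lhs = np.linalg.norm(np.diag(h ** (a / 2.0)) @ x)   # ||H^{a/2} x||
rhs = np.linalg.norm(np.diag(np.sqrt(h ** a)) @ x)  # ||Lambda_f^{1/2} x||
```

For non-diagonal (non-commuting) pairs this exact identity fails, which is what the relaxed link condition is designed to handle.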
Both Assumptions 1 and 2 have an impact on the mapping properties of A. First, the mappings H and Λ_f share the same null spaces (kernels). Also, since the covariance operator Λ_f is trace-class and hence compact, this compactness transfers to H, and a fortiori to A. Thus, under these links the compactness of A cannot be avoided, while its injectivity can be relaxed by factoring out the common null spaces.

Remark 1. In our analysis the self-adjoint companion H of the operator A plays the role of the central operator. When studying contraction rates for the inverse problem (1), smoothness will be given with respect to it. Instead, one might give this role to the operator Λ_f, and consider smoothness with respect to that operator. The analysis would be similar, and some results in this direction are given in [16, Sec. 5].

Smoothness concept
For the subsequent analysis it will be convenient to introduce the smoothness of an element h ∈ Z in a Hilbert space Z, with respect to some injective positive self-adjoint operator, say G : Z → Z, in terms of general source conditions.

Definition 2 (Source set). Given a positive-definite, self-adjoint operator G, and an index function ρ, the set

G^ρ := { h ∈ Z : h = ρ(G)v, ‖v‖_Z ≤ 1 }

is called a source set.
Remark 2. The sets G^ρ from above are ellipsoids in the Hilbert space Z. The element v is often called the source element, and the representation h = ρ(G)v is called a source-wise representation. We emphasize that elements in G^ρ are in the range of ρ(G), such that for the subsequent analysis Douglas' Range Inclusion Theorem, see its formulation in [25], will be used several times. It is seen from [26] that, given the injective operator G, each element h ∈ Z has a source-wise representation for some index function ρ.
Below, we shall use this concept for specific operators and specific index functions. For instance, the set

Λ_g^ψ := { g ∈ Y : g = ψ(Λ_g)v, ‖v‖_Y ≤ 1 }   (7)

will correspond to a source set for the operator G := Λ_g : Y → Y and the index function ψ. In some cases we will assume that the index function ψ is operator concave. The formal definition is given in Section 7, but we refer to [7, Chapt. X] for a comprehensive discussion. Here we mention that a power type index function ψ(t) = t^a with a > 0 is operator concave exactly if it is concave, that is, if 0 < a ≤ 1.
Example (Sobolev-type smoothness). Let u_1, u_2, … be the eigenbasis of the compact self-adjoint operator G, arranged such that the corresponding eigenvalues are non-increasing; this example can be considered either in X or in Y. Given some β > 0 we consider the Sobolev-type ellipsoid

S^β := { h : Σ_{j=1}^∞ j^{2β} ⟨h, u_j⟩^2 ≤ 1 }.

Now, suppose that the singular values of G decay as s_j(G) ≍ j^{−γ} for some γ > 0. Then it is a routine matter to check that h ∈ S^β yields that h ∈ G^ρ for an index function ρ(t) ∝ t^{β/γ}, t ≥ 0, see [25, Prop. 2]. Similarly the converse holds true, and there is thus a one-to-one correspondence between Sobolev-type ellipsoids and power-type source-wise representations for such operators G.
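The stated correspondence can be checked numerically in the exact-decay case s_j(G) = j^{−γ}: an element of the Sobolev ellipsoid admits a source representation h = ρ(G)v with ρ(t) = t^{β/γ} and ‖v‖ ≤ 1. The concrete coefficients below are hypothetical.

```python
import numpy as np

# Exact-decay check: s_j(G) = j^{-gamma}. If sum_j j^{2 beta} <h, u_j>^2 <= 1,
# then h = rho(G) v with rho(t) = t^{beta/gamma} and ||v|| <= 1, since
# rho(s_j) = j^{-beta} and hence v_j = h_j * j^{beta}.
beta, gamma, J = 1.0, 2.0, 500
j = np.arange(1, J + 1)
sG = j ** -gamma                       # eigenvalues of G

h = 0.1 * j ** -(beta + 0.51)          # a hypothetical element of the ellipsoid
assert np.sum(j ** (2 * beta) * h ** 2) <= 1.0

rho_of_G = sG ** (beta / gamma)        # rho(G) acting on the eigenbasis
v = h / rho_of_G                       # source element with h = rho(G) v
```

The recovered source element indeed satisfies ‖v‖ ≤ 1, and applying ρ(G) to it reproduces h exactly.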

Main result
We aim at deriving posterior contraction rates for the inverse problem (1) from contraction rates for the corresponding direct problem (2), by using the modulus of continuity, for truncated Gaussian priors. The Gaussian prior on f which is truncated at level k_n has all its mass on a finite dimensional subspace X_{k_n}, and so does the posterior, through the linear model (1). The following result links the rates of posterior contraction corresponding to the inverse problem (1) and the direct problem (2). The link is given by the modulus of continuity function ω_{f_0}(H^{−1/2}, X_{k_n}, ·), see § 4.1 below. It is an immediate corollary of [21, Theorem 2.1].

Proposition 2.1. Assume we put a Gaussian prior on f, truncated at level k_n. Let δ_n → 0 be a posterior contraction rate for the direct problem (2) around g_0 = Af_0 ∈ Y, for some f_0 ∈ X. Then ε_n := ω_{f_0}(H^{−1/2}, X_{k_n}, δ_n), where H = A^∗A, is a rate of contraction for the inverse problem (1), at f_0.
We can thus obtain contraction rates ε_n for the inverse problem by obtaining rates δ_n for the direct problem (2), together with bounds on the inherent modulus of continuity for the inverse problem. The main result of the study implements this program in a general setting, with a specific choice of the truncation level k_n.

Theorem 1. Consider the inverse problem (1), recall H = A^∗A, and suppose that f_0 has smoothness H^φ. Assume we put a truncated Gaussian prior N(0, P_{k_n} Λ_f P_{k_n}) on f, with Λ_f a self-adjoint, positive-definite, trace-class linear operator in X, and P_{k_n} the singular projection of Λ_f. We specify the related (covariance) operator Λ_g := A Λ_f A^∗, and impose either Assumption 1 or Assumption 2, where for the latter assumption we specify χ(t) = t^a and φ(t) = t^μ, and consider the associated index function ψ. For the choice of k_n according to (10), let δ_n be the resulting contraction rate for the direct problem, for some constant C. Then the posterior contracts around f_0 at the rate ε_n = ω_{f_0}(H^{−1/2}, X_{k_n}, δ_n).

The strategy for proving Theorem 1 is loosely outlined at the end of Section 1. A main component of both the result and its proof is the fact that the truncation level k_n, as given in (10), optimizes both the rate δ_n for the direct problem and the bounds on the modulus of continuity. These considerations can be found in § 5, where we establish the steps for proving Theorem 1.

Direct signal estimation under truncated Gaussian priors
Here we consider the Bayesian approach to signal estimation under white noise in the space Y, that is, the model

Y^n = g + (1/√n) ξ,   (12)

where ξ is Gaussian white noise in Y.
For linear Gaussian models with Gaussian priors, it is convenient to describe posterior contraction in terms of the squared posterior contraction (SPC), which, by Chebyshev's inequality, yields the square of a rate of contraction.
For an element g_0 ∈ Y, given data Y^n and a truncation level k for the (Gaussian) prior, we assign

SPC(g_0; k, n) := E_{g_0} E( ‖g − g_0‖_Y^2 | Y^n ),   (13)

where the inner expectation is with respect to the (Gaussian) posterior distribution, whereas the outer expectation concerns the sampling distribution given the element g_0, that is, the data generating distribution. The SPC for (regularized) untruncated Gaussian priors in the context of (linear) inverse problems was analyzed in the previous study [3]. Here we develop a similar approach for direct problems with truncated Gaussian priors, and we will exhibit some features which are specific to the latter.
Having fixed a class F ⊂ Y, and given the truncation level k, we assign

SPC(F; k, n) := sup_{g_0 ∈ F} SPC(g_0; k, n),   (14)

which is a squared rate of contraction holding uniformly over the class F.

Native and inherited Gaussian priors
In its simplest form, a centered truncated Gaussian prior for g can be defined using some orthonormal system, say y_1, y_2, … in Y, independent and identically distributed standard Gaussian random variables γ_1, γ_2, …, and a square summable positive sequence σ_1, σ_2, …, as the law Π_k^Y of the random element

Σ_{j=1}^k σ_j γ_j y_j.

The square summability of the sequence σ_j, j = 1, 2, …, ensures that the prior Π_k^Y is the (singular) projection of an infinite dimensional prior supported in Y, having finite-trace covariance operator

Λ_g := Σ_{j=1}^∞ σ_j^2 ⟨·, y_j⟩ y_j, such that Π_k^Y = N(0, Q_k Λ_g Q_k),

where Q_k are the orthogonal projections onto span{y_1, …, y_k}. We shall call this a native (truncated) prior for g.
On the other hand, a centered finite dimensional Gaussian prior for g may be defined using a linear transformation of some native truncated prior Π X k for f ∈ X, defined along some orthonormal system, say x 1 , x 2 , . . . , and with corresponding projections P k onto X k := span {x 1 , . . . , x k }, thus having covariance P k Λ f P k . The prior Π Y k for g ∈ Y is then obtained as the push forward T (Π X k ) under some linear mapping T : X → Y , and is supported on T X k . The Gaussian prior Π Y k will thus have covariance C k = T P k Λ f P k T * , and we shall call this an inherited (truncated) prior. Inherited priors are relevant for example when studying the direct problem (2) associated to the inverse problem (1). When using such an inherited prior, we will quantify the relation between the mapping T and the covariance operator Λ f driving Π X k , in order to control the effect of C k .
In this context we shall measure the smoothness of the truth g_0 relative to the covariance operator Λ_g of the underlying infinite dimensional Gaussian prior on g, and we shall assume the smoothness condition g_0 ∈ Λ_g^ψ for some index function ψ, see (7). For inherited priors, the operator Λ_g will be given as the covariance of the push forward of the underlying infinite dimensional prior on f, Λ_g = T Λ_f T^∗. We stress that for inherited priors we cannot in general ensure that the covariance C_k corresponds to the singular projection of Λ_g, that is, that T P_k Λ_f P_k T^∗ = Q_k Λ_g Q_k, or equivalently that T X_k coincides with the singular spaces of Λ_g; see the next subsection for details. Nevertheless, we still say that C_k is truncated at level k, since it has rank (at most) k.
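The distinction between the two constructions can be seen in a small matrix sketch (all dimensions and the map T are hypothetical): the native covariance is an exact rank-k singular projection of Λ_g, whereas the inherited covariance T P_k Λ_f P_k T^∗ has rank k but in general is not a singular projection of Λ_g = T Λ_f T^∗.

```python
import numpy as np

rng = np.random.default_rng(2)
k, d = 3, 6

# Native truncated prior: covariance is the rank-k singular projection of
# the (here d-dimensional, for illustration) covariance Lam_g.
sigma2 = np.arange(1, d + 1, dtype=float) ** -2.0    # square-summable variances
Lam_g = np.diag(sigma2)
Qk = np.diag((np.arange(d) < k).astype(float))
C_native = Qk @ Lam_g @ Qk                           # = Q_k Lam_g Q_k, rank k

# Inherited truncated prior: push-forward of a truncated prior on f under T.
Lam_f = np.diag(np.arange(1, d + 1, dtype=float) ** -3.0)
Pk = Qk                                              # truncate f in its own basis
T = rng.standard_normal((d, d)) / d                  # hypothetical linear map
C_inherited = T @ Pk @ Lam_f @ Pk @ T.T              # rank k, but in general NOT
# a singular projection of Lam_g = T Lam_f T^T when T and Lam_f do not commute.
```

Both covariances are symmetric and of rank k, which is why one still speaks of truncation at level k in the inherited case.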

Basic SPC bound
We shall start with proving a basic bound on the squared posterior contraction as given in (14) in the white noise model (12), for both native and inherited truncated Gaussian priors N (0, C k ), under general smoothness on the truth.
When treating inherited priors, it will be important that the projections P_k in the corresponding C_k are taken along the singular spaces of Λ_f, such that P_k Λ_f P_k coincides with the singular projection of Λ_f at level k. If Λ_f and T^∗T commute, then we will show that C_k coincides with the singular projection of Λ_g, and we can bound the SPC as in the native case.
In the non-commuting case we cannot ensure that C_k is the singular projection of Λ_g. We assign the intrinsic mapping H := T^∗T, and work under Assumption 2, linking the operators Λ_f and T via H. Notice that our treatment in Section 3 is standalone and does not necessarily correspond to an inverse problem; however, with a slight abuse of notation, we let H = T^∗T and assume the link condition described in Assumption 2.
Within this (finite dimensional) Gaussian–Gaussian conjugate setting, given the centered Gaussian prior with covariance C_k, the posterior is also Gaussian, with explicitly known mean ĝ_k and covariance. In alignment with [3, Eq. (3)], for any given g_0 and truncation level k, the SPC is decomposed as

SPC = ‖E_0 ĝ_k − g_0‖_Y^2 + E_0 ‖ĝ_k − E_0 ĝ_k‖_Y^2 + E_0 E( ‖g − ĝ_k‖_Y^2 | Y^n ),

where the first summand is the (squared) bias for estimating g_0 by using the posterior mean ĝ_k, the second summand is the related estimation variance, whereas the last summand constitutes the posterior spread. The proof of the next result is based on this decomposition.

Proposition 3.1.
Consider the white noise model (12) with a Gaussian prior N(0, C_k), and with underlying truth g_0 ∈ Λ_g^ψ for some index function ψ (see (7)), where either C_k = Q_k Λ_g Q_k (native case) or C_k = T P_k Λ_f P_k T^∗ (inherited case), and where in the latter case we let Λ_g = T Λ_f T^∗ and assume that the function ψ is operator concave. There is a constant C_1 ≥ 2 such that for any truncation level k the squared posterior contraction is bounded as

SPC(Λ_g^ψ; k, n) ≤ C_1 ( ψ^2(s_{k+1}(Λ_g)) + k/n + ψ^2(1/n) ).   (19)
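The underlying bias–variance–spread structure can be verified by simulation in the diagonal conjugate case. The sketch below (hypothetical decays and our own variable names) compares the closed-form sum of the three terms with a Monte Carlo estimate of the SPC.

```python
import numpy as np

# Monte Carlo check of SPC = (squared bias) + (estimation variance) + (spread)
# in the diagonal conjugate direct model Y_j = g_j + xi_j/sqrt(n), with a
# truncated prior g_j ~ N(0, lam_j) for j <= k and g_j = 0 otherwise.
rng = np.random.default_rng(3)
J, k, n = 50, 20, 1_000
j = np.arange(1, J + 1)
lam = j ** -2.0                    # hypothetical prior variances
g0 = 0.5 * j ** -1.0               # hypothetical truth

active = j <= k
shrink = np.where(active, n * lam / (1 + n * lam), 0.0)  # posterior-mean factor
spread = np.where(active, lam / (1 + n * lam), 0.0)      # posterior variances

# Closed-form decomposition, summed over coordinates:
bias2 = np.sum(((1 - shrink) * g0) ** 2)
var = np.sum(shrink ** 2 / n)
spc_closed = bias2 + var + np.sum(spread)

# Monte Carlo over the data distribution, given g0:
reps = 4_000
Y = g0 + rng.standard_normal((reps, J)) / np.sqrt(n)
m = shrink * Y
spc_mc = np.mean(np.sum((m - g0) ** 2, axis=1)) + np.sum(spread)
```

The Monte Carlo estimate matches the closed-form sum of the three terms, illustrating that the decomposition is exact in this conjugate setting.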

Optimized SPC bound
We aim at optimizing the general bound (19). This bound consists of two k-dependent terms and a summand which is independent of the truncation level k. As can be seen in the proof of Proposition 3.1, namely (46), this summand results from bounding the regularization bias, inherent in Bayesian problems with (untruncated) Gaussian priors. Hence the best contraction rate provable by bounding the SPC as above is bounded below (in order) by this regularization bias.
To better understand the nature of the k-dependent terms in the bound (19), we recall the following result from statistical inference. The minimax risk over the class Λ_g^ψ is given as

e(n, Λ_g^ψ) := inf_{ĝ} sup_{g_0 ∈ Λ_g^ψ} E_{g_0} ‖ĝ − g_0‖_Y^2,

where the infimum runs over all estimators ĝ using data Y^n. Similarly, let e_trunc(n, Λ_g^ψ) denote the corresponding quantity where the infimum is taken over all (linear) truncated series estimators. Since the class Λ_g^ψ constitutes an ellipsoid, the following result holds.

Proposition 3.2 ([11, Prop. 8]). We have that
In particular, the minimax rate is attained, in order, by truncated series estimators. We are ready to optimize the bound established in Proposition 3.1, while Proposition 3.2 will enable the comparison of our optimized bound to the minimax rate.

Theorem 2.
Consider the white noise model (12) with Gaussian prior N(0, C_k), and with underlying truth g_0 ∈ Λ_g^ψ as in (7), where either C_k = Q_k Λ_g Q_k (native case) or C_k = T P_k Λ_f P_k T^∗ (inherited case), and where in the latter case we let Λ_g = T Λ_f T^∗ and assume that the function ψ is operator concave. We assign k = k_n as in (10). Then, for the constant C_1 from Proposition 3.1, we have

SPC(Λ_g^ψ; k_n, n) ≤ 2 C_1 ( k_n/n + ψ^2(1/n) ).   (20)

If the regularization bias in (20) is of lower order, then the obtained contraction rate over the class Λ_g^ψ is order optimal.

Remark 3. We emphasize that necessarily k_n → ∞ as n → ∞: otherwise, if k_n < K < ∞, then from (10) we find that s_K(Λ_g) must vanish in the limit n → ∞; hence, by the properties of index functions, we would have s_K(Λ_g) = 0, which is a contradiction.
Remark 4. The case that k_n/n < ψ^2(1/n) corresponds to the situation when the regularization bias dominates the overall SPC. In this case the truncation level is obtained from the relation k_n = max{ j : s_j(Λ_g) > 1/n }, and it may be significantly smaller than the truncation level obtained in the case that the regularization bias is dominated.
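The rule in Remark 4 can be computed directly from the singular values; a sketch with a hypothetical power decay s_j(Λ_g) = j^{−2}:

```python
import numpy as np

# Bias-dominated truncation rule (cf. Remark 4): the largest j with
# s_j(Lambda_g) > 1/n, here for a hypothetical power decay s_j = j^{-2}.
def truncation_level(singular_values, n):
    """Number of singular values exceeding 1/n (indices are 1-based)."""
    return int(np.sum(singular_values > 1.0 / n))

j = np.arange(1, 10_001)
s = j ** -2.0
# s_j > 1/n iff j < sqrt(n), so the level grows like n^{1/2}:
k_small = truncation_level(s, 100)
k_large = truncation_level(s, 10_000)
```

For this decay the level grows like √n, slower than the n^{1/(1+2β)} level obtained when the regularization bias is dominated with small β.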
It is thus interesting to characterize those cases when the regularization bias in (20) is of lower order. We shall provide such a characterization, but for this we need an additional assumption.

Assumption 3 (control of decay of singular numbers). There is a constant 0 < c ≤ 1 such that s_{k+1}(Λ_g) ≥ c s_k(Λ_g), for all k = 1, 2, ….

This assumption does not hold for operators Λ_g with singular values decaying faster than exponentially.
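The point can be illustrated by the ratio of consecutive singular values in three illustrative decay regimes: for power and exponential decay the ratio s_{k+1}/s_k stays bounded away from zero, while for faster-than-exponential decay it vanishes.

```python
import numpy as np

# Minimal ratio s_{k+1}/s_k for three illustrative decay regimes.
k = np.arange(1, 20)

def min_ratio(s):
    return np.min(s[1:] / s[:-1])

r_power = min_ratio(k ** -2.0)          # power decay: worst ratio (1/2)^2
r_expo = min_ratio(np.exp(-k))          # exponential decay: constant e^{-1}
r_super = min_ratio(np.exp(-k ** 2.0))  # Gaussian decay: ratio e^{-(2k+1)} -> 0
```

Only in the first two regimes can a uniform lower bound on the ratio, as required by Assumption 3, exist.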
Specifically, Proposition 3.3 applies for problems with covariance operator Λ g of the underlying infinite dimensional prior on g, having a power type decay of the singular numbers. Thus, in such cases, under Assumption 3 the truncation level k n yields order optimal contraction exactly if (22) holds.
Example (α-regular prior and Sobolev smoothness). For a native α-regular prior defined in Y (recall the example in § 2.2), with (untruncated) covariance operator Λ_g, the Sobolev-type smoothness of the underlying truth g_0 ∈ S^β is expressed through the index function ψ(t) = t^{β/(1+2α)}, t > 0 (recall the example in § 2.4). For this function we see that, for α ≤ β, both Assumption 3 and condition (22) hold, so that Proposition 3.3 applies.
The truncation level k_n is then given from balancing k_n/n ≍ k_n^{−2β}, which results in k_n ≍ n^{1/(1+2β)}, yielding a bound for the SPC of the form

SPC ≲ n^{−2β/(1+2β)},

which is known to be minimax for direct estimation. The same bounds are also valid for inherited α-regular priors with commuting operators Λ_f, T^∗T. In the non-commuting case, provided H = T^∗T and Λ_f satisfy Assumption 2, the same bounds hold for α ≤ β ≤ 1 + 2α, where the additional restriction on β is needed in order to ensure that ψ is operator concave.
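The rate arithmetic of this example is elementary and can be checked mechanically: balancing the variance-type term k/n against the squared bias k^{−2β} gives k_n ≍ n^{1/(1+2β)}, at which point both terms are of the order n^{−2β/(1+2β)}.

```python
import numpy as np

# Balancing k/n against k^{-2 beta} gives k_n = n^{1/(1+2 beta)}, at which
# point both terms equal n^{-2 beta/(1+2 beta)}.
def balanced_terms(n, beta):
    k = n ** (1.0 / (1.0 + 2.0 * beta))
    return k / n, k ** (-2.0 * beta)

n, beta = 10_000.0, 1.0
term_var, term_bias = balanced_terms(n, beta)
rate = n ** (-2.0 * beta / (1.0 + 2.0 * beta))   # the minimax SPC order
```

The same balancing computation applies for any β > 0; only the exponents change.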

Interlude
Frequentist convergence rates of the posterior distribution under Gaussian priors in the Gaussian white noise model have been considered, for example, in [37] (rates for the posterior mean under Sobolev-type smoothness), [6] (contraction rates under Sobolev-type smoothness), and [36] (general contraction theory). We gave a detailed discussion here, on the one hand, because, as explained in Section 1, we are interested in general smoothness assumptions and, on the other hand, because we want to emphasize the specifics of using truncated (Gaussian) priors. Theorem 2 highlights the general nature of our bounds for the squared posterior contraction (SPC), in terms of both the considered prior covariances and the smoothness of the truth, expressed using source sets. In our analysis we distinguish two cases: case 1, which uses native priors and is entirely based on the singular value decomposition of the underlying covariance operator, and case 2, which refers to priors inherited from external native priors via some linear mapping, in which case the inherited finite dimensional prior is in general no longer supported in a singular subspace of the covariance operator of the underlying infinite dimensional inherited prior. The latter case, which is relevant when studying the direct problem (2) associated to the inverse problem (1), can be treated provided that the linear mapping is appropriately linked to the external native prior's covariance. In particular this link, captured in Assumption 2, imposes a minimum smoothness on the external native prior.
Special emphasis is put on the description of the optimal truncation level k_n, made explicit in (10). It is seen that in general this level will depend on the underlying smoothness as well as on the noise level 1/√n, and that, in the case that the regularization bias is dominated, it is the same as the truncation level in (minimax) statistical estimation under white noise when using truncated series estimators, as expressed in Proposition 3.2. Furthermore, the obtained upper bound on the contraction rate involves a truncation-independent term, the regularization bias, and thus in Proposition 3.3 we give a characterization determining whether this term is of lower order compared to the k-dependent terms, or whether it dominates. In the former case the obtained rates of contraction are minimax, while in the latter case they are suboptimal. As already mentioned in Remark 4, the truncation level according to (10) will be smaller for dominating regularization bias than in the case when the regularization bias is dominated.
We close this discussion with the following observation. In studies dealing with scaled infinite-dimensional Gaussian priors a typical 'saturation effect' is observed: In order to achieve minimax-optimal rates of contraction the prior smoothness must not be much lower than the regularity of the underlying truth, see [22] and [35]. The contrary is true for truncated priors: when applying Proposition 3.3 in specific examples later on, it will be transparent that the prior regularity must be lower than the regularity of the underlying truth; see the preceding example as well. This has also been observed in [36], and it can be explained by the fact that truncation of a Gaussian prior increases its regularity, which can correct for an under-smoothing but not for an over-smoothing prior. In the case that the truncation of the Gaussian prior is not along some singular subspace, a limitation for the considered smoothness occurs due to the nature of the linking Assumption 2. This can be seen from the final example in § 3.3.

Modulus of continuity for inverse problems
We next consider the linear mapping A : X → Y from (1) and introduce the modulus of continuity for controlling its inversion on a subset S (often called conditional stability). We shall do this for S := X_k, where X_k ⊂ X is a k-dimensional subspace, and derive bounds on the modulus of continuity which are known to be sharp in many cases.

Modulus of continuity
Similarly to the recent study [21], but restricting to normed spaces, we proceed as follows. Given the operator A, for a class S ⊂ X and a fixed element f_0 ∈ X, we let the modulus of continuity function be given in (23). We stress that this modulus controls the deviation around the element f_0, and hence it is local. Recall that ||Af||_Y = ||H^{1/2} f||_X for the companion operator H := A*A; hence we shall confine the subsequent analysis to the operator H.
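The display (23) itself does not survive in this excerpt. For orientation, a local modulus of continuity of this type is typically defined as follows (a sketch in the present notation, not a verbatim reproduction of (23)):

```latex
\omega_{f_0}\bigl(H^{-1/2}, S, \delta\bigr)
  := \sup\Bigl\{\, \|f - f_0\|_X \;:\; f \in S,\;
       \bigl\| H^{1/2}(f - f_0) \bigr\|_X \le \delta \,\Bigr\},
  \qquad \delta > 0.
```

Bounding this quantity at δ = δ_n is how the direct rate is transferred to the inverse problem in Section 5.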

Bounding the modulus of continuity
When bounding the modulus of continuity for the inversion of an operator around an element f_0 ∈ X, it is convenient to express the smoothness of f_0 relative to that particular operator. Precisely, in the context of the inverse problem (1), we measure the smoothness relative to the operator H, the companion of A, and we assume that f_0 ∈ H^ϕ for some index function ϕ; see § 2.4, where source conditions were introduced. The control of the modulus of continuity is based on several assumptions relating a finite-dimensional subspace X_k ⊂ X to the operator H as well as to the target function f_0. We denote by P_k the orthogonal projection of X onto the subspace X_k. Furthermore, recall that we denote by s_k := s_k(H) the k-th singular number of the (compact) operator H.

Definition 3 (degree of approximation). Let K : X → Y be a (compact) operator. Given a finite-dimensional subspace X_k ⊂ X we assign to (K, X_k) the degree of approximation ||K(I − P_k)||_{X→Y} of the subspace X_k for the operator K.
Definition 4 (modulus of injectivity). Let K : X → Y be a (compact) operator. Given a finite-dimensional subspace X_k ⊂ X we assign the modulus of injectivity j(K, X_k) := inf_{0 ≠ x ∈ X_k} ||Kx||_Y / ||x||_X, which quantifies the invertibility of the operator K on the subspace K(X_k).
We mention here that the last two concepts are of interest for sequences of increasing subspaces X_k. Taking K = H^{1/2} : X → X, the quantities (H^{1/2}, X_k) and j(H^{1/2}, X_k) allow us to quantify the impact of the choice S = X_k when bounding the modulus of continuity ω_{f0}(H^{−1/2}, S, δ).
Remark 5. The above (H 1/2 , X k ) relates to the k-th Kolmogorov number, while j(H 1/2 , X k ) relates to the k-th Bernstein number, both of which are well studied quantities in approximation theory, see [33].
When X_k is the k-th singular subspace of H, then the quantities above can be expressed explicitly in terms of the singular numbers, for any index function ϕ. When using a subspace X_k other than the k-th singular subspace, its quality relative to the k-th singular subspace is measured in terms of Jackson and Bernstein inequalities, which look as follows.
Assumption 4 (relating X_k to the k-th singular subspace of H^{1/2}). Consider a sequence (X_k)_{k∈N} of subspaces of X. There are constants M, C_P, C_B ≥ 1 such that for k ∈ N the corresponding Jackson- and Bernstein-type bounds hold, together with an approximation bound for f_0 ∈ H^ϕ.

Remark 6. Within the context of projection schemes in classical ill-posed problems such assumptions were made in the study [29]. For finite element approximations, i.e., when the spaces X_k consist of finite elements, a detailed example is given in [18, Ex. 2.4]. In the context of Bayesian methods, the recent study [16] also makes similar assumptions, see ibid. Ass. 2.3.
Under Assumption 4 the following bound, Proposition 4.1, holds.

In the bound from Proposition 4.1 we have the flexibility of choosing the truncation level k ∈ N, and we next study this choice. First, we recall the companion to the index function ϕ, given as Θ_ϕ(t) := √t ϕ(t), t > 0. Notice that Θ_ϕ is also an index function; more specifically, it is always strictly increasing, hence invertible. Optimizing the bound from Proposition 4.1 with respect to the choice of the truncation level, we arrive at the main result of this section.
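For power-type index functions everything here can be made explicit. As a quick worked example (with Θ_ϕ(t) = √t ϕ(t) the companion function), one computes:

```latex
\varphi(t) = t^{\mu}
  \;\Longrightarrow\;
  \Theta_\varphi(t) = t^{\mu+1/2},
  \quad
  \Theta_\varphi^{-1}(s) = s^{1/(\mu+1/2)},
  \quad
  \varphi\bigl(\Theta_\varphi^{-1}(\delta)\bigr) = \delta^{2\mu/(2\mu+1)}.
```

With μ = β/(2p), as in Example 1 below, this gives ϕ(Θ_ϕ^{−1}(δ)) = δ^{β/(β+p)}, matching the modulus bound obtained there.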
Theorem 3. Suppose that f_0 ∈ H^ϕ, and that (X_k)_{k∈N} satisfies Assumption 4. Given δ > 0, we assign the truncation level k_δ as in (29). Then there is a constant C_4 such that ω_{f0}(H^{−1/2}, X_{k_δ}, δ) ≤ C_4 ϕ(Θ_ϕ^{−1}(δ)), for δ > 0 small enough.
Some extensions of the above bounds on the modulus of continuity can be found in Appendix A.

Relating the contraction rates for the direct and inverse problems
In this section we discuss the steps for proving Theorem 1, which is an application of Proposition 2.1.
We shall first use Theorem 2 to establish contraction rates for the direct problem (2), finding rate sequences δ_n for truncation levels k_n, n ∈ N. In order to apply Theorem 2 we need to determine the inherited prior for the direct problem (formulated in Y), obtained by pushing forward the (truncated Gaussian) prior on f (formulated in X) through the mapping A. Furthermore, given an element f_0 ∈ X, we need to express the smoothness of g_0 = Af_0 with respect to the corresponding (inherited, untruncated) covariance operator. We address both of these tasks in § 5.1 and derive rates δ_n for the direct problem by relying on either Assumption 1 or 2, depending on whether the (untruncated) prior covariance on f commutes with H = A*A or not.
Given such a rate δ_n, we can then use the results of Section 4, specifically Proposition 4.1, to compute the corresponding ω_{f0}(H^{−1/2}, X_{k_n}, δ_n), which, according to Proposition 2.1, is a rate of contraction for the inverse problem at f_0. Here, (X_k)_{k∈N} are the singular spaces of the prior covariance operator Λ_f. A main component of the proof is the realization that k_n as given in (10) optimizes both the contraction rate δ_n and our bounds on the modulus of continuity; we shall see this in § 5.2. Along the way, we shall establish that the (X_k)_{k∈N} obey Assumption 4.

Rates for the direct problem
Let us consider the model (1), and put a Gaussian prior Π X k = N (0, P k Λ f P k ) on f ∈ X, for a given self-adjoint, positive-definite and trace-class operator Λ f : X → X. Here P k denotes the orthogonal projection onto the k-dimensional subspace X k ⊂ X corresponding to the singular value decomposition of the operator Λ f . We are interested in finding contraction rates δ n of Af around Af 0 , for a given f 0 ∈ X.
Due to linearity, the Gaussian prior on f ∈ X induces a Gaussian prior Π^Y_k on Af ∈ Y, which has zero mean and covariance operator A P_k Λ_f P_k A^*. Recall the terminology of native and inherited priors from Section 3. It is interesting to ask when this push-forward prior is native for g ∈ Y; this is the case when the operators H and Λ_f commute. In general, however, this will not be the case, that is, the push-forward of Π^X_k will not be native in Y. Nevertheless, the SPC was bounded in (16) for both native and non-native priors, respectively. See also Theorem 2, which optimizes the bounds in both cases.

Commuting case: general smoothness
The main observation is summarized as follows.
Based on the above technical result we state the following consequence.

Proposition 5.1. a) (δ_n)_{n∈N} is a rate of contraction for the direct problem (2), obtained for the sequence of Gaussian priors Π^X_{k(n)} = N(0, P_{k(n)} Λ_f P_{k(n)}) on f, where P_k is the orthogonal projection onto the k-th singular space X_k of Λ_f, and for f_0 ∈ H^ϕ. b) (δ_n)_{n∈N} is a rate of contraction for model (12), obtained for a sequence of (native) Gaussian priors N(0, Q_{k(n)} Θ²_χ(UHU^*) Q_{k(n)}) on g, where Q_k is the orthogonal projection onto the k-th singular space UX_k of the operator Λ_g := Θ²_χ(UHU^*), and for g_0 ∈ (Λ_g)^ψ.

Non-commuting case: power type smoothness
If the operators H = A*A and Λ_f do not commute, the push-forward of the prior on f will no longer be native for g = Af. However, even in the non-commuting case we can translate the smoothness assumption f_0 ∈ H^ϕ with power-type ϕ to a corresponding smoothness of g_0 := Af_0 with respect to the operator Λ_g = AΛ_f A^*, under Assumption 2.

Lemma 5.2. Suppose that Assumption 2 holds. If f_0 ∈ H^ϕ for an index function ϕ(t) = t^μ, then g_0 = Af_0 ∈ (Λ_g)^ψ for ψ(t) = t^{(μ+1/2)/(2a+1)}, which has an operator concave square.
The proof of Lemma 5.2, which holds for μ ≤ a, is based on Heinz' Inequality, and this allows us to treat power-type smoothness of g_0 with respect to Λ_g with exponent 0 ≤ θ ≤ 1/2. In particular, it does not allow us to fully exploit the results of Section 3 for inherited priors, which hold for 0 ≤ θ ≤ 1 (since they only require operator concavity of ψ).
Therefore, we shall highlight the following condition, which allows us to extend the range of applicability in the non-commuting case. It is a strengthening of Assumption 2: there exists a ≥ 1/2 such that (33) holds.

Remark 7. In view of Heinz' Inequality (with θ := 1/3), (33) is consistent with Assumption 2. Conversely, in this non-commuting case, (33) cannot be derived from Assumption 2; it is a strengthening of it. In brief, the validity of a link condition yields that the eigenfunctions of the operators on both sides must share the same smoothness (which can be seen from the modulus of injectivity, reflecting the 'inverse property'). Therefore, in general a link cannot be 'lifted' to higher powers, contrasting the commuting case, where both sides share the same eigenfunctions, and so do arbitrary powers.

Lemma 5.3. Suppose that
We summarize the developments of this section.
Consider the direct problem (2) around g_0 = Af_0, under the sequence of priors Π^X_{k(n)} on f and for f_0 ∈ H^ϕ. Then we can obtain a rate of contraction for this problem by computing a rate of contraction (δ_n)_{n∈N} for model (12), for the sequence A Π^X_{k(n)} of inherited Gaussian priors on g, and for g_0 ∈ (Λ_g)^ψ, with ψ as above.

We conclude this discussion by relating the obtained smoothness of g_0 = Af_0 to the smoothness of f_0, as expressed in Propositions 5.1 and 5.2, respectively, for the commuting and non-commuting cases. Specifying χ(t) := t^a and ϕ(t) := t^μ in the commuting case, we restrict to the power-type smoothness and the relationship between Λ_f and H considered in the non-commuting case. In that setting, the obtained functions representing the smoothness should thus agree. Indeed, it is readily seen that the function ψ as obtained in Proposition 5.1 is exactly the same as in Proposition 5.2 with this specification. Therefore, the assumptions for the non-commuting case allow us to maintain the results as obtained in the commuting one; however, the limitations a ≥ 1/2 and 0 < μ ≤ 2a + 1/2 occur, which are not present in the commuting case.

Rates for the inverse problem -optimality of the truncation point
Consider a forward operator A and let H := A*A be its companion self-adjoint operator. Let δ_n be a rate of contraction for the direct problem (2) around g_0 = Af_0 ∈ Y, under a Gaussian prior truncated at level k_n, as defined in the previous subsection. If Λ_f and H commute, then by Proposition 5.1, under Assumption 1, such a rate can be computed using Theorem 2. Such a rate can also be computed in the non-commuting case under Assumption 2, and the corresponding result was formulated in Proposition 5.2. Then, according to Proposition 2.1, to compute a rate of contraction for the original inverse problem (1), it suffices to compute ε_n = ω_{f0}(H^{−1/2}, X_{k_n}, δ_n).
We have studied bounds on the modulus of continuity ω_{f0}(H^{−1/2}, X_k, δ) in Section 4. Our bounds hold under Assumption 4 on the relationship of the subspaces (X_k)_{k∈N} to the singular subspaces of H. Since in the present Bayesian inverse problem context the (X_k)_{k∈N} are aligned with the untruncated prior covariance operator Λ_f, in order to apply the results of Section 4 for bounding ε_n = ω_{f0}(H^{−1/2}, X_{k_n}, δ_n), we first need to verify that the (X_k)_{k∈N} satisfy Assumption 4. We do this in the next proposition.

Proposition 5.3. Let (X_k)_{k∈N} be the singular spaces of the operator Λ_f. Either Assumption 1, or Assumption 2 with smoothness f_0 ∈ H^ϕ for ϕ(t) = t^μ with 0 < μ ≤ a, yields the validity of Assumption 4. Under the stronger assumption (33), the range in the latter setting extends to μ ≤ 2a + 1/2.

Remark 8. The above result corresponds to [16, Prop. 5.3], which concerns the commuting case. Here this is extended to the non-commuting cases under the link conditions (Assumption 2 and (33)).
We next investigate whether the truncation level k_n from (10) also yields an optimized bound when used as a discretization level for the modulus of continuity, such that both bounds are optimized simultaneously. Indeed, we will see that this is the case, and the following two technical results are key. We first establish the optimality of k_n in the commuting case, and then extend to the non-commuting case.
Given an index function ψ, we consider a rate sequence δ_n which obeys (34), for a constant 2 ≤ C_8 < ∞.

Proposition 5.4. Under Assumption 1 the following holds true: suppose that f_0 ∈ H^ϕ. Let k_n be as in (10), and assume that (34) holds true for a rate sequence (δ_n)_{n∈N}. We then have that ω_{f0}(H^{−1/2}, X_{k_n}, δ_n) ≲ ϕ(Θ_ϕ^{−1}(δ_n)).

This result is extended to the non-commuting case as follows.
Proposition 5.5. Under Assumption 2 with μ ≤ a, or under Assumption (33) with μ ≤ 2a + 1/2, the following holds true: suppose that f_0 ∈ H^ϕ for the power-type function ϕ(t) = t^μ, and let ψ(t) = t^{(μ+1/2)/(2a+1)}. Let k_n be as in (10), and assume that (34) holds true for a rate sequence (δ_n)_{n∈N}. We then have that ω_{f0}(H^{−1/2}, X_{k_n}, δ_n) ≲ ϕ(Θ_ϕ^{−1}(δ_n)).

Evidently, for k_n as in (10) a bound as in (34) holds for δ_n equal to the optimized bound for the direct problem as given on the right-hand side of (20); hence our bound on the modulus of continuity is indeed also optimized, in both the commuting and non-commuting cases, according to the last two propositions.
Combined, Propositions 5.4 and 5.5 imply the validity of Theorem 1. We emphasize that Proposition 5.5 holds true for the extended range μ ≤ 2a + 1/2, provided that condition (33) holds. This yields the following corollary.

Corollary 5.1 (Corollary to the proof of Theorem 1). Consider the inverse problem (1), and suppose that f_0 has smoothness H^ϕ for the function ϕ(t) = t^μ. Assume we put a truncated Gaussian prior N(0, P_{k_n} Λ_f P_{k_n}) on f, with Λ_f a self-adjoint, positive-definite, trace-class, linear operator in X, and specify the related covariance operator Λ_g = AΛ_f A^*. Under condition (33) with μ ≤ 2a + 1/2, consider the index function ψ(t) = t^{(μ+1/2)/(2a+1)}. For the choice k_n according to (10), let δ_n be given as the right-hand side of (20), for some constant C. Then the posterior contracts around f_0 at a rate ε_n ≍ ϕ(Θ_ϕ^{−1}(δ_n)), n → ∞.

Examples
Here we exhibit how to use Theorem 1 in order to obtain rates of contraction for the inverse problem (1).
The subsequent examples distinguish between the decay of the singular numbers of the forward map A being moderate (power type), severe (exponential decay) or mild (logarithmic decay). Throughout we fix once and for all some element f_0 ∈ S^β, see (8) in Section 2.4. It will be transparent that, depending on the underlying operator H = A*A, this results in different source-wise representations f_0 ∈ H^ϕ. However, regardless of the kind of ill-posedness of the operator H, we will have that ϕ²(s_j) ≍ j^{−2β}. For a truncated Gaussian prior on f with underlying covariance operator Λ_f, we thus need to determine SPC rates δ_n for the direct problems (2) which correspond to these examples. We will do this in Section 6.1, where we apply Theorem 2, which results in the bound (20) for the optimal truncation level k_n given in (10). For all considered types of behaviour of the singular numbers of A, we will study truncated α-regular Gaussian priors as introduced in Section 2.2. In addition, in the case that A exhibits exponential decay of the singular numbers, we shall also discuss a prior covariance operator with exponential decay (analytic prior); this is in alignment with the case analyzed in the study [21]. In all cases we will assume that Λ_f and H commute.
Having determined rates δ_n for the direct problem, in Section 6.2 we establish bounds for the modulus of continuity corresponding to the forward operators A at hand. To this end we apply Theorem 3, which for any δ results in the bound (30) for the optimal truncation level k_δ given in (29). We then highlight that, by Theorem 1, plugging δ = δ_n into these bounds results in contraction rates for the corresponding inverse problems, for a Gaussian prior truncated at level k_n.
The rates given below for (most of) the direct and (all of) the inverse problems correspond to the minimax rates for estimation in Gaussian white noise under Sobolev-type smoothness. While for Examples 1 and 2 these minimax rates are known, the minimax rates for the mildly ill-posed case in Example 3 can be found by using Theorem 2 for the direct problem and the result from [10] for the inverse problem. These rates are given here for the first time.
Finally, we will conclude with a discussion on non-commuting Λ f and H cases in Section 6.3.

Direct rates
We confine ourselves to the case that Λ_f and H commute, so that Assumption 1 holds with appropriate χ. Recall that in this context Λ_g = AΛ_f A^*, and that the smoothness of the truth is expressed relative to Λ_g via ψ(t) given in (9). Then, in order to obtain the truncation level k_n from (10) and the corresponding bound on the SPC from (20), we proceed as follows.
In this commuting case we see that s_j(Λ_g) = s_j(H) s_j(Λ_f), and we first check whether Assumption 3 holds, in which case we can use Proposition 3.3 to determine whether the regularization term dominates in the bound (20) or not. Furthermore, we make use of the identity ψ(Λ_g) = Θ_ϕ(UHU^*), which holds for ψ(t) from (9), and this extends to the singular numbers. Using this identity, condition (22) translates to (35).

Under Assumption 3 and (35), we find k_n by balancing k/n ≍ ψ²(s_k(Λ_g)), and the SPC is bounded by (a multiple of) k_n/n. This bound is known to be order optimal. If Assumption 3 does not hold, then we proceed as follows, cf. Remark 4. We find l_n by balancing l/n ≍ ψ²(s_l(Λ_g)). Then we check whether ψ²(1/n) is larger than l_n/n, in which case the regularization bias dominates. If this is the case, then k_n is found by balancing s_k(Λ_g) ≍ 1/n, and the SPC is bounded by (a multiple of) ψ²(1/n). Otherwise, k_n = l_n and the SPC is bounded by (a multiple of) k_n/n; in this latter case the bound is again known to be order optimal. We emphasize that we only need to explicitly compute ψ (hence also χ and (Θ²_χ)^{−1}) in the case that the regularization bias dominates.

Another consequence is worth mentioning. In case the regularization bias is dominated, and hence the obtained contraction rate corresponds to the minimax rate of statistical estimation, the truncation level k_n is obtained from balancing k/n ≍ ψ²(s_k(Λ_g)) = Θ²_ϕ(s_k(H)). In particular, the level k_n does not depend on the chosen regularity of Λ_g; it is entirely determined by the smoothness as expressed with respect to H. A similar remark applies to the contraction rate for the inverse problem: as the minimax rate cannot depend on the prior regularity, the same holds for the chosen truncation level. This is seen in the examples below.
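The balancing recipe above is easily mimicked numerically. The following sketch (an illustration with assumed power-type decay, not code from the paper) finds k_n by comparing the variance term k/n with the squared-bias term ψ²(s_k(Λ_g)); with ψ²(s_k(Λ_g)) ≍ k^{−2(β+p)}, as in the moderately ill-posed setting below, this reproduces k_n ≍ n^{1/(1+2β+2p)}:

```python
def balance_truncation(n, bias2):
    """Smallest k with k/n >= bias2(k): the balancing point of the
    variance term k/n and the squared bias psi^2(s_k(Lambda_g))."""
    k = 1
    while k / n < bias2(k):
        k += 1
    return k

beta, p = 1.0, 0.5                          # assumed illustrative values
bias2 = lambda k: k ** (-2 * (beta + p))    # ~ Theta_phi^2(s_k(H)) = k^{-2(beta+p)}

for n in (10**4, 10**6, 10**8):
    k_n = balance_truncation(n, bias2)
    print(n, k_n, round(n ** (1 / (1 + 2 * beta + 2 * p)), 1))
```

For these values the computed k_n agrees with n^{1/4} up to rounding.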
Notice that in Example 2 (both with α-regular and analytic priors, as considered below), the direct problem corresponds to a prior covariance and a smoothness of the truth which are not standard in the literature for the white noise model. Here they appear naturally, because the structure of the direct problem is inherited from the considered inverse problem. For this reason it was necessary to have the general setup for the direct problem in Section 3.

Example 1 (moderately ill-posed operator). Here we assume that the operator H has power-type decay of the singular numbers, that is, s_j(H) ≍ j^{−2p}, p > 0, j = 1, 2, .... We need to find a corresponding index function such that f_0 ∈ H^ϕ. This is achieved by letting ϕ(t) := t^{β/(2p)}, see the example in § 2.4, which gives Θ_ϕ(t) = t^{(β+p)/(2p)}. We consider truncated α-regular priors, so that s_j(Λ_f) ≍ j^{−1−2α}. Note that g_0 has smoothness (Λ_g)^ψ = (UHU^*)^{Θ_ϕ}, which in this example translates to Sobolev-type smoothness of order β + p.
We have s_j(Λ_g) = s_j s_j(Λ_f) ≍ j^{−1−2(α+p)}, and hence the regularity of the prior increases from α to α + p, also. Assumption 3 holds in this case. For α-regular priors condition (35) holds if and only if α ≤ β, and in this case we know from Proposition 3.3 that the regularization bias in Theorem 2 is of lower order. The optimized truncation level k_n as given in (10) can thus be computed by balancing k/n ≍ Θ²_ϕ(s_k) ≍ k^{−2(β+p)}, yielding k_n ≍ n^{1/(1+2β+2p)}. Plugging this into (20), we obtain the bound δ²_n ≲ k_n/n ≍ n^{−2(β+p)/(1+2β+2p)}, which is the square of the minimax rate for the white noise model under Sobolev-type smoothness of order β + p (this is both asserted by Theorem 2 and well known in this case).

Example 2 (severely ill-posed operator). Here we assume that the operator H has exponential decay of the singular numbers, that is, s_j(H) ≍ e^{−2γj^p}, p > 0, j = 1, 2, .... The resulting index function ϕ which realizes the source condition for f_0 is then ϕ(t) = log^{−β/p}(1/t), and the related function Θ_ϕ is given as Θ_ϕ(t) = √t log^{−β/p}(1/t). Lemma B.1 shows that its inverse behaves like Θ^{−1}_ϕ(s) ∼ s² log^{2β/p}(1/s). We again consider truncated α-regular priors, so that s_j(Λ_f) ≍ j^{−1−2α}. Note that again g_0 has smoothness (Λ_g)^ψ = (UHU^*)^{Θ_ϕ}, which in this example means that g_0 has coefficients decaying at least as fast as e^{−γj^p}/j^β.
On the one hand, the regularization bias behaves asymptotically as ψ²(1/n). On the other hand, we find l_n by balancing l/n ≍ Θ²_ϕ(s_l), resulting in l_n ≍ log^{1/p}(n), again using Lemma B.1. We thus see that the regularization bias is of lower order, i.e., ψ²(1/n) ≲ l_n/n, if and only if α ≤ β, in which case k_n in (10) is equal to l_n. For α > β, the level k_n can be found by balancing s_k(Λ_g) ≍ 1/n, yielding k_n ≍ log^{1/p}(n) again. The right-hand side of the bound (20) is dominated by k_n/n ≍ (1/n) log^{1/p}(n) in the former case, and by ψ²(1/n) in the latter.
Combining, Theorem 2 gives a bound which, whenever α ≤ β, is the square of the minimax rate for the white noise model under the smoothness class (Λ_g)^ψ = (UHU^*)^{Θ_ϕ} (this is both asserted by Theorem 2 and, again, well known in this case).
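The same numerical balancing also illustrates the logarithmic truncation level in this severely ill-posed setting. The sketch below (with assumed values β = γ = p = 1, purely for illustration) balances l/n against Θ²_ϕ(s_l) ≍ l^{−2β} e^{−2γl^p} and shows levels of order log^{1/p}(n):

```python
import math

def balance_truncation(n, bias2):
    # smallest k with k/n >= bias2(k)
    k = 1
    while k / n < bias2(k):
        k += 1
    return k

beta, gamma, p = 1.0, 1.0, 1.0          # assumed illustrative values
bias2 = lambda k: k ** (-2 * beta) * math.exp(-2 * gamma * k ** p)

for n in (10**6, 10**9, 10**12):
    print(n, balance_truncation(n, bias2), round(math.log(n) ** (1 / p), 1))
```

The computed levels (5, 8 and 11 for the three sample sizes) grow linearly in log(n), as predicted, and remain far smaller than in the moderately ill-posed case.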
Example 3 (mildly ill-posed operator). We consider again α-regular priors, and here we find that s_j(Λ_g) = s_j s_j(Λ_f) ≍ j^{−1−2α} log^{−2p} j. In particular Assumption 3 holds, and condition (35) is valid if and only if α ≤ β. Thus, in this case the regularization bias is dominated, and the truncation level k_n is obtained from balancing k/n ≍ Θ²_ϕ(s_k), resulting in k_n ≍ n^{1/(1+2β)} log^{−2p/(1+2β)}(n), again using Lemma B.1. Notice that we do not need to explicitly determine the function ψ in this case, since the identity ψ²(s_k(Λ_g)) = Θ²_ϕ(s_k) holds throughout, as mentioned above. We obtain that the SPC is bounded by (a multiple of) k_n/n, and that this is the (square of the) minimax rate of statistical estimation in the white noise model under smoothness expressed in terms of the index function Θ_ϕ from above.
Finally, we revisit Example 2, but this time with the covariance operator of the Gaussian prior as considered in [21, Section 3.3].

Example (Example 2 with analytic prior). The covariance operator of the Gaussian prior is assumed to have eigenvalues s_j(Λ_f) ≍ j^{−α} e^{−ξj^p}. Although the element g_0 = Af_0 is the same as before, i.e., g_0 has coefficients decaying at least as fast as e^{−γj^p}/j^β, its smoothness relative to the resulting Λ_g is with respect to a different function ψ, such that again g_0 ∈ (Λ_g)^ψ. Indeed, we find that s_j(Λ_g) ≍ j^{−α} e^{−(ξ+2γ)j^p}, so that again Assumption 3 only holds if p ≤ 1. We thus cannot apply Proposition 3.3, and we again need to check explicitly which of the two terms dominates the bound (20) in Theorem 2. In particular, we again need to explicitly compute ψ(t) = Θ_ϕ((Θ²_χ)^{−1}(t)), which is possible by Assumption 1. On the one hand, the regularization bias behaves asymptotically as ψ²(1/n). On the other hand, we find l_n from balancing l/n ≍ ψ²(s_l(Λ_g)) = Θ²_ϕ(s_l) ≍ l^{−2β} e^{−2γl^p}, resulting in l_n ∼ ((1/(2γ)) log(n))^{1/p}. In particular, the resulting rate is worse than the (minimax) rate obtained by the α-regular prior.

Modulus of continuity and inverse rates
Below, we use Theorem 3 to bound the modulus at f_0 ∈ S^β, for S = X_k, where X_k satisfies Assumption 4, and for the three different choices of the linear operator H. We then plug the rates δ = δ_n for the direct problem, obtained in the previous section, into these bounds. According to Theorem 1, the resulting rates are rates of contraction for the corresponding inverse problem (1) under the respective prior.
Example (Example 1 continued). Here the setup is exactly the same as in Example 1, with s_j := s_j(H) ≍ j^{−2p}, such that Θ_ϕ(t) = t^{(β+p)/(2p)}. For the (optimal) choice k_δ ≍ δ^{−1/(β+p)}, we thus get the bound ω_{f0}(H^{−1/2}, X_{k_δ}, δ) ≲ δ^{β/(β+p)} on the modulus of continuity. Then, in order to get a rate of contraction for the original inverse problem with an α-regular Gaussian prior truncated at k_n, it suffices to insert δ_n from (36) into the bound (40) on the modulus of continuity. Indeed, for α ≤ β we get the rate n^{−β/(1+2β+2p)}, which is known to be the minimax rate in the inverse problem setting with the assumed moderately ill-posed operator H, under Sobolev-type smoothness β.

Example (Example 2 continued). With the representation of ϕ and Θ_ϕ as in Example 2, we get the bound ω_{f0}(H^{−1/2}, X_{k_δ}, δ) ≲ log^{−β/p}(1/δ) on the modulus of continuity, which, by again using Lemma B.1, is achieved for k_δ ≍ log^{1/p}(1/δ). In order to get a rate of contraction for the original inverse problem with an α-regular Gaussian prior truncated at k_n, it suffices to insert δ_n from (37) into the bound (41) on the modulus of continuity. Regardless of whether α ≤ β or not, we get the rate log^{−β/p}(1/δ_n) ≍ log^{−β/p}(n), which is known to be the minimax rate in the inverse problem setting with the assumed severely ill-posed operator H, under Sobolev-type smoothness β. That is, α-regular Gaussian priors truncated at k_n ≍ log^{1/p}(n) are rate adaptive over Sobolev-type smoothness in this severely ill-posed operator setting.
When using an analytic prior, we need to insert the (sub-optimal) rate from (39) into the bound (41) on the modulus of continuity. This yields that log^{−β/p}(1/δ_n) ≍ log^{−β/p}(n) is a rate of contraction for the inverse problem. In particular, the truncated analytic Gaussian prior with truncation point k_n ≍ log^{1/p}(n) is also rate adaptive over Sobolev balls S^β, for all β > 0. This is in agreement with the findings in [21, Section 3.3].

Example (Example 3 continued). With the representation of ϕ and Θ_ϕ as in Example 3, and using Lemma B.1 again, we observe an asymptotic behavior for the modulus of continuity which is achieved for k_δ ≍ δ^{−1/β} log^{−p/β}(1/δ). This is (up to a logarithm) linear in δ, and the inverse problem is not much harder than the direct one. In analogy to [3], the problem is mildly ill-posed.
Inserting the rate for δ_n from (38) into the bound (42) on the modulus of continuity yields a rate of contraction for the inverse problem.
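The exponent bookkeeping in these examples is mechanical and can be verified with exact rational arithmetic. The sketch below (illustrative, for the moderately ill-posed Example 1) checks that composing the direct rate n^{−(β+p)/(1+2β+2p)} with the modulus bound δ^{β/(β+p)} yields the inverse rate n^{−β/(1+2β+2p)}:

```python
from fractions import Fraction

def inverse_rate_exponent(beta, p):
    direct = Fraction(beta + p, 1 + 2 * beta + 2 * p)   # delta_n ~ n^{-direct}
    modulus = Fraction(beta, beta + p)                  # omega(delta) ~ delta^{modulus}
    return direct * modulus                             # eps_n ~ n^{-direct*modulus}

for beta, p in ((1, 1), (2, 1), (3, 2)):
    assert inverse_rate_exponent(beta, p) == Fraction(beta, 1 + 2 * beta + 2 * p)
print(inverse_rate_exponent(2, 1))  # → 2/7
```

The assertion passes for every integer pair, reflecting that the composition of exponents is exact, not merely asymptotic.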

Discussion on the non-commuting case
We conclude with a discussion of the non-commuting case, revisiting the setup of Example 1, i.e., Sobolev-type smoothness β and power-type decay of the singular numbers of H as s_j(H) ≍ j^{−2p}. In this case the applicability of Theorem 1 is limited to μ ≤ 2a + 1/2, due to the assumed concavity of the function ψ. Translating the assumed setup, we find that the exponent giving the smoothness of f_0 specifies to μ := β/(2p), while the exponent a in Assumption 2 specifies to a := (1 + 2α)/(4p). First, the assumption a ≥ 1/2 imposes a minimum regularity of the prior, 1 + 2α ≥ 2p, if 2p > 1. In terms of Sobolev smoothness β, and for α-regular priors, the above limitation translates to β + p ≤ 1 + 2(α + p), and the function ψ is given by ψ(t) = t^{(β+p)/(1+2(α+p))}, which is concave under this limitation. This is in accordance with the discussion at the end of § 3.3, because when turning from f_0 to g_0 = Af_0 the Sobolev-type smoothness increases from β to β + p. Also, the regularity of the prior, when turning from Λ_f to Λ_g, increases from α to α + p, see (53). Using this information to compute k_n from (10), we get that the α-regular prior truncated at k_n ≍ n^{1/(1+2β+2p)} gives the minimax rate in this non-commuting setting, for α ≤ β ≤ 1 + 2α + p.
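The translation of exponents in this paragraph is pure arithmetic and can be verified symbolically. The sketch below (an illustration with assumed integer inputs, using the paper's μ = β/(2p) and a = (1 + 2α)/(4p)) checks that the range condition μ ≤ 2a + 1/2 is equivalent to β ≤ 1 + 2α + p, and that ψ has the stated exponent:

```python
from fractions import Fraction as F

def exponents(beta, alpha, p):
    mu = F(beta, 2 * p)                      # smoothness of f_0 relative to H
    a = F(1 + 2 * alpha, 4 * p)              # link exponent from Assumption 2
    psi_exp = (mu + F(1, 2)) / (2 * a + 1)   # exponent in psi(t) = t^{(mu+1/2)/(2a+1)}
    return mu, a, psi_exp

beta, alpha, p = 3, 1, 1                     # assumed illustrative integers
mu, a, psi_exp = exponents(beta, alpha, p)

# mu <= 2a + 1/2 is equivalent to beta <= 1 + 2*alpha + p
assert (mu <= 2 * a + F(1, 2)) == (beta <= 1 + 2 * alpha + p)
# psi exponent equals (beta + p) / (1 + 2*(alpha + p))
assert psi_exp == F(beta + p, 1 + 2 * (alpha + p))
```

Repeating this for other integer triples confirms the equivalence stated in the text.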

Proofs
In order to understand the arguments used in some of the subsequent proofs, we recall a few facts from the theory of (bounded, non-negative) self-adjoint operators in Hilbert space; we refer to [7] for a comprehensive treatment. First, we introduce the partial ordering for (non-negative) self-adjoint operators, say G_1, G_2 : Z → Z, acting in a Hilbert space Z. We write G_1 ≺ G_2 if ⟨G_1 z, z⟩ ≤ ⟨G_2 z, z⟩, z ∈ Z, and G_1 ≍ G_2 if there are constants 0 < a_1, a_2 < ∞ such that both G_1 ≺ a_2 G_2 and G_2 ≺ a_1 G_1. Weyl's Monotonicity Theorem, see [7, III.2.3], asserts that G_1 ≺ G_2 implies that the singular numbers also obey s_j(G_1) ≤ s_j(G_2), j = 1, 2, ....

Furthermore, we recall Heinz' Inequality, see [12, Prop. 8.21], which states that for 0 ≤ θ ≤ 1 the inequality ||G_1 z||_Z ≤ ||G_2 z||_Z, z ∈ Z, implies ||G_1^θ z||_Z ≤ ||G_2^θ z||_Z, z ∈ Z, where the fractional power is again defined by spectral calculus. We shall also use the fact that for a positive-definite, self-adjoint operator H : X → X, an isometry U : X → Y, and an index function ζ, we have from spectral calculus that ζ(UHU^*) = U ζ(H) U^*.

Finally, the above ordering in the space of self-adjoint operators gives rise to notions of operator monotonicity and operator concavity, extending the usual comparisons from real-valued functions to self-adjoint operators by spectral calculus; we refer to the monograph [7]. Specifically, for some range, say [0, a], an operator-valued function ψ is operator concave if for any pair of non-negative self-adjoint operators G_1, G_2 with spectra in [0, a] it holds true that (ψ(G_1) + ψ(G_2))/2 ≺ ψ((G_1 + G_2)/2). In our subsequent analysis we confine ourselves to power-type index functions. Such functions are operator concave if and only if they are concave. However, we occasionally use and highlight the relevance of operator concavity to indicate that the results extend to the more general context, without dwelling on this.
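These order-theoretic facts are easy to probe numerically in finite dimensions. The sketch below (an illustration only; matrices stand in for the operators, and all names are ours) checks Weyl's monotonicity and the Löwner-Heinz property G_1 ≺ G_2 ⇒ G_1^θ ≺ G_2^θ for θ ∈ [0, 1] on random symmetric positive semi-definite matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def psd(d):
    m = rng.standard_normal((d, d))
    return m @ m.T                      # symmetric positive semi-definite

def frac_power(a, theta):
    # fractional power via spectral calculus (eigendecomposition)
    w, v = np.linalg.eigh(a)
    return (v * np.clip(w, 0, None) ** theta) @ v.T

d = 6
g1 = psd(d)
g2 = g1 + psd(d)                        # g1 < g2 in the Loewner ordering

# Weyl's monotonicity: ordered operators have ordered eigenvalues
assert np.all(np.linalg.eigvalsh(g1) <= np.linalg.eigvalsh(g2) + 1e-9)

# Loewner-Heinz: t -> t^theta is operator monotone for theta in [0, 1]
for theta in (0.25, 0.5, 0.75):
    diff = frac_power(g2, theta) - frac_power(g1, theta)
    assert np.linalg.eigvalsh(diff).min() > -1e-9
```

Note that the analogous check with θ > 1 fails in general, which is precisely why the proofs above restrict to operator concave ψ.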

Proofs of Section 3
Proof of Proposition 3.1. The bound for SPC(g_0, k) will be based on the decomposition in (18), and we shall bound each summand separately.
We start with bounding the posterior spread, and notice that for a (non-negative, finite-rank) operator G : Y → Y the corresponding trace bound always holds. Since the prior covariance C_k has rank at most k, and since (C_k + 1/n)^{−1} C_k is norm-bounded by one, we can bound the posterior spread accordingly; similarly we bound the estimation variance. It remains to bound the estimation bias ||g_0 − Eĝ_k||_Y under smoothness g_0 ∈ (Λ_g)^ψ. To this end we notice that E(ĝ_k) = (C_k + 1/n)^{−1} C_k g_0, from which the bias simplifies. We introduce the residual function of Tikhonov regularization, r_α(t) := α/(t + α), α > 0, t > 0, and it is readily checked that for a sub-linear index function ψ we have r_α(t)ψ(t) ≤ ψ(α). This is then used by spectral calculus for the operator function r_α(C_k), which implies that ||r_α(C_k)ψ(C_k)||_{Y→Y} ≤ ψ(α). Since ||r_α(C_k)||_{Y→Y} ≤ 1, this yields, with α := 1/n and for g_0 ∈ (Λ_g)^ψ, the bound for the first summand in (45), where the last inequality holds if the index function ψ is sub-linear. Otherwise, the maximal decrease of the first summand (as n → ∞) is of the order 1/n, which is known as the saturation of Tikhonov regularization.
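The elementary inequality r_α(t)ψ(t) ≤ ψ(α) for sub-linear ψ, used above, follows from √(αt) ≤ t + α (AM-GM) in the prototypical case ψ(t) = √t, and can be spot-checked numerically; the sketch below (with this illustrative choice of ψ) verifies it on a grid:

```python
import math

def r(alpha, t):
    # residual function of Tikhonov regularization
    return alpha / (t + alpha)

psi = math.sqrt   # an illustrative sub-linear index function

for alpha in (1e-4, 1e-2, 1.0):
    for t in (1e-6, 1e-3, 0.1, 1.0, 10.0, 1e3):
        # r_alpha(t) * psi(t) <= psi(alpha), since sqrt(alpha*t) <= t + alpha
        assert r(alpha, t) * psi(t) <= psi(alpha) + 1e-15
```

The same check fails for super-linear ψ, which is the saturation phenomenon mentioned above.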
The second summand in (45) will be bounded both for the commuting case (native prior, or inherited prior with commuting Λ_f, T*T) and for the non-commuting case (inherited prior with non-commuting Λ_f, T*T). This then results in an overall bound for the SPC, after taking into account the bounds for the posterior spread and the estimation variance as already established.
for some constant c > 0, and this holds uniformly for g_0 ∈ (Λ_g)^ψ. The proof is complete, since 1/n² ≤ k/n.
We turn to the case of inherited priors, and we shall use the operator concavity of the index function ψ. This implies, cf. [7, Thm. X.1.1], a bound on the second summand in (45). We thus bound the approximation error, which expresses the capability of approximating the compound operator T Λ_f^{1/2} by finite-rank operators, yielding, by virtue of (47), the corresponding estimate. To this end, we rely upon the link between T and Λ_f, as captured by Assumption 2.
First, using Weyl's Monotonicity Theorem with Assumption 2, we obtain a bound on the singular numbers. Next, applying Heinz' Inequality with θ = 1/(2a) ≤ 1, and then using spectral calculus and Assumption 2 with f := T*g for arbitrary g ∈ Y, we find that ρ²_k ≲ s_{k+1}(Λ_g), as k → ∞. Inserting this into the bound from (49), we complete the estimate for the bias from (45), and obtain the same bound as in the native case, when restricting to operator concave ψ. This completes the proof.
Remark 9. Within the context of projection schemes for ill-posed equations in Hilbert space, a more elaborate analysis allows for bounding the bias for general spectral regularization schemes, and for certain index functions which can express higher order smoothness. Specifically, such index functions are products of operator concave and Lipschitz ones; we refer to [27,Thm. 2] for details.
The proof of Proposition 5.4 above consisted of three steps. First, we used Assumption 4 in order to derive a bound for the modulus of continuity in terms of a decreasing (in k) smoothness-dependent part and a non-decreasing part. Then, each of the two terms was appropriately bounded using the definition of k_n. We follow a similar strategy in the next proof as well.