Bernstein-von Mises theorems for statistical inverse problems II: Compound Poisson processes

We study nonparametric Bayesian statistical inference for the parameters governing a pure jump process of the form $$Y_t = \sum_{k=1}^{N(t)} Z_k,~~~ t \ge 0,$$ where $N(t)$ is a standard Poisson process of intensity $\lambda$, and the $Z_k$ are drawn i.i.d. from a jump measure $\mu$. A high-dimensional wavelet series prior for the L\'evy measure $\nu = \lambda \mu$ is devised, and the posterior distribution arises from observing discrete samples $Y_\Delta, Y_{2\Delta}, \dots, Y_{n\Delta}$ at fixed observation distance $\Delta$, giving rise to a nonlinear inverse inference problem. We derive contraction rates in the uniform norm for the posterior distribution around the true L\'evy density that are optimal up to logarithmic factors over H\"older classes, as the sample size $n$ increases. We prove a functional Bernstein-von Mises theorem for the distribution functions of both $\mu$ and $\nu$, as well as for the intensity $\lambda$, establishing that the posterior distribution is approximated by an infinite-dimensional Gaussian measure whose covariance structure is shown to attain the Cram\'er-Rao lower bound for this inverse problem. As a consequence, posterior-based inferences, such as nonparametric credible sets, are asymptotically valid and optimal from a frequentist point of view.

While the Bayesian approach to inverse problems is widely used in scientific and statistical practice, very little theory is available that explains why Bayesian algorithms should be trusted to provide objective solutions of inverse problems in the presence of statistical noise, particularly in the infinite-dimensional, non-linear cases that naturally arise in applications; see [10,27]. In the recent contributions [18,21,24] a general proof strategy was developed that can be used to derive theoretical guarantees for posterior-based inference, based on suitably chosen priors, in various settings, including inverse problems arising with diffusion processes, X-ray tomography or elliptic partial differential equations. A main idea of [18,21] is that a careful analysis of the 'Fisher information operator' inducing the statistical observation scheme, combined with tools from Bayesian nonparametrics [6,7], can be used to derive sharp results about the frequentist behaviour of posterior distributions in general inverse problems.
The analysis of the 'information operator' depends highly on the particular problem at hand, and in the present article we continue this line of investigation in a statistical inverse problem very different from the ones considered in [18,21,24], namely the problem of recovering the parameters of a stochastic jump process from discrete observations. Statistically speaking, the inverse problem is a 'missing observations' problem that arises from the fact that we do not observe all the jumps and need to 'decompound' the effect of possibly seeing an accumulation of jumps without knowing how many have occurred. This has been studied from a non-Bayesian perspective for certain classes of Lévy processes by several authors; we mention here the seminal papers [2,3,19,31] (see also [1] for various further references) as well as [22], [9,23,28], which are relevant for the results obtained in the present paper. A typical estimation method used in several of these articles is based on spectral regularisation techniques built around the fact that the Lévy measure identifying all parameters of the jump process can be expressed in the Fourier domain by the Lévy-Khintchine formula (see (3) below).
Given the sophistication of the non-linear estimators proposed so far in the 'decompounding problem' just described, one may wonder if a 'principled' (or 'ignorant') Bayesian approach that just places a standard high-dimensional random series prior on the unknown Lévy measure can at all return valid posterior inferences in such a measurement scheme. In the present article we provide some answers to this question in the prototypical setting where one observes discrete increments of a compound Poisson process at fixed observation distance ∆ > 0. To lift some of the technicalities occurring in the proofs we restrict ourselves to periodic, and hence compactly supported, processes, and (to avoid identifiability problems arising in the periodic case) to small enough ∆. We show that the posterior distribution optimally recovers all parameters of the jump process, both in terms of convergence rates for the Lévy density ν and in terms of efficient inference for the intensity of the Poisson process and the distribution function of the jump measure µ. For the latter we obtain functional Bernstein-von Mises theorems which are the Bayesian analogues of the 'Donsker-type' central limit theorems obtained in [22], [9] for frequentist regularisation estimators. Just as in [21], our proofs are inspired by techniques put forward in [4][5][6][7][8] in 'direct' problems. However, due to the different structure of the jump process model, our proofs need to depart from those in [21] in various ways, perhaps most notably since we have to consider a prior with a larger support ellipsoid, and hence need to prove initial contraction rates for our posterior distribution by quite different methods than is commonly done; see Section 6.
The inversion of the information operator in the jump process setting also poses some surprising subtleties that nicely reveal finer properties of the inference problem at hand: our explicit construction of the inverse information operator in Section 2.4 also gives new, more direct proofs of the semi-parametric lower bounds obtained in [28] (whose lower bounds admittedly hold in a more general setting than ours). Finally, we should mention that substantial work, using tools from empirical process theory, is required in our setting when linearising the likelihood function to obtain quantitative LAN-expansions since, in contrast to [21], our observation scheme is far from Gaussian. In this sense the techniques we develop here are relevant also beyond compound Poisson processes, although, as argued above, the theory for non-linear inverse problems is largely constrained by the specific case one is studying.

Main results
2.1. Basic definitions. Let (N(t) : t ≥ 0) be a standard Poisson process of intensity λ > 0. Let µ be a probability measure on (−1/2, 1/2] such that µ({0}) = 0, and let Z_1, Z_2, … be an i.i.d. sequence of random variables drawn from µ. In what follows we view I = (−1/2, 1/2] as a compact group under addition modulo 1. Then the (periodic) compound Poisson process taking values in (−1/2, 1/2] is defined as

(1)  Y_t = Σ_{k=1}^{N(t)} Z_k mod 1,  t ≥ 0,

where Y_0 = 0 almost surely, by convention. The process (Y_t : t ≥ 0) is a pure jump Lévy process on I = (−1/2, 1/2] with Lévy measure dν = λdµ. We observe this process at fixed observation distance ∆, namely Y_∆, Y_{2∆}, …, Y_{n∆}, and define the increments of the process

(2)  X_k := Y_{k∆} − Y_{(k−1)∆} mod 1,  k = 1, …, n.

The X_k's are i.i.d. random variables drawn from the infinitely divisible distribution P_ν = P_{ν,∆} which has characteristic function (Fourier transform)

(3)  ϕ_ν(k) = E e^{2πikX_1} = exp( ∆ ∫_I (e^{2πikx} − 1) dν(x) ),  k ∈ Z,

by the Lévy-Khintchine formula for Lévy processes in compact groups (Chapter IV.4 in [25]). Obviously (ϕ_ν(k) : k ∈ Z) identifies P_ν, but under the hypotheses we will employ below it will also identify ν and thus the law of the jump process (Y_t : t ≥ 0). The inverse problem is to recover ν from i.i.d. samples drawn from the probability measure P_ν.

We denote by C(I) the space of bounded continuous (and hence periodic) functions on I equipped with the uniform norm ‖·‖_∞, and let M(I) = C(I)* denote the (dual) space of finite signed (Borel) measures on I. For κ_1, κ_2 ∈ M(I) their convolution is defined by

κ_1 * κ_2 (g) = ∫_I ∫_I g(x + y) dκ_1(x) dκ_2(y),  g ∈ C(I).
This coincides with the usual definition of convolution of functions when the measures involved have densities with respect to Lebesgue measure. We shall freely use standard properties of convolution integrals, see, e.g., Section 8.2 in [13].
In particular, P_ν admits the convolution series representation

(4)  P_ν = e^{−λ∆} Σ_{m=0}^∞ (∆^m/m!) ν^{*m},  ν^{*0} := δ_0,  λ = ν(I).

[To see this just check the obvious fact that the Fourier transform of the last representation coincides with ϕ_ν in (3), and use injectivity of the Fourier transform.] We will denote by P^N_ν the infinite product measure describing the law of an infinite sequence of i.i.d. samples (2) arising from a compound Poisson process with Lévy measure ν, and E_ν will denote the corresponding expectation operator. We denote by L^p = L^p(I), 1 ≤ p < ∞, the standard spaces of functions f for which |f|^p is Lebesgue-integrable on I, whereas, in slight abuse of notation, for a finite measure κ we will denote by L^p(κ), 1 ≤ p ≤ ∞, the corresponding spaces of κ-integrable functions on I, predominantly for the choices κ = ν, κ = P_ν. The spaces L²(I), L²(κ) are Hilbert spaces equipped with natural inner products ⟨·,·⟩, ⟨·,·⟩_{L²(κ)}, respectively. The symbol L^∞(I) denotes the usual space of bounded functions on I normed by ‖·‖_∞. We also write ≲, ≈ for (in-)equalities that hold up to fixed multiplicative constants, and employ the usual o_P, O_P-notation to indicate stochastic orders of magnitude of sequences of random variables.
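The Lévy-Khintchine identity (3) and the convolution series representation (4) can be checked against each other numerically. The following minimal sketch discretises I as the cyclic group Z/M (the grid size M, the observation distance ∆ and the particular Lévy density are hypothetical illustration choices, not objects from the paper): the series (4) is summed by repeated circular convolution and its Fourier transform is compared with exp(∆(ν̂(k) − λ)).

```python
import numpy as np

# Hypothetical discretisation of I as Z/M; Delta and nu are illustration choices.
M, Delta = 64, 0.5
rng = np.random.default_rng(0)
nu = rng.uniform(0.1, 1.0, size=M)
nu[0] = 0.0                           # nu({0}) = 0
nu /= M                               # turn grid values into a measure on Z/M
lam = nu.sum()                        # intensity lambda = nu(I)

# Convolution series (4): P_nu = e^{-lam*Delta} sum_m (Delta^m/m!) nu^{*m},
# where * is circular (group) convolution, computed via the FFT.
P = np.zeros(M); P[0] = 1.0           # nu^{*0} = delta_0
term = P.copy()
for m in range(1, 40):
    term = np.fft.ifft(np.fft.fft(term) * np.fft.fft(nu)).real * (Delta / m)
    P = P + term
P *= np.exp(-lam * Delta)

# Levy-Khintchine (3): hat(P_nu)(k) = exp(Delta * (hat(nu)(k) - lam)).
lhs = np.fft.fft(P)
rhs = np.exp(Delta * (np.fft.fft(nu) - lam))
assert np.max(np.abs(lhs - rhs)) < 1e-12
assert abs(P.sum() - 1.0) < 1e-10     # P_nu is a probability measure
```

The agreement is exact up to series truncation, which decays factorially; this is the discrete analogue of the injectivity argument in the bracketed remark above.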

2.2.
Likelihood, prior and posterior. We study here the problem of conducting nonparametric Bayesian inference on the parameters ν, µ, λ, assuming a regularity constraint ν ∈ C^s(I), s > 0, where C^s is the usual Hölder space over I normed by ‖·‖_{C^s} (when s ∈ N these are the ordinary spaces of s-times continuously differentiable functions, e.g., p.350 and p.360 in [16]). To define the likelihood function we need a common dominating measure for the statistical model (P_ν : ν ∈ V), where V is some family of Lévy measures possessing densities with respect to Lebesgue measure Λ on I, the measure with density 1_{(−1/2,1/2]}. Since Λ is idempotent, Λ * Λ = ∫_I Λ(· − y)Λ(y)dy = Λ, we can consider the resulting compound Poisson measure P_Λ = e^{−∆}δ_0 + (1 − e^{−∆})Λ as a fixed reference measure on I. Then for any absolutely continuous ν on I the densities p_ν of P_ν with respect to P_Λ exist. The likelihood function of the observations X_1, …, X_n is defined as

(5)  L_n(ν) = Π_{i=1}^n p_ν(X_i).

We also write ℓ_n(ν) = log L_n(ν) for the log-likelihood function. Next, if Π is a prior distribution on a σ-field S_V of V such that the maps ν ↦ p_ν(x) are Borel-measurable for all x ∈ I, then standard arguments (p.570 in [16]) imply that the resulting posterior distribution given observations X_1, …, X_n is

Π(A | X_1, …, X_n) = ∫_A L_n(ν) dΠ(ν) / ∫_V L_n(ν) dΠ(ν),  A ∈ S_V.
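For a finite stand-in prior the likelihood and posterior above reduce to elementary arithmetic. The sketch below (a hypothetical illustration: the candidate Lévy densities, the grid and the 'observed' increment bins are invented, and the finite candidate set replaces the wavelet prior defined next) computes p_ν via the Lévy-Khintchine formula on a discretised circle and forms the posterior weights.

```python
import numpy as np

# Toy sketch of likelihood (5) and the posterior formula over a finite
# candidate set on Z/M (hypothetical illustration, not the wavelet prior).
M, Delta = 64, 0.5

def increment_law(nu):
    """Law of one increment on Z/M via the Levy-Khintchine formula (3)."""
    lam = nu.sum()
    return np.fft.ifft(np.exp(Delta * (np.fft.fft(nu) - lam))).real

grid = np.arange(M)
cand = []
for c in (0.5, 1.0, 2.0):                 # three candidate Levy densities
    nu = (c + np.sin(2 * np.pi * grid / M) ** 2) / M
    nu[0] = 0.0                           # nu({0}) = 0
    cand.append(nu)

X = np.array([0, 3, 3, 17, 40, 5, 9, 0])  # fixed 'observed' increment bins
log_lik = np.array([np.log(increment_law(nu)[X]).sum() for nu in cand])
post = np.exp(log_lik - log_lik.max())    # uniform prior weights on candidates
post /= post.sum()

assert abs(post.sum() - 1.0) < 1e-12 and np.all(post > 0)
assert all(abs(increment_law(nu).sum() - 1.0) < 1e-10 for nu in cand)
```

Normalising in log-scale, as done here, is the standard way to evaluate the posterior ratio stably; the infinite-dimensional prior of the paper replaces the three candidates by draws from a wavelet series.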
We shall model an s-regular function by a high-dimensional product prior expressed through a wavelet basis: Let

(7)  {ψ_{lk} : l = −1, 0, 1, …; 0 ≤ k < 2^{max(l,0)}}

form a periodised Daubechies-type wavelet basis of L² = L²(I), orthogonal for the usual L²-inner product ⟨·,·⟩ (described in Section 4.3.4 in [16]; the constant 'scaling function' is written as the first element ψ_{−1,0} ≡ 1, in slight abuse of notation). Basic localisation and approximation properties of this basis are, for any g ∈ C^s(I) and j ∈ N, that

(8)  ‖ψ_{lk}‖_∞ ≲ 2^{l/2}  and  ‖g − P_{V_j}(g)‖_∞ ≲ 2^{−js} ‖g‖_{C^s},

where P_{V_j} is the usual L²-projector onto the linear span V_j of the ψ_{lk}'s with l ≤ j − 1. Now consider the random function

(9)  v = Σ_{l=−1}^{J−1} Σ_k a_l u_{lk} ψ_{lk},  a_l := 2^{−l}(l² + 1)^{−1},

where the u_{lk} are i.i.d. uniform U(−B, B) random variables, and B is a fixed constant. The support of this prior is isomorphic to the hyper-ellipsoid of wavelet coefficients

V_{B,J} = { v = Σ_{l≤J−1} Σ_k v_{lk} ψ_{lk} : |v_{lk}| ≤ B a_l }.

To model an s-regular Lévy measure ν we define the random function

(10)  ν = e^v,  v as in (9),

and shall choose J = J_n such that 2^J grows as a function of n approximately as

(11)  2^{J_n} ≈ n^{1/(2s+1)}.

We note that the weights a_l = 2^{−l}(l² + 1)^{−1} ensure that the random function v has some minimal regularity, in particular is contained in a bounded subset of C(I).
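A draw from this product prior can be sketched in a few lines. The code below uses the periodised Haar basis as a stand-in for the Daubechies basis (an assumption made purely for illustration; the grid size and truncation level are also invented) and verifies the uniform-boundedness claim: since at most one Haar wavelet per level is nonzero at each point, ‖v‖_∞ ≤ B Σ_l a_l 2^{l/2}, a bound independent of J.

```python
import numpy as np

# Draw from the uniform wavelet product prior, with the Haar basis standing in
# for the Daubechies basis of (7) (illustrative assumption).
rng = np.random.default_rng(1)
B, J, M = 1.0, 6, 2 ** 9
x = (np.arange(M) + 0.5) / M               # grid identified with I

def haar(l, k, x):
    """L2-normalised periodised Haar wavelet; psi_{-1,0} is the constant 1."""
    if l == -1:
        return np.ones_like(x)
    t = 2 ** l * x - k
    return 2 ** (l / 2) * (((t >= 0) & (t < 0.5)).astype(float)
                           - ((t >= 0.5) & (t < 1.0)).astype(float))

a = lambda l: 2.0 ** (-l) / (l ** 2 + 1)   # the weights a_l of (9)
v = np.zeros(M)
for l in range(-1, J):
    for k in range(max(2 ** l, 1)):
        v += a(l) * rng.uniform(-B, B) * haar(l, k, x)

# At most one Haar wavelet per level is nonzero at any x, so
# ||v||_inf <= B * sum_l a_l 2^{l/2}, a fixed bound independent of J.
bound = B * sum(a(l) * 2 ** (l / 2) for l in range(-1, J))
assert np.max(np.abs(v)) <= bound + 1e-12
assert np.exp(v).min() > 0                 # nu = e^v is bounded away from zero
```

The geometric decay of a_l 2^{l/2} is what keeps every draw inside a fixed ball of C(I), as asserted in the text.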
We shall work under the following general assumption on the prior and on the Lévy measure identifying the law of the compound Poisson process generating the data.

Assumption 1. Assume the true Lévy measure ν_0 has a Lebesgue density, still denoted by ν_0, which is contained in C^s(I) for some s > 5/2, that ν_0 is bounded away from zero on I, and that for v_0 = log ν_0 and some γ > 0,

(12)  |⟨v_0, ψ_{lk}⟩| ≤ (1 − γ) B a_l for all l, k,

where a_l was defined in (9). Assume moreover that B, ∆ are such that λ = ∫_I ν < π/∆ for all ν in the support of the prior.
The assumption s > 5/2 (in place of, say, s > 1/2) may be an artefact of our proof methods (which localise the likelihood function by an initially suboptimal contraction rate) but, in the absence of a general 'Hellinger-distance' testing theory (cf. Section 7.1 in [16]) for the inverse problem considered here, appears unavoidable.
The assumption (12) with γ > 0 guarantees that the true Lévy density is an 'interior' point of the parameter space V B,J for all J -a standard requirement if one wishes to obtain Gaussian asymptotics for posterior distributions.
Finally, the bound on λ ensures identifiability of ν, and thus of the law of the compound Poisson process, from the measure P_ν generating the observations. That such an upper bound is necessary is a consequence of the fact that we are considering the periodic setting; see the discussion after Assumption 24 below. For the present parameter space V_{B,J}, Assumption 1 enforces a fixed upper bound on ∆. We could also renormalise the prior by a large enough constant to allow for larger values of ∆ as long as the intensities λ are small enough, but we avoid this for conciseness of exposition.
2.3. Supremum norm contraction rates. Even though the standard 'Hellinger-distance' testing theory to obtain contraction rates is not directly viable in our setting, following ideas in [4] we can use the Bernstein-von Mises techniques underlying the main theorems of this paper to obtain (near-)optimal contraction rates for the Lévy density ν_0 in supremum norm loss. The idea is basically to represent the norm by a maximum over suitable collections of linear functionals, and to then treat each functional individually by semi-parametric methods. The rate in the following theorem can be shown to be minimax optimal over Lévy densities in C^s(I), up to the power of the log-factor.

Theorem 2. Suppose X_1, …, X_n are generated from (2) and grant Assumption 1. Let Π(·|X_1, …, X_n) be the posterior distribution arising from the prior Π = Π_J in (10) with J as in (11). Then for every κ > 5/2 we have, as n → ∞,

Π(ν : ‖ν − ν_0‖_∞ > n^{−s/(2s+1)} (log n)^κ | X_1, …, X_n) → 0  in P^N_{ν_0}-probability.

The only comparable result of this kind we are aware of in the literature can be found in [17], who obtain contraction rates for the Hellinger distance h(P_ν, P_{ν_0}) between the infinitely divisible distributions P_ν, P_{ν_0} induced by the Lévy measures ν, ν_0. Since we are not aware of any sharp 'stability estimates' that would allow one to derive optimal bounds on the distance ‖ν − ν_0‖_∞, or even just on ‖ν − ν_0‖_{L²}, in terms of h(P_ν, P_{ν_0}), the results in [17] a fortiori do not imply any guarantees for Bayesian inference on the statistically relevant parameters ν, µ, λ.

2.4.
The LAN expansion and semi-parametric Cramér-Rao lower bounds. In order to formulate, and prove, Bernstein-von Mises type theorems, and to derive a notion of semi-parametric optimality of the limit distributions that will occur, we now obtain, for L_n the likelihood function defined in (5), the LAN-expansion of the log-likelihood ratio process of the observation scheme considered here, in perturbation directions ν_{h,n} that are additive on the log-scale. This will induce the score operator for the model and allow us to derive the inverse Fisher information (Cramér-Rao lower bound) for a large class of semi-parametric subproblems. Some ideas of what follows are implicit in the work of Trabs (2015), although we need a finer analysis for our results, including inversion of the score operator itself.
Proposition 3 (LAN expansion). Let ν = e^v be a Lévy density that is bounded and bounded away from zero, and for h ∈ L^∞(I) consider a perturbation ν_{h,n} = e^{v + h/√n}. Then, as n → ∞ under P^N_ν,

ℓ_n(ν_{h,n}) − ℓ_n(ν) = (1/√n) Σ_{i=1}^n A_ν(h)(X_i) − (1/2) ‖A_ν(h)‖²_{L²(P_ν)} + o_P(1),

where the score operator is given by the Radon-Nikodym density

(14)  A_ν(h) = ∆ ( d((hν) * P_ν)/dP_ν − ∫_I h dν ).

The operator A_ν defines a continuous linear map from L²(ν) into L²_0(P_ν) := {g ∈ L²(P_ν) : ∫_I g dP_ν = 0}.
The proposition will be proved in Section 5. In what follows we study properties of A_ν and of its adjoint A*_ν; in particular we need to construct certain inverse mappings. Due to the presence of the Dirac measure in (14), some care has to be exercised when identifying the natural domain of the inverse ('Fisher') information operator A*_ν A_ν. In particular we can invert A*_ν A_ν only along directions ψ for which ψ(0) = 0. An intuitive explanation is that the axiomatic property ν({0}) = 0 is required for ν to identify the law of the compound Poisson process (otherwise 'no jumps' and 'jumps of size zero' are indistinguishable), and as a consequence, when making inference on the functional ∫_I ψ dν, one should a priori restrict to ∫_I ψ1_{{0}^c} dν, a fact that features in the Cramér-Rao information lower bound (25) to be established below.
To proceed we will set ∆ = 1, without loss of generality, for the moment. If κ ∈ M(I) is a finite signed measure on I and g : I → R a function such that ∫_I |g| d|κ| < ∞, we use the notation gκ for the element of M(I) given by (gκ)(A) = ∫_A g dκ, A a Borel subset of I. Then, for a fixed Lévy density ν ∈ L^∞(I), consider the operator

A_ν(κ) = d(κ * P_ν)/dP_ν − κ(I)  (recall ∆ = 1),

defined on the subset of M(I) given by

D = { κ ∈ M(I) : κ * P_ν ≪ P_ν with d(κ * P_ν)/dP_ν ∈ L²(P_ν) }.

This operator serves as an extension of A_ν from (14) to the larger domain D. It still takes values in L²_0(P_ν); in fact δ_0 is in the kernel of A_ν since

A_ν(δ_0) = d(δ_0 * P_ν)/dP_ν − δ_0(I) = 1 − 1 = 0,

but extending A_ν formally to D is still convenient since the inverse of A_ν to be constructed next will take values in D. Define a finite signed measure

π_ν = e^{λ} Σ_{m=0}^∞ ((−1)^m/m!) ν^{*m},

for which P_ν * π_ν = δ_0 (by checking Fourier transforms). Formally, up to a constant, π_ν equals the inverse Fourier transform F^{−1}(1/ϕ_ν) of 1/ϕ_ν, and convolution with π_ν can be thought of as a 'deconvolution operation'.
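The identity P_ν * π_ν = δ_0 is easy to confirm numerically, since the Fourier coefficients of π_ν are just 1/ϕ_ν. The following sketch (on a hypothetical grid discretisation of I, with ∆ = 1 and an invented Lévy density) also shows that π_ν is genuinely a signed measure:

```python
import numpy as np

# Hypothetical discretisation of I as Z/M, Delta = 1.
M = 64
rng = np.random.default_rng(2)
nu = rng.uniform(0.1, 1.0, M) / M
nu[0] = 0.0
lam = nu.sum()

P_hat = np.exp(np.fft.fft(nu) - lam)     # phi_nu by Levy-Khintchine (Delta = 1)
pi_hat = 1.0 / P_hat                     # Fourier coefficients of pi_nu
conv = np.fft.ifft(P_hat * pi_hat).real  # the measure P_nu * pi_nu

delta0 = np.zeros(M); delta0[0] = 1.0
assert np.max(np.abs(conv - delta0)) < 1e-12   # P_nu * pi_nu = delta_0
pi = np.fft.ifft(pi_hat).real                  # pi_nu itself
assert pi.min() < 0 < pi.sum()   # genuinely signed; pi_nu(I) = 1/phi_nu(0) = 1
```

Deconvolution by π_ν therefore undoes, exactly, the smoothing by P_ν; the price is that π_ν has total variation growing with λ, reflecting the ill-posedness of the inverse problem.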
We then have where the second term vanishes since for such g, by Fubini's theorem, That A ν takes values in D is immediate from the definition of π ν and (4).
We now calculate the adjoint operator of A ν .
Lemma 5. Assume the Lévy density ν ∈ L^∞(I) is bounded away from zero on I. If we regard A_ν from (14) as an operator mapping the Hilbert space L²(ν) into L²_0(P_ν), then its adjoint A*_ν : L²_0(P_ν) → L²(ν) is given by A*_ν(w) = ∆ (P_ν(−·) * w). Indeed, for w ∈ C(I) and h ∈ L²(ν), by Fubini's theorem,

⟨A_ν(h), w⟩_{L²(P_ν)} = ∆ ∫_I ∫_I w(x + y) h(x) dν(x) dP_ν(y) = ∆ ⟨h, P_ν(−·) * w⟩_{L²(ν)},

so that the formula for the adjoint holds on the dense subspace C(I) of L²_0(P_ν). The Cauchy-Schwarz inequality implies that P_ν(−·) * w ∈ L²(ν) so that the case of general w ∈ L²_0(P_ν) follows from standard approximation arguments.
Inspecting the formula for A*_ν we can formally define the 'inverse' map (A*_ν)^{−1}(g) = π_ν(−·) * g, g ∈ L²(ν), scaled by 1/∆ if ∆ ≠ 1. If g(0) = 0 then for positive numerical constants c, c′,

(20)  c ‖g‖_{L²(I)} ≤ ‖π_ν(−·) * g‖_{L²(P_ν)} ≤ c′ ‖g‖_{L²(I)}.

This estimate is first proved for a smooth approximation of g, using Plancherel's identity for Schwartz distributions on the unit circle, e.g., as in p.295f. in [13], and then extended by approximation to general elements of L²(ν). Now let ψ ∈ L²(ν) be arbitrary but such that ψ(0) = 0; for instance we can take ψ1_{{0}^c} for any ψ ∈ C(I). If ν ∈ L^∞(I) is bounded away from zero then by what precedes (A*_ν)^{−1}(ψ/ν) ∈ L²_0(P_ν), and hence in view of Lemma 4 we can define, for any such ψ, the new function

(21)  ψ_d := ( (A*_ν)^{−1}(ψ/ν) P_ν ) * π_ν

as an element of D. Concretely, in view of (4), (17), ψ_d can be computed explicitly as a convolution series (22) (when ∆ = 1; otherwise divide the right hand side of (22) by ∆²). We can then write ψ_d = ψ̃ + cδ_0, where ψ̃ is the part of ψ_d that is absolutely continuous with respect to Lebesgue measure Λ, and cδ_0 is the discrete part (for some constant c).
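That g ↦ π_ν(−·) * g really inverts the adjoint A*_ν(w) = P_ν(−·) * w can be checked directly in the Fourier domain, since composing the two convolutions corresponds to multiplying by ϕ_ν and 1/ϕ_ν. A minimal sketch on a hypothetical grid (∆ = 1, invented Lévy density and test function):

```python
import numpy as np

# Hypothetical grid check that g -> pi_nu(-.)*g inverts A*_nu(w) = P_nu(-.)*w.
M = 64
rng = np.random.default_rng(3)
nu = rng.uniform(0.1, 1.0, M) / M
nu[0] = 0.0
P_hat = np.exp(np.fft.fft(nu) - nu.sum())   # phi_nu (Delta = 1)

def conv_reflected(kappa_hat, g):
    """(kappa(-.) * g) on the grid; conj gives the reflected real measure."""
    return np.fft.ifft(np.conj(kappa_hat) * np.fft.fft(g)).real

g = np.sin(2 * np.pi * np.arange(M) / M)    # a test function with g(0) = 0
w = conv_reflected(1.0 / P_hat, g)          # (A*_nu)^{-1}(g) = pi_nu(-.)*g
back = conv_reflected(P_hat, w)             # A*_nu(w) = P_nu(-.)*w
assert np.max(np.abs(back - g)) < 1e-10     # returns g up to machine precision
```

In Fourier terms the composition multiplies ĝ by conj(ϕ_ν · 1/ϕ_ν) = 1, which is the discrete shadow of the identity P_ν * π_ν = δ_0 used in the text.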
The content of the next lemma is that ψ̃ allows one to represent the LAN inner product in the standard L²-inner product ⟨·,·⟩ of L²(I).
Using the LAN expansion and the previous lemma we derive the Cramér-Rao lower bound for 1/√n-consistently estimable functional parameters of the Lévy measure of a compound Poisson process, following the theory laid out in Chapter 25 in [29]. We recall some standard facts from efficient estimation in Banach spaces: assume for all h in some linear subspace H of a Hilbert space with Hilbert norm ‖·‖_LAN that the LAN expansion

log ( dP^n_{v+h/√n} / dP^n_v ) = ∆_n(h) − (1/2) ‖h‖²_LAN + o_P(1),  ∆_n(h) →_d ∆(h) ~ N(0, ‖h‖²_LAN),

holds, where the P^n_v are laws on some measurable space X_n, and consider estimating a functional K(ν) that is suitably differentiable with continuous linear derivative map κ : H → R. By Theorem 3.11.5 in [30] the Cramér-Rao information lower bound for estimating the parameter K(ν) is given by ‖κ*‖²_LAN where κ* is the Riesz representer of the map κ : (H, ‖·‖_LAN) → R. We now apply this in the setting of the LAN expansion obtained from Proposition 3, with laws P^n_v parametrised by v = log ν, tangent space H = L^∞ and LAN-norm ‖h‖_LAN = ‖A_{ν_0}(h)‖_{L²(P_{ν_0})}, where A_{ν_0} : (H, ‖·‖_{L²(ν_0)}) → L²_0(P_{ν_0}) is the score operator studied above corresponding to the true absolutely continuous Lévy density ν_0 generating the data (note that the central limit theorem ensures ∆_n(h) →_d ∆(h) for these choices). For ψ ∈ L^∞(I) we consider the map

K(ν) = ∫_I ψ dν,

which can be linearised at ν_0 with derivative

κ(h) = ∫_I ψ h dν_0,  h ∈ H.

We conclude that the Cramér-Rao information lower bound for estimating ∫_I ψ dν from discretely observed increments of the compound Poisson process equals

(25)  ‖κ*‖²_LAN = ‖A_{ν_0}(κ*)‖²_{L²(P_{ν_0})} = ‖(A*_{ν_0})^{−1}(ψ1_{{0}^c})‖²_{L²(P_{ν_0})},

where we used Lemma 4 in the second equality. Note that the last identity holds under the notational assumption ∆ = 1 employed in the preceding arguments, and the far right hand side needs to be scaled by 1/∆² when ∆ ≠ 1.

2.5.
A multi-scale Bernstein-von Mises theorem. We now formulate a Bernstein-von Mises theorem that entails a Gaussian approximation of the posterior distribution arising from prior (10) in an infinite-dimensional multi-scale space. We will show in the next subsection how one can deduce from it various Bernstein-von Mises theorems for statistically relevant aspects of ν, µ, λ. Following [7] (see also p.596f. in [16]) the idea is to study the asymptotics of the measure induced in sequence space by the action (⟨ν, ψ_{lk}⟩) of draws ν ∼ Π(·|X_1, …, X_n) of the posterior distribution on the wavelet basis {ψ_{lk}} from (7). In sequence space we introduce the weighted supremum norms

(26)  ‖x‖_{M(w)} = sup_l (1/w_l) max_k |x_{lk}|,

with monotone increasing weighting sequence (w_l) to be chosen. Define further the closed separable subspace M_0(w) of M(w) consisting of sequences for which lim_{l→∞} w_l^{−1} max_k |x_{lk}| = 0, equipped with the same norm.
The Bernstein-von Mises theorem will be derived for the case where the posterior distribution is centred at the random element ν̂(J) = (ν̂(J)_{l,k}) of M_0(w) defined as follows:

(27)  ν̂(J)_{l,k} ≡ ⟨ν_0, ψ_{lk}⟩ + (1/n) Σ_{i=1}^n (A*_{ν_0})^{−1}(ψ_{lk}1_{{0}^c})(X_i),

with the convention that ν̂(J)_{l,k} = 0 whenever l ≥ J (the operator (A*_{ν_0})^{−1} was defined just after Lemma 5 above). A standard application of the central limit theorem and of (20) implies, as n → ∞ and under P^N_{ν_0}, that for every fixed k, l,

√n ( ν̂(J)_{l,k} − ⟨ν_0, ψ_{lk}⟩ ) →_d N( 0, ‖(A*_{ν_0})^{−1}(ψ_{lk}1_{{0}^c})‖²_{L²(P_{ν_0})} ),

and hence in view of (25) the random variable ν̂(J) is a natural centring for a Bernstein-von Mises theorem. Since ν ∈ L^∞(I) the law of √n(ν − ν̂(J)) defines a probability measure in the space M_0(ω) for ω as in the next theorem. Next, denote by N_{ν_0} the law L(X) of the centred Gaussian random variable X on M(w) whose coordinate process has covariances

E[X_{lk} X_{l′k′}] = ⟨ (A*_{ν_0})^{−1}(ψ_{lk}1_{{0}^c}), (A*_{ν_0})^{−1}(ψ_{l′k′}1_{{0}^c}) ⟩_{L²(P_{ν_0})}.

The proof of the following theorem implies in particular that N_{ν_0} is a tight Gaussian probability measure concentrated on the space M_0(w), where weak convergence occurs. Recall (Theorem 11.3.3 in [11]) that weak convergence of a sequence of probability measures on a separable metric space (S, d) can be metrised by the bounded Lipschitz (BL) metric

β_S(µ, µ′) = sup_{F : ‖F‖_∞ + ‖F‖_Lip ≤ 1} | ∫_S F dµ − ∫_S F dµ′ |.

Theorem 7. Suppose X_1, …, X_n are generated from (2) and grant Assumption 1. Let Π(·|X_1, …, X_n) be the posterior distribution arising from prior Π = Π_J in (10) with J as in (11). Let β_{M_0(ω)} be the BL metric for weak convergence of laws in M_0(ω), with ω = (ω_l) satisfying ω_l/l³ ↑ ∞ as l → ∞. Let ν̂(J) be the random variable in M_0(ω) given by (27). Then for ν ∼ Π(·|X_1, …, X_n) and N_{ν_0} as above we have, in P^N_{ν_0}-probability as n → ∞,

β_{M_0(ω)}( L(√n(ν − ν̂(J)) | X_1, …, X_n), N_{ν_0} ) → 0.

Theorem 7 has various implications for posterior-based inference on the parameter ν. Arguing as in [7], Section 4.2, we could construct Bayesian multi-scale credible bands for the unknown Lévy density ν, with L^∞-diameter shrinking at the rate from Theorem 2.
We will leave this application to the reader and instead focus on inference on functionals of the Lévy measure ν that are continuous, or differentiable, for ‖·‖_{M(ω)} (see Section 4.1 in [7], [5]).
2.6. Bernstein-von Mises theorem for functionals of the Lévy measure. We now deduce from Theorem 7 Bernstein-von Mises theorems for the functionals

V(t) = ∫_I 1_{(−1/2,t]} dν = ν((−1/2, t]),  t ∈ I,

which for t = 1/2 also include the intensity λ = ∫_I dν = V(1/2) of the underlying Poisson process. From the usual 'Delta method' we can then also deduce a Bernstein-von Mises theorem for the distribution function M(t) = ∫_I 1_{(−1/2,t]} dµ of the jump measure µ = ν/λ = ν/∫_I ν. The key to this is the following lemma, proved in (the proof of) Theorem 4 of [7].
Then, as n → ∞ and in P^N_{ν_0}-probability, the three weak limits of the theorem hold. The first two limits are immediate consequences of Theorem 7, Lemma 8 and the continuous mapping theorem. For the last limit we apply the Delta method for weak convergence ([29], Theorem 20.8) to the map V ↦ V/V(1/2), which is Fréchet differentiable from L^∞(I) to L^∞(I) at any V ∈ L^∞(I) that is bounded away from zero, with derivative l_ν.
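The Delta-method step can be illustrated numerically. The sketch below discretises the map V ↦ V/V(1/2) (which passes from the distribution function of ν to that of µ) and checks that the candidate derivative l_V(h) = h/V(1/2) − V h(1/2)/V(1/2)², written out here from the quotient rule as our own assumption, leaves only a second-order remainder; the particular V and h are invented test functions.

```python
import numpy as np

# Finite-difference sketch of the Delta-method step for V -> V/V(1/2);
# the explicit derivative formula below is an illustrative assumption.
M = 128
x = np.linspace(-0.5, 0.5, M)
V = np.cumsum(np.abs(np.sin(3 * x)) + 0.2) / M   # stand-in for V(t); V(1/2) = V[-1]
h = np.cos(5 * x) / M                            # a perturbation direction

f = lambda W: W / W[-1]                          # the map V -> V/V(1/2)
D = h / V[-1] - V * h[-1] / V[-1] ** 2           # candidate Frechet derivative at V

for eps in (1e-2, 1e-3, 1e-4):
    remainder = np.max(np.abs(f(V + eps * h) - f(V) - eps * D))
    assert remainder < eps ** 2                  # second-order smallness
```

The quadratic decay of the remainder is exactly the Fréchet differentiability that licenses transferring the Gaussian limit for V to one for M = V/V(1/2).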
Arguing just as before (25), one shows that the above Gaussian limit distributions all attain the semi-parametric Cramér-Rao lower bounds for the problems of estimating V, M, λ = V(1/2), respectively. In particular they imply that 'Bayesian credible sets' are optimal asymptotic frequentist confidence sets for these parameters; the arguments are the same as in [7], Section 4.1, and hence omitted. These results are the 'Bayesian' versions of the Donsker-type limit theorems obtained for frequentist estimators in [9,22], where the same limit distributions were obtained.

3.1.
Asymptotics for the localised posterior distribution. The first step will be to localise the posterior distribution near the 'true' ν_0 ∈ C^s by obtaining a preliminary (in itself sub-optimal) contraction rate for the prior Π from (10). Recall the notation v = log ν and define, with M a constant and for any δ > 1/2, the localisation sets D_n = D_{n,M} of (28). We have the following proposition.

Proposition 10. For D_n as in (28), the prior Π arising from (10) with J chosen as in (11), and under Assumption 1, we have for any s > 5/2, δ > 1/2 and every M large enough,

(29)  Π(D^c_{n,M} | X_1, …, X_n) → 0 in P^N_{ν_0}-probability as n → ∞.

In particular we can choose M in (28) large enough so that the last convergence to zero occurs also for D_{n,M/2} replacing D_{n,M}. Moreover, on the set D_n we also have the same contraction rates for ν − ν_0 in place of v − v_0, with a possibly larger constant M.
Proof. This is proved in Section 6 below.
As a consequence of the previous proposition, if Π_{D_n} := Π_{D_n}(·|X_1, …, X_n) denotes the posterior measure arising from the prior Π(· ∩ D_n)/Π(D_n) instead of from Π, we can deduce the basic inequality

(30)  sup_{B ∈ S_V} | Π_{D_n}(B | X_1, …, X_n) − Π(B | X_1, …, X_n) | → 0  in P^N_{ν_0}-probability

as n → ∞. We now study certain Laplace-transform functionals of the localised posterior measure Π_{D_n}. We use the shorthand notation g_J = P_{V_J}(g) for the wavelet projection of g ∈ L²(I) onto V_J and, for a fixed function η : I → R, consider a perturbation of ν given by

(31)  ν_t = e^{v_t},  v_t = (1 − δ_n) v + δ_n ( (t/(δ_n √n)) η + v_{0,J} ),

where 0 < t < ∞ and δ_n → 0 such that δ_n √n → ∞ is a sequence to be chosen. That the perturbation ν_t equals a convex combination of points will be useful to deal with the fact that our parameter space has a boundary (see also [20,21]). We have the following key proposition, giving general conditions under which a (sub-)Gaussian approximation for the Laplace transform of general functionals F(ν) of the posterior distribution holds. Its proof is given in Section 4.

Proposition 11. Under the hypotheses of Proposition 10, suppose δ_n is chosen such that (52) below is satisfied. Then, with r_n = O_{P^N_{ν_0}}(a_n) as n → ∞ for a deterministic null sequence a_n → 0 that is uniform in η ∈ H_n, with

Z_n = ∫_{D_n} e^{S_n(ν) + ℓ_n(ν_t)} dΠ(ν) / ∫_{D_n} e^{ℓ_n(ν)} dΠ(ν),

with ν_t as in (31), and with A_ν : L²(ν) → L²_0(P_ν) as defined in Proposition 3, the stated (sub-)Gaussian approximation for the posterior Laplace transform holds. Given a functional F of interest, Proposition 11 can be used to prove Bernstein-von Mises theorems by selecting appropriate η so that S_n(ν) vanishes (or converges to zero). When this is the case it remains to deal with Z_n by a change of measure argument for ν ↦ ν_t.

3.2.
Change of measure in the posterior. We now study the ratio Z_n for η, δ_n satisfying certain conditions, and under the assumption that sup_{ν∈D_n} |S_n(ν)| is either O(1) or o(1). Note that by Assumption 1, v_0 = log ν_0 is an 'interior' point of the support of the prior Π. We shall require that (t/(δ_n √n))η + v_{0,J} is also contained in V_{B,J}, as implied by condition (32). Note that under (32) the function v_t from (31) is a convex combination of the elements v and (t/(δ_n √n))η + v_{0,J} of V_{J,B}, and hence itself contained in the support V_{J,B} of Π. We can thus rewrite the numerator of Z_n in terms of the law Π_t of ν_t, which is absolutely continuous with respect to Π. The measure Π_t corresponds to transforming each coordinate v_{lk} of the product measure defining the prior Π into the convex combination (1 − δ_n)v_{lk} + δ_n( (t/(δ_n √n)) ⟨η, ψ_{lk}⟩ + ⟨v_0, ψ_{lk}⟩ ). The density of the law of v_{t,lk} with respect to that of v_{lk} is supported in a subinterval of I_{l,B} of length 2B(1 − δ_n) and there equals the constant (1 − δ_n)^{−1}. The density of the product measures is then also constant in v, and equal to (1 − δ_n)^{−dim(V_J)} independently of ν. We conclude that if (32), (33) hold then Z_n can be controlled accordingly, where the last identity follows from renormalising both numerator and denominator by ∫_V e^{ℓ_n(ν)} dΠ(ν). The numerator in the last expression is always less than or equal to one and by Proposition 10 the denominator converges to one in probability, so that we have

Lemma 12. Suppose sup_{ν∈D_n} |S_n(ν)| = O(1) holds as n → ∞ and assume η, δ_n are such that (32), (33) hold. Then the random variable Z_n in Proposition 11 is O_{P^N_{ν_0}}(1), uniformly in η, as n → ∞.
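The coordinate-wise change of measure just described is elementary arithmetic, and can be sketched as follows (B, δ and the shift c below are illustrative values of our own choosing): if u ~ U(−B, B) and u_t = (1 − δ)u + c with |c| ≤ Bδ, then the law of u_t is uniform on an interval of length 2B(1 − δ) inside (−B, B), so its density with respect to the law of u is the constant (1 − δ)^{−1} there, and the product over d independent coordinates gives (1 − δ)^{−d}.

```python
# Arithmetic sketch of the coordinate-wise change of measure (illustrative values).
B, delta, c = 1.0, 0.1, 0.05
assert abs(c) <= B * delta                      # keeps the image inside (-B, B)

lo, hi = c - B * (1 - delta), c + B * (1 - delta)
assert -B <= lo and hi <= B                     # support of u_t stays in (-B, B)

dens_u, dens_ut = 1 / (2 * B), 1 / (2 * B * (1 - delta))
ratio = dens_ut / dens_u                        # density of law(u_t) w.r.t. law(u)
assert abs(ratio - 1 / (1 - delta)) < 1e-12

# For d independent coordinates the prior density ratio is (1 - delta)^{-d},
# the constant factor appearing in the change of measure for Pi_t.
d = 2 ** 6
assert abs(ratio ** d - (1 - delta) ** (-d)) < 1e-9
```

Since δ_n → 0 while the dimension grows like 2^J, controlling this constant against the posterior mass of D_n is exactly what Lemma 12 accomplishes.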
To prove the exact asymptotics in the Bernstein-von Mises theorem we need:

Lemma 13. Suppose η, δ_n are such that (32), (33) hold and assume in addition that ‖η‖_∞ ≤ d for some fixed constant d.
A) Let D_{n,M} be as in (28) and define the set D_{n,M,t} = {ν_t : ν ∈ D_{n,M}}. Then for all n ≥ n_0(t) and M large enough we have D_{n,M/2} ⊆ D_{n,M,t}, and thus by Proposition 10 also the corresponding posterior bound. Indeed, given ν ∈ D_{n,M/2}, define ω = ω(ν) by inverting the map ν ↦ ν_t; then by definition ω_t(ν) = ν follows. It remains to verify that also ω(ν) ∈ D_{n,M} for every ν ∈ D_{n,M/2}. To see this we let n be large enough such that in particular δ_n < 1/4, and then use (8) to check the required coefficient bounds for n large enough, hence ω ∈ V_{J,B}. The last claim in Part A) now follows directly from Proposition 10, and Part B) also follows, from (34).

Lemma 14.
For any ψ = c_{ℓJ} ψ_{ℓm} 1_{I∖{0}} with fixed ℓ < J and m, let ψ_d be the corresponding finite measure defined in (21), let ψ̃ be its absolutely continuous part from (23), and let ψ̃_J = P_{V_J}(ψ̃) be its wavelet projection onto V_J. Then the corresponding bound holds for some constant c_0 independent of ℓ, m, J.

Proof. We notice that Lemma 6 implies the required representation, so that by linearity of the operator A_{ν_0} and Lemma 5 it suffices to bound the relevant term, where we have used Parseval's identity and the shorthand h(ν, ν_0). Now ψ̃ is the absolutely continuous part of ψ_d, which according to (22) is given by a convolution series. By standard properties of convolutions, using (4), and since ψ/ν_0 is absolutely continuous, removing the discrete part of ψ_d means removing the Dirac measure from the series expansion of P_{ν_0}; denote the resulting absolutely continuous measure by P̃_{ν_0}. First we consider the part ψ̄ of ψ̃ corresponding to the terms in the last series where either ι > 0 or κ > 0, so that not all of the convolution factors are trivial. We can use the basic convolution inequality ‖f * g‖_{C^α(I)} ≲ ‖f‖_{H^α(I)} ‖g‖_{L²}, α = 0, 2 (proved, e.g., just as Lemma 4.3.18 in [16]), the fact that ψ/ν_0 = c_{ℓJ} ψ_{ℓm}/ν_0 is bounded in L² = H^0, and the multiplier property ‖fg‖_{H²} ≲ ‖f‖_{C²} ‖g‖_{H²} combined with the fact that the density of P_{ν_0} is contained in C^s(I) ⊆ C²(I), to deduce that ψ̄ is contained in C²(I) and thus, by (8), is of the desired order. Setting ι = κ = 0 in the preceding representation of ψ̃ and using the convolution series representation of P̃_{ν_0} (without discrete part) yields the 'critical' term, which is given by −ψg where g is as displayed, for a suitable constant c > 0.
By arguments similar to the above, the function $g$ is at least in $C^2$, and for $x_{lk}$ the mid-point of the support set $S_{lk}$ of $\psi_{lk}$ (an interval of width $O(2^{-l})$ at most) we can write $\dots$ The last term vanishes by orthogonality ($\ell \le J < l$), and using the mean value theorem the absolute value of the first is bounded by $\dots$ Then, using (8) and the standard convolution inequalities for $L^2$-norms, $\dots$ The result follows by scaling the last estimate by a constant multiple of $c_{\ell J} = 2^{\ell/2 - J/2}(\log n)^{-1/2-\delta}$.
Conclude from Proposition 10 and our choice of $J$ that $\dots$ Simple calculations (using that (22) implies that $\tilde\psi_J$, $2^{-J/2}\tilde\psi_J$ are uniformly bounded in $L^2$, $L^\infty$, respectively, proved by arguments similar to those used in Lemma 14) show that for $s > 5/2$ the conditions (52), (53), (54), (32), (33) are all satisfied for such $\eta$, $\delta_n$ and $K$ large enough. We thus deduce from Proposition 11 and Lemma 12 that for some sequence $\dots$ If we define $\hat\nu_{\ell m} = \frac{1}{n}\sum_{k=1}^n A_{\nu_0}(\tilde\psi_J)(X_k) + c_{\ell J}\langle v_0, \psi_{\ell m}\rangle$, then this becomes the sub-Gaussian estimate for the stochastic process $Z_{\ell,m} = (c_{\ell J}\langle v, \psi_{\ell m}\rangle - \hat\nu_{\ell m}) \mid X_1, \dots, X_n$ conditional on $X_1, \dots, X_n$. We can then decompose $\dots$, and the maximum over $2^J$ many variables in (36) can now be estimated by the sum of the maxima of each of the preceding processes. The first process consists, conditional on $X_1, \dots, X_n$, of sub-Gaussian variables with uniformly bounded sub-Gaussian constants, using Lemma 18, that $\nu_0 \in L^\infty$ is bounded away from zero, that $P_{V_J}$ is an $L^2$-projector, combined with standard convolution inequalities. Hence, using Lemma 2.3.4 in [16], this maximum has expectation of order at most $O(\sqrt J)$ with $P^N_{\nu_0}$-probability as close to one as desired. To the maximum of the second (empirical) process we apply Lemma 3.5.12 in [16] (and again Lemma 18 combined with the inequality in the previous display, and also that $\|g\|_\infty \lesssim 2^{J/2}\|g\|_{L^2}$ for any $g \in V_J$) to see that its $P^N_{\nu_0}$-expectation is of order $O(\dots)$ uniformly in $\ell \le J$, $m$. Feeding these bounds into (36) we see that, on an event of $P^N_{\nu_0}$-probability as close to one as desired, $\dots$ Since $\delta > 1/2$ was arbitrary, an application of Markov's inequality completes the proof.
Lemma 15. For any monotone increasing sequence $w = (w_l)$ with $w_l/l^3 \lesssim 1$, if $Z$ equals either $X$ or the process $\sqrt n(\nu - \hat\nu(J)) \mid X_1, \dots, X_n$, then for some fixed constant $C > 0$ we have $\dots$, where in the case $Z = \sqrt n(\nu - \hat\nu(J)) \mid X_1, \dots, X_n$ the operator $E$ denotes the conditional expectation $E^{D_n}[\cdot \mid X_1, \dots, X_n]$ and the inequality holds with $P^N_{\nu_0}$-probability as close to one as desired.

Proof. We first consider the more difficult case where $Z$ is the centred and scaled posterior process. We decompose, with $\nu_J = P_{V_J}(\nu)$, $\dots$ The second term on the right-hand side has multi-scale norm $\|\nu_0 - \nu_{0,J}\|_{\mathcal M(w)}$ bounded, by (8) and $\|\psi_{lk}\|_{L^1} \lesssim 2^{-l/2}$, by $\dots$ Similarly the expectation of the multi-scale norm of the third term is bounded by $\dots$, using (39). We turn to bounding the multi-scale norm of the first term, corresponding to $\dots$ The first term in the decomposition and the quadratic remainder are of order $o(1/\sqrt n)$ uniformly in $k, l$ by definition of $D_n$ and since $s > 5/2$.
Lemma 16. Let $\psi = \nu_0\psi_{lk}1_{I\setminus\{0\}}$ for some $l < J$, $k$, with corresponding $\tilde\psi = (\tilde\psi)_{lk}$ from (21), (23) and wavelet approximation $\tilde\psi_J \in V_J$. We have $\dots$

Proof. The proof requires only notational adaptation of the proof of Lemma 14, except for the last display, where now we use Lemma 18 (and its variant for $\dots$). The upper bound in the last display has $E^{D_n}[\cdot \mid X_1, \dots, X_n]$-expectation of order $o(1/\sqrt n)$ in view of (39). We now apply Proposition 11 to the functional $\dots$, with choices $\delta_n = K 2^J(J^2+1)/\sqrt n$, where $K > 0$ is a large enough constant, and $\eta = \tilde\psi_J$. Simple calculations (using that $\tilde\psi_J$, $2^{-J/2}\tilde\psi_J$ are uniformly bounded in $L^2$, $L^\infty$, respectively) show that for $s > 5/2$ the conditions (52), (53), (54), (32), (33) are all satisfied. Conclude from Proposition 11 and Lemma 12 that $\dots$, or equivalently, if $V_{lk} = \frac{1}{n}\sum_{i=1}^n A_{\nu_0}(\tilde\psi_J)(X_i)$, then for some $C'_n = O_P(1)$, $\dots$ Arguing just as in (38), the sub-Gaussian constants $\|\tilde\psi_J\|^2_{LAN}$ are bounded by a fixed constant, and standard arguments for suprema of sequences of normalised sub-Gaussian random variables (e.g., as on p. 599 in [16], or p. 1962 in [7]) then give $\dots$ as soon as $w_l \gtrsim \sqrt l$. Moreover one proves $E_{\nu_0}\sup_{l<J} w_l^{-1}\max_k |V_{lk}| \lesssim 1/\sqrt n$ and also $E_{\nu_0}\sup_{l<J} w_l^{-1}\max_k |W_{lk}| \lesssim 1/\sqrt n$, just as in the proof of Theorem 1 in [7] (or Theorem 5.2.16 in [16]), using Bernstein's inequality combined with the previous bound on the sub-Gaussian constants and a uniform bound of order $2^{J/2}$ (proved just as after (38)) on the envelopes. The inequality (40) implies in particular that for any weighting sequence $w$ as in Theorem 7, the processes $Z$ concentrate in the separable subspace $\mathcal M_0(w)$ of $\mathcal M(w)$, and their laws define tight (in the case of $\mathcal N_{\nu_0}$, Gaussian) Borel probability measures in it (by Ulam's theorem, see p. 225 in [11]).
Then, using the estimate (40) and arguing as in the proof of Proposition 6 in [7] (or in Theorem 7.3.20 in [16]), Theorem 7 will follow if we can establish convergence of the finite-dimensional distributions of $\tilde\Pi_n \circ P_{V_L}^{-1}$ towards those of $\mathcal N_{\nu_0} \circ P_{V_L}^{-1}$, $L \in \mathbb N$ fixed, as $n \to \infty$, where $P_{V_L}$ is the projection operator onto the finite-dimensional subspace $V_L$ of $\mathcal M_0(w)$ corresponding to the first $2^L$ coordinates $(x_{lk} : l \le L, k)$. For this we proceed as in the previous lemma, combining (41), (42) with Lemma 16 and the definition of $W_{lk}$, to reduce the problem to showing weak convergence in probability of the conditional laws of $\dots$ to the law of $\mathcal N_{\nu_0}$ for every fixed $k$, $l \le L \in \mathbb N$. Applying Proposition 11 as after (43), combined with Lemma 13 (for $k, l$ fixed the corresponding $\tilde\psi_J$'s are bounded in $L^\infty$), gives convergence of $Z_n$ in Proposition 11 to one, and hence one has, as $n \to \infty$ and for all $t$, $E^{\Pi^{D_n}}[e^{tY_n} \mid X_1, \dots, X_n] = (1 + o_P(1))\exp(\dots)$. Using Lemma 4, (21), $A_\nu(\psi_d - \tilde\psi) = 0$ by (16) and (23), and then also Lemma 18 combined with $\tilde\psi \in L^2$, one has, as $J \to \infty$, $\dots$; in particular, by Chebyshev's inequality, $\rho_n = o_P(1)$ for every fixed $l \le L$, $k$. Thus the Laplace transforms of each such coordinate projection converge to the Laplace transform of the correct normal limit distribution: for all $t$, $E^{\Pi^{D_n}}[e^{tY_n} \mid X_1, \dots, X_n] = (1 + o_P(1)) \times \exp(\dots)$, and convergence in distribution now follows from standard arguments (see, e.g., Proposition 30 in [21]). This argument extends directly to all linear combinations $\sum_{l \le L, k} a_{lk}\psi_{lk}$, so that we can apply the Cramér--Wold device to obtain joint convergence in $V_L$ for any $L \in \mathbb N$. The proof is complete.

Proof of Proposition 11
Using the definition of $S_n(\nu)$ and the formula for the posterior distribution we obtain $\dots$ By Assumption 1 we have $s > 5/2$, so that by Remark 20 condition (54) implies condition (55), and we conclude that the entire Assumption 19 is satisfied. By Lemma 21, the choice of $J$ as in (11), Assumption 19 and the $L^p$-contraction rates (63) derived from Proposition 10, we have that Assumption 17 is satisfied. In Section 4.2 we prove that under Assumption 17 $\dots$, where $\sup_{\nu\in D_n}|r'_n(\nu)| = o_P(1)$, with the deterministic null sequence implicit in the $o_P$ notation uniform in $\eta \in H_n$. Since the first two terms on the right-hand side do not depend on $\nu$, they can be taken outside the posterior integral in (45), so that $\dots A_{\nu_0}(\eta)(X_k) \times \frac{\int_{D_n} e^{S_n(\nu) + \ell_n(\nu_t) + r'_n(\nu)}\,d\Pi(\nu)}{\int_{D_n} e^{\ell_n(\nu)}\,d\Pi(\nu)}$.
By the mean value theorem for integrals, $r'_n(\nu)$ can be replaced in the above display by an $r_n$ not depending on $\nu$, with $|r_n| \le \sup_{\nu\in D_n}|r'_n(\nu)| = o_P(1)$, finishing the proof of the proposition. In order to prove the crucial perturbation approximation (46), we first need to obtain formulas for the directional derivatives of the likelihood function, which is done in the next section.

4.1.
Directional derivatives of the likelihood function. We fix a positive and absolutely continuous Lévy measure $\nu_0 = \lambda_0\mu_0$ with corresponding infinitely divisible distribution $P_{\nu_0}$. We set $v_0 = \log\nu_0$, so that $\nu_0 = \exp v_0$, and parametrise a path away from $\nu_0$ as $\dots$ The resulting compound Poisson measure can be identified in the Fourier domain as $\dots$, where $\nu^{(s),h}$ is a finite signed measure on $I$. One checks by the usual properties of convolution and the definition of $e^z$ that the second factor in the last product is the Fourier transform of the finite signed measure $e^{-\Delta\nu^{(s),h}(I)}\sum_{k=0}^\infty (\Delta^k(\nu^{(s),h})^{*k})/k!$, and so we conclude by injectivity of $\mathcal F$ that $\dots$ Let $\Lambda$ denote the Lebesgue (probability) measure on $I$. We observe that the resulting compound Poisson measure is of the form $P_\Lambda = e^{-\Delta}\delta_0 + (1 - e^{-\Delta})\Lambda$. Both $P_{\nu^{(s)}}$ and $P_{\nu^{(s+h)}}$ are absolutely continuous with respect to $P_\Lambda$. We will now determine the first five derivatives of $dP_{\nu^{(s)}}/dP_\Lambda$. To this end we expand (47) in terms of $h$. We start with the factor in front of the sum and expand $\dots$ From the definition of $\nu^{(s),h}$ we observe that $(\nu^{(s),h})^{*k} = O(h^k)$. Using (47) we obtain $\dots$ To find the first derivative we gather all terms that are linear in $h$ and obtain $\dots$ This gives the first derivative $\frac{d}{ds}\dots$ Gathering all terms quadratic in $h$ we find $\dots$, and this gives the second derivative $\dots$ Next we gather all terms which are cubic in $h$; this yields $\dots$, and in this way we obtain the third derivative $\dots$ In a similar way we obtain the fourth and fifth derivatives $\dots$ Let $L^2_0(P_\nu) := \{g \in L^2(P_\nu) : \int g\,dP_\nu = 0\}$. Motivated by the structure of the derivatives we define the multilinear form $\dots$ In view of the derivatives of the log-likelihood we divide the derivatives by $dP_{\nu^{(s)}}/dP_\Lambda$. Then the dominating measure $P_\Lambda$ cancels, and we suppress it in the notation.
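In sketch form, and consistent with the convolution-exponential representation of $P_\nu$ appearing in the proof of Lemma 18, the Fourier-domain identification underlying this paragraph reads:

```latex
\[
  P_{\nu} \;=\; e^{-\Delta\,\nu(I)} \sum_{k=0}^{\infty} \frac{\Delta^{k}\,\nu^{*k}}{k!},
  \qquad
  \mathcal{F}P_{\nu}(k) \;=\; \exp\!\big(\Delta\,(\mathcal{F}\nu(k) - \nu(I))\big),
  \quad k \in \mathbb{Z},
\]
```

where $\nu(I) = \lambda$ is the total mass of the Lévy measure; the same identity applies to the perturbed measures $\nu^{(s),h}$ with $\nu$ replaced accordingly.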
We obtain the following expressions: $\dots$ With the densities at hand we can determine the derivatives of the empirical log-likelihood $\dots$ The previous quantities simply denote one-dimensional derivatives of the empirical log-likelihood along the curve $\nu^{(s)}$. These derivatives can be viewed as values on the diagonal of symmetric multilinear forms, and by means of polarization of the one-dimensional derivatives we extend them to such symmetric multilinear forms.

Likelihood expansion.
In this section we will use a likelihood expansion to show the statement used in the proof of Proposition 11 that $\dots$, where $\sup_{\nu\in D_n}|r'_n(\nu)| = o_P(1)$. Let $\varepsilon_n^{L^p}$ with $2 < p < \infty$ be rates such that for $D_{n,p} = D_{n,p,M}$: $\dots$ Setting $\omega_n^{L^p} = tn^{-1/2}\|\eta\|_{L^p} + \delta_n\varepsilon_n^{L^p}$, we work under the following conditions.

Assumption 17. Let $H_n \subseteq L^\infty(I)$. Assume $J$, $\delta_n$, $\varepsilon_n^{L^p}$ and $\omega_n^{L^p}$ satisfy, uniformly over $\eta \in H_n$, $\dots$

We consider the following path from $\nu_0$ to $\nu$: $s \mapsto \exp(s(v - v_0) + v_0) = \nu^{(s)}$. A Taylor expansion of the log-likelihood $\ell_n$ along this path gives $\dots$, where the first two terms denote the first and second derivative at zero and the last term denotes the third derivative at some intermediate point $s \in [0,1]$. We will see later that the derivatives depend linearly on the directions; thus it is possible to extend them to symmetric multilinear forms. The corresponding path from $\nu_0$ to $\dots$ We recall the perturbation (31) and define $\delta_n(v)$ by $\dots$ With this definition we calculate $\dots$ We need to show that $\dots$ The first term is given by $\dots$ For the second term we have $\dots$, where $\mathbb G_n = \sqrt n(P_{\nu_0,n} - P_{\nu_0})$ is the empirical process and $\mathcal F$ $\dots$ We observe that there is $D > 0$ such that $\|v_0\|_\infty \le D$ and $\|v\|_\infty \le D$ for all $v \in V_{B,J}$. We will bound the norms of functions in $\mathcal F$ using the following lemma.

Lemma 18.
Let $\|v\|_\infty \le D$ and $\nu = \exp(v)$. Then for $A_\nu$ defined in (48) and for $1 \le p \le \infty$ $\dots$ The constants depend only on $k$, $D$ and $\Delta$.
Proof. We write $\nu$ for both the Lévy measure and its density. The measure $P_\nu$ can be written as a convolution exponential $P_\nu = e^{-\Delta\lambda}\sum_{k=0}^\infty \frac{\Delta^k}{k!}\nu^{*k}$ with intensity $\lambda = \nu((-1/2, 1/2])$. The function $v$ is bounded, so that the corresponding Lévy density $\nu = \exp(v)$ is bounded from above and bounded away from zero; likewise the intensity $\lambda$ is bounded from above and away from zero. We denote by $\Lambda$ the Lebesgue measure on $[-1/2, 1/2]$. Then $d\Lambda/dP_\nu$ is in $L^\infty(P_\nu)$, with norm bounded by a constant depending on $D$ and $\Delta$ only. Defining by $P^a_\nu = e^{-\Delta\lambda}\sum_{k=1}^\infty \frac{\Delta^k}{k!}\nu^{*k}$ the part which is absolutely continuous with respect to the Lebesgue measure $\Lambda$, we see likewise that the density $dP^a_\nu/d\Lambda$ is bounded in $L^\infty(\Lambda)$ by a constant depending on $D$ and $\Delta$ only. By definition we have $\dots$
The numerator consists of $2^k$ terms, a typical one being of the form $\int w_1\,d\nu \cdots \int w_j\,d\nu \cdot (w_{j+1}\nu) * \cdots * (w_k\nu) * P_\nu$, and up to permutation and the choice of $j$ between $0$ and $k$ all terms are of this form. So it suffices to bound $\big\|\,d\big(\int w_1\,d\nu \cdots \int w_j\,d\nu \cdot (w_{j+1}\nu) * \cdots * (w_k\nu) * P_\nu\big)/dP_\nu\,\big\|_{L^p(P_\nu)}$.
For $j = k$ this gives the desired bound, and for $j < k$ the previous line can be bounded by $\dots$, where we have used the boundedness of $d\Lambda/dP_\nu$ and $dP^a_\nu/d\Lambda$. Young's inequality for convolutions yields the bound $\dots$, and the lemma follows by treating all $2^k$ terms in this way.
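The convolution-exponential representation at the start of the proof can be illustrated numerically: since the jump distribution is absolutely continuous, $P_\nu$ places an atom of mass exactly $e^{-\Delta\lambda}$ at zero (the $k = 0$ term of the series), so $\lambda$ can be read off from the fraction of zero increments. The following sketch checks this on simulated data; the parameter values and the uniform jump law are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, Delta, n = 1.0, 0.5, 200_000   # illustrative intensity, grid distance, sample size

# Increments Y_{i*Delta} - Y_{(i-1)*Delta}: a Poisson(Delta*lam) number of i.i.d.
# uniform jumps on (-1/2, 1/2], summed and wrapped onto the circle I.
N = rng.poisson(Delta * lam, size=n)
jumps = rng.uniform(-0.5, 0.5, size=N.sum())
Y = np.zeros(n)
np.add.at(Y, np.repeat(np.arange(n), N), jumps)
Y = ((Y + 0.5) % 1.0) - 0.5

# P_nu = e^{-Delta*lam} sum_k Delta^k nu^{*k} / k!  puts mass e^{-Delta*lam}
# on the atom {0}, so the observed zero-fraction estimates lam:
lam_hat = -np.log(np.mean(Y == 0.0)) / Delta
print(lam_hat)
```

With $n = 200{,}000$ samples the estimate is accurate to a few decimal places, in line with the parametric $1/\sqrt n$ rate for the intensity.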
We define $v(u) = \sum_{l \le J-1}\sum_k a_l u_{lk}\psi_{lk}$ with $a_l = 2^{-l}(l^2\dots)$, where the constant depends only on $D$ and $\Delta$. It follows that $\dots$, where the supremum is over all Borel probability measures $Q$. Consequently we have $\sup_Q N(\mathcal F, L^2(Q), \varepsilon\|F\|_{L^2(Q)}) \le (A/\varepsilon)^{2^J}$, for some $A \ge 2$ and for $0 < \varepsilon < A$, where the envelope can be taken to be a constant function $F$, with constant depending only on $D$ and $\Delta$. Let $\dots$ Then we have, by Corollary 3.5.8 in [16], for some $c > 0$, $\dots$ We obtain $II = o_P(1)$ using the conditions $\sqrt n\,\delta_n\,\varepsilon_n^{L^2}\,2^{J/2}\log^c \varepsilon_n^{L^2} = o(1)$ and $2^{J/2}\sqrt n\,\log^c \varepsilon_n^{L^2} \lesssim \varepsilon_n^{L^2}$.
Next we consider the term III. It equals $\dots$, where we understand the bilinear forms through polarization and, by abuse of notation, $\nu^{(s)}$ denotes a generic path. The terms (i)(a) and (ii)(a) are both centred. The term (i)(b) is centred after subtracting $\dots$, yielding the corresponding term in (49). The centring of the term (ii)(b) is of order $\dots$ We start with the term (i)(a). We define functions $\dots$ and consider the corresponding class of functions as in (50). For $u, u' \in \mathbb R^{2^J}$ we denote again $v = v(u)$, $v' = v(u')$ and apply Lemma 18 to the function $\dots$, where the constant depends only on $D$ and $\Delta$. We choose the envelope $F$ of the class $\mathcal F$ as a constant function $C\|\eta\|_\infty$, where the constant $C$ depends only on $D$ and $\Delta$. Then the bound shows that we have $\sup_Q N(\mathcal F, L^2(Q), \varepsilon\|F\|_{L^2(Q)}) \le (A/\varepsilon)^{2^J}$ for some $A \ge 2$ and for all $0 < \varepsilon < A$.
The next step is to bound $\sigma^2 = \sup_{f\in\mathcal F} P_{\nu_0} f^2$. By Lemma 18 we have $\dots$ Corollary 3.5.8 in [16] allows us to bound the empirical process appearing in term (i)(a): for some $c > 0$ we obtain $\dots$ The condition for the first term to dominate the second is the same as for the term II.
To bound the term (i)(a) we use $\dots$ Next we treat term (i)(b), which is given by $\dots$ So after centring the term is given by $t\,\mathbb G_n f_v$. We have by Lemma 18 $\dots$ We consider the class of functions $\mathcal F$ as in (50), corresponding to the functions of the form $f_v$ here, and bound $\dots$ Just as for term (i)(a) we now apply Corollary 3.5.8 in [16] with envelope proportional to $\|\eta\|_\infty$, so the conditions for term (i)(b) are the same as for the term (i)(a). We move on to the term (ii)(a). We define $\dots$ and $f_v = f_{vv}$. We now consider the class of functions $\mathcal F$ with this definition of $f_v$. Then we have $\dots$ Choosing the envelope as a constant function proportional to $\varepsilon_n^{L^\infty}$ we obtain for the covering numbers $\sup_Q N(\mathcal F, L^2(Q), \varepsilon\|F\|_{L^2(Q)}) \le (A/\varepsilon)^{2^J}$. Turning to $\sigma$ we see $\dots$ Again we apply Corollary 3.5.8 in [16], which gives the following bound for term (ii)(a): $\dots$ This tends to zero by the assumption for the term II.
The only remaining term of III is (ii)(b). After centring, this term can be written as $-\delta_n\sqrt n\,\mathbb G_n f_v$, and we bound $\dots$ We denote by $\mathcal F$ the class of functions corresponding to $f_v$ as in (50) and further bound $\dots$ We see that (ii)(b) leads to the same condition as the term (ii)(a).
The term IV equals $\dots$ The terms (i)(a), (ii)(a) and (iii)(a) are centred. The term (i)(b) can be centred by subtracting $\dots$, which gives the corresponding expression in (49). For the centring of term (ii)(b) we subtract $\dots$ To centre the term (iii)(b) we add $t\delta_n\sqrt n\,E_{\nu_0}[A_{\nu_0}(\eta)A_{\nu_0}(v - v_{0,J})]$, and this is bounded in absolute value by $\dots$ For term (i)(a) we bound, using Lemma 18, $\dots$, and for term (i)(b) we bound, using Lemma 18, $\dots$ So after centring, term (i) is of order $O_P(t^2 n^{-1/2}\|\eta\|^2_{L^4})$, and we use $t^2 n^{-1/2}\|\eta\|^2_{L^4} = o(1)$. The terms IV(ii) and IV(iii) are treated in the same way as the terms III(ii) and III(i), respectively. Since the terms IV(ii) and IV(iii) both carry an additional factor $\delta_n$, no extra condition is needed.
The remainder term can be expressed as $\dots$ We start with the centring of the third derivatives, so the aim is to bound $\dots$ The term (a) is centred. For term (b) we calculate, using Hölder's inequality, $\dots$, and for term (c) we likewise obtain $\dots$ We conclude $\dots$ Using Lemma 18 as well as the generalization of Hölder's inequality, $\|\prod_{j=1}^k f_j\|_{L^1(\mu)} \le \prod_{j=1}^k \|f_j\|_{L^{p_j}(\mu)}$ for $\sum_{j=1}^k 1/p_j = 1$ and some measure $\mu$, it follows in the same way that $\dots$ For the fifth derivative we let $\nu$ be either $\nu^{(s)}$ or $\nu_t^{(u)}$ and first apply a measure change $\dots$ We observe that $\omega_n^{L^p} = \frac{t}{\sqrt n}\|\eta\|_{L^p} + \delta_n\varepsilon_n^{L^p}$ is the rate at which $\delta_n(v)$ converges to zero in $L^p$. For the centring of the third, fourth and fifth derivatives we use the following conditions: $\dots$ For the empirical process part we develop the remainder term only to the third derivative, so that it takes the form $\dots$
We have $\|v - v_0\|_{L^p} \lesssim \varepsilon_n^{L^p}$ and $\|v_t - v_0\|_{L^p} \lesssim \varepsilon_n^{L^p} + \omega_n^{L^p}$. Both (i) and (ii) can be treated jointly by bounding a term of the form $D^3\ell_n(\tilde\nu_n)[w, w, w]$ with $\tilde\nu_n = \exp(\tilde v_n)$, $\|\tilde v_n\|_\infty \le D$. Let $\nu^{(r)} = \tilde\nu_n\exp(rw)$, so that $\dots$ For term (a) we define the functions $\dots$ After centring, the term (a) is given by $\sqrt n\,\mathbb G_n f_v$, with $f_v$ varying in the class of functions corresponding to (50), where the functions $f_v$ are defined as here. We bound, using Lemma 18, $\dots$ We take the envelope $F$ to be a constant function proportional to $(\varepsilon_n^{L^\infty} + \omega_n^{L^\infty})^2$ and obtain $\sup_Q N(\mathcal F, L^2(Q), \varepsilon\|F\|_{L^2(Q)}) \le (A/\varepsilon)^{2^J}$ for some $A \ge 2$ and for all $0 < \varepsilon < A$. We bound $\sigma$ by $\dots$ Using Corollary 3.5.8 in [16], this yields some $c > 0$ such that $\dots$ For the terms (b) and (c) we obtain the same bounds for the uniform covering numbers and for $\sigma$ as for term (a), so the bound (51) applies likewise to terms (b) and (c).

4.3.
Simplification of Assumption 17. In this section we simplify Assumption 17 and reduce it to a condition involving $\eta$ and $\delta_n$ only. To this end we recall $\varepsilon_n$ from (61) and the $L^p$-contraction rates $\varepsilon_n^{L^p}$ from (63), both in Section 6. We set $2^J \approx n^{1/(2s+1)}$.

Assumption 19. Suppose $t = O(1)$, $s > 11/6$ and $H_n \subseteq L^\infty(I)$. Furthermore, assume for $\delta_n$ and uniformly for all $\eta \in H_n$: $\delta_n n^{2/(2s+1)}(\log n)^{1+2\delta} = o(1)$, (52) $\dots$

Remark 20. For $s > 9/4$ (and so in particular for $s > 10/4 = 5/2$) condition (54) implies condition (55). We have $\dots = n^{-s/(2s+1)}(\log n)^{1/2} \lesssim \varepsilon_n^{L^2}$ and $n\delta_n(\varepsilon_n^{L^2})^2 = \delta_n n^{2/(2s+1)}(\log n)^{1+2\delta} = o(1)$, using (52). For term III(i) we bound $\dots$ by (54). We check that $\dots$ by (54) and that $\dots$ by (52). For the centring of the third derivatives we bound $\dots$, where we used (55) for the first term and (52) for the second term. Further we have $\dots = o(1)$, from the next-to-last display for the second term. The terms for the centring of the fourth derivatives are treated by $\dots$, where we used (55) for the first term and (52) for the second term, and by $\dots$, where we used (54) for the first term and the next-to-last display for the second term. Turning to the centring of the fifth derivatives we observe $n(\varepsilon_n^{L^5})^5 = n^{(-3s+5)/(2s+1)}(\log n)^{5/2+5\delta} = o(1)$ and $n(\omega_n^{L^5})^5 \lesssim n\,t^5 n^{-5/2}\|\eta\|^5 \dots$, using (54) for the first term and the next-to-last display for the second term. For the remainder term $R_n$ we bound $\dots$, using that $s > 11/6$ for the first and the second term and (54) for the third and the fourth term. Finally, for the condition that the first term dominates in $R_n$, we verify $\dots$

Proof of Proposition 3
The Radon--Nikodym density in (14) is well defined in view of the convolution series representation of $P_\nu$ in (4). That $A_\nu$ maps $L^2(\nu)$ into $L^2(P_\nu)$ is proved in Lemma 18, and an application of Fubini's theorem gives $\int_I A_\nu(h)\,dP_\nu = 0$ for all $h \in L^2(\nu)$. The expansion (13) follows by the same arguments used for the proof of Proposition 11 in Section 4.2, but is in fact easier, and no empirical process tools are needed here. In the case $v \in V_J$ for some $J$ the expansion follows directly from setting $v_0 = v$ and $\eta = h$ in (49). For the general case we consider the path $s \mapsto \exp(v + sh/\sqrt n) = \nu^{(s)}$ and obtain by a Taylor expansion, for some $s \in [0,1]$, $\dots$ The terms I and II are both centred and are treated exactly as the term IV(i)(a) and the centred version of IV(i)(b) in Section 4.2. This yields $I + II = O_P(n^{-1/2}\|h\|^2_{L^4})$. The centring of term III is shown to be $O_P(n^{-3/2}\|h\|^3_{L^3})$, which is proved along the same lines as the centring of the third derivatives of the term $R_n$ in Section 4.2, combined with the measure change there applied to the fifth derivatives. After centring, the term III is shown to be of order $O_P(n^{-1}\|h\|^3_{L^6})$, with the same bounds as used for bounding $\sigma$ when treating the empirical process part of $R_n$, except that here $h$ is fixed and so a simple variance bound suffices instead of the empirical process inequality used for $R_n$. We conclude $I + II + III = o_P(1)$.
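For orientation, the third-order Taylor expansion invoked here can be sketched as follows, assuming the multilinear-form notation of Section 4.1 and the $n^{-1/2}$-scaled path $s \mapsto \exp(v + sh/\sqrt n)$, with intermediate point $s \in [0,1]$:

```latex
\[
  \ell_n\big(\nu^{(1)}\big) - \ell_n\big(\nu^{(0)}\big)
  \;=\; \frac{1}{\sqrt{n}}\, D\ell_n(\nu^{(0)})[h]
  \;+\; \frac{1}{2n}\, D^2\ell_n(\nu^{(0)})[h,h]
  \;+\; \frac{1}{6\,n^{3/2}}\, D^3\ell_n(\nu^{(s)})[h,h,h].
\]
```

The $n^{-1/2}$-scaling of the path makes the first term the LAN-type linear term, while the cubic remainder is what the variance bounds above show to be negligible.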

Proof of Proposition 10
We first derive a general contraction theorem from which we will deduce Proposition 10 (after Proposition 28). We follow the usual 'testing and small ball probability' approach (as in Theorem 7.3.1 in [16], see also [14]), which in our setting gives the following starting point for proving contraction rates, where $K(P_\nu, P_{\nu'})$ denotes the usual Kullback--Leibler (KL) divergence between two probability measures $P_\nu$, $P_{\nu'}$.
Proposition 22. Consider a prior $\Pi$ on a $\sigma$-field $\mathcal S_V$ of some set $V$ of Lévy measures for which the maps $\nu \mapsto p_\nu(x)$, $x \in I$, defined before (5), are all measurable. Let $d$ be some metric on $V$ such that $\nu \mapsto d(\nu, \nu')$ is measurable for all $\nu' \in V$. Suppose for some sequence $\varepsilon_n \to 0$ such that $\sqrt n\,\varepsilon_n \to \infty$, constant $C > 0$ and $n$ large enough we have $\dots$, and that for $V_n \subseteq V$ such that $\dots$ we can find tests $\Psi_n = \Psi(X_1, \dots, X_n)$ and $\delta_n > 0$, $M_0 > 0$, such that $\dots$ Then if $\Pi(\cdot \mid X_1, \dots, X_n)$ is the posterior distribution from (6) we have, for every $M \ge M_0$, $\Pi(\nu : d(\nu, \nu_0) \ge M\delta_n \mid X_1, \dots, X_n) \to 0$ as $n \to \infty$ in $P^N_{\nu_0}$-probability.

As in previously studied 'inverse problems' settings [21,24,26], to apply this proposition one requires new approaches to the construction of frequentist tests, and as in these references we use tools from the 'concentration of measure' theory put forward in [15], where we initially choose for $d$ the weak (or 'robust') metric induced by the norm $\|\cdot\|_{H(\delta)}$ of (56), a negative-order Sobolev space. Contraction rates in stronger norms will then be deduced from interpolation arguments. Before doing so, however, we need to calculate KL-divergences for the observation scheme relevant in our context, and show that they can be bounded in terms of the distance of the corresponding Lévy measures.
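In hedged sketch form, the hypotheses of Proposition 22 follow the standard 'small ball plus testing' template of Theorem 7.3.1 in [16]; the precise constants are those of the proposition and this display is only an indicative template:

```latex
\[
  \Pi\Big(\nu \in V : K(P_{\nu_0}, P_{\nu}) \le \varepsilon_n^2\Big) \;\ge\; e^{-C n \varepsilon_n^2},
  \qquad
  \Pi(V \setminus V_n) \;\le\; e^{-(C+4) n \varepsilon_n^2},
\]
\[
  E^{N}_{\nu_0} \Psi_n \;\longrightarrow\; 0,
  \qquad
  \sup_{\nu \in V_n \,:\, d(\nu,\nu_0) \ge M_0 \delta_n} E^{N}_{\nu}\,(1-\Psi_n) \;\le\; e^{-(C+4) n \varepsilon_n^2}.
\]
```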
Lemma 23. Let $D > 0$ be such that $e^{-D} \le d\nu/d\Lambda \le e^D$ and $e^{-D} \le d\nu_0/d\Lambda \le e^D$ on $I$. Then there exists $K_D > 0$ such that $\dots$

Proof. We define the path $s \mapsto \exp(s(v - v_0) + v_0) = \nu^{(s)}$, $s \in [0,1]$, from $\nu_0$ to $\nu$ and consider the function $\dots$ Observing $f(0) = 0$, a Taylor expansion at $s = 0$ yields some $s \in [0,1]$ such that $f(1) = f'(0) + \frac12 f''(s)$. By the upper and lower bounds on the Lévy densities, the differentiation may be performed under the integral and we obtain $\dots$, where the last step contains a change of measure from $P_{\nu_0}$ to $P_{\nu^{(s)}}$, so that we may now apply Lemma 18. For the second inequality we consider the following function $g$ and its derivatives $\dots$ Observing $g(0) = g'(0) = 0$ we obtain by a Taylor expansion $g(1) = \frac12 g''(s)$ for some $s \in [0,1]$, and thus $\int \big(\log \frac{dP_\nu}{dP_{\nu_0}}\big)^2\,dP_{\nu_0} \le \dots$ In what follows, we will need the following identifiability condition.
For Lévy processes on $\mathbb R$ the Lévy measure can be identified by taking the complex logarithm of the characteristic function of $P_\nu$ in such a way that the resulting function is continuous; this is known as the distinguished logarithm. For Lévy processes on a circle the characteristic function is defined only on the integer lattice, and a continuous version of the logarithm cannot be defined. However, this problem can be resolved by assuming $\lambda < \pi/\Delta$, since then the exponent in the Lévy--Khintchine representation always coincides with the principal branch of the logarithm of the characteristic function, ensuring identifiability. This condition is sharp, as the following examples show.
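The positive direction of this condition can be checked numerically: on the integer lattice the Lévy--Khintchine exponent satisfies $|\mathrm{Im}\,\Delta(\mathcal F\nu(k) - \lambda)| \le \Delta\lambda < \pi$, so the principal logarithm of the characteristic function recovers the exponent without branch ambiguity. A sketch follows; the particular Lévy density below is an illustrative assumption, not an example from the paper.

```python
import numpy as np

# Circle I = (-1/2, 1/2]; Fourier coefficient F nu(k) = int_I e^{-2 pi i k x} nu(x) dx.
# Illustrative (assumed) Lévy density with an asymmetric part, total mass lam:
lam, Delta = 2.0, 1.0                       # lam * Delta = 2 < pi: identifiability regime
m = 20_000
dx = 1.0 / m
x = -0.5 + (np.arange(m) + 0.5) * dx        # midpoint grid on I
nu = lam * (1 + 0.5 * np.cos(2 * np.pi * x) + 0.3 * np.sin(2 * np.pi * x))

devs = []
for k in range(-5, 6):
    Fnu_k = np.sum(np.exp(-2j * np.pi * k * x) * nu) * dx
    z = Delta * (Fnu_k - lam)               # Lévy-Khintchine exponent at frequency k
    # |Im z| <= Delta * lam < pi, so the principal log inverts exp without ambiguity:
    devs.append(abs(np.log(np.exp(z)) - z))
print(max(devs))
```

The maximal deviation is at the level of floating-point roundoff, confirming that no branch of the logarithm is crossed in this regime.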
For the concentration around the mean we observe that $Z$ itself also has bounded differences with $c^2 = 4/n$, and applying Theorem 3.3.14 in [16] yields $P(Z \ge E Z + t) \le e^{-2t^2/c^2} = e^{-nt^2/2}$. This shows that the linearisation of the first term in (58) is bounded by a multiple of $(\sqrt{\log K} + x)/\sqrt n$. On $A_n$ we can bound the remainder in the linearisation by a multiple of the same quantity. For $n/\log K$ large enough, $E Z$ is smaller than $1/(4c')$ and we can bound $P(A_n^c)$ by $\exp(-R_2 n) \le \exp(-R_2 n/\log K)$, using the concentration of $Z$. The bound $P(A_n^c) \le (1/R_2)\exp(-R_2 n/\log K)$ for all $n$ and $K$ is obtained by choosing a possibly smaller constant $R_2$.
For the bias we bound, using the Cauchy--Schwarz inequality, $\dots$, which explains the second regime in the inequality in Lemma 25. Now to estimate $\nu$ we first estimate $\mathcal F\nu(k)$, $k \ne 0$, by $\widehat{\mathcal F\nu}(k) = (\Phi_n(k) + \hat\lambda)1_{[-K,K]}(k)$, where $K$ is a spectral cut-off parameter. By standard theory of Sobolev spaces on the unit circle, an equivalent norm on $H(\delta)$ is given by $\|f\|'_{H(\delta)} = \big(\sum_k |\mathcal F f(k)|^2\,k^{-1}(\log(e + k))^{-2\delta}\big)^{1/2}$.
Using that $\sum_k k^{-1}(\log(e + k))^{-2\delta}$ converges for $\delta > 1/2$ we obtain $\dots$, which, repeating the above arguments, gives the same bounds as those obtained for $\hat\lambda - \lambda$.
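The spectral estimation step described above can be sketched in simulation: the identity $\mathcal F P_\nu(k) = \exp(\Delta(\mathcal F\nu(k) - \lambda))$ gives $\Phi_n(k) = \Delta^{-1}\log\hat\varphi_n(k) = \mathcal F\nu(k) - \lambda$ up to sampling error, with $\lambda$ recovered from the atom at zero. All concrete parameter values and the jump density below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, Delta, n = 1.0, 0.5, 200_000           # illustrative parameter values

# Jump density mu(x) = 1 + 0.5*cos(2*pi*x) on I = (-1/2, 1/2] (an assumed example),
# sampled by rejection from the uniform proposal with bound 1.5.
def sample_jumps(size):
    out = np.empty(0)
    while out.size < size:
        cand = rng.uniform(-0.5, 0.5, 2 * size)
        keep = rng.uniform(0.0, 1.5, 2 * size) < 1 + 0.5 * np.cos(2 * np.pi * cand)
        out = np.concatenate([out, cand[keep]])
    return out[:size]

N = rng.poisson(Delta * lam, size=n)
Y = np.zeros(n)
np.add.at(Y, np.repeat(np.arange(n), N), sample_jumps(N.sum()))
Y = ((Y + 0.5) % 1.0) - 0.5                 # wrap increments onto the circle

lam_hat = -np.log(np.mean(Y == 0.0)) / Delta   # intensity from the atom at zero
phi_1 = np.mean(np.exp(-2j * np.pi * Y))       # empirical char. function at k = 1
Phi_1 = np.log(phi_1) / Delta                  # = F nu(1) - lam, up to sampling noise
Fnu1_hat = (Phi_1 + lam_hat).real
print(Fnu1_hat)                                # true value here: F nu(1) = lam / 4
```

Inverting the Fourier series of the estimated coefficients $|k| \le K$ then yields the spectral cut-off estimator of the Lévy density itself.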
We now verify the above assumptions for the prior used in the main body of the paper. For the choice $J = J_n$ with $2^{J_n} \approx n^{1/(2s+1)}$, the prior (10) satisfies, for $n$ large enough, the small ball probability condition (59).
Restricting to this event we can further bound $L^2$-distances: by $v_0 = \log\nu_0 \in C^s$ and (8), and using Lemma 29 below (and the remark before it), we have on an event with posterior probability tending to one $\dots$, so that, as $n \to \infty$, $\Pi(\nu : \|\nu - \nu_0\|_{L^2} \ge C2^{J/2}J^\delta\varepsilon_n \mid X_1, \dots, X_n) \to 0$ in $P^N_{\nu_0}$-probability, and further, using that with posterior probability tending to one $\dots$, this also implies that $\Pi(\nu : \|\nu - \nu_0\|_\infty \ge C2^J J^\delta\varepsilon_n \mid X_1, \dots, X_n) \to 0$ in $P^N_{\nu_0}$-probability.