Blockwise SVD with error in the operator and application to blind deconvolution

We consider linear inverse problems in a nonparametric statistical framework. Both the signal and the operator are unknown and subject to error measurements. We establish minimax rates of convergence under squared error loss when the operator admits a blockwise singular value decomposition (blockwise SVD) and the smoothness of the signal is measured in a Sobolev sense. We construct a nonlinear procedure adapting simultaneously to the unknown smoothness of both the signal and the operator and achieving the optimal rate of convergence to within logarithmic terms. When the noise level in the operator is dominant, by taking full advantage of the blockwise SVD property, we demonstrate that the block SVD procedure overperforms classical methods based on Galerkin projection or nonlinear wavelet thresholding. We subsequently apply our abstract framework to the specific case of blind deconvolution on the torus and on the sphere.


Motivation
Consider the following idealised statistical problem: estimate a function f (a signal, an image) from data where K : H → G is a linear operator between two Hilbert spaces H and G.The observation of the unknown f ∈ H is challenged by the action of the linear degradation K as well as contaminated by an experimental Gaussian white noise Ẇ on G with vanishing noise level n −1/2 as n → ∞.Alternatively, in a density estimation setting, we observe a random sample (Z 1 , . . ., Z n ) drawn from a probability distribution 1 with density Kf .In each case, we do not know the operator K exactly, but we have access to where Ḃ is a Gaussian white noise on H × G thanks to preliminary experiments or calibration through trial functions.This setting has been discussed in details in [14,18].In this paper, we are interested in operators K admitting a singular value decomposition (SVD) or a blockwise SVD.In essence, we know the typical eigenfunctions of K but not the eigenvalues.We cover two specific examples of interest: spherical and circular deconvolution.
Spherical deconvolution.Used for the analysis of data distributed on the celestial sphere, see Section 4.1 below.One observes a random sample (Z 1 , . . ., Z n ) with where the ε i are random elements in SO(3), the group of 3 × 3 rotation matrices, and the X i are independent and identically distributed on the sphere S 2 , with common density f with respect to the uniform probability distribution µ on S 2 .In this setting, if the ε i have common density g with respect to the Haar measure du on SO(3), we have can be explicitly related in the asymptotic δ → 0 and n → ∞; this includes the comparison with earlier results of [24,14,18] in the context of blockwise SVD.
iii) Application to spherical deconvolution on S 2 or circular deconvolution on the torus; this includes the discussion of our findings in terms of the existing literature on the topic [7,25,22] and some practical aspects of numerical implementation.

Main results and organisation of the paper
In Section 2, we present an abstract framework that allows for operators K to admit a so-called blockwise SVD.This property is simply turned into the existence of pairs of increasing finite dimensional spaces (H ℓ , G ℓ ) that are stable under the action of K.The blockwise SVD property is further appended with a smoothness condition quantified by the arithmetic decay of the operator norm of K and its inverse on H ℓ (resp.G ℓ ) (the so-called ordinary smooth assumption, see e.g.[32]).By means of a reconstruction formula, we obtain in Section 2.2 an estimator f n,δ of f by first inverting K δ on H ℓ with a thresholding tuned with δ and then filter the resulting signal by a block thresholding tuned with n −1/2 .As for i) and ii), we establish in Theorems 3.1 and 3.4 of Section 3 the minimax rates of convergence for Sobolev constraints on f under squared error loss and we demonstrate that f n,δ is optimal and adaptive to within logarithmic terms.The explicit interplay between δ and n −1/2 is revealed and discussed in the case of sparse operator when n −1/2 ≪ δ, completing earlier findings in [14,18] and to some extent [24] in the univariate case for density deconvolution.In particular, we demonstrate that a certain parametric regime dominates when the smoothness of the signal dominates the smoothing properties of the operator.Concerning iii), the method is applied to the case of spherical and circular deconvolution in Section 4 where harmonic Fourier analysis enables to provide explicit blockwise SVD for the convolution operator.We illustrate the numerical feasability of f n,δ and the phenomena that appear in the case n −1/2 ≪ δ.Section 5 is devoted to the proofs.
We choose to state and prove our results in the white Gaussian model generated by the observation of y n and K δ defined by (1.1) and (1.2).The extension to the case of density estimation, when y n is replaced by the observation of a random sample of size n drawn from the distribution Kf , like for instance in [24,9] is a bit more involved, due to the intrinsic heteroscedasticity that appears when enforcing a formal analogy with the Gaussian setting (1.1).It is briefly addressed in the discussion Section 3.2 2 Estimation by blockwise SVD

The blockwise SVD property
Let G denote a family of linear operators between two Hilbert spaces H and G that shall represent our parameter set of unknown K.
A fundamental property (Assumption 2.1 below) is that an explicit singular value decomposition (SVD) or blockwise SVD is known for all K ∈ G simultaneously.More specifically, we suppose that there exist two explicitly known bases (e λ , λ ∈ Λ) of H and (g λ , λ ∈ Λ) of G, as well as a partition of Λ where means inequality up to a multiplicative constant that does not depend on ℓ.
It is worthwhile to notice that in our examples as well as in the rates of convergence that we will exhibit later, d plays the role of a dimension.In particular, d = 1 corresponds to a 'standard SVD', whereas d > 1 creates blocks and deserves the name of 'blockwise' SVD.However, there is no need in the paper to assume that d is in N. Set

The Galerkin projection of an operator
We further need to quantify the action of K on H ℓ .We denote by Assumption 2.2 (Spectral behaviour of K |H ℓ ).For every ℓ ≥ 1, K ℓ is invertible and there exists ν ≥ 0 such that We associate with the bases (e λ , λ ∈ Λ) and (g λ , λ ∈ Λ) the following decompositions where •, • denotes the inner product either in H or G and the scale of Sobolev spaces For ν ≥ 0, Assumption 2.2 implies that K : W −ν/2 → W ν/2 is continuous.
In particular, when ν > 0, the operator K is ill-posed with degree ν, see for instance [26].

Blockwise SVD reconstruction with noisy data
Under Assumption 2.1 and 2.2, we have the reconstruction formula By the observed blurred version K δ of K in (1.2), we obtain a family of estimators of (K ℓ ) −1 from data (1.2) by considering the operator where κ > 0 is a cutoff level, possibly depending on ℓ.Likewise, the coefficient Kf, g λ can be estimated by z n,λ := y n , g λ . (2.4) Mimicking the reconstruction formula (2.2) with the estimates (2.4) and (2.3), we finally obtain a (family of) estimator(s) of f by setting where The procedure is specified by the maximal frequency level L and the threshold levels and for some prefactors λ 0 , µ 0 > 0. The threshold rule we introduce in both the signal (with level τ ℓ ) and the operator (with level κ ℓ ) is inspired by classical block thresholding [21,4,3] and will enable to adapt with respect to the smoothness properties of both the signal f and the operator K, see below.
3 Main results

Minimax rates of convergence
We assess the performance of the estimator f n,δ defined in Section 2.2 over the Sobolev spaces linked to the basis (e λ , λ ∈ Λ) defined in (2.1).Define the Sobolev balls and κ ℓ , τ ℓ as in (2.5) and (2.6).For sufficiently small λ 0 and sufficiently large µ 0 , for every s, M > 0, where means inequality up to a multiplicative constant that depends on d, s, ν, M, Q and µ 0 , λ 0 only.
The bounds for µ 0 and λ 0 are explicitly computable.In the model generated by y n in (1.1) and K δ in (1.2), they depend on the dimension d and on the absolute constants c 0 and c 1 of the concentration lemmas 5.3 and 5.6 below.However, they are in practice much too conservative, as is well known in the signal detection case [13] or the classical inverse problem case [1], see the numerical implementation Section 4.
Our next result shows that the rate achieved by f n,δ is indeed optimal, up to logarithmic terms.The lower bound in the case δ = 0 is classical (Nussbaum and Pereverzev [26]) and will not decrease for increasing noise levels δ or n −1/2 whence it suffices to provide the case which formally corresponds to observing Kf without noise while K remains unknown.Theorem 3.2 (Lower bounds).In the same setting as Theorem 3.1, with in addition Q 2 > 1/Q 1 , assume we observe Kf exactly and K δ given by (1.2).For sufficiently small δ, we have where means inequality up to a positive multiplicative constant that depends on d, s, ν, M and Q only.
Combining (3.3) together with (3.4) and the results of [26], we conclude that f n,δ is minimax over W s (M ) to within logarithmic terms in n and δ, and that this result is uniform over the nuisance parameter K ∈ G ν (Q).

Discussion
The case of diagonal operators It is interesting to notice that the condition Q 2 > 1/Q 1 in Theorem 3.2 excludes the case where K is diagonal.In this particular case, considered especially in the deconvolution example of Section 4.2 below, a closer inspection of the proof of the upper bound shows that the rate n −s/(2(s+ν)+d) δ 1 s/ν can be obtained (up to some extra logarithmic factors) as in the case where d = 1, which improves on the rate provided by Theorem 3.1.This sheds some light on the role of the number d.It is in fact twofolds: it acts as a 'dimension' in the term n −2s/(2(s+ν)+d) ; in the term involving error in the operator δ, it reflects the distance to the diagonal case expanding from δ 1 s/ν in the diagonal case, to δ 1 2s/(2ν+d−1) in the case It is very plausible, though beyond the scope of this paper, to express conditions on K leading to rates of the form 2s/(2ν + α), with α continuously varying from 0 to d − 1.Note that in the case d = 1, we recover the minimax rate of density deconvolution with unknown error as proved by Neumann [24], see also [9].

Relation to other works in the case of sparse operators
For an unknown signal f with smoothness s > 0 and unknown operator with degree of ill-posedness ν ≥ 0, the optimal rates of convergence are up to inessential logarithmic terms.The exponents α(s, ν) and β(s, ν) are linked respectively to the error in the signal y n and the error in the operator K δ .Efromovitch and Kolchinskii [14] established that under fairly general conditions on the operator K, the optimal exponents are given by They noted however that if certain sparsity properties on K are moreover assumed (and that we shall not describe here, for instance if K is diagonal in an appropriate basis) then the exponent β(s, ν) = 2s 2(s+ν)+d is no longer optimal, while α(s, ν) remains unchanged.
In the related context of operators acting on Besov spaces B s p,p ([0, 1] d ) of functions with smoothness s measured in L p -norm, Hoffmann and Reiß [18] introduce an ad hoc hypothesis on the sparsity of the unknown operator (that we shall not describe here either), expressed in terms of the wavelet discretization of K.They subsequently obtain new rates of convergence for a certain nonlinear wavelet procedure, and these rates overperform (3.5) as expected from the results by [14].In particular, if one considers the estimation of f ∈ B s 2,2 , in the extreme case where the operator K is diagonal in a wavelet basis, the procedure in [18] achieves the rate up to extra logarithmic terms.We may compare our results with the rate (3.6).In our setting, if we pick (e λ , λ ∈ Λ) as the Fourier basis described in Section 4.2, then we have Assuming K to be diagonal in the basis (e λ , λ ∈ Λ) which is the exact counterpart of the approach of Hoffmann and Reiß with K being diagonal in a wavelet basis, then by Theorem 3.1, our estimator f n,δ (nearly) achieves the rate which already outperforms the rate (3.6) whenever the error in the signal y n is dominated by the error in the operator and s is small compared to ν, as follows from the elementary inequality The superiority of the blockwise SVD in this setting is explained by the fact that the wavelet procedure in [18] is agnostic to the diagonal structure of K in the wavelet basis, in contrast to f n,δ that takes full advantage of the block structure of K.As already explained in the preceding section, one could actually improve further this result in the specific case of K being diagonal in (e λ , λ ∈ Λ) and show that f n,δ (nearly) achieves the rate n −α(s,ν)/2 (δ 2 ) 1 s/ν , thus deleting the 'dimensional effect' of d for the error in the operator.
Adaptation over the scales {W s , s > 0} and {G ν , ν ≥ 0} The estimator f n,δ is fully adaptive over the family of Sobolev balls {W s (M ), s > 0, M > 0} (in the sense that f n,δ does not require the knowledge of s nor M ).However, the knowledge of the degree of illposedness ν of K is required through the choice of the maximal frequency L in (3.2).This restriction can actually be relaxed further in dimension d ≥ 2. Indeed, setting formally ν = 0 in (3.2), one readily checks that f n,δ becomes adaptive over {W s (M ), s > 0, M > 0} and 2) is forbidden, but an alternative adaptivity result can be obtained by taking L = ⌊(δ 2 ) −1/s 0 ⌋ ∧ n for some s 0 > 0, in which case f n,δ is fully adaptive over the scale

Extension to density estimation
We briefly show a line of proof for extending Theorem 3.1 to the framework of density estimation.Suppose that instead of y n we observe a random sample Z 1 , . . ., Z n drawn from Kf assumed to a probability density.By analogy to (2.4), we have an estimator of P(g λ ) = Kf, g λ replacing y n , g λ with with η n,λ = n 1/2 P n (g λ ) − P(g λ ) , an inspection of the proof of Theorem 3.1 reveals that an extension to the density estimation setting carries over as soon as the vector (η n,λ , λ ∈ Λ ℓ ) satisfies a concentration inequality, namely To that end, we may apply a concentration inequality by Bousquet [2] as developed for instance in Massart [23], Eq (5.51) p. 171.The precise control of this extension requires further properties on the basis (g λ , λ ∈ Λ) and on the density Kf via the behaviour of λ∈Λ ℓ Var g λ (Z 1 ) , see Eq. (5.52) p. 171 in [23].We do not pursue that here.
4 Application to blind deconvolution 4.1 Spherical deconvolution Scientific context.A common challenge in astrophysics is the analysis of complex data sets consisting of a number of objects or events such as galaxies of a particular type or ultra high energy cosmic rays (UHECR) and that are genuinely distributed over the celestial sphere.Such objects or events are distributed according to a probability density distribution f on the sphere, depending itself on the physics that governs the production of these objects or events.For instance, UHECR are particles of unknown nature arriving at the earth from apparently random directions of the sky.They could originate from long-lived relic particles from the Big Bang.Alternatively, they could be generated by the acceleration of standard particles, such as protons, in extremely violent astrophysical phenomena.They could also originate from Active Galactic Nuclei (AGN), or from neutron stars surrounded by extremely high magnetic fields.As a consequence, in some hypotheses, the underlying probability distribution for observed UHECRs would be a finite sum of point-like sources.In other hypotheses, the distribution could be uniform, or smooth and correlated with the local distribution of matter in the universe.The distribution could also be a superposition of the above.Identifying between these hypotheses is of primordial importance for understanding the origin and mechanism of production of UHECRs.The observations, denoted by X i , are often perturbated by an experimental noise, say ε i , that lead to the deconvolution problem described in Section 1.1.Following van Rooj and Ruymgart [33], Healy et al. [16], Kim and Koo [22] and Kerkyacharian et al. [25], we assume the following model: we observe an n-sample where the X i are distributed on the sphere S 2 , with common density f with respect to the uniform probability distribution µ(dω) on S 2 and independent of the ε i that have a common density g with respect to the Haar probability measure dr on the group SO(3) of 3 × 3 rotation matrices.One proves in [16,22] that the density of the Z i is and we are interested in the case where the exact form g of the convolution operator K = g⋆• is unknown, due for instance to insufficient knowledge of the device that is used to measure the observations.However, we assume approximate knowledge of g through K δ as defined in (1.2).
Checking the blockwise SVD Assumptions 2.1 and 2.1.We closely follow the exposition of [16,22,25] for an overview of Fourier theory on S 2 and SO(3) in order to establish rigorously the connection to Theorem 3.1 and 3.4.Define for l ∈ N, −l m, n l where P l mn are the second type Legendre functions described in details in [34].The D l mn are the eigenfunctions of the Laplace-Beltrami operator on SO(3) hence the family ( √ 2l + 1D l mn ) forms a complete orthonormal basis of L 2 (dr) on SO(3), where dr is the Haar probability measure.Every h ∈ L 2 (dr) has a rotational Fourier transform h(u)D l mn (u)du, l ∈ N, −l m, n l, and for every h ∈ L 2 (dr) we have a reconstruction formula An analogous analysis is available on S 2 .Any point ω ∈ S 2 is determined by its spherical coordinates ω = sin(θ) cos(ϕ), sin(θ) sin(ϕ), cos(θ) for some and a reconstruction formula If g ∈ L 2 SO(3) the spherical convolution operator Kf = g ⋆ f defined in (4.1) satisfies and we retrieve the blockwise SVD formalism of Section 2.1 in dimension d = 2 by setting H = G = L 2 (S 2 , µ), where µ the probability Haar measure on S 2 and We have |Λ ℓ | = 2ℓ + 1 and by (4.3),K ℓ is the finite dimension operator stable on Span{e λ , λ ∈ Λ ℓ } with matrix having entries (K ℓ ) mn = F(g) ℓ mn .
Hence Assumption 2.1 is satisfied.Notice also that in this case K ℓ is generally not diagonal.Assumption 2.2 is satisfied as we assume that g is ordinary smooth in the terminology of Kim and Koo [22].Our Assumption 2.2 exactly matches the constraint (3.6) in their paper with examples given by the Laplace distribution on the sphere (ν = 2) or the Rosenthal distribution (ν > 0 arbitrary).
Numerical implementation.Following Kerkyacharian, Pham Ngoc and Picard [25] in their Example 2, we take ) with ω 1 = (0, 1, 0) and C = 1/0.7854.We have f L 2 (µ) = 0.7469.g is the density of a Laplace distribution on SO(3), defined through F(g) ℓ mn = δ mn 1 + ℓ(ℓ + 1) −1 .Hence, the matrices (K ℓ ) mn are homotheties whose ratios behave as ℓ −2 .We have ν = 2.We plot in Figures 1 a 1000-sample of X i with density f on the sphere, and the action by ε i on the X i , where the ε i are distributed according to g in Figure 2. Note that for the estimation of g, we have access to a noisy version of g with noise level δ only.We display below the (renormalised) empirical squared error of f 10 8 ,δ (oracle choice λ 0 = 1, µ 0 = 1) for 1000 Monte-Carlo for several values of δ.The noise level δ is to to be compared with the noise level n −1/2 = 10 −4 .The latter is chosen non-negative, in order to show the interaction between the two types of error, and sufficiently small to emphasize the influence of δ on the process of estimation.
Noise level δ 0 10 −3 3 10 −3 5 10 −3 10 −2 Mean error 0.0466 0.0542 0.1732 0.2784 0.4335 Standard dev.0.0011 0.0022 0.0126 0.0355 0.0466 Finally, on a specific sample of n = 10 8 data, we plot the target density f (Figure 3) and its reconstruction for n = 10 8 data with δ = 0 (Figure 4) and δ = 3 10 −3 (Figure 5).At a visual level, we oversimplify the representation by plotting f and its reconstruction with a view from above the sphere through the Oz axis.We see that the contour in Figure 5 is not well recovered in the regions where f is small (on the right side of the graph in Figure 5).The choice of λ 0 , µ 0 remains unchanged.

Circular deconvolution
Scientific context.In many engineering problems, the observation of a signal f or image is distorted by the action of a linear operator K.We assume for simplicity that f lives on the torus T = [0, 1] (or [0, 1] d ) appended with periodic boundary conditions.In many instances, the restoration of f from the noisy observation of Kf is challenged by the additional uncertainty about the operator K.This is the case for instance in electronic microscopy [27] for the restoration of fluorescence Confocal Laser Scanning Microscope (CLSM) images.In other words, the quality of the image suffers from two physical limitations: error measurements or limited accuracy, and the fact that the exact PSF (the incoherent point spread function) that accounts for the blurring of f (mathematically the action of K) is not precisely known.This is a classical issue that goes back to [31,15].An idealised additive Gaussian model for the noise contamination yields the observation (1.1) with The degradation process K = g⋆• is characterised by the impulse response function g.In most cases of interest, we do not know the exact form of g.In a condensed idealised statistical setup, we have access to where Ẇ ′ is another Gaussian white noise defined on L 2 (µ) and independent of Ẇ .Experimental approaches that justify representation (4.4) are described in [10,20,30].
Numerical implementation.We numerically implement f n,δ in dimension d = 1 in the case where there is no noise in the signal (formally n −1/2 = 0) in order to illustrate the parametric effect that dominates in the optimal rate of convergence in Theorems 3.1 and 3.4 that becomes δ 2 s/ν∧1 in that case.We take as target function f : T → R belonging to W 5−α for all α > 1/2 and defined by its Fourier coefficients We pick a family of blurring functions g ν defined in the same manner by the formula F(g ν ) λ = |λ| −ν , λ ∈ {−1000, . . ., 1000}, ν ∈ {1, 4, 5, 6, 8}.
We show in Figure 6 in a log-log plot the mean-squared error of f ∞,δ for the oracle choice µ 0 = 0, λ 0 = 1 over 1000 Monte-Carlo simulations for ν ∈ {1, 4, 5, 6, 8} and δ ∈ [10 −4 , 10 −1 ].For small values of δ the predicted slope of the curve gives a rough estimate of the rate of convergence.We visually see that for the critical case ν ≤ s = 5 − α with α > 1/2 and below, the slope is close to 2 confirming the parametric rate that is obtained whenever ν ≤ s.

Preliminary estimates
Preparation.Recall that and that P ℓ (resp.Q ℓ ) denotes the orthogonal projector onto G ℓ (resp.Figure 6: Estimation of the rate exponent when n −1/2 ≪ δ.Empirical squarederror Ê versus δ in log-log scale.Top-to-bottom: ν = 8, 6, 5, 4, 1.The target function has smoothness s = 5 − α for all α > 1/2.For ν < 4.5, the slope of the curve is constant and close to 2, confirming the parametric rate predicted by the theory when the smoothess of the signal dominates the degree of ill-posedness of the operator.The empirical errors were computed using 1000 Monte-Carlo simulations. H ℓ ).For h ∈ H, we have ℓ and therefore P ℓ K(Id− Q ℓ )h = 0.As a consequence (5.1) In turn, we have a convenient description of the observation K δ defined in (1.2) and y n defined in (1.1) and in terms of a sequence space model that we shall now describe.
Notation.If h ∈ G, we denote by h ℓ the (column) vector of coordinates of P ℓ h in the basis (g λ , λ ∈ Λ ℓ ).If T : H → G is a linear operator, we write T ℓ for the matrix of the Galerkin projection Sequence model for error in the operator.The observation of K δ in (1.2) leads to the representation K δ,ℓ = K ℓ + δ Ḃℓ , or equivalently, in matrix notation An immediate consequence of Lemma 5.1 is the following moment bound: For every p > 0, E Ḃℓ (5.4) Sequence model for error in the signal.From (1.1), we observe the Gaussian measure y n , or equivalently, thanks to (5.1) or, using the notation introduced in (2.4), in matrix notation where we used (5.1), with η ℓ denoting a vector of |Λ ℓ | independent centred Gaussian random variables with unit variance.
The following result is a direct consequence of the fact that η ℓ has a χ-square distribution with |Λ ℓ | degrees of freedom.The proof is standard Lemma 5.2.There are positive constant

Proof of Theorem 3.1
We have where • denotes the Euclidean norm on R |Λ ℓ | (we shall omit any reference to ℓ when no confusion is possible).Concerning the bias term, we have and this term has the right order by definition of L in (3.2).Concerning the stochastic term, thanks to our preliminary analysis, we may write We set We thus obtain the decomposion of the variance term as We shall successively bound each term I, II and III.
• The term I, preliminary decomposition.Writing We introduce further the event { δ Ḃℓ op ≤ a ℓ } with a ℓ = ρ κ ℓ for some 0 < ρ < 1  2 and the condition { K ℓ f ℓ ≥ τ ℓ 2 }.We thus have We shall next successively bound each term IV , V , V I and V II • The term IV.First, we have , by a usual Neumann series argument, Therefore, on A ℓ and { δ Ḃℓ op ≤ a ℓ }, we have Second, we now write hence, on A ℓ and { δ Ḃℓ op ≤ a ℓ }, we have by (5.8) We are ready to bound the term IV itself.We have where we successively used (5.8) and (5.9).It follows that where we used Assumption 2.2.The bound is uniform in K ∈ G ν (Q).By definition of κ ℓ and using that |Λ ℓ | is of order ℓ d−1 , we derive by definition of L in (3.2), and this result is uniform in By definition of L again we derive and this bound is uniform in f ∈ W s (M ).Putting together (5.10) and (5.11), we finally obtain n −2s/(1ν+d) (5.12) • The term V. We have where we successively used (5.8) and (5.9) in the same way as for the term IV, the last inequality being obtained thanks to Assumption 2.2.By Assumption 2.2 again, since for some constant c that depends on Q 1 (K) and the pre-factor µ 0 in the choice of • The term VI.We further bound the term V I via On A ℓ , we have applying successively Cauchy-Schwarz, the moment bound (5.4) and Lemma 5.1.Indeed, since a ℓ = ρ/κ ℓ , by definition of κ ℓ in (2.5), we infer (5.14) by (5.6) of Lemma 5.2 since ρ λ 0 | log δ| 1/2 ≥ β 0 for sufficiently small λ 0 thanks to the assumption δ ≤ δ 0 < 1.Finally, since Λ ℓ is non-empty, by taking λ 0 sufficiently small, we conclude uniformly in f ∈ W s (M ).We now turn to the term IX.Observe first that (5.16) We reproduce the steps we used for the term V III, replacing the event By definition of τ ℓ in (2.6) and Lemma 5.6, we have (5.17) since µ 0 2 (log n) 1/2 ≥ β 1 for large enough µ 0 .It follows that by taking µ 0 sufficiently large.The bound is uniform in f ∈ W s (M ).
Putting together the estimates (5.15) and (5.18), we derive for large enough n, uniformly in f ∈ W s (M ).
• The term VII. .The arguments needed here are quite similar to those we used for the term V I. On A ℓ , we have hence, using (5.16), the fact that E η ℓ 2 = |Λ ℓ | ℓ d−1 together with κ ℓ ≤ n 1/2 by definition (2.5), we successively obtain where we applied (5.14) and (5.17) to obtain the last inequality.The choice of L in (3.2) leads to by taking λ 0 sufficiently small and µ 0 sufficiently large.
• The term II.We claim the following inequality a consequence of the following elementary lemma Lemma 5.3.Let A and B be two bounded operators with bounded inverse.If B −1 ≥ κ for some κ > 0, then either Proof of Lemma 5.3.Write B = A+ξ.Assume that A −1 < κ/2.By the triangle inequality, (A+ξ) −1 −A −1 ≥ κ/2.We proceed by contradiction: suppose that ξ ≤ 1/κ.Then we have and a standard Neumann series argument entails By Assumption (2.2), we have for some constant c that depends on Q 2 (K) and λ 0 only.For the second term in the right-hand side of (5.22), we apply by Lemma 5.1 in the same way as we obtained (5.15) for the term V III.We derive We finally obtain • The term III.Obviously, the decomposition (5.5) entails On the one hand, we have uniformly in f ∈ W s (M ), K ∈ G ν (Q).On the other hand, by (5.17), we have by taking µ 0 large enough, uniformly in f ∈ W s (M ).Combining this last estimate with (5.24) we infer Proof of Theorem 3.1, completion.It remains to piece together the estimates (5.7), (5.21), (5.23) and (5.25).
It readily follows that the posterior law of ϑ given X is we have where we used a version of Anderson's Lemma given in Lemma 10.2 in [19] p. 157.Indeed, the law of E[ϑ | X] − ϑ has a centrally symmetric density and the function H δ is nonnegative, centrally symmetric, satisfies H δ (0) = 0 and the sets {x, H δ (c 3 , x) < c} are convex for any c > 0.
Now, E[ϑ | X] − ϑ 2 has a χ 2 -distribution with |Λ ℓ | degrees of freedom, up to a scaling factor of order δ 2 ℓ 4ν .This means that the sequence of random variables δ bounded below in probability in ℓ ≥ 1 and δ > 0. Since E[ϑ | X]−ϑ is moreover independent of X, it follows that there exists c 3 independent of δ and ℓ such that Integrating with respect to X, we obtain (5.29) and the result follows.
Proof of Theorem 3.2 We assume with no loss of generality that 2ν +d−1 ≥ 2s.(Otherwise, the lower bound δ trivially follows from the parametric case. where g ℓ is an arbitrary vector in R |Λ ℓ | with g ℓ = 1 (fixed in the sequel).Then It follows that for an arbitrary estimator f , we have sup Lemma 5.5.There exist a choice of g ℓ with g ℓ = 1 and constants c 4 , c 5 (depending on s, ν, M, Q) such that for any π ∈ Π s,ν (M, Q 1 ), if where the infimum is taken over all estimators and provided δ > 0 is sufficiently small.
Proof of Lemma 5.5.In view of (5.31), we may (and will) assume that π ℓ = 1.We rely on the notation and definition of the preliminaries.Observe first that inf where K 0 is defined in (5.26).Put v δ,ℓ = δ 2 ℓ 4ν+d−1 .For any c > 0, by Chebyshev inequality, we have (5.32) We adopt the same Bayesian approach as in the preliminaries and consider K ℓ as a random matrix with distribution such that where Ẇ ℓ is an independent copy of Ḃℓ and c 2 > 0 is to be specified later.Using the randomisation (5.33) on K ℓ , the right-hand side in (5.32) is now bigger than inf Let us first show that inf is bounded below for an appropriate choice of c > 0. Introduce the event for some 0 < ρ < 1. Observe that (K 0 ℓ ) −1 c 2 δ Ẇ ℓ op ≤ ρ on A δ , therefore, by an usual Neuman series argument, we have the decomposition First, we have that Ẇ ℓ op is bounded in probability by Lemma 5.1 in ℓ ≥ 1.Since |Λ ℓ | 1/2 δ ≤ c 4 ℓ ν by assumption, we also have that ℓ ν δ Ẇ ℓ op is bounded in probability, hence the probability of A δ can be taken arbitrarily close to 1 by taking c 2 sufficiently small.Moreover, on A δ , we have where we again used the fact that |Λ ℓ | 1/2 δ ≤ c 4 ℓ −ν by assumption.The claim follows from the fact that |Λ ℓ | −1/2 Ẇ ℓ op is bounded in probability.Hence (5.36) and (5.35) is proved.
In order to complete the proof of Lemma 5.5, we need to check that the term P K ℓ / ∈ M ν ℓ (Q) can be taken arbitrarily small when bounding (5.32) below by (5.34).We have (5.37) For the first term in the right-hand side of (5.37), we have The last term can be rewritten as For the second term in the right-hand side of (5.37), thanks to the property K −1 ℓ op c 1 ℓ −ν − c 2 δ Ẇ ℓ op −1 we derive By assumption, we have that ℓ −ν |Λ ℓ | −1/2 δ −1 is bounded away from zero.Since |Λ ℓ | −1/2 Ẇ ℓ op is tight in ℓ ≥ 1, we can conclude by taking c 2 sufficiently small.The proof of Lemma 5.5 is complete.

(4. 2 )
for l ∈ N, −l m l where P l m are the Legendre functions.We haveY l −m = (−1) m Y lm and the (Y l m ) constitute an orthonormal basis of L 2 (µ) on S 2 , generally referred to as the spherical harmonic basis.Any f ∈ L 2 (µ) has a spherical Fourier transform

Figure 1 :
Figure 1: Data from f .Plot of n = 1000 data with common distribution f on the sphere S (planar representation).

Figure 2 :
Figure 2: Data from f ⋆ g.Plot of n = 1000 data εiXi on the sphere S with common distribution Kf = f ⋆ g.The Xi are the data pictured in Figure 1 and the εi are sampled according to g (planar representation).

Figure 3 :
Figure 3: Target density f .The representation is simplified through a view from above the sphere through the Oz-axis.

. 2 )where
Ḃℓ is a |Λ ℓ | × |Λ ℓ | matrix with entries that are independent centred Gaussian random variables, with unit variance.The following estimate is a classical concentration property of random matrices.For ℓ ≤ L, • op denotes the operator norm for |Λ ℓ | × |Λ ℓ | matrices (we shall skip the dependence upon ℓ in the notation).