Block thresholding for wavelet-based estimation of function derivatives from a heteroscedastic multichannel convolution model

We observe $n$ heteroscedastic stochastic processes $\{Y_v(t)\}_{v}$, where for any $v\in\{1,\ldots,n\}$ and $t \in [0,1]$, $Y_v(t)$ is the convolution product of an unknown function $f$ and a known blurring function $g_v$, corrupted by Gaussian noise. Under an ordinary smoothness assumption on $g_1,\ldots,g_n$, our goal is to estimate the $d$-th derivative (in the weak sense) of $f$ from the observations. We propose an adaptive estimator based on wavelet block thresholding, namely the "BlockJS estimator". Under the mean integrated squared error (MISE) risk, our main theoretical result investigates the minimax rates over Besov smoothness spaces and shows that our block estimator achieves the optimal minimax rate, or is nearly minimax in the least favorable situation. We also report a comprehensive suite of numerical simulations that supports our theoretical findings. In practice, our block estimator compares very favorably to existing methods from the literature on a large set of test functions.

In the following, any function $h \in L^2_{\mathrm{per}}([0,1])$ can be represented by its Fourier series $h(t) = \sum_{\ell \in \mathbb{Z}} \mathcal{F}_\ell(h)\, e^{2\pi i \ell t}$, where the equality is intended in the mean-square convergence sense, and $\mathcal{F}_\ell(h) = \int_0^1 h(t)\, e^{-2\pi i \ell t}\, dt$ denotes the $\ell$-th Fourier coefficient of $h$, whenever this integral exists. The notation $\overline{\,\cdot\,}$ will stand for the complex conjugate.

Overview of previous work
There is an extensive statistical literature on wavelet-based deconvolution problems. Owing to space limitations, we only focus on some of it.
In the special case where $g_1 = \cdots = g_n$, (1) reduces to a single-channel form involving $\overline{Y}(t) = (1/n)\sum_{v=1}^{n} Y_v(t)$ and $W(t) = (1/n^{1/2})\sum_{v=1}^{n} W_v(t)$, which is a standard Brownian motion. In such a case, (2) becomes the standard deconvolution model, which has attracted the attention of a number of researchers spanning a wide range of fields, including signal processing and statistics. For instance, wavelet-based estimators of $f$ have been constructed and their asymptotic performance investigated in a number of papers; see e.g. [4-6, 9, 15, 18]. When $g_1, \ldots, g_n$ are not necessarily equal, estimators of $f$ and their minimax rates under the mean integrated squared error (MISE) over Besov balls were proposed in [13, 21-23]. These authors developed wavelet thresholding estimators (hard thresholding in [13, 23] and block thresholding in [21, 22]) under various assumptions on $g_1, \ldots, g_n$ (typically ordinary-smooth and super-smooth cases, or boxcar blurring functions).
Estimating the derivative of a function from noisy and blurred observations is of paramount importance in many fields such as signal processing, control, or mathematical physics. For instance, detecting the singularities of $f$, or characterizing its concavity or convexity properties, is a longstanding problem in signal processing. The estimation of derivatives from noisy solutions of ordinary or partial differential equations is typical of many areas of mathematical physics, such as astronomy. Derivative estimation has already been investigated in several standard nonparametric models. Restricting the review to wavelet methods, we refer to [10] for model (2) and to [7, 8, 25] for density estimation problems.

Contributions and relation to prior work
In this paper, under an appropriate ordinary smoothness assumption on $g_1, \ldots, g_n$, we develop an adaptive wavelet-based block estimator $\hat{f}^{(d)}$ of $f^{(d)}$ from (1), $d \in \mathbb{N}$. It is constructed using a periodized Meyer wavelet basis and a particular block thresholding rule which goes by the name of BlockJS; see [1] for the original construction of BlockJS in the standard Gaussian noise model, and [2, 3, 11, 26] for further developments. Adopting the minimax approach under the MISE over Besov balls, we investigate the upper bounds of our estimator. This is featured in Theorem 4.1. We prove that the rates of our estimator are nearly optimal by establishing a lower bound, as stated in Theorem 4.2.
Our work is related to some prior art in the literature. To the best of our knowledge, the closest works are those of [21, 22]. For the case where $d = 0$ and the blurring function is ordinary-smooth or super-smooth, [21, 22, Theorems 1 and 2] provide the upper and lower bounds of the MISE over Besov balls for a block hard thresholding estimator in the functional deconvolution model. These bounds match ours, but only for $d = 0$. In this respect, our work goes one step further, as it tackles the estimation (with a different wavelet estimator) of $f$ and its derivatives. As far as the methods of proof are concerned, we use tools similar to theirs (concentration and moment inequalities, as well as the general result in [11]) for the upper bound, but the proofs of the lower bounds are different. However, unlike [21], we only cover the ordinary-smooth convolution case, while their results also apply to the super-smooth case. On the practical side, for $d = 0$, we show in Section 5 that BlockJS behaves better than block hard thresholding [22, (2.9)] on several test functions containing different degrees of irregularity.

Paper organization
The paper is organized as follows. Section 2 gives a brief account of periodized Meyer wavelets and Besov balls. Section 3 states the ordinary smoothness assumption on $g_1, \ldots, g_n$, and then constructs the BlockJS-based estimator. The minimax upper and lower bounds of this estimator are investigated in Section 4. Section 5 describes and discusses the simulation results, before conclusions are drawn in Section 6. The proofs are deferred to Section 7.

Periodized Meyer Wavelets
We consider an orthonormal wavelet basis generated by dilations and translations of a "father" Meyer-type wavelet $\phi$ and a "mother" Meyer-type wavelet $\psi$. These wavelets enjoy the following features.
• They are smooth and frequency band-limited, i.e. the Fourier transforms of $\phi$ and $\psi$ have compact supports (supp denotes the support).
• The functions $(\phi, \psi)$ are $C^\infty$ since their Fourier transforms have compact supports, and $\psi$ has an infinite number of vanishing moments since its Fourier transform vanishes in a neighborhood of the origin.
• If the Fourier transforms of $\phi$ and $\psi$ are also in $C^m$ for a chosen $m \in \mathbb{N}$, then it can easily be shown that $\phi$ and $\psi$ obey the decay estimate (5) for every $t \in \mathbb{R}$.
Let $l \ge j_*$. Any function $h \in L^2_{\mathrm{per}}([0,1])$ can be expanded into a wavelet series whose approximation and detail coefficients are given by (6). See [20, Vol. 1, Chapter III.11] for a detailed account of periodized orthonormal wavelet bases.
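In the standard notation for a periodized wavelet basis with coarsest level $l$, such an expansion takes the form

```latex
h(t) = \sum_{k=0}^{2^{l}-1} \alpha_{l,k}\,\phi_{l,k}(t)
     + \sum_{j=l}^{+\infty} \sum_{k=0}^{2^{j}-1} \beta_{j,k}\,\psi_{j,k}(t),
\qquad
\alpha_{l,k} = \int_{0}^{1} h(t)\,\overline{\phi_{l,k}(t)}\,dt,
\quad
\beta_{j,k} = \int_{0}^{1} h(t)\,\overline{\psi_{j,k}(t)}\,dt .
```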

Besov balls
Let $0 < M < \infty$, $s > 0$, $1 \le p, r \le \infty$. Among the several characterizations of Besov spaces for periodic functions in $L^p([0,1])$, we focus on the usual one based on the coefficients in a sufficiently $q$-regular (periodized) wavelet basis ($q = \infty$ for Meyer wavelets). More precisely, we say that a function $h$ belongs to the Besov ball $B^s_{p,r}(M)$ if and only if $\int_0^1 |h(t)|^p\,dt \le M$ and there exists a constant $M^* > 0$ (depending on $M$) such that the associated wavelet coefficients (6) satisfy the standard sequence-space condition, with smoothness parameter $0 < s < q$ and norm parameters $p$ and $r$.
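In the standard sequence-space characterization (stated here for $p, r < \infty$; the usual supremum modifications apply otherwise), the condition on the wavelet coefficients reads:

```latex
\left( \sum_{j \ge l} \left( 2^{j\left(s + \frac{1}{2} - \frac{1}{p}\right)}
\Bigl( \sum_{k=0}^{2^{j}-1} |\beta_{j,k}|^{p} \Bigr)^{1/p} \right)^{\! r} \right)^{1/r}
\le M^{*} .
```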
Besov spaces capture a variety of smoothness features in a function including spatially inhomogeneous behavior, see [20].

The ordinary smoothness assumption
In this study, we focus on the following ordinary smoothness assumption on $g_1, \ldots, g_n$: there exist three constants $c_g > 0$, $C_g > 0$ and $\delta > 1$, and $n$ positive real numbers $\sigma_1, \ldots, \sigma_n$, such that the two-sided bound (8) on $|\mathcal{F}_\ell(g_v)|$ holds for any $\ell \in \mathbb{Z}$ and any $v \in \{1, \ldots, n\}$. This assumption controls the decay of the Fourier coefficients of $g_1, \ldots, g_n$, and thus their smoothness. It is a standard hypothesis in nonparametric estimation for deconvolution problems; see e.g. [24], [16] and [18].
Example 3.1. Let $\tau_1, \ldots, \tau_n$ be $n$ positive real numbers. For any $v \in \{1, \ldots, n\}$, consider the square-integrable 1-periodic function $g_v$ defined so that, for any $\ell \in \mathbb{Z}$, its Fourier coefficients satisfy the ordinary smoothness bounds (8). In the sequel, we set $\rho_n$ as indicated above. For a technical reason which is not restrictive at all (see Section 7), we suppose that $\rho_n \ge e$.
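Since the explicit kernel of Example 3.1 is not reproduced above, the snippet below illustrates the ordinary smoothness assumption with a stand-in of our own: a 1-periodic kernel built directly from the Fourier coefficients $\mathcal{F}_\ell(g) = (1+\ell^2)^{-\delta/2}$, for which the two-sided bound (8) holds trivially with $c_g = C_g = 1$. The function name and construction are ours, not the paper's.

```python
import numpy as np

def ordinary_smooth_kernel(T, delta):
    """Sample, on a grid of T points, a 1-periodic kernel whose Fourier
    coefficients are F_ell = (1 + ell^2)^(-delta/2), so that the
    ordinary-smoothness decay |F_ell| ~ |ell|^(-delta) holds exactly."""
    ell = np.fft.fftfreq(T, d=1.0 / T)            # integer frequencies
    F = (1.0 + ell ** 2) ** (-delta / 2.0)
    # inverse DFT; scaled by T so that fft(g)/T recovers F exactly
    return np.real(np.fft.ifft(F)) * T
```

One can check numerically that the discrete Fourier coefficients of the returned samples match the prescribed decay.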

BlockJS estimator
We suppose that $f^{(d)} \in L^2_{\mathrm{per}}([0,1])$ and that the ordinary smoothness assumption (8) holds, where $\delta$ refers to the exponent in that assumption. We are now ready to construct our adaptive procedure for the estimation of $f^{(d)}$.
Let $A_j = \{1, \ldots, \lfloor 2^j L^{-1} \rfloor\}$ be the set indexing the blocks at resolution $j$. For each $j$, let $\{B_{j,K}\}_{K \in A_j}$ be a uniform and disjoint covering of $\{0, \ldots, 2^j - 1\}$, i.e. $\bigcup_{K \in A_j} B_{j,K} = \{0, \ldots, 2^j - 1\}$ and $B_{j,K} \cap B_{j,K'} = \emptyset$ for any $K \neq K'$, each block being of length $L$. We define the Block James-Stein (BlockJS) estimator of $f^{(d)}$ by (10), where, for any resolution $j$ and position $k \in B_{j,K}$ within the $K$th block, the wavelet coefficients of $f^{(d)}$ are estimated via the shrinkage rule (11), with $(a)_+ = \max(a, 0)$ for any $a \in \mathbb{R}$, $\lambda > 0$, and where $\hat\alpha_{j_1,k}$ and $\hat\beta_{j,k}$ are respectively the empirical approximation and detail coefficients. Notice that, thanks to (3), these empirical coefficients are well defined for any $j \in \{j_1, \ldots, j_2\}$ and $k \in \{0, \ldots, 2^j - 1\}$.

Let $\hat f^{(d)}$ be the estimator defined by (10) with a large enough $\lambda$. Then there exists a constant $C > 0$ such that, for any $M > 0$, $p \ge 1$, $r \ge 1$, $s > 1/p$ and $n$ large enough, the MISE of $\hat f^{(d)}$ over $B^s_{p,r}(M)$ is bounded by $C \varphi_n$. Theorem 4.1 will be proved using the more general result [11, Theorem 3.1]. To apply this result, two conditions on the wavelet coefficient estimators are required: a moment condition and a concentration condition. They are established in Propositions 7.2 and 7.3; see Section 7.
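To fix ideas, the block shrinkage step can be sketched in code. The snippet below is a schematic illustration only: it applies a generic James-Stein-type shrinkage factor blockwise, with a single scalar `eps2` standing in for the level-dependent quantity (involving $2^{2\delta j}$ and $\rho_n^{-2}$) that enters the paper's rule (11); the function name and signature are ours, not the paper's.

```python
import numpy as np

def blockjs_threshold(beta_hat, block_size, lam, eps2):
    """Blockwise James-Stein shrinkage of empirical wavelet coefficients
    at one resolution level.

    beta_hat   : 1-D array of empirical detail coefficients at level j
    block_size : block length L
    lam        : threshold constant lambda > 0
    eps2       : stand-in for the level-dependent noise factor of (11)
    """
    beta = np.asarray(beta_hat, dtype=float).copy()
    for start in range(0, len(beta), block_size):
        blk = beta[start:start + block_size]
        energy = np.mean(np.abs(blk) ** 2)   # (1/L) * sum of |beta|^2
        # keep the block (mildly shrunk) if its mean energy exceeds
        # lam * eps2, kill it otherwise; (.)_+ = max(., 0)
        shrink = 0.0 if energy == 0 else max(1.0 - lam * eps2 / energy, 0.0)
        beta[start:start + block_size] = blk * shrink
    return beta
```

For instance, a block of large coefficients is kept almost intact (shrinkage factor close to 1), while a block whose mean energy falls below $\lambda \cdot \texttt{eps2}$ is set to zero.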

Minimax lower-bound for the MISE
We now turn to the lower bound of the MISE, in order to formally answer the question of whether $\varphi_n$ is indeed the optimal rate of convergence. This is the goal of Theorem 4.2, which gives a positive answer.

Theorem 4.2. Consider the model (1) and recall that we want to estimate $f^{(d)}$ with $d \in \mathbb{N}$. Assume that (8) is satisfied. Then there exists a constant $c > 0$ such that, for any $M > 0$, $p \ge 1$, $r \ge 1$, $s > 1/p$ and $n$ large enough, the stated lower bound holds, where the infimum defining the minimax risk is taken over all estimators $\tilde f^{(d)}$ of $f^{(d)}$.
It can then be concluded from Theorems 4.1 and 4.2 that the rate of convergence $\varphi_n$ achieved by $\hat f^{(d)}$ is nearly optimal in the minimax sense. The near-minimaxity is only due to the case $p \in [1, 2)$ and $s > (1/p - 1/2)(2\delta + 2d + 1)$, where there is an extra logarithmic term.

Simulations results
In the following simulation study, we consider the problem of estimating one of the derivatives of a function $f$ from the heteroscedastic multichannel deconvolution model (1). Three test functions ("Wave", "Parabolas" and "TimeShiftedSine", initially introduced in [19]), representing different degrees of smoothness, were used (see Fig. 1). The "Wave" function was used to illustrate the performance of our estimator on a smooth function. Note that the "Parabolas" function has big jumps in its second derivative.
We compared the numerical performance of BlockJS to state-of-the-art thresholding methods from the literature. In particular, we considered the block estimator of [22] and two term-by-term thresholding methods: classical hard thresholding, and the non-negative garrote (introduced in wavelet estimation by [17]). In the sequel, we refer to the estimator of [22] as 'BlockH', to that of [17] as 'TermJS', and to our estimator as 'BlockJS'. For the numerical implementation, the test functions were finely discretized by taking $T$ equispaced samples $t_i = i/T \in [0,1]$, $i = 0, \ldots, T-1$. The deconvolution estimator was efficiently implemented in the Fourier domain, given that Meyer wavelets are frequency band-limited. The performance of the estimators is measured in terms of the peak signal-to-noise ratio (PSNR), expressed in decibels (dB). For any $v \in \{1, \ldots, n\}$, the blurring function $g_v$ is that of Example 3.1 and was used throughout all experiments.
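The exact PSNR normalization used in the paper is elided in this excerpt; a common convention, adopted here purely for illustration, takes the peak to be $\max_t |f(t)|$:

```python
import numpy as np

def psnr(f_true, f_est):
    """Peak signal-to-noise ratio in dB:
    PSNR = 10 * log10(peak^2 / MSE), with peak = max |f_true|.
    (One of several conventions; the paper's exact formula is elided.)"""
    f_true = np.asarray(f_true, dtype=float)
    f_est = np.asarray(f_est, dtype=float)
    mse = np.mean((f_true - f_est) ** 2)
    peak = np.max(np.abs(f_true))
    return 10.0 * np.log10(peak ** 2 / mse)
```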

Monochannel simulation
As an example of homoscedastic monochannel reconstruction (i.e. $n = 1$), we show in Fig. 2 estimates obtained using the BlockJS method from $T = 4096$ equispaced samples generated according to (1), with a blurred signal-to-noise ratio (BSNR) of 25 dB (the BSNR being defined from the noise variance $\epsilon^2$ and the sample size $T$). For $d = 0$, the results are very effective for each test function, and the singularities are well estimated. The estimator also does a good job at estimating the first- and second-order derivatives, although the estimation quality decreases as the order of the derivative increases. This is in agreement with the predictions of the minimaxity results. We then compared the performance of BlockJS with BlockH. The blurred signals were corrupted by zero-mean white Gaussian noise such that the BSNR ranged from 10 to 40 dB. The PSNR values, averaged over 10 noise realizations, are depicted in Fig. 3 for $d = 0$, $d = 1$ and $d = 2$, respectively. One can see that our BlockJS thresholding estimator produces quite accurate estimates of $f$, $f'$ and $f''$ for each test function. These results clearly show that our approach compares favorably to BlockH, and that BlockJS has good adaptive properties over a wide range of noise levels in the monochannel setting.

Multichannel simulation
A first point we would like to highlight is that some choices of $\sigma_1, \ldots, \sigma_n$ can severely impact the performance of the estimators. To illustrate this, we show in Fig. 4 an example of first-derivative estimates obtained using BlockJS from $n = 10$ channels with $T = 4096$ samples and a noise level corresponding to BSNR = 25 dB, for $\sigma_v = v$ (dashed blue) and $\sigma_v$ randomly generated in $(0, +\infty)$ (solid blue). With randomly generated $\sigma_v$, we observe a significant PSNR improvement of up to 6.85 dB for the first derivative of TimeShiftedSine. Note that this improvement is marginal (about 0.60 dB) for the most regular test signal (i.e. Wave).
We finally report a simulation study quantitatively comparing BlockJS to the other thresholding estimators described above. For each test function, we generated $T = 4096$ equispaced samples on $[0,1]$ according to (1), with the number of channels varying from $n = 10$ to $100$.
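The data generation step can be sketched as follows. This is a discrete stand-in for model (1): each channel is a circular convolution of $f$ with a blurring kernel plus heteroscedastic white Gaussian noise, with a global noise scale set from a target BSNR. The moving-average kernel is a placeholder for the kernels of Example 3.1, and the function name and BSNR calibration are our own assumptions.

```python
import numpy as np

def generate_channels(f, sigmas, bsnr_db, kernel_width=5, seed=0):
    """Simulate n = len(sigmas) blurred and noisy channels from a
    discretized signal f (circular convolution computed via the FFT)."""
    rng = np.random.default_rng(seed)
    T = len(f)
    g = np.zeros(T)
    g[:kernel_width] = 1.0 / kernel_width         # placeholder blur
    blurred = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))
    # calibrate the base noise level eps from the target blurred SNR
    eps = np.sqrt(np.var(blurred) * 10.0 ** (-bsnr_db / 10.0))
    return [blurred + s * eps * rng.standard_normal(T) for s in sigmas]
```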
Table 1 summarizes the results. It shows in particular that BlockJS consistently outperforms the other methods in terms of PSNR in almost all cases. As expected and predicted by our theoretical findings, on the one hand, the performance gets better as the number of channels increases; on the other hand, it degrades with increasing noise level and/or $d$. Indeed, estimating the derivatives at BSNR = 10 dB is rather difficult, especially for functions with highly irregular derivatives such as "Parabolas" (which has big jumps in its second derivative, see Fig. 2(d)).

Conclusion and perspectives
In this work, an adaptive wavelet block thresholding estimator was constructed to estimate the derivatives of a function $f$ from the heteroscedastic multichannel deconvolution model. Under an ordinary smoothness assumption on $g_1, \ldots, g_n$, it was proved to be nearly optimal in the minimax sense. Practical comparisons to state-of-the-art methods demonstrated the usefulness and efficiency of adaptive block thresholding methods for estimating a function $f$ and its first derivatives in the functional deconvolution setting.
It would be interesting to consider the case where the $g_v$ are unknown, which occurs in many practical situations. Another interesting perspective would be to extend our results to a multidimensional setting. These aspects require further investigation, which we leave for future work.

Proofs
In the following proofs, $c$ and $C$ denote positive constants whose values may change from one occurrence to another.

Preparatory results
In the three following results, we consider the framework of Theorem 4.1 and, for any integer $j \ge j_*$ and $k \in \{0, \ldots, 2^j - 1\}$, we let $\beta_{j,k} = \int_0^1 f^{(d)}(t) \psi_{j,k}(t)\, dt$ denote the wavelet coefficients (6) of $f^{(d)}$.

Proposition 7.1 (Gaussian distribution of the wavelet coefficient estimators). For any integer $j \ge j_*$ and $k \in \{0, \ldots, 2^j - 1\}$, the empirical coefficients $\hat\alpha_{j,k}$ and $\hat\beta_{j,k}$ are Gaussian.

Proof. Let us prove the second point; the first one can be proved in a similar way. For any $\ell \in \mathbb{Z}$ and any $v \in \{1, \ldots, n\}$, we start from the Fourier coefficients of the observations, which, by (1), can be expressed in terms of $\mathcal{F}_\ell(f)$ and $\mathcal{F}_\ell(g_v)$. Note that, since $f$ is 1-periodic, for any $u \in \{0, \ldots, d\}$, $f^{(u)}$ is 1-periodic. By classical properties of Fourier series, for any $\ell \in \mathbb{Z}$, we have $\mathcal{F}_\ell(f^{(d)}) = (2\pi i \ell)^d\, \mathcal{F}_\ell(f)$. Using (12), and since $\{e^{-2\pi i \ell \cdot}\}_{\ell \in \mathbb{Z}}$ is an orthonormal basis of $L^2_{\mathrm{per}}([0,1])$ and $W_1(t), \ldots, W_n(t)$ are i.i.d. standard Brownian motions, the resulting noise terms form a sequence of i.i.d. random variables with common distribution $\mathcal{N}(0,1)$. Therefore Proposition 7.1 is proved.

Proposition 7.3 (Concentration inequality).
There exists a constant $\lambda > 0$ such that, for any $j \in \{j_1, \ldots, j_2\}$, any $K \in A_j$ and $n$ large enough, the stated concentration inequality holds.

Proof. We need the Cirelson inequality, stated in Lemma 7.1 below.
Lemma 7.1 ([12]). Let $(\vartheta_t)_{t \in D}$ be a centered Gaussian process. If $\mathbb{E}(\sup_{t \in D} \vartheta_t) \le N$ and $\sup_{t \in D} \mathrm{Var}(\vartheta_t) \le V$, then, for any $x > 0$, we have
$$\mathbb{P}\Big(\sup_{t \in D} \vartheta_t \ge x + N\Big) \le \exp\Big(-\frac{x^2}{2V}\Big).$$
For the sake of notational clarity, let $V_{j,k} = \hat\beta_{j,k} - \beta_{j,k}$. Recall that, by Proposition 7.1, we have $V_{j,k} \sim \mathcal{N}(0, \rho_n^{-2} \sigma^2_{j,k})$, where $\sigma^2_{j,k}$ is given in (14). Let $\mathcal{B}(1)$ be the unit $\ell^2$-ball in $\mathbb{C}^{\mathrm{Card}(B_{j,K})}$, i.e. $\mathcal{B}(1) = \{a \in \mathbb{C}^{\mathrm{Card}(B_{j,K})} : \sum_{k \in B_{j,K}} |a_k|^2 \le 1\}$. For any $a \in \mathcal{B}(1)$, let $Z(a)$ be the associated centered Gaussian process. By a simple duality (Legendre-Fenchel conjugacy) argument, we have
$$\sup_{a \in \mathcal{B}(1)} Z(a) = \Big( \sum_{k \in B_{j,K}} |V_{j,k}|^2 \Big)^{1/2}.$$
Now, let us determine the values of $N$ and $V$ appearing in the Cirelson inequality.

Minimaxity results of BlockJS over Besov balls

Minimax upper-bound for the MISE

Theorem 4.1 below determines the rates of convergence achieved by $\hat f^{(d)}$ under the MISE over Besov balls.

Theorem 4.1. Consider the model (1) and recall that we want to estimate $f^{(d)}$ with $d \in \mathbb{N}$. Assume that $(\phi, \psi)$ satisfy (5) for some $m \ge d$ and that (8) is satisfied.