Adaptive procedure for Fourier estimators: application to deconvolution and decompounding

Abstract: The purpose of this paper is twofold. First, we introduce a new adaptive procedure to select the optimal, up to a logarithmic factor, cutoff parameter for Fourier density estimators. Two inverse problems are considered: deconvolution and decompounding. Deconvolution is a typical inverse problem for which our procedure is numerically simple and stable; a comparison with penalized techniques is performed. Moreover, the procedure and the proof of the oracle bounds do not rely on any knowledge of the noise term. Second, for decompounding, i.e. non-parametric estimation of the jump density of a compound Poisson process from the observation of n increments at timestep Δ, we build a unified adaptive estimator which is optimal, up to a logarithmic factor, regardless of the behavior of Δ. MSC 2010 subject classifications: primary 62C12, 62C20; secondary 62G07.


Adaptive procedure
In the literature on non-parametric statistics a lot of space is dedicated to adaptive procedures. Adaptivity may be understood as minimax-adaptivity, i.e. optimal rates of convergence are attained simultaneously over a collection of classes of densities, such as Sobolev balls. Adaptivity may also refer to proving nonasymptotic oracle bounds, i.e. having a procedure that mimics, up to a constant, the estimator that minimizes a given loss function. It is this last notion of adaptivity that we adopt here.
Hereafter, we propose an approach that is relevant for inverse problems when the estimator relies on Fourier techniques. This method is inspired by the one introduced in Duval and Kappus [20] and is generalized to the deconvolution and decompounding inverse problems. We present this procedure below in a general context, even though in this article we study oracle bounds for two specific inverse problems: deconvolution and decompounding.

Heuristic of the adaptive procedure
Notations. We first introduce some notations which are used throughout the rest of the text. Given a random variable Z, ϕ_Z(u) = E[e^{iuZ}] denotes the characteristic function of Z. For f ∈ L¹(R), Ff(u) = ∫ e^{iux} f(x) dx is the Fourier transform of f. Moreover, we denote by ‖·‖ the L²-norm of functions, ‖f‖² := ∫ |f(x)|² dx. Given some function f ∈ L¹(R) ∩ L²(R), we denote by f_m the uniquely defined function with Fourier transform Ff_m = (Ff) 1_{[−m,m]}.
General statistical setting. Consider n i.i.d. realizations Y_j, 1 ≤ j ≤ n, of a random variable Y with Lebesgue density f_Y. Suppose Y is related to a variable X, with Lebesgue density f, through a known transformation T relating their characteristic functions, ϕ_Y = T(ϕ_X), where T admits a continuous inverse. To estimate the density f of X from the (Y_j), we build an estimator ϕ_{X,n} of ϕ_X by applying T^{−1} to the empirical characteristic function of the observations; cutting off in the spectral domain and applying Fourier inversion then yields an estimator f_m of f. Its performance is measured with an L²-loss, and the choice of the cutoff parameter m is crucial. The optimal cutoff m* is the one minimizing the L²-risk E‖f − f_m‖². This optimal value m* usually depends on the unknown regularity of f and is hence not feasible. An adaptive optimal procedure consists in selecting a random cutoff m_n, calculated from the observations, for which the L²-risk is close to the one of f_{m*}, meaning that one can establish an oracle bound E‖f − f_{m_n}‖² ≤ C E‖f − f_{m*}‖² + r_n, for a positive constant C and r_n a negligible remainder. Then, f_{m_n} is called an adaptive rate-optimal estimator of f.
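As a concrete illustration, the generic plug-in scheme above can be sketched numerically. The code below is a minimal sketch, not the paper's implementation: `ecf` and `plugin_density` are hypothetical helper names, `ecf` computes the empirical characteristic function, and `plugin_density` applies a user-supplied inverse map T^{−1}, cuts off at |u| ≤ m and inverts the Fourier transform on a grid. In the direct case T is the identity.

```python
import numpy as np

def ecf(y, u):
    """Empirical characteristic function phi_{Y,n}(u) = (1/n) sum_j exp(i*u*Y_j)."""
    return np.mean(np.exp(1j * np.outer(u, y)), axis=1)

def plugin_density(y, x_grid, m, T_inv, n_u=512):
    """Plug-in Fourier density estimator: apply T^{-1} to the empirical
    characteristic function, cut off at |u| <= m, then Fourier-invert."""
    u = np.linspace(-m, m, n_u)
    phi_x = T_inv(ecf(y, u))          # estimate of phi_X on [-m, m]
    du = u[1] - u[0]
    # f_m(x) = (1/2pi) * integral over [-m, m] of exp(-i*u*x) * phi_X(u) du
    f = np.real(np.exp(-1j * np.outer(x_grid, u)) @ phi_x) * du / (2 * np.pi)
    return np.clip(f, 0.0, None)      # a density is non-negative

# Direct observation (T = identity): recover a standard normal density.
rng = np.random.default_rng(0)
y = rng.standard_normal(5000)
x_grid = np.linspace(-3, 3, 61)
f_hat = plugin_density(y, x_grid, m=5.0, T_inv=lambda phi: phi)
```

The mid-grid value `f_hat[30]` then approximates the standard normal density at 0, about 0.399.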
Heuristic of the adaptive procedure. Suppose there exists a function F_{ϕ_Y}, possibly depending on T and ϕ_Y, such that for some positive constant C the error of T^{−1} applied to the empirical characteristic function is controlled by F_{ϕ_Y} times the error of the empirical characteristic function itself; this is the case e.g. if T^{−1} is Fréchet differentiable, and the quantity F_{ϕ_Y} = (T^{−1})′(ϕ_Y) is explicit in the deconvolution case. An upper bound (1.2) on the risk then follows, whose second term is a majorant of the integrated variance of the estimator. If the upper bound (1.2) is optimal, meaning that it has the same order as the risk (1.1), asymptotically we get m* ≍ m_n, where m_n is such that the bias-variance compromise in the right-hand side of (1.2) is realized. To compute m_n, we differentiate the right-hand side of (1.2) in m, giving the first-order condition (1.3) that m_n satisfies. Clearly, (1.3) has an empirical version and we select m_n accordingly, as a solution in m ∈ [0, n] of this empirical equation (1.4), for some κ > 0, possibly depending on n. As the solution of (1.4) may not be unique, we consider one element of this set, such as its maximum.
Relation to other works There exist numerous techniques for adaptivity; we mention some of them together with a non-exhaustive list of references. Loosely speaking, there exist three main approaches: thresholding techniques for wavelet density estimators (see e.g. [16,17,36]), penalized estimators (see e.g. [3,1,31,29]) and pairwise comparison of estimators such as the Goldenshluger and Lepskii procedure (see e.g. [23,24,27]). These techniques have been developed for different inverse problems and in anisotropic multidimensional settings. All the aforementioned methods rely on the choice of a parameter to be calibrated by the practitioner, such as κ in (1.4). The numerical performance of the selected estimator is sensitive to this choice and many studies have been devoted to the calibration of this parameter (see e.g. Baudry et al. [2] and Lacour et al. [27]). An advantage of the procedure presented here is that, in the cases considered, for all the values of κ such that the oracle bound is valid, the corresponding estimator is reasonable.
Many adaptive procedures, such as penalization methods, minimize an empirical version of the upper bound (1.2), while the spirit of (1.4) consists in finding the zeroes of an empirical version of the derivative in m of the upper bound (1.2). Roughly speaking, the difference between our procedure and a penalization procedure is the same as the difference between Z-estimators and M-estimators.
Adaptation to the deconvolution problem We consider the deconvolution problem as it is an archetypal inverse problem that has been extensively studied in the literature (see the references in Section 2). Moreover, it is a building block of many inverse problems. One observes n i.i.d. realizations of Y_i = X_i + ε_i, where X_i and ε_i are independent, the density of ε_1 is known and the density f of X_1 is the quantity of interest. Optimal rates of convergence depend on the asymptotic decay of the characteristic function ϕ_ε of the noise ε; one usually distinguishes the ordinary smooth case, when ϕ_ε decays polynomially to 0, from the super smooth case, when ϕ_ε decays exponentially to 0.
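To fix ideas, the multiplicative relation ϕ_Y = ϕ_X ϕ_ε induced by the independence in Y = X + ε is easy to check by simulation. The sketch below uses illustrative choices of our own (Laplace signal, Gaussian noise) and compares the empirical characteristic function of Y with the product of the two known characteristic functions at one frequency.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.laplace(size=n)                 # signal with density f (Laplace here)
eps = rng.normal(scale=0.5, size=n)     # Gaussian noise: a super smooth case
y = x + eps                             # observations

u = 1.2
phi_y_emp = np.mean(np.exp(1j * u * y))   # empirical c.f. of Y at frequency u
phi_x = 1.0 / (1.0 + u**2)                # c.f. of Laplace(0, 1)
phi_eps = np.exp(-0.5 * (0.5 * u)**2)     # c.f. of N(0, 0.25)
# independence of X and eps gives phi_Y(u) = phi_X(u) * phi_eps(u)
```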
In this context our procedure presents many advantages. From a theoretical point of view, our procedure and the proof establishing oracle inequalities are the same in both the ordinary smooth and super smooth cases, whereas usual adaptive procedures study these two cases separately. Moreover, the proof of the oracle bound is rather elegant and relies on a fine cutting of the quadratic risk: the most powerful result involved is a Hoeffding concentration inequality. Usually, the tools used to establish oracle inequalities rely on more demanding concentration results. From a numerical point of view, our procedure is simple and, for all the possible choices of the hyperparameter κ in (1.4) predicted by the theory, the associated estimators are relevant. We conduct an extensive simulation study which illustrates the stability of the procedure. We compare our results with a penalization procedure described in Comte and Lacour [12], which is known to be fast and efficient in deconvolution contexts.

A unified estimator for decompounding
Consider a compound Poisson process Z_t = Σ_{j=1}^{N_t} X_j, where N is a Poisson process with intensity λ independent of the i.i.d. random variables (X_j)_{j∈N} with common density f. In the decompounding problem, one discretely observes one trajectory of Z at sampling rate Δ > 0 over [0, T]: (Z_{iΔ}, i = 1, . . . , n), where n = ⌊T/Δ⌋. The aim is to estimate f from these observations. This model is central in many applied fields, e.g. statistical physics, biology, financial series or mathematical insurance, as it is well adapted to the study of phenomena where random independent events occur at random times. For instance, in ruin theory these events can model the claims that insurance companies have to pay to their subscribers; this is the Cramér-Lundberg model (see Embrechts et al. [21]).
In the literature, the cases Δ → 0 (high frequency observations) and Δ fixed (low frequency observations, often Δ = 1) have received a lot of attention (see the references given in Section 3) and are usually considered separately. Here we propose a unified strategy which is valid regardless of the behavior of Δ := Δ_n → Δ_0 ∈ [0, ∞). The dependency in Δ_0 of the upper bound is made explicit and shows a deterioration as Δ_0 increases. These results complement the knowledge on decompounding. Moreover, the estimator remains consistent in cases where Δ grows to infinity slowly. This latter result is not straightforward: if the jump density is centered and has unit variance, the central limit theorem gives that Z_Δ/√(λΔ) converges in distribution to a standard Gaussian as Δ → ∞. Therefore, one would expect that in these regimes non-parametric estimation of f is impossible, as each increment is close, in law, to a parametric Gaussian variable. When Δ grows too rapidly to infinity, namely as a power of nΔ_n, Duval [19] shows that consistent non-parametric estimation of f is impossible, regardless of the choice of the loss function. Having a consistent estimator for f when Δ gets large is interesting: usually a Gaussian approximation is made to simplify computations, at the expense of losing the specificities of the jump density (see e.g. Cont and de Larrard [15]).
Finally, we show that our adaptive procedure can be extended to this case and leads to an adaptive and rate-optimal estimator of the jump density f, up to a logarithmic loss, for all sampling rates such that Δ_n < (1/4) log(nΔ_n) as n → ∞; this condition is fulfilled for fixed or vanishing Δ.
Organisation of the article In Section 2 we establish and prove an oracle inequality based on our procedure for the deconvolution problem. We illustrate its numerical performance and compare it with a penalized adaptive optimal estimator given in [12]. Section 3 is dedicated to the decompounding problem. Finally, Section 4 gathers the proofs of the results of Section 3.

Statistical setting
Suppose that X 1 , . . . , X n are i.i.d. with density f and are accessible through the noisy observations Y j = X j + ε j , j = 1, . . . , n.
Assume that the (ε_j) are i.i.d., independent of the (X_j) and such that ϕ_ε(u) ≠ 0 for all u ∈ R. Suppose that the distribution of ε_1 is known. This last assumption can be relaxed: the procedure allows a straightforward generalization to the case where the distribution of ε_1 can be estimated from an additional sample, see Neumann [32]. Then, the mapping T defined in Section 1.1 is given by T : ϕ → ϕϕ_ε, which is a continuous mapping with inverse T^{−1} : ϕ → ϕ/ϕ_ε, and (T^{−1})′ is equal to 1/ϕ_ε. A deconvolution estimator of the characteristic function ϕ_X of X is given by ϕ_{X,n} := ϕ_{Y,n}/ϕ_ε, where ϕ_{Y,n}(u) := (1/n) Σ_{j=1}^n e^{iuY_j} denotes the empirical characteristic function. Since ϕ_X is a characteristic function, its absolute value is bounded by 1 and the estimator can hence be improved by projecting its modulus onto [0, 1] (2.1). Cutting off in the spectral domain and applying Fourier inversion gives the estimator f_m (2.2). This estimator and adaptation techniques have been extensively studied in the literature, including in more general settings than above. Optimal rates of convergence and adaptive procedures are well known in dimension d = 1 (see e.g. [8,37,22,5,6,7,34,14] for L²-loss functions or [30] for the L^∞-loss). Results have also been established for multivariate anisotropic densities (see e.g. [13] for L²-loss functions or [35,28] for L^p-loss functions, p ∈ [1, ∞]).
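The estimator just described can be sketched numerically as follows: divide the empirical characteristic function by the known ϕ_ε, clip the modulus at 1, cut off at m and Fourier-invert. The function name and the discretization choices below are ours, not the paper's.

```python
import numpy as np

def deconvolve(y, x_grid, m, phi_eps, n_u=201):
    """Deconvolution density estimator with cutoff m: the empirical c.f. of Y
    is divided by the (known, non-vanishing) noise c.f., its modulus is
    clipped at 1, and the result is Fourier-inverted over [-m, m]."""
    u = np.linspace(-m, m, n_u)
    phi_yn = np.mean(np.exp(1j * np.outer(u, y)), axis=1)
    phi_xn = phi_yn / phi_eps(u)
    mod = np.abs(phi_xn)
    phi_xn = np.where(mod > 1, phi_xn / mod, phi_xn)  # enforce |phi_X| <= 1
    du = u[1] - u[0]
    return np.real(np.exp(-1j * np.outer(x_grid, u)) @ phi_xn) * du / (2 * np.pi)

# Example: standard normal signal, Laplace (ordinary smooth) noise.
rng = np.random.default_rng(2)
n, b = 20_000, 0.5
y = rng.standard_normal(n) + rng.laplace(scale=b, size=n)
f_hat0 = deconvolve(y, np.array([0.0]), m=4.0,
                    phi_eps=lambda u: 1.0 / (1.0 + (b * u)**2))[0]
```

Here `f_hat0` estimates the standard normal density at 0, about 0.399.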

Risk bounds and adaptive bandwidth selection
The following risk bound is well known in the literature on deconvolution.
This upper bound is the sum of a bias term that decreases with m and a variance term that increases with m. We select m_n, the optimal cutoff parameter, such that the upper bound (2.3) is minimal. Differentiating the right-hand side with respect to m, we find the first-order condition (2.4) satisfied by the optimal cutoff parameter. This equality has an empirical version and we select m_n accordingly. In order to ensure adaptivity the following heuristic consideration is helpful. When the characteristic function is replaced by its empirical version, the standard deviation is of order n^{−1/2}. Consequently, estimating ϕ_Y by ϕ_{Y,n} makes sense for |ϕ_Y| ≥ n^{−1/2}. If |ϕ_Y| < n^{−1/2}, the noise is dominant and the estimator can be set to zero. This suggests redefining the estimator of ϕ_Y by thresholding, keeping ϕ_{Y,n}(u) only where |ϕ_{Y,n}(u)| ≥ κ_n n^{−1/2} (2.5), with the threshold value κ_n := 1 + κ√(log n). The constant κ > 0 is specified below and the additional log term is added to ensure good concentration properties on the event considered. Then, the estimator of f given in (2.2) is modified using (2.5) instead of (2.1). We define the empirical cutoff parameter m_n as follows. Since ϕ_{Y,n} may show an oscillatory behavior and the solution of (2.4) may not be unique, we consider the largest frequency in [0, n^α] at which |ϕ_{Y,n}| still exceeds the threshold κ_n n^{−1/2} (2.6), for some α ∈ (0, 1]. It is worth emphasizing that the calculation of m_n relies only on the empirical characteristic function ϕ_{Y,n}, which can be computed from the direct observations, and does not require the evaluation of penalty terms depending on the (perhaps unknown) ϕ_ε. Moreover, this procedure is the same in the super smooth case as well as in the ordinary smooth case. In (2.6), if we set α = 1, m_n is capped at the value n, which is natural: if m ≥ n, the variance term in (2.3) no longer vanishes. It is possible to set 0 < α < 1 if one has additional knowledge on the regularity of f (see the discussion below).
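The selection rule can be implemented in a few lines. The grid-based sketch below is our own simplification of (2.6): it returns the largest frequency in [0, n^α] at which the empirical characteristic function still exceeds the threshold (1 + κ√(log n))/√n.

```python
import numpy as np

def adaptive_cutoff(y, kappa=2.5, alpha=1.0, n_grid=2000):
    """Empirical cutoff in the spirit of (2.6): the largest u in [0, n^alpha]
    with |phi_{Y,n}(u)| >= (1 + kappa*sqrt(log n)) / sqrt(n), on a grid."""
    n = len(y)
    thresh = (1.0 + kappa * np.sqrt(np.log(n))) / np.sqrt(n)
    u = np.linspace(0.0, float(n) ** alpha, n_grid)
    mod = np.abs(np.mean(np.exp(1j * np.outer(u, y)), axis=1))
    above = np.nonzero(mod >= thresh)[0]   # |phi_{Y,n}(0)| = 1, so never empty
    return u[above[-1]]

# For a standard normal sample the selected cutoff is moderate: the true
# c.f. exp(-u^2/2) falls below the threshold at u of order 2.
m_hat = adaptive_cutoff(np.random.default_rng(5).standard_normal(2000))
```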
Proof of Theorem 2.1.
Step 1: An upper bound for f̃_m. Let m > 0; we first establish an upper bound for the estimator f̃_m of f, defined as f_m but whose characteristic function is given by (2.5). Parseval's equality and the definition of the estimator give the starting decomposition. The first term in the right-hand side is bounded by 1/n. Recall that κ_n = 1 + κ√(log n); we decompose the second term on the set where the empirical characteristic function exceeds the threshold and on its complement, where we use the Hoeffding inequality and κ > √2. Finally, gathering all the bounds yields (2.7). Step 2: Adaptation. Using Parseval's equality, we recover a usual variance term. Now we control the surplus in the bias of the estimator f_{m_n}, decomposing T_1 as the sum of a usual bias term and an additional term controlled using the triangle inequality. Along with the definition of m_n, this gives an intermediate bound. Then, the definition of κ_n and (2.7) imply that, on the event E, this surplus is bounded for a positive constant C depending only on the choice of κ. Second, on the complement set E^c, we immediately obtain the corresponding bound. It remains to control the surplus in the variance of f_{m_n} using a further decomposition. Next, using that |ϕ_{X,n}(u)| ≤ 1, we derive the last inequality as a direct consequence of the Hoeffding inequality. Putting the above together, we have shown that, for universal positive constants C_1 and C_3 and a constant C_2 depending only on κ, the bound holds for all m ≥ 0. Taking the infimum over m completes the proof.
Discussion Theorem 2.1 is non-asymptotic and ensures that the estimator f_{m_n} automatically reaches the bias-variance compromise, up to a logarithmic factor and the multiplicative constant C_1.
Comments on the adaptive procedure. Similarly to the grouped-data setting (see [20]), the computation of the adaptive cutoff (2.6) involves only the set where |ϕ_{Y,n}(u)| crosses the threshold (1 + κ√(log n))/√n. Therefore, m_n depends only on the empirical characteristic function of the direct observations Y, and neither on ϕ_ε, the characteristic function of the errors, nor on ϕ_X; the adaptive estimator and the proof of the oracle bound are the same in the super smooth case and the ordinary smooth case. Generalizing the result to the case where the distribution of the error is unknown but where one has e.g. an additional independent i.i.d. sample (ε_1, . . . , ε_N) of the noise should therefore be straightforward.
Comments on the proof. The proof of Theorem 2.1 is self-contained and relies on fine cuttings of the quadratic risk. The most involved tool used is a Hoeffding inequality, whereas usual techniques involve stronger results such as Talagrand inequalities. The interest is that the proof should be robust to small changes in the model.
Note that compared to [20], the proof relies on more direct arguments. Moreover, it permits deriving a stronger result, namely an oracle-type inequality, whereas in [20] we only ensured that, given some regularity class, the optimal rate is achieved on this class.

Choice of the hyperparameters in (2.6)
Regarding the choice of α and κ in (2.6), it is always possible to take α = 1. Note that the case α > 1 is not interesting as, even in the direct problem ε = 0 a.s., if m > n the variance term in (2.3) no longer tends to 0. Taking α < 1 is possible only if one has additional information on the target density f. For instance, suppose one knows that f lies in a Sobolev ball of regularity β, for some β ≥ β_0 > 0, S(β, L) := {f ∈ F : ∫ (1 + u²)^β |Ff(u)|² du ≤ L} (2.8), where F is the set of densities with respect to the Lebesgue measure. Then, it holds that ‖f − f_m‖² ≲ m^{−2β} and straightforward computations lead to m ≍ (n/log n)^{1/(2β+1)} (regardless of the asymptotic decay of ϕ_ε). Then, one may restrict the interval for m_n to [0, n^α] where 1 > α > 1/(2β_0+1). Second, the choice of κ must be such that n^{α−κ²/2} is negligible; the choice κ > 2 always works. The following numerical study illustrates that the procedure is stable in the choice of κ.
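For completeness, the bias-variance balance behind these choices can be written out. This is a standard computation, sketched here for the direct case |ϕ_ε| ≡ 1, where the variance term is of order m log n / n:

```latex
\|f - f_m\|^2 \lesssim m^{-2\beta},
\qquad \text{variance} \lesssim \frac{m \log n}{n},
\qquad
m^{-2\beta} \asymp \frac{m \log n}{n}
\;\Longrightarrow\;
m \asymp \Big(\frac{n}{\log n}\Big)^{\frac{1}{2\beta+1}},
\quad
\text{risk} \lesssim \Big(\frac{\log n}{n}\Big)^{\frac{2\beta}{2\beta+1}}.
```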

Numerical results
Stability of the procedure To illustrate the performance of the method and the influence of the parameter κ we proceed as follows. Fix α = 1, so that the selection rule (2.6) runs over [0, n]. On Figures 1, 2 and 3 we observe that the adaptive risks are small and that the procedure is stable in the choice of κ. We observe, in these three cases, that the value of κ should not be chosen too large but that for a wide range of values the performances are similar. In practice, the value of n is fixed and there is a natural boundary for κ: indeed, it is useless to increase κ once (1 + κ√(log n))n^{−1/2} ≥ 1, as the selection rule (2.6) then constantly selects n^α. Moreover, we expect that if (1 + κ√(log n))n^{−1/2} gets too large, e.g. larger than 1/2, the performance of the adaptive estimator should deteriorate. This practical consideration encourages choosing κ smaller than (√n − 1)/√(log n). Comparison with a penalization procedure We compare the performances of our procedure for κ = 8 with a penalization procedure and with an oracle. For the penalization procedure, we follow Comte and Lacour [12] and consider the adaptive estimator f_{m_n} defined as in (2.2), with the cutoff selected by minimizing a penalized contrast, where M_n > 0, K > 0 and Δ(m) = (1/2π) ∫_{[−m,m]} |ϕ_ε(u)|^{−2} du, which is known in our setting. The parameter M_n is chosen as the maximal integer such that 1 ≤ Δ(M_n)/n ≤ 2. The parameter K is calibrated by preliminary simulation experiments. For calibration strategies (dimension jump and slope heuristics), the reader is referred to Baudry et al. [2]. Here, we test a grid of values of K from the empirical error point of view to make a relevant choice; the tests are conducted on a set of densities different from the ones considered hereafter, to avoid overfitting. After these preliminary experiments, K is chosen equal to 2, which is the same value as the one considered in Comte and Lacour [12]. The standard errors are given in parentheses.
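For reference, the penalized selection we compare against can be sketched as follows. This is our own simplified rendering of a penalized criterion of Comte-Lacour type (minimize an empirical contrast plus K·Δ(m)/n), not their code; the grid and the discretization choices are illustrative.

```python
import numpy as np

def penalized_cutoff(y, m_grid, phi_eps, K=2.0, n_u=200):
    """Penalized cutoff selection: minimize -||f_m_hat||^2 + K*Delta(m)/n over
    m_grid, where Delta(m) = (1/2pi) * int_{-m}^{m} |phi_eps(u)|^{-2} du and
    ||f_m_hat||^2 is computed via Parseval from the empirical c.f."""
    n = len(y)
    best_m, best_crit = m_grid[0], np.inf
    for m in m_grid:
        u = np.linspace(-m, m, n_u)
        du = u[1] - u[0]
        phi_yn = np.mean(np.exp(1j * np.outer(u, y)), axis=1)
        norm2 = np.sum(np.abs(phi_yn / phi_eps(u))**2) * du / (2 * np.pi)
        delta = np.sum(1.0 / np.abs(phi_eps(u))**2) * du / (2 * np.pi)
        crit = -norm2 + K * delta / n        # empirical contrast + penalty
        if crit < best_crit:
            best_m, best_crit = m, crit
    return best_m

rng = np.random.default_rng(6)
b = 0.5
y = rng.standard_normal(5000) + rng.laplace(scale=b, size=5000)
m_grid = np.linspace(1.0, 6.0, 11)
m_pen = penalized_cutoff(y, m_grid, phi_eps=lambda u: 1.0 / (1.0 + (b * u)**2))
```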
The running times of the penalization procedure and of our procedure are similar. However, one should take into account that a preliminary calibration step appears unnecessary in our case. In deconvolution problems, the theoretical optimal K can in some cases be far away from the practically optimal K and may vary with the sample size, explaining the necessity of this calibration step (see e.g. Kappus and Mabon [26], where the practical optimal value of K was much smaller than the value predicted by the theory).
Second, an oracle "estimator" f_m̄ is computed, defined as in (2.2) where m̄ is the oracle bandwidth minimizing the true L²-risk. This oracle can be explicitly evaluated when f is known. We denote these different risks by R for our procedure, R_pen for the penalized estimator and R_or for the oracle procedure. All these risks are computed over 1000 Monte Carlo iterations. The results are gathered in Table 1 for the Gamma density, Table 2 for the mixture and Table 3 for the Cauchy density, where C stands for the Cauchy distribution. In each case both an ordinary smooth and a super smooth error are considered. Comparison of the different methods. Tables 1, 2 and 3 show that all the procedures behave as expected: the L²-risks decrease with n and are smaller in the case of an ordinary smooth deconvolution problem than in the case of a super smooth deconvolution problem. The estimator with the smallest risk is the oracle, and the penalized risks are most of the time smaller than those of our procedure, which is consistent with the fact that our procedure has a logarithmic loss and is asymptotic. More precisely, for small values of n our procedure does not perform as well as the penalized method, but for larger values of n it is competitive. We can exhibit particular cases where our procedure is more stable in the choice of the hyperparameter than the penalized procedure, even for large sample sizes (see Figure 4 for example). This is due to the fact that the penalized constant K that is suitable for small values of n differs from the one suitable for larger values of n.
In practice a logarithmic term in n is added to the penalty; this is theoretically unnecessary and entails a logarithmic loss, but it improves the numerical results. We add this logarithmic term by replacing K = 2 with K log(n)^{2.5}, where K = 0.3 and the multiplying log(n)^{2.5} factor is as suggested in Comte et al. [14]. This second penalized procedure performs well for all values of n and, when n gets large, it has performances similar to our procedure (see Table 4). For our procedure, changing κ for smaller values of n does not improve the results.

Statistical setting
Let Z be a compound Poisson process with intensity λ > 0 and jump density f, i.e. Z_t = Σ_{j=1}^{N_t} X_j, where N is a homogeneous Poisson process with intensity λ, independent of the i.i.d. variables (X_j) with common density f. One trajectory of Z is observed at sampling rate Δ over [0, T], T = nΔ, n ∈ N. Non-parametric estimation of f, or of its Lévy density λf, has been the subject of many papers, among others [4,9,11,18,38] and [25] for the multidimensional setting.
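The observation scheme is easy to simulate: each increment Z_{iΔ} − Z_{(i−1)Δ} sums a Poisson(λΔ) number of i.i.d. jumps. The sketch below is illustrative (the helper name and the standard normal jump density are our choices, not the paper's).

```python
import numpy as np

def cp_increments(lam, delta, n, rng):
    """Simulate n i.i.d. increments of a compound Poisson process with
    intensity lam at sampling rate delta: each increment sums a
    Poisson(lam*delta) number of jumps, here with standard normal density f."""
    counts = rng.poisson(lam * delta, size=n)   # number of jumps per increment
    jumps = rng.standard_normal(counts.sum())   # all jump sizes X_j, i.i.d. ~ f
    idx = np.repeat(np.arange(n), counts)       # which increment owns each jump
    return np.bincount(idx, weights=jumps, minlength=n)

z = cp_increments(lam=1.0, delta=1.0, n=50_000, rng=np.random.default_rng(3))
# Sanity checks: E[Z_delta] = lam*delta*E[X] = 0, Var(Z_delta) = lam*delta*E[X^2] = 1,
# and P(Z_delta = 0) = exp(-lam*delta) since the jump law is continuous.
```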

Upper bound and discussion on the rate
The constraint Δ < (1/4) log(nΔ) is fulfilled for any bounded Δ as nΔ → ∞. Moreover, it allows Δ := Δ_n → 0 as well as Δ_n → ∞, provided the divergence is not too fast. This last point is interesting. An estimator that is optimal simultaneously when Δ is fixed or vanishing, and consistent, optimal up to a logarithmic loss, when Δ tends to infinity, has scarcely been investigated; neither has the estimation problem when the sampling rate goes to infinity. To the knowledge of the authors, the only similar result was released shortly after ours, in Coca [10]. In [10], adaptive optimal non-parametric estimation of the Lévy density (which is related to the jump density) in L^p, p ≥ 1, is studied. Both results are complementary; our estimator is adapted to L² and has the advantage that its definition (both adaptive and non-adaptive) is simpler, leading to more succinct proofs, and we provide a numerical study. In the remainder of this paragraph, we discuss the different rates of convergence implied by Theorem 3.1 according to the behavior of Δ.

Discussion on the rates
The upper bound derived in Theorem 3.1 is the sum of four terms: a bias term, two variance terms, V := e^{4Δ} m/(nΔ) (using that |ϕ_Δ(u)| ≥ e^{−2Δ}) and m/(nΔ), which is always smaller than or of the same order as V, and a remainder. Assume that f lies in the Sobolev ball S(β, L) (see (2.8)). Then, the bias ‖f − f_m‖² has asymptotic order m^{−2β} and we may derive the following rates of convergence.
• Microscopic and mesoscopic regimes. Let Δ = Δ_n be such that Δ_n → Δ_0 ∈ [0, ∞) with nΔ_n → ∞. Then, the bias-variance compromise leads to the choice m = (e^{−4Δ_0} nΔ_n)^{1/(2β+1)} and to the rate of convergence (e^{−4Δ_0} nΔ_n)^{−2β/(2β+1)}, which matches the optimal rates of convergence when Δ_0 is fixed or tends to 0. Indeed, the rate is in T^{−2β/(2β+1)}, with T = nΔ_n denoting the time horizon; it is clearly rate-optimal as it corresponds to the optimal rate of convergence for estimating the jump density of a compound Poisson process from continuous observations (Δ = 0). The constant e^{−4Δ_0} appearing in the rate depends exponentially on Δ_0, which asymptotically has little effect but in practice deteriorates the numerical performances.
• Macroscopic regime. Let Δ = Δ_n → ∞ be such that Δ_n < (1/4) log(nΔ_n). The variance term V tends to 0, so that the estimator is consistent. Heuristically, if Δ goes to infinity the central limit theorem states that Z_Δ is close in law to a parametric Gaussian variable; e.g. if f is centered with unit variance, Z_Δ/√(λΔ) converges in distribution to a standard Gaussian. The fact that f can be consistently estimated is therefore non-trivial. Duval [19] establishes that if Δ = O((nΔ)^δ) for some δ ∈ (0, 1), i.e. when Δ_n goes rapidly to infinity, there exists no consistent non-parametric estimator of f. The fact that estimation is impossible when Δ goes too rapidly to infinity was established through an asymptotic equivalence result: in this case it is always possible to build two different compound Poisson processes for which the statistical experiments generated by their increments are asymptotically equivalent. Therefore, the result of Theorem 3.1 is new in that context. We may distinguish two additional regimes: 1. Slow macroscopic regime. If Δ_n = o(log(nΔ_n)), the choice m = (e^{−4Δ_n} nΔ_n)^{1/(2β+1)} leads to the rate of convergence (e^{−4Δ_n} nΔ_n)^{−2β/(2β+1)}. There is no lower bound in the literature to confirm whether this rate is optimal. However, if Δ goes slowly to infinity, for example if Δ_n = log(log(nΔ_n)), then the rate is ((log(nΔ_n))^{−4} nΔ_n)^{−2β/(2β+1)}, which is rate-optimal up to the logarithmic loss, which itself may not be optimal. 2. Intermediate macroscopic regime. Let Δ_n = δ log(nΔ_n), 0 < δ < 1/4; then m = (nΔ_n)^{(1−4δ)/(2β+1)}, leading to the rate (nΔ_n)^{−2β(1−4δ)/(2β+1)}. This rate deteriorates as δ increases. The limit δ = 1/4 imposed by Theorem 3.1 may not be optimal; no lower bound adapted to this case exists in the literature.
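The intermediate macroscopic rate can be checked in one line: plugging Δ_n = δ log(nΔ_n) into the variance factor turns the exponential constant into a polynomial one,

```latex
e^{4\Delta_n} = (n\Delta_n)^{4\delta},
\qquad
m^{-2\beta} \asymp \frac{m\, e^{4\Delta_n}}{n\Delta_n} = \frac{m}{(n\Delta_n)^{1-4\delta}}
\;\Longrightarrow\;
m \asymp (n\Delta_n)^{\frac{1-4\delta}{2\beta+1}},
\quad
\text{rate } (n\Delta_n)^{-\frac{2\beta(1-4\delta)}{2\beta+1}}.
```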
The interest of the macroscopic regime is mainly theoretical: in practice, if Δ is a large constant, making e^{−4Δ} nΔ large requires a huge number n of observations. However, this regime highlights the role of the sampling rate Δ in the non-parametric estimation of the jump density. Using [19], consistent non-parametric estimation of the jump density is impossible if there exists δ > 0 such that Δ_n is of order (nΔ_n)^δ; the remaining questions are what happens in between, and whether the logarithmic loss in the upper bound appearing when Δ_n → ∞ is avoidable or not. The constant 1/4 in the bound Δ_n < (1/4) log(nΔ_n) of Theorem 3.1 can probably be improved.

Adaptive choice of the cutoff parameter
We consider the optimal cutoff m_n given by the bias-variance compromise. Following the previous strategy, the upper bound given by Theorem 3.1 is optimal, at least for Δ → Δ_0 ∈ [0, ∞). The leading variance term is of order m e^{4Δ}/(nΔ); we differentiate the upper bound in m to find the first-order condition satisfied by the optimal cutoff m_n, which has an empirical version, and we select m_n accordingly. As in the deconvolution setting, we modify the estimator ϕ_n in (3.4), setting it to 0 when the estimator of |ϕ| is smaller than 1/√(nΔ), meaning that the noise is dominant. This defines a thresholded estimator with threshold κ_{n,Δ}/√(nΔ), where κ_{n,Δ} := e^{2Δ} + κ√(log(nΔ)), κ > 0, and a new estimator of f. Finally, we introduce the empirical cutoff, for some α ∈ (0, 1] and κ > 0. If κ > √2 e^{2Δ} √Δ, the last additional term is negligible, regardless of the value of α ≤ 1, and Theorem 3.2 ensures that the adaptive estimator f_{m_n,Δ} satisfies the same upper bound as in Theorem 3.1. Therefore, it is adaptive and rate-optimal, up to a logarithmic term and the multiplicative constant C_1, in the microscopic and mesoscopic regimes defined above. In the macroscopic regimes, where Δ := Δ_n → ∞ with Δ_n < (1/4) log(nΔ_n) as n → ∞, the estimator is consistent. Note that to establish the adaptive upper bound we imposed the stronger assumption E[X_1⁴] < ∞. In the following numerical study, we observe again that the procedure is stable in the choice of κ.
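To make the construction concrete, here is a simplified numerical sketch of a decompounding estimator. We assume the intensity λ is known and use the principal branch of the complex logarithm to invert ϕ_{Z_Δ} = exp(λΔ(ϕ_X − 1)), which is adequate only when λΔ is small (λΔ < π/2); the paper's estimator (3.4) and its thresholded version differ in these details.

```python
import numpy as np

def decompound(z, delta, lam, x_grid, m, n_u=201):
    """Estimate the jump density f from increments z of a compound Poisson
    process: invert phi_Z = exp(lam*delta*(phi_X - 1)) with a principal-branch
    logarithm of the empirical c.f., clip the modulus at 1, Fourier-invert."""
    u = np.linspace(-m, m, n_u)
    phi_zn = np.mean(np.exp(1j * np.outer(u, z)), axis=1)
    phi_xn = 1.0 + np.log(phi_zn) / (lam * delta)   # principal branch
    mod = np.abs(phi_xn)
    phi_xn = np.where(mod > 1, phi_xn / mod, phi_xn)  # |phi_X| <= 1
    du = u[1] - u[0]
    return np.real(np.exp(-1j * np.outer(x_grid, u)) @ phi_xn) * du / (2 * np.pi)

# Standard normal jumps, lam*delta = 0.5: estimate f at 0 (true density 0.3989).
rng = np.random.default_rng(4)
n, lam, delta = 10_000, 1.0, 0.5
counts = rng.poisson(lam * delta, size=n)
jumps = rng.standard_normal(counts.sum())
z = np.bincount(np.repeat(np.arange(n), counts), weights=jumps, minlength=n)
f0 = decompound(z, delta, lam, np.array([0.0]), m=4.0)[0]
```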

Numerical results
As for the deconvolution problem, we illustrate the performance of this adaptive estimator for different densities f. We consider the same densities as for the deconvolution problem, except the Cauchy density, which is not covered by our procedure since it has no finite moments. We compute the adaptive L²-risks of our procedure over 1000 Monte Carlo iterations for various values of κ. We consider n = 5000 and the sampling interval Δ = 1. The results are represented on Figure 5; we observe that the risks are small and stable regardless of the value of κ and of the density considered.
Then, we derive from the Hoeffding inequality and Lemma 4.1 a bound in which C depends on E[X_1²] and E[X_1⁴].
Proof of Theorem 3.2 Step 1: An upper bound for f_{m,Δ}. Let 0 < m < (nΔ)^α. Parseval's equality and (3.6) lead to a first decomposition. The first term in the right-hand side is bounded using Theorem 3.1. For the second term, recall that κ_{n,Δ} = e^{2Δ} + κ√(log(nΔ)) and decompose it on the set where the empirical characteristic function exceeds the threshold and on its complement; throughout, C denotes a positive constant whose value may change from line to line.