Adaptive density estimation in deconvolution problems with unknown error distribution

Abstract: We investigate the data-driven choice of the cutoff parameter in density deconvolution problems with unknown error distribution. To make the target density identifiable, one has to assume that some additional information on the noise is available. We consider two different models: the framework where an additional sample of the pure noise is available, and the model of repeated measurements, where the contaminated random variables of interest can be observed repeatedly, with independent errors. We introduce spectral cutoff estimators and present upper risk bounds. The focus of this work lies on the optimal choice of the bandwidth by penalization strategies, leading to non-asymptotic oracle bounds.


Introduction
This paper addresses the problem of adaptive bandwidth selection via penalization in deconvolution problems with unknown error distribution. We study two different models in parallel. In both models, we assume that the random variables are real-valued.
Model 1. The random variable X of interest is perturbed by an additive error ε, independent of X, so the empirically accessible quantity is Y = X + ε. The distribution of the noise is assumed to be unknown. One observes n independent copies of Y: Y_j = X_j + ε_j, j = 1, …, n. In addition, it is assumed that a sample (ε_{−j})_{j=1,…,M} of the pure noise, independent of the Y_j, is available.
Model 2. The noisy random variable X can be measured repeatedly, with independent errors. The observations are then of the form Y_{j,k} = X_j + ε_{j,k}, j = 1, …, n and k = 1, 2. All the X_j are assumed to be independent and identically distributed, and all the ε_{j,k} are independent and identically distributed and independent of the X_j. Again, the distribution of the ε_{j,k} is assumed to be unknown. In addition, the error terms are assumed to be centered.
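As a concrete illustration, both observation schemes are easy to simulate. The signal and noise laws below (a Gamma signal, Laplace noise of variance 1/10) are hypothetical stand-ins chosen to echo the simulation section later in the paper, and Python is used here for brevity, whereas the study itself is implemented in R.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 500, 500                                       # hypothetical sample sizes

# Model 1 (ns-model): contaminated observations plus a pure-noise sample.
X = rng.gamma(shape=2.0, scale=0.5, size=n)           # stand-in signal density f
eps = rng.laplace(scale=np.sqrt(0.05), size=n)        # Laplace noise, variance 2*scale^2 = 1/10
Y = X + eps                                           # observed sample Y_j = X_j + eps_j
eps_noise = rng.laplace(scale=np.sqrt(0.05), size=M)  # independent sample of the pure noise

# Model 2 (rd-model): each X_j is observed twice, with independent centered errors.
X2 = rng.gamma(shape=2.0, scale=0.5, size=n)
E = rng.laplace(scale=np.sqrt(0.05), size=(n, 2))
Y_rep = X2[:, None] + E                               # Y_{j,k} = X_j + eps_{j,k}, k = 1, 2
```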
Density deconvolution is a classical topic in nonparametric statistics, and a large amount of literature on this subject has been published since the late 80s. Rates of convergence and their optimality have been studied, for example, in [9, 32, 33, 21] and [20]. For the study of sharp asymptotic optimality, see [5, 7, 8]. Adaptive estimation for deconvolution problems has then been investigated by [31], who apply wavelet techniques, by [13], who consider adaptive bandwidth selection for projection estimators, and by [6] for linear functionals. We can also cite [12] in a multivariate setting. However, the above-mentioned papers work under the assumption that the distribution of the errors is perfectly known, which is clearly not realistic in most fields of application. The systematic study of deconvolution with unknown error distribution in the presence of an additional noise sample, which corresponds to Model 1, dates back to the late 90s. For the study of convergence rates, see [30, 22] or [27].
The rigorous study of adaptive procedures in a deconvolution model with unknown errors has only recently been addressed. We are aware of the work by [11], by [23], who consider a model of circular deconvolution, and by [17], who deal with adaptive quantile estimation via Lepski's method.
In comparison to the classical deconvolution model with known errors, the research on the model of repeated measurements has not been as intense. For the more general model of repeated measurements with skew error densities, rates of convergence have been studied in [25] and recently improved in [10]. Consistent estimation under minimal a priori assumptions is investigated in [28]. For repeated measurements with a symmetric error density, we refer to [18] and [16]. This model has various applications in economics, see, for example, [4], but also in a medical context, see [16].
In the last-mentioned paper, as well as in [18], practical strategies for adaptive bandwidth selection have been proposed, but a theoretical justification is not given, which is a motivation for the rigorous study presented in this paper. Essential tools for our approach rely on considerations presented in [24]. There are also many common points with recent contributions by [17]. In comparison to the last-mentioned authors, the main difference lies in the fact that their approach is minimax and asymptotic, whereas we are interested in non-asymptotic oracle bounds, thus following the model selection paradigm in the sense of [2] and [26].
Model 1 corresponds, in many respects, to the situation which has been investigated in [11]. Let us clarify the essential differences: in a deconvolution model with estimated characteristic function of the errors, the risk bounds are determined by the size M of the noise sample, as well as by the number n of observations of Y. For sample sizes M ≥ n, the risk bounds correspond to the model with known error distribution. However, for M < n, the bound on the risk gets worse. The approach by [11] is tailored to the case where M is larger than n by a polynomial factor, and cannot be extended to M < n. The additional considerations presented in this work allow us to handle the case of small noise samples. This seems to be of practical relevance when one turns away from the classical measurement error model and regards, in some context of physics or biology, ε as a signal overlying some other signal X. In such a framework, the size of the noise sample will be determined by some extraneous influence, and the assumption that M > n may fail to hold.
We want to emphasize another important difference between our reasoning and the arguments given in [11], but also in [17]. The last-mentioned papers always work under the standing assumption that the density of interest and the error density belong to certain prescribed classes of functions. More precisely, it is assumed therein that the characteristic functions have an exponential or polynomial decay behavior. We are able to dispense entirely with such semi-parametric assumptions. This is motivated by arguments given in [2]. From a model selection point of view, rather than considering a family of parameter sets and aiming at building an estimator which is simultaneously asymptotically minimax, the target is to find the best estimator within a collection, leading to non-asymptotic oracle inequalities. It is argued in [2] that these considerations make sense without specifying any particular family of parameter sets. From this point of view, it is desirable to avoid, as far as possible, any a priori parametric assumptions and to provide a fully general treatment.
This paper is organized as follows. In Section 2, we fix the notation and assumptions, introduce the estimators and present upper risk bounds. In Section 3, the data-driven choice of the cutoff parameter is investigated. We introduce penalized criteria and derive non-asymptotic oracle bounds for the corresponding estimators. In Section 4, we present some data examples to illustrate the practical performance of our estimators. All proofs are postponed to Section 6.

Statistical model, estimation procedure and risk bounds
In the present section, we fix the statistical model and assumptions, introduce the estimators and recall, for the reader's convenience, the non-asymptotic risk bounds which have been presented in earlier publications on the subject. We start by introducing some notation which will be used throughout the rest of the text.

Statistical model and estimators
In the situation of Model 1 and Model 2 defined in the introduction, the target is to recover the density f of X from the data. In the sequel, we limit our considerations to the case where the number of repeated measurements is equal to two. This setting generalizes easily to a higher-dimensional model; however, for the sake of clarity, we omit the details.
Throughout the paper, we make the following assumptions:
(A1) X and ε have square integrable densities f and f_ε w.r.t. the Lebesgue measure.
(A2) The characteristic function f*_ε of the noise does not vanish: f*_ε(u) ≠ 0 for all u ∈ R.
(A3) (rd-model) The error density f_ε is symmetric.
When the error density is known, the estimation principle is as follows: since f*_Y = f* f*_ε, the ratio f*_Y/f*_ε identifies f*, and the inverse Fourier transform is then applied to get an estimate of f. However, since neither f*_Y nor 1/f*_ε is integrable, it is necessary to apply some regularization technique, for example a spectral cutoff. In this case, the estimator of f would be (1/(2π)) ∫_{|u|≤πm} e^{−iux} f̂*_Y(u)/f*_ε(u) du, with f̂*_Y the empirical characteristic function of the Y_j. We note that this estimator corresponds both to a kernel estimator built with a sinc kernel ([5]) and to a projection-type estimator as in [13].
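For a known error law, the spectral cutoff estimator admits a short numerical sketch. The Laplace noise, the grid sizes and the simple Riemann inversion below are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def deconv_known(Y, x_grid, m, cf_eps, n_u=512):
    """Spectral cutoff estimate: invert hat f*_Y(u) / f*_eps(u) over |u| <= pi*m."""
    u = np.linspace(-np.pi * m, np.pi * m, n_u)
    du = u[1] - u[0]
    cf_Y = np.mean(np.exp(1j * np.outer(u, Y)), axis=1)    # empirical cf of Y
    ratio = cf_Y / cf_eps(u)                               # deconvolution in the Fourier domain
    kernel = np.exp(-1j * np.outer(x_grid, u))             # e^{-iux}
    return np.real((kernel * ratio).sum(axis=1)) * du / (2 * np.pi)

rng = np.random.default_rng(1)
b = np.sqrt(0.05)                                          # Laplace scale: variance 1/10
Y = rng.normal(size=2000) + rng.laplace(scale=b, size=2000)
cf_laplace = lambda u: 1.0 / (1.0 + (b * u) ** 2)          # cf of the Laplace(0, b) law
x = np.linspace(-5, 5, 201)
f_hat = deconv_known(Y, x, m=2.0, cf_eps=cf_laplace)       # estimate of the N(0,1) signal
```

The recovered curve should be close to the standard normal signal density, since the cutoff πm ≈ 6.3 already captures essentially all of its Fourier mass.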
In the present case, the error distribution is assumed to be unknown.To make the problem identifiable, some additional information on the noise is required.
In the ns-model, we introduce the empirical characteristic function of ε, f̂*_{ε,ns}(u) := (1/M) Σ_{j=1}^{M} e^{iuε_{−j}}.
In the same way, f*_Y is estimated by its empirical version f̂*_{Y,ns}(u) := (1/n) Σ_{j=1}^{n} e^{iuY_j}. Secondly, the identification of f*_ε is also possible in the model of repeated observations. Under the symmetry assumption (A3), f*_ε is real-valued and we have the equalities (f*_ε(u))² = E[e^{iu(ε_{j,1} − ε_{j,2})}] = E[e^{iu(Y_{j,1} − Y_{j,2})}]. (4) When (A3) is violated, the model is much more complicated and requires a completely different approach, since f*_ε is no longer a real positive function. For further discussion, see [10]. Formula (4) suggests defining the following unbiased estimator of (f*_ε)²: (1/n) Σ_{j=1}^{n} e^{iu(Y_{j,1} − Y_{j,2})}. Moreover, an unbiased estimator of f*_Y is given by (1/(2n)) Σ_{j=1}^{n} (e^{iuY_{j,1}} + e^{iuY_{j,2}}). When considering these empirical characteristic functions, one has to be careful about the fact that the Y_{j,k} are not independent of each other, nor independent of the ε_{j,k}.
Finally, one has to pay attention to the fact that small values of the empirical characteristic function in the denominator lead to unfavorable effects and a bad performance of the estimator. This is an immediate consequence of the fact that a reasonable estimation of the ratio 1/f*_ε is impossible as soon as the denominator is smaller than its standard deviation. This phenomenon has been investigated in [30]. It entails the necessity to consider a regularized version of the empirical characteristic function in the denominator. [18] propose a ridge-parameter approach; however, this requires a careful discussion of the choice of the ridge parameter. For this reason, we prefer the completely data-driven approach proposed in [30], which has also been applied by [11] for the ns-model and by [16] for the rd-model. This approach amounts to inverting the empirical characteristic function only where its modulus exceeds the critical threshold (M^{−1/2} in the ns-model, n^{−1/2} in the rd-model), and setting the estimator of the ratio to zero elsewhere.
These definitions lead to the following empirical versions of f*: in each model, f̂* is obtained by multiplying the empirical version of f*_Y with the regularized estimator of the ratio 1/f*_ε.
Finally, the objects to be estimated are characteristic functions, so their absolute values are bounded by one; the estimators may be truncated accordingly.
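The regularized ratio estimators just described can be sketched as follows; the truncation thresholds M^{-1/2} and n^{-1/2} follow the device of [30] as recalled above, while the function names and the vectorization are our own.

```python
import numpy as np

def inv_cf_eps_ns(u, noise_sample):
    """ns-model: invert the empirical cf of the noise only where its
    modulus exceeds M^{-1/2}; return 0 elsewhere."""
    M = len(noise_sample)
    cf = np.mean(np.exp(1j * np.outer(u, noise_sample)), axis=1)
    out = np.zeros_like(cf)
    keep = np.abs(cf) >= M ** -0.5
    out[keep] = 1.0 / cf[keep]
    return out

def inv_cf_eps_rd(u, Y_rep):
    """rd-model: (f*_eps)^2 is estimated from the differences Y_{j,1} - Y_{j,2};
    under the symmetry assumption (A3) its square root is inverted on the
    event where the estimate stays above n^{-1/2}."""
    n = Y_rep.shape[0]
    D = Y_rep[:, 0] - Y_rep[:, 1]
    cf2 = np.real(np.mean(np.exp(1j * np.outer(u, D)), axis=1))  # estimates (f*_eps)^2
    out = np.zeros(len(u))
    keep = cf2 >= n ** -0.5
    out[keep] = 1.0 / np.sqrt(cf2[keep])
    return out
```

At u = 0 both empirical characteristic functions equal one, so the estimated ratio is exactly one there, as it should be.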

Upper risk bounds
The following non-asymptotic risk bounds are valid for the estimators defined in the preceding section.
Proposition 2.1. (i) In the presence of an additional noise sample, under (A1)-(A2), there exists a universal positive constant C such that the risk bound (8) holds. (ii) In the model of repeated measurements, under (A1)-(A3), there exists a universal positive constant C′ such that the risk bound (9) holds.
Remark 1. The first two terms on the right-hand side of (8) and (9) correspond to the usual terms when the distribution of the errors is known: the squared bias term and a bound on the variance. The last term is due to the estimation of f*_ε and depends on the model considered. These bounds have already been established in the literature on deconvolution, see [30, 18] or [11, 16]; we may hence omit the proof.
In view of the rates of convergence, we observe the following. Consider first the ns-model. If M ≥ n, there is no loss in the rate in comparison to a deconvolution problem with known f_ε. However, a loss in the rate may occur for M < n. More precisely, if the ratio M/n is small in comparison to f*/f*_ε, the estimator does not achieve the rates of convergence which are known to be optimal for deconvolution with known error distribution. This is intuitive, since f can only be identified through f_ε, and there is no hope of estimating f with high precision when the information on f_ε is not reliable. For a detailed discussion and minimax lower bounds, we refer to [30]. Next, consider the rd-model. From (9), one derives immediately that there is no loss in the rate, in comparison to deconvolution with known error distribution, if the decay of f* outbalances the decay of f*_ε. If this is no longer true, the following holds: the smoother f_ε is in comparison to f, the worse are the resulting rates of convergence. It can be shown that a loss in the rate is unavoidable in this context (Alexander Meister, personal communication), but to the best of our knowledge, minimax lower bounds have not been published for this particular case.

Data driven bandwidth selection and oracle bounds
The goal of this section is to provide a strategy for an optimal data-driven choice of the smoothing parameter. Given a collection M of cutoff parameters, which may vary with M and n, the bandwidth m should ideally balance the bias and variance terms displayed in (8) and (9). This trade-off is easier to realize when the variance term is known, which is often the case in the literature on model selection. In a deconvolution problem with perfectly known error distribution, m should mimic the oracle choice m† := argmin_{m∈M} { ‖f − f_m‖² + (1/(2πn)) ∫_{|u|≤πm} |f*_ε(u)|^{−2} du }.
In the present framework, the considerations are even more involved, since the characteristic function in the denominator is unknown, so the variance term cannot actually be computed. Following the model selection paradigm, see [3, 2] or [26], we select m̂ as the minimizer of a penalized criterion, m̂ := argmin_{m∈M} { −‖f̂_m‖² + pen(m) }. The penalty term should be chosen large enough to annihilate the fluctuation of f̂_m around its target, for all m in the model collection M simultaneously, but should, on the other hand, ideally be as close as possible to the variance term, in order to preserve the non-asymptotic risk bounds. In a model selection problem with known variance, the penalty term is deterministic, which is no longer the case in the present situation.
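The selection rule can be sketched generically: by Parseval's equality, ‖f̂_m‖² is an integral of the squared modulus of the estimated Fourier transform over |u| ≤ πm, and m̂ minimizes the penalized contrast. The skeleton below takes the penalty as a user-supplied function and is illustrative only, not the calibrated procedure of Section 4.

```python
import numpy as np

def select_cutoff(Y, inv_cf, pen, m_grid, n_u=256):
    """Return the m in m_grid minimising  -||hat f_m||^2 + pen(m),
    where ||hat f_m||^2 is computed in the Fourier domain (Parseval)."""
    crits = []
    for m in m_grid:
        u = np.linspace(-np.pi * m, np.pi * m, n_u)
        du = u[1] - u[0]
        cf_Y = np.mean(np.exp(1j * np.outer(u, Y)), axis=1)   # empirical cf of Y
        norm2 = np.sum(np.abs(cf_Y * inv_cf(u)) ** 2) * du / (2 * np.pi)
        crits.append(-norm2 + pen(m))
    return m_grid[int(np.argmin(crits))]
```

With a penalty that dominates the contrast, the smallest cutoff is selected; in practice, pen would be given by (10) or (11) with the calibrated constants, and inv_cf by the regularized ratio estimator.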
Before introducing the stochastic penalty terms, we need the following definition: for δ > 0 and u ∈ R, let w(u) := (log(e + |u|))^{−1/2−δ}. The weight function w has been introduced in [29], and the considerations presented in that paper, combined with ideas given in [24], play an important role in our arguments. Since the penalty terms involve an empirical version of the characteristic function in the denominator, the oracle inequalities depend on a precise control of the deviation of the empirical characteristic function f̂*_ε from f*_ε, simultaneously on the whole real line. It is shown in [29] that the distance between both objects, weighted by w, is simultaneously small on the real axis. In the penalty terms, there will hence occur a loss of logarithmic order in comparison to the variance term.
Let us now introduce the stochastic penalty terms. In the ns-model, the penalty is defined through formula (10), with τ_Y = τ_ε = √6. In the rd-model, the penalty is defined through formula (11), with τ_Y as previously and τ_ε adapted accordingly. The cutoff parameters are then selected as the minimizers of the corresponding penalized criteria.
and pen_rd(m) := pen_{1,rd}(m) + pen_{2,rd}(m). We are now ready to formulate the oracle bounds, and hence the main result of this paper.
Theorem 3.1. (i) Assume that (A1) and (A2) are satisfied. Let m̂_ns be defined by (12) and f̂_{m̂_ns,ns} according to (7). Then there exist a universal positive constant C_ad and a positive constant C, depending on the particular choice of τ and δ but not on any of the underlying distributions, such that the corresponding oracle inequality holds.
(ii) Let M be a collection of cutoff parameters with max M ≤ √n. Assume that (A1)-(A3) are satisfied. Let m̂_rd and f̂_{m̂_rd,rd} be defined according to (13) and (7). Then there exist a universal positive constant C_ad and a positive constant C, depending on the choice of γ and δ but not on the underlying distributions, such that the corresponding oracle inequality holds.
Remark 2. It is remarkable that we are able to establish non-parametric oracle bounds which make sense without specifying any particular semi-parametric model. Related problems are frequently discussed under specific a priori assumptions on the decay behavior of f* and f*_ε, see for example [13] or [11]. In the present work, we can dispense entirely with such assumptions, so our approach is as general as possible.
Another interesting point about our considerations is the following: the only assumption imposed on the collection M of cutoff parameters is an upper bound on the largest index. No further specification is necessary, and we may work with an arbitrarily fine grid, allowing for good approximation properties. This is a consequence of the fact that our proofs rely on a single application of the Talagrand inequality; additional applications of the Bernstein inequality and sums over M are not required.
Finally, it is worth emphasizing that, to the best of our knowledge, the non-asymptotic oracle bounds for the rd-model are completely new, and the same is true for the bounds in the ns-model with M < n.
We have considered the problem from a non-asymptotic perspective, but Theorem 3.1 entails, from the asymptotic and minimax point of view, the following observation: in those cases where minimax rates of convergence are known, the procedure achieves, up to a logarithmic loss, automatic adaptation over prescribed non-parametric function classes, typically Sobolev spaces or classes of supersmooth functions.

Practical estimation procedure
Let us first describe the adaptive procedure as it is implemented for both models. As is often the case with model selection methods, the values of the constants in the penalty, here denoted by τ_Y and τ_ε, which are obtained from the theory, are too large in practice. Therefore, a calibration step is required: for a small set of densities and different sample sizes, the mean integrated squared error (MISE) is computed in order to determine an admissible range for the values of the constants (see [1] for a description of this step). Finally, the penalties are chosen according to Equations (10) and (11), with τ_Y replaced by 0.6 and τ_ε by 0.3, for both the ns-model and the rd-model. Besides, we consider the model collection M = {m = k/10, 1 ≤ k ≤ 25}. In practice, one can take k in a much larger set and propose larger values for m; the first selected value can then be followed by another run of the estimation algorithm with a finer grid of proposals around the selected value. We limited the set here because the proposed values seemed adequate and allowed for less time-consuming repeated experiments.
In the sequel, we also use the notations ror and rad for the oracle risk and the adaptive risk, where the theoretical expectation is approximated by an empirical mean Ê computed via Monte Carlo repetitions. The whole implementation is conducted using the R software. The integrated squared error (ISE) ‖f − f̂_{m̂}‖² is computed via a standard approximation and discretization (over 300 points) of the integral on an interval I of R. Then the MISE E‖f − f̂_{m̂}‖² is computed as the empirical mean of the approximated ISE over 500 simulation samples.
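The Monte Carlo scheme just described amounts to a small helper function (in Python here, whereas the study itself was run in R; the grid and the repetition count are parameters, with the paper's 300-point grid and 500 repetitions as defaults in spirit only).

```python
import numpy as np

def approx_mise(estimator, true_density, sample_maker, x_grid, n_rep=500):
    """Approximate the MISE E||f - hat f||^2: discretise the ISE on x_grid
    (the interval I) and average it over n_rep Monte Carlo repetitions."""
    f_true = true_density(x_grid)
    dx = x_grid[1] - x_grid[0]
    ises = [np.sum((estimator(sample_maker()) - f_true) ** 2) * dx
            for _ in range(n_rep)]
    return float(np.mean(ises))
```

As a sanity check, an "estimator" returning the true density has zero ISE in every repetition, hence zero approximated MISE.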

Comparison with [11] and influence of M in the ns-model
We compute different estimators of the signal for different values of M, and consider different signal densities and two noise distributions. Following [13], we study several test densities on the interval I. All densities are normalized to unit variance, except the Cauchy density.
We consider the following two noise densities, both with variance 1/10.
We want to study the influence of the relationship between n and M on the estimation of f in the ns-model. We therefore consider different values of n, together with M = √n and M = n.
Results. The results of the simulations are given in Table 1, which illustrates the case where a preliminary sample of the noise ε can be observed (our so-called ns-model). First, we see that the risk decreases when the sample size increases.
Likewise, the risk increases when the variance increases. Secondly, the results are very close to those of [11]; nevertheless, our procedure is equivalent or better. The main improvement is that our procedure performs better when the size of the preliminary sample is small. Moreover, we show that the procedure does not need a large M, since we reach the same performance as [11] for M = n where they take M = n². Indeed, if we consider the mixed Gamma distribution for n = M = 500, our risk is 0.232 with a Laplace noise, while [11] obtain 0.382 with M = n². We can make the same remarks for the other distributions (except for the Laplace distribution), with sample sizes 100 or 250 and with the two noise distributions. For the Laplace distribution, the results of [11] are better, but only slightly. Moreover, in our model, the risk decreases more rapidly when M and n increase.

Effect of the variance
We also test how our procedure behaves when the variance is increased. In Table 2, we present the results of simulations where the variance takes the values 1/4, 1/2 and 1. We only report the case of a Gaussian error distribution, since the results for the Laplace error are very similar. Moreover, the Gaussian distribution is the less favorable case: its Fourier transform decays exponentially, which is known to imply possibly slower asymptotic rates, so it is more difficult to recover the target density f.
Results. The results are reported in Table 2. As before, the risk decreases when n and M increase. Similarly, when we increase the contamination of the variable of interest by increasing the variance of the error distribution, the risk increases. The procedure still performs well, since the adaptive risk remains close to the oracle risk, being about twice as large.

Illustrations in the RD-model
For this model, we use the same signal and error distributions as described in Section 4.2.
Results. The results are reported in Table 3. We note that the values of the MISE are very close for both error distributions. Moreover, the adaptive risk is close to the oracle risk: it is multiplied by approximately 1.5 when the variance of the error distribution is 1/10, and by 2 when the variance equals 1/2. Again, the risk decreases when n and M increase, and the risk increases when the variance increases.

Comparison with a kernel estimator
Recently, some papers, such as [14], have investigated the necessity of inversion in statistical inverse problems. The idea is to compare the performance of a simple kernel estimator, directly applied to the data Y_i, with the adaptive estimator of the ns-model. This model allows us to choose a non-symmetric noise, which makes the estimation with a kernel estimator more difficult. That is why we compare our estimator with a kernel estimator in two contexts: one with a symmetric noise and another with an asymmetric noise.
First, for the symmetric noise, we take the results of Table 2 with variance 1/2 for a Gaussian noise with M = n and compute the kernel estimator for this design.
Secondly, we compute the estimator of the signal in the ns-model with M = ⌊√n⌋ and an asymmetric noise. We consider two densities defined at the beginning of this section: mixed Gamma and Gaussian. We choose the error density as the Gamma density f_ε(x) = (β^α/Γ(α)) x^{α−1} e^{−βx} 1_{x≥0}, with parameters α = 1 and β = 2. The variance of the error distribution is then 1/4. For the kernel estimator, we use the function density of R with a Gaussian kernel, where the bandwidth is selected by cross-validation.
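For reference, the direct kernel competitor can be sketched as follows. The study uses R's density function with a cross-validated bandwidth; as an assumption of this sketch, we substitute a Silverman-type rule and Python, while the Exp(2) errors below coincide with the Gamma(α = 1, β = 2) noise of variance 1/4.

```python
import numpy as np

def gaussian_kde(sample, x_grid, bw=None):
    """Plain Gaussian kernel density estimate applied directly to the
    noisy Y_i (no deconvolution step)."""
    n = len(sample)
    if bw is None:
        bw = 1.06 * np.std(sample) * n ** (-1 / 5)        # Silverman-type rule
    z = (x_grid[:, None] - sample[None, :]) / bw
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))

rng = np.random.default_rng(4)
# Gaussian signal contaminated by Exp(2) = Gamma(shape 1, rate 2) errors (variance 1/4);
# the positive mean of this noise shifts the direct estimate to the right.
Y = rng.normal(size=500) + rng.gamma(shape=1.0, scale=0.5, size=500)
x = np.linspace(-4, 5, 300)
f_hat = gaussian_kde(Y, x)
```

The direct estimate targets the density of Y rather than that of X, which is precisely the bias that the deconvolution step is meant to remove.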
Results. The results are reported in Tables 4 and 5. In Table 4, we see that the results are very close. For the mixed Gamma, the results of the kernel estimator are slightly better, but the risk of our estimator decreases more rapidly. For the Gaussian distribution, our results are better, and the risk also decreases more rapidly when the sample size increases.
For the non-symmetric noise, the results are reported in Table 5. We see that, for the mixed Gamma, the kernel estimator performs unexpectedly well. For n = 100, the risks are practically the same as those of the adaptive estimator. When n increases, the adaptive estimator performs better, but the kernel estimator still gives satisfying results.

Table 4
Comparison of the method with the results of a Gaussian kernel estimator with a symmetric noise (MISE × 100, averaged over 500 simulations); rker is the risk of the kernel estimator, rad is the risk of the adaptive estimator, ror is the risk of the oracle estimator, M is the size of the noise sample and n is the size of the Y_i's
On the other hand, the results for the Gaussian distribution illustrate well the importance of inversion in statistical inverse problems. Indeed, the risk of the kernel estimator is around 3·10⁻² for the various values of n, while for the adaptive estimator the risk is divided by 3. More precisely, for n = 100 the risk of the kernel estimator is twice that of the adaptive estimator, five times for n = 250 and seven times for n = 500. So, when the error distribution is unknown and the signal-to-noise ratio (denoted s2n) is not too large, our procedure is worthy of interest. When the s2n gets larger, we only expect the deconvolution procedure not to deteriorate the results compared to direct estimation; this has been checked to be true for a known noise density by [14], see Sections 4.7 and 4.9 therein. As the value of the s2n is unknown in practice, our procedure is always recommended.
Figures 1 and 2 illustrate the estimation of a mixed Gamma density using both penalized estimators, in the cases n = M = 200 and n = M = 500. The estimation is made with additional Gaussian and Laplace noises with a variance of 1/10. The bimodal shape of the density is well captured, and the precision increases with the sample size.

Concluding remarks
This paper deals with adaptive deconvolution estimation of a density when the noise density is unknown. We have considered two cases: one where a preliminary sample of the noise can be observed, and another where the variable of interest X can be observed repeatedly, with independent errors. For both models, we have proposed a theoretically justified adaptive procedure which automatically realizes a data-driven bias-variance compromise. Moreover, it does not require specifying rates of convergence, since the optimal rates are attained automatically. This procedure enables us to treat the problem of adaptive estimation in the repeated-observations model, which is completely new. The estimation procedure relies on the independence of the estimators of the characteristic functions f*_Y and f*_ε. Its advantage is to be very general, under weak assumptions. Indeed, the procedure covers cases with a small number of replications, which matches realistic applications in medicine or economics. Besides its theoretical properties, our procedure has shown good performance in simulations.
Finally, we think that our procedure can be extended to density estimation of a random effect in the linear mixed-effects model. Indeed, we are aware of the work of [15], who proposed an adaptive procedure based on deconvolution methods in the unknown-error case which is not optimal, and of [19], who used Lepski's method in the known-error case. In that model, the noise can also be recovered by successive differences, similarly to the repeated-measurements model, but the characteristic function of the noise would be raised to a greater power. We may then propose, in the same spirit, an adaptive procedure for the random effect in the linear mixed-effects model.

Preliminaries
We start by restating, for the reader's convenience, the following version of Talagrand's inequality. Lemma 6.1. Let I be some countable index set. For each i ∈ I, let Z^{(i)}_1, …, Z^{(i)}_n be independent and identically distributed random variables with values in [−1, 1], defined on the same probability space. Let v² := sup_{i∈I} Var[Z^{(i)}_1]. Then there are universal positive constants c₁ and c₂ such that the deviation bound holds for any κ > 0. A proof of this result is given in [26], see page 170. It follows from the arguments presented therein that for any r, s > 0 with 1/s + 1/r = 1, the constants can be chosen as c₁ = 2s² and c₂ = 6r.
The following result will be essential for the theoretical justification of the adaptive procedure. The proof was given in [24]; it uses a fundamental lemma shown in [29], combined with Lemma 6.1. Lemma 6.2. Assume that τ > 2√p. Then there exists a positive constant C, depending on m_Z and the choice of γ and δ, such that the deviation bound (18) holds for arbitrary n ∈ N. Remark 3. In the situation of the preceding lemma: if the Z_j are assumed to have a symmetric distribution and the empirical characteristic function is defined as f̂*_Z(u) = (1/n) Σ_{j=1}^{n} cos(uZ_j), it is enough to assume τ > √(2p) to obtain (18).
The next result has been formulated and proved in [30]. Lemma 6.3. Let Z_1, …, Z_n be i.i.d. random variables with characteristic function f*_Z and empirical characteristic function f̂*_Z. Then for arbitrary p ∈ N, there exists a positive constant C, depending only on p, such that the moment bound (19) holds. Remark 4. In the preceding lemma, one should keep in mind that in the rd-model, Y_{j,1} − Y_{j,2} = ε_{j,1} − ε_{j,2} plays the role of Z_j. Lemma 6.3 yields the following useful corollary, which allows us to compare the stochastic penalty terms to their deterministic counterparts. Corollary 6.4. Let the penalty terms be defined according to Equations (10) and (11). Then, for some C > 0, the stochastic penalties are comparable to their deterministic counterparts. Proof. It is enough to consider pen_rd. Equation (19) immediately gives the first estimate for some C > 0; the Cauchy-Schwarz inequality, combined with an elementary bound, then completes the proof for the rd-model. The arguments for the ns-model are the same, line for line, so we omit the details; the only difference lies in the fact that, in the definition of f̂*_{ε,ns}, M now plays the role of n in Lemma 6.3. The proof of Theorem 3.1 relies on the following auxiliary result, which is, in turn, a consequence of Lemma 6.2.

A technical auxiliary result
Proposition 6.5. (i) In the model of repeated measurements, there exists a universal positive constant C for which the corresponding deviation bound holds. (ii) If an additional sample of the pure noise is available, the analogous bound holds for some universal positive constant C. Proof. (i) Let us introduce the favorable events E_{Y,rd} and E_{ε,rd}. Applying Parseval's equality, we can estimate the difference f̂_{k,rd} − f̂_{m,rd} as in (20). We start by dealing with the first summand appearing in the last line of (20).
The definition of E_{Y,rd} immediately implies the first inequality. Consider now the second summand in the last line of (20). Recall that f*_ε and the estimator of (f*_ε)² in the rd-model are real-valued; this entails (formally, with c/∞ := 0) the series of inequalities (21). For the expression appearing in the last line of formula (21), we observe that the definitions of pen_{1,rd} and E_{Y,rd} readily imply the required bound. Consider now the second-to-last line in Equation (21); from the corresponding estimate, we derive the desired control.
Using this, together with the definitions of E_{ε,rd} and pen_{2,rd}, and putting the above together, we have shown the claimed bound on the favorable event. It remains to consider the exceptional set E^c_{Y,rd} ∪ E^c_{ε,rd}. Using the fact that, for arbitrary m, the absolute value of f̂*_{m,rd} is, by definition, bounded by 1, as well as the fact that max M ≤ √n, we can estimate the contribution of this set; thanks to Lemma 6.2, it is of negligible order. This completes the proof of part (i).
(ii) The proof of the second part uses essentially the same arguments as the proof of the first part, so we content ourselves with sketching the important steps, starting from a decomposition in analogy with (20). Taking expectations and applying Proposition 2.1 gives the bound for the first summand on the right-hand side, since the variance term is a fortiori bounded from above by the penalty term. It remains to consider the second summand on the right-hand side of (23). Taking expectations and applying Corollary 6.4 yields the corresponding estimate for some positive constant C. We have thus proved the desired result on G. • Next, we consider the set G^c = {m̂ > m⋆}. By definition of f̂_{m̂}, see (24), on G^c the penalized criterion at m̂ does not exceed its value at m⋆. Taking expectations and applying Proposition 6.5, as well as the monotonicity of the bias term, we conclude the oracle bound with N = n in the rd-model and N = M ∧ n in the ns-model. This is the desired oracle bound on G^c.

Fig 1. Estimations for n = M = 200 of a mixed Gamma density (bold black line), blue dashed line for the ns estimator and red long-dashed line for the rd estimator.

Fig 2. Estimations for n = M = 500 of a mixed Gamma density (bold black line), blue dashed line for the ns estimator and red long-dashed line for the rd estimator.

Table 1
Values of the approximated MISE E(‖f − f̂_{m̂_ns,ns}‖²) × 100, averaged over 500 samples, with Laplace and Gaussian noise of variance 1/10; rad is the risk of the adaptive estimator, ror is the risk of the oracle estimator, M is the size of the noise sample and n is the size of the Y_i's

Table 2
Values of the approximated MISE E(‖f − f̂_{m̂_ns,ns}‖²) × 100, averaged over 500 samples, with a Gaussian noise; rad is the risk of the adaptive estimator, ror is the risk of the oracle estimator, M is the size of the noise sample and n is the size of the Y_i's

Table 3
Values of the approximated MISE E(‖f − f̂_{m̂_rd,rd}‖²) × 100, averaged over 500 samples; rad is the risk of the adaptive estimator, ror is the risk of the oracle estimator, M is the size of the noise sample and n is the size of the Y_i's

Table 5
Comparison of the method with the results of a Gaussian kernel estimator with an asymmetric noise (MISE × 100, averaged over 500 simulations); rker is the risk of the kernel estimator, rad is the risk of the adaptive estimator, ror is the risk of the oracle estimator, M is the size of the noise sample and n is the size of the Y_i's