Density deconvolution in a two-level heteroscedastic model with unknown error density

We consider a statistical experiment where two types of contaminated data are observed. Therein, both data sets are affected by additive measurement errors but the scaling factors of the error density may be different and/or the observations have been averaged over different numbers of independent replicates. That kind of heteroscedasticity of the data allows us to identify the target density although the error density is unknown and we can allow that the characteristic function of the error variables may have zeros. We introduce a novel nonparametric procedure which estimates the target density with nearly optimal convergence rates. The main goal in this paper is to derive the upper and lower bounds for the convergence rates. A small simulation study addresses the finite sample properties of the procedure.


Introduction, model and applications
We consider nonparametric density estimation based on data which are affected by additive independent measurement error, so that a deconvolution problem arises. The basic approaches to this situation go back to the papers of Stefanski and Carroll (1990), Carroll and Hall (1988), Fan (1991) and others. In recent years, this field of statistics has attracted an increasing number of researchers. See e.g. the book of Meister (2009) for a comprehensive and recent review of nonparametric deconvolution problems.
A major drawback of the usual deconvolution techniques is that the error density is required to be known. Otherwise, one faces severe problems of non-identifiability, which is rather intuitive: suppose we would like to estimate a density function $f_X$ while empirical access is restricted to independent observations drawn from the density $f_X * f_\varepsilon$; then the symmetry of convolution makes it impossible to identify $f_X$ in the general case. Moreover, Meister (2004) shows that misspecification of the error density in the standard deconvolution kernel estimator introduced in Stefanski and Carroll (1990) may have fatal consequences for the asymptotic behavior of the estimator.
The fact that it is unrealistic to assume exact knowledge of the error density in many applications has been well recognized in the statistical community. Therefore, many authors have tried to relax that condition by various modifications of the model. In order to have a situation where the measurement system can somehow be calibrated, one may assume the availability of additional independent direct data from the error density; see Diggle and Hall (1993), Neumann (1997), Efromovich (1997) and Johannes (2009) for that framework. Another popular model, in which the error density is also not required to be perfectly known in advance, uses replicated measurements of the same uncorrupted random variable with the density $f_X$; see e.g. Horowitz and Markatou (1996), Li and Vuong (1998), Hall and Yao (2003), Neumann (2007) and Delaigle et al. (2008). In the related topic of errors-in-variables regression, Schennach (2004a,b) considers repeated observation and instrumental variables models.
Other concepts making the error distribution accessible require more restrictive conditions on the target density $f_X$. E.g. consider the standard experiment where one observes the data $Y_1, \ldots, Y_n$ generated by
$$Y_j = X_j + \sigma \varepsilon_j, \qquad j = 1, \ldots, n, \tag{1.1}$$
where all $X_1, \varepsilon_1, \ldots, X_n, \varepsilon_n$ are independent; the $X_j$ have the density $f_X$ to be estimated; the error variables $\varepsilon_j$ have the density $f_\varepsilon$ and $\sigma > 0$ denotes a scaling parameter. In the papers of Butucea and Matias (2005), Butucea et al. (2008) and Meister (2006) this model is considered under the assumption that the Fourier transform of $f_X$, denoted by $f_X^{ft}$, has a specific known positive lower bound. Then, $\sigma$ can be estimated consistently; however, the density $f_\varepsilon$ is still assumed to be known so that $f_X$ is finally identifiable. Those models use semiparametric approaches to the generic problem of "blind deconvolution", i.e. deconvolution with unknown or partly known noise density. Meister (2007) establishes consistency in a model where $f_X$ is compactly supported, $\sigma$ may be put equal to one and $f_\varepsilon^{ft}(t)$ has to be known only on some bounded interval $t \in [-T, T]$, $T > 0$.
In the present paper, we consider an observation scheme which allows for a specific type of heteroscedasticity in the data. Some recent contributions address the problem that the data may be contaminated by different error densities, e.g. when the observations are drawn from two experiments with different measurement systems; see Delaigle and Meister (2008) and Staudenmayer et al. (2008). In our model, we are given the data
$$\begin{aligned} Y_j &= X_j + \varepsilon_j, \\ Y'_j &= X'_j + \sigma \sum_{k=1}^{m} \varepsilon_{j,k}, \qquad j = 1, \ldots, n, \end{aligned} \tag{1.2}$$
where all $X_j$, $X'_j$, $\varepsilon_j$, $\varepsilon_{j,k}$ are independent; the $X_j$ and $X'_j$ have the density $f_X$; the $\varepsilon_j$ and the $\varepsilon_{j,k}$ have the error density $f_\varepsilon$. We allow for both $f_X$ and $f_\varepsilon$ to be unknown; only the integer $m > 0$ and the scaling parameter $\sigma$, with $\sigma \in (0, 1)$ if $m = 1$ and $\sigma \in (0, 1/m]$ if $m \ge 2$, are supposed to be known. For instance, in the replicated data model, the assumption of a known $\sigma$ located in this interval is justified as $\sigma = 1/m$. We will focus on that model below. Otherwise, if $\sigma$ is unknown and e.g. $m = 1$, then the statistical model becomes non-identifiable, in general. Note that e.g. $X_j \sim N(0, 3)$, $\varepsilon_j \sim N(0, 1)$, $\sigma^2 = 1/2$ on the one hand and $X_j \sim N(0, 2)$, $\varepsilon_j \sim N(0, 2)$, $\sigma^2 = 3/4$ on the other hand lead to the same distribution of the observed data set; here $N(\mu, \sigma^2)$ denotes the normal density with the mean $\mu$ and the variance $\sigma^2$. So we have to assume prior knowledge of $\sigma$.
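The non-identifiability example above can be checked by a short computation. The following is a minimal sketch, assuming that model (1.2) reads $Y_j = X_j + \varepsilon_j$ and $Y'_j = X'_j + \sigma \sum_{k=1}^m \varepsilon_{j,k}$ with centred normal components, so that the law of the data is determined by the two variances; the function name `observed_variances` is hypothetical and introduced only for this illustration.

```python
def observed_variances(var_x, var_eps, sigma_sq, m=1):
    """Variances of Y = X + eps and Y' = X' + sigma * (eps_1 + ... + eps_m)
    when X and the errors are independent centred normal variables."""
    var_y = var_x + var_eps
    var_y_prime = var_x + sigma_sq * m * var_eps
    return var_y, var_y_prime

# First parameter set of the example: X ~ N(0, 3), eps ~ N(0, 1), sigma^2 = 1/2.
config_a = observed_variances(var_x=3.0, var_eps=1.0, sigma_sq=0.5)
# Second parameter set: X ~ N(0, 2), eps ~ N(0, 2), sigma^2 = 3/4.
config_b = observed_variances(var_x=2.0, var_eps=2.0, sigma_sq=0.75)

# Both configurations give centred normal data with identical variances,
# so they cannot be told apart from (Y_j, Y'_j) alone.
print(config_a, config_b)
```

Both calls return the pair of variances $(4, 3.5)$, which is why $\sigma$ must be known in advance.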
In addition, some decay constraints will be imposed on $f_X$; details are deferred to Section 3 due to their technical nature. Note that another scaling parameter $\sigma'$ could be added in the first line of (1.2); however, this may be absorbed by the error variable $\varepsilon_j$. Also, we mention that model (1.2) and our estimation procedure as introduced in the following section are fully extendable to the statistical experiment where the data $Y_j = X_j + \sum_{k=1}^{\tilde m} \varepsilon_{j,k}$ and $Y'_j = X'_j + \sigma \sum_{k=1}^{m} \varepsilon'_{j,k}$ are observed with $\tilde m < m$, i.e. both data sets have cumulative noise structure.
Our method is applicable when putting $\varepsilon_j = \sum_{k=1}^{\tilde m} \varepsilon_{j,k}$ and the $m$ in the estimator equal to the ratio of $m$ and $\tilde m$ in the above model.
In the sequel, we will discuss two examples to which model (1.2) is applicable. First, we consider the case of integer $m > 1$, $\sigma = 1/m$. As referred to above, repeated measurements are available in many real-life applications. However, in many cases only the averages of those replicates are reported; that problem is mentioned e.g. in Linton and Whang (2002). A direct application of model (1.2) occurs when, for some observed individuals, only one contaminated measurement is reported while each of the other observations represents the average of a known number of measurements of the same individual. Also see Morris et al. (1977) and Thamerus (1996) for related data sets in the field of medical statistics. Linton and Whang (2002) consider a grouped data model with an additional error component. They mention that the error distribution is identifiable when at least two data aggregates of different size are available. The idea is similar to the special case $\sigma = 1/m$ of our model. A least squares estimate of the cumulant of the error is roughly suggested, which is completely different from our methods. However, the properties of that estimator are not studied.
As a more specific example for the case of $m = 1$, $\sigma \in (0, 1)$, we mention the Consumer Expenditure Survey from the United States Department of Labor, which is referred to in Nelson (1994) and analyzed by e.g. Schennach (2004a). The goal is to estimate the log income of an individual while the empirical access is restricted to the log expenditure of an individual. This latter quantity can be modelled as the sum of the log income and the logarithm of a certain independent random quantity $\delta_j$, which may be interpreted as the readiness of the individual to spend his earnings. Then, we have the standard additive measurement error model while the original model (before applying the logarithm) follows an independent multiplicative error scheme. We modify the model by assuming that the readiness to spend money depends on the total wealth of an individual rather than his income. The wealth is modelled by the income plus an independent component. Then the observed expenditure of a person is equal to his wealth multiplied by the independent random variable $\delta_j$. Furthermore, at some point in time the individuals' income may be affected e.g. by some change in the tax rate. For data surveyed after that change, the random variables representing the income are multiplied by the known factor $\sigma \in (0, 1)$. Thus, the data can be split into two independent but non-identically distributed data sets. If the distribution of $\delta_j$ is known or estimable by replicates, one can generate independent data from the estimated densities and hence obtain two pseudo-data sets which approximately follow model (1.2).
The paper is organized as follows: In Section 2, we will describe our estimation procedure; its asymptotic properties are investigated in Section 3. Numerical simulations are given in Section 4, the proofs are deferred to Section 5.

Methodology
We begin with the definition of the classes for the target density $f_X$ and the error density $f_\varepsilon$ in model (1.2).
With respect to the target density $f_X$, we assume that $f_X$ is $\beta$-fold continuously differentiable for some integer $\beta \ge 1$ and that its derivatives satisfy the decay condition (2.1). Furthermore, we impose that $\int x f_X(x)\,dx = 0$.
Those densities satisfying the conditions above are collected in the class $\mathcal{F} = \mathcal{F}_{\beta, C_1, C_2}$. Note that all symmetric, sufficiently smooth and compactly supported densities satisfy these conditions. Also, when the error density is centered, one could think of centering the data $X_j$ in practice. On the other hand, normal densities and some appropriate normal mixtures are also included in these constraints.
With respect to the error density $f_\varepsilon$, we assume that its Fourier transform satisfies $|f_\varepsilon^{ft}(t) - 1| \le T_1 |t|$ for all $|t| \le T_0$ with $T_0, T_1 > 0$. Those densities $f_\varepsilon$ are collected in the class $\mathcal{G} = \mathcal{G}_{T_0, T_1}$. Thus, our conditions on $f_\varepsilon$ are rather mild; note that any density with finite first moment is included in $\mathcal{G}$ for $T_1$ and $T_0$ sufficiently large and small, respectively. But the Cauchy density, for example, is also included in $\mathcal{G}$. In particular, no restrictions on the set of all zeros of $f_\varepsilon^{ft}$ are required, in contrast to the standard situation in deconvolution problems.
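To illustrate how mild the condition defining $\mathcal{G}$ is, the following minimal sketch checks it numerically for two standard examples: the Cauchy density, whose characteristic function is $\exp(-|t|)$, and the uniform density on $[-1, 1]$, whose characteristic function $\sin(t)/t$ has zeros at the nonzero multiples of $\pi$. The grid and the choice $T_1 = 1$ are assumptions made for this illustration only.

```python
import numpy as np

t = np.linspace(-5.0, 5.0, 2001)
T1 = 1.0  # illustrative Lipschitz-type constant, not taken from the paper

# Standard Cauchy density: characteristic function exp(-|t|), no finite first moment.
cauchy_cf = np.exp(-np.abs(t))
cauchy_in_G = bool(np.all(np.abs(cauchy_cf - 1.0) <= T1 * np.abs(t) + 1e-12))

# Uniform density on [-1, 1]: characteristic function sin(t)/t.
# np.sinc(x) computes sin(pi x)/(pi x), hence the rescaling of the argument.
uniform_cf = np.sinc(t / np.pi)
uniform_in_G = bool(np.all(np.abs(uniform_cf - 1.0) <= T1 * np.abs(t) + 1e-12))

# The uniform characteristic function vanishes at t = pi, which rules out
# the usual assumption of a nowhere-vanishing error characteristic function.
value_at_pi = abs(np.sinc(1.0))

print(cauchy_in_G, uniform_in_G, value_at_pi)
```

On this grid both densities satisfy the condition, although the first has no finite mean and the second has a characteristic function with infinitely many zeros.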
Now we address the question of how to reconstruct $f_X^{ft}(t)$ on a compact interval. By the conditions contained in $f_X \in \mathcal{F}$ and the Taylor expansion of $f_X^{ft}$, we may establish the bound (2.2) for all $t \in \mathbb{R}$. This also implies the existence of a uniform positive lower bound on $|f_X^{ft}|$ and $|f_\varepsilon^{ft}|$ on any compact interval $[-T, T]$ with $T > 0$ sufficiently small. That lower bound is essential as terms involving both $|f_X^{ft}|$ and $|f_\varepsilon^{ft}|$ will occur in our estimators. The resulting representation (2.4) of $f_X^{ft}(t)$ holds true for all integers $K > 0$, as the right side of (2.4) may be viewed as a telescopic product. The integer parameter $K$ remains to be selected. A similar technique has been used in Belomestny (2003) in the field of time series analysis; there, the author considers data from an autoregression model, assumes that the target density is ordinary smooth and uses kernel regularisation.
Combining the condition $\sigma \le 1/m$ and (2.2), we may conclude that $\lim_{k \to \infty} \{f_X^{ft}(\sigma^k t)\}^{m^k} = 1$, which will be made more precise in the proof of Theorem 3.1. On the other hand, we have the relation (2.5), so that the function Φ/Φ is uniquely determined by the characteristic functions of $Y'_j$ and $Y_j$. Combining that with (2.4), we notice that $f_X^{ft}(t)$ is indeed identifiable from the data in model (1.2).
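The telescoping argument can be verified numerically. The sketch below is an illustration in our own notation, not a transcription of (2.4): it assumes that $Y_j$ has the characteristic function $\psi(t) = f_X^{ft}(t) f_\varepsilon^{ft}(t)$ and that $Y'_j$ has the characteristic function $\psi'(t) = f_X^{ft}(t) \{f_\varepsilon^{ft}(\sigma t)\}^m$, which gives $\psi'(s)/\psi(\sigma s)^m = f_X^{ft}(s)/\{f_X^{ft}(\sigma s)\}^m$. Normal densities are used so that all characteristic functions are real and positive; the names `cf_normal` and `telescoped_cf` are hypothetical.

```python
import numpy as np

def cf_normal(t, var):
    """Characteristic function of the centred normal law with variance var."""
    return np.exp(-0.5 * var * t**2)

def telescoped_cf(t, sigma, m, K, var_x=0.25, var_eps=1.0):
    """Return the K-term telescopic product built from psi and psi' and the
    remainder factor f_X^ft(sigma^K t)^(m^K)."""
    psi = lambda s: cf_normal(s, var_x) * cf_normal(s, var_eps)
    psi_prime = lambda s: cf_normal(s, var_x) * cf_normal(sigma * s, var_eps) ** m
    product = np.ones_like(t)
    for k in range(K):
        s = sigma**k * t
        # Each factor equals f_X^ft(s)^(m^k) / f_X^ft(sigma s)^(m^(k+1)).
        product *= (psi_prime(s) / psi(sigma * s) ** m) ** (m**k)
    remainder = cf_normal(sigma**K * t, var_x) ** (m**K)
    return product, remainder

t = np.linspace(-3.0, 3.0, 61)
product, remainder = telescoped_cf(t, sigma=0.5, m=2, K=8)

# product * remainder reproduces f_X^ft exactly (up to rounding) ...
identity_error = float(np.max(np.abs(product * remainder - cf_normal(t, 0.25))))
# ... and the remainder is already close to one for moderate K.
remainder_gap = float(np.max(np.abs(remainder - 1.0)))
print(identity_error, remainder_gap)
```

The product involves only $\psi$ and $\psi'$, i.e. quantities that can be estimated from the two samples, while the unknown remainder factor tends to one when $\sigma \le 1/m$ (and $\sigma < 1$ for $m = 1$).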
We employ the empirical characteristic functions of the two samples, whose true counterparts are denoted by Ψ(t) and Ψ(t), respectively, and we propose their plug-in versions as the estimators for Φ(t) $h_\varepsilon(t)$ and Φ(t) $h_\varepsilon(t)$, respectively, with the common factor $h_\varepsilon(t)$. Motivated by (2.4) and (2.5) we define, utilizing some fixed $0 < \rho < 1$, the estimator (2.6) for $f_X^{ft}(t)$. Therein, the ridge parameter $\rho \in (0, 1)$ is introduced in order to prevent the denominator from being too close to zero due to stochastic deviations.
Actually, the ridge parameter has only a minor influence on the estimator, as the estimators use a shrinking interval on which Φ(t) tends to one.
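The basic ingredient of the procedure, the empirical characteristic function of each sample, can be sketched as follows. This is a minimal illustration and not the estimator (2.6) itself; the function name `empirical_cf`, the random seed and the normal choices for $f_X$ and $f_\varepsilon$ are assumptions made only for this example (with $m = 1$ and $\sigma = 0.5$).

```python
import numpy as np

def empirical_cf(sample, t):
    """Empirical characteristic function (1/n) * sum_j exp(i t Y_j) on the grid t."""
    sample = np.asarray(sample, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.exp(1j * np.outer(t, sample)).mean(axis=1)

rng = np.random.default_rng(0)
n, sigma = 1000, 0.5

# Two independent samples: Y_j = X_j + eps_j and Y'_j = X'_j + sigma * eps'_j,
# with X ~ N(0, 1/4) (standard deviation 1/2) and eps ~ N(0, 1).
y = rng.normal(0.0, 0.5, n) + rng.normal(0.0, 1.0, n)
y_prime = rng.normal(0.0, 0.5, n) + sigma * rng.normal(0.0, 1.0, n)

t = np.linspace(-1.0, 1.0, 41)
psi_hat = empirical_cf(y, t)
psi_prime_hat = empirical_cf(y_prime, t)

# Compare with the true characteristic function of Y, exp(-(1/4 + 1) t^2 / 2).
true_psi = np.exp(-0.5 * (0.25 + 1.0) * t**2)
max_dev = float(np.max(np.abs(psi_hat - true_psi)))
print(max_dev)
```

Both empirical characteristic functions are bounded by one in modulus and deviate from their targets by terms of stochastic order $n^{-1/2}$ at each fixed $t$, which is why quotients of such quantities need protection when the denominator becomes small.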
In deconvolution problems it is common to assume that $f_\varepsilon^{ft}(t) \ne 0$ for all $t \in \mathbb{R}$. In order to derive convergence rates, even specific positive lower bounds on $|f_\varepsilon^{ft}|$ are required (see e.g. Fan (1991)). However, such conditions are often introduced for mathematical convenience; they are satisfied for normal or Laplace densities but they are not valid for other important densities such as the convolution of uniform densities with any other distribution. The case where the error density has some periodic isolated zeros has been studied in the papers of Hall and Meister (2007) and Meister (2008). By contrast, Meister (2007) introduces a consistent estimator of $f_X$ provided that $f_X$ is compactly supported, so that $f_\varepsilon^{ft}$ is permitted to vanish on an open non-void interval. Here, we do not assume that $f_X$ is compactly supported but use a decay constraint which is contained in the assertion $f_X \in \mathcal{F}$. The estimation procedure in Meister (2007) is based on global polynomial extension in the Fourier domain and kernel regularization. While that method achieves consistency, its finite sample performance suffers from the Gibbs phenomenon. We introduce a novel procedure based on an orthogonal series approach, which seems to be more appropriate. We consider the orthogonal expansion (2.7), where the infinite sum converges in the $L_2(\mathbb{R})$-sense and pointwise. Therein the space $L_2(\mathbb{R}, \exp(-\cdot^2/2))$ denotes the Hilbert space consisting of all functions which are square-integrable with respect to the weight function $\exp(-\cdot^2/2)$. We realize that the coefficients of this expansion are linear combinations of the moments of $X_1$, which are accessible through the restriction of the Fourier transform $f_X^{ft}$ to a bounded domain. This will be made precise in the following paragraph. This is a great advantage of the Hermite polynomial basis in the current setting. Besides the fact that $f_\varepsilon^{ft}$ may have non-isolated zeros, another aspect encourages us to apply an approach which uses an estimate of $f_X^{ft}$ only on a bounded interval. For large $|t|$, the impact of the ridge regularization of the estimator $\hat\Psi_X(t)$ in (2.6) will increase, which however adds non-negligible bias.
Here that phenomenon is even more critical than for usual deconvolution, as the denominator depends not only on $f_\varepsilon^{ft}$ but also on $f_X^{ft}$, yielding faster decay of the unregularized denominator.
Furthermore, we have for all integer $l \ge 0$. Next, for some $K \in \mathbb{N}$ we define the quantities $c = (c_0, c_1, \ldots, c_K)^T$, for $l, k = 0, \ldots, K$ (note that $(W)_{l,k}$ denotes the $k$th coefficient of the function $\cdot^l$ with respect to the orthogonal basis generated by the Hermite polynomials). We derive that with $e_j = (\delta_{j,k})_{k=0,\ldots,K}$, (2.8). Next, the vector $d$ can be estimated via the characteristic function since by integration by parts, we have for 0 where the function Therefore, combining the equations (2.7) and (2.8) motivates the final density estimator (2.9), where the smoothing and ridge parameters K, b, ρ, K are still to be selected.
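The two algebraic facts used in this construction, the orthogonality of the Hermite polynomials with respect to the weight $\exp(-x^2/2)$ and the representation of the monomials $x^l$ in that basis, can be checked with NumPy's `hermite_e` module. The sketch below assumes the unnormalized probabilists' Hermite polynomials $He_k$, so the matrix `W` may differ from the paper's $W$ by normalizing constants; $K = 6$ is the value used in the simulation section.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial import hermite_e as He

K = 6

# Gauss quadrature for the weight exp(-x^2/2); K + 1 nodes integrate
# polynomials up to degree 2K + 1 exactly, enough for products He_j * He_k.
nodes, weights = He.hermegauss(K + 1)
basis = np.array([He.hermeval(nodes, np.eye(K + 1)[k]) for k in range(K + 1)])

# Gram matrix with entries int He_j(x) He_k(x) exp(-x^2/2) dx.
gram = (basis * weights) @ basis.T
expected_gram = np.diag([sqrt(2.0 * pi) * factorial(k) for k in range(K + 1)])

# Row l of W holds the coefficients of x^l in the basis He_0, ..., He_K.
W = np.zeros((K + 1, K + 1))
for l in range(K + 1):
    monomial = np.zeros(l + 1)
    monomial[l] = 1.0
    W[l, : l + 1] = He.poly2herme(monomial)

print(np.round(W[:4, :4], 6))
```

Since $W$ is lower triangular with unit diagonal, it is invertible, so the first $K + 1$ moments of $X_1$ and the first $K + 1$ Hermite coefficients determine each other.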

Convergence rates -upper bounds
Throughout this paper, we write $\|f\|_p$, $p > 0$, for the $L_p(\mathbb{R}^d)$-norm of a function $f \in L_p(\mathbb{R}^d)$, where $d$ may be any integer larger than or equal to 1. As the dimension $d$ of the domain of $f$ is clear for any involved function, we feel that there is no need to include $d$ in the notation $\|\cdot\|_p$.
Theorem 3.1. For $n \in \mathbb{N}$ we consider data drawn from the statistical model (1.2) where $\sigma \in (0, 1)$ Then, for $\beta \ge 1$, the estimator (2.9) satisfies
Thus, we give the convergence rates from a double-uniform asymptotic view, i.e. the statistical risk is considered uniformly with respect to both $f_X \in \mathcal{F}$ and $f_\varepsilon \in \mathcal{G}$. The logarithmic rates established for our estimator in Theorem 3.1 coincide with those derived for density deconvolution under known normal error distributions (see Fan (1991)) up to some iterated logarithmic loss. As in supersmooth deconvolution, we realize that the exact smoothness degree $\beta$ of the target density is not required to be known in advance in order to obtain those rates by parameter selection.

Convergence rates -lower bounds
As normal densities are included in our condition $f_\varepsilon \in \mathcal{G}$, it is apparently impossible to significantly improve the result in Theorem 3.1 by getting faster rates such as algebraic rates, for instance. At most, the iterated logarithmic factor could be removed. Nevertheless, in the case of $m = 1$ we will establish a logarithmic lower bound on the convergence rates with respect to an arbitrary estimator based on the given data structure (1.2), although we restrict our consideration to a subclass $\mathcal{G}' \subseteq \mathcal{G}$ of admitted error densities whose Fourier transforms have a polynomial lower bound. Thus, the densities contained in $\mathcal{G}'$ are ordinary smooth in the terminology of Fan (1991). Note that the smaller the class $\mathcal{G}'$, the stronger the results, as we are considering lower bounds. Therefore, if the error density $f_\varepsilon \in \mathcal{G}'$ was known in model (1.2), then algebraic rates could be achieved by e.g. the standard deconvolution kernel density estimator as defined in Stefanski and Carroll (1990). This highlights the fact that the slow convergence rates in Theorem 3.1 are not due to some fast decay of the Fourier transform of the error density; rather, allowing for the error density to be unknown really causes that remarkable deterioration of the rates from algebraic to logarithmic under the given smoothness constraints on the target density $f_X$. Hence, the following Theorem 3.2 will establish near optimality of our estimator (2.9) in the given setting.
On the other hand, assuming polynomial lower bounds on both $f_\varepsilon^{ft}$ and $f_X^{ft}$, algebraic rates are attainable by a different estimation procedure, whose applicability is restricted to the cases where the existence of those lower bounds can be justified; see Wagner (2009).
Theorem 3.2. Consider the model (1.2) with $m = 1$. Assume that $C_1$ and $C_2$ are sufficiently large and small, respectively, and $\beta \ge 1$. We put $\mathcal{G}'$ equal to the set of all $f_\varepsilon \in \mathcal{G}$ which satisfy $|f_\varepsilon^{ft}(t)| \ge C(1 + |t|^\alpha)^{-1}$ for all $t$ and some fixed $C > 0$, $\alpha > 1/2$. Any estimator $\hat f_X$ of $f_X$ based on the data $Y_1, \ldots, Y_n$ and for $n$ sufficiently large.
Combining Theorems 3.1 and 3.2, we conclude that our estimator (2.9) attains nearly optimal convergence rates when $m = 1$, i.e. optimal up to some iterated logarithmic factor.

Numerical simulations
The finite sample performance of our estimator (2.9) is studied by means of numerical simulations. We applied the estimator to data generated from four different target densities $f_X$: the normal density $N(0, 1/4)$ (Figures 1 and 2), the bimodal normal mixture density $0.6 \cdot N(1, 1/4) + 0.4 \cdot N(-2, 1/4)$ (Figures 3 and 4), the triangle density on $[-2, 2]$ (Figures 5 and 6) and the triangle density on $[-0.5, 2.5]$ (Figures 7 and 8). In Figures 1, 3, 5 and 7 we consider the case of small sample size ($n = 100$), while in Figures 2, 4, 6 and 8 the simulations are based on $n = 1000$ data. The error density $f_\varepsilon$ is $N(0, 1)$, $\sigma = 0.5$ and $m = 1$. We have run 100 independent replicates in each setting. The figures show five arbitrarily chosen representatives of those replicates along with the true density, which is plotted as a dotted curve. Furthermore, we have computed the integrated squared errors (ISE) of the estimates; the average, the maximum and the minimum of the ISE taken over these 100 simulations are provided in the tables below the corresponding figures.
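The data-generating side of this simulation design can be sketched as follows. This is an illustrative reconstruction rather than the code behind the reported figures: the seed, the function names and the assumption that both triangle densities are symmetric about the midpoint of their support are ours. The estimator (2.9) itself is not implemented here; `integrated_squared_error` only shows how an ISE would be computed from an estimate evaluated on a grid.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_targets(n):
    """Draw n values from each of the four target densities of the study."""
    first_component = rng.random(n) < 0.6
    return {
        "normal": rng.normal(0.0, 0.5, n),  # N(0, 1/4): standard deviation 1/2
        "mixture": np.where(first_component,
                            rng.normal(1.0, 0.5, n),
                            rng.normal(-2.0, 0.5, n)),
        # Triangle densities, assumed symmetric with the mode at the midpoint.
        "triangle_centered": rng.triangular(-2.0, 0.0, 2.0, n),
        "triangle_shifted": rng.triangular(-0.5, 1.0, 2.5, n),
    }

def contaminate(x, x_prime, sigma=0.5):
    """Model (1.2) with m = 1 and N(0, 1) errors: Y = X + eps, Y' = X' + sigma * eps'."""
    y = x + rng.normal(0.0, 1.0, len(x))
    y_prime = x_prime + sigma * rng.normal(0.0, 1.0, len(x_prime))
    return y, y_prime

def integrated_squared_error(estimate, truth, grid):
    """Trapezoidal approximation of the integral of (estimate - truth)^2 over grid."""
    diff_sq = (np.asarray(estimate) - np.asarray(truth)) ** 2
    return float(np.sum(0.5 * (diff_sq[1:] + diff_sq[:-1]) * np.diff(grid)))

# X_j and X'_j are independent, so two separate draws are needed.
x, x_prime = sample_targets(100), sample_targets(100)
y, y_prime = contaminate(x["mixture"], x_prime["mixture"])
print(y[:3], y_prime[:3])
```

Repeating the last three lines 100 times per density and sample size, and feeding each pair of samples to an implementation of the estimator, would give the replicates summarized in the tables.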
We choose the parameters b = 0.08, K = 6, ρ = 0.2 and K = 6 in any simulation. Further numerical inspection indicates that the specific parameter selection of K and ρ does not affect the outcome very much. Indeed, in our examples the protection of the denominator by the threshold parameter ρ was redundant, i.e. putting ρ = 0 leads to the same result. The selection of the parameters b and K is more critical. However, the simulations indicate that the current choice seems reasonable for realistic sample sizes. In particular, one should not select K too large in order to avoid an explosion of the variance, which is also supported by the theoretical aspect of the logarithmic convergence rates. Also, a slight modification of the estimator (2.9) was carried out. The function $\varphi_K$ is appropriate for the theoretical purposes of this work; however, its derivatives show heavy oscillations near the support boundaries. Thus, the numerical integration of these functions is highly unstable. Therefore, we have replaced this function by a normal density, which is not compactly supported. Still, it performs well due to its fast decay in the tails. As a disadvantage of Hermite polynomials, the flexibility of the choice of the smoothing parameter is limited as only integers are admitted. Nevertheless, kernel techniques as studied in Wagner (2009) do not provide better simulation results than the current Hermite polynomial approach in the considered setting.
Of course, we observe an improvement with respect to the variance when increasing the number $n$ of the simulated data, in particular for the bimodal target densities where some structural outliers cannot be avoided for $n = 100$. Nevertheless, the results indicate that the basic structure of the target densities is well detected by our estimator, although we certainly face a very difficult estimation problem which does not admit estimation at algebraic convergence rates.

Proofs
Proof of Theorem 3.1. Applying the decomposition (2.7) to $f_X$, we obtain the bound (5.1), where we have used the orthogonality of the Hermite polynomials with respect to the inner product involving the weight function $\exp(-\cdot^2/2)$ and the inequality $\exp(-x^2) \le \exp(-x^2/2)$ for all $x \in \mathbb{R}$.
The smoothness conditions contained in $f_X \in \mathcal{F}$ must be exploited in order to bound the last term in (5.1). Note that the Hermite polynomials based on the weight $\exp(-\cdot^2/2)$ satisfy the relation . It follows by integration by parts that and, hence, by induction that uniformly bounded with respect to $f_X \in \mathcal{F}$. By Parseval's identity with respect to the Hermite polynomial basis and by assumption (2.1), we derive that Setting $j = \beta$ we find for $K \ge 4(\beta - 1)$ and any (5.2) Thus, we have derived an upper bound on the bias term. Now we focus on the first term in (5.1). As $\varphi_K$ is compactly supported and $(K+1)$-fold continuously differentiable, its derivatives up to order $K + 1$ and, hence, their Fourier transforms $\cdot^l \varphi_K^{ft}$ lie in $L_2(\mathbb{R})$ for any $l = 0, \ldots, K + 1$; and we have ϕ for almost all $t \in \mathbb{R}$ by simple Fourier inversion so that Representing the functions $\cdot^l$, $l = 0, \ldots, K$, in the orthogonal basis formed by the Hermite polynomials leads to where the Plancherel isometry has been used yielding for all $g, h \in L_2(\mathbb{R})$. In the sequel, we write const. for a generic constant which depends on neither $f_X$ nor $f_\varepsilon$. As $\int \exp(-i \cdot x) H_k(x) \varphi_K^{ft}(bx)\,dx$ is a linear combination of finitely many derivatives of the function $\varphi_K(\cdot/b)$, which is supported on $[-b, b]$, we may conclude by the Cauchy-Schwarz inequality and Parseval's identity that where we have used a second order Taylor approximation of The recursion relation for Hermite polynomials, by induction and an estimate by the exponential series. These upper bounds yield that by the Fourier representation of the Sobolev norm. Moreover, we have used that for any $k = 0, \ldots, K$ and $|x| \le 1$.
Inserting that inequality along with (5.2) into (5.3) and then into (2.7), we obtain that Focussing on the term $E|\hat\Psi_X(t) - f_X^{ft}(t)|^2$, we write Ψ and Ψ for the expectation of Ψ and Ψ, respectively. Moreover, we introduce the notation (5.5) For the first term we have As ‖Φ‖∞, ‖Φ D‖∞ ≤ 1 a.s., we have Hence, it suffices to bound E|Φ(t) − ΦD(t)|² and E|Φ(t) − Φ D(t)|². We restrict our consideration to the first term as the second one can be treated analogously, where we have applied the Cauchy-Schwarz inequality along with the identity $a^p - b^p = (a - b) \sum_{j=0}^{p-1} a^j b^{p-1-j}$ and ‖Ψ‖∞, ‖Ψ‖∞ ≤ 1 a.s. For the second deterministic term in (5.5) we consider that, for $|t| \le b$, as $b \to 0$ when using (2.2) combined with the assumption $f_\varepsilon \in \mathcal{G}$ and the condition $\sigma \le 1/m$. Therein, note that $\Psi(t) = f_X^{ft}(t) f_\varepsilon^{ft}(t)$. As an important result, the convergence of Φ D(t) to one takes place uniformly with respect to $|t| \le b$, $f_X \in \mathcal{F}$ and $f_\varepsilon \in \mathcal{G}$. Thus, for $n$ sufficiently large, we have uniformly in $|t| \le b$ that Applying this formula to evaluate the second term in (5.5) we find that where, again, we have used (2.2) and the inequality $\sigma \le 1/m$ uniformly over all $|t| \le b$. Applying that result to (5.4), we obtain that Inserting the parameter choice as requested in the theorem, we realize that all terms except the last one are asymptotically negligible. Thus, we have a bias-dominated problem; the order of the last term determines the convergence rate stated in the theorem.
Proof of Theorem 3.2. As two density sequences competing to be the true density of the $X_j$, we define where $f_N$ denotes the standard normal density; $V$ denotes the uniform distribution on the set Under the selection $a_n = c_a K_n^{-\beta}$ with an appropriate constant $c_a > 0$, we can verify that $f_{X,d} \in \mathcal{F}$ for $n$ sufficiently large. Also we are guaranteed that $f_{X,d}$, $d = d_0, d_1$, are density functions, as their nonnegativity and their membership in $L_1(\mathbb{R})$ follow from the definition of $f_{X,d}$ (note that $1 + a_n d \cos((K_n + 1/2)\pi x) \ge 0$ for all $x$ and $n$ large enough). Furthermore, these functions integrate to one as $f_{X,d}^{ft}(0) = 1$. With respect to the error density, we will also specify the two competing density sequences $f_{\varepsilon,d}$, $d = d_0, d_1$. We define $f_{\varepsilon,d}$ via its Fourier transform; we put . By appropriate selection of the terms $c_{t,1} > c_{t,0} > 0$, the function $f_{\varepsilon,d_0}^{ft}$ may be continued onto $[-c_{t,1} K_n, -c_{t,0} K_n]$ and $[c_{t,0} K_n, c_{t,1} K_n]$ by the tangent lines of $f_{\varepsilon,d_1}^{ft}$ taken at the points $-c_{t,0} K_n$ and $c_{t,0} K_n$, respectively, so that $f_{\varepsilon,d_0}^{ft}$ is continuous on the whole real line. Therein, we allow for $c_{t,0}$ and $c_{t,1}$ to depend on $n$, but $c_{t,0}$ is bounded away from zero and $c_{t,1}$ is bounded from above by a positive number smaller than $\sigma$. Those conditions may be ensured by choosing $c_{t,0}$ sufficiently small since those tangent lines take the value 0 at We realize that the competing error densities have the following properties: $f_{\varepsilon,d_0}^{ft}$ and $f_{\varepsilon,d_1}^{ft}$ are symmetric; $f_{\varepsilon,d_0}^{ft}(0) = f_{\varepsilon,d_1}^{ft}(0) = 1$; they are continuously differentiable on $(0, \infty)$ with finite right side derivatives at 0; continuous on the whole of $\mathbb{R}$; monotonically decreasing and convex on $[0, \infty)$; $\lim_{t \to +\infty} f_{\varepsilon,d_0}^{ft}(t) = \lim_{t \to +\infty} f_{\varepsilon,d_1}^{ft}(t) = 0$. Hence, we learn from Polya's criterion (see e.g. Lukacs (1970), p. 83, Theorem 4.3.1) that $f_{\varepsilon,d}$, $d = d_0, d_1$, are both probability densities and contained in $L_2(\mathbb{R})$, where the latter property follows from the integrability of $|f_{\varepsilon,d}^{ft}|^2$, $d = d_0, d_1$, by Parseval's identity. One can easily recognize that the polynomial lower bound on $f_{\varepsilon,d}^{ft}$ is satisfied, as is the Lipschitz condition on $f_{\varepsilon,d}^{ft}$ contained in the definition of $\mathcal{G}$. Therefore, we have verified that $f_{\varepsilon,d} \in \mathcal{G}'$ for $d = d_0, d_1$.
After constructing those densities, we write $\hat f_X$ for an arbitrary estimator of $f_X$ based on the given observation scheme. We use the following arguments from statistical decision theory.
$$\begin{aligned} \sup_{f_X \in \mathcal{F},\, f_\varepsilon \in \mathcal{G}'} E \big\|\hat f_X - f_X\big\|_2^2 &\ge \frac{1}{2} \Big( \int \big\|\hat f_X(y) - f_{X,d_0}\big\|_2^2\, h_{d_0,d_1}(y)\,dy + \int \big\|\hat f_X(y) - f_{X,d_1}\big\|_2^2\, h_{d_1,d_0}(y)\,dy \Big) \\ &\ge \frac{1}{4} \big\|f_{X,d_0} - f_{X,d_1}\big\|_2^2 \int \min\big\{h_{d_0,d_1}(y), h_{d_1,d_0}(y)\big\}\,dy, \end{aligned} \tag{5.6}$$
where $h_{d_\theta, d_{1-\theta}}$, $\theta \in \{0, 1\}$, denotes the density of the data set $(Y_1, Y'_1, \ldots, Y_n, Y'_n)$ when $f_{X,d_\theta}$ and $f_{\varepsilon,d_{1-\theta}}$ are the true densities of $X_1$ and $\varepsilon_1$, respectively. By a telescopic sum argument, we derive that (5.7) Now we consider the $L_1(\mathbb{R})$-distance by the Cauchy-Schwarz inequality and Parseval's identity. The analogous upper bound holds true for the first $L_1(\mathbb{R})$-distance occurring on the right side of (5.7) when setting $\sigma = 1$. Note that the derivatives of $f_{\varepsilon,d_0}^{ft}$ and $f_{\varepsilon,d_1}^{ft}$ are to be understood in the weak Sobolev sense. For $|t| \ge \sigma^{-1} c_{t,1} K_n$, we derive that for $k, l, l' = 0, 1$. Thus, we obtain that (5.9) From the definition of $f_{\varepsilon,d_1}^{ft}$, we easily derive that its first weak derivative is bounded (with a jump discontinuity at $t = 0$) while the function itself is bounded by 1. Due to the specific exponential tails of $f_N^{ft}(t)$ and its derivative, we obtain that (5.9) has the upper bound $O\big(K_n^3 \exp(-\text{const.} \cdot K_n^2)\big)$. This bound affects all partial integrals taken over $|t| \ge \sigma^{-1} c_{t,1} K_n$ in (5.8). Now we consider those partial integrals taken over $|t| \le c_{t,0} K_n$. Note that, on this domain, we have coincidence of $f_{\varepsilon,d_0}^{ft}$ and $f_{\varepsilon,d_1}^{ft}$ on the one hand and $f_{\varepsilon,d_0}^{ft}(\sigma\,\cdot)$
-fold continuously differentiable on the whole real line and supported on $[-1, 1]$, and $B(\cdot, \cdot)$ denotes the Beta function. Thus we define $\hat D_l = i^l b^{-l-1} \int \varphi_K^{(l)}(t/b)\, \hat\Psi_X(t)\,dt$ with $\hat\Psi_X$ as in (2.6), and the vector $\hat d := (\hat D_0, \ldots, \hat D_K)^T$.