Nonparametric estimation of the ability density in the Mixed-Effect Rasch Model

Abstract: The Rasch model is widely used in the field of psychometrics when n persons under test answer m questions and the score, which describes the correctness of the answers, is given by a binary n × m matrix. We consider the Mixed-Effect Rasch Model, in which the persons are chosen randomly from a huge population. The goal is to estimate the ability density of this population under nonparametric constraints, which turns out to be a statistical linear inverse problem with an unknown but estimable operator. Based on our previous result on asymptotic equivalence to a two-layer Gaussian model, we construct an estimation procedure and study its asymptotic optimality properties as n tends to infinity, as does m, but moderately with respect to n. Moreover, numerical simulations are provided.


Introduction
We consider the famous Rasch model, which is used to analyse psychometric surveys when n individuals under test answer m questions. The score is given by a realization of a random binary n × m matrix X. Its (j, k)th entry indicates whether the answer of the jth person to the kth question is correct: in this case we have X_{j,k} = 1, and we put X_{j,k} = 0 otherwise. In the standard Rasch model, all entries X_{j,k} are assumed to be independent binary random variables which satisfy

P(X_{j,k} = 1) = \frac{\exp\{\beta_j - \theta_k\}}{1 + \exp\{\beta_j - \theta_k\}}, \qquad k = 1, \ldots, m; \ j = 1, \ldots, n,

where the parameter θ_k describes the difficulty of the kth question and the parameter β_j represents the ability of the jth person. For fundamental literature on this model we refer to the book of [29]. The Rasch model has also been used in the field of econometrics (e.g. [5], [6], [25], [18]).
In this paper we focus on the Mixed-Effect Rasch Model (MRM), in which the individuals under test are assumed to be randomly chosen from a huge population, so that the β_j, j = 1, \ldots, n, are interpreted as i.i.d. random variables with some distribution F. Thus the underlying statistical experiment X_{n,m} equals

X_{n,m} := \big( \{0,1\}^{n \times m}, \mathcal{P}(\{0,1\}^{n \times m}), (P_{\theta,F})_{(\theta,F) \in \Theta \times \mathcal{F}} \big),

where the probability measure P_{\theta,F} satisfies

P_{\theta,F}(\{\omega\}) = \prod_{j=1}^{n} \int \prod_{k=1}^{m} \frac{\exp\{(\beta - \theta_k)\,\omega_{j,k}\}}{1 + \exp\{\beta - \theta_k\}} \, dF(\beta)

for all ω ∈ \{0,1\}^{n \times m}. The specification of the parameter sets Θ and F is deferred to subsection 2.2. The MRM is studied e.g. in the books of [11] and [37] as well as in [24], [31], [33] and [34]. So far, the literature on Rasch models has mainly focused on the estimation of the difficulty parameters, consistency and asymptotic normality (mostly for bounded m). Therein maximum likelihood (ML), quasi-ML and conditional ML methods are the most popular procedures, see e.g. [10], [14], [13], [7], [27], [28], [32]. By the conditional ML approach, the difficulty parameters can be estimated without estimating the ability distribution beforehand. Also see [36] for a review of the related literature. [1] consider saddlepoint approximation of the ability parameters. [8] construct confidence intervals for the ability parameters. [33] and [34] study the covariance structure and asymptotic distribution of quasi-ML estimators in the MRM. [24] focus on semiparametric estimation in the MRM, where the ability distribution is estimated via ML-related methods (mainly parametrically specified, although the number of parameters is allowed to increase in some of the results). [12] considers consistent estimation in the MRM for classes of discrete ability distributions with a finite number of support points. [15] studies identification and estimation of parameters under an unconstrained ability distribution; [38] consider log-linear models for the ability distributions; [39] use skew-normal distributions.
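For illustration, the two-layer sampling scheme of the MRM can be sketched in a few lines of code; the function name and the choice of a standard normal ability distribution F are purely illustrative and not part of the model specification above.

```python
import numpy as np

def simulate_mrm(n, theta, sample_ability, rng=None):
    """Simulate the Mixed-Effect Rasch Model: abilities beta_1, ..., beta_n are
    drawn i.i.d. from the ability distribution F, then each entry X[j, k] is
    Bernoulli with logistic success probability depending on beta_j - theta_k."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    beta = np.array([sample_ability(rng) for _ in range(n)])  # i.i.d. draws from F
    p = 1.0 / (1.0 + np.exp(-(beta[:, None] - theta[None, :])))
    return (rng.random(p.shape) < p).astype(int)

theta = np.array([0.5, -0.5, 0.2, -0.2])            # difficulties, summing to zero
X = simulate_mrm(200, theta, lambda r: r.normal())  # F = N(0, 1), illustrative
print(X.shape)                                      # (200, 4)
```

The sums of the rows and the columns of such a matrix X are exactly the statistics whose sufficiency is exploited in [22].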
Still, to the best of our knowledge, fully nonparametric approaches to the estimation of the ability distribution, along with a rigorous investigation of the theoretical properties (minimax convergence rates, adaptivity etc.), are apparently missing. In large international surveys, estimation of the ability distribution seems very important, for example in order to compare the ability distributions of the students from two different countries.
Therefore the current work focuses on the estimation of the ability density in the MRM. We provide a fully nonparametric procedure for the estimation of this density while the difficulty parameters are unknown. Moreover, this nonparametric estimator can be applied to treat parametric submodels and to construct parametric goodness-of-fit tests as well, where the investigation of the asymptotic distribution of our estimator remains a challenging task. Our method differs significantly from the existing approaches as we do not use ML-based techniques; instead, we exploit asymptotic equivalence of the MRM to a two-layer Gaussian experiment (more precisely, a Gaussian observation and, conditionally on that, another Gaussian observation) in Le Cam's sense, which has recently been proved in [22]. In that paper the authors use some components of the observation in order to construct an asymptotic confidence region for the difficulty parameters and leave the problem of nonparametric estimation of the ability density open for future research. Thus, in the current work, the same authors tackle exactly this problem. Moreover, another asymptotically equivalent Gaussian model with two independent (multivariate) observations will be derived in the current work.
When working with the asymptotically equivalent version of the MRM, estimating the ability density represents a statistical linear inverse problem with Gaussian noise and an unknown but empirically accessible linear operator. Such models occur quite frequently in nonparametric statistics, see e.g. [9], [4], [19] and [35]. Other examples in this class of models concern deconvolution and image deblurring from an unknown point-spread function (e.g. [17]) or functional linear regression (e.g. [16], [26]). Each of these inverse problems has its own major challenges, since the procedure and the theoretical results heavily depend on the operator and its properties, and since the type of empirical information available on the operator varies. This holds true for the current model as well.
Two- and three-parameter models from logistic item response theory (2-PL IRT and 3-PL IRT) are popular extensions of the Rasch model for the analysis of psychometric surveys (e.g. [2], [3]). Therein the probability that the jth individual provides the correct answer to the kth question is modelled by

P(X_{j,k} = 1) = b_k + (1 - b_k) \frac{\exp\{a_k(\beta_j - \theta_k)\}}{1 + \exp\{a_k(\beta_j - \theta_k)\}}

with the difficulty parameters a_k, b_k, θ_k and the person parameter β_j. Note that the standard Rasch model is included for b_k = 0 and a_k = 1. There is no straightforward extension of our procedures to the estimation of the ability distribution in the corresponding mixed-effect model, since the sums of the rows and the columns apparently do not form a sufficient statistic in general. Thus the basic arguments from [22] cannot be taken over. Nonparametric estimation of the ability distribution in 2-PL and 3-PL IRT remains an interesting topic for future research.
The paper is organized as follows: In section 2 we describe the asymptotically equivalent experiments, which motivate our estimation procedures and investigation. In section 3 the estimation procedures are provided and explained. In section 4 the asymptotic properties of these procedures are studied. In section 5 we compute the transforming Markov kernels for the link between the original MRM and the asymptotically equivalent models, and we provide numerical simulations. The proofs are deferred to section 6.

Notation
In the following we introduce some frequently used terms, where the terminology is partially adopted from [22]. These terms are required to understand the asymptotically equivalent Gaussian models in subsection 2.2. A shifted version of the difficulty parameters is given by ϑ_k := θ_k − θ_m for all k = 1, \ldots, m. The set of all vectors b ∈ \{0,1\}^m with exactly k components equal to 1 is denoted by B(k, m). For any ϑ = (ϑ_1, \ldots, ϑ_{m−1}) and N = (N_1, \ldots, N_m) ∈ \mathbb{N}_0^m we define the function Ψ_N. In order to provide some insight about this function, we mention that, in [22], some statistics containing the sums of the rows and the columns in the MRM have been detected to be sufficient, and their asymptotic distribution has been studied. Therein the conditional expectation of one statistic given the other equals minus the gradient of Ψ_N, and the corresponding conditional covariance matrix equals the Hessian matrix of Ψ_N; see [22], p. 1337. Hence this function plays a major role in the asymptotically equivalent Gaussian models, on which our estimation strategy will be based. The function τ maps from \mathbb{R}^{m+1} to \mathbb{R}^m. Moreover, ΔΨ_N(ϑ) stands for the corresponding Hessian matrix, and [·] and (·)_+ denote componentwise rounding and truncation by zero, respectively. Furthermore we write N(μ, Σ) for the normal distribution with the expectation vector μ and the covariance matrix Σ. The function which maps all elements of the domain to 1 is called 1. Finally we define the vector q(θ, F) and the matrix Q(θ, F). These quantities also occur as the expectation vector and the covariance matrix of one statistic in the asymptotically equivalent Gaussian experiment in the next subsection.

Asymptotically equivalent models
Let us summarize the conditions on the parameter sets Θ and F which are assumed in [22] in order to establish asymptotic equivalence of the MRM and the two-layer Gaussian experiment. As our procedure is based on this result, we also impose these constraints to deduce all theoretical properties of the current paper, while, in practice, the procedure may still be useful when these assumptions are violated. The vector θ = (θ_1, \ldots, θ_m), which contains all difficulty parameters as its components, is imposed to lie in the set

Θ = Θ_m := \Big\{ θ ∈ [−R, R]^m : \sum_{k=1}^{m} θ_k = 0 \Big\}

for some constant R > 0. Therein the difficulty parameters are required to sum up to zero, which represents a common calibration in order to guarantee identifiability. The distribution class F contains exactly those distributions F on \mathbb{R} which have a Lebesgue density f that satisfies the tightness condition (2.3) for some universal envelope function \bar f with ∫ \bar f(x)\,dx < ∞ and some universal positive constants D_0 and D_1. Moreover, we impose the existence of some β > 3D_1 + 29/2 such that

\sup_n m_n^{\beta}/n < ∞, \qquad (2.4)

for m = m_n and m ≥ 3. For simplicity, the index n of m_n will be omitted in the sequel. Condition (2.4) allows m, the number of questions, to increase slowly as n, the number of participants of the survey, tends to infinity. That constraint seems realistic in many psychometric surveys. Under the conditions (2.3) and (2.4), [22] prove asymptotic equivalence of the MRM X_{n,m} and the experiment Y_{n,m}, in which the 2m-dimensional random vector (X, Y) is observed (notation changed with respect to [22]), when n tends to infinity. The distribution of Y equals L(Y) = N(n q(θ, F), n Q(θ, F)). The conditional distribution of X given Y equals N(ϑ, \{ΔΨ_{[τ(Y)]_+}(ϑ)\}^{−1}). This consideration is now continued by showing asymptotic equivalence of the experiment Y_{n,m} and the experiment Z_{n,m}, in which one observes (X^*, Y), where X^* and Y are independent; Y is as in the experiment Y_{n,m}, while L(X^*) = N(ϑ, \{ΔΨ_{n q(θ,F)}(ϑ)\}^{−1}).
For this purpose, no transformation of the data by a Markov kernel is required. Note that, in the above Hessian matrix, the partial derivatives are only taken with respect to the ϑ_j, not with respect to the θ in the index. Experiment Z_{n,m} is advantageous compared to experiment Y_{n,m} in that the statistics X^* and Y are independent. On the other hand, the covariance matrix of X^* is unknown to the statistician in the experiment Z_{n,m}, while Y_{n,m} is preferable for the construction of confidence ellipsoids for the difficulty parameters as in section 6 of [22], since the conditional covariance matrix of X given Y does not depend on F. In the current work, it makes no difference whether one considers model Y_{n,m} or Z_{n,m} for the oracle setting (see subsection 4.2), whereas model Z_{n,m} is used for the estimation of the ability density under unknown difficulty parameters in subsection 4.3.
For basic literature on Le Cam theory we refer to the books of [20], [21] and [23]. The exact Markov kernels which have been used in [22] to transform the MRM into the experiment Y_{n,m} are only needed in the simulation section 5, as the original data have to be transformed into data from Y_{n,m} and Z_{n,m}, respectively, in order to apply our estimation procedure. Therefore those kernels will be presented in subsection 5.1.

Oracle setting
In this subsection we consider estimation of the ability density f under the oracle constraint that the difficulty parameters θ = (θ_1, \ldots, θ_m) are fully known. Writing f for the true density of the ability distribution F, we define h := f/w for some fixed (known) Lebesgue probability density w which has no zeros. We assume that h lies in the separable Hilbert space L_2(\mathbb{R}, w), which contains all measurable functions g from \mathbb{R} to itself with ∫ |g(x)|^2 w(x)\,dx < ∞ (or, to be exact, the equivalence classes of those functions which coincide almost everywhere with respect to the LB measure, i.e. the Lebesgue-Borel measure). Accordingly, ⟨·, ·⟩_w and ‖·‖_w stand for the corresponding inner product and norm, respectively. We define the kernel function γ_{m,θ} and the corresponding integral operator Γ_{m,θ}. We consider the observation Y = (Y_0, \ldots, Y_m) from the experiment Y_{n,m} as defined in subsection 2.2. Thus, in order to estimate h or f, inversion of the operator Γ_{m,θ} is required. Therefore we study some of its properties, writing H_{m,θ} for the linear hull of the functions β_{k,m,θ}, k = 0, \ldots, m.

Lemma 3.1. (a) The dimension of the linear subspace H_{m,θ} equals m + 1.
(b) The mapping Γ_{m,θ} is a compact, self-adjoint and positive semi-definite linear operator from L_2(\mathbb{R}, w) to H_{m,θ} ⊆ L_2(\mathbb{R}, w). The kernel of Γ_{m,θ} corresponds to the orthogonal complement of H_{m,θ} with respect to L_2(\mathbb{R}, w).
By Lemma 3.1(b), the unit eigenfunctions of Γ_{m,θ}, which are denoted by φ_{j,m,θ}, integer j ≥ 0, form an orthonormal basis of L_2(\mathbb{R}, w). Therein we arrange that the φ_{j,m,θ} are sorted such that the corresponding eigenvalues λ_{j,m,θ} decrease in j. As Γ_{m,θ} is positive semi-definite, all λ_{j,m,θ} are non-negative. Moreover, Lemma 3.1 yields that the λ_{j,m,θ}, j = 0, \ldots, m, are positive, while λ_{j,m,θ} = 0 for all j > m. This also implies that the φ_{j,m,θ} are located in H_{m,θ} for all j = 0, \ldots, m, while φ_{j,m,θ} lies in the orthogonal complement of H_{m,θ} for any j > m.
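Numerically, the spectral decomposition of such a compact, self-adjoint, positive semi-definite operator can be approximated by discretizing its kernel on a grid; the Gaussian kernel and the weight density below are placeholders, not the actual γ_{m,θ} and w.

```python
import numpy as np

# Discretize a symmetric kernel operator on L2(R, w) by quadrature and read off
# its spectrum, sorted decreasingly as in the text. Kernel and weight are placeholders.
x = np.linspace(-5.0, 5.0, 200)
dx = x[1] - x[0]
w = np.exp(-x**2) / np.sqrt(np.pi)               # placeholder weight density
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2)  # placeholder symmetric kernel

# h -> int K(x, y) h(y) w(y) dy is self-adjoint in L2(w); the congruent matrix
# D^{1/2} K D^{1/2} dx (with D = diag(w)) is symmetric with the same spectrum.
S = np.sqrt(w)[:, None] * K * np.sqrt(w)[None, :] * dx
eigvals = np.linalg.eigh(S)[0][::-1]             # decreasing order
print(eigvals[:3])
```

Since the discretized matrix is congruent to a positive semi-definite kernel matrix, all computed eigenvalues are non-negative up to rounding, mirroring Lemma 3.1(b).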
is an unbiased oracle estimator of the score ⟨h, φ_{j,m,θ}⟩_w for all j = 0, \ldots, m.
Following the usual structure of orthogonal series estimators, we introduce the oracle estimator ĥ^w_{m,θ} of h for some weights w_{j,m,θ} ∈ [0, 1], j = 1, \ldots, m, which remain to be selected. Therein we have exploited that φ_{0,m,θ} = 1 (see Proposition 4.1(a)) and, hence, ⟨h, φ_{0,m,θ}⟩_w = 1 since f = h · w is a probability density.
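The structural form of such a weighted series estimator can be sketched with a generic orthonormal system; the cosine basis, the coefficients and the weights below are illustrative stand-ins for the eigenfunctions φ_{j,m,θ} and the empirical coefficients, not the paper's actual choices.

```python
import numpy as np

def series_estimate(x, coef_hat, weights, basis):
    """Weighted orthogonal series estimator of the form 1 + sum_j w_j c_j phi_j(x),
    exploiting that the zeroth coefficient is known to equal 1."""
    h = np.ones_like(x, dtype=float)
    for c, w, phi in zip(coef_hat, weights, basis):
        h += w * c * phi(x)
    return h

# Illustrative orthonormal system on [0, 1] (cosine basis); coefficients made up.
basis = [lambda x, j=j: np.sqrt(2.0) * np.cos(np.pi * j * x) for j in range(1, 4)]
x = np.linspace(0.0, 1.0, 5)
h_hat = series_estimate(x, coef_hat=[0.3, -0.1, 0.05],
                        weights=[1.0, 1.0, 0.0], basis=basis)
print(h_hat)
```

Setting a weight to 0 discards the corresponding coefficient entirely, which is exactly the mechanism used by the keep-or-kill selection in subsection 3.3.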

Estimation of h under unknown θ
While, in practice, the vector θ = (θ_1, \ldots, θ_m) is unknown, it is empirically accessible via the statistic X^* from the experiment Z_{n,m}, whereas the oracle density estimator in (3.5) is only based on the independent statistic Y. The expectation of X^* equals ϑ = (ϑ_1, \ldots, ϑ_{m−1}). The definition of the ϑ_k, along with the fact that the components of θ sum up to 0, yields that θ = Z_m ϑ for all θ ∈ Θ. This equation has also been used in [22] in order to construct confidence bands for θ in an intermediate experiment. That motivates us to use θ̂ := Z_m X^* as a plug-in estimator of θ. Inserting θ̂ for θ in (3.5) provides the fully accessible estimator ĥ^w_{m,θ̂} of h.
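The matrix Z_m is not displayed here, but it can be recovered from the two identities ϑ_k = θ_k − θ_m and Σ_k θ_k = 0, which give θ_k = ϑ_k − (1/m)Σ_l ϑ_l for k < m and θ_m = −(1/m)Σ_l ϑ_l. A sketch of this reconstruction (the function name is ours):

```python
import numpy as np

def Z(m):
    """Reconstruct Z_m from vartheta_k = theta_k - theta_m and sum_k theta_k = 0:
    theta_k = vartheta_k - (1/m) sum_l vartheta_l (with vartheta_m := 0),
    i.e. Z_m = [I; 0] - (1/m) * ones((m, m-1))."""
    Zm = np.vstack([np.eye(m - 1), np.zeros((1, m - 1))])
    return Zm - 1.0 / m

m = 5
theta = np.array([1.0, -0.5, 0.25, -0.25, -0.5])  # difficulties summing to zero
vartheta = theta[:m - 1] - theta[m - 1]           # shifted parameters
print(np.allclose(Z(m) @ vartheta, theta))        # True: theta = Z_m vartheta
```

By construction the columns of Z_m sum to zero, so the plug-in estimate θ̂ = Z_m X^* automatically respects the sum-to-zero calibration of Θ.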

Data-driven selection of the weights
In order to choose the weights ŵ_{j,m,θ̂} we apply a keep-or-kill strategy, also known as a hard thresholding approach (see e.g. [19]).
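The keep-or-kill idea can be sketched generically: a weight equals 1 when the corresponding empirical coefficient survives a threshold in absolute value, and 0 otherwise. The threshold value below is a placeholder, not the paper's exact data-driven choice.

```python
import numpy as np

def hard_threshold_weights(coef_hat, threshold):
    """Keep-or-kill: weight 1 where the empirical coefficient exceeds the
    threshold in absolute value, 0 elsewhere (hard thresholding)."""
    return (np.abs(np.asarray(coef_hat, dtype=float)) >= threshold).astype(float)

w_hat = hard_threshold_weights([0.8, 0.02, -0.5, 0.001], threshold=0.1)
print(w_hat)  # [1. 0. 1. 0.]
```

Small coefficients, which are dominated by noise, are killed, while large ones are kept with full weight; this is the source of the log n factor discussed in subsection 4.3.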

Theoretical properties
In this section we study the asymptotic properties of the mean integrated squared error (MISE), which is defined by E‖ĥ − h‖_w^2, for the (oracle and realistic) estimators ĥ of h from section 3. For that purpose the spectrum (eigenvalues and eigenfunctions) of the operator Γ_{m,θ} in (3.2) is studied in the following subsection.
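For a single realization, the inner integral of the MISE can be evaluated by quadrature; averaging it over independent replications then approximates the expectation. The weight density and the toy functions below are illustrative, not the paper's choices.

```python
import numpy as np

def ise_w(h_hat_vals, h_vals, w_vals, dx):
    """Weighted integrated squared error int |h_hat - h|^2 w dx by quadrature;
    averaging this quantity over replications approximates the MISE."""
    return float(np.sum((h_hat_vals - h_vals) ** 2 * w_vals) * dx)

x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]
w = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)  # illustrative weight density
h = np.ones_like(x)                             # toy target
h_hat = 1.0 + 0.1 * np.cos(x)                   # toy "estimate"
print(round(ise_w(h_hat, h, w, dx), 4))
```

Here the exact value is 0.005 · (1 + e^{-2}) ≈ 0.0057, since ∫ cos²(x) φ(x) dx = (1 + e^{-2})/2 for the standard normal density φ, so the quadrature can be checked against a closed form.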

Spectrum of the operator Γ m,θ
The following proposition provides important properties of the eigenvalues λ j,m,θ of Γ m,θ . The decay of the eigenvalues is crucial to establish convergence rates of the MISE.
(c) The sequence (Γ_{m,θ^{(m)}})_m with θ^{(m)} ∈ Θ_m has no limit point with respect to the induced operator norm of L_2(\mathbb{R}, w), regardless of the specific sequence (θ^{(m)})_m.
Proposition 4.1(c) shows that no (strong) limit operator of the Γ_{m,θ} exists for large m, which complicates the analysis of the statistical inverse problem compared to the usual setting. Proposition 4.1(b) allows us to control the asymptotic behaviour of the first ⌊μ log m⌋ eigenvalues for μ < 1/2 and, at least in terms of a lower bound, for μ < 1 as well. Furthermore, an upper bound on the supremum norm of the unit eigenfunctions φ_{j,m,θ} of Γ_{m,θ} will be established in the following lemma under a condition involving the constants D_0 and D_1 from (2.3). The bound holds for all j = 1, \ldots, m.
Finally, the fact that θ is not exactly known but must be estimated by θ̂ (see subsection 3.2) motivates us to study, in the following lemma, the proximity between the eigenvalues λ_{j,m,θ} for different values of θ.

Asymptotic optimality in the oracle setting
Let us consider the asymptotic quality of the estimator (3.5) in the oracle setting of subsection 3.1. In the following proposition an upper bound on its MISE is provided.

Proposition 4.2. It holds that
The proof follows from Proposition 4.2 by a straightforward calculation. Under additional constraints on h_m we will show asymptotic minimax optimality of our estimator (3.5) up to a constant factor.
Whenever the second term of the upper bound in Theorem 4.1 dominates the first one, Theorems 4.1 and 4.2 yield "blind" rate minimaxity, as neither h_m nor the decay of the eigenvalues λ_{j,m,θ} is specified explicitly. Indeed, the condition m → ∞ (as n → ∞) is required in order to identify h under nonparametric constraints, as will be shown in the following theorem.

Asymptotic properties of the fully data-driven estimator
Now we study the asymptotic quality of the estimator ĥ^{ŵ}_{m,θ̂} from subsection 3.3 in the experiment Z_{n,m}, which has been introduced in subsection 2.2 and shown to be asymptotically equivalent to model Y_{n,m} in Theorem 2.1 and, thus, to the original MRM X_{n,m} as well. Theorem 4.4 shows that the convergence rates from Theorem 4.1 can be maintained up to deterioration by the factor m^4 log n. The factor log n occurs frequently in adaptation by the keep-or-kill strategy (unless appropriate blockwise shrinkage is considered; however, in the current problem, where the exact decay of the eigenvalues λ_{j,m,θ} is unexplored, this seems very hard to handle). The factor m^4 stems from the statistical risk of the plug-in estimator θ̂, see [22]. If we were allowed to use the true θ, this factor could be removed.

Implementation and simulations
In this section we address the computational aspects of our estimation procedure ĥ^{ŵ}_{m,θ̂} and investigate its finite-sample performance by numerical simulations.

Computation of the transforming Markov kernels
The estimator ĥ^{ŵ}_{m,θ̂} is designed to treat data from the experiment Y_{n,m} or Z_{n,m}. Therefore, before applying this estimator, the original data from the MRM experiment X_{n,m} have to be transformed by the Markov kernels provided in [22]. These transformations are listed in the following. While in [22] specific contamination by Gaussian noise is required between steps 1 and 2 as well as between steps 2 and 3 in order to prove asymptotic equivalence, these blurring steps are omitted here.

In the MRM experiment X_{n,m} one observes a binary n × m matrix X.
1. Compute the statistic X^{[1]}.
2. Find y such that ‖−∇Ψ_N(y) − X^{[1]}[1 : m−1]‖ is minimized over a certain discretized grid (approximate inversion of the function −∇Ψ_N), where X^{[1]}[1 : m−1] denotes the first m−1 components of X^{[1]}. Replace X^{[1]}[1 : m−1] by that y and keep the latter m components as they are. The outcome is denoted by X^{[2]}.
3. Generate the N(n, n)-distributed random variable V independently of X^{[2]}.
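Step 2 amounts to approximately inverting a smooth vector-valued map by brute-force search over a grid. A generic sketch, with a placeholder map g standing in for −∇Ψ_N (the function names and the grid are ours):

```python
import itertools
import numpy as np

def grid_invert(g, target, grid_1d, dim):
    """Approximate inversion of a vector-valued map g by brute force: return
    the grid point y minimizing ||g(y) - target|| (cf. step 2 above)."""
    best, best_err = None, np.inf
    for y in itertools.product(grid_1d, repeat=dim):
        y = np.asarray(y, dtype=float)
        err = np.linalg.norm(g(y) - target)
        if err < best_err:
            best, best_err = y, err
    return best

g = lambda y: y + 0.1 * np.sin(y)  # placeholder smooth map, not -grad Psi_N
target = g(np.array([0.5, -0.25]))
y_hat = grid_invert(g, target, np.linspace(-1.0, 1.0, 9), dim=2)
print(y_hat)  # recovers [0.5, -0.25]
```

The exhaustive search over the product grid explains why the numerical effort of this step grows rapidly with m, as noted in the simulation results below.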

Numerical simulations
We generate data from the original MRM experiment in four settings. Therein two different ability densities are considered, namely a unimodal density f_1 and a bimodal density f_2. The results indicate that the estimator is able to distinguish quite well between the unimodal and the bimodal shape of the densities f_1 and f_2, respectively. We realize that, for m = 5 as in Figures 5 and 6, the simulation results are comparable with those in Figures 1-4; however, the numerical effort of the computation increases significantly, in particular due to step 2 in subsection 5.1.

Proofs
Proof of Theorem 2.1: First we mention that the probability of the event \{[τ(Y)]_{+,1} + \cdots + [τ(Y)]_{+,m−1} > 0\} tends to 1 uniformly with respect to θ ∈ Θ and F ∈ F, thanks to Lemma 5.2 in [22]. Therefore it suffices to prove (6.1), where H^2(f, g) denotes the squared Hellinger distance between two densities f and g. According to equation (A.4) in [30], the squared Hellinger distance between two multivariate normal densities with a joint expectation vector can be bounded from above as follows.
Here ‖·‖_F and ‖·‖ denote the Frobenius norm and the induced matrix norm, respectively, and the corresponding norm inequality holds for all symmetric and positive semi-definite (m−1) × (m−1) matrices A and B. Note that ‖\{ΔΨ_{nq(θ,F)}(ϑ)\}^{−p}‖^2 is bounded from above by the smallest eigenvalue of ΔΨ_{nq(θ,F)}(ϑ), raised to the (−2p)th power, for p ∈ \{1/2, 1\}. Lemmata 4.3 and 5.1 from [22] provide the lower bound on that eigenvalue. From the proof of Lemma 5.2 in [22] it follows that the term (6.2) has the asymptotic upper bound O(m^8/n + m^4/n^{1/2}) with universal constant factors (only depending on R and the envelope function). Finally, the condition (2.4) completes the proof of (6.1).
Proof of Lemma 3.1: (a) It suffices to show that the β_{k,m,θ}, k = 0, \ldots, m, are linearly independent elements of L_2(\mathbb{R}, w). As the β_{k,m,θ} are bounded and measurable functions, they are located in L_2(\mathbb{R}, w). Now assume that a linear combination with coefficients b_0, \ldots, b_m vanishes; hence the polynomial P_m has infinitely many zeros in the interval (0, ∞), so that b_0 = \cdots = b_m = 0.
(b) Linearity of Γ_{m,θ} is obvious. Also, we easily realize that Γ_{m,θ} h lies in H_{m,θ} for any h ∈ L_2(\mathbb{R}, w). Therein note that ∫ β_{k,m,θ}(z) w(z)\,dz > 0 for all k = 0, \ldots, m. Therefore the range of Γ_{m,θ} is finite-dimensional, so that Γ_{m,θ} is a compact operator. Since γ_{m,θ}(x, y) = γ_{m,θ}(y, x) for all x, y ∈ \mathbb{R}, the operator Γ_{m,θ} is self-adjoint by Fubini's theorem. Moreover we deduce that λ_{0,m,θ} = 1 is the largest eigenvalue of Γ_{m,θ}; therein consider that ‖1‖_w = 1. Whenever λ_{j,m,θ} = 1, the above inequality is sharp. As the functions β_{k,m,θ} · w take on only positive values, the eigenfunction φ_{j,m,θ} then coincides with either a non-negative function or a non-positive function Lebesgue-almost everywhere. For all j ≥ 1 this contradicts the orthogonality of the eigenfunctions, so that j = 0 follows. Therefore the eigenspace with the eigenvalue 1 is the linear span of 1. Courant's min-max theorem provides a variational characterization of the eigenvalues, where the suprema are taken over all (j+1)-dimensional linear subspaces H and F of L_2(\mathbb{R}, w) and L_2(\mathbb{R}), respectively. Therein we have used the linear operator which maps each h ∈ L_2(\mathbb{R}, w) to an associated function in L_2(\mathbb{R}). Writing K^*_l := K_l − η_l for the centered versions of the K_l, the function G_{m,θ}(·/σ_{m,θ}) may be represented accordingly; Taylor approximation then yields an expansion whose remainder term satisfies a uniform bound for all x ∈ \mathbb{R}.
Hence a corresponding expansion holds for all |x| ≤ σ_{m,θ}. Since m/4 ≥ σ^2_{m,θ} ≥ m/(1 + \exp\{R\})^2, it follows for ρ_m := (D \log m)^{1/2} and some constant D > 0 that a bound holds on the event E_{m,θ}(δ_m) := \{|Z_{m,θ}| ≤ δ_m\} for some δ_m ∈ (1, ρ_m/2), where, throughout this proof, the constants contained in O(\cdots) may only depend on w, R and D. Therein we impose that the function w and its derivative are uniformly bounded, and that ρ_m < σ_{m,θ}, which is satisfied for m sufficiently large with respect to R and D. Under the same constraints which we have used to establish (6.6), we derive a bound for all f ∈ L_2(\mathbb{R}) with ‖f‖_2 = 1, using the Cauchy-Schwarz inequality in L_2(\mathbb{R}). Moreover, if E_{m,θ}(δ_m) occurs, we obtain two further bounds. Finally we study the terms (6.10) and (6.11). The function Φ_{m,θ} increases strictly, so that, on the event E_{m,θ}(δ_m), the mode x_{m,θ} is located in the interval [−ρ_m/σ_{m,θ}, ρ_m/σ_{m,θ}] for ρ_m > 2δ_m and m sufficiently large with respect to w, R and D. Therefore the terms (6.10) and (6.11) obey their respective upper bounds if E_{m,θ}(δ_m) occurs, thanks to (6.5). Combining these results with (6.6)-(6.9), we deduce (6.13) on E_{m,θ}(δ_m) when D ≥ 8. Whenever w(0) > 0 holds in addition, then (6.4) and (6.13) provide (6.14), where N denotes the N(0, 1)-density and * denotes convolution. Now we focus on the distribution of Z_{m,θ}.

Fourier inversion yields that
for all integer k = 0, \ldots, m. The bound (6.15) follows, since, for all f ∈ L_2(\mathbb{R}) with ‖f‖_2 = 1, the Cauchy-Schwarz inequality and Parseval's identity apply; here f^{ft} denotes the Fourier transform of the function f. Analogously we deduce a corresponding bound for all x, y ∈ \mathbb{R}. Piecing together (6.14)-(6.16) yields an eigenvalue bound for all integer j ≥ 0. We define the subspace L_2^S(\mathbb{R}), which forms an infinite-dimensional closed linear subspace of L_2(\mathbb{R}) for any S > 0. Also we fix δ_m := c \log^{1/2} m for some constant c ∈ (0, 1). By Jensen's inequality we derive a bound for any R > 0, f ∈ L_2(\mathbb{R}) with ‖f‖_2 = 1 and L(Z) = N(0, 1). Putting R = δ_m and using Parseval's identity, we deduce a lower bound from (6.17), imposing that F is a (j+1)-dimensional linear subspace of L_2^{δ_m/4}(\mathbb{R}). More precisely, we select any orthonormal system f_0, \ldots, f_j in L_2^1(\mathbb{R}) whose elements lie, in addition, in the first-order Sobolev space; and put F equal to the linear hull of suitably rescaled versions of these functions for any d > 0, where we have used integral substitution, the Fourier representation of the Sobolev norm and Parseval's identity again. Let φ be some continuously differentiable function which is supported on [−1, 1] and satisfies ‖φ‖_2 = 1. The derivatives of the f_l form an orthogonal system as well. In particular, a corresponding bound holds whenever \sum_{l=0}^{j} a_l^2 = 1, so that (6.18) and (6.19) yield the claimed asymptotic behaviour of the eigenvalues.
(c) Assume convergence of some subsequence (Γ_{σ(m),θ^{(σ(m))}})_m to some continuous linear operator Γ. As the set of all compact, self-adjoint and positive semi-definite operators from L_2(\mathbb{R}, w) to itself is closed, Γ is compact, self-adjoint and positive semi-definite as well. Let λ_j denote the (j+1)th largest eigenvalue of Γ for any integer j ≥ 0. Thanks to the imposed convergence and part (b) of the proposition, it holds that λ_j = 1 for every j ≥ 0. Therefore the eigenspace of Γ with the eigenvalue 1 is infinite-dimensional, which contradicts the compactness of Γ.
Proof of Lemma 4.2: By Taylor approximation we deduce, for any fixed m and θ, θ' ∈ Θ, a bound in which ‖·‖ stands for the induced operator norm of L_2(\mathbb{R}, w) and Γ_{l,m,θ} denotes the compact linear operator which maps any h ∈ L_2(\mathbb{R}, w) to the function

x \mapsto \int \frac{\partial γ_{m,θ}}{\partial θ_l}(x, y)\, h(y)\, w(y)\, dy .

Proof of Theorem 4.2:
In the experiment Y_{n,m} one observes (X, Y). As the conditional distribution of X given Y does not depend on f, the statistic Y is sufficient for f (and, thus, for h) when θ is known, so that ĥ_n may be viewed as being based on Y. We introduce the candidate functions h_ν. As we are considering lower bounds, we may assume |ĥ_{n,j}| ≤ h^*_{m,j} for all j without any loss of generality. By η_{ν,j,±} we denote the N(n q(θ, h_{ν,j,±}), n Q(θ, h_{ν,j,±}))-density, where h_{ν,j,±} stands for the function h_ν when ν_j is replaced by ±1. We deduce that