Depósito Legal: VG 1402-2007
Kernel density estimation with doubly truncated data

Doubly truncated data are sometimes encountered in applications with astronomical and survival data. In this work we introduce kernel-type density estimation for a random variable which is sampled under random double truncation. Two different estimators are considered. As usual, the estimators are defined as a convolution between a kernel function and an estimator of the cumulative distribution function, which may be the NPMLE [2] or a semiparametric estimator [9]. Asymptotic properties of the introduced estimators are explored. Their finite sample behaviour is investigated through simulations. A real data illustration is included.

AMS 2000 subject classifications: Primary 62G07; secondary 62N02.


Introduction
Truncated data play an important role in the statistical analysis of survival times as well as in other fields like astronomy or economics. Double truncation of survival data occurs e.g. when only those individuals whose event time lies within a certain subject-specific observational window are observed. An individual whose event time is not in this interval is not observed and no information on this subject is available to the investigator. Because we are only aware of individuals with event times in the observational window, the inference with truncated data is based on sampling information from a conditional distribution. Hence, suitable corrections to account for the observational bias are needed. This problem goes back to Turnbull [16]. Among the various existing problems of random truncation, the literature has mainly focused on the left-truncation model or, more generally, on one-sided truncation setups. Woodroofe [19] investigated the properties of the nonparametric maximum-likelihood estimator (NPMLE) of the distribution function (df) with left-truncated data, see also [3]. This estimator was further investigated by Stute [14], and it was also extended to the right-censored scenario (see [15,18] or [20], among many others). However, the literature on random double truncation is much scarcer. A possible reason is the absence of closed-form estimators; indeed, the existing methods for doubly truncated data are iterative and computationally intensive, and these issues complicate both the theoretical developments and the practical implementations.
Efron and Petrosian [2] introduced the NPMLE of the df under double truncation, while Shen [12] formally established the uniform strong consistency and the weak convergence of the NPMLE. Bootstrap methods to approximate the finite sample distribution of the NPMLE with doubly truncated data were explored in [8]. The semiparametric approach, in which the distribution of the truncation times is assumed to belong to a given parametric family, was investigated in [9], see also [13]. Interestingly, these authors showed that the semiparametric estimator may outperform the NPMLE in the sense of the mean squared error (MSE). An R package to compute the NPMLE of a doubly truncated df was presented in [10]. However, to the best of our knowledge, estimation of a density function observed under random double truncation has not been considered so far.
The rest of the paper is organized as follows. In Section 2 two new estimators of a doubly truncated density function are introduced, and their main asymptotic properties are discussed. As usual in kernel smoothing, these estimators are obtained as a convolution between a kernel function and an appropriate estimator of the cumulative df. The first estimator is purely nonparametric, since it is based on Efron and Petrosian's NPMLE [2], while the second estimator is semiparametric, being constructed from the semiparametric cumulative df proposed by Moreira and de Uña-Álvarez [9]. Section 3 provides a simulation study in which the finite-sample properties of the two estimators are investigated. In particular, we explore in detail the role of the smoothing parameter or bandwidth. Both estimators are critically compared in the sense of the integrated MSE. In Section 4 we give a real data illustration of the proposed methods. To this end, we use data on childhood cancer from the Northern region of Portugal [8]. Main conclusions and a final discussion are given in Section 5.

The estimators. Asymptotic properties
Let $X^*$ be the random variable of ultimate interest, with df $F$, and assume that it is doubly truncated by the random pair $(U^*, V^*)$ with joint df $T$, where $U^*$ and $V^*$ ($U^* \le V^*$) are the left and right truncation variables, respectively. This means that the triplet $(U^*, X^*, V^*)$ is observed if and only if $U^* \le X^* \le V^*$, while no information is available when $X^* < U^*$ or $X^* > V^*$. We assume that $X^*$ is independent of $(U^*, V^*)$. In the absence of such independence assumption, the recovery of the distribution of $X^*$ is not possible in general. Martin and Betensky [5] discussed a testing procedure for quasi-independence, a weaker assumption under which the methods discussed here are still consistent. Let $(U_i, X_i, V_i)$, $i = 1, \dots, n$, denote the sampling information; these are iid data with the same distribution as $(U^*, X^*, V^*)$ given $U^* \le X^* \le V^*$. Introduce $\alpha = P(U^* \le X^* \le V^*)$, the probability of no truncation. For any df $W$ denote the left and right endpoints of its support by $a_W = \inf\{t : W(t) > 0\}$ and $b_W = \inf\{t : W(t) = 1\}$, respectively. Let $T_1(u) = T(u, \infty)$ and $T_2(v) = T(\infty, v)$ be the marginal df's of $U^*$ and $V^*$, respectively. When $a_{T_1} \le a_F \le a_{T_2}$ and $b_{T_1} \le b_F \le b_{T_2}$, $F$ and $T$ are both identifiable (see [19]).
In the following two Subsections we introduce, respectively, the NPMLE and the semiparametric estimator of the df of $X^*$. Then, in Subsection 2.3 we consider the problem of estimating the density function on the basis of these two cumulative estimators.

The NPMLE of the cumulative df
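Here we assume, without loss of generality, that the NPMLE is a discrete distribution supported by the set of observed data [16]. Let $\varphi = (\varphi_1, \dots, \varphi_n)$ be a distribution putting probability $\varphi_i$ on $X_i$, $i = 1, \dots, n$. Similarly, let $\psi = (\psi_1, \dots, \psi_n)$ be a distribution putting joint probability $\psi_i$ on $(U_i, V_i)$, $i = 1, \dots, n$. Under the assumption of independence between $X^*$ and $(U^*, V^*)$, the full likelihood, $L(\varphi, \psi)$, can be decomposed as a product of the conditional likelihood of the $X_i$'s given the $(U_i, V_i)$'s, say $L_1(\varphi)$, and the marginal likelihood of the $(U_i, V_i)$'s, say $L_2(\varphi, \psi)$:

$$L(\varphi, \psi) = \prod_{j=1}^{n} \frac{\varphi_j}{\Phi_j} \times \prod_{j=1}^{n} \frac{\Phi_j \psi_j}{\sum_{i=1}^{n} \Phi_i \psi_i} = L_1(\varphi) \times L_2(\varphi, \psi), \tag{2.1}$$

where $\Phi_i$ is defined through $\Phi_i = \sum_{m=1}^{n} \varphi_m I[U_i \le X_m \le V_i]$. The conditional NPMLE of $F$ [2] is defined as the maximizer of $L_1(\varphi)$ in equation (2.1): $\hat\varphi = \arg\max_{\varphi} L_1(\varphi)$. Shen [12] proved that the conditional NPMLE indeed maximizes the full likelihood, which can also be written as the product

$$L(\varphi, \psi) = \prod_{j=1}^{n} \frac{\psi_j}{\Psi_j} \times \prod_{j=1}^{n} \frac{\Psi_j \varphi_j}{\sum_{i=1}^{n} \Psi_i \varphi_i} = L_1^*(\psi) \times L_2^*(\psi, \varphi),$$

where $\Psi_i = \sum_{m=1}^{n} \psi_m I[U_m \le X_i \le V_m]$. Here, $L_1^*(\psi)$ denotes the conditional likelihood of the $(U_i, V_i)$'s given the $X_i$'s and $L_2^*(\psi, \varphi)$ refers to the marginal likelihood of the $X_i$'s. Introduce $\hat\psi = (\hat\psi_1, \dots, \hat\psi_n)$ as the maximizer of $L_1^*(\psi)$. The NPMLE of $F$ also admits the representation

$$F_n(x) = \alpha_n \int_{-\infty}^{x} \frac{dF_n^*(t)}{G_n(t)},$$

where $F_n^*$ is the ordinary empirical df of the $X_i$'s, $G_n(t) = \sum_{m=1}^{n} \hat\psi_m I[U_m \le t \le V_m]$ is a nonparametric estimator for the conditional probability of sampling a lifetime $X^* = t$, i.e. $G(t) = P(U^* \le t \le V^*)$, and $\alpha_n = \left(\int G_n(t)^{-1}\, dF_n^*(t)\right)^{-1}$ is an estimator for $\alpha$. Shen [12] established the uniform strong consistency and the weak convergence of $F_n$.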
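The iterative nature of the NPMLE can be illustrated with a short sketch. The code below is a minimal fixed-point iteration that alternates between the relations $\psi_i \propto 1/\Phi_i$ and $\varphi_m \propto 1/\Psi_m$, in the spirit of the algorithms in [2] and [12]. It is not the implementation of [10]; the function name, the stopping rule and the default tolerances are assumptions made for illustration only.

```python
def npmle_double_truncation(u, x, v, tol=1e-10, max_iter=10_000):
    """Fixed-point sketch of the NPMLE under double truncation.

    u, x, v are equal-length lists with u[i] <= x[i] <= v[i].
    Returns (phi, psi, g_at_x): phi[m] is the mass of F_n at x[m],
    psi[i] is the mass put on (u[i], v[i]), and g_at_x[m] = G_n(x[m]).
    """
    n = len(x)
    # J[i][m] = 1 when x[m] falls inside the i-th observational window
    J = [[1.0 if u[i] <= x[m] <= v[i] else 0.0 for m in range(n)]
         for i in range(n)]
    phi = [1.0 / n] * n  # start from the ordinary empirical distribution
    for _ in range(max_iter):
        # Phi_i: mass that the current phi puts on the window [u_i, v_i]
        Phi = [sum(phi[m] * J[i][m] for m in range(n)) for i in range(n)]
        inv = [1.0 / p for p in Phi]
        total = sum(inv)
        psi = [w / total for w in inv]
        # Psi_m = G_n(x_m): estimated probability of sampling the value x_m
        Psi = [sum(psi[i] * J[i][m] for i in range(n)) for m in range(n)]
        inv = [1.0 / p for p in Psi]
        total = sum(inv)
        new_phi = [w / total for w in inv]
        delta = max(abs(a - b) for a, b in zip(new_phi, phi))
        phi = new_phi
        if delta < tol:
            break
    return phi, psi, Psi
```

When every window contains every observation, the iteration stops at once with uniform weights $1/n$, which is the ordinary empirical df. Each pass costs $O(n^2)$ operations, which reflects the computational burden mentioned in the Introduction.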
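Once $\theta$ is estimated, a semiparametric estimator for $F$ is introduced through

$$F_{\hat\theta}(x) = \alpha_{\hat\theta} \int_{-\infty}^{x} \frac{dF_n^*(t)}{G_{\hat\theta}(t)},$$

where $G_{\hat\theta}$ is the parametric estimator of $G$ and $\alpha_{\hat\theta} = \left(\int G_{\hat\theta}(t)^{-1}\, dF_n^*(t)\right)^{-1}$. Moreira and de Uña-Álvarez [9] established the asymptotic normality of both $\hat\theta$ and $F_{\hat\theta}$. They also showed by simulations that $F_{\hat\theta}$ may perform much more efficiently than the NPMLE. As a drawback, the semiparametric estimator requires preliminary specification of a parametric family, which may introduce a bias component when the family is far from reality [9].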
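As an illustration of how $\theta$ may be estimated, the sketch below maximizes a conditional likelihood of the truncation times of the form $\prod_i g_\theta(U_i)/G_\theta(X_i)$ in one simple special case. The setting is an assumption chosen for tractability, not the general method of [9]: $U^*$ follows a Beta($\theta$, 1) law on (0, 1) and $V^* = U^* + c$ for a known constant $c$, so that only the density of $U^*$ enters the likelihood. The function names and the golden-section search bounds are hypothetical.

```python
import math


def g_theta_biasing(t, theta, c):
    """G_theta(t) = P(U* <= t <= U* + c) when U* ~ Beta(theta, 1) on (0, 1)."""
    def clip(s):
        return min(max(s, 0.0), 1.0)
    # The Beta(theta, 1) df is s**theta on (0, 1)
    return clip(t) ** theta - clip(t - c) ** theta


def cond_loglik(theta, u, x, c):
    """Conditional log-likelihood: sum_i [log g_theta(U_i) - log G_theta(X_i)]."""
    return sum(
        math.log(theta) + (theta - 1.0) * math.log(ui)
        - math.log(g_theta_biasing(xi, theta, c))
        for ui, xi in zip(u, x)
    )


def fit_theta(u, x, c, lo=0.05, hi=20.0, tol=1e-6):
    """Golden-section search for the maximizer of cond_loglik on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        m1 = b - ratio * (b - a)
        m2 = a + ratio * (b - a)
        if cond_loglik(m1, u, x, c) < cond_loglik(m2, u, x, c):
            a = m1  # the maximizer lies to the right of m1
        else:
            b = m2  # the maximizer lies to the left of m2
    return 0.5 * (a + b)
```

Given the fitted value, `g_theta_biasing(x_i, theta_hat, c)` plays the role of $G_{\hat\theta}(X_i)$ in the semiparametric weights. A one-dimensional search suffices here because this particular log-likelihood is concave in $\theta$; richer parametric families would call for a general-purpose optimizer.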

The density estimators
Introduce

$$f_h(x) = \int K_h(x - t)\, dF_n(t) = \alpha_n \frac{1}{n} \sum_{i=1}^{n} \frac{K_h(x - X_i)}{G_n(X_i)}, \tag{2.2}$$

where $K_h(t) = K(t/h)/h$ is the re-scaled kernel function and $h = h_n$ is a deterministic bandwidth sequence with $h_n \to 0$. Note that (2.2) is a purely nonparametric estimator of $f$, the density of $X^*$ (assumed to exist). Introduce also the semiparametric kernel density estimator

$$f_{\hat\theta,h}(x) = \int K_h(x - t)\, dF_{\hat\theta}(t) = \alpha_{\hat\theta} \frac{1}{n} \sum_{i=1}^{n} \frac{K_h(x - X_i)}{G_{\hat\theta}(X_i)}. \tag{2.3}$$

Note that both estimators (2.2) and (2.3) correct for the double truncation by weighting the $X_i$'s inversely to an estimate of the sampling probability $G(X_i)$. This is very intuitive, since the values with less probability of being observed receive more mass. The case in which $G(\cdot)$ is constant is possible; for example, this happens whenever the left-truncation time $U^*$ is uniformly distributed in a suitable interval and $V^* - U^*$ is degenerate. See our real data illustration.
In such a case, the correction for truncation vanishes and we obtain the usual kernel density estimators. Both $G_n$ and $G_{\hat\theta}$ are $\sqrt{n}$-consistent estimators of $G$. For $G_{\hat\theta}$ this follows from the $\sqrt{n}$-consistency of $\hat\theta$, provided that $G_\theta$ is a smooth function of $\theta$ [9]. For $G_n$, the result may be obtained by noting that

$$G_n(t) = \frac{\int_{\{u \le t \le v\}} [F_n(v) - F_n(u^-)]^{-1}\, T_n^*(du, dv)}{\int [F_n(v) - F_n(u^-)]^{-1}\, T_n^*(du, dv)},$$

where $T_n^*$ is the ordinary empirical df of the truncation times. Hence, $\sqrt{n}$-consistency of $G_n$ is a consequence of that of $F_n$ [12] and $T_n^*$. Since both $G_n$ and $G_{\hat\theta}$ approach $G$ at a $\sqrt{n}$-rate, which is faster than the nonparametric rate $\sqrt{nh}$, the asymptotic properties of $f_h$ and $f_{\hat\theta,h}$ will be the same, and will coincide with those of the estimator based on the true $G$. However, for the finite sample case, some error improvements are expected when using $f_{\hat\theta,h}$ due to the smaller variance associated with $G_{\hat\theta}$. This issue is illustrated in our simulations section.
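The weighting scheme behind both density estimators can be sketched in a few lines. The function below takes the values of an estimate of $G$ at the observations (from the NPMLE or from a parametric fit) and returns the weighted kernel estimate on a grid. The Gaussian kernel and the function names are assumptions made for illustration; any other kernel $K$ could be used.

```python
import math


def gaussian_kernel(t):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def truncation_corrected_kde(grid, x, g_at_x, h, kernel=gaussian_kernel):
    """Evaluate alpha * (1/n) * sum_i K_h(t - x_i) / G(x_i) for each t in grid.

    g_at_x[i] is an estimate of G(x[i]); alpha is the normalizing constant
    [(1/n) * sum_i 1/G(x_i)]^{-1}, so the weights add up to one.
    """
    n = len(x)
    inv_g = [1.0 / g for g in g_at_x]
    alpha = n / sum(inv_g)
    weights = [alpha * w / n for w in inv_g]
    return [
        sum(w * kernel((t - xi) / h) / h for w, xi in zip(weights, x))
        for t in grid
    ]
```

If the supplied $G$ values are all equal, the weights reduce to $1/n$ and the function returns the ordinary kernel density estimate, in line with the remark on the vanishing correction.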
Introduce the asymptotically equivalent version of $f_h$ and $f_{\hat\theta,h}$ through

$$\bar f_h(x) = \alpha \frac{1}{n} \sum_{i=1}^{n} \frac{K_h(x - X_i)}{G(X_i)}. \tag{2.4}$$

In the next result we establish the strong consistency and the asymptotic normality of $\bar f_h(x)$. We implicitly assume $G(x) > 0$ throughout this Section.
By [1] we have $f_{0,h}(x) \to f_0(x)$ almost surely, where and the supremum goes to zero by the continuity of $G$ at $x$. This completes the proof of (i). Statement (ii) is proved similarly to Section 2 of [11]; by following such lines we obtain Now, a two-term Taylor expansion (and the fact that $K$ is even) gives $E\bar f_h(x) = f(x) + O(h^2)$. Since $nh^5 \to 0$, this implies the claimed result.
Remark. Formally, to derive the asymptotic normality of $f_h(x)$ and $f_{\hat\theta,h}(x)$ from Theorem 2.1 (ii) we need the conditions sup y∈ξ(x) respectively, where $\xi(x)$ is a neighborhood of $x$. From the $\sqrt{n}$-consistency results derived in [12] and [9], it is reasonable to conjecture that these two conditions will hold under suitable regularity assumptions. The formal derivation of these results falls, however, outside the scope of the present work.
The asymptotic mean and variance of (2.4) are given in the following result. We refer to the following standard regularity assumptions.
Proof. The proof follows standard steps. A second-order Taylor expansion of $f$ around $x$ is used, and the assumptions on the kernel and the bandwidth are enough to conclude. See e.g. [17].
Theorem 2.2 shows that the double truncation influences the variance of $\bar f_h(x)$, the bias being otherwise unaffected. More specifically, the variance of the estimator is large at points $x$ for which the relative probability of getting $X_i$ values around $x$ (i.e. $G(x)$) is small. Usually one will be interested in the global error of $\bar f_h$ as an estimator of the entire curve $f$. This can be measured through the integrated MSE, namely

$$MISE(\bar f_h) = E \int \left(\bar f_h(x) - f(x)\right)^2 dx.$$

Under regularity, we have from the previous results the following asymptotic expression for the $MISE(\bar f_h)$:

$$AMISE(\bar f_h) = \frac{h^4}{4}\, \mu_2(K)^2 R(f'') + \frac{R(K)}{nh}\, \alpha \int G^{-1} f,$$

where $\mu_2(K) = \int t^2 K(t)\, dt$ and $R(g) = \int g(t)^2\, dt$. Because of the $\sqrt{nh}$-equivalence between $G_n$, $G_{\hat\theta}$ and $G$, the same asymptotic expression will hold for $f_h$ and $f_{\hat\theta,h}$ under proper conditions. Note that the conclusions of Theorem 2.2 do not automatically transfer to the two proposed estimators, since the (nonparametric or parametric) estimation of the function $G$ will influence the bias and the variance. However, heuristically, one may argue that this influence will be negligible in the limit provided that the proposed estimators and $\bar f_h$ are asymptotically equivalent (Remark above). See [4], Sec. 2.4.1, for similar arguments. In Section 3, the real impact of the estimation of $G$ on the performance of $f_h$ and $f_{\hat\theta,h}$ will be explored through simulations.
Interestingly, Hölder's inequality gives $\alpha \int G^{-1} f \ge 1$, which indicates that the global error when estimating the density in the doubly truncated scenario is at least as large as that pertaining to the untruncated situation. This does not mean that for a particular $x$ the MSE of $\bar f_h(x)$ may not be smaller than in the i.i.d. situation, since $\alpha G(x)^{-1} f(x) < f(x)$ may happen. Minimization of $AMISE(\bar f_h)$ w.r.t. $h$ leads to the asymptotically optimal bandwidth

$$h_{AMISE} = \left[\frac{R(K)\, \alpha \int G^{-1} f}{\mu_2(K)^2 R(f'')\, n}\right]^{1/5}.$$

Of course, this expression depends on unknown quantities that must be estimated in practice. There exist several criteria to select the bandwidth from the data at hand. Although in this paper we do not propose any particular automatic bandwidth selector, in Section 3 we investigate through simulations the impact of the smoothing parameter on the performance of the two introduced density estimators $f_h$ and $f_{\hat\theta,h}$.
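The two claims of this paragraph can be checked with a short derivation. The block below is a sketch under stated assumptions: it uses $\alpha = \int G f$, which follows from the independence of $X^*$ and $(U^*, V^*)$, and it takes the asymptotic MISE in the standard kernel-smoothing form, with squared bias of order $h^4$ and a variance term proportional to $\alpha \int G^{-1} f$. The notation $R(g) = \int g^2$ and $\mu_2(K) = \int t^2 K(t)\,dt$ is assumed here.

```latex
% Step 1: Hoelder's inequality with p = q = 2 (Cauchy-Schwarz), using alpha = \int G f.
\[
1 = \Big(\int f\Big)^2
  = \Big(\int \sqrt{G f}\,\sqrt{f/G}\Big)^2
  \le \int G f \int \frac{f}{G}
  = \alpha \int G^{-1} f .
\]
% Step 2: minimize the assumed asymptotic MISE with respect to h.
\[
\mathrm{AMISE}(h) = \frac{h^4}{4}\,\mu_2(K)^2 R(f'')
  + \frac{R(K)}{nh}\,\alpha \int G^{-1} f ,
\]
\[
\frac{d}{dh}\,\mathrm{AMISE}(h)
  = h^3 \mu_2(K)^2 R(f'') - \frac{R(K)}{n h^2}\,\alpha \int G^{-1} f = 0
\;\Longrightarrow\;
h^5 = \frac{R(K)\,\alpha \int G^{-1} f}{\mu_2(K)^2 R(f'')\, n} .
\]
```

Since $\alpha \int G^{-1} f \ge 1$, the optimal bandwidth under double truncation is at least as large as the one for untruncated data of the same size $n$, with equality when $G$ is constant on the support of $f$.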

Simulation study
In this section we illustrate the finite sample behavior of both estimators, the purely nonparametric estimator and the semiparametric estimator, through simulation studies. We analyze the influence of the bandwidth on the estimators' mean integrated squared errors (MISEs), and we measure the amount of efficiency which is gained through the use of the semiparametric information.
For the computation of the semiparametric density estimator, as parametric information on $(U^*, V^*)$ we always consider a Beta($\theta_1$, 1) for $U^*$; besides, a Beta(1, $\theta_2$) is considered for $V^*$ in Case 1. An exception is Model 1.3, for which we considered the product Exp($\theta_1$) × Exp($\theta_2$) (with the marginals truncated on the (0, 1) interval) as parametric information on $(U^*, V^*)$. Note that these parametric models contain the truncation distributions used in the simulations. For each Model, we simulate 1000 Monte Carlo trials with final sample size n = 50, 100, 250 or 500. This means that, for each trial, the number of simulated data is much larger than n; actually, $N \approx n\alpha^{-1}$ draws are needed on average, where we recall that $\alpha$ stands for the proportion of no truncation. For the simulated models, the proportion of truncation ranges between 44% and 88%. More specifically, the following right and left truncation proportions occur: 37% (right) and 44% (left) for Model 1.1; 38% and 40% for Model 1.2; 67% and 19% for Model 1.3; 53% and 22% for Model 2.1; and 45% and 28% for Model 2.2.
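The sampling mechanism described above can be sketched as a rejection scheme: triplets are drawn from the untruncated model and kept only when $U^* \le X^* \le V^*$, until $n$ of them are retained. The function name and the sampler arguments below are hypothetical, and the example model in the usage note is chosen only for illustration; it is not one of the Models of this section.

```python
import random


def simulate_doubly_truncated(n, draw_x, draw_uv, rng):
    """Draw (U*, X*, V*) repeatedly and keep the triplets with U* <= X* <= V*.

    draw_x(rng) returns one value of X*; draw_uv(rng) returns one pair (U*, V*).
    Returns the n retained triplets and the total number N of draws used.
    """
    sample = []
    n_draws = 0
    while len(sample) < n:
        n_draws += 1
        x = draw_x(rng)
        u, v = draw_uv(rng)
        if u <= x <= v:  # the triplet is observed only inside its window
            sample.append((u, x, v))
    return sample, n_draws
```

For instance, `simulate_doubly_truncated(100, lambda r: r.random(), lambda r: (r.uniform(-0.5, 0.5), r.uniform(0.5, 1.5)), random.Random(0))` returns 100 observed triplets, and the ratio `n / n_draws` is a crude Monte Carlo estimate of $\alpha$ for that illustrative model.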
In Table 1 we report the optimal bandwidths (in the sense of the MISE) and the corresponding minimum MISEs for both the nonparametric and the semiparametric estimators. The theoretical MISE function is approximated by the average of the ISEs along the M = 1000 trials, namely

$$MISE(f_h) \approx \frac{1}{M} \sum_{m=1}^{M} \int \left(f_h^m - f\right)^2 \quad \text{and} \quad MISE(f_{\hat\theta,h}) \approx \frac{1}{M} \sum_{m=1}^{M} \int \left(f_{\hat\theta,h}^m - f\right)^2,$$

where $f_h^m$ and $f_{\hat\theta,h}^m$ are the nonparametric and the semiparametric estimators when based on the m-th Monte Carlo trial.
From Table 1 it is seen that the optimal bandwidths and the MISEs decrease when increasing the sample size; besides, the semiparametric estimator has an error which is smaller than that pertaining to the nonparametric estimator. It is also seen that the optimal bandwidths for the semiparametric estimator are smaller than those of the nonparametric estimator, in accordance with the extra amount of information. As the sample size grows, the relative efficiency of the nonparametric estimator approaches one; this is in agreement with the asymptotic equivalence of the semiparametric and the nonparametric density estimators discussed in Section 2. Interestingly, for finite sample sizes we see that such relative efficiency may be as poor as 45% (Model 2.2, n = 50).
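The Monte Carlo approximation of the MISE can be sketched as follows. Each trial contributes one integrated squared error, computed here with the trapezoidal rule on a common grid; the choice of quadrature rule and the function names are assumptions made for illustration.

```python
def integrated_squared_error(estimate, target, grid):
    """Trapezoidal approximation of the integral of (estimate - target)^2."""
    sq = [(e - t) ** 2 for e, t in zip(estimate, target)]
    return sum(
        0.5 * (sq[k] + sq[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )


def monte_carlo_mise(estimates, target, grid):
    """Average of the ISEs over trials; estimates[m] is the m-th fitted curve."""
    ises = [integrated_squared_error(est, target, grid) for est in estimates]
    return sum(ises) / len(ises)
```

Repeating this computation over a grid of bandwidths and taking the minimizer gives an empirical counterpart of the optimal bandwidths reported in the tables.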
When comparing Models 1.1 and 1.2-1.3, one can appreciate that the density corresponding to the first one is not so well approximated by the two estimators.

In Table 2 we report the biases and the variances of the nonparametric and semiparametric estimators at some selected time points, corresponding to the quartiles of $F$, for sample sizes n = 100, 500, along the 1000 Monte Carlo trials. It is seen that the squared bias is always of a smaller order when compared to the variance, so the resulting mean squared errors (MSEs) are mainly determined by the estimates' dispersion. For all the cases, these local MSEs are smaller for the semiparametric estimator. The biases and variances vary along the quartiles; this is asymptotically explained by the influence of $f''(x)$ and of $G(x)^{-1} f(x)$, respectively, on these quantities, see Section 2. In particular, it is seen in Table 2 that the variance is larger around the median for a Gaussian $X^*$, in accordance with the larger value of $f(x)$ at this point. On the other hand, when $f$ is monotone decreasing (Model 2.1), the maximum variance is obtained for the first quartile. Finally, when $f$ is constant (Model 1.1), the variance is mainly affected by the function $G$, so a smaller dispersion is found at the median (corresponding to the maximum value of $G(x) = x(1 - x)$ for Model 1.1).
In Figures 1 to 5 we report for each simulated model: (i) the ratio between the MISEs of the semiparametric and the nonparametric estimators along a grid of bandwidths (top row); (ii) the ratio between the MISE of the semiparametric estimator and the minimum MISE of the nonparametric estimator (middle row); and (iii) the target density together with its semiparametric and nonparametric estimators averaged along the 1000 Monte Carlo trials (bottom row). From Figures 1 to 5 several interesting features can be appreciated. First, for each given smoothing degree, the MISE of the semiparametric estimator is less than that of the nonparametric estimator; the relative benefits of using the semiparametric information are more clearly seen when working with relatively smaller bandwidths, when the variance component of the MISE is larger. This illustrates how the semiparametric estimator achieves a variance reduction w.r.t. the NPMLE.

Fig 1. (i) The ratio between the MISEs of the semiparametric and the nonparametric estimators along a grid of bandwidths (top row); (ii) the ratio between the MISE of the semiparametric estimator and the minimum MISE of the nonparametric estimator (middle row); and (iii) the target density (solid line) together with its semiparametric (dashed line) and nonparametric (dotted line) estimators averaged along the 1000 Monte Carlo trials (bottom row) for Model 1.1.

The minimum relative efficiency of the nonparametric kernel density estimator varies from about 0.4 to about 0.85, depending on the simulated model and the sample size. Also importantly, we see that the ratios of the MISEs approach one as the sample size increases. This was expected, since (as discussed in Section 2) both estimators are asymptotically equivalent. However, even when n = 500, the relative performance of the nonparametric estimator may be as poor as 70% (Figure 5, top). Second, from the middle rows of Figures 1 to 5, we see that the semiparametric estimator behaves more efficiently than the nonparametric estimator even when the former uses a sub-optimal bandwidth. Indeed, for Models 1.

Fig 2. (i) The ratio between the MISEs of the semiparametric and the nonparametric estimators along a grid of bandwidths (top row); (ii) the ratio between the MISE of the semiparametric estimator and the minimum MISE of the nonparametric estimator (middle row); and (iii) the target density (solid line) together with its semiparametric (dashed line) and nonparametric (dotted line) estimators averaged along the 1000 Monte Carlo trials (bottom row) for Model 1.2.
widths which maintain the superiority of the semiparametric density estimator with respect to the nonparametric estimator based on its optimal smoothing parameter. Finally, the averaged estimators depicted in Figures 1-5 reveal that the semiparametric estimator fits the target better than its nonparametric competitor when the sample size is moderate.

The simulations above are informative about the relative performance of the two proposed estimators when the parametric information on the truncation distribution is correctly specified. However, in practice, some level of misspecification in the parametric model may occur. To investigate the sensitivity of the semiparametric estimator to some level of misspecification, we have repeated the simulation of Model 1.1 but replacing the $U(0, 1)$ distribution of $U^*$ by a Beta(1, a) distribution with $a \neq 1$, so the parametric information Beta($\theta_1$, 1) on $U^*$ is misspecified. Results on the bandwidth, the MISE, and the local MSE of both the semiparametric and nonparametric density estimators are reported in Tables 3 and 4 for the case n = 500 (results based on 1000 trials). From Table 3, it is seen that the semiparametric estimator may still be equivalent or even preferable to the nonparametric estimator when the misspecification level is small (a = 1/2, 3/2). However, for a larger specification error (a = 1/5, 4), the MISE of the semiparametric estimator is above that of the nonparametric one. Table 4 indicates that, when the parametric information is misspecified, the variance of the semiparametric estimator is still smaller than that of the nonparametric estimator; however, the bias of the semiparametric estimator may be one order of magnitude greater than the bias of the nonparametric estimator which, overall, explains the relative MISE results of Table 3.

Fig 3. (i) The ratio between the MISEs of the semiparametric and the nonparametric estimators along a grid of bandwidths (top row); (ii) the ratio between the MISE of the semiparametric estimator and the minimum MISE of the nonparametric estimator (middle row); and (iii) the target density (solid line) together with its semiparametric (dashed line) and nonparametric (dotted line) estimators averaged along the 1000 Monte Carlo trials (bottom row) for Model 1.3.

Real data illustration
For illustration purposes, in this section we consider data on the age at diagnosis of childhood cancer. These data concern all the cases of childhood cancer diagnosed in North Portugal between 1 January 1999 and 31 December 2003. The age at diagnosis (ranging from 0 to 15 years old) is doubly truncated by $(U^*, V^*)$, where $V^*$ stands for the elapsed time (in years) between birth and end of the study (31 December 2003), and $U^* = V^* - 5$. Information on the 406 diagnosed cases is entirely reported in [8].
The semiparametric and the nonparametric kernel estimators for the density of $X^*$ computed from the n = 406 cases are given in Figure 6. The scale in the horizontal axis comes from the transformation $(t + 5)/20$, which has been used for the ages at diagnosis and the truncation variables. With this transformation, $U^*$ is supported on the (0, 1) interval. For the semiparametric estimator, we assume a Beta($\theta_1$, $\theta_2$) model for $U^*$, and the parameters are estimated by maximizing the conditional likelihood of the truncation times (see Section 2 for details). In this case the pair $(U^*, V^*)$ does not have a density, and the likelihood $L_1^*(\theta)$ must be properly re-defined by substituting the density of $U^*$ for $g_\theta$ in that expression, see Remark 2.1 in [9] for further details.
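A short worked computation shows why a uniform left-truncation time removes the observational bias in this design. It is a sketch under an assumption made only for illustration, namely that $U^*$ is exactly uniform on $(-5, 15)$ years (the interval mapped onto (0, 1) by the transformation above), together with $V^* = U^* + 5$ as in the study design.

```latex
% Assumption: U^* ~ Uniform(-5, 15) in years, and V^* = U^* + 5.
\[
G(t) = P(U^* \le t \le V^*) = P(t - 5 \le U^* \le t) = \frac{5}{20} = \frac{1}{4},
\qquad 0 \le t \le 15 ,
\]
\[
\alpha = \int G(t) f(t)\,dt = \frac{1}{4},
\qquad\text{so that}\qquad \alpha\,G(t)^{-1} = 1 \ \text{ for all } t \in [0, 15] .
\]
```

Under this assumption every observation receives the same weight $1/n$, and the corrected estimators coincide with the naive kernel density estimator.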
Three different bandwidths are used: h = 0.02, h = 0.035, and h = 0.06. As expected, more bumps appear as the smoothing degree decreases. For large bandwidths, only two bumps remain, indicating the existence of two subgroups of cases: early cancer detection and late detection (less frequent). For comparison, the naive kernel density estimator which does not correct for the double truncation is also reported. We see that the three estimators are close to each other. This is not surprising, since previous analyses of these data have shown that there is almost no observational bias on the age at diagnosis because of the uniformity of $U^*$ [8]. This fact is also confirmed in Figure 8, left, in which a fairly flat shape of $G_n$ is seen. For further illustration, in Figure 7 we provide these three estimators for a subgroup of cases. Specifically, we consider the n = 38 diagnosed cases of neuroblastoma. For this subgroup the uniformity of $U^*$ is lost, and as a consequence there exists some observational bias ([7], page 78). Indeed, Figure 8, right, suggests that relatively small ages at diagnosis are more likely to be observed. This explains the overestimation of the density by the naive estimator at the left tail. Unlike the naive estimator, both the nonparametric and the semiparametric estimators, which take the double truncation issue into account, reveal a second mode at the right tail. These two estimators are similar on the interval [0.25, 0.45], while differences appear from 0.45 on. In order to explain this, we report in Figure 8 the estimators $G_n$ and $G_{\hat\theta}$ for the full data set and for the neuroblastoma cases. Note that the semiparametric estimator is based on a parametric specification of the truncation df; this introduces a bias term which influences the shape of the final density estimator while reducing its variance. Indeed, Figure 8, right, indicates that $G_{\hat\theta}^{-1}$ is smaller than $G_n^{-1}$ at intermediate values of $X^*$, while the contrary occurs at large times. This explains why the semiparametric estimator locates the second mode more to the right. This biasing effect of the parametric model is not appreciated when analyzing the full data set because $G_n$ and $G_{\hat\theta}$ are close to each other in this case (Figure 8, left).

Conclusions and final discussion
In this paper we have introduced kernel density estimation for a variable which is observed under random double truncation. Two estimators have been proposed. The first one is purely nonparametric, and it is defined as a convolution of a kernel function with the NPMLE of the cumulative df. The second estimator is semiparametric, since it is based on a parametric specification for the df of the truncation times. Asymptotic properties of the two estimators have been discussed, including a formula for the asymptotic mean integrated squared error (MISE).
Both estimators are asymptotically equivalent in the sense of having the same asymptotic MISE. However, for small and moderate sample sizes, we have seen that the semiparametric estimator may outperform the nonparametric estimator. More explicitly, the relative efficiency of the nonparametric estimator may be as poor as 45% in special situations with small sample sizes. Moreover, in special instances, the relative benefits of using the semiparametric approach are clearly seen even when the sample size is as large as n = 500. Finally, our simulation results have revealed that the semiparametric estimator may be preferable even when based on a sub-optimal bandwidth. A real data illustration has been provided.
A crucial issue in the construction of the semiparametric estimator is how to choose the parametric model for the truncation distribution. Note that, rather than the truncation distribution itself, the function $G$ influences the shape of the final estimator. Hence, an informal assessment of the parametric family may be performed by plotting the empirical biasing function $G_n$ together with the fitted $G_{\hat\theta}$. Formal goodness-of-fit tests for a parametric model could be developed too, and this problem is currently under research.
Since the bandwidth $h$ plays a very important role in the performance of the estimators, an interesting topic for future research is to investigate automatic bandwidth selectors. Bandwidth selectors can be introduced from the asymptotic MISE expressions derived in Section 2, although other possible criteria include bootstrap and cross-validation methods. Also, the application of kernel smoothing to the estimation of the hazard rate function (another important curve in Survival Analysis) in a doubly truncated setup is currently under investigation.

Fig 4. (i) The ratio between the MISE's of the semiparametric and the nonparametric estimators along a grid of bandwidths (top row); (ii) the ratio between the MISE of the semiparametric estimator and the minimum MISE of the nonparametric estimator (middle row); and (iii) the target density (solid line) together with its semiparametric (dashed line) and nonparametric (dotted line) estimators averaged along the 1000 Monte Carlo trials (bottom row) for Model 2.1.

Fig 5. (i) The ratio between the MISE's of the semiparametric and the nonparametric estimators along a grid of bandwidths (top row); (ii) the ratio between the MISE of the semiparametric estimator and the minimum MISE of the nonparametric estimator (middle row); and (iii) the target density (solid line) together with its semiparametric (dashed line) and nonparametric (dotted line) estimators averaged along the 1000 Monte Carlo trials (bottom row) for Model 2.2.
The poorer approximation of the density under Model 1.1, compared with Models 1.2-1.3, is due to the strong boundary effects of the uniform density, which disappear when considering a Gaussian model (Models 1.2-1.3). Also, the difficulties for estimating the normal density in Case 2 (Model 2.2) are greater than under Models 1.2-1.3; this could be explained by the above mentioned fact that Models 1.2-1.3 favor the observation of intermediate lifetimes, so there is more sampling information around the density mode (the difficult part to estimate). Model 2.1 is the one presenting the largest MISEs; this model presents difficulties at the left boundary, where the density goes to infinity.

Table 2. Biases and variances of the nonparametric (EP) and semiparametric (SP) estimators at the quartiles of $F$, for sample sizes n = 100, 500, along the 1000 Monte Carlo trials.

Table 3. Optimal bandwidths ($h_{opt}$) and minimum MISEs of the density estimators: nonparametric estimator (EP) and semiparametric estimator (SP) along 1000 trials of sample size n = 500. Same setting as Model 1.1, but $U^*$ is simulated as a Beta(1, a) random variable.

Table 4. Biases and variances of the nonparametric (EP) and semiparametric (SP) estimators at the quartiles of $F$, for sample size n = 500, along the 1000 Monte Carlo trials. Same setting as Model 1.1, but $U^*$ is simulated as a Beta(1, a) random variable.