Efficient nonparametric estimation of distribution for current status censoring

Abstract: Current status censoring (CSC) implies that there is no direct access to the lifetime of an event of interest. Instead, it is known whether the event has occurred or not at a random monitoring time. CSC is a simple sampling procedure and in many cases the only possible way to assess the lifetime of interest. At the same time, the absence of a direct measurement of the lifetime of interest makes the problem of nonparametric distribution estimation ill-posed. A simple, adaptive and sharp minimax estimator of the density and cumulative distribution function is proposed. The simplicity of the estimator also allows us to relax assumptions. Practical examples illustrate the CSC problem and the proposed estimator.


Introduction
The problem under consideration is nonparametric, data-driven and sharp minimax estimation of the probability density and the cumulative distribution function (cdf) of a lifetime of interest X (a nonnegative random variable) which is not observed directly. Instead, it is possible to check the status of the event at some random moment of time Z, called the monitoring time. The available current status censoring (CSC) observation is then a pair of random variables $(Z, \Delta)$ where Z is the monitoring time and $\Delta := I(X \le Z)$ is the status of the event of interest; namely, the status (indicator) is equal to 1 if the event of interest has already occurred at moment Z and is 0 otherwise. The available data is a sample of size n from $(Z, \Delta)$.
Current status censoring (CSC), also known as "case I" interval censoring, is a classical problem in survival analysis, see the discussion in the books [9,15,21,23,25,32,36] and thorough reviews of CSC in the papers [7,10,16,24,27,28,34], where further references may be found. It is well known that the stated nonparametric problem of density and cumulative distribution function estimation is ill-posed, with slower rates of risk convergence than in the case of direct observations. Due to the slower rates of convergence, it is important to study not only rates but also sharp (minimal) constants. Recently [16] established sharp minimax lower bounds for the Mean Integrated Squared Error (MISE) convergence for an oracle that knows the CSC data, information about the distribution of X, and the density of the monitoring time Z. The aim of this paper is to propose density and cdf estimators that match the performance of the oracle and attain the sharp constant and rate of the MISE convergence.
The oracle lower bounds and the proposed data-driven estimators matching the oracle will be presented shortly; for now let us review the related nonparametric literature. The literature is vast and primarily devoted to estimating the cdf. Let us begin with [33], where it is established that, under a mild assumption on the differentiability of the cdf, it may be estimated pointwise at the rate $n^{-1/3}$ by the nonparametric maximum likelihood estimate (NPMLE). Minimal assumptions under which this rate is optimal can be found in [18]. Note that the rate is dramatically slower than the rate $n^{-1/2}$ for the case of direct observations; more discussion and a thorough review of previous results can be found in these papers. Another seminal paper, devoted to estimation under minimal assumptions, is [5], where a piecewise constant (histogram-type) estimator of the cdf is proposed. [37] studied locally linear smoothers. Spline methods were introduced in [26]. A novel kernel method was proposed in [20]. In [8] a warped adaptation for a kernel estimator was motivated by the Goldenshluger-Lepskii procedure, which yielded the squared-bias and variance trade-off. A log-concave constraint was proposed in [2]. Bootstrapped confidence bands are developed in [22]. A number of interesting and thought-provoking papers are devoted to orthogonal series estimation. In [7] a rigorous analysis of a so-called quotient estimator is performed. The underlying idea is to write the cumulative distribution function of interest as a ratio of two densities of directly observed random variables. Then each density is estimated via a series projection estimator with cutoffs chosen via minimization of a penalized contrast function. It is shown that the adaptive estimator attains optimal nonparametric rates whenever the smoothnesses of the two densities are the same. A regression-type estimator was also explored.
Further development and a literature review on series estimation can be found in [6], where both compactly and non-compactly supported bases are considered. [29] used penalization for a projection series estimator of a conditional cumulative distribution function.
The density estimation problem for CSC data is dramatically less explored. Let us mention [3], where the authors consider kernel density estimation via solving iterative equations, nonparametric maximum likelihood estimation, and also a local EM approach. The latter requires an explicit solution of the local likelihood equations, which is done via the symbolic Newton-Raphson algorithm. Convergence of the proposed algorithms is studied. In [4] data sharpening is proposed to increase the robustness of a kernel density estimator to bandwidth misspecification and measurement errors. [19] used a maximum smoothed likelihood approach and an approach based on smoothing the (discrete) MLE of the distribution function. In particular, under the assumption that the cumulative distribution function is threefold differentiable, the density is estimated with the rate $n^{-4/7}$. The asymptotic distribution of the estimate was further explored in [20]. In [34] a kernel estimator is proposed under the assumption that the density of interest is twice differentiable and the density of the monitoring time is threefold differentiable. A smart procedure of data transformation is used to convert the problem into deconvolution. Then the optimal rate $n^{-4/7}$ is achieved, and expansions of the expectation and variance as well as asymptotic normality are derived. This paper also contains a nice literature review.
It is important to stress that current status sampling may be dramatically simpler than direct sampling of a lifetime of interest, and in many cases it is the only available option, see a discussion in [9,25,32]. At the same time, CSC makes the cdf and density estimation problems ill-posed. As a result, it is prudent to develop a data-driven estimator that attains both the optimal rate and the optimal constant of risk convergence, and does so without requiring the density of the monitoring time Z to have smoothness matching that of the underlying cdf of the lifetime of interest X.
The paper proposes a simple, adaptive and sharp minimax estimator of the density and cdf of a lifetime of interest. The simplicity allows us to prove robustness of the estimator with respect to the smoothness of the density of the monitoring time. In [16], where a sharp minimax lower bound for CSC is obtained, it is assumed that the density of Z is as smooth as the estimated cdf of the lifetime of interest, and this is a serious restriction because we never know how smooth an underlying cdf of interest is. In this paper only a bounded derivative of the density is assumed, and this makes the proposed estimator robust.
The content of the paper is as follows. Section 2 presents assumptions and a known sharp lower bound. The proposed data-driven estimator of the cdf may be found in Section 3. Density estimation is considered in Section 4. Practical examples and a numerical study are presented in Section 5. Proofs are in Section 6. Conclusions and open problems are in Section 7.
Finally, let us introduce several notations used in the paper. The sample size is denoted by n, $o_n(1)$ is the traditional notation for sequences vanishing in n,
$s := s_n := 3 + \lceil \ln(n+3) \rceil$, (1.1)
and $\lceil a \rceil$ denotes the smallest integer larger than or equal to a; $I(A)$ is the indicator of the event A. The cosine basis on $[0,1]$ is
$\varphi_0(x) := 1, \quad \varphi_j(x) := 2^{1/2}\cos(\pi j x), \quad j = 1, 2, \ldots$ (1.2)
Assumption 1. The lifetime of interest X and the monitoring time Z are independent continuous random variables, $P(X \in [0,1]) = 1$, a known density $f_Z(z)$ of the monitoring time has a bounded derivative on $[0,1]$, $\int_0^1 f_Z(z)\,dz = 1$, and $\min_{z \in [0,1]} f_Z(z) \ge c_* > 0$.
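As a quick illustration, the cutoff (1.1) and the cosine basis (1.2) can be coded directly. This is a minimal sketch; the function names are ours, not from the paper:

```python
import numpy as np

def s_n(n):
    """Cutoff s = s_n = 3 + ceil(ln(n + 3)) from (1.1)."""
    return 3 + int(np.ceil(np.log(n + 3)))

def phi(j, x):
    """Cosine basis (1.2) on [0, 1]: phi_0 = 1, phi_j = 2^(1/2) cos(pi j x)."""
    x = np.asarray(x, dtype=float)
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

# Orthonormality check with the midpoint rule on a fine grid of [0, 1].
N = 10_000
x = (np.arange(N) + 0.5) / N
inner = lambda f, g: np.mean(f * g)           # approximates int_0^1 f g dx
print(s_n(100))                               # cutoff for n = 100
print(round(inner(phi(2, x), phi(2, x)), 6))  # ~1.0
print(round(inner(phi(1, x), phi(3, x)), 6))  # ~0.0
```

The midpoint rule is exact here up to floating-point error because the integrands are finite cosine sums.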
Let us comment on the assumption. Given $P(Z \le 1) = 1$, no consistent estimation of the distribution of X is possible if $P(X > 1) > 0$. In other words, for consistent estimation the support of X must be a subset of the support of Z. Similarly, the assumed independence between X and Z is necessary for consistent estimation, see a discussion in [16]. In the lower bound presented below, continuity of $f_Z(z)$ is sufficient, but in the upper bounds we will use the differentiability. More comments can be found in Section 7.
Assumption 1 implies that the joint (mixed) density of $(Z, \Delta)$ is
$f_{Z,\Delta}(z,\delta) = f_Z(z)[F_X(z)]^{\delta}[1 - F_X(z)]^{1-\delta}, \quad \delta \in \{0,1\}$. (2.1)
The formula sheds additional light on why, for consistent estimation, the support of X must be a subset of the support of Z. Now we introduce two function classes. The former is the classical global Sobolev class (ellipsoid) of α-fold differentiable functions on $[0,1]$,
$\mathcal{F}(\alpha, Q) := \{g:\; g(x) = \sum_{j=0}^{\infty}\kappa_j\varphi_j(x),\; \sum_{j=0}^{\infty}(\pi j)^{2\alpha}\kappa_j^2 \le Q\}$, (2.2)
$\kappa_j := \int_0^1 g(x)\varphi_j(x)\,dx$. (2.3)
Note that in (2.3) the $\kappa_j$ are Fourier coefficients of the function g. Sobolev function classes are traditionally considered in upper bounds, and this is what will be done shortly.
The latter function class is a local one where the considered functions are close to a pivot in the $L_\infty$-norm. Let $F_0(x)$ be the cdf of a random variable (lifetime) supported on $[0,1]$. Introduce a class of cumulative distribution functions supported on $[0,1]$ and created by additive perturbations of $F_0$,
$\mathcal{F}(F_0, \alpha, Q, c_0, c_1, \rho) := \{F:\; F(x) = F_0(x) + g(x)I(0 \le x \le 1),\; F^{(1)}(x) \ge 0,\; g(0) = g(1) = 0,\; g \in \mathcal{F}(\alpha, Q),\; \max_{0 \le x \le 1}|g(x)| \le \rho\}$. (2.4)
The local functional class (2.4) was introduced in [16]; the function $F_0(x)$ is called the pivot, and the parameter ρ defines the local nature of the class. Local function classes and the corresponding local minimax approach for establishing lower bounds were pioneered in [17] for the case of direct observations, and for CSC data in [18] for finding the optimal rate of cdf estimation. Now we are in a position to present the lower bound of [16]. We use the notation $(F_X(x))^{(\beta)}$ with β = 0 and β = 1 corresponding to the cdf and the density, respectively, and in what follows the parameter β is used solely for this purpose.
Introduce a nonparametric Fisher information $J(\alpha, Q, d, \beta)$ (2.5), where
$d := \int_0^1 \frac{F_X(z)(1 - F_X(z))}{f_Z(z)}\,dz$ (2.6)
is the so-called coefficient of difficulty of CSC. Note that the nonparametric Fisher information (2.5) is a ratio. The numerator is defined by an underlying class of estimated functions (by the parameters α and Q) and by the estimand specified by the parameter β, while the denominator (the coefficient of difficulty) captures the effect of an underlying distribution of interest (here the cdf $F_X$) and a nuisance function (here the density $f_Z$ of the monitoring time). One may think of a nonparametric Fisher information as an analog of the classical parametric Fisher information $1/\sigma^2$ in the problem of estimating the mean of a normal variable with variance $\sigma^2$, see [1]. Further, the coefficient of difficulty d is a nonparametric analog of $\sigma^2$ because, as we will see shortly, d is a factor in the asymptotic variance of an efficient Fourier estimator implying sharp minimax nonparametric estimation. The larger the coefficient of difficulty, the more complex the estimated nonparametric problem is, and this explains the "coefficient of difficulty" terminology introduced in [13].
Here the parameter β is either 0 or 1 for the estimand being either the cdf or the density, respectively; the supremum is over $F_X \in \mathcal{F}(F_0, \alpha, Q, 1/\ln(\ln(n)), [\ln(\ln(n))]^{1/2}, 1/\ln(\ln(n)))$ defined in (2.4), and the infimum is taken over all possible oracle-estimators $\tilde\Psi_\beta$ that know the sample, the density $f_Z(x)$ of the monitoring time Z, and everything about the class (2.4), namely $F_0$, α and Q.
In Theorem 2.1 the parameter $\rho = 1/\ln(\ln(n)) = o_n(1)$, and accordingly (2.7) is a lower bound for a shrinking local minimax, which makes the bound more challenging to match for an adaptive estimator and a global minimax.
It follows from Theorem 2.1 that the classical rate $n^{-1}$ for estimating the cdf, known for direct observations, slows down to $n^{-2\alpha/(2\alpha+1)}$. This is why CSC sampling is called ill-posed relative to direct sampling of a lifetime of interest. Further, the nonparametric Fisher information (2.5) and the coefficient of difficulty d are important outcomes of the oracle's lower bound. In particular, the coefficient of difficulty captures the effect of the underlying distribution of interest and the distribution of the monitoring time on the Fisher information and the MISE convergence.
Let us comment on what is known about the sharpness of the lower bound (2.7). It is a lower bound for an oracle-estimator that knows the data, the smoothness of an underlying cdf, and the nuisance density $f_Z$ of the monitoring time. The lower bound is sharp, and it is attained by an oracle-estimator for a local Sobolev class. That estimator is too complicated to present here; it is motivated by the proof of the lower bound and uses an aggregation of bases over subsets of the support of X. In the next two sections a simple, data-driven and robust estimator is proposed that attains the sharp lower bound for a global Sobolev class (2.2) with unknown parameters α and Q. The simplicity will also allow us to relax the assumptions of [16] about the smoothness of the nuisance density $f_Z$ of the monitoring time. In other words, we will see shortly that it is possible to match the performance of the minimax oracle.

CDF estimator
The aim of this section is to present a simple sharp-minimax cdf estimator that adapts to smoothness of an underlying cdf and may be used for small samples. Further, the estimator preserves its properties under a mild assumption of differentiability of the density f Z of the monitoring time Z.
In what follows we first present the estimator and its properties, and then explain the estimator and discuss the results.
Recall that we observe a sample $(Z_1, \Delta_1), \ldots, (Z_n, \Delta_n)$ from a CSC pair $(Z, \Delta)$ where Z is the monitoring time, $\Delta := I(X \le Z)$ is the status, and X is the unobserved lifetime of interest. The aim is to estimate an underlying cdf $F_X(x) := P(X \le x)$ of X.
We begin with the case of controlled CSC when the density $f_Z(z)$ of the monitoring time is known. Recall that the sequence $s := s_n$ and the elements of the cosine basis $\varphi_j(x)$, $j = 0, 1, \ldots$ are defined in (1.1) and (1.2), respectively. Set
$\theta_j := \int_0^1 F_X(x)\varphi_j(x)\,dx$ (3.1)
for the Fourier coefficients of the cdf of interest $F_X(x)$. Then (2.1) implies that the pilot Fourier estimator
$\check\theta_j := n^{-1}\sum_{l=1}^{n}\Delta_l\varphi_j(Z_l)/f_Z(Z_l)$ (3.2)
is an unbiased estimator of $\theta_j$, and accordingly set
$\check F_{-j}(x) := \sum_{0 \le i \le s,\, i \ne j}\check\theta_i\varphi_i(x)$ (3.3)
for an unbiased estimate of the function
$F_{-j}(x) := \sum_{0 \le i \le s,\, i \ne j}\theta_i\varphi_i(x)$. (3.4)
This function approximates $F_X(x)$ as both s and j increase to infinity, and note that $\int_0^1 F_{-j}(x)\varphi_j(x)\,dx = 0$. In what follows $\check F_{-j}(x)$ is referred to as the pilot cdf estimator; note that this estimator and $F_{-j}(x)$ do not depend on j whenever $j > s$. Now we can define a Fourier estimator
$\hat\theta_j := n^{-1}\sum_{l=1}^{n}(\Delta_l - \check F_{-j}(Z_l))\varphi_j(Z_l)/f_Z(Z_l)$. (3.5)
Note how simple the estimator is, hardly more complicated than the pilot (3.2). The proposed blockwise-shrinkage cdf estimator is based on the Fourier estimates (3.5) and the following three statistics. Introduce blocks of nonnegative integers/frequencies $B_0 := \{0, 1, \ldots, s\}$, $B_k := \{0, 1, \ldots, \lfloor s(1 + 1/\ln(s))^k \rfloor\} \setminus \cup_{r=0}^{k-1}B_r$, $k = 1, 2, \ldots$, and denote the length (cardinality) of $B_k$ by $L_k$. Set the blockwise statistics $\hat\Theta_k$ and $\tilde\Theta_k$ as in (3.6) and (3.7), and
$\tilde d := \min(3s,\; n^{-1}\ldots)$. (3.8)
Note that $\tilde d$ is a bounded plug-in sample mean estimate of the coefficient of difficulty (2.6).
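To make the construction concrete, here is a small simulation sketch of the pilot estimate (3.2) and the variance-reduced estimate (3.5). The Beta(2,2) lifetime, the uniform monitoring density, and all function names are our illustrative assumptions, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(j, x):
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

# Simulated CSC sample: lifetime X ~ Beta(2, 2), monitoring time Z ~ Uniform(0, 1).
n = 2000
X = rng.beta(2.0, 2.0, n)
Z = rng.uniform(0.0, 1.0, n)
Delta = (X <= Z).astype(float)
fZ = np.ones(n)                          # known density f_Z evaluated at Z_l
s = 3 + int(np.ceil(np.log(n + 3)))      # cutoff (1.1)

def theta_pilot(j):
    """Sample-mean pilot estimate of theta_j = int_0^1 F_X phi_j, cf. (3.2)."""
    return np.mean(Delta * phi(j, Z) / fZ)

def theta_hat(j):
    """Estimate with the pilot cdf subtracted from the status, cf. (3.5)."""
    F_minus_j = sum(theta_pilot(i) * phi(i, Z) for i in range(s + 1) if i != j)
    return np.mean((Delta - F_minus_j) * phi(j, Z) / fZ)

# For Beta(2,2), F_X(x) = 3x^2 - 2x^3 and theta_1 = -24*sqrt(2)/pi^4 ~ -0.348.
print(theta_pilot(1), theta_hat(1))
```

Both estimates target the same coefficient; the second has a visibly smaller variance, which is the point of subtracting the pilot cdf in (3.5).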
The cdf estimator, based on an underlying density $f_Z(z)$, is defined in (3.9). Here $r_n$ is the largest integer r such that $s(1 + 1/\ln(s))^r \le \ln(s)\,n^{1/3}/4$. Note that the number of blocks $r_n$ is of order $(\ln(n))^2$.
Here and in what follows, all specific constants are chosen so that the estimator can be used for small samples. The estimator will be commented on shortly.
In the following proposition we use notations of Theorem 2.1 introduced in Section 2.
If $f_Z$ is unknown, then its estimation is based on the available sample $Z_1, \ldots, Z_n$ from Z. Accordingly, define a projection estimate $\bar f_Z(z)$, bounded from below, in (3.11). Note that the plug-in estimator is completely data-driven and adapts to the unknown smoothness of an underlying cdf of X. An important theoretical achievement of Theorem 3.2 is that only differentiability of $f_Z(z)$ is assumed (versus the α-fold differentiability assumed in [16]). The improvement is due to the simpler Fourier estimator (3.5) and a more advanced proof. Now let us comment on the introduced estimator and statistics. It is worthwhile to begin with a brief introduction to series estimation.
Estimator (3.9) belongs to the class of orthogonal series estimators. A series estimator employs a familiar result from functional analysis: a function $q(x)$ that is square integrable on $[0,1]$ can be written as $q(x) = \sum_{j=0}^{\infty}\kappa_j\varphi_j(x)$ whenever $\{\varphi_j(x),\, j = 0, 1, \ldots\}$ is a basis on $[0,1]$ and $\kappa_j = \int_0^1 q(x)\varphi_j(x)\,dx$ are the corresponding Fourier coefficients. Then the traditional nonparametric rate-optimal estimation paradigm, based on a sample of size n, is as follows. Find a Fourier estimator $\bar\kappa_j$ whose mean squared error is of order $n^{-1}$, that is $E\{(\bar\kappa_j - \kappa_j)^2\} \le Cn^{-1}$. Then use a projection estimator $\bar q(x) = \sum_{j=0}^{M_n}\bar\kappa_j\varphi_j(x)$ where $M_n$ is called a cutoff. A popular ad hoc choice of the cutoff is $M_n = n^{1/5}$, which yields rate-optimal estimation for $q(x) \in \mathcal{F}(2, Q)$. Indeed, the Parseval identity implies that the mean integrated squared error (MISE) of the estimator is $V_n + B_n$ (3.13). Here $V_n$ is called the variance component of the MISE and is proportional to $M_n n^{-1}$, while $B_n$ is called the integrated squared bias (or simply bias) component and is proportional to $M_n^{-4}$. This yields the optimal MISE convergence of order $n^{-4/5}$. Note that for $q \in \mathcal{F}(\alpha, Q)$ the optimal cutoff is of order $n^{1/(2\alpha+1)}$. An interesting and practically important property of a rate-optimal projection estimator based on a trigonometric basis is that its derivative is computed elementarily and is a rate-optimal estimate of the derivative of the estimated function. This is due to the fact that the optimal cutoff is the same for a function and its derivative, see [15]. Knowing this fact will be handy in understanding the next Section 4. In applications the smoothness of $q(x)$, defined by the parameter α, is unknown, and then numerical procedures for the data-driven (adaptive) choice of $M_n$ are developed based on a variance-bias tradeoff. The main technical element here is to note that, due to the Parseval identity, the bias in (3.13) can be written as $B_n = \int_0^1 q^2(x)\,dx - \sum_{j=0}^{M_n}\kappa_j^2$.
Now note that $\int_0^1 q^2(x)\,dx$ does not depend on $M_n$, and accordingly the problem of finding a data-driven cutoff is converted into minimizing $\sum_{j=0}^{M}[E_q\{(\bar\kappa_j - \kappa_j)^2\} - \kappa_j^2]$ with respect to M. Another interesting approach, popular for wavelet bases, is to use thresholding like that in the first sum on the right side of (3.9). Thresholding may yield almost rate-optimal (within a logarithmic factor) estimation. The interested reader can find more about rate-optimal estimation in the books [13,35].
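For intuition, the cutoff-selection idea above can be sketched in the simplest setting of density estimation from direct observations; the empirical risk below replaces each unknown term $E\{(\bar\kappa_j-\kappa_j)^2\} - \kappa_j^2$ by its natural unbiased estimate $2\widehat{\mathrm{Var}}(\bar\kappa_j) - \bar\kappa_j^2$. The Beta(2,2) distribution and all names are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(j, x):
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

# Direct sample from the density of interest (Beta(2,2), for demonstration only).
n = 1000
X = rng.beta(2.0, 2.0, n)

# Fourier estimates kappa_bar_j = mean phi_j(X) and their estimated variances.
J_max = 30
kappa = np.array([np.mean(phi(j, X)) for j in range(J_max + 1)])
var = np.array([np.var(phi(j, X)) / n for j in range(J_max + 1)])

# Empirical analog of minimizing sum_{j<=M} [E(kappa_bar - kappa)^2 - kappa^2]:
# each retained frequency j contributes roughly 2*var_j - kappa_bar_j^2.
risk = np.cumsum(2.0 * var[1:] - kappa[1:] ** 2)
M = 1 + int(np.argmin(risk))
density = lambda x: sum(kappa[j] * phi(j, np.asarray(x, float)) for j in range(M + 1))
print("chosen cutoff M =", M)
```

A frequency is worth keeping only while its squared coefficient exceeds (twice) its variance, which is exactly the variance-bias tradeoff described in the text.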
The problem becomes more complex if sharp minimaxity is of interest, when the aim is to achieve both the optimal rate and the optimal constant of the MISE convergence. For a global Sobolev class $\mathcal{F}(\alpha, Q)$ defined in (2.2), the linear Pinsker oracle
$\tilde q^*(x) := \sum_{j=0}^{J_n(\alpha,Q)}[1 - (j/J_n(\alpha,Q))^{\alpha}]\hat\kappa_j\varphi_j(x)$ (3.14)
is sharp minimax whenever the Fourier estimator $\hat\kappa_j$ is efficient, namely satisfies (3.15). Here $J_n(\alpha, Q)$ is a special cutoff specific to an underlying Sobolev class, and $d_1$ is the coefficient of difficulty specific to an underlying statistical problem. For instance, for estimation of the density $f_X(x)$ based on direct observations of X this coefficient is 1. Note that the linear oracle (3.14) "smooths" Fourier estimates. While being very simple, the linear oracle has two drawbacks. The former is that it is possible but extremely difficult to estimate the cutoff; the latter is that the derivative of the linear oracle is not a sharp minimax estimate of the derivative of $q(x)$. Nonetheless, the Pinsker linear oracle has been the motivation for the blockwise-shrinkage adaptation used by the proposed estimator (3.9). Let us explain the underlying idea following [11], where the problem of density estimation based on direct observations is considered. First, the Pinsker smoothing coefficient $(1 - (j/J_n(\alpha,Q))^{\alpha})$ is dominated by the oracle coefficient $\kappa_j^2/[\kappa_j^2 + d_1 n^{-1}]$, where $d_1 = 1$ is the coefficient of difficulty for the density estimation problem. Second, we may estimate the oracle coefficient, but the accuracy is not sufficient for mimicking it. Instead, using the fact that Pinsker smoothing coefficients with neighboring frequencies j are close to each other, we can propose a single smoothing coefficient for a block of Fourier coefficients. Consider a block of frequencies B which includes L frequencies; then the corresponding blockwise smoothing oracle coefficient is
$w^* := \frac{\Theta_B}{\Theta_B + d_1 n^{-1}}, \qquad \Theta_B := L^{-1}\sum_{j \in B}\kappa_j^2$. (3.16)
Due to the averaging over the block, the Sobolev functional $L^{-1}\sum_{j \in B}\kappa_j^2$ can be estimated with accuracy sufficient for sharp minimax estimation. The oracle coefficient $w^*$ is convenient when the coefficient of difficulty is known, as in density estimation from direct observations. Otherwise, it may be more convenient to mimic the closely related (due to (3.15)) oracle coefficient (3.17). A nice feature of this oracle coefficient is that no estimation of $d_1$ is required; more discussion and specific results can be found in Section 6, see Lemmas 6.3-6.6. Further, it is established in [12] that the derivative of a sharp minimax blockwise estimator is a sharp minimax estimate of the derivative. This is another attractive feature of blockwise adaptive estimation that will be used shortly in Section 4. This ends our brief overview of orthogonal series estimation, which sheds light on the proposed methodology; the interested reader can find more information in the books [13,15,35]. Now let us comment on the specifics of the proposed estimator (3.9). We explain the statistics in the order in which they were introduced. The pilot Fourier estimate (3.2) is a simple sample mean estimate based on formula (2.1); its properties are highlighted in Lemma 6.1 of Section 6. Using the terminology of [24] we may say that the pilot estimate is based on "cases". The pilot cdf estimator (3.3) is the above-discussed projection series estimator with cutoff s and "removed" frequency $j \le s$. As a result, we have the important relation $\int_0^1 \check F_{-j}(x)\varphi_j(x)\,dx = 0$ for all $j = 0, 1, \ldots$ Another property of the pilot estimator is that it is an unbiased estimate of $F_{-j}(x)$ defined in (3.4); see also Corollary 6.1 in Section 6. Note that the mean squared error of the pilot Fourier estimate is of larger order, while the mean squared error of the proposed Fourier estimate (3.5) is $dn^{-1}(1 + o_j(1) + o_n(1))$, see also Lemmas 6.1 and 6.2.
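The blockwise idea can be sketched as follows. The shrinkage form $\max(0,\, 1 - d/(n\hat\Theta))$ below is one common way to mimic the oracle coefficient (3.16) and is our illustrative choice, not the paper's exact statistic; the block construction follows the geometric growth described for (3.9):

```python
import numpy as np

def blocks(s, r_n):
    """B_0 = {0,...,s}; subsequent block edges grow geometrically by (1 + 1/ln s)."""
    out, prev = [list(range(s + 1))], s
    for k in range(1, r_n + 1):
        edge = int(s * (1.0 + 1.0 / np.log(s)) ** k)
        if edge > prev:
            out.append(list(range(prev + 1, edge + 1)))
            prev = edge
    return out

def shrink(theta_hat, blocks_list, d, n):
    """Blockwise mimic of the oracle Theta/(Theta + d/n): multiply each block
    of Fourier estimates by max(0, 1 - d/(n * block average of squares))."""
    out = np.array(theta_hat, dtype=float)
    for B in blocks_list:
        Theta = np.mean(out[B] ** 2)
        w = max(0.0, 1.0 - d / (n * Theta)) if Theta > 0 else 0.0
        out[B] *= w
    return out

theta = np.zeros(26)
theta[:3] = 1.0                        # strong low-frequency signal
smoothed = shrink(theta, blocks(8, 3), d=1.0, n=100)
print(smoothed[0], smoothed[12])       # signal nearly kept (~0.97), empty block zeroed
```

Blocks with a large average squared coefficient are nearly kept, while blocks dominated by noise are shrunk toward zero, exactly the behavior the oracle coefficient prescribes.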
As a result, recalling (3.15), we may refer to the Fourier estimate $\hat\theta_j$ as efficient. Note how subtracting the pilot cdf estimate from the status Δ in (3.5) decreased the mean squared error. Now we are ready to look at the blockwise smoothing coefficients used. If we replace in statistic (3.6) the pilot estimate $\check F_{-j}$ by $F_{-j}$, then this statistic becomes a U-statistic and an unbiased estimate of the Sobolev functional $\Theta_k := L_k^{-1}\sum_{j \in B_k}\theta_j^2$, see a discussion in the proof of Lemma 6.6 in Section 6. This fact, together with the oracle smoothing coefficient (3.17), explains the blockwise smoothing used in the second sum on the right side of (3.9). Lemma 6.4 in Section 6 sheds light on the relationship between the statistics $\hat\Theta_k$ and $\tilde\Theta_k$, while Lemma 6.5 explains how well the blockwise shrinkage mimics a blockwise oracle in the $L_2$-norm. The thresholding used is necessary to attain sharp minimaxity, as shown in [11]. The choice of $r_n$ in (3.9) is such that the estimator includes all frequencies that a sharp minimax oracle does. Now let us comment on the low-frequency component in (3.9). It is created solely for small samples using the recommendation of [13], and its effect on the asymptotic MISE is negligible, see details in (6.27), (6.30) and Corollary 6.2 in Section 6. Further, the thresholding $I(\hat\theta_j^2 > 2\tilde d s n^{-1})$ may be skipped or replaced by $I(\hat\theta_j^2 > C s n^{-1})$ with no effect on the sharp minimaxity. The estimate (3.8) may be replaced by others. For instance, note that $E\{\hat\theta_j^2\} = dn^{-1}[1 + o_j(1) + o_n(1)]$, and hence using a sample mean or sample median of $\{\hat\theta_j^2,\, j = s+1, \ldots, 2s\}$ is applicable. The reader familiar with wavelet estimators may recall that the median approach is popular in wavelet statistical packages. Overall, there is large flexibility in choosing the low-frequency component of a blockwise shrinkage estimator, and in (3.9) this component is chosen based on the recommendations of [13] and the analysis of the examples presented in Section 5.
We will continue the discussion in Section 7.

Density estimation
There are two classical approaches to the problem of density estimation, and we consider them in turn. The former is to propose a smooth estimate of the cdf and then take its derivative. This approach looks natural for CSC data due to the underlying likelihood (2.1). An interesting example of using this approach for CSC data is [19], where the derivative of a maximum smoothed likelihood cdf estimate is used to estimate the density. Under the assumption that the third derivative of $F_X(x)$ is continuous, it is shown that the density estimate attains the rate $n^{-4/7}$. The latter approach is to bypass estimation of the cdf and consider density estimation as a self-contained nonparametric problem; recall the literature review in the Introduction.
Using estimates (3.9) and (3.11) we can define a density estimate (4.1). The next result follows from [12] and the results of Section 3.

Corollary 4.1 (Upper Bound for the MISE of Density Estimator (4.1)).
Let Assumption 1 hold and $\alpha \ge 2$. Then (4.2) holds. This result presents good and bad news about density estimation for CSC data. The good news is that adaptive sharp minimax estimation is possible. The bad news is that the rate of the MISE convergence is dramatically slower than in the case of direct sampling from X. Indeed, recall that if the density $f_X(x)$ has γ derivatives, then based on a direct sample of size n it may be estimated with the rate $n^{-2\gamma/(2\gamma+1)}$. In (4.2) α is the number of derivatives of $F_X(x)$, and accordingly the density $f_X(x)$ has $\gamma = \alpha - 1$ derivatives. Then Corollary 4.1 asserts that the best rate of the MISE convergence for a CSC sample is $n^{-2\gamma/(2\gamma+3)}$. According to [13], for a direct sample of size n from X this rate is the same as for estimation of a trivariate density having γ derivatives in each variable. Now recall the familiar curse of multidimensionality in nonparametric estimation, see a discussion in [35], and then Corollary 4.1 sheds new light on the complexity of CSC data analysis. We will continue this discussion in Section 5.
Under the second approach, when estimation of the cdf is bypassed, we estimate the density $f_X(x)$ directly. The proposed density estimator is again a blockwise-shrinkage cosine series estimator of the type discussed in Section 3, and we continue to use the notation of that section. Further, for the reader's convenience we will compare the steps in the construction of the blockwise density estimator with those for the cdf estimator (3.9).
The main principal step is to understand how the Fourier coefficients of the density $f_X$ can be estimated. We know that $\zeta_0 = \int_0^1 f_X(x)\,dx = 1$ because X is supported on $[0,1]$. Accordingly, we need to propose a Fourier estimator for $j \ge 1$, and from now on we consider only $j \ge 1$. Using integration by parts we can write
$\zeta_j = \int_0^1 f_X(x)\varphi_j(x)\,dx = F_X(1)\varphi_j(1) - F_X(0)\varphi_j(0) - \int_0^1 F_X(x)\varphi_j^{(1)}(x)\,dx = 2^{1/2}(-1)^j + \pi j\, E\{\Delta\psi_j(Z)/f_Z(Z)\}$. (4.4)
In the last equality we used $F_X(0) = 0$, $F_X(1) = 1$, and $\varphi_j^{(1)}(x) = (-\pi j)\psi_j(x)$. Here
$\psi_j(x) := 2^{1/2}\sin(\pi j x), \quad j = 1, 2, \ldots$ (4.5)
are the elements of the classical sine basis on $[0,1]$. Now note that the expectation on the right side of (4.4) can be estimated by a sample mean. This yields a pilot Fourier estimator (compare with (3.2))
$\check\zeta_j := 2^{1/2}(-1)^j + \pi j\, n^{-1}\sum_{l=1}^{n}\Delta_l\psi_j(Z_l)/f_Z(Z_l)$. (4.6)
Note how similar the pilot Fourier estimators (4.6) and (3.2) are despite the different estimands $\zeta_j$ and $\theta_j$. Both are unbiased estimates of their estimands, but a critical difference is the factor πj in (4.6), which inflates the mean squared error (MSE) of $\check\zeta_j$ by the factor $(\pi j)^2$, while the MSE of $\check\theta_j$ is bounded.
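A small simulation sketch of the pilot estimate (4.6); the Beta(2,2) lifetime and uniform monitoring density are our illustrative assumptions, and the coded formula is the integration-by-parts identity (4.4) turned into a sample mean:

```python
import numpy as np

rng = np.random.default_rng(2)

def psi(j, x):
    """Sine basis (4.5) on [0, 1]."""
    return np.sqrt(2.0) * np.sin(np.pi * j * x)

# Simulated CSC sample: X ~ Beta(2, 2), Z ~ Uniform(0, 1).
n = 5000
X = rng.beta(2.0, 2.0, n)
Z = rng.uniform(0.0, 1.0, n)
Delta = (X <= Z).astype(float)
fZ = np.ones(n)

def zeta_pilot(j):
    """Pilot estimate of zeta_j = int_0^1 f_X phi_j via (4.4):
    zeta_j = sqrt(2)(-1)^j + pi*j*E{Delta psi_j(Z)/f_Z(Z)}."""
    return np.sqrt(2.0) * (-1.0) ** j + np.pi * j * np.mean(Delta * psi(j, Z) / fZ)

# For Beta(2,2), f_X(x) = 6x(1-x) and zeta_2 = -3*sqrt(2)/pi^2 ~ -0.43.
print(zeta_pilot(2))
```

The factor πj in front of the sample mean is visible here: for larger j the same sampling noise in the mean is amplified, which is exactly the $(\pi j)^2$ inflation of the MSE discussed above.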
Following (3.3), we use the Fourier estimates (4.6) to construct a pilot cdf estimator $\hat F_{-j}(x)$. Note that $\hat F_{-j}(x)$ is an unbiased estimate of a function $F^*_j(x)$. This function mimics $F_{-j}(x)$ defined in (3.4), except that here the sine basis is used in place of the cosine basis. The sine basis is used because, due to the use of sines in (4.6), the new pilot estimate and the approximation $F^*_j$ of $F_X(x)$ have the desired property (4.9). Note that (4.9) matches the analogous property of $\check F_{-j}(x)$ and $F_{-j}(x)$ with respect to $\varphi_j(x)$, see the paragraph below (3.4).

Following (3.5), we define in (4.10) a Fourier estimator $\hat\zeta_j$.
It is easy to check that if in (4.10) the estimate $\hat F_{-j}(Z_l)$ is replaced by $F^*_j(Z_l)$, then due to (4.9) the estimate $\hat\zeta_j$ becomes an unbiased estimate of $\zeta_j$. Another important property of the Fourier estimate will be presented shortly in Lemma 4.1; see also Lemmas 6.7 and 6.8 in Section 6. Now we introduce three new statistics. The first one is the analog of $\tilde\Theta_k$, the second statistic is the analog of $\hat\Theta_k$, and the third statistic is the analog of $\tilde d$. Now we are in a position to introduce a blockwise density estimator that mimics the cdf estimator (3.9). Finally, if $f_Z(z)$ is unknown, then we use its estimate $\bar f_Z(z)$ defined in (3.11).

Let Assumption 1 hold and
We may conclude that both methods (taking the derivative of the blockwise cdf estimate and direct blockwise density estimation) lead to sharp minimax estimation that matches the performance of oracles. Overall, the second method (direct density estimation) is preferable because this technique is widely used and many innovations have been developed for it, see [13,15,35]. Another comment is that integration of the proposed density estimator yields a sharp minimax cdf estimator. Now let us present a technical result which explains our choice of the Fourier estimator $\hat\zeta_j$ and sheds light on Theorem 4.1.
The interested reader may also compare the statistical properties of the Fourier estimates $\check\zeta_j$ and $\hat\zeta_j$ using Lemmas 6.7 and 6.8 in Section 6.
In (4.16), d is the coefficient of difficulty for cdf estimation defined in (2.6). On the right side of (4.16) we see this coefficient of difficulty multiplied by the factor $(\pi j)^2$, which is due to $\int_0^1 [d\varphi_j(x)/dx]^2\,dx = (\pi j)^2$. This observation explains the similarity between the two approaches to density estimation. Lemma 4.1 also helps us to understand why for $F_X \in \mathcal{F}(\alpha, Q)$ rate-optimal cdf and density projection estimators can use the same cutoff $J_n(\alpha)$. Indeed, let us make several simple calculations. The MISE of a projection cdf estimator $\bar F(x) := \sum_{j=0}^{J}\hat\theta_j\varphi_j(x)$ is of order $Jn^{-1} + J^{-2\alpha}$. The latter yields the rate-optimal cutoff $J_n(\alpha) = n^{1/(2\alpha+1)}$ and the rate $n^{-2\alpha/(2\alpha+1)}$ for the MISE convergence. The MISE of a projection density estimator $\bar f(x) := 1 + \sum_{j=1}^{J}\hat\zeta_j\varphi_j(x)$ is of order $\sum_{j=1}^{J}(\pi j)^2 n^{-1} + J^{-2(\alpha-1)}$. This yields the same rate-optimal cutoff $J_n(\alpha) = n^{1/(2\alpha+1)}$ and the corresponding rate $n^{-2(\alpha-1)/(2\alpha+1)}$ for the MISE convergence. Note that the above rates of MISE convergence are optimal according to Theorem 2.1.
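The variance-bias balance behind these cutoff calculations can be written out explicitly (a standard calculation spelled out for the reader, not quoted from the paper):

```latex
% cdf: balance the variance and squared-bias terms of the projection estimator
\mathrm{MISE}_{\mathrm{cdf}}(J) \asymp J n^{-1} + J^{-2\alpha},
\qquad J n^{-1} \asymp J^{-2\alpha}
\;\Rightarrow\; J_n \asymp n^{1/(2\alpha+1)},
\quad \mathrm{MISE} \asymp n^{-2\alpha/(2\alpha+1)}.
% density: the factor (\pi j)^2 inflates the variance term to J^3 n^{-1}
\mathrm{MISE}_{\mathrm{dens}}(J) \asymp \sum_{j=1}^{J}(\pi j)^2 n^{-1} + J^{-2(\alpha-1)}
\asymp J^{3} n^{-1} + J^{-2(\alpha-1)},
\qquad J^{3} n^{-1} \asymp J^{-2(\alpha-1)}
\;\Rightarrow\; J_n \asymp n^{1/(2\alpha+1)},
\quad \mathrm{MISE} \asymp n^{-2(\alpha-1)/(2\alpha+1)}.
```

In both cases the balancing equation reduces to $J^{2\alpha+1} \asymp n$, which is why the two estimands share the same cutoff.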
Let us finish the theoretical sections with a practical comment. In general, a series estimate may be not bona fide (for instance, a density estimate may take small negative values). If this is an issue, then an $L_2$-projection onto a corresponding bona fide class may be performed, see the discussion in the book [13] and its R software.

Examples
The section sheds additional light on estimation for CSC data via analysis of real and simulated data. We begin with real-data examples.
The environmental company BIFAR has been interested in exploring aerobic treatment of municipal wastewater, see a discussion of the treatment and BIFAR's CSC experiments in [16]. In one of the experiments, the random variable of interest was the time X when a chemical pollutant appears at a sludge tank. Because it was impossible to observe the time directly, a CSC study was conducted. BIFAR's CSC observations $(Z_1, \Delta_1), \ldots, (Z_n, \Delta_n)$ are shown in the top diagram in Figure 1. Before proceeding to estimates, let us look at the data and try to guess an underlying density. Recall that guessing an underlying density is often possible for direct observations, and this is a recommended first step in nonparametric estimation, see Chapter 3 in [13]. In the diagram we can directly observe the monitoring times, and it is reasonable to assume that their underlying density is uniform. The estimate $\bar f_Z(z)$ (the solid line) supports this conclusion. Now let us try to analyze the CSC data. Because $\Delta = I(X \le Z)$ and all Δs with Z > 0.65 are equal to 1, the CSC data, together with the uniform $\bar f_Z$, tell us that an underlying density has a vanishing right tail. Next we note that the Δs with Z < 0.3 are zero, and hence all corresponding underlying times of interest satisfy X > Z. This points to a vanishing left tail. Unfortunately, there is nothing else that visualization can produce. In particular, it is difficult to answer questions about symmetry or multimodality. This complexity is explained by formula (2.1), which implies that to visualize an underlying density one first needs to visualize an underlying cdf and then visualize its derivative. The derivative step is too complicated for visual analysis. We may conclude that only statistical estimators may help us gain understanding of CSC data.
The middle and bottom diagrams exhibit the pilot (the dashed line) and proposed (the solid line) estimators of the density and the cdf. Recall that all estimates are series estimates and they are consistent; the proposed estimator is more accurate, while the pilot estimator is very simple. For the data at hand, based on the estimated F_X and f_Z, the ratio of the standard deviation of the pilot Fourier estimator (3.2) to the standard deviation of the proposed Fourier estimator (3.5) is 2.3. This sheds light on the improvement in estimating Fourier coefficients. Further, let us look at the two density estimates. The pilot estimate is skewed to the left and asymmetric. At the same time, the proposed estimate is unimodal and symmetric. BIFAR concurred with the symmetric and unimodal shape of the underlying density of interest.
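To convey the type of construction behind a pilot series estimator, the following sketch assumes a known uniform monitoring density and the cosine basis, and exploits the unbiasedness relation E[Δ | Z = z] = F_X(z). It is an illustration of an estimator of the type of (3.2), with hypothetical data, not the paper's exact procedure:

```python
import numpy as np

def phi(j, x):
    """cosine basis on [0,1]: phi_0 = 1, phi_j(x) = sqrt(2) cos(pi j x)"""
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

def pilot_cdf(z, delta, f_z, cutoff, grid):
    """Series estimate of F_X from CSC data: theta_j is estimated by the
    sample mean of Delta * phi_j(Z) / f_Z(Z), which is unbiased because
    E[Delta | Z = z] = F_X(z).  A sketch of a pilot-type estimator, not
    the paper's exact procedure (3.2)."""
    est = np.zeros_like(grid)
    for j in range(cutoff + 1):
        theta_j = np.mean(delta * phi(j, z) / f_z(z))
        est += theta_j * phi(j, grid)
    return np.clip(est, 0.0, 1.0)   # crude bona fide correction for a cdf

rng = np.random.default_rng(1)
n = 50_000
x = rng.beta(4.0, 4.0, size=n)      # hypothetical lifetimes, never observed
z = rng.uniform(size=n)             # uniform monitoring times, f_Z = 1
delta = (x <= z).astype(float)
grid = np.linspace(0.0, 1.0, 101)
F_hat = pilot_cdf(z, delta, lambda t: np.ones_like(t), cutoff=8, grid=grid)
```

The cutoff and the bona fide correction are illustrative choices; the paper's estimator selects the number of coefficients and smooths them in a data-driven way.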
Our second example is more interesting in terms of the shape of the underlying distribution, and it involves a larger sample size. Operating a motor vehicle while fatigued or sleepy is commonly referred to as "drowsy driving." There is an extensive literature devoted to the topic, with a primary interest in understanding drowsy driving, driver-assistance technologies in vehicles, and driver-fatigue detection by monitoring systems; see reviews and interesting data in [31,38]. Here our aim is less ambitious: we would like to evaluate the distribution of the driving time X until first yawning. This is a simple and inexpensive CSC experiment which uses a call to a driver and a question about yawning prior to the call. CSC data for n = 485 commercial drivers are shown in the top diagram of Figure 2. The structure of Figure 2 is identical to that of Figure 1, and this allows us to compare the two CSC datasets. We again see vanishing tails, but the difference is in the asymmetry of the fatigue data. There are too many observations to visualize the underlying density f_Z, and the estimate f̄_Z (the solid line) indicates an increasing density. Based on the top diagram, it is difficult to add anything else to these remarks. Now let us look at the middle diagram. The left tail of the pilot estimate (the dashed line) looks strange, and there is no reason to believe that it is correct. The more accurate proposed density estimate (the solid line) tells us an interesting story about the data. It reveals two modes in the lifetime of interest X. A reasonable explanation is that the left and right modes are created by driving at night and in the daytime, respectively. The bimodal density is also supported by the known fact (see the above-mentioned literature) that a driver is three times more likely to have a fatal accident at night than during the day. Let us also note that the pilot estimate is a smoothed version of the proposed estimate. The bottom diagram shows estimates of the cdf.
Here the ratio of the standard deviation of the pilot Fourier estimator to the standard deviation of the proposed Fourier estimator is 2.2. Now let us present results of a numerical study where we know the underlying model. We would like to understand how CSC affects density estimation and the relative performance of the proposed CSC density estimator. Figure 3 exhibits two simulations with the underlying densities shown by the solid lines in the bottom diagrams. In the left diagram the density is unimodal, and in the right it is bimodal. In both experiments the monitoring time is uniform. The considered estimates are: (i) The estimate of [13] based on an underlying sample from X. Recall that we are dealing with simulated CSC data and hence know the underlying lifetimes of interest. Let us refer to this estimate as the oracle; it is shown by the dashed line. (ii) The proposed estimate, shown by the dotted line. (iii) The aggregated estimate of [15]. This estimate aggregates two series estimates based on observations with Δ = 1 and Δ = 0, respectively, and it is a reliable rate-optimal CSC density estimate supported by the software of [15]. Now we repeat each simulation 5000 times; for each simulation and each estimate we calculate the integrated squared error (ISE), calculate the ratio A of the ISE of the proposed estimate to the ISE of the oracle, calculate the ratio B of the ISE of the aggregated estimate to the ISE of the oracle, and then calculate the sample means Ā and B̄ of the ratios over the 5000 simulations. Results are shown in Table 1, and the underlying models are as follows. Model 1 is for the unimodal underlying density, and the left column in Figure 3 exhibits a particular simulation.
Model 2 is for the bimodal underlying density, and the right column in Figure 3 exhibits a particular simulation. Model 3 in Table 1 corresponds to the unimodal underlying density of interest, and model 4 to the bimodal underlying density of interest, shown by the solid lines in the bottom diagrams of Figure 3, respectively. Let us look at the presented results. As we already know from the theory, small samples may present only the onset of ill-posedness, and the results support this possibility. For larger samples, CSC creates more dramatic complications with respect to direct observations. The relative performance of the two CSC density estimators becomes worse with increased sample size, and this outcome coincides with the theory. Recall that asymptotically CSC density estimation is equivalent to estimating a trivariate density, and the results reflect this. On a positive note, the study shows that the used series density estimator is robust toward the underlying density of the monitoring time, and there is only a minor bump in the performance of the estimator when the density of the monitoring time changes from the uniform to the unimodal one. To see the latter, compare the outcomes for odd and even models. Another interesting observation is that the relative performance of the CSC estimates is better for the bimodal underlying density than for the unimodal one. The latter is due to the fact that, even for the oracle, estimation of a bimodal density is a notoriously complicated task; see a discussion in [13]. In conclusion, the examples show that estimation based on CSC data is a feasible but ill-posed task. Fortunately, the ill-posedness is relatively mild for small samples. Accordingly, practitioners should not shy away from employing CSC experiments and nonparametric analysis of CSC data, because visualization of nonparametric density estimates may shed a useful light on the lifetime of interest.
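The protocol of the numerical study above can be sketched as follows. This is a toy version: a crude binned CSC estimate stands in for the proposed estimator, the lifetime is Beta(2,2), and only 20 rather than 5000 replications are run; all these choices are illustrative:

```python
import numpy as np

def ise(f1, f2, grid):
    """integrated squared error on an equispaced grid"""
    return float(np.sum((f1 - f2) ** 2) * (grid[1] - grid[0]))

rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 201)
F_true = grid ** 2 * (3.0 - 2.0 * grid)   # cdf of Beta(2,2), our toy lifetime

ratios = []
for _ in range(20):                       # the paper uses 5000 replications
    n = 400
    x = rng.beta(2.0, 2.0, size=n)        # latent lifetimes (cdf = F_true)
    z = rng.uniform(size=n)               # monitoring times
    delta = (x <= z).astype(float)        # observed statuses
    # oracle: empirical cdf from the (normally unobservable) direct sample
    F_oracle = np.searchsorted(np.sort(x), grid, side="right") / n
    # crude CSC estimate: regress Delta on Z by binning, as E[Delta|Z=z]=F_X(z)
    bins = np.clip((z * 20).astype(int), 0, 19)
    step = np.array([delta[bins == b].mean() if np.any(bins == b) else 0.0
                     for b in range(20)])
    F_csc = step[np.clip((grid * 20).astype(int), 0, 19)]
    ratios.append(ise(F_csc, F_true, grid) / ise(F_oracle, F_true, grid))

A_bar = float(np.mean(ratios))            # analog of the mean ratio in Table 1
```

Even in this toy setting the mean ISE ratio exceeds one, reflecting the ill-posedness of CSC relative to direct observations discussed above.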

Proofs
In what follows, the C's are generic finite constants, d is the CSC coefficient of difficulty defined in (2.6), E_{F_X}{·} and V_{F_X}(·) denote the expectation and the variance given the underlying cdf F_X whenever we would like to stress the underlying cdf, and we continue using the notation introduced at the end of the Introduction.
Proof of Theorem 3.1. The proof is structured via a sequence of technical lemmas. This simplifies understanding of the main steps in the proof. We begin with the evaluation of the mean and variance of the pilot estimator θ̌_j of the Fourier coefficient θ_j, defined in (3.2) and (3.1), respectively.

Lemma 6.1. Let Assumption 1 hold. Then the pilot estimator θ̌_j of θ_j is unbiased and rate optimal; namely, (6.1) and (6.2) hold.

Proof of Lemma 6.1. For the expectation, a calculation using (2.1) verifies the unbiasedness of the pilot Fourier estimator θ̌_j. A direct calculation then bounds the variance of the pilot Fourier estimator. We conclude that the pilot Fourier estimator is unbiased and rate optimal. Lemma 6.1 is verified.

Now let us consider the Fourier estimator θ̂_j defined in (3.5). The next lemma sheds light on why this estimator may be referred to as efficient for the considered nonparametric problem of cdf estimation.

Lemma 6.2. Let Assumption 1 hold and F_X ∈ F(α, Q). Then (6.3) and (6.4) hold. Here d = ∫₀¹ F_X(x)(1 − F_X(x))[f_Z(x)]^{-1} dx is the CSC coefficient of difficulty (2.6).

Remark 6.1.
In what follows it is sufficient to have d n^{-1}[1 + o_j(1) + o_n(1)] on the right side of (6.4).

Proof of Lemma 6.2. Recall that the statistic F̌_{-j}(x) is defined in line (3.3) and is used to estimate the function F_{-j}(x), which approximates F_X(x) and is defined in (3.4). Let us evaluate the mean and the mean squared error (MSE) of the two terms in turn. Using (2.1) and a straightforward calculation for the mean, we conclude that θ̂_{1j} is an unbiased estimate of θ_j. Using (2.1) and Δ_l² = Δ_l, we write out the MSE. Now we write F_{-j}(x) as F_X(x) + [F_{-j}(x) − F_X(x)] and continue. To evaluate v_1 we use the trigonometric formula 2cos²(γ) = 1 + cos(2γ), which implies

φ_j²(x) = 1 + 2^{-1/2} φ_{2j}(x).  (6.8)

We also use the differentiability of both F_X(x) and f_Z(x), and relation (2.2.5) in [13]. Note that we need only c_j² < C to verify (6.4). To evaluate v_2 we first note that

F_X(x) − F_{-j}(x) = θ_j φ_j(x) I(j ≤ s) + Σ_{r>s} θ_r φ_r(x).  (6.10)

Second, we make several preliminary calculations. To evaluate the right side of (6.11) we use φ_j(x)φ_{2j}(x) = 2^{-1/2}[φ_j(x) + φ_{3j}(x)] and inequality (2.2.8) in [13]. We get |θ_j κ_j| I(j ≤ s) ≤ C(j+1)^{-2} I(j ≤ s).
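The two cosine-basis identities used in this proof, φ_j²(x) = 1 + 2^{-1/2}φ_{2j}(x) and, for 1 ≤ i < j, φ_i(x)φ_j(x) = 2^{-1/2}[φ_{j−i}(x) + φ_{j+i}(x)], can be checked numerically:

```python
import numpy as np

def phi(j, x):
    """cosine basis on [0,1]: phi_0 = 1, phi_j(x) = sqrt(2) cos(pi j x)"""
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

x = np.linspace(0.0, 1.0, 1001)
i, j = 3, 5

# identity (6.8): phi_j(x)^2 = 1 + 2^(-1/2) phi_{2j}(x)
err_square = np.max(np.abs(phi(j, x) ** 2
                           - (1.0 + 2.0 ** -0.5 * phi(2 * j, x))))

# product formula (1 <= i < j): phi_i phi_j = 2^(-1/2)[phi_{j-i} + phi_{j+i}]
err_product = np.max(np.abs(phi(i, x) * phi(j, x)
                            - 2.0 ** -0.5 * (phi(j - i, x) + phi(j + i, x))))
```

Both errors are at machine-precision level, confirming the identities on which the bounds for v_1 and v_2 rest.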
Our next preliminary calculation is for the second term on the right side of (6.10). Using the Cauchy–Schwarz inequality and F_X ∈ F(α, Q), we evaluate the right side of (6.12). Now we have all the technical relations needed for the analysis of v_2. Using (6.10)–(6.13) yields

v_2 ≤ C n^{-1}[(j+1)^{-2} + s^{-1}].  (6.14)

The last term to evaluate on the right side of (6.7) is v_3. Note that

F_{-j}²(x) = [F_X(x)]² − 2F_X(x)[θ_j φ_j(x)I(j ≤ s) + Σ_{r>s} θ_r φ_r(x)] + [θ_j φ_j(x)I(j ≤ s) + Σ_{r>s} θ_r φ_r(x)]².
This yields (6.15). Combining (6.9), (6.14) and (6.15) on the right side of (6.7), we get (6.16). This ends our analysis of θ̂_{1j}. Now we evaluate the mean and the MSE of θ̂_{2j} defined in (6.5). First of all, a comment is due. It is not a simple task to evaluate this statistic with the required accuracy. On one hand, we have the plain and very strong property E{(θ̌_j − θ_j)²} ≤ Csn^{-1} (6.17), which holds due to the Parseval identity and Lemma 6.1. On the other hand, this result alone is not sufficient for the verification of Lemma 6.2 and for the conclusion that the term θ̂_{2j} is negligible with respect to the main term θ̂_{1j} in (6.5). To simplify the formulas for the analysis of θ̂_{2j}, we need several new notations. Set S_{-j} := {0, 1, ..., s}\{j} for j ≤ s and S_{-j} := {0, 1, ..., s} for j > s, and N_{-l} := {1, ..., n}\{l}. Using these notations we may rewrite F̌_{-j}(Z_l). Using the orthogonality of the elements {φ_j(x), j = 0, 1, ...} of the cosine basis, and further using Assumption 1, we combine the obtained results; this and (6.6) yield (6.3). Now we evaluate the second moment of θ̂_{2j}. We make two preliminary observations; third, using notation (6.8) for Z_l and Z_m, where l ≠ m and l, m ∈ {1, 2, ..., n}, we write an expansion. This expansion and (6.17) allow us to write the bound (6.24) for any 1 ≤ l < m ≤ n. We need an extra notation to analyze the last expectation in (6.24). Following (6.18), we set F̌_1(Z_l, −m) and decompose F̌_1(Z_l) accordingly. The reason for this new decomposition of F̌_1(Z_l) is that F̌_1(Z_l, −m) does not depend on (Δ_l, Z_l, Δ_m, Z_m) and |F̌_1(Z_l, m)| ≤ Csn^{-1} almost surely. To continue the evaluation of A′_1 we need several technical relations, which follow from (6.19). Recalling the rough inequality |F̌_1(Z_l, m)| ≤ Csn^{-1} and using the Cauchy–Schwarz inequality together with (6.21), we conclude that A′_1 ≤ Cs^{3/2}n^{-3/2}. The analysis of Ǎ_2 is similar: we bound it using the Cauchy–Schwarz inequality, and due to symmetry the same inequality holds for Ǎ_3.
Finally, we have E_{F_X}{|Ǎ_4|} ≤ Cs²n^{-2}. Combining the above-presented relations, we conclude that the component θ̂_{2j} is indeed negligible with respect to θ̂_{1j}. Lemma 6.2 is proved.

Corollary 6.1. There are three technical conclusions from the above-presented proof. The first one is that, according to (6.5), we have the representation θ̂_j = θ̂_{j1} + θ̂_{j2}, where the second term is negligible and may be skipped. Using (6.19), a similar conclusion can be made for the pilot cdf estimate F̌_{-j}(Z_l) = F̌_1(Z_l) + F̌_2(Z_l), where the second term F̌_2(Z_l) is negligible. Further, note that the pilot estimate is based on at most s estimated Fourier coefficients, and then for any integer k we have a directly verified rough inequality. The reader who wants to understand the following proof without going into a detailed analysis of negligible terms may replace θ̂_j by θ̂_{1j} and F̌_{X,j} by F_{X,j}. Finally, in Lemma 6.2 the upper bound (6.4) may be replaced by the right side of (6.16). While we do not need that accuracy, it is an interesting upper bound that sheds light on the Fourier estimator which yields sharp minimax cdf estimation. Recall that in the nonparametric curve estimation literature such a Fourier estimator is called efficient.

Now we are ready to begin the analysis of the MISE of F̃_X(x, f_Z). Using the Parseval identity we write the decomposition (6.27). The three terms on the right side of (6.27) correspond to the MISE components on low, middle and high frequencies, and they are explored in turn. Using d̃ ≤ 3s and Lemma 6.2, we bound the jth term of A_1(F_X). Let us consider the last term on the right side of (6.28). Using the Chebyshev inequality and Lemma 6.2 we get (6.29), and using (6.29) in (6.28) we obtain the bound for A_1(F_X).

Corollary 6.2.
As we see, the MISE of the low-frequency part of the proposed cdf estimator is negligible with respect to the verified rate n^{-2α/(2α+1)}. Another important conclusion is that there is a wide choice of thresholds and of the boundary frequency (currently s) between the low- and middle-frequency components of a blockwise estimator that yield sharp-minimax estimation. The currently used boundary frequency s is chosen based on the recommendation of [13], because of its simplicity, and using a numerical study of experiments similar to those in Section 5. A discussion of more complicated procedures can be found in [13].
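To convey the idea of blockwise shrinkage, the following minimal sketch applies the weight Θ_k/(Θ_k + d/n) within each block of estimated Fourier coefficients, with Θ_k computed as the naive mean squared coefficient of the block; the paper's statistics Θ̃_k and Θ̂_k in (3.6)–(3.7) are bias-corrected refinements of this naive choice, and the toy coefficients below are hypothetical:

```python
import numpy as np

def blockwise_shrink(theta_hat, d_over_n, block_len):
    """Shrink estimated Fourier coefficients block by block with weight
    Theta_k / (Theta_k + d/n), where Theta_k is the mean squared coefficient
    in block B_k.  A sketch of the smoothing idea behind (3.9), assuming
    this simple weight in place of the paper's bias-corrected statistics."""
    out = np.zeros_like(theta_hat)
    for start in range(0, len(theta_hat), block_len):
        block = theta_hat[start:start + block_len]
        theta_k = np.mean(block ** 2)
        mu_k = theta_k / (theta_k + d_over_n)   # in [0, 1): pure shrinkage
        out[start:start + block_len] = mu_k * block
    return out

# toy coefficients: a strong low-frequency signal plus a noise-level tail
theta = np.array([0.9, 0.4, 0.2, 0.01, -0.01, 0.005, 0.004, -0.003])
smoothed = blockwise_shrink(theta, d_over_n=0.01, block_len=4)
```

The low-frequency block, whose signal dominates the noise level d/n, is barely shrunk, while the noise-level tail block is almost annihilated; this is precisely the adaptive smoothing behavior discussed above.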
Now we turn our attention to the second term A 2 pF X q in (6.27). This term is the MISE of the block-shrinkage part of the proposed cdf estimator.

Lemma 6.3. Let Assumption 1 hold. Then
Proof of Lemma 6.3. We begin with the remark that the studied risk A_2(F_X) is the MISE of the blockwise estimator ψ̃(x) defined in (6.32), where μ_k is the oracle's smoothing coefficient, the Θ-statistic is the Sobolev statistic introduced in Section 3, and d is the coefficient of difficulty (2.6). The Parseval identity allows us to express the oracle's MISE as (6.37), and using Lemma 6.2 we evaluate the terms in (6.37), obtaining the decomposition (6.38). We consider the three terms on the right side of (6.38) in turn. The first term is simplified, and what we see is the MISE of a classical blockwise oracle studied in [11]. To consider T_2, we note that μ_k ≤ 1. This, together with (j+1)^{-1} ≤ s^{-1} for j ∈ B_k and (6.39), yields that T_2 ≤ Cs^{-1}n^{-2α/(2α+1)} = o_n(1)n^{-2α/(2α+1)}.
The third term T_3 is far from simple to evaluate because of the factor (Θ_k + dn^{-1})^{-1}, which may be of order n. The idea of evaluating T_3 is to correctly bound the sum in j. To do that we use Lemma 6.2 and the Cauchy inequality. Using the resulting inequality and (6.39), we obtain a bound that holds uniformly over F_X ∈ F(α, Q). Combining the bounds for T_1, T_2 and T_3 in (6.38), we conclude that the oracle ψ̂*(x) is sharp-minimax. This ends step 1 of the analysis of A_2(F_X).
Step 2 is to introduce a new oracle-estimator ψ̃*(x), which is more convenient for studying the blockwise-shrinkage estimate ψ̃(x) defined in (6.32) and which sheds light on the chosen blockwise smoothing. Set ψ̃*(x) accordingly, and recall that the corresponding statistic is used in the denominator of the smoothing ratio Θ̃_k/Θ̂_k of the proposed estimate (3.9). To analyze the MISE of the new oracle, the Parseval identity allows us to consider a particular block B_k. We write the decomposition (6.42) for any constant λ ∈ [0, 1] and continue using Lemma 6.2. We begin with evaluating the last sum on the right side of (6.42); for F_X ∈ F(α, Q) we bound it using the Cauchy–Schwarz inequality. For the first two terms on the right side of (6.42), we note that the minimizing constant is λ* = λ_k introduced in (6.41). This and the sharp minimaxity of the oracle ψ̂*(x) yield that the oracle ψ̃*(x) is sharp-minimax as well. To continue the evaluation of the term A_2(F_X) on the right side of (6.27), we need several more technical lemmas. These lemmas are also of interest on their own because they shed light on the Sobolev statistics introduced in Section 3 and also on blockwise oracles.

Lemma 6.4. The statistics Θ̃_k and Θ̂_k, defined in (3.6) and (3.7), satisfy the relation (6.44). If Assumption 1 holds and F_X ∈ F(α, Q), then (6.45) and (6.46) hold, where d is the CSC coefficient of difficulty (2.6).
Proof of Lemma 6.4. We begin with the verification of (6.44). Using (3.6) and (3.7), we write the corresponding bound; this proves (6.44). The verified inequalities are rough but sufficient for our purposes. We proceed to (6.45). The term a_{1j} was already studied as a component of E_{F_X}{(θ̂_{1j} − θ_j)²}; see the line below (6.6) and then (6.16). Using the notation of (6.16), the resulting inequality is valid because c_j < C and j > s whenever j ∈ B_k. Inequality (6.26) implies that L_k^{-1} Σ_{j∈B_k} a_{2j} ≤ Cs²n^{-1}, and similarly L_k^{-1} Σ_{j∈B_k} |a_{3j}| ≤ Csn^{-1/2}. Inequality (6.45) is verified. Now let us check inequality (6.46). We write, using (6.45), and to evaluate the expectation we use the expansion F̌_{-j}(Z_l) = F_{-j}(Z_l) + (F̌_{-j}(Z_l) − F_{-j}(Z_l)), which allows us to write for a factor in the studied term,

[Δ_l − F̌_{-j}(Z_l)]² = [Δ_l − F_{-j}(Z_l)]² − {(F̌_{-j}(Z_l) − F_{-j}(Z_l))[2(Δ_l − F_{-j}(Z_l)) − (F̌_{-j}(Z_l) − F_{-j}(Z_l))]}.  (6.47)

Let us look at the terms on the right side of (6.47). The term [Δ_l − F_{-j}(Z_l)]² depends only on the pair (Δ_l, Z_l), and hence the central second moment of the sample mean

n^{-1} Σ_{l=1}^{n} [L_k^{-1} Σ_{j∈B_k} (Δ_l − F_{-j}(Z_l))² φ_j²(Z_l) / (f_Z(Z_l))²]

is proportional to n^{-1}. Further, the term in curly brackets in (6.47) is negligibly small with respect to s^{-1}, due to the inequality E_{F_X}{(F̌_{-j}(Z_l) − F_{-j}(Z_l))^{2k}} ≤ Cs^{2k}n^{-k}; see (6.16). Inequality (6.46) is verified. Lemma 6.4 is proved.

Lemma 6.5. Let Assumption 1 hold, and recall the notations for the blockwise estimate ψ̃(x) and the oracle ψ̂*(x).

Proof of Lemma 6.5. The Parseval identity yields a decomposition of the studied risk, and using the facts established above we continue with the terms D_1 and D_2 in (6.51). We begin with the analysis of D_1. Using Lemma 6.4, we may write the bound (6.52). Now we evaluate the expectations of the terms on the right side of (6.52). To do that, we need the following technical result, which will be proved later.

Lemma 6.6. Let Assumption 1 hold and F_X ∈ F(α, Q). Then the bound (6.53) holds.

Using (6.46) and (6.53) to evaluate the expectation of the right side of (6.52), we get (6.54); in the last line we used sr_n ≤ Cs⁴ and (6.39). Now we estimate the expectation of D_2. We write, using (6.44), and then, using Lemma 6.4 and (6.39), we get (6.55). To evaluate the expectation on the right side of (6.55), we use the Chebyshev inequality and Lemma 6.6, obtaining (6.56). Now note that x/(x + dn^{-1}) is an increasing function for x > 0. Using this observation and (6.56) in (6.55), we conclude that (6.57) holds; in the last relation we used the earlier mentioned inequality r_n < Cs². Using (6.54) and (6.57) in (6.51) verifies Lemma 6.5, given the validity of Lemma 6.6.
Proof of Lemma 6.6. Consider a fixed k ∈ {1, 2, ..., r_n} and introduce an oracle Ũ_k. This oracle is a U-statistic which is an unbiased estimate of Θ_k and a benchmark for the studied statistic. While it is possible to analyze the variance of Ũ_k directly following [11], it is faster and simpler to follow the standard methodology of calculating moments of a U-statistic. To do that, for the reader's convenience we will use the terminology, notation and results of the book [30]. First of all, note that the U-statistic Ũ_k is based on a symmetric kernel h_k((Δ_l, Z_l), (Δ_{l'}, Z_{l'})). The relation between the studied statistic and the oracle is straightforward. Recall that for the considered j ∈ B_k neither F_{-j}(Z) nor F̌_{-j}(Z) depends on j, but the notation is still useful because it reminds us of the underlying construction. Note that E_{F_X}{Ũ_k} = Θ_k, and accordingly our first step is to estimate the variance of Ũ_k. In [30] there is an explicit formula for the variance, and to introduce it we need several more notations of [30]. Set X := (Δ, Z), X_l := (Δ_l, Z_l), and consider the function (Δ − F_{-j}(Z))φ_j(Z)/f_Z(Z) in (6.63). Using the new notations, the Hoeffding lemma (Section 5.1.4 in [30]) gives us a representation of the variance. We need to introduce several more notations. Using a trigonometric formula, we continue the simplification of the expression for ζ_1; here we used the fact that F_{-j}(x) = F_{-i}(x) for the considered i and j from B_k. A similar quantity is set for the second component. The Hoeffding formula (Lemma A in Section 5.2.1 of [30]) yields an exact formula for the variance; while we could use it for calculating the variance, here we need only a rough upper bound. Note that the relevant function has a bounded derivative whenever z ∈ [0, 1]. This yields that the Fourier coefficients of the function are absolutely summable; the reader may recall that this property is the Bernstein inequality, see [13].
This fact, together with the inequality 2|θ_j θ_i| ≤ θ_j² + θ_i², yields a rough inequality, and similarly we obtain a companion bound. Combining the results in (6.75), we conclude that the U-statistic Ũ_k satisfies the desired inequality for the variance of Θ̃_k. Further, this is the main term in the expansion (6.61), as we will see shortly. The interested reader may note that [30] gives a simple upper bound C*n^{-1} for the variance, but the issue is that we need the specific constant C* for the considered U-statistic, which is CL_k^{-1}[Θ_k + n^{-1}]; this is why all these lengthy calculations are presented. Now we evaluate the second moment of the term T_1 on the right side of (6.61). This is not a U-statistic, and we need to evaluate it directly. Nonetheless, similarly to the analysis of a U-statistic, the main tools are combinatorics and the grouping of similar terms. Several new notations are needed. First, instead of a summation over l < l', we consider the summation over l_1 < l_2, and when T_1² is considered we use a double sum over l_1 < l_2 and l_3 < l_4. Second, recall notation (6.18) and write the decomposition F̌_{-j}(Z_l) = F̌_1(Z_l) + F̌_2(Z_l) in (6.80). Let us comment on (6.80). First of all, for the considered j ∈ B_k neither F̌_1(Z_l) nor F̌_2(Z_l) depends on j. Second, E_{F_X}{F̌_1(Z_l)} = 0. Finally, F̌_2(Z_l) depends only on (Δ_l, Z_l) and not on any other observation (Δ_t, Z_t) with t ≠ l, and also |F̌_2(Z_l)| ≤ Csn^{-1} almost surely. These facts will help us to analyze the second moment of T_1.
When we square T_1, we obtain the sum (6.81) of terms T_1(l_1, l_2, l_3, l_4) defined in (6.82), which we can also rewrite in the more compact form (6.83). What we want to show is (6.84). Similarly to the analysis of a U-statistic, while there are on the order of n⁴ terms E{T_1(l_1, l_2, l_3, l_4)} in the sum (6.84), they may be grouped into four categories that we consider in turn. The first one is when l_1 ≠ l_2 ≠ l_3 ≠ l_4, and recall that always l_1 < l_2 and l_3 < l_4. This is the largest category, containing on the order of n⁴ terms. We want to show that each term from this category satisfies

E{T_1(l_1, l_2, l_3, l_4)} ≤ CL_k n^{-1}(Θ_k + n^{-1}),  l_1 ≠ l_2 ≠ l_3 ≠ l_4.  (6.85)

Let us explain why such a term satisfies (6.85). The analysis is primarily based on two facts, (6.86) and (6.87). Now we can analyze a term. In (6.83) we have L_k² terms that include a factor [F_{-j}(Z_{l_2}) − F̌_1(Z_{l_2})] and L_k² terms that include a factor F̌_2(Z_{l_2}). Consider the first type of terms. Such a term is nonzero, due to (6.86), only if there is no extra factor containing Z_{l_2}. This implies that the only possible nonzero term is T_{12}. Repeating the same argument of using (6.86), only now with Z_{l_4}, the term T_{12} is simplified.
Using (6.87), we simplify (6.89) into (6.90). We conclude that the first type of terms in T_1² satisfies (6.85). Let us also note that, using a more thorough analysis of the expectation in (6.90) as a Fourier coefficient, inequality (6.91) can be improved in order. Now consider the expectation of the second type of terms, with the factor −F̌_2(Z_{l_2}), in (6.92). Recall that F̌_2(Z_{l_2}) is a function only of (Δ_{l_2}, Z_{l_2}), and then (6.86) used for l = l_4 yields (6.93). Then, identically to (6.91), we get the desired upper bound for T'_{12}. This and (6.91) yield the desired (6.85) for the first category of indexes, where l_1 ≠ l_2 ≠ l_3 ≠ l_4. Note that we could improve the upper bound in order by considering the expectations on the right side of (6.90) and (6.93) as Fourier coefficients, but the obtained rough inequalities are sufficient for our purposes. Now we consider terms in (6.81) from the second group, where l_1 < l_2 = l_3 < l_4, or l_1 = l_4 and l_2 ≠ l_3. We present the analysis of the former case because the latter is analyzed similarly. There are on the order of n³ such terms; accordingly, it is sufficient to show that the expectation of a particular term T_1(l_1, l_2, l_2, l_4) is bounded by CL_k(Θ_k + n^{-1}). Using (6.83), we can write (6.94). Using (6.87) and (6.86) with l = l_4, we simplify the last expectation into (6.95). To evaluate the first sum on the right side of (6.95), we use a relation following from the Cauchy–Schwarz inequality. These inequalities imply the desired upper bound. Similarly, we use the previous inequality and |F̌_2(z)| ≤ Csn^{-1} to get the desired upper bound for the second sum in (6.95). This ends the evaluation of a term from the second category. Note that we have established an inequality stronger in order than needed.
The third category of terms is when l_1 = l_3, l_1 < l_2, l_3 < l_4 and l_2 ≠ l_4. Similarly to the second category, there are on the order of n³ terms in this category, and we need to verify the same upper bound for each term. For such a term, we write (6.98). Here we simply repeat the analysis of a term from the first category; namely, we get (6.89) and (6.92), only now with l_1 = l_3, and this yields the desired upper bound. The final, fourth category is when l_1 = l_3 and l_2 = l_4. There are on the order of n² terms in this category. For a particular term we have (6.99), and the desired upper bound follows immediately from (6.26).
Combining the upper bounds for the four categories of terms, we verify (6.100) uniformly over F_X ∈ F(α, Q). We are left with evaluating the second moment of T_2 defined in (6.61); we need to show (6.101). First of all, note that using (6.26) we get E_{F_X}{T_2²} < Cs²n^{-2}, which is close to what we want. To get (6.101), we use (6.86) and follow our analysis of T_{12}, which gives us a very rough but sufficient upper bound Cn^{-5/2}s⁴. This and the bound s⁴L_{r_n} = o_n(1)n^{1/2} prove (6.101).
Proof of Theorem 3.2. The assertion of Theorem 3.2 is not obvious, and it is intriguing because, according to (2.1), the joint density f_{Z,Δ}(z, δ) of the observed pair (Z, Δ) is equal to the product of the unknown density f_Z(z) and [F_X(z)]^δ [1 − F_X(z)]^{1−δ}. Accordingly, it is natural to conjecture that, for efficient estimation of F_X(z), the unknown density f_Z(z) of the monitoring time should be as smooth as the cdf of interest F_X(z), because the smaller smoothness defines the smoothness of f_{Z,Δ}(z, δ) in z. Fortunately, as we will establish shortly, this is not the case, and it suffices for the density f_Z(z) to be differentiable regardless of how smooth the cdf is. This is a remarkable outcome in comparison with [16], and it will be explained why this is the case.
We begin by recalling Lemma 1 in [14], which allows us to evaluate the quality of the estimators f̄_Z(z) and f̌_Z(z) for any positive integer t; see (6.102). Because the density estimator is plugged into the denominator, we use the expansion (6.103), which allows us to utilize (6.102). We begin our analysis with the plugged-in pilot Fourier estimator. Using (6.103), we write the decomposition into the terms A_{j1} and A_{j2} in (6.104). Here θ̌_j is the estimate based on the underlying f_Z and studied in the proof of Theorem 3.1; see Lemma 6.1.
It is simple to evaluate A_{j2} using (6.102). Our next step is to evaluate the second moment of the term A_{j1} in (6.104). According to Assumption 1, the density f_Z(z) is bounded away from zero. The evaluation of B_{j1} is more involved because we only assume that the density f_Z(z) is differentiable. Write

f_Z(z) − f̌_Z(z) = [(n−2)n^{-1} f_Z(z) − h̃_1(z)] + [2n^{-1} f_Z(z) − h̃_2(z)].
Using this representation, we can rewrite B_{j12}. Next we consider D_{j1}. Then h̃_1(z) = (n−2)n^{-1} Σ_{i=1}^{J_n} κ̃_i φ_i(z), and for the difference in (6.111) we obtain (6.113). Substituting (6.113) in (6.111) and then using the Cauchy inequality, we continue (6.111) into (6.114). To evaluate the first term, recall the familiar trigonometric formula φ_j(z)φ_i(z) = 2^{-1/2}[φ_{j−i}(z) + φ_{j+i}(z)], the definitions (6.112), and write the expectation in (6.115). Now we can finish the evaluation of D_{j11}. Using the above-presented trigonometric formula, together with (6.115) and (6.116), we get (6.117). The last inequality holds due to the famous Bernstein inequality, which states that the Fourier coefficients of differentiable functions are absolutely summable; furthermore, Lipschitz functions of order larger than 1/2 can be considered as well, see [13]. Now consider D_{j12}. Using (6.116) and the Cauchy–Schwarz inequality yields (6.118). Using the obtained results in (6.114), we conclude that

D_{j1} ≤ Cn^{-1}.  (6.119)

As we will see shortly, this is the dominant term in A_{j1}. We are left with the evaluation of the cross-product term D_{j2} defined in (6.110). Several remarks about the terms in (6.120) are due. Using (6.113) and the previously introduced notation, we obtain (6.121). Further, we have n^{-1}|2f_Z(z) − Σ_{i=0}^{J_n} φ_i²(z)| ≤ CJ_n n^{-1}. Using (6.116), the Cauchy–Schwarz and Bessel inequalities yield (6.122). Combining the results in (6.120), we conclude that

D_{j2} = o_n(1) n^{-1}.  (6.123)

As we have verified, the dominant term in A_{j1} is D_{j1}. Using the obtained results in (6.110) yields B_{j12} ≤ Cn^{-1}, and together with the already evaluated B_{j11} = o_n(1)n^{-3/2} and (6.108) we get B_{j1} ≤ Cn^{-1}.
Using this inequality and (6.107) in (6.106), together with the obtained result, (6.104) and (6.105), we conclude that uniformly over all considered cumulative distribution functions of interest F_X and densities f_Z of the monitoring time, the bound (6.124) holds. This inequality yields that a projection estimate of an underlying cdf F_X(x), based on either θ̃_j or θ̌_j, yields the same rate of MISE convergence. Also note that in (6.124) the subscript emphasizes that both F_X and f_Z must be known to calculate the expectation, and the constant C is the same for all considered F_X. Now we consider the data-driven Fourier estimate θ̂_j, defined in (3.5), where in place of f_Z(z) we use f̄_Z(z). Namely, we explore the data-driven Fourier estimator (6.125). Here F̌_{-j}(z) is defined in (3.3), and it estimates the function F_{-j}(z) defined in (3.4). The difference between the already studied Fourier estimator θ̌_j and the new Fourier estimator θ̂_j (compare (6.104) and (6.125)) is that in θ̂_j the factor Δ_l is replaced by Δ_l − F̌_{-j}(Z_l). The underlying idea is that, with the help of this replacement, the second-moment inequality (6.124) can be replaced by (6.126) and (6.127). This result is the key to proving the sharp minimax assertion of Theorem 3.2. A comment is due. Inequality (6.127) states that the new estimator mimics its benchmark with MSE of order n^{-1}, and this result is well expected. The challenging part is the case j > s in (6.126). It follows from the above-presented proof of (6.124) that the term that prevented us from establishing (6.126) is D_{j1}, which is of order n^{-1}, while all other terms are of the desired order o_n(1)n^{-1}. Accordingly, we consider the analog of D_{j1} for the new estimator, namely the expectation in (6.128) involving the factor (Δ_l − F̌_{-j}(Z_l))φ_j(Z_l)[f_Z(Z_l)]^{-2}[((n−2)/n)f_Z(Z_l) − h̃_1(Z_l)], and would like to show (6.129). To verify (6.129), we rewrite the new factor as

Δ_l − F̌_{-j}(Z_l) = [Δ_l − F_{-j}(Z_l)] + [F_{-j}(Z_l) − F̌_{-j}(Z_l)] =: W_1(Δ_l, Z_l) + W_2(Z_l).
(6.130) For W_2(Z_l) we have

W_2(Z_l) = F_{-j}(Z_l) − F̌_{-j}(Z_l) = Σ_{i∈S_{-j}} (θ_i − θ̌_i) φ_i(Z_l),  (6.131)

where S_{-j} = {0, 1, ..., s}\{j} if j ≤ s and S_{-j} = {0, 1, ..., s} otherwise. Then, with the help of (2.1), Lemma 6.1 and (6.124), we get a rough but, for our task, sufficient inequality

E_{F_X, f_Z}{[W_2(Z_l)]⁴} ≤ Cn^{-2}s⁴.  (6.132)

Now we are ready to evaluate the term defined in (6.128). Using notation (6.130), we write the expansion (6.133). We begin by considering the term with W_1(Δ_{n−1}, Z_{n−1})W_1(Δ_n, Z_n) in (6.134). Using (6.112), (6.113) and the Cauchy inequality, we continue (6.134) into (6.135). To evaluate B_{j11}, we recall φ_j(z)φ_i(z) = 2^{-1/2}[φ_{j−i}(z) + φ_{j+i}(z)] and write (6.136), with coefficients b_i(j, s) defined accordingly. Let us comment on the terms b_i(j, s). Consider j > s, recall that the θ_i are the Fourier coefficients of the underlying F_X, and write the corresponding representation for b_i(j, s) in terms of coefficients ν_i. Thus, using the above-mentioned Bernstein inequality, we get the bound for j > s. Using the obtained relations, together with (6.112) and (6.115), we get (6.140) for j > s. Now consider B_{j12} and write the corresponding bound. Using the obtained results in (6.135), we arrive at the desired conclusion. Now let us consider the term with W_2(Z_{n−1})W_2(Z_n) in (6.133). Using (6.132) and the fact that h̃_1(z) is a projection estimate of f_Z(z) with the bias (6.121), we get, with the help of the Cauchy and Cauchy–Schwarz inequalities, the bound (6.143). Finally, we need to consider the cross-product term with W_1(Δ_{n−1}, Z_{n−1})W_2(Z_n), written out in (6.144). Using the Cauchy–Schwarz inequality, (6.102) and (6.132), we get a rough but sufficient relation. Combining the obtained results, we verify (6.129), and then the validity of Theorem 3.2 follows from the above-presented proof of Theorem 3.1.
Let us also note that the presented proof justifies using a plug-in density estimate.
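To make the projection machinery above concrete, here is a minimal numerical sketch of a cosine-basis projection of a cdf. The cdf $F(x) = x$, the grid, and the cutoff $J = 50$ are illustrative choices of ours, not the paper's data-driven estimator (which computes the Fourier coefficients from CSC data):

```python
import numpy as np

# Minimal sketch of a cosine-basis projection of a cdf on [0, 1].
# Illustrative choices (F(x) = x, cutoff J = 50); the coefficients are
# computed from the true F here, not estimated from data.
def phi(j, x):
    """Element phi_j of the cosine basis on [0, 1]."""
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

N = 20_000
x = (np.arange(N) + 0.5) / N   # midpoint grid for numerical integration
F = x.copy()                   # illustrative cdf
J = 50
# Fourier coefficients theta_j = int_0^1 F(x) phi_j(x) dx.
theta = [(F * phi(j, x)).mean() for j in range(J + 1)]
F_proj = sum(t * phi(j, x) for j, t in enumerate(theta))  # projection estimate
mise = ((F_proj - F) ** 2).mean()  # integrated squared error on the grid
assert mise < 1e-3
```

Replacing the exact coefficients by sample-based Fourier estimates turns this projection into the kind of estimator studied in the proofs above.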
Proof of Corollary 4.1. The verified assertion follows from [12] and the already proved sharp minimaxity of the blockwise-shrinkage estimator (3.9). The interested reader may also follow the lines of the proofs of Theorems 3.1 and 3.2 and establish the validity of Corollary 4.1 directly, without reference to [12].
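For intuition only, here is a hedged sketch of generic blockwise shrinkage of noisy Fourier coefficients. The block layout, the noise level, and the weight formula $\max(0,\, 1 - t/\text{mean squared coefficient})$ are standard choices of ours, not necessarily the paper's exact estimator (3.9):

```python
import numpy as np

# Hedged sketch of blockwise shrinkage: noisy Fourier coefficients are
# grouped into blocks, and each block is multiplied by the weight
# max(0, 1 - t / mean squared coefficient), where t is the (assumed
# known) per-coefficient noise variance.  Not the paper's exact (3.9).
def blockwise_shrink(theta_hat, blocks, t):
    out = np.zeros_like(theta_hat)
    for lo, hi in blocks:
        energy = np.mean(theta_hat[lo:hi] ** 2)
        weight = max(0.0, 1.0 - t / energy) if energy > 0 else 0.0
        out[lo:hi] = weight * theta_hat[lo:hi]
    return out

rng = np.random.default_rng(2)
theta = 1.0 / (1.0 + np.arange(32.0)) ** 2         # decaying true coefficients
t = 0.01                                           # noise variance per coefficient
theta_hat = theta + rng.normal(0.0, np.sqrt(t), 32)
blocks = [(0, 4), (4, 8), (8, 16), (16, 32)]
shrunk = blockwise_shrink(theta_hat, blocks, t)
# Shrinkage never inflates a coefficient in absolute value.
assert np.all(np.abs(shrunk) <= np.abs(theta_hat) + 1e-12)
```

High-frequency blocks, whose empirical energy is close to the noise level, get weights near zero, which is what makes the procedure adaptive to unknown smoothness.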
Proof of Theorem 4.1. Let us begin with a general remark. The proposed density estimator is "identical" to the cdf estimator apart from: (i) using the sine basis for calculating the statistics $\tilde Z_k$, $\hat Z_k$ and $\hat d$; (ii) the new Fourier estimates $\check\zeta_j$ and $\hat\zeta_j$. To address the first issue, we present the classical formulas (6.146)–(6.151) for the elements $\varphi_j(x)$ of the cosine basis (1.2), used in the proofs of Theorems 3.1 and 3.2, complemented by the corresponding formulas for the elements $\psi_j(x) := 2^{1/2}\sin(\pi j x)$ of the sine basis. In what follows we will also need an analog of relation (2.2.5) in [13], which was used in the proof of Theorem 3.1: if $g(x)$, $x\in[0,1]$, is a function with a bounded derivative, then
$$\Big|\int_0^1 g(x)\psi_j(x)\,dx\Big| \le C j^{-1}, \qquad j\ge 1,$$
where $C$ depends only on $\sup_x[|g(x)| + |g'(x)|]$. The inequality is proved via integration by parts.
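As a quick numerical sanity check (not part of the proof), the orthonormality of the cosine and sine bases, and the product formula for cosines used throughout, can be verified on a fine grid:

```python
import numpy as np

# Numerical check of orthonormality of the cosine basis phi_j and the
# sine basis psi_j on [0, 1], plus the product formula
# phi_j phi_i = 2^{-1/2} [phi_{j-i} + phi_{j+i}].
N = 100_000
x = (np.arange(N) + 0.5) / N   # midpoint grid

def phi(j):
    return np.ones(N) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

def psi(j):
    return np.sqrt(2.0) * np.sin(np.pi * j * x)

for basis in (phi, psi):
    for j in range(1, 4):
        for i in range(1, 4):
            inner = (basis(j) * basis(i)).mean()  # approximates the L2 inner product
            assert abs(inner - float(i == j)) < 1e-6

# Product formula for j > i >= 1: here j = 3, i = 1.
lhs = phi(3) * phi(1)
rhs = (phi(2) + phi(4)) / np.sqrt(2.0)
assert np.max(np.abs(lhs - rhs)) < 1e-9
```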
The new estimators of the density Fourier coefficients $\zeta_j$ are analyzed in the two lemmas presented below. We begin with a proposition similar to Lemma 6.1.

Lemma 6.7 establishes the expectation relation (6.152) and the variance bound (6.153) for $\check\zeta_j$. Note how (6.152) and (6.153) mimic results (6.1) and (6.2) of Lemma 6.1.

Proof of Lemma 6.7. For the expectation we write, using (4.4), the corresponding chain of equalities; this verifies (6.152) and tells us that $\check\zeta_j$ is an unbiased estimate of $\zeta_j$. For the variance we use (2.1); the resulting relation and Assumption 1 verify (6.153). Lemma 6.7 is proved.

Our next proposition is an analog of Lemma 6.2.
Lemma 6.8. Let the assumption of Theorem 4.1 hold and $F_X \in \mathcal F(\alpha, Q)$. Then (6.154) holds, and
$$E_{F_X}\{(\hat\zeta_j - \zeta_j)^2\} \le \frac{(\pi j)^2}{n}\,d\,[1 + C(j^{-1} + s^{-1})]. \qquad (6.155)$$
Here $d = \int_0^1 F_X(x)(1 - F_X(x))[f_Z(x)]^{-1}dx$ is the coefficient of difficulty (2.6).

Proof of Lemma 6.8. In the proof we follow the steps of the proof of Lemma 6.2 to highlight the similarity between the analyses of the two Fourier estimates. This will also allow us to highlight the differences between the proofs.

Using the fact that the statistic $\hat F_{-j}(x)$ is used to estimate the function $F_{*j}(x)$, which in its turn approximates $F_X(x)$, we write the decomposition (6.156) of $\hat\zeta_j$ into the sum of two statistics $\hat\zeta_{1j}$ and $\hat\zeta_{2j}$. We consider the two terms on the right side of (6.156) in turn. Using (2.1), (4.4) and a straightforward calculation we get (6.157); thus $\hat\zeta_{1j}$ is an unbiased estimate of $\zeta_j$. Using (2.1) and $\Delta_l^2 = \Delta_l$ we can bound the MSE of $\hat\zeta_{1j}$ from above by
$$\frac{(\pi j)^2}{n}\int_0^1 \big[F_X(x) - 2F_X(x)F_{*j}(x) + [F_{*j}(x)]^2\big]\,\psi_j^2(x)[f_Z(x)]^{-1}\,dx. \qquad (6.158)$$
Using $F_{*j}(x) = F_X(x) + [F_{*j}(x) - F_X(x)]$ we may continue with (6.159), whose right side consists of three terms $u_1$, $u_2$ and $u_3$ that we evaluate in turn. Using (6.149), the differentiability of both $F_X(x)$ and $f_Z(x)$, and relation (2.2.5) in [13], we get the bound (6.160) on $u_1$. Now we evaluate $u_2$ using several technical relations. First,
$$F_X(x) - F_{*j}(x) = \zeta_j\psi_j(x)I(j\le s) + \sum_{r>s}\zeta_r\psi_r(x). \qquad (6.161)$$
Second, using (6.149) we write (6.162). To continue the evaluation of the right side of (6.162) we use (6.150) and note that, according to that formula, $\psi_j(x)\varphi_{2j}(x) = 2^{-1/2}[\psi_{3j}(x) - \psi_j(x)]$. This and (6.151) yield $|\zeta_j\nu_j| I(j\le s) \le C(j+1)^{-2} I(j\le s)$. Third, using the Cauchy–Schwarz inequality and the assumption of Lemma 6.8 we get (6.163). Using these results we establish
$$u_2 \le C\,\frac{(\pi j)^2}{n}\,[(j+1)^{-2} + s^{-1}]. \qquad (6.164)$$
To evaluate $u_3$ we begin with the relation, based on (6.161),
$$[F_{*j}(x)]^2 = [F_X(x)]^2 - 2F_X(x)\Big[\zeta_j\psi_j(x)I(j\le s) + \sum_{r>s}\zeta_r\psi_r(x)\Big] + \Big[\zeta_j\psi_j(x)I(j\le s) + \sum_{r>s}\zeta_r\psi_r(x)\Big]^2.$$
Using it we obtain the following upper bound:
$$u_3 \le C\Big[u_2 + \int_0^1\Big[\zeta_j\psi_j(x)I(j\le s) + \sum_{r>s}\zeta_r\psi_r(x)\Big]^2\psi_j^2(x)[f_Z(x)]^{-1}dx\Big] \le C\Big[u_2 + \zeta_j^2 I(j\le s) + \Big[\sum_{r>s}|\zeta_r|\Big]^2\Big] \le C[(j+1)^{-2} + s^{-1}]. \qquad (6.165)$$
Now we can use the obtained upper bounds for $u_1$, $u_2$ and $u_3$ in (6.159) and get
$$E_{F_X}\{(\hat\zeta_{1j} - \zeta_j)^2\} \le \frac{(\pi j)^2}{n}\,d + C\,\frac{(\pi j)^2}{n}\,\big[c_j(j+1)^{-1} + (j+1)^{-2} + s^{-1}\big],$$
where $c_j$ denotes the corresponding factor. This ends our analysis of $\hat\zeta_{1j}$.

Now we analyze the two moments of the statistic $\hat\zeta_{2j}$ introduced in (6.156). Our aim is to show that this statistic is negligible with respect to $\hat\zeta_{1j}$. Again we begin with several technical results. Set $S_{-j} := \{1,\ldots,s\}\setminus\{j\}$ for $j\le s$, $S_{-j} := \{1,\ldots,s\}$ for $j>s$, and $N_{-l} := \{1,\ldots,n\}\setminus\{l\}$. First, using the Parseval identity, the definitions of $\check F_{-j}$ and $F_{*j}$, and Lemma 6.7, we can write
$$\sum_{i\in S_{-j}} E\{(\check\zeta_i - \zeta_i)^2\} \le Cs^3 n^{-1}. \qquad (6.167)$$
Second, introduce the decomposition
$$\hat F_{-j}(Z_l) = n^{-1}\sum_{k\in N_{-l}}\Delta_k[f_Z(Z_k)]^{-1}\sum_{i\in S_{-j}}\psi_i(Z_k)\psi_i(Z_l) + n^{-1}\Delta_l[f_Z(Z_l)]^{-1}\sum_{i\in S_{-j}}\psi_i^2(Z_l) =: \hat F_1(Z_l) + \hat F_2(Z_l). \qquad (6.168)$$
Then, using the orthogonality of the elements of the sine basis, we can write for $\hat F_1(Z_l)$:
$$E_{F_X}\big\{[F_{*j}(Z_l) - \hat F_1(Z_l)][f_Z(Z_l)]^{-1}\psi_j(Z_l)\,\big|\,\{(\Delta_k, Z_k),\ k\in N_{-l}\}\big\} = 0. \qquad (6.169)$$
Third, Assumption 1 allows us to write
$$\big|E_{F_X}\{\hat F_2(Z_l)\psi_j(Z_l)[f_Z(Z_l)]^{-1}\}\big| = n^{-1}\Big|E_{F_X}\Big\{\Delta_l[f_Z(Z_l)]^{-2}\sum_{i\in S_{-j}}\psi_i^2(Z_l)\psi_j(Z_l)\Big\}\Big| \le Csn^{-1}. \qquad (6.170)$$
Combining the obtained results yields $|E_{F_X}\{\hat\zeta_{2j}\}| \le Cjsn^{-1}$.
We are left with evaluating the second moment of $\hat\zeta_{2j}$. We again do that via establishing several technical results. First,
$$E_{F_X}\{(F_{*j}(Z_l) - \hat F_1(Z_l))^2\} \le Csn^{-1}. \qquad (6.171)$$
Second, we use relation (6.172). Third, consider two integers $l$ and $m$ such that $l\ne m$ and $l,m\in\{1,2,\ldots,n\}$, and write
$$[F_{*j}(Z_l) - \hat F_{-j}(Z_l)][F_{*j}(Z_m) - \hat F_{-j}(Z_m)] = [F_{*j}(Z_l) - \hat F_1(Z_l) - \hat F_2(Z_l)][F_{*j}(Z_m) - \hat F_1(Z_m) - \hat F_2(Z_m)]$$
$$= [F_{*j}(Z_l) - \hat F_1(Z_l)][F_{*j}(Z_m) - \hat F_1(Z_m)] - [F_{*j}(Z_l) - \hat F_1(Z_l)]\hat F_2(Z_m) - \hat F_2(Z_l)[F_{*j}(Z_m) - \hat F_1(Z_m)] + \hat F_2(Z_l)\hat F_2(Z_m)$$
$$=: W_1 + W_2 + W_3 + W_4. \qquad (6.173)$$
Using these notations we can write, for any $1\le l < m\le n$,
$$E_{F_X}\{\hat\zeta_{2j}^2\} \le Cj^2 n^{-1} E_{F_X}\{(F_{*j}(Z_l) - \hat F_{-j}(Z_l))^2\} + C\,E_{F_X}\{[W_1 + W_2 + W_3 + W_4]\,\psi_j(Z_l)\psi_j(Z_m)[f_Z(Z_l)f_Z(Z_m)]^{-1}\}. \qquad (6.174)$$
The first term on the right side of (6.174) is at most $Cj^2 s n^{-2}$ due to (6.171). To evaluate the second term, set
$$\hat F_1(Z_l, -m) := \hat F_1(Z_l) - n^{-1}\Delta_m[f_Z(Z_m)]^{-1}\sum_{i\in S_{-j}}\psi_i(Z_m)\psi_i(Z_l) =: \hat F_1(Z_l) - \hat F_1(Z_l, m). \qquad (6.175)$$
Note that $\hat F_1(Z_l, -m)$ does not depend on $(\Delta_l, Z_l, \Delta_m, Z_m)$, and $|\hat F_1(Z_l, m)| \le Csn^{-1}$ almost surely. Write
$$V := E_{F_X}\{W_1\,\psi_j(Z_l)\psi_j(Z_m)[f_Z(Z_l)f_Z(Z_m)]^{-1}\}$$
$$= E_{F_X}\big\{[F_{*j}(Z_l) - \hat F_1(Z_l,-m) - \hat F_1(Z_l,m)][F_{*j}(Z_m) - \hat F_1(Z_m,-l) - \hat F_1(Z_m,l)]\,\psi_j(Z_l)\psi_j(Z_m)[f_Z(Z_l)f_Z(Z_m)]^{-1}\big\}.$$
To continue we need several relations. First, using (6.169) allows us to write
$$E_{F_X}\big\{[F_{*j}(Z_l) - \hat F_1(Z_l,-m)][F_{*j}(Z_m) - \hat F_1(Z_m,-l)]\,\psi_j(Z_l)\psi_j(Z_m)[f_Z(Z_l)f_Z(Z_m)]^{-1}\big\} = 0.$$
These relations allow us to conclude that $|V| \le Cs^{3/2} n^{-3/2}$. Next, using the Cauchy–Schwarz inequality we get
$$E_{F_X}\{|W_2|\} \le \big[E_{F_X}\{(F_{*j}(Z_l) - \hat F_1(Z_l))^2\}\,E_{F_X}\{[\hat F_2(Z_m)]^2\}\big]^{1/2} \le Cs^2 n^{-3/2}.$$
Due to symmetry the same inequality holds for $W_3$, and $E_{F_X}\{|W_4|\} \le Cs^2 n^{-2}$. Combining the above-presented relations we get
$$E_{F_X}\{\hat\zeta_{2j}^2\} \le Cj^2 s^2 n^{-3/2}. \qquad (6.176)$$
Lemma 6.8 is proved.
We have established that the considered Fourier estimates $\check\zeta_j$ and $\hat\zeta_j$ satisfy the desired statistical properties, mimicking the properties outlined in Lemmas 6.1 and 6.2 for the Fourier estimates of the cdf Fourier coefficients. The rest of the proof of Theorem 4.1 follows the same steps as the proofs of Theorems 3.1 and 3.2; the steps are based on using relations (6.146)–(6.151) for sines in place of cosines, and finishing the proof presents no new technical complications. Theorem 4.1 is verified.
Proof of Lemma 4.1. Using (6.156) we can write (6.177). Now we evaluate the expectations on the right side of (6.177) in turn. Formulas (6.158) and (6.159) yield the corresponding expansion, and using (6.149) and (6.151) we bound its leading term. Combining this relation with the upper bound (6.164) for $u_2$ and the upper bound (6.165) for $u_3$ we get
$$E_{F_X}\{(\hat\zeta_{1j} - \zeta_j)^2\} = \frac{(\pi j)^2}{n}\,d\,[1 + o_j(1) + o_n(1)]. \qquad (6.179)$$
Further, it is established in (6.176) that $E_{F_X}\{\hat\zeta_{2j}^2\} \le Cj^2 s^2 n^{-3/2} = o_n(1)j^2 n^{-1}$. Using this, (6.179) and the Cauchy–Schwarz inequality on the right side of (6.177) proves Lemma 4.1.
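The coefficient of difficulty $d = \int_0^1 F_X(x)(1 - F_X(x))[f_Z(x)]^{-1}dx$ appearing throughout is easy to evaluate numerically. A sketch with illustrative choices ($F_X(x) = x^2$ and uniform monitoring density $f_Z \equiv 1$, neither taken from the paper's examples):

```python
import numpy as np

# Coefficient of difficulty d = int_0^1 F_X (1 - F_X) / f_Z dx for the
# illustrative choices F_X(x) = x^2 and f_Z(x) = 1 (uniform monitoring).
N = 200_000
x = (np.arange(N) + 0.5) / N   # midpoint grid on [0, 1]
F_X = x ** 2
f_Z = np.ones(N)
d = (F_X * (1.0 - F_X) / f_Z).mean()
# Exact value: int_0^1 (x^2 - x^4) dx = 1/3 - 1/5 = 2/15.
assert abs(d - 2.0 / 15.0) < 1e-6
```

A monitoring density with light tails inflates the integrand where $f_Z$ is small, which is exactly the divergence issue raised in the concluding remarks.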

Conclusion and further research
Current status censoring (CSC) is a familiar sampling procedure in which, instead of observing a lifetime of interest $X$, only its status $\Delta = I(X \le Z)$ at a monitoring time $Z$ is available. In other words, in place of a classical sample of direct observations of $X$, under CSC a sample from $(\Delta, Z)$ is available. CSC sampling is a popular technique due to its simplicity, and in many applications it is the only possibility to get information about an underlying lifetime of interest. It is also well known that CSC makes estimation of the density and cdf ill-posed and dramatically slows down the rate of risk convergence.
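The CSC sampling scheme just described is easy to simulate. In the sketch below the Beta lifetime and uniform monitoring time are illustrative choices of ours, not the paper's examples:

```python
import numpy as np

# Simulating current status censoring: the lifetime X stays hidden, and
# only the pair (Delta, Z) with Delta = I(X <= Z) is observed.
# Beta(2, 2) lifetimes and Uniform(0, 1) monitoring are illustrative.
rng = np.random.default_rng(0)
n = 100_000
X = rng.beta(2.0, 2.0, size=n)      # hidden lifetimes on [0, 1]
Z = rng.uniform(0.0, 1.0, size=n)   # observed monitoring times
Delta = (X <= Z).astype(int)        # observed statuses

# Since E[Delta] = E[F_X(Z)] = 1 - E[X] for uniform Z and X on [0, 1],
# the status frequency should be close to 0.5 here.
assert abs(Delta.mean() - 0.5) < 0.01
```

Note that only `Delta` and `Z` would be handed to an estimator; `X` exists in the simulation solely to generate the statuses.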
Whenever a problem is ill-posed, it is important to investigate not only the rate but also the minimal constant of the risk convergence. The latter is because this constant sheds light on an ill-posed problem, and because for small samples we may see only the onset of ill-posedness. Recently [16] obtained sharp minimax lower bounds for density and cdf oracle-estimators, and this paper proposed a data-driven and robust sharp minimax estimator that attains the oracle's lower bound. Further, the estimator may be used for small samples, and it is tested on real and simulated examples.
There are many interesting and practically important extensions of the considered CSC setting. Missing data is a typical complication in data analysis, and then the approaches of book [15] may be tested. Estimation of joint and conditional densities is another important topic to consider, as is CSC regression, including functional regression. Let us also mention research devoted to more general cases of interval censoring. Here no results on sharp minimax estimation are known even for interval censoring case II, when one observes a triplet $(L, U, \Delta)$ where $L < U$ are two monitoring times and the status $\Delta = -1$ if $X \le L$, $\Delta = 0$ if $L < X \le U$, and $\Delta = 1$ otherwise; see [5,6,18]. It is an interesting, challenging and open problem to extend the obtained CSC sharp minimax results to general interval-censored data. Now let us formulate several open problems for the considered CSC setting and make several last remarks. (i) The coefficient of difficulty (2.6) indicates that there is a serious issue when the integral $\int_0^x (1 - F_X(u))[f_Z(u)]^{-1}du$ diverges as $x$ increases due to a light tail of $f_Z(u)$. Addressing this issue is paramount for extending the developed theory to unbounded support of $X$. While cases of unbounded lifetimes are rare in practical applications, they are of great theoretical interest. (ii) In Assumption 1 the equality $\int_0^1 f_Z(z)dz = 1$ can be relaxed, and the support of $Z$ may be larger than $[0,1]$. This case does not present a theoretical complication, and the results still hold, only now we estimate $f_Z(z)$ over the interval $[0,1]$ using $Z_l \in [0,1]$; see the corresponding density estimate in [13]. What is of interest here is to estimate the support of $X$. (iii) The case when the support of $X$ is larger than the support of $Z$ yields inconsistent estimation, but sharp estimation over the support of $Z$ is still possible. (iv) An interesting and complicated setting is when $X$ and $Z$ are unbounded.
No sharp-minimax theory is known for this case. (v) An important applied task is to create a user-friendly R package for CSC density and cdf estimation.
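The case II interval-censoring observations mentioned above can be simulated along the same lines as CSC data; the distributions below are again illustrative assumptions only:

```python
import numpy as np

# "Case II" interval censoring: one observes (L, U, Delta) with L < U
# two monitoring times and Delta = -1 if X <= L, Delta = 0 if
# L < X <= U, and Delta = 1 otherwise.  Uniform draws are illustrative.
rng = np.random.default_rng(1)
n = 1_000
X = rng.uniform(0.0, 1.0, size=n)                        # hidden lifetimes
T = np.sort(rng.uniform(0.0, 1.0, size=(n, 2)), axis=1)  # two monitoring times
L, U = T[:, 0], T[:, 1]
Delta = np.where(X <= L, -1, np.where(X <= U, 0, 1))     # three-valued status

assert np.all(L <= U)
assert set(np.unique(Delta)) <= {-1, 0, 1}
```

As in the CSC case, only `(L, U, Delta)` would be available to an estimator, which is what makes the sharp minimax question for case II data open.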