Penalized nonparametric likelihood-based inference for current status data model

Abstract: Deriving the limiting distribution of a nonparametric estimate is rather challenging but of fundamental importance to statistical inference. For current status data, we study a penalized nonparametric likelihood-based estimator for an unknown cumulative hazard function, and establish the pointwise asymptotic normality of the resulting nonparametric estimate. We also propose penalized likelihood ratio tests for local and global hypotheses, derive their limiting distributions, and study the optimality of the global test. Simulation studies show that the proposed method works well compared to the classical likelihood ratio test.


Introduction
When analyzing survival data, interval censoring arises frequently in medical and public health studies. Interval-censored data occur when the exact failure time cannot be observed and is known only to lie within an interval. In particular, due to constraints, costs, features of the event of interest and many other practical difficulties, a subject may be examined only once, so that the failure time is known only to lie before or after the examination time. This kind of data is called case I interval-censored data, or current status data ([15,25]).
The nonparametric maximum likelihood, a widely used method in nonparametric inference, has been developed for estimating an unknown cumulative distribution function with current status data. In particular, [1] and [9] derived the nonparametric maximum likelihood estimator (NPMLE) with current status data, while [15] and [17] established the limiting distribution of the NPMLE for current status data. [3] studied a likelihood ratio test to construct pointwise confidence intervals for an unknown distribution function with current status data; this novel method was investigated further in [4,5,2]. The NPMLE is an important and widely used method in applications, since it involves no tuning parameter and converges pointwise in distribution to an asymptotically unbiased limit. However, due to the discontinuity of the NPMLE, estimators of the density and the hazard rate cannot be obtained directly by taking derivatives. To address this problem, [6] provided adaptive risk bounds for a wide class of distribution function estimators under current status data based on smoothness classes such as splines, wavelets, etc. [14] proposed the maximum smoothed likelihood and the smoothed maximum likelihood estimators. Moreover, [11] presented the maximum smoothed likelihood estimator for the interval censoring model, [12] discussed these nonparametric estimators, and [13] improved the algorithm for computation and also provided an alternative way to derive the distribution of the likelihood ratio. [27] developed an I-spline based sieve maximum likelihood method for estimating the marginal and joint distribution functions with bivariate current status data.
As the hazard and cumulative hazard functions provide insights about an event of interest that differ from those of a survival function, and the survival function can be obtained directly from the hazard and cumulative hazard functions via simple calculations, a number of studies have addressed estimation of the hazard and the cumulative hazard. Because nonparametric estimation without smoothing is unstable, several smoothing approaches have been developed. For example, kernel-based approaches were given in [10] and [14]. To avoid the sensitive choice of bandwidth, spline methods have also been developed for interval-censored data. Specifically, [20] used M-splines to model the hazard and I-splines for the cumulative hazard, [22] suggested modeling the hazard with splines with nonnegative coefficients, while [7] developed a penalized B-spline basis to model the hazard. [16] considered Cox's proportional hazards model with current status data, and studied the asymptotic properties of the maximum likelihood estimators of the regression parameter and the baseline cumulative hazard function. [18] proposed likelihood ratio tests and confidence intervals for current status data under the model studied by [16]. Furthermore, [28] developed a B-spline based semiparametric maximum likelihood approach for Cox's proportional hazards model with case II interval-censored data.
Despite these significant contributions, the limiting distributions of the spline-based estimators have not yet been established. Our goal is to fill this gap and address the theoretical challenge. An additional challenge is to study the likelihood ratio test for the global hypothesis, which has not been addressed for the current status data model. In this paper, we develop a penalized nonparametric likelihood method to estimate an unknown cumulative hazard function with current status data. In particular, a functional Bahadur representation is established. Using this technical tool, we show that the proposed estimator enjoys pointwise asymptotic normality. Furthermore, we study the penalized likelihood ratio tests and show the optimality of the global test.
The remainder of this paper is organized as follows. In Section 2, we present estimation procedures, construct the Sobolev space with a special inner product, and give some basic results. In Section 3, we derive a functional Bahadur representation (FBR) in the space and establish the asymptotic properties of the proposed estimator. In Section 4, we develop the penalized likelihood ratio tests for local and global hypotheses. In Section 5, we present simulation results for comparing the performance of the proposed penalized likelihood ratio test and the classical likelihood ratio test [3]. Some concluding remarks are made in Section 6. All technical proofs are deferred to the Appendix.

Methodology
Denote by U an examination or observation time and by T a failure time with an unknown distribution function F. Under the current status model, the observation consists of X = (Δ, U), where Δ = I(T ≤ U). In this paper, we assume that the examination time is independent of the failure time. Such an assumption is less stringent than that of [6] and is standard for the simplest version of the current status model in survival analysis; see [26]. Let X_i = (Δ_i, U_i), i = 1, 2, ..., n, be i.i.d. copies of X = (Δ, U). Under the assumption that U is independent of T, the log-likelihood of F is

l_n(F) = (1/n) Σ_{i=1}^n {Δ_i log F(U_i) + (1 − Δ_i) log(1 − F(U_i))},

omitting the term that does not involve F. Assume that there exists a small positive constant ξ such that P(U ≥ ξ) = 1. Let Λ_0 be the true cumulative hazard function of the failure time. Assume that Λ_0(t) is increasing and bounded away from 0 and infinity on I = [ξ, τ], where τ is the end of the study period. This assumption is less stringent than (F.1) in [14], since we do not require the distribution of the survival time to have bounded support. Besides, ξ is introduced so that the inner product in (1) is well defined; in fact, this assumption can be relaxed to ξ = 0. In applications, we can choose ξ as the minimum observation time and τ as the maximum follow-up time. Moreover, Λ_0 is assumed to belong to the Sobolev space

H^m = {g : I → R : g^{(j)} is absolutely continuous for j = 0, 1, ..., m − 1, and g^{(m)} ∈ L^2(I)}.

Here m > 1 is assumed to be known, and g^{(j)} is the j-th derivative of any function g. Without loss of generality, we assume that I = [ξ, 1 + ξ]. Writing 1 − F(t) = exp{−Λ(t)}, the log-likelihood of Λ is

l_n(Λ) = (1/n) Σ_{i=1}^n [Δ_i log{1 − e^{−Λ(U_i)}} − (1 − Δ_i) Λ(U_i)].

Define l(Λ) = E l_n(Λ) and J(g, g̃) = ∫_I g^{(m)}(t) g̃^{(m)}(t) dt. To make inference about Λ_0(t), we propose the following penalized log-likelihood of Λ:

ℓ_{n,λ}(Λ) = l_n(Λ) − (λ/2) J(Λ, Λ),

where J(Λ, Λ) is the roughness penalty and λ is the smoothing parameter. Then, the penalized likelihood estimator of Λ_0 is defined as Λ̂_{n,λ} = arg max_{Λ∈H^m} ℓ_{n,λ}(Λ).
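To fix ideas, the objective above can be evaluated numerically. The following Python snippet is an illustrative sketch, not the implementation used in the paper: it computes the average current status log-likelihood at observed pairs (Δ_i, U_i) and approximates the roughness penalty J(Λ, Λ) by m-th order finite differences on an equispaced grid, with the usual λ/2 scaling convention.

```python
import numpy as np

def cs_loglik(lam_u, delta):
    # l_n(Lambda) = (1/n) sum_i [ Delta_i log(1 - exp(-Lambda(U_i)))
    #                             - (1 - Delta_i) Lambda(U_i) ]
    return float(np.mean(delta * np.log1p(-np.exp(-lam_u))
                         - (1.0 - delta) * lam_u))

def roughness(lam_grid, dt, m=2):
    # J(Lambda, Lambda) = int_I (Lambda^{(m)}(t))^2 dt, approximated by
    # m-th order finite differences of Lambda evaluated on a grid.
    d = np.diff(lam_grid, n=m) / dt**m
    return float(np.sum(d**2) * dt)

def penalized_loglik(lam_u, delta, lam_grid, dt, smooth, m=2):
    # ell_{n,lambda}(Lambda) = l_n(Lambda) - (lambda/2) J(Lambda, Lambda)
    return cs_loglik(lam_u, delta) - 0.5 * smooth * roughness(lam_grid, dt, m)
```

Since the penalty vanishes for curves whose m-th derivative is zero, a linear candidate Λ is not penalized when m = 2, which is one sanity check on the discretization.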
We will show that Λ̂_{n,λ} is increasing on I with probability tending to one in the next section. Define ℓ_λ(Λ) = E ℓ_{n,λ}(Λ) and the inner product on the space H^m as

⟨f, g⟩_λ = V(f, g) + λ J(f, g),  with  V(f, g) = E_U[ e^{−Λ_0(U)} {1 − e^{−Λ_0(U)}}^{−1} f(U) g(U) ],   (1)

where E_U is the expectation with respect to U, and the corresponding norm is ‖g‖²_λ = ⟨g, g⟩_λ. Hence, H^m is a reproducing kernel Hilbert space (RKHS) with the inner product ⟨·, ·⟩_λ. Besides, there exists a positive self-adjoint operator W_λ : H^m → H^m such that ⟨W_λ f, g⟩_λ = λ J(f, g) for all f, g ∈ H^m. Let K(·, ·) be the reproducing kernel of H^m, and let h_j and γ_j be the eigenfunctions and eigenvalues of H^m; the properties of K(·, ·), h_j ∈ H^m and γ_j can be found in Appendix A. Let S_n(Λ) and S_{n,λ}(Λ) be the Fréchet derivatives of l_n(Λ) and ℓ_{n,λ}(Λ), respectively. Similarly, let S(Λ) and S_λ(Λ) be the Fréchet derivatives of l(Λ) and ℓ_λ(Λ), respectively. Let D be the Fréchet derivative operator. Then, direct calculations yield the following proposition, which will play a key role in the FBR.
Proposition 1. DS_λ(Λ_0) = −id, where id is the identity operator.
Following Proposition 1, the first term in the Taylor expansion of S_{n,λ}(Λ) at Λ_0 can be approximated by −id(Λ − Λ_0). This will lead to a sum of independent and identically distributed random variables.

Functional Bahadur representation and asymptotic normality
The functional Bahadur representation (FBR) is a key tool for establishing the asymptotic normality of the estimator. The following lemma shows that the estimator is consistent in both ‖·‖_∞, where ‖g‖_∞ = sup_{t∈I} |g(t)|, and ‖·‖_1, which denotes ‖·‖_λ with λ = 1.
To see this, define A_n = {ω : sup_{t∈I} |Λ̂^{(j)}_{n,λ}(t) − Λ^{(j)}_0(t)| < η, j = 0, 1}. Then, by Lemma 1, for any ε > 0 and η > 0, there exists an N_0 such that, for n ≥ N_0, P(A_n) > 1 − ε and Λ̂′_{n,λ}(t) > 0 for any t ∈ I on A_n. This yields the conclusion. Numerically, to ensure monotonicity, we use B-splines with constrained coefficients for estimating the cumulative hazard in the simulation studies.
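As a minimal illustration of the monotonicity constraint, one can parametrize the fitted cumulative hazard through nonnegative increments, which mimics a spline with constrained coefficients; the exponential reparametrization below is an illustrative device, not the authors' exact scheme.

```python
import numpy as np

def monotone_hazard(theta, start=0.1):
    # Map an unconstrained parameter vector to an increasing sequence of
    # cumulative-hazard values on a grid: the increments exp(theta) are
    # strictly positive, so the fitted curve is increasing by construction,
    # and an unconstrained optimizer can be run directly over theta.
    return start + np.cumsum(np.exp(theta))
```

Any standard optimizer applied to theta then automatically respects the shape constraint, which is the practical point of using constrained coefficients.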
Next, we present the exact rate of convergence and the FBR for Λ̂_{n,λ}.
We derive a new version of the FBR.
From Theorem 2, we find that, up to a bias term, the estimator is well approximated by a sum of independent and identically distributed random variables, which is very useful for studying the asymptotic normality.

Theorem 3. (Asymptotic Normality) Assume that nh³ → ∞ and nh^{4m−1} → 0.
Let Λ* = (id − W_λ)Λ_0 be the biased true parameter. Then (nh)^{1/2}{Λ̂_{n,λ}(t_0) − Λ*(t_0)} converges in distribution to a mean-zero normal limit. The above theorem reports the pointwise asymptotic normality of the resulting estimate.

Remark 2. The choice of tuning parameter that leads to the optimal pointwise rate of convergence is h ≍ n^{−1/(2m+1)}. However, this rate does not satisfy the conditions of Corollary 1; in other words, the estimator needs to be under-smoothed to ensure an unbiased limiting distribution, which yields a sub-optimal pointwise rate of convergence. This shares the same spirit as the under-smoothing procedures in the literature. Thus, we sacrifice the optimal rate in order to remove the bias. In practice, we apply the widely used CV or GCV to select the tuning parameter h, as suggested by [24].

Remark 3. Corollary 1, together with the delta method, immediately gives a pointwise confidence interval for any real-valued smooth function ρ(Λ_0(t_0)) of Λ_0(t_0) at a fixed point t_0 ∈ I. Let ρ̇(·) be the first derivative of ρ(·). If ρ̇(Λ_0(t_0)) ≠ 0, the corresponding asymptotic normality holds, where z_α is the lower α-th quantile of the standard normal distribution.
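As a concrete instance of Remark 3, take ρ(x) = e^{−x}, so that ρ(Λ_0(t_0)) is the survival probability at t_0 and ρ̇(x) = −e^{−x}. The sketch below forms the delta-method interval from a given standard error `se_lam` of Λ̂_{n,λ}(t_0); in the paper that standard error comes from the asymptotic variance, which is not reproduced here, so `se_lam` is a hypothetical input.

```python
import math
from statistics import NormalDist

def delta_method_ci(lam_hat, se_lam, alpha=0.05):
    # CI for rho(Lambda(t0)) with rho(x) = exp(-x) (survival probability):
    #   rho(lam_hat) -/+ z_{alpha/2} * |rho'(lam_hat)| * se_lam,
    # where se_lam is a supplied standard error of the estimated hazard.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    s_hat = math.exp(-lam_hat)   # point estimate of survival
    se_s = s_hat * se_lam        # |rho'(x)| = exp(-x)
    return s_hat - z * se_s, s_hat + z * se_s
```

The interval is symmetric around the transformed point estimate, which is exactly what the first-order delta method delivers.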

Penalized likelihood ratio test
In this section, we present the penalized likelihood ratio tests for the local and global hypotheses.

Local likelihood ratio test
For a prespecified point (t_0, ω_0), we consider the hypothesis

H_0 : Λ(t_0) = ω_0  versus  H_1 : Λ(t_0) ≠ ω_0.

The constrained penalized log-likelihood L_{n,λ} is defined accordingly, with Λ restricted to H_0 = {Λ ∈ H^m : Λ(t_0) = 0}. To test H_0 against H_1, we take as the penalized local likelihood ratio test (PLLRT) statistic the difference between the constrained and unconstrained maxima of the penalized log-likelihood, where Λ̂⁰_{n,λ} = arg max_{Λ∈H_0} L_{n,λ}(Λ). Endowed with the norm ‖·‖_λ, H_0 is a closed subspace of H^m, and thus a Hilbert space. The following proposition states that H_0 inherits the reproducing kernel and penalty operator from the space H^m. The proof is trivial and thus omitted.
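Schematically, the PLLRT compares the constrained and unconstrained maxima of an objective. The toy sketch below does this by grid search over a one-dimensional family; `objective`, `thetas`, and `constraint_ok` are illustrative stand-ins, and the sample-size scaling used in the paper is omitted.

```python
import numpy as np

def pllrt(objective, thetas, constraint_ok):
    # -2 * ( sup over H0 of the objective - unconstrained sup ) >= 0.
    # thetas is a grid of candidate parameters; constraint_ok is a boolean
    # mask marking the candidates that satisfy the null restriction.
    vals = np.array([objective(th) for th in thetas])
    full_max = vals.max()                  # unconstrained maximum
    null_max = vals[constraint_ok].max()   # maximum subject to H0
    return -2.0 * (null_max - full_max)
```

For a quadratic objective peaked at θ = 1 with the null restricted to θ ≤ 0, the statistic equals twice the gap between the two maxima, mirroring the usual likelihood-ratio geometry.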
is a reproducing kernel for (H_0, ⟨·, ·⟩_λ); that is, the reproducing property holds for any t ∈ I and g ∈ H_0. From Proposition 2, we obtain the restricted FBR for Λ̂⁰_{n,λ}, which will be used to derive the null limiting distribution. Let S⁰_{n,λ} and S⁰_n denote the first-order Fréchet derivatives of L_{n,λ} and L_n, respectively.
The next proposition follows from this equation.
The proof of Proposition 4 is similar to that of Theorem 1 and thus omitted.
The proof of Theorem 4 is similar to that of Theorem 2 and thus omitted. Our main result follows immediately from the restricted FBR.

Global likelihood ratio test
Consider the following global hypothesis:

H_0 : Λ = Λ_0  versus  H_1 : Λ ≠ Λ_0.

The penalized global likelihood ratio test (PGLRT) statistic is defined as

PGLRT_{n,λ} = −2n {ℓ_{n,λ}(Λ_0) − ℓ_{n,λ}(Λ̂_{n,λ})}.
This shows that the Wilks phenomenon holds for the PGLRT. Since practitioners are often more interested in the overall shape of a function than in its value at a single point, the PGLRT plays a key role in practice. For example, the proposed likelihood ratio approach can also be applied to a parametric setup. It follows from arguments similar to those in Remark 5.4 of [24] that the asymptotic null distribution for testing such a hypothesis is χ²_{ν_λ}, the same as that in Theorem 6. This result fills a gap relative to the local test results in [3].
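For calibrating the test against its χ²_{ν_λ} null, one needs the chi-square survival function. The helper below computes it in closed form for even integer degrees of freedom; since ν_λ is generally not an integer, rounding to a nearby even integer is a rough illustrative device only, not the paper's procedure.

```python
import math

def chi2_sf_even_df(x, df):
    # P(chi^2_df > x) for even integer df:
    #   exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half / k   # build (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total
```

A p-value for an observed statistic is then `chi2_sf_even_df(stat, df)`, rejected at level α when it falls below α.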
Furthermore, based on a uniform version of the FBR, we show that the PGLRT achieves the optimal minimax rate of testing presented in [19]. Then, for any ε ∈ (0, 1), there exist positive constants C and N such that the power of the PGLRT is uniformly bounded below over suitably separated alternatives. Theorem 7 shows that when h = h* = n^{−2/(4m+1)}, the PGLRT can detect any local alternatives with separation rates no faster than n^{−2m/(4m+1)}, which turns out to be the minimax rate of testing in the sense of [19].

Simulation studies
To assess the performance of the penalized likelihood estimator and likelihood ratio test, we conducted a simulation study with a focus on the comparison of the proposed penalized method and the classical likelihood method [3,4,5,13]. For this purpose, we consider two examples. We use 5-fold cross-validation to choose the tuning parameter. Besides, we consider m = 2 and 3, and sample sizes n = 600 and 800. In addition, we calculate c_{t_0} on the basis of the eigenfunctions and eigenvalues defined in equation (1), with the cumulative hazard function replaced by the estimate of Λ_0. For each setting, we report simulation results based on 1000 repetitions using the proposed local likelihood ratio test (PLLRT) and the likelihood ratio test (LRT) proposed by [3] and [13]. Example 5.1. The failure time follows the exponential distribution with density function where μ = 2.5, while the examination time follows the uniform distribution on [0.5, 1.5]. Specifically, we test H_0 : Λ(0.82) = Λ_0(0.82) against H_1 : Λ(0.82) ≠ Λ_0(0.82), where Λ_0 is the cumulative hazard function of the failure time defined above. To assess the power of the test, we generated failure times with density function where μ = 2.5, c = 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, and the distribution of the examination time remains unchanged. Simulation results for this example are shown in Table 1. The powers of both methods are comparable, and the estimated sizes of the proposed test are closer to the target level 5% than those of the LRT.
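The data-generating step of Example 5.1 can be sketched as follows. Since the density display is not reproduced above, the snippet treats μ = 2.5 as the rate of the exponential distribution; this is an assumption made purely for illustration, and only the observed pair (Δ, U) is retained, as in the current status model.

```python
import random

def simulate_current_status(n, mu=2.5, seed=2024):
    # T ~ Exponential (mu treated here as the rate; the paper's elided
    # density may parametrize differently), U ~ Uniform(0.5, 1.5).
    # Only (Delta, U) with Delta = 1{T <= U} is observed.
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        t = rng.expovariate(mu)       # latent failure time (never observed)
        u = rng.uniform(0.5, 1.5)     # examination time
        sample.append((int(t <= u), u))
    return sample
```

The latent failure time is discarded after the comparison, which is precisely what makes estimation from such data nonstandard.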
Example 5.2. We use the setting of Example 5.1 except that the failure time distribution is changed to the Pareto distribution with density function, where Λ_0 is the cumulative hazard function of the Pareto distribution defined above. To assess the power of the test, we generated failure times with density function where c = 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, and the distribution of the examination time remains unchanged. Simulation results for this example are shown in Table 2. It can be seen that the estimated sizes and powers of both methods are comparable.
For each setting, the pointwise averages, the coverage probability of the pointwise 95% confidence intervals, and the estimated standard errors of Λ̂_{n,λ}(t) are calculated. The simulation results of Example 5.1 are displayed in Figures 1-3, and those of Example 5.2 are shown in Figures 4-6. Table 3 summarizes the mean squared error (MSE) of the estimate in each setting. The simulation results based on the NPMLE proposed by [3] and [13] are also included for comparison. For each case, the estimates given by our proposed method are closer to Λ_0(t), particularly when t is near the boundaries, than those given by [3] and [13]. As the sample size increases, the MSE of the proposed method decreases faster than that of the NPMLE. This result confirms that the convergence rate of the proposed method is faster than that of the NPMLE. The coverage probabilities of both methods are reasonable. Besides, the simulation results indicate that the proposed method with m = 2 and m = 3 has similar performance.

Closing remarks
This article focuses on the development of nonparametric inference for the cumulative hazard function with case I interval-censored data, or current status data. It is well known that the convergence rate of the NPMLE of an unknown distribution function is n^{−1/3} rather than the standard rate n^{−1/2} due to interval censoring. To improve the convergence rate, we develop a penalized likelihood method using smoothing techniques. To establish the asymptotic properties of the proposed estimator, we derive a functional Bahadur representation of the estimator in the Sobolev space with a proper inner product, which plays a key role in the nonparametric inference. Furthermore, we develop the penalized likelihood ratio tests for both local and global hypotheses. In particular, the proposed penalized global likelihood ratio test is able to detect any local alternatives with the minimax separation rate in the sense of [19], while, to the best of our knowledge, the likelihood ratio test for the global hypothesis in the current status data model has not been addressed before. Simulation studies demonstrate that the proposed estimator outperforms the classical NPMLE and that the penalized likelihood ratio test is more powerful than the classical likelihood ratio test, as expected.
Note that for case II interval-censored data, the convergence rate of the nonparametric maximum likelihood estimator of an unknown distribution function is n^{−1/3}, and the limiting distribution of the estimator is still unknown. An interesting direction for further research is to investigate the limiting distribution of the penalized nonparametric maximum likelihood estimator by deriving a functional Bahadur representation, which is very challenging in the presence of case II interval censoring.

Appendix
The Appendix contains the proofs of the main results in the main text and the properties of the reproducing kernel and the eigensystems. In the following, we denote by C a generic positive constant that may take different values in different places, while "a ≲ b" means "a ≤ Cb" and "a ≳ b" means "a ≥ Cb".

Appendix A
In this section, we state the properties of the reproducing kernel K(·, ·) and of the eigensystems, and describe how to compute the eigensystems.
First, the reproducing kernel K(·, ·) of H^m defined on I × I satisfies the following properties:
(P1) K_t(·) = K(t, ·) and ⟨K_t, g⟩_λ = g(t) for any g ∈ H^m and any t ∈ I.
(P2) There exists a constant c_m depending only on m such that ‖K_t‖_λ ≤ c_m h^{−1/2} for all t ∈ I, where h = λ^{1/(2m)}. Thereby, for any g ∈ H^m, we have ‖g‖_∞ ≤ c_m h^{−1/2} ‖g‖_λ.
The eigenfunctions h_j ∈ H^m and the eigenvalues γ_j satisfy the following properties:
(P3) V(h_i, h_j) = δ_ij, where δ_ij is the Kronecker delta, i.e., δ_ij = 1 if i = j and 0 otherwise.
(P4) J(h_i, h_j) = γ_j δ_ij.
(P5) For any g ∈ H^m, we have g = Σ_j V(g, h_j) h_j, with convergence in the ‖·‖_λ-norm.
(P6) For any g ∈ H^m, we have ‖g‖²_λ = Σ_j (1 + λγ_j) V(g, h_j)².
It follows from [24] that the underlying eigensystem (γ_j, h_j(·)) can be chosen as the (normalized) solution of an ordinary differential equation with appropriate boundary conditions, where π(·) is the density of U.
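Numerically, the eigensystem can be approximated by discretizing V and J on a grid and solving the resulting generalized eigenproblem J h = γ V h. The finite-difference scheme below is an illustrative sketch (with `weight` standing in for the density π(·) together with any information weight on the grid), not the paper's ODE-based construction.

```python
import numpy as np

def eigensystem(weight, dt, m=2):
    # Discretize V(f, g) ~ sum_i weight_i f_i g_i dt and
    # J(f, g) ~ sum_i (D_m f)_i (D_m g)_i dt with m-th finite differences,
    # then solve J h = gamma V h via a symmetric whitening transform.
    n = len(weight)
    D = np.eye(n)
    for _ in range(m):
        D = np.diff(D, axis=0) / dt        # one difference per pass
    J = D.T @ D * dt                       # discretized penalty form
    v_half = np.sqrt(weight * dt)          # V^{1/2} diagonal
    A = J / np.outer(v_half, v_half)       # V^{-1/2} J V^{-1/2}, symmetric
    gammas, vecs = np.linalg.eigh(A)
    return gammas, vecs / v_half[:, None]  # eigenvalues, h_j on the grid
```

By construction the returned functions are orthonormal in the discretized V-inner product, the grid analogue of (P3), and the eigenvalues are nonnegative.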

Appendix B
In order to prove Lemma 1, we need the following lemma.
Lemma B.1. For any g ∈ H m , we have J(g, g) ≤ C 0 V (g, g) with C 0 being independent of g.
Proof. Using the properties of the eigensystems (P4)-(P5), we have, for any g ∈ H^m, J(g, g) = Σ_j γ_j V(g, h_j)² ≤ C_0 Σ_j V(g, h_j)² = C_0 V(g, g), with C_0 being independent of g. Thus, J(g, g) ≤ C_0 V(g, g).

Proof of Theorem 1
Set g = Λ̂_{n,λ} − Λ_0 and write, where g* = Λ_0 + αg with α ∈ [0, 1]. Note that Then, Lemma 1 shows that g ∈ F = {g ∈ H^m(I) : ‖g‖_∞ ≤ 1, J(g, g) ≤ c_m^{−2} h λ^{−1}} when n is large enough, with c_m defined in (P2). It then follows from [24] that there exists a set B_n with lim_{n→∞} P(B_n) = 1 such that on B_n, Thereby, on B_n, we have then on B_n, Hence, For I_2, we write In view of (3), we have It follows from the definition of Λ̂_{n,λ} that I_1 + I_2 + I_3 ≥ 0. Thus,
Note that Then, It follows from [24] that there exists an event A_n ⊂ B_n such that lim_{n→∞} P(B_n − A_n) = 0, and on A_n, Then, on the event A_n, As ‖g‖_∞ ≤ 1, then, on A_n, For the main part of I_2 (ignoring o_p(1)I_2, still denoted as I_2), we have Again, it follows from [24] and nh² → ∞ that, with n large enough, Thereby, it is not hard to show that On the other hand, ∫_0^1 ∫_0^1 s D²S_λ(Λ_0 + ss′g)[g, g] ds′ ds.

Proof of Theorem 3
Define Rem_n = Λ̂_{n,λ} − Λ* − S_n(Λ_0). It follows from the functional Bahadur representation that ‖Rem_n‖_λ = O_p(α_n). Since nh³ → ∞, nh^{4m−1} → 0 and m > 1, Rem_n is negligible compared with S_n(Λ_0). Next, we derive the asymptotic distribution of (nh)^{1/2}{Λ̂_{n,λ}(t_0) − Λ*(t_0)}. We will use the fact that, for any t ∈ I and any g ∈ H^m, ⟨K_t, g⟩_λ = g(t). Hence, for any fixed t_0 ∈ I, since then, as n → ∞. This finishes the proof of Theorem 3.

Proof of Theorem 5
Clearly, part (i) can be obtained from part (ii) and part (iii). Here, we only need to give the proofs of part (ii) and part (iii).