A penalized likelihood approach for efficiently estimating a partially linear additive transformation model with current status data

Current status data are commonly encountered in medical and epidemiological studies in which the failure time for study units is the outcome variable of interest. Data of this form are characterized by the fact that the failure time is not directly observed but rather is known relative to an observation time; i.e., the failure times are either left- or right-censored. Due to its structure, the analysis of such data can be challenging. To circumvent these challenges and to provide for a flexible modeling construct which can be used to analyze current status data, herein, a partially linear additive transformation model is proposed. In the formulation of this model, constrained $B$-splines are employed to model the monotone transformation function and nonlinear covariate effects. To provide for more efficient estimates, a penalization technique is used to regularize the estimation of all unknown functions. An easy to implement hybrid algorithm is developed for model fitting and a simple estimator of the large-sample variance-covariance matrix is proposed. It is shown theoretically that the proposed estimators of the finite-dimensional regression coefficients are root-$n$ consistent, asymptotically normal, and achieve the semi-parametric information bound while the estimators of the nonparametric components attain the optimal rate of convergence. The finite-sample performance of the proposed methodology is evaluated through extensive numerical studies and is further demonstrated through the analysis of uterine leiomyomata data.


Introduction
Current status data are characterized by the fact that the failure/event time cannot be directly measured, but rather is known relative to a single observation time. That is, the failure time is known to be smaller or larger than an observation time, leading to left- or right-censored data, respectively. Data of this structure commonly arise as a part of medical and epidemiological studies, among many others, in which study participants are observed only once due to destructive/invasive testing, resource limitations, etc. For example, as part of the National Health and Nutrition Examination Survey (NHANES), participants were examined for the presence of the human papillomavirus (HPV). In this study, HPV onset time could not be directly observed but was known relative to the participant's age at the examination time.
For the regression analysis of current status data, several parametric and semiparametric methods have been proposed. In particular, many of these efforts have focused on the proportional hazards (PH) [4] and proportional odds (PO) models [5]; for a review see [11,19,26] and the references therein. To provide a more general modeling framework, some researchers have developed estimation techniques under linear transformation models [7,10,12,27,35,37,38]. A potentially prohibitive assumption made in all of the aforementioned works is that the covariate effects on the failure time are "linear". To relax this assumption, Ma and Kosorok [17] and Lu and McMahan [13] presented partially linear transformation and PH models, respectively, for current status data. These works allowed for the estimation of a single nonparametric effect, which could again be limiting. To avoid this potential limitation, Cheng and Wang [2] developed a semiparametric additive transformation model which allows for the estimation of multiple nonparametric regression functions. To our knowledge, the work of these authors is the most general model designed for the analysis of current status data.
Motivated by the work of Cheng and Wang [2], herein we seek to develop a flexible, easy-to-implement, and computationally efficient methodology which can be used to conduct the regression analysis of current status data under a partially linear additive transformation model. The proposed model makes use of constrained B-splines to approximate all unknown functions, i.e., the transformation and regression functions. In order to provide for more efficient estimates and to largely circumvent the need to carefully specify the basis function representation, a penalization strategy is used to regularize the estimation of the unknown functions. The novelty of our approach resides in the carefully constructed form of the proposed model, which leads to the development of a computationally efficient and easy-to-implement hybrid algorithm for model fitting. A unique feature of this algorithm is that it determines all penalty parameters as a part of the model fitting process, thus obviating the need to conduct a computationally expensive grid search to identify them. Theoretically, it is shown that the proposed estimators of the regression coefficients, which rely on the penalization and spline approximation techniques, are asymptotically normal and efficient, i.e., they attain the semiparametric information bound. Moreover, we establish that the estimators of the nonparametric components achieve the optimal rate of convergence for appropriately chosen orders of the smoothing parameters and knots of the spline spaces. Finally, the estimator of the variance-covariance matrix of the estimated regression coefficients is shown to be asymptotically consistent, and its finite-sample performance is also evaluated.
Further, through numerical studies we demonstrate that the proposed methodology has better inferential and computational characteristics than the methods developed by Cheng and Wang [2] and Lu and McMahan [13], which are, to our knowledge, the most directly comparable existing approaches.
The remainder of this manuscript is organized as follows. In Section 2, we develop a penalized partially linear additive transformation model for current status data. In Section 3, we outline the details of our model and the hybrid algorithm. In Section 4, we present the large-sample properties of the penalized estimators and the estimator of the variance-covariance matrix of the estimated regression coefficients. In Section 5, we summarize the results of a simulation study that was designed to examine the finite-sample performance of our approach. In Section 6, we further illustrate our methodology by applying it to HPV data collected as a part of the NHANES study. In Section 7, we conclude with a summary discussion. Technical details and proofs of the asymptotic properties are relegated to Appendix A, and the additional results referenced in Sections 5 and 6 are available in Appendices B and D. Finally, the simulation results for the comparison between the proposed penalized approach and the regression spline method of Lu and McMahan [13] are presented in Appendix C.

Model
Consider a study which examines n subjects for a failure time of interest. Let T_i denote the time at which the ith subject experiences the failure, for i = 1, ..., n. To provide for modeling flexibility, we assume that T_i obeys a partially linear additive transformation model, which is given by

η(T_i) = −Z_i'β − Σ_{j=1}^{J} ϕ_j(W_{ij}) + ε_i,    (1)

where η(·) is an unknown monotone increasing transformation function on (0, ∞) with η(0) = −∞ and ε_i is a random variable having cumulative distribution function (CDF) H(·). In the above specification, β is a q-dimensional vector of regression coefficients corresponding to the covariates Z_i = (Z_{i1}, ..., Z_{iq})' and ϕ_j(·) is an unknown smooth regression function taking the covariate W_{ij} as its argument, for j = 1, ..., J. Throughout, we assume that ε_i ⊥ ε_j, for i ≠ j, and ε_i ⊥ X_i, where ⊥ denotes statistical independence and X_i = (Z_i', W_{i1}, ..., W_{iJ})'. This formulation provides a very general modeling construct that holds several popular models as special cases.
Letting F(t | X_i) = Pr(T_i ≤ t | X_i) denote the conditional CDF of T_i, model (1) can equivalently be expressed as

g{F(t | X_i)} = η(t) + Z_i'β + Σ_{j=1}^{J} ϕ_j(W_{ij}),    (2)

where g(·) = H^{−1}(·) is a known, smooth, and strictly increasing link function on (0, 1). Though equivalent to (1), the representation provided in (2) is more convenient for reasons that will shortly become apparent. Here g(·) assumes the role of H(·) with respect to determining the final form of the proposed model, e.g., specifying g(x) = log{− log(1 − x)} provides for the PH model and g(x) = log{x/(1 − x)} results in the PO model.
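As a quick numerical illustration of the two special-case links (the helper names `g_ph`, `g_po`, `g_inv_ph`, and `g_inv_po` are ours, not the paper's), each link is a strictly increasing bijection from (0, 1) to the real line, so composing it with its inverse recovers the original probability:

```python
import math

# Candidate link functions g on (0, 1): the complementary log-log link
# recovers the PH model, the logit link recovers the PO model.
def g_ph(x):
    # g(x) = log{-log(1 - x)} -> proportional hazards
    return math.log(-math.log(1.0 - x))

def g_po(x):
    # g(x) = log{x / (1 - x)} -> proportional odds
    return math.log(x / (1.0 - x))

def g_inv_ph(u):
    # G(u) = g^{-1}(u) = 1 - exp{-exp(u)}
    return 1.0 - math.exp(-math.exp(u))

def g_inv_po(u):
    # G(u) = g^{-1}(u) = expit(u)
    return 1.0 / (1.0 + math.exp(-u))

# Both links map (0, 1) onto the real line monotonically, so
# F(t | X) = G(eta(t) + Z'beta + sum_j phi_j(W_j)) is a valid CDF in t.
for g, ginv in [(g_ph, g_inv_ph), (g_po, g_inv_po)]:
    for x in [0.1, 0.5, 0.9]:
        assert abs(ginv(g(x)) - x) < 1e-12
```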
A key feature of current status data is that the failure times (i.e., T_i, for i = 1, ..., n) are not directly observed but rather are known relative to observation times; i.e., T_i is either left- or right-censored, indicating that T_i occurred before or after the observation time, respectively. To mathematically frame the structure of these data, let Y_i denote the observation time for the ith individual and let Δ_i = I(T_i < Y_i) denote the corresponding censoring indicator, where I(·) is the usual indicator function. Note, Δ_i = 1 (0) represents the event that the ith individual's failure time is left- (right-) censored. Thus, the observed data consist of {(Y_i, Δ_i, X_i) : i = 1, ..., n}. Following the works of [16,17,18], we propose to estimate the unknown parameters via the following penalized log-likelihood

ℓ_{n,λ}(τ) = Σ_{i=1}^{n} [Δ_i log F(Y_i | X_i) + (1 − Δ_i) log{1 − F(Y_i | X_i)}] − λ_0 ∫ {η^{(r)}(t)}² dt − Σ_{j=1}^{J} λ_j ∫ {ϕ_j^{(r)}(w_j)}² dw_j,

where τ = (β, η, ϕ_1, ..., ϕ_J) represents the collection of unknown model parameters and λ = (λ_0, ..., λ_J)' is a vector of smoothing parameters, such that λ_j ≥ 0 for all j. In the penalties above, η^{(r)}(·) and ϕ_j^{(r)}(·) denote the rth derivatives of η(·) and ϕ_j(·), respectively, r ≥ 1. This penalized log-likelihood is derived under the assumptions that the covariates are time-invariant and that the failure and censoring times are conditionally independent given the covariates. These assumptions are common in the literature; e.g., see [1,2,17] and the references therein.

Penalized spline log-likelihood
In general, estimating β, η(·), and ϕ_j(·) directly from the penalized log-likelihood can be a tumultuous task. Thus, motivated by the works of [11,17,33], we propose to strike a balance between modeling flexibility and complexity by approximating all unknown functions in (2) through the use of constrained B-spline functions. In particular,

η_n(t | γ) = Σ_{k=1}^{p_0} γ_k b_{0k}(t)  and  ϕ_{nj}(w | α*_j) = Σ_{k=1}^{p_j} α*_{jk} b*_{jk}(w),

where b_{0k}(·) and b*_{jk}(·) are B-spline basis functions with unknown coefficients γ = (γ_1, ..., γ_{p_0})' and α*_j = (α*_{j1}, ..., α*_{jp_j})', respectively, 1 ≤ j ≤ J. Once a knot set is chosen and the degree of the spline function is specified, these basis functions are uniquely determined; for further discussion see Schumaker [21]. To ensure monotonicity of η_n(· | γ) we require that γ_1 ≤ ... ≤ γ_{p_0}; see Theorem 5.9 of Schumaker [21]. It is important to note that other techniques for enforcing the monotonicity constraint have been proposed, but these often lead to complex and unreliable model fitting strategies; e.g., see Cheng and Wang [2].
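The monotonicity device can be checked numerically. The following sketch (with an illustrative knot layout, not the one used in the paper) confirms that ordering the spline coefficients yields a nondecreasing cubic B-spline:

```python
import numpy as np
from scipy.interpolate import BSpline

# A cubic B-spline with nondecreasing coefficients is itself nondecreasing
# (Theorem 5.9 of Schumaker); this verifies that property on a grid.
k = 3                                   # cubic spline
interior = np.linspace(0.2, 0.8, 4)     # illustrative interior knots
t = np.r_[[0.0] * (k + 1), interior, [1.0] * (k + 1)]  # clamped knot vector
p0 = len(t) - k - 1                     # number of basis functions

rng = np.random.default_rng(0)
gamma = np.sort(rng.normal(size=p0))    # enforce gamma_1 <= ... <= gamma_p0
eta_n = BSpline(t, gamma, k)

x = np.linspace(0.0, 1.0, 400)
vals = eta_n(x)
assert np.all(np.diff(vals) >= -1e-10)  # monotone nondecreasing on [0, 1]
```

The converse direction does not hold (a monotone spline need not have monotone coefficients), but the one-sided implication is exactly what is needed to impose the constraint through simple linear inequalities on γ.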
As with all additive models, ϕ_j(·) is identifiable only up to an additive constant. Thus, sum-to-zero constraints are imposed such that Σ_{i=1}^{n} ϕ_{nj}(W_{ij} | α*_j) = 0, for j = 1, ..., J. For implementation purposes, the sum-to-zero constraints can be expressed as B*_j'α*_j = 0, where B*_j = (Σ_{i=1}^{n} b*_{j1}(W_{ij}), ..., Σ_{i=1}^{n} b*_{jp_j}(W_{ij}))'. Motivated by Wood [30], the model is reparameterized such that the constrained set of parameters α*_j can be uniquely determined by an unconstrained reduced set α_j = (α_{j1}, ..., α_{j,p_j−1})'. In particular, let Q_j denote a p_j × (p_j − 1) semi-orthogonal matrix whose columns are orthogonal to B*_j. Obviously, B*_j'Q_jα_j = 0. Thus, for any α*_j ∈ R^{p_j} satisfying the sum-to-zero constraint there exists α_j ∈ R^{p_j−1} such that α*_j = Q_jα_j. Note, Q_j can easily be found via a QR-decomposition of B*_j; see Wood [30] for further details. Thus, the log-likelihood under the proposed spline model is obtained by replacing F(Y_i | X_i) in the penalized log-likelihood function by

g^{−1}{η_n(Y_i | γ) + Z_i'β + Σ_{j=1}^{J} ϕ_{nj}(W_{ij} | Q_jα_j)}.

Accordingly, the set of unknown parameters to be estimated is given by θ = (β', γ', α')', where α = (α_1', ..., α_J')'.
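The QR-based reparameterization can be sketched in a few lines. In this illustration the basis matrix is a random stand-in rather than an actual B-spline design, but the mechanics are the same: the last p_j − 1 columns of the full Q factor of the constraint vector are orthogonal to it, so any reduced vector automatically satisfies the constraint:

```python
import numpy as np

# Reparameterize a sum-to-zero constraint B'alpha* = 0 via the QR
# decomposition of the constraint vector B (illustrative dimensions).
rng = np.random.default_rng(1)
pj = 6
Bmat = rng.uniform(size=(50, pj))       # stand-in basis matrix b*_{jk}(W_ij)
B = Bmat.sum(axis=0, keepdims=True).T   # p_j x 1 column-sum constraint vector

# Full QR of B: the last p_j - 1 columns of Q are orthogonal to B.
Qfull, _ = np.linalg.qr(B, mode="complete")
Qj = Qfull[:, 1:]                       # p_j x (p_j - 1), semi-orthogonal

alpha = rng.normal(size=pj - 1)         # unconstrained reduced parameters
alpha_star = Qj @ alpha                 # automatically satisfies constraint
assert abs((B.T @ alpha_star).item()) < 1e-9
assert np.allclose(Qj.T @ Qj, np.eye(pj - 1))  # Q_j is semi-orthogonal
```

This is conceptually what mgcv's constraint absorption does; optimizing freely over the reduced α_j is equivalent to optimizing over the constrained α*_j.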
Under this strategy, the penalized spline log-likelihood function is given by

ℓ_{n,λ}(θ) = ℓ_n(θ) − λ_0 γ'D_0γ − Σ_{j=1}^{J} λ_j α_j'Q_j'D_jQ_jα_j,    (4)

where ℓ_n(θ) is the log-likelihood under the spline model and D_0 and D_j are band diagonal matrices whose (k, l)th elements are given by ∫ b_{0k}^{(r)}(t) b_{0l}^{(r)}(t) dt and ∫ b*_{jk}^{(r)}(w_j) b*_{jl}^{(r)}(w_j) dw_j, respectively. Cubic splines are well known for their numerical stability and are widely used in semiparametric regression; e.g., see [3,15,11], among many others. For this reason, cubic splines are used to approximate the unknown functions in our numerical experiments and data application, and hence r is set to 2. The algorithm for finding D_0 and D_j is discussed in Wood [30] and is implemented in the smoothCon function in the R package mgcv. The penalized spline maximum likelihood estimator of θ is given by θ̂ = argmax_θ ℓ_{n,λ}(θ), where the maximization is completed subject to the monotonicity constraints, i.e., γ_1 ≤ ... ≤ γ_{p_0}. Accordingly, the penalized spline estimator of τ is defined as τ̂ = (β̂', η̂, ϕ̂_1, ..., ϕ̂_J) with η̂(·) = η_n(· | γ̂) and ϕ̂_j(·) = ϕ_{nj}(· | α̂_j). The theoretical properties and numerical performance of τ̂ are discussed in Sections 4 and 5, respectively.
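As an illustration (with a hypothetical knot layout, not the configuration used in the paper), a penalty matrix of this form can be assembled by numerically integrating products of second derivatives of the basis functions (r = 2); in practice one would obtain it from mgcv's smoothCon:

```python
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import BSpline

# Assemble D[k, l] = \int b_k''(t) b_l''(t) dt for a small clamped
# cubic basis on [0, 1] (illustrative knots).
k = 3
t = np.r_[[0.0] * (k + 1), [0.25, 0.5, 0.75], [1.0] * (k + 1)]
p = len(t) - k - 1   # number of basis functions

def basis_d2(j):
    # Second derivative of the jth basis element as a BSpline object.
    c = np.zeros(p)
    c[j] = 1.0
    return BSpline(t, c, k, extrapolate=False).derivative(2)

D = np.zeros((p, p))
for a in range(p):
    for b in range(a, p):
        fa, fb = basis_d2(a), basis_d2(b)
        val, _ = quad(lambda x: np.nan_to_num(fa(x)) * np.nan_to_num(fb(x)),
                      0.0, 1.0, limit=200)
        D[a, b] = D[b, a] = val

# D is a Gram matrix of second derivatives: symmetric, positive semi-definite.
assert np.allclose(D, D.T)
assert np.linalg.eigvalsh(D).min() > -1e-8
```

The positive semi-definiteness of D is what makes the penalty γ'D_0γ a legitimate roughness measure: it is zero exactly when the fitted spline has vanishing rth derivative, i.e., when the fit degenerates to a polynomial of degree r − 1.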
The proposed penalized spline estimation approach has several appealing features when compared to existing methods. In particular, under the smoothness assumption, η can be well approximated by a monotone spline, which can be written as a linear combination of spline basis functions with a much smaller number of coefficients to estimate, for instance on the order of n^{1/3} instead of n, as required by fully nonparametric estimation methods like that of Ma and Kosorok [17]. Further, the performance of existing spline techniques, including those of Cheng and Wang [2] and Lu and McMahan [13], is intrinsically tied to the specification of these basis functions. Unfortunately, as with most spline-based procedures, to effectively implement these techniques one must perform a grid search over different basis function representations, which can be computationally expensive, especially when there are multiple unknown functions to be estimated. In contrast, via the proposed penalized approach we are able to regularize the estimation of these parameters to address the potential issue of over-fitting. These two features acting in concert provide an estimation framework which is both more accurate and more efficient. Moreover, as discussed in Section 3.2, our formulation leads to the development of a computationally efficient and easy-to-implement hybrid algorithm for model fitting. All of these features lead to a methodology that is of practical interest to practitioners who are left to analyze data of this form.

Parameter estimation
To complete model fitting in a computationally efficient manner, herein we develop a novel hybrid algorithm which can be used to simultaneously identify both the penalized spline estimators and the smoothing parameters via a nested iterative process. The inner process is aimed at identifying the value of θ that maximizes (4) for a fixed value of λ, while adhering to the monotonicity constraints, i.e., the inner step identifies θ λ = argmax θ l n,λ (θ) s.t. γ 1 ≤ . . . ≤ γ p0 .
To solve this constrained optimization problem, an iterative algorithm is developed. In each iteration of this algorithm, the unconstrained updates of the parameters, say θ*, are first determined based on the current parameter value, say θ^(m), via the following Fisher-scoring step

θ* = θ^(m) + I_{n,λ}^{−1}(θ^(m)) ∇ℓ_{n,λ}(θ^(m)),

where ∇ℓ_{n,λ}(θ^(m)) and I_{n,λ}(θ^(m)) are the gradient and the expected negative Hessian matrix of (4) evaluated at θ^(m), respectively. Closed-form expressions for these quantities are given by

∇ℓ_{n,λ}(θ) = X'Ω(θ){y(θ) − Xθ} − Sθ  and  I_{n,λ}(θ) = X'Ω(θ)X + S,

where X is the design matrix collecting the covariates and basis function evaluations and S is the block diagonal penalty matrix associated with (4). In the expressions above, the transformed response vector y(θ) = {y_1(θ), ..., y_n(θ)}' and weight matrix Ω(θ) = diag{ω_1(θ), ..., ω_n(θ)} for a specific value of θ have elements

y_i(θ) = X_iθ + (Δ_i − π_i)/G'(X_iθ)  and  ω_i(θ) = {G'(X_iθ)}²/{π_i(1 − π_i)},

where X_i is the ith row of X, G'(·) is the first derivative of G(·) = g^{−1}(·), and π_i = g^{−1}(X_iθ).
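To make the Fisher-scoring step concrete, the sketch below (a stand-in, not the paper's code) checks the working-response form of the gradient against central finite differences under the cloglog link, with simulated X, Δ, and θ and the penalty omitted for simplicity:

```python
import numpy as np

# Fisher scoring for the Bernoulli-type current-status likelihood under
# the cloglog link (PH model); X, Delta, theta are simulated stand-ins.
rng = np.random.default_rng(2)
n, d = 200, 3
X = rng.normal(size=(n, d))
theta = np.array([0.3, -0.2, 0.1])

G = lambda u: 1.0 - np.exp(-np.exp(u))   # G = g^{-1}, cloglog
Gp = lambda u: np.exp(u - np.exp(u))     # G'(u)
Delta = (rng.uniform(size=n) < G(X @ theta)).astype(float)

def loglik(th):
    pi = np.clip(G(X @ th), 1e-10, 1 - 1e-10)
    return np.sum(Delta * np.log(pi) + (1 - Delta) * np.log(1 - pi))

def grad_and_info(th):
    u = X @ th
    pi = np.clip(G(u), 1e-10, 1 - 1e-10)
    w = Gp(u) ** 2 / (pi * (1 - pi))     # Fisher weights (diagonal of Omega)
    y = u + (Delta - pi) / Gp(u)         # working response
    grad = X.T @ (w * (y - u))           # = X' Omega (y - X theta)
    info = X.T @ (w[:, None] * X)        # = X' Omega X
    return grad, info

g_an, I = grad_and_info(theta)
# The closed-form gradient matches central finite differences.
eps = 1e-6
g_num = np.array([(loglik(theta + eps * e) - loglik(theta - eps * e)) / (2 * eps)
                  for e in np.eye(d)])
assert np.allclose(g_an, g_num, atol=1e-4)

theta_star = theta + np.linalg.solve(I, g_an)  # one Fisher-scoring update
```

Note that w·(y − u) simplifies to G'(u)(Δ − π)/{π(1 − π)}, which is exactly the score of a Bernoulli observation with success probability G(X_iθ); the weights are the corresponding expected information, so the step is standard iteratively reweighted least squares.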
The updated values of γ available in θ* do not necessarily satisfy the monotonicity constraints; see Section 2. To enforce these conditions, isotonic regression is implemented to project the unconstrained estimates into the constrained space, i.e., into Γ = {γ : γ_1 ≤ ... ≤ γ_{p_0}}. This is accomplished by solving the following minimization problem

γ^(m+1) = argmin_{γ ∈ Γ} Σ_{k=1}^{p_0} (γ_k − γ*_k)²/σ²_k,

where σ²_k is the diagonal element of I_{n,λ}^{−1}(θ*) corresponding to γ_k, for k = 1, ..., p_0. This problem can be solved using the pava function in the R package Iso, which takes γ* and (σ²_1, ..., σ²_{p_0})' as inputs. Substituting γ^(m+1) for γ* in θ* one obtains the constrained update θ^(m+1). These two steps are repeated in turn until convergence is attained, i.e., until the difference between θ^(m) and θ^(m+1) is less than some specified stopping criterion. At the point of convergence of this inner process, one obtains θ̂_λ, i.e., the maximizer of the penalized spline log-likelihood function for a fixed value of λ. Once the inner process is complete, the outer process updates the smoothing parameters and the inner process is restarted. To obtain the updated smoothing parameters, say λ* = (λ*_0, ..., λ*_J)', a generalized Fellner-Schall approach [31] is adopted, which makes the update

λ*_j = λ_j [trace(S^− S_j) − trace{I_{n,λ}^{−1}(θ̂_λ) S_j}] / {θ̂_λ' S_j θ̂_λ},  j = 0, 1, ..., J,

where S = λ_0S_0 + Σ_{j=1}^{J} λ_jS_j, S_j denotes the block-diagonal expansion of the jth penalty matrix, and S^− is the generalized inverse of S. Using the updated smoothing parameters (i.e., λ*), the inner process is then used to identify θ̂_{λ*}. This nested iterative process continues until the difference between θ̂_λ and θ̂_{λ*} is less than some specified stopping criterion. It is worthwhile to point out that the update of λ_j is guaranteed to be positive, which is required, as long as I_{n,λ}(θ̂_λ) is positive definite (see Wood and Fasiolo [31]), which is the case for the proposed model. At convergence of the proposed hybrid algorithm, the final update of θ is defined as the penalized spline estimator θ̂.
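The isotonic projection step can be sketched with a minimal weighted pool-adjacent-violators algorithm (PAVA); in practice one would call Iso::pava in R, with the weights 1/σ²_k playing the role described above:

```python
import numpy as np

# Weighted PAVA: projects gamma* onto the isotone cone
# {gamma_1 <= ... <= gamma_p0} under weights w_k (here w_k = 1/sigma_k^2).
def pava(y, w):
    vals, wts, counts = [], [], []
    for yi, wi in zip(map(float, y), map(float, w)):
        vals.append(yi); wts.append(wi); counts.append(1)
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            wv = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wv
            wts[-2] = wv
            counts[-2] += counts[-1]
            vals.pop(); wts.pop(); counts.pop()
    out = []
    for v, c in zip(vals, counts):
        out.extend([v] * c)
    return np.array(out)

# An already-monotone input is returned unchanged...
assert np.allclose(pava([1.0, 2.0, 3.0], [1, 1, 1]), [1.0, 2.0, 3.0])
# ...an equal-weight violation is pooled to the block mean...
assert np.allclose(pava([2.0, 1.0], [1, 1]), [1.5, 1.5])
# ...and unequal weights give the weighted block mean (1*3 + 3*1)/4 = 1.5.
assert np.allclose(pava([3.0, 1.0], [1, 3]), [1.5, 1.5])
```

Because the projection is a weighted least-squares problem with isotone constraints, PAVA solves it exactly in a single left-to-right pass with block pooling.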

Assumptions
In this section we present the asymptotic properties of the penalized spline estimators. Let T_0 = [a_0, b_0] and T_j = [a_j, b_j] be the supports of Y and W_j, respectively, for j = 1, ..., J. Define M as a space of monotone splines of degree r + 1 on the compact set T_0 and S_j as a space of splines of degree r + 1 on the compact set T_j. Let the regression parameter space Φ be a compact subset of R^q, and let the nonparametric parameter space be a Sobolev class with respect to the Sobolev norm, for a fixed integer r ≥ 1, defined in Section 2. Denote by τ_0 = (β_0, ϕ_{0,0}, ϕ_{0,1}, ..., ϕ_{0,J}) with ϕ_{0,0} = η_0 the true value of τ, and by τ̂ = (β̂, ϕ̂_0, ϕ̂_1, ..., ϕ̂_J) with ϕ̂_0 = η̂ the penalized spline likelihood estimator of τ. Let ∥·∥_n, ∥·∥_2, and ∥·∥_∞ be the empirical norm, L_2-norm, and uniform norm, respectively. We establish the asymptotic properties of τ̂ under the following regularity conditions.

Condition 1: The true parameter β_0 is an interior point of Φ.

Condition 2:
The rth derivative of ϕ_{0,j}, for 0 ≤ j ≤ J, satisfies the Lipschitz condition on T_j, and ϕ_{0,0} is strictly increasing on T_0.

Condition 3: The support of the observation times is an interval within [l_y, u_y] with 0 < l_y < u_y < inf{t : F_0(t) = 1}, where F_0 is the CDF of the observation times.

Condition 4: The covariate vector Z has a bounded support and E(ZZ') is non-singular.

Condition 5: The first derivative of the link function G(·) ≡ g^{−1}(·) is bounded away from 0 and infinity, and the second derivative of G(·) is bounded.

Condition 6:
The joint density of (Y, Z, W, Δ) is bounded away from 0 and infinity.

Condition 7: For any β ≠ β_0, Pr(Z'β ≠ Z'β_0) > 0 and E{ϕ_j(W_j)} = 0, for 1 ≤ j ≤ J.

Condition 8: The smoothing parameters are specified such that they are of the appropriate order in n.

Condition 9: The ratio of maximum and minimum spacings of the knots for S_j is uniformly bounded in n.

It is noted that Condition 1 is a standard assumption in semiparametric estimation. The smoothness assumption on ϕ_{0,j} in Condition 2 guarantees that the unknown nonparametric components can be well approximated by B-splines. Further, under the smoothness condition, the entropy bound for the Sobolev space is lower than that for a collection of monotone functions; see Examples 19.10 and 19.11 of van der Vaart [29] for more information. It follows that the penalized estimator of ϕ_0 achieves a faster rate of convergence than the nonparametric alternative proposed by Ma and Kosorok [17]. Conditions 3-6 are necessary for the entropy calculations used to derive the rate of convergence of τ̂ and the asymptotic normality of β̂. Condition 7 is required to establish the identifiability of the model. Condition 8 is a typical assumption used to derive the asymptotic properties of the penalized estimators. Condition 9 specifies the appropriate order of the dimensions of the spline spaces needed to derive the optimal rate of convergence of τ̂. Finally, Condition 10 is common for the asymptotic normality and the efficiency of β̂.

Efficient information calculation
The log-likelihood of (Y, Δ, Z, W) at τ is given by

ℓ(τ) = Δ log F(Y | X) + (1 − Δ) log{1 − F(Y | X)},

where F(Y | X) = G{η(Y) + Z'β + Σ_{j=1}^{J} ϕ_j(W_j)}. The score function for β takes the form

ℓ̇_β(τ) = Z G'(u) {Δ − F(Y | X)} / [F(Y | X){1 − F(Y | X)}],

where u = η(Y) + Z'β + Σ_{j=1}^{J} ϕ_j(W_j). Differentiating the log-likelihood ℓ(τ) along suitable parametric submodels yields the score operator for η in direction a(·), ℓ̇_η(τ)[a] = a(Y) G'(u)(Δ − F)/{F(1 − F)}, and the score operator for ϕ_j in direction h_j(·), ℓ̇_{ϕ_j}(τ)[h_j] = h_j(W_j) G'(u)(Δ − F)/{F(1 − F)}. In the sequel, to simplify notation, define ϕ_0(·) = η(·) and h_0(·) = a(·).
By applying arguments similar to those in the proof of Lemma 5 in Lu and McMahan [13], we conclude that there exist unique least favorable directions which yield the efficient score and the efficient information I(β_0) for β.

Theorem 2 (Asymptotic normality and efficiency). Under Conditions 1-10, √n(β̂ − β_0) converges in distribution to N(0, I^{−1}(β_0)), where I(β_0) is the efficient information matrix for β.
The proofs of these results are relegated to Appendix A. It is worthwhile to point out that if the smoothing parameters and the dimensions of the spline spaces are chosen to be of the appropriate order (see Conditions 8 and 9), our functional estimators are uniformly bounded and consistent on compact sets, i.e., uniform convergence can be achieved up to the boundaries for the penalized spline estimators, and they attain the optimal rate of convergence, O_p(n^{−r/(1+2r)}), which is the best attainable rate in the context of semiparametric regression; see Stone [23] for more information. Moreover, β̂ is efficient in the semiparametric sense since it achieves the information bound.

Variance estimation
Since the least favorable directions do not have closed forms, it is very demanding to estimate them directly, and the same is true of the information matrix I(β_0). To address this issue, we adopt the least-squares approach proposed by Zhang et al. [36]. In particular, we use B-splines to approximate each component of the least favorable directions up to an additive constant, where sum-to-zero constraints are applied to h*_{nj,l}(·), for 1 ≤ j ≤ J and 1 ≤ l ≤ q; the constraint only shifts h*_{nj,l}(·) vertically to ensure that its mean is zero. For 1 ≤ l ≤ q, the spline coefficients δ̂_l = (δ̂_{0,l}', ..., δ̂_{J,l}')' are then obtained via a standard least-squares calculation: they minimize, with respect to the empirical measure P_n, the squared distance between ℓ̇_{β,l}(τ̂), the lth element of ℓ̇_β(τ̂), and the span of the approximating score operators, for j = 0, 1, ..., J. The resulting estimator I_n of the information matrix is such that, by the formula for the inverse of a block matrix, I_n^{−1} is the upper q × q submatrix of the inverse of the observed information of (4) for β without the penalty term. It can be shown that I_n is a consistent estimator of I(β_0). For the purposes of conducting large-sample inference, we propose to estimate the variance-covariance matrix of β̂ by the upper q × q submatrix of I_{n,λ}^{−1}(θ̂). Using this estimator one can conduct standard Wald-type inference about the regression coefficients. Theorem 3 shows that I_{n,λ} and I_n are asymptotically equivalent, and hence I(β_0) can be consistently estimated by I_{n,λ}.
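The block-inverse identity underlying this variance estimator can be illustrated numerically; the matrices below are random stand-ins, not quantities from the paper:

```python
import numpy as np

# The variance estimator for beta-hat is the upper q x q block of the
# inverse of the full information matrix; by the block-inverse formula
# this equals the inverse of the Schur complement of the nuisance block.
rng = np.random.default_rng(3)
q, m = 2, 5
M = rng.normal(size=(q + m, q + m))
I_full = M @ M.T + np.eye(q + m)        # a positive-definite "information"

A = I_full[:q, :q]                      # beta block
B = I_full[:q, q:]                      # cross block
C = I_full[q:, q:]                      # nuisance (spline-coefficient) block

upper_block = np.linalg.inv(I_full)[:q, :q]
schur = A - B @ np.linalg.inv(C) @ B.T  # Schur complement of C
assert np.allclose(upper_block, np.linalg.inv(schur))
```

The Schur complement here plays the role of the efficient information: inverting the full matrix and extracting the β block automatically accounts for the loss of information due to estimating the nuisance spline coefficients.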

Theorem 3 (Variance estimation)
Under Conditions 1-10, I_{n,λ} is a consistent estimator of I(β_0).

Moreover, the simulation study presented in Section 5 indicates that the proposed variance estimation technique performs well in practical situations.
The proposed methodology was used to analyze each of the simulated data sets. For purposes of comparison, we also implemented the method of Cheng and Wang [2]. To compute both estimators, cubic B-splines were used to approximate all unknown functions, with interior knots placed at equally spaced quantiles of the covariate values. For each data set, the proposed approach made use of a single knot set consisting of n^{1/3} interior knots. In contrast, the competing estimator was computed using multiple knot set configurations, with the number of interior knots ranging from n^{1/3} − 2 to n^{1/3} + 2, and the final unpenalized estimator was selected based on the BIC criterion; for further discussion see Cheng and Wang [2]. The estimator of the variance-covariance matrix of β̂ developed in Section 4 was used to construct 95% Wald confidence intervals for the estimated regression coefficients, while the observed information approach proposed in Cheng and Wang [2] was applied for inference on the regression parameter estimates obtained from the competing approach.
Tables 1 and 2 summarize the results of estimating the regression coefficients across all considered simulation configurations for both the proposed penalized spline method and the competing approach with n = 200 and 400, respectively. Provided in this summary are the empirical bias, sample standard deviation, and mean squared error of the 1000 point estimates, as well as the average of the 1000 estimated standard errors and the empirical coverage probability associated with 95% confidence intervals. From these results it is apparent that the proposed approach works well; i.e., the penalized spline estimators of the regression coefficients exhibit little bias, the mean squared errors of the estimates decrease as the sample size increases, the standard deviations of the point estimates are in agreement with the average estimated standard errors, and the empirical coverage probabilities are at their nominal level. The latter two findings suggest that the proposed method of estimating the variance-covariance matrix works well and that large-sample inference is possible even for relatively small sample sizes. Further, the proposed penalized spline estimator performed as well as, if not better than, the competing technique across all considered simulation configurations, even though this competing estimator was chosen as the best from amongst multiple candidates. In fact, the performance of this competing technique varied dramatically (results not shown) across the different knot set configurations; i.e., it was found to be sensitive to the specification of the number and placement of the basis functions. This sensitivity necessitated a grid search over the different knot set configurations, which resulted in an average model fitting time that was approximately 20 to 100 times longer than that of the proposed approach.
One of the primary goals of this study was to assess the performance of the proposed method with respect to estimating the unknown functions. To this end, Figures 1 and 2 summarize the estimates of the unknown functions under S2 with all considered values of α for both approaches when n = 200 and 400, respectively. This summary includes the 0.025, 0.5, and 0.975 point-wise quantiles of the estimates. Four figures providing analogous results under S1 and S3 when n = 200 and 400, respectively, are presented in Appendix B (i.e., Figures 4, 5, 6 and 7). These results suggest that the proposed methodology can be used to accurately estimate all unknown functions; i.e., very little bias can be observed when one compares the median of the functional estimates to the true underlying function. The same cannot be said for the competing technique.

Table 1
Summary of the results that were obtained using the competing technique (Cheng and Wang [2]) and the proposed penalized spline estimator when α = 0 (PH model), α = 0.

In particular, the approach of Cheng and Wang [2] provides estimates that are far more biased and variable than those of our approach. These findings reinforce the assertion that the proposed penalized spline estimator performs better, with respect to both estimation and inference, than this existing procedure.
To further examine the performance of our approach, we conducted a second simulation study to compare our method to the proposal of Lu and McMahan [13]. It is important to note that this existing technique was developed for the partially linear Cox PH model; i.e., it can only accommodate a single unknown function of a covariate. Thus, in this study comparisons were made under the partially linear Cox PH model. In particular, three data generating scenarios and two sample sizes (n = 200 and n = 400) were considered. Under each setting 1000 data sets were created and analyzed by the two techniques. More details on the specific simulation settings are provided in Appendix C, and the simulation results are summarized in Table 4 and Figures 8 and 9 in Appendix C. In conclusion, both of these techniques can accurately estimate the regression coefficients and unknown functions and draw reliable inference. That being said, our approach performs as well as, if not better than, this existing technique. In particular, it is observed that our method yields smaller bias, variability, and mean squared error for the regression coefficients and attains less variability in the estimated functions. More importantly, our approach is far more computationally efficient than the approach of Lu and McMahan [13]; i.e., it is able to complete model fitting approximately 15 to 70 times faster than this existing technique.

Table 2
Summary of the results that were obtained using the competing technique (Cheng and Wang [2]) and the proposed penalized spline estimator when α = 0 (PH model), α = 0.

Application to HPV data
To further illustrate our approach, the proposed methodology is now used to analyze HPV data which was collected as a part of the National Health and Nutrition Examination Survey (NHANES). HPV is one of the most common sexually transmitted diseases (STD) in the United States and persistent infections may lead to cancer (e.g., cervical cancer) and genital warts. Thus, given these potential sequelae and the asymptomatic nature with which this disease presents, it is essential to monitor the prevalence of this disease within the population of the United States, and elsewhere.
To this end, we consider analyzing the most recent data released by the NHANES study containing HPV test results (2015-2016). In total, there are 2840 participants and the left-censoring rate is approximately 41.7%. Given that HPV is an STD, we define the HPV onset time relative to the reported time of first sexual intercourse for each participant. In addition to HPV status, several covariates were also available; namely, gender (Z_1 = 1 if male), race (Z_2 = 1 if Mexican American, Z_3 = 1 if Other Hispanic, Z_4 = 1 if Non-Hispanic Black, and Z_5 = 1 if Other; with Non-Hispanic White as the reference), ratio of family income to poverty (W_1), and the participant's age at first sexual intercourse (W_2). To identify how the continuous covariates (i.e., W_1 and W_2) should enter the final model, the model was first fit allowing for nonlinear effects in both of these variables. From this fit, there was very little evidence of nonlinearity in W_1; see Figure 13 in Appendix D. Thus, the HPV onset time is related to the risk factors through the following model

g_α{F(t | X)} = η(t) + β_1Z_1 + β_2Z_2 + β_3Z_3 + β_4Z_4 + β_5Z_5 + β_6W_1 + ϕ(W_2),

where g_α(u) is defined as in (8). In fitting the proposed model, cubic B-splines with a knot set consisting of ⌈n^{1/3}⌉ = 15 interior knots were used to approximate all unknown functions. Model fitting was completed under both the PH (α = 0) and PO (α = 1) models using the hybrid algorithm outlined in Section 3. The same variance-covariance estimation strategies as those used in Section 5 were used to conduct inference for the regression coefficients. Table 3 summarizes the regression parameter estimates, along with their estimated standard errors and corresponding 95% confidence intervals. The results obtained from the PH and PO models are very similar. They suggest that gender is not a significant factor, and that Mexican Americans have a significantly lower, and Non-Hispanic Blacks a significantly higher, hazard of developing HPV when compared to Non-Hispanic Whites, while the Other Hispanic and Other race groups do not differ significantly from the reference.
The ratio of family income to poverty has a significantly negative linear effect. Figure 3 provides the estimates of η(·) and ϕ(·), along with 95% point-wise confidence intervals. The lower and upper limits of the 95% point-wise confidence intervals were obtained as the 0.025 and 0.975 point-wise quantiles of 1000 bootstrap estimates, respectively. The estimated functions obtained under the PH model show similar features to those obtained under the PO model. The estimate of ϕ(·) clearly exhibits a nonlinear pattern, i.e., age at first sexual intercourse has a nonlinear impact on the hazard of contracting HPV. Note, the confidence interval for ϕ(·) becomes wider at the right end which is attributable to the small number (1.48%) of individuals whose age at first sexual intercourse exceeds 30; see Figure 10 in Appendix D for a histogram of these ages.
For comparative purposes, we also fit the models proposed by Cheng and Wang [2] and Lu and McMahan [13] to the HPV data. To achieve the best performance for these competing approaches, for each of the functions η(·) and ϕ(·), we conducted a grid search to identify the optimal number of interior knots where the candidate models had knot set configurations with the number of interior knots ranging from 1 to 20. The final model for each of these competing techniques was selected from among these candidates based on the BIC criterion.
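The grid-search-plus-BIC selection used for the competing approaches can be sketched as below. This is a one-dimensional simplification (the actual search was conducted for each of $\eta(\cdot)$ and $\phi(\cdot)$), and `fit_model` is a hypothetical fitting routine, not an API from either competing method.

```python
import math

def bic(loglik, n_params, n):
    # Bayesian information criterion
    return -2.0 * loglik + n_params * math.log(n)

def select_knots(fit_model, data, n, k_min=1, k_max=20):
    """Grid search over the number of interior knots, selecting by BIC.

    fit_model(data, k) is a hypothetical routine returning
    (loglik, n_params) for a model with k interior knots.
    """
    best = None
    for k in range(k_min, k_max + 1):
        loglik, n_params = fit_model(data, k)
        score = bic(loglik, n_params, n)
        if best is None or score < best[0]:
            best = (score, k)
    return best[1]
```

The timing gap reported below is largely attributable to this outer loop: each candidate knot configuration requires a full model fit.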
The results from these competing approaches are also included in Table 3. From these results, one will first note that the estimates of the regression coefficients and the associated 95% confidence intervals obtained from the proposed method and that of Lu and McMahan [13] exhibit a large degree of similarity under the PH model, but differ substantially from those obtained from the approach of Cheng and Wang [2]. Note that for our approach the results obtained under the PH and PO models do not contradict each other, whereas for the approach of Cheng and Wang [2] they do. Second, the proposed approach took 4 seconds to complete model fitting, whereas, due to the need to conduct a grid search to determine the appropriate number of interior knots, the approach of Cheng and Wang [2] took approximately 4 hours and that of Lu and McMahan [13] took around 3 minutes. The estimates of $\eta(\cdot)$ and $\phi(\cdot)$ obtained by the two competing approaches are summarized in Figures 11 and 12 in Appendix D.
Finally, to select the best model for these data, we note that under our general model the two candidates (i.e., the PH and PO models) are special cases corresponding to particular specifications of $\alpha$. For this reason, a common approach to selecting between such models is to simply examine which possesses the larger likelihood value [34,39]. In order to account for the penalization in the proposed approach, we proceed similarly but make use of a modified BIC criterion. The modification involves replacing the usual degrees of freedom with an expression that quantifies the effective degrees of freedom resulting from the penalization. In particular, following the work of [8] (i.e., §6.8.3) and [32], the effective degrees of freedom is given by $\mathrm{trace}\{(\mathcal{X}^\top \Omega(\widehat{\theta})\mathcal{X}) I_{n,\lambda}^{-1}(\widehat{\theta})\}$, where $I_{n,\lambda}(\widehat{\theta}) = \mathcal{X}^\top \Omega(\widehat{\theta})\mathcal{X} + S$ is the negative of the expected Hessian matrix of the penalized log-likelihood (4). Expressions for $\mathcal{X}$, $\Omega(\widehat{\theta})$, and $S$ are provided in Section 3.2. The resulting effective degrees of freedom for the PH and PO models are 14.58 and 14.45, respectively, and accordingly the modified BIC values are 3771.9 and 3769.6. Thus, the PO model is slightly preferred for the considered data set.
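The effective degrees of freedom and the modified BIC above can be computed directly from the fitted quantities. A minimal sketch (matrix names follow the trace formula; the matrices themselves would come from the fitted model):

```python
import math
import numpy as np

def effective_df(X, Omega, S):
    """Effective degrees of freedom, trace{(X' Omega X)(X' Omega X + S)^(-1)}.

    X is the design matrix, Omega the weight matrix evaluated at the
    fitted parameters, and S the penalty matrix.  With S = 0 this
    reduces to the usual parameter count (the number of columns of X);
    penalization shrinks it below that count.
    """
    A = X.T @ Omega @ X
    return np.trace(A @ np.linalg.inv(A + S))

def modified_bic(loglik, edf, n):
    # standard BIC with the parameter count replaced by the effective df
    return -2.0 * loglik + edf * math.log(n)
```

For instance, the PH and PO fits above have effective degrees of freedom 14.58 and 14.45 despite using many more spline coefficients, reflecting the shrinkage induced by the penalty $S$.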

Discussion
In this work, we developed a computationally efficient penalized approach for fitting a flexible partially linear additive transformation model to current status data. The proposed procedure demonstrates appealing properties, both numerical and theoretical. The hybrid algorithm developed for model fitting is easy to implement and robust to initialization. A simple variance-covariance estimation procedure was established and shown to provide reliable Wald-type inference. In addition to attractive finite-sample performance, the estimators of the nonparametric components were shown to attain the optimal rate of convergence, and the estimators of the regression coefficients were proven to be asymptotically normal and efficient in the semiparametric sense. To further disseminate this work, code (written in R) implementing all aspects of this research has been prepared and is available upon request from the corresponding author.
Given the successes of this approach, future work will be directed at generalizing this methodology to allow for the analysis of interval-censored data.
A key obstacle in the development of this methodology will be the model fitting strategy, which will be far more complicated due to the complex nature of interval-censored data. Another direction for future research could involve extending this work to allow for multiple end points, for both current status and interval-censored data. Moreover, this penalized spline technique could be used to facilitate the estimation of other survival models, such as semiparametric generalized probit models for informative current status data [6]. Lastly, and potentially most challenging, would be the development of goodness-of-fit tests that could be used to guide the specification of the link function. This is a notoriously hard problem in the transformation model setting, and it is only made harder when one considers censored data and/or additive models.

Lemma 4
Suppose that $U_1, \ldots, U_n$ are independent random variables with expectation 0 satisfying the uniformly sub-Gaussian condition for some fixed positive constants $K$ and $\sigma_0$, and that the stated entropy condition holds for all $\varepsilon > 0$ and some constants $A > 0$ and $0 < \alpha < 2$. Further, assume $\sup_{f \in \mathcal{F}} \|f\|_n \le R$ for some constant $R > 0$. Then, for some constant $C > 0$ depending on $A$, $\alpha$, $R$, $K$, and $\sigma_0$, the stated maximal inequality holds for all $T \ge C$.

Lemma 5
For every probability measure $P$ and every $\varepsilon > 0$, the stated entropy bound holds for some constant $C > 0$.

A.3. Proof of Theorem 1 (Consistency and rate of convergence)
For observations $(\Delta_1, Y_1, X_1), \ldots, (\Delta_n, Y_n, X_n)$, the log-likelihood function for $\tau$ can be written as displayed. In view of the concavity of the logarithm function, together with $Q(\zeta_n) = Q(\zeta_n)$, and then by the definition of $\widehat{\tau}$ and the elementary logarithm inequality, we obtain the basic bound. Write $U = \Delta - E(\Delta \mid X)$. Under the assumption $\varepsilon_0 \le Q(x) \le 1 - \varepsilon_0$ for some $0 < \varepsilon_0 < 1$, the entropy of the relevant class is bounded as displayed for some constant $C > 0$. Further, the sub-Gaussian condition holds for the binary indicator $\Delta$. It then follows from Lemma 4 that the displayed rate holds.
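The display for the log-likelihood did not survive extraction. In the notation of the proof, with $P(\Delta = 1 \mid X) = Q\{\zeta(X)\}$ and $\Delta = I(T < Y)$, the current status log-likelihood takes the standard Bernoulli form; the display below is a reconstruction consistent with that notation rather than the paper's exact expression:

```latex
\ell_n(\tau) \;=\; \sum_{i=1}^{n} \Big[ \Delta_i \log Q\{\zeta_\tau(X_i)\}
  \;+\; (1 - \Delta_i) \log\big( 1 - Q\{\zeta_\tau(X_i)\} \big) \Big]
```

This form is consistent with the appearance of $U = \Delta - E(\Delta \mid X)$ later in the proof, since $E(\Delta \mid X) = Q\{\zeta_0(X)\}$ at the truth.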
Here we use the fact that $Q'(\cdot)$ is assumed to be bounded and $\|\zeta_n - \zeta_0\|_\infty = O(\rho_n)$, from which the bound for $I_2$ follows; the same result also holds for $I_3$. Combining (13) and (14) with the results for $I_2$ and $I_3$ yields (16). Next, applying arguments similar to those used for $I_2$ and $I_3$ yields (17) and (18). Thus, combining (17) and (18) with (16), we derive the basic inequality, which implies either (20) or (21). Clearly, in view of the assumption on the order of $\lambda_j$, $j = 0, \ldots, J$, (20) implies $h_n(\widehat{\zeta}, \zeta_0) = O_p(\rho_n)$ and $J(\widehat{\zeta}) = O_p(1)$. Inequality (21) implies either of the displayed alternatives. In view of the assumption on the order of $\lambda_j$, (22) and the combination of (23) and (24) give the same conclusion. Here we use the triangle inequality and $\|Q(\zeta_n) - Q(\zeta_0)\|_n = O(\rho_n)$. Applying arguments similar to those in the proof of Lemma 5, we can establish that $Q(\widehat{\zeta})$ converges to $Q(\zeta_0)$ in the $L_2$-norm. Thus, Lemma 19.24 of van der Vaart [28] applies and yields the displayed result. Under the assumption $|Q'(\zeta)| > \eta_0 > 0$ for all $\zeta$ in a neighborhood of $\zeta_0$, we obtain the rate of convergence for $\widehat{\zeta}$. Then, by Lemma 3.1 of Stone [25], $\|Z^\top(\widehat{\beta} - \beta_0)\|_2 = O_p(\rho_n)$, and hence $\|\widehat{\beta} - \beta_0\|_2 = O_p(\rho_n)$ follows from the non-singularity assumption on $E(ZZ^\top)$. Further, by Lemma 1 of Stone [24], we have $\|\widehat{\phi}_j - \phi_{0,j}\|_2 = O_p(\rho_n)$, $j = 0, \ldots, J$. The rate of convergence for $\widehat{\tau}$ follows by Lemma 7.3 of Murphy and van der Vaart [20]. Finally, by the Cauchy-Schwarz inequality and $J(\widehat{\phi}_j) = O_p(1)$, the first derivative of $\widehat{\phi}_j(\cdot)$ is uniformly bounded in probability, and hence $\widehat{\phi}_j(\cdot)$ is uniformly equicontinuous in probability on the compact set $\mathcal{T}_j$. The uniform convergence of $\widehat{\phi}(\cdot)$ follows from the Arzelà-Ascoli theorem.

Proof of Theorem 2 (Asymptotic normality and efficiency)
We first show that $\widehat{\tau}$ satisfies the efficient score equation $\mathbb{P}_n \ell^*_\beta(\widehat{\tau}) = 0$ up to an $o_p(n^{-1/2})$ term. Differentiating the penalized log-likelihood function yields (25). The second term in (25) can be bounded by $\lambda_j^2 J(\widehat{\phi}_j) J(h^*_j)$ as a result of the Cauchy-Schwarz inequality. In view of $J(\widehat{\phi}_j) = O_p(1)$, together with the assumption on the order of $\lambda_j$, we derive the stationary equation (26). Because the second derivative of $Q(\cdot)$ is assumed to be bounded, $l(\cdot)$ is Lipschitz. Further, since both $\Delta - Q(\zeta_0)$ and $D_k$ are fixed bounded functions, it follows from Theorem 2.4 of van de Geer [29] and Theorem 9.23 of Kosorok [9] that the entropy for the uniform norm of the class of uniformly bounded functions is of order $(1/\varepsilon)^{1/r}$, and hence the class is a Donsker class. Since $l(\cdot)$ is Lipschitz, we have (27). Further, according to Theorem 1, (28) holds, and hence so does the subsequent display. Similarly, by Theorem 2.4 of van de Geer [29], together with the fact that $q(\zeta_0) l(\zeta_0) D$ is bounded, the class of uniformly bounded functions $(\zeta - \zeta_0) q(\zeta_0) l(\zeta_0) D$ with $\|\zeta - \zeta_0\|_2 \le M$, $\|\zeta\|_\infty \le M$, and $J(\zeta) \le M$ is a Donsker class, and the displayed result follows from Theorem 1 and Lemma 19.24 of van der Vaart [28]. In view of the definition of $(h^*_0, \ldots, h^*_J)$, the efficient score is orthogonal to perturbations $h_j(W_j)$ for any $h_j(W_j)$, $j = 0, \ldots, J$. Thus, $E\{(\widehat{\phi}_j - \phi_{0,j})(W_j) q(\zeta_0) l(\zeta_0) D\} = 0$. Therefore,

It follows that
As shown above, the class of uniformly bounded functions $\zeta - \zeta_0$ with $\|\zeta\|_\infty \le M$ and $J(\zeta) \le M$ is a Donsker class. It follows from Theorem 1 and Lemma 19.24 of van der Vaart [28] that $\|\widehat{\zeta} - \zeta_0\|_n = \|\widehat{\zeta} - \zeta_0\|_2 + o_p(n^{-1/2})$, which implies that $\|\cdot\|_2$ and $\|\cdot\|_n$ share the same rate of convergence. Then, under Condition 5, a Taylor expansion yields the displayed expansion. Further, since both $Q(\cdot)$ and $l(\cdot)$ are Lipschitz, the displayed bound follows from Theorem 1. Therefore, combining the stationary equation (26) with (27)-(30) yields the asymptotic linear representation of $\widehat{\beta}$. The asymptotic normality of $\widehat{\beta}$ then follows immediately from the standard central limit theorem.
Further, it follows from the continuous mapping theorem and the dominated convergence theorem, together with the consistency of $\widehat{\tau}$, that the displayed convergence holds. Combining (31) and (32) yields the result. By the characterization of the least favorable direction $h^*_{+,l}$, for any $\varepsilon > 0$ and any $h_{+,l} \in \mathcal{H}_+$ with $\|h_{+,l} - h^*_{+,l}\| \ge \varepsilon$, $P\{\kappa_l(\tau_0, h_{+,l}) - \kappa_l(\tau_0, h^*_{+,l})\} > 0$, and hence the displayed inequality holds with probability converging to 1. As shown in the proof of Theorem 2, the class of uniformly bounded functions $\{\Delta - Q(\zeta)\} l(\zeta)$ with $\|\tau - \tau_0\| \le M$, $\|\zeta\|_\infty \le M$, and $J(\zeta) \le M$ is a Donsker class. Further, in view of the bracketing entropy calculation for splines in Shen and Wong [22], the class of uniformly bounded functions $z_l - h_{+,l}$ for $h_{+,l} \in \mathcal{S}_+$, $\|h_{+,l} - h^*_{+,l}\|_\infty \le M$, is also a Donsker class.

Three data-generating scenarios were considered. In particular, in Scenario 1 (S1) the two functions were specified as $\eta(t) = \log(2t)$ and $\phi(w) = 2w$, and the regression coefficients were $\beta_1 = 0.5$ and $\beta_2 = -0.5$; in Scenario 2 (S2) the two functions were $\eta(t) = \log\{1.5t - \log(1 + 1.5t)\}$ and $\phi(w) = 2\sin(-\pi w)$, and the regression coefficients were $\beta_1 = 0.5$ and $\beta_2 = 0.5$; and in Scenario 3 (S3) the two functions were $\eta(t) = \log\{\log(1 + t/10) + \sqrt{t/10}\}$ and $\phi(w) = 4w^2 - 4/3$, and the regression coefficients were $\beta_1 = -0.5$ and $\beta_2 = -0.5$. To create current status data, observation times ($Y$) were simulated from an exponential distribution with mean 2, 2, and 1 under S1, S2, and S3, respectively, and the censoring indicator was determined as $\Delta = I(T < Y)$. These specifications yield a variety of right-censoring rates; for example, under S1, S2, and S3 the right-censoring rates are 22%, 43%, and 80%, respectively. For each simulation configuration, this data generating process was repeated 1000 times to create 1000 independent data sets consisting of $n = 200$ and $n = 400$ observations. We use $n^{1/3}$ interior knots for our approach.
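The current status data-generating process above can be sketched for Scenario 1 under the PH special case ($\alpha = 0$), where the cumulative hazard $\Lambda(t \mid x) = \exp\{\eta(t) + \beta_1 z_1 + \beta_2 z_2 + \phi(w)\} = 2t\,e^{\mathrm{lp}}$ admits closed-form inversion. The covariate distributions below are illustrative assumptions, not taken from the paper, so the resulting censoring rate need not match the reported 22%.

```python
import numpy as np

def simulate_s1(n, beta1=0.5, beta2=-0.5, seed=0):
    """Sketch of Scenario 1 under the PH special case (alpha = 0).

    With eta(t) = log(2t) and phi(w) = 2w, the cumulative hazard is
    Lambda(t | x) = 2t * exp(lp), so T = E / (2 * exp(lp)) with
    E ~ Exp(1).  Covariate distributions here are assumptions.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.binomial(1, 0.5, n)       # assumed binary covariate
    z2 = rng.normal(0.0, 1.0, n)       # assumed continuous covariate
    w = rng.uniform(0.0, 1.0, n)       # assumed support for phi(w) = 2w
    lp = beta1 * z1 + beta2 * z2 + 2.0 * w
    T = rng.exponential(1.0, n) / (2.0 * np.exp(lp))  # inverted cum. hazard
    Y = rng.exponential(2.0, n)        # observation times, mean 2
    delta = (T < Y).astype(int)        # current status indicator
    return delta, Y, z1, z2, w

delta, Y, z1, z2, w = simulate_s1(5000)
# right-censoring rate is 1 - delta.mean(); its value depends on the
# assumed covariate distributions
```

Only $(\Delta, Y)$ and the covariates are retained, mirroring the current status structure: $T$ itself is never observed.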
In contrast, for the competing approach, estimators were computed using multiple knot set configurations, with the number of interior knots for each function ranging from $n^{1/3} - 4$ to $n^{1/3} + 4$; the final unpenalized estimator was then selected based on the BIC criterion. Table 4 summarizes the estimation of the regression coefficients.

Appendix D: Additional results for HPV data from two competing approaches
Figure 10 presents the histogram of age at first sexual intercourse. Figure 11 summarizes the estimates of $\eta(\cdot)$ and $\phi(\cdot)$ (i.e., the nonlinear function of age at first sexual intercourse) under both the PH (first row) and PO (second row) models, along with 95% point-wise confidence intervals, using the approach of Cheng and Wang [2]. Figure 12 presents the estimates of $\eta(\cdot)$ and $\phi(\cdot)$ under the PH model using the approach of Lu and McMahan [13]. Figure 13 summarizes the estimates of $\eta(\cdot)$, $\phi_1(\cdot)$, and $\phi_2(\cdot)$.