Sufficient dimension reduction for survival data analysis with error-prone variables

Abstract
Sufficient dimension reduction (SDR) is an important tool in regression analysis that reduces the dimension of covariates without losing predictive information. Several methods have been proposed to handle data with either censoring in the response or measurement error in the covariates. However, little research is available for data having these two features simultaneously. In this paper, we examine this problem. We start by considering the cumulative distribution function in regular settings and propose a valid SDR method that incorporates the effects of censoring and covariate measurement error. Theoretical results are established, and numerical studies are reported to assess the performance of the proposed methods.


Introduction
Survival analysis has proven useful in many areas, including cancer research, clinical trials, epidemiological studies, and actuarial science. A primary interest in survival analysis is to study the association between survival times and covariates of interest. In addition to traditionally used survival models such as the Cox proportional hazards model, the accelerated failure time model, the proportional odds model, and the additive hazards model, many parametric or semiparametric models have been proposed in recent years to handle more complex features of survival data. These models include the partially linear hazard regression [3], the partially linear single-index survival model [11], and the partially linear transformation model. However, they may still be inadequate for real problems because the suitability of a particular model is rarely known in advance. Motivated by this, nonparametric regression models are employed in applications to offer modeling flexibility and protect against the risk of model misspecification. Such models, however, are hampered by the high dimension of the covariates. To offer a flexible yet parsimonious model formulation, sufficient dimension reduction (SDR) becomes useful for reducing the dimension of covariates while preserving their predictive information (e.g., [15]).
For uncensored data, various methods have been proposed to reduce the dimension of covariates, including inverse regression ([16]; [28]), minimum average variance estimation [24], and the semiparametric framework of [19]; further details can be found in [7] and [15]. For right-censored survival data, a number of dimension reduction methods have also been developed. To name a few, [17] examined the sliced inverse regression method to estimate the central subspace (CS) of dimension reduction directions. [25] considered semiparametric models and proposed minimum average variance estimation using an inverse censoring weighting scheme. [18] discussed sliced inverse regression with inverse probability weights and implemented a variable selection approach for sparse, high-dimensional data.
While these methods are useful in different settings, they are inapplicable to error-contaminated data, a ubiquitous feature in applications. As noted by [4], when covariates are subject to measurement error, ignoring the error effects in sufficient dimension reduction often yields misleading results. To address measurement error effects in the SDR framework, [4] proposed "corrected" covariates for the implementation of sliced inverse regression. [14] established the invariance law for correcting measurement error effects. [27] developed a cumulative slicing estimation method using the "corrected" covariates, which extended the development of [28]. In the presence of both censoring and measurement error in covariates, however, there has been no available work, to the best of our knowledge.
Driven by the lack of methods for handling such data, in this paper we develop estimation methods for dimension reduction with censored data and covariate measurement error. We consider the single-index conditional distribution model, which covers many useful survival models, and based on it, we develop a semiparametric estimation procedure. Our method does not require the usual assumptions, such as the linearity and constant variance conditions imposed by other authors (e.g., [16]). It employs the "corrected" covariates to adjust for measurement error effects and applies the conditional expectation scheme to remove the bias caused by censoring.
The remainder is organized as follows. In Section 2, we introduce the framework of right-censored survival data, the single-index conditional distribution model, and the measurement error model. In Section 3, we propose a valid estimation procedure to correct for the measurement error effects, estimate the parameters of interest, and reduce the dimension of covariates. Theoretical results are established in Section 4. Empirical studies, including simulation results and real data analysis, are provided in Section 5. We conclude the article with discussions in Section 6. Technical justifications are included in the appendices.

Preliminaries and framework
In this section, we introduce the preliminaries of SDR, survival analysis, and measurement error models.

SDR and conditional distribution
Let T > 0 be the univariate response, and let X be the p-dimensional vector of covariates having a multivariate normal distribution, where p is often a large positive integer. The spirit of SDR is to find a p × d matrix, say B = [β_1 · · · β_d], such that

T ⊥⊥ X | B^⊤X, (2.1)

where "⊥⊥" stands for statistical independence, and β_j is a p-dimensional column vector for j = 1, · · · , d. Here d can be viewed as the dimension of the reduced covariates and is smaller than p, and B is often called a basis. Let S(B) represent the SDR subspace spanned by the column vectors of B. [6] showed that the intersection of all such S(B) exists; this intersection is called the central subspace (CS) for the regression of T on X. Let S_{T|X} denote the CS, with structural dimension d = dim(S_{T|X}), which is typically unknown.
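As a concrete illustration of (2.1), the following minimal sketch (illustrative numbers only, not from the paper) generates a single-index model in which T depends on X only through B^⊤X, and shows why only the span of B is identifiable:

```python
import numpy as np

# Illustration of condition (2.1): a toy single-index model in which T
# depends on X only through B^T X. All numbers here are illustrative.
rng = np.random.default_rng(0)
n, p, d = 1000, 5, 1

B = np.zeros((p, d))
B[0, 0] = B[1, 0] = 1.0                      # a basis of the central subspace
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
index = X @ B                                # reduced covariate B^T X
T = np.exp(index[:, 0] + 0.5 * rng.standard_normal(n))   # positive survival times

# Only the span of B is identifiable: rescaling the columns of B leaves
# span(B) unchanged, which motivates the parametrization (2.4) below.
B_rescaled = 2.0 * B
assert np.allclose(X @ B_rescaled, 2.0 * index)
```

Here any nonsingular reparametrization of B yields the same conditional law of T, which is why a fixed parametrization of the basis is needed for estimation.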
We now consider the cumulative distribution function of T given X = x, F_{T|X}(t|x) = P(T ≤ t | X = x), which is written as

F_{T|X}(t|x) = F_0(t, x) (2.2)

to show its dependence on both values of T and X, where F_0(·, ·) is an unknown nonnegative function. As discussed in [19] and [24], in the absence of censoring, (2.1) yields that P(T ≤ t | X = x) = P(T ≤ t | B^⊤X = B^⊤x) for any t ∈ R^1 and x ∈ R^p, suggesting that

F_{T|X}(t|x) = F(t, B^⊤x) (2.3)

for some unknown nonnegative function F(·, ·). We consider the setting where the response T represents the survival time for a subject and is associated with the covariates X. We are interested in finding the CS, S_{T|X}, to study the relationship between T and X. As noted by [19], the basis matrix B is not unique. To uniquely map the CS to a basis matrix, we follow [19] and redefine B as

B = (I_{d×d}, D^⊤)^⊤, (2.4)

where D is a (p − d) × d matrix containing (p − d)d unknown parameters and I_{d×d} is the d × d identity matrix. Our objective is to estimate the matrix D, but for ease of exposition, we still refer to B in the following development.

Survival data with measurement error
In survival analysis, T is usually incomplete due to the presence of the censoring time for a subject, denoted C. We assume that C and T are independent given X and that C and X are independent (e.g., [22]); these assumptions are listed as Condition (C2) in Appendix A. Let Y = min(T, C) and Δ = I(T ≤ C), where I(·) is the indicator function. Directly applying SDR methods to the observed variable Y and covariates X amounts to studying S_{Y|X}, which is generally not equal to S_{T|X}, as shown by [18] and [25].

On top of the issue of censoring, another challenge is posed by the fact that the covariates X are commonly error-contaminated. To describe this feature, let X* denote the observed surrogate version of X. The dimension of X*, denoted s, can be identical to or differ from p, though in applications, s is often equal to p. As in [4], we consider the measurement error model

X* = γ + ΓX + ε, (2.5)

where ε is independent of {X, T, C}, γ is an s-dimensional vector of parameters, and Γ is an s × p matrix of parameters which may be known, partially known, or unknown. As in [14], we assume that ε follows a normal distribution. Let Σ_{X*}, Σ_X, and Σ_ε denote the covariance matrices of X*, X, and ε, respectively. As discussed in (2.5) of [4], we consider

U = LX* (2.6)

as the "corrected" covariates expressed in terms of X*, where

L = Σ_X Γ^⊤ Σ_{X*}^{-1}. (2.7)

With X and ε following a multivariate normal distribution, [14] showed that

S_{T|X} = S_{T|U}, (2.8)

suggesting that replacing X by U preserves the CS S_{T|X}.
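The correction step can be sketched as follows, assuming (2.7) takes the standard best-linear-predictor form L = Σ_X Γ^⊤ Σ_{X*}^{-1} (an assumption on our part, though it is consistent with the scalar estimate reported in the data analysis of Section 5); function and variable names are illustrative:

```python
import numpy as np

def corrected_covariates(X_star, Sigma_X, Gamma, Sigma_X_star):
    """Compute U = L X* with L = Sigma_X Gamma^T Sigma_{X*}^{-1}, the
    assumed best-linear-predictor form of the correction (2.7)."""
    L = Sigma_X @ Gamma.T @ np.linalg.inv(Sigma_X_star)
    return X_star @ L.T, L

# Scalar sanity check: with Gamma = 1, L reduces to the reliability ratio
# sigma_X^2 / (sigma_X^2 + sigma_eps^2).
Sigma_X = np.array([[1.0]])
Gamma = np.array([[1.0]])
Sigma_eps = np.array([[0.25]])
Sigma_X_star = Gamma @ Sigma_X @ Gamma.T + Sigma_eps
U, L = corrected_covariates(np.ones((3, 1)), Sigma_X, Gamma, Sigma_X_star)
print(L[0, 0])  # reliability ratio 1/1.25
```

In the scalar case the correction simply shrinks X* toward its mean by the reliability ratio, which is the familiar attenuation adjustment.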

Determination of "corrected" covariates
Suppose that we collect measurements {(Y_i, Δ_i, X_i) : i = 1, · · · , n} on a sample of n subjects, where for i = 1, · · · , n, (Y_i, Δ_i, X_i) has the same distribution as (Y, Δ, X). Let τ denote a finite value typically larger than the maximum observed time in the sample. As X_i is subject to measurement error, rather than having a precise measurement of X_i, we often observe a surrogate measurement X*_i (or repeated measurements) for i = 1, · · · , n, where (X*_i, X_i) has the same distribution as (X*, X).
Note that model (2.5) yields that Σ_{X*} = ΓΣ_XΓ^⊤ + Σ_ε, and that Σ_{X*} can be estimated by its empirical estimator based on the available measurements of X*. To estimate L, we need only handle Σ_ε and Γ. Consequently, we consider the following three scenarios.

Scenario I: Both Σ_ε and Γ are known.

In this scenario, L is determined by (2.7), which allows us to directly calculate the "corrected" covariates U using (2.6). Such a scenario is useful for conducting sensitivity analyses to understand the impact of measurement error in X on SDR.

Scenario II: Γ is known, Σ_ε is unknown, and repeated measurements of X are available.
Suppose that two repeated measurements of X, {X*_{ir} : r = 1, 2; i ∈ R}, are collected for m = |R| additional subjects, where R denotes the index set of those subjects. Consistent with [4], we take Γ to be I_{p×p} for ease of discussion. Then the measurement error model based on repeated measurements is given by

X*_{ir} = X_i + ε_{ir} (2.9)

for i ∈ R and r = 1, 2, where ε_{ir} ~ N(0, Σ_ε) and ε_{ir} is independent of {X_i, T_i, C_i}. By direct derivations, we obtain that

Σ_ε = (1/2) E{(X*_{i1} − X*_{i2})^{⊗2}} (2.10)

and

Σ_{X*} = Σ_X + Σ_ε, (2.11)

so that L can be expressed as L = (Σ_{X*} − Σ_ε)Σ_{X*}^{-1}, which is estimated by

L̂ = (Σ̂_{X*} − Σ̂_ε)Σ̂_{X*}^{-1}, (2.12)

with Σ̂_ε and Σ̂_{X*} being the empirical estimators based on (2.10) and (2.11), respectively.

Scenario III: Both Σ_ε and Γ are unknown and validation data are available.
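A minimal sketch of Scenario II under the stated assumption Γ = I: the difference of the two replicates cancels the true X_i, yielding a moment estimator of Σ_ε and then an estimator of L as in (2.12). Names below are illustrative, not from the paper:

```python
import numpy as np

def L_hat_from_replicates(X_star_main, X_rep1, X_rep2):
    """Scenario II sketch with Gamma = I: X*_{i1} - X*_{i2} =
    eps_{i1} - eps_{i2} cancels X_i, so Var(X*_{i1} - X*_{i2}) =
    2 Sigma_eps; L is then estimated as in (2.12)."""
    D = X_rep1 - X_rep2
    Sigma_eps = D.T @ D / (2.0 * len(D))               # moment estimator of Sigma_eps
    Xc = X_star_main - X_star_main.mean(axis=0)
    Sigma_X_star = Xc.T @ Xc / len(X_star_main)        # empirical Sigma_{X*}
    return (Sigma_X_star - Sigma_eps) @ np.linalg.inv(Sigma_X_star)

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 1))                         # true covariate
noise = lambda: rng.normal(scale=0.5, size=(5000, 1))
L_hat = L_hat_from_replicates(X + noise(), X + noise(), X + noise())
# True L = 1/(1 + 0.25) = 0.8; L_hat should be close for large samples.
```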
Suppose that M is the set of n subjects in the main study and V is the set of m subjects in an external validation study. That is, M and V do not overlap, and the available data for the main study and the validation study are {(Y_i, Δ_i, X*_i) : i ∈ M} and {(X_i, X*_i) : i ∈ V}, respectively. Hence, the measurement error model (2.5) applies to both data sources; here the transportability assumption [5, p. 30] is made for the validation and main study data.

Let μ_X = E(X_i) and μ_{X*} = E(X*_i). Then, using the validation data, we obtain the empirical estimators Σ̂_X, Γ̂, and Σ̂_{X*}, and hence, by (2.7), L can be estimated by L̂ = Σ̂_X Γ̂^⊤ Σ̂_{X*}^{-1}. Once the estimator L̂ is obtained from either repeated measurements or validation data in Scenario II or III, we adjust the surrogate covariate X*_i by Û_i = L̂X*_i to accommodate measurement error effects in the following development.
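Scenario III can be sketched as a least-squares fit of X* on X over the validation sample; this is a generic version under the model (2.5), since the paper's exact estimators are not displayed above:

```python
import numpy as np

def fit_error_model_validation(X_val, X_star_val):
    """Scenario III sketch: on validation data (X_i, X*_i) are observed
    jointly, so gamma, Gamma, and Sigma_eps in (2.5) can be estimated
    by least squares of X* on X. A generic sketch, not the paper's
    exact estimators."""
    m, p = X_val.shape
    Z = np.hstack([np.ones((m, 1)), X_val])            # intercept plus X
    coef, *_ = np.linalg.lstsq(Z, X_star_val, rcond=None)
    gamma_hat, Gamma_hat = coef[0], coef[1:].T         # Gamma_hat is s x p
    resid = X_star_val - Z @ coef
    Sigma_eps_hat = resid.T @ resid / (m - p - 1)      # residual covariance
    return gamma_hat, Gamma_hat, Sigma_eps_hat

rng = np.random.default_rng(2)
X_val = rng.normal(size=(2000, 2))
Gamma_true = np.array([[1.0, 0.2], [0.0, 1.0]])
X_star_val = X_val @ Gamma_true.T + 0.1 * rng.normal(size=(2000, 2))
gamma_hat, Gamma_hat, Sigma_eps_hat = fit_error_model_validation(X_val, X_star_val)
```

The fitted Γ̂ and Σ̂_ε can then be plugged into (2.7) together with the empirical Σ̂_{X*} from the main study.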

Methodology
In this section, we consider model (2.3) and propose a method to estimate the basis matrix B for survival data subject to measurement error in the covariates. Specifically, we first apply (2.6) to correct for measurement error effects and use the inverse probability weighting scheme to adjust for the effects of censored responses; we then propose valid inferential procedures to estimate B and d without imposing additional conditions, such as the linearity condition (e.g., [16]) commonly used in conventional SDR methods.

Adjustment for measurement error and censoring effects
In the presence of measurement error, X in (2.3) is unavailable for estimating B. We instead use the observed surrogate measurement X* to estimate B with measurement error effects accounted for. The equivalence (2.8) suggests performing SDR based on U. Analogous to (2.2), we write

F_{T|U}(t|u) = P(T ≤ t | U = u) = F(t, B^⊤u) (3.1)

for a nonnegative unknown function F(·, ·). By (2.8), we have that for any t > 0,

E{I(T ≤ t) | U = u} = F(t, B^⊤u). (3.2)

The equality (3.2) offers a basis for estimating B using the surrogate measurements of X_i together with the survival times T_i. In survival analysis, however, T_i is usually incomplete due to censoring; we observe only (Y_i, Δ_i) as described in Section 2.2. To accommodate censored responses, a useful strategy is to proceed with the inverse weighting scheme.
For any given c > 0, let G(c) = P(C ≥ c). Then for given y > 0 and U = u,

E{(Δ/G(Y)) I(Y ≤ y) | U = u} = E{I(T ≤ y) E[I(C ≥ T) | T, U]/G(T) | U = u} = E{I(T ≤ y) | U = u} = F(y, B^⊤u), (3.3)

where the first step uses the independence of C and {T, U} together with (2.8), and the last step comes from (3.2). The identity (3.3) allows us to estimate B using the observed time Y by examining an expectation conditional on the corrected surrogate covariate, where the weight Δ/G(Y) corrects for the censoring effect, and U adjusts for the measurement error effects.
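The weighting in (3.3) can be sketched as follows: G is estimated by a Kaplan-Meier estimator that treats censoring as the event, and each uncensored subject receives weight Δ_i/Ĝ(Y_i). This is a minimal version that ignores ties:

```python
import numpy as np

def ipcw_weights(Y, delta):
    """Weights Delta_i / G_hat(Y_i) motivating (3.3): G(c) = P(C >= c)
    is estimated by a Kaplan-Meier estimator that treats censoring
    (delta == 0) as the event. A minimal sketch ignoring ties."""
    n = len(Y)
    order = np.argsort(Y)
    G = np.ones(n)
    surv = 1.0
    for rank, i in enumerate(order):
        G[i] = surv                       # survivor value just before Y_i
        if delta[i] == 0:                 # a censoring 'event' at Y_i
            surv *= 1.0 - 1.0 / (n - rank)
    return np.where(delta == 1, 1.0 / G, 0.0)

# With no censoring every weight is 1, recovering the uncensored case;
# censored subjects get weight 0, and the lost mass is redistributed to
# uncensored subjects with larger observed times.
print(ipcw_weights(np.array([1.0, 2.0, 3.0]), np.array([0, 1, 1])))
```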

Estimation procedures
With (3.3), we estimate the function F(·, ·) using the kernel method:

F̂(y, B^⊤u) = {Σ_{i=1}^n (Δ_i/Ĝ(Y_i)) I(Y_i ≤ y) K_h(B^⊤U_i − B^⊤u)} / {Σ_{i=1}^n K_h(B^⊤U_i − B^⊤u)}, (3.4)

with B given by (2.4), where Ĝ(·) is a consistent estimator of G(·), which can be obtained from the nonparametric Kaplan-Meier estimator (e.g., [22]), and K_h(·) is a kernel function of order q with bandwidth h, where q is a positive integer. With the estimator of F(·, ·) in (3.4), we use the leave-one-out cross-validation (CV) criterion (3.5) to calculate the CV value, where F̂^{(−i)}(y, B^⊤u) is the version of (3.4) with the ith subject deleted.
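A sketch of the kernel step, assuming (3.4) (not fully displayed above) takes the standard IPCW Nadaraya-Watson form implied by (3.3), with a product Epanechnikov kernel:

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 (1 - t^2) on |t| <= 1."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t * t), 0.0)

def F_hat(y, bu, BU, Y, weights, h):
    """Assumed IPCW Nadaraya-Watson form of (3.4): by (3.3) the weighted
    indicator (Delta_i/G_hat(Y_i)) 1{Y_i <= y} has conditional mean
    F(y, B^T u), so smoothing it over B^T U_i estimates F.
    BU: n x d matrix of B^T U_i; weights: Delta_i / G_hat(Y_i)."""
    K = np.prod(epanechnikov((BU - bu) / h), axis=1)   # product kernel
    den = K.sum()
    return float((weights * (Y <= y) * K).sum() / den) if den > 0 else 0.0
```

A leave-one-out version for the CV criterion (3.5) is obtained by zeroing out the ith subject's kernel weight before forming the ratio.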
In the degenerate case with d = 0, B is null, and the CV value in (3.5) is calculated with the kernel smoother in (3.4) replaced by the marginal (unconditional) estimator.

Theoretical results
In this section, we present theoretical results for the proposed method; the proofs are deferred to Appendix C. Let vec(·) denote the vectorization operation that stacks the columns of a matrix from left to right, and let ‖·‖ represent the Frobenius norm of a matrix. Define a^{⊗2} = aa^⊤ for any vector a. To emphasize the involvement of the estimator L̂, we write Û_i = L̂X*_i as defined in Section 2.3. For a differentiable function f(α), let ∇^j_α f(α) denote the jth order derivative of f(α) with respect to α. Let B_0 and d_0 denote the true value of the parameter and its structural dimension, respectively, and let h_0 denote the optimal bandwidth. We first present the consistency of the estimators (B̂, d̂, ĥ); the proof is given in Appendix C.1.
Theorem 4.1 says that the estimator B̂ converges in probability to its true value as n goes to infinity, and that the estimated structural dimension and bandwidth are, respectively, close to the true structural dimension and the optimal bandwidth with probability approaching one.
For l = 0, 1 and j = 0, 1, 2, define the quantities in (4.1)-(4.3). Furthermore, let f_{B^⊤U}(B^⊤u) denote the density function of B^⊤U, and define the quantities in (4.4) and (4.5). We now present the theoretical result for (3.4) with B replaced by its estimator; the proof is deferred to Appendix C.2.

Theorem 4.2. Under the regularity conditions in Appendix A, the following expansion holds.

Theorem 4.2 indicates that the difference between the estimated and true functions is of order O_p(n^{−1/2}) once the bias term (i.e., the third term on the left-hand side of the identity) is removed. Finally, we present the asymptotic distribution of the estimator B̂, with the relevant quantities defined in (4.6) and (4.7).

Theorem 4.3. (a) Assume that L is known. Then the asymptotic normality of B̂ stated below holds as n → ∞. (b) Assume that L is unknown and estimated from either repeated measurements or validation data, with Φ_i defined accordingly for each case. Then the asymptotic normality holds with an additional contribution from Φ_i.

Theorem 4.3 establishes the asymptotic normality of the proposed estimator B̂ under the three scenarios in Section 2.3. In particular, when Σ_ε is unknown and additional information is available, there exists an additional term T(B_0)Φ_i in the asymptotic variance.

Numerical studies
In this section, we conduct simulation studies to assess the performance of the proposed estimators for a variety of settings. We first design the simulation settings and then present the simulation results. Finally, the methods are implemented to analyze a real dataset.

Simulation design
We generate the covariates X from N(0, Σ_X), where Σ_X is the covariance matrix with diagonal entries one and off-diagonal entries 0.4. Given the covariates X and B_0, we use three models, the proportional hazards (PH), proportional odds (PO), and additive hazards (AH) models, to generate survival times; the corresponding cumulative distribution functions F(·, ·) are formulated accordingly, where ϕ(·, ·) is a given function. Let U be generated from the uniform distribution UNIF(0, 1). Then survival times T based on the PH and PO models can be generated by inverting the corresponding distribution functions, and survival times T based on the AH model can be obtained by solving the corresponding equation. We consider the following two settings: β_20 = (0, 1, 0, 1, 0_{p−4})^⊤ and ϕ(X, B_0) = (X^⊤β_10)² + 2(X^⊤β_20), where 0_r represents a zero vector of dimension r.
Let C be the censoring time generated from the uniform distribution UNIF(0, c), where c is a constant chosen to yield approximately 50% censoring. We then compute Y = min{T, C} and Δ = I(T ≤ C).
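The inverse-transform generation of PH and PO survival times described above can be sketched as follows, with an assumed unit exponential baseline (the paper's baseline functions are not displayed, so this choice is illustrative only):

```python
import numpy as np

def gen_ph_times(phi, rng):
    """PH model with an assumed unit exponential baseline:
    F(t|x) = 1 - exp(-t e^{phi}), so T = -log(U) e^{-phi},
    U ~ UNIF(0, 1). Illustrative baseline choice only."""
    U = rng.uniform(size=len(phi))
    return -np.log(U) * np.exp(-phi)

def gen_po_times(phi, rng):
    """PO model with unit baseline odds: F/(1 - F) = t e^{phi},
    which inverts to T = (U / (1 - U)) e^{-phi}."""
    U = rng.uniform(size=len(phi))
    return U / (1.0 - U) * np.exp(-phi)

rng = np.random.default_rng(3)
phi = np.zeros(10000)                                  # e.g., phi = phi(X, B_0)
T = gen_ph_times(phi, rng)
C = rng.uniform(0.0, 2.0 * np.median(T), size=len(T))  # tune c for ~50% censoring
Y, delta = np.minimum(T, C), (T <= C).astype(int)
```

In practice the upper bound c of the censoring distribution is tuned by trial and error until the censoring proportion is near the target.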
We consider three scenarios to generate surrogate measurements of X and estimate B using the methods described in Section 2.3 accordingly.

Scenario 1: Both Σ_ε and Γ are assumed known.
For i = 1, · · · , n, X*_i is generated from (2.5), where γ = 0, Γ is the identity matrix, and ε_i follows a distribution denoted f(ε_i). To obtain the estimate B̂ of B, we employ (3.6) in combination with Scenario I in Section 2.3.

Scenario 2: Γ is known, Σ_ε is unknown, and repeated measurements of X_i are available.

Suppose that two repeated measurements {X*_{ir} : r = 1, 2; i ∈ R} are generated for m = 100 additional subjects, where X_i and ε_{ir} are independently generated from N(0, Σ_X) and f(ε_i), respectively, and X*_{ir} = X_i + ε_{ir} for i ∈ R and r = 1, 2. To obtain the estimate B̂ of B, we employ (3.6) in combination with Scenario II in Section 2.3.

Scenario 3: Both Σ_ε and Γ are unknown and validation data are available.

For i = 1, · · · , n, we generate ε_i from f(ε_i) independently and set X*_i = X_i + ε_i. The main study data then consist of {(Y_i, Δ_i, X*_i) : i = 1, · · · , n}. We then obtain the estimator B̂ of B using (3.6) in combination with Scenario III in Section 2.3.
We compare the proposed methods to the naive estimators of F(·) and B, which are derived by directly using the observed covariates X*_i in (3.4) and (3.5). As a reference for comparison, we also use the true values of X for estimation and label this method "true". In implementing our methods, we take the kernel function to be the Epanechnikov kernel, K(t) = 0.75(1 − t²)I(|t| ≤ 1). We consider sample sizes n = 200, 300, and 400, and repeat the simulation 500 times for each setting. To assess the accuracy of the estimator B̂ of B, we consider the Frobenius norm ‖B̂ − B_0‖. To examine the performance of the estimator of F(·), we consider the mean integrated squared error (MISE). Further, we evaluate the standard error (S.E.) of the estimators of B and F(·), calculated as the sample standard deviation of the estimates obtained from the 500 simulations.
For each setting, we also report the proportions, over the 500 simulations, of the estimated structural dimension d̂ falling into each of four categories.

Performance under normally distributed measurement errors
In this subsection we evaluate the performance of the proposed method when the distribution of the measurement error term is correctly specified. That is, we set f(ε_i) to be a normal distribution in the three scenarios for generating surrogate measurements. Specifically, we set f(ε_i) to be N(0, Σ_ε), with Σ_ε a diagonal matrix having diagonal entries σ_ε² = 0.15, 0.5, or 0.75.
We present the simulation results in the top panels of Tables 1-9. For estimation of B and F(·), the naive method produces biased results, and the finite sample bias increases as the degree of measurement error increases; the proposed methods greatly outperform the naive approach, yielding results fairly close to those produced by the reference method that uses the true covariate measurements. Consistent with the measurement error literature, the standard errors associated with the proposed methods are larger than those of the naive method; this is the price paid to correct the biases induced by covariate measurement error. While the differences in the estimation of h and d between the naive and proposed methods are not striking, our approaches still perform better than the naive approach.

Misspecification of the measurement error distribution
To assess the sensitivity of the proposed methods to misspecification of the measurement error distribution, we repeat the simulation studies of Section 5.1.2, except that the normal distribution N(0, Σ_ε) of ε_i or ε_{ir} is replaced by the uniform distribution UNIF(0, √12 σ_ε) when generating the data. Simulation results are summarized in the bottom panels of Tables 1-9.
The results exhibit the same patterns as those in Section 5.1.2. The estimators of B and F(·) produced by the proposed methods still outperform those without suitable correction for measurement error effects, and the associated standard errors show the same patterns as before. Overall, the performance of the proposed estimators appears fairly insensitive to the misspecification of the measurement error distribution considered here.
In summary, in the presence of measurement error, the naive method yields unsatisfactory results. The proposed methods appear to successfully correct for the measurement error effects for various settings.

Analysis of ACTG 175 dataset
We implement the proposed method to analyze the AIDS Clinical Trials Group (ACTG) 175 data discussed by [10]. The ACTG 175 study was a double-blind randomized clinical trial that evaluated HIV treatment effects. The dataset is available in the R package "speff2trial" and contains measurements of 26 variables for 2139 individuals: age, wtkg, hemo, homo, drugs, karnof, oprior, z30, zprior, preanti, race, gender, str2, strat, symptom, treat, offtrt, cd40, cd420, cd496, r, cd80, cd820, cens, days, and arms. Since the variable cd496 contains missing values and r is its missingness indicator, we remove these two variables. In addition, we remove zprior, which is the constant 1 for all subjects, and treat, which indicates whether the subject received the zidovudine treatment and overlaps with arms. As a result, in addition to the survival time days and the censoring indicator cens, we have p = 20 covariates, among which cd40 is error-prone. The censoring rate of this dataset is approximately 75.6%.
Forty-four subjects were measured once for cd40 at baseline, while 2095 subjects had two replicated baseline measurements of cd40. Following [26, Section 3.6.4], let X denote log(cd40 + 1). To implement the proposed method, we consider the measurement error model (2.9) to accommodate the repeated measurements. Based on the discussion of Scenario II in Section 2.3, the estimates of Σ_ε and Σ_{X*} are Σ̂_ε = 0.035 and Σ̂_{X*} = 0.114, respectively, yielding L̂ = 0.693 by (2.12). In this study, we consider the entire dataset with error correction through L̂. In addition to X, let Z denote the vector of the remaining 19 covariates, which are assumed to be precisely measured. Consequently, we let (Û, Z^⊤)^⊤, with Û = L̂X*, denote the 20-dimensional vector of covariates used with the proposed method.
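As a quick arithmetic check, the reported value L̂ = 0.693 is the scalar case of (2.12) with Γ = 1:

```python
# Scalar arithmetic behind the reported attenuation factor for cd40
# (Scenario II with Gamma = 1, i.e., the scalar case of (2.12)):
sigma_eps_hat = 0.035        # error variance estimated from the replicates
sigma_xstar_hat = 0.114      # empirical variance of the surrogate X*
L_hat = (sigma_xstar_hat - sigma_eps_hat) / sigma_xstar_hat
print(round(L_hat, 3))       # 0.693, matching the estimate reported above
```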
The naive method and the proposed method both give d̂ = 2, suggesting that there are two directions in the central subspace, say β_1 and β_2. In Figure 1, we present the scatter plots of the observed time Y_i against β̂_j^⊤Û_i for j = 1, 2. While the naive and proposed methods show fairly similar patterns, the scatter plot based on the naive method appears slightly more variable than that of the proposed method. The similarity may be attributable to the small degree of measurement error, as indicated by the estimate Σ̂_ε = 0.035 obtained from the replicate measurements. The greater variability of the naive results agrees with a phenomenon often exhibited in settings with measurement error (e.g., [4]; [5]; [26]). We also examine the estimated functions 1 − F̂(·) for subjects i = 1, 7, and 23, displayed in Figure 2. The estimated curves based on the naive and proposed methods show similar patterns.

Discussion
Sufficient dimension reduction is a useful tool in regression analysis that reduces the dimension of the covariates while preserving their predictive information. In this paper, we deal with SDR for censored data with measurement error using cumulative distribution models. We propose valid inferential procedures to correct for measurement error effects and estimate the central subspace. Our methods are justified theoretically, and their finite sample performance is shown to be satisfactory through simulation studies.
Some further extensions are worth exploring. In the development here, the censoring time C_i is assumed to be independent of the covariates X_i. This assumption is mainly driven by its common use in the literature and enables us to use the Kaplan-Meier estimator to consistently estimate the survivor function of the censoring process. The assumption, however, is not essential. One may relax it by replacing the unconditional survivor function G(c) of C_i with the conditional survivor function G(c|X_i) = P(C_i > c|X_i) of C_i given X_i and modifying the development accordingly. Estimation of G(c|X_i) may then be performed under parametric or semiparametric regression models (e.g., [18]).
As shown in Section 2.2, the normal distribution for X and ε is required when replacing X* by U, which ensures that the invariance law holds ([14]). When X does not follow a normal distribution but has a large dimension, Li and Yin [14, Section 6] discussed approximate invariance using low dimensional projections of the covariates. It would be interesting to examine whether this scheme can be combined with the estimation procedure in Section 3 to handle non-normally distributed, error-contaminated covariates. Another interesting direction, suggested by the Associate Editor, is to relax the normality assumption on X by assuming X to be elliptically symmetric [8].
The development here considers the case where the components of X are all continuous random variables. In applications, however, the error-prone vector X may contain discrete components. It would be interesting to generalize our methods to accommodate this circumstance, and such research warrants careful exploration.

Appendix A: Regularity Conditions

Conditions (C1) to (C3) are regular assumptions in survival analysis for establishing the asymptotic properties (e.g., [1]). The independence of C_i and X_i is commonly imposed in survival analysis (e.g., [22]). Condition (C4) guides the choice of a suitable bandwidth and is used to establish the √n-consistency of B̂ [13]. Condition (C5) is a routine requirement used to establish the asymptotic distribution of B̂. Consistency of Ĝ(·) in Condition (C6) is imposed to establish the asymptotic properties and was also used by other authors such as [18]. Requiring Ĝ(·) to be bounded away from zero avoids numerical instability in extreme situations, as noted by [29].

Appendix B: Technical Lemmas
Taking the difference between (4.1) and (4.3) and between (4.2) and (4.3), we further define the quantities in (B.1)-(B.3). In the following lemmas, we present the convergence rates of (B.1) and (B.3).
Proof: The argument follows [20]. Lemma 2.12 in [20] implies that the three classes are Euclidean, and thus Lemma 2.14 in [20] indicates that their combination is also Euclidean. As a result, by Theorem II.37 in [21] and derivations similar to those in [9], the stated rate follows. Since L is estimated from either repeated measurements or a validation sample, derivations similar to those of Theorem 1 in [27] yield the analogous rate. As a result, if L is unknown and L̂ is its estimator, then the stated bound holds.

Proof:
For j = 0, by expressions (3.4) and (4.1), we obtain the decomposition, where the fourth equality is due to adding and subtracting the term F̂. Therefore, combining (B.13) and (B.14) gives (B.9) with j = 0; a similar procedure gives (B.9) with j = 1. We next derive (B.10) for j = 0, 1. The second term of (B.15) has been derived, so the remaining target is the first term of (B.15). A Taylor series expansion of ∇^j_{vec(B),L} F̂(y, B^⊤u) with respect to L gives the required expression. In addition, define the quantities in (B.16)-(B.18). In the next lemma, we examine the behavior of the CV value.

Proof:
Let the quantities S_1, …, S_4 and R_1, …, R_6 be defined as above. Then the cross-validation criterion (3.5) can be decomposed accordingly. To study the uniform consistency of CV(B, d, h) and ECV(B, d, h), we consider the following two cases.

Case 1: S_{T|U} ⊆ S(B).
In this case, we have that F(y, B_0^⊤u) = F(y, B^⊤u) (e.g., [13]), which immediately gives the stated limit. Since S_1 takes the form of a U-statistic, applying the convergence property of U-statistics (e.g., [23, Ch. 12]) gives the limit as n → ∞. By Lemmas B.1 and B.2, S_3 can be expressed accordingly. Since the class {ζ_{j1,B}(y, u)} is Euclidean ([20, Theorems 2.13 and 2.14]), derivations similar to those in [13] yield the corresponding convergence. On the other hand, for the parameter of the measurement error model, if L is known, then S_4 = R_3 = R_6 = 0. If L is unknown and L̂ is the corresponding estimator, then, as shown in [14], L̂ estimates L at the √m-rate, so that L̂ − L = O_p(m^{−1/2}).

Case 2: S_{T|U} ⊈ S(B).
In this case, the terms S_1, S_3, R_2, R_3, and R_6 can be handled as in Case 1, but S_2, R_1, R_4, and R_5 no longer equal zero. The main goal in this case is to handle these remaining terms.
By Theorem 3.1 in [2], we obtain the stated expression, which is equivalent to the displayed bound.

Appendix C.1. Proof of Theorem 4.1
Note that L̂ is a consistent estimator in the sense that L̂ = L + o_p(1), so that Û_i = U_i + o_p(1). Hence, in the remainder of the proof, we focus on U_i. We divide the proof into two steps: in Step 1, we establish the consistency of B̂; in Step 2, we discuss the asymptotic behavior of ĥ and d̂.
Step 1: The consistency of B̂.
For any ξ > 0, we have (C.1) and (C.2), and hence (C.3). For the second term of (C.3), we further obtain the bound as n → ∞, where the first step is due to (B.17) and (B.18) and the last step is due to the consideration of the indicated event. As a result, combining (C.3) and (C.5), we obtain (C.6) as n → ∞, and therefore, by (C.6), the consistency of B̂ follows as n → ∞.

Step 2: The asymptotic performance of (d̂, ĥ).
In (C.6), set ξ = inf_{B: d < d_0} b_0²(B). Then we obtain (C.7), where the second inequality follows from the fact that P(b_0²(B̂) < inf_{B: d < d_0} b_0²(B), d̂ < d_0) = 0 by the definition of B̂. Applying (C.6) to (C.7) gives (C.8) as n → ∞. By (C.8) and the partition of the events W_CV and W_k for k = 1, · · · , 5, we obtain (C.9). For k = 2, applying (C.6) with ξ = log n / n gives (C.10) as n → ∞. For k = 3, applying (C.8) gives (C.11) as n → ∞. For k = 4 and k = 5, we obtain (C.12) and (C.13), respectively, as n → ∞. Therefore, combining (C.10)-(C.13) with (C.9) yields the desired result as n → ∞, and we conclude that d̂ and ĥ are consistent estimators.

Appendix C.2. Proof of Theorem 4.2
Note that L̂ is a consistent estimator in the sense that L̂ = L + o_p(1), so that Û_i = U_i + o_p(1); hence, in the remainder of the proof, we focus on U_i. By adding and subtracting the term F(y, B_0^⊤u), we obtain the decomposition displayed above. On the other hand, if L is known, then ∇¹_{vec(B)} CV(B_0, d_0, h_0) can be expanded, where the second equality comes from adding and subtracting the terms F(y, B_0^⊤u) and F^{(1)}(y, B_0^⊤u). For T_1, applying the convergence property of U-statistics in [12] gives the limit as n → ∞. The terms T_2 and T_3 contain ζ^{(0)}_{i,B_0}(y, u) and ζ^{(1)}_{i,B_0}(y, u) in (4.4) and (B.8), respectively, and involve F̂. If L is determined by repeated measurements, then Φ_i takes the first displayed form; if L is obtained from validation data, then Φ_i takes the second form, where Σ_{XX*} = cov(X_i, X*_i). Applying Lemma B.1 and combining (C.23) with the expression of L̂ − L gives the desired result.