Efficient estimation in the partially linear quantile regression model for longitudinal data

Abstract: The focus of this study is efficient estimation in a quantile regression model with partially linear coefficients for longitudinal data, where repeated measurements within each subject are likely to be correlated. We propose a weighted quantile regression approach for time-invariant and time-varying coefficient estimation. The proposed approach can employ two types of weights obtained from an empirical likelihood method to account for the within-subject correlation: the global weight using all observations and the local weight using observations in the neighborhood of the time point of interest. We investigate the influence of choice of weights on asymptotic estimation efficiency and find theoretical results that are counter intuitive; it is essential to use the global weight for both time-invariant and time-varying coefficient estimation. This benefits from the within-subject correlation and prevents an adverse effect due to the weight discordance. For statistical inference, a random perturbation approach is utilized and evaluated through simulation studies. The proposed approach is also illustrated through a Multi-Center AIDS Cohort study.


Introduction
Efficient estimation has been attracting significant attention in parametric quantile regression (QR) models for longitudinal data; see [9,6,19,5,11,20,14,3]. However, this has not been investigated in semiparametric or nonparametric QR models. In response to this gap in literature, this paper thoroughly investigates estimation efficiency in the partially linear quantile regression model; for τ ∈ (0, 1), the τ th conditional quantile of the jth response from subject i measured at time t ij , denoted by Y i (t ij ), is formulated as for i = 1, . . . , n and j = 1, . . . , m i , where m 1 , . . . , m n are the number of measurements taken from n subjects and Z i (t ij ) = {Z i1 (t ij ), . . . , Z iq (t ij )} T and X i (t ij ) = {X i1 (t ij ), . . . , X ip (t ij )} T are a q-dimensional and p-dimensional covariate vector, respectively. We remark that model (1.1) allows us to explore efficient estimation in various quantile regression frameworks, such as the parametric QR model including only X i (t ij ) T α τ and the time-varying coefficient QR model including only Z i (t ij ) T β τ (t ij ). To fit the partially linear QR model for longitudinal data, [21] considers B-spline basis functions, yet it does not achieve estimation efficiency due to ignoring the within-subject correlation commonly existing in longitudinal studies. This paper extends the empirical likelihood based weighted QR approach [19] to the partial linear QR framework. We develop a weighted marginal QR and a weighted local linear QR for estimation of time-varying parameters β τ (t) and time-invariant parameters α τ , respectively. Two types of weights are considered from the empirical likelihood: the global weight using correlations of all observations and the local weight using correlations of observations in the neighborhood of the time point of interest. For comparison purposes, we also consider the equal weight where the within-subject correlation is completely ignored in estimation. We have investigated the impact of the choice of weights and have obtained very interesting results that oppose to one's intuition in accommodating the local weight and global weight for efficient estimation of β τ (t) and α τ , respectively. According to our theoretical results, the estimator of α τ obtained based on the aforementioned intuition is not always more efficient than that obtained by completely ignoring the within-subject correlation.
The key finding in our asymptotic analyses is as follows. Asymptotic estimation efficiency of α τ can be gained when the global weight is used in the weighted marginal QR for estimation of α τ . On the other hand, estimation efficiency of β τ (t) is not asymptotically affected by the choice of the weights in the sense that the limiting variances of the estimator of β τ (t) are identical with the three weights considered. One may conclude from the foregoing result that the choice of weights in estimation of β τ (t) is not crucial. However, it has a significant effect on the asymptotic estimation efficiency of α τ , since a loss of estimation efficiency of α τ has been found from the discordance between the weights used for estimation of β τ (t) and α τ . In summary, the global weight should be used in both estimation of β τ (t) and α τ to achieve estimation efficiency of α τ ; the amount of estimation efficiency for α τ gained from the proposed approach corresponds to that obtained in [19] under the parametric QR model.
For statistical inference on β τ (t) and α τ , we implement the random perturbation approach [8]. This does not require estimation of the complicated limiting variance of the estimators. Through extensive simulation studies, the proposed inference approach accomplishes good empirical coverage probabilities very close to the nominal level in all cases under consideration. In addition, the simulation results confirm that estimation efficiency of α τ can be achieved when the global weight is used for both estimation of β τ (t) and α τ .
This article is organized as follows. In Section 2, we introduce how to obtain empirical likelihood weights and propose the weighted QR approach in the partially linear QR model. In Section 3, we investigate asymptotic properties of the estimators of β τ (t) and α τ under various combinations of weights and propose the best strategy that leads to estimation efficiency of α τ asymptotically. Section 4 presents a two-step procedure to implement the proposed approach and a cross-validation approach to choose the optimal bandwidth. In Section 5, we assess the finite sample performance of the proposed approach through simulation studies and apply this to the analysis of real-life data. Regularity conditions and proofs of theoretical results are provided in the Appendix.

Estimation procedures with auxiliary information
For the ease of presentation, we denote Y ij = Y i (t ij ), Z ij = Z i (t ij ), and X ij = X i (t ij ) in (1.1) and write U ij = (t ij , Z ij , X ij ). If repeated measures within a subject are assumed independent, estimators of β τ (t) and α τ in (1.1) can be obtained as are partial quantile residuals, and K(·) is a kernel function with a bandwidth h. However, this procedure ignores the within-subject correlation and therefore can cause a loss of estimation efficiency.
In order to account for the within-subject correlation, [19] proposed a weighted QR with subject-specific weights obtained from the empirical likelihood in the parametric mean regression model. Since these weights contain auxiliary information with regards to the within-subject correlation, estimation efficiency of regression quantiles is improved. Here, we adopt auxiliary information from the partial linear mean regression model. This is the counterpart of (1.1), is a q-dimensional vector of time-varying functions, and the error ij satisfies the conditional variance σ 2 (U ij ) = E( 2 ij |U ij ) and the conditional mean E( ij |U ij ) = 0.
For efficient estimation of γ and δ(t ij ), [13] developed kernel estimating equations and profile estimating equations involving a working correlation matrix of the repeated measures, denoted by R i . Following the matrix expansion idea of quadratic inference functions [17], i.e., R −1 . . , B is are known basis matrices and b 1 , . . . , b s are unknown coefficients, we extend the ith component in kernel estimating equations as and the ith one in profile estimating equations as In what follows, we omit conditioning γ in δ(t|γ), denoted by δ(t), if no confusion arises. See Remark 2.1 below for discussion about the choice of basis matrices.
To secure auxiliary information with regards to the within-subject correlation, two types of subject-specific weights can be provided as (2.5) and Since L{δ(t)} and L(γ) consider observations in the neighborhood of the time point of interest and all observations, respectively, their corresponding weights, w i {δ(t)} and w i (γ), are called the local weight and global weight. Following [16], the optimal local weight is given To incorporate the auxiliary information for estimation of regression quantiles in the partial linear QR model (1.1), we propose weighted quantile regressions for estimation of β τ (t) and α τ . With a given α τ , the weighted local linear QR is formulated to estimate β τ (t) as where Y * ij = Y ij − X T ij α τ and w n i are weights used for time-varying coefficient estimation. Here, the superscript n in w n i stands for nonparametric estimation. Givenβ τ (t|α τ ) from (2.7), an estimator of α τ is obtained by minimizing the weighted marginal QR aŝ , w p i are weights used for time-invariant coefficient estimation, and the superscript p in w p i stands for parametric estimation. The performance of our estimation method relies on the choice of two weights, w n i and w p i , that plays an important role to achieve estimation efficiency. With the local weight w i {δ(t)} and global weight w i (γ), four combinations for w n i , w p i can be considered as Intuitively, the last one would be a natural choice because they are obtained from the empirical likelihood for their counterpart in the mean regression model. We note that w n i = w p i = 1/n is the case, where the withinsubject correlation is completely ignored, corresponding to (2.1) and (2.2). Remark 2.1. Basis matrices B i1 , . . . , B is are determined by the type of working correlation structure. For example, if R i is assumed a first-order autoregressive model denoted by AR(1), then is a symmetric matrix with 1 on the sub-diagonal entries and 0 elsewhere, and B i3 is a symmetric matrix with 1 in elements (1, 1) and (m i , m i ) and 0 elsewhere. If R i corresponds to a compound symmetry structure, two basis matrices B i1 and B i2 are required, where B i1 is an identity matrix and B i2 is a symmetric matrix with 0 on the diagonal and 1 elsewhere. See [17] for more details.

Theoretical studies
In this section, we investigate the impact of auxiliary information on asymptotic properties ofβ τ (t|α τ ) andα τ based on three weights: global weight w i (γ), local weight w i {δ(t)}, and equal weight across subjects w e i = 1/n.

Asymptotic properties ofβ τ (t|α τ )
We assume that U i (t) = {t, Z i (t), X i (t)} are independent realizations of the process U (t) = {t, Z(t), X(t)} and t ij are independent and identically distributed with a probability density function and denote f ξτ (·|U ij ) and F ξτ (·|U ij ) by the conditional density and distribution functions of ξ ij (τ ) given U ij , respectively. For simplicity of theoretical derivation, we let m i = m for all subjects, yet the theorems derived in this paper can be proved with different m i < ∞ in a straightforward manner. Accordingly, we let N = nm, simplify R −1 Theorem 1 states that the asymptotic theory ofβ τ (t|α τ ) does not depend on Thus, in what follows we omit conditioning α τ in β τ (t|α τ ). More importantly, the variances ofβ τ (t) under three different weights are asymptotically equal. The results also suggest that the auxiliary information through the empirical likelihood does not improve asymptotic estimation efficiency in the time-varying coefficient QR model, which is a simple case of model (1.1). The phenomenon that the additional term is zero can be explained as follows: as bandwidth h → 0, at most one measurement per subject is ultimately used in the estimation of β τ (t) and hence, the working correlation structure does not make an impact on the asymptotic result. The use of the global weight also fails to yield an asymptotically more efficient estimator of β τ (t) than the one assuming the independent correlation structure. This is because the convergence rate of the additional term generated by the global weight in (3.4) is faster than that of the variance ofβ τ (t).

Asymptotic properties ofα τ
According to the asymptotic result as in Theorem 1, the choice of weights among the three considered weights seems to be trivial for estimation of β τ (t). However, in the semiparametric regression, the asymptotic results of the time-invariant coefficient estimator is often influenced by those of the time-varying coefficient estimator. For example, the invariant coefficient estimator is often biased in the semiparametric regression unless undersmoothing is used [13,1]. Therefore, the choice of weights for estimation of β τ (t) may play an important role in the asymptotic properties ofα τ . As shown in [19], the natural choice of weights w p i in (2.8) for estimation of α τ is the global weight w i (γ). Thus, we first investigate asymptotic properties ofα τ with w p i = w i (γ) and discuss the impact of using other weights for estimation of α τ later.
To appreciate how the choice of weights w n i in (2.7) for estimation of β τ (t) influences estimation efficiency of α τ , its estimator is denoted byα τ (w n i ) in this section. We define

Theorem 2. Assume that conditions (A1)-(A5) and (B1)-(B3) in the Appendix hold and the global weight
Theorem 2 shows that the choice of weights for estimation of β τ (t) has a significant influence on the limiting variance of the estimator of α τ . An asymptotic variance ofα τ {w i (γ)} cannot be larger than that ofα τ [w i {δ(t)}] andα τ (w e i ), due to a nonnegative definite matrix L. In general, L is most likely to be positive definite, since the number of estimating equations in (2.4) is larger than the dimensionality of γ. It is important to note that use of the global weight in both estimation of β τ (t) and α τ does not only improve estimation efficiency of α τ , but also prevents efficiency loss from the discordance between w n i and w p i . If X(t) and Z(t) are uncorrelated across time t with E{X(t)} t = 0 or E{Z(t)} t = 0 and homogeneous errors ξ ij , then {D(t)} t = 0 holds. Consequently, it follows from B = 0 and the bias term b ατ = 0 thatα τ (w n i ) satisfies a √ n-consistency and its asymptotic variance is identical regardless of the choice of w n i in estimation of β τ (t). However, the foregoing conditions are too strong to satisfy in practice, since longitudinal data often involve heteroscedasticity and the covariates from observational studies are more likely to be correlated. To remove the bias ofα τ caused from nonparametric estimation, undersmoothing is typically needed [13,1].
When the equal weight is used for estimation of β τ (t) and α τ , the limiting variance This causes a loss of parametric estimation efficiency. In addition, use of the global weight, only for estimation of α τ , might lead to a less efficient estimator than the one obtained from completely ignoring within-subject dependence (w p i = w n i = w e i ), since ALA − BLB is not always nonnegative definite. Therefore, it is essential to use the global weight for time-invariant coefficient estimation as well as time-varying coefficient estimation to obtain the desired efficient estimator for α τ .

Inference with a perturbation approach
The conventional statistical inference on β τ (t) and α τ that plugs a consistent estimate of the limiting variance function into (3.1) and (3.5) can be very challenging in practice. A random perturbation approach provides a viable alternative [8]. We generate independent positive random variables {v i } n i=1 from a distribution having both mean and variance 1, such as an exponential distribution with mean 1. Then we obtainβ τ,v (t) andα τ,v by minimizing two weighted objective functions iteratively,

Implementation
After presenting a procedure that finds the local weight w i {δ(t)} and the global weight w i (γ), we implement estimation of β τ (t) and α τ . In the first step, we obtain two weights, w i {δ(t)} and w i (γ), via the following iterative estimating procedure: (1-1) Set the time-invariant parameter γ varies over time and fit a time-varying coefficient model, Y ij = X T ij γ(t ij ) + Z T ij δ(t ij ) + ij , using the local linear least squares to obtain an initial value ofδ(t).
(2-2) Updateβ τ (t) in (2.8) and obtainα τ that minimizes (2.8). When we estimate δ(t) and γ in the first step, both g i {δ(t)} and h i (γ) contain the diagonal variance matrix V i , which can be readily estimated using the method of moments [12]. Moreover, we need to construct a consistent estimator of ∂δ(t|γ)/∂γ in h i (γ). Following [16], we havê Then, we can easily show that Since Σ σ (t) and E[Z(t)X(t) T /σ 2 {U (t)}] can be estimated by respectively, an estimator of ∂δ(t|γ)/∂γ is Estimation of β τ (t) and α τ relies on the choice of bandwidths. We adopt a leave-one-subject-out cross-validation approach [18,23]. Given fixed bandwidths h 1 in (2.3) and h 2 in (4.1), we fit a model to all observations except the ith subject and obtainδ −i (t) andγ −i via the above first step. The cross validation is defined as

S. Kim and H. R. Cho
We choose optimal bandwidths h 1 and h 2 that minimize CV(h 1 , h 2 ). To choose a bandwidth h in (2.7), we first evaluate the bandwidth h ls through the local linear least squares as Similar to the proposed iterative estimation procedure above,δ −i (t) andγ −i can be estimated. Following [25], the bandwidth h is , where φ and Φ are standard normal density and distribution functions, respectively. To eliminate the bias terms in Theorem 1-3, at least theoretically, β τ (t) should be undersmoothed, and thus the bandwidth is adjusted as hn −2/15 [13].

Case 3 (Time dependent heteroscedastic errors):
We first obtain a global weight and a local weight using the empirical likelihood under the AR(1) working correlation structure. Next, we use the global weight in (2.8) for estimation of α 1,0.5 and α 2,0.5 at τ = 0.5 and investigate how the choice of weights in (2.7) for estimation of β 0,0.5 (t) and β 1,0.5 (t) influences the time-invariant estimation based on three different weights of w n i : the global weight, the local weight, and the equal weight. For comparison, the conventional quantile regression using the equal weight in both time-invariant and time-varying coefficient estimation is also investigated. The standard Gaussian kernel is used and the bandwidths are selected as shown in Section 4. To evaluate estimation efficiency of the proposed approach, we compute the mean squared error for time-invariant coefficients, MSE(α) =   Table 1 Empirical relative efficiency (relative to the proposed estimator with (w i (γ), w i (γ))) for Case 1-3.  Table 1 reports a relative efficiency of the proposed estimator using w n i , w p i = w i (γ), w i (γ) to the one using either w i {δ(t)}, w i (γ) , w e i , w i (γ) , or w e i , w e i , defined by the ratio of the mean squared errors and the mean integrated squared errors. The larger the value of the relative efficiency, the more efficient the proposed estimator. When the correlation coefficient ρ 0 is 0.7, the approaches utilizing the global weight for time-invariant coefficient estimation outperform the one ignoring the within-subject correlation in terms of smaller values of the mean squared errors. On the other hand, the relative efficiencies are closer to one as the within-subject correlation becomes weaker. This suggests that accommodating an informative correlation can improve the parametric estimation efficiency when the within-subject correlation is presented. When the results of the approaches using the global weight for time-invariant coefficient estimation are compared, the proposed approach using the global weight for time-varying coefficient estimation yields a more efficient estimator than the others, using either the local weight or the equal weight in all cases. This confirms that the additional parametric estimation efficiency gain can be achieved when the global weight is used for both time-invariant and time-varying coefficient estimation. On the other hand, the relative efficiencies of the estimated time-varying coefficients are all close to one, as expected from the asymptotic results ofβ τ (t) in Section 3.
We further investigate the statistical inference based on the proposed random perturbation approach using the global weight on both estimations. We evaluate the empirical standard error with 200 repetitions based on an exponential distribution with mean 1 and compute coverage probabilities of pointwise confidence intervals at the nominal 95% level. Since the coverage probabilities of varying coefficients are measured at sixteen time points, we average them out and provide those in Table 2. The results support that the proposed approach is effective on the statistical inference in that the coverage probabilities are close to the nominal level in all cases under consideration. Table 2 Empirical coverage probabilities of 95% confidence intervals for the proposed estimator.

Application to real data
In this section, we apply the proposed approach to longitudinal data from the Multi-Center AIDS Cohort study and evaluate the effects of pre-HIV infection CD4 cell count, smoking, and age at HIV infection on CD4 cell count. In general, CD4 cell count is considered a biomarker indicating the health status of HIV infected patients. The objective of this study is to explore the dynamic change of covariate effects on CD4 cell count over time and determine the trend of CD4 cell count depletion after HIV infection at different quantiles. This dataset consists of 283 homosexual men who became infected by HIV between 1984 and 1991. Although each patient was supposed to be repeatedly measured every 6 months, many patients missed some of their scheduled visits. Thus, unequal numbers of repeated measurements, which range from 1 to 14, are followed at different times. [10] illustrated the study design, methods, and medical implications in more detail. [7,4] confirmed that the baseline effect and the pre-infection CD4 cell count effect on the mean response may vary over time, yet neither smoking nor age has a significant impact on the mean response. Therefore, we postulate the quantile regression model with partially linear time-varying coefficients, } is the τ th conditional quantile of CD4 cell count at year t, Z 1 is the centered pre-infection CD4 cell count, X 1 is 1 if the subject smoked after infection and 0 otherwise, and X 2 is the individual's centered age at infection. Here, the covariates Z 1 and X 2 are computed by subtracting the averages of CD4 cell counts and age at infection from the subject pre-infection CD4 cell count and age at infection, respectively. This allows us to facilitate the biological interpretations, since β 0,τ (t) corresponds to the τ th conditional quantile of CD4 cell count at year t for a nonsmoker with an average pre-infection CD4 cell count and an average age at infection.
We assume an AR(1) working correlation structure and assess the covariate effects on the post-infection CD4 cell count over time at τ = 0.25, 0.5, and 0.75. We apply the proposed approach with the global weight to the estimation of both time-invariant and time-varying coefficients and use the cross validation approach in Section 4 for bandwidth selection under the standard Gaussian kernel. For statistical inference, the proposed perturbation approach with 200 repetitions and an exponential distribution with mean 1 is implemented to obtain the empirical distributions of the estimators of α τ and β τ (t).  Table 3 reports estimates of α 1,τ and α 2,τ along with their standard errors at τ = 0.25, 0.5, and 0.75. The results confirm that age at infection is negatively associated with the CD4 cell count regardless of quantiles, while the sign of smoking effect at τ = 0.75 is different from that of the smoking effect at τ = 0.25 and 0.5. However, both smoking and age effects are not statistically significant in all quantiles under consideration. Similar results are reported with the conditional mean model in [7,4]. Figure 1 provides the estimated time-varying coefficient curves of the baseline and the pre-CD4 cell count and their pointwise confidence intervals at the nominal 95% level. This figure indicates that the baseline CD4 cell count deteriorates over time at all quantiles under consideration, yet the decreasing rate of CD4 cell count at τ = 0.25 is higher than the one at τ = 0.75. In addition, the effect of the pre-infection CD4 cell count is statistically significant with pos-itive estimates overall at all quantiles. The pre-infection CD4 effect appears to be constant over time at τ = 0.75, while this effect decreases as time goes on at τ = 0.25 and 0.5. This finding at the median is comparable with the ones illustrated by kernel smoothing methods [22,23], an empirical likelihood [24], and a profile weighted least square [4] for the conditional mean regression.

Appendix: Proofs
Write x n y n if x n /y n → 1, x n = O(y n ) if sup n |x n /y n | < ∞, and x n = o(y n ) if x n /y n → 0. The bandwidths h 1 in (2.3) and h in (2.7) are different. However, since h 1 = O(h), for the ease of presentation we abuse notation h = h 1 in the proof. We also omit τ from ξ ij (τ ). The following conditions are needed for our asymptotic results.
(B3) Σ σ (t) and Λ τ (t) are positive definite and differentiable. G T (t) is differentiable. (B2) and (B3) are not required for the quantile regression, yet these are quite standard for the marginal mean regression [12,17], and thereby the proposed method is generally applicable for longitudinal data.
Without loss of generality, the inverse of the correlation matrix is decomposed as R −1 = a 0 I + C, where a 0 is an unknown coefficient and C is the matrix containing off-diagonal elements in R −1 . Accordingly, g i {δ(t)} in (2.3) is rewritten as and use (5.1) to prove Lemma 1 and Theorem 1. Proof of Lemma 1.

Lemma 1. Under conditions (B1)-(B3), we have
. Simple linear algebra gives where e ij is the jth element of e i . To prove (5.2) and (5.3), it is sufficient to show the convergence of block matrices of (Nh Here we show the convergence of the (2,2)th block matrix of (Nh) −1 n i=1 g i {δ(t)}g i {δ(t)} T and the 2nd block matrix of (Nh) −1 n i=1 ∂g i {δ(t)}/∂δ(t) in probability. The convergence of other block matrices can be similarly shown.
Since the kernel function K has bounded support, it is sufficient to consider |t ij −t| = O(h). By the boundedness of Z ij and Taylor's expansion By the fact that m is bounded, (5.4) and condition (B3), we can obtain 1 Nh and also 1 Nh Similarly, the variance of each component of (Nh converges to 0. This implies the assertion of Lemma 1.
For convenience, we recall important results in [16]: (C1) By a similar argument in the proof of Theorem 1 in [16] and Lemma 1, we have and M are defined in Section 3.1, and (5.13). As a result, the asymptotic distribution ofβ τ (t|α τ ) is the same as that ofβ τ (t|α τ ), and thus we assume that α τ is known. where minimizes the following re-parameterized function of Θ: Consider I n first. By (C2) and (C3), we have Recall |t ij − t| = O(h) due to the bounded support of K. Then, by the bound- and E(K 2 ij η 2 ij ) = O(ρ n /N ). By the Cauchy-Schwarz inequality and condition (A3), we have var Because d ij = O(h 2 ), the Taylor's expansion yields uniformly for all (i, j). Then we have where W(t) = diag{Σ fZ (t), μ K Σ fZ (t)} is a block diagonal matrix. By a similar argument with (5.7)-(5.9), it is easy to show that Therefore, by (5.7)-(5.10), we have the convergence in probability: By the convexity lemma [15],Θ has the Bahadur representation: From the first component ofΘ, we have the asymptotic Bahadur representation: . By the arguments in (5.7)-(5.8) and Taylor's expansion and var(R n1 ) = o{(Nh) −1/2 }. Thus, it follows from (5.11)-(5.12) that The desired result in (3.2) then easily follows from (5.13) and the independence of 1 , . . . , n .
(ii) Suppose that local weight w n i = w i {δ(t)} is used. By (C2), we have . By the law of large numbers, (5.14) By (5.14), (C1) and Lemma 1, using steps similar to those in the proof of Theorem 1 in [19], we have Similarly, we can obtain Thus, we have However, it is easy to show that Λ τ (t) T MΛ τ (t) = 0 regardless of the dependence matrix C in (5.1) because C T 2 C −1 The desired result in (3.3) follows from (5.13) and the independence of 1 , . . . , n .
(iii) Suppose that global weight w n i = w i (γ) is used. By (C2), we have n √ Nh Recall h i (·) defined in (2.4).
We have By the law of large numbers, we have