Source: Ann. Statist. Volume 37, Number 5A (2009), 2351-2376.
We consider a class of doubly weighted rank-based estimating methods for the transformation (or accelerated failure time) model with missing data, as arise, for example, in case-cohort studies. The weights considered may not be predictable, as is required in a martingale stochastic process formulation. We treat the general problem as a semiparametric estimating equation problem and prove asymptotic properties of the weighted estimators, with either true or estimated weights, using empirical process theory where martingale theory may fail. Simulations show that the outcome-dependent weighted method works well in finite samples in case-cohort studies and improves efficiency relative to methods based on predictable weights. The method is even more efficient when estimated weights are used, as is commonly the case in the missing data literature. The Gehan censored data Wilcoxon weights are found to be surprisingly efficient in a wide class of problems.
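As a rough illustration of the kind of estimating function the abstract refers to, the following sketch evaluates a Gehan-type rank estimating function for the accelerated failure time model in which each subject carries a sampling weight (for example, an inverse selection probability in a case-cohort sample). The function names, the argument layout, and the particular way the weights enter the pairwise sum are assumptions made for this sketch; the abstract does not give the paper's exact estimator or weight definitions, so this should not be read as the authors' method.

```python
import numpy as np


def weighted_gehan_loss(beta, log_time, delta, Z, w):
    """Convex Gehan-type loss: sum_{i,j} w_i w_j delta_i max(e_j - e_i, 0).

    beta     : (p,) regression coefficients
    log_time : (n,) log of observed (possibly censored) times
    delta    : (n,) event indicators, 1 = failure observed, 0 = censored
    Z        : (n, p) covariate matrix
    w        : (n,) subject weights (assumed: inverse selection probabilities,
               0 for subjects whose covariates were not sampled)
    """
    e = log_time - Z @ beta                      # residuals e_i(beta)
    diff = e[None, :] - e[:, None]               # diff[i, j] = e_j - e_i
    pair_w = (w * delta)[:, None] * w[None, :]   # w_i * delta_i * w_j
    return float(np.sum(pair_w * np.maximum(diff, 0.0)))


def weighted_gehan_score(beta, log_time, delta, Z, w):
    """Estimating function: sum_{i,j} w_i w_j delta_i (Z_i - Z_j) 1{e_j > e_i}.

    This is the gradient of weighted_gehan_loss wherever the loss is
    differentiable (that is, away from ties among the residuals).
    """
    e = log_time - Z @ beta
    at_risk = (e[None, :] > e[:, None]).astype(float)   # [i, j] = 1{e_j > e_i}
    pair_w = (w * delta)[:, None] * w[None, :] * at_risk
    # sum_i (sum_j pair_w[i, j]) Z_i  minus  sum_j (sum_i pair_w[i, j]) Z_j
    return pair_w.sum(axis=1) @ Z - pair_w.sum(axis=0) @ Z
```

Because the loss is convex and piecewise linear in `beta`, a root of the estimating function can be sought by minimizing the loss with a general-purpose or linear-programming solver; setting every weight to 1 recovers the unweighted Gehan form under the same assumptions.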