Institute of Mathematical Statistics Collections

Semiparametric models and two-phase samples: Applications to Cox regression

Norman E. Breslow and Thomas Lumley

Full-text: Open access

Abstract

A standard estimation method when fitting parametric models to data from two-phase stratified samples is inverse probability weighting of the estimating equations. In previous work we applied this approach to likelihood equations for both Euclidean and non-Euclidean parameters in semiparametric models. We proved weak convergence of the inverse probability weighted empirical process and derived an asymptotic expansion for the estimator of the Euclidean parameter. We also showed how adjustment of the sampling weights by their calibration to known totals of auxiliary variables, or their estimation using these same variables, could markedly improve efficiency.
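The weight adjustment described above can be illustrated with a minimal sketch of linear calibration in the sense of Deville and Särndal [14]. It assumes that each phase-two subject carries a design weight d_i = 1/pi_i and that the auxiliary variables x_i are known for every phase-one subject, so their phase-one totals are available. It uses the linear distance function, which is only one of several calibration functions; the paper's own choice may differ. The helper name `calibrate_linear` and the data are hypothetical.

```python
import numpy as np


def calibrate_linear(d, X, totals):
    """Linear calibration of design weights (hypothetical helper).

    d      : (n,) design weights 1/pi_i for the phase-two subjects
    X      : (n, p) auxiliary variables observed for those subjects
    totals : (p,) known phase-one totals of the same auxiliaries

    Returns w_i = d_i * (1 + x_i' lam), with lam chosen so that
    sum_i w_i x_i equals `totals` exactly.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Weighted cross-product matrix: sum_i d_i x_i x_i'
    M = X.T @ (d[:, None] * X)
    # Lagrange multipliers solve M lam = totals - sum_i d_i x_i
    lam = np.linalg.solve(M, totals - X.T @ d)
    return d * (1.0 + X @ lam)


# Made-up illustration: 4 phase-two subjects sampled with probability 0.5
# from a phase-one cohort of 10 whose auxiliary variable totals 27.
d = np.full(4, 2.0)
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
w = calibrate_linear(d, X, totals=[10.0, 27.0])
print(X.T @ w)  # should reproduce the phase-one totals
```

Because the first auxiliary column is an intercept, the calibrated weights also sum to the phase-one sample size. If the phase-two sample already matches the totals, the multipliers are zero and the design weights are returned unchanged. In R, the survey package mentioned in the abstract provides calibration through its `calibrate()` function, which also supports raking and other distance functions.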

Here we consider joint estimation of Euclidean and non-Euclidean parameters. Our asymptotic expansion for the non-Euclidean parameter is apparently new even in the special case of simple random sampling. The results are applied to estimation of survival probabilities for individual subjects using the regression coefficients (log hazard ratios) and baseline cumulative hazard function of the Cox proportional hazards model. Expressions derived for the variances of regression coefficients and cumulative hazards estimated after calibration of the weights aid construction of the auxiliary variables used for adjustment. We demonstrate empirically the improvement offered by calibration or estimation of the weights via simulation of two-phase stratified samples using publicly available data from the National Wilms Tumor Study and data analysis with the R survey package.
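The subject-specific survival probabilities mentioned above combine the two parameters of the Cox model through S(t | z) = exp{-Lambda_0(t) exp(beta'z)}. The sketch below assumes beta has already been estimated (for example from the weighted partial likelihood score equation) and estimates Lambda_0 with a Breslow-type estimator [5] in which each subject's contributions to the event count and to the risk set are multiplied by its sampling weight. The function names and data are hypothetical, the variance calculations developed in the paper are not attempted, and this is not presented as the authors' implementation.

```python
import numpy as np


def weighted_breslow(time, event, lin_pred, weights):
    """Weighted Breslow estimate of the baseline cumulative hazard.

    time, event : follow-up times and event indicators (1 = event)
    lin_pred    : beta'z_i for each subject, beta taken as already estimated
    weights     : sampling weights (e.g. 1/pi_i, possibly calibrated)

    Returns the distinct event times and Lambda_0 evaluated at each of them.
    Tied event times are handled with Breslow's approximation.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    w = np.asarray(weights, dtype=float)
    risk = w * np.exp(np.asarray(lin_pred, dtype=float))
    event_times = np.unique(time[event])
    increments = np.array([
        # weighted events at t divided by the weighted risk-set sum at t
        w[event & (time == t)].sum() / risk[time >= t].sum()
        for t in event_times
    ])
    return event_times, np.cumsum(increments)


def survival_prob(t, lin_pred, event_times, cumhaz):
    """S(t | z) = exp(-Lambda_0(t) * exp(beta'z)), Lambda_0 a step function."""
    k = np.searchsorted(event_times, t, side="right") - 1
    lam0 = cumhaz[k] if k >= 0 else 0.0
    return float(np.exp(-lam0 * np.exp(lin_pred)))


# Made-up illustration: with unit weights and lin_pred = 0 the estimator
# reduces to the Nelson-Aalen estimator of the cumulative hazard.
times, cumhaz = weighted_breslow(
    time=[1.0, 2.0, 3.0, 4.0], event=[1, 1, 0, 1],
    lin_pred=np.zeros(4), weights=np.ones(4),
)
print(survival_prob(2.5, 0.0, times, cumhaz))
```

Since the weights appear in both the numerator and the denominator of each increment, multiplying all of them by a common constant leaves the estimate unchanged; only their relative sizes, which reflect the two-phase design and any calibration, matter.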

Chapter information

Source
Banerjee, M., Bunea, F., Huang, J., Koltchinskii, V., and Maathuis, M. H., eds., From Probability to Statistics and Back: High-Dimensional Models and Processes -- A Festschrift in Honor of Jon A. Wellner, (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2013), 65–77

Dates
First available in Project Euclid: 8 March 2013

Permanent link to this document
https://projecteuclid.org/euclid.imsc/1362751180

Digital Object Identifier
doi:10.1214/12-IMSCOLL906

Mathematical Reviews number (MathSciNet)
MR3186749

Zentralblatt MATH identifier
1347.60008

Subjects
Primary: 60F05: Central limit and other weak theorems. 60F17: Functional limit theorems; invariance principles.
Secondary: 60J65: Brownian motion [See also 58J65]. 60J70: Applications of Brownian motions and diffusion theory (population genetics, absorption problems, etc.) [See also 92Dxx].

Keywords
Asymptotic distributions, asymptotic efficiency, calibration, empirical processes, survival analysis, stratified sampling, two-phase

Rights
Copyright © 2010, Institute of Mathematical Statistics

Citation

Breslow, Norman E.; Lumley, Thomas. Semiparametric models and two-phase samples: Applications to Cox regression. From Probability to Statistics and Back: High-Dimensional Models and Processes -- A Festschrift in Honor of Jon A. Wellner, 65--77, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2013. doi:10.1214/12-IMSCOLL906. https://projecteuclid.org/euclid.imsc/1362751180



References

  • [1] Aalen, O. O., Borgan, O. and Gjessing, H. K. (2008). Survival and Event History Analysis. Springer, New York.
  • [2] Andersen, P. K. and Gill, R. D. (1982). Cox’s regression model for counting processes: A large sample study. Annals of Statistics 10 1100–1120.
  • [3] Begun, J. M., Hall, W. J., Huang, W.-M. and Wellner, J. A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Annals of Statistics 11 432–452.
  • [4] Borgan, O., Langholz, B., Samuelsen, S. O., Goldstein, L. and Pogoda, J. (2000). Exposure stratified case-cohort designs. Lifetime Data Analysis 6 39–58.
  • [5] Breslow, N. (1974). Covariance analysis of censored survival data. Biometrics 30 89–99.
  • [6] Breslow, N. and Crowley, J. (1974). A large sample study of the life table and product limit estimates under random censorship. Annals of Statistics 2 437–453.
  • [7] Breslow, N. E. (1996). Statistics in epidemiology: The case-control study. Journal of the American Statistical Association 91 14–28.
  • [8] Breslow, N. E., Lumley, T., Ballantyne, C. M., Chambless, L. E. and Kulich, M. (2009). Improved Horvitz-Thompson estimation of model parameters from two-phase stratified samples: Applications in epidemiology. Statistics in Biosciences 1 32–49.
  • [9] Breslow, N. E., Lumley, T., Ballantyne, C. M., Chambless, L. E. and Kulich, M. (2009). Using the whole cohort in the analysis of case-cohort data. American Journal of Epidemiology 169 1398–1405.
  • [10] Breslow, N. E. and Wellner, J. A. (2007). Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scandinavian Journal of Statistics 34 86–102.
  • [11] Breslow, N. E. and Wellner, J. A. (2008). A Z-theorem with estimated nuisance parameters and correction note for “Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression”. Scandinavian Journal of Statistics 35 186–192.
  • [12] Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society (Series B) 34 187–220.
  • [13] Cox, D. R. (1975). Partial likelihood. Biometrika 62 269–276.
  • [14] Deville, J.-C. and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association 87 376–382.
  • [15] Kalbfleisch, J. D. and Lawless, J. F. (1988). Likelihood analysis of multi-state models for disease incidence and mortality. Statistics in Medicine 7 149–160.
  • [16] Kulich, M. and Lin, D. Y. (2004). Improving the efficiency of relative-risk estimation in case-cohort studies. Journal of the American Statistical Association 99 832–844.
  • [17] Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. John Wiley & Sons, Hoboken, New Jersey.
  • [18] Manski, C. F. and Lerman, S. R. (1977). The estimation of choice probabilities from choice based samples. Econometrica 45 1977–1988.
  • [19] Nan, B., Emond, M. and Wellner, J. A. (2004). Information bounds for Cox regression models with missing data. Annals of Statistics 32 723–753.
  • [20] Præstgaard, J. and Wellner, J. A. (1993). Exchangeably weighted bootstraps of the general empirical process. Annals of Probability 21 2053–2086.
  • [21] Prentice, R. L. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73 1–11.
  • [22] Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89 846–866.
  • [23] Särndal, C.-E., Swensson, B. and Wretman, J. (2003). Model Assisted Survey Sampling. Springer, New York.
  • [24] Tsiatis, A. A. (2006). Semiparametric Theory and Missing Data. Springer, New York.
  • [25] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge.