Semiparametric models and two-phase samples: Applications to Cox regression

Breslow, Norman E., norm@uw.edu; Lumley, Thomas, t.lumley@auckland.ac.nz

doi:10.1214/12-IMSCOLL906

VOL. 9 | 2013


Editor(s) M. Banerjee, F. Bunea, J. Huang, V. Koltchinskii, M. H. Maathuis

Inst. Math. Stat. (IMS) Collect., 2013: 65-77 (2013) DOI: 10.1214/12-IMSCOLL906

Abstract

A standard estimation method when fitting parametric models to data from two-phase stratified samples is inverse probability weighting of the estimating equations. In previous work we applied this approach to likelihood equations for both Euclidean and non-Euclidean parameters in semiparametric models. We proved weak convergence of the inverse probability weighted empirical process and derived an asymptotic expansion for the estimator of the Euclidean parameter. We also showed how adjustment of the sampling weights by their calibration to known totals of auxiliary variables, or their estimation using these same variables, could markedly improve efficiency.
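As a rough illustration of the two ideas in this paragraph, the sketch below draws a stratified phase-two sample from a simulated phase-one cohort, forms inverse probability (design) weights N_h/n_h, and then linearly calibrates those weights so that they reproduce the known phase-one totals of an auxiliary variable. It is a minimal pure-Python sketch under stated assumptions, not the R survey package analysis described in the abstract: the cohort is simulated, the estimating equation is the simplest possible one (a weighted total rather than Cox likelihood equations), and the names `draw_phase_two`, `calibrate`, `aux` and `y` are hypothetical.

```python
import random


def draw_phase_two(cohort, n_per_stratum, rng):
    """Stratified phase-two sample; returns (unit, design weight) pairs."""
    strata = {}
    for unit in cohort:
        strata.setdefault(unit["stratum"], []).append(unit)
    sample = []
    for label, units in sorted(strata.items()):
        n = min(n_per_stratum[label], len(units))
        for unit in rng.sample(units, n):
            # Inverse sampling probability within stratum h: N_h / n_h.
            sample.append((unit, len(units) / n))
    return sample


def calibrate(sample, totals):
    """Linear calibration to known totals of x = (1, aux).

    Calibrated weights are w_i = d_i * (1 + lam0 + lam1 * aux_i), with
    (lam0, lam1) chosen so the weighted totals of 1 and aux match `totals`.
    """
    m00 = sum(d for _, d in sample)
    m01 = sum(d * u["aux"] for u, d in sample)
    m11 = sum(d * u["aux"] ** 2 for u, d in sample)
    r0 = totals[0] - m00  # shortfall in the estimated cohort size
    r1 = totals[1] - m01  # shortfall in the estimated total of aux
    det = m00 * m11 - m01 ** 2
    lam0 = (m11 * r0 - m01 * r1) / det
    lam1 = (m00 * r1 - m01 * r0) / det
    return [(u, d * (1.0 + lam0 + lam1 * u["aux"])) for u, d in sample]


# Simulated phase-one cohort (an assumption for illustration only):
# `aux` is known for everyone, `y` is observed only at phase two.
rng = random.Random(2013)
cohort = []
for _ in range(2000):
    aux = rng.gauss(0.0, 1.0)
    cohort.append({
        "aux": aux,
        "stratum": "high" if aux > 1.0 else "low",
        "y": 2.0 * aux + rng.gauss(0.0, 0.5),
    })

sample = draw_phase_two(cohort, {"high": 100, "low": 100}, rng)
known_totals = (len(cohort), sum(u["aux"] for u in cohort))
calibrated = calibrate(sample, known_totals)

true_total = sum(u["y"] for u in cohort)
ipw_total = sum(d * u["y"] for u, d in sample)      # design weights only
cal_total = sum(w * u["y"] for u, w in calibrated)  # calibrated weights
print(true_total, ipw_total, cal_total)
```

Because `y` is strongly correlated with `aux` in this simulation, the calibrated total will typically fall closer to the full-cohort total than the purely design-weighted one, which is the efficiency gain the paragraph refers to. The gain depends on how well the auxiliary variable predicts the quantity being estimated.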

Here we consider joint estimation of Euclidean and non-Euclidean parameters. Our asymptotic expansion for the non-Euclidean parameter is apparently new even in the special case of simple random sampling. The results are applied to estimation of survival probabilities for individual subjects using the regression coefficients (log hazard ratios) and baseline cumulative hazard function of the Cox proportional hazards model. Expressions derived for the variances of regression coefficients and cumulative hazards estimated after calibration of the weights aid construction of the auxiliary variables used for adjustment. We demonstrate empirically the improvement offered by calibration or estimation of the weights via simulation of two-phase stratified samples using publicly available data from the National Wilms Tumor Study and data analysis with the R survey package.
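For orientation, the quantities named in this paragraph can be written out in the standard notation for the Cox model. The symbols below are assumptions introduced for illustration (they do not appear in the abstract): \beta is the vector of log hazard ratios, \Lambda_0 the baseline cumulative hazard, z a subject's covariate vector, (T_i, \Delta_i) the observed time and event indicator, and w_i the sampling weight for a phase-two subject (design-based, calibrated, or estimated). The second line is the usual weighted form of the Breslow estimator, given as a sketch rather than as the paper's exact expression.

```latex
% Survival probability for a subject with covariates z under the Cox model
S(t \mid z) = \exp\!\left\{ -\Lambda_0(t)\, e^{\beta^{\top} z} \right\}

% Weighted Breslow-type estimator of the baseline cumulative hazard
\hat{\Lambda}_0(t) = \sum_{i:\, T_i \le t,\ \Delta_i = 1}
  \frac{w_i}{\sum_{j:\, T_j \ge T_i} w_j\, e^{\hat{\beta}^{\top} z_j}}
```

Plugging \hat{\beta} and \hat{\Lambda}_0 into the first expression gives the estimated survival probability for an individual subject, which is why the joint behaviour of the Euclidean and non-Euclidean estimators matters.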