The Annals of Statistics

Large sample theory for semiparametric regression models with two-phase, outcome dependent sampling

Norman Breslow, Brad McNeney, and Jon A. Wellner

Full-text: Open access

Abstract

Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part agrees with the more general information bound calculations of Robins, Hsieh and Newey (1995). By verifying the conditions of Murphy and van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.

Article information

Source
Ann. Statist. Volume 31, Number 4 (2003), 1110-1139.

Dates
First available in Project Euclid: 31 July 2003

Permanent link to this document
http://projecteuclid.org/euclid.aos/1059655907

Digital Object Identifier
doi:10.1214/aos/1059655907

Mathematical Reviews number (MathSciNet)
MR2001644

Zentralblatt MATH identifier
02077793

Subjects
Primary: 60F05: Central limit and other weak theorems 60F17: Functional limit theorems; invariance principles
Secondary: 60J65: Brownian motion [See also 58J65] 60J70: Applications of Brownian motions and diffusion theory (population genetics, absorption problems, etc.) [See also 92Dxx]

Keywords
asymptotic distributions; asymptotic efficiency; consistency; covariates; empirical processes; information bounds; least favorable; maximum likelihood; missing data; profile likelihood; outcome dependent; stratified sampling; two-phase; $Z$-theorem

Citation

Breslow, Norman; McNeney, Brad; Wellner, Jon A. Large sample theory for semiparametric regression models with two-phase, outcome dependent sampling. Ann. Statist. 31 (2003), no. 4, 1110-1139. doi:10.1214/aos/1059655907. http://projecteuclid.org/euclid.aos/1059655907.



References

  • BEGUN, J. M., HALL, W. J., HUANG, W.-M. and WELLNER, J. A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 11 432-452.
  • BICKEL, P. J., KLAASSEN, C. A. J., RITOV, Y. and WELLNER, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press, Baltimore, MD.
  • BRESLOW, N. E. and CHATTERJEE, N. (1999). Design and analysis of two-phase studies with binary outcomes applied to Wilms tumour prognosis. Appl. Statist. 48 457-468.
  • BRESLOW, N. E. and HOLUBKOV, R. (1997). Maximum likelihood estimation of logistic regression parameters under two-phase, outcome-dependent sampling. J. Roy. Statist. Soc. Ser. B 59 447-461.
  • BRESLOW, N. E., MCNENEY, B. and WELLNER, J. A. (2000). Large sample theory for semiparametric regression models with two-phase, outcome dependent sampling. Technical Report 381, Dept. Statistics, Univ. Washington.
  • BRESLOW, N. E., ROBINS, J. M. and WELLNER, J. A. (2000). On the semi-parametric efficiency of logistic regression under case-control sampling. Bernoulli 6 447-455.
  • CHATTERJEE, N., CHEN, Y. H. and BRESLOW, N. E. (2003). A pseudoscore estimator for regression problems with two-phase sampling. J. Amer. Statist. Assoc. 98 158-168.
  • HUBER, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Probab. 1 221-233. Univ. California Press.
  • LAWLESS, J. F., KALBFLEISCH, J. D. and WILD, C. J. (1999). Semiparametric methods for response-selective and missing data problems in regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 413-438.
  • MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
  • MCNENEY, B. (1998). Asymptotic efficiency in semiparametric models with non-i.i.d. data. Ph.D. dissertation, Univ. Washington.
  • MURPHY, S. A. and VAN DER VAART, A. W. (1997). Semiparametric likelihood ratio inference. Ann. Statist. 25 1471-1509.
  • MURPHY, S. A. and VAN DER VAART, A. W. (1999). Observed information in semi-parametric models. Bernoulli 5 381-412.
  • MURPHY, S. A. and VAN DER VAART, A. W. (2000). On profile likelihood (with discussion). J. Amer. Statist. Assoc. 95 449-485.
  • NAN, B., EMOND, M. and WELLNER, J. A. (2000). Information bounds for regression models with missing data. Technical Report 378, Dept. Statistics, Univ. Washington.
  • POLLARD, D. (1985). New ways to prove central limit theorems. Econometric Theory 1 295-314.
  • PRENTICE, R. L. and PYKE, R. (1979). Logistic disease incidence models and case-control studies. Biometrika 66 403-411.
  • ROBINS, J. M., HSIEH, F. and NEWEY, W. (1995). Semiparametric efficient estimation of a conditional density with missing or mismeasured covariates. J. Roy. Statist. Soc. Ser. B 57 409-424.
  • ROBINS, J. M., ROTNITZKY, A. and ZHAO, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89 846-866.
  • SCOTT, A. J. and WILD, C. J. (1997). Fitting regression models to case-control data by maximum likelihood. Biometrika 84 57-71.
  • SCOTT, A. J. and WILD, C. J. (2000). Maximum likelihood for generalised case-control studies. Statistical design of medical experiments. II. J. Statist. Plann. Inference 96 3-27.
  • SELF, S. G. and PRENTICE, R. L. (1988). Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Statist. 16 64-81.
  • VAN DER VAART, A. W. (1995). Efficiency of infinite-dimensional M-estimators. Statist. Neerlandica 49 9-30.
  • VAN DER VAART, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press.
  • VAN DER VAART, A. W. and WELLNER, J. A. (1992). Existence and consistency of maximum likelihood in upgraded mixture models. J. Multivariate Anal. 43 133-146.
  • VAN DER VAART, A. W. and WELLNER, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
  • VAN DER VAART, A. W. and WELLNER, J. A. (2000). Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli classes. In High Dimensional Probability II (E. Giné, D. M. Mason and J. A. Wellner, eds.) 113-132. Birkhäuser, Boston.
  • VAN DER VAART, A. W. and WELLNER, J. A. (2001). Consistency of semiparametric maximum likelihood estimators for two-phase sampling. Canad. J. Statist. 29 269-288.
  • WELLNER, J. A. and ZHAN, Y. (1997). Bootstrapping Z-estimators. Technical Report 308, Dept. Statistics, Univ. Washington.

SEATTLE, WASHINGTON 98195-7232
E-MAIL: norm@u.washington.edu

B. MCNENEY
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE
SIMON FRASER UNIVERSITY
8888 UNIVERSITY DRIVE
BURNABY, BRITISH COLUMBIA
CANADA V5A 1S6
E-MAIL: mcneney@stat.sfu.ca

J. A. WELLNER
DEPARTMENT OF STATISTICS
UNIVERSITY OF WASHINGTON
BOX 354322
SEATTLE, WASHINGTON 98195-4322
E-MAIL: jaw@stat.washington.edu