Statistical Science

Risk set sampling in epidemiologic cohort studies

Larry Goldstein and Bryan Langholz



Recent work has extended the methods for the analysis of nested case-control studies to accommodate a broad variety of risk set sampling designs. These results have implications for the design of sampled epidemiologic cohort studies. We describe a model that is a natural extension of the Cox proportional hazards model and may be used to estimate parameters from sampled risk set data. We illustrate how these techniques may be used to solve three diverse design and analysis problems from epidemiologic research.

Article information

Statist. Sci., Volume 11, Number 1 (1996), 35-53.

First available in Project Euclid: 16 September 2002

Keywords: Survival analysis, cohort sampling, case-control studies, martingale, censoring, efficiency


Langholz, Bryan; Goldstein, Larry. Risk set sampling in epidemiologic cohort studies. Statist. Sci. 11 (1996), no. 1, 35-53. doi:10.1214/ss/1032209663.


