Let $I_1,\ldots,I_n$ be independent but not necessarily identically distributed Bernoulli random variables, and let $X_n=\sum_{j=1}^{n} I_j$. For $\nu$ in a bounded region, a local central limit theorem expansion of $P(X_n = EX_n + \nu)$ is developed to any given degree. By conditioning, this expansion provides information on the high-order correlation structure of dependent, weighted sampling schemes of a population $E$ (a special case of which is simple random sampling), where a set $d\subset E$ is sampled with probability proportional to $\prod_{A\in d} x_A$, and the $x_A$ are positive weights associated with individuals $A\in E$. These results are used to determine the asymptotic information and to demonstrate the consistency and asymptotic normality of the conditional and unconditional logistic likelihood estimators for unmatched case-control study designs in which sets of controls of the same size are sampled with equal probability.
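The conditioning step can be checked numerically on a small population. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes the standard identification of weights with odds, $x_A = p_A/(1-p_A)$, where $p_A$ is the success probability of the Bernoulli variable attached to individual $A$. Under that assumption, conditioning independent Bernoulli indicators on $X_n = k$ should give each size-$k$ set $d$ probability proportional to $\prod_{A\in d} x_A$. The sketch also compares an exact point probability of $X_n$ with the leading-order normal density term only; it does not implement the higher-order expansion.

```python
from itertools import combinations
from math import exp, pi, prod, sqrt


def poisson_binomial_pmf(p):
    """Exact pmf of X_n = sum of independent Bernoulli(p_j), by dynamic programming."""
    pmf = [1.0]
    for pj in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - pj)   # indicator j equals 0
            nxt[k + 1] += mass * pj       # indicator j equals 1
        pmf = nxt
    return pmf


def normal_leading_term(p, k):
    """Leading-order local CLT approximation to P(X_n = k): a normal density at k."""
    mu = sum(p)
    var = sum(pj * (1.0 - pj) for pj in p)
    return exp(-((k - mu) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


def conditional_set_probs(p, k):
    """P(selected set = d | X_n = k) for independent Bernoulli(p_j) indicators."""
    n = len(p)
    total = poisson_binomial_pmf(p)[k]
    return {
        d: prod(p[j] if j in d else 1.0 - p[j] for j in range(n)) / total
        for d in combinations(range(n), k)
    }


def weighted_set_probs(x, k):
    """Sampling scheme in which a size-k set d has probability proportional to prod of x_A."""
    weights = {d: prod(x[j] for j in d) for d in combinations(range(len(x)), k)}
    z = sum(weights.values())
    return {d: w / z for d, w in weights.items()}


if __name__ == "__main__":
    p = [0.2, 0.5, 0.7, 0.4]                # hypothetical success probabilities
    x = [pj / (1.0 - pj) for pj in p]       # assumed weights: the odds p/(1-p)
    cond = conditional_set_probs(p, 2)
    wtd = weighted_set_probs(x, 2)
    for d in cond:
        print(d, round(cond[d], 6), round(wtd[d], 6))
```

With equal weights the same routine reduces to simple random sampling, since every set of size $k$ then receives the same probability.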