Bernoulli

Semiparametric density estimation under a two-sample density ratio model

K.F. Cheng and C.K. Chu
Source: Bernoulli Volume 10, Number 4 (2004), 583-604.

Abstract

A semiparametric density estimator is proposed under a two-sample density ratio model. This model, arising naturally from case-control studies and logistic discriminant analyses, can also be regarded as a biased sampling model. Our proposed density estimate is therefore an extension of the kernel density estimate suggested by Jones for length-biased data. We show that, under the model considered, the new density estimator is not only consistent but also has the 'smallest' asymptotic variance among general nonparametric density estimators. We also show how to use the new estimate to define a procedure for testing the goodness of fit of the density ratio model. Such a test is consistent under very general alternatives. Finally, we present some results from simulations and from the analysis of two real data sets.
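
To make the idea in the abstract concrete, the following Python sketch shows one way a kernel density estimate could be built under a two-sample density ratio model. It is an illustration under stated assumptions, not the authors' exact estimator: it assumes the exponential-tilt form g(x) = exp(alpha + beta*x) f(x) common in the case-control literature, fits (alpha, beta) by logistic regression on the pooled sample, and reweights a Gaussian kernel with the resulting semiparametric weights. The function names, the Gaussian kernel, and the fixed bandwidth h are all hypothetical choices made for this example.

```python
import numpy as np


def fit_density_ratio(x0, x1, n_iter=50):
    """Estimate (alpha, beta) in g(x) = exp(alpha + beta*x) f(x).

    x0: sample from f (e.g. controls); x1: sample from g (e.g. cases).
    Assumption: the ratio is log-linear in x, so the fit reduces to a
    logistic regression of the sample label on the pooled observations.
    """
    t = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
    X = np.column_stack([np.ones_like(t), t])
    theta = np.zeros(2)
    for _ in range(n_iter):  # Newton-Raphson for the logistic likelihood
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        theta = theta + np.linalg.solve(hess, grad)
    rho = len(x1) / len(x0)
    # The logistic intercept absorbs log(rho); subtract it to recover alpha.
    alpha = theta[0] - np.log(rho)
    beta = theta[1]
    return t, alpha, beta, rho


def semiparametric_kde(grid, x0, x1, h):
    """Weighted Gaussian kernel estimate of f using both samples."""
    t, alpha, beta, rho = fit_density_ratio(x0, x1)
    # Semiparametric weights on the pooled sample; they sum to one at the
    # fitted parameters because of the intercept score equation.
    w = 1.0 / (len(x0) * (1.0 + rho * np.exp(alpha + beta * t)))
    u = (grid[:, None] - t[None, :]) / h
    kern = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)
    return kern @ w, w


# Illustrative use on simulated data: f = N(0, 1) and g = N(1, 1),
# for which the density ratio is exactly exp(-0.5 + x).
rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, size=400)
cases = rng.normal(1.0, 1.0, size=300)
grid = np.linspace(-8.0, 9.0, 1701)
f_hat, weights = semiparametric_kde(grid, controls, cases, h=0.4)
```

The design point is that every pooled observation, not just the sample drawn from f, contributes to the estimate of f, which is where a variance reduction over the ordinary one-sample kernel estimate would come from when the ratio model holds. Bandwidth selection and the goodness-of-fit test mentioned in the abstract are outside the scope of this sketch.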

Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.bj/1093265631
Mathematical Reviews number (MathSciNet): MR2076064
Zentralblatt MATH identifier: 1055.62032
Digital Object Identifier: doi:10.3150/bj/1093265631

References

[1] Cheng, K.F. and Chen, L.C. (2003) Testing goodness-of-fit of a logistic regression model with case-control data. J. Statist. Plann. Inference. To appear.
[2] de Jong, P. (1987) A central limit theorem for generalized quadratic forms. Probab. Theory Related Fields, 75, 261-277.
[3] Efron, B. and Tibshirani, R. (1996) Using specially designed exponential families for density estimation. Ann. Statist., 24, 2431-2461.
[4] Epanechnikov, V.A. (1969) Nonparametric estimation of a multidimensional probability density. Theory Probab. Appl., 14, 153-158.
[5] Fokianos, K. (2002) Merging information for semiparametric density estimation. Technical report, Department of Mathematics and Statistics, University of Cyprus.
[6] Fokianos, K., Kedem, B., Qin, J. and Short, D.A. (2001) A semiparametric approach to the one-way layout. Technometrics, 43, 56-64.
[7] Glovsky, L. and Rigrodsky, S. (1964) A developmental analysis of mentally deficient children with early histories of aphasis. Training School Bull., 61, 76-96.
[8] Hjort, N.L. and Glad, I.K. (1995) Nonparametric density estimation with a parametric start. Ann. Statist., 23, 882-904.
[9] Hosmer, D.J. and Lemeshow, S. (1989) Applied Logistic Regression. New York: Wiley.
[10] Jones, M.C. (1991) Kernel density estimation for length biased data. Biometrika, 78, 511-519.
[11] Marron, J.S. and Wand, M.P. (1992) Exact mean integrated square error. Ann. Statist., 20, 712-736.
[12] Prentice, R.L. and Pyke, R. (1979) Logistic disease incidence models and case-control studies. Biometrika, 66, 403-411.
[13] Qin, J. (1998) Inferences for case-control and semiparametric two-sample density ratio model. Biometrika, 85, 619-630.
[14] Qin, J. and Zhang, B. (1997) A goodness-of-fit test for logistic regression models based on case-control data. Biometrika, 84, 609-618.
[15] Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.
[16] Vardi, Y. (1982) Nonparametric estimation in the presence of length bias. Ann. Statist., 10, 616-620.
[17] Vardi, Y. (1985) Empirical distributions in selection bias models. Ann. Statist., 13, 178-203.
[18] White, H. (1982) Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-16.
[19] Zhang, B. (1999) A chi-squared goodness-of-fit test for logistic regression models based on case-control data. Biometrika, 86, 531-539.
[20] Zhang, B. (2000) M-estimation under a two-sample semiparametric model. Scand. J. Statist., 27, 263-280.
[21] Zhang, B. (2001) An information matrix test for logistic regression models based on case-control data. Biometrika, 88, 921-932.
[22] Zhao, L.P., Kristal, A.R. and White, E. (1996) Estimating relative risk functions in case-control studies using a nonparametric logistic regression. Amer. J. Epidemiology, 144, 598-609.

2012 © Bernoulli Society for Mathematical Statistics and Probability
