## The Annals of Statistics

### Large sample theory of maximum likelihood estimates in semiparametric biased sampling models

Peter B. Gilbert

#### Abstract

Vardi [Ann. Statist. 13 178-203 (1985)] introduced an $s$-sample biased sampling model with known selection weight functions, gave a condition under which the common underlying probability distribution $G$ is uniquely estimable and developed a simple procedure for computing the nonparametric maximum likelihood estimator (NPMLE) $\mathbb{G}_n$ of $G$. Gill, Vardi and Wellner thoroughly described the large sample properties of Vardi’s NPMLE, giving results on uniform consistency, convergence of $\sqrt{n}(\mathbb{G}_n-G)$ to a Gaussian process and asymptotic efficiency of $\mathbb{G}_n$. Gilbert, Lele and Vardi considered the class of semiparametric $s$-sample biased sampling models formed by allowing the weight functions to depend on an unknown finite-dimensional parameter $\theta$. They extended Vardi’s estimation approach by developing a simple two-step estimation procedure in which $\hat{\theta}_n$ is obtained by maximizing a profile partial likelihood and $\mathbb{G}_n \equiv \mathbb{G}_n(\hat{\theta}_n)$ is obtained by evaluating Vardi’s NPMLE at $\hat{\theta}_n$. Here we examine the large sample behavior of the resulting joint MLE $(\hat{\theta}_n,\mathbb{G}_n)$, characterizing conditions on the selection weight functions and data in order that $(\hat{\theta}_n, \mathbb{G}_n)$ is uniformly consistent, asymptotically Gaussian and efficient.

Examples illustrated here include clinical trials (especially HIV vaccine efficacy trials), choice-based sampling in econometrics and case-control studies in biostatistics.
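
For orientation, the model described above can be written out as follows. This is a sketch in the usual notation for Vardi-type biased sampling models; the symbols $F_i$, $w_i$, $W_i$ and $L_n^{\mathrm{pro}}$ are assumptions introduced here for illustration and are not taken from this page. The $i$th of the $s$ samples is drawn from a version of $G$ reweighted by the selection weight function $w_i(\cdot,\theta)$, and the two-step procedure plugs the maximizer of the profile partial likelihood into Vardi’s NPMLE:

$$
\begin{aligned}
F_i(A;\theta,G) &= \frac{\int_A w_i(y,\theta)\,dG(y)}{W_i(\theta,G)}, \qquad W_i(\theta,G) = \int w_i(y,\theta)\,dG(y), \qquad i=1,\dots,s,\\
\hat{\theta}_n &= \arg\max_{\theta}\, L_n^{\mathrm{pro}}(\theta), \qquad \mathbb{G}_n = \mathbb{G}_n(\hat{\theta}_n).
\end{aligned}
$$

When the weight functions do not depend on $\theta$, the first line reduces to Vardi’s original model with known weights.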

#### Article information

Source
Ann. Statist., Volume 28, Number 1 (2000), 151-194.

Dates
First available in Project Euclid: 14 March 2002

Permanent link to this document
https://projecteuclid.org/euclid.aos/1016120368

Digital Object Identifier
doi:10.1214/aos/1016120368

Mathematical Reviews number (MathSciNet)
MR1762907

Zentralblatt MATH identifier
1106.60302

#### Citation

Gilbert, Peter B. Large sample theory of maximum likelihood estimates in semiparametric biased sampling models. Ann. Statist. 28 (2000), no. 1, 151-194. doi:10.1214/aos/1016120368. https://projecteuclid.org/euclid.aos/1016120368

#### References

• Agresti, A. (1984). Analysis of Ordinal Categorical Data. Wiley, New York.
• Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. Ann. Statist. 10 1100-1120.
• Begun, J. M., Hall, W. J., Huang, W-M. and Wellner, J. A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 11 432-452.
• Bickel, P. J., Klaassen, C. A., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press, Baltimore, MD.
• Cosslett, S. R. (1981). Maximum likelihood estimator for choice based samples. Econometrica 49 1289-1316.
• Cox, D. R. and Snell, E. J. (1989). The Analysis of Binary Data. 2nd ed. Chapman and Hall, London.
• Dudley, R. M. (1984). A course on empirical processes. École d'Été de Probabilités de Saint-Flour XII. Lecture Notes in Math. 1097 1-142. Springer, New York.
• Dudley, R. M. (1985). An extended Wichura theorem, definitions of Donsker classes, and weighted empirical processes. Probability in Banach Spaces V. Lecture Notes in Math. 1153 1306-1326. Springer, New York.
• Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data. MIT Press.
• Gilbert, P. B. (1996). Sieve analysis: statistical methods for assessing differential vaccine protection against HIV types. Ph.D. dissertation, Univ. Washington.
• Gilbert, P. B., Lele, S. R. and Vardi, Y. (1999). Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika 86 27-43.
• Gilbert, P. B., Self, S. G. and Ashby, M. A. (1998). Statistical methods for assessing differential vaccine protection against HIV types. Biometrics 54 799-814.
• Gill, R. D., Vardi, Y. and Wellner, J. A. (1988). Large sample theory of empirical distributions in biased sampling models. Ann. Statist. 16 1069-1112.
• Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symp. Math. Statist. Probab. 1 221-233. Univ. California Press, Berkeley.
• Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887-906.
• Manski, C. F. (1993). The selection problem in econometrics and statistics. In Handbook of Statistics 11 (G. S. Maddala, C. R. Rao and H. D. Vinod, Eds.) 73-84. North-Holland, Amsterdam.
• Manski, C. F. and Lerman, S. R. (1977). The estimation of choice probabilities from choice-based samples. Econometrica 45 1977-1988.
• Ossiander, M. (1987). A central limit theorem under metric entropy with L2 bracketing. Ann. Probab. 15 897-919.
• Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
• Qin, J. (1998). Inferences for case-control and semiparametric two-sample density ratio models. Biometrika 85 619-630.
• Sun, J. and Woodroofe, M. (1997). Semiparametric estimates for biased sampling models. Statist. Sinica 7 545-575.
• Tricomi, F. G. (1957). Integral Equations. Interscience, New York.
• Van der Vaart, A. (1994). Bracketing smooth functions. Stochast. Process. Appl. 52 93-105.
• Van der Vaart, A. (1995). Efficiency of infinite dimensional M-estimators. Statist. Neerlandica 49 9-30.
• Van der Vaart, A. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
• Vardi, Y. (1985). Empirical distributions in selection bias models. Ann. Statist. 13 178-203.
• Wellner, J. A. and Zhan, Y. (1998). Bootstrapping Z-estimators. Technical Report 308, Dept. Statistics, Univ. Washington.