The Annals of Statistics

Large sample theory of maximum likelihood estimates in semiparametric biased sampling models

Peter B. Gilbert



Vardi [Ann. Statist. 13 178-203 (1985)] introduced an $s$-sample biased sampling model with known selection weight functions, gave a condition under which the common underlying probability distribution $G$ is uniquely estimable and developed a simple procedure for computing the nonparametric maximum likelihood estimator (NPMLE) $\mathbb{G}_n$ of $G$. Gill, Vardi and Wellner thoroughly described the large sample properties of Vardi’s NPMLE, giving results on uniform consistency, convergence of $\sqrt{n}(\mathbb{G}_n-G)$ to a Gaussian process and asymptotic efficiency of $\mathbb{G}_n$. Gilbert, Lele and Vardi considered the class of semiparametric $s$-sample biased sampling models formed by allowing the weight functions to depend on an unknown finite-dimensional parameter $\theta$. They extended Vardi’s estimation approach by developing a simple two-step estimation procedure in which $\hat{\theta}_n$ is obtained by maximizing a profile partial likelihood and $\mathbb{G}_n \equiv \mathbb{G}_n(\hat{\theta}_n)$ is obtained by evaluating Vardi’s NPMLE at $\hat{\theta}_n$. Here we examine the large sample behavior of the resulting joint MLE $(\hat{\theta}_n,\mathbb{G}_n)$, characterizing conditions on the selection weight functions and data under which $(\hat{\theta}_n, \mathbb{G}_n)$ is uniformly consistent, asymptotically Gaussian and efficient.

Examples illustrated here include clinical trials (especially HIV vaccine efficacy trials), choice-based sampling in econometrics and case-control studies in biostatistics.
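
As an informal illustration of the known-weight case described above, the following minimal Python sketch computes an NPMLE of $G$ by plain fixed-point iteration on the self-consistency equations, in which the mass at each pooled observation $t_j$ is proportional to $1/\sum_i n_i w_i(t_j)/W_i$ with $W_i = \sum_j w_i(t_j)p_j$. This is an assumption-laden sketch, not the computational procedure of Vardi (1985) itself: the function name `vardi_npmle`, its arguments and the stopping rule are hypothetical, the weight functions are treated as known (no $\theta$), and the iteration presumes the estimability condition holds so that a unique solution exists.

```python
def vardi_npmle(samples, weights, tol=1e-12, max_iter=10_000):
    """Sketch of an NPMLE of G in an s-sample biased sampling model.

    samples: list of s lists of observations (sample i drawn with bias w_i).
    weights: list of s known, nonnegative weight functions w_i(y).
    Returns the pooled support points and the estimated masses on them.
    Hypothetical helper for illustration only.
    """
    # Pool all observations; tied values are simply kept as separate atoms.
    points = [y for sample in samples for y in sample]
    sizes = [len(sample) for sample in samples]
    # w[i][j] = w_i evaluated at pooled point j.
    w = [[w_i(y) for y in points] for w_i in weights]
    m = len(points)
    p = [1.0 / m] * m  # start from the uniform distribution on the pooled points

    for _ in range(max_iter):
        # Normalizing constants W_i = sum_j w_i(t_j) p_j under the current p.
        W = [sum(w_i[j] * p[j] for j in range(m)) for w_i in w]
        # Self-consistency update: p_j proportional to 1 / sum_i n_i w_i(t_j) / W_i.
        raw = [
            1.0 / sum(n_i * w_i[j] / W_i for n_i, w_i, W_i in zip(sizes, w, W))
            for j in range(m)
        ]
        total = sum(raw)
        new_p = [r / total for r in raw]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return points, new_p
        p = new_p
    return points, p


# One length-biased sample (w(y) = y): masses are proportional to 1 / y.
points, masses = vardi_npmle([[1.0, 2.0, 4.0]], [lambda y: y])
```

In the one-sample length-biased case the estimate reduces to masses proportional to $1/y_j$, which gives a quick sanity check on the iteration. In the semiparametric setting of the abstract, a routine of this kind would be evaluated at $\hat{\theta}_n$ after the profile partial likelihood step, which this sketch does not attempt.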

Article information

Ann. Statist., Volume 28, Number 1 (2000), 151-194.

First available in Project Euclid: 14 March 2002


Primary: 60G05 (Foundations of stochastic processes), 62F05 (Asymptotic properties of tests)
Secondary: 62G20 (Asymptotic properties), 62G30 (Order statistics; empirical distribution functions)

Keywords: Asymptotic theory; choice-based sampling; clinical trials; empirical processes; generalized logistic regression; HIV vaccine trial; nonparametric maximum likelihood; selection bias models; Vardi’s estimator


Gilbert, Peter B. Large sample theory of maximum likelihood estimates in semiparametric biased sampling models. Ann. Statist. 28 (2000), no. 1, 151--194. doi:10.1214/aos/1016120368.



  • Agresti, A. (1984). Analysis of Ordinal Categorical Data. Wiley, New York.
  • Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. Ann. Statist. 10 1100-1120.
  • Begun, J. M., Hall, W. J., Huang, W-M. and Wellner, J. A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 11 432-452.
  • Bickel, P. J., Klaassen, C. A., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press, Baltimore, MD.
  • Cosslett, S. R. (1981). Maximum likelihood estimator for choice-based samples. Econometrica 49 1289-1316.
  • Cox, D. R. and Snell, E. J. (1989). The Analysis of Binary Data. 2nd ed. Chapman and Hall, London.
  • Dudley, R. M. (1984). A course on empirical processes. École d'Été de Probabilités de Saint-Flour XII. Lecture Notes in Math. 1097 1-142. Springer, New York.
  • Dudley, R. M. (1985). An extended Wichura theorem, definitions of Donsker classes, and weighted empirical processes. Probability in Banach Spaces V. Lecture Notes in Math. 1153 1306-1326. Springer, New York.
  • Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data. MIT Press.
  • Gilbert, P. B. (1996). Sieve analysis: statistical methods for assessing differential vaccine protection against HIV types. Ph.D. dissertation, Univ. Washington.
  • Gilbert, P. B., Lele, S. R. and Vardi, Y. (1999). Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika 86 27-43.
  • Gilbert, P. B., Self, S. G. and Ashby, M. A. (1998). Statistical methods for assessing differential vaccine protection against HIV types. Biometrics 54 799-814.
  • Gill, R. D., Vardi, Y. and Wellner, J. A. (1988). Large sample theory of empirical distributions in biased sampling models. Ann. Statist. 16 1069-1112.
  • Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symp. Math. Statist. Probab. 1 221-233. Univ. California Press, Berkeley.
  • Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887-906.
  • Manski, C. F. (1993). The selection problem in econometrics and statistics. In Handbook of Statistics 11 (G. S. Maddala, C. R. Rao and H. D. Vinod, Eds.) 73-84. North-Holland, Amsterdam.
  • Manski, C. F. and Lerman, S. R. (1977). The estimation of choice probabilities from choice-based samples. Econometrica 45 1977-1988.
  • Ossiander, M. (1987). A central limit theorem under metric entropy with L2 bracketing. Ann. Probab. 15 897-919.
  • Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
  • Qin, J. (1998). Inferences for case-control and semiparametric two-sample density ratio models. Biometrika 85 619-630.
  • Sun, J. and Woodroofe, M. (1997). Semiparametric estimates for biased sampling models. Statist. Sinica 7 545-575.
  • Tricomi, F. G. (1957). Integral Equations. Interscience, New York.
  • Van der Vaart, A. (1994). Bracketing smooth functions. Stochast. Process. Appl. 52 93-105.
  • Van der Vaart, A. (1995). Efficiency of infinite dimensional M-estimators. Statist. Neerlandica 49 9-30.
  • Van der Vaart, A. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
  • Vardi, Y. (1985). Empirical distributions in selection bias models. Ann. Statist. 13 178-203.
  • Wellner, J. A. and Zhan, Y. (1998). Bootstrapping Z-estimators. Technical Report 308, Dept. Statistics, Univ. Washington.