Bernoulli, Volume 24, Number 2 (2018), 929-955.

Asymptotics for the maximum sample likelihood estimator under informative selection from a finite population

Daniel Bonnéry, F. Jay Breidt, and François Coquet


Abstract

Inference for the parametric distribution of a response given covariates is considered under informative selection of a sample from a finite population. Under this selection, the conditional distribution of a response in the sample, given the covariates and given that it was selected for observation, is not the same as the conditional distribution of the response in the finite population, given only the covariates. It is instead a weighted version of the conditional distribution of interest. Inference must be modified to account for this informative selection. An established approach in this context is maximum “sample likelihood”, developing a weight function that reflects the informative sampling design, then treating the observations as if they were independently distributed according to the weighted distribution. While the sample likelihood methodology has been widely applied, its theoretical foundation has been less developed. A precise asymptotic description of a wide range of informative selection mechanisms is proposed. Under this framework, consistency and asymptotic normality of the maximum sample likelihood estimators are established. The theory allows for the possibility of nuisance parameters that describe the selection mechanism. The proposed regularity conditions are verifiable for various sample schemes, motivated by real problems in surveys. Simulation results for these examples illustrate the quality of the asymptotic approximations, and demonstrate a practical approach to variance estimation that combines aspects of model-based information theory and design-based variance estimation.
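As a toy illustration of the weighted-distribution idea (not one of the paper's examples), suppose the population response is exponential with rate λ and units are selected with probability proportional to their response. The conditional density in the sample is then the size-biased density λ² y exp(-λy), and maximizing the resulting sample likelihood gives λ̂ = 2n/Σy, whereas the naive estimator n/Σy ignores selection. The sketch below assumes no covariates, selection with replacement, and a known weight function; all function names are hypothetical.

```python
import random


def draw_size_biased_sample(population, n, rng):
    """Select n units with replacement, with probability proportional to y."""
    return rng.choices(population, weights=population, k=n)


def naive_mle_rate(sample):
    """Exponential-rate MLE that ignores the informative selection."""
    return len(sample) / sum(sample)


def sample_likelihood_mle_rate(sample):
    """Maximizer of the sample likelihood under size-biased selection.

    Weighted density: y * f(y; lam) / E[Y] = lam**2 * y * exp(-lam * y).
    Log-likelihood: 2 n log(lam) + sum(log y) - lam * sum(y),
    which is maximized at lam = 2 n / sum(y).
    """
    return 2 * len(sample) / sum(sample)


rng = random.Random(0)
true_rate = 2.0  # assumed population parameter for this illustration

# Finite population drawn from the model of interest.
population = [rng.expovariate(true_rate) for _ in range(200_000)]

# Informative selection: larger responses are more likely to be observed.
sample = draw_size_biased_sample(population, 2_000, rng)

naive = naive_mle_rate(sample)
corrected = sample_likelihood_mle_rate(sample)
print(f"true rate: {true_rate}, naive: {naive:.3f}, sample likelihood: {corrected:.3f}")
```

In this setup the naive estimate should settle near half the true rate, while the sample likelihood estimate should be close to it. The paper's framework is far more general, covering covariates, nuisance parameters in the selection mechanism, and dependence induced by the design.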

Article information

Source
Bernoulli, Volume 24, Number 2 (2018), 929-955.

Dates
Received: August 2015
Revised: November 2015
First available in Project Euclid: 21 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.bj/1505980883

Digital Object Identifier
doi:10.3150/16-BEJ809

Mathematical Reviews number (MathSciNet)
MR3706781

Zentralblatt MATH identifier
06778352

Keywords
complex survey
pseudo-likelihood
stratified sampling
weighted distribution

Citation

Bonnéry, Daniel; Breidt, F. Jay; Coquet, François. Asymptotics for the maximum sample likelihood estimator under informative selection from a finite population. Bernoulli 24 (2018), no. 2, 929–955. doi:10.3150/16-BEJ809. https://projecteuclid.org/euclid.bj/1505980883


