Electronic Journal of Statistics

Bayesian estimation under informative sampling

Terrance D. Savitsky and Daniell Toth

Full-text: Open access

Abstract

Bayesian analysis is increasingly popular for use in social science and other application areas where the data are observations from an informative sample. An informative sampling design leads to inclusion probabilities that are correlated with the response variable of interest. Model inference performed on the observed sample taken from the population will be biased for the population generative model under informative sampling since the balance of information in the sample data is different from that for the population. Typical approaches to account for an informative sampling design under Bayesian estimation are often difficult to implement because they require re-parameterization of the hypothesized generating model, or focus on design-based, rather than model-based, inference. We propose to construct a pseudo-posterior distribution that utilizes sampling weights based on the marginal inclusion probabilities to exponentiate the likelihood contribution of each sampled unit, which weights the information in the sample back to the population. Our approach provides a nearly automated estimation procedure applicable to any model specified by the data analyst for the population and retains the population model parameterization and posterior sampling geometry. We construct conditions on known marginal and pairwise inclusion probabilities that define a class of sampling designs where $L_{1}$ consistency of the pseudo-posterior is guaranteed. We demonstrate our method on an application concerning the Bureau of Labor Statistics Job Openings and Labor Turnover Survey.
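The weighting idea described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation; it assumes a toy population model $y_i \sim N(\mu, 1)$ with prior $\mu \sim N(0, \tau^2)$, a hypothetical logistic inclusion rule that depends on $y_i$, Poisson sampling, and weights $w_i = 1/\pi_i$ rescaled to sum to the sample size. Under those assumptions the pseudo-posterior, proportional to $\left[\prod_i p(y_i \mid \mu)^{w_i}\right] p(\mu)$, is again normal with mean $\sum_i w_i y_i / (1/\tau^2 + \sum_i w_i)$, so no MCMC is needed for the illustration.

```python
import math
import random


def simulate_informative_sample(pop_size=20000, pop_mean=1.0, seed=0):
    """Poisson-sample a toy population; inclusion probability rises with y.

    The logistic inclusion rule is a hypothetical choice made only so that
    the design is informative (pi_i is correlated with the response y_i).
    """
    rng = random.Random(seed)
    ys, pis = [], []
    for _ in range(pop_size):
        y = rng.gauss(pop_mean, 1.0)
        pi = 1.0 / (1.0 + math.exp(-(y - 3.0)))
        if rng.random() < pi:  # unit is included with probability pi
            ys.append(y)
            pis.append(pi)
    return ys, pis


def pseudo_posterior_mean(ys, weights, prior_var=100.0 ** 2):
    """Pseudo-posterior mean of mu for y_i ~ N(mu, 1), mu ~ N(0, prior_var).

    Each unit's likelihood term is raised to the power weights[i]; setting
    every weight to 1 recovers the ordinary (unweighted) posterior mean.
    """
    precision = 1.0 / prior_var + sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / precision


ys, pis = simulate_informative_sample()
n = len(ys)

# Inverse inclusion probabilities, rescaled to sum to the sample size
# (an assumption of this sketch) so the total information matches n units.
raw = [1.0 / p for p in pis]
scale = n / sum(raw)
weights = [scale * w for w in raw]

unweighted = pseudo_posterior_mean(ys, [1.0] * n)
weighted = pseudo_posterior_mean(ys, weights)

print(f"sample size: {n}")
print(f"unweighted posterior mean: {unweighted:.3f}")
print(f"weighted pseudo-posterior mean: {weighted:.3f}")
```

Because units with larger responses are over-represented in the sample, the unweighted posterior mean is pulled above the population mean of 1.0 used in the simulation, while the weighted version should land much closer to it. The same exponent-weighting would apply to a non-conjugate model by multiplying each unit's log-likelihood by its weight inside a sampler.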

Article information

Source
Electron. J. Statist., Volume 10, Number 1 (2016), 1677-1708.

Dates
Received: July 2015
First available in Project Euclid: 18 July 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1468847267

Digital Object Identifier
doi:10.1214/16-EJS1153

Mathematical Reviews number (MathSciNet)
MR3522657

Zentralblatt MATH identifier
06624498

Keywords
Survey sampling; Gaussian process; Dirichlet process; Bayesian hierarchical models; Latent models; Markov Chain Monte Carlo

Citation

Savitsky, Terrance D.; Toth, Daniell. Bayesian estimation under informative sampling. Electron. J. Statist. 10 (2016), no. 1, 1677–1708. doi:10.1214/16-EJS1153. https://projecteuclid.org/euclid.ejs/1468847267

