Electronic Journal of Statistics

Fully Bayesian estimation under informative sampling

Luis G. León-Novelo and Terrance D. Savitsky


Abstract

Survey data are often collected under informative sampling designs, where subject inclusion probabilities are designed to be correlated with the response variable of interest. The data modeler seeks to estimate, from these data, the parameters of a population model they specify. Sampling weights constructed from marginal inclusion probabilities are typically used to form an exponentiated pseudo likelihood, which serves as a plug-in estimator in a partially Bayesian pseudo posterior. We introduce the first fully Bayesian alternative, based on a Bayes rule construction, that simultaneously performs weight smoothing and estimates the population model parameters by treating the response variable(s) and inclusion probabilities as jointly randomly generated from a population distribution. We formulate conditions on known marginal and pairwise inclusion probabilities that define a class of sampling designs where $L_{1}$ consistency of the joint posterior is guaranteed. We compare the performance of the two approaches on synthetic data and demonstrate that the credible intervals under our fully Bayesian method achieve nominal coverage. We apply our method to data from the National Health and Nutrition Examination Survey to explore the relationship between caffeine consumption and systolic blood pressure.
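
The two constructions contrasted in the abstract can be sketched in symbols. The notation here is an assumption made for illustration and does not appear in the abstract: $y_{i}$ is the response of unit $i$, $\pi_{i}$ its marginal inclusion probability, $S$ the observed sample of size $n$, $\theta$ the population model parameters, and $\kappa$ the parameters of a hypothetical conditional model $p(\pi_{i} \mid y_{i}, \kappa)$ for the inclusion probabilities. Under these assumptions, the plug-in pseudo posterior exponentiates each likelihood contribution by a normalized sampling weight:

$$
\pi\left(\theta \mid \{y_{i}\}_{i \in S}, \{\tilde{w}_{i}\}_{i \in S}\right) \;\propto\; \left[\prod_{i \in S} p(y_{i} \mid \theta)^{\tilde{w}_{i}}\right] \pi(\theta),
\qquad \tilde{w}_{i} = \frac{n / \pi_{i}}{\sum_{j \in S} 1/\pi_{j}} .
$$

A Bayes rule construction instead conditions the joint population density of $(y_{i}, \pi_{i})$ on inclusion in the sample. Using $\Pr(i \in S \mid y_{i}, \pi_{i}) = \pi_{i}$, this gives

$$
p(y_{i}, \pi_{i} \mid i \in S, \theta, \kappa)
= \frac{\pi_{i}\, p(\pi_{i} \mid y_{i}, \kappa)\, p(y_{i} \mid \theta)}
{\mathbb{E}_{y \mid \theta}\!\left[\mathbb{E}\left(\pi \mid y, \kappa\right)\right]} ,
$$

where the denominator is $\Pr(i \in S \mid \theta, \kappa) = \iint \pi\, p(\pi \mid y, \kappa)\, p(y \mid \theta)\, d\pi\, dy$. In this sketch the inclusion probabilities are modeled rather than plugged in as fixed weights, which is one way to read the "weight smoothing" mentioned in the abstract; the exact form used in the paper, including any dependence on covariates, should be taken from the full text.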

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 1608-1645.

Dates
Received: July 2018
First available in Project Euclid: 17 April 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1555466479

Digital Object Identifier
doi:10.1214/19-EJS1538

Mathematical Reviews number (MathSciNet)
MR3939589

Zentralblatt MATH identifier
07056159

Keywords
Bayesian penalized B-splines; informative sampling; inclusion probabilities; NHANES; sampling weights; survey sampling

Rights
Creative Commons Attribution 4.0 International License.

Citation

León-Novelo, Luis G.; Savitsky, Terrance D. Fully Bayesian estimation under informative sampling. Electron. J. Statist. 13 (2019), no. 1, 1608-1645. doi:10.1214/19-EJS1538. https://projecteuclid.org/euclid.ejs/1555466479


