Open Access
2019 Fully Bayesian estimation under informative sampling
Luis G. León-Novelo, Terrance D. Savitsky
Electron. J. Statist. 13(1): 1608-1645 (2019). DOI: 10.1214/19-EJS1538
Abstract

Survey data are often collected under informative sampling designs, where subject inclusion probabilities are designed to be correlated with the response variable of interest. The data modeler seeks to estimate the parameters of a population model they specify from these data. Sampling weights constructed from marginal inclusion probabilities are typically used to form an exponentiated pseudo likelihood as a plug-in estimator in a partially Bayesian pseudo posterior. We introduce the first fully Bayesian alternative, based on a Bayes rule construction, that simultaneously performs weight smoothing and estimates the population model parameters by treating the response variable(s) and inclusion probabilities as jointly randomly generated from a population distribution. We formulate conditions on known marginal and pairwise inclusion probabilities that define a class of sampling designs where $L_{1}$ consistency of the joint posterior is guaranteed. We compare the performance of the two approaches on synthetic data. We demonstrate that the credible intervals under our fully Bayesian method achieve nominal coverage. We apply our method to data from the National Health and Nutrition Examination Survey to explore the relationship between caffeine consumption and systolic blood pressure.
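
To make the plug-in approach mentioned in the abstract concrete, the following is a minimal sketch of an exponentiated (weighted) pseudo log-likelihood. It illustrates only the weighting idea, not the authors' fully Bayesian method. The normal population model, the rescaling of the weights to sum to the sample size, the function names, and the toy data are all assumptions made for this illustration.

```python
import math


def normalized_weights(inclusion_probs):
    """Inverse-probability weights, rescaled to sum to the sample size n (an assumed convention)."""
    raw = [1.0 / p for p in inclusion_probs]
    n = len(raw)
    total = sum(raw)
    return [n * w / total for w in raw]


def pseudo_log_likelihood(mu, sigma, y, weights):
    """Weighted normal log-likelihood: sum_i w_i * log N(y_i | mu, sigma)."""
    const = -0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return sum(
        w * (const - 0.5 * ((yi - mu) / sigma) ** 2)
        for yi, w in zip(y, weights)
    )


def pseudo_mle_mean(y, weights):
    """Value of mu that maximizes the pseudo log-likelihood: the weighted mean."""
    return sum(w * yi for yi, w in zip(y, weights)) / sum(weights)


# Hypothetical toy sample: larger responses were sampled with higher probability,
# so the unweighted sample mean overstates the population mean.
y = [1.0, 2.0, 4.0, 8.0]
pi = [0.1, 0.2, 0.4, 0.8]
w = normalized_weights(pi)

unweighted_mean = sum(y) / len(y)
weighted_mean = pseudo_mle_mean(y, w)
```

In this toy sample the weighted estimate falls below the unweighted sample mean, because units with large responses were over-sampled and are therefore down-weighted. A pseudo posterior multiplies the exponentiated pseudo likelihood by a prior on the model parameters; the weights enter as fixed plug-in quantities, which is the feature the abstract contrasts with a joint model for responses and inclusion probabilities.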

Luis G. León-Novelo and Terrance D. Savitsky "Fully Bayesian estimation under informative sampling," Electronic Journal of Statistics 13(1), 1608-1645, (2019). https://doi.org/10.1214/19-EJS1538
Received: 1 July 2018; Published: 2019