Bayesian Analysis

Bayesian Nonparametric Weighted Sampling Inference

Yajuan Si, Natesh S. Pillai, and Andrew Gelman

Full-text: Open access


It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference in the presence of inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Families and Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency. This works because the model induces regularization for small cells, which automatically smooths the highly variable weights.
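As a rough illustration of the contrast drawn in the abstract, the following Python sketch compares a classical weighted (Hájek-type) mean with a poststratified mean in which units sharing a weight form a cell and small cells are shrunk toward the overall mean. This is not the paper's model: the simple shrinkage rule stands in for the Gaussian process regression, and the function names and the `prior_strength` parameter are hypothetical, introduced only for this example.

```python
from collections import defaultdict


def hajek_mean(y, w):
    """Classical weighted estimate of the population mean: sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def cell_summaries(y, w):
    """Group units that share a weight into cells: weight -> (n_j, cell mean)."""
    cells = defaultdict(list)
    for yi, wi in zip(y, w):
        cells[wi].append(yi)
    return {wj: (len(ys), sum(ys) / len(ys)) for wj, ys in cells.items()}


def smoothed_mean(y, w, prior_strength=2.0):
    """Poststratified mean with cell means shrunk toward the weighted grand mean.

    prior_strength is a hypothetical tuning constant (an assumption of this
    sketch, not a quantity from the paper). Setting it to 0 removes the
    shrinkage and reproduces hajek_mean exactly.
    """
    grand = hajek_mean(y, w)
    num = 0.0
    den = 0.0
    for wj, (nj, ybar) in cell_summaries(y, w).items():
        # Cells with few sampled units are pulled more strongly toward the grand mean.
        shrunk = (nj * ybar + prior_strength * grand) / (nj + prior_strength)
        # n_j * w_j approximates the cell's population size up to a constant.
        size = nj * wj
        num += size * shrunk
        den += size
    return num / den


if __name__ == "__main__":
    y = [1, 2, 3, 10]
    w = [1, 1, 1, 5]  # one unit carries a much larger weight than the rest
    print(hajek_mean(y, w))
    print(smoothed_mean(y, w, prior_strength=2.0))
```

In this toy data set a single highly weighted unit dominates the classical estimate, while the shrunken version moderates its influence. The paper's approach instead smooths the cell means as a function of the weights and also models the unknown cell population sizes.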

Article information

Bayesian Anal., Volume 10, Number 3 (2015), 605-625.

First available in Project Euclid: 2 February 2015


Keywords: survey weighting; poststratification; model-based survey inference; Gaussian process prior; Stan


Si, Yajuan; Pillai, Natesh S.; Gelman, Andrew. Bayesian Nonparametric Weighted Sampling Inference. Bayesian Anal. 10 (2015), no. 3, 605–625. doi:10.1214/14-BA924.



  • Carlson, B. L. (2008). “Fragile Families and Child Wellbeing Study: Methodology for constructing mother, father, and couple weights for core telephone surveys waves 1-4.” Technical report, Mathematica Policy Research.
  • CBS News/New York Times (1988). Monthly Poll, July 1988. Inter-university Consortium for Political and Social Research, University of Michigan.
  • Chen, Q., Elliott, M. R., and Little, R. J. (2010). “Bayesian penalized spline model-based inference for finite population proportion in unequal probability sampling.” Survey Methodology, 36(1): 23–34.
  • Cook, S., Gelman, A., and Rubin, D. B. (2006). “Validation of software for Bayesian models using posterior quantiles.” Journal of Computational and Graphical Statistics, 15: 675–692.
  • Elliott, M. R. (2007). “Bayesian weight trimming for generalized linear regression models.” Survey Methodology, 33(1): 23–34.
  • Elliott, M. R. and Little, R. J. (2000). “Model-based alternatives to trimming survey weights.” Journal of Official Statistics, 16(3): 191–209.
  • Gelman, A. (2006). “Prior distributions for variance parameters in hierarchical models.” Bayesian Analysis, 1(3): 515–533.
  • — (2007). “Struggles with survey weighting and regression modeling (with discussion).” Statistical Science, 22(2): 153–164.
  • Gelman, A. and Carlin, J. B. (2001). “Poststratification and weighting adjustments.” In Groves, R., Dillman, D., Eltinge, J., and Little, R. (eds.), Survey Nonresponse.
  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. CRC Press, London, 3rd edition.
  • Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y.-S. (2008). “A weakly informative default prior distribution for logistic and other regression models.” Annals of Applied Statistics, 2(4): 1360–1383.
  • Gelman, A. and Little, T. C. (1997). “Poststratification into many categories using hierarchical logistic regression.” Survey Methodology, 23: 127–135.
  • Gelman, A., Meng, X.-L., and Stern, H. S. (1996). “Posterior predictive assessment of model fitness via realized discrepancies.” Statistica Sinica, 6: 733–807.
  • Ghitza, Y. and Gelman, A. (2013). “Deep interactions with MRP: Election turnout and voting patterns among small electoral subgroups.” American Journal of Political Science, 57: 762–776.
  • Hájek, J. (1971). “Comment on “An Essay on the logical foundations of survey sampling” by D. Basu.” In Godambe, V. P. and Sprott, D. A. (eds.), The Foundations of Survey Sampling, 236. Holt, Rinehart and Winston.
  • Hoffman, M. D. and Gelman, A. (2014). “The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research, 15: 1351–1381.
  • Horvitz, D. G. and Thompson, D. J. (1952). “A generalization of sampling without replacement from a finite universe.” Journal of the American Statistical Association, 47(260): 663–685.
  • Lax, J. and Phillips, J. (2009a). “Gay rights in the states: Public opinion and policy responsiveness.” American Political Science Review, 103: 367–386.
  • — (2009b). “How should we estimate public opinion in the states?” American Journal of Political Science, 53: 107–121.
  • Little, R. J. (1983). “Comment on “An evaluation of model-dependent and probability-sampling inferences in sample surveys” by M. H. Hansen, W. G. Madow and B. J. Tepping.” Journal of the American Statistical Association, 78: 797–799.
  • — (1991). “Inference with survey weights.” Journal of Official Statistics, 7: 405–424.
  • — (1993). “Post-stratification: A modeler’s perspective.” Journal of the American Statistical Association, 88: 1001–1012.
  • Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. Wiley, New York.
  • Oleson, J. J., He, C., Sun, D., and Sheriff, S. (2007). “Bayesian estimation in small areas when the sampling design strata differ from the study domains.” Survey Methodology, 33: 173–185.
  • Polson, N. G. and Scott, J. G. (2012). “On the half-Cauchy prior for a global scale parameter.” Bayesian Analysis, 7(2): 1–16.
  • Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, Mass.
  • Reichman, N. E., Teitler, J. O., Garfinkel, I., and McLanahan, S. S. (2001). “Fragile Families: Sample and design.” Children and Youth Services Review, 23(4/5): 303–326.
  • Rubin, D. B. (1983). “Comment on “An evaluation of model-dependent and probability-sampling inferences in sample surveys” by M. H. Hansen, W. G. Madow and B. J. Tepping.” Journal of the American Statistical Association, 78: 803–805.
  • Stan Development Team (2014a). “Stan: A C++ Library for Probability and Sampling, version 2.2.”
  • — (2014b). Stan Modeling Language User’s Guide and Reference Manual, version 2.2.
  • van der Vaart, A. W. and van Zanten, J. H. (2008). “Rates of contraction of posterior distributions based on Gaussian process priors.” Annals of Statistics, 36(3): 1435–1463.
  • Zangeneh, S. Z. and Little, R. J. (2012). “Bayesian inference for the finite population total from a heteroscedastic probability proportional to size sample.” Proceedings of the Joint Statistical Meetings, Section on Survey Methodology.
  • Zheng, H. and Little, R. J. (2003). “Penalized spline model-based estimation of the finite populations total from probability-proportional-to-size samples.” Journal of Official Statistics, 19(2): 99–107.