Electronic Journal of Statistics

Bayesian pairwise estimation under dependent informative sampling

Matthew R. Williams and Terrance D. Savitsky

Full-text: Open access


An informative sampling design leads to the selection of units whose inclusion probabilities are correlated with the response variable of interest. Inference under the population model performed on the resulting observed sample, without adjustment, will be biased for the population generative model. One approach that produces asymptotically unbiased inference employs marginal inclusion probabilities to form sampling weights used to exponentiate each likelihood contribution of a pseudo likelihood used to form a pseudo posterior distribution. Conditions for posterior consistency restrict applicable sampling designs to those under which pairwise inclusion dependencies asymptotically limit to $0$. There are many sampling designs excluded by this restriction; for example, a multi-stage design that samples individuals within households. Viewing each household as a population, the dependence among individuals does not attenuate. We propose a more targeted approach in this paper for inference focused on pairs of individuals or sampled units; for example, the substance use of one spouse in a shared household, conditioned on the substance use of the other spouse. We formulate the pseudo likelihood with weights based on pairwise or second order probabilities and demonstrate consistency, removing the requirement for asymptotic independence and replacing it with restrictions on higher order selection probabilities. Our approach provides a nearly automated estimation procedure applicable to any model specified by the data analyst. We demonstrate our method on the National Survey on Drug Use and Health.
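As a rough illustration of the weighting idea described above, the following Python sketch evaluates a pairwise-weighted pseudo log-likelihood of the form $\sum_{(i,j)} w_{ij} \log p(y_i \mid y_j, \theta)$ with $w_{ij} \propto 1/\pi_{ij}$, where $\pi_{ij}$ is the joint (second order) inclusion probability of a sampled pair. The conditional logistic model, the function names, and the rescaling of the weights to sum to the number of sampled pairs are assumptions made for illustration only; they are not taken from the paper, whose exact formulation may differ.

```python
import math


def log_conditional_bernoulli(y_i, y_j, alpha, beta):
    """Log P(y_i | y_j) under a hypothetical logistic model:
    logit P(y_i = 1 | y_j) = alpha + beta * y_j."""
    eta = alpha + beta * y_j
    log_p1 = -math.log1p(math.exp(-eta))  # log sigmoid(eta)
    log_p0 = -math.log1p(math.exp(eta))   # log (1 - sigmoid(eta))
    return log_p1 if y_i == 1 else log_p0


def pairwise_pseudo_loglik(pairs, pair_probs, alpha, beta, normalize=True):
    """Weighted pseudo log-likelihood over sampled pairs.

    pairs      : list of (y_i, y_j) binary responses for each sampled pair
    pair_probs : joint inclusion probability pi_ij of each pair, in (0, 1]
    normalize  : if True, rescale weights to sum to the number of pairs
                 (an assumed convention, not necessarily the paper's)
    """
    if len(pairs) != len(pair_probs):
        raise ValueError("pairs and pair_probs must have the same length")
    if any(not (0.0 < p <= 1.0) for p in pair_probs):
        raise ValueError("joint inclusion probabilities must lie in (0, 1]")

    # Inverse joint inclusion probabilities act as pairwise sampling weights.
    weights = [1.0 / p for p in pair_probs]
    if normalize:
        scale = len(weights) / sum(weights)
        weights = [w * scale for w in weights]

    # Exponentiating each likelihood contribution by its weight is the same
    # as multiplying its log-likelihood contribution by that weight.
    return sum(
        w * log_conditional_bernoulli(y_i, y_j, alpha, beta)
        for w, (y_i, y_j) in zip(weights, pairs)
    )
```

Under these assumptions, a call such as `pairwise_pseudo_loglik([(1, 0), (0, 1), (1, 1)], [0.1, 0.2, 0.5], alpha=0.0, beta=0.5)` returns the quantity that would be added to a log prior to form an unnormalized log pseudo posterior; pairs with a small joint inclusion probability receive proportionally more weight.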

Article information

Electron. J. Statist., Volume 12, Number 1 (2018), 1631-1661.

Received: September 2017
First available in Project Euclid: 26 May 2018

Keywords: Survey sampling; sampling weights; quantile regression; non-linear regression; Markov chain Monte Carlo

Creative Commons Attribution 4.0 International License.


Williams, Matthew R.; Savitsky, Terrance D. Bayesian pairwise estimation under dependent informative sampling. Electron. J. Statist. 12 (2018), no. 1, 1631–1661. doi:10.1214/18-EJS1435. https://projecteuclid.org/euclid.ejs/1527300143


