Statistical Science

Inference for Nonprobability Samples

Michael R. Elliott and Richard Valliant

Abstract

Although selecting a probability sample has been the standard for decades when making inferences from a sample to a finite population, incentives are increasing to use nonprobability samples. In a world of “big data”, large amounts of data are available that are faster and easier to collect than are probability samples. Design-based inference, in which the distribution for inference is generated by the random mechanism used by the sampler, cannot be used for nonprobability samples. One alternative is quasi-randomization, in which pseudo-inclusion probabilities are estimated based on covariates available for sample and nonsample units. Another is superpopulation modeling for the analytic variables collected on the sample units, in which the model is used to predict values for the nonsample units. We discuss the pros and cons of each approach.
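
As a rough illustration of the two alternatives named in the abstract, the sketch below estimates a population mean from a toy nonprobability sample in both ways. It is not the authors' procedure: it assumes a single categorical covariate (a "cell") whose population counts are known exactly, whereas in practice pseudo-inclusion probabilities would typically be estimated from a reference sample. All function names, variable names, and data values are hypothetical.

```python
from collections import defaultdict


def cell_summaries(sample):
    """Return {cell: (n, mean_y)} for a list of (cell, y) sample records."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, y in sample:
        sums[cell] += y
        counts[cell] += 1
    return {c: (counts[c], sums[c] / counts[c]) for c in counts}


def check_cells(sample, pop_counts):
    """Assumption: every population cell is represented in the sample."""
    sampled = {cell for cell, _ in sample}
    if sampled != set(pop_counts):
        raise ValueError("sample cells and population cells must match")


def quasi_randomization_mean(sample, pop_counts):
    """Weight each unit by the inverse of its pseudo-inclusion probability."""
    check_cells(sample, pop_counts)
    stats = cell_summaries(sample)
    # Pseudo-inclusion probability in a cell: sample count / population count.
    pseudo_prob = {c: stats[c][0] / pop_counts[c] for c in stats}
    weighted_sum = sum(y / pseudo_prob[c] for c, y in sample)
    weight_total = sum(1.0 / pseudo_prob[c] for c, _ in sample)
    return weighted_sum / weight_total


def prediction_mean(sample, pop_counts):
    """Predict y for nonsample units with a cell-mean (superpopulation) model."""
    check_cells(sample, pop_counts)
    stats = cell_summaries(sample)
    observed_total = sum(y for _, y in sample)
    # Each nonsample unit in a cell is predicted by that cell's sample mean.
    predicted_total = sum(
        (pop_counts[c] - n) * mean for c, (n, mean) in stats.items()
    )
    return (observed_total + predicted_total) / sum(pop_counts.values())


# Hypothetical data: the "young" cell is over-represented in the sample.
sample = [("young", 10.0), ("young", 14.0), ("young", 12.0), ("old", 20.0)]
pop_counts = {"young": 50, "old": 50}

naive = sum(y for _, y in sample) / len(sample)
qr = quasi_randomization_mean(sample, pop_counts)
pred = prediction_mean(sample, pop_counts)
print(naive, qr, pred)
```

With one categorical covariate and a cell-mean model, both estimators reduce to the poststratified mean, so they agree here and both differ from the unadjusted sample mean. With richer covariates or different models the two approaches generally give different answers.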

Article information

Source
Statist. Sci., Volume 32, Number 2 (2017), 249-264.

Dates
First available in Project Euclid: 11 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1494489814

Digital Object Identifier
doi:10.1214/16-STS598

Mathematical Reviews number (MathSciNet)
MR3648958

Zentralblatt MATH identifier
1381.62024

Keywords
Coverage error, hierarchical regression, quasi-randomization, reference sample, selection bias, superpopulation model

Citation

Elliott, Michael R.; Valliant, Richard. Inference for Nonprobability Samples. Statist. Sci. 32 (2017), no. 2, 249–264. doi:10.1214/16-STS598. https://projecteuclid.org/euclid.ss/1494489814

