Statistical Science

Approaches to Improving Survey-Weighted Estimates

Qixuan Chen, Michael R. Elliott, David Haziza, Ye Yang, Malay Ghosh, Roderick J. A. Little, Joseph Sedransk, and Mary Thompson


Abstract

In sample surveys, the sample units are typically chosen using a complex design. This may lead to a selection effect that, if left uncorrected in the analysis, can bias inferences. A common technique for mitigating the effect of deviations from simple random sampling is to use survey weights in the analysis. This article reviews approaches to addressing the possible inefficiency in estimation that results from such weighting.
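As a rough illustration of the weighting idea described above, the following sketch computes a weighted (Hájek-type) mean in which each sampled unit is weighted by the inverse of its inclusion probability. The function name `hajek_mean` and the toy data are assumptions made for this example and do not come from the article.

```python
def hajek_mean(y, pi):
    """Weighted mean with design weights w_i = 1 / pi_i (illustrative helper)."""
    weights = [1.0 / p for p in pi]
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)


# Hypothetical sample: units with large y were sampled with high probability,
# units with small y with low probability, so the raw sample over-represents
# large values.
y = [10.0, 12.0, 2.0, 4.0]
pi = [0.5, 0.5, 0.1, 0.1]

unweighted = sum(y) / len(y)   # ignores the design
weighted = hajek_mean(y, pi)   # corrects for unequal inclusion probabilities

print(f"unweighted mean: {unweighted:.3f}")
print(f"weighted mean:   {weighted:.3f}")
```

In this toy sample the weighted mean is pulled toward the under-sampled small values, which is the selection-effect correction that the weights are meant to provide.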

To improve inferences, we emphasize modifications of the basic design-based weight, that is, the inverse of a unit’s inclusion probability. These techniques include weight trimming, weight modeling and incorporating weights via models for survey variables. We start with an introduction to survey weighting, including methods derived from both the design-based and model-based perspectives. We then present the rationale for modifying the weights and a taxonomy of methods for doing so. Next, we describe an extensive numerical study that compares these methods on four criteria: relative bias, relative mean square error, confidence or credible interval width, and coverage probability, and we summarize our findings. To supplement this numerical study, we use Texas school data to compare the distributions of the weights for several methods. We also make general recommendations, describe limitations of our numerical study and make suggestions for further investigation.
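To make the idea of weight trimming concrete, here is a minimal sketch of one simple variant: weights above a cap are truncated and the excess is spread proportionally over the remaining weights so that the weight total is preserved. This is not necessarily any of the specific methods compared in the article. The helper names (`trim_weights`, `relative_bias`, `relative_rmse`), the cap value and the data are all assumptions for illustration. The last two helpers show one plausible way of computing relative bias and a relative root mean square error from simulated estimates; the article's exact definitions may differ.

```python
def trim_weights(weights, cap, max_iter=100):
    """Truncate weights at `cap` and redistribute the excess proportionally
    over the weights below the cap, keeping the weight total unchanged.
    Assumes all weights are positive."""
    total = sum(weights)
    if cap * len(weights) < total:
        raise ValueError("cap is too small to preserve the weight total")
    w = list(weights)
    for _ in range(max_iter):
        excess = sum(x - cap for x in w if x > cap)
        if excess <= 1e-12:
            break  # nothing left above the cap
        w = [min(x, cap) for x in w]
        free = [i for i, x in enumerate(w) if x < cap]
        free_total = sum(w[i] for i in free)
        if free_total == 0:
            break
        factor = 1.0 + excess / free_total
        for i in free:
            w[i] *= factor  # may exceed the cap; handled on the next pass
    return w


def relative_bias(estimates, truth):
    """Mean of the estimates minus the truth, divided by the truth."""
    return (sum(estimates) / len(estimates) - truth) / truth


def relative_rmse(estimates, truth):
    """Root mean square error of the estimates, divided by the truth."""
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return mse ** 0.5 / truth


# Hypothetical design weights with one extreme value.
raw = [2.0, 2.0, 10.0, 10.0, 40.0]
trimmed = trim_weights(raw, cap=20.0)
print([round(x, 3) for x in trimmed], "total:", round(sum(trimmed), 3))

# Hypothetical estimates from two simulation replicates, true value 10.
print(relative_bias([9.0, 11.0], 10.0), relative_rmse([9.0, 11.0], 10.0))
```

Trimming of this kind trades a reduction in variance (no single unit dominates the estimate) against a possible increase in bias, which is why criteria such as relative bias and mean square error are natural ways to compare alternatives.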

Article information

Source
Statist. Sci. Volume 32, Number 2 (2017), 227-248.

Dates
First available in Project Euclid: 11 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1494489813

Digital Object Identifier
doi:10.1214/17-STS609

Keywords
Design-based; survey weights; finite population; survey sampling; inclusion probability; weight modeling; weight trimming

Citation

Chen, Qixuan; Elliott, Michael R.; Haziza, David; Yang, Ye; Ghosh, Malay; Little, Roderick J. A.; Sedransk, Joseph; Thompson, Mary. Approaches to Improving Survey-Weighted Estimates. Statist. Sci. 32 (2017), no. 2, 227–248. doi:10.1214/17-STS609. https://projecteuclid.org/euclid.ss/1494489813.




Supplemental materials

  • Supplement to “Approaches to Improving Survey-Weighted Estimates”. The Supplementary Material includes density plots of the size variables and dot plots summarizing the relative mean square errors, interval widths, and percent noncoverage for the methods in Section 4. It also presents results of the simulations for binary outcomes and plots from a thorough study of two scenarios with continuous outcomes.