## Statistical Science

### Struggles with Survey Weighting and Regression Modeling

Andrew Gelman

#### Abstract

The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss these issues in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.

#### Article information

Source
Statist. Sci., Volume 22, Number 2 (2007), 153-164.

Dates
First available in Project Euclid: 27 September 2007

Permanent link to this document
https://projecteuclid.org/euclid.ss/1190905511

Digital Object Identifier
doi:10.1214/088342306000000691

Mathematical Reviews number (MathSciNet)
MR2408951

Zentralblatt MATH identifier
1246.62043

#### Citation

Gelman, Andrew. Struggles with Survey Weighting and Regression Modeling. Statist. Sci. 22 (2007), no. 2, 153--164. doi:10.1214/088342306000000691. https://projecteuclid.org/euclid.ss/1190905511
