Statistics Surveys

A design-sensitive approach to fitting regression models with complex survey data

Phillip S. Kott

Full-text: Open access


Fitting complex survey data to regression equations is explored under a design-sensitive model-based framework. A robust version of the standard model assumes that the expected value of the difference between the dependent variable and its model-based prediction is zero no matter what the values of the explanatory variables. The extended model assumes only that the difference is uncorrelated with the covariates. Little is assumed about the error structure of this difference under either model other than independence across primary sampling units. The standard model often fails in practice, but the extended model very rarely does. Under this framework some of the methods developed in the conventional design-based, pseudo-maximum-likelihood framework, such as fitting weighted estimating equations and sandwich mean-squared-error estimation, are retained but their interpretations change. Few of the ideas here are new to the refereed literature. The goal instead is to collect those ideas and put them into a unified conceptual framework.
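The two ingredients named in the abstract, a weighted estimating equation and a sandwich mean-squared-error estimator that treats primary sampling units (PSUs) as independent, can be illustrated for the linear case. The sketch below is not taken from the paper. The function name `weighted_linear_fit`, the single-stratum with-replacement treatment of PSUs, and the n/(n-1) small-sample factor are all assumptions made for illustration.

```python
import numpy as np


def weighted_linear_fit(X, y, w, psu):
    """Solve sum_k w_k x_k (y_k - x_k' b) = 0; return (b, sandwich covariance).

    Hypothetical helper: assumes one stratum and PSUs treated as
    independent draws (with-replacement approximation).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    psu = np.asarray(psu)

    # Weighted estimating equation, linear case: (X'WX) b = X'Wy
    bread = X.T @ (w[:, None] * X)
    b = np.linalg.solve(bread, X.T @ (w * y))

    # Per-unit contributions w_k x_k e_k, then totals within each PSU
    scores = (w * (y - X @ b))[:, None] * X
    labels, idx = np.unique(psu, return_inverse=True)
    totals = np.zeros((labels.size, X.shape[1]))
    np.add.at(totals, idx, scores)

    # PSU totals sum to zero at the solution, so no centering is needed here.
    n = labels.size
    meat = (n / (n - 1)) * (totals.T @ totals)

    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv
    return b, cov
```

Calling `weighted_linear_fit(X, y, w, psu)` with a design matrix, responses, survey weights, and PSU labels returns the coefficient vector and its estimated covariance matrix. Because only PSU-level totals enter the middle term, nothing is assumed about the error structure within a PSU. A stratified design would need the totals centered within each stratum, which this sketch omits.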

Article information

Statist. Surv., Volume 12 (2018), 1-17.

Received: June 2017
First available in Project Euclid: 17 January 2018

Keywords: pseudo-maximum likelihood; extended model; proportional-odds model; generalized cumulative logistic model; design-based

Creative Commons Attribution 4.0 International License.


Kott, Phillip S. A design-sensitive approach to fitting regression models with complex survey data. Statist. Surv. 12 (2018), 1–17. doi:10.1214/17-SS118.


  • Binder, D.A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51, 279–292.
  • Binder, D.A. and Roberts, G.R. (2003). Design-based and model-based methods for estimating model parameters, in R.L. Chambers and C.J. Skinner, eds., Analysis of Survey Data, John Wiley & Sons, New York, Chapter 3.
  • Deville, J.-C. and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376–382.
  • Fellegi, I. (1980). Approximate test of independence and goodness of fit based on stratified multistage surveys. Journal of the American Statistical Association, 75, 261–268.
  • Fuller, W.A. (1975). Regression analysis for sample survey. Sankhya-The Indian Journal of Statistics, 37(Series C), 117–132.
  • Fuller, W.A. (2002). Regression estimation for survey samples (with discussion). Survey Methodology, 28(1), 5–23.
  • Godambe, V.P. and Thompson, M.E. (1974). Estimating equations in the presence of a nuisance parameter. Annals of Statistics, 2, 568–571.
  • Graubard, B.I. and Korn, E.L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73–96.
  • Graubard, B.I., Korn, E.L., and Midthune, D. (1997). Testing goodness-of-fit for logistic regression with survey data. American Statistical Association Proceedings of the Section on Survey Research Methods, 170–174.
  • Hartley, H.O. and Rao, J.N.K. (1962). Sampling With Unequal Probabilities and Without Replacement. Annals of Mathematical Statistics, 33, 350–374.
  • Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.
  • Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
  • Korn, E. and Graubard, B. (1998). Confidence intervals for proportions with small expected number of positive counts estimated from survey data. Survey Methodology, 193–201.
  • Kott, P. (1991). What does performing linear regression on sample survey data mean? Journal of Agricultural Economics Research, 30–33.
  • Kott, P.S. (1994). Hypothesis testing of linear regression coefficients with survey data. Survey Methodology, 20, 159–164.
  • Kott, P. (2005). Randomization-assisted model-based survey sampling. Journal of Statistical Planning and Inference, 129, 263–277.
  • Kott, P. (2006). Using calibration weighting to adjust for nonresponse and coverage errors. Survey Methodology, 133–142.
  • Kott, P. (2007). Clarifying some issues in the regression analysis of survey data. Survey Research Methods, 1, 11–18.
  • Korn, E.L. and Graubard, B.I. (1990). Simultaneous testing of regression coefficients with complex survey data: Use of Bonferroni $t$ statistics. American Statistician, 44, 270–276.
  • Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data. (2nd ed.), New York: Wiley.
  • Lohr, S. (2010). Sampling: Design and Analysis. (2nd ed.), Boston: Brooks/Cole.
  • Lumley, T. and Scott, A. (2015). AIC and BIC for modeling with complex survey data. Journal of Survey Statistics and Methodology, 3, 1–18.
  • Lumley, T. and Scott, A. (2017). Fitting regression models to survey data. Statistical Science, 32, 265–278.
  • Pfeffermann, D. (2011). Modelling of complex survey data: Why model? Why is it a problem? How can we approach it? Survey Methodology, 37(2), 115–136.
  • Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. International Statistical Review, 61, 317–337.
  • Pfeffermann, D. and Sverchkov, M. (1999). Parametric and semiparametric estimation of regression models fitted to survey data. Sankhya-The Indian Journal of Statistics, 61(Series B), 166–186.
  • Rao, J. and Scott, A. (1981). The analysis of categorical data from complex surveys: Chi-squared tests for goodness of fit and independence in two-way tables. Journal of the American Statistical Association, 76, 221–230.
  • Research Triangle Institute (2012). SUDAAN Language Manual, Volumes 1 and 2, Release 11. Research Triangle Park, NC: Research Triangle Institute.
  • Särndal, C.-E., Swensson, B., and Wretman, J. (1992). Model assisted survey sampling. New York: Springer.
  • SAS Institute Inc. (2015). SAS/STAT® 14.1 User’s Guide. Cary, NC: SAS Institute Inc.
  • Skinner, C.J. (1989). Domain means, regression and multivariate analysis. In Skinner, C.J., Holt, D. and Smith, T.M.F. eds. Analysis of Complex Surveys. Chichester: Wiley, 59–87.
  • Valliant, R. and Rust, K. (2010). Degrees of freedom approximations and rules-of-thumb. Journal of Official Statistics, 26, 585–602.
  • White, H. (1984). Asymptotic Theory for Econometricians. Orlando: Academic Press.
  • White, H. (1980). Using least squares to approximate unknown regression functions. International Economic Review, 21, 149–170.
  • Wilkinson, L. and the Task Force on Statistical Inference, Board of Scientific Affairs, American Psychological Association (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 8, 594–604.
  • Williams, R. (2005). Gologit2: A Program for Generalized Logistic Regression/Partial Proportional Odds Models for Ordinal Variables. Retrieved January 3, 2016 (