Statistical Science

Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples

Yiting Deng, D. Sunshine Hillygus, Jerome P. Reiter, Yajuan Si, and Siyu Zheng


Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. From the observed panel data alone, it is impossible to know whether the attrition causes bias. Refreshment samples (new, randomly sampled respondents given the questionnaire at the same time as a subsequent wave of the panel) offer information that can be used to diagnose and adjust for bias due to attrition. We review and bolster the case for the use of refreshment samples in panel studies. We include examples of both a fully Bayesian approach for analyzing the concatenated panel and refreshment data, and a multiple imputation approach for analyzing only the original panel. For the latter, we document a positive bias in the usual multiple imputation variance estimator. We present models appropriate for three waves and two refreshment samples, including nonterminal attrition. We illustrate the three-wave analysis using the 2007–2008 Associated Press–Yahoo! News Election Poll.
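
For orientation, the "usual multiple imputation variance estimator" is conventionally the one given by Rubin's combining rules: with m completed data sets, the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance of the point estimates. The abstract does not write the estimator out, so reading it as Rubin's rule is an assumption here. The following Python sketch shows that rule only; the function name `rubin_combine` and the input numbers are hypothetical and are not taken from the article.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules (a sketch).

    estimates: point estimates, one per imputed data set
    variances: their estimated variances, one per imputed data set
    Returns (q_bar, u_bar, b, t).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)      # combined point estimate
    u_bar = mean(variances)      # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # usual MI variance estimator
    return q_bar, u_bar, b, t


# Hypothetical inputs, for illustration only.
q_bar, u_bar, b, t = rubin_combine([0.50, 0.54, 0.52], [0.010, 0.012, 0.011])
```

A positive bias of the kind the abstract mentions would mean that `t` tends to overstate the true sampling variance of `q_bar` in that setting, so intervals built from it would tend to be wider than necessary.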

Article information

Statist. Sci., Volume 28, Number 2 (2013), 238-256.

First available in Project Euclid: 21 May 2013

Digital Object Identifier: doi:10.1214/13-STS414

Keywords: attrition, imputation, missing, panel, survey


Deng, Yiting; Hillygus, D. Sunshine; Reiter, Jerome P.; Si, Yajuan; Zheng, Siyu. Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples. Statist. Sci. 28 (2013), no. 2, 238–256. doi:10.1214/13-STS414.
