## The Annals of Applied Statistics

### Prevalence and trend estimation from observational data with highly variable post-stratification weights

#### Abstract

In observational surveys, post-stratification is used to reduce bias resulting from differences between the survey population and the population under investigation. However, post-stratification can produce inflated, highly variable weights, so appropriate methods are required to obtain less variable estimates. Proposed methods include collapsing post-strata, trimming post-stratification weights, generalized regression estimators (GREG) and weight-smoothing models, the latter defined by random-effects models that induce shrinkage across post-stratum means. Here, we first describe the weight-smoothing model for prevalence estimation from binary survey outcomes in observational surveys. Second, we propose an extension of this method for trend estimation. Third, we provide a method that allows the GREG to be used for prevalence and trend estimation in observational surveys. Variance estimates for all methods are described. A simulation study compares the proposed methods with other established methods. The nonparametric GREG performs consistently across all simulation conditions and is therefore a valuable solution for prevalence and trend estimation from observational surveys. The method is applied to the estimation of the prevalence and incidence trend of influenza-like illness using the 2010/2011 Great Influenza Survey in Flanders, Belgium.
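To make the weighting problem concrete, the following minimal Python sketch computes a post-stratified prevalence estimate from binary outcomes and shows how capping (trimming) large weights changes it. It is an illustration only, not the estimators developed in the paper: the function name `poststratified_prevalence`, the `max_weight` cap, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def poststratified_prevalence(sample, population_counts, max_weight=None):
    """Weighted prevalence estimate from (stratum, outcome) pairs.

    sample            -- list of (stratum, y) with y in {0, 1}
    population_counts -- dict mapping stratum -> population size N_h
    max_weight        -- optional cap on the weights (simple trimming;
                         a hypothetical rule chosen for illustration)
    """
    # Number of respondents n_h in each post-stratum.
    n_h = Counter(stratum for stratum, _ in sample)

    # Post-stratification weight of each respondent in stratum h: N_h / n_h.
    weights = {h: population_counts[h] / n_h[h] for h in n_h}

    # Optional trimming: cap every weight at max_weight.
    if max_weight is not None:
        weights = {h: min(w, max_weight) for h, w in weights.items()}

    # Ratio form, so the result stays a proportion after trimming.
    numerator = sum(weights[h] * y for h, y in sample)
    denominator = sum(weights[h] for h, _ in sample)
    return numerator / denominator, weights


# Toy data (invented): the "old" stratum is badly under-represented.
sample = [("young", 1)] * 4 + [("young", 0)] * 4 + [("old", 0)] * 2
population_counts = {"young": 200, "old": 800}

unweighted = sum(y for _, y in sample) / len(sample)
estimate, weights = poststratified_prevalence(sample, population_counts)
trimmed, _ = poststratified_prevalence(sample, population_counts, max_weight=100)

print(unweighted)  # 0.4
print(estimate)    # 0.1
print(trimmed)     # 0.25
print(weights)     # {'young': 25.0, 'old': 400.0}
```

In this toy example the two respondents in the under-represented stratum each carry a weight of 400, so the post-stratified estimate depends heavily on very few observations. Trimming reduces that dependence at the cost of moving the estimate back toward the unweighted value, which is the bias-variance trade-off that motivates the smoothing and GREG approaches named in the abstract.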

#### Article information

Source
Ann. Appl. Stat., Volume 10, Number 1 (2016), 94–117.

Dates
Revised: July 2015
First available in Project Euclid: 25 March 2016

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1458909909

Digital Object Identifier
doi:10.1214/15-AOAS874

Mathematical Reviews number (MathSciNet)
MR3480489

Zentralblatt MATH identifier
06586138

#### Citation

Vandendijck, Yannick; Faes, Christel; Hens, Niel. Prevalence and trend estimation from observational data with highly variable post-stratification weights. Ann. Appl. Stat. 10 (2016), no. 1, 94–117. doi:10.1214/15-AOAS874. https://projecteuclid.org/euclid.aoas/1458909909

#### Supplemental materials

• Additional details and results. The online Supplementary Material explains how the models can be cast in the GLMM framework (Appendix A), gives more details on the estimation method (Appendix B), and provides annotated SAS and R programs (Appendix C) and additional simulation results (Appendix D). Appendix E contains additional results for different values of $w_{0}$ and for smaller sample sizes, together with results on model fits and other spline basis functions.