The Annals of Applied Statistics

Assessing nonresponse bias in a business survey: Proxy pattern-mixture analysis for skewed data

Rebecca Andridge and Katherine Jenny Thompson

Abstract

The Service Annual Survey (SAS) is a business survey conducted annually by the U.S. Census Bureau that collects aggregate and detailed revenues and expenses data. Typical of many business surveys, the SAS population is highly positively skewed, with large companies comprising a large proportion of the published totals. When alternative data are not available, missing data are handled with ratio imputation models that assume missingness is at random. We propose a proxy pattern-mixture (PPM) model that provides a simple framework for assessing nonresponse bias with respect to different nonresponse mechanisms. PPM models were first introduced in this context by Andridge and Little [Journal of Official Statistics 27 (2011) 153–180], but their model assumed the characteristic of interest and the predicted proxy have a bivariate normal distribution, conditional on the missingness indicator. Although often appropriate for large demographic surveys, the normality assumption is less justifiable for the highly skewed SAS data. We propose an alternative PPM model using a bivariate gamma distribution more appropriate for the SAS data. We compare the two PPM models through application to data from six years of data collection in three industries in the health care and transportation sectors of the SAS. Finally, we illustrate properties of the method through simulation.
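As a rough illustration of how a normal PPM analysis turns an assumption about the nonresponse mechanism into an adjusted estimate, the sketch below computes the mean of the outcome for a chosen sensitivity parameter λ. It uses the maximum likelihood form we recall from Andridge and Little (2011); it is not the gamma model proposed here. The function name `ppm_normal_mean`, its arguments, and the toy numbers are hypothetical. The sketch assumes the proxy is positively correlated with the outcome among respondents, that λ = 0 corresponds to missingness at random, and that λ = ∞ corresponds to missingness depending only on the outcome.

```python
import math


def ppm_normal_mean(y_resp, x_resp, x_all, lam):
    """Sketch of the normal-PPM estimate of the mean of Y (assumed form).

    y_resp : outcome values for respondents
    x_resp : proxy values for the same respondents
    x_all  : proxy values for the full sample (respondents and nonrespondents)
    lam    : sensitivity parameter; 0 = MAR, math.inf = extreme MNAR
    """
    n = len(y_resp)
    ybar_r = sum(y_resp) / n
    xbar_r = sum(x_resp) / n
    xbar = sum(x_all) / len(x_all)

    # Respondent second moments (ML versions, dividing by n).
    s_yy = sum((y - ybar_r) ** 2 for y in y_resp) / n
    s_xx = sum((x - xbar_r) ** 2 for x in x_resp) / n
    s_xy = sum((x - xbar_r) * (y - ybar_r)
               for x, y in zip(x_resp, y_resp)) / n
    rho = s_xy / math.sqrt(s_xx * s_yy)  # assumed to be positive

    # Adjustment factor g = (lam + rho) / (lam * rho + 1), with limit 1/rho.
    g = 1.0 / rho if math.isinf(lam) else (lam + rho) / (lam * rho + 1.0)

    # Shift the respondent mean by the scaled difference in proxy means.
    return ybar_r + g * math.sqrt(s_yy / s_xx) * (xbar - xbar_r)


# Toy data, purely illustrative (not SAS data).
y_resp = [1.0, 3.0, 2.0, 6.0]
x_resp = [1.0, 2.0, 3.0, 4.0]
x_all = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]

mar_estimate = ppm_normal_mean(y_resp, x_resp, x_all, 0.0)
mnar_estimate = ppm_normal_mean(y_resp, x_resp, x_all, math.inf)
```

Under these assumptions, λ = 0 reduces to the usual regression estimator, and the spread between the λ = 0 and λ = ∞ estimates gives a sense of how sensitive the estimate is to the assumed mechanism. A weaker proxy (smaller correlation) widens that spread.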

Article information

Source
Ann. Appl. Stat., Volume 9, Number 4 (2015), 2237–2265.

Dates
Received: March 2015
Revised: July 2015
First available in Project Euclid: 28 January 2016

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1453994199

Digital Object Identifier
doi:10.1214/15-AOAS878

Mathematical Reviews number (MathSciNet)
MR3456373

Zentralblatt MATH identifier
06560829

Keywords
Missing data; nonresponse bias analysis; nonignorable missingness; multiple imputation; skewed data; business surveys; proxy pattern-mixture models

Citation

Andridge, Rebecca; Thompson, Katherine Jenny. Assessing nonresponse bias in a business survey: Proxy pattern-mixture analysis for skewed data. Ann. Appl. Stat. 9 (2015), no. 4, 2237–2265. doi:10.1214/15-AOAS878. https://projecteuclid.org/euclid.aoas/1453994199


References

  • Andridge, R. R. and Little, R. J. A. (2011). Proxy pattern-mixture analysis for survey nonresponse. Journal of Official Statistics 27 153–180.
  • Andridge, R. R. and Thompson, K. J. (2015a). Using the fraction of missing information to identify auxiliary variables for imputation procedures via proxy pattern-mixture models. Int. Stat. Rev. 83 472–492.
  • Andridge, R. R. and Thompson, K. J. (2015b). Supplement to “Assessing nonresponse bias in a business survey: Proxy pattern-mixture analysis for skewed data.” DOI:10.1214/15-AOAS878.
  • Bavdaž, M. (2010). The multidimensional integral business survey response model. Survey Methodology 36 81–93.
  • Beaumont, J.-F., Haziza, D. and Bocci, C. (2011). On variance estimation under auxiliary value imputation in sample surveys. Statist. Sinica 21 515–537.
  • Devroye, L. (2002). Simulating Bessel random variables. Statist. Probab. Lett. 57 249–257.
  • Efron, B. (1994). Missing data, imputation, and the bootstrap. J. Amer. Statist. Assoc. 89 463–479.
  • Fay, R. E. III and Herriot, R. A. (1979). Estimates of income for small places: An application of James–Stein procedures to census data. J. Amer. Statist. Assoc. 74 269–277.
  • Feller, W. (1966). An Introduction to Probability Theory and Its Applications. Vol. II. Wiley, New York.
  • Harel, O. (2007). Inferences on missing information under multiple imputation and two-stage multiple imputation. Stat. Methodol. 4 75–89.
  • Haziza, D., Thompson, K. J. and Yung, W. (2010). The effect of nonresponse adjustments on variance estimation. Survey Methodology 36 35–43.
  • Iliopoulos, G., Karlis, D. and Ntzoufras, I. (2005). Bayesian estimation in Kibble’s bivariate gamma distribution. Canad. J. Statist. 33 571–589.
  • Izawa, T. (1965). Two or multi-dimensional gamma-type distribution and its application to rainfall data. Papers in Meteorology and Geophysics 15 167–200.
  • Kibble, W. F. (1941). A two-variate gamma type distribution. Sankhyā 5 137–150.
  • Kreuter, F., Olson, K., Wagner, J., Yan, T., Ezzati-Rice, T. M., Casas-Cordero, C., Lemay, M., Peytchev, A., Groves, R. M. and Raghunathan, T. E. (2010). Using proxy measures and other correlates of survey outcomes to adjust for non-response: Examples from multiple surveys. J. Roy. Statist. Soc. Ser. A 173 389–407.
  • Krewski, D. and Rao, J. N. K. (1981). Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication methods. Ann. Statist. 9 1010–1019.
  • Little, R. J. A. (1994). A class of pattern-mixture models for normal incomplete data. Biometrika 81 471–483.
  • Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. Wiley, Hoboken, NJ.
  • Lohr, S. L. (2010). Sampling: Design and Analysis, 2nd ed. Brooks/Cole, Boston, MA.
  • Makarov, R. N. and Glew, D. (2010). Exact simulation of Bessel diffusions. Monte Carlo Methods Appl. 16 283–306.
  • Ong, S. H. (1992). The computer generation of bivariate binomial variables with given marginals and correlations. Comm. Statist. Simulation Comput. 21 285–299.
  • Peytcheva, E. and Groves, R. M. (2009). Using variation in response rates of demographic subgroups as evidence of nonresponse bias in survey estimates. Journal of Official Statistics 25 193–201.
  • R Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org/.
  • Rao, J. N. K. (2003). Small Area Estimation. Wiley, Hoboken, NJ.
  • Rao, J. N. K. and Scott, A. J. (1992). A simple method for the analysis of clustered binary data. Biometrics 48 577–585.
  • Roberts, G., Rao, J. N. K. and Kumar, S. (1987). Logistic regression analysis of sample survey data. Biometrika 74 1–12.
  • Royall, R. M. (1992). The model based (prediction) approach to finite population sampling theory. In Current Issues in Statistical Inference: Essays in Honor of D. Basu. Institute of Mathematical Statistics Lecture Notes—Monograph Series 17 225–240. IMS, Hayward, CA.
  • Snijkers, G., Haraldsen, G., Jones, J. and Willimack, D. K. (2013). Designing and Conducting Business Surveys. Wiley, New York.
  • Thompson, K. J. (2005). An empirical investigation into the effects of replicate reweighting on variance estimates for the Annual Capital Expenditures Survey. In Proceedings of the Federal Committee on Statistical Methods Research Conference. U.S. Office of Management and Budget, Washington, DC.
  • Thompson, K. J. and Oliver, B. E. (2012). Response rates in business surveys: Going beyond the usual performance measure. Journal of Official Statistics 28 221–237.
  • Thompson, K. J. and Washington, K. T. (2013). Challenges in the treatment of unit nonresponse for selected business surveys: A case study. Survey Methods: Insights from the Field. Retrieved from http://surveyinsights.org/?p=2991.
  • Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly 74 223–243.
  • Wagner, J. (2012). A comparison of alternative indicators for the risk of nonresponse bias. Public Opinion Quarterly 76 555–575.
  • Willimack, D. K. and Nichols, E. (2010). A hybrid response process model for business surveys. Journal of Official Statistics 26 3–24.
  • Yuan, L. and Kalbfleisch, J. D. (2000). On the Bessel distribution and related problems. Ann. Inst. Statist. Math. 52 438–447.

Supplemental materials

  • Supplement to “Assessing nonresponse bias in a business survey: Proxy pattern-mixture analysis for skewed data”. The supplementary material contains the results of applying multiple imputation using the gamma PPM model and the normal PPM model for $\lambda=0$ (MAR) and $\lambda=\infty$ (MNAR) in the three SAS industries for the expenses model.