The Annals of Applied Statistics

Adjusting models of ordered multinomial outcomes for nonignorable nonresponse in the occupational employment statistics survey

Nicholas J. Horton, Daniell Toth, and Polly Phipps



An establishment’s average wage, computed from administrative wage data, has been found to be related to occupational wages. These occupational wages are a primary outcome variable for the Bureau of Labor Statistics Occupational Employment Statistics (OES) survey. Motivated by the fact that nonresponse in this survey is associated with average wage even after accounting for other establishment characteristics, we propose a method that uses the administrative data to impute occupational wage values that are missing due to nonresponse. This imputation is complicated by the structure of the data. Because occupational wage data are collected as counts of employees in predefined wage ranges for each occupation, weighting approaches to nonresponse do not adequately adjust the estimates for certain domains of estimation. To preserve this data structure, we propose imputing each nonresponding establishment’s wage-interval counts as an ordered multinomial random variable, using a separate survival model for each occupation. Each model incorporates known auxiliary information for each establishment that is associated with the distribution of occupational wages, including geographic and industry characteristics. This flexible model allows the baseline hazard to vary by occupation while allowing predictors to adjust the probabilities that an employee’s salary falls within each of the specified ranges. An empirical study and simulation results suggest that the method imputes missing OES wages whose association with the establishment’s average wage more closely resembles the observed association.
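The abstract describes turning a survival model into probabilities for ordered wage intervals and then drawing an establishment's interval counts from an ordered multinomial. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes a grouped (discrete-interval) proportional-hazards form in which the wage interval plays the role of "time", so the establishment's linear predictor `eta` raises each baseline conditional survival factor to the power `exp(eta)`. A larger `eta` therefore raises the hazard of "stopping" in a lower interval and shifts employees toward lower wage ranges. The function names, baseline hazards, covariates, and coefficients are all hypothetical.

```python
import numpy as np


def interval_probabilities(baseline_hazard, eta):
    """Probabilities of falling in each ordered wage interval.

    Grouped proportional-hazards sketch: baseline_hazard[k] is the baseline
    probability of landing in interval k given the wage is above intervals
    0..k-1 (the last entry must be 1.0). eta is the linear predictor x'beta.
    """
    h0 = np.asarray(baseline_hazard, dtype=float)
    # S(k | x) = S0(k) ** exp(eta): raise each conditional survival factor.
    cond_surv = (1.0 - h0) ** np.exp(eta)
    hazard = 1.0 - cond_surv
    # Probability of "surviving" (wage above) every earlier interval.
    surv_before = np.concatenate(([1.0], np.cumprod(cond_surv)[:-1]))
    return surv_before * hazard


def impute_counts(n_employees, baseline_hazard, x, beta, rng):
    """Draw one set of interval counts for a nonresponding establishment."""
    eta = float(np.dot(x, beta))
    probs = interval_probabilities(baseline_hazard, eta)
    return rng.multinomial(n_employees, probs)


# Hypothetical inputs for a single occupation with five wage intervals.
baseline = [0.10, 0.25, 0.40, 0.50, 1.0]
beta = np.array([-0.8, 0.3])  # hypothetical: scaled average wage, industry flag
x = np.array([1.2, 1.0])
rng = np.random.default_rng(2014)

print(interval_probabilities(baseline, 0.0))
print(impute_counts(12, baseline, x, beta, rng))
```

Using a separate `baseline` vector for each occupation mirrors the per-occupation models described in the abstract, while `beta` lets establishment predictors tilt the interval probabilities. Because the draw is random, repeated calls with fresh random states produce different plausible count vectors that sum to the establishment's employment in the occupation.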

Article information

Ann. Appl. Stat., Volume 8, Number 2 (2014), 956-973.

First available in Project Euclid: 1 July 2014


Keywords: administrative data; auxiliary data; categorical outcome; establishment survey; missing data; imputation; survival analysis; regression trees


Horton, Nicholas J.; Toth, Daniell; Phipps, Polly. Adjusting models of ordered multinomial outcomes for nonignorable nonresponse in the occupational employment statistics survey. Ann. Appl. Stat. 8 (2014), no. 2, 956–973. doi:10.1214/14-AOAS714.


