The Annals of Applied Statistics

Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics

Jonathan R. Bradley, Scott H. Holan, and Christopher K. Wikle


Abstract

Many data sources report related variables of interest that are also referenced over geographic regions and time; however, there are relatively few general statistical methods that one can readily use that incorporate these multivariate spatio-temporal dependencies. Additionally, many multivariate spatio-temporal areal data sets are extremely high dimensional, which leads to practical issues when formulating statistical models. For example, we analyze Quarterly Workforce Indicators (QWI) published by the US Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) program. QWIs are available by different variables, regions, and time points, resulting in millions of tabulations. Despite their already expansive coverage, by adopting a fully Bayesian framework, the scope of the QWIs can be extended to provide estimates of missing values along with associated measures of uncertainty. Motivated by the LEHD, and other applications in federal statistics, we introduce the multivariate spatio-temporal mixed effects model (MSTM), which can be used to efficiently model high-dimensional multivariate spatio-temporal areal data sets. The proposed MSTM extends the notion of Moran’s I basis functions to the multivariate spatio-temporal setting. This extension leads to several methodological contributions, including extremely effective dimension reduction, a dynamic linear model for multivariate spatio-temporal areal processes, and the reduction of a high-dimensional parameter space using a novel parameter model.
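The Moran's I basis functions mentioned above can be illustrated in their ordinary, purely spatial form. The following is a minimal sketch, not the MSTM itself: it assumes a single variable, a single time point, a hypothetical adjacency matrix `A` for four regions arranged in a line, and an intercept-only design matrix `X`. The function name `moran_basis` and the choice to keep the `r` largest eigenvalues are illustrative assumptions; the multivariate spatio-temporal extension is defined in the paper.

```python
import numpy as np


def moran_basis(X, A, r):
    """Return the r leading eigenvectors (and eigenvalues) of the Moran operator.

    X : (n, p) design matrix of covariates (assumed full column rank here).
    A : (n, n) symmetric adjacency matrix of the areal units.
    r : number of basis functions to keep (illustrative choice).
    """
    n = X.shape[0]
    # Projector onto the orthogonal complement of the column space of X.
    P_perp = np.eye(n) - X @ np.linalg.pinv(X)
    # Moran operator: the adjacency structure with the covariates projected out.
    M = P_perp @ A @ P_perp
    # Symmetrize to remove floating-point asymmetry before calling eigh.
    M = (M + M.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:r]  # indices of the r largest
    return eigvecs[:, order], eigvals[order]


# Hypothetical toy example: four regions on a line, intercept-only covariates.
A = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)
X = np.ones((4, 1))
S, lam = moran_basis(X, A, r=1)
```

Eigenvectors with nonzero eigenvalues are orthogonal to the columns of `X`, which is why this kind of basis avoids confounding with the fixed effects. In this toy example only one eigenvalue is positive, so `r=1` is used; in practice the rank is a modeling choice.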

Article information

Source
Ann. Appl. Stat., Volume 9, Number 4 (2015), 1761-1791.

Dates
Received: January 2015
Revised: June 2015
First available in Project Euclid: 28 January 2016

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1453993093

Digital Object Identifier
doi:10.1214/15-AOAS862

Mathematical Reviews number (MathSciNet)
MR3456353

Zentralblatt MATH identifier
06560809

Keywords
Bayesian hierarchical model; Longitudinal Employer-Household Dynamics (LEHD) program; Kalman filter; Markov chain Monte Carlo; multivariate spatio-temporal data; Moran’s I basis

Citation

Bradley, Jonathan R.; Holan, Scott H.; Wikle, Christopher K. Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics. Ann. Appl. Stat. 9 (2015), no. 4, 1761–1791. doi:10.1214/15-AOAS862. https://projecteuclid.org/euclid.aoas/1453993093


