Bayesian Analysis

Computationally Efficient Multivariate Spatio-Temporal Models for High-Dimensional Count-Valued Data

Jonathan R. Bradley, Scott H. Holan, and Christopher K. Wikle

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

We introduce a computationally efficient Bayesian model for predicting high-dimensional dependent count-valued data. In this setting, the Poisson data model with a latent Gaussian process model has become the de facto model. However, this model can be difficult to use in high-dimensional settings, where the data may be tabulated over different variables, geographic regions, and times. These computational difficulties are further exacerbated by acknowledging that count-valued data are naturally non-Gaussian. Thus, many current Bayesian approaches require one to carefully calibrate a Markov chain Monte Carlo (MCMC) technique. We avoid MCMC methods that require tuning by developing a new conjugate multivariate distribution. Specifically, we introduce a multivariate log-gamma distribution and provide substantial methodological development of independent interest, including results regarding conditional distributions, marginal distributions, an asymptotic relationship with the multivariate normal distribution, and full-conditional distributions for a Gibbs sampler. To incorporate dependence between variables, regions, and time points, a multivariate spatio-temporal mixed effects model (MSTM) is used. To demonstrate our methodology, we use data obtained from the US Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) program. In particular, our approach is motivated by the LEHD’s Quarterly Workforce Indicators (QWIs), which constitute current estimates of important US economic variables.
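The computational point of the abstract is that a log-gamma prior on a log-mean is conjugate to the Poisson likelihood, so full-conditional draws are exact and need no proposal tuning. The sketch below illustrates this in plain Python under stated assumptions: it assumes the univariate log-gamma LG(α, κ) has density κ^α/Γ(α) exp(αw − κe^w) (the log of a Gamma(α, rate κ) variable), and that a multivariate log-gamma vector is built as q = c + Vw from independent log-gamma entries w. All function names are hypothetical, the counts are toy inputs, and this is not the authors' implementation. The last function checks numerically the large-shape normal approximation that the abstract alludes to.

```python
import math
import random


def sample_log_gamma(shape, rate, rng):
    """Draw w = log(G), where G ~ Gamma(shape, rate).

    Assumed density of w: rate**shape / Gamma(shape) * exp(shape * w - rate * exp(w)).
    """
    # random.gammavariate is parameterized by (shape, scale), so scale = 1 / rate.
    return math.log(rng.gammavariate(shape, 1.0 / rate))


def conjugate_update(shape, rate, counts):
    """Posterior (shape, rate) of eta when y_i ~ Poisson(exp(eta)) independently
    and eta ~ LG(shape, rate). Since exp(eta) is Gamma, the update is closed form."""
    return shape + sum(counts), rate + len(counts)


def posterior_draws(shape, rate, counts, n_draws, seed=0):
    """Exact draws of eta from its posterior: no proposal or step size to tune."""
    rng = random.Random(seed)
    post_shape, post_rate = conjugate_update(shape, rate, counts)
    return [sample_log_gamma(post_shape, post_rate, rng) for _ in range(n_draws)]


def sample_mlg(c, V, shapes, rates, rng):
    """One draw of q = c + V w, with w_j ~ LG(shapes[j], rates[j]) independent."""
    w = [sample_log_gamma(a, k, rng) for a, k in zip(shapes, rates)]
    return [ci + sum(v * wj for v, wj in zip(row, w)) for ci, row in zip(c, V)]


def near_normal_mlg_draws(c, V, alpha, n_draws, seed=1):
    """Draws of c + sqrt(alpha) * V w with w_j ~ LG(alpha, alpha).

    For large alpha, sqrt(alpha) * LG(alpha, alpha) is close to N(0, 1), so these
    draws are approximately multivariate normal with mean c and covariance V V^T.
    """
    rng = random.Random(seed)
    scaled_V = [[math.sqrt(alpha) * v for v in row] for row in V]
    shapes = rates = [alpha] * len(V[0])
    return [sample_mlg(c, scaled_V, shapes, rates, rng) for _ in range(n_draws)]


if __name__ == "__main__":
    # Toy counts; the posterior of exp(eta) is Gamma(14, rate 4), with mean 3.5.
    draws = posterior_draws(2.0, 1.0, [3, 5, 4], n_draws=20000)
    print(sum(math.exp(d) for d in draws) / len(draws))
```

In a full Gibbs sampler each latent block would be updated this way from its own full conditional; the sketch shows only the single-parameter case, where conjugacy removes the need to choose a proposal distribution.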

Article information

Source
Bayesian Anal. (2017), 29 pages.

Dates
First available in Project Euclid: 11 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.ba/1507687687

Digital Object Identifier
doi:10.1214/17-BA1069

Subjects
Primary: 62H11: Directional data; spatial statistics
Secondary: 62P12: Applications to environmental and related topics

Keywords
aggregation; American Community Survey; Bayesian hierarchical model; big data; Longitudinal Employer-Household Dynamics (LEHD) program; Markov chain Monte Carlo; non-Gaussian; Quarterly Workforce Indicators

Rights
Creative Commons Attribution 4.0 International License.

Citation

Bradley, Jonathan R.; Holan, Scott H.; Wikle, Christopher K. Computationally Efficient Multivariate Spatio-Temporal Models for High-Dimensional Count-Valued Data. Bayesian Anal., advance publication, 11 October 2017. doi:10.1214/17-BA1069. https://projecteuclid.org/euclid.ba/1507687687


