The Annals of Applied Statistics

Nonlinear predictive latent process models for integrating spatio-temporal exposure data from multiple sources

Nikolay Bliznyuk, Christopher J. Paciorek, Joel Schwartz, and Brent Coull

Full-text: Open access


Spatio-temporal prediction of levels of an environmental exposure is an important problem in environmental epidemiology. Our work is motivated by multiple studies on the spatio-temporal distribution of mobile-source, or traffic-related, particles in the greater Boston area. When multiple sources of exposure information are available, a joint model that pools information across sources maximizes data coverage over both space and time, thereby reducing the prediction error.

We consider a Bayesian hierarchical framework in which a joint model consists of a set of submodels, one for each data source, and a model for the latent process that serves to relate the submodels to one another. If a submodel depends on the latent process nonlinearly, inference using standard MCMC techniques can be computationally prohibitive. The implications are particularly severe when the data for each submodel are aggregated at different temporal scales.
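As a rough illustration of this structure, a joint model with $J$ data sources might be written as below. The notation ($y_j$, $h_j$, $\eta$, $\theta_j$, $\phi$) is ours and is not taken from the paper; it is a sketch of the general form only.

```latex
\begin{align*}
  y_j \mid \eta, \theta_j &\sim p_j\bigl(y_j \mid h_j(\eta), \theta_j\bigr),
      \qquad j = 1, \dots, J, \\
  \eta \mid \phi &\sim \mathcal{GP}\bigl(\mu_\phi, C_\phi\bigr), \\
  (\theta_1, \dots, \theta_J, \phi) &\sim \pi(\cdot).
\end{align*}
% Hypothetical example of a nonlinear link under temporal aggregation:
% if \eta(s, t) is a log-concentration and source j reports the average
% concentration over a set of days T, then
\[
  h_j(\eta)(s, T) = \log\Bigl( \frac{1}{|T|} \sum_{t \in T} \exp\{\eta(s, t)\} \Bigr),
\]
% which is nonlinear in \eta whenever |T| > 1.
```

Here each submodel $p_j$ sees the shared latent process $\eta$ only through its own link $h_j$. The aggregated link is an assumed example of how nonlinearity can arise when sources are recorded at different temporal scales.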

To make such problems tractable, we linearize the nonlinear components with respect to the latent process and induce sparsity in the covariance matrix of the latent process using compactly supported covariance functions. We propose an efficient MCMC scheme that takes advantage of these approximations. We use our model to address a temporal change of support problem whereby interest focuses on pooling daily and multiday black carbon readings in order to maximize the spatial coverage of the study region.
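The sparsity-inducing step can be sketched in a few lines of Python. This is a minimal illustration of covariance tapering in general, not the authors' implementation: the exponential covariance, the Wendland taper, the simulated locations, and all function names and parameter values are assumptions chosen for the example.

```python
import numpy as np

def exponential_cov(d, sigma2=1.0, range_=2.0):
    """Exponential covariance evaluated at distances d (assumed base model)."""
    return sigma2 * np.exp(-d / range_)

def wendland_taper(d, taper_range):
    """Compactly supported Wendland taper: (1 - r)^4 (4r + 1) for r < 1, else 0."""
    r = d / taper_range
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def tapered_cov(coords, taper_range, sigma2=1.0, range_=2.0):
    """Elementwise product of the base covariance and the taper."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return exponential_cov(d, sigma2, range_) * wendland_taper(d, taper_range)

# Hypothetical monitoring locations scattered over a 10 x 10 region.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(200, 2))

K = tapered_cov(coords, taper_range=1.5)
sparsity = np.mean(K == 0.0)

# The tapered matrix remains a valid covariance, so a Cholesky factor exists
# (a small jitter is added for numerical stability).
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(K)))

print(f"fraction of exact zeros in K: {sparsity:.3f}")
```

Entries for pairs of locations farther apart than `taper_range` are exactly zero, so in practice the matrix would be stored and factorized with sparse linear algebra routines rather than as a dense array.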

Article information

Ann. Appl. Stat., Volume 8, Number 3 (2014), 1538-1560.

First available in Project Euclid: 23 October 2014


Keywords: Air pollution; approximate inference; covariance tapering; Gaussian processes; hierarchical model; likelihood approximation; particulate matter; semiparametric model; spatio-temporal model


Bliznyuk, Nikolay; Paciorek, Christopher J.; Schwartz, Joel; Coull, Brent. Nonlinear predictive latent process models for integrating spatio-temporal exposure data from multiple sources. Ann. Appl. Stat. 8 (2014), no. 3, 1538–1560. doi:10.1214/14-AOAS737.



Supplemental materials

  • Supplementary material: Supplement to “Nonlinear predictive latent process models for integrating spatio-temporal exposure data from multiple sources”. Online supplements contain technical details and supplementary figures and tables.