The Annals of Applied Statistics

Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution

Casey Olives, Lianne Sheppard, Johan Lindström, Paul D. Sampson, Joel D. Kaufman, and Adam A. Szpiro

Full-text: Open access


There is growing evidence in the epidemiologic literature of the relationship between air pollution and adverse health outcomes. Prediction of individual air pollution exposure in the Environmental Protection Agency (EPA) funded Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) relies on a flexible spatio-temporal prediction model that integrates land-use regression with kriging to account for spatial dependence in pollutant concentrations. Temporal variability is captured using temporal trends estimated via modified singular value decomposition and temporally varying spatial residuals. This model uses monitoring data from existing regulatory networks and supplementary MESA Air monitoring data to predict concentrations for individual cohort members.
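
A model of the kind described above can be written compactly. The notation here is an assumption made for illustration and is not taken from this abstract: $y(s,t)$ is the concentration at location $s$ and time $t$, $f_i(t)$ are smooth temporal trends with $f_0(t) \equiv 1$, $X_i$ is a matrix of land-use covariates, and $\nu(s,t)$ is the spatio-temporal residual.

$$
y(s,t) = \sum_{i=0}^{m} \beta_i(s)\, f_i(t) + \nu(s,t),
\qquad
\beta_i(\cdot) \sim N\bigl(X_i \alpha_i,\ \Sigma_{\beta_i}(\theta_i)\bigr).
$$

Under this reading, land-use regression enters through the mean $X_i \alpha_i$ of each coefficient field, kriging enters through the spatial covariance $\Sigma_{\beta_i}(\theta_i)$, and the temporally varying spatial residuals correspond to $\nu(s,t)$.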

In general, spatio-temporal models are limited in their efficacy for large data sets due to computational intractability. We develop reduced-rank versions of the MESA Air spatio-temporal model. To do so, we apply low-rank kriging to account for spatial variation in the mean process and discuss the limitations of this approach. As an alternative, we represent spatial variation using thin plate regression splines. We compare the performance of the outlined models using EPA and MESA Air monitoring data for predicting concentrations of oxides of nitrogen ($\mathrm{NO}_{x}$)—a pollutant of primary interest in MESA Air—in the Los Angeles metropolitan area via cross-validated $R^{2}$.
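
The abstract does not say how the cross-validated $R^{2}$ is computed. The sketch below assumes one common mean-squared-error definition, $R^{2} = \max(0,\ 1 - \mathrm{MSE}/\mathrm{Var}(\text{observed}))$, and uses a simple least-squares line as a stand-in for the spatio-temporal model. The function names and the data are hypothetical and serve only to illustrate the mechanics.

```python
from statistics import fmean, pvariance


def cv_r2(observed, predicted):
    """MSE-based R^2: 1 - MSE / Var(observed), floored at zero (assumed definition)."""
    mse = fmean((o - p) ** 2 for o, p in zip(observed, predicted))
    return max(0.0, 1.0 - mse / pvariance(observed))


def k_fold_predictions(x, y, k, fit, predict):
    """Predict each observation from a model fitted without that observation's fold."""
    held_out = [None] * len(y)
    for fold in range(k):
        test_idx = [i for i in range(len(y)) if i % k == fold]
        train_idx = [i for i in range(len(y)) if i % k != fold]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            held_out[i] = predict(model, x[i])
    return held_out


def fit_line(x, y):
    """Least-squares line; a placeholder for the real prediction model."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_line(model, x_new):
    intercept, slope = model
    return intercept + slope * x_new


# Made-up illustrative data, not taken from the paper.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.2]
preds = k_fold_predictions(x, y, k=4, fit=fit_line, predict=predict_line)
print(round(cv_r2(y, preds), 3))
```

Flooring at zero means a model that predicts worse than the overall mean scores 0 rather than a negative value. In a monitoring setting the folds would more plausibly be formed by leaving out whole sites rather than individual observations, which is a further assumption not stated above.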

Our findings suggest that use of reduced-rank models can improve computational efficiency in certain cases. Low-rank kriging and thin plate regression splines (TPRS) were competitive across the formulations considered, although TPRS appeared to be more robust in some settings.

Article information

Ann. Appl. Stat., Volume 8, Number 4 (2014), 2509-2537.

First available in Project Euclid: 19 December 2014

Keywords: spatiotemporal modeling, reduced-rank, air pollution, kriging, thin plate splines


Olives, Casey; Sheppard, Lianne; Lindström, Johan; Sampson, Paul D.; Kaufman, Joel D.; Szpiro, Adam A. Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution. Ann. Appl. Stat. 8 (2014), no. 4, 2509–2537. doi:10.1214/14-AOAS786.




Supplemental materials

  • Supplementary material: Supplement to “Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution”. We provide a detailed derivation of the optimized likelihood, comparisons of the prediction variances, and a discussion of model selection by AIC for the paper “Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution” by Casey Olives, Lianne Sheppard, Johan Lindström, Paul D. Sampson, Joel D. Kaufman and Adam A. Szpiro.