Statistical Science

The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators

Christopher J. Paciorek

Abstract

Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When unmeasured confounding introduces spatial structure into the residuals, regression models with spatial random effects and closely related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias depends on the spatial scales of the covariate and the residual: one can reduce bias by fitting a spatial model only when there is variation in the covariate at a scale smaller than the scale of the unmeasured confounding. I also discuss how the scales of the residual and the covariate affect efficiency and uncertainty estimation when the residuals are independent of the covariate. In an application on the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect.
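
The scale argument in the abstract can be illustrated with a small simulation. The sketch below is not the paper's framework or code: it replaces spatial random effects, kriging, and penalized splines with a plain least-squares fit on a fixed low-frequency Fourier basis, and every name and setting in it (`simulate_bias`, the one-dimensional domain, the number of basis terms, the noise levels) is an assumption chosen for illustration. It covers only the favorable case, in which the unmeasured confounder varies at large scale while the covariate also carries independent small-scale variation.

```python
import numpy as np


def simulate_bias(n=500, n_rep=200, beta=1.0, n_freq=3, seed=0):
    """Compare the bias of naive OLS with OLS adjusted for large-scale
    spatial variation, under large-scale unmeasured confounding.

    Illustrative assumptions: locations on a 1-D transect, a confounder
    built from low-frequency Fourier terms, white-noise small-scale
    variation in the covariate.
    """
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, n)  # spatial locations

    # Low-frequency basis that stands in for a "spatial term" in the model.
    cols = [np.ones(n)]
    for k in range(1, n_freq + 1):
        cols.append(np.sin(2 * np.pi * k * s))
        cols.append(np.cos(2 * np.pi * k * s))
    B = np.column_stack(cols)

    naive, adjusted = [], []
    for _ in range(n_rep):
        # Unmeasured confounder: smooth, large-scale spatial surface.
        z = B[:, 1:] @ rng.normal(size=2 * n_freq)
        # Covariate: shares the large-scale surface, plus small-scale variation.
        x = z + rng.normal(size=n)
        # Outcome: true effect of x is beta; z is omitted from both fits.
        y = beta * x + z + rng.normal(scale=0.5, size=n)

        # Naive fit: intercept and covariate only.
        X_naive = np.column_stack([np.ones(n), x])
        naive.append(np.linalg.lstsq(X_naive, y, rcond=None)[0][1])

        # Adjusted fit: covariate plus the large-scale basis.
        X_adj = np.column_stack([x, B])
        adjusted.append(np.linalg.lstsq(X_adj, y, rcond=None)[0][0])

    return np.mean(naive) - beta, np.mean(adjusted) - beta


naive_bias, adjusted_bias = simulate_bias()
print(f"naive bias:    {naive_bias:.3f}")
print(f"adjusted bias: {adjusted_bias:.3f}")
```

In this setup the adjusted fit identifies the effect from the small-scale variation in the covariate, which is independent of the confounder, so its bias should be close to zero while the naive fit remains biased. If the covariate had no variation at a scale smaller than the confounder, the adjustment would leave no usable variation, which is consistent with the condition stated in the abstract.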

Article information

Source
Statist. Sci. Volume 25, Number 1 (2010), 107-125.

Dates
First available: 3 August 2010

Permanent link to this document
http://projecteuclid.org/euclid.ss/1280841736

Digital Object Identifier
doi:10.1214/10-STS326

Mathematical Reviews number (MathSciNet)
MR2741817

Citation

Paciorek, Christopher J. The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators. Statistical Science 25 (2010), no. 1, 107-125. doi:10.1214/10-STS326. http://projecteuclid.org/euclid.ss/1280841736.

