Statistical Science

The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators

Christopher J. Paciorek
Source: Statist. Sci. Volume 25, Number 1 (2010), 107-125.

Abstract

Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When unmeasured confounding introduces spatial structure into the residuals, regression models with spatial random effects and closely related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias depends on the spatial scales of the covariate and the residual: one can reduce bias by fitting a spatial model only when there is variation in the covariate at a scale smaller than the scale of the unmeasured confounding. I also discuss how the scales of the residual and the covariate affect efficiency and uncertainty estimation when the residuals are independent of the covariate. In an application to the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect.
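
The scale argument in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's analysis: locations lie on a one-dimensional transect, the unmeasured confounder is a single sine wave, the covariate is the confounder plus independent small-scale noise, and the "spatial model" is replaced by an unpenalized low-frequency Fourier basis fitted by least squares rather than the spatial random effects, kriging, or penalized splines the paper studies. The function name `simulate_bias` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_bias(n=500, beta=1.0, small_scale_sd=1.0, seed=0):
    """Compare a naive slope with one adjusted for large-scale spatial variation.

    All modeling choices here are illustrative assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    s = np.sort(rng.uniform(0.0, 1.0, n))  # locations on a 1-D transect

    # Unmeasured confounder: varies only at a large spatial scale.
    z = np.sin(2 * np.pi * s)

    # Covariate: shares the large-scale pattern, plus small-scale variation.
    x = z + small_scale_sd * rng.standard_normal(n)

    # Outcome depends on the covariate and on the unmeasured confounder.
    y = beta * x + z + 0.5 * rng.standard_normal(n)

    # Naive fit: intercept + covariate, ignoring spatial structure.
    X_naive = np.column_stack([np.ones(n), x])
    beta_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0][1]

    # Adjusted fit: add low-frequency sine/cosine terms in location so that
    # the slope is identified from the small-scale variation in x.
    columns = [np.ones(n), x]
    for k in (1, 2):
        columns.append(np.sin(2 * np.pi * k * s))
        columns.append(np.cos(2 * np.pi * k * s))
    X_adj = np.column_stack(columns)
    beta_adj = np.linalg.lstsq(X_adj, y, rcond=None)[0][1]

    return beta_naive, beta_adj


if __name__ == "__main__":
    naive, adjusted = simulate_bias()
    print(f"true beta = 1.0, naive = {naive:.3f}, adjusted = {adjusted:.3f}")
```

In this setup the naive slope absorbs part of the confounder's effect, while the adjusted slope relies only on the small-scale variation in the covariate. Reducing `small_scale_sd` toward zero leaves little such variation, so the adjusted estimate should become much less precise, which mirrors the bias and uncertainty trade-off the abstract describes.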

Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1280841736
Digital Object Identifier: doi:10.1214/10-STS326
Mathematical Reviews number (MathSciNet): MR2741817


2013 © Institute of Mathematical Statistics
