Bayesian Analysis

Spatial Disease Mapping Using Directed Acyclic Graph Auto-Regressive (DAGAR) Models

Abhirup Datta, Sudipto Banerjee, James S. Hodges, and Leiwen Gao


Abstract

Hierarchical models for regionally aggregated disease incidence data commonly involve region-specific latent random effects that are modeled jointly as having a multivariate Gaussian distribution. The covariance or precision matrix incorporates the spatial dependence between the regions. Common choices for the precision matrix include the widely used ICAR model, which is singular, and its nonsingular extension, which lacks interpretability. We propose a new parametric model for the precision matrix based on a directed acyclic graph (DAG) representation of the spatial dependence. Our model guarantees positive definiteness and hence, in addition to being a valid prior for regional spatially correlated random effects, can also directly model the outcome from dependent data such as images and networks. Theoretical results establish a link between the parameters in our model and the variances and covariances of the random effects. Simulation studies demonstrate that the improved interpretability of our model yields more accurate recovery of the latent spatial random effects and better inference on the spatial covariance parameters. Under modest spatial correlation, our model far outperforms the CAR models, while the performances are similar when the spatial correlation is strong. We assess sensitivity to the choice of ordering in the DAG construction using theoretical and empirical results, which testify to the robustness of our model. Finally, we present a large-scale public health application demonstrating the competitive performance of the model.
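
To make the DAG-based construction more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes regions are ordered by their index, that each region's directed neighbors are the adjacent regions earlier in that ordering, and that the precision matrix has the form Q = (I - B)^T F (I - B) with B strictly lower triangular and F diagonal. The specific weights rho / (1 + (n_i - 1) rho^2) and conditional precisions (1 + (n_i - 1) rho^2) / (1 - rho^2) are assumptions that are not stated in the abstract, and the function name `dagar_precision` is hypothetical.

```python
import numpy as np


def dagar_precision(adjacency, rho):
    """Sketch of a DAG-based precision matrix Q = (I - B)^T F (I - B).

    adjacency: symmetric 0/1 matrix; regions are ordered by their index.
    rho: spatial correlation parameter, assumed to satisfy 0 <= rho < 1.
    """
    A = np.asarray(adjacency, dtype=float)
    k = A.shape[0]
    B = np.zeros((k, k))
    tau = np.empty(k)
    for i in range(k):
        # Directed neighbors: adjacent regions that precede region i.
        parents = np.flatnonzero(A[i, :i])
        n_i = parents.size
        denom = 1.0 + (n_i - 1) * rho**2
        B[i, parents] = rho / denom      # assumed autoregressive weight
        tau[i] = denom / (1.0 - rho**2)  # assumed conditional precision
    L = np.eye(k) - B
    # L has a unit diagonal and tau > 0, so Q is positive definite.
    return L.T @ (tau[:, None] * L)


rho = 0.5

# Path graph on four regions: 0 - 1 - 2 - 3.
path = np.zeros((4, 4))
for i in range(3):
    path[i, i + 1] = path[i + 1, i] = 1.0
Q_path = dagar_precision(path, rho)

# 2 x 2 grid graph with edges 0-1, 0-2, 1-3, 2-3.
grid = np.zeros((4, 4))
for a, b in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    grid[a, b] = grid[b, a] = 1.0
Q_grid = dagar_precision(grid, rho)

print(np.round(np.linalg.inv(Q_path), 4))      # implied covariance on the path
print(np.linalg.eigvalsh(Q_grid).min() > 0.0)  # positive definiteness check
```

Under these assumed weights, the path graph reduces to a stationary AR(1) process, so the implied covariance between regions i and j is rho raised to the power |i - j|. Because I - B is unit lower triangular and F has positive diagonal entries, Q is positive definite for any ordering and any rho in [0, 1), which is the property the abstract highlights.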

Article information

Source
Bayesian Anal., Volume 14, Number 4 (2019), 1221-1244.

Dates
First available in Project Euclid: 3 October 2019

Permanent link to this document
https://projecteuclid.org/euclid.ba/1570068455

Digital Object Identifier
doi:10.1214/19-BA1177

Mathematical Reviews number (MathSciNet)
MR4044851

Zentralblatt MATH identifier
07159874

Keywords
areal data; Bayesian inference; directed acyclic graphs; disease mapping; spatial autoregression

Rights
Creative Commons Attribution 4.0 International License.

Citation

Datta, Abhirup; Banerjee, Sudipto; Hodges, James S.; Gao, Leiwen. Spatial Disease Mapping Using Directed Acyclic Graph Auto-Regressive (DAGAR) Models. Bayesian Anal. 14 (2019), no. 4, 1221–1244. doi:10.1214/19-BA1177. https://projecteuclid.org/euclid.ba/1570068455



