Bayesian Analysis

High-Dimensional Bayesian Geostatistics

Sudipto Banerjee

Full-text: Open access

Abstract

With the growing capabilities of Geographic Information Systems (GIS) and user-friendly software, statisticians today routinely encounter geographically referenced data containing observations from a large number of spatial locations and time points. Over the last decade, hierarchical spatiotemporal process models have become widely deployed statistical tools for researchers to better understand the complex nature of spatial and temporal variability. However, fitting hierarchical spatiotemporal models often involves expensive matrix computations whose complexity grows cubically in the number of spatial locations and time points. This renders such models infeasible for large data sets. This article offers a focused review of two methods for constructing well-defined, highly scalable spatiotemporal stochastic processes. Both processes can be used as “priors” for spatiotemporal random fields. The first approach constructs a low-rank process operating on a lower-dimensional subspace. The second approach constructs a Nearest-Neighbor Gaussian Process (NNGP) that ensures sparse precision matrices for its finite realizations. Both processes can be exploited as a scalable prior embedded within a rich hierarchical modeling framework to deliver full Bayesian inference. These approaches can be described as model-based solutions for big spatiotemporal datasets. The models ensure that the algorithmic complexity per iteration is on the order of n floating point operations (flops), where n is the number of spatial locations. We compare these methods and provide some insight into their methodological underpinnings.
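To make the sparse-precision idea concrete, the following is a minimal sketch (not the article's implementation) of a nearest-neighbor construction of this kind. Everything named here is an assumption made for illustration: the exponential covariance, the parameters `sigma2` and `phi`, the neighbor count `m`, the ordering of locations as given, and the helper names `exp_cov`, `nngp_factors`, and `nngp_precision`. Each location is regressed on at most `m` of the locations that precede it, which yields a factor `A` with at most `m` nonzeros per row and a precision matrix of the form (I - A)' D^{-1} (I - A).

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between two sets of points (assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def nngp_factors(coords, m=5, sigma2=1.0, phi=3.0):
    """Return (A, D) for a nearest-neighbor approximation.

    A is strictly lower triangular with at most m nonzeros per row;
    D holds the conditional variances. A is stored densely here for
    readability; a practical version would use sparse storage.
    """
    n = coords.shape[0]
    A = np.zeros((n, n))
    D = np.zeros(n)
    D[0] = sigma2  # the first location has no predecessors
    for i in range(1, n):
        # Naive neighbor search among earlier locations (illustration only).
        dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nb = np.argsort(dist)[:m]
        C_nn = exp_cov(coords[nb], coords[nb], sigma2, phi)
        c_in = exp_cov(coords[i:i + 1], coords[nb], sigma2, phi).ravel()
        w = np.linalg.solve(C_nn, c_in)  # one small (at most m x m) solve
        A[i, nb] = w
        D[i] = sigma2 - c_in @ w
    return A, D


def nngp_precision(A, D):
    """Precision matrix (I - A)^T D^{-1} (I - A)."""
    I_minus_A = np.eye(len(D)) - A
    return I_minus_A.T @ (I_minus_A / D[:, None])


rng = np.random.default_rng(0)
coords = rng.uniform(size=(40, 2))
A, D = nngp_factors(coords, m=5)
Q = nngp_precision(A, D)
```

Each location requires one solve of size at most `m`, so building the factors costs a number of flops linear in the number of locations for fixed `m` (the brute-force neighbor search in this sketch is not linear and would be replaced in practice). If `m` is set to n - 1, every location conditions on all of its predecessors and the construction reproduces the inverse of the full covariance matrix.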

Article information

Source
Bayesian Anal. Volume 12, Number 2 (2017), 583-614.

Dates
First available in Project Euclid: 16 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.ba/1494921642

Digital Object Identifier
doi:10.1214/17-BA1056R

Keywords
Bayesian statistics; Gaussian process; low rank Gaussian process; Nearest Neighbor Gaussian process (NNGP); predictive process; sparse Gaussian process; spatiotemporal statistics

Rights
Creative Commons Attribution 4.0 International License.

Citation

Banerjee, Sudipto. High-Dimensional Bayesian Geostatistics. Bayesian Anal. 12 (2017), no. 2, 583–614. doi:10.1214/17-BA1056R. https://projecteuclid.org/euclid.ba/1494921642


