Statistics Surveys

A comparison of spatial predictors when datasets could be very large

Jonathan R. Bradley, Noel Cressie, and Tao Shi

Full-text: Open access

Abstract

In this article, we review and compare a number of methods of spatial prediction, where each method is viewed as an algorithm that processes spatial data. To demonstrate the breadth of available choices, we consider both traditional and more recently introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, fixed rank kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for readers unfamiliar with these methods, including the definition of and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of $\mathrm{CO}_{2}$ data from NASA’s AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.

Article information

Source
Statist. Surv. Volume 10 (2016), 100-131.

Dates
Received: October 2014
First available in Project Euclid: 19 July 2016

Permanent link to this document
https://projecteuclid.org/euclid.ssu/1468952015

Digital Object Identifier
doi:10.1214/16-SS115

Mathematical Reviews number (MathSciNet)
MR3527662

Zentralblatt MATH identifier
1347.62083

Subjects
Primary: 62H11: Directional data; spatial statistics
Secondary: 62P12: Applications to environmental and related topics

Keywords
Best linear unbiased predictor; GIS; massive data; reduced rank statistical models; model selection

Citation

Bradley, Jonathan R.; Cressie, Noel; Shi, Tao. A comparison of spatial predictors when datasets could be very large. Statist. Surv. 10 (2016), 100–131. doi:10.1214/16-SS115. https://projecteuclid.org/euclid.ssu/1468952015.

