The Annals of Applied Statistics

Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors

Huiyan Sang, Mikyoung Jun, and Jianhua Z. Huang



This paper investigates the cross-correlations across multiple climate model errors. We build a Bayesian hierarchical model that accounts for the spatial dependence of individual models as well as cross-covariances across different climate models. Our method allows for a nonseparable and nonstationary cross-covariance structure. We also present a covariance approximation approach to facilitate the computation in the modeling and analysis of very large multivariate spatial data sets. The covariance approximation consists of two parts: a reduced-rank part to capture the large-scale spatial dependence, and a sparse covariance matrix to correct the small-scale dependence error induced by the reduced rank approximation. We pay special attention to the case that the second part of the approximation has a block-diagonal structure. Simulation results of model fitting and prediction show substantial improvement of the proposed approximation over the predictive process approximation and the independent blocks analysis. We then apply our computational approach to the joint statistical modeling of multiple climate model errors.
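The two-part structure described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes a univariate process on a one-dimensional domain, an exponential covariance function, a predictive-process-style reduced-rank term built on a few knots, and a residual kept only within contiguous blocks. The names `exp_cov`, `full_scale_approx`, `knots`, and `block_ids`, and all parameter values, are hypothetical choices made for illustration.

```python
import numpy as np

def exp_cov(a, b, range_=0.3, sigma2=1.0):
    """Exponential covariance between 1-D location vectors (an assumed model)."""
    d = np.abs(a[:, None] - b[None, :])
    return sigma2 * np.exp(-d / range_)

def full_scale_approx(locs, knots, block_ids, cov=exp_cov):
    """Return (reduced-rank part, reduced-rank part + block-diagonal correction)."""
    c_sk = cov(locs, knots)                      # cross-covariance: locations vs. knots
    c_kk = cov(knots, knots)                     # covariance among the knots
    # Reduced-rank part: C(s, knots) C(knots, knots)^{-1} C(knots, s).
    low_rank = c_sk @ np.linalg.solve(c_kk, c_sk.T)
    # Residual left over by the reduced-rank part.
    residual = cov(locs, locs) - low_rank
    # Keep the residual only for pairs of locations in the same block,
    # which yields a sparse, block-diagonal correction term.
    same_block = block_ids[:, None] == block_ids[None, :]
    return low_rank, low_rank + residual * same_block

# Hypothetical setup: 200 locations, 10 knots, 5 contiguous blocks.
locs = np.linspace(0.0, 1.0, 200)
knots = np.linspace(0.0, 1.0, 10)
block_ids = np.minimum((locs * 5).astype(int), 4)

exact = exp_cov(locs, locs)
low_rank, approx = full_scale_approx(locs, knots, block_ids)

# Frobenius-norm errors of the two approximations against the exact covariance.
err_low_rank = np.linalg.norm(exact - low_rank)
err_full = np.linalg.norm(exact - approx)
print(f"reduced-rank only: {err_low_rank:.4f}")
print(f"with block-diagonal correction: {err_full:.4f}")
```

In this toy setting the corrected matrix reproduces the exact covariance within each block and falls back on the reduced-rank part across blocks, so its error can only be smaller than that of the reduced-rank part alone. The paper's setting (multivariate, nonstationary, on a globe) is considerably richer than this sketch.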

Article information

Ann. Appl. Stat., Volume 5, Number 4 (2011), 2519-2548.

First available in Project Euclid: 20 December 2011


Keywords: climate model output; co-regionalization; Gaussian processes; large spatial data set; multivariate spatial process


Sang, Huiyan; Jun, Mikyoung; Huang, Jianhua Z. Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors. Ann. Appl. Stat. 5 (2011), no. 4, 2519–2548. doi:10.1214/11-AOAS478.


