The Annals of Applied Statistics

Statistical paleoclimate reconstructions via Markov random fields

Dominique Guillot, Bala Rajaratnam, and Julien Emile-Geay



Understanding centennial-scale climate variability requires data sets that are accurate, long, continuous and of broad spatial coverage. Since instrumental measurements are generally only available after 1850, temperature fields must be reconstructed using paleoclimate archives, known as proxies. Various climate field reconstruction (CFR) methods have been proposed to relate past temperature to such proxy networks. In this work, we propose a new CFR method, called GraphEM, based on Gaussian Markov random fields embedded within an EM algorithm. Gaussian Markov random fields provide a natural and flexible framework for modeling high-dimensional spatial fields. At the same time, they provide the parameter reduction necessary for obtaining precise and well-conditioned estimates of the covariance structure, even in the sample-starved setting common in paleoclimate applications. We propose and compare several methods for estimating the graphical structure of climate fields, and demonstrate how the GraphEM algorithm can be used to reconstruct past climate variations. Using synthetic data, we compare the performance of GraphEM to that of the widely used CFR method RegEM with regularization via truncated total least squares. Our results show that GraphEM can yield significant improvements, with uniform gains over space, and far better risk properties. We demonstrate that the spatial structure of temperature fields can be well estimated by graphs in which each node is connected only to a few geographically close neighbors, and that the increase in performance is directly related to recovering the underlying sparsity in the covariance of the spatial field. Our work demonstrates how significant improvements can be made in climate reconstruction methods by better modeling the covariance structure of the climate field.
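To make concrete why a sparse graph reduces the number of parameters involved, the following is a minimal sketch of one ingredient of any EM-style imputation under a Gaussian Markov random field: the conditional mean of a single missing node given all the others. It is not the authors' GraphEM implementation. The function name `conditional_mean`, the 4-node chain graph, and the numerical values are hypothetical and chosen only for illustration. The sketch relies on the standard fact that, for x ~ N(mu, Omega^{-1}), a zero entry Omega[i][j] means nodes i and j are conditionally independent given the rest, so only graph neighbors enter the sum.

```python
def conditional_mean(i, x, mu, omega):
    """Mean of node i given all other nodes, for x ~ N(mu, omega^{-1}).

    omega is the precision (inverse covariance) matrix as a list of rows.
    Only neighbors of i (nonzero omega[i][j]) contribute to the sum.
    """
    total = 0.0
    for j, w in enumerate(omega[i]):
        if j != i and w != 0.0:
            total += w * (x[j] - mu[j])
    return mu[i] - total / omega[i][i]


# Hypothetical chain graph 0-1-2-3: tridiagonal precision matrix,
# so each node is linked only to its immediate neighbors.
omega = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
mu = [0.0, 0.0, 0.0, 0.0]

# Node 1 is missing; its neighbors are nodes 0 and 2.
x = [1.0, None, 3.0, 10.0]
print(conditional_mean(1, x, mu, omega))  # 2.0
```

Changing the value at node 3 leaves the result unchanged, because node 3 is not a neighbor of node 1 in this graph. A full EM iteration would also need the conditional covariance and a graph-constrained update of the mean and precision matrix, which this sketch omits.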

Article information

Ann. Appl. Stat., Volume 9, Number 1 (2015), 324-352.

First available in Project Euclid: 28 April 2015


Keywords: Climate reconstructions; Markov random fields; covariance matrix estimation; sparsity; model selection; pseudoproxies


Guillot, Dominique; Rajaratnam, Bala; Emile-Geay, Julien. Statistical paleoclimate reconstructions via Markov random fields. Ann. Appl. Stat. 9 (2015), no. 1, 324–352. doi:10.1214/14-AOAS794.


