The Annals of Applied Statistics

Stochastic approximation of score functions for Gaussian processes

Michael L. Stein, Jie Chen, and Mihai Anitescu

Full-text: Open access


We discuss the statistical properties of a recently introduced unbiased stochastic approximation to the score equations for maximum likelihood calculation for Gaussian processes. Under certain conditions, including bounded condition number of the covariance matrix, the approach achieves $O(n)$ storage and nearly $O(n)$ computational effort per optimization step, where $n$ is the number of data sites. Here, we prove that if the condition number of the covariance matrix is bounded, then the approximate score equations are nearly optimal in a well-defined sense. Therefore, not only is the approximation efficient to compute, but it also has comparable statistical properties to the exact maximum likelihood estimates. We discuss a modification of the stochastic approximation in which design elements of the stochastic terms mimic patterns from a $2^{n}$ factorial design. We prove these designs are always at least as good as the unstructured design, and we demonstrate through simulation that they can produce a substantial improvement over random designs. Our findings are validated by numerical experiments on simulated data sets of up to 1 million observations. We apply the approach to fit a space–time model to over 80,000 observations of total column ozone contained in the latitude band $40^{\circ}\mathrm{-}50^{\circ}\mathrm{N}$ during April 2012.
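The abstract does not write out the estimating equations, so the following is only a minimal sketch under stated assumptions. It assumes a zero-mean Gaussian vector $Z$ with covariance $K(\theta)$, for which the exact score in one parameter is $\frac{1}{2}Z^{T}K^{-1}K'K^{-1}Z-\frac{1}{2}\operatorname{tr}(K^{-1}K')$ with $K'=\partial K/\partial\theta$. It then replaces the trace with a Hutchinson-type average over $N$ random sign vectors $U_{j}$, as the keywords suggest. The exponential covariance, the function names, and the dense solves are illustrative choices that do not come from the article. The storage and computation rates quoted above would require iterative, matrix-free solves, which this sketch does not attempt.

```python
import numpy as np

def exp_cov(x, theta):
    """Exponential covariance exp(-|x_i - x_j| / theta); an illustrative model only."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-d / theta)

def exp_cov_deriv(x, theta):
    """Derivative of exp_cov with respect to the range parameter theta."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-d / theta) * d / theta**2

def exact_score(z, K, dK):
    """Exact score: 0.5 * z' K^{-1} dK K^{-1} z - 0.5 * tr(K^{-1} dK)."""
    Kinv_z = np.linalg.solve(K, z)
    trace = np.trace(np.linalg.solve(K, dK))
    return 0.5 * Kinv_z @ dK @ Kinv_z - 0.5 * trace

def approx_score(z, K, dK, U):
    """Same quadratic form, with the trace replaced by an average of
    u_j' K^{-1} dK u_j over the columns u_j of the probe matrix U."""
    Kinv_z = np.linalg.solve(K, z)
    Kinv_dK_U = np.linalg.solve(K, dK @ U)      # K^{-1} dK U, one column per probe
    trace_est = np.sum(U * Kinv_dK_U) / U.shape[1]
    return 0.5 * Kinv_z @ dK @ Kinv_z - 0.5 * trace_est

# Small demonstration on a regular grid (sizes chosen only to keep it fast).
rng = np.random.default_rng(0)
n, theta = 64, 0.3
x = np.linspace(0.0, 1.0, n)
K = exp_cov(x, theta)
dK = exp_cov_deriv(x, theta)
z = np.linalg.cholesky(K) @ rng.standard_normal(n)   # one draw from N(0, K)
U = rng.choice([-1.0, 1.0], size=(n, 50))            # independent symmetric signs

print("exact score:      ", exact_score(z, K, dK))
print("approximate score:", approx_score(z, K, dK, U))
```

Because each $U_{j}$ has independent entries with mean zero and variance one, $E(U_{j}^{T}AU_{j})=\operatorname{tr}(A)$ for any fixed matrix $A$, so the approximate score has the same expectation as the exact one. Its extra variance shrinks as the number of probe vectors grows.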

Article information

Ann. Appl. Stat. Volume 7, Number 2 (2013), 1162-1191.

First available in Project Euclid: 27 June 2013

Keywords: Gaussian process; unbiased estimating equations; Hutchinson trace estimators; maximum likelihood; iterative methods; preconditioning


Stein, Michael L.; Chen, Jie; Anitescu, Mihai. Stochastic approximation of score functions for Gaussian processes. Ann. Appl. Stat. 7 (2013), no. 2, 1162–1191. doi:10.1214/13-AOAS627.


