## The Annals of Applied Statistics

### Stochastic approximation of score functions for Gaussian processes

#### Abstract

We discuss the statistical properties of a recently introduced unbiased stochastic approximation to the score equations for maximum likelihood calculation for Gaussian processes. Under certain conditions, including bounded condition number of the covariance matrix, the approach achieves $O(n)$ storage and nearly $O(n)$ computational effort per optimization step, where $n$ is the number of data sites. Here, we prove that if the condition number of the covariance matrix is bounded, then the approximate score equations are nearly optimal in a well-defined sense. Therefore, not only is the approximation efficient to compute, but it also has comparable statistical properties to the exact maximum likelihood estimates. We discuss a modification of the stochastic approximation in which design elements of the stochastic terms mimic patterns from a $2^{n}$ factorial design. We prove these designs are always at least as good as the unstructured design, and we demonstrate through simulation that they can produce a substantial improvement over random designs. Our findings are validated by numerical experiments on simulated data sets of up to 1 million observations. We apply the approach to fit a space–time model to over 80,000 observations of total column ozone contained in the latitude band $40^{\circ}\mathrm{-}50^{\circ}\mathrm{N}$ during April 2012.
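The approximation described in the abstract can be illustrated with a small sketch. For a covariance matrix $K(\theta)$ with derivative $K_i = \partial K / \partial \theta_i$ and data vector $Z$, the exact score component is $\tfrac12 Z^{\top}K^{-1}K_iK^{-1}Z - \tfrac12 \operatorname{tr}(K^{-1}K_i)$; the stochastic version replaces the trace with an average of $U_j^{\top}K^{-1}K_iU_j$ over random vectors $U_j$ with independent $\pm 1$ entries. The Python below is only a toy under stated assumptions: the function names, the exponential covariance, and the dense `np.linalg.solve` calls are illustrative choices, not the authors' implementation, and dense solves would not deliver the $O(n)$ storage or near-$O(n)$ cost the abstract refers to.

```python
import numpy as np


def exp_cov(x, range_):
    """Exponential covariance on 1-D sites x (illustrative choice of model)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-d / range_)


def exp_cov_drange(x, range_):
    """Derivative of exp_cov with respect to the range parameter."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-d / range_) * d / range_**2


def exact_score(z, K, dK):
    """Exact score component: 0.5 z'K^{-1} dK K^{-1} z - 0.5 tr(K^{-1} dK)."""
    Kinv_z = np.linalg.solve(K, z)
    return 0.5 * Kinv_z @ dK @ Kinv_z - 0.5 * np.trace(np.linalg.solve(K, dK))


def approx_score(z, K, dK, n_probes, rng):
    """Unbiased stochastic score: the trace is replaced by a probe average.

    Each probe u has independent +/-1 entries, so E[u' A u] = tr(A).
    Dense solves are used here for clarity only (an assumption of this sketch).
    """
    n = len(z)
    U = rng.choice([-1.0, 1.0], size=(n, n_probes))
    Kinv_z = np.linalg.solve(K, z)
    Kinv_U = np.linalg.solve(K, U)
    # K is symmetric, so u' K^{-1} dK u = (K^{-1} u)' (dK u), column by column.
    trace_est = np.mean(np.sum(Kinv_U * (dK @ U), axis=0))
    return 0.5 * Kinv_z @ dK @ Kinv_z - 0.5 * trace_est


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 30)
    # A nugget term keeps this toy covariance well conditioned.
    K = exp_cov(x, 0.3) + 0.5 * np.eye(30)
    dK = exp_cov_drange(x, 0.3)
    z = rng.multivariate_normal(np.zeros(30), K)
    print("exact :", exact_score(z, K, dK))
    print("approx:", approx_score(z, K, dK, 64, rng))
```

The two functions share the data-dependent quadratic form, so they differ only through the trace estimate. With unstructured random probes the error shrinks as the number of probes grows, which is the baseline against which the abstract's factorial-design modification is compared.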

#### Article information

Source
Ann. Appl. Stat. Volume 7, Number 2 (2013), 1162–1191.

Dates
First available in Project Euclid: 27 June 2013

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1372338483

Digital Object Identifier
doi:10.1214/13-AOAS627

Mathematical Reviews number (MathSciNet)
MR3113505

Zentralblatt MATH identifier
06279869

#### Citation

Stein, Michael L.; Chen, Jie; Anitescu, Mihai. Stochastic approximation of score functions for Gaussian processes. Ann. Appl. Stat. 7 (2013), no. 2, 1162–1191. doi:10.1214/13-AOAS627. http://projecteuclid.org/euclid.aoas/1372338483.
