The Annals of Statistics

Accurate emulators for large-scale computer experiments

Ben Haaland and Peter Z. G. Qian

Full-text: Open access

Abstract

Large-scale computer experiments are becoming increasingly important in science. A multi-step procedure, which builds an accurate interpolator in multiple steps, is introduced to statisticians for modeling such experiments. In practice, the procedure shows substantial improvements in overall accuracy, but its theoretical properties are not well established. We introduce the terms nominal and numeric error and decompose the overall error of an interpolator into nominal and numeric portions. Bounds on the numeric and nominal error are developed to show theoretically that substantial gains in overall accuracy can be attained with the multi-step approach.
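The abstract does not spell out the multi-step procedure. Below is a minimal sketch of the general idea, in the spirit of multistep scattered-data interpolation, under several assumptions:

- the designs are nested and one-dimensional;
- a Gaussian kernel is used, with a scale that shrinks along with the design spacing;
- each step interpolates the residual left by the previous steps.

This is not the authors' implementation. The test function, the designs, the scales and all function names (`solve`, `gauss_kernel`, `fit_step`, `multi_step_interpolator`, `grid`) are hypothetical.

Tying each scale to its design spacing keeps every kernel matrix diagonally dominant, so the floating-point solve stays well conditioned at every step. Suppose nominal error is read as the error of the interpolant s computed in exact arithmetic, and numeric error as the gap between s and its floating-point version s̃. Then the triangle inequality splits the overall error as |f(x) − s̃(x)| ≤ |f(x) − s(x)| + |s(x) − s̃(x)|.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Bring the largest available pivot into place to limit round-off growth.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gauss_kernel(x, y, eps):
    """Gaussian kernel with scale eps (an assumed choice of kernel)."""
    return math.exp(-((x - y) / eps) ** 2)


def fit_step(points, values, eps):
    """Single-step kernel interpolant of `values` at `points` with scale `eps`."""
    A = [[gauss_kernel(p, q, eps) for q in points] for p in points]
    coef = solve(A, values)
    return lambda x: sum(c * gauss_kernel(x, p, eps) for c, p in zip(coef, points))


def multi_step_interpolator(f, designs, scales):
    """Hypothetical multi-step scheme: each step interpolates the residual
    f - (sum of earlier steps) on its own design with its own kernel scale."""
    steps = []

    def current(x):
        return sum(step(x) for step in steps)

    for points, eps in zip(designs, scales):
        residual = [f(p) - current(p) for p in points]  # uses earlier steps only
        steps.append(fit_step(points, residual, eps))
    return current


def grid(n):
    """n equally spaced points on [0, 1]; grid(5), grid(9), grid(17) are nested."""
    return [i / (n - 1) for i in range(n)]


f = lambda x: math.sin(6 * x)  # hypothetical "simulator" output
designs = [grid(5), grid(9), grid(17)]
# Scale equal to the spacing h = 1/(n - 1): off-diagonal row sums are at most
# 2 * (e^-1 + e^-4 + ...) < 1, so each kernel matrix is diagonally dominant.
scales = [1.0 / (len(d) - 1) for d in designs]
s = multi_step_interpolator(f, designs, scales)

# Largest interpolation residual on the finest design (round-off level).
print(max(abs(s(x) - f(x)) for x in designs[-1]))
```

Each step interpolates the current residual on its own design, so the final sum reproduces f at every point of the finest design, and therefore at every point of the earlier, nested designs. Between design points, accuracy depends on f, the designs and the scales. Each scale trades smoothness of the fit against conditioning of the kernel matrix.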

Article information

Source
Ann. Statist., Volume 39, Number 6 (2011), 2974-3002.

Dates
First available in Project Euclid: 24 January 2012

Permanent link to this document
https://projecteuclid.org/euclid.aos/1327413775

Digital Object Identifier
doi:10.1214/11-AOS929

Mathematical Reviews number (MathSciNet)
MR3012398

Zentralblatt MATH identifier
1246.65172

Subjects
Primary: 41A17: Inequalities in approximation (Bernstein, Jackson, Nikol′skiĭ-type inequalities)
Secondary: 65M12: Stability and convergence of numerical methods; 65G50: Roundoff error

Keywords
Computer experiment, emulation, interpolation, Gaussian process, large-scale problem, multi-step procedure, numerical technique, radial basis function, reproducing kernel Hilbert space

Citation

Haaland, Ben; Qian, Peter Z. G. Accurate emulators for large-scale computer experiments. Ann. Statist. 39 (2011), no. 6, 2974–3002. doi:10.1214/11-AOS929. https://projecteuclid.org/euclid.aos/1327413775


