Statistical Science

Analysis Methods for Computer Experiments: How to Assess and What Counts?

Hao Chen, Jason L. Loeppky, Jerome Sacks, and William J. Welch

Full-text: Open access


Statistical methods based on a regression model plus a zero-mean Gaussian process (GP) have been widely used for predicting the output of a deterministic computer code. There are many suggestions in the literature for how to choose the regression component and how to model the correlation structure of the GP. This article argues that comprehensive, evidence-based assessment strategies are needed when comparing such modeling options. Otherwise, one is easily misled. Applying the strategies to several computer codes shows that a regression model more complex than a constant mean either has little impact on prediction accuracy or is an impediment. The choice of correlation function has modest effect, but there is little to separate two common choices, the power exponential and the Matérn, if the latter is optimized with respect to its smoothness. The applications presented here also provide no evidence that a composite of GPs provides practical improvement in prediction accuracy. A limited comparison of Bayesian and empirical Bayes methods is similarly inconclusive. In contrast, we find that the effect of experimental design is surprisingly large, even for designs of the same type with the same theoretical properties.
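The kind of comparison the abstract describes can be illustrated with a minimal sketch; it is not the authors' code. It fits a constant-mean GP predictor under two correlation families and scores each by holdout root mean squared error, normalized by the error of always predicting the training mean. Everything specific here is an assumption made for illustration: the toy function `toy_code`, the fixed correlation parameters (which would normally be estimated, for example by maximum likelihood), the Matérn smoothness fixed at 5/2 rather than optimized, the midpoint Latin hypercube design, and all function names.

```python
import numpy as np


def power_exp_corr(A, B, theta, p):
    """Power exponential correlation: exp(-sum_j theta_j * |h_j|**p_j)."""
    h = np.abs(A[:, None, :] - B[None, :, :])
    return np.exp(-np.sum(theta * h ** p, axis=2))


def matern52_corr(A, B, theta):
    """Product Matern correlation with smoothness fixed at 5/2 (assumed form)."""
    a = np.sqrt(5.0) * theta * np.abs(A[:, None, :] - B[None, :, :])
    return np.prod((1.0 + a + a ** 2 / 3.0) * np.exp(-a), axis=2)


def gp_predict(corr, X, y, X_new):
    """Predict at X_new with a constant mean plus a zero-mean GP."""
    R = corr(X, X)
    ones = np.ones(len(y))
    # Generalized least squares estimate of the constant mean.
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    # Without a nugget, this predictor interpolates the training outputs.
    return mu + corr(X_new, X) @ np.linalg.solve(R, y - mu)


def normalized_rmse(y_true, y_pred, y_train_mean):
    """Holdout RMSE relative to always predicting the training mean."""
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return rmse / np.sqrt(np.mean((y_train_mean - y_true) ** 2))


def toy_code(X):
    """Hypothetical deterministic 'computer code' with two inputs."""
    return np.sin(2.0 * np.pi * X[:, 0]) + X[:, 1] ** 2


rng = np.random.default_rng(0)
n, d = 30, 2
# Midpoint Latin hypercube: each input takes every level (i + 0.5) / n once.
X = np.column_stack([(rng.permutation(n) + 0.5) / n for _ in range(d)])
y = toy_code(X)
X_hold = rng.uniform(size=(500, d))
y_hold = toy_code(X_hold)

# Correlation parameters are fixed by hand here purely for illustration.
candidates = {
    "power exponential": lambda A, B: power_exp_corr(A, B, theta=10.0, p=1.5),
    "Matern 5/2": lambda A, B: matern52_corr(A, B, theta=3.0),
}
nrmse = {}
for name, corr in candidates.items():
    y_hat = gp_predict(corr, X, y, X_hold)
    nrmse[name] = normalized_rmse(y_hold, y_hat, y.mean())
    print(f"{name}: normalized holdout RMSE = {nrmse[name]:.3f}")
```

Rerunning the sketch with different seeds produces different Latin hypercube designs of the same type. Repeating the comparison over many such designs, rather than relying on a single one, is one way to avoid being misled by design-to-design variation.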

Article information

Statist. Sci., Volume 31, Number 1 (2016), 40-60.

First available in Project Euclid: 10 February 2016


Keywords: Correlation function; Gaussian process; kriging; prediction accuracy; regression


Chen, Hao; Loeppky, Jason L.; Sacks, Jerome; Welch, William J. Analysis Methods for Computer Experiments: How to Assess and What Counts? Statist. Sci. 31 (2016), no. 1, 40–60. doi:10.1214/15-STS531.




Supplemental materials

  • Supplement to “Analysis Methods for Computer Experiments: How to Assess and What Counts?” This report (whatcounts-supp.pdf) contains further description of the test functions and data from running them, further results for root mean squared error, findings for maximum absolute error, further results on uncertainty of prediction, and details of the simulation investigating regression terms. Inputs to the Arctic sea-ice code: ice-x.txt. Outputs from the code: ice-y.txt.