• Bernoulli
  • Volume 25, Number 4A (2019), 2883-2919.

A supermartingale approach to Gaussian process based sequential design of experiments

Julien Bect, François Bachoc, and David Ginsbourger

Full-text: Access denied (no subscription detected)

We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber. If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text


Gaussian process (GP) models have become a well-established framework for the adaptive design of costly experiments, and notably of computer experiments. GP-based sequential designs have been found practically efficient for various objectives, such as global optimization (estimating the global maximum or maximizer(s) of a function), reliability analysis (estimating a probability of failure) or the estimation of level sets and excursion sets. In this paper, we study the consistency of an important class of sequential designs, known as stepwise uncertainty reduction (SUR) strategies. Our approach relies on the key observation that the sequence of residual uncertainty measures, in SUR strategies, is generally a supermartingale with respect to the filtration generated by the observations. This observation enables us to establish generic consistency results for a broad class of SUR strategies. The consistency of several popular sequential design strategies is then obtained by means of this general result. Notably, we establish the consistency of two SUR strategies proposed by Bect, Ginsbourger, Li, Picheny and Vazquez (Stat. Comput. 22 (2012) 773–793) – to the best of our knowledge, these are the first proofs of consistency for GP-based sequential design algorithms dedicated to the estimation of excursion sets and their measure. We also establish a new, more general proof of consistency for the expected improvement algorithm for global optimization which, unlike previous results in the literature, applies to any GP with continuous sample paths.

Article information

Bernoulli, Volume 25, Number 4A (2019), 2883-2919.

Received: August 2017
Revised: September 2018
First available in Project Euclid: 13 September 2019

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

active learning convergence sequential design of experiments stepwise uncertainty reduction supermartingale uncertainty functional


Bect, Julien; Bachoc, François; Ginsbourger, David. A supermartingale approach to Gaussian process based sequential design of experiments. Bernoulli 25 (2019), no. 4A, 2883--2919. doi:10.3150/18-BEJ1074.

Export citation


  • [1] Adler, R.J. (1981). The Geometry of Random Fields. Wiley Series in Probability and Mathematical Statistics. Chichester: Wiley.
  • [2] Azaïs, J.-M. and Wschebor, M. (2009). Level Sets and Extrema of Random Processes and Fields. Hoboken, NJ: Wiley.
  • [3] Azzimonti, D., Ginsbourger, D., Chevalier, C., Bect, J. and Richet, Y. (2018). Adaptive design of experiments for conservative estimation of excursion sets. ArXiv preprint, arXiv:1611.07256v3.
  • [4] Bayarri, M.J., Berger, J.O., Paulo, R., Sacks, J., Cafeo, J.A., Cavendish, J., Lin, C.-H. and Tu, J. (2007). A framework for validation of computer models. Technometrics 49 138–154.
  • [5] Bect, J., Li, L. and Vazquez, E. (2017). Bayesian subset simulation. SIAM/ASA J. Uncertain. Quantificat. 5 762–786.
  • [6] Bect, J., Ginsbourger, D., Li, L., Picheny, V. and Vazquez, E. (2012). Sequential design of computer experiments for the estimation of a probability of failure. Stat. Comput. 22 773–793.
  • [7] Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed. Wiley Series in Probability and Statistics: Probability and Statistics. New York: Wiley. A Wiley-Interscience Publication.
  • [8] Binois, M. (2015). Uncertainty quantification on Pareto fronts and high-dimensional strategies in Bayesian optimization, with applications in multi-objective automotive design. Ph.D. thesis, Ecole Nationale Supérieure des Mines de Saint-Etienne.
  • [9] Bogachev, V.I. (1998). Gaussian Measures. Mathematical Surveys and Monographs 62. Providence, RI: Amer. Math. Soc..
  • [10] Brezis, H. (2011). Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext. New York: Springer.
  • [11] Bull, A.D. (2011). Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res. 12 2879–2904.
  • [12] Chevalier, C. (2013). Fast uncertainty reduction strategies relying on Gaussian process models. Ph.D. thesis, University of Bern.
  • [13] Chevalier, C., Ginsbourger, D., Bect, J., Vazquez, E., Picheny, V. and Richet, Y. (2014). Fast parallel kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. Technometrics 56 455–465.
  • [14] Cohn, D.A., Ghahramani, Z. and Jordan, M.I. (1996). Active learning with statistical models. J. Artificial Intelligence Res. 4 129–145.
  • [15] Cover, T.M. and Thomas, J.A. (1991). Elements of Information Theory. Wiley Series in Telecommunications. New York: Wiley. A Wiley-Interscience Publication.
  • [16] Cressie, N.A.C. (1993). Statistics for Spatial Data. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. New York: Wiley. Revised reprint of the 1991 edition, A Wiley-Interscience Publication.
  • [17] DeGroot, M.H. (1962). Uncertainty, information, and sequential experiments. Ann. Math. Stat. 33 404–419.
  • [18] DeGroot, M.H. (1984). Changes in utility as information. Theory and Decision 17 287–303.
  • [19] DeGroot, M.H. (1986). Concepts of information based on utility. In Recent Developments in the Foundations of Utility and Risk Theory (L. Daboni, A. Montesano and M. Lines, eds.) 265–275. Netherlands, Dordrecht: Springer.
  • [20] Diaconis, P. (1988). Bayesian numerical analysis. In Statistical Decision Theory and Related Topics, IV, Vol. 1 (West Lafayette, Ind., 1986) 163–175. New York: Springer.
  • [21] Emmerich, M., Giannakoglou, K. and Naujoks, B. (2006). Single-and multiobjective optimization assisted by Gaussian random field metamodels. IEEE Trans. Evol. Comput. 10.
  • [22] Feliot, P., Bect, J. and Vazquez, E. (2017). A Bayesian approach to constrained single- and multi-objective optimization. J. Global Optim. 67 97–133.
  • [23] Forrester, A., Sobester, A. and Keane, A. (2008). Engineering Design Via Surrogate Modelling: A Practical Guide. Wiley.
  • [24] Frazier, P. (2009). Knowledge-Gradient Methods for Statistical Learning. Ann Arbor, MI: ProQuest LLC. Thesis (Ph.D.)–Princeton University.
  • [25] Frazier, P.I., Powell, W.B. and Dayanik, S. (2008). A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 47 2410–2439.
  • [26] Frazier, P., Powell, W. and Dayanik, S. (2009). The knowledge-gradient policy for correlated normal beliefs. INFORMS J. Comput. 21 599–613.
  • [27] Geman, D. and Jedynak, B. (1996). An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell. 18 1–14.
  • [28] Ginsbourger, D., Roustant, O. and Durrande, N. (2016). On degeneracy and invariances of random fields paths with applications in Gaussian process modelling. J. Statist. Plann. Inference 170 117–128.
  • [29] Ginsbourger, D., Baccou, J., Chevalier, C., Perales, F., Garland, N. and Monerie, Y. (2014). Bayesian adaptive reconstruction of profile optima and optimizers. SIAM/ASA J. Uncertain. Quantificat. 2 490–510.
  • [30] Gramacy, R.B., Gray, G.A., Le Digabel, S., Lee, H.K.H., Ranjan, P., Wells, G. and Wild, S.M. (2016). Modeling an augmented Lagrangian for blackbox constrained optimization. Technometrics 58 1–11.
  • [31] Grünewälder, S., Audibert, J.-Y., Opper, M. and Shawe-Taylor, J. (2010). Regret bounds for Gaussian process bandit problems. In International Conference on Artificial Intelligence and Statistics.
  • [32] Hainy, M., Müller, W.G. and Wynn, H.P. (2014). Learning functions and approximate Bayesian computation design: ABCD. Entropy 16 4353–4374.
  • [33] Hennig, P., Osborne, M.A. and Girolami, M. (2015). Probabilistic numerics and uncertainty in computations. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 471 20150142, 17.
  • [34] Ibragimov, I.d.A. and Rozanov, Y.A. (1978). Gaussian Random Processes. Applications of Mathematics 9. New York–Berlin: Springer. Translated from the Russian by A.B. Aries.
  • [35] Johnson, R. (1960). An information theory approach to diagnosis. In Proceedings of the 6th Symposium on Reliability and Quality Control 102–109.
  • [36] Jones, D.R., Schonlau, M. and Welch, W.J. (1998). Efficient global optimization of expensive black-box functions. J. Global Optim. 13 455–492. Workshop on Global Optimization (Trier, 1997).
  • [37] Kallenberg, O. (2002). Foundations of Modern Probability, 2nd ed. Probability and Its Applications (New York). New York: Springer.
  • [38] King-Smith, P.E., Grigsby, S.S., Vingrys, A.J., Benes, S.C. and Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Res. 34 885–912.
  • [39] Koehler, J.R., Puhalskii, A.A. and Simon, B. (1998). Estimating functions evaluated by simulation: A Bayesian/analytic approach. Ann. Appl. Probab. 8 1184–1215.
  • [40] Ledoux, M. and Talagrand, M. (2011). Probability in Banach Spaces: Isoperimetry and Processes. Classics in Mathematics. Berlin: Springer. Reprint of the 1991 edition.
  • [41] Lukić, M.N. and Beder, J.H. (2001). Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Amer. Math. Soc. 353 3945–3969.
  • [42] MacKay, D.J.C. (1992). Information-based objective functions for active data selection. Neural Comput. 4 590–604.
  • [43] Mockus, J.B., Tiesis, V. and Žilinskas, A. (1978). The application of Bayesian methods for seeking the extremum. In Towards Global Optimization, Volume 2 (L.C.W. Dixon and G.P. Szegö, eds.) 117–129. New York: North Holland.
  • [44] Molchanov, I. (2005). Theory of Random Sets. Probability and Its Applications (New York). London: Springer London, Ltd..
  • [45] O’Geran, J.H., Wynn, H.P. and Zhiglyavsky, A.A. (1993). Mastermind as a test-bed for search algorithms. Chance 6 31–37.
  • [46] O’Hagan, A. (1991). Bayes–Hermite quadrature. J. Statist. Plann. Inference 29 245–260.
  • [47] Perlman, M.D. (1974). Jensen’s inequality for a convex vector-valued function on an infinite-dimensional space. J. Multivariate Anal. 4 52–65.
  • [48] Picheny, V. (2014). A stepwise uncertainty reduction approach to constrained global optimization. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS).
  • [49] Picheny, V., Ginsbourger, D., Roustant, O., Haftka, R. and Kim, N.-H. (2010). Adaptive designs of experiments for accurate approximation of target regions. J. Mech. Des. 132.
  • [50] Ranjan, P., Bingham, D. and Michailidis, G. (2008). Sequential experiment design for contour estimation from complex computer codes. Technometrics 50 527–541.
  • [51] Ritter, K. (2000). Average-Case Analysis of Numerical Problems. Lecture Notes in Math. 1733. Berlin: Springer.
  • [52] Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989). Design and analysis of computer experiments. Statist. Sci. 4 409–435. With comments and a rejoinder by the authors.
  • [53] Santner, T.J., Williams, B.J. and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. Springer Series in Statistics. New York: Springer.
  • [54] Schönfeld, P. (1973). A note on the measurability of the pseudo-inverse. J. Econometrics 1 313–314.
  • [55] Scott, W., Frazier, P. and Powell, W. (2011). The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression. SIAM J. Optim. 21 996–1026.
  • [56] Shahriari, B., Swersky, K., Wang, Z., Adams, R. and de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 104 148–175.
  • [57] Srinivas, N., Krause, A., Kakade, S.M. and Seeger, M.W. (2012). Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. Inform. Theory 58 3250–3265.
  • [58] Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer Series in Statistics. New York: Springer.
  • [59] Stroock, D.W. (2011). Probability Theory: An Analytic View, 2nd ed. Cambridge: Cambridge Univ. Press.
  • [60] Vakhania, N.N., Tarieladze, V.I. and Chobanyan, S.A. (1987). Probability Distributions on Banach Spaces. Mathematics and Its Applications (Soviet Series) 14. Dordrecht: D. Reidel Publishing Co. Translated from the Russian and with a preface by Wojbor A. Woyczynski.
  • [61] van der Vaart, A.W. and van Zanten, J.H. (2008). Reproducing kernel Hilbert spaces of Gaussian priors. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh. Inst. Math. Stat. (IMS) Collect. 3 200–222. Beachwood, OH: IMS.
  • [62] Vazquez, E. and Bect, J. (2009). A sequential Bayesian algorithm to estimate a probability of failure. In 15th IFAC Symposium on System Identification (SYSID 2009).
  • [63] Vazquez, E. and Bect, J. (2010). Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J. Statist. Plann. Inference 140 3088–3095.
  • [64] Vazquez, E. and Bect, J. (2010). Pointwise consistency of the kriging predictor with known mean and covariance functions. In MODa 9–Advances in Model-Oriented Design and Analysis 221–228. Springer.
  • [65] Vazquez, E. and Bect, J. (2012). Sequential search based on kriging: Convergence analysis of some algorithms. In Bulletin of the ISI 58th World Statistics Congress of the International Statistical Institute, 2011, The Hague, The Netherlands International Statistical Institute.
  • [66] Villemonteix, J., Vazquez, E. and Walter, E. (2009). An informational approach to the global optimization of expensive-to-evaluate functions. J. Global Optim. 44 509–534.
  • [67] Wang, H., Lin, G. and Li, J. (2016). Gaussian process surrogates for failure detection: A Bayesian experimental design approach. J. Comput. Phys. 313 247–259.
  • [68] Williams, B.J., Santner, T.J. and Notz, W.I. (2000). Sequential design of computer experiments to minimize integrated response functions. Statist. Sinica 10 1133–1152.
  • [69] Yarotsky, D. (2013). Examples of inconsistency in optimization by expected improvement. J. Global Optim. 56 1773–1790.
  • [70] Yarotsky, D. (2013). Univariate interpolation by exponential functions and Gaussian RBFs for generic sets of nodes. J. Approx. Theory 166 163–175.
  • [71] Zuluaga, M., Krause, A., Sergent, G. and Püschel, M. (2013). Active learning for level set estimation. In International Joint Conference on Artificial Intelligence (IJCAI).