## Electronic Journal of Statistics

### Gaussian process bandits with adaptive discretization

#### Abstract

In this paper, the problem of maximizing a black-box function $f:\mathcal{X}\to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian process prior. In particular, a new algorithm for this problem is proposed, and high-probability bounds on its simple and cumulative regret are established. The query point selection rule in most existing methods involves an exhaustive search over an increasingly fine sequence of uniform discretizations of $\mathcal{X}$. The proposed algorithm, in contrast, adaptively refines $\mathcal{X}$, which leads to a lower computational complexity, particularly when $\mathcal{X}$ is a subset of a high-dimensional Euclidean space. In addition to the computational gains, sufficient conditions are identified under which the regret bounds of the new algorithm improve upon the known results. Finally, an extension of the algorithm to the case of contextual bandits is proposed, and high-probability bounds on the contextual regret are presented.
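To illustrate why an exhaustive search over a uniform discretization becomes expensive in high dimensions, the following minimal sketch counts the points in a uniform grid. It assumes $\mathcal{X} = [0,1]^d$ and a fixed spacing per axis; the function name `uniform_grid_size` is hypothetical, and the snippet is not the algorithm proposed in the paper.

```python
import math


def uniform_grid_size(dim: int, spacing: float) -> int:
    """Count the points of a uniform grid on [0, 1]^dim (assumed domain).

    Each axis holds ceil(1 / spacing) + 1 points, endpoints included,
    so the full grid holds that count raised to the power dim.
    """
    points_per_axis = math.ceil(1.0 / spacing) + 1
    return points_per_axis ** dim


if __name__ == "__main__":
    # The grid size grows exponentially with the dimension, so a
    # selection rule that scans every grid point does too.
    for dim in (1, 2, 5, 10):
        print(f"dim={dim}: {uniform_grid_size(dim, 0.25)} grid points")
```

Under these assumptions, halving the spacing roughly multiplies the number of candidate points by $2^d$, which is the cost an adaptive refinement of $\mathcal{X}$ aims to avoid.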

#### Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 3829-3874.

Dates
First available in Project Euclid: 4 December 2018

https://projecteuclid.org/euclid.ejs/1543892564

Digital Object Identifier
doi:10.1214/18-EJS1497

#### Citation

Shekhar, Shubhanshu; Javidi, Tara. Gaussian process bandits with adaptive discretization. Electron. J. Statist. 12 (2018), no. 2, 3829--3874. doi:10.1214/18-EJS1497. https://projecteuclid.org/euclid.ejs/1543892564
