Bayesian Analysis

Constrained Bayesian Optimization with Noisy Experiments

Benjamin Letham, Brian Karrer, Guilherme Ottoni, and Eytan Bakshy

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Abstract

Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for efficiently optimizing multiple continuous parameters, but existing approaches degrade in performance when the noise level is high, which limits their applicability to many randomized experiments. We derive an expression for expected improvement under greedy batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be efficiently optimized. Simulations with synthetic functions show that the method outperforms existing methods on noisy, constrained problems. We further demonstrate the effectiveness of the method with two real-world experiments conducted at Facebook: optimizing a ranking system, and optimizing server compiler flags.
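
As a rough illustration of the idea in the abstract, the sketch below estimates expected improvement by quasi-Monte Carlo when the incumbent best value is itself uncertain because the observations are noisy. It is not the authors' implementation. It assumes minimization, assumes a joint Gaussian posterior over the function values at the observed points and at one candidate point is already available, omits constraints and batching, and uses a Halton sequence purely for brevity (one of several low-discrepancy sequences, not claimed to be the authors' choice). The names `halton`, `cholesky`, and `qmc_noisy_ei` are hypothetical.

```python
from statistics import NormalDist

# First few primes, used as Halton bases (one base per dimension).
PRIMES = [2, 3, 5, 7, 11, 13]


def halton(index, base):
    """Return element `index` (1-based) of the van der Corput sequence in `base`."""
    result, frac = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * frac
        frac /= base
    return result


def cholesky(cov):
    """Lower-triangular Cholesky factor of a positive-definite matrix (list of lists)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (cov[i][i] - s) ** 0.5
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def qmc_noisy_ei(mean, cov, n_samples=1024):
    """Quasi-Monte Carlo estimate of expected improvement (minimization).

    `mean` and `cov` describe an assumed joint Gaussian posterior over
    [f(x_1), ..., f(x_n), f(x_candidate)]; the last entry is the candidate.
    """
    dim = len(mean)
    if dim > len(PRIMES):
        raise ValueError("this sketch supports at most %d dimensions" % len(PRIMES))
    low = cholesky(cov)
    inv_cdf = NormalDist().inv_cdf
    total = 0.0
    for i in range(1, n_samples + 1):
        # Low-discrepancy point in (0, 1)^dim mapped to standard normals.
        z = [inv_cdf(halton(i, PRIMES[d])) for d in range(dim)]
        # Correlate the normals and shift by the mean: one joint posterior draw.
        f = [mean[r] + sum(low[r][c] * z[c] for c in range(r + 1)) for r in range(dim)]
        best_observed = min(f[:-1])  # the incumbent is itself a random quantity
        total += max(best_observed - f[-1], 0.0)
    return total / n_samples


# One observed point and one candidate, independent with unit variance.
estimate = qmc_noisy_ei([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(estimate)
```

For this two-point example the exact value is 1/sqrt(pi), about 0.564, since the difference of two independent standard normals has variance 2; the printed estimate can be compared against it. Averaging over draws of the incumbent, rather than plugging in the best noisy observation, is the feature this sketch is meant to show.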

Article information

Source
Bayesian Anal., Advance publication (2018), 25 pages.

Dates
First available in Project Euclid: 10 August 2018

Permanent link to this document
https://projecteuclid.org/euclid.ba/1533866666

Digital Object Identifier
doi:10.1214/18-BA1110

Keywords
Bayesian optimization; randomized experiments; quasi-Monte Carlo methods

Rights
Creative Commons Attribution 4.0 International License.

Citation

Letham, Benjamin; Karrer, Brian; Ottoni, Guilherme; Bakshy, Eytan. Constrained Bayesian Optimization with Noisy Experiments. Bayesian Anal., advance publication, 10 August 2018. doi:10.1214/18-BA1110. https://projecteuclid.org/euclid.ba/1533866666



