Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy and show that a very small number of batches yields regret bounds close to the minimax-optimal ones. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
"Batched bandit problems." Ann. Statist. 44 (2) 660-681, April 2016. https://doi.org/10.1214/15-AOS1381