Open Access
Some monotonicity properties of parametric and nonparametric Bayesian bandits
Yaming Yu
Bernoulli 23(4B): 3685-3710 (November 2017). DOI: 10.3150/16-BEJ862

Abstract

One of two independent stochastic processes (arms) is to be selected at each of $n$ stages. The selection is sequential and depends on past observations as well as the prior information. The objective is to maximize the expected future-discounted sum of the $n$ observations. We study structural properties of this classical bandit problem, in particular how the maximum expected payoff and the optimal strategy vary with the priors, in two settings: (a) observations from each arm have an exponential family distribution and different arms are assigned independent conjugate priors; (b) observations from each arm have a nonparametric distribution and different arms are assigned independent Dirichlet process priors. In both settings, we derive results of the following type: (i) for a particular arm and a fixed prior weight, the maximum expected payoff increases as the prior mean yield increases; (ii) for a fixed prior mean yield, the maximum expected payoff increases as the prior weight decreases. Specializing to the one-armed bandit, the second result captures the intuition that, given the same immediate payoff, the less one knows about an arm, the more desirable it becomes because there remains more information to be gained when selecting that arm. In the parametric case, our results extend those of (Ann. Statist. 20 (1992) 1625–1636) concerning Bernoulli and normal bandits (see also (In Time Series and Related Topics (2006) pp. 284–294 IMS)). In the nonparametric case, we extend those of (Ann. Statist. 13 (1985) 1523–1534). A key tool in the derivation is stochastic orders.
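Results (i) and (ii) can be checked numerically in the simplest parametric case. The following is a minimal sketch, not the paper's method (which relies on stochastic orders): it uses backward induction for a one-armed Bernoulli bandit in which the unknown arm has a Beta prior with mean `mean` and weight `weight`, and the other arm pays a known amount `known` at every stage. The function name, the value of `known`, the horizon `n`, and the geometric `discount` factor are all assumptions chosen for illustration.

```python
from functools import lru_cache


def max_expected_payoff(mean, weight, known, n, discount=1.0):
    """Maximum expected discounted payoff over n stages (illustrative sketch).

    Assumed setup: the unknown arm is Bernoulli with a Beta(a, b) prior,
    where a = mean * weight and b = (1 - mean) * weight; the known arm
    yields `known` per pull. `discount` is a hypothetical geometric factor.
    """
    a = mean * weight

    @lru_cache(maxsize=None)
    def value(s, f, m):
        # s, f: successes and failures observed so far on the unknown arm
        # m: number of stages remaining
        if m == 0:
            return 0.0
        p = (a + s) / (weight + s + f)  # posterior mean of the unknown arm
        pull_unknown = (
            p * (1.0 + discount * value(s + 1, f, m - 1))
            + (1.0 - p) * discount * value(s, f + 1, m - 1)
        )
        pull_known = known + discount * value(s, f, m - 1)
        return max(pull_unknown, pull_known)

    return value(0, 0, n)


if __name__ == "__main__":
    # (i) fixed prior weight, increasing prior mean
    for mean in (0.3, 0.5, 0.7):
        print("mean", mean, max_expected_payoff(mean, 2.0, 0.5, 10))
    # (ii) fixed prior mean, increasing prior weight
    for weight in (1.0, 2.0, 4.0, 8.0):
        print("weight", weight, max_expected_payoff(0.5, weight, 0.5, 10))
```

Under these assumptions, the printed values should be non-decreasing in the prior mean and non-increasing in the prior weight, in line with results (i) and (ii).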

Citation

Yaming Yu. "Some monotonicity properties of parametric and nonparametric Bayesian bandits." Bernoulli 23 (4B) 3685–3710, November 2017. https://doi.org/10.3150/16-BEJ862

Information

Received: 1 April 2011; Revised: 1 May 2016; Published: November 2017
First available in Project Euclid: 23 May 2017

zbMATH: 1383.62027
MathSciNet: MR3654820
Digital Object Identifier: 10.3150/16-BEJ862

Keywords: Bernoulli bandits, Convex order, Dirichlet bandits, Log-concavity, Optimal stopping, sequential decision, two-armed bandits

Rights: Copyright © 2017 Bernoulli Society for Mathematical Statistics and Probability
