## Bernoulli

- Volume 23, Number 4B (2017), 3685-3710.

### Some monotonicity properties of parametric and nonparametric Bayesian bandits

#### Abstract

One of two independent stochastic processes (arms) is to be selected at each of $n$ stages. The selection is sequential and depends on past observations as well as the prior information. The objective is to maximize the expected future-discounted sum of the $n$ observations. We study structural properties of this classical bandit problem, in particular how the maximum expected payoff and the optimal strategy vary with the priors, in two settings: (a) observations from each arm have an exponential family distribution and different arms are assigned independent conjugate priors; (b) observations from each arm have a nonparametric distribution and different arms are assigned independent Dirichlet process priors. In both settings, we derive results of the following type: (i) for a particular arm and a fixed prior weight, the maximum expected payoff increases as the prior mean yield increases; (ii) for a fixed prior mean yield, the maximum expected payoff increases as the prior weight decreases. Specializing to the one-armed bandit, the second result captures the intuition that, given the same immediate payoff, the less one knows about an arm, the more desirable it becomes because there remains more information to be gained when selecting that arm. In the parametric case, our results extend those of Gittins and Wang (*Ann. Statist.* **20** (1992) 1625–1636) concerning Bernoulli and normal bandits (see also Yao (In *Time Series and Related Topics* (2006) pp. 284–294 IMS)). In the nonparametric case, we extend those of Clayton and Berry (*Ann. Statist.* **13** (1985) 1523–1534). A key tool in the derivation is stochastic orders.
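As a rough numerical illustration of results (i) and (ii), the sketch below evaluates a one-armed Bernoulli bandit by backward induction. It is not taken from the paper. It assumes the unknown arm has a Beta$(a, b)$ prior with prior mean $a/(a+b)$ and prior weight $a+b$, and that the second arm pays a known constant yield. The function name `max_expected_payoff`, its parameters, and all numerical values are hypothetical choices made for this example.

```python
from functools import lru_cache


def max_expected_payoff(prior_mean, prior_weight, known_yield, n_stages, discount):
    """Maximum expected discounted payoff of a one-armed Bernoulli bandit.

    Assumed setup (illustrative, not from the paper): the unknown arm pays 1
    with probability theta, theta ~ Beta(a0, b0) with a0 = mean * weight and
    b0 = (1 - mean) * weight; the known arm pays `known_yield` every stage.
    """
    a0 = prior_mean * prior_weight
    b0 = (1.0 - prior_mean) * prior_weight

    @lru_cache(maxsize=None)
    def value(successes, failures, stages_left):
        if stages_left == 0:
            return 0.0
        # Posterior mean of theta given the observations so far.
        p = (a0 + successes) / (a0 + b0 + successes + failures)
        # Known arm: collect its yield, posterior unchanged.
        known = known_yield + discount * value(successes, failures, stages_left - 1)
        # Unknown arm: collect 1 or 0, then update the posterior counts.
        unknown = (
            p * (1.0 + discount * value(successes + 1, failures, stages_left - 1))
            + (1.0 - p) * discount * value(successes, failures + 1, stages_left - 1)
        )
        return max(known, unknown)

    return value(0, 0, n_stages)


# (i) Fixed prior weight, increasing prior mean.
by_mean = [max_expected_payoff(m, 2.0, 0.5, 10, 0.9) for m in (0.3, 0.5, 0.7)]
# (ii) Fixed prior mean, increasing prior weight.
by_weight = [max_expected_payoff(0.5, w, 0.5, 10, 0.9) for w in (1.0, 2.0, 8.0)]
print(by_mean)
print(by_weight)
```

Under these assumptions, the first list should be nondecreasing in the prior mean and the second nonincreasing in the prior weight, in line with (i) and (ii). A check on a handful of parameter values is only a sanity test and does not substitute for the stochastic-order arguments of the paper.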

#### Article information

**Source**

Bernoulli Volume 23, Number 4B (2017), 3685-3710.

**Dates**

Received: April 2011

Revised: May 2016

First available in Project Euclid: 23 May 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.bj/1495505106

**Digital Object Identifier**

doi:10.3150/16-BEJ862

**Zentralblatt MATH identifier**

06778300

**Keywords**

Bernoulli bandits; convex order; Dirichlet bandits; log-concavity; optimal stopping; sequential decision; two-armed bandits

#### Citation

Yu, Yaming. Some monotonicity properties of parametric and nonparametric Bayesian bandits. Bernoulli 23 (2017), no. 4B, 3685–3710. doi:10.3150/16-BEJ862. https://projecteuclid.org/euclid.bj/1495505106