The Annals of Statistics

Two-Armed Dirichlet Bandits with Discounting

Manas K. Chattopadhyay

Abstract

Sequential selections are to be made from two independent stochastic processes, or "arms." At each stage we choose which arm to observe based on past selections and observations. The observations on arm $i$ are conditionally i.i.d. given their marginal distribution $P_i$, which has a Dirichlet process prior with parameter $\alpha_i$, $i = 1, 2$. Future observations are discounted: at stage $m$, the payoff is $a_m$ times the observation $Z_m$ at that stage. The discount sequence $A_n = (a_1, a_2, \ldots, a_n, 0, 0, \ldots)$ is a nonincreasing sequence of nonnegative numbers, where the "horizon" $n$ is finite. The objective is to maximize the total expected payoff $E\left(\sum_{m=1}^n a_m Z_m\right)$. It is shown that optimal strategies continue with an arm when it yields a sufficiently large observation, one larger than a "break-even observation." This generalizes results of Clayton and Berry, who considered two arms with one arm known and assumed $a_m = 1$ for all $m \leq n$.
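As a rough illustration of the quantities in the abstract (not the paper's method), the sketch below computes the realized discounted payoff $\sum_m a_m Z_m$ and the posterior predictive mean of an arm under a Dirichlet process prior. It relies on the standard conjugacy fact that, after observations $z_1, \ldots, z_k$ on an arm, the posterior is again a Dirichlet process with parameter $\alpha_i + \sum_j \delta_{z_j}$. The function names and the arguments `total_mass` (the total mass of $\alpha_i$) and `prior_mean` (the mean of the normalized $\alpha_i$) are hypothetical names introduced here for illustration.

```python
from typing import Sequence


def posterior_mean(total_mass: float, prior_mean: float,
                   observations: Sequence[float]) -> float:
    """Predictive mean of the next observation on one arm.

    Assumes a Dirichlet process prior whose parameter has total mass
    `total_mass` > 0 and normalized mean `prior_mean` (both hypothetical
    argument names). The result is a weighted average of the prior mean
    and the observed values.
    """
    if total_mass <= 0:
        raise ValueError("total_mass must be positive")
    k = len(observations)
    return (total_mass * prior_mean + sum(observations)) / (total_mass + k)


def discounted_payoff(discounts: Sequence[float],
                      observations: Sequence[float]) -> float:
    """Realized payoff: the sum over stages m of a_m * Z_m, up to the horizon n."""
    if len(discounts) != len(observations):
        raise ValueError("need one discount factor per observed stage")
    return sum(a * z for a, z in zip(discounts, observations))


if __name__ == "__main__":
    # Prior mass 2 with prior mean 0.5, then three observations on the arm.
    print(posterior_mean(2.0, 0.5, [1.0, 0.0, 1.0]))
    # A nonincreasing discount sequence with horizon n = 3.
    print(discounted_payoff([1.0, 0.5, 0.25], [2.0, 4.0, 8.0]))
```

This sketch only evaluates a given sequence of observations; it does not compute the optimal strategy or the break-even observation discussed in the abstract.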

Article information

Source
Ann. Statist., Volume 22, Number 3 (1994), 1212-1221.

Dates
First available in Project Euclid: 11 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1176325626

Digital Object Identifier
doi:10.1214/aos/1176325626

Mathematical Reviews number (MathSciNet)
MR1311973

Zentralblatt MATH identifier
0818.62067

JSTOR
links.jstor.org

Subjects
Primary: 62L05: Sequential design
Secondary: 62C10: Bayesian problems; characterization of Bayes procedures

Keywords
Sequential decisions; two-armed bandits; one-armed bandits; Dirichlet bandits; Dirichlet process prior

Citation

Chattopadhyay, Manas K. Two-Armed Dirichlet Bandits with Discounting. Ann. Statist. 22 (1994), no. 3, 1212--1221. doi:10.1214/aos/1176325626. https://projecteuclid.org/euclid.aos/1176325626
