The Annals of Statistics

The Two-Armed Bandit with Delayed Responses

Stephen G. Eick

Full-text: Open access

Abstract

A general model for a two-armed bandit with delayed responses is introduced and solved with dynamic programming. One arm has a geometric lifetime with parameter $\theta$, which has prior distribution $\mu$. The other arm has a known lifetime with mean $\kappa$. The response delays completely change the character of the optimal strategies from the no-delay case; in particular, the bandit is no longer a stopping problem. The delays also introduce an extra parameter $p$ into the state space. In clinical trial applications, this parameter represents the number of patients previously treated with the unknown arm who are still living. The value function is introduced and investigated as $p$, $\mu$ and $\kappa$ vary. Under a regularity condition on the discount sequence, there exists a manifold in the state space such that both arms are optimal on the manifold, with arm $x$ optimal on one side and arm $y$ optimal on the other. Properties of the manifold are described.

Article information

Source
Ann. Statist., Volume 16, Number 1 (1988), 254-264.

Dates
First available in Project Euclid: 12 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1176350703

Digital Object Identifier
doi:10.1214/aos/1176350703

Mathematical Reviews number (MathSciNet)
MR924869

Zentralblatt MATH identifier
0637.62074

Subjects
Primary: 62L05: Sequential design
Secondary: 62L15: Optimal stopping [See also 60G40, 91A60]

Keywords
Two-armed bandits; delayed responses; randomized clinical trials; dynamic programming; optimal strategies

Citation

Eick, Stephen G. The Two-Armed Bandit with Delayed Responses. Ann. Statist. 16 (1988), no. 1, 254--264. doi:10.1214/aos/1176350703. https://projecteuclid.org/euclid.aos/1176350703

