Motivated by an application in computerized adaptive tests, we consider the following sequential design problem.There are $J$ jobs to be processed according to a predetermined order. A single machine is available to process these $J$ jobs. Each job under processing evolves stochastically as a Markov chain and earns rewards as it is processed, not otherwise. The Markov chain has transition probabilities parameterized by an unknown parameter $\theta$. The objective is to determine how long each job should be processed so that the total expected rewards over an extended time interval is maximized. We define the regret associated with a strategy as the shortfall from the maximum expected reward under complete information on $\theta$. Therefore the problem is equivalent to minimizing the regret. The asymptotic lower bound for the regret associated with any uniformly good strategy is characterized by a deterministic constraint minimization problem. In ignorance of the parameter value, we construct a class of efficient strategies, which achieve the lower bound, based on the theory of sequential testing.
"Asymptotically efficient strategies for a stochastic scheduling problem with order constraints." Ann. Statist. 28 (6) 1670 - 1695, December2000. https://doi.org/10.1214/aos/1015957475