This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.
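For reference, the two objects named in the abstract can be written out as below. This is a sketch in standard textbook notation, not necessarily the paper's own: the discount factor \beta, reward function r_i, state process x_i(t), stopping time \tau, and value function V are symbols assumed here for illustration.

```latex
% Gittins index of project i in state x (standard form; notation is assumed)
G_i(x) = \sup_{\tau \ge 1}
  \frac{\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r_i(x_i(t)) \,\middle|\, x_i(0) = x\right]}
       {\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, x_i(0) = x\right]}

% Submodularity of the optimal value V over sets S, T of available projects
V(S \cup T) + V(S \cap T) \le V(S) + V(T)
```

The index policy continues, at each step, a project whose current index is largest. The submodularity inequality is equivalent to diminishing returns: the marginal value of adding a project does not increase as the set of available projects grows.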
Richard Weber. "On the Gittins Index for Multiarmed Bandits." Ann. Appl. Probab. 2 (4): 1024-1033, November 1992. https://doi.org/10.1214/aoap/1177005588