Open Access
November 2003
Convergent multiple-timescales reinforcement learning algorithms in normal form games
David S. Leslie, E. J. Collins
Ann. Appl. Probab. 13(4): 1231-1251 (November 2003). DOI: 10.1214/aoap/1069786497


We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to a Nash distribution in two-player zero-sum games and $N$-player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge; in particular, we consider the $N$-player matching pennies game and Shapley's variant of the rock-scissors-paper game. By extending stochastic approximation results to multiple timescales, we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.
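
The two-timescales idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm as published: the logit (Boltzmann) form of the smooth best response, the temperature, the step-size exponents, and the names `smooth_best_response` and `run_two_timescale` are all assumptions chosen for illustration. Each player keeps action-value estimates that are updated with a fast step size from observed rewards only (model-free), and a mixed strategy that drifts on a slower step size toward the smooth best response to those estimates.

```python
import numpy as np


def smooth_best_response(q, temperature=0.1):
    """Logit smooth best response to value estimates q (an assumed form)."""
    z = (q - np.max(q)) / temperature  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def run_two_timescale(payoffs, n_steps=5000, temperature=0.1, seed=0):
    """Two-player sketch; payoffs[i][a0, a1] is player i's reward.

    Step sizes are illustrative assumptions: the value estimates use
    n**-0.6 (fast) and the strategies use n**-1 (slow), so the ratio
    slow/fast tends to zero as n grows.
    """
    rng = np.random.default_rng(seed)
    n_actions = payoffs[0].shape
    pi = [np.full(n_actions[i], 1.0 / n_actions[i]) for i in range(2)]
    q = [np.zeros(n_actions[i]) for i in range(2)]
    for n in range(1, n_steps + 1):
        fast = n ** -0.6
        slow = n ** -1.0
        # Both players sample an action from their current mixed strategy.
        actions = [rng.choice(n_actions[i], p=pi[i]) for i in range(2)]
        for i in range(2):
            a = actions[i]
            reward = payoffs[i][actions[0], actions[1]]
            # Fast timescale: update only the value of the action played.
            q[i][a] += fast * (reward - q[i][a])
            # Slow timescale: move the strategy toward the smooth best response.
            pi[i] += slow * (smooth_best_response(q[i], temperature) - pi[i])
    return pi, q


# Two-player matching pennies (zero-sum): the row player wants to match.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
pi, q = run_two_timescale([A, -A])
print(pi[0], pi[1])
```

Giving each player its own step-size sequence, rather than the shared `slow` rate used here, would be the natural way to experiment with the multiple-timescales extension the abstract describes.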





Published: November 2003
First available in Project Euclid: 25 November 2003

zbMATH: 1084.68102
MathSciNet: MR2023875
Digital Object Identifier: 10.1214/aoap/1069786497

Primary: 68T05, 91A20

Keywords: best response dynamics, reinforcement learning, repeated normal form games, stochastic approximation

Rights: Copyright © 2003 Institute of Mathematical Statistics

