The Annals of Applied Probability

Convergent multiple-timescales reinforcement learning algorithms in normal form games

David S. Leslie and E. J. Collins


Abstract

We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories of the flow defined by the smooth best response dynamics. Both algorithms are shown to converge almost surely to a Nash distribution in two-player zero-sum games and in $N$-player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge; in particular, we consider the $N$-player matching pennies game and Shapley's variant of the rock--scissors--paper game. By extending stochastic approximation results to multiple timescales we can allow each player to learn at a different rate. We show that this extension converges for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.
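
To make the scheme concrete, here is a minimal sketch in Python of the kind of process the paper studies, not the authors' exact algorithm: each player holds action-value estimates that are updated on a fast timescale and a mixed strategy that drifts toward a smooth (logit) best response on a slower timescale, with the two players' strategy step sizes decaying at different rates. The payoff matrices, the logit temperature tau and the step-size exponents below are illustrative assumptions.

    import numpy as np

    def logit_br(q, tau=0.1):
        # Smooth best response: a logit/softmax perturbation of the argmax.
        z = np.exp((q - q.max()) / tau)
        return z / z.sum()

    def learn(A, B, n_steps=200000, seed=0):
        # A[i, j] is the row player's payoff, B[i, j] the column player's.
        rng = np.random.default_rng(seed)
        nA, nB = A.shape
        Q1, Q2 = np.zeros(nA), np.zeros(nB)            # value estimates
        pi1, pi2 = np.ones(nA) / nA, np.ones(nB) / nB  # mixed strategies
        for n in range(1, n_steps + 1):
            a = rng.choice(nA, p=pi1)                  # sample joint action
            b = rng.choice(nB, p=pi2)
            fast = n ** -0.6    # fast timescale for value estimates
            slow1 = n ** -1.0   # player 1's strategy timescale
            slow2 = n ** -0.9   # player 2 learns at a different rate
            Q1[a] += fast * (A[a, b] - Q1[a])    # model-free: only the
            Q2[b] += fast * (B[a, b] - Q2[b])    # played action is updated
            pi1 += slow1 * (logit_br(Q1) - pi1)  # drift toward the smooth
            pi2 += slow2 * (logit_br(Q2) - pi2)  # best response
            pi1 /= pi1.sum(); pi2 /= pi2.sum()   # guard against float drift
        return pi1, pi2

On two-player matching pennies (A = [[1, -1], [-1, 1]], B = -A) both strategies settle near the uniform mixture, the Nash distribution of the perturbed game; with equal step-size exponents the same kind of process can cycle on Shapley's game, which is the failure mode the multiple-timescales extension is designed to address.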

Article information

Source
Ann. Appl. Probab., Volume 13, Number 4 (2003), 1231--1251.

Dates
First available in Project Euclid: 25 November 2003

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1069786497

Digital Object Identifier
doi:10.1214/aoap/1069786497

Mathematical Reviews number (MathSciNet)
MR2023875

Zentralblatt MATH identifier
1084.68102

Subjects
Primary: 68T05: Learning and adaptive systems [See also 68Q32, 91E40]; 91A20: Multistage and repeated games

Keywords
Stochastic approximation; reinforcement learning; repeated normal form games; best response dynamics

Citation

Leslie, David S.; Collins, E. J. Convergent multiple-timescales reinforcement learning algorithms in normal form games. Ann. Appl. Probab. 13 (2003), no. 4, 1231--1251. doi:10.1214/aoap/1069786497. https://projecteuclid.org/euclid.aoap/1069786497


References

  • Benaïm, M. (1999). Dynamics of stochastic approximation algorithms. Séminaire de Probabilités XXXIII. Lecture Notes in Math. 1709 1--68. Springer, Berlin.
  • Benaïm, M. and Hirsch, M. W. (1999). Mixed equilibria and dynamical systems arising from fictitious play in perturbed games. Games Econom. Behav. 29 36--72.
  • Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
  • Borkar, V. S. (1997). Stochastic approximation with two timescales. Systems Control Lett. 29 291--294.
  • Borkar, V. S. (2002). Reinforcement learning in Markovian evolutionary games. Available at www.tcs.tifr.res.in/~borkar/games.ps.
  • Cowan, S. (1992). Dynamical systems arising from game theory. Ph.D. dissertation, Univ. California, Berkeley.
  • Fudenberg, D. and Kreps, D. M. (1993). Learning mixed equilibria. Games Econom. Behav. 5 320--367.
  • Fudenberg, D. and Levine, D. K. (1998). The Theory of Learning in Games. MIT Press, Cambridge, MA.
  • Harsanyi, J. (1973). Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points. Internat. J. Game Theory 2 1--23.
  • Hofbauer, J. and Hopkins, E. (2002). Learning in perturbed asymmetric games. Available at www.econ.ed.ac.uk/pdf/perturb.pdf.
  • Hopkins, E. (1999). A note on best response dynamics. Games Econom. Behav. 29 138--150.
  • Jones, C. K. R. T. (1995). Geometric singular perturbation theory. Dynamical Systems. Lecture Notes in Math. 1609 44--118. Springer, Berlin.
  • Jordan, J. S. (1993). Three problems in learning mixed strategy equilibria. Games Econom. Behav. 5 368--386.
  • Konda, V. R. and Borkar, V. S. (2000). Actor--critic-type learning algorithms for Markov decision processes. SIAM J. Control Optim. 38 94--123.
  • Kushner, H. J. and Clark, D. S. (1978). Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer, New York.
  • Littman, M. and Stone, P. (2001). Implicit negotiation in repeated games. Intelligent Agents VIII: Agent Theories, Architectures and Languages. Lecture Notes in Comput. Sci. 2333 393--404. Springer, Berlin.
  • Nash, J. (1951). Non-cooperative games. Ann. Math. 54 286--295.
  • Pemantle, R. (1990). Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18 698--712.
  • Shapley, L. S. (1964). Some topics in two-person games. In Advances in Game Theory (M. Dresher, L. S. Shapley and A. W. Tucker, eds.) 1--28. Princeton Univ. Press.
  • Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.