Convergent multiple-timescales reinforcement learning algorithms in normal form games



The Annals of Applied Probability

E. J. Collins and David S. Leslie

Source: Ann. Appl. Probab. Volume 13, Number 4 (2003), 1231-1251.

Abstract

We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to a Nash distribution in two-player zero-sum games and $N$-player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge; in particular, we consider the $N$-player matching pennies game and Shapley's variant of the rock-scissors-paper game. By extending stochastic approximation results to multiple timescales, we can allow each player to learn at a different rate. We show that this extension converges for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.
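To make the two-timescales idea concrete, the following is a minimal Python sketch, not the paper's exact algorithm. It plays two-player matching pennies (a zero-sum game). Each player updates action-value estimates from realised rewards only, on a faster timescale, and moves its mixed strategy towards a smooth best response on a slower timescale. The function names (`run`, `smooth_best_response`), the Boltzmann form of the smoothing, the temperature `tau`, the step-size exponents and the cap on the value step are all assumptions chosen for illustration.

```python
import math
import random

# Row player's payoffs in two-player matching pennies (zero-sum);
# the column player receives the negative of each entry.
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]


def smooth_best_response(q, tau):
    """Boltzmann (logit) response to value estimates q (assumed smoothing)."""
    m = max(q)  # subtract the maximum for numerical stability
    weights = [math.exp((v - m) / tau) for v in q]
    total = sum(weights)
    return [w / total for w in weights]


def run(steps=20000, tau=0.5, seed=0):
    """Simulate both learners; return (strategies, value estimates)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]   # q[i][a]: player i's estimate for action a
    pi = [[0.5, 0.5], [0.5, 0.5]]  # pi[i]: player i's mixed strategy
    for n in range(1, steps + 1):
        fast = n ** -0.6           # value step size (faster timescale)
        slow = 1.0 / n             # strategy step size; slow / fast -> 0
        actions = [0 if rng.random() < pi[i][0] else 1 for i in range(2)]
        r = PAYOFF[actions[0]][actions[1]]
        rewards = [r, -r]
        for i in range(2):
            a = actions[i]
            # Model-free update: only the reward of the played action is used.
            # Dividing by pi[i][a] compensates for rarely played actions;
            # the cap at 1 (an assumption) keeps estimates within payoff range.
            step = min(1.0, fast / pi[i][a])
            q[i][a] += step * (rewards[i] - q[i][a])
            # Slow move of the strategy towards the smooth best response.
            target = smooth_best_response(q[i], tau)
            pi[i] = [p + slow * (t - p) for p, t in zip(pi[i], target)]
    return pi, q


if __name__ == "__main__":
    strategies, values = run()
    print("strategies:", strategies)
    print("value estimates:", values)
```

Under these assumptions the strategies should drift towards the uniform mixture, which is the Nash distribution of this game by symmetry. Allowing each player a different rate, as in the multiple-timescales extension, would amount to giving each player its own `slow` sequence.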

Primary Subjects: 68T05, 91A20
Keywords: Stochastic approximation; reinforcement learning; repeated normal form games; best response dynamics

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoap/1069786497
Digital Object Identifier: doi:10.1214/aoap/1069786497
Mathematical Reviews number (MathSciNet): MR2023875
Zentralblatt MATH identifier: 02063737

2008 © Institute of Mathematical Statistics