We consider reinforcement learning algorithms in normal
form games. Using two-timescales stochastic approximation, we
introduce a model-free algorithm which is asymptotically equivalent
to the smooth fictitious play algorithm, in that both result in
asymptotic pseudotrajectories to the flow defined by the smooth
best response dynamics. Both of these algorithms are shown to
converge almost surely to a Nash distribution in two-player
zero-sum games and $N$-player partnership games. However, there are
simple games for which these, and most other adaptive processes,
fail to converge; in particular, we consider the $N$-player
matching pennies game and Shapley's variant of the
rock--scissors--paper game. By extending stochastic approximation
results to multiple timescales, we can allow each player to learn at
a different rate. We show that this extension converges for
two-player zero-sum games and two-player partnership games, as well
as for the two special cases we consider.
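
As an illustration only, the two-timescales idea can be sketched for a two-player game as follows. This is a minimal sketch under stated assumptions, not the algorithm as specified in the paper: the Boltzmann (logit) form of the smooth best response, the step sizes $n^{-0.6}$ and $1/n$, the temperature, and all function and variable names are hypothetical choices made for the example. Each player keeps action-value estimates that are updated on the faster timescale from its own observed rewards (hence model-free), while its mixed strategy moves on the slower timescale towards the smooth best response to those estimates.

```python
import math
import random

# Matching pennies: MATCHING_PENNIES[a0][a1] = (reward to player 0, reward to player 1).
MATCHING_PENNIES = [[(1, -1), (-1, 1)],
                    [(-1, 1), (1, -1)]]

def boltzmann(q, temperature):
    """Logit smooth best response to a list of action-value estimates (assumed form)."""
    m = max(q)  # subtract the maximum for numerical stability
    weights = [math.exp((v - m) / temperature) for v in q]
    total = sum(weights)
    return [w / total for w in weights]

def two_timescale_learning(payoffs, steps=20000, temperature=0.2, seed=0):
    """Return the two players' mixed strategies after `steps` rounds of play."""
    rng = random.Random(seed)
    n_actions = (len(payoffs), len(payoffs[0]))
    q = [[0.0] * k for k in n_actions]        # action-value estimates
    pi = [[1.0 / k] * k for k in n_actions]   # mixed strategies
    for n in range(1, steps + 1):
        fast = n ** -0.6   # value step size (assumed schedule)
        slow = 1.0 / n     # strategy step size; slow / fast tends to 0
        actions = [rng.choices(range(len(p)), weights=p)[0] for p in pi]
        rewards = payoffs[actions[0]][actions[1]]
        for i in range(2):
            a = actions[i]
            # Fast timescale: only the played action's estimate is updated,
            # using the player's own reward and no model of the opponent.
            q[i][a] += fast * (rewards[i] - q[i][a])
            # Slow timescale: move the strategy towards the smooth best response.
            target = boltzmann(q[i], temperature)
            pi[i] = [p + slow * (t - p) for p, t in zip(pi[i], target)]
    return pi
```

Calling `two_timescale_learning(MATCHING_PENNIES)` returns one probability vector per player. Giving each player its own strategy step size, rather than the shared `slow` used here, would be the natural starting point for experimenting with the multiple-timescales extension.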
References
Benaïm, M. (1999). Dynamics of stochastic approximation algorithms. Le Séminaire de Probabilités XXXIII. Lecture Notes in Math. 1709 1--68. Springer, Berlin.
Benaïm, M. and Hirsch, M. W. (1999). Mixed equilibria and dynamical systems arising from fictitious play in perturbed games. Games Econom. Behav. 29 36--72.
Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
Borkar, V. S. (1997). Stochastic approximation with two timescales. Systems Control Lett. 29 291--294.
Borkar, V. S. (2002). Reinforcement learning in Markovian evolutionary games. Available at www.tcs.tifr.res.in/$\sim$borkar/games.ps.
Cowan, S. (1992). Dynamical systems arising from game theory. Ph.D. dissertation, Univ. California, Berkeley.
Fudenberg, D. and Kreps, D. M. (1993). Learning mixed equilibria. Games Econom. Behav. 5 320--367.
Fudenberg, D. and Levine, D. K. (1998). The Theory of Learning in Games. MIT Press, Cambridge, MA.
Harsanyi, J. (1973). Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points. Internat. J. Game Theory 2 1--23.
Hofbauer, J. and Hopkins, E. (2002). Learning in perturbed asymmetric games. Available at www.econ.ed.ac.uk/pdf/perturb.pdf.
Hopkins, E. (1999). A note on best response dynamics. Games Econom. Behav. 29 138--150.
Jones, C. K. R. T. (1995). Geometric singular perturbation theory. Dynamical Systems. Lecture Notes in Math. 1609 44--118. Springer, Berlin.
Jordan, J. S. (1993). Three problems in learning mixed strategy equilibria. Games Econom. Behav. 5 368--386.
Konda, V. R. and Borkar, V. S. (2000). Actor--critic-type learning algorithms for Markov decision processes. SIAM J. Control Opt. 38 94--123.
Kushner, H. J. and Clark, D. S. (1978). Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer, New York.
Littman, M. and Stone, P. (2001). Implicit negotiation in repeated games. Intelligent Agents VIII: Agent Theories, Architectures and Languages. Lecture Notes in Comput. Sci. 2333 393--404. Springer, Berlin.
Nash, J. (1951). Non-cooperative games. Ann. Math. 54 286--295.
Pemantle, R. (1990). Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18 698--712.
Shapley, L. S. (1964). Some topics in two person games. In Advances in Game Theory (M. Dresher, L. S. Shapley and A. W. Tucker, eds.) 1--28. Princeton Univ. Press.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.