Convergent multiple-timescales reinforcement learning algorithms in normal
form games
David S. Leslie and E. J. Collins
We consider reinforcement learning algorithms in normal
form games. Using two-timescales stochastic approximation, we
introduce a model-free algorithm that is asymptotically equivalent
to smooth fictitious play, in that both generate asymptotic
pseudotrajectories of the flow defined by the smooth best response
dynamics. Both algorithms are shown to converge almost
surely to the Nash distribution in two-player zero-sum games and
in N-player partnership games. However, there are simple games
for which these processes, and most other adaptive processes, fail to
converge: in particular, we consider the N-player matching
pennies game and Shapley's variant of the rock-scissors-paper game.
By extending the stochastic approximation results to multiple timescales
we can allow each player to learn at a different rate. The
resulting algorithm converges in two-player zero-sum games and two-player
partnership games, and also in the two counterexample games just described.
Some key words: Stochastic approximation, reinforcement learning, repeated normal form games, best response dynamics.
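To give a concrete flavour of the style of algorithm the abstract describes, the following is a minimal sketch of a two-timescales learner in two-player matching pennies, assuming a Boltzmann (logit) smooth best response, importance-weighted value updates, and player-specific step-size schedules. The game, the temperature tau, and the particular schedules are illustrative choices, not those fixed by the paper's analysis.

import numpy as np

rng = np.random.default_rng(0)

# Two-player matching pennies, a zero-sum game: the row player gains 1
# on a match, the column player gains 1 on a mismatch.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
payoffs = [A, -A]          # zero-sum: player 1's payoff is minus player 0's
n_actions = 2
tau = 1.0                  # smoothing temperature (illustrative choice)

def smooth_best_response(q):
    # Boltzmann / logit smooth best response to value estimates q.
    z = np.exp((q - q.max()) / tau)
    return z / z.sum()

Q = [np.zeros(n_actions) for _ in range(2)]       # fast-timescale value estimates
pi = [np.full(n_actions, 0.5) for _ in range(2)]  # slow-timescale mixed strategies

for n in range(1, 100_000):
    # Step sizes: value estimates move on a faster timescale than
    # strategies, and the two players' strategy rates are themselves
    # separated (the multiple-timescales idea). Schedules are illustrative.
    lam = 1.0 / (n + 1) ** 0.6                    # fast (value) step, both players
    mu = [1.0 / (n + 10), 1.0 / (n + 10) ** 0.9]  # slow (strategy) steps, per player

    acts = [rng.choice(n_actions, p=pi[i]) for i in range(2)]
    for i in range(2):
        r = payoffs[i][acts[0], acts[1]]  # realised payoff to player i
        a = acts[i]
        # Model-free value update for the action actually played,
        # importance-weighted by the probability of having played it.
        Q[i][a] += lam * (r - Q[i][a]) / pi[i][a]
        # The strategy tracks the smooth best response to current values.
        pi[i] += mu[i] * (smooth_best_response(Q[i]) - pi[i])
        pi[i] /= pi[i].sum()              # numerical safeguard: stay on the simplex

print(pi[0], pi[1])   # both should approach (0.5, 0.5), the Nash distribution

Here the value estimates move fastest, each player's mixed strategy tracks the smooth best response to those estimates more slowly, and the ratio of the two players' strategy step sizes tends to zero, separating the players onto distinct timescales in the spirit of the multiple-timescales extension.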