Abstract
We study infinite horizon discounted mean field control (MFC) problems with common noise through the lens of mean field Markov decision processes (MFMDP). We allow the agents to use actions that are randomized not only at the individual level but also at the level of the population. This common randomization is introduced for the purpose of exploration in a reinforcement learning (RL) paradigm. It also allows us to connect both closed-loop and open-loop policies for the MFC problem with Markov policies for the MFMDP. In particular, we show that there exists an optimal closed-loop policy for the original MFC problem, and we prove dynamic programming principles for the state and state-action value functions. Building on this framework and the notion of state-action value function, we then propose RL methods for such problems by adapting existing tabular and deep RL methods to the mean-field setting. The main difficulty is the treatment of the population state, which is an input to both the policy and the value function. We provide convergence guarantees for a tabular Q-learning algorithm based on a discretization of the simplex. We also show that deep RL algorithms based on neural networks are better suited to continuous spaces, as they allow us to avoid discretizing the mean-field state space. Numerical examples are provided.
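To make the tabular approach concrete, here is a minimal, purely illustrative sketch of Q-learning over a discretized simplex for a toy MFMDP. It is not the paper's algorithm or model: the two-state dynamics, the congestion-style reward, the grid step K, and the nearest-neighbor projection are all invented for illustration. The MFMDP state is the population distribution mu on a finite individual state space, and the MFMDP actions are decision rules h mapping individual states to individual actions.

```python
import itertools
import numpy as np

# --- Hypothetical toy model (not from the paper): 2 individual states, 2 actions ---
n_states, n_actions = 2, 2
gamma = 0.9  # discount factor

def transition_matrix(h, mu):
    """Per-agent transition matrix under decision rule h (state -> action),
    with an invented crowd-aversion dependence on the distribution mu."""
    P = np.zeros((n_states, n_states))
    for x in range(n_states):
        target = h[x]  # action a tries to move the agent to state a
        P[x, target] = 0.8 - 0.3 * mu[target]  # moving succeeds less often if crowded
        P[x, 1 - target] = 1.0 - P[x, target]
    return P

def reward(mu, h):
    """Mean-field reward: population average of an individual reward,
    with a congestion penalty (again, purely illustrative)."""
    return sum(mu[x] * (float(h[x] == 0) - mu[x]) for x in range(n_states))

# --- Discretize the simplex Delta({0,1}) with step 1/K ---
K = 10
simplex = [np.array([k / K, 1 - k / K]) for k in range(K + 1)]

def project(mu):
    """Index of the closest grid point to mu (nearest-neighbor in L1 distance)."""
    return min(range(len(simplex)), key=lambda i: np.abs(simplex[i] - mu).sum())

# MFMDP actions = all decision rules h: S -> A
rules = list(itertools.product(range(n_actions), repeat=n_states))

Q = np.zeros((len(simplex), len(rules)))  # tabular Q over (grid point, rule)
alpha, eps, rng = 0.1, 0.1, np.random.default_rng(0)

i = project(np.array([0.5, 0.5]))
for _ in range(50_000):
    # epsilon-greedy choice of a decision rule
    j = rng.integers(len(rules)) if rng.random() < eps else int(Q[i].argmax())
    h, mu = rules[j], simplex[i]
    mu_next = mu @ transition_matrix(h, mu)  # deterministic mean-field flow
    i_next = project(mu_next)
    # standard Q-learning update on the discretized MFMDP
    target = reward(mu, h) + gamma * Q[i_next].max()
    Q[i, j] += alpha * (target - Q[i, j])
    i = i_next

mid = project(np.array([0.5, 0.5]))
print("Greedy rule at mu=(0.5,0.5):", rules[int(Q[mid].argmax())])
```

The same structure suggests why the deep RL variants avoid the grid: replacing the table Q with a neural network taking mu directly as input removes the projection step and the curse of dimensionality it incurs as the grid is refined.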
Funding Statement
This work has been supported by NSF Grant DMS-1716673, ARO Grant W911NF-17-1-0578, and AFOSR Grant FA9550-19-1-0291.
Acknowledgements
We would like to thank M. Motte and H. Pham for an enlightening discussion which led to Remark 20. We would also like to thank M. Geist and J. Pérolat for helpful discussions on the DDPG algorithm.
Citation
René Carmona, Mathieu Laurière, Zongjun Tan. "Model-free mean-field reinforcement learning: Mean-field MDP and mean-field Q-learning." Ann. Appl. Probab. 33 (6B), 5334–5381, December 2023. https://doi.org/10.1214/23-AAP1949