Open Access
A survey of some simulation-based algorithms for Markov decision processes
Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu, Steven I. Marcus
Commun. Inf. Syst. 7(1): 59-92 (2007).


Many problems modeled by Markov decision processes (MDPs) have very large state and/or action spaces, leading to the well-known curse of dimensionality that makes solution of the resulting models intractable. In other cases, the system of interest is complex enough that it is not feasible to explicitly specify some of the MDP model parameters, but simulated sample paths can be readily generated (e.g., for random state transitions and rewards), albeit at a non-trivial computational cost. For these settings, we have developed various sampling and population-based numerical algorithms to overcome the computational difficulties of computing an optimal solution in terms of a policy and/or value function. Specific approaches presented in this survey include multi-stage adaptive sampling, evolutionary policy iteration and evolutionary random policy search.



Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu, Steven I. Marcus. "A survey of some simulation-based algorithms for Markov decision processes." Commun. Inf. Syst. 7 (1) 59-92, 2007.


Published: 2007
First available in Project Euclid: 20 July 2007

zbMATH: 1140.90503
MathSciNet: MR2346579

Keywords: (adaptive) sampling, Markov decision process, population-based algorithms

Rights: Copyright © 2007 International Press of Boston

Vol. 7 • No. 1 • 2007