December 2022 Toward theoretical understandings of robust Markov decision processes: Sample complexity and asymptotics
Wenhao Yang, Liangyu Zhang, Zhihua Zhang
Ann. Statist. 50(6): 3223-3248 (December 2022). DOI: 10.1214/22-AOS2225

Abstract

In this paper, we study the nonasymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov decision processes (MDPs), where the optimal robust policy and value function are estimated from a generative model. While prior work focusing on the nonasymptotic performance of robust MDPs is restricted to the setting of the KL uncertainty set and the (s,a)-rectangular assumption, we improve their results and also consider other uncertainty sets, including the L1 and χ² balls. Our results show that under the (s,a)-rectangular assumption on the uncertainty sets, the sample complexity is about Õ(|S|²|A| / (ε²ρ²(1−γ)⁴)). In addition, we extend our results from the (s,a)-rectangular assumption to the s-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty set and is generally larger than in the case under the (s,a)-rectangular assumption. Moreover, we also show that the optimal robust value function is asymptotically normal at the typical √n rate under both the (s,a)- and s-rectangular assumptions, from theoretical and empirical perspectives.
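As a rough illustration of the setting (a sketch, not the paper's estimator), the snippet below runs value iteration on an empirical transition model `P_hat` under an (s,a)-rectangular KL uncertainty set of radius `rho`, solving the inner minimization via the standard convex dual of the KL-constrained expectation, inf over {KL(P‖P̂) ≤ ρ} of E_P[V] = sup over λ > 0 of {−λ log E_{P̂}[exp(−V/λ)] − λρ}. The function name, the grid search over the dual variable λ, and all parameter choices are illustrative assumptions.

```python
import numpy as np

def robust_value_iteration(P_hat, R, gamma=0.9, rho=0.1, n_iter=200):
    """Illustrative (s,a)-rectangular robust value iteration on an empirical
    model P_hat (shape |S| x |A| x |S|), e.g. estimated from a generative model.

    The inner inf over the KL ball of radius rho around P_hat[s, a] is
    computed through its dual:
        inf_{KL(P || P_hat) <= rho} E_P[V]
            = sup_{lam > 0} -lam * log E_{P_hat}[exp(-V / lam)] - lam * rho,
    approximated here by a crude grid search over lam (illustration only).
    """
    S, A = R.shape
    V = np.zeros(S)
    lams = np.logspace(-2, 2, 200)  # grid for the dual variable lambda
    for _ in range(n_iter):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                p = P_hat[s, a]
                w = V - V.min()  # shift V for numerical stability of the log-sum-exp
                ev = np.exp(-w[None, :] / lams[:, None]) @ p     # E_{P_hat}[exp(-w/lam)], one per lam
                duals = V.min() - lams * np.log(ev) - lams * rho  # dual objective on the grid
                Q[s, a] = R[s, a] + gamma * duals.max()           # worst-case Bellman backup
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

Because the adversary can only decrease the expected continuation value, the robust value returned here is never above the nominal value computed from `P_hat` itself; that monotonicity is a quick sanity check for the dual computation.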

Funding Statement

This work has been supported by the National Key Research and Development Project of China (No. 2018AAA0101004).

Acknowledgments

The authors would like to thank the anonymous referees, the Associate Editor and the Editor for their detailed and constructive comments that improved the quality of this paper. The authors would also like to thank Xiang Li and Dachao Lin for discussions related to DRO and some inequalities.

Citation

Download Citation

Wenhao Yang, Liangyu Zhang, Zhihua Zhang. "Toward theoretical understandings of robust Markov decision processes: Sample complexity and asymptotics." Ann. Statist. 50 (6) 3223-3248, December 2022. https://doi.org/10.1214/22-AOS2225

Information

Received: 1 November 2021; Revised: 1 July 2022; Published: December 2022
First available in Project Euclid: 21 December 2022

MathSciNet: MR4524495
zbMATH: 07641124
Digital Object Identifier: 10.1214/22-AOS2225

Subjects:
Primary: 62C05, 62F12
Secondary: 68Q32

Keywords: distributional robustness, f-divergence set, model-based reinforcement learning, robust MDPs

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
26 PAGES

