An inequality for variances of the discounted rewards
EUGENE A. FEINBERG and JUN FEI
Source: J. Appl. Probab. Volume 46, Number 4 (2009), 1209-1212.
Abstract
We consider the following two definitions of discounting:
(i) multiplicative coefficient in front of the rewards, and
(ii) probability that the process has not been stopped if the
stopping time has an exponential distribution independent of the
process. It is well known that the expected total discounted
rewards corresponding to these definitions are the same. In this
note we show that the variance of the total discounted rewards is
smaller for the first definition than for the second definition.
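The comparison can be sketched with the law of total variance. The notation below is assumed for illustration and does not come from the abstract: a continuous-time process X with reward rate r, a discount rate \alpha > 0, and a stopping time T that is exponential with rate \alpha and independent of X, with all integrals assumed finite. This is one standard way to see why such an inequality holds, not necessarily the authors' own argument, and it yields only the non-strict form of the inequality.

```latex
% Assumed notation (not from the abstract): X = (X_t) is the process,
% r is a reward rate, \alpha > 0 is the discount rate, and
% T ~ Exp(\alpha) is independent of X, so P(T > t) = e^{-\alpha t}.
\begin{align*}
J_1 &= \int_0^\infty e^{-\alpha t}\, r(X_t)\,dt, \qquad
J_2 = \int_0^T r(X_t)\,dt, \\
\mathbb{E}[J_2 \mid X] &= \int_0^\infty \mathbb{P}(T > t)\, r(X_t)\,dt = J_1, \\
\operatorname{Var}(J_2)
  &= \operatorname{Var}\bigl(\mathbb{E}[J_2 \mid X]\bigr)
   + \mathbb{E}\bigl[\operatorname{Var}(J_2 \mid X)\bigr] \\
  &= \operatorname{Var}(J_1)
   + \mathbb{E}\bigl[\operatorname{Var}(J_2 \mid X)\bigr]
   \;\ge\; \operatorname{Var}(J_1).
\end{align*}
```

As a quick check under the same assumptions, take r identically equal to 1. Then J_1 = 1/\alpha is deterministic with variance 0, while J_2 = T has variance 1/\alpha^2, so the two definitions share the mean 1/\alpha but differ in variance.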
Secondary Subjects: 90C40
Links and Identifiers
Permanent link to this document: http://projecteuclid.org/euclid.jap/1261670699
Digital Object Identifier: doi:10.1239/jap/1261670699
Zentralblatt MATH identifier: 05665466
Mathematical Reviews number (MathSciNet): MR2582716