Source: Ann. Appl. Probab. Volume 14, Number 2 (2004), 796-819.
We study the rate of convergence of linear two-time-scale stochastic approximation methods. We consider two-time-scale linear iterations driven by i.i.d. noise, prove results on their asymptotic covariance and establish asymptotic normality. The well-known result [Polyak, B. T. (1990). Automat. Remote Control 51 937–946; Ruppert, D. (1988). Technical Report 781, Cornell Univ.] on the optimality of Polyak–Ruppert averaging techniques, specialized to linear stochastic approximation, follows as a consequence of the general results in this paper.
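The abstract does not spell out the iteration itself. As a rough illustration only, the sketch below assumes a commonly used form of a linear two-time-scale recursion: a slow iterate `theta` and a fast iterate `r` that jointly seek the solution of A11 θ + A12 r = b1, A21 θ + A22 r = b2, each updated from noisy residuals. The function name, the step-size schedules (1/k and 1/k^0.6), and the unit-variance Gaussian noise are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def two_time_scale_linear(A11, A12, A21, A22, b1, b2, n_steps, seed=0):
    """Hypothetical sketch of a linear two-time-scale iteration.

    theta is updated with the smaller step size (slow time scale),
    r with the larger one (fast time scale). Both updates are driven
    by i.i.d. standard normal noise, an assumption of this example.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(b1))
    r = np.zeros(len(b2))
    for k in range(1, n_steps + 1):
        beta = 1.0 / k           # slow step size (assumed schedule)
        gamma = 1.0 / k ** 0.6   # fast step size; beta / gamma -> 0
        v = rng.standard_normal(len(b1))  # noise in the slow update
        w = rng.standard_normal(len(b2))  # noise in the fast update
        # Both updates use the iterates from the previous step.
        theta_next = theta + beta * (b1 - A11 @ theta - A12 @ r + v)
        r_next = r + gamma * (b2 - A21 @ theta - A22 @ r + w)
        theta, r = theta_next, r_next
    return theta, r
```

With scalar blocks A11 = A22 = 2, A12 = A21 = 1 and b1 = b2 = 1, the noiseless fixed point is θ = r = 1/3, so a long run should leave both iterates near that value, with the fast iterate noticeably noisier than the slow one.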
References
Baras, J. S. and Borkar, V. S. (2000). A learning algorithm for Markov decision processes with adaptive state aggregation. In Proc. 39th IEEE Conference on Decision and Control. IEEE, New York.
Benveniste, A., Metivier, M. and Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations. Springer, Berlin.
Bhatnagar, S., Fu, M. C., Marcus, S. I. and Bhatnagar, S. (2001). Two timescale algorithms for simulation optimization of hidden Markov models. IIE Transactions 3 245--258.
Bhatnagar, S., Fu, M. C., Marcus, S. I. and Fard, P. J. (2001). Optimal structured feedback policies for ABR flow control using two timescale SPSA. IEEE/ACM Transactions on Networking 9 479--491.
Borkar, V. S. (1997). Stochastic approximation with two time scales. Systems Control Lett. 29 291--294.
Duflo, M. (1997). Random Iterative Models. Springer, Berlin.
Kokotovic, P. V. (1984). Applications of singular perturbation techniques to control problems. SIAM Rev. 26 501--550. MR765671
Konda, V. R. (2002). Actor-critic algorithms. Ph.D. dissertation, Dept. Electrical Engineering and Computer Science, MIT.
Konda, V. R. and Borkar, V. S. (1999). Actor-critic like learning algorithms for Markov decision processes. SIAM J. Control Optim. 38 94--123.
Konda, V. R. and Tsitsiklis, J. N. (2003). On actor-critic algorithms. SIAM J. Control Optim. 42 1143--1166.
Kushner, H. J. and Clark, D. S. (1978). Stochastic Approximation for Constrained and Unconstrained Systems. Springer, New York. MR499560
Kushner, H. J. and Yang, J. (1993). Stochastic approximation with averaging of the iterates: Optimal asymptotic rates of convergence for general processes. SIAM J. Control Optim. 31 1045--1062.
Kushner, H. J. and Yin, G. G. (1997). Stochastic Approximation Algorithms and Applications. Springer, New York.
Nevel'son, M. B. and Has'minskii, R. Z. (1973). Stochastic Approximation and Recursive Estimation. Amer. Math. Soc., Providence, RI. MR423714
Polyak, B. T. (1976). Convergence and convergence rate of iterative stochastic algorithms I. Automat. Remote Control 12 1858--1868. MR462747
Polyak, B. T. (1990). New method of stochastic approximation type. Automat. Remote Control 51 937--946.
Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30 838--855.
Ruppert, D. (1988). Efficient estimators from a slowly convergent Robbins--Monro procedure. Technical Report 781, School of Operations Research and Industrial Engineering, Cornell Univ.