The Annals of Applied Probability

Convergence rate of linear two-time-scale stochastic approximation

Vijay R. Konda and John N. Tsitsiklis
Source: Ann. Appl. Probab. Volume 14, Number 2 (2004), 796-819.

Abstract

We study the rate of convergence of linear two-time-scale stochastic approximation methods. We consider two-time-scale linear iterations driven by i.i.d. noise, prove some results on their asymptotic covariance and establish asymptotic normality. The well-known result [Polyak, B. T. (1990). Automat. Remote Control 51 937–946; Ruppert, D. (1988). Technical Report 781, Cornell Univ.] on the optimality of Polyak–Ruppert averaging techniques specialized to linear stochastic approximation is established as a consequence of the general results in this paper.
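For readers who want a concrete picture of the kind of iteration the abstract refers to, the following is a minimal scalar sketch, not the authors' code. The update form, the step-size choices (1/(k+1) for the slow iterate and (k+1)^(-0.6) for the fast one), the function name two_time_scale and all parameter names are assumptions made for illustration. Under these assumptions, Polyak–Ruppert averaging appears as a special case in which the slow iterate is simply the running average of the fast one.

```python
import random


def two_time_scale(a11, a12, a21, a22, b1, b2, n_steps,
                   w_std=1.0, v_std=1.0, seed=0):
    """Scalar linear two-time-scale iteration driven by i.i.d. Gaussian noise.

    Assumed (illustrative) form:
        x_{k+1} = x_k + beta_k  * (b1 - a11*x_k - a12*y_k + W_k)   # slow
        y_{k+1} = y_k + gamma_k * (b2 - a21*x_k - a22*y_k + V_k)   # fast
    with beta_k / gamma_k -> 0 as k grows.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for k in range(n_steps):
        beta = 1.0 / (k + 1)       # slow step size (assumed choice)
        gamma = (k + 1) ** -0.6    # fast step size (assumed choice)
        w = rng.gauss(0.0, w_std)  # noise in the slow update
        v = rng.gauss(0.0, v_std)  # noise in the fast update
        # Both updates use the old (x, y) pair, so compute them before assigning.
        x_next = x + beta * (b1 - a11 * x - a12 * y + w)
        y_next = y + gamma * (b2 - a21 * x - a22 * y + v)
        x, y = x_next, y_next
    return x, y


# Polyak-Ruppert averaging as a special case: the fast iterate y solves
# a22 * y = b2 from noisy observations, and the slow iterate reduces to
# x_{k+1} = x_k + (y_k - x_k) / (k + 1), the running average of y.
x_avg, y_last = two_time_scale(a11=1.0, a12=-1.0, a21=0.0, a22=2.0,
                               b1=0.0, b2=4.0, n_steps=20000, w_std=0.0)
```

In this example the target is b2 / a22 = 2.0; both x_avg and y_last should land near it, with the averaged iterate x_avg typically the less noisy of the two.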

Primary Subjects: 62L20
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoap/1082737112
Digital Object Identifier: doi:10.1214/105051604000000116
Mathematical Reviews number (MathSciNet): MR2052903
Zentralblatt MATH identifier: 02100755

References

Baras, J. S. and Borkar, V. S. (2000). A learning algorithm for Markov decision processes with adaptive state aggregation. In Proc. 39th IEEE Conference on Decision and Control. IEEE, New York.
Benveniste, A., Metivier, M. and Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR1082341
Zentralblatt MATH: 0752.93073
Bhatnagar, S., Fu, M. C., Marcus, S. I. and Bhatnagar, S. (2001). Two timescale algorithms for simulation optimization of hidden Markov models. IIE Transactions 3 245–258.
Bhatnagar, S., Fu, M. C., Marcus, S. I. and Fard, P. J. (2001). Optimal structured feedback policies for ABR flow control using two timescale SPSA. IEEE/ACM Transactions on Networking 9 479–491.
Borkar, V. S. (1997). Stochastic approximation with two time scales. Systems Control Lett. 29 291–294.
Mathematical Reviews (MathSciNet): MR1432654
Duflo, M. (1997). Random Iterative Models. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR1485774
Zentralblatt MATH: 0868.62069
Kokotovic, P. V. (1984). Applications of singular perturbation techniques to control problems. SIAM Rev. 26 501–550.
Mathematical Reviews (MathSciNet): MR765671
Digital Object Identifier: doi:10.1137/1026104
Zentralblatt MATH: 0548.93001
Konda, V. R. (2002). Actor-critic algorithms. Ph.D. dissertation, Dept. Electrical Engineering and Computer Science, MIT.
Konda, V. R. and Borkar, V. S. (1999). Actor-critic like learning algorithms for Markov decision processes. SIAM J. Control Optim. 38 94–123.
Mathematical Reviews (MathSciNet): MR1740605
Digital Object Identifier: doi:10.1137/S036301299731669X
Zentralblatt MATH: 0938.93069
Konda, V. R. and Tsitsiklis, J. N. (2003). On actor-critic algorithms. SIAM J. Control Optim. 42 1143–1166.
Mathematical Reviews (MathSciNet): MR2044789
Digital Object Identifier: doi:10.1137/S0363012901385691
Zentralblatt MATH: 1049.93095
Kushner, H. J. and Clark, D. S. (1978). Stochastic Approximation for Constrained and Unconstrained Systems. Springer, New York.
Mathematical Reviews (MathSciNet): MR499560
Kushner, H. J. and Yang, J. (1993). Stochastic approximation with averaging of the iterates: Optimal asymptotic rates of convergence for general processes. SIAM J. Control Optim. 31 1045–1062.
Mathematical Reviews (MathSciNet): MR1227546
Digital Object Identifier: doi:10.1137/0331047
Zentralblatt MATH: 0788.62078
Kushner, H. J. and Yin, G. G. (1997). Stochastic Approximation Algorithms and Applications. Springer, New York.
Mathematical Reviews (MathSciNet): MR1453116
Zentralblatt MATH: 0914.60006
Nevel'son, M. B. and Has'minskii, R. Z. (1973). Stochastic Approximation and Recursive Estimation. Amer. Math. Soc., Providence, RI.
Mathematical Reviews (MathSciNet): MR423714
Polyak, B. T. (1976). Convergence and convergence rate of iterative stochastic algorithms I. Automat. Remote Control 12 1858–1868.
Mathematical Reviews (MathSciNet): MR462747
Polyak, B. T. (1990). New method of stochastic approximation type. Automat. Remote Control 51 937–946.
Mathematical Reviews (MathSciNet): MR1071220
Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30 838–855.
Mathematical Reviews (MathSciNet): MR1167814
Digital Object Identifier: doi:10.1137/0330046
Zentralblatt MATH: 0762.62022
Ruppert, D. (1988). Efficient estimators from a slowly convergent Robbins–Monro procedure. Technical Report 781, School of Operations Research and Industrial Engineering, Cornell Univ.

2012 © Institute of Mathematical Statistics
