## Abstract

In this note we give a result on the maximum sample excursions of Kiefer-Wolfowitz stochastic approximation processes. The method is applicable to other stochastic approximation procedures, and under other conditions than those assumed here. Let $y(x)$ be a scalar valued random variable with distribution function $H(y \mid x)$, where $x$ is a scalar valued parameter. Define $M(x) = \int yH(dy \mid x)$. Let $M(x)$ be continuous and have a unique local maximum at $x = \theta$ and let $a_n, c_n$ be sequences of positive real numbers satisfying \begin{equation*}\tag{1}\sum a_n = \infty,\quad\sum a_n^2c_n^{-2} < \infty,\quad\sum a_nc_n < \infty,\quad c_n \rightarrow 0.\end{equation*} The sequence of random variables $x_n$ defined by \begin{equation*}\tag{2}x_{n + 1} = x_n + a_n\lbrack y(x_n + c_n) - y(x_n - c_n)\rbrack/c_n,\quad x_0 \text{given},\end{equation*} is known as a Kiefer-Wolfowitz process and, under mild conditions on $M(x)$ and $H(y \mid x), x_n$ is known to converge to $\theta$ w.p.1. (See, e.g., Schmetterer [6] for a review of such results.) A result of this note is an estimate of the following form, for any $m < \infty$ and even integer $r$, \begin{equation*}\tag{3}P\lbrack\max_{m \geqq n \geqq N} |x_n - \theta| > \epsilon\rbrack < \lbrack E(x_N - \theta)^r + \delta_{Nr}\rbrack/\epsilon^r\end{equation*} where $\delta_{Nr}$ depends on the sequences $a_n$ and $c_n$ and can be made arbitrarily small for each fixed $N$ and $r$, while $x_n \rightarrow \theta$ w.p.1. is still insured. As a special case of (3), let $|x_N - \theta|$ be unknown but assumed nonrandom. Let $\epsilon = \beta + (1 + \alpha)|x_N - \theta|, \beta > 0$ and $\alpha > 0$. Then \begin{equation*}\tag{4}P\lbrack\max_{m \geqq n \geqq N} |x_n - \theta| > (1 + \alpha)|x_N - \theta| + \beta\rbrack \end{equation*} $< \lbrack(x_N - \theta)^r + \delta_{Nr}\rbrack/\lbrack \beta + (1 + \alpha)|x_N - \theta|\rbrack^r$ which can be made arbitrarily small by fixing $r$ sufficiently large, and then arranging $a_n$ and $c_n$ so that $\delta_{Nr}$ is sufficiently small. Aside from the intrinsic interest of (3) and (4), these results seem to have some practical usefulness in assisting in the choice of the $a_n$ and $c_n$ when there is more than one local maximum of $M(x)$, or if the process (2) is used to optimize the parameters of a physical system whose performance $(M(x))$ should not be reduced below some minimum level--the value of $x$ corresponding to this level may not be known. In both of these cases we may wish to limit the excursions to some given multiple or function of $|x_0 - \theta|$, with a high probability, while still being certain that $x_n \rightarrow \theta$ w.p.1.

## Citation

Harold J. Kushner. "A Note on the Maximum Sample Excursions of Stochastic Approximation Processes." Ann. Math. Statist. 37 (2) 513 - 516, April, 1966. https://doi.org/10.1214/aoms/1177699536

## Information