The Annals of Mathematical Statistics

Limiting Distributions for Some Random Walks Arising in Learning Models

M. Frank Norman

Abstract

Associated with certain of the learning models introduced by Bush and Mosteller [1] are random walks $p_1, p_2, p_3, \cdots$ on the closed unit interval with transition probabilities of the form \begin{equation*}\tag{1}P\lbrack p_{n + 1} = p_n + \theta_1(1 - p_n) \mid p_n\rbrack = \varphi(p_n)\end{equation*} and \begin{equation*}\tag{2}P\lbrack p_{n + 1} = p_n - \theta_2p_n \mid p_n\rbrack = 1 - \varphi(p_n)\end{equation*} where $0 < \theta_1, \theta_2 < 1$ and $\varphi$ is a mapping of the closed unit interval into itself. In the experiments to which these models are applied, response alternatives $A_1$ and $A_2$ are available to a subject on each of a sequence of trials, and $p_n$ is the probability that the subject will make response $A_1$ on trial $n$. Depending on which response is actually made, one of two events $E_1$ or $E_2$ ensues. These events are associated, respectively, with the increment $p_n \rightarrow p_n + \theta_1(1 - p_n)$ and the decrement $p_n \rightarrow p_n - \theta_2p_n$ in $A_1$ response probability. The conditional probabilities $\pi_{ij}$ of event $E_j$ given response $A_i$ do not depend on the trial number $n$. Thus (1) and (2) are obtained with $\varphi(p) = \pi_{11}p + \pi_{21}(1 - p)$. 
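The transition law (1)-(2) is straightforward to simulate directly. The following sketch (function names and parameter choices are ours, not from the paper) draws a trajectory of the walk for an arbitrary event-probability function $\varphi$; with the linear $\varphi(p) = \pi_{11}p + \pi_{21}(1 - p)$ it reproduces the Bush-Mosteller dynamics described above.

```python
import random

def bush_mosteller_step(p, theta1, theta2, phi, rng):
    """One transition of the walk (1)-(2):
    with probability phi(p), event E1 increments p toward 1;
    otherwise event E2 decrements p toward 0."""
    if rng.random() < phi(p):
        return p + theta1 * (1.0 - p)   # E1: p -> p + theta1(1 - p)
    return p - theta2 * p               # E2: p -> p - theta2 * p

def simulate(p1, theta1, theta2, phi, n_steps, seed=0):
    """Return the trajectory p_1, ..., p_{n_steps+1} started at p_1."""
    rng = random.Random(seed)
    p = p1
    path = [p]
    for _ in range(n_steps):
        p = bush_mosteller_step(p, theta1, theta2, phi, rng)
        path.append(p)
    return path
```

For example, `simulate(0.5, 0.1, 0.1, lambda p: 0.7 * p + 0.4 * (1 - p), 1000)` corresponds to $\pi_{11} = 0.7$, $\pi_{21} = 0.4$; since $p + \theta_1(1-p) \leqq 1$ and $p - \theta_2 p \geqq 0$, the trajectory stays in the closed unit interval.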
Since the linearity of the functions $\varphi$ which arise in this way is of no consequence for the work presented in this paper, we will assume instead simply that \begin{equation*}\tag{3}\varphi \in C^2(\lbrack 0, 1\rbrack).\end{equation*} We impose one further restriction on $\varphi$ which excludes some cases of interest in learning theory: \begin{equation*}\tag{4}\epsilon_1 = \min_{0 \leqq p \leqq 1} \varphi(p) > 0 \quad\text{and}\quad \epsilon_2 = \max_{0 \leqq p \leqq 1} \varphi(p) < 1.\end{equation*} It follows from a theorem of Karlin ([5], Theorem 37) that under (1)-(4) the distribution function $F^{(n)}_{\theta_1,\theta_2,\varphi}$ of $p_n$ (which depends, of course, on the distribution $F$ of $p_1$) converges as $n$ approaches infinity to a distribution $F_{\theta_1,\theta_2,\varphi}$ which does not depend on $F$. It is with the distributions $F_{\theta_1,\theta_2,\varphi}$ that the present paper is concerned. Very little is known about distributions of this family, though some results may be found in Karlin [5], Bush and Mosteller [1], Kemeny and Snell [6], Estes and Suppes [3], and McGregor and Hui [8]. The only theorem in the literature directly relevant to the present work is one of McGregor and Zidek [9], as a consequence of which, in the case $\theta_1 = \theta_2 = \theta, \varphi(p) \equiv \frac{1}{2}$, $\lim_{\theta \rightarrow 0} \lim_{n \rightarrow \infty} P\lbrack\theta^{-\frac{1}{2}}(p_n - \frac{1}{2}) \leqq x\rbrack = \Phi(8^{\frac{1}{2}}x)$ where $\Phi$ denotes the standard normal distribution function; that is, the distribution $F_{\theta,\theta,\frac{1}{2}}(\theta^{\frac{1}{2}}x + \frac{1}{2})$ converges to a normal distribution as the "learning rate" parameter $\theta$ tends to 0. We will prove, by means of another method, that this phenomenon is of much greater generality. 
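The McGregor-Zidek limit can be checked empirically. In the case $\theta_1 = \theta_2 = \theta$, $\varphi \equiv \frac{1}{2}$, the limit $\Phi(8^{\frac{1}{2}}x)$ says that $\theta^{-\frac{1}{2}}(p_n - \frac{1}{2})$ is asymptotically $N(0, \frac{1}{8})$, i.e. the stationary variance of $p_n$ is approximately $\theta/8$ for small $\theta$. A minimal simulation sketch (the function name, seed, and sample sizes are our choices):

```python
import random

def stationary_sample(theta, n_burn, n_keep, seed=1):
    """Sample the walk (1)-(2) with theta1 = theta2 = theta and phi = 1/2,
    discarding n_burn burn-in steps so the chain is near stationarity."""
    rng = random.Random(seed)
    p = 0.5
    for _ in range(n_burn):
        p = p + theta * (1 - p) if rng.random() < 0.5 else p - theta * p
    out = []
    for _ in range(n_keep):
        p = p + theta * (1 - p) if rng.random() < 0.5 else p - theta * p
        out.append(p)
    return out

theta = 0.02
xs = stationary_sample(theta, 10_000, 200_000)
m = sum(xs) / len(xs)
var = sum((x - m) ** 2 for x in xs) / len(xs)
# The normal limit N(0, 1/8) for theta^{-1/2}(p_n - 1/2)
# predicts m near 1/2 and var near theta / 8.
```

A direct moment calculation in this symmetric case gives the exact stationary variance $\theta/(4(2-\theta))$, which agrees with $\theta/8$ to first order as $\theta \rightarrow 0$.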
Theorem 1 below shows that, for any positive constant $\zeta$ and any $\varphi$ with $\max_{0 \leqq p \leqq 1}\varphi'(p) < \min (1, \zeta)/\max (1, \zeta)$, there is a constant $\rho$ such that $F_{\theta,\zeta\theta,\varphi}(\theta^{\frac{1}{2}}x + \rho)$ converges to a normal distribution as $\theta \rightarrow 0$. A nonnormal limit is obtained if $\theta_1$ approaches 0 while $\theta_2$ remains fixed, as is shown in Theorem 2. In this case $F_{\theta_1,\theta_2,\varphi}(\theta_1x)$ converges to an infinite convolution of geometric distributions. If $f(p,\theta) = p + \theta(1 - p)$, then (1) and (2) can be written in the form \begin{equation*}\tag{5}P\lbrack p_{n + 1} = f(p_n, \theta_1) \mid p_n\rbrack = \varphi(p_n)\end{equation*} and \begin{equation*}\tag{6}P\lbrack p_{n + 1} = 1 - f(1 - p_n, \theta_2) \mid p_n\rbrack = 1 - \varphi(p_n).\end{equation*} In Section 4 it is shown that the linearity of $f(p,\theta)$ in $p$ and $\theta$ is not essential to the phenomena discussed above. Theorems 3 and 4 present generalizations of Theorems 2 and 1, respectively, to "learning functions" $f(p,\theta)$ subject only to certain fairly weak axioms. A somewhat different learning model, Estes' [2] $N$-element pattern model, leads to a finite Markov chain $p_1, p_2, p_3, \cdots$ with state space $S_N = \{jN^{-1}: j = 0, 1, \cdots, N\}$ and transition probabilities \begin{align*}\tag{7}P\lbrack p_{n + 1} = p_n + N^{-1} \mid p_n\rbrack &= \varphi(p_n), \\ \tag{8}P\lbrack p_{n + 1} = p_n - N^{-1} \mid p_n\rbrack &= \psi(p_n),\end{align*} and \begin{equation*}\tag{9}P\lbrack p_{n + 1} = p_n \mid p_n\rbrack = 1 - \varphi(p_n) - \psi(p_n)\end{equation*} where $\varphi(p) = c\pi_{21}(1 - p)$, $\psi(p) = c\pi_{12}p$, $0 < c \leqq 1$, and for the sake of this discussion we suppose that $0 < \pi_{12}, \pi_{21} < 1$. 
In this case a limiting distribution $F_{N, \varphi,\psi}$ of $p_n$ as $n \rightarrow \infty$ exists and is independent of the distribution of $p_1$ by a standard theorem on Markov chains. Estes [2] showed that the limit is binomial over $S_N$ with mean $r = \pi_{21}/(\pi_{12} + \pi_{21})$. It then follows from the central limit theorem that $\lim_{N \rightarrow \infty}\lim_{n \rightarrow \infty} P\lbrack N^{\frac{1}{2}}(p_n - r) \leqq x\rbrack = \Phi\lbrack x/(r(1 - r))^{\frac{1}{2}}\rbrack.$ In Section 5 it is shown that our method permits an extension of this result to much more general $\varphi$ and $\psi$.
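Estes' result can likewise be checked by simulating the birth-death chain (7)-(9) on $S_N$ and comparing the empirical stationary mean with $r = \pi_{21}/(\pi_{12} + \pi_{21})$. The sketch below (function name, seed, and parameter values are our illustrative choices) tracks the integer state $j$ with $p_n = j/N$:

```python
import random

def estes_chain(N, c, pi12, pi21, n_burn, n_keep, seed=2):
    """Simulate the N-element pattern model (7)-(9) with
    phi(p) = c*pi21*(1 - p), psi(p) = c*pi12*p, and return
    stationary samples of p_n = j/N after n_burn burn-in steps."""
    rng = random.Random(seed)
    j = N // 2

    def step(j):
        p = j / N
        up = c * pi21 * (1 - p)     # probability of p -> p + 1/N
        down = c * pi12 * p         # probability of p -> p - 1/N
        u = rng.random()
        if u < up:
            return j + 1
        if u < up + down:
            return j - 1
        return j                    # stay with probability 1 - up - down

    for _ in range(n_burn):
        j = step(j)
    out = []
    for _ in range(n_keep):
        j = step(j)
        out.append(j / N)
    return out

# Example: N = 10, c = 1, pi12 = 0.3, pi21 = 0.6, so r = 0.6/0.9 = 2/3.
xs = estes_chain(10, 1.0, 0.3, 0.6, 5_000, 100_000)
mean_p = sum(xs) / len(xs)
```

Since the up- and down-probabilities vanish at $p = 1$ and $p = 0$ respectively, the chain never leaves $S_N$; the empirical mean should settle near $r$, consistent with the binomial limit.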

Article information

Source
Ann. Math. Statist., Volume 37, Number 2 (1966), 393-405.

Dates
First available in Project Euclid: 27 April 2007

https://projecteuclid.org/euclid.aoms/1177699521

Digital Object Identifier
doi:10.1214/aoms/1177699521

Mathematical Reviews number (MathSciNet)
MR192535

Zentralblatt MATH identifier
0139.34802
