Abstract
This paper develops the theory of the design and performance of optimal finite-memory systems for the two-hypothesis testing problem. Let $X_1, X_2, \cdots$ be a sequence of independent identically distributed random variables drawn according to a probability measure $\mathscr{P}$. Consider the standard two-hypothesis testing problem with the probability-of-error loss criterion, in which $\mathscr{P} = \mathscr{P}_0$ with probability $\pi_0$ and $\mathscr{P} = \mathscr{P}_1$ with probability $\pi_1$. Let the data be summarized after each new observation by an $m$-valued statistic $T \in \{1, 2, \cdots, m\}$ which is updated according to the rule $T_n = f(T_{n-1}, X_n)$, where $f$ is a (perhaps randomized) time-invariant function. Let $d: \{1, 2, \cdots, m\} \rightarrow \{H_0, H_1\}$ be a fixed decision function taking action $d(T_n)$ at time $n$, and let $P_e(f, d)$ be the long-run probability of error of the algorithm $(f, d)$ as the number of trials $n \rightarrow \infty$. Define $P^\ast = \inf_{(f,d)} P_e(f, d)$. Let the a.e. maximum and minimum likelihood ratios be defined by $\bar{l} = \sup_A (\mathscr{P}_0(A)/\mathscr{P}_1(A))$ and $\underline{l} = \inf_A (\mathscr{P}_0(A)/\mathscr{P}_1(A))$, where the supremum and infimum are taken over all measurable sets $A$ for which $\mathscr{P}_0(A) + \mathscr{P}_1(A) > 0$. Define $\gamma = \bar{l}/\underline{l}$. It will be shown that $P^\ast = [2(\pi_0 \pi_1 \gamma^{m-1})^{1/2} - 1]/(\gamma^{m-1} - 1)$ under the nondegeneracy condition $\gamma^{m-1} \geq \max\{\pi_0/\pi_1, \pi_1/\pi_0\}$, and a simple family of $\varepsilon$-optimal $(f, d)$'s will be exhibited. In general, an optimal $(f, d)$ does not exist, and $\varepsilon$-optimal algorithms involve randomization in $f$.
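The closed-form expression for $P^\ast$ and the $\varepsilon$-optimal family lend themselves to a quick numerical check. The Python sketch below is not taken from the paper: the Bernoulli example ($H_0$: Bernoulli(0.4) vs. $H_1$: Bernoulli(0.6) with equal priors), the 4-state automaton, and all function names are illustrative assumptions. It computes $P^\ast$ from the formula above and then evaluates a saturating random walk whose end states are left only with a small probability $\delta$, i.e., an update rule $f$ randomized at the extreme states; the long-run error is obtained exactly from the stationary distributions of the two induced Markov chains.

```python
# Illustrative sketch (not the paper's construction): test H0: Bernoulli(0.4)
# vs H1: Bernoulli(0.6) with equal priors using an m-state finite memory.
import numpy as np

def p_star(pi0, pi1, gamma, m):
    """Closed-form optimal long-run error from the abstract, valid under the
    nondegeneracy condition gamma**(m-1) >= max(pi0/pi1, pi1/pi0)."""
    g = gamma ** (m - 1)
    assert g >= max(pi0 / pi1, pi1 / pi0), "nondegeneracy condition fails"
    return (2.0 * np.sqrt(pi0 * pi1 * g) - 1.0) / (g - 1.0)

def walk_matrix(p, m, delta):
    """Transition matrix of a saturating random walk on states 0..m-1: T moves
    up on X = 1, down on X = 0, and leaves an end state only with probability
    delta (randomization in f, as the abstract says epsilon-optimal rules need)."""
    P = np.zeros((m, m))
    for i in range(m):
        up = p * (delta if i == 0 else 1.0) if i < m - 1 else 0.0
        down = (1.0 - p) * (delta if i == m - 1 else 1.0) if i > 0 else 0.0
        if i < m - 1:
            P[i, i + 1] = up
        if i > 0:
            P[i, i - 1] = down
        P[i, i] = 1.0 - up - down  # hold in place otherwise
    return P

def stationary(P):
    """Stationary distribution mu solving mu P = mu with sum(mu) = 1."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones(m)])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

if __name__ == "__main__":
    p0, p1, pi0, pi1, m = 0.4, 0.6, 0.5, 0.5, 4
    # For a two-atom distribution the extreme likelihood ratios are attained
    # on the atoms {0} and {1}, so gamma = 1.5 / (2/3) = 2.25 here.
    l_max = max(p0 / p1, (1 - p0) / (1 - p1))
    l_min = min(p0 / p1, (1 - p0) / (1 - p1))
    gamma = l_max / l_min
    print(f"P* = {p_star(pi0, pi1, gamma, m):.6f}")
    decide_h1 = np.arange(m) >= m / 2  # d: declare H1 in the upper states
    for delta in (0.5, 0.1, 0.01, 0.001):
        mu0 = stationary(walk_matrix(p0, m, delta))
        mu1 = stationary(walk_matrix(p1, m, delta))
        pe = pi0 * mu0[decide_h1].sum() + pi1 * mu1[~decide_h1].sum()
        print(f"delta = {delta:6.3f}   long-run Pe = {pe:.6f}")
```

For this example the long-run error decreases toward $P^\ast \approx 0.2286$ as $\delta \rightarrow 0$, yet no single $\delta > 0$ attains it, and setting $\delta = 0$ makes the end states absorbing; this mirrors the abstract's remarks that an optimal $(f, d)$ does not exist in general and that $\varepsilon$-optimal algorithms involve randomization in $f$.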
Citation
Martin E. Hellman and Thomas M. Cover. "Learning with Finite Memory." Ann. Math. Statist. 41 (3): 765–782, June 1970. https://doi.org/10.1214/aoms/1177696958