Abstract
Consider the problem of Bayesian sequential estimation of a real parameter $\theta$ with quadratic loss and fixed cost $c$ per observation. It is well known (cf. [1], [2]) that, under simple regularity conditions, this problem reduces to the following one. If $Z_1, Z_2, \cdots, Z_n, \cdots$ are the observations (independent and identically distributed given $\theta$), let \begin{equation*}\tag{1.1}Y_n = \operatorname{Var} (\theta \mid Z_1, \cdots, Z_n),\end{equation*} the posterior variance of $\theta$, and \begin{equation*}\tag{1.2}X_n(c) = Y_n + nc.\end{equation*} The problem is then to find a stopping time $s(c)$ such that $E(X_{s(c)}(c)) = \inf \{E(X_t(c)) : t \in T\}$, where $T$ is the set of all stopping times. In general, although $s(c)$ can usually be shown to exist, finding it in explicit form is difficult. In [2] we proposed the following stopping time $\tilde{t}(c)$ for this problem: "Stop as soon as $Y_n \leqq c(n + 1)$." We showed in [2] (generalized in [3]) that under some regularity conditions this rule is asymptotically pointwise optimal (A.P.O.), i.e., \begin{equation*}\tag{1.3}\lim_{c\rightarrow 0} X_{\tilde{t}(c)}(c)\lbrack X(c)\rbrack^{-1} = 1 \quad \operatorname{a.s.}\end{equation*} where \begin{equation*}\tag{1.4}X(c) = \inf_n X_n(c).\end{equation*} In fact, we proved that \begin{equation*}\tag{1.5}X_{\tilde{t}(c)}(c) = 2c^{\frac{1}{2}}V^{\frac{1}{2}}(\theta) + o(c^{\frac{1}{2}}) \quad \operatorname{a.s.}\end{equation*} and \begin{equation*}\tag{1.6}X_{\tilde{t}(c)}(c) - X(c) = o(c^{\frac{1}{2}}) \quad \operatorname{a.s.}\end{equation*} where $V(\theta)$ is the reciprocal of the Fisher information number. Later (in [3]) we showed, under some additional conditions, that $\tilde{t}(c)$ is asymptotically optimal, i.e., that \begin{equation*}\tag{1.7}\lim_{c\rightarrow 0}\lbrack E(X_{s(c)}(c))\rbrack\lbrack E(X_{\tilde{t}(c)}(c))\rbrack^{-1} = 1,\end{equation*} and in fact that \begin{equation*}\tag{1.8}E(X(c)) = 2c^{\frac{1}{2}}E(V^{\frac{1}{2}}(\theta)) + o(c^{\frac{1}{2}})\end{equation*} and \begin{equation*}\tag{1.9}E(X_{\tilde{t}(c)}(c)) - E(X(c)) = o(c^{\frac{1}{2}}).\end{equation*} In this paper we seek to refine the term $o(c^{\frac{1}{2}})$ in (1.5)-(1.6) and (1.8)-(1.9). Our analysis, as in our previous work, is based on the asymptotic properties of $Y_n$. We showed in [2] and [4] that \begin{equation*}\tag{1.10}Y_n = V(\theta) n^{-1} + R_n\end{equation*} where $R_n = o(n^{-1})$ a.s. In [4] we further showed that, under suitable conditions, \begin{equation*}\tag{1.11}Y_n = V(\theta)n^{-1} + S_n(\theta)n^{-2} + R'_n\end{equation*} a.s., where $R'_n = o(n^{-3/2})$ and \begin{equation*}\tag{1.12}S_n(\theta) = \sum^n_{i=1} W_i(\theta),\end{equation*} the $W_i$ being independent and identically distributed with mean $0$ given $\theta$. If $W_1(\theta)$ has a second moment and is nondegenerate, the law of the iterated logarithm enables us to conclude that \begin{equation*}\tag{1.13}R_n = O(n^{-3/2}\lbrack\log\log n\rbrack^{\frac{1}{2}}) \quad \operatorname{a.s.}\end{equation*} This suggests Theorem 2.1, which asserts that if (1.13) holds then \begin{equation*}\tag{1.14} X_{\tilde{t}(c)}(c) - X(c) = O(c^{3/4-\epsilon})\end{equation*} a.s. for all $\epsilon > 0$. The analogues of (1.8) and (1.9) pose greater difficulty.
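The rule "stop as soon as $Y_n \leqq c(n + 1)$" is easy to illustrate numerically. Purely as an illustration (the sketch below is ours, not part of the paper), consider the conjugate normal case treated in Section 5 below: $Z_i \sim N(\theta, \sigma^2)$ with a $N(\mu_0, \sigma_0^2)$ prior, so that $Y_n = (1/\sigma_0^2 + n/\sigma^2)^{-1}$ does not depend on the data and $V(\theta) = \sigma^2$, the reciprocal of the Fisher information. The function name and the parameter values are our own choices.

```python
import math

def apo_stop_normal(c, sigma2, sigma0_2):
    """Apply the rule 'stop as soon as Y_n <= c(n+1)' when Z_i ~ N(theta, sigma2)
    with a N(mu0, sigma0_2) prior, so Y_n = 1/(1/sigma0_2 + n/sigma2) is deterministic.
    Returns the stopping time t~(c) and the attained value X_{t~(c)}(c) = Y_n + n*c."""
    n = 1
    while True:
        y_n = 1.0 / (1.0 / sigma0_2 + n / sigma2)
        if y_n <= c * (n + 1):
            return n, y_n + n * c
        n += 1

sigma2, sigma0_2 = 1.0, 1.0   # V(theta) = sigma2 (reciprocal Fisher information)
for c in [1e-2, 1e-4, 1e-6]:
    n, x = apo_stop_normal(c, sigma2, sigma0_2)
    # the ratio X_{t~(c)}(c) / (2 sqrt(c V)) should tend to 1 as c -> 0, as in (1.5)
    print(c, n, x, x / (2.0 * math.sqrt(c * sigma2)))
```

In this conjugate normal case the stopping time is nonrandom, which is why no simulation of the $Z_i$ is needed; in general the rule uses the data only through the posterior variance $Y_n$.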
In Section 3 we show (Theorems 3.1, 3.2) that \begin{equation*}\tag{1.15}E(X(c) - 2\lbrack V(\theta)c\rbrack^{\frac{1}{2}}) = \max (o(c^{\frac{1}{2}+\delta(\lambda, b)-\epsilon}), O(c)),\end{equation*} for every $\epsilon > 0$, where \begin{equation*}\tag{1.16}\delta(\lambda, b) = \frac{1}{2}(\lambda - 1)b(b + (\lambda - 1))^{-1}\end{equation*} and $b$ and $\lambda$ depend on the problem. (Typically $\lambda = \frac{3}{2}$.) On the other hand, in Section 4 we establish (Theorem 4.1) \begin{equation*}\tag{1.17}E(X_{\tilde{t}(c)}(c) - 2\lbrack V(\theta)c\rbrack^{\frac{1}{2}})^+ = \max (O(c^{\lambda/2}), O(c)),\end{equation*} where again typically $\lambda = \frac{3}{2}$. Finally, in Section 5 we apply our general results to two special situations: (i) estimating the mean of a normal distribution with a normal prior; (ii) estimating $p$ on the basis of binomial trials with a beta prior. In case (i) our conditions yield $O(c)$ in both (1.15) and (1.17), and this is best possible. In case (ii), when for instance we have a uniform prior, the best $\lambda$ is $\frac{3}{2}$ and the best $b$ is $1$, so that $\delta(\frac{3}{2}, 1) = \frac{1}{6}$, and we therefore get $o(c^{2/3-\epsilon})$ for every $\epsilon > 0$ in (1.15) and $O(c^{3/4})$ in (1.17). We do not believe these rates are best possible; a further analysis of (1.11) would seem to be required for anything better.
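Case (ii) can likewise be illustrated by a small simulation (again ours, purely for illustration): under a uniform, i.e. Beta$(1, 1)$, prior the posterior after $n$ Bernoulli$(p)$ trials with $s$ successes is Beta$(1 + s, 1 + n - s)$, so $Y_n$ is its variance, and $V(p) = p(1 - p)$ is the reciprocal of the Fisher information of a single trial.

```python
import math
import random

def apo_stop_bernoulli(c, p, rng):
    """Run Bernoulli(p) trials under a uniform Beta(1,1) prior and apply the rule
    'stop as soon as Y_n <= c(n+1)', where Y_n is the Beta(1+s, 1+n-s) posterior
    variance.  Returns the attained value X_{t~(c)}(c) = Y_n + n*c."""
    n = s = 0
    while True:
        n += 1
        s += int(rng.random() < p)                    # one more Bernoulli(p) observation
        a, b = 1 + s, 1 + n - s                       # posterior Beta(a, b) parameters
        y_n = a * b / ((a + b) ** 2 * (a + b + 1))    # posterior variance of p
        if y_n <= c * (n + 1):
            return y_n + n * c

rng = random.Random(0)
p = 0.3
for c in [1e-3, 1e-5, 1e-7]:
    x = apo_stop_bernoulli(c, p, rng)
    # X_{t~(c)}(c) / (2 sqrt(c p(1-p))) should be close to 1 for small c, as in (1.5)
    print(c, x / (2.0 * math.sqrt(c * p * (1.0 - p))))
```

Such a simulation only illustrates the pointwise statements (1.3) and (1.5); the rates in (1.15) and (1.17) concern the corresponding expectations and are not visible from a single sample path.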
Citation
Peter J. Bickel, Joseph A. Yahav. "On an A.P.O. Rule in Sequential Estimation with Quadratic Loss." Ann. Math. Statist. 40 (2): 417–426, April 1969. https://doi.org/10.1214/aoms/1177697706