The Posterior $t$ Distribution

M. Stone

doi:10.1214/aoms/1177704169

Ann. Math. Statist. 34(2): 568-573 (June, 1963). DOI: 10.1214/aoms/1177704169

Abstract

For the problem of inference about a real parameter $\mu$ on the basis of $n$ independent observations $x_1, \cdots, x_n$ (or $\mathbf{x}$), each distributed as $N(\mu, \sigma^2)$ with $\sigma^2$ "unknown", it is commonly asserted, for example in [2] p. 465, that the Bayesian method is close to other forms of inference (significance tests, confidence and fiducial intervals) since it too may be based on $s_{n-1}(t)$, the probability density function (pdf) of Student's $t$ with $n - 1$ degrees of freedom. The Bayesian role of $s_{n-1}(t)$ is that of the posterior pdf of $t = \lbrack n(n - 1)/S\rbrack^{\frac{1}{2}} (\bar x - \mu)$, where $\bar x = n^{-1} \sum x_i$ and $S = \sum (x_i - \bar x)^2$ are the sufficient statistics for $\mu$ and $\sigma^2$. It results from formal use in Bayes's Theorem of the improper prior pdf for $\mu$ and $\sigma^2$ described by "independence of $\mu$ and $\log \sigma$ and their uniform distributions on $R^1$". More convincing support for $s_{n-1}(t)$ as a posterior pdf could be obtained by detailed examination of the product space of proper (integrable) prior pdfs and $(\bar x, S)$ and the determination of the essential features of the region where replacement of the posterior pdf of $\mu$ by that derived from $s_{n-1}(t)$ does not seriously affect inference about $\mu$. In this note, attention will be confined to prior pdfs in the following class. Let $\omega$ denote the Fisher information $\sigma^{-2}$ and let $I\{ \}$ denote the 0-1 indicator function of a set.
Consider prior pdfs for $\mu$ and $\omega$ drawn from the sequence \begin{equation*}\tag{1.1}p_\alpha(\mu, \omega) \propto \omega^{-1}I\{\mu, \omega\mid\mu_{1\alpha} < \mu < \mu_{2\alpha},\ \omega_{1\alpha} < \omega < \omega_{2\alpha}\}, \qquad \alpha = 1, 2, \cdots.\end{equation*} For each member of this sequence, $\mu$ and $\omega$ are independent while $\mu$ and $\log \omega$ (or $\log \sigma$) have rectangular distributions (from which it is clear that the choice of (1.1) is motivated by the improper prior pdf for $\mu$ and $\sigma^2$ above). The posterior pdf of $\mu$ obtained by combining $p_\alpha(\mu, \omega)$ with the likelihood function $p(\mathbf{x}\mid\mu, \omega) \propto \omega^{\frac{1}{2}n} \exp \lbrack-\frac{1}{2}n\omega(\bar x - \mu)^2 - \frac{1}{2}\omega S\rbrack$ is proportional to \begin{equation*}\int^{\omega_{2\alpha}}_{\omega_{1\alpha}} \omega^{\frac{1}{2}n-1} \exp \lbrack-\tfrac{1}{2}n\omega(\bar x - \mu)^2 - \tfrac{1}{2}\omega S\rbrack\, d\omega\cdot I\{\mu\mid\mu_{1\alpha} < \mu < \mu_{2\alpha}\}\end{equation*} giving, with the change of variable $u = \omega\lbrack 1 + t^2/(n - 1)\rbrack S$, \begin{equation*}\tag{1.2}\begin{aligned}p_\alpha(t \mid \mathbf{x}) \propto{} & s_{n-1}(t) \int^{\lbrack 1+t^2/(n-1)\rbrack S\omega_{2\alpha}}_{\lbrack 1+t^2/(n-1)\rbrack S\omega_{1\alpha}} u^{\frac{1}{2}(n-2)}e^{-\frac{1}{2}u}\,du \\ & \times I\{t\mid\lbrack n(n - 1)/S\rbrack^{\frac{1}{2}}(\bar x - \mu_{2\alpha}) < t < \lbrack n(n - 1)/S\rbrack^{\frac{1}{2}}(\bar x - \mu_{1\alpha})\}.\end{aligned}\end{equation*} To obtain $s_{n-1}(t)$, Jeffreys (p. 68 of [1]) uses a convergence argument which, in our specialisation, would involve taking \begin{equation*}\tag{1.3}\mu_{1\alpha} \rightarrow - \infty, \quad \mu_{2\alpha} \rightarrow \infty, \quad \omega_{1\alpha} \rightarrow 0, \quad \omega_{2\alpha} \rightarrow \infty \quad \text{as } \alpha \rightarrow \infty\end{equation*} as necessary and sufficient conditions for \begin{equation*}\tag{1.4}\lim p_\alpha(t\mid\mathbf{x}) \equiv s_{n-1}(t)\end{equation*} for all values of $\mathbf{x}$.
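The role of the integral in (1.2) can be illustrated numerically. Since $u^{\frac{1}{2}(n-2)}e^{-\frac{1}{2}u}$ is the kernel of a chi-square density with $n$ degrees of freedom, the integral is proportional to a difference of two chi-square distribution function values, and $p_\alpha(t\mid\mathbf{x})$ is close to $s_{n-1}(t)$ wherever that difference is nearly constant in $t$. The following is a minimal sketch in Python under stated assumptions: the function names `chi2_cdf` and `truncation_weight` are hypothetical, the values $n = 5$, $S = 4$ and the bounds $\omega_{1\alpha} = 1/\alpha$, $\omega_{2\alpha} = \alpha$ are illustrative choices not taken from the paper, the series evaluation is intended only for modest $n$, and the indicator on $t$ is ignored (the bounds on $\mu$ are taken to be wide).

```python
import math

def chi2_cdf(v, n):
    """CDF of chi-square with n degrees of freedom, from the power series
    for the regularised lower incomplete gamma function P(n/2, v/2)."""
    a, x = n / 2.0, v / 2.0
    if x <= 0.0:
        return 0.0
    if x > a + 40.0 * math.sqrt(a) + 40.0:  # far upper tail: treat the CDF as 1
        return 1.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def truncation_weight(t, n, S, om1, om2):
    """Integral factor of (1.2) divided by its untruncated value, so the
    result lies in [0, 1]; a value near 1 means truncation of omega is
    immaterial at this t."""
    c = (1.0 + t * t / (n - 1)) * S
    return chi2_cdf(c * om2, n) - chi2_cdf(c * om1, n)

if __name__ == "__main__":
    n, S = 5, 4.0  # illustrative values (assumption), with x held fixed
    for alpha in (2.0, 10.0, 1e3, 1e6):
        w0 = truncation_weight(0.0, n, S, 1.0 / alpha, alpha)
        w3 = truncation_weight(3.0, n, S, 1.0 / alpha, alpha)
        print(f"alpha={alpha:g}  weight(t=0)={w0:.4f}  weight(t=3)={w3:.4f}")
```

With narrow bounds the weight varies noticeably with $t$, so the posterior pdf departs from the Student form; as the bounds widen with $\mathbf{x}$ held fixed, the weight approaches 1 at every $t$, which is the fixed-$\mathbf{x}$ limit in (1.4).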
In (1.4), $\mathbf{x}$ is kept fixed. However, in changing $\alpha$, we are changing the prior distribution used, so that keeping $\mathbf{x}$ fixed has no obvious relevance. To emphasize that a different $\mathbf{x}$ would normally be associated with a different prior pdf, we will, except in the proofs of Section 2, write $\mathbf{x}_\alpha, \bar x_\alpha, S_\alpha, t_\alpha$ for the $\mathbf{x}, \bar x, S, t$ associated with $p_\alpha(\mu, \omega)$. A radically different justification of $s_{n-1}(t)$ is provided as follows. Let us suppose that the person who is to make the inference about $\mu$ has the prior pdf $p_s(\mu, \omega)$ for $s$ some positive integer, that is, a pdf that happens to be a member of the sequence (1.1). Examination of (1.2) shows that he can take $s_{n-1}(t_s)$ as a good approximation to his posterior pdf provided \begin{equation*}\tag{1.5}S_s\omega_{1s} \ll 1, \quad S_s\omega_{2s} \gg 1, \quad S^{-\frac{1}{2}}_s(\mu_{2s} - \bar x_s) \gg 1, \quad S^{-\frac{1}{2}}_s(\bar x_s - \mu_{1s}) \gg 1.\end{equation*} Now a person holding the prior pdf $p_s(\mu, \omega)$ would expect to obtain $\mathbf{x}_s$'s according to the marginal pdf $p_s(\mathbf{x}_s) = \int \int p(\mathbf{x}_s\mid\mu, \omega)p_s(\mu, \omega)\, d\mu\, d\omega$. The probability of (1.5) under $p_s(\mathbf{x}_s)$ is therefore the person's prior probability of being able to use $s_{n-1}(t_s)$ as a basis for inference about $\mu$.
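The prior probability of (1.5) can be estimated by simulation from the marginal pdf $p_s(\mathbf{x}_s)$: draw $\mu$ uniformly and $\log\omega$ uniformly from the prior, then draw the sufficient statistics, using the standard facts that $\bar x \sim N(\mu, (n\omega)^{-1})$ and $\omega S \sim \chi^2_{n-1}$ independently given $(\mu, \omega)$. The sketch below is illustrative only. The function name `prob_conditions` is hypothetical, and the reading of "$\ll 1$" as "$< 1/k$" and "$\gg 1$" as "$> k$" with $k = 10$ is an assumed convention, not something the paper specifies.

```python
import math
import random

def prob_conditions(n, mu1, mu2, om1, om2, k=10.0, draws=20000, seed=0):
    """Monte Carlo estimate of the prior probability of (1.5), reading
    'a << 1' as a < 1/k and 'a >> 1' as a > k (an assumed convention)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Draw (mu, omega) from the prior (1.1): mu uniform, log(omega) uniform.
        mu = rng.uniform(mu1, mu2)
        omega = math.exp(rng.uniform(math.log(om1), math.log(om2)))
        # Draw the sufficient statistics given (mu, omega).
        xbar = rng.gauss(mu, 1.0 / math.sqrt(n * omega))
        S = rng.gammavariate((n - 1) / 2.0, 2.0) / omega  # omega * S ~ chi-square(n-1)
        root_s = math.sqrt(S)
        if (S * om1 < 1.0 / k and S * om2 > k
                and (mu2 - xbar) / root_s > k
                and (xbar - mu1) / root_s > k):
            hits += 1
    return hits / draws

if __name__ == "__main__":
    n = 5  # illustrative sample size (assumption)
    for a in (1e1, 1e3, 1e6, 1e12):
        p = prob_conditions(n, -a, a, 1.0 / a, a)
        print(f"alpha={a:g}  estimated P(1.5) = {p:.3f}")
```

For the illustrative sequence $\mu_{1\alpha} = -\alpha$, $\mu_{2\alpha} = \alpha$, $\omega_{1\alpha} = \alpha^{-1}$, $\omega_{2\alpha} = \alpha$ used in the demo, the estimated probability increases with $\alpha$, though only slowly, because the prior on $\log\omega$ spreads over a range that grows like $\log\alpha$.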
In the light of this, if, for the sequence (1.1), we were to have \begin{equation*}\tag{1.6}\begin{aligned}&\operatorname{plim} S_\alpha\omega_{1\alpha} = 0, \quad \operatorname{plim} S_\alpha\omega_{2\alpha} = \infty, \\ &\operatorname{plim} S^{-\frac{1}{2}}_\alpha(\mu_{2\alpha} - \bar x_\alpha) = \infty, \quad \operatorname{plim} S^{-\frac{1}{2}}_\alpha(\bar x_\alpha - \mu_{1\alpha}) = \infty\end{aligned}\end{equation*} with the $\operatorname{plim}$ evaluated with respect to the sequence of marginal distributions $p_\alpha(\mathbf{x}_\alpha)$, we would, by proceeding down the sequence, be able to invest $s_{n-1}(t)$ with an asymptotic justification. (By $\operatorname{plim} z = \infty$, we mean that $\lim \operatorname{Prob} (z < K) = 0$ for all $K$.) In Lemma 1, the necessary and sufficient conditions for (1.6) are shown to be \begin{equation*}\tag{1.7}\begin{aligned}&\mathrm{(a)}\ \rho_{2\alpha}/\rho_{1\alpha}\rightarrow \infty, \\ &\mathrm{(b)}\ \rho_{2\alpha} \rightarrow \infty, \\ &\mathrm{(c)}\ \lim \inf \lbrack\log \rho_{1\alpha}/\log \rho_{2\alpha}\rbrack \geqq 0 \quad \text{as} \quad \alpha \rightarrow \infty.\end{aligned}\end{equation*} Lemma 2 then shows that (1.6) is equivalent to \begin{equation*}\tag{1.8}\operatorname{plim} p_\alpha(t\mid\mathbf{x}_\alpha) \equiv s_{n-1}(t)\end{equation*} where the $\operatorname{plim}$ is again evaluated with respect to the sequence $p_\alpha(\mathbf{x}_\alpha), \alpha \rightarrow \infty$. Hence (1.7) is necessary and sufficient for (1.8) which, since it allows direct comparison with the Jeffreys approach in (1.3) and (1.4), we state as the principal theorem. The interpretation of the conditions (1.3) is superficially straightforward; it is that the prior pdfs for $\mu$ and $\omega$ should (separately) approach conditions representing "complete ignorance". (1.7) is apparently more complex. In the requirement $\rho_{2\alpha}/\rho_{1\alpha} \rightarrow \infty$, it agrees with (1.3); its principal divergence from (1.3) lies in the existence of the joint conditions, (b) and (c), on the developments of the prior pdfs of $\mu$ and $\omega$.
$\rho_{1\alpha}$ and $\rho_{2\alpha}$ may be regarded as measures of the information about $\mu$ in the least and most informative conditional distribution $p(\mathbf{x}\mid\mu, \omega)$ allowed by $p_\alpha(\mu, \omega)$, relative to the prior information about $\mu$ measured by the quantity $(\mu_{2\alpha} - \mu_{1\alpha})^{-1}$. (1.7) (c) requires that, although there is no necessity for $\rho_{1\alpha}$ to approach zero at all, if it does so, it should not do so too rapidly; that is, loosely speaking, the least informative conditional distribution should not be too uninformative. For the case $\mu_{1\alpha} = -\alpha, \mu_{2\alpha} = \alpha, \omega_{1\alpha} = \alpha^\lambda, \omega_{2\alpha} = \alpha$, (1.3) requires $-\infty < \lambda < 0$, while (1.7) requires $-2 \leqq \lambda < 1$. The case $\mu_{1\alpha} = -1, \mu_{2\alpha} = 1, \omega_{1\alpha} = 1, \omega_{2\alpha} = \alpha$ satisfies (1.7) but not (1.3). The comparison of (1.3) and (1.7) is assisted by noting that $t_\alpha$ is invariant with respect to the simultaneous transformations of $x$ and $\mu$: $x \rightarrow a_\alpha x + b_\alpha$, $\mu \rightarrow a_\alpha\mu + b_\alpha$. We would therefore expect that any reasonable condition on the sequence (1.1) for the asymptotic relevance of $s_{n-1}(t)$ would be unaffected by these transformations, when coupled with $\omega \rightarrow a^{-2}_\alpha\omega$. (1.7) agrees with such expectation while (1.3) does not.
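The two examples can be checked by direct substitution. The abstract as reproduced here does not state the definition of $\rho_{1\alpha}$ and $\rho_{2\alpha}$, so the working below assumes $\rho_{i\alpha} = \omega_{i\alpha}(\mu_{2\alpha} - \mu_{1\alpha})^2$ for $i = 1, 2$. This form is unchanged by the transformations $\mu \rightarrow a_\alpha\mu + b_\alpha$, $\omega \rightarrow a^{-2}_\alpha\omega$ and reproduces both stated ranges, but it should be read as a plausible reconstruction rather than as the paper's own definition. For the case $\mu_{1\alpha} = -\alpha, \mu_{2\alpha} = \alpha, \omega_{1\alpha} = \alpha^\lambda, \omega_{2\alpha} = \alpha$, \begin{equation*}\rho_{1\alpha} = 4\alpha^{\lambda + 2}, \quad \rho_{2\alpha} = 4\alpha^{3}, \quad \rho_{2\alpha}/\rho_{1\alpha} = \alpha^{1 - \lambda}, \quad \log \rho_{1\alpha}/\log \rho_{2\alpha} \rightarrow (\lambda + 2)/3,\end{equation*} so that (a) holds if and only if $\lambda < 1$, (b) always holds, and (c) holds if and only if $\lambda \geqq -2$, which together give $-2 \leqq \lambda < 1$. Under (1.3), by contrast, $\omega_{1\alpha} = \alpha^\lambda \rightarrow 0$ forces $\lambda < 0$, with no lower limit on $\lambda$. For the case $\mu_{1\alpha} = -1, \mu_{2\alpha} = 1, \omega_{1\alpha} = 1, \omega_{2\alpha} = \alpha$, \begin{equation*}\rho_{1\alpha} = 4, \quad \rho_{2\alpha} = 4\alpha, \quad \rho_{2\alpha}/\rho_{1\alpha} = \alpha \rightarrow \infty, \quad \log \rho_{1\alpha}/\log \rho_{2\alpha} = \log 4/\log (4\alpha) \rightarrow 0,\end{equation*} so that (a), (b) and (c) all hold, while (1.3) fails because $\mu_{1\alpha}$, $\mu_{2\alpha}$ and $\omega_{1\alpha}$ do not move.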