Many statistical problems contain an infinite parameter space or are analyzed as if they contained one. In a Bayesian analysis of such a problem, there is often, for one or more of several reasons, an attraction in employing an infinite measure on parameter space in the role of "prior distribution" of the parameter. The employment of such a quasi prior distribution consists in its formal substitution as the prior density in Bayes's theorem to produce a quasi posterior distribution. (We will qualify the "posterior distribution" obtained in this way as quasi, even if it is a probability distribution with integral one, as will be assumed for the rest of this paper.) The attractions referred to are that the quasi prior distribution (i) may be thought to represent "ignorance" about the parameter; (ii) may give (quasi) posterior distributions satisfying some "natural" invariance requirement; (iii) may itself satisfy some "natural" invariance requirement (the Jeffreys invariants); (iv) may give (quasi) posterior distributions from which statistical statements closely resembling those of classical statistics can be constructed. [A separate argument for the quasi prior distribution is that, for an infinite parameter space, the class of Bayes decision functions may be complete only if it includes those derived from quasi prior distributions (e.g. Sacks (1963)); in this paper, however, we go no further than the consideration of posterior distributions.] In the foundations of Bayesian statistics associated with the names of Ramsey, de Finetti and Savage (but not Jeffreys), quasi prior distributions do not appear. When finally arrived at, subjective prior distributions are finite measures. Moreover, they are, for any given person, uniquely determined, so that there should be no question of choice. Hence matters such as the representation of ignorance, invariance and the degree of resemblance to classical statistics are not relevant.
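As a minimal sketch of the formal substitution into Bayes's theorem, the Python below assumes a single observation x from a normal distribution with known standard deviation sigma and takes Lebesgue measure on the real line (a constant density) as the quasi prior. The function names and grid settings are illustrative assumptions, not taken from the paper. Although the quasi prior has infinite mass, its product with the likelihood is integrable, so normalising yields a proper quasi posterior, here N(x, sigma^2).

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def flat_quasi_prior(theta):
    """Lebesgue measure on the real line: constant density, infinite total mass."""
    return 1.0


def quasi_posterior(x, sigma, prior=flat_quasi_prior, half_width=12.0, n=4001):
    """Formally substitute `prior` into Bayes's theorem and normalise numerically.

    Returns (grid, density) on theta in x +/- half_width * sigma. Restricting to
    this grid is only a numerical device: the likelihood carries essentially all
    of its mass there, so the normalising constant is finite and accurate.
    """
    lo = x - half_width * sigma
    step = 2.0 * half_width * sigma / (n - 1)
    grid = [lo + i * step for i in range(n)]
    # likelihood of theta for one N(theta, sigma^2) observation, times the prior
    unnorm = [normal_pdf(x, theta, sigma) * prior(theta) for theta in grid]
    # trapezoidal rule for the normalising constant
    total = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return grid, [u / total for u in unnorm]


grid, dens = quasi_posterior(1.5, 2.0)
centre = len(grid) // 2
print(round(grid[centre], 6), round(dens[centre], 5))
```

Passing a different `prior` function shows how any quasi prior density would enter the same formal calculation, provided the product with the likelihood remains integrable.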
But it is possible to accept this standpoint and then to argue that the consequences of using quasi prior distributions are worth investigating, if only as convenient approximations in some sense. As Welch [(1958), p. 778] reveals, such an attitude must have been implicitly adopted by those nineteenth-century followers of Bayes and Laplace who ascribed a probability content to the interval between the probable error limits of some astronomical or geodetic observation. (With a normal distribution of known variance taken for the observations, the implicit quasi prior distribution was, of course, uniform on the real line.) One sense of the "approximation" is straightforward. Imagine fixed data. A quasi prior distribution may be a satisfactory approximation to the actual prior distribution for this set of data if it locally resembles or simulates the actual distribution over some important compact set of parameter points determined by the data; the fact that the quasi prior distribution is not integrable on the complement of this compact set may then be unimportant. For example, if, in sampling from a normal distribution, the quasi prior distribution resembles the actual prior distribution in the region where the likelihood function is not close to zero, the approximation may be satisfactory. Thus, for given data, actual prior distribution and criterion of satisfactory approximation, deciding whether a certain quasi prior distribution is employable is theoretically straightforward: the answer rests on a direct comparison of the actual posterior and quasi posterior distributions. Furthermore, even without knowledge of the data, a calculation of the prior probability (evaluated by the actual prior distribution) of obtaining data for which the quasi prior distribution is employable will provide the necessary prospective analysis. However, when no actual prior distribution is given, two courses of justification of a given quasi prior distribution are available.
One, suggested by a referee, would consist of demonstrating that, for each member of a wide class of proper (and possibly actual) prior distributions, the corresponding posterior distribution is (with high probability) satisfactorily close to the quasi posterior distribution. The other, which formally avoids any decision as to when two distributions are satisfactorily close to each other, is asymptotic and asks: "Does there exist a sequence of proper prior distributions such that, as we proceed down the sequence, the posterior distributions converge in some sense to the quasi posterior distribution?" Jeffreys [(1957), p. 68] and Wallace [(1959), p. 873] have adopted the latter course. Wallace shows without difficulty that, roughly speaking, given a quasi prior distribution, there exists a sequence of proper prior densities whose corresponding posterior densities tend to the quasi posterior density for each fixed set of data. If we reintroduce the concept of satisfactory approximation, the existence of this type of convergence, which may be called Jeffreys-Wallace convergence, assures us that, for all reasonable criteria of approximation and given the data, some member of the constructed sequence of prior distributions has a posterior distribution that is satisfactorily approximated by the quasi posterior distribution. Since this particular prior distribution could be the actual prior distribution of some experimenter, a certain justification of the quasi posterior distribution is thereby provided. This justification can be made separately for each set of data, thereby yielding an apparently prospective justification of the quasi prior distribution itself (that is, a justification of its use for all sets of data).
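Jeffreys-Wallace convergence can be seen in the simplest case under assumptions chosen here for illustration: a single N(theta, sigma^2) observation, the flat quasi prior, and proper priors obtained by truncating it to [-k, k], the kind of compact truncation to which the paper restricts attention. The posterior is then the quasi posterior N(x, sigma^2) restricted to [-k, k] and renormalised, so its total variation distance from the quasi posterior is 1 - Z, where Z is the quasi posterior mass of [-k, k]. For fixed x this tends to 0 as k grows, but for every k the data x = k leave a distance of about 1/2, which illustrates how the convergence can fail to be uniform in the data. The function name is hypothetical.

```python
import math


def normal_cdf(z, mean=0.0, sd=1.0):
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))


def tv_to_quasi_posterior(x, sigma, k):
    """Total variation distance between the posterior under the uniform prior on
    [-k, k] and the quasi posterior N(x, sigma^2) from the flat quasi prior.

    The truncated posterior is q / Z on [-k, k] (q = quasi posterior density,
    Z = its mass on [-k, k]), so half the L1 distance is
    ((1/Z - 1) * Z + (1 - Z)) / 2 = 1 - Z.
    """
    z_mass = normal_cdf(k, x, sigma) - normal_cdf(-k, x, sigma)
    return 1.0 - z_mass


# Fixed data x = 1: the distance shrinks as k grows (Jeffreys-Wallace convergence).
for k in (2.0, 3.0, 5.0, 50.0):
    print(k, tv_to_quasi_posterior(1.0, 1.0, k))

# Data at the truncation point x = k: the distance stays near 1/2 for every k.
for k in (10.0, 100.0, 1000.0):
    print(k, tv_to_quasi_posterior(k, 1.0, k))
```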
However, it is clear that the justification is essentially retrospective, since the prior selected may depend on the data; in other words, the convergence (of posterior distributions to the quasi posterior distribution) may not be uniform with respect to different data. In this paper we will, like Jeffreys and Wallace, adopt the asymptotic course of justification (fearing that the more statistically relevant alternative is exceedingly complex), but our justification will be genuinely prospective. To obtain this prospective asymptotic justification of a given quasi prior distribution, we might impose the condition that the convergence be uniform. However, even where it is possible, such a requirement is stronger than is statistically necessary. All we need is convergence in probability, defined as follows. (The precise definition, applied to a special context, is given later.) There is convergence in probability to the quasi posterior distributions corresponding to a given quasi prior distribution if there exists a sequence of proper prior distributions such that, at each parameter point, the corresponding sequence of posterior densities converges in probability to the quasi posterior density. This use of "converges in probability" is the customary one, with the Bayesian slant that the probability distributions with respect to which it is defined are the marginal distributions of the data corresponding to the sequence of prior distributions. From this definition it is clear that, for all reasonable criteria of approximation and for each $\epsilon > 0$, there will be a member of the postulated sequence of prior distributions such that the prior probability of obtaining data for which the quasi posterior distribution satisfactorily approximates the posterior distribution corresponding to that member exceeds $1 - \epsilon$.
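Under the same illustrative assumptions (one N(theta, sigma^2) observation, priors uniform on [-k, k]), convergence in probability can be checked by simulation: draw theta from the k-th prior, then x from the model, so that x follows the marginal distribution the definition refers to, and record how often the posterior is within delta of the quasi posterior. In this example the posterior density at any theta in [-k, k] is the quasi posterior density divided by Z, so both the pointwise densities and the total variation distance 1 - Z are governed by Z; the sketch tracks 1 - Z. The threshold delta, the sample size, the seed and the function names are arbitrary choices of this sketch.

```python
import math
import random


def normal_cdf(z, mean=0.0, sd=1.0):
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))


def distance(x, sigma, k):
    """1 - Z: total variation distance to the quasi posterior N(x, sigma^2)."""
    return 1.0 - (normal_cdf(k, x, sigma) - normal_cdf(-k, x, sigma))


def prob_close(k, sigma=1.0, delta=0.01, draws=20000, seed=1):
    """Monte Carlo estimate of the prior probability, under the uniform prior on
    [-k, k], of data whose posterior is within delta of the quasi posterior."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        theta = rng.uniform(-k, k)   # parameter from the k-th proper prior
        x = rng.gauss(theta, sigma)  # data from the model, so x follows the marginal
        if distance(x, sigma, k) < delta:
            hits += 1
    return hits / draws


for k in (10.0, 100.0, 1000.0):
    print(k, prob_close(k))
```

The estimated probability rises toward 1 as k grows, because the data sets that give a poor approximation (those near ±k) occupy a fraction of the marginal probability that shrinks roughly like 1/k, even though such data exist for every k.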
Since this member could be the actual prior distribution of some experimenter, a genuinely prospective justification of the given quasi prior distribution is thereby provided. Wallace's theorem shows that there is Jeffreys-Wallace convergence to all quasi prior distributions, so that all quasi prior distributions are asymptotically justified in the Jeffreys-Wallace sense. However, no equivalent theorem is available for convergence in probability, which must be demonstrated separately for each quasi prior distribution considered. This distinction has been drawn previously by Stone (1963), (1964) for data from the normal distribution, both univariate and multivariate. The present work provides a generalisation of the normal case. Two analytical restrictions are made. The first, essential for the results obtained, is that the experiment generating the data should have the property of invariance under a group of transformations. (Following Fraser (1961), the supposed invariance will generally be conditional on an ancillary statistic.) Many statistical problems have this group invariance structure. The second restriction (which may well be inessential) is that, given any quasi prior distribution, only sequences of prior distributions obtained by truncating the quasi prior distribution to compact parameter sets will be considered. In Section 2, the group invariant structure of the experiment is outlined. In Section 3, we follow Hartigan (1964) by introducing relatively invariant prior distributions. In Section 4, convergence in probability in this context is defined, and in Theorems 4.1 and 4.2 it is shown that right Haar measure (as quasi prior distribution) is, under certain conditions, sufficient and necessary (among relatively invariant prior distributions) for convergence in probability. In Section 5, general statistical applications are considered.
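To make "right Haar measure" and "relatively invariant" concrete, the sketch below uses a familiar example that is an assumption here, not the paper's general setting: the location-scale group of maps x ↦ a + b x with b > 0, where (a, b)(a', b') = (a + b a', b b'). A measure with density f is invariant under a translation T when f(T(h)) |det J_T(h)| = f(h). For this group the density 1/b is invariant under right translations h ↦ h g (right Haar measure), 1/b^2 under left translations (left Haar measure), and any power b^(-r) is only rescaled by a factor depending on g alone, which is one way of seeing relative invariance. The Jacobian is taken by central differences (exact here up to rounding, since the maps are affine); all names are hypothetical.

```python
def compose(g, h):
    """Group product g*h for maps x -> a + b*x, so that (g*h)(x) = g(h(x))."""
    a1, b1 = g
    a2, b2 = h
    return (a1 + b1 * a2, b1 * b2)


def jacobian_det(f, point, eps=1e-6):
    """Central-difference Jacobian determinant of a map from R^2 to R^2."""
    a, b = point
    d_da = [(p - m) / (2 * eps) for p, m in zip(f((a + eps, b)), f((a - eps, b)))]
    d_db = [(p - m) / (2 * eps) for p, m in zip(f((a, b + eps)), f((a, b - eps)))]
    return d_da[0] * d_db[1] - d_db[0] * d_da[1]


def invariance_ratio(density, g, h, side):
    """density(T h) * |det J_T(h)| / density(h): equal to 1 at every h for an invariant measure."""
    if side == "right":
        translate = lambda k: compose(k, g)  # h -> h g
    else:
        translate = lambda k: compose(g, k)  # h -> g h
    return density(translate(h)) * abs(jacobian_det(translate, h)) / density(h)


right_haar = lambda p: 1.0 / p[1]       # da db / b
left_haar = lambda p: 1.0 / p[1] ** 2   # da db / b^2

g, h = (0.7, 2.5), (-1.3, 0.4)
print(invariance_ratio(right_haar, g, h, "right"))  # close to 1
print(invariance_ratio(left_haar, g, h, "right"))   # b_g ** (1 - 2), not 1
```

In the normal location-scale model this right Haar density corresponds to the familiar quasi prior da db / b on (mean, standard deviation).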
"Right Haar Measure for Convergence in Probability to Quasi Posterior Distributions." Ann. Math. Statist. 36 (2) 440 - 453, April, 1965. https://doi.org/10.1214/aoms/1177700154