Statistics and Subfields

R. R. Bahadur

doi:10.1214/aoms/1177728493

Ann. Math. Statist. 26(3): 490-497 (September, 1955). DOI: 10.1214/aoms/1177728493

Abstract

Let $(X, S, \mu)$ be a probability measure space: $X$ is a set of points $x$, $S$ is a field of subsets of $X$, and $\mu$ is a countably additive measure on $S$ with $\mu(X) = 1$. A subfield is a field $S_0$ of subsets of $X$ such that $S_0 \subseteq S$, that is, each $S_0$-measurable set is also $S$-measurable. A statistic is a function defined on $X$. There is no a priori restriction on the class of statistics; in particular, statistics are not necessarily real-valued, and a real-valued statistic is not necessarily an $S$-measurable function. For any statistic $f$, let $S_f$ denote the class of all sets which are $S$-measurable and of the form $f^{-1}(B)$, where $B$ is a subset of the range of $f$. The class $S_f$ is clearly a subfield, and is called the subfield induced by $f$. The induced subfield $S_f$ plays a central role in the study of a statistic $f$, for the following reason. The probabilist or mathematical statistician is usually concerned not with the statistic $f$ as such, but with the class of random variables (i.e., real-valued $S$-measurable functions) which depend on $x$ only through $f$. As is easily seen, this class of random variables is exactly the class of real-valued $S_f$-measurable functions. If the given statistic $f$ is itself a random variable (and therefore an object of study in its own right), the argument just given still applies, because in this case $f$ is necessarily an $S_f$-measurable function. This paper discusses certain measure-theoretic problems concerning the relations between subfields, subfields of the apparently special form $S_f$, and statistics. The main problems, as well as the main conclusions, are described in the following paragraphs.
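The claim that the real-valued $S_f$-measurable functions are exactly the random variables depending on $x$ only through $f$ can be checked by brute force on a toy space. The Python sketch below assumes a finite $X$ with $S$ the class of all subsets, so every real function is a random variable and no preimage has to be discarded from $S_f$. The helper names `powerset`, `induced_subfield` and `is_measurable` are hypothetical and serve only as illustration.

```python
from itertools import combinations

def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def induced_subfield(X, f):
    """S_f = {f^{-1}(B) : B a subset of the range of f}.
    Assumes S is the class of all subsets of the finite set X,
    so every preimage is S-measurable."""
    range_f = {f(x) for x in X}
    return {frozenset(x for x in X if f(x) in B) for B in powerset(range_f)}

def is_measurable(phi, X, field):
    """On a finite X, phi is measurable w.r.t. a field iff each
    level set {x : phi(x) = v} belongs to the field."""
    return all(frozenset(x for x in X if phi(x) == v) in field
               for v in {phi(x) for x in X})

X = range(6)
f = lambda x: x % 3                  # a statistic with three values
S_f = induced_subfield(X, f)
print(len(S_f))                                        # 8 = 2**3 sets
print(is_measurable(lambda x: (x % 3) ** 2, X, S_f))   # True: depends on x only through f
print(is_measurable(lambda x: x // 3, X, S_f))         # False: separates points f identifies
```

Here `x // 3` is a perfectly good random variable, but it distinguishes 0 from 3 while `f` does not, so it fails to be $S_f$-measurable.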
Most of the conclusions of the paper are valid only when $(X, S)$ is (or may be taken to be) a euclidean sample space, that is, $X$ is a Borel set of $m$-dimensional euclidean space $(1 \leqq m \leqq \infty)$ and $S$ is the field of Borel sets of $X$. It is assumed henceforth that this is the case under consideration. There are two main problems. The first is whether every subfield is inducible by a statistic. This problem is discussed (in a more general setting) in [2], and the conclusions of the present paper complement those of [2]. It is shown here that every subfield is inducible by a statistic if and only if the sample space is discrete, that is to say, $X$ is a countable set and $S$ is the class of all subsets of $X$ (Theorem 1). This result is, however, not quite relevant to situations where the natural equivalence relation between subfields is not identity but agreement up to sets of $\mu$-measure zero. The equivalence relation referred to is defined as follows. A subfield $S_1$ is a contraction of a subfield $S_2$ if, corresponding to each real-valued $S_1$-measurable function $f_1$, there exists an $S_2$-measurable function $f_2$ such that $f_2(x) = f_1(x)$ except on a set of $\mu$-measure zero; we then write $S_1 \subseteq S_2 \lbrack S, \mu\rbrack$. The subfields $S_1$ and $S_2$ are equivalent if each is a contraction of the other; we then write $S_1 = S_2 \lbrack S, \mu\rbrack$. It is shown that, in fact, corresponding to any subfield $S_0$ there exists an $f$ such that $S_f$ is equivalent to $S_0$, and that this $f$ may be taken to be a random variable (Theorem 2). In the literature the notion of contraction (and the derived notion of equivalence) has been defined for statistics in two ways, which are here called contraction and functional contraction.
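On a finite discrete space, the "if" direction of Theorem 1 has a short constructive illustration. Every member of a subfield of a finite set is a union of the subfield's atoms, where the atom of $x$ is the intersection of all members containing $x$. The statistic that sends each point to its atom therefore induces exactly that subfield. This is a sketch under the assumption that $X$ is finite and $S$ is the class of all subsets. It does not cover countably infinite spaces, the "only if" direction, or Theorem 2, and all function names are hypothetical.

```python
from itertools import combinations

def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def induced_subfield(X, f):
    """S_f on a finite X where every subset is S-measurable."""
    range_f = {f(x) for x in X}
    return {frozenset(x for x in X if f(x) in B) for B in powerset(range_f)}

def field_from_partition(blocks):
    """The field generated by a finite partition: all unions of blocks."""
    return {frozenset().union(*chosen) for chosen in powerset(blocks)}

def atoms(X, field):
    """For each x, intersect all members of the field containing x.
    Assumes the field contains X itself."""
    result = set()
    for x in X:
        atom = frozenset(X)
        for A in field:
            if x in A:
                atom &= A
        result.add(atom)
    return result

def inducing_statistic(X, field):
    """A statistic f with S_f equal to the field: f(x) = the atom containing x."""
    atom_of = {x: a for a in atoms(X, field) for x in a}
    return lambda x: atom_of[x]

X = frozenset(range(6))
S_0 = field_from_partition([frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})])
f = inducing_statistic(X, S_0)
print(len(atoms(X, S_0)))              # 3
print(induced_subfield(X, f) == S_0)   # True: S_0 is induced by f
```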
A statistic $f$ is a contraction of a statistic $g$ if $S_f$ is a contraction of $S_g$ (that is, $S_f \subseteq S_g \lbrack S, \mu\rbrack$); $f$ is a functional contraction of $g$ (written $f \subseteq g \lbrack S, \mu\rbrack$) if there exist a function $h$ on the range of $g$ into that of $f$ and an $S$-measurable set $N$ with $\mu(N) = 0$ such that $f(x) = h(g(x))$ for $x$ in $X - N$. (Cf. [3], [4].) It seems to the writer that for most (possibly all) technical purposes the relevant concept is contraction as just defined (cf. Lemmas 7.1 and 3.2 of [1]). However, functional contraction has simpler interpretations and greater intuitive appeal. The second problem is the exact relation between contraction and functional contraction. It is shown that, in general, functional contraction does not imply contraction (Example 1), and also that contraction does not imply functional contraction (Example 2). If, however, both $f$ and $g$ are random variables, then $S_f \subseteq S_g [S, \mu]$ if and only if $f \subseteq g [S, \mu]$ (Theorem 3). It follows, in particular, that if the sample space is discrete, then contraction coincides with functional contraction. The problems described above arose in connection with the theory of sufficiency, and the results have applications in that theory. It follows, for example (assuming that the sample space is euclidean and that the set of alternative distributions of the sample point is a dominated set), that if $f$ is a necessary and sufficient statistic, then $S_f$ is a necessary and sufficient subfield (Corollary 2). The following are some general conclusions bearing on mathematical models for studies such as [1]. (a) The notion of a subfield, while certainly no less general than that of a statistic, is in fact no more general. (b) There is no loss of generality, or other disadvantage, in defining a statistic to be a random variable.
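On a finite discrete space, both notions can be tested directly. The sketch represents $\mu$ by a hypothetical probability vector `mass`, which may contain zero-mass points.

- **Contraction.** This is checked set by set. Applying the definition to indicator functions shows that every set of $S_f$ must agree with some set of $S_g$ up to a $\mu$-null set. On a finite space this condition is also sufficient, because every real $S_f$-measurable function is a finite combination of such indicators.
- **Functional contraction.** This is checked by discarding the largest null set $\{x : \mu(\{x\}) = 0\}$ and asking whether $f$ is constant on each level set of $g$.

Every real-valued statistic on such a space is a random variable, so Theorem 3 predicts that the two tests agree. By the same theorem, the separating Examples 1 and 2 must involve a statistic that is not a random variable, so they cannot be reproduced here. All function names are hypothetical.

```python
from itertools import combinations

def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def induced_subfield(X, f):
    """S_f on a finite X where every subset is S-measurable."""
    range_f = {f(x) for x in X}
    return {frozenset(x for x in X if f(x) in B) for B in powerset(range_f)}

def is_contraction(X, mass, f, g):
    """S_f ⊆ S_g [S, mu]: each A in S_f equals some B in S_g up to a mu-null set."""
    def mu(A):
        return sum(mass[x] for x in A)
    S_f, S_g = induced_subfield(X, f), induced_subfield(X, g)
    return all(any(mu(A ^ B) == 0 for B in S_g) for A in S_f)

def is_functional_contraction(X, mass, f, g):
    """f ⊆ g [S, mu]: f = h(g) off a null set. On a finite discrete space it
    suffices to discard N0 = {x : mass[x] == 0}, the largest mu-null set."""
    h = {}
    for x in X:
        if mass[x] == 0:
            continue
        if h.setdefault(g(x), f(x)) != f(x):
            return False   # one value of g, two values of f, on positive mass
    return True

X = range(6)
mass = [0.25, 0.25, 0.25, 0.25, 0.0, 0.0]   # points 4 and 5 carry no mass
g = lambda x: x // 2                         # level sets {0,1}, {2,3}, {4,5}
f1 = lambda x: x // 2 if x < 4 else x        # splits {4,5}, which is null
f2 = lambda x: x % 2                         # splits {0,1}, which is not null

print(is_contraction(X, mass, f1, g), is_functional_contraction(X, mass, f1, g))  # True True
print(is_contraction(X, mass, f2, g), is_functional_contraction(X, mass, f2, g))  # False False
```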
On the contrary, admitting nonmeasurable functions to the discussion leads to inconsistencies between extension and functional extension; this seems undesirable. (c) If $f$ is a random variable, it is immaterial whether $f$ is regarded as a statistic or as a Borel-measurable transformation (cf. [1], p. 431). These satisfactory conclusions do not necessarily hold for an arbitrary space $(X, S, \mu)$. An example given in [2] shows that at least (a) and (c) are not valid in the general case.