## The Annals of Mathematical Statistics

### The General Moment Problem, A Geometric Approach

J. H. B. Kemperman

#### Abstract

Let $g_1, \cdots, g_n$ and $h$ be given real-valued Borel measurable functions on a fixed measurable space $T = (T, \mathscr{A})$. We shall be interested in methods for determining the best upper and lower bounds on the integral $\mu(h) = \int_T h(t)\,\mu(dt)$, given that $\mu$ is a probability measure on $T$ with known moments $\mu(g_j) = y_j$, $j = 1, \cdots, n$. More precisely, denote by $\mathscr{M}^+ = \mathscr{M}^+(T)$ the collection of all probability measures $\mu$ on $T$ such that $\mu(|g_j|) < \infty$ $(j = 1, \cdots, n)$ and $\mu(|h|) < \infty$. For each $y = (y_1, \cdots, y_n) \in R^n$, consider the bounds $L(y) = L(y \mid h) = \inf \mu(h)$, $U(y) = U(y \mid h) = \sup \mu(h)$, where $\mu$ is restricted by $\mu \in \mathscr{M}^+(T)$; $\mu(g_1) = y_1, \cdots, \mu(g_n) = y_n$. If there is no such measure $\mu$ we put $L(y) = +\infty$, $U(y) = -\infty$. In many applications, $h$ is the characteristic function (indicator function) $h = I_S$ of a given measurable subset $S$ of $T$. In that case we usually write instead $L(y \mid I_S) = L_S(y)$, $U(y \mid I_S) = U_S(y)$. Thus, $L_S(y) \leqq \mu(S) \leqq U_S(y)$ are the best possible bounds on the probability mass $\mu(S)$ contained in $S$, given that $\mu \in \mathscr{M}^+$ and that $\mu(g) = y$. Here, $g$ denotes the mapping $g: T \rightarrow R^n$ defined by $g(t) = (g_1(t), \cdots, g_n(t))$. By $g_0$ we shall denote the function on $T$ with $g_0(t) = 1$ for all $t \in T$. The following tentative method for finding $L(y \mid h)$ may be said to go back to Markov [8] and Riesz [13], see [7].
Choose an $(n + 1)$-tuple $d^\ast = (d_0, d_1, \cdots, d_n)$ of real numbers such that $d_0 + d_1g_1(t) + \cdots + d_ng_n(t) \leqq h(t)$ for all $t \in T$, and define $B(d^\ast) = \{z \in R^n: z = g(t) \text{ for some } t \in T \text{ with } \sum^n_{j=0} d_jg_j(t) = h(t)\}$. Then $L(y \mid h) = d_0 + \sum^n_{j=1} d_jy_j$ for each $y \in \operatorname{conv} B(d^\ast)$ ($\operatorname{conv} =$ convex hull). The main purpose of the present paper is to investigate the merits of this method and certain more general methods. It turns out (Theorem 5) that for almost all $y \in R^n$ there exists at most one admissible $d^\ast$ with $y \in \operatorname{conv} B(d^\ast)$. Moreover, provided $y \in \operatorname{int}(V)$ where $V = \operatorname{conv} g(T)$, there exists at least one such $d^\ast$ if and only if there exists a measure $\mu \in \mathscr{M}^+$ with $\mu(g) = y$ and $\mu(h) = L(y \mid h)$. A sufficient condition for the latter would be that $T$ has a compact topology with respect to which $g$ is continuous and $h$ is lower semi-continuous. More interesting is a related method for finding $L(y \mid h)$, see Theorem 6, which will work for each $y \in \operatorname{int}(V)$ as soon as $g$ is bounded. The situation where $y \notin \operatorname{int}(V)$ is discussed in Section 4. It appears that the assumption $y \in \operatorname{int}(V)$ is a rather natural one. We have chosen to develop the important special case $h = I_S$ in a partly independent manner, see Sections 5, 6, and 7. In this case, the $(n + 1)$-tuple $d^\ast$ must satisfy \begin{align*}d_0 + \sum^n_{j=1} d_jz_j &\leqq 1 \quad \text{for all } z \in g(T),\\ &\leqq 0 \quad \text{for all } z \in g(S').\end{align*} Here, $S'$ denotes the complement of $S$ in $T$.
Assuming that $d_1, \cdots, d_n$ are not all zero, let us associate to $d^\ast$ the pair of hyperplanes $H$ and $H'$ with equations $\sum^n_{j=1} d_jz_j = 1 - d_0$ and $\sum^n_{j=1} d_jz_j = -d_0$, respectively. This pair is such that $H, H'$ are distinct parallel hyperplanes with $g(S')$ and $H$ on opposite sides of $H'$ and $g(T)$ and $H'$ on the same side of $H$; such a pair $H, H'$ will be said to be admissible. Observe that $B(d^\ast) = (g(S) \cap H) \cup (g(S') \cap H')$, with $H, H'$ as the admissible pair determined by $d^\ast$. The present $(n + 1)$-tuple $d^\ast$ is useful for determining $L_S(y) = L(y \mid I_S)$, for at least some points $y$, only when both $g(S) \cap H \neq \emptyset$ and $g(S') \cap H' \neq \emptyset$. That is, $H'$ should not only support the set $g(S')$ but even "intersect" it; similarly, $H$ and $g(S)$. Fortunately, one can usually replace "intersect" by "touch". More precisely (Corollary 13), if $H$ and $H'$ form an admissible pair as above then $L_S(y) = d_0 + \sum^n_{j=1} d_jy_j$ for each point $y$ such that both $y \in \operatorname{int}(V)$ and $y \in \operatorname{conv}\lbrack\{H \cap \overline{\operatorname{conv}}\, g(S)\} \cup \{H' \cap \overline{\operatorname{conv}}\, g(S')\}\rbrack$, a bar denoting closure. Provided $g$ is bounded, the latter generalization will yield the value $L_S(y)$ for all relevant $y$, see Theorem 7. Whether or not $g$ is bounded, we have for almost all $y$ that there can be at most one admissible pair of hyperplanes $H$ and $H'$ yielding $L_S(y)$ in the above manner. A detailed discussion of the method at hand may be found in Section 6. The present method is geometrical in the following sense: (i) one only needs to know the sets $g(S)$ and $g(S')$ in $R^n$; (ii) afterwards, one considers all the pairs $H$ and $H'$ of parallel hyperplanes touching $g(S)$ and $g(S')$ in the above manner.
Each such pair yields $L_S(y)$ for certain values $y$; varying the pair $H, H'$ one often obtains the value $L_S(y)$ for all relevant $y \in R^n$. Usually, there are many different regions in $y$-space, each with its own analytic formula for $L_S(y)$. Nevertheless, all these different formulae are derived from one and the same geometrical principle. A number of specific illustrations, all with $n = 2$, are presented in Section 7. They indicate that it is often quite easy to solve the following problem in a geometric manner. Let $X$ be a random variable taking its values in a measurable space $T$, such that $E(g_1(X)) = y_1$, $E(g_2(X)) = y_2$, with $g_1$ and $g_2$ as known real-valued Borel measurable functions on $T$. The problem is to determine the best possible lower bound $L_S(y)$ on $\mathrm{Pr}(X \in S)$ where $S$ is a given Borel measurable subset of $T$.
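
As a rough illustration of the supporting-line idea described in the abstract, the following sketch treats the simplest possible setting: a finite set $T$ and a single moment function ($n = 1$). In that case the pairs $(\mu(g), \mu(h))$ fill out the convex hull of the points $(g(t), h(t))$, so $L(y \mid h)$ is the height of the lower boundary of that hull at $y$, and the supporting line through $y$ supplies the coefficients $(d_0, d_1)$. This is not the paper's procedure, only a small instance of the same principle. The function names `lower_hull` and `lower_bound`, and the example with $T = \{0, \cdots, 10\}$, $g(t) = t$ and $S = \{t \geqq 5\}$, are assumptions made for illustration.

```python
from fractions import Fraction


def lower_hull(points):
    """Lower convex hull of 2-D points, left to right (monotone chain)."""
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2:
            (x1, v1), (x2, v2) = hull[-2], hull[-1]
            # Cross product of (hull[-1] - hull[-2]) and (p - hull[-2]).
            cross = (x2 - x1) * (p[1] - v1) - (v2 - v1) * (p[0] - x1)
            if cross <= 0:      # not a strict counter-clockwise turn
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def lower_bound(T, g, h, y):
    """Return (L, d0, d1, contact) for finite T and one moment (n = 1).

    L is inf mu(h) over probability measures mu on T with mu(g) = y.
    (d0, d1) satisfies d0 + d1*g(t) <= h(t) on T, and `contact` lists
    the points t where equality holds.
    """
    # For each value of g keep the smallest h, so no vertical hull edges occur.
    best = {}
    for t in T:
        z, v = Fraction(g(t)), Fraction(h(t))
        best[z] = min(v, best.get(z, v))
    hull = lower_hull(best.items())

    y = Fraction(y)
    if y < hull[0][0] or y > hull[-1][0]:
        return float("inf"), None, None, []   # no measure has mu(g) = y

    # Default covers the degenerate case where g is constant on T.
    d0, d1 = hull[0][1], Fraction(0)
    for (x1, v1), (x2, v2) in zip(hull, hull[1:]):
        if x1 <= y <= x2:                      # hull edge lying below y
            d1 = (v2 - v1) / (x2 - x1)
            d0 = v1 - d1 * x1
            break
    contact = [t for t in T if d0 + d1 * g(t) == h(t)]
    return d0 + d1 * y, d0, d1, contact


# Illustrative example: X takes values in {0, ..., 10}, E(X) = 7, S = {t >= 5}.
T = list(range(11))
L, d0, d1, contact = lower_bound(T, lambda t: t, lambda t: 1 if t >= 5 else 0, 7)
print(L, d0, d1, contact)   # 1/2 -2/3 1/6 [4, 10]
```

In this example the bound $\mathrm{Pr}(X \geqq 5) \geqq 1/2$ is attained by the measure putting mass $1/2$ on each of the contact points $t = 4$ and $t = 10$, which has mean $7$. Exact rational arithmetic is used so that the equality test defining the contact set is reliable.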

#### Article information

Source
Ann. Math. Statist., Volume 39, Number 1 (1968), 93-122.

Dates
First available in Project Euclid: 27 April 2007

https://projecteuclid.org/euclid.aoms/1177698508

Digital Object Identifier
doi:10.1214/aoms/1177698508

Mathematical Reviews number (MathSciNet)
MR247645

Zentralblatt MATH identifier
0162.49501
