A rigorous lower confidence bound for the expectation of a positive random variable

Given an IID sample from a positive distribution, we provide a method for constructing rigorous finite sample lower confidence bounds (LCBs) for the expectation of the distribution. The method is based on constructing rigorous confidence regions for the cdf of the distribution. We provide some analysis of the asymptotic behavior of the rigorous LCBs. We apply the method to obtain an LCB for a particular, controversial, empirical data set, where the validity of standard methods has been called into question.


Introduction
In general, given an IID sample from an unknown distribution, no rigorous confidence bounds can be provided for the expectation of the distribution. The possibility of the existence of a very low probability tail with some extreme value which has a large impact on the expectation can never be ruled out, or even made improbable, by any finite sample. For example, with a sample of size $n$, the existence of an atom with probability $n^{-2}$ and an arbitrary value, with magnitude large enough so as to impact the expected value greatly, is not unlikely. Indeed, it may be that an expectation does not even exist, or has infinite magnitude.
The normal practice for estimating expectations is to ignore the possibility of the existence of low probability, extreme value tails, apply estimators with known asymptotic properties, and, in effect, assume that those properties are valid for the given sample size. Alternatively, rigorous confidence bounds, such as the Chebyshev or Chernoff bounds, can be derived when certain moments are assumed to be bounded or when the distribution itself is assumed to lie within a bounded interval.
Here, we use a weaker assumption, namely, that the distribution is over positive numbers only, but aim at deriving only lower confidence bounds rather than a confidence interval. The same argument as above shows that without additional assumptions, no upper confidence bounds for the expectation can be established.
In addition to providing a guaranteed finite sample confidence level for any underlying distribution over the positive reals, the confidence bounds proposed have the pleasing property that they are monotonic in the order statistics of the sample. This eliminates the paradoxical phenomenon, which can occur with a normal-theory LCB, in which a positive outlier in the sample lowers the nominal LCB for the expectation.
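The paradox can be seen in a small numerical sketch. The sample values below are made up for illustration, and the nominal bound is assumed to be the sample mean minus a normal quantile times the standard error; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_theory_lcb(sample, level=0.975):
    """Nominal LCB: sample mean minus z * (sample sd / sqrt(n))."""
    z = NormalDist().inv_cdf(level)
    return mean(sample) - z * stdev(sample) / sqrt(len(sample))


base = [1, 2, 3, 4, 5]            # hypothetical data
with_outlier = [1, 2, 3, 4, 50]   # same data, largest value increased

lcb_base = normal_theory_lcb(base)
lcb_outlier = normal_theory_lcb(with_outlier)

# Raising the largest observation raises the sample mean, but it inflates
# the standard error even more, so the nominal bound drops.
print(lcb_base, lcb_outlier)
```

Increasing the largest observation can only support a larger expectation, yet the nominal bound falls, because the standard deviation grows faster than the mean.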

Setup and theorem
Let $X, X_1, \ldots, X_n$ be IID variables from an unknown distribution over the real numbers, with $P(X \ge 0) = 1$. Let $X_{(1)}, \ldots, X_{(n)}$ be the order statistics of $X_1, \ldots, X_n$. We wish to derive lower confidence bounds for the expectation of $X$: $B = B(X_1, \ldots, X_n)$ such that $P(B > EX) \le \alpha$. We rely on the following theorem, which establishes an LCB for $EX$ as a consequence of simultaneous UCBs for the cdf of $X$ at the sample points.
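Theorem 1. Let $U = (U_{(1)}, \ldots, U_{(n)})$ be the vector of the order statistics of $n$ independent samples from a $U[0, 1]$ distribution. For any vector $u = (u_1, \ldots, u_n) \in [0, 1]^n$, define

$$p_u = P(U_{(1)} \le u_1, \ldots, U_{(n)} \le u_n)$$

and

$$B_u = \sum_{i=1}^{n} (1 - u_i)\,(X_{(i)} - X_{(i-1)}) = \sum_{i=1}^{n} (u_{i+1} - u_i)\,X_{(i)},$$

where $u_{n+1} = 1$ and $X_{(0)} = 0$. Then $B_u$ is a level-$p_u$ LCB for the expectation of $X$, i.e., $P(B_u \le EX) \ge p_u$.
Proof: The second equality follows from rearranging the terms of the sum. We prove that $B_u$, in its first form, is a level-$p_u$ LCB. Let $F^-(x)$ be the left-continuous cdf of the random variable $X$, i.e., $F^-(x) = P(X < x)$. Write $X_i = Q(V_i)$, where $Q(v) = \inf\{x : P(X \le x) \ge v\}$ is the quantile function of $X$ and $V_1, \ldots, V_n$ are IID $U[0, 1]$ variables. Since $Q$ is non-decreasing and $F^-(Q(v)) \le v$ for all $v$, we have

$$F^-(X_{(i)}) \le V_{(i)}, \quad i = 1, \ldots, n.$$

It follows that for any vector $u \in [0, 1]^n$ the probability of the event

$$\{F^-(X_{(i)}) \le u_i,\ i = 1, \ldots, n\}$$

is at least $p_u$. On this event, since $F^-$ is non-decreasing, $F^-(x) \le u_i$ for all $x \le X_{(i)}$, and therefore

$$EX = \int_0^\infty (1 - F^-(x))\,dx \ge \sum_{i=1}^{n} \int_{X_{(i-1)}}^{X_{(i)}} (1 - F^-(x))\,dx \ge \sum_{i=1}^{n} (1 - u_i)\,(X_{(i)} - X_{(i-1)}) = B_u.$$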
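The bound can be sketched in a few lines of code. This is a minimal illustration, assuming the bound takes the form $B_u = \sum_i (1 - u_i)(X_{(i)} - X_{(i-1)}) = \sum_i (u_{i+1} - u_i) X_{(i)}$ with $u_{n+1} = 1$ and $X_{(0)} = 0$; the function name and the sample values are hypothetical.

```python
def lcb_two_forms(sample, u):
    """Evaluate B_u in both algebraic forms for a non-negative sample.

    u[j] is the assumed upper bound on the cdf just below the (j+1)-th
    order statistic; the conventions u_{n+1} = 1 and X_(0) = 0 are applied.
    """
    xs = sorted(sample)
    n = len(xs)
    if len(u) != n:
        raise ValueError("u must have one entry per observation")
    if xs and xs[0] < 0:
        raise ValueError("the sample must be non-negative")
    u_ext = list(u) + [1.0]   # appends u_{n+1} = 1
    x_ext = [0.0] + xs        # prepends X_(0) = 0
    # Form 1: horizontal strips between consecutive order statistics.
    layered = sum((1.0 - u[j]) * (x_ext[j + 1] - x_ext[j]) for j in range(n))
    # Form 2: a weighted sum of the order statistics.
    weighted = sum((u_ext[j + 1] - u_ext[j]) * xs[j] for j in range(n))
    return layered, weighted


layered, weighted = lcb_two_forms([3.0, 1.0, 2.0], [0.5, 0.75, 1.0])
print(layered, weighted)
```

The second form makes the monotonicity in the order statistics visible: when $u$ is non-decreasing, every weight $u_{i+1} - u_i$ is non-negative, so raising any observation cannot lower the bound.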

Tunable families of bound parameter vectors
Theorem 1 implies that each vector $u \in [0, 1]^n$ defines a level-$p_u$ LCB, $B_u$, for $EX$. Different choices of the parameter vector $u$ result in different LCBs.
It is convenient to construct families of parameter vectors in such a way that from each family a vector can be chosen to match a desired level of confidence. Let $\Lambda$ be a closed subset of $\mathbb{R}$. Define a tunable family of parameter vectors, $\mathcal{U} = \{u_\lambda \in [0, 1]^n : \lambda \in \Lambda\}$, to be a set of vectors which is parameterized continuously by $\lambda$ and increasing monotonically in each coordinate to 1. That is, for all $i = 1, \ldots, n$, the coordinate $u_{\lambda,i}$ is continuous and increasing in $\lambda$ and tends to 1 as $\lambda$ increases. Then for each $\alpha$, $0 < \alpha < 1$, there exists a unique $\lambda(\alpha)$ such that

$$p_{u_{\lambda(\alpha)}} = 1 - \alpha.$$

Given a desired confidence level $1 - \alpha$, or a set of confidence levels $1 - \alpha_1, 1 - \alpha_2, \ldots, 1 - \alpha_k$, the corresponding $\lambda(\alpha)$, or $\lambda(\alpha_1), \lambda(\alpha_2), \ldots, \lambda(\alpha_k)$, can be determined numerically, to any desired precision, using simulation.
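One way the simulation step might look is sketched below. It assumes that the level of a vector $u$ is $p_u = P(U_{(i)} \le u_i \text{ for all } i)$ for uniform order statistics, estimates it by Monte Carlo, and bisects on $\lambda$. All function names, the draw count, and the tolerance are illustrative choices, not part of the method as stated.

```python
import random


def simulate_uniform_order_stats(n, draws, seed=0):
    """Draw `draws` sorted vectors of n independent U[0, 1] variables."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(n)) for _ in range(draws)]


def estimate_level(u, order_stats):
    """Monte Carlo estimate of P(U_(i) <= u_i for all i)."""
    hits = sum(1 for us in order_stats if all(a <= b for a, b in zip(us, u)))
    return hits / len(order_stats)


def calibrate_lambda(family, n, alpha, lam_lo, lam_hi,
                     draws=20000, tol=1e-4, seed=0):
    """Bisect for the smallest lambda whose estimated level reaches 1 - alpha.

    `family(lam, n)` must return a vector in [0, 1]^n that is non-decreasing
    in lam in every coordinate. The same simulated draws are reused for every
    lambda, so the estimated level is monotone in lambda.
    """
    order_stats = simulate_uniform_order_stats(n, draws, seed)
    target = 1.0 - alpha
    if estimate_level(family(lam_hi, n), order_stats) < target:
        raise ValueError("lam_hi does not reach the target level")
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        if estimate_level(family(mid, n), order_stats) >= target:
            lam_hi = mid
        else:
            lam_lo = mid
    return lam_hi


# Toy check with n = 1 and u = (lam): here p_u = lam exactly, so the
# calibrated value should be close to 1 - alpha.
toy_lambda = calibrate_lambda(lambda lam, n: [min(lam, 1.0)] * n,
                              n=1, alpha=0.1, lam_lo=0.0, lam_hi=1.0)
print(toy_lambda)
```

Reusing one set of simulated draws across all values of $\lambda$ keeps the bisection well behaved; the precision of the result is then limited by the number of draws rather than by the tolerance.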
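Examples. An infinite variety of tunable parameter vector families exist. A very simple family is defined by adding an offset to the vector $\left(\frac{1}{n+1}, \ldots, \frac{n}{n+1}\right)$:

$$u^{OFF}_{\lambda,i} = \min\left(\frac{i}{n+1} + \lambda,\ 1\right), \quad i = 1, \ldots, n.$$

When using this family, the LCB assumes the form

$$LCB_{OFF} = \frac{1}{n+1} \sum_{i=1}^{k-1} X_{(i)} + \left(1 - \lambda - \frac{k}{n+1}\right) X_{(k)},$$

where $k = \max\left\{i \le n : \frac{i}{n+1} + \lambda < 1\right\}$.

Another family results from using confidence bounds for the beta distribution. Let $B(a, b, p)$ be a level-$p$ UCB for the beta distribution with parameters $a$ and $b$. The construction of this family is intuitively motivated by the fact that for continuous IID random variables $X_1, \ldots, X_n$ the marginal distribution of $F(X_{(i)})$ is a beta distribution with parameters $i$ and $n - i + 1$ for all $i = 1, \ldots, n$. Define:

$$u^{BETA}_{\lambda,i} = B(i,\ n - i + 1,\ \lambda), \quad i = 1, \ldots, n.$$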
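A minimal sketch of the offset family follows. It assumes the family is the vector $(i/(n+1))_{i=1}^{n}$ shifted by $\lambda$ and clipped to $[0, 1]$, and that the bound is the weighted sum $\sum_i (u_{i+1} - u_i) X_{(i)}$ with $u_{n+1} = 1$. The function names, the sample, and the value of $\lambda$ are hypothetical; in practice $\lambda$ would be calibrated to the desired confidence level.

```python
def offset_family(lam, n):
    """Vector (i / (n + 1) + lam), i = 1..n, clipped to the interval [0, 1]."""
    return [min(max(i / (n + 1) + lam, 0.0), 1.0) for i in range(1, n + 1)]


def lcb_offset(sample, lam):
    """Bound from the offset family: sum of (u_{i+1} - u_i) * X_(i)."""
    xs = sorted(sample)
    n = len(xs)
    u = offset_family(lam, n) + [1.0]   # appends u_{n+1} = 1
    return sum((u[j + 1] - u[j]) * xs[j] for j in range(n))


sample = [1.0, 2.0, 3.0, 4.0]           # made-up data
print(lcb_offset(sample, 0.3))

# The largest observations fall where the clipped vector already equals 1,
# so they receive zero weight and cannot pull the bound down.
print(lcb_offset([1.0, 2.0, 3.0, 100.0], 0.3))
```

With this family the bound behaves like a sample mean in which the upper order statistics are given little or no weight, which is the price paid for not assuming anything about the upper tail.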

Some asymptotical analysis
Analysis of the asymptotical behavior of the rigorous LCB is facilitated by the Donsker property of the empirical process (see, for example, [1], chapter 2.1).
The Donsker property implies that for continuous distributions the centered and scaled empirical process converges in distribution to the standard Brownian bridge. The centered and scaled empirical process, $(H_n(t), 0 \le t \le 1)$, is defined as:

$$H_n(t) = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} 1\{F(X_i) \le t\} - t\right).$$

This property can be used directly to calculate the asymptotic behavior of the rigorous bound obtained when using the offset family, $LCB_{OFF}$, as we do below. Asymptotic analysis of the rigorous LCB for other families such as $LCB_{BETA}$ would be more complex.
The distribution of $M$, the supremum of the Brownian bridge, is

$$P(M > m) = e^{-2m^2}, \quad m \ge 0,$$

putting the $1 - \alpha$ quantile of the distribution, $q_\alpha$, at $\sqrt{\frac{1}{2} \log \frac{1}{\alpha}}$. When using the offset family with a sample of size $n$, the member selected for a $1 - \alpha$ LCB will be approximately $\left(\frac{i}{n+1} + \frac{q_\alpha}{\sqrt{n}}\right)_{i=1,\ldots,n}$, and so, with $\lambda_n = q_\alpha n^{-1/2}$,

$$EX - LCB_{OFF} \approx \int_{F^{-1}(1 - \lambda_n)}^{\infty} (1 - F(x))\,dx + \lambda_n F^{-1}(1 - \lambda_n).$$

The first term on the right is the integral of the tail of the distribution. If $EX$ is finite, this term approaches zero as $n$ increases, guaranteeing that the LCB is consistent. However, the convergence may be arbitrarily slow unless some additional assumptions regarding the distribution of $X$ are made. The convergence is $O(n^{-1/2})$ if and only if $X$ is bounded almost surely. Using the Hölder inequality it can be shown that if $EX^{r+\epsilon} < \infty$ for some positive $r$ and $\epsilon$ then the convergence is o(n […] $\mathrm{Var}X$ is infinite. Thus, asymptotically, the normal-theory LCB converges faster than the rigorous LCB whenever $\mathrm{Var}X$ exists (unless $X$ is bounded, in which case both LCBs converge as $O(n^{-1/2})$), but the rigorous LCB guarantees convergence to $EX$ when $\mathrm{Var}X$ is infinite, i.e., in situations in which the normal-theory LCB diverges in expectation.
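The asymptotic offset can be evaluated directly. The sketch below relies on the standard result that the supremum $M$ of a Brownian bridge satisfies $P(M > m) = e^{-2m^2}$, so that the $1 - \alpha$ quantile is $\sqrt{\frac{1}{2}\log\frac{1}{\alpha}}$, and on the approximation that the calibrated offset is that quantile divided by $\sqrt{n}$. The function names are hypothetical, and the values $\alpha = 0.025$ and $n = 47$ are used only as an illustration.

```python
from math import exp, log, sqrt


def bridge_sup_quantile(alpha):
    """1 - alpha quantile of the supremum of a standard Brownian bridge."""
    return sqrt(0.5 * log(1.0 / alpha))


def approximate_offset(alpha, n):
    """Large-sample approximation to the calibrated offset: q_alpha / sqrt(n)."""
    return bridge_sup_quantile(alpha) / sqrt(n)


q = bridge_sup_quantile(0.025)
lam = approximate_offset(0.025, 47)

# Sanity check: the tail probability at q recovers alpha.
print(q, exp(-2.0 * q * q), lam)
```

Being an asymptotic approximation, this value is a starting point for, not a substitute for, the simulation-based calibration at a given finite $n$.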

Application to the Lancet study of mortality in Iraq
The results above provide a method for generating theoretical rigorous lower confidence bounds for the expectation of a positive random variable. These bounds can be applied in situations where conventional methods for producing confidence bounds are challenged on the grounds that their validity relies on asymptotic analysis which may not hold for samples of a given size from a particular distribution.
One such case is the politically sensitive estimate of mortality in Iraq following the U.S.-led invasion. In 2006 a group of researchers from the Johns Hopkins Bloomberg School of Public Health carried out a survey among households in Iraq aimed at estimating mortality [2]. They provided a point estimate of about 601,000 violent deaths in Iraq for the period March 2003 to July 2006 and a 95% confidence interval of 426,000-794,000. Due to the potential political implications of the findings, the study received intense scrutiny. Most of the attention was focused on the various potential biases introduced into the data collection process by a methodology constrained by the conditions in Iraq (see a summary of such points of criticism in [3]). In addition, however, and despite the fact that the estimation procedure used was apparently identical to that used in similar studies, some criticism was made of the estimation procedure itself. Doubts were voiced as to whether the normal-theory 95% confidence interval did indeed have its nominal probability of coverage, and it was suggested that a true 97.5% LCB would be drastically lower than the left endpoint of the interval [4].
We follow here the treatment of Mark van der Laan [4]. He uses a somewhat stylized setup in which the death counts in the 47 clusters in the sample collected by Burnham et al. are assumed to be IID samples from an unknown distribution of violent deaths in household clusters in Iraq. Each cluster contains 40 households, so under van der Laan's setup the unknown mean of the distribution is 40 times the mean number of violent deaths per household in Iraq.
Van der Laan provides the death counts in the 47 clusters. The sample mean is 6.4 and the sample standard deviation is 8.3, giving a classical normal-theory 97.5% LCB for the expectation of the distribution of 4.0, i.e., 63.0% of the sample mean. Employing the method above, we obtain a rigorous LCB for the expectation of 2.3 (36.5% of the sample mean) using the offset family and 2.8 (43.8% of the sample mean) when using the beta family. We therefore note that while the rigorous LCBs constructed here are significantly lower than the nominal normal-theory LCB, they are not dramatically different (reducing the bound by about one third). This suggests that using such a technique can be useful when dealing with certain situations in which the validity of traditional methods may be called into doubt.
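The normal-theory figure can be checked from the reported summary statistics alone. The sketch below assumes the nominal bound is the sample mean minus the 97.5% standard normal quantile times the standard error; whether a normal or a Student-t quantile was used originally is not stated, and the inputs are the rounded values quoted above, so small discrepancies in the last digit are to be expected.

```python
from math import sqrt
from statistics import NormalDist

n = 47             # number of clusters
sample_mean = 6.4  # reported sample mean (rounded)
sample_sd = 8.3    # reported sample standard deviation (rounded)

z = NormalDist().inv_cdf(0.975)          # about 1.96
lcb = sample_mean - z * sample_sd / sqrt(n)
ratio = lcb / sample_mean

print(round(lcb, 1), round(100 * ratio, 1))
```

The result agrees with the reported nominal bound of about 4.0, or roughly 63% of the sample mean.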
Figure 1 demonstrates the construction of the LCBs graphically. It shows the empirical cdf together with lines signifying the boundaries of the confidence regions established for the cdf using the offset family (dotted line) and the beta family (dashed line). The LCBs for the expectation are the areas to the left of and above those two curves.

Further research
One point associated with the method presented that may merit further research regards the choice of tunable family. Are some families better (i.e., do they yield tighter bounds) than others, across all possible distributions? Can families be chosen so as to match various properties of the distribution?
Another avenue of research would be to produce extensions of the method in order to make it applicable to a wider variety of situations. One desirable extension would be to cover cases where the sample is stratified, while another would be to cover cases where random censoring occurs.

Fig 1. The Burnham et al. cluster death counts: empirical cdf and derived confidence regions.