Recycling physical random numbers

Physical random numbers are not as widely used in Monte Carlo integration as pseudo-random numbers are. They are inconvenient for many reasons. If we want to generate them on the fly, then they may be slow. When we want reproducible results from them, we need a lot of storage. This paper shows that we may construct N = n(n − 1)/2 pairwise independent random vectors from n independent ones, by summing them modulo 1 in pairs. As a consequence, the storage and speed problems of physical random numbers can be greatly mitigated. The new vectors lead to Monte Carlo averages with the same mean and variance as if we had used N independent vectors. The asymptotic distribution of the sample mean has a surprising feature: it is always symmetric, but never Gaussian. This follows by writing the sample mean as a degenerate U -statistic whose kernel is a left-circulant matrix. Because of the symmetry, a small number B of replicates can be used to get confidence intervals based on the central limit theorem.


Introduction
When it comes to Monte Carlo simulation, physically based random numbers are the poor cousin of pseudo-random numbers.Most physical random number generators have comparatively cumbersome interfaces.Some of them are slow, although http://true-random.com/ is fast.But all of them have problems with reproducibility.To reproduce a simulation with physical random numbers, we would need to store them, and truly random numbers cannot be compressed.See L'Ecuyer (2009) for a survey of random and pseudo-random number generation.Because of these well-known shortcomings, the great majority of simulations take place with pseudo-random numbers.Physical random numbers do have their place however.They are still used in a small percentage of Monte Carlo applications, and there is a market for devices that produce them.For example, when we are concerned that a flaw in the pseudo-random number generator might interact with a feature of the problem, we can replace the pseudo-random numbers by physically random ones and rerun the example.
This article investigates a strategy to mitigate the storage disadvantage of physical random numbers, by summing pairs of (vectors of) uniform random numbers modulo 1.In this way, n physical random inputs can be used to get answers comparable to what we would get from N = n(n − 1)/2 independent random inputs.The CPU cost is still n(n − 1)/2 function evaluations, but storage requirements are greatly reduced, and we end up with reproducibility for physical random numbers.
We suppose that the random numbers are being used for Monte Carlo integration, as follows.There is a function f defined on [0, 1) d , and we seek to approximate the integral µ = [0,1) d f (x) dx.We will assume that f (x) ∈ R. Extensions to vector valued f are straightforward.As written, µ = E(f (X)) for X ∼ U[0, 1) d .Many expectations of functions of non-uniform random variables on the unit cube and other domains, can be cast into this framework, by techniques described in Devroye (1986).We assume that σ 2 = Var(f (X)) < ∞.
Forming all pairwise sums of n independent U[0, 1) d random variables and taking their remainder modulo 1, yields Section 2 gives more details about the construction.Section 3 gives basic statistical properties of this method.The combined inputs are pairwise independent from U[0, 1) d .It follows that E( Ȳ ) = µ, Var( Ȳ ) = σ 2 /N , and the usual variance estimate s 2 satisfies E(s 2 ) = σ 2 .The estimate Ȳ is a degenerate U statistic whose asymptotic distribution is that of a weighted sum of centered independent χ 2 (1) random variables.Section 4 makes a small empirical comparison of IID sampling versus pairwise and three-fold combinations.A surprising symmetry turns up in the QQ plots of the examples even for a lognormally distributed f (X).Section 5 shows that this symmetry is not special to the lognormal distribution, but can instead be explained via recent results in the spectra of circulant matrices.Section 6 gives conclusions.

Notation
For U, V ∈ [0, 1), their sum modulo 1 is where z is the greatest integer less than or equal to z from a source of random numbers, we want to obtain all of their pairwise sums modulo 1.A convenient iteration for that purpose is We ordinarily use X i right after it is generated, so we only have to store U 1 , . . ., U n .We will not need an explicit expression for i in terms of r and s, or for r and s in terms of i.
More generally, for 2 m n we can form N m = n m points X 1 , . . ., X Nm by summing all distinct m-tuples of U 1 , . . ., U n , modulo 1.It is easy to generalize (1) to triple and higher order sums.Ordinary IID sampling corresponds to m = 1.

Statistical properties
Here we give basic statistical properties for the pairwise recycled uniform vectors.Proposition 1 shows that the combined Monte Carlo inputs are pairwise independent.Then Proposition 2 shows how this suffices to get the low order moments right.
Proposition 1.For dimension d 1 and n m 1, let U 1 , . . ., U n be IID U[0, 1) d random variables.Suppose that X i for i = 1, . . ., N m comprise all N m = n m distinct sums of the form . Assume that i = j.Then X i contains at least one summand U r that is not used in X j .Without loss of generality suppose that it is U rm(i) .The distribution of The random variables X 1 , . . ., X N are pairwise independent, but for m > 1 they are not generally independent.For example is always a vector of integers.
Pairwise independent random variables satisfy many of the key properties we need in Monte Carlo integration.Suppose that These have the right low order moments as shown next.
Proposition 2. For N 2, let Y 1 , . . ., Y N be pairwise independent random variables with common mean µ and common variance σ Proof.The first part is obvious, by linearity of expectations.The second and third parts follow easily because Proposition 2 shows that we can use N pairwise independent random variables to get unbiased Monte Carlo estimates with the same variance as with N fully independent random variables.Furthermore we can get an unbiased estimate of that variance.
Usually in a Monte Carlo integration problem we ask for more than the moment properties in Proposition 2. To get an asymptotic confidence interval for µ, we want a central limit theorem.Sequences of pairwise independent random variables do not always satisfy a central limit theorem, even when the individual variables are identically distributed and have finite variance.For an extreme counterexample, see Romano and Siegel (1986, Chapter 5) who construct 2 n pairwise independent random bits from n independent ones.
Suppose that U 1 , . . ., where Hoeffding (1948) gives a central limit theorem for U -statistics.In this setting √ n( Ȳ − µ) → N (0, τ 2 ) but here √ nVar( Ȳ ) = σ 2 n/N and so the limit has τ 2 = 0.This U -statistic is degenerate and the central limit theorem for it does not help us set confidence intervals.
The limiting distribution for degenerate U -statistics of order m = 2 is a weighted sum of independent centered chisquares.It uses the eigenvalues λ j of Ψ(•, •) − µ where an eigenvalue-eigenfunction pair (λ, g) satisfies The function g(V ) = 1 is an eigenfunction with eigenvalue 0. By convention we will call this eigenpair (λ 0 , g 0 ).
Notice that the sum in (3) does not include the zeroth eigenvalue.Combining because Var(Z 2 j ) = 2.The degree of nonnormality in the limiting distribution (3) is quite mild.The skewness of a weighted sum of independent χ 2 (1) random variables must be between − √ 8 and √ 8 because √ 8 is the skewness of the χ 2 (1) distribution.The kurtosis of that weighted sum must be between 0 and 12, the kurtosis of χ 2 (1) .Degenerate U -statistics for m 3 do not satisfy a central limit theorem either.See Arcones and Giné (1993).
The lack of a central limit theorem is easily mended.We take B independent replicates of the whole process and average them.Specifically let Then , and E(s 2 ) = σ 2 by the same pairwise independence properties used in Proposition 2. We could also form B separate averages Ȳb and take σ2 = ( n m ) B−1 B b=1 ( Ȳb − Ȳ ) 2 , but this estimate would have only B − 1 degrees of freedom.
Using B replicates we consume only nB uniform random vectors to obtain B n m pairwise independent ones.For m = 2, taking a fixed n > 2000 leads to at least a 1000 fold reduction in the number of independent random vectors that we need to store.For fixed n and N , we have a central limit theorem

Numerical comparison
Here we make a small numerical inspection of random vector recycling.It is convenient to compare the methods with m = 2 and m = 3 using the same value of N .There are only three values of N in the range 10 < N < 10 6 that can be attained as both n2 To make the comparison we use N = 1540.This sample size is small enough to allow many replications.The distribution of Ȳ may be sensitive to that of Y i .Two quite different example distributions are used for Y i .The first is the lognormal distribution, which we get via f (X) = exp(Φ −1 (X)) for X ∈ [0, 1).The second is the uniform distribution which we get via f (X) = X.The mean, variance, skewness, and kurtosis of these two distributions are as follows: The results of 10,000 independent replicated computations of Ȳ from these methods are displayed in Figure 1.Taking m = 1 corresponds to sampling N independent values of Y i .For m = 1, 2, 3 the distribution of Ȳ is nearly normal in the center, but starts to depart in the tails, where it matters most.For m = 2 and 3 the distributions are nearly symmetric while for m = 1 and the log normal distribution, the distribution of Ȳ retains some of the skewness of Y i .As we might expect, the non-normality is more severe with m = 3 than with m = 2. Surprisingly, the non-normality is more severe for Y i ∼ U(0, 1) than for log normal Y i .

QQ plots
Since m = 3 gives greater skewness, and is not covered by the sum of chisquareds result in Theorem 1, we focus on the case m = 2, which should create enough pairwise independent vectors for applications.For m = 2 we group the 10,000 independent replicates into groups of B = 4 and B = 10. Figure 2 shows QQ plots for the averages of these replicates.Even B as small as 10 gives a very nearly normal distribution.
A QQ plot (not shown) for Ȳ from 1,000 independent samples for m = 2 and N = 7140 was similarly symmetric, had slightly lighter tails than that for N = 1540, but was clearly not normal.

Symmetry of Ȳ
In the numerical examples, the QQ-plots of Ȳ appeared to be symmetric, even when the test function was the log-normal inverse CDF exp(Φ −1 (u)).Such symmetry is consistent with the limit distribution (3) only when the eigenvalues, or at least the dominant ones, come in pairs of opposite sign.Symmetry, if it holds generally, is useful because it means that the central limit approximation will be more accurate for a small number B of replicates than it would otherwise be.

QQ plots
To investigate the eigenvalues we first consider a 1 dimensional problem with Ψ(u 1 , u 2 ) = f (u 1 ⊕ u 2 ) − µ for u 1 , u 2 ∈ [0, 1) and µ = 1 0 f (u) du.Without loss of generality, assume that µ = 0 for this section.The d dimensional case will be similar as remarked below.
Let G be a large odd integer and define z j = (j +1/2)/G for j = 0, . . ., G−1.The values z j are from a midpoint rule on [0, 1).We will use the approximation As a result, we study the eigenvalues of Ψ by looking at those of the G × G matrix G −1 Ψ G where is a left-circulant matrix (Davis, 1979).Each row is the previous one shifted left one position with wraparound.The better known right-circulant matrices shift each row to the right, and are thus a subfamily of Toeplitz matrices.
The spectral decomposition of left-circulant matrices was recently found by Karner et al. (2003), giving a more explicit version of results in Davis (1979).We restate one of their results.
Theorem 2 (Theorem 3.6 of Karner et al. (2003)).The eigenvalues of The sums in (4) are real because Ψ G is a symmetric matrix.As a result, the eigenvalues of Ψ G come as pairs of real numbers with opposite signs except for as G → ∞.The limit in (5) holds if a midpoint rule is asymtotically correct for 1 0 f (u) du.It suffices for f to be Riemann integrable, but that is not necessary, as convergence also holds for some unbounded integrands.
For an even number G, the eigenvalues of G −1 Ψ G come in pairs apart from the 0'th and the G/2'th one, which is G −1 G−1 j=0 (−1) j a j → 0. See Karner et al. (2003).
When d > 1, and G is odd, the eigenvalues still come in pairs with opposite sign.The generalization of Ψ G is then a d-fold Kronecker product of left circulant matrices.See van der Mee et al. (2006) for an example using multiindex Toeplitz matrices with a similar Kronecker product structure.The eigenvalues of the Kronecker product are products of the eigenvalues of its matrix factors and so the spectrum is still symmetric apart from the eigenvalue for the constant eigenvector, which approaches [0,1) d f (u) du = 0.

Conclusions
Given n independent vectors U i ∼ U[0, 1) d we can recycle them over and over to make N m = n m pairwise independent random variables X i .The variance of Monte Carlo integration is not adversely affected when we substitute these pairwise independent for genuinely independent ones.The estimate Ȳ is unbiased with the same variance as for independent variables and the customary variance estimate for Ȳ is unbiased.These moment results hold for fixed m as n → ∞.If m increases with n, perhaps m = n/2 , then Propositions 1 and 2 still hold, but Theorem 1 does not apply unless m = 2.Because results for m = 3 appeared worse than for m = 2, it is reasonable to use a small fixed m like m = 2.
To get confidence intervals based on the central limit theorem, it is necessary to average several replicates of the recycled vectors.Due in part to a surprising symmetry in the asymptotic distribution of Ȳ , shown for m = 2, only a small number of replicates are needed.Even m = 2 is enough to generate a very large number N of pairwise independent vectors from a modest number n of independent ones.
Pairing up random vectors works for d dimensional integration, but it is crucial to the analysis that the components X ij for j = 1, . . ., d of each point X i be truly independent.We arranged this by making each U i have d independent components.In particular, nothing in this article is meant to support turning n independent scalar uniform random variables into N pairwise independent scalars before forming vectors of dimension d.
Another route to pairwise independent random vectors is to take U r ind ∼ U[0, 1) d for 1 r n, where n is even, and form n 2 /4 pairs U r ⊕ U s for 1 r n/2 and n/2 < s n.The resulting statistic Ȳ is a generalized U -statistic, and once again, is degenerate.Theorem 1 does not apply to it.From Lemma B of Serfling (1980, Section 5.2.2) we can find that E(( Unless the implied constant is zero when k = 3 a strategy based on generalized U -statistics will also require independent replicates in order to satisfy a central limit theorem.Even if that constant is 0, the Lemma B does not yield a central limit theorem for generalized U statistics.
Pseudorandom numbers have many advantages compared to physical ones.Indeed the simulations in Section 4 were done with pseudo-random numbers.It is also known that some sources of physical random numbers fail tests of randomness such as Marsaglia's diehard battery of tests.But when one wants to use physical random numbers, the problems of large storage needs can be greatly mitigated by pooling the random numbers together into pairwise independent vectors.
Finally, the problem considered here is similar to one that arises in randomized quasi-Monte Carlo.Scrambled digital nets with the random scrambling proposed in Owen (1995) satisfy a central limit theorem (Loh, 2003).The random linear scrambles of Matoušek (1998) are simpler to implement and require much less storage.Random linear permutations have the same first and second order marginal distributions as uniform random permutations, and scrambled nets using them have the same mean and variance as with uniform random permutations.The third and higher order margins of random linear permutations are different from the uniform ones (when more than 3 items are permuted).
Thus they may fail to satisfy a central limit theorem, but no proof or counter example has yet been published.There is some empirical evidence that the distributions could be different: Hong et al. (2003) have found that the two scramblings yield quite different distributions for a related quantity (the mean squared discrepancy) of quadrature points.

Figure 1 :
Figure 1: QQ plots to show the effect of m ∈ {1, 2, 3} in random vector recycling.The distribution √ N ( Ȳ − µ)/σ is plotted versus normal quantiles.All samples had N = 22 3 = 56 2 = 1540 function evaluations.The 45 degree line is given as a reference.Points corresponding to 95% and 99% confidence interval endpoints are plotted on it.

Figure 2 :
Figure 2: QQ plots to show the effect of averaging B ∈ {1, 4, 10} independent replicates of random vector recycling.The recycling scheme averaged 1540 = 56 2 pairwise independent vectors constructed from n = 56 independent ones.Here √ N B( Ȳ − µ)/σ is plotted versus normal quantiles.The 45 degree line is given as a reference.Points corresponding to 95% and 99% confidence interval endpoints are plotted on it.