Sum of Arbitrarily Dependent Random Variables

Electronic Journal of Probability

Abstract. In many classic problems of asymptotic analysis, the scaled average of a sequence of F-distributed random variables converges to a G-distributed limit in some sense of convergence. In this paper, we look at these classic convergence problems from a novel perspective: we aim to characterize all possible limits of the sum of a sequence of random variables under different choices of dependence structure. We show that, under general tail conditions on two given distributions F and G, there always exists a sequence of F-distributed random variables such that the scaled average of the sequence converges to a G-distributed limit almost surely. We construct such a sequence of random variables via a structure of conditional independence. The results in this paper suggest that, with the common marginal distribution fixed and the dependence structure unspecified, the distribution of the sum of a sequence of random variables can asymptotically be of any shape.


Introduction
Consider a convergence problem of the following type:

(1/b_n) Σ_{i=1}^n X_i − a_n → X in some sense of convergence as n → ∞, (1.1)

where X, X_i, i ∈ N are random variables, and a_n, b_n, n ∈ N are real numbers, typically with b_n → ∞. Convergence problems of type (1.1) are arguably the most classic problems in probability, dating back to the first time probability theory was established in Ars Conjectandi. The laws of large numbers and central limit theorems all belong to the type (1.1).
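As a numerical illustration (our own sketch; the Exp(1) marginal and b_n = n are illustrative choices, not from the paper), the strong law of large numbers realizes a problem of type (1.1) with a degenerate limit:

```python
import random

random.seed(0)

# The strong law of large numbers as an instance of (1.1): with X_i iid
# Exp(1), a_n = 0 and b_n = n, the scaled sum (1/b_n) * sum_{i<=n} X_i
# converges a.s. to the degenerate limit X = E[X_1] = 1.
n = 200_000
s = sum(random.expovariate(1.0) for _ in range(n))
print(abs(s / n - 1.0))  # close to 0 for large n
```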
The classic setting for (1.1) consists of three key elements: (i) the marginal distributions of X_i, i ∈ N, typically chosen as identical, say F, with some conditions; (ii) the dependence structure of the sequence; and (iii) the limit X and the sense of convergence. This is a fundamental result in the study of stable distributions, with the Central Limit Theorem as a special case. Moreover, and more importantly, we obtain that if the dependence structure in (ii) is chosen as iid, then the limit in (iii) has to be α-stable no matter how we choose F in (i).
The reader might have noticed that examples (a)-(c) have a lot in common: we fix (i) and (ii), and then obtain (iii). In example (d), more is studied: when the dependence in (ii) is fixed as independence, we can determine which distributions in (iii) are possible limits. This can be interpreted as a compatibility problem of the choices in (ii) and (iii). We remark that if the sum of the sequence in (1.1) is replaced by the maximum of the sequence, it leads to the study of extreme value distributions and their domains of attraction; see, e.g., Chapter 1 of de Haan and Ferreira (2006).
The above discussion motivates us to look at the convergence problem in (1.1) from another perspective: what if (i) and (iii) are given, and (ii) is unknown (a typical setting called dependence uncertainty)? That is, consider the compatibility problem of (i) and (iii). Let us phrase the problem precisely below.
(Q) For univariate distributions F and G, determine whether there exist a sequence of random variables {X_i, i ∈ N}, X_i ∼ F, a random variable X ∼ G, and real sequences a_n, b_n, n ∈ N, such that

S_n/b_n − a_n → X in some sense of convergence as n → ∞, (1.2)

where S_n = Σ_{i=1}^n X_i. For instance, if G is chosen as degenerate, then F can be any distribution with finite α-moment, α > 0, with the dependence structure in (ii) being independence and b_n chosen as max{n^{1/α+ε}, n} for some ε > 0; if G is chosen as a normal distribution, then F can be any distribution with finite second moment, with the dependence structure in (ii) being independence and b_n chosen as b n^{1/2} for some b ∈ R; for any G, F can be chosen as G itself, with the dependence structure in (ii) being comonotonicity, i.e. X_1, X_2, . . . are a.s. identical, and b_n = n.
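The comonotone example in (Q) can be checked directly (G = N(0,1) is our illustrative choice):

```python
import random

random.seed(1)

# The comonotone example from (Q): choose F = G and X_1, X_2, ... a.s.
# identical, with b_n = n.  Then S_n / n equals X_1 for every n, so the
# scaled sum is G-distributed at every n, not just in the limit.
x1 = random.gauss(0.0, 1.0)   # a single draw X_1 ~ G
n = 1000
s_n = sum([x1] * n)           # S_n = n * X_1 under comonotonicity
print(abs(s_n / n - x1))      # zero up to float rounding
```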
Note that in question (Q), only the shapes of the distributions F and G matter for compatibility, not their location or scale. Moreover, note that in all of the above examples (a)-(d), typically only moment conditions or tail conditions on F affect the limit in (iii). Based on this observation, one might naturally expect that the answer to (Q) is generally positive, under conditions only on the moments or tail behavior of F and G. In this paper, we show that this is indeed true.
Question (Q) concerns a fundamental question in multivariate dependence theory. Needless to say, dependence structures are of crucial importance in statistical analysis; great challenges exist in both the modeling and the statistical inference of dependence structures. More specifically, Question (Q) belongs to the area of distributions with fixed margins, that is, the study of the probabilistic properties of a random vector X := (X_1, . . ., X_n) with F_1, . . ., F_n fixed as its marginal distributions, while the dependence structure of X can vary. Tchen (1980) studied inequalities for the joint distribution of X in the case n = 2, which originated from the seminal work of Hoeffding (1940) and Fréchet (1951) in this field. In general, the range of functionals of X with varying dependence structure is still an open question; only limited explicit results are available in the literature. We refer to Joe (1997) for many unsolved problems on distributions with fixed margins. For instance, Chaganty and Joe (2006) studied the range of correlation matrices of X when F_1, . . ., F_n are chosen as Bernoulli distributions; even in that rather simple case, a full characterization of the attainable correlation matrices is far from complete. A collection of research developments in the area of distributions with fixed margins can be found in Dall'Aglio et al. (1991) and Cuadras and Fortiana (2002). Research in this area often overlaps with the study of dependence concepts and copulas; the interested reader is referred to Joe (1997) and Nelsen (2006) for an overview of dependence concepts, copulas and their relation to problems of distributions with fixed margins.
A particularly interesting and important topic in this area, also relevant to this paper, is the probabilistic behavior of S_n := X_1 + · · · + X_n. Makarov (1981) and Rüschendorf (1982) considered the distribution function F_{S_n} of S_n for n = 2 and obtained sharp bounds on F_{S_n}. Later developments of bounds on F_{S_n} for n ≥ 3 can be found in Embrechts and Puccetti (2006), Wang and Wang (2011), and Wang et al. (2013), with no universal solution achieved yet. An overview of this topic and its connection to mass-transportation theory is available in Rüschendorf (2013). Characterizing the set of all possible distributions of S_n is even more challenging and remains open; see Bernard et al. (2014) for instance. The study of question (Q) addresses the set of possible distributions of S_n after scaling, in an asymptotic manner as n → ∞. In this paper, we give an answer to question (Q). In Section 2, we show that a stronger version of (Q), denoted (Q'), holds for all non-degenerate distributions F with finite mean and bounded distributions G, and hence the answer to (Q) is positive in broad generality. The dependence structure for (Q') in this paper is a special construction of conditional independence. In Section 3 we show that conditions on the asymptotic behavior of F and G are required for (Q') to hold when G is unbounded. We obtain a sufficient and necessary condition for (Q') when F is regularly varying, and some sufficient conditions for (Q') for general distributions F and G. In Section 4 we draw our conclusions and highlight some future research directions. We hope that the results in this paper clearly deliver the following message: with the marginal distribution of a sequence given and its dependence structure unspecified, the asymptotic distribution of the sum of the sequence can be of arbitrary shape.
In other words, even when the marginal distributions are accurately specified, a misspecified dependence assumption can lead to completely erroneous asymptotic behavior in a classic problem of statistical analysis.
A direct application of the main results in this paper can be found in quantitative risk management, where statistical inference with dependence uncertainty is an important issue; see McNeil et al. (2005). For instance, in modeling operational risk, individual risks (risks arising from different types of business lines) are typically non-Gaussian; modeling and statistical inference are usually quite reliable for each individual risk. However, there is often not enough joint data to make reliable inference for the (non-Gaussian) dependence structure between individual risks. One may use conservative assumptions on the dependence structure to model risk aggregation (the sum) based on marginal data, and find corresponding quantities for capital requirements or stress-testing purposes. However, it is well known that some commonly used "conservative dependence structures", such as comonotonicity, are not adequate for obtaining a conservative estimate of a quantile (called Value-at-Risk in finance) of S_n; we refer to Embrechts et al. (2013) and Embrechts et al. (2014) for detailed illustrations of this topic. The results in this paper show that such conservative modeling of risk aggregation has to consider all possible shapes of the distribution of the sum when the number of individual risks is large. For the purpose of the current paper, we focus on the theoretical aspects of the underlying problem; applications are left for future work.
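The failure of comonotonicity to be conservative for quantiles can be seen in a small Monte Carlo sketch (entirely our own illustration; the infinite-mean Pareto marginal, sample size, and level are illustrative choices, not from the paper): for very heavy tails, the quantile of an independent sum exceeds the comonotonic one.

```python
import random

random.seed(2)

# Two risks with P(X > x) = x^{-1/2}, x >= 1 (infinite mean).
# Comonotonic Value-at-Risk is additive, but for such heavy tails the
# independent sum has a *larger* quantile, so comonotonicity does not
# give a conservative estimate.
n, q = 400_000, 0.99

def pareto():
    return random.random() ** (-2.0)          # inverse-cdf sampling

var_single = sorted(pareto() for _ in range(n))[int(q * n)]
var_comonotone = 2.0 * var_single             # S = 2 X under comonotonicity
var_independent = sorted(pareto() + pareto() for _ in range(n))[int(q * n)]
print(var_independent > var_comonotone)       # True for this tail
```

The exact 99% quantile of a single risk here is (1 − q)^{−2} = 10000, so the comonotonic sum gives about 20000, while the independent sum's quantile is roughly twice that.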
Throughout the paper, we consider a standard atomless probability space (Ω, A, P).
In a standard atomless probability space, there exist random vectors with any distribution. We denote by F the set of all univariate distributions on R, and by F_p the set of univariate distributions on R with finite p-th moment, p ∈ (0, ∞]. For simplicity, for any distribution F, we write X_F for an F-distributed random variable. We denote the pseudo-inverse of any non-decreasing function F by F^{-1}, that is, F^{-1}(t) = inf{x ∈ R : F(x) ≥ t}, t ∈ (0, 1]. We also denote the essential supremum of a random variable X by sup(X), i.e. sup(X) = inf{x ∈ R : P(X ≤ x) = 1}.
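Both definitions transcribe directly to discrete data (our own sketch; the empirical cdf stands in for a general non-decreasing F):

```python
import math

def pseudo_inverse(sorted_sample, t):
    """F^{-1}(t) = inf{x : F(x) >= t} for the empirical cdf of a sorted
    sample, 0 < t <= 1: the ceil(t*n)-th smallest observation."""
    n = len(sorted_sample)
    return sorted_sample[math.ceil(t * n) - 1]

def ess_sup(support, probs):
    """sup(X) = inf{x : P(X <= x) = 1} for a discrete random variable:
    the largest support point carrying positive probability."""
    return max(x for x, p in zip(support, probs) if p > 0)

xs = sorted([3, 1, 4, 1, 5])
print(pseudo_inverse(xs, 0.5))              # 3: F(3) = 3/5 >= 0.5, F(1) = 2/5 < 0.5
print(ess_sup([0, 2, 7], [0.5, 0.5, 0.0]))  # 2: the point 7 carries no mass
```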
To be consistent with the laws of large numbers, and for some technical ease, in this paper we focus on the cases when F has finite mean.
First, we formally define a property considered in question (Q'), which is the main objective of this paper.
Definition 2.1. F ∈ F is said to be shape-compatible with G ∈ F if there exist a sequence of random variables {X_i ∼ F, i ∈ N}, a random variable X_G ∼ G, and real numbers a ∈ R and b ∈ R_+, such that, as n → ∞,

S_n/(bn) − a → X_G a.s., (2.1)

where, and throughout the rest of the paper, S_n = Σ_{i=1}^n X_i. We write F → G if F is shape-compatible with G, and say that F → G at rate b, where b is given in (2.1).
In the above definition, we call F an initial distribution, and G a target distribution. It is clear that if F → G then (Q) in the introduction also holds for F and G. By definition, the property F → G is invariant under linear transformations. In this section, we assume that the initial distribution F has finite mean. The main result of this section addresses the generality of (2.1), summarized in the following theorem.
Theorem 2.2. Suppose F ∈ F_1 is non-degenerate. Then F → G for all bounded distributions G.
We proceed to prove this theorem step by step. We first give a useful lemma. In many places below, we will assume the means of F and G are equal without loss of generality, so that the normalizing constant a in (2.1) can be set as zero.
We can assume the means of F and G are both zero (implying a = 0). We will construct a sequence {Y_i, i ∈ N} such that the averaged partial sum converges a.s. to X_G at the rate c ≤ b. The idea is to show that part of the sequence {Y_i, i ∈ N} can be treated as zero, and the rest can be taken as {X_i, i ∈ N}. Denote d = b/c ≥ 1; the law of large numbers then yields the desired convergence.

The following lemma establishes shapability from F to any two-point distribution. Part (a) is crucial to the proof of the main theorem in this section, and part (b) will be used later in Section 3.
Proof. Without loss of generality we assume F and G both have mean zero. Let q_1 and q_2, with q_1 < q_2, be the two points at which G is supported, with masses determined by some p ∈ (0, 1). Denote by F_p the distribution of F^{-1}(U_p), where U_p ∼ U[p, 1], and by G_p the distribution of F^{-1}(V_p), where V_p ∼ U[0, p]. Now let X_i, i ∈ N be iid random variables distributed as F_p and Y_i, i ∈ N be iid random variables distributed as G_p; the desired convergence then follows from the law of large numbers.

The next lemma states that if F and G can both be decomposed into mixtures of collections of distributions with mean zero, say F_a, G_a, a ∈ A, with F_a → G_a for each a, then F → G. This is the key step in proving Theorem 2.2.

Lemma 2.5. Suppose that A is a non-empty set, distributions F_a ∈ F_1 have the same mean for a ∈ A, and distributions G_a ∈ F_1 have the same mean for a ∈ A. Suppose F_a → G_a at rate c_a ≥ ε for each a ∈ A and some ε > 0. Let H be any probability measure on A; then F := ∫_A F_a dH(a) and G := ∫_A G_a dH(a) are distributions, and F → G at rate ε.
Proof. It is obvious that F and G are distributions. Without loss of generality we assume that F_a and G_a, a ∈ A, all have the same mean. By Lemma 2.3, for each a ∈ A, F_a → G_a at rate ε, and thus there exist X_i^{(a)} ∼ F_a, i ∈ N, and W^{(a)} ∼ G_a such that the corresponding scaled average converges a.s. to W^{(a)} as n → ∞. Let Y : Ω → A be a random variable with distribution H, and let Z_i := X_i^{(Y)}, i ∈ N. Thus we obtain F → G.
Remark 2.6. If we choose F_a = F, then we obtain that the set of distributions with mean zero with which F is shape-compatible is a convex set.
The following lemma gives a representation of any distribution with finite mean as a mixture of two-point distributions. This lemma, together with Lemmas 2.4-2.5, will lead to a complete proof of Theorem 2.2.

Lemma 2.7. Each distribution F with mean μ has the representation F = ∫_0^1 F_a da, where F_a, a ∈ (0, 1), are two-point (not necessarily distinct) distributions with mean μ.
(2.4) i.e. A_F and B_F are the inverse functions of H_F on the two intervals [0, ν] and [ν, 1], respectively. Note that, since H_F is continuous and convex and has an a.e. non-zero derivative, A_F and B_F are continuous and a.e. differentiable on [c, 0].
We can check the representation for x ≤ μ: since B_F and A_F are the inverse functions of H_F, the required identities hold a.e. Similarly, we can show that G(x) = F(x) for x > μ, where G denotes the candidate mixture; thus the representation holds for a ∈ (0, 1). Substituting s = K_F^{-1}(a), we obtain the stated form. Finally, we comment on the case P(X_F = μ) = p > 0. If p = 1, then we simply take F_a = P_μ, where P_μ is the degenerate distribution at μ. If p < 1, then the distribution of X_F | X_F ≠ μ is well-defined; we denote it by F̃. Note that F̃ has a representation in which F̃_a, a ∈ (0, 1), are two-point distributions with mean μ. By taking F_a = F̃_{(a−p)/(1−p)} for a ∈ (p, 1) and F_a = P_μ for a ∈ (0, p], we obtain the representation of F. The proof is complete.
The dependence structure among X_1, . . ., X_n constructed in the proofs of Lemmas 2.4 and 2.5 can be explained as follows. The probability space is divided into infinitely many subsets. The random variables X_i, i ∈ N, are conditionally independent on each of the subsets, with (1/(bn)) Σ_{i=1}^n X_i converging to a point in the support of G; more specifically, the limiting points appear in pairs through the construction in Lemma 2.7. We remark that the dependence structure which accommodates F → G is in general not unique; for example, the conditional independence can be replaced by any type of conditional negative dependence, as long as (1/(bn)) Σ_{i=1}^n X_i converges a.s. to its mean conditionally on the corresponding subset of the probability space.
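A simulation sketch of this dependence structure (F = U[0,1] and an equally weighted k-point target are our illustrative choices; the paper's actual construction pairs the limit points via Lemma 2.7):

```python
import random

random.seed(3)

# Split F = U[0,1] into k quantile slices.  The label j picks the subset
# of the probability space; conditionally on j the X_i are iid uniform
# on that slice, so each X_i is marginally U[0,1], yet S_n / n converges
# a.s. to the slice midpoint -- a k-point limiting distribution.
k, n = 8, 50_000
j = random.randrange(k)                 # which subset of Omega we are on
lo, hi = j / k, (j + 1) / k
xs = [random.uniform(lo, hi) for _ in range(n)]   # conditionally iid
print(abs(sum(xs) / n - (lo + hi) / 2))           # small: LLN on the slice
```

Averaging over the label j, each X_i is marginally U[0,1], while the limit of S_n/n is uniform on the k slice midpoints: a target shape of our choosing.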
Remark 2.8. From the proofs, we can see that Theorem 2.2 still holds if the a.s. convergence in Definition 2.1 is replaced by L^1 convergence, since the convergence in our proof is obtained via the Strong Law of Large Numbers. The L^1 convergence, however, relies heavily on the restriction that F has finite mean. To allow a more general framework for future study (especially to allow F to have infinite mean), we frame our discussion in terms of a.s. convergence in this paper.
3 Unbounded target distributions

Necessary conditions
In this section we consider the case when G ∈ F_1 has unbounded support. Without loss of generality we can assume the means of F and G are the same in our discussion.
As we have seen above, shapability is obtained unconditionally for bounded target distributions.One might expect a similar generality for unbounded target distributions.
We first show that some conditions on the asymptotic behavior of F and G are necessary for F → G to hold, and recall the definition of the tail indices of a distribution (Definition 3.1: the right tail-index α_F^+ and the left tail-index α_F^− of a distribution F). Note that here we do not assume that F is regularly varying, hence this notion of tail indices differs slightly from that in classic extreme value theory (see, for example, de Haan and Ferreira, 2006, Chapter 3). When F is regularly varying, our definition coincides with the traditional one used in extreme value theory.
The case when α_F^+ = 1 is trivial, since α_G^+ ≥ 1 for all G ∈ F_1; we assume α_F^+ > 1 in the following. For any 1 < α < α_F^+, Fatou's lemma gives the required moment bound; the statement for the left tail-index α_F^− is obtained by symmetry.
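The displayed definitions of the tail indices are not reproduced above; as a generic numerical companion (our own sketch, not the paper's definition), for a Pareto law with P(X > x) = x^{−α} every standard notion of right tail-index equals α, and the classical Hill estimator recovers it from the upper order statistics:

```python
import math
import random

random.seed(4)

# Pareto sample with P(X > x) = x^{-alpha}, x >= 1, via inverse transform.
alpha, n, k = 3.0, 200_000, 2_000
xs = sorted(random.random() ** (-1.0 / alpha) for _ in range(n))

# Hill estimator: the mean log-spacing of the top k order statistics
# above the (k+1)-th largest estimates 1/alpha.
top = xs[-(k + 1):]
hill = sum(math.log(x / top[0]) for x in top[1:]) / k
print(1.0 / hill)  # close to alpha = 3
```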
Lemma 3.2 tells us that for F → G it is necessary for F to have a relatively heavier tail. This can be characterized more precisely using the tail integrals of F^{-1} and G^{-1} (conditions (3.1) and (3.2) below).

Proof. We only show (3.1). Let F_n be the distribution function of the average (1/n) Σ_{i=1}^n X_i; then F_n ≺_cx F, where ≺_cx stands for the convex order (see Shaked and Shanthikumar, 2007, Theorem 3.A.36 for this simple fact). By the consistency of Φ_(·) with the convex order, we have Φ_{F_n}(t) ≤ Φ_F(t); see Theorem 3.A.5 of Shaked and Shanthikumar (2007). It follows from S_n/(bn) − a → X_G and Fatou's lemma that (3.1) holds.
In the following we show that for F or G being a regularly varying distribution, (3.1)-(3.2) can be written as a ratio between F^{-1}(t) and G^{-1}(t). We denote by L the set of regularly varying distributions F with tail indices less than one, i.e. there exist α, β ∈ (0, 1) such that F^{-1} satisfies a regular-variation relation with index α as t → 0 and with index β as t → 1, where L_1, L_2 are the corresponding slowly varying functions (see de Haan and Ferreira, 2006, Section 1.2).
and let F_n be the distribution function of (1/n) Σ_{i=1}^n X_i. Similar arguments as in the proof of Lemma 3.3, together with Karamata's theorem (see de Haan and Ferreira, 2006, Theorem B.1.5), lead to the desired bounds. Note that in the above lemma we did not assume that G is also regularly varying. If we assume that F, G ∈ L, then (3.3)-(3.4) are immediately equivalent to (3.1)-(3.2) by Karamata's theorem.
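Karamata's theorem can be checked numerically in the simplest regularly varying case (our toy choice: F^{-1}(s) = s^{−γ} with γ ∈ (0, 1) and trivial slowly varying part): the tail integral of F^{-1} near 0 behaves like t F^{-1}(t)/(1 − γ).

```python
# Karamata in the pure power case: the integral of s^{-gamma} over (0, t]
# equals t^{1-gamma} / (1 - gamma), i.e. t * F^{-1}(t) / (1 - gamma).
def tail_integral(gamma, t, steps=100_000):
    """Midpoint-rule approximation of the integral of s^{-gamma} on (0, t];
    the midpoint rule handles the integrable singularity at s = 0."""
    h = t / steps
    return h * sum(((i + 0.5) * h) ** (-gamma) for i in range(steps))

gamma, t = 0.5, 1e-3
ratio = tail_integral(gamma, t) / (t * t ** (-gamma))
print(ratio)  # close to 1 / (1 - gamma) = 2
```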
In the next subsection, we show that (3.3)-(3.4) turn out to be also sufficient for F → G for regularly varying F.

Regularly varying distributions
By Lemma 2.7, we can find representations for F and G in which F_a, G_a, a ∈ (0, 1), are two-point distributions with the same mean. Assume P(X_F = E[X_F]) = 0 for the moment. It is clear that F_a → G_a by Lemma 2.4, since F_a and G_a are both two-point distributions for each a. Hence, to establish F → G, one needs to control the rate of F_a → G_a for each a. For this, it suffices to have the bound (3.5). Note that both F_a^{-1} and G_a^{-1} take only two values, hence t in (3.5) can be chosen as t = 0 and t = 1. Also note that for c < 1, G_a, a ∈ [0, c], has bounded support, and therefore the corresponding shapability always holds at a positive rate (Theorem 2.2), as long as the mixture of F_a over a ∈ [0, c] is not degenerate. Hence, we only need to control the rate of F_a → G_a asymptotically as a → 1. Thus, the case P(X_F = E[X_F]) > 0 is not problematic, and we keep the assumption P(X_F = E[X_F]) = 0 in this section for ease of discussion.
To deal with the possibly infinite support of distributions, the right tail and the left tail need to be discussed separately. We first consider the case of distributions bounded on one side.
where B_F, B_G, K_F and K_G are defined in (2.2)-(2.5). Then F → G.
It then follows that ε := inf_{a ∈ (c,1)} ε_a > 0 for some c ∈ (0, 1), and F_a → G_a at rate ε for a ∈ (c, 1). By (3.6) and Lemma 2.5, we have that F → G.
By Lemma 3.5, in order to show that F → G for distributions F and G bounded on one side, it remains to show (3.7).
Proof. Without loss of generality we can assume that F and G have mean zero. Denote Φ_F(a) = ∫_a^1 F^{-1}(t) dt and Φ_G(a) = ∫_a^1 G^{-1}(t) dt for a ∈ (0, 1). By the definition of B_F and K_F, we have (3.9). Since F^{-1} or G^{-1} is a regularly varying function, Φ_F or Φ_G is also regularly varying. In either case, (3.12) or (3.13) implies the required lim sup bound.
(ii) In the case of G^{-1} being regularly varying, (3.14) implies a corresponding lim sup bound, from which, along with (3.8), we obtain the claim. In both cases, (3.7) holds. By Lemma 3.6 we have F → G.

(3.16) Then F → G.
Proof. Without loss of generality we assume that F and G have mean zero, and P(X_F = E[X_F]) = 0.

General distributions
In this subsection we drop the assumption of regular variation on F or G. It turns out that a stronger condition is sufficient for F → G to hold. We give two results; the first is for distributions bounded on one side.

Proposition 3.9. Suppose non-degenerate distributions F, G ∈ F_1 have means μ_F, μ_G, respectively, F^{-1}(0) > −∞, G^{-1}(0) > −∞, and (3.17). Then F → G.
Proof. Without loss of generality we assume that F and G have mean zero. The case G^{-1}(1) < ∞ is covered by Theorem 2.2, and in the following we assume G^{-1}(1) = ∞.
Proof. Using the same notation as in the proof of Theorem 3.7, we have, by (3.19)-(3.20) and Proposition 3.9, that F_p → G_p and F̂_p → Ĝ_p. The argument in the proof of Theorem 3.7 then leads to the conclusion that F → G.
Theorem 3.10 tells us that F → G holds as long as F has a strictly heavier tail than G.

Conclusion
In this paper, we show that for any distributions F and G satisfying general conditions, it is possible to find a sequence of random variables with common marginal distribution F such that the scaled average of this sequence converges almost surely to a G-distributed random variable. The random variables in this sequence are conditionally independent. This result adds to the study of distributions with fixed margins. The conclusion of this paper is clear: for a classic convergence problem with given marginal distribution, the shape of the limiting distribution can be anything, depending on the dependence structure.
We list some future research directions on the compatibility problem studied in this paper: (a) cases when the initial distribution does not have finite mean or is not regularly varying; (b) problems in which the almost sure convergence is replaced by weak convergence and the normalizing sequences are allowed to be arbitrary; (c) characterizing the set of dependence structures which lead to a given limit; (d) the compatibility problem with the sum replaced by the maximum or another function of the sequence of random variables; (e) the compatibility problem with constraints on the dependence structure of the sequence of random variables; (f) generalization of the obtained results to the setting of sequences of random vectors; (g) applications of the results to time series analysis.
We believe that some of the above questions are challenging and important to modern statistical analysis.


Proof. Without loss of generality we assume P(X_F = E[X_F]) = 0, and that F and G have mean zero and are non-degenerate. By the representation in Lemma 2.7, one has the corresponding mixture decompositions. By Lemma 2.4 (b), we have F_a → G_a at rate

Finally, we give a general sufficient condition for F → G based on Proposition 3.9.
Now we are finally ready to complete the proof of Theorem 2.2.

Proof of Theorem 2.2. Without loss of generality we assume the means of F and G are both zero. By Lemma 2.7, G has a representation in which G_a, a ∈ (0, 1), are two-point distributions with mean zero. Since the support of each G_a is bounded by the support of G, by Lemma 2.4 we have F → G_a at rate E[|X_F|]/(2 sup(X_G)) > 0. Lemma 2.5 and the representation of G lead to the conclusion that F → G.

Denote by G_p the distribution of G^{-1}(U_p) and by Ĝ_p the distribution of G^{-1}(V_p). By Lemma 3.6 we have F_p → G_p. Similarly, we have F̂_p → Ĝ_p. Noting that the mean of F_p equals the mean of G_p, and the mean of F̂_p equals the mean of Ĝ_p, we have, by the same argument as in Lemma 2.5, that pF̂_p + (1 − p)F_p → pĜ_p + (1 − p)G_p = G. Since pF̂_p + (1 − p)F_p is the distribution of bX_F, we conclude that F → G.

Remark 3.8. By Lemma 3.2, we can see that (3.15)-(3.16) are necessary and sufficient conditions for F → G to hold for F ∈ L and arbitrary G.