Inference from Small and Big Data Sets with Error Rates

In this paper we introduce randomized $t$-type statistics that will be referred to as randomized pivots. We show that these randomized pivots yield central limit theorems with a significantly smaller magnitude of error than their classical counterparts under the same conditions. This is a desirable result when only a relatively small amount of data is available. When a data set is too big to be processed, we use our randomized pivots to make inference about the mean based on significantly smaller sub-samples. The approach taken is shown to relate naturally to estimating the distributions of both small and big data sets.


Introduction
In this paper we address the problem of making inference about the population mean when the available sample is either small or big. In the case of a small sample, we develop a randomization technique that yields central limit theorems (CLT's) with a significantly smaller magnitude of error, which compensates for the lack of information that results from having a small sample. Our technique works even when the sample is so small that the classical CLT cannot be used to make a valid inference. In the case of a big sample, we also develop a technique to make inference about the mean based on a smaller sub-sample that can be drawn without dealing with the entire original data set, which may not even be processable.
Unless stated otherwise, X, X_1, X_2, ... throughout are assumed to be independent random variables with a common distribution function F (i.i.d. random variables), mean µ := E_X X and variance 0 < σ² := E_X(X − µ)² < +∞. Based on X_1, ..., X_n, a random sample on X, for each integer n ≥ 1, define the sample mean X̄_n := Σ_{i=1}^n X_i/n and the sample variance S²_n := Σ_{i=1}^n (X_i − X̄_n)²/n, as well as the Student statistic T_n(X) := √n X̄_n/S_n which, on replacing each X_i by X_i − µ, becomes T_n(X − µ), the classical Student t-pivot for the population mean µ.

Define now T^(1)_{m_n,n} and G^(1)_{m_n,n}, randomized versions of T_n(X) and T_n(X − µ) respectively, with respective numerators X̄_{m_n,n} − X̄_n and X̄_{m_n,n} − µ, where X̄_{m_n,n} := Σ_{i=1}^n (w_i^{(n)}/m_n) X_i is the randomized sample mean and the weights (w_1^{(n)}, ..., w_n^{(n)}) have a multinomial distribution of size m_n := Σ_{i=1}^n w_i^{(n)} with respective probabilities 1/n.
The just introduced respective randomized versions T^(1)_{m_n,n} and G^(1)_{m_n,n} of T_n(X) and T_n(X − µ) can be computed via re-sampling from the set of indices {1, ..., n} of X_1, ..., X_n with replacement m_n times so that, for each 1 ≤ i ≤ n, w_i^{(n)} is the count of the number of times the index i of X_i is chosen in this re-sampling process.
In other words, the weights have a multinomial distribution of size m_n with respective probabilities 1/n. Clearly, for each n, the w_i^{(n)} are independent of the random sample X_i, 1 ≤ i ≤ n. Weights denoted by w_i^{(n)} will stand for such triangular multinomial random variables in this context throughout.
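The re-sampling scheme just described can be sketched in code. The following is an illustrative Python sketch (the paper's own numerical work uses R); the function name is ours, and indices run from 0 to n − 1 rather than 1 to n:

```python
import random
from collections import Counter

def multinomial_weights(n, m, seed=None):
    # Re-sample the index set {0, ..., n-1} with replacement m times;
    # w[i] counts how many times index i is drawn, so (w[0], ..., w[n-1])
    # is Multinomial(m; 1/n, ..., 1/n) and the weights always sum to m.
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n) for _ in range(m))
    return [counts.get(i, 0) for i in range(n)]

w = multinomial_weights(n=8, m=8, seed=1)
print(w, "sum =", sum(w))
```

By construction the weights sum to m, mirroring m_n := Σ_{i=1}^n w_i^{(n)} above.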

Thus, T^(1)_{m_n,n} and G^(1)_{m_n,n} can simply be computed by generating, independently from the data, a realization of the random multinomial weights (w_1^{(n)}, ..., w_n^{(n)}) as above. Define also T^(2)_{m_n,n} and G^(2)_{m_n,n}, further randomized versions of T_n(X) and T_n(X − µ) respectively, in which the denominator is based on the randomized sample variance S²_{m_n,n} in place of S²_n. Unlike T_n(X), which can be transformed into T_n(X − µ), the Student pivot for µ as in (1.2) (cf. Giné et al. [13] for the asymptotic equivalence of the two), its randomized versions T^(1)_{m_n,n} and T^(2)_{m_n,n} do not have this straightforward property, i.e., they do not yield a pivotal quantity for the population mean µ = E_X X by simply replacing each X_i by X_i − µ in their definitions. We introduced G^(1)_{m_n,n} and G^(2)_{m_n,n} in this paper to serve as direct randomized pivots for the population mean µ, while T^(1)_{m_n,n} and T^(2)_{m_n,n} will now be viewed on their own as randomized pivots for the sample mean X̄_n in the case of a big data set.
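Since the statistics just introduced are built from the randomized sample mean and the randomized sample variance, a minimal sketch of these two ingredients may help fix ideas. The Python sketch below assumes X̄_{m_n,n} = Σ_i (w_i^{(n)}/m_n) X_i and takes Σ_i (w_i^{(n)}/m_n)(X_i − X̄_{m_n,n})² as one plausible reading of S²_{m_n,n}, since the display defining it is not reproduced above:

```python
import random

def randomized_mean_and_variance(x, m, seed=None):
    # Multinomial weights of size m over the n indices, as in Remark 1.1.
    rng = random.Random(seed)
    n = len(x)
    w = [0] * n
    for _ in range(m):
        w[rng.randrange(n)] += 1
    # Randomized sample mean: sum_i (w_i / m) X_i.
    xbar_w = sum(wi * xi for wi, xi in zip(w, x)) / m
    # Randomized sample variance, assumed normalization:
    # sum_i (w_i / m) (X_i - xbar_w)^2.
    s2_w = sum(wi * (xi - xbar_w) ** 2 for wi, xi in zip(w, x)) / m
    return xbar_w, s2_w

print(randomized_mean_and_variance([1.0, 2.0, 3.0, 4.0], m=4, seed=0))
```

Both quantities depend on the data only through the re-sampled counts, which is what makes the sub-sample versions of Section 5 computable without the full data set.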
Our Theorem 2.1 and its corollaries will explain the higher order accuracy these randomized pivots provide for inference about the mean µ, as compared to that provided by T n (X − µ).
Among the many outstanding contributions in the literature studying the asymptotic behavior of T n (X) and T n (X − µ), our main tool in this paper, Theorem 2.1 below, relates mostly to Bentkus et al. [3], Bentkus and Götze [4], Pinelis [15] and Shao [17].
A short outline of the contributions of this paper reads as follows.
In Section 2 we derive the rates of convergence of the CLT's for G^(i)_{m_n,n} and T^(i)_{m_n,n}, i = 1, 2. In the classical setting, the O(1/√n) rate is best possible in the sense that it cannot be improved without restricting the class of distribution functions of the data, for example, to normal or symmetric distributions. In Section 2 we also present numerical studies which well support our conclusion that, on taking m_n = n, G^(i)_{m_n,n} and T^(i)_{m_n,n}, i = 1, 2, converge to standard normal at a significantly faster rate than that of the classical CLT. In Sections 4 and 5, the respective rates of convergence of the CLT's of Section 2 will be put to significant use. In Section 4, G^(i)_{m_n,n}, i = 1, 2, are studied as natural asymptotic pivots for the population mean µ = E_X X. In Section 5, T^(i)_{m_n,n}, i = 1, 2, are studied as natural asymptotic pivots for the sample mean X̄_n, which closely shadows µ, when dealing with big data sets of univariate observations on n labeled units {X_1, ..., X_n}. In this case, instead of trying to process the entire data set, which may even be impossible to do, sampling it indirectly via generating random weights independently from the data as in Remark 1.1 makes it possible to use T^(2)_{m_n,n} to construct an interval estimate for the sample mean X̄_n based on significantly smaller sub-samples which can be obtained without dealing directly with the entire data set (cf. Remark 5.3). The latter confidence set for X̄_n will in turn be seen to contain the population mean µ as well, and with the same rates of convergence, in terms of m_n and n, as those established for having X̄_n in there. In Section 6 the sample and population distribution functions are studied along the lines of Sections 2-5. The proofs are given in Section 7 and Appendices 1 and 2.
For use throughout, we let (Ω_X, F_X, P_X) denote the probability space of the random variables X, X_1, ..., and let (Ω_w, F_w, P_w) be the probability space on which the weights (w_1^{(1)}), (w_1^{(2)}, w_2^{(2)}), ..., (w_1^{(n)}, ..., w_n^{(n)}), ... are defined. In view of the independence of these two sets of random variables, jointly they live on the direct product probability space (Ω_X × Ω_w, F_X ⊗ F_w, P_{X,w} = P_X · P_w). For each n ≥ 1, we also let P_{·|w} stand for the conditional probabilities given the weights.

The rate of convergence of the CLT's for G^(i)_{m_n,n} and T^(i)_{m_n,n}

One of the efficient tools to control the error when approximating the distribution function of a statistic by that of a standard normal random variable is provided by Berry-Esséen type inequalities (cf., e.g., Serfling [16]), which provide upper bounds on the error of approximation for any finite number of observations in hand. It is well known that, on assuming E_X|X − µ|³ < +∞, as the sample size n increases to infinity, the rate at which the Berry-Esséen upper bound for sup_{t∈R} |P(T_n(X − µ) ≤ t) − Φ(t)| approaches zero is O(1/√n), where, and also throughout, Φ stands for the standard normal distribution function.
Furthermore, the latter rate is best possible in the sense that it cannot be improved without narrowing the class of distribution functions considered.
Our Berry-Esséen type inequalities for the respective conditional distributions, given the weights, of G^(i)_{m_n,n} and T^(i)_{m_n,n} read as follows.

Theorem 2.1. Assume that E_X|X|³ < +∞ and let Φ(.) be the standard normal distribution function. Also, for arbitrary positive numbers δ, ε, define δ_n accordingly. Then, for all n and m_n, we have the respective upper bounds (A) and (B), with C being a universal constant as in the Berry-Esséen upper bound for independent and not necessarily identically distributed summands (cf. page 33 of Serfling [16]).
The following result, a corollary to Theorem 2.1, gives the rates of convergence of the respective conditional CLT's for G^(i)_{m_n,n} and T^(i)_{m_n,n}, i = 1, 2. These CLT's, whose respective rates of convergence are established in Corollary 2.1, can be concluded as direct consequences of a realization of the Lindeberg-Feller CLT (cf. Theorems 27.3 and 27.4 of Billingsley [1]) as formulated in Lemma 5.1 of Csörgő et al. [5] (cf. also Appendix 2), which is also known as the Hájek-Šidák theorem (cf., e.g., Theorem 5.3 in DasGupta [8]).

Numerical Studies
In this section we use the statistical software R to conduct our numerical studies for comparing the performance of G^(1)_{n,n} as in (2.5) of Corollary 2.3 to that of its classical counterpart T_n(X − µ).
In order to provide initial motivation for the more in-depth numerical studies in Tables 2 and 3 below, which indicate a significantly better performance of the pivot G^(1)_{n,n} for µ over its classical counterpart T_n(X − µ), we first compare the empirical probabilities of coverage of these pivots for µ in Table 1. The nominal probability coverage for the one-sided confidence intervals (C.I.'s) in Table 1 is 95%, in terms of the standard normal cutoff point 1.644854. The C.I.'s in Table 1 are based on 1000 replications of the data (X_1, ..., X_n) for both pivots G^(1)_{n,n} and T_n(X − µ), and 1000 replications of the weights (w_1^{(n)}, ..., w_n^{(n)}) for G^(1)_{n,n}. The empirical probabilities of coverage for each of these pivots are presented in Table 1 for the distributions therein. Table 1 below shows that the sampling distribution of G^(1)_{n,n} in each case, even for small sample sizes, is close enough to the standard normal distribution. Using standard normal percentiles, G^(1)_{n,n}, as a pivot for the population mean µ, tends to yield probabilities of coverage that are close to the nominal 95% even for sample sizes for which the classical CLT for T_n(X − µ) fails to provide valid C.I.'s for µ.
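The Monte Carlo design behind Table 1 can be sketched as follows. For brevity, the sketch covers only the classical pivot T_n(X − µ), with Exponential(1) data as an illustrative choice; the paper's own studies are in R, and the exact distributions of Table 1 are not reproduced here:

```python
import math
import random

def coverage_Tn(n, reps, seed=0):
    # Empirical coverage of the one-sided C.I. based on the classical pivot
    # T_n(X - mu) with the standard normal cutoff 1.644854.
    # Illustrative data: Exponential(1), so that mu = 1.
    rng = random.Random(seed)
    mu, z = 1.0, 1.644854
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        t = math.sqrt(n) * (xbar - mu) / s
        hits += t <= z  # coverage event of the one-sided interval
    return hits / reps

print(coverage_Tn(n=30, reps=1000))
```

Replicating the weights and re-computing G^(1)_{n,n} in each loop iteration, as described above, gives the corresponding column of Table 1.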

Table 1:
Comparing the empirical probability coverage of the pivot G^(1)_{n,n} to that of T_n(X − µ)

In order to study in depth the refinement provided by G^(1)_{n,n} over the classical T_n(X − µ) in view of (2.5) of Corollary 2.3, in the following Tables 2 and 3 we present numerical illustrations of the rates of convergence of one-sided C.I.'s for the population mean µ based on the pivot G^(1)_{n,n}, whose validity and the rate at which they approach their nominal probability coverage are concluded in (2.5) of our Corollary 2.3. In Table 2 the empirical probability coverage of these asymptotic C.I.'s based on the pivot G^(1)_{n,n} with nominal 95% level is compared to the empirical probability coverage of the exact size t-C.I.'s based on the pivot T_n(X − µ), whose exact sampling distribution is Student-t with n − 1 degrees of freedom when the data are i.i.d. normal.
To construct our asymptotic 95% C.I.'s based on G^(1)_{n,n} in both Tables 2 and 3, we use the standard normal 95% cutoff point 1.644854. In Table 2 we use exact cutoff points of the Student t-statistic T_n(X − µ), valid for exact C.I.'s for the population mean. All of the one-sided C.I.'s in Table 3 are asymptotic, with both pivots in hand having standard normal limiting distributions as n → +∞.
Tables 2 and 3 display the proportion of 500 generated one-sided C.I.'s with empirical coverage probability value in [0.94, 0.96]. Each one of these 500 C.I.'s is constructed by generating 500 sets of i.i.d. observations (X_1, ..., X_n), with n as displayed, from the indicated respective underlying distributions.

For simulating each value of G^(1)_{n,n}, we also generate 500 sets of the multinomial weights (w_1^{(n)}, ..., w_n^{(n)}) of size m_n = n and associated probability vector (1/n, ..., 1/n).
Both Tables 2 and 3 indicate a highly satisfactory performance of the pivot G^(1)_{n,n}, even when it is compared to an exact size Student t-confidence interval as in Table 2.
To exhibit the performance of the pivot G^(1)_{n,n} in Table 3, in addition to normal data, we also consider data from skewed distributions. It is known that the Student t-distribution converges to standard normal at a rate of order O(1/n). The numerical results in Table 3 show that, based on normal data, G^(1)_{n,n} performs as well as the t-statistic T_n(X − µ). The latter is an empirical indication that G^(1)_{n,n} converges to standard normal at the rate O(1/n). In both Tables 2 and 3, we denote the proportions of the C.I.'s with empirical probability coverage values between 94% and 96% associated with the pivots G^(1)_{n,n} and T_n(X − µ), respectively, by prop G^(1) and prop T_n(X − µ).

Table 2:
Comparing the pivot G^(1)_{n,n} to the Student t-distribution
(columns: Distribution of Sample, n, prop G^(1), prop T_n(X − µ))

In Table 2 the standard normal 95% cutoff point 1.644854 was used for the pivot G^(1)_{n,n}, and the cutoff points t_{0.05,19} = 1.729, t_{0.05,24} = 1.711 and t_{0.05,29} = 1.699 were used for the pivot T_n(X − µ) for n = 20, n = 25 and n = 30, respectively. In Table 3 the standard normal 95% cutoff point 1.644854 was used for both pivots G^(1)_{n,n} and T_n(X − µ). Furthermore, in Table 3, Lognormal(0,1) stands for a lognormal distribution with mean zero and variance one.

Table 3: Comparing the pivot G^(1)_{n,n} to T_n(X − µ)
(columns: Distribution of Sample, n, prop G^(1), prop T_n(X − µ))

Randomized asymptotic pivots for the population mean

We are now to present G^(1)_{m_n,n} of (1.4) and G^(2)_{m_n,n} of (1.7) as direct asymptotic randomized pivots for the population mean µ = E_X X, first when only 0 < σ² := E_X(X − µ)² < +∞ is assumed, followed by assuming E_X|X|³ < +∞ as in Remark 4.1, and E_X X⁴ < +∞ as in Remark 4.2.
We note that the numerator terms of G^(1)_{m_n,n} and G^(2)_{m_n,n} coincide. Furthermore, given the w_i^{(n)}'s, for the randomized weighted average, mutatis mutandis as in verifying (8.1) in Appendix 1, we conclude the convergence in hand when the original sample size n is fixed and m := m_n → +∞, and the same holds true if n → +∞ as well.
In view of (4.1), X̄_{m_n,n} is an unbiased estimator for µ with respect to P_{X|w}. It can be shown that when E_X X² < +∞, as n, m_n → +∞ such that m_n = o(n²), X̄_{m_n,n} is a consistent estimator for the population mean µ in terms of P_{X,w}, i.e., X̄_{m_n,n} → µ in probability-P_{X,w}. (4.5) In Appendix 1 we give a direct proof of (4.5) for the important case m_n = n, for which the CLT's in Corollary 2.1 hold true at the O(1/n) rate.
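A quick simulation illustrates the consistency statement (4.5) with m_n = n. The sample size, seed and Uniform(0,1) data below are illustrative choices only:

```python
import random

def randomized_mean(x, m, rng):
    # X_bar_{m,n} = sum_i (w_i / m) X_i with multinomial weights of size m.
    w = [0] * len(x)
    for _ in range(m):
        w[rng.randrange(len(x))] += 1
    return sum(wi * xi for wi, xi in zip(w, x)) / m

rng = random.Random(12345)
n = 20000
mu = 0.5                                   # mean of Uniform(0, 1)
x = [rng.random() for _ in range(n)]
xbar_w = randomized_mean(x, m=n, rng=rng)  # take m_n = n
print(abs(xbar_w - mu))                    # small for large n and m_n
```

Both sources of randomness, the data and the weights, contribute to the error, which is why the convergence is stated in terms of the joint probability P_{X,w}.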

As to G^(1)_{m_n,n} of (1.4), on proceeding as in the proof of (a) of Corollary 2.1 of Csörgő et al. [5] (cf. Appendix 2), as n, m_n → +∞ so that m_n = o(n²), when 0 < σ² := E_X(X − µ)² < +∞, we arrive at the conditional CLT (4.6) and, via Lemma 1.2 in S. Csörgő and Rosalsky [7], we conclude also the unconditional CLT (4.7). When E_X X² < +∞, we show in Appendix 1 that when n is fixed and m := m_n → +∞, the randomized sample variance S²_{m_n,n}, as defined in (1.8), converges in probability-P_{X,w} to the sample variance S²_n (cf. (8.2) in Appendix 1 or Remark 2.1 of Csörgő et al. [5]). For related results along these lines in terms of U- and V-statistics, we refer to Csörgő and Nasari [6], where, in a more general setup, we establish in-probability and almost sure consistencies of randomized U- and V-statistics.
In Appendix 1 we also show that, when E_X X² < +∞, the corresponding consistency continues to hold as n, m_n → +∞ so that m_n = o(n²) and n = o(m_n) (cf. (4.9)). When E_X X⁴ < +∞, the preceding convergence also holds true when n = o(m²_n) (cf. the proof of (C) and (D) of Corollary 2.1). On combining (4.9) with the CLT in (4.7), when E_X X² < +∞, as n, m_n → +∞ so that m_n = o(n²) and n = o(m_n), the following unconditional CLT holds true as well in terms of P_{X,w}: G^(2)_{m_n,n} →_d Z, where, and also throughout, →_d stands for convergence in distribution and Z is a standard normal random variable. With G^(1)_{m_n,n} and G^(2)_{m_n,n} in mind as direct asymptotic pivots for µ, the CLT's as in (4.6) and (4.7), as well as their respective versions as spelled out in Remark 4.1, together with the CLT's as in (4.10), (4.11) and (4.12), can be used to construct exact size asymptotic C.I.'s for the population mean µ = E_X X. Thus, in terms of G^(1)_{m_n,n}, as n, m_n → +∞ with m_n = o(n²), we conclude in (4.13) a 1 − α size asymptotic C.I. for the population mean µ = E_X X, which is valid both in terms of the conditional P_{X|w} and the unconditional P_{X,w} distributions as in (4.6) and (4.7) respectively, as well as with rates of convergence as in Remark 4.1, where z_{α/2} satisfies P(Z ≥ z_{α/2}) = α/2. When E_X X⁴ < +∞, we can replace S_n by S_{m_n,n} in (4.13), and the thus obtained 1 − α size asymptotic C.I. for the population mean µ holds true in terms of G^(2)_{m_n,n} via both of the respective CLT's as in (4.11) and (4.12), with respective rates of convergence as indicated in Remark 4.2.
In view of Remark 4.1, on taking m_n = n, when E_X|X|³ < +∞, both CLT's as in (4.6) and (4.7) hold true with an O(1/n) rate of convergence (cf. Remark 2.1 and (2.1) of Corollary 2.3). Hence, the 1 − α size asymptotic C.I. for µ as in (4.13) is also achieved at that rate in both cases. The same conclusion remains true on replacing S_n by S_{m_n,n} in (4.13) and taking m_n = n when E_X X⁴ < +∞ (cf. Remarks 4.2 and 2.1, and (2.7) of Corollary 2.3).

Randomized asymptotic pivots for the sample and population means of big data sets
The numerical characteristics of a given big data set should be fairly close to their population counterparts. For instance, the sample mean of a given data set {X_1, ..., X_n} of large size n will be seen to deviate from the population mean only by a negligible error in the context of this paper. The same will be seen to be true for the sample percentiles and their population counterparts in Section 6.
When processing the entire big data set is not an option, its numerical characteristics become unobservable, and hence unknown. Thus the estimators of the unknown parameters are themselves to be estimated as well.
In this section we construct confidence sets for the sample mean X̄_n of a large i.i.d. sample, which shadows the population mean µ. These confidence sets can in turn be used to serve as C.I.'s for the population mean µ, due to the closeness of the two parameters in hand (cf. (5.8) and (5.9)).
To begin with, we consider the coinciding numerator term of T^(i)_{m_n,n}, i = 1, 2, and write it as X̄_{m_n,n} − X̄_n.
We note that when the original sample size n is assumed to be fixed, then on taking only one large sub-sample of size m := m_n, via re-sampling the set of indices of the observations with replacement as in Remark 1.1, as m → +∞, we have X̄_{m,n} → X̄_n in probability-P_{X,w}. Further to (5.3), if n, m_n → +∞ so that m_n = o(n²), then (cf. part (a) of Corollary 2.1 of Csörgő et al. [5] and Appendix 2) P_{X|w}(T^(1)_{m_n,n} ≤ t) → P(Z ≤ t) in probability-P_w for all t ∈ R. (5.4) Consequently, as n, m_n → +∞ so that m_n = o(n²), we arrive at an unconditional CLT as well.
Remark 5.1. When E_X|X|³ < +∞ and n, m_n → +∞ so that m_n = o(n²), then, in addition to (5.4), the corresponding rate of convergence obtains. Furthermore, in view of the latter CLT and (4.9), as n, m_n → +∞ so that m_n = o(n²) and n = o(m_n), in terms of probability-P_{X,w} we conclude the unconditional CLT T^(2)_{m_n,n} →_d Z, where T^(2)_{m_n,n} is as defined in (1.6).
Remark 5.2. We note that X̄_{m_n,n} and S_{m_n,n} are computable based only on the smaller sub-sample, rather than on the entire original big data set.
Under their respective conditions, the CLT's as in (5.6) and (5.7) can be used to construct confidence sets for the sample mean X̄_n, which is an unknown parameter in our present context.
We spell out the one based on T^(2)_{m_n,n} as in (5.6), which is also valid in terms of (5.7), i.e., both in the context of Remark 5.2. Accordingly, when E_X X⁴ < +∞ and m_n, n → +∞ so that m_n = o(n²) and n = o(m²_n), then for any α ∈ (0, 1) we conclude a 1 − α size asymptotic confidence set for X̄_n, at the indicated rates of convergence, as follows:

X̄_{m_n,n} − z_{α/2} S_{m_n,n} √· ≤ X̄_n ≤ X̄_{m_n,n} + z_{α/2} S_{m_n,n} √·, (5.8)

where z_{α/2} is as in (4.13). When E_X|X| < +∞, as n → +∞, we have X̄_n − µ =: ε_n = o(1) almost surely, and hence in P_X-probability. Since the original sample size n of a big data set is already very large to begin with, ε_n is already negligible with high P_X-probability. Consequently, the confidence set (5.8) for X̄_n can actually be viewed as a (1 − α) size asymptotic C.I. for the population mean µ as well, by simply rewriting it as in (5.9), where z_{α/2} and √· are as in (5.8).
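A sketch of how the confidence set (5.8) is computed from a sub-sample alone may be helpful. Since the radicand √· in (5.8) is not reproduced above, the Python sketch below scales S_{m_n,n} by 1/√m_n purely as an illustrative stand-in, and the function name is ours:

```python
import math
import random

def subsample_interval(x, m, z=1.959964, seed=None):
    # Two-sided confidence set a la (5.8) for the (possibly unobservable)
    # sample mean, computed from a multinomial sub-sample of size m only.
    # NOTE: the radicand of (5.8) is elided in the text; 1/sqrt(m) is used
    # here purely as an illustrative stand-in.
    rng = random.Random(seed)
    n = len(x)
    w = [0] * n
    for _ in range(m):
        w[rng.randrange(n)] += 1
    xbar_w = sum(wi * xi for wi, xi in zip(w, x)) / m
    s_w = math.sqrt(sum(wi * (xi - xbar_w) ** 2 for wi, xi in zip(w, x)) / m)
    half = z * s_w / math.sqrt(m)
    return xbar_w - half, xbar_w + half

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(100000)]     # a "big" data set
lo, hi = subsample_interval(x, m=round(len(x) ** 0.75), seed=11)
print(lo, hi)   # interval computed from the sub-sample alone
```

Note that only the m re-sampled indices, not the full data set, are touched when forming the bounds, which is the point of Remark 5.2.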
We emphasize that (5.8) and (5.9) are identical statements under the conditions spelled out right above (5.8). The asymptotic negligibility of the error sequence ε_n in (5.9) can, however, be studied on its own as n → +∞, free of the conditions m_n = o(n²) and n = o(m²_n), as n, m_n → +∞, that (5.8) and (5.9) share. To further elaborate on the fact that (5.9) should work well as an asymptotic (1 − α) size C.I. for the population mean µ in the case of a big data set, we make use of some well known classical results on the complete convergence of X̄_n to µ under two or more moment conditions on X.
Further along these lines, we also mention the Baum and Katz theorem [2], which asserts that Σ_{n=1}^{+∞} n^{r/p−2} P_X(|X̄_n − µ| > ε n^{1/p−1}) < +∞ for every ε > 0 and some p ∈ (0, 2) if and only if E_X|X|^r < +∞. Thus, when E_X X⁴ < +∞, then for a big sample of size n = 10⁶, for example, with p = 1, the corresponding tail probability is negligible. This shows that ε_n = X̄_n − µ in (5.9) becomes arbitrarily small at a very fast rate in probability-P_X in terms of the original big sample size n, without paying attention to how n and m_n relate to each other when arriving at the asymptotic (1 − α) size confidence set for covering X̄_n as in (5.8). Hence, the confidence set (5.8) for the unknown sample mean X̄_n of a big data set of size n, viewed as in (5.9), is also seen to be an asymptotic (1 − α) C.I. for the unknown population mean µ under the same conditions that are used to arrive at (5.8).
We now also illustrate how one goes about constructing the coinciding random boundaries in (5.8) and (5.9) in general, and then in the case of a big sample of size n = 10⁶, as a convenient example.
First of all, we emphasize that in the asymptotic confidence set (5.8) for X̄_n of a big data set, the bounds in hand are computed by generating, independently from the entire data set, a realization of the random multinomial weights (w_1^{(n)}, ..., w_n^{(n)}) as in Remark 1.1. Thus, instead of trying to process the entire big data set {X_1, ..., X_n} in order to compute X̄_n, sampling it only via its index set {1, ..., n} as above, we end up estimating X̄_n in terms of a confidence set as in (5.8) that can be based on significantly smaller sub-samples of size m_n of the entire big data set of size n, without having to deal with the latter directly, whenever E_X X⁴ < +∞, m_n = o(n²) and n = o(m²_n) (cf. Remark 5.2). In this case the rate of convergence of the conditional CLT as in (5.7), as well as of its unconditional version as in (5.6), is, in view of (D) of Corollary 2.1 and (2.4) of Corollary 2.2 respectively,

O(max{m_n/n², n/m²_n}). (5.10)

We note that, on account of having n = o(m²_n) as m_n, n → +∞, we cannot consider taking m_n = n^{1/2} in the context of (5.10). We may, however, consider taking

m_n = n^{1/2+δ} for some δ ∈ (0, 1/2], (5.11)

and then the rate of convergence in (5.10) reduces to

O(n^{−2δ}). (5.12)

For example, on taking δ = 1/4, then m_n = n^{3/4}, and the rate of convergence for covering X̄_n as in (5.8) becomes O(n^{−1/2}), which coincides with that of the classical CLT for the Student t-statistic and pivot (cf. (1.1) and (1.2)). For instance, in this case, for a big sample of size n = 10⁶, the CLT of (5.7) and its unconditional version hold for T^(2)_{m_n,n} with a sub-sample of size m_{10⁶} = (10⁶)^{3/4} = 31,623, once the multinomial weights (w_1^{(n)}, ..., w_n^{(n)}) are generated independently from the data {X_1, ..., X_{10⁶}} with respective probabilities 1/10⁶. These multinomial weights, in turn, are used to construct a (1 − α) size confidence set à la (5.8), covering the unobserved mean X̄_{10⁶}, as well as the unknown population mean µ, with an error proportional to 0.001 (cf. (5.12) with δ = 1/4). More reduction of the sub-sample size m_n can, for example, be achieved by taking

m_n = n^{1/2} log log n (5.14)

instead of that in (5.11) and, via (5.10), arriving at the rate of convergence O(1/(log log n)²) for the CLT's in hand, instead of that in (5.12). For instance, if we again consider a big sample of size n = 10⁶, then (5.14) yields a sub-sample of size m_n = 10³ log log 10⁶ ≈ 2,626, and constructing a (1 − α) size confidence set à la (5.8) will cover the unobserved X̄_{10⁶}, as well as the unknown population mean µ, with an error proportional to 1/(log log 10⁶)² ≈ 1/7. The latter increased error, as compared to the previous example with respective sub-sample size m_{10⁶} = 31,623, is due to the much reduced sub-sample size m_{10⁶} = 2,626 in this context. This scenario can also be viewed in terms of using normal z_{α/2} percentiles for the Student t-pivot T_n(X − µ) when estimating the population mean µ on the basis of n = 49 i.i.d. observations with an error proportional to 1/√49 = 1/7.
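The sub-sample sizes and error rates quoted in this example can be reproduced with a few lines of arithmetic; the sketch below simply re-computes the numbers 31,623, 2,626, 0.001 and roughly 1/7 discussed above:

```python
import math

n = 10 ** 6                                             # size of the big data set
m_poly = round(n ** 0.75)                               # m_n = n^{3/4}
m_loglog = round(math.sqrt(n) * math.log(math.log(n)))  # m_n = n^{1/2} log log n
err_poly = n ** -0.5                                    # error rate ~ n^{-1/2}
err_loglog = 1.0 / math.log(math.log(n)) ** 2           # error ~ 1/(log log n)^2
print(m_poly, m_loglog, err_poly, round(err_loglog, 3))
```

This makes the trade-off explicit: shrinking the sub-sample from about 31,623 to about 2,626 observations inflates the error rate from 0.001 to roughly 0.145 ≈ 1/7.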
Randomized CLT's and C.I.'s for the empirical and theoretical distributions with application to big data sets

Let X, X_1, X_2, ... be independent real valued random variables with a common distribution function F as before, but now without assuming the existence of any finite moments for X. Let {X_1, ..., X_n} be a random sample of size n ≥ 1 on X and, for each n, define the empirical distribution function F_n(x) := Σ_{i=1}^n 1(X_i ≤ x)/n and the sample variance of the indicator variables 1(X_i ≤ x), namely S²_n(x) := F_n(x)(1 − F_n(x)). With the multinomial weights as in Remark 1.1, which are independent of the random sample of n labeled units {X_1, ..., X_n}, define the randomized standardized empirical process α_{n,m_n}(x), where F̂_{m_n,n}(x) := Σ_{i=1}^n (w_i^{(n)}/m_n) 1(X_i ≤ x) is the randomized empirical distribution function. Define also the randomized sub-sample variance of the indicator random variables 1(X_i ≤ x) by putting S²_{m_n,n}(x) := F̂_{m_n,n}(x)(1 − F̂_{m_n,n}(x)) as in (6.6). With n fixed and m = m_n → +∞, along the lines of (5.2) we arrive at F̂_{m,n}(x) → F_n(x) in probability, point-wise in x ∈ R, and, consequently, as m = m_n → +∞, S²_{m,n}(x) → S²_n(x) in probability as well. Furthermore, à la (5.3), as n, m_n → +∞, point-wise in x ∈ R, we conclude (6.9), which, in turn, point-wise in x ∈ R, as n, m_n → +∞, implies (6.10), with S²_{m_n,n}(x) and S²_n(x) respectively as in (6.6) and (6.2). We wish to note and emphasize that, unlike in (4.9), for concluding (6.10) we do not have to assume that n = o(m_n) as n, m_n → +∞.
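The randomized empirical distribution function F̂_{m_n,n}(x) = Σ_i (w_i^{(n)}/m_n) 1(X_i ≤ x) admits a direct sketch; the Python below is an illustration of this definition only, with the function name ours:

```python
import random

def randomized_edf(x, m, seed=None):
    # Returns x0 -> F_hat_{m,n}(x0) = sum_i (w_i / m) 1(X_i <= x0),
    # the randomized empirical distribution function.
    rng = random.Random(seed)
    n = len(x)
    w = [0] * n
    for _ in range(m):
        w[rng.randrange(n)] += 1

    def F_hat(x0):
        return sum(wi for wi, xi in zip(w, x) if xi <= x0) / m

    return F_hat

F_hat = randomized_edf([1.0, 2.0, 3.0, 4.0], m=8, seed=0)
print(F_hat(0.0), F_hat(4.0))  # 0.0 and 1.0 at the extremes
```

Since the indicator variables are bounded, no moment conditions on X enter here, in line with the opening remark of this section.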
Further to the randomized standardized empirical process α_{n,m_n}(x), we now define its Studentized/self-normalized versions α^(s)_{m_n,n}(x), s = 1, 2, with x ∈ R. Clearly, on replacing X_i by 1(X_i ≤ x) and µ by F(x), x ∈ R, in the formula in (4.2), we arrive at the respective statements of (4.1) and (4.3) in this context. Also, on replacing X_i by 1(X_i ≤ x) in the formula as in (4.4), we conclude the statement of (4.5) with µ replaced by F(x), x ∈ R.
As to the latter statement, on letting F̄_{m_n,n}(x) be defined as in (6.15), as n, m_n → +∞ such that m_n = o(n²), point-wise in x ∈ R, by virtue of (4.5), F̄_{m_n,n}(x) → F(x) in probability. In Lemma 5.2 of Csörgő et al. [5] it is shown that, if m_n, n → +∞ so that m_n = o(n²), then the corresponding negligibility holds. This, mutatis mutandis, combined with (a) of Corollary 2.1 of Csörgő et al. [5], as n, m_n → +∞ so that m_n = o(n²), yields

P_{X|w}(α^(s)_{m_n,n}(x) ≤ t) → P(Z ≤ t) in probability-P_w, for all x, t ∈ R, (6.18)

with s = 1 and also for s = 2, and, via Lemma 1.2 in S. Csörgő and Rosalsky [7], this results in having also the unconditional CLT (6.19) with s = 1 and also for s = 2. On combining (6.19) and (6.10), as n, m_n → +∞ so that m_n = o(n²), when s = 1 in (6.19) we conclude

α^(1)_{m_n,n}(x) →_d Z, (6.20)

and, when s = 2 in (6.19), we arrive at (6.21), for all x ∈ R.
Remark 6.1. The Berry-Esséen type inequality (A) of our Theorem 2.1 continues to hold true for α^(2)_{m_n,n}(x), and so does (B) of Theorem 2.1 for α^(1)_{m_n,n}(x), without the assumption E_X|X|³ < +∞, since the indicator random variable 1(X ≤ x) requires no moment assumptions.

Remark 6.2. In view of Remark 6.1, in the context of this section, (A) and (B) of Corollary 2.1 read as follows: as n, m_n → +∞ in such a way that m_n = o(n²), then, mutatis mutandis, (A) and (B) hold true for α^(1)_{m_n,n}(x) and α^(2)_{m_n,n}(x), with O(max{m_n/n², 1/m_n}) in both. Consequently, statements (2.1) and (2.2) of Corollary 2.2 also read similarly for α^(1)_{m_n,n}(x) and α^(2)_{m_n,n}(x) in terms of the conditions and the rates of convergence. Thus, on taking m_n = n, we immediately obtain the optimal O(n^{−1}) rate conclusion of Remark 2.1 in this context as well, i.e., uniformly in t ∈ R and point-wise in x ∈ R for α^(1)_{m_n,n}(x) and α^(2)_{m_n,n}(x).
Remark 6.3. As to the rate of convergence of the respective CLT's in terms of P_{X,w} as in (6.20) and (6.21), and also in terms of P_{X|w}, via (C) and (D) of Corollary 2.1, for α^(1)_{m_n,n}(x) and α^(2)_{m_n,n}(x), as n, m_n → +∞ in such a way that m_n = o(n²), we obtain the rate O(max{m_n/n², 1/m_n}). Thus, on taking m_n = n, we conclude the optimal rate of convergence O(n^{−1}) for α^(1)_{m_n,n}(x) and α^(2)_{m_n,n}(x), uniformly in t ∈ R and point-wise in x ∈ R.
The CLT's for α^(1)_{m_n,n}(x) can be used to construct point-wise confidence sets for the empirical distribution function F_n(·), while those for α^(2)_{m_n,n}(x) provide point-wise C.I.'s for the distribution function F(·). We spell out the ones that are valid both in terms of P_{X|w} and P_{X,w}, with the rate of convergence O(max{m_n/n², 1/m_n}) (cf. Remark 6.3). Thus, as n, m_n → +∞ so that m_n = o(n²), the CLT's in hand respectively result in the asymptotically exact (1 − α) size C.I.'s (6.22) and (6.23), for any α ∈ (0, 1) and point-wise in x ∈ R, with √· := (Σ_{i=1}^n (w_i^{(n)}/m_n − 1/n)²)^{1/2}, S²_{m_n,n}(x) = F̂_{m_n,n}(x)(1 − F̂_{m_n,n}(x)) as in (6.6), F̂_{m_n,n}(x) as in (6.4), and F̄_{m_n,n}(x) as in (6.15).
On taking m_n = n, then, for each x ∈ R, both of the preceding C.I.'s achieve their nominal level at the optimal rate O(n^{−1}). This is a significant achievement in capturing the population distribution by (6.23), for each x ∈ R, when the available sample is of moderate or small size.
In the case of a big data set of size n, when processing the entire data set may not be possible, both F_n(·) and F(·) are to be estimated. In this case the confidence set (6.22) can serve not only for covering F_n(x), but F(x) as well, with any desirable accuracy for each x ∈ R. Namely, on putting ε_n(x) = F_n(x) − F(x), x ∈ R, we simply re-write it as in (6.24) and argue, via the Glivenko-Cantelli theorem, that in the case of big data sets ε_n(x) is negligible with any desired accuracy for each x ∈ R at a fast enough rate of convergence as n → +∞, without paying attention to how m_n and n relate to each other when arriving at the asymptotic (1 − α) size confidence set that covers F_n(x) for each x ∈ R as in (6.22). This, in turn, is guaranteed by the Dvoretzky-Kiefer-Wolfowitz [9] inequality, which asserts that for all ε > 0

P_X(sup_{x∈R} |F_n(x) − F(x)| > ε) ≤ 2 exp(−2nε²). (6.25)

On summing in (6.25), one concludes the Glivenko-Cantelli theorem at the indicated exponentially fast rate of convergence to zero in P_X-probability, which of course also holds true point-wise in x ∈ R for ε_n(x) as in (6.24). Thus, the error induced when estimating F(x), point-wise in x ∈ R, as in (6.24) is practically zero for data sets of big size n.
For example, in view of inequality (6.25), where the best possible constant 2 in front of the exponential function is due to Massart [14], when a large sample of size n = 10⁶ is at hand, we have P_X(sup_{x∈R} |F_{10⁶}(x) − F(x)| > ε) ≤ 2 exp(−2 × 10⁶ ε²) for all ε > 0. Thus, practically, the confidence set (6.22) for F_n(x) is also a C.I. for F(x) in the case of big data sets of size n.
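The numerical effect of (6.25) for a big sample is easy to verify; the sketch below simply evaluates Massart's form of the bound:

```python
import math

def dkw_bound(n, eps):
    # Massart's form of the Dvoretzky-Kiefer-Wolfowitz inequality (6.25):
    # P( sup_x |F_n(x) - F(x)| > eps ) <= 2 exp(-2 n eps^2).
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

# For n = 10^6 even a small eps makes the bound negligible.
print(dkw_bound(10 ** 6, 0.005))
```

With n = 10⁶ and ε = 0.005 the bound is on the order of 10^{−22}, which is what makes ε_n(x) practically zero here.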
Recall now that as n, m_n → +∞ in such a way that m_n = o(n²), the rate of convergence for having the (1 − α) size confidence set (6.22) for F_n(x), and also for F(x) in view of (6.24), for x ∈ R, is O(max{m_n/n², 1/m_n}). Consequently, when drawing a significantly smaller sub-sample of size m_n = n^{1/2}, for example, the rate of convergence becomes O(n^{−1/2}), which coincides with the rate of convergence of the classical CLT for the Student t-statistic and pivot based on n observations, as in (1.1) and (1.2) respectively. Needless to say, in the case of a big data set, a sub-sample of size m_n = n^{1/2} can be a huge reduction in the number of observations that we are to deal with instead of the original sample and, in our approach, it results in the same magnitude of error as that of the classical CLT when the entire sample of size n is to be observed.
To illustrate the reduction provided by our confidence set (6.22) when it is used to cover F_n(x) or F(x), point-wise in x ∈ R, we again consider a big data set of size n = 10⁶. By generating the random weights (w_1^{(n)}, ..., w_n^{(n)}) of size m_n = n^{1/2} = 1000 (cf. Remark 1.1), our confidence set (6.22) to capture F_n(x) is achieved with an error proportional to 1/1000. Recalling also that in this case ε_n(x) = F_n(x) − F(x) is already negligible (cf. (6.26)), we also conclude that (6.22) captures F(x) with an error proportional to 1/1000.

Proofs
Proof of Theorem 2.1
Due to the similarity of the two cases, we only give the proof of part (A) of this theorem. The proof relies on the fact that, via conditioning on the weights, the statistic at hand is a sum of independent and non-identically distributed random variables. This in turn enables us to use a Berry-Esséen type inequality for self-normalized sums of independent and non-identically distributed random variables. Also, some of the ideas in the proof are similar to those of Slutsky's theorem.
We now write, in view of the above setup, for $t \in \mathbb{R}$ and $\varepsilon_1 > 0$, the bound (7.2). Observe now that for $\varepsilon_2 > 0$ we have (7.3). Combining the latter conclusion with (7.3), (7.2) can be replaced by (7.4). Now, the continuity of the normal distribution $\Phi$ allows us to choose the threshold accordingly. We then use the Berry-Esséen inequality for independent and not necessarily identically distributed random variables (cf., e.g., Serfling [16]), where $C$ is a universal constant as in the Berry-Esséen inequality in this context (cf. page 33 of Serfling [16]). Incorporating these approximations into (7.5), we arrive at (7.6), with $\delta_n$ as defined in the statement of Theorem 2.1.
For $\varepsilon > 0$, the right-hand side of (7.6) is bounded above by the sum of the terms $\Pi_1(n)$ and $\Pi_2(n)$ as in (7.7). We bound $\Pi_1(n)$ above via an application of Chebyshev's inequality. We then use the fact that the $w_i^{(n)}$'s are multinomially distributed to compute the resulting expression; after some algebra it turns out that it can be bounded above as in (7.9). Incorporating (7.7) and (7.9) into (7.6) completes the proof of part (A) of Theorem 2.1.
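The conditioning device used in this proof can be illustrated by simulation: given one fixed realization of the multinomial weights, a weighted sum such as $\sum_i (w_i^{(n)}/m_n - 1/n)X_i$ is, conditionally on the weights, a sum of independent, non-identically distributed terms, so after standardization its conditional distribution should be close to $\Phi$. The Python sketch below checks this for centred Exp(1) data with toy sizes $n = m_n = 50$ (the data model and sizes are illustrative, and the statistic is a simplified linear stand-in for $G^{(1)}_{m_n,n}$, without the self-normalization):

```python
import math
import random

random.seed(3)
n = 50        # fixed sample size (toy)
m = 50        # multinomial size m_n = n

# One fixed realization of multinomial(m; 1/n, ..., 1/n) weights.
w = [0] * n
for _ in range(m):
    w[random.randrange(n)] += 1

# Conditionally on w, sum_i (w_i/m - 1/n) X_i is a sum of independent,
# NON-identically distributed terms; we standardize by its conditional
# standard deviation (sigma = 1 for centred Exp(1) data).
coef = [wi / m - 1.0 / n for wi in w]
sd = math.sqrt(sum(c * c for c in coef))

reps = 4000
zs = sorted(
    sum(c * (random.expovariate(1.0) - 1.0) for c in coef) / sd
    for _ in range(reps)
)

Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
sup_dev = max(max(abs((i + 1) / reps - Phi(z)), abs(i / reps - Phi(z)))
              for i, z in enumerate(zs))
print("sup_t |P(Z <= t | w) - Phi(t)| approx", sup_dev)
```

The observed sup-distance mixes the Berry-Esséen error for these 50 non-identical terms with Monte Carlo noise of order $1/\sqrt{\mathrm{reps}}$, and is small already at these toy sizes.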

Proof of Corollary 2.1
The proofs of parts (A) and (B) of this corollary are immediate consequences of Theorem 2.1.
To prove parts (C) and (D) of this corollary, in view of Theorem 2.1 it suffices to show that, for arbitrary $\varepsilon_1, \varepsilon_2 > 0$, as $n, m_n \to +\infty$, (7.10) holds. To prove the preceding result, we first record the observation in (7.11). By virtue of this observation, we proceed with the proof of (7.10) by first writing out the relevant expansion. Observe now that (7.12) holds. We note that in the preceding relation, since $i, j, k$ are distinct, the corresponding cross moments vanish; also, since $i, j, k, l$ are distinct, the analogous fourth-order cross moments vanish as well. Therefore, in view of (7.12) and (7.11), the proof of (7.10) follows if we establish (7.13) and (7.14). The preceding two conclusions imply (7.13) and (7.14), respectively. The proof of Corollary 2.1 is now complete.
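The moment computations for the multinomial weights that drive this proof can be spot-checked numerically. For instance, for a distinct pair $i \ne j$ one has $\mathrm{Cov}(w_i^{(n)}, w_j^{(n)}) = -m_n/n^2$, hence $E_w\big[(w_i^{(n)}/m_n - 1/n)(w_j^{(n)}/m_n - 1/n)\big] = -1/(m_n n^2)$; the Python sketch below verifies this by Monte Carlo with toy sizes $n = 10$, $m_n = 40$:

```python
import random

random.seed(13)
n, m, reps = 10, 40, 20000   # toy sizes for a quick Monte Carlo check

acc = 0.0
for _ in range(reps):
    # multinomial(m; 1/n, ..., 1/n) weights via index re-sampling
    w = [0] * n
    for _ in range(m):
        w[random.randrange(n)] += 1
    # cross term for the distinct pair (i, j) = (0, 1)
    acc += (w[0] / m - 1 / n) * (w[1] / m - 1 / n)

emp = acc / reps
exact = -1.0 / (m * n * n)   # = Cov(w_0, w_1)/m^2 = -(m/n^2)/m^2
print("Monte Carlo:", emp, "  exact:", exact)
```

Higher-order joint moments over three or four distinct indices, as used in (7.13) and (7.14), can be checked the same way by accumulating the corresponding products.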

Proof of Corollary 2.2
The proof of this result is relatively easy. Due to their similarity, we only give the proof for part (A), as follows. For arbitrary positive $\delta$, we write
$$\sup_{-\infty<t<+\infty} \big|P_{X,w}(G^{(1)}_{m_n,n} \le t) - \Phi(t)\big| \;\le\; \delta + 2\, P_w\Big(\sup_{-\infty<t<+\infty} \big|P_{X|w}(G^{(1)}_{m_n,n} \le t) - \Phi(t)\big| > \delta\Big),$$
where, in view of Corollary 2.1, the second term on the right-hand side converges to zero; since $\delta > 0$ is arbitrary, the conclusion follows.
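The displayed bound is the standard step of passing from the conditional to the unconditional distribution; a short derivation, using only that a difference of two distribution functions is bounded by 1 in absolute value, runs as follows, writing $Z_t := P_{X|w}(G^{(1)}_{m_n,n} \le t) - \Phi(t)$:

```latex
% Since |Z_t| <= 1 and P_{X,w}(G <= t) = E_w[ P_{X|w}(G <= t) ]:
\begin{align*}
\bigl|P_{X,w}(G^{(1)}_{m_n,n} \le t) - \Phi(t)\bigr|
  &= \bigl|\mathbb{E}_w Z_t\bigr| \le \mathbb{E}_w |Z_t| \\
  &= \mathbb{E}_w\bigl[\,|Z_t|\,\mathbf{1}\{|Z_t| \le \delta\}\,\bigr]
   + \mathbb{E}_w\bigl[\,|Z_t|\,\mathbf{1}\{|Z_t| > \delta\}\,\bigr] \\
  &\le \delta + P_w\bigl(|Z_t| > \delta\bigr)
   \le \delta + 2\,P_w\Bigl(\sup_{s \in \mathbb{R}} |Z_s| > \delta\Bigr).
\end{align*}
```

Taking the supremum over $t$ on the left-hand side leaves the right-hand side unchanged, so the conditional Berry-Esséen conclusion of Corollary 2.1 carries over to the unconditional distribution on letting $\delta \downarrow 0$.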

Appendix 1
Consider the original sample $\{X_1, \ldots, X_n\}$ and assume that the sample size $n \ge 1$ is fixed. We now show that when $n$ is fixed, as $m \to +\infty$, we have $\bar{X}_{m,n} \to \bar{X}_n$ in probability-$P_{X,w}$. To do so, without loss of generality we assume that $\mu = 0$. Let $\varepsilon_1, \varepsilon_2 > 0$, and write the bound (8.1), which tends to $0$ as $m \to \infty$. The preceding conclusion means that $P_{X|w}\big(|\bar{X}_{m,n} - \bar{X}_n| > \varepsilon_1\big) \to 0$ in probability-$P_w$. Hence, by the dominated convergence theorem, we conclude that $\bar{X}_{m,n} \to \bar{X}_n$ in probability-$P_{X,w}$.
We now show that the randomized sample variance $S^2_{m,n}$ is a consistent estimator, in probability, of the ordinary sample variance $S^2_n$ for each fixed $n$, as $m \to +\infty$. Employing the u-statistic representation of the sample variance enables us to rewrite $S^2_{m,n}$, as in (1.8), as follows.
In view of the preceding formula, the probability that the weighted average of the terms $(X_i - X_j)^2$ deviates from its ordinary counterpart by more than $2\varepsilon_1$, with $P_w$-probability exceeding $\varepsilon_2$, can be bounded above by a term that clearly approaches zero as $m \to +\infty$, for each fixed $n$. By this we have shown that $S^2_{m,n} \to S^2_n$ in probability-$P_{X,w}$, when $n$ is fixed and only $m \to +\infty$.
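Both consistency statements of this appendix, and the u-statistic-type representation used above, can be checked numerically. The Python sketch below uses illustrative Gaussian data with $n = 30$ fixed, takes $S^2_n$ and $S^2_{m,n}$ in their divisor-$n$ and divisor-$m$ forms (our reading of (1.8)), and lets $m$ grow:

```python
import random

random.seed(5)
n = 30                                    # fixed original sample size
x = [random.gauss(1.0, 3.0) for _ in range(n)]
x_bar_n = sum(x) / n
s2_n = sum((xi - x_bar_n) ** 2 for xi in x) / n   # divisor-n variance

# U-statistic-type representation: averaging the kernel
# h(a, b) = (a - b)^2 / 2 over all ordered pairs recovers s2_n.
pair_form = sum((a - b) ** 2 for a in x for b in x) / (2 * n * n)

def randomized_moments(m):
    # multinomial(m; 1/n, ..., 1/n) weights via index re-sampling
    w = [0] * n
    for _ in range(m):
        w[random.randrange(n)] += 1
    x_bar_mn = sum(wi * xi for wi, xi in zip(w, x)) / m
    s2_mn = sum(wi * (xi - x_bar_mn) ** 2 for wi, xi in zip(w, x)) / m
    return x_bar_mn, s2_mn

# With n fixed, w_i/m -> 1/n as m grows, so the randomized mean and
# variance converge to their ordinary counterparts:
for m in (10 ** 2, 10 ** 4, 10 ** 6):
    x_bar_mn, s2_mn = randomized_moments(m)
    print(m, abs(x_bar_mn - x_bar_n), abs(s2_mn - s2_n))
```

The printed deviations shrink as $m$ grows, with $n = 30$ held fixed, exactly in the regime treated in this appendix.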

Consistency of Xmn,n in (4.4)
We give the proof of (4.4) for $m_n = n$, noting that the proof below remains the same for $m_n \le n$ and that it can be adjusted for the case $m_n = kn$, where $k$ is a positive integer. In order to establish (4.4) when $m_n = n$, we first recall $\bar{X}_n := \sum_{i=1}^n X_i / n$ and $S^2_n := \sum_{i=1}^n (X_i - \bar{X}_n)^2 / n$, the sample mean and the sample variance, respectively, and consider the classical Student $t$-statistic $T_n(X) := \bar{X}_n / (S_n/\sqrt{n})$.

Remark 1.1. In view of the preceding definition of $w_i^{(n)}$, $1 \le i \le n$, they form a row-wise independent triangular array of random variables such that $\sum_{i=1}^n w_i^{(n)} = m_n$ and, for each $n \ge 1$, $(w_1^{(n)}, \ldots, w_n^{(n)})$ has a multinomial distribution of size $m_n$ with respective probabilities $1/n$.

We study the asymptotic behavior of $G^{(i)}_{m_n,n}$, $i = 1, 2$, via establishing Berry-Esséen type results in Theorem 2.1 and its Corollaries 2.1-2.3. In Corollary 2.3 we show that, on taking $m_n = n$, $G^{(i)}_{m_n,n}$, $i = 1, 2$, converge in distribution to the standard normal at the rate of $O(1/n)$. This rate is significantly better than the best possible $O(1/\sqrt{n})$ rate of convergence under similar moment conditions for the classical $t$-statistic $T_n(X)$ and its Student pivot $T_n(X - \mu)$, as in (1.7) and (1.6) respectively, based on a random sample of size $n$.

... for $m_n = n$ when $E_X X^4 < +\infty$. Assume that $E_X |X|^3 < +\infty$. If $n, m_n \to +\infty$ in such a way that $m_n = o(n^2)$, then, for arbitrary $\delta > 0$, we have (5.6). When $m_n = n$, for arbitrary positive $\delta$, as $n \to +\infty$, we have (5.7).

Remark 4.1. When $E_X |X|^3 < +\infty$ and $n, m_n \to +\infty$ so that $m_n = o(n^2)$, then the unconditional CLT as in (5.6), in terms of $P_{X,w}$, holds true at the therein indicated respective rates of convergence.

Remark 5.2. Assuming that $E_X X^4 < +\infty$ and $n, m_n \to +\infty$ so that $m_n = o(n^2)$ and $n = o(m_n^2)$, we then have (2.4) and (2.8) as in Corollaries 2.2 and 2.3, respectively. Naturally, under the same conditions, as $n, m_n \to +\infty$, we have (D) of Corollary 2.1 as well, i.e.,
$$P_{X|w}\big(T^{(2)}_{m_n,n} \le t\big) \longrightarrow \Phi(t) \ \text{in probability-}P_w \ \text{for all } t \in \mathbb{R}, \quad (5.7)$$
at the therein indicated rate of convergence.

Remark 5.3. Considering that our approach to randomizing the original sample in this section coincides with drawing a smaller sub-sample of size $m_n$, with replacement, from the original big data set $\{X_1, \ldots, X_n\}$ via re-sampling its index set $\{1, \ldots, n\}$ as in Remark 1.1, it is important to note that in order to compute both $\bar{X}_{m_n,n}$ and $S^2_{m_n,n}$, as in (1.5) and (1.8) respectively, only those $X_i$'s are needed whose weights $w_i^{(n)}$ are positive.