Self-normalized Limit Theorems: a Survey

Let $X_1, X_2, \ldots$ be independent random variables with $EX_i = 0$, and write $S_n = \sum_{i=1}^n X_i$ and $V_n^2 = \sum_{i=1}^n X_i^2$. This paper provides an overview of current developments on the functional central limit theorems (invariance principles), absolute and relative errors in the central limit theorems, moderate and large deviation theorems and saddle-point approximations for the self-normalized sum $S_n/V_n$. Other self-normalized limit theorems are also briefly discussed.


Introduction
Let $X, X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with mean zero, and put $S_n = \sum_{j=1}^n X_j$, $V_n^2 = \sum_{j=1}^n X_j^2$, $\bar X = S_n/n$ and $\hat\sigma_n^2 = \frac{1}{n-1}\sum_{j=1}^n (X_j - \bar X)^2$. There has been increasing interest in the limit behavior of the so-called self-normalized sum $S_n/V_n$ in the past decade. One reason for this is the identity
$$P(t_n \ge x) = P\big(S_n/V_n \ge x\,(n/(n + x^2 - 1))^{1/2}\big),$$
where $t_n = S_n/(\sqrt{n}\,\hat\sigma_n)$ is the classical Student t-statistic. This allows us to study the distributional properties of $t_n$, which is frequently used in practice to test hypotheses about the mean, through those of the less complex $S_n/V_n$. More importantly, the limit theorems for $S_n/V_n$ (and hence for $t_n$) usually require much less stringent moment conditions than the classical limit theorems do, and hence have much wider practical applicability.
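The link between the Student t-statistic and the self-normalized sum is purely algebraic: with $u = S_n/V_n$ one has $t_n = u\,\{(n-1)/(n-u^2)\}^{1/2}$, so that $\{t_n \ge x\} = \{u \ge x\,(n/(n+x^2-1))^{1/2}\}$. The following minimal sketch checks this on one simulated sample. The function names and the Gaussian test sample are illustrative assumptions, not part of the cited work.

```python
import math
import random


def self_normalized_sum(xs):
    """Return S_n / V_n for a sample xs (assumes not all entries are zero)."""
    s = sum(xs)
    v = math.sqrt(sum(x * x for x in xs))
    return s / v


def student_t(xs):
    """Classical Student t-statistic sqrt(n) * mean / sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def t_from_self_normalized(u, n):
    """Map u = S_n / V_n to the t-statistic via t = u * sqrt((n - 1) / (n - u^2))."""
    return u * math.sqrt((n - 1) / (n - u * u))


def threshold(x, n):
    """Threshold for S_n / V_n that is equivalent to the event {t_n >= x}."""
    return x * math.sqrt(n / (n + x * x - 1))


rng = random.Random(0)                      # fixed seed, illustrative sample only
sample = [rng.gauss(0.0, 1.0) for _ in range(25)]
u = self_normalized_sum(sample)
t = student_t(sample)
print(t, t_from_self_normalized(u, len(sample)))
```

Because the map from $u$ to $t_n$ is strictly increasing on $(-\sqrt{n}, \sqrt{n})$, tail statements for one statistic translate directly into tail statements for the other.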
The past decade has witnessed significant progress in weakening the moment conditions for self-normalized limit theorems. Griffin and Kuelbs (1989) [31] obtained a self-normalized law of the iterated logarithm for all distributions in the domain of attraction of a normal or stable law. Shao (1997) [55] showed that no moment conditions are needed for a self-normalized large deviation result for $P(S_n/V_n \ge x\sqrt{n})$, and that the tail probability of $S_n/V_n$ is Gaussian-like when $X$ is in the domain of attraction of the normal law and sub-Gaussian-like when $X$ is in the domain of attraction of a stable law, while Giné, Götze and Mason (1997) [29] proved that the tails of $S_n/V_n$ are uniformly sub-Gaussian when the sequence is stochastically bounded. Shao (1999) [57] established a Cramér type moderate deviation result for self-normalized sums under only a finite third moment condition. Jing, Shao and Wang (2003) [37] proved a Cramér type moderate deviation result (for independent random variables) under a Lindeberg type condition. Jing, Shao and Zhou (2004) [38] obtained the saddle-point approximation without any moment conditions. Other results include Wang and Jing (1999) [69] as well as Robinson and Wang (2005) [54] for an exponential non-uniform Berry-Esseen bound; Csörgő, Szyszkowicz and Wang (2003a, b) [18,19] for Darling-Erdős theorems and Donsker's theorems; Wang (2005) [66] as well as Wang and Hall (2009) [68] for a refined moderate deviation; Hall and Wang (2004) [35] for exact convergence rates; and Chistyakov and Götze (2004b) [13] for all possible limiting distributions when $X$ is in the domain of attraction of a stable law. We also refer to de la Peña, Lai and Shao (2009) [52] for a systematic presentation of self-normalized processes and their statistical applications.
The main aim of this paper is to provide an overview of new developments on the functional central limit theorems (invariance principles), absolute and relative errors in the central limit theorems, Cramér-Chernoff-type large deviations and saddle-point approximations for $S_n/V_n$. Part of this material has been collected in [56,58,60]; we present it again here for the sake of completeness. Explicitly, Section 2 reviews weak convergence properties of $S_n/V_n$, including central limit theorems and invariance principles. The absolute and relative errors in the central limit theorems for $S_n/V_n$ are given in Sections 3 and 4 respectively. Section 5 reviews Cramér-Chernoff-type large deviations and saddle-point approximations. Finally, in Section 6 we briefly review other self-normalized limit theorems, such as the self-normalized law of the iterated logarithm, Darling-Erdős type theorems and limit theorems for studentized non-linear statistics. Throughout the paper, we assume that $X, X_1, \ldots, X_n$ are i.i.d. random variables with $EX = 0$, unless explicitly stated otherwise.

The central limit theorem and invariance principle
Efron (1969) [25] may have been the first paper to investigate the limit behavior of Student's t-statistic $t_n$ or, equivalently, of $S_n/V_n$, in some nonstandard cases. The general research begins with Logan, Mallows, Rice and Shepp (1973) [44] (LMRS for short), in which the authors showed, among many other results, that if $X$ is in the domain of attraction of an $\alpha$-stable law, $0 < \alpha \le 2$, centered if $\alpha > 1$ and symmetric if $\alpha = 1$, then $S_n/V_n$ converges in distribution to a limit which is sub-Gaussian, and if moreover $X$ is symmetric, then the moments of $S_n/V_n$ also converge to the corresponding moments of the limit. LMRS also conjectured that $S_n/V_n$ is asymptotically normal if (and perhaps only if) $X$ is in DAN (the domain of attraction of the normal law), and that the only possible nontrivial limit distributions of $S_n/V_n$ are those obtained when $X$ follows a stable law.
Based on Raikov's theorem, as was noticed by Maller (1981) [45], among others, one can easily show the "if" part in the conjectures of LMRS. We refer to Csörgő and Horváth (1988) [15] and Griffin and Mason (1991) [32] for more details in this regard. It is the "only if" part that remained open until 1997 for the general case of not necessarily symmetric random variables, when Giné, Götze and Mason (1997) [29] proved the following.
Theorem 2.1. $S_n/V_n \to_D N(0, 1)$ if and only if $X$ is in the domain of attraction of the normal law.
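Theorem 2.1 can be illustrated numerically. The sketch below is not taken from the cited papers. It assumes a symmetric Pareto-type law with $P(|X| > x) = x^{-2}$ for $x \ge 1$ (the hypothetical helper `symmetric_pareto`), which has infinite variance but lies in the domain of attraction of the normal law because its truncated second moment $E X^2 I(|X| \le x) = 2\log x$ is slowly varying. It then estimates how often $|S_n/V_n| \le 1.96$; the sample size, replication count and seed are arbitrary choices.

```python
import math
import random


def symmetric_pareto(rng, alpha=2.0):
    """Draw X with P(|X| > x) = x**(-alpha) for x >= 1 and a random sign."""
    u = 1.0 - rng.random()                  # uniform on (0, 1]
    magnitude = u ** (-1.0 / alpha)
    return magnitude if rng.random() < 0.5 else -magnitude


def self_normalized_sum(xs):
    """Return S_n / V_n for a sample xs."""
    s = sum(xs)
    v = math.sqrt(sum(x * x for x in xs))
    return s / v


def coverage(n, reps, seed=0):
    """Monte Carlo estimate of P(|S_n / V_n| <= 1.96)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [symmetric_pareto(rng) for _ in range(n)]
        if abs(self_normalized_sum(xs)) <= 1.96:
            hits += 1
    return hits / reps


frac = coverage(n=400, reps=1000)
print(frac)                                 # should be roughly 0.95 if the normal limit applies
```

The classically normalized sum $S_n/\sqrt{n}$ has no such limit for this law, since the variance is infinite; the random normalizer $V_n$ is what restores the normal approximation. Convergence is slow here, so the estimate is only expected to be in the neighborhood of 0.95.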
Giné, Götze and Mason (1997) [29] also showed that, if the self-normalized sums $S_n/V_n$, $n \in \mathbb{N}$, are stochastically bounded, then they are uniformly sub-Gaussian in the sense that $\sup_{n \in \mathbb{N}} E e^{tS_n/V_n} \le 2e^{ct^2}$ for all $t \in \mathbb{R}$ and some $c < \infty$. This in turn yields a basic ingredient in the proof of Theorem 2.1: the moments of $S_n/V_n$ converge to those of an $N(0, 1)$ random variable whenever $S_n/V_n$ is asymptotically standard normal.
The second conjecture of LMRS was confirmed by Chistyakov and Götze (2004b) [13], who proved the following theorem.
Theorem 2.2. $S_n/V_n$ converges weakly to a random variable $Z$ such that
(a) $P(|Z| = 1) < 1$ if and only if (1) $X$ is in the domain of attraction of a stable law with $\alpha \in (0, 2]$; ... in the domain of attraction of Cauchy's law and Feller's condition holds: $\lim_{n \to \infty} nE\sin(X/a_n)$ exists and is finite, where ...
(b) $P(|Z| = 1) = 1$ if and only if $P(|X| > x)$ is a slowly varying function at $+\infty$.
The proofs of Chistyakov and Götze (2004b) [13] are very technical. It would be interesting to find an alternative approach. In the independent, but not identically distributed case, Mason (2005) [46] considered self-normalized triangular arrays. The result in [46] is stated as follows.
Theorem 2.3. Let $X_{1,n}, \ldots, X_{n,n}$, $n \ge 1$, be a triangular array of independent infinitesimal random variables. Assume that ... for a nondegenerate pair $(U, V)$. Then $\sum_{i=1}^n X_{i,n} \big/ \big(\sum_{i=1}^n X_{i,n}^2\big)^{1/2} \to_D N(0, 1)$ if and only if, for some $\tau > 0$, ...
Mason (2005) [46] also established other general results. For instance, Theorem 1 of [46] leads to an alternative proof of the Giné, Götze and Mason (1997) [29] result. In the independent symmetric case, the following result from Egorov (1996) [26] is also of interest.
Theorem 2.4. Let $X_1, X_2, \ldots$ be independent random variables that are symmetric around zero. Then $S_n/V_n \to_D N(0, 1)$ if and only if ...
Note that (2.1) is equivalent to the condition that $X$ is in the domain of attraction of the normal law if $\{X_j, j \ge 1\}$ is a sequence of i.i.d. random variables (cf. O'Brien (1980) [48]). Also, it is readily seen that the Lindeberg condition implies (2.1). However, it is not clear at this moment whether or not Theorem 2.4 still holds for general independent random variables, i.e., without assuming $\{X_j, j \ge 1\}$ to be symmetric. In the i.i.d. case, Theorem 2.4 was previously proved in Griffin and Mason (1991) [32].
The extension of the self-normalized central limit theorem to a Donsker type functional central limit theorem was established in Csörgő, Szyszkowicz and Wang (CsSzW) (2003b) [19]. Define $S_{[nt]} = \sum_{i=1}^{[nt]} X_i$. The following theorem comes from their Theorem 1.
Theorem 2.5. As $n \to \infty$, the following statements are equivalent: (a) $EX = 0$ and $X$ is in the domain of attraction of the normal law; ...
Assuming appropriate conditions, there are two immediate analogs of Theorem 2.5 when $\{X_j, j \ge 1\}$ is a sequence of independent random variables with $EX_j = 0$ and finite variances $EX_j^2$. Write $B_n^2 = \sum_{j=1}^n EX_j^2$. If the Lindeberg condition holds, namely ... for all $\varepsilon > 0$, then it is readily seen that $V_n^2/B_n^2 \to_P 1$. Hence it follows easily from classical results (cf., e.g., Prohorov (1956) [53]) that $S_{K_n(t)}/V_n \to_D W(t)$ on $(D[0, 1], \rho)$, where $K_n(t) = \sup\{m : B_m^2 \le tB_n^2\}$. By a method similar to that used for Theorem 2.5, we can also redefine $\{X_j, j \ge 1\}$ on a richer probability space together with a sequence of independent normal random variables $\{Y_j, j \ge 1\}$ with mean zero and $\mathrm{Var}(Y_j) = \mathrm{Var}(X_j)$ such that ... provided that the Lindeberg condition holds. Furthermore, CsSzW (2003b) [19] also proved the following result for self-normalized, self-randomized partial sum processes of independent random variables.
Since we assume $EX_j = 0$, the investigation of limit behaviors for $S_n/V_n$ in Theorems 2.1-2.6 is only related to the centralized Student t-statistic. The limit behavior of the non-central Student t-statistic was discussed in Bentkus, Jing, Shao and Zhou (2007) (BJSZ for short) [8]. Under the assumption of $EX^2 < \infty$, the limit behavior of the non-central t-statistic is shown to be different for the following two cases: (a) ... where $Y$ is a standardized Bernoulli random variable. The following theorem comes from BJSZ's Theorems 1 and 2.
... (ii) For any $X$ other than the one given in (i), we have ... in the domain of attraction of the stable law with an index $\tau \in [1, 2]$, and that ... where $c_n > 0$ (slowly varying) and $d_n$ (diverging to $\infty$) are constants related to the limit law: ... where ... We have ...
In BJSZ, the authors also considered the limit behavior of the non-centralized Student t-statistic when $EX^2 = \infty$ or $\mu = \mu_n = o(n)$.

Absolute errors in the central limit theorems
There are mainly two approaches for estimating the error of the normal approximation in Section 2. One is to study the absolute error in the self-normalized central limit theorem via a Berry-Esseen bound or an Edgeworth expansion. Put $b_n = \sup\{x : n x^{-2} E\{X^2 I(|X| \le x)\} \ge 1\}$ and ... As a major advance in this direction, Bentkus and Götze (1996) [7] refined the results in Slavova (1985) [63] as well as Hall (1988) [34] and showed the following.
Theorem 3.1. If $X$ is in the domain of attraction of the normal law, then ... where $A$ is an absolute constant.
The Berry-Esseen bounds provide an upper bound for the rate of convergence in the central limit theorem. In order to characterize the rate of convergence in the central limit theorem for $S_n/V_n$, Hall and Wang (2004) [35] investigated the leading term under optimal conditions. Letting ... [35] established the following result, among others.
Theorem 3.3. If $X$ is in the domain of attraction of the normal law, then ... where ... If in addition Cramér's condition holds, i.e. ..., then the term ... on the right-hand side of (3.2) can be replaced by $O(n^{-1})$.
Theorem 3.3 asserts that $L_n(x)$ is a leading term in an expansion of the distribution of $S_n/V_n$. Indeed, it was proved in [35] that $\delta_{1n} \to 0$ and ... There exist examples of distributions in the domain of attraction of the normal law, having zero mean, for which any given one of the four components in the definition of $\delta_{1n}$ dominates all the others along a subsequence. It follows that none of the terms of which $\delta_{1n}$ is comprised can be dropped if we require a full account of the rate of convergence in the central limit theorem. Together, properties (3.2) and (3.3) give concise results about the rate of convergence in the central limit theorem. For example, assuming that $X$ is in the domain of attraction of the normal law and $E(X) = 0$, we have ... If additionally $E(|X|^3) < \infty$ and $E(X^2) = 1$, then we also have $\sup$ ... where $\gamma = E(X^3)$, as $n \to \infty$.
Formula (3.4) shows that in the case of finite third moments, the leading term is asymptotic to its conventional form in an Edgeworth expansion. Consequently, if $E|X|^3 < \infty$ and the distribution of $X$ is nonlattice, then ... where ... More results on Edgeworth expansions for Student's t-statistic can be found in Hall (1987) and Bloznelis and Putter (1998, 2002) [9,10].
Wang and Jing (1999) [69] were the first to investigate the non-uniform Berry-Esseen bound for $S_n/V_n$. The result given by [69] was extended in Robinson and Wang (2005) [54], where an exponential non-uniform bound was established under the optimal moment conditions. The following result comes from Theorem 3 of [54].
Theorem 3.4. If $X$ is in the domain of attraction of the normal law, then there exists $0 < \eta < 1$ such that for all $x \in \mathbb{R}$ and $n \ge 1$, ... where $\delta_n$ is defined as in (3.1) and $A$ is an absolute constant.
The constant $\eta$ in Theorem 3.4 may depend on the distribution of $X$ and cannot be replaced by an absolute constant. For example, let $X_1, \ldots, X_n$ be i.i.d. random variables from the distribution $P(X = 1) = 1 - P(X = -p/(1-p)) = p$, where $0 < p < 1$. It is readily seen that $EX = 0$, and for $x = \sqrt{n}$ and $p \ge 1/2$ ... Since $\log p < 0$ and $\log p \uparrow 0$ as $p \uparrow 1$, (3.5) cannot be true for an absolute constant. Corollary 2.3 of [69] provided a result similar to (3.5) under $E|X|^{10/3} < \infty$. However, the corollary misused the concept of absolute constant. It is possible to replace the $\eta$ in (3.5) by 1 if we restrict $x$ to a narrow range or if we require the $X_j$ to be symmetric random variables. Indeed, it follows from Theorem 4.1 below that, if $EX = 0$ and $E|X|^3 < \infty$, then ... where $\sigma^2 = EX^2$ and $A$ is an absolute constant. Furthermore, if $X$ is a symmetric random variable around zero with $E|X|^3 < \infty$, then ... See Wang and Jing (1999) [69] as well as Chistyakov and Götze (2003) [11].
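One way to see why the two-point example rules out an absolute constant: by the Cauchy-Schwarz inequality $S_n \le \sqrt{n}\,V_n$, with equality only when all $X_i$ are equal and nonnegative, so for this distribution the event $\{S_n/V_n \ge \sqrt{n}\}$ occurs exactly when every $X_i = 1$, which has probability $p^n = \exp(n \log p)$. The sketch below confirms this by exact enumeration for a small $n$. The function name and the choice $p = 3/4$, $n = 6$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def prob_extreme_event(p, n):
    """Exact P(S_n >= sqrt(n) * V_n) for P(X = 1) = p, P(X = -p/(1-p)) = 1 - p.

    Uses rational arithmetic and the equivalent test S_n >= 0 and S_n^2 >= n * V_n^2,
    so that the boundary case S_n = sqrt(n) * V_n is detected without rounding error.
    """
    up, down = Fraction(1), -p / (1 - p)
    total = Fraction(0)
    for outcome in product((up, down), repeat=n):
        s = sum(outcome)
        v2 = sum(x * x for x in outcome)
        if s >= 0 and s * s >= n * v2:
            k = sum(1 for x in outcome if x == up)   # number of +1 values
            total += p ** k * (1 - p) ** (n - k)
    return total


p = Fraction(3, 4)
print(prob_extreme_event(p, 6), p ** 6)
```

Since $p^n$ decays only like $\exp(n \log p)$ at $x = \sqrt{n}$, and $\log p$ can be made arbitrarily close to zero, no exponential rate in $x^2$ that is independent of $p$ can hold at this point.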

Relative errors in the central limit theorems
Section 3 reviewed the absolute error in the central limit theorem for $S_n/V_n$. This section considers the relative error of $P(S_n/V_n \ge x)$ to $1 - \Phi(x)$, that is, the Cramér-type moderate deviation for $S_n/V_n$, which is another approach for estimating the errors in the normal approximation in Section 2. In this regard, Jing, Shao and Wang (2003) [37] refined Shao (1999) [57], Wang and Jing (1999) [69] as well as Chistyakov and Götze (2003) [11], and obtained the following result. Let ...
Theorem 4.1. If $X_1, X_2, \ldots$ are independent random variables with $EX_j = 0$ and $0 < E|X_j|^3 < \infty$, then ... where $O_1$ is bounded by an absolute constant.
[37] actually established more general frameworks and considered applications to the self-normalized law of the iterated logarithm and the studentized bootstrap. There are several further extensions in i.i.d. settings. Using an example, Chistyakov and Götze (2004a) [12] proved that the result in [37] is sharp. Robinson and Wang (2005) [54] established a Cramér type result under the optimal moment condition, that is, under the assumption that $X$ is in the domain of attraction of the normal law. Assuming $EX^4 < \infty$, Wang (2005) [66] as well as Wang and Hall (2009) [68] proved that $P(S_n/V_n \ge x)/\{1 - \Phi(x)\}$ equals, to first order, $\exp\{-x^3 EX^3/(3\sqrt{n}\,\sigma^3)\}$, where $\sigma^2 = EX^2$. Explicitly, the following Theorem 4.2 is from Theorem 1.1 of Wang (2005) [66].
Theorem 4.2. Assume that $EX = 0$ and $EX^4 < \infty$. Then, for $x \ge 0$ and $x = O(n^{1/6})$, ...
We mention that the proofs of Theorems 4.1 and 4.2 (and of other related results in the cited articles) depend heavily on the following ideas and facts. First of all, by the Cauchy inequality $xV_n \le (x^2 + b^2V_n^2)/(2b)$, where $b := b_x = x/B_n$, we have $P(S_n \ge xV_n) \ge P(2bS_n - b^2V_n^2 \ge x^2)$. Since the $Y_i := 2bX_i - b^2X_i^2$ are independent with $Ee^{tY_i} < \infty$ for any $t > 0$, the lower bound of (4.1) was obtained from the fact that: ... By the conjugate method, it is easy to prove (4.5), and it is also possible to provide a more concise result.
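The Cauchy-inequality step behind the lower bound can be spelled out as follows. This is a short worked derivation under the notation of the text ($b > 0$, $x \ge 0$); the symbol $Y_i = 2bX_i - b^2X_i^2$ is used here for the summands of $2bS_n - b^2V_n^2$.

```latex
% Step 1: (b V_n - x)^2 >= 0 gives the elementary bound
\[
  x V_n \;\le\; \frac{x^2 + b^2 V_n^2}{2b}, \qquad b > 0 .
\]
% Step 2: hence the event {2b S_n - b^2 V_n^2 >= x^2} is contained in {S_n >= x V_n}:
\[
  P(S_n \ge x V_n) \;\ge\; P\bigl(2 b S_n - b^2 V_n^2 \ge x^2\bigr)
  \;=\; P\Bigl(\sum_{i=1}^n Y_i \ge x^2\Bigr),
  \qquad Y_i = 2 b X_i - b^2 X_i^2 .
\]
% Step 3: each Y_i is bounded above, so its moment generating function is finite:
\[
  Y_i = 1 - (1 - b X_i)^2 \le 1
  \quad\Longrightarrow\quad
  E e^{t Y_i} \le e^{t} < \infty \quad \text{for all } t > 0 .
\]
```

The point of the reduction is that the self-normalized tail is bounded below by the tail of a sum of independent variables with finite exponential moments, to which the conjugate (exponential tilting) method applies without any moment assumption on $X_i$ itself.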
A truncation technique is used in establishing the upper bound of (4.1). Let $\tau := \tau_{n,x} = B_n/\max\{1, x\}$ and define $\bar X_i = X_i I(|X_i| \le \tau)$. We may write ... where $p_{1n} = P(S_n \ge xV_n, X_i = \bar X_i, \text{ all } i = 1, \ldots, n)$ and ... It is relatively easy to show that ... Therefore the key step in the proof of (4.1) is to establish the result: ... This was done in Jing, Shao and Wang (2003) [37] (and for other related results in the cited articles) by separating $x$ into "small" and "large" cases. Explicitly, in [37] we proved ... (4.9) and ... for $0 \le x \le L_{3n}^{-1/3}$. By using (4.9) we obtain (4.8) when $x$ is "large", and by using (4.10) we obtain (4.8) when $x$ is "small". The proofs of both (4.9) and (4.10) are difficult and very technical.

... $\bar X_i^2$. This, together with (4.4), heuristically provides the fact that ... As noticed before, it is relatively easy to derive a concise estimate for $P(2bS_n - b^2V_n^2 \ge x^2)$. Based on these observations, Wang (2011) [67] provided an alternative proof of (4.8). Furthermore, he gave a more concise estimate of $P(S_n/V_n \ge x)$ than (4.1), which is stated as follows.
Let's start with some notation and basic facts.
... $> 0$ for all $\lambda > 0$ and $x \ne 0$, where $Z_1, \ldots, Z_n$ are independent random variables with $Z_j$ having distribution function $V_j(u)$ defined by ... (4.12) Hence, for each $x \ne 0$, $m'(\lambda)$ is a strictly increasing function for $\lambda > 0$. Furthermore, there exists an absolute constant $A_0$ such that, for all $x > 0$ satisfying ..., the equation $m'(\lambda) = x^2$ has a unique solution $\lambda_0 > 0$. For this $\lambda_0$, [67] established the following result. Write ... where $\delta = \delta(x)$ and $c = c(x)$ are defined later.
... where $\tau = B_n/\max\{1, x\}$, and denote by $\lambda_1$ the solution of the equation $m'(\lambda) = x^2 + 2xc$.
Theorem 4.4. If $X_1, X_2, \ldots$ are independent random variables with $EX_j = 0$ and $0 < E|X_j|^3 < \infty$, then ... or equivalently, ... uniformly for $|c| \le x/5$ and for all $x > 0$ satisfying $x \le \frac{1}{3} B_n/(\max_i E|X_i|^3)^{1/3}$ and $x \le L_{3n}^{-1}/A_0$, where $O_1$ and $O_2$ are bounded by an absolute constant. Consequently, we also have ... where $O_1$ and $O_2$ are bounded by an absolute constant.
Theorem 4.4 provides a more concise estimate of $P(S_n/V_n \ge x)$ than (4.1). It also improves the general result in Jing, Shao and Wang (2003) [37]. Assuming that $X_1, X_2, \ldots, X_n$ are i.i.d. random variables, result (4.17) gives Theorem 4.2 and Theorem 1 of Wang and Hall (2009) [68]. It would be interesting to see whether or not it is possible to remove the error term $O_1\Delta_{n,x}$ in (4.16). If $x$ is in a small range [for instance $0 \le x \le O(n^{1/6})$ in the i.i.d. settings], then the term $O_1\Delta_{n,x}$ is smaller than $xL_{3n}$, and hence can be removed (see Theorem 4.3). However, it plays a part when $x$ is in a medium or large range [for instance ...]. In view of the difference between (4.4) and (4.11), the current truncation technique seems to produce an error term like $O_1\Delta_{n,x}$.
Based on these facts, in order to obtain a better estimate of $P(S_n/V_n \ge x)$ in a medium or large range for $x$, we may have to use a completely different technique. Jing, Shao and Zhou (2004) [38] [see also Zhou and Jing (2006) [71]] investigated a saddle-point approximation for the tail probability $P(S_n/V_n \ge x)$ in a very large range for $x$, that is, $x = c\sqrt{n}$ with $0 < c < 1$ in the i.i.d. settings (see the review in the next section). It is not clear at the moment whether the technique in [38] or [71] can be employed to provide a better approximation for $P(S_n/V_n \ge x)$ in a medium range for $x$ [i.e., ...]. We note that [38] derived their results without imposing any moment conditions on $X$, but we do require a moment condition to establish a better approximation for $P(S_n/V_n \ge x)$ in a small range for $x$. For instance, a finite third moment is necessary to establish ..., where $C$ is a constant. In view of this fact, some significant modifications might be necessary even if the technique in [38] could be made to work for a medium range of $x$. On the other hand, the problem may be solvable by the method developed in Shao and Zhou (2012) [61], where a new randomized concentration inequality is obtained to establish a Cramér type moderate deviation theorem for self-normalized non-linear statistics (see Section 6).
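To get a feel for the size of the first-order correction quoted before Theorem 4.2, namely $P(S_n/V_n \ge x)/\{1 - \Phi(x)\} \approx \exp\{-x^3 EX^3/(3\sqrt{n}\,\sigma^3)\}$, the following sketch evaluates the corrected tail. It is an illustration only: the function names are hypothetical, the approximation is meaningful only in the small range $x = O(n^{1/6})$ under $EX^4 < \infty$, and the example values ($EX^3 = 2$, $\sigma = 1$, as for a centered unit exponential variable) are an assumption chosen for concreteness.

```python
import math


def normal_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def first_order_tail(x, n, third_moment, sigma):
    """Normal tail times the first-order skewness factor exp(-x^3 EX^3 / (3 sqrt(n) sigma^3))."""
    factor = math.exp(-x ** 3 * third_moment / (3.0 * math.sqrt(n) * sigma ** 3))
    return normal_tail(x) * factor


# Illustrative values: n = 100, x = 2, EX^3 = 2, sigma = 1.
approx = first_order_tail(2.0, 100, 2.0, 1.0)
print(normal_tail(2.0), approx, approx / normal_tail(2.0))
```

For a positively skewed $X$ the factor is below one, so the upper tail of $S_n/V_n$ is lighter than the normal tail, and the relative discrepancy grows like $x^3/\sqrt{n}$.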

Cramér-Chernoff type large deviation and saddle-point approximation
Section 4 considered the relative error of $P(S_n/V_n \ge x)$ to $1 - \Phi(x)$. In the i.i.d. settings, assuming $EX = 0$ and $E|X|^3 < \infty$, it follows from Theorem 4.4 that ... whenever $x_n \to \infty$ and $x_n = o(n^{1/2})$. It is interesting to notice that the result (5.1) may be proved directly under fewer moment conditions. See Shao (1997) [55] and Jing, Shao and Zhou (2008) [38]. Indeed, Shao (1997) [55] showed that if $EX = 0$ and the distribution of $X$ is in the domain of attraction of the normal law, then (5.1) holds true. Furthermore, [38] established the following more general Theorem 5.1. Denote the support of $X$ by $C_s$, that is, ... Say $X$ is in the centered Feller class if $X \in F_\theta$ for some $0 \le \theta < \infty$, where ... Then we have the following.
Theorem 5.1. Suppose that ... Also assume that $X$ is in the centered Feller class. Then, for any sequence $\{x_n, n \ge 1\}$ satisfying $x_n \to \infty$ and ... where $\lambda(x) = \inf_{b>0} \sup_{t \ge 0} (tx$ ...
The result (5.2) also holds true when $x_n = \varepsilon\sqrt{n}$ for some $\varepsilon > 0$. In fact, in this situation, the condition that $X$ is in the centered Feller class is not necessary, as the following theorem proved in [55] shows.
Theorem 5.2. Assume that either $EX = 0$ or $EX^2 = \infty$. Then, for $\varepsilon > EX/(EX^2)^{1/2}$, ... where $\lambda(x)$ is defined as in Theorem 5.1, and $EX/(EX^2)^{1/2}$ is interpreted to be zero if $EX^2 = \infty$ and $0/0$ to be $\infty$.
Note that no conditions are required for the result (5.3), since it is natural to assume $EX = 0$ if $EX^2 < \infty$. By adding some smoothness conditions on the distribution of $X$, [38] derived the saddle-point approximation for the tail probability $P(S_n/V_n \ge \varepsilon\sqrt{n})$ without any moment conditions. Indeed, it is proved in [38] that, if the distribution function $F(x)$ of $X$ is continuous, then, for any $0 < \varepsilon < 1$, the equation ... has a unique solution $(s_0, t_0, a_0)$ such that $s_0 > 0$, $t_0 < 0$ and $a_0 > 0$, where ... and
$$\Delta = \begin{matrix} K_{ss}(s_0, t_0) & K_{st}(s_0, t_0) \\ K_{st}(s_0, t_0) & K_{tt}(s_0, t_0) \end{matrix}\,.$$

Other self-normalized limit theorems
Sections 2-5 reviewed current developments in the study of the self-normalized sums $S_n/V_n$, along the lines related to central limit theorems. This section briefly mentions other important self-normalized limit theorems.

Self-normalized laws of the iterated logarithm
Griffin and Kuelbs (1989) [31] were the first to investigate the laws of the iterated logarithm for self-normalized sums. The following beautiful result is from their Theorem 1.
Theorem 6.1. If $EX = 0$ and $X$ is in the domain of attraction of the normal law, then ...
Griffin and Kuelbs (1989) [31] and later Shao (1997) [55] also discussed the laws of the iterated logarithm under the condition that $X$ is in the domain of attraction of a stable law. More recently, Jing, Shao and Zhou (2008) [39] established a general result under the condition that $X$ is in the centered Feller class defined in Section 5. The following Theorem 6.2 is from their Theorem 1.2. Note that Theorem 6.2 also extends the results given by Giné and Mason (1998) [30].
Theorem 6.2. Under the conditions of Theorem 5.1, ... where $t_0 = \lim_{x \to 0^+} t_x$, and $(t_x, b_x)$ is the solution of the following equations: ...
There are other extensions of the self-normalized laws of the iterated logarithm. For instance, Csörgő and Hu (2013) [16] established a strong approximation result, Csörgő, Hu and Mei (2013) [17] obtained a Strassen-type law of the iterated logarithm, Dembo and Shao (1998, 2006) [23,24] considered self-normalized laws of the iterated logarithm in $\mathbb{R}^d$, and de la Peña, Klass and Lai (2000, 2004) [50,51] investigated the laws of the iterated logarithm for self-normalized martingales. The latter also derived other results for self-normalized processes.

Darling-Erdős type theorem and maximum of self-normalized sum
CsSzW (2003a) [18] and later Wang (2004) [65] investigated the asymptotic distribution of the maximum of self-normalized sums, $\max_{1 \le k \le n} S_k/V_k$. The following Darling-Erdős type result, Theorem 6.3, comes from a theorem of Wang (2004) [65].
Theorem 6.3. Suppose that $l(x)$ is a slowly varying function at $\infty$, satisfying $l(x) \le c_1 \exp\{c_2 (\log x)^\beta\}$ for some $c_1 > 0$, $c_2 > 0$ and $0 \le \beta < 1/2$. Then, for every $t \in \mathbb{R}$, we have that ...
Self-normalization significantly reduces the moment conditions in comparison to the classical result. Indeed, the classical Darling-Erdős theorem shows that ... where $\sigma^2 = EX^2$, if and only if $EX = 0$ and ... See Einmahl (1989) [28]. The necessary condition for the result (6.3) remains an open question.
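The statistic $\max_{1 \le k \le n} S_k/V_k$ is straightforward to compute in a single pass over the data. The sketch below does so and, for comparison, also returns the related statistic $\max_{1 \le k \le n} S_k/V_n$, in which every partial sum is normalized by the full-sample $V_n$. The function name is hypothetical, and the code assumes the sample is not identically zero.

```python
import math


def max_self_normalized(xs):
    """Return (max_k S_k / V_k, max_k S_k / V_n) for a sample xs.

    Terms with V_k = 0 (leading zeros in the sample) are skipped in the first maximum.
    Assumes at least one nonzero observation, so that V_n > 0.
    """
    s = 0.0
    v2 = 0.0
    max_ratio = -math.inf   # running max of S_k / V_k
    max_sum = -math.inf     # running max of S_k
    for x in xs:
        s += x
        v2 += x * x
        if v2 > 0.0:
            max_ratio = max(max_ratio, s / math.sqrt(v2))
        max_sum = max(max_sum, s)
    return max_ratio, max_sum / math.sqrt(v2)


# Partial sums are 1, 0, 2 and V_k^2 is 1, 2, 6 for this toy sample.
print(max_self_normalized([1.0, -1.0, 2.0]))
```

In the toy sample the first statistic is attained at $k = 1$ (where $S_1/V_1 = 1$), while the second is attained at $k = 3$, which already hints that the two maxima can behave quite differently.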
The asymptotic behavior of $\max_{1 \le k \le n} S_k/V_n$ is different from that of $\max_{1 \le k \le n} S_k/V_k$. This claim can be justified by the following result, coming from Theorem 1 of Liu, Shao and Wang (2012) [43].
[43] actually established more general results which improved those of Hu, Shao and Wang (2009) [36]. It should be mentioned that Theorem 6.4 is comparable to the large deviation result for the maximum of partial sums given in Aleshkyavichene (1979) [1]. However, the latter requires a finite exponential moment condition. If we are only interested in a Chernoff type large deviation, the third moment condition required in Theorem 6.4 can be reduced significantly. Indeed, [36] proved the following theorem.
... for any $x_n \to \infty$ with $x_n = o(\sqrt{n})$.
We also refer to [23,24] and [42] for other limit theorems related to the $T^2$-statistic. Because of its usefulness in statistical inference, it would be interesting to find further sharp results for the $T^2$-statistic as in Sections 2-5. For instance, we conjecture that the result (6.9) still holds if only $E\|X_1 - \mu\|^3 < \infty$.

Limit theorems for studentized non-linear statistics
Non-linear statistics are used in various statistical inference problems. It is known that many of them can be written as a partial sum of independent random variables plus a negligible term. Typical examples include U-statistics, multi-sample U-statistics, L-statistics, random sums and functions of non-linear statistics. Since the standardized non-linear statistics often involve unknown nuisance parameters, their Studentized analogues are commonly used in practice.
Let $\xi_1, \ldots, \xi_n$ be independent random variables satisfying $E\xi_i = 0$. Assume the non-linear statistic of interest can be decomposed as a standardized partial sum of $\{\xi_i\}$, say $W_n$, plus a remainder, say $D_1$. Then the Studentized analogues can be written as ... where ..., $D_1$ and $D_2$ are measurable functions of $\{\xi_i\}$, $1 \le i \le n$. Examples satisfying (6.10) include the t-statistic, Studentized U-statistics and L-statistics. There are many works on the asymptotic distribution theory for the Studentized non-linear statistics $T_n$. We only refer to Wang, Jing and Zhao (2000) [70] as well as Shao, Zhang and Zhou (2012) [62] for general Berry-Esseen bounds under minor moment conditions. A general Cramér type moderate deviation theorem for the Studentized statistics $T_n$ was established in Shao and Zhou (2012) [61]. For $1 \le i \le n$ and $x \ge 0$, let