
This is partly an expository paper. We prove and highlight a quantile inequality that is implicit in the fundamental paper of Komlós, Major, and Tusnády [31] on Brownian motion strong approximations to partial sums of independent and identically distributed random variables. We also derive a number of refinements of this inequality that hold under additional assumptions. Several examples are detailed that are likely to be of independent interest. We especially call attention to applications to the asymptotic equivalence theory of nonparametric statistical models and to nonparametric function estimation. AMS 2000 subject classifications: Primary 62E17; secondary 62B15.


Introduction
Komlós, Major, and Tusnády [KMT] [31, 32] approximations to the partial sum and empirical processes are two of the most important results in probability of the last forty years. In particular, they proved the following powerful Gaussian coupling to partial sums [PS] of i.i.d. random variables. (We shall use the words approximation and coupling almost interchangeably.)

Theorem [PS]. Let X be a random variable with mean 0 and variance 0 < σ² < ∞. Also assume that E exp(a|X|) < ∞ for some a > 0. Then on the same probability space there exist i.i.d. X random variables X₁, X₂, ..., and i.i.d. standard normal random variables Z₁, Z₂, ..., such that for positive constants C, D and λ, for all x ∈ R and n ≥ 1,

P{ max_{1≤k≤n} | ∑_{i=1}^{k} X_i − σ ∑_{i=1}^{k} Z_i | ≥ C log n + x } ≤ D exp(−λx).

This is Theorem 1 of KMT [32]. The original version, given in Theorem 1 of KMT [31], is stated under added conditions.
One of the key tools needed in its proof was a quantile inequality. To describe it let us introduce some notation. Let {Y_n}_{n≥1} be a sequence of random variables and for each integer n ≥ 1 let F_n denote the cumulative distribution function [cdf] of Y_n. Its inverse distribution function or quantile function is defined by

F_n^{−1}(s) = inf{x : F_n(x) ≥ s}, 0 < s < 1.

Let Z denote a standard normal random variable, Φ be its cdf and φ its density function. Since Φ(Z) =_d U, a Uniform(0, 1) random variable, we see that for each integer n ≥ 1, F_n^{−1}(Φ(Z)) =_d Y_n. For this reason, we shall from now on write for convenience

Y_n = F_n^{−1}(Φ(Z)).    (4)

Consider now the special case of {Y_n}_{n≥1} such that for each n ≥ 1,

Y_n = (X_1 + · · · + X_n)/(σ√n),    (5)

where X₁, X₂, ..., are i.i.d. X satisfying the conditions of Theorem [PS]. Fundamental to the proof of Theorem [PS] is the following quantile inequality, which is implicit in the proof of Theorem 1 of KMT [31].
We shall soon show that if additional assumptions are imposed on X then this inequality can be improved, in particular when X is symmetric and its distribution has a nonzero absolutely continuous component. Refer to Theorem 3 and Proposition 5 below for details.
We shall also see that this inequality leads to a coupling of Y_n and Z such that a tail bound holds for suitable constants C > 0 and λ > 0, which via Lemma A1 of Berkes and Philipp [4] implies that for each integer n ≥ 1 there exist X₁, ..., X_n i.i.d. X and i.i.d. standard normal random variables Z₁, ..., Z_n such that on a suitable probability space inequality (9) below holds for all z ≥ 0.

KMT [31] also stated the following Brownian bridge coupling to the uniform empirical process α_n, along with an outline of its proof. But first, here is the definition of α_n. Let U₁, U₂, ..., be a sequence of independent Uniform(0, 1) random variables. For each integer n ≥ 1 let

G_n(t) = n^{−1} ∑_{i=1}^{n} 1{U_i ≤ t}, 0 ≤ t ≤ 1,

denote the empirical distribution function based on U₁, ..., U_n. The uniform empirical process [EP] α_n is the process

α_n(t) = √n (G_n(t) − t), 0 ≤ t ≤ 1.    (7)

Theorem [EP]. There exists a probability space (Ω, A, P) with a sequence of independent Uniform[0, 1] random variables U₁, U₂, ..., a sequence of Brownian bridges B₁, B₂, ..., and positive constants a, b and c such that for all n ≥ 1 and x ∈ R,

P{ sup_{0≤t≤1} |α_n(t) − B_n(t)| ≥ n^{−1/2}(a log n + x) } ≤ b exp(−cx).

Mason and van Zwet [38], Major [36] and Mason [37] have published the details of the proof of Theorem [EP] based on Proposition [KMT] as it applies to

Y_n = (2S_n − n)/√n,    (8)

where S_n is a Binomial random variable with parameters n and 1/2. A proof of Theorem [EP] can also be obtained using a binomial inequality due to Tusnády (Proposition [T]) [55]. This inequality is often referred to as the Tusnády lemma, which for comparison we state here.
Proposition [T]. For all integers n ≥ 1, with Y_n as in (8) and in (4),

|Y_n − Z| ≤ (2/√n)(1 + Z²/8).

Tusnády [55] did not provide a fully detailed proof of his lemma. In fact, M. Csörgő and Révész [12] remarked in their monograph on strong approximations, "Although the proof of the inequality is elementary, it is not simple. It will not be given here however." When Bretagnolle and Massart [6] published a complete proof of the Tusnády lemma, it indeed was not simple. Other proofs of the Tusnády lemma can be found in M. Csörgő and Horváth [11], Dudley [17], Massart [40] and Lawler and Trujillo Ferreras [33]. Carter and Pollard [9] improved upon the Tusnády inequality. More specifically, they showed that with Y_n as in (8) and (4) a strictly sharper coupling bound holds for some C, ε > 0. Bretagnolle and Massart [6], Csörgő and Horváth [11] and Dudley [17] also give proofs of Theorem [EP] based on the Tusnády lemma.
It is sometimes thought that the Tusnády lemma is indispensable to its proof. However, this is not the case. As pointed out above, the original proof as sketched in KMT [31] was based on the binomial special case of Proposition [KMT].
Clearly the quantile coupling of a standardized sum of i.i.d. Bernoulli(1/2) random variables with a normal random variable lies at the heart of the KMT construction for the empirical process.
In the last decade the KMT construction has played a key role in the progress of the theory of asymptotic equivalence of experiments. Nussbaum [41] made a remarkable breakthrough in asymptotic equivalence theory using KMT. He established the asymptotic equivalence of density estimation and Gaussian white noise under a Hölder smoothness condition. A major step toward the proof of this equivalence result is the functional KMT construction for the empirical process by Koltchinskii [30]. His construction relies on the Tusnády lemma. The main consequence of this result is that an asymptotically optimal result in one of these nonparametric models automatically yields an analogous result in the other model.
Our paper is largely expository. Its two goals are to spotlight and prove a basic quantile inequality that is implicit in KMT [31], as well as to establish the improvements that we alluded to above, and then to show how they can be used to obtain a number of interesting couplings of a sequence of random variables Y_n to a standard normal random variable Z and to describe their applications in probability and statistics. As a by-product, we get Proposition [KMT] and (9) as special cases of formally more general results. In an applications section we shall also describe how refinements of the KMT quantile inequality lead to advances in the theory of asymptotic equivalence of experiments. It is hoped that this paper helps to make these quantile inequalities known to a wider mathematical and statistical audience.
Our paper is organized as follows. In Section 2 we state our basic quantile couplings; in Section 3 we discuss examples of their use in probability theory and the theory of statistical experiments. Section 4 is devoted to proofs. Appendices A, B and C provide additional information for the interested reader.

The KMT quantile inequality
The following quantile inequality is essentially due to KMT [31] and can be inferred from their analysis. That it holds more generally than in the i.i.d. sum setup of Proposition [KMT] is more or less known. (See Remark 1 below.) The proof that we provide here basically follows the lines of that given for the special case of the standardized Binomial random variable (8) with parameters p = 1/2 and n in Section 1.2 of Mason [37]. This proof, in turn, was largely adapted from notes taken from the Diplomarbeit of Richter [44]. Very similar details are to be found in Einmahl [18]. Let {F_n}_{n≥1} be a sequence of cdfs, not necessarily that of a sequence of sums of i.i.d. random variables of the form (5), and let Y_n be defined as in (4).

Theorem 1. With the above notation, assume there exist a sequence K_n > 0, a sequence 0 < ε_n < 1 and an integer n₀ ≥ 1 such that conditions (10), (11), (12) and (13) hold for all n ≥ n₀. Then whenever n ≥ n₀ ∨ 64K_n² and (14) is satisfied, the quantile inequality (15) holds.

Remark 1. Though not explicitly stated in KMT [31], Theorem 1 has long been known in one form or another by practitioners in strong approximation theory.
Theorem A in KMT [31] implies that if X₁, X₂, ..., are i.i.d. X satisfying the assumptions of Theorem [PS], then the sequence of random variables {Y_n}_{n≥1} defined in (5) satisfies (10), (11), (12) and (13). KMT [31] do not provide a proof of their Theorem A; instead they refer the reader to a large deviation theorem in Petrov [42] (see Theorem 1 on page 218 of Petrov [42] or Theorem 5.23 of Petrov [43]), where it is pointed out that, though the result is formulated under the more restrictive assumption that ε_n → 0 as n → ∞, his proof is applicable to establishing this refinement. Direct proofs of the fact that {Y_n}_{n≥1} fulfills the assumptions of Theorem 1 are given by Einmahl [18] (see his Corollary 1 for a more general result from which this fact follows) and Theorem 3.1 of Arak and Zaitsev [2]. Theorem 1 of Einmahl [18] provides conditions under which the assumptions hold for sums of independent but not necessarily identically distributed random variables. For further quantile inequalities along this line consult Sakhanenko [46, 48, 49].
Theorem [PS] implies that under its assumptions and on its probability space, as n → ∞,

max_{1≤k≤n} | ∑_{i=1}^{k} X_i − σ ∑_{i=1}^{k} Z_i | = O(R_n), a.s.,    (16)

with rate R_n = O(log n). We should point out that the improved quantile inequalities that are shown in subsection 2.2 to hold under additional assumptions on X do not in general lead to corresponding improvements in the rate in (16); namely, O(log n) cannot be replaced by o(log n). A result of Bártfai [3] implies that this can only happen when σ^{−1}X =_d Z. Refer to Theorem 2.3.2 in M. Csörgő and Révész [12] and especially to the very nice expository paper on strong invariance principles by P. Major [35]. In short, unless σ^{−1}X =_d Z, the best rate possible in (16) is O(log n).
Remark 2. Obviously, whenever K_n²/n → 0 there exists an integer n₁ ≥ 1 such that for all n ≥ n₁ we have n ≥ n₀ ∨ 64K_n².

Remark 3. In typical applications K_n = K, ε_n = ε and η_n = η for all n ≥ 1, where K, ε and η are fixed positive constants.
Here is a special case of Theorem 1 that will lead to some interesting applications.
Theorem 2. Assume there exist an L > 0, a 0 < ε < 1, a p ≥ 2 and an integer n₀ ≥ 1 such that for all n ≥ n₀ and 0 < z ≤ εn^{1/p} the analogues of conditions (10)-(13) hold. Then whenever n ≥ n₀ ∨ 64L²n^{1−2/p} and (14) is satisfied with η = ε ∧ (1/(8L)), the conclusion of Theorem 1 holds.

Proof. The proof follows from Theorem 1 by setting K_n = Ln^{1/2−1/p} and ε_n = εn^{1/p−1/2}.

From Theorem 2 we get the following distributional bound for the coupling |Y_n − Z|.
Corollary 1. In addition to the assumptions of Theorem 2, assume that for suitable positive constants a, b and c, inequality (23) holds for all n ≥ 1 and z ≥ 0. Then for positive constants C and λ, inequality (24) holds for all z ≥ 0 and n ≥ 1, where Y_n is defined as in (4).
For conditions that imply that the assumptions of Corollary 1 hold, refer to Prob-example 1 in subsection 3.1.
Here is an interesting application of Theorem 1 and the methods of proof of Corollary 1 to martingale difference sequences. It shows how to apply Theorem 1 when the parameters depend on n.
Corollary 2. Let (ξ_i, F_i)_{i=0,...,n} be a square integrable martingale difference sequence with ξ₀ = 0, satisfying two conditions involving finite positive constants L and M, together with a further condition for all n ≥ 1. Then there exist constants α > 0 and D > 0 and an integer n₁ ≥ 2 such that whenever n ≥ n₁ and the analogue of (14) holds, where Y_n = n^{−1/2} ∑_{i=1}^{n} ξ_i and Y_n is defined as in (4), the corresponding quantile inequality holds. Furthermore, there exist positive constants C and λ such that inequality (30) holds for all z ≥ 0 and n ≥ 1.

By repeated application of Lemma A.1 of Berkes and Philipp [4] in combination with the Kolmogorov extension theorem we obtain the following proposition.
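As a concrete numerical illustration of this martingale setting (a sketch under our own choices, not a construction from the paper: the sequence, sample sizes, and tolerances below are ours), the following simulates a bounded martingale difference sequence with conditional mean 0 and conditional variance 1 and checks that Y_n = n^{−1/2} ∑ ξ_i is approximately standard normal in its first two moments.

```python
import math
import random

random.seed(7)

def mds_sum(n):
    """One realization of Y_n = n**-0.5 * sum(xi_i), where
    xi_i = eta_i * s_{i-1}: eta_i are i.i.d. fair +/-1 coins and
    s_{i-1} = +/-1 is a predictable sign depending on the past, so
    E[xi_i | past] = 0 and E[xi_i**2 | past] = 1."""
    s_prev = 1.0
    total = 0.0
    for _ in range(n):
        eta = random.choice((-1.0, 1.0))
        total += eta * s_prev
        # the next sign is measurable with respect to the past
        s_prev = 1.0 if total >= 0 else -1.0
    return total / math.sqrt(n)

reps = 2000
ys = [mds_sum(100) for _ in range(reps)]
mean = sum(ys) / reps
var = sum((y - mean) ** 2 for y in ys) / reps
print(round(mean, 3), round(var, 3))
```

With the seed fixed, the sample mean is close to 0 and the sample variance close to 1, as the normal coupling suggests.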
Proposition 1. Let ξ₁, ξ₂, ..., be a sequence of random variables on the same probability space. For each integer n ≥ 1 let g_n be a measurable function from R^n to R and p_n be a non-negative function defined on [0, ∞). Suppose that for each integer n ≥ 1 there exists a probability space on which sit (ξ̃₁, ..., ξ̃_n) with (ξ̃₁, ..., ξ̃_n) =_d (ξ₁, ..., ξ_n) and a standard normal random variable Z_n such that for all z ≥ 0 and n ≥ 1,

P{ |g_n(ξ̃₁, ..., ξ̃_n) − Z_n| > z } ≤ p_n(z).    (31)

Then one can construct a probability space on which sit a sequence of random variables ξ̄₁, ξ̄₂, ... having the same distribution as ξ₁, ξ₂, ..., and a sequence of standard normal random variables Z₁, Z₂, ..., such that inequality (31) holds for each n ≥ 1.
Here is an example of the use of Proposition 1.
Further, for each n ≥ 1, let σ_{1,n}, ..., σ_{n,n} be an array of constants satisfying (i) ∑_{i=1}^{n} σ_{i,n}² = 1 and (ii) for some c > 0, max_{1≤i≤n} |σ_{i,n}| ≤ c/√n. Then i.i.d. X, X₁, X₂, ... random variables and a sequence of standard normal random variables Z₁, Z₂, ... can be put on the same probability space so that for suitable constants C > 0 and λ > 0 the analogue of (24) holds for the weighted sums ∑_{i=1}^{n} σ_{i,n} X_i for all x ≥ 0. This can be shown by using Corollary 1 with p = 2, in combination with Proposition 1. In particular, that inequality (23) holds for appropriate constants a, b and c follows from an application of the classic Bernstein inequality (cf. p. 855 of Shorack and Wellner [53]). To verify that the conditions of Theorem 2 are satisfied with p = 2 we apply Corollary 1 in Einmahl [18].

Some remarks on strong approximations
Let X = {X_n, n ≥ 1} denote a sequence of independent mean zero random variables such that X_n has variance 0 < σ_n² < ∞, n ≥ 1, and let Y = {Y_n, n ≥ 1} be a sequence of independent mean zero normal random variables such that each Y_n has variance σ_n², n ≥ 1. Whenever X and Y are on the same probability space, set for each n ≥ 1

Δ_n = | ∑_{i=1}^{n} X_i − ∑_{i=1}^{n} Y_i |.

Consider the special case when X is an i.i.d. X sequence of random variables, where EX = 0 and 0 < Var X = σ² < ∞. Theorem [PS], i.e. Theorem 1 of KMT [32], implies that whenever, in addition, E exp(a|X|) < ∞ for some a > 0, then on the same probability space one can define X and Y such that

Δ_n = O(log n), a.s.    (32)

If, instead, we assume that E|X|^r < ∞ for some r > 2, we have

Δ_n = o(n^{1/r}), a.s.    (33)

The case 2 < r ≤ 3 is the Corollary in Major [34] and the case r > 3 is Theorem 2 of KMT [32].
Couplings for which almost sure statements such as (32) and (33) hold allow one to infer laws of the iterated logarithm [LILs] for the i.i.d. partial sums ∑_{i=1}^{n} X_i from the LIL for the partial sums ∑_{i=1}^{n} Y_i of i.i.d. normal random variables. These are special cases of strong approximations. To learn more about strong approximations and their applications refer to M. Csörgő and Révész [12].
Corollary 3 is not a strong approximation.However, Proposition 1, upon which it is based, and versions of it play a vital role in establishing strong approximations via quantile and conditional quantile coupling inequalities, combined with dyadic construction schemes and blocking arguments.
The classical KMT approximation results for i.i.d. sums have been extended by Sakhanenko to the independent but not necessarily identically distributed case. Here are his two main coupling inequalities. Quantile and conditional quantile inequalities play an indispensable role in establishing these results.
Theorem A (Sakhanenko [46]). Let X = {X_n, n ≥ 1} be a sequence of independent mean zero random variables satisfying the exponential moment condition of Sakhanenko [46]. Then on the same probability space one can define X and Y such that the corresponding coupling inequality holds for all n ≥ 1, where A is a universal constant.
Theorem B (Sakhanenko [48]). Let X = {X_n, n ≥ 1} be a sequence of independent mean zero random variables satisfying the moment condition of Sakhanenko [48]. Then on the same probability space one can define X and Y such that the corresponding moment coupling inequality holds for all n ≥ 1, where C is a universal constant.
Theorems A and B were announced in Sakhanenko [45]. Theorem A is proved in Sakhanenko [46], where it appears as Theorem 1, and Theorem B is established in Sakhanenko [48], where it is stated and proved in Section 5 of his paper as Corollary 5. (An earlier version with worse constants is given in Sakhanenko [47].) The above formulations of the Sakhanenko coupling inequalities were adapted from those given in Shao [52]. (Actually our Theorem B implies Shao's version via Markov's inequality.) He uses Theorems A and B to establish strong approximations for partial sums of independent random variables, and shows how they lead to LILs and strong laws.

Refined quantile inequalities
To formulate the results in this section we shall need the following regularity condition. Let {Y_n}_{n≥1} be a sequence of random variables, where each Y_n has cdf F_n. Assume that for every n ≥ 1 the regularity condition (37) holds. Note that this assumption is very weak. It holds as long as EY_n = 0 and P(Y_n = 0) < 1; for instance, when for each n ≥ 1, Y_n is as in (5), with X₁, X₂, ..., i.i.d. X, where X is nondegenerate with expectation zero. We shall show that the KMT quantile coupling for Y_n as in (5) given in Proposition [KMT] can be improved with a rate 1/√n when X satisfies additional regularity conditions. Our main result in this section is the following refined KMT quantile coupling inequality.
Theorem 3. In addition to the assumptions of Proposition [KMT], suppose that EX³ = 0 and that the characteristic function v(t) of X satisfies

lim sup_{|t|→∞} |v(t)| < 1.    (38)

Then there exist C > 0 and ε > 0 such that for every integer n ≥ 1, with Y_n defined as in (5) and (4), a coupling inequality sharper by a factor of order 1/√n holds.

Notice that if the random variable X is symmetric then EX³ = 0. Also, if the absolutely continuous component of the distribution of X is nonzero, one can readily conclude by the Riemann-Lebesgue lemma that assumption (38) is satisfied. Theorem 3 will be an immediate consequence of Theorem 4 and Proposition 2 below.
Our next result discloses a relationship between the existence of a certain type of large deviation result and a sharp quantile coupling inequality. Such a large deviation result is often called a "Petrov expansion". Actually, the expansion that we use in this paper is even more precise than that of Petrov (see Remark 4), and perhaps it is better to call it a Saulis expansion (see pages 249 and 169 in Petrov [42]).
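For orientation, the Cramér-Petrov large deviation expansion referred to here takes, in one standard textbook formulation (Petrov [42], Chapter VIII), roughly the following shape for Y_n as in (5) under E exp(a|X|) < ∞; the precise form of the error term varies across formulations:

```latex
\frac{P\{Y_n \ge x\}}{1-\Phi(x)}
  = \exp\!\Big(\frac{x^{3}}{\sqrt{n}}\,\lambda\!\Big(\frac{x}{\sqrt{n}}\Big)\Big)
    \Big(1 + O\Big(\frac{1+x}{\sqrt{n}}\Big)\Big),
  \qquad 0 \le x \le \varepsilon\sqrt{n},
```

where λ is the Cramér series, a power series whose coefficients are determined by the cumulants of X and which converges in a neighborhood of the origin; an analogous expansion holds for the lower tail P{Y_n ≤ −x}.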
Note that in this paper we use the notation A_n(x) = O(a_n(x)) with x ∈ D_n, where {D_n}_{n≥1} is a sequence of sets, to mean that there is an n₀ ≥ 1 and a C > 0 such that for all n ≥ n₀,

|A_n(x)| ≤ C a_n(x), uniformly over all x ∈ D_n.

Theorem 4. Let Z be a standard normal random variable and let {Y_n}_{n≥1} be a sequence of random variables. Assume that there is a positive ε such that the stated expansions hold, and that (37) holds. Then there exist C₁ > 0 and ε₁ > 0 such that for every n ≥ 1, with Y_n defined as in (4), the corresponding coupling inequality holds on the stated range.

The Petrov expansion is obtained by replacing the O(n^{−1}x⁴ + n^{−1}) term in Theorem 4 by O(a(n, x)) (see Theorem 1 in Chapter VIII of Petrov [42], or Theorem A in Komlós, Major, and Tusnády [31]). In this case, as in Theorem 1 above, the corresponding coupling inequality takes the form given in Sakhanenko [46, 49]. The deviation term O(n^{−1}x⁴ + n^{−1}) improves the O(a(n, x)) term with a rate 1/√n uniformly in x ∈ [0, a] for any a > 0, which shows up in the corresponding quantile coupling inequality.
In some applications it is more convenient to use the following corollary, where the bound involves only the standard normal random variable Z. Zhou [57] used such a coupling of a standardized Beta random variable with a standard normal random variable to establish the asymptotic equivalence of Gaussian variance regression and Gaussian white noise with a drift. He found that his analysis was much easier when he used the following bound in his moment calculations.
Corollary 4. Under the assumptions of Theorem 4, there exist C > 0 and ε > 0 such that for every n ≥ 1 the corresponding bound holds, where Y_n is defined as in (4).
We have the following Saulis expansion (see page 188 in Saulis and Statulevičius [51]), which shows that the conditions of Theorem 4 hold when those of Theorem 3 are satisfied. Note that the following proposition, when combined with Theorem 4, establishes Theorem 3.
Proposition 2. Let X₁, X₂, ..., X_n be i.i.d. X random variables for which EX = 0, EX² = 1, EX³ = 0 and E exp(a|X|) < ∞ for some a > 0. Further assume that (38) holds. Then for Y_n as in (5) and (4), there exists a positive constant η such that the expansions (41) and (42) hold.

Proof. We only verify (42). The proof that (41) holds is similar. Saulis [50] shows that there is a constant η > 0 such that the expansion holds for some C > 0 and n sufficiently large when the third moment of X is 0. See also page 249 of Petrov [42].
Theorem 3 on page 169 of Petrov [42] together with EX³ = 0 implies the required expansion. The proof of Proposition 2 can also be derived by arguments similar to those in Section 8.2 of Petrov [42]. Also note that the above expansion holds when the random variables X_i are replaced by −X_i. This implies that the expansion above holds when "<" is replaced by "≤".
Other couplings are possible. Consider the following coupling result, which can yield refinements when for each n ≥ 1 the distribution of Y_n is concentrated on a lattice.

Theorem 5. Let Z be a standard normal random variable and let {Y_n}_{n≥1} be a sequence of random variables. Assume that there exists a sequence of sets {C_n}_{n≥1} with P{Y_n ∈ C_n} = 1 and a positive ε such that the stated expansions hold for all n ≥ 1, that (37) holds, and moreover that the expansions hold when "<" is replaced by "≤". Then there exist C₁ > 0 and ε₁ > 0 such that for every n ≥ 1, with Y_n defined as in (4), the corresponding coupling inequality holds.

Results in Carter and Pollard [9] show that the assumptions of Theorem 5 hold when Y_n is the standardized sum of i.i.d. Bernoulli(1/2) random variables. Arguing as in the proof of Corollary 4, we obtain:

Corollary 5. Under the assumptions of Theorem 5, there exist C > 0 and ε > 0 such that for every n ≥ 1 the corresponding bound holds, with Y_n defined as in (4).

Remark 5. If the sequence of random variables {Y_n}_{n≥1} in Theorem 4 or 5 does not satisfy condition (37), then their conclusions hold for all large enough n.
To see what kind of couplings one can get for standardized partial sums of i.i.d. X random variables when the condition that E exp(a|X|) < ∞ for some a > 0 is replaced by the assumption that E exp{g(|X|)} < ∞ for a suitable continuous increasing function g on [0, ∞), refer to Appendix C.

Applications to probability theory
Prob-example 1 (Partial sums). Assume X, X₁, ..., X_n are i.i.d. with EX = 0 and 0 < Var X = σ² < ∞, satisfying for some γ ≥ 0 a moment growth condition, or, equivalently, its reformulation in terms of a parameter p ≥ 2. (Note that the case p = 2 is the classic Bernstein condition.) An application of Theorem 3.1 in Saulis and Statulevičius [51] shows that the sequence of random variables (5) satisfies the conditions of Corollary 1. The case p = 2, i.e. γ = 0, corresponds to the conditions of Theorem [PS]. In fact, a random variable X having mean zero and variance 0 < σ² < ∞ satisfies (45) if and only if for some C > 0 and d > 0 a tail bound of order exp(−dx^β) holds for all x ≥ 0, where β = 4/(p + 2). Refer to Appendix B for a proof.
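As a reminder, one standard formulation of the p = 2 (Bernstein) case is the moment condition and resulting tail bound below; the constants differ slightly across references:

```latex
|E X^{k}| \le \tfrac{1}{2}\,k!\,\sigma^{2} H^{k-2}, \quad k = 2, 3, \ldots,
\qquad\Longrightarrow\qquad
P\{|X_{1}+\cdots+X_{n}| \ge x\}
  \le 2\exp\!\Big(-\frac{x^{2}}{2\,(n\sigma^{2} + H x)}\Big), \quad x \ge 0.
```

Here H > 0 is the Bernstein constant; for bounded X one may take H proportional to the essential supremum of |X|.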
If one also assumes that the characteristic function of X satisfies (38), then one can apply an expansion due to Wolf [56] to show that the coupling (24) can be improved: for positive constants C and λ, a sharper bound holds for all z ≥ 0 and n ≥ 1. For a sketch of how this is done see Appendix C.
Note that in the following Prob-examples 2-5 it is understood that K_n = K, ε_n = ε and η_n = η for all n ≥ 1, where K, ε and η are fixed positive constants.
Now since (obviously) Y_n = O_p(1), we can apply Theorem 2.5 of Giné, Götze and Mason [20] to show that for suitable constants b > 0 and c > 0 the required tail bound holds for all z ≥ 0 and n ≥ 1. Thus we conclude by Corollary 1 that (24) with p = 2 holds in this example. Recall in Prob-example 1 that for inequality (24) with p = 2 to hold for the un-self-normalized sum σ^{−1}S_n we required that X have a finite moment generating function in a neighborhood of zero. Prob-example 2 shows that self-normalizing dramatically reduces the assumptions needed for (24) with p = 2 to be valid.

Prob-example 3 (Simple random sampling without replacement). For each m ≥ 1 let a_{1,m}, ..., a_{m,m} be real constants and choose 1 ≤ n_m ≤ m. Assume that conditions (50)-(53) hold for all m ≥ m₀, for some integer m₀ ≥ 1. For each integer m ≥ 1 let S_m denote the sum of X₁, ..., X_{n_m} taken by simple random sampling without replacement from {a_{1,m}, ..., a_{m,m}}. Assumptions (50), (51), (52) and (53) permit us to apply Theorem 1.1 of Hu, Robinson and Wang [28] to get for suitable constants K > 0 and ε > 0 that (10), (11), (12) and (13) hold for all m ≥ m₀. (For an earlier version of their result where m = 2n and n_m = n, n ≥ 1, refer to Lemma 3 of KMT [32].) Therefore we can apply Theorem 1 (note we replace n by m) to show that the quantile inequality (15) is valid whenever m ≥ m₀ ∨ 64K² and (14) is satisfied with η = 1 ∧ (1/(8K)).
Notice that for all m ≥ m₀ and some λ > 0 a sub-Gaussian bound holds. This bound combined with the Hoeffding [27] inequality for simple random sampling from a finite population without replacement gives the required tail bound for all z ≥ 0 and m ≥ m₀. This allows us to apply Corollary 1 to conclude that (24) with p = 2 holds for all m ≥ m₀.
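A quick numerical sanity check on the without-replacement setting (our own sketch; the population values and sizes below are arbitrary choices, not from the paper): under simple random sampling without replacement from a centered population, the variance of the sample sum carries the finite population correction Var S = nσ²(m − n)/(m − 1), which underlies the standardization of S_m.

```python
import math
import random

random.seed(11)

m, n = 60, 20
# an arbitrary centered finite population a_1, ..., a_m
a = [math.sin(i) for i in range(m)]
mu = sum(a) / m
a = [x - mu for x in a]
sigma2 = sum(x * x for x in a) / m          # population variance

# finite population correction for the without-replacement sum
target = n * sigma2 * (m - n) / (m - 1)

reps = 4000
sums = [sum(random.sample(a, n)) for _ in range(reps)]
mean = sum(sums) / reps
var = sum((s - mean) ** 2 for s in sums) / reps
print(round(var, 3), round(target, 3))
```

The Monte Carlo variance of the sample sum agrees closely with the finite-population formula.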
A more careful analysis leads to an inequality of the same form, valid for all z ≥ 0 and m ≥ m₀, where the positive constants C_m and λ_m depend on m, ω_m, σ_m, M_m and β_{3,m}. For a closely related result in the special case when a_{i,m} ∈ {−1, 1} for i = 1, ..., m, m ≥ 1, refer to Theorem 3.2 of Chatterjee [10].
Prob-example 4 (Mixing sequences). Let X₁, X₂, ..., be a stationary sequence of random variables defined on a probability space (Ω, F, P). Assume that for some σ₀² > 0 the variance condition holds. In these last two expressions it is understood that 0/0 := 0 whenever it occurs. Introduce the mixing rates, for some M > 0 and µ > 0, and the bounding conditions. Applying results in Statulevičius and Jakimavičius [54], we get that inequalities of the form (23) hold. In the last three cases we assume that the random variables X_t are connected by a Markov chain.
Prob-example 5 (Sample median). Let {Y_n}_{n≥1} be a sequence of random variables of the form (56), where V_n is a Beta(n, n) random variable. A long but elementary analysis based on Stirling's formula shows that this sequence satisfies the assumptions of Theorem 1. For more about this example see Exp-example 2 in subsection 3.2. Furthermore, since V_n is equal in distribution to the nth order statistic of 2n − 1 independent Uniform(0, 1) random variables, i.e. the sample median, we get, using identity (11) on page 86 of Shorack and Wellner [53], that for any z ≥ 0 the tail of V_n can be bounded in terms of α_{2n−1}(t), defined as in (7), which by the Dvoretzky-Kiefer-Wolfowitz inequality (see Massart [39]) yields a Gaussian-type bound. Thus we can apply Corollary 1 to get that inequality (24) with p = 2 is satisfied for all n ≥ 1. Hence we can conclude that for positive constants C and λ, inequality (57) holds for all z ≥ 0 and n ≥ 1, where Y_n is defined as in (56) and (4).
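The Beta(n, n) quantile coupling in this example is easy to probe numerically (a sketch with our own function names, standardization, and grid choices): the cdf of V_n is evaluated exactly through the order-statistic identity above, P(V_n ≤ x) = P(Binomial(2n − 1, x) ≥ n), the quantile is obtained by bisection, and V_n is standardized by the exact standard deviation of Beta(n, n).

```python
import math

def beta_nn_cdf(n, x):
    """P(V_n <= x) for V_n ~ Beta(n, n): V_n is distributed as the
    median of 2n - 1 i.i.d. Uniform(0, 1) random variables, so the
    cdf equals P(Binomial(2n - 1, x) >= n)."""
    m = 2 * n - 1
    return sum(math.comb(m, j) * x ** j * (1 - x) ** (m - j)
               for j in range(n, m + 1))

def beta_nn_quantile(n, s, tol=1e-12):
    """F_n^{-1}(s) = inf{x : F_n(x) >= s} by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_nn_cdf(n, mid) >= s:
            hi = mid
        else:
            lo = mid
    return hi

def phi_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_coupling_error(n, zmax=2.0, steps=81):
    """max |Y_n - z| over a z-grid, with Y_n the standardized value
    of V_n = F_n^{-1}(Phi(z)), as in (4)."""
    sd = math.sqrt(1.0 / (4.0 * (2 * n + 1)))   # sd of Beta(n, n)
    worst = 0.0
    for i in range(steps):
        z = -zmax + 2.0 * zmax * i / (steps - 1)
        v = beta_nn_quantile(n, phi_cdf(z))
        worst = max(worst, abs((v - 0.5) / sd - z))
    return worst

e10, e80 = max_coupling_error(10), max_coupling_error(80)
print(round(e10, 4), round(e80, 4))
```

Since Beta(n, n) is symmetric and smooth, the observed coupling error on this central range is already quite small and shrinks quickly in n, in line with the refined bounds discussed in subsection 2.2.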
The coupling (57) can be used to give a fairly direct proof of Theorem [PS] in the special case when X₁, X₂, ..., are i.i.d. ω − 1, where ω is exponential with mean 1. This exponential special case is important in constructing a Brownian bridge approximation to the uniform quantile process. For details refer to M. Csörgő and Révész [12] and M. Csörgő, S. Csörgő, Horváth and Mason [13].
We conclude this subsection with a remark about probability spaces.
Remark 6. Using Proposition 1 one can construct a probability space on which sit a square integrable martingale difference sequence (ξ_i, F_i)_{i=0,...,n} satisfying the conditions of Corollary 2 and a sequence of standard normal random variables {Z_n}_{n≥1} such that for each n ≥ 1, (30) holds with Y_n = n^{−1/2} ∑_{i=1}^{n} ξ_i and with Z replaced by Z_n. Analogous statements are true for Prob-examples 1, 2 and 4. By the way, the probability space of Theorem [EP] is constructed in this way.

Applications to asymptotic equivalence of experiments
Since Donoho and Johnstone [15], a Besov smoothness constraint has been standard in the study of the asymptotic optimality of nonparametric estimation procedures. More recently, under a sharp Besov smoothness assumption and via the Carter and Pollard [9] improved Tusnády inequality, Brown, Carter, Low and Zhang [BCLZ] [7] extended the asymptotic equivalence result of Nussbaum [41] for density estimation. BCLZ [7] is considered to be an important paper in this area. We point out that the Tusnády inequality given in Proposition [T] may not be strong enough to establish their results.
Quantile coupling inequalities of the kind we are discussing in this survey have led to extensions of the asymptotic equivalence theory for density estimation in Nussbaum [41] to general nonparametric estimation models (see Grama and Nussbaum [23,24,25]).Among these is the important case of the asymptotic equivalence of Gaussian variance regression to Gaussian white noise.
Zhou [57] and Golubev, Nussbaum and Zhou [21] obtained equivalence results for spectral density estimation. One of their crucial tools is a sharp quantile coupling bound between a standardized Beta random variable and a standard normal random variable, used to obtain an asymptotic equivalence theory for Gaussian variance regression. This was the key to establishing the asymptotic equivalence of spectral density estimation and Gaussian white noise experiments under a Besov smoothness constraint. One interesting application of such couplings is a coupling of the sample median with a standard normal random variable. It improves upon the classical quantile coupling bounds with a rate 1/√n under certain smoothness conditions on the underlying distribution function, and it includes the Cauchy distribution as a special case. It is likely that this coupling will be of independent interest because of the fundamental role played by the sample median in statistics.
Here are some more detailed descriptions of these applications.
Exp-example 1 (Asymptotic equivalence of density estimation and Gaussian white noise). Consider the two sequences of experiments:

E_n: y(1), ..., y(n), i.i.d. with density f on [0, 1];
F_n: dy(t) = √f(t) dt + (1/(2√n)) dW_t, t ∈ [0, 1],

where W_t is a standard Wiener process on [0, 1]. The asymptotic equivalence of these two sequences was established in BCLZ [7] under a Besov smoothness constraint. The basic approach in their paper is the utilization of the classical KMT [31] construction. To do this they needed the following coupling of a standardized Binomial random variable and a standard normal random variable Z. Let X₁, X₂, ..., X_n be i.i.d. Bernoulli(1/2). Our Corollary 5 tells us that for every n ≥ 1 there is a version Y_n of the standardized Binomial satisfying the refined coupling bound for |Z| ≤ ε√n, where C, ε > 0 do not depend on n (see also Carter and Pollard [9]). This result was applied in combination with the KMT construction to establish the asymptotic equivalence under a Besov smoothness condition and a compactness condition on the Besov balls.

Exp-example 2 (Asymptotic equivalence of spectral density estimation and Gaussian white noise). Consider the sequence of experiments E_n: y(1), ..., y(n), a stationary centered Gaussian sequence with spectral density f, where f has support in [−π, π]. The asymptotic equivalence between the sequence of Gaussian spectral density experiments E_n and the sequence of regression Gaussian white noise experiments F_n was established in Golubev, Nussbaum and Zhou [21] under a Besov smoothness constraint. In that paper they used a modification of the dyadic KMT [31]-type construction. Instead of applying a complicated KMT [31]-type conditional quantile coupling for higher resolutions, Golubev et al. [21] found that in their setup it was easier to use a construction based on the fact that if X and Y are two independent χ²_n random variables, then for any y > 0,

P{X/(X + Y) ≤ y} = P{B_n ≤ y},

where B_n is a Beta(n/2, n/2) random variable. This permitted them to avoid conditional quantile coupling by considering a coupling for a Beta random variable, obtaining the following coupling inequality, which we get here by an application of Corollary 4.
Let Z be a standard normal random variable. For every n there is a mapping T_n: R → R such that the random variable B_n = T_n(Z) has the Beta(n/2, n/2) law and the refined coupling bound holds for |Z| ≤ εn, where C, ε > 0 do not depend on n (cf. Zhou [57]).
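The χ² identity above is easy to check by simulation (a sketch with our own parameter and sample-size choices): Beta(n/2, n/2) has mean 1/2 and variance 1/(4(n + 1)), and the ratio X/(X + Y) should reproduce both.

```python
import random

random.seed(3)

def chi2(df):
    """A chi-square variate with df degrees of freedom, generated as a
    sum of squared independent standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

n = 6
reps = 5000
ratios = []
for _ in range(reps):
    x, y = chi2(n), chi2(n)
    ratios.append(x / (x + y))

mean = sum(ratios) / reps
var = sum((r - mean) ** 2 for r in ratios) / reps
# Beta(n/2, n/2) has mean 1/2 and variance 1/(4(n + 1))
print(round(mean, 3), round(var, 4))
```

The simulated mean and variance of X/(X + Y) match the Beta(n/2, n/2) values to Monte Carlo accuracy.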
We finally consider a coupling for the sample median. For simplicity we only consider odd integers n = 2k + 1 with k ≥ 0. Thus, in this notation, the sample median X_med = X_{(k+1)}. Assume that f(0) > 0, f′(0) = 0, f ∈ C³, and that there is an ε > 0 such that the moment condition (59) holds. Let Z be a standard normal random variable. Cai and Zhou [8] show that for every n ≥ 1 there is a mapping of Z to a version of the sample median satisfying a refined coupling bound, where C, ε > 0 do not depend on n. For the details of the proof and for more general discussion consult Cai and Zhou [8]. It can be shown that our Corollary 4 gives this result. They use this quantile inequality to study the nonparametric location model with Cauchy noise as well as wavelet regression. Donoho and Yu [16] treated a similar problem; however, it is not clear that the minimax property holds for their procedure. In the wavelet regression setting, Hall and Patil [26] studied nonparametric location models and achieved the optimal minimax rate, but under the more restrictive assumption of the existence of a finite fourth moment. Cai and Zhou [8] need only impose the existence of a finite ε-moment (59). The noise can be general and unknown, and yet an optimal minimax rate of convergence is achieved. Without the assumptions f′(0) = 0 or f ∈ C³ we can still obtain coupling bounds, but they may not be as tight as the above bound. The tightness of the upper bound affects the underlying smoothness condition required in deriving the asymptotic results.

The underlying approach to our quantile inequalities
Underlying our quantile inequalities is the following simple observation. Let Y be a random variable with cdf F and, as usual, let Φ be the cdf of the standard normal random variable Z. Observe trivially that for some y and u(y) > 0, if and only if Thus if there is a z such that y = F^{-1}(Φ(z)), we get from (61) and the fact that for all s ∈ (0, 1), that with y = F^{-1}(Φ(z)) and s = Φ(z), Now let C be the set of y such that inequality (60) holds, and define Y = F^{-1}(Φ(Z)). Clearly, whenever Y ∈ C, we have All of our quantile inequalities will follow this approach. For instance, Theorem 1 provides conditions under which (60) holds with For more about this approach see Sakhanenko [49].
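The identity behind this construction, that Y = F^{-1}(Φ(Z)) is a deterministic function of Z yet has exactly the law F, is easy to see in action. The sketch below is an illustration under an assumed Exponential(1) target law, which plays no role in the paper; it builds the coupled variable from standard normal draws and checks that the sample mean matches E[Y] = 1.

```python
import math
import random

def exp_quantile(u):
    """Inverse cdf F^{-1}(u) = -log(1 - u) of the Exponential(1) law."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # guard the endpoints
    return -math.log(1.0 - u)

def std_normal_cdf(z):
    """Phi(z), computed via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

random.seed(0)
n = 100_000
# The coupling Y = F^{-1}(Phi(Z)): since Phi(Z) is uniform on (0, 1),
# Y has exactly the law F while being a monotone function of Z.
coupled = [exp_quantile(std_normal_cdf(random.gauss(0.0, 1.0))) for _ in range(n)]
print(round(sum(coupled) / n, 2))   # Exponential(1) has mean 1
```

The quantile inequalities of the paper then amount to quantitative bounds on |F^{-1}(Φ(z)) − z| over a range of z, which is why control of the set C above is the whole game.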

Proof of Theorem 1
We shall infer Theorem 1 from the following technical result. Proposition 3. Assume there exist a sequence K_n > 0, a sequence 0 < ε_n < 1 and an integer n_0 ≥ 1 such that for all n ≥ n_0 and 0 < z ≤ ε_n√n inequalities (10), (11), (12) and (13) hold. Then whenever where To see how our theorem follows from this proposition, substitute Y_n = √n x into (63) and (64). Therefore whenever n ≥ n_0 ∨ 64K_n² and where As pointed out in subsection 4.1, this inequality implies Thus (15) holds. Hence the theorem will be proved as soon as we have established the proposition.

Proof of Proposition 3
The proof will follow from a number of lemmas.
Lemma 1. For any x > 0 (This is the classical Mills' ratio bound; refer, for instance, to Shorack and Wellner [53].) We can readily infer from (66) and some simple bounds that for some c > 1 and all x ≥ 0, Lemma 2. The function and the function Proof. First consider (69). We see that where When x ≥ 0, obviously g(x) < 0, and when x < 0, (66) implies that g(x) ≤ 0. Thus we have (69). Assertion (68) follows from the fact that Ψ Notice that (66) and (68) imply that for x > 0, we have The following lemma can be inferred from Lemma (A.8) in Einmahl [18] with the 8 there replaced by an unspecified constant. To keep the presentation self-contained we provide here a direct proof. In any case, we need this lemma with its present constants in order to establish Theorem 1 as it is stated.
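The Mills'-ratio bounds behind Lemma 1, in their classical form x/(1 + x²) < (1 − Φ(x))/φ(x) < 1/x for x > 0, can be checked numerically. The sketch below is only a sanity check on a grid of points, not a proof; it uses the stable identity 1 − Φ(x) = erfc(x/√2)/2.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_tail(x):
    """1 - Phi(x), computed stably via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Mills' ratio (1 - Phi(x)) / phi(x) is squeezed between x/(1+x^2) and 1/x.
for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    ratio = normal_tail(x) / phi(x)
    print(x, x / (1.0 + x * x) < ratio < 1.0 / x)   # each line prints: x True
```

Both bounds are tight to order 1/x³ as x → ∞, which is why the lemma suffices for the sharp tail comparisons used below.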
We are now ready to complete the proof of Proposition 3. Suppose n ≥ n_0 ∨ 64K_n². By assumptions (10), (11), (12), (13) and Lemma 3 we can choose A = K_n and η_n = ε_n ∧ (1/(8K_n)) such that for all 0 ≤ x ≤ η_n we have and (71) and (72) hold, which imply that for all 0 ≤ x ≤ η_n, In other words, for all |x| ≤ η_n, we have where u = 2K_n x²/√n + 2K_n/√n. This completes the proof of the proposition.

Proof of Corollary 1
We know by Theorem 2 that there exist 0 < L < ∞, 0 < η < ∞ and an integer n_0 ≥ 1 such that for all integers n ≥ n_0 ∨ 64L²n^{1−2/p}, whenever (21) holds, we have We require a number of lemmas.
Lemma 5. For every A > 0 there exist positive numbers C_2 > 0 and λ_2 > 0 such that for all n ≥ 1, k ≥ 1 and 0 ≤ z ≤ An^{2/p} we have Proof. Applying (23), we see that Combining Lemmas 4 and 5 with inequality (22), we readily infer the following lemma: Lemma 6. For every A > 0 there exist positive numbers C_3 > 0 and λ_3 > 0 such that for all n ≥ n_0 ∨ 64K² and 0 ≤ z ≤ An^{2/p} we have We shall need two more lemmas.
Lemma 7. For every A > 0 there exist positive numbers C_4 > 0 and λ_4 > 0 such that for all n ≥ 1 and z > An^{2/p} we have Proof. The proof is an easy consequence of inequality (23), which gives Lemma 8. For every A > 0 there exist positive numbers C_5 > 0 and λ_5 > 0 such that for all n ≥ 1 and z > An^{2/p} we have Proof. Inequality (90) is readily inferred from the elementary bound The proof of inequality (24) for n ≥ n_0 is now completed by using some routine bounds on the probability in (24) and then applying Lemmas 6, 7 and 8. For the case 1 ≤ n < n_0 (should it be that n_0 > 1), we establish that (24) holds uniformly in 1 ≤ n < n_0 by using the elementary inequality Remark 7. Actually, the proof shows that for suitable constants C > 0 and λ > 0, .

Proof of Corollary 2
Assumptions (25) and (26) allow us to apply the results in Grama and Haeusler [22] to get the following large-deviation result: for x in the range 1 ≤ x ≤ α_+ n^{1/4} (for α_+ > 0 sufficiently small), one has This implies that for some constant D_1 > 0, all large enough n and 1 Furthermore, assumptions (25), (26) and (27) permit us to apply the corollary in Bolthausen [5] to infer that for some constant This implies that for some D_3 > 0 for all 0 Thus for some D_+ > 0, all large enough n and 0 Similarly we get for some α_- > 0, D_- > 0, all large enough n and 0 Thus (10), (11), (12) and (13) are satisfied with K_n = D log n for some D > 0 and with ε_n = α/n^{1/4} for some α > 0, and for all n ≥ n_0 for some integer n_0 ≥ 2.
Applying Theorem 1, whenever n ≥ n_0 ∨ (64D²(log n)²), we get (29), namely, Next consider (30). Notice that for n ≥ 2 which by Azuma's inequality is Applying Azuma's inequality again we have Once more using Azuma's inequality we get for z > √n and an elementary bound gives for z > √n It is now easy to conclude from these inequalities that (30) holds for appropriate C > 0 and λ > 0. The following is a detailed proof of Theorem 4. It is a modification of the proof for the classical case, which was sketched in Komlós, Major, and Tusnády [31].
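The repeated appeals to Azuma's inequality above can be illustrated numerically. The sketch below is our own toy example, not part of the proof: it takes a ±1 random walk, the simplest martingale with increments bounded by 1, and compares an empirical tail probability with the Azuma-Hoeffding bound exp(−t²/(2n)). The walk length, threshold, and replication count are arbitrary choices.

```python
import math
import random

random.seed(3)
n = 200        # number of +/-1 increments, each bounded by 1
t = 30.0       # tail threshold
reps = 20_000

hits = 0
for _ in range(reps):
    s = sum(random.choice((-1, 1)) for _ in range(n))
    if s >= t:
        hits += 1

empirical = hits / reps
# Azuma-Hoeffding for increments bounded by 1: P(S_n >= t) <= exp(-t^2 / (2n)).
bound = math.exp(-t * t / (2.0 * n))
print(round(empirical, 4), "<=", round(bound, 4))
```

The gap between the empirical tail and the bound is substantial here, which is typical: Azuma trades sharpness for complete generality over bounded-increment martingales.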
Recall the definition of Y_n as in (4). Without loss of generality, we assume that 0 shows that Equation (92) holds when n ≤ n_0 for any fixed integer n_0 ≥ 1. It is then enough to consider the case of sufficiently large n. From the assumptions of Theorem 4, we know that for 0 for some C > 0. Thus it suffices to show that there is a C_1 > 0 and a small enough From (70) and (68) we have log Putting everything together, we establish that (93) holds for all large enough n ≥ 1. To complete the proof, recall that we assume (37), which allows us to apply Lemma 9 in the appendix to conclude that (93) holds for all 1 ≤ n ≤ n_0 for any fixed n_0 ≥ 1. This finishes the proof of the theorem.

Proof of Corollary 4
Set Y_n(z) = H_n(Φ(z)). Let us rewrite (39) as so we have for some C_2 > 0, We know that z_n ≥ 0 from the definition of the quantile coupling Y_n(z) = H_n(Φ(z)), and from (95) we have < exp(C/n_0).

Appendix B
In this subsection we prove the equivalence of (45) and (46). We shall assume that X satisfies EX = 0, Var X = 1, and for some γ ≥ 0 and K ≥ 1, Notice that (101) implies that for t > 0, Thus there exists a 0 < δ < 1 such that Next we shall prove that (102) holding for some 0 < δ < 1 implies (101) for some K > 0. (The argument is similar to the proof of Lemma 3 in Amosova et al.) Here we used the inequality (x + 1)^{x+1} ≤ (2x)^x e for x ≥ 1. This last expression is which is finite for a small enough 0 < δ < 1. From these considerations we can readily establish the equivalence of (45) and (46).
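The elementary bound (x + 1)^{x+1} ≤ (2x)^x e used above is easy to spot-check on a grid of points; taking logarithms keeps large x from overflowing. This is only a numerical sanity check of the inequality, not its proof (the proof reduces, after taking logarithms, to log(2x/(x + 1)) ≥ 0 for x ≥ 1).

```python
import math

def log_gap(x):
    """log((2x)^x * e) - log((x+1)^(x+1)); nonnegative iff the bound holds at x."""
    return x * math.log(2.0 * x) + 1.0 - (x + 1.0) * math.log(x + 1.0)

for x in (1.0, 1.5, 2.0, 5.0, 10.0, 100.0, 1e6):
    print(x, log_gap(x) >= 0.0)   # each line prints: x True
```

Since the derivative of log_gap is log(2x/(x + 1)) ≥ 0 on [1, ∞) and log_gap(1) = 1 − log 2 > 0, the grid check is consistent with the bound holding for all x ≥ 1.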

Appendix C
Wolf [56] extended results of Saulis [50] under a more general moment condition.
In the following propositions X, X_1, X_2, . . ., X_n are i.i.d. random variables with EX = 0, EX² = 1. As above, we use the notation , where H_n is the inverse distribution function of F_n (the distribution function of Y_n), Z is a standard normal random variable and Φ is its distribution function.
Choose any k > 1 and let Λ(n) denote the solution to the equation For example, when g(x) = dx^α for some α ∈ (0, 1) and d > 0, By an argument based on Theorem 6.3 (Wolf [56]) in Saulis and Statulevičius [51], following the lines of the proofs in subsection 4.3, we get the following refined quantile inequality.
assumption. If instead one uses the classical Tusnády inequality, a stronger smoothness condition would be needed to establish the asymptotic equivalence. Exp-example 2 (Asymptotic equivalence of spectral density estimation and Gaussian white noise). Consider the two sequences of experiments: