Optimal two-value zero-mean disintegration of zero-mean random variables

For any continuous zero-mean random variable (r.v.) X, a reciprocating function r is constructed, based only on the distribution of X, such that the conditional distribution of X given the (at-most-)two-point set {X, r(X)} is the zero-mean distribution on this set; in fact, a more general construction without the continuity assumption is given in this paper, as well as a large variety of other related results, including characterizations of the reciprocating function and modeling of distribution asymmetry patterns. The mentioned disintegration of zero-mean r.v.'s implies, in particular, that an arbitrary zero-mean distribution is represented as a mixture of two-point zero-mean distributions; moreover, this mixture representation is most symmetric in a variety of senses. Somewhat similar representations -- of any probability distribution as a mixture of two-point distributions with the same skewness coefficient (but possibly with different means) -- go back to Kolmogorov; very recently, Aizenman et al. further developed such representations and applied them to (anti-)concentration inequalities for functions of independent random variables and to spectral localization for random Schrödinger operators. One kind of application given in the present paper is to construct certain statistical tests for asymmetry patterns and for location without symmetry conditions. Exact inequalities implying conservative properties of such tests are presented. These developments extend results established earlier by Efron, Eaton, and Pinelis under a symmetry condition.

These results can easily be restated in terms of Student's statistic T, which is a monotonic function of S, as noted by Efron: T = √((n−1)/n) · S/√(1 − S²/n). Eaton (1970) [6] proved the Khinchin-Whittle-Haagerup inequality (1.2) for a rich class of moment functions, which essentially coincides with the class F₃ of all convex functions f with a convex second derivative f''; see [21, Proposition A.1] and also [25]. Based on this extension of (1.2), inequality (1.3) was improved in [6, 7, 21]. In particular, Pinelis (1994) [21] obtained an improvement of a conjecture by Eaton (1974) [7]. (In a same-direction variant of the construction, q_{1−p} is a (1 − p)-quantile of the distribution of X.) The construction described above in terms of X̄_{a,b} = X I{a ≤ X ≤ b} corresponds, clearly, to the case of opposite-moving markers.
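Efron's monotonicity observation is easy to check numerically. The following sketch (our own illustration, not from the paper) evaluates T = √((n−1)/n)·S/√(1 − S²/n) on a grid inside the natural range |S| < √n and verifies that T is strictly increasing in S:

```python
import math

def t_from_s(s, n):
    # Efron's relation between Student's T and the self-normalized sum S
    return math.sqrt((n - 1) / n) * s / math.sqrt(1 - s * s / n)

n = 10
# S ranges over (-sqrt(n), sqrt(n)); sample a grid strictly inside that interval
grid = [(-0.999 + 0.002 * k) * math.sqrt(n) for k in range(1000)]
ts = [t_from_s(s, n) for s in grid]
assert all(t1 < t2 for t1, t2 in zip(ts, ts[1:]))  # strictly increasing
```

Since T is strictly increasing in S, probability statements about S translate directly into probability statements about T, which is how the self-normalized-sum inequalities below yield conclusions for Student's statistic.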
While an analogous same-direction zero-mean disintegration is possible, we shall not deal with it in this paper. For a zero-mean distribution, the advantage of an opposite-directions construction is that the resulting two-point zero-mean distributions are less asymmetric than those obtained by using a same-direction method (in fact, we shall show that our opposite-directions disintegration is most symmetric, in a variety of senses). On the other hand, the same-direction method will produce two-point zero-mean distributions that are more similar to one another in width. Thus, in our main applications to self-normalized sums, the advantages of the opposite-directions construction appear to be more important, since the distribution of a self-normalized sum is much more sensitive to the asymmetry than to the inhomogeneity in width of the constituent two-point distributions; this appears to matter more in the setting of Corollary 2.6 than in that of Corollary 2.5.
These mixture representations of a distribution are similar to the representations of the points of a convex compact set as mixtures of the extreme points of the set; the existence of such representations is provided by the celebrated Krein-Milman-Choquet-Bishop-de Leeuw (KMCBdL) theorem; concerning "non-compact" versions of this theorem see e.g. [19]. In our case, the convex set would be the set of all zero-mean distributions on R. However, in contrast with the KMCBdL-type pure-existence theorems, the representations given in [24], [3], and this paper are constructive, specific, and, as shown here, optimal, in a variety of senses.
Moreover, in a certain sense [24] and this paper provide disintegration of r.v.'s rather than that of their distributions, as the two-point set {x, r(x)} is a function of the observed value x of the r.v. X. This makes it convenient to construct statistical tests for asymmetry patterns and for location without symmetry conditions. Exact inequalities implying conservative properties of such tests will be given in this paper. These developments extend the mentioned results established earlier by Efron, Eaton, and Pinelis under the orthant symmetry condition.
More specifically, one can construct generalized versions of the self-normalized sum (1.1), which require, instead of the symmetry of the independent r.v.'s X_i, only that the X_i's be zero-mean:

S_W := (X_1 + ··· + X_n) / (½ √(W_1² + ··· + W_n²))   and   S_{Y,λ} := (X_1 + ··· + X_n) / (Y_1^λ + ··· + Y_n^λ)^{1/(2λ)},

where λ > 0, W_i := |X_i − r_i(X_i)| and Y_i := |X_i r_i(X_i)|, and the reciprocating function r_i := r_{X_i} is constructed as above, based on the distribution of X_i, for each i, so that the r_i's may differ from one another if the X_i's are not identically distributed. Note that S_W = S_{Y,1} = S (recall here (1.1)) when the X_i's are symmetric. Logan et al. [18] and Shao [28] obtained limit theorems for the "symmetric" version of S_{Y,λ} (with X_i² in place of Y_i), whereas the X_i's were not assumed to be symmetric.
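As a sanity check of the claim that S_W = S_{Y,1} = S for symmetric X_i's, the following sketch takes r(x) = −x (which is the reciprocating function on the support of a symmetric distribution with a continuous strictly increasing d.f.); the helper names are ours:

```python
import math, random

def s_classic(xs):
    # the usual self-normalized sum S of (1.1)
    return sum(xs) / math.sqrt(sum(x * x for x in xs))

def s_w(xs, r):
    # S_W with W_i = |X_i - r(X_i)|
    w2 = sum((x - r(x)) ** 2 for x in xs)
    return sum(xs) / (0.5 * math.sqrt(w2))

def s_y(xs, r, lam):
    # S_{Y,lambda} with Y_i = |X_i * r(X_i)|
    d = sum(abs(x * r(x)) ** lam for x in xs) ** (1 / (2 * lam))
    return sum(xs) / d

random.seed(0)
xs = [random.choice([-1, 1]) * random.random() for _ in range(8)]  # symmetric sample
r = lambda x: -x  # reciprocating function of a symmetric distribution
assert abs(s_w(xs, r) - s_classic(xs)) < 1e-12
assert abs(s_y(xs, r, 1.0) - s_classic(xs)) < 1e-12
```

Indeed, with r(x) = −x one has W_i = 2|X_i| and Y_i = X_i², so both denominators reduce to √(X_1² + ··· + X_n²).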
Corollaries 2.5 and 2.6 in Subsection 2.2 of this paper suggest that statistical tests based on the "corrected for asymmetry" statistics S W and S Y have desirable conservativeness and similarity properties, which could result in greater power; further studies are needed here. Recall that a test is referred to as (approximately) similar if the type I error probabilities are (approximately) the same for all distributions corresponding to the null hypothesis.
Actually, in this paper we provide a two-point zero-mean disintegration of any zero-mean r.v. X, with a d.f. not necessarily continuous or strictly increasing. Toward that end, randomization by means of a r.v. uniformly distributed in the interval (0, 1) is used to deal with the atoms of the distribution of the r.v. X, and generalized inverse functions are used to deal with the intervals on which the d.f. of X is constant.
Note that the reciprocating function r depends on the distribution of the underlying r.v. X, which is usually unknown in statistics. However, if e.g. the X_i's constitute an i.i.d. sample, then the function G defined in the next section by (2.1) can be estimated based on the sample, so that one can estimate the reciprocating function r. Thus, replacing X_1 + ··· + X_n in the numerators of S_W and S_{Y,λ} by X_1 + ··· + X_n − nθ, one obtains approximate pivots to be used to construct confidence intervals or, equivalently, tests for an unknown mean θ. One can also use the bootstrap to estimate the distributions of such approximate pivots. Introduce also a "randomized" version G̃ of G, and what we shall refer to as the reciprocating function r = r_ν for the measure ν. Note that G̃(x, u), and hence r(x, u), depends on u for a given value of x only if ν({x}) ≠ 0. Therefore, let us write simply r(x) in place of r(x, u) in the case when the measure ν is non-atomic.
If ν is the measure μ = μ_X that is the distribution of a r.v. X, then we may use subscript X with G, G̃, r, x_± in place of subscript μ (or no subscript at all).
In what follows, X will by default denote an arbitrary zero-mean real-valued r.v., which will usually be thought of as fixed; let then G = G_X. For any a and b in R such that ab ≤ 0, let X_{a,b} denote any zero-mean r.v. with values in the two-point set {a, b}; note that such a r.v. X_{a,b} exists and, moreover, its distribution is uniquely determined. Let R_{a,b} := r_{a,b}(X_{a,b}, U), provided that U does not depend on X_{a,b}, where r_{a,b} := r_{X_{a,b}} is the reciprocating function for X_{a,b}. Note that, if ab = 0, then R_{a,b} = 0 = X_{a,b} a.s. If ab < 0, then R_{a,b} = b a.s. on the event {X_{a,b} = a}, and R_{a,b} = a a.s. on the event {X_{a,b} = b}, so that the random set {X_{a,b}, R_{a,b}} coincides a.s. with the nonrandom set {a, b}. However, R_{a,b} equals X_{a,b} in distribution only if a + b = 0, that is, only if X_{a,b} is symmetric; moreover, in contrast with X_{a,b}, the r.v. R_{a,b} is zero-mean only if a + b = 0. Clearly, (X_{a,b}, R_{a,b}) =^D (X_{b,a}, R_{b,a}) whenever ab ≤ 0. We shall prove that the conditional distribution of X given the two-point random set {X, r(X, U)} is the zero-mean distribution on this set. In fact, we shall prove a more general result: that the conditional distribution of the ordered pair (X, r(X, U)) given that {X, r(X, U)} = {a, b} is the distribution of the ordered pair (X_{a,b}, R_{a,b}). Formally, this basic result of the paper is expressed as

Theorem 2.2. Let g : R² → R be any Borel function bounded from below (or from above). Then

(2.12)  E g(X, r(X, U)) = ∫_{R×[0,1]} E g(X_{x,r(x,u)}, R_{x,r(x,u)}) P(X ∈ dx) du.
Instead of the condition that g be bounded from below or above, it is enough to require only that g(x, r) − cx be so for some real constant c over all real x, r.
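The two-point building blocks here are elementary to compute with: for ab < 0, solving p_a + p_b = 1 and a p_a + b p_b = 0 gives P(X_{a,b} = a) = b/(b − a) and P(X_{a,b} = b) = −a/(b − a). The following sketch (our own, with hypothetical helper names) checks this and the fact that R_{a,b} = ab/X_{a,b}, while it swaps a and b, is zero-mean only when a + b = 0:

```python
from fractions import Fraction as F

def two_point(a, b):
    # Zero-mean distribution on {a, b} with a < 0 < b: solve p_a + p_b = 1,
    # a*p_a + b*p_b = 0, giving p_a = b/(b-a), p_b = -a/(b-a).
    return {a: F(b, b - a), b: F(-a, b - a)}

mu = two_point(-1, 2)
assert sum(p * x for x, p in mu.items()) == 0      # zero mean
assert mu == {-1: F(2, 3), 2: F(1, 3)}

def mean_R(a, b):
    # R_{a,b} = a*b / X_{a,b} swaps the two values a and b
    return sum(p * F(a * b, x) for x, p in two_point(a, b).items())

assert mean_R(-1, 1) == 0   # symmetric case: R_{a,b} is zero-mean
assert mean_R(-1, 2) == 1   # a + b != 0: E R_{a,b} = 1 != 0
```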
The proofs (whenever necessary) are deferred to Section 4. As one can see, Theorem 2.2 provides a complete description of the distribution of the ordered random pair (X, r(X, U)) as a mixture of two-point distributions on R²; each of these two-point distributions is supported by a two-point subset of R² of the form {(a, b), (b, a)} with ab ≤ 0, and at that the mean of the projection of this two-point distribution onto the first coordinate axis is zero. As special cases, Theorem 2.2 contains descriptions of the individual distributions of the r.v.'s X and r(X, U) as mixtures of two-point distributions on R, valid for any Borel function g : R → R bounded from below (or from above). This is illustrated by

Example 2.3. Let X have the discrete distribution (5/10)δ_{−1} + (1/10)δ_0 + (3/10)δ_1 + (1/10)δ_2 on the finite set {−1, 0, 1, 2}, where δ_a denotes the (Dirac) probability distribution on the singleton set {a}. Then m = 5/10 and, for x ∈ R, u ∈ [0, 1], and h ∈ [0, m], the corresponding two-point distributions are the zero-mean distributions (1/2)δ_{−1} + (1/2)δ_1, (2/3)δ_{−1} + (1/3)δ_2, and δ_0, respectively. Thus, the zero-mean distribution of X is represented as a mixture of these two-point zero-mean distributions:

(5/10)δ_{−1} + (1/10)δ_0 + (3/10)δ_1 + (1/10)δ_2 = (6/10)((1/2)δ_{−1} + (1/2)δ_1) + (3/10)((2/3)δ_{−1} + (1/3)δ_2) + (1/10)δ_0.

2.2. Two-value zero-mean disintegration of several independent zero-mean r.v.'s and applications to self-normalized sums. Suppose here that X_1, …, X_n are independent zero-mean r.v.'s and U_1, …, U_n are independent r.v.'s uniformly distributed on [0, 1], which are also independent of X_1, …, X_n. For each j = 1, …, n, let R_j := r_j(X_j, U_j), where r_j denotes the reciprocating function for the r.v. X_j. For any real a_1, b_1, …, a_n, b_n such that a_j b_j ≤ 0 for all j, let X_{1;a_1,b_1}, …, X_{n;a_n,b_n} be independent r.v.'s such that, for each j ∈ {1, …, n}, the r.v. X_{j;a_j,b_j} is zero-mean and takes on its values in the two-point set {a_j, b_j}.
For all j, let R_{j;a_j,b_j} := a_j b_j / X_{j;a_j,b_j} if a_j b_j < 0 and R_{j;a_j,b_j} := 0 if a_j b_j = 0.

Theorem 2.4. Let g : R^{2n} → R be any Borel function bounded from below (or from above). Then identity (2.12) can be generalized as follows:

E g(X_1, R_1, …, X_n, R_n) = ∫ E g(X_{1;p_1}, R_{1;p_1}, …, X_{n;p_n}, R_{n;p_n}) dp_1 ··· dp_n,

where p_j and dp_j stand, respectively, for (x_j, r_j(x_j, u_j)) and P(X_j ∈ dx_j) du_j. Instead of the condition that g be bounded from below or above, it is enough to require only that g(x_1, r_1, …, x_n, r_n) − c_1 x_1 − ··· − c_n x_n be so for some real constants c_1, …, c_n, over all real x_1, r_1, …, x_n, r_n.

IOSIF PINELIS
For every natural α, let H_+^α denote the class of all functions f : R → R such that f has finite derivatives f^{(0)} := f, f^{(1)}, … Applying Theorem 2.4 along with results of [23, 25] to the mentioned asymmetry-corrected versions of self-normalized sums, one can obtain the following results.
Suppose that condition (2.16) holds for some p ∈ (0, 1) and all i ∈ {1, …, n}. Then, for all λ ≥ λ_*(p) and all real x, one has an upper bound of the form c_{3,0} P^{LC}(T_n ≥ x), where T_n := (Z_1 + ··· + Z_n)/n^{1/(2λ)}; Z_1, …, Z_n are independent r.v.'s, each having the standardized Bernoulli distribution with parameter p; the function x ↦ P^{LC}(T_n ≥ x) is the least log-concave majorant of the function x ↦ P(T_n ≥ x) on R; and c_{3,0} = 2e³/9 = 4.4634… . The upper bound c_{3,0} P^{LC}(T_n ≥ x) can be replaced by somewhat better ones, in accordance with [22, Theorem 2.3] or [25, Corollary 4]. The lower bound λ_*(p) on λ given by (2.17) is the best possible one, for each p.
The bounded-asymmetry condition (2.16) is likely to hold when the X_i's are bounded i.i.d. r.v.'s. For instance, (2.16) holds with p = 1/3 for the r.v. X of Example 2.3 in place of the X_i's.
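Returning to Example 2.3, the mixture identity stated there can be verified mechanically; here is a small sketch in Python (exact rational arithmetic; the variable names are of course ours):

```python
from fractions import Fraction as F

# Distribution of X in Example 2.3
mu = {-1: F(5, 10), 0: F(1, 10), 1: F(3, 10), 2: F(1, 10)}
assert sum(p * x for x, p in mu.items()) == 0   # X is zero-mean

# Claimed mixture weights and two-point zero-mean components
components = [
    (F(6, 10), {-1: F(1, 2), 1: F(1, 2)}),
    (F(3, 10), {-1: F(2, 3), 2: F(1, 3)}),
    (F(1, 10), {0: F(1)}),
]
mixture = {}
for w, comp in components:
    assert sum(p * x for x, p in comp.items()) == 0  # each component is zero-mean
    for x, p in comp.items():
        mixture[x] = mixture.get(x, F(0)) + w * p

assert mixture == mu   # the mixture reproduces the distribution of X
```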

Statements of related results, with discussion
We begin this section with a number of propositions, collected in Subsection 3.1. These propositions describe general properties of the reciprocating function r and the associated functions x_+ and x_−, and thus play a dual role. On the one hand, these properties of r and x_± may be of independent interest, each to its own extent.
On the other hand, they will be used in the proofs of the basic Theorem 2.2 and related results to be stated and discussed in Subsections 3.2-3.5.
In Subsection 3.2, a generalization and various specializations of the mentioned two-point zero-mean disintegration are presented; methods of proof are discussed, and numerous relations of these results among themselves and with the mentioned result of Aizenman et al. [3] are also given. In Subsection 3.3, which exploits some of the results of Subsection 3.2, the disintegration based on the reciprocating function is shown to be optimal: most symmetric, but also most inhomogeneous in the widths. In Subsection 3.4, various characterizations of the reciprocating function r (as well as of the functions x_±) are given. These characterizations are perhaps the most difficult results of this paper to obtain. They are then used in Subsection 3.5 for modeling.
In all these results, the case when X = 0 a.s. is trivial. So, henceforth let us assume by default that P(X = 0) < 1. Also, unless specified otherwise, µ will stand for the distribution µ X of X.
3.1. General properties of the functions x ± and r. Let us begin this subsection by stating, for easy reference, some elementary properties of the functions x ± defined by (2.3) and (2.4).
Several basic relations between x_+ and x_− follow; moreover, for any h_1, h_2, and x one has corresponding implications. Consider the lexicographic order ≺ on [0, ∞] × [0, 1] defined by the formula: (x_1, u_1) ≺ (x_2, u_2) if and only if either x_1 < x_2, or x_1 = x_2 and u_1 < u_2. The following proposition is a useful corollary of Proposition 3.3.

Proposition 3.4. One has P(X ≠ 0, G̃(X, U) = h) = 0 for all real h. Therefore, P(G̃(X, U) = h) = 0 for all real h ≠ 0; that is, the distribution of the "randomized" version G̃(X, U) of G(X) may have an atom only at 0.
Along with the r.v. X, let Y, Y^+, Y^− stand for any r.v.'s which are independent of U and whose distributions are determined by the corresponding identities, required to hold for all Borel functions f : R² → R bounded from below (or from above). Here and elsewhere, we use the standard notation x_+ := max(0, x) and x_− := min(0, x). One should not confuse Y_± with Y^±; in particular, by (3.12), P(Y_+ = 0) = P(Y ≤ 0) = P(X ≤ 0) ≠ 0 (since E X = 0), while P(Y^+ = 0) = 0. Now one can state another corollary of Proposition 3.3. At this point one is ready to admit that the very formulation of Theorem 2.2 may seem problematic, for the following reasons. On the one hand, the two-value zero-mean r.v.'s X_{a,b} are not defined (and cannot be reasonably defined) when one of the points a, b is ∞ or −∞ while the other one is nonzero. On the other hand, r(x, u) may take infinite values for some u ∈ [0, 1] and real nonzero x, which would make the r.v. X_{x,r(x,u)} undefined; this may happen, for example, if X has a zero-mean distribution whose support is unbounded in one direction. However, such concerns are taken care of by another corollary of Proposition 3.3:

Proposition 3.6. Almost surely, |r(X, U)| < ∞.
An application of Proposition 3.4 is the following refinement of Proposition 3.6. Let, as usual, supp ν denote the support of a given nonnegative measure ν, defined as the set of all points x ∈ R such that ν(O) > 0 for every open neighborhood O of x. Then, also as usual, supp X is defined as the support of the distribution μ_X of X.

Proposition 3.7. One has P(X ≠ 0, r(X, U) ∉ (supp X) \ {0}) = 0; that is, almost surely on the event {X ≠ 0}, the values of the r.v. r(X, U) are nonzero and belong to supp X. In particular, P(X ≠ 0, r(X, U) = 0) = 0. (Obviously, r(X, U) = 0 on the event {X = 0}.)

In the sequel, the definition of a function x̃ on R × [0, 1] will be quite helpful; cf. definition (2.6) of the reciprocating function r. Proposition 3.8 lists, in particular, the conditions that must all occur (ii) if 0 ≤ x̃(x, u) < x, and, symmetrically, if x < x̃(x, u) ≤ 0. From Proposition 3.8, we shall deduce

Proposition 3.9. Almost surely, x̃(X, U) = X.
In view of Propositions 3.9 and 3.8, one may find it appropriate to refer to x̃(x, u) as the regularized version of x, and to the function x̃ as the regularizing function for (the distribution of) X.
We shall use Proposition 3.9 to show that the symmetry property r(−x) ≡ −r(x) of the reciprocating function for a symmetric r.v. X with a continuous strictly increasing d.f., mentioned in the Introduction, essentially holds in general, without the latter two restrictions on the d.f.:

Proposition 3.10. The following conditions are equivalent to one another.

Propositions 3.4 and 3.9 can also be used to show that the term "reciprocating function" remains appropriate even when the d.f. of X is not necessarily strictly increasing. Toward that end, let us first state

Proposition 3.11. For any given (x, u), let h := G̃(x, u) and y := r(x, u); then the corresponding inversion identities hold.

Remark. In general, the identity r(x, u) = −x for a symmetric r.v. X does not have to hold for all x ∈ R and u ∈ [0, 1], even if X is continuous. For example, let X be uniformly distributed on [−1, 1] and x > 1; then r(x, u) = r(x) = −1 ≠ −x for all u. Moreover, then r(r(x)) = 1 ≠ x, so that the identity r(r(x)) = x does not have to hold for all x ∈ R, even if X is continuous. Furthermore, if X is not continuous and V is not allowed to depend on (X, U), then the conclusion r(r(X, U), V) = X a.s. in Proposition 3.12 will not hold in general. For instance, in Example 2.3 one has r(r(1, u), v) = r(r(2, u), v) = I{v ≤ 3/5} + 2 I{v > 3/5} for all u and v in [0, 1]; so, for any r.v. V taking its values in [0, 1] and independent of (X, U), one has P(r(r(X, U), V) = X) < 1.

3.2. Variations on the disintegration theme. In this subsection we shall consider a formal extension of Theorem 2.2, stated as Proposition 3.13, which is in fact equivalent to Theorem 2.2 and yet is more convenient in certain applications. A number of propositions which are corollaries of Theorem 2.2 or Proposition 3.13 will be considered here, including certain identities for the joint distribution of X and r(X, U). As noted before, Theorem 2.2 implies a certain disintegration of the zero-mean distribution of X into a mixture of two-point zero-mean distributions; recall (2.13).
We shall prove that such a disintegration can be obtained directly as well, and that proof is much simpler than the proof of Theorem 2.2.

Moreover, the function v is Borel and takes its values in the interval
Let us now proceed by noting first a special case of (2.12), with g(x, r) := I{x = 0, r ≠ 0} for all real x and r. Then it follows that r(X, U) = 0 almost surely on the event {X = 0}. Next, note that the formalization of (2.11) given in Theorem 2.2 differs somewhat from the way in which the notion of the conditional distribution is usually understood. Yet, Theorem 2.2 and its extension, Theorem 2.4, are quite convenient in applications, such as Corollaries 2.5 and 2.6, and others. However, Theorem 2.2 can be presented in a more general form, as a statement on the joint distribution of the ordered pair (X, r(X, U)) and the (unordered) set {X, r(X, U)}, which may appear to be in better accordance with the informal statement (2.11):

Proposition 3.13. Let g : R² × R² → R be any Borel function bounded from below (or from above), which is symmetric in the pair (x̃, r̃) of its last two arguments:

(3.18)  g(x, r; r̃, x̃) = g(x, r; x̃, r̃) for all real x, r, x̃, r̃.

Then the corresponding extension of identity (2.12) holds. Instead of the condition that g be bounded from below or above, it is enough to require only that g(x, r; x̃, r̃) − cx − c̃r̃ be so for some real constants c, c̃, over all real x, r, x̃, r̃.
The symmetry restriction (3.18) imposed on the functions g in Proposition 3.13 corresponds to the fact that the conditioning in (2.10) and (2.11) is on the (unordered) set {X, r(X, U)}, and of course not on the ordered pair (X, r(X, U)). Indeed, the natural conditions ψ(a, b) = ψ(b, a) = ψ̃({a, b}) (for all real a and b) establish a one-to-one correspondence between the symmetric functions (a, b) ↦ ψ(a, b) of the ordered pairs (a, b) and the functions {a, b} ↦ ψ̃({a, b}) of the sets {a, b}. This correspondence can be used to define the Borel σ-algebra on the set of all sets of the form {a, b} with real a and b as the σ-algebra generated by all symmetric Borel functions on R². It is then with respect to this σ-algebra that the conditioning in the informal equation (2.11) should be understood.
Even if more cumbersome than Theorem 2.2, Proposition 3.13 will sometimes be more convenient to use. We shall prove Proposition 3.13 (later in Section 4) and then simply note that Theorem 2.2 is a special case of Proposition 3.13.
Alternatively, one could first prove Theorem 2.2 in virtually the same way as Proposition 3.13 is proved in this paper (one would only have to use g̃(a, b) instead of g(a, b; a, b) [= g(a, b; b, a)]), and then it would be easy to deduce the ostensibly more general Proposition 3.13 from Theorem 2.2, in view of (3.17). Indeed, for any function g as in Proposition 3.13, one can observe that E g(X, r(X, U); X, r(X, U)) = E g̃(X, r(X, U)) and E g(X_{a,b}, R_{a,b}; a, b) = E g̃(X_{a,b}, R_{a,b}) for all real a and b such that either ab < 0 or a = b = 0, where g̃(a, b) := g(a, b; a, b).
The following proposition, convenient in some applications, is a corollary of Proposition 3.13.
Proposition 3.14 allows one to easily obtain identities for the distribution of the ordered pair X, r(X, U ) or, more generally, for the conditional distribution of X, r(X, U ) given the random set {X, r(X, U )}.
For instance, letting g(x, r; x̃, r̃) := x ψ(x̃, r̃), one obtains the following proposition, which states that the conditional expectation of X given the random set {X, r(X, U)} is zero. More formally, one has

Proposition 3.15. Suppose that ψ : R² → R is a symmetric Borel function, so that ψ(x, r) = ψ(r, x) for all real x and r. Suppose also that the function (x, r) ↦ x ψ(x, r) is bounded on R². Then E X ψ(X, r(X, U)) = 0.
While Proposition 3.15 is a special case of Proposition 3.14 and hence of Proposition 3.13, the general case presented in Proposition 3.13 will be shown to follow rather easily from this special case; essentially, this easiness is due to the fact that a distribution on a given two-point set is uniquely determined if the mean of the distribution is known -to be zero, say, or to be any other given value.
Looking back at (3.16), one can see that the ratio X/r(X, U) can be conveniently defined almost surely on the event {X ≠ 0}; let also X/r(X, U) := −1 on the event {X = 0}. Letting then g(x, r; x̃, r̃) := (ψ(r̃, x̃) + ψ(x̃, r̃) (x/r) I{xr < 0}) φ(x̃, r̃) for all real x, r, x̃, r̃, where ψ is any nonnegative Borel function and φ is any symmetric nonnegative Borel function, one obtains from Proposition 3.14 the identity

(3.20)  E ψ(X, r(X, U)) (X/r(X, U)) φ(X, r(X, U)) = − E ψ(r(X, U), X) φ(X, r(X, U)).
In particular, letting here ψ = 1, one sees that the conditional expectation of X/r(X, U) given the two-point set {X, r(X, U)} is −1; this yields (3.21) and, further, (3.22). On the other hand, letting r(X, U)/X := −1 on the event {X = 0}, one has

Proposition 3.16. If X is symmetric, then E r(X, U)/X = −1.

The contrast between (3.21) and (3.22) may appear surprising, as an ostensible absence of interchangeability between X and r(X, U). However, this does not mean that the construction of the reciprocating function is deficient in any sense. In fact, as mentioned before, the disintegration based on r will be shown to be optimal in a variety of senses. Also, such "non-interchangeability" of X and r(X, U) manifests itself even in the case of a "pure" two-point zero-mean distribution, for all a and b with ab < 0; recall (2.9). The "strange" inequality E X/r(X, U) ≠ E r(X, U)/X (unless X is symmetric) is caused only by the use of an inappropriate averaging measure, namely the distribution of the r.v. X, just one r.v. of the pair (X, r(X, U)); this choice of one r.v. over the other breaks the symmetry. Here is how this concern is properly addressed: averaging with respect to the appropriate measure, one has "= −1", and "< −1" except when X is symmetric, in which case one has "= −1" in place of "< −1"; recall that Y, Y^+, and Y^− are almost surely nonzero, by Proposition 3.5.
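The "non-interchangeability" for a pure two-point zero-mean distribution is easy to see by direct computation. For X_{a,b} with ab < 0 one has R_{a,b} = ab/X_{a,b}, so X_{a,b}/R_{a,b} = X_{a,b}²/(ab); the following sketch (helper names ours) confirms that E X_{a,b}/R_{a,b} = −1 always, while E R_{a,b}/X_{a,b} = −1 only in the symmetric case a + b = 0:

```python
from fractions import Fraction as F

def two_point(a, b):
    # zero-mean two-point distribution: P(a) = b/(b-a), P(b) = -a/(b-a)
    return {a: F(b, b - a), b: F(-a, b - a)}

def e_x_over_r(a, b):
    # X/R = X/(ab/X) = X^2/(ab)
    return sum(p * F(x * x, a * b) for x, p in two_point(a, b).items())

def e_r_over_x(a, b):
    # R/X = (ab/X)/X = ab/X^2
    return sum(p * F(a * b, x * x) for x, p in two_point(a, b).items())

assert e_x_over_r(-1, 2) == -1          # E[X/R] = -1 even for asymmetric {a, b}
assert e_r_over_x(-1, 2) == F(-3, 2)    # E[R/X] = -3/2 != -1
assert e_r_over_x(-1, 1) == -1          # equality holds in the symmetric case
```

In general, E R_{a,b}/X_{a,b} = (b³ − a³)/(ab(b − a)), which equals −1 exactly when a + b = 0.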
Just as in Proposition 3.13 versus (2.11), the equalities in distribution of the random two-point sets in (3.26) are understood as the equalities of the expected values of (say, all nonnegative Borel) symmetric functions of the corresponding ordered pairs of r.v.'s. Proposition 3.17 and, especially, relations (3.25) suggest an alternative way to construct the reciprocating function r. Namely, one could start with an arbitrary r.v. H uniformly distributed in [0, m] and then let Y^± := x_±(H). Then, by a disintegration theorem for the joint distribution of two r.v.'s (see e.g. [9]), one could choose functions r_± so as to satisfy (3.25). Finally, one would let r(y, u) := r_±(y, u) if ∓y > 0. However, this approach appears less constructive than the one represented by (2.6) and thus will not be pursued here.
Going back to (3.20) and letting there φ = 1 and ψ(x, r) ≡ I{(x, r) ∈ A} for an arbitrary A ∈ B(R²), one has

μ_{(R,X)}(A) = ∫_A (I{x = r = 0} + (−x/r) I{xr < 0}) μ_{(X,R)}(d(x, r)),

where R := r(X, U) and μ_Z denotes the distribution of a random point Z, with the rule −0/0 := 1. This means that the distribution of the random point (r(X, U), X) is absolutely continuous relative to that of (X, r(X, U)), with the function (x, r) ↦ I{x = r = 0} + (−x/r) I{xr < 0} as a Radon-Nikodym derivative.

Specializing further, with A of the form B × R for some B ∈ B(R), one sees that the distribution of r(X, U) is absolutely continuous relative to that of X, with the function x ↦ ∫_0^1 (I{x = r(x, u) = 0} + (−x/r(x, u)) I{x r(x, u) < 0}) du as a Radon-Nikodym derivative. Recall now the special case (2.13) of (2.12). In particular, identity (2.13) implies that an arbitrary zero-mean distribution can be represented as a mixture of two-point zero-mean distributions. However, such a mixture representation by itself is much easier to prove (and even to state) than Theorem 2.2. For instance, one has

Proposition 3.18. Let g : R → R be any Borel function bounded from below (or from above) such that g(0) = 0. Then

(3.27)  E g(X) = ∫_0^m ( g(x_+(h))/x_+(h) − g(x_−(h))/x_−(h) ) dh.

We shall give a very short and simple proof of Proposition 3.18 (see Proof 1 on page 37), which relies only on such elementary properties of the functions x_+ and x_− as (3.1) and (iii) of Proposition 3.1. We shall also give an alternative proof of Proposition 3.18, based on [3, Theorem 2.2] as well as on some properties of the functions x_+ and x_− provided by Propositions 3.8 and 3.1 of this paper. The direct proof is a bit shorter and, in our view, simpler.
This simplicity of the proof might be explained by the observation that, while Proposition 3.18 or, for that matter, identity (2.13) describes the one-dimensional distribution of X (as a certain mixture), Theorem 2.2 provides a mixture representation of the two-dimensional distribution of the pair (X, r(X, U)), even though the distribution of this pair is completely determined by the distribution of X. Note that the random pair (X, r(X, U)) is expressed in terms of the reciprocating function, which in turn depends, in a nonlinear and rather complicated manner, on the distribution of X. Another indication of the simplicity of identity (3.27) is that it, in contrast with (2.12) and even with (2.13), does not contain the randomizing random variable U. On the other hand, an obvious advantage of disintegration (2.12) is that it admits such applications to self-normalized sums as Corollaries 2.5 and 2.6.
However, there are a number of ways to rewrite (3.27) in terms similar to those of (2.13). Toward that end, for each function g as in Proposition 3.18, introduce the function Ψ_g. Then (3.27) can be rewritten in terms of E Ψ_g(H), where H is any r.v. uniformly distributed on the interval [0, m]. Hence, for all g as in Proposition 3.18, one has an identity similar in form to (2.13). However, more interesting mixture representations are obtained if one uses Proposition 3.5 (and also Proposition 3.9) instead of Proposition 3.19:

Proposition 3.20. Let g : R → R be any Borel function bounded from below (or from above). Then, under the appropriate convention for 0/0, identities (3.30) and (3.31) hold.

Going back to (3.20) and letting therein ψ(x, r) ≡ E g(X_{x,r}) and φ = 1, one can rewrite the right-hand side of identity (3.31); here, as before, R := r(X, U). Similarly, but in a simpler way (without using (3.20)), identity (3.30) can be rewritten as well. Now it is immediately clear why the right-hand sides of (3.30) and (3.31) are identical to each other: because X_{x,r} =^D X_{r,x}. This gives another way to derive (3.31): from (3.30) and (3.20). Of course, identity (3.30) is the same as (2.13), which was obtained as a special case of (2.12). Here, the point is that identity (2.13) can alternatively be deduced from the simple (to state and to prove) identity (3.27).
However, no simple way is seen to deduce (2.12) from (3.27). Toward such an end, one might start with the obvious identity E g(X, r(X, U)) = E g_1(X), where g_1(x) := ∫_0^1 g(x, r(x, v)) dv. Then one might try to use (3.30) with g_1 in place of g. From this, one would be able to get (2.12) if one could replace the resulting terms r(x, v) and r(r(x, u), v) by r(x, u) and x, respectively; but it is not clear how this could easily be done, unless the distribution of X is non-atomic (cf. Propositions 3.11 and 3.12). In any case, such an alternative proof would hardly be simpler than the proof of disintegration (2.12) given in this paper.
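Under the reading of (3.27) as E g(X) = ∫_0^m (g(x_+(h))/x_+(h) − g(x_−(h))/x_−(h)) dh, the identity can be checked exactly for the distribution of Example 2.3, for which m = 1/2, x_+(h) = 1 for h ∈ (0, 3/10], x_+(h) = 2 for h ∈ (3/10, 1/2], and x_−(h) = −1 on (0, 1/2]; the integrand is then piecewise constant, so the integral is a finite sum. The following sketch (our own; the generalized inverses x_± are hard-coded for this example) verifies this for several test functions g with g(0) = 0:

```python
from fractions import Fraction as F

# Distribution of Example 2.3; m = E X_+ = 1/2
mu = {-1: F(5, 10), 0: F(1, 10), 1: F(3, 10), 2: F(1, 10)}
m = sum(p * x for x, p in mu.items() if x > 0)

def x_plus(h):   # inf{x >= 0 : E X I{0 < X <= x} >= h}, hard-coded for this mu
    return 1 if h <= F(3, 10) else 2

def x_minus(h):  # analogous generalized inverse on the negative side
    return -1

def rhs(g):
    # integrate the piecewise-constant integrand over (0, 3/10] and (3/10, m]
    total = F(0)
    for lo, hi in [(F(0), F(3, 10)), (F(3, 10), m)]:
        h = (lo + hi) / 2
        a, b = x_minus(h), x_plus(h)
        total += (hi - lo) * (F(g(b), b) - F(g(a), a))
    return total

for g in (lambda x: x * x, lambda x: x ** 3, lambda x: max(x, 0)):
    assert sum(p * g(x) for x, p in mu.items()) == rhs(g)  # E g(X) = RHS of (3.27)
```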

3.3. Optimality properties of the two-point disintegration. Two-value zero-mean disintegration is not unique. For example, a suitable symmetric distribution can be represented either as a mixture of two asymmetric and one symmetric two-point zero-mean distributions or as a mixture of two symmetric two-point zero-mean distributions; the latter representation is a special case of (2.13) or, equivalently, (3.27).
We shall show that, in a variety of senses (indexed by the continuous superadditive functions described below), representation (2.13) of an arbitrary zero-mean distribution as a mixture of two-point zero-mean distributions is, on average, most symmetric. The proof of this optimality property is based on the variants, stated below, of a well-known theorem on optimal transportation of mass, which are most convenient for our purposes; cf. e.g. [13] (translated in [15, pp. 57-107]), [5], [29], [26]. We need to introduce some definitions.
Let I_1 and I_2 be intervals on the real line. A function k : I_1 × I_2 → R is called superadditive if

k(a, c) + k(b, d) ≥ k(a, d) + k(b, c)

for all a, b in I_1 and c, d in I_2 such that a < b and c < d. So, superadditive functions are like the distribution functions on R². For a function k : I_1 × I_2 → R to be superadditive, it is enough that it be continuous on I_1 × I_2 and twice continuously differentiable in the interior of I_1 × I_2 with a nonnegative second mixed partial derivative.
Let X_1 and X_2 be any r.v.'s with values in the intervals I_1 and I_2, respectively. Let X̃_i := x̃_i(H) for i = 1, 2, where H is any non-atomic r.v., and x̃_1 : R → I_1 and x̃_2 : R → I_2 are any nondecreasing left-continuous functions such that X̃_i equals X_i in distribution for i = 1, 2.

Proposition 3.21. Let the intervals I_1 and I_2 be as above. Suppose that a function k is superadditive, right-continuous, and bounded from below on I_1 × I_2. Then

(3.35)  E k(X_1, X_2) ≤ E k(X̃_1, X̃_2).

Proposition 3.22. Suppose that a function k is superadditive, continuous, and bounded from above on (0, ∞)². Suppose that X_1 > 0 and X_2 > 0 a.s. Then (3.35) holds.
Propositions 3.21 and 3.22 essentially mean that, if the unit-transportation cost function k is superadditive, then a costliest plan of transportation of the mass distribution µ_{X_1} on the interval I_1 to the mass distribution µ_{X_2} on I_2 is such that no two arrows of the transportation plan may cross over; that is, smaller (respectively, larger) values in I_1 are matched with appropriately smaller (respectively, larger) values in I_2. Note that no integrability conditions are required in Proposition 3.21 or 3.22 except for the boundedness of k from below or above; at that, either or both sides of inequality (3.35) may be infinite.

Proposition 3.23. Suppose that one has a two-point zero-mean mixture representation of the distribution of a zero-mean r.v. X: [display (3.36) omitted] for all Borel functions g : R → R bounded from below or from above, where ν is a probability measure on a measurable space (S, Σ), and y_+ : S → (0, ∞) and y_- : S → (-∞, 0) are measurable functions. Then [formula (3.37) omitted] defines a probability measure ν̃ on (S, Σ), so that the functions y_+ and y_- can (and will) be considered as r.v.'s on the probability space (S, Σ, ν̃); [parts (i)-(iii), including inequality (3.38), omitted], where the symbol "=" means an equality which takes place in the case when the additional symmetry condition k(x, -r) = k(r, -x) holds for all real x and r such that xr < 0; in particular, [the special cases (3.39)-(3.42) omitted: for any p > 0 and Z_p; for any p ≥ 1; for any p ≥ 0].

Remark. Observe that the probability measure ν̃ defined by (3.37), which, according to part (ii) of Proposition 3.23, equalizes y_± with Y_± in distribution, is quite natural, as one considers the problem of the most symmetric disintegration of an arbitrary zero-mean distribution into a mixture of two-point zero-mean distributions as the problem of the most symmetric transportation (or, in other words, matching) of the measure A → E X_+ I{X ∈ A} to the measure A → E(-X_-) I{X ∈ A} (of the same total mass) or, equivalently, the most symmetric matching of the distribution of Y_+ with that of Y_-.
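The no-crossing picture can be illustrated numerically: for a superadditive k, the monotone ("sorted-to-sorted", quantile-type) matching of two finite samples should yield a total cost at least that of any other matching. A minimal Python sketch (names ours), using the superadditive k(x, y) = xy:

```python
import random

def total_cost(k, xs, ys):
    """Total transportation cost of the matching pairing xs[i] with ys[i]."""
    return sum(k(x, y) for x, y in zip(xs, ys))

k = lambda a, b: a * b   # superadditive: nonnegative second mixed partial

random.seed(1)
xs = sorted(random.uniform(0.0, 1.0) for _ in range(50))
ys = sorted(random.uniform(0.0, 1.0) for _ in range(50))

monotone = total_cost(k, xs, ys)   # sorted-to-sorted: no two arrows cross
for _ in range(200):               # any other matching costs no more
    perm = ys[:]
    random.shuffle(perm)
    assert total_cost(k, xs, perm) <= monotone + 1e-12
```

For this particular k, the displayed inequality is the classical rearrangement inequality, of which Propositions 3.21 and 3.22 are far-reaching measure-theoretic extensions.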
Observe also that, in terms of ν̃, mixture representation (3.36) can be rewritten in a form matching that of (3.27): [display omitted], and at that ∫_S m ν̃(ds) = m = ∫_0^m dh.

Remark 3.24. Inequality (3.39) means that the two-point zero-mean disintegration given in this paper is, on an average, both least skewed to the right and least skewed to the left, where the averaging is done according to the distribution of (Y, U) or that of (Y_+, U) or (Y_-, U). Inequality (3.39) is obtained as a special case of (3.38) (in view of Proposition 3.22) with k(y_1, y_2) ≡ [expression omitted], for positive y_1, y_2; inequality (3.40) is a "two-sided" version of (3.39). Generalizing both the one- and two-sided versions, one can take k(y_1, y_2) ≡ [expression omitted], where the functions f_1, f_2 are nonnegative, continuous, and nondecreasing, and the functions g_1, g_2 are strictly positive, continuous, and nondecreasing. Another two-sided expression of least average skewness is given by (3.41), which is obtained as a special case of (3.38) (again in view of Proposition 3.22) with k(y_1, y_2) ≡ -|y_1 - y_2|^p, for positive y_1, y_2; using -|(y_1 - y_2)_±|^p instead of -|y_1 - y_2|^p, one will have the corresponding right- and left-sided versions; note that, in any of these versions, the condition that ‖Y_+‖_p < ∞ or ‖Y_-‖_p < ∞ is not needed. More generally, one can take k(y_1, y_2) ≡ -f(c_1 y_1 - c_2 y_2), where f is any nonnegative convex function and c_1, c_2 are any nonnegative constants.
On the other hand, (3.42) implies that our disintegration has the greatest p-average width |y - r(y, u)|. This two-sided version is obtained from (3.38), in view of Proposition 3.21, with k(y_1, y_2) ≡ |y_1 + y_2|^p, again for positive y_1, y_2; using |(y_1 + y_2)_±|^p instead will provide the corresponding right- and left-sided versions.
The largest-p-average-width property can also be expressed by taking |y_1 y_2|^p or |y_1^± y_2^±|^p in place of |y_1 + y_2|^p or |(y_1 + y_2)^±|^p. More generally, one can take k(y_1, y_2) ≡ f(c_1 y_1 + c_2 y_2), where f is any nonnegative convex function and c_1, c_2 are again any nonnegative constants; cf. (3.43). Thus, our disintegration can be seen as most inhomogeneous in the widths of the two-point zero-mean distributions constituting the mixture.
Another way to see this is to take any nonnegative a, b and then, in Proposition 3.23, the superadditive function k(y_1, y_2) ≡ I{y_1 ≥ a, y_2 ≥ b} or k(y_1, y_2) ≡ I{y_1 < a, y_2 < b}. Then one sees that our disintegration makes each of the two probabilities, the large-width probability P(y_- ≤ -a, y_+ ≥ b) and the small-width probability P(-a < y_-, y_+ < b), the greatest possible (over all the two-point zero-mean disintegrations, determined by the functions y_± as in Proposition 3.23).
Moreover, each of these two properties, most-large-widths and most-small-widths, is equivalent to each of the two least-average-skewness properties: the least right-skewness and the least left-skewness. Indeed, for our disintegration, the right-skewness probability P(y_- > -a, y_+ ≥ b) is the least possible, since it complements the large-width probability P(y_- ≤ -a, y_+ ≥ b) to P(y_+ ≥ b), and it complements the small-width probability P(-a < y_-, y_+ < b) to P(y_- > -a), and at that each of the probabilities P(y_+ ≥ b) and P(y_- > -a) is the same over all the disintegrations; recall part (ii) of Proposition 3.23. Similarly one shows that the least left-skewness is equivalent to each of the properties most-large-widths and most-small-widths. So, there is a rigid trade-off between average skewness and width homogeneity.
On the other hand, reviewing the proofs of Propositions 3.21 and 3.22 (see especially (4.13)), one realizes that the superadditive functions k of the form k(y_1, y_2) ≡ I{y_1 ≥ a, y_2 ≥ b} serve as elementary building blocks; more exactly, these elementary superadditive functions (together with the functions that depend on only one of the two arguments) represent the extreme rays of the convex cone that is the set of all superadditive functions. From these elementary superadditive functions, an arbitrary superadditive function can be obtained by mixing and/or limit transition. One can now conclude that the exact equivalence between the least average skewness and the most inhomogeneous width (of a two-point zero-mean disintegration) occurs at the fundamental, elementary level.
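Indeed, for a' < b' and c' < d' the superadditivity defect of the elementary indicator factors as (I{b' ≥ a} - I{a' ≥ a}) · (I{d' ≥ b} - I{c' ≥ b}) ≥ 0, and mixing the indicators recovers smooth superadditive functions. A quick numerical confirmation in Python (names ours):

```python
import random

def defect(k, a, b, c, d):
    """Superadditivity defect; k is superadditive iff this is >= 0 for a < b, c < d."""
    return k(a, c) + k(b, d) - k(a, d) - k(b, c)

def block(t1, t2):
    """Elementary superadditive function I{y1 >= t1, y2 >= t2}."""
    return lambda y1, y2: float(y1 >= t1 and y2 >= t2)

random.seed(2)
for _ in range(2000):
    t1, t2 = (random.uniform(-1, 1) for _ in range(2))
    a, b = sorted(random.uniform(-1, 1) for _ in range(2))
    c, d = sorted(random.uniform(-1, 1) for _ in range(2))
    assert defect(block(t1, t2), a, b, c, d) >= 0

# mixing the blocks recovers a smooth superadditive function:
# for y1, y2 in [0, 2], y1*y2 = integral of I{y1 >= s, y2 >= t} ds dt over [0,2]^2
y1, y2, n = 0.7, 1.3, 500
h = 2.0 / n
approx = sum(block(i * h, j * h)(y1, y2) for i in range(n) for j in range(n)) * h * h
assert abs(approx - y1 * y2) < 0.02   # Riemann-sum approximation of the mixture
```

The last two lines are a discrete instance of the "mixing and/or limit transition" by which general superadditive functions arise from the elementary ones.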
Remark. It is rather similar (and even slightly simpler) to obtain an analogue of Proposition 3.23 for the mentioned disintegration (given in [3, Theorem 2.2]) of any probability distribution into the mixture of two-point distributions with the same skewness coefficients (but possibly with different means). In fact, a same-skewness analogue of (3.42) in the limit case p = ∞ was obtained in [3, Theorem 2.3]; note that the corresponding L^∞ norm of the width equals ∞ unless the support of the distribution is bounded. In this paper, we shall not further pursue the matters mentioned in this paragraph.
3.4. Characteristic properties of reciprocating functions. To model reciprocating functions, one needs to characterize them. Let us begin here with some identities, (3.44), which follow from Proposition 3.18 [displays omitted]. It is interesting that identities (3.44) together with properties (i)-(iv) of Proposition 3.1 completely characterize the functions x_+ and x_-. This allows effective modeling of the asymmetry patterns of a zero-mean distribution.
In applications such as Corollaries 2.5 and 2.6, which are stated in terms of the reciprocating function r, it is preferable to model r (rather than the functions x ± ). Toward that end, let us provide various characterizations of the reciprocating function r.
For any (nonnegative) measure µ on B(R), let µ_+ be the measure defined by the formula [omitted]. [Statement of Proposition 3.27, with conditions (I) and (II)(a)-(k), omitted.] Moreover, under condition (II), the measure µ as in condition (I) is unique.
In the sequel, we shall refer to conditions (a)-(k) listed in Proposition 3.27 as 3.27(II)(a)-(k) or as 3.27(II)(a-k); similar references will be made to conditions listed in other propositions. Proposition 3.27 implies, in particular, that the set of all zero-mean probability measures µ on B(R) is "parameterized" via the one-to-one mapping µ ←→ (s, ν) = ((r_µ)_+, µ_+); also, Proposition 3.27 provides a complete description of the "parameter space" (say M) consisting of all such pairs (s, ν) = ((r_µ)_+, µ_+).
Next, we characterize the projection of the parameter space M onto the "first coordinate axis"; that is, the set of all functions s such that (s, ν) is in M for some measure ν. In other words, we are now going to characterize the set of the "positive parts" r_+ of the reciprocating functions of all zero-mean probability measures µ on B(R). Toward that end, with any function s on [0, ∞] × [0, 1] associate the "level" sets [definitions omitted]; here inf ∅ := ∞; note that the set of which a_s is the supremum contains the point 0 and hence is never empty.
[Conditions of the characterization, partly omitted:] ...; in fact, this b necessarily coincides with b_s; (j') the set M_s(0) has one of the following two forms: [forms omitted]. Now let us characterize those s = r_+ that determine the corresponding reciprocating function r uniquely; the key condition here is (u): for any x and y such that 0 ≤ x < y ≤ ∞, one of the following three conditions must occur: [conditions omitted]. Proposition 3.29 shows that the intrinsic "cause" (that is, the "cause" expressed only in terms of the function s itself) of the possible non-uniqueness of r given s is that s may fail to satisfy the almost-strict-decrease condition 3.29(II)(u), while an extrinsic "cause" of such non-uniqueness is that the support set of the "positive" part µ_+ of µ may fail to be connected. On the other hand, the next proposition shows that another extrinsic "cause" of the possible non-uniqueness is that the "negative" part µ_- of µ may fail to be non-atomic, where µ_- is the measure defined by the formula [omitted] (cf. the definition of µ_+), in terms of G̃(x, u), along with conditions [omitted]. Moreover, under condition (II), the measure µ as in (I) is unique.
The following "non-atomic" version of Propositions 3.28 and 3.29 is based in part on the well-known theorem that every non-empty closed set (say in R^d) without isolated points is the support of some non-atomic probability measure; see e.g. [20]. In contrast with Proposition 3.34, the following proposition characterizes the reciprocating functions r of non-atomic zero-mean probability measures with a connected support (rather than the "positive parts" s = r_+ of such functions r): (I) there exists a non-atomic zero-mean probability measure µ on B(R) such that supp µ = I and r_µ = r; (II) the condition r(0) = 0 holds, along with the following: [conditions omitted]. Our final characterization concerns the case when it is desirable to avoid zero-mean probability measures µ with a density that is discontinuous at 0 (say, as an unlikely shape). [Statement omitted.] Moreover, if either condition (I) or (II) holds, then necessarily r′(0) = -1; that is, one has the approximate local symmetry condition r(x) ∼ -x as x → 0.
3.5. Modeling reciprocating functions. As pointed out by Bartlett [4] and confirmed by Ratcliffe [27], skewness affects the t distribution (and hence that of the self-normalized sum) more than kurtosis does. These results are in agreement with the result by Hall and Wang [11]. Tukey [30, page 206] wrote, "It would be highly desirable to have a modified version of the t-test with a greater resistance to skewness... ." This concern is addressed in the present paper by such results as Corollaries 2.5 and 2.6.
Closely related to this is the question of modeling asymmetry. Tukey [31] proposed using power-like transformation functions of the form z(y) = a(y + c)^p + b, y > -c, with the purpose of symmetrizing the data. To deal with asymmetry and heavy tails, Tukey also proposed (see Kafadar [16, page 328] and Hoaglin [12]) the so-called g-h technology, whereby one fits the data to a g-h distribution, which is the distribution of a r.v. of the form e^{hZ^2/2} (e^{gZ} - 1)/g, where Z ∼ N(0, 1), so that the parameters g and h are responsible, respectively, for the skewness of the distribution and the heaviness of the tails.
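The g-h transformation is straightforward to sample from; a Python sketch (helper names ours) illustrating that g controls skewness, with the convention that g = 0 means the limit transformation Z:

```python
import math
import random

def gh_sample(g, h, n=100000, seed=3):
    """Sample Tukey's g-h r.v. e^{h Z^2/2} (e^{g Z} - 1)/g, Z ~ N(0,1);
    g = 0 is understood as the limit, giving just e^{h Z^2/2} Z."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        core = z if g == 0 else (math.exp(g * z) - 1.0) / g
        out.append(math.exp(h * z * z / 2.0) * core)
    return out

def skewness(xs):
    """Sample skewness coefficient (third standardized moment)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / (n * s2 ** 1.5)

assert skewness(gh_sample(0.5, 0.0)) > 0.5        # g > 0: marked right skew
assert abs(skewness(gh_sample(0.0, 0.0))) < 0.05  # g = 0, h = 0: just Gaussian
```

For g = 0.5, h = 0 the transformation is an affine image of a lognormal, whose skewness coefficient is about 1.75, well above the asserted bound.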
We propose modeling asymmetry using reciprocating functions. In view of Propositions 3.34 and 3.35, the reciprocating function r of any non-atomic zero-mean probability measure µ with a connected support can be constructed as follows.
[Formula defining r_{p,c} omitted,] with the convention that r(∞) := r(∞-); here, p ∈ R \ {0} and c > 0 are real numbers, which may be referred to as the shape and scale parameters, respectively. Indeed, one can see that a mere re-scaling of µ (or of a corresponding r.v. X) results only in a change of c: if r_{p,1} = r_X for some zero-mean r.v. X, then r_{p,c} = r_{cX}. For x ∈ [-∞, 0] such that px ≥ c (that is, for x ∈ [-∞, c/p] when p < 0), we set r_{p,c}(x) := ∞, in accordance with the general description of Construction 1. Let us also extend the family of functions r_{p,c} to p = 0 by continuity: [formula omitted]. The case p = 1 corresponds to the pattern of perfect symmetry of µ; that is, r(x) = -x for all x (recall Proposition 3.10). The case p > 1 corresponds to a comparatively long (or, equivalently, heavy) left tail of µ, so that µ will be skewed to the left. Similarly, the case p < 1 corresponds to a comparatively long (or heavy) right tail of µ. Thus, p can be considered as the asymmetry parameter.
Another limit case is when p → ±∞ and c → ∞ in such a manner that c/p → ±λ for some λ ∈ (0, ∞); this limit is given by [formula omitted], where λ and ±∞ play, respectively, the roles of the scale and shape (or, more specifically, asymmetry) parameters. Yet another limit case is when c → 0 and p → 1 in such a manner that c^{p-1} → κ for some κ ∈ (0, ∞); this limit is given by [formula omitted]. However, in this case the property r′(0) = -1 is lost (in fact, r_{1,0;κ} is not differentiable at 0) unless κ = 1, so that, by Proposition 3.36, no corresponding zero-mean distribution µ can have a density that is strictly positive and continuous at 0. The tighter the graph of the reciprocating function embraces the first quadrant, the more skewed is the corresponding distribution to the right; and the tighter the graph embraces the third quadrant, the more skewed is the distribution to the left. In this example, the greater is p, the more skewed to the left must the corresponding zero-mean distribution µ be.
A simplest such function is the quadratic function F given by the formula (3.49): F(x, y) ≡ Ax^2 + 2Bxy + Ay^2 + cx + cy, so that the graphs are elliptic or hyperbolic arcs, symmetric about the diagonal ∆ and passing through the origin. However, here we shall not consider this construction in detail.

Instead, let us turn to Construction 3, based on an asymmetry pattern function a: one requires that [condition (3.50) omitted] for all x ∈ (a_-, a_+), and that the following strict Lip(1) condition (Lipschitz with constant factor 1) hold: [condition omitted]. If a satisfies the strict Lip(1) condition, a(0) = 0, and a(w) → a_+ + a_- as w ↑ a_+ - a_-, then there exists a unique reciprocating function r such as in Proposition 3.35 that satisfies condition (3.50). In fact, one then necessarily has [formula (3.51) omitted], where the functions ξ and ρ are defined by [formulas (3.52) omitted]. In particular, Proposition 3.38 shows that the asymmetry pattern function a is necessarily Lipschitz and hence absolutely continuous, with a density a′(w) = da(w)/dw such that -1 < a′(w) < 1 for almost all w ∈ [0, a_+ - a_-). In view of (3.50), this density a′ may be considered as the rate of change of the asymmetry α = x + r(x) relative to the varying width w = |x - r(x)| of the constituent zero-mean distribution on the two-point set {x, r(x)}. For instance, if at a given width w the rate a′(w) is close to 1, then at this width the distribution's skewness to the right is growing fast. Also, for all w ∈ (0, a_+ - a_-), the ratio a(w)/w represents the average asymmetry-to-width rate over all widths from 0 to w. Thus, Construction 3 provides a flexible and sensitive tool to model asymmetry patterns.
One can see that in Example 3.37 the asymmetry-to-width rate a′ strictly increases or decreases from 0 to 1 or -1 as w increases from 0 to ∞, depending on whether p < 1 or p > 1, and a′(w) = 0 for all w ∈ [0, ∞) if p = 1. Moreover, [further statement omitted]. Let us now provide examples of two parametric families of reciprocating functions obtained using the asymmetry-to-width rate a′ as the starting point.

Example 3.39. Take any α ∈ [-1, 1] and c ∈ (0, ∞), and consider the asymmetry-to-width rate a′(w) = αw(2c + w)/(c + w)^2 for all w ∈ [0, ∞), so that, for α ∈ (0, 1], the rate a′(w) increases from 0 to α as w increases from 0 to ∞; similarly, for α ∈ [-1, 0), the rate a′(w) decreases from 0 to α as w increases from 0 to ∞. Then the corresponding asymmetry pattern function a is given by a(w) = a_{α,c}(w) = αw^2/(c + w) for all w ∈ [0, ∞), and, by (3.51), the corresponding reciprocating function r is given by [formula omitted] for α ∈ (-1, 1) and all x ∈ R; the expressions for r_{1,c} and r_{-1,c} are of different forms. Note that r_{α,c}(x) ∼ ((α ∓ 1)/(α ± 1)) x as x → ±∞, for each α ∈ (-1, 1); on the other hand, [statement omitted]. The parameters α and c are, respectively, the shape (or, more specifically, asymmetry) and scale parameters. The graph of r_{α,c} is the union of arcs of two different hyperbolas, -(1 + α)r^2 + 2αxr + (1 - α)x^2 + cr + cx = 0 and (1 - α)r^2 + 2αxr - (1 + α)x^2 + cr + cx = 0, used on the two sides of 0; cf. (3.49). Yet, by Proposition 3.38, all these reciprocating functions r_{α,c} are continuously differentiable in a neighborhood of 0 (in fact, they are so wherever on R they take finite values). [Figure: parts of the graphs {(x, r_{α,1}(x)) : x ∈ [a_-, a_+]} for α = -1, -1/2, 0, 1/2, 1.] In such an example, the shape (or, more specifically, asymmetry) parameter α can also be considered as a scale parameter, but in the direction of the diagonal ∆.

Example 3.40.
Take any α ∈ [-1, 1] and c ∈ (0, ∞), and consider the asymmetry-to-width rate of the form [formula omitted] for all w ∈ [0, ∞), so that, for α ∈ (0, 1], the rate a′(w) increases from 0 to α and then decreases from α back to 0 as w increases from 0 to c/√3 to ∞; similarly, for α ∈ [-1, 0), the rate a′(w) decreases from 0 to α and then increases from α back to 0 as w increases from 0 to c/√3 to ∞. The corresponding asymmetry pattern function a is given by [formula omitted], and, using (3.51), one can see that the corresponding reciprocating function r = r_{α,c} is given by an algebraic expression involving certain cubics. In particular, r_{α,c}(x) ∼ -x + 8αc/(3√3) as |x| → ∞. Again, the parameters α and c are, respectively, the shape (or, more specifically, asymmetry) and scale parameters; alternatively, in this example as well, the shape/asymmetry parameter α can also be considered as a scale parameter, in the direction of the diagonal ∆. [Figure: parts of the graphs {(x, r_{α,1}(x)) : x ∈ [a_-, a_+]} for α = -1, -1/2, 0, 1/2, 1.]

Examples 3.37, 3.39, and 3.40 of parametric families of reciprocating functions already appear to represent a wide enough variety. Moreover, Constructions 1-4 given in this subsection appear sufficiently convenient and flexible for efficient modeling of the asymmetry patterns that may arise in statistical practice. In any case, each of these constructions (of reciprocating functions for non-atomic distributions with connected support) is quite universal. For discrete distributions, it appears more convenient to model asymmetry patterns based on the characterization of the functions x_± provided by Proposition 3.26.
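Returning to Example 3.39: under our reading of the omitted displays (3.51)-(3.52), namely that x = ξ(w) = (w + a(w))/2 and r = ρ(w) = (a(w) - w)/2 for x ≥ 0 (an assumption, since those formulas are lost in this copy), the reciprocating function r_{α,c} can be computed numerically from a alone, and the stated properties of the rate a′ and the asymptote r_{α,c}(x) ∼ ((α - 1)/(α + 1))x as x → +∞ can be checked. A Python sketch (names ours):

```python
def r_from_a(a, x):
    """Return rho(xi^{-1}(x)) for x >= 0, under our reading of (3.51)-(3.52):
    x = xi(w) = (w + a(w))/2 and r = rho(w) = (a(w) - w)/2."""
    xi = lambda w: (w + a(w)) / 2.0
    lo, hi = 0.0, 1.0
    while xi(hi) < x:          # bracket the root of xi(w) = x
        hi *= 2.0
    for _ in range(200):       # bisection (xi is strictly increasing)
        mid = (lo + hi) / 2.0
        if xi(mid) < x:
            lo = mid
        else:
            hi = mid
    w = (lo + hi) / 2.0
    return (a(w) - w) / 2.0

alpha, c = 0.5, 1.0
a = lambda w: alpha * w * w / (c + w)                   # a_{alpha,c} of Example 3.39
aprime = lambda w: alpha * w * (2 * c + w) / (c + w) ** 2

# the rate a'(w) increases from 0 toward alpha, staying strictly below it
ws = [0.1 * i for i in range(1, 500)]
assert all(aprime(w1) < aprime(w2) for w1, w2 in zip(ws, ws[1:]))
assert aprime(0.0) == 0.0 and all(aprime(w) < alpha for w in ws)

# asymptote: r_{alpha,c}(x)/x -> (alpha - 1)/(alpha + 1) as x -> +infinity
x = 1e6
assert abs(r_from_a(a, x) / x - (alpha - 1) / (alpha + 1)) < 1e-3
```

The bisection is legitimate because ξ′ = (1 + a′)/2 > 0 under the strict Lip(1) condition, so ξ is strictly increasing and invertible.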
In any such parametric or nonparametric model, the reciprocating function can be estimated in a standard manner, as follows: substituting the empirical distribution for the "true" unknown distribution µ, one obtains empirical estimates of the function G and hence of the functions x_± and r; then, if desired, the empirical estimate of r can be fitted to an appropriate parametric family of reciprocating functions.
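A minimal sketch of such an empirical estimate in Python (all names ours), using the mass-matching description of r implicit in the construction (cf. the remark after Proposition 3.23): for b > 0, the value r(b) is taken to be the point a ≤ 0 at which the truncated mean E X I{a ≤ X ≤ b} vanishes. For a symmetric distribution one should then recover r(b) ≈ -b (Proposition 3.10):

```python
import random

def empirical_r(xs, b):
    """Estimate r(b), b > 0, for the centered sample xs: walk the negative
    sample points from 0 downward until their mass balances the positive
    mass on [0, b], i.e. until the truncated sample mean changes sign."""
    neg = sorted(x for x in xs if x < 0)            # ascending: most negative first
    pos_mass = sum(x for x in xs if 0 <= x <= b)
    acc = 0.0
    for a in reversed(neg):                          # nearest to 0 first
        if acc + a < -pos_mass:                      # adding a would overshoot
            return a
        acc += a
    return neg[0] if neg else 0.0

random.seed(4)
xs = [random.gauss(0.0, 1.0) for _ in range(100000)]
m = sum(xs) / len(xs)
xs = [x - m for x in xs]                             # center to zero mean

b = 1.0
a_hat = empirical_r(xs, b)
# symmetric distribution: r(b) should be close to -b
assert abs(a_hat + b) < 0.05
# the truncated mean E X I{r(b) <= X <= b} should be near 0
trunc = sum(x for x in xs if a_hat <= x <= b) / len(xs)
assert abs(trunc) < 0.01
```

Replacing the Gaussian sample by a skewed one would produce the characteristic |r(b)| ≠ b patterns that the parametric families above are designed to fit.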
It remains to recall Proposition 3.6.
Thus, part (ii) of the proposition is proved; part (iv) is quite similar.

Proof of Proposition 3.10. Implications (i)⇒(ii)⇔(iii)⇒(iv) follow straight from the corresponding definitions. Implication (iv)⇒(v) follows by Proposition 3.9. Implication (ii)⇒(i) follows by the identity [omitted] for all A ∈ B(R \ {0}), which in turn follows from definition (2.1). It remains to prove implication (v)⇒(iii). Toward this end, assume (v) and observe the equivalence [display omitted]. Therefore, [display omitted] and, by Proposition 3.5, (3.12), and Proposition 3.9, [display omitted], so that x_- = -x_+ almost everywhere on [0, m] (with respect to the Lebesgue measure) and hence on an everywhere dense subset of [0, m]. Now it remains to recall property (iv) in Proposition 3.1, taking also into account that x_±(0) = 0.
Proof of Proposition 3.12. This follows immediately from Propositions 3.11 and 3.9, on letting V := v(X, U ).
Proof of Proposition 3.13. By monotone convergence, without loss of generality (w.l.o.g.) let us assume that the function g is bounded. Now, in view of (2.8) and the independence of X and U, observe that the difference between the left-hand side and the right-hand side of (2.12) equals E Xψ(X, r(X, U)), where ψ(x, r) := [g(x, r; x, r) - g(r, x; r, x)]/(x - r) · I{xr ≤ 0, x ≠ r} for all real x and r, so that ψ(x, r) is understood as 0 if x = r. The function ψ is symmetric, and, since |xψ(x, r)| ≤ |g(x, r; x, r) - g(r, x; r, x)|, the expression xψ(x, r) is bounded over all real x and r. It remains to refer to Proposition 3.15, proved later in this paper.
Proof of Proposition 3.14. This follows immediately from Proposition 3.13 and (3.17).
To prove Proposition 3.15, we shall use some notation and two lemmas, as follows. For all real a and b, let [definitions omitted]. [Lemma statement omitted.]

Proof. Let us consider the following two cases.
Proof. We have to prove that e_1 + e_2 = 0 on R^2, where e_1 and e_2 are given by (4.1) and (4.2). Observe that [display omitted] for all real x and y and u ∈ (0, 1). Let us now consider the four possible cases.
The first equality in (3.25) is proved similarly, with g_1(x, r; x̃, r̃) ≡ x_+ψ(x, r) and [the corresponding display omitted]. To prove the second equality in (3.25), [argument omitted]. The second and third equalities in (3.26) follow immediately from (3.25). In turn, these two equalities imply the first equality in (3.26), in view of (4.8) (used with symmetric ψ).
The rest of Proposition 3.17 follows immediately from (3.24) and (3.25), except for the "except when" statement in the parentheses. To prove this latter statement, note that, in view of the inequality a/b + b/a < -2 for all real a and b such that ab < 0 and a ≠ -b, the equality E[r(Y, U)/Y + Y/r(Y, U)] = -2 implies that r(Y, U) = -Y almost surely. It remains now to refer to (3.12) and Proposition 3.10.

Proof of Proposition 3.19. This is quite similar to the proof of Proposition 3.5.
Proof of Proposition 3.20. Let g : R → R be any Borel function bounded from below (or from above). In addition to the function Ψ_g defined by (3.28), introduce the functions Ψ_{g,+} and Ψ_{g,-} defined by the formulas [omitted] for all h ∈ (0, m), so that Ψ_g = Ψ_{g,+} + Ψ_{g,-}.

Proof of Proposition 3.22. The proof is quite similar to that of [29, Corollary 2.2(a)].
We shall only indicate the necessary changes in that proof, in the notation used there, including the correction of a couple of typos: use the interval I_ε := (ε, 1/ε] with ε ↓ 0 instead of (-B, B] and, accordingly, replace Q_B by I_ε^2; w.l.o.g. one may assume here that h = 0; one does not need to assume that the integral ∫ ϕ dH at the end of [29, page 819] is finite; on line 1 of [29, page 820], there should be ∫(h - ϕ) dH̄ and lim inf instead of ∫(h - ϕ) dH and lim, respectively.
So, part (iii) and thus the entire Proposition 3.23 are proved.
Proof of Proposition 3.25. The first identity in (3.44) is a special case of Proposition 3.18, with g(x) ≡ I{x > 0}; the second identity is quite similar.

Proof of Proposition 3.26.
Let [definitions of L and y_± omitted]. Then one can check that relations (2.2), (3.1), and (3.2) hold with the functions L and y_± in place of G and x_±, respectively. Introduce also a nonnegative measure ν on B(R) by the formula [omitted] for all A ∈ B(R), so that [display omitted]; cf. (2.1). Then, using (3.1) (with L and y_+ instead of G and x_+) and Fubini's theorem, one has [display omitted]; similarly, [display omitted].

Proof of Proposition 3.27.
Checking (I) =⇒ (II). Here it is assumed that there exists a zero-mean probability measure µ on B(R) whose reciprocating function r := r_µ satisfies the conditions µ_+ = ν and r_+ = s. We have to show that conditions (a)-(k) then necessarily hold.
Step 1. Here, assuming conditions (a)-(k) to hold, we shall show that there exists a unique function y_- : [0, m_ν] → R such that [condition omitted] (cf. (4.18) and (4.21)). Let us assume that indeed G̃_ν(x, u) = G̃_ν(y, v); we have to show that s(x, u) = s(y, v). W.l.o.g. let us also assume that here (x, u) ≺ (y, v), whence, by condition (b), s(x, u) ≥ s(y, v). One of the following two cases must take place.
Step 2. Here, again assuming conditions (a)-(k) to hold, we shall show that all the conditions (i)-(iv) in Proposition 3.1 are satisfied with y_- defined by (4.21) and with m_ν in place of x_- and m.
Step 3. Now we are prepared to complete the entire proof of Proposition 3.27. By the just completed Step 2, the function y_- has all the properties (i)-(iv) listed in Proposition 3.1 for x_-. Letting now y_+ := x_{+,ν}, observe that the function y_+, too, has the same four properties (the proof of these properties of y_+ = x_{+,ν} is practically the same as the corresponding part of the proof of Proposition 3.1).
Observe next that (cf. Proposition 3.25) [display omitted]. Similarly to (4.19), but using (4.21) instead of (4.18), one also has [display omitted]. Hence, by condition (k), [display omitted]. It follows now by Proposition 3.26 that there exists a unique zero-mean probability measure µ such that x_{+,µ} = y_+ and x_{-,µ} = y_-. Since x_{+,µ} = y_+ = x_{+,ν}, in view of (3.1) one has [display omitted]. Finally, it remains to prove the uniqueness of µ given µ_+ and r_+. By Step 1 above, the function x_{-,µ} is uniquely determined by the condition [omitted] (cf. (4.18) and (4.21)). Also, the function x_{+,µ} is uniquely determined by G̃_{µ_+}. It remains to refer to the uniqueness part of the statement of Proposition 3.26.

Proof of Proposition 3.28.
Checking (I) =⇒ (II). To prove this implication, assume that condition (I) holds. Then s(x, u) < 0 for all u ∈ [0, 1], so that 0 < ∫_0^1 x/(-s(x, u)) du < ∞. In particular, for all x ∈ (a, b) one has ϕ(x) ∈ (1, ∞). It also follows that, if x ∈ R_a and s(x, u) = 0 for some but not all u ∈ [0, 1], then necessarily x = a and, by 3.27(II)(b), s(x, 1) = s(x, 0), so that (k') implies that the integral ∫_0^1 x/(-s(x, u)) du is finite (and, by 3.27(II)(a,b), it is also strictly positive). Let also ϕ(x) := 0 for x ∈ R \ R_a. Now one is prepared to introduce the measure ν_1 by the formula [omitted] for all A ∈ B(R), where p_1 is any probability density function that is strictly positive on (a, b) and zero elsewhere. Recall that ϕ(x) ∈ (1, ∞) for all x ∈ (a, b). So, the measure ν_1 is finite, nonzero, nonnegative, and non-atomic (even more, it is absolutely continuous with respect to the Lebesgue measure). Moreover, [display omitted]. Observe also that the set D is at most countable; this follows because, for any x and y in D such that x < y, one has, by 3.27(II)(b), s(y, 1) < s(y, 0) ≤ s(x, 1) < s(x, 0), so that the open intervals (s(x, 1), s(x, 0)) and (s(y, 1), s(y, 0)) are disjoint. So, there exists a function p_2 : D → (0, ∞) such that Σ_{x∈D} p_2(x) = 1 (unless D = ∅). Take now any such function p_2 and introduce the finite nonnegative discrete measure ν_2 by the formula [omitted] for all A ∈ B(R), where ϕ is still given by (4.27). Since D ⊆ R_a, and ϕ(x) = ∞ for some x ∈ R_a only if x = a and s(a, 1) = s(a, 0), it follows that 1 < ϕ(x) < ∞ for all x ∈ D. Therefore, definition (4.30) is correct; moreover, ν_2({x}) > 0 for all x ∈ D, while 0 ≤ ν_2(R) = ν_2((0, ∞)) < 1. Furthermore, ∫_R ϕ dν_2 = Σ_{x∈D} p_2(x) = 1 (unless D = ∅, in which case ν_2 is the zero measure).
Thus, the proof of implication (II) =⇒ (I) and thereby the entire proof of Proposition 3.28 is complete.
Checking (II) =⇒ (I). Assume that condition (II) holds. We have to show that there exists a unique function r such that r_+ = s and r coincides with the reciprocating function r_µ of some zero-mean probability measure µ on B(R). The existence here follows immediately from Proposition 3.28. To verify implication (II) =⇒ (I), it remains to prove the uniqueness of r. We shall do this in steps.
Next, consider two cases, y > x and x > y, to show that either one leads to a contradiction.
From this consideration of Cases 1 and 2, it follows that y = x, that is, r(z, v) = x. This completes Step 1.
Note that the pair (x_z, u_z) is uniquely determined by z and the function s, and hence so is ž. Note also that (4.40) implies that -∞ < ž ≤ 0.

Proof of Proposition 3.31.
Checking (I) =⇒ (II). Here it is assumed that condition (I) of Proposition 3.31 takes place. By Proposition 3.27, conditions 3.27(II)(a)-(c), (e), (f), (i), and (j), with s(x) and G(x) in place of s(x, u) and G̃(x, u), will then hold. So, to complete the proof of implication (I) =⇒ (II), it remains to check conditions (h') and (k").
Checking (k"). The verification of this is the same as that of condition 3.27(II)(k), taking also into account in (4.20) that µ R \ {0} = 1, since µ is non-atomic.
Then one can easily construct a measure ν on B(R) with a density g := dν/dx that is continuous and strictly positive on [0, a_+) and such that condition 3.33(II) holds. Then condition 3.33(I) holds as well, so that there exists a non-atomic zero-mean probability measure µ on B(R) such that supp µ = I, µ_+ = ν, and (r_µ)_+ = s. Moreover, by the uniqueness part of Proposition 3.34, r_µ = r. Therefore, identity (4.50) holds with G = G_µ. So, for all x in a left neighborhood (l.n.) of 0 there exists the derivative [expression omitted]; the resulting function f is continuous on O_- and, by (4.52), f(0-) = g(0). Gluing the functions f and g together, one sees that the probability measure µ indeed has a continuous strictly positive density in a neighborhood of 0.
Next, let us check the strict Lip(1) condition, which is easily seen to be equivalent to the condition that the functions ξ and ρ (defined by (3.52)) are strictly increasing on [0, a_+ - a_-).