UvA-DARE (Digital Academic Repository)

Hardy's inequality and its descendants

We formulate and prove a generalization of Hardy's inequality [27] in terms of random variables and show that it contains the usual (or familiar) continuous and discrete forms of Hardy's inequality. Next we improve the recent version by Li and Mao [42] of Hardy's inequality with weights for general Borel measures and mixed norms, so that it implies the discrete version of Liao [43] and the Hardy inequality with weights of Muckenhoupt [48], as well as the mixed norm versions due to Hardy and Littlewood [29] and Bliss [8].

1. Introduction. The classical Hardy inequality is often presented as the following pair of inequalities: the continuous (or integral form) inequality says, if p > 1 and ψ is a nonnegative p-integrable function on (0, ∞), then

(1)    \int_0^\infty \Big( \frac{1}{x} \int_0^x \psi(y)\,dy \Big)^p dx \le \Big( \frac{p}{p-1} \Big)^p \int_0^\infty \psi^p(x)\,dx,

while the discrete (or series form) inequality says, if p > 1 and {c_n}_1^∞ is a sequence of nonnegative real numbers, then

(2)    \sum_{n=1}^\infty \Big( \frac{1}{n} \sum_{k=1}^n c_k \Big)^p \le \Big( \frac{p}{p-1} \Big)^p \sum_{n=1}^\infty c_n^p.

For example, see pp. 239-243 of Hardy et al. (1952), Exercises 3.14 and 3.15 of Rudin (1966), Kufner et al. (2017), or Steele (2004), Chapter 9.
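The discrete inequality (2) is easy to probe numerically. The sketch below checks a finite truncation (dropping the terms with n beyond the length of the sequence only decreases the left hand side, so the truncated inequality must still hold); the function name `hardy_sides` is ours, introduced only for this illustration.

```python
import random

def hardy_sides(c, p):
    """Return (LHS, RHS) of the truncated discrete Hardy inequality (2)
    for a finite nonnegative sequence c and exponent p > 1."""
    partial, lhs = 0.0, 0.0
    for n, cn in enumerate(c, start=1):
        partial += cn                      # c_1 + ... + c_n
        lhs += (partial / n) ** p          # ((1/n) sum_{k<=n} c_k)^p
    rhs = (p / (p - 1)) ** p * sum(cn ** p for cn in c)
    return lhs, rhs

random.seed(0)
for p in (1.5, 2.0, 3.0):
    c = [random.random() for _ in range(200)]
    lhs, rhs = hardy_sides(c, p)
    assert lhs <= rhs
```

The constant (p/(p−1))^p is sharp only in the limit; for any fixed finite sequence the inequality is strict.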
As Hardy (1925) mentions in his Section 5, Landau pointed out that the discrete inequality follows from the integral one by noting that c_1 ≥ c_2 ≥ ··· may be assumed, and by choosing an appropriate step function as ψ; see also Kufner et al. (2017).
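Landau's reduction can be illustrated numerically: with nonincreasing c_n, take ψ equal to c_n on (n − 1, n]. The running average (1/x)∫_0^x ψ is then nonincreasing, so any right-endpoint Riemann sum of its p-th power dominates the discrete left hand side of (2). The helper functions below are ours, a sketch only.

```python
def discrete_side(c, p):
    """Sum of ((1/n)(c_1 + ... + c_n))^p over n = 1..len(c)."""
    s, total = 0.0, 0.0
    for n, cn in enumerate(c, 1):
        s += cn
        total += (s / n) ** p
    return total

def integral_side(c, p, sub=50):
    """Right-endpoint Riemann sum of ((1/x) ∫_0^x ψ)^p over (0, len(c)],
    for the step function ψ = c_n on (n-1, n].  Since ψ is nonincreasing,
    the running average is nonincreasing and this sum lies between the
    discrete side and the true integral."""
    total, integral = 0.0, 0.0
    h = 1.0 / sub
    for n, cn in enumerate(c, 1):
        for j in range(1, sub + 1):
            x = (n - 1) + j * h
            integral += cn * h             # exact ∫_0^x ψ at step boundaries
            total += (integral / x) ** p * h
    return total

c = [0.9 ** n for n in range(1, 31)]       # a nonincreasing sequence
for p in (1.5, 2.0, 3.0):
    assert discrete_side(c, p) <= integral_side(c, p)
```

Combined with (1) applied to the step function, this yields (2) for nonincreasing sequences, which by rearrangement is the general case.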
Our main objective here is to give a unified formulation and proof of the inequalities (1) and (2) using the notation and language of probability theory. Along the way we will obtain a large family of other corollaries related to weighted Hardy inequalities (as given in Kufner et al. (2006) and in the book-length treatments Kufner et al. (2017) and Kufner et al. (2007)); see Section 2.
Such versions usually involve two arbitrary Borel measures. A very recent result by Li and Mao (2020) is not optimal yet, because it does not contain the discrete version as given by Liao (2015). In Section 3 we shall formulate an improvement of the result by Li and Mao (2020) that contains the discrete version by Liao (2015) as a special case. Actually our proof of this improvement is based on the discrete result of Liao (2015). An equivalent formulation of our version of Hardy's inequality with weights in terms of random variables will also be given.
Furthermore, we apply our methods from Section 2 to Copson's inequality (Copson (1927)) in Section 5 and to the reverse Hardy inequality in Section 4; cf. Renaud (1986) and Bennett (1986). We treat reverse Copson inequalities in the same style in Section 6, and we provide a probabilistic version of the inequalities of Carleman, Pólya, and Knopp in Section 7. In Section 8 we connect our new versions of Copson's inequality formulated in probability terms with counting process martingales arising in survival analysis and reliability theory. The appendix, Section 12, elaborates on survival analysis by briefly explaining connections with the forward (and backward) versions of the Kaplan-Meier estimators appearing in right (and left) censored survival data, including a short description of the analysis of data arising from the question of "when do the baboons come down from the trees". Other applications are presented briefly in Section 11 and a summary of the new inequalities is given in Section 10. Most of the proofs are collected in Section 9.
[imsart-aos ver. 2011/05/20 file: KW-Hardy-arXiv-try3b.tex date: December 4, 2021]
Theorem 1. Hardy's inequality. Let X and Y be independent random variables with distribution function F on (R, B), and let ψ be a nonnegative measurable function on (R, B). For p > 1,

(3)    E\Big[ \Big( \frac{E\big( \psi(Y) 1_{[Y \le X]} \mid X \big)}{F(X)} \Big)^p \Big] \le \Big( \frac{p}{p-1} \Big)^p E\,\psi^p(X)

holds. For continuous distribution functions F this inequality may be rewritten as

(4)    E\Big[ \big\{ E\big( \psi(Y) \mid X,\ Y \le X \big) \big\}^p \Big] \le \Big( \frac{p}{p-1} \Big)^p E\,\psi^p(X),

and for such F the constant (p/(p−1))^p is the smallest possible one.
The strength of this inequality (3) lies in the fact that it implies both the continuous and the discrete version of Hardy's inequality.
Proof. Parts (i) and (ii), i.e. the continuous inequality (1) and the discrete inequality (2), follow from Theorem 1 by taking F to be the distribution function corresponding to the uniform probability measure on [0, K] and on {1, . . ., K}, respectively, multiplying by K, and taking limits as K → ∞.
Translating Theorem 1 from random variable notation back into analysis yields the following corollary.
Corollary 2. For any p > 1, distribution function F on R, and ψ ∈ L^p(F) we have

    \int |H_F \psi|^p\, dF \le \Big( \frac{p}{p-1} \Big)^p \int |\psi|^p\, dF,

where H_F is the F-averaging operator defined for x ∈ R and ψ ∈ L^p(F) by

(5)    H_F \psi(x) = \frac{1}{F(x)} \int_{(-\infty, x]} \psi(y)\, dF(y).

Note that H_F generalizes both the discrete and the continuous Hardy averaging operators; see e.g. Kufner et al. (2006), page 715. Observe that |H_F ψ| ≤ H_F |ψ| holds for all measurable ψ, with equality if ψ is nonnegative F-a.e. This shows the equivalence of Theorem 1 and Corollary 2.
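For a purely discrete F the operator H_F is a finite weighted average, and for F uniform on {1, . . ., K} it reduces to the discrete Hardy average (1/n) Σ_{k≤n} ψ(k). A small sketch (the function name is ours):

```python
def H_F(points, probs, psi):
    """(H_F ψ)(x) = (1/F(x)) * Σ_{y <= x} ψ(y) F({y}) for a discrete F.

    `points` must be sorted increasingly; `probs` are the point masses."""
    out, F, acc = {}, 0.0, 0.0
    for x, m in zip(points, probs):
        F += m                 # F(x) = P(X <= x)
        acc += m * psi(x)      # ∫_{(-∞, x]} ψ dF
        out[x] = acc / F
    return out

K = 10
points = list(range(1, K + 1))
probs = [1.0 / K] * K          # uniform on {1, ..., K}
psi = lambda k: k * k
H = H_F(points, probs, psi)
for n in points:               # H_F ψ(n) equals the discrete Hardy average
    assert abs(H[n] - sum(psi(k) for k in range(1, n + 1)) / n) < 1e-12
```

Taking instead F uniform on [0, K] recovers the continuous averaging operator (1/x)∫_0^x ψ(y) dy on (0, K].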
with the convention 0/0 = 0, then the inequality does not hold anymore for some distribution functions with jumps. In particular, for X and Y Bernoulli with success probability P(X = 1) = q and with ψ(0) = 1, ψ(1) = 0 we get

Consequently, inequality (3) with the convention 0/0 = 0 does not hold here.

Remark 2. There are distributions for which the constant in (3) is not optimal for any p > 1. This is the case for all Bernoulli distributions. Let X and Y have a Bernoulli distribution with P(X = 1) = q = 1 − P(X = 0). Then with ψ(0) = a ≥ 0 and ψ(1) = b ≥ 0 our Hardy inequality (3) becomes

However, by convexity

holds. Consequently, for the Bernoulli distribution with success probability q the optimal constant in our Hardy inequality equals at most 1 + q, for which

Note that (12) can be rewritten as

where H̄_F is the (right-tail) F-averaging operator defined for x ∈ R and ψ ∈ L^p(F) by

is the "mean residual life function" corresponding to the distribution function

so that the conditional centering operator I − H̄_F = I − Ψ is an isometry. For more on this and connections to counting process martingales and survival analysis see Ritov and Wellner (1988), Efron and Johnstone (1990), and Bickel et al. (1998). Strzelecki (2020) studies I − H and I − H* as operators on L^p(R₊, λ), where λ denotes Lebesgue measure.
Remark 4. Since the conditional distribution of X given X ≤ c has distribution function F(·)/F(c) for c ∈ R, and the same holds for Y, we have the following conditional version of (3), where the inequality stems from (3) itself.
3. Hardy's inequality with weights and mixed norms. To the best of our knowledge the most recent and most general versions of Hardy's inequalities with weights and mixed norms are presented by Liao (2015) and Li and Mao (2020). We shall improve the result of Li and Mao (2020) so that it contains the discrete version of Liao (2015) as a special case. To this end we prove the result of Li and Mao (2020) with (−∞, x) in the inner integral replaced by (−∞, x], i.e.

Theorem 2. Hardy's Inequality with Weights and Mixed Norms. Let 1 < p ≤ q < ∞, and suppose that µ and ν are σ-finite Borel measures on R. Then

(19)    \Big( \int_{\mathbb R} \Big( \int_{(-\infty, x]} \psi\, d\nu \Big)^q d\mu(x) \Big)^{1/q} \le k_{q,p}\, B\, \Big( \int_{\mathbb R} \psi^p\, d\nu \Big)^{1/p}

holds for all measurable ψ : R → [0, ∞), where k_{q,p} and B are defined by

(20)    B \equiv \sup_{x \in \mathbb R}\ \mu([x, \infty))^{1/q}\, \nu((-\infty, x])^{(p-1)/p}

and, with Beta(a, b) ≡ Γ(a)Γ(b)/Γ(a + b) and r ≡ (q − p)/p for p < q,

    k_{q,p} \equiv \big( r\, \mathrm{Beta}(1/r, (q-1)/r) \big)^{-r/q}  and  k_{p,p} = p(p-1)^{(1-p)/p}.

Remark 6. With the help of Theorem 1.4 of Liao (2015) we shall prove our Theorem 2 in Section 9. In fact these theorems are equivalent, since our theorem implies his. For nonnegative a_i, u_i, v_i, i = 1, . . ., N, let µ and ν be measures on {1, . . ., N} that have densities u_i and v_i^{−1/(p−1)}, respectively, at i with respect to counting measure, and let ψ(i) = a_i v_i^{1/(p−1)}. With these choices Theorem 2 yields (84) and (85), and hence Theorem 1.4 of Liao (2015).
holds in the situation of Theorem 2. With ψ(y) = 1_{[y ≤ z]} this yields

which implies the well known inequality B ≤ C. By Theorem 2 we also have C ≤ k_{q,p} B, so C < ∞ if and only if B < ∞. The constants k_{q,p} first appeared via a (1923) conjecture of Hardy and Littlewood (1930), which was later confirmed by Bliss (1930). See Chapter 5 of Kufner et al. (2007) for a very complete history of these developments and further results.
Theorem 2 and Remark 7 may be reformulated in terms of random variables as follows.
Theorem 3. Probability Version of Hardy's Inequality with Weights and Mixed Norms. Let X and Y be independent random variables with distribution functions F and G respectively, let 1 < p ≤ q < ∞, and let U and V be nonnegative measurable functions on (R, B). Furthermore let C ∈ [0, ∞] be the smallest constant such that the corresponding weighted inequality holds for all nonnegative measurable functions ψ on (R, B). With B defined in terms of F, G, U and V, the string of inequalities B ≤ C ≤ k_{q,p} B holds.

Proof. Theorem 3 is implied by Theorem 2 via the choices dµ(x) = U(x) dF(x) and dν(y) = V^{−1/(p−1)}(y) dG(y). Conversely, let F and G be the distribution functions of probability measures dominating the measures µ and ν, respectively, from Theorem 2. The choices U(x) = dµ/dF(x) and V(y) = (dν/dG(y))^{1−p} show that Theorem 3 implies Theorem 2.
Following the arguments of Muckenhoupt (1972), in Section 9 we prove the following generalization of his result, which is the special case q = p of our Theorems 2 and 3.
Theorem 4. Probability Version of Muckenhoupt's Inequality. Let X and Y be independent random variables with distribution functions F and G respectively, let p > 1, and let U and V be nonnegative measurable functions on (R, B). Furthermore let C ∈ [0, ∞] be the smallest constant such that (27) holds for all nonnegative measurable functions ψ on (R, B). With B as defined in (28), the string of inequalities (29) holds.

Inequality (29) does not imply our Hardy inequality (3). Indeed, for Bernoulli random variables with P(X = 1) = 1/p = 1 − P(X = 0) the factor B then equals 1 + (p − 1)^{p−1}/p^p, and hence the upper bound on C equals 1 + p^p/(p − 1)^{p−1}, which is larger than (p/(p − 1))^p for p ≥ p_0 ≈ 1.77074. However, with U = G^{−p}, V = 1 and G = F a continuous distribution function the factor B equals 1/(p − 1), which shows that (29) does imply our Hardy inequality (3) for this case.
If X is stochastically larger than Y, written Y ≼ X, and they have no point masses at the same location, then Theorem 4 yields an inequality very similar to (3). A comparable result is obtained for X ≼ Y.
Corollary 3. Stochastic ordering. Let X and Y be independent random variables with distribution functions F and G respectively, let p > 1, and let ψ be a nonnegative measurable function on (R, B). (a) If P(X = Y) = 0 and F(x) ≤ G(x), x ∈ R, hold, then

Proof. In case (a) we apply Theorem 4 with U = G^{−p} and V = 1. Then B from (28) equals ∫ G^{−p} dF.
In the first line of the last display and in the second line below we use the characterization: Y ≼ X if and only if Eh(Y) ≤ Eh(X) for all bounded and non-decreasing functions h; see e.g. Müller and Stoyan (2002), Theorem 1.2.8 (ii), page 5, or Shaked and Shanthikumar (2007), (1.A.7), page 4.
If F has a point mass at r then G has none, and the stochastic ordering applies. Combining (32)-(34) and (27)-(29) we arrive at (30).
In case (b) we apply Theorem 4 with U = F^{−p} and V = 1. Then the continuity of F and G ≤ F imply that B from (28) satisfies the required bound, and hence that (31) holds.

4. A reverse Hardy inequality.
There are also reversed versions of the classical Hardy inequality: the continuous (or integral form) inequality says, if p > 1 and ψ is a nonnegative, nonincreasing p-integrable function on (0, ∞), then

(36)    \int_0^\infty \Big( \frac{1}{x} \int_0^x \psi(y)\,dy \Big)^p dx \ge \frac{p}{p-1} \int_0^\infty \psi^p(x)\,dx,

while the discrete (or series form) inequality says, if p > 1 and {c_n}_1^∞ is a nonincreasing sequence of nonnegative real numbers, then

(37)    \sum_{n=1}^\infty \Big( \frac{1}{n} \sum_{k=1}^n c_k \Big)^p \ge \zeta(p) \sum_{n=1}^\infty c_n^p.

Here, ζ(·) is the zeta function. These inequalities have been obtained independently by Renaud (1986) and Bennett (1986); see also Lemma 2.1 of Milman (1997). By taking ψ the indicator function of the unit interval we see that (36) is sharp, and by taking c_1 = 1 and c_n = 0 for n ≥ 2 we see that (37) is sharp. Here are our random variable versions of (36) and (37).
Theorem 5. Reverse Hardy inequality. Let X and Y be independent random variables both with distribution function F on (R, B), and let ψ be a nonnegative, nonincreasing measurable function on (R, B). For p > 1 and F absolutely continuous, (38) holds, with equalities if ψ is constant.
For p ≥ 1 and F general, (39) holds, with equalities if ψ is constant. If F is general, but p ≥ 2 is an integer, then, with X, Y, X_1, . . ., X_p independent and identically distributed and with X_(p) = max{X_1, . . ., X_p}, we have (40), with equality if ψ is constant.
For further developments concerning reverse Hardy type inequalities, see Evans et al. (2008).
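A numerical sanity check of the discrete reverse inequality (37) is straightforward. Since truncating the sequence removes positive terms from the left hand side, the sketch below adds the exact contribution of the zero-padded tail using elementary integral bounds on ζ(p); all function names are ours, for illustration only.

```python
def zeta_bounds(p, M=100000):
    """Bracket ζ(p) between a partial sum plus integral tail bounds."""
    s = sum(n ** -p for n in range(1, M + 1))
    return s + (M + 1) ** (1 - p) / (p - 1), s + M ** (1 - p) / (p - 1)

def reverse_hardy_sides(c, p):
    """For a finite nonincreasing sequence c padded by zeros, return a lower
    bound on the LHS of (37) (including the tail sum_{n>N} (C/n)^p with
    C = c_1 + ... + c_N) and an upper bound on ζ(p) * Σ c_n^p."""
    z_lo, z_hi = zeta_bounds(p)
    C, lhs = 0.0, 0.0
    for n, cn in enumerate(c, 1):
        C += cn
        lhs += (C / n) ** p
    head = sum(n ** -p for n in range(1, len(c) + 1))
    lhs += C ** p * (z_lo - head)          # tail of the LHS, bounded below
    rhs = z_hi * sum(cn ** p for cn in c)
    return lhs, rhs

for p in (1.5, 2.0, 3.0):
    c = [1.0 / n for n in range(1, 101)]   # nonincreasing
    lhs, rhs = reverse_hardy_sides(c, p)
    assert lhs >= rhs
```

For the extremal sequence c = (1, 0, 0, . . .) both sides collapse to ζ(p), exhibiting the sharpness of the constant.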

5. Copson's inequality. Copson (1927) presented the following pair of inequalities: the continuous (or integral form) inequality says, if p > 1 and ψ is a nonnegative p-integrable function on (0, ∞), then

(41)    \int_0^\infty \Big( \int_x^\infty \frac{\psi(y)}{y}\,dy \Big)^p dx \le p^p \int_0^\infty \psi^p(x)\,dx

holds, while the discrete (or series form) inequality says, if p > 1 and a_i, λ_i are nonnegative with Λ_n ≡ Σ_{i=1}^n λ_i > 0, then

(42)    \sum_{n=1}^\infty \lambda_n \Big( \sum_{k=n}^\infty \frac{\lambda_k a_k}{\Lambda_k} \Big)^p \le p^p \sum_{n=1}^\infty \lambda_n a_n^p

holds. We generalize Copson's inequalities as follows.
Theorem 6. Copson's inequality. Let X and Y be independent random variables with distribution function F on (R, B), and let ψ be a nonnegative measurable function on (R, B). For p ≥ 1,

(43)    E\Big[ \Big( E\Big( \frac{\psi(Y) 1_{[X \le Y]}}{F(Y)} \,\Big|\, X \Big) \Big)^p \Big] \le p^p\, E\,\psi^p(X)

holds. For absolutely continuous distribution functions F the constant p^p is the smallest possible one.
The strength of this inequality (43) lies in the fact that it implies both the continuous and the discrete version of Copson's inequality.

Proof. By Tonelli's theorem (Fubini), equality holds in (41) and (42) for p = 1. Let p > 1. (i) can be seen by choosing X and Y uniform on (0, K) and taking limits as K → ∞.
(ii) needs a longer argument. Define Λ_i = Σ_{j=1}^i λ_j and p_i = λ_i/Λ_K, i = 1, . . ., K, for some natural number K, and define the bounded continuous function ψ such that

Taking limits here for K_2 → ∞ and subsequently K_1 → ∞ we arrive at (42).
Comparison of the left side of (43) with the left side of (3) and the definition of H_F in (5) leads us to define the Copson (or dual) operator H*_F as follows: for x ∈ R and ψ ∈ L^p(F),

    H^*_F \psi(x) = \int_{[x, \infty)} \frac{\psi(y)}{F(y)}\, dF(y),

where Λ̄(x) ≡ ∫_{[x,∞)} dF(y)/F(y) is the reverse (or backward) hazard function corresponding to F. (We will introduce and discuss the forward hazard function Λ(x) ≡ ∫_{(−∞,x]} dF(y)/(1 − F(y−)) in connection with the inequalities of Carleman, Pólya, and Knopp in Section 7.) As pointed out by Hardy in Hardy (1928), the discrete Copson inequality is a "reciprocal" or "dual" inequality of the discrete Hardy inequality (2), in the sense that one implies the other. But this holds in other senses as well. For a treatment of (1) and (41) based on the duality of L^p and L^q with 1/p + 1/q = 1, see Folland (1999), Section 6.3, especially his Theorem 6.20 and Corollary 6.21. In particular, when viewed as operators on L^2(F), H_F and H*_F are adjoint operators: for ψ and χ in L^2(F) we have

    E\big[ H_F \psi(X)\, \chi(X) \big] = E\big[ \psi(X)\, H^*_F \chi(X) \big].

So H_F and H*_F have the same norms for p = 2, and indeed the bounds in (141) and (142) are the same for p = 2. Applying Hardy's approach we obtain the equivalence of (3) and (43).
Theorem 7. Equivalence of Hardy's and Copson's inequality. Let X and Y be independent random variables with distribution function F on (R, B). For p > 1, inequality (3) holds for all nonnegative measurable functions ψ on (R, B) if and only if inequality (43) holds for all nonnegative measurable functions ψ on (R, B).
Although this Theorem 7 (formally) renders one of our proofs of Hardy's and Copson's inequality superfluous, we have included both proofs in Section 9 to illustrate the different methods.
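The adjointness of H_F and H*_F on L²(F) noted above is easy to verify numerically for a purely discrete F. In this sketch (all names are ours) we take H_F ψ(x) = (1/F(x)) Σ_{y≤x} ψ(y) F({y}) and H*_F χ(x) = Σ_{y≥x} χ(y) F({y})/F(y), consistent with the reverse hazard Λ̄:

```python
import random

random.seed(1)
points = sorted(random.sample(range(100), 12))
w = [random.random() for _ in points]
total = sum(w)
probs = [wi / total for wi in w]
cdf, F = [], 0.0
for m in probs:
    F += m
    cdf.append(F)                  # F(x_i) = P(X <= x_i)

def H(psi):
    """(H_F ψ)(x_i) = (1/F(x_i)) Σ_{j<=i} ψ(x_j) p_j."""
    out, acc = [], 0.0
    for i, x in enumerate(points):
        acc += probs[i] * psi(x)
        out.append(acc / cdf[i])
    return out

def Hstar(chi):
    """(H*_F χ)(x_i) = Σ_{j>=i} χ(x_j) p_j / F(x_j)."""
    out, acc = [0.0] * len(points), 0.0
    for i in range(len(points) - 1, -1, -1):
        acc += probs[i] * chi(points[i]) / cdf[i]
        out[i] = acc
    return out

psi = lambda x: float(x)
chi = lambda x: (x - 50.0) ** 2
lhs = sum(p * h * chi(x) for p, h, x in zip(probs, H(psi), points))      # <Hψ, χ>
rhs = sum(p * psi(x) * h for p, h, x in zip(probs, Hstar(chi), points))  # <ψ, H*χ>
assert abs(lhs - rhs) < 1e-8 * (1.0 + abs(lhs))
```

Interchanging the order of the double sum Σ_i p_i χ(x_i) (1/F(x_i)) Σ_{j≤i} p_j ψ(x_j) is exactly the Fubini step behind the adjointness identity.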
Remark 9. For p > 1 there are distributions for which the constant p^p in (43) is not optimal. This is the case for all Bernoulli distributions. Let X and Y have a Bernoulli distribution with P(X = 1) = q = 1 − P(X = 0). Then with ψ(0) = a and ψ(1) = b the left hand side of our Copson inequality (43) equals

where the first inequality follows from Jensen's inequality and the convexity of x → x^p, x ≥ 0. The right hand side of (48) is bounded by

where the strict inequality holds since p → p log p − (p − 1) log 2 is strictly increasing on [1, ∞) with value 0 at p = 1, and where the last expression is the upper bound in (43).
Remark 10. Theorem 7 gives a qualitative connection between Hardy's inequality and Copson's inequality (or the "dual Hardy inequality"). The papers by Kruglyak and Setterqvist (2008), Kolyada (2014), and Kolyada (2020) quantify these connections. These results are strongly related to further work on the connections between the I − H_F and I − H*_F operators on the one hand, and between the I − H̄_F and I − H̄*_F operators on the other hand. Also see Boza and Soria (2011). Recall that

(50)    \bar\Lambda(x) \equiv \int_{[x, \infty)} \frac{dF(y)}{F(y)}  and  \Lambda(x) \equiv \int_{(-\infty, x]} \frac{dF(y)}{1 - F(y-)}

are the backward cumulative hazard function and the (forward) cumulative hazard function of survival analysis.

6. A reverse Copson inequality. Reversed versions of the classical Copson inequality are given in Theorems 2 and 4 of Renaud (1986). His continuous (or integral form) inequality may be rephrased as follows. If p ≥ 1 holds and ψ is a nonnegative p-integrable function on (0, ∞) such that x → ψ(x)/x is nonincreasing, then

It seems natural to consider a reverse Copson inequality formulated in terms of random variables. Here is our result in this direction.
Theorem 8. Reverse Copson inequality. Let X and Y be independent random variables both with distribution function F on (R, B), and let ψ be a nonnegative p-integrable function on (R, B). Then (53) holds, with equality if ψ = F or p = 1 holds.
If the distribution function F is continuous, ψ is nonincreasing, and p is an integer, then (54) holds, with equality if ψ is constant, F is degenerate, or p = 1 holds. If the distribution function F is arbitrary, ψ is nonincreasing, and p is an integer, then (55) holds, with equality if ψ equals 0, or F is degenerate, or p = 1 holds.
We conjecture that (54), with p! replaced by Γ(p + 1), and (55) hold for all p ≥ 1, but we have no proof. Note that for F continuous, (55) with p ∈ [1, ∞) follows from (53). For the situations of the continuous and discrete versions of the original Copson inequality our reverse Copson inequality implies the following. The proof of this corollary is almost the same as the proof of Corollary 5 in Section 5 (but with the inequality signs reversed and the constants changed), and therefore it is omitted.
Remark 11. Without continuity of F, inequality (53) is not generally valid. Again a counterexample is provided by the Bernoulli distribution. Take X and Y Bernoulli with success probability q. Now, as a function of q, the left minus the right hand side of (53) equals

which is negative for 1/2 < q ≤ 1.
7. The Carleman and Pólya-Knopp inequalities. Another classical pair of inequalities in this family of inequalities are those associated with the names of Pólya and Knopp in the continuous (or integral) case, and Carleman in the discrete case: for a positive function ψ,

    \int_0^\infty \exp\Big( \frac{1}{x} \int_0^x \log \psi(y)\,dy \Big) dx \le e \int_0^\infty \psi(x)\,dx,

and, for a sequence of positive constants {c_k},

    \sum_{n=1}^\infty (c_1 c_2 \cdots c_n)^{1/n} \le e \sum_{n=1}^\infty c_n;

see Kufner et al. (2006), Section 9, Kaijser et al. (2005) and Pečarić and Stolarsky (2001). By now the reader will anticipate our impulse to reformulate and unify these two inequalities in a more probabilistic vein involving random variables and distribution functions as follows:

Theorem 9. Let ψ be a positive valued function on R and let X, Y be independent random variables with distribution function F.

The proof of Corollary 1 is applicable to Corollary 7 as well. Kaijser et al. (2002) rewrite the classical integral version of the Carleman inequality as follows: replacing ψ(y) in (59) by ψ(y)/y yields (61). This follows by elementary manipulations together with the identity ∫_0^x log y dy = x(log x − 1). Kaijser et al. (2002) give an alternative proof of (59) by proving (61) via the following simple convexity argument. By convexity of exp, it follows from Jensen's inequality followed by Fubini's theorem that

Strict inequality follows because equality in Jensen's inequality almost everywhere forces ψ to be constant a.e., but this contradicts finiteness of ∫_0^∞ ψ(y)/y dy. Now several questions arise: is there a corresponding rewrite of our probabilistic version of the inequalities of Carleman and Pólya-Knopp? The answer is clearly "yes" for continuous distribution functions F. Replacing ψ by ψ/F in (9) and arguing as above, but using the identity involving Λ̄(x) ≡ ∫_{[x,∞)} dF(y)/F(y), we obtain a "left tail inequality" with motivations from survival analysis. For the corresponding "right tail inequality" we instead replace ψ by ψ/(1 − F). Then reasoning as above yields, for continuous F, the corresponding right tail inequality.
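Carleman's discrete inequality is easy to check numerically, working in log space to avoid over- and underflow of the products; the helper name is ours, a sketch only.

```python
import math
import random

def carleman_sides(c):
    """LHS and RHS of Carleman's inequality Σ (c_1 ··· c_n)^{1/n} ≤ e Σ c_n
    for a finite positive sequence c (zero-padding adds nothing to either side)."""
    log_prod, lhs = 0.0, 0.0
    for n, cn in enumerate(c, 1):
        log_prod += math.log(cn)
        lhs += math.exp(log_prod / n)      # geometric mean of c_1, ..., c_n
    return lhs, math.e * sum(c)

random.seed(2)
c = [random.uniform(0.01, 5.0) for _ in range(500)]
lhs, rhs = carleman_sides(c)
assert lhs < rhs
```

The inequality is strict for any nonzero sequence, while the constant e cannot be replaced by anything smaller.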
8. Martingale connections and the H operators. In this section we expand on the comments in Sections 2, 5, and 7 concerning martingales, counting processes, and the residual life and dual Hardy operators.
First recall the operators H_F, H̄_F, H*_F and H̄*_F introduced in Section 5. With I the identity operator and F the continuous distribution function of X, Fubini's theorem yields (62). We will also need the classical Hardy operators H and H* defined by

    H\psi(x) = \frac{1}{x} \int_0^x \psi(y)\,dy, \qquad H^*\psi(x) = \int_x^\infty \frac{\psi(y)}{y}\,dy,

for ψ ∈ L^p(R₊, λ), where λ denotes Lebesgue measure; these operators were studied by Krugljak et al. (2000) (see also Kruglyak and Setterqvist (2008)). It is well known (see e.g. Brown et al. (1965)) that I − H is an isometry on L²(R₊). Ritov and Wellner (1988) showed that R ≡ I − H̄_F is an isometry of L²(R₊, F); see also Bickel et al. (1998), Appendix A.1, pages 420-424. These authors also showed that with R ≡ I − H̄_F and L ≡ I − H̄*_F we have L ∘ R = I on L₂⁰(F), and we see that the analogue of the identity (63) becomes (64), where Λ is as defined in (50). To see that this is fundamentally linked to counting process martingales, let X have distribution function F on R₊, and define a one-jump counting process {N(t) : t ≥ 0} by

    N(t) \equiv 1_{[X \le t]}.

This process is (trivially) seen to be nondecreasing in t with probability 1, and hence is a sub-martingale (a process increasing in conditional mean). By the Doob-Meyer decomposition theorem there is an increasing predictable process {A(t) : t ≥ 0} such that

    N(t) = A(t) + M(t),

where {M(t) : t ≥ 0} is a mean-zero martingale. In fact for this simple counting process it is well known that A(t) = Λ(t ∧ X) (see e.g. Shorack and Wellner (2009), or Chapter 18 of Liptser and Shiryayev (1978)), and hence we see that

    M(t) = 1_{[X \le t]} - \Lambda(t \wedge X).

Comparing this with the identity (64) rewritten for a distribution function F on R₊, we see that with ψ_t(x) = 1_{[x ≤ t]} and evaluating the resulting identity at x = X we recover M(t). But there are still more martingales in this setting which can be represented in terms of the martingale M by bringing in the residual life operator R = I − H̄_F. Consider the increasing family of σ-fields {F_t : t ≥ 0} given by F_t ≡ σ{1_{[X ≤ s]} : 0 ≤ s ≤ t}. Now let ψ ∈ L₂⁰(F) and consider the process Y(t) ≡ E(ψ(X) | F_t). Since the σ-fields {F_t}_{t≥0} are nested, {Y(t) : t ≥ 0} is a martingale (and it is often called
"Doob's martingale"). Furthermore, it can be represented in terms of the basic martingale M using the fundamental identity L ∘ R = I on L₂⁰(F) discussed above: since ψ = L(Rψ) we see that

    Y(t) = \int_{(0, t]} R\psi(s)\, dM(s).
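The basic claim that M(t) = N(t) − Λ(t ∧ X) has mean zero is easy to check by simulation. For X exponential with rate 1 the cumulative hazard is Λ(t) = t, so E[1_{[X≤t]} − (t ∧ X)] = 0; the sketch below is ours, a Monte Carlo illustration only.

```python
import random

random.seed(3)
n, t = 200000, 1.3
total = 0.0
for _ in range(n):
    x = random.expovariate(1.0)       # X ~ Exp(1), so Λ(s) = s
    N = 1.0 if x <= t else 0.0        # one-jump counting process at time t
    A = min(x, t)                     # compensator Λ(t ∧ X)
    total += N - A
mean_M = total / n
assert abs(mean_M) < 0.02             # E M(t) = 0, up to Monte Carlo error
```

Indeed, E[t ∧ X] = ∫_0^t P(X > s) ds = 1 − e^{−t} = F(t) for the unit exponential, which confirms the zero mean analytically.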
This set of connections deserves to be explored further.In particular we conjecture that many of the interesting properties of the classical Hardy operator H and the dual Hardy operator H * established in the series of papers by Krugljak et al. (2000), Kruglyak and Setterqvist (2008), Boza and Soria (2011), Kolyada (2014), Boza and Soria (2019), Kolyada (2020), and Strzelecki (2020) will have useful analogues for H F and H * F in the probability setting for Hardy's inequalities which we have considered here.On the other hand, the martingale connections of the operators L and R perhaps deserve to be better known in the world of classical Hardy type inequalities.
For further explanation of the connections of these processes with right and left censored data problems in survival analysis, see the Appendix, Section 12.
If X_1, . . ., X_n are i.i.d. with (continuous) distribution function F, then

    N_n(t) \equiv \sum_{i=1}^n 1_{[X_i \le t]}

is a counting process which is simply the sum of independent counting processes, and the sum of the corresponding counting process martingales is again a counting process martingale:

    M_n(t) \equiv N_n(t) - \int_{(0, t]} Y_n(s)\, d\Lambda(s),

where Y_n(t) ≡ Σ_{i=1}^n 1_{[X_i ≥ t]} is the number of X_i's "at risk" at time t.

9. Proofs.

9.1. Proofs for Section 2. In order to prove our random variable version of Hardy's inequality we need a Lemma. This Lemma has the same structure as Broadbent's proof of Hardy's inequality (3), which is a slightly improved version of Elliot's proof; see Broadbent (1928), Elliott (1926). With p_i = 1 this inequality is a finite sum version of the discrete Hardy inequality (2). Taking limits as m → ∞ first on the right hand side and subsequently on the left hand side of (65) with p_i = 1 we obtain the discrete Hardy inequality itself.
For any x ∈ R there exists an index n(N, x) with x ∈ (y_{N,n(N,x)−1}, y_{N,n(N,x)}], and hence by Tonelli's theorem, Fatou's lemma and the right continuity of F, lim inf
Proof of Theorem 2. If B equals infinity, inequality (19) is trivial. So we may assume that B is finite; then without loss of generality we may assume that µ is a finite Borel measure. For 0 < ε < 1 we define δ = εM_S/(M_{R,S} ∨ 1). With N = ⌈1/δ⌉ we choose (83). Note that (y_{n−1}, y_n) might be empty, i.e.
By Theorem 1.4 of Liao (2015) we have, for nonnegative sequences,

With these choices the left hand side of (84) to the power q satisfies

Furthermore, by Jensen's inequality (or Hölder's) the third factor at the right hand side of (84) to the power p satisfies

where the last expression equals the third factor at the right hand side of (19) to the power p. With these choices B_d from (85) becomes

and analogously we obtain

This implies that B_d from (88) becomes [recall the definition of µ_{R,S}]

where B is as in (20). Since ε may be chosen arbitrarily close to 0, this implies together with (81) through (87) that inequality (19) holds with the left hand side replaced by the right hand side of (86) to the power 1/q.
For the proof of Theorem 4 we need the following Lemma.
Lemma 2. For F and G distribution functions, χ a nonnegative measurable function and 0 < γ < 1 we have (96) and (97).

Proof. By symmetry it suffices to prove (96). With the random variable U uniformly distributed on the unit interval the left hand side of this inequality equals and satisfies

Proof of Theorem 4. The choice ψ(y) = V^{−1/(p−1)}(y) 1_{[y ≤ r]} in inequality (27) leads to the string of (in)equalities which implies the first inequality in (29). With (101), by Hölder's inequality this implies (103). By the definition of B in (28) the right hand side of (103) is bounded from above, where the inequality follows from (97) of Lemma 2. By the definition of B the last expression is bounded by the right hand side of (29), which completes the proof of (29).

9.3. Proofs for Section 4.
Proof of Theorem 5. Let f be a density of F. The monotonicity of ψ implies for Lebesgue almost all x ∈ R

So we have

and hence

which is the first inequality of (38). Since ψ^p and 1 − F^{p−1} are both nonincreasing, ψ^p(Y) and 1 − F^{p−1}(Y) are nonnegatively correlated and consequently their covariance is nonnegative, implying

This results in the second inequality of (38).
Note that inequality (39) and hence the inequality between the left hand side and the right hand side of (38) is obvious as ψ is nonincreasing.
Let F be general and p an integer. As X_1, . . ., X_p are independent and identically distributed, we have

and hence

which implies (40).
For the second part of the corollary we take X and Y uniformly distributed on {1, . . ., K}. In view of P(X_(p) ≤ n) = (n/K)^p our inequality (40) holds with, for any integer K_0 ≤ K, the corresponding sum vanishing for n > K_0.
Taking limits as K → ∞ and subsequently K_0 → ∞ we obtain (115). Lemma 2 of Renaud (1986) shows (116) for n ≥ 2. As for n = 1 equality holds in (116), the proof that for integer p inequality (37) can be obtained from our inequality (40) is complete.

9.4. Proofs for Section 5. We will use the following Lemma, which shows the structure of Copson's proof of his Theorem B, with sums over infinitely many terms replaced by finite sums; see Copson (1927).

Lemma 3. Let a_i and p_i be nonnegative numbers for i = 1, . . ., m, with p_1 > 0. For p > 1 the inequality holds.

Note that part of Theorem B of Copson (1927) follows from this inequality by taking limits for m → ∞, first at the right hand side, subsequently within the p-th power at the left hand side, and finally for the first sum at the left hand side.
Proof of Lemma 3. With the notation (118), n = 1, . . ., m, P_0 = A_{m+1} = 0, Young's inequality (as in the proof of Lemma 1) yields

Proof of Theorem 6. As in the proof of Theorem 1 we define y_{N,i} = F^{−1}(i/N), i = 0, . . ., N − 1, y_{N,N} = ∞, for large N and we apply Lemma 3 with m = N, but this time we choose

Observe that F(y_{N,i}−) ≤ F(y) + 1/N holds for y ∈ [y_{N,i−1}, y_{N,i}). Consequently we have, and hence by Fatou's lemma, lim inf

Combining (127), Lemma 3 and (125) we arrive at a proof of Theorem 6.
To prove (54) and (55) we restrict attention to integer p and let X, Y, Y_1, . . ., Y_p be independent random variables all with distribution function F.
If F is continuous, the monotonicity of ψ implies that where equality holds if ψ is constant.
Similarly, if F is arbitrary, we derive

One may check that equalities in (140) hold if F is degenerate.

9.6. Proofs for Section 7.
Proof of Theorem 9. By Hardy's inequality in the probability form (3) with ψ replaced by ψ^{1/p} we have

where (p/(p − 1))^p → e as p → ∞. Furthermore, taking the logarithm of the expression inside the outer expectation we see that it is equal to

after letting p = 1/α. Now for every fixed X = X(ω) we see that this difference quotient converges as α ↓ 0 by the chain rule as follows:

where the last equality holds by dominated convergence. Indeed, for any 1 > ε > α > 0 we have

and the right hand side has finite expectation in view of ψ ∈ L¹(F).
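The limit (p/(p − 1))^p → e used in this proof is easily confirmed numerically: (p/(p−1))^p = (1 + 1/(p−1))^{(p-1)+1} decreases to e, with error of order e/(2p). A quick check:

```python
import math

# (p/(p-1))^p decreases to e as p grows; the gap behaves like e/(2p)
for p in (10.0, 100.0, 1000.0):
    diff = (p / (p - 1)) ** p - math.e
    assert 0 < diff < 3.0 / p
```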
10. Summary. Our sharp inequalities related to Hardy's inequality read as follows (141), where the first inequality holds if F is absolutely continuous and ψ is nonincreasing.
Our sharp inequalities related to Copson's inequality are the following (142), where the first inequality holds if F is continuous and x → ψ(x)/F(x) is nonincreasing.
Our Hardy inequality with weights and mixed norms is inequality (19), where, with r ≡ (q − p)/p for p < q,

    k_{q,p} \equiv \Big( \frac{q-p}{p}\, \mathrm{Beta}\Big( \frac{p}{q-p},\ \frac{(q-1)p}{q-p} \Big) \Big)^{-(q-p)/(pq)}.

Detailed conditions are given in the respective Theorems.
11. Applications and Related Work.We close with a few brief comments concerning applications and related work.
As noted by Diaconis (2002), Hardy's inequality (2), and especially the weighted version thereof due to Muckenhoupt (1972), has been applied by Miclo (1999) to obtain useful bounds for the spectral gap for birth-and-death Markov chains. He provides a nice overview of alternative methods and their potential drawbacks. Bobkov and Götze (1999b) extend the methods of Muckenhoupt (1972) to study optimal constants in log-Sobolev inequalities on R. Because log-Sobolev inequalities are preserved by the formation of products of independent distributions (i.e. tensorization), their results yield log-Sobolev inequalities for product measures. Their results have been refined by Barthe and Roberto (2003), who go on in Barthe and Roberto (2008) to study modified log-Sobolev inequalities. Saumard and Wellner (2019) use the "two-sided" Hardy inequality given by (16) to give an alternative proof of Cheeger's inequality. Applications of the Hardy inequality (3) with F continuous to semiparametric models for survival analysis were given by Ritov and Wellner (1988) and Bickel et al. (1998). As noted in Sections 2, 5, 7, and 8, these results yield martingale connections with the operators H_F and H*_F. There has been some related work on Hardy type inequalities with similar unification (of continuous and discrete cases) as an explicit goal: for example, see Kaijser et al. (2002) and Evans et al. (2008), page 45. Li and Mao (2020), pages 257 and 258, refer to Prohorov (2008). They all study general measures.
What about related work on formulating probabilistic versions of Hardy-type inequalities? We have not found any results in this direction. Despite the many applications of Hardy and Muckenhoupt type inequalities in probability theory over the past 30 years, we are unaware of any explicit mention of these inequalities in terms of random variables. It seems to us that these inequalities should be better known in both the probability and statistics communities, and the probabilistic versions may stimulate both further applications and further theoretical developments. In any case, it seems worthwhile to understand when several different formulations can be unified.
In Section 8 we sketched the connection between the operators H_F and H*_F and a simple counting process martingale. The key functions appearing in those operators, namely the forward and reverse cumulative hazard functions (recall (50) for their explicit definitions), play an extremely important role in survival analysis and reliability theory. Note also that they do not appear without the probabilistic perspective adopted in our approach. In the Appendix (Section 12) we discuss how these functions arise in connection with left- and right-censored survival data.
12. Appendix. Right- and left-censored data: the forward and reverse hazard functions.
Here we continue the discussion of the forward and backward hazard functions connected with our random variable versions of the Copson inequalities.
12.1. Censored survival data: from the right and from the left. Suppose that X_1, ..., X_n are i.i.d. survival times with d.f. F on [0, ∞). Furthermore, suppose that Y_1, ..., Y_n are i.i.d. censoring times (independent of the X_i's) with distribution function G. Unfortunately we do not get to observe the X_i's. Instead, for each individual we observe Z_i ≡ X_i ∧ Y_i and δ_i ≡ 1{X_i ≤ Y_i}, and the survival function satisfies 1 − F(t) = e^{−Λ^c(t)} ∏_{s≤t} (1 − ΔΛ(s)), where ΔΛ(s) ≡ Λ(s) − Λ(s−) and Λ^c(t) ≡ Λ(t) − Σ_{s≤t} ΔΛ(s). This is the random censorship version of right-censored survival data, and the (nonparametric) maximum likelihood estimators of Λ and 1 − F are the famous Nelson-Aalen estimator Λ_n and Kaplan-Meier estimator 1 − F_n. For treatments of fixed (i.e. deterministic) censoring times, see Pollard (1990) and Meier (1975).
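To make the Nelson-Aalen and Kaplan-Meier constructions concrete, here is a minimal self-contained sketch (the function name and the simulation setup are our own, not from the paper). It computes Λ_n and 1 − F_n from simulated right-censored exponential data, for which Λ(t) = t and 1 − F(t) = e^{−t}:

```python
import random

def nelson_aalen_kaplan_meier(z, d):
    """Nelson-Aalen estimate of Lambda and Kaplan-Meier estimate of 1 - F
    from right-censored data z[i] = min(X_i, Y_i), d[i] = 1{X_i <= Y_i}.
    Returns a list of (event time, Lambda_n, 1 - F_n)."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    at_risk, Lam, S, out, i = n, 0.0, 1.0, [], 0
    while i < n:
        t = z[order[i]]
        deaths = ties = 0
        while i < n and z[order[i]] == t:
            deaths += d[order[i]]     # uncensored observations at t
            ties += 1
            i += 1
        if deaths:
            Lam += deaths / at_risk       # Nelson-Aalen increment
            S *= 1.0 - deaths / at_risk   # Kaplan-Meier factor
            out.append((t, Lam, S))
        at_risk -= ties                   # remove everyone observed at t
    return out

# Simulated check: X ~ Exp(1), independent censoring Y ~ Exp(1/2).
random.seed(1)
n = 20000
X = [random.expovariate(1.0) for _ in range(n)]
Y = [random.expovariate(0.5) for _ in range(n)]
z = [min(x, y) for x, y in zip(X, Y)]
d = [1 if x <= y else 0 for x, y in zip(X, Y)]
est = nelson_aalen_kaplan_meier(z, d)
Lam1 = max(L for t, L, S in est if t <= 1.0)   # approximates Lambda(1) = 1
S1 = min(S for t, L, S in est if t <= 1.0)     # approximates exp(-1)
```

For a continuous F, as here, the product in the Kaplan-Meier estimator runs only over the observed event times, which is exactly what the sketch does.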
Before discussing right-censoring further, suppose instead that we observe W_i ≡ U_i ∨ V_i and γ_i ≡ 1{U_i ≥ V_i}, i = 1, ..., n, where the U_i's are i.i.d. with d.f. F, and the V_i's are i.i.d. G (and independent of the U_i's). The goal again is to estimate the (reverse or backwards) cumulative hazard function Λ_F(t) ≡ ∫_{[t,∞)} dF(s)/F(s) and the d.f. F. This is left-censored survival data. Note that Λ_F is the function which arose naturally in the random variable version of Copson's inequality in Section 8. A famous example of left-censored data arose in a study of the descent times of baboons in the Amboseli Reserve, Kenya; see Wagner and Altmann (1973), Ware and DeMets (1976), Csörgő and Horváth (1980), and Csörgő and Horváth (1985). In this study the U_i's represent the times when the baboons descended from the trees in the morning, while the V_i's represent the times at which the investigators arrived at the study site. If a baboon descended before its observer arrived, then that baboon's U_i is regarded as "left-censored". Again the goal is nonparametric estimation of the d.f. of the U_i's.
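For continuous F the reverse cumulative hazard has a closed form, Λ_F(t) = −log F(t), since dF/F = d(log F). A small numerical check of this identity (our own sketch, with a hypothetical helper name) for the Uniform(0, 1) distribution:

```python
import math

def reverse_cum_hazard(f, F, t, upper, steps=100000):
    """Midpoint-rule approximation of the reverse cumulative hazard
    Lambda_F(t) = integral over [t, upper) of f(s)/F(s) ds (continuous F)."""
    h = (upper - t) / steps
    total = 0.0
    for k in range(steps):
        s = t + (k + 0.5) * h     # midpoint of the k-th subinterval
        total += f(s) / F(s) * h
    return total

# Uniform(0, 1): F(s) = s, f(s) = 1, so Lambda_F(t) = -log t.
t = 0.25
approx = reverse_cum_hazard(lambda s: 1.0, lambda s: s, t, 1.0)
```

Here approx should be close to −log 0.25 = log 4 ≈ 1.3863.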
In this setting, once we have an estimator Λ_{F,n} of Λ = Λ_F, estimation of F is immediate.
12.2. Nonparametric estimation for right- or left-censored survival data. First consider the classical and frequently occurring censoring from the right. To see that Λ_F and 1 − F can be estimated nonparametrically from the observed data, consider the following empirical distributions, where "uc" stands for "uncensored" observations and "c" stands for "censored" observations: H_n^{uc}(t) ≡ n^{−1} Σ_{i=1}^n 1{Z_i ≤ t, δ_i = 1} and H_n^c(t) ≡ n^{−1} Σ_{i=1}^n 1{Z_i ≤ t, δ_i = 0}. By the strong law of large numbers, H_n^{uc}(t) → ∫_0^t (1 − G(s−)) dF(s) ≡ H^{uc}(t) and H_n^c(t) → ∫_0^t (1 − F(s)) dG(s) ≡ H^c(t) almost surely. With H_n ≡ H_n^{uc} + H_n^c, the Nelson-Aalen estimator is Λ_n(t) ≡ ∫_0^t dH_n^{uc}(s)/(1 − H_n(s−)). Then 1 − F_n(t) = ∏_{s≤t} (1 − ΔΛ_n(s)) is the Kaplan and Meier (1958) estimator of 1 − F. Now for estimation in the presence of censoring from the left. To see that Λ_F and F can be estimated nonparametrically from the observed (left-censored) data, consider the empirical distributions of the pairs (W_i, γ_i), defined in parallel with H_n^{uc} and H_n^c above. Then F_n(t) = ∏_{s≥t} (1 − ΔΛ_n(s)) is the "reverse" or "backwards" Kaplan-Meier estimator of F; see e.g. Ware and DeMets (1976), Csörgő and Horváth (1980), and Csörgő and Horváth (1985). For more on left-censoring, the data in the baboon study, and a plot of the resulting backwards Kaplan-Meier estimator, see Andersen et al. (1993), pages 24, 162-165, and 273-274.
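The backwards Kaplan-Meier construction just described can be sketched as follows (the function name and the simulation are our own assumptions, not from the paper). It processes the observed W_i in decreasing order, using the reverse at-risk set {j : W_j ≤ t}, and multiplies in a factor (1 − d_s/r_s) at each uncensored time s:

```python
import random

def reverse_kaplan_meier(w, g):
    """Backwards Kaplan-Meier estimator for left-censored data
    w[i] = max(U_i, V_i), g[i] = 1{U_i >= V_i} (U_i observed).
    Returns a list of (event time t, F_n just below t)."""
    n = len(w)
    order = sorted(range(n), key=lambda i: w[i], reverse=True)
    at_risk, F, out, i = n, 1.0, [], 0
    while i < n:
        t = w[order[i]]
        deaths = ties = 0
        while i < n and w[order[i]] == t:
            deaths += g[order[i]]     # observed (uncensored) U's at t
            ties += 1
            i += 1
        if deaths:
            F *= 1.0 - deaths / at_risk   # reverse product-limit factor
            out.append((t, F))
        at_risk -= ties                   # shrink {j : W_j <= t}
    return out

# Simulated check: U, V ~ Uniform(0, 1) independent, so F_U(0.5) = 0.5.
random.seed(2)
n = 20000
U = [random.random() for _ in range(n)]
V = [random.random() for _ in range(n)]
w = [max(u, v) for u, v in zip(U, V)]
g = [1 if u >= v else 0 for u, v in zip(U, V)]
est = reverse_kaplan_meier(w, g)
Fhat = min(F for t, F in est if t > 0.5)   # estimates F_U(0.5)
```

The recorded value at an event time t is the product over all s ≥ t, i.e. an estimate of F just below t; taking the entry at the smallest event time above 0.5 therefore estimates F_U(0.5).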