Concentration of information content for convex measures

We obtain sharp exponential deviation estimates of the information content as well as a sharp bound on the varentropy for convex measures, extending the development for log-concave measures in recent work of Fradelizi, Madiman and Wang (2015). Note: This document is not intended for submission to a journal in its current form. It will form one part of a larger paper [1] (which is in preparation), and is being made available separately for quicker accessibility for those interested.


1 Introduction
Let $X$ be a random variable taking values in $\mathbb{R}^n$. Suppose that the distribution of $X$ has density $f$ with respect to the Lebesgue measure on $\mathbb{R}^n$. The information content of $X$ is the random variable
\[
\tilde{h}(X) = -\log f(X). \tag{1}
\]
The average value of $\tilde{h}(X)$ is known more commonly as the entropy: the entropy of $X$ is defined by
\[
h(X) = \mathbb{E}\,\tilde{h}(X) = -\int_{\mathbb{R}^n} f(x)\log f(x)\,dx.
\]
Because of the relevance of the information content in various areas such as information theory, probability and statistics, it is intrinsically interesting to understand its behavior. In particular, it is natural to ask whether the information content concentrates around the entropy in high dimension. If $X$ is a standard Gaussian random vector in $\mathbb{R}^n$, its information content is
\[
\tilde{h}(X) = \frac{n}{2}\log(2\pi) + \frac{|X|^2}{2},
\]
where $|\cdot|$ is the Euclidean norm. In this case, the concentration property of $|X|^2$ around its mean is well known. Bobkov and Madiman [2] showed that $\tilde{h}(X)$ possesses a powerful concentration property whenever $X$ has a log-concave density; this in particular includes the Gaussian case, which had earlier been treated by Cover and Pombra [9]. Their proof depends heavily on the localization lemma of Lovász and Simonovits [15] and on reverse Hölder type inequalities [6]. As a consequence, the variance bound
\[
\mathrm{Var}\big(\tilde{h}(X)\big) \le Cn \tag{3}
\]
holds for all log-concave random vectors $X$, with some absolute constant $C$. Recently, it was determined independently by Nguyen [17] and Wang [18] that the sharp constant is $C = 1$. Simpler proofs of the sharp constant were given independently by Bolley, Gentil and Guillin [5] and by Fradelizi, Madiman and Wang [11]. The key observation in [11] is that the variance bound emerges as a consequence of the log-concavity of the moments of log-concave functions.
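To see concretely what concentration means in the Gaussian case, note that $|X|^2$ is a sum of $n$ independent $\chi_1^2$ random variables, each with variance $2$; hence the varentropy can be computed exactly, and (3) holds in this case with constant $1/2$:
\[
\mathrm{Var}\big(\tilde{h}(X)\big) = \frac{1}{4}\,\mathrm{Var}\big(|X|^2\big) = \frac{1}{4}\sum_{i=1}^n \mathrm{Var}\big(X_i^2\big) = \frac{n}{2}.
\]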
In this note, we extend the concentration property of the information content from log-concave measures to general convex measures. In Section 2, we show that exponential deviation inequalities for a functional follow from the log-concavity of suitably normalized moments of that functional. The log-concavity of the normalized moments of $s$-concave functions is then studied in Section 3. Optimal concentration and a sharp variance bound for the information content of $\kappa$-concave random variables are obtained in Section 4.

2 A general principle for exponential deviation
Let $X$ be a random variable taking values in $\mathbb{R}^n$. Suppose that it has density $f$ with respect to the Lebesgue measure on $\mathbb{R}^n$. Let $\varphi : \mathbb{R}^n \to \mathbb{R}$ be a real-valued function. One natural way to show the exponential deviation of $\varphi(X)$ from its mean is to prove the finiteness of the moment generating function $\mathbb{E}\,e^{\alpha\varphi(X)}$ for appropriate values of $\alpha$. The logarithmic moment generating function $L(\alpha)$ is defined by
\[
L(\alpha) = \log \mathbb{E}\,e^{\alpha\varphi(X)}.
\]
The following observation is a well known fact about exponential families in statistics. Let $a, b > 0$ be certain real numbers.

Lemma 2.1. Suppose that $L(\alpha) < \infty$ for $\alpha \in (-a, b)$. Then $L$ is infinitely differentiable on $(-a, b)$ and
\[
L''(\alpha) = \mathrm{Var}\big(\varphi(X_\alpha)\big),
\]
where $X_\alpha$ denotes a random variable whose density is proportional to $e^{\alpha\varphi(x)} f(x)$. In particular, $L$ is convex on $(-a, b)$.

Proof. The assumption $L(\alpha) < \infty$ for $\alpha \in (-a, b)$ guarantees that $L(\alpha)$ is infinitely differentiable with respect to $\alpha \in (-a, b)$ and that we can freely change the order of differentiation and expectation. Then we have
\[
L'(\alpha) = \frac{\mathbb{E}\,\varphi(X)\,e^{\alpha\varphi(X)}}{\mathbb{E}\,e^{\alpha\varphi(X)}} = \mathbb{E}\,\varphi(X_\alpha).
\]
Differentiating $L'(\alpha)$ one more time, we have
\[
L''(\alpha) = \mathbb{E}\,\varphi(X_\alpha)^2 - \big(\mathbb{E}\,\varphi(X_\alpha)\big)^2 = \mathrm{Var}\big(\varphi(X_\alpha)\big).
\]

Recall that a function $g : \mathbb{R}^n \to [0, \infty)$ is said to be log-concave if
\[
g(\lambda x + (1-\lambda)y) \ge g(x)^{\lambda}\, g(y)^{1-\lambda}
\]
for all $x, y \in \mathbb{R}^n$ and all $\lambda \in [0, 1]$.
The following lemma tells us that an upper bound on $\mathbb{E}\,e^{\alpha\varphi(X)}$ emerges as a consequence of the log-concavity of $\mathbb{E}\,e^{\alpha\varphi(X)}$ after a suitable normalization.

Lemma 2.3. Let $c(\alpha)$ be a smooth function such that $e^{-c(\alpha)}\,\mathbb{E}\,e^{\alpha\varphi(X)}$ is log-concave for $-a < \alpha < b$. Then we have
\[
\mathbb{E}\,e^{\alpha(\varphi(X) - \mathbb{E}\varphi(X))} \le e^{\psi_c(\alpha)}, \qquad -a < \alpha < b,
\]
where
\[
\psi_c(\alpha) = c(\alpha) - c(0) - c'(0)\,\alpha.
\]

Proof. Since $e^{-c(s)}\,\mathbb{E}\,e^{s\varphi(X)}$ is log-concave, we have $L''(s) \le c''(s)$. For any $0 < t < \alpha < b$, integrating this inequality over $(0, t)$ we obtain
\[
L'(t) - L'(0) \le c'(t) - c'(0).
\]
Integrating both sides over $(0, \alpha)$, we have
\[
L(\alpha) - L(0) - L'(0)\,\alpha \le c(\alpha) - c(0) - c'(0)\,\alpha. \tag{9}
\]
Similarly we can show that the estimate also holds for $-a < \alpha < 0$. Notice that $L(0) = 0$ and $L'(0) = \mathbb{E}\,\varphi(X)$. Then the lemma follows from exponentiating both sides of (9).
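As a simple illustration of Lemma 2.3 (with an assumed uniform bound, not used in the sequel): if the tilted variances satisfy $\mathrm{Var}(\varphi(X_\alpha)) \le \sigma^2$ for all $\alpha \in (-a, b)$, then by Lemma 2.1 one may take $c(\alpha) = \sigma^2\alpha^2/2$, and since $c(0) = c'(0) = 0$ the lemma recovers the classical sub-Gaussian bound
\[
\psi_c(\alpha) = \frac{\sigma^2\alpha^2}{2}, \qquad \mathbb{E}\,e^{\alpha(\varphi(X) - \mathbb{E}\varphi(X))} \le e^{\sigma^2\alpha^2/2}.
\]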
Remark 2.4. From Lemma 2.1 and Lemma 2.3, we see that finding an upper bound for $\mathrm{Var}(\varphi(X_\alpha))$ is equivalent to finding a normalizing function $c(\alpha)$ for which $e^{-c(\alpha)}\,\mathbb{E}\,e^{\alpha\varphi(X)}$ is log-concave; one passes from one to the other by differentiating or integrating twice. That is why variance bounds can imply exponential deviation inequalities when moment generating functions exist.
Theorem 2.5. Under the assumptions of Lemma 2.3, we have for $t > 0$
\[
\mathbb{P}\big(\varphi(X) - \mathbb{E}\varphi(X) > t\big) \le e^{-\psi_{c,+}^*(t)}, \qquad \mathbb{P}\big(\varphi(X) - \mathbb{E}\varphi(X) < -t\big) \le e^{-\psi_{c,-}^*(t)},
\]
where
\[
\psi_{c,+}^*(t) = \sup_{0 < \alpha < b}\big(\alpha t - \psi_c(\alpha)\big), \qquad \psi_{c,-}^*(t) = \sup_{-a < \alpha < 0}\big(-\alpha t - \psi_c(\alpha)\big).
\]

Proof. The proof follows from the so-called Cramér–Chernoff method: one uses Markov's inequality in conjunction with an optimization of the resulting bound. For the upper tail, we have for $0 < \alpha < b$ and $t > 0$ that
\[
\mathbb{P}\big(\varphi(X) - \mathbb{E}\varphi(X) > t\big) \le e^{-\alpha t}\,\mathbb{E}\,e^{\alpha(\varphi(X) - \mathbb{E}\varphi(X))} \le e^{-\alpha t + \psi_c(\alpha)},
\]
where we used Lemma 2.3 in the second inequality. The upper tail estimate then follows by taking the infimum of the right hand side over $0 < \alpha < b$. The lower tail estimate follows from the same argument applied for $-a < \alpha < 0$.
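Continuing the illustrative sub-Gaussian choice above, with $\psi_c(\alpha) = \sigma^2\alpha^2/2$ and $a = b = \infty$, the optimization can be carried out explicitly and yields the familiar Gaussian-type tail:
\[
\psi_{c,+}^*(t) = \sup_{\alpha > 0}\Big(\alpha t - \frac{\sigma^2\alpha^2}{2}\Big) = \frac{t^2}{2\sigma^2}, \qquad \mathbb{P}\big(\varphi(X) - \mathbb{E}\varphi(X) > t\big) \le e^{-t^2/(2\sigma^2)}.
\]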
3 Log-concavity of the moments of s-concave functions

In this section, we study the log-concavity of the (normalized) moments of $s$-concave functions, which, in conjunction with the results of the previous section, will enable us to obtain optimal concentration of the information content for convex measures.
Let $s \in \mathbb{R}$. A function $f : \mathbb{R}^n \to [0, \infty)$ is called $s$-concave if
\[
f(\lambda x + (1-\lambda)y) \ge \big(\lambda f(x)^s + (1-\lambda) f(y)^s\big)^{1/s}
\]
for all $x, y$ such that $f(x)f(y) > 0$ and all $\lambda \in [0, 1]$. For $s = 0$, the right hand side is defined by continuity as $f(x)^{\lambda} f(y)^{1-\lambda}$, which corresponds to the log-concave functions defined before. For $s > 0$, the definition is equivalent to $f^s$ being concave on its support, while for $s < 0$, it is equivalent to $f^s$ being convex on its support.
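For orientation, a few standard examples (chosen here only for illustration):
\[
f(x) = (1-|x|)_+ \ \text{is } 1\text{-concave}, \qquad f(x) = e^{-|x|^2} \ \text{is log-concave}, \qquad f(x) = (1+|x|)^{-\beta} \ \text{is } (-1/\beta)\text{-concave},
\]
since $(1-|x|)_+$ is concave on its support, $-|x|^2$ is concave, and $1+|x|$ is convex.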
Recall that for $x > 0$, the gamma function is defined by
\[
\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt,
\]
and for $x, y > 0$, the beta function is defined by
\[
B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)}.
\]
The following result was proved by Borell [6] for $s > 0$, under the additional assumption that the function $\varphi$ is decreasing. It was later noticed that the result remains true without any monotonicity hypothesis; see, for example, Guédon, Nayar and Tkocz [12]. For $s < 0$, it was proved by Fradelizi, Guédon and Pajor [10], and the case $s = 0$ follows by taking limits (or by reproducing the mechanics of the proof).

Let us define the function $\varphi_s(t) = (1 - st)_+^{1/s}\,\mathbf{1}_{\mathbb{R}_+}(t)$ for $s \neq 0$, and $\varphi_0(t) = e^{-t}\,\mathbf{1}_{\mathbb{R}_+}(t)$. Then the proposition may be expressed in the following way.

Proposition 3.2. Let $s \in \mathbb{R}$ and let $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ be an integrable $s$-concave function. Then the function
\[
p \mapsto \frac{\int_0^\infty t^{p-1}\varphi(t)\,dt}{\int_0^\infty t^{p-1}\varphi_s(t)\,dt}
\]
is log-concave for $p$ such that $1/p > \max(0, -s)$.

Using the preceding proposition, we can prove the following theorem, which unifies and partially extends previous results of Borell [6], Bobkov and Madiman [3], and Fradelizi, Madiman and Wang [11]. A weaker log-concavity statement was also obtained by Nguyen [16].
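For later use, the normalizing moments of $\varphi_s$ can be evaluated explicitly in terms of the beta and gamma functions (a direct computation, by elementary substitutions):
\[
\int_0^\infty t^{p-1}\varphi_s(t)\,dt =
\begin{cases}
s^{-p}\,B(p,\, 1/s + 1), & s > 0,\\[2pt]
\Gamma(p), & s = 0,\\[2pt]
|s|^{-p}\,B(p,\, -1/s - p), & s < 0,
\end{cases}
\]
the last integral being finite precisely when $p < -1/s$, that is, when $1/p > -s$.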
Theorem 3.1. Let $s \in \mathbb{R}$ and let $f : \mathbb{R}^n \to \mathbb{R}_+$ be an integrable $s$-concave function. Then the function
\[
p \mapsto (p+s)(p+2s)\cdots(p+ns)\int_{\mathbb{R}^n} f(x)^p\,dx
\]
is log-concave for $p > \max(0, -ns)$.
Proof. The case $s = 1$ is due to Borell [6], and the case $s > 0$ follows directly by applying Borell's result to $f^s$. The case $s = 0$ was proved by Fradelizi, Madiman and Wang [11]. The case $s = -1$ is due to Bobkov and Madiman [3], except that the range there was $p > n + 1$. In the same way, the case $s < 0$ follows from the case $s = -1$ by applying it to $f^{|s|}$. So we only need to prove the extension of the range for $s = -1$.

Let us assume that $s = -1$. Thus $f$ is $(-1)$-concave, which means that $g = f^{-1}$ is convex on its support. As done by Bobkov and Madiman [3], we write
\[
\int_{\mathbb{R}^n} f(x)^p\,dx = p\int_0^\infty t^{-p-1}\,\psi(t)\,dt,
\]
where $\psi(t) = |\{x \in \mathbb{R}^n : g(x) \le t\}|_n$ is the Lebesgue measure of the sub-level set $\{x \in \mathbb{R}^n : g(x) \le t\}$. Using the Brunn–Minkowski theorem, we can see that $\psi$ is a $1/n$-concave function. Using the properties of the perspective function, we can deduce that the function $\varphi(t) = t^n\,\psi(1/t)$ is also a $1/n$-concave function. Thus, after the change of variables $t \mapsto 1/t$, it follows that
\[
\int_{\mathbb{R}^n} f(x)^p\,dx = p\int_0^\infty t^{p-n-1}\,\varphi(t)\,dt.
\]
Applying Proposition 3.2 to $s = 1/n$, with $p$ replaced by $p - n$, we get that
\[
p \mapsto \frac{\int_0^\infty t^{p-n-1}\,\varphi(t)\,dt}{B(p-n,\, n+1)}
\]
is log-concave on $(n, +\infty)$. Then we can conclude the proof using the following identity:
\[
B(p-n,\, n+1)^{-1} = \frac{p(p-1)\cdots(p-n)}{\Gamma(n+1)}.
\]
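As a consistency check, letting $s \to 0$ in Theorem 3.1 gives
\[
\lim_{s\to 0}\,(p+s)(p+2s)\cdots(p+ns) = p^n,
\]
recovering the log-concavity of $p \mapsto p^n \int_{\mathbb{R}^n} f(x)^p\,dx$ on $(0, \infty)$ for log-concave $f$, as proved in [11].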
The fact that Theorem 3.1 is optimal can be seen from the following example. Let $U : \mathbb{R}^n \to [0, \infty]$ be a positively homogeneous convex function of degree 1, i.e. such that $U(tx) = tU(x)$ for all $x \in \mathbb{R}^n$ and all $t > 0$. We define $f_{s,U} = (1 - sU)_+^{1/s}$ for $s \neq 0$ and $f_{0,U} = e^{-U}$. Then we have
\[
\int_{\mathbb{R}^n} f_{s,U}(x)^p\,dx = \frac{n!\;C_U}{(p+s)(p+2s)\cdots(p+ns)},
\]
where $C_U$ is the Lebesgue measure of the sub-level set $\{x \in \mathbb{R}^n : U(x) \le 1\}$; thus the function of Theorem 3.1 is constant in $p$ for $f = f_{s,U}$. We only check the identity for $s > 0$; the other two cases can be proved similarly.
Writing the integral in terms of the distribution function of $U$, we have
\[
\int_{\mathbb{R}^n} f_{s,U}(x)^p\,dx = \int_0^{1/s} (1 - st)^{p/s}\,d\big(C_U\,t^n\big) = n\,C_U \int_0^{1/s} t^{n-1}(1 - st)^{p/s}\,dt = \frac{p}{s}\,C_U\,s^{-n}\,B(p/s,\, n+1).
\]
In the first equation, we use the homogeneity of $U$ together with the scaling property of the Lebesgue measure, which give $|\{x \in \mathbb{R}^n : U(x) \le t\}|_n = C_U\,t^n$. Then we can prove the identity using the following fact:
\[
B(p/s,\, n+1) = \frac{n!}{(p/s + n)(p/s + n - 1)\cdots(p/s)}.
\]
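As a quick sanity check, take $n = 1$, $s = 1$ and $U(x) = |x|$ (an illustrative choice): then $f_{1,U}(x) = (1-|x|)_+$, $C_U = |[-1,1]| = 2$, and indeed
\[
\int_{\mathbb{R}} (1-|x|)_+^p\,dx = \frac{2}{p+1} = \frac{1!\;C_U}{p+1}.
\]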
Thus the preceding theorem can be written in the following way: if $f : \mathbb{R}^n \to \mathbb{R}_+$ is an integrable $s$-concave function, then
\[
p \mapsto \frac{\int_{\mathbb{R}^n} f(x)^p\,dx}{\int_{\mathbb{R}^n} f_{s,U}(x)^p\,dx}
\]
is log-concave for $p > \max(0, -ns)$.

4 Concentration of information content
Now we are ready to study the concentration property of the information content for the class of convex measures introduced and studied by Borell [7, 8]. Recall that a probability measure $\mu$ on $\mathbb{R}^n$ is called $\kappa$-concave, where $\kappa \in [-\infty, 1/n]$, if
\[
\mu\big(\lambda A + (1-\lambda)B\big) \ge \big(\lambda\,\mu(A)^{\kappa} + (1-\lambda)\,\mu(B)^{\kappa}\big)^{1/\kappa}
\]
for all compact sets $A, B \subseteq \mathbb{R}^n$ with $\mu(A)\mu(B) > 0$ and all $\lambda \in [0, 1]$, with the right hand side interpreted by continuity for $\kappa = 0$.
We say that an $\mathbb{R}^n$-valued random variable $X$ is $\kappa$-concave if the probability measure induced by $X$ is $\kappa$-concave. In this section, we let $X$ be a $\kappa$-concave random variable with density $f$, where $\kappa < 0$, and we set $\beta = n - 1/\kappa$. Then Borell's characterization implies that there is a convex function $V$ such that $f = V^{-\beta}$; equivalently, $f$ is $(-1/\beta)$-concave. In the following, we study the deviation of the information content $\tilde h(X)$ from its mean $h(X)$, which corresponds to taking $\varphi = -\log f$ in Section 2. The moment generating function is then
\[
\mathbb{E}\,e^{\alpha\tilde h(X)} = \mathbb{E}\,f(X)^{-\alpha} = \int_{\mathbb{R}^n} f(x)^{1-\alpha}\,dx = \int_{\mathbb{R}^n} V(x)^{-(1-\alpha)\beta}\,dx.
\]
The integral is finite as long as $(1-\alpha)\beta > n$, i.e. as long as $\alpha < 1 - n/\beta$.

Proposition 4.2. Let $\beta > n$ and let $X$ be a random variable in $\mathbb{R}^n$ with density $f$ being $(-1/\beta)$-concave. Then the function
\[
e^{-c(\alpha)}\,\mathbb{E}\,e^{\alpha\tilde h(X)}, \qquad \text{where } c(\alpha) = -\sum_{k=1}^n \log\Big(1 - \alpha - \frac{k}{\beta}\Big), \tag{17}
\]
is log-concave for $\alpha < 1 - n/\beta$.
Proof. It follows easily from Theorem 3.1, with $p$ replaced by $1 - \alpha$ and $s$ replaced by $-1/\beta$; log-concavity is preserved under the affine change of variable $p = 1 - \alpha$.

Evaluating $L''(0) \le c''(0)$ as in Remark 2.4 yields the following sharp bound on the varentropy.

Corollary 4.3. Let $\beta > n$ and let $X$ be a random variable in $\mathbb{R}^n$ with density $f$ being $(-1/\beta)$-concave. Then
\[
\mathrm{Var}\big(\tilde h(X)\big) \le \sum_{k=1}^n \frac{\beta^2}{(\beta - k)^2}.
\]
As $\beta \to \infty$, the right hand side tends to $n$, recovering the sharp variance bound for log-concave random vectors.
Remark 4.4. The variance bound is sharp. Suppose $X$ has density $f = (1 + U/\beta)^{-\beta}$, with $U$ a positively homogeneous convex function of degree 1. In this case, the function in Proposition 4.2 is log-affine, i.e. $L''(\alpha) = c''(\alpha)$, and so we have equality in the above variance bound. In particular, this family includes the Pareto distribution with density
\[
f(x) = \frac{1}{Z_n(a, \beta)}\,(a + x_1 + \cdots + x_n)^{-\beta}, \qquad x_1, \dots, x_n > 0,
\]
where $a > 0$ and $Z_n(a, \beta)$ is a normalizing constant.
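For instance, in dimension $n = 1$ the Pareto density reads $f(x) = (\beta-1)\,a^{\beta-1}(a+x)^{-\beta}$ on $(0, \infty)$, and $\log(a+X)$ is then a shifted exponential random variable with rate $\beta - 1$; since $\tilde h(X) = \beta\log(a+X) + \mathrm{const}$, a one-line computation confirms the equality case:
\[
\mathrm{Var}\big(\tilde h(X)\big) = \beta^2\,\mathrm{Var}\big(\log(a+X)\big) = \frac{\beta^2}{(\beta-1)^2}.
\]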
Let $\beta > n + 2$ and let $X$ be a random variable in $\mathbb{R}^n$ with density $f$ being $(-1/\beta)$-concave. In this case, we have $\mathbb{E}|X|^2 < \infty$, so the covariance matrix
\[
\Sigma = \mathbb{E}\big[(X - \mathbb{E}X)(X - \mathbb{E}X)^{T}\big]
\]
is well defined. There is a standard inequality relating $\mathrm{Var}(\tilde h(X))$ to $\mathrm{tr}(\Sigma)$, the trace of $\Sigma$, and to $J(X)$, the Fisher information of $X$, defined by
\[
J(X) = \int_{\{f > 0\}} \frac{|\nabla f(x)|^2}{f(x)}\,dx.
\]
Combining with Corollary 4.3 we have the following result.
Corollary 4.5. Let $\beta > n + 2$ and let $X$ be a random variable in $\mathbb{R}^n$ with density $f$ being $(-1/\beta)$-concave. Then $\mathrm{tr}(\Sigma)$ and $J(X)$ satisfy the bound obtained by combining this relation with Corollary 4.3; in particular, the bound takes a simpler form when $X$ is isotropic, i.e. when $\mathbb{E}X = 0$ and $\Sigma$ is the identity matrix. Taking $\beta \to \infty$ yields the analogue for log-concave random variables, which was observed by Nguyen [16].
Theorem 4.1. Let $\beta > n$ and let $X$ be a random variable in $\mathbb{R}^n$ with density $f$ being $(-1/\beta)$-concave. Then, for $\alpha < 1 - n/\beta$, we have
\[
\mathbb{E}\,e^{\alpha(\tilde h(X) - h(X))} \le e^{\psi_c(\alpha)}, \tag{25}
\]
where
\[
\psi_c(\alpha) = \sum_{k=1}^n \left(\log\frac{\beta - k}{\beta - k - \alpha\beta} - \frac{\alpha\beta}{\beta - k}\right), \tag{26}
\]
and consequently the deviation estimates of Theorem 2.5 hold with this $\psi_c$. In particular, we have equality in (25) for Pareto distributions.
Proof. The moment generating function bound (25) follows easily from Lemma 2.3 and Proposition 4.2. Some easy calculations show the equality case for Pareto distributions; essentially, that is due to the identity $L''(\alpha) = c''(\alpha)$, where $c(\alpha)$ is defined in (17).
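For instance, when $n = 1$ the bound (25) reads, for $\alpha < 1 - 1/\beta$,
\[
\mathbb{E}\,e^{\alpha(\tilde h(X) - h(X))} \le \frac{\beta - 1}{\beta - 1 - \alpha\beta}\;e^{-\alpha\beta/(\beta - 1)},
\]
and for the one-dimensional Pareto distribution of Remark 4.4 this is an equality, since $\tilde h(X) - h(X) = \beta(Y - \mathbb{E}Y)$ with $Y = \log(a + X)$ exponentially distributed up to a shift.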
In general we do not have explicit expressions for $\psi_{c,+}^*$ or $\psi_{c,-}^*$. The following result was obtained by Bobkov and Madiman [3] under the assumption $\beta \ge n + 1$, which can be relaxed to $\beta > n$. It basically says that the entropy of a $\kappa$-concave distribution cannot exceed that of the Pareto distribution with the same maximal density value: if $\beta > n$ and $X$ is a random variable in $\mathbb{R}^n$ with density $f$ being $(-1/\beta)$-concave, then
\[
h(X) \le \log\frac{1}{\|f\|_\infty} + \sum_{k=1}^n \frac{\beta}{\beta - k},
\]
where we denote by $\|f\|_\infty$ the essential supremum of $f$. We have equality for Pareto distributions.
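To illustrate the equality case in dimension one: with $f(x) = (\beta-1)\,a^{\beta-1}(a+x)^{-\beta}$ on $(0, \infty)$ as above, a direct computation gives
\[
h(X) = \log\frac{a}{\beta - 1} + \frac{\beta}{\beta - 1}, \qquad \|f\|_\infty = \frac{\beta - 1}{a}, \qquad h(X) + \log\|f\|_\infty = \frac{\beta}{\beta - 1},
\]
which is exactly the right hand side of the bound for $n = 1$.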
The following result is an improvement of Proposition 5.1 of Bobkov and Madiman [4]. Its analogue for log-concave probability measures was first observed by Klartag and Milman [14], with a refinement made in [11, Corollary 4.7]. It gives an explicit bound, with a dimension-dependent constant $c_1 < 1$, for the upper deviations of the information content.

Proof. We have $\psi_{c,+}^*(t) = \alpha^* t - \psi_c(\alpha^*)$, where $\alpha^*$ is the positive number such that $(\alpha t - \psi_c(\alpha))'(\alpha^*) = 0$, i.e. such that
\[
t = c'(\alpha^*) - c'(0) = \beta\sum_{k=1}^n \left(\frac{1}{\beta - k - \alpha^*\beta} - \frac{1}{\beta - k}\right).
\]
Using the definitions of $\psi_c(\alpha)$ and $t$ in (26) and (31), respectively, and combining with (32), we obtain the desired statement with the constant $c_1$. To see that $c_1 < 1$, we take the logarithm of $c_1$ and use the equation (34) in the second identity; the last inequality then follows from the fact that $\log(1 + x) < x$ for $x > 0$.