Strictly subgaussian probability distributions

We explore probability distributions on the real line whose Laplace transform admits an upper bound of subgaussian type, known as strict subgaussianity. One class in this family corresponds to entire characteristic functions having only real zeros in the complex plane. Using Hadamard's factorization theorem, we extend this class and propose new sufficient conditions for strict subgaussianity in terms of the location of zeros of the associated characteristic functions. The second part of this note deals with Laplace transforms of strictly subgaussian distributions with periodic components. This class contains interesting examples, for which the central limit theorem with respect to the Rényi entropy divergence of infinite order holds.


Introduction
Following Kahane [15], a random variable X is called subgaussian if $\mathbb{E}\,e^{cX^2} < \infty$ for some constant c > 0. Assuming that X has mean zero, this is equivalent to the statement that the moment generating function (or two-sided Laplace transform) of X satisfies

$$\mathbb{E}\,e^{tX} \le e^{\sigma^2 t^2/2}, \quad t \in \mathbb{R}, \qquad (1.1)$$

with some constant $\sigma^2$. Its optimal value appears in the literature under different names, such as the subgaussian constant or the optimal proxy variance. Being deeply connected with logarithmic Sobolev constants and concentration of measure phenomena, the problem of computing or estimating σ is of considerable interest (including a similar quantity in the more general setting of metric spaces, cf. [5]). For example, in the case of a centered Bernoulli distribution $p\delta_q + q\delta_{-p}$, the subgaussian constant was identified, although not with a rigorous proof, by Kearns and Saul [17] to be

$$\sigma^2 = \frac{p-q}{2(\log p - \log q)} \qquad (1.2)$$

(cf. also [6], [2]). A similar expression was obtained by Diaconis and Saloff-Coste [9] and by Higuchi and Yoshida [13] for the logarithmic Sobolev constant of a Markov chain on the two-point space. Immediate consequences of inequality (1.1) are the finiteness of moments of all orders of X and, in particular, the relations $\mathbb{E}X = 0$ and $\mathbb{E}X^2 \le \sigma^2$, which follow by an expansion of both sides of (1.1) around t = 0. Here the possible case $\sigma^2 = \mathrm{Var}(X)$ is of particular interest. The following definition seems to have appeared first in the work of Buldygin and Kozachenko [7], who called this property "strongly subgaussian".
Definition. The random variable X is called strictly subgaussian, or the distribution of X is strictly subgaussian, if (1.1) holds with the optimal constant $\sigma^2 = \mathrm{Var}(X)$.
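As a quick numerical sanity check of (1.1)–(1.2) (this sketch is ours, not part of the original text; function names are illustrative), one can verify the Kearns–Saul bound for the centered Bernoulli distribution on a grid of t-values and confirm that the symmetric case p = 1/2 is strictly subgaussian:

```python
import math

def bernoulli_mgf(p, t):
    # centered Bernoulli p*delta_q + q*delta_{-p}, q = 1 - p (mean zero)
    q = 1.0 - p
    return p * math.exp(t * q) + q * math.exp(-t * p)

def kearns_saul_sigma2(p):
    # the subgaussian constant (1.2); the limit as p -> 1/2 equals 1/4
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return 0.25
    return (p - q) / (2.0 * (math.log(p) - math.log(q)))

# (1.1) holds with sigma^2 from (1.2) on a grid of t-values
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    s2 = kearns_saul_sigma2(p)
    assert all(bernoulli_mgf(p, t) <= math.exp(s2 * t * t / 2) + 1e-12
               for t in [k / 10 for k in range(-80, 81)])

# strict subgaussianity at p = 1/2: sigma^2 coincides with Var(X) = pq = 1/4
assert abs(kearns_saul_sigma2(0.5) - 0.25) < 1e-12
```

For asymmetric p, the constant (1.2) is strictly smaller than 1/4, illustrating that the optimal proxy variance need not equal the variance in general.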
Such distributions appear in a natural way in a variety of mathematical problems, as well as in statistical mechanics and quantum field theory. For example, under the name "sharp subgaussianity", this class was recently considered in the work by Guionnet and Husson [10] as a condition for LDPs for the largest eigenvalue of Wigner matrices with the same rate function as in the case of Gaussian entries. Our interest has been motivated by the study of the central limit theorem with respect to information-theoretic distances. Let us clarify this connection in the following statement.
Given independent copies $(X_n)_{n\ge1}$ of a random variable X with mean zero and variance one, suppose that the normalized sums $Z_n = \frac{1}{\sqrt n}\,(X_1 + \dots + X_n)$ have densities $p_n$ for large n. The Rényi divergence of order α > 0 from the distribution of $Z_n$ to the standard normal law with density ϕ (or the relative α-entropy) is defined by

$$D_\alpha(p_n\|\varphi) = \frac{1}{\alpha - 1}\,\log \int_{-\infty}^{\infty} \Big(\frac{p_n(x)}{\varphi(x)}\Big)^{\alpha}\,\varphi(x)\,dx.$$

It is non-decreasing as a function of α, representing a strong distance-like quantity. Here, the case α = 1 corresponds to the relative entropy (Kullback–Leibler distance), and another important case, α = 2, leads to a function of the Pearson χ² distance.
Theorem 1.1. Suppose that $D_\alpha(p_n\|\varphi) < \infty$ for every α and some $n = n_\alpha$. For the convergence $D_\alpha(p_n\|\varphi) \to 0$ as $n \to \infty$ with an arbitrary α > 0, it is necessary and sufficient that X is strictly subgaussian.

This characterization follows from the results of [3], which will be discussed later. Of course, the property of being strictly subgaussian does not require that the distribution of X have a density. From (1.2) it already follows that the symmetric Bernoulli distribution belongs to this class. More examples are discussed in Arbel, Marchal and Nguyen [1], where it is also shown that the distribution of X need not be symmetric. The problem of characterizing the whole class of such distributions is still open and seems to be highly non-trivial. Nevertheless, there is a simple general sufficient condition for strict subgaussianity, given by Newman [23] (Theorem 4; see also [8], Chapter 1) in terms of the location of zeros of the characteristic function

$$f(t) = \mathbb{E}\,e^{itX}, \quad t \in \mathbb{R}.$$

Note that the subgaussian property (1.1) ensures that f has an analytic extension from the real line to the whole complex plane as an entire function of order at most 2.
Theorem 1.2. Let X be a subgaussian random variable with mean zero. If all zeros of f(z) are real, then X is strictly subgaussian.

This condition is easily verified for many interesting classes including, for example, arbitrary Bernoulli sums and (finite or infinite) convolutions of uniform distributions on bounded symmetric intervals.
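Both model cases mentioned here can be spot-checked directly against (1.1) (a numerical sketch of ours): cos t and sin(at)/(at) have only real zeros, while the corresponding Laplace transforms obey the Gaussian bound with σ² = 1 and σ² = a²/3, respectively.

```python
import math

ts = [k / 20 for k in range(1, 401)]

# symmetric Bernoulli on {-1, +1}: f(t) = cos t (only real zeros),
# L(t) = cosh t <= exp(t^2/2), with sigma^2 = Var(X) = 1
assert all(math.cosh(t) <= math.exp(t * t / 2) for t in ts)

# uniform on [-a, a]: f(t) = sin(at)/(at) (only real zeros pi*k/a),
# L(t) = sinh(at)/(at) <= exp(a^2 t^2/6), with sigma^2 = Var(X) = a^2/3
a = 2.0
assert all(math.sinh(a * t) / (a * t) <= math.exp(a * a * t * t / 6) for t in ts)
```

Both inequalities also follow by comparing Taylor coefficients: $(2k+1)! \ge 6^k k!$ gives the uniform case, for instance.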
The probability distributions of Theorem 1.2 form an important class L, introduced and studied by Newman in the mid-1970s in connection with the Lee–Yang property, which naturally arises in the context of ferromagnetic Ising models, cf. [23, 24, 25, 26]. We will recall the argument and several properties of this class in Section 4. Note that if the characteristic function f(z) of a subgaussian random variable X does not have any real or complex zeros, a well-known theorem due to Marcinkiewicz [22] implies that the distribution of X is already Gaussian. Thus, non-normal subgaussian distributions necessarily have zeros. Towards the characterization problem, the main purpose of this note is to explore two natural subclasses of distributions outside L that are still strictly subgaussian. First, we extend Theorem 1.2 in terms of zeros of characteristic functions.
Theorem 1.3. Let X be a subgaussian random variable with symmetric distribution. If all zeros of f(z) with Re(z) ≥ 0 lie in the cone centered on the real axis defined by

$$|\mathrm{Arg}(z)| \le \frac{\pi}{8}, \qquad (1.4)$$

then X is strictly subgaussian.
At first sight, the condition (1.4) looks artificial. However, it turns out to be necessary in the following simple situation.

Theorem 1.4. Let X be a random variable with a symmetric subgaussian distribution. Suppose that f has exactly one zero z = x + iy in the positive quadrant x, y ≥ 0. Then X is strictly subgaussian if and only if (1.4) holds true.
As a consequence of Theorem 1.3, one can partially address the following question from the theory of entire characteristic functions (which is one of the central problems in this area): what can one say about the possible location of zeros of such functions?

Theorem 1.5. Let $(z_n)$ be a finite or infinite sequence of non-zero complex numbers in the angle (1.4), that is, with $|\mathrm{Arg}(z_n)| \le \frac{\pi}{8}$. Then there exists a symmetric strictly subgaussian distribution whose characteristic function has zeros exactly at the points $\pm z_n, \pm\bar z_n$.
It will be shown that a random variable X with such a distribution may be constructed as the sum $X = \sum_n X_n$ of independent strictly subgaussian random variables $X_n$ whose characteristic functions have zeros at the points $\pm z_n, \pm\bar z_n$ (and only at these points, as in Theorem 1.4). Moreover, this may be done with any prescribed value $\Lambda \ge \Lambda_0$, where $\Lambda_0$ is a universal constant ($\Lambda_0 \sim 5.83$).
Returning to Theorem 1.3, it will actually be shown that, if a strictly subgaussian random variable X is not normal, the inequality (1.1) may be further sharpened as follows: for any $t_0 > 0$, there exists $c = c(t_0)$, $0 < c < \sigma^2 = \mathrm{Var}(X)$, such that

$$\mathbb{E}\,e^{tX} \le e^{ct^2/2}, \quad |t| \ge t_0. \qquad (1.5)$$

In particular, such a refinement applies to Theorem 1.2. The property (1.5) is important in the study of rates in local limit theorems, such as the CLT for the Rényi divergence of infinite order. Two results in this direction will be mentioned at the end of this note.
The sharpening (1.5) raises the question of whether or not this separation-type property holds automatically for any non-normal strictly subgaussian distribution. At least, it looks natural to expect the weaker relation

$$L(t) < e^{\sigma^2 t^2/2}, \quad t \ne 0, \qquad (1.6)$$

for the Laplace transform $L(t) = \mathbb{E}\,e^{tX}$. However, the answer to this is negative; moreover, (1.6) may turn into an equality at infinitely many points t. In addition, the characteristic function f(z) may have infinitely many zeros approaching the imaginary axis $\mathrm{Arg}(z) = \frac{\pi}{2}$. To this aim, we introduce the following definition.

Definition. We say that the distribution μ of a random variable X is periodic with respect to the standard normal law γ, with period h > 0, if it has a density p(x) such that the density of μ with respect to γ, $q(x) = p(x)/\varphi(x)$, represents a periodic function with period h, that is, q(x + h) = q(x) for all $x \in \mathbb{R}$.
We denote the class of all such distributions by $F_h$ and say that X belongs to $F_h$. The following characterization in terms of Laplace transforms may be useful.
Theorem 1.6. Any random variable X in $F_h$ is subgaussian, and the Laplace transform of its distribution is representable as

$$L(t) = e^{t^2/2}\,\Psi(t), \quad t \in \mathbb{R},$$

where the function Ψ is periodic with period h. Conversely, if Ψ(t) for a subgaussian random variable X is h-periodic, then X belongs to $F_h$, as long as the characteristic function f(t) of X is integrable.
In this way, we obtain a wide class of strictly subgaussian distributions by requiring that Ψ(t) ≤ 1 for all t. As a simple example, for any sufficiently small c > 0, one obtains in this manner the Laplace transform and the characteristic function of a strictly subgaussian distribution with mean zero and variance one. In this case, we have $L(t) = e^{t^2/2}$ for all $t = \pi k$, $k \in \mathbb{Z}$, and $f(z_m) = 0$ for $z_m = a + 2\pi i m$, $m \in \mathbb{Z}$, where a > 0 depends on the parameter c. Hence $\mathrm{Arg}(z_m) \to \frac{\pi}{2}$ as $m \to \infty$.
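The exact formulas for this example did not survive extraction; one reading consistent with the stated properties (mean zero, variance one, equality in (1.1) exactly at $t = \pi k$) is $L(t) = e^{t^2/2}(1 - c\sin^4 t)$. We offer this only as an illustrative assumption, checked numerically below:

```python
import math

c = 0.1  # any sufficiently small c > 0

def L(t):
    # assumed form: L(t) = e^{t^2/2} * Psi(t) with Psi(t) = 1 - c*sin(t)^4,
    # a pi-periodic function with Psi <= 1 (our hypothetical reconstruction)
    return math.exp(t * t / 2) * (1 - c * math.sin(t) ** 4)

# Psi <= 1 gives the subgaussian bound, with equality exactly at t = pi*k
ts = [k / 100 for k in range(-1000, 1001)]
assert all(L(t) <= math.exp(t * t / 2) + 1e-12 for t in ts)
assert abs(L(math.pi) - math.exp(math.pi ** 2 / 2)) < 1e-6

# mean zero and variance one, via numerical differentiation of K = log L at 0
h = 1e-3
K = lambda t: math.log(L(t))
assert abs((K(h) - K(-h)) / (2 * h)) < 1e-9
assert abs((K(h) - 2 * K(0) + K(-h)) / (h * h) - 1) < 1e-5
```

The fourth power of the sine is what keeps the variance equal to one: a term like $\sin^2 t$ would contribute to $K''(0)$ and destroy strict subgaussianity.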
More examples based on trigonometric polynomials will be described in Section 12. The proof of Theorem 1.6 is given in Sections 10-11. Theorems 1.3 and 1.5 are proved in Sections 8-9, with preliminary steps in Sections 6-7, and Section 5 is devoted to the proof of Theorem 1.4. In Sections 3-4 we recall basic definitions and results related to the Hadamard factorization theorem and the class L. We conclude with some remarks on the central limit theorem with respect to the Rényi divergences.

In addition to the properties $\mathbb{E}X = 0$ and $\mathbb{E}X^2 \le \sigma^2$, the Taylor expansion of the exponential function in (1.1) around zero implies as well that necessarily $\mathbb{E}X^3 = 0$ and $\mathbb{E}X^4 \le 3\sigma^4$. Here equality is attained for symmetric normal distributions (but not exclusively so).
Turning to other properties and some examples, first let us emphasize the following two immediate consequences of (1.1).
Proposition 2.1. If the random variables $X_1, \dots, X_n$ are independent and strictly subgaussian, then their sum $X = X_1 + \dots + X_n$ is strictly subgaussian as well.

Proposition 2.2. If strictly subgaussian random variables $(X_n)_{n\ge1}$ converge weakly in distribution to a random variable X with finite second moment, and $\mathrm{Var}(X_n) \to \mathrm{Var}(X)$ as $n \to \infty$, then X is strictly subgaussian.
Proof. By the assumption, putting $\sigma_n^2 = \mathrm{Var}(X_n)$, we have $\mathbb{E}\,e^{tX_n} \le e^{\sigma_n^2 t^2/2}$ for all $t \in \mathbb{R}$, so that, by the weak convergence and Fatou's lemma, $\mathbb{E}\,e^{tX} \le \liminf_n \mathbb{E}\,e^{tX_n} \le e^{\sigma^2 t^2/2}$ with $\sigma^2 = \mathrm{Var}(X)$.

Combining the two propositions, we obtain:

Corollary 2.3. If $\sum_{n=1}^\infty \mathrm{Var}(X_n) < \infty$ for independent, strictly subgaussian summands $X_n$, then the series $X = \sum_{n=1}^\infty X_n$ represents a strictly subgaussian random variable.
Here, the variance assumption ensures that the series $\sum_{n=1}^\infty X_n$ converges with probability one (by Kolmogorov's theorem), so that the partial sums of the series converge weakly to the distribution of X. Thus, the class of strictly subgaussian distributions is closed in the weak topology under infinite convolutions.
Obviously, it is also closed when taking convex mixtures.
Proposition 2.4. If $X_n$ are strictly subgaussian random variables with $\mathrm{Var}(X_n) = \sigma^2$, and $\mu_n$ are the distributions of $X_n$, then for any sequence $p_n \ge 0$ such that $\sum_{n=1}^\infty p_n = 1$, the random variable X with distribution $\mu = \sum_{n=1}^\infty p_n \mu_n$ is strictly subgaussian as well and has variance $\mathrm{Var}(X) = \sigma^2$.
Note also that, if X is strictly subgaussian, then λX is strictly subgaussian for any $\lambda \in \mathbb{R}$. Finally, let us give a simple sufficient condition for the property (1.5). Recall the notation $K(t) = \log \mathbb{E}\,e^{tX}$, $t \in \mathbb{R}$.

Proposition 2.5. Let X be a non-normal strictly subgaussian random variable. If the function $t \mapsto K(\sqrt t)$ is concave on the half-axis t > 0 and the function $t \mapsto K(-\sqrt t)$ is concave on the half-axis t > 0 as well, then (1.5) holds true.
Proof. Let $\mathrm{Var}(X) = \sigma^2$. For t ≥ 0, write $K(t) = \frac{\sigma^2 t^2}{2} - W(t^2)$, that is, put

$$W(s) = \frac{\sigma^2 s}{2} - K(\sqrt s), \quad s \ge 0.$$

By the assumption, W(s) is non-negative and convex in s ≥ 0, with W(0) = 0. In addition, it is $C^\infty$-smooth on $(0,\infty)$. Since X is not normal, necessarily W(s) > 0 and W′(s) > 0 for all s > 0. Using that $W'(s) \uparrow r$ as $s \to \infty$ for some $r \in (0, \infty]$, it follows that $\frac{1}{s}\,W(s)$ is non-decreasing. In particular, given $s_0 > 0$, we have $\frac{1}{s}\,W(s) \ge r(s_0) > 0$ for all $s \ge s_0$, where $r(s_0) = \frac{1}{s_0}W(s_0)$, or equivalently

$$K(t) \le \frac{(\sigma^2 - 2r(s_0))\,t^2}{2}, \quad t \ge \sqrt{s_0},$$

which is the desired conclusion with $c = \sigma^2 - 2r(s_0) < \sigma^2$. A similar argument works for t < 0 as well.
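As a numerical illustration of this proof (our sketch), take the symmetric Bernoulli distribution, where $K(t) = \log\cosh t$ and $\sigma^2 = 1$; the argument yields the explicit constant $c(t_0) = 2K(t_0)/t_0^2$ in (1.5):

```python
import math

# symmetric Bernoulli: K(t) = log cosh t, sigma^2 = Var(X) = 1
K = lambda t: math.log(math.cosh(t))

# the proof gives (1.5) with c(t0) = sigma^2 - 2*W(t0^2)/t0^2 = 2*K(t0)/t0^2
t0 = 1.0
c = 2 * K(t0) / t0 ** 2
assert 0 < c < 1
assert all(math.cosh(t) <= math.exp(c * t * t / 2) + 1e-9
           for t in [t0 + k / 10 for k in range(0, 200)])

# the key step of the proof: W(s)/s is non-decreasing, W(s) = s/2 - K(sqrt(s))
W = lambda s: s / 2 - K(math.sqrt(s))
ratios = [W(s) / s for s in [k / 10 for k in range(1, 100)]]
assert all(r1 <= r2 + 1e-12 for r1, r2 in zip(ratios, ratios[1:]))
```

At $t = t_0$ the bound is an equality, so the constant $c(t_0)$ above is best possible for this distribution.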
An application of Corollary 2.3 allows one to construct a rather rich family of probability distributions from the class L. Recall that $L(t) = \mathbb{E}\,e^{tX}$ denotes the Laplace transform.
Example 2.6. First of all, if a random variable X has a normal distribution with mean zero and variance $\sigma^2$, then it is strictly subgaussian with $L(t) = e^{\sigma^2 t^2/2}$, $t \in \mathbb{R}$.
Example 2.7. If X has a symmetric Bernoulli distribution, supported on the two points ±1, then it is strictly subgaussian with $L(t) = \cosh(t) \le e^{t^2/2}$ and $f(t) = \cos(t)$.

Example 2.8. More generally, consider weighted Bernoulli sums $X = \sum_n a_n \varepsilon_n$ with independent symmetric Bernoulli random variables $\varepsilon_n$ and $\sum_n a_n^2 < \infty$. The Laplace transform and characteristic function f of X are given by $L(t) = \prod_n \cosh(a_n t)$ and $f(t) = \prod_n \cos(a_n t)$.

Example 2.9. If X is uniformly distributed on an interval [−a, a], a > 0, it is strictly subgaussian. In this case it may be represented (in the sense of distributions) as the sum $X = a\sum_{k=1}^\infty 2^{-k}\varepsilon_k$. Hence, this case is covered by the previous example, with $L(t) = \frac{\sinh(at)}{at}$.

Example 2.10. If the random variables $X_n$ are independent and uniformly distributed on the intervals $[-a_n, a_n]$ with $\sum_n a_n^2 < \infty$, then the sum $X = \sum_n X_n$ is strictly subgaussian, by Corollary 2.3.

Example 2.11. Suppose that X has density $p(x) = x^2\varphi(x)$, where $\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$ is the standard normal density. Then $\mathbb{E}X = 0$, $\sigma^2 = \mathbb{E}X^2 = 3$, and $L(t) = (1 + t^2)\,e^{t^2/2} \le e^{3t^2/2}$. Hence, X is strictly subgaussian.
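The binary-digit representation in Example 2.9 can be checked on the level of Laplace transforms (a quick numerical sketch, ours):

```python
import math

# Example 2.9 via Example 2.8: the uniform distribution on [-a, a] equals in
# distribution a * sum_k 2^{-k} eps_k, so the Laplace transforms multiply:
#     prod_{k>=1} cosh(a*t / 2^k) = sinh(a*t)/(a*t)
a, t = 1.5, 0.8
prod = 1.0
for k in range(1, 60):
    prod *= math.cosh(a * t / 2 ** k)
assert abs(prod - math.sinh(a * t) / (a * t)) < 1e-12
```

The identity follows by telescoping $\sinh x = 2\sinh(x/2)\cosh(x/2)$, and the product converges extremely fast since $\cosh(x/2^k) = 1 + O(4^{-k})$.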
Example 2.12. More generally, if X has a density of the form $p(x) = \frac{1}{(2k-1)!!}\,x^{2k}\varphi(x)$ with an integer k ≥ 1, then the characteristic function is $f(t) = \frac{(-1)^k}{(2k-1)!!}\,H_{2k}(t)\,e^{-t^2/2}$, and X is strictly subgaussian. The last assertion follows from Theorem 1.2, since the Chebyshev–Hermite polynomials have only real zeros.

Hadamard's and Goldberg-Ostrovskiȋ's Theorems
All the previous examples may be included as partial cases of a more general setup. First, let us recall some basic definitions and notations related to the Hadamard theorem from the theory of functions of a complex variable. Given an entire function f(z), introduce

$$M_f(r) = \max_{|z| = r}\,|f(z)|, \quad r > 0,$$

which characterizes the growth of f at infinity. The order of f is defined by

$$\rho = \limsup_{r\to\infty}\,\frac{\log\log M_f(r)}{\log r}.$$

Thus, ρ is the optimal value such that, for any ε > 0, we have $M_f(r) < e^{r^{\rho+\varepsilon}}$ for all large r. If f is a polynomial, then ρ = 0. If ρ is finite, then the type of f is defined by

$$\tau = \limsup_{r\to\infty}\,\frac{\log M_f(r)}{r^\rho}.$$

Thus, τ is the optimal value such that, for any ε > 0, we have $M_f(r) < e^{(\tau+\varepsilon)\,r^\rho}$ for all sufficiently large r. If 0 < τ < ∞, the function f is said to be of normal type.
For integers p ≥ 0, introduce the functions

$$G_p(u) = (1 - u)\,\exp\Big(u + \frac{u^2}{2} + \dots + \frac{u^p}{p}\Big),$$

called the primary factors, with the convention that $G_0(u) = 1 - u$. Given a sequence of complex numbers $z_n \ne 0$ such that $|z_n| \uparrow \infty$, one considers a function of the form

$$\Pi(z) = \prod_n G_p\Big(\frac{z}{z_n}\Big), \qquad (3.1)$$

called a canonical product. An integer p ≥ 0 is called the genus of this product if it is the smallest integer such that

$$\sum_n \frac{1}{|z_n|^{p+1}} < \infty. \qquad (3.2)$$

There is a simple estimate $\log|G_p(u)| \le A_p\,|u|^{p+1}$, where the constant $A_p$ depends on p only. Therefore, the product in (3.1) is uniformly convergent on bounded sets, as long as (3.2) is fulfilled.
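As an illustration of (3.1)–(3.2) (this numerical sketch is ours), take zeros at the non-zero integers: $\sum 1/n$ diverges while $\sum 1/n^2$ converges, and pairing $\pm n$ turns the genus-1 factors into an absolutely convergent product in $z^2$, which recovers the classical factorization of $\sin(\pi z)/(\pi z)$:

```python
import math

# zeros z_n = +-n: the exponential factors of G_1 cancel in pairs, leaving
#     prod_{n>=1} (1 - z^2/n^2) = sin(pi z)/(pi z)
def canonical_product(z, N=100000):
    p = 1.0
    for n in range(1, N + 1):
        p *= 1.0 - (z * z) / (n * n)
    return p

for z in [0.25, 0.5, 1.5]:
    target = math.sin(math.pi * z) / (math.pi * z)
    assert abs(canonical_product(z) - target) < 1e-4
```

The slow O(1/N) convergence of the truncated product reflects the fact that the paired sequence lies exactly at the convergence exponent 2.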
See, e.g., Levin [19] for the following classical theorem.
Theorem 3.1 (Hadamard). Any entire function f of finite order ρ can be represented in the form

$$f(z) = z^m\,e^{P(z)}\,\prod_n G_p\Big(\frac{z}{z_n}\Big). \qquad (3.3)$$

Here $z_n$ are the non-zero roots of f(z), the genus of the canonical product satisfies p ≤ ρ, P(z) is a polynomial of degree ≤ ρ, and m ≥ 0 is the multiplicity of the zero at the origin.
In order to describe the convergence of the canonical product, assume that f(z) has an infinite sequence of non-zero roots $z_n$ arranged in increasing order of their moduli, so that $0 < |z_1| \le |z_2| \le \dots$ Define the convergence exponent of the sequence $(z_n)$ by

$$\rho_1 = \inf\Big\{\lambda > 0 : \sum_n \frac{1}{|z_n|^{\lambda}} < \infty\Big\}.$$

A theorem due to Borel asserts that the order ρ of the canonical product Π(z) satisfies $\rho \le \rho_1$. Moreover, Theorem 6 from [19], p. 16, states that the convergence exponent of the zeros of any entire function f(z) does not exceed its order: $\rho_1 \le \rho$. Thus, for canonical products, the convergence exponent of the zeros is equal to the order of the function: $\rho = \rho_1$. There is also the following elementary relation between the convergence exponent and the genus of the canonical product: $p \le \rho_1 \le p + 1$. Assuming that $\rho_1$ is an integer, we have $p = \rho_1 - 1$ in the case where the series $\sum_n |z_n|^{-\rho_1}$ is convergent, and $p = \rho_1$ otherwise.

The following theorem due to Goldberg and Ostrovskiȋ [11] refines Theorem 3.1 for the class of ridge entire functions whose zeros are all real. Recall that f is a ridge function if it satisfies $|f(x + iy)| \le |f(iy)|$ for all $x, y \in \mathbb{R}$.

Theorem 3.2 (Goldberg–Ostrovskiȋ). Suppose that an entire ridge function f of finite order has only real roots $x_n$. Then it can be represented in the form

$$f(z) = c\,e^{i\beta z - \gamma z^2/2}\,\prod_n \Big(1 - \frac{z}{x_n}\Big)\,e^{z/x_n} \qquad (3.4)$$

with some $c \in \mathbb{C}$, $\beta \in \mathbb{R}$, $\gamma \ge 0$, and $\sum_n x_n^{-2} < \infty$. We refer to [11]; see also Kamynin [16] for generalizations of Theorem 3.2 to the case where the zeros of f are not necessarily real.

Characteristic Functions with Real Zeros
We are now prepared to prove Theorem 1.2, including the relation (1.5), which in the non-Gaussian case is stronger than (1.1).
Thus, let X be a subgaussian random variable with mean zero and variance $\sigma^2 = \mathrm{Var}(X)$. Then the inequality (1.1) may be extended to the complex plane in the form

$$|f(z)| \le e^{b|z|^2/2}, \quad z \in \mathbb{C},$$

for some constant $b \ge \sigma^2$, where f is the characteristic function of X. Hence, f is a ridge entire function of order ρ ≤ 2. We are therefore able to apply Theorem 3.2, which yields the representation (3.4) for some $c \in \mathbb{C}$, $\gamma \ge 0$, $\beta \in \mathbb{R}$, and for a finite or infinite sequence of real numbers $\pm z_n$, $z_n > 0$, which are all the zeros of f (this set may be empty; recall that $f(-\bar z) = \overline{f(z)}$, so the real zeros come in pairs $\pm z_n$). Since f(0) = 1 and f′(0) = 0, we necessarily have c = 1 and β = 0. Hence, this representation is simplified to

$$f(z) = e^{-\gamma z^2/2}\,\prod_n \Big(1 - \frac{z^2}{z_n^2}\Big). \qquad (4.1)$$

Since $f''(0) = -\sigma^2$, we also have

$$\gamma + 2\sum_n \frac{1}{z_n^2} = \sigma^2, \qquad (4.2)$$

so that $\gamma \le \sigma^2$. Applying (4.1) with z = −it, $t \in \mathbb{R}$, we get a similar representation for the Laplace transform:

$$L(t) = \mathbb{E}\,e^{tX} = e^{\gamma t^2/2}\,\prod_n \Big(1 + \frac{t^2}{z_n^2}\Big). \qquad (4.3)$$

Using $1 + x \le e^x$ ($x \in \mathbb{R}$), we see that the right-hand side above does not exceed $e^{\sigma^2 t^2/2}$, where we used (4.2). Hence (4.3) leads to the desired bound (1.1), and Theorem 1.2 is proved.

Let us also verify the property (1.5) in the case where the random variable X is not normal. Then the product in (4.3) is not empty, and therefore $\gamma < \sigma^2$. Let us rewrite (4.3) as

$$K(t) = \log L(t) = \frac{\gamma t^2}{2} + V(t^2), \qquad V(s) = \sum_n \log\Big(1 + \frac{s}{z_n^2}\Big).$$

Since the function V is concave, so is $s \mapsto K(\sqrt s)$, and it remains to refer to Proposition 2.5.
Remark. Using (4.2), let us rewrite (4.1) with $z = t \in \mathbb{R}$ in the form

$$f(t) = e^{-\gamma' t^2/2}\,\prod_n \Big(1 - \frac{t^2}{z_n^2}\Big)\,e^{-t^2/(2z_n^2)}, \qquad \gamma' = \sigma^2 - 3\sum_n \frac{1}{z_n^2}.$$

Here, the terms in the product represent the characteristic functions of random variables $\frac{1}{z_n}X_n$ such that all $X_n$ have density $p(x) = x^2\varphi(x)$, which we discussed in Example 2.11. Hence, if $3\sum_n z_n^{-2} \le \sigma^2$, then

$$X \,\overset{d}{=}\, \sqrt{\gamma'}\,Z + \sum_n \frac{1}{z_n}\,X_n,$$

assuming that the $X_n$ are independent and $Z \sim N(0, 1)$ is independent of all $X_n$. Note that (4.1) does not always define a characteristic function. For example, when there is only one term in the product, we have $f(t) = e^{-\gamma t^2/2}\big(1 - \frac{t^2}{z_1^2}\big)$, which is a characteristic function only when $\gamma z_1^2 \ge 1$ ([20], p. 34). We will return to this question in Section 8.
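The building block of this remark, that the density $p(x) = x^2\varphi(x)$ of Example 2.11 has characteristic function $(1 - t^2)\,e^{-t^2/2}$, can be confirmed by direct numerical integration (our check):

```python
import math

# density p(x) = x^2 * phi(x) from Example 2.11; its characteristic function
# is -d^2/dt^2 e^{-t^2/2} = (1 - t^2) e^{-t^2/2}
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def char_fn(t, h=1e-3, X=12.0):
    # Riemann sum of Int cos(t*x) x^2 phi(x) dx (the imaginary part vanishes)
    n = int(2 * X / h)
    return sum(math.cos(t * (-X + k * h)) * (-X + k * h) ** 2 * phi(-X + k * h)
               for k in range(n)) * h

for t in [0.0, 0.5, 1.0, 2.0]:
    assert abs(char_fn(t) - (1 - t * t) * math.exp(-t * t / 2)) < 1e-6
```

Rescaling by $1/z_n$ gives the factor $(1 - t^2/z_n^2)\,e^{-t^2/(2z_n^2)}$, which is exactly the n-th term of the product above.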
Properties of the class L. Following Newman [23], let us emphasize several remarkable properties of strictly subgaussian distributions whose characteristic functions have only real zeros. Starting from (4.3), one can represent the log-Laplace transform of X as

$$K(t) = \log L(t) = \frac{\gamma t^2}{2} + \sum_n \log\Big(1 + \frac{t^2}{z_n^2}\Big).$$

Hence the cumulants $\gamma_{2m}$ of even order 2m of X are given for m ≥ 2 by

$$\gamma_{2m} = (-1)^{m-1}\,\frac{(2m)!}{m}\,\sum_n \frac{1}{z_n^{2m}}.$$

In particular, the distribution of X has to be symmetric about the origin, with $(-1)^{m-1}\gamma_{2m} \ge 0$. As was also shown in [23], the cumulants and the moments of X admit Gaussian-type bounds; in particular, the inequalities (4.5) hold for all integers k ≥ 0 and $t \in \mathbb{R}$. Since the class L is closed under convolutions, the second inequality in (4.5) continues to hold for weighted sums of independent, strictly subgaussian random variables. This provides a natural extension of Khinchine's inequality for Bernoulli sums, as noticed in [24] (cf. also the recent work [12]).

More Examples of Strictly Subgaussian Distributions
In connection with the problem of the location of zeros, we now examine probability distributions with characteristic functions of the form

$$f(t) = e^{-t^2/2}\,\big(1 - \alpha t^2 + \beta t^4\big), \qquad (5.1)$$

where $\alpha, \beta \in \mathbb{R}$ are parameters. It was already mentioned that when β = 0, we obtain a characteristic function if and only if 0 ≤ α ≤ 1. As we will see, in the general case, it is necessary that β ≥ 0 for f(t) to be a characteristic function (although negative values of α are possible for small β).
Before deriving a full characterization, first let us emphasize the following.
Proposition 5.1. A random variable ξ with characteristic function f(t) of the form (5.1) is strictly subgaussian if and only if β ≥ 0 and $\alpha \ge \sqrt{2\beta}$.

As already emphasized, if a random variable X is subgaussian (even if it is not strictly subgaussian), its characteristic function f(t) may be extended to the complex plane as an entire function $f(z) = \mathbb{E}\,e^{izX}$ of order ρ ≤ 2 and of finite type, as in the strictly subgaussian case (5.2). Since in general $f(-\bar z) = \overline{f(z)}$, any zero z = x + iy of f ($x, y \in \mathbb{R}$) is accompanied by the zero $-\bar z = -x + iy$. If in addition the distribution of X is symmetric about zero, then $\bar z$ and −z are also zeros of f. Thus, in this case, with every non-real zero z the characteristic function has 3 more distinct zeros, and hence we have 4 distinct zeros $\pm x \pm iy$, x, y > 0. One can now apply Proposition 5.1 to prove Theorem 1.4.
Proof of Theorem 1.4. Given a random variable X with a symmetric subgaussian distribution, suppose that its characteristic function has exactly one zero z = x + iy in the positive quadrant x, y ≥ 0. We need to show that X is strictly subgaussian if and only if

$$|\mathrm{Arg}(z)| \le \frac{\pi}{8}. \qquad (5.3)$$

The case where z = x is real is covered by Theorem 1.2. The argument below also works in this case, but for definiteness let us assume that z is complex, so that x, y > 0 (the case x = 0 and y > 0 is impossible, since then $f(z) = f(iy) \ge 1$).
Thus, let f(z) have the four distinct roots $z_1 = z$, $z_2 = -z = -x - iy$, $z_3 = \bar z = x - iy$, $z_4 = -\bar z = -x + iy$. Applying Hadamard's theorem, we get a representation

$$f(z) = e^{P(z)}\,\prod_{k=1}^4 \Big(1 - \frac{z}{z_k}\Big),$$

where P(z) is a quadratic polynomial. Since f(0) = 1, necessarily P(0) = 0. Also, by the symmetry of the distribution of X, we have f(z) = f(−z), which implies P(z) = P(−z) for all $z \in \mathbb{C}$. It follows that P(z) has no linear term, so that $P(z) = -\frac12\,\gamma z^2$ for some $\gamma \in \mathbb{C}$. Thus, putting $w = a + bi = \frac{1}{x+iy}$, we have

$$f(t) = e^{-\gamma t^2/2}\,\big(1 - \alpha t^2 + \beta t^4\big), \qquad \alpha = 2(a^2 - b^2), \quad \beta = (a^2 + b^2)^2. \qquad (5.4)$$

Comparing both sides of (5.4) near zero according to Taylor's expansion, we get that

$$\sigma^2 = \gamma + 2\alpha. \qquad (5.5)$$

In particular, γ must be a real number, necessarily positive (since otherwise f(t) would not be bounded on the real axis). Moreover, the case a = |b| is impossible, since then $f(t) = e^{-\sigma^2 t^2/2}\big(1 + 4b^4 t^4\big)$; rescaling the variable and applying Proposition 5.1 with α = 0, we would conclude that the random variable X is not strictly subgaussian. Thus, let a ≠ |b| (as we will see, necessarily $\sigma^2 > \gamma$). Again rescaling the t-variable, one may assume that γ = 1, in which case the representation (5.4) becomes

$$f(t) = e^{-t^2/2}\,\big(1 - 2(A - B)\,t^2 + (A + B)^2\,t^4\big), \qquad A = a^2, \quad B = b^2.$$

One can now apply Proposition 5.1 with parameters α = 2(A − B), β = (A + B)². Since the condition α ≥ 0 is necessary for f(t) to be the characteristic function of a strictly subgaussian distribution, we may assume that A ≥ B. In fact, by Proposition 5.1, strict subgaussianity is equivalent to $\alpha \ge \sqrt{2\beta}$, that is, to $2(A - B) \ge \sqrt 2\,(A + B)$, or $A \ge (3 + 2\sqrt 2)\,B$. To express this in polar coordinates, put a = r cos θ, b = r sin θ with $r^2 = a^2 + b^2$ and $|\theta| \le \frac{\pi}{2}$. Since A ≥ B, that is, a ≥ |b|, necessarily $|\theta| \le \frac{\pi}{4}$, and the above turns out to be the same as $\cos 2\theta \ge \frac{1}{\sqrt 2}$, i.e., $|\theta| \le \frac{\pi}{8}$. Since $\theta = \mathrm{Arg}(a + bi) = -\mathrm{Arg}(z)$, the desired characterization (5.3) follows.
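The equivalence between the cone $|\mathrm{Arg}(z)| \le \pi/8$ and the condition $\alpha^2 \ge 2\beta$, together with the resulting bound $1 + \alpha t^2 + \beta t^4 \le e^{\alpha t^2}$, can be spot-checked numerically (a sketch of ours; tolerances are arbitrary):

```python
import math

# the angle |Arg z| <= pi/8 for w = 1/z = a + bi means b/a <= tan(pi/8)
# = sqrt(2) - 1, i.e. alpha^2 >= 2*beta with alpha = 2(a^2 - b^2),
# beta = (a^2 + b^2)^2
assert abs(math.tan(math.pi / 8) - (math.sqrt(2) - 1)) < 1e-12

def params(a, b):
    return 2 * (a * a - b * b), (a * a + b * b) ** 2

def ok(a, b):
    alpha, beta = params(a, b)
    return alpha >= 0 and alpha * alpha >= 2 * beta - 1e-9

# boundary case b = (sqrt(2)-1)*a: 1 + alpha*s + beta*s^2 <= exp(alpha*s), s = t^2
alpha, beta = params(1.0, math.sqrt(2) - 1)
assert ok(1.0, math.sqrt(2) - 1)
assert all(1 + alpha * s + beta * s * s <= math.exp(alpha * s) + 1e-12
           for s in [k / 10 for k in range(0, 500)])

# outside the cone (b/a > tan(pi/8)) the bound fails for small t
alpha, beta = params(1.0, 0.5)
assert not ok(1.0, 0.5)
assert any(1 + alpha * s + beta * s * s > math.exp(alpha * s)
           for s in [k / 100 for k in range(1, 200)])
```

On the boundary one has $\alpha^2 = 2\beta$ exactly, so the comparison $e^{\alpha s} \ge 1 + \alpha s + \frac{\alpha^2}{2}s^2$ is tight in the $s^2$ term.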

One Characterization of Characteristic Functions
It remains to decide whether or not the characteristic functions in Proposition 5.1 with non-real zeros do exist. Therefore, we now turn to the characterization of the property that the functions of the form

$$f(t) = e^{-t^2/2}\,\big(1 - \alpha t^2 + \beta t^4\big) \qquad (6.1)$$

are positive definite. The more general class of functions $f(t) = e^{-\gamma t^2/2}\,(1 - \alpha t^2 + \beta t^4)$, γ > 0, is reduced to (6.1) by rescaling the t-variable.
Proposition 6.1. The equality (6.1) defines a characteristic function if and only if the point (α, β) belongs to one of the following two regions:

$$0 \le \beta \le \frac13, \qquad 4\beta - 2\sqrt{\beta(1 - 2\beta)} \,\le\, \alpha \,\le\, 1 + 3\beta, \qquad (6.2)$$

$$\frac13 \le \beta \le \frac12, \qquad 4\beta - 2\sqrt{\beta(1 - 2\beta)} \,\le\, \alpha \,\le\, 4\beta + 2\sqrt{\beta(1 - 2\beta)}. \qquad (6.3)$$

The expression on the left-hand sides in (6.2)–(6.3) is negative if and only if $\beta < \frac16$. Hence, for such values of β, the parameter α may be negative.
Combining Propositions 5.1 and 6.1, we obtain a full characterization of strictly subgaussian distributions with characteristic functions of the form (6.1). To this aim, one should complement (6.2)–(6.3) with the bound $\alpha \ge \sqrt{2\beta}$. To describe the full region, we need to solve the corresponding inequalities. First, it should be clear that $\sqrt{2\beta}$ is smaller than the right-hand sides of (6.2)–(6.3) for all $0 \le \beta \le \frac12$. In this β-interval, we also have to compare $\sqrt{2\beta}$ with the common left-hand side, that is, to examine the inequality $\sqrt{2\beta} \ge 4\beta - 2\sqrt{\beta(1 - 2\beta)}$. The latter is fulfilled automatically for $\beta \le \frac14$. For $\frac14 \le \beta \le \frac12$, squaring the above inequality, we arrive at the quadratic inequality $144\beta^2 - 56\beta + 1 \le 0$. The corresponding quadratic equation has two real roots, one of which, 0.0188..., is out of our interval, while the other one, $\beta_0 = 0.3701...$, belongs to the interval $(\frac13, \frac12)$. Therefore, the left-hand side in (6.2) should be replaced with $\sqrt{2\beta}$ on the whole interval $0 \le \beta \le \frac13$, while the lower bounds in (6.3) should be properly changed for $\beta \le \beta_0$ and kept for $\beta \ge \beta_0$. That is, we obtain:

Proposition 6.2. The equality (6.1) defines a characteristic function of a strictly subgaussian distribution if and only if

$$0 \le \beta \le \frac13, \qquad \sqrt{2\beta} \le \alpha \le 1 + 3\beta, \qquad (6.4)$$

or

$$\frac13 \le \beta \le \beta_0, \qquad \sqrt{2\beta} \le \alpha \le 4\beta + 2\sqrt{\beta(1 - 2\beta)}, \qquad (6.5)$$

or

$$\beta_0 \le \beta \le \frac12, \qquad 4\beta - 2\sqrt{\beta(1 - 2\beta)} \le \alpha \le 4\beta + 2\sqrt{\beta(1 - 2\beta)}. \qquad (6.6)$$

Proof of Proposition 6.1. Recall that the Chebyshev–Hermite polynomial $H_k(x)$ of degree k = 0, 1, 2, ... is defined via the identity $\varphi^{(k)}(x) = (-1)^k H_k(x)\varphi(x)$. In particular, $H_2(x) = x^2 - 1$ and $H_4(x) = x^4 - 6x^2 + 3$. Equivalently, for even orders,

$$\int_{-\infty}^{\infty} e^{itx}\,H_{2k}(x)\,\varphi(x)\,dx = (it)^{2k}\,e^{-t^2/2}.$$

Therefore, the function in (6.1) represents the Fourier transform of the function

$$p(x) = \varphi(x)\,\big(1 + \alpha H_2(x) + \beta H_4(x)\big),$$

whose total integral is f(0) = 1. Hence, p(x) represents a probability density if and only if p(x) ≥ 0 for all x, that is, putting $y = x^2$, if and only if

$$\psi(y) = (1 - \alpha + 3\beta) + (\alpha - 6\beta)\,y + \beta y^2 \ge 0 \quad \text{for all } y \ge 0.$$

Choosing y = 0 and letting $y \to \infty$, we obtain the necessary conditions

$$\alpha \le 1 + 3\beta, \qquad \beta \ge 0. \qquad (6.7)$$

Assuming this, a sufficient condition for the inequality ψ(y) ≥ 0 to hold for all y ≥ 0 is α ≥ 6β. As a result, we obtain a natural region for the parameters, namely

$$6\beta \le \alpha \le 1 + 3\beta, \qquad (6.8)$$

for which f(t) in (6.1) is a characteristic function.
In the case α < 6β, we obtain a second region. Note that the quadratic function $\psi(y) = c_0 + 2c_1 y + c_2 y^2$ with $c_0, c_2 \ge 0$ and $c_1 < 0$ is non-negative in y ≥ 0 if and only if $c_1^2 \le c_0 c_2$. For the coefficients $c_2 = \beta > 0$, $2c_1 = \alpha - 6\beta$, and $c_0 = 1 - \alpha + 3\beta$, this condition reads $(6\beta - \alpha)^2 \le 4\beta\,(1 - \alpha + 3\beta)$. Thus, necessarily $\beta \le \frac12$, and then the admissible values of α are described by the relations

$$4\beta - 2\sqrt{\beta(1 - 2\beta)} \,\le\, \alpha \,\le\, 4\beta + 2\sqrt{\beta(1 - 2\beta)}, \qquad (6.9)$$

in addition to the assumption α < 6β and the necessary conditions in (6.7). If $\frac13 \le \beta \le \frac12$, we arrive at the desired relations in (6.3), since in this case $6\beta \ge 4\beta + 2\sqrt{\beta(1 - 2\beta)}$ and $1 + 3\beta \ge 4\beta + 2\sqrt{\beta(1 - 2\beta)}$. In the case $\beta \le \frac13$, the upper bound in (6.9) will hold automatically, since then $6\beta \le 4\beta + 2\sqrt{\beta(1 - 2\beta)}$. So, for the values α < 6β and $\beta \le \frac13$, (6.9) is simplified to

$$4\beta - 2\sqrt{\beta(1 - 2\beta)} \,\le\, \alpha \,<\, 6\beta. \qquad (6.10)$$

It remains to take the union of the two regions described by (6.10) and (6.8), and then we arrive at (6.2).
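The Fourier-analytic step of this proof, that (6.1) is the transform of $\varphi(x)\,(1 + \alpha H_2(x) + \beta H_4(x))$, can be verified numerically; the parameter pair below is our choice inside the admissible region:

```python
import math

# (6.1) as the Fourier transform of p(x) = phi(x)(1 + alpha*H2(x) + beta*H4(x)),
# with H2(x) = x^2 - 1 and H4(x) = x^4 - 6x^2 + 3; (alpha, beta) is our choice
# with alpha < 6*beta and c1^2 <= c0*c2, so psi(y) > 0 for all y >= 0
alpha, beta = 0.9, 0.3

def p(x):
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return phi * (1 + alpha * (x * x - 1) + beta * (x ** 4 - 6 * x * x + 3))

def fourier(t, h=1e-3, X=12.0):
    n = int(2 * X / h)
    return sum(math.cos(t * (-X + k * h)) * p(-X + k * h) for k in range(n)) * h

for t in [0.0, 0.7, 1.3, 2.0]:
    f = math.exp(-t * t / 2) * (1 - alpha * t * t + beta * t ** 4)
    assert abs(fourier(t) - f) < 1e-6

# nonnegativity of p, i.e. psi(y) >= 0 in the notation of the proof
assert all(p(-6 + k / 50) >= -1e-12 for k in range(0, 601))
```

For this pair, $\psi(y) = 1 - 0.9y + 0.3y^2$ has negative discriminant, so p is a genuine probability density and (6.1) is positive definite.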

Strictly Subgaussian Symmetric Distributions with Characteristic Functions Having Exactly One Non-trivial Zero
One may illustrate Proposition 6.2 by the following simple example. For $\beta = \frac13$, admissible values of α cover the interval $\sqrt{2/3} \le \alpha \le 2$, following both (6.4) and (6.5). Choosing $\alpha = \sqrt{2/3}$, we obtain the characteristic function of a strictly subgaussian random variable. It has four distinct complex zeros $z_k$, determined from the equation $1 - \alpha z^2 + \beta z^4 = 0$ and conveniently written by means of $w = a + bi$. More generally, consider

$$f(t) = e^{-t^2/2}\,\big(1 - w^2 t^2\big)\big(1 - \bar w^2 t^2\big), \qquad (7.1)$$

so that, in the complex plane, f has the four zeros $\pm\frac1w, \pm\frac1{\bar w}$.

Proposition 7.1. For every $0 \le a \le 2^{-1/4}$, there is a maximal value $b(a) \ge 0$ such that f(t) in (7.1) is the characteristic function of a strictly subgaussian random variable whenever $|b| \le b(a)$. Moreover, there exists a universal constant $0 < a_0 < 2^{-1/4}$, $a_0 \sim 0.7391$, such that for $0 \le a \le a_0$, and only for these a-values, the property $|b| \le b(a)$ is equivalent to the angle requirement $\mathrm{Arg}(w) \le \frac{\pi}{8}$. As for the values $a_0 < a \le 2^{-1/4}$, this angle must be smaller.
Proof. We may assume that b ≥ 0. The function in (7.1) may be expressed in the form (6.1) with parameters α = 2(A − B), β = (A + B)², where A = a², B = b². Since the condition α ≥ 0 is necessary for f(t) to be the characteristic function of a strictly subgaussian distribution, we may require that a ≥ b, that is, A ≥ B. In fact, as is easy to check, if $w = re^{i\theta}$, then $\alpha = 2r^2\cos 2\theta$ and $\beta = r^4$. In order to apply Proposition 6.2, first note that the above parameters satisfy $\alpha \le 2\sqrt{\beta}$.
In this case, the upper bounds in (6.4)–(6.6) are fulfilled automatically. Therefore, we only need to take into account the lower bounds in (6.4)–(6.6). Thus, f(t) in (7.2) represents the characteristic function of a strictly subgaussian distribution if and only if the corresponding lower bound on α holds. Since the condition β ≤ 1/2 is necessary, we should require that $a \le 2^{-1/4}$. Moreover, for $a = 2^{-1/4}$, there is only one admissible value b = 0, in which case w is a real number, $w = 2^{-1/4}$.
Let us recall that $\beta_0 = 0.3701...$, in which case there is a strict inequality $\alpha > \sqrt{2\beta}$ for admissible values of α in (6.6). Hence $\mathrm{Arg}(w) < \frac\pi8$ according to (6.3). Thus, $\mathrm{Arg}(w) < \frac\pi8$ for the region described in (7.5). Turning to the region of couples (A, B) as in (7.4), let us fix a value $0 < A < \sqrt{\beta_0}$. The first inequality in (7.4) is equivalent to $A \ge (3 + 2\sqrt 2)\,B$, which is the same as (7.3). The maximal admissible value is then $B = \frac{A}{3 + 2\sqrt 2}$. Therefore, in this a-interval, Proposition 7.1 holds true with $b(a) = \frac{1}{\sqrt 2 + 1}\,a$. Now, let $a_0 < a < 2^{-1/4}$. Since $A < \frac{1}{\sqrt 2}$, both (7.4) and (7.5) are fulfilled for all B small enough. Indeed, if $A \ge \sqrt{\beta_0}$ and B = 0, (7.5) holds with a strict inequality sign. To show that (7.5) is solved as B ≤ B(A) for a certain positive function B(A), it is sufficient to verify that the left-hand side of (7.5) is increasing in B (since the right-hand side is decreasing in B), which follows by a direct differentiation.

Proof of Theorem 1.3

Theorem 8.1. Let X be a subgaussian random variable with a symmetric distribution. If all zeros of f(z) with Re(z) ≥ 0 lie in the angle $|\mathrm{Arg}(z)| \le \frac{\pi}{8}$, then X is strictly subgaussian. Moreover, if X is not normal, then for any $t_0 > 0$, there exists $c = c(t_0)$, $0 < c < \sigma^2 = \mathrm{Var}(X)$, such that

$$\mathbb{E}\,e^{tX} \le e^{ct^2/2}, \quad |t| \ge t_0. \qquad (8.1)$$

In the proof of (8.1) we employ Proposition 2.5, which asserts that (8.1) would follow from the property that the function $t \mapsto \log \mathbb{E}\,e^{\sqrt t\,X}$ is concave on the positive half-axis t > 0 (in the symmetric case). In this connection, recall Proposition 5.1: a random variable ξ with characteristic function $f(t) = e^{-t^2/2}\,(1 - \alpha t^2 + \beta t^4)$ is strictly subgaussian if and only if β ≥ 0 and $\alpha \ge \sqrt{2\beta}$. In fact, the latter description is also equivalent to the concavity of the function $Q(t) = \log\big(1 + \alpha t + \beta t^2\big)$ on the half-axis t ≥ 0. That is, we have:

Lemma 8.2. Let β ≥ 0 and $\alpha \ge \sqrt{2\beta}$. Then Q(t) is concave on t ≥ 0, and the function R(t) = αt − Q(t) is convex and non-decreasing.
Indeed, by direct differentiation,

$$R''(t) = \frac{(\alpha^2 - 2\beta) + 2\alpha\beta t + 2\beta^2 t^2}{\big(1 + \alpha t + \beta t^2\big)^2} \ge 0, \quad t \ge 0,$$

from which the claim readily follows, since also R(0) = R′(0) = 0.
Proof of Theorem 8.1. We may assume that X is not normal. By the symmetry assumption, with every zero z = x + iy we also have the zeros ±x ± iy. So, one may arrange all zeros in increasing order of their moduli, coupling them as $\pm z_1, \pm\bar z_1, \dots$ Let us enumerate only the zeros $z_n = x_n + iy_n$ lying in the quadrant $x_n \ge 0$, $y_n \le 0$, and deal with $-z_n, \bar z_n, -\bar z_n$ as associated zeros. If $z_n$ is real, then we have only one associated zero $-z_n$. For simplicity of notation, let us assume that all zeros are complex.
Since X is subgaussian, the characteristic function f(t) may be extended from the real line to the complex plane as an entire function satisfying $|f(z)| \le e^{b|z|^2}$ for some constant b ≥ 0. Therefore, f is a ridge entire function of order ρ ≤ 2 and of finite type, as in the strictly subgaussian case. Thus, Hadamard's theorem is applicable, with parameters ρ ≤ 2 and p ≤ 2. In this case, the representation (3.3) takes the form

$$f(z) = e^{P(z)}\,\prod_n \pi_{p,n}(z),$$

where $\pi_{p,n}(z)$ collects the primary factors $G_p$ over the four associated zeros $\pm z_n, \pm\bar z_n$. Here, the genus of the canonical product satisfies p ≤ 2, and P(z) is a polynomial of degree at most 2 such that P(0) = 0. Thus, putting in the sequel $w_n = \frac{1}{z_n} = a_n + b_n i$, we have $P(z) = i\beta z - \frac12\gamma z^2$ for some $\beta, \gamma \in \mathbb{C}$. By the symmetry assumption, f(−z) = f(z) for all $z \in \mathbb{C}$. Since also $\pi_{p,n}(-z) = \pi_{p,n}(z)$, we conclude that β = 0. Put $Q_{p,n}(z) = \pi_{p,n}(z)$. There are three cases for the values of the genus, p = 0, p = 1, and p = 2, for which

$$Q_{0,n}(z) = Q_{1,n}(z) = 1 - \alpha_n z^2 + \beta_n z^4, \qquad Q_{2,n}(z) = \big(1 - \alpha_n z^2 + \beta_n z^4\big)\,e^{\alpha_n z^2},$$

where $\alpha_n = 2(a_n^2 - b_n^2)$ and $\beta_n = (a_n^2 + b_n^2)^2$, so that the Laplace transform is given by

$$L(t) = \mathbb{E}\,e^{tX} = e^{\gamma t^2/2}\,\prod_n Q_{p,n}(it). \qquad (8.4)$$

These functions are real-valued for $z = t \in \mathbb{R}$, as is f(t), by the symmetry assumption on the distribution of X. Hence, necessarily $\gamma \in \mathbb{R}$. Moreover, we have γ ≥ 0, since otherwise f(t) would not be bounded on the real axis $t \in \mathbb{R}$.
Since $\mathrm{Arg}(z_n) = -\mathrm{Arg}(w_n)$, we have $|\mathrm{Arg}(w_n)| \le \frac{\pi}{8}$ by the main angle hypothesis. In particular, $a_n > b_n > 0$, so that $\alpha_n > 0$ (since $x_n > 0$, $y_n < 0$). As already noticed in the proof of Theorem 1.4, the angle hypothesis is equivalent to the relation $\alpha_n^2 \ge 2\beta_n$ with the positive factors $\alpha_n = 2(a_n^2 - b_n^2)$, $\beta_n = (a_n^2 + b_n^2)^2$. We have already observed in the proof of Proposition 5.1 that, by the angle hypothesis,

$$1 + \alpha_n t^2 + \beta_n t^4 < e^{\alpha_n t^2}, \quad t > 0, \qquad (8.5)$$

so that $Q_{2,n}(it) < 1$. Moreover, this inequality can be strengthened by improving the constant $\alpha_n$ in the exponent, provided that t is bounded away from zero. We will thus repeat some steps from the proof of Proposition 5.1. However, formally we need to consider the three cases separately, according to the three possible values of p.

Genus p = 2. By the very definition of the genus, $\sum_n |z_n|^{-3} < \infty$. Since $Q_{2,n}(it) = 1 + O(\beta_n t^4)$ as $t \to 0$, the product in (8.4) is absolutely convergent. Moreover, the right-hand side of (8.4) near zero is $1 + \frac{\gamma}{2}t^2 + O(t^3)$. Hence, necessarily $\gamma = \sigma^2$, and (8.4) becomes

$$L(t) = e^{\sigma^2 t^2/2}\,\prod_n \big(1 + \alpha_n t^2 + \beta_n t^4\big)\,e^{-\alpha_n t^2}. \qquad (8.6)$$

Recalling the bound $Q_{2,n}(it) \le 1$, we conclude that

$$L(t) \le e^{\sigma^2 t^2/2}, \quad t \in \mathbb{R}, \qquad (8.7)$$

which means that X is strictly subgaussian. For the second claim of the theorem, write $K(t) = \log L(t) = \frac{\sigma^2 t^2}{2} - W(t^2)$, where

$$W(s) = \sum_n R_n(s), \qquad R_n(s) = \alpha_n s - \log\big(1 + \alpha_n s + \beta_n s^2\big),$$

and define W as above. By Lemma 8.2, and using the assumption $\alpha_n^2 \ge 2\beta_n$, all $R_n(s) > 0$ for s > 0, representing convex increasing functions. Hence, W is a convex increasing function with W(0) = 0. It remains to apply Proposition 2.5, and we obtain the property (8.1).
Genus p = 1. By the definition of the genus, the sum Σ_{n≥1} 1/|z_n|² = Σ_{n≥1} |w_n|² converges. In particular, Σ_{n≥1} α_n < ∞, and the product in (8.4) is convergent. Moreover, the right-hand side of (8.4) near zero is 1 + (γ/2 + Σ_{n≥1} α_n) t² + O(t³). Hence, necessarily σ²/2 = γ/2 + Σ_{n≥1} α_n, so that the characteristic function and the Laplace transform admit the same representation (8.6). As a result, since the summation property defining the genus has become stronger, we immediately obtain (8.7) and its improvement (8.1) from the previous step.
Genus p = 0. By the definition of the genus, the sum Σ_{n≥1} 1/|z_n| converges. Since this assumption is stronger than that of the previous step, while Q_{0,n} = Q_{1,n}, we are reduced to the previous step.
9. Proof of Theorem 1.5

As in the proof of Theorem 8.1, let us enumerate the points z_n = x_n + i y_n lying in the quadrant x_n ≥ 0, y_n ≤ 0, and deal with −z_n, z̄_n, −z̄_n as associated zeros. For simplicity of notation, we assume that all these numbers are complex (non-real). Put w_n = 1/z_n = a_n + b_n i and define, for a given sequence γ_n > 0 (to be specified later), the functions f_n(t), with α_n = 2(a_n² − b_n²) and β_n = (a_n² + b_n²)² as before. By the assumption, a_n, b_n > 0. Moreover, the angle assumption |Arg(z_n)| = Arg(w_n) ≤ π/8 is equivalent to α_n² ≥ 2β_n. Now, if γ_n is sufficiently large, f_n(t), t ∈ R, will be the characteristic function of a strictly subgaussian distribution. A full description of the minimal possible value of γ_n is provided in Proposition 7.1. More precisely, consider the function g_n(t). As we know, g_n(t) represents the characteristic function of a strictly subgaussian random variable X′_n, as long as the two conditions of Proposition 7.1 hold, where the universal constant a_0 was explicitly identified in (7.6), a_0 ≈ 0.7391. Here, the first condition is satisfied in view of (9.1), while the second one is equivalent to (9.2). Thus, subject to (9.2), f_n(t) will be the characteristic function of the strictly subgaussian random variable X_n = √γ_n X′_n. A suitable choice of γ_n ensures that the condition (9.2) is satisfied, and also Σ_n γ_n < ∞. As a result, the series Σ_n X_n converges with probability one, and the sum of the series, call it X, represents a strictly subgaussian random variable with characteristic function f(t) = ∏_{n≥1} f_n(t). By the construction, all f_n(z) have exactly the prescribed zeros.

10. Laplace Transforms with Periodic Components
We now turn to a second class of Laplace transforms: those containing periodic components. Recall that a random variable X belongs to the class F_h, h > 0, if it has a density p(x) such that the function q(x) = p(x)/ϕ(x) is periodic with period h. This section is devoted to basic properties of this class (some of them will be used in the proof of Theorem 1.6).
Proposition 10.1. If X belongs to the class F_h, then for all integers m,

E e^{mhX} = e^{(mh)²/2}. (10.1)

In particular, the random variable X is subgaussian.
Proof. By the periodicity of q, the random variable X + mh has density

p(x − mh) = q(x − mh) ϕ(x − mh) = q(x) ϕ(x) e^{mhx − (mh)²/2} = e^{mhx − (mh)²/2} p(x).

It remains to integrate this equality over x, which leads to (10.1). Next, starting from (10.1), it is easy to see that E e^{cX²} < ∞ for some c > 0.
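The identity (10.1) is easy to verify numerically. The sketch below uses the hypothetical choice h = 1 and q(x) proportional to 1 + 0.5 cos(2πx), which is only an illustration of the class F_h, not an example from the paper:

```python
import numpy as np

# Numerical sanity check of identity (10.1): if p = q*phi with q periodic of
# period h, then E exp(m*h*X) = exp((m*h)**2 / 2) for every integer m.
# The choice h = 1, q(x) ~ 1 + 0.5*cos(2*pi*x) is a hypothetical illustration.

h = 1.0
x = np.linspace(-12.0, 12.0, 1_200_001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
q = 1 + 0.5 * np.cos(2 * np.pi * x / h)        # periodic with period h
p = q * phi
p /= p.sum() * dx                              # normalize to a probability density

ratios = []
for m in (1, 2):
    lhs = (np.exp(m * h * x) * p).sum() * dx   # E exp(m*h*X) by quadrature
    rhs = np.exp((m * h) ** 2 / 2)
    ratios.append(lhs / rhs)
print(ratios)                                  # both ratios close to 1
```

The normalization constant cancels exactly in the argument of the proof, so the ratios equal 1 up to quadrature error.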
As a consequence, the Laplace transform L(t) = E e^{tX}, t ∈ R, is finite and may be extended to the complex plane as an entire function. This property may be refined.

Proposition 10.2. If X belongs to F_h, then its Laplace transform is an entire function of order 2. Moreover, if EX = 0, it satisfies

|L(z)| ≤ e^{(|t|+h)²/2}, z = t + iy. (10.2)

Proof. We may assume that EX = 0. In this case, by Jensen's inequality, L(t) ≥ 1 for all t ∈ R, so that t = 0 is the point of minimum of L on the real line. Since L(t) is convex (and moreover, log L(t) is convex), L(t) is decreasing for t < 0 and increasing for t > 0.
Given t ≥ 0, take an integer m ≥ 1 such that (m − 1)h ≤ t < mh. Then, by (10.1) and the monotonicity of L, we get

L(t) ≤ L(mh) = e^{(mh)²/2} ≤ e^{(t+h)²/2}. (10.3)

By a similar argument, L(−t) ≤ e^{(t+h)²/2}. Thus, we obtain (10.2) for real values of z (when y = 0). In the general case, it remains to note that |L(z)| ≤ L(t), and we obtain (10.2). This bound shows that L(z) is an entire function of order at most 2.
On the other hand, (10.1) shows that L(z) is an entire function of order at least 2.
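The lower bound on the order can be spelled out via the standard formula for the order of an entire function (a general fact): since L(mh) = e^{(mh)²/2} along the lattice points, the maximum modulus M(r) = max_{|z|=r} |L(z)| satisfies log M(mh) ≥ (mh)²/2, so

```latex
\rho \;=\; \limsup_{r\to\infty} \frac{\log\log M(r)}{\log r}
\;\ge\; \limsup_{m\to\infty} \frac{\log\big((mh)^2/2\big)}{\log(mh)} \;=\; 2 .
```

Together with (10.2), this shows that the order is exactly 2.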
Proposition 10.3. If X belongs to F_h, then the function Ψ(t) = L(t) e^{−t²/2} is periodic with period h. It can be extended to the complex plane as an entire function. Moreover, if EX = 0, it satisfies

|Ψ(z)| ≤ C_{h,y} e^{h|t|}, z = t + iy, (10.5)

with C_{h,y} = e^{(h²+y²)/2}.
The inequality (10.5) shows that Ψ is an entire function of order at most 2. By analyticity and periodicity on the real line,

Ψ(z + h) = Ψ(z) for all z ∈ C. (10.6)

Proof. By the periodicity of q, changing the variable x = y + h, we have

L(t + h) = ∫ e^{(t+h)x} q(x) ϕ(x) dx = ∫ e^{(t+h)(y+h)} q(y) ϕ(y) e^{−yh−h²/2} dy = L(t) e^{th+h²/2}.
Hence L(t + h) e^{−(t+h)²/2} = L(t) e^{−t²/2}, which is the first claim. Since L(z) is an entire function, Ψ(z) is entire as well.
Next, assuming that EX = 0, one may apply (10.2), which gives Ψ(t) ≤ C_h e^{h|t|} with C_h = e^{h²/2}. Thus, we obtain (10.5) for real values of z. In the general case, for simplicity let t = Re(z) ≥ 0. By the previous step,
Hence, we need to show the identity (11.2). Using contour integration, one may rewrite the first integral in a different way. Given T > 0, consider the rectangle contour with horizontal and vertical sides chosen so as to apply Cauchy's theorem. For points z = t + iy on the contour, we have |e^{−izx}| = e^{xy} ≤ e^{|x|h}. In addition, f(z) → 0 as |t| → ∞, uniformly over all y such that |y| ≤ h. This follows from the fact that the functions t → f(t + iy) represent the Fourier transforms of the functions p_y(x) = e^{−xy} p(x). Indeed, by the subgaussian assumption, the family {p_y : |y| ≤ h} is pre-compact in L¹(R), so that the Riemann–Lebesgue lemma is applicable to the whole family. As a consequence, the last integral is convergent due to (11.1). Moreover, by (11.1), the last integrand is equal to e^{−i(t+ih)x} e^{−ith+h²/2} f(t), which coincides with the integrand on the right-hand side of (11.2) multiplied by the indicated factor. This proves (11.2).

12. Examples Involving Trigonometric Series
Theorem 1.6 is applicable to a variety of interesting examples, including underlying distributions whose Laplace transform has the form L(t) = Ψ(t) e^{t²/2}, where Ψ is a 2π-periodic function of the form Ψ(t) = 1 + cP(t) with

P(t) = Σ_{k≥1} (a_k cos(kt) + b_k sin(kt)). (12.1)

Here a_k, b_k are real coefficients which are supposed to satisfy

Σ_{k≥1} (|a_k| + |b_k|) e^{k²/2} < ∞, (12.2)

and c ∈ R is a non-zero parameter.
Proposition 12.1. If P(0) = P′(0) = P″(0) = 0 and |c| is small enough, then L(t) represents the Laplace transform of a subgaussian random variable X with EX = 0, EX² = 1, and with density p = qϕ, where q is a bounded, 2π-periodic function. This random variable is strictly subgaussian if P(t) ≥ 0 for all t ∈ R and if c > 0 is small enough.
Proof. The functions u_λ(x) = cos(λx) ϕ(x) and v_λ(x) = sin(λx) ϕ(x) have respectively the Laplace transforms e^{(t²−λ²)/2} cos(λt) and e^{(t²−λ²)/2} sin(λt). In this case, the Laplace transform of the function

p(x) = q(x) ϕ(x), q(x) = 1 + c Σ_{k≥1} e^{k²/2} (a_k cos(kx) + b_k sin(kx)), (12.3)

coincides with L(t). The requirement P(0) = 0 guarantees that ∫ p(x) dx = 1. Moreover, according to (12.3), the condition on the parameter c which ensures that the function p is indeed a probability density may be stated as

|c| Σ_{k≥1} (|a_k| + |b_k|) e^{k²/2} ≤ 1.

This is fulfilled due to (12.2) when |c| is small enough. Finally, the properties EX = 0, EX² = 1 are equivalent to P′(0) = P″(0) = 0.
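The two Laplace transforms used at the start of the proof follow from one computation with the standard normal density ϕ: completing the square gives

```latex
\int_{-\infty}^{\infty} e^{tx}\, e^{i\lambda x}\,\varphi(x)\,dx
\;=\; e^{(t+i\lambda)^2/2}
\;=\; e^{(t^2-\lambda^2)/2}\,\big(\cos(\lambda t) + i\sin(\lambda t)\big),
```

so that taking real and imaginary parts yields the transforms e^{(t²−λ²)/2} cos(λt) for u_λ and e^{(t²−λ²)/2} sin(λt) for v_λ.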
Note that, in terms of the coefficients in the series (12.1), the condition P(0) = P′(0) = P″(0) = 0 takes the form

Σ_{k≥1} a_k = Σ_{k≥1} k b_k = Σ_{k≥1} k² a_k = 0.

It should also be mentioned that, when P is a trigonometric polynomial of degree N, the function q(x) in (12.3) will be a trigonometric polynomial of degree N as well.
Example 12.2. As a particular case, one may consider the above transforms with an arbitrary integer m ≥ 3, where c is small enough. Then EX = 0, EX² = 1, and the cumulants of X satisfy γ_k(X) = 0 for 3 ≤ k ≤ m − 1.

13. Examples Involving Poisson Formula and Theta Functions
Often, the periodic functions Ψ(t) in (12.1) appear naturally by means of the Poisson formula, rather than as a trigonometric series. Let w(t) ≥ 0 be an integrable, even, absolutely continuous function on the real line with Fourier transform ŵ(x) = ∫ e^{−ixt} w(t) dt.

Corollary 13.1. Under the condition (13.1), the resulting function L(t) represents the Laplace transform of a strictly subgaussian random variable X with EX = 0, EX² = 1, which has density p(x) = q(x)ϕ(x), where q(x) is a 2π-periodic function.
Proof. The function Q(t) = Σ_{m∈Z} w(t + 2πm) is well-defined (since the series is absolutely convergent), 2π-periodic, and admits the Fourier series expansion

Q(t) = (1/2π) Σ_{k∈Z} ŵ(k) e^{ikt}.

This is the well-known Poisson formula, in which the series is understood as a limit of symmetric partial sums, cf. e.g. [27], p. 68. Under (13.1), this series is absolutely convergent and defines a smooth function. By the symmetry ŵ(−k) = ŵ(k), k ∈ Z, this formula takes the form

Q(t) = (1/2π) ŵ(0) + (1/π) Σ_{k≥1} ŵ(k) cos(kt).

Hence, the condition (12.2) is fulfilled under (13.1), and one may apply Proposition 12.1.
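The Poisson formula underlying the proof can be checked numerically. The sketch below uses the hypothetical choice w(t) = e^{−t²/2}, whose Fourier transform in the convention ŵ(x) = ∫ e^{−ixt} w(t) dt is ŵ(x) = √(2π) e^{−x²/2}; the specific w is an illustration, not an example from the paper:

```python
import numpy as np

# Numerical check of the Poisson summation formula
#   sum_m w(t + 2*pi*m) = (1/(2*pi)) * sum_k w_hat(k) * exp(i*k*t)
# for the hypothetical Gaussian w(t) = exp(-t**2/2),
# with w_hat(x) = sqrt(2*pi) * exp(-x**2/2). Both sides are even and real.

t = 1.3
m = np.arange(-50, 51)
lhs = np.exp(-(t + 2 * np.pi * m) ** 2 / 2).sum()   # sum over shifts of w

k = np.arange(1, 51)
# symmetric Fourier side: pair the terms k and -k, since w_hat is even
rhs = (1 + 2 * (np.exp(-k**2 / 2) * np.cos(k * t)).sum()) / np.sqrt(2 * np.pi)
print(lhs, rhs)   # the two sides agree
```

Both series converge extremely fast for the Gaussian, so truncation at |m|, k ≤ 50 is far more than enough.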
Example 13.2.One may further apply Corollary 13.1 to the theta functions Q(t) corresponding to with an arbitrary parameter σ > 1.

14. Central Limit Theorem for Rényi Distances
Finally, let us describe the role of subgaussian distributions in the central limit theorem with respect to the Rényi divergences D_α defined in (1.3). Consider the normalized sums

Z_n = (X_1 + ⋯ + X_n)/√n,

where the X_k are independent copies of a random variable X with mean zero and variance one.
Assuming that Z_n have densities p_n for some, or equivalently, for all sufficiently large n, the following characterization was obtained in [3], which we state in dimension one: in order that D_α(p_n‖ϕ) → 0 as n → ∞, as in (14.1), it is necessary and sufficient that D_α(p_n‖ϕ) < ∞ for some n = n_0, and

E e^{tX} < e^{βt²/2} for all t ≠ 0, (14.2)

where β = α/(α−1) is the conjugate index. Thus, for the CLT as in (14.1), the random variable X has to be subgaussian. In order to obtain this convergence for all α simultaneously, the condition (14.2) on the Laplace transform should be fulfilled for all β > 1. But this is equivalent to saying that X is strictly subgaussian, thus proving Theorem 1.1.
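For orientation, recall the standard definition of the Rényi divergence of order α > 1 (consistent with the usual convention behind (1.3)):

```latex
D_\alpha(p\,\|\,\varphi) \;=\; \frac{1}{\alpha-1}\,
\log \int_{-\infty}^{\infty} p(x)^{\alpha}\,\varphi(x)^{1-\alpha}\,dx .
```

The conjugate index β = α/(α−1) decreases to 1 as α → ∞, which explains why requiring the CLT for all α simultaneously forces (14.2) for all β > 1.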
In this connection, it is natural to ask whether or not (14.1) may hold for the critical index α = ∞, which corresponds to the strongest distance in this hierarchy. Note that in the limit case it is defined to be

D_∞(p_n‖ϕ) = log ess sup_{x∈R} (p_n(x)/ϕ(x)).

As an equivalent quantity, one may also consider the limit Tsallis distance

T_∞(p_n‖ϕ) = ess sup_{x∈R} (p_n(x) − ϕ(x))/ϕ(x).
Suppose it is finite for some n = n_0. The following two theorems can be obtained using recent results on the sharpened Richter-type local limit theorem, cf. [4].
Theorem 14.2. Suppose that, for every t_0 > 0,

E e^{tX} ≤ δ e^{t²/2} for all |t| ≥ t_0 (14.3)

with some δ = δ(t_0) ∈ (0, 1). Then D_∞(p_n‖ϕ) → 0 as n → ∞ with a certain rate. Note that (14.3) is a weakened form of the separation property (1.5), which in turn is a sharpening of strict subgaussianity. In particular, this rate of convergence in D_∞ holds true for all distributions from the class L whose densities p(x) are dominated by ϕ(x).
A similar assertion holds true in the periodic case.

proving the claim.
8. General Case of Zeros in the Angle |Arg(z)| ≤ π/8

We are now prepared to prove Theorem 1.3, which covers the case where the zeros of the characteristic function f(z) = E e^{izX}, z ∈ C, of the subgaussian random variable X are not necessarily real, but belong to the angle |Arg(z)| ≤ π/8. Let us state it once more, together with the stronger property (1.5).