Characterizations of GIG Laws: a Survey

Several characterizations of the Generalized Inverse Gaussian (GIG) distribution on the positive real line have been proposed in the literature, especially over the past two decades. These characterization theorems are surveyed, and two new characterizations are established, one based on maximum likelihood estimation and the other being a Stein characterization.


Introduction
The Generalized Inverse Gaussian (hereafter GIG) distribution on the positive real line was proposed by Good [14] in his study of population frequencies, yet its first appearance can be traced back to Etienne Halphen in the forties [16], whence the GIG is sometimes called the Halphen Type A distribution. This distribution with parameters \(p \in \mathbb{R}\), \(a > 0\), \(b > 0\) has density
\[
f_{p,a,b}(x) = \frac{(a/b)^{p/2}}{2K_p(\sqrt{ab})}\, x^{p-1} e^{-(ax+b/x)/2}, \quad x > 0,
\]
with \(K_p\) the modified Bessel function of the third kind. The parameters \(a\) and \(b\) regulate both the concentration and the scaling of the densities, the former via \(\sqrt{ab}\) and the latter via \(\sqrt{b/a}\). This is the reason why some authors (e.g., Jørgensen [19]) introduce the parameters \(\theta = \sqrt{ab}\) and \(\eta = \sqrt{b/a}\), leading to a GIG density of the form
\[
f_{p,\theta,\eta}(x) := \frac{1}{2\eta K_p(\theta)} \left(\frac{x}{\eta}\right)^{p-1} e^{-\theta(x/\eta + \eta/x)/2}, \quad x > 0. \tag{1}
\]
The parameter \(p\) bears no concrete statistical meaning, but some particular values of \(p\) lead to well-known sub-models of the GIG such as the Inverse Gaussian (\(p = -\tfrac12\)), the Reciprocal Inverse Gaussian (\(p = \tfrac12\)), the hyperbolic (\(p = 0\), in which case one also speaks of the Harmonic law), the positive hyperbolic (\(p = 1\)), the Gamma (\(p > 0\) and \(b = 0\)) and the Reciprocal Gamma distribution (\(p < 0\) and \(a = 0\)).
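As a quick numerical sanity check of the normalizing constant above, one can verify with SciPy that the density integrates to one; this is a minimal sketch, and the parameter triples below are arbitrary illustrations:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv  # modified Bessel function of the third kind K_p

def gig_pdf(x, p, a, b):
    """GIG(p, a, b) density on (0, inf)."""
    norm = (a / b) ** (p / 2) / (2 * kv(p, np.sqrt(a * b)))
    return norm * x ** (p - 1) * np.exp(-(a * x + b / x) / 2)

# the density integrates to one for several (arbitrary) parameter choices
for p, a, b in [(-0.5, 1.0, 2.0), (0.0, 3.0, 1.0), (2.5, 0.5, 4.0)]:
    total, _ = quad(gig_pdf, 0, np.inf, args=(p, a, b))
    assert abs(total - 1) < 1e-8
```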
Several papers have investigated the probabilistic properties of the GIG distributions. For instance, Barndorff-Nielsen and Halgreen [1] have studied their convolution properties and established their infinite divisibility, as well as interesting properties such as the equivalence \(X \sim \mathrm{GIG}(p, a, b) \iff 1/X \sim \mathrm{GIG}(-p, b, a)\). Vallois [59] has shown that, depending on the sign of p, the GIG laws can be viewed as the distributions of either first or last exit times for certain diffusion processes, generalizing the well-known fact that the Inverse Gaussian and Reciprocal Inverse Gaussian distributions are respectively the distributions of the first and the last hitting time for a Brownian motion. Madan, Roynette and Yor [36] have shown that the Black-Scholes formula in finance can be expressed in terms of the distribution function of GIG variables. GIG distributions belong to the family of generalized Γ-convolutions, as shown in Halgreen [15] (see also Eberlein and Hammerstein [11] for a detailed proof). Sichel ([54], [55]) used this distribution to construct mixtures of Poisson distributions.
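The equivalence \(X \sim \mathrm{GIG}(p, a, b) \iff 1/X \sim \mathrm{GIG}(-p, b, a)\) follows from the change of variables \(y = 1/x\), and can be checked numerically (a small sketch with arbitrary illustrative parameter values):

```python
import numpy as np
from scipy.special import kv

def gig_pdf(x, p, a, b):
    norm = (a / b) ** (p / 2) / (2 * kv(p, np.sqrt(a * b)))
    return norm * x ** (p - 1) * np.exp(-(a * x + b / x) / 2)

# if X ~ GIG(p, a, b), the change of variables y = 1/x gives the density
# of 1/X as gig_pdf(1/y, p, a, b) / y^2, which should equal gig_pdf(y, -p, b, a)
p, a, b = 1.7, 0.8, 2.3
ys = np.linspace(0.05, 10.0, 200)
reciprocal_density = gig_pdf(1.0 / ys, p, a, b) / ys**2  # Jacobian 1/y^2
assert np.allclose(reciprocal_density, gig_pdf(ys, -p, b, a))
```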
The interesting properties of the GIG have led to the definition of matrix GIG distributions, containing the special case of Wishart matrices for b = 0. In this paper, we only deal with the one-dimensional GIG distribution. Since the matrix case will sometimes be referred to in the course of this survey, let us recall the definition of this distribution for matrix variates (see for instance Letac and Weso lowski [31]). Let r ≥ 1 be an integer. Denote by \(M_r\) the Euclidean space of r × r real symmetric matrices, endowed with the inner product \(\langle a, b\rangle = \mathrm{tr}(ab)\), and let \(M_r^+\) be the cone of positive definite matrices in \(M_r\). For \(p \in \mathbb{R}\) and \(a, b \in M_r^+\), the GIG distribution with parameters p, a, b is the probability measure on \(M_r^+\) whose density with respect to the Lebesgue measure on \(M_r^+\) is proportional to
\[
(\det x)^{p-(r+1)/2}\, e^{-(\langle a, x\rangle + \langle b, x^{-1}\rangle)/2}, \quad x \in M_r^+.
\]
For an overview of results on matrix GIG laws, we refer the reader to Massam and Weso lowski ([38], [39]). Throughout the paper, unless explicitly stated, the name GIG distribution refers to the one-dimensional case r = 1.

The GIG distributions have been used in the modelling of diverse real phenomena such as, for instance, waiting times (Jørgensen [19]), neural activity (Iyengar and Liao [18]) or, most importantly, hydrologic extreme events (see Chebana et al. [6] and references therein). Despite this popularity of GIG models, statistical aspects of the GIG distributions have received much less attention in the literature than their probabilistic counterparts. The cornerstone reference in this respect is Jørgensen [19], complemented by the stream of literature on general Halphen distributions (of which the GIG, or Halphen A, is one of three representatives), see e.g. Perreault et al. ([44], [45]) or Chebana et al. [6]. Quite recently, Koudou and Ley [24] have applied the Le Cam methodology (see [27]) to GIG distributions in order to construct optimal (in the maximin sense) testing procedures within the GIG family of distributions.
The present paper is concerned with characterization results for GIG laws. In probability and statistics, a characterization theorem occurs when a given distribution is the only one which satisfies a certain property. Besides their evident mathematical interest per se, characterization theorems also deepen our understanding of the distributions under investigation and sometimes open unexpected paths to innovations which might otherwise have remained undiscovered. Over the years, an important number of such characterization theorems for GIG laws have been presented in the literature, especially over the past two decades. We therefore propose here a survey of existing characterizations (without pretending to be exhaustive), amended by two novel characterizations, one based on the maximum likelihood estimator (MLE) of the scale parameter \(\eta = \sqrt{b/a}\) of GIG laws and the other being a so-called Stein characterization.
The paper is organized as follows. A review of known characterizations of GIG distributions is presented in Section 2, while the short Section 3 is devoted to our two new characterizations.

A characterization by continued fractions
The following theorem characterizes the GIG distribution as the law of a continued fraction with independent Gamma entries. We adopt the notation \(\Gamma(p, a/2) = \mathrm{GIG}(p, a, 0)\) for the Gamma distribution with parameters \(p, a > 0\), and \(\stackrel{d}{=}\) for equality in distribution.

Theorem 2.1 (Letac and Weso lowski [31]).

• Let X and Y be two independent random variables such that X > 0 and Y ∼ Γ(p, a/2) for p, a > 0. Then \(X \stackrel{d}{=} (X + Y)^{-1}\) if and only if X ∼ GIG(−p, a, a).

• Let X, \(Y_1\) and \(Y_2\) be three independent random variables such that X > 0, \(Y_1 \sim \Gamma(p, a/2)\) and \(Y_2 \sim \Gamma(p, b/2)\) for p, a, b > 0. Then
\[
X \stackrel{d}{=} \cfrac{1}{Y_1 + \cfrac{1}{Y_2 + X}} \tag{2}
\]
if and only if the distribution of the random variable X is GIG(−p, a, b).
The proof relies on properties of continued fractions, which are exploited to show that the GIG distribution is the unique stationary distribution of the Markov chain \((X_m)_m\) defined by \(X_{m+1} = (Y_m + X_m)^{-1}\), with \((Y_m)_m\) i.i.d. \(\Gamma(p, a/2)\), in the case a = b, or by \(X_{m+1} = \bigl(Y_{1,m} + (Y_{2,m} + X_m)^{-1}\bigr)^{-1}\), with \((Y_{1,m})_m\) i.i.d. \(\Gamma(p, a/2)\) and \((Y_{2,m})_m\) i.i.d. \(\Gamma(p, b/2)\), in the general case.
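The convergence of this Markov chain to its GIG stationary law can be illustrated by simulation in the case a = b. The sketch below (with arbitrary p and a; the Gamma rate a/2 is encoded as the SciPy scale 2/a) compares the empirical mean of many parallel chains with the stationary mean \(K_{p-1}(a)/K_p(a)\) of GIG(−p, a, a):

```python
import numpy as np
from scipy.special import kv

rng = np.random.default_rng(0)
p, a = 2.0, 1.5   # illustrative parameters; chain: X_{m+1} = 1/(Y_m + X_m)
n = 100_000       # number of parallel, independent chains
x = np.ones(n)
for _ in range(200):  # iterate the chain towards (approximate) stationarity
    y = rng.gamma(shape=p, scale=2.0 / a, size=n)  # Gamma(p, a/2): rate a/2
    x = 1.0 / (y + x)

# the stationary law is GIG(-p, a, a), whose mean is K_{p-1}(a) / K_p(a)
target_mean = kv(p - 1, a) / kv(p, a)
assert abs(x.mean() - target_mean) < 0.01
```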
This characterization by continued fractions is one key ingredient in the proof given by Letac and Weso lowski [31] of the characterization of GIG laws by the Matsumoto-Yor property, which we now recall.

The Matsumoto-Yor property
Consider two independent, positive random variables X and Y such that \(X \sim \mathrm{GIG}(-p, a, b)\) and \(Y \sim \Gamma(p, a/2)\) for some p, a, b > 0. The Matsumoto-Yor property is the fact that the random variables
\[
U = (X + Y)^{-1} \quad\text{and}\quad V = X^{-1} - (X + Y)^{-1} \tag{3}
\]
are independent. This property was discovered in Matsumoto and Yor [40] for the case a = b while studying certain exponential functionals of Brownian motion. Letac and Weso lowski noticed afterwards that this property remains true if a ≠ b, although their paper was finally published earlier than the Matsumoto-Yor paper from 2001, namely in 2000. Letac and Weso lowski [31] further proved that this property is in fact a characterization of GIG laws (more exactly, of the product of a GIG and a Gamma law with suitable parameters).
Theorem 2.2 (Letac and Weso lowski, [31]). Consider two non-Dirac, positive and independent random variables X and Y. Then the random variables \(U = (X + Y)^{-1}\) and \(V = X^{-1} - (X + Y)^{-1}\) are independent if and only if there exist p, a, b > 0 such that \(X \sim \mathrm{GIG}(-p, a, b)\) and \(Y \sim \Gamma(p, a/2)\).

Let us make a few remarks on the theorem:

• As shown in the same paper, this result also holds true for matrix variates (characterization of the product of matrix GIG and Wishart variables), under a smoothness assumption not needed in the scalar case. To prove this, the authors use the extension to the matrix case, established by Bernadac [3], of the above-mentioned continued-fraction characterization of GIG distributions.
• An interpretation of this property in terms of Brownian motion has been pointed out by Matsumoto and Yor [41]. Massam and Weso lowski [38] have provided a tree-version of the Matsumoto-Yor property, and an interpretation of that tree-version by means of a family of Brownian motions is given in Weso lowski and Witkowski [62].

• For p = −1/2, the Matsumoto-Yor property can be seen as a consequence of an independence property established by Barndorff-Nielsen and Koudou [2] on a tree-network of Inverse Gaussian resistances (see Koudou [23]).

• Koudou and Vallois ([25], [26]) investigated a generalization of the Matsumoto-Yor property by looking for smooth decreasing functions f from (0, ∞) onto (0, ∞) with the following property: there exist independent, positive random variables X and Y such that the variables U = f(X + Y) and V = f(X) − f(X + Y) are independent. This led to other independence properties of the Matsumoto-Yor type. In particular, an independence property characterizing Kummer distributions has been established.
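The independence of the Matsumoto-Yor pair \(U = (X+Y)^{-1}\), \(V = X^{-1} - (X+Y)^{-1}\) can be probed by Monte Carlo; vanishing correlation is of course only a necessary consequence of independence. SciPy's `geninvgauss(q, θ)` is the standardized GIG, so GIG(q, a, b) variables are obtained by rescaling with \(\eta = \sqrt{b/a}\) and \(\theta = \sqrt{ab}\) (parameter values below are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import geninvgauss, gamma

rng = np.random.default_rng(1)
p, a, b = 1.2, 2.0, 0.7  # illustrative parameters
n = 200_000

# X ~ GIG(-p, a, b): rescale scipy's standardized geninvgauss(-p, sqrt(ab))
x = np.sqrt(b / a) * geninvgauss.rvs(-p, np.sqrt(a * b), size=n, random_state=rng)
# Y ~ Gamma(p, a/2): shape p, rate a/2 (scipy scale = 2/a)
y = gamma.rvs(p, scale=2.0 / a, size=n, random_state=rng)

u = 1.0 / (x + y)
v = 1.0 / x - 1.0 / (x + y)
# independence implies zero correlation (a necessary condition only)
assert abs(np.corrcoef(u, v)[0, 1]) < 0.01
```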

Characterizations by constant regression
In a subsequent work, Weso lowski [61] relaxed the independence condition between U and V in Theorem 2.2 by assuming instead that V and 1/V have constant regression on U. Following Lukacs [35], we say that a random variable Z with finite expectation has constant regression on a random variable T if E(Z|T) = E(Z) almost surely. As the following theorems show, the price to pay for relaxing the independence assumption is to assume the existence of moments of X and/or 1/X.
Theorem 2.3 (Weso lowski [61]). Consider two non-Dirac, positive and independent random variables X and Y such that E(1/X) < ∞. Let U and V be defined by (3), and assume that E(V|U) = c and E(1/V|U) = d for some real constants c, d. Then there exist p, a, b > 0 such that X ∼ GIG(−p, a, b) and Y ∼ Γ(p, a/2).
One can also prove that X is GIG distributed assuming that Y is Gamma distributed, and vice-versa:

Theorem 2.4 (Seshadri and Weso lowski [51]). Consider two non-Dirac, positive and independent random variables X and Y such that E(1/X) and E(X) are finite. Define U and V as in (3), assume that Y ∼ Γ(p, a/2) for some p, a > 0 and that E(V|U) = c for some real constant c. Then X ∼ GIG(−p, a, b) for some b > 0.

Theorem 2.5 (Seshadri and Weso lowski [51]). Consider two non-Dirac, positive and independent random variables X and Y such that E(Y) < ∞. Assume that X ∼ GIG(−p, a, b) for some positive p, a, b, and that E(V|U) = c for some real constant c, with U and V as in (3). Then Y ∼ Γ(p, a/2).
Theorems 2.3, 2.4 and 2.5 are established by proving that Laplace transforms of measures linked with the laws of X and Y are probabilistic solutions of second-order differential equations. Weso lowski [61] has pointed out that the condition E(V|U) = c can be expressed in terms of linearity of the regression of Y/X on X + Y, under the form
\[
E\!\left(\frac{Y}{X}\,\Big|\, X + Y\right) = c\,(X + Y).
\]
This can easily be seen from the definition of U and V in (3).
Theorem 2.6 (Chou and Huang [9]). Consider two non-Dirac, positive and independent random variables X and Y and define U and V as in (3). Assume that, for some fixed integer r, \(E(X^{-r-2})\), \(E(X^{-r})\), \(E(Y^r)\) and \(E(Y^{r+2})\) are finite. Assume that, for some constants \(c_r\) and \(c_{r+1}\), \(E(V^r|U) = c_r\) and \(E(V^{r+1}|U) = c_{r+1}\). Then \(c_{r+1} > c_r > 0\) and there exist p, a, b > 0 such that X ∼ GIG(−p, 2a, 2b) and Y ∼ Γ(p, a).

Lukacs [35] (Theorem 6.1) characterized the common normal distribution of i.i.d. random variables \(X_1, X_2, \ldots, X_n\) by the constancy of regression of a quadratic function of \(X_1, X_2, \ldots, X_n\) on the sum \(\Lambda = \sum_{i=1}^n X_i\). The following theorem, characterizing the GIG distribution, also belongs to this family of characterizations by constancy of regression of a statistic S on the sum Λ of the observations.

Theorem 2.7 (Pusz [47]). Let \(X_1, X_2, \ldots, X_n\) be independent and identically distributed copies of a random variable X > 0 with \(E(1/X^2)\), E(1/X) and E(X) finite. Consider q > 0 and p ∈ ℝ such that \(pE(1/X) + qE(1/X^2) > 0\), and define \(\Lambda = \sum_{i=1}^n X_i\) and a suitable statistic \(S = S(X_1, \ldots, X_n; p, q)\). Then E(S|Λ) = c for some constant c if and only if there exist a, b > 0 such that X ∼ GIG(p, a, b).
The proof is based on the fact that the GIG distribution is characterized by its moments, on a lemma by Kagan et al. [20] giving a necessary and sufficient condition for the constancy of E(S|Λ), and on a careful manipulation of a differential equation satisfied by the function \(f(t) = E\bigl(X^{-2} e^{itX}\bigr)\).

Entropy characterization
Characterizing a distribution by the maximum entropy principle dates back to Shannon [53], who showed that Gaussian random variables maximize entropy among all real-valued random variables with given mean and variance. Since then, many examples of such characterizations have appeared in the literature. For instance, Kagan et al. [20] characterized several well-known distributions such as the exponential, Gamma or Beta distributions in terms of maximum entropy given various constraints, e.g.: (i) the exponential distribution with parameter λ maximizes the entropy among all the distributions of X supported on the positive real line under the constraint E(X) = 1/λ; (ii) the Gamma distribution Γ(p, a) maximizes the entropy under the constraints E(X) = p/a and E(log X) = Ψ(p) − log a, where Ψ is the Digamma function; (iii) the Beta distribution with parameters a, b > 0 maximizes the entropy among all the distributions of X supported on [0, 1] under the constraints E(log X) = Ψ(a) − Ψ(a + b) and E(log(1 − X)) = Ψ(b) − Ψ(a + b). This type of characterization exists as well for GIG distributions, as established in the following theorem due to Kawamura and Kōsei [22], for which we recall that the Bessel function of the third kind admits the integral representation
\[
K_p(x) = \frac{1}{2} \int_0^\infty t^{p-1} e^{-x(t + 1/t)/2}\, dt, \quad x > 0.
\]

Theorem 2.8. The distribution of X with density f on (0, ∞) which maximizes the entropy under the constraints that E(X), E(1/X) and E(log X) equal the corresponding moments of GIG(p, a, b) is the distribution GIG(p, a, b).
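Example (i) above can be illustrated numerically: among positive laws with mean 1, the exponential attains the largest entropy, while a Gamma(2, scale 1/2) variable has the same mean but strictly smaller entropy. A minimal sketch with SciPy:

```python
from scipy.stats import expon, gamma

# exponential with unit mean: maximum entropy given E(X) = 1
exp_entropy = expon(scale=1.0).entropy()          # analytic value: 1 - log(1) = 1
# Gamma(2, scale=1/2) also has mean 2 * 0.5 = 1, but smaller entropy
gamma_entropy = gamma(2.0, scale=0.5).entropy()

assert abs(exp_entropy - 1.0) < 1e-10
assert gamma_entropy < exp_entropy
```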
One application of entropy characterizations is goodness-of-fit testing. An instance is Vasicek [60], who designed an entropy test of normality using Shannon's characterization. In our GIG setting, Mudholkar and Tian [42] established a goodness-of-fit test for the special case of the Inverse Gaussian (IG) distribution, based on an entropy characterization that we recall below. Since the IG is a major particular case of GIG distributions (the latter were actually introduced in order to meet the need of adding a third parameter to the IG distribution), we devote a brief subsection to characterizations of the IG distribution.

Some characterizations of the IG distribution
Of particular interest for the present survey is the Inverse Gaussian distribution IG(a, b) := GIG(−1/2, a, b), with density
\[
f(x) = \sqrt{\frac{b}{2\pi}}\; e^{\sqrt{ab}}\, x^{-3/2}\, e^{-(ax + b/x)/2}, \quad x > 0.
\]
The IG distribution is useful in many fields for data modeling and analysis when the observations are right-skewed. Numerous examples of applications can be found in Chhikara and Folks [8], Seshadri [49] and Seshadri [50]. The IG distribution possesses many similarities in terms of statistical properties with the Gaussian distribution, as pointed out by, e.g., Mudholkar and Tian [42]. We give here a few results characterizing the IG distribution.
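The identity IG(a, b) = GIG(−1/2, a, b) can be cross-checked against SciPy's `invgauss`, which uses the (µ, λ) parameterization with µ = √(b/a) and λ = b via `invgauss(mu/lam, scale=lam)` (the values of a and b below are arbitrary illustrations):

```python
import numpy as np
from scipy.special import kv
from scipy.stats import invgauss

def gig_pdf(x, p, a, b):
    norm = (a / b) ** (p / 2) / (2 * kv(p, np.sqrt(a * b)))
    return norm * x ** (p - 1) * np.exp(-(a * x + b / x) / 2)

# IG(a, b) = GIG(-1/2, a, b); in the (mu, lam) parameterization mu = sqrt(b/a)
# and lam = b, and scipy encodes IG(mu, lam) as invgauss(mu/lam, scale=lam)
a, b = 2.0, 3.0
mu, lam = np.sqrt(b / a), b
xs = np.linspace(0.05, 8.0, 300)
assert np.allclose(gig_pdf(xs, -0.5, a, b), invgauss.pdf(xs, mu / lam, scale=lam))
```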

Entropy characterizations
Let us first observe that, as an immediate corollary of Theorem 2.8, we obtain the following (new) entropy characterization of the IG distribution.
Theorem 2.9. Denote by \(c_{a,b}\) the value of \(\partial_p \log K_p(\sqrt{ab})\) at p = −1/2. The distribution of X with density f on (0, ∞) having maximum entropy under the constraints that E(X) and E(1/X) equal the corresponding IG(a, b) moments and that \(E(\log X) = \frac12\log(b/a) + c_{a,b}\) is the distribution IG(a, b).

Mudholkar and Tian [42] have obtained another entropy characterization of the IG distribution. Inspired by the Vasicek [60] methodology for the entropy test of normality, Mudholkar and Tian [42], as stated earlier, used the entropy characterization stated in Theorem 2.10 to construct a goodness-of-fit test for the IG distribution. The test statistic is based on a non-parametric estimation of the entropy of \(1/\sqrt{X}\), and rejection occurs when this estimation is not close enough to the theoretical value under the null hypothesis that X is IG-distributed. The rejection criterion is assessed at a given significance level by critical values obtained via Monte Carlo simulations.

A martingale characterization
Seshadri and Weso lowski [52] established the following martingale characterization of the IG distribution.
Theorem 2.11 (Seshadri and Weso lowski [52]). Let \((X_n)_{n\ge1}\) be a sequence of positive, non-degenerate i.i.d. random variables. For n ≥ 1, define \(S_n = \sum_{i=1}^n X_i\) and \(\mathcal{F}_n = \sigma(S_n, S_{n+1}, \ldots)\). Then there exists a constant c > 0 such that \(\bigl(n^2/S_n - nc,\ \mathcal{F}_n\bigr)_{n\ge1}\) is a backward martingale if and only if \(X_1 \sim \mathrm{IG}(a, b)\) for some a, b > 0.
Remark. In fact, the result established in Seshadri and Weso lowski [52] is more general than stated in Theorem 2.11. They proved that \((\alpha_n/S_n - \beta_n, \mathcal{F}_n)_{n\ge1}\) is a backward martingale for some sequences \((\alpha_n)_n\) and \((\beta_n)_n\) if and only if \(X_1\) follows one of four distributions, among which the IG distribution. The proof uses the Letac and Mora [28] characterization of natural exponential families with cubic variance function.
We conclude this section by recalling the well-known Khatri characterization of the IG distribution (see Letac and Seshadri [30], and also Seshadri [49], which contains a review of characterization problems for the IG distribution).
Theorem 2.12 (Khatri [21]). Let \(X_1, X_2, \ldots, X_n\) be i.i.d. copies of a random variable X such that E(1/X), E(X), \(E(X^2)\) and \(E(1/\sum_{i=1}^n X_i)\) exist and do not vanish. Let \(\bar{X} = \frac1n\sum_{i=1}^n X_i\) and \(\overline{X^{-1}} = \frac1n\sum_{i=1}^n X_i^{-1}\). The random variables \(\bar{X}\) and \(\overline{X^{-1}} - 1/\bar{X}\) are independent if and only if there exist a, b > 0 such that X ∼ IG(a, b).
If one considers the parameters \(\mu = \sqrt{b/a}\) and \(\lambda = b\), then \(\bar{X}\) and \(1/(\overline{X^{-1}} - 1/\bar{X})\) are respectively the maximum likelihood estimators of µ and λ. The independence between these estimators characterizes the IG distribution, just as the independence between the empirical mean and standard deviation, the maximum likelihood estimators of location and scale, characterizes the normal distribution. This is one of the similarities between the IG and normal distributions, as observed by Mudholkar and Tian [42].
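The independence of the two IG maximum likelihood estimators can be probed by simulation; as with any such check, vanishing correlation is only a necessary condition for independence. The µ and λ below are arbitrary illustrative values:

```python
import numpy as np
from scipy.stats import invgauss

rng = np.random.default_rng(2)
mu, lam = 1.5, 2.0     # illustrative IG parameters
n, reps = 10, 50_000   # repeated samples of size n
x = invgauss.rvs(mu / lam, scale=lam, size=(reps, n), random_state=rng)

xbar = x.mean(axis=1)                            # MLE of mu
inv_stat = (1.0 / x).mean(axis=1) - 1.0 / xbar   # reciprocal of the MLE of lambda
# independence of the two statistics implies zero correlation (necessary only)
assert abs(np.corrcoef(xbar, inv_stat)[0, 1]) < 0.02
```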

An MLE characterization
A famous characterization theorem in statistics, due to Carl Friedrich Gauss [12], says the following: the sample mean \(\bar{X}\) is, for any i.i.d. sample \(X_1, \ldots, X_n\) of any sample size n, the maximum likelihood estimator (MLE) of the location parameter in a location family {f(x − µ), µ ∈ ℝ} if and only if the samples are drawn from a Gaussian population (with variance not fixed). This very first MLE characterization theorem has important implications, as it clearly indicates that the further one moves away from the Gaussian situation, the less efficient the sample mean becomes as an estimator. Several other characterization theorems have emerged since, linking particular forms of MLEs to specific distributions such as one-parameter exponential families (Poincaré [46]), the (negative) exponential distribution (Teicher [58]), the Laplace distribution (Kagan et al. [20]), the Gamma distribution (Marshall and Olkin [37]) or the Harmonic law (Hürlimann [17]). For a recent overview and a unified theory of these results, we refer to Duerinckx et al. [10].
In this section, our aim is to apply the general result of Duerinckx et al. [10] to the GIG distribution in order to construct an MLE characterization theorem for GIG laws. To this end, we shall rather have recourse to the re-formulation (1) of the density, as in that parameterization the family \(\{f_{p,\theta,\eta}(x), \eta > 0\}\) is a scale family. Let us first observe, by a simple calculation, that the MLE of η for fixed p and θ is
\[
\hat{\eta} = \frac{-p + \sqrt{p^2 + \theta^2\, \bar{X}\, \overline{X^{-1}}}}{\theta\, \overline{X^{-1}}},
\]
where we recall that \(\bar{X} = \frac1n\sum_{i=1}^n X_i\) and \(\overline{X^{-1}} = \frac1n\sum_{i=1}^n X_i^{-1}\) if \(X_1, \ldots, X_n\) are i.i.d. with common distribution GIG(p, a, b). This enables us to formulate the following.

Theorem 3.1. Fix p ∈ ℝ, θ > 0 and a sample size n ≥ 3. Then \(\hat{\eta}\) is the MLE of the scale parameter η within a scale family \(\{\frac1\eta f(x/\eta), \eta > 0\}\) of densities on (0, ∞), for all i.i.d. samples \(X_1, \ldots, X_n\), if and only if \(f = f_{pd,\theta d}\) for some d > 0, that is, if and only if the observations are GIG-distributed.
Proof of Theorem 3.1. We want to apply Theorem 5.1 of Duerinckx et al. [10]. To this end, we first need to calculate the scale score function of the standardized GIG density \(f_{p,\theta}(x) := \frac{1}{2K_p(\theta)} x^{p-1} e^{-\frac12\theta(x + 1/x)}\) defined on (0, ∞). This score, defined as \(\psi_{f_{p,\theta}}(x) := -1 - x\frac{f_{p,\theta}'(x)}{f_{p,\theta}(x)}\), corresponds to
\[
\psi_{f_{p,\theta}}(x) = -p + \frac{\theta}{2}\left(x - \frac1x\right).
\]
One easily sees that \(\psi_{f_{p,\theta}}\) is invertible over (0, ∞) with image ℝ. Thus Theorem 5.1 of Duerinckx et al. [10] applies and the so-called MNSS (Minimal Necessary Sample Size, the smallest sample size for which an MLE characterization holds) equals 3. This theorem then yields that \(\hat{\eta}\) is the MLE of the scale parameter η in a scale family \(\{\frac1\eta f(x/\eta), \eta > 0\}\) on (0, ∞) for all samples \(X_1, \ldots, X_n\) of fixed sample size n ≥ 3 if and only if the densities are proportional to \(x^{d-1}(f_{p,\theta}(x))^d\) for some d > 0, which is nothing else but \(f_{pd,\theta d}(x)\), the (standardized) GIG density with parameters pd and θd. This yields the announced MLE characterization for GIG laws.
Clearly, this MLE characterization theorem for GIG laws contains, inter alia, MLE characterizations for the Inverse Gaussian, the Reciprocal Inverse Gaussian and the hyperbolic or Harmonic law; hence the characterization of Hürlimann [17] is a special case of our Theorem 3.1.
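A closed form for the scale MLE \(\hat{\eta}\), obtained from the quadratic first-order likelihood condition (our own derivation, with arbitrary illustrative parameters), can be checked against a direct numerical maximization of the likelihood in η:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import geninvgauss

rng = np.random.default_rng(3)
p, theta, eta = 0.8, 1.3, 2.0  # illustrative GIG parameters
x = eta * geninvgauss.rvs(p, theta, size=500, random_state=rng)

def neg_loglik(e):
    # log f_{p,theta,e}(x_i), summed, up to the constant -log(2 K_p(theta))
    return -np.sum((p - 1) * np.log(x / e) - np.log(e)
                   - theta * (x / e + e / x) / 2)

m, minv = x.mean(), (1.0 / x).mean()
# closed form: positive root of theta*minv*eta^2 + 2p*eta - theta*m = 0
eta_hat = (-p + np.sqrt(p**2 + theta**2 * m * minv)) / (theta * minv)
numeric = minimize_scalar(neg_loglik, bounds=(1e-3, 50.0), method="bounded").x
assert abs(eta_hat - numeric) < 1e-3
```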

A Stein characterization
The celebrated Stein's method of normal approximation, introduced in Stein [56], has over the years been adapted to several other probability distributions, including the Poisson (Chen [7]), the exponential (Chatterjee et al. [5]), the Gamma (Luk [34]), the multinomial (Loh [33]), the geometric (Peköz [43]), the negative binomial (Brown and Phillips [4]) or the Beta (Goldstein and Reinert [13]). A first step in this method consists in finding a suitable Stein operator, whose properties determine the quality of the approximation; see Ross [48] for a recent overview of the intricacies of Stein's method. This operator satisfies a Stein characterization theorem which clearly links it to the targeted distribution.
To the best of the authors' knowledge, so far no such Stein characterization has been proposed in the literature for the GIG distribution, whence the following result. This result is a particular instance of the density approach to Stein characterizations initiated in Stein et al. [57] and further developed in Ley and Swan [32]. We draw the reader's attention to the fact that one could replace the functions h(x) with \(h(x)x^2\), turning the GIG Stein operator \(\mathcal{T}_{f_{p,a,b}}(h)\) into the perhaps more tractable form
\[
h \mapsto x^2 h'(x) + \left((p+1)x + \frac{b}{2} - \frac{a}{2}x^2\right) h(x).
\]
This new Stein characterization and the associated Stein operator(s) could be exploited to derive rates of convergence in some asymptotics related to GIG distributions, e.g. the rate of convergence of the law of the continued fraction in (2) to the GIG distribution. This kind of application will be investigated in future work.

Theorem 2.10 (Mudholkar and Tian [42]). A random variable X has the IG(b/a, b) distribution if and only if \(1/\sqrt{X}\) attains maximum entropy among all absolutely continuous random variables Y ≥ 0 such that \(E(Y^{-2}) = b/a\) and \(E(Y^2) = a/b + 1/b\).

Theorem 3.2. A positive random variable X follows the GIG(p, a, b) distribution if and only if, for any differentiable function h satisfying \(\lim_{x\to\infty} f_{p,a,b}(x)h(x) = \lim_{x\to0} f_{p,a,b}(x)h(x) = 0\),
\[
E\left[h'(X) + \left(\frac{p-1}{X} + \frac{b}{2X^2} - \frac{a}{2}\right)h(X)\right] = 0. \tag{4}
\]
The functional \(h \mapsto \mathcal{T}_{f_{p,a,b}}(h)(x) := h'(x) + \left(\frac{p-1}{x} + \frac{b}{2x^2} - \frac{a}{2}\right)h(x)\) is the GIG Stein operator.

Proof of Theorem 3.2. The sufficiency is readily checked by noting that
\[
h'(x) + \left(\frac{p-1}{x} + \frac{b}{2x^2} - \frac{a}{2}\right)h(x) = \frac{(h(x)f_{p,a,b}(x))'}{f_{p,a,b}(x)},
\]
since then \(E[\mathcal{T}_{f_{p,a,b}}(h)(X)] = \int_0^\infty (h(x)f_{p,a,b}(x))'\,dx = 0\) by the conditions on h. To see the necessity, write F for the cumulative distribution function of X and define, for z > 0, the function \(l_z(u) := I_{(0,z]}(u) - F_{p,a,b}(z)\), u > 0, with \(F_{p,a,b}\) the GIG cumulative distribution function and I an indicator function. Then the function
\[
h_z(x) := \frac{1}{f_{p,a,b}(x)} \int_0^x l_z(u) f_{p,a,b}(u)\,du = -\frac{1}{f_{p,a,b}(x)} \int_x^\infty l_z(u) f_{p,a,b}(u)\,du
\]
is differentiable and satisfies \(\lim_{x\to\infty} f_{p,a,b}(x)h_z(x) = \lim_{x\to0} f_{p,a,b}(x)h_z(x) = 0\) for all z (since \(\int_0^\infty l_z(u) f_{p,a,b}(u)\,du = 0\)), hence is a candidate for the functions h that verify (4). Using this \(h_z\) leads to
\[
0 = E\bigl[\mathcal{T}_{f_{p,a,b}}(h_z)(X)\bigr] = E[l_z(X)] = \int_0^\infty l_z(x)\,dF(x) = F(z) - F_{p,a,b}(z)
\]
for every z > 0, from which we conclude that \(F = F_{p,a,b}\).
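The Stein identity \(E[\mathcal{T}_{f_{p,a,b}}(h)(X)] = 0\) can be verified by numerical quadrature for test functions h satisfying the boundary conditions; both choices below are bounded, hence qualify, and the parameters are arbitrary illustrations:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv

p, a, b = 1.5, 2.0, 1.0  # illustrative GIG parameters

def gig_pdf(x):
    norm = (a / b) ** (p / 2) / (2 * kv(p, np.sqrt(a * b)))
    return norm * x ** (p - 1) * np.exp(-(a * x + b / x) / 2)

def stein_op(h, hprime, x):
    # GIG Stein operator T(h)(x) = h'(x) + ((p-1)/x + b/(2x^2) - a/2) h(x)
    return hprime(x) + ((p - 1) / x + b / (2 * x**2) - a / 2) * h(x)

# E[T(h)(X)] = 0 under the GIG law for admissible test functions h
for h, hp in [(np.sin, np.cos),
              (lambda x: x / (1 + x**2),
               lambda x: (1 - x**2) / (1 + x**2) ** 2)]:
    val, _ = quad(lambda x: stein_op(h, hp, x) * gig_pdf(x), 0, np.inf)
    assert abs(val) < 1e-6
```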