Estimation of multivariate generalized gamma convolutions through Laguerre expansions

The generalized gamma convolutions class of distributions appeared in Thorin's work while investigating the infinite divisibility of the log-Normal and Pareto distributions. Although these distributions have been extensively studied in the univariate case, the multivariate case and the dependence structures that can arise from it have received little interest in the literature. Furthermore, only one projection procedure for the univariate case was recently constructed, and no estimation procedures are available. By expanding the densities of multivariate generalized gamma convolutions in a tensorized Laguerre basis, we bridge the gap and provide efficient estimation procedures for both the univariate and multivariate cases. We provide some insights into the performance of these procedures, and a convergent series for the density of multivariate gamma convolutions, which is shown to be more stable than Moschopoulos's and Mathai's univariate series. We furthermore discuss some examples.


Introduction
Olof Thorin introduced the Generalized Gamma Convolutions (GGC) class of distributions in 1977 as a tool to study the infinite divisibility of the Pareto [55] and log-Normal [56] distributions. The quite natural and pragmatic question of identifying whether some distributions are infinitely divisible or not turned out to be theoretically fruitful, and notably gave rise to the concepts of hyperbolic complete monotonicity and generalized gamma convolutions. Later, Lennart Bondesson extended these concepts in a book [5] which remains a good reference on the subject. An introduction to the more recent literature on GGC, on the probabilistic side of the problem, can be found in the two surveys [25,7].
The GGC class contains many known and commonly used distributions, including heavy-tailed distributions such as the Pareto, log-Normal and α-stable distributions. This versatility makes the class appealing for statistical practitioners working in various fields such as life and non-life insurance, reinsurance, anomaly or fraud detection, flood analysis and meteorology, etc. Other classes of general approximating distributions with high-order features can be considered. Mixtures of Erlangs [11,32,57], for example, have the property of being dense in L^2 [57] and allow for fast k-MLE estimation algorithms [46]. On the other hand, they lack the crucial closure properties and interpretability that generalized gamma convolutions have. Surprisingly, on the statistical side, very little work can be found about the estimation of generalized gamma convolutions. Only one projection procedure, that is, a procedure that projects generalized gamma convolutions onto convolutions of a finite number of gammas, was recently published in [40,19]. Although the resulting convergence result is striking, this procedure cannot handle cases where the incoming density is not already inside the class. This typically happens for any empirical distribution, due to sampling noise, even if the sampling density is in the class. Furthermore, we show that the theoretical background of the estimation procedure in [40,19] is inherently univariate, and that no direct extension to the multivariate case is possible.
The multivariate analogue is still an active research field and, as far as we know, has never been considered as a statistical tool in the literature. In applications, however, statisticians might deal with marginals that are in this class. A way to handle the dependence structure in this interpretable framework therefore seems appealing, as it would allow studying functions of the random vector, like the sum of its components, under dependence assumptions. Therefore, an estimation procedure for multivariate generalized gamma convolutions that takes the dependence structure into account could be useful. The probabilistic background is already developed: based on an original idea from [8], Bondesson introduced a bivariate extension of the Thorin class and gave some properties of these distributions. Later, [6] extended this concept to the multivariate case. More recently, [47,48] proposed another, equivalent framework allowing an extension to any cone, with application to matrix gamma convolutions. On the other hand, neither an estimation procedure nor an explicit density exists. We discuss this multivariate class and show that the produced dependence structures are highly flexible: non-exchangeability, left-tail and right-tail dependence are easily achievable.
Regarding tools, the literature gives two series expansions for the density of univariate gamma convolutions: Mathai [35] proposed a density based on Kummer's confluent hypergeometric functions, and Moschopoulos [42] refined it into a convergent gamma series with an explicit truncation error. Both these densities are based on the smallest scale parameter, and are not well conditioned when the smallest scale is too small. Unfortunately, this is typically the case when approximating a log-Normal or a Pareto distribution by a finite convolution of gammas. Therefore, no stable procedure for the density computation is available over the entire parameter range. We are not aware of any density expansion available in the literature for the multivariate case.
We consider here the problem of estimating multivariate generalized gamma convolutions from samples. Indeed, in some practical cases such as insurance loss modeling, the random variable we model is supposed to be the sum of a random number of independent and identically distributed (i.i.d.) random amounts, and hence is inherently divisible. This is one of the reasons that led Thorin to study these concepts: he wanted to ensure that the approximating models (a log-Normal or Pareto distribution, for example) fulfil this property. We want to leverage this divisibility for joint statistical analysis. Parametric divisibility, i.e., the possibility of easily constructing the parametric distribution of a piece, is therefore an appealing property for a model. The multivariate generalized gamma convolution class is one of the models that allow easy parametric division of the estimated distributions in a multivariate setting, while staying general enough and allowing for a variety of dependence structures and marginal behaviors.
By introducing a specific Laguerre basis, already considered for non-parametric deconvolution problems by [17,34,2], we manage to obtain a series expansion for the density. Classical deconvolution problems usually consider only one source of signal and one source of noise [36,38,59,53]. Although we have here a finite number of signals to be estimated, the Laguerre expansion is still a useful tool. We show that this expansion is quite natural for our problem, as it includes and generalizes Moschopoulos's univariate series. Moreover, we introduce a new quantification of the "well-behavior" of a multivariate gamma convolution, which we show to be equivalent to the exponential decay of the Laguerre coefficients. Using this quantification, we bridge the gaps and provide a new stable algorithm for the evaluation of theoretical densities in both the univariate and multivariate cases, as well as consistent parameter estimation procedures which handle both clean data (the density to be estimated is given as a formal function) and dirty data (such as an empirical dataset in 64-bit precision). The resulting algorithms are implemented in the Julia package ThorinDistributions.jl, available on GitHub under an MIT license and archived on Zenodo [31].
After fixing notations, Section 2 covers some definitions, properties, and algorithms to set the stage. Section 3 considers the Laguerre expansion of densities in the multivariate Thorin class, studies their regularity, discusses the control of their error through the new quantification, and develops an estimation procedure for these distributions. In Section 4, we investigate the numerical results of our estimation procedure on several examples. Section 5 concludes and gives leads for further work.

Gamma convolutions classes
As we deal with a lot of multivariate objects, we start by fixing some notations.
We use bold letters such as a to designate generally indexable objects, e.g., vectors or matrices, and corresponding indexed, possibly unbolded letters, such as a_i, to designate values in these objects. We use Cartesian indexing. For example, if we consider a row-major matrix r, then r_i denotes the i-th row of that matrix, r_{i,j} the j-th value of that row, and if k = (i, j), then r_k = r_{i,j}.
We denote by |x| = x_1 + ... + x_d the sum of the components of a vector x. The product of the factorials of the components of an integer vector k is denoted k! = k_1! ... k_d!. Inequalities, sums, and products between vectors of the same shape are always meant componentwise, and standard broadcasting between objects of different shapes applies. For example, 1 + 2x and (1 + 2x)_{x∈x} denote the same object, which has the same shape as x.
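These multi-index conventions can be made concrete with a short sketch (plain Python; the helper names are ours, not the paper's):

```python
from math import factorial, prod

def abs_index(k):
    """|k| = k_1 + ... + k_d, the sum of the components."""
    return sum(k)

def fact_index(k):
    """k! = k_1! * ... * k_d!, the product of componentwise factorials."""
    return prod(factorial(ki) for ki in k)

k = (2, 0, 3)
x = (0.5, 1.0, 2.0)

# componentwise operations with broadcasting, e.g. 1 + 2x
one_plus_2x = tuple(1 + 2 * xi for xi in x)

print(abs_index(k))   # 5
print(fact_index(k))  # 12
print(one_plus_2x)    # (2.0, 3.0, 5.0)
```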
Throughout the paper, d ∈ N* and X = (X_1, ..., X_d) denotes a d-variate, positive, real-valued random vector, abbreviated X = X_1 when d = 1.
To characterize the distribution of the random vector X, we might use several functions of the slack variables x ∈ R_+^d and t ∈ C^d. When they exist, the probability density function (pdf) of X, f(x) = ∂/∂x F(x), its moment generating function (mgf) M(t) = E[e^{⟨t, X⟩}] or its cumulant generating function (cgf) K(t) = ln(M(t)) can also be used to characterize the distribution. Note that moment generating functions and cumulant generating functions are not always defined on the whole of C^d. We nevertheless consider by default t ∈ C^d, as there is usually no ambiguity on their domains. We denote by L^2(R_+^d, w) the set of functions that are Lebesgue square-integrable on R_+^d w.r.t. a weight function w, and we abbreviate it by L^2(R_+^d) when w is constant equal to 1.
The following subsections set the stage of our analysis by reviewing common tools from the literature.

Definitions and first properties
A random variable X is said to be Gamma distributed with shape α ∈ R*_+, scale s ∈ R*_+ and rate 1/s, which we denote X ∼ G_{1,1}(α, s), if it admits the cumulant generating function (cgf) K(t) = −α ln(1 − st). We denote by G_{1,1} this class of distributions. Recall that the cumulant generating function of a sum of independent random variables is the sum of their cumulant generating functions. Therefore, gamma distributions are clearly infinitely divisible with gamma distributed pieces (same scale and smaller shape). We now consider independent convolutions of these distributions: Definition 2.1.1 (G_{1,n} and G_1). Let X be a random variable with support R_+.
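As a quick numerical illustration (a sketch of ours, assuming the standard gamma cgf K(t) = −α ln(1 − st) for shape α and scale s), additivity of cgfs shows that a gamma variable splits into n i.i.d. gamma pieces with the same scale and shape α/n:

```python
from math import log

def gamma_cgf(t, alpha, s):
    """cgf of a Gamma(shape alpha, scale s): K(t) = -alpha * ln(1 - s t), for t < 1/s."""
    return -alpha * log(1 - s * t)

alpha, s, t, n = 2.0, 0.5, 1.0, 5
whole = gamma_cgf(t, alpha, s)
pieces = sum(gamma_cgf(t, alpha / n, s) for _ in range(n))
print(abs(whole - pieces) < 1e-12)  # True: cgfs add up, so the shapes divide
```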
(i) X is a gamma convolution with shapes α ∈ R_+^n and scales s ∈ R_+^n, denoted X ∼ G_{1,n}(α, s), if its cgf writes K(t) = −Σ_{i=1}^n α_i ln(1 − s_i t). We denote this class of distributions G_{1,n}.
(ii) X is a generalized gamma convolution with Thorin measure ν, denoted X ∼ G_1(ν), if it is a weak limit of gamma convolutions, in which case its cgf writes K(t) = −∫ ln(1 − st) ν(ds) for a positive measure ν on R_+, the Thorin measure, with suitable integration conditions. We denote by G_1 this class of distributions.
In both cases, K is defined on its (complex) domain of convergence.
The generalized gamma convolutions class G_1, commonly called the one-dimensional Thorin class, has received a lot of interest in the literature. These limits of convolutions are not dense in the set of positive continuous random variables, but they contain many interesting distributions, including the log-Normal and Pareto distributions as historically important cases [55,56]. They have several interesting properties: noteworthy is the fact that the G_1 class is closed with respect to independent sums of random variables, by construction, but also, as a more recent result shows, with respect to independent products of random variables [7]. In fact, G_1 can be defined as the smallest class that is closed under convolution and contains the gamma distributions. See [5] for an extensive study of these distributions.
Remark that if the Thorin measure ν of X ∼ G_1(ν) has a finite discrete support of cardinal n, then X ∼ G_{1,n}(α, s) for vectors α, s such that ν = Σ_i α_i δ_{s_i} (where δ_s denotes the Dirac measure at s). Mathai [35] provides a series expression for the density of G_{1,n} random variables based on Kummer's confluent hypergeometric functions. Later, Moschopoulos [42] refined the result by providing the following gamma series: Property 2.1.2 (Moschopoulos expansion). For X ∼ G_{1,n}(α, s), denoting without loss of generality s_1 = min s, the density f of X is given by the following uniformly convergent series:

f(x) = Π_{i=1}^n (s_1/s_i)^{α_i} Σ_{k=0}^∞ δ_k x^{|α|+k−1} e^{−x/s_1} / (Γ(|α|+k) s_1^{|α|+k}),

where the coefficients δ_k are given in [42].
The dependence of this expansion on the smallest scale parameter has a major drawback: when the smallest scale is close to zero, the corresponding truncated series is quite unstable. A simple show-case is the simulation of random numbers from a G_{1,2}((10, 10^{−3}), (1, 10^{−3})) distribution (which has, e.g., mean 10.000001 and variance 10.000000001): the Moschopoulos implementation from the R package coga [24] gives a density that evaluates to 0 on all the random numbers it itself simulated, which is obviously wrong. Mathai's method, implemented in the same package, produces the same result. Sadly, as later experiments will show, parameters that correspond to approximations of useful distributions such as the log-Normal, Weibull or Pareto usually have very small scales and trigger the same instability.
A likelihood approach to fitting distributions in the Thorin class is therefore not practical, as no stable density evaluation is available, which is part of the reason why there are currently no estimation procedures in the literature.
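To make the discussion concrete, here is a minimal sketch of Moschopoulos's gamma series in a well-conditioned regime (the δ_k recursion follows [42]; the parameter values are ours), checked against the known total mass and mean Σ_i α_i s_i:

```python
from math import exp, log, lgamma

def moschopoulos_density(alphas, scales, K=80):
    """x -> f(x): Moschopoulos's gamma series for the density of a sum of
    independent Gamma(alpha_i, scale_i) variables, truncated at K terms."""
    s1 = min(scales)                  # the smallest scale drives the series
    rho = sum(alphas)
    logC = sum(a * log(s1 / s) for a, s in zip(alphas, scales))
    # gamma_i coefficients and the delta_k recursion of [42]
    g = [sum(a * (1 - s1 / s) ** i for a, s in zip(alphas, scales)) / i
         for i in range(1, K + 1)]
    delta = [1.0]
    for k in range(K - 1):
        delta.append(sum((i + 1) * g[i] * delta[k - i]
                         for i in range(k + 1)) / (k + 1))
    lognorm = [logC - lgamma(rho + k) - (rho + k) * log(s1) for k in range(K)]
    def f(x):
        if x <= 0.0:
            return 0.0
        return sum(d * exp((rho + k - 1) * log(x) - x / s1 + lognorm[k])
                   for k, d in enumerate(delta))
    return f

f = moschopoulos_density([2.0, 3.0], [1.0, 2.0])
# crude Riemann checks: total mass should be ~1, mean ~ sum(alpha*scale) = 8
h = 0.01
xs = [i * h for i in range(1, 8001)]
mass = sum(f(x) for x in xs) * h
mean = sum(x * f(x) for x in xs) * h
print(mass, mean)
```

Replacing the scales by values like 10^{−3} shrinks s_1 and makes the truncated series numerically fragile, which is the instability discussed above.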
In [5,6], Bondesson defined a class of multivariate convolutions of gamma distributions based on the following idea from Cherian [8]. Suppose that Z_0, Z_1 and Z_2 are three independent gamma random variables with respective shapes α_i and scales s_i, with w.l.o.g. s_0 = 1. Then a random vector whose components share the common factor Z_0, such as (Z_0 + Z_1, Z_0 + Z_2), has a meaningful structure of dependence, while retaining tractable gamma-convolution marginals, as can be seen from its cgf. This idea leads to the following definition:
Definition 2.1.3 (G_{d,n} and G_d). Let X be a random vector with support R_+^d.
(i) X is a d-variate gamma convolution with shapes α ∈ R_+^n and scale matrix s ∈ R_+^{d×n}, denoted X ∼ G_{d,n}(α, s), if its cgf writes K(t) = −Σ_{i=1}^n α_i ln(1 − ⟨s_{·,i}, t⟩), where s_{·,i} denotes the i-th column of s. We denote by G_{d,n} this class of distributions.
(ii) X is an (untranslated) multivariate generalized gamma convolution with Thorin measure ν, denoted X ∼ G_d(ν), if it is a weak limit of d-variate gamma convolutions. Its cgf writes K(t) = −∫_{R_+^d} ln(1 − ⟨s, t⟩) ν(ds) for a positive measure ν on R_+^d, the Thorin measure, with suitable integration conditions (see [47]). We denote by G_d this class of distributions.
In both cases, K is defined on its (complex) domain of convergence.
Note that X ∼ G_{d,n}(α, s) if and only if there exists a vector Z of n independent unit-scale gamma random variables with shapes α_1, ..., α_n such that X = sZ. The j-th marginal X_j then has distribution G_{1,n}(α, s_j), where s_j denotes the j-th row of s. We can interpret this as a decomposition of the random vector X into an additive risk factor model with gamma distributed factors. Note that we allow scales to be zero, so some factors might appear only in some marginals. As in the univariate case, we close the class by taking convolutional limits. For the analysis of these distributions, we refer to [5,6], but also to [47], which uses a slightly different but equivalent parametrization, allowing a generalization to cones other than R_+^d. Consider a distribution F, not necessarily in G_d, that generates our observations. We are interested in parametric estimators in G_d of this distribution, with particular interest in estimators in G_{d,n} ⊂ G_d. Indeed, G_{d,n} models have a structure that allows fast simulation and provides meaningful insight into the dependence structure, since a G_{d,n} distribution follows an additive risk factor model. On the other hand, distributions in G_d with a non-atomic Thorin measure are hard to sample (see [4,49] for potential solutions to this problem). They have no known closed-form density or distribution function, and even the cumulant generating function requires numerical integration to be evaluated.
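The representation X = sZ suggests a direct simulation scheme for G_{d,n} vectors (a sketch with arbitrary parameters of our choosing; the d×n orientation of s is our assumption):

```python
import random

random.seed(0)

# hypothetical parameters: d = 2 marginals, n = 3 independent gamma factors
alphas = [2.0, 1.0, 3.0]          # shapes of the unit-scale factors Z_i
s = [[1.0, 0.5, 0.0],             # row j = scales of the factors in X_j;
     [0.0, 2.0, 1.5]]             # a zero scale drops a factor from a marginal

def sample_X():
    z = [random.gammavariate(a, 1.0) for a in alphas]
    return [sum(sji * zi for sji, zi in zip(row, z)) for row in s]

N = 100_000
sums = [0.0, 0.0]
for _ in range(N):
    x = sample_X()
    sums[0] += x[0]
    sums[1] += x[1]

# E[X_j] = sum_i s_{j,i} alpha_i : here (2.5, 6.5)
print(sums[0] / N, sums[1] / N)
```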
Before diving into statistical considerations and estimation of these distributions in Section 3, we give here some properties about the dependence structure that can be achieved in the G d class, and about the regularity of the density.
Recall that a real-valued random vector X with distribution function F and marginal distribution functions F_1, ..., F_d is said to have independent components if F(x) = Π_{i=1}^d F_i(x_i), and comonotonic components if F(x) = min(F_1(x_1), ..., F_d(x_d)). For more details on dependence structures, see [44], in particular Theorems 2.3.3 (Sklar), 2.5.4 and 2.5.5 (Fréchet–Hoeffding bounds). Note that G_{d,1} random vectors are, by construction, comonotonic.
Between these two extreme cases, many structures of dependence are available for a G_d random vector. The following property gives more insight into the relation between the dependence structure of the random vector and that of its Thorin measure:
Property 2.1.4 (The support of the Thorin measure). Let X ∼ G_d(ν), and let Sup(ν) be the support of ν. Then we have:
(i) The marginals of X are mutually independent if and only if Sup(ν) is included in the union of the coordinate axes. In this case, denoting by ν_j the Thorin measure of the j-th marginal for j ∈ {1, ..., d}, the total masses of ν, ν_1, ..., ν_d are such that ν(R_+^d) = Σ_j ν_j(R_+).
(ii) The marginals of X are comonotonic if and only if there exists a constant c such that Sup(ν) ⊆ S_c. In this case, the multivariate Thorin measure and its marginals all have the same total mass.
(iii) The r.v. X has a Lebesgue-square-integrable density if D ≥ d, where D denotes the minimum integer such that there exist constants c_1, ..., c_D with Sup(ν) ⊆ S_{c_1} ∪ ... ∪ S_{c_D}. On the other hand, if D < d, X is a singular random vector with a D-dimensional support (which is obviously the case when ν is atomic with fewer than d atoms).
Proof.See Appendix A.
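The comonotonic extreme is easy to see empirically: a G_{d,1} vector is a componentwise increasing transform of a single gamma factor, so every pair of observations is concordant (a sketch with arbitrary parameters of our choosing):

```python
import random

random.seed(1)

# G_{2,1}(alpha, s): one unit-scale gamma factor, componentwise scales s = (2, 5)
alpha, s = 1.5, (2.0, 5.0)
sample = [(s[0] * z, s[1] * z)
          for z in (random.gammavariate(alpha, 1.0) for _ in range(400))]

# every pair of observations is concordant -> empirical Kendall tau equals 1
concordant = all((x1 - y1) * (x2 - y2) >= 0
                 for i, (x1, x2) in enumerate(sample)
                 for (y1, y2) in sample[i + 1:])
print(concordant)  # True
```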
The link given at point (iii) of Property 2.1.4 between the regularity of the density and the constant D (which, for atomic Thorin measures, is the rank of the matrix s) will be investigated and quantified further in the next section.
Between the independent and the comonotonic cases, there is a wide range of achievable dependence structures. For example, since the class G_d is closed w.r.t. convolution, every (positive) value of dependence measures such as Kendall's tau or Spearman's rho is achievable. Some examples in Section 4 also exhibit highly asymmetric dependence structures and tail dependence.
For the study of the relation between the dependence structure of X and the dependence structure of ν, the distribution proposed in Example 2.1.5 is of interest.
This random variable has the nice property that it has uniform marginal Thorin measures, but is not G_d(c_ν)-distributed, where c_ν is the copula of ν.
There is in fact a bijection between the copula of the random vector and the copula of its Thorin measure, but only conditionally on the marginal distributions. Therefore, an estimation scheme that separates the dependence structure from the marginals does not seem possible.
Before discussing the estimation of these distributions, we still need to introduce in the next subsection some specific integrals, which correspond to derivatives of the moment generating function and the cumulant generating function of a random vector.

Shifted moments and shifted cumulants
We formally introduce the shifted moments and shifted cumulants of a random vector, and present a known and useful result that maps moments to cumulants and vice versa.
Recall that X is a random vector of dimension d, with moment generating function M and cumulant generating function K defined by M(t) = E[e^{⟨t, X⟩}] and K(t) = ln(M(t)). We now introduce specific notations for the derivatives of these two functions.
Definition 2.2.1 (Shifted moments and cumulants). For X a random vector of dimension d and i ∈ N^d, define the t-shifted i-th moments μ_{i,t} and cumulants κ_{i,t} as the i-th derivatives of (respectively) M and K taken at t:

μ_{i,t} = ∂^i/∂t^i M(t) and κ_{i,t} = ∂^i/∂t^i K(t).

They correspond to the moments and cumulants of an exponentially tilted version of the random vector X, sometimes called the Esscher transform [18] of X.
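For instance, in d = 1, the shifted moment μ_{1,t} = M′(t) = E[X e^{tX}] of a gamma variable has a closed form that a plain Monte-Carlo estimator recovers (a sketch of ours, with parameters of our choosing):

```python
import random
from math import exp

random.seed(2)

alpha, s, t = 2.0, 1.0, -0.5
N = 100_000

# unbiased Monte-Carlo estimator of mu_{1,t} = E[X e^{tX}]
est = sum(x * exp(t * x)
          for x in (random.gammavariate(alpha, s) for _ in range(N))) / N

# closed form: M'(t) = alpha * s * (1 - s t)^(-alpha - 1)
exact = alpha * s * (1 - s * t) ** (-alpha - 1)
print(abs(est - exact) < 0.02)  # True with high probability
```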
Although standard Monte-Carlo estimators for shifted moments are unbiased and easy to compute, shifted cumulants are a little harder to estimate from i.i.d. samples of a random vector. We refer to [52,51] for some unbiased estimators of multivariate cumulants, and to [30] for an application to the estimation of stable laws. To switch between moments and cumulants, Property 2.2.2 gives a bijection between the two: Property 2.2.2 (Bijection between moments and cumulants). For a given shift t ∈ C^d and m ∈ N^d, the sets (μ_{i,t})_{i≤m} and (κ_{i,t})_{i≤m} are in bijection.
Proof. Since (μ_{i,t})_{i≤m} and (κ_{i,t})_{i≤m} are the respective Taylor coefficients at t of exp ∘ K and K, they are in bijection through the multivariate Faà di Bruno formula [10,41,16].
Remark that this bijection can be expressed in many ways: most analyses in the literature use multivariate Bell polynomials [3,12,27,37] for this task. For computational purposes, there exist some practical functions that give the coefficients of Bell polynomials [58,20], even for d > 1. However, in all generality, the number of coefficients to compute and store is exponential in m, making this method quite impractical, as the Bell polynomials are combinatorial in nature.
To switch from cumulants to moments, the fact that the exponential function is its own derivative allows a recursive approach, which is described in full detail in [39]. We adapted the notation and simplified the index conventions from [39], which, after adding a slack variable to handle the dimensionality index, allowed for an even faster implementation. The expression in Algorithm 1 below uses derivatives of the cgf and mgf, as this is the case we need, but the derivative of the exponential of any function could be computed through exactly the same algorithm, knowing the derivatives of the exponent.
Algorithm 1: Recursive computation of μ from κ (see [39]). Note that the main loop of Algorithm 1 must be performed in the right order, so that each μ_k is computed before being used. This recursive formulation is remarkably efficient and is at the moment the fastest known way to perform this computation. Unfortunately, no equivalently fast procedure can be found in the other direction, as the derivatives of the log function are more complicated.
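In the univariate case, the idea behind Algorithm 1 reduces to the classical recursion μ_n = Σ_{k=1}^{n} C(n−1, k−1) κ_k μ_{n−k} (a sketch of ours, not the paper's implementation), which we can check on a gamma distribution, whose cumulants are κ_k = α (k−1)! s^k:

```python
from math import comb, factorial

def moments_from_cumulants(kappa):
    """kappa[k-1] = kappa_k; returns (mu_0, ..., mu_n) via the recursion
    mu_n = sum_{k=1}^{n} C(n-1, k-1) kappa_k mu_{n-k}, with mu_0 = 1."""
    mu = [1.0]
    for n in range(1, len(kappa) + 1):
        mu.append(sum(comb(n - 1, k - 1) * kappa[k - 1] * mu[n - k]
                      for k in range(1, n + 1)))
    return mu

alpha, s = 2.0, 3.0
kappa = [alpha * factorial(k - 1) * s ** k for k in range(1, 5)]
mu = moments_from_cumulants(kappa)

# gamma raw moments: mu_k = s^k * alpha (alpha+1) ... (alpha+k-1)
print(mu)  # [1.0, 6.0, 54.0, 648.0, 9720.0]
```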
We now dive into the estimation of multivariate gamma convolutions. In the next subsection, we highlight the univariate projection procedure from [19,40].
Suppose that we have a density in G_1, given as a formal function. Then, for a certain n ∈ N, the procedure provided by Miles, Furman & Kuznetsov in [40,19], hereafter called "the MFK procedure", gives expressions for the shapes α and scales s of a G_{1,n}(α, s) distribution that is exponentially close to the desired density. We will not describe the algorithm here; we refer to the original article for the details, as the exposition is quite long, but we comment on its results and caveats and use it as a comparison basis for later experiments.
The MFK procedure solves a univariate moment problem for the Thorin measure. This moment problem can be made apparent by differentiating the cumulant generating function of a G_1 random variable. Indeed, for X ∼ G_1(ν) and t ≥ 0, differentiating K and using the change of variables x = s/(1 + st), which maps R_+ onto [0, 1/t), yields a moment problem: the successive derivatives of K provide moments that the image measure ξ of ν needs to fit.
We can then obtain a solution with n atoms for the measures ξ and ν by solving a truncated version of this moment problem. In the univariate case, K^{(1)} is a one-dimensional Stieltjes function, and this moment problem can therefore be efficiently solved via Padé approximants (see [13]).
The first drawback of MFK is that the computed moments ξ need to lie inside the moment cone of positive measures on [0, 1/t]. For the projection of a formal density already in G_1 onto G_{1,n}, this condition imposes estimating the shifted moments (from which ξ is derived) with at least 1024 bits of precision (about 300 digits). On the other hand, if ξ is not a sequence of moments of some measure on [0, 1/t], then the moment problem becomes unsolvable, and MFK fails to provide correct (within-boundaries) atoms and weights for the measures ξ and ν. From an empirical dataset, even if the true distribution lies in G_1, evaluating μ through Monte-Carlo with enough precision is therefore impossible.
Through this moment problem interpretation, the approach could nevertheless be carried out in higher dimensions, for projections of densities from G_d to G_{d,n}, with an even stronger integration precision requirement as the dimension increases. However, the results about the complete factorization of the denominator of the Padé approximants of a Stieltjes function, which drastically simplify the solution of the moment problem, do not hold for d > 1. This possibility is nevertheless discussed in the next section.

Construction of an estimator in G d,n
We attempt a multivariate extension of the MFK algorithm. In a multivariate setting, through the moment problem interpretation of the MFK algorithm developed in Subsection 2.3, we can express the corresponding multivariate moment problem.
For X ∼ G_d(ν), consider K the cumulant generating function of X, given in Definition 2.1.3 as K(t) = −∫_{R_+^d} ln(1 − ⟨s, t⟩) ν(ds), where t is in the (complex) domain of convergence of K.
Assuming |i| ≥ 1 and that the real parts ℜ(t) of the components of t are all non-positive, we have

κ_{i,t} = (|i| − 1)! ∫ s^i (1 − ⟨s, t⟩)^{−|i|} ν(ds).

Note that ℜ(t) ≤ 0 is inside the region of convergence of the function. Use now the (bijective, continuous) change of variables x = s / (1 − ⟨s, t⟩), which maps R_+^d onto the simplex Δ_d(−t) = {x ∈ R_+^d : ⟨x, −t⟩ ≤ 1}, yielding an equivalent problem of finding the measure ξ that solves the more standard moment problem

∫ x^i ξ(dx) = κ_{i,t} / (|i| − 1)! for all i ≤ m, (1)

where we denoted ξ = (κ_{i,t} / (|i| − 1)!)_{i≤m} the moments that ξ needs to match according to Equation (1).
Finding an n-atomic measure ξ solving these equations is equivalent to finding the parametrization of a G_{d,n} distribution that fulfills the cumulant constraints. The corresponding Generalized Moment Problem is known to be a hard problem in the literature, but can be solved through semidefinite positive relaxations, following, e.g., [21,45,22]. Sadly, these algorithms and the corresponding literature focus on the well-posed case, where the ξ are, indeed, moments of a measure supported on the simplex Δ_d(−t), which is the case only if X ∈ G_{d,n} and if the ξ are computed with enough precision, as in the univariate setting.
If the knowledge about the distribution of X is given through empirical observations, with sampling noise, or if the true distribution is not in the G d,n class, these moments will not belong to the moment cone, and the moment problem will have no solution.By projecting the moments onto the moment cone, we could force the known algorithms to provide a solution, but the projection onto this cone is not an easy task (see [14,15] for details on moment cones).
The estimation of an n-atomic Thorin measure from samples of the random vector X therefore raises several questions, in both the univariate and multivariate cases. If we compute cumulants empirically, we will stand outside the moment cone and the moment problem will have no solution. Should we then consider Equation (2) as a multi-objective optimization problem? Do we seek a Pareto front, or can we consider that some moments are less important than others? If we do, how do we weight the several objectives?
To produce a meaningful loss to minimize, we will weight these moment conditions. For this purpose, we leverage a certain Laguerre basis of L^2(R_+^d) to expand the density of random vectors in G_{d,n}, and to construct a coherent loss for this (potentially unsolvable) moment problem.

The tensorized Laguerre basis
The standard Laguerre polynomials [43] form an orthogonal basis of the set L^2(R_+, e^{−x}) of square-integrable functions with respect to the weight function x → e^{−x}. From these polynomials, we can extract the following orthonormal basis of the set L^2(R_+^d) of functions that are Lebesgue square-integrable on R_+^d:
Definition 3.1.1 (Laguerre function) [9,34,17]. For all k ∈ N^d, define the Laguerre functions ϕ_k with support R_+^d by

ϕ_k(x) = 2^{d/2} Π_{j=1}^d L_{k_j}(2 x_j) e^{−x_j},

where L_k denotes the k-th standard Laguerre polynomial. For every function f ∈ L^2(R_+^d) and every k ∈ N^d, the coefficients of f in the basis are denoted by a_k = ⟨f, ϕ_k⟩, and we have, since the basis is orthonormal, that f = Σ_{k∈N^d} a_k ϕ_k. Furthermore, for X a random vector with density f ∈ L^2(R_+^d) and shifted moments μ, we have, by simple plug-in of f and ϕ_k into the expression of a_k, that

a_k = 2^{d/2} Σ_{j≤k} C(k, j) (−2)^{|j|} / j! μ_{j,−1}. (3)

We now denote a = (a_k)_{k∈N^d}. For X ∼ G_{d,1}(α, s), there exists an explicit bijection between the d + 1 first Laguerre coefficients (with index k s.t. |k| ≤ 1) and the d + 1 parameters, whose expression is given in the proof.
Proof.See Appendix A.
This example is quite peculiar since, by Property 2.1.4, the random vector is singular. Note that the edge case ∃i ∈ {1, ..., d} with s_i = 0, in which the i-th marginal is identically zero, is included in the previous result.
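In d = 1, assuming the standard univariate form of the Laguerre functions, φ_k(x) = √2 L_k(2x) e^{−x} (as in the basis of [9,17,34]), they can be generated by the three-term recurrence of the Laguerre polynomials and their orthonormality on R_+ checked by quadrature (a sketch):

```python
from math import exp, sqrt

def laguerre(k, x):
    """Standard Laguerre polynomial L_k(x) via the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - x
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - x) * cur - j * prev) / (j + 1)
    return cur

def phi(k, x):
    """Laguerre function phi_k(x) = sqrt(2) L_k(2x) e^{-x}."""
    return sqrt(2) * laguerre(k, 2 * x) * exp(-x)

def inner(j, k, h=1e-3, top=40.0):
    """Trapezoidal quadrature of <phi_j, phi_k> over [0, top]."""
    n = int(top / h)
    total = sum(phi(j, i * h) * phi(k, i * h) for i in range(n + 1))
    total -= 0.5 * (phi(j, 0.0) * phi(k, 0.0) + phi(j, top) * phi(k, top))
    return total * h

# expect approximately 1, 1, 0
print(inner(0, 0), inner(1, 1), inner(0, 1))
```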
We do not know whether the same kind of inversion could be carried out analytically for the coefficients of G_{d,n} densities, since the Laguerre coefficients are not as easily expressed in all generality. Furthermore, this kind of parametric inversion would not work for empirical datasets or densities outside the class, and is therefore not our focus here. However, we can efficiently compute the coefficients from the parameters of a G_{d,n} random vector through Algorithm 2, based on Algorithm 1: compute the shifted cumulants of the G_{d,n} distribution in closed form, deduce the shifted moments through Algorithm 1, and combine them into the coefficients a according to Eq. (3).
The complexity of Algorithm 2 is quadratic in |m|. Note that, as in Algorithm 1, the computations need to be performed in the right order. Sometimes, the Laguerre coefficients a = (a_k)_{k≤m} overflow the Float64 limits, but the implementation we provide in the Julia package ThorinDistributions.jl [31] ensures that the computations do not overflow, by using multiple precision arithmetic when needed. Furthermore, as in Algorithm 1, the use of Miatto's fast recursion makes this algorithm efficient, even for reasonably large d, n, m.
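To illustrate the coefficient computation in d = 1, one can derive the Laguerre coefficients of a single gamma density from its shifted moments μ_{j,−1} = E[X^j e^{−X}] = Γ(α+j) s^j / (Γ(α)(1+s)^{α+j}) and reconstruct the density (a sketch of ours, with arbitrary parameters; cancellation in the alternating sum is what limits the usable truncation order in Float64, echoing the overflow remark above):

```python
from math import comb, exp, factorial, lgamma, log, sqrt

alpha, s, K = 2.0, 0.7, 25

def mu(j):
    """Shifted moment mu_{j,-1} = E[X^j e^{-X}] for X ~ Gamma(alpha, scale s)."""
    return exp(lgamma(alpha + j) - lgamma(alpha) + j * log(s)
               - (alpha + j) * log(1 + s))

# Laguerre coefficients a_k = sqrt(2) sum_j C(k,j) (-2)^j / j! * mu_{j,-1}
a = [sqrt(2) * sum(comb(k, j) * (-2.0) ** j / factorial(j) * mu(j)
                   for j in range(k + 1)) for k in range(K)]

def laguerre(k, x):
    if k == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - x
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - x) * cur - j * prev) / (j + 1)
    return cur

def f_hat(x):
    """Reconstructed density sum_k a_k phi_k(x)."""
    return sum(ak * sqrt(2) * laguerre(k, 2 * x) * exp(-x)
               for k, ak in enumerate(a))

def f_exact(x):
    """Gamma(alpha, scale s) density."""
    return exp((alpha - 1) * log(x) - x / s - lgamma(alpha) - alpha * log(s))

print(abs(f_hat(1.0) - f_exact(1.0)))  # small reconstruction error
```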
By reordering the coefficients and leveraging generalizations of Laguerre polynomials, we have the following link with Moschopoulos's density: Remark 3.1.3 (Generalized Laguerre basis and Moschopoulos density). Note that Laguerre expanded densities are expressed as (infinite) gamma mixtures, as was Moschopoulos's density in the univariate case (see Property 2.1.2). Furthermore, following [9], the Laguerre basis is defined through univariate orthogonal polynomials w.r.t. the weight function e^{−x}, i.e., a G_{1,1}(1, 1) density. If, instead, we use the weight function x^ρ e^{−x}, ρ ≥ 0, corresponding to a G_{1,1}(1 + ρ, 1) density, the associated orthogonal polynomials are the so-called generalized Laguerre polynomials, and a slightly different associated orthonormal basis of L^2(R_+^d) is obtained. In the one-dimensional case, if we choose ρ to be the total mass of the Thorin measure, we retrieve Moschopoulos's series from Property 2.1.2 as a Laguerre expansion.
Sadly, even if mixtures of gammas are easier to fit by k-MLE procedures [46,50], we have no way of identifying the subspace of coefficients that would match a true convolution of gammas: as [5] noted in the univariate case, generalized gamma convolutions can be expressed as mixtures of gammas, but there is no simple reverse mapping. Last but not least, even if the expansion through generalized Laguerre polynomials with a parameter corresponding to the total mass of the Thorin measure might converge faster, we would have no way of estimating this parameter beforehand from data, and for many useful cases it is infinite (log-Normals among others).
We can now efficiently compute these Laguerre coefficients for G_{d,n} distributions. In the next subsection, we provide a control on their decay.

Exponential decay of Laguerre coefficients
We now consider the convergence of these Laguerre expansions. Under simple technical assumptions on the parameters of a G_{d,n} random vector, the Laguerre coefficients are (uniformly) exponentially decreasing; Theorem 3.2.1 provides this bound. Of course, we have not yet defined the concept of an ε-well-behaved gamma convolution, which will be central in the proof. Before exposing the main proof, we start with a few technical statements. Lemma 3.2.2 gathers some useful properties of a particular univariate Möbius transform.
Lemma 3.2.2 (Properties of a Möbius transform). Let h be the Möbius transform h(t) = (t + 1)/(t − 1). For a ∈ C and b ∈ R_+, denote the disc with center a and radius b by D(a, b). Also denote c(t) = (t² + 1)/(t² − 1) and r(t) = 2t/(t² − 1), and finally denote by C_− and C_+ the left half and right half of the complex plane. The following holds:
(i) h is its own inverse, i.e., h(h(t)) = t, and h has a simple pole at t = 1, with limit +∞ as t → 1 with ℜ(t) > 1 and −∞ as t → 1 with ℜ(t) < 1.
(ii) h(1/t) = −h(t) for all t ∈ C, and |h(t)| > 1 whenever Re(t) > 0.

We are now in position to define the concept of ε-well-behaved gamma convolutions, which is a specific regularity assumption on a parametrization of generalized gamma convolutions. Unfortunately, it is a little cumbersome to state, see Definition 3.2.3.
If there exists ε > 0 such that the model is ε-w.b., we say that the model is well-behaved, in short w.b.
Of course, any ε 1 -w.b. gamma convolution is ε 2 -w.b. for any 0 < ε 2 < ε 1. Property 3.2.4 gives equivalent statements for the well-behavedness of a gamma convolution, hopefully clarifying the requirements from Definition 3.2.3.

Property 3.2.4 (Well-behaved gamma convolution). For a gamma convolution G d,n (α, s) with Thorin measure ν, the following statements are equivalent: (iv) ν(R d +) > 1 and there exist no linearly independent c 1 , . . ., c p, p < d, such that … , where S c is as defined in Property 2.1.4.
Proof. First, (i) ⇐⇒ (ii) is just the definition of well-behavedness. Furthermore, (iv) is equivalent to (iii): a subset of scales s I such that Σ_{i∈I} α i > Σ_{i∉I} α i is a subset of scales that has the majority of the weight of the Thorin measure assigned to it, and Ker(s I ) = 0 if and only if at least d linearly independent directions are inside s I. Therefore, we only show (ii) ⇐⇒ (iii).
By the Rouché–Capelli theorem, see [54], any complex solution (if there are some) of the linear system …

The sketch of the proof is as follows: we construct a multivariate complex function R that has the Laguerre coefficients as its Taylor coefficients, and we show by a singularity analysis that this function is analytic around the origin, in a polydisc with polyradius 1 + ε. We conclude by applying Cauchy's inequality.
As the one-dimensional case allows for more detailed computations, we first work for general d, and split later into the two cases d = 1 and d > 1.
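The Möbius transform h(t) = (t + 1)/(t − 1) from Lemma 3.2.2 is central to what follows; its stated properties are easy to verify numerically (a quick sanity check of ours, not from the paper):

```python
def h(t):
    # The Möbius transform of Lemma 3.2.2: an involution with a simple pole at t = 1
    return (t + 1) / (t - 1)

t = 0.7 + 1.3j                            # any point with Re(t) > 0
assert abs(h(h(t)) - t) < 1e-12           # (i): h is its own inverse
assert abs(h(1 / t) + h(t)) < 1e-12       # (ii): h(1/t) = -h(t)
assert abs(h(t)) > 1                      # (ii): |h(t)| > 1 whenever Re(t) > 0
```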
We follow the path of [26] to express the Laguerre coefficients (a k) k∈N^d as the Taylor coefficients of a certain (d-variate, complex) function R. From the computation of the moment generating function M(t) of the Laguerre expanded density f(x) = Σ_{k∈N^d} a k ϕ k (x), we have: … which implies that: … Now denote h(t) = (h(t)) t∈t, so that h applies the Möbius transform h(t) = (t + 1)/(t − 1) from Lemma 3.2.2 componentwise. Using the substitutions y = h(t) ⇐⇒ t = h(y) (since h is its own inverse), which implies that (1 − t) …

We now study the regularity of the function R, which is tightly related to the regularity of M. Recall first the expression of the moment generating function of our random vector: … Denote by V R the singular variety of R, i.e., the set of points where R is not analytic. From the singularities of y ↦ (1 − y)^{−1} and from those of M, we have: … Consider first the fact that, since the model is ε-w.b., |α| > 1. Hence, M(h(y)) dominates (1 − y)^{−1} when y goes to 1 (as h(y) goes to infinity), and therefore R is analytic at 1. Thus, we only need to consider the singularities of M(h(y)), and we can further restrict V R. We now split the argument into the cases d = 1 and d > 1, to show that the ε-w.b. condition is equivalent to the analyticity of R on the polydisc with polyradius 1 + ε.
When d = 1, all singularities of R are such that s h(y) = 1 for a scale s ∈ s. In other words, the singularities are at the points y = h(1/s) for each scale s ∈ s. Recall that −h(x) = h(x^{−1}), and that h is (strictly) decreasing before and after its vertical asymptote at 1. Three cases arise: … which corresponds (with |α| > 1) to the one-dimensional ε-w.b. condition.
We now turn to the multivariate case. When d > 1, the singularities of t ↦ M(t) are the zeros of the function: … Any zero t* of G must satisfy the system of equations ⟨s I , t*⟩ = 1 for a given non-empty I ⊆ {1, ..., n}. However, for t* to be unbounded, we must have … By the definition of an ε-w.b. model, there exist no such unbounded zeros since, for any zero t* of G such that …
Thus, the singularities of R, which are images by h of the zeros of G, must have at least one dimension that is outside the centered polydisc with radius 1 + ε.
Hence, for any d, R is analytic on the centered polydisc with polyradius 1 + ε.
Now, for any d, by Cauchy's inequality (see Theorem 2.2.7 in [23]), taking an ε′ strictly between 0 and ε, we obtain the desired finite upper bound on the coefficients: …

We can now efficiently compute the Laguerre coefficients of densities in G d,n, with the assurance that they decrease fast if the model is well-behaved.
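The mechanism behind this bound can be illustrated on a toy univariate example: if a function is analytic on a disc of radius r > 1, Cauchy's inequality bounds its k-th Taylor coefficient by sup_{|y|=r} |R(y)| / r^k, which decreases exponentially in k. A minimal numerical sketch of ours, with R(y) = 1/(2 − y):

```python
import numpy as np

# R(y) = 1/(2 - y) is analytic on |y| < 2, and its Taylor coefficients are a_k = 2^{-(k+1)}
r = 1.5                                    # any radius strictly inside the disc of analyticity
theta = np.linspace(0, 2 * np.pi, 4000, endpoint=False)
M = np.max(np.abs(1.0 / (2.0 - r * np.exp(1j * theta))))   # sup |R| on the circle |y| = r

for k in range(20):
    a_k = 2.0 ** (-(k + 1))                # exact Taylor coefficient
    assert a_k <= M / r**k + 1e-12         # Cauchy's inequality: |a_k| <= M / r^k
```

In the proof above, the same inequality is applied to the d-variate function R on a polydisc of polyradius 1 + ε′.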
In the next subsection, we finally discuss an estimation algorithm based on an integrated square error loss that leverages these coefficients.

Consistency of the induced empirical loss
Expressing the density of G d,n random vectors in the basis from Definition 3.1.1 allows us to truncate the basis and effectively compute an approximated density. From this, by evaluating empirical Laguerre coefficients from data, we can fit the parameters by minimizing the L 2 distance between coefficients (since the basis is orthonormal).
For parameters α, s, denote by f α,s the density of the G d,n (α, s) distribution.
For a number of observations N ∈ N, suppose X 1 , ..., X N are i.i.d. random vectors from a true unknown density f with support R d +. We would like to minimize the integrated square error: … Estimating a k (f) = ∫ f(x) ϕ k (x) dx by a simple Monte-Carlo plug-in, we could compute an approximation of this loss. However, we cannot compute the couples of coefficients a k and a k (α, s) for all k ∈ N d, so we are forced to truncate the basis.
Definition 3.3.1 (G d,n parameters estimator). For a set of i.i.d. random vectors X = (X 1 , ..., X N ), we define the parameters estimator of a well-behaved G d,n distribution that matches the observations as: … where the approximated truncated loss L m is given by: … and where the empirical Laguerre coefficients â k are given by (4).
The loss L m (X, α, s) can be efficiently computed through Algorithm 2. It will be 0 if the first coefficients of the Laguerre expansions of f and of our estimator match perfectly, i.e., assuming f ∈ G d,n and m big enough, if we found the right Thorin measure. Note that if the target were given theoretically, as in a projection from G d to G d,n, we could compute a k by formal integration with high precision instead of Monte-Carlo. We will discuss this option in the next section. However, even in this case, the error that comes from the truncation of the basis cannot be computed.
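To make the Monte-Carlo plug-in concrete, the sketch below (ours, in Python rather than the paper's Julia package) uses the univariate orthonormal Laguerre functions ϕ_k(x) = √2 L_k(2x) e^{-x} of [9] and checks that the plug-in estimates â_k = N^{-1} Σ_i ϕ_k(X_i) agree with the integrals a_k(f) = ∫ f ϕ_k for a Gamma(2, 1) density:

```python
import numpy as np
from scipy.special import eval_laguerre
from scipy.integrate import quad
from scipy.stats import gamma as gamma_dist

def phi(k, x):
    # Orthonormal Laguerre function on R_+: sqrt(2) * L_k(2x) * e^{-x}
    return np.sqrt(2.0) * eval_laguerre(k, 2.0 * x) * np.exp(-x)

rng = np.random.default_rng(0)
X = gamma_dist.rvs(a=2.0, size=500_000, random_state=rng)   # samples from f = Gamma(2, 1)

m = 8
a_hat = np.array([phi(k, X).mean() for k in range(m)])      # Monte-Carlo plug-in estimates
a_int = np.array([quad(lambda x, k=k: gamma_dist.pdf(x, a=2.0) * phi(k, x), 0, np.inf)[0]
                  for k in range(m)])

assert np.max(np.abs(a_hat - a_int)) < 1e-2                 # plug-in matches the integrals
```

The truncated loss of Definition 3.3.1 is then a finite sum of squared differences between such plug-in coefficients and the model coefficients a_k(α, s).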
To show the consistency of this loss, we use in the following the results from Theorem 3.2.1. Consider the loss L m (X, α, s), which depends on the threshold m and on the N observations X. Then:

Proof. To show the result, we start by expressing the loss in the Laguerre basis, and we split the basis on indices smaller and greater than m. For the sake of simplicity, we omit f in a k (f) and denote a k = a k (f). We have: … where … We first show that A → 0 almost surely as N → ∞, for any fixed m. For A, we can treat each term in the summation independently, as the sum is finite. A can then be expanded further by plugging in the Monte-Carlo estimator based on ϕ k (X i) and expanding the square: … Furthermore, since we restricted the optimization to well-behaved estimators, each a k (α̂, ŝ) is bounded by Theorem 3.2.1.
Last but not least, by definition of (α̂, ŝ) as global minimizers, since Monte-Carlo estimators are unbiased and (α 0 , s 0) is inside the range of optimization, we can conclude that A → 0 almost surely. Since both the true model and the approximation are well-behaved, there exists ε′ with ε > ε′ > 0 such that both are ε′-w.b. By Theorem 3.2.1, we then have, uniformly in N: … which concludes the argument.
Although the consistency result restricts the parameter space to only well-behaved Thorin measures, in practice we run our optimization procedures without this restriction, and simply check at the end that the resulting estimators are well-behaved. When d = 1, we can always fix ε to make the estimated gamma convolution ε-w.b. provided |α̂| > 1. In practice, numerical computations rarely produce collinear scales, so every estimated multivariate gamma convolution is usually well-behaved as soon as the total mass of the produced Thorin measure is greater than 1. Recall that univariate distributions like the log-Normal and Pareto have an infinite total mass for the Thorin measure.
We now have a reliable loss to minimize and an efficient algorithm to compute it, allowing us to estimate multivariate gamma convolutions from data. To test the approach, we discuss some potential applications in the next section.

Investigation of performance
In this section, we show the results of our algorithm on several examples, both in univariate and multivariate settings.
Note that, through the recursive Faà di Bruno formula in Algorithm 2, we produce Laguerre coefficients as high-degree multinomials in α, µ 0 and ξ (the simplex projection of the scales). The loss therefore has a myriad of local minima, making it non-convex and forcing us to use global minimization routines. We found the Particle Swarm [29,33] global optimization routine to work particularly well.
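For completeness, a particle swarm iteration is short to write down. The following is a generic, minimal implementation of ours (not the routine of [29,33] used in our package), checked on the Rastrigin function, a standard multimodal benchmark:

```python
import numpy as np

def pso(loss, dim, n_particles=60, iters=300, lo=-5.12, hi=5.12, seed=0):
    # Minimal global-best particle swarm: each particle is attracted both to its
    # own best position so far and to the best position found by the whole swarm.
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pval = np.array([loss(p) for p in x])
    g = pbest[np.argmin(pval)]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        val = np.array([loss(p) for p in x])
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        g = pbest[np.argmin(pval)]
    return g, pval.min()

# Rastrigin has a myriad of local minima and a unique global minimum 0 at the origin
rastrigin = lambda z: 10.0 * len(z) + np.sum(z**2 - 10.0 * np.cos(2 * np.pi * z))
x_best, f_best = pso(rastrigin, dim=2)
assert f_best < 1e-2
```

In our setting, `loss` would be the truncated L 2 loss of Definition 3.3.1, and the search space the shapes and (simplex-projected) scales.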
The whole implementation is gathered in the provided Julia package, along with the code corresponding to this investigation. Indeed, Julia allows the use of high-precision arithmetic together with compiled code and global optimization routines without the (potentially error-prone) reimplementation of at least one of these three features, which was not possible in Python, R, Matlab or C++, for example.
We start with the projection of known densities onto the G 1,n class, distinguishing the cases where the known density is inside G 1 or not. Then we discuss the estimation of G 1,n models from empirical data, with heavy-tailed simulated data, and we finally look at the estimation of G d,n models from multivariate empirical data, with a particular emphasis on the properties of the dependence structure underlying the data. We conclude with a direct application to a real dataset, the Loss-Alae dataset from Klugman & Parsa [28].

Projection from a known density
The first two examples are a log-Normal and a Weibull, which are respectively inside and outside G 1. We also take a look at the Pareto case, which is inside G 1 but might be outside L 2 (R +).
We want to compare our algorithm to the MFK procedures, discussed in Section 2.3, on fair grounds. Therefore, we slightly modify our algorithm to use coefficients based on the theoretical integrals µ k,−1, as MFK does. The choice of the input distributions is also guided by [40,19]. The experiment is as follows:

• We compute shifted moments (µ k,−1) k≤2n of the distribution through direct integration of the density, at 1024 bits of precision (to match the 300 digits that MFK requires).
• We use MFK to fit a G 1,n based on these moments.
• Through Eq 3, we compute Laguerre coefficients based on the same shifted moments µ k,−1 , and we minimize the loss given through Algorithm 2 to obtain our approximation.
• We compute Kolmogorov-Smirnov (one-sample, exact) distances between N = 10 000 simulations from the estimated G 1,n model and the theoretical distribution, over B = 100 different simulations.

Although MFK has a very strong convergence result, for n = 2 gammas our estimator performs better with the same information (theoretical shifted moments given with 1024 bits of precision), and for n > 2 the performance is overall the same. Indeed, we see that in most cases the two algorithms found parameters that are close to each other. We report the shapes and scales that both algorithms produced for n = 2, 3, 4 and 5 in Table 1. Although the produced estimations are close to each other on this example, our method has several clear benefits: the full density does not need to belong to G 1, and we do not require shifted moments up to 300 digits of precision. In the next subsection, we describe the same experiment on a Weibull density that is known to make MFK simply fail, even with accurate enough computations.
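The resampled KS-distance step of this experiment can be sketched as follows (our Python sketch; the n = 2 shapes and scales below are placeholders, not the fitted values of Table 1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def sample_gamma_convolution(shapes, scales, size, rng):
    # One draw from G_{1,n}(alpha, s): an independent sum of n gamma variables
    return sum(rng.gamma(a, s, size) for a, s in zip(shapes, scales))

shapes, scales = [1.2, 0.8], [0.9, 2.0]      # hypothetical G_{1,2} parameters
B, N = 20, 10_000
ks = [stats.kstest(sample_gamma_convolution(shapes, scales, N, rng),
                   stats.lognorm(s=0.83).cdf).statistic
      for _ in range(B)]                     # B resampled one-sample KS distances

assert len(ks) == B and all(0.0 <= d <= 1.0 for d in ks)
```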

Projection from densities outside
Let X be a Weibull r.v. with shape k > 0. Its pdf (with unit scale) can then be written as f(x) = k x^{k−1} e^{−x^k}, x > 0. When k > 1, this distribution does not belong to G 1. In [40,19], the authors report negative shapes for the MFK approximation in G 1,2 of the Weibull with shape k = 3/2, which is clearly a failure. This failure occurs because the truncated moment problem that MFK is solving has no solution, as we detailed in Section 2.3, since the (normalized) cumulants ξ are not inside the moment cone.
However, the Laguerre basis is an orthonormal basis of L 2 (R +), which contains this density. The projection procedure detailed at the top of Subsection 4.1 still works correctly: it produces meaningful shapes and scales that are positive, whatever the number of gammas we ask for. Figure 2 displays a violin plot of KS distances for several numbers of gammas: the abscissa shows n ∈ {2, 3, 4, 5, 10, 20, 30, 40}, the number of gammas used in the approximation. Only the Laguerre approximation results are presented.
On Figure 2, we see that the KS distance decreases with the number of gammas n when n is small, but reaches a minimum quite quickly. Indeed, the distance is bounded below since the target is not in the class. Although none of the proposed approximations passes the KS test, we will revisit this example later from an estimation point of view to show that the projection is still an acceptable approximation of the input Weibull distribution.
Note that the KS distance is not stable for increasing n: the algorithm has trouble setting parameters to 0 when needed. An additional penalty term in this direction might be a good way to overcome this behavior.
Through these first two examples, we showed the projection capabilities of our algorithm, and compared them with MFK, the only existing method in the literature. We saw that for input densities inside the G 1 class our method performs as well as MFK. Moreover, our method still works properly and produces meaningful results for examples outside the class, contrary to MFK, which does not give any result. We now turn to estimation from data, a problem that has never been solved for gamma convolutions, and showcase how our algorithm can find a good representation for a given dataset, whatever its dimension.
In the following, we treat empirical datasets given in standard IEEE 754 float64 precision.

Estimation from univariate empirical data
Before coming back in more detail to the Weibull case, we perform some tests with empirical data coming from simulations of two heavy-tailed distributions: log-Normal and Pareto.

Heavy-tailed examples
We first consider a dataset of 100 000 samples from a log-Normal distribution with the same parameter values as in Section 4.1, LN(µ = 0, σ = 0.83), in 64-bit precision.
From this sample, we compute empirical Laguerre coefficients, and then we optimize the set of shapes and scales of the estimated density to minimize the truncated L 2 loss between them and the ones produced through Algorithm 2. Following [19,40], we choose the size of the basis as m = 2n + 1, so that, once we remove the first moment, which is irrelevant, the number of moments equals the number of parameters. We ran the experiment for different numbers of gammas n = 10, 20 and 40. The corresponding results are respectively available in Figure 3 for n = 10, and in Figures 12 and 13 for n = 20, 40, see Appendix B. We see on these graphs that the approximation is good enough to fool the Kolmogorov-Smirnov test. The absolute pointwise error between the Laguerre approximation of the density of the gamma convolution and the true log-Normal density is lower than 0.005 for n = 10, 20, and twice smaller for n = 40. A positive but surprising result is that, although the distance between the Laguerre approximation and the true density does not decrease with increasing m, the distance between the gamma approximation and the true density does.
The KS test setup is as follows: we simulate 250 resamples of size 100 000 from the estimated density, and compare each with a one-sample exact KS test to the true density (and not to the samples that the estimator was computed from). For all values of n, we observe a close-to-uniform distribution of the p-values, which is a good result.
We also treated several datasets simulated from the Pareto distribution, with exactly the same experiment. The Pareto distribution has a shape parameter k > 0 that controls the thickness of the tail: the distribution has a variance if and only if k > 2, an expectation if and only if k > 1, and the density belongs to L 2 if and only if k > 1/2. Note that these inequalities are strict.
The fact that the distribution has no variance or no expectation is not really a problem for our procedure, since we use only shifted moments, which all Pareto distributions have, but our convergence result requires a square-integrable density. Therefore, we check several values of the shape parameter k, covering all cases, while varying the number n of gammas. Figure 4 summarizes the results by only showing quantile-quantile plots of the estimated distributions against the starting Pareto distribution. All the models follow the same procedure as for the previous log-Normal estimators. Note that even quantile-quantile plots between two simulations from the same Pareto distribution have a tendency to deviate in the extremes (due to the fat tail), which is why we excluded the last points. In each case, we also ran the same Kolmogorov-Smirnov testing procedure as for the log-Normal, and obtained very good uniformity of the p-values, except for k = 0.25 when n is too small. We see on the graph that increasing the number of gammas usually produces better results, but also that the heavier the tail (smaller shapes), the harder it is to get a good fit.
We see on these graphs that the method is accurate enough for reasonably high quantiles, and very good in the body of the distribution. The k = 0.25 case is outside the theoretical boundaries for the convergence result to take place: we have to wait until at least 20 gammas to obtain an acceptable fit.

An example outside the class
We revisit the Weibull(k = 1.5) case of Section 4.1, a distribution that is not inside the class. We ran our algorithm on 100 000 samples from a Weibull with shape k = 3/2, with successive numbers of gammas 2, 10, 20 and 40. The corresponding graphs are shown in Figure 5. Here, we show only the pointwise difference to the density and the quantile-quantile plot. One notable observation is that the Laguerre expansion of the density gets better and better as we increase the size of the basis (jointly with the number of gammas), but the estimator does not: the difference to the density stays the same, and always makes Kolmogorov-Smirnov reject the proposed estimator, although quantile-quantile plots indicate correct fits. On these estimations, several things should be noted:

• The KS tests reject the hypothesis that the distributions are the same whatever the number of gammas, which was expected.
• The pointwise difference on the density of the estimator does not match the error of the Laguerre expansion (which gets smaller as the basis grows) but stays the same whatever the size of the basis and the number of gammas. This error represents some kind of distance between the Weibull distribution and the class.
• The quantile-quantile plot, however, indicates a correct fit on all of the distribution, except maybe the left tail.
These three points show that our procedure works even for data that are far from the G 1 class, finding some orthogonal projection, relative to the basis, onto the class. This is a very good property of the procedure.
In the next examples, we will showcase some multivariate uses of the procedure, with a particular emphasis on the dependence structure of the simulated data and of the produced estimators.

Estimation from multivariate empirical data
We now discuss the multivariate capabilities of our algorithm. As no known algorithm estimates multivariate gamma convolutions, we are not able to compare our results to other procedures. However, we can still check quantile-quantile plots for the marginals and verify that the shape of the dependence structure matches the inputted one.
As use cases might vary, we propose two different examples: a multivariate log-Normal, with a Gaussian copula that induces no tail dependency, and a (survival) Clayton copula, including a strong tail dependency. Note that we restrict the exposition to estimation from empirical data, but formal integration to obtain (µ k,−1) k≤m from a formal density at any desired computational precision is also a perfectly fine use case. However, we do not yet have a clever way of choosing the parameter m. We found that setting all m i 's equal to n provides enough evaluation points to obtain a good fit on our simple examples.

A Gaussian example
We do not know whether the d-variate log-Normal, defined as the componentwise exponential of a d-variate Gaussian random vector, is inside G d. However, we can still run our procedure and observe the resulting approximations.
A first result is given in Figure 6, which corresponds to a G 2,10 estimation from 100 000 data points simulated from an MLN(µ = 0, σ = 1, ρ = 0.5). Figure 6 shows quantile-quantile plots for the marginals, and a simple Gaussian kernel density estimate of the pseudo-observations for the original and estimated models. We see on Figure 6 that the dependence structure was correctly reproduced by the approximation as a convolution of 10 (comonotonic) gamma random vectors, but the marginal tails are not estimated correctly. Thankfully, pushing the number of multivariate gammas to n = 20 solves this issue, as shown in Figure 7: the dependence structure is still very good, and the problem of the marginal tails has been solved by adding a few more gammas.
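The simulation setup and the pseudo-observations can be sketched as follows (our Python sketch; the paper's experiments use the Julia package). As a sanity check, we compare the Spearman correlation of the pseudo-observations to the known closed form (6/π) arcsin(ρ/2) for a Gaussian copula:

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(2)

# MLN(mu = 0, sigma = 1, rho = 0.5): componentwise exponential of a bivariate Gaussian
rho, N = 0.5, 100_000
cov = np.array([[1.0, rho], [rho, 1.0]])
X = np.exp(rng.multivariate_normal(np.zeros(2), cov, size=N))

# Pseudo-observations: componentwise normalized ranks, living on (0, 1)^2
U = np.column_stack([rankdata(X[:, j]) / (N + 1) for j in range(2)])

# Spearman's rho of a Gaussian copula is (6 / pi) * arcsin(rho / 2)
spearman = np.corrcoef(U[:, 0], U[:, 1])[0, 1]
assert abs(spearman - 6 / np.pi * np.arcsin(rho / 2)) < 0.02
```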
Note that, since we have 100 000 points in the quantile-quantile plots, one point off the line corresponds to a deviation at the 0.00001-quantile, which is of little importance in many use cases.
Overall, the marginal quantile-quantile plots are correct when the number of gammas is big enough, and the Gaussian dependence structure is well fitted. The Gaussian copula exhibits no tail dependency. We already know that our model can exhibit asymmetric behaviors, but the possibility of tail dependency is also an important feature to have. Therefore, in the next example we run the model on datasets that have this feature.

A copula with tail dependency
We fit data whose dependence structure is given through a (survival) Clayton copula.This copula is a symmetric copula that exhibits upper tail dependency.
We took here a parameter θ = 7, yielding a strong positive dependence structure.
In Figure 8, we show the results corresponding to data sampled from a Clayton copula, with a Pareto(k = 2.5) marginal and a log-Normal LN(0, 0.83) marginal. The conditions of the experiment are identical to the previous subsection. We note that the marginal upper tails are not perfectly fitted: this could probably be overcome by taking a greater number of gammas. However, we see that the shape of the estimated copula is not perfectly symmetric: there is no constraint on the model to produce a symmetric dependence structure, and therefore producing one is hard. The small misfit around the (0.25, 0.02) quantile is probably due to this: the marginals are not on the same scale, and therefore the Laguerre coefficients with k 1 ≫ k 2 are probably on a different scale than those with k 2 ≫ k 1. The goodness of fit is therefore not symmetric.
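Such data can be reproduced with a standard frailty sampler for the Clayton copula (a sketch of ours; the flip U ↦ 1 − U yields the survival version, whose upper-tail dependence coefficient is 2 − 2^{1/θ}):

```python
import numpy as np

rng = np.random.default_rng(3)

def survival_clayton(theta, d, size, rng):
    # Marshall-Olkin frailty sampler: psi(t) = (1 + t)^(-1/theta) is the Clayton
    # generator; flipping U -> 1 - U moves the tail dependence to the upper tail.
    V = rng.gamma(1.0 / theta, 1.0, size=(size, 1))
    E = rng.standard_exponential((size, d))
    return 1.0 - (1.0 + E / V) ** (-1.0 / theta)

theta, N = 7.0, 200_000
U = survival_clayton(theta, 2, N, rng)

# Empirical upper-tail dependence P(U2 > q | U1 > q) near the true limit 2 - 2^(1/theta)
q = 0.99
lam_hat = np.mean(U[U[:, 0] > q, 1] > q)
lam_true = 2.0 - 2.0 ** (1.0 / theta)
assert abs(lam_hat - lam_true) < 0.05
```

Marginals such as the Pareto and log-Normal above are then obtained by applying the corresponding quantile functions componentwise to U.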

A real dataset
To test the algorithm on more realistic data, we chose the Loss-Alae dataset from Klugman & Parsa [28]. This dataset features 1500 observations of two variables: "Loss" and "Alae". We refer to [28] for a description of this (quite standard) dataset in the insurance valuation and copula estimation fields. On Figure 9, we see that the dependence between the two marginals is somewhat strong. There is left and right tail dependency, and the marginals are clearly heavy-tailed.
We ran the algorithm with numbers of gammas n ∈ (2, 3, 4, 5, 10, 20). We also fixed m = (n, n), arbitrarily. The obtained shapes and rates are given in Table 2, in Appendix B. We note that all estimators are ε-well-behaved, and that the algorithm sometimes uses zero scales. Quantile-quantile plots of the marginals and kernel density estimates of the dependence structures for n = 2, 5, 10, 20 are reported in Figure 10. We observe on the first two lines of Figure 10 a strange behavior of the dependence structure. This is due to the fact that only one marginal has zeros in the scale matrices (see Table 2 in Appendix B). Although unusual, this kind of structure is quite expressive, and indeed for bigger values of n, where we convolve several of them, we manage to capture the dependence structure of the data. However, we have to wait until n = 20 to obtain good fits on the marginals as well. Figure 11 reports more information about the final fit n = 20, showing the parametric smoothing on the marginals and the dependence structure. Simulation from the obtained model is trivial, thanks to its additive structure. Moreover, the distribution of any linear combination ⟨c, X⟩ (c ≥ 0) is also given as a G 1,n (α, ⟨c, s⟩). This additive property is a computationally valuable feature of the model: any result and potential code routine that computes statistics of G 1,n distributions can be directly used for the marginals and for all linear combinations of them. Finally, Theorem 4.1.1 in [5] provides a neat mixture-of-gammas representation for G 1,n distributions, with mixture weights easily computable from the parameters, which can be useful for applications.
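The closure under positive linear combinations mentioned here is easy to check by simulation (our sketch, with arbitrary illustrative parameters): a G d,n vector can be written X = Σ_j s ·,j G j with independent G j ∼ Gamma(α j , 1), so ⟨c, X⟩ has the law of a G 1,n (α, ⟨c, s⟩) variable:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Illustrative G_{2,3} parameters: columns of s are the scales of the Thorin atoms
alpha = np.array([0.7, 1.3, 2.0])
s = np.array([[1.0, 0.5, 0.0],
              [0.2, 0.0, 1.5]])
N = 100_000
G = np.stack([rng.gamma(a, 1.0, N) for a in alpha])     # independent gamma factors, (3, N)
X = s @ G                                               # G_{2,3} sample, shape (2, N)

c = np.array([2.0, 1.0])
lhs = c @ X                                             # <c, X>
rhs = (c @ s) @ np.stack([rng.gamma(a, 1.0, N) for a in alpha])  # G_{1,3}(alpha, <c, s_j>)

# A two-sample KS test should not distinguish the two samples
p = stats.ks_2samp(lhs, rhs).pvalue
assert p > 1e-4
```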

Conclusion
Although the class of gamma convolutions has very appealing properties for the practitioner, the estimation of gamma convolutions from empirical data had not been considered in the literature. The closest algorithm we could find is the recent MFK algorithm, described in Subsection 2.3, which projects densities from the G 1 class onto the G 1,n class. Unfortunately, extending this algorithm for general estimation purposes is hard, if not impossible.
Using a well-chosen Laguerre basis, we constructed a series expansion for the density of a distribution in G d,n. Coupled with a simple L 2 loss on the density, we were able to project densities from G d, d > 1, onto G d,n, which was not previously possible, but also to project any density onto G d,n, which was not possible either. Finally, through the same loss, we can estimate G d,n distributions from empirical data for any reasonable d, n, which was also not treated in the literature.
Although our convergence result is not as strong as MFK's, the numerical precision requirements and computational speed are far better. We saw that, given the same information (shifted moments of the random variable with high precision), the performance of the two algorithms is overall the same, except maybe for small numbers of gammas, where we beat MFK. For both algorithms, Figure 1 also shows that the model's goodness of fit seems to decrease for large numbers of gammas, and an adaptive estimator might be a suitable way out.
Maybe a better loss could be found for the estimation of these gamma convolutions. Furthermore, automatic penalization of the model could be done through the number of gammas, but also through the sparsity of the Thorin measure, according to Property 2.1.4, yielding more independent factors when possible. Such a modification would also provide a clearer view of the additive risk factor structure.
The rate of the estimator will be explored further in future work, using results from the parametric contrast estimators literature, as a referee pointed out.
Last but not least, many features of multivariate Thorin classes remain to be discovered. The estimation of multiplicative structures could yield interesting results. Bondesson showed that the product of random variables in G 1 is also in G 1, but we still do not have a way of constructing the corresponding Thorin measure, nor a corresponding result for the multivariate case.
For (i) and (ii), consider that there exists a matrix Y of gamma random variables Y i,j ∼ G 1,1 (α j , s i,j) such that … where the row vectors Y i,· are independent and the column vectors Y ·,j have two separate groups of marginals: a first comonotonic group (with gamma marginals) corresponding to scales s i,j that are strictly positive, and a second group (with identically 0 marginals) corresponding to null scales. Note that one of these two groups might be empty. From this expansion, we can easily deduce the structure of maximal supports for ν that leads to independent or comonotonic random vectors X.
For (iii), note that D is the rank of the matrix s, hence D ≤ min(n, d).If D = d = n, then X is by definition an invertible linear transformation of an independent, d-dimensional random vector, and therefore has a density in L 2 that we can express easily.If D = d < n, then X is the convolution of the previous case with something else, and therefore also has a density in L 2 .The last case, D < d, gives a random vector X that is a linear transformation of a set of D < d independent variables, which is therefore clearly not absolutely continuous as claimed.
Consider non-atomic Thorin measures as limits of the previous case. The second result comes from a discretization of the Thorin measure, from continuously uniform on [0, 1] to uniform on {j/(n + 1), j ∈ {1, ..., n}}. Note that the cumulant generating function converges on its domain of definition, which includes a real interval, hence the convergence in distribution. To prove the first part, note that there exists X 0 ∼ G 1,1 (α, 1) such that X = ⟨s, (X 0 , ..., X 0)⟩. Therefore, denoting by M the mgf of X 0, … Therefore, the Laguerre coefficients a k (α, s) of X are given by: … Then, for the reverse assertion, consider the expression of the first Laguerre coefficients of X: … Use the simplex transformation x = s/(1 + |s|) to solve for s as a function of α in the second equation, and inject the result into the first. We obtain an equation of the form a e^x + b x + c = 0, which is solved by the zeroth branch of the Lambert W function.
Proof of Lemma 3.2.2. Points (i) and (ii) are obvious. To prove the three remaining points, we use the fact that h is a Möbius transform, see [1], and hence maps generalized circles to generalized circles.
… and scalar products of vectors by ⟨x, y⟩ = x 1 y 1 + · · · + x d y d. Finally, we denote by (x choose y) = ∏ i (x i choose y i) the product of binomial coefficients.
… and truncation threshold m ∈ N d. Result: Laguerre coefficients a = (a k) k≤m of the G d,n (α, s) density. Side-product: −1-shifted cumulants κ and moments µ up to order m of the G d,n (α, s) distribution. Compute the simplex version of the scales.

Property 3.3.2 (Consistency). Consider that f ∼ G d,n (α 0 , s 0) is well-behaved, and denote by X a set of N i.i.d. random vectors with this distribution. Fix m ∈ N d. Suppose that we have a global minimizer (α̂, ŝ) = arg min over well-behaved G d,n (α, s) of

the strong law of large numbers, A → 0 a.s. for m fixed. Now consider the remainder B. We show that B → 0 a.s. as m → ∞, uniformly in N.

Projection from G 1 densities to G 1,n

The log-Normal distribution is a fundamental distribution in the G 1 class. The proof that it belongs to G 1 is actually what historically initiated the study of the class by Thorin. To match MFK's experiment, we conduct our first comparison with a log-Normal LN(µ = 0, σ = 0.83) density. The resampled Kolmogorov-Smirnov distances are summarized, for several n and for both algorithms, in Figure 1.

Fig 3. Log-Normal results with 10 gammas. Upper left: comparison of the density approximations. Lower left: difference to the true density. Upper right: quantile-quantile plot of the approximation against the true distribution. Lower right: p-values of one-sample KS tests of simulations from the estimator against the true distribution.

Fig 4. Quantile-quantile plots for the Pareto experiments, with N = 1000 samples (log-scaled). We only show the first 995 points: 5 points in the tail are excluded for clarity. Each row corresponds to a shape parameter of the Pareto, and each column to a number of gammas.

Fig 6. Multivariate log-Normal results with 10 gammas. Top left and bottom right: quantile-quantile plots for the two marginals. Bottom left and top right: level plots of pseudo-observations from simulations of, respectively, the estimated model and the dataset.

Fig 9. The original Loss-Alae dataset. Left: a log-scale joint plot of the dataset. Right, from top to bottom: Gaussian kernel density estimates of the copula and of the two marginals (both log-scale).

Fig 10. Results of the estimation on the Loss-Alae dataset. Each row corresponds to a different number of gammas n ∈ {2, 3, 10, 20}. The first column gives marginal quantile-quantile plots of Loss and Alae, the middle column gives a joint plot on log-scale, and the last column represents the dependence structure on the copula scale.

Fig 11. Results for the n = 20 model. Top left: density of the sum Loss + Alae. Top right: marginal densities. Bottom: a comparison between pseudo-observations from a simulation (right) and the original dataset (left).

Proof of Example 3.1.2. The bijection is given through the following equations: a_k(α, s) and (a_0 − a_{1(i)})/(2α a_0) for all i ∈ {1, ..., d}, and reciprocally, where:
• 1(i) is the d-variate vector with j-th component equal to 1_{i=j}, for all j;
• c_1 = a_0 2^{−d} √(…) and c_2 = d/2 − (1/(2a_0)) Σ_{i=1}^{d} a_{1(i)};
• W_0 is the zeroth branch of the Lambert function W, defined by y = W(x) ⟺ x = y e^y.
Furthermore, h maps reals to reals, and we obtain c(b) and r(b) from the expressions of h(b) and h(−b). Last, by checking whether the disk with center c(b) and radius r(b) contains h(0) = −1, we conclude.

Proof of Example 3.2.5. The first three cases are deduced directly from Definition 3.2.3 and Property 3.2.4. For point (iv), recall that for two gamma convolutions with Thorin measures ν_1 and ν_2, the Thorin measure of the sum is ν(s) = ν_1(s) + ν_2(s). Now, by Property 3.2.4, point (iv), no set of p < d rays S_{c_1}, ..., S_{c_p} concentrates more than half of the weight of ν_1 nor more than half of the weight of ν_2. Hence, no set of p < d rays can concentrate more than half of the weight of ν. Therefore, the class of w.b. gamma convolutions is closed w.r.t. finite convolutions.

Appendix B: Auxiliary figures and tables
2.3 shows. Definition 3.2.3 (ε-well-behaved gamma convolutions). A gamma convolution G_{d,n}(α, s) is said to be ε-well-behaved, in short ε-w.b., if |α| > 1 and, for all I ⊆ {1, ..., n} such that Σ_{i∈I} α_i > Σ_{i∉I} α_i, denoting s_I = (s_i)_{i∈I}, any (complex) solution t* of the linear system of equations s_I t = 1 satisfies |h(t*_j)| > 1 + ε for at least one j ∈ {1, ..., d}. The solution of s_I t = 1 is unique if and only if Ker(s_I) = {0}. If it exists, call this solution t* ∈ C^d. Now, since for a given i ∈ I the s_{i,j} are all non-negative reals and one of them must be nonzero, t* must have at least one component t*_j with positive real part. Furthermore, this component is clearly of bounded modulus, since the solution is unique and all s_{i,j} are finite. Finally, the requirement |h(t*_j)| > 1 + ε is equivalent, through Lemma 3.2.2, point (v), to |t*_j − c(1 + ε)| < r(1 + ε) (where c and r were defined in Lemma 3.2.2). Since the disk D(c(1 + ε), r(1 + ε)) lies in C⁺, and tends to the whole of C⁺ as ε goes to 0 by Lemma 3.2.2, point (iv), there always exists an ε > 0 such that a bounded t*_j with positive real part is inside this disk.
The proof of Property 3.2.4 gives a generic way of finding the greatest ε that makes a well-behaved gamma convolution ε-w.b. in the general multivariate case. In a certain sense, the ε-w.b. property is a quantification of the statements of Property 2.1.4: the Thorin measure must not assign half of its mass or more to any subset that would correspond to a non absolutely continuous gamma convolution. It is therefore stronger than the absolute continuity of the density. A surprise comes in the next examples, where we show the unexpected fact that the w.b. subclass of gamma convolutions is closed w.r.t. finite convolutions. Example 3.2.5 (Simple w.b. examples). As soon as the total mass of their Thorin measure is greater than one, the following gamma convolutions are well-behaved: (i) any univariate gamma convolution (the best ε is given in Definition 3.2.3); (ii) any independent random vector in G_{d,n}. […] The Laguerre coefficients (a_k)_{k∈N^d} of any d-variate ε-well-behaved gamma convolution verify:

Table 1. Estimated parameters from both algorithms on the log-Normal example.

Table 2. Estimated Thorin measures on the Loss-Alae dataset, for several numbers of atoms. Scales below 10^{−10} are considered to be 0.
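The thresholding rule of Table 2 is easy to operationalize on an estimated atomic Thorin measure; a minimal sketch (the atoms below are illustrative, not the table's estimates):

```python
def prune_atoms(alphas, scales, tol=1e-10):
    """Drop atoms of an estimated Thorin measure whose scale falls
    below tol, treating them as numerically zero (as in Table 2)."""
    kept = [(a, s) for a, s in zip(alphas, scales) if s >= tol]
    return ([a for a, _ in kept], [s for _, s in kept])

# Illustrative estimated atoms: the middle one has a negligible scale
alphas, scales = prune_atoms([0.7, 1.2, 0.1], [2.5e-1, 3.0e-12, 1.4])
```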