Nonparametric Estimation of the Expected Discounted Penalty Function in the Compound Poisson Model

We propose a nonparametric estimator of the expected discounted penalty function in the compound Poisson risk model. We use a projection estimator on the Laguerre basis, computing the coefficients via Plancherel's theorem. We provide an upper bound on the MISE of our estimator and show that it achieves parametric rates of convergence on Sobolev–Laguerre spaces without requiring a bias-variance compromise. Moreover, we compare our estimator with the Laguerre deconvolution method: we derive an upper bound on the MISE of the Laguerre deconvolution estimator and compare the two estimators on Sobolev–Laguerre spaces. Finally, we compare these estimators on simulated data.


The statistical problem
We consider the classical risk model (compound Poisson model) for the risk reserve process (U_t)_{t≥0} of an insurance company:

U_t = u + ct − Σ_{i=1}^{N_t} X_i, (1)

where u ≥ 0 is the initial capital; c > 0 is the premium rate; the claim number process (N_t) is a homogeneous Poisson process with intensity λ; and the claim sizes (X_i) are positive and i.i.d. with density f and mean μ, independent of (N_t). We denote by τ(u) the ruin time:

τ(u) := inf{t ≥ 0 : U_t < 0},

and we make the following assumption to ensure that τ(u) is not almost surely finite.
Assumption 1 (safety loading condition). Let θ := λμ/c; we assume that θ < 1.

To study simultaneously the ruin time, the deficit at ruin, and the surplus level prior to ruin, Gerber and Shiu (1998) introduced the function:

φ(u) := E[e^{−δτ(u)} w(U_{τ(u)−}, |U_{τ(u)}|) 1_{τ(u)<∞}], (2)

where δ ≥ 0 and w is a non-negative function of the surplus prior to ruin and the deficit at ruin. This function is called the expected discounted penalty function, but it will also be referred to as the Gerber-Shiu function in the following. For more information concerning the compound Poisson model and the Gerber-Shiu function, see Asmussen and Albrecher (2010).
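To fix ideas, the model (1) can be sketched with a short Monte Carlo simulation; this is purely illustrative and not part of the estimation procedure studied in the paper. We take Exp(1/μ) claim sizes as a special case, and we use the fact that ruin can only occur at claim epochs.

```python
import random

def ruin_prob_mc(u, c, lam, mu, horizon, n_paths, seed=0):
    """Monte Carlo estimate of the finite-horizon ruin probability
    P[tau(u) <= horizon] in the compound Poisson model (1), with
    Exp(1/mu) claim sizes (an illustrative special case). Ruin can
    only occur at claim epochs, so it suffices to track the reserve
    just after each claim."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_paths):
        t, reserve = 0.0, u
        while True:
            dt = rng.expovariate(lam)              # inter-claim time
            t += dt
            if t > horizon:
                break
            reserve += c * dt                      # premiums collected
            reserve -= rng.expovariate(1.0 / mu)   # claim of mean mu
            if reserve < 0:
                ruined += 1
                break
    return ruined / n_paths
```

With λ = 1, μ = 1, c = 1.5 (so θ = 2/3) and a long horizon, the estimate at u = 0 is close to the known infinite-horizon value ψ(0) = θ for exponential claims.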

F. Dussap
motivated by the work of , where the properties of the Laguerre functions relative to the convolution product are used to solve a Laplace deconvolution problem. The same method was later used in more general risk models: Zhang and Su (2019) estimate the Gerber-Shiu function in a Lévy risk model, where the aggregate claims process is a pure-jump Lévy process; Su, Yong and Zhang (2019) estimate the Gerber-Shiu function in the compound Poisson model perturbed by a Brownian motion; and Su, Shi and Wang (2019) study the model where both the income and the aggregate claims are compound Poisson processes. Recently, Su and Yu (2020) showed that the Laguerre projection estimator of the Gerber-Shiu function in the compound Poisson model is pointwise asymptotically normal in the case δ = 0.
In this paper, we construct an estimator of the Gerber-Shiu function (2) in the compound Poisson model (1). As Zhang and Su (2018), our estimator is a projection estimator on the Laguerre basis, but we compute the coefficients using Plancherel theorem instead of using a Laguerre deconvolution method. We emphasize that our estimator achieves parametric rates of convergence on Sobolev-Laguerre spaces regardless of the regularity of the Gerber-Shiu function, and without needing to find a compromise between the bias and the variance.
We also improve the previous results concerning the Laguerre deconvolution method. Previous rates were given in probability (O_P rates), whereas we provide a non-asymptotic bound on the MISE (Mean Integrated Squared Error) of the estimator. To achieve this goal, we introduce two modified versions of the Laguerre deconvolution estimator: the first one depends on a truncation parameter, whereas the second one does not, but it is only defined in the case δ = 0.
To control the MISE of the second version of the Laguerre deconvolution estimator, we had to prove that the primitives of the Laguerre functions are uniformly bounded (see Lemma 3.4). This result is interesting in its own right; the proof relies on the properties of the ODEs satisfied by the Laguerre polynomials. The interested reader can find all the details in Appendix B.
Outline of the paper In the remaining part of this section, we introduce the notations and we give preliminary results on the Gerber-Shiu function. In Section 2, we construct our estimator and we study its MISE. In Section 3, we introduce two modified versions of the Laguerre deconvolution estimator and we study their MISE. In Section 4, we compute convergence rates of the different estimators considered on Sobolev-Laguerre spaces and also in the case where the claim sizes are exponentially distributed. In Section 5, we compare numerically the estimators on simulated data. We gathered all the proofs in Section 7.

Notations and preliminaries on the Gerber-Shiu function
We use the following notations in the paper: • "x ≲ y" means x ≤ Cy for an absolute constant C > 0. • ‖A_m‖_op := sup_{x≠0} ‖A_m x‖_2 / ‖x‖_2 is the ℓ2-operator norm of the matrix A_m ∈ R^{m×m}.
The key result to estimate the Gerber-Shiu function is the following theorem.
We need to ensure that φ, g and h belong to L²(R₊) in order to use a projection estimator. We see that sup_x g(x) ≤ sup_x (λ/c) P[X > x] ≤ λ/c and ∫_0^∞ g(x) dx ≤ (λ/c) ∫_0^∞ P[X > x] dx = λμ/c = θ, hence g ∈ L¹(R₊) ∩ L^∞(R₊), and therefore g ∈ L²(R₊). To ensure that h ∈ L²(R₊), we make the following assumption.

Remark 1.4. Assumption 2 has already been considered by Shimizu and Zhang (2017) and Zhang and Su (2018). Actually, the quantity: can be found on several occasions in the study of the Gerber-Shiu function. The assumption that ∫_0^∞ ω(x) dx is finite ensures that φ(u) is finite for all u (Asmussen and Albrecher, 2010, Chapter X, Section 1). The additional requirement that ∫_0^∞ x ω(x) dx is finite serves to prove that φ belongs to L¹(R₊), so that its Fourier transform is well defined. As we have seen, it also ensures that φ belongs to L²(R₊).

The Laguerre-Fourier estimator
We use the Laguerre functions (ψ_k)_{k∈N} as an orthonormal basis of L²(R₊):

ψ_k(x) := √2 L_k(2x) e^{−x} 1_{x≥0},

where L_k denotes the k-th Laguerre polynomial (see (6)). We choose this basis for several reasons. First, the support of the Laguerre functions is R₊, which is well suited since the functions we want to estimate are defined on R₊. Moreover, exponential functions (and more broadly mixtures of Gamma densities, see the proof of Lemma 3.9 in Mabon (2017)) have an exponentially small bias in this basis, which is interesting because when the claim size distribution is exponential and w is a polynomial, g and h are given by products of polynomials with exponentials. Finally, the Fourier transform of the Laguerre functions is known explicitly (see (7)), which is helpful for the computation of the estimated coefficients (8).
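Numerically, the basis can be evaluated through the classical three-term recurrence of the Laguerre polynomials. The sketch below is illustrative (it is not the paper's code); orthonormality can then be checked by quadrature.

```python
import math

def laguerre_basis(m, x):
    """Values psi_0(x), ..., psi_{m-1}(x) of the Laguerre functions
    psi_k(x) = sqrt(2) * L_k(2x) * exp(-x), computed through the
    three-term recurrence of the Laguerre polynomials."""
    e = math.sqrt(2.0) * math.exp(-x)
    vals = [e]                                   # psi_0
    if m > 1:
        vals.append((1.0 - 2.0 * x) * e)         # psi_1
    # L_{k+1}(y) = ((2k+1-y) L_k(y) - k L_{k-1}(y)) / (k+1), with y = 2x
    for k in range(1, m - 1):
        vals.append(((2 * k + 1 - 2 * x) * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals
```

A trapezoidal quadrature of ∫ ψ_j ψ_k over [0, 40] reproduces the identity matrix up to the quadrature error.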
We denote by (a_k)_{k≥0} the Laguerre coefficients of φ. If m ∈ N*, we denote by φ_m the projection of φ on the subspace of L²(R₊) spanned by the first m Laguerre functions ψ_0, ..., ψ_{m−1}. The Laguerre coefficients of φ can be computed using Plancherel's theorem. Taking the Fourier transform in equation (3), we see that Fφ = Fh / (1 − Fg). Let ĝ, ĥ ∈ L²(R₊) be estimators of g and h (we provide these estimators later in equation (14)); we estimate the coefficients of φ by (8), where F̃g := (Fĝ) 1_{|Fĝ| ≤ θ_0} for some truncation parameter θ_0 < 1. The estimator of φ is then φ̂_{m_1}, where m_1 is the dimension of the projection space.
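For the record, with the convention (Ff)(ξ) = ∫_0^∞ e^{iξx} f(x) dx (sign conventions may differ from the paper's), the explicit form Fψ_k(ξ) = √2 (−(1 + iξ))^k / (1 − iξ)^{k+1} can be checked numerically; an illustrative sketch:

```python
import math

def psi_fourier(k, xi):
    """Explicit Fourier transform of the Laguerre function psi_k,
    with the convention (F f)(xi) = int_0^inf e^{i xi x} f(x) dx:
    F psi_k(xi) = sqrt(2) * (-(1 + i*xi))**k / (1 - i*xi)**(k + 1)."""
    return math.sqrt(2.0) * (-(1.0 + 1j * xi)) ** k / (1.0 - 1j * xi) ** (k + 1)

def psi(k, x):
    """psi_k(x) = sqrt(2) L_k(2x) e^{-x} via the three-term recurrence."""
    p0 = math.sqrt(2.0) * math.exp(-x)
    if k == 0:
        return p0
    p1 = (1.0 - 2.0 * x) * p0
    for j in range(1, k):
        p0, p1 = p1, ((2 * j + 1 - 2 * x) * p1 - j * p0) / (j + 1)
    return p1
```

Comparing this closed form against a direct quadrature of the Fourier integral gives agreement to several decimals.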
Proposition 2.1. Under Assumptions 1 and 2, if θ < θ_0, we have: Remark 2.2. We emphasize that this result is proven using only two properties: the function φ satisfies equation (3), and θ_0 > θ ≥ ‖g‖_{L¹}. Hence, it can be applied to other problems where the target function satisfies an equation of the form (3). For example, this is the case in Zhang and Su (2019), Su, Shi and Wang (2019) and Su, Yong and Zhang (2019).
We now need to provide good estimators of g and h. We choose to estimate them by projection on the Laguerre basis too. Let (b k ) k≥0 and (c k ) k≥0 be the coefficients of g and h, that is: By Fubini's theorem and using equation (4):

The same calculation for c_k yields: We estimate these coefficients by empirical means. However, we first need to estimate ρ_δ. Since ρ_δ is the non-negative solution of the Lundberg equation (5), we estimate it by ρ̂_δ, the non-negative solution of the empirical Lundberg equation (11), where λ̂ := N_T / T and L̂f(s) := (1/N_T) Σ_{i=1}^{N_T} e^{−sX_i}. When δ = 0, we know that ρ_δ = 0, so we do not need to estimate it; we set ρ̂_0 := 0. The estimated coefficients of g and h are given by (12) and (13), and the estimators of g and h are defined in (14), where m_2 and m_3 are the dimensions of the projection spaces. As we did for φ, we denote by g_{m_2} and h_{m_3} the projections of g and h on the subspaces Span(ψ_0, ..., ψ_{m_2−1}) and Span(ψ_0, ..., ψ_{m_3−1}).
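The Newton step used to solve a Lundberg-type equation can be sketched as follows; the Laplace transform and its derivative are passed as functions, so the same code covers both the true equation and its empirical counterpart (this is an illustrative sketch, not the paper's implementation).

```python
def lundberg_root(c, lam, delta, laplace_f, dlaplace_f, s0,
                  tol=1e-12, max_iter=100):
    """Newton's method for the nonnegative root of
    l_delta(s) = c*s - (lam + delta) + lam * Lf(s),
    where Lf is the (true or empirical) Laplace transform
    of the claim density."""
    s = s0
    for _ in range(max_iter):
        value = c * s - (lam + delta) + lam * laplace_f(s)
        slope = c + lam * dlaplace_f(s)
        s_next = s - value / slope
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s
```

For Exp(1/μ) claims, Lf(s) = 1/(1 + μs), and for δ > 0 the root can be verified against the quadratic equation it induces; for δ = 0 the routine recovers ρ_0 = 0.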
Remark 2.3. The dimensions m 1 , m 2 , m 3 do not have to be the same for the estimation of φ, g and h. In practice, we will choose different dimensions.
In order to give a bound on the mean integrated squared error of our estimatorsĝ m2 andĥ m3 , we need to make an additional assumption.
Remark 2.4 (Applicability of Assumptions 2 and 3). Assumptions 2 and 3 can be thought of as moment conditions on the claim size distribution, with respect to w. In the special case where w is given by w(x, y) = x^k (x + y)^ℓ for k, ℓ ≥ 0, Assumptions 2 and 3 reduce to the moment condition E[X^{2k+2ℓ+3}] < +∞ (if δ = 0). Notice that the functions of Example 1.1 correspond to the cases (k, ℓ) = (0, 0) or (0, 1), so that the corresponding moment conditions are E[X³] < +∞ and E[X⁵] < +∞. Hence, heavy-tailed distributions can fit into these assumptions, provided they admit sufficiently large moments. On the other hand, if w grows at an exponential rate, for example if w(x, y − x) := exp(γ(x + y)), then we also need an exponential moment for X, so that we are restricted to light-tailed distributions.
Theorem 2.5. Under Assumptions 1, 2 and 3, if δ = 0 then it holds: and if δ > 0 then it holds: Remark 2.6. The variance terms depend on neither m_2 nor m_3, so no compromise between the bias and the variance is needed: we just have to take m_2 and m_3 as large as possible so that the bias is smaller than 1/T. See Section 4 for a discussion of the choice of m_2 and m_3 when the functions g and h belong to a Sobolev-Laguerre space.
Let m_1, m_2, m_3 ∈ N*; we estimate g by ĝ_{m_2} and h by ĥ_{m_3}. We plug these estimators into (8) and estimate φ accordingly, with F̃g_{m_2} := (Fĝ_{m_2}) 1_{|Fĝ_{m_2}| ≤ θ_0}. Combining Proposition 2.1 with Theorem 2.5, we obtain: Corollary 2.7. Under Assumptions 1, 2 and 3, if θ < θ_0 then it holds: We want to compare our estimator with the Laguerre deconvolution method. However, there is no result on the MISE of this method for estimating the Gerber-Shiu function, so we study it in the next section.

The Laguerre deconvolution estimator
For the Laguerre deconvolution method, we need an additional assumption on the coefficients of g.
Remark 3.1. If g belongs to a Sobolev-Laguerre space W^s(R₊) with regularity s > 1, then Assumption 4 holds automatically. The spaces W^s(R₊) are regularity spaces associated with the Laguerre basis, see Definition 4.1 below. Indeed, by the Cauchy-Schwarz inequality, we have: The reason why the Laguerre basis is well suited for deconvolution on R₊ is the following relation satisfied by the Laguerre functions: see formula 22.13.14 in Abramowitz and Stegun (1972). The reader interested in the use of the Laguerre basis for deconvolution problems is referred to Mabon (2017). Expanding the renewal equation (3) in the Laguerre basis, one easily obtains the following relation between the coefficients of φ, g and h: where the sequence (β_k)_{k≥0} is defined by β_0 := b_0/√2 and β_k := (b_k − b_{k−1})/√2 for k ≥ 1. This relation can be written in matrix form: if a_m := (a_0, ..., a_{m−1})^T and c_m := (c_0, ..., c_{m−1})^T are the vectors of the first m coefficients of φ and h, then it holds A_m a_m = c_m (15), where A_m is the lower triangular Toeplitz matrix (16). This matrix is invertible if and only if b_0/√2 ≠ 1, which is the case because b_0/√2 ≤ θ < 1 under Assumption 1.
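Since A_m is lower triangular Toeplitz, the deconvolution step amounts to forward substitution. A minimal sketch (illustrative only, with the β_k built from the b_k as above):

```python
import math

def solve_lower_toeplitz(col, rhs):
    """Forward substitution for T x = rhs, where T is the lower
    triangular Toeplitz matrix with first column col."""
    x = []
    for k in range(len(rhs)):
        acc = sum(col[k - j] * x[j] for j in range(k))
        x.append((rhs[k] - acc) / col[0])
    return x

def deconvolve_coeffs(b, c_coeffs):
    """Sketch of the deconvolution step: recover the Laguerre
    coefficients a_m of phi from those of g and h via A_m a_m = c_m,
    with beta_0 = b_0/sqrt(2) and beta_k = (b_k - b_{k-1})/sqrt(2)."""
    m = len(c_coeffs)
    beta = [b[0] / math.sqrt(2.0)] + [(b[k] - b[k - 1]) / math.sqrt(2.0)
                                      for k in range(1, m)]
    col = [1.0 - beta[0]] + [-beta[k] for k in range(1, m)]
    return solve_lower_toeplitz(col, c_coeffs)
```

Applying A_m to a known coefficient vector and solving recovers it exactly, which checks the forward substitution.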

Lemma 3.2. Under Assumption 4, we have
This lemma is borrowed from Zhang and Su (2018) (Lemma 4.3 in their article). There were missing elements in their proof, so we give a new proof of this lemma, for the sake of completeness.
The naive Laguerre deconvolution estimator consists in estimating the matrix A_m and the coefficients c_m in (15) to obtain an estimate of the coefficients of φ. More precisely, the matrix A_m is estimated by plugging b̂_k, defined by (12), into (16). This matrix Â_m is invertible if and only if b̂_0/√2 ≠ 1, which is almost surely the case; this yields the naive estimator φ̂^{Lag0}_m defined by (18), where ĉ_m := (ĉ_0, ..., ĉ_{m−1})^T. Under Assumptions 1, 2, 3, and 4, Zhang and Su (2018) obtained convergence rates in probability for this estimator. In the following, we propose two ways, inspired by Comte and Mabon (2017), to estimate the Gerber-Shiu function using the Laguerre deconvolution method. To obtain a non-asymptotic result on the MISE of the estimator, a cutoff is required when inverting the matrix Â_m. Let θ_0 < 1 be a truncation parameter; we estimate A_m^{−1} by the truncated inverse Â^{−1}_{m,1}, and we estimate the coefficients a_m by â^{Lag1}_m := Â^{−1}_{m,1} × ĉ_m. Theorem 3.3. Under Assumptions 1, 2, 3, and 4, if θ < θ_0 then it holds: We propose a second way to estimate φ using the Laguerre deconvolution method, in the case δ = 0. It avoids the use of a truncation parameter θ_0, but at the expense of an extra log(m) factor in the upper bound of the MISE, and it uses an additional independence assumption. We estimate the Laguerre coefficients of g by (12). The key tool is Lemma 3.4, which states that the primitives of the Laguerre functions are uniformly bounded; this lemma is a technical result interesting in itself and we prove it in Appendix B. Using this lemma, we can control the risk of Â_m in operator norm.
Proposition 3.5. If δ = 0, p ≥ 1 and log m ≥ p, then it holds: This time, we estimate the inverse of the matrix A_m by Â^{−1}_{m,2}, and we estimate the coefficients of φ by â^{Lag2}_m := Â^{−1}_{m,2} × ĉ_m.

To provide an upper bound on the MISE of φ̂^{Lag2}_m, we need Â^{−1}_{m,2} and ĉ_m to be independent. For this reason, we assume that we have a second observation set {N_T; X_1, ..., X_{N_T}}, identical in law but independent from the main one. We use this second set to estimate Â^{−1}_{m,2}. Theorem 3.6. We assume that δ = 0. Under Assumptions 1, 2, 3 and 4, if m log m ≤ cT then it holds: Remark 3.7. Contrary to the Laguerre-Fourier method, there is only one bias term with the Laguerre deconvolution method. However, the variance term is more complicated and a bias-variance compromise is needed. It leads to nonparametric rates of convergence, which are slower than the parametric rate 1/T.

Sobolev-Laguerre spaces
To study the bias of a function in the Laguerre basis, we consider the Sobolev-Laguerre spaces. These functional spaces have been introduced by Bongioanni and Torrea (2009) to study the Laguerre operator. The connection with the Laguerre coefficients was established later by Comte and Genon-Catalot (2015).
Definition 4.1. For s > 0, we define the Sobolev-Laguerre ball of radius L > 0 and regularity s as:

W^s(R₊, L) := { v ∈ L²(R₊) : Σ_{k≥0} k^s ⟨v, ψ_k⟩² ≤ L },

and we define the Sobolev-Laguerre space as W^s(R₊) := ∪_{L>0} W^s(R₊, L). By Proposition 7.2 in Comte and Genon-Catalot (2015), when s is a natural number, membership in these spaces can be characterized in terms of derivatives of v. We are interested in the Sobolev-Laguerre spaces because of the following observation: if v belongs to a Sobolev-Laguerre ball W^s(R₊, L), then its bias is controlled by:

‖v − v_m‖² = Σ_{k≥m} ⟨v, ψ_k⟩² ≤ L m^{−s}.

Combining this upper bound on the bias term with Corollary 2.7, and Theorems 3.3 and 3.6, we obtain convergence rates for the Laguerre-Fourier estimator and the Laguerre deconvolution estimators on Sobolev-Laguerre spaces.

Theorem 4.2. Under Assumptions 1, 2 and 3, if θ < θ_0, then it holds:
Remark 4.3. If φ, g and h belong to some Sobolev-Laguerre spaces with a regularity index greater than 1, we can simply choose m_1 = m_2 = m_3 = cT and obtain the parametric rate O(1/(cT)) for the Laguerre-Fourier estimator. Theorem 4.4. We make Assumptions 1, 2, 3 and 4, and we assume that φ ∈ W^s(R₊).
1. If θ < θ_0, then choosing m_opt ∝ (cT)^{1/(1+s)} yields: Remark 4.5. The Laguerre-Fourier estimator and the Laguerre deconvolution estimator φ̂^{Lag1}_m both depend on a truncation parameter θ_0 that needs to be chosen such that θ < θ_0. We see two ways to ensure this.
1. We can assume that we know some θ_0 < 1 such that θ < θ_0. Then our convergence rates are those of Theorems 4.2 and 4.4. 2. We can choose θ_0 = 1 − (log T)^{−1/2}. Then for T large enough (more precisely, T > e^{(1−θ)^{−2}}), the convergence rates of the Laguerre-Fourier estimator and φ̂^{Lag1}_m are those of Theorems 4.2 and 4.4 multiplied by log(T).
In our simulations, we chose the first way.

The exponential case
In this section, we compute the convergence rate of the estimators in the exponential case X ∼ Exp(1/μ). This distribution is often considered in risk theory, and closed forms of the Gerber-Shiu function are available in this case: the Gerber-Shiu functions of Example 1.1 are given by the explicit formulas (19), obtained by Laplace inversion; see Asmussen and Albrecher (2010), Chapter XII. We use the following lemma to compute the bias terms of the functions φ, g and h.
Lemma 4.6. Let C, γ be positive numbers and let F(x) = C exp(−γx) 1_{R₊}(x). The Laguerre coefficients of F are given by:

⟨F, ψ_k⟩ = √2 C (γ − 1)^k / (γ + 1)^{k+1}, k ≥ 0.

Combining this with Theorems 3.3 and 3.6, we easily obtain convergence rates for the Gerber-Shiu functions we are interested in.
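The closed form of Lemma 4.6 follows from the Laplace transform of the Laguerre polynomials; it can also be checked numerically, as the illustrative sketch below does by comparing it with a quadrature of ⟨F, ψ_k⟩.

```python
import math

def psi(k, x):
    """psi_k(x) = sqrt(2) L_k(2x) e^{-x} via the three-term recurrence."""
    p0 = math.sqrt(2.0) * math.exp(-x)
    if k == 0:
        return p0
    p1 = (1.0 - 2.0 * x) * p0
    for j in range(1, k):
        p0, p1 = p1, ((2 * j + 1 - 2 * x) * p1 - j * p0) / (j + 1)
    return p1

def exp_coeff(C, gamma, k):
    """Closed-form k-th Laguerre coefficient of C*exp(-gamma*x):
    sqrt(2) * C * (gamma - 1)**k / (gamma + 1)**(k + 1)."""
    return math.sqrt(2.0) * C * (gamma - 1.0) ** k / (gamma + 1.0) ** (k + 1)
```

The geometric ratio |(γ − 1)/(γ + 1)| < 1 explains the exponentially small bias of exponential functions in this basis.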
Theorem 4.8. We assume that the density of X is f(x) = (1/μ) e^{−x/μ}, we make Assumptions 1, 2, 3 and 4, and we assume that the bias term of φ decreases as: 2. If δ = 0, then choosing m_opt = (1/r) log(cT) yields: For the Laguerre-Fourier estimator, we also need to know the decay rate of the bias terms of g and h. For the ruin probability, the Laplace transform of τ, and the jump size causing the ruin, direct calculations show that g and h are given by positive multiples of e^{−x/μ}. Thus, Lemma 4.6 yields that their bias term is less than exp(−rm), with r := 2 log|(1+μ)/(1−μ)|. Together with Corollary 2.7, we obtain the convergence rates of the Laguerre-Fourier estimator.
Using Laplace inversion techniques, we have access to explicit formulas for these Gerber-Shiu functions, see Chapter XII of Asmussen and Albrecher (2010) for more details. In all cases, they are given by a sum of products of polynomials and exponentials, hence they belong to W s (R + ) for all s > 0.
Computation of the estimators Let us start with how we compute the Laguerre functions. The Laguerre polynomials, defined by (6), satisfy the relations: see formulas 22.7.12 and 22.8.6 in Abramowitz and Stegun (1972). From these formulas, one can prove: Let Ψ_k(x) := ∫_0^x ψ_k(t) dt be the primitive of the Laguerre function ψ_k; these functions are used to compute the coefficients b̂_k and ĉ_k below. From (20), and by integrating (21), we see that the Laguerre functions and their primitives can be computed recursively: The expressions of b̂_k and ĉ_k depend on the value of δ and on the form of w: 1. Ruin probability. The estimators of the coefficients b_k and c_k are, in this case: We compute the integrals in ĉ_k using Romberg's method with 2^10 + 1 points.
2. Expected claim size causing the ruin. The estimators of the coefficients b_k and c_k are, in this case: We compute the integrals in ĉ_k using Romberg's method with 2^10 + 1 points. 3. Laplace transform. The estimators of the coefficients b_k and c_k are, in this case: where we used integration by parts to obtain this expression of ĉ_k. We compute the integrals in b̂_k using Romberg's method with 2^10 + 1 points. We compute ρ̂_δ, the solution of equation (11), with Newton's method using the initial value (δ + λ/2)/c.
For the Laguerre-Fourier estimator, once we have computed (b̂_k)_{0≤k<m_2} and (ĉ_k)_{0≤k<m_3}, we can compute the coefficients â_k defined by (8): where Fψ_k is given by (7), and where the integral in â_k is computed with Romberg's method on a discretization of [−10³, 10³] with 2^15 + 1 points. For the Laguerre deconvolution estimators, once (b̂_k)_{0≤k<m} and (ĉ_k)_{0≤k<m} have been computed, we can compute the matrix Â_m defined by (17) and then the coefficients â^{Lag i}_m as described in Section 3.
Remark 5.1. While the Gerber-Shiu function is always positive, this is not necessarily the case for the estimators. However, we can always take their positive part, since doing so does not increase their risk: as φ ≥ 0, we have ‖(φ̂)₊ − φ‖_{L²} ≤ ‖φ̂ − φ‖_{L²}. In Figures 1 and 2, we observe that the estimators stay positive where φ is positive, and that they can take small negative values when φ becomes small (as u tends to +∞). Hence, it is reasonable to use the estimators without taking their positive part, and we choose to do so in the simulations.

Model selection Each estimator we consider depends on one or several parameters that need to be chosen. The Laguerre-Fourier estimator and the Laguerre deconvolution estimator φ̂^{Lag1}_m depend on a truncation parameter θ_0, which needs to be chosen such that θ < θ_0. We choose θ_0 = 0.95 in our simulations.
• The Laguerre-Fourier estimator depends on four parameters: m_1, m_2 and m_3, the dimensions of the projection spaces for the functions φ, g and h, and the truncation parameter θ_0 in the estimation of F̃g. As noted in Remark 4.3, we can choose m_1 = m_2 = m_3 = cT, so no selection procedure is required. Still, we propose a model reduction procedure for the choice of m_2 and m_3, which we describe in Appendix A. • The naive Laguerre deconvolution estimator φ̂^{Lag0}_m, defined by (18), depends on one parameter: m, the dimension of the projection space for φ. However, there is no model selection procedure for m. In their numerical section, Zhang and Su (2018) only consider (as we do) Gerber-Shiu functions with exponential decay, so that the bias term also decays at an exponential rate. Using this fact, they chose m = 5T^{1/10}. We make the same choice in our simulations and write φ̂_ZS for this estimator.
• The Laguerre deconvolution estimators φ̂^{Lag1}_m and φ̂^{Lag2}_m are selected via the penalized criteria (22), where the model collections are: with M = cT ∧ 500 (we do not compute more than 500 coefficients, because of computation time). In the following, if F is a function, we write F(X) for the empirical mean (1/N_T) Σ_{i=1}^{N_T} F(X_i). For the penalty terms, we choose empirical versions of the variance terms in Theorems 3.3 and 3.6: The constants κ_1 and κ_2 are calibrated following the "minimum penalty heuristic" (Arlot and Massart, 2009). In several preliminary simulations, we compute the selected dimension m̂ as a function of κ, and we find κ_min such that for κ < κ_min the dimension is too high and for κ > κ_min it is acceptable. The selected constant is then 2κ_min. In our case, we choose: κ_1 = 0.01, κ_2 = 0.01 for the ruin probability; κ_1 = 0.1, κ_2 = 1 for the expected claim size causing the ruin; κ_1 = 10^{−8} for the Laplace transform of the ruin time (δ = 0.1).
There is no constant κ_2 in the last case because the Laguerre deconvolution estimator φ̂^{Lag2}_m is defined only if δ = 0. We write φ̂^{Lag1} := φ̂^{Lag1}_{m̂_{Lag1}} and φ̂^{Lag2} := φ̂^{Lag2}_{m̂_{Lag2}} in the following.
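The general shape of such a penalized selection step can be sketched as follows. This is only an illustration with a generic penalty pen(m) passed as a function; the paper's actual penalties are the empirical variance terms described above.

```python
def select_dimension(coeffs, pen):
    """Penalized contrast selection for a projection estimator:
    since ||phi_hat_m||^2 = sum_{k<m} a_hat_k^2, minimize the
    criterion -||phi_hat_m||^2 + pen(m) over m = 1, ..., len(coeffs)."""
    best_m, best_val, cum = 1, float("inf"), 0.0
    for m in range(1, len(coeffs) + 1):
        cum += coeffs[m - 1] ** 2       # running value of ||phi_hat_m||^2
        val = -cum + pen(m)
        if val < best_val:
            best_m, best_val = m, val
    return best_m
```

With geometrically decaying coefficients, the selected dimension is the last index whose squared coefficient still exceeds the marginal penalty, which is the intended bias-variance trade-off.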

MISE calculation
We compare the estimators by looking at their MISE. We compute the norm ‖·‖_{L²} with Romberg's method using a discretization of [0, u_max] with 2^11 + 1 points. The value of u_max varies from 12 to 50, depending on the parameter set. We compute the expectation by an empirical mean over n = 200 paths of the process (U_t)_{t∈[0,T]}. We also compute a 95% confidence interval for the MISE, using the asymptotic confidence interval for a mean (CLT approximation):

ISE_n ± q_{1−α/2} √(S_n²/n),

where ISE_n is the empirical mean of the ISEs, q_{1−α/2} is the (1 − α/2)-quantile of the standard normal distribution, and S_n² is the empirical variance of the ISEs. We have two goals in this section: 1. To compare the performance of our Laguerre-Fourier estimator with that of the Laguerre deconvolution estimators. 2. To see whether the model selection procedures (22) for the Laguerre deconvolution estimators lead to the same performance as the naive choice m = 5T^{1/10}.
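The empirical MISE and its CLT confidence interval can be sketched as a small helper (illustrative, with the quantile z ≈ 1.96 for a 95% level):

```python
import math

def mise_with_ci(ise_values, z=1.96):
    """Empirical MISE over n independent ISE values, with the CLT
    confidence interval  mean +/- z * sqrt(S_n^2 / n),
    where S_n^2 is the unbiased empirical variance."""
    n = len(ise_values)
    mean = sum(ise_values) / n
    s2 = sum((v - mean) ** 2 for v in ise_values) / (n - 1)
    half = z * math.sqrt(s2 / n)
    return mean, (mean - half, mean + half)
```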
The code that performed the simulations can be obtained on request.

Results
We display our results in Tables 1, 2 and 3. Concerning the estimation of the ruin probability (Table 1), we see that all the estimators perform well with the first set of parameters (exponential distribution, θ = 0.67). However, with the two other sets of parameters (exponential distribution and Gamma(2) distribution, θ = 0.83), the difference is clear: the Laguerre-Fourier estimator has the smallest risk, followed by the estimator of Zhang and Su (2018), and the Laguerre deconvolution estimators come last. We notice that φ̂^{Lag2} seems to be better than φ̂^{Lag1} in this case.
Concerning the estimation of the expected jump size causing the ruin (Table 2), the difference is even clearer. With the first set of parameters, we see that the Laguerre-Fourier estimator is better for a small sample size (E[N_T] = 100), but equivalent to the other estimators for larger sample sizes. We also notice that the estimators φ̂_ZS and φ̂^{Lag2} have the same risk. With the two other sets of parameters, we find again that the Laguerre-Fourier estimator is better than the estimator φ̂_ZS, which is in turn better than the Laguerre deconvolution estimators. This time, we see that φ̂^{Lag1} performs better than φ̂^{Lag2}.
Concerning the estimation of the Laplace transform of the ruin time (Table 3), we see no difference between the MISE of the Laguerre-Fourier estimator and that of the Laguerre deconvolution estimators.
For illustration purposes, Figures 1 and 2 show the estimates of the ruin probability and the expected claim size causing the ruin, on 50 independent samples, with the second set of parameters (exponential distribution, θ = 0.83). Qualitatively, we see that the Laguerre-Fourier estimator is better than the others. In contrast, the non-data-driven choice of m for the estimator of Zhang and Su (2018) does not seem appropriate in this setting.
To conclude, our Laguerre-Fourier estimator performs better than the Laguerre deconvolution estimators on simulated data, even in the exponential case where they theoretically have the same MISE (up to a log factor). Furthermore, in most cases the Laguerre deconvolution estimators with the model selection procedure (22) fail to match the performance of the estimator of Zhang and Su (2018), for which the parameter m is chosen using knowledge of the bias decay rate of φ.
Remark 5.2. In Tables 1, 2 and 3, the MISEs of the estimators are not normalized by ‖φ‖²_{L²}, the size of the estimated function. Hence, it is normal that the order of magnitude of the results varies from one function to another. For example, in Table 2, ‖φ‖²_{L²} equals respectively 5, 100 and 50 for the three sets of parameters.

Conclusion
Using a projection estimator on the Laguerre basis, and computing the coefficients with Fourier transforms, we constructed an estimator of the Gerber-Shiu function that achieves parametric rates of convergence without needing a model selection procedure. It is worth noting that our results are non-asymptotic and concern the MISE of the estimator. In comparison, the Laguerre deconvolution estimators have slower rates of convergence and require a model selection procedure in practice. The superior performance of our procedure is confirmed by a numerical study on simulated data.
Knowing that the Laguerre deconvolution method does not achieve the best rate of convergence in the compound Poisson model is important. Indeed, this method is used to estimate the Gerber-Shiu function in more general models; see Zhang and Su (2019), Su, Shi and Wang (2019) and Su, Yong and Zhang (2019). These papers have one thing in common: they all estimate a function φ that satisfies an equation of the form φ = φ * g + h, with g and h functions that depend on the specifics of each problem. If we applied the procedure described at the beginning of Section 2, we could obtain an estimator achieving the same rate of convergence as the estimators of g and h (see Remark 2.2). Hence the Laguerre deconvolution method used in these papers is not optimal, since a factor m appears in the variance term in the construction of φ̂_m from ĝ_m and ĥ_m.

Proof of Theorem 2.5
We start with some preliminary lemmas.
We take the expectation and we get:
The next proposition provides an upper bound on the L p -risk ofρ δ .
Proposition 7.4. Under Assumption 1, for p ≥ 1, we have: Proof. By definition, ρ_δ is a solution of the Lundberg equation, so it is a zero of the function ℓ_δ(s) := cs − (λ + δ) + λ Lf(s). The estimator ρ̂_δ is then a zero of its empirical counterpart: We use a Taylor-Lagrange expansion: where z is between ρ_δ and ρ̂_δ.
under the safety loading condition. Thus: For the second term, we use Corollary 7.3: For the first term, we apply Lemma 7.1 conditionally on N_T: Finally: Now we can prove Theorem 2.5.
Proof of Theorem 2.5. By the Pythagorean theorem: hence we need to control the variance terms: Using equations (4.17) to (4.21) and (4.10) to (4.14) in Zhang and Su (2018), one can obtain equations (32) and (33) below. Still, we give the proofs of these equations for the sake of completeness.
We notice that b̂_k and ĉ_k (defined by (12) and (13)) can be written as:

and that the coefficients b_k and c_k (defined by (9) and (10)) can be written as: where F is given by (27). Thus, we need to give an upper bound on quantities of the form: with: The bound on V_m is based on the following decomposition: where: Let us notice that if δ = 0, then ρ̂_δ = ρ_δ = 0, so Δ_k = 0 and the decomposition reduces to Z_k.
• Bound on Σ_{k=0}^{m−1} E[Z_k²]. This bound is obtained by a projection argument: where the last inequality comes from the fact that (ψ_k)_{k≥0} is an orthonormal basis of L²(R₊). From (27), we see that this quantity is bounded as in (30): by the corresponding moment term for the coefficients of g, and by (λ/(c²T)) E[W(X)] for the coefficients of h, where W(X) is defined in Assumption 3. In the case δ = 0, this gives the desired results.
• Bound on Σ_{k=0}^{m−1} Δ_k². We use a projection argument again: , so by the mean value theorem: Since the function t e^{−ρt} 1_{t>0} achieves its maximum at t = 1/ρ, we see that: Thus: Using the decomposition (29) in (28), we obtain V_m ≤ 2 Σ_k E[Z_k²] + 2 Σ_k E[Δ_k²]; combining (30) and (31) then yields: We apply Hölder's inequality to the second term in (32) and use Proposition 7.4. We need to evaluate this last expectation: Thus, we obtain: The same reasoning applies to h, replacing X_i by W(X_i).
A classical result of O. Toeplitz says that this matrix induces a bounded operator on ℓ²(N) if and only if (α_n)_{n∈Z} are the Fourier coefficients of some function α ∈ L^∞(T), where T denotes the complex unit circle. We denote both the matrix (34) and the operator it induces on ℓ²(N) by T(α), the function α being called the symbol of the Toeplitz matrix. Finally, if m ∈ N* and if T(α) is a Toeplitz matrix, we denote by T_m(α) the m × m truncated matrix (35). The operator norm of T(α) depends on the properties of its symbol. In the case where α_k = 0 for all k < 0, we have the following lemma.
Proof. The fact that T(α) is lower triangular and that T(α) × x = α * x is clear from the definition of a Toeplitz matrix. Then, Young's inequality for convolutions yields ‖α * x‖_2 ≤ ‖α‖_1 ‖x‖_2.
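The identity "lower triangular Toeplitz times vector equals truncated convolution", and the resulting ℓ1-ℓ2 bound, can be illustrated with a short numerical sketch:

```python
import math
import random

def causal_toeplitz_apply(alpha, x):
    """(T(alpha) x)_k = sum_{j<=k} alpha_{k-j} x_j: applying a lower
    triangular (causal) Toeplitz matrix is a truncated convolution."""
    return [sum(alpha[k - j] * x[j] for j in range(k + 1))
            for k in range(len(x))]

def l2norm(v):
    return math.sqrt(sum(t * t for t in v))
```

On random inputs, the Young bound ‖α * x‖_2 ≤ ‖α‖_1 ‖x‖_2 always holds, as a quick check confirms.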
Concerning the inverse of a Toeplitz matrix, its norm depends on the position of zero relatively to the range of the symbol. More precisely, we use the following result.
Lemma 7.6 (Lemma 3.8 in Böttcher and Grudsky (2000)). Let α ∈ L^∞(T) and let E(α) be the convex hull of the essential range of α. The matrix A_m defined by (16) is a Toeplitz matrix and its symbol is given by: Let us notice that under Assumption 4, we have (α_k)_{k≥0} ∈ ℓ¹(N), so the symbol α is continuous on T, and thus α ∈ L^∞(T).

Proof of Lemma 3.2.
We apply Lemma C.1² to the coefficients of g: the sequence (β_k)_{k≥0}, defined by β_0 := b_0/√2 and β_k := (b_k − b_{k−1})/√2 for k ≥ 1, is the sequence of Fourier coefficients of the function t ∈ T ↦ Lg((1+t)/(1−t)) ∈ C. Thus, we have: with the convention Lg(∞) = 0. Since α(t) = 1 − Σ_{k≥0} β_k t^k, we get: We notice that if t ∈ T \ {1}, then there exists ω ∈ R such that (1+t)/(1−t) = iω. Thus: ² This lemma is stated for the generalized Laguerre basis, which depends on a parameter a. This parameter is equal to 1 in our case.

Proof of Theorem 3.3. By the Pythagorean theorem
• First term. We apply Proposition 7.8 with Lemma 7.5: Hence we have: (1−θ) 2 . To conclude, we use the upper bounds established in the proof of Theorem 2.5. If δ = 0, we have: and if δ > 0, we have:

Proof of Proposition 3.5
Let us introduce the sequence of functions (D_k)_{k≥0} as: so we can rewrite: with T_m(·) defined by (35). Now, the difference between Â_m and A_m can be decomposed as: The next lemma gives control of the first term in the decomposition (36).
Lemma 7.9. Then for p ≥ 1 and log m ≥ p, we have: Proof. We want to apply Theorem C.2. First, we need upper bounds on ‖Z_i‖_op and λ_max(E[S_n S_n*]).
• Bound on ‖Z_i‖_op: and by the Cauchy–Schwarz inequality: We want to apply Theorem C.2 to our matrix S_n, which is not Hermitian. We use the following trick, called the Paulsen dilation. For M a rectangular matrix, we define: thus for p ≥ 1 and r ≥ max(2p, 2 log m), we get that: If log m ≥ p, then r = 2 log m and we get E‖S_n‖_op^{2p} ≤ 2^{2p−1}(nμ_m log m)^p + 2^{6p−1}(m log m)^{2p}. Now we can prove Proposition 3.5.
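The Paulsen dilation replaces a rectangular (or non-Hermitian) matrix M by the Hermitian block matrix [[0, M], [M*, 0]], whose eigenvalues are ± the singular values of M (together with zeros), so it has the same operator norm as M. This is what allows a concentration inequality for Hermitian matrices to be applied to S_n. A minimal numerical sketch (the function name is ours):

```python
import numpy as np

def paulsen_dilation(M):
    """Hermitian dilation [[0, M], [M*, 0]] of a rectangular matrix M."""
    p, q = M.shape
    return np.block([[np.zeros((p, p)), M],
                     [M.conj().T, np.zeros((q, q))]])

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 5))
D = paulsen_dilation(M)

assert np.allclose(D, D.conj().T)  # D is Hermitian
# same operator (spectral) norm: eigenvalues of D are +/- singular values of M
assert np.isclose(np.linalg.norm(D, 2), np.linalg.norm(M, 2))
```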
Proof of Proposition 3.5. From the decomposition (36), we get: For the first term, we apply Lemma 7.9 conditional on N_T: For the second term, we know from Corollary 7.3 that, and:

Proof of Theorem 3.6
The following results are based on the proofs of Lemma 3.1 and Corollary 3.2 in Comte and Mabon (2017).
Proposition 7.10. If m log m ≤ cT, then it holds: Proof of Proposition 7.10. We decompose according to Δ_m: We now give two bounds on (37). Starting from (37) and using the set Δ_m^2, we have that: We apply Proposition 3.5 and get: √(cT/(m log m)). Starting from (37) again, we get: Applying Proposition 3.5, we get: with C(p, λ) = O(λ^p ∨ λ^{2p}).

Upper bound on
√(cT/(m log m)). From the triangle inequality: we obtain: Moreover, we have assumed that ‖A_m^{-1}‖_op < (1/2)√(cT/(m log m)), so: Now let us rewrite this probability as: To control the second term, we apply Markov's inequality and Proposition 3.5: Next, to control the first term on the right-hand side of Equation (39), we apply Theorem C.1: We apply Markov's inequality again, along with Proposition 3.5: So starting from Equation (39) and gathering Equations (40) and (41) gives: with C(p, λ) = O(λ^p).
Finally, gathering Equations (38) and (42), we get that: The next proposition is a variant of the last one; it gives a better bound than applying Proposition 7.10 directly. Proposition 7.11. If m log m ≤ cT, then it holds: Proof of Proposition 7.11. The proof follows the lines of the proof of Proposition 7.10, but starting from the following decomposition: It yields the following upper bound: Following the proof of Proposition 7.10, we get: Now we can prove Theorem 3.6.
Proof of Theorem 3.6. By the Pythagorean theorem applied to φ̂ − φ: In the proof of Theorem 2.5, we saw that: We decompose the variance term into three terms: For the first term, we apply Proposition 7.10. For the second term, we use the fact that Â_{m,2}^{-1} and ĉ_m are independent, and we apply Proposition 7.11:

For the third term: we apply Lemma 3.2 and we obtain the following bound, with C(λ) = O(λ ∨ λ²):

Appendix A: Model reduction procedure
We propose a model reduction procedure to choose the dimensions m_2 and m_3, defined by (14). We explain the method for the choice of m_2 in the case δ = 0. Let us assume we have estimated the M first coefficients of g, for a large M. By Remark 2.6, we know that the best estimator is ĝ_M. Our goal is to choose m̂_2 smaller than M that achieves a similar MISE. This provides a parsimonious version of the estimator without degrading its MISE. By Theorem 2.5, the MISE of ĝ_m is given by: Ideally, we would like to choose the first m such that the bias term ‖g − g_m‖²_{L²} is smaller than the variance term (λ/(c²T)) E[X]. Since these terms are unknown, we estimate them by Σ_{k=m}^{M−1} b̂_k² and (1/(cT)²) Σ_{i=1}^{N_T} X_i respectively. We choose m̂_2 as: with κ_2 an adjustment constant. The next proposition shows that the MISE of ĝ_{m̂_2} does not exceed the MISE of ĝ_M by more than κ_2 × (variance term).
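As an illustration only (this is not the paper's code: the function name, signature, and toy data are ours, and we take the threshold to be κ_2 times the estimated variance term), the selection rule for m̂_2 can be sketched as:

```python
import numpy as np

def select_m2(b_hat, claims, c, T, kappa2=0.3):
    """Smallest m such that the estimated bias sum_{k=m}^{M-1} b_hat_k^2
    is at most kappa2 times the estimated variance (1/(cT)^2) sum_i X_i."""
    M = len(b_hat)
    var_est = np.sum(claims) / (c * T) ** 2
    tails = np.cumsum(b_hat[::-1] ** 2)[::-1]  # tails[m] = sum_{k >= m} b_hat_k^2
    for m in range(M):
        if tails[m] <= kappa2 * var_est:
            return m
    return M

# toy example: geometrically decaying estimated coefficients
b_hat = 0.5 ** np.arange(10)
claims = np.array([2.0, 1.5, 3.0])  # observed claim sizes X_1, ..., X_{N_T}
m2 = select_m2(b_hat, claims, c=3.0, T=80.0)
assert 0 <= m2 <= len(b_hat)
```

The rule keeps only as many coefficients as the data can support: past m̂_2, the estimated squared coefficients contribute less than a κ_2 fraction of the variance term, so truncating there cannot inflate the MISE by more than that amount.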

The same goes for m̂_3: we estimate the bias term by Σ_{k=m}^{M−1} ĉ_k² and the variance term by (1/(cT)²) Σ_{i=1}^{N_T} W(X_i); we choose m̂_3 as: By the same arguments, the MISE of ĥ_{m̂_3} is given by: In the case δ > 0, we choose the same m̂_2 and m̂_3 as in the case δ = 0. By the same arguments, we obtain: and: (1 − θ)² δ².
Numerically, we compared the MISEs of the Laguerre-Fourier estimator with and without the model reduction procedure for m̂_2 and m̂_3, with the choice κ_2 = κ_3 = 0.3. We show the results in Table 4. We see that the model reduction procedure does not degrade the MISE of the estimator.

Table 4: Comparison between the MISE of the Laguerre-Fourier estimator with and without model reduction. In each case, we chose the following parameters: X ∼ Exp(1/2), λ = 1.25, c = 3, T = 80. With this set of parameters, E[N_T] = 100. Each cell displays an estimate of the MISE over 200 samples with a 95% confidence interval, and the mean selected models m̂_2 and m̂_3. In every case, m_1 is equal to N_T.

[Table 4 body not recovered: columns "With Model Reduction" and "Without Model Reduction"; rows include "Ruin Probability".]