A Bayesian Approach to Modeling Multivariate Multilevel Insurance Claims in the Presence of Unsettled Claims

A Bayesian model for individual insurance claims is proposed which accounts for the multivariate multilevel features of the claims, including multiple claimants for the same event, each of whom may receive benefits under different coverages. A Bayesian approach makes it possible to account for missing values in the covariates and partial information contained in open files, thereby avoiding sampling bias induced when unsettled claims are ignored. For a given claim, the combination of coverages under which payments are made is modeled as a type with multinomial regression. The presence of legal and expert fees follows a logistic regression given the type. The non-zero claim amounts are then modeled with log skewed normal regressions linked by a Student t copula. The Bayesian framework yields a predictive distribution for the amounts paid, including parameter risk and process risk, while handling missing covariates and open files. The approach is illustrated with Accident Benefits car insurance claims from a Canadian company.


Introduction
General insurance companies are increasingly interested in statistical methods that take advantage of their detailed databases. One such application is the statistical modeling of paid amounts at the policyholder or claim level. With greater granularity come broader applicability and many challenges that can be overcome using a Bayesian perspective.
A claim amount can be viewed as a sum of losses relating to the various coverages in the insurance policy. A typical example is a car insurance product which may cover vehicle damage, medical and rehabilitation expenses, income replacement benefits, etc. A car accident can trigger payments under more than one coverage and for multiple claimants (e.g., the driver, passengers, pedestrians). As the total claim amounts are heterogeneous, it is natural to model the detailed components jointly before they are aggregated. Both the probability of a payment for a component and its amount can depend on characteristics of the accident or the claimant. Dependence between the claim components naturally arises as they all relate to the same event.
A convenient way to construct multivariate distributions is to model the marginal behavior and the dependence structure separately through copulas. The usefulness of copulas in actuarial science and risk management is well documented; see, e.g., McNeil et al. (2005) or Denuit et al. (2006). This approach was adopted, for instance, in the multivariate claims model of Frees and Valdez (2008) and Frees et al. (2009), where the severity was split into two parts: the claim type, identifying the coverages with strictly positive payments, and the amounts given the type. The introduction of the claim type effectively takes care of the point mass at zero, due to the claims for which no payment is made. It also allows to model jointly the amounts that are strictly positive, conditionally on the claim type, in a copula model with margins that depend on covariates.
As an alternative, Shi et al. (2016) account for the mass at zero using Tweedie margins in a Gaussian copula model for policies involving more than one vehicle and multiple coverages. In this case, the copula describes the dependence in occurrences and amounts, thereby being somewhat less flexible than the claim type approach of Frees and Valdez (2008). In Yang et al. (2011), a copula links the components of bodily injury car insurance claims. In these papers, maximum likelihood, pairwise composite likelihood, or inference for margins are used for estimation. Thus, the resulting predictive distributions of claim amounts, approximated by simulation, do not incorporate the uncertainty relating to the large number of parameters.
We propose here a Bayesian model for multivariate multilevel claim amounts. As discussed in Shi et al. (2016), this data structure is frequent in insurance. Detailed car insurance claims, workers compensation insurance, group health insurance, and commercial auto insurance exhibit an unbalanced multilevel structure. Micro-level Accident Benefits insurance claims data from a large Canadian insurance company serve as justification for our approach. The proposed model has many applications; it could be used, e.g., to get the predictive distribution of the paid amount given the claim characteristics when a new claim gets reported. Such a model can complement (and maybe even replace in the long run) the costly initial assessment by a claims expert.
The dataset, described in Section 2, is complex: each claim may involve one or more claimants, and the payments for each claimant are broken down into four parts. Recent claims are the most relevant for the predictions, but because the settlement of Accident Benefits claims may take a few years, most of them are not yet settled and their total paid amount is unknown. Using only settled files for estimation causes sampling bias: the simpler cases close faster than the major ones. Ignoring the fact that open claims are censored also understates the paid amounts. To solve this problem, we propose an innovative Bayesian procedure that includes the open claims partial information.
The basic model setup, described in Section 3, is adapted from the hierarchical insurance claims model of Frees and Valdez (2008) and accounts for covariates known at the time of reporting. Claims are classified into five types depending on the combination of components with a strictly positive payment. These types are modeled with a multinomial regression. The presence of expenses in the form of subcontracted legal and claims expert fees is modeled conditionally on the type via logistic regression. Given the type and presence of expenses for each claimant in an accident, the strictly positive amounts are log skewed normal regression models linked by a Student t copula.
The Bayesian framework facilitates the handling of missing covariates and partial information contained in open claims. Imputation models for the censored types, presence of expenses, and amounts are developed in Section 4 and are estimated using the settled claim development data. MCMC methods with a Metropolis-within-Gibbs algorithm can be set up, as explained in the Supplement (Côté et al., 2020), to obtain the posterior distribution of the parameters and the posterior predictive distribution of the multivariate claim, including parameter uncertainty. The good performance of the approach is illustrated in Section 5, where the posterior predictive distribution based on the proposed model and the data available in 2008 is compared to the posterior distribution based on the complete 2015 data. Predictive distributions and predictive intervals on paid amounts for individual claims or portfolios from the holdout sample are shown in Section 6. The paper ends with a short conclusion in Section 7.

A glimpse at the data
The dataset consists of claim characteristics and the associated inflation-adjusted paid amounts for an Accident Benefits car insurance product sold in the province of Ontario, Canada. In total, 54,281 accidents were reported between January 1, 2004 andDecember 31, 2015. The payments are known up to the latter date, at which point 15% of the claim files are open. A random sample of 60% of the claims is used for estimation; the remainder forms a holdout validation sample and is not discussed in this section.
Most claims involve one (81.6%) or two (13.9%) claimants, but in one case there were as many as eight claimants. For each claimant, the payment is broken down into four components: three insurance coverages and the legal and claims expert fees (Expenses). The coverages are medical and rehabilitation benefits (Medical), income replacement benefits (Income) and caregiver benefits (Caregiver). Each of these components has a mass at zero. To model them jointly, the claimant files were classified into the five types listed in Table 1; 0.4% of the files matched none of these categories and were discarded as they were deemed highly unlikely to occur ever again.   As claim amounts below $100 can be regarded as immaterial for bodily injury claim modeling, they were treated as zeros when choosing the type. Such amounts hardly contain any information about types and their modeling would yield little benefit. Also, a claimant who receives income replacement or caregiver benefits must also incur medical expenses. Expenses were present in 28% of the files but were more likely for types 2-5. Effective September 1, 2010, a reform of the Ontario Insurance Act (FSCO, 2010) affected the Accident Benefits product by modifying the allowed paid amounts depending on the claim gravity, especially for the caregiver benefits. Côté and Genest (2015) documented the impact of this reform in a higher-level model for multiple business lines including the aggregate version of the claim portfolio studied here. Figure 1 shows the number of claimants per accident year and by status. The recent accident years, which are post-reform and the most relevant for future predictions, contain many open files: the final settlement amount is unknown at evaluation date, i.e., December 31, 2015.
The left panel of Figure 2 contrasts the type distributions for closed and open files. These distributions clearly differ, and dropping the open claims would bias the type analysis towards minor claims such as those that close at zero. As will be seen in Section 4.2, the type of open claims can actually change until settlement.
The right panel of Figure 2 compares kernel density estimates of the strictly positive Medical paid amounts, on the log scale, for unsettled (dashed line) and closed (solid  line) claims incurred after the reform. It transpires from it that large claims are more likely to be unsettled at evaluation date: even though these amounts are censored, their distribution is shifted to the right compared to that of the settled claims amounts.
Therefore, the information conveyed in the open files ought to be exploited in the estimation procedure for the amounts. This issue was not relevant to the micro-level claim analyses of Frees and Valdez (2008) or Frees et al. (2009) as the claim status was unavailable and all claims were assumed to be settled. Table 2 shows descriptive statistics of the log of the strictly positive paid amounts by component. At this stage, the dependence between amounts is not analyzed, as part of it is caused by common covariates. At the claim level, the covariates on the policy are: whether the vehicle is a motorcycle or a scooter, the number of years insured with the company (grouped), whether the policyholder also has property insurance, and whether there is optional income coverage. Two categories of covariates are given by claimant: (a) Claimant's demographics: Age, income indemnity level (categorical), marital status (single, married, divorced or other), and rating score.
(b) Claim characteristics: Whether it occurred after the 2010 reform, gravity (minor, major, or catastrophic), number of injuries, pain level (scale of 1 to 10), presence of a legal counsellor, and initial case reserve (in thousands) for the three coverages.
Income indemnity level is used only in the Income paid amounts model and is missing in 8.1% of the type 3 and 5 cases. Age, which will be used in all the marginal amounts models, is missing in 8.9% of cases. Even though 47.8% of the pain levels are missing, this feature still improves both the type and amounts models.

Multivariate multilevel claims model
The basic setup of the proposed model is inspired by Frees and Valdez (2008). Given the covariates for each claimant involved in the claim, the type for each claimant, i.e., the combination of coverages with positive payment, is modeled by a multinomial regression. Conditional on the type, the presence of Expenses follows a logistic regression model, for each claimant. These two components take care of the masses at zero, and the log of the strictly positive amounts are modeled with skewed normal regressions linked in a Student t copula of appropriate dimension, with a suitable correlation structure. Given covariates and parameters, different claims are assumed to be mutually independent but amounts paid in relation to the same claim are related. This association may result from unmeasured covariates or the duration of the rehabilitation, which affects simultaneously the claimant's medical, income replacement and caregiver benefits, if any. The model is multivariate due to the joint distribution of the amounts for a claimant, and multilevel because of the dependence between amounts for different claimants involved in the same claim.
See Figure 3 for a schematic view of the model. Details on each submodel, the notation and the priors are given in this section and, for ease of understanding, are illustrated with the following running example.
Example. Claim i = 5123 involves two claimants: the first received $2122.74 in Medical benefit; the second had a Medical benefit of $9399.86 and a Caregiver benefit of $3306.82. There were no Expenses for Claimant 1 but a $6600.47 fee was allocated to Claimant 2.

Multinomial model for type
For each claim i ∈ {1, . . . , N} and claimant j ∈ {1, . . . , n i }, where n i is the number of claimant files relating to claim i, we model the type S i,j , as defined in Table 1, using a multinomial logistic regression. Let x S i,j be the row vector of relevant explanatory variables for claimant j in claim i. In the present case, the components of x S i,j were the policy age and accident year if prior to the reform, the claimant's marital status, number of injuries, gravity, pain level, and two interactions involving the reform indicator.
As there is little prior information on the covariate impacts on the types, the parameters are assumed a priori independent with a diffuse prior Types are s i1 = 2 (Medical) and s i2 = 4 (Medical and Caregiver).

Binomial model for presence of Expenses
The indicator W i,j of the presence of Expenses takes value 1 whenever the amount for that component exceeds $100. Given S i,j , W i,j is modeled by a logistic regression with , where x W i,j are the relevant covariates, β W is the vector of coefficients, and α W s is the effect of type s with α W 1 = 0. In the present case, the components of x W i,j were policy age, a reform indicator, and gravity, as well as two interactions involving the reform indicator. The parameters are again assumed to have independent Gaussian priors centered around 0 and with large variance σ 2 W . For notational simplicity, set α W = (α W 1 , . . . , α W 5 ) and define M ijk as the indicator that the payment for component k (Medical, Income, Caregiver and Expenses are numbered from 1 to 4) and claimant j in claim i is positive, i.e., Example (cont'd). As there were Expenses for Claimant 2 only, we have w i1 = 0 and w i2 = 1. Recall that s i1 = 2 and s i2 = 4, so we have m i = (1, 0, 0, 0, 1, 0, 1, 1) and the index set of positive amounts is

Skewed normal regression models for the marginal amounts
Given the component vector M i , the amounts that are exactly 0 are known, so it remains to model the strictly positive paid amounts. When M ijk = 1, denote by Y ijk the natural logarithm of the payment made under component k ∈ {1, . . . , 4} (i.e., Medical, Income, Caregiver or Expenses) for the jth claimant in claim i.
A preliminary analysis showed that the log amounts distribution is skewed but not heavy-tailed. Fernández and Steel (1998) showed how to build skewed versions of symmetric distributions and how a Metropolis-within-Gibbs algorithm can be used to estimate the parameters. Inspired by this work, we assume that where x Y i,j stands for the vector of covariates listed in Tables 9-10 in the Supplement. Adapting the parametrization in Trottier and Ardia (2016), the density of the Fernández-Steel skewed normal can be written, for y ∈ R, as The skewed normal with ξ = 1 reduces to the symmetric normal. The moment generating function obtained in the Supplement guarantees the existence of the mean and variance of the paid amounts on the original scale. Risk measures such as the Tail Value-at-Risk of the paid amounts can be expressed in terms of the distribution function Φ of the N (0, 1) and its inverse, as can be derived with similar calculations as in Section 2 of the Supplement.

Student t copula for the dependence between the paid amounts
To account for the residual dependence once covariates have been factored in, a copula links the marginal regressions. For example, for m i = (1, 0, 0, 1), i.e., one claimant with Type 2 and Expenses, the joint distribution of the amounts can be expressed in terms of a bivariate copula C : where C is assumed to be the same across all values of the covariates. This flexible representation is widely used in multivariate analysis; see, e.g., Genest and Favre (2007) or Genest and Nešlehová (2012) for details on copulas and copula models.
The dimension of the copula for claim i is the number of 1s in the component vector M i , or ni j=1 4 k=1 M ijk . We expect the shape of the dependence between two components to remain unchanged when another component is present. As the claimant labels convey no information, the within-claimant dependence should be the same for all claimants. We also expect the between-claimant association to be weaker than the within-claimant association.
These constraints can be embodied into a Student t copula with correlation matrix Σ i built such that, for components k 1 , k 2 ∈ {1, . . . , 4} and claimants j 1 , j 2 ∈ {1, . . . , n i }, 1] and ρ k1,k2 = 1 when k 1 = k 2 . With the parameter γ, the dependence structure can be written using only seven parameters. This is particularly appealing as there are few cases of multiple claimants who all receive Income or Caregiver benefits, which would affect the estimation of between-claimant correlations. Let n max = max(n 1 , . . . , n N ) and define the positive definite matrices For claim i, the largest possible correlation matrix necessary is the Kronecker product Σ = Γ ⊗ Λ, which is necessarily positive definite. The actual copula parameter for claim i is the submatrix Σ[ṽ i ,ṽ i ] corresponding to rows and columns of the coverages with positive amount paid for each claimant. The joint density of the log positive amounts is where c ν {·; Σ} is the Student t copula density with ν degrees of freedom and correlation Σ, and To lighten notation, denote the conditional distribution function In this example, the joint density of the positive amounts is given by

Prior specification in the model for the amounts
In such a large model, it is preferable to use proper priors for all parameters. For the marginal amounts model, the priors are given, where G denotes the Gamma distribution. The covariate effects are taken to be independent a priori, and due to the different scales of the covariates, we use the diffuse variance σ 2 β = 100. Weakly informative Gamma priors are used for the variance and skewness parameters, which are both positive. For the precision parameters λ k , we choose α λ = 4 and β λ = 4.5, so that E(λ k ) = 4/4.5. Setting α ξ = β ξ implies that E(ξ k ) = 1, so the prior expectation is that the normal is symmetric. It is known a priori that the skewness is moderate, so a reasonable range for ξ k is (0.5, 2), as seen in Figure 1 of the Supplement. Using α ξ = β ξ = 10 leads to Pr(ξ k ≤ 1) = 0.54 and Pr(0.5 < ξ k ≤ 2) = 0.96.
We detail below the choice of priors for the parameters of the Student t copula.

Prior for the correlation matrix parameters
The between-claimant correlation parameter γ and within-claimant correlations are assumed independent a priori. Because of the size of the dataset, the prior settings for both Λ and γ are not the primary drivers of the posterior dependence structure, but it is interesting to show how to set informative priors for these parameters. For the between-claimant parameter, many options exist: we assume (γ + 1)/2 ∼ B (8,4), i.e., a Beta distribution with shape parameters 8 and 4, so that the prior expected correlation between claimants is 1/3. Few distributions allowing for informative inputs exist for random correlation matrices. As explained in Smith (2013) and references therein, authors use priors based on Cholesky factorization, partial correlations, or restricted independent normals (Liechty et al., 2004). In many applications, the large dimension of the matrix motivates the use of shrinkage priors, as in Pitt et al. (2006), where the off-diagonal elements of the matrix may equal 0. Barnard et al. (2000) propose two priors, the marginally uniform prior derived from the Inverse Wishart distribution on the covariance, and the jointly uniform prior. These distributions are either non-informative or shrink the correlations towards 0. When higher correlations are riskier, shrinkage priors are inappropriate.
It is unintuitive to set prior expectations on a Cholesky decomposition, so we extend the marginally uniform prior of Barnard et al. (2000) to be informative on Λ. This prior exploits the Inverse Wishart distribution for symmetric positive definite matrices and the relationship between covariance and correlation matrices. Suppose W is a k × k Inverse Wishart random covariance matrix with positive definite scale matrix Ψ and degrees of freedom ν > k. Then one can write W = ARA, where A is a diagonal matrix with ith diagonal element a i > 0, and R is a correlation matrix. The distribution of R when W has a standard Inverse Wishart distribution can be used as a prior. The derivation is shown in the Supplement, along with our choice of parameters.

Choice of degrees of freedom
Given the correlation matrix, the parameter ν ∈ (1, ∞) influences the tendency of variables to vary together in high quantiles, as measured by the Tail Dependence Coefficient (TDC): low values of ν imply strong tail dependence and beyond 12-15 degrees of freedom, the difference between the Student t and the Gaussian copula (which has no asymptotic dependence) becomes practically negligible. Furthermore, it is common to use a degenerate prior for the parameter ν to avoid convergence issues due to the flatness of the t likelihood.  (7) 0.14 0.17 0.09 0.10 0.10 0.08 TDC(10) 0.08 0.11 0.05 0.06 0.05 0.04 TDC(13) 0.05 0.07 0.02 0.03 0.05 0.02 Table 3: Pairwise Student t copula maximum pseudo-likelihood fit based on residuals from closed claims; in the table, TDC(ν) refers to the fitted value of the tail dependence coefficient when the degrees of freedom are set to ν. The letters M, I, C, and E respectively stand for the Medical, Income, Caregiver, and Expenses component.
To inform the choice of ν, we explored the dependence structure in the closed claims with rank-based inference tools for copula regression developed in Côté et al. (2019). Bivariate Student t copulas were fitted by maximum pseudo-likelihood based on the ranks of the residuals from each of the within-claimant pair. The results are shown in Table 3. For the Student t copula, a single value of the parameter ν must be used for all pairs. As the sampling bias may reduce the tail dependence strength in this preliminary analysis, and considering that stronger tail dependence is more conservative (thus preferable) in insurance applications, we set ν = 10. Other values of ν near 10 would yield very similar posterior distributions for the other parameters (not shown).

Estimation with open claims and missing covariates
In the model described in Section 3, the full likelihood factorizes into two parts, namely a) the likelihood relating to the types s ≡ (s 1,1 , . . . , s N,n N ) and presence of Expenses w ≡ (w 1,1 , . . . , w N,n N ): Posterior distributions for the vectors of parameters (β S , β W , α W ) and (β Y , σ, ξ, γ, Λ) can thus be computed separately in parallel, because it is assumed that the parameters for the type and presence of Expenses are a priori independent of those for the amounts.
In a case where all covariates are observed and all claims are settled, inference proceeds with classical MCMC techniques. The posterior distribution of the type parameters is simply where π 0 (β S ) = φ(β S ; 0, σ S ), and the posterior for the parameters (β W , α W ) is where π 0 is again the independent Gaussian prior. The posterior distribution of the amounts parameters given the component vectors m is π 0 (σ k )π 0 (ξ k )π 0 (γ k ), as specified in Section 3.5 and π 0 (a, Λ) is given in (1) in the Supplement.
In the present case and in many actuarial applications, however, these posteriors cannot be used directly because claim information is incomplete at evaluation date. This section describes how to handle these issues by integrating into the fully Bayesian model the fact that some covariates may be missing and unsettled claims are censored.
In Section 4.1, we detail how the missing covariates x mis can be treated as any other parameter in the Bayesian procedure. In Section 4.2, we handle the type and presence of Expenses for the open claims. Both are censored in an unconventional way due to the fact that open claims may change type before settlement. We model the transition probabilities before settlement conditional on the gravity and the time elapsed since claim occurrence.
In Section 4.3, we model the multivariate amounts given the information known at evaluation date. In the dataset, we observed that many claims are classified as open for multiple development periods before they close without further payments. Therefore, the settlement amounts for open claims are modeled with a continuous component and a point mass at the current paid amounts. The probability associated with this point mass depends on the number of consecutive periods without payment, gravity, and the time between claim occurrence and evaluation date.
Details on the MCMC algorithm based on the procedure described in this section are provided in the Supplement, along with the resulting posterior mean and standard errors of the model parameters.

Treatment of missing covariates
The pain level is a covariate in the type model and is missing whenever the claimant did not answer the question "Rate your pain on a scale of 1 to 10." This may happen, e.g., when the reporting delay is long and the question is no longer relevant, or when the claimant is unable to answer questions.
As the pain level is missing in a variety of claims ranging from simple to extreme, missingness is assumed to happen at random. In that case, and in view of the relatively low percentage of missingness, the analysis can still be performed without modeling the inclusion mechanism (Gelman et al., 2004). For the type model, the posterior distribution of the parameters, given the observed covariates x obs , is where x mis is the vector of missing covariates. The prior distribution of the missing values is taken to be a multinomial proportional odds model fitted with the complete open and closed records. It was designed to yield a reasonable distribution of generated pain levels given the relevant covariate values (number of injuries, reform indicator, claim gravity, civil status, and policy age). Details are given in Section 5 of the Supplement.
Although integral (1) is high-dimensional, it can be computed with data augmentation in a Gibbs sampler; see the Supplement. The conditional posterior of the missing covariate x mis,ij for claimant j in claim i is π(x mis,ij | β S , s ij , x obs ) ∝ p(s ij | β S , x obs,ij , x mis,ij )π 0 (x mis,ij | x obs ).
The claimant's age, income indemnity and pain levels appear in the amounts model, and they may be missing. Assuming that they are missing at random, the posterior distribution of the amounts parameters (β Y , σ, ξ, γ, Λ) given the observables (y, x obs , m) is where the nuisance parameter a is integrated out; see the Supplement. The pain level was found to be unrelated to age or income, so a multinomial proportional odds model was used again as a prior, but this time only the records with strictly positive paid amounts were used to fit the parameters and the claim type was included as a covariate. As pain level is involved in both the type and amounts models, the updates could in principle be done once for both parts. However, doing so would prevent one from running the two chains in parallel and would complicate the programming with little added value.
The income indemnity level is also a multinomial proportional odds model involving the reform indicator, the number of initial injuries, the policy age, the claim type, whether the policyholder has home insurance, and the rating score. Given the income level, age is modeled by a linear regression model truncated from below at 15 and from above at 85. Details are provided in Section 5 of the Supplement.

Treatment of types and presence of Expenses for open files
At evaluation date, 15% of the claimant files are unsettled, and their type may change before they close. To study the behavior of the type from claim reporting to settlement, we use the development data for closed claims, available by quarter (3-month period). These data contain the history of the total payment made (thus the type) in each quarter. The period of claim occurrence is called the first development period (DP), the quarter following this is the second DP, and so on. The DP at evaluation date is the number of periods since the occurrence of the claim. Figure 4 shows the distribution of the type at settlement for files that were open and in Type 2 (Medical) at the end of DPs 1-20. The left panel is for minor injuries, and the right one combines major and catastrophic injuries. These graphs illustrate that an open file may change type, and the probability of moving to another category depends on the gravity and the DP. Also, there are a few reimbursements (moves from 2 to 1). Therefore, the types for open files are censored, although in an unusual way due to the reimbursement possibility and the low probability of transition to another type. This feature can be incorporated into the inference by imputation of the ultimate types for open files, given the censoring point. Let s op and s cl be the sets of ultimate types for files that are respectively open or closed at evaluation date. If the complete set s = s cl ∪ s op of types at settlement were known, the posterior for β S would be as in (1). As s op is unknown at evaluation date, however, one can only rely on the set s op of current types for open files, yielding the following posterior for β S : π(β S | s cl , s op , x obs ) = π(β S | s, x obs )p(s op | s op , s cl , x obs ) ds op . With this posterior, we can then use data augmentation (Tanner and Wong, 1987) to simplify the estimation by imputing the possible values of s op .
Here, the censoring is very informative and we must design an imputation model to describe the probabilities of transition from s op to categories {1, . . . , 5}, considering gravity and the development period at evaluation. This model is constructed separately from the multinomial model on the ultimate claim type, because the fact that the type is s op after t DPs contains more information on s op than the covariates known at the time of file opening. That the type changes after DP t and before settlement is also plausibly unrelated to any reporting-time covariate x obs other than gravity. Details on the estimation of the transition probabilities are given in the Supplement. In the imputation procedure, these probabilities were treated as deterministic.
A schematic representation of the transition probabilities is depicted in Figure 5. The solid lines represent fixed probabilities, while the dashed lines are those that vary proportionately with the multinomial probabilities in the outer Bayesian model. This helps to retain the covariate effects. There are no lines when the probability is null, such as for going from type 3 (MI) to 4 (MC).
A similar analysis is performed for the presence of Expenses. In this case, there is no possibility of reimbursement, so w op,ij = 1 implies w op,ij = 1 almost surely. However, if there are no Expenses at evaluation for an open file, the probability of transitioning to 1 before settlement is non-zero and is estimated using the development data, varying by DP and gravity. Estimated values are given in the Supplement.
Let w = w op ∪ w cl be the union of the sets of ultimate Expenses indicators for files that are respectively open or closed at evaluation date. Let also x = x obs ∪ x mis denote the completed covariate information. Then the posterior distribution of the type and expense parameters π(β S , β W , α W | s cl , s op , w cl , w op , x obs ) is proportional to To compute this integral, we update the missing covariates x mis and the ultimate responses for open claims w op and s op sequentially in a Gibbs sampler using the imputation models. For details about the algorithm and the posterior means and standard errors of the parameters, consult the Supplement.

Treatment of the amounts for open files
Let y op and y cl be the sets of ultimate log amounts for files that are respectively open or closed at evaluation date. Conditional on the complete response data y = y cl ∪ y op , the posterior for the amounts parameters would be as given in (2). As y op is unknown at evaluation date, however, one can only rely on the set y op of log amounts for open files at evaluation date, which are treated as censored observations because other payments relating to these files can be made before settlement. We can then compute the posterior π(β Y , σ, ξ, γ, Λ | y cl , y op , x obs , m) with the integral π(β Y , σ, ξ, γ, Λ | y, x obs , m)p(y op | y op , x, m) dy op .
Note that if the type aggravates before settlement, then the paid amount for that component would be non-zero, but that does not inform on the amounts parameters. Accordingly, the analysis is performed conditional on m, and only the amounts that are strictly positive at the evaluation date are allowed to aggravate before settlement.
Many open files actually close later on without further payments, so for these claims the censored value is in fact equal to the settlement amount. It is thus necessary to account for the probability of settling at current value; otherwise the unknown settlement amounts will be overstated. The imputation model for the censored amounts has two parts: a mass at y op,i = y op,i representing the probability of closing without other payments, and a continuous component left truncated at y op,i .
As in Section 4.2, the development data for settled claims are used to assess the probability of closing without more payments. Let t now be the evaluation date and P t be the payment made in DP t. Further, let T c be the file closure time, i.e., the period at the end of which the company releases the reserves for that file and closes it. If T c > t now , then the file is open at evaluation date and the amount is censored. If T c ≤ t now , then the file is closed and the complete development is known.
Define T f as the time of financial closure, i.e., the period in which the last positive payment was made for a claim. We have T f = sup{t : t ≤ T c , P t > 0}. This is different from T c and in general, T f ≤ T c . For all claims that are open, i.e., for which T c > t now , the focus lies in whether the financial closure already occurred, i.e., whether T f ≤ t now . For example, consider the time line:

File closed ↓
This sketch represents the cash flows of a claim that closed after six development periods, so T c = t 6 , but for which the time of financial closure is T f = t 3 , as the last payment was made in Period 3. At the end of DPs 3, 4 and 5, the file was open, but the financial closure had already happened. If T last is the last period, before and including t now , where there was a strictly positive payment, i.e., T last = sup{t : t ≤ t now , P t > 0}, the probability of interest is This probability is modeled by a Bernoulli GLM with logistic link depending on three covariates: the development year d i , the number of periods since last positive payment t now − T last , and the gravity. Using the development period instead of the development year yields very similar results.
There are more than one observation per claimant, and for the current estimation purpose, all observations for which the file was open at end of period, but ultimately closed before evaluation, are considered. For example, in the previous time line, the useful observations are the periods t 1 , . . . , t 5 for which the claim is still open at end of period, i.e., t < T c . For period t, the response variable is the indicator 1(T f ≤ t).
Based on the development data for all claims that were settled as of December 31, 2015, the following model, selected with AIC, BIC and the optimal ROC curve, is used to simulate whether an open claim closes without other payments. The linear predictor iŝ where z i = t i − T last,i is the number of consecutive periods without payment, d i is the development year, and 1 maj,i and 1 cat,i indicate whether the gravity is major or catastrophic, respectively. The fitted probabilities are Finally, the imputation distribution is the full conditional posterior, so that For each open file, 1(T f ≤ t now ) can be simulated from the logistic model defined in (3). If it is 1, then y op is set equal to the current value y op , and if it is zero, y op is simulated using the full conditional Bayesian copula model for the amounts, truncated from below at the censoring values. This model preserves the covariate effects and the dependence. Pitfalls of this approach are the assumptions that the censoring point and the development year are uninformative on the levels of the future payments.
Details of the MCMC algorithms, starting values, proposal distributions and posterior distributions for the marginal parameters are given in the Supplement and the posterior distribution of the dependence parameters is discussed in Section 6.3.

Comparison of the proposed procedure with others using closed claims or ignoring the claim status
A good posterior predictive distribution based on the available data should resemble one that would be obtained if the complete response data were known, say the "target" posterior distribution. In this section, we illustrate how our Bayesian procedure performs better in this regard than simpler strategies. We compare three approaches:

Closed only:
The estimation is based solely on the data for files that were settled at the evaluation date. There is no imputation model as the complete data (t, w, y) are known for this sample.

No censoring:
The estimation is based on the data for all claims that were reported at the evaluation date. We assume (rather optimistically) that all files are settled, as in Frees and Valdez (2008) or Frees et al. (2009) when the claim status information is not available. There is no imputation model as the complete data (t, w, y) are assumed to be known.   As shown in Figure 6, the imputation of open claims (red curves) leads to more realistic distributions. In both cases, the observed amount falls close to the median of the red curve, but in the right tail of the (unachievable) target distribution. Indeed, while the Accident Benefits claims were known to spiral up before the legal reform, the extent of this effect had yet to be observed in the censored data as of 2008. In contrast, the black curve takes into account the (future) claim development until 2015. For the first quarter of 2009, the observed amount falls close to the median of both posterior distributions. The distribution with imputation overstates the risk slightly compared to the target distribution, as any conservative approach should.

Applications of the model and predictions
There are various actuarial applications for a claim amounts model. An insurance company may use the predictive distribution of the multivariate insurance claims for setting the initial reserves and capital, instead of basing them exclusively on the claims expert's assessment. The insurer can then decide to closely monitor claims that seem particularly risky at reporting. The posterior distributions of parameters relating to explanatory variables may be interpreted to identify the major risk factors (and maybe establish a prevention campaign) or to understand their impact, such as the effectiveness of the 2010 legislative reform. The model can also be used to determine reserves for individual claims that are "reported but not paid," i.e., those for which the insurer knows the claim characteristics but hasn't made any payments yet.

Posterior distribution for types and presence of expenses
The multinomial regression for type, together with the binomial model for presence of expenses given the type, fits the holdout-sample data well. To confirm this, predicted and observed counts in each category can be compared, given a combination of covariates. For each unique combination x S of covariates in the multinomial model, the predicted counts are obtained as follows: (i) Count the number N of observations with this combination of covariates.
(iv) For type s, the mean counts are Np(s | x S , β S )p(w = 1 | s, x W , β W , α W ) with Expenses and Np(s | The predicted counts are the mean of the posterior distribution of the counts, estimated by the average of the counts obtained when repeating the steps (i)-(iv) many times.
For example, the following covariate combination appeared 24 times in the sample, and all files were closed: major claim in 2005, policy in force since more than five years, claimant is married, has two injuries and pain level is 4/10. In that case, the predicted mean and the 95% credible interval on the mean count are shown in Table 4; the fit seems adequate. For all combinations of covariates where N > 10, the predicted versus observed counts in the different categories are plotted in Figure 7 for holdout-sample observations. On average, the observed counts are greater than predicted counts for types 1-None and 2-Medical. This is expected as open claims can aggravate.

Posterior predictive distribution of the multivariate paid amount
To approximate the posterior predictive distribution of the multivariate paid amount for a claim or a portfolio of claims, perform the following steps many times:

Posterior distributions of the correlation parameters
The marginal posterior distributions of the correlation parameters are shown in the left panel of Figure 11, along with the common diffuse marginal prior (dashed line). The within-claimant dependence is clearly significant. Modeling of the between-claimant dependence is also justified, as the posterior distribution of γ is narrow around 0.364, as shown in the right panel of Figure 11.

Joint posterior distribution of claim components
Finally, to visualize the joint predictive distribution of the components of a given claim, 10,000 realizations were drawn from the posterior distribution of Claim 5123 described in the running example. Approximately 41% of these realizations closed at 0. Figure 12 shows scatter plots of the non-negative payments (on the log scale) for the components that were ultimately observed for that claim. The points where the red lines cross correspond to the actual observation. The posterior association between the Medical and Caregiver payments for Claimant 2 is large, with the observed value of Kendall's tau equal to 39%.

Conclusion
In this article, we develop a sophisticated Bayesian model for multivariate insurance claim amounts. Major advantages of the Bayesian approach in this context are (i) the availability of the posterior predictive distribution of the paid amount, including process risk and parameter risk; (ii) the opportunity to account for expert opinion in the priors; and (iii) the possibility of taking advantage of the information in open claims and in records with missing covariates. It is crucial to appropriately include the unsettled claims in the inference, as extreme claims are more likely to be open at evaluation date.
Direct extensions of our approach include modeling the severity dispersion with covariates and using an asymmetric copula to account for the fact that high Income payments often imply high Medical payments, but not conversely. The ultimate goal is to construct a reserving model for the joint development of multivariate individual claims. While challenging, it would be advantageous to use a Bayesian approach for this task: it is ideally suited to the quantification of reserve variability.

Supplementary Material
Supplement to "A Bayesian approach to modeling multivariate multilevel insurance claims in the presence of unsettled claims." (DOI: 10.1214/20-BA1243SUPP; .pdf). Contents: Expression for the moment generating function of the Fernández-Steel skewed normal; derivation of an informative prior on the correlation matrix and a prior specification; transition probabilities between types; regression models for missing covariates; MCMC algorithms; summary of the posterior distributions for model parameters.