The Annals of Applied Statistics

Multilevel modeling of insurance claims using copulas

Peng Shi, Xiaoping Feng, and Jean-Philippe Boucher

Full-text: Open access


In property-casualty insurance, claims management involves modeling a semi-continuous insurance cost associated with individual risk transfer: the cost is exactly zero for a policy without claims and positive and continuous otherwise. This practice is further complicated by the multilevel structure of insurance claims data, where a contract often covers a group of policyholders, each policyholder is insured under multiple types of coverage, and the contract is observed repeatedly over time. This data hierarchy introduces a complex dependence structure among claims and leads to diversification in the insurer's liability portfolio.
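A Tweedie distribution with power parameter 1 < p < 2, which is the family used for each coverage type in the model described below, is one standard way to represent such a semi-continuous cost. It is a Poisson number of gamma-distributed claim amounts, so it places a point mass exp(-λ) at zero and a continuous density on positive costs. The following is a minimal plain-Python sketch of that representation for a single margin with fixed mean μ, dispersion φ and power p. It omits the covariates and the dispersion submodel of the double GLM, and all function names are hypothetical. The mapping λ = μ^(2-p)/(φ(2-p)), gamma shape α = (2-p)/(p-1) and gamma scale γ = φ(p-1)μ^(p-1) is the standard compound Poisson-gamma form; the density is a direct truncated sum of the mixture, adequate for moderate λ but not tuned for extreme parameter values.

```python
import math
import random


def tweedie_cp_params(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to compound Poisson-gamma parameters.

    Returns (lam, alpha, gamma): Poisson rate, gamma shape, gamma scale.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson case requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    alpha = (2.0 - p) / (p - 1.0)
    gamma = phi * (p - 1.0) * mu ** (p - 1.0)
    return lam, alpha, gamma


def tweedie_prob_zero(mu, phi, p):
    """Point mass at zero: P(Y = 0) = P(no claims) = exp(-lam)."""
    lam, _, _ = tweedie_cp_params(mu, phi, p)
    return math.exp(-lam)


def tweedie_density(y, mu, phi, p, n_max=200):
    """Density of the continuous part at y > 0.

    f(y) = sum_{n>=1} Pois(n; lam) * Gamma(y; shape n*alpha, scale gamma),
    truncated after n_max terms and evaluated term by term in log space.
    """
    lam, alpha, gamma = tweedie_cp_params(mu, phi, p)
    total = 0.0
    for n in range(1, n_max + 1):
        shape = n * alpha
        log_pois = -lam + n * math.log(lam) - math.lgamma(n + 1)
        log_gam = ((shape - 1.0) * math.log(y) - y / gamma
                   - math.lgamma(shape) - shape * math.log(gamma))
        total += math.exp(log_pois + log_gam)
    return total


def simulate_tweedie(mu, phi, p, rng):
    """Draw one cost: a Poisson number of claims, each with a gamma-distributed size."""
    lam, alpha, gamma = tweedie_cp_params(mu, phi, p)
    # Poisson draw by inversion (the stdlib random module has no Poisson sampler);
    # the prob > 0 guard stops the loop if the probabilities underflow.
    n, prob, u = 0, math.exp(-lam), rng.random()
    cum = prob
    while u > cum and prob > 0.0:
        n += 1
        prob *= lam / n
        cum += prob
    # Sum of n gamma(alpha, scale=gamma) claim sizes; zero claims gives a cost of 0.
    return sum(rng.gammavariate(alpha, gamma) for _ in range(n))
```

With μ = 2, φ = 1.5 and p = 1.5, λ ≈ 1.886, so roughly 15% of simulated costs are exactly zero while the sample mean stays close to μ, because λ·α·γ = μ for every valid (μ, φ, p).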

To capture the unique features of policy-level insurance costs, we propose a copula regression for the multivariate longitudinal claims. In the model, the Tweedie double generalized linear model is employed to examine the semi-continuous claim cost of each coverage type, and a Gaussian copula is specified to accommodate the cross-sectional and temporal dependence among the multilevel claims. Estimation and inference are based on the composite likelihood approach, and the properties of the parameter estimates are investigated through simulation studies. Applying the model to a portfolio of personal automobile policies from a Canadian insurer, we show that the proposed copula model provides valuable insights into an insurer's claims management process.
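The dependence step can be illustrated with a pairwise log-likelihood, one common form of composite likelihood, under a Gaussian copula for two semi-continuous outcomes (for example, two coverage types of one contract in the same year, or one coverage in two years). This is a minimal sketch under stated assumptions, not the authors' implementation. Each margin is summarized by hypothetical inputs (y, u, f): u is the marginal CDF at y (the zero-mass probability when y = 0) and f is the marginal density when y > 0. A pair with both costs positive contributes f1·f2·c(u1, u2); a pair with one zero uses the copula's partial derivative, Φ((z1 - ρz2)/√(1 - ρ²)), times the positive margin's density; a pair of zeros contributes Φ2(z1, z2; ρ), with z = Φ⁻¹(u). All pairs within a cluster receive equal weight here, which is an assumption, and the bivariate normal CDF is approximated with Simpson's rule because the Python standard library does not provide one.

```python
import math
from itertools import combinations
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def bivariate_normal_cdf(a, b, rho, n_steps=2000):
    """Phi_2(a, b; rho) for |rho| < 1, via Simpson's rule on
    int_{-inf}^{a} phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)) dx.
    The lower limit -10 stands in for -infinity; n_steps must be even.
    """
    lower = -10.0
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n_steps

    def integrand(x):
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / s)

    total = integrand(lower) + integrand(a)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * integrand(lower + i * h)
    return total * h / 3.0


def pair_loglik(m1, m2, rho):
    """Log-likelihood of one pair of semi-continuous outcomes (y >= 0), Gaussian copula.

    Each margin is (y, u, f): u = F(y) (equal to P(Y = 0) when y == 0),
    f = marginal density at y (unused when y == 0). Requires |rho| < 1.
    """
    (y1, u1, f1), (y2, u2, f2) = m1, m2
    z1, z2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    s = math.sqrt(1.0 - rho * rho)
    if y1 > 0 and y2 > 0:
        # Both positive: log f1 + log f2 + log c(u1, u2), Gaussian copula density.
        log_c = (-math.log(s)
                 - (rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2)
                 / (2.0 * s * s))
        return math.log(f1) + math.log(f2) + log_c
    if y1 == 0 and y2 == 0:
        # Both zero: C(u1, u2) = Phi_2(z1, z2; rho).
        return math.log(bivariate_normal_cdf(z1, z2, rho))
    if y1 == 0:
        # Y1 zero, Y2 positive: f2 * dC/du2 = f2 * Phi((z1 - rho * z2) / s).
        return math.log(f2) + math.log(STD_NORMAL.cdf((z1 - rho * z2) / s))
    # Y1 positive, Y2 zero: mirror image of the previous case.
    return math.log(f1) + math.log(STD_NORMAL.cdf((z2 - rho * z1) / s))


def pairwise_loglik(margins, rho_matrix):
    """Composite log-likelihood of one cluster: equally weighted sum over pairs j < k."""
    return sum(pair_loglik(margins[j], margins[k], rho_matrix[j][k])
               for j, k in combinations(range(len(margins)), 2))
```

With ρ = 0 every pair contribution reduces to the product of the marginal terms, which is a quick sanity check; in an application, u and f would come from fitted Tweedie margins and the correlation matrix would be parameterized over coverage types and years.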

Article information

Ann. Appl. Stat., Volume 10, Number 2 (2016), 834-863.

Received: June 2015
Revised: December 2015
First available in Project Euclid: 22 July 2016


Keywords: composite likelihood; insurance claims; longitudinal data; multivariate regression; property-casualty insurance; Tweedie distribution


Shi, Peng; Feng, Xiaoping; Boucher, Jean-Philippe. Multilevel modeling of insurance claims using copulas. Ann. Appl. Stat. 10 (2016), no. 2, 834–863. doi:10.1214/16-AOAS914.
