Electronic Journal of Statistics

Gaussian copula marginal regression

Guido Masarotto and Cristiano Varin

Abstract

This paper identifies and develops the class of Gaussian copula models for marginal regression analysis of non-normal dependent observations. The class provides a natural extension of traditional linear regression models with normal correlated errors. Continuous, discrete and categorical responses are all allowed. Dependence is modelled through multivariate normal errors. Inference is performed through a likelihood approach. While the likelihood function is available in closed form for continuous responses, numerical approximations are used in the non-continuous setting. Residual analysis and a specification test are suggested for validating the adequacy of the assumed multivariate model. The methodology is implemented in an R package called gcmr. Illustrations include simulations and real data applications regarding time series, cross-design data, longitudinal studies, survival analysis and spatial regression.

Article information

Source
Electron. J. Statist. Volume 6 (2012), 1517-1549.

Dates
First available in Project Euclid: 31 August 2012

Permanent link to this document
http://projecteuclid.org/euclid.ejs/1346421603

Digital Object Identifier
doi:10.1214/12-EJS721

Mathematical Reviews number (MathSciNet)
MR2988457

Zentralblatt MATH identifier
06167006

Citation

Masarotto, Guido; Varin, Cristiano. Gaussian copula marginal regression. Electronic Journal of Statistics 6 (2012), 1517–1549. doi:10.1214/12-EJS721. http://projecteuclid.org/euclid.ejs/1346421603.

