We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly: for defining the misspecification risk of a model, and for grounding both the likelihood and likelihood cross-validation, which can be used for choosing the weights in penalized likelihood. Families of penalized likelihood estimators and particular sieve estimators are shown to be equivalent. The similarity of these likelihoods to a posteriori distributions in a Bayesian approach is considered.