References
Aitkin, M. and Stasinopoulos, M. (1989). Likelihood analysis of a binomial sample size problem. In Contributions to Probability and Statistics (L. J. Gleser, M. D. Perlman, S. J. Press and A. Sampson, eds.) Springer, New York.
Barnard, G. A., Jenkins, G. M. and Winsten, C. B. (1962). Likelihood inference and time series (with discussion). J. Roy. Statist. Soc. Ser. A 125 321-372.
Barndorff-Nielsen, O. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70 343-365.
Barndorff-Nielsen, O. (1988). Parametric Statistical Models and Likelihood. Lecture Notes in Statist. 50. Springer, New York.
Barndorff-Nielsen, O. (1991). Likelihood theory. In Statistical Theory and Modelling: In Honour of Sir D.R. Cox. Chapman and Hall, London.
Bartlett, M. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. London Ser. A 160 268-282.
Basu, D. (1975). Statistical information and likelihood (with discussion). Sankhyā Ser. A 37 1-71.
Basu, D. (1977). On the elimination of nuisance parameters. J. Amer. Statist. Assoc. 72 355-366.
Bayarri, M. J., DeGroot, M. H. and Kadane, J. B. (1988). What is the likelihood function? In Statistical Decision Theory and Related Topics IV (S. S. Gupta and J. O. Berger, eds.) 2 3-27. Springer, New York.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer, New York.
Berger, J. O. and Bernardo, J. M. (1989). Estimating a product of means: Bayesian analysis with reference priors. J. Amer. Statist. Assoc. 84 200-207.
Berger, J. O. and Bernardo, J. M. (1992). Ordered group reference priors with applications to a multinomial problem. Biometrika 79 25-37.
Berger, J. O. and Berry, D. A. (1988). Statistical analysis and the illusion of objectivity. American Scientist 76 159-165.
Berger, J. O., Philippe, A. and Robert, C. (1998). Estimation of quadratic functions: noninformative priors for non-centrality parameters. Statist. Sinica 8 359-376.
Berger, J. O. and Strawderman, W. (1996). Choice of hierarchical priors: admissibility in estimation of normal means. Ann. Statist. 24 931-951.
Berger, J. O. and Wolpert, R. L. (1988). The Likelihood Principle: A Review, Generalizations, and Statistical Implications, 2nd ed. IMS, Hayward, CA.
Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference (with discussion). J. Roy. Statist. Soc. Ser. B 41 113-147.
Bernardo, J. M. and Smith, A. F. M. (1994). Bayesian Theory. Wiley, New York.
Bjørnstad, J. (1996). On the generalization of the likelihood function and the likelihood principle. J. Amer. Statist. Assoc. 91 791-806.
Butler, R. W. (1988). A likely answer to "What is the likelihood function?" In Statistical Decision Theory and Related Topics IV (S. S. Gupta and J. O. Berger, eds.) 2 21-26. Springer, New York.
Carroll, R. J. and Lombard, F. (1985). A note on N estimators for the binomial distribution. J. Amer. Statist. Assoc. 80 423-426.
Chang, T. and Eaves, D. (1990). Reference priors for the orbit in a group model. Ann. Statist. 18 1595-1614.
Cox, D. R. (1975). Partial likelihood. Biometrika 62 269-276.
Cox, D. R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B 49 1-39.
Cruddas, A. M., Reid, N. and Cox, D. R. (1989). A time series illustration of approximate conditional likelihood. Biometrika 76 231.
Datta, G. S. and Ghosh, J. K. (1995a). Noninformative priors for maximal invariant parameter in group models. Test 4 95-114.
Datta, G. S. and Ghosh, J. K. (1995b). On priors providing frequentist validity for Bayesian inference. Biometrika 82 37-45.
Datta, G. S. and Ghosh, J. K. (1996). On the invariance of noninformative priors. Ann. Statist. 24 141-159.
Dawid, A. P., Stone, M. and Zidek, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference. J. Roy. Statist. Soc. Ser. B 35 180-233.
de Alba, E. and Mendoza, M. (1996). A discrete model for Bayesian forecasting with stable seasonal patterns. In Advances in Econometrics II (R. Carter Hill, ed.) 267-281. JAI Press.
Draper, N. and Guttman, I. (1971). Bayesian estimation of the binomial parameter. Technometrics 13 667-673.
Eaton, M. L. (1989). Group Invariance Applications in Statistics. IMS, Hayward, CA.
Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10 507.
Fisher, R. A. (1921). On the "probable error" of a coefficient of correlation deduced from a small sample. Metron 1 3-32.
Fisher, R. A. (1935). The fiducial argument in statistical inference. Ann. Eugenics 6 391-398.
Fraser, D. A. S. and Reid, N. (1989). Adjustments to profile likelihood. Biometrika 76 477-488.
Ghosh, J. K., ed. (1988). Statistical Information and Likelihood. A Collection of Critical Essays by D. Basu. Springer, New York.
Ghosh, J. K. and Mukerjee, R. (1992). Noninformative priors. In Bayesian Statistics 4 (J. O. Berger, J. M. Bernardo, A. P. Dawid and A. F. M. Smith, eds.) 195-203. Oxford Univ. Press.
Gleser, L. and Hwang, J. T. (1987). The nonexistence of 100(1 − α)% confidence sets of finite expected diameter in errors-in-variables and related models. Ann. Statist. 15 1351-1362.
Good, I. J. (1983). Good Thinking: The Foundations of Probability and Its Applications. Univ. Minnesota Press.
Hui, S. and Berger, J. O. (1983). Empirical Bayes estimation of rates in longitudinal studies. J. Amer. Statist. Assoc. 78 753-760.
Jeffreys, H. (1961). Theory of Probability. Oxford Univ. Press.
Kahn, W. D. (1987). A cautionary note for Bayesian estimation of the binomial parameter n. Amer. Statist. 41 38-39.
Kalbfleisch, J. D. and Sprott, D. A. (1970). Application of likelihood methods to models involving large numbers of parameters. J. Roy. Statist. Soc. Ser. B 32 175-208.
Kalbfleisch, J. D. and Sprott, D. A. (1974). Marginal and conditional likelihood. Sankhyā Ser. A 35 311-328.
Laplace, P. S. (1812). Théorie Analytique des Probabilités. Courcier, Paris.
Lavine, M. and Wasserman, L. A. (1992). Can we estimate N? Technical Report 546, Dept. Statistics, Carnegie Mellon Univ.
Liseo, B. (1993). Elimination of nuisance parameters with reference priors. Biometrika 80 295-304.
McCullagh, P. and Tibshirani, R. (1990). A simple method for the adjustment of profile likelihoods. J. Roy. Statist. Soc. Ser. B 52 325-344.
Moreno, E. and Girón, F. Y. (1995). Estimating with incomplete count data: a Bayesian approach. Technical report, Univ. Granada, Spain.
Neyman, J. and Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica 16 1-32.
Olkin, I., Petkau, A. J. and Zidek, J. V. (1981). A comparison of n estimators for the binomial distribution. J. Amer. Statist. Assoc. 76 637-642.
Raftery, A. E. (1988). Inference for the binomial N parameter: a hierarchical Bayes approach. Biometrika 75 223-228.
Reid, N. (1995). The roles of conditioning in inference. Statist. Sci. 10 138-157.
Reid, N. (1996). Likelihood and Bayesian approximation methods. In Bayesian Statistics 5 (J. O. Berger, J. M. Bernardo, A. P. Dawid and A. F. M. Smith, eds.) 351-369. Oxford Univ. Press.
Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Ann. Statist. 11 416-431.
Savage, L. J. (1976). On rereading R. A. Fisher. Ann. Statist. 4 441-500.
Sun, D. (1994). Integrable expansions for posterior distributions for a two parameter exponential family. Ann. Statist. 22 1808-1830.
Sun, D. and Berger, J. O. (1998). Reference priors with partial information. Biometrika 85 55-71.
Sweeting, T. (1995a). A framework for Bayesian and likelihood approximations in statistics. Biometrika 82 1-24.
Sweeting, T. (1995b). A Bayesian approach to approximate conditional inference. Biometrika 82 25-36.
Sweeting, T. (1996). Approximate Bayesian computation based on signed roots of log-density ratios. In Bayesian Statistics 5 (J. O. Berger, J. M. Bernardo, A. P. Dawid and A. F. M. Smith, eds.) 427-444. Oxford Univ. Press.
Ye, K. and Berger, J. O. (1991). Non-informative priors for inference in exponential regression models. Biometrika 78 645-656.
Zabell, S. L. (1989). R.A. Fisher on the history of inverse probability. Statist. Sci. 4 247-263.
likelihood introduced by Efron (1993). As noted by Berger, Liseo and Wolpert, in non-Bayesian inference one cannot argue for integrated likelihood solely on grounds of rationality and coherency. However, one can at least say that as a likelihood method, integrated likelihood satisfies the likelihood principle, and is a proper likelihood. That is, if two likelihood functions for the full parameter are proportional, then so are the corresponding integrated likelihoods for the parameter of interest. The main inferential issue with eliminating nuisance parameters from the likelihood function is to be able to take into account the uncertainty in the nuisance parameter. It is well known that replacing the nuisance parameter by an estimate, or even by a conditional estimate given the parameter of interest (giving us the profile likelihood), ignores this uncertainty. This can be especially serious if the dimension of the nuisance parameter is large. The resulting likelihood can then be much too accurate, giving the impression that we have more information about the parameter of interest than is warranted. I think that the single most important reason for using an integrated likelihood is, as emphasized in the paper, that this partial likelihood automatically and naturally takes into account uncertainty in the nuisance parameter. A central theme in the paper is the comparison of the operations of integration and maximization. One of the main messages I read from the paper is that any reasonable integrated likelihood will typically outperform the profile likelihood. It seems quite clear that integration is a safer operation than maximization, so if it is obvious what kind of noninformative prior to use, integration would clearly be preferable. In fact, it seems to me that maybe the best thing is to do what Laplace suggested, choose parametrizations of the nuisance
likelihood by Harris (1989). Instead of integrating L with respect to a conditional noninformative prior, one can use a data-based weight function for the nuisance parameter, for example the distribution of the MLE at ,
1996). Thus, in the case that h = c in (24), and det I22 are equal up to first order and thus the integrated and profile likelihoods are the same up to first order. Hence for the reference priors used here, one expects similar profile and integrated likelihoods with large samples and regular models. Appropriately, the examples considered involve small samples or irregular models. The comments here will be restricted primarily to Examples 3, 4 and 5 with some additional comments about likelihood based methods in models with large numbers of nuisance parameters. Example 3 illustrates a general situation where profile likelihood methods can be expected to perform poorly. The problem with using profile
efficiency issues see also Lindsay, 1980). As a different example of the use of random effects integrated likelihoods, consider Example 5. The problems associated with the use of profile likelihood methods appear to be due to the relatively large number of nuisance parameters. A random effects assumption seems natural here given the parameter of interest. Consider the same random effects model as in Example 2: the µi are i.i.d. N(ξ, τ²). In this case, the nuisance parameter is λ = (ξ, τ²) and µ = (µ1, ..., µp) is a random variable (see the notation of Section 1.3.1). The authors' recommendation of (6) as a likelihood in this setting is closely connected with the discussion about empirical Bayes methods in Section 3.3. If (6) is taken as the likelihood, the profile likelihood for θ = Σ µi²/p would be obtained by substituting an estimate of λ into (6). It is more common to substitute an estimate of λ based upon the likelihood for λ: the function of λ obtained by integrating over µ in (6). In this case the estimate for λ is ξ̂ = x̄, τ̂² = max{0, s² − 1}. Substituting this estimate of λ into (6) gives an integrated likelihood that is proportional to an estimate of the conditional density of θ given the observed data. Since the µi | x are independently N(mi, v), where v = τ̂²/(1 + τ̂²) and mi = v xi + (1 − v) x̄, the estimate of the conditional distribution for pθ/v is a noncentral χ² distribution with p degrees of freedom and noncentrality parameter v⁻¹ Σ mi²/2. The additional random effects assumption is used in the above derivation but, even with the original assumption that the µi are a fixed sequence, assuming that θ converges to a limit, one can show that this conditional distribution converges to a point mass at this limit. Statistical models with a large number of nuisance parameters arise frequently, and profile likelihoods for these models often give unreasonable parameter of interest inferences. Often a natural alternative to a model with a large number of nuisance parameters is a random effects or mixture model.
The analysis arising from the use of a random effects model can be viewed as an integrated likelihood method and frequently gives reasonable parameter of interest inferences that are robust to the assumption of randomness about the nuisance parameter. In many other statistical models, profile and integrated likelihoods tend to be similar. However, in models with badly behaved likelihood contributions, as in Example 3, profile likelihoods can give unreasonable parameter of interest inferences where integrated likelihoods give reasonable solutions. In models with a large amount of uncertainty about the nuisance parameter, as in Example 4, integrated likelihoods can give misleading results. Thus the difference between the results of the two methods is informative in itself, which suggests that in important and complex problems it might be of value to integrate the two as a form of sensitivity analysis.
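The contrast between maximizing and integrating over many nuisance parameters can be illustrated with a minimal sketch. It assumes the classical paired-observation setup of Neyman and Scott (1948), which is not necessarily one of the numbered examples discussed above: pairs x_i1, x_i2 are drawn from N(µi, σ²), σ² is the parameter of interest, and each µi is integrated out under a flat weight. The function names, the simulation settings and the flat weight are all illustrative assumptions, not the authors' or the discussants' method.

```python
import numpy as np


def profile_loglik(var, x):
    """Profile log-likelihood of var = sigma^2 (up to a constant).

    Each nuisance mean mu_i is replaced by its maximizer, the pair mean.
    """
    n = x.shape[0]
    s = np.sum((x[:, 0] - x[:, 1]) ** 2)
    # sum_j (x_ij - mean_i)^2 = (x_i1 - x_i2)^2 / 2, so the exponent is -s / (4 var)
    return -n * np.log(var) - s / (4.0 * var)


def integrated_loglik(var, x):
    """Integrated log-likelihood of var under a flat weight on each mu_i.

    The flat weight is an assumption made for this sketch.
    """
    n = x.shape[0]
    s = np.sum((x[:, 0] - x[:, 1]) ** 2)
    # integrating each mu_i out contributes a factor proportional to sqrt(var)
    return -0.5 * n * np.log(var) - s / (4.0 * var)


def maximizers(x):
    """Closed-form maximizers of the two curves, returned as (profile, integrated)."""
    n = x.shape[0]
    s = np.sum((x[:, 0] - x[:, 1]) ** 2)
    return s / (4.0 * n), s / (2.0 * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, true_var = 2000, 4.0
    mu = rng.normal(0.0, 10.0, size=n)  # one nuisance mean per pair
    x = rng.normal(mu[:, None], np.sqrt(true_var), size=(n, 2))
    grid = np.linspace(0.5, 8.0, 751)
    print("profile maximizer on grid:   ", grid[np.argmax(profile_loglik(grid, x))])
    print("integrated maximizer on grid:", grid[np.argmax(integrated_loglik(grid, x))])
    print("closed forms (profile, integrated):", maximizers(x))
```

With many pairs, the profile maximizer settles near half the true variance while the integrated maximizer settles near the true value, which is the kind of discrepancy the comparison of the two methods is meant to expose.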
(1989). We are somewhat concerned with the apparent double use of the data in the definitions of Lest and L est (using the data once in the likelihood function and again in the weight function for integration). Indeed, these weighted likelihoods are somewhat hard to compute. We were able to compute L est for Example 3, and it turned out to be the same as the (inadequate) profile likelihood, which does not instill confidence in the method.
Efron, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika 80 3-26.
Harris, I. (1989). Predictive fit for natural exponential families. Biometrika 76 675-684.
Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887-906.
Leonard, T. (1982). Comment on "A simple predictive density function" by M. Lejeune and G. D. Faulkenberry. J. Amer. Statist. Assoc. 77 657-658.
Liseo, B. and Sun, D. (1998). A general method of comparison for likelihoods. ISDS discussion paper, Duke Univ.
Tierney, L. J. and Kadane, J. B. (1986). Accurate approximations for posterior moments and marginal densities. J. Amer. Statist. Assoc. 81 82-86.