Statistical Science

Accurate Parametric Inference for Small Samples

Alessandra R. Brazzale and Anthony C. Davison

Source: Statist. Sci. Volume 23, Number 4 (2008), 465-484.

Abstract

We outline how modern likelihood theory, which provides essentially exact inferences in a variety of parametric statistical problems, may routinely be applied in practice. Although the likelihood procedures are based on analytical asymptotic approximations, the focus of this paper is not on theory but on implementation and applications. Numerical illustrations are given for logistic regression, nonlinear models, and linear non-normal models, and we describe a sampling approach for the third of these classes. In the case of logistic regression, we argue that approximations are often more appropriate than ‘exact’ procedures, even when these exist.

Keywords: Conditional inference; heteroscedasticity; logistic regression; Lugannani–Rice formula; Markov chain Monte Carlo; nonlinear model; R; regression-scale model; saddlepoint approximation; spline; statistical computing

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1242049390
Digital Object Identifier: doi:10.1214/08-STS273
Mathematical Reviews number (MathSciNet): MR2530546

2009 © Institute of Mathematical Statistics