## Statistical Science

### Bootstrap confidence intervals

#### Abstract

This article surveys bootstrap methods for producing good approximate confidence intervals. The goal is to improve by an order of magnitude upon the accuracy of the standard intervals $\hat{\theta} \pm z^{(\alpha)} \hat{\sigma}$, in a way that allows routine application even to very complicated problems. Both theory and examples are used to show how this is done. The first seven sections provide a heuristic overview of four bootstrap confidence interval procedures: $BC_a$, bootstrap-$t$, ABC and calibration. Sections 8 and 9 describe the theory behind these methods, and their close connection with the likelihood-based confidence interval theory developed by Barndorff-Nielsen, Cox and Reid and others.
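
As a point of reference for the abstract, the sketch below contrasts the standard interval $\hat{\theta} \pm z^{(\alpha)} \hat{\sigma}$ with a crude nonparametric percentile interval for a sample mean. It is an illustration under stated assumptions, not one of the four procedures the article surveys: the function names, the choice of the mean as the statistic, and the defaults (`alpha`, `n_boot`, `seed`) are all hypothetical, and the percentile interval is only the basic method that $BC_a$, ABC and calibration are designed to refine.

```python
import math
import random
import statistics
from statistics import NormalDist


def standard_interval(data, alpha=0.05):
    """Standard interval for the mean: theta_hat +/- z * sigma_hat.

    `alpha` is the tail area on each side, so alpha=0.05 gives a
    central 90% interval (an assumed convention for this sketch).
    """
    n = len(data)
    theta_hat = statistics.fmean(data)
    sigma_hat = statistics.stdev(data) / math.sqrt(n)  # estimated standard error
    z = NormalDist().inv_cdf(1 - alpha)  # upper normal percentile
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat


def percentile_interval(data, stat, alpha=0.05, n_boot=2000, seed=0):
    """Crude percentile interval: two points picked off the bootstrap histogram."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Approximate empirical quantiles of the bootstrap replicates.
    lo_idx = max(0, round(alpha * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha) * n_boot) - 1)
    return reps[lo_idx], reps[hi_idx]


if __name__ == "__main__":
    # Made-up, right-skewed illustrative data.
    sample = [1.2, 0.4, 3.1, 0.9, 7.5, 2.2, 0.3, 1.8, 5.6, 0.7, 2.9, 1.1, 0.5, 4.4, 1.6]
    print("standard:  ", standard_interval(sample))
    print("percentile:", percentile_interval(sample, statistics.fmean))
```

The standard interval is symmetric about $\hat{\theta}$ by construction, whereas the percentile interval can follow the skewness of the bootstrap distribution.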

#### Article information

Source
Statist. Sci. Volume 11, Number 3 (1996), 189-228.

Dates
First available in Project Euclid: 17 September 2002

http://projecteuclid.org/euclid.ss/1032280214

Mathematical Reviews number (MathSciNet)
MR1436647

Digital Object Identifier
doi:10.1214/ss/1032280214

Zentralblatt MATH identifier
0955.62574

#### Citation

DiCiccio, Thomas J.; Efron, Bradley. Bootstrap confidence intervals. Statistical Science 11 (1996), no. 3, 189--228. doi:10.1214/ss/1032280214. http://projecteuclid.org/euclid.ss/1032280214.

#### References

• Babu, G. J. and Singh, K. (1983). Inference on means using the bootstrap. Ann. Statist. 11 999-1003.
• Barndorff-Nielsen, O. E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70 343-365.
• Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73 307-322.
• Barndorff-Nielsen, O. E. (1994). Adjusted versions of profile likelihood and likelihood roots, and extended likelihood. J. Roy. Statist. Soc. Ser. B 56 125-140.
• Barndorff-Nielsen, O. E. and Chamberlin, S. R. (1994). Stable and invariant adjusted directed likelihoods. Biometrika 81 485-499.
• Beran, R. (1987). Prepivoting to reduce level error of confidence sets. Biometrika 74 457-468.
• Bickel, P. J. (1987). Comment on "Better bootstrap confidence intervals" by B. Efron. J. Amer. Statist. Assoc. 82 191.
• Bickel, P. J. (1988). Discussion of "Theoretical comparison of bootstrap confidence intervals" by P. Hall. Ann. Statist. 16 959-961.
• Cox, D. R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference. J. Roy. Statist. Soc. Ser. B 49 1-39.
• Cox, D. R. and Reid, N. (1993). A note on the calculation of adjusted profile likelihood. J. Roy. Statist. Soc. Ser. B 55 467-472.
• DiCiccio, T. J. (1984). On parameter transformations and interval estimation. Biometrika 71 477-485.
• DiCiccio, T. J. and Efron, B. (1992). More accurate confidence intervals in exponential families. Biometrika 79 231-245.
• DiCiccio, T. J. and Martin, M. (1993). Simple modifications for signed roots of likelihood ratio statistics. J. Roy. Statist. Soc. Ser. B 55 305-316.
• DiCiccio, T. J. and Romano, J. P. (1995). On bootstrap procedures for second-order accurate confidence limits in parametric models. Statist. Sinica 5 141-160.
• Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1-26.
• Efron, B. (1981). Nonparametric estimates of standard error: the jackknife, the bootstrap, and other methods. Biometrika 68 589-599.
• Efron, B. (1987). Better bootstrap confidence intervals (with discussion). J. Amer. Statist. Assoc. 82 171-200.
• Efron, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika 80 3-26.
• Efron, B. (1994). Missing data, imputation, and the bootstrap (with comment and rejoinder). J. Amer. Statist. Assoc. 89 463-478.
• Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York.
• Hall, P. (1986). On the bootstrap and confidence intervals. Ann. Statist. 14 1431-1452.
• Hall, P. (1988). Theoretical comparison of bootstrap confidence intervals (with discussion). Ann. Statist. 16 927-985.
• Hall, P. and Martin, M. A. (1988). On bootstrap resampling and iteration. Biometrika 75 661-671.
• Hougaard, P. (1982). Parametrizations of non-linear models. J. Roy. Statist. Soc. Ser. B 44 244-252.
• Lawley, D. N. (1956). A general method for approximating to the distribution of the likelihood ratio criteria. Biometrika 43 295-303.
• Loh, W.-Y. (1987). Calibrating confidence coefficients. J. Amer. Statist. Assoc. 82 155-162.
• McCullagh, P. (1984). Local sufficiency. Biometrika 71 233-244.
• McCullagh, P. (1987). Tensor Methods in Statistics. Chapman and Hall, London.
• McCullagh, P. and Tibshirani, R. (1990). A simple method for the adjustment of profile likelihoods. J. Roy. Statist. Soc. Ser. B 52 325-344.
• Peers, H. W. (1965). On confidence points and Bayesian probability points in the case of several parameters. J. Roy. Statist. Soc. Ser. B 27 9-16.
• Pierce, D. and Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families (with discussion). J. Roy. Statist. Soc. Ser. B 54 701-725.
• Sprott, D. A. (1980). Maximum likelihood in small samples: estimation in the presence of nuisance parameters. Biometrika 67 515-523.
• Tibshirani, R. (1988). Variance stabilization and the bootstrap. Biometrika 75 433-444.
• Tibshirani, R. (1989). Noninformative priors for one parameter of many. Biometrika 76 604-608.
• such as those described by Hall (1988). But why not replace them altogether with more informative tools? The bootstrap affords a unique opportunity for obtaining a large amount of information very simply. The process of setting confidence intervals merely picks two points off a bootstrap histogram, ignoring much relevant information about shape and other important features. "Confidence pictures" (e.g., Hall, 1992, Appendix III), essentially smoothed and transformed bootstrap histograms, are one alternative to confidence intervals. Graphics such as these provide a simple but powerful way to convey information lost in numerical summaries. The opportunities offered by dynamic graphics are also attractive, particularly when confidence information needs to be passed to a lay audience. (Consider, e.g., the need to provide information about the errors associated with predictions from opinion polls.) Bootstrap methods and new graphical ways of presenting information offer, together, exciting prospects for conveying information about uncertainty.
• jackknife-after-bootstrap plot for $t$ (Efron, 1992). The ingenious idea that underlies this is that we can get the effect of bootstrapping the reduced data set $y_1, \ldots, y_{j-1}, y_{j+1}, \ldots, y_n$ by considering only those bootstrap samples in which $y_j$ did not appear. The horizontal dotted lines are quantiles of $t^* - t$ for all 999 bootstrap replicates, while the solid lines join the corresponding quantiles for the subsets of bootstrap replicates in which each of the 20 observations did not appear. The x-axis shows empirical influence values $l_j$, which measure the effect on $t$ of putting more mass on each of the observations separately. If $\hat{F}$ represents the empirical distribution function of the data, which puts mass $n^{-1}$ on each of the observations $y_1, \ldots, y_n$, and $t(\hat{F})$ is the corresponding statistic, we can write
• of Davison and Hinkley (1996). We have developed a library of bootstrap functions in S-PLUS which facilitates this type of analysis. The library may be obtained by anonymous ftp to markov.stats.ox.ac.uk and retrieving the file pub/canty/bootlib.sh.Z.
• ABC, use bootstrap calibration directly on the crude percentile-based procedures these methods refine, and which seem currently favored in published applications of the bootstrap, as any literature search confirms. In doing so, we retain the desirable properties of these basic procedures (stability of length and endpoints, invariance under parametrization, etc.) yet improve their coverage accuracy. The price is one of great computational expense, although, as is demonstrated by Lee and Young (1995), there are approximations which can bring such bootstrap iteration within the reach of even a modest computational budget. An advantage of this solution lies in its simplicity: there is no need to explain the mechanics of the method, in the way that is done for the $BC_a$ and ABC methods in Sections 2-4 of DiCiccio and Efron's paper. Which solution is best? To answer this requires a careful analysis of what we believe the bootstrap methodology to be. Our view is that willingness to use extensive computation to extract information from a data sample, by simulation or resampling, is quite fundamental. In other words, in comparing different methods, computational expense should not be a factor. All things being equal, we naturally look for computational efficiency, but things are hardly ever equal. How do the two solutions, that provided by DiCiccio and Efron and that involving the iterated percentile bootstrap, compare? There are two concerns here, theoretical performance and empirical performance, and the two might conflict. We demonstrate by considering the simple problem of constructing a two-sided nonparametric bootstrap confidence interval for a scalar population mean.
• (1986) and Beran (1987). The calibration method of Loh (1987) corresponds to the method of Beran (1987) when applied to a bootstrap confidence interval. For the confidence interval problem the method of Hall (1986) amounts to making an additive adjustment, estimated by the bootstrap, to the endpoints of the confidence interval, while the method of Beran (1987) amounts to making an additive adjustment, again estimated by bootstrapping, to the nominal coverage level of the bootstrap interval. The method of calibration described by DiCiccio and Efron in Section 7 of their paper is a subtle variation on the latter procedure, and one which should be used with care. DiCiccio and Efron use a method in which the bootstrap is used to calibrate separately the nominal levels of the lower and upper limits of the interval, rather than the overall nominal level. Theoretical and empirical evidence which we shall present elsewhere leads to the conclusion that, all things being taken into consideration, preference should be shown to methods which adjust nominal coverage, rather than the interval endpoints. We shall therefore focus on the question of how to calibrate the nominal coverage of a bootstrap confidence interval. The major difference between the two approaches to adjusting nominal coverage is that the method as illustrated by DiCiccio and Efron is only effective in reducing coverage error of the two-sided interval to order $n^{-2}$ when the one-sided coverage-corrected interval achieves a coverage error of order $n^{-3/2}$, as is the case with the ABC interval, but not the percentile interval. The effect of bootstrap calibration on the coverage error of one-sided intervals is discussed by Hall and Martin (1988) and by Martin (1990), who show that bootstrap coverage correction produces improvements in coverage accuracy of order $n^{-1/2}$, therefore reducing coverage error from order $n^{-1/2}$ to order $n^{-1}$ for percentile intervals, but from order $n^{-1}$ to order $n^{-3/2}$ for the ABC interval.
If the one-sided corrected interval has coverage error of order $n^{-3/2}$, then separate correction of the upper and lower limits gives a two-sided interval with coverage error of order $n^{-2}$, due to the fact that the order $n^{-3/2}$ term involves an even polynomial. With the percentile interval, the coverage error, of order $n^{-1}$, of the coverage-corrected one-sided interval typically involves an odd polynomial, and terms of that order will not cancel when determining the coverage error of the two-sided interval, which remains of order $n^{-1}$. On the face of it, therefore, we should be wary of the calibration method described by DiCiccio and Efron, although the problems with it do not arise with the ABC interval.
• cedure: see also Martin (1990). Application of these methods to the intervals under consideration here allows closer examination of coverage error. Table 1 gives information on the theoretical leading terms in expansions of the coverage error of the percentile interval (denoted $I_P$), iterated percentile interval (denoted $I_{PITa}$ and $I_{PITb}$), ABC interval (denoted $I_{ABC}$) and iterated ABC interval (denoted by $I_{ABCITa}$ and $I_{ABCITb}$). Figures refer to two-sided intervals of nominal coverage 90% and are shown for the two methods of nominal coverage calibration, for each of four underlying distributions. The intervals $I_{PITa}$ and $I_{ABCITa}$ calibrate the overall nominal coverage, while the other two iterated intervals use calibration in the way discussed by DiCiccio and Efron. What is immediately obvious from the table is that the order of coverage error only tells part of the story. Compare the coefficients of $n^{-1}$ for the interval $I_{PITb}$ with the coefficients of $n^{-2}$ for the other iterated intervals. However, if we focus on those intervals that ensure a coverage error of order $n^{-2}$, it appears that the two types of iterated ABC interval are not significantly different, but that the iterated percentile interval has a leading error term consistently and significantly smaller than that of the ABC method. This same conclusion is true for any nominal coverage in the range 0.9-0.99.
• Daniels, H. E. and Young, G. A. (1991). Saddlepoint approximation for the Studentized mean, with an application to the bootstrap. Biometrika 78 169-179.
• Davison, A. C. and Hinkley, D. V. (1988). Saddlepoint approximations in resampling methods. Biometrika 75 417-431.
• Davison, A. C. and Hinkley, D. V. (1996). Bootstrap Methods and Their Application. Cambridge Univ. Press.
• Davison, A. C., Hinkley, D. V. and Worton, B. J. (1992). Bootstrap likelihoods. Biometrika 79 113-130.
• DiCiccio, T. J., Martin, M. A. and Young, G. A. (1992). Fast and accurate approximate double bootstrap confidence intervals. Biometrika 79 285-295.
• DiCiccio, T. J., Martin, M. A. and Young, G. A. (1993). Analytical approximations for iterated bootstrap confidence intervals. Statistics and Computing 2 161-171.
• Efron, B. (1992). Jackknife-after-bootstrap standard errors and influence functions (with discussion). J. Roy. Statist. Soc. Ser. B 54 83-127.
• Efron, B. and LePage, R. (1992). Introduction to bootstrap. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.) 3-10. Wiley, New York.
• Gleser, L. J. and Hwang, J. T. (1987). The nonexistence of $100(1-\alpha)$ percent confidence sets of finite expected diameter in errors-in-variables and related models. Ann. Statist. 15 1351-1362.
• Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
• Hall, P. and Jing, B.-Y. (1995). Uniform coverage bounds for confidence intervals and Berry-Esseen theorems for Edgeworth expansion. Ann. Statist. 23 363-375.
• Jensen, J. L. (1986). Similar tests and the standardized log likelihood ratio statistic. Biometrika 73 567-572.
• Lee, S. M. S. and Young, G. A. (1995). Asymptotic iterated bootstrap confidence intervals. Ann. Statist. 23 1301-1330.
• Lee, S. M. S. and Young, G. A. (1996a). Sequential iterated bootstrap confidence intervals. J. Roy. Statist. Soc. Ser. B 58 235-252.
• Lee, S. M. S. and Young, G. A. (1996b). Asymptotics and resampling methods. Computing Science and Statistics. To appear.
• Lu, K. L. and Berger, J. O. (1989a). Estimation of normal means: frequentist estimation of loss. Ann. Statist. 17 890-906.
• Lu, K. L. and Berger, J. O. (1989b). Estimated confidence for multivariate normal mean confidence set. J. Statist. Plann. Inference 23 1-20.
• Martin, M. A. (1990). On bootstrap iteration for coverage correction in confidence intervals. J. Amer. Statist. Assoc. 85 1105-1118.
• Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75 237-249.
• Owen, A. B. (1990). Empirical likelihood ratio confidence regions. Ann. Statist. 18 90-120.
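
The discussion excerpts above describe calibrating the nominal coverage level of a bootstrap interval (Loh, 1987; Beran, 1987), applied to a two-sided nonparametric percentile interval for a scalar mean. The following is a minimal sketch of that idea, assuming an equal-tailed percentile interval and a plain double bootstrap with no computational shortcuts. All function names and defaults (`n_outer`, `n_inner`, `n_boot`, `seed`) are hypothetical. It calibrates the overall nominal level, not the lower and upper limits separately, so it is not the Section 7 variant of DiCiccio and Efron.

```python
import random
import statistics


def central_percentile_interval(reps, level):
    """Equal-tailed percentile interval with nominal coverage `level`."""
    ordered = sorted(reps)
    b = len(ordered)
    k = int((1 - level) / 2 * b)  # replicates trimmed from each tail
    lo_idx = min(b - 1, max(0, k))
    hi_idx = max(lo_idx, min(b - 1, b - 1 - k))
    return ordered[lo_idx], ordered[hi_idx]


def calibrated_level(data, stat, target=0.90, n_outer=300, n_inner=200, seed=0):
    """Estimate the nominal level whose bootstrap-estimated coverage is `target`."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)  # plays the role of the true value in the bootstrap world
    scores = []
    for _ in range(n_outer):
        outer = rng.choices(data, k=n)
        inner = [stat(rng.choices(outer, k=n)) for _ in range(n_inner)]
        # u = position of theta_hat within the inner bootstrap distribution.
        u = sum(r <= theta_hat for r in inner) / n_inner
        # An equal-tailed interval of nominal level lam covers theta_hat
        # (approximately) exactly when |2u - 1| <= lam.
        scores.append(abs(2 * u - 1))
    scores.sort()
    # Order statistic chosen so that at least a fraction `target` of the
    # outer samples would have covered theta_hat (slightly conservative).
    return scores[min(n_outer - 1, int(target * n_outer))]


def calibrated_percentile_interval(data, stat, target=0.90, n_boot=2000,
                                   n_outer=300, n_inner=200, seed=0):
    """Percentile interval on the original data at the calibrated nominal level."""
    level = calibrated_level(data, stat, target, n_outer, n_inner, seed)
    rng = random.Random(seed + 1)
    n = len(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return level, central_percentile_interval(reps, level)


if __name__ == "__main__":
    # Made-up illustrative data.
    sample = [1.2, 0.4, 3.1, 0.9, 7.5, 2.2, 0.3, 1.8, 5.6, 0.7, 2.9, 1.1, 0.5, 4.4, 1.6]
    level, (lo, hi) = calibrated_percentile_interval(sample, statistics.fmean)
    print("calibrated nominal level:", level)
    print("interval:", (lo, hi))
```

The nested resampling costs `n_outer * n_inner` evaluations of the statistic, which is the computational expense the excerpts refer to. The approximations of Lee and Young (1995) mentioned there are not attempted here.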