## Statistical Science

- Statist. Sci.
- Volume 11, Number 3 (1996), 189-228.

### Bootstrap confidence intervals

Thomas J. DiCiccio and Bradley Efron

**Full-text: Open access**

#### Abstract

This article surveys bootstrap methods for producing good approximate confidence intervals. The goal is to improve by an order of magnitude upon the accuracy of the standard intervals $\hat{\theta} \pm z^{(\alpha)} \hat{\sigma}$, in a way that allows routine application even to very complicated problems. Both theory and examples are used to show how this is done. The first seven sections provide a heuristic overview of four bootstrap confidence interval procedures: $BC_a$, bootstrap-$t$, ABC and calibration. Sections 8 and 9 describe the theory behind these methods, and their close connection with the likelihood-based confidence interval theory developed by Barndorff-Nielsen, Cox and Reid and others.
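
As a rough illustration of the contrast the abstract draws, the sketch below computes the standard interval $\hat{\theta} \pm z^{(\alpha)} \hat{\sigma}$ and a bootstrap-$t$ interval for a sample mean. It is not taken from the article: the function names, the sample data, the choice of the mean as the statistic, the plug-in standard error and the simple order-statistic quantile rule are all assumptions made for illustration, and the $BC_a$, ABC and calibration procedures are not shown.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def standard_interval(x, alpha=0.05):
    """Standard interval theta_hat +/- z * sigma_hat for the sample mean.

    With alpha=0.05 this is a central 90% interval.
    """
    theta_hat = mean(x)
    sigma_hat = stdev(x) / math.sqrt(len(x))  # estimated standard error
    z = NormalDist().inv_cdf(1 - alpha)
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat


def bootstrap_t_interval(x, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap-t interval for the mean (illustrative sketch only).

    Each resample yields a studentized value
    t* = (mean* - theta_hat) / se*, and the normal quantiles of the
    standard interval are replaced by empirical quantiles of t*.
    """
    rng = random.Random(seed)
    n = len(x)
    theta_hat = mean(x)
    sigma_hat = stdev(x) / math.sqrt(n)

    t_stats = []
    for _ in range(n_boot):
        resample = rng.choices(x, k=n)  # sample with replacement
        se_star = stdev(resample) / math.sqrt(n)
        if se_star == 0:
            continue  # skip degenerate resamples (all values equal)
        t_stats.append((mean(resample) - theta_hat) / se_star)

    t_stats.sort()
    last = len(t_stats) - 1
    t_low = t_stats[math.floor(alpha * last)]
    t_high = t_stats[math.ceil((1 - alpha) * last)]
    # The upper quantile of t* sets the lower endpoint, and vice versa.
    return theta_hat - t_high * sigma_hat, theta_hat - t_low * sigma_hat


if __name__ == "__main__":
    # Hypothetical right-skewed sample, used only to exercise the functions.
    sample = [0.2, 0.5, 0.9, 1.1, 1.6, 2.3, 3.8, 5.4, 9.7, 14.2]
    print("standard:   ", standard_interval(sample))
    print("bootstrap-t:", bootstrap_t_interval(sample))
```

The standard interval is symmetric about $\hat{\theta}$ by construction, whereas the bootstrap-$t$ endpoints can sit at different distances from $\hat{\theta}$ when the resampled studentized values are skewed.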

#### Article information

**Source**

Statist. Sci., Volume 11, Number 3 (1996), 189-228.

**Dates**

First available in Project Euclid: 17 September 2002

**Permanent link to this document**

https://projecteuclid.org/euclid.ss/1032280214

**Digital Object Identifier**

doi:10.1214/ss/1032280214

**Mathematical Reviews number (MathSciNet)**

MR1436647

**Zentralblatt MATH identifier**

0955.62574

**Keywords**

Bootstrap-$t$, $BC_a$ and ABC methods, calibration, second-order accuracy

#### Citation

DiCiccio, Thomas J.; Efron, Bradley. Bootstrap confidence intervals. Statist. Sci. 11 (1996), no. 3, 189--228. doi:10.1214/ss/1032280214. https://projecteuclid.org/euclid.ss/1032280214

#### References

- Babu, G. J. and Singh, K. (1983). Inference on means using the bootstrap. Ann. Statist. 11 999-1003. Mathematical Reviews (MathSciNet): MR84i:62049

Zentralblatt MATH: 0539.62043

Digital Object Identifier: doi:10.1214/aos/1176346267

Project Euclid: euclid.aos/1176346267

- Barndorff-Nielsen, O. E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70 343-365. Mathematical Reviews (MathSciNet): MR85b:62023

Zentralblatt MATH: 0532.62006

Digital Object Identifier: doi:10.1093/biomet/70.2.343

- Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73 307-322.
- Barndorff-Nielsen, O. E. (1994). Adjusted versions of profile likelihood and likelihood roots, and extended likelihood. J. Roy. Statist. Soc. Ser. B 56 125-140.
- Barndorff-Nielsen, O. E. and Chamberlin, S. R. (1994). Stable and invariant adjusted directed likelihoods. Biometrika 81 485-499. Mathematical Reviews (MathSciNet): MR96c:62009

Zentralblatt MATH: 0812.62030

Digital Object Identifier: doi:10.1093/biomet/81.3.485

- Beran, R. (1987). Prepivoting to reduce level error of confidence sets. Biometrika 74 457-468. Mathematical Reviews (MathSciNet): MR89h:62049

Zentralblatt MATH: 0663.62045

Digital Object Identifier: doi:10.1093/biomet/74.3.457

- Bickel, P. J. (1987). Comment on "Better bootstrap confidence intervals" by B. Efron. J. Amer. Statist. Assoc. 82 191. Mathematical Reviews (MathSciNet): MR883345

Zentralblatt MATH: 0622.62039

Digital Object Identifier: doi:10.2307/2289144

- Bickel, P. J. (1988). Discussion of "Theoretical comparison of bootstrap confidence intervals" by P. Hall. Ann. Statist. 16 959-961. Mathematical Reviews (MathSciNet): MR959185

Zentralblatt MATH: 0663.62046

Digital Object Identifier: doi:10.1214/aos/1176350933

Project Euclid: euclid.aos/1176350933

- Cox, D. R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference. J. Roy. Statist. Soc. Ser. B 49 1-39.
- Cox, D. R. and Reid, N. (1993). A note on the calculation of adjusted profile likelihood. J. Roy. Statist. Soc. Ser. B 55 467-472.
- DiCiccio, T. J. (1984). On parameter transformations and interval estimation. Biometrika 71 477-485. Mathematical Reviews (MathSciNet): MR86e:62044

Zentralblatt MATH: 0566.62022

Digital Object Identifier: doi:10.1093/biomet/71.3.477

- DiCiccio, T. J. and Efron, B. (1992). More accurate confidence intervals in exponential families. Biometrika 79 231-245. Mathematical Reviews (MathSciNet): MR93k:62070

Zentralblatt MATH: 0752.62027

Digital Object Identifier: doi:10.1093/biomet/79.2.231

- DiCiccio, T. J. and Martin, M. (1993). Simple modifications for signed roots of likelihood ratio statistics. J. Roy. Statist. Soc. Ser. B 55 305-316.
- DiCiccio, T. J. and Romano, J. P. (1995). On bootstrap procedures for second-order accurate confidence limits in parametric models. Statist. Sinica 5 141-160.
- Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1-26. Mathematical Reviews (MathSciNet): MR80b:62021

Zentralblatt MATH: 0406.62024

Digital Object Identifier: doi:10.1214/aos/1176344552

Project Euclid: euclid.aos/1176344552

- Efron, B. (1981). Nonparametric estimates of standard error: the jackknife, the bootstrap, and other methods. Biometrika 68 589-599. Mathematical Reviews (MathSciNet): MR83a:62089

Zentralblatt MATH: 0487.62031

Digital Object Identifier: doi:10.1093/biomet/68.3.589

- Efron, B. (1987). Better bootstrap confidence intervals (with discussion). J. Amer. Statist. Assoc. 82 171-200. Mathematical Reviews (MathSciNet): MR88m:62053

Zentralblatt MATH: 0622.62039

Digital Object Identifier: doi:10.2307/2289144

- Efron, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika 80 3-26. Mathematical Reviews (MathSciNet): MR94m:62082

Zentralblatt MATH: 0773.62021

Digital Object Identifier: doi:10.1093/biomet/80.1.3

- Efron, B. (1994). Missing data, imputation, and the bootstrap (with comment and rejoinder). J. Amer. Statist. Assoc. 89 463-478. Mathematical Reviews (MathSciNet): MR95g:62091

Zentralblatt MATH: 0806.62033

Digital Object Identifier: doi:10.2307/2290846

- Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York.
- Hall, P. (1986). On the bootstrap and confidence intervals. Ann. Statist. 14 1431-1452. Zentralblatt MATH: 0611.62047

Mathematical Reviews (MathSciNet): MR868310

Digital Object Identifier: doi:10.1214/aos/1176350168

Project Euclid: euclid.aos/1176350168

- Hall, P. (1988). Theoretical comparison of bootstrap confidence intervals (with discussion). Ann. Statist. 16 927-985. Mathematical Reviews (MathSciNet): MR959185

Zentralblatt MATH: 0663.62046

Digital Object Identifier: doi:10.1214/aos/1176350933

Project Euclid: euclid.aos/1176350933

- Hall, P. and Martin, M. A. (1988). On bootstrap resampling and iteration. Biometrika 75 661-671. Zentralblatt MATH: 0659.62053

Mathematical Reviews (MathSciNet): MR995110

Digital Object Identifier: doi:10.1093/biomet/75.4.661

- Hougaard, P. (1982). Parametrizations of non-linear models. J. Roy. Statist. Soc. Ser. B 44 244-252.
- Lawley, D. N. (1956). A general method for approximating to the distribution of the likelihood ratio criteria. Biometrika 43 295-303.
- Loh, W.-Y. (1987). Calibrating confidence coefficients. J. Amer. Statist. Assoc. 82 155-162. Mathematical Reviews (MathSciNet): MR88e:62082

Zentralblatt MATH: 0608.62057

Digital Object Identifier: doi:10.2307/2289142

- McCullagh, P. (1984). Local sufficiency. Biometrika 71 233-244. Mathematical Reviews (MathSciNet): MR86f:62015

Zentralblatt MATH: 0573.62026

Digital Object Identifier: doi:10.1093/biomet/71.2.233

- McCullagh, P. (1987). Tensor Methods in Statistics. Chapman and Hall, London.
- McCullagh, P. and Tibshirani, R. (1990). A simple method for the adjustment of profile likelihoods. J. Roy. Statist. Soc. Ser. B 52 325-344.
- Peers, H. W. (1965). On confidence points and Bayesian probability points in the case of several parameters. J. Roy. Statist. Soc. Ser. B 27 9-16.
- Pierce, D. and Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families (with discussion). J. Roy. Statist. Soc. Ser. B 54 701-725.
- Sprott, D. A. (1980). Maximum likelihood in small samples: estimation in the presence of nuisance parameters. Biometrika 67 515-523. Mathematical Reviews (MathSciNet): MR82e:62052

Zentralblatt MATH: 0447.62040

Digital Object Identifier: doi:10.1093/biomet/67.3.515

- Tibshirani, R. (1988). Variance stabilization and the bootstrap. Biometrika 75 433-444. Mathematical Reviews (MathSciNet): MR90f:62144

Zentralblatt MATH: 0651.62039

Digital Object Identifier: doi:10.1093/biomet/75.3.433

- Tibshirani, R. (1989). Noninformative priors for one parameter of many. Biometrika 76 604-608. Mathematical Reviews (MathSciNet): MR1040654

Zentralblatt MATH: 0678.62010

Digital Object Identifier: doi:10.1093/biomet/76.3.604

- such as those described by Hall (1988). But why not replace them altogether with more informative tools? The bootstrap affords a unique opportunity for obtaining a large amount of information very simply. The process of setting confidence intervals merely picks two points off a bootstrap histogram, ignoring much relevant information about shape and other important features. "Confidence pictures" (e.g., Hall, 1992, Appendix III), essentially smoothed and transformed bootstrap histograms, are one alternative to confidence intervals. Graphics such as these provide a simple but powerful way to convey information lost in numerical summaries. The opportunities offered by dynamic graphics are also attractive, particularly when confidence information needs to be passed to a lay audience. (Consider, e.g., the need to provide information about the errors associated with predictions from opinion polls.) Bootstrap methods and new graphical ways of presenting information offer, together, exciting prospects for conveying information about uncertainty.
- jackknife-after-bootstrap plot for $t$ (Efron, 1992). The ingenious idea that underlies this is that we can get the effect of bootstrapping the reduced data set $y_1, \ldots, y_{j-1}, y_{j+1}, \ldots, y_n$ by considering only those bootstrap samples in which $y_j$ did not appear. The horizontal dotted lines are quantiles of $t^* - t$ for all 999 bootstrap replicates, while the solid lines join the corresponding quantiles for the subsets of bootstrap replicates in which each of the 20 observations did not appear. The x-axis shows empirical influence values $l_j$, which measure the effect on $t$ of putting more mass on each of the observations separately. If $F$ represents the empirical distribution function of the data, which puts mass $n^{-1}$ on each of the observations $y_1, \ldots, y_n$, and $t(F)$ is the corresponding statistic, we can write
- of Davison and Hinkley (1996). We have developed a library of bootstrap functions in S-PLUS which facilitates this type of analysis. The library may be obtained by anonymous ftp to markov.stats.ox.ac.uk and retrieving the file pub/canty/bootlib.sh.Z.
- ABC, use bootstrap calibration directly on the crude percentile-based procedures these methods refine, and which seem currently favored in published applications of the bootstrap, as any literature search confirms. In doing so, we retain the desirable properties of these basic procedures (stability of length and endpoints, invariance under parametrization, etc.) yet improve their coverage accuracy. The price is one of great computational expense, although, as is demonstrated by Lee and Young (1995), there are approximations which can bring such bootstrap iteration within the reach of even a modest computational budget. An advantage of this solution lies in its simplicity: there is no need to explain the mechanics of the method, in the way that is done for the $BC_a$ and ABC methods in Sections 2-4 of DiCiccio and Efron's paper. Which solution is best? To answer this requires a careful analysis of what we believe the bootstrap methodology to be. Our view is that willingness to use extensive computation to extract information from a data sample, by simulation or resampling, is quite fundamental. In other words, in comparing different methods, computational expense should not be a factor. All things being equal, we naturally look for computational efficiency, but things are hardly ever equal. How do the two solutions, that provided by DiCiccio and Efron and that involving the iterated percentile bootstrap, compare? There are two concerns here, theoretical performance and empirical performance, and the two might conflict. We demonstrate by considering the simple problem of constructing a two-sided nonparametric bootstrap confidence interval for a scalar population mean.
- (1986) and Beran (1987). The calibration method of Loh (1987) corresponds to the method of Beran (1987) when applied to a bootstrap confidence interval. For the confidence interval problem the method of Hall (1986) amounts to making an additive adjustment, estimated by the bootstrap, to the endpoints of the confidence interval, while the method of Beran (1987) amounts to making an additive adjustment, again estimated by bootstrapping, to the nominal coverage level of the bootstrap interval. The method of calibration described by DiCiccio and Efron in Section 7 of their paper is a subtle variation on the latter procedure, and one which should be used with care. DiCiccio and Efron use a method in which the bootstrap is used to calibrate separately the nominal levels of the lower and upper limits of the interval, rather than the overall nominal level. Theoretical and empirical evidence which we shall present elsewhere leads to the conclusion that, all things being taken into consideration, preference should be shown to methods which adjust nominal coverage, rather than the interval endpoints. We shall therefore focus on the question of how to calibrate the nominal coverage of a bootstrap confidence interval. The major difference between the two approaches to adjusting nominal coverage is that the method as illustrated by DiCiccio and Efron is only effective in reducing coverage error of the two-sided interval to order $n^{-2}$ when the one-sided coverage-corrected interval achieves a coverage error of order $n^{-3/2}$, as is the case with the ABC interval, but not the percentile interval. The effect of bootstrap calibration on the coverage error of one-sided intervals is discussed by Hall and Martin (1988) and by Martin (1990), who show that bootstrap coverage correction produces improvements in coverage accuracy of order $n^{-1/2}$, therefore reducing coverage error from order $n^{-1/2}$ to order $n^{-1}$ for percentile intervals, but from order $n^{-1}$ to order $n^{-3/2}$ for the ABC interval. If the one-sided corrected interval has coverage error of order $n^{-3/2}$, then separate correction of the upper and lower limits gives a two-sided interval with coverage error of order $n^{-2}$, due to the fact that the order $n^{-3/2}$ term involves an even polynomial. With the percentile interval, the coverage error, of order $n^{-1}$, of the coverage-corrected one-sided interval typically involves an odd polynomial, and terms of that order will not cancel when determining the coverage error of the two-sided interval, which remains of order $n^{-1}$. On the face of it, therefore, we should be wary of the calibration method described by DiCiccio and Efron, although the problems with it do not arise with the ABC interval.
- procedure: see also Martin (1990). Application of these methods to the intervals under consideration here allows closer examination of coverage error. Table 1 gives information on the theoretical leading terms in expansions of the coverage error of the percentile interval (denoted $I_{P}$), iterated percentile interval (denoted $I_{PITa}$ and $I_{PITb}$), ABC interval (denoted $I_{ABC}$) and iterated ABC interval (denoted by $I_{ABCITa}$ and $I_{ABCITb}$). Figures refer to two-sided intervals of nominal coverage 90% and are shown for the two methods of nominal coverage calibration, for each of four underlying distributions. The intervals $I_{PITa}$ and $I_{ABCITa}$ calibrate the overall nominal coverage, while the other two iterated intervals use calibration in the way discussed by DiCiccio and Efron. What is immediately obvious from the table is that the order of coverage error only tells part of the story. Compare the coefficients of $n^{-1}$ for the interval $I_{PITb}$ with the coefficients of $n^{-2}$ for the other iterated intervals. However, if we focus on those intervals that ensure a coverage error of order $n^{-2}$, it appears that the two types of iterated ABC interval are not significantly different, but that the iterated percentile interval has a leading error term consistently and significantly smaller than that of the ABC method. This same conclusion is true for any nominal coverage in the range 0.9-0.99.
- Daniels, H. E. and Young, G. A. (1991). Saddlepoint approximation for the Studentized mean, with an application to the bootstrap. Biometrika 78 169-179. Mathematical Reviews (MathSciNet): MR92g:62023

Digital Object Identifier: doi:10.1093/biomet/78.1.169

- Davison, A. C. and Hinkley, D. V. (1988). Saddlepoint approximations in resampling methods. Biometrika 75 417-431. Mathematical Reviews (MathSciNet): MR90b:62039

Zentralblatt MATH: 0651.62018

Digital Object Identifier: doi:10.1093/biomet/75.3.417

- Davison, A. C. and Hinkley, D. V. (1996). Bootstrap Methods and Their Application. Cambridge Univ. Press.
- Davison, A. C., Hinkley, D. V. and Worton, B. J. (1992). Bootstrap likelihoods. Biometrika 79 113-130. Mathematical Reviews (MathSciNet): MR93c:62078

Zentralblatt MATH: 0753.62026

Digital Object Identifier: doi:10.1093/biomet/79.1.113

- DiCiccio, T. J., Martin, M. A. and Young, G. A. (1992). Fast and accurate approximate double bootstrap confidence intervals. Biometrika 79 285-295. Mathematical Reviews (MathSciNet): MR93f:62040

Zentralblatt MATH: 0751.62013

Digital Object Identifier: doi:10.1093/biomet/79.2.285

- DiCiccio, T. J., Martin, M. A. and Young, G. A. (1993). Analytical approximations for iterated bootstrap confidence intervals. Statistics and Computing 2 161-171.
- Efron, B. (1992). Jackknife-after-bootstrap standard errors and influence functions (with discussion). J. Roy. Statist. Soc. Ser. B 54 83-127.
- Efron, B. and LePage, R. (1992). Introduction to bootstrap. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.) 3-10. Wiley, New York.
- Gleser, L. J. and Hwang, J. T. (1987). The nonexistence of $100(1-\alpha)\%$ confidence sets of finite expected diameter in errors-in-variables and related models. Ann. Statist. 15 1351-1362. Mathematical Reviews (MathSciNet): MR88k:62058

Zentralblatt MATH: 0638.62035

Digital Object Identifier: doi:10.1214/aos/1176350597

Project Euclid: euclid.aos/1176350597

- Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York. Mathematical Reviews (MathSciNet): MR1145237
- Hall, P. and Jing, B-Y. (1995). Uniform coverage bounds for confidence intervals and Berry-Esseen theorems for Edgeworth expansion. Ann. Statist. 23 363-375. Mathematical Reviews (MathSciNet): MR1332571

Zentralblatt MATH: 0824.62043

Digital Object Identifier: doi:10.1214/aos/1176324525

Project Euclid: euclid.aos/1176324525

- Jensen, J. L. (1986). Similar tests and the standardized log likelihood ratio statistic. Biometrika 73 567-572. Zentralblatt MATH: 0608.62021

Mathematical Reviews (MathSciNet): MR897847

Digital Object Identifier: doi:10.1093/biomet/73.3.567

- Lee, S. M. S. and Young, G. A. (1995). Asymptotic iterated bootstrap confidence intervals. Ann. Statist. 23 1301-1330. Mathematical Reviews (MathSciNet): MR1353507

Zentralblatt MATH: 0838.62034

Digital Object Identifier: doi:10.1214/aos/1176324710

Project Euclid: euclid.aos/1176324710

- Lee, S. M. S. and Young, G. A. (1996a). Sequential iterated bootstrap confidence intervals. J. Roy. Statist. Soc. Ser. B 58 235-252.
- Lee, S. M. S. and Young, G. A. (1996b). Asymptotics and resampling methods. Computing Science and Statistics. To appear.
- Lu, K. L. and Berger, J. O. (1989a). Estimation of normal means: frequentist estimation of loss. Ann. Statist. 17 890-906.
- Lu, K. L. and Berger, J. O. (1989b). Estimated confidence for multivariate normal mean confidence set. J. Statist. Plann. Inference 23 1-20.
- Martin, M. A. (1990). On bootstrap iteration for coverage correction in confidence intervals. J. Amer. Statist. Assoc. 85 1105-1118. Zentralblatt MATH: 0736.62040

Mathematical Reviews (MathSciNet): MR1134507

Digital Object Identifier: doi:10.2307/2289608

- Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75 237-249. Mathematical Reviews (MathSciNet): MR90b:62047

Zentralblatt MATH: 0641.62032

Digital Object Identifier: doi:10.1093/biomet/75.2.237

- Owen, A. B. (1990). Empirical likelihood ratio confidence regions. Ann. Statist. 18 90-120. Mathematical Reviews (MathSciNet): MR91g:62037

Zentralblatt MATH: 0712.62040

Digital Object Identifier: doi:10.1214/aos/1176347494

Project Euclid: euclid.aos/1176347494

#### See also

- Includes: Peter Hall, Michael A. Martin. Comment.
- Includes: A. J. Canty, A. C. Davison, D. V. Hinkley. Comment.
- Includes: Leon Jay Gleser. Comment.
- Includes: Stephen M. S. Lee, G. Alastair Young. Comment.
- Includes: Thomas J. DiCiccio, Bradley Efron. Rejoinder.
