The Annals of Statistics

Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling

Hidetoshi Shimodaira
Source: Ann. Statist. Volume 32, Number 6 (2004), 2616-2641.

Abstract

Approximately unbiased tests based on bootstrap probabilities are considered for the exponential family of distributions with unknown expectation parameter vector, where the null hypothesis is represented as an arbitrary-shaped region with smooth boundaries. This problem has been discussed previously in Efron and Tibshirani [Ann. Statist. 26 (1998) 1687–1718], and a corrected p-value with second-order asymptotic accuracy is calculated by the two-level bootstrap of Efron, Halloran and Holmes [Proc. Natl. Acad. Sci. U.S.A. 93 (1996) 13429–13434] based on the ABC bias correction of Efron [J. Amer. Statist. Assoc. 82 (1987) 171–185]. Our argument is an extension of their asymptotic theory, where the geometry, such as the signed distance and the curvature of the boundary, plays an important role. We give another calculation of the corrected p-value without finding the “nearest point” on the boundary to the observation, which is required in the two-level bootstrap and is an implementational burden in complicated problems. The key idea is to alter the sample size of the replicated dataset from that of the observed dataset. The frequency of the replicates falling in the region is counted for several sample sizes, and then the p-value is calculated by looking at the change in the frequencies along the changing sample sizes. This is the multiscale bootstrap of Shimodaira [Systematic Biology 51 (2002) 492–508], which is third-order accurate for the multivariate normal model. Here we introduce a newly devised multistep-multiscale bootstrap, calculating a third-order accurate p-value for the exponential family of distributions. In fact, our p-value is asymptotically equivalent to those obtained by the double bootstrap of Hall [The Bootstrap and Edgeworth Expansion (1992) Springer, New York] and the modified signed likelihood ratio of Barndorff-Nielsen [Biometrika 73 (1986) 307–322] ignoring $O(n^{-3/2})$ terms, yet the computation is less demanding and free from model specification. The algorithm is remarkably simple despite the complexity of the theory behind it. The differences of the p-values are illustrated in simple examples, and the accuracies of the bootstrap methods are shown in a systematic way.
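
The "key idea" in the abstract can be illustrated with a minimal sketch of the earlier single-step multiscale bootstrap (Shimodaira 2002) for the multivariate normal model; it is not the multistep-multiscale procedure introduced in this paper. The sketch rests on several assumptions that are not stated in the abstract: the observation y is treated as N(mu, I), so a replicate of sample size n' is drawn as N(y, sigma^2 I) with sigma^2 = n/n'; the scaled bootstrap z-value is modeled as sigma * Phi^{-1}(1 - BP) = v + c * sigma^2, with v playing the role of the signed distance and c the curvature term; and the corrected p-value is taken as 1 - Phi(v - c). The function name, its parameters, and the use of an unweighted least-squares fit are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import norm


def multiscale_bootstrap_pvalue(y, in_region, scales, n_boot=20000, seed=0):
    """Sketch of a single-step multiscale bootstrap for the normal model.

    y         : observed vector, assumed distributed as N(mu, I)
    in_region : function mapping an (n_boot, dim) array to a boolean array
                that is True where a replicate falls in the region
    scales    : values of sigma^2 = n / n' (ratio of sample sizes)
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    sigma2 = np.asarray(scales, dtype=float)
    bps = np.empty_like(sigma2)
    psi = np.empty_like(sigma2)

    for i, s2 in enumerate(sigma2):
        # Changing the replicate sample size rescales the replicate variance.
        reps = y + np.sqrt(s2) * rng.standard_normal((n_boot, y.size))
        bp = np.mean(in_region(reps))
        # Keep the frequency away from 0 and 1 so the normal quantile is finite.
        bp = np.clip(bp, 0.5 / n_boot, 1.0 - 0.5 / n_boot)
        bps[i] = bp
        psi[i] = np.sqrt(s2) * norm.ppf(1.0 - bp)

    # Unweighted least-squares fit of psi(sigma^2) = v + c * sigma^2
    # (an illustrative simplification of the fitting step).
    design = np.column_stack([np.ones_like(sigma2), sigma2])
    (v, c), *_ = np.linalg.lstsq(design, psi, rcond=None)

    return {
        "bp_by_scale": bps,                # frequencies at each scale
        "v": float(v),                     # fitted signed-distance term
        "c": float(c),                     # fitted curvature term
        "au": float(1.0 - norm.cdf(v - c)),  # corrected p-value under the model
    }
```

For a half-space region such as `lambda x: x[:, 0] <= 0` the boundary is flat, so the fitted curvature term should be close to zero and the corrected p-value should be close to 1 - Phi(d), where d is the distance from y to the boundary. Curved regions are where the fitted c shifts the p-value away from the raw bootstrap frequency at sigma^2 = 1.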

Primary Subjects: 62G10
Secondary Subjects: 62G09
Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1107794881
Digital Object Identifier: doi:10.1214/009053604000000823
Mathematical Reviews number (MathSciNet): MR2153997
Zentralblatt MATH identifier: 1078.62045

References

Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73 307--322.
Mathematical Reviews (MathSciNet): MR855891
Zentralblatt MATH: 0605.62020
Barndorff-Nielsen, O. E. and Cox, D. R. (1994). Inference and Asymptotics. Chapman and Hall, London.
Mathematical Reviews (MathSciNet): MR1317097
Zentralblatt MATH: 0826.62004
DiCiccio, T. and Efron, B. (1992). More accurate confidence intervals in exponential families. Biometrika 79 231--245.
Mathematical Reviews (MathSciNet): MR1185126
Zentralblatt MATH: 0752.62027
Draper, N. R. and Smith, H. (1998). Applied Regression Analysis, 3rd ed. Wiley, New York.
Mathematical Reviews (MathSciNet): MR1614335
Efron, B. (1985). Bootstrap confidence intervals for a class of parametric problems. Biometrika 72 45--58.
Mathematical Reviews (MathSciNet): MR790200
Zentralblatt MATH: 0567.62025
Efron, B. (1987). Better bootstrap confidence intervals (with discussion). J. Amer. Statist. Assoc. 82 171--200.
Mathematical Reviews (MathSciNet): MR883345
Efron, B., Halloran, E. and Holmes, S. (1996). Bootstrap confidence levels for phylogenetic trees. Proc. Natl. Acad. Sci. U.S.A. 93 13429--13434.
Efron, B. and Tibshirani, R. (1998). The problem of regions. Ann. Statist. 26 1687--1718.
Mathematical Reviews (MathSciNet): MR1673274
Digital Object Identifier: doi:10.1214/aos/1024691353
Project Euclid: euclid.aos/1024691353
Zentralblatt MATH: 0954.62031
Felsenstein, J. (1985). Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39 783--791.
Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
Mathematical Reviews (MathSciNet): MR1145237
Kamimura, T., Shimodaira, H., Imoto, S., Kim, S., Tashiro, K., Kuhara, S. and Miyano, S. (2003). Multiscale bootstrap analysis of gene networks based on Bayesian networks and nonparametric regression. In Genome Informatics 2003 (M. Gribskov, M. Kanehisa, S. Miyano and T. Takagi, eds.) 350--351. Universal Academy Press, Tokyo.
Kuriki, S. and Takemura, A. (2000). Shrinkage estimation towards a closed convex set with a smooth boundary. J. Multivariate Anal. 75 79--111.
Mathematical Reviews (MathSciNet): MR1787403
Digital Object Identifier: doi:10.1006/jmva.1999.1895
Zentralblatt MATH: 0983.62033
Liu, R. Y. and Singh, K. (1997). Notions of limiting $P$ values based on data depth and bootstrap. J. Amer. Statist. Assoc. 92 266--277.
Mathematical Reviews (MathSciNet): MR1436115
McCullagh, P. (1984). Local sufficiency. Biometrika 71 233--244.
Mathematical Reviews (MathSciNet): MR767151
Zentralblatt MATH: 0573.62026
Perlman, M. D. and Wu, L. (1999). The emperor's new tests (with discussion). Statist. Sci. 14 355--381.
Mathematical Reviews (MathSciNet): MR1765215
Project Euclid: euclid.ss/1009212517
Perlman, M. D. and Wu, L. (2003). On the validity of the likelihood ratio and maximum likelihood methods. J. Statist. Plann. Inference 117 59--81.
Mathematical Reviews (MathSciNet): MR2001142
Digital Object Identifier: doi:10.1016/S0378-3758(02)00359-2
Zentralblatt MATH: 1022.62048
Severini, T. A. (2000). Likelihood Methods in Statistics. Oxford Univ. Press.
Mathematical Reviews (MathSciNet): MR1854870
Zentralblatt MATH: 0984.62002
Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Systematic Biology 51 492--508.
Shimodaira, H. (2004). Technical details of the multistep-multiscale bootstrap resampling. Research Report B-403, Dept. Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo.
Shimodaira, H. and Hasegawa, M. (2001). CONSEL: For assessing the confidence of phylogenetic tree selection. Bioinformatics 17 1246--1247.
Weyl, H. (1939). On the volume of tubes. Amer. J. Math. 61 461--472.
Mathematical Reviews (MathSciNet): MR1507388

2012 © Institute of Mathematical Statistics
