The Annals of Statistics

Using the bootstrap to quantify the authority of an empirical ranking

Peter Hall and Hugh Miller
Source: Ann. Statist. Volume 37, Number 6B (2009), 3929-3959.

Abstract

The bootstrap is a popular and convenient method for quantifying the authority of an empirical ordering of attributes, for example of a ranking of the performance of institutions or of the influence of genes on a response variable. In the first of these examples, the number, p, of quantities being ordered is sometimes only moderate in size; in the second it can be very large, often much greater than sample size. However, we show that in both types of problem the conventional bootstrap can produce inconsistency. Moreover, the standard n-out-of-n bootstrap estimator of the distribution of an empirical rank may not converge in the usual sense; the estimator may converge in distribution, but not in probability. Nevertheless, in many cases the bootstrap correctly identifies the support of the asymptotic distribution of ranks. In some contemporary problems, bootstrap prediction intervals for ranks are particularly long, and in this context, we also quantify the accuracy of bootstrap methods, showing that the standard bootstrap gets the order of magnitude of the interval right, but not the constant multiplier of interval length. The m-out-of-n bootstrap can improve performance and produce statistical consistency, but it requires empirical choice of m; we suggest a tuning solution to this problem. We show that in genomic examples, where it might be expected that the standard, “synchronous” bootstrap will successfully accommodate nonindependence of vector components, that approach can produce misleading results. An “independent component” bootstrap can overcome these difficulties, even in cases where components are not strictly independent.
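The resampling schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the ranking is by column sample means of an n-by-p data matrix `X`, with rank 1 given to the largest mean, and the function names (`empirical_ranks`, `bootstrap_ranks`, `rank_interval`) are hypothetical. The resample size `m` is simply passed in by the caller; the tuning rule for choosing m that the abstract mentions is not reproduced here.

```python
import numpy as np

def empirical_ranks(means):
    """Rank 1 goes to the largest value (assumed convention)."""
    order = np.argsort(-means, kind="stable")
    ranks = np.empty(len(means), dtype=int)
    ranks[order] = np.arange(1, len(means) + 1)
    return ranks

def bootstrap_ranks(X, m=None, B=500, independent=False, seed=0):
    """Return a (B, p) array of bootstrap ranks of the column means.

    m=None gives the standard n-out-of-n bootstrap; m < n gives an
    m-out-of-n bootstrap. independent=False resamples whole rows
    ("synchronous"); independent=True resamples each column separately
    ("independent component").
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = n if m is None else m
    draws = np.empty((B, p), dtype=int)
    for b in range(B):
        if independent:
            # A separate set of row indices for every column.
            idx = rng.integers(0, n, size=(m, p))
            Xb = np.take_along_axis(X, idx, axis=0)
        else:
            # One set of row indices shared by all columns.
            idx = rng.integers(0, n, size=m)
            Xb = X[idx]
        draws[b] = empirical_ranks(Xb.mean(axis=0))
    return draws

def rank_interval(draws, j, level=0.9):
    """Percentile interval for the rank of component j."""
    lo, hi = np.quantile(draws[:, j], [(1 - level) / 2, (1 + level) / 2])
    return int(np.floor(lo)), int(np.ceil(hi))
```

For example, `rank_interval(bootstrap_ranks(X, m=n // 2, independent=True), j=0)` would give a percentile interval for the rank of the first component under an m-out-of-n, independent-component scheme; the choice `m = n // 2` is arbitrary and only for illustration.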

Primary Subjects: 62G09
Secondary Subjects: 62G30
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1256303532
Digital Object Identifier: doi:10.1214/09-AOS699
Zentralblatt MATH identifier: 05644261
Mathematical Reviews number (MathSciNet): MR2572448


2012 © Institute of Mathematical Statistics
