The Annals of Statistics

Using the bootstrap to quantify the authority of an empirical ranking

Peter Hall and Hugh Miller

Source: Ann. Statist. Volume 37, Number 6B (2009), 3929-3959.

Abstract

The bootstrap is a popular and convenient method for quantifying the authority of an empirical ordering of attributes, for example of a ranking of the performance of institutions or of the influence of genes on a response variable. In the first of these examples, the number, p, of quantities being ordered is sometimes only moderate in size; in the second it can be very large, often much greater than sample size. However, we show that in both types of problem the conventional bootstrap can produce inconsistency. Moreover, the standard n-out-of-n bootstrap estimator of the distribution of an empirical rank may not converge in the usual sense; the estimator may converge in distribution, but not in probability. Nevertheless, in many cases the bootstrap correctly identifies the support of the asymptotic distribution of ranks. In some contemporary problems, bootstrap prediction intervals for ranks are particularly long, and in this context, we also quantify the accuracy of bootstrap methods, showing that the standard bootstrap gets the order of magnitude of the interval right, but not the constant multiplier of interval length. The m-out-of-n bootstrap can improve performance and produce statistical consistency, but it requires empirical choice of m; we suggest a tuning solution to this problem. We show that in genomic examples, where it might be expected that the standard, “synchronous” bootstrap will successfully accommodate nonindependence of vector components, that approach can produce misleading results. An “independent component” bootstrap can overcome these difficulties, even in cases where components are not strictly independent.
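The resampling schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes the quantities being ranked are column means of an n-by-p data matrix, that rank 1 goes to the largest mean, and that ties are broken by column order. The function names (`ranks`, `column_means`, `bootstrap_ranks`) and parameters (`m`, `synchronous`, `n_boot`, `seed`) are hypothetical, and the paper's tuning rule for choosing m is not reproduced here.

```python
import random


def ranks(values):
    """Rank 1 = largest value; ties broken by index order (an assumption)."""
    order = sorted(range(len(values)), key=lambda j: -values[j])
    r = [0] * len(values)
    for position, j in enumerate(order, start=1):
        r[j] = position
    return r


def column_means(rows):
    """Mean of each of the p columns of a list of rows."""
    n, p = len(rows), len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(p)]


def bootstrap_ranks(data, m=None, synchronous=True, n_boot=200, seed=0):
    """Return n_boot rank vectors computed from resampled column means.

    data        : list of n rows, each a list of p numbers
    m           : resample size; m = n gives the standard n-out-of-n
                  bootstrap, m < n gives an m-out-of-n bootstrap
    synchronous : True resamples whole rows (keeps dependence between
                  components); False resamples each column separately
                  (an "independent component" style scheme)
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    m = n if m is None else m
    out = []
    for _ in range(n_boot):
        if synchronous:
            # One set of row indices shared by all components.
            sample = [data[rng.randrange(n)] for _ in range(m)]
        else:
            # Fresh, independent row indices for every component.
            cols = [[data[rng.randrange(n)][j] for _ in range(m)]
                    for j in range(p)]
            sample = [list(row) for row in zip(*cols)]
        out.append(ranks(column_means(sample)))
    return out


if __name__ == "__main__":
    # Toy data (made up for illustration): 30 observations of 4 attributes.
    gen = random.Random(1)
    true_means = [0.0, 0.1, 0.2, 1.0]
    data = [[mu + gen.gauss(0, 1) for mu in true_means] for _ in range(30)]
    for label, kwargs in [("n-out-of-n, synchronous", {}),
                          ("m-out-of-n (m=10), synchronous", {"m": 10}),
                          ("n-out-of-n, independent", {"synchronous": False})]:
        reps = bootstrap_ranks(data, **kwargs)
        first = sorted(rep[0] for rep in reps)
        # Crude 90% percentile interval for the rank of attribute 0.
        lo, hi = first[int(0.05 * len(first))], first[int(0.95 * len(first)) - 1]
        print(label, "rank interval for attribute 0:", (lo, hi))
```

Comparing the intervals across the three calls gives a rough feel for how the choice of resample size and of synchronous versus component-wise resampling changes the apparent authority of a ranking. The percentile interval used here is only one simple way to summarise the bootstrap rank distribution.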

Primary Subjects: 62G09
Secondary Subjects: 62G30
Keywords: Confidence interval; genomic data; high dimension; independent-component bootstrap; m-out-of-n bootstrap; ordering; overlap interval; prediction interval; synchronous bootstrap

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1256303532
Digital Object Identifier: doi:10.1214/09-AOS699

2009 © Institute of Mathematical Statistics