Using the bootstrap to quantify the authority of an empirical ranking
Peter Hall and Hugh Miller
Source: Ann. Statist. Volume 37, Number 6B (2009), 3929-3959.
Abstract
The bootstrap is a popular and convenient method for quantifying the authority of an empirical ordering of attributes, for example of a ranking of the performance of institutions or of the influence of genes on a response variable. In the first of these examples, the number, p, of quantities being ordered is sometimes only moderate in size; in the second it can be very large, often much greater than sample size. However, we show that in both types of problem the conventional bootstrap can produce inconsistency. Moreover, the standard n-out-of-n bootstrap estimator of the distribution of an empirical rank may not converge in the usual sense; the estimator may converge in distribution, but not in probability. Nevertheless, in many cases the bootstrap correctly identifies the support of the asymptotic distribution of ranks. In some contemporary problems, bootstrap prediction intervals for ranks are particularly long, and in this context, we also quantify the accuracy of bootstrap methods, showing that the standard bootstrap gets the order of magnitude of the interval right, but not the constant multiplier of interval length. The m-out-of-n bootstrap can improve performance and produce statistical consistency, but it requires empirical choice of m; we suggest a tuning solution to this problem. We show that in genomic examples, where it might be expected that the standard, “synchronous” bootstrap will successfully accommodate nonindependence of vector components, that approach can produce misleading results. An “independent component” bootstrap can overcome these difficulties, even in cases where components are not strictly independent.
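The resampling schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the data form an n-by-p table (n observations of p attributes), that attributes are ranked by their sample means with rank 1 for the largest, and that intervals are read off as rough percentiles of the bootstrap ranks. The function names `ranks_desc`, `bootstrap_ranks` and `rank_interval` and all parameter names are hypothetical. Setting `m` equal to n gives the standard n-out-of-n bootstrap, a smaller `m` gives an m-out-of-n bootstrap, and `independent=True` resamples each component separately rather than resampling whole rows (the "synchronous" scheme). The empirical choice of m discussed in the abstract is not attempted here.

```python
import random
from statistics import mean


def ranks_desc(values):
    """Return ranks with 1 = largest value; ties are broken by index order."""
    order = sorted(range(len(values)), key=lambda j: -values[j])
    ranks = [0] * len(values)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def bootstrap_ranks(data, m=None, independent=False, n_boot=500, seed=0):
    """Bootstrap the ranks of the column means of `data` (a list of n rows).

    m           -- resample size; None means m = n (standard bootstrap).
    independent -- False: resample whole rows ("synchronous");
                   True: resample each column on its own (assumed reading
                   of the "independent component" bootstrap).
    Returns a list of n_boot rank vectors, one per resample.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    m = n if m is None else m
    draws = []
    for _ in range(n_boot):
        if independent:
            # A fresh set of row indices for every column.
            means = [
                mean(data[rng.randrange(n)][j] for _k in range(m))
                for j in range(p)
            ]
        else:
            # One set of row indices shared by all columns.
            rows = [rng.randrange(n) for _k in range(m)]
            means = [mean(data[i][j] for i in rows) for j in range(p)]
        draws.append(ranks_desc(means))
    return draws


def rank_interval(draws, j, level=0.9):
    """Rough percentile interval for the bootstrap rank of column j."""
    rs = sorted(d[j] for d in draws)
    lo = rs[int((1 - level) / 2 * (len(rs) - 1))]
    hi = rs[int((1 + level) / 2 * (len(rs) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Simulated data (an assumption for illustration): 5 attributes whose
    # true means are close together, so the empirical ranking is uncertain.
    gen = random.Random(42)
    true_means = [0.0, 0.1, 0.2, 0.3, 0.4]
    data = [[gen.gauss(mu, 1.0) for mu in true_means] for _ in range(40)]

    for label, kwargs in [
        ("n-out-of-n, synchronous", {}),
        ("m-out-of-n (m=15), synchronous", {"m": 15}),
        ("n-out-of-n, independent component", {"independent": True}),
    ]:
        draws = bootstrap_ranks(data, n_boot=300, **kwargs)
        intervals = [rank_interval(draws, j) for j in range(len(true_means))]
        print(label, intervals)
```

Comparing the printed intervals across the three schemes gives a feel for how much the choice of resample size and resampling scheme can change the apparent uncertainty of a ranking. The sketch makes no claim about which scheme is consistent in a given setting.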
Primary Subjects: 62G09
Secondary Subjects: 62G30
Keywords: Confidence interval; genomic data; high dimension; independent-component bootstrap; m-out-of-n bootstrap; ordering; overlap interval; prediction interval; synchronous bootstrap
Links and Identifiers
Permanent link to this document: http://projecteuclid.org/euclid.aos/1256303532
Digital Object Identifier: doi:10.1214/09-AOS699