The Annals of Applied Statistics

A bias correction for the minimum error rate in cross-validation

Ryan J. Tibshirani and Robert Tibshirani


Abstract

Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at that same value of the tuning parameter. We propose a simple method for the estimation of this bias that uses information from the cross-validation process. As a result, it requires essentially no additional computation. We apply our bias estimate to a number of popular classifiers in various settings, and examine its performance.
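The abstract does not state the estimator itself. As a minimal sketch, assuming the form given in the paper, the bias is estimated as the average over folds of the gap between each fold's error at the overall CV minimizer and that fold's own minimum error. The function name, the input layout (a K-by-T array of per-fold error rates), and the equal-fold-size assumption below are illustrative choices, not part of the source.

```python
import numpy as np


def min_cv_error_with_bias(fold_errors):
    """Return (best index, minimum CV error, estimated downward bias).

    fold_errors[k, t] is the error on held-out fold k at tuning value t.
    Assumes equal fold sizes, so the CV curve is the plain mean over folds.
    """
    fold_errors = np.asarray(fold_errors, dtype=float)

    # Overall CV error curve and the tuning value that minimizes it.
    cv_curve = fold_errors.mean(axis=0)
    best = int(np.argmin(cv_curve))

    # Each fold's error at the overall minimizer, minus that fold's own
    # minimum error; the average of these gaps is the bias estimate.
    gaps = fold_errors[:, best] - fold_errors.min(axis=1)
    bias = float(gaps.mean())

    return best, float(cv_curve[best]), bias


# Toy per-fold error rates: 3 folds, 3 candidate tuning values (made up).
errors = [
    [0.30, 0.20, 0.25],
    [0.10, 0.20, 0.30],
    [0.25, 0.20, 0.15],
]
best, min_err, bias = min_cv_error_with_bias(errors)
corrected = min_err + bias  # bias-adjusted estimate of the test error
```

Because every quantity comes from the per-fold error curves already computed during cross-validation, the only extra work is a few array reductions, which is consistent with the abstract's claim of essentially no additional computation.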

Article information

Source
Ann. Appl. Stat. Volume 3, Number 2 (2009), 822-829.

Dates
First available in Project Euclid: 22 June 2009

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1245676196

Digital Object Identifier
doi:10.1214/08-AOAS224

Mathematical Reviews number (MathSciNet)
MR2750683

Zentralblatt MATH identifier
1166.62311

Keywords
Cross-validation; prediction error estimation; optimism estimation

Citation

Tibshirani, Ryan J.; Tibshirani, Robert. A bias correction for the minimum error rate in cross-validation. Ann. Appl. Stat. 3 (2009), no. 2, 822-829. doi:10.1214/08-AOAS224. https://projecteuclid.org/euclid.aoas/1245676196


