Statistical Science

Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu


We propose a new method for class prediction in DNA microarray studies based on an enhancement of the nearest prototype classifier. Our technique uses "shrunken" centroids as prototypes for each class to identify the subsets of genes that best characterize each class. The method is general and can be applied to other high-dimensional classification problems. The method is illustrated on data from two gene expression studies: lymphoma and cancer cell lines.
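The idea of shrinking class centroids can be illustrated with a minimal pure-Python sketch. It is a simplified illustration under stated assumptions and not the authors' implementation: each class centroid's offset from the overall centroid is standardized by a pooled within-class standard deviation, soft-thresholded by an amount `delta`, and new samples are assigned to the nearest shrunken centroid. The function names, the toy data, the scaling factor `m_k = sqrt(1/n_k - 1/n)`, and the omission of class priors are all assumptions of this sketch.

```python
import math

def soft_threshold(d, delta):
    """Move d toward zero by delta; values within delta of zero become 0."""
    return math.copysign(max(abs(d) - delta, 0.0), d)

def fit_shrunken_centroids(X, y, delta, s0=0.0):
    """X: list of samples (lists of gene values); y: class labels.
    Assumes every gene has positive pooled standard deviation (or s0 > 0)."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    means = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
             for k, rows in members.items()}
    # Pooled within-class standard deviation for each gene.
    s = []
    for i in range(p):
        ss = sum((r[i] - means[k][i]) ** 2
                 for k, rows in members.items() for r in rows)
        s.append(math.sqrt(ss / (n - len(classes))))
    offsets, centroids = {}, {}
    for k, rows in members.items():
        m_k = math.sqrt(1.0 / len(rows) - 1.0 / n)  # assumed scaling factor
        offsets[k], centroids[k] = [], []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (means[k][i] - overall[i]) / scale   # standardized offset
            d_shrunk = soft_threshold(d, delta)
            offsets[k].append(d_shrunk)
            centroids[k].append(overall[i] + scale * d_shrunk)
    return {"s": s, "s0": s0, "offsets": offsets, "centroids": centroids}

def predict(model, x):
    """Assign x to the class with the nearest shrunken centroid
    (standardized squared distance; class priors are ignored here)."""
    def dist(c):
        return sum(((xi - ci) / (si + model["s0"])) ** 2
                   for xi, ci, si in zip(x, c, model["s"]))
    return min(model["centroids"], key=lambda k: dist(model["centroids"][k]))

def active_genes(model):
    """Indices of genes whose shrunken offset is nonzero for some class."""
    p = len(model["s"])
    return [i for i in range(p)
            if any(off[i] != 0 for off in model["offsets"].values())]

# Hypothetical toy data: gene 0 separates the classes, gene 1 is mostly noise.
X = [[1.0, 5.0], [1.2, 5.4], [0.8, 4.9],
     [3.0, 5.2], [3.2, 4.9], [2.8, 5.5]]
y = ["A", "A", "A", "B", "B", "B"]

model = fit_shrunken_centroids(X, y, delta=2.0)
print(active_genes(model))          # genes that survive the shrinkage
print(predict(model, [1.1, 5.3]))   # predicted class for a new sample
```

In this sketch, a gene whose standardized offsets fall below `delta` in every class is shrunk to the overall centroid and no longer contributes to the comparison between classes, which is how the shrinkage selects a subset of genes. In practice `delta` would be chosen by cross-validation rather than fixed by hand.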

Article information

Statist. Sci., Volume 18, Issue 1 (2003), 104-117.

First available in Project Euclid: 23 June 2003

Keywords: sample classification, gene expression arrays.


Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert. Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays. Statist. Sci. 18 (2003), no. 1, 104--117. doi:10.1214/ss/1056397488.

