Statistical Science

Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu
Source: Statist. Sci. Volume 18, Issue 1 (2003), 104-117.

Abstract

We propose a new method for class prediction in DNA microarray studies based on an enhancement of the nearest prototype classifier. Our technique uses "shrunken" centroids as prototypes for each class to identify the subsets of genes that best characterize each class. The method is general and can be applied to other high-dimensional classification problems. The method is illustrated on data from two gene expression studies: lymphoma and cancer cell lines.
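The abstract does not spell out the algorithm. The following is a minimal sketch of one common formulation of nearest shrunken centroids, given here as an illustration rather than as the authors' implementation. It assumes that each class centroid is standardized against the overall centroid, soft-thresholded by an amount `delta`, and then used as the prototype in a nearest-centroid rule. The function names, the dictionary layout, and the choice of the median pooled standard deviation as the offset `s0` are assumptions made for this sketch.

```python
import numpy as np

def fit_shrunken_centroids(X, y, delta):
    """Sketch: X is (n_samples, n_genes), y holds class labels,
    delta is the (assumed) soft-threshold amount."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, _ = X.shape
    K = len(classes)

    overall = X.mean(axis=0)                                    # overall centroid
    centroids = np.vstack([X[y == k].mean(axis=0) for k in classes])
    counts = np.array([(y == k).sum() for k in classes])

    # Pooled within-class standard deviation for each gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - K))
    s0 = np.median(s)                                           # assumed offset choice

    # Standardized difference between each class centroid and the overall centroid.
    m = np.sqrt(1.0 / counts - 1.0 / n)
    scale = m[:, None] * (s + s0)[None, :]
    d = (centroids - overall) / scale

    # Soft thresholding: shrink toward zero by delta, setting small values to zero.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

    return {
        "classes": classes,
        "centroids": overall + scale * d_shrunk,                # shrunken prototypes
        "scale": s + s0,
        "priors": counts / n,
        "active": np.any(d_shrunk != 0, axis=0),                # genes that survive
    }

def predict(model, X):
    """Assign each sample to the class with the nearest shrunken centroid."""
    X = np.asarray(X, dtype=float)
    diff = (X[:, None, :] - model["centroids"][None, :, :]) / model["scale"]
    score = (diff ** 2).sum(axis=2) - 2.0 * np.log(model["priors"])
    return model["classes"][score.argmin(axis=1)]
```

In this sketch, a gene whose shrunken difference is zero in every class contributes equally to every class score, so it drops out of the classification. This is how the shrinkage selects a subset of genes. In practice `delta` would be chosen by cross-validation, which the sketch omits.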

Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1056397488
Digital Object Identifier: doi:10.1214/ss/1056397488
Mathematical Reviews number (MathSciNet): MR1997067
Zentralblatt MATH identifier: 02068942


2012 © Institute of Mathematical Statistics
