Abstract
The nearest shrunken centroids classifier (NSC) is a popular high-dimensional classifier. However, it is prone to inaccurate classification when the data are heavy-tailed. In this paper, we develop a robust generalization of NSC (RNSC) that remains effective under such circumstances. By incorporating the Huber loss in both the estimation step and the calculation of the score function, we reduce the impact of heavy tails. We rigorously establish variable selection, estimation, and prediction consistency in high dimensions under weak moment conditions. Empirically, our proposal greatly outperforms NSC and many other successful classifiers when the data are heavy-tailed, while remaining comparable to NSC in the absence of heavy tails. The favorable performance of RNSC is also demonstrated in a real data example.
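To illustrate the two ingredients the abstract combines, the sketch below estimates a class center with the Huber M-estimator of location (computed by iteratively reweighted least squares) instead of the sample mean. It then shrinks the standardized difference from the overall center with the soft-thresholding step used in NSC. This is a minimal sketch under stated assumptions, not the RNSC procedure from the paper. It works on a single feature. The Huber threshold `tau` is applied to raw residuals with no scale estimate. The standardization factor `scale` (which plays the role of NSC's combined class-size and pooled-spread factor) is supplied by the caller. The function names `huber_location`, `soft_threshold`, and `shrunken_centroid` are hypothetical.

```python
import math
from statistics import median


def huber_location(x, tau=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares.

    Assumption: residuals are not rescaled, so tau is in the data's own units.
    """
    mu = median(x)  # robust starting value
    for _ in range(max_iter):
        # Huber weights: 1 for residuals within [-tau, tau], tau/|r| outside,
        # so observations far in the tails are down-weighted.
        weights = [1.0 if abs(xi - mu) <= tau else tau / abs(xi - mu) for xi in x]
        new_mu = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def soft_threshold(d, delta):
    """Soft thresholding: sign(d) * max(|d| - delta, 0), as in NSC shrinkage."""
    return math.copysign(max(abs(d) - delta, 0.0), d)


def shrunken_centroid(class_x, all_x, scale, delta, tau=1.345):
    """Hypothetical single-feature robust shrunken centroid.

    Uses Huber centers in place of sample means, standardizes the class-minus-
    overall difference by the caller-supplied `scale`, soft-thresholds it by
    `delta`, and maps the result back to the original units.
    """
    overall = huber_location(all_x, tau)
    center = huber_location(class_x, tau)
    d = (center - overall) / scale
    return overall + scale * soft_threshold(d, delta)
```

For example, `huber_location([1, 2, 3, 4, 100])` stays at 3, whereas the sample mean is pulled up to 22 by the single outlier. A class whose robust center lies within `delta` standardized units of the overall center is shrunk all the way back to the overall center, which is how NSC performs variable selection.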
Funding Statement
This project was supported in part by the grant CCF-1908969 from the U.S. National Science Foundation.
Acknowledgments
The authors thank the editor, the associate editor, and the referee, whose comments led to significant improvements of this paper.
Citation
Shaokang Ren, Qing Mai. "The robust nearest shrunken centroids classifier for high-dimensional heavy-tailed data." Electron. J. Statist. 16 (1) 3343–3384, 2022. https://doi.org/10.1214/22-EJS2022