Open Access
2022 The robust nearest shrunken centroids classifier for high-dimensional heavy-tailed data
Shaokang Ren, Qing Mai
Electron. J. Statist. 16(1): 3343-3384 (2022). DOI: 10.1214/22-EJS2022

Abstract

The nearest shrunken centroids classifier (NSC) is a popular high-dimensional classifier. However, it is prone to inaccurate classification when the data is heavy-tailed. In this paper, we develop a robust generalization of NSC (RNSC) which remains effective under such circumstances. By incorporating the Huber loss both in the estimation and the calculation of the score function, we reduce the impacts of heavy tails. We rigorously show the variable selection, estimation, and prediction consistency in high dimensions under weak moment conditions. Empirically, our proposal greatly outperforms NSC and many other successful classifiers when data is heavy-tailed while remaining comparable to NSC in the absence of heavy tails. The favorable performance of RNSC is also demonstrated in a real data example.
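
To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' estimator. It assumes a simplified setup: class centroids are computed per feature with a Huber M-estimate of location, the differences from the overall centroid are soft-thresholded, and a new point is assigned to the class with the smallest summed Huber loss. The function names, the tuning values `delta` and `lam`, and the omission of per-feature scaling and class priors are all assumptions made for illustration; the paper should be consulted for the actual RNSC definition.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def huber_mean(x, delta=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)  # robust starting value
    for _ in range(max_iter):
        a = np.abs(x - mu)
        # Weight 1 inside [-delta, delta], delta/|r| outside (downweights outliers).
        w = np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))
        new_mu = np.sum(w * x) / np.sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; entries with |z| <= lam become zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def fit_centroids(X, y, lam, delta=1.345):
    """Robust shrunken centroids (hypothetical simplification: no scaling, no priors)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = X.shape[1]
    overall = np.array([huber_mean(X[:, j], delta) for j in range(p)])
    centroids = {}
    for k in np.unique(y):
        Xk = X[y == k]
        mu_k = np.array([huber_mean(Xk[:, j], delta) for j in range(p)])
        # Features whose class centroid is close to the overall centroid are
        # shrunk onto it and no longer separate the classes.
        centroids[k] = overall + soft_threshold(mu_k - overall, lam)
    return centroids

def predict(X, centroids, delta=1.345):
    """Assign each row to the class with the smallest summed Huber loss."""
    X = np.asarray(X, dtype=float)
    labels = list(centroids)
    scores = np.stack(
        [huber_loss(X - centroids[k], delta).sum(axis=1) for k in labels], axis=1
    )
    return np.array(labels)[scores.argmin(axis=1)]
```

In this sketch, replacing the sample mean by `huber_mean` limits the influence of extreme observations on the centroids, and replacing squared distance by `huber_loss` in the score limits the influence of a single extreme coordinate on the predicted label.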

Funding Statement

This project was supported in part by the grant CCF-1908969 from the U.S. National Science Foundation.

Acknowledgments

The authors thank the editor, the associate editor, and the referee, whose comments led to significant improvements of this paper.

Citation

Shaokang Ren, Qing Mai. "The robust nearest shrunken centroids classifier for high-dimensional heavy-tailed data." Electron. J. Statist. 16 (1) 3343 - 3384, 2022. https://doi.org/10.1214/22-EJS2022

Information

Received: 1 April 2021; Published: 2022
First available in Project Euclid: 17 May 2022

MathSciNet: MR4422968
zbMATH: 1493.62399
Digital Object Identifier: 10.1214/22-EJS2022

Subjects:
Primary: 62H30
Secondary: 62J07

Keywords: heavy-tailed data, high-dimensional classification, Huber loss, nearest shrunken centroids classifier, robust estimator

Vol. 16 • No. 1 • 2022