Open Access
Binary classification with corrupted labels
Yonghoon Lee, Rina Foygel Barber
Electron. J. Statist. 16(1): 1367-1392 (2022). DOI: 10.1214/22-EJS1987

Abstract

In a binary classification problem where the goal is to fit an accurate predictor, corrupted labels in the training data set may create an additional challenge. However, in settings where likelihood maximization is poorly behaved (for example, when positive and negative labels are perfectly separable), a small fraction of corrupted labels can improve performance by ensuring robustness. In this work, we establish that in such settings corruption acts as a form of regularization, and we compute precise upper bounds on estimation error in the presence of corruptions. Our results suggest that corrupted data points are beneficial only up to a small fraction of the total sample, scaling with the square root of the sample size.
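
As an informal illustration of the separability issue mentioned in the abstract, the following minimal sketch fits a one-parameter logistic model by gradient ascent on toy data. It is not the authors' estimator or analysis; the function `fit_slope`, the toy grid, the step size, and the choice of which two labels to flip are all assumptions made here for illustration. On perfectly separable labels the fitted slope keeps growing as more steps are run (the maximum likelihood estimate does not exist), while flipping two labels makes the fitted slope settle at a finite value.

```python
import math


def fit_slope(xs, ys, steps, lr=0.5):
    """Gradient ascent on the mean logistic log-likelihood.

    Hypothetical toy model (an assumption, not from the paper):
    P(y = 1 | x) = sigmoid(w * x), with a single slope w and no intercept.
    """
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log-likelihood with respect to w.
        grad = sum(
            (y - 1.0 / (1.0 + math.exp(-w * x))) * x for x, y in zip(xs, ys)
        ) / n
        w += lr * grad
    return w


# Toy data: 20 grid points in (-1, 1); clean labels are perfectly separable.
xs = [(i - 9.5) / 10.0 for i in range(20)]
clean = [1 if x > 0 else 0 for x in xs]

# Corrupt a small fraction of labels: flip the two most extreme points.
corrupted = list(clean)
corrupted[0] = 1
corrupted[-1] = 0

for steps in (2000, 20000):
    w_clean = fit_slope(xs, clean, steps)
    w_corrupted = fit_slope(xs, corrupted, steps)
    print(f"steps={steps:6d}  clean slope={w_clean:.3f}  corrupted slope={w_corrupted:.3f}")
```

In this sketch the clean-data slope should differ noticeably between the two runs, whereas the corrupted-data slope should be essentially unchanged, which is the sense in which a few flipped labels behave like a regularizer in this toy setting.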

Funding Statement

R.F.B. was partially supported by the National Science Foundation via grants DMS-1654076 and DMS-2023109, and by the Office of Naval Research via grant N00014-20-1-2337.

Citation

Yonghoon Lee, Rina Foygel Barber. "Binary classification with corrupted labels." Electron. J. Statist. 16 (1) 1367 - 1392, 2022. https://doi.org/10.1214/22-EJS1987

Information

Received: 1 June 2021; Published: 2022
First available in Project Euclid: 2 March 2022

Digital Object Identifier: 10.1214/22-EJS1987

Subjects:
Primary: 62H30

Keywords: classification, label noise

JOURNAL ARTICLE
26 PAGES

