Double data piling leads to perfect classification

Woonyoung Chang; Jeongyoun Ahn; Sungkyu Jung

doi:10.1214/21-EJS1945

Electron. J. Statist. 15(2): 6382-6428 (2021). DOI: 10.1214/21-EJS1945

Abstract

Data piling refers to the phenomenon that training data vectors from each class project to a single point for classification. While this interesting phenomenon has been a key to understanding many distinctive properties of high-dimensional discrimination, the theoretical underpinning of data piling is far from properly established. In this work, high-dimensional asymptotics of data piling is investigated under a spiked covariance model, which reveals its close connection to the well-known ridged linear classifier. In particular, by projecting the ridge discriminant vector onto the subspace spanned by the leading sample principal component directions and the maximal data piling vector, we show that a negatively ridged discriminant vector can asymptotically achieve data piling of independent test data, essentially yielding a perfect classification. The second data piling direction is obtained purely from training data and shown to have a maximal property. Furthermore, asymptotic perfect classification occurs only along the second data piling direction.
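The first kind of data piling described above, on training data only, can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from this page: it uses the pseudo-inverse form of the maximal data piling direction, w proportional to pinv(S_T)(mean_0 - mean_1) with S_T the total scatter matrix, and the names `mdp_direction`, `class_spreads` and `simulate`, the Gaussian data model, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def mdp_direction(X, y, rcond=1e-10):
    """Unit vector along which the two training classes pile up.

    X: (n, d) training matrix with d > n; y: array of 0/1 labels.
    Assumed form: w proportional to pinv(S_T) @ (mean_0 - mean_1), where
    S_T is the total scatter of the globally centered data.
    """
    Xc = X - X.mean(axis=0)                      # center at the global mean
    diff = X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0)
    S_T = Xc.T @ Xc                              # (d, d), rank at most n - 1
    w = np.linalg.pinv(S_T, rcond=rcond) @ diff  # explicit cutoff for the null space
    return w / np.linalg.norm(w)


def class_spreads(X, y, w):
    """Standard deviation of the projections X @ w within each class."""
    proj = X @ w
    return proj[y == 0].std(), proj[y == 1].std()


def simulate(n_per_class=10, d=200, shift=0.5, seed=0):
    """Draw training and test sets from two mean-shifted Gaussian classes."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    mu = np.where(y[:, None] == 0, shift, -shift)  # (n, 1), broadcast over d
    X_train = rng.standard_normal((y.size, d)) + mu
    X_test = rng.standard_normal((y.size, d)) + mu
    return X_train, X_test, y


if __name__ == "__main__":
    X_train, X_test, y = simulate()
    w = mdp_direction(X_train, y)
    print("train spreads:", class_spreads(X_train, y, w))
    print("test spreads: ", class_spreads(X_test, y, w))
```

In this sketch the within-class spread of the training projections is zero up to rounding error, while independent test data remain spread out along the same direction. The second data piling direction of the abstract, which involves a negatively ridged discriminant vector projected onto a subspace of sample principal component directions, is not reproduced here.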

Funding Statement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C2002256, 2021R1A2C1093526).

Acknowledgments

We would like to thank the Editor, the Associate Editor, and the anonymous reviewers, whose comments and suggestions helped to improve and clarify our manuscript. We would also like to thank Mr. Taehyun Kim for constructive criticism of the manuscript.

Citation

Woonyoung Chang, Jeongyoun Ahn, Sungkyu Jung. "Double data piling leads to perfect classification." Electron. J. Statist. 15(2): 6382-6428, 2021. https://doi.org/10.1214/21-EJS1945