Optimal discriminant analysis in high-dimensional latent factor models
Xin Bing, Marten Wegkamp
Ann. Statist. 51(3): 1232-1257 (June 2023). DOI: 10.1214/23-AOS2289

Abstract

In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower-dimensional space, and base the classification on the resulting lower-dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower dimension to grow with the sample size and is also valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real data examples.
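
The two-step procedure described above (project, then classify) can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes a fixed number of retained PCs `k` rather than a data-driven choice, and it swaps in ordinary two-class linear discriminant analysis on the projected features. The simulated factor model, the function names (`simulate`, `fit_pc_lda`, `predict_pc_lda`), and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(n, A, delta, rng):
    """Draw (X, y) from a toy latent factor model X = A z + noise.

    Assumption for illustration: z | y ~ N(delta * y * e_1, I_K),
    with standard normal noise on every observed feature.
    """
    p, K = A.shape
    y = rng.integers(0, 2, size=n)
    Z = rng.standard_normal((n, K))
    Z[:, 0] += delta * y  # class 1 is shifted along the first latent coordinate
    X = Z @ A.T + rng.standard_normal((n, p))
    return X, y


def fit_pc_lda(X, y, k):
    """Step 1: project onto the top-k PCs. Step 2: fit two-class LDA."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered features.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    B = Vt[:k].T                  # p x k projection matrix
    Z = Xc @ B                    # n x k projected features
    mu0 = Z[y == 0].mean(axis=0)
    mu1 = Z[y == 1].mean(axis=0)
    # Pooled within-class covariance of the projections.
    R = np.vstack([Z[y == 0] - mu0, Z[y == 1] - mu1])
    Sigma = R.T @ R / (len(y) - 2)
    w = np.linalg.solve(Sigma, mu1 - mu0)
    pi1 = y.mean()
    b = -0.5 * w @ (mu0 + mu1) + np.log(pi1 / (1 - pi1))
    return mean, B, w, b


def predict_pc_lda(model, X):
    """Classify as 1 when the linear discriminant score is positive."""
    mean, B, w, b = model
    return ((X - mean) @ B @ w + b > 0).astype(int)


rng = np.random.default_rng(0)
A = rng.standard_normal((500, 3))          # p = 500 features, K = 3 factors
X_train, y_train = simulate(200, A, 3.0, rng)   # n = 200 < p
X_test, y_test = simulate(1000, A, 3.0, rng)

model = fit_pc_lda(X_train, y_train, k=3)
acc = np.mean(predict_pc_lda(model, X_test) == y_test)
print(f"test accuracy: {acc:.3f}")
```

Here the feature dimension (500) exceeds the training sample size (200), the regime the abstract refers to; the classifier only ever inverts a 3 x 3 covariance matrix because the discriminant is fitted after projection.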

Funding Statement

Wegkamp is supported in part by the National Science Foundation grants DMS 2015195 and DMS 2210557. Bing is partially supported by a discovery grant from the Natural Sciences and Engineering Research Council of Canada.

Acknowledgments

The authors would like to thank the Editor, Associate Editor and two referees for their careful reading and very constructive suggestions.

Citation

Xin Bing, Marten Wegkamp. "Optimal discriminant analysis in high-dimensional latent factor models." Ann. Statist. 51 (3) 1232-1257, June 2023. https://doi.org/10.1214/23-AOS2289

Information

Received: 1 August 2022; Revised: 1 March 2023; Published: June 2023
First available in Project Euclid: 20 August 2023

MathSciNet: MR4630947
zbMATH: 07732746
Digital Object Identifier: 10.1214/23-AOS2289

Subjects:
Primary: 62H12, 62J07

Keywords: Dimension reduction, discriminant analysis, High-dimensional classification, latent factor model, Optimal rate of convergence, principal component regression

Rights: Copyright © 2023 Institute of Mathematical Statistics
