A convex optimization approach to high-dimensional sparse quadratic discriminant analysis

T. Tony Cai; Linjun Zhang

doi:10.1214/20-AOS2012

Abstract

In this paper, we study high-dimensional sparse Quadratic Discriminant Analysis (QDA) and aim to establish the optimal convergence rates for the classification error. Minimax lower bounds are established to show that structural assumptions, such as sparsity of the discriminating direction and of the differential graph, are necessary for constructing consistent high-dimensional QDA rules.
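
To make the two structural objects concrete, consider the two-class Gaussian model N(μ1, Σ1) versus N(μ2, Σ2) with precision matrices Ω1 = Σ1^{-1} and Ω2 = Σ2^{-1}. Assuming (our own parameterization, not taken from this page) that the differential graph is D = Ω2 − Ω1 and the discriminating direction is β = Ω2 δ with δ = μ2 − μ1, a short algebraic rewriting gives 2 log(f1(z)/f2(z)) = (z − μ1)ᵀ D (z − μ1) − 2 βᵀ(z − μ1) + βᵀδ + log|Ω1| − log|Ω2|. The NumPy sketch below implements this oracle (known-parameter) Bayes rule and checks it against the direct log-density ratio. It is not SDAR, which estimates the unknown quantities from data.

```python
import numpy as np


def gaussian_logpdf(z, mu, omega):
    """Log-density of N(mu, omega^{-1}) at z, parameterized by the precision matrix."""
    p = len(mu)
    r = z - mu
    _, logdet = np.linalg.slogdet(omega)
    return 0.5 * logdet - 0.5 * r @ omega @ r - 0.5 * p * np.log(2 * np.pi)


def qda_statistic(z, mu1, mu2, omega1, omega2):
    """2 * log(f1(z) / f2(z)) written through D and beta (assumed parameterization)."""
    delta = mu2 - mu1
    D = omega2 - omega1        # differential graph: difference of precision matrices
    beta = omega2 @ delta      # discriminating direction
    r = z - mu1
    _, logdet1 = np.linalg.slogdet(omega1)
    _, logdet2 = np.linalg.slogdet(omega2)
    return r @ D @ r - 2 * beta @ r + beta @ delta + logdet1 - logdet2


def classify(z, mu1, mu2, omega1, omega2, pi1=0.5):
    """Oracle Bayes rule: label 1 if pi1 * f1(z) > (1 - pi1) * f2(z), otherwise 2."""
    q = qda_statistic(z, mu1, mu2, omega1, omega2) + 2 * np.log(pi1 / (1 - pi1))
    return 1 if q > 0 else 2
```

Sparsity enters through D and β: if both have few nonzero entries, the quadratic and linear parts of the statistic depend on only a few coordinates of z, which is the kind of structure the lower bounds show cannot be dispensed with.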

We then propose a classification algorithm called SDAR, based on constrained convex optimization under the sparsity assumptions. Both minimax upper and lower bounds are obtained, and this classification rule is shown to be simultaneously rate optimal over a collection of parameter spaces, up to a logarithmic factor. Simulation studies demonstrate that SDAR performs well numerically. The algorithm is also illustrated through an analysis of prostate cancer data and colon tissue data. The methodology and theory developed for high-dimensional QDA for two groups in the Gaussian setting are also extended to multigroup classification and to classification under the Gaussian copula model.
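
The abstract does not spell out the SDAR objective, but the keywords name constrained ℓ1 minimization. As a hedged illustration of that class of estimator only, the sketch below solves a Dantzig-selector-type program for a sparse direction, minimize ‖β‖₁ subject to ‖Sβ − d‖∞ ≤ λ, where S (a covariance estimate), d (a mean-difference estimate) and λ (a tuning level) are hypothetical inputs. We do not claim this is the exact SDAR formulation. The program is cast as a linear program and solved with SciPy's `linprog` (HiGHS backend).

```python
import numpy as np
from scipy.optimize import linprog


def l1_constrained_direction(S, d, lam):
    """Solve min ||beta||_1 s.t. ||S @ beta - d||_inf <= lam as a linear program.

    beta is split as u - v with u, v >= 0; at the optimum sum(u + v) equals ||beta||_1.
    """
    p = S.shape[0]
    c = np.ones(2 * p)                     # objective: sum(u) + sum(v)
    A = np.hstack([S, -S])                 # maps (u, v) to S @ (u - v)
    A_ub = np.vstack([A, -A])              # S beta - d <= lam and d - S beta <= lam
    b_ub = np.concatenate([lam + d, lam - d])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError(f"LP failed: {res.message}")
    return res.x[:p] - res.x[p:]
```

With S equal to the identity the program separates by coordinate and reduces to soft thresholding of d at level λ, which is a convenient sanity check; the sup-norm constraint is what allows exact zeros in the estimate.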

Funding Statement

The first author was supported by NSF Grant DMS-1712735 and NIH Grant R01 GM-123056.
The second author was supported by NSF Grant DMS-2015378.

Acknowledgments

The authors would like to thank the anonymous referees, an Associate Editor and the editor for their constructive comments that improved the quality of this paper.

Citation

T. Tony Cai. Linjun Zhang. "A convex optimization approach to high-dimensional sparse quadratic discriminant analysis." Ann. Statist. 49 (3) 1537 - 1568, June 2021. https://doi.org/10.1214/20-AOS2012

Information

Received: 1 September 2019; Revised: 1 August 2020; Published: June 2021

First available in Project Euclid: 9 August 2021

MathSciNet: MR4298872

zbMATH: 1475.62178

Digital Object Identifier: 10.1214/20-AOS2012

Subjects:

Primary: 62H30

Secondary: 62C20, 62H12

Keywords: classification, constrained ℓ1 minimization, high-dimensional data, minimax lower bound, optimal rate of convergence, quadratic discriminant analysis