February 2022
Canonical thresholding for nonsparse high-dimensional linear regression
Igor Silin, Jianqing Fan
Ann. Statist. 50(1): 460-486 (February 2022). DOI: 10.1214/21-AOS2116

Abstract

We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of the eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called canonical thresholding estimators, which pick the largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and principal component regression (PCR). A theoretical analysis is provided for both the fixed design and the random design settings. The bounds obtained on the mean squared error and the prediction error of a specific estimator from the family allow us to state clearly sufficient conditions on the eigenvalue decay that ensure convergence. In addition, we promote the use of relative errors, which are strongly linked to the out-of-sample R². The study of these relative errors leads to a new concept, the joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously and describes the complexity of a linear regression problem. Some minimax lower bounds are established to showcase the optimality of our procedure. Numerical simulations confirm the good performance of the proposed estimators compared with previously developed methods.
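The abstract describes the estimator only in words, so the following is a minimal sketch of one plausible reading, not the authors' definition. It assumes that the "canonical form" is the eigenbasis (λ_j, u_j) of the sample covariance XᵀX/n, that the canonical coefficients are the whitened quantities γ_j = z_j / √λ_j with z_j = u_jᵀXᵀy/n, and that selection is hard thresholding |γ_j| > τ. The paper's exact threshold rule, scaling, and tuning may differ. The function names, the threshold `tau`, and the simulation setup are hypothetical. A PCR baseline is included to show the contrast the abstract alludes to: PCR keeps the directions with the largest eigenvalues regardless of the response, while canonical thresholding keeps the directions with the largest estimated canonical coefficients. The relative prediction error is computed empirically on a test design.

```python
import numpy as np


def canonical_thresholding(X, y, tau, eig_tol=1e-10):
    """Hypothetical sketch of a canonical thresholding estimator.

    Assumed form (may differ from the paper): rotate into the eigenbasis of
    Sigma_hat = X^T X / n, hard-threshold the whitened canonical coefficients
    gamma_j = z_j / sqrt(lam_j), and map the surviving directions back.
    """
    n, p = X.shape
    # eigh returns ascending eigenvalues and orthonormal eigenvectors (columns of U)
    lam, U = np.linalg.eigh(X.T @ X / n)
    # Canonical responses: z_j = u_j^T X^T y / n
    z = U.T @ (X.T @ y) / n
    # Whitened canonical coefficients; directions with ~zero eigenvalue are skipped
    usable = lam > eig_tol
    gamma = np.zeros(p)
    gamma[usable] = z[usable] / np.sqrt(lam[usable])
    # Hard thresholding: keep only the largest canonical coefficients
    keep = usable & (np.abs(gamma) > tau)
    beta = np.zeros(p)
    beta[keep] = z[keep] / lam[keep]
    return U @ beta


def principal_component_regression(X, y, k):
    """PCR baseline: keep the k largest-eigenvalue directions, ignoring y.

    Assumes k does not exceed the rank of X (otherwise a zero eigenvalue is kept).
    """
    n, p = X.shape
    lam, U = np.linalg.eigh(X.T @ X / n)
    z = U.T @ (X.T @ y) / n
    top = np.argsort(lam)[::-1][:k]
    beta = np.zeros(p)
    beta[top] = z[top] / lam[top]
    return U @ beta


def relative_prediction_error(X_test, theta_hat, theta):
    """Empirical ||X (theta_hat - theta)||^2 / ||X theta||^2 on a test design."""
    err = X_test @ (theta_hat - theta)
    signal = X_test @ theta
    return float(err @ err / (signal @ signal))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, sigma = 100, 300, 0.1
    # Assumed setup: dense coefficients, covariance eigenvalues decaying like j^-2
    scales = np.arange(1, p + 1) ** -1.0
    X = rng.standard_normal((n, p)) * scales
    theta = rng.standard_normal(p)
    y = X @ theta + sigma * rng.standard_normal(n)
    X_test = rng.standard_normal((1000, p)) * scales
    # Hypothetical universal-style threshold; the noise in each gamma_j has sd sigma/sqrt(n)
    tau = sigma * np.sqrt(2 * np.log(p) / n)
    for name, est in [("canonical thresholding", canonical_thresholding(X, y, tau)),
                      ("PCR, k=10", principal_component_regression(X, y, 10))]:
        print(name, relative_prediction_error(X_test, est, theta))
```

On a toy diagonal design whose signal lies along the smallest-eigenvalue direction, one-component PCR returns zero, while canonical thresholding keeps that direction because its canonical coefficient is large. Replacing hard thresholding with soft thresholding of γ (shrinking the kept values toward zero by τ) would give a LASSO-type variant. This follows because the whitened canonical columns X u_j / √(nλ_j) are orthonormal, and LASSO with an orthonormal design reduces to soft thresholding. For centered data and no noise, one minus the population version of the relative prediction error equals the out-of-sample R².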

Funding Statement

The research was supported by ONR Grant N00014-19-1-2120, NSF Grant DMS-1662139 and NIH Grant 2R01-GM072611-14.

Acknowledgments

We gratefully thank the Editor, the Associate Editor and the referees for their constructive comments and valuable suggestions, which led to significant improvements in the paper.

Citation


Igor Silin. Jianqing Fan. "Canonical thresholding for nonsparse high-dimensional linear regression." Ann. Statist. 50 (1) 460 - 486, February 2022. https://doi.org/10.1214/21-AOS2116

Information

Received: 1 July 2020; Revised: 1 July 2021; Published: February 2022
First available in Project Euclid: 16 February 2022

MathSciNet: MR4382024
zbMATH: 1486.62207
Digital Object Identifier: 10.1214/21-AOS2116

Subjects:
Primary: 62J05
Secondary: 62H12, 62H25

Keywords: covariance eigenvalues decay, high-dimensional linear regression, principal component regression, relative errors, thresholding

Rights: Copyright © 2022 Institute of Mathematical Statistics
