Abstract
We investigate the high-dimensional linear regression problem in the presence of noise that is correlated with Gaussian covariates. This type of correlation, known as endogeneity in regression models, often results from unobserved variables and other factors, and it poses a significant challenge in causal inference and econometrics. When covariates are high-dimensional, it is common to assume sparsity in the true parameters and to estimate them using regularization techniques, even with endogeneity. However, when sparsity does not hold, it is not well understood how to control endogeneity and high dimensionality simultaneously. This study demonstrates that an estimator, even without regularization, can achieve consistency, that is, benign overfitting, under certain assumptions about the covariance matrix. Specifically, our results indicate that the error of this estimator converges to zero when the covariance matrices of the correlated noise and the instrumental variables meet specific conditions on their eigenvalues. We explore several extensions that relax these conditions and conduct experiments to validate our theoretical findings. As a technical contribution, we apply the convex Gaussian minimax theorem (CGMT) to our dual problem and extend the CGMT itself.
Funding Statement
T. Tsuda was supported by JSPS Grant-in-Aid for JSPS Research Fellows (23KJ0713). M. Imaizumi was supported by JSPS KAKENHI (21K11780), JST CREST (JPMJCR21D2), and JST FOREST (JPMJFR216I).
Citation
Toshiki Tsuda, Masaaki Imaizumi. "Benign overfitting of non-sparse high-dimensional linear regression with correlated noise." Electron. J. Statist. 18 (2) 4119-4197, 2024. https://doi.org/10.1214/24-EJS2297