October 2024
Environment invariant linear least squares
Jianqing Fan, Cong Fang, Yihong Gu, Tong Zhang
Ann. Statist. 52(5): 2268-2292 (October 2024). DOI: 10.1214/24-AOS2435

Abstract

This paper considers a multi-environment linear regression model in which data from multiple experimental settings are collected. The joint distribution of the response variable and covariates may vary across different environments, yet the conditional expectations of the response variable, given the unknown set of important variables, are invariant. Such a statistical model is related to the problem of endogeneity, causal inference, and transfer learning. The motivation behind it is illustrated by how the goals of prediction and attribution are inherent in estimating the true parameter and the important variable set. We construct a novel environment invariant linear least squares (EILLS) objective function, a multi-environment version of linear least squares regression that leverages the above conditional expectation invariance structure and heterogeneity among different environments to determine the true parameter. Our proposed method is applicable without any additional structural knowledge and can identify the true parameter under a near-minimal identification condition related to the heterogeneity of the environments. We establish nonasymptotic $\ell_2$ error bounds on the estimation error for the EILLS estimator in the presence of spurious variables. Moreover, we further show that the $\ell_0$ penalized EILLS estimator can achieve variable selection consistency in high-dimensional regimes. These nonasymptotic results demonstrate the sample efficiency of the EILLS estimator and its capability to circumvent the curse of endogeneity in an algorithmic manner without any additional prior structural knowledge. To the best of our knowledge, this paper is the first to realize statistically efficient invariance learning in the general linear model.
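
The abstract does not reproduce the objective function, so the following is only a rough illustration of the idea it describes, not the authors' definition or algorithm. It assumes one plausible form: a pooled least squares loss plus a weight `gamma` times the squared empirical covariance between the residual and each selected covariate, computed within each environment. The equal environment weights, the function names, the value of `gamma`, the brute-force subset search, and the simulated data (one causal covariate and one spurious covariate generated from the response) are all assumptions made for this sketch.

```python
import itertools
import numpy as np


def objective(beta, envs, gamma):
    """Assumed form: pooled squared loss + gamma * invariance penalty."""
    w = 1.0 / len(envs)              # equal environment weights (assumption)
    support = beta != 0              # only selected covariates are penalized
    total = 0.0
    for X, y in envs:
        resid = y - X @ beta
        moments = X.T @ resid / len(y)   # empirical E[x_j * residual]
        total += w * (np.mean(resid ** 2) + gamma * np.sum(moments[support] ** 2))
    return total


def fit_on_support(envs, support, gamma):
    """Minimize the assumed objective over coefficients on a fixed support.

    Both terms are quadratic in the coefficients, so the minimizer is the
    solution of one stacked least squares problem.
    """
    idx = list(support)
    beta = np.zeros(envs[0][0].shape[1])
    if not idx:
        return beta
    w = 1.0 / len(envs)
    rows, targets = [], []
    for X, y in envs:
        n = len(y)
        Xs = X[:, idx]
        rows.append(np.sqrt(w / n) * Xs)                   # loss term
        targets.append(np.sqrt(w / n) * y)
        rows.append(np.sqrt(gamma * w) * (Xs.T @ Xs) / n)  # penalty term
        targets.append(np.sqrt(gamma * w) * (Xs.T @ y) / n)
    coef = np.linalg.lstsq(np.vstack(rows), np.concatenate(targets), rcond=None)[0]
    beta[idx] = coef
    return beta


def fit(envs, gamma):
    """Brute-force search over supports; only feasible for a few covariates."""
    p = envs[0][0].shape[1]
    best = np.zeros(p)
    best_val = objective(best, envs, gamma)
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            beta = fit_on_support(envs, subset, gamma)
            val = objective(beta, envs, gamma)
            if val < best_val:
                best, best_val = beta, val
    return best


def simulate(rng, n, s):
    """x1 causes y; x2 is generated from y with environment-specific strength s."""
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)
    x2 = s * y + rng.normal(size=n)
    return np.column_stack([x1, x2]), y


rng = np.random.default_rng(0)
envs = [simulate(rng, 2000, 0.5), simulate(rng, 2000, 2.0)]

beta_hat = fit(envs, gamma=20.0)

# Ordinary least squares on the pooled data, for comparison.
X_all = np.vstack([X for X, _ in envs])
y_all = np.concatenate([y for _, y in envs])
pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print("invariance-penalized estimate:", beta_hat)
print("pooled least squares estimate:", pooled)
```

In this toy setup the relationship between the spurious covariate and the response changes across the two environments, so no single coefficient vector that uses it can make the residual uncorrelated with it in both. The penalty therefore favors the support containing only the causal covariate, whereas pooled least squares assigns the spurious covariate a nonzero coefficient.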

Funding Statement

J. Fan’s research is supported by the ONR Grants N00014-19-1-2120 and N00014-22-1-2340 and NSF Grants DMS-2052926, DMS-2053832 and DMS-2210833.
C. Fang’s research is supported by National Key R&D Program of China (2022ZD0160301) and NSF China (No. 62376008).

Acknowledgments

The authors would like to thank three anonymous referees, an Associate Editor and the Editor for their constructive comments that improved the quality of this paper. Y. Gu thanks Yiran Jia for pointing out a typo in the proof of Theorem A.4.

Citation

Jianqing Fan, Cong Fang, Yihong Gu, Tong Zhang. "Environment invariant linear least squares." Ann. Statist. 52 (5) 2268-2292, October 2024. https://doi.org/10.1214/24-AOS2435

Information

Received: 1 March 2023; Revised: 1 July 2024; Published: October 2024
First available in Project Euclid: 20 November 2024

Digital Object Identifier: 10.1214/24-AOS2435

Subjects:
Primary: 62J05
Secondary: 62D20

Keywords: endogeneity, heterogeneity, invariance, invariant risk minimization, least squares, multiple environments, structural causal model

Rights: Copyright © 2024 Institute of Mathematical Statistics
