Open Access
2021 Sparse regression for extreme values
Andersen Chang, Minjie Wang, Genevera I. Allen
Electron. J. Statist. 15(2): 5995-6035 (2021). DOI: 10.1214/21-EJS1937

Abstract

We study the problem of selecting features associated with extreme values in high-dimensional linear regression. Normally, in linear modeling problems, the presence of abnormal extreme values or outliers is considered an anomaly which should either be removed from the data or remedied using robust regression methods. In many situations, however, the extreme values in regression modeling are not outliers but rather the signals of interest; consider traces from spiking neurons, volatility in finance, or extreme events in climate science, for example. In this paper, we propose a new method for sparse high-dimensional linear regression for extreme values, which we call the extreme value linear regression model; it is motivated by the Subbotin, or generalized normal, distribution. For our method, we utilize an $\ell_p$ norm loss where $p$ is an even integer greater than two; we demonstrate that this loss increases the weight on extreme values. We prove consistency and variable selection consistency for the extreme value linear regression with a Lasso penalty, which we term the Extreme Lasso, and we also analyze the theoretical impact of extreme value observations on the model parameter estimates using the concept of influence functions. Through simulation studies and a real-world data example, we show that the Extreme Lasso outperforms other methods currently used in the literature for selecting features of interest associated with extreme values in high-dimensional regression.
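As a rough illustration of why an even-power loss puts more weight on extreme values, the following Python sketch compares how much of the total gradient magnitude a single large residual contributes under p = 2 and p = 4. It is not the authors' implementation: the function names, the averaged form of the loss, and the way the Lasso penalty is added in `penalized_objective` are assumptions made for this example, based only on the description in the abstract.

```python
def lp_loss(residuals, p=4):
    """Average l_p loss (1/n) * sum(r_i ** p) for an even integer p (assumed form)."""
    if p % 2 != 0 or p < 2:
        raise ValueError("p must be an even integer >= 2")
    return sum(r ** p for r in residuals) / len(residuals)


def gradient_shares(residuals, p=4):
    """Fraction of the total gradient magnitude contributed by each residual.

    d/dr (r ** p) = p * r ** (p - 1), so each observation's pull on the fit
    is proportional to |r| ** (p - 1).
    """
    mags = [abs(r) ** (p - 1) for r in residuals]
    total = sum(mags)
    return [m / total for m in mags]


def penalized_objective(X, y, beta, lam, p=4):
    """l_p loss plus a Lasso (l_1) penalty on the coefficients (hypothetical helper)."""
    residuals = [yi - sum(xij * bj for xij, bj in zip(xi, beta))
                 for xi, yi in zip(X, y)]
    return lp_loss(residuals, p) + lam * sum(abs(b) for b in beta)


# Four moderate residuals and one extreme residual (illustrative values only).
residuals = [0.5, -0.5, 1.0, -1.0, 4.0]
share_p2 = gradient_shares(residuals, p=2)[-1]
share_p4 = gradient_shares(residuals, p=4)[-1]
print(f"share of extreme residual: p=2 {share_p2:.3f}, p=4 {share_p4:.3f}")
```

With these illustrative residuals, the extreme observation accounts for a larger share of the gradient under p = 4 than under p = 2, which is the qualitative effect the abstract describes.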

Funding Statement

The authors acknowledge support from NSF DMS-1554821 and NSF NeuroNex-1707400.

Acknowledgments

We also thank Dr. Michael Weylandt for providing useful discussions.

Citation

Andersen Chang, Minjie Wang, Genevera I. Allen. "Sparse regression for extreme values." Electron. J. Statist. 15(2): 5995-6035, 2021. https://doi.org/10.1214/21-EJS1937

Information

Received: 1 June 2021; Published: 2021
First available in Project Euclid: 27 December 2021

Digital Object Identifier: 10.1214/21-EJS1937

Subjects:
Primary: 62J05, 62J07
Secondary: 62P05, 62P10

Keywords: extreme values, generalized normal distribution, linear regression, sparse modeling, Subbotin distribution
