Open Access
February 2020
Sparse high-dimensional regression: Exact scalable algorithms and phase transitions
Dimitris Bertsimas, Bart Van Parys
Ann. Statist. 48(1): 300-323 (February 2020). DOI: 10.1214/18-AOS1804

Abstract

We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve the sparse regression problem to provable optimality, in seconds, for sample sizes $n$ and numbers of regressors $p$ in the 100,000s, that is, two orders of magnitude better than the current state of the art. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory, which suggests that the difficulty of a problem increases with problem size, the sparse regression problem becomes easier as the number of samples $n$ increases: the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact faster than Lasso). For a small number of samples $n$, our approach takes longer to solve the problem, but, importantly, the optimal solution provides a statistically more relevant regressor. We argue that our exact sparse regression approach presents a superior alternative to the heuristic methods available at present.
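
To make the problem in the abstract concrete, the following toy sketch searches over supports of size $k$ by brute force. It is not the cutting plane method of the paper. The abstract does not state the formulation, so the ridge-regularized objective, the parameter name `gamma`, the function names, and the synthetic data are all assumptions made for illustration. The sketch scores each support $s$ with the kernel-form value $\tfrac{1}{2} y^\top (I_n + \gamma X_s X_s^\top)^{-1} y$, which by the Woodbury identity equals the minimum over $w$ of $\tfrac{1}{2}\|y - X_s w\|^2 + \tfrac{1}{2\gamma}\|w\|^2$.

```python
import itertools
import numpy as np

def kernel_objective(X, y, support, gamma):
    """Score a support s by 1/2 * y^T (I + gamma * X_s X_s^T)^{-1} y.

    This equals the optimal value of ridge regression restricted to the
    columns in `support` (an assumed formulation, see the lead-in).
    """
    Xs = X[:, list(support)]
    K = np.eye(len(y)) + gamma * Xs @ Xs.T
    return 0.5 * y @ np.linalg.solve(K, y)

def best_subset(X, y, k, gamma):
    """Enumerate all supports of size k and return the lowest-scoring one.

    Exhaustive search is only feasible for tiny p; it stands in for the
    exact solver purely for illustration.
    """
    p = X.shape[1]
    return min(itertools.combinations(range(p), k),
               key=lambda s: kernel_objective(X, y, s, gamma))

# Hypothetical noise-free toy data: the true signal uses columns 0 and 3.
rng = np.random.default_rng(0)
n, p, k, gamma = 20, 6, 2, 100.0
X = rng.standard_normal((n, p))
y = X[:, 0] - 2.0 * X[:, 3]

best = best_subset(X, y, k, gamma)
print(best)
```

Because the toy data are noise-free, the enumeration is expected to select columns 0 and 3. The number of candidate supports grows combinatorially in $p$, which is why an exact method that scales to the sizes quoted in the abstract cannot rely on enumeration.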

Citation

Dimitris Bertsimas, Bart Van Parys. "Sparse high-dimensional regression: Exact scalable algorithms and phase transitions." Ann. Statist. 48 (1) 300-323, February 2020. https://doi.org/10.1214/18-AOS1804

Information

Received: 1 March 2017; Revised: 1 December 2018; Published: February 2020
First available in Project Euclid: 17 February 2020

zbMATH: 07196540
MathSciNet: MR4065163
Digital Object Identifier: 10.1214/18-AOS1804

Subjects:
Primary: 62J07
Secondary: 90C10

Keywords: Best subset selection, Convex optimization, integer optimization, kernel learning, Sparse regression

Rights: Copyright © 2020 Institute of Mathematical Statistics

Vol.48 • No. 1 • February 2020