Open Access
March 2013
Sparse least trimmed squares regression for analyzing high-dimensional large data sets
Andreas Alfons, Christophe Croux, Sarah Gelper
Ann. Appl. Stat. 7(1): 226-248 (March 2013). DOI: 10.1214/12-AOAS575

Abstract

Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an $L_{1}$ penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
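
As a rough illustration of the idea described in the abstract, the sketch below evaluates an LTS-type objective with an added $L_{1}$ penalty: the sum of the $h$ smallest squared residuals plus a penalty on the absolute coefficient values. The function name `sparse_lts_objective`, the toy data, and the `h * lam` scaling of the penalty are assumptions made for this example, not details taken from this page, and the sketch only evaluates the criterion; it does not implement the fast algorithm the paper proposes.

```python
def sparse_lts_objective(X, y, beta, h, lam):
    """Trimmed sum of squared residuals plus an L1 penalty (illustrative sketch)."""
    # Squared residuals r_i^2 = (y_i - x_i' beta)^2 for every observation.
    sq_res = [
        (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    ]
    # Trimming step: keep only the h smallest squared residuals.
    trimmed = sum(sorted(sq_res)[:h])
    # L1 penalty on the coefficients; the h * lam scaling is an assumption here.
    penalty = h * lam * sum(abs(bj) for bj in beta)
    return trimmed + penalty


# Hypothetical toy data: the last observation is a gross outlier.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 3.0, 40.0]
beta = [1.0]

full = sparse_lts_objective(X, y, beta, h=4, lam=0.5)  # no trimming
trim = sparse_lts_objective(X, y, beta, h=3, lam=0.5)  # outlier trimmed away
print(full, trim)
```

With `h` equal to the sample size the outlier dominates the criterion, whereas with `h=3` its squared residual is trimmed and only the penalty term remains for this choice of `beta`.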

Citation

Andreas Alfons, Christophe Croux, Sarah Gelper. "Sparse least trimmed squares regression for analyzing high-dimensional large data sets." Ann. Appl. Stat. 7 (1) 226-248, March 2013. https://doi.org/10.1214/12-AOAS575

Information

Published: March 2013
First available in Project Euclid: 9 April 2013

zbMATH: 06171270
MathSciNet: MR3086417
Digital Object Identifier: 10.1214/12-AOAS575

Keywords: Breakdown point, Outliers, penalized regression, robust regression, Trimming

Rights: Copyright © 2013 Institute of Mathematical Statistics

Vol. 7 • No. 1 • March 2013