Institute of Mathematical Statistics Collections

MCD-RoSIS – A robust procedure for variable selection

Charlotte Guddat, Ursula Gather, and Sonja Kuhnt

Full-text: Open access

Abstract

Consider the task of estimating a regression function that describes the relationship between a response and a vector of p predictors. Often only a small subset of the candidate predictors actually affects the response, while the remaining ones may hinder the analysis. Procedures for variable selection aim to identify the true predictors. Sure Independence Screening (SIS) is a method for variable selection when the dimension p of the regressor space is much larger than the sample size n: before the regression analysis is conducted, the number of predictors is reduced to a value smaller than the number of observations. Since SIS is based on nonrobust estimators, outliers in the data may lead to the elimination of true predictors. Hence, a robustified version of SIS based on robust estimators, called RoSIS, was proposed. Here, we modify RoSIS by using the minimum covariance determinant (MCD) estimator in the algorithm. The new procedure, MCD-RoSIS, leads to better results, especially under collinearity. In a simulation study we compare the performance of SIS, RoSIS and MCD-RoSIS with respect to their robustness against different types of data contamination as well as different degrees of collinearity.
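The screening idea can be illustrated with a minimal sketch. It is not the authors' implementation of SIS, RoSIS or MCD-RoSIS. It ranks each predictor by the absolute value of its marginal correlation with the response and keeps the d top-ranked predictors, with d < n. As an illustrative robust stand-in, it swaps the Pearson correlation for a Gnanadesikan and Kettenring (1972) type correlation built from median/MAD scales. The MCD estimator that MCD-RoSIS actually uses is not implemented here. The names `screen`, `gk_corr`, `mad` and `pearson` are hypothetical, as is the toy data.

```python
import statistics
from typing import Callable, List, Sequence

Corr = Callable[[Sequence[float], Sequence[float]], float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Classical (nonrobust) Pearson correlation, the ranking statistic of plain SIS."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mad(x: Sequence[float]) -> float:
    """Median absolute deviation, scaled by 1.4826 for consistency at the normal."""
    med = statistics.median(x)
    return 1.4826 * statistics.median([abs(a - med) for a in x])


def gk_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Gnanadesikan-Kettenring-type correlation with MAD scales (illustrative stand-in).

    Uses corr = (s_u^2 - s_v^2) / (s_u^2 + s_v^2), where u and v are the sum and
    difference of the robustly standardized variables and s is the MAD.
    """
    sx, sy = mad(x), mad(y)
    if sx == 0 or sy == 0:
        return 0.0  # degenerate robust scale: treat the pair as uninformative (a choice)
    mx, my = statistics.median(x), statistics.median(y)
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    su2 = mad([a + b for a, b in zip(zx, zy)]) ** 2
    sv2 = mad([a - b for a, b in zip(zx, zy)]) ** 2
    if su2 + sv2 == 0:
        return 0.0
    return (su2 - sv2) / (su2 + sv2)


def screen(columns: List[Sequence[float]], y: Sequence[float], d: int,
           corr: Corr = pearson) -> List[int]:
    """Return the indices of the d predictors with the largest |corr(x_j, y)|, d < n."""
    n = len(y)
    if not 0 < d < n:
        raise ValueError("screening must keep between 1 and n - 1 predictors")
    scores = [abs(corr(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


if __name__ == "__main__":
    # Hypothetical toy data: x0 is the true predictor, x1 is noise except at the
    # last observation, where both x1 and the response are grossly contaminated.
    x0 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    x1 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 50]
    y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(screen([x0, x1], y, d=1))                # → [1] (outlier-driven choice)
    print(screen([x0, x1], y, d=1, corr=gk_corr))  # → [0]
```

In the toy data, a single observation with a large response and a high-leverage value of the noise predictor x1 pulls the Pearson ranking toward x1, so plain screening with d = 1 drops the true predictor x0, while the median/MAD-based ranking keeps it. This only illustrates the failure mode described above; it says nothing about the relative performance of the procedures compared in the paper.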

Chapter information

Source
J. Antoch, M. Hušková and P.K. Sen, eds., Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in honor of Professor Jana Jurečková (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2010), 75–83

Dates
First available in Project Euclid: 29 November 2010

Permanent link to this document
https://projecteuclid.org/euclid.imsc/1291044744

Digital Object Identifier
doi:10.1214/10-IMSCOLL708

Mathematical Reviews number (MathSciNet)
MR2808368

Subjects
Primary: 62G35: Robustness; 62J99: None of the above, but in this section

Keywords
Variable selection; dimension reduction; regression; outliers; robust estimation

Rights
Copyright © 2010, Institute of Mathematical Statistics

Citation

Guddat, Charlotte; Gather, Ursula; Kuhnt, Sonja. MCD-RoSIS – A robust procedure for variable selection. Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in honor of Professor Jana Jurečková, 75--83, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2010. doi:10.1214/10-IMSCOLL708. https://projecteuclid.org/euclid.imsc/1291044744



References

  • [1] Bellman, R. E. (1961). Adaptive Control Processes. Princeton University Press.
  • [2] Cook, R. D. (1994). On the Interpretation of Regression Plots. J. Amer. Statist. Assoc. 89 177–189.
  • [3] Cook, R. D. (1998). Regression Graphics: Ideas for Studying Regressions Through Graphics. Wiley, New York.
  • [4] Cox, D. R. and Snell, E. J. (1974). The Choice of Variables in Observational Studies. Appl. Statist. 23 51–59.
  • [5] Davies, P. L. and Gather, U. (1993). The Identification of Multiple Outliers (with discussion and rejoinder). J. Amer. Statist. Assoc. 88 782–792.
  • [6] Fan, J. Q. and Lv, J. (2008). Sure Independence Screening for Ultrahigh Dimensional Feature Space (with discussion and rejoinder). J. Roy. Stat. Soc. B 70 849–911.
  • [7] Gather, U. and Guddat, C. (2008). Comment on “Sure Independence Screening for Ultrahigh Dimensional Feature Space” by Fan, J. Q. and Lv, J. J. Roy. Stat. Soc. B 70 893–895.
  • [8] Gnanadesikan, R. and Kettenring, J. (1972). Robust Estimates, Residuals, and Outlier Detection With Multiresponse Data. Biometrics 28 81–124.
  • [9] Guddat, C., Gather, U., and Kuhnt, S. (2010). MCD-RoSIS – A Robust Procedure for Variable Selection. Discussion Paper, SFB 823, TU Dortmund, Germany.
  • [10] Li, K.-C. (1991). Sliced Inverse Regression for Dimension Reduction (with discussion). J. Amer. Statist. Assoc. 86 316–342.
  • [11] Li, L., Cook, R. D. and Nachtsheim, C. J. (2005). Model-free Variable Selection. J. Roy. Stat. Soc. B 67 285–299.
  • [12] Maronna, R. A. and Zamar, R. H. (2002). Robust Estimates of Location and Dispersion for High-dimensional Datasets. Technometrics 44 307–317.
  • [13] R Development Core Team (2008). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing. Vienna, Austria, ISBN 3-900051-07-0, URL http://www.R-project.org.
  • [14] Rousseeuw, P. J. (1984). Least Median of Squares Regression. J. Amer. Statist. Assoc. 79 871–880.