Institute of Mathematical Statistics Collections
MCD-RoSIS – A robust procedure for variable selection
Consider the task of estimating a regression function describing the relationship between a response and a vector of p predictors. Often only a small subset of all given candidate predictors actually affects the response, while the rest might inhibit the analysis. Procedures for variable selection aim to identify the true predictors. A method for variable selection when the dimension p of the regressor space is much larger than the sample size n is Sure Independence Screening (SIS), which reduces the number of predictors to a value below the number of observations before the regression analysis is conducted. As SIS is based on nonrobust estimators, outliers in the data might lead to the elimination of true predictors. Hence, a robustified version of SIS called RoSIS was proposed which is based on robust estimators. Here, we give a modification of RoSIS by using the MCD estimator in the new algorithm. The new procedure MCD-RoSIS leads to better results, especially under collinearity. In a simulation study we compare the performance of SIS, RoSIS and MCD-RoSIS with respect to their robustness against different types of data contamination as well as different degrees of collinearity.
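The screening step underlying SIS can be sketched in a few lines: rank the predictors by the magnitude of their componentwise correlation with the response and keep only the strongest d < n of them. The sketch below shows the classical (nonrobust) variant; RoSIS and MCD-RoSIS replace these Pearson correlations with robust estimates, the latter derived from the MCD estimator. The function name `sis_screen` and the simulated data are illustrative, not the authors' implementation.

```python
import numpy as np

def sis_screen(X, y, d):
    """Classical SIS screening sketch: keep the d predictors whose
    componentwise Pearson correlation with y is largest in magnitude."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each predictor
    yc = (y - y.mean()) / y.std()               # standardize the response
    omega = np.abs(Xc.T @ yc) / len(y)          # |marginal correlations|
    return np.argsort(omega)[::-1][:d]          # indices of the d strongest

# Illustrative data: n = 50 observations, p = 200 candidate predictors,
# of which only the first three actually affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
y = 3.0 * X[:, :3].sum(axis=1) + 0.5 * rng.normal(size=50)

selected = sis_screen(X, y, d=20)   # reduce p = 200 to d = 20 < n
```

With clean data the three true predictors dominate the marginal correlations and survive the screening; the point of the robust variants is that this should remain true when the data are contaminated by outliers.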
First available in Project Euclid: 29 November 2010
Copyright © 2010, Institute of Mathematical Statistics
Guddat, Charlotte; Gather, Ursula; Kuhnt, Sonja. MCD-RoSIS – A robust procedure for variable selection. Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in honor of Professor Jana Jurečková, 75--83, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2010. doi:10.1214/10-IMSCOLL708. https://projecteuclid.org/euclid.imsc/1291044744
-  Bellman, R. E. (1961). Adaptive Control Processes. Princeton University Press.
-  Cook, R. D. (1994). On the Interpretation of Regression Plots. J. Amer. Statist. Assoc., 89 177–189.
-  Cook, R. D. (1998). Regression Graphics: Ideas for Studying Regressions Through Graphics. Wiley, New York.
-  Cox, D. R. and Snell, E. J. (1974). The Choice of Variables in Observational Studies. Appl. Statist. 23 51–59.
-  Davies, P. L. and Gather, U. (1993). The Identification of Multiple Outliers (with discussion and rejoinder). J. Amer. Statist. Assoc. 88 782–792.
-  Fan, J. Q. and Lv, J. (2008). Sure Independence Screening for Ultrahigh Dimensional Feature Space (with discussion and rejoinder). J. Roy. Stat. Soc. B 70 849–911.
-  Gather, U. and Guddat, C. (2008). Comment on “Sure Independence Screening for Ultrahigh Dimensional Feature Space” by Fan, J.Q. and Lv, J. J. Roy. Stat. Soc. B 70 893–895.
-  Gnanadesikan, R. and Kettenring, J. (1972). Robust Estimates, Residuals, and Outlier Detection With Multiresponse Data. Biometrics 28 81–124.
-  Guddat, C., Gather, U., and Kuhnt, S. (2010). MCD-RoSIS - A Robust Procedure for Variable Selection. Discussion Paper, SFB 823, TU Dortmund, Germany.
-  Li, K.-C. (1991). Sliced Inverse Regression for Dimension Reduction (with discussion). J. Amer. Statist. Assoc. 86 316–342.
-  Li, L., Cook, R. D. and Nachtsheim, C. J. (2005). Model-free Variable Selection. J. Roy. Stat. Soc. B 67 285–299.
-  Maronna, R. A. and Zamar, R. H. (2002). Robust Estimates of Location and Dispersion for High-dimensional Datasets. Technometrics 44 307–317.
-  R Development Core Team (2008). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing. Vienna, Austria, ISBN 3-900051-07-0, URL http://www.R-project.org.
-  Rousseeuw, P. J. (1984). Least Median of Squares Regression. J. Amer. Statist. Assoc. 79 871–880.