Electronic Journal of Statistics

Monitoring robust regression

Marco Riani, Andrea Cerioli, Anthony C. Atkinson, and Domenico Perrotta

Full-text: Open access

Abstract

Robust methods are little applied (although much studied by statisticians). We monitor very robust regression by looking at the behaviour of residuals and test statistics as we smoothly change the robustness of parameter estimation from a breakdown point of 50% to non-robust least squares. The resulting procedure provides insight into the structure of the data including outliers and the presence of more than one population. Monitoring overcomes the hindrances to the routine adoption of robust methods, being informative about the choice between the various robust procedures. Methods tuned to give nominal high efficiency fail with our most complicated example. We find that the most informative analyses come from S estimates combined with Tukey’s biweight or with the optimal $\rho$ functions.

For our major example with 1,949 observations and 13 explanatory variables, we combine robust S estimation with regression using the forward search, so obtaining an understanding of the importance of individual observations, which is missing from standard robust procedures. We discover that the data come from two different populations. They also contain six outliers.

Our analyses are accompanied by numerous graphs. Algebraic results are contained in two appendices, the second of which provides useful new results on the absolute odd moments of elliptically truncated multivariate normal random variables.
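The monitoring idea described in the abstract, moving from a 50% breakdown point towards least squares, can be illustrated for S estimation with Tukey's biweight. The following is a minimal sketch and not the authors' implementation. It assumes the standard normal-consistency relation in which the breakdown point equals $E[\rho(Z)]/\max\rho$ for $Z \sim N(0,1)$. The function names (`breakdown_point`, `tuning_constant`) and the grid of breakdown points are hypothetical choices made for illustration only.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def biweight_rho_scaled(u, c):
    """Tukey's biweight rho divided by its maximum c^2/6 (values in [0, 1])."""
    if abs(u) >= c:
        return 1.0
    t = (u / c) ** 2
    return 1.0 - (1.0 - t) ** 3


def breakdown_point(c, n=2000):
    """E[rho(Z)] / max(rho) for Z ~ N(0, 1).

    Simpson's rule on [-c, c] plus the tail mass P(|Z| > c),
    where the scaled rho function equals 1. n must be even.
    """
    h = 2.0 * c / n
    total = 0.0
    for i in range(n + 1):
        z = -c + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * biweight_rho_scaled(z, c) * normal_pdf(z)
    inside = total * h / 3.0
    tail = math.erfc(c / math.sqrt(2.0))
    return inside + tail


def tuning_constant(bdp, lo=0.1, hi=100.0, tol=1e-8):
    """Bisection for the constant c giving the requested breakdown point.

    Relies on breakdown_point(c) decreasing in c; the bracket [lo, hi]
    is an assumption that covers breakdown points from 0.5 down to 0.01.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if breakdown_point(mid) > bdp:
            lo = mid  # still too robust: increase c
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical monitoring grid: from very robust (0.5) towards least squares.
for bdp in (0.5, 0.4, 0.3, 0.2, 0.1, 0.01):
    print(f"bdp = {bdp:4.2f}  ->  c = {tuning_constant(bdp):.3f}")
```

At a breakdown point of 0.5 the sketch recovers the commonly tabulated constant of about 1.547. As the breakdown point decreases the constant grows, the biweight downweights fewer observations, and the fit approaches least squares. In a monitoring analysis one would refit the S estimator at each value of the grid and plot the residuals against the breakdown point.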

Article information

Source
Electron. J. Statist. Volume 8, Number 1 (2014), 646-677.

Dates
First available in Project Euclid: 20 May 2014

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1400592267

Digital Object Identifier
doi:10.1214/14-EJS897

Mathematical Reviews number (MathSciNet)
MR3211027

Zentralblatt MATH identifier
1348.62200

Subjects
Primary: 62J05: Linear regression; 62J20: Diagnostics; 62G35: Robustness
Secondary: 62P20: Applications to economics [See also 91Bxx]

Keywords
Forward search; graphical methods; least trimmed squares; outliers; regression diagnostics; rho function; S estimation; truncated normal distribution

Citation

Riani, Marco; Cerioli, Andrea; Atkinson, Anthony C.; Perrotta, Domenico. Monitoring robust regression. Electron. J. Statist. 8 (2014), no. 1, 646–677. doi:10.1214/14-EJS897. https://projecteuclid.org/euclid.ejs/1400592267



