Instability of least squares, least absolute deviation and least median of squares linear regression, with a comment by Stephen Portnoy and Ivan Mizera and a rejoinder by the author



Statistical Science


Steven P. Ellis

Source: Statist. Sci. Volume 13, Number 4 (1998), 337-350.

Abstract

Say that a regression method is "unstable" at a data set if a small change in the data can cause a relatively large change in the fitted plane. A well-known example of this is the instability of least squares regression (LS) near (multi)collinear data sets. It is known that least absolute deviation (LAD) and least median of squares (LMS) linear regression can exhibit instability at data sets that are far from collinear. Clear-cut instability occurs at a "singularity": a data set at which arbitrarily small changes to the data can substantially change the fit. For example, the collinear data sets are the singularities of LS. One way to measure the extent of instability of a regression method is to measure the size of its "singular set" (set of singularities). The dimension of the singular set is a tractable measure of its size that can be estimated without distributional assumptions or asymptotics.
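A minimal numerical sketch of LS instability near a collinear data set, assuming Python with NumPy. The toy data (three points whose x values differ by only 10^-4, so the intercept column and the x column of the design matrix are nearly proportional) and the helper name `ls_coefficients` are illustrative choices, not taken from the paper.

```python
import numpy as np

def ls_coefficients(X, y):
    """Least squares coefficients via NumPy's SVD-based least squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Toy near-collinear design: the x values differ by only eps, so the
# intercept column and the x column of X are almost proportional.
eps = 1e-4
x = np.array([1.0, 1.0 + eps, 1.0 - eps])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + x                       # the data lie exactly on y = 1 + x

beta_before = ls_coefficients(X, y)

y_perturbed = y.copy()
y_perturbed[1] += 1e-4            # change a single response by 1e-4
beta_after = ls_coefficients(X, y_perturbed)

print("condition number of X:", np.linalg.cond(X))
print("(intercept, slope) before:", beta_before)   # approximately (1, 1)
print("(intercept, slope) after: ", beta_after)    # approximately (0.5, 1.5)
```

For this design the slope changes by sum_i (x_i - xbar) dy_i / sum_i (x_i - xbar)^2 = delta / (2 eps), so a change of 10^-4 in one response moves the slope by about 0.5 while the fitted values move by at most 10^-4.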

By applying a general theorem on the dimension of singular sets, we find that the singular sets of LAD and LMS are at least as large as that of LS and often much larger. Thus, prima facie, LAD and LMS are frequently unstable. This casts doubt on the trustworthiness of LAD and LMS as exploratory regression tools.

Primary Subjects: 62J05
Keywords: Collinearity; stability; singularity


Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1028905829
Mathematical Reviews number (MathSciNet): MR1705266
Digital Object Identifier: doi:10.1214/ss/1028905829

References

Birkes, D. and Dodge, Y. (1993). Alternative Methods of Regression. Wiley, New York.
Mathematical Reviews (MathSciNet): MR94k:62098
Bloomfield, P. and Steiger, W. L. (1983). Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhäuser, Boston.
Boothby, W. M. (1975). An Introduction to Differentiable Manifolds and Riemannian Geometry. Academic Press, New York.
Mathematical Reviews (MathSciNet): MR54:13956
Davies, P. L. (1993). Aspects of robust linear regression. Ann. Statist. 21 1843-1899.
Zentralblatt MATH: 0797.62026
Dodge, Y., ed. (1987). Statistical Data Analysis Based on the L1-Norm and Related Methods. North-Holland, Amsterdam.
Mathematical Reviews (MathSciNet): MR89b:62009
Dodge, Y., ed. (1992). L1-Statistical Analysis and Related Methods. North-Holland, Amsterdam.
Mathematical Reviews (MathSciNet): MR93k:62005
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York.
Mathematical Reviews (MathSciNet): MR95h:62077
Ellis, S. P. (1991). The singularities of fitting planes to data. Ann. Statist. 19 1661-1666.
Ellis, S. P. (1995a). Dimension of the singular sets of plane-fitters. Ann. Statist. 23 490-501.
Ellis, S. P. (1995b). A note on the smoothness of L1-estimators for the linear model. Sankhyā Ser. A 57 221-226.
Ellis, S. P. (1996). On the size of singular sets of plane-fitters. Utilitas Math. 49 233-242.
Zentralblatt MATH: 0849.62031
Ellis, S. P. (1997). On the instability of least squares, least absolute deviation, and least median of squares linear regression, unpublished manuscript.
Falconer, K. (1990). Fractal Geometry: Mathematical Foundations and Applications. Wiley, New York.
Gentle, J. E. (1977). Least absolute values estimation: an introduction. Comm. Statist. B 6 313-328.
Hampel, F. R. (1975). Beyond location parameters: robust concepts and methods. Bull. Internat. Statist. Inst. 46 375-382.
Mathematical Reviews (MathSciNet): MR58:3190
Hettmansperger, T. P. and Sheather, S. J. (1992). A cautionary note on the method of least median squares. Amer. Statist. 46 79-83.
Mathematical Reviews (MathSciNet): MR93b:62092
Malone, K. M., Corbitt, E. M., Li, S. and Mann, J. J. (1996). Prolactin response to fenfluramine and suicide attempt lethality in major depression. British J. Psychiatry 168 324-329.
Mann, J. J., McBride, P. A., Malone, K. M., DeMeo, M. and Keilp, J. (1995). Blunted serotonergic responsivity in depressed inpatients. Neuropsychopharmacology 13 53-64.
Morgan, F. (1988). Geometric Measure Theory: A Beginner's Guide. Academic Press, New York.
Pine, D. S., Cohen, P. and Brook, J. (1996). Emotional problems during youth as predictors of stature during early adulthood: Results from a prospective epidemiologic study. Pediatrics 97 856-863.
Rousseeuw, P. J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871-880.
Mathematical Reviews (MathSciNet): MR86d:62113
Rousseeuw, P. J. (1994). Unconventional features of positive-breakdown estimators. Statist. Probab. Lett. 19 417-431.
Mathematical Reviews (MathSciNet): MR1278677
Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. Wiley, New York.
Mathematical Reviews (MathSciNet): MR89e:62043
valued mappings; see Rockafellar and Wets, 1998). The answer for LAD is no. Let us first view LAD as a functional M on the space of distribution functions for the data; the specific estimator is then M(F_n), where F_n is the empirical distribution of the data (generally, of both x and y for regression). If the design space is bounded, we observe the above-mentioned continuity with respect to weak convergence. The restriction on the design space is inevitable, since weak continuity is closely related to qualitative robustness, and LAD is qualitatively robust only with respect to arbitrary departures in y, not in x (see Mizera, 1998, for more details). Ellis considers LAD simply as a function of the data. In this case, the same type of continuity holds as a consequence of the continuity of the LAD functional just described. This was proved separately by Dupačová (1992), who moreover gave a modulus of continuity. Since at data sets with a unique LAD solution our continuity reduces to ordinary continuity, we may claim that LAD is continuous on exactly the same set of nonsingular designs as LS. It is important to remark that these continuity properties do not arise automatically with the introduction of the set-valued framework; the behavior of LMS, for instance, is far from that straightforward (see again Mizera, 1998). At singular points, the LAD solutions are unbounded, as are those of LS; and LAD is then continuous at these designs in the same sense as LS: any limit point of solutions for perturbed data is a solution for the original data (Rockafellar and Wets, 1998, call this outer semicontinuity). We agree that insisting on picking a unique solution from the set of LAD solutions may lead to discontinuity. However, even this does not happen if the xs are kept fixed (the setting appropriate in "designed" or "ANOVA" situations): as shown by Ellis (1995b) (see also Ellis and Morgenthaler, 1992), the centroid or Steiner point of the solution set is uniformly continuous as a function of the ys when the xs are held constant.
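The simplest fixed-x case is the intercept-only model, where the LAD criterion sum_i |y_i - m| is minimized by any median. The Python sketch below (hypothetical helper names; it illustrates only this special case and is not Ellis's 1995b argument) shows that for even n the LAD solution set is an interval, and that its centroid (the midpoint, i.e., the conventional median) moves by no more than the largest change in any y_i.

```python
import numpy as np

def lad_location_solutions(y):
    """Endpoints of the set of minimizers of sum_i |y_i - m| (intercept-only LAD).

    For odd n the set is the single sample median; for even n it is the whole
    interval between the two middle order statistics.
    """
    ys = np.sort(np.asarray(y, dtype=float))
    n = len(ys)
    if n % 2 == 1:
        mid = float(ys[n // 2])
        return mid, mid
    return float(ys[n // 2 - 1]), float(ys[n // 2])

def centroid(interval):
    """Centroid (midpoint) of a solution interval; for even n this is the usual median."""
    lo, hi = interval
    return 0.5 * (lo + hi)

y = np.array([1.0, 2.0, 3.0, 10.0])
print(lad_location_solutions(y))            # (2.0, 3.0): every m in [2, 3] is optimal
print(centroid(lad_location_solutions(y)))  # 2.5

# Each order statistic moves by at most max_i |change in y_i|, so the centroid
# does too: it is continuous (indeed 1-Lipschitz) in y. Random spot check:
rng = np.random.default_rng(0)
delta = 0.01
for _ in range(1000):
    y_new = y + rng.uniform(-delta, delta, size=y.size)
    shift = abs(centroid(lad_location_solutions(y_new)) - centroid(lad_location_solutions(y)))
    assert shift <= delta + 1e-12
```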
tile (Zhou and Portnoy, 1996, 1998). Although these are not directly applicable to confidence intervals on specific coefficients, they would yield confidence bands that would have to include fitted values for all solutions for moderately small x-values. Thus, the extremely similar confidence lines in Figure 3 seem highly suspect. It seems clear from the plot that the data come from a mixture of two rather different regression lines, and there is rather clear heteroscedasticity. It seemed likely that the LAD fit was in fact nonunique, and that the two lines in the plot (for the original data and the perturbed data) were the regression quantiles at successive breakpoints (up to perturbation). If this were so, then any confidence set for the LAD would have to cover the entire solution set (i.e., all convex combinations of both lines). Fortunately, Roger Koenker has developed regression quantile software that is available from Statlib and from his web page, http://www.econ.uiuc.edu/~roger/research. Use of the "rq" program in S-PLUS verified the guesses above and provided confidence intervals for the data of Figure 3. The two regression lines were y = 0.000005 0.49574x and y = -0.00087 + 2.08440x. The default 95% confidence bounds for the intercept and slope were (-3.988, 3.610) and (-0.699, 2.211). This last interval is much wider than that indicated in Figure 3, and clearly includes both slopes. Here one might argue that the change in the LAD estimate, although not statistically significant, is greater than that of the LS estimator. This possibly greater sensitivity, however, seems to indicate a useful sensitivity of regression quantile methods to important data features to which LS is insensitive.
That is, the nonuniqueness and wide confidence intervals might suggest the possibility of a mixture model for the conditional median, while a naive LS analysis (at least one not looking at the data) gives a slope estimate of 0.803 with an SE of 0.309, which would indicate a significant regression with a slope corresponding to neither part of the mixture. To try to clarify the discrepancy between these results and the confidence intervals Ellis presents, we would like to suggest some places where errors may have occurred. First, the error density seems to be quite small at the median, and thus estimation of the sparsity function may be especially problematic. More important, we believe Ellis used the variance estimate for the i.i.d. model. In heteroscedastic cases, the appropriate variance is given by a "sandwich": (X'DX)^{-1} X'X (X'DX)^{-1}, where D is a diagonal matrix of the density evaluated at each observation (see Portnoy, 1991b). The i.i.d. variance estimate is not even consistent in heteroscedastic cases. Koenker (1994) presents several methods for generating asymptotically legitimate confidence intervals for specific coefficients in heteroscedastic cases, and he recommends the one based on inverting a regression quantile rank test for a coefficient. This is the default in the software described above. It is interesting to note that Koenker's software permits the specification of a confidence interval estimate based on the i.i.d. model sparsity estimate. This method uses the Hall-Sheather optimal estimate (with bandwidth of order n^{-1/3}) and gives (1.335, 2.834) as an interval for the slope. Although this estimate is also invalidated by heteroscedasticity, it is still much wider than that of Ellis. Use of a sparsity estimate different from the optimal Hall-Sheather approach may also have contributed to the discrepancy. It is also legitimate to use the (x, y) bootstrap in heteroscedastic cases.
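A hedged NumPy sketch contrasting the sandwich variance described above with the i.i.d.-model variance, for a regression-quantile estimate at quantile tau (tau = 0.5 for LAD). It assumes the per-observation densities are known, whereas in practice they must be estimated, and it includes the usual scalar factor tau(1 - tau), which equals 1/4 for LAD. The bandwidth function reproduces the Hall-Sheather formula as we understand it from Koenker's quantreg software; treat its constants as an assumption to check against that documentation. All function names and the density values in the example are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def qr_sandwich_cov(X, dens, tau=0.5):
    """Sandwich covariance tau(1 - tau) (X'DX)^{-1} X'X (X'DX)^{-1}, D = diag(dens).

    `dens` holds the error density at the tau-th conditional quantile for each
    observation (assumed known here; in practice it must be estimated).
    """
    X = np.asarray(X, dtype=float)
    XtDX_inv = np.linalg.inv(X.T @ (np.asarray(dens, dtype=float)[:, None] * X))
    return tau * (1 - tau) * XtDX_inv @ (X.T @ X) @ XtDX_inv

def qr_iid_cov(X, dens0, tau=0.5):
    """i.i.d.-model covariance tau(1 - tau) / f(0)^2 * (X'X)^{-1}; 1/f(0) is the sparsity."""
    X = np.asarray(X, dtype=float)
    return tau * (1 - tau) / dens0**2 * np.linalg.inv(X.T @ X)

def hall_sheather_bandwidth(n, tau=0.5, alpha=0.05):
    """Hall-Sheather bandwidth for difference-quotient sparsity estimates (order n^{-1/3})."""
    nd = NormalDist()
    x0 = nd.inv_cdf(tau)
    f0 = nd.pdf(x0)
    z = nd.inv_cdf(1 - alpha / 2)
    return n ** (-1 / 3) * z ** (2 / 3) * (1.5 * f0**2 / (2 * x0**2 + 1)) ** (1 / 3)

# With a common density the sandwich collapses to the i.i.d. formula ...
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
print(np.allclose(qr_sandwich_cov(X, np.full(50, 0.8)), qr_iid_cov(X, 0.8)))  # True

# ... but with (hypothetical) densities that shrink as x grows, the two differ.
dens = 1.0 / (1.0 + 4.0 * x)
print("sandwich SEs:", np.sqrt(np.diag(qr_sandwich_cov(X, dens))))
print("bandwidth for n = 1000:", hall_sheather_bandwidth(1000))
```

When D = cI the sandwich reduces algebraically to (tau(1 - tau)/c^2)(X'X)^{-1}, which is why the first check prints True; any heteroscedasticity in D breaks that equality.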
The example here seems to be quite similar to that plotted in the figures in Spady (1991). The bootstrap distribution appears to have four modes: one at each of the solid and dashed lines in Figure 3 and two less obvious ones. This multimodality might suggest inaccuracies in the naive percentile method, but the percentile (x, y) bootstrap with 1,000 replications gave a 95% confidence interval for the slope of (-0.783, 2.274), corroborating the rank method. Last, it should be noted that the difficulty of forcing nonuniqueness for a given quantile makes these data highly artificial.
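The (x, y), or "pairs", percentile bootstrap for the LAD slope can be sketched as follows, assuming Python with NumPy and simple (one-predictor) regression. The exact LAD fit here is a brute-force search over lines through pairs of data points, workable only for small n; LAD is usually computed by linear programming instead. The function names, the unit-square example, and the synthetic data are illustrative assumptions. The example also shows the nonuniqueness discussed above: two different lines, and every convex combination of them, attain the same minimal LAD criterion.

```python
import itertools
import numpy as np

def lad_objective(a, b, x, y):
    """Sum of absolute residuals of the line y = a + b*x."""
    return float(np.abs(y - (a + b * x)).sum())

def lad_line(x, y):
    """Exact simple-regression LAD fit by brute force (small n only).

    If the x values are not all equal, some LAD solution passes through two
    data points with distinct x, so checking every such pair suffices. When the
    solution is not unique, only the first optimal line found is returned.
    Returns (intercept, slope, criterion), or None if all x values are equal.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        obj = lad_objective(a, b, x, y)
        if best is None or obj < best[2]:
            best = (a, b, obj)
    return best

def pairs_bootstrap_slope_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile (x, y) bootstrap interval for the LAD slope."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample (x_i, y_i) pairs jointly
        fit = lad_line(x[idx], y[idx])
        if fit is not None:                # skip resamples with a single x value
            slopes.append(fit[1])
    lo, hi = np.quantile(slopes, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)

# Non-uniqueness on the four corners of the unit square: y = x and y = 1 - x
# are both optimal, and so is every convex combination of the two lines.
xs = np.array([0.0, 1.0, 0.0, 1.0])
ys = np.array([0.0, 0.0, 1.0, 1.0])
for lam in (0.0, 0.25, 0.5, 1.0):
    a = lam * 0.0 + (1 - lam) * 1.0    # intercepts: 0 for y = x, 1 for y = 1 - x
    b = lam * 1.0 + (1 - lam) * -1.0   # slopes: 1 for y = x, -1 for y = 1 - x
    print(lam, lad_objective(a, b, xs, ys))   # 2.0 for every lam

# Hypothetical synthetic data (not the paper's): a mixture of two noisy lines.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=20)
y = np.where(rng.random(20) < 0.5, 2.0 * x, 0.5 * x) + rng.normal(0.0, 0.1, size=20)
print(pairs_bootstrap_slope_ci(x, y))
```

Resampling whole (x_i, y_i) pairs, rather than residuals, is what keeps this bootstrap legitimate under heteroscedasticity, since it does not assume the errors are exchangeable across x.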
Y ∈ LAD, let b(Y) be the slope of the (necessarily unique) LAD line for Y. In this simple context, the message of Portnoy and Mizera concerning continuity appears to be: b is continuous on LAD. So, for example, if we perturb the
Belsley, D. A. (1991). Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley, New York.
Dupačová, J. (1992). Robustness of L1 regression in the light of linear programming. In L1-Statistical Analysis and Related Methods (Y. Dodge, ed.) 47-61. North-Holland, Amsterdam.
Ellis, S. P. and Morgenthaler, S. (1992). Leverage and breakdown in L1 regression. J. Amer. Statist. Assoc. 87 143-148.
Freedman, D., Pisani, R., Purves, R. and Adhikari, A. (1991). Statistics, 2nd ed. Norton, New York.
Koenker, R. (1994). Confidence intervals for regression quantiles. In Asymptotic Statistics: Proceedings of the 5th Prague Symposium (P. Mandl and M. Hušková, eds.) 349-359. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR1311953
Mizera, I. (1998). On continuity: resistance and qualitative robustness. Preprint. (Available at http://www.uniba.sk/~mizera.)
Portnoy, S. (1991a). Asymptotic behavior of the number of regression quantile breakpoints. SIAM J. Sci. Statist. Comput. 12 867-883.
Portnoy, S. (1991b). Asymptotic behavior of regression quantiles in nonstationary, dependent cases. J. Multivariate Anal. 38 100-113.
Portnoy, S. and Welsh, A. (1992). Exactly what is being modelled by the systematic component of a heteroscedastic linear regression. Statist. Probab. Lett. 13 253-258.
Rockafellar, R. T. and Wets, R. J.-B. (1998). Variational Analysis. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR98m:49001
Simmons, G. F. (1963). Introduction to Topology and Modern Analysis. McGraw-Hill, New York.
Mathematical Reviews (MathSciNet): MR26:4145
Spady, R. (1991). Saddlepoint approximations for regression models. Biometrika 78 879-889.
Mathematical Reviews (MathSciNet): MR92i:62033
Zentralblatt MATH: 0850.62185
Venables, W. N. and Ripley, B. D. (1994). Modern Applied Statistics with S-Plus. Springer, New York.
Mathematical Reviews (MathSciNet): MR97c:62003
Zhou, Q. and Portnoy, S. (1996). Direct use of regression quantiles to construct confidence sets for linear models. Ann. Statist. 24 287-306.
Zhou, Q. and Portnoy, S. (1998). Statistical inference on heteroscedastic linear models based on regression quantiles. J. Nonparametr. Statist. 9 239-260.

2009 © Institute of Mathematical Statistics