Statistical Science

The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators

Stephen Portnoy and Roger Koenker

Full-text: Open access

Abstract

Since the time of Gauss, it has been generally accepted that $\ell_2$-methods of combining observations by minimizing sums of squared errors have significant computational advantages over earlier $\ell_1$-methods based on minimization of absolute errors advocated by Boscovich, Laplace and others. However, $\ell_1$-methods are known to have significant robustness advantages over $\ell_2$-methods in many applications, and related quantile regression methods provide a useful, complementary approach to classical least-squares estimation of statistical models. Combining recent advances in interior point methods for solving linear programs with a new statistical preprocessing approach for $\ell_1$-type problems, we obtain a 10- to 100-fold improvement in computational speeds over current (simplex-based) $\ell_1$-algorithms in large problems, demonstrating that $\ell_1$-methods can be made competitive with $\ell_2$-methods in terms of computational speed throughout the entire range of problem sizes. Formal complexity results suggest that $\ell_1$-regression can be made faster than least-squares regression for $n$ sufficiently large and $p$ modest.
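
The $\ell_1$ estimator discussed in the abstract is the solution of a linear program, which is why interior point LP methods apply at all. Below is a minimal sketch, assuming NumPy/SciPy and SciPy's HiGHS LP backend (an assumption for illustration; this is not the authors' implementation, which couples a Mehrotra-style primal-dual interior point method with statistical preprocessing). It poses median ($\ell_1$) regression as the standard primal LP and compares the fit with ordinary least squares.

```python
# A hedged sketch, not the paper's algorithm: median (L1) regression posed as
# a linear program and handed to a generic LP solver.
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y):
    """Minimize sum_i |y_i - x_i'b| via the LP
       min 1'u+ + 1'u-  s.t.  X b + u+ - u- = y,  u+, u- >= 0, b free."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])    # objective: sum of residual parts
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])          # X b + u+ - u- = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)   # b free, u+/u- nonnegative
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# Usage: compare with least squares on heavy-tailed simulated data.
rng = np.random.default_rng(0)
n, p = 500, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = X @ np.arange(1.0, p + 1) + rng.standard_cauchy(n)
print("L1 fit:", np.round(l1_regression(X, y), 3))
print("L2 fit:", np.round(np.linalg.lstsq(X, y, rcond=None)[0], 3))
```

Doubling the variables ($u^+$, $u^-$) is a textbook primal formulation of the problem (cf. Charnes, Cooper and Ferguson, 1955; Wagner, 1959, in the references); the paper's speedups come from solving this same program with interior point iterations plus preprocessing rather than simplex pivoting.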

Article information

Source
Statist. Sci. Volume 12, Number 4 (1997), 279-300.

Dates
First available: 22 August 2002

Permanent link to this document
http://projecteuclid.org/euclid.ss/1030037960

Mathematical Reviews number (MathSciNet)
MR1619189

Digital Object Identifier
doi:10.1214/ss/1030037960

Zentralblatt MATH identifier
0955.62608

Citation

Portnoy, Stephen; Koenker, Roger. The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science 12 (1997), no. 4, 279--300. doi:10.1214/ss/1030037960. http://projecteuclid.org/euclid.ss/1030037960.



References

  • Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S. and Sorensen, D. (1995). LAPACK Users' Guide. SIAM, Philadelphia.
  • Barrodale, I. and Roberts, F. D. K. (1974). Solution of an overdetermined system of equations in the $\ell_1$ norm. Communications of the ACM 17 319-320.
  • Bartels, R. and Conn, A. (1980). Linearly constrained discrete $\ell_1$ problems. ACM Trans. Math. Software 6 594-608.
  • Bloomfield, P. and Steiger, W. L. (1983). Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhäuser, Boston.
  • Buchinsky, M. (1994). Changes in US wage structure 1963-87: an application of quantile regression. Econometrica 62 405-458.
  • Buchinsky, M. (1995). Quantile regression, the Box-Cox transformation model and U.S. wage structure 1963-1987. J. Econometrics 65 109-154.
  • Chamberlain, G. (1994). Quantile regression, censoring and the structure of wages. In Advances in Econometrics (C. Sims, ed.). North-Holland, Amsterdam.
  • Chambers, J. M. (1992). Linear models. In Statistical Models in S (J. M. Chambers and T. J. Hastie, eds.) 95-144. Wadsworth, Pacific Grove, CA.
  • Charnes, A., Cooper, W. W. and Ferguson, R. O. (1955). Optimal estimation of executive compensation by linear programming. Management Science 1 138-151.
  • Chaudhuri, P. (1992). Generalized regression quantiles. In Proceedings of the Second Conference on Data Analysis Based on the $L_1$ Norm and Related Methods 169-186. North-Holland, Amsterdam.
  • Chen, S. and Donoho, D. L. (1995). Atomic decomposition by basis pursuit. SIAM J. Sci. Stat. Comp. To appear.
  • Dikin, I. I. (1967). Iterative solution of problems of linear and quadratic programming. Soviet Math. Dokl. 8 674-675.
  • Edgeworth, F. Y. (1887). On observations relating to several quantities. Hermathena 6 279-285.
  • Edgeworth, F. Y. (1888). On a new method of reducing observations relating to several quantities. Philosophical Magazine 25 184-191.
  • Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.
  • Fiacco, A. V. and McCormick, G. P. (1968). Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Wiley, New York.
  • Floyd, R. W. and Rivest, R. L. (1975). Expected time bounds for selection. Communications of the ACM 18 165-173.
  • Frisch, R. (1956). La résolution des problèmes de programme linéaire par la méthode du potentiel logarithmique. Cahiers du Séminaire d'Économétrie 4 7-20.
  • Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae: pars prior. [Translated (1995) by G. W. Stewart as Theory of the Combination of Observations Least Subject to Error. SIAM, Philadelphia.]
  • Gill, P., Murray, W., Saunders, M., Tomlin, J. A. and Wright, M. (1986). On projected Newton barrier methods for linear programming and an equivalence to Karmarkar's projective method. Math. Programming 36 183-209.
  • Gonzaga, C. C. (1992). Path-following methods for linear programming. SIAM Rev. 34 167-224.
  • Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London.
  • Gutenbrunner, C. and Jurečková, J. (1992). Regression quantile and regression rank score process in the linear model and derived statistics. Ann. Statist. 20 305-330.
  • Gutenbrunner, C., Jurečková, J., Koenker, R. and Portnoy, S. (1993). Tests of linear hypotheses based on regression rank scores. J. Nonparametric Statist. 2 307-333.
  • Hall, P. and Sheather, S. (1988). On the distribution of a studentized quantile. J. Roy. Statist. Soc. Ser. B 50 381-391.
  • Karmarkar, N. (1984). A new polynomial time algorithm for linear programming. Combinatorica 4 373-395.
  • Koenker, R. (1994). Confidence intervals for regression quantiles. In Asymptotic Statistics, Proceedings of the Fifth Prague Symposium (P. Mandl and M. Hušková, eds.) 349-359. Springer, Heidelberg.
  • Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica 46 33-50.
  • Koenker, R. and d'Orey, V. (1987). Computing regression quantiles. J. Roy. Statist. Soc. Ser. C 36 383-393.
  • Koenker, R. and d'Orey, V. (1993). Computing dual regression quantiles and regression rank scores. J. Roy. Statist. Soc. Ser. C 43 410-414.
  • Koenker, R., Ng, P. and Portnoy, S. (1994). Quantile smoothing splines. Biometrika 81 673-680.
  • Laplace, P.-S. (1789). Sur quelques points du système du monde. Mémoires de l'Académie des Sciences de Paris. (Reprinted in Œuvres Complètes 11 475-558. Gauthier-Villars, Paris.)
  • Lustig, I. J., Marsten, R. E. and Shanno, D. F. (1992). On implementing Mehrotra's predictor-corrector interior-point method for linear programming. SIAM J. Optim. 2 435-449.
  • Lustig, I. J., Marsten, R. E. and Shanno, D. F. (1994). Interior point methods for linear programming: computational state of the art (with discussion). ORSA J. Comput. 6 1-36.
  • Manning, W., Blumberg, L. and Moulton, L. H. (1995). The demand for alcohol: the differential response to price. J. Health Economics 14 123-148.
  • Mehrotra, S. (1992). On the implementation of a primal-dual interior point method. SIAM J. Optim. 2 575-601.
  • Meketon, M. S. (1986). Least absolute value regression. Technical report, Bell Labs, Holmdel, NJ.
  • Mizuno, S., Todd, M. J. and Ye, Y. (1993). On adaptive-step primal dual interior point algorithms for linear programming. Math. Oper. Res. 18 964-981.
  • Oja, H. (1983). Descriptive statistics for multivariate distributions. Statist. Probab. Lett. 1 327-332.
  • Portnoy, S. (1991). Asymptotic behavior of the number of regression quantile breakpoints. SIAM J. Sci. Statist. Comput. 12 867-883.
  • Powell, J. L. (1986). Censored regression quantiles. J. Econometrics 32 143-155.
  • Renegar, J. (1988). A polynomial-time algorithm based on Newton's method for linear programming. Math. Programming 40 59-93.
  • Shamir, R. (1993). Probabilistic analysis in linear programming. Statist. Sci. 8 57-64.
  • Siddiqui, M. (1960). Distribution of quantiles in samples from a bivariate population. J. Res. Nat. Bur. Stand. B 64 145-150.
  • Sonnevend, G., Stoer, J. and Zhao, G. (1991). On the complexity of following the central path of linear programs by linear extrapolation II. Math. Programming 52 527-553.
  • Stigler, S. M. (1984). Boscovich, Simpson and a 1760 manuscript note on fitting a linear relation. Biometrika 71 615-620.
  • Stigler, S. M. (1986). The History of Statistics: Measurement of Uncertainty before 1900. Harvard Univ. Press.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288.
  • Vanderbei, R. J., Meketon, M. S. and Freedman, B. A. (1986). A modification of Karmarkar's linear programming algorithm. Algorithmica 1 395-407.
  • Wagner, H. M. (1959). Linear programming techniques for regression analysis. J. Amer. Statist. Assoc. 54 206-212.
  • Welsh, A. H. (1996). Robust estimation of smooth regression and spread functions and their derivatives. Statist. Sinica 6 347-366.
  • Wright, M. H. (1992). Interior methods for constrained optimization. Acta Numerica 1 341-407.
  • Zhang, Y. (1992). Primal-dual interior point approach for computing $\ell_1$-solutions and $\ell_\infty$-solutions of overdetermined linear systems. J. Optim. Theory Appl. 77 323-341.
[...] (Osborne, 1985). It can be computed by the fast median algorithm of Bloomfield and Steiger, for example (a weighted-median sketch of this step appears after the reference list below). The Barrodale-Roberts approach is equivalent to using a comparison sort in this context and seems already sufficient to explain the $O(n^2)$ behavior observed. Recently, Osborne and Watson (1996) have observed that the secant algorithm can be applied here and interpreted as an alternative to the usual median-of-three partitioning in the fast median computation. The improvement over Bloomfield and Steiger can be staggering in problems which arise in fitting a deterministic model in the presence of noise. For the record, the code distributed by Bartels, Conn and Sinclair used a heap sort in the linesearch implementation and was perhaps the first to improve on the $O(n^2)$ asymptotics. It would seem to be time that S-PLUS used a more modern implementation.

3. There is at least some folklore concerning the inferior performance of interior point methods when compared with simplex-style methods in postoptimality computations. However, this is the type of computation employed when stud[...]
  • Güler, O., den Hertog, D., Roos, C. and Terlaky, T. (1993). Degeneracy in interior point methods for linear programming: a survey. Ann. Oper. Res. 46 107-138.
  • Kennedy, W. J., Jr. and Gentle, J. E. (1980). Statistical Computing. Dekker, New York.
  • Monteiro, R. D. C. and Mehrotra, S. (1996). A general parametric analysis approach and its implications to sensitivity analysis in interior point methods. Math. Programming 72 65-82.
  • Osborne, M. R. (1985). Finite Algorithms in Optimization and Data Analysis. Wiley, New York.
  • Osborne, M. R. and Watson, G. A. (1996). Aspects of M-estimation and $\ell_1$ fitting. In Numerical Analysis (D. F. Griffiths and G. A. Watson, eds.). World Scientific, Singapore.
  • Press, W., Flannery, B., Teukolsky, S. and Vetterling, W. (1986). Numerical Recipes: The Art of Scientific Computing. Cambridge Univ. Press.
  • Thisted, R. A. (1988). Elements of Statistical Computing. Chapman and Hall, London.
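
The linesearch step discussed in the comment fragment above reduces, at each iteration, to a weighted median computation. A small sketch follows, assuming NumPy (the function name and the sorting-based approach are illustrative, not the paper's code): sorting costs $O(n \log n)$ per call, whereas a Floyd-Rivest style selection, as in the fast median algorithm mentioned there, attains expected $O(n)$.

```python
# A hedged sketch of the weighted-median subproblem behind the l1 linesearch.
import numpy as np

def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |t - values[i]|."""
    values, weights = np.asarray(values, float), np.asarray(weights, float)
    order = np.argsort(values)               # O(n log n); selection would give expected O(n)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    k = np.searchsorted(cum, 0.5 * cum[-1])  # first index reaching half the total weight
    return v[k]

# Usage: with unit weights this is just the ordinary (lower) median.
x = np.array([3.0, -1.0, 7.0, 2.0, 5.0])
print(weighted_median(x, np.ones_like(x)))   # prints 3.0
```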

See also

  • Comment by Ronald A. Thisted.
  • Comment by M. R. Osborne.
  • Rejoinder by Stephen Portnoy and Roger Koenker.