Statistical Science

Regularization of Case-Specific Parameters for Robustness and Efficiency

Yoonkyung Lee, Steven N. MacEachern, and Yoonsuh Jung



Regularization methods allow one to handle a variety of inferential problems where there are more covariates than cases, making it possible to consider a potentially enormous number of covariates for a problem. We exploit the power of these techniques, supersaturating models by augmenting the “natural” covariates in the problem with an additional indicator for each case in the data set. We attach to these case-specific indicators a penalty term designed to produce a desired effect. For regression methods with squared error loss, an $\ell_{1}$ penalty produces a regression which is robust to outliers and high-leverage cases; for quantile regression methods, an $\ell_{2}$ penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust.
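To make the $\ell_{1}$ case concrete, the augmented least-squares problem is $\min_{\beta,\gamma}\|y - X\beta - \gamma\|_2^2 + \lambda\|\gamma\|_1$, where $\gamma$ collects the case-specific parameters. The sketch below (an illustration under these assumptions, not the authors' code; the function name `l1_case_adjusted_ls` is hypothetical) solves it by alternating an ordinary least-squares step for $\beta$ with a closed-form soft-thresholding step for $\gamma$:

```python
import numpy as np

def soft_threshold(r, t):
    """Elementwise soft-thresholding: the scalar lasso solution."""
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

def l1_case_adjusted_ls(X, y, lam, n_iter=100):
    """Alternating minimization of
        ||y - X beta - gamma||^2 + lam * ||gamma||_1.
    For fixed gamma, beta is OLS on the adjusted response y - gamma;
    for fixed beta, each gamma_i = soft_threshold(residual_i, lam / 2)."""
    n, p = X.shape
    gamma = np.zeros(n)
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        resid = y - X @ beta
        gamma = soft_threshold(resid, lam / 2.0)
    return beta, gamma
```

Cases whose residuals stay below the threshold get $\gamma_i = 0$ and are fit by ordinary least squares; cases with large residuals receive a nonzero $\gamma_i$ that absorbs most of their residual, which is what makes the resulting fit resistant to outliers.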

We provide a general framework for the inclusion of case-specific parameters in regularization problems, describing the impact on the effective loss for a variety of regression and classification problems. We outline a computational strategy by which existing software can be modified to solve the augmented regularization problem, providing conditions under which such modification will converge to the optimum solution. We illustrate the benefits of including case-specific parameters in the context of mean regression and quantile regression through analysis of NHANES and linguistic data sets.
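For the quantile-regression case, the effective loss induced by an $\ell_{2}$ penalty on the case-specific parameters can be written down explicitly: minimizing $\rho_{\tau}(r - \gamma) + \lambda\gamma^{2}$ over $\gamma$, where $\rho_{\tau}$ is the check loss, gives a clipping rule, and substituting the minimizer yields a check loss whose corner at zero is replaced by a quadratic segment. The sketch below (an illustration of this derivation under the notation above; the function names are not from the paper) computes the $\gamma$ update and the resulting effective loss:

```python
import numpy as np

def check_loss(u, tau):
    """Quantile-regression check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def gamma_update(r, tau, lam):
    """argmin_g rho_tau(r - g) + lam * g**2.
    Closed form: clip r to the interval [-(1 - tau)/(2 lam), tau/(2 lam)]."""
    return np.clip(r, -(1.0 - tau) / (2.0 * lam), tau / (2.0 * lam))

def effective_loss(r, tau, lam):
    """Check loss after the optimal case-specific adjustment: quadratic
    (lam * r**2) near zero, linear with the original slopes in the tails."""
    g = gamma_update(r, tau, lam)
    return check_loss(r - g, tau) + lam * g**2
```

Near zero the effective loss equals $\lambda r^{2}$, so small residuals are treated quadratically (reducing variance), while large residuals keep the linear tails of the check loss (preserving the quantile interpretation).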

Article information

Statist. Sci., Volume 27, Number 3 (2012), 350-372.

First available in Project Euclid: 5 September 2012


Keywords: case indicator; large margin classifier; LASSO; leverage point; outlier; penalized method; quantile regression


Lee, Yoonkyung; MacEachern, Steven N.; Jung, Yoonsuh. Regularization of Case-Specific Parameters for Robustness and Efficiency. Statist. Sci. 27 (2012), no. 3, 350--372. doi:10.1214/11-STS377.


