Electronic Journal of Statistics

On Bayesian robust regression with diverging number of predictors

Daniel Nevo and Ya’acov Ritov


Abstract

This paper concerns the robust regression model when the number of predictors and the number of observations grow at a similar rate. Theory for M-estimators in this regime has recently been developed by several authors (El Karoui et al., 2013; Bean et al., 2013; Donoho and Montanari, 2013). Motivated by the inability of M-estimators to consistently estimate the Euclidean norm of the coefficient vector, we consider a Bayesian framework for this model. We suggest a two-component mixture of normals prior for the coefficients and develop a Gibbs sampling procedure for the relevant posterior distributions, utilizing a scale mixture of normals representation of the error distribution. Unlike M-estimators, the proposed Bayes estimator is consistent in the Euclidean norm sense. Simulation results demonstrate the superiority of the Bayes estimator over traditional estimation methods.
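To make the construction concrete, the sketch below implements a generic Gibbs sampler of the kind the abstract describes: Laplace errors written as a scale mixture of normals (Andrews and Mallows, 1974), a two-component normal mixture ("spike-and-slab") prior on the coefficients in the spirit of George and McCulloch (1993), and an inverse-Gaussian update for the latent error scales analogous to the Bayesian lasso update of Park and Casella (2008). This is a minimal illustration under those assumptions, not the authors' exact algorithm; the function name and all hyperparameter defaults (w, tau0_sq, tau1_sq, the error scale b) are invented for the example.

```python
import numpy as np
from scipy import stats
from scipy.special import expit

def gibbs_robust_regression(y, X, n_iter=2000, b=1.0, w=0.5,
                            tau0_sq=0.01, tau1_sq=10.0, seed=0):
    # Model: y = X beta + eps, eps_i ~ Laplace(0, b), represented as
    # eps_i | v_i ~ N(0, v_i) with v_i ~ Exp(rate = 1 / (2 b^2)).
    # Prior: beta_j ~ w N(0, tau1_sq) + (1 - w) N(0, tau0_sq).
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    v = np.ones(n)                        # latent per-observation variances
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # 1) Mixture-component indicators z_j | beta_j (Bernoulli).
        log1 = np.log(w) + stats.norm.logpdf(beta, 0.0, np.sqrt(tau1_sq))
        log0 = np.log(1 - w) + stats.norm.logpdf(beta, 0.0, np.sqrt(tau0_sq))
        z = rng.random(p) < expit(log1 - log0)
        d = np.where(z, tau1_sq, tau0_sq)  # prior variance given z_j

        # 2) beta | y, v, z is Gaussian (conjugate update).
        Xw = X / v[:, None]                # V^{-1} X with V = diag(v)
        cov = np.linalg.inv(X.T @ Xw + np.diag(1.0 / d))
        beta = rng.multivariate_normal(cov @ (Xw.T @ y), cov)

        # 3) v_i | residual: 1/v_i is inverse-Gaussian (Wald), the same
        #    form as the Bayesian lasso update (Park and Casella, 2008).
        eps = np.maximum(np.abs(y - X @ beta), 1e-10)
        v = 1.0 / rng.wald(1.0 / (b * eps), 1.0 / b ** 2)

        draws[t] = beta
    return draws
```

A point estimate is the posterior mean of the retained draws, e.g. draws[500:].mean(axis=0); consistency in the Euclidean norm sense refers to such a posterior summary as the ratio of predictors to observations converges to a constant.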

Article information

Source
Electron. J. Statist., Volume 10, Number 2 (2016), 3045-3062.

Dates
Received: July 2015
First available in Project Euclid: 9 November 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1478660516

Digital Object Identifier
doi:10.1214/16-EJS1205

Mathematical Reviews number (MathSciNet)
MR3571962

Zentralblatt MATH identifier
1366.62139

Subjects
Primary: 62J05 (linear regression); 62H12 (estimation); 62F15 (Bayesian inference)

Keywords
Robust regression; high-dimensional regression; Bayesian estimation; MCMC

Citation

Nevo, Daniel; Ritov, Ya’acov. On Bayesian robust regression with diverging number of predictors. Electron. J. Statist. 10 (2016), no. 2, 3045–3062. doi:10.1214/16-EJS1205. https://projecteuclid.org/euclid.ejs/1478660516



References

  • Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society. Series B (Methodological) 99–102.
  • Bean, D., Bickel, P. J., El Karoui, N. and Yu, B. (2013). Optimal M-estimation in high-dimensional regression. Proceedings of the National Academy of Sciences 110 14563–14568.
  • Carvalho, C. M., Polson, N. G. and Scott, J. G. (2009). Handling sparsity via the horseshoe. In International Conference on Artificial Intelligence and Statistics 73–80.
  • Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465–480.
  • Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. The Annals of Statistics 43 1986–2018.
  • Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. The Annals of Statistics 40 2069–2101.
  • Chhikara, R. (1988). The Inverse Gaussian Distribution: Theory, Methodology, and Applications 95. CRC Press.
  • Donoho, D. and Montanari, A. (2013). High dimensional robust M-estimation: Asymptotic variance via approximate message passing. Probability Theory and Related Fields 1–35.
  • Efron, B. and Morris, C. (1973). Stein’s estimation rule and its competitors—an empirical Bayes approach. Journal of the American Statistical Association 68 117–130.
  • El Karoui, N. (2013). Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: Rigorous results. arXiv preprint arXiv:1311.2445.
  • El Karoui, N., Bean, D., Bickel, P. J., Lim, C. and Yu, B. (2013). On robust regression with high-dimensional predictors. Proceedings of the National Academy of Sciences 110 14557–14562.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 1.
  • Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6 721–741.
  • George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association 88 881–889.
  • George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statistica Sinica 339–373.
  • Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Annals of Statistics 799–821.
  • Huber, P. J. (2011). Robust Statistics. Springer.
  • James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1 361–379.
  • Maronna, R. A. and Yohai, V. J. (1981). Asymptotic behavior of general M-estimates for regression and scale with random carriers. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 58 7–20.
  • Nevo, D. and Ritov, Y. (2016). Supplementary materials for “On Bayesian robust regression with diverging number of predictors”. DOI: 10.1214/16-EJS1205SUPP.
  • Park, T. and Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association 103 681–686.
  • Portnoy, S. (1984). Asymptotic behavior of M-estimators of $p$ regression parameters when $p^2/n$ is large. I. Consistency. Annals of Statistics 1298–1309.
  • Portnoy, S. (1985). Asymptotic behavior of M-estimators of $p$ regression parameters when $p^2/n$ is large. II. Normal approximation. Annals of Statistics 1403–1417.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 267–288.
  • van der Pas, S., Kleijn, B. and van der Vaart, A. (2014). The horseshoe estimator: Posterior concentration around nearly black vectors. Electronic Journal of Statistics 8 2585–2618.
  • West, M. (1987). On scale mixtures of normal distributions. Biometrika 74 646–648.
  • Yi, C. and Huang, J. (2015). Semismooth Newton coordinate descent algorithm for elastic-net penalized Huber loss and quantile regression. arXiv preprint arXiv:1509.02957.
