The Annals of Statistics

A majorization–minimization approach to variable selection using spike and slab priors

Tso-Jung Yen

Full-text: Open access


We develop a method for MAP estimation in a class of Bayesian regression models in which the coefficients are assigned Gaussian-based spike and slab priors. The objective function of the corresponding optimization problem has a Lagrangian form in which the regression coefficients are regularized by a mixture of squared l_2 and l_0 norms. We derive a tight approximation to the l_0 norm using majorization–minimization techniques, and we search for the optimizer of the approximate objective with a coordinate descent algorithm combined with a soft-thresholding scheme. Simulation studies show that the proposed method can select variables more accurately than other benchmark methods. Theoretical results show that, under regularity conditions, sign consistency can be established even when the Irrepresentable Condition is violated. Results on posterior model consistency and estimation consistency, and an extension to parameter estimation in generalized linear models, are also provided.
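The scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a log-sum surrogate for the l_0 norm (the supplement mentions the log-sum function), which a majorization–minimization step turns into a weighted l_1 penalty, and it then runs coordinate descent with soft-thresholding on each majorizer. The function names, the tuning values lam0, lam2 and eps, and the uniform-weight warm start are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def mm_coordinate_descent(X, y, lam0=0.1, lam2=0.01, eps=1e-3,
                          n_outer=30, n_inner=100, tol=1e-8):
    """Illustrative sketch (not the paper's exact method). Approximately minimize
        (1/2n)||y - Xb||^2 + (lam2/2)||b||^2 + lam0 * sum_j log(|b_j| + eps),
    where the log-sum term is an assumed stand-in for the l_0 norm.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                        # full residual, kept up to date
    col_sq = (X ** 2).sum(axis=0) / n    # x_j'x_j / n for each column
    w = np.full(p, lam0)                 # assumed warm start: uniform l_1 weights
    for _ in range(n_outer):
        b_prev = b.copy()
        # Inner loop: coordinate descent on the weighted-l_1 majorizer.
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                old = b[j]
                # Correlation of x_j with the partial residual (j-th term added back).
                rho = X[:, j] @ r / n + col_sq[j] * old
                new = soft_threshold(rho, w[j]) / (col_sq[j] + lam2)
                if new != old:
                    r -= X[:, j] * (new - old)
                    b[j] = new
                    max_change = max(max_change, abs(new - old))
            if max_change < tol:
                break
        if np.max(np.abs(b - b_prev)) < tol:
            break
        # MM step: log(|b| + eps) is concave in |b|, so its tangent line at the
        # current iterate majorizes it; the slope gives the new l_1 weights.
        w = lam0 / (np.abs(b) + eps)
    return b
```

For example, with a design matrix X and response y generated from a sparse coefficient vector, mm_coordinate_descent(X, y) returns an estimate whose exact zeros mark the excluded variables. Coefficients that are zero after a pass receive the large weight lam0 / eps and so tend to stay at zero, which is how the reweighting mimics an l_0-type penalty.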

Article information

Ann. Statist., Volume 39, Number 3 (2011), 1748-1775.

First available in Project Euclid: 25 July 2011

Primary: 62H12: Estimation
Secondary: 62F15: Bayesian inference; 62J05: Linear regression

MAP estimation; l_0 norm; majorization–minimization algorithms; Irrepresentable Condition


Yen, Tso-Jung. A majorization–minimization approach to variable selection using spike and slab priors. Ann. Statist. 39 (2011), no. 3, 1748–1775. doi:10.1214/11-AOS884.




Supplemental materials

  • Supplementary material: Supplement File (doi:10.1214/11-AOS884SUPP). The supplement gives brief discussions of the log-sum function and of connections with other approaches, a derivation of the soft-thresholding operator, and proofs of Theorems 5.1, 5.2 and 5.3.