The Annals of Statistics

Gibbs posterior for variable selection in high-dimensional classification and data mining

Wenxin Jiang and Martin A. Tanner



In the popular approach of “Bayesian variable selection” (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. Here we consider a new direction: BVS with a Gibbs posterior, a construction originating in statistical mechanics. The Gibbs posterior is built from a risk function of practical interest (such as the classification error) and aims at minimizing that risk without modeling the data probabilistically. This can improve on the usual Bayesian approach, which depends on a probability model that may be misspecified. We provide conditions under which good risk performance is achieved even in high dimensions, when the number of candidate variables K can be much larger than the sample size n. In addition, we develop a convenient Markov chain Monte Carlo algorithm to implement BVS with the Gibbs posterior.
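As a rough illustration of the idea, a Gibbs posterior can be written with density proportional to exp(-psi * n * R_n(theta)) * prior(theta), where R_n is the empirical risk (here the misclassification rate) and psi is an inverse-temperature constant. The sketch below is not the authors' algorithm, whose details the abstract does not give; it swaps in a plain random-walk Metropolis sampler over inclusion indicators and coefficients. The symbol psi, the independent Bernoulli/normal prior, the function names, and all tuning values are assumptions made for illustration only.

```python
import math
import random


def empirical_risk(X, y, gamma, beta):
    """Misclassification rate of the linear rule 1{sum_j gamma_j*beta_j*x_j > 0}."""
    errors = 0
    for row, label in zip(X, y):
        score = sum(g * b * x for g, b, x in zip(gamma, beta, row))
        errors += int((score > 0) != (label == 1))
    return errors / len(y)


def log_gibbs_posterior(X, y, gamma, beta, psi, p_incl):
    """Unnormalized log density: -psi * n * R_n + log prior (assumed prior)."""
    n = len(y)
    # Assumed prior: gamma_j ~ Bernoulli(p_incl), beta_j ~ N(0, 1), independent.
    log_prior = sum(math.log(p_incl) if g else math.log(1.0 - p_incl) for g in gamma)
    log_prior += sum(-0.5 * b * b for b in beta)
    return -psi * n * empirical_risk(X, y, gamma, beta) + log_prior


def sample_gibbs_posterior(X, y, psi=2.0, p_incl=0.1, n_iter=2000, step=0.5, seed=0):
    """Random-walk Metropolis; returns per-variable inclusion frequencies."""
    rng = random.Random(seed)
    K = len(X[0])
    gamma = [0] * K                              # start from the empty model
    beta = [rng.gauss(0.0, 1.0) for _ in range(K)]
    current = log_gibbs_posterior(X, y, gamma, beta, psi, p_incl)
    counts = [0] * K
    for _ in range(n_iter):
        j = rng.randrange(K)
        new_gamma, new_beta = gamma[:], beta[:]
        if rng.random() < 0.5:
            new_gamma[j] = 1 - new_gamma[j]      # flip one inclusion indicator
        else:
            new_beta[j] += rng.gauss(0.0, step)  # perturb one coefficient
        proposed = log_gibbs_posterior(X, y, new_gamma, new_beta, psi, p_incl)
        # Both moves are symmetric, so the acceptance ratio is the density ratio.
        if rng.random() < math.exp(min(0.0, proposed - current)):
            gamma, beta, current = new_gamma, new_beta, proposed
        for k in range(K):
            counts[k] += gamma[k]                # no burn-in is discarded here
    return [c / n_iter for c in counts]


if __name__ == "__main__":
    # Synthetic data with K > n; only variable 0 determines the label.
    data_rng = random.Random(1)
    n, K = 30, 50
    X = [[data_rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]
    y = [1 if row[0] > 0 else 0 for row in X]
    freq = sample_gibbs_posterior(X, y)
    top = sorted(range(K), key=lambda k: -freq[k])[:5]
    print("most frequently selected variables:", top)
```

Because the target depends on the data only through the empirical risk, no likelihood is specified anywhere in the sketch; a larger psi concentrates the sampler more sharply on low-risk variable subsets, while the prior inclusion probability controls sparsity.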

Article information

Ann. Statist., Volume 36, Number 5 (2008), 2207-2231.

First available in Project Euclid: 13 October 2008


Primary: 62F99: None of the above, but in this section
Secondary: 82-08: Computational methods

Keywords: data augmentation; data mining; Gibbs posterior; high-dimensional data; linear classification; Markov chain Monte Carlo; prior distribution; risk performance; sparsity; variable selection


Jiang, Wenxin; Tanner, Martin A. Gibbs posterior for variable selection in high-dimensional classification and data mining. Ann. Statist. 36 (2008), no. 5, 2207-2231. doi:10.1214/07-AOS547.


