Gibbs posterior for variable selection in high-dimensional classification and data mining

Wenxin Jiang; Martin A. Tanner

doi:10.1214/07-AOS547

Ann. Statist. 36(5): 2207-2231 (October 2008). DOI: 10.1214/07-AOS547

Abstract

In the popular approach of “Bayesian variable selection” (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here to study BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function of practical interest (such as the classification error) and aims at minimizing a risk function without modeling the data probabilistically. This can improve the performance over the usual Bayesian approach, which depends on a probability model which may be misspecified. Conditions will be provided to achieve good risk performance, even in the presence of high dimensionality, when the number of candidate variables “K” can be much larger than the sample size “n.” In addition, we develop a convenient Markov chain Monte Carlo algorithm to implement BVS with the Gibbs posterior.
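As a rough illustration of how a Gibbs posterior can be built directly from a risk function rather than from a likelihood, the sketch below weights a small set of candidate classifiers in proportion to prior × exp(−n·ψ·empirical risk). The exponential weighting is the usual Gibbs form, but the exact scaling by n and the inverse-temperature name `psi` are assumptions about parametrization. The toy data, the two single-variable threshold classifiers, and the function names are hypothetical. This is not the paper's MCMC algorithm, only an exact enumeration over two candidates.

```python
import math

def empirical_risk(predict, xs, ys):
    """Empirical classification error (0-1 loss) of a classifier."""
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(ys)

def gibbs_posterior(candidates, prior, xs, ys, psi=1.0):
    """Normalized weights proportional to prior * exp(-n * psi * risk).

    psi is an assumed inverse-temperature parameter; no probability
    model for the data is used, only the empirical risk.
    """
    n = len(ys)
    log_w = [math.log(p) - n * psi * empirical_risk(f, xs, ys)
             for f, p in zip(candidates, prior)]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical toy data: the label depends only on the first variable.
xs = [(1.0, -0.3), (-2.0, 0.5), (0.5, 0.9), (-1.0, -0.7)]
ys = [1, 0, 1, 0]

# Two candidate "models", each selecting a single variable.
candidates = [lambda x: int(x[0] > 0), lambda x: int(x[1] > 0)]
prior = [0.5, 0.5]

weights = gibbs_posterior(candidates, prior, xs, ys, psi=1.0)
print(weights)
```

In this toy setting the classifier based on the first variable has lower empirical risk, so it receives the larger weight. Increasing `psi` concentrates the weights further on low-risk candidates.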

Citation

Wenxin Jiang, Martin A. Tanner. "Gibbs posterior for variable selection in high-dimensional classification and data mining." Ann. Statist. 36 (5) 2207-2231, October 2008. https://doi.org/10.1214/07-AOS547

Information

Published: October 2008

First available in Project Euclid: 13 October 2008

zbMATH: 1274.62227

MathSciNet: MR2458185

Digital Object Identifier: 10.1214/07-AOS547

Subjects:

Primary: 62F99

Secondary: 82-08

Keywords: Data augmentation, data mining, Gibbs posterior, High-dimensional data, linear classification, Markov chain Monte Carlo, prior distribution, risk performance, Sparsity, Variable selection
