The Annals of Statistics

Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities

Wenxin Jiang

Abstract

Bayesian variable selection has gained much empirical success recently in a variety of applications when the number $K$ of explanatory variables $(x_1,\ldots,x_K)$ is possibly much larger than the sample size $n$. For generalized linear models, if most of the $x_j$’s have very small effects on the response $y$, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality $K\gg n$. In this approach a suitable prior can be used to choose a few out of the many $x_j$’s to model $y$, so that the posterior will propose probability densities $p$ that are “often close” to the true density $p^*$ in some sense. The closeness can be described by a Hellinger distance between $p$ and $p^*$ that scales at a power very close to $n^{-1/2}$, which is the “finite-dimensional rate” corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 05-02 (2005) Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.
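
For orientation, the Hellinger distance mentioned in the abstract can be written in one common form as below. This is a sketch under assumptions: the dominating measure $\nu$, the absence of a factor $1/2$, the symbols $d$, $\epsilon_n$ and $\xi$, and the reading of “a power very close to $n^{-1/2}$” as $n^{-1/2+\xi}$ are illustrative choices that the abstract does not fix; the paper's own normalization and rate statement may differ.

```latex
% One common (unnormalized) definition of the Hellinger distance between
% densities p and p^* with respect to an assumed dominating measure \nu.
d(p, p^*) \;=\; \left( \int \bigl( \sqrt{p} - \sqrt{p^*} \bigr)^2 \, d\nu \right)^{1/2}

% Illustrative reading of "a power very close to n^{-1/2}":
% a rate \epsilon_n with an arbitrarily small exponent loss \xi > 0.
\epsilon_n \;=\; n^{-1/2 + \xi}, \qquad \xi > 0 \ \text{small}
```

With this convention $0 \le d(p,p^*) \le \sqrt{2}$, and $d(p,p^*) = 0$ exactly when $p = p^*$ $\nu$-almost everywhere.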

Article information

Source
Ann. Statist., Volume 35, Number 4 (2007), 1487-1511.

Dates
First available in Project Euclid: 29 August 2007

https://projecteuclid.org/euclid.aos/1188405619

Digital Object Identifier
doi:10.1214/009053607000000019

Mathematical Reviews number (MathSciNet)
MR2351094

Zentralblatt MATH identifier
1123.62026

Subjects
Primary: 62F99: None of the above, but in this section
Secondary: 62J12: Generalized linear models

Citation

Jiang, Wenxin. Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities. Ann. Statist. 35 (2007), no. 4, 1487–1511. doi:10.1214/009053607000000019. https://projecteuclid.org/euclid.aos/1188405619

References

• Bickel, P. J. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, “naive Bayes,” and some alternatives when there are many more variables than observations. Bernoulli 10 989–1010.
• Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559–583.
• Clyde, M. and DeSimone-Sasinowska, H. (1997). Accounting for model uncertainty in Poisson regression models: Particulate matter and mortality in Birmingham, Alabama. ISDS Discussion Paper 97-06. Available at ftp.isds.duke.edu/pub/WorkingPapers/97-06.ps.
• Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
• Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G. and West, M. (2004). Sparse graphical models for exploring gene expression data. J. Multivariate Anal. 90 196–212.
• Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. International Congress of Mathematicians 3 595–622. European Math. Soc., Zürich.
• George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statist. Sinica 7 339–373.
• Ghosal, S. (1997). Normal approximation to the posterior distribution for generalized linear models with many covariates. Math. Methods Statist. 6 332–348.
• Ghosal, S. (1999). Asymptotic normality of posterior distributions in high-dimensional linear models. Bernoulli 5 315–331.
• Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.
• Greenshtein, E. (2006). Best subset selection, persistence in high-dimensional statistical learning and optimization under $\ell_1$ constraint. Ann. Statist. 34 2367–2386.
• Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
• Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
• Jiang, W. (2005). Bayesian variable selection for high dimensional generalized linear models. Technical Report 05-02, Dept. Statistics, Northwestern Univ. Available at newton.stats.northwestern.edu/~jiang/tr/glmone2.tr.pdf.
• Jiang, W. (2006). On the consistency of Bayesian variable selection for high dimensional binary regression and classification. Neural Comput. 18 2762–2776.
• Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist. 34 837–877.
• Kohn, R., Smith, M. and Chan, D. (2001). Nonparametric regression using linear combinations of basis functions. Statist. Comput. 11 313–322.
• Lee, H. K. H. (2000). Consistency of posterior distributions for neural networks. Neural Networks 13 629–642.
• Lee, K. E., Sha, N., Dougherty, E. R., Vannucci, M. and Mallick, B. K. (2003). Gene selection: A Bayesian variable selection approach. Bioinformatics 19 90–97.
• Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• Nott, D. J. and Leonte, D. (2004). Sampling schemes for Bayesian variable selection in generalized linear models. J. Comput. Graph. Statist. 13 362–382.
• Sha, N., Vannucci, M., Tadesse, M. G., Brown, P. J., Dragoni, I., Davies, N., Roberts, T. C., Contestabile, A., Salmon, M., Buckley, C. and Falciani, F. (2004). Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics 60 812–819.
• Smith, M. and Kohn, R. (1996). Nonparametric regression using Bayesian variable selection. J. Econometrics 75 317–343.
• Wang, X. and George, E. I. (2004). A hierarchical Bayes approach to variable selection for generalized linear models. Technical report. Available at www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/wang+george-2004.pdf.
• Wong, W. H. and Shen, X. (1995). Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Ann. Statist. 23 339–362.
• Zhang, T. (2006). From $\varepsilon$-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180–2210.
• Zhou, X., Liu, K.-Y. and Wong, S. T. C. (2004). Cancer classification and prediction using logistic regression with Bayesian gene selection. J. Biomedical Informatics 37 249–259.