The Annals of Statistics

Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities

Wenxin Jiang

Abstract

Bayesian variable selection has gained much empirical success recently in a variety of applications when the number $K$ of explanatory variables $(x_1,\ldots,x_K)$ is possibly much larger than the sample size $n$. For generalized linear models, if most of the $x_j$’s have very small effects on the response $y$, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality $K\gg n$. In this approach a suitable prior can be used to choose a few out of the many $x_j$’s to model $y$, so that the posterior will propose probability densities $p$ that are “often close” to the true density $p^*$ in some sense. The closeness can be described by a Hellinger distance between $p$ and $p^*$ that scales at a power very close to $n^{-1/2}$, which is the “finite-dimensional rate” corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 05-02 (2005) Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.
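To make the rate statement concrete, here is a reader's gloss in standard notation (a sketch under common conventions, not a verbatim statement of the paper's theorem; the precise conditions on the prior and on how $K$ may grow with $n$ are given in the article): with the Hellinger distance defined, under one common normalization, by $d_H(p,p^*)=\big(\int(\sqrt{p}-\sqrt{p^*})^2\,d\mu\big)^{1/2}$, the result says that the posterior probability of the set $\{p: d_H(p,p^*)>\epsilon_n\}$ converges to zero under $p^*$, where the sequence $\epsilon_n$ can be taken proportional to a power of $n$ arbitrarily close to $n^{-1/2}$, even when $K\gg n$.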

Article information

Source
Ann. Statist., Volume 35, Number 4 (2007), 1487–1511.

Dates
First available in Project Euclid: 29 August 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1188405619

Digital Object Identifier
doi:10.1214/009053607000000019

Mathematical Reviews number (MathSciNet)
MR2351094

Zentralblatt MATH identifier
1123.62026

Subjects
Primary: 62F99: None of the above, but in this section
Secondary: 62J12: Generalized linear models

Keywords
convergence rates; generalized linear models; high dimensional data; posterior distribution; prior distribution; sparsity; variable selection

Citation

Jiang, Wenxin. Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities. Ann. Statist. 35 (2007), no. 4, 1487–1511. doi:10.1214/009053607000000019. https://projecteuclid.org/euclid.aos/1188405619


References

  • Bickel, P. J. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, “naive Bayes,” and some alternatives when there are many more variables than observations. Bernoulli 10 989–1010.
  • Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559–583.
  • Clyde, M. and DeSimone-Sasinowska, H. (1997). Accounting for model uncertainty in Poisson regression models: Particulate matter and mortality in Birmingham, Alabama. ISDS Discussion Paper 97-06. Available at ftp.isds.duke.edu/pub/WorkingPapers/97-06.ps.
  • Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
  • Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G. and West, M. (2004). Sparse graphical models for exploring gene expression data. J. Multivariate Anal. 90 196–212.
  • Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. International Congress of Mathematicians 3 595–622. European Math. Soc., Zürich.
  • George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statist. Sinica 7 339–373.
  • Ghosal, S. (1997). Normal approximation to the posterior distribution for generalized linear models with many covariates. Math. Methods Statist. 6 332–348.
  • Ghosal, S. (1999). Asymptotic normality of posterior distributions in high-dimensional linear models. Bernoulli 5 315–331.
  • Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.
  • Greenshtein, E. (2006). Best subset selection, persistence in high-dimensional statistical learning and optimization under $\ell_1$ constraint. Ann. Statist. 34 2367–2386.
  • Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
  • Jiang, W. (2005). Bayesian variable selection for high dimensional generalized linear models. Technical Report 05-02, Dept. Statistics, Northwestern Univ. Available at newton.stats.northwestern.edu/~jiang/tr/glmone2.tr.pdf.
  • Jiang, W. (2006). On the consistency of Bayesian variable selection for high dimensional binary regression and classification. Neural Comput. 18 2762–2776.
  • Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist. 34 837–877.
  • Kohn, R., Smith, M. and Chan, D. (2001). Nonparametric regression using linear combinations of basis functions. Statist. Comput. 11 313–322.
  • Lee, H. K. H. (2000). Consistency of posterior distributions for neural networks. Neural Networks 13 629–642.
  • Lee, K. E., Sha, N., Dougherty, E. R., Vannucci, M. and Mallick, B. K. (2003). Gene selection: A Bayesian variable selection approach. Bioinformatics 19 90–97.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Nott, D. J. and Leonte, D. (2004). Sampling schemes for Bayesian variable selection in generalized linear models. J. Comput. Graph. Statist. 13 362–382.
  • Sha, N., Vannucci, M., Tadesse, M. G., Brown, P. J., Dragoni, I., Davies, N., Roberts, T. C., Contestabile, A., Salmon, M., Buckley, C. and Falciani, F. (2004). Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics 60 812–819.
  • Smith, M. and Kohn, R. (1996). Nonparametric regression using Bayesian variable selection. J. Econometrics 75 317–343.
  • Wang, X. and George, E. I. (2004). A hierarchical Bayes approach to variable selection for generalized linear models. Technical report. Available at www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/wang+george-2004.pdf.
  • Wong, W. H. and Shen, X. (1995). Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Ann. Statist. 23 339–362.
  • Zhang, T. (2006). From $\varepsilon$-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180–2210.
  • Zhou, X., Liu, K.-Y. and Wong, S. T. C. (2004). Cancer classification and prediction using logistic regression with Bayesian gene selection. J. Biomedical Informatics 37 249–259.