Bayesian Analysis

A Two-Component G-Prior for Variable Selection

Hongmei Zhang, Xianzheng Huang, Jianjun Gan, Wilfried Karmaus, and Tara Sabo-Attwood


Abstract

We present a Bayesian variable selection method based on an extension of Zellner’s g-prior in linear models. More specifically, we propose a two-component G-prior, wherein a tuning parameter, calibrated using pseudo-variables, adjusts the distance between the two components. We show that implementing the proposed prior in variable selection is more efficient than using Zellner’s g-prior. Simulation results also indicate that models selected with the two-component G-prior are generally more favorable, with smaller losses, than those selected by the other methods considered in our work. The proposed method is further demonstrated using our motivating gene expression data from a lung disease study and ozone data analyzed in earlier studies.

Article information

Source
Bayesian Anal., Volume 11, Number 2 (2016), 353-380.

Dates
First available in Project Euclid: 5 May 2015

Permanent link to this document
https://projecteuclid.org/euclid.ba/1430830144

Digital Object Identifier
doi:10.1214/15-BA953

Mathematical Reviews number (MathSciNet)
MR3471994

Zentralblatt MATH identifier
1357.62249

Keywords
Bayes factor; measurement error; mean squared loss; pseudo variables; tuning parameter

Citation

Zhang, Hongmei; Huang, Xianzheng; Gan, Jianjun; Karmaus, Wilfried; Sabo-Attwood, Tara. A Two-Component $G$-Prior for Variable Selection. Bayesian Anal. 11 (2016), no. 2, 353--380. doi:10.1214/15-BA953. https://projecteuclid.org/euclid.ba/1430830144


