Bayesian Analysis

Evolutionary stochastic search for Bayesian model exploration

Leonard Bottolo and Sylvia Richardson

Abstract

Implementing Bayesian variable selection for linear Gaussian regression models to analyse high-dimensional data sets is of current interest in many fields. To make such analysis operational, we propose a new sampling algorithm based on Evolutionary Monte Carlo and designed to work under the "large $p$, small $n$" paradigm, thus making fully Bayesian multivariate analysis feasible, for example, in genetics/genomics experiments. Two real data examples in genomics are presented, demonstrating the performance of the algorithm in a space of up to $10,000$ covariates. Finally, the methodology is compared with recently proposed search algorithms in an extensive simulation study.

Article information

Source
Bayesian Anal. Volume 5, Number 3 (2010), 583-618.

Dates
First available in Project Euclid: 22 June 2012

Permanent link to this document
https://projecteuclid.org/euclid.ba/1340380542

Digital Object Identifier
doi:10.1214/10-BA523

Mathematical Reviews number (MathSciNet)
MR2719668

Zentralblatt MATH identifier
1330.90042

Keywords
Evolutionary Monte Carlo; Fast Scan Metropolis-Hastings scheme; linear Gaussian regression models; variable selection

Citation

Bottolo, Leonard; Richardson, Sylvia. Evolutionary stochastic search for Bayesian model exploration. Bayesian Anal. 5 (2010), no. 3, 583-618. doi:10.1214/10-BA523. https://projecteuclid.org/euclid.ba/1340380542

