Bayesian Analysis

A Novel Algorithmic Approach to Bayesian Logic Regression

Aliaksandr Hubin, Geir Storvik, and Florian Frommlet

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has mainly been used to model epistatic effects in genetic association studies, where it is appealing because logic expressions give an intuitive description of interactions between genetic variations. Nevertheless, logic regression has, partly due to computational challenges, remained less well known than other approaches to epistatic association mapping. Here we adapt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC, we perform a comprehensive simulation study that illustrates its performance for logic regression terms of varying complexity. Specifically, GMJMCMC is shown to identify three-way and even four-way interactions with relatively high power, a level of complexity that has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL (quantitative trait locus) mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and for a backcross population in Drosophila, where we identify several interesting epistatic effects. The method is implemented in an R package, which is available on GitHub.
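
To make the notion of predictors built from Boolean combinations of binary covariates concrete, here is a minimal Python sketch of a logic regression linear predictor. The covariate positions, the two logic terms, and the coefficient values are illustrative assumptions and are not taken from the article. The sketch only evaluates a fixed model. It does not implement GMJMCMC or reproduce the authors' R package.

```python
# Illustrative sketch only: terms and coefficients are hypothetical.

def term_three_way(x):
    """Logic term X1 AND X2 AND X3 (a three-way interaction)."""
    return int(x[0] and x[1] and x[2])

def term_or_not(x):
    """Logic term X4 OR (NOT X5)."""
    return int(x[3] or (not x[4]))

def linear_predictor(x, intercept, terms):
    """Return intercept + sum of beta_j * L_j(x) over (beta_j, L_j) pairs."""
    return intercept + sum(beta * term(x) for beta, term in terms)

# Hypothetical model: eta = 0.2 + 1.5 * (X1 & X2 & X3) - 0.7 * (X4 | !X5)
TERMS = [(1.5, term_three_way), (-0.7, term_or_not)]
INTERCEPT = 0.2

# Each row is one observation of five binary covariates.
observations = [
    [1, 1, 1, 0, 1],  # only the three-way term is active
    [0, 0, 0, 0, 0],  # only the OR/NOT term is active
    [1, 1, 1, 1, 0],  # both terms are active
]
etas = [linear_predictor(x, INTERCEPT, TERMS) for x in observations]
```

In a model selection setting, the search is over which logic terms enter the model at all, so the space of candidate terms grows quickly with the number of covariates and the allowed interaction order.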

Article information

Source
Bayesian Anal., Advance publication (2018), 27 pages.

Dates
First available in Project Euclid: 20 December 2018

Permanent link to this document
https://projecteuclid.org/euclid.ba/1545296448

Digital Object Identifier
doi:10.1214/18-BA1141

Keywords
logic regression; Bayesian model averaging; mode jumping Monte Carlo Markov Chain; genetic algorithm; QTL mapping

Rights
Creative Commons Attribution 4.0 International License.

Citation

Hubin, Aliaksandr; Storvik, Geir; Frommlet, Florian. A Novel Algorithmic Approach to Bayesian Logic Regression. Bayesian Anal., advance publication, 20 December 2018. doi:10.1214/18-BA1141. https://projecteuclid.org/euclid.ba/1545296448


Supplemental materials

  • Supplementary Material for: A novel algorithmic approach to Bayesian Logic Regression. https://github.com/aliaksah/EMJMCMC2016/tree/master/supplementaries/Bayesian%20Logic%20Regression.