A novel algorithmic approach to Bayesian Logic Regression

Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has mainly been used to model epistatic effects in genetic association studies, where it is particularly appealing because logic expressions provide an intuitive description of interactions between genetic variants. Nevertheless, logic regression has remained less well known than other approaches to epistatic association mapping. Here we adopt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance for logic regression terms of various complexity. Specifically, GMJMCMC is shown to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila, where we identify several interesting epistatic effects.


Introduction
Logic regression (not to be confused with logistic regression) was developed as a general tool to obtain predictive models based on Boolean combinations of binary covariates (Ruczinski et al., 2003). Its primary application area is epistatic association mapping, as pioneered by Ruczinski et al. (2004) and Kooperberg & Ruczinski (2005), although already early on the method was also used in other areas (Keles et al., 2004; Janes et al., 2005). Important contributions to the development of logic regression were later made by the group of Katja Ickstadt (Fritsch, 2006; Schwender & Ickstadt, 2008), which also provided a comparison of different implementations of logic regression (Fritsch & Ickstadt, 2007). Schwender & Ruczinski (2010) gave a brief introduction with various applications and potential extensions of logic regression. Recently, a systematic comparison of logic regression with a more classical regression approach based on Cockerham's coding (Wang & Zeng, 2009) illustrated the advantages of logic regression for detecting epistatic effects in QTL mapping (Malina et al., 2014). Given the potential of logic regression to detect interpretable interaction effects in a regression setting, it is rather surprising that it has not yet become more widely used in applications. Originally logic regression was introduced together with likelihood based model selection, where simulated annealing served as a strategy to obtain one "best" model (see Ruczinski et al., 2003, for details). However, assuming that there is one "best" model disregards the problem of model uncertainty. Whilst this approach works well in simulation studies, it seems to be quite an unrealistic assumption in real world applications, where there often is no "true" model. Hence Bayesian model averaging, which implicitly takes model uncertainty into account, becomes important.
Bayesian versions of logic regression combined with model exploration include Monte Carlo logic regression (MCLR) (Kooperberg & Ruczinski, 2005) and the full Bayesian version of logic regression (FBLR) by Fritsch (2006). Both MCLR and FBLR use Markov Chain Monte Carlo (MCMC) algorithms to search through the space of models and parameters. Inference is then based on a large number of models instead of just one model as in the original version of logic regression. MCLR utilizes a geometric prior on the size of the model (defined through the number of logic terms and their complexity). All models of the same size get the same prior probability, while larger models are implicitly penalized. Regression parameters are marginalized out, which significantly reduces computational complexity. In contrast, FBLR operates on the joint space of parameters and models. FBLR uses multivariate normal priors for regression parameters, while model size is furnished with a slightly different prior serving similar purposes as the MCLR prior. In case of a large number of binary covariates these MCMC based methods might require extremely long Markov chains to guarantee convergence, which can make them infeasible in practice. Additionally, both of them utilize simple Metropolis-Hastings settings which, together with the fact that the search space is often multimodal, increases the probability that they get stuck in local extrema for a significant amount of time.
In this paper we propose a new approach for Bayesian logic regression including model uncertainty. We introduce a novel prior for the topology of logic regression models which is slightly simpler to compute than the one used by MCLR and which still shows excellent properties in terms of controlling false discoveries. The proposed search algorithm, named GMJMCMC, combines genetic algorithm ideas with the mode jumping Markov Chain Monte Carlo (MJMCMC) algorithm (Hubin & Storvik, 2016a) in order to be able to jump between local modes in the model space.
An important ingredient in the GMJMCMC algorithm is the calculation of integrated (or marginal) likelihoods for given models. To compute marginal likelihoods we make use of the Laplace approximation, thus bypassing the specification of priors for the regression coefficients (where MCLR and FBLR had different assumptions). Using the Laplace approximation allows us to compute model posterior distributions much more easily than with reversible jump MCMC. A similar approximation was suggested in Frommlet et al. (2012) and successfully applied in the context of QTL mapping. The main difference from the previous approaches is a better search strategy for exploring the model space. In more complex settings, when tractable likelihoods are not available and the Laplace approximation does not perform well, MCMC algorithms combined with e.g. Chib's method (Chib, 1995) can be applied, although this is again computationally more expensive than the simple Laplace approximation (see also Friel & Wyse, 2012; Hubin & Storvik, 2016b, for alternative MCMC based methods). For Gaussian latent variables, the computational task can be efficiently solved through the integrated nested Laplace approximation (INLA, Rue et al., 2009; Hubin & Storvik, 2016b), but we will not further pursue this extension of our approach in this paper.
After formally introducing logic regression and describing the GMJMCMC algorithm in detail we will present results from a comprehensive simulation study. The performance of GMJMCMC is compared with MCLR and FBLR in case of logistic models (binary responses) and additionally analyzed for linear models (quantitative responses). Models of different complexities are studied, which allows us to illustrate the potential of GMJMCMC to detect higher order interactions. Finally we apply our logic regression approach to perform QTL mapping using two publicly available data sets. The first study is concerned with hypocotyl length in Arabidopsis thaliana using Recombinant Inbred Line (RIL) data (Balasubramanian et al., 2009), the second one considers various traits from backcross data of Drosophila simulans and Drosophila mauritiana (Zeng et al., 2000).


Methods

Logic regression
The method of logic regression (Ruczinski et al., 2003) was specifically designed for the situation where covariates are binary and predictors are defined as logic expressions operating on these binary variables. Logic regression can be applied in the context of the generalized linear model (GLM) as demonstrated in Malina et al. (2014). It can also be easily expanded to the domain of generalized linear mixed models (GLMM), but to keep our presentation as simple as possible we will focus here on generalized linear regression models.
Consider a response variable Y ∈ R, together with m binary covariates X_1, X_2, ..., X_m. Our primary example will be genetic association studies where, depending on the context, each binary covariate X_j, j ∈ {1, 2, ..., m}, can have a different interpretation. In QTL mapping with backcross design or recombinant inbred lines X_j simply codes the two possible genetic variants. In case of intercross design or in outbred populations different X_j will be used to code dominant and recessive effects (see for example Malina et al., 2014). Each combination of the binary variables X_j with the logical operators ∧ (AND), ∨ (OR) and X^c (NOT X) is called a logic expression (for example L = (X_1 ∧ X_2) ∨ X_3^c). Following the nomenclature of Kooperberg & Ruczinski (2005) we will refer to logic expressions as trees, whereas the primary variables contained in each tree are called leaves. The set of leaves of a tree L will be denoted by v(L), that is for the specified example we have v(L) = {X_1, X_2, X_3}. We will study logic regression models of the general form

g(E(Y)) = β_0 + β_1 L_1 + ... + β_k L_k,    (1)

where L_j represents a logic combination of binaries, g is an appropriate link function and β_j, j ∈ {0, ..., k}, are unknown regression parameters. Our primary examples are linear regression for quantitative responses and logistic regression for dichotomous responses, but the implementation of our approach works for any generalized linear model. For a given model M we will write T(M) = {L_1, ..., L_k} for the set of trees which contribute as regressors. We follow the usual convention in the context of variable selection that 'model' refers to the set of regressors and does not take into account the specific values of the non-zero regression coefficients. Each logic expression L_j ∈ T(M) can contain one or more leaves, where we take C_max as an upper limit for the tree size defined through the number of leaves.
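As an illustration of how logic trees act as regressors, the following sketch encodes trees as nested tuples and evaluates them on a 0/1 design matrix. The encoding is purely illustrative (our own choice for this example) and is not the representation used in any actual logic regression implementation.

```python
import numpy as np

# A leaf is ("x", j, neg), meaning X_j (negated if neg is True);
# an inner node is (op, left, right) with op in {"and", "or"}.

def eval_tree(tree, X):
    """Evaluate a logic tree on a 0/1 design matrix X of shape (n, m)."""
    if tree[0] == "x":
        _, j, neg = tree
        col = X[:, j]
        return 1 - col if neg else col
    op, left, right = tree
    a, b = eval_tree(left, X), eval_tree(right, X)
    return a & b if op == "and" else a | b

# Example tree: L = (X_1 AND X_2) OR (NOT X_3), using 0-based columns 0, 1, 2
L = ("or", ("and", ("x", 0, False), ("x", 1, False)), ("x", 2, True))

X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 0, 0]])
print(eval_tree(L, X))  # [1 0 1]
```

The resulting 0/1 column is then simply another regressor entering the linear predictor of the generalized linear model.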
Our starting point is to formulate a prior for the number of trees in the model. Similarly to the FBLR approach (Fritsch, 2006) we use a uniform distribution on the number of trees, where the maximal number of trees k_max is a parameter to be specified depending on the problem. The choice of a uniform prior distribution can be motivated for example by considering the limiting case of the EBIC criterion (Chen & Chen, 2008) for high dimensional model selection. Different trees are assumed to be a priori independent, giving

P(L_1, ..., L_k) = ∏_{j=1}^{k} P(L_j).    (2)

We then want to make the prior of a tree with s leaves proportional to the inverse of the total number of trees having s leaves, N(s). Note that each tree of size s consists of s leaves. Ignoring logic expressions including the same variable multiple times, there are (m choose s) possibilities to select variables. Each variable can undergo logic negation, giving s binary choices, and furthermore there are s − 1 logic symbols (∨, ∧) to be chosen, resulting in 2^(2s−1) different expressions. However, due to De Morgan's law half of these expressions provide identical logic regression models. This gives

N(s) = (m choose s) · 2^(2s−2).

In combination with the uniform distribution on the number of trees we then obtain the following prior on the model topology:

P(M) ∝ ∏_{j=1}^{k} N(s_j)^(−1) · I(s_j ≤ C_max),    (3)

where s_j is the number of leaves in tree L_j. Here we use the notation I(A) = 1 if A is true and 0 otherwise to define constraints on the maximal number of leaves per tree.
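The counting argument behind N(s) and the resulting (unnormalized) log-prior can be sketched as follows; the function names and the exact form of the log-prior are our own illustration of the formulas above.

```python
from math import comb, log

def num_trees(m, s):
    """Number of distinct logic trees with s leaves out of m variables:
    C(m, s) variable choices, 2^s negation patterns and 2^(s-1) operator
    choices, divided by 2 for De Morgan equivalence (trees reusing a
    variable are ignored, as in the text)."""
    return comb(m, s) * 2 ** (2 * s - 2)

def log_prior(tree_sizes, m, k_max, c_max):
    """Unnormalized log P(M) for a model whose trees have the given leaf
    counts: uniform over the number of trees, independent trees, each tree
    with s leaves having prior proportional to 1/N(s)."""
    if len(tree_sizes) > k_max or any(s > c_max for s in tree_sizes):
        return float("-inf")  # violates the topology constraints
    return -log(k_max + 1) - sum(log(num_trees(m, s)) for s in tree_sizes)

print(num_trees(50, 1))  # 50
print(num_trees(50, 2))  # 4900
```

As intended, larger trees are penalized: with m = 50 a two-leaf tree has prior mass 50/4900 times that of a single leaf.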

Computing posterior probabilities
Given prior probabilities for any logic regression model M, the model posterior probability can be computed according to Bayes formula as

P(M | Y) = P(Y | M) P(M) / Σ_{M′ ∈ Ω} P(Y | M′) P(M′),    (4)

where P(Y | M) denotes the integrated (or marginal) likelihood for model M and Ω is the set of all models in the model space. The sum in the denominator involves a huge number of terms and it is impossible to compute all of them. Classical MCMC based approaches (like MCLR and FBLR) overcome this problem by estimating model posteriors with the relative frequency with which a specific model M occurs in the Markov chain. In case of an ultrahigh-dimensional model space (as for logic regression) this is computationally extremely challenging and might require chain lengths which are prohibitive for practical applications. An alternative approach makes use of the fact that most of the summands in the denominator of (4) will be so small that they can be neglected. Considering a subset Ω* ⊆ Ω containing the most important models we can therefore approximate (4) by

P̃(M | Y) = P(Y | M) P(M) / Σ_{M′ ∈ Ω*} P(Y | M′) P(M′),  M ∈ Ω*.    (5)

To obtain good estimates we have to search the model space for those models that contribute significantly to the sum in the denominator, that is for those models with large posterior probabilities, or equivalently with large values of P(Y | M) P(M). In Frommlet et al. (2012) specific memetic algorithms were developed to perform the model search for linear regression. Here we will rely upon the GMJMCMC algorithm to be described in the next section. To compute the marginal likelihood terms P(Y | M) for M ∈ Ω* we perform a Laplace approximation combined with the arguments which lead to the BIC criterion (Schwarz, 1978), in which case we obtain

log P(Y | M) ≈ log P(Y | M, θ̂_M) − (d_M / 2) log n,    (6)

where P(Y | M, θ̂_M) is the likelihood evaluated at the maximum likelihood estimate θ̂_M of the parameters of model M (the corresponding regression coefficients and possible extra dispersion parameters), d_M is the number of these parameters and n is the number of observations.
In the special case of a linear Gaussian model, −2 log P(Y | M, θ̂_M) = n log RSS_M up to an additive constant not depending on the model, where RSS_M is the residual sum of squares evaluated at the least squares estimates for the given model M.
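A minimal sketch of the approximations (5) and (6) for the Gaussian case, where the maximized log-likelihood reduces (up to a constant) to −(n/2) log RSS_M. The candidate models and data below are hypothetical and serve only to show that the renormalization concentrates mass on the model containing the true tree.

```python
import numpy as np

def log_marglik_bic(y, Z):
    """BIC-style Laplace approximation of log P(Y|M) for a Gaussian linear
    model: maximized log-likelihood (up to a constant) minus (d/2) log n."""
    n, d = Z.shape
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    return -0.5 * n * np.log(rss) - 0.5 * d * np.log(n)

def renormalized_posteriors(log_margliks, log_priors):
    """Renormalize P(Y|M) P(M) over the visited subset of models, as in (5)."""
    lp = np.array(log_margliks) + np.array(log_priors)
    w = np.exp(lp - lp.max())
    return w / w.sum()

# Hypothetical example: the model containing the true tree X_1 AND X_2
# should receive essentially all posterior mass.
rng = np.random.default_rng(0)
n = 500
x = rng.integers(0, 2, size=(n, 3))
y = 1.0 + 2.0 * (x[:, 0] & x[:, 1]) + rng.normal(size=n)
Z_true = np.column_stack([np.ones(n), x[:, 0] & x[:, 1]])  # intercept + true tree
Z_alt = np.column_stack([np.ones(n), x[:, 2]])             # intercept + noise variable
post = renormalized_posteriors(
    [log_marglik_bic(y, Z_true), log_marglik_bic(y, Z_alt)], [0.0, 0.0])
```

Here equal log-priors are used for the two candidate models; in the actual method the topology prior (3) would enter instead.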
Based on model posterior probabilities one can easily obtain an estimate of the posterior probability for a logic expression L to be included in a model (also referred to as the marginal inclusion probability) by

P̃(L | Y) = Σ_{M ∈ Ω*: L ∈ T(M)} P̃(M | Y).    (7)

Inference on trees can then be performed by selecting those trees with a posterior probability larger than some threshold probability π_C.
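Given renormalized model posteriors, the marginal inclusion probability of each tree is obtained by summation over the visited models containing it, as described above. A toy sketch with hypothetical tree labels:

```python
def inclusion_probabilities(models, posteriors):
    """Marginal inclusion probability of each tree: the sum of the
    (renormalized) posteriors of all visited models containing it."""
    probs = {}
    for trees, p in zip(models, posteriors):
        for t in trees:
            probs[t] = probs.get(t, 0.0) + p
    return probs

# Hypothetical visited models (lists of tree labels) and their posteriors
models = [["L1", "L2"], ["L1"], ["L3"]]
posteriors = [0.5, 0.25, 0.25]
print(inclusion_probabilities(models, posteriors))
# {'L1': 0.75, 'L2': 0.5, 'L3': 0.25}
```

Trees whose inclusion probability exceeds the chosen threshold π_C would then be reported.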

The GMJMCMC algorithm
To fix ideas, consider first a variable selection problem with q potential covariates to enter a model. Define γ_j to be 1 if the j-th variable is included in the model and 0 otherwise. A model M is thus specified by the vector γ = (γ_1, ..., γ_q) and the general model space Ω is of size 2^q. If this discrete model space is multimodal in terms of model posterior probabilities, then simple MCMC algorithms typically run into problems by staying too long in the vicinity of local maxima. Recently, the mode jumping MCMC procedure (MJMCMC) was proposed (Hubin & Storvik, 2016a) to overcome this issue. MJMCMC is a proper MCMC algorithm equipped with the possibility to jump between different modes within the discrete model space. The key to the success of MJMCMC is the generation of good proposals of models which are not too close to the current state. This is achieved by first making a large jump (changing many model components) and then performing local optimization within the discrete model space to obtain a proposal model. Within a Metropolis-Hastings setting a valid acceptance probability is then constructed using symmetric backward kernels, which guarantees that the resulting Markov chain is ergodic and has the desired limiting distribution (Hubin & Storvik, 2016a). The MJMCMC algorithm requires that all covariates defining the model space are known in advance and are all considered at each iteration of the algorithm. In case of logic regression the covariates are trees, and a major problem in this setting is that it is quite difficult to fully specify the space Ω. In fact it is even difficult to specify the total number q of feasible trees. To solve this problem we present an adaptive algorithm called Genetically Modified MJMCMC (GMJMCMC), where MJMCMC is embedded in the iterative setting of a genetic algorithm. In each iteration only a given set S of trees (of fixed size d) is considered.
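The core of a mode-jumping proposal (a large jump followed by local optimization) can be sketched as below. This is a heavily simplified illustration: the actual MJMCMC algorithm additionally randomizes around the local optimum and uses a matching backward kernel to obtain a valid Metropolis-Hastings acceptance probability, which is omitted here.

```python
import random

def greedy_local_opt(gamma, score):
    """One greedy pass over the model vector: flip any single bit that
    improves the score."""
    gamma = list(gamma)
    for j in range(len(gamma)):
        cand = gamma.copy()
        cand[j] ^= 1
        if score(cand) > score(gamma):
            gamma = cand
    return gamma

def mode_jump_proposal(gamma, score, n_flips=5):
    """Sketch of an MJMCMC-style proposal: flip several components at once
    (the large jump), then locally optimize the result."""
    jump = list(gamma)
    for j in random.sample(range(len(gamma)), n_flips):
        jump[j] ^= 1
    return greedy_local_opt(jump, score)

# With the toy score "number of included variables", any starting point is
# driven to the all-ones model, i.e. the proposal lands at a distant mode.
print(mode_jump_proposal([0] * 8, sum))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

In the real algorithm the score would be the (approximate) unnormalized log-posterior of the model, not this toy function.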
Each S then becomes a separate search space for MJMCMC. In the language of genetic algorithms S is the population, which dynamically evolves to allow MJMCMC to explore different reasonable parts of the unfeasibly large total search space. The resulting algorithm is similar to feature engineering (Xu et al., 2012) and allows combinations of covariates to be adapted throughout the search.
To be more specific, we consider different populations S_1, S_2, ..., where each S_t is a set of d trees. For each given population a fixed number of MJMCMC steps is performed. Since the MJMCMC algorithm is specified in full detail in Hubin & Storvik (2016a), we will concentrate here on describing the evolutionary dynamics yielding subsequent populations S_t. In principle it is possible to construct a proper MCMC algorithm which aims at simulating from extended models of the form P(M, S | Y) having P(M | Y) as a stationary distribution (to be published in a forthcoming paper). However, utilization of the approximation (5) in combination with (6) allows us to compute posterior probabilities for all models in Ω* which have been visited at least once by the algorithm. This means that we may relax the restriction of detailed balance typically required by MCMC when model posterior probabilities are estimated by the relative frequency with which a model has been visited.
The algorithm is initialized by first running MJMCMC for a given number of iterations N_init on the set of all binary covariates X_1, ..., X_m as potential regressors, but not including any interactions. For later reference we denote the set of all leaves used in this initialization step by S_0. The first d_1 < d members of population S_1 are then defined to be the d_1 leaves with the largest marginal inclusion probabilities; the other members of S_0 are removed into a set D_0. The remaining d − d_1 members of S_1 are obtained by forming logic expressions from these d_1 covariates, where trees are generated randomly by means of the crossover operation described below. After S_1 has been initialized, MJMCMC is performed for a fixed number of iterations N_expl before the next population S_2 is generated, and so on. Members of the new population S_{t+1} for t ≥ 1 are chosen from a neighborhood N_t of the previous population S_t. The d_1 input leaves from the initialization procedure remain in all populations S_t throughout the search. Other trees from the population S_t with low marginal inclusion probabilities (below a threshold ρ_min) will be substituted by random members of N_t to obtain S_{t+1}. New trees are generated as crossovers, mutations or reductions randomly according to a probabilistic model, and their fitness is then computed based on estimates of marginal probabilities of the leaves involved in the tree. This process is iterated for T_max populations S_t, t ∈ {1, ..., T_max}. Further details on GMJMCMC and choices of the tuning parameters in the experimental studies are provided in Section A of the appendix.
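The three tree-generating moves named above can be sketched as follows. The nested-tuple encoding and the exact operator choices are our own illustration of crossover, mutation and reduction, not the operators of the actual implementation.

```python
import random

# Trees encoded as nested tuples: a leaf is ("x", j), an inner node is
# ("and"/"or", left, right); negation is omitted for brevity.

def crossover(t1, t2):
    """Join two existing trees with a random logic operator."""
    return (random.choice(["and", "or"]), t1, t2)

def mutation(tree, m):
    """Replace one randomly chosen leaf by a random variable index below m."""
    if tree[0] == "x":
        return ("x", random.randrange(m))
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutation(left, m), right)
    return (op, left, mutation(right, m))

def reduction(tree):
    """Drop the root operator, keeping one of its subtrees (shrinks the tree)."""
    return tree if tree[0] == "x" else random.choice([tree[1], tree[2]])
```

Crossover grows trees, reduction shrinks them, and mutation explores nearby expressions, so the population can move both towards and away from complex trees.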
Due to our interest in exploring as many unique high quality models as possible, and doing so as fast as possible, running multiple parallel chains is likely to be computationally beneficial compared to running one long chain. The process can be embarrassingly parallelized into B chains using several CPUs, GPUs or clusters. If one is mainly interested in model probabilities, then equation (5) can be directly applied with Ω* now being the set of unique models visited within all runs. However, we suggest a more memory efficient approach. If some statistic ∆ is of interest, one can utilize the following posterior estimates based on weighted sums over individual runs:

P̃(∆ | Y) = Σ_{b=1}^{B} w_b P̃_b(∆ | Y).

Here w_b is a set of weights to be specified and P̃_b(∆ | Y) are the posteriors obtained with formula (5) from run b of GMJMCMC (details on these weights and the posteriors of the runs are provided in Section A.2 of the appendix).
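Aggregation over parallel runs can be sketched as a weighted sum of per-run estimates. The uniform weights below are a placeholder for the weights w_b discussed in Section A.2 of the appendix, and the per-run dictionaries are hypothetical.

```python
def combine_runs(run_posteriors, weights=None):
    """Combine per-run posterior estimates (e.g. inclusion probabilities
    per tree) into one weighted estimate."""
    B = len(run_posteriors)
    if weights is None:
        weights = [1.0 / B] * B  # placeholder for the w_b of the text
    combined = {}
    for w, post in zip(weights, run_posteriors):
        for key, p in post.items():
            combined[key] = combined.get(key, 0.0) + w * p
    return combined

# Two hypothetical parallel runs reporting inclusion probabilities
runs = [{"L1": 1.0, "L2": 0.5}, {"L1": 0.5}]
print(combine_runs(runs))  # {'L1': 0.75, 'L2': 0.25}
```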

Experiments

Simulation study
The GMJMCMC algorithm was evaluated in a simulation study divided into two parts. The first part considered three scenarios with binary responses and the second part three scenarios with quantitative responses. For each scenario we generated N = 100 datasets according to a model of the form (1) with n = 1000 observations and p = 50 binary covariates. The covariates were assumed to be independent and were simulated for each simulation run as X_j ∼ Bernoulli(0.3) for j ∈ {1, ..., 50} in the first two scenarios and as X_j ∼ Bernoulli(0.5) for j ∈ {1, ..., 50} in the last four scenarios. The Abel cluster (http://www.uio.no/english/services/it/research/hpc/abel/) with 16 dual Intel E5-2670 (Sandy Bridge, 2.6 GHz) CPUs and 64 GB RAM under 64 bit CentOS-6 was used for all computations. It is a shared resource for research computing capable of 258 TFLOP/s theoretical peak performance.

Binary responses. The responses of the first three scenarios were sampled as Bernoulli variables with individual success probability π specified as follows: The first two scenarios with models including only two-way interactions were copied from Fritsch (2006), except that we deliberately did not specify the trees in lexicographical order. The reason for this is that for some procedures (like stepwise search) it might be an algorithmic advantage if the effects are specified in a particular order. The second scenario is slightly more challenging than the first one due to the smaller effect sizes. The third scenario is even more demanding, with a model including three-way and four-way interactions. Effect sizes were accordingly increased to give sufficient power to detect these higher order trees.
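The data generating scheme of the binary-response scenarios can be sketched as follows. The particular trees, intercept and effect sizes below are hypothetical placeholders, not the values used in the actual scenarios.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 1000, 50

# Independent binary covariates, here Bernoulli(0.3) as in the first two scenarios
X = rng.binomial(1, 0.3, size=(n, m))

# Hypothetical logistic scenario with two two-way interaction trees;
# the intercept and effect sizes are illustrative only.
L1 = X[:, 0] & X[:, 3]
L2 = X[:, 7] & X[:, 10]
eta = -1.0 + 1.5 * L1 + 1.5 * L2
prob = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, prob)
```

For the quantitative-response scenarios one would instead add Gaussian noise to the linear predictor.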
For the binary response scenarios GMJMCMC was compared with FBLR and MCLR. For all three algorithms we predefined C_max = 2 leaves per tree for Scenarios 1 and 2 and C_max = 5 for Scenario 3. The maximal number of trees per model was set to k_max = 15 for GMJMCMC and FBLR, whereas for MCLR it is only possible to specify a maximum of k_max = 5. This is apparently due to the complexity of prior computations in MCLR. Apart from the specification of C_max and k_max we used the default priors of all three algorithms.
The GMJMCMC was run until up to 1.6 × 10^6 models were visited in the first two scenarios and up to 2.7 × 10^6 models for the third scenario (divided approximately equally over 32 parallel runs). The length of the Markov chains for FBLR and MCLR was chosen to be 2 × 10^6 for the first two scenarios and 3 × 10^6 for the third scenario.

Table 1: Results for the three simulation scenarios for binary responses. Power for individual trees, overall power, expected number of false positives (FP) and FDR are compared between FBLR, MCLR and parallel GMJMCMC (GMJ in the table). All algorithms were tuned to use approximately the same computational resources. In case of MCLR we can only provide upper bounds for the power and lower bounds for FP. We also report the total number of wrongly detected leaves (WL) over all simulation runs.

To evaluate the performance of the different algorithms we estimated the following metrics: Individual power - the power to detect a particular true tree (a tree from the data generating model); Overall power - the average power over all true trees; FP - the expected number of false positive trees; FDR - the false discovery rate of trees; WL - the total number of wrongly detected leaves. Further computational details are given in Section B.1 of the appendix.
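These metrics can be computed from the sets of detected and true trees as sketched below. Counting only an exact match as a true positive, and reporting the FDR as the mean false discovery proportion over runs, is a simplification of the evaluation described in Section B.1.

```python
def evaluate_run(detected, true_trees):
    """True positives: detected trees exactly matching a data generating
    tree; everything else counts as a false positive."""
    tp = sum(1 for t in detected if t in true_trees)
    return tp, len(detected) - tp

def summarize(runs, true_trees):
    """Overall power, expected number of false positives and FDR (here the
    mean false discovery proportion) over all simulation runs."""
    n = len(runs)
    tp_total = fp_total = 0
    fdr_sum = 0.0
    for detected in runs:
        tp, fp = evaluate_run(detected, true_trees)
        tp_total += tp
        fp_total += fp
        fdr_sum += fp / max(tp + fp, 1)
    return tp_total / (n * len(true_trees)), fp_total / n, fdr_sum / n

# Two hypothetical runs against the true trees {"A", "B"}
print(summarize([{"A", "B"}, {"A", "C"}], {"A", "B"}))  # (0.75, 0.5, 0.25)
```

In practice the tree labels would be canonical representations of logic expressions, so that logically equivalent detections compare equal.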
A summary of the results for the first three simulation scenarios is provided in Table 1. In all three scenarios, MCLR performed better than FBLR, even when taking into account the positively biased summary statistics of MCLR (see Section B.1 in the appendix). On the other hand, GMJMCMC clearly outperformed MCLR and FBLR both in terms of power and in terms of controlling the number of false positives. In the first two scenarios GMJMCMC worked almost perfectly. In the few instances where it did not detect the true tree it reported instead the two corresponding main effects. FBLR and MCLR were also good at detecting the true leaves in these simple scenarios, but GMJMCMC was much better at identifying the exact logical expressions.
The third scenario is more complex than the previous ones but nevertheless GMJMCMC performed almost perfectly. Both FBLR and MCLR had severe problems to detect the true logic expressions and they also reported a considerable number of wrongly detected leaves. For a more in depth discussion of these simulation results we refer to Section B.1 of the appendix.
Continuous responses. Responses were simulated according to a Gaussian distribution with error variance σ² = 1 and individual expectations specified as follows for the different scenarios: Scenario 4 is similar to the first two scenarios for binary responses and contains only two-way interactions. The models of the last two scenarios both include trees of sizes 1 to 4, where Scenario 5 has one tree of each size. Scenario 6 is the most complex one with two trees of each size, resulting in a model with 20 leaves in total.
For the scenarios with Gaussian observations we could only study the performance of GMJMCMC, since the other approaches cannot handle continuous responses (MCLR has an implementation, but it does not work properly). For these scenarios the settings of GMJMCMC were adapted to the increasing complexity of the model. We used k_max = 15, 20 and 40 for the three scenarios, respectively, thus allowing for models of at least twice the size of the data generating model. Furthermore, the total number of models visited by GMJMCMC before it stopped was increased to 3.5 × 10^6 for Scenario 6. C_max was set to 5 for all three of these scenarios. Otherwise all parameters of GMJMCMC were set as described for the binary responses. Table 2 summarizes the results. Scenario 4 illustrates that given a sufficiently large sample size GMJMCMC can reliably detect two-way interactions with effect sizes smaller than one standard deviation. In this simple scenario even the type I error was almost perfectly controlled, with a false discovery rate of 0.005 over all 100 simulation runs. This corresponds to one false discovery, which interestingly was of the form (X_1 ∧ X_4) ∨ (X_8 ∧ X_11), which is equal to L_3 ∨ L_2 (here the index of a tree corresponds to its order in the data generating model). One might argue to what extent such a combination of trees should actually be counted as a false positive, a question which is further elaborated in Section B.2 of the appendix.
The remaining two scenarios are considerably more complex due to the higher order interaction terms involved. In Scenario 5 the power for all four true trees was also very large, though (as expected) slightly smaller for the four-way interaction. Here the probability of a type I error increased, with an FDR of 6%. However, the majority of false positive results corresponded to subtrees of true trees, and over all simulation runs there were only 2 wrongly detected leaves.

Table 2: Results for the three simulation scenarios for linear regression. Power for individual trees, overall power, expected number of false positives (FP) and FDR are given for parallel GMJMCMC. The four expressions in brackets for Scenario 6 are explained in the text.
For the last scenario we again observed large power for all true trees up to order three. For the final two expressions L_7 and L_8 of order four the results became slightly more ambiguous, with power estimated at 0.32 and 0.21, respectively. However, among the false positive detections we very often found the expressions X_11 ∧ X_13 and X_19 ∧ X_50 as well as X_11 ∧ X_13 ∧ X_19 ∧ X_50. In fact, in 72 simulation runs all three of these expressions were detected, and due to the fact that L_8 = (X_11 ∧ X_13) + (X_19 ∧ X_50) − (X_11 ∧ X_13 ∧ X_19 ∧ X_50) one might consider these as true positives. The numbers in parentheses in Table 2 were based on taking such similarities into account, resulting in much higher power. Among the remaining 205 false positive detections more than two thirds were subtrees of true trees or trees with misspecified logical operators but consisting of leaves corresponding to a true tree. Thus again the vast majority of false detections point towards true epistatic effects where the exact logic expression was not identified.
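The identity used above, which expresses the OR of two conjunctions via inclusion-exclusion, can be verified by exhaustive enumeration of the 16 possible truth assignments:

```python
from itertools import product

# Verify that (X11 AND X13) OR (X19 AND X50) coincides with
# X11*X13 + X19*X50 - X11*X13*X19*X50 on all 0/1 assignments, which is
# why detecting the two conjunctions together with the four-way
# conjunction can be read as a detection of L_8.
for a, b, c, d in product([0, 1], repeat=4):
    assert ((a and b) or (c and d)) == a * b + c * d - a * b * c * d
print("identity holds for all 16 assignments")
```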
Section B.2 of the appendix provides more details about the simulation results for the three linear regression scenarios.

Real data analysis
Arabidopsis. Balasubramanian et al. (2009) mapped several quantitative traits (responses) in Arabidopsis thaliana using an advanced intercross recombinant inbred line (RIL) population. Their data is publicly available as supporting information of their PLOS ONE article (Balasubramanian et al., 2009), which also gives all the details of the breeding scheme and the measurement of the different traits. We consider here only the hypocotyl length in mm under different light conditions.
Genotype data is available for 220 markers distributed over the 5 chromosomes of Arabidopsis thaliana with 61, 39, 43, 31 and 46 markers, respectively. Balasubramanian et al. (2009) had genotyped 224 markers, but we discarded 4 markers whose genotypes were identical to those of other markers. The amount of missing genotype data is relatively small, with a genotyping rate of 93.9%, and most importantly the data contains only homozygotes (AA: 49.6% vs. BB: 50.4%). This means that the RIL population contains no heterozygous markers and logic regression can be directly applied using the genotype data as Boolean variables. Missing data were imputed using the R/qtl package (http://www.rqtl.org/).
The imputed data was then analyzed with our algorithm GMJMCMC to detect potential epistatic effects, and the results are summarized in Table 3. Under blue light Balasubramanian et al. (2009) reported 4 potential QTLs, the strongest one on chromosome 4 in the region of marker X44606688 and three further fairly weak QTLs on chromosomes 2, 3 and 5. Our analysis based on logic regression confirmed X44606688 and also detected those markers on chromosomes 2 and 5, though with a posterior probability slightly below 0.5. There was also some indication of a two-way interaction between the strong QTL on chromosome 4 and the QTL on chromosome 2.
Under red light the original interval mapping analysis reported the region of MSAT2.36 as a strong QTL on chromosome 2 and x44607889 as a weaker QTL on chromosome 1. Our logic regression analysis distributed the marker posterior weights over three different markers on chromosome 2 which are all in the neighborhood of MSAT2.36. Additionally there is some rather small posterior probability for an epistatic effect between this region and a marker on chromosome 1 which is somewhat close to x44607889.
Finally, both for far red light and for white light our analysis essentially yielded the same results as the interval mapping analysis, when observing that under the first condition the posterior probability was again almost equally distributed among the markers neighboring MSAT4.37. In summary, the sample size in this data set might be slightly too small to detect epistatic effects, although under the first two light conditions there was at least some indication of a two-way interaction.
Drosophila. As a second real data set we considered the Drosophila backcross data from Zeng et al. (2000), which also provides a linkage map in centiMorgan for the markers on three different chromosomes. There are five quantitative traits available for each species (abbreviated as pc1, adjpc1, area, areat and tibia) which quantify the size and shape of the posterior lobe of the male genital arch. The original publication (Zeng et al., 2000) only includes results on the first measure pc1, which was later analyzed for epistatic effects using a model selection approach based on the Cockerham coding (Bogdan et al., 2008).
Compared with the Arabidopsis example this backcross data set has a much larger sample size combined with a smaller number of genetic markers, both of which help to increase the power to detect QTL. Genotype data from 45 markers is available for 471 samples from Drosophila simulans and 491 samples from Drosophila mauritiana. Six markers are located on chromosome X, 16 markers on chromosome 2 and 23 markers on chromosome 3. Imputation of the few missing genotypes was performed by a simple maximum likelihood approach based on flanking markers. More details on the experiments and the measured traits can be found in Zeng et al. (2000). Table 4 reports trees with posterior probabilities larger than 0.3 for the trait pc1 of Drosophila simulans and compares them with the model obtained with mBIC based forward selection by Bogdan et al. (2008). The logic regression approach detected most of the main effects previously reported, which in itself is quite interesting: because we allowed for higher order interactions, we searched a much larger model space and therefore implicitly used larger penalties than mBIC. In two locations GMJMCMC preferred a neighboring marker (cg instead of gpdh on chromosome 2 and hb instead of rox on chromosome 3). In one region on chromosome 3 mBIC selected two markers (rdg, ninaE), whereas GMJMCMC selected only one marker in the middle. These kinds of discrepancies are quite natural due to marker correlations in backcross data (Bogdan et al., 2008). Just like the mBIC approach, we detected a two-way interaction between chromosome 2 and chromosome 3, where at both locations the two methods chose neighboring markers, respectively. Otherwise the epistatic effect detected with both methods is identical. The detailed results for both Drosophila simulans and Drosophila mauritiana for all available traits (except the case addressed in this section) are presented in Section C of the appendix.

Discussion
We have introduced GMJMCMC as a novel algorithm to perform Bayesian logic regression and compared it with the two existing methods MCLR (Kooperberg & Ruczinski, 2005) and FBLR (Fritsch, 2006). The main advantage of GMJMCMC is that it is designed to identify more complex logic expressions than its predecessors. Our approach differs both in terms of prior assumptions and in algorithmic details. Concerning the prior of regression coefficients, in this article we relied upon the Laplace approximation; thus, similarly to MCLR, we do not need to specify any priors for the coefficients. The implementation of GMJMCMC allows one either to work with conjugate priors or to make use of INLA (Rue et al., 2009; Hubin & Storvik, 2016a,b) when more general priors are desired or more accurate calculations of marginal likelihoods are needed. However, we believe that detecting more complex interactions in any case requires a rather large sample size, and consequently Laplace approximations will work reasonably well.
GMJMCMC has the capacity to explore a much larger model search space than MCLR and FBLR because it efficiently avoids getting stuck in local extrema, a problem that both MCLR and FBLR share. In logic regression the marginal posterior probability function is typically multi-modal in the space of models, with a large number of extrema which are often rather sparsely located. Additionally, the search space for logic regression is extremely large; even computing the total number of models is a non-trivial task. As discussed in more detail in Hubin & Storvik (2016a), in such a setting simple MCMC algorithms often get stuck in local extrema, which significantly slows down their performance, and convergence might only be reached after run times which are infeasible in practice.
The success of GMJMCMC relies upon resolving the local extrema issue, which is mainly achieved by combining the following two ideas. First, when iterating through a fixed search space S, GMJMCMC utilizes the MJMCMC algorithm (Hubin & Storvik, 2016a) which was specifically constructed to explore multi-modal regression spaces efficiently. Second, the evolution of the search spaces is governed within the framework of a genetic algorithm where a population consists of a finite number of trees forming the current search space. The population is updated by discarding trees with low estimated marginal posterior probability and generating new trees with a probability depending on the approximations of marginal inclusion probabilities from the current search space. The aim of the genetic algorithm is to converge towards a population which includes the most important trees. Finally the performance of GMJMCMC is additionally boosted by running it in parallel with different starting points.
Irreducibility of the proposals, both for search spaces and for models within the search spaces, guarantees that asymptotically the whole model space will be explored by GMJMCMC and the global extrema will at some point be reached. Clearly the genetic algorithm used to update search spaces results in a Markov chain of model spaces. In the future it will be interesting to generalize the mode jumping ideas from Hubin & Storvik (2016a) to the Markov chain of search spaces, making it converge to the right limiting distribution in the joint space of models, parameters and search spaces, whilst retaining the property of not getting stuck in local modes.
Simulation studies confirm the ability of GMJMCMC to find true logical expressions with high power and low false discovery rate, whilst the real data examples demonstrate its capability to find interesting epistatic effects in QTL analysis. The current implementation makes it more likely to include a set of simpler trees than more complicated ones and does not properly take into account that a complex tree can be represented in several equivalent ways. Taking such aspects into account can potentially give further improvements.
The package implementing both MJMCMC and GMJMCMC is freely available on GitHub at http://aliaksah.github.io/EMJMCMC2016/. There one can also find other application examples of logic regression, including prediction-driven ones. In the future we are going to extend GMJMCMC to non-linear regression settings more general than logic regression, in order to address cases with continuous input covariates. Another direction of further research is the notion of true and false positives in the context of logic regression.

APPENDIX A Details of GMJMCMC implementation
A.1 Operators of the genetic algorithm

Let D_t be the set of trees to be deleted from S_t (selected with respect to the threshold ρ_min on estimated marginal inclusion probabilities). Then ||D_t|| replacement trees must be generated instead. Each replacement tree is generated randomly by a crossover operator with probability P_c and by a mutation operator with probability P_m = 1 − P_c. A reduction operator is applied if mutation or crossover gives a tree of larger size than C_max. We describe here how exactly the ||D_t|| replacement trees from the neighborhood N_t of S_t are generated. The variable d_1 used below is specified as the number of input leaves whose approximated marginal inclusion probability (obtained from the first run of MJMCMC) lies above ρ_min.
Crossover Two parent trees from S_t are selected with probabilities proportional to the approximated marginal inclusion probabilities of the trees in S_t. Then each of the parents is inverted with probability P_not by the logical complement operator (superscript c), before they are combined with an ∧ operator with probability P_and and with an ∨ operator otherwise.
Mutation One parent tree is selected from S_t with probability proportional to the approximated marginal inclusion probabilities of the trees in S_t, whilst the other parent tree is selected uniformly from the set of m − d_1 leaves which did not make it into the initial population S_1. Then, just like for the crossover operator, each of the parents is inverted with probability P_not by the logical complement operator (superscript c), before they are combined with an ∧ operator with probability P_and and with an ∨ operator otherwise.
Reduction A new tree is generated from a tree by deleting a subset of leaves, where each leaf has a probability of 1/2 of being deleted. The pruning of the tree is performed in a natural way, meaning that the logical operators 'closest' to the deleted leaves are also deleted. If a deleted leaf is not on the boundary of the original tree, the operation results in two separate subtrees. The resulting subtrees are then combined into a tree with an ∧ operator with probability P_and or with an ∨ operator otherwise.
For all three operators it holds that if the newly generated tree is already present in S_t then it is not considered for S_{t+1}; instead a new replacement tree from the neighborhood N_t of S_t is proposed.
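As an illustration, the three operators above can be sketched in a few lines of code. The tree encoding, parameter values and function names below are our own illustrative choices, not the internals of the EMJMCMC package; in particular the reduction step is simplified to rebuilding a tree from the surviving leaves rather than pruning operators in place.

```python
import random

# Illustrative sketch of the replacement-tree operators (not the package's
# internals). A tree is a nested tuple: a leaf is an int (covariate index),
# ('not', t) is the logical complement, ('and'/'or', t1, t2) a combination.

P_NOT, P_AND, C_MAX = 0.2, 0.5, 5   # example values for P_not, P_and, C_max

def leaves(tree):
    """Set of leaf indices occurring in a tree."""
    if isinstance(tree, int):
        return {tree}
    return set().union(*(leaves(t) for t in tree[1:]))

def maybe_not(tree, rng):
    return ('not', tree) if rng.random() < P_NOT else tree

def combine(t1, t2, rng):
    return ('and' if rng.random() < P_AND else 'or', t1, t2)

def crossover(pop, incl_probs, rng):
    # Both parents drawn proportionally to their marginal inclusion probs.
    p1, p2 = rng.choices(pop, weights=incl_probs, k=2)
    return combine(maybe_not(p1, rng), maybe_not(p2, rng), rng)

def mutation(pop, incl_probs, unused_leaves, rng):
    # One parent from the population, one uniformly among the unused leaves.
    p1 = rng.choices(pop, weights=incl_probs, k=1)[0]
    p2 = rng.choice(unused_leaves)
    return combine(maybe_not(p1, rng), maybe_not(p2, rng), rng)

def reduction(tree, rng):
    # Simplified pruning: keep each leaf with probability 1/2, recombine.
    kept = [x for x in sorted(leaves(tree)) if rng.random() < 0.5]
    if not kept:
        return rng.choice(sorted(leaves(tree)))
    new = kept[0]
    for leaf in kept[1:]:
        new = combine(new, leaf, rng)
    return new

def propose_replacement(pop, incl_probs, unused_leaves, rng, p_c=0.9):
    tree = (crossover(pop, incl_probs, rng) if rng.random() < p_c
            else mutation(pop, incl_probs, unused_leaves, rng))
    while len(leaves(tree)) > C_MAX:    # apply reduction until small enough
        tree = reduction(tree, rng)
    return tree
```

A proposed tree already present in the current population would then be rejected and a fresh replacement drawn, exactly as described above.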

A.2 Details of weights and posteriors of parallel runs of GMJMCMC
Posterior marginal probabilities of models of interest from individual runs b are computed as follows:

P̃_b(M | Y) = P(Y | M)P(M) / Σ_{M′ ∈ Ω*_b} P(Y | M′)P(M′),  for M ∈ Ω*_b,

where Ω*_b is the set of unique models visited in run b. Similarly, the posterior probability of some quantity ∆ is computed by

P̃(∆ | Y) = Σ_{b=1}^B w_b P̃_b(∆ | Y),  with  P̃_b(∆ | Y) = Σ_{M ∈ Ω*_b} P(∆ | M, Y) P̃_b(M | Y).

Due to the irreducibility of the GMJMCMC procedure, lim_{k→∞} P̃(∆ | Y) = P(∆ | Y) holds, where k is the number of iterations. Thus for any set of normalized weights the approximation P̃(∆ | Y) converges to the true posterior probability P(∆ | Y). Therefore in principle any normalized set of weights w_b would work, for example w_b = 1/B. However, uniform weights have the disadvantage of potentially giving too much weight to posterior estimates from chains that have not quite converged. As a heuristic improvement, w_b is chosen to be proportional to the posterior mass detected by run b,

w_b = s_b / Σ_{b′=1}^B s_{b′},  where  s_b = Σ_{M ∈ Ω*_b} P(Y | M)P(M).

This choice indirectly penalizes chains that cover smaller portions of the model space. When estimating posterior probabilities using these weights we only need to store, for each run, the following quantities: P̃_b(∆ | Y) for all statistics ∆ of interest and s_b as a 'sufficient' statistic of the run. There is no further need of data transfer between processes.
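The weighting scheme can be illustrated with a small numerical sketch. The numbers below are invented purely for illustration; the point is that a chain which detected almost no posterior mass (here run 3) is automatically down-weighted, whereas uniform weights w_b = 1/B would let it drag the pooled estimate down.

```python
# Toy sketch of the run-weighting heuristic: each run b only has to report
# its local estimate of P(Delta | Y) and s_b, the total un-normalized
# posterior mass it detected; the pooled estimate uses weights w_b ∝ s_b.

def pooled_posterior(run_estimates, run_masses):
    total = sum(run_masses)
    return sum((s / total) * p for s, p in zip(run_masses, run_estimates))

# Invented numbers: run 3 barely explored the model space.
estimates = [0.62, 0.58, 0.10]          # local estimates of P(Delta | Y)
masses = [4.0e-3, 5.0e-3, 1.0e-5]       # detected posterior masses s_b

print(round(pooled_posterior(estimates, masses), 3))   # 0.597
print(round(sum(estimates) / len(estimates), 3))       # uniform weights: 0.433
```

Only the scalar pair (P̃_b(∆ | Y), s_b) travels between processes, which is what makes the parallel runs cheap to combine.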

A.3 Tuning parameters
Default tuning parameters of MJMCMC (aliaksah.github.io/EMJMCMC2016/) were used in all of the simulations and in the real data analysis. For the final population S_{T_max}, MJMCMC is run until M_fin unique models are visited (within S_{T_max}). M_fin should be sufficiently large to obtain good MJMCMC-based approximations of the posterior parameters of interest based on the final search space S_{T_max}. Table A1 presents the tuning parameters of GMJMCMC (not related to MJMCMC but rather to the genetic algorithm part) which were used in the different simulation scenarios and for the real data analysis, respectively.

Table A1: Tuning parameters of GMJMCMC in the different examples (simple digits refer to the simulation scenario, RD1 refers to the Arabidopsis data analysis, RD2 to the Drosophila data analysis); Threads - the number of CPUs utilized within the examples; N_init - the number of steps of MJMCMC during initialization; N_expl - the number of steps of MJMCMC between changes of population; M_fin - the number of unique models visited by MJMCMC for the final population; T_max - index of the final population; ρ_min - threshold for the trees to be deleted; P_and - probability of an ∧ operator in crossovers and mutations; P_not - probability of using the logical complement in crossovers and mutations; P_c - probability of crossover to propose replacement trees; C_max - maximal tree size allowed; k_max - maximal number of trees allowed in a model; d - size of the population of trees (search space).
Example | Threads | N_init | N_expl | M_fin | T_max | ρ_min | P_and | P_not | P_c | C_max | k_max | d

B Details of Simulation Results
In this section we present further information on the simulation results of our six scenarios.

B.1 Binary Response
In case of GMJMCMC and FBLR a tree was counted as detected if its corresponding posterior probability was larger than 0.5. The power to detect a true tree is estimated by the percentage of simulation runs in which it was detected. The overall power is then defined as the average power over all individual true trees. A detected tree was counted as a true positive if it was logically equivalent to a tree from the data generating model or to its logical complement; otherwise it was counted as a false positive. FP denotes the number of false positive detections averaged over simulation runs, and FDR was estimated as the proportion of false discoveries averaged over simulation runs, where this proportion was defined to be zero if there were no detections at all. WL is the number of binary covariates (leaves) which were not part of the data generating model but were part of at least one detected tree. Unfortunately, the output delivered by MCLR does not allow computing the performance measures in the same way. Whenever MCLR detects a tree of size s, all its subtrees are also reported as detected. Furthermore MCLR reports for each detected tree only the set of leaves v(L) and not the exact logical expression L itself. Thus it becomes impossible to define true positives by comparing the reported trees directly with the trees from the data generating model. Instead we considered for MCLR a reported tree L as a true positive whenever v(L) coincided with the set of leaves of a true tree. This definition only gives an upper bound for the achieved power and is strongly biased in favor of MCLR. For the same reason, any reported tree that was a subtree of a true tree was not considered to be a false positive, resulting in only lower bounds for FP and FDR which are again strongly biased in favor of MCLR. Table A2 gives details about the frequencies of trees detected by the different methods.
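For concreteness, the exact-tree performance measures used for GMJMCMC and FBLR can be sketched as follows. The helper below assumes that detected and true trees are given as canonical strings so that logical equivalence reduces to identity; in the actual study equivalence up to Boolean algebra and complementation was checked.

```python
# Sketch of power, FP and FDR under the 'exact tree' definition; trees are
# assumed to be pre-canonicalized strings so equivalence is plain equality.

def summarize(runs, true_trees):
    """runs: list of sets of detected trees, one set per simulation run."""
    n = len(runs)
    power = {t: sum(t in run for run in runs) / n for t in true_trees}
    fp_counts = [sum(d not in true_trees for d in run) for run in runs]
    fdrs = [fp / len(run) if run else 0.0 for fp, run in zip(fp_counts, runs)]
    overall_power = sum(power.values()) / len(true_trees)
    return overall_power, sum(fp_counts) / n, sum(fdrs) / n

# Three toy runs with one true tree, detected in runs 1 and 2 only:
# overall power 2/3, FP 1/3, FDR 1/6.
runs = [{'X1^c&X4', 'X8'}, {'X1^c&X4'}, set()]
print(summarize(runs, {'X1^c&X4'}))
```

The convention that FDR is zero for runs with no detections at all is the one stated above.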
The first three lines give for each scenario the frequency with which the three true trees of that scenario were detected. Here L_j, j ∈ {1, 2, 3}, refers to the following expressions:

Scenarios 1 and 2: L_1 = X_1^c ∧ X_4, L_2 = X_8 ∧ X_11, L_3 = X_5 ∧ X_9

Scenario 3: L_1 = X_2 ∧ X_9, L_2 = X_7 ∧ X_12 ∧ X_20, L_3 = X_4 ∧ X_10 ∧ X_17 ∧ X_30

All further detected trees are by definition false positives. However, we considered different classes of false positives. The first class consists of trees comprised exclusively of leaves from a single true tree L_j, typically subtrees or trees with a different logic expression. Based on the output of MCLR it is not possible to determine the frequencies of this kind of false positive detection, as we discuss in the next paragraph. In case of FBLR and GMJMCMC, Table A2 provides the frequency of this kind of tree in the rows labeled v(L_j). For Scenarios 1 and 2 we actually provide more detailed information. Here all true trees are of size 2 and almost all detected trees of the class v(L_j) consisted of single leaves (the only exception was two instances of the expression X_8^c ∧ X_11 in Scenario 1 for FBLR). We therefore explicitly present the number of detections of the first leaf and the second leaf of L_j. The v(M) rows give the number of trees combining leaves from different true trees. Finally the rows WL(s) are concerned with the number of trees which include s leaves which were not in the data generating model at all.
In case of MCLR the same sort of classification is not possible because MCLR does not report the exact logical tree L that it detects but only the corresponding set of leaves v(L). Furthermore MCLR automatically reports the set of leaves for all subtrees of any detected tree, which makes it impossible to assess how often these subtrees were actually detected by MCLR. As a consequence we simply discarded reported subtrees when computing summary statistics, with one exception. In case of Scenario 3 MCLR reported 40 supertrees (trees for which a tree of interest is a subtree) of L_1, which we classified as false positives themselves but which in principle play an important role for the determination of the power to detect L_1. We ignored the fact that for any detected supertree of L_1 MCLR automatically also reports L_1 itself as detected, and we pretended that in all these cases MCLR would actually have detected L_1 itself. Another peculiarity of MCLR is that it allows searching for trees of size 4, but it does not report whether it detected any such trees. In case of the four-way interaction L_3 from Scenario 3 there were 19 simulation runs where MCLR reported all four subtrees of size 3 of L_3, and we counted those instances as true positives, although MCLR did not really report the correct four-way interaction. For Scenarios 1 and 2 none of these problems with supertrees occurred for MCLR because we restricted the search to trees of maximal size 2 in accordance with the data generating model.
The first two scenarios include only two-way interactions and we observe that GMJMCMC worked almost perfectly. In the few instances where it did not detect the correct tree it reported instead the two corresponding main effects, resulting in a total of 25 and 38 false positive trees for the two scenarios (corresponding to an average of 0.25 and 0.38 false positives per simulation run, see Table 1 of the main manuscript). FBLR chose in almost two thirds of the simulation runs two main effects instead of the correct interactions. The majority of the remaining false positives combined leaves from different true trees, but there was also one expression with a wrongly detected leaf in each scenario. In contrast MCLR reported in approximately two thirds of the cases trees with the correct leaves, resulting in larger power than for FBLR. On the other hand MCLR reported a much larger number of trees which combined leaves from different true trees than FBLR. MCLR reported only one tree with a wrong leaf in Scenario 2 and no such tree in Scenario 1. In summary we conclude that all three methods did extremely well in detecting the correct leaves in these simple scenarios, but GMJMCMC was better than FBLR and MCLR in identifying the exact logical expressions.
The conclusion above is even more pronounced in the third scenario, which is more complex than the previous scenarios but still allows GMJMCMC to perform almost perfectly. It detected both the two-way interaction L_1 and the four-way interaction L_3 with a power of 100%, and had only some minor difficulties in detecting the three-way interaction L_2. Of the 15 false positive detections the majority consisted of subtrees of L_2 reported in those simulation runs where L_2 itself was not detected. Five trees were combinations of leaves from different true trees and there was only one tree including a leaf which was not part of the data generating model. In comparison, both MCLR and FBLR performed much worse and only managed to detect L_1 with fairly large power. FBLR completely failed to detect the higher order terms L_2 and L_3, whereas MCLR had at least some power to detect the three-way interaction L_2. Both approaches reported far more false positive trees than GMJMCMC.
For FBLR we can discuss the structure of false positive detections in more detail. A large number of false positive expressions were comprised of leaves from single true trees: 20 for v(L_1), 162 for v(L_2) and 233 for v(L_3). These expressions were either subtrees of true trees or trees with misspecified logical operators and can be seen as substitutes for the true trees. Furthermore there were 167 logical expressions which combined leaves from different trees. Additionally, FBLR reported 34 trees with one wrongly detected leaf, 16 trees with two wrongly detected leaves and even 8 trees of size three for which no leaf was part of the data generating model. Thus, apart from having problems with determining the exact form of the logical expressions in this scenario, FBLR also produced a large number of false positive trees which have nothing to do with the correct model at all. The performance of MCLR was only a little bit better. With respect to the results presented in Table 1 of the main manuscript it is now even more important than for the first two scenarios to emphasize that we are dealing with upper bounds of the power and lower bounds of the number of false positives.
Not counting any subtrees of reported trees as false positives gives MCLR a huge advantage; nevertheless it reported almost 20 times more false positive expressions than GMJMCMC. Among those were 40 supertrees of L_1, which all contributed to the power of L_1 although it is not guaranteed that in all corresponding simulation runs L_1 itself was actually detected. There were 195 false positive trees which combined leaves from different true trees. More problematic was that there were 54 trees with one wrongly detected leaf and 9 trees with two wrongly detected leaves. While there were not as many completely wrong trees as for FBLR, there was still a considerable number of leaves reported by MCLR which were not part of the data generating model.

Table A3 gives detailed results about the frequencies of detected trees, similarly to Table A2 but now for the three linear regression scenarios. At the beginning we have again for each scenario the number of true positives, with L_j referring to the following expressions:
There is not much to be said about Scenario 4, apart from the fact that the only false positive detection, L_2 ∨ L_3, was very close to the expression L_2 + L_3 of the data generating model. In Scenario 5 GMJMCMC always detected the first two true trees and detected no false positive subtrees of L_2. Twelve false positive trees were comprised of leaves from L_3, including subtrees of size 1 and size 2 and trees containing wrong logical operators, like for example X_7 ∧ X_12 ∨ X_20. Twenty-two false positive trees were comprised of leaves from L_4, again including subtrees and trees with a misspecified ∨ operator. One false positive detection, X_2 ∧ X_30, wrongly combined leaves from two different true trees. Finally there were two trees, of size 4 and size 5 respectively, each of which included the wrongly detected leaf X_43.
For the most complex Scenario 6, more than two thirds of the 205 remaining false positive detections were subtrees of true trees or trees with misspecified logical operators consisting of leaves from a single true tree. 58 false positive trees combined leaves from different true trees and only 3 false positive trees included wrongly detected leaves.

C Real Data Analysis on Drosophila
The results for the analysis of trait pc1 of Drosophila Mauritiana are reported in Table A4. The results for the other four traits (adjpc1, area, areat and tibia), which were neither analyzed by Zeng et al. (2000) nor by Bogdan et al. (2008), are provided in the following two tables. In case of Drosophila Simulans we detected three two-way interactions for adjpc1. For Drosophila Mauritiana further two-way interactions were found: two for adjpc1, three for area, and two more for areat. We did not find higher order interactions for any of these traits.

Table A3: Detailed results for the three simulation scenarios for linear regression. A detailed description of the different classes of false positives (v(L_j), v(M), WL(1)) is given in Section 2.1. In Scenario 4 there was only one false positive detection, which is listed explicitly. In Scenario 6 the frequencies of the three trees which in combination give L_8 are also listed explicitly.

Table A4 contains the corresponding results for Drosophila Mauritiana. As before GMJMCMC detected most of the additive effects that were reported by mBIC, though it sometimes chose flanking markers (ve and dbi instead of acr, tub instead of hb). Interestingly the marker ewg on the X chromosome was not reported as a main effect but rather as a two-way interaction together with v, also on the X chromosome, which also showed up as an additive effect. On the other hand the two-way interactions obtained with mBIC were not confirmed. Instead of the interaction between fz and hb, GMJMCMC reported additional main effects of fz and rox (the neighbor of hb). For the interaction between w and fas no substitutes were detected.

D Additional discussion
One important question in the context of logic regression concerns how to define true positive and false positive detections in simulations. We adopted a rather strict point of view which might be called an 'exact tree approach': only those detected logic expressions which were logically equivalent to trees from the data generating model were counted as true positives. While this seems to be a natural definition, there are certain pitfalls and ambiguities in logic regression which might speak against this approach. Apart from the more obvious logic equivalences of Boolean algebra, for example due to De Morgan's laws or the distributive law, there can be slightly more hidden logic identities in logic regression. For example the expressions (X_1 ∨ X_2) − X_1 and X_2 − (X_1 ∧ X_2) give identical models. We have seen a less trivial example including three-way interactions in Scenario 6 of our simulation study, where the data generating tree L_8 is equivalent to the expression consisting of the three trees X_11 ∧ X_13 + X_19 ∧ X_50 − X_11 ∧ X_13 ∧ X_19 ∧ X_50. Furthermore, different logic expressions can be highly correlated even when they are not exactly identical. Especially the results from the most complex Scenario 6 raise the question whether the exact tree approach is slightly too strict for defining false positives. Subtrees of true trees give valuable information even if they do not describe the exact interaction. Often combinations of several subtrees and trees with misspecified logical operators can give expressions which are very close to the correct interaction term. For Scenario 6 we reported two possible summaries of the simulation results, one based strictly on the exact tree approach and the other one counting detections of X_11 ∧ X_13 + X_19 ∧ X_50 − X_11 ∧ X_13 ∧ X_19 ∧ X_50 also as true positives. This was slightly ad hoc and we believe that good reporting of logic regression results is an area which needs further research.
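Such identities are easy to verify by brute force over all 0/1 assignments. The sketch below checks the (X_1 ∨ X_2) − X_1 = X_2 − (X_1 ∧ X_2) identity and, assuming L_8 is the disjunction (X_11 ∧ X_13) ∨ (X_19 ∧ X_50) suggested by the three-tree representation (this reading is our inference, not stated in the text), the Scenario 6 identity as well; the function name is our own.

```python
from itertools import product

# Brute-force truth-table check of model equivalence for logic expressions:
# two predictors are model-equivalent if they agree on every 0/1 assignment.

def equivalent(f, g, n_vars):
    return all(f(*x) == g(*x) for x in product((0, 1), repeat=n_vars))

# (X1 ∨ X2) − X1  versus  X2 − (X1 ∧ X2): identical numeric predictors.
f = lambda x1, x2: (x1 | x2) - x1
g = lambda x1, x2: x2 - (x1 & x2)
print(equivalent(f, g, 2))                      # True

# Scenario 6 style identity: on binary data a ∨ b equals a + b − a∧b,
# here with a = X11∧X13 and b = X19∧X50.
h1 = lambda a, b, c, d: (a & b) | (c & d)
h2 = lambda a, b, c, d: (a & b) + (c & d) - (a & b & c & d)
print(equivalent(h1, h2, 4))                    # True
```

A check like this is only feasible for a handful of variables, but logic regression trees are small by construction (at most C_max leaves), so exhaustive comparison of reported and true trees is cheap.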
The output of MCLR takes a step in that direction: only the leaves of trees are reported, and if a tree has been detected then all its subtrees are reported as well. However, in our opinion MCLR throws away too much information. We believe that several different layers of reporting might be more desirable, for example the exact tree approach, the MCLR approach, and something in between which does not reduce trees completely to their set of leaves. We have started to think more systematically in that direction and leave this topic open for another publication.