Bayesian Concentration Ratio and Dissonance

We propose two new classes of Bayesian measures to investigate conflict among data sets from multiple studies. The first class (“concentration ratio”) quantifies the amount of information provided by a single data set, by comparing the prior with its posterior distribution, or compares two data sets through their corresponding posterior distributions. The second class (“dissonance”) quantifies the extent of contradiction between two data sets. Both classes are based on volumes of highest density regions. They are well calibrated, as supported by simulation, and we provide computational algorithms for their calculation. We illustrate these two classes in three real data applications: a benchmark dose toxicology study, a missing data study related to health effects of pollution, and a pediatric cancer study leveraging adult data.


Introduction
With the development of modern information technology, the need to compare and combine data across multiple studies has never been more important. For example, the ability to pool data and perform integrative data analysis is particularly important and timely in the addiction sciences (Conway et al., 2014). In phylogenomics, there is a need both to measure the amount of information provided by sequence data from individual genes and to quantify the degree of gene tree conflict (Lewis et al., 2016). In meta-analysis, it is important to quantify the extent of heterogeneity among a collection of studies (Higgins and Thompson, 2002; Higgins et al., 2003). In pediatric drug development, clinical trials face difficulties arising from the low incidence of the disease in children and from substantial economic, logistical, technical, and ethical barriers. Recent attention has increasingly focused on extrapolation, that is, leveraging available data from adults or older age groups to draw conclusions for the pediatric population (Schoenfeld et al., 2009; Gamalo-Siebers et al., 2017; Lakshminarayanan and Natanegara, 2019).
When drawing inferences from various data sources, it is crucial to judge whether the data are compatible. In statistics and information geometry, a divergence is often used to measure the distance of one probability distribution from another as a means of assessing "incompatibility". Many kinds of divergence have been proposed, for example, f-divergence (Ali and Silvey, 1966; Csiszár and Shields, 2004; Morimoto, 1963), which includes the Kullback-Leibler divergence (Kullback and Leibler, 1951) and the Hellinger distance (Hellinger, 1909); Rényi's divergence (Rényi, 1961); and Bregman divergence (Bregman, 1967). However, many divergence measures are not easy to compute empirically, and it is hard to tell when the conflict they indicate is severe. In the Bayesian setting, Shi (2017) defined a partition-based measure to quantify the compatibility of two data sets based on their posterior distributions. Methods for detecting and measuring conflict using p-values include Marshall and Spiegelhalter (2007), Gåsemyr and Natvig (2009), Evans and Moshonov (2006), Presanis et al. (2013), and Nott et al. (2020). Because a p-value is a single summary, we extend the scope of conflict assessment by proposing two new classes of measures: a concentration ratio measure focusing on scale change and a dissonance measure focusing on location shift. Both measures are easy to interpret and can be computed for any multi-modal, skewed, or otherwise complex distribution.
Lindley (1956) defined the information provided by an experiment as the difference between the posterior and prior differential entropies. Information is the resolution of uncertainty (Shannon, 1948); it reduces the set of values deemed plausible. A 100(1 − α)% Bayesian credible region provides one way to define a set of plausible values after the data are collected. When a distribution is not symmetric or unimodal, a 100(1 − α)% highest density region (HDR) is more desirable than other credible regions, as every point inside the HDR has a higher density value than every point outside.
We propose two novel classes of Bayesian measures based on HDRs. LCR(α) (the logarithm of the concentration ratio) is defined as the log-ratio of the volumes of two HDRs at level α. LCR(α) indicates whether one distribution is more dispersed than the other; however, it cannot detect conflict caused by a location shift. We therefore introduce D(α) (dissonance), the fraction of the volume of the smaller HDR that does not overlap with the larger HDR at level α.
Computing HDR volume is required for both measures. In the one-dimensional case, the HDR consists of one or more highest density intervals, and its volume is their total length. Chen and Shao (1999) provided a Monte Carlo method to estimate the highest density (HD) interval of a unimodal distribution. In two or more dimensions, computing the HDR is more complicated, and so is estimating its volume. Tanner (1996) provided an algorithm to calculate the content and boundary of the HDR using a normal approximation. However, Tanner's algorithm does not work well when the distribution is multi-modal or skewed, and the computation of the HDR's volume remains unaddressed. To overcome these difficulties, we propose two two-stage Monte Carlo algorithms that first estimate the HDR and then compute its volume. The algorithms are then applied to calculate LCR(α) and D(α). Simulation studies examining the empirical performance of the algorithms are provided in Appendix C in Supplementary Material (Shi et al., 2021).
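For intuition about the one-dimensional Monte Carlo approach, here is a minimal Python sketch in the spirit of Chen and Shao (1999), assuming the draws come from a unimodal distribution: among all intervals spanned by ⌈(1 − α) × (number of draws)⌉ consecutive sorted draws, it returns the shortest. The function name `hd_interval` is ours, not from the source.

```python
import numpy as np

def hd_interval(draws, alpha):
    """Shortest interval containing a fraction (1 - alpha) of the Monte Carlo draws.

    For a unimodal target this approximates the 100(1 - alpha)% HD interval.
    """
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    m = int(np.ceil((1.0 - alpha) * n))      # draws each candidate interval must cover
    widths = x[m - 1:] - x[: n - m + 1]      # widths[j] = x[j + m - 1] - x[j]
    j = int(np.argmin(widths))
    return x[j], x[j + m - 1]

rng = np.random.default_rng(1)
lo, hi = hd_interval(rng.standard_normal(200_000), 0.05)           # width close to 2 * 1.96
exp_lo, exp_hi = hd_interval(rng.exponential(size=200_000), 0.05)  # skewed: starts near 0
```

For the skewed exponential case the interval hugs zero instead of being centered, which is exactly why an HD interval is preferred over an equal-tailed one.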
The article is organized as follows. Section 2 presents a detailed development of LCR(α) and D(α). We propose two two-stage Monte Carlo algorithms to estimate the volume of an HDR in Section 3.1 and use them to calculate LCR(α) and D(α) in Section 3.2. To gain further insight, in Section 4 we compare our dissonance measure with the conflict p-value approaches of Evans and Moshonov (2006) and Presanis et al. (2013) for normal location and binomial problems. In Section 5, we display the results of the concentration ratio and dissonance measures for several scenarios, including location shift, scale change, tail change, multi-modal, and skewed distributions. In Section 6, the proposed LCR(α) measure and the integrated D(α) measure (D-AUC) are applied to three studies: (1) a benchmark dose toxicology study, (2) evaluating missing data information in a health effect study on respiratory function for children, and (3) investigating how to leverage adult data to analyze pediatric data. HDRs have been criticized for not being invariant under reparametrization, so we explore an invariant HDR in Section 7. We conclude the article with a brief discussion in Section 8.

HDR
For a parameter θ with probability density function f(θ), the 100(1 − α)% highest density region (HDR) is the subset of the parameter space Θ such that
$$R(\alpha) = \{\theta \in \Theta : f(\theta) \ge f_\alpha\}, \tag{2.1}$$
where f_α is the largest constant such that Pr(θ ∈ R(α)) ≥ 1 − α. Recognizing that log-concave distributions are frequently encountered in practice, we provide an important property of an HDR for a log-concave distribution in the following theorem.
Definition 1. Let R_1(α) and R_2(α) denote the 100(1 − α)% HDRs of the first and second distributions, respectively, and let V(·) denote volume. Let the concentration ratio be CR(α) = V(R_1(α))/V(R_2(α)). The logarithm of the concentration ratio of the first distribution to the second distribution is defined as
$$\mathrm{LCR}(\alpha) = \log \mathrm{CR}(\alpha) = \log V(R_1(\alpha)) - \log V(R_2(\alpha)).$$
For a fixed α, LCR(α) measures the volume change between two HDRs. When the first distribution is the prior distribution and the second one the posterior distribution, LCR(α) quantifies the amount of information provided by the data beyond our prior belief. When the posterior distribution is identical to the prior, CR(α) = 1 and thus LCR(α) = 0 (Figure 1(a)): the set of plausible values remains the same after the data are considered, which means the data provide no concentration beyond the prior knowledge. When the volume of the posterior HDR, V(R_2(α)), is smaller than V(R_1(α)), CR(α) > 1, which results in a positive LCR(α) (Figure 1(b)). In this case, the data effectively exclude some outcomes that the prior considers plausible. Finally, data may make us less certain about the value of the parameter, enlarging the set of plausible values specified by the prior; this yields a negative LCR(α) (Figure 1(c)).
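As a hedged illustration of Definition 1 in one dimension, the sketch below computes LCR(α) for a prior and its posterior in a conjugate beta-binomial setting. It assumes unimodal densities, so that each HDR is the shortest interval with content 1 − α; the Beta(2, 2) prior and the data (7 successes in 10 trials) are hypothetical choices for illustration, and the helper names are ours.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def hdi_length(dist, alpha):
    """Length of the 100(1 - alpha)% HD interval of a unimodal 1-D frozen scipy distribution.

    For a unimodal density the HD interval is the shortest interval with content
    1 - alpha, so we search over the lower-tail probability p in (0, alpha).
    """
    def width(p):
        return dist.ppf(min(p + 1.0 - alpha, 1.0)) - dist.ppf(p)
    res = minimize_scalar(width, bounds=(0.0, alpha), method="bounded")
    return float(width(res.x))

def lcr(dist1, dist2, alpha):
    """LCR(alpha) = log V(R1(alpha)) - log V(R2(alpha)); in one dimension V is a length."""
    return float(np.log(hdi_length(dist1, alpha)) - np.log(hdi_length(dist2, alpha)))

a, b = 2.0, 2.0                 # hypothetical Beta(2, 2) prior
n, y = 10, 7                    # hypothetical data: 7 successes in 10 trials
prior = stats.beta(a, b)
posterior = stats.beta(a + y, b + n - y)
info_gain = lcr(prior, posterior, 0.05)   # positive: the data shrink the 95% HDR
```

Comparing the prior with itself gives LCR(α) = 0, matching the "no concentration change" case of Figure 1(a).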
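For example, for a k-dimensional normal distribution N_k(μ, Σ), the 100(1 − α)% HDR is the hyper-ellipsoid
$$R(\alpha) = \{\theta : (\theta - \mu)^\top \Sigma^{-1} (\theta - \mu) \le \chi^2_{k,\alpha}\}, \tag{2.2}$$
where χ²_{k,α} is the upper 100αth percentile of the chi-square distribution with k degrees of freedom. From Appendix A in Supplementary Material (Shi et al., 2021), the volume of (2.2) is
$$V(R(\alpha)) = \frac{\pi^{k/2}}{\Gamma(k/2 + 1)} \left(\chi^2_{k,\alpha}\right)^{k/2} |\Sigma|^{1/2}.$$
Thus, when comparing two normal distributions N_k(μ_1, Σ_1) and N_k(μ_2, Σ_2),
$$\mathrm{LCR}(\alpha) = \frac{1}{2} \log \frac{|\Sigma_1|}{|\Sigma_2|}, \tag{2.3}$$
which depends on neither α nor the means. While LCR(α) compares the dispersion of each distribution, it cannot capture conflict caused by a location shift. Therefore, we introduce a dissonance measure, D(α), which complements LCR(α) by focusing on how far apart the two distributions are.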
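The closed-form volume and (2.3) can be checked numerically. This sketch (helper names are ours) evaluates the hyper-ellipsoid volume for a given covariance matrix and shows that the normal-versus-normal LCR(α) is the same for every α and equals half the log ratio of the determinants; the two covariance matrices are arbitrary examples.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def normal_hdr_volume(cov, alpha):
    """Volume of the 100(1 - alpha)% HDR of N_k(mu, cov), the hyper-ellipsoid (2.2)."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    k = cov.shape[0]
    c = stats.chi2.ppf(1.0 - alpha, df=k)          # upper 100*alpha% point chi^2_{k,alpha}
    log_unit_ball = 0.5 * k * np.log(np.pi) - gammaln(0.5 * k + 1.0)
    log_det = np.linalg.slogdet(cov)[1]
    return float(np.exp(log_unit_ball + 0.5 * k * np.log(c) + 0.5 * log_det))

def normal_lcr(cov1, cov2, alpha):
    return float(np.log(normal_hdr_volume(cov1, alpha)) - np.log(normal_hdr_volume(cov2, alpha)))

cov1 = np.array([[1.0, 0.3], [0.3, 1.0]])
cov2 = np.array([[2.0, 0.0], [0.0, 0.5]])
for alpha in (0.01, 0.05, 0.5):
    print(alpha, round(normal_lcr(cov1, cov2, alpha), 6))   # same value for every alpha
```

In one dimension the formula reduces to 2 z_{α/2} σ, the familiar length of a symmetric normal interval.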
Definition 2. For R_1(α) and R_2(α), let R_min(α) denote the region with the smaller volume and R_max(α) the region with the larger volume. The dissonance D(α) is the fraction of the volume of the smaller HDR that does not overlap with the larger HDR, i.e.,
$$D(\alpha) = \frac{V\big(R_{\min}(\alpha) \setminus R_{\max}(\alpha)\big)}{V\big(R_{\min}(\alpha)\big)}. \tag{2.4}$$
Because D(α) is the fraction of conflict in terms of the volume of the smaller HDR, the range of D(α) is [0, 1]. When the smaller HDR R_min(α) lies entirely within the larger HDR R_max(α), D(α) = 0, which we refer to as 'no dissonance' (Figure 2(a)). When the two HDRs partially overlap (Figure 2(b)), which we refer to as 'partial dissonance', D(α) is between 0 and 1. When the two HDRs are disjoint, D(α) = 1, indicating complete dissonance (Figure 2(c)).
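In one dimension, (2.4) reduces to interval arithmetic. The sketch below assumes two univariate normal distributions, whose HDRs are the symmetric intervals μ ± z_{α/2}σ, and computes the non-overlapping fraction of the shorter interval; the three example calls mirror the no, partial, and complete dissonance cases. Function names are hypothetical.

```python
from scipy import stats

def dissonance_intervals(iv1, iv2):
    """D(alpha) for two 1-D HDRs given as intervals (lo, hi)."""
    (l1, u1), (l2, u2) = iv1, iv2
    if (u1 - l1) <= (u2 - l2):
        small, large = (l1, u1), (l2, u2)
    else:
        small, large = (l2, u2), (l1, u1)
    overlap = max(0.0, min(small[1], large[1]) - max(small[0], large[0]))
    return 1.0 - overlap / (small[1] - small[0])   # share of the smaller HDR outside the larger

def normal_hd_interval(mu, sd, alpha):
    z = stats.norm.ppf(1.0 - alpha / 2.0)
    return mu - z * sd, mu + z * sd

def normal_dissonance(mu1, sd1, mu2, sd2, alpha):
    return dissonance_intervals(normal_hd_interval(mu1, sd1, alpha),
                                normal_hd_interval(mu2, sd2, alpha))

d_none = normal_dissonance(0.0, 1.0, 0.0, 2.0, 0.05)      # nested intervals
d_partial = normal_dissonance(0.0, 1.0, 1.0, 1.0, 0.05)   # partial overlap
d_full = normal_dissonance(0.0, 1.0, 10.0, 1.0, 0.05)     # disjoint intervals
```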

Summarizing Concentration Ratio and Dissonance over Different Content Levels
The measures LCR(α) and D(α) depend on the content level α. LCR(α) is relatively stable across choices of α: (2.3) shows that when comparing two normal distributions, LCR(α) does not depend on α at all. In our first real data analysis (Section 6.1), we present results of LCR(α) for 5 choices of α to illustrate the stability of LCR(α) empirically.
In contrast, D(α) is very sensitive to the choice of α. In the partial dissonance case of Figure 2(b), it is easy to find a content level α that leads to complete dissonance, so D(α) evaluated at different α values can yield differing conclusions. Because D(α) is a function of α bounded by 0 and 1, we plot the curve of D(α) versus α and summarize the extent of conflict using the area under the curve, denoted D-AUC. The formal definition of D-AUC is as follows.
Definition 3. D-AUC is the area under the D(α) curve, i.e.,
$$\text{D-AUC} = \int_0^1 D(\alpha)\, d\alpha.$$
It summarizes D(α) over all possible choices of α and measures the degree of conflict between two distributions.
We can revisit the three examples in Figure 2 to better understand D-AUC. In the case of no dissonance, D(α) is always 0, because the smaller HDR lies inside the larger HDR at any content level. For partial dissonance, D(α) starts at 0 when α = 0, since the 100% HDRs of both distributions are the whole parameter space and thus show no dissonance. As α increases, that is, as we compare 99%, 98%, . . . HDRs, partially non-overlapping regions appear and D(α) takes a value between 0 and 1. At a certain content level α, the two HDRs become disjoint; D(α) then equals 1 and stays at 1 thereafter. In the extreme case of complete dissonance, D(α) = 1 for all values of α > 0, resulting in D-AUC = 1.
In conclusion, the range of D-AUC is also [0, 1], with smaller values suggesting compatibility and larger values indicating contradiction. The interpretation is consistent with that of D(α) for a fixed α. Moreover, D-AUC avoids the need for an arbitrary choice of α. In our real data applications, we use D-AUC to report the results of dissonance.
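Definition 3 can be approximated with a simple quadrature rule. The sketch below, again assuming two univariate normal distributions with equal or different spreads, averages D(α) over a midpoint grid on (0, 1); the grid size is an arbitrary choice, and the shifts in the loop are illustrative rather than the settings used in Section 5.

```python
import numpy as np
from scipy import stats

def normal_dissonance(mu1, sd1, mu2, sd2, alpha):
    z = stats.norm.ppf(1.0 - alpha / 2.0)          # HD interval half-width multiplier
    (l1, u1), (l2, u2) = (mu1 - z * sd1, mu1 + z * sd1), (mu2 - z * sd2, mu2 + z * sd2)
    if (u1 - l1) <= (u2 - l2):
        small, large = (l1, u1), (l2, u2)
    else:
        small, large = (l2, u2), (l1, u1)
    overlap = max(0.0, min(small[1], large[1]) - max(small[0], large[0]))
    return 1.0 - overlap / (small[1] - small[0])

def d_auc(mu1, sd1, mu2, sd2, n_grid=1000):
    """Approximate D-AUC, the integral of D(alpha) over (0, 1), by the midpoint rule."""
    alphas = (np.arange(n_grid) + 0.5) / n_grid
    return float(np.mean([normal_dissonance(mu1, sd1, mu2, sd2, a) for a in alphas]))

for shift in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(shift, round(d_auc(0.0, 1.0, shift, 1.0), 3))   # grows with the location shift
```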

Prior-Data Comparison and Data-Data Comparison
We have defined LCR(α) and D(α) in very general terms, so they can be applied to comparing prior and data, or two data sets.
In the context of prior-data comparison, in addition to comparing prior versus posterior distribution, we can also compare prior and likelihood by converting the likelihood function to a posterior distribution through a noninformative prior. More details are given in Section 4.
For the data-data comparison, consider two data sets S_1 and S_2 from similar studies with common parameter θ defined on the same parameter space Θ. We use the same prior distribution for θ for each data set and compare the two data sets based on their posterior distributions.
Letting R_1(α) and R_2(α) be the posterior HDRs for S_1 and S_2, respectively, LCR(α) defined in Definition 1 compares the concentration of S_1 to that of S_2. Consequently, LCR(α) = 0 means that S_1 and S_2 are equally concentrated; LCR(α) > 0 indicates that S_2 is more concentrated than S_1; and LCR(α) < 0 implies that S_2 is less concentrated than S_1.
To construct a dissonance measure D(α) between these two data sets, we apply Definition 2 with R_1(α) and R_2(α) specified to be the posterior HDRs for S_1 and S_2, respectively. Definition 3 for D-AUC still applies in this case.
Remark 2. We expect LCR(α) for comparing two data sets to be robust to the prior choice, given that it is evaluated from the "central" region of each posterior distribution. D(α) is more complicated. Although D(α) is also evaluated from two posterior distributions, the numerator of (2.4) depends on the shapes of each distribution, so this measure is not expected to be robust to the prior choice. Therefore, we have used diffuse priors in real data applications to focus more on the data.

Computation of the Volume of an HDR
A two-stage Monte Carlo algorithm is developed to estimate the volume of the 100(1 − α)% HDR in (2.1). It is often difficult to calculate f_α in (2.1) analytically. Therefore, in the first stage, we approximate f_α by Monte Carlo. Let ψ = f(θ), the random variable obtained by transforming θ by its own density function. Then f_α is the 100αth percentile of ψ, since it satisfies Pr(f(θ) ≥ f_α) = 1 − α. Therefore f_α can be estimated by the corresponding sample quantile of ψ, denoted f̂_α. The approximated 100(1 − α)% HDR is then defined as R̂(α) = {θ ∈ Θ : f(θ) ≥ f̂_α}. According to Hyndman (1996), as the sample size goes to infinity, f̂_α → f_α and so R̂(α) → R(α). In the second stage, we employ a needle-dropping technique to estimate the volume. We first construct a hyper-rectangle of known volume that covers the approximated region R̂(α), generate a large number of points uniformly from the hyper-rectangle, and estimate the volume as the volume of the hyper-rectangle times the proportion of points that fall within R̂(α). Again, as the number of points increases, the estimated volume V̂(R̂(α)) approaches the actual volume of R̂(α). Detailed steps are summarized in Algorithm 1.
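A minimal sketch of the two stages follows, assuming the density can be evaluated (here on the log scale, which leaves the HDR unchanged and, as in Remark 3, tolerates an unknown normalizing constant) and that draws from it are available. The construction of the covering hyper-rectangle is our assumption: we take the bounding box of the draws that fall in R̂(α) and pad it by 5% of its range on each side. The bivariate normal target is chosen only so the estimate can be compared with the exact volume from the normal HDR formula.

```python
import numpy as np
from scipy import stats

def hdr_volume_rectangle(logpdf, draws, alpha, n_needles=200_000, pad=0.05, seed=None):
    """Two-stage Monte Carlo estimate of the volume of the 100(1 - alpha)% HDR.

    Stage 1: the alpha-quantile of psi = log f(theta) over draws estimates log f_alpha.
    Stage 2: drop uniform needles in a padded box around the draws inside the
             estimated HDR and scale the box volume by the hit rate.
    """
    rng = np.random.default_rng(seed)
    psi = logpdf(draws)
    log_cut = np.quantile(psi, alpha)                 # estimate of log f_alpha
    inside = draws[psi >= log_cut]
    lo, hi = inside.min(axis=0), inside.max(axis=0)
    margin = pad * (hi - lo)                          # hypothetical 5% padding
    lo, hi = lo - margin, hi + margin
    needles = rng.uniform(lo, hi, size=(n_needles, draws.shape[1]))
    hit_rate = np.mean(logpdf(needles) >= log_cut)
    return float(np.prod(hi - lo) * hit_rate)

cov = np.array([[1.0, 0.5], [0.5, 2.0]])
target = stats.multivariate_normal(np.zeros(2), cov)
draws = target.rvs(size=20_000, random_state=1)
vol_hat = hdr_volume_rectangle(target.logpdf, draws, alpha=0.05, seed=2)
vol_true = np.pi * stats.chi2.ppf(0.95, df=2) * np.sqrt(np.linalg.det(cov))
```

The padding trades a few wasted needles for protection against the sample extremes falling just inside the true HDR boundary.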
When the distribution is approximately normal, we can improve the second stage of the algorithm by using a hyper-ellipsoid region. Since an HDR of a multivariate normal distribution is a hyper-ellipsoid (2.2), we construct a hyper-ellipsoid that just covers the approximated region R̂(α). Given the estimated mean ẑ and covariance matrix S, the approximated hyper-ellipsoid is {z : (z − ẑ)ᵀ S⁻¹ (z − ẑ) ≤ χ²_{k,α}}. This hyper-ellipsoid may not be large enough to contain all points known to be in the HDR, so the lengths of its semi-axes are increased by 2%, and the constructed hyper-ellipsoid is {z : (z − ẑ)ᵀ S⁻¹ (z − ẑ) ≤ 1.02² χ²_{k,α}}. The expansion factor of 2% (1.02) is arbitrary; in any particular case it may be determined by finding the inflection point in a plot of estimated HDR volume against expansion factor. The estimated HDR volume increases linearly while the expansion factor is too small, then flattens once the hyper-ellipsoid defined by the expansion factor is large enough to contain the true HDR. As in Algorithm 1, we generate uniformly distributed random points from this hyper-ellipsoid region, using the method in Dezert and Musso (2001), and calculate the proportion falling within R̂(α). Algorithm 2 gives the detailed steps.
Algorithm 1 Computing the volume of an HDR using a hyper-rectangle region.

Remark 3. The density function f (·) in Algorithm 1 is only required to be known up to a normalizing constant.
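Algorithm 2 Computing the volume of an HDR using a hyper-ellipsoid region. Follow the same steps as in Algorithm 1, except that steps 3 and 5 are replaced as follows:
3′. Obtain the sample mean (ẑ) and sample covariance matrix (S).
- Generate x_1, . . . , x_N uniformly inside the k-dimensional unit hyper-sphere (Dezert and Musso, 2001).
- Compute the Cholesky decomposition of S, where S = TTᵀ.
- Compute and return z_i = 1.02 √(χ²_{k,α}) T x_i + ẑ, i = 1, . . . , N.
5′. The estimated HDR volume is given by
$$\hat V(\hat R(\alpha)) = \frac{\pi^{k/2}}{\Gamma(k/2 + 1)} \left(1.02^2 \chi^2_{k,\alpha}\right)^{k/2} |S|^{1/2} \times \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{f(z_i) \ge \hat f_\alpha\}.$$
In Appendix B in Supplementary Material (Shi et al., 2021), we explain why we drop needles uniformly to estimate the volume.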
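The hyper-ellipsoid variant can be sketched in the same way. To draw uniformly inside the unit hyper-sphere we use a standard construction (a normalized Gaussian direction with radius U^{1/k}); we do not claim it is the exact procedure of Dezert and Musso (2001). The 5-dimensional normal target and the sample sizes are illustrative assumptions, chosen so the estimate can be compared with the exact normal HDR volume.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def uniform_in_ball(n, k, rng):
    """n points uniform in the unit k-ball: Gaussian direction, radius U**(1/k)."""
    g = rng.standard_normal((n, k))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # uniform direction on the sphere
    r = rng.uniform(size=(n, 1)) ** (1.0 / k)       # P(R <= r) = r**k
    return g * r

def hdr_volume_ellipsoid(logpdf, draws, alpha, n_needles=20_000, expand=1.02, seed=None):
    rng = np.random.default_rng(seed)
    k = draws.shape[1]
    log_cut = np.quantile(logpdf(draws), alpha)         # stage 1: estimate of log f_alpha
    z_bar = draws.mean(axis=0)                          # step 3': sample mean
    S = np.cov(draws, rowvar=False)                     # sample covariance
    chol = np.linalg.cholesky(S)                        # S = T T' (chol plays the role of T)
    radius = expand * np.sqrt(stats.chi2.ppf(1.0 - alpha, df=k))
    x = uniform_in_ball(n_needles, k, rng)
    z = z_bar + radius * (x @ chol.T)                   # z_i = 1.02 sqrt(chi2) T x_i + z_bar
    log_vol = (0.5 * k * np.log(np.pi) - gammaln(0.5 * k + 1.0)
               + k * np.log(radius) + np.sum(np.log(np.diag(chol))))
    hit_rate = np.mean(logpdf(z) >= log_cut)            # step 5': share inside the HDR
    return float(np.exp(log_vol) * hit_rate)

k = 5
cov = 0.5 * np.eye(k) + 0.5 * np.ones((k, k))
target = stats.multivariate_normal(np.zeros(k), cov)
draws = target.rvs(size=100_000, random_state=3)
vol_hat = hdr_volume_ellipsoid(target.logpdf, draws, alpha=0.05, seed=4)
log_true = (0.5 * k * np.log(np.pi) - gammaln(0.5 * k + 1.0)
            + 0.5 * k * np.log(stats.chi2.ppf(0.95, df=k)) + 0.5 * np.linalg.slogdet(cov)[1])
vol_true = float(np.exp(log_true))
```

Because the ellipsoid hugs a near-normal HDR, most needles land inside it, which is why far fewer needles are needed than with the hyper-rectangle in higher dimensions.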
We evaluate the performance of both algorithms using simulation studies for a bivariate normal distribution and a 5-dimensional normal distribution; the details are given in Appendix C in Supplementary Material (Shi et al., 2021). Here T denotes the number of Monte Carlo draws used in the first stage and N the number of needles dropped in the second stage. Our simulation results show that, when T is large enough, increasing N improves performance, and when N is large enough, increasing T also improves performance. For the bivariate normal case, the simulation study suggests that we need at least T = 10^4 and N = 10^4 for the hyper-rectangle method (Algorithm 1) and at least T = 10^4 and N = 10^2 for the hyper-ellipsoid method (Algorithm 2). For the 5-dimensional normal case, at least T = 10^4 and N = 10^6 are required for the hyper-rectangle method and at least T = 10^4 and N = 10^3 for the hyper-ellipsoid method. As expected, Algorithm 2 produces more accurate results than Algorithm 1 for both normal distributions.

Calculating Concentration Ratio and Dissonance
We now list the steps required to compute our measures using the algorithms of the previous section.
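Step 1. Apply Algorithm 1 or Algorithm 2 to the first distribution to estimate V(R_1(α)). Denote the corresponding approximated cut-off value as f̂_{α,1} and the needle-dropping sample as {z_{i,1}}.
Step 2. Apply the same algorithm to the second distribution to estimate V(R_2(α)). Denote the corresponding approximated cut-off value as f̂_{α,2} and the needle-dropping sample as {z_{i,2}}.
Step 3. Calculate LCR(α) = log V̂(R̂_1(α)) − log V̂(R̂_2(α)).
Step 4. After evaluation, we know which HDR is smaller.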
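The steps above can be strung together as follows. This is a sketch under stated assumptions: it uses the hyper-rectangle variant for both distributions, and for the dissonance step it assumes that the needles landing inside the smaller estimated HDR are uniform on it, so that D(α) is estimated by the fraction of them falling outside the larger estimated HDR. The function names and test distributions are hypothetical.

```python
import numpy as np
from scipy import stats

def hdr_needles(logpdf, draws, alpha, n_needles, rng, pad=0.05):
    """Algorithm-1-style run: (volume estimate, log cut-off, needles inside the HDR)."""
    psi = logpdf(draws)
    log_cut = np.quantile(psi, alpha)                      # estimate of log f_alpha
    inside = draws[psi >= log_cut]
    lo, hi = inside.min(axis=0), inside.max(axis=0)
    lo, hi = lo - pad * (hi - lo), hi + pad * (hi - lo)    # padded box (our assumption)
    z = rng.uniform(lo, hi, size=(n_needles, draws.shape[1]))
    hits = logpdf(z) >= log_cut
    return float(np.prod(hi - lo) * hits.mean()), log_cut, z[hits]

def lcr_and_dissonance(dist1, dist2, alpha, n_draws=20_000, n_needles=100_000, seed=0):
    rng = np.random.default_rng(seed)
    runs = []
    for dist in (dist1, dist2):                            # Steps 1 and 2
        draws = dist.rvs(size=n_draws, random_state=rng)
        runs.append((dist,) + hdr_needles(dist.logpdf, draws, alpha, n_needles, rng))
    (d1, v1, c1, z1), (d2, v2, c2, z2) = runs
    lcr = np.log(v1) - np.log(v2)                          # Step 3
    # Step 4 and beyond (assumed): needles inside the smaller HDR are uniform on it,
    # so D(alpha) is the share of them lying outside the larger HDR.
    z_min, d_max, c_max = (z1, d2, c2) if v1 <= v2 else (z2, d1, c1)
    dissonance = float(np.mean(d_max.logpdf(z_min) < c_max))
    return float(lcr), dissonance

std = stats.multivariate_normal(np.zeros(2), np.eye(2))
wide = stats.multivariate_normal(np.zeros(2), 4.0 * np.eye(2))   # same centre, more spread
far = stats.multivariate_normal([20.0, 0.0], np.eye(2))          # distant location
lcr_wide, d_wide = lcr_and_dissonance(std, wide, 0.05)   # LCR near -log 4, no dissonance
lcr_far, d_far = lcr_and_dissonance(std, far, 0.05)      # LCR near 0, complete dissonance
```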

Comparing Dissonance and Conflict P-Values
To better understand our dissonance measure D-AUC, we compare it with two conflict p-value approaches for detecting prior-data conflict. Evans and Moshonov (2006) explored prior-data conflict using the p-value of a minimal sufficient statistic, with its prior predictive distribution as the reference. Presanis et al. (2013) discussed another type of conflict p-value using a node-splitting approach that evaluates conflict locally at a particular node or group of nodes in a directed acyclic graph. The evidence about θ is split into two parts, resulting in two independent distributions p(θ_a | y_a) and p(θ_b | y_b). Presanis et al. (2013) defined δ = h(θ_a) − h(θ_b), where h(·) is a Jeffreys' transformation, and calculated the p-value as the probability that the density of δ is smaller than its density at 0, i.e., p_Presanis = Pr(p_δ(δ | y_a, y_b) < p_δ(0 | y_a, y_b)).
In this section, we compare the p-values of D-AUC (prior versus posterior) and of a variant of D-AUC (prior versus likelihood), denoted D̃-AUC, with the above Bayesian conflict p-values, first in a normal location example and then in a binomial success probability example.
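Normal Location Model
Consider a single observation y ∼ N(θ, σ²) with σ known and the prior θ ∼ N(μ_0, σ_0²). The prior predictive distribution of y is N(μ_0, σ² + σ_0²), with density denoted p(y). The Bayesian conflict p-value of Evans and Moshonov (2006), denoted p_EM, is
$$p_{\mathrm{EM}} = \Pr\{p(Y) \le p(y)\} = 2\left\{1 - \Phi\!\left(\frac{|y - \mu_0|}{\sqrt{\sigma^2 + \sigma_0^2}}\right)\right\},$$
where Y ∼ N(μ_0, σ² + σ_0²) and Φ is the standard normal distribution function. Instead of evaluating the conflict between prior and posterior, we can compare prior and likelihood by converting the likelihood to a posterior distribution through a noninformative prior, using the setup of Presanis et al. (2013): θ_a ∼ N(y, σ²) from the likelihood, θ_b ∼ N(μ_0, σ_0²) from the prior, and δ = θ_a − θ_b ∼ N(y − μ_0, σ² + σ_0²), so that
$$p_{\mathrm{Presanis}} = \Pr\{p_\delta(\delta) < p_\delta(0)\} = 2\left\{1 - \Phi\!\left(\frac{|y - \mu_0|}{\sqrt{\sigma^2 + \sigma_0^2}}\right)\right\}.$$
In this case, it is the same as p_EM.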
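The agreement of the two closed forms can be checked by simulation. The sketch below (function names are ours) evaluates p_EM directly and approximates p_Presanis with Monte Carlo draws of δ, using the normal law of δ stated above; the two agree up to simulation error.

```python
import numpy as np
from scipy import stats

def p_em_normal(y, mu0, sigma0, sigma):
    """Evans-Moshonov p-value Pr(p(Y) <= p(y)) for Y ~ N(mu0, sigma^2 + sigma0^2)."""
    s = np.hypot(sigma, sigma0)                          # prior predictive SD
    return float(2.0 * stats.norm.sf(abs(y - mu0) / s))

def p_presanis_normal_mc(y, mu0, sigma0, sigma, n=400_000, seed=0):
    """Monte Carlo estimate of Pr(p_delta(delta) < p_delta(0)).

    theta_a ~ N(y, sigma^2) is the likelihood under a flat prior, theta_b ~ N(mu0, sigma0^2)
    is the prior, and delta = theta_a - theta_b (h is the identity for a normal mean).
    """
    rng = np.random.default_rng(seed)
    delta = rng.normal(y, sigma, n) - rng.normal(mu0, sigma0, n)
    dens = stats.norm(loc=y - mu0, scale=np.hypot(sigma, sigma0))   # exact law of delta
    return float(np.mean(dens.pdf(delta) < dens.pdf(0.0)))

for y in (0.5, 1.5, 3.0):
    print(y, round(p_em_normal(y, 0.0, 1.0, 1.0), 4),
          round(p_presanis_normal_mc(y, 0.0, 1.0, 1.0), 4))
```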
The p-value of D-AUC is p_{D-AUC} = Pr(D-AUC(Y) ≥ D-AUC(y)), and, similarly, the p-value of D̃-AUC is p_{D̃-AUC} = Pr(D̃-AUC(Y) ≥ D̃-AUC(y)), with Y ∼ N(μ_0, σ² + σ_0²) and y the observed data. The two p-value methods, p_EM and p_Presanis, are identical in this example, so they are plotted as one p-value curve. Plotting the p-value of Evans and Moshonov (2006) and Presanis et al. (2013), the p-value of D-AUC, and the p-value of D̃-AUC as functions of y shows that the three curves are identical for several choices of σ and σ_0 (Figure 3 shows the result for σ_0 = 1, σ = 1). This suggests that our dissonance check is equivalent to those of Evans and Moshonov (2006) and Presanis et al. (2013) in this example.
As mentioned in Nott et al. (2020), if two discrepancy statistics A(y) and B(y) are related by a monotone transformation as functions of y, denoted A(y) ≐ B(y), they result in the same predictive p-values. Here

Binomial Model
A binomial model applies to events with two possible outcomes, for example, cured or not cured in a clinical trial, or success or failure in a designed experiment. Binomial models are common in disease monitoring, drug development, opinion polls, quality control, etc.
For n independent trials, assume that the probability of success in each trial remains constant at θ. Let y denote the number of successes in the n independent Bernoulli trials. Assuming n is known, y ∼ Bin(n, θ), a binomial distribution with parameter θ. For a Bayesian analysis, further assume a Beta(a, b) prior for θ. Then the posterior is θ | y ∼ Beta(a + y, b + n − y).
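The prior predictive distribution of y is beta-binomial, with probability mass function
$$p(y) = \binom{n}{y} \frac{B(a + y,\, b + n - y)}{B(a, b)}, \quad y = 0, 1, \ldots, n,$$
where B(·, ·) is the beta function. The Bayesian conflict p-value of Evans and Moshonov (2006) is
$$p_{\mathrm{EM}} = \Pr\{p(Y) \le p(y)\}, \quad Y \sim p(\cdot).$$
For p_Presanis, the evidence is split into the prior, θ_b ∼ Beta(a, b), and the data-translated likelihood, θ_a ∼ Beta(1 + y, 1 + n − y), assuming that the distribution of δ is roughly symmetric and unimodal. In this case, p_Presanis is calculated using MCMC samples by counting the number of times θ_a > θ_b.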
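The binomial p-values can be sketched with SciPy's beta-binomial distribution. Two details are our assumptions: θ_b is taken to be the prior draw, and, because δ is assumed symmetric and unimodal, the proportion q of draws with θ_a > θ_b is turned into the two-sided value 2 min(q, 1 − q). The Beta(2, 2) prior is a hypothetical choice; n = 10 follows the text.

```python
import numpy as np
from scipy import stats

def p_em_binomial(y, n, a, b):
    """Pr(p(Y) <= p(y)) with Y ~ BetaBinomial(n, a, b), the prior predictive."""
    pmf = stats.betabinom.pmf(np.arange(n + 1), n, a, b)
    return float(pmf[pmf <= pmf[y] * (1.0 + 1e-12)].sum())   # tiny slack so exact ties count

def p_presanis_binomial_mc(y, n, a, b, draws=200_000, seed=0):
    """Two-sided conflict p-value from the share q of draws with theta_a > theta_b."""
    rng = np.random.default_rng(seed)
    theta_a = rng.beta(1 + y, 1 + n - y, draws)    # data-translated likelihood
    theta_b = rng.beta(a, b, draws)                # prior (our reading of the split)
    q = np.mean(theta_a > theta_b)
    return float(2.0 * min(q, 1.0 - q))            # valid if delta is symmetric, unimodal

n, a, b = 10, 2.0, 2.0
for y in range(n + 1):
    print(y, round(p_em_binomial(y, n, a, b), 3), round(p_presanis_binomial_mc(y, n, a, b), 3))
```

With a = b = 2 the prior predictive pmf is proportional to (y + 1)(11 − y), so y = 5 is its mode and p_EM there equals 1.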
When comparing the prior Beta(a, b) with the posterior Beta(a + y, b + n − y), the p-value of D-AUC is p_{D-AUC} = Pr(D-AUC(Y) ≥ D-AUC(y)), where Y follows the prior predictive distribution with pmf p(y) and y is the observed data.
Similarly, when comparing the prior Beta(a, b) with the likelihood Beta(1 + y, 1 + n − y), the p-value of D̃-AUC is p_{D̃-AUC} = Pr(D̃-AUC(Y) ≥ D̃-AUC(y)), with Y from the prior predictive distribution and y the observed data.
Assuming n = 10, we plot the above four p-values (p_EM, p_Presanis, p_{D-AUC}, and p_{D̃-AUC}) for each y and two prior choices in Figure 4. For most y values, the p-value of D-AUC is the same as the p-value of D̃-AUC and is close to either p_EM or p_Presanis.

Numerical Results from Comparing Two Distributions
To better understand how the two measures behave, we design the following experiments, inspired by Section 3.3 of Holmes et al. (2015), to explore various scenarios for the two distributions. Let y_i denote a sample from distribution i, i = 1, 2. We then compare the two distributions in each of the following scenarios.

Skewed: y 1 ∼
The results of the 5 scenarios are shown in Figure 5, with LCR plotted in the left column and D-AUC plotted in the right column.
In scenario 1 (mean shift), there is no scale change, so LCR(α) = 0 for all α; we therefore focus on the dissonance measure and plot D-AUC as a function of θ. We observe that D-AUC increases as θ increases.
In scenario 2, there is no location change, so D-AUC = 0, and we plot only LCR(α) with α = 0.05 as a function of θ. The LCR values are negative and become more negative as θ increases. In scenario 3, D-AUC = 0 as well, so we report only LCR(α) with α = 0.05 as a function of θ. As expected, the values become more negative as the degrees of freedom θ decrease, because the distribution for y_2 becomes more dispersed than that for y_1.
In scenarios 4 and 5, we plot both LCR(α) with α = 0.05 and D-AUC as functions of θ. All LCR values are negative, as expected, because the distribution for y_2 is more diffuse than that for y_1. In the mixture scenario, D-AUC remains low when θ ≤ 1 and increases with θ afterwards, because the mixture distribution is still unimodal when θ is small and the distribution for y_2 becomes bimodal once θ > 1. In the skewed scenario, D-AUC increases with θ, as expected. From Figure 5, in the mean shift scenario, D-AUC = 0.5, 0.7, 0.9, and 0.97 for θ = 0.5, 1, 2, and 3, respectively. Using these values as benchmarks, we categorize D-AUC < 0.5 as small dissonance, 0.5 ≤ D-AUC < 0.7 as moderate dissonance, 0.7 ≤ D-AUC < 0.9 as large dissonance, 0.9 ≤ D-AUC < 0.97 as extra large dissonance, and D-AUC ≥ 0.97 as extremely large dissonance.
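The categorization above is a simple lookup. The sketch below encodes the cut-offs exactly as stated, with each lower bound inclusive; the function name is hypothetical.

```python
import bisect

CUTOFFS = [0.5, 0.7, 0.9, 0.97]      # inclusive lower bounds of the higher categories
LABELS = ["small", "moderate", "large", "extra large", "extremely large"]

def dissonance_category(d_auc):
    """Map a D-AUC value in [0, 1] to the verbal category defined above."""
    if not 0.0 <= d_auc <= 1.0:
        raise ValueError("D-AUC must lie in [0, 1]")
    return LABELS[bisect.bisect_right(CUTOFFS, d_auc)]

print(dissonance_category(0.383))    # small
print(dissonance_category(0.686))    # moderate
```

`bisect_right` places a value equal to a cut-off in the higher category, matching the "≤" on each lower bound.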

Analysis of Benchmark Dose Data in Toxicology
In this study, we select two data sets from similar studies and aim to answer two questions: (1) are the two data sets compatible? and (2) which of these two data sets is more concentrated?
The benchmark approach is a useful tool in toxicology. The benchmark dose is the dose of an environmental toxicant that produces a predetermined change in response compared with the background response level. The toxicological data contain n binomial responses y = (y_1, . . . , y_n) with y_i ∼ Bin(n_i, p_i), where n_i is the number of animals tested at dose level x_i and p_i is the probability that an animal gives an adverse response at dose level x_i. Using the logistic regression model, we have
$$\log \frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_i.$$
The Kociba study (Kociba et al., 1978) is a lifetime feeding study of both female and male Sprague Dawley rats, with 50 rats tested in each group at doses of 0, 1, 10, and 100 ng/kg/day. Inferences derived from the Kociba study have been widely used as the basis for risk assessments for 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). The National Toxicology Program (NTP) study (National Toxicology Program, 1982) is a study in which groups of 50 male rats, 50 female rats, and 50 male mice received TCDD as a suspension in 9:1 corn oil:acetone by gavage twice each week to achieve doses of 0, 10, 50, or 500 ng/kg/week for two years. These exposures correspond to daily averaged doses of 0, 1.4, 7.1, or 71 ng/kg/day for rats. Liver tumor (neoplastic nodule) incidences of female rats from both studies, shown in Table 1, are chosen as the data for this analysis. Posterior means and posterior standard deviations (SD) of β_0 and β_1 for each study are also given. We denote the Kociba study data by S_1 = (n_1, y_1, x_1) and the NTP study data by S_2 = (n_2, y_2, x_2). Shao and Small (2011) showed that while S_1 and S_2 are not compatible in terms of all parameters θ = (β_0, β_1)ᵀ, they are compatible in terms of one common parameter β_1. Our measure D(α) confirms their conclusion.
The settings for prior distributions, likelihood functions, and corresponding posterior distributions are given as follows: The estimated marginal posterior densities of β 0 based on the Kociba data and the NTP data, respectively, have different locations, inducing almost complete dissonance in β 0 and thus are not compatible (Figure 6(a)). The estimated marginal posterior densities of β 1 for each data set share a similar location leading to small D-AUC values for β 1 and thus are compatible with each other (Figure 6(b)). Because the posterior distributions for the two parts are largely non-overlapping, there is almost complete dissonance for (β 0 , β 1 ) jointly (Figure 6(c)).
To compare these two data sets, we estimated the HDR R_1(α) based on the posterior distribution π(β_0, β_1 | S_1) and the HDR R_2(α) based on the posterior distribution π(β_0, β_1 | S_2). We calculated LCR(α) with respect to β_0, β_1, and (β_0, β_1) for several values of α (Table 2). The NTP data are less concentrated than the Kociba data, marginally for each parameter and jointly for both parameters, given that LCR(α) is negative in all three cases. Plotting D(α) versus α and summarizing the results using D-AUC (Figure 7) shows almost complete dissonance for β_0 and for (β_0, β_1) jointly, as 0.985 and 0.992 are quite close to 1, and little dissonance for β_1, as expected.
Figure 6: Posterior distributions of (β_0, β_1) for Kociba and NTP data.

Analysis of Six Cities Data
In this subsection, we illustrate how our dissonance measure can be useful in a missing data problem: we want to know whether the data with no missing values are adequate and how much information is contributed by handling the missing data.
The Six Cities longitudinal study of the health effects of air pollution on respiratory function in children (Ware et al., 1984) is a well-known environmental data set that has been analyzed extensively in the literature. We select only two cities from this data set as an example. The binary response is the wheezing status of a child at age 11, with y = 0 representing no wheezing and y = 1 wheezing. The wheezing status is modeled as a function of the city of residence (x_1) and the smoking status of the mother (x_2). The city of residence x_1 is a binary covariate that equals 1 if the child lived in Kingston-Harriman, Tennessee, the more polluted city, and 0 if the child lived in Portage, Wisconsin. The covariate x_2 is maternal cigarette smoking, measured as the number of cigarettes per day. There are 2315 subjects in the data set, and x_1 is missing for 32.8% of them. We present a brief summary of the data in Table 3; details of the data set can be found in the original paper.
Table 3 notes: x_1, city of residence (0: clean city; 1: polluted city; NA: missing); x_2, maternal smoking (number of cigarettes per day).
We use a logistic regression model for y given x_1 and x_2, i.e.,
$$\Pr(y_i = 1 \mid x_{1i}, x_{2i}, \beta) = \frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})},$$
where β = (β_0, β_1, β_2)ᵀ. We further model x_1 given x_2 by a logistic regression to handle the missing data problem. Specifically, we assume that the x_1i given x_2i are independent Bernoulli variables with probability
$$\Pr(x_{1i} = 1 \mid x_{2i}, \alpha) = \frac{\exp(\alpha_0 + \alpha_1 x_{2i})}{1 + \exp(\alpha_0 + \alpha_1 x_{2i})},$$
where α = (α_0, α_1)ᵀ. We denote the proposed joint prior distribution for (α, β) by π(α, β). Let r_i be the missing indicator, with r_i = 1 if x_1i is missing and r_i = 0 otherwise. Let D_cc (complete case) denote the subset of the data in which both x_1 and x_2 are observed (r_i = 0). Then the joint likelihood function based on the complete case data D_cc is
$$L(\alpha, \beta \mid D_{cc}) = \prod_{i:\, r_i = 0} p(y_i \mid x_{1i}, x_{2i}, \beta)\, p(x_{1i} \mid x_{2i}, \alpha).$$
Let D_ac denote the whole data set, with the subscript ac meaning all cases. The joint likelihood function based on all data D_ac is
$$L(\alpha, \beta \mid D_{ac}) = L(\alpha, \beta \mid D_{cc}) \prod_{i:\, r_i = 1} \sum_{x_{1i} \in \{0, 1\}} p(y_i \mid x_{1i}, x_{2i}, \beta)\, p(x_{1i} \mid x_{2i}, \alpha).$$
That is, for the missing data part, we marginalize over x_1i.
The posterior distributions for the complete case and the all case are defined as
$$\pi(\alpha, \beta \mid D_{cc}) \propto L(\alpha, \beta \mid D_{cc})\, \pi(\alpha, \beta), \qquad \pi(\alpha, \beta \mid D_{ac}) \propto L(\alpha, \beta \mid D_{ac})\, \pi(\alpha, \beta).$$
Using c_0 = d_0 = 100, an MCMC sample of size 1,000,000 was generated from each posterior distribution. As expected, for α = 0.05, comparing D_cc with D_ac, LCR(α) for β_0 and β_2 is 0.099 and 0.195, respectively, indicating that the marginal posterior distributions of β_0 and β_2 become more concentrated with the addition of data, while LCR(α) for β_1 is −0.001 (nearly zero), which is expected because x_1 is missing for all of the additional observations. Marginal posterior densities of β_0 and β_2 show some dissonance between the complete case and the all case, whereas the marginal posteriors of β_1 are nearly identical (Figure 8). Posterior means and standard deviations of each parameter are reported for D_cc and D_ac in Table 4. Because the modes of β_1 from the complete case and the all case are almost identical, D-AUC is very close to 0 (0.06) (Figure 9). The modes of β_2 from the two cases are still quite similar, with D-AUC = 0.383. For β_0, there is a location shift between the complete case and the all case, and D-AUC = 0.686. The D-AUC for (β_0, β_1, β_2) jointly is 0.566. The dissonance measure therefore shows that adding the 32.8% of cases with missing x_1 to the complete case does not affect β_1, affects β_2 slightly, affects β_0 moderately, and has a moderate effect on (β_0, β_1, β_2) jointly.
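The two likelihoods can be written as a short log-likelihood function. This is a sketch, not the authors' code: like the likelihoods above it includes no model for the missing indicator r_i, and the three-subject data set and parameter values are made up purely to check the marginalization over x_1i.

```python
import numpy as np
from scipy.special import expit

def loglik(alpha, beta, y, x1, x2, r, all_cases=True):
    """Log-likelihood of the two logistic models; x1 may be NaN where r == 1.

    all_cases=False gives the complete-case likelihood (rows with r == 0 only);
    all_cases=True also adds the missing rows with x1 summed over {0, 1}.
    """
    def bern(v, p):
        return np.where(v == 1, p, 1.0 - p)

    def joint(x1_val):   # p(y | x1, x2, beta) * p(x1 | x2, alpha), row by row
        p_y = expit(beta[0] + beta[1] * x1_val + beta[2] * x2)
        p_x = expit(alpha[0] + alpha[1] * x2)
        return bern(y, p_y) * bern(x1_val, p_x)

    obs = r == 0
    ll = np.sum(np.log(joint(np.where(obs, x1, 0.0))[obs]))
    if all_cases:
        marginal = joint(np.zeros_like(x2)) + joint(np.ones_like(x2))
        ll += np.sum(np.log(marginal[~obs]))
    return float(ll)

# Hypothetical toy data: three children, the third with city of residence missing.
y = np.array([1, 0, 1])
x1 = np.array([1.0, 0.0, np.nan])
x2 = np.array([10.0, 0.0, 5.0])
r = np.array([0, 0, 1])
alpha_, beta_ = (-0.5, 0.05), (-1.0, 0.3, 0.02)
ll_cc = loglik(alpha_, beta_, y, x1, x2, r, all_cases=False)
ll_ac = loglik(alpha_, beta_, y, x1, x2, r, all_cases=True)
```

The difference `ll_ac - ll_cc` is exactly the log of the marginal likelihood contribution of the incomplete row.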

Analysis of Pediatric Cancer Data
Conducting pediatric clinical trials as part of drug development for children and adolescents can be difficult because the often low incidence of the disease makes accrual slow or infeasible. In addition, low morbidity in this population makes it impractical to achieve adequate power. Because pediatric care providers are accustomed to relying on evidence from adult studies, it is natural to consider leveraging information from adult trials. Our measures LCR(α) and D(α) can be very useful in deciding how much to borrow. Here we illustrate this using the pediatric cancer study of Philadelphia chromosome positive chronic myeloid leukemia (Ph+CML). We have adult and pediatric clinical trials of Tasigna (nilotinib) for newly diagnosed Ph+CML in chronic phase. The primary efficacy endpoint was major molecular response (MMR) at 12 months after the start of study medication. MMR was defined as BCR-ABL/ABL ≤ 0.1% (BCR-ABL: breakpoint cluster region-Abelson murine leukemia gene; ABL: Abelson murine leukemia gene) on the international scale, measured by real-time quantitative polymerase chain reaction (RQ-PCR), which corresponds to a reduction of at least 3 logs in BCR-ABL transcript from the standardized baseline. A summary of the two trials is provided in Table 5. Other detailed information can be found at https://www.accessdata.fda.gov/drugsatfda_docs/label/2018/022068s027lbl.pdf.

The adult data are denoted D_A, with sample size n_A = 282 and y_A = 125 subjects achieving MMR. Similarly, the pediatric data are denoted D_P, with n_P = 25 and y_P = 15. We model y_A and y_P with binomial distributions, y_A ∼ Bin(n_A, p) and y_P ∼ Bin(n_P, p), where p is the efficacy in each population. To determine how much information to borrow from the adult data, we use the power prior method (Ibrahim et al., 2015). Letting π(p) denote the initial prior distribution for p, the power prior distribution of p with weight a_0 for the pediatric cancer study is defined as

$$\pi(p \mid D_A, a_0) \propto [L(p \mid D_A)]^{a_0}\,\pi(p), \qquad L(p \mid D_A) \propto p^{y_A}(1-p)^{n_A - y_A},$$

where a_0 controls the weight of the adult data relative to the pediatric data. Our settings are as follows. The range of a_0 is restricted to [0, 1], with a_0 = 0 meaning no incorporation of the adult data. We compare the posterior distribution π(p | D_A, D_P, a_0) with π(p | D_A, D_P, a_0 = 0), i.e., borrowing versus no borrowing (pediatric data only), for various values of a_0. Ideally, the two distributions remain close, so that incorporating the adult data with weight a_0 adds information without distorting the pediatric inference. Here D(α) quantifies the discrepancy between the two distributions, and LCR(α) measures how much information is gained by borrowing from the adult data compared with not borrowing at all.

We plot the posterior densities of p with a_0 = 0.01 or a_0 = 0.04 (borrowing from the adult data) versus a_0 = 0 (no borrowing) (Figure 10). For each value of a_0, we calculate D(α) and summarize the result using D-AUC. As a_0 increases, we gain more information because the variance becomes smaller; however, the location shift also becomes more severe (Figure 11). The difference between the two modes grows with a_0 because the borrowing distribution is pulled toward the adult data (Figure 12(a)). As a result, D-AUC also increases.
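With a Beta initial prior the power-prior posterior is conjugate. As a sketch, assuming π(p) = Beta(1, 1) (an illustrative choice; the initial prior is not recoverable from this section), the posterior of p is Beta(1 + a_0 y_A + y_P, 1 + a_0(n_A − y_A) + n_P − y_P). Assuming further that LCR(α) is the log of the no-borrowing HDR length over the borrowing HDR length, each HDR of this unimodal Beta can be found as its shortest (1 − α) credible interval. The function names are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

N_A, Y_A = 282, 125  # adult trial: sample size, MMR responders (from the text)
N_P, Y_P = 25, 15    # pediatric trial: sample size, MMR responders (from the text)


def posterior(a0, prior_a=1.0, prior_b=1.0):
    """Beta posterior of p given D_A and D_P under the power prior with weight a0.

    ASSUMPTION: initial prior pi(p) = Beta(prior_a, prior_b); the Beta(1, 1)
    default is illustrative and not taken from the paper.
    """
    a = prior_a + a0 * Y_A + Y_P
    b = prior_b + a0 * (N_A - Y_A) + (N_P - Y_P)
    return stats.beta(a, b)


def hdr_length(dist, alpha=0.05):
    """Length of the (1 - alpha) HDR of a unimodal 1-D distribution, found as
    the shortest interval [ppf(u), ppf(u + 1 - alpha)] over u in [0, alpha]."""
    def width(u):
        return dist.ppf(u + 1.0 - alpha) - dist.ppf(u)

    res = optimize.minimize_scalar(width, bounds=(0.0, alpha), method="bounded")
    return float(res.fun)


def lcr_power_prior(a0, alpha=0.05):
    """Hypothetical helper: log(V_no_borrowing / V_borrowing), positive when
    borrowing with weight a0 concentrates the posterior."""
    v0 = hdr_length(posterior(0.0), alpha)
    v1 = hdr_length(posterior(a0), alpha)
    return float(np.log(v0 / v1))


for a0 in (0.01, 0.04):
    print(a0, round(lcr_power_prior(a0), 3))
```

Because the initial prior and the α convention are assumptions here, the printed values need not reproduce the figures reported in the text; the sketch only shows how LCR grows with a_0.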
As the weight a_0 increases, both the effective sample size and LCR increase because more is borrowed from the adult study, which has a larger sample size and a smaller variance (Figure 12(b)). The relationship between LCR and the effective sample size is logarithmic: both distributions are roughly symmetric (Figure 10), so LCR is approximately the log ratio of their standard deviations, while the effective sample size is proportional to the ratio of their variances. Together, the two plots guide the choice of a_0 so that the borrowing distribution remains similar to the pediatric-only distribution while we still gain information from the adult data. For instance, we can choose a_0 = 0.04, which gives D-AUC = 0.487 (slightly less than 0.5), LCR = 0.170, and an effective sample size of 35. The discrepancy is then acceptable, and we gain 100 × (exp(LCR) − 1) ≈ 19% more information by borrowing the equivalent of 10 subjects from the adult study compared with using the pediatric data only.
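The logarithmic relationship can be made explicit with a rough worked approximation. It assumes (our assumption, not a definition from the text) that both posteriors are close to normal, so the HDR length is proportional to the posterior standard deviation, and that the effective sample size satisfies ESS(a_0) ≈ n_P σ_0²/σ_{a_0}², where σ_0 and σ_{a_0} are the posterior standard deviations without and with borrowing (so ESS(0) = n_P = 25).

$$
\mathrm{LCR}(\alpha) \approx \log\frac{\sigma_0}{\sigma_{a_0}} = \frac{1}{2}\log\frac{\sigma_0^2}{\sigma_{a_0}^2} \approx \frac{1}{2}\log\frac{\mathrm{ESS}(a_0)}{n_P}, \qquad \frac{1}{2}\log\frac{35}{25} \approx 0.168, \qquad 100\,\bigl(e^{0.170}-1\bigr) \approx 18.5.
$$

Under these assumptions, ESS = 35 implies LCR ≈ 0.168, close to the reported 0.170, and the reported 19% gain is 18.5% rounded.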

Extension of Concentration Ratio and Dissonance Using Invariant HDR

Invariant HDR
Because they are based on HDRs, our proposed measures are not invariant to reparametrization. Consider, for example, y ∼ Bin(1, θ) with θ ∼ U(0, 1). The posterior density of θ is proportional to θ^y(1 − θ)^{1−y}, so, for a threshold k_α, the posterior HDR for θ is

$$\mathrm{HDR}_\theta(\alpha) = \{\theta : \theta^{y}(1-\theta)^{1-y} \ge k_\alpha\}.$$

For the new parameterization γ = logit(θ), the posterior HDR for γ is

$$\mathrm{HDR}_\gamma(\alpha) = \{\gamma : e^{\gamma(y+1)}/(1+e^{\gamma})^{3} \ge f_\alpha\}.$$

When y = 1, the posterior HDR for θ is one-sided, whereas the posterior HDR for γ is two-sided; thus HDR_γ(α) ≠ logit(HDR_θ(α)). This means that, with the data unchanged, our measures may lead to different conclusions under different parameterizations, which is undesirable. We therefore explore the invariant version of the HDR proposed by Druilhet and Marin (2007).
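The y = 1 case can be checked numerically. This grid-based sketch assumes HDR(α) has posterior probability 1 − α; the grid limits, resolution, and the helper name hdr_on_grid are our own choices. The posterior of θ is Beta(2, 1) with density 2θ, so HDR_θ(α) = [√α, 1]; on the γ scale the same posterior has density 2s²(1 − s) with s = expit(γ), which is proportional to e^{2γ}/(1 + e^γ)^3.

```python
import numpy as np

ALPHA = 0.05


def hdr_on_grid(grid, density, alpha=ALPHA):
    """Grid approximation of the (1 - alpha) HDR {x : density(x) >= threshold}.

    Cells are added from the highest density down until they hold posterior
    mass 1 - alpha. Returns (lower, upper), assuming the HDR is one interval.
    """
    step = grid[1] - grid[0]
    order = np.argsort(density)[::-1]            # highest-density cells first
    mass = np.cumsum(density[order]) * step      # posterior mass accumulated so far
    threshold = density[order][np.searchsorted(mass, 1.0 - alpha)]
    inside = grid[density >= threshold]
    return inside.min(), inside.max()


def logit(p):
    return np.log(p / (1 - p))


# theta scale: posterior Beta(2, 1), density 2 * theta.
theta = np.linspace(1e-6, 1 - 1e-6, 200_001)
lo_t, hi_t = hdr_on_grid(theta, 2 * theta)

# gamma = logit(theta) scale: density 2 s^2 (1 - s), s = expit(gamma).
gamma = np.linspace(-20.0, 20.0, 400_001)
s = 1 / (1 + np.exp(-gamma))
lo_g, hi_g = hdr_on_grid(gamma, 2 * s**2 * (1 - s))

print("HDR_theta:       ", lo_t, hi_t)                 # about [sqrt(0.05), 1]
print("logit(HDR_theta):", logit(lo_t), logit(hi_t))   # upper end limited only by the grid
print("HDR_gamma:       ", lo_g, hi_g)                 # a bounded two-sided interval
```

The logit image of HDR_θ(α) is unbounded above (here cut off only by the grid), while HDR_γ(α) is a bounded interval with a different lower end.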
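Druilhet and Marin (2007) argued that the lack of invariance of HDRs is mainly due to the choice of Lebesgue measure as the dominating measure, and they used the Jeffreys measure, J(dθ) ∝ |I(θ)|^{1/2} dθ with I(θ) the Fisher information, instead. The posterior density with respect to J is π_J(θ | y) ∝ π(θ | y)|I(θ)|^{−1/2}, and HDRs defined from π_J, the Jeffreys HDRs (JHDRs), are invariant under one-to-one smooth reparametrization.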
Note that if I(θ) ∝ 1, the JHDR coincides with the posterior HDR. In the normal location example of Section 4.1, where y ∼ N(μ, σ²) with known variance and μ ∼ N(μ_0, σ_0²), the Fisher information I(μ) = 1/σ² is constant, so the posterior HDR for μ is also the JHDR. Because our measures involve the volume of a JHDR, the following theorem gives an important property of that volume.
Theorem 7.1. The volume of a JHDR is invariant under differentiable reparametrization.
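Theorem 7.1 can be illustrated numerically on the Bernoulli example with y = 1. The grid sketch below assumes that the JHDR collects the points with the largest π_J = π(θ | y)|I(θ)|^{−1/2} until they hold posterior probability 1 − α, and that its volume is the Jeffreys-measure volume ∫ |I(θ)|^{1/2} dθ over the region; the grid choices and the helper name jhdr_on_grid are ours. For γ = logit(θ) the Fisher information transforms to I(γ) = I(θ)(dθ/dγ)² = θ(1 − θ).

```python
import numpy as np


def jhdr_on_grid(grid, post_density, fisher_info, alpha=0.05):
    """Grid approximation of a 1-D JHDR and its Jeffreys-measure volume.

    Keeps the cells with the largest pi_J = post_density / sqrt(I), adding them
    until they hold posterior probability 1 - alpha (an assumed convention).
    Returns (lower, upper, volume), where volume is the integral of sqrt(I).
    """
    step = grid[1] - grid[0]
    root_info = np.sqrt(fisher_info)
    jeffreys_density = post_density / root_info
    order = np.argsort(jeffreys_density)[::-1]         # highest pi_J first
    mass = np.cumsum(post_density[order]) * step       # ordinary posterior mass
    cutoff = jeffreys_density[order][np.searchsorted(mass, 1.0 - alpha)]
    inside = jeffreys_density >= cutoff
    volume = float(np.sum(root_info[inside]) * step)
    return grid[inside].min(), grid[inside].max(), volume


def logit(p):
    return np.log(p / (1 - p))


# theta scale: posterior density 2*theta, Fisher information 1 / (theta (1 - theta)).
theta = np.linspace(1e-6, 1 - 1e-6, 200_001)
lo_t, hi_t, vol_t = jhdr_on_grid(theta, 2 * theta, 1 / (theta * (1 - theta)))

# gamma scale: posterior density 2 s^2 (1 - s), Fisher information s (1 - s).
gamma = np.linspace(-20.0, 20.0, 400_001)
s = 1 / (1 + np.exp(-gamma))
lo_g, hi_g, vol_g = jhdr_on_grid(gamma, 2 * s**2 * (1 - s), s * (1 - s))

print("logit(JHDR_theta):", logit(lo_t), logit(hi_t), " volume:", vol_t)
print("JHDR_gamma:       ", lo_g, hi_g, " volume:", vol_g)
```

In both parameterizations π_J equals 2θ^{3/2}(1 − θ)^{1/2} as a function of θ, so the logit-transformed θ-scale JHDR and the γ-scale JHDR coincide, and the two volumes agree up to grid error.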

Calculating JHDR Volume
We can tailor Algorithm 1 to calculate the volume of a JHDR, with detailed steps given in Algorithm 3.
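As one possible sample-based approach (not necessarily the steps of Algorithm 3, which are not reproduced here), the Jeffreys-measure volume of a JHDR R can be written as a posterior expectation, Vol_J(R) = ∫_R |I(θ)|^{1/2} dθ = E[1{θ ∈ R}/π_J(θ | y)], provided π_J(θ | y) = π(θ | y)|I(θ)|^{−1/2} is built from the normalized posterior density. Under an assumed (1 − α) probability convention, the JHDR threshold is then the α-quantile of π_J over the posterior draws. The name jhdr_volume_mc is hypothetical; the example reuses the Bernoulli case with y = 1, whose Beta(2, 1) posterior can be sampled as √U.

```python
import numpy as np


def jhdr_volume_mc(draws, jeffreys_density, alpha=0.05):
    """Monte Carlo estimate of the Jeffreys-measure volume of the (1 - alpha) JHDR.

    draws: samples from the posterior pi(theta | y).
    jeffreys_density: callable returning pi_J(theta | y) = pi(theta | y) / sqrt(I(theta)),
        built from the NORMALIZED posterior density (required by the identity below).
    Identity: Vol_J(R) = integral over R of sqrt(I) = E_post[1{theta in R} / pi_J(theta | y)].
    """
    pj = jeffreys_density(np.asarray(draws, dtype=float))
    cutoff = np.quantile(pj, alpha)      # {pi_J >= cutoff} holds about 1 - alpha of the draws
    inside = pj >= cutoff
    return float(np.sum(1.0 / pj[inside]) / pj.size)


def bernoulli_jeffreys_density(theta):
    """pi_J for y = 1 under a U(0, 1) prior: 2*theta / sqrt(1 / (theta (1 - theta)))."""
    return 2 * theta * np.sqrt(theta * (1 - theta))


rng = np.random.default_rng(0)
draws = np.sqrt(rng.uniform(size=1_000_000))   # Beta(2, 1) draws by inverse CDF, F(theta) = theta^2
vol = jhdr_volume_mc(draws, bernoulli_jeffreys_density)
print(vol)
```

The estimate should agree with a fine-grid evaluation of ∫ |I(θ)|^{1/2} dθ over the same region up to Monte Carlo error. With only an unnormalized posterior density, the normalizing constant would have to be estimated separately before applying the identity.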

Analysis of Pediatric Cancer Data using JHDR
We apply our measures using JHDR to the pediatric cancer study in Section 6.3. The Fisher information of p is I(p) = n_P p^{−1}(1 − p)^{−1}, and the posterior density of p with respect to J_p is

$$\pi_{J_p}(p \mid D_A, D_P, a_0) \propto \pi(p \mid D_A, a_0)\,L(p \mid D_P)\,|I(p)|^{-1/2} \propto \pi(p)\,[L(p \mid D_A)]^{a_0}\,L(p \mid D_P)\,|I(p)|^{-1/2} \propto \mathrm{Beta}\left(a_0 y_A + y_P + \tfrac{1}{2},\; a_0(n_A - y_A) + (n_P - y_P) + \tfrac{1}{2}\right).$$
The D-AUC and LCR results based on JHDR are shown in Figure 13. They are quite similar to those in Figure 12, with slight changes: for a_0 = 0.04, D-AUC = 0.473 and LCR = 0.193.

Discussion
We propose two new Bayesian measures, LCR(α) and D(α), for quantifying changes in scale and in location, respectively. We provide algorithms for computing both measures based on volumes of HDRs and show that both measures can be calibrated. Three examples illustrate how LCR(α) and D(α) are useful in different areas: comparing two similar studies, measuring the information in missing data, and leveraging adult data in a pediatric cancer study.
We demonstrate that our measures can be computed and calibrated for approximately normal distributions of moderate dimension. For high-dimensional non-normal distributions, finding efficient algorithms for evaluating HDR volumes remains challenging.