A Bayesian Analysis of Two-Stage Randomized Experiments in the Presence of Interference, Treatment Nonadherence, and Missing Outcomes

Three critical issues for causal inference that often occur in modern, complicated experiments are interference, treatment nonadherence, and missing outcomes. A great deal of research efforts has been dedicated to developing causal inferential methodologies that address these issues separately. However, methodologies that can address these issues simultaneously are lacking. We propose a Bayesian causal inference methodology to address this gap. Our methodology extends existing causal frameworks and methods, specifically, two-staged randomized experiments and the principal stratification framework. In contrast to existing methods that invoke strong structural assumptions to identify principal causal effects, our Bayesian approach uses flexible distributional models that can accommodate the complexities of interference and missing outcomes, and that ensure that principal causal effects are weakly identifiable. We illustrate our methodology via simulation studies and a re-analysis of real-life data from an evaluation of India's National Health Insurance Program. Our methodology enables us to identify new active causal effects that were not identified in past analyses. Ultimately, our simulation studies and case study demonstrate how our methodology can yield more informative analyses in modern experiments with interference, treatment nonadherence, missing outcomes, and complicated outcome generation mechanisms.


Introduction
Causal inference is a fundamental consideration across a wide range of domains in science, technology, engineering, and medicine (Pearl, 2009;Imbens and Rubin, 2015). A traditional gold standard for performing causal inference is the classical randomized experiment (Rubin, 2008;Imbens and Rubin, 2015). In this type of experiment, a great deal of control and precautions can be taken so as to eliminate events that would introduce instabilities and biases in causal inferences. However, modern experiments, e.g., social experiments, can become so complicated that it may be difficult to institute such control and precautions. Three significant sources of complications that are increasingly arXiv: 2110.10216 * Purdue University, West Lafayette, IN, USA, yohnishi@purdue.edu sabbaghi@purdue.edu of interest are interference among experimental units, nonadherence/noncompliance of experimental units to their assigned treatments, and unintended missing outcomes of experimental units. Interference exists if the outcome of an experimental unit depends not only on its assigned treatment, but also on the assigned treatments for other units. It arises when limited controls are placed on the interactions of experimental units with one another, or when competition for a limited set of resources exists, during the course of an experiment. Treatment nonadherence frequently occurs in human subject experiments, as it can be unethical to force an individual to take their assigned treatment. Clinical trials in particular typically have subjects that do not adhere to their assigned treatments due to adverse side effects or intercurrent events (Little et al., 2012). Missing outcomes commonly occur in human studies. For example, respondents may refuse to report sensitive outcomes (e.g., income) after receiving a treatment in a study (Little and Rubin, 2002, p. 3). Failing to account for interference, nonadherence, and missing outcomes in modern experiments will generally yield unstable and biased inferences on treatment effects.
A great deal of research efforts has been dedicated over the past three decades to developing causal inferential methodologies that address the first two issues separately. Hudgens and Halloran (2008) first introduced the concept of the two-stage randomized design for performing causal inference in the presence of interference when all experimental units adhere to their assigned treatments. In this design, experimental units belong to clusters, and randomizations are performed at both the cluster-level and experimental unit-level. Specifically, the clusters are first randomly assigned different probabilities for treatment assignment of their constituent experimental units, and then treatments are randomly assigned to the units within the clusters (with treatment assignment performed independently across clusters) based on the clusterspecific assignment probabilities. For example, in our case study, villages correspond to the clusters, and each household within a village corresponds to an experimental unit. In this manner, we can analyze the effectiveness of an insurance plan implemented in India under the two-stage randomized design. Hudgens and Halloran (2008) demonstrated how both direct treatment effects and indirect treatment effects (i.e., those effects that can be attributed to the treatments received by other units) can be inferred under this design in the presence of interference. This design and the corresponding causal inference methods that can be performed under it have been further studied by VanderWeele and Tchetgen (2011), Tchetgen and Vanderweele (2012), Liu and Hudgens (2014), and Basse and Feller (2018). For experiments that have treatment nonadherence but not interference, one standard methodology that has been considered for their analyses is the intention-to-treat (ITT) method. Under this approach, the treatment received by an experimental unit is ignored, and instead only the treatment assigned is considered. This method follows the traditional principle of analyzing an experiment according to its physical randomization (Cox and Reid, 2000, p. 14) mechanism, and can yield valid causal inferences on the effects of treatment assigned in certain situations. However, the ITT method will generally yield biased inferences on the effects of treatment received because it does not account for the latent stratification of experimental units defined according to their adherence behaviors to the different treatment assignments, and consequently does not provide inferences for the target stratum of compliers. Angrist et al. (1996) and Imbens and Rubin (1997) developed a framework to provide a more principled approach for the analyses of experiments with nonadherence, and Frangakis and Rubin (2002) extended those frameworks to develop the more general principal stratification framework. This framework has been applied to a wide variety of real-life problems involving complications such as censoring or truncation due to death (Zhang and Rubin, 2003) and the occurrence of intermediate variables that are thought to mediate the effects of treatments on the outcome (Gallop et al., 2009). VanderWeele (2011) provides a detailed review of principal stratification.
Recently, methodologies have been developed to address combinations of interference, nonadherence, and missing outcomes, but not all of these issues simultaneously. Under the assumption of fully observed outcomes, Imai et al. (2021) presented a nonparametric identification of the complier average direct and indirect effects, and proposed consistent estimators for them under the two-stage randomized design in the presence of interference and nonadherence. They derived large-sample nonparametric bounds for the causal effects, but their results are not necessarily valid for the finitesample regime. In addition, the estimators of Imai et al. (2021) are sensitive to outliers. Vazquez-Bare (2022) built on the work of Imai et al. (2021) and analyzed spillover effects using instrumental variables in the two-stage randomized experiment with fully observed outcomes. They considered the identification of causal direct and spillover effects under one-sided noncompliance and demonstrated that these effects can be estimated using two-stage least squares (2SLS). However, their methods do not work in general under two-sided noncompliance or when units have multiple peers without a strong structural assumption about peers' compliance types. Kang and Imbens (2016) considered the peer encouragement design to study network treatment effects when treatment randomization cannot feasibly be forced on experimental units, and presented identification results only for the case of one-sided noncompliance. Forastiere et al. (2016) developed a Bayesian principal stratification method for causal inference in clustered encouragement designs (CEDs), where the assignment of treatment encouragement is performed at the cluster level. The CED is effectively a special case of the two-stage randomized experiment with no randomization within clusters, i.e., with treatment assignment probability for units within clusters being either zero or one. As there is no randomization performed at the unit level within clusters, their methodology cannot capture the complexity of adherence behaviors in the two-stage experiments, where units could possibly change their adherence behaviors depending on what assignment probabilities their clusters are assigned to. It is important to note that none of the above methods can easily accommodate missing outcomes in the presence of interference and nonadherence. Experiments with treatment nonadherence and missing outcomes have been analyzed by Frangakis and Rubin (1999), Mattei and Mealli (2007), Frumento et al. (2012), and Mattei et al. (2014), but none of these studies involved interference.
We develop a new Bayesian causal inferential methodology for two-stage randomized experiments with interference, noncompliance, and missing outcomes. Our methodology utilizes the principal stratification framework to address the identification issues arising from treatment nonadherence and missing outcomes in the presence of interference. To the best of our knowledge, none of the existing causal inference methods have applied Bayesian principal stratification to two-stage randomized designs and addressed these issues simultaneously as our method does. Not only does our Bayesian approach provide a principled framework to analyze two-stage randomized experiments in the presence of all of these complications, but it clarifies what can be learned when causal estimands are not identifiable but are instead weakly identifiable (i.e., when the likelihood functions of parameters and causal estimands have substantial regions of flatness). It is important to recognize that issues of identifiability under the Bayesian paradigm are distinct from those under the frequentist paradigm because the specification of proper prior distributions always yields proper posterior distributions (Imbens and Rubin, 1997). In particular, our Bayesian method enables us to infer principal causal effects under two-sided nonadherence without the need for strong structural assumptions, and define new types of interpretable and informative causal estimands (such as the complier spillover and overall treatment effects) that would be of great interest to policy makers. Furthermore, our methodology highlights new assumptions on compliance types and missingness mechanisms for making causal inferences more efficient and stable in such complicated experiments. The use of Bayesian models in our methodology enables us to naturally accommodate complicated outcome generation mechanisms that frequently occur in modern experiments, such as heavy-tailed, skewed, zero-inflated, and/or multimodal outcome distributions.
We proceed in Section 2 to review the Rubin Causal Model (Rubin, 1974;Holland, 1986), the principal stratification framework, and relevant assumptions and causal estimands for the two-stage randomized experiment. Section 3 introduces the models and computational algorithms involved in our Bayesian methodology. In Section 4 we perform extensive simulation studies to investigate the frequentist performance of our Bayesian method for a heavy-tailed distribution with an excess of zero outcomes. These simulation studies effectively validate our methodology in a situation where the existing frequentist approach (Imai et al., 2021) performs poorly in terms of bias and mean squared error (MSE). Finally, in Section 5 we apply our methodology to the reallife data from the evaluation of India's National Health Insurance Program (RSBY) (Nandi et al., 2015). In our analysis of this case study we are able to uncover more definitive evidence of causal effects that were previously found to be insignificant in past analyses. Our concluding remarks are in Section 6.

Two-Stage Randomized Experiments with Interference and Noncompliance
Throughout this manuscript we consider two-stage randomized experiments involving two treatments and J clusters, with N j experimental units in cluster j = 1, . . . , J (each experimental unit belongs to only one cluster). We let N = J j=1 N j denote the total number of experimental units. The assignment mechanism in the two-stage randomized experiment is performed sequentially, with each stage involving a type of completely randomized design. In the first stage, J 1 clusters are randomly chosen to have a treatment assignment probability of a 1 ∈ (0, 1) for their constituent experimental units, and the remaining J − J 1 clusters have a treatment assignment probability of a 0 ∈ (0, 1) for their units. For each j = 1, . . . , J we let A j be the indicator for whether cluster j was assigned a 1 (A j = 1) or a 0 (A j = 0). We let A = (A 1 , . . . , A J ) T , and without loss of generality we let a 1 > a 0 . In the second stage, the experimental units within the clusters are randomly assigned treatment and control according to the treatment assignment probabilities assigned to their clusters, with the treatment assignment performed independently across clusters. For each cluster j with A j = 1, N j a 1 of their experimental units are randomly assigned treatment and the remaining N j (1 − a 1 ) are assigned control. Similarly, for each cluster j ′ with A j ′ = 0, N j ′ a 0 of their experimental units are randomly assigned treatment and the remaining N j ′ (1 − a 0 ) are assigned control. We assume that N j a 1 and N j a 0 are integers for all j = 1, . . . , J. We let Z i,j denote the treatment assignment indicator for unit i in cluster j, with Z i,j = 1 if it is assigned treatment and Z i,j = 0 otherwise. We let Z j = (Z 1,j , . . . , Z Nj,j ) T denote the vector of treatment assignment indicators for all units in cluster j, and Z −i,j denote the subvector of Z j with the ith entry removed. Other assignment mechanisms for two-stage randomized experiments are provided by VanderWeele and Tchetgen (2011), but we do not consider them here.
As we consider two-stage randomized experiments with interference, nonadherence and missingness under the Rubin Causal Model, we must introduce two types of potential outcomes for the treatment received by an experimental unit and the final outcome of interest that are functions of the experimental units' treatment assignments. We let D i,j (z) denote the treatment received for unit i in cluster j under treatment assignment z ∈ {0, 1} N , D j (z) = (D 1,j (z), ..., D Nj ,j (z)) T be the vector of treatments received by the units in cluster j, and D(z) = (D 1 (z), . . . , D J (z)) T . Furthermore, we let Y i,j (z, D(z)) denote the potential outcome for unit i in cluster j under treatment assignment vector z and treatment received vector D(z). Although the potential outcomes can be written solely as a function of z (because D(z) is a function of z) we include D(z) in the notation for Y i,j (z, D(z)) to emphasize the existence of nonadherence. We let D and Y denote the matrices containing all the potential values of treatment receipts and outcomes, respectively, for all experimental units across all treatment assignments.
Finally, we let M i,j (z) denote the missingness indicator for the realized outcome of unit i in cluster j under treatment assignment z ∈ {0, 1} N , M j (z) = (M 1,j (z), . . . , M Nj,j (z)) T be the vector of missingness indicators of the units in cluster j, and M(z) = (M 1 (z), . . . , M J (z)) T . We note that Y i,j (z, D(z)) is observed when M i,j (z) = 0, and is missing otherwise. It is important to distinguish between missing potential outcomes and unrealized potential outcomes. For a given treatment assignment vector z that was realized in an experiment, the potential outcomes Y i,j (z ′ , D(z ′ )) for any other distinct treatment assignment vector z ′ are referred to as unrealized potential outcomes. Also, for a given treatment assignment vector z that was realized in an experiment, if M i,j (z) = 1 then Y i,j (z, D(z)) is not observed, although the potential outcome was realized in the experiment.
Besides the treatments, potential outcomes, and missingness indicators, we assume covariates are measured for the experimental units. These covariates are either measured prior to treatment assignment, or are measured afterwards but are not affected by treatment assignment. We denote the vector of covariates for unit i in cluster j by X i,j .

Assumptions on the Structure of Interference
We extend the assumptions proposed by (Imai et al., 2021) for the two-stage randomized experiment with nonadherence. We first invoke the partial interference assumption in which units in different clusters do not interact or affect one another. This assumption was first formulated by Hudgens and Halloran (2008), and was extended to the noncompliance setting by Imai et al. (2021). Partial interference facilitates causal inference in our setting of interest because an experimental unit i's treatment received, missingness indicator will only be functions of treatment assignments for other units within the same cluster j as unit i.
The next assumption that we consider is the stratified interference assumption of Hudgens and Halloran (2008). This assumption imposes further structure on interference by having the treatment received, the missingness indicator, and the potential outcome for an experimental unit being a function of just the number of experimental units assigned treatment within the same cluster. This assumption was also considered by Forastiere et al. (2016) and Imai et al. (2021). It is important to recognize that a great deal of work has been conducted to move beyond this condition and consider more flexible structures of interference (Aronow, 2012;Manski, 2013;Basse and Airoldi, 2018b;Aronow and Samii, 2017;Baird et al., 2018;Basse and Airoldi, 2018a;Athey et al., 2018;Basse et al., 2019;Leung, 2020;Forastiere et al., 2021;Sävje et al., 2021). However, none of these works considered noncompliance and missing outcomes.

Assumption 2. For a cluster j and experimental unit
We invoke the same assumptions on outcomes.
Assumption 3. For all z, z ′ ∈ {0, 1} N such that z j = z ′ j for a cluster j, then Y i,j (z, D(z)) = Y i,j (z ′ , D(z ′ )) for all experimental units i in cluster j.

Assumption 4. For a cluster j and experimental unit
Finally, we assume the exclusion restriction with interference between units in the two-stage randomized experiment. In this case, the potential outcome of a unit i in cluster j only depends on the treatments received by units within cluster j.

Assumption 5. For any treatment assignment vectors
These five assumptions imply that for two-stage randomized experiments in which the number of treated units within any cluster is fixed by design, the potential outcomes and missingness indicator for an experimental unit are a function of its own treatment assignment and the treatment assignment probability for its cluster. We accordingly slightly abuse the notation to write D i,j (z), M i,j (z) and Y i,j (z, D(z)) as D i,j (z, a), M i,j (z, a) and Y i,j (z, a), respectively, where z denotes the treatment assignment for the experimental unit and a denotes the treatment assignment probability for the unit's cluster.

Principal Strata, Monotonicity, and the Exclusion Restriction for Two-Stage Randomized Experiments
Under the principal stratification framework, we stratify the experimental units according to their values of D i,j (z, a) under the different possible treatment assignments z ∈ {0, 1} and treatment assignment probabilities a ∈ {a 0 , a 1 } for the clusters. There exist four such potential values: D i,j (0, a 0 ), D i,j (1, a 0 ), D i,j (0, a 1 ), and D i,j (1, a 1 ). A unique feature of our consideration of nonadherence for the two-stage randomized design is that, according to Assumption 2 and 4, units can have different compliance behaviors under different assignment probabilities for their clusters. We formally define the compliance behavior of unit i in cluster j under each treatment assignment probability a ∈ {a 0 , a 1 } for the cluster as where n, c, d, and a denote never-takers, compliers, defiers, and always-takers, respectively. Finally, the compliance behavior for unit i in cluster j is defined according to the pair of compliance indicators G i,j = (G i,j (a 0 ), G i,j (a 1 )).
A standard assumption for the compliance behavior is monotonicity.
This assumption was also considered by Imai et al. (2021) and Forastiere et al. (2016). Monotonicity eliminates the possibility of defiers under either treatment assignment probabilities a 0 and a 1 . It also reduces the number of principal strata from sixteen to nine.
In addition to this existing monotonicity assumption, we also consider the following new assumption on compliance behaviors across the different treatment assignment probabilities that can be assigned to the clusters.
Assumption 7. The set {n, c, a} is a partial order set. For a 0 < a 1 , units whose clusters have been assigned a 1 would have a non-strictly lower compliance type of {n, c, a} than the current compliance type if their clusters were assigned a 0 . Units whose clusters have been assigned a 0 would have a non-strictly greater compliance type of {n, c, a} than the current compliance type if their clusters were assigned a 1 .
This assumption further reduces the number of principal strata. To see this, we first recognize that there are only six possible definitions of partial orders for {n, c, a}: n ≤ c ≤ a, n ≤ a ≤ c, c ≤ n ≤ a, c ≤ a ≤ n, a ≤ n ≤ c, and a ≤ c ≤ n. We adopt the partial orders n ≤ c ≤ a as well as a 0 ≤ a 1 throughout. These correspond to units being more likely to take treatment if a larger proportion of their neighbors take treatment. Alternatively, if a cluster is assigned a 1 then its units are more likely to receive treatment, no matter what the units are assigned, compared to the case if the cluster is assigned a 0 . The combination of Assumptions 6 and 7 thus reduces the number of principal strata to six: {(n, n), (c, c), (a, a), (n, c), (n, a), (c, a)}.
Our definition of principal strata based on Assumptions 6 and 7 is more general than existing definitions. For example, Imai et al. (2021) considered the monotonicity assumption with respect to the treatment assignment probability. Their assumption corresponds to the partial order n ≤ c ≤ a but can not express other orderings of monotonicity, e.g., a ≤ c ≤ n. Forastiere et al. (2016) only defined three principal strata based on the treatment uptake status, excluding defiers. Also, Vazquez-Bare (2022) defined five principal strata by posing monotonicity directly on treatment received, which led to the removal of strata (n, a) from consideration. These distinctions exist because we consider the two-stage randomized design, and we define the monotonicity assumptions on the compliance behaviors with respect to the treatment probabilities for clusters. Forastiere et al. (2016) considered clustered encouragement designs (CEDs) in which encouragement is randomized at the level of clusters but with no randomization carried out within clusters. In contrast, the two-stage randomized design has encouragement randomized at the level of units within clusters, with clusters assigned a treatment probability that governs the proportion of treated units within the cluster. The latter design generates more complicated structures for the units' compliance behaviors because some units might behave differently based on the treatment probabilities assigned to their clusters. This distinction is critical because of our need to capture behavioral shifts of units between such treatment probabilities, and because of the direct and spillover causal estimands of interest that are defined in Section 2.4. Our definitions of the monotonicity assumptions provide more flexible orderings of compliance behaviors than that of Vazquez-Bare (2022) because we permit more relevant orderings.
In addition to monotonicity, we also consider the exclusion restriction for certain principal strata in the case of nonadherence.
Assumption 8. For any unit i = 1, . . . , N j in cluster j = 1, . . . , J: This assumption captures the idea that treatment assignment has no effect on the outcome if the unit is either an always-taker or a never-taker under each treatment assignment probability. It also corresponds to Assumption 5, from which the outcome of an experimental unit is determined only through the treatment received by units within the same cluster. This assumption is equivalent to a modified form of the restricted interference assumption of Imai et al. (2021) that is stated in terms of principal strata.

Direct, Spillover, and Overall Causal Estimands
We consider the finite-population framework for causal estimands under the Rubin Causal Model. In this case, estimands are defined in terms of comparisons of potential outcomes for the N experimental units. Our approach for defining direct, spillover, and overall causal estimands in two-stage randomized experiments with interference and nonadherence follows that of Hudgens and Halloran (2008) and Imai et al. (2021).
To simplify our expressions of the causal estimands, we first write the treatment received and potential outcome for unit i in cluster j under treatment assignment z as D i,j (z) and Y i,j (z, D(z)), respectively. We let K j (a) denote the number of treated units in cluster j under treatment assignment probability a ∈ {a 0 , a 1 }, and Z −i,j denote the set of all subvectors of z j ∈ {0, 1} Nj with the ith element removed such that One set of causal estimands that we consider are the unit-level direct intentionto-treat (ITT) effects of treatment assignment on treatment received and the final outcome under treatment assignment probability a ∈ {a 0 , a 1 }, i.e., , respectively. These estimands capture the adherence behavior and changes in the final outcome under the same treatment assignment probability a when unit i in cluster j is assigned treatment as opposed to control. We average the unit-level effects to define the clusterlevel and finite population-level ITT effects as ITT D,·,j (a) = Another causal estimand of interest in the presence of interference is the spillover (indirect) effect of treatments assigned to other units on the potential outcomes for a particular experimental unit. Following Imai et al. (2021) we define unit-level spillover effects on treatment receipt and outcome as S D,i,j (z) =D i,j (z, a 1 ) −D i,j (z, a 0 ) and S Y,i,j (z) =Ȳ i,j (z, a 1 ) −Ȳ i,j (z, a 0 ). In these two estimands, we consider the average potential outcomes corresponding to the two different treatment assignment probabilities a 0 and a 1 for cluster j but the same treatment assignment z for unit i in cluster j. These estimands quantify the intuition that, if differences exist in potential outcomes under the same treatment assignment z, then they can be attributed to the first stage of the experiment that governs the proportion of treated units in each cluster. This is because under Assumption 2 and 4 the treated units in a particular unit i's cluster are the only factor besides the assigned treatment that can affect unit i's outcomes, and so the S D,i,j (z) and S Y,i,j (z) can be reasonably regarded as spillover effects. We define cluster-level and population-level spillover effects by averaging the S D,i,j (z) and S Y,i,j (z) across i and j, respectively.
The final estimand that we consider is the overall effect of the first stage of the experiment, i.e., the effect of having treatment assignment probability a 1 versus a 0 for a cluster. This effect is usually of greatest interest for policy makers. For example, infectious disease experts may be interested in comparing infection rates under two different vaccine allocation plans (e.g., 40% and 80%) within each cluster. We define unit-level overall effects of the first stage on treatment receipt and outcome and Z j is a set of all possible assignment vectors z j . We have thatȲ i,j (a) is effectively the average value of the individual's outcome under the treatment assignment probability a ∈ {a 0 , a 1 }. It is decomposed intō VanderWeele and Tchetgen (2011) provided the same decomposition but with a slightly different proof. The overall effect is expressed as the sum of the spillover effect and the contrast of two ITT effects under the treatment assignment probability a 1 and under the treatment assignment probability a 0 , each of which are multiplied by the proportion of treated units under that assignment probability.

Principal Causal Estimands
In addition to defining direct, spillover, and overall causal estimands, we define new principal causal estimands. These estimands extend those in Section 2.4 to account for compliers under the different treatment probabilities a 0 and a 1 that can be assigned to clusters. Imai et al. (2021) used Assumption 2 and 4 to define the complier average direct effect under treatment probability a ∈ {a 0 , a 1 } as where ½(·) denotes the indicator for an event. We define a new complier average direct effect by generalizing the estimand in equation (2.2) to account for how experimental units comply under the different treatment probabilities assigned to clusters. This new definition is motivated by the practical interest in modifying the effect of a 1 to account for the units who would comply under a 0 , and vice versa. To define this new complier average direct effect, we introduce the notions of the "base cluster assignment" as the treatment probability for a cluster under which compliers are defined, and the "target cluster assignment" as the treatment probability for a cluster under which causal effects are considered. It is important to separate the base from the target cluster assignments because the treatments received by units are influenced by the treatments assigned to other units, and because compliance behaviors vary with the treatment probabilities a 0 and a 1 . For a target cluster assignment a and base cluster assignment a ′ , our new population-level complier average direct effect is defined as The complier average spillover effect for treatment z defined by Imai et al. (2021) is the ratio of ( 2.4) The estimand in equation (2.4) is interpreted as a local average treatment effect for units who comply with the treatment probability assigned to the cluster. These estimands are not clearly interpretable as complier spillover effects because the compliance is defined for the compliance behaviors with respect to treatment probability, not the actual treatment assignment. We define a new complier average spillover effect using base and target cluster assignments. For treatment z and base cluster assignment a ′ , we define CASE(z, a ′ ) as the ratio of . Similar to the population-level spillover effect on outcome defined in Section 2.4, CASE(z, a ′ ) captures the compliers' local average spillover effect by comparing the two sets of potential outcomes under the two cluster assignments a 0 and a 1 and treatment z. In this estimand, the compliers are defined under the base cluster assignment a ′ , and it is interpretable as an average effect for the compliers of the treatment received by others in the same cluster.
We finally define the finite-population complier average overall effect by combining equation (2.1) with the unit-level complier average overall effect under the base cluster assignment a ′ , {Ȳ i,j (a 1 )−Ȳ ij (a 0 )}½(G i,j (a ′ ) = c), and decomposing the effect according to the sum of the unit-level complier average direct and spillover effects. Specifically, we have (2.5) The population-level effect is then the average of the unit-level effects in equation (2.5) over compliers under the base cluster assignment a ′ , i.e.,

Overview of Methodology
Our Bayesian methodology for performing causal inferences on two-stage randomized experiments with interference, nonadherence, and missing outcomes is based on modelbased imputation of missing and unrealized potential outcomes to derive the posterior distributions of the finite-population causal estimands. To describe this more formally, we let τ denote one of the causal estimands from Sections 2.4 and 2.5, X the N × P matrix of covariates for all experimental units, A o the N × 1 vector of treatment probabilities that were assigned to the clusters in the experiment, Z o the N × 1 vector of treatments that were assigned to the units in the experiment, D o the N × 1 vector of treatments that were received by the units in the realized experiment, M o the N ×1 vector of missingness indicators for the outcomes in the realized experiment, and Y obs the vector of observed outcomes in the realized experiment. Then the Bayesian methodology calculates the distribution p τ | X, A o , Z o , D o , M o , Y obs . As causal estimands are functions of observed, missing, and unrealized potential outcomes, we calculate the posterior distribution above by integrating over the unrealized treatments received, which we denote by D u , and both the missing and unrealized potential outcomes, which we denote by Y mis and Y u respectively, according to and G is an N × 1 vector of (latent) principal strata memberships for the experimental units. This motivates our Bayesian methodology of first deriving a Monte Carlo approximation for the posterior predictive distribution of the missing principal strata and outcomes conditional on the observed data, and then using that posterior predictive distribution to derive a Monte Carlo approximation for the posterior distribution of τ . Our methodology is distinct from existing methods for two-stage randomized experiments in that we impute missing principal strata memberships and at most four missing/unrealized potential outcomes for each experimental unit according to the assumptions in Section 2.
Our imputation approach, and the latent ignorability assumption that underlies it, is in Section 3.2. In Section 3.3 we provide additional details on our Bayesian modeling approach and the unconfoundedness condition underlying it, which is justified by the design of the two-stage randomized experiment. The Gibbs sampling algorithm that we use to derive a Monte Carlo approximation to the posterior distribution of the causal estimand is in Section 3.4. We provide a justification of exchangeability in our Bayesian methodology in the appendix.

Imputation of Missing Values
By virtue of Assumptions 6 and 7, any experimental unit in the two-stage randomized experiment belongs to one of six principal strata. Table 2 illustrates the correspondence between the possible values of A o j , Z o i,j , and D o i,j and the principal strata memberships, along with the missingness rates for the outcomes in the RSBY study that we analyze in Section 5. We let G(a, z, d) denote a set of possible principal strata for units with Assumption 7 in particular helps to narrow down the possible strata that a unit could belong to based on the observed data. For example, if an experi- a)}, and so G i,j = (a, a). Once G i,j is fixed, we can immediately impute all unrealized D u i,j . We also observe in Table 2 that different strata, e.g., (a, a) and (n, n), exhibit different missingness rates. This corresponds to more general phenomenon in modern experiments in which different strata have different missing data mechanisms for the final outcomes. This phenomenon is intuitive for certain strata. For example, those units in the (n, n) strata may be more reluctant to take any actions compared to units in other strata, and thus their outcomes are more likely to be missing. As the missingness mechanism depends on the latent strata, the traditional missing completely at random (MCAR) and missing at random (MAR) assumptions may not be appropriate. In contrast to other methods, our Bayesian methodology accommodates a class of missing not at random (MNAR) mechanisms (Little and Rubin, 2002, p. 351) that correspond to the following latent ignorability (Frangakis and Rubin, 1999, LI) assumption.
Assumption 9 (Latent Ignorability of Missing Data). For any unit i = 1, . . . , N j in cluster j = 1, . . . , J, We formulate LI to mean that if we knew the latent strata G i,j of each unit, the missingness mechanism would be ignorable. Assumptions 8 and 9 facilitate and unify our We provide examples of the number of units and the missingness rates of the outcomes for these different sets of principal strata based on data from the RSBY study in Section 5.
imputations of missing and unrealized potential outcomes. For those units with M o i,j = 0, as one of their potential outcomes is observed we need to impute at most three unrealized potential outcomes for each of them. In the case of a unit whose principal stratum is (c, c), we need to impute three unrealized potential outcomes based on our Bayesian model. For another unit whose principal stratum is (a, a), we only need to impute one unrealized potential outcome, as the potential outcomes are restricted according to Assumption 8 for this stratum. For a unit with M o i,j = 1, we impute at most four potential outcomes based on their imputed principal strata G i,j .

Bayesian Model Building
Following the Bayesian paradigm of Imbens and Rubin (2015) and Gelman et al. (2013), five inputs are necessary for deriving the posterior distributions of the causal estimands. The first input is knowledge of the assignment mechanisms as encoded in the probability mass functions p (A | X) and p (Z | X, A). These two probability mass functions are obtained immediately by virtue of the known design of the two-stage randomized experiment. In addition, these two probability mass functions are independent of the experimental units' covariates X, so that p (A | X) = p (A) and p (Z | X, A) = p (Z | A). The second input is the model for treatment received conditional on X, A, Z. We denote this input by the probability mass function p (D | X, A, Z, ψ) with parameter vector ψ. An equivalent input is the model for principal strata memberships of all experimental units, G, conditional on X, A, Z, because there is a one-to-one mapping between G and D. To simplify our notation we denote this input as p (G | X, A, Z, ψ) with the understanding that ψ is generic notation for the parameter vector associated with either of these two models. The third input is the model for the potential outcomes conditional on X, A, Z, D. We denote this model via a probability density/mass function p (Y | X, A, Z, D, θ) with parameter vector θ. This model can also be equivalently formulated using G as p (Y | X, A, Z, G, θ). The fourth input is the model for the missingness indicators M conditional on X, A, Z, D, Y, denoted by p (M | X, A, Z, D, Y, φ) with parameter vector φ. Under the LI assumption, this model will not involve Y con-ditional on the other variables. The final input is the joint prior distribution for ψ, φ, and θ, denoted by p (ψ, φ, θ). We assume throughout that ψ, φ, and θ are distinct and do not share any common parameters.
Specifying the second and third inputs is facilitated by the fact that the two-stage randomized experiment has an unconfounded assignment mechanism at both cluster and individual experimental unit levels. Unconfoundedness is formally expressed via the following assumption.
Assumption 10 implies that the treatment assignment mechanism is ignorable. In addition, unit exchangeability implies by de Finetti's Theorem that we can specify parametric models on the level of the individual experimental units for the principal strata memberships G i,j , the potential outcomes, and the missingness indicators to derive the joint model for G, M, and Y. A justification of unit exchangeability in the two-stage randomized experiment is in the appendix. Hence we have that (3.1) The last line in equation (3.1) follows from Assumption 9. The posterior distribution are a, z, d, and m, respectively. For each unit i in cluster j and each principal strata be the probability mass/density function of Y i,j (z, a) for g ∈ {(n, n), (c, c), (a, a), (n, c), (n, a), (c, a)}. Under Assumptions 6 -8, the posterior distribution of ψ, φ, and θ can be written as (3.2) Our model-based Bayesian method ultimately utilizes the above mixture model that requires the specification of w . Specific models and prior distributions are provided in our simulation studies in Section 4 and our case study in Section 5. Our Bayesian methodology also enables us to infer principal causal effects under twosided noncompliance, whereas existing methods require additional strict assumptions to perform inferences for principal causal effects under two-sided noncompliance including always-takers and social-interaction compliers (i.e., the strata (a, a) and (c, a)) (Vazquez-Bare, 2022).

Gibbs Sampling Algorithm
We utilize the Gibbs sampling algorithm (Geman and Geman, 1984;Gelfand and Smith, 1990;Imbens and Rubin, 1997) to obtain the posterior predictive distributions of Y mis and Y u given the observed data, so as to impute all missing and unrealized outcomes and derive the posterior distributions for the causal estimands. Specifically, we iterate between drawing from the conditional distributions of (ψ, φ, θ) and G given the other variables, respectively. Then for each iteration we impute Y mis and Y u , and calculate the causal estimands of interest to effectively get a draw from their respective posterior distributions. The essential algorithm is outlined below.
This Gibbs sampler enables us to obtain posterior draws of all the unobserved potential outcomes and principal strata for all units. We then perform causal inferences by means of the posterior distributions. For example, points estimates of the estimands can be obtained via their posterior means or medians, and intervals for the estimands can be obtained via the central credible intervals. Detailed derivations for each step of the Gibbs sampler are provided in the supplementary material.

Evaluation Metrics and Data Generating Mechanisms
We evaluate the frequentist properties of our Bayesian methodology with respect to those of the method of Imai et al. (2021), which we implement using its R package experiment (Imai et al., 2019). Imai et al. (2021) implicitly assumed MCAR data in their analyses, and so removed rows with missing outcomes from their analyses because their methodology does not accommodate missing outcomes. Therefore, to ensure fair comparisons, in our first simulation studies we have either no missing data or MCAR data. Our Bayesian methodology easily accommodates the MCAR mechanism without major modifications, as this would just entail removing units that have missing outcomes and ignoring the missing data mechanism model in the derivations of the likelihood functions and posteriors.
The evaluation metrics that we consider are bias and mean square error (MSE) in estimating a causal estimand, coverage of an interval estimator for a causal estimand, and the interval length. Bias and MSE are generally defined as The median is desirable as it is more robust for the chosen data generating mechanism. The data generating mechanisms that we consider in our study are specified to simulate data resembling those in our case study in Section 5.
The potential outcomes in our simulation study are continuous and are generated to have a right-skewed distribution with an excess of zeros. The specific generation mechanism that we implement is a zero-inflated Log-Normal distribution, with the parameters of the underlying Bernoulli and Log-Normal random variables (representing the excess zeros and heavy tail of the outcomes, respectively) specified to be distinct for each strata and treatment. More formally, for a ∈ {a 0 , a 1 } and z ∈ {0, 1}, the potential outcomes for unit i in cluster j are generated by first sampling W i,j (z, a) ∼ Bernoulli(p z,a,Gi,j ), then samplingỸ i,j (z, a) ∼ Log-Normal(µ z,a,Gi,j , σ 2 z,a,Gi,j ), and finally generating Y i,j (z, a) = {1 − W i,j (z, a)}Ỹ i,j (z, a). For this simulation study, we assume that potential outcomes are generated independently of one another. In addition, Assumption 8 applies throughout the data generating mechanism, including for W i,j (z, a) andỸ i,j (z, a). The generated W i,j (z, a) andỸ i,j (z, a) are not recorded as data. Finally, we use conjugate prior distributions for all parameters, that is, π (n,n) , π (c,c) , π (a,a) , π (n,c) , π (n,a) , π (c,a) ∼ Dirichlet(2, 2, 2, 2, 2, 2), p z,a,Gi,j ∼ Beta(1, 1), µ z,a,Gi,j ∼ Normal(0, 10 2 ) and σ 2 z,a,Gi,j ∼ InverseGamma(1, 1). Detailed algorithms for the Gibbs sampler are provided in the appendix. We simulate 500 datasets for each N = 5000, 10000, 50000 with a fixed cluster size of J = 100. In the appendix we provide all parameter values that are utilized in our simulation study the frequentist evaluations for the super-population versions of the causal estimands, and an additional simulation study using the Gamma distribution for the outcome model. Table 3 summarizes the results of our simulation study in the case of the Log-Normal distribution. The method of Imai et al. (2021) is abbreviated as "IJM" in this table. We observe that both methods perform well with respect to coverage and bias, with the IJM method exhibiting less bias than our Bayesian method for small sample sizes. This difference in bias can be attributed to the effect of the prior distributions, which is not negligible for small N . In addition, the posterior median is not an unbiased estimator of the mean of the Log-Normal distribution, although it is arguably a desirable estimator as it should be more robust for the chosen data generating mechanism. This difference in bias should also be expected as the emphasis of the Bayesian approach is on the posterior distribution and whether it is well-calibrated. Finally, we recognize that our Bayesian methodology outperforms the IJM method with respect to MSE under all conditions. The differences in MSE imply that the Bayesian point estimator is less likely to deviate from the true causal effect over multiple experiments as compared to the IJM method, that the Bayesian estimator has less variability, and that the Bayesian intervals have smaller widths. Ultimately, the Bayesian methodology yields more precise causal inferences. The IJM method is sensitive to the shape of the distribution and exhibits greater variability and MSEs as it attempts to accommodate outliers.

Evaluation under Misspecification
We evaluate our Bayesian methodology under model misspecification. To generate data that resemble those in our case study, we consider heavy-tailed distributions for the data-generating process. We use the Half-Student-t distribution with different values of the degrees of freedom, ν = 1.5, 2.0, 3.0, 5.0 to evaluate how the outliers and skewness of the distribution influence our results. We did not choose ν = 1 because that corresponds to the Cauchy distribution, which does not have a finite first moment. The outcomes are generated byỸ i,j (z, a) ∼ C z,a,Gi,j × Half-t(ν), where C z,a,Gi,j is a scaling constant. We fit the Log-Normal model described in the previous section to the generated data. We fix the number of units and the number of clusters at N = 10000 and J = 100 respectively. The additional parameters C z,a,Gi,j are provided in Table 17 in the appendix. Table 4 summarizes the results of the simulation study in the case of model misspecification. Our Bayesian model outperforms the IJM method for the skewed distribution, ν = 1.5. It also exhibits smaller MSE and slightly larger bias for ν = 2.0, 3.0 for the case of the CADE and ITT Y estimands. The larger bias is attributable to the same reasons as those outlined in the explanation of the results for the first simulation study (Section 4.2). When the data distribution is less skewed and less heavy-tailed, i.e., when ν = 5.0, the IJM method performs well. This results suggest that one should adopt more appropriate outcome models for less skewed data. On the other hand, our model is assumed to be more appropriate for the data in the case study that is considered in Section 5, as those data are heavily skewed and exhibit substantial outliers on the tail. Table 5 presents the simulation results under MNAR. The underlying parametrizations are the same as the Log-normal mechanism in Section 4.1, except that the outcome is missing with probabilities given in Table 18 in the appendix. It is important to recognize that the IJM method is evaluated after applying listwise deletion of the rows with missing outcome. We see that for N = 5000 the interval lengths of the Bayesian method are a bit larger than those in Table 3. This is due to incorporating the missingness mechanism into our model. For sufficiently large data (i.e., N = 10000, 50000), the results are similar to those in Table 3.

Evaluation under MNAR
The performance of the IJM method in Table 5 is worse in all metrics compared to the IJM results in Table 3. This is due to the bias resulting from applying listwise deletion of rows under the assumption of an MCAR mechanism when the actual missing data mechanism is MNAR. The bias becomes more severe when the missingness probabilities are significantly large. We present additional simulations with larger missing probabilities in the appendix.   Nandi et al. (2015). Imai et al. (2021) conducted a two-stage randomized experiment to determine whether access to the national insurance plan provided by RSBY increases access to hospitals and reduces impoverishment due to high medical expenses. This experiment consisted of N = 10, 072 households after Imai et al. (2021) applied listwise deletion for missing outcomes, with the households residing in J = 435 villages. Of these villages, J 1 = 219 were assigned treatment probability a 1 = 0.8 and the remaining 216 villages were assigned treatment probability a 0 = 0.4. One concern in this experiment was the spillover effects between households, because one household's enrollment in RSBY may depend on the treatments assigned to other households. Another concern is that some households assigned treatment may decide to not enroll in RSBY, and some households assigned control may ultimately manage to enroll in RSBY.

Bayesian Analyses of the Experiment
We utilize our Bayesian methodology to infer the direct and spillover effects accounting for interference, treatment nonadherence, and missing outcomes on the annual household hospital expenditure outcome (which ranges from 0 to INR 500, 000). We first consider the case of Imai et al. (2021) in which the outcomes are assumed to be MCAR. In this case, listwise deletion of rows with missingness is performed. We use the same mixture model as in Section 4.1 to analyze the data. The model for principal strata memberships is the Multinomial distribution, G i,j ∼ Multinomial π (n,n) , π (c,c) , π (a,a) , π (n,c) , π (n,a) , π (c,a) . For the potential outcomes we specify a zero-inflated Log-Normal distribution for each principal stratum, that is, for a ∈ {a 0 , a 1 } and z ∈ {0, 1}, the potential outcomes for unit i in cluster j are determined by z,a,Gi,j ). Finally, we use conjugate prior distributions for all parameters. π (n,n) , π (c,c) , π (a,a) , π (n,c) , π (n,a) , π (c,a) ∼ Dirichlet(α (n,n) , α (c,c) , α (a,a) , α (n,c) , α (n,a) , α (c,a) ) , p z,a,Gi,j ∼ Beta(a 0 , b 0 ), µ z,a,Gi,j ∼ Normal(µ 0 , σ 2 0 ) and σ 2 z,a,Gi,j ∼ InverseGamma(k 0 , θ 0 ) where θ 0 is a scale parameter. For the hyperparameters we choose α (n,n) = α (c,c) = α (a,a) = α (n,c) = α (n,a) = α (c,a) = 1, a 0 = b 0 = 1, µ 0 = 0, σ 2 0 = 5, k 0 = 0.1 and θ 0 = 1. Note that σ 2 0 = 5 is sufficiently We consider the finite-population inference as was done by Imai et al. (2021). One advantage of the alternative, super-population inference is that the estimands are free of the parameters governing the associations between potential outcomes. This is an advantage because the data are not informative about the associations between potential outcomes as all potential outcomes can never be jointly observed simultaneously for an experimental unit. Ding and Li (2018) suggested isolating the parameters that govern the marginal distributions from the parameters that govern the associations between potential outcomes, and performing sensitivity analyses for the association parameters. We omit the sensitivity analysis for these parameters and instead perform a sensitivity analysis for the prior distributions. Table 6 compares the results obtained from our Bayesian methodology with those obtained by the method of Imai et al. (2021) in the case of MCAR outcomes. We do not consider the complier average spillover effects because we defined these estimands differently from Imai et al. (2021), and because Imai et al. (2021) mentioned that these estimands were imprecisely estimated in their analyses. We observe in Table 6 that our results are generally consistent with those of Imai et al. (2021). However, differences exist because our Bayesian method is able to detect significance effects in CADE(a 1 , a 1 ), ITT Y,·,· (a 1 ), and S Y,·,· (1). The negative spillover effects that our method detects indicate that treated households are more likely to be negatively affected by the shift from a 0 to a 1 . Alternatively, assigning a greater proportion of households to treatment will cause another treated household in the same village to spend less. We also observe large posterior standard deviations, and that the posterior intervals always have smaller widths than IJM's confidence intervals. The new effects that our methodology detects can be attributed to the greater precision (hence power) that follows from the use of the Bayesian model. Furthermore, our Bayesian methodology's ability to consider a point estimator based on the median, and a model that accommodates both an abundance of zeros and heavy tails, is advantageous for analyzing the data as the inferences would be robust to outlying observations. In particular, there are 36 observations greater than  INR 100, 000, with the two largest ones being INR 403, 000 and 500, 000. For comparison, the median in the dataset is INR 1, 000. Table 7 summarizes the results for the other causal estimands in the case of MCAR outcomes. Interestingly, the credible intervals of the overall effects for all units, as well as for compliers, lie below zero. The overall effect is of greatest interest to policy makers as it captures a pure impact of the intervention on all units. Our inferences on compliers imply that the overall effects are negative for units who comply with the assignment regardless of which treatment probability a 0 or a 1 is assigned to their respective cluster. We can also conclude from both Tables 6 and 7 that the direct effect of the treatment assignment under treatment probability a 1 is negative for compliers, regardless of their base cluster assignment. Finally, we can conclude that the spillover effects for compliers are negative when all units are assigned to treatment, regardless of whether they are compliers under a 0 or a 1 . Combined with our inferences on S Y,·,· (1), the spillover effect of treatment assignment on units is negative, regardless of their principal strata. Table 8 presents the results in the case of MNAR outcomes. As the overall missingness rate is small (7.8%), we do not observe significant changes from the results under the MCAR mechanism in Table 6 and 7. This observation implies that assuming the MCAR mechanism would not hurt the final estimates for our analyses. Our Bayesian methodology enables us to explicitly assess the plausibility of missingness assumptions. This is another advantage of our methodology over the previous works (Imai et al., 2021;Forastiere et al., 2016;Vazquez-Bare, 2022).

Sensitivity Analysis to Prior Specifications
When performing Bayesian analyses with weakly identifiable models it is important to investigate the robustness of the results with respect to the prior specifications, so as to make inferences more reliable. The results in this section are derived using proper prior distributions. In particular, we use µ z,a,Gi,j ∼ Normal(0, σ 2 0 ), σ 2 z,a,Gi,j ∼ InverseGamma(a 0 , b 0 ), (π (n,n) , π (c,c) , π (a,a) , π (n,c) , π (n,a) , π (c,a) ) ∼ Dirichlet(φ (c,c) ,φ (a,a) ,φ (n,n) ,φ (n,c) ,φ (n,a) ,φ (c,a) ), and p z,a,Gi,j ∼ Beta(α 0 , β 0 ) for the prior specifications. Table 10 presents the specifications of hyperparameters. Note that σ 2 0 = 3 is substantially different from σ 2 0 = 5  1 , a 1  and σ 2 0 = 7 on a log scale. Table 9 reports the results for CADE. we observe that the extreme cases for p, specifically, Cases 10 and 11, lead to slight changes in the intervals, and that the other cases only exhibit minor changes in the results. However, the extreme values of p indicate strong prior beliefs about whether the outcome is zero or not for each principle strata. In our case study, as we lack knowledge about the unobservable strata, we believe that such strong priors about these parameters are unreasonable.

Concluding Remarks
We presented a Bayesian model-based methodology to simultaneously address interference, treatment nonadherence, and missing outcomes in two-stage randomized experiments. These three complications are significant drivers of unstable and biased causal inferences in modern experiments. Existing methodologies are lacking in their abilities to address all these issues simultaneously. There are three novel contributions of our Bayesian methodology. First, it provides a set of assumptions for performing valid causal inference in two-stage randomized experiments in the presence of these complications. We extended existing assumptions on the structure of interference to handle missing outcomes and also clarified and formalized assumptions about adherence behaviors within and across clusters. Second, it defines new causal estimands, including the overall effects of intervention and interpretable spillover effects, that can be inferred by means of our flexible Bayesian models. Our Bayesian methodologies address the issues of identifiability under two-sided nonadherence. Finally, our methodology can enable more Table 9: Sensitivity analysis to prior specifications. We present the posterior medians and 95% central credible intervals of the causal estimands.   precise causal inferences compared to existing methods for complex, non-standard datagenerating mechanisms. We illustrated the utility of our methodology in this respect via simulation studies and the RSBY case study. In the latter study our methodology was able to uncover more definitive evidence of spillover and overall effects of the intervention that were not recognized in the previous analyses by Imai et al. (2021). Our results are further validated by sensitivity analyses to prior distribution specifications and consideration of the case of MNAR missing outcomes.
Of great interest for future research on our Bayesian methodology is the relaxation of the assumptions of the interference structure. In certain real-life contexts it may be too restrictive to employ the two-stage randomized design and assume stratified interference. A great deal of research has been conducted on inferring causal effects without using special designs such as the two-stage randomized design or clustered encouragement design. The work of Aronow and Samii (2017) and Sävje et al. (2021) provide one possible path for future research based on the network structure and exposure mapping.

Supplementary Material
Appendix A: Derivation ofD i,j (z, a) andȲ i,j (a) Under Assumptions 1 -4, we havē where the second line follows from Assumption 2 and 4 and the third line follows from a simple probability calculation. We also have that

Appendix B: On Exchangeability
Exchangeability can be justified for the two-stage randomized experiment in the presence of interference between units. First, consider two units that belong to different clusters. Assumption 1 and 3 imply that they do not interfere with each other. As such, it is plausible that their outcomes are independent. Also, as covariates are independent of treatment assignments, the treatments received and potential outcomes under different treatment assignments would be identical if the units' cluster labels were permuted. Thus exchanging units that belong to different clusters in this manner does not affect the joint distribution for all units, and so they are exchangeable. We next consider two units i 1 and i 2 in the same cluster j. These units are not independent because their outcomes can be affected by interference. If both i 1 and i 2 are assigned treatment, then as Assumption 2 and 4 imply that interference is determined by the total number of treated units within cluster j, the joint distribution of all units in the cluster doesn't change even if their labels are permuted because the number of treated units stays the same. Hence, the units are exchangeable. The same argument applies to the case in which both i 1 and i 2 are assigned control. Finally, if one is assigned treatment and the other is assigned control, then the total number of treated units in cluster j still remains the same, and so the effect on the rest of the units within cluster j would not change.
The joint distribution of i 1 and i 2 will not change because only one of them is assigned treatment and the other is assigned control. Thus, exchangeability holds in such a case as well.
Ultimately, Assumptions 1 -4 justify de Finetti's Theorem in the two-stage randomized experiment and the use of parametric models under our methodology. If these assumptions did not hold, unit exchangeability would be questionable because the interference structure may collapse if the labels of the experimental units were permuted, which would affect the joint distribution of all units.
of G i,j , ρ g,a,z i,j as the probability mass function of M i,j , and f g,a,z i,j as the probability mass/density function of Y i,j (a, z) for g in Section 3.3. So, w g i,j = π g , ρ g,a,z i,j = ρ z,a,g and f g,a,z i,j = (p z,a,g ) ½(y=0) (1 − p z,a,g ) h(y | µ z,a,g , σ 2 z,a,g ) ½(y>0) where h(y | µ z,a,g , σ 2 z,a,g ) is the probability density function for the log-normal distribution with parameters µ z,a,g and σ z,a,g . For observations (i, j) For observations (i, j) ∈ O obs = (a, z, d, 1, * ), we have is missing, which is denoted by Y obs i,j = * . In Step 2(b), we impute the missing potential outcomes for unit i in cluster j using the parameter θ (t) , the compliance behaviors G (t+1) i,j = g and the observed variables z, d, m). First, we consider M o i,j = 0. We need to impute the missing values such that Assumption 8 holds. The imputation is two-fold. First, we sample W z ′ ,a ′ ,g ) 2 ) for missing potential outcomes where (a, z) = (a ′ , z ′ ).
For example, if we observe (A o j , Z o ij , D o ij ) = (a 0 , 0, 0) and are given G (t+1) ij = (c, c) in the previous step of the Gibbs sampler, we need to impute three missing potential outcomes Y ij (1, a 0 ), Y ij (0, a 1 ), Y ij (1, a 1 ) using corresponding parameters (p 1,a1,(c,c) ) 2 ). If G (t+1) ij ∈ {(a, a), (n, n), (n, a)}, we can invoke Assumption 8 and only need to impute either Y ij (0, a 0 ) or Y ij (0, a 1 ), depending on the observed treatment assignment for the unit. If A o j = a 0 , we impute Y ij (0, a 1 ), otherwise, we impute Y ij (0, a 0 ). If G (t+1) i,j = (n, c) and A o j = a 0 , we need to impute two missing potential outcomes Y ij (0, a 1 ), 1), we impute Y ij (0, a 0 ) and Y ij (0, a 1 ). The same arguments apply to the units with M o i,j = 1 except that their potential outcomes are completely missing. Therefore, we need to impute all of four potential outcomes, Y ij (0, a 0 ), Y ij (1, a 0 ), Y ij (0, a 1 ) and Y ij (1, a 1 ), for the units.

Finally, in
Step 2(d), we use conjugate prior distributions for all parameters. Had we observed the full compliance types of the units, the resulting complete-data likelihood Now, assuming appropriate conjugate prior distributions for each parameter, we use the corresponding factor of the complete-data likelihood function to derive the posterior update for each parameter.
Update of π: Consider a prior distribution (π (c,c) , π (a,a) , π (n,n) , π (c,a) , π (n,c) , π (n,a) ) ∼ Dirichlet(α) where α = (1, 1, 1, 1, 1, 1). Then, using the corresponding factor of L comp , we draw from π (t+1) ∼ Dirichlet(N g is the number of compliance type g at the t-th MCMC step. Update of p: Consider a prior distribution p z,a g ∼ Beta(1, 1). Then, using the corresponding factor of L comp , we have the posterior distribution such that, for z = 0, 1 and a = a 0 , a 1 , Note that, by Assumption 8, the number of parameters to be estimated varies with compliance behaviors. This can be seen in the complete-data likelihood as well. That is, if the parameter is not defined due to Assumption 8, the corresponding observations are used for the update of the counterpart parameter. For example, p 1,a0,(a,a) is not defined due to Assumption 8. In such a case, all the observations with A o j = a 0 are used for updating p (0,a0,a,a) regardless of the observed value of Z o i,j .

)
=4765.78 Table 15: Evaluation for the Log-Normal data-generating processes under the superpopulation perspective. Table 20 presents additional simulation results under MNAR. This simulation aims to compare the performance of our Bayesian methodology with the IJM method for different missing rates of outcomes. We fix N = 10000 and J = 100. The missingness column "None" means that we do not have any missing outcomes. "Moderate" means that the outcomes are missing based on the probabilities given in Table 18. "Severe" corresponds to the case where the outcomes are missing with higher probabilities. The missing probabilities are provided in Table 19. We see that the higher the missing rate is, the more poorly the IJM method performs. Our method successfully accommodates the underlying missingness mechanism so the results do not change much.  Z ij = 0, A j = a 0 Z ij = 1, A j = a 0 Z ij = 0, A j = a 1 Z ij = 1, A j = a 1 C cc 10 15 10 15 C aa 15 -15 -C nn 9 -9 -C ca 10 15 15 -C nc 9 -10 15 C na 9 -15 -. Table 18: Additional parameters for simulation studies under the MNAR mechanism in Section 4.4 Z ij = 0, A j = a 0 Z ij = 1, A j = a 0 Z ij = 0, A j = a 1 Z ij = 1, A j = a 1 ρ cc 0.  Table 19: Missing probabilities for the severe case. Z ij = 0, A j = a 0 Z ij = 1, A j = a 0 Z ij = 0, A j = a 1 Z ij = 1, A j = a 1 ρ cc 0.5 0.4 0.5 0.4 ρ aa 0.