Optimal designs for testing the efficacy of heterogeneous experimental groups

This paper develops a unified framework for deriving optimal designs for hypothesis testing in the presence of several heteroscedastic groups. In particular, the obtained optimal designs are generalized Neyman allocations involving only two experimental groups. In order to account for the ordering among the treatments, which is particularly relevant in the clinical context for ethical reasons, we provide the optimal design for testing under constraints reflecting the treatments' effectiveness. The advantages of the suggested allocations are illustrated both theoretically and through several numerical examples, and compared with other designs proposed in the literature, showing a substantial gain in terms of both power and ethics.

AMS 2000 subject classifications: Primary 62K05, 62L05; secondary 62G20, 60F05.


Introduction
This paper addresses the issue of designing experiments for comparing several treatments when the principal inferential aim is testing the homogeneity of the treatment effects. Starting from the classical one-way analysis of variance (ANOVA) of Sir R.A. Fisher, the problem of comparing the equality of several means has a long history in the statistical literature and arises in virtually every applied field. Over the past fifty years there has been a growing stream of papers on the design of experiments for treatment comparisons; however, they are almost exclusively focused on estimation precision. In particular, with the linear homoscedastic model set-up in mind, balancing the allocations among treatments is often considered desirable, since this strategy optimizes the usual alphabetical criteria for the estimation of the treatment effects. However, balance can be highly inefficient in the case of heteroscedasticity, or when inference is focused on the treatment contrasts; moreover, it can be strongly inappropriate for clinical trials, since the demand for individual care often calls for skewing the allocations towards the best performing drugs.
Although the ability to detect a significant treatment difference is a fundamental issue for statistical inference, the design of experiments literature has devoted very little attention to hypothesis testing, partly due to the complex underlying mathematical structure. Only recently has there been growing interest in this topic, in particular in clinical/pharmaceutical research, also prompted by the encouragement of Health Authorities [5]. In the context of binary trials, Tymofyeyev et al. [14] were among the first authors to derive the design maximizing the non-centrality parameter (NCP) of the Wald test of homogeneity. In general, this design is a degenerate allocation involving only the best and the worst treatments, with no observations on the intermediate ones: for this reason, a lower bound on each treatment allocation proportion is imposed and the related constrained optimal design is derived. By applying the same methodology to exponential outcomes, Zhu and Hu [17] derived the corresponding optimal allocations, while Sverdlov et al. [13] extended their results to the presence of censoring. In general, the ensuing designs are discontinuous (non-degenerate) functions of the unknown model parameters (i.e., they are locally optimal) and, through suitable smoothing transformations, they can be implemented in a sequential fashion via response-adaptive randomization procedures, namely sequential rules that change the treatment allocation probabilities so as to approximate the chosen target (for a review, see [2,11]).
Recently, Baldi Antognini et al. [3] derived the design maximizing the power of the test of homogeneity for normal homoscedastic data, which is a balanced allocation involving only the best and the worst treatments; moreover, by imposing the ethical constraint that the treatment allocation proportions should reflect the a-priori unknown ordering among their effects, they also derived a non-degenerate optimal target implementable via response-adaptive randomization. Under the same framework, assuming that the treatment ordering is known, Singh and Davidov [12] discussed the optimal designs for restricted and unrestricted statistical inference (which are equivalent for large samples) by adopting a maxi-min approach, in order to overcome local optimality problems.
The aim of the present paper is to provide a unified framework for deriving optimal designs for hypothesis testing in the presence of several experimental groups, also encompassing the general ANOVA set-up with heteroscedastic errors. In particular, the optimal designs are generalized Neyman allocations involving only two treatments, not necessarily the best and the worst ones. In order to account for the ordering among treatments (which could be particularly relevant in the clinical context, for ethical reasons), we derive constrained optimal designs, where the allocation proportions are themselves ordered as the treatment efficacies. Since the ordering among the effects is generally a-priori unknown, the ensuing allocations are locally optimal designs that can be approached by response-adaptive randomization procedures after suitable smoothing. Several illustrative examples are provided for normal, binary, Poisson and exponential data (with and without censoring), also amending some previously obtained results. The properties of these designs are described both theoretically and through numerical examples, showing a substantial gain in terms of both power and ethics.
The paper is structured as follows. Section 2 deals with optimal designs for hypothesis testing, taking into account both unconstrained and constrained optimization. Section 3 discusses the performance of the proposed allocations both analytically and numerically, also in comparison with other designs suggested in the literature. Section 4 provides a general discussion of our results, including their implementation via response-adaptive randomization, while Section 5 concludes the paper. The mathematical details are available in the Appendix.

Preliminaries
Suppose we have K ≥ 2 competing treatments and let δ_i = (δ_i1, …, δ_iK)^t be the indicator managing the allocation of the ith subject, namely δ_ik = 1 if he/she is assigned to treatment k (k = 1, …, K) and 0 otherwise. Given the assignments, the observations Y_i are assumed to be independent, with distribution belonging to the exponential family parameterized in such a way that θ_k ∈ Θ ⊆ R denotes the mean effect of treatment k, while v_k = v(θ_k) ∈ R^+ represents the corresponding variance (k = 1, …, K); we set θ = (θ_1, …, θ_K)^t and v = (v_1, …, v_K)^t. Under such a mean-value parameterization, the Maximum Likelihood Estimators (MLEs) of the treatment effects are the sample means. Special cases of practical relevance are the binary model B(θ_k) (θ_k ∈ (0; 1), v(θ_k) = θ_k(1 − θ_k)) and the Poisson model P(θ_k) (θ_k ∈ R^+, v(θ_k) = θ_k) for dichotomous and count data, respectively; the normal model N(θ_k; v_k) (with θ_k ∈ R and v(θ_k) = v_k independent of θ_k) is also encompassed for continuous responses, as well as the exponential model exp(θ_k) (θ_k ∈ R^+, v(θ_k) = θ_k^2) for survival outcomes. After n allocations, let N_n = Σ_{i=1}^n δ_i = (N_n1, …, N_nK)^t, where N_nk = Σ_{i=1}^n δ_ik denotes the number of assignments to treatment k and, clearly, N_n^t 1_K = n (here 1_K is the K-dim vector of ones); ρ = n^{−1} N_n is the vector of the treatment allocation proportions, where ρ_k = n^{−1} N_nk ≥ 0 for k = 1, …, K and ρ^t 1_K = 1 for every n. Let θ̂_n = (θ̂_n1, …, θ̂_nK)^t be the MLE of θ; under well-known regularity conditions, θ̂_n is strongly consistent and asymptotically normal. In this setting the inferential focus is on the contrasts A^t θ where, considering without loss of generality (wlog) the first treatment as the reference one, A^t is the (K − 1) × K matrix of contrasts versus treatment 1. The NCP of the Wald test of the homogeneity hypothesis H_0: A^t θ = 0_{K−1} can be written as

φ(ρ) = n Σ_{k=1}^K (ρ_k/v_k)(θ_k − θ̄_π)^2,   where θ̄_π = Σ_{k=1}^K π_k θ_k and π_k = (ρ_k/v_k)/Σ_{j=1}^K (ρ_j/v_j).   (2.1)

Notice that both the likelihood ratio and score tests are asymptotically equivalent to the Wald test and, for normal homoscedastic outcomes, the likelihood ratio and Wald tests coincide. Thus, from now on we take the NCP as the inferential criterion to be maximized, since it represents an increasing measure of the approximate power (and of the exact power for normal data).
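For concreteness, the criterion can be computed numerically. The short sketch below (in Python; the function name and vectorized layout are ours) implements our reading of (2.1), namely the NCP as n times the precision-weighted dispersion of θ with weights ρ_k/v_k:

```python
import numpy as np

def ncp(theta, v, rho, n=1):
    """Wald-test NCP, read as phi(rho) = n * sum_k (rho_k/v_k) (theta_k - mean_pi)^2,
    with pi_k = (rho_k/v_k) / sum_j (rho_j/v_j) -- our reading of (2.1)."""
    theta, v, rho = (np.asarray(x, float) for x in (theta, v, rho))
    w = rho / v                 # precision weights rho_k / v_k
    pi = w / w.sum()            # normalized weights: a pdf over the K treatments
    return n * (w @ (theta - pi @ theta) ** 2)

# Homoscedastic K = 2 sanity check: the balanced design gives n (theta_1 - theta_2)^2 / (4 v)
print(ncp([1.0, 0.0], [1.0, 1.0], [0.5, 0.5], n=100))  # 25.0
```

For homoscedastic groups (v_k = ν) the weights π reduce to ρ itself, as noted below.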

Unconstrained optimal design for testing
In this section we derive the optimal design maximizing the NCP, namely the allocation ρ̃, defined on the simplex (ρ̃_k ≥ 0 for every k = 1, …, K with ρ̃^t 1_K = 1), maximizing φ(·). As is well known, for K = 2 the optimal design is the so-called Neyman allocation ρ̃_1 = √v_1/(√v_1 + √v_2). Assuming (wlog) that high responses are preferable, the treatments are ordered on the basis of their effects and, for ease of notation, we assume (wlog) that θ_1 ≥ … ≥ θ_K (i.e., the best treatment is labelled as the first one and the Kth treatment as the worst, admitting also clusters with the same efficacy), with at least one strict inequality. We wish to stress that this is a simple label-coding intended to avoid more complex notation; clearly, the treatment ranking is a-priori unknown, but it can be estimated sequentially, as we will discuss in Section 4. For Bernoulli, Poisson, exponential and normal homoscedastic models, this ordering corresponds to the classical stochastic order, while it does not imply a specific ordering for normal heteroscedastic data. For a given design ρ, let π = (π_1, …, π_K)^t with π_k = (ρ_k/v_k)/Σ_{j=1}^K (ρ_j/v_j) (k = 1, …, K); then π^t 1_K = 1 and, after straightforward calculation, Σ_k π_k(θ_k − Σ_j π_j θ_j)^2 and Σ_k θ_k π_k are the variance and the mean of θ (with K possibly different ordered support points θ_1 ≥ … ≥ θ_K), evaluated with respect to (wrt) the pdf π. Clearly, for homoscedastic treatment groups v_k = ν (k = 1, …, K) and π = ρ.
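As a numerical sanity check of the K = 2 case (under the same precision-weighted form of φ assumed throughout), the Neyman allocation can be compared with a brute-force grid search over the simplex:

```python
import numpy as np

def ncp(theta, v, rho):
    # Precision-weighted dispersion form of phi(rho), up to the factor n
    w = np.asarray(rho, float) / np.asarray(v, float)
    pi = w / w.sum()
    return w @ (np.asarray(theta, float) - pi @ np.asarray(theta, float)) ** 2

def neyman_two(v1, v2):
    # Neyman allocation for K = 2: proportions proportional to the std deviations
    s = np.sqrt([v1, v2])
    return s / s.sum()

# Grid check that the Neyman allocation maximizes phi for K = 2
theta, v = [1.0, 0.0], [4.0, 1.0]
grid = np.linspace(0.01, 0.99, 981)          # step 0.001
best = grid[np.argmax([ncp(theta, v, [r, 1 - r]) for r in grid])]
print(neyman_two(*v)[0], best)               # both close to 2/3
```

The grid maximizer agrees with √v_1/(√v_1 + √v_2) up to the grid resolution; the values of θ and v are chosen only for illustration.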

Remark 2.2.
This representation is quite general and covers the case of data following the exponential distribution exp(θ_k) subject to an independent right censoring scheme (a common feature of survival trials). Indeed, let ε_k = ε(θ_k): R^+ → (0; 1) be the probability that a failure/death occurs before censoring in the kth group (k = 1, …, K), assumed constant for every subject in each group; ε(·) is a decreasing function depending on the particular censoring scheme adopted in the trial (one of the most general is described in [16]). In such a case, φ(ρ) in (2.1) should simply be re-parametrized by substituting each treatment variance v_k = θ_k^2 with v̄_k = θ_k^2/ε(θ_k) (k = 1, …, K). However, we wish to stress that some models are not encompassed by our framework, e.g., the normal distribution where the variance v_k is not independent of the mean θ_k.
The next Lemma shows some general properties of the function φ(ρ).
is optimal (namely, the Neyman allocation is spanned over the two clusters).
As discussed in Remark 2.2, for exponential outcomes with censoring the treatment variances should be re-scaled. Thus, from Theorem 2.1, the optimal design maximizing the NCP is the generalized Neyman allocation ρ̃ = ς_ĩk̃ e_ĩ + ς_k̃ĩ e_k̃, where now ς_ĩk̃ = [1 + θ_k̃ √ε(θ_ĩ)/(θ_ĩ √ε(θ_k̃))]^{−1}, on the pair {ĩ, k̃} maximizing the RHS of (2.4). The maximization of the RHS in (2.4) depends on the specific form of the adopted censoring through ε(·), and it could involve any pair of treatments, not necessarily {1, K} (i.e., the pair with the best and the worst treatments). Thus, this result conflicts with the optimal design obtained in [13] where setting the minimum treatment allocation proportion equal to 0 leads to ρ̃_1K, as the following example shows.
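A small numerical sketch illustrates the pair selection under censoring. Here the helper name `best_pair`, the symbol `eps` for ε(·), and all numerical values are ours, chosen for illustration; the pair criterion used is the NCP attained by the two-point Neyman design on each pair, (θ_i − θ_k)^2/(√v̄_i + √v̄_k)^2, with the censoring-adjusted variances of Remark 2.2:

```python
import numpy as np

def best_pair(theta, eps):
    # eps_k: probability that an event is observed (not censored) in group k
    theta = np.asarray(theta, float)
    s = np.sqrt(theta ** 2 / np.asarray(eps, float))   # sqrt of adjusted variances
    K = len(theta)
    # Two-point Neyman design on {i,k} attains NCP prop. to (th_i - th_k)^2 / (s_i + s_k)^2
    pairs = [(i, k) for i in range(K) for k in range(i + 1, K)]
    crit = {p: (theta[p[0]] - theta[p[1]]) ** 2 / (s[p[0]] + s[p[1]]) ** 2 for p in pairs}
    i, k = max(crit, key=crit.get)
    rho = np.zeros(K)
    rho[i] = s[i] / (s[i] + s[k])       # generalized Neyman weight on treatment i
    rho[k] = 1.0 - rho[i]
    return (i, k), rho

# Heavy censoring on the best arm can shift the optimal pair away from {1, K}:
pair, rho = best_pair([4.0, 2.0, 1.0], [0.05, 0.9, 0.9])
print(pair)  # (1, 2): two inferior arms, not the best-vs-worst pair
```

This reproduces, in miniature, the point made above: the selected pair depends on the censoring scheme and need not be {1, K}.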

Example 2.2.
As in [13], we take into account the censoring scheme suggested in [16] with duration D = 96 and recruitment period R = 55.

While the general cases of normal heteroscedastic outcomes and exponential responses with censoring should be analysed via Theorem 2.1 and Corollary 2.1, the next Corollary provides some useful simplifications for the most common models, where the Neyman target ρ̃_1K involving the best and the worst treatments is optimal.
• under exponential responses in the absence of censoring, every design spanning the Neyman allocation over the clusters of best and worst treatments is optimal; in particular, in the absence of clusters of best and worst treatments, the optimal design is ρ̃ = ρ̃_1K. This extends the results in [14] and [17] by covering every possible scenario of clusters of treatments. Indeed, in both papers h and s are assumed to be positive integers with h + s < K, therefore the special case when all treatments are grouped into two clusters is excluded (e.g., for K = 3, h = 2 and s = 1, or h = 1 and s = 2); we also show that every design spanning the Neyman target over the two clusters is optimal (instead of the unique solution ρ̃_1K assumed in [13,14,17]).

Constrained optimal designs for testing
This section deals with the problem of finding the design ρ* = (ρ*_1, …, ρ*_K)^t maximizing the NCP of the Wald test of homogeneity under the (ethical) constraint that the allocation proportions are ordered as the treatment effects, i.e., ρ*_1 ≥ … ≥ ρ*_K. Due to the complexity induced by this general framework, we need to introduce some additional notation, for i = 1, …, K − 1.

If k̆ is not unique but there exists (one or more) ĭ satisfying P1 (namely P1a and P1b, where now P1b should hold for every i ∉ {k̆, ĭ}), then A_k̆ = A_ĭ and ρ* can be obtained as any convex combination of the corresponding constrained optimal designs ρ*_k̆ and ρ*_ĭ in (2.7), so that the NCP in (2.9) still holds. In all the other scenarios, ċ is the minimum number of inferior treatments that should be omitted in order to satisfy P1 or P2, and ρ*_[K−ċ] is the previously defined constrained optimal design evaluated by taking into account the remaining K − ċ (superior) treatments.

Corollary 2.3. Assume that the variance v(·) is non-decreasing in θ and that condition (2.11) holds.
Thus, when θ_1 > θ_2, the constrained optimal design follows, while in the presence of a cluster of superior treatments its form changes accordingly. In particular, normal homoscedastic, Poisson and exponential models satisfy condition (2.11), and the corresponding τ in (2.8) is obtained. Although the hypotheses of Corollary 2.3 do not hold for binary outcomes, the constrained optimal design has an analogous form, as the following proposition shows.
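The constrained optimum can also be checked numerically by a brute-force search over the ordered simplex (the function names and the grid approach are ours); e.g., for K = 3 normal homoscedastic data with θ = (2, 1, 0)^t and unit variances:

```python
import numpy as np

def ncp(theta, v, rho):
    # Precision-weighted dispersion form of phi(rho), up to the factor n
    w = np.asarray(rho, float) / np.asarray(v, float)
    pi = w / w.sum()
    return w @ (np.asarray(theta, float) - pi @ np.asarray(theta, float)) ** 2

def constrained_opt_K3(theta, v, step=0.002):
    # Brute-force search over the ordered simplex rho_1 >= rho_2 >= rho_3 >= 0
    best_rho, best_val = None, -np.inf
    for r3 in np.arange(0.0, 1 / 3 + step, step):
        for r2 in np.arange(r3, (1 - r3) / 2 + step, step):
            r1 = 1.0 - r2 - r3
            if r1 < r2 - 1e-12:          # enforce rho_1 >= rho_2
                continue
            val = ncp(theta, v, [r1, r2, r3])
            if val > best_val:
                best_val, best_rho = val, np.array([r1, r2, r3])
    return best_rho, best_val

rho_star, val = constrained_opt_K3([2.0, 1.0, 0.0], [1.0, 1.0, 1.0])
print(rho_star)  # roughly (0.44, 0.28, 0.28): equal mass on the two inferior arms
```

In this example the search returns equal mass on the two inferior treatments, in line with the cluster structure of the constrained design: the ordering constraint prevents the mass from concentrating on the extreme pair {1, K} alone.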

Analytical and numerical comparisons
This section is dedicated to the performance assessment of the newly introduced optimal designs. Starting with the normal model, ρ* and ρ̃ will be compared with the balanced allocation ρ_B and the design ρ_M = (e_1 + e_K)/2 proposed by Baldi Antognini et al. [3] and Singh and Davidov [12], which is the optimal design for normal homoscedastic data (i.e., ρ̃ = ρ_M) and is also the target maximizing the minimum power for both restricted and unrestricted likelihood ratio tests under the simple order restriction θ_1 ≥ … ≥ θ_K. Moreover, we will also consider the design ρ_A provided in [6], which allocates observations to the experimental groups proportionally to the absolute values of the Abelson and Tukey [1] scores (note that, for K = 3, ρ_M = ρ_A). For binary and exponential responses, we compare our proposals with ρ_B and the design ρ_H proposed by Tymofyeyev et al. [14] and Zhu and Hu [17], which maximizes the NCP under the constraint of a minimum prefixed threshold of allocations to each treatment.
The efficiency Λ(ρ_B) tends to grow for values of θ_2 close to θ_3, with a maximum efficiency around 80%. This behaviour is even more pronounced in the normal heteroscedastic scenario (a) for K = 3, whereas heteroscedastic scenario (b) exhibits Λ(ρ_B) decreasing in θ_2 for both K = 3 and 5. As regards the Poisson and exponential models, the performances improve as θ_2 → θ_1, whereas for θ_2 close to θ_3 the efficiency becomes even lower than 40%. The behaviour of Λ(ρ_B) for increasing θ_1 is reported in Figure 2, where θ^t = (θ_1, 2, 1) and θ^t = (θ_1, 2.5, 2, 1.5, 1) for the normal, exponential and Poisson models with θ_1 ∈ [3, 11], while for binary outcomes θ^t = (θ_1, 0.2, 0.1) and θ^t = (θ_1, 0.25, 0.2, 0.15, 0.1) with θ_1 ∈ [0.3, 0.9]. Similarly to Figure 1, when K = 5 poor performances in terms of Λ(ρ_B) are observed. For the binary, exponential and Poisson models, the efficiency is below 75% for K = 3 and below 51% for K = 5. As regards the normal model, graphical evidence points out that the best performances in terms of efficiency are achieved in the homoscedastic and heteroscedastic scenario (b), while in scenario (a) Λ(ρ_B) < 57% and Λ(ρ_B) < 32% for K = 3 and 5, respectively.

As regards ethics, in the multi-treatment context several ethical measures could be adopted, some of which are model-specific. In our general set-up, as a measure of ethics we take into account the total expected outcome E_n(ρ) = nθ^t ρ; the corresponding ethical efficiency (θ^t ρ − θ_K)/(θ_1 − θ_K) ∈ [0; 1] will be provided within brackets. Since E_2n(ρ) = 2E_n(ρ), in the following tables the ethical criterion will be provided only for n = 100.
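The ethical efficiency just defined is straightforward to compute; a minimal sketch (the function name is ours):

```python
import numpy as np

def ethical_efficiency(theta, rho):
    # (theta' rho - theta_K) / (theta_1 - theta_K), in [0, 1],
    # with theta_1 and theta_K the best and worst mean effects
    theta, rho = np.asarray(theta, float), np.asarray(rho, float)
    return (theta @ rho - theta.min()) / (theta.max() - theta.min())

print(ethical_efficiency([3.0, 2.0, 1.0], [1/3, 1/3, 1/3]))  # ~0.5 for the balanced design
```

The balanced design always scores the midpoint of the ethical scale whenever the treatment effects are equispaced, as in this illustrative choice of θ.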
As previously shown, the unconstrained optimal design ρ̃ maximizes the NCP and therefore the approximate power too; hence no target can simultaneously outperform ρ̃ in terms of both ethics and inference. Moreover, in some circumstances ρ̃ dominates ρ_B and ρ_M, as the following proposition shows. Proof. The proof follows easily from (2.1) after some algebra.

Starting from the case of normal responses, Tables 1 and 2 summarize the performances in terms of P_n with n = 50 and 100 (within brackets the corresponding efficiency evaluated wrt ρ̃) and E_n (with n = 100) of the considered allocations for K = 3, 4 and 5 treatments, as θ and v vary. Tables 3 and 4 show the results in the case of binary and exponential outcomes, respectively, where the minimum proportion of subjects assigned to each treatment group for ρ_H is set to 0.2 (K = 3, 4) and 0.15 (K = 5).
Let us first consider the results for normal responses: ρ̃ exhibits the highest approximate power, with an ethical efficiency varying between 42.9% and 72.2%, while ρ* generally shows the highest ethical efficiency (between 51.8% and 70.2%), also guaranteeing valid inferential performances (for n = 100, its efficiency is always greater than 65.6%). Excluding the homoscedastic scenario, the approximate power of ρ_M is strictly related to the treatment variances: for unordered variances ρ_M exhibits poor performances (showing in some cases an extremely low efficiency, equal to 39.3% for K = 4 and n = 50), while when the variances are ordered as the treatment effects, P_n tends to increase. Moreover, the ethical efficiency of ρ_M is always equal to 0.5 and in several scenarios ρ_M is dominated by ρ*, especially as the number of treatment groups increases. The balanced design ρ_B is always dominated by ρ* and often shows the worst performances in terms of both precision and ethics in all the considered scenarios. The approximate power provided by ρ̃ with n = 50 observations tends to be quite similar to that of ρ_B with n = 100 subjects. Even if ρ_A = ρ_M when K = 3 regardless of the values of the model parameters, in the experimental scenarios of Table 1 with K = 4, ρ_A presents even lower approximate power than ρ_M, with efficiencies between 35% and 48% in the case of heteroscedasticity. Excluding the homoscedastic case, for K = 4 the approximate power provided by ρ_A is always lower than that of ρ*. As far as ethics is concerned, the performances of ρ_A are very similar to those of ρ_M and ρ_B. As regards binary trials in Table 3, the constrained optimal design ρ* and ρ_H tend to perform quite similarly in terms of inference, with an efficiency always higher than 68%, whereas ρ_B shows the lowest approximate power, with a maximum loss of up to 44%.
Taking into account the ethical criterion, which in this case corresponds to the total expected successes, ρ* and ρ̃ guarantee the highest ethical efficiency (with only one exception, where E_n(ρ_B) is slightly bigger than E_n(ρ̃)), with an ethical gain of up to 8 and 12 successes wrt ρ_H and ρ_B, respectively. Similar considerations hold for the exponential responses reported in Table 4, where the minimum inferential efficiency becomes 73% for ρ*, 67% for ρ_H and 58% for the balanced allocation. The ethical gain (i.e., the additional total expected survival time in this context) induced by ρ* wrt ρ_H ranges from 36 to 90 (corresponding to a gain in efficiency between 5% and 18%) and is even more evident wrt ρ_B (the additional expected survival is up to 250, with an ethical gain of up to 33% in terms of efficiency). The unconstrained optimal design ρ̃ exhibits the highest approximate power and, at the same time, the greatest ethical gain; therefore it dominates all the other designs. Similarly to the case of normal responses, the approximate power induced by ρ̃ with n = 50 observations tends to be quite similar to those of ρ_B and ρ_H with n = 100 subjects. In general, for all the considered models, both ρ̃ and ρ* present high values of ethical efficiency. Moreover, for normal homoscedastic, binary and exponential responses, ρ_B is dominated by ρ* and also by ρ̃ (with the exception of the second-to-last scenario of Table 3). As further comparisons (omitted here for brevity) showed, the results for Poisson outcomes are substantially the same as those for the exponential model. The case of exponential responses with censoring tends to be similar to that of normal heteroscedastic data with variances ordered as the treatment effects, and it is strongly affected by the chosen censoring scheme. Finally, inspired by [12], it is interesting to compare the above-considered designs when the values of the model parameters induce unfavorable configurations.
From (2.1), due to the heteroscedasticity, the NCP tends to vanish as the variances of the treatment groups grow, regardless of the chosen design; so, for any fixed maximum difference between the treatment effects, there is no unique least favorable configuration, but rather different scenarios that degenerate as the signal-to-noise ratio vanishes. In particular, Table 5 shows the performances of the considered targets for normal and exponential responses, where the maximum distance θ_1 − θ_K has been set equal to 1. The upper part of the table summarizes the results for normal responses as the variances vary. In the first scenario, ρ̃ shows the highest values of P_n and E_n, with a gain in terms of approximate power of up to 10% wrt ρ_M = ρ_A and up to 9% wrt ρ* = ρ_B. In the second and third scenarios, the variances are at least ten times the value of the treatment effects; this washes out the differences among the treatment means, leading to a drastically reduced approximate power. In such cases, the performances of ρ̃ and ρ_M tend to coincide (also because the treatment variances are quite similar), while ρ* behaves similarly to ρ_B. The bottom part of Table 5 deals with the case of exponential responses. In the first scenario, ρ̃ provides a considerable gain in terms of approximate power wrt all the competitors; however, since the difference θ_1 − θ_K is held constant, P_n greatly decreases as the values of θ increase and all the considered targets tend to perform quite similarly (in particular for n = 50).

Implementation via response-adaptive randomization and discussion
The unconstrained optimal design ρ̃ in Theorem 2.1 and the constrained one ρ* in Theorem 2.2 depend on the unknown model parameters and therefore are a-priori unknown (i.e., locally optimal). The dependence on the model parameters acts in terms of both i) the a-priori unknown treatment ordering and ii) the functional form of the optimal design itself, which is often a degenerate allocation with no assignments to some treatment groups. Unlike other approaches suggested in the literature, which average the design criterion over the entire parameter space to obtain good overall performances (such as, e.g., Bayesian or maxi-min approaches), in this paper we consider response-adaptive randomization as the natural solution to this local optimality problem. Under this framework, the exact optimal designs are sequentially estimated step-by-step: on the basis of earlier responses and past assignments, the unknown parameters are estimated along with the treatment ordering (which could change as the trial progresses) and thus the next assignment is randomly forced to progressively approach the optimal target (for instance, by applying the Doubly Adaptive Biased Coin Design of [8]).
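A stylized version of such a sequential scheme can be sketched as follows. This is a simplified illustration only, not the actual Doubly Adaptive Biased Coin Design of [8]: the target rule, the floor value and all numerical settings are ours, and the startup device of one pseudo-observation per arm is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(theta_hat):
    # Hypothetical smoothed target, for illustration only: weights increase with the
    # estimated effect, floored so every arm keeps positive mass (NOT the paper's rule)
    w = np.maximum(theta_hat, 0.1)
    return w / w.sum()

theta_true = np.array([1.0, 0.6, 0.2])  # assumed normal means, unit variance
K, n = 3, 500
counts = np.ones(K)                     # one startup pseudo-observation per arm
sums = theta_true.copy()                # estimates initialized at the true means, for simplicity
for _ in range(n):
    theta_hat = sums / counts           # current MLEs (sample means)
    k = rng.choice(K, p=target(theta_hat))  # randomize toward the estimated target
    counts[k] += 1
    sums[k] += rng.normal(theta_true[k], 1.0)
rho_n = counts / counts.sum()
print(rho_n)  # allocation proportions skewed toward the better arms
```

The loop captures the essential mechanism: estimate, update the target, randomize, repeat; the resulting proportions track the target evaluated at the current estimates.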
Even if response-adaptive randomization methodology seems a natural choice for implementing the proposed optimal allocations, we wish to stress that the optimal designs ρ̃ and ρ* cannot be targeted directly, due to the fact that i) their functional forms are locally discontinuous around the subset of Θ^K where the treatment ordering changes (namely, where θ_i tends to coincide with one or more θ_k's) and ii) in several scenarios these optimal designs lie on the boundary. To overcome these drawbacks, which prevent the applicability of standard response-adaptive randomization methodology, a smoothing transformation (e.g., via a Gaussian kernel) can be applied to obtain a continuous and non-degenerate version of these targets (see, e.g., [14]). In particular, for the mono-parametric exponential family we take into account the convolution of ρ̃ = ρ̃(θ) = (ρ̃_1(θ), …, ρ̃_K(θ))^t (or, analogously, ρ*) with a K-dim Gaussian kernel, where σ^2 > 0 controls the degree of smoothing; namely, we define the smoothed version ρ̃^S(θ) = (ρ̃^S_1(θ), …, ρ̃^S_K(θ))^t of ρ̃ by letting ρ̃^S(θ) = ∫ ρ̃(t) φ_σ(t − θ) dt, where φ_σ is the density of the N(0_K; σ^2 I_K) kernel (this could be naturally extended to the case of the heteroscedastic normal model, where ρ̃ = ρ̃(θ; v): Θ^K × R^{+K} → [0; 1]). This smoothing transformation essentially impacts the points of discontinuity of ρ̃ and its boundary values, so that the smoothed optimal design ρ̃^S obeys the classical regularity conditions of continuity, non-degeneracy and differentiability (see, e.g., [7,2]), which allows for the standard asymptotic inference for response-adaptive randomization procedures (therefore, all the asymptotics in Section 2.1 remain valid). For instance, taking into account binary trials with K = 3 treatments, Figure 3 illustrates the behaviour of the first component of ρ̃ and of its smoothed version ρ̃^S, with θ = (θ_1, θ_2, 0.15)^t as θ_1 and θ_2 vary in [0; 1] (where now the treatment ordering is free to change on the basis of the values of θ_1 and θ_2).
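The convolution can be approximated by Monte Carlo, averaging the degenerate target over Gaussian perturbations of the parameter. In the sketch below, the function names, the two-point target used for illustration, and the values of σ and θ are all ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def neyman_pair_target(theta, v):
    # Degenerate two-point target (best vs worst by mean), used only for illustration
    i, k = int(np.argmax(theta)), int(np.argmin(theta))
    s = np.sqrt(v)
    rho = np.zeros(len(theta))
    rho[i] = s[i] / (s[i] + s[k])
    rho[k] = 1.0 - rho[i]
    return rho

def smoothed_target(theta, v, sigma=0.2, draws=4000):
    # Monte Carlo convolution with a K-dim Gaussian kernel: average the degenerate
    # target over parameter values jittered by N(0, sigma^2 I_K)
    theta = np.asarray(theta, float)
    samples = [neyman_pair_target(theta + sigma * rng.standard_normal(len(theta)), v)
               for _ in range(draws)]
    return np.mean(samples, axis=0)

rho_s = smoothed_target([1.0, 0.9, 0.2], np.ones(3))
print(rho_s)  # all components strictly positive: the smoothed target is non-degenerate
```

Near the discontinuity (θ_1 close to θ_2), the jitter makes the estimated ordering flip across draws, so the averaged target varies smoothly and assigns positive mass to every arm, which is exactly the regularization needed for standard response-adaptive randomization.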
To show that the performances of ρ̃ and ρ̃^S (as well as those of ρ* and ρ*^S) are quite similar, Table 6 presents the approximate power P_n and the total expected outcome E_n under the same scenarios of Table 3 with K = 3 treatments.

Conclusions
This paper discusses optimal designs for hypothesis testing in the presence of heterogeneous experimental groups, also encompassing the general one-way ANOVA with heteroscedastic errors. In particular, we derive the allocation maximizing the NCP of the classical Wald test of homogeneity for the treatment contrasts; this optimal design is a generalized Neyman allocation involving only two treatments, not necessarily the best and the worst ones. Moreover, to account for the ordering among treatments, we derive the optimal design maximizing the NCP of the homogeneity test subject to an ethical constraint reflecting the efficacy of the competing treatments. Due to the dependence on the unknown model parameters, these allocations are locally optimal and therefore a-priori unknown. Moreover, these designs are degenerate allocations with possible local discontinuities. To avoid these drawbacks, a smoothing transformation via a Gaussian kernel, implemented via response-adaptive randomization, has been proposed. The suggested convolution avoids i) degeneracies, by assigning non-null mass to each treatment, and ii) potential discontinuities. Thus, these smoothed optimal designs can be approached via standard response-adaptive randomization procedures that, by estimating at each step the unknown parameters as well as the treatment ordering, sequentially change the probabilities of treatment assignment in order to converge to the desired target. To provide a flexible and easy-to-use tool for implementing the proposed optimal designs and their smoothed versions, a fully documented R code is available in the GitHub repository https://bit.ly/38ER2CD. As shown in Section 3, the balanced allocation is strongly inappropriate under heteroscedasticity. Moreover, any deviation from the assumption of homoscedasticity could also affect the inferential performances of ρ_M and ρ_A, which tend to exhibit a very low approximate power, especially in the case of unordered variances.
In such scenarios, ρ̃^S provides a remarkable gain in terms of inferential efficiency wrt all the other designs, while also guaranteeing high ethical standards. Nevertheless, whenever the ethical dimension plays a crucial role, ρ*^S could be preferable since, although it places more emphasis on the ethical aspects, it still guarantees good inferential performances, also wrt ρ_H.

Acknowledgments
We are very grateful to the referees for their helpful suggestions and comments, which led to a substantially improved version of the paper.
is negative semi-definite, since it has K − 1 null eigenvalues and one eigenvalue (that coincides with the Laplacian)

A.3. Proof of Corollary 2.1
Since θ_ĩ = θ_ĩ′ and v_ĩ = v_ĩ′, the pair {ĩ, k̃} maximizes the RHS of (2.2) if and only if {ĩ′, k̃} maximizes the RHS of (2.2). Therefore, from Theorem 2.1, both ρ̃_ĩk̃ and ρ̃_ĩ′k̃ are optimal designs, as well as every mixture of them, where clearly ς_ĩk̃ = ς_ĩ′k̃. The case of two clusters of treatments follows easily from the previous result by noticing that the pair {1, K} surely maximizes the RHS of (2.2).