Optimal group testing designs for prevalence estimation combining imperfect and gold standard assays

We consider group testing studies where a relatively inexpensive but imperfect assay and a perfectly accurate but higher-priced assay are both available. The primary goal is to accurately estimate the prevalence of a trait of interest, with the error rates of the imperfect assay treated as nuisance parameters. Considering the costs for performing the two assays and enrolling subjects, we propose a Ds-optimal mixed design to provide maximal information about the prevalence. We show that extreme values for the cost of the perfect assay lead to designs in which only one of the two assays is used, but otherwise the optimal designs use both assays. We provide a guaranteed algorithm to efficiently build an optimal design on discrete design spaces. Our computational results also show the robustness of the proposed design. MSC2020 subject classifications: Primary 62K05; secondary 62P10.


Introduction
Estimating the proportion of a population having a trait, for example, the prevalence of a disease, is an important objective in scientific research and is full of history. When the disease or the trait is rare and the individual samples can be pooled as a group sample to be tested simultaneously, group testing, also known as pooled testing, can be applied to save cost and to improve the estimation accuracy. The key assumption of conventional group testing is that the test result from a group sample is positive if and only if at least one individual in the group has the trait. Ever since Dorfman (1943), the group testing approach has found numerous applications, such as human and plant disease screening and surveillance (Pilcher et al., 2005;Liu et al., 2011;Saá et al., 2018), genetics studies (Gastwirth, 2000), and drug development (Xie et al., 2001).
Most theoretical studies on group testing only consider the setting where all tests are performed using the same assay. Some researchers make a simplified assumption that the test is perfect (Hughes-Oliver and Swallow, 1994;Hughes-Oliver and Rosenberger, 2000) or that the testing error rates are known values (Tu et al., 1995;Liu et al., 2012), while others allow the error rates to be estimated from the data (Huang et al., 2017(Huang et al., , 2020. In practice, the tests used in most group testing experiments are likely to have imperfect performance, and the error rates for the target population are unknown to the researchers. However, in some situations a gold standard assay, which is perfectly accurate but may be too expensive to be adopted widely, such as the Western-blot assay for HIV screening (Delaney et al., 2006), is available. Zhang et al. (2014) provided an optimal validation subsample design to efficiently estimate the unknown prevalence in their setting. Their main idea was that all group samples with a common size are tested by an inexpensive assay to provide a large amount of mixed information about the prevalence and the testing error rates. In addition, some samples were tested a second time using the gold standard to characterize the error rates, which improved the estimation and identification of the prevalence.
This work motivates us to consider the design problem under a more flexible set-up: (i) Our optimal designs are sought from among all possible designs, without any restriction on the number of different group sizes. (ii) We consider mixed designs based on three testing procedures: each group sample is tested either by only one of the two assays, or by both assays. (iii) We utilize the cost considerations as in Huang et al. (2020), that is, the relative costs for performing assays and for enrolling subjects, and then we fix the total budget and control the numbers of tests and subjects through their costs. In particular, according to (iii), when prevalence estimation is the only interest, only the gold standard is used when its cost is sufficiently low, and only the regular assay is used when the cost of the gold standard is sufficiently high; besides these two extreme conditions, a mixed design of the three procedures should be adopted.
The outline of this paper is as follows. We introduce some statistical background in Section 2. In Section 3, we theoretically characterize a class of designs containing the optimal ones, and then we provide a search algorithm to obtain an optimal design in this class. We show some examples and discuss computation and other practical issues in Section 4. All proofs and technical details are given in Appendix A.

Modeling and cost framework
We refer to the cheaper, imperfect assay as the regular assay, and the higherpriced, perfect assay as the gold standard. The testing procedures based on only the regular assay, only the gold standard, or both assays are denoted as procedures r, g, and b respectively. Let θ = (p 0 , p 1 , p 2 ) , where p 0 is the prevalence, and p 1 and p 2 are respectively the true positive and true negative rates of the regular test. The parameters p 1 and p 2 are also known as sensitivity and specificity, respectively. We assume that p 0 ∈ (0, 1), p 1 , p 2 ∈ (0.5, 1], the two types of testing errors of the regular test occur randomly with error rates 1 − p 1 and 1−p 2 , respectively, and there is no testing error for the gold standard. Hence, for the procedure based on only the regular assay, the positive response probability (either true or false positive) of a trial with group size x is Similarly, for the procedure based on only the gold standard, the positive response probability of a trial with group size x is Moreover, when both assays are applied to the same group sample, the probabilities of observing that the regular assay has a true negative, a false negative, a false positive, and a true positive are In practice, there are generally limits on the group sizes; therefore, we consider designs subject to a known group size constraint 1 ≤ x L ≤ x ≤ x U < ∞, where the limits on the group sizes are driven by practical considerations.
To introduce costs, we let the total budget be C, and we assume that the costs of performing an assay and enrolling a subject are c 0p and c 1p , respectively, where the testing procedure is p = r, g, or b. In practice, these costs are known, with c ir ≤ c ig ≤ c ib for i = 0, 1. Without loss of generality, we set the cost for a regular individual test to satisfy the constraint c 0r + c 1r = 1, and the total budget C refers to how many trials with regular individual testing can be afforded. The cost function of a trial with group size x using procedure p = r, g, or b is denoted as (2.1) When the gold standard is not available, the scenario with a fixed total number of trials (Liu et al., 2012;Zhang et al., 2014;Huang et al., 2017) is equivalent to setting c 0r = 1, and the scenario with a fixed total number of subjects (Tu et al., 1995;Liu et al., 2012;Zhang et al., 2014) is equivalent to setting c 0r = 0. When the three testing procedures are all available, a mixed design can be utilized. Let P = {r, g, b} be the set of procedures. An experiment consists of n ip trials with group size x ip , where x i1p = x i2p if i 1 = i 2 , under procedure p ∈ P for i = 1, . . . , k p . For such an experiment, we denote a mixed design ξ with total budget C = p∈P w ip is the proportion of budget expended on procedure p, A p x refers to the single-point design with procedure p at size x, and ξ p = kp i=1 (w ip /w p )A p xip refers to the sub-design under procedure p.
For such an experiment, let y ir ∼ binomial(n ir , π r (x ir )) be the number of positive responses at x ir for i = 1, . . . , k r ; let y ig ∼ binomial(n ig , π g (x ig )) be the number of positive responses at x ig for i = 1, . . . , k g ; and let y ib = (y 00 ib , y 01 ib , y 10 ib , y 11 ib ) ∼ multinomial(n ib , π b (x ib )) be the number of true negative, false negative, false positive, and true positive responses at x ib . Thus, after subtracting an unimportant additive constant, the log-likelihood function of θ under design ξ in equation (2.2) is a linear combination of the log-likelihood functions under the single-point designs The maximum likelihood estimateθ is obtained by maximizing function (2.3), and the covariance matrix ofθ is asymptotically proportional to the inverse of the information matrix of ξ, which is a linear combination of the information matrices of the corresponding single-point designs, To precisely estimate the prevalence p 0 , we seek a design that minimizes the variance ofp 0 by the choice of k p and (x ip , w ip ) for i = 1, . . . , k p and p ∈ P under the cost constraint C and the design space constraint [x L , x U ]. In design theory, the design corresponds to a D s -optimal design (Atkinson et al., 2007, Chapter 10.3), which maximizes among all designs under which p 0 is estimable, where M − is a generalized inverse of matrix M and M 11 is its (1,1)-entry. From (2.4) and (2.5), the optimality of a design depends on the unknown parameters θ = (p 0 , p 1 , p 2 ) and on the cost functions (2.1) as inverse weights, but is invariant to the total budget C.

D s -optimal budget-constrained designs
In this section we theoretically characterize D s -optimal budget-constrained designs via the approximate design framework proposed by Kiefer (1974). We first consider the design space as the interval [x L , x U ] to obtain an overview of the behavior of these designs. Then we use these results to provide a guaranteed algorithm to obtain optimal designs supported on integers within [x L , x U ] for practical use.
Intuitively, when c g (x) is close to c r (x), one may prefer only to have gold standard tests, which means k b = k r = 0. In contrast, when c g (x) is much higher than c r (x), one may prefer only to have regular tests, and thus k b = k g = 0. Before addressing the relationship between cost functions and the best choice of {k r , k g , k b }, we first define a valid design to be one where p 0 is estimable. We then provide a class Ξ of designs in which there exists a D s -optimal design, with sharp upper bounds for the numbers of group sizes under the three procedures, where The following proposition characterizes all valid designs and establishes that Ξ contains a D s -optimal design. Its proof and other theoretical results are provided in Appendix A.

(i) A design ξ is valid if and only if
Note that k r ≥ 3 or k b ≥ 1 yields a nonsingular information matrix. Now we partition Ξ into subclasses. For a nonempty subset of procedures Q ⊆ P, we state a design ξ ∈ Ξ is a Q-design if k p > 0 for p ∈ Q and k p = 0 for p / ∈ Q. We denote that Ξ Q is the collection of all Q-designs. Therefore, Ξ is partitioned as Since D soptimality is equivalent to c-optimality with c = (1, 0, 0) (Atkinson et al., 2007, Chapter 17.5), which minimizes the asymptotic variance of c θ , a D s -optimal design must satisfy the generalized Elfving's theorem (Dette and Holland-Letz, 2009). In the next theorem we show that we only need to search for a D s -optimal design in Ξ r , Ξ g , Ξ b , and Ξ rb .
Theorem 3.1. There exists a D s -optimal design with either k g = 0 or k r +k b = 0.
Theorem 3.1 greatly simplifies the problem by eliminating the need to consider three of the most difficult subclasses, namely Ξ rg , Ξ gb , and Ξ rgb , among the seven. For the remaining four subclasses, we provide the corresponding optimal designs in three of them below, leaving only one subclass for further consideration. Now we characterize D r s -, D g s -, and D b s -optimal designs, respectively. Theorem 2 in Huang et al. (2020) provides the D r s -optimal design ξ * r which is a threepoint design involving the smallest allowable group size x L . Here we provide D g sand D b s -optimal designs using the complete class approach (Yang and Stufken, 2012). . (3.1) We have the following proposition.
Through Theorem 3.1, one of the three designs, ξ * r , ξ * g , and ξ * b , may be D soptimal among all valid designs. However, if none of them is, then there must exist an rb-design that is D s -optimal. More explicitly, Theorem 3.1 and Proposition 3.2 show that either ξ * g or an optimal design in Ξ r⊕b ≡ Ξ r ∪ Ξ b ∪ Ξ rb is D s -optimal among all valid designs, where the latter may be ξ * r , ξ * b , or an rb-design. Moreover, if ξ * b is optimal within Ξ r⊕b , then ξ * g must outperform all designs in Ξ r⊕b . Otherwise, we need to check whether ξ * g or an optimal design in Ξ r⊕b performs better.
Furthermore, we state that a design ξ ∈ Ξ r⊕b is D r⊕b s -optimal if it is D soptimal within Ξ r⊕b . Now we consider D r⊕b s -optimal designs. Proposition 3.1 implies that valid designs in Ξ r⊕b have nonsingular information matrices. Therefore, it is convenient to utilize the general equivalence theorem (Kiefer, 1974) to check whether ξ * r , ξ * b , or an r⊕b-design is D r⊕b s -optimal. We first state some notations. For a design ξ ∈ Ξ r⊕b and a single-point design A p x with procedure p = r or b, we define the directional derivative of Φ s at M (ξ) in the direction where M is the 2 × 2 submatrix of M after deleting the first row and the first column. We have the following theorem based on the proof of the general equivalence theorem.
In practice, group testing designs can only be supported on integers. Above we provide some key properties of D s -optimal designs supported on the interval [x L , x U ]. Our proofs of Propositions 3.1(i) and 3.2, and of Theorems 3.1 and 3.2 still hold for the integer-valued design space Ω (i) = {x L , . . . , x U }. We state that the D s(i) -optimality is the D s -optimality among integer-supported designs, where "(i)" stands for integers. A D r s(i) -optimal design, denoted as ξ (i) r , can be obtained from Algorithm 1 in Huang et al. (2020). Moreover, D g s(i) -and D b s(i)optimal designs, ξ (i) g and ξ (i) b , are respectively supported on the integers closest to x * g and x * b in (3.1) for p = g or b. Therefore, using similar arguments as above, Theorem 3.1 and Proposition 3.2 show that either ξ (i) g or a D r⊕b s(i) -optimal design is D s(i) -optimal among all designs supported on Ω (i) .
First we can utilize Theorem 3.2 to check whether ξ If none of them is, we provide a guaranteed algorithm based on Theorem 3.2 to obtain a D r⊕b s(i) -optimal design. For a design ξ and a procedure p, letφ p (ξ) = max x φ p (x, ξ) and letx p (ξ) be the corresponding maximizer. The proposed algorithm is presented as follows.

4.3: Let ξ
Once we obtain a D r⊕b s(i) -optimal design through Algorithm 3.1, we can compare it with ξ (i) g to check which one is D s(i) -optimal, using the algorithm below.
Algorithm 3.2 Construct a D s -optimal design ξ (i) .

1: Obtain ξ
Theorems 3.1 and 3.2 guarantee that the design obtained by Algorithms 3.1 and 3.2 must be D s -optimal. Algorithm 3.1 is guaranteed to terminate in a finite number of iterations, and it usually stops within ten iterations in our experience. The finite convergence of Algorithm 3.1 is due to its finite design space, and the sequence of designs is strictly monotonic with respect to the D s -criterion. Moreover, as a result of the convexity of the design criterion, this stepwise ascent algorithm converges to a global optimum. Note that these two algorithms which apply to discrete design spaces sometimes provide designs with more group sizes than stated in Proposition 3.1(ii), which applies to continuous design spaces. For example, in Section 4.2 we observe some optimal designs having two group sizes under procedure b, in contrast to the setting where k b ≤ 1 in Proposition 3.1(ii). This is because when an optimal value for the continuous design space is not an integer, and the two integers closest to it present roughly equal performance, the design including both integers may perform better than the designs including either one or the other.
Remark 3.1. In some practical situations only procedures r and g are available; for example, a group sample can only be tested once by either one of the two assays. Under this simpler setting, Theorem 3.1 indicates that a mixed design incorporating procedures r and g is not necessary. The better one between ξ (i) r and ξ (i) g with respect to the D s -criterion is optimal. Remark 3.2. For an approximate mixed design ξ and a given total budget C, the expected number of trials n 0 ip = Cw ip /c p (x ip ) with group size x ip under procedure p is not guaranteed to be an integer. For practical use, we obtain ξ with total budget C by implementing a two-step efficient rounding procedure for a budget-constrained design, introduced in Section 3.2 of Huang et al. (2020). More explicitly, we first allocate the budget to have n 0 ip trials with group size x ip under procedure p, where x is the largest integer not greater than x. Afterward, among all possible remaining budget allocations, we choose the one that minimizes the asymptotic variance of the prevalence estimate.
Remark 3.3. The above theoretical results rely on the assumption that each individual sample having the disease is independently Bernoulli distributed with prevalence p 0 . Although in practice, the sampling scheme is usually without replacement, the Bernoulli assumption is acceptable when the number of individuals in the study is smaller than a certain percentage of the population size, for instance, less than 20% (Christensen and Gardner, 2000).

Illustrative numerical examples
In this section we provide some numerical examples to study the performance of the D s(i) -optimal design when the working parameters may be moderately misspecified, and to investigate how multiple assays together with the cost settings affect the optimal designs. We consider the case where the working values of prevalence, true positive rate, and true negative rate areθ = (p 0 ,p 1 ,p 2 ) = (0.06, 0.95, 0.95) , with group sizes in [1,50]. We focus on the natural situation where the cost for performing both the regular and gold standard assays on a group sample is equal to the sum of the costs for performing the two assays separately, and where enrolling a subject for each procedure has identical costs. In this situation we only have two independent cost parameters c 0r ∈ [0, 1] and c 0g ≥ c 0r , whereas the other cost parameters are c 0b = c 0r + c 0g and c 1r = c 1g = c 1b = 1 − c 0r .

Sensitivity analysis
Here we examine how moderately misspecified working parameters affect the performance of the proposed D s(i) -optimal design. We set c 0r = 0.5 and c 0g = 10. By Algorithm 3.2 with working parametersθ = (0.06, 0.95, 0.95) , we obtain a D s(i) -optimal designξ supported on group sizes 11 with procedure r and 14 with procedure b with weights 0.352 and 0.648, respectively. When the total budget C = 8000, by Remark 3.2, we implementξ as 470 trials with size 11 under r and 296 trials with size 14 under b. The performance ofξ under a true value of θ is measured by the asymptotic efficiency where ξ * (θ) is a D s(i) -optimal design under θ.
Suppose that the true value of θ comes from Θ = (0.03, 0.09)×(0.9, 1) 2 , which coversθ. Figure 1 shows the asymptotic efficiency ofξ for θ drawn randomly from Θ. Among the 1000 draws, the asymptotic efficiency ofξ is always at least 60%, and is usually greater than 90% (860 draws), indicating that the designξ performs stably well. We also observe that the asymptotic efficiency decreases as the true prevalence remains away from the working prevalence, and as the true positive and true negative rates tend to one.

The role of multiple assays with cost considerations
Here we consider how the use of two assays together with cost constraints leads to different optimal designs than arise when either of these features is not present. We consider three different cases, with c 0r = 1, 0.5, or 0 respectively, and study how the optimal designs vary with c 0g . Three properties of variations concerning group sizes x ip , weights (proportions of budget) w ip , and relative numbers of trials n 0 ip /N , where N = i,p n 0 ip , for the design considered are presented in Figures 2, 3, and 4.
First, we consider the case with c 0r = 1, where subjects are cost-free. The properties of the proposed D s(i) -optimal designs for the selected c 0g are shown in Figure 2. As expected, a g-design and an r-design are optimal when c 0g is sufficiently low and sufficiently high, respectively, and an rb-design is optimal otherwise. When the gold standard assay is sufficiently cheap, in this example no more than twice the cost of a regular individual test, using only the gold standard assay is the optimal strategy, as it provides the most information about the prevalence. In contrast, when the gold standard assay is sufficiently expensive, at least 17 times the cost of a regular individual test in this example, it is optimal to run ξ (i) r which only uses procedure r. Besides these two extreme situations, a mixed design incorporating procedures r and b is optimal. In this case, the optimal design allocates more budget on procedure b, but more trials on procedure r. Moreover, a group size based on procedure r is close to the intermediate group size of ξ (i) r which mainly provides information about the prevalence (Huang et al., 2017). In particular, when c 0g is moderately expensive (c 0g ≥ 7 here), the optimal designs include the smallest group size x L with procedure r, which provides cheap information about the true negative rate to identify the prevalence (Huang et al., 2020). Moreover, under such moderate c 0g the optimal group sizes for the two procedures are usually different, and there may be multiple group sizes for one procedure. This indicates that using a common group size for both procedures r and b may not be optimal when distinct sizes for different procedures are allowed.
Next, we consider the cases with c 0r = 0.5 or c 0r = 0, whereby in the latter case the regular assay is cost-free. The properties of the proposed D s(i) -optimal designs for the selected c 0g in these two cases are shown in Figures 3 and 4. We observe patterns similar to those in Figure 2, where the optimal design strategy as c 0g increases is to first use procedure g only, then use a mixed design with procedures r and b, and finally use procedure r only. In addition, as the subject cost, c 1r = 1 − c 0r , grows, the r-designs are not favored because they usually contain tests with extremely large group sizes (Huang et al., 2020). Through Figures 2, 3 and 4, a design based on only procedure r is optimal only when the gold standard assay is extremely expensive, c 0g ≥ 17, 180, or 320 for the three cases considered here with subject cost c 1r = 0, 0.5, or 1, respectively. This indicates that as the subject costs grow, which is typical in modern studies involving human subjects, it is often worthwhile to develop a gold stan- dard assay, even when the imperfect regular assay is cost-free. Our results show that mixed designs are often optimal and may substantially outperform single assay approaches.

Appendix A: Supporting and technical materials
Here we provide the technical details of the propositions and theorems in the main text. In Section A.1 we present some supporting materials mainly based on the complete class approach. In Section A.2 we provide the technical proofs of the propositions and theorems.

A.1. Supporting materials
Firstly we provide the complete class results for procedures g and b based on the framework in Yang and Stufken (2012). The design ξ 1 is said to be at least as informative as ξ 0 under the Löwner ordering, denoted by ξ 1 ≤ L ξ 2 , if M (ξ 2 )−M (ξ 1 ) is nonnegative definite. Moreover, we state that a class of designs Ξ b1 ⊂ Ξ b0 = {ξ : k b > 0, k r = k g = 0} is a complete class under procedure b if for an arbitrary design ξ 0 ∈ Ξ b0 , there exists a ξ 1 ∈ Ξ b1 such that ξ 0 ≤ L ξ 1 . Now we present the following results.
Proof. For (i), note that the information matrix of A g x can be written as M (A g x ) = ν(x)e 1 e 1 , where e 1 = (1, 0, 0) and .
Therefore, by the definition of x * g , we have that ∀x, A g x ≤ L A g
We prove part (ii) as follows. First, use the general equivalence theorem to show that k r of a D s -optimal design cannot be greater than 3, which is identical to part (i) of Theorem 2 in Huang et al. (2020) and is therefore omitted. Moreover, Proposition A.1 implies that k g ≤ 1 and k b ≤ 1 are sufficient to ensure the existence of a D s -optimal design by the complete class approach under the Löwner ordering (Yang and Stufken, 2012).
Proof of Theorem 3.1. This theorem is proved by the generalized Elfving's theorem (Dette and Holland-Letz, 2009, Theorem 3). If there is a D s -optimal design, denoted by ξ = w r ξ r + w g ξ g + w b ξ b , with k r + k b ≥ 1 and k g ≥ 1, then ξ g and w r ξ r + w b ξ b are D s -optimal, where w p = wp wr+wb for procedures p = r or b. Let μ p (x) = λ p (x) 1/2 f p (x) for p = r, g, b0, b1, or b2. By Elfving's theorem, because ξ is also c-optimal with c = (1, 0, 0) , there exists δ ip = ±1 for i = 1, . . . , k p , procedure p = r or g, and exists δ ibj for j = 0, 1, 2 with 2 j=0 δ 2 iBj = 1 for i = 1, . . . , k b , such that the Elfving's representation of ξ, lies on the boundary of the generalized Elfving set E for some γ > 0.
Because μ g (x) ∝ c for all x, we have that both of the two designs ξ g and w r ξ r + w b ξ b must have their Elfving's representations lying in the direction of c. Therefore, it must be the case that the two representations are both equal to γc, which means that the two designs are both D s -optimal.
Proof of Proposition 3.2. Proposition A.1(i) directly shows that ξ * g is D g soptimal. By Proposition A.1(ii), there exists a group size x such that A b x is D b s -optimal. By maximizing Φ s {M (A b x )}, we get that x * b is the optimal group size. Finally, because we have the desired result.
Proof of Theorem 3.2. Note that all designs in Ξ r⊕b have a nonsingular information matrix. If ξ 1 is D r⊕b s -optimal, then for an arbitrary group size x and procedure p = r or b, the directional derivative of Φ s at M (ξ) in the direction M (A p x ) must have φ p (x, ξ) ≤ 0; otherwise there exists a δ 0 > 0 such that ξ 2 = (1 − δ 0 )ξ 1 + δ 0 A p x ∈ Ξ r⊕b and Φ s {M (ξ 2 )} − Φ s {M (ξ 1 )} > 0, which makes a contradiction.
On the other hand, if ξ 1 is not D r⊕b s -optimal, then there exists another design ξ 2 such that Φ s {M (ξ 2 )} − Φ s {M (ξ 1 )} > 0. Because M (ξ 2 ) is a linear combination of the information matrices of its support points under the corresponding procedures, we have that ip s are the group sizes in ξ 2 . Therefore, there must exist some φ p (x (2) ip , ξ 1 ) > 0.

Acknowledgment
The first two authors were supported by the Ministry of Science and Technology, Taiwan, with grants MOST 107-2118-M-008-005-MY2 and MOST 103-2118-M-110-002-MY2, respectively. The third author was supported by the National Center for Theoretical Sciences, Taiwan. The authors thank the editor and the referee for their valuable comments, Hsin-wen Chang for discussions on this topic, and Enago (www.enago.tw) for the English language review of this manuscript.