Flexible Inference of Optimal Individualized Treatment Strategy in Covariate Adjusted Randomization with Multiple Covariates

To maximize clinical benefit, clinicians routinely tailor treatment to the individual characteristics of each patient, so individualized treatment rules are needed and are of significant research interest to statisticians. In a covariate-adjusted randomization clinical trial with many covariates, we model the treatment effect as an unspecified function of a single index of the covariates and leave the baseline response completely arbitrary. We devise a class of estimators that consistently estimate the treatment effect function and its associated index while bypassing the estimation of the baseline response, which is subject to the curse of dimensionality. We further develop inference tools to identify predictive covariates and to isolate the effective treatment region. The usefulness of the methods is demonstrated in both simulations and a clinical data example.


Introduction
Precision medicine, defined as treatment targeted to individual patients' needs based on genetic, biomarker, phenotypic, or psychosocial characteristics that distinguish a given patient from other patients with similar clinical presentations (Jameson & Longo 2015), has generated tremendous interest in statistical research. Precision medicine based on an individual's health-related metrics and environmental factors is used to discover individualized treatment regimes (ITRs); methodology for such discovery is an expanding field of statistics.
Rules based on parametric models of the clinical outcome given treatment and other covariates are simple, yet can be incorrect when the parametric models are misspecified. Rules based on machine learning techniques to determine the relationship between clinical outcomes and treatment plus covariates, such as those in Zhao et al. (2009, 2011), are nonparametric and flexible but are often complex and may have large variability. Existing semiparametric methodologies, such as Murphy (2003) and Song et al. (2017), flexibly incorporate the relationship between the covariates and the response variables, but are not efficient due to the challenges in estimating the decision rules. Moreover, to the best of our knowledge, these existing semiparametric models do not allow covariate-adjusted randomization, where a patient is randomized to one of the treatment arms based on the patient's covariates and a predetermined randomization scheme. We are therefore motivated to derive semiparametric efficient estimators for flexible models that allow covariate-adjusted randomization.
Let X_i be the p-dimensional covariate vector, Z_i the treatment indicator, and Y_i the response, and assume the (X_i, Z_i, Y_i)'s are independent and identically distributed (iid) for i = 1, ..., n. Here the treatment indicator Z_i is often categorical (i.e., when several treatment arms are considered) but can also be continuous (i.e., when a continuous dosage range is considered). In the simplest case when Z_i is binary, we write Z_i = 1 if the ith patient is randomized to treatment and Z_i = 0 to placebo. When the randomization is covariate adjusted, the probability distribution of Z_i depends on X_i. Our goal is to analyze the data (X_i, Z_i, Y_i), i = 1, ..., n, so that we can develop a treatment rule that is best for each new individual. Without loss of generality, we assume a larger value of Y_0 indicates a better outcome. Equivalently, we want to identify a deterministic decision rule d(x), which takes as input a given value x of X_0 and outputs a treatment choice of Z_0 that maximizes the potential treatment result Y_0. Here X_0 denotes the covariate of the new individual, Z_0 the treatment decision, and Y_0 the potential treatment result given Z_0.
Let P_d be the distribution of (X_0, Z_0, Y_0) under a decision rule d and E_d the expectation with respect to this distribution. The value function is defined as V(d) = E_d(Y_0). An optimal individualized treatment rule d_0 maximizes V(d) over all decision rules; that is, d_0 = argmax_d V(d). It is important to recognize the difference between the clinical trial data (X_i, Z_i, Y_i), i = 1, ..., n, and the potential data associated with the new individual (X_0, Z_0, Y_0).
In covariate-adjusted randomization, let the probability of assignment to the treatment or placebo arm be a known function f_{Z|X}(x, z). Here and throughout, we omit the subindex i when describing the data from a randomized clinical trial as long as it causes no confusion. We model the treatment effect using a single index model of the covariates, while leaving the baseline response unspecified. This yields the model

Y = f(X) + Z g(β^T X) + ε.    (1)

We assume the regression error ε is independent of the covariates X and the treatment indicator Z, and ε ~ N(0, σ²). We do not assume Z and X to be independent, hence allowing the covariate-adjusted randomization procedure. Naturally, from (1), g(β^T x) is the treatment effect as a function of the covariate value x. We assume g to be a smooth function.
To retain the identifiability of g(·) and β, we fix the first component of β to be 1. Our goal is to estimate β and g(·), hence to estimate the treatment effect at any covariate value x.
The contributions of our proposed method beyond the existing literature can be summarized in the following five aspects. First, we propose an estimator and show that it is locally semiparametric efficient. Second, our method allows the randomization to depend on patients' covariates, which significantly broadens the practicality of the semiparametric methodology in real studies. Third, although illustrated for binary treatment selection, the proposed method is not restricted to the binary treatment case and is directly applicable even when Z is categorical or continuous. The method thus applies to slope estimation while the intercept, i.e., f(X) in (1), is unspecified and inestimable due to the curse of dimensionality caused by the multi-dimensionality of X. Fourth, different from Song et al. (2017), where B-spline expansions were used, we employ kernel smoothing techniques in the estimation and inference for optimal treatment regimes. Fifth, our method is more flexible in terms of model assumptions compared to Zhu et al. (2020).
The rest of the article is organized as follows. In Section 2, we devise a class of estimators for β and g(·) while bypassing the estimation of f(·). We study the large sample properties of the estimators and carry out inference to detect the effective treatment region in Section 3. Sections 4 and 5 extend the methodology to continuous and categorical treatments, respectively. Simulation experiments are carried out in Section 6, and the method is implemented on clinical trial data in Section 7. We conclude the article with a discussion in Section 8 and collect all the technical derivations and proofs in the Supplementary Material.

Basic estimation
The estimation of β and g(·) is complicated by the presence of the intercept term f(X). When X is of high or even moderate dimension, f(X) is challenging to estimate due to the well-known curse of dimensionality. A simple treatment is therefore to eliminate the effect of f(X). Following this approach, multiplying (1) by Z and by E(Z | X) respectively and taking conditional expectations given X, we have

E(ZY | X) = E(Z | X) f(X) + E(Z² | X) g(β^T X),
E(Z | X) E(Y | X) = E(Z | X) f(X) + E(Z | X)² g(β^T X).

Taking the difference of the above two equations yields

E(ZY | X) − E(Z | X) E(Y | X) = var(Z | X) g(β^T X).

Viewing X and Y as the covariate and response respectively, this is a classical single index model, and we can estimate β and g using standard methods (Ichimura 1993). Considering that var(Z | X) might be close to zero, one may want to avoid taking its inverse. In this case, a more stable estimator minimizes a kernel-weighted least-squares criterion in the local parameters (a_j, b_j) and the index parameter β, with kernel weights w_ij centered at β^T x_j. At a fixed β value, the minimization with respect to (a_j, b_j) does not rely on the other (a_k, b_k) values for k ≠ j, hence can be done separately, while the optimization with respect to β involves all the terms. The resulting β minimizer estimates β and the resulting a_j estimates g(β^T x_j). We can further estimate g(β^T x_0) by a_0, obtained from the same local fit at x_0.
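The elimination step above can be sketched numerically. The sketch below assumes a binary Z with known propensity e(x) = E(Z | X): the transformed response {Z − e(X)}Y/var(Z | X) has conditional mean g(β^T X) given X, so g can be recovered by a one-dimensional kernel fit on the index. The sine-shaped g, sample size, and bandwidth are hypothetical choices for illustration, not the paper's.

```python
import numpy as np

def transformed_response(y, z, e_x):
    """For binary Z with known propensity e(x) = E(Z | X), the transformed
    response {z - e(x)} y / var(Z | X) has conditional mean g(beta^T x),
    which removes the unknown baseline f(x); var(Z | X) = e(x){1 - e(x)}."""
    v = e_x * (1.0 - e_x)
    return (z - e_x) * y / v

def kernel_regression(t0, t, resp, h):
    """Local constant (Nadaraya-Watson) fit of resp on the scalar index t."""
    w = np.exp(-0.5 * ((t[None, :] - np.atleast_1d(t0)[:, None]) / h) ** 2)
    return (w @ resp) / w.sum(axis=1)

# toy usage with a known index beta = (1, -1) and propensity 0.5
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(size=(n, 2))
beta = np.array([1.0, -1.0])
z = rng.binomial(1, 0.5, size=n)
f = 0.05 * x.sum(axis=1)            # unknown baseline, never estimated
g = np.sin(x @ beta)                # hypothetical treatment-effect curve
y = f + z * g + rng.normal(scale=0.3, size=n)

y_tilde = transformed_response(y, z, np.full(n, 0.5))
t = x @ beta
g_hat = kernel_regression(np.array([0.0]), t, y_tilde, 0.15)
```

Note that f(x) drops out entirely: only a univariate smooth in the index β^T x is needed, which is the point of the construction.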

Proposed estimator
Although the above procedure provides one way of estimating β and g(·), it is somewhat ad hoc, and it is unclear whether other estimators exist that achieve similar or better estimation. To investigate the problem more thoroughly and systematically, we start with the likelihood function of a single observation (x, z, y),

η(x) f_{Z|X}(x, z) σ⁻¹ φ[{y − f(x) − z g(β^T x)}/σ].    (2)

Here η(x) is the marginal probability density or mass function (pdf or pmf) of X, and φ(·) is the standard normal pdf. We first focus our attention on estimating β alone, thus viewing g together with f, η and σ as nuisance parameters. In this case, (2) is a semiparametric model, so we derive estimators for β by deriving its influence functions and constructing estimating equations. It is easy to obtain the score function with respect to β by taking the partial derivative of the loglikelihood function with respect to the parameter. In Supplementary Material A.1, we further project the score function S_β(x, z, y, β, g, f, σ) onto the nuisance tangent space, the space spanned by the nuisance score functions, and obtain the efficient score function. Inspired by the form of the efficient score function, we propose a general class of consistent estimating equations for β, where f* is an arbitrary function of x and h* is an arbitrary function of β^T x. Regarding g* and u*, we have the freedom of estimating one of the two functions and replacing the other with an arbitrary function of β^T x, or estimating both.
To explore the various flexibilities suggested above, let f*(x) be an arbitrary pre-decided function, for example f*(x) = 0. We first examine the choice of approximating both u(·) and g(·). As a by-product of local linear estimation, we also approximate g'(·). Let û(β^T x) be a nonparametric estimator of u(β^T x); note that it involves only univariate nonparametric regression and can be obtained with a kernel method as in (3). Let α = (α_c, α_1)^T and, for j = 1, ..., n, let α̂(β, j) solve the estimating equation (4) with respect to α_c and α_1 for given β. In (4), we can replace w_ij with K_h{β^T(x_i − x_j)} and the resulting estimating equation is identical. The above procedure enables us to obtain α̂_c(β, j) as an approximation of g(β^T x_j) and α̂_1(β, j) as an approximation of g'(β^T x_j). We then plug in the estimated u, g and g' and solve (5) to obtain β̂. The procedure described above is thus a type of profiling estimator, where we estimate g(·), g'(·) and u(·) as functions of a given β, and then estimate β. The idea behind the construction of (4) is similar to the consideration of the efficient score function, and we describe the detailed derivation in Supplementary Material A.2. The estimator based on solving (5) is guaranteed to be consistent and has the potential to be efficient. In fact, it will be efficient if f*(·) happens to be the true f(·) function. Of course, this is not likely in practice. However, this is still the estimator we propose, because f(·) is formidable to estimate for large or even moderate p. Compared to all other estimators that rely on the same f*(·), the estimator proposed here yields the smallest estimation variability and is the most stable, as we will demonstrate in Section 3.
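The local linear building block of the profiling step can be sketched as follows. The helper fits a weighted line in the index around a target point t0, returning approximations of g(t0) and g'(t0); it is a plain weighted least-squares stand-in, not the exact estimating equation (4).

```python
import numpy as np

def local_linear(t0, t, resp, h):
    """Local linear weighted least squares of resp on the index t at t0.
    Returns (alpha_c, alpha_1), approximating g(t0) and g'(t0).  Gaussian
    kernel weights play the role of K_h{beta^T(x_i - x_j)} in the text."""
    d = t - t0
    w = np.exp(-0.5 * (d / h) ** 2)
    sw = np.sqrt(w)                       # sqrt weights for least squares
    X = np.column_stack([np.ones_like(d), d])
    coef, *_ = np.linalg.lstsq(X * sw[:, None], resp * sw, rcond=None)
    return coef

# noiseless check: a straight line in the index is recovered exactly
t = np.linspace(0.0, 1.0, 200)
resp = 2.0 + 3.0 * (t - 0.5)
a_c, a_1 = local_linear(0.5, t, resp, h=0.1)
```

In the profiling estimator, this fit would be repeated at each t0 = β^T x_j for the current value of β before updating β itself.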
The explicit algorithm of estimating β and g is described below.
Remark 1. We leave out the details of bandwidth selection in the algorithm. In Section 3, we will show that a large range of bandwidths yields the same asymptotic results and no under-smoothing is required. In other words, when the sample size is sufficiently large, the estimator of β is very insensitive to the bandwidth. However, for small or moderate sample sizes, and when the estimation of the functional form of g is of interest, different bandwidths may yield different results. In this case, a careful bandwidth selection procedure such as leave-one-out cross-validation needs to be implemented.
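A leave-one-out cross-validation bandwidth search of the kind suggested in the remark can be sketched as below; the Nadaraya-Watson predictor and the candidate grid are illustrative choices.

```python
import numpy as np

def loo_cv_bandwidth(t, resp, grid):
    """Leave-one-out cross-validation for the kernel bandwidth h: each
    resp[i] is predicted from the remaining points by a Nadaraya-Watson
    fit, and the h with the smallest squared prediction error wins."""
    best_h, best_err = grid[0], np.inf
    for h in grid:
        w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
        np.fill_diagonal(w, 0.0)          # leave observation i out
        pred = (w @ resp) / w.sum(axis=1)
        err = np.mean((resp - pred) ** 2)
        if err < best_err:
            best_h, best_err = h, err
    return best_h

# toy check: a wiggly curve should reject a heavily oversmoothing bandwidth
rng = np.random.default_rng(1)
t = rng.uniform(size=300)
resp = np.sin(6.0 * t) + rng.normal(scale=0.1, size=300)
h_star = loo_cv_bandwidth(t, resp, [0.02, 0.05, 0.1, 0.5])
```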

Alternative simpler estimators
To estimate β, instead of estimating both u(·) and g(·), we can estimate only u(·), as we now investigate. The basic idea is to replace g and g' by arbitrary functions, say g* and h* respectively, in the efficient score construction, and solve the resulting estimating equation (6) to obtain an estimator for β. The simplest choice is to set f* = g* = 0 and h* = 1. Denote the resulting estimator β̃. To further estimate the function g(·), we can use a simple local constant estimator. The resulting solution α̂_c(β̃, 0) is then our estimate of g at β̃^T x_0, i.e., ĝ(β̃^T x_0) = α̂_c(β̃, 0).
Likewise, we can opt to estimate g(·) instead of u(·). We can choose either to estimate g'(·) or to make a subjective choice for it. When we estimate g'(·) along with g(·), the procedure is the following: solve (4) to obtain α̂(β, j) for j = 1, ..., n, then solve (7) to obtain β̂. Here u* is an arbitrarily chosen function; the simplest choice is to set u* = 0.
Because the procedures involved in these two simpler estimators are similar to each other and are both simpler than the estimator described in Section 2.2, we omit the details of the computational algorithms.
Large sample property and inference

Theorem 1. Assume the regularity conditions listed in Supplementary Material A.3 hold.
When n → ∞, the estimator described in Section 2.2 satisfies √n(β̂ − β) → N(0, A^{-1}BA^{-T}) in distribution. Here and throughout the text, a^{⊗2} ≡ aa^T for a generic vector or matrix a. When f*(X) = f(X), the estimator attains the optimal efficiency bound.
We also have similar large sample properties for the two alternative estimators given in Section 2.3, stated in the following Theorems 2 and 3. The proofs of Theorems 1, 2 and 3 are given in the Supplementary Material.
Theorem 2. Assume the regularity conditions listed in Supplementary Material A.3 hold.
When n → ∞, the estimator obtained from solving (6) is root-n consistent and asymptotically normal. Here, g*'(·) denotes the first derivative of g*(·).
Interestingly, even when f*(X) = f(X), g*(β^T X) = g(β^T X) and h*(β^T X) = g'(β^T X), the estimator still does not achieve the optimal efficiency bound. This is in stark contrast with the proposed estimator, which is optimal as long as f*(X) = f(X).
Theorem 3. Assume the regularity conditions listed in Supplementary Material A.3 hold.
When n → ∞, the estimator obtained from solving (7) is root-n consistent and asymptotically normal. When f*(X) = f(X) and u*(β^T X) = u(β^T X), the estimator is efficient.

Estimation and Inference when Z is continuous
When Z is continuous, typically representing the dosage, a treatment effect of the form Z g(β^T X) may not be adequate: the only conclusion that can be drawn from such a model is that if g(β^T X) is positive, the largest dosage Z should be selected, and when it is negative, the smallest dosage should be taken. More useful models in this case include a quadratic or some other nonlinear model in Z. For example, if the quadratic model with treatment effect Z g_1(β^T X) + Z² g_2(β^T X) is used, a new patient with covariate X_0 and g_2(β^T X_0) < 0 would have the best treatment at the dosage Z_0 = −g_1(β^T X_0)/{2 g_2(β^T X_0)}. We thus consider a general polynomial model of the form Y = f(X) + Σ_{k=1}^K Z^k g_k(β^T X) + ε, while keeping all other aspects of the model assumptions identical to those of (1). Similar to (2), the pdf in this case can be written down, and the efficient score takes a form analogous to the binary case. At any x_0 = x_1, ..., x_n and any β, the functions g_k and their derivatives are estimated by nonparametric profiling, as in the binary case. Similar to the binary Z case, we also have the standard root-n consistency and local efficiency of β̂ for continuous Z. We state the results in Theorem 4 while skipping the proof because it is almost identical to that of Theorem 1.
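For the quadratic special case, the dosage recommendation described above reduces to a closed-form rule; the sketch below implements it, with clipping to a safe dosage range [z_min, z_max] added as an assumption.

```python
def optimal_dose(g1, g2, z_min, z_max):
    """Optimal dose under the quadratic effect z*g1 + z^2*g2: the interior
    maximizer -g1/(2*g2) when the curve is concave (g2 < 0), clipped to the
    safe range; otherwise the better endpoint of the range."""
    if g2 < 0:
        z_star = -g1 / (2.0 * g2)
        return min(max(z_star, z_min), z_max)   # clip to the safe range
    # convex or linear in z: the maximum sits at an endpoint
    val = lambda z: z * g1 + z * z * g2
    return z_min if val(z_min) >= val(z_max) else z_max
```

For example, g1 = 2 and g2 = -1 give the interior dose 1.0, while a non-concave effect curve sends the patient to whichever endpoint of the dosage range gives the larger effect.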
Theorem 4. Assume the regularity conditions listed in Supplementary Material A.3 hold.
When n → ∞, the estimator described in (12) and (13) is root-n consistent and asymptotically normal. Additionally, when f*(X) = f(X), the estimator attains the optimal efficiency bound.

Estimation and Inference when Z is categorical
To make the analysis complete, we now consider the case where Z is categorical. This arises naturally in practice when several different treatment arms are compared. Assume we consider K treatment arms in addition to the control arm. Thus, we have K binary variables Z_1, ..., Z_K, each taking value 0 or 1; note that at most one of the Z_k, k = 1, ..., K, can be 1 in each observation. A sensible model in this scenario is Y = f(X) + Σ_{k=1}^K Z_k g_k(β^T X) + ε, and the pdf follows analogously, where the W_k form a set of orthonormal bases. The efficient score closely resembles that of the continuous Z case, except that we are now treating a multivariate Z. We show the derivation of the efficient score in Supplementary Material A.8, where the detailed construction of the W_k's is also given. Similar to the continuous univariate Z case, we propose a general class of consistent estimating equations for β, given in (16). At any x_0 = x_1, ..., x_n and any β, ĝ_k(β^T x_0, β) and ĝ_k1(β^T x_0, β) are nonparametric profiling estimators obtained from (17). Similarly, we summarize the asymptotic properties of β̂ in Theorem 5 and skip the proof because of its similarity to that of Theorem 1.
Theorem 5. Assume the regularity conditions listed in Supplementary Material A.3 hold.
When n → ∞, the estimator described in (16) and (17) is root-n consistent and asymptotically normal. Additionally, when f*(X) = f(X), the estimator attains the optimal efficiency bound.

Simulation studies
We conduct six sets of simulation experiments, covering all the scenarios discussed so far, to investigate the finite sample performance of the methods proposed in Section 2. The results under the various scenarios reflect the superior performance of Method I and show reasonably accurate inference results as well. We also compute the percentage of correct decisions based on the g(·) functions; in most of the scenarios, we make the correct decision more than 90% of the time. In each experiment, we simulate 500 data sets.

Simulation 1
In the first experiment, we consider a relatively simple setting. Specifically, we let the covariate X_i = (X_i1, X_i2)^T, where X_i1 and X_i2 are independently generated from a uniform distribution on (0, 1). We set f(X_i) = 0.05(X_i1 + X_i2) and specify g(β^T X_i) with β = (1, −1)^T; note that the function g is monotone. We generate the model error ε_i from a centered normal distribution with standard deviation 0.3, and generate the treatment indicator Z_i from a Bernoulli distribution with success probability 0.5.
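A data-generating sketch in the spirit of this setting follows; the exact g used in the paper is not reproduced, so a placeholder monotone curve stands in for it.

```python
import numpy as np

def simulate_trial(n, seed=0):
    """Data generation in the spirit of Simulation 1: uniform(0,1)
    covariates, baseline f(x) = 0.05(x1 + x2), index beta = (1, -1),
    N(0, 0.3^2) errors, and a fair-coin treatment assignment.  The g
    curve below is a hypothetical monotone choice."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n, 2))
    beta = np.array([1.0, -1.0])
    z = rng.binomial(1, 0.5, size=n)
    f = 0.05 * x.sum(axis=1)
    g = x @ beta                    # placeholder monotone effect curve
    y = f + z * g + rng.normal(scale=0.3, size=n)
    return x, z, y, g

x, z, y, g = simulate_trial(1000)
```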
We implemented four methods for estimating the unknown parameter β for comparison.
Method I is the one given in (5). Method II is the alternative method proposed in (6), where we let f* = g* = 0 and h* = 1. Method III corresponds to (7), and Method IV refers to the method proposed in Section 2.1. To ensure identifiability, we fix the first component of β to be 1.
From the results reported in Table 1, we observe that while all the estimation methods have very small biases and standard errors, Method I performs significantly better than all other methods in terms of both estimation bias and standard error. This is within our expectation because Method I is a locally efficient estimator. In addition, in our experience, the computation times of the four methods are comparable. We thus recommend Method I due to its superior performance, and consequently focus on Method I for the inference on β and the estimation of and inference on g(·).
Although Theorem 1 provides an explicit form of the asymptotic variance, the resulting form contains f(X), whose estimation is subject to the curse of dimensionality and has been successfully avoided so far. Thus, to proceed with inference on β, we evaluate the coverage probabilities of the 95% confidence regions for β resorting to the bootstrap. The coverage probabilities for β_2 are reported in the upper part of Table 2 under Method I; all results are reasonably close to the nominal level. To evaluate the accuracy of the subsequent treatment assignment rule based on sign{ĝ(β̂^T X)}, we calculate the percentage of correct decisions, PCD = 1 − n⁻¹ Σ_{i=1}^n |I{ĝ(β̂^T x_i) > 0} − I{g(β^T x_i) > 0}|. We also evaluate the corresponding value functions; in this particular simulation, the value function simplifies due to the covariate-independent randomization. The values of PCD and VF and their standard errors are reported in the upper part of Table 2 under Method I.
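The two evaluation metrics can be sketched as follows. PCD compares the signs of the estimated and oracle effect curves; for the value function we show a standard inverse-probability-weighted estimator as a stand-in, since the exact formula used in this simulation is not reproduced here.

```python
import numpy as np

def pcd(g_hat, g_true):
    """Percentage of correct decisions: how often the estimated rule
    sign{g_hat} agrees with the oracle rule sign{g_true}."""
    return np.mean((g_hat > 0) == (g_true > 0))

def value_function(y, z, d, prob_z):
    """Inverse-probability-weighted value of a binary rule d: average
    outcome among patients whose assigned arm matches the rule, weighted
    by the known randomization probabilities prob_z = P(Z = 1 | X)."""
    match = (z == d).astype(float)
    p = np.where(d == 1, prob_z, 1.0 - prob_z)   # P(Z = d(X) | X)
    return np.sum(match * y / p) / np.sum(match / p)
```

With covariate-independent 50/50 randomization, the weights are constant and the value estimate reduces to the mean outcome among rule-concordant patients.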
For comparison, we consider the semiparametric single index model (SIM) approach of Song et al. (2017) to estimate the optimal individualized treatment strategy. Results are provided in Table 2. We can clearly see that our proposed Method I performs much better in that it yields smaller bias in estimating the parameter β_2, the PCD and the value function, compared to the SIM approach. The coverage probability for β_2 using the SIM approach does not achieve the nominal level of 95%. The estimation variability of our Method I is also smaller than that of the SIM method.

Simulation 2
In the second experiment, the covariate X_i = (X_i1, X_i2, X_i3)^T, where X_i1, X_i2, X_i3 are also independently generated from a uniform(0, 1) distribution. We specify f and g such that the g function here is both nonlinear and non-monotone. The regression errors ε_i and the treatment indicator Z_i are generated identically as in Simulation 1. As in Simulation 1, we implement the four methods to compare the estimation of the unknown parameter β, while fixing the first component of β at 1. The estimation biases and standard errors are reported in Table 3. From the results, we can again conclude that Method I performs significantly better than all other methods, even though all methods have acceptable biases and standard errors. We computed the coverage probabilities of the 95% bootstrap confidence intervals, reported in Table 4 under Method I. To further check the performance, we plot the median and 95% confidence bands for the estimation of g(·) in Figure 1, and the performance is satisfactory. We also report the results obtained by the SIM approach (Song et al. 2017) in Table 4. As expected, our proposed Method I performs better than the SIM method, even when the treatment effect function is not monotone, with smaller bias and less estimation variability.

Simulation 3
In the third simulation, the covariate X_i = (X_i1, X_i2)^T is generated from a bivariate normal distribution with zero mean and identity covariance matrix, and we specify g(β^T X_i) accordingly. The regression errors ε_i, the baseline model f(X_i) and the treatment indicator Z_i are generated similarly as in Simulation 1.
We summarize the estimation biases and standard errors of the unknown parameter β obtained from the four methods in Table 5. As usual, Method I performs significantly better than the other methods. In the upper part of Table 6, we report the coverage probability of the 95% bootstrap confidence intervals of β and the means and standard deviations of VF and PCD for Method I. We also display the 95% confidence band and median for g(·) in Figure 2. For comparison, we summarize the results of SIM (Song et al. 2017) in the lower part of Table 6. We observe that our proposed Method I yields smaller bias and less estimation variability.

Simulation 4
In the fourth simulation, the covariate X_i = (X_i1, X_i2)^T is generated as in Simulation 3, and we specify g(β^T X_i) accordingly. The regression errors ε_i and the baseline response model f(X_i) are generated identically as in Simulation 1. We allow the distribution of the treatment indicator Z_i to depend on the covariates. Specifically, Z_i is generated from a Bernoulli distribution with success probability exp(γ^T X_i)/{1 + exp(γ^T X_i)}, where γ = (0.3, −0.2)^T. Note that here we consider covariate-adjusted randomization.
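The covariate-adjusted assignment mechanism of this simulation can be sketched as below; the logistic form is our reading of the truncated expression in the text, and the assignment probability is known to the analyst, playing the role of f_{Z|X}.

```python
import numpy as np

def covariate_adjusted_assignment(x, gamma, rng):
    """Covariate-adjusted randomization in the spirit of Simulation 4:
    patient i receives treatment with probability expit(gamma^T x_i)."""
    p = 1.0 / (1.0 + np.exp(-(x @ gamma)))   # expit(gamma^T x)
    return rng.binomial(1, p), p

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
z, p = covariate_adjusted_assignment(x, np.array([0.3, -0.2]), rng)
```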
From the results in Table 7, we observe that all estimation methods have acceptable biases and standard errors even in the covariate-adjusted randomization setup. Method I performs significantly better than the other methods, as expected. In the upper part of Table 8, we summarize the coverage probability of the 95% bootstrap confidence intervals of β and the means and standard deviations of VF and PCD for Method I. We also plot the 95% confidence band and median for g(·) in Figure 3. From the results provided in Table 8, we can clearly see that our proposed Method I performs better in estimating the parameter β_2, the PCD and the value function, compared to the SIM approach. The estimation variability of our Method I is also smaller than that of SIM.

Simulation 5
In this simulation setting, we consider a categorical treatment indicator. The covariate X_i = (X_i1, X_i2, X_i3)^T is generated from the multivariate standard normal distribution. We consider three levels for the treatment indicator Z_i, namely 0, 1 and 2, with probabilities 0.4, 0.4 and 0.2, respectively. Relative to the baseline at Z = 0, we specify the effect functions g_1(β^T X_i) and g_2(β^T X_i), and the regression error ε_i is generated identically as in Simulation 2. As we have repeatedly observed in the previous simulations that Method I is superior to the other three methods, we only consider Method I to estimate the unknown parameter β and to proceed with the inference on β and the estimation of and inference on g_1(·) and g_2(·).
The results are summarized in the first parts of Tables 9 and 10, and the second parts of these tables provide the results obtained from SIM. From Table 9, we can see that using our proposed Method I, the estimation bias is acceptable and the coverage probability obtained from the bootstrap confidence interval is close to the nominal level of 0.95.
On the other hand, as expected, due to the non-monotone treatment effect functions g_1(·) and g_2(·), SIM yields large biases; SIM also performs poorly in terms of inference on β. From Table 10, we clearly observe that our proposed Method I yields higher PCD values and smaller bias in estimating the value function, compared to SIM. Figure 4 displays the 95% confidence bands and medians for g_1(·) and g_2(·).

Simulation 6
In the last setting, we consider a continuous treatment indicator. We generate the covariate X_i as in Simulation 4, consider a quadratic treatment effect with functions g_1(·) and g_2(·), and generate the regression error ε_i as in Simulation 2. As in Simulation 5, we implement Method I to estimate the unknown parameter β and proceed with the inference on β and the estimation of the value function and PCD for g_1(·) and g_2(·). The results are summarized in the upper parts of Tables 11 and 12. From the upper part of Table 11, we can see that the estimation bias is acceptable and the coverage probability obtained from the bootstrap confidence interval is close to the nominal level of 0.95 for our proposed method. Also, the value function obtained by the proposed method is close to the true value. In Table 12, we summarize the means and standard deviations of the PCD for g_1(·) and g_2(·). In Figure 5, we also plot the 95% confidence bands and medians for g_1(·) and g_2(·).
We also compare the performance of our method with the existing kernel assisted learning (KAL) method (Zhu et al. 2020). From the lower part of Table 11, we clearly see that our method performs better than the KAL method. We suspect that, due to their restrictive model setup, the value function estimation using KAL is not trustworthy. Indeed, Zhu et al. (2020) consider a setup in which the dose-response is a unimodal function maximized at 0 and H(X) is a non-negative function; E(Y | Z, X) is then maximized at Z = g(β^T X), where g: R → A is a predefined link function which ensures that the suggested dose falls within the safe dose range A. On the other hand, we consider a general polynomial model, and in this particular simulation setting we take K = 2. We therefore rewrite our model in a form similar to theirs so that we can implement their method in our simulation study. After rewriting, our model satisfies the conditions of KAL when β^T X > √2. Thus, we use the subset of simulated data that fits the KAL model requirement to obtain the KAL estimates of β_2 and the value function.

Real data analysis

We apply the proposed method to data from a clinical trial for patients with non-psychotic major depressive disorder. This study aimed to determine which antidepressant medications should be given to patients to yield the optimal treatment effect. For illustration purposes, we focus on a subset of 319 participants who were given the treatment bupropion (BUP) or sertraline (SER). Among these participants, 153 patients were randomly assigned to the BUP treatment group and the rest were assigned to the SER treatment group. The 16-item Quick Inventory of Depressive Symptomatology-Self-Report score (QIDS-SR(16)) was recorded at treatment visits for each patient and considered as the outcome variable. We used R = −QIDS-SR(16) as the reward to accommodate the assumption that a larger reward is more desirable.
In the original data set, there are a large number of covariates describing participant features such as age, gender, socioeconomic status, and ethnicity. However, many of them are not significantly related to the QIDS-SR(16). Following the study conducted by Fan et al. (2016), we include five important covariates in our study: "fatigue or loss of energy" in baseline protocol eligibility (DSMLE, X_1), patient's age (Age, X_2), "ringing in ears" in the patient-rated inventory of side effects at Level 2 (EARNG-Level2, X_3), "feeling of worthlessness or guilt" in baseline protocol eligibility (DSMFW, X_4) and "hard to control worrying" in the psychiatric diagnostic screening questionnaire at baseline (WYCRL, X_5).
Let X = (X_1, X_2, X_3, X_4, X_5)^T and let Z_i be the treatment indicator, where Z_i = 1 means the participant was assigned to the BUP treatment and Z_i = 0 represents the SER group. We fit the model Y_i = f(X_i) + Z_i g(X_i^T β) + ε_i. We are interested in finding the optimal treatment assignment for the patients. Since the treatment effects are described by g(X^T β), we first obtain the estimators for β and the function g(·), using the proposed locally efficient estimator described in Method I. We fix the coefficient of X_1 to be 1 and estimate the remaining four coefficients. The estimate of (β_2, β_3, β_4, β_5)^T is (3.00985, −4.08772, 3.83337, 0.460488)^T. To check whether these coefficients are significant, we use the bootstrap to obtain the estimation variance. We provide a box plot of the bootstrap estimates of β in Figure 6 and construct confidence intervals for β_2 to β_5. We find that the effects of β_3 and β_4 are significantly different from zero: the 95% confidence intervals for β_2, ..., β_5 are, respectively, (−2.001, 3.621), (−4.737, −0.299), (0.657, 4.920) and (−4.207, 3.224). We also estimate the function g(·) nonparametrically according to the method introduced in Section 2.2 to obtain ĝ, with its plot in Figure 7. To evaluate the significance of the nonparametric function g(·), we apply a permutation test. Specifically, we randomly permute the treatment labels Z_i and estimate g(·) from each permuted data set. If the true g(·) function is zero, then the permutation should not alter the true function g(·), and the function estimated from the original real data, i.e.
ĝ, is not likely to be an extreme case among all the estimated g(·) functions based on the permuted data sets. We plot the pointwise upper 95% quantile curve based on 1000 permuted data sets in Figure 7. It is clear that there is one region where the estimated curve ĝ lies above the 95% quantile curve.
Therefore, in this region, we have a significantly positive treatment effect using BUP.
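The permutation test can be sketched as follows, with a simple transformed-response kernel fit standing in for Method I and all data and tuning values illustrative rather than those of the study.

```python
import numpy as np

def permutation_band(t_grid, index, y, z, e, h, n_perm, rng):
    """Pointwise permutation reference curve for H0: g = 0.  Treatment
    labels are permuted (breaking any treatment-covariate link), g is
    re-estimated on each permuted data set, and the pointwise 95%
    quantile of the permuted curves is returned; the original estimate
    is then compared against this curve, as behind Figure 7."""
    def fit(zz):
        r = (zz - e) * y / (e * (1.0 - e))   # E(r | X) = g(beta^T X)
        w = np.exp(-0.5 * ((index[None, :] - t_grid[:, None]) / h) ** 2)
        return (w @ r) / w.sum(axis=1)
    curves = np.stack([fit(rng.permutation(z)) for _ in range(n_perm)])
    return np.quantile(curves, 0.95, axis=0)

# null-data usage: no treatment effect, so the band should dominate noise
rng = np.random.default_rng(2)
n = 400
idx = rng.uniform(-1.0, 1.0, size=n)
z = rng.binomial(1, 0.5, size=n)
y = rng.normal(scale=0.3, size=n)
grid = np.linspace(-0.8, 0.8, 9)
band = permutation_band(grid, idx, y, z, np.full(n, 0.5), 0.2, 200, rng)
```

Regions where the original ĝ exceeds this reference curve are flagged as having a significant treatment effect.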
Finally, we compare the optimal treatment assignment with the actual treatment assignment in this experiment. Under the estimated optimal ITR, participants are assigned according to their corresponding estimate of g(·); i.e., the ith participant is assigned to the treatment BUP if ĝ(X_i^T β̂) > 0 and to SER if ĝ(X_i^T β̂) ≤ 0. We then cross-classify the patients according to the optimal and the actual treatment assignments. The results are displayed in Table 13. We can see that a total of 179 participants were not assigned to their corresponding optimal treatment group. Therefore, the proposed method could have been used to improve patient satisfaction in this example.

Conclusion
We note that in the derivation of the proposed estimator we did not use the fact that Z² = Z for binary data. In fact, the estimation procedure, asymptotic theory and implementation details are directly applicable even when Z is categorical or continuous. A categorical Z corresponds to comparing multiple treatments, while a continuous Z could be used when a range of dosage levels is under examination.
In many clinical studies, the covariates are of very high dimension. To develop optimal individualized treatment rules in that setting, it will be important to perform variable selection and estimation of the individualized rules simultaneously. It is also of great interest to extend the current approach to multi-stage randomized clinical studies.

SUPPLEMENTARY MATERIAL
A.1 Derivation of the efficient score function S_eff(x, z, y, β, g, f, σ) for binary Z in Section 2.2

Taking derivatives of the logarithm of (2) with respect to η, f, g and σ, we obtain the corresponding nuisance tangent spaces. It is easy to verify that Λ_η ⊥ (Λ_f + Λ_g + Λ_σ) and Λ_σ ⊥ (Λ_f + Λ_g). To further orthogonalize Λ_f and Λ_g, we define suitably centered versions of their elements. Consequently, the nuisance tangent space orthogonal complement is characterized accordingly. We further calculate the score function, where x_L is the lower (p − 1)-dimensional sub-vector of x, and decompose it into its projection onto the nuisance tangent space and the remainder; the remainder is the efficient score function.

A.2 Derivation of the estimating equation (4)

Following the local linear estimation idea, we replace g(β^T x) with α_c + α_1 β^T (x − x_0). Our goal is to estimate α = (α_c, α_1)^T. The corresponding pdf of (X, Z, Y) changes accordingly. Viewing α together with β as the parameters of interest, and η, f and σ as nuisance parameters, at any fixed β and following the same lines of derivation as in Section A.1, we obtain the efficient score. We decompose the score functions with respect to β and α analogously and verify the required orthogonality. Extracting the last two components of the efficient score function provides the elements needed to construct the estimating equations for α. Since our goal is to estimate g(·) at β^T x_j, we set x_0 = x_j for j = 1, . . ., n, and weight the evaluation of the estimating function at the ith observation by w_ij. This yields (4).
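The local linear approximation step in A.2 can be illustrated with a simplified sketch. For clarity this solves a plain kernel-weighted least squares problem rather than the efficient estimating equation (4), and it absorbs the baseline into a local intercept and slope in the index, a simplification the actual method avoids; all names and the Gaussian kernel are illustrative.

```python
import numpy as np

def local_linear_g(index, Z, Y, eval_points, h=0.5):
    # Illustrative local linear fit of the treatment-effect function g(.):
    # around each evaluation point u0 we approximate g(u) by
    # alpha_c + alpha_1 * (u - u0) and solve a kernel-weighted least
    # squares for the treated-vs-control contrast.  A simplified stand-in
    # for the efficient estimating equation (4).
    g_hat = np.empty(len(eval_points))
    for j, u0 in enumerate(eval_points):
        w = np.exp(-0.5 * ((index - u0) / h) ** 2)  # kernel weights w_ij
        # Columns: local baseline (intercept, slope) and local treatment
        # effect alpha = (alpha_c, alpha_1); the true method leaves the
        # baseline f(X) fully unspecified instead.
        D = np.column_stack([
            np.ones_like(index), index - u0,
            Z, Z * (index - u0),
        ])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(sw[:, None] * D, sw * Y, rcond=None)
        g_hat[j] = coef[2]  # alpha_c, the local estimate of g(u0)
    return g_hat
```

Setting the evaluation points to the observed index values β^T x_j mirrors the choice x_0 = x_j in the derivation above.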

A.3 List of regularity conditions
C1. The univariate kernel function K(·) is symmetric, has compact support and is Lipschitz continuous on its support. It satisfies the usual moment conditions.

C2. The probability density function of β^T x, denoted f_1(β^T x), is bounded away from 0 and ∞.
, where a_1(X, Z, Y) can be any function of Y, Z and x.

C5. The bandwidths satisfy nh
C6. In a domain B of the parameter values, the limiting estimating equation for β as n → ∞ has a unique solution.

A.4 Proof of Theorem 1
We decompose the estimating equation into the expansion (A.1) and inspect each of its eight terms. Under Condition C5, the first four terms go to zero in probability as n → ∞. Plugging (3) into the fifth term and denoting its last summand a(o_i, o_j), the U-statistic property shows that the fifth term in (A.1) is of order o_p(1). We next examine the sixth and seventh terms in (A.1). From Section 2.2, combining the resulting expressions with the definition of u(β^T x_i), the sixth and seventh terms in (A.1) are of order o_p(1) as well. It is then easy to check that when f*(·) = f(·), the limiting variance is the variance of the efficient score function, which proves Theorem 1.
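The "U-statistic property" invoked in these proofs is the standard projection of a mean-zero U-statistic. In the notation above, for a kernel a(o_i, o_j) with E{a(O_i, O_j)} = 0,

```latex
\frac{1}{n(n-1)}\sum_{i\neq j} a(O_i, O_j)
  = \frac{1}{n}\sum_{i=1}^{n} E\{a(O_i, O_j)\mid O_i\}
  + \frac{1}{n}\sum_{j=1}^{n} E\{a(O_i, O_j)\mid O_j\}
  + o_p(n^{-1/2}).
```

When both conditional expectations also vanish, the U-statistic is degenerate and of order O_p(n^{-1}), which is how the o_p(1) rates for the remainder terms are obtained.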

A.5 Proof of Theorem 2
We decompose the estimating equation and, using the form of u(·) in (3), write its leading term as a second-order sum; denote the summand a(o_i, o_j). We compute the expectation of a(o_i, o_j) conditional on the ith observation and conditional on the jth observation, and note that E{a(O_i, O_j)} = 0. Thus, using the U-statistic property, we obtain the asymptotic linear expansion, which by the central limit theorem converges to a normal distribution with mean zero and variance given in (8).
Thus we obtain the final expansion. We now handle the second term in (A.3); denote its summand b(o_i, o_j). Taking the expectation of b(o_i, o_j) conditional on the ith observation, and then conditional on the jth observation, shows that both projections vanish, hence the last term is of order o_p(1). The central limit theorem then directly leads to the result, with the variance given in (9).
A.7 Derivation of the efficient score function S_eff(x, z, y, β, g, f, σ) for continuous Z in Section 4

Taking derivatives of the logarithm of (11) with respect to η, f, g and σ, we obtain the nuisance tangent spaces with respect to the various nonparametric components. It is easy to verify that Λ_f ⊥ Λ_{g_k} for k = 1, . . ., K. We now further orthogonalize the Λ_{g_k}'s. Define W_1, . . ., W_K so that E(W_j W_k | β^T X) = I(j = k); it is easy to see that W_1, . . ., W_K is an orthonormal representation of Z − E(Z | X), . . ., Z^K − E(Z^K | X). Although the relation is complex, it is a linear transformation, hence using matrix notation we can write it compactly. The efficient score function then follows as in the binary case.

A.8 Derivation of the efficient score function S_eff(x, z, y, β, g, f, σ) for categorical Z in Section 5

Taking derivatives of the logarithm of (15) with respect to η, f, g and σ, we obtain η_1(X) f_{Z|X}(Z, X).
The nuisance tangent spaces with respect to the various nonparametric components are, respectively, Λ_η = {a(X) : a(X) ∈ R^{p−1} s.t. E(a) = 0}, where ε = Y − f(X) − Σ_{k=1}^K Z_k g_k(β^T X). It is easy to verify that Λ_f ⊥ Λ_{g_k} for k = 1, . . ., K. We now further orthogonalize the Λ_{g_k}'s.
Define W_1 = Z_1 − E(Z_1 | X) and standardize it as W_1 = W_1 / sd(W_1 | β^T X). For j < k, k = 2, . . ., K, define W_k analogously by removing the projections onto W_1, . . ., W_{k−1} and standardizing. We can verify that E(W_j W_k | β^T X) = I(j = k), so W_1, . . ., W_K is an orthonormal representation of Z_1 − E(Z_1 | X), . . ., Z_K − E(Z_K | X). Although the relation is complex, it is a linear transformation, hence using matrix notation we can write it compactly. Then we have Λ = Λ_η ⊕ Λ_f ⊕ Λ_{g_1} ⊕ · · · ⊕ Λ_{g_K} ⊕ Λ_σ. Consequently, the nuisance tangent space orthogonal complement is characterized accordingly. We further calculate the score function, where x_L is the lower (p − 1)-dimensional sub-vector of x, decompose it as in the binary case, and obtain the efficient score function, since the W_k's form a set of orthonormal bases. Supplementary Material A.7 provides the detailed construction of the W_k's and the derivation of the efficient score function. Naturally, based on the form of the efficient score function and our experience in the binary Z case, we propose a general class of consistent estimating equations for β as Σ_{i=1}^n
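The orthonormalization of the W_k's above is a Gram-Schmidt construction. The sketch below illustrates it with unconditional empirical moments in place of conditioning on β^T X, purely to show the centering, projection-removal and standardization steps; the function name is hypothetical.

```python
import numpy as np

def orthonormalize_treatments(Z_dummies):
    # Illustrative Gram-Schmidt construction of W_1, ..., W_K from the
    # treatment indicators Z_k.  The paper's construction conditions on
    # beta^T X; here plain empirical moments are used for illustration.
    n, K = Z_dummies.shape
    W = np.empty((n, K))
    for k in range(K):
        w = Z_dummies[:, k] - Z_dummies[:, k].mean()   # center: Z_k - E(Z_k)
        for j in range(k):                             # remove projections
            w -= np.mean(w * W[:, j]) * W[:, j]        # onto W_1..W_{k-1}
        W[:, k] = w / np.sqrt(np.mean(w ** 2))         # standardize
    return W
```

By construction the columns satisfy the empirical analogue of E(W_j W_k) = I(j = k).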

Table 1 :
Bias and standard deviations (SD) for the estimation of β 2 in Simulation 1.

Table 2 :
Estimation and inference results for β_2 in Simulation 1 using Method I and the SIM approach (Song et al. 2017). We report the mean bias of the estimated single index coefficients, the standard errors (SE) of the estimated single index coefficients, and the estimates of PCD and the value function over 500 replications with their empirical standard deviations (SD). The true value function is 0.4314.
We constructed confidence regions for β for Method I based on the depth function (Liu et al. 1999) to accommodate the two-dimensional parameter. The results are reported in Table 4. All results are reasonably close to the nominal level. We also report the means and standard deviations of VF and PCD in the lower part of the table.

Table 4 :
Estimation and inference results for β_2 in Simulation 2 using the SIM approach (Song et al. 2017). Other caption details are the same as in Table 2. The true value function is 0.3937.

Table 5 :
Bias and standard deviation (SD) for the estimation of β 2 in Simulation 3.

Table 6 :
Estimation and inference results for β_2 in Simulation 3 using Method I and the SIM approach. Other caption details are the same as in Table 2. The true value function is 1.1306.

Table 7 :
Bias and standard deviation (SD) for the estimation of β 2 in Simulation 4.

Table 8 :
Estimation and inference results for β_2 in Simulation 4 using Method I and the SIM approach. Other caption details are the same as in Table 2. The true value function is 0.9027.

Table 9 :
Bias, standard deviation (SD) and coverage probabilities (CP) for the estimation of β 2 and β 3 in Simulation 5 using proposed method and SIM method.

Table 10 :
The percentage of correct decisions (PCDs) with standard errors, and the estimated value function (VF) with empirical standard deviations, in Simulation 5 using the proposed method and the SIM method. Here, PCD1 and PCD2 correspond to g_1(·) and g_2(·), respectively. The true value function is 0.8056.

Table 11 :
Inference results for β_2 and estimation of the value function using our proposed method and the KAL method (Zhu et al. 2020) in Simulation 6. The true value function is 0.6652.

Table 12 :
The percentage of correct decisions (PCDs) with standard deviations obtained by the proposed method in Simulation 6. Here, PCD1 and PCD2 correspond to g_1(·) and g_2(·), respectively.

We demonstrate our proposed method using the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. The STAR*D study was a sequential, multiple-assignment, randomized trial (SMART; Murphy (2005) and Qian et al. (2013))

Table 13 :
Real treatment assignments versus the optimal treatment assignments.