Adaptive clinical trial designs for phase I cancer studies

Abstract: Adaptive clinical trials are becoming increasingly popular research designs for clinical investigation. Adaptive designs are particularly useful in phase I cancer studies where clinical data are scant and the goals are to assess the drug dose-toxicity profile and to determine the maximum tolerated dose while minimizing the number of study patients treated at suboptimal dose levels. In the current work we give an overview of adaptive design methods for phase I cancer trials. We find that modern statistical literature is replete with novel adaptive designs that have clearly defined objectives and established statistical properties, and are shown to outperform conventional dose finding methods such as the 3+3 design, both in terms of statistical efficiency and in terms of minimizing the number of patients treated at highly toxic or nonefficacious doses. We discuss statistical, logistical, and regulatory aspects of these designs and present some links to non-commercial statistical software for implementing these methods in practice.


Introduction
Clinical drug development starts with phase I first-in-human (FIH) studies. The major goals of an FIH study are to explore the safety, tolerability and pharmacokinetics of the compound and to identify suitably safe doses for testing in subsequent studies. In many disease areas, FIH studies are conducted in healthy volunteers who do not expect to benefit from the drug. However, in life-threatening diseases such as cancer, the drugs are highly toxic and may unnecessarily harm healthy volunteers. For this reason phase I oncology trials are typically small, non-comparative studies conducted in patients with advanced cancer who have failed standard treatment options.
For cytotoxic anticancer agents a dual assumption is usually made: both the probability of toxicity and the probability of therapeutic response are increasing with dose. It is implicitly assumed that a higher dose, if it is not prohibitively toxic, will subsequently translate into higher treatment efficacy. The primary objective of a phase I cancer trial is to determine the maximum tolerated dose (MTD), defined as the highest dose level at which the rate of toxicity is "acceptable". A clinical trial protocol must specify the criteria for dose limiting toxicity (DLT), often defined as any severe (requiring hospitalization) or life-threatening adverse event. In the USA, the National Cancer Institute Common Toxicity Criteria are commonly used to define DLT [1]. There are two different philosophies in defining the MTD [2]. The first one treats MTD as a statistic observed from the data and the second treats MTD as a quantile of a monotonic dose-toxicity probability curve. The second approach is more scientifically sound.
Let $\Omega_d = \{d_1 < \cdots < d_K\}$ denote a set of pre-specified doses of an experimental drug, selected based on data from animal studies. Let $Y$ denote a binary indicator of toxicity ($Y = 1$ if DLT; $Y = 0$ if no DLT). Assume a monotone relationship between the dose and the probability of toxicity:

$$\Pr(Y = 1 \mid d) = \psi(d), \tag{1.1}$$

where $\psi(d)$ is an unknown continuous monotone increasing function such that $0 \le \psi(d_1) < \cdots < \psi(d_K) \le 1$. Let $\Gamma \in (0, 1)$ be a pre-specified target probability of DLT (in phase I trials $\Gamma$ is typically set between 0.1 and 0.35). Then $\gamma$, the dose level at which the probability of toxicity equals $\Gamma$, is rigorously defined as the $(100 \times \Gamma)$th quantile of $\psi$,

$$\gamma = \psi^{-1}(\Gamma). \tag{1.2}$$

Because the dose-toxicity curve is continuous, $\gamma$ may not be among the dose levels in $\Omega_d$. This leads to an alternative definition of the MTD as the dose level in $\Omega_d$ with toxicity probability closest to $\Gamma$,

$$\mathrm{MTD} = \arg\min_{d_k \in \Omega_d} |\psi(d_k) - \Gamma|. \tag{1.3}$$

At the time when a trial is designed, knowledge about the underlying dose-toxicity relationship is scant, and so phase I trial designs are naturally adaptive. The basic setup of a phase I oncology trial design includes pre-specified doses to be evaluated, a starting dose that is thought to be safe (commonly $d_1$), a maximum sample size in the study, a number of subjects to be treated at each dose (cohort size), criteria for dose escalation/de-escalation, stopping rules, and criteria for determining an MTD at the end of the trial. Commonly patients are enrolled in cohorts of size $c$, where $c$ is a fixed small positive integer (if $c = 1$ we have a fully sequential design). Let $X_j$ denote the dose assignment ($X_j \in \Omega_d$), and $Y_j = (Y_{j1}, \ldots, Y_{jc})$ denote the toxicity outcomes for the $j$th cohort of patients. Let $F_j = \{(x_1, y_1), \ldots, (x_j, y_j)\}$ denote the cumulative data from cohorts $1, \ldots, j$ in the study (lowercase $(x_j, y_j)$ is used instead of $(X_j, Y_j)$ to emphasize the observed data).
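The definition of the MTD as the dose in the pre-specified set with toxicity probability closest to the target can be illustrated with a minimal Python sketch. The function name `discrete_mtd` and the toxicity probabilities below are hypothetical and serve only as an illustration; in a real trial the true probabilities are unknown.

```python
def discrete_mtd(tox_probs, target):
    """Return the 0-based index of the dose whose toxicity
    probability is closest to the target (ties go to the lower dose)."""
    return min(range(len(tox_probs)), key=lambda k: abs(tox_probs[k] - target))


# Hypothetical true toxicity probabilities psi(d_1), ..., psi(d_5)
psi = [0.05, 0.12, 0.25, 0.40, 0.55]

# With target 0.30 the closest dose is the third one (index 2),
# since |0.25 - 0.30| is smaller than |0.40 - 0.30|.
mtd_index = discrete_mtd(psi, target=0.30)
print("MTD is dose d_%d" % (mtd_index + 1))
```

Note that the continuous quantile in (1.2) would fall strictly between the third and fourth doses in this example, which is why the discrete definition is needed.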
In general, the dose assignment for the $(j + 1)$th cohort of patients is determined as

$$X_{j+1} = D(F_j), \tag{1.4}$$

where $D$ is some prospectively defined design adaptation rule. We shall now discuss some terminology and classifications of phase I trial designs. The designs which utilize only data from the most recent cohort of patients (in which case the allocation rule is $X_{j+1} = D(x_j, y_j)$) are referred to as "short memory" or "memoryless" [3,4]. For instance, the 3+3 design (Subsection 2.1) uses the estimated toxicity rate based on data from 3 or 6 patients to determine the dose assignment for the next cohort; the "biased coin design" [5,6] (Subsection 2.5.1) uses the dose and the toxicity outcome of the current patient to determine the dose assignment for a new patient (Markovian design). Other designs utilize a greater amount of information in the allocation scheme. For example, the "cumulative cohort design" [7] (Subsection 2.5.4) uses cumulative toxicity data from patients at a given dose (while ignoring information at other doses) to determine the dose assignment for the next cohort of patients. In contrast, the designs that are based on parametric models for dose-toxicity curves utilize the entire history $F_j = \{(x_1, y_1), \ldots, (x_j, y_j)\}$ to borrow information across different dose levels. The latter designs are referred to as "long memory" [8] or "designs with memory" [4], and we shall discuss them in Section 3.
In any phase I study, there are two important goals related to the individual and collective ethics. The individual ethics (treatment goal) requires that every patient be treated at the dose closest to the MTD. The collective ethics (estimation goal) requires that the trial achieves accurate estimation of the MTD at the end of the study, to benefit future patients. The designs that attempt to assign each new patient to the dose that is currently viewed as closest to the target given current data are referred to as "Best Intention" (BI) designs [9]. Examples of BI designs include the Continual Reassessment Method (CRM) of O'Quigley et al. [10] (Subsection 3.1) and the Escalation with Overdose Control (EWOC) method of Babb et al. [11] (Subsection 3.2), to name a few. BI designs are viewed as ethically appealing as they attempt to fulfil the treatment goal; however, they can be suboptimal in terms of convergence and estimation efficiency of the parameters of interest [8,12]. In contrast, designs that attempt to achieve the estimation goal are based on formal optimal design theory [13,14]. Dose assignments for such designs are determined sequentially to minimize some convex criterion of the Fisher information matrix, which is best from a statistical perspective but may result in assigning many subjects to highly toxic doses. Recently, several adaptive design methods were proposed to achieve a tradeoff between individual and collective ethics; these include constrained Bayesian optimal designs [15] (Subsection 3.3.2), "hybrid" designs [16,17] (Subsection 3.3.3), and penalized adaptive D-optimal designs [18].
Some desirable properties of a phase I trial design are formalized in [19]. These include coherence [20] (escalate only after no toxicity is observed and de-escalate only after a toxicity is observed), consistency (the sequence of dose assignments should converge to the target MTD), high sensitivity (the interval in which the toxicity probability of the selected dose will eventually fall should be narrow), and unbiasedness (the design's performance should improve as the underlying dose-toxicity curve becomes steeper). In brief, a scientifically sound phase I trial design should concentrate dose assignments at and near the MTD and minimize dose assignments at suboptimal (either too low or overly toxic) dose levels, recognizing that a greater penalty is associated with overdosing than with underdosing; it should also lead to accurate estimation of the MTD at the end of the study.
In this paper, we give an overview of statistical designs for phase I cancer trials to determine the MTD. We distinguish two major types of designs: i) algorithm-based and nonparametric model-based designs (Section 2) and ii) designs that use explicit parametric models to direct dose assignments (Section 3). Such a classification is consistent, for example, with the one in the edited volume by Chevret [21]. Section 4 discusses data analysis issues following phase I trial designs. Section 5 describes additional important topics including designs which perform dose search over a continuum of dose levels, designs for drug combination and dose-schedule combination trials, and designs for more complex settings with multiple toxicities. Section 6 presents concluding remarks.

Algorithm-based and nonparametric designs to determine MTD
A useful definition of an algorithm-based design is given in Chapter 2 of Cheung [22], p. 14: "Generally, an algorithm-based design prescribes a set of escalation rules for any dose without regard to the outcomes at other doses." This definition ensures that the rule can be tabulated and clinical investigators can preview all possible dose escalation decisions before the trial starts. Some notable algorithm-based designs are the "traditional" 3+3 design and its generalizations (Subsections 2.1-2.2) and Bayesian designs based on toxicity posterior intervals (Subsection 2.3). Unlike algorithm-based designs, nonparametric model-based designs can include additional features such as randomization and/or nonparametric estimation of dose-toxicity probability curve which increases design complexity. Both algorithm-based and nonparametric designs require no parametric assumption about the underlying dose-toxicity probability relationship. We will discuss nonparametric up-and-down designs and their generalizations in Subsection 2.5 and isotonic designs in Subsection 2.6.

The "traditional" 3+3 design and A+B designs
The "traditional" or 3+3 design is the most commonly used design in phase I oncology trials [23]. The 3+3 design uses the philosophy that MTD is identifiable from the data. The design is very simple which explains its popularity among clinical investigators. Patients are treated in cohorts of size 3, starting with the lowest dose level and never skipping an untested dose when escalating. At any dose level a maximum of 6 patients are treated. For the initial three patients at each dose, if 0/3 toxicities are observed then the next three patients are treated at the next higher dose; if 1/3 toxicities are observed then the next three patients are treated at the same dose; if ≥ 2/3 toxicities are observed then the MTD is considered to have been exceeded. Suppose six patients have been treated at a given dose. If 1/6 toxicities are observed then the next three patients are treated at the next higher dose; if 2/6 toxicities are observed then the current dose is declared the MTD; if ≥ 3/6 toxicities are observed then the MTD is considered to have been exceeded. Therefore, by construction, the 3+3 design is supposed to target the 33rd percentile of a dose-toxicity distribution curve [24]. Heather and Mackey [25] made an interesting observation that the 3+3 design uses isotonic regression for estimating MTD. Storer [26,27] was one of the first to explore statistical properties of the 3+3 design and proposed a few modifications, called "up-and-down designs". Several other practical modifications of the 3+3 design were proposed [28,29,30,31].
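The escalation rules of the 3+3 design described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not a reference implementation: the text does not say which dose is recommended once the MTD is "exceeded", so the sketch assumes the next lower dose is recommended (or none if the lowest dose is too toxic), and it assumes the highest dose is recommended if escalation is indicated there. The function name `three_plus_three` is hypothetical.

```python
import random


def three_plus_three(tox_probs, rng):
    """Simulate one 3+3 trial.

    Returns (recommended dose index or None, patients treated per dose).
    """
    n_doses = len(tox_probs)
    treated = [0] * n_doses
    k = 0  # start at the lowest dose
    while True:
        # First cohort of three patients at dose k
        tox = sum(rng.random() < tox_probs[k] for _ in range(3))
        treated[k] += 3
        if tox == 1:
            # 1/3 toxicities: treat three more patients at the same dose
            tox += sum(rng.random() < tox_probs[k] for _ in range(3))
            treated[k] += 3
            if tox == 2:
                return k, treated  # 2/6: current dose declared the MTD
            if tox >= 3:
                # >= 3/6: MTD exceeded (assumption: recommend next lower dose)
                return (k - 1 if k > 0 else None), treated
            # 1/6 toxicities: fall through to escalation
        elif tox >= 2:
            # >= 2/3: MTD exceeded (assumption: recommend next lower dose)
            return (k - 1 if k > 0 else None), treated
        # 0/3 or 1/6 toxicities: escalate without skipping doses
        if k == n_doses - 1:
            return k, treated  # assumption: top dose recommended
        k += 1


# Hypothetical dose-toxicity scenario
rng = random.Random(2024)
mtd, treated = three_plus_three([0.05, 0.12, 0.25, 0.40, 0.55], rng)
print(mtd, treated)
```

Repeating the simulation many times over several dose-toxicity scenarios gives the empirical distribution of the recommended dose, which is how operating characteristics of the design are typically assessed.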
Many papers assessed operating characteristics of the 3+3 design (along with other phase I designs) via simulation [32,33,34,35,36,37]. Reiner et al. [38] studied the issue of early stopping of the 3+3 design and found that "the probability of coming to an early halt at an incorrect level is higher than generally believed", and if the trial stops after fewer than 15 patients then results should be interpreted with caution. Exact statistical properties of the 3+3 design were investigated, among others, by Kang and Ahn [39,40] and Lin and Shih [41]. In particular, Lin and Shih [41] found that the target toxicity level (TTL), defined as the probability of a DLT at the MTD, is not 33% for a 3+3 design, contrary to popular belief. In fact, the design does not have a fixed TTL and it is advisable that several possible dose-toxicity scenarios be explored to determine the corresponding TTLs before implementing a clinical trial with the 3+3 design. Ivanova [42] showed theoretically that the 3+3 design will, on average, select a dose with toxicity rate between 0.16 and 0.27. Overall, despite its popularity, the 3+3 design is an ad hoc method with poor statistical properties. Besides identifying the MTD imprecisely and unreliably, the design escalates doses slowly, and many patients in the trial may be treated at suboptimal dose levels [10,43].
Lin and Shih [41] proposed generalizations of the 3+3 design, the A+B designs which can enroll cohorts of size other than 3 and can be cast with or without dose de-escalation. Ivanova [42] showed how to calibrate design parameters in an A+B design to achieve selection of the dose with the toxicity rate close to the target Γ. The A+B designs were further studied in [44, 45].

Accelerated titration designs
Simon et al. [46] proposed accelerated titration designs (ATDs) to improve the 3+3 design. An ATD consists of three important components: a rapid acceleration phase, an intra-patient dose escalation phase and a model-based analysis of the dose-toxicity relationship. A rapid acceleration phase is achieved by treating one patient per dose level and using single-dose or double-dose escalation steps until the first instance of a DLT or the second instance of grade 2 toxicity, at which step the design switches to the traditional 3+3 scheme (two more patients are treated at the dose level that has triggered the switch and 3 to 6 patients are treated in subsequent cohorts). The intra-patient dose escalation part allows dose titration during subsequent treatment cycles in those patients who remain in the study and have no evidence of toxicity during the current cycle. A nonlinear mixed-effects model is used for data analysis at the end of the trial.
Simon et al. [46] evaluated the performance of three different ATDs through simulation, using parameter estimates elicited from 20 real trials. They found that ATDs, on average, can substantially reduce the number of undertreated patients with very little increase in the number of patients experiencing DLTs. Also, ATDs require substantially fewer patients compared to the traditional design to achieve similar information about MTD, and in some cases ATDs can reduce the trial duration. Since ATDs use a dose-toxicity model, they provide greater information about cumulative toxicity, inter-patient variability and steepness of the dose-toxicity curve. An MS Excel macro for managing dose assignments and an S-Plus program to facilitate data analysis following ATDs are available at the Biometric Research Branch of National Cancer Institute (see Table 2). ATDs can be advantageous over the traditional 3+3 design, but these designs have found little use in practice, likely due to conservativeness of investigators. One successful application of an ATD is reported in [47].

Bayesian algorithm-based designs
Ji et al. [48] proposed an algorithm-based dose finding method based on toxicity posterior intervals (TPI). A Beta-binomial model for toxicity probabilities at each dose level is assumed. Given a target toxicity rate and toxicity outcomes at any dose, one calculates posterior probabilities of three non-overlapping intervals that partition the (0, 1) interval. These intervals correspond to decisions of dose escalation, staying at the same dose, or dose de-escalation. The interval with the highest posterior probability triggers the decision for the next cohort of patients. The algorithm includes a stopping rule which may facilitate early termination of a trial if dose 1 is excessively toxic, and an exclusion rule which restricts escalation to a dose that is likely to be highly toxic. At the end of the trial, the MTD is selected as the dose for which the isotonic-transformed posterior probability of toxicity is closest to the target toxicity probability. The merits of the TPI method are three-fold: simplicity of implementation, transparency to clinical investigators, and good statistical properties (the method can target any user-defined toxicity probability and performs comparably to model-based methods such as the CRM [10]). A decision theoretic justification for the TPI procedure is given in [49]. Ji et al. [50] proposed a modified version of the TPI procedure (mTPI), based on the unit probability mass statistic. The mTPI inherits many attractive features of the TPI procedure, but it is simpler as it only requires specification of an equivalence interval where any dose with toxicity probability within the interval can be considered as the MTD. The authors provide an MS Excel macro for implementing both TPI and mTPI methods in real time, and R code for performing simulations (see Table 2).
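To make the interval-based decision concrete, the following minimal Python sketch computes an mTPI-style decision at a single dose. It rests on several assumptions that are not taken from the text: a Beta(1, 1) prior at each dose, an equivalence interval $[\Gamma - \epsilon_1, \Gamma + \epsilon_2]$ with the hypothetical defaults $\epsilon_1 = \epsilon_2 = 0.05$, and the unit probability mass defined as the posterior probability of an interval divided by its length. The stopping and exclusion rules are omitted, and the function names are illustrative only.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b,
    using the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, i) * x**i * (1 - x) ** (n - i) for i in range(a, n + 1))


def mtpi_decision(n_tox, n_pat, target, eps1=0.05, eps2=0.05):
    """Return 'E' (escalate), 'S' (stay) or 'D' (de-escalate)."""
    # Posterior under the assumed Beta(1, 1) prior
    a, b = 1 + n_tox, 1 + n_pat - n_tox
    lo, hi = target - eps1, target + eps2
    f_lo, f_hi = beta_cdf_int(lo, a, b), beta_cdf_int(hi, a, b)
    # Unit probability mass = posterior probability / interval length
    upm = {
        "E": f_lo / lo,                   # under-dosing interval (0, lo)
        "S": (f_hi - f_lo) / (hi - lo),   # equivalence interval [lo, hi]
        "D": (1 - f_hi) / (1 - hi),       # over-dosing interval (hi, 1)
    }
    return max(upm, key=upm.get)


# Decisions after a cohort of three patients with target 0.30
for tox in range(4):
    print(tox, "toxicities out of 3 ->", mtpi_decision(tox, 3, 0.30))
```

Because the decision depends only on the number of toxicities and the number of patients at the current dose, all decisions can be tabulated before the trial starts, which is what makes the method transparent to clinical investigators.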
A recent simulation study [51] shows that the mTPI design generally outperforms the traditional 3+3 design both in terms of mean number of patients treated above MTD and the percentage of correct MTD selection.

Other algorithm-based designs
Pharmacologically guided designs [52,53] were proposed to improve the 3+3 design by utilizing exposure-toxicity data from animal studies to direct initial dose escalation in a phase I trial until the first DLT, after which the 3+3 algorithm is applied. Although scientifically sound, these designs have been rarely used in practice [54]. Cheung [55] developed a sequential procedure for identifying MTD based on stepwise tests which control family-wise error rates without assuming monotonic dose-toxicity relationship. Jiang et al. [56] proposed algorithms to construct optimal and approximately optimal designs based on minimization of a penalty function associated with selection of the MTD. Their methodology includes the 3+3 design as a special case, but it is more flexible as it can target doses with pre-specified toxicity levels. Blanchard and Longmate [57] proposed "toxicity equivalence range design", a frequentist version of the mTPI design of Ji et al. [50].

Up-and-down designs
Up-and-down designs (sometimes referred to as "random walk rules") are nonparametric procedures that can be used to target quantiles of interest. They should be distinguished from Storer's [26] "up-and-down" procedures which are modifications of the 3+3 design.

Biased coin designs
In the context of phase I clinical trials, the most famous up-and-down design is the "biased coin design" (BCD) developed by Durham and Flournoy [5,6] and Durham et al. [58]. The BCD is a randomized extension of Dixon and Mood's [59] up-and-down design.
Let $\Gamma$ ($0 < \Gamma \le 0.5$) denote the pre-specified target toxicity level (similar results are available for $0.5 \le \Gamma < 1$). Define $b = \Gamma/(1 - \Gamma)$, the biased coin probability. For the $j$th patient ($j = 1, \ldots, n$), let $Y_j$ denote a binary indicator of toxicity ($Y_j = 1$ if toxicity; $Y_j = 0$ if no toxicity) and $X_j \in \Omega_d$ denote the dose level assignment. The first patient is treated at the dose that is thought to be closest to the target (alternatively, the experiment can start at $d_1$ and doses can be escalated one at a time until the first toxicity is observed, at which instant the main algorithm starts). Suppose the $j$th patient has been assigned to dose $d_k$ for some $k = 1, \ldots, K$. The $(j + 1)$th patient's dose assignment is determined as follows.
(i) Suppose $X_j = d_k$ for some $k = 2, \ldots, K - 1$. If $Y_j = 1$ then assign patient $(j + 1)$ to dose $d_{k-1}$. If $Y_j = 0$ then randomize patient $(j + 1)$ with probability $b$ to dose $d_{k+1}$ and with probability $1 - b$ to dose $d_k$.

At the boundary doses $d_1$ and $d_K$ the rule is adjusted so that assignments stay within $\Omega_d$. The experiment continues until the pre-specified number of patients have been treated in the trial.

The BCD generates a random walk on the lattice of doses. The exact and asymptotic properties of the design have been established using the theory of finite Markov chains [5,6,58,60]. These properties as well as an application of the BCD are presented in [61]. The asymptotic distribution of allocation proportions at different dose levels is unimodal around the target quantile $\gamma$. Bortot and Giovagnoli [62] showed that the BCD possesses certain optimal asymptotic properties. Specifically, the empirical mode estimator of the target quantile has the smallest variance and the BCD has the fastest speed of convergence to the stationary distribution.

Estimation of the target quantile following the BCD was explored in [61,62,63,64,65,66]. While maximum likelihood estimators (MLEs) using logistic models are consistent and asymptotically normal [63], for small and moderate samples such estimators may not exist, may be biased [64] or may be highly variable [61]. The empirical mode is strongly consistent for the mode of the stationary treatment distribution [62,65]; yet it may be unsatisfactory for small and moderate samples. Stylianou and Flournoy [66] investigated five different estimators of the target quantile and found that the isotonic regression estimator has the best small-sample performance. The toxicity probabilities $\{\psi(d_k)\}_{k=1}^{K}$ can also be estimated by isotonic regression [67]. Stylianou [68] showed how to derive exact distributions of various statistics of interest by enumerating all possible outcomes in a BCD with a pre-determined sample size.
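The allocation rule of the BCD can be sketched as a random walk on dose indices. The sketch below is a minimal illustration and assumes $0 < \Gamma \le 0.5$; the treatment of the boundary doses (stay at the lowest dose after a toxicity, stay at the highest dose after a non-toxicity) is an assumption made here for completeness. The names `bcd_next_dose` and `simulate_bcd` and the dose-toxicity scenario are hypothetical.

```python
import random


def bcd_next_dose(k, toxicity, n_doses, target, rng):
    """Next dose index (0-based) under the biased coin rule."""
    b = target / (1 - target)  # biased coin probability, requires target <= 0.5
    if toxicity:
        return max(k - 1, 0)   # de-escalate; stay at the lowest dose (assumption)
    if k < n_doses - 1 and rng.random() < b:
        return k + 1           # escalate with probability b
    return k                   # otherwise stay at the current dose


def simulate_bcd(tox_probs, target, n_patients, start=0, seed=1):
    """Simulate one fully sequential trial; return patients treated per dose."""
    rng = random.Random(seed)
    counts = [0] * len(tox_probs)
    k = start
    for _ in range(n_patients):
        counts[k] += 1
        y = rng.random() < tox_probs[k]  # simulated toxicity outcome
        k = bcd_next_dose(k, y, len(tox_probs), target, rng)
    return counts


# Hypothetical scenario with target toxicity level 0.25
print(simulate_bcd([0.05, 0.12, 0.25, 0.40, 0.55], target=0.25, n_patients=30))
```

With a larger number of simulated patients the allocation counts tend to concentrate around the dose whose toxicity probability is closest to the target, in line with the unimodality result quoted above.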
The original BCD as described by Durham and Flournoy [5] is a first order Markovian design: its dose escalation rule depends only on the toxicity outcome of the current patient. By utilizing more information than just the most recent response one can enhance the design performance. For this purpose a number of proposals have been made.

Improved up-and-down designs
Ivanova et al. [69] studied several designs including the "k-in-a-row rule" [70,71], the "moving average up-and-down rule", and the "Narayana rule" to target toxicity probabilities of the form $\Gamma = 1 - (0.5)^{1/k}$, where $k = 1, 2, \ldots$ is a pre-specified number. A distinctive feature of these designs is that they look at the $k$ most recent responses at the current dose level to determine the dose assignment for the next patient. Simulations show that these designs improve the BCD by assigning more patients in a neighborhood of the target dose and achieving more precise estimation of the MTD. An isotonic regression estimator is recommended for data analysis [69].

Group up-and-down designs
In many trials it is desired to treat patients in cohorts so that patients in the same cohort receive the same dose, but the doses may differ across the cohorts. Gezmu and Flournoy [72] proposed "group up-and-down designs" (GUDs) which make dose modifications based on the number of toxicities from the most recent cohort of patients. These designs are non-randomized. They use additional parameters (cohort size and cutoff points for the number of toxicities in a cohort) that can be tuned to target certain toxicity probabilities of interest. In particular, for cohorts of size $k$, the designs can target toxicity probabilities of the form $\Gamma = 1 - (0.5)^{1/k}$. Baldi Antognini et al. [73] proposed randomized GUDs which can target any given $\Gamma \in (0, 1)$.
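The link between the cutoff points of a GUD and its target toxicity probability can be illustrated numerically. The sketch below assumes the usual parametrization with cohort size $s$ and cutoffs $0 \le c_L < c_U \le s$ (escalate if at most $c_L$ toxicities are observed in the cohort, de-escalate if at least $c_U$, otherwise stay), and takes the target to be the toxicity probability at which escalation and de-escalation are equally likely. These names and this characterization of the target are assumptions for illustration, not a statement of the authors' derivation.

```python
from math import comb


def binom_cdf(c, s, p):
    """P(Binomial(s, p) <= c)."""
    return sum(comb(s, i) * p**i * (1 - p) ** (s - i) for i in range(c + 1))


def gud_next_dose(k, n_tox, n_doses, c_low, c_up):
    """Next dose index given the number of toxicities in the latest cohort."""
    if n_tox <= c_low:
        return min(k + 1, n_doses - 1)
    if n_tox >= c_up:
        return max(k - 1, 0)
    return k


def gud_target(s, c_low, c_up, tol=1e-10):
    """Solve P(escalate) = P(de-escalate) for the toxicity probability
    by bisection; the difference is decreasing in p."""
    def balance(p):
        p_up = binom_cdf(c_low, s, p)
        p_down = 1 - binom_cdf(c_up - 1, s, p)
        return p_up - p_down

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if balance(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Cohorts of size 3, escalate on 0 toxicities, de-escalate on 1 or more
print(round(gud_target(3, 0, 1), 4), round(1 - 0.5 ** (1 / 3), 4))
```

For cutoffs $c_L = 0$ and $c_U = 1$ the balance equation reduces to $(1 - \Gamma)^s = 0.5$, which reproduces the closed form $\Gamma = 1 - (0.5)^{1/s}$ quoted in the text.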

Cumulative cohort designs
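Ivanova et al. [7] proposed the "cumulative cohort design" (CCD), which utilizes cumulative toxicity data from patients at a given dose in the dose-allocation algorithm. Let $d_k$ be the currently administered dose (for the $j$th cohort of patients), $\hat{\psi}(d_k)$ be an estimate of $\psi(d_k)$ based on data from patients that have been assigned to $d_k$ thus far, and $\Delta > 0$ be a pre-determined constant. The dose assignment $X_{j+1}$ for the next, $(j + 1)$th cohort of patients is determined as follows.
(i) If $\hat{\psi}(d_k) \le \Gamma - \Delta$, then $X_{j+1} = d_{k+1}$.
(ii) If $\hat{\psi}(d_k) \ge \Gamma + \Delta$, then $X_{j+1} = d_{k-1}$.
(iii) If $\Gamma - \Delta < \hat{\psi}(d_k) < \Gamma + \Delta$, then $X_{j+1} = d_k$.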
How to judiciously choose the design parameter ∆ (e.g. to maximize the total number of patients assigned to the MTD over a set of dose-toxicity scenarios) is also discussed in [7]. Oron et al. [74] proved almost sure convergence of the CCD to the target dose under widely satisfied conditions on ψ(d). Simulations show favorable properties of the CCD [7,75,76]. In particular, Liu et al. [76] reported a comprehensive simulation study comparing six different up-and-down designs under various experimental settings. Their findings are summarized as follows: "The results show that the CCD has the best overall performance in terms of selecting the MTD, assigning patients to the MTD and patient safety. Its performance is generally close to the upper bound of nonparametric designs, but improvement seems possible in some cases."
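A minimal Python sketch of a CCD-type decision is given below. It assumes that the estimate of the toxicity probability at the current dose is the observed proportion of toxicities among all patients treated at that dose so far, that the dose is raised when this proportion is at most $\Gamma - \Delta$, lowered when it is at least $\Gamma + \Delta$, and repeated otherwise, and that the assignment stays at the boundary dose when a move outside the dose set is indicated. The function name and the numerical values are hypothetical.

```python
def ccd_next_dose(k, n_tox, n_pat, n_doses, target, delta):
    """Next dose index (0-based) given cumulative data at the current dose k.

    n_tox, n_pat: cumulative toxicities and patients at dose k.
    """
    rate = n_tox / n_pat  # cumulative observed toxicity rate at dose k
    if rate <= target - delta:
        return min(k + 1, n_doses - 1)  # escalate (stay at top dose: assumption)
    if rate >= target + delta:
        return max(k - 1, 0)            # de-escalate (stay at lowest: assumption)
    return k                            # repeat the current dose


# Target 0.30 with delta = 0.10, five doses, currently at the second dose
print(ccd_next_dose(1, 0, 3, 5, 0.30, 0.10))  # 0/3 toxicities
print(ccd_next_dose(1, 2, 6, 5, 0.30, 0.10))  # 2/6 toxicities
print(ccd_next_dose(1, 2, 3, 5, 0.30, 0.10))  # 2/3 toxicities
```

Unlike a memoryless rule, the decision uses all patients treated at the current dose, so a dose that has been visited several times is judged on an increasingly stable estimate.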

Up-and-down designs with delayed responses
In trials with long evaluation periods some patients may enroll into the study before toxicity outcomes from previous patients are available. Stylianou and Follmann [77] proposed an "accelerated BCD" for which the dose assignment for the next patient is based on the last completely evaluated patient. Their design can reduce the total study duration (20%-70% less time depending on sample size and accrual rate) while having similar statistical properties to the BCD of Durham and Flournoy [5]. Jia and Braun [78] proposed an "adaptive accelerated BCD" which accounts for the patients' follow-up times in the study and facilitates dose assignments without delay. Ivanova et al. [7] proposed a version of the cumulative cohort design with delayed toxicity outcomes.

More complex trials
Generalizations of up-and-down designs to trials with ordinal toxicity grades are available [79,80]. Applications of up-and-down designs in trials with toxicity and efficacy considerations are discussed in [81,82]. A review of merits and limitations of up-and-down designs can be found in [83]. Most recently, Oron and Hoff [8] found that up-and-down designs can outperform more complex model-based "long memory" designs in small sample studies (10-40 patients) which are typical in phase I cancer trials.

Isotonic designs
Since phase I trials are small, estimation of toxicity probabilities may be challenging. Parametric methods such as maximum likelihood estimation may be unsatisfactory, especially if the model is misspecified. A popular nonparametric approach to estimating a monotone dose-toxicity curve is isotonic regression. Recently, several nonparametric designs for MTD finding have been proposed that use current isotonic regression estimates of toxicity probabilities to determine the next dose assignment [84,85,86,87]. Ivanova and Flournoy [75] performed a simulation study comparing several such designs and found that designs that select the next dose with the estimated toxicity rate closest to the target (e.g. designs of Leung and Wang [84], Yuan and Chappell [85] and Conaway et al. [86]) perform less well than the cumulative cohort design [7], both in terms of correct identification of the MTD and in terms of assigning more patients to doses near the MTD. Cheung [22] notes that isotonic designs may be undesirably rigid.
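Isotonic regression of observed toxicity rates can be computed with the pool-adjacent-violators algorithm. The sketch below is a generic weighted implementation, with the number of patients at each dose used as the weight; it is offered as an illustration of the estimator rather than as the procedure of any particular design cited above. The function name `pava` and the data are hypothetical, and only doses with at least one treated patient should be passed in.

```python
def pava(values, weights):
    """Weighted non-decreasing isotonic regression
    via the pool-adjacent-violators algorithm."""
    blocks = []  # each block holds [mean, total weight, number of doses pooled]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool adjacent blocks while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical data: toxicities and patients at three doses
tox = [0, 2, 1]
n = [3, 3, 3]
raw = [t / m for t, m in zip(tox, n)]   # observed rates 0, 2/3, 1/3
iso = pava(raw, n)                      # second and third doses are pooled

# Estimated MTD: dose with isotonic estimate closest to the target
target = 0.30
mtd = min(range(len(iso)), key=lambda k: abs(iso[k] - target))
print(iso, mtd)
```

In this example the observed rates at the second and third doses violate monotonicity and are replaced by their weighted average, 3/6 = 0.5.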

Parametric model-based designs to determine MTD
Model-based designs for phase I trials were proposed as alternatives to algorithm-based designs. A distinctive feature of these methods is the postulation of a parametric statistical model for the dose-toxicity relationship. The dose-toxicity curve is sequentially updated based on accumulating data from patients in the trial, and the dose assignment for an incoming patient is determined on the basis of the current model-based estimates of toxicity probabilities. Since model-based approaches utilize the entire history from subjects in the trial, they are expected to be more efficient than algorithm-based approaches. On the other hand, these designs may be subject to modeling bias and careful assessment of robustness properties of the methods is essential. Also, these designs may be more difficult to implement and require more upfront planning than conventional designs. Since phase I studies are small, a model to facilitate design adaptations should be parsimonious, typically with one or two parameters. Sophisticated models should be avoided at the design stage because parameter estimates may be numerically unstable. In this section we will discuss several popular model-based approaches for phase I trials: the Continual Reassessment Method (CRM), the Escalation with Overdose Control (EWOC), and Bayesian decision-theoretic approaches.

Continual reassessment method
The Continual Reassessment Method (CRM) is the first model-based design in the clinical trial literature, proposed in 1990 by O'Quigley et al. [10]. Many modifications of the original CRM have been developed since then. A comprehensive exposition of the CRM is given in the book by Cheung [22]. Other excellent reviews of the CRM are available [88,89].
The defining feature of the original CRM is a one-parameter working model for the dose-toxicity probability curve:

$$\Pr(Y = 1 \mid d) = \psi(d, a),$$

where $a > 0$ is the parameter. It is assumed that $\psi(d, a)$ is monotonically increasing in $d$ for every value of $a$ and $\psi(d, a)$ is monotone in $a$ for every $d$. Given any $\Gamma \in (0, 1)$ it is assumed that for each $d$ there exists a unique parameter $\tilde{a}$ such that $\psi(d, \tilde{a}) = \Gamma$. In other words, the chosen one-parameter model should be rich enough to uniquely determine the target toxicity probability at each dose level. As noted by O'Quigley and Conaway [89], while a one-parameter model may not be best for describing the entire dose-toxicity probability curve, it is more important that the model provides adequate local fit near the MTD. In practice, it is helpful to have a graphical review of dose-toxicity curves to assess plausibility of the selected family of models; this should be done collaboratively by clinician and statistician [88]. Three one-parameter models are commonly proposed [10,22]: the power model $\psi(d, a) = d^a$ with $0 < d < 1$, the hyperbolic tangent model $\psi(d, a) = \{(\tanh d + 1)/2\}^a$, and the one-parameter logistic model $\psi(d, a) = \exp(c + ad)/\{1 + \exp(c + ad)\}$ with a fixed intercept $c$.

Suppose a one-parameter model is chosen and let $a_0$ be an unknown value of the parameter that defines the "true" model $\psi(x, a_0)$. Let $0 < \alpha_1 < \cdots < \alpha_K < 1$ be pre-specified a priori toxicity probabilities at the $K$ dose levels (the "skeleton" of the CRM) and let $\hat{a}_0$ be the prior estimate of $a_0$. To achieve consistency between the prior assumptions and the postulated model, the dose labels $d_1, \ldots, d_K$ for implementing CRM can be determined by solving the equations $\alpha_i = \psi(d_i, \hat{a}_0)$, $i = 1, \ldots, K$. Importantly, this preserves the ordering of the doses, i.e. $d_1 < \cdots < d_K$.
Given initial uncertainty about the parameter $a$, one may use a Bayesian approach to the trial design. Three distinct approaches to establishing the prior distribution, namely the use of an analytically tractable prior (e.g. a standard exponential density), the use of a pseudo-data prior, and the use of an empirical prior, are discussed in O'Quigley and Conaway [89]. The original CRM is fully sequential. The first patient is treated at the dose which a priori is thought to be closest to the MTD. Any subsequent patient is treated at a dose with estimated toxicity probability closest to the target toxicity level. Let $g(a)$ denote the prior density and let $F_j = \{(x_1, y_1), \ldots, (x_j, y_j)\}$ denote the history from the first $j$ patients in the trial, where $x_m \in \{d_1, \ldots, d_K\}$ is the dose assignment and $y_m$ is the toxicity outcome of the $m$th patient ($m = 1, \ldots, j$). Using Bayes formula, the posterior density for $a$ is

$$g(a \mid F_j) = \frac{L_j(a)\, g(a)}{\int_0^\infty L_j(u)\, g(u)\, du},$$

where $L_j(a) = \prod_{m=1}^{j} \{\psi(x_m, a)\}^{y_m} \{1 - \psi(x_m, a)\}^{1 - y_m}$ is the likelihood and $\int_0^\infty L_j(u)\, g(u)\, du$ is the normalizing constant. The posterior mean toxicity probability at $d_i$ is estimated as

$$\bar{p}_i = \int_0^\infty \psi(d_i, a)\, g(a \mid F_j)\, da, \quad i = 1, \ldots, K.$$

Then the dose assignment for the $(j + 1)$th patient is determined as

$$X_{j+1} = \arg\min_{d_i \in \Omega_d} |\bar{p}_i - \Gamma|. \tag{3.3}$$

The dose assignment algorithm (3.3) is applied sequentially until a pre-specified number of patients, $n$, have been treated in the trial. The final recommendation of the MTD is based on data from $n$ patients.
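The Bayesian updating step of the CRM can be sketched numerically. The following minimal Python example assumes the power working model with dose labels equal to the skeleton values (that is, a prior estimate of the parameter equal to 1), a standard exponential prior, and a simple midpoint rule on a truncated grid for the integrals. The function name `crm_next_dose`, the skeleton, the grid settings and the data are all hypothetical; a practical implementation would use dedicated software and add safety constraints.

```python
from math import exp


def crm_next_dose(skeleton, doses_given, outcomes, target,
                  grid_size=2000, a_max=10.0):
    """One-parameter Bayesian CRM with the assumed power model
    psi(d_i, a) = skeleton[i] ** a and an assumed Exp(1) prior on a.

    doses_given: 0-based dose indices of treated patients.
    outcomes: matching toxicity indicators (1 = DLT, 0 = no DLT).
    Returns (recommended dose index, posterior mean toxicity probabilities).
    """
    h = a_max / grid_size
    grid = [(i + 0.5) * h for i in range(grid_size)]  # midpoint rule nodes

    def likelihood(a):
        value = 1.0
        for i, y in zip(doses_given, outcomes):
            p = skeleton[i] ** a
            value *= p if y else (1 - p)
        return value

    # Unnormalized posterior density on the grid: likelihood times prior
    weights = [likelihood(a) * exp(-a) for a in grid]
    norm = sum(weights)
    post_means = [
        sum(w * alpha**a for w, a in zip(weights, grid)) / norm
        for alpha in skeleton
    ]
    best = min(range(len(skeleton)), key=lambda i: abs(post_means[i] - target))
    return best, post_means


skeleton = [0.05, 0.12, 0.25, 0.40, 0.55]  # hypothetical skeleton
# Three patients treated at the third dose, one DLT observed
dose, means = crm_next_dose(skeleton, [2, 2, 2], [0, 0, 1], target=0.25)
print(dose, [round(m, 3) for m in means])
```

A toxicity shifts posterior mass toward small values of the parameter and therefore raises all estimated toxicity probabilities, while a non-toxicity lowers them; this is how information is borrowed across dose levels.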
Instead of the posterior mean toxicity probability, one can use the plug-in estimates $\hat{p}_i = \psi(d_i, \tilde{a}_j)$, where $\tilde{a}_j$ is the posterior mean of $a$, or $\hat{p}_i = \psi(d_i, \hat{a}_j)$, where $\hat{a}_j = \arg\max_a L_j(a)$ is the maximum likelihood estimator. The latter approach (proposed by O'Quigley and Shen [90]) is referred to as the likelihood CRM (CRML). The maximum likelihood estimator of toxicity probability $\hat{p}_i = \psi(d_i, \hat{a}_n)$ is asymptotically fully efficient, and the confidence intervals based on $\hat{p}_i$ are quite accurate even for samples as small as $n = 12$ or $n = 16$ [91]. Theoretical large sample properties of the CRM have been established [92,93,94]. Under widely satisfied conditions, the maximum likelihood estimator is consistent and asymptotically normal and the recommended dose level converges to the target level even if the dose-toxicity model is misspecified [92,94]. Cheung [20] introduced the coherence principle for phase I trials, which posits that dose escalation is appropriate only when the most recent patient has no toxicity and dose de-escalation is coherent only when a toxic outcome has just been seen. The one-stage Bayesian CRM is coherent whereas the two-stage CRM is in general not coherent in escalation [22]. Cheung [95] developed a sample size formula for a clinical trial with the Bayesian CRM. O'Quigley et al. [96] proposed a non-parametric optimal design (assuming that the complete dose-toxicity profile is available) as the benchmark for comparing the small-sample efficiency of various designs in terms of the percentage of correct dose selection. The CRM is shown to be highly efficient, with limited potential for improvement, compared to the optimal design [96,97,98].
Small-sample properties of the CRM were investigated via simulations which also included comparisons of the CRM with other phase I dose finding designs [32,33,34,35,99,100,101]. Some authors emphasize the importance of including measures of variability in simulations to facilitate comparisons among the designs [2,8]. The CRM has been shown to yield a high percentage of correct selection of the MTD under a variety of dose-toxicity models and sample sizes applicable in phase I trials. However, Oron and Hoff [8] recently showed that the CRM, while performing well on average, has a highly variable distribution of the number of cohorts treated at the MTD, which may be unsatisfactory in small trials.
Some early criticisms of the CRM were attributed to the fully sequential nature of the method and the fact that escalation may occur too fast [32]. Many practical improvements of the method have been proposed to overcome these criticisms and make the method attractive to investigators. Some important modifications and extensions of the CRM are outlined as follows.

More cautious escalation schemes
The original CRM starts at the dose which a priori is thought to be closest to the MTD. More cautious strategies include starting the design at the lowest dose [32]; not allowing escalation if the most current outcome is DLT [102]; escalating one dose level at a time [103,104]; and treating more than one patient at the higher dose levels [32,103,105].

Two-stage CRM
To facilitate better learning at the beginning of the trial, two-stage designs have been proposed. At the first ("start-up") stage, patients are treated in cohorts at escalating doses until the first instance of toxicity is seen. This is done to collect initial data and gain prior knowledge about the drug characteristics. The second (main) stage involves implementation of the CRM or its modification [90,104,106,107,108].

Model calibration in the CRM
The performance of the CRM may depend on the specification of a prior distribution and the working model. Model calibration strategies for the CRM are presented in [109,110,111,112,113,114,115,116].

Curve-free Bayesian designs
Gasparini and Eisele [117] proposed a class of curve-free designs by placing a multivariate prior distribution on the vector of toxicity probabilities at the K doses. O'Quigley [118] showed equivalence of this class of curve-free designs to a class of CRM designs. Cheung [119] noted that curve-free designs with vague priors may be undesirably rigid and suggested using slightly more informative priors. Whitehead et al. [120] proposed a Bayesian dose-finding approach assuming only monotonicity of the dose-toxicity curve.

Robustified CRM
Yuan and Yin [121] proposed a hybrid design which uses the traditional 3+3 scheme if observed data are informative about the toxicity rate at a given dose, and it switches to Bayesian CRM if there is not enough information to make such a definitive decision. Su [122] proposed an approach combining Bayesian and likelihood-based CRM designs. Yuan and Yin [123] proposed a "robust EM CRM" which addresses the issues of choosing prior toxicity probabilities and handling delayed toxicity outcomes.

Stopping rules
The CRM uses a fixed and pre-determined sample size. An investigator may want to stop the trial before the target sample size is reached for safety issues and/or budgetary reasons. The proposals for early stopping of the CRM trial design include using the width of the posterior 95% confidence interval for the MTD as a stopping criterion [43], using Bayesian stopping rules for early detection of mis-choice of the dose range [124,125], and terminating the trial at the point when one can predict with high probability the final recommendation for the MTD [126,127].

Time-to-event CRM (TITE-CRM)
The CRM requires model updates after each patient or group of patients. However, in many trials toxicity outcomes are not observed immediately after treatment. Cheung and Chappell [128] proposed an extension of CRM which accounts for patient staggered entry and delayed toxicity outcomes (TITE-CRM). Their method can shorten the trial duration while maintaining important statistical properties of the original CRM. Some further enhancements of TITE-CRM have been proposed recently [129,130,131,132]. Applications of TITE-CRM are presented in Muler et al. [133] and Normolle and Lawrence [134].

More complex trials
O'Quigley and Conaway [135] discuss extensions of the CRM to dose-finding trials with special considerations. These include studies where two subpopulations of patients have possibly different susceptibility to toxicity and the objective is to find the MTD within each subpopulation [136,137], dose finding trials using ordinal toxicity grades [138,139,140,141], dose finding trials of combination drugs [142,143,144] and trials with both toxicity and efficacy outcomes [145,146,147]. Huang and Chappell [148] proposed an extension of CRM by using multiple dose levels in the cohorts, to achieve faster and more efficient design.
Some recent papers report applications of the CRM in pediatric phase I cancer trials [149], phase I cancer trials in Japanese patients [150,151], and a trial for a rare disease with constraints on the maximal sample size [152]. Software for CRM implementation is available [153,154,155]. The R packages 'dfcrm' and 'bcrm' are available for download from the Comprehensive R Archive Network (Table 2).

Escalation with overdose control (EWOC)
The EWOC is a Bayesian adaptive design proposed by Babb et al. [11]. Like the CRM, the method treats each patient at the dose estimated to be closest to the MTD, but it places heavier penalties on overdosing than on underdosing. The method starts with a two-parameter dose-toxicity model

$$p(x) = \psi(\alpha + \beta x), \tag{3.4}$$

where $\alpha$ and $\beta$ are unknown parameters and $\psi(x) = 1/(1 + e^{-x})$ is the standard logistic distribution function. It is assumed that $\beta > 0$ so that $\psi(\alpha + \beta x)$ is strictly increasing in dose. The model (3.4) is also re-parameterized in terms of more clinically interpretable measures such as $\rho$, the toxicity probability at $d_1$, and $\gamma$ ($d_1 \le \gamma \le d_K$), the unknown MTD that corresponds to the pre-specified target toxicity level $\Gamma \in (0, 1)$. Since $\rho = \psi(\alpha + \beta d_1)$ and $\Gamma = \psi(\alpha + \beta\gamma)$, we have

$$\alpha = \frac{\gamma\, \mathrm{logit}(\rho) - d_1\, \mathrm{logit}(\Gamma)}{\gamma - d_1}, \qquad \beta = \frac{\mathrm{logit}(\Gamma) - \mathrm{logit}(\rho)}{\gamma - d_1},$$

where $\mathrm{logit}(x) = \log\{x/(1 - x)\}$, and we can write

$$p(x) = \psi\left(\frac{(\gamma - x)\, \mathrm{logit}(\rho) + (x - d_1)\, \mathrm{logit}(\Gamma)}{\gamma - d_1}\right).$$

Let $g(\rho, \gamma)$ denote the joint prior density for $(\rho, \gamma)$ with support on $[0, \rho_0] \times [d_1, d_K]$, which can be elicited from physicians. Given the history of dose assignments and toxicity outcomes from $j$ patients in the trial, $F_j = \{(x_1, y_1), \ldots, (x_j, y_j)\}$, the joint posterior density for $(\rho, \gamma)$ can be obtained as $g(\rho, \gamma \mid F_j) = c^{-1} L_j(\rho, \gamma)\, g(\rho, \gamma)$, where $L_j(\rho, \gamma)$ is the likelihood and $c = \int_{d_1}^{d_K} \int_0^{\rho_0} L_j(\rho, \gamma)\, g(\rho, \gamma)\, d\rho\, d\gamma$ is the normalizing constant. The parameter of interest is $\gamma$, whose marginal posterior density is $g(\gamma \mid F_j) = \int_0^{\rho_0} g(\rho, \gamma \mid F_j)\, d\rho$. The distinctive feature of the EWOC method is the dose allocation rule. Assuming $g(\gamma \mid F_j)$ is available, the $(j + 1)$th patient's dose assignment $X_{j+1}$ is chosen such that

$$\Pr(\gamma < X_{j+1} \mid F_j) = \epsilon \tag{3.5}$$

for some small pre-specified feasibility bound $\epsilon > 0$. In other words, the dose is selected such that the posterior probability of overdosing is low. Note that the described procedure considers doses on a continuous scale. To modify the procedure for a discrete dose space, one can cast the allocation rule as

$$x_{j+1} = \max\{d_i \in \{d_1, \ldots, d_K\} : d_i - X_{j+1} \le T_1 \ \text{and} \ \Pr(\gamma < d_i \mid F_j) - \epsilon \le T_2\},$$

where $X_{j+1}$ is determined from (3.5) and $T_1$ and $T_2$ are suitably chosen nonnegative numbers referred to as "tolerances" [11].
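The continuous-scale EWOC allocation rule (3.5) can be sketched with a brute-force grid approximation of the posterior. Everything beyond the model itself is an assumption made for illustration: a uniform prior on $(\rho, \gamma)$, an upper bound for $\rho$ equal to the target $\Gamma$, the grid sizes, and the function name `ewoc_next_dose`.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

def ewoc_next_dose(history, d_min, d_max, target, feasibility,
                   n_rho=40, n_gamma=200):
    """Approximate the dose X with Pr(gamma < X | data) = feasibility.

    history is a list of (dose, y) pairs with d_min <= dose <= d_max.
    Assumes a uniform prior for (rho, gamma) on (0, target) x (d_min, d_max).
    """
    rho_grid = [(k + 0.5) * target / n_rho for k in range(n_rho)]
    width = (d_max - d_min) / n_gamma
    gamma_grid = [d_min + (k + 0.5) * width for k in range(n_gamma)]
    marginal = []  # unnormalized marginal posterior of gamma on the grid
    for gamma in gamma_grid:
        total = 0.0
        for rho in rho_grid:
            beta = (logit(target) - logit(rho)) / (gamma - d_min)
            alpha = logit(rho) - beta * d_min
            like = 1.0
            for x, y in history:
                p = expit(alpha + beta * x)
                like *= p if y == 1 else 1.0 - p
            total += like  # uniform prior: posterior proportional to likelihood
        marginal.append(total)
    norm = sum(marginal)
    cumulative = 0.0
    for gamma, mass in zip(gamma_grid, marginal):
        cumulative += mass / norm
        if cumulative >= feasibility:
            return gamma  # grid approximation of the posterior quantile
    return gamma_grid[-1]

dose = ewoc_next_dose([(0.3, 0), (0.4, 0), (0.5, 1)], 0.0, 1.0,
                      target=0.33, feasibility=0.25)
```

The returned value is the $\epsilon$-quantile of the marginal posterior of $\gamma$, so the estimated probability that it exceeds the MTD is about $\epsilon$. Rounding to a discrete dose set with tolerances is not shown.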
The EWOC procedure can be applied sequentially or for cohorts of patients. If some patients' toxicity outcomes are delayed, one just uses all available data to update the model and perform dose allocation for an incoming patient. Final data analysis following EWOC is performed using Bayesian methods. Simulations from the original paper [11] showed that EWOC has similar estimation efficiency of the MTD but lower frequency of overdosing compared to CRM, and it compares favorably to various non-parametric dose escalation schemes.
The EWOC design has a firm theoretical basis [11,156,157,158]. Under mild regularity conditions a sequence of EWOC-generated dose assignments converges in probability to the true MTD and the EWOC design is optimal in the class of Bayesian-feasible designs [156]. Furthermore, the EWOC is coherent in both escalation and de-escalation [157].
Important extensions of EWOC include incorporation of covariates [159,160,161], accounting for the number and ordinal nature of toxicity grades [162,163], accounting for late onset toxicities [164], the choice of a cohort size [165], the use of a variable feasibility bound [166], and an extension to drug combination trials [167]. Free interactive software from the authors of the method is available (see Table 2). A review of the software is given in [168]. An interesting application of EWOC is given in [169].

Bayesian decision theoretic and optimal designs
In this subsection we discuss novel Bayesian adaptive designs which incorporate formal optimality criteria in design adaptation rules. Subsection 3.3.1 discusses Bayesian decision theoretic designs that encompass many model-based methods under one paradigm. Subsection 3.3.2 discusses Bayesian optimal sequential designs which simultaneously address the objectives of efficient estimation and an ethical dilemma of restricting allocation of patients to highly toxic doses by sequentially solving a restricted optimization problem. Subsection 3.3.3 discusses "hybrid" designs that achieve a tradeoff between efficiency and ethics by solving a stochastic optimization problem.

Bayesian decision theoretic designs
Whitehead and Brunier [170] proposed a Bayesian decision theoretic framework for phase I dose finding studies. This approach requires specification of a dose-toxicity model, a prior distribution for model parameters, a set of possible decisions (dose level assignments for the patients) and a gain (or loss) function. Consider a two-parameter logistic model for the dose-toxicity curve:

$$p(x) = \frac{1}{1 + \exp\{-(x - \alpha)/\beta\}}. \tag{3.6}$$

Let $g(\theta)$ denote the prior density for $\theta = (\alpha, \beta)$ and $g(\theta \mid F_j)$ denote the posterior density for $\theta$ given data from the first $j$ patients in the trial. Let $L(\theta, x)$ denote the loss function which represents the loss incurred if dose $x$ is assigned to the $(j + 1)$th patient ($x \in \{d_1, \ldots, d_K\}$) when $\theta$ is the true parameter value. Given $g(\theta \mid F_j)$, one can obtain the posterior expected loss for each of the $K$ dose levels:

$$E\{L(\theta, d_i) \mid F_j\} = \int L(\theta, d_i)\, g(\theta \mid F_j)\, d\theta, \quad i = 1, \ldots, K.$$

The dose level for the $(j + 1)$th patient is chosen to minimize the posterior expected loss:

$$x_{j+1} = \arg\min_{d_i \in \{d_1, \ldots, d_K\}} E\{L(\theta, d_i) \mid F_j\}.$$

As noted by Whitehead and Williamson [171], instead of the posterior expected loss, one can use more convenient plug-in estimators such as $\hat{L} = L(\tilde{\theta}_j, d_i)$, where $\tilde{\theta}_j = \int \theta\, g(\theta \mid F_j)\, d\theta$ is the posterior mean, or $\hat{L} = L(\hat{\theta}_j, d_i)$, where $\hat{\theta}_j = \arg\max_\theta g(\theta \mid F_j)$ is the posterior mode. The form of the loss function must reflect the study objectives. Recall that $p(x)$ denotes the toxicity probability at dose $x$, $\Gamma \in (0, 1)$ is the pre-specified target toxicity level and $\gamma$ is the unknown $(100 \times \Gamma)$th quantile of $p(x)$. The two key considerations are the patient loss and the information loss. For the patient loss, one can choose a loss that measures the distance between the toxicity probability $p(x)$ and the target $\Gamma$, as in the CRM method, where the design sequentially selects a dose with toxicity probability closest to $\Gamma$ [10]. A "more cautious" asymmetric loss of the EWOC method [11] puts higher penalties on overdosing than on underdosing:

$$L(\theta, x) = \epsilon(\gamma - x)^{+} + (1 - \epsilon)(x - \gamma)^{+},$$

where $x^{+} = \max(x, 0)$. Some other loss functions to limit exposure of patients to doses higher than the estimated MTD can be found in Whitehead and Williamson [171].
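For the information loss one can consider some criterion based on the Fisher information matrix. If the goal is to estimate the target quantile $\gamma = \alpha + \beta \log\{\Gamma/(1 - \Gamma)\}$ as accurately as possible (see Subsection 3.3.2 for details), then a loss function can be chosen as

$$L(\theta, x) = c^{T} \Big\{\sum_{m=1}^{j} I(x_m, \theta) + I(x, \theta)\Big\}^{-1} c,$$

where $I(d, \theta)$ is the Fisher information matrix for $(\alpha, \beta)$ from one patient treated at dose $d$ and $c = (1, \log\{\Gamma/(1 - \Gamma)\})^{T}$ is the gradient of $\gamma$ with respect to $(\alpha, \beta)$. This is the asymptotic variance of the maximum likelihood estimator of $\gamma$, assuming that $j$ patients have been treated at doses $x_1, \ldots, x_j$ and the $(j + 1)$th patient is treated at dose $x$; it is obtained by applying the delta method to the Fisher information matrix for $(\alpha, \beta)$. The $(j + 1)$th patient's dose assignment should be the one which results in the smallest variance of $\hat{\gamma}$. Other measures of information, such as the determinant of the Fisher information matrix or a weighted combination of different criteria, can be used [172].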
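The variance-based information loss can be sketched as follows for the location-scale logistic model, using a plug-in value of $(\alpha, \beta)$ rather than a posterior average. The helper names, the plug-in simplification, and the example numbers are assumptions; at least two distinct doses are needed for the information matrix to be invertible.

```python
import math

def fisher_info(d, alpha, beta):
    """Fisher information for (alpha, beta) from one patient at dose d,
    under p(d) = 1 / (1 + exp(-(d - alpha) / beta))."""
    z = (d - alpha) / beta
    w = math.exp(z) / (1.0 + math.exp(z)) ** 2 / beta ** 2
    return [[w, w * z], [w * z, w * z * z]]

def var_gamma(doses, alpha, beta, target):
    """Delta-method variance c' M^{-1} c of the estimated target quantile,
    where M is the total information from all listed doses."""
    m00 = m01 = m11 = 0.0
    for d in doses:
        info = fisher_info(d, alpha, beta)
        m00 += info[0][0]
        m01 += info[0][1]
        m11 += info[1][1]
    det = m00 * m11 - m01 ** 2
    c0, c1 = 1.0, math.log(target / (1.0 - target))  # gradient of gamma
    return (c0 * c0 * m11 - 2.0 * c0 * c1 * m01 + c1 * c1 * m00) / det

def next_dose_min_variance(doses, candidates, alpha, beta, target):
    """Pick the candidate dose that minimizes the variance after one more patient."""
    return min(candidates,
               key=lambda x: var_gamma(list(doses) + [x], alpha, beta, target))

past_doses = [1.0, 2.0, 3.0]        # hypothetical doses already assigned
candidates = [1.0, 2.0, 3.0, 4.0]   # hypothetical dose levels
chosen = next_dose_min_variance(past_doses, candidates, 2.5, 1.0, 0.3)
```

A purely information-driven rule of this kind ignores patient safety, which is why it is usually combined with a patient loss or a constraint on the dose space.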
The operating characteristics of Bayesian decision theoretic designs with a two-parameter logistic model for sample sizes typical of phase I trials were assessed via simulations [171,173,174]. Zhou and Whitehead [175] provide guidance on the practical implementation of Bayesian dose escalation procedures. The authors developed a software package, Bayesian ADEPT [176,177,178]. In ADEPT one can assess the operating characteristics of several designs via simulations, and the software has a utility to implement Bayesian dose escalation designs in real time. The authors point out that ADEPT should be viewed as an assistant in decision-making, but it should not replace clinical judgment.

Bayesian optimal sequential designs
Haines et al. [15] used optimal design theory to construct Bayesian optimal sequential designs for phase I trials. Consider the two-parameter location-scale logistic model (3.6) for the dose-toxicity curve. For a chosen toxicity level $\Gamma$, the target quantile is expressed as $\gamma = \alpha + \beta \log\{\Gamma/(1 - \Gamma)\}$, which is linear in the parameters $\alpha$ and $\beta$. Let $n$ be the fixed and predetermined study sample size. A design for model (3.6) is determined by a discrete probability measure $\xi = \{(d_i, \rho_i),\ i = 1, \ldots, K;\ \sum_{i=1}^{K} \rho_i = 1\}$, where $\rho_i = n_i/n$ is the proportion of patients assigned to dose $d_i$. Let $z_i = (d_i - \alpha)/\beta$. Then the Fisher information for $\theta = (\alpha, \beta)$ at $d_i$ is given by

$$I(d_i, \theta) = \frac{1}{\beta^2} \frac{e^{z_i}}{(1 + e^{z_i})^2} \begin{pmatrix} 1 & z_i \\ z_i & z_i^2 \end{pmatrix},$$

and the information matrix of the design is $M(\xi, \theta) = \sum_{i=1}^{K} \rho_i I(d_i, \theta)$. An optimal design problem is to maximize some concave criterion of $M(\xi, \theta)$. Haines et al. [15] considered two Bayesian criteria: a D-optimality criterion

$$\Phi_D(\xi) = \int \log |M(\xi, \theta)|\, g(\theta)\, d\theta$$

and a c-optimality criterion

$$\Phi_c(\xi) = -\int c^{T} M^{-1}(\xi, \theta)\, c\, g(\theta)\, d\theta, \qquad c = (1, \log\{\Gamma/(1 - \Gamma)\})^{T}.$$

To incorporate ethical constraints, a restriction on the dose space is imposed. Let $\Gamma_R \le \Gamma$ be a pre-defined maximum acceptable level of toxicity and let $\gamma_R = \alpha + \beta \log\{\Gamma_R/(1 - \Gamma_R)\}$ denote the corresponding quantile (the "maximally allowed dose"). Given the prior $g(\theta)$, one can obtain the prior distribution for $\gamma_R$. The restricted dose space is $\Omega_R = \{d : d \le \gamma_R\}$. To minimize the probability of highly toxic dose assignments in the trial, one introduces a linear constraint function

$$C(\xi) = \sum_{i=1}^{K} \rho_i \Pr(\gamma_R \le d_i) \le \epsilon,$$

where $\epsilon > 0$ is some small probability chosen by an investigator. Then a constrained Bayesian D-optimal optimization problem is formulated as

$$\max_{\xi} \Phi_D(\xi) \quad \text{subject to} \quad C(\xi) \le \epsilon. \tag{3.9}$$

A similar problem can be formulated using the c-optimality criterion as the objective function. The optimization problem (3.9) is well defined and the solution (optimal design points and probability mass at these points) can be obtained numerically. The problem (3.9) covers both the case when optimal dose levels are sought on the continuous (log-transformed) scale and the more practical case when the dose levels are specified in advance.
In the latter case, the optimization problem is simpler because the design points are known and only the optimal proportions at these points are to be found. The described optimal designs are theoretical measures. To construct sequential Bayesian optimal designs, Haines et al. [15] proposed a two-stage approach. At the first (pilot) stage, a small number $n_0$ of patients are allocated among the $K$ dose levels according to the optimal design with prior density $g(\theta)$. Based on the history $F_{n_0} = \{(x_1, y_1), \ldots, (x_{n_0}, y_{n_0})\}$, the posterior density is obtained as $g(\theta \mid F_{n_0})$. At the second stage, stepwise allocation of $(n - n_0)$ patients is made to maximize the D-optimality criterion

$$\int \log |n_0 M(\xi^{*}, \theta) + I(d, \theta)|\, g(\theta \mid F_{n_0})\, d\theta \tag{3.10}$$

or to maximize the c-optimality criterion

$$-\int c^{T} \{n_0 M(\xi^{*}, \theta) + I(d, \theta)\}^{-1} c\, g(\theta \mid F_{n_0})\, d\theta, \tag{3.11}$$

with $c = (1, \log\{\Gamma/(1 - \Gamma)\})^{T}$, subject to the constraint $\Pr(\gamma_R \le d) \le \epsilon$ evaluated over the posterior $g(\theta \mid F_{n_0})$. In (3.10) and (3.11), $\xi^{*} = \{(d_i, N_i(n_0)/n_0),\ i = 1, \ldots, K\}$, where $N_i(n_0)$ is the number of patients assigned to dose $d_i$ after the pilot stage, with $\sum_{i=1}^{K} N_i(n_0) = n_0$. At the end of the trial, statistical inference can be based on the Bayes estimator $\hat{\theta}_n = \int \theta\, g(\theta \mid F_n)\, d\theta$.
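One step of the constrained sequential D-optimal allocation can be sketched by replacing the posterior integral with an average over parameter draws. This is a Monte Carlo simplification, not the procedure of Haines et al.: the draws, the dose values, and the function names are hypothetical, and the already assigned doses must include at least two distinct values so that the determinant is positive.

```python
import math

def fisher_info(d, alpha, beta):
    """Fisher information for (alpha, beta) at dose d, location-scale logistic model."""
    z = (d - alpha) / beta
    w = math.exp(z) / (1.0 + math.exp(z)) ** 2 / beta ** 2
    return [[w, w * z], [w * z, w * z * z]]

def log_det_total(doses, candidate, alpha, beta):
    """log-determinant of the information from past doses plus one candidate dose."""
    m00 = m01 = m11 = 0.0
    for d in list(doses) + [candidate]:
        info = fisher_info(d, alpha, beta)
        m00 += info[0][0]
        m01 += info[0][1]
        m11 += info[1][1]
    return math.log(m00 * m11 - m01 ** 2)

def next_dose_d_optimal(doses, candidates, draws, target_r, eps):
    """Maximize the averaged D-criterion subject to Pr(gamma_R <= d) <= eps.

    draws is a list of (alpha, beta) values standing in for posterior samples.
    Returns None if no candidate satisfies the constraint.
    """
    logit_r = math.log(target_r / (1.0 - target_r))
    best, best_value = None, -math.inf
    for d in candidates:
        overdose = sum(1 for a, b in draws if a + b * logit_r <= d) / len(draws)
        if overdose > eps:
            continue  # candidate violates the toxicity constraint
        value = sum(log_det_total(doses, d, a, b) for a, b in draws) / len(draws)
        if value > best_value:
            best, best_value = d, value
    return best

draws = [(3.0, 1.0), (3.5, 0.8), (2.8, 1.2)]  # hypothetical posterior draws
pilot_doses = [1.0, 2.0, 2.0]                  # hypothetical pilot-stage doses
levels = [1.0, 2.0, 3.0, 4.0]
chosen = next_dose_d_optimal(pilot_doses, levels, draws, target_r=0.3, eps=0.34)
```

In this example the two highest levels are excluded because every draw places the maximally allowed dose below them; the choice between the remaining levels is driven by the information gain.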
Roy et al. [179] established theoretical properties of the sequential D-optimal design. They showed convergence of the design measure to the locally D-optimal design as well as consistency and asymptotic normality of the Bayes estimator. Their proof requires that the logistic model is correctly specified. Simulations show that sequential designs provide good approximations to the "true" optimal designs, for sample sizes as large as n = 35 even with moderately misspecified priors [15]. A web-based application for implementing Bayesian optimal sequential designs is available [180].
Warfield and Roy [181] proposed semiparametric sequential designs for a class of 2-parameter models which includes logistic model as a special case. Roth [182] proposed "sequential locally optimal designs" which use the 3+3 design at initial stages of the trial and then switch to an optimal design once model parameters become estimable.

Hybrid designs
Bartroff and Lai [16] proposed another approach to handle the "treatment versus experimentation" dilemma in phase I clinical trials. For a trial of size $n$, they proposed selecting dose assignments sequentially to minimize the "global risk"

$$E\Big\{\sum_{j=1}^{n} h(\gamma, x_j) + \bar{h}(\hat{\gamma}_n, \gamma)\Big\},$$

where for $j = 1, \ldots, n$, $h(\gamma, x_j)$ is the loss for the $j$th patient, $\hat{\gamma}_n$ is the final estimate of the MTD, and $\bar{h}(\hat{\gamma}_n, \gamma)$ represents the final loss in estimation efficiency. As described by Bartroff and Lai [16], this stochastic minimization problem can be solved using dynamic programming, which may be computationally prohibitive. In practice an approximate solution can be obtained using rollout algorithms [183]. An approximate solution which Bartroff and Lai [16] call a "hybrid" design is a convex combination of a "treatment" design such as EWOC and a "learning" design such as the D-optimal design, with an adaptive weight that favors the "learning" design at the initial stages of the trial and shifts toward the "treatment" design as the trial progresses. Recently, Bartroff and Lai proposed another simpler and more efficient hybrid design [17].

Data analysis following phase I trial designs
The issue of data analysis following a phase I trial is often overlooked in practice. For instance, for the popular 3+3 design the MTD is simply taken to be one dose below the dose at which the trial has stopped. This approach may be unsatisfactory because it provides no information about the uncertainty of the estimator. Clearly, one can do much better by utilizing all collected trial data in statistical modeling. Given the final data set $\{t_i, n_i, d_i,\ i = 1, \ldots, K\}$, where $t_i$ is the number of toxicities among $n_i$ patients assigned to dose $d_i$, how do we estimate the MTD? Also, how do we estimate the toxicity probabilities $p(d) = \Pr(Y = 1 \mid d)$ for $d \in \{d_1, \ldots, d_K\}$? For the 3+3 design and some of its modifications, Storer [26,27] discussed confidence interval estimation of the MTD using a two-parameter logistic model for the dose-toxicity curve. The approaches include the delta method, a method related to Fieller's theorem, and a method based on the likelihood ratio test. Storer found that none of these approaches is completely satisfactory because of small sample sizes and sparse data in a phase I trial. Tremmel [184] considered a probit model for the dose-toxicity relationship, $p_k = \Pr(Y = 1 \mid d_k) = \Phi(\log(d_k), \mu, \sigma)$, $k = 1, \ldots, K$, and found that this approach estimates the MTD better than the traditional 3+3 approach. He et al. [185] proposed a model-based approach for estimating the MTD assuming a one-parameter dose-toxicity model within a CRM framework. Their main finding is that the method provides more accurate estimation of the MTD when the one-parameter model is correctly specified, but the method can overestimate the toxicity levels when the model is misspecified. O'Quigley [186] notes that directly applying a CRM-type analysis to data generated by another design may not be satisfactory because of potential lack of fit of a one-parameter model.
Instead, O'Quigley [186] suggests obtaining parameter estimates by solving a weighted estimating equation where the weight at a given dose is proportional to the number of patients the CRM would have assigned to that dose. Iasonos and Ostrovnaya [187] addressed the issue of analyzing data following the 3+3 design. They compared several estimation methods and found that the constrained maximum likelihood method for MTD estimation performs better than the weighted CRM approach [186] and isotonic regression. Confidence intervals around toxicity probabilities at each dose are obtained using cumulative toxicity data.
For the up-and-down designs of Durham and Flournoy [5] and their modifications, data analysis issues were investigated in several papers [61,66,67]. In particular, Stylianou and Flournoy [66] investigated different estimators of the MTD, including the maximum likelihood estimator, the weighted least squares estimator, the empirical mean and the isotonic regression estimator with linear interpolation. They found that the isotonic regression estimator has the best small-sample performance. Stylianou et al. [67] recommend using isotonic estimates of toxicity probabilities at different dose levels $\{\psi(d_k)\}_{k=1}^{K}$. For clinical trials that use a model-based approach in the design, the subsequent data analysis can be performed using final model-based estimates. The final recommended dose is taken as an estimate of the MTD. For designs such as the CRM [10], EWOC [11], or the Bayesian optimal sequential design [15], theoretical results are available that ensure consistency and asymptotic normality of parameter estimates. Therefore, standard asymptotic procedures for statistical inference should apply for these designs. A cautionary note should be made that the results are valid asymptotically and are subject to certain assumptions, including the form of the model and the prior. In practice these assumptions are hard to verify, models are likely to be misspecified, and sample sizes in phase I trials are small. Therefore, fitting a more elaborate model to the final data set may provide a better approach than just analyzing data using the "working" model which formed the basis for design adaptations. Since phase I trials are exploratory studies, they are not subject to the same level of statistical scrutiny as confirmatory phase III trials. The model for final data analysis does not necessarily have to be specified in the protocol. Different models can be tried to obtain the best fit to the observed data.
Meta-analytic approaches may be useful under certain circumstances [188].
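The isotonic estimation of toxicity probabilities mentioned above can be sketched with the pool-adjacent-violators algorithm. The example data are hypothetical, and the final step picks the dose whose isotonic estimate is closest to the target rather than using the linear interpolation studied by Stylianou and Flournoy; every dose is assumed to have at least one patient.

```python
def isotonic_toxicity(tox, n):
    """Weighted isotonic (non-decreasing) fit of observed toxicity rates.

    tox[i] is the number of toxicities among n[i] > 0 patients at dose i.
    """
    blocks = []  # each block: [toxicities, patients, number of doses pooled]
    for t, m in zip(tox, n):
        blocks.append([t, m, 1])
        # Pool adjacent blocks while the earlier rate exceeds the later one.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            t2, m2, k2 = blocks.pop()
            blocks[-1][0] += t2
            blocks[-1][1] += m2
            blocks[-1][2] += k2
    fitted = []
    for t, m, k in blocks:
        fitted.extend([t / m] * k)
    return fitted

def closest_dose(fitted, target):
    """Index of the dose whose isotonic estimate is closest to the target."""
    return min(range(len(fitted)), key=lambda i: abs(fitted[i] - target))

toxicities = [0, 1, 0, 2, 3]   # hypothetical final data
patients = [3, 6, 3, 6, 6]
fitted = isotonic_toxicity(toxicities, patients)
mtd_index = closest_dose(fitted, target=0.30)
```

Here the observed rates at the second and third doses (1/6 and 0/3) violate monotonicity and are pooled to 1/9, while the remaining rates are left unchanged.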

Finding MTD in a continuous dose space
Instead of considering pre-fixed dose levels, one can assume the dose is measured on a continuous scale. Sequential designs to determine the MTD in a continuous dose setting have been proposed in the literature; yet few of them have found use in clinical trials [19,189]. Robbins and Monro [190] introduced a nonparametric stochastic approximation method for finding a root of the regression function, which can be used to generate a sequence of dose assignments that converges to the target quantile as follows:

$$X_{n+1} = X_n - \frac{1}{n b\, \Gamma(1 - \Gamma)} (Y_n - \Gamma), \tag{5.1}$$

where $Y_n$ is the toxicity outcome of the $n$th patient treated at dose $X_n$ and $b > 0$ is a pre-determined constant. Importantly, for the algorithm (5.1), $\Pr(X_n \to \gamma) = 1$ and the procedure is coherent [19]. For a logistic model $p(x) = \{1 + \exp(-(\alpha + \beta x))\}^{-1}$, the choice $b = \beta$ minimizes the asymptotic variance of $\sqrt{n}(X_n - \gamma)$. Since $\beta$ is unknown in practice, one can replace it with a strongly consistent sequence of estimators, which leads to an adaptive Robbins-Monro procedure. Such a procedure, however, may be numerically unstable for small and moderate samples [19]. Wu [191] proposed logit-MLE designs which, under certain regularity conditions, are asymptotically equivalent to the adaptive Robbins-Monro procedure. Wu's designs use sequentially computed MLEs from a two-parameter logistic model and can be applied only when data are not too sparse. For the same two-parameter logistic model, McLeish and Tosh [192] proposed a design that sequentially selects points to achieve the maximal incremental increase in some criterion such as D- or c-optimality. The c-optimal sequential design of McLeish and Tosh [192] is very similar to Wu's logit-MLE design [191]. Liu et al. [193] conducted a comprehensive study of various sequential designs for phase I binary clinical trials with both discrete and continuous dose spaces.
They addressed multiple issues including the choice of a "start-up" design to ensure existence of MLEs for the logistic model, the construction of the "follow-on" stage to achieve nonseparable data before applying the desired sequential design, and the issue of data analysis. They conclude that in a continuous dose space Wu's logit-MLE design [191], applied in combination with appropriate start-up and follow-on rules, outperforms nonparametric methods in terms of estimation efficiency but may assign more subjects to highly toxic doses. Cheung [22] notes that assuming a continuum of doses is not always feasible in practice. Modification of the Robbins-Monro procedure or Wu's logit-MLE method to a discrete dose space is not straightforward. One possibility is to run a sequential design on "virtual observations" which are functions of observed toxicity data and doses on both the discrete and the continuous scale [194].
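A stochastic approximation recursion of the Robbins-Monro type can be simulated as follows. The sketch assumes a logistic dose-toxicity curve with known slope, a step size of the form $1/\{n b \Gamma(1 - \Gamma)\}$ with $b$ set to the true slope, and hypothetical parameter values; in a real trial the slope would be unknown.

```python
import math
import random

def robbins_monro_step(x, y, n, target, b):
    """One update: move down after a toxicity (y = 1), up otherwise."""
    return x - (y - target) / (n * b * target * (1.0 - target))

def simulate(alpha, beta, target, x0, n_patients, seed=0):
    """Simulate doses under p(x) = 1 / (1 + exp(-(alpha + beta * x)))."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_patients + 1):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
        y = 1 if rng.random() < p else 0      # simulated toxicity outcome
        x = robbins_monro_step(x, y, n, target, b=beta)
    return x

# Hypothetical truth: alpha = -3, beta = 1, target toxicity 0.25.
true_mtd = (math.log(0.25 / 0.75) + 3.0) / 1.0
final_dose = simulate(alpha=-3.0, beta=1.0, target=0.25, x0=1.0, n_patients=5000)
```

The first few steps are large (a toxicity in the first patient moves the dose by four units in this example), which illustrates why such procedures are considered unstable for the small samples typical of phase I trials.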

Finding maximum tolerated combinations in drug combination trials
Many modern phase I cancer trials are designed to evaluate the effect of two or more drugs used in combination [195]. Typically, each compound in a combination has been studied previously and has an established dose range and MTD. For each cytotoxic drug in the combination, a monotone dose-toxicity probability relationship is assumed to hold marginally. However, the joint probability of toxicity as a function of the combination doses may exhibit a complex pattern due to possibly unknown interactions between the drugs. As noted by Gasparini et al. [196], there are three possible types of interaction for multiagent therapies: antagonism (one drug reduces or neutralizes the toxic potential of the other), no interaction, and synergy (co-administration of the drugs results in greater toxicity compared to administration of each drug alone). Synergy is the most plausible type of interaction for cytotoxic agents. Consider a trial with a combination of two drugs. Let $s_1 < \cdots < s_I$ denote the dose levels for drug one and $t_1 < \cdots < t_J$ denote the dose levels for drug two, $I \ge J$. The set of all possible drug combinations is a two-dimensional lattice $\{(s_i, t_j),\ i = 1, \ldots, I;\ j = 1, \ldots, J\}$. Let $\Gamma \in (0, 1)$ denote the target toxicity level for the drug combination. The objective is to find maximum tolerated combinations (MTCs), i.e. one or more pairs $(s_i, t_j)$ with an estimated toxicity level closest to $\Gamma$. Ivanova and Wang [197] quantified this as an objective to find combinations $w_j^{*} = (s_j^{*}, t_j)$, where $s_j^{*} = \arg\min_{1 \le i \le I} |\Pr(Y = 1 \mid s_i, t_j) - \Gamma|$ is the "optimal" dose of drug one corresponding to a selected dose $t_j$ of drug two, for $j = 1, \ldots, J$.
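The definition of the maximum tolerated combinations in the sense of Ivanova and Wang can be illustrated with a small lookup over a matrix of toxicity estimates. The matrix values are hypothetical, and how the estimates are obtained (for example, by two-dimensional isotonic regression) is outside the scope of this sketch.

```python
def maximum_tolerated_combinations(tox_prob, target):
    """For each dose t_j of drug two (column j), return the row index i of the
    dose s_i of drug one whose toxicity estimate is closest to the target."""
    n_rows = len(tox_prob)
    n_cols = len(tox_prob[0])
    return [
        min(range(n_rows), key=lambda i: abs(tox_prob[i][j] - target))
        for j in range(n_cols)
    ]

# Hypothetical toxicity estimates: rows are doses of drug one (I = 4),
# columns are doses of drug two (J = 3); values increase in both directions.
estimates = [
    [0.05, 0.10, 0.15],
    [0.10, 0.18, 0.28],
    [0.20, 0.30, 0.45],
    [0.33, 0.50, 0.65],
]
mtc_rows = maximum_tolerated_combinations(estimates, target=0.30)
```

In this example the selected dose of drug one decreases as the dose of drug two increases, which is the pattern expected when toxicity is increasing in the dose of each agent.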
In general, how should one perform a search on a two-dimensional lattice of dose combinations to find MTCs? Both algorithm-based and model-based approaches have been proposed for this purpose. The traditional approach is to pre-specify an escalation path on the lattice and apply the 3+3 algorithm along this path. However, such an approach uses only one path, and it can miss more promising dose combinations located outside of the path. Fan et al. [198] proposed two-dimensional search strategies based on two-stage (A+B) and three-stage (A+B+C) designs. Braun and Alonzo [199] considered A+B+C designs with "fast escalation" that allow simultaneous increases in the doses of both agents. After the first DLT is observed, further searches are split into two directions (corresponding to one dose de-escalation for each agent); therefore, two different MTCs may potentially be identified. Lee and Fan [200] proposed an A+B+C algorithm that takes into account the nature of observed DLTs (whether they were likely due to agent 1, agent 2, or their interaction) when making a decision to de-escalate a dose. Assuming the toxicity probability is increasing in the dose of each agent, Ivanova and Wang [197] proposed a two-dimensional Narayana design with a two-dimensional isotonic regression procedure for estimating the MTC at the end of the trial.
For clinical settings in which all drug combinations can be ordered a priori in terms of toxicity probabilities, Kramar et al. [142] proposed applying techniques for a single-agent problem to a combination-agent problem. In situations when only a partial ordering is available, one can use the nonparametric procedure of Conaway et al. [86] or the CRM-type design of Wages et al. [143,201,202]; see also [203]. Yuan and Yin [204] proposed another elegant sequential design which converts a two-dimensional search into a series of one-dimensional searches using the CRM, which allows many promising combinations to be explored in one trial.
When the ordering of drug combinations is unclear, model-based approaches may be preferred. A model-based approach for drug combination trials uses similar building blocks as for single-agent trials. For a drug combination $x = (x_1, x_2)$ (where $x_1 \in \{s_1, \ldots, s_I\}$ is the dose level of drug one and $x_2 \in \{t_1, \ldots, t_J\}$ is the dose level of drug two), the probability of toxicity is modeled as

$$\Pr(Y = 1 \mid x) = \psi(x, \theta)$$

for some function $\psi$ ($0 \le \psi(x, \theta) \le 1$ for all $x$ and $\theta$). The parameter vector $\theta$ contains the effects of the two drugs and their interaction. With a Bayesian approach, $\theta$ follows some prior distribution $g(\theta)$ which can be elicited from physicians based on single-agent toxicity profiles. Let $F_m = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ denote the history from $m$ patients in the trial, where $y_l = 1$ if the $l$th patient experienced toxicity at drug combination $x_l$ and $y_l = 0$ otherwise, $l = 1, \ldots, m$.
The dose combination for the $(m + 1)$th patient is determined based on the posterior density $g(\theta \mid F_m) \propto L_m(\theta)\, g(\theta)$. The most common approach is to assign the combination $x^{*}$ that minimizes $|E_\theta(\psi(x, \theta) \mid F_m) - \Gamma|$. Dose escalation may be applied to either individual patients or cohorts of patients.
Several methods based on this model-based framework have been recently proposed [205,206,207,208,209,210,211,167]. Thall et al. [205] developed a two-stage Bayesian design assuming a six parameter dose-toxicity model. At the first (escalation) stage, dose combinations are selected along the diagonal path until m patients have been treated. Given g(θ|F m ), the dose combinations for the (n − m) patients in the second stage are sought along the contour {x : E θ (ψ(x, θ)|F m ) = Γ} subject to certain optimality criteria. Wang and Ivanova [206] considered a three parameter model and proposed a two-stage Bayesian CRM-type design which, after an initial escalation stage with an updated posterior g(θ|F m ), attempts to find dose combinations for subsequent patients by minimizing the distance to the target toxicity level. Yin and Yuan [207] proposed a Bayesian procedure for which a dose-toxicity probability surface is modeled using a three-parameter copula model. Yin and Yuan [208] used another modeling approach based on latent 2 × 2 contingency tables for toxicity probabilities for the two agents. Bailey et al. [209] used a Bayesian logistic regression model to estimate toxicity probabilities for drug combinations and suggested calculating posterior probabilities for the four toxicity categories (underdosing, targeted, excessive and unacceptable toxicity) as a decision making tool. Braun and Wang [210] developed a Bayesian adaptive design based on a hierarchical model for toxicity probabilities. Huo et al. [211] developed a method for finding MTCs in trials involving combinations of a continuous-dose standard-of-care agent and a discrete-dose investigational drug. Shi and Yin [167] proposed a two-dimensional EWOC design for drug combination trials.

More complex settings
Thall [212] provides a comprehensive discussion on early phase clinical trials with special considerations. Two important topics we shall focus on in this subsection are the use of various grades and types of toxicity and optimization of dose and schedule assignments.
Most oncology chemotherapy trials use ordinal scales to measure severity of toxicity, and different types of toxicities may have different clinical implications. Bekele and Thall [213] introduced the "total toxicity burden" (TTB), a measure that quantifies in a single score the number and severity of toxicities a patient may experience in the study. The authors proposed a Bayesian adaptive design to find a dose with a pre-specified TTB. This method requires assigning weights to the different types of toxicities and selecting the target TTB at the trial onset. Some extensions of Bekele and Thall's [213] method were proposed by Yuan et al. [139] and Lee et al. [141] (CRM-type designs) and by Chen et al. [162] (EWOC-type design).
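As a rough illustration of how such a score might be computed, the sketch below treats the TTB of a patient as the sum of pre-assigned severity weights over the toxicities observed. The additive form, the toxicity types, and the weights in `WEIGHTS` are all assumptions made here for illustration; in practice the weights would be elicited from clinicians before the trial.

```python
# Hypothetical severity weights indexed by (toxicity type, grade).
WEIGHTS = {
    ("neutropenia", 3): 1.0,
    ("neutropenia", 4): 2.0,
    ("nausea", 3): 0.5,
    ("nausea", 4): 1.0,
}


def total_toxicity_burden(toxicities, weights=WEIGHTS):
    """Sum the weights of a patient's observed (type, grade) toxicities.

    Toxicities without an assigned weight contribute nothing to the score.
    """
    return sum(weights.get(tox, 0.0) for tox in toxicities)


def mean_burden(patients, weights=WEIGHTS):
    """Average TTB over a cohort; each patient is a list of (type, grade) pairs."""
    scores = [total_toxicity_burden(p, weights) for p in patients]
    return sum(scores) / len(scores)
```

Under these assumed weights, a patient with grade 4 neutropenia and grade 3 nausea has a TTB of 2.5, and a dose-finding rule would compare the estimated mean TTB at each dose with the pre-specified target.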
In clinical trial practice, in addition to determining an optimal dose level of the drug, it is important to account for the duration of treatment and frequency of dosing. Braun et al. [214] is perhaps the first paper to propose a design that simultaneously optimizes the schedule of treatment and the dose per administration. The design uses a Bayesian model for time-to-toxicity and performs a search over a two-dimensional lattice of successive administration times and the doses at those times. The authors demonstrate via simulations that the method "outperforms any method that searches for an optimal dose but does not allow schedule to vary, both in terms of the probability of identifying optimal (dose, schedule) combinations, and the numbers of patients assigned to those combinations in the trial." Several important extensions of the methodology of Braun et al. [214] were recently published in major statistical journals [215,216,217]. While these designs are not yet widely known, we believe they hold much promise to improve clinical trial practice in the near future.

Concluding remarks
In this paper we presented an overview of adaptive designs for phase I cancer trials where the objective is to find the maximum tolerated dose or maximum tolerated combinations. Unlike confirmatory trials that are driven by hypothesis testing considerations, phase I studies are driven by estimation of the drug dose-toxicity profile and selection of the most appropriate dose(s) for subsequent studies. Since at the beginning of clinical drug development there is limited knowledge about the drug characteristics, adaptive designs are attractive research designs for phase I dose finding trials. The use of adaptive designs in this setting is explicitly encouraged by the Health Authorities [218,219]. In particular, the Committee for Medicinal Products for Human Use (CHMP) Guideline on Clinical Trials in Small Populations [219], page 8, states: "... A variation of response-adaptive designs is those used for dose finding - they are typically referred to as 'continual re-assessment' methods. They are sometimes, but rarely, used. The properties of such methods far outstrip those of conventional 'up and down' dose finding designs. They tend to find the optimum (however defined) dose quicker, they treat more patients at the optimum dose, and they estimate the optimum dose more accurately. Such methods are encouraged."

Special Considerations
Continuous dose space: [19,190,191,192,193] [189,194]
Drug combination trials, algorithm-based: [197,198,199,200] [86]
Drug combination trials, model-based: [205,206,207,208,209,210,211]
Multiple toxicities: [213,139,141,162]
Dose-schedule combination: [214,215,216,217]
Despite significant recent advances in adaptive design methodology for phase I cancer trials, the traditional 3+3 design remains the most popular choice in practice [220]. Our literature review provides a comprehensive, but not exhaustive, list of novel adaptive dose-finding methods for phase I cancer trials. A succinct summary is presented in Table 1. Many of these designs have clearly defined objectives and established statistical properties, and are shown to outperform conventional procedures such as the 3+3 design, both in terms of statistical efficiency and in terms of minimizing the number of patients treated at suboptimal dose levels. One should be mindful that model-based adaptive designs are operationally more complex and require continuous collaboration between the clinical investigator and the biostatistician. Simulations must be routinely used to evaluate the design performance under a variety of hypothetical experimental scenarios. Validated statistical software is crucial for successful implementation of adaptive dose finding methods. Table 2 gives some web-links to non-commercial software packages. While more upfront planning is needed before implementation of a phase I adaptive design, this may profoundly benefit the development program.

Table 2 (entries in extraction order): [10,128]; http://cran.r-project.org/web/packages/bcrm/index.html; [10,11,101]; http://cran.r-project.org/web/packages/pocrm/index.html; [201]; http://biostatistics.csmc.edu/ewoc/ewoc-s.php; [11]

Our review covers adaptive designs for cancer trials of cytotoxic agents in which acceptable toxicity is used as a surrogate for a therapeutic response. Many novel anti-cancer therapies, such as cytostatic agents, are typically tolerable and much less toxic than cytotoxic anti-cancer drugs, and efficacy is a more relevant consideration than toxicity in trials of cytostatic agents [221,222]. Dose finding trial designs utilizing both toxicity and efficacy measurements are referred to as "seamless phase I/II designs".
A review of these methods is beyond the scope of the current paper, but the interested reader is referred to [223].
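The simulation-based evaluation of design performance recommended in these concluding remarks can be sketched as follows. The example simulates one common variant of the conventional 3+3 design (no de-escalation, MTD declared as the dose below the one at which escalation stops) under a hypothetical vector of true toxicity probabilities. The function names and the scenario are illustrative assumptions; the same harness could be pointed at any adaptive design in order to compare operating characteristics.

```python
import random


def simulate_3plus3(true_tox, rng):
    """Run one 3+3 trial; return the index of the declared MTD.

    Returns -1 if escalation stops at the lowest dose, and the highest index
    if no dose triggers stopping.
    """
    dose = 0
    while dose < len(true_tox):
        # First cohort of three patients at the current dose.
        dlts = sum(rng.random() < true_tox[dose] for _ in range(3))
        if dlts == 1:
            # Exactly one DLT in three: expand the cohort to six patients.
            dlts += sum(rng.random() < true_tox[dose] for _ in range(3))
        if dlts >= 2:
            # Two or more DLTs: stop and declare the next lower dose.
            return dose - 1
        dose += 1
    return len(true_tox) - 1


def selection_frequencies(true_tox, n_trials, seed=0):
    """Proportion of simulated trials declaring each dose index as the MTD."""
    rng = random.Random(seed)
    counts = {d: 0 for d in range(-1, len(true_tox))}
    for _ in range(n_trials):
        counts[simulate_3plus3(true_tox, rng)] += 1
    return {d: c / n_trials for d, c in counts.items()}
```

For instance, `selection_frequencies([0.05, 0.15, 0.30, 0.50], 10000)` estimates how often each dose would be declared the MTD in that hypothetical scenario. Repeating this over several scenarios, and recording the number of patients treated at each dose, gives the kind of operating characteristics that are typically compared across designs.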