Bayes Linear Bayes Networks with an Application to Prognostic Indices

Bayes linear kinematics and Bayes linear Bayes graphical models provide an extension of Bayes linear methods so that full conditional updates may be combined with Bayes linear belief adjustment. The use of Bayes linear kinematics eliminates the problem of non-commutativity which was observed in earlier work involving moment-based belief updates. In this paper we describe this approach and investigate its application to the rapid computation of prognostic index values in survival analysis when a patient's values may only be available for a subset of covariates. We consider the use of covariates of various kinds and introduce the use of non-conjugate marginal updates. We apply the technique to an example concerning patients with non-Hodgkin's lymphoma, in which we treat the linear predictor of the lifetime distribution as a latent variable and use its expectation, given whatever covariates are available, as a prognostic index.


Introduction
A Bayes linear analysis (Goldstein and Wooff, 2007) differs from a full Bayesian analysis in that only first and second order moments are specified in the prior. Posterior (termed adjusted) moments are then calculated when data are observed. The introduction of Bayes linear kinematics and Bayes linear Bayes models (Goldstein and Shaw, 2004) extends Bayes linear methods to allow the incorporation of observations of types which are not readily accommodated in a straightforward Bayes linear analysis. For example, beliefs about certain unknown quantities might be updated by full conditional Bayesian inference when observations are made on conditionally Poisson or binomial variables, and then information can be propagated between these unknowns, or to other unknowns, via a Bayes linear belief structure. This approach avoids the need for computationally intensive methods such as Markov chain Monte Carlo which are often required in standard Bayesian analyses.
In this paper, we investigate the application of Bayes linear Bayes models and Bayes linear kinematics to the calculation of prognostic indices in medicine, particularly in survival or time-to-event problems. Many prognostic indices used in practice involve only a small number of covariates but other information may be available. We wish to be able to construct a prognostic index which can use information from a larger number of variables. An obstacle to doing this with conventional methods is that the extra variables might not always be observed. We wish to overcome this difficulty by being able to compute an index value even when some variables are not observed. This can be done in the case where all variables are jointly multivariate normal. It can also be done, in a Bayesian network, when all variables are categorical with finite sets of states (e.g. Lauritzen and Spiegelhalter, 1988; Jensen and Nielsen, 2007). Other cases are more computationally demanding and typically require the use of methods such as Markov chain Monte Carlo (MCMC) algorithms. We develop an approach which allows for a large number of potential covariates and permits the calculation of a prognostic index value when only a subset of the covariates is observed. The Bayes linear kinematic approach makes this possible in a straightforward way, with deterministic calculations, and, indeed, allows the index to be adjusted in a straightforward and commutative way if new covariate information is received. This opens the possibility of a fast and easy-to-use Web-based prognostic index calculator which might be used in different parts of the world where different subsets of covariates may be available. Covariates used in prognostic indices can take many forms, so we investigate a new approach to dealing with non-conjugate relationships in the marginal full-Bayes parts of the model. This extends the applicability of the Bayes linear Bayes approach.
In Section 2 we discuss prognostic indices and the case where some covariates might not be observed. In Section 3 we briefly outline Bayes linear methods, Bayes linear kinematics and Bayes linear Bayes graphical models. In Section 4 we extend previous work on Bayes linear Bayes graphical models by developing the case where the marginal full-Bayesian sub-models are non-conjugate. In Section 5 we discuss the application of these methods to the routine computation of prognostic index values for patients with terminal illness where only partial information may be available. We discuss the stages of constructing the Bayes linear Bayes network which is used to compute the index values. In Section 6 we illustrate this with an example involving patients with non-Hodgkin's lymphoma.

Prognostic Indices in Survival
Prognostic indices are used to help predict outcomes in patients, for example survival times in patients with a terminal illness. The value taken by the prognostic index depends on the clinical information about the patient. A prognostic index can be useful in making decisions about the appropriate treatment for a patient. For example, Henderson et al. (2001) described the importance of prognostic indices both for the selection of treatments and for enabling patients and their families to anticipate the future and so give the best support to the patient in the remaining years of life.
Typically, a prognostic index is a linear predictor based on the explanatory variables. In a proportional hazards model, the hazard function for patient i is λ_i(t) = λ_0(t)e^{η_i}, where λ_0(t)e^{β_0} is the baseline hazard, and the prognostic index of patient i is η_i = log[λ_i(t)/λ_0(t)]. A greater value corresponds to a worse prognosis. Typically we write

η_i = β_0 + β_1 x_{1,i} + · · · + β_J x_{J,i},    (1)

where (β_0, β_1, . . . , β_J) are the regression coefficients and x_{1,i}, . . . , x_{J,i} are the covariate values for the patient of interest. Thus the baseline hazard applies to a patient for whom all of the covariate values are zero. Usually we would determine values for the coefficients by fitting a suitable survival model to historical data. For example, if T_i is the survival time of patient i in the historical data then we might use a Weibull model with T_i ∼ Weibull(α, λ_i), where α is the shape parameter in the model and λ_i = exp{η_i} is the scale parameter. So η_i = log(λ_i) is the prognostic index for patient i.
Note that the predictive mean for the linear predictor for a new patient with covariate values x 1,i , . . . , x J,i is obtained by substituting the posterior means of β 0 , . . . , β J into (1), provided, of course, that all of the covariates are observed. This predictive mean is then given as the prognostic index value. Our method provides a means of dealing with the case where not all of the covariates are observed.
Once the prognostic index value for a patient has been computed, it might be transformed to help with interpretation. For example, we might use the distribution of prognostic index values in the historical data and determine the percentile of this distribution corresponding to a new index value. Thus we would give a score on a 0-100 scale.
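To make the index calculation and the percentile transformation concrete, here is a minimal sketch in Python; the coefficients, covariate values and historical index values are invented purely for illustration and are not taken from any fitted model.

```python
import numpy as np

def prognostic_index(beta0, beta, x):
    """Linear predictor: eta = beta_0 + sum_j beta_j * x_j."""
    return beta0 + float(np.dot(beta, x))

def percentile_score(eta_new, eta_historical):
    """Transform an index value to a 0-100 score using the empirical
    distribution of index values in the historical data."""
    return 100.0 * float(np.mean(np.asarray(eta_historical) <= eta_new))

# Made-up coefficients, covariates and historical index values:
beta0, beta = -2.0, np.array([0.5, -0.3])
eta = prognostic_index(beta0, beta, np.array([1.2, 0.4]))
hist = [-2.5, -2.0, -1.8, -1.5, -1.0, -0.5]
score = percentile_score(eta, hist)
```

In practice the historical index values would come from the fitted survival model, so the score locates a new patient within the historical case mix.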

Prognostic Networks and Unobserved Covariates
If we have a fixed list of covariates X = {X_1, . . . , X_J} then the index must be given by a function of the values x_1, . . . , x_J taken by these covariates. Conventionally, to construct a prognostic index, we try to choose a suitable function g(x_1, . . . , x_J). Suppose now that we have a list X_max = {X_1, . . . , X_J} of covariates which can be observed but that we might not always observe all of X_1, . . . , X_J; rather we observe some subset of X_max. Let the possible observed subsets be X_1, . . . , X_M. We would then need a different function g_m for each possible subset X_m. To do this in a coherent and principled way, we introduce the idea of a latent variable Z_T. When we supply an index value to a user, we will give our current expectation of Z_T, given the information available to us. We can choose Z_T to be a quantity on which the lifetime distribution depends, as in a traditional prognostic index. For example, in a Weibull model with survival function exp{−λ_i t^α} for subject i, we can take Z_{T,i} = η_i = log(λ_i) and report G_i = E(Z_{T,i} | X_i), where X_i is the subset of observations made for patient i. We refer to G_i as the predicted prognostic index value. Alternatively we can regard Z_T as the result of a transformation of the lifetime, T, itself, such that E(Z_{T,i} | η_i) = η_i, where Z_{T,i} is the value of Z_T for patient i. Again, in this case, we compute the current expectation of η_i as our predicted prognostic index value. This allows us to compute a (predicted) prognostic index value given observations of any subset of the possible covariates, for example when some values are missing or when some variables are only measured in certain cases. Additional flexibility is provided by modelling the joint distribution of Z_T and the covariates, often through latent variables Z_1, . . . , Z_J associated with the covariates, so that Z_T is not known precisely even when all of the covariates are observed. In this way we always use an expectation of Z_T as our declared index value.
Bayes linear methods applied to {Z 1 , . . . , Z J , Z T } require the specification of only the first and second moments of this collection and, when given the values of some of these variables, it is straightforward to compute the adjusted expectation of Z T . We outline Bayes linear methods in Section 3.1. In many cases, the observation of a covariate X j does not determine the value of the corresponding Z j but only changes its moments. The use of Bayes linear kinematics and a Bayes linear Bayes model allows us to deal with this situation in a way which is quick and efficient and, importantly, commutative so that covariate information may be introduced in any order without affecting the resulting index value. We outline Bayes linear kinematics and Bayes linear Bayes models in Sections 3.3 and 3.4.

Bayes Linear Kinematics and Bayes Linear Bayes Graphical Models

Bayes Linear Methods
In the standard Bayesian paradigm, we specify the full joint prior distribution for all unknown quantities. Using Bayes' theorem, we update our prior beliefs by conditioning on the observations and then calculating the posterior distributions. A Bayes linear analysis (Goldstein and Wooff, 2007) is distinct from the full Bayesian approach in that we only need to specify the first and second-order moments for the prior and then calculate posterior moments. For instance, if we have a vector random quantity Z, then we specify the prior expectation and variance of Z respectively as E_0(Z) and Var_0(Z). Furthermore, for two quantities Z_1 and Z_2, we also specify a prior covariance Cov_0(Z_1, Z_2). Suppose that we have two vectors A = (A_1, . . . , A_p)′ and B = (B_1, . . . , B_q)′, where A contains the observed quantities and B the inferential quantities, and that we have made a full second-order prior specification for the elements of Z = (A′, B′)′. Bayes linear methods provide a way to update beliefs about B by a linear fitting on A, using the Bayes linear updating equations for B adjusted by A:

E(B | A) = E_0(B) + H_{B|A}[A − E_0(A)],    (2)
Var(B | A) = Var_0(B) − H_{B|A}Var_0(A)H′_{B|A},    (3)

where H_{B|A} = Cov_0(B, A)Var_0^{−1}(A), provided that Var_0(A) is invertible.
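The adjustment in (2) and (3) is a small linear-algebra computation. The following sketch, with an invented one-dimensional second-order specification, illustrates it:

```python
import numpy as np

def bayes_linear_adjust(E_A, E_B, V_A, V_B, C_BA, a):
    """Bayes linear adjustment of B by the observation A = a,
    following equations (2) and (3)."""
    H = C_BA @ np.linalg.inv(V_A)      # H_{B|A} = Cov_0(B, A) Var_0(A)^{-1}
    E_adj = E_B + H @ (a - E_A)        # adjusted expectation, equation (2)
    V_adj = V_B - H @ V_A @ H.T        # adjusted variance, equation (3)
    return E_adj, V_adj

# Invented second-order specification for scalar A and B:
E_A, E_B = np.array([0.0]), np.array([0.0])
V_A, V_B = np.array([[2.0]]), np.array([[1.0]])
C_BA = np.array([[0.8]])               # Cov_0(B, A)
E_adj, V_adj = bayes_linear_adjust(E_A, E_B, V_A, V_B, C_BA, np.array([1.0]))
```

Note that the adjusted variance in (3) does not depend on the observed value a, only on the prior specification.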

Probability Kinematics
Bayes linear kinematics was introduced by Goldstein and Shaw (2004). It is named after the probability kinematics proposed by Jeffrey (1965) since the relationship of Bayes linear kinematics to Bayes linear methods can be seen as analogous to the relationship of probability kinematics to probabilistic conditioning. So, before discussing Bayes linear kinematics, we first briefly introduce probability kinematics.
Probability kinematics is a method for revising a probability specification which depends on new probabilities over a partition. Suppose that we have a partition A = (a_1, . . . , a_m) with corresponding probabilities Pr_0(a_k), where Σ_{k=1}^m Pr_0(a_k) = 1. Suppose that we have obtained some information I which causes us to update our probabilities of these events to Pr_1(a_1 | I), . . . , Pr_1(a_m | I). Then we can impose the so-called rigidity, or sufficiency, condition that, for any future event B,

Pr_1(B | a_k) = Pr_0(B | a_k),  k = 1, . . . , m.

Therefore, the new marginal probability of B can be found by probability kinematics on Pr_1(a_1 | I), . . . , Pr_1(a_m | I) as

Pr_1(B) = Σ_{k=1}^m Pr_0(B | a_k) Pr_1(a_k | I).

An important special case is where I consists of the observation of the value x of a random variable X, where X is judged to be probabilistically conditionally independent of B given A. See, for example, Diaconis and Zabell (1982). In this case, Pr_1(B) = Pr_0(B | X = x).
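The probability kinematic revision above amounts to a weighted sum, as the following sketch illustrates; the conditional and revised partition probabilities are invented for illustration.

```python
# Probability kinematics (Jeffrey's rule): revise Pr(B) when only the
# partition probabilities change, under the rigidity condition.
def jeffrey_update(pr_B_given_a, pr1_a):
    """pr_B_given_a[k] = Pr_0(B | a_k); pr1_a[k] = Pr_1(a_k | I)."""
    assert abs(sum(pr1_a) - 1.0) < 1e-12   # revised probabilities must sum to 1
    return sum(p * q for p, q in zip(pr_B_given_a, pr1_a))

# Invented two-element partition:
pr_B_given_a = [0.9, 0.2]     # Pr_0(B | a_1), Pr_0(B | a_2), unchanged by I
pr1_a = [0.7, 0.3]            # revised partition probabilities after I
pr1_B = jeffrey_update(pr_B_given_a, pr1_a)
```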
However, suppose that we have two different partitions, A_1 = (a_{1,1}, . . . , a_{1,m_1}) and A_2 = (a_{2,1}, . . . , a_{2,m_2}). Then successive probability kinematic revisions based on these two partitions might not commute: applying the revisions in the opposite order can lead to a different result. A simple example of non-commutativity is given by Wilson (2011). The conditions for commutativity are discussed by Field (1978) and Diaconis and Zabell (1982).

Bayes Linear Kinematics
Bayes linear kinematics was introduced by Goldstein and Shaw (2004). It is a special form of Bayes linear analysis where, instead of observing A, as in Section 3.1, we simply update our beliefs about this set of quantities after obtaining some information. Those changes in our beliefs can then be propagated to other unknown quantities within a Bayes linear structure. The idea is that, instead of observing A directly and using (2) and (3), we observe some relevant information I that changes our beliefs about A so that our mean and variance become E(A | I) and Var(A | I). Then we wish to propagate these updates to B. In particular, the information I may be observations on some related quantities X.
In order to propagate these changes in our beliefs, we could use a full-Bayes analysis, which requires a full probabilistic specification and more intensive calculations such as Markov chain Monte Carlo (MCMC) methods. However, by imposing the sufficiency condition that (2) and (3) continue to hold when we have not observed A but merely changed its mean and variance, we can use Bayes linear kinematics to adjust our mean and variance of Z. Applying (2) and (3) to B we obtain

E_1(B) = E_0(B) + H_{B|A}[E_1(A) − E_0(A)],    (4)
Var_1(B) = Var_0(B) − H_{B|A}[Var_0(A) − Var_1(A)]H′_{B|A}.    (5)

We can combine these into two updating equations for the whole of Z by writing

H_{Z|A} = (I_p, H′_{B|A})′,

where I_p is a p × p identity matrix. Then

E_1(Z) = E_0(Z) + H_{Z|A}[E_1(A) − E_0(A)]    (6)

and

Var_1(Z) = Var_0(Z) − H_{Z|A}[Var_0(A) − Var_1(A)]H′_{Z|A}.    (7)

Now suppose we wish to make multiple updates. As with probability kinematics, we need to consider commutativity. Suppose that we have J sets of random quantities Z_1, . . . , Z_J where, for j = 1, . . . , J, the elements of Z_j are arranged as a vector Z_j = (Z_{j,1}, . . . , Z_{j,n_j})′. The sets Z_j need not be disjoint. Suppose that a full second-order prior specification, S_0(Z) = [E_0(Z), Var_0(Z)], has been made for Z = Z_1 ∪ . . . ∪ Z_J and that the elements of Z are arranged in a vector Z. If we observed Z_k then (2) and (3) would lead to an adjusted expectation E^{(0)}(Z | Z_k) and variance Var^{(0)}(Z | Z_k). Now suppose that, instead of observing Z_k, information I_k is received which causes our beliefs about Z_k to be updated to S_1(Z_k | I_k) = [E_1(Z_k), Var_1(Z_k)].
Suppose that, in this new situation, if we now observed Z_k, our expectation would be E^{(1)}(Z | Z_k) and our variance would be Var^{(1)}(Z | Z_k). Then the Bayes linear kinematic update for Z, if we gain information I_k, is found by setting the adjustments to expectations and variances using specifications S_0(Z) and S_1(Z | I_k) equal to each other. That is, we can use (2) and (3) with A replaced by Z_k and B replaced by Z, giving updates of the form (6) and (7). Now suppose that, for each j (j = 1, . . . , J), information I_j is received once and beliefs are changed for Z_j. A Bayes linear kinematic update can be made for Z each time. However, successive Bayes linear kinematic updates are not necessarily commutative. Making an update given I_1 changes the moments of the Bayes linear structure and so the update using I_2 is changed. The Bayes linear kinematic method depends on the assumption that the updating formulae do not change. By straightforward repeated application of (6) and (7) we violate this assumption and it turns out that commutativity does not hold. It is necessary to define Bayes linear kinematic formulae for updating by the whole of the data, based on an assumption analogous to the assumption that (2) and (3) continue to hold, but which will apply commutatively to intermediate steps, whatever the order in which we enter the information. Goldstein and Shaw (2004) derived the conditions under which a commutative Bayes linear kinematic update exists and under which this update is unique. When a unique commutative update exists, the adjusted variance [P(Z | I)]^{−1} and expectation E(Z | I) are given by

P(Z | I) = Σ_{j=1}^J P(Z | I_j) − (J − 1)P(Z),    (8)
P(Z | I)E(Z | I) = Σ_{j=1}^J P(Z | I_j)E(Z | I_j) − (J − 1)P(Z)E(Z),    (9)

where I = (I_1, . . . , I_J), P(Z) = Var(Z)^{−1} is the prior precision matrix and P(Z | I_j) = Var(Z | I_j)^{−1}. Goldstein and Shaw give formal proofs in the general case.
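Equations (8) and (9) combine the information sources through precisions, and their right-hand sides are symmetric in the sources, which is what makes the update order-free. A sketch with an invented one-dimensional prior and two invented updates:

```python
import numpy as np

def blk_combine(prior_E, prior_V, updates):
    """Commutative Bayes linear kinematic combination, equations (8), (9).
    updates: list of (E_j, V_j) pairs, the revised moments of Z under each
    information source I_j."""
    P0 = np.linalg.inv(prior_V)
    J = len(updates)
    # Equation (8): combined precision.
    P = sum(np.linalg.inv(V) for _, V in updates) - (J - 1) * P0
    # Equation (9): precision-weighted combined expectation.
    rhs = sum(np.linalg.inv(V) @ E for E, V in updates) - (J - 1) * (P0 @ prior_E)
    V_adj = np.linalg.inv(P)
    return V_adj @ rhs, V_adj

# Invented scalar prior and two revised-moment updates:
E0, V0 = np.array([0.0]), np.array([[1.0]])
u1 = (np.array([0.5]), np.array([[0.5]]))
u2 = (np.array([1.0]), np.array([[0.25]]))
E12, V12 = blk_combine(E0, V0, [u1, u2])
E21, V21 = blk_combine(E0, V0, [u2, u1])   # same answer: the update is order-free
```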

Bayes Linear Bayes Graphical Models
In complicated models, experts can often make full marginal probabilistic specifications, but they are not able to assess the full joint probability distribution for all unknown quantities in the model. Goldstein and Shaw (2004) developed a formalism for updating beliefs about these quantities which depends on Bayes linear kinematics. They introduced Bayes linear Bayes graphical models with directed graph G = (V, E), where V is a collection of nodes representing quantities (X_1, . . . , X_J, Z_1, . . . , Z_K), each of which may be a vector, and E is a collection of edges. A Bayes graphical model is a model where the generalised conditional independence relationship is probabilistic conditional independence (Lauritzen, 1996; Cowell et al., 1999) and, by taking second-order belief as the generalised conditional independence, we obtain a Bayes linear graphical model. See Goldstein and Wilkinson (2000). A Bayes linear Bayes graphical model is a combination of fully Bayesian and Bayes linear graphical models, allowing conditioning on marginal distributions of any form and taking advantage of Bayes linear kinematics to incorporate full conditional updates within Bayes linear adjustments. The quantities Z_1, . . . , Z_K are connected in a Bayes linear structure. Each X_j, j = 1, . . . , J, depends on an associated Z_{k(j)}, which is one of Z_1, . . . , Z_K, through a conditional probability distribution and is conditionally independent of X_{j′} for j′ ≠ j and of Z_k for k ≠ k(j) given Z_{k(j)}. See Goldstein and Shaw (2004). In the rest of this paper we will assume that Z_1, . . . , Z_K are labelled so that k(j) = j.
Apart from Goldstein and Shaw (2004), published work on Bayes linear Bayes graphical models includes Wilson and Farrow (2010); Wilson et al. (2013); Gosling et al. (2013); Quigley et al. (2013) and Wilson and Farrow (2017). Figure 1 shows a very simple Bayes linear Bayes graphical model. We have two observable quantities, X_1 and X_2. Each X_j depends, through a conditional probability distribution, on a corresponding underlying quantity Z_j. So, beliefs about the pair (Z_j, X_j) are represented by a probability model. The relationship between Z_1 and Z_2 is described by a Bayes linear structure.

Figure 1: Bayes linear Bayes graphical model with two variables. The red undirected edge linking the pair (Z_1, Z_2) represents a Bayes linear structure (we could use a directed edge in either direction) and the directed blue arrows refer to the conditional distributions which allow full-Bayes analysis.

Example 1
For example, consider the following prognostic problem. Patients suffering from chronic obstructive pulmonary disease (COPD) suffer exacerbations from time to time. Current research includes the development of systems to monitor patients and predict when the next exacerbation might occur in each patient. See, for example, Guerra et al. (2017). Many covariates might be used but suppose, for the purpose of this example, that we just use the number, X_1, of exacerbations suffered by the patient in an earlier year. We are interested in X_2, the time, in days, until the next exacerbation. In a very simple model we might suppose that, given the value of λ_1, X_1 has a Poisson distribution with mean 365λ_1 and, given the value of λ_2, X_2 has an exponential distribution with mean λ_2^{−1}. Although we might expect λ_1 and λ_2 to be similar, we do not expect them to be equal since the condition of the patient changes over time. We have marginal prior distributions for λ_1 and λ_2 and we express our belief in the relationship between them by a Bayes linear structure involving quantities Z_1 and Z_2 which are related to λ_1 and λ_2 in some specified way. So we specify means and variances for Z_1 and Z_2 and a covariance between them. Then, given an observation of X_1 for a particular patient, our beliefs about Z_1 change. This changes our beliefs about Z_2, through the Bayes linear relationship, and therefore our beliefs about X_2 change, although it may be sufficient for us to have a revised expectation of Z_2.

Figure 2 shows a larger Bayes linear Bayes graphical model, with J = K = 5. Here X_5 is denoted T to represent survival time as this model is intended for use in survival prognosis. The m = J − 1 = 4 quantities X_1, . . . , X_4 represent covariates. We give Z = (Z_1, . . . , Z_K)′ a second-order prior specification. For generality, this Bayes linear structure is shown as a fully connected undirected graph in Figure 2.
In particular applications, directed edges can be used and conditional independence might lead to the omission of some edges. If we observe X_j, then Bayes' theorem is used to calculate the adjusted expectation E(Z_j | X_j) and variance Var(Z_j | X_j). These changes are passed through the rest of the network using Bayes linear kinematics. When m > 1 we use (8) and (9).

Example 2
Early work on Bayes linear Bayes models used conjugate prior distributions in the full-Bayes components and applied Bayes linear kinematics to the usual parameters of these distributions. For example, if the conditional distribution of X j given Z k(j) were Poisson with mean λ = Z k(j) , then a gamma prior distribution would be given to λ. Observing X j would lead to a posterior gamma distribution and the revised mean and variance would lead to changes in the moments in the Bayes linear structure which would be propagated by Bayes linear kinematics. Wilson and Farrow (2010) introduced the idea of using a transformation, or link function. This can make the use of a Bayes linear structure more reasonable by eliminating bounds on the values of parameters and removing mean-variance relationships. It can also, in certain cases, ensure that the conditions for (8) and (9) to be the unique commutative Bayes linear kinematic update are met. For example, in the Poisson case, we would use Z k(j) = η = log λ instead of λ and compute the adjusted mean and variance of η given X j , while still giving λ a gamma prior distribution. Wilson and Farrow (2017) retained the conjugate prior distribution but suggested using a "guide relationship", as used, for example, by West et al. (1985) and Gamerman (1991). The flexibility of Bayes linear kinematics allows us to use as the adjusted mean and variance of Z k(j) , after observing X j , values which are approximations to the posterior mean and variance given X j , rather than the exact values. 
In particular, in the Poisson case, instead of setting Z_{k(j)} = η = log λ and giving Z_{k(j)} the mean and variance of the resulting log-gamma distribution, we could give Z_{k(j)} the mean and variance of the normal distribution such that the corresponding lognormal distribution has the same mean and variance as the gamma prior for λ, or we could give Z_{k(j)} a mean equal to the mode of the distribution of η and a variance given by −[d²l(η)/dη²]^{−1}, evaluated at the mode, where l(η) is the log density of η. These options, in the Poisson case, are discussed by Wilson and Farrow (2017), who examine the effects of these choices on the adjustments propagated through a simple network.
Example 1 We illustrate three variations on the method using Example 1. For each of λ_1 and λ_2, we assess the prior mean to be 0.01 and the prior upper quartile to be about three times the prior lower quartile. This suggests a gamma Ga(1.8, 180) prior distribution for each, which has a variance of 0.01/180 = 5.56 × 10^{−5}. Suppose that, in each case, we use the same relationship for (λ_1, Z_1) as for (λ_2, Z_2), so E_0(Z_1) = E_0(Z_2) and Var_0(Z_1) = Var_0(Z_2), and, in each case, we give Z_1 and Z_2 a prior correlation of 0.9. Therefore H_{Z_2|Z_1} = 0.9. Suppose that, for a particular patient, we observe six exacerbations in the earlier year. The posterior distribution for λ_1 is then Ga(7.8, 545), with a mean of 0.0143 and a variance of 2.62 × 10^{−5}.
Using (4) and (5), we have, in each case,

E_1(Z_2) = E_0(Z_2) + 0.9[E_1(Z_1) − E_0(Z_1)],
Var_1(Z_2) = Var_0(Z_2) − 0.81[Var_0(Z_1) − Var_1(Z_1)].

Working directly in terms of λ_1 and λ_2, so Z_j = λ_j, we obtain the moments in the rows labelled "Direct" in Table 1.
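The "Direct" update can be reproduced with a few lines of arithmetic: a conjugate gamma update for λ_1 followed by the kinematic pass (4)-(5) with H = 0.9. The numbers below are our own calculation from the specification above, not values copied from Table 1.

```python
# Conjugate gamma update for lambda_1 given X_1 = 6 exacerbations in 365 days.
a0, b0 = 1.8, 180.0                 # Ga(1.8, 180) prior for each lambda_j
x1, exposure = 6, 365.0
a1, b1 = a0 + x1, b0 + exposure     # posterior Ga(7.8, 545) for lambda_1

prior_mean, prior_var = a0 / b0, a0 / b0**2
post_mean, post_var = a1 / b1, a1 / b1**2

# Kinematic pass (4)-(5) to lambda_2: prior correlation 0.9, equal
# prior variances, so H = 0.9 and H^2 = 0.81.
H = 0.9
E1_lam2 = prior_mean + H * (post_mean - prior_mean)
V1_lam2 = prior_var - H**2 * (prior_var - post_var)
```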
Using the method of Wilson and Farrow (2010), we obtain the moments in the rows labelled "WF10" in Table 1. The adjusted moments of Z_2 correspond to a Ga(4.466, 314.9) distribution for λ_2 | X_1. Now consider the "lognormal" method of Wilson and Farrow (2017). If we give Z_j a normal N(−4.826, 0.4418) prior distribution, then exp(Z_j) has the same mean and variance as the gamma prior for λ_j. Similarly, if Z_1 has a normal N(−4.307, 0.1206) distribution, then exp(Z_1) has the same mean and variance as the gamma posterior for λ_1. We obtain the moments in the rows of Table 1 labelled "WF17". Using the lognormal relationship, the adjusted moments of Z_2 correspond to the moments given for λ_2 | X_1 in Table 1.
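The lognormal moment-matching used here is a standard calculation: choose normal moments (μ, σ²) so that exp(Z) has a given mean and variance. A sketch which recovers the figures quoted above:

```python
import math

def lognormal_match(mean, var):
    """Normal moments (mu, s2) for Z such that exp(Z) has the given
    mean and variance (standard lognormal moment inversion)."""
    s2 = math.log(1.0 + var / mean**2)
    mu = math.log(mean) - s2 / 2.0
    return mu, s2

mu0, s20 = lognormal_match(1.8 / 180.0, 1.8 / 180.0**2)   # prior Ga(1.8, 180)
mu1, s21 = lognormal_match(7.8 / 545.0, 7.8 / 545.0**2)   # posterior Ga(7.8, 545)
# mu0, s20 ~ (-4.826, 0.4418); mu1, s21 ~ (-4.307, 0.1206)
```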
For comparison, we can use a full-Bayes analysis of a model in which λ j = exp(Z j ) and Z j ∼ N(m, v), j = 1, 2. We show the results of this with two different prior moment specifications. Firstly we set m = −4.826 and v = 0.4418, the prior moments used to illustrate the Wilson and Farrow (2017) method. The results are labelled "Full Bayes 1" in Table 1. Secondly we set m = −4.937 and v = 0.6635 which gives the specified mean and ratio of quartiles for λ j . We obtain the moments labelled "Full Bayes 2" in Table 1.
In this paper we consider a broad range of different kinds of covariates. Conjugate priors might not be available. We investigate the use of non-conjugate priors, typically using a one-dimensional numerical integration for each marginal update. For example, in the Poisson case, we set Z_{k(j)} = η = log λ but give η a normal prior and update its moments numerically when X_j is observed. We see that the resulting revised mean for a prognostic index is similar to that obtained by a full-Bayes calculation. This is described in Section 4.

Table 1: Prior and adjusted moments for Example 1 under each method. "WF17" refers to Wilson and Farrow (2017) "lognormal". Here η_j = log(λ_j) and η̂_j ≈ η_j.

Introduction
While there are advantages in using a link function, as in Wilson and Farrow (2010, 2017), the use of a conjugate prior and then calculation of the change in mean and variance of a transformed parameter is a somewhat restrictive arrangement. In addition, a normal prior distribution is symmetric, has unbounded support and has no mean-variance dependence. These features fit well with the use of Bayes linear updating between nodes in the network. A direct non-conjugate relationship to such a prior is closer to the usual structure in a full-Bayes analysis and, as we shall see, the results tend to be closer to those of the full-Bayes analogue. Furthermore our approach provides a general method which allows many different kinds of observational distributions. The price to be paid for this is that we need to use numerical integration to find the adjusted mean and variance of Z_j given X_j = x_j. However this is typically a one-dimensional integration and suitable fast approximation methods can often be used. Suppose that we give Z_j a prior distribution with density f_j(z) and that the likelihood from observing X_j = x_j is L_j(z; x_j). Then the posterior density of Z_j is proportional to g_j(z) = f_j(z)L_j(z; x_j). For example, in the Poisson case mentioned above, we might give a normal prior distribution to η = log λ and the likelihood is proportional to exp(−λ)λ^{x_j}.
In suitable cases, particularly where the support is unbounded, we might use a simple normal approximation to obtain posterior moments. Let G_j(z) = log g_j(z). Then we can find the maximum m of G_j(z) as an approximation to the posterior mean and use v = −[∂²G_j(z)/∂z²]^{−1}, evaluated at z = m, as an approximation to the posterior variance. Alternatively, we write the posterior mean as

E(Z_j | x_j) = ∫ z g_j(z) dz / ∫ g_j(z) dz,

and similarly for the posterior variance, and then use Laplace approximations (Tierney and Kadane, 1986) for the integrals in the numerator and denominator. Another possibility would be to use Gauss-Hermite quadrature (e.g. Naylor and Smith, 1982).
Example 1 To apply this method in Example 1, we use the same model as in the full-Bayes analysis except that we do not relate Z 1 and Z 2 through a bivariate normal distribution but simply through a second-order moment specification. We use the Laplace approximation to find the moments of Z 1 | X 1 . The effect of this change is then passed to Z 2 using (4) and (5) as in the other Bayes linear kinematic methods. The results are shown in the rows labelled "Nonconjugate" in Table 1. We can see that these results are the closest of all of the Bayes linear kinematic results to the corresponding full-Bayes moments ("Full Bayes 2").
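A minimal sketch of this non-conjugate update, assuming the "Full Bayes 2" prior moments for η_1 = log λ_1 and using the mode/curvature (normal) approximation rather than the Laplace ratio; the resulting numbers are our own calculation, not values taken from Table 1.

```python
import math

# Non-conjugate update for Example 1: normal prior on eta = log(lambda_1),
# Poisson(365 * exp(eta)) likelihood for X_1 = 6, then the kinematic
# pass (4)-(5) to Z_2.
m, v = -4.937, 0.6635          # prior moments giving mean 0.01, quartile ratio 3
x1, exposure = 6, 365.0

eta = m
for _ in range(50):            # Newton search for the posterior mode
    rate = exposure * math.exp(eta)
    grad = -(eta - m) / v + x1 - rate      # d/d eta of the log posterior
    hess = -1.0 / v - rate                 # second derivative (always negative)
    eta -= grad / hess
post_mean = eta
post_var = 1.0 / (1.0 / v + exposure * math.exp(eta))   # -1/hess at the mode

# Kinematic pass to Z_2 via (4) and (5): prior correlation 0.9 and
# equal prior variances, so H = 0.9.
E1_Z2 = m + 0.9 * (post_mean - m)
V1_Z2 = v - 0.81 * (v - post_var)
```

The log posterior here is strictly concave, so the Newton iteration converges from the prior mean without safeguards.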
The "Nonconjugate" moments are closer to the "Full Bayes 2" moments than the "WF17" moments are to either set of full-Bayes results. This is achieved with a very small increase in computation compared to the conjugate-update methods. The greatest discrepancy between the "Nonconjugate" and "Full Bayes 2" moments is in the adjusted variance Var(λ_2 | X_1), where the full-Bayes posterior variance is obtained by integrating over the posterior distribution of Z_2 to obtain the posterior variance of λ_2 = exp(Z_2), but the Bayes linear kinematic method has available only the first two moments of Z_2.
In the rest of Section 4 we consider some special types of observational variables, especially those which may be relevant to our prognostic index application.

Binary Variables
In the case of a binary variable X, a number of possibilities arise. One is simply to have separate conditional models for the two possible values of the variable. We will see an example of this in Section 5. Another is to let X = 1 if the corresponding Z ≥ 0 and X = 0 if Z < 0 and, for this purpose, we assign to Z a normal prior distribution. In this case the support of the posterior distribution is bounded at zero and, for this reason, a quadrature method rather than the normal or Laplace approximation would be used. We treat binary variables as a special case of ordinal variables. See Section 4.3.

Ordinal Variables
Suppose that we have ordered categories labelled 1, . . . , S and X takes the value of the category label. Then X = s if and only if c_{s−1} ≤ Z < c_s for a set of thresholds c_1, . . . , c_{S−1}, where c_0 = −∞ and c_S = ∞. In this case the posterior support is bounded, often both below and above, and we use a quadrature method to find the posterior moments.
Suppose that X_j is an ordinal variable and that the conditional mean of Z_j, given Z̃_j = z̃_j, is m_j + b′_j(z̃_j − m̃_j), where Z̃_j = (Z_1, . . . , Z_{j−1}, Z_{j+1}, . . . , Z_J)′ and m̃_j is the corresponding mean vector, and the conditional variance is w_j. Then, if Z_j | Z̃_j = z̃_j has a conditional normal distribution,

Pr(X_j = s | Z̃_j = z̃_j) = Φ([c_s − m_j − b′_j(z̃_j − m̃_j)]/√w_j) − Φ([c_{s−1} − m_j − b′_j(z̃_j − m̃_j)]/√w_j),

where Φ(·) is the standard normal distribution function. So the marginal model corresponds to a conventional ordinal regression with a probit link. For identifiability, when S > 2, we fix two thresholds, usually c_1 = 0 and c_{S−1} = 1. In the binary case, S = 2, we fix c_1 = 0 and w_j = 1.
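For instance, with a standard normal prior for Z_j and illustrative thresholds c_1 = 0 and c_2 = 1, observing the middle category bounds the posterior support to [0, 1), and a simple midpoint-rule quadrature gives the posterior moments:

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def truncated_moments(lo, hi, n=2000):
    """Posterior mean and variance of Z ~ N(0, 1) given lo <= Z < hi,
    by midpoint-rule quadrature on the truncated density."""
    h = (hi - lo) / n
    zs = [lo + (i + 0.5) * h for i in range(n)]
    w = [normal_pdf(z) for z in zs]
    m0 = sum(w) * h                                   # normalising constant
    m1 = sum(z * wi for z, wi in zip(zs, w)) * h / m0  # posterior mean
    m2 = sum(z * z * wi for z, wi in zip(zs, w)) * h / m0
    return m1, m2 - m1 * m1                            # mean, variance

mean, var = truncated_moments(0.0, 1.0)   # observed category: 0 <= Z < 1
```

The same routine handles the binary case (one bound infinite in principle, truncated in practice) and interval-censored observations.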
To do the numerical integration we proceed as follows. Suppose that the prior dis-

Unordered Categorical Variables
Dealing with a categorical variable with S > 2 unordered categories, other than by conditioning the whole model on the categories, requires the use of an underlying vector variable with dimension S − 1. This can be handled within the general Bayes linear kinematic and Bayes linear Bayes framework since an observation X_j can be associated with a subset Z_j = {Z_{j,1}, . . . , Z_{j,S}} containing more than one of the elements of Z. For example, we can set X_j = s if and only if Z_{j,s} = max_{s′} Z_{j,s′}, and this provides the likelihood. A constraint, such as Z_{j,1} = 0 or Σ_{s=1}^{S} Z_{j,s} = 0, is applied for identifiability. The elements of Z_j which are not fixed can then be given a multivariate normal prior distribution with marginal variances of 1.
A numerical integration of dimension S − 1 is required. This presents a difficulty if S is large but, in certain special cases, such as when the categories have a hierarchical structure, the (S − 1)-dimensional integration can be replaced with a sequence of integrations of lower dimension. We have not come across an application to prognostic indices where anything other than a low dimensional integration was required.
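The (S − 1)-dimensional integration can be done on a grid when S is small. The Python sketch below is illustrative only (the paper's own code is in R): it takes S = 3, fixes Z_{j,1} = 0 for identifiability, and assumes a largest-element likelihood, X = s iff Z_{j,s} is the maximum; the function name and grid settings are our own.

```python
import numpy as np

def categorical_update(mu, V, s_obs, n=401, half_width=7.0):
    """Posterior mean of the free underlying variables (Z_{j,2}, Z_{j,3})
    for an unordered categorical X with S = 3 categories, constraint
    Z_{j,1} = 0, assuming X = s iff Z_{j,s} is the largest element.
    mu, V: prior mean (2,) and covariance (2, 2) of the free elements."""
    sds = np.sqrt(np.diag(V))
    g2 = np.linspace(mu[0] - half_width * sds[0], mu[0] + half_width * sds[0], n)
    g3 = np.linspace(mu[1] - half_width * sds[1], mu[1] + half_width * sds[1], n)
    Z2, Z3 = np.meshgrid(g2, g3, indexing="ij")
    dev = np.stack([Z2.ravel() - mu[0], Z3.ravel() - mu[1]], axis=1)
    Vi = np.linalg.inv(V)
    prior = np.exp(-0.5 * np.einsum("ni,ij,nj->n", dev, Vi, dev)).reshape(n, n)
    Z = np.stack([np.zeros_like(Z2), Z2, Z3])            # Z_{j,1} fixed at 0
    lik = (Z[s_obs - 1] >= Z.max(axis=0) - 1e-12).astype(float)  # indicator
    post = prior * lik
    post /= post.sum()
    return np.array([np.sum(Z2 * post), np.sum(Z3 * post)])
```

Observing category 2, say, pulls the posterior mean of Z_{j,2} above its prior mean and that of Z_{j,3} below, as expected.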

Interval Censored Variables
A variable subject to interval censoring may be handled by methods similar to those in the case of ordinal variables. If X is not censored, then Z = X and we make a direct observation. If the observation is censored then we observe that c 1 ≤ Z < c 2 for lower and upper bounds c 1 and c 2 and the posterior support is bounded.

Introduction
Consider Figure 2. The nodes X 1 , . . . , X 4 represent covariates. Just as X j depends on Z j , we have a lifetime T which depends on Z 5 ≡ Z T . In general we can have covariates X 1 , . . . , X J depending on Z 1 , . . . , Z J respectively and then an additional dependent node T depending on an element Z J+1 of the Bayes linear structure. Then Z J+1 represents our prognostic index. When we observe some or all of the covariates this changes our expectation of Z J+1 and therefore the index value which we report.
In routine use, with a new patient, i, we need a mean vector and a variance-covariance matrix for the elements, Z 1,i , . . . , Z J,i , Z J+1,i of the Bayes linear structure. Some or all of these moments might differ between patients because of other information about individual patients. Specifically, we might select certain important variables which are always observed, such as Age and Sex, and condition the rest of the model on these. We will refer to these as permanent covariates. We propose to allow the mean vector m i of Z i = (Z 1,i , . . . , Z J,i , Z J+1,i ) for patient i to depend on the values of the permanent covariates for this patient, typically through a linear model. However we propose that the variance-covariance matrix, V , should not depend on the permanent covariates.
The variance-covariance matrix might be developed in a general, unstructured, way, as suggested by Figure 2. Alternatively we might impose some structure and exploit conditional independences, perhaps by introducing mediating nodes which induce correlation between related covariates. This might be done by expert judgement. A subjective covariance structure might be developed using an approach similar to methods described in Farrow (2003). On the other hand we might use analysis of historical data to help determine a suitable network structure. Methods for structure learning for Bayesian networks are discussed in, for example, Heckerman et al. (1995); Neapolitan (2003); Margaritis (2003); Wang (2015). In our Bayes linear Bayes structure, the dependencies between variables are specified by the variance-covariance matrix, V, of Z_i. Given a value for this matrix, it is straightforward to compute, for example, conditional correlations given the values of some elements and this can help in the search for potential conditional independences.
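Under joint normality, such conditional correlations come from the Schur complement of V. A minimal Python sketch (the function names are our own, for illustration):

```python
import numpy as np

def conditional_cov(V, obs_idx):
    """Covariance of the remaining elements of Z given the elements in
    obs_idx, under joint normality: the Schur complement of V."""
    idx = np.arange(V.shape[0])
    rest = np.setdiff1d(idx, obs_idx)
    Voo = V[np.ix_(obs_idx, obs_idx)]
    Vro = V[np.ix_(rest, obs_idx)]
    Vrr = V[np.ix_(rest, rest)]
    return Vrr - Vro @ np.linalg.solve(Voo, Vro.T)

def conditional_corr(V, obs_idx):
    """Conditional correlation matrix of the remaining elements."""
    C = conditional_cov(V, obs_idx)
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)
```

For instance, with correlations ρ^{|i−j|} among three elements, the first and third are conditionally uncorrelated given the second, which the function confirms.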
Once we have chosen a qualitative structure, we need to quantify it with values for means, variances and covariances. These might be chosen subjectively. More likely we will use historical data and an offline learning phase in which we fit an analogous model, with a fully specified prior distribution, using, for example, Markov chain Monte Carlo (MCMC) methods to compute posterior summaries for model parameters. We construct a model for the joint distribution of Z 1,i , . . . , Z J,i , Z J+1,i , the corresponding covariates, X 1,i , . . . , X J,i , and the lifetime, T . This provides a missing data model for any missing covariate values in the historical data. The model for offline learning is the same as the Bayes linear Bayes model except that we specify a full joint probability distribution and a prior distribution for the unknown model parameters, including the thresholds for ordinal covariates and the parameters involved in the means and the variance-covariance matrix in the Bayes linear structure.
Currently we use the posterior means from the offline learning as plug-in values for the model parameters in our Bayes linear Bayes model, a compromise which ignores the remaining parameter uncertainty. The historical data are, however, independent of future patients, given the model parameters. This raises the possibility that we can avoid any such compromise and obtain exactly the expectations which we need. For example, in (8) and (9), clearly we can obtain the posterior expectations of P(Z) and P(Z)E(Z) directly from the MCMC computations in the learning phase. However further work is required to address the problem of parameter uncertainty in the adjusted expectations and precisions. Nevertheless, with a large historical data set, such effects are likely to be small.
A possible alternative to using MCMC in the offline-learning phase is the integrated nested Laplace approximation (INLA) (eg Rue et al., 2009; Martino et al., 2011). Abdul Jalal (2020) discusses the use of INLA for survival data with missing covariate values.
Once we have a fully specified model, in routine use with new patients we compute adjusted expectations of the prognostic index given observations of some or all of the covariates. Because we can do the calculation when only a subset of the covariates are observed, we can include a greater number of potential covariates in our model and therefore use more information when it is available.

The Latent Variables Z
In the offline-learning phase, the vector Z_i = (Z_{1,i}, . . . , Z_{J,i}, Z_{J+1,i})′ has a multivariate normal distribution. To facilitate specification of a prior distribution for the (J + 1)(J + 2)/2 unique elements of V, we factorize the distribution of Z_i into the marginal distribution of Z_{1,i}, given X_i^{(p)}, and, for j > 1, conditional distributions of Z_{j,i} given Z_{1,i}, . . . , Z_{j−1,i} and X_i^{(p)}. That is, for j > 1, we regress Z_{j,i} on its predecessors. Thus

Z_{j,i} = m_{j,i} + Σ_{k=1}^{j−1} φ_{j,k} (Z_{k,i} − m_{k,i}) + ε_{j,i},   (10)

where φ_{j,k} is the regression coefficient of Z_{j,i} on Z_{k,i}, ε_{j,i} ∼ N(0, τ_j^{−1}) and ε_{j,i} is independent of ε_{j′,i′} unless j = j′ and i = i′. The means are given by

m_{j,i} = μ_j + Σ_{h=1}^{H} γ_{j,h} X_{h,i}^{(p)},

where the values of the permanent covariates for patient i are X_{1,i}^{(p)}, . . . , X_{H,i}^{(p)}. The {X_{h,i}^{(p)}} might include indicator variables for any categorical permanent covariates, such as Sex.
Our model parameters thus include the J + 1 means, μ 1 , . . . , μ J+1 , the (J + 1)H regression coefficients, γ 1,1 , . . . , γ J+1,H , on the permanent covariates and our new parameters for V which are the J +1 conditional precisions τ 1 , . . . , τ J+1 and the J(J +1)/2 coefficients φ j,k . The form (10) gives the possibility of introducing conditional independence structure, if required, by setting some of the regression coefficients to zero and thus omitting some edges in the directed graph. The regression parameters φ j,k are unrestricted and the conditional precisions τ j just need to be positive. Expressing prior beliefs in terms of these parameters allows greater flexibility in the specification of a prior distribution than using, for example, an inverse-Wishart prior. Furthermore, especially if the order of the variables is chosen to reflect prior perceptions of the direction of causality or possible conditional independences, we believe that prior specification in terms of these regression parameters is much easier than directly in terms of the elements of V .
The use of this reparameterisation of a variance matrix was introduced, in the context of models for longitudinal data, by Pourahmadi (1999), who explained its relationship to a square-root-free Cholesky decomposition of the precision matrix V −1 . An explanation of this relationship is given in the Supplementary Materials (Al-Taie and Farrow, 2022). See also Daniels and Pourahmadi (2002). Ibrahim et al. (2001) use a similar structure, applied directly to the covariates, as a missing-data model in survival and Zhao (2010) discusses this and other approaches to missing data in survival.
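The reparameterisation can be sketched numerically as follows. Writing T for the unit lower-triangular matrix with −φ_{j,k} below the diagonal and D = diag(τ_1, . . . , τ_{J+1}), the factorisation gives V^{−1} = T′DT. The Python below is an illustrative translation (the paper's own computations use R and JAGS); the function name is ours.

```python
import numpy as np

def cov_from_regressions(phi, tau):
    """Build V from the regression coefficients phi[j][k] (k < j) and the
    conditional precisions tau[j], via the square-root-free Cholesky
    factorisation V^{-1} = T' D T (Pourahmadi, 1999)."""
    J = len(tau)
    T = np.eye(J)
    for j in range(1, J):
        for k in range(j):
            T[j, k] = -phi[j][k]     # regression of Z_j on its predecessors
    D = np.diag(tau)
    return np.linalg.inv(T.T @ D @ T)
```

By construction, the conditional variance of the last element given its predecessors is exactly 1/τ_{J+1}, which can be checked against the Schur complement of the resulting V.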
If we wish to use the historical data to select a more sparse structure, with some edges omitted, then one possibility is to use a shrinkage prior for the parameters. See, for example, Wang (2015); Carvalho et al. (2009).

Marginal Models for Covariates
We need to specify the conditional distributions of the covariates X j given the corresponding variables Z j , and of T given Z T . Some such distributions involve parameters and we use the offline learning to choose values for these. We will discuss the case of the lifetime T in Section 5.4.
In the case of an ordinal covariate, it is necessary to learn about the values of the thresholds, c_1, . . . , c_{S−1}. However, since the mean and variance of Z are both unknown, for identifiability we fix two thresholds. Thus, if S = 3, we can fix c_1 = 0 and c_2 = 1. If S = 4, then we can fix c_1 = 0 and c_3 = 1 and then give c_2 a beta prior distribution.
For S > 4, we can fix c_1 = 0 and c_{S−1} = 1 and then give (u_1, . . . , u_{S−2}) a Dirichlet prior distribution and, for s = 2, . . . , S − 2, let c_s = Σ_{i=1}^{s−1} u_i.

Some continuous covariates may be suitable for direct inclusion in the Bayes linear structure, perhaps transformed through a suitable function g() so that Z_j = g(X_j). In this case Z_j might be observed. If we have more than one such directly observed element of Z, the covariance structure of Z might be such that the conditional independence conditions for the use of (8) and (9) are violated. To overcome this, when updating, we collect all such directly observed elements into a single vector observation, where there may be nonzero conditional covariances between the elements, given the non-directly observed elements of Z. The directly observed elements can include interval-censored variables when they are not censored. In other cases we might prefer a less direct relationship. For example we might set Z_j = E[g(X_j)]. We might then structure the graph so that each such X_j has a single parent Z_{k(j)} so that the conditional independence conditions given in Section 3.4 are satisfied. Then we can include g(X_j) in updating in the same way as other covariates except that we update the mean and variance of Z_{k(j)} using Bayes linear adjustment.
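The threshold construction from Dirichlet weights can be sketched in a few lines; this is a Python illustration of the mapping described above, with a function name of our own:

```python
import numpy as np

def thresholds_from_weights(u):
    """Map Dirichlet weights u = (u_1, ..., u_{S-2}), summing to 1, to the
    ordinal thresholds c_1 = 0 < c_2 < ... < c_{S-1} = 1, with
    c_s = sum_{i<s} u_i for s = 2, ..., S-1."""
    u = np.asarray(u, dtype=float)
    # Returned array holds (c_1, ..., c_{S-1}); c_{S-1} = 1 since sum(u) = 1.
    return np.concatenate([[0.0], np.cumsum(u)])
```

For S = 6 and weights (0.1, 0.2, 0.3, 0.4), this yields thresholds (0, 0.1, 0.3, 0.6, 1), automatically ordered and with the two fixed thresholds in place.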

Lifetime Distribution
In the routine use of our Bayes linear Bayes prognostic index with new patients, we do not observe T or Z_T. We simply need to compute the expectation of a quantity related to T, given the available information. We order the variables in (10) so that Z_T comes last. Then, to compute the expectation of Z_T given any information on the covariates, that is given any current values of the expectations of Z_1, . . . , Z_J, we do not need to specify τ_T, where τ_T^{−1} = σ_T² = Var(ε_T), the conditional variance of Z_T given Z_c = (Z_1, . . . , Z_J)′. To see this, first consider the dependence of Z_T on Z_c. The variance-covariance matrix of Z can be written as

V = ( V_c  c  )
    ( c′   V_T ),

where V_c is the variance-covariance matrix of Z_c, c is the covariance between Z_c and Z_T and V_T is the marginal variance of Z_T. Therefore we can write

E(Z_T | Z_c) = m_T + c′ V_c^{−1} (Z_c − m_c),   (11)

where m_T is the prior mean of Z_T and m_c is the prior mean of Z_c. Moreover (11) continues to hold when we gain information, I, which changes our beliefs about some or all of the elements of Z_c. In particular

E(Z_T | I) = m_T + c′ V_c^{−1} {E(Z_c | I) − m_c},

which does not involve τ_T. Further details are given in the Supplementary Materials (Al-Taie and Farrow, 2022).
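The adjusted index value E(Z_T | I) = m_T + c′V_c^{−1}{E(Z_c | I) − m_c} is a single linear-algebra step. A minimal Python sketch (illustrative names; the paper's own routines are R functions in the Supplementary Materials):

```python
import numpy as np

def adjusted_index(m_c, m_T, V_c, c, E_c_adj):
    """Adjusted expectation of the prognostic index Z_T after information I
    changes the expectation of Z_c from m_c to E_c_adj:
        E(Z_T | I) = m_T + c' V_c^{-1} (E(Z_c | I) - m_c).
    Note that tau_T, the conditional precision of Z_T, is not needed."""
    return float(m_T + c @ np.linalg.solve(V_c, E_c_adj - m_c))
```

With m_c = (0, 0)′, m_T = 1, V_c = [[1, 0.5], [0.5, 1]], c = (0.4, 0.2)′ and adjusted covariate expectations (1, 0)′, the index moves from 1 to 1.4.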
In the offline-learning phase, we use observed lifetimes and we do need to specify a model for the conditional lifetime distribution. An approach which is analogous to traditional survival models is to use a parametric model, such as a Weibull distribution, for the conditional lifetime distribution given Z_c. In the case of a Weibull distribution, we use a Weibull(α, λ_i) distribution for patient i, where η_i = log(ρ_i) = α^{−1} log(λ_i), and we set η_i = E(Z_{T,i} | Z_c). We do not really need to spell out the relationship between T_i and Z_{T,i}, merely to suppose that it exists and specify the relationship between the conditional mean of Z_{T,i} and the scale parameter of the lifetime distribution. However we can describe the relationship as follows, conditioning throughout on the values of m_{1,i}, . . . , m_{J,i} and φ_{J+1,1}, . . . , φ_{J+1,J}.
Specifically,

T_i = { −λ_i^{−1} log[ 1 − Φ( (Z_{T,i} − m*_{T,i}) / σ_T ) ] }^{1/α},   (12)

where Φ() is the standard normal distribution function, m*_{T,i} = E(Z_{T,i} | Z_c) and λ_i = exp(α m*_{T,i}). We show this as follows. The distribution functions of Z_{T,i} and T_i are Φ((z − m*_{T,i})/σ_T) and 1 − exp(−λ_i t^α) respectively; equating these and solving for T_i in terms of Z_{T,i} gives (12). The ratio of two quantiles in a Weibull(α, λ_i) distribution depends only on α. Similarly, the ratio of two quantiles in the lognormal(m*_{T,i}, σ_T²) distribution of exp(Z_{T,i}) depends only on σ_T². The quantiles at probability p are [−λ_i^{−1} log(1 − p)]^{1/α} and exp(m*_{T,i} + σ_T Φ^{−1}(p)) for these two distributions respectively. The logarithm of the ratio of two quantiles, at probabilities p_1 and p_2, is R_W(α, p_1, p_2) = α^{−1} log[log(1 − p_1)/log(1 − p_2)] in the Weibull case and R_LN(σ_T², p_1, p_2) = σ_T [Φ^{−1}(p_1) − Φ^{−1}(p_2)] in the lognormal case. We can fix the relationship between α and σ_T² by choosing p_1 and p_2 and setting R_W(α, p_1, p_2) = R_LN(σ_T², p_1, p_2). For example, we can use the upper and lower quartiles, so p_1 = 0.75 and p_2 = 0.25. This gives α = 1.166 σ_T^{−1}. An alternative would be to set η_i = Z_{T,i}, rather than η_i = m*_{T,i}, in a Weibull(α, λ_i) distribution, with η_i = log(λ_i)/α. This corresponds to introducing lognormal frailties. In this case τ_T^{−1} would be the variance of the logarithms of the frailties and we would need to learn about both α and τ_T in the offline learning. With univariate survival data, separating the effects of these two parameters in the likelihood depends on a change in the shape of the survival function and the degree of non-proportionality in the hazards.
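The quantile-matching step above can be checked numerically. The Python sketch below (an illustration with names of our own) solves R_W(α, p_1, p_2) = R_LN(σ_T², p_1, p_2) for α:

```python
from math import log
from statistics import NormalDist

def weibull_shape_from_sigma(sigma_T, p1=0.75, p2=0.25):
    """Match the log quantile ratio of a Weibull(alpha, lambda) lifetime to
    that of a lognormal with log-scale sd sigma_T, and return alpha.
    With the quartiles this gives alpha ~= 1.166 / sigma_T."""
    Phi_inv = NormalDist().inv_cdf
    # alpha * R_W(alpha, p1, p2) does not depend on alpha:
    num = log(log(1.0 - p1) / log(1.0 - p2))
    R_LN = sigma_T * (Phi_inv(p1) - Phi_inv(p2))
    return num / R_LN
```

Evaluating at σ_T = 1 with the quartiles reproduces the constant 1.166 quoted in the text.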
Of course, distributions other than the Weibull distribution could be used. Wilson and Farrow (2017) used Bayes linear kinematics in survival analysis using a semiparametric approach with a piecewise-constant hazard model.

Background
As an example, we use survival prognosis for patients with non-Hodgkin lymphoma. To build our model we used historical data collected by the Scotland and Newcastle Lymphoma Group from patients in Scotland and the North of England, UK (Proctor and Taylor, 2000). The data set contained 1391 patients. A large proportion of these patients had at least some missing covariate values. The data set was used by Sieniawski et al. (2009), who used fourteen of the many covariates. Details of these are summarised in the Supplementary Materials (Al-Taie and Farrow, 2022). Three of the fourteen covariates are quantitative, seven are binary, two are ordinal and one is an interval-censored continuous variable, where the observation was recorded as "normal", if the measurement was inside the normal range, or as an actual value if it was not. Only 636 (45.7%) of these patients had observations of all fourteen of these covariates recorded. The outcome variable is survival time, from diagnosis. This was right-censored in 653 cases, with 738 observed death times. For further details, see, for example, Zhao (2010); Abdul Jalal (2020). Abdul Jalal (2020) gives a survey of relevant medical literature which can be used in the construction of prior distributions.
Computer code used in the example is available for download at https://github.com/ZigZag1dr/AlTaieFarrow1.

Example with Six Covariates
To illustrate our method, we initially use a subset of these fourteen covariates, consisting of Age, Sex, Stage (Ann Arbor Stage, Carbone et al., 1971), an ordinal variable, Hb, blood haemoglobin concentration, a continuous variable, Wbc, white blood cell count, also treated as a continuous variable, and Albumin, a binary variable indicating whether serum albumin concentration was within the normal range. We used the logarithm of the white blood cell count to make the distribution more symmetric.
We chose to treat Age and Sex, which are always observed, as permanent covariates and to condition the rest of the model on these. Thus the means of Z_1, . . . , Z_J, Z_{J+1}, but not the variances and covariances, depend via a linear model on age and sex. We adopted a general covariance structure for the Bayes linear network, with Z_1, . . . , Z_4, for the remaining covariates, and Z_T related using the regression structure described in Section 5.2. Specifically, the covariates were numbered so that X_{1,i}, X_{2,i}, X_{3,i} and X_{4,i} are, respectively, Hb, the logarithm of Wbc, Stage and Albumin (0: normal, 1: abnormal) for patient i. Let X_{5,i} = T_i, the lifetime for patient i. Let X_{a,i} + 60 be the age in years, so that X_{a,i} is age centred at 60 years, and let X_{s,i} indicate the sex (−1: female, 1: male) of patient i. Then, for j = 1, . . . , 5, m_{j,i} = μ_j + γ_{j,a} X_{a,i} + γ_{j,s} X_{s,i}. The regression structure is then given by (10).
The marginal models were as follows. For j = 1, 2, X_{j,i} = Z_{j,i}. For j = 3 (Stage), X_{3,i} = s if and only if c_{s−1} ≤ Z_{3,i} < c_s for cut-points c_0 → −∞, c_1 = 0, c_2, c_3 = 1, c_4 → ∞. For j = 4 (Albumin), X_{4,i} = 0 if Z_{4,i} < 0 and X_{4,i} = 1 if Z_{4,i} ≥ 0. For j = 5 (lifetime), we used a Weibull distribution. Of course other forms of lifetime distribution could also be used, but, in our example, we used the Weibull form described in Section 5.4.

The computations in the offline-learning phase were done using JAGS through the R package rjags (Plummer, 2017; R Core Team, 2018). The JAGS model specification is given in the Supplementary Materials (Al-Taie and Farrow, 2022).
Having completed the offline learning, we calculated the Bayes linear kinematic (BLK) prognostic index values for all of the patients in the dataset with parameter values obtained from the offline learning. For comparison with the BLK prognostic index values, we also used MCMC to calculate "full Bayes" values. To make the results comparable and represent routine use in practice, we fixed parameters in the "full Bayes" calculation at the values obtained from the offline learning. The R code for the Bayes linear kinematic calculations and the JAGS model specification for the full-Bayes comparison are given in the Supplementary Materials (Al-Taie and Farrow, 2022). Figure 3 shows the BLK index values plotted against the corresponding "full-Bayes" values. We can see that the values from the BLK adjustment are very close to the posterior means from the full-Bayes analysis. To be able to see the agreement more clearly we show a Bland and Altman agreement plot (Bland and Altman, 1986) in Figure 4. The difference between the BLK index value and the "full-Bayes" value is plotted against the mean of these values for each patient. We can see that the mean difference is very close to zero and the differences are small compared to the scale of the index values. Some points are shown coloured. These are cases where the value of at least one covariate is missing. Only Hb, Wbc and Albumin have missing values. There were no missing values for Stage. There is no obvious pattern to the locations of the missing-value cases.
Using non-conjugate marginal updates with normal marginal priors has brought the results of the Bayes linear kinematic analysis closer to those obtained in an analogous full-Bayes analysis when compared to earlier work. The difference between the Bayes linear kinematic analysis and the full-Bayes analogue is now largely in how changes in belief are propagated through Z when data are observed, whether this is by Bayes linear kinematics using only first and second moments or a more complicated change in a joint probability distribution. This example suggests that, in practical terms, there is little difference between the results.

Example Using Ten Covariates
We also investigated the construction of a network using ten of the covariates used by Sieniawski et al. (2009). As in the 6-covariate example, we treated Age and Sex as permanent covariates. In the offline-learning phase, the additional four covariates were appended to the regression structure following the four in the six-covariate example, in the following order: LDH, serum lactate dehydrogenase level, units per litre, an interval-censored variable; ECOG, Eastern Cooperative Oncology Group performance status (Oken et al., 1982), an ordinal variable with five categories; AP, serum alkaline phosphatase concentration, a binary variable (normal or high); Urea, serum urea concentration, a binary variable (normal or high).
In the case of LDH, observations within the normal range were simply recorded as "normal" but values outside the normal range were recorded numerically. The normal range varies between centres of data collection. Zhao et al. (2014) converted LDH values into an ordinal variable by first dividing the LDH value by the upper limit of the normal range and then categorising this ratio, R, into three categories: 1 for R ≤ 1, 2 for 1 < R ≤ 3 and 3 for R > 3. Like Zhao et al. (2014), we transformed the LDH values using the limits of the normal range. However we did not convert observed values into an ordinal variable but rescaled so that, if Y is the original LDH value and y_0 and y_1 are the lower and upper limits of the normal range, then we used X = (Y − y_0)/(y_1 − y_0) as our covariate, so that the lower and upper limits are always 0 and 1. Values between these limits are interval censored. If the observation is not censored, then it is treated as a continuous, normally distributed, observation. If the observation is censored, then it is treated as an observation on an ordinal variable with three categories, 0, 1 and 2, with the observed value being the middle category, 1.
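The LDH preprocessing can be sketched as follows; the Python function names are hypothetical, for illustration only, and the rescaling X = (Y − y_0)/(y_1 − y_0) is as described above.

```python
def rescale_ldh(y, y0, y1):
    """Map an LDH value y, with centre-specific normal range [y0, y1], onto
    the scale where the normal range is always [0, 1]."""
    return (y - y0) / (y1 - y0)

def ldh_observation(record, y0, y1):
    """Turn a raw LDH record into either a continuous observation on the
    rescaled variable or the censored middle category of a three-category
    ordinal variable (0: below, 1: within, 2: above the normal range)."""
    if record == "normal":
        return ("ordinal", 1)   # interval censored within the normal range
    return ("continuous", rescale_ldh(record, y0, y1))
```

For example, a value of 600 with normal range [200, 450] is rescaled to 1.6, while a record of "normal" becomes a censored observation in the middle ordinal category.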
After completing the offline-learning, as in the six-covariate example, computation of the BLK index values for patients was very fast and no problems were encountered. The corresponding "full-Bayes" computation, used for comparison, involved sampling values for a number of unobserved elements of Z. If no covariates were missing and LDH was not censored, this number was six. Any missing covariate increased this number by one, as did LDH being censored. Mixing and convergence of the MCMC chains need to be considered in the "full-Bayes" calculation. In contrast, the Bayes linear kinematic computation is deterministic so there were no concerns about mixing or convergence. As in the six-covariate example, there is a good agreement between the index values given by Bayes linear kinematics and the "full-Bayes" analogue. Further details of this larger example are given in the Supplementary Materials (Al-Taie and Farrow, 2022).

Conclusions
In this paper, we have extended the applicability of Bayes linear kinematics and Bayes linear Bayes graphical models by developing the use of non-conjugate marginal prior distributions for observable variables. We have then described the application of these ideas to the routine calculation of prognostic index values in medical survival.
The purpose of our Bayes linear Bayes prognostic network is the quick and easy routine calculation of prognostic index values for new patients, potentially using a large number of covariates but able to work when only some of these are observed. The missing-data ability can be achieved in a full-Bayes model by modelling the joint distribution of all of the variables, rather than just the conditional distribution of the lifetime given the covariates. However, in a full-Bayes model, we would need to integrate over the joint distribution of the missing covariates, conditional on the observed values, which may be computationally demanding. In our Bayes linear Bayes network, even with non-conjugate marginal updates, we typically need no more than a series of one-dimensional integrations which can usually be done very quickly.
Furthermore, a full-Bayes analysis typically requires a lot of decisions to be made about the forms of relationships between variables and these choices may have little basis either in expert judgement or the analysis of historical data. In contrast, the Bayes linear Bayes approach requires a more limited specification of relationships in terms of first and second moments and focussing on these more limited judgements might lead to sounder choices.
Our method might be regarded as an approximation to a full-Bayes analysis. Wilson and Farrow (2017) compared the behaviour of Bayes linear kinematic belief adjustments with full-Bayes posterior inferences in the case of a piecewise constant hazard survival model and found that the results were generally close. Our use of non-conjugate updates brings our model closer to the corresponding full-Bayes model and gives Bayes linear kinematic adjusted expectations which are even closer to the full-Bayes posterior means.
Our prototype prognostic network produces prognostic index values using all, or only some, of the possible covariates almost instantly and has the potential to be used, for example, as a Web-based calculator.
An alternative method, not involving BLK or MCMC in the computation of an index value for a new patient, might be developed by using the offline-learning phase to develop a separate predictive formula for each possible configuration of missing and censored covariates. However the number of such configurations might be large and some might not be present in the historical data, or only represented by a small number of cases. It would therefore be necessary to use a model for the joint distribution of the covariates and, when different types of covariates are involved, the resulting calculations for the conditional expectation of the index might not be straightforward.

This paper is not intended to deal with questions such as variable selection. It is assumed that the covariates are all thought to contain information which is at least potentially useful in prognosis. Similarly, the paper is not concerned specifically with answering questions of whether particular covariates are or are not useful predictors of survival. We might, however, learn something about this latter question in the offline-learning phase. For example, a particular latent variable Z_j might turn out to be almost conditionally uncorrelated with Z_T given some other elements of Z. This latter possibility is interesting since there may be little to be gained by measuring X_j if certain other covariates have already been measured. This leads to the interesting decision problem of whether to observe additional covariates when making the observations might entail costs or risks.

Supplementary Material
Bayes linear Bayes networks with an application to prognostic indices Supplementary Materials (DOI: 10.1214/22-BA1314SUPP; .pdf). The Supplementary Material contains the following items. Numbers in parentheses refer to sections of this paper. Sections 1 and 2 give further details of the square-root-free Cholesky decomposition of the precision matrix (5.2) and the adjusted expectation of Z_T (5.4) respectively. Section 3 gives details of the data set in the non-Hodgkin's lymphoma example (6) and the analyses with six and ten covariates are described in Sections 4 and 5. JAGS model specifications for the offline learning and the "full-Bayes" comparison are given in Sections 7 and 9, while Section 8 gives R functions used in the Bayes linear kinematic calculations. The code is all available in a repository specified in Section 6.

References

Abdul Jalal (2020). "... many covariates and missing data." Ph.D. thesis, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK.

Zhao, Z., Sehn, L. H., Rademaker, A. W., Gordon, L. I., Lacase, A. S., Crosby-Thompson, A., and Vanderplas, A. (2014). "An enhanced International Prognostic Index (NCCN-IPI) for patients with diffuse large B-cell lymphoma treated in the rituximab era." Blood, 123(26): 837-842.