The Dynamic Chain Event Graph

Abstract: In this paper we develop a formal dynamic version of Chain Event Graphs (CEGs), a particularly expressive family of discrete graphical models. We demonstrate how this class links to semi-Markov models and provides a convenient generalization of the Dynamic Bayesian Network (DBN). In particular we develop a repeating time-slice Dynamic CEG, a useful and simpler model in this family. We demonstrate how the Dynamic CEG's graphical formulation exhibits the essential qualitative features of the hypotheses it embodies, and also how each model can be estimated in closed form, enabling fast model search over the class. The expressive power of this model class together with its estimation is illustrated throughout by a variety of examples, including ones describing childhood hospitalization and the efficacy of a flu vaccine.


Introduction
In this paper we propose a novel class of graphical models called Dynamic Chain Event Graphs (DCEGs) to model longitudinal discrete processes that exist in many diverse domains such as medicine, biology and sociology. These processes often evolve over long periods of time, allowing studies to collect repeated multivariate observations at different time points. In many cases they describe highly asymmetric unfoldings and context-specific structures, where the paths taken by different units are quite different. Our objective is to develop a framework for graphical representation, propagation, estimation, model selection and causal analysis for these dynamic processes.
In the literature there are various dynamic graphical models to model longitudinal data. The most widely used is the Dynamic Bayesian Network (DBN) (Dean and Kanazawa (1989); Nicholson (1992); Kjaerulff (1992)), where the process in each time-slice is modelled using a Bayesian Network (BN) and the temporal dynamics are embedded in the model by temporal edges connecting these different BNs; see Section 2.2. However, a DBN (and also a BN) does not allow us to model context-specific conditional independences directly in its graph and thus in the statistical model. A DBN does this only analytically and in a hidden way, by absorbing these context-specific statements into the implicit structures within its conditional probability tables.
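To make this concrete, consider a small, purely hypothetical conditional probability table (the variables and numbers below are invented for illustration): two of its rows coincide, encoding a context-specific independence that the DAG itself cannot display, since an edge must either be present or absent outright.

```python
# Hypothetical CPT for P(Y | A, B). The rows for (A=0, B=1) and (A=1, B=1)
# are identical, so Y is independent of A in the context B=1: a
# context-specific statement hidden inside the table, invisible in the DAG.
cpt = {
    (0, 0): {"y0": 0.9, "y1": 0.1},
    (1, 0): {"y0": 0.4, "y1": 0.6},
    (0, 1): {"y0": 0.2, "y1": 0.8},
    (1, 1): {"y0": 0.2, "y1": 0.8},
}
print(cpt[(0, 1)] == cpt[(1, 1)])  # True: Y independent of A when B = 1
```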
To allow for irregular time-steps in a DBN, Nodelman et al. (2002) suggested the development of a Continuous-Time BN whose variables evolve continuously over time. The model combines the standard BN with continuous-time Markov processes, where a graph describes the local dependence structure between the variables and the evolution of each variable is given by a set of conditional Markov processes. One problem with these models, however, is that exact inference is intractable and approximate techniques need to be used. This is a particular problem if one wants to support model selection techniques across a class.
Another interesting class of dynamic graphical models (also related to BNs) is the local independence graph (Didelez (2008)) or the graphical duration model (Gottard (2007)). Here it is assumed that data are available on the event history of a group of people, which includes the particular events that occur and the time until an event occurs. The dependence structure between the numbers of occurrences of each event is then depicted by a local independence graph. This, however, assumes that the conditional independences do not change with time and that relationships can be naturally expressed in terms of risks.
Here we propose a different graphical framework based on a tree to model longitudinal data which are observed at not necessarily regular time-steps. We can incorporate many potential context-specific conditional independences that may vary over time within this class. This enables us to estimate each model in a tractable and transparent way. In spite of their power and flexibility to model diverse domains, previous graphical models do not enjoy all these advantages.
Recently tree-based graphical models have been successfully used to describe various phenomena. A tree provides a flexible graphical support through which time sequences can be easily incorporated. Each path in the tree describes one of the various possible sequences of events an individual can experience. One such tree-based model is the Chain Event Graph (CEG) (Smith and Anderson (2008); Freeman and Smith (2011a); Thwaites (2013); Barclay et al. (2014)).
A Chain Event Graph (CEG) is a colored discrete graphical model constructed from a tree. The coloring of the vertices and edges of the tree represents elicited context-specific symmetries of a process (see Section 2.3). Although the topology of a CEG is usually more complicated than that of the corresponding discrete BN, when one exists, it is often much more expressive. In a CEG not only conditional independences but also context-specific symmetries are directly depicted in the topology and coloring of its graph. Furthermore, structural zero probabilities in the conditional probability tables are directly depicted by the absence of edges in its graph. See, for example, Smith and Anderson (2008); Barclay et al. (2013); Cowell and Smith (2014).
It has recently been discovered that a CEG also retains most of the useful properties of a BN, such as closure to learning under complete sampling (Freeman and Smith (2011a)) and causal expressiveness (Thwaites (2013); Thwaites et al. (2010); Riccomagno and Smith (2009); Thwaites and Smith (2006a)). It also supports efficient propagation (Thwaites et al. (2008); Thwaites and Smith (2006b)). Hence CEGs provide an expressive framework for various tasks associated with representing, propagating and learning, especially when the tree of the underlying sample space is asymmetric (French and Insua (2010)).
The CEG model resembles the Probabilistic Decision Graph (PDG) (Oliver (1993); Jaeger (2004)). This has been widely studied as an alternative inferential framework to the BN, and efficient model selection algorithms (Jaeger et al. (2006); Nielsen et al. (2010)) have been developed for it. However, unlike the PDG, the class of CEGs contains all discrete BNs (see Smith and Anderson (2008), section 3.2, p. 56 for the proof), as well as all the extensions of the BN to context-specific BNs (Boutilier et al. (1996); Friedman and Goldszmidt (1998)) and Bayesian multinets (Geiger and Heckerman (1996); Bilmes (2000)).
This fact guarantees that all the conditional independences entailed by these model classes are embodied in the topology of the graph of a single CEG (see for example Smith and Anderson (2008); Thwaites and Smith (2011) as well as many others). The topology of the CEG has hence been exploited to fully represent and generalize models such as context-specific BNs.
It has become increasingly apparent that there are many contexts that need a BN to be dynamic; see, e.g., Rubio et al. (2014). Currently there is no such dynamic CEG defined in the literature. Freeman and Smith (2011b) developed one dynamic extension of CEGs where the underlying probability tree is finite but the stage structure of the possible CEGs is allowed to change across discrete time-steps. This, however, is an entirely distinct class of models to the one considered here: it looks at different cohorts entering the tree at discrete time-points rather than assuming that repeated measurements are taken over time. In this paper we develop the DCEG model class, which extends the CEG model class so that it contains all dynamic BNs as a special case. In this sense we obtain an exactly parallel extension to the original CEG extension of the BN class. We show that any infinite tree can be rewritten as a DCEG, which represents the originally elicited tree in a much more compact and easily interpretable form. A DCEG thus provides an evocative representation of its corresponding process. It allows us to define many useful DCEG model classes, such as the Repeating Time-Slice DCEG (RT-DCEG) (see Section 3.4), that have a finite model parameter space.
A DCEG also supports conjugate Bayesian learning, where the prior distribution chosen in a family of probability distributions A together with the available likelihood function yields a posterior distribution in the same family A. This is a necessary requirement to guarantee analytical tractability and hence to design clever model search algorithms that are able to explore the large collections of hypotheses encoded within the DCEG model space. We further demonstrate that we can extend this framework by attaching holding time distributions to the nodes in the graph, so that we can model processes observed at irregular time intervals.
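As an illustration of the kind of conjugate learning meant here, the following sketch (assuming, as developed formally in Section 4, a Dirichlet prior on one stage's CPV and complete multinomial counts along its edges) updates the prior and evaluates the closed-form log marginal likelihood that a single stage would contribute to a model score:

```python
import math

def dirichlet_posterior(alpha, counts):
    """Conjugate update: a Dirichlet(alpha) prior on a stage's CPV combined
    with multinomial counts along its edges gives a Dirichlet(alpha + counts)
    posterior in the same family."""
    return [a + n for a, n in zip(alpha, counts)]

def log_marginal_likelihood(alpha, counts):
    """Closed-form log marginal likelihood of the edge counts for one stage:
    log [ Gamma(sum a) / Gamma(sum a + N) * prod_j Gamma(a_j + n_j) / Gamma(a_j) ]."""
    post = dirichlet_posterior(alpha, counts)
    return (math.lgamma(sum(alpha)) - math.lgamma(sum(post))
            + sum(math.lgamma(p) - math.lgamma(a) for a, p in zip(alpha, post)))

# Example: a stage with three emanating edges and a uniform Dirichlet(1,1,1) prior.
print(dirichlet_posterior([1, 1, 1], [10, 3, 7]))  # [11, 4, 8]
print(log_marginal_likelihood([1, 1, 1], [10, 3, 7]) < 0.0)
```

Because the score factorises over stages, summing such terms gives the full model score used for fast search.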
In Section 2 we present some important graph concepts and the definitions of a BN, a DBN and a CEG. In Section 3 we formally define the infinite staged tree and the DCEG. We further introduce the Extended DCEG, which attaches conditional holding times to each edge within the graph. We also introduce the RT-DCEG, a special class of DCEGs which imposes certain restrictions on the more general class. In Section 4 we show how to perform fast conjugate Bayesian estimation of these model classes and demonstrate how a typical model can be scored. In Section 5 we demonstrate that any general DBN lies in the class of DCEGs and so show that DCEGs are a formal extension of the successful class of DBN models. We then present some connections between the (Extended) DCEG model class and some (semi-)Markov processes. We conclude the paper with a short discussion.

Background
In this section we revisit some graph notions that will be useful for discussing graphical models. Next we explain briefly the BN and DBN models and then define a CEG model. See, e.g., Korb and Nicholson (2004); Neapolitan (2004); Cowell et al. (2007); Murphy (2012) for more detail on BNs and DBNs. The CEG concepts presented here are a natural extension of those in Smith and Anderson (2008); Thwaites et al. (2010); Freeman and Smith (2011a). These conceptual adaptations will allow us to directly use these concepts to define a DCEG model.

Graph Theory
Definition 1. Graph A graph G has a vertex set V(G) and a (directed) edge set E(G). For each directed edge e(v_i, v_j) ∈ E(G) we call v_i a parent of v_j, and we let pa(v_j) be the set of all parents of a vertex v_j. Similarly, call v_k a child of v_i if e(v_i, v_k) ∈ E(G) and let ch(v_i) be the set of all children of a vertex v_i. We say the graph is infinite when either the set V(G) or the set E(G) is infinite.
Definition 2. Directed Acyclic Graph A Directed Acyclic Graph (DAG) is a graph all of whose edges are directed, with no directed cycles; i.e. if there is a directed path from vertex v_i to vertex v_j then a directed path from vertex v_j to vertex v_i does not exist.
Definition 3. Tree A tree T = (V(T), E(T)) is a connected graph with no undirected cycles. Here we only consider directed rooted trees. In this case, a tree has one vertex, called the root vertex v_0, with no parents, while all other vertices have exactly one parent.
A leaf vertex in V(T) is a vertex with no children. A level L is the set of vertices that are equally distant from the root vertex. A tree is an infinite tree if it has at least one infinite path.

Definition 4. Floret The floret F(s_i) of a vertex s_i ∈ V(T) is the subtree of T where:
• its vertex set V(F(s_i)) consists of {s_i} ∪ ch(s_i), and
• its edge set E(F(s_i)) consists of all the edges between s_i and its children in T.

A Bayesian Network and a Dynamic Bayesian Network
A Bayesian Network is a probabilistic graphical model whose supporting graph is a DAG G = (V(G), E(G)). Each vertex v_i ∈ V(G) represents a variable Z_i and the edge set E(G) denotes the collection of conditional dependences that are assumed in the variable set. Thus the variable Z_j is conditionally independent of the variable Z_i given the variable set {Z_1, ..., Z_{j−1}} \ {Z_i} whenever an edge e(v_i, v_j) does not exist in E(G). Recall that the concept of conditional independence plays a central role in this model class (Dawid (1998); Pearl (2009), Chapter 1, p. 1–40). We give a simple example below.
Example 1. An individual is at risk of catching flu. Having caught flu he either decides to take antiviral treatment or not (Treatment variable, see Figure 1). If he takes the antiviral treatment we assume that he will always recover. On the other hand, if he does not take the antiviral treatment, he either manages to recover or he dies from the virus (Recovery variable, see Figure 1). Given a full recovery the individual can either decide to go back to his normal life or to receive an influenza vaccine to prevent him from being at risk again (Vaccine variable, see Figure 1). We further hypothesise that the decision to take a vaccine is conditionally independent of the decision to take the antiviral treatment given, of course, that the individual is alive. Thus the Recovery and Vaccine variables depend, respectively, on the Treatment and Recovery variables, but the Vaccine variable is conditionally independent of the Treatment variable given the Recovery variable. Figure 1 shows a standard BN to model this process. In Section 2.3 we will redepict this process using a CEG and demonstrate the extra expressiveness this gives us.
[Figure 1: BN with vertices Treatment, Recovery and Vaccine.]

Another class of graphical model we will discuss and compare in this paper is the Dynamic Bayesian Network (DBN), which models the temporal changes in the relationships among variables. It extends the BN conception directly.
A DBN G = (V(G), E(G)) can be interpreted as a collection of BNs {G_t = (V(G_t), E(G_t)); t = 1, 2, ...}, where the variables in time-slice t can also be affected by variables in the previous time-slices but not by variables in later ones. These dependences between time-slices are represented graphically by edges called temporal edges. Formally, G_t = (V(G_t), E(G_t) ∪ E_t), where E_t is the set of temporal edges {e(v_{i,τ}, v_{j,t}); τ < t} associated with a BN G_t, and v_{i,τ} represents a variable Z_i in time-slice τ < t.
In this paper we only consider discrete BNs and discrete DBNs where all variables have discrete state spaces.

A Chain Event Graph
A full description of this construction for finite processes can be found in Smith and Anderson (2008); Freeman and Smith (2011a); Thwaites et al. (2010). Here we summarise this development.
Definition 5. Event Tree An event tree is a finite tree T = (V(T), E(T)) where all vertices are chance nodes and the edges of the tree are labeled by the possible events that can happen. A non-leaf vertex of a tree T is called a situation, and S(T) ⊆ V(T) denotes the set of situations.
The path from the root vertex to a situation s_i ∈ S(T) therefore represents a sequence of possible unfolding events. The situation denotes the state that is reached via those transitions. Here we assume that each situation s_i ∈ S(T) has a finite number of edges, m_i, emanating from it. A leaf node symbolises a possible final situation of an unfolding process. An edge can be identified by the two situations s_i and s_j it connects (edge e(s_i, s_j)) or by a situation s_i and one of its corresponding unfolding events k (edge e_{s_i k}).
Definition 6. Stage We say two situations s_i and s_k are in the same stage, u, if and only if:
1. there exists an isomorphism Φ_{ik} between the labels of E(F(s_i)) and E(F(s_k)), where Φ_{ik}(e_{s_i j}) = e_{s_k j}, and
2. their corresponding conditional probabilities are identical.
When there is only a single situation in a stage, then we call this stage and its corresponding situation trivial.
If two situations are in the same stage then we assign the same color to their corresponding vertices. In other publications, for example Smith and Anderson (2008), the corresponding edges of situations in the same stage are also given the same color. For clarity, here we only color vertices and edges corresponding to non-trivial situations. We can hence partition the situations of the tree S(T) into stages, associated with a set of isomorphisms {Φ_{ik} : s_i, s_k ∈ S(T)}, and embellish the event tree with colors to obtain the staged tree. See Section 3.1 for formal details on how to embellish an event tree with a probability map.
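The stage construction can be sketched computationally. The snippet below (the tree encoding and the probabilities are hypothetical, not read off the paper's Figure 2) groups situations whose florets carry the same edge labels and the same conditional probabilities, i.e. Definition 6:

```python
from collections import defaultdict

# Hypothetical encoding of a finite event tree: each situation maps to its
# floret, a list of (edge_label, probability, child) triples. The numbers
# are invented for illustration.
florets = {
    "s0": [("treatment", 0.6, "s1"), ("no treatment", 0.4, "s2")],
    "s2": [("recovers", 0.8, "s5"), ("dies", 0.2, "l0")],
    "s1": [("resume normal life", 0.7, "l1"), ("get vaccine", 0.3, "l2")],
    "s5": [("resume normal life", 0.7, "l3"), ("get vaccine", 0.3, "l4")],
}

def stage_partition(florets):
    """Group situations whose florets carry the same edge labels and the
    same conditional probabilities (Definition 6): each group is one stage,
    and would receive one color in the staged tree."""
    stages = defaultdict(list)
    for s, floret in florets.items():
        key = tuple((label, p) for label, p, _ in floret)
        stages[key].append(s)
    return list(stages.values())

print(stage_partition(florets))  # [['s0'], ['s2'], ['s1', 's5']]
```

Here s_1 and s_5 fall into one non-trivial stage, mirroring the symmetry hypothesised in Example 1 below.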

Definition 7. Staged Tree
A staged tree version of T is one where:
1. all non-trivial situations are assigned a color,
2. situations in the same stage in T are assigned the same color, and
3. situations in different stages in T are assigned different colors.
We illustrate these concepts through a simple example on influenza, which we later develop further using a DCEG model.
Example 1 (continued). After eliciting the event tree (see Figure 2) corresponding to Example 1, we can hypothesize possible probabilistic symmetries in this process. For example, we might assume that recovering with or without treatment will not affect the individual's probability of deciding to get the vaccine. This demands that the probabilities on the edges emanating from s_1, labeled "resume normal life" and "get vaccine", are identical to the probabilities on the edges emanating from s_5 with the same labels. This assumption can be visualized by coloring the vertices of the event tree and their corresponding edges.
A finer partition of the vertices in a tree is given by the position partition. Let T(s_i) denote the full colored subtree with root vertex s_i.

Definition 8. Position Two situations s_i, s_k in the same stage, that is, s_i, s_k ∈ u ∈ U, are also in the same position w if there is a graph isomorphism Ψ_{ik} between the two colored subtrees T(s_i) → T(s_k). We denote the set of positions by W.
The definition hence requires that for two situations to be in the same position there must not only be a map between the edge sets E(T(s_i)) → E(T(s_k)) of the two colored subtrees, but also that the colors of any edges and vertices correspond under this map. For example, when all children of s_i, s_k are leaf nodes then T(s_i) = F(s_i) and T(s_k) = F(s_k). Therefore s_i and s_k will be in the same position if and only if they are in the same stage. But if two situations are further from a leaf, not only do they need to be in the same stage but each child of s_i must also correspond to a child of s_k in the same stage. This further applies to all children of each child of s_i, and so on.
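For a finite subtree this recursive reading of the definition can be sketched directly. In the sketch below the pairing of children by list order stands in for the edge isomorphism Ψ_{ik}, leaves are lumped into one dummy "leaf" label, and the data are hypothetical; on an infinite tree such naive recursion would of course not terminate.

```python
def same_stage(s, t, stage_of):
    """Two vertices carry the same color, i.e. the stage partition
    assigns them the same stage label."""
    return stage_of[s] == stage_of[t]

def same_position(s, t, children, stage_of):
    """Recursive sketch of Definition 8 for a finite subtree: s and t are
    in the same position if they are in the same stage and their
    corresponding children (paired by order, standing in for the edge
    isomorphism) are again in the same position."""
    if not same_stage(s, t, stage_of):
        return False
    cs, ct = children.get(s, []), children.get(t, [])
    if len(cs) != len(ct):
        return False
    return all(same_position(a, b, children, stage_of) for a, b in zip(cs, ct))

# Hypothetical data: s1 and s5 are in one stage and all their children are leaves.
children = {"s1": ["l1", "l2"], "s5": ["l3", "l4"]}
stage_of = {"s1": "u1", "s5": "u1",
            "l1": "leaf", "l2": "leaf", "l3": "leaf", "l4": "leaf"}
print(same_position("s1", "s5", children, stage_of))  # True
```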
Definition 9. Chain Event Graph (Smith and Anderson (2008)) A CEG C = (V(C), E(C)) is a directed colored graph obtained from a staged tree by successive edge contraction operations. The situations in the staged tree are merged into the vertex set of positions and its leaf nodes are gathered into a single sink node w_∞.
A CEG depicts not only the unfolding of events expressed in a tree but also the types of probabilistic symmetries.
Example 1 (continued). The hypothesis that recovering with or without treatment does not affect the probability of the individual taking the flu vaccine places s_1 and s_5 in the same position. We then obtain the CEG given in Figure 3, whose stages and positions are shown through its coloring. Note that the corresponding BN (Figure 1) cannot depict graphically the asymmetric unfolding of this process and the context-specific conditional statements.

Infinite Probability Trees and DCEGs
In this section we extend the standard terminology used for finite trees and CEGs to infinite trees and DCEGs. In the first subsection we derive the infinite staged tree, followed by a formal definition of the DCEG. In the next subsection we extend the DCEG to describe not only the transitions between the vertices of the graph but also the time spent at each vertex. Finally we define a useful class of DCEG models.

Infinite Staged Trees
Clearly an infinite event tree can be uniquely characterized by its florets, which retain the indexing of the vertices of T. The edges of each floret can be labeled as e_{s_i j} ∈ E(F(s_i)), j = 1, ..., m_i, where s_i has m_i children. As noted above, we can think of these edge labels as descriptions of the particular events or transitions that can occur after a unit reaches the root of the floret. In particular, we can also use the index j = 1, ..., m_i to define a random variable taking values {x_1, ..., x_{m_i}} associated with this floret.
Example 1 (continued). Assume now that the individual is at risk of catching the flu every month. As before, given a full recovery from the virus, with or without treatment, the individual can either decide to go back to his normal life, where he is at risk of catching flu again, or decide to receive an influenza vaccine to prevent him from being at risk again. As the tree is infinite, only an informal depiction of the corresponding tree can be given (Figure 4), where implicit continuations of the tree are indicated by the notation '...'.
In our example the edges E(F(s_0)) describe whether the individual catches the flu (edge e(s_0, s_1)) or not (edge e(s_0, s_2)), while the floret of vertex s_1 describes, having caught the flu, whether the individual takes the treatment (edge e(s_1, s_3)) or not (edge e(s_1, s_4)).
From the above example we can observe that each path within an infinite event tree is a sequence through time. To embellish this event tree into a probability tree we need to elicit the conditional probability vector (CPV) associated with each floret F(s_i). This is given by π_{s_i} = (π_{s_i 1}, ..., π_{s_i m_i}), where π_{s_i j} = P(e_{s_i j} | s_i) is the probability that the unit transitions from s_i along the jth edge, and ∑_{j=1}^{m_i} π_{s_i j} = 1. Collections of conditional independence (or Markovian) assumptions are intrinsic to most graphical models. For an event tree these ideas can be captured by coloring the vertices and edges of the tree as discussed above. This idea immediately extends to this class of infinite trees.
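Operationally, a CPV is simply a probability vector over the m_i edges of a floret. A minimal sketch of validating one and sampling the next transition from it (the helper name and the numbers are illustrative only):

```python
import random

def sample_edge(cpv, rng):
    """Draw the index j of the edge taken out of a situation, where
    cpv = (pi_{s_i 1}, ..., pi_{s_i m_i}) must sum to one."""
    assert abs(sum(cpv) - 1.0) < 1e-9, "a CPV must sum to one"
    return rng.choices(range(len(cpv)), weights=cpv)[0]

rng = random.Random(1)
# Illustrative CPV for a floret with two edges, e.g. "catches flu" / "no flu".
j = sample_edge([0.1, 0.9], rng)
print(j in (0, 1))  # True
```

Sampling an entire path through the tree amounts to repeating this draw floret by floret.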
Call U the stage partition of T and define the conditional probability vector (CPV) on a stage u to be π_u = (π_{u1}, ..., π_{u m_u}), where u has m_u emanating edges. If U is the trivial partition, such that every situation is in a different stage, then the coloring contains no additional information about the process that is not contained in T.
As above we can further define a CPV π_w = (π_{w1}, ..., π_{w m_w}) on each position w ∈ W. Surprisingly, the positions of an infinite tree T are sometimes associated with a coarser partition of its situations than those of a finite subtree of T with the same root. This is because in an infinite tree two situations lying on the same directed path from the root can be in the same position. This is impossible for two situations s_i, s_k in a finite tree: the subtree rooted at a vertex further along a path must necessarily have fewer vertices than the one closer to the root, so in particular no isomorphism between T(s_i) and T(s_k) can exist. We give examples below which explicate this phenomenon.
Note that we would normally plan to elicit the structural features of the model (here the topology of the tree and the stage structure associated with its coloring) before we elicit the associated conditional probability tables. This would then allow the early interrogation and adoption of the qualitative features of an elicited model before enhancing it with supporting probabilities. These structural relationships can be evocatively and formally represented through the graph of the CEG and DCEG. In particular, this graph can be used to explore and critique the logical consequences of the elicited qualitative structure of the underlying process before the often time-consuming task of quantifying the structure with specific probability tables.
Example 1 (continued). In the flu example we may have the staged tree given in Figure 5. Here we assume that the probability of catching flu does not change over the months and does not depend on whether flu has been caught before. This implies that s_0, s_2, s_5, s_8 and s_12 are in the same stage, as are all subsequent situations describing this event, which are not represented in Figure 4. Similarly, s_1 and s_11 are in the same stage, so that whether the antiviral medication is taken or not is also independent of the number of months until the individual catches flu and of flu having been caught before.
We further assume that the probability of the individual returning to his normal life after recovery is the same when he recovers after treatment as when he successfully recovers without treatment. This means that s_3 and s_7, as well as all other situations representing the probability of returning to a normal life after recovery, are in the same stage. It can be seen from the staged tree that, in this example, whenever two situations are in the same stage they are also in the same position, as their subtrees have the same topology and the same coloring. Not all paths in the tree are infinite and hence a set of leaf vertices, {l_6, l_9, l_10, ...}, exists.

Dynamic Chain Event Graphs
From the definition of a position w, given that a unit lies in w, any information about how that unit arrived at w is irrelevant for predictions about its future development. As for the CEG, the positions therefore become the vertices of the new graph, the DCEG, which we use as a framework to support inference. Further, colors represent probabilistic symmetries between positions in the same stage. Figure 6 depicts the DCEG corresponding to the staged tree shown in Figure 5 above.
We can now define the DCEG, which depicts a staged tree (see Definition 7 in Section 2.3) in a way analogous to the way the CEG represents structural equivalences.
Definition 10. Dynamic Chain Event Graph A Dynamic Chain Event Graph (DCEG) D = (V(D), E(D)) of a staged tree T is a directed colored graph with vertex set V(D) = W, the set of positions of the staged tree T, together with a single sink vertex w_∞ comprising the leaf nodes of T, if these exist. The edge set E(D) is given as follows: let v ∈ w be a single representative vertex of the position w; then there is an edge in E(D) from w to a position w′ for every edge e(v, v′) ∈ E(T) with v′ ∈ w′. When two positions are also in the same stage then they are colored in the same color as the corresponding vertices in the tree T.
We call the DCEG simple if the staged tree T is such that the partition into positions coincides with the partition into stages, W = U; it is then left uncolored.
A DCEG is thus obtained from the staged tree by edge contraction operations. Observe also that if two situations are in the same position w there is a bijection between their corresponding florets. Thus we can take any vertex in w to represent it.
Note that the DCEG class extends the class of CEG models since it can in principle have an infinite number of distinct vertices. When a tree is finite, a CEG is actually a DCEG. However, the CEG is always acyclic, whilst a DCEG can exhibit cycles (self-loops or loops across several vertices) when it has an infinite number of atoms but a finite graph. In this case a cycle represents a subprocess whose unfolding structure is unchanged over time. We illustrate below that in many applications the number of positions of a staged tree is finite even though the tree's vertex set is infinite. When this is the case the DCEG is a finite graph and therefore provides a succinct picture of the structural relationships in the process.
Example 1 (continued). Figure 6 shows the corresponding DCEG of the staged tree given in Figure 5, with V(D) given in Equation 4. The loop from w_0 into itself illustrates that every month the individual could remain well and not catch flu. Alternatively, the individual may move to w_1 at some point, meaning that he has caught flu. In this case he can recover either by getting treatment (w_1 → w_2) or on his own (w_1 → w_3 → w_2). Having recovered, the individual either decides to take a flu vaccine to avoid getting flu again (w_2 → w_∞) or simply resumes his normal life and risks getting flu again (w_2 → w_0). Finally, when not taking treatment, the individual may not recover, and hence move from w_3 to w_∞. Given the graph of a DCEG we can trace the possible paths an individual may take and the associated events that may occur across time.

So far we have implicitly assumed that we have regular time steps such as days or months. For instance, in the DCEG of the flu example (Figure 6), every month the individual is at risk of catching flu: if he catches flu, he traverses the rest of the DCEG, ending up either at w_∞ or back at w_0; if not, he loops back directly to w_0. In this case the time an individual stays in a particular position simply follows a geometric distribution: the probability that an individual stays in a position w with self-loop probability π_{ww} for k time steps is π_{ww}^{k−1}(1 − π_{ww}). Further, it has been assumed that once an individual catches flu, only the events of taking treatment, recovering, and receiving a vaccine are recorded, and not the times until these events occur. These could, for example, be recorded retrospectively when measurements are taken a month later. The holding time distributions on a position without a self-loop are therefore degenerate.
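This geometric holding time is easy to check by simulation. The sketch below (the self-loop probability 0.9 is illustrative, not an elicited value) draws repeated holding times at a position such as w_0 and compares the sample mean with the geometric mean 1/(1 − π):

```python
import random

def holding_time(self_loop_prob, rng):
    """Number of time steps spent at a position with a self-loop before
    leaving: each step the unit loops back with probability self_loop_prob,
    so the holding time K satisfies P(K = k) = p^(k-1) * (1 - p)."""
    k = 1
    while rng.random() < self_loop_prob:
        k += 1
    return k

rng = random.Random(0)
pi = 0.9  # illustrative probability of not catching flu in a given month
times = [holding_time(pi, rng) for _ in range(100_000)]
print(sum(times) / len(times))  # close to 1 / (1 - pi) = 10
```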
However, in many cases our process is unlikely to be governed by regular time steps and it is much more natural to think of the time steps as being event driven.
A process like this is naturally represented within a tree and hence a DCEG: when moving from one position to another, the individual transitions away from a particular state into a different state associated with a new probability distribution of what will happen next. For example, the individual may not record whether he catches flu every month but instead monitor the time spent at w_0 not catching flu, until one day he falls ill. Similarly, the time until seeing the doctor for treatment or the time until recovery may be of different lengths, and so he spends different amounts of time at each position in the DCEG. Motivated by this irregularity of events, we look at processes in which an individual stays a particular time at one vertex of the infinite tree and then moves along an edge to another vertex. We hence define in this section a generalization of the DCEG, called the Extended DCEG, which attaches a conditional holding time distribution to each edge in the DCEG.
We call the time an individual stays in a situation s_i the holding time H_{s_i} associated with this situation. We can also define the conditional holding times associated with each edge e_{s_i j}, j = 1, ..., m_i, in the tree, denoted by H_{s_i j}. This describes the time an individual stays at a situation s_i given that he moves along the edge e_{s_i j} next. Analogously, we can further define holding times on the positions in the associated DCEG: we let H_w be the random variable describing the holding time on position w ∈ W in the DCEG and H_{wj}, j = 1, ..., m_w, the random variable describing the conditional holding time on w given that the unit moves along the edge e_{wj} next.
In this paper we assume that all DCEGs are time-homogeneous. This means that the conditional holding time distributions for two situations are the same whenever they are in the same stage u. Hence, given the identity of the stage reached, the holding times are independent of the path taken. We denote the random variable of the conditional holding time associated with each stage by H_{uj}, j = 1, ..., m_u. Note that an individual may spend a certain amount of time in a position w ∈ u before moving along the jth edge to a position w′ which is in the same stage. So an individual may make a transition into a different position but arrive at the same stage.
We further assume throughout that the conditional probabilities of moving along a particular edge after reaching a stage do not vary with previous holding times. In the flu example this would mean that the time until catching flu does not affect the probability of taking treatment or the probability of recovery without treatment. Similarly, the holding times are assumed to be independent of previous holding times. So, for example, the time until recovery is independent of the time until catching flu. Contexts where the holding time distribution may affect the transition probabilities and future holding times provide an interesting extension to the DCEG and will be discussed in a later paper. Under these assumptions an Extended DCEG is defined below.
Definition 11 (Extended DCEG). An Extended DCEG D = (V(D), E(D)) is a DCEG with no loops from a position into itself and with conditional holding time distributions conditioned on the current stage, u, and the next edge, e_{uj}, to be passed through:

F_uj(h) = P(H_{wj} ≤ h | e_{wj}, w),  w ∈ u, j = 1, . . ., m_u.   (5)

Hence F_uj(h) describes the time an individual stays in any position w merged into stage u before moving along the next edge e_{wj}.
Consequently, given that a position w ∈ W(D) is reached, the joint probability of staying at this position for a time less than or equal to h and then moving along the j-th edge is

P(H_w ≤ h, e_{wj} | w) = π_uj F_uj(h).   (6)

Finally, the joint density of e_{wj} and h is

p(e_{wj}, h | w) = π_uj f_uj(h),

where f_uj is the pdf or pmf of the holding time at stage u going along the edge e_{wj}, w ∈ u, next.
An Extended DCEG with stage partition U is hence fully specified by its set of conditional holding time distributions {F_uj(·) : u ∈ U} and its collection of CPVs {π_u : u ∈ U}. Note that it is simple to embed holding times into the staged tree and into the DCEG, as exemplified below.
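As a concrete illustration of Definition 11, the sketch below encodes a single stage u of an Extended DCEG by its CPV and exponential conditional holding time distributions, and evaluates the joint probability of staying at a position for a time ≤ h and then leaving along edge j, π_uj F_uj(h). All numbers and names are invented for illustration, not taken from the paper's examples.

```python
import math

# A single stage u of an Extended DCEG (all numbers hypothetical):
# CPV over the m_u = 2 emanating edges, plus one exponential
# conditional holding-time rate per edge.
pi_u = [0.7, 0.3]        # pi_u1, pi_u2: edge transition probabilities
rate_u = [0.5, 2.0]      # exponential rates of the conditional holding times

def F_uj(j, h):
    """Conditional holding-time cdf F_uj(h) = P(H_wj <= h | edge j, w in u)."""
    return 1.0 - math.exp(-rate_u[j] * h)

def joint_prob(j, h):
    """P(stay at w for time <= h, then leave along edge j | w in u)."""
    return pi_u[j] * F_uj(j, h)

print(joint_prob(0, 1.0))  # probability of leaving via edge 1 within one time unit
```

As h grows the joint probabilities sum to 1 over the edges, recovering the CPV constraint.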
Example 1 (continued). Return again to the flu example from Section 3.1, now with the slightly different infinite tree given in Figure 7. Instead of measuring every month whether the individual catches flu, the individual will spend a certain amount of time at s_0 before moving along the tree. Hence the second edge emanating from s_0 in Figure 5 and its entire subtree have been removed. As before, it is assumed that the probability of catching flu and the decision to take treatment do not depend on whether flu has been caught before. Also, recovery with or without treatment is assumed not to affect the probability of receiving a vaccine. The corresponding Extended DCEG is given in Figure 8 with positions given by w_0 = {s_0, s_4, s_7, . . .}, w_1 = {s_1, s_10, s_11, . . .}, w_2 = {s_2, s_6, . . .}, w_3 = {s_3, . . .}, w_∞ = {l_5, l_8, l_9, . . .}.
In comparison to Figure 6, the loop from w_0 into itself has been removed. Instead the time spent at w_0 is described by the holding time at position w_0. Similarly, the time until treatment is or is not taken, the time until recovery or death, and the time until receiving the flu vaccine or not are of interest, and holding time distributions can be defined on these.

Definition 12 (Repeating Time-Slice DCEG). Consider a discrete-time process on I = {t_0, t_1, t_2, . . .} characterised by a finite collection of variables {Z_p, p = 1, 2, . . ., P}, where the index p defines the same unfolding variable order for each time-slice t ∈ I. Denote by Z_{p,t} the variable Z_p in time-slice t and assume that all situations corresponding to a variable Z_{p,t} define the level L_{p,t} in the corresponding event tree. Denote also by {s_{p_l, t_0}} and {s_{p_l, t_1}} the sets of situations associated with the last variable of time-slices t_0 and t_1, respectively. We have a Repeating Time-Slice DCEG (RT-DCEG) when all the previous conditions hold and there is a surjective map Υ : {s_{p_l, t_0}} → {s_{p_l, t_1}} such that Υ(s_{p_l, t_0}) is in the same position as s_{p_l, t_0} for every s_{p_l, t_0}.
The main characteristic of the RT-DCEG topology is that at the end of the second time-slice the edges loop back to the end of the first time-slice (see Figure 9). We now illustrate RT-DCEG modelling with a real-world example.
Example 2. We here consider a small subset of the Christchurch Health and Development Study, previously analysed in Fergusson et al. (1986) and Barclay et al. (2013). This study followed around 1000 children and collected yearly information about their family history over the first five years of the children's lives. We consider only the relationships among the three variables listed below.
• Financial difficulty - a binary variable describing whether or not the family is likely to have financial difficulties;
• Number of life events - a categorical variable distinguishing between 0, 1-2 and ≥ 3 life events (e.g. moving house, husband changing job, death of a relative) that a family may experience in one year;
• Hospital admission - a binary variable describing whether or not the child is admitted to hospital.
In this setting each time-slice corresponds to a year of a child's life, starting from when the child is one year old, t_0 = 1. A plausible RT-DCEG could be the one given in Figure 9. Note that this RT-DCEG assumes that whether the individual is admitted to hospital or not does not affect the subsequent variables. This is evident from the double arrows from w_3 to w_6, w_4 to w_7 and w_5 to w_8. Also observe that the variable describing hospital admission is not included at time t = 0, as it provides no additional information under this assumption. We start at w_0 in order to follow the path an individual might take through the DCEG across time. The first part of the graph describes the initial CPVs at time t_0. It is first resolved whether or not the family has financial difficulties (w_0 → w_1, w_0 → w_2) and whether the individual experiences 0, 1-2 or ≥ 3 life events during this year (w_1 → w_3, w_1 → w_4, w_2 → w_4, w_2 → w_5). She then reaches one of the three positions w_3, w_4 and w_5, describing a 'health state' the individual is in before a hospital admission may occur. Independently of whether an admission has occurred or not (w_6, w_7, w_8), she then moves to positions that describe the same three health states. Then, given that the individual is in one of the three health states (w_3, w_4, w_5) at time t, for t ≥ t_1, she traverses the graph in the following year according to the financial difficulty and number of life events in year t + 1 and ends up in one of the three health states again.
Note that the positions of the RT-DCEG encode the entire history of an individual, and we can trace back the full path the individual has taken through the graph. This is a property inherited from the event tree that supports the RT-DCEG graph. For instance, in Example 2 the probability of an individual having a hospital admission at time t is given by P(Adm = 1 | w_i) = π_wi, i = 3, 4, 5. It therefore depends on the position at which the individual is located at time t. These positions are reached depending on the number of life events and the financial difficulty in that year and on the health state of the previous year, which is in turn determined by the financial difficulty and the number of life events of the year before.

Bayesian Learning of the Parameters of an Extended DCEG
In this section we present the learning process for an Extended DCEG, which extends that for the CEG. Conjugate learning in CEGs is now well documented (Smith (2010); Freeman and Smith (2011a)) and the methods indeed resemble analogous learning in discrete BNs; see Korb and Nicholson (2004); Neapolitan (2004); Cowell et al. (2007); Heckerman (2008).
Here we consider only conditional holding time distributions F_uj parametrised by a one-dimensional parameter λ_uj. Assuming random sampling and prior independence of the vector π of all stage parameters and the vector λ of the holding time parameters, we can show that the posterior joint density of π and λ is given by

p(π, λ | h, N, D) = p_1(π | N, D) p_2(λ | h, N, D),   (7)

where h and N are, respectively, the vector of holding times associated with each stage and the vector of the number of times each edge is taken in the sample, and p_1(π | N, D) and p_2(λ | h, N, D) are the posterior distributions of the parameters π and λ, respectively. See Appendix A for more details.
Equation 7 guarantees that the parameters π and λ can be updated independently. The Extended DCEG learning can then be divided into two distinct steps: 1. learning the stage parameters π; and 2. learning the holding time parameters λ.
Learning the posterior p_1(π | N, D) therefore proceeds exactly analogously to learning within the standard CEG. Thus, assuming local and global independence of the stage parameters π and random sampling (conditions also assumed for conjugate learning of BNs), Freeman and Smith (2011a) show that with an appropriate characterisation each stage parameter vector must have a Dirichlet distribution a priori and a posteriori. Here we also assume these conditions to update the stage parameters π in a DCEG model. It then follows that

π_u | N ∼ Dirichlet(α_u1 + N_u1, . . ., α_um_u + N_um_u),

where α_uj is the hyperparameter of the prior distribution associated with edge j of stage u and N_uj is the number of times edge j of stage u is taken.
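A minimal sketch of this conjugate update for a single stage, with invented prior hyperparameters and edge counts:

```python
# Conjugate Dirichlet update of one stage's CPV (all numbers hypothetical):
# the posterior hyperparameters are alpha_uj + N_uj, edge by edge.
alpha_u = [1.0, 1.0, 1.0]   # prior hyperparameters for a 3-edge stage
N_u     = [12, 30, 8]       # observed counts of each edge being taken

alpha_post = [a + n for a, n in zip(alpha_u, N_u)]
post_mean = [a / sum(alpha_post) for a in alpha_post]   # posterior mean CPV
print(alpha_post, post_mean)
```

The posterior mean CPV is just the normalised vector of updated hyperparameters, so the whole stage update is a single pass over the counts.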
As with all Bayesian learning, some care needs to be taken in setting the hyperparameter values α_u. In the simplest case we assume that the paths taken in the associated infinite tree are a priori equally likely, and then specify the hyperparameters associated with each floret accordingly. Given that the Extended DCEG has an absorbing position w_∞ we can find, under the above assumptions, the α_u, u ∈ U, of the Extended DCEG structure D derived from the infinite tree by simply summing the hyperparameters of the merged situations. This direct analogue of Freeman and Smith (2011a) does not, however, work when no absorbing position exists, for then these sums diverge. Hence we need to take a slightly different approach. There are many possible solutions. Here we adapt the concept of 'equilibrium' in Markov chains and make the simplest assumption that our prior beliefs are 'in equilibrium' (see the discussion of Example 2 below). In Section 5 we analyse the connection between the Extended DCEG and Markov processes in more detail.
Note that when the holding time distributions are identical across the model space we have a DCEG, and thus the above conjugate analysis suffices to learn its parameters. To compare two different models we can then use the log Bayes factor as a score function. We illustrate below how we can update the CPVs of a DCEG using the Christchurch example (Section 3.4).
Example 2 (continued). Take the RT-DCEG depicted in Figure 9 (Section 3.4). Note that again the stages and positions of the graph coincide, and hence learning the stage parameters is equivalent to learning the position parameters of the graph. To specify the stage priors, we determine the hyperparameters α_u of the Dirichlet distribution associated with each stage u as suggested above. We first find the limiting distribution of the Markov process with state space W = {w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10, w_11} and with the transition probability matrix that assumes all paths in the graph are equally likely. So, for example, the transition probability from position w_9 is 2/3 to position w_3, 1/3 to position w_4 and 0 to any other position. The limiting distribution, together with an equivalent sample size of 3 (equal to the largest number of categories taken by any variable of the problem (Neapolitan (2004))), determines the strength of the prior on each stage. Further, assuming that the probabilities on the edges emanating from each position are uniform, we can deduce the stage priors given in Table 1.
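The limiting distribution used here can be computed by simple power iteration. The sketch below uses a small hypothetical 3-state transition matrix, standing in for the 9-position matrix of Example 2 (which is not reproduced in this section), purely to illustrate the computation:

```python
# Power iteration for the limiting distribution of a hypothetical 3-state
# chain; row i holds the transition probabilities out of state i.
P = [[0.0, 2/3, 1/3],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

pi = [1/3, 1/3, 1/3]                 # any starting distribution
for _ in range(500):                 # iterate pi <- pi P until convergence
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# Scale by an equivalent sample size of 3 to obtain stage prior strengths.
alpha_strength = [3 * p for p in pi]
print(pi, alpha_strength)
```

Each stage's prior strength is then split uniformly over its emanating edges, as in the text.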
We can now update these priors separately and in closed form for each stage using the data. The data set follows 1062 children born in Christchurch, New Zealand, over years 2-5 of their lives. We use the data from year 2 to update the initial positions w_0, w_1 and w_2, and then use the hospital admission variable of year 2, as well as years 3-5, to update the remaining CPVs. Doing so, we obtain the posterior distributions associated with each stage given in Table 1. We also present their corresponding means and 95% credible intervals.
Thus, for example, the expected probability of a child being admitted to hospital given that she has reached position w_3 is 0.07. Reaching w_3 represents three possible developments: i) she was previously in position w_3 and had fewer than 3 life events in the current year; ii) she was previously in position w_4, had no financial difficulties and had fewer than 3 life events in the current year; or iii) she was previously in position w_4 and had financial difficulties but no life events in the current year.
Similarly, we have that the probabilities of an admission when reaching w 4 and w 5 are 0.11 and 0.13, respectively.
Next we consider the updating of the prior holding time distribution p(λ | D) to its posterior distribution p(λ | h, N, D) using the holding time component of the likelihood. Here we restrict ourselves to some examples of conjugate learning. One option is to assume that each conditional holding time has a Weibull distribution W(λ_uj, κ_uj) with known shape κ_uj. If we set κ_uj = 1 the holding time is exponentially distributed, and the parameter λ_uj corresponds to the average rate at which transitions from stage u along edge j occur. In this case it is implicitly hypothesised that these transition events happen at a constant expected rate over time and are mutually exchangeable given a DCEG model. However, we can allow the average transition rate to vary over time by adjusting the shape parameter κ_uj: for κ_uj < 1 this rate decreases over time and for κ_uj > 1 it increases over time.
To learn the parameters of the conditional holding time distributions, the priors on λ_uj are assumed to be mutually independent and to have inverse-Gamma distributions IG(α_uj, β_uj). This enables us to perform conjugate analyses which are analytically tractable. It also allows us to incorporate background prior domain information: the hyperparameters α_uj and β_uj are directly linked to the expected mean and variance of the transition times, about which domain experts can provide prior knowledge. Of course, in certain contexts the prior parameter independence assumption may not be appropriate because the transition times are mutually correlated. In these situations conjugacy would likely be lost, requiring other methods such as MCMC to find the corresponding posterior distribution.
Under the assumptions discussed above for conjugate learning, the posterior of the rate under this model is given by

λ_uj | h, N, D ∼ IG(α_uj + N_uj, β_uj + Σ_{l=1}^{N_uj} h_ujl^{κ_uj}),

where h_ujl, l = 1, . . ., N_uj, are the conditional holding times for each unit l that leaves stage u through edge j.
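For the exponential case (κ_uj = 1) the update reduces to adding the number of transitions to α_uj and the total observed holding time to β_uj. A sketch with invented prior values and data:

```python
# Inverse-Gamma conjugate update for an exponential holding time
# (kappa_uj = 1); all prior values and observations are hypothetical.
alpha_uj, beta_uj = 2.0, 2.0          # prior IG(alpha, beta); prior mean beta/(alpha-1) = 2
holding_times = [1.2, 0.7, 3.1, 0.4]  # observed h_ujl for this stage/edge pair

alpha_post = alpha_uj + len(holding_times)   # alpha_uj + N_uj
beta_post = beta_uj + sum(holding_times)     # beta_uj + sum_l h_ujl
post_mean = beta_post / (alpha_post - 1)     # posterior mean holding time
print(alpha_post, beta_post, post_mean)
```

The posterior mean shrinks from the prior mean of 2 towards the sample average of the observed holding times.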
Weibull distribution with scale parameter λ_u32 and known shape parameter κ_3 < 1, indicating that the death rate decreases with time. The holding times H_u21 and H_u22 could again have exponential distributions with parameters λ_u21 and λ_u22, respectively. Here the time until getting the vaccine or resuming a normal life is measured.
If inverse-Gamma priors on λ_u01, λ_u11^κ1, λ_u12, λ_u21, λ_u22, λ_u31^κ2 and λ_u32^κ3 are assumed, a conjugate analysis as described above can be carried out. The priors can be specified by assuming two conditions: i) a prior mean equal to 1 for all prior holding times; and ii) an equivalent sample size corresponding to the strength of the prior belief in the edge associated with each conditional holding time distribution (see Table 2). Then, given a complete random sample of individuals going through the Extended DCEG for a certain length of time, the number of times, N_uj, each edge, e_uj, is used can be recorded, as well as the time spent at each position before moving along a particular edge. The prior distributions on π and λ can then be updated in closed form by Equations 10 and 9, respectively. The CPVs and the expected time spent at each position before moving along a certain edge can thus be calculated.
Because the estimation above is in closed form, the corresponding marginal likelihood can easily be computed. Note first that the marginal likelihood of an Extended DCEG structure given a complete random sample, L(D | h, N), separates into two parts, one associated with the stages and the other with the holding times:

L(D | h, N) = L_1(D | N) L_2(D | h, N).

The stage component takes the standard Dirichlet-multinomial form

L_1(D | N) = ∏_{u ∈ U} [Γ(ᾱ_u) / Γ(ᾱ_u + N̄_u)] ∏_{j=1}^{m_u} [Γ(α_uj + N_uj) / Γ(α_uj)],

where ᾱ_u = Σ_j α_uj and N̄_u = Σ_j N_uj. After a little algebra the second component associated with, for example, exponential holding time distributions can be written as

L_2(D | h, N) = ∏_{u ∈ U} ∏_{j=1}^{m_u} [β_uj^{α_uj} Γ(α_uj + N_uj)] / [Γ(α_uj) (β_uj + Σ_{l=1}^{N_uj} h_ujl)^{α_uj + N_uj}].

When the prior distributions on λ are the same for all Extended DCEG structures the log marginal likelihood, log L(D | h, N), can be written as a linear function of scores associated with the different components of the model. This overall linearity of the score is an important property to exploit when devising techniques for traversing the Extended DCEG model space, since without further constraints the size of this space is vast.
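Assuming the standard Dirichlet-multinomial form for the stage component, its log marginal likelihood accumulates additively over stages using log-gamma functions. The sketch below uses invented priors and counts for two hypothetical stages:

```python
import math

# Log marginal likelihood contribution of the stages (Dirichlet component),
# one additive term per stage; priors and counts are hypothetical.
stages = [
    # (alpha_u, N_u) per stage
    ([1.0, 1.0], [10, 5]),
    ([1.0, 1.0, 1.0], [4, 4, 2]),
]

def stage_score(alpha_u, N_u):
    """log { Gamma(sum a)/Gamma(sum a + sum n) * prod_j Gamma(a_j+n_j)/Gamma(a_j) }."""
    s = math.lgamma(sum(alpha_u)) - math.lgamma(sum(alpha_u) + sum(N_u))
    for a, n in zip(alpha_u, N_u):
        s += math.lgamma(a + n) - math.lgamma(a)
    return s

log_ml_stages = sum(stage_score(a, n) for a, n in stages)
print(log_ml_stages)
```

Because the total score is a sum of per-stage terms, moving one situation between stages only changes the terms of the affected stages, which is what makes local model search cheap.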

Discussion
In this section we discuss the association between DCEGs and three other dynamic models: DBNs, Markov chains and semi-Markov processes.

The Relationship between DBNs and DCEGs
Here we demonstrate that discrete DBNs (see Section 2.2) constitute a special DCEG class. We then discuss some pros and cons of using one or the other model. Smith and Anderson (2008) and Barclay et al. (2013) have shown how a BN can be written as a staged tree and hence as a CEG. This extends simply to the dynamic setting, and we explain below how a DBN can be represented as an infinite staged tree and therefore as a DCEG. It is also easy to check that many other processes, such as dynamic context-specific BNs (Boutilier et al. (1996); Friedman and Goldszmidt (1998)) or dynamic Bayesian multinets (Geiger and Heckerman (1996); Bilmes (2000)), are amenable to this representation. Here, to match our methods against the usual formulation of the DBN, we focus only on the DCEG (Section 3.2), where one-step transitions are known and holding times do not need to be explicitly considered.
Let {Z_t : t ∈ I}, where I = {t_0, t_1, t_2, . . .}, be a vector stochastic process. Assume that at each time point t we have a vector of n_t variables Z_t = (Z_{1,t}, . . ., Z_{n_t,t}), and that the components Z_{p,t}, p = 1, . . ., n_t, all take a finite number of values. The variables Z_t then form a time-slice of the DBN for each time point t. In the most general case, the DBN on Z_t has an associated infinite acyclic directed graph G in which the component Z_{p,t} of Z_t has parents pa(Z_{p,t}) = {Z_{q,s} : s < t, q ∈ {1, . . ., n_s}} ∪ {Z_{q,s} : s = t, q ∈ {1, . . ., p − 1}}.
Next we claim that any general DBN can be written as an infinite staged tree.
To demonstrate this, we first show how to write the variables of the DBN as an infinite tree.We then define the conditional independence statements of the DBN by coloring the florets in the tree to form a stage partition of the situations.
Reindex the variables as Z_k = Z_{p,t}, k = 1, 2, 3, . . ., so that whenever Z_i = Z_{q,s} ∈ pa(Z_{p,t}) we have i < k. This ensures that parent variables come before their children and that earlier time-slices come before later ones. Such an indexing always exists because of the acyclicity and time ordering of G. This gives a potential total ordering of the variables in {Z_t : t ∈ I}, from which we choose one. Let a(Z_k) = {Z_i : i < k} be the set of antecedents of Z_k for the chosen variable order. Note that pa(Z_k) ⊆ a(Z_k).
By the assumptions of the ordering, the components up to index k can be represented by a finite event tree denoted by T_k = (V_k, E_k). Recall from Section 3.1 that each floret in the tree can be associated with a random variable Z_i and that the edges e_ij, j = 1, . . ., m_i, describe the m_i values in the sample space of that random variable. Hence the root-to-leaf paths in the tree T_k correspond to the set of all combinations of values that the variables Z_1, . . ., Z_k can take. A sequential construction of the stochastic process then allows us to define a set of trees {T_k}_{k≥1}, with T_k a subtree of T_{k+1}, recursively as follows:

1. For k = 1, let T_1 be the floret F(s_0) associated with Z_1, which can take m_1 values. Therefore V_1 = {s_0, l_11, l_12, . . ., l_1m_1} and E_1 = {e_s0j : j = 1, . . ., m_1}.

2. Given T_k = (V_k, E_k) with leaf set L_k = {l_k1, . . ., l_kN_k}, define the new edge set E+_{k+1} = {e_{l_ki j} : i = 1, . . ., N_k, j = 1, . . ., m_{k+1}}, a set of N_k × m_{k+1} new edges such that m_{k+1} edges emanate from each leaf l_ki. Each edge e_{l_ki j} describes a specific value that the random variable Z_{k+1} can take. To define the vertex set of T_{k+1}, attach a new leaf vertex to each of the edges in E+_{k+1}, collect these new leaves in L_{k+1}, and let V_{k+1} = V_k ∪ L_{k+1} and E_{k+1} = E_k ∪ E+_{k+1}.

The infinite tree T of this DBN is now simply defined as T = (V, E), where the vertex and edge sets are, respectively, V = ∪_{k≥1} V_k and E = ∪_{k≥1} E_k. Note that the infinite directed paths starting from the root of this tree correspond to the atoms of the sample space of the process.
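The leaf set of T_k grows multiplicatively with each new variable. The following sketch (with hypothetical sample-space sizes) builds the leaves of T_k as root-to-leaf value paths, mirroring the recursion above:

```python
# Recursive construction of the leaves of the event tree T_k for variables
# Z_1, Z_2, ... with m_k values each; the sizes below are hypothetical.
m = [2, 3, 2]   # |Z_1| = 2, |Z_2| = 3, |Z_3| = 2

def build_leaves(m):
    """Each leaf of T_k is the tuple of edge-values along its root-to-leaf path."""
    leaves = [()]                     # T_0: just the root s_0
    for m_k in m:                     # extend every current leaf by m_k new edges
        leaves = [path + (v,) for path in leaves for v in range(m_k)]
    return leaves

leaves = build_leaves(m)
print(len(leaves))   # N_3 = 2 * 3 * 2 = 12 leaves, one per partial atom
```

Each leaf is in one-to-one correspondence with a combination of values of Z_1, . . ., Z_k, exactly as in the construction of T_k.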
We demonstrate this recursive construction of the infinite tree below.
Example 3. Here we remodel the Christchurch example (Section 3.4), taking only the binary variables Financial Difficulty and Hospital Admission into account. Let Z_{1,t} and Z_{2,t} denote, respectively, the variables Financial Difficulty and Hospital Admission in time-slice t. Suppose now that the financial life enjoyed by a family at time t depends only on its financial situation at time t − 1. Assume also that the probability of a child being admitted to hospital at time t depends on whether she visited the hospital at time t − 1 as well as on the current financial difficulty faced by her family. The DBN given in Figure 10 represents this process over time. Note that this is a 1-Markov BN: a variable is only affected by variables of the previous and current time-slices.
We can now reindex the variables of the DBN as Z_{2m+1} = Z_{1,t_m} and Z_{2m+2} = Z_{2,t_m}, m = 0, 1, 2, . . ., so that Z_i represents a Financial Difficulty variable if the index i is odd and a Hospital Admission variable otherwise. Thus, for example, in the event tree a(Z_6) = {Z_1, Z_2, Z_3, Z_4, Z_5} will be the antecedents of the Hospital Admission variable associated with the third time-slice (Z_{2,t_2}).
Because we have defined Z_1 = Z_{1,t_0}, T_1 corresponds to the tree given in Figure 11. We next represent the conditional independencies of the DBN by coloring the vertices and associated edges that are in the same stage, as described in Section 3.1. The resulting staged tree then encodes the same conditional independencies as the DBN.
Notice that the vertex l_ki ∈ V_k ⊆ V labels the conditioning history of the variable Z_{k+1} through the values of its antecedent variables. By the definition of a DBN,

Z_{k+1} ⊥⊥ a(Z_{k+1}) | pa(Z_{k+1}),   (16)

which means that a variable Z_{k+1} is independent of its antecedents given its parents. So, by the DCEG definition, two leaf nodes l_ki1 and l_ki2 of the event tree T_k are in the same stage whenever their edge probabilities are the same. More formally,

P(e_{l_ki1 j} | l_ki1) = P(e_{l_ki2 j} | l_ki2) for all edges e_{l_ki1 j} and e_{l_ki2 j}, j = 1, . . ., m_{k+1},

or alternatively,

P(Z_{k+1} = z_{k+1} | l_ki1) = P(Z_{k+1} = z_{k+1} | l_ki2),

where z_{k+1} is a value the variable Z_{k+1} can take. If this holds then we assign the same color to l_ki1 as to l_ki2. The corresponding DCEG then follows directly from the staged tree by performing edge contraction operations according to the position partition (see Section 3.2).
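The staging step can be sketched as grouping conditioning histories by their CPV for Z_{k+1}. The histories and probabilities below are invented purely for illustration:

```python
# Merging leaves of T_k into stages: two histories belong to the same stage
# when their conditional distributions for Z_{k+1} coincide (hypothetical CPVs).
cpvs = {
    ("diff", "adm"):     (0.3, 0.7),
    ("diff", "noadm"):   (0.3, 0.7),   # same CPV -> same stage as the line above
    ("nodiff", "adm"):   (0.1, 0.9),
    ("nodiff", "noadm"): (0.1, 0.9),
}

stages = {}                            # CPV -> list of histories sharing it
for history, cpv in cpvs.items():
    stages.setdefault(cpv, []).append(history)

print(len(stages))                     # two stages instead of four situations
```

In practice the CPVs are not known and stage memberships are scored via the marginal likelihood, but the grouping logic is the same.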
Example 3 (continued). Recall Example 3. Assume now that the conditional probability tables remain the same across the time-slices t, t ≥ t_1, for the Financial Difficulty variables and for the Hospital Admission variables. Consider also that the probability of hospital admission in a specific time-slice t, t ≥ t_1, only changes (in this case increases) if a family currently enjoys a good financial situation and their child was not admitted to hospital in the previous time-slice. This probability is hypothesised to be equal to the one assigned to a child that lives in a financially stable family in the first time-slice.
Appendix B shows the staged tree corresponding to these hypotheses. Note that the colors alternate between odd and even levels because of the invariance of the conditional probability tables over time. Observe that it is not possible to represent these context-specific conditional statements graphically using a DBN model on these variables, although they are encoded in the DBN's conditional probability tables. In contrast, these additional conditions are not only directly depicted in a DCEG (which here is actually an RT-DCEG) but the corresponding graph is also quite compact and easily interpreted (Figure 12).
Note that the re-expression of the DBN as a staged tree emphasizes how the usual classes of DBNs only represent graphically certain specific families of symmetric conditional independences. In contrast, the DCEG can depict asymmetric dependence structures between the variables of a time-slice and also across time-slices. When the dependence structure is defined through symmetric conditional independencies the DBN is topologically much simpler than the corresponding DCEG. But when, as is often the case, many combinations of values of states are logically impossible and the number of non-zero probability transitions between states is small, the DCEG depicts these zeros explicitly and can sometimes be topologically simpler than the DBN.
Consider the staged tree of Example 3 (Figure 16, Appendix B). If the conditional probability tables of the BN state that P(Z_{2,t_0} = Adm | Z_{1,t_0} = No diff) = 0, then the edge describing this probability can be omitted from the tree, and the tree is hence reduced to three quarters of its size. Hence, unlike the BN and its dynamic analogue, as well as depicting independence relationships the DCEG also allows us to read zeros in the corresponding transition matrix, represented by missing edges in the tree. This is particularly helpful when representing processes with many logical constraints. These gains do, however, imply that the DCEG model space scales up super-exponentially with the number of variables.
Here the main challenge is to devise clever algorithms to search the DCEG model space.

DCEG and Markov Chain
In this section we use some examples to illustrate topological links between DCEG graphs and the state-transition diagrams of Markov chains. These connections constitute a promising starting point for extending many of the well-developed results on Markov processes to the DCEG domain; see, for example, the use of the limiting distribution to initialise the DCEG learning process (Section 4). In turn, the DCEG framework can be used to verify whether there is statistical evidence that supports modelling a real-world process as a Markov chain and, if there is, to infer its corresponding transition matrix.
Note that the topology of the DCEG graph resembles the familiar state-transition diagram of a Markov process, where the positions of the DCEG can be reinterpreted as states of the Markov process. However, as mentioned at the end of Section 3.1, the DCEG is usually constructed from a description of a process as a staged tree rather than from a prespecified Markov chain. Thus there are also some differences between the DCEG graph and standard state-transition diagrams, such as the one-to-one relationship between the atoms of the space of the DCEG and its paths, and its coloring, as will be illustrated in the simple examples below. However, the DCEG representation gives a different structure, which becomes apparent when looking first at the tree representation of the problem. As the process is infinite, the number of situations of the tree is also infinite. The initial situation s_0, the root of the tree, has emanating edges which represent the choice of initial state with associated CPV π_s0 = (0.4, 0.4, 0.2). The other situations can be indexed as {s_{i,n} : i = a, b, c, n ∈ N} with CPVs π_{sa,n} = (0.2, 0.3, 0.5) and π_{sb,n} = π_{sc,n} = (0.5, 0.3, 0.2). It is then immediate that the corresponding DCEG has only three stages and positions, with the stage and position partition given by w_0 = {s_0}, w_1 = {s_{a,n} : n ∈ N} and w_2 = {s_{b,n}, s_{c,n} : n ∈ N}. There is no w_∞ as all paths are infinite and hence no leaf vertices exist in the tree. The DCEG can then be drawn as in Figure 14(a) and the associated CPVs are π_w0 = (0.4, 0.4, 0.2), π_w1 = (0.2, 0.3, 0.5) and π_w2 = (0.5, 0.3, 0.2).
For a better comparison the CPVs have here also been attached to the edges of the DCEG. Figure 14(b) depicts the same process when it has the degenerate initial distribution π_s0 = (1, 0, 0). Even here, where the process is initially defined through a transition matrix, the graph of the DCEG automatically identifies states which have equivalent roles: state b is identified with state c, and the identical conditional probabilities associated with the two states are illustrated by putting s_{b,n} and s_{c,n}, n ∈ N, in the same position w_1 in Figure 14.
The DCEG also depicts explicitly the initial distribution of the process, given by the edges emanating from w_0, and acknowledges the initially elicited distinction between the states b and c through the double edge from w_0 to w_2. Observe also that if a process has a degenerate initial distribution, for example the one depicted in Figure 14(b), the DCEG shows this transparently, and it implies only minor changes in the DCEG topology.
These topological properties often have important interpretive value, as the DCEG can discover a different partition of the states of a variable or even help to construct new informative variables to represent a problem. Further, any residual coloring inherited from the staged tree allows us to elaborate the structure of the transitions in a natural and consistent way, highlighting possible common underlying structures between the states of a Markov process. This can raise new questions and motivate a deeper understanding of the process under analysis. For example, the state-transition diagram and DCEG graph (Figure 15) corresponding to the simple Example 5 are identical except for the colors. By coloring the positions red, the DCEG model stresses that their transition processes are probabilistically identified with each other (i.e. they are in the same stage). In real-world problems these coloring properties can stimulate domain experts to speculate about compelling reasons for them.
A semi-Markov process is usually specified by an initial distribution α and by its semi-Markov kernel Q, whose (i, j)-th entry is given by

Q_ij(t) = P(X_{n+1} = j, H_n ≤ t | X_n = i).

We assume here that all Markov processes considered are time-homogeneous and hence the above equation does not depend on the index n. In order to illustrate a link between the Extended DCEG and semi-Markov processes, we write the semi-Markov kernel as

Q_ij(t) = p_ij F_ij(t),   (20)

where F_ij(t) = P(H_n ≤ t | X_n = i, X_{n+1} = j) is the conditional holding time distribution, i.e. the distribution of the holding time at X_n = i given that we move to X_{n+1} = j next, and p_ij is given in Definition 13. We can then show that a particular subclass of the time-homogeneous Extended DCEG corresponds to a semi-Markov process.
Theorem 1. Let a DCEG D with holding times be simple and let no two edges lead from the same parent into the same child. Then this DCEG is a semi-Markov process with state space S = {V(D) \ w_0}, with the entries of its transition matrix given by

p_ij = π_wij if e_wij = e(w_i, w_j) exists; p_ij = 1 if w_i = w_j = w_∞; p_ij = 0 otherwise;

and with conditional holding time distributions

F_ij(t) = P(H_wij ≤ t | e_wij, w_i) if e_wij = e(w_i, w_j) exists; F_ij(t) = 1 if w_i = w_j = w_∞; F_ij(t) = 0 otherwise.
If the position w_0 is a source vertex then the initial distribution is given by α = π_w0. Otherwise the initial distribution assigns probability 1 to w_0, and w_0 is included in the state space.
Proof.See Appendix C.
Results such as Theorem 1 allow us to identify particular DCEG subclasses whose models have a strong connection with semi-Markov processes. This can indeed be very useful, as many of the well-developed results on Markov processes could be extended to the DCEG. For instance, from Equation 6 the probability of staying at a position w for a time ≤ h and then moving along the edge e_wk can be calculated. This equation corresponds to the entries of the semi-Markov kernel (Equation 20) of a semi-Markov process. Barbu and Limnios (2008) and Kulkarni (1995), for example, have shown how to derive the transition matrix of the semi-Markov process from the semi-Markov kernel, in order to calculate the probability of being in state j at time t given that we are initially in state i. These types of calculations could be directly extended to the DCEG. This would further enable the DCEG to be applied in the wide-ranging domains of semi-Markov processes, which include reliability theory, finance, insurance and traffic modelling.
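Under Theorem 1, the kernel entries Q_ij(t) = p_ij F_ij(t) can be read directly off the DCEG's CPVs and conditional holding time distributions. A sketch with invented transition probabilities and exponential holding times:

```python
import math

# Semi-Markov kernel entries Q_ij(t) = p_ij * F_ij(t) induced by a DCEG
# position, with exponential conditional holding times; all transition
# probabilities and rates below are hypothetical.
p = {("w1", "w2"): 0.6, ("w1", "w3"): 0.4}       # pi_wij for the edges leaving w1
rate = {("w1", "w2"): 1.0, ("w1", "w3"): 0.25}   # exponential holding-time rates

def Q(i, j, t):
    """P(next state is j and holding time <= t | current state i)."""
    if (i, j) not in p:
        return 0.0                               # missing edge => zero kernel entry
    return p[(i, j)] * (1.0 - math.exp(-rate[(i, j)] * t))

print(Q("w1", "w2", 2.0), Q("w1", "w3", 2.0), Q("w1", "w1", 2.0))
```

As t → ∞ the kernel entries out of a state sum to 1, recovering the one-step transition matrix of the embedded chain.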

Conclusion
We have demonstrated here that a dynamic version of the CEG is straightforward to develop and that this class enjoys most of the convenient properties of the CEG. It further usefully generalizes the discrete DBN when the context demands it. Although we do not envisage the DCEG taking over from the DBN as a representational device and framework for structured stochastic propagation and learning, we nevertheless believe that it provides a valuable complementary tool to alternative graphical models. It is particularly suited to domains where the levels of state vectors are numerous but the associated transitions are sparse, or where context-specific symmetries abound. The fact that their finite analogues express BNs as a special case, and that standard learning algorithms for these classes nest into each other, means that the DCEG and DBN representations are particularly complementary: the first focuses on the micro structure of the transitions between states of the process whilst the other focuses on the macro elements of the relationships between relevant variables within the study domain. Despite the closed form of their score functions, the major challenge that remains is to develop effective model search algorithms to discover potential causal mechanisms within this class. Faster and more efficient algorithms are now becoming available for CEG model search (Collazo and Smith (2015)) and this technology is now being transferred to address DCEG model selection. Early results on this topic are promising and will be reported in a later paper.
with scale parameter λ_u and known shape parameter. We call the concatenation of these different holding time parameters λ.
Given an Extended DCEG D, for each individual that traverses the DCEG, the edges he passes along can be recorded as well as the holding times at each position. Assume the individual ι takes the path ε^ι = (e_{w_{i_0} j_0}, e_{w_{i_1} j_1}, . . ., e_{w_{i_{n_ι}} j_{n_ι}}) along n_ι + 1 edges starting at w_{i_0} = w_0. Then let w^ι_{i_a} describe the ath position reached by individual ι, h^ι_{i_a} the holding time at position w^ι_{i_a} and e^ι_{i_a j_a} the ath edge passed along, where a = 0, 1, . . ., n_ι. Then, by the definition of a DCEG (see Definition 11), the likelihood, given an individual ι with path ε^ι and vector of holding times h^ι = (h^ι_{i_0}, h^ι_{i_1}, . . ., h^ι_{i_n}), is given by

L(π, λ | ε^ι, h^ι, D) = ∏_{a=0}^{n_ι} π(e^ι_{i_a j_a}) f(h^ι_{i_a} | e^ι_{i_a j_a}, λ),

where f(· | e, λ) denotes the holding-time density associated with edge e.
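This factorisation over the traversed edges can be sketched in code. The example below is a hedged illustration that assumes, purely for concreteness, exponential holding-time densities; the edge labels, probabilities and rates are hypothetical, not from the paper.

```python
import math

# Hedged sketch of the single-individual DCEG log-likelihood: the
# likelihood factorises over the traversed edges into the edge
# probability pi(e) times the density of the observed holding time
# (here assumed exponential, purely for illustration).
def path_log_likelihood(edges, holding_times, pi, rate):
    """edges: list of edge labels along the path;
    pi: dict edge -> transition probability;
    rate: dict edge -> exponential rate of the holding-time density."""
    ll = 0.0
    for e, h in zip(edges, holding_times):
        ll += math.log(pi[e])                  # transition factor pi(e)
        ll += math.log(rate[e]) - rate[e] * h  # exponential density f(h | e)
    return ll

# e.g. a single edge with pi = 0.5, rate 1 and observed holding time 1.0
# contributes log(0.5) + (log(1) - 1) = log(0.5) - 1.
ll = path_log_likelihood(["a"], [1.0], {"a": 0.5}, {"a": 1.0})
```

Because the log-likelihood is a sum of terms, one per edge, it separates in the parameters π and λ, which is what makes the closed-form estimation referred to in the conclusion possible.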
Figure 1: BN of flu example
Figure 2: Flu example
Figure 4: Flu example: the beginning of the infinite tree, T
Figure 5: Flu example: the beginning of the infinite staged tree, T
Figure 6: Flu example: DCEG of the infinite staged tree
Figure 7: Variant of flu example: infinite tree T*
Figure 8: Variant of the flu example: Extended DCEG of the infinite staged tree
Figure 10: A simple DBN for Christchurch example
Figure 11: Illustration of T_1 and T_2 of a DBN
Figure 12: Illustration of the RT-DCEG of a DBN
Figure 13: State-transition diagram of the Markov process in Example 4
Figure 14: DCEG representation of the Markov process in Example 4
Figure 16: Illustration of the staged tree of T of a DBN