Symmetries in Directed Gaussian Graphical Models

We define Gaussian graphical models on directed acyclic graphs with coloured vertices and edges, calling them RDAG (restricted directed acyclic graph) models. If two vertices or edges have the same colour, their parameters in the model must be the same. We present an algorithm to find the maximum likelihood estimate (MLE) in an RDAG model, and characterise when the MLE exists, via linear independence conditions. We relate properties of a graph, and its colouring, to the number of samples needed for the MLE to exist and to be unique. We also characterise when an RDAG model is equal to an associated undirected graphical model and study connections to groups and invariant theory. We provide examples and simulations to study the benefits of RDAGs over uncoloured DAGs.


Introduction
The concept of a graph is widely used across the sciences [Fou12]. Graphs are a framework to relate entities: the vertices are the entities of interest, and the edges encode connections between them. A graph is given a statistical meaning in the study of graphical models [Lau96, MDLW18]. Each vertex represents a random variable, and the edges between variables reflect their statistical dependence [VP90]. In this paper, we study directed Gaussian graphical models, also called Gaussian Bayesian networks, or linear structural equation models with independent errors [Sul18]. Such models have been applied to cell signalling [SPP+05], gene interactions [FLNP00], causal inference [Pea09], and many other contexts.
We define graphical models on directed acyclic graphs (DAGs) with a colouring of their vertices and edges. Vertices or edges with the same colour must have the same parameter values. Thus, the graph colouring imposes symmetries in the model. We call such models RDAG models, where the 'R' stands for restricted, cf. [HL08].
Our first motivation for RDAG models is that vertex and edge symmetries appear in various applications, such as in the study of longitudinal data [AFS16, VAAW16] or clustered variables [GM15, HL08]. The coloured directed graph gives an intuitive pictorial description of the symmetry conditions in the model.
Our second motivation is to decrease the maximum likelihood threshold, the minimum sample size required for the maximum likelihood estimate (MLE) to exist almost surely, see [Dem72]. In applications, it is desirable for the MLE to exist when there is only a small number of samples; i.e., for the maximum likelihood threshold to be small. Innovative ideas have been used to find maximum likelihood thresholds in graphical models [Buh93, DFKP19, GS18, Uhl12] and for estimating the MLE from too few samples [FHT08, WZV+04]. Removing edges from a graph can lower the threshold [Uhl12, Lau96], but there is a trade-off: removing edges imposes more conditional independence among the variables. This is why, instead, we aim to decrease the maximum likelihood threshold by introducing symmetries.
We will use the following as our running example throughout the paper.
Example 0.1. Consider the coloured graph 1 3 2, with blue (circular) vertices {1, 2}, black (square) vertex 3 and two red edges. The RDAG model is given by 1, 2 ∼ N(0, ω) and 3 ∼ N(0, ω′), i.e. ω is the variance of blue vertices 1 and 2 and ω′ is the variance of black vertex 3. The third parameter λ is the regression coefficient given by a red edge. We will see that the MLE exists uniquely (almost surely) given one sample. For comparison, if we remove the colours the resulting model needs two samples for the MLE to exist. We use this example to model the dependence of two daughters' heights on the height of their mother, and we compute the MLE given some sample data, in Section 4.3.
As far as we are aware, RDAG models have not been defined before in the literature; we comment on some related models. The assumption of equal variances from [PB14] is the special case of an RDAG model in which all vertex colours are the same. Special colourings encode exchangeability between variables, or invariance under a group of permutations. A graphical model is combined with group symmetries in the directed setting in [Mad00] and in the undirected setting in [AM98, SC12]. RDAG models also relate to the fused graphical lasso [DWW14], which penalises differences between parameters on different edges, whereas in an RDAG model the parameters on edges of the same colour must be equal.
In this paper, we give a closed-form formula for the MLE in an RDAG model, as a collection of least squares estimators, see Algorithm 1. We characterise the existence and uniqueness of the MLE via linear algebraic properties of the sample data, see Theorem 4.4. We give upper and lower bounds on the threshold number of samples required for existence and uniqueness of the MLE in Theorem 5.3. Our results show that RDAG thresholds are less than or equal to the corresponding DAG thresholds, and that high symmetry decreases the thresholds. Finally, we compare RDAG MLEs to uncoloured DAG MLEs via simulations in Section 6. Our results hold with an assumption on the graph colouring, which we call compatibility (Definition 2.5). It is an open problem to extend our results to the non-compatible setting, as well as to directed graphs with cycles. It is also an open problem to find the exact maximum likelihood thresholds, see Problem 5.4.
The undirected analogue to RDAG models are the RCON models from [HL08]. Although a motivation for the graph colouring in RCON models is to lower the maximum likelihood threshold, there are relatively few graphs for which the threshold is known: colourings of the four cycle are studied in [Uhl12, §6], [SU10, §5], while an example with five vertices is [Uhl12, Example 3.2]. In certain cases, RDAG models are equivalent to RCON models. We determine precisely the conditions under which this occurs in Theorem 3.4. As a consequence, we obtain an entire class of RCON models where conditions for MLE existence and uniqueness can be found by appealing to our results on RDAGs. This paper has two appendices, where we explain some connections to invariant theory. A Gaussian group model [AKRS21] is parametrised by a group. In [AKRS21], the authors draw a dictionary between maximum likelihood estimation and stability notions in invariant theory. This dictionary allows for the transfer of tools from the algebraic subjects of representation theory and invariant theory to statistics: maximum likelihood thresholds were computed for matrix normal models in [DM21] and for tensor normal models in [DMW20].
We extend the dictionary between maximum likelihood estimation and stability notions to RDAGs in Theorem A.2. This requires us to extend the definitions of stability beyond the setting of a group action, see Definition A.1. While not evident in our final presentation, this perspective gave us the understanding needed to obtain many of the results in this paper, and we would like to stress its importance for future work. We have far more tools at our disposal when a model is backed by a group action, i.e., when it is a Gaussian group model. We identify RDAGs that are Gaussian group models in Proposition B.2 and exhibit additional tools that one can use in such cases. We refer to a multivariate Gaussian model by the set of concentration matrices in the model. So, a model is a subset of PD_m, the cone of m × m positive definite matrices.
We study statistical models in PD_m via a set of invertible matrices. We define
(1)   M_A := { a^T a : a ∈ A },
where A is a subset of GL_m, the real invertible m × m matrices.
Given a sample matrix Y ∈ R^{m×n} of n independent samples, the likelihood of a concentration matrix Ψ is
L_Y(Ψ) = ∏_{i=1}^n (2π)^{−m/2} (det Ψ)^{1/2} exp(−(1/2) Y_i^T Ψ Y_i),
where Y_i is the ith column of Y. We work with the log-likelihood function log L_Y, which has the same maximisers as L_Y. The log-likelihood function can be written, up to additive and multiplicative constants, as
(2)   ℓ_Y(Ψ) = log det Ψ − tr(SΨ),
where S = (1/n) Σ_{i=1}^n Y_i Y_i^T is the sample covariance matrix. Four possibilities arise when maximising the log-likelihood: (a) ℓ_Y is unbounded from above; (b) ℓ_Y is bounded from above; (c) the MLE exists (i.e. ℓ_Y is bounded from above and attains its supremum); (d) the MLE exists and is unique. The minimal number of samples needed for the MLE to exist almost surely is the MLE existence threshold; the number of samples for the MLE to exist uniquely almost surely is the uniqueness threshold. For a model M_A as in (1), we can rewrite the log-likelihood (2) at sample matrix Y ∈ R^{m×n} as a function of a matrix a ∈ A.
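The log-likelihood (2) is straightforward to evaluate numerically. The following sketch (Python/NumPy; the dimensions and random data are arbitrary choices for illustration, not from the paper) computes log det Ψ − tr(SΨ) and checks that, in the unrestricted model, it is maximised at Ψ = S^{−1}.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 10
Y = rng.standard_normal((m, n))   # sample matrix: m variables, n samples
S = (Y @ Y.T) / n                 # sample covariance matrix

def log_likelihood(Psi, S):
    """Gaussian log-likelihood in the concentration matrix,
    up to additive and multiplicative constants: log det(Psi) - tr(S Psi)."""
    return np.linalg.slogdet(Psi)[1] - np.trace(S @ Psi)

# In the unrestricted model (n > m, so S is invertible almost surely),
# the maximiser is Psi = S^{-1}; other concentration matrices score lower.
Psi_hat = np.linalg.inv(S)
assert log_likelihood(Psi_hat, S) >= log_likelihood(Psi_hat + 0.1 * np.eye(m), S)
assert log_likelihood(Psi_hat, S) >= log_likelihood(2.0 * Psi_hat, S)
```

A restricted model M_A maximises the same function over the subset {a^T a : a ∈ A} instead of all of PD_m.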
1.3. Directed Gaussian graphical models (DAG models). A directed acyclic graph (DAG) is G = (I, E), where I is a set of vertices, and E a set of directed edges. We write j → i for an edge from j to i; the absence of such an edge is denoted j ↛ i. The parents and children of i in G are, respectively, the vertex sets pa(i) = { j ∈ I : j → i } and ch(i) = { j ∈ I : i → j }. We often take the vertex set I to be [m] = {1, 2, . . . , m}.
We call a directed Gaussian graphical model on G a DAG model. A DAG model is defined by the linear structural equation
(4)   y = Λy + ε, i.e. y_i = Σ_{j ∈ pa(i)} λ_ij y_j + ε_i for each i ∈ I,
where y ∈ R^m, λ_ij = 0 for j ↛ i in G, and ε ∼ N(0, Ω) with Ω diagonal. The coefficient λ_ij is a regression coefficient, the effect of parent j on child i. The model encodes conditional independence: a node is independent of its non-descendants, after conditioning on its parents [VP90].
Remark 1.2 (The matrix Λ is strictly upper triangular). Throughout the paper, we choose an ordering on the vertices of G so that Λ is upper triangular. That is, if edge j → i is in E then j > i. Such an ordering is possible because G is acyclic. Thinking of a vertex label as its age, the ordering makes parents older than children.
Solving (4) for y gives y = (id − Λ)^{−1} ε, where id denotes the m × m identity matrix, and the acyclicity of G ensures that (id − Λ) is invertible. Hence y is multivariate Gaussian with mean zero and concentration matrix
(5)   Ψ = (id − Λ)^T Ω^{−1} (id − Λ).
We define a set of matrices associated to the DAG G:
(6)   A(G) := { a ∈ GL_m : a_ij = 0 for i ≠ j with j ↛ i in G }.
The set of concentration matrices of the form (5) is equal to the set M_{A(G)}. We prove this in Lemma 2.9.
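As a quick numerical check of (5), the following sketch (Python/NumPy; the parameter values λ, ω, ω′ are arbitrary choices, not from the paper) builds Λ and Ω for the running three-vertex example and verifies that the inverse covariance of y = (id − Λ)^{−1} ε equals (id − Λ)^T Ω^{−1} (id − Λ).

```python
import numpy as np

# Assumed parameter values for the running example 1 <- 3 -> 2.
lam, omega, omega_p = 0.5, 1.0, 2.0
Lam = np.array([[0, 0, lam],
                [0, 0, lam],
                [0, 0, 0.0]])             # strictly upper triangular: parents have larger labels
Omega = np.diag([omega, omega, omega_p])  # diagonal error covariance

I3 = np.eye(3)
# Solving y = Lam y + eps gives y = (id - Lam)^{-1} eps, so the covariance of y is:
Cov = np.linalg.inv(I3 - Lam) @ Omega @ np.linalg.inv(I3 - Lam).T
# The concentration matrix (5) is its inverse:
Psi = (I3 - Lam).T @ np.linalg.inv(Omega) @ (I3 - Lam)
assert np.allclose(Cov @ Psi, I3)
```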
1.4. Undirected Gaussian graphical models. Multivariate Gaussian models can also be obtained from undirected graphs. An undirected graph G = (I, E) is a set of vertices I and undirected edges E. The model is the set of distributions with mean zero and concentration matrix Ψ with Ψ_ij = 0 whenever there is no edge between i and j in E. That is, the variables at nodes i and j are independent after conditioning on all others, see [Sul18, Proposition 13.1.5].
1.5. Restricted concentration (RCON) models. In [HL08], the authors introduce restricted concentration (RCON) models, which impose symmetries on the concentration matrix Ψ according to a graph colouring. A coloured undirected graph is a tuple (G, c), where G = (I, E) is an undirected graph and the map c : I ∪ E → C assigns a colour to each vertex and to each edge. The vertex i ∈ I has colour c(i) ∈ C, and the edge between i and j has colour c(ij) ∈ C.

Introducing RDAG models
A colouring of a DAG assigns colours to the vertices and edges. A coloured DAG is a tuple (G, c), where G = (I, E) is a DAG on vertices I and edges E, and c : I ∪ E → C is a colouring of the vertices and edges. Vertex i ∈ I has colour c(i) ∈ C, and edge j → i has colour c(ij) ∈ C. We sometimes denote the vertex colour c(i) by c(ii), with no ambiguity because a DAG cannot have loops.
Definition 2.1. The RDAG model on (G, c) is the DAG model on G whose parameters respect the colouring: the matrix Λ ∈ R^{m×m} satisfies (1) λ_ij = 0 whenever j ↛ i in G and (2) λ_ij = λ_kl whenever edges j → i and l → k have the same colour, and the diagonal matrix Ω ∈ R^{m×m} has positive entries and satisfies (3) ω_ii = ω_jj if vertices i and j have the same colour. The model is given by the linear structural equation y = Λy + ε, where ε ∼ N(0, Ω).
Example 2.2. Consider the coloured graph 1 3 2 from Example 0.1. The RDAG model is parametrised by matrices Λ and Ω, where Λ has entries λ_13 = λ_23 = λ and zeros elsewhere, and Ω = diag(ω, ω, ω′). We will parametrise the RDAG model on (G, c) via the set
(7)   A(G, c) := { a ∈ GL_m : a_ij = 0 for i ≠ j with j ↛ i in G, and a_ij = a_kl whenever c(ij) = c(kl) }.
Note that A(G, c) is contained in the set A(G) from (6): the zero patterns of A(G) and A(G, c) are the same, and A(G, c) has further equalities imposed by the colouring c. Before we characterise which RDAG models can be parametrised by A(G, c), we pause to motivate the use of this alternative parametrisation.
Remark 2.3 (Motivation for the parametrisation via A(G, c)). The RDAG model on (G, c) will be the set M_A as in (1), where A := A(G, c). This point of view is motivated by connections to invariant theory for transitive DAG models in [AKRS21, Section 5].
The alternative parametrisation has useful consequences. First, it leads to a condition on the graph colouring, called compatibility, which is indispensable in our results of Sections 4 and 5. Second, it is helpful when comparing directed and undirected models in Section 3. Finally, it enables us to generalise the connections to invariant theory from [AKRS21] to the setting of RDAGs, see Appendices A and B.
Example 2.4. Returning to the example 1 3 2, the set A(G, c) consists of the invertible upper triangular matrices a with a_11 = a_22 (the colour of the blue vertices), a_13 = a_23 (the colour of the red edges), a_33 arbitrary (the colour of the black vertex), and a_12 = 0. We now introduce a natural assumption on a colouring.
Definition 2.5. A colouring c of a directed graph is compatible if: (i) vertex and edge colours are disjoint; and (ii) if edges j → i and l → k have the same colour, then the child vertices i and k also have the same colour, i.e. c(ij) = c(kl) =⇒ c(i) = c(k).
Note that compatibility does not impose equality of parent colours c(j) and c(l).
Remark 2.6 (Motivation for compatibility). In an RDAG model, we do not impose equalities between Ω and Λ. The entry ω_ii is a variance, while λ_kl is a regression coefficient, so setting them to be equal would be difficult to interpret. Hence the vertex and edge colours can always be thought of as disjoint, as in compatibility condition (i).
Compatibility condition (ii) has the statistical interpretation that the same regression coefficient appearing in an expression for two variables implies that their error variances agree. This extra assumption is indispensable in many of our results and proofs. It is a directed analogue to the condition appearing in [HL08, Proposition 1].
The first use of compatibility condition (ii) is in relating an RDAG model on (G, c) to the set A(G, c). As in (1), we consider the model M_{A(G,c)}.

Proposition 2.7. Fix a coloured DAG (G, c). The RDAG model on (G, c) is equal to M_{A(G,c)} if and only if the colouring c is compatible.
Before proving the proposition, we recall two matrix decompositions. The LDL decomposition writes a positive definite matrix as Ψ = LDL^T, where D is diagonal with positive entries, and L is lower triangular and unipotent (i.e. has ones on the diagonal). The LDL decomposition is closely related to the factorisation Ψ = (id − Λ)^T Ω^{−1} (id − Λ) from (5): the LDL decomposition has D = Ω^{−1} and L = (id − Λ)^T. Hence an RDAG model imposes zeros and symmetries in the LDL decomposition.
The second matrix decomposition is the Cholesky decomposition. It writes a positive definite matrix as the product Ψ = a^T a, where a is upper triangular with positive diagonal entries. The model M_{A(G,c)} imposes zeros and symmetries in the Cholesky decomposition, as follows.

Lemma 2.8. Fix a coloured DAG (G, c) with compatible colouring c. Then M_{A(G,c)} is the set of matrices with Cholesky decomposition a^T a for some a ∈ A(G, c).
Proof. The set M_{A(G,c)} consists of all matrices Ψ of the form a^T a for some a ∈ A(G, c), see (1). The matrix a is upper triangular by the structure of G. To get the Cholesky decomposition, it remains to modify a to have positive diagonal entries. We replace a by ka, where k is the diagonal matrix with k_ii = 1 if a_ii > 0 and k_ii = −1 if a_ii < 0. Then ka flips the sign of all rows of a with negative diagonal entry, hence it has all diagonal entries strictly positive. The compatibility of the colouring ensures that a_ij = a_kl can only hold in A(G, c) if a_ii = a_kk. Hence multiplying by k doesn't break any edge compatibility conditions, and ka ∈ A(G, c).
The LDL and Cholesky decompositions are both unique, since Ψ is (strictly) positive definite. They are related by
Cholesky from LDL: a = D^{1/2} L^T;  LDL from Cholesky: D = diag(a_11², . . . , a_mm²), L^T = D^{−1/2} a.
The following lemma is proved by comparing zero patterns in the two decompositions.
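The conversion formulas above can be verified numerically; the following sketch (Python/NumPy, on an arbitrary random positive definite matrix) recovers the LDL factors from a Cholesky factor and checks both identities.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
Psi = B @ B.T + 4 * np.eye(4)        # a generic positive definite matrix

# Cholesky with our convention Psi = a^T a, a upper triangular with positive diagonal.
a = np.linalg.cholesky(Psi).T        # NumPy returns lower-triangular L with Psi = L L^T

# LDL factors recovered from a: D = diag(a_11^2, ..., a_mm^2), L^T = D^{-1/2} a.
D = np.diag(np.diag(a) ** 2)
Lt = np.diag(1.0 / np.diag(a)) @ a   # unipotent upper triangular
assert np.allclose(Lt.T @ D @ Lt, Psi)    # LDL decomposition: Psi = L D L^T
assert np.allclose(np.sqrt(D) @ Lt, a)    # Cholesky from LDL: a = D^{1/2} L^T
```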
Lemma 2.9.The DAG model on G is the model M A(G) .
Proof. The LDL decomposition of Ψ = (id − Λ)^T Ω^{−1} (id − Λ) is given by D = Ω^{−1} and L = (id − Λ)^T. The Cholesky decomposition has
(8)   a = D^{1/2} L^T = Ω^{−1/2} (id − Λ), i.e. a_ii = ω_ii^{−1/2} and a_ij = −ω_ii^{−1/2} λ_ij for i ≠ j.
We have containment of the DAG model inside M_{A(G)}, because if j ↛ i in G, then λ_ij = 0 and therefore a_ij = 0. Conversely, given Ψ ∈ M_{A(G)}, its Cholesky decomposition is a^T a for some a ∈ A(G) by Lemma 2.8, since the colouring that assigns all vertices and edges different colours satisfies A(G) = A(G, c) and is compatible. Hence a_ij = 0 for i ≠ j unless j → i in G. We therefore have λ_ij = −a_ii^{−1} a_ij = 0, where a_ii ≠ 0, and hence λ_ij = 0.
We prove Proposition 2.7 by comparing symmetries in the two decompositions.
Proof of Proposition 2.7. Given Ψ in the RDAG model, we show that its Cholesky decomposition is Ψ = a^T a with a ∈ A(G, c). By Lemma 2.9, a ∈ A(G) and it remains to show that the colour conditions hold, i.e. that a_ij = a_kl whenever c(ij) = c(kl). If c(ii) = c(kk), then ω_ii = ω_kk since Ω respects the colouring. This shows that a_ii = a_kk, using (8). If c(ij) = c(kl) for edges j → i and l → k, then λ_ij = λ_kl since Λ respects the colouring and, moreover, ω_ii = ω_kk by compatibility. This implies a_ij = a_kl, by (8).
Conversely, given Ψ ∈ M_{A(G,c)}, we show that Ψ is in the RDAG model. The Cholesky decomposition is Ψ = a^T a for a ∈ A(G, c), by Lemma 2.8. The entries of Ω and Λ are ω_ii = a_ii^{−2} and λ_ij = −a_ii^{−1} a_ij, by (8), which satisfy the RDAG model conditions. Hence a compatible colouring implies the equivalence of the RDAG model on (G, c) and M_{A(G,c)}.
If the colouring is not compatible, we exhibit some Ψ in the RDAG model that is not in M_{A(G,c)}. Let Ψ = a^T a be the Cholesky decomposition. If there is some a′ ∈ A(G, c) with Ψ = (a′)^T a′ then, similar to the proof of Lemma 2.8, there is a diagonal matrix o with entries ±1 with oa = a′. First, if Definition 2.5(i) does not hold, there is a vertex k and an edge j → i with c(k) = c(ij). The RDAG model imposes no relation between ω_kk and λ_ij, so let Ψ be given by some Ω and Λ with ω_kk = 1 and λ_ij = 0. Then o_kk a_kk = a′_kk = ±1 and o_ii a_ij = a′_ij = 0, by (8). Hence a′_kk ≠ 0 = a′_ij, contradicting the colour condition a′_kk = a′_ij, and therefore Ψ ∉ M_{A(G,c)}. Second, if Definition 2.5(ii) does not hold, then there exist edges j → i and l → k with c(ij) = c(kl) but c(i) ≠ c(k). We choose Ψ given by some Ω and Λ with λ_ij = λ_kl ≠ 0 and ω_ii ≠ ω_kk. Then |a_ij| = ω_ii^{−1/2} |λ_ij| ≠ ω_kk^{−1/2} |λ_kl| = |a_kl| by (8), so no choice of signs o can give a′_ij = a′_kl, and again Ψ ∉ M_{A(G,c)}.

Example 2.10. We return to the graph 1 3 2 from Examples 2.2 and 2.4. The colouring is compatible, because the set of vertex colours {blue, black} is disjoint from the edge colour set {red}, and the children of both red edges have the same colour. Hence Proposition 2.7 shows that the RDAG model is equal to M_{A(G,c)}.

Remark 2.11. RDAG models can also be defined over the complex numbers. Here, the parameters Λ can be complex, and we obtain a subset of PD_m by taking conjugate transposes, Ψ = (id − Λ)† Ω^{−1} (id − Λ). For the M_A characterisation, we replace a^T a by a† a. Many of our results and proofs can be modified to hold in the complex setting. We return to complex RDAGs in Section B.2.

Comparison of RDAG and RCON models
Given a directed graph, we can forget the direction of each edge to give an undirected graph. We characterise when the RDAG model on the coloured directed graph is equal to the undirected model on the corresponding coloured undirected graph in Theorem 3.4. We begin by comparing RDAG and RCON models in two examples.
Example 3.1 (RDAG = RCON). We revisit our running example 1 3 2. The corresponding RCON model has coloured undirected graph 1 3 2, with blue (circular) vertices {1, 2}, black (square) vertex 3, and red edges. By Definition 1.3, the RCON model is the set of positive definite matrices Ψ with Ψ_12 = 0, Ψ_11 = Ψ_22 and Ψ_13 = Ψ_23. Since the colouring is compatible, the RDAG model is equal to M_{A(G,c)} from (9). Any matrix in M_{A(G,c)} satisfies the equalities for the RCON model, so we have containment of the RDAG model in the RCON model. Conversely, given Ψ in the RCON model, its Cholesky factor lies in A(G, c), giving the reverse containment.

Example 3.2 (RDAG ≠ RCON). Consider the RDAG model on 1 2, the graph with two blue (circular) vertices {1, 2} and a red edge. The colouring is compatible, so by Proposition 2.7 the RDAG model is M_{A(G,c)}, where A(G, c) consists of the invertible upper triangular 2 × 2 matrices a with a_11 = a_22. The corresponding RCON model is on the coloured undirected graph 1 2. The RCON model consists of all Ψ ∈ PD_2 with Ψ_11 = Ψ_22 and Ψ_12 = Ψ_21, by Definition 1.3. Neither model is contained in the other: the RCON model contains the matrix with Ψ_11 = Ψ_22 = 4 and Ψ_12 = Ψ_21 = 2, but the diagonal entries 2 and √3 in its Cholesky decomposition do not satisfy the conditions for A(G, c). Conversely, a matrix Ψ = a^T a with a ∈ A(G, c) and a_12 ≠ 0 has Ψ_22 = a_12² + a_22² > a_11² = Ψ_11; such a Ψ is in the RDAG model, but not the RCON model, since Ψ_11 ≠ Ψ_22.
To characterise when an RDAG model is equal to its corresponding RCON model, we give two constructions of coloured graphs obtained from some (G, c), one built from a vertex and the other from an edge. As before, G = (I, E).
Fix a vertex i ∈ I. Recall that the children of i are the vertices k with (i → k) ∈ E. Consider the subgraph on vertex set {i} ∪ ch(i) with edges i → k for each k ∈ ch(i), and colours inherited from (G, c). We denote the subgraph by G_i.
Now fix an edge (j → i) ∈ E. Consider vertices {i} ∪ (ch(i) ∩ ch(j)) with vertex colours inherited from (G, c). For each k ∈ ch(i) ∩ ch(j), we introduce two edges i → k, one with colour c(ki) and the other with colour c(kj). We denote this graph by G_(j→i).

Example 3.3. The vertex construction at vertex 5 and the edge construction at edge 5 → 4 give the graphs G_5 and G_(5→4), which we refer to below.

Given a DAG G, its corresponding undirected graph is denoted G^u. Similarly, given a coloured DAG (G, c), its corresponding undirected coloured graph (where the colour of a directed edge becomes the colour of the undirected edge) is (G^u, c). Two coloured graphs (G, c) and (G′, c′) are isomorphic if the coloured graphs are the same up to relabelling vertices. We denote an isomorphism by G ≅ G′ when the colouring is clear.

Theorem 3.4. Fix a coloured DAG (G, c) with compatible colouring. The RDAG model on (G, c) is equal to the RCON model on (G^u, c) if and only if: (a) G has no unshielded colliders; (b) G_i ≅ G_j for every pair of vertices i, j of the same colour; and (c) G_(j→i) ≅ G_(l→k) for every pair of edges j → i and l → k of the same colour.
Example 3.5. Our running example 1 3 2 satisfies the conditions of Theorem 3.4: it has no unshielded colliders, the graphs G_1 and G_2 both consist of a single blue vertex, and G_(3→1) and G_(3→2) are both isomorphic to 1 2. The RDAG and RCON models are therefore equivalent, as we saw in Example 3.1.
Example 3.6. The following graph also satisfies the conditions of Theorem 3.4: (a) it has no unshielded colliders; (b) the purple (triangular) vertices have G_i isomorphic to G_5 from Example 3.3; (c) all edges j → i have ch(j) ∩ ch(i) = ∅, except for the two brown edges, and for these, G_(10→8) and G_(9→7) are both isomorphic to G_(5→4) from Example 3.3. Hence the RDAG model on this coloured graph is equal to the RCON model on the underlying undirected graph. Note that the two connected components of (G, c) are not isomorphic. We will see why this is not required for the proof of Theorem 3.4, i.e. why we can collapse vertices i and j in the definition of G_(j→i).
One ingredient in our proof of Theorem 3.4 is the condition for a DAG model to be equal to its corresponding undirected graphical model: this holds if and only if the DAG has no unshielded colliders (Theorem 3.7).

Proof. Since the colouring is compatible, the RDAG model is M_{A(G,c)}, by Proposition 2.7. Let a ∈ A(G, c) be a general matrix. We think of it as having indeterminate entries, one for each vertex colour and one for each edge colour.

If the RDAG model is contained in the RCON model, we must have (a^T a)_ij = 0 whenever a_ij = a_ji = 0. This holds if and only if there are no unshielded colliders in G, by Theorem 3.7. Moreover, certain equalities must hold on a^T a. We have the vertex colour condition (a^T a)_ii = (a^T a)_jj whenever c(i) = c(j) and the edge colour condition (a^T a)_ij = (a^T a)_kl whenever c(ij) = c(kl). These give the polynomial identities
(10)   a_ii² + Σ_{p ∈ ch(i)} a_pi² = a_jj² + Σ_{q ∈ ch(j)} a_qj²   whenever c(i) = c(j),
(11)   a_ii a_ij + Σ_p a_pi a_pj = a_kk a_kl + Σ_q a_qk a_ql   whenever c(ij) = c(kl).
We show that (10) is equivalent to (b) and that (11) is equivalent to (c).
Given two vertices i, j with c(i) = c(j), we have a_ii² = a_jj². The sums in (10) are equal if and only if |ch(i)| = |ch(j)| and the edge colours in G_i and G_j agree (counted with multiplicity). By compatibility, the vertex colours also agree, hence (10) is equivalent to G_i ≅ G_j.
Next, we consider (11). No terms a_ji a_jj and a_lk a_ll appear, since there is no edge i → j or k → l. The compatibility of the colouring gives a_ii = a_kk. Hence a_ii a_ij = a_kk a_kl. A summand in (11) vanishes unless p ∈ ch(i) ∩ ch(j) or q ∈ ch(k) ∩ ch(l), respectively. The sums are equal if and only if |ch(i) ∩ ch(j)| = |ch(k) ∩ ch(l)| and the graphs G_(j→i) and G_(l→k) are isomorphic on their edge colours. By compatibility of the colouring, the vertex colours of the children also agree, and hence (11) is equivalent to G_(j→i) ≅ G_(l→k).

Proof. Given a matrix Ψ in the RCON model on (G^u, c), we show that it is contained in the RDAG model on (G, c) by showing that the Cholesky decomposition Ψ = a^T a satisfies the conditions of the set A(G, c) in (7). The matrix a is upper triangular. Completing it one column at a time shows that its entries are
(12)   a_jj = ( Ψ_jj − Σ_{p<j} a_pj² )^{1/2},
(13)   a_ij = ( Ψ_ij − Σ_{p<i} a_pi a_pj ) / a_ii   for i < j.
Note that the expression under the square root in (12) is a positive real number, see [TB97, Lecture 23]. We need to show that a_ij = 0 for i ≠ j with j ↛ i (the support conditions) and that a_ij = a_kl whenever c(ij) = c(kl) (the colour conditions).
First we show that a satisfies the support conditions of A(G, c). If there is no edge from j to i in G, then G^u has no edge between i and j, so Ψ_ij = 0. Moreover, all products a_pi a_pj vanish, otherwise i → p ← j would be an unshielded collider. Hence a_ij = 0, by (13). Next, we show that a satisfies the colour conditions of A(G, c). We prove this inductively over the top left k × k blocks of a. If k = 1 there are no symmetries to check. We assume that the top left k × k submatrix of a satisfies the symmetries. For the induction step, we compare a_{1,k+1}, a_{2,k+1}, . . . , a_{k+1,k+1} with each other and with a_{i,j} for i, j ∈ [k]. If there is an edge (k + 1) → 1 with the same colour as j → i for i, j ∈ [k], we need to show that a_{1,k+1} = a_{i,j}. First, a_11 = a_ii by compatibility and the induction hypothesis, and Ψ_{i,j} = Ψ_{1,k+1} since Ψ is in the RCON model. Moreover, all a_pq for p, q ∈ [k] respect the symmetries. Since G_(j→i) ≅ G_(k+1→1), the expressions (13) for a_{i,j} and a_{1,k+1} are equal. Proceeding inductively, we show that all entries a_{2,k+1}, . . . , a_{k,k+1} respect the symmetries of c.
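The column-by-column recursion (12)–(13) used in the proof can be sketched directly (Python/NumPy; the random test matrix is an arbitrary choice):

```python
import numpy as np

def cholesky_columns(Psi):
    """Column-by-column Cholesky recursion for Psi = a^T a:
    a_jj = sqrt(Psi_jj - sum_{p<j} a_pj^2),
    a_ij = (Psi_ij - sum_{p<i} a_pi a_pj) / a_ii  for i < j."""
    m = Psi.shape[0]
    a = np.zeros_like(Psi, dtype=float)
    for j in range(m):
        for i in range(j):
            a[i, j] = (Psi[i, j] - a[:i, i] @ a[:i, j]) / a[i, i]
        a[j, j] = np.sqrt(Psi[j, j] - a[:j, j] @ a[:j, j])
    return a

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
Psi = B @ B.T + 4 * np.eye(4)        # a generic positive definite matrix
a = cholesky_columns(Psi)
assert np.allclose(a.T @ a, Psi)
assert np.allclose(a, np.triu(a))    # upper triangular with positive diagonal
```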

MLE: existence, uniqueness, and an algorithm
In this section we characterise the existence and uniqueness of the MLE in an RDAG model via linear dependence conditions on certain matrices. Our description specialises to give the characterisation of existence and uniqueness of the MLE in a usual DAG model in terms of linear dependence among the rows of the sample matrix.
Let α_s be the number of vertices of colour s. For an edge j → i, vertex j is called the source of the edge and i is called the target. The parent relationship colours of s are the colours of all edges with a target of colour s:
prc(s) := { c(ij) : (j → i) ∈ E with c(i) = s },
and we set β_s := |prc(s)|. A sample matrix is Y ∈ R^{m×n}; its m rows index the vertices in G, and its columns are the n samples. For each vertex colour in G we define an augmented sample matrix.
Definition 4.1. The augmented sample matrix of sample matrix Y and vertex colour s, denoted M_{Y,s}, has size (β_s + 1) × α_s n. We construct it row by row. Each row consists of α_s blocks, each a vector of length n. For notational simplicity, we assume for now that the vertices of colour s are the set {1, 2, . . . , α_s} ⊂ I. Then the top row of M_{Y,s} is
( Y^(1) | Y^(2) | · · · | Y^(α_s) ),
where Y^(k) denotes the kth row of Y. The other rows of M_{Y,s} are indexed by the parent relationship colours prc(s). The row indexed by t ∈ prc(s) has kth block
Σ_{j → k of colour t} Y^(j).
Note that the sum at the kth block is zero if there are no edges j → k of colour t. Let M^(i)_{Y,s} denote the ith row of M_{Y,s}, where we index from 0 to β_s.
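A minimal sketch of this construction for the running example 1 3 2 (Python/NumPy; the sample size n and random data are arbitrary choices):

```python
import numpy as np

# Running example: vertices 1, 2 are blue with red parent edges from vertex 3 (black).
rng = np.random.default_rng(2)
n = 4
Y = rng.standard_normal((3, n))      # rows indexed by vertices 1, 2, 3

# Blue colour: alpha = 2 vertices, one parent relationship colour (red),
# so the augmented sample matrix is (1 + 1) x (2n).
top = np.concatenate([Y[0], Y[1]])   # top row, blocks Y^(1) | Y^(2)
red = np.concatenate([Y[2], Y[2]])   # red row, kth block: sum of Y^(j) over red edges j -> k
M_blue = np.vstack([top, red])
assert M_blue.shape == (2, 2 * n)

# Black colour: alpha = 1 vertex (vertex 3), no parents, so the matrix is 1 x n.
M_black = Y[2].reshape(1, n)
```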
Example 4.2. Our running example 1 3 2 has two augmented sample matrices, one for each vertex colour: for the blue colour, M_{Y,blue} has top row (Y^(1) | Y^(2)) and red-indexed row (Y^(3) | Y^(3)); for the black colour, M_{Y,black} is the single row Y^(3).

Example 4.3. [The coloured DAG and augmented sample matrices of this example are omitted here.]

Algorithm 1 computes the MLE colour by colour: for each vertex colour s, it orthogonally projects the top row M^(0)_{Y,s} onto the span of the remaining rows of M_{Y,s}, and returns the MLE for Λ and Ω.

4.1. Usual DAG models. We recall the characterisation of MLE existence and uniqueness for usual Gaussian graphical models, as linear dependence conditions on the sample matrix. For a DAG G on m nodes, and n data samples, the sample matrix is Y ∈ R^{m×n}. For a node i in G we denote by Y^(i) the ith row of Y, by Y^(pa(i)) the sub-matrix of Y with rows indexed by the parents of i in G, and by Y^(pa(i)∪i) the sub-matrix of Y with rows indexed by node i and its parents.

Theorem 4.9. The MLE in the DAG model on G at sample matrix Y exists if and only if Y^(i) ∉ span{ Y^(j) : j ∈ pa(i) } for every node i; it exists uniquely if and only if, in addition, Y^(pa(i)) has full row rank for every node i.
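The regression viewpoint behind this characterisation can be sketched as follows (Python/NumPy; the three-node DAG with parent structure {0: [2], 1: [2]} and the sample size are hypothetical choices, not from the paper): each row Y^(i) is regressed on the rows of its parents, and the squared residual norm divided by n gives the variance estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
parents = {0: [2], 1: [2], 2: []}    # hypothetical DAG: edges 2 -> 0 and 2 -> 1
Y = rng.standard_normal((3, n))

Lam_hat = np.zeros((3, 3))
Omega_hat = np.zeros(3)
for i, pa in parents.items():
    if pa:
        # Least squares: coefficients of the orthogonal projection of
        # Y^(i) onto span{Y^(j) : j in pa(i)}.
        coef, *_ = np.linalg.lstsq(Y[pa].T, Y[i], rcond=None)
        Lam_hat[i, pa] = coef
    resid = Y[i] - Lam_hat[i] @ Y
    Omega_hat[i] = resid @ resid / n  # squared residual norm over n

assert np.all(Omega_hat > 0)         # residuals are nonzero almost surely here
```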
This result follows from viewing maximum likelihood estimation in a DAG as a sequence of regression problems. The acyclicity ensures that the sub-problems are uncoupled. We give a proof, so as to see its generalisation in Theorem 4.4.
Proof. We denote the entries of the MLEs Λ̂ and Ω̂ by λ̂_ij and ω̂_kk. The negative of the log-likelihood ℓ_Y, in terms of the parameters ω_ii and λ_ij, is, up to additive and multiplicative constants,
Σ_{i=1}^m ( n log ω_ii + (1/ω_ii) ‖ Y^(i) − Σ_{j ∈ pa(i)} λ_ij Y^(j) ‖² ).
We minimise the above expression. Each parameter only appears in one summand.

In the ith summand, the λ̂_ij always exist: they are coefficients of each Y^(j) in the orthogonal projection of Y^(i) onto the span of {Y^(j) : j ∈ pa(i)}. The λ̂_ij are unique if and only if Y^(pa(i)) has full row rank. Let ζ_i := ‖ Y^(i) − Σ_{j ∈ pa(i)} λ̂_ij Y^(j) ‖² be the squared norm of the residual. If Y^(i) ∈ span{ Y^(j) : j ∈ pa(i) }, then ζ_i = 0 and, in the limit ω_ii → 0, the log-likelihood tends to infinity and ω̂_ii does not exist. Otherwise, the minimum is attained uniquely at ω̂_ii = ζ_i/n. Combining these cases gives the theorem.

4.2. Proof of Theorem 4.4. The proof of Theorem 4.4 will be similar to the proof for uncoloured models in Theorem 4.9. We start by proving the following lemma.
Lemma 4.10. Fix α > 0 and, for γ ≥ 0, consider the family of functions f_γ(x) = α log x + γ/x on x > 0.
(i) If γ = 0, then f_γ is neither bounded from below nor bounded from above.
(ii) If γ > 0, then f_γ attains a global minimum at x_0 = γ/α with function value f_γ(x_0) = α log(γ/α) + α.
(iii) For γ > 0, the minimal value f_γ(x_0) is strictly increasing in γ.

Proof. Part (i) follows from the properties of log. To prove part (ii), one computes f′_γ(x) = α/x − γ/x² = (αx − γ)/x². Thus f′_γ(x) > 0 if and only if x > γ/α = x_0, and similarly one has f′_γ(x) < 0 if and only if x < γ/α = x_0. Therefore, x_0 is a global minimum of f_γ. One directly verifies the function value for f_γ(x_0), and so part (iii) follows from the monotonicity of the logarithm.
An MLE is a minimiser of the above expression. Each parameter occurs in exactly one of the summands over s ∈ c(I), because the prc(s) partition the edge colours by compatibility. We therefore minimise each summand separately.

Remark 4.11 (Comparison with usual DAG models). The framework of RDAG models includes usual DAG models as a special case; namely, when each colour is used only once. In this case, Theorem 4.4 specialises to Theorem 4.9. We see in the next section that imposing colours in a DAG reduces the threshold number of samples required for existence and uniqueness of the MLE.

Illustrative examples.
In this section we apply RDAG models to some small illustrative examples. We first apply our running example to model the effect of a mother's height on her two daughters' heights.
Example 4.12. The RDAG model on coloured graph 1 3 2 is given by 1, 2 ∼ N(0, ω) and 3 ∼ N(0, ω′), as in Example 0.1. Let variable 3 be the height (in cm) of a woman and let variables 1 and 2 be, respectively, the heights of her younger and older daughter. Vertices 1 and 2 both being blue indicates that, conditional on the mother's height, the variance of the daughters' heights is the same. The edges both being red encodes that the dependence of a daughter's height on the mother's height is the same for both daughters.
We saw in Example 4.5 that the MLE exists almost surely given one sample. We use Algorithm 1 to find the MLE, given one sample where the younger daughter's height is 159.75cm, the older daughter's height is 161.56cm, the mother's height is 155.32cm, and the population mean height is 163.83cm. Mean-centring the data gives Y^(1) = −4.08, Y^(2) = −2.27, Y^(3) = −8.51. The only black vertex is 3, and it has no parents, hence ω′ = ‖Y^(3)‖² = 72.42. The orthogonal projection of (Y^(1) | Y^(2)) onto span{(Y^(3) | Y^(3))} has coefficient λ = 0.37 and residual giving ω = ((−3.175 + 4.08)² + (−3.175 + 2.27)²)/2 = 0.82. As we would expect, the regression coefficient λ is positive, and the variance of the daughters' heights conditional on the mother's height is lower than the variance of the mother's height.
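The projections above can be reproduced in a few lines (Python/NumPy, using the sample data of this example):

```python
import numpy as np

heights = np.array([159.75, 161.56, 155.32])  # younger daughter, older daughter, mother
Y = heights - 163.83                          # mean-centre by the population mean

omega_prime = Y[2] ** 2                       # vertex 3 has no parents
top = Y[[0, 1]]                               # stacked blue row: (Y^(1) | Y^(2))
par = Y[[2, 2]]                               # stacked parent row: (Y^(3) | Y^(3))
lam = (top @ par) / (par @ par)               # orthogonal projection coefficient
resid = top - lam * par
omega = (resid @ resid) / 2                   # two blue vertices, one sample

assert round(float(omega_prime), 2) == 72.42
assert round(float(lam), 2) == 0.37
assert round(float(omega), 2) == 0.82
```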
We now consider multiple measurements taken in each generation.
Example 4.13. We consider measurements of the snout length and head length of dogs. These are the first two of the seven morphometric parameters in the study of clinical measurements of dog breeds in [MMB+20]. We compare two RDAG models. The black/square vertices 1 and 3 are the snout lengths of the two offspring. Blue/circular vertices 2 and 4 are their head lengths. The purple/triangular vertex 5 is the snout length of the parent and grey/pentagonal vertex 6 is the head length of the parent. The edges encode the dependence of the offsprings' traits on those of the parents. Maximum likelihood estimation in the left hand model is two copies of Example 4.12, one on the three odd variables, and one on the three even variables. Thus, given one sample a unique MLE exists almost surely. For the right hand model, Theorem 4.4 says that an MLE exists provided Y^(5) ≠ 0, Y^(6) ≠ 0, and neither (Y^(1) | Y^(3)) nor (Y^(2) | Y^(4)) is in span{(Y^(5) | Y^(5)), (Y^(6) | Y^(6))}. Hence an MLE exists almost surely with one sample. Moreover, the augmented sample matrices for the two offspring vertex colours have full row rank almost surely provided n ≥ 2, hence the MLE exists uniquely with two samples, by Theorem 4.4.
The vertex colours in an RDAG model could correspond to colours in an experiment, as follows.Fluorescent reporters can be used to take measurements in a cell.In [SRA + 21], the authors quantify data in a single living bacteria using fluorescent reporters in red, cyan, yellow, and green, see [SRA + 21, Figure 2a].Given such measurements taken for a parent cell and its daughter cells, we could consider analogues of Example 4.13 in which, for example, the red fluorescence in a daughter cell only depends on the red fluorescence in the parent, or larger models in which there can also be dependence on the fluorescence of other colours.

Maximum likelihood thresholds
In the previous section, we gave a characterisation of the existence and uniqueness of the MLE in terms of linear independence conditions. Here we turn this into results that depend only on the coloured graph and that hold for a generic sample matrix. We give upper and lower bounds on the thresholds for almost sure existence and uniqueness of the MLE. The bounds hold whenever the sample matrix has no extra linear dependencies among its rows.
For a fixed number of samples, the MLE in an RDAG model may exist but not be unique almost surely, which cannot happen in an uncoloured model.In fact, Example 5.5 gives a family of models for which the gap between the existence and uniqueness thresholds becomes arbitrarily large.
As well as assuming compatibility of the colouring, we often assume in this section that there are no edges between vertices of the same colour.Some of our bounds involve the following notion of generic rank.
Definition 5.1.Let M Y be a matrix whose entries are linear combinations of the entries of a matrix Y .The generic rank of M Y is its rank for generic Y .
For Y ∈ R m×n , we often study the generic rank of M Y by considering it as a symbolic matrix whose entries are linear forms in the mn indeterminates Y ij .
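In practice, the generic rank can be estimated by substituting random values for the indeterminates: with probability one, a random real substitution attains the generic rank. A minimal sketch (the helper and the example matrix shape are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def generic_rank(build_matrix, m, n, trials=5):
    """Estimate the generic rank of a matrix M_Y whose entries are linear
    combinations of the entries of Y, by random substitution for Y."""
    return max(
        np.linalg.matrix_rank(build_matrix(rng.standard_normal((m, n))))
        for _ in range(trials)
    )

# Illustration on the running example: rows (Y1, Y2) and (Y3, Y3).
def M(Y):
    return np.vstack([np.concatenate([Y[0], Y[1]]),
                      np.concatenate([Y[2], Y[2]])])

print(generic_rank(M, m=3, n=1))  # generic rank 2 when n = 1
```

Taking the maximum over a few trials guards against an unlucky substitution; any single random trial already attains the generic rank almost surely.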
Example 5.2. Consider the graph with two blue (circular) vertices and five black (square) vertices, where each of five edge colours joins one black vertex to both blue vertices. For n = 1, the associated matrix M_{Y,•} is a 6 × 2 matrix of generic rank two. Removing its top row gives a 5 × 2 matrix of generic rank one.
Define the augmented sample matrix M_{Y,s} ∈ R^{(β_s+1)×α_s n} as in Definition 4.1. We let M̄_{Y,s} ∈ R^{β_s×α_s n} be obtained from M_{Y,s} by removing its top row. As before, α_s is the number of vertices of colour s, and β_s is the number of colours of edges pointing to a vertex of colour s. Let r_s be the generic rank of M̄_{Y,s} when n = 1. Let mlt_e (resp. mlt_u) denote the minimal number of samples needed for almost sure existence (resp. uniqueness) of the MLE. Theorem 5.3 bounds the thresholds mlt_e and mlt_u, under the assumptions that the colouring is compatible and that (G, c) has no edges between vertices of the same colour; the bounds are maxima, over the vertex colours s, of per-colour bounds in terms of α_s, β_s and r_s, see Propositions 5.9 and 5.10. It remains an open problem to turn the bounds in Theorem 5.3 into formulae for mlt_e and mlt_u.
Problem 5.4.Determine the maximum likelihood thresholds mlt e and mlt u of an RDAG model, as formulae involving properties of the graph G and its colouring c.
Note that the upper bounds for existence and uniqueness are both at most max_s β_s + 1, which is the threshold for uniqueness in the (uncoloured) DAG case, see [DFKP19]. Hence the RDAG thresholds are at most the DAG threshold.
Example 5.5. We find the existence and uniqueness thresholds for the RDAG on the graph in Example 5.2. The black (square) vertices have no children, so the matrix M_{Y,■} has full rank as soon as n ≥ 1. The thresholds are therefore determined by the matrix M_{Y,•}. The generic rank of M̄_{Y,•} is one when n = 1, i.e. r_• = 1. Theorem 5.3 and Proposition 5.10 then give bounds on mlt_e and mlt_u. In fact, the bounds are attained, as follows. When n = 1, the top row (Y^{(1)}, Y^{(2)}) is almost surely not contained in the span of the other rows of M_{Y,•}, hence mlt_e = 1. On the other hand, we need at least n = 5 samples for generic linear independence of the rows (Y^{(3)}, Y^{(3)}), ..., (Y^{(7)}, Y^{(7)}), hence mlt_u = 5.
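These rank computations can be verified numerically. The sketch below assumes the graph of Example 5.2 has blue vertices 1, 2 and black vertices 3, ..., 7, with edge colour t joining vertex t + 2 to both blue vertices; it checks the rank of M_{Y,•} and of its bottom five rows for generic samples as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)

def M_blue(Y):
    """Augmented sample matrix M_{Y,bullet}: top row (Y1, Y2), then one
    row (Y_{t+2}, Y_{t+2}) per edge colour t = 1, ..., 5 (our indexing)."""
    rows = [np.concatenate([Y[0], Y[1]])]
    rows += [np.concatenate([Y[t], Y[t]]) for t in range(2, 7)]
    return np.vstack(rows)

for n in (1, 4, 5):
    Y = rng.standard_normal((7, n))
    M = M_blue(Y)
    print(n, np.linalg.matrix_rank(M), np.linalg.matrix_rank(M[1:]))
# n=1: ranks 2 and 1 (top row escapes the span: MLE exists, mlt_e = 1)
# n=4: ranks 5 and 4 (not yet full row rank 6)
# n=5: ranks 6 and 5 (full row rank: unique MLE, mlt_u = 5)
```

The bottom five rows all lie in the n-dimensional "diagonal" subspace {(v, v)}, which is why full row rank six first occurs at n = 5.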
This example extends to an arbitrary number of vertices: for k ≥ 1, take the graph with k + 2 vertices, two blue/circular and k black/square, and 2k edges of k colours (arranged as in the k = 5 case above). Repeating the above argument gives mlt_e = 1 and mlt_u = k.
Remark 5.6. Theorem 5.3 applies to all RDAG models with compatible colouring that are equal to some RCON model, because such models never have edges between vertices of the same colour, as follows. Suppose otherwise, and take a vertex i of minimal index with an edge i ← j in G having c(i) = c(j). Then no child of i has colour c(i), therefore G_i ≇ G_j, a contradiction to Theorem 3.4(b).
We modify the edges from Examples 5.2 and 5.5 to see how the thresholds change.

In both cases we have α_• = 2, β_• = 5, and r_• = 2. Thus, Theorem 5.3 gives bounds on the thresholds; in fact, Proposition 5.10 gives an upper bound of 4 on mlt_u, since r_• = 2. First, we study the left-hand RDAG. When n = 2, the top row is not in the linear span of the other five rows of M_{Y,•}, and we deduce mlt_e = 2. For generic uniqueness of an MLE, we need M_{Y,•} ∈ R^{6×2n} to have full row rank six. For the right-hand RDAG, when n = 2 the top row is generically contained in the linear span of the other rows of M_{Y,•}; from the bounds we conclude that mlt_e = 3.

Proof. We think of the mn entries of Y as indeterminates. Let M̄_{Y,s} ∈ R^{β_s×α_s n} be the matrix with rows M^{(1)}_{Y,s}, ..., M^{(β_s)}_{Y,s}. We construct an invertible β_s × β_s submatrix of M̄_{Y,s}.
Without loss of generality, let 1, 2, ..., α_s be the vertices of colour s. The matrix M̄_{Y,s} has α_s blocks of size β_s × n. For each parent relationship colour p_t, t ∈ [β_s], there is some vertex i = i(t) ∈ [α_s] such that there is an edge of colour p_t pointing towards vertex i(t). That is, the i(t)-th block of M̄_{Y,s} has non-zero entries in its t-th row. Let C_t be the t-th column of that block, which exists since n ≥ β_s. By construction, the t-th entry of C_t is non-zero. We show that the matrix C = (C_1 C_2 ··· C_{β_s}) is invertible.
An entry of C is either a sum of variables or it is zero. By construction, column C_t only contains (sums of) elements of the t-th column of Y. The same variable Y_{j,t} cannot occur in two different entries of C_t, because there is at most one edge from j to vertex i(t). Altogether, the entries of C are (possibly empty) sums of variables, and each variable occurs in at most one entry of C. The determinant is an alternating sum of products over permutations, and it is enough to show that one product is non-zero. By construction, the product of the diagonal entries of C is a non-zero polynomial, and since each variable occurs in at most one entry of C, it cannot be cancelled by the products coming from the other permutations. Hence C is generically invertible.

Proposition 5.9. Consider the RDAG model on (G, c) where colouring c is compatible, with no edges between any vertices of colour s. Then mlt_e(s) ≤ ⌊β_s/α_s⌋ + 1, with equality mlt_e(s) = β_s + 1 when α_s = 1.
Proof. If α_s = 1, the equality mlt_e(s) = β_s + 1 is known from the uncoloured case. It remains to consider α_s ≥ 2. For the upper bound, we show that if n > β_s/α_s, then the top row of M_{Y,s} is generically not in the span of the other rows. Since there are no edges between two vertices of colour s, the nα_s entries of the top row M^{(0)}_{Y,s} are all independent, from each other and from the entries of the other rows. If β_s < α_s n, the other β_s rows do not span R^{nα_s}, so a generic choice of top row will not lie in their span.
For the lower bound, consider the β_s × α_s matrix M̄_{Y,s} for n = 1; its generic rank is r_s. We consider the 1 × n matrix blow-up, where the scalar variables Y^{(i)} are replaced by generic row vectors of length n, to give a β_s × α_s n matrix, and we consider the rank of this matrix as n increases. The maximum rank is β_s, which occurs for n ≥ β_s by Lemma 5.8. Moreover, the rank is a (weakly) concave function of n [DM17, Corollary 2.8]. Since the rank is positive integer valued, it cannot be the same at two distinct values of n unless it is at its maximum. Hence the rank for fixed n is at least min(r_s + n − 1, β_s). We exclude the possibility that the smaller of the two arguments in the minimum is β_s + 2 − r_s by appealing to Proposition 5.10.

Proposition 5.10. Consider the RDAG model on (G, c) where colouring c is compatible, with no edges between any vertices of colour s. Then ⌈(β_s + 1)/α_s⌉ ≤ mlt_u(s) ≤ max(⌈(β_s + 1)/α_s⌉, β_s + 1 − r_s).

Proof. For the lower bound, we observe that if α_s n ≤ β_s, the β_s + 1 rows of M_{Y,s} will be linearly dependent. Hence, we need n > β_s/α_s. For the upper bound, let M̄_{Y,s} and r_s be as above. Recall from the proof of Proposition 5.9 that, for n samples, rk(M̄_{Y,s}) ≥ min(r_s + n − 1, β_s) generically. Thus, for n = β_s + 1 − r_s the matrix M̄_{Y,s} generically has full row rank β_s. It remains to consider the top row of M_{Y,s}. We must have β_s ≤ α_s n, otherwise the β_s × α_s n matrix could not have full row rank. We look separately at the possible cases β_s < nα_s and β_s = nα_s. If β_s < nα_s, the row vector M^{(0)}_{Y,s} ∈ R^{nα_s} is generically not in the span of the β_s rows of M̄_{Y,s}, because there are no edges between vertices of colour s. Thus, M_{Y,s} generically has full row rank β_s + 1, and the stated upper bound on mlt_u(s) follows. If β_s = nα_s, increasing n by one puts us in the previous case, which again gives the stated bound.

Proof of Theorem 5.3. We have mlt_e = max_s mlt_e(s) and mlt_u = max_s mlt_u(s), by Theorem 4.4 parts (b) and (c). Taking the maximum of the lower and upper bounds in Propositions 5.9 and 5.10, over all vertex colours, gives the stated bounds.

A randomised algorithm.
Proposition 5.11.For an RDAG model on (G, c), where colouring c is compatible, there is a randomised algorithm for computing the thresholds mlt e and mlt u .
Proof. It suffices to give a randomised algorithm to compute mlt_e(s) and mlt_u(s) for a fixed vertex colour s. The rank of a symbolic matrix can be computed (efficiently) by a randomised algorithm, see e.g. [Lov79, Sch80]. Hence, thinking of the entries of Y ∈ R^{m×n} as indeterminates, we can compute for any n ≥ 1 the rank of the symbolic (β_s + 1) × α_s n matrix M_{Y,s}, as well as the rank of the symbolic β_s × α_s n matrix M̄_{Y,s}. We obtain mlt_e(s) as the smallest n such that rk(M_{Y,s}) > rk(M̄_{Y,s}), and mlt_u(s) as the smallest n such that rk(M_{Y,s}) = β_s + 1. The algorithm terminates, by the upper bound of β_s + 1 for both mlt_e(s) and mlt_u(s).
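The randomised algorithm can be sketched as follows. The stopping bound follows the proof; the encoding of M_{Y,s} as a pattern of vertex indices (with None for a zero block) is our own device.

```python
import numpy as np

rng = np.random.default_rng(3)

def build(pattern, Y):
    """Build M_{Y,s} from a (beta_s + 1) x alpha_s pattern of vertex
    indices; None encodes a zero block (our encoding)."""
    n = Y.shape[1]
    return np.vstack([
        np.concatenate([Y[i] if i is not None else np.zeros(n) for i in row])
        for row in pattern
    ])

def thresholds(pattern, m):
    """Randomised computation of mlt_e(s) and mlt_u(s): substitute random
    values for Y and compare the ranks of M_{Y,s} and its bottom rows."""
    beta = len(pattern) - 1
    mlt_e = mlt_u = None
    for n in range(1, beta + 2):  # terminates by the bound beta_s + 1
        Y = rng.standard_normal((m, n))
        M = build(pattern, Y)
        rk, rk_bar = np.linalg.matrix_rank(M), np.linalg.matrix_rank(M[1:])
        if mlt_e is None and rk > rk_bar:
            mlt_e = n
        if mlt_u is None and rk == beta + 1:
            mlt_u = n
    return mlt_e, mlt_u

# Running example 1 <- 3 -> 2: top row (Y1, Y2), one parent row (Y3, Y3).
print(thresholds([[0, 1], [2, 2]], m=3))  # (1, 1)
```

For the graph of Example 5.5 (pattern `[[0, 1]] + [[t, t] for t in range(2, 7)]` with m = 7) the same routine returns mlt_e(s) = 1 and mlt_u(s) = 5.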

Simulations
In the previous section, we gave upper and lower bounds for the maximum likelihood thresholds for RDAG models, see Theorem 5.3.Our bounds quantify how the graph colouring serves to decrease the number of samples needed for existence and uniqueness of the MLE.In this section, we assume that the number of samples is above the maximum likelihood threshold.We explore via simulations the distance of an MLE to the true model parameters.We compare the RDAG model estimate from Algorithm 1 to the usual (uncoloured) DAG model MLE.
The details of our simulations are as follows. We used the NetworkX Python package [HSSC08] to build an RDAG model via the following steps. We first build a DAG by generating an undirected graph according to an Erdős–Rényi model that includes each edge with fixed probability, and then directing the edges so that j → i implies j > i. We assign edge colours randomly, after fixing the total number of possible edge colours. We choose the unique vertex colouring with the largest number of vertex colours that satisfies the compatibility assumption from Definition 2.5. We sample edge weights λ_{s,t} from a uniform distribution on [−1, −0.25] ∪ [0.25, 1], and we sample noise variances ω_ss uniformly from [0, 1]. Our code is available at https://github.com/seigal/rdag.
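The graph-generation steps above can be sketched as follows. The paper's code uses NetworkX; this is a dependency-free sketch of the same steps, with our own function and variable names, and with one noise variance per vertex for simplicity.

```python
import random

random.seed(4)

def random_rdag(m, p, num_edge_colours):
    """Sketch of the simulation set-up: an Erdos-Renyi graph on m vertices,
    edges directed j -> i for j > i, random edge colours, and random
    parameter values (one shared weight per edge colour)."""
    # Erdos-Renyi edges, already directed from higher to lower index.
    edges = [(j, i) for i in range(m) for j in range(i + 1, m)
             if random.random() < p]
    # Random edge colours; edges of the same colour share a weight.
    colour = {e: random.randrange(num_edge_colours) for e in edges}
    # Edge weights from [-1, -0.25] u [0.25, 1], noise variances from [0, 1].
    lam = {c: random.choice([-1, 1]) * random.uniform(0.25, 1)
           for c in set(colour.values())}
    omega = [random.uniform(0, 1) for _ in range(m)]
    return edges, colour, lam, omega

edges, colour, lam, omega = random_rdag(10, 0.5, 5)
assert all(j > i for j, i in edges)  # j -> i implies j > i, so the graph is acyclic
```

Directing every edge from the higher to the lower index guarantees acyclicity without any explicit cycle check.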
The RDAG MLE is generally closer to the true model parameters than the DAG MLE, see Figure 1.As we would expect, both estimates get closer to the true parameters as the number of samples from the distribution increases.At a high number of samples, the difference between the RDAG MLE and the DAG MLE is smaller than at a low number of samples.Next we examined how the RDAG MLE was affected by the number of edge colours, see Figure 2. The RDAG MLE is closest to the true parameters when the number of edge colours is small; i.e., when there are fewer parameters to learn.As the number of edge colours increases, the difference between the RDAG MLE and the DAG MLE decreases.Note that the DAG model is the setting where each vertex and edge has its own colour.
Finally, we looked at how the RDAG MLE and DAG MLE are affected by the edge density of the graph, see Figure 3. The RDAG MLEs get closer to the true parameter values as the edge density increases: more edges have the same weight, so more samples contribute to estimating each edge weight. By comparison, the DAG MLEs get further from the true parameters as the edge density increases, because there are more parameters to learn.

Given a set A of invertible m × m matrices and Y ∈ R^{m×n}, we call Y: (i) unstable if 0 lies in the closure of A • Y; (ii) semistable if 0 does not lie in the closure of A • Y; (iii) polystable if Y ≠ 0 and the set A • Y is (Euclidean) closed; (iv) stable if Y is polystable and the stabiliser A_Y is finite.
The above notions of stability are usually studied for A a reductive group [Dol03,MFK94].They were studied for reductive and non-reductive groups in [AKRS21].We are not aware of these definitions being used before without any group structure on A.
We relate maximum likelihood estimation for an RDAG model to these stability notions. For a coloured graph (G, c), we recall the definition of A(G, c) from (7). To prove Theorem A.2 and Proposition A.3, we first generalise [AKRS21, Proposition 3.4 and Theorem 3.6] to no longer require that A is a group. We say that a set A is closed under non-zero scalar multiples if a ∈ A implies ta ∈ A for all t ∈ R^×.
Proposition A.4. Let A be a set of real invertible m × m matrices, closed under non-zero scalar multiples. There is a correspondence between stability under A^±_SL and maximum likelihood estimation in the model M_A given sample matrix Y ∈ R^{m×n}: (a) Y is unstable if and only if the likelihood is unbounded from above; (b) Y is semistable if and only if the likelihood is bounded from above; (c) if Y is polystable, then an MLE exists. The MLEs, if they exist, are the matrices λa^T a, where a ∈ A^±_SL minimises ‖a • Y‖ and λ > 0 is an explicit scalar determined by ‖a • Y‖. If A contains an orthogonal matrix of determinant −1, then A^±_SL can be replaced by A_SL.
Proof. This is proved using the same argument as [AKRS21, Proposition 3.4 and Theorem 3.6]. Maximising the likelihood over M_A is equivalent to minimising a function f over a ∈ A. We write a ∈ A as τb, where τ ∈ R_{>0} and b ∈ A^±_SL. Setting x := τ², we compute that minimising f over the scalings of a fixed b amounts to minimising x ↦ xC − m log(x), where C ≥ 0 is proportional to ‖b • Y‖². The infimum of the function x ↦ xC − m log(x) increases as C ≥ 0 increases, hence the outer minimisation reduces to minimising ‖b • Y‖ over b ∈ A^±_SL. We have inf_{a∈A} f(a) = −∞ if and only if inf_{b∈A^±_SL} ‖b • Y‖² = 0, i.e. if and only if Y is unstable. This shows parts (a) and (b).
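The one-dimensional minimisation used here is elementary; for C > 0, differentiating gives the minimiser explicitly:

```latex
% Minimising x \mapsto xC - m\log(x) over x > 0, for fixed C > 0:
\frac{\mathrm{d}}{\mathrm{d}x}\bigl(xC - m\log x\bigr) \;=\; C - \frac{m}{x} \;=\; 0
\quad\Longleftrightarrow\quad x = \frac{m}{C},
\qquad
\inf_{x>0}\bigl(xC - m\log x\bigr) \;=\; m + m\log\frac{C}{m},
```

and the right-hand side is increasing in C, which is the monotonicity used in the proof.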
To prove (c), assume that Y is polystable. Since M^{(0)}_{Y,s} ∉ X_s for all s ∈ c(I) by assumption, we have 0 ∉ H_s, and hence H_s has codimension at least one in V_s. We define the linear subspace K_s := R M^{(0)}_{Y,s} ⊕ X_s of V_s. Since T acts on each V_s by the non-zero scalar of colour s, each H_s is an affine subspace of K_s of codimension one. Therefore, there exists a linear form p_s ∈ K_s^* such that H_s = V_{K_s}(p_s − 1), where V(•) denotes the vanishing locus.
We finish the proof by showing that the orbit is closed, using the choice of p_s ∈ K_s^* and the fact that det(a) = ∏_s a_s^{α_s} = 1. On the other hand, Y is stable if and only if M_{Y,s} has full row rank for all s ∈ c(I).

Proof. Proposition A.4 in combination with Theorem 4.4 yields part (a) and the forwards direction of (c), while Lemma A.6 gives the backwards direction of (c). We obtain part (b) as a direct consequence of (a) and (c). For part (d), if the rows of M̄_{Y,s} are linearly dependent for some s ∈ c(I), then the equation ∑_{t∈prc(s)} a_{s,t} M^{(t)}_{Y,s} = 0 has infinitely many solutions. Distinct solutions give distinct unipotent matrices a ∈ A, by using a_{s,t} as the entry for edge colour t ∈ prc(s), and setting all other off-diagonal entries of a to zero. Such a unipotent matrix a ∈ A satisfies aY = Y, since the sets prc(s) are disjoint, so the a_{s,t} do not affect any rows of Y with a different vertex colour. In conclusion, A_Y is infinite if M̄_{Y,s} has linearly dependent rows for some s ∈ c(I).

Appendix B. Connections to Gaussian group models
A model is a Gaussian group model if it is equal to M_A, see (1), where the set A is a group. In this case, the second term in the log-likelihood (3) is the minimisation of the norm over a group orbit. This perspective was used in [AKRS21] to relate existence of the MLE to notions of stability under a group action. In this appendix, we characterise when the set of matrices A(G, c) from (7) is a group. We use Popov's criterion to study stability, and give our third and final description of the set of MLEs in an RDAG model. Then there exists k ∈ I with j → k and k → i in G, i.e. G must be transitive, by contraposition. We have shown that (1), (2) and (3) are satisfied if and only if conditions (a) and (b) hold.
Example B.3. Surprisingly, two graphs can have all the same butterfly graphs without being isomorphic. We present an example. Consider the following coloured graph with 10 black (square) vertices, and edges that are red (solid), green (squiggly), orange (dashed) or brown (dotted). We add some further edges: four purple edges a_1 → c_i, four blue edges b_i → d_1, and a yellow edge a_1 → d_1. Now consider the graph obtained by exchanging the green (squiggly) and orange (dashed) edges. The butterfly graphs for the two graphs are the same, as follows. On the yellow edge, both butterfly graphs have four paths consisting of a brown edge followed by a blue edge, and four that are a purple edge followed by a brown edge. Similarly, we can check the butterfly graphs at the other edge colours.
However, the two coloured graphs are not isomorphic. Indeed, the only way to get an isomorphism is to permute the b-layer and the c-layer. The red (solid) edges give the identity permutation, the orange (dashed) edges give the cycle σ = (1 4 3 2), and the green (squiggly) edges give σ². Hence an isomorphism would need to consist of permutations μ_1 and μ_2 of {1, 2, 3, 4} with μ_1 id μ_2 = id, μ_1 σ μ_2 = σ², and μ_1 σ² μ_2 = σ. The first condition implies μ_2 = μ_1^{−1}, hence σ and σ² need to be simultaneously conjugate to σ² and σ, respectively. This implies (σ²)² = σ; but (σ²)² = σ⁴ = id, a contradiction.

When A(G, c) is a group, we can prove the important Lemma A.6 differently, via a criterion of Popov [Pop89, Theorem 4]. The criterion characterises when an orbit under a connected solvable group is closed, provided the underlying field is algebraically closed. Due to the latter assumption, we work in this subsection with RDAG models defined over the complex numbers, and note that many of our results and proofs carry over to the complex case, see Remark 2.11. We start by describing Popov's criterion for the group G := A(G, c)_SL.

B.2. Popov's criterion.
Since G ⊆ GL_m(C) is a group of invertible upper triangular matrices, it is solvable. We decompose G = T • U ⊆ GL_m as a semi-direct product, where T is the subgroup of diagonal matrices in G, and U is the subgroup of unipotent matrices in G.

Example 1.1.
Let M = PD_m. The unique maximiser of the likelihood is Ψ = S_Y^{−1}, if S_Y is invertible. If S_Y is not invertible, the likelihood function is unbounded from above, see [Sul18, Proposition 5.3.7]. The existence and uniqueness thresholds are therefore both m, since with m samples the matrix S_Y is almost surely invertible.
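A minimal numerical check of this example, assuming S_Y denotes the sample covariance YY^T/n:

```python
import numpy as np

rng = np.random.default_rng(5)

m = 4
# With n = m samples, S_Y is almost surely invertible and Psi = S_Y^{-1}.
Y = rng.standard_normal((m, m))
S = Y @ Y.T / m                 # sample covariance S_Y
Psi = np.linalg.inv(S)          # MLE of the concentration matrix
assert np.allclose(Psi @ S, np.eye(m))

# With n < m samples, S_Y is singular and the likelihood is unbounded.
Y_small = rng.standard_normal((m, m - 1))
S_small = Y_small @ Y_small.T / (m - 1)
assert np.linalg.matrix_rank(S_small) == m - 1
```

The rank-deficient case illustrates why both thresholds equal m for the full cone PD_m.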

Theorem 3.4.
Consider the RDAG model on (G, c) where colouring c is compatible. The RDAG model on (G, c) is equal to the RCON model on (G^u, c) if and only if: (a) G has no unshielded colliders; (b) G_i ≅ G_j for every pair of vertices i, j of the same colour; and (c) G_{(j→i)} ≅ G_{(l→k)} for every pair of edges j → i and l → k of the same colour. In our running example, the graphs G_i for the black (square) vertices consist of one black vertex, and the G_i for the blue (circular) vertices are isomorphic to one another.

Theorem 3.7 (Gaussian special case of [AMP97, Theorem 3.1], [Fry90, Theorem 5.6]). The DAG model on G is equal to the undirected Gaussian graphical model on G^u if and only if G has no unshielded colliders.

We now prove Theorem 3.4 via two propositions.

Proposition 3.8. Let (G, c) be a coloured DAG with compatible colouring c. The RDAG model on (G, c) is contained in the RCON model on (G^u, c) if and only if (a) G has no unshielded colliders; (b) G_i ≅ G_j for every pair of vertices i, j of the same colour; and (c) G_{(j→i)} ≅ G_{(l→k)} for every pair of edges j → i and l → k of the same colour.
Proposition 3.9. Let (G, c) be a coloured DAG with compatible colouring c such that (a) G has no unshielded colliders; (b) G_i ≅ G_j for every pair of vertices i, j of the same colour; and (c) G_{(j→i)} ≅ G_{(l→k)} for every pair of edges j → i and l → k of the same colour. Then the RCON model on (G^u, c) is contained in the RDAG model on (G, c).
Proof of Theorem 3.4. If any of conditions (a), (b) and (c) fails, then the RDAG model is not contained in the RCON model, by Proposition 3.8, which rules out the two models being equal. If conditions (a), (b) and (c) hold, we have containment of the RDAG model inside the RCON model (by Proposition 3.8) and the reverse containment (by Proposition 3.9).
Theorem 4.4. Consider the RDAG model on (G, c) where colouring c is compatible, and fix a sample matrix Y ∈ R^{m×n}. The following possibilities characterise maximum likelihood estimation given Y: (a) the likelihood ℓ_Y is unbounded from above ⇔ M^{(0)}_{Y,s} ∈ span{M^{(t)}_{Y,s} : t ∈ [β_s]} for some s ∈ c(I); (b) an MLE exists ⇔ M^{(0)}_{Y,s} ∉ span{M^{(t)}_{Y,s} : t ∈ [β_s]} for all s ∈ c(I); (c) the MLE exists uniquely ⇔ M_{Y,s} has full row rank for all s ∈ c(I).

Example 4.5. For our running example 1 ← 3 → 2, Theorem 4.4 says that the MLE exists uniquely provided Y^{(3)} ≠ 0 and (Y^{(1)}, Y^{(2)}) is not parallel to (Y^{(3)}, Y^{(3)}). This holds almost surely as soon as we have one sample, as we mentioned in Example 0.1.

Example 4.6. Returning to Example 4.3, the MLE given Y exists provided M_{Y,■} = (Y^{(3)} ··· Y^{(7)}) ≠ 0 and (Y^{(1)}, Y^{(2)}) is not in the span of the other rows of M_{Y,•}. The MLE is unique if and only if M_{Y,•} has full row rank, since this also implies M_{Y,■} ≠ 0.

The proof of Theorem 4.4 gives Algorithm 1 for finding the MLE in an RDAG model with compatible colouring. The MLE is returned as entries of the matrices Λ and Ω. We give the MLE in a closed-form formula, as a collection of least squares estimators.

Remark 4.7. In Algorithm 1, the entries of Ω are given as {ω_ss : s ∈ c(I)}. The entries of Λ are returned as {λ_{s,t} : s ∈ c(I), t ∈ prc(s)}, which equals the set of edge colours by compatibility, since edge colour t only appears in prc(s) for one s.

The proof of Theorem 4.4 directly gives a description of the set of MLEs.

Corollary 4.8. Consider the RDAG model on (G, c) where colouring c is compatible, with sample matrix Y ∈ R^{m×n}. If (Λ, Ω) and (Λ′, Ω′) are two MLEs, then Ω = Ω′ and ∑_{t∈prc(s)} (λ_{s,t} − λ′_{s,t}) M^{(t)}_{Y,s} = 0, for all s ∈ c(I).

Algorithm 1: Computing the MLE for an RDAG model
input: a coloured DAG (G, c) and sample matrix Y
output: an MLE given Y in the RDAG model on (G, c), if one exists
for s ∈ c(I) do
  α_s := |c^{−1}(s)|; β_s := |prc(s)|;
  construct the matrix M_{Y,s} ∈ R^{(β_s+1)×α_s n};
  P_{Y,s} := orthogonal projection of M^{(0)}_{Y,s} onto span{M^{(t)}_{Y,s} : t ∈ prc(s)};

Theorem 4.9.
Consider the DAG model on G, with m nodes, and fix sample matrix Y ∈ R^{m×n}. The following possibilities characterise maximum likelihood estimation given Y, in analogy with Theorem 4.4.

Proof of Theorem 4.4. Since colouring c is compatible, the RDAG model equals M_{A(G,c)}, by Proposition 2.7. That is, for Ψ = (id − Λ)^T Ω^{−1} (id − Λ) in the RDAG model, the matrix a = Ω^{−1/2}(id − Λ) is in A(G, c) and satisfies Ψ = a^T a. As usual, let α_s := |c^{−1}(s)| and β_s := |prc(s)|. The entry of Ω at vertex colour s is denoted ω_ss, and the edge colour entries of Λ that point towards colour s are labelled λ_{s,t}, t ∈ [β_s]. Using the construction of the matrices M_{Y,s} ∈ R^{(β_s+1)×α_s n} and the fact that det(id − Λ) = 1, we compute the log-likelihood as a sum over the vertex colours s. If ζ_s > 0, then the summand α_s log(ω_ss) + ζ_s/(nω_ss) has unique minimiser ω̂_ss = ζ_s/(nα_s), by Lemma 4.10(ii). Hence, an MLE exists if ζ_s > 0 for all s ∈ c(I), which proves "⇐" in (b). As the right-hand sides of (a) and (b) are opposites, we have proved (a) and (b). Since the ω̂_ss are uniquely determined (if they exist), an MLE (if it exists) is unique if and only if for all s ∈ c(I) the vectors M^{(t)}_{Y,s}, t ∈ [β_s], are linearly independent. In combination with the condition for MLE existence from part (b), we deduce (c).
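The minimiser ω̂_ss used here comes from the same one-variable calculus as before:

```latex
% Minimising \omega \mapsto \alpha_s \log(\omega) + \zeta_s/(n\omega) over \omega > 0:
\frac{\mathrm{d}}{\mathrm{d}\omega}\Bigl(\alpha_s\log\omega + \frac{\zeta_s}{n\omega}\Bigr)
= \frac{\alpha_s}{\omega} - \frac{\zeta_s}{n\omega^2} = 0
\quad\Longleftrightarrow\quad
\widehat{\omega}_{ss} = \frac{\zeta_s}{n\alpha_s},
```

valid for ζ_s > 0; the second derivative is positive at the critical point, so this is indeed the unique minimiser.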

Theorem 5.3.
Consider the RDAG model on (G, c) where colouring c is compatible and (G, c) has no edges between vertices of the same colour. Then mlt_e and mlt_u satisfy the bounds obtained by maximising the per-colour bounds of Propositions 5.9 and 5.10 over all vertex colours s.

Example 5.7.
Consider the following DAGs, both with compatible colouring. For Y ∈ R^{7×n}, we obtain the respective augmented sample matrices M_{Y,•}. For the left-hand graph, the relevant submatrix generically has rank two; therefore, M_{Y,•} has rank at most five if n = 3, but n = 4 suffices for M_{Y,•} generically to have full row rank. We conclude mlt_u = 4. Next, we study the right-hand RDAG. For n = 3, the relevant submatrices of M_{Y,•} generically have rank three, and the zero pattern then ensures that M_{Y,•} has full row rank six. Combining with the lower bound 3 ≤ mlt_u gives mlt_u = 3.

5.1. Proof of Theorem 5.3. For a fixed vertex colour s, we define mlt_e(s) to be the smallest n such that the top row of M_{Y,s} is almost surely not in the span of the other β_s rows, and define mlt_u(s) to be the smallest n such that the matrix M_{Y,s} is almost surely of full row rank β_s + 1. To prove Theorem 5.3, we use the following lemma.

Lemma 5.8. Consider the RDAG model on (G, c) where colouring c is compatible, and fix a vertex colour s. For n ≥ β_s and generic Y ∈ R^{m×n}, the row vectors M^{(1)}_{Y,s}, ..., M^{(β_s)}_{Y,s} are linearly independent.

Figure 1.
Figure 1. We generated RDAGs on 10 vertices, with each edge present with probability 0.5 and 5 edge colours. We sampled from the distribution n ∈ {5, 10, 100, 1000, 10000} times. For each n we generated 50 random graphs and computed the RDAG MLE and the DAG MLE, comparing them to the true parameter values on a log scale. The results are displayed in a violin plot, with blue for the RDAG MLE and orange for the DAG MLE.

Figure 2.
Figure 2. We generated RDAGs on 10 vertices, each edge present with probability 0.5 and number of edge colours in {2, 5, 10, 50, 100}. We sampled from the distribution 100 times and compared the MLE to the true parameter values on a log scale. The DAG MLE is shown in orange for comparison.

Figure 3.
Figure 3. We generated RDAGs on 10 vertices, each edge present with probability in {0.1, 0.3, 0.5, 0.7, 0.9} and 5 edge colours. For each edge probability we generated 50 random graphs, sampled from each one 100 times, and compared the RDAG and DAG MLEs. As above, blue is the RDAG MLE and orange is the DAG MLE.
Given a set A ⊆ GL_m, we define A_SL = {a ∈ A | det(a) = 1} and A^±_SL = {a ∈ A | det(a) = ±1}.

Theorem A.2. Consider an RDAG model on (G, c) with compatible colouring c and sample matrix Y ∈ R^{m×n}. Then stability under A(G, c)_SL relates to maximum likelihood estimation: (a) Y unstable ⇔ ℓ_Y unbounded from above; (b) Y semistable ⇔ ℓ_Y bounded from above; (c) Y polystable ⇔ MLE exists; (d) Y stable ⇔ MLE exists uniquely.

Proposition A.3. Fix the RDAG model on (G, c) and set A := A(G, c)_SL. If λa^T a is an MLE given Y, where a ∈ A and λ > 0 is a scalar, then the set of MLEs given Y is in bijection with the stabiliser A_Y, under mapping b ∈ A_Y to λ(a + b − id)^T (a + b − id).

Theorem A.2 applies to any DAG model, see Remark 4.11. Therefore, Theorem A.2 generalises [AKRS21, Theorem 5.3] in multiple ways. First, it extends from transitive DAGs to all DAGs. Second, it extends from uncoloured DAG models to RDAG models. Third, it adds part (d) about stable samples. Moreover, Proposition A.3 gives a bijection between the MLEs and the stabilising set.
Thus, an MLE given Y exists. If an MLE exists, then the inner and outer infima in (19) are attained, and any MLE has the form in the statement. If A contains an orthogonal matrix of determinant −1, we can write from the outset a = τob with τ ∈ R_{>0}, b ∈ A_SL and o orthogonal of determinant ±1. Setting x := τ², we derive the same computation for f(a) (now with b ∈ A_SL), since o is orthogonal. The rest of the proof then works with A_SL instead of A^±_SL.

Remark A.5. Given an RDAG model M_{A(G,c)} with compatible colouring, we can always apply Proposition A.4 using stability under A(G, c)_SL instead of the bigger set A(G, c)^±_SL. Indeed, if the number α_s of vertices of colour s is even for all s ∈ c(I), then A(G, c) only contains matrices of positive determinant, so A(G, c)^±_SL = A(G, c)_SL. If α_s is odd for some vertex colour s, then A(G, c) contains an orthogonal matrix with determinant −1.

Next, we return to the linear independence condition in Theorem 4.4(b).

Lemma A.6. Consider the RDAG model on (G, c) where colouring c is compatible, and set A := A(G, c)_SL. Assume for a non-zero Y ∈ R^{m×n} that M^{(0)}_{Y,s} ∉ span{M^{(t)}_{Y,s} : t ∈ [β_s]} for all s ∈ c(I). Then Y is polystable under A and A • Y is Zariski closed.

Proof. The hypotheses in the statement imply that the log-likelihood ℓ_Y is bounded from above, by Theorem 4.4(b). Since A(G, c) is closed under non-zero scalar multiples, Y is semistable under A = A(G, c)_SL, by Proposition A.4(b) and Remark A.5. We now study the orbit A • Y. Let T be the set of diagonal matrices in A and U the unipotent matrices in A. Then A = T • U by compatibility and, in fact, any a ∈ A admits a unique decomposition a = tu with t ∈ T, u ∈ U. For s ∈ c(I), recall the construction of M_{Y,s} ∈ R^{(β_s+1)×α_s n} from Definition 4.1. Setting V_s := R^{1×α_s n}, we can identify R^{m×n} ≅ ∏_s V_s such that the rows of vertex colour s belong to V_s. By the definition of M_{Y,s}, and since the prc(s) partition c(E), the set U • Y is H := ∏_s H_s with H_s = {M^{(0)}_{Y,s} + a_{s,1} M^{(1)}_{Y,s} + ... + a_{s,β_s} M^{(β_s)}_{Y,s} | a_{s,t} ∈ R}. The affine space H_s is M^{(0)}_{Y,s} + X_s, where X_s := span{M^{(t)}_{Y,s} : t ∈ [β_s]}. If we set a_s := p_s(W_s), then we have ∏_s a_s^{α_s} = 1, so the a_s define some a ∈ T. Moreover, W′ := (a_s^{−1} W_s)_s ∈ H by the definition of the a_s, and hence W = a • W′ is contained in T • H = A • Y.

Proposition A.7. Consider an RDAG model on (G, c) with compatible colouring c and A := A(G, c)_SL. Let Y ∈ R^{m×n} be a sample matrix. Stability under A relates to linear independence conditions on the matrices M_{Y,s}:

are linearly dependent for some s ∈ c(I).

Proof of Theorem A.2. Combine Proposition A.7 with Theorem 4.4.

We now turn to Proposition A.3. As above, A := A(G, c)_SL for an RDAG with compatible colouring. Denote the set of diagonal (respectively unipotent) matrices in A by T (respectively U). By compatibility of the colouring, A = T • U and, in fact, any a ∈ A admits a unique factorisation a = tu with t ∈ T, u ∈ U.

Lemma A.8. Consider the RDAG model on (G, c) where colouring c is compatible. For A := A(G, c)_SL, write A = T • U as above. If Y ∈ R^{m×n} is polystable under A, then the following hold: (a) U • Y contains a unique element Y′ of minimal norm. (b) For t ∈ T and u ∈ U, ‖tu • Y‖ ≥ ‖t • Y′‖, with equality if and only if u • Y = Y′. (c) Let a, a′ ∈ A be such that a • Y and a′ • Y are of minimal norm in A • Y. Then there is an orthogonal t ∈ T such that ta • Y = a′ • Y.

Proof. For part (a), we recall that the prc(s), s ∈ c(I), partition the edge colours c(E). Minimising over u ∈ U, we can minimise each summand separately. For each s ∈ c(I), the affine space M^{(0)}_{Y,s} + span{M^{(t)}_{Y,s} : t ∈ [β_s]} has a unique element of minimal norm, call it M_s. Hence, U • Y has a unique element of minimal norm, Y′, determined by M^{(0)}_{Y′,s} = M_s for all s ∈ c(I). (Note that there may be several u ∈ U with uY = Y′.) To prove part (b), we use (the proof of) part (a) to obtain equation (21).

Proof of Proposition A.3. Recall that A = A(G, c)_SL = T • U, where T is the set of diagonal matrices in A, and U the unipotent matrices in A. If aY = Y, then for all s ∈ c(I) the relation M^{(0)}_{Y,s} = a_s M^{(0)}_{Y,s} + ∑_{t∈[β_s]} a_{s,t} M^{(t)}_{Y,s} holds; since Y is polystable, a_s = 1 for all s, i.e.
a ∈ U and therefore A Y = U Y .We set N Y := U Y − id, which consists of strictly upper triangular matrices.It suffices to show that for fixed MLE λa T a the map ϕ : N Y → {MLEs given Y } b → λ(a + b) T (a + b) is well-defined and bijective.Note that bY = 0 for any b ∈ N Y .Therefore, (a+b)Y = aY is of minimal norm in A • Y and thus ϕ(b) is an MLE by Proposition A.4.For surjectivity, let λ a T a be another MLE given Y .Then aY and aY are of minimal norm in A•Y , hence there is an orthogonal t ∈ T with aY = t aY by Lemma A.8(c).We set b := t a − a so that b • Y = 0 and (id + b)Y = Y .By compatibility of the colouring we have t a ∈ A and thus all entries of b = t a − a obey the colouring c.However, bY = 0 implies b s = 0 for all s by polystability of Y , hence b ∈ N Y .We compute ϕ(b) = λ(t a) T (t a) = λ a T a using orthogonality of t.For injectivity, let b, b ∈ N Y be such that ϕ(b) = ϕ(b ).Let t ∈ T be defined by t s = 1 if a s > 0 and t s = −1 if a s < 0. Then t is orthogonal and thus (ta + tb) T (ta + tb) = (a + b) T (a + b).Similarly, (ta + tb ) T (ta + tb ) = (a + b ) T (a + b ).Then ϕ(b) = ϕ(b ) implies (22) (ta + tb) T (ta + tb) = (ta + tb ) T (ta + tb ).Moreover, tb, tb are strictly upper triangular and ta ∈ A has positive diagonal entries, by construction of t.Hence, applying uniqueness of the Cholesky decomposition to (22) gives ta + tb = ta + tb , and we deduce b = b .
The group G acts on C^{m×n} by left-multiplication. Let f_{k,l} ∈ C[C^{m×n}], k ∈ [m], l ∈ [n], be the coordinate functions on C^{m×n}. Let x_s be the coordinate function corresponding to vertex colour s ∈ c(I), and let x_{s,t} be the coordinate function for the edge colour t ∈ prc(s). Given a tuple of samples Y ∈ C^{m×n}, we consider the orbit map µ_{G•Y} : G → C^{m×n}, g ↦ g • Y, or, on coordinate rings, µ*_{G•Y}(f_{k,l}) = ∑_{j=1}^m x_{c(k,j)} Y_{j,l}. We define R_Y := µ*_{G•Y}(C[C^{m×n}]) = C[∑_{j=1}^m Y_{j,l} x_{c(k,j)} : k ∈ [m], l ∈ [n]] ⊆ C[G]. Since R_Y is a C-algebra, we obtain the semigroup X_{G•Y} := {(d_s)_{s∈c(I)} ∈ X(T) | ∏_{s∈c(I)} x_s^{d_s} ∈ R_Y}, where X(T) ≅ Z^{|c(I)|} / Z•(α_s)_{s∈c(I)} is the character group of T.

Theorem B.4 (Popov's Criterion, [Pop89, Theorem 4]). Let G and Y be as above. The orbit G • Y is Zariski closed if and only if X_{G•Y} is a group.

Remark B.5. The group G = A(G, c)_SL may not be connected, as required in [Pop89, Theorem 4]. However, the orbit G • Y is Zariski closed if and only if G° • Y is Zariski closed, where G° is the identity component of G. Thus, after restricting to G° = T° • U we may assume that G is connected. Restricting to T° amounts to restricting to the torsion-free part of X(T): if α is the greatest common divisor of all α_s, s ∈ c(I), then T° ≅ {(g_s)_{s∈c(I)} | ∏_s g_s^{α_s/α} = 1} and X(T°) = Z^{|c(I)|} / Z•(α_s/α)_{s∈c(I)}.

Second Proof of Lemma A.6. The matrix Y is semistable by Proposition A.4(b) and Theorem 4.4(b). Fix s ∈ c(I) and let M†_{Y,s} be the Hermitian transpose of M_{Y,s}. Since M^{(0)}_{Y,s} ∉ span{M^{(t)}_{Y,s} : t ∈ [β_s]}, we have ker(M†_{Y,s}) ⊆ span{e_1, ..., e_{β_s}} ⊆ C^{β_s+1}. Therefore, e_0 is in the orthogonal complement of ker(M†_{Y,s}), i.e.
in the image of M_{Y,s}, so there is some z ∈ C^{α_s n} with M_{Y,s} z = e_0. By construction of the matrix M_{Y,s}, the equation

(x_s x_{s,1} x_{s,2} ⋯ x_{s,β_s}) M_{Y,s} z = (x_s x_{s,1} x_{s,2} ⋯ x_{s,β_s}) e_0 = x_s

shows that x_s is a C-linear combination of the Σ_{j=1}^m x_{c(kj)} Y_{j,l}, where k ∈ c^{−1}(s) and l ∈ [n]; the coefficients are given by z ∈ C^{α_s n}. In particular, x_s ∈ R_Y. Since the coordinate functions x_s, s ∈ c(I), generate the character group X(T) (thinking of characters as algebraic group morphisms G → C^×), we conclude X_{G•Y} = X(T). Hence, X_{G•Y} is a group and G • Y is Zariski closed by Popov's criterion, Theorem B.4.

B.3. Bijection between the stabiliser and the set of MLEs. So far we have given two descriptions of the set of MLEs given Y in an RDAG model: Corollary 4.8 gives a linear space of possible Λ, while Proposition A.3 gives an additive bijection between the MLEs and the stabiliser. Here we give an alternative (multiplicative) bijection between the set of MLEs and the stabiliser, when A(G, c) is a group. In this way, the two appendices offer two alternative descriptions of the set of MLEs; see Propositions A.3 and B.6.
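The step above asks whether e_0 lies in the image of M_{Y,s}, which is a linear-algebra membership test that can be carried out by least squares. A small numerical sketch, where the matrix is a hypothetical stand-in for M_{Y,s} rather than one computed from a model:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for M_{Y,s}: a 3 x 8 matrix whose rows are
# indexed 0, ..., beta_s (so here beta_s = 2 and alpha_s * n = 8).
M = rng.standard_normal((3, 8))
e0 = np.zeros(3)
e0[0] = 1.0

# e0 lies in the image of M iff a least-squares solution attains it exactly.
z, *_ = np.linalg.lstsq(M, e0, rcond=None)
in_image = np.allclose(M @ z, e0)
assert in_image  # M has full row rank, so e0 is in its image
```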
1.1. Multivariate Gaussian models. We consider m-dimensional Gaussian distributions with mean zero. Such a distribution is determined by its concentration (inverse covariance) matrix Ψ, a real m × m positive definite matrix. The density function is

p_Ψ(y) = (2π)^{−m/2} det(Ψ)^{1/2} exp(−½ y^⊤ Ψ y), y ∈ R^m.

Maximum likelihood estimation. A maximum likelihood estimate (MLE) is a point in the model that maximizes the likelihood of observing some data. For n samples from a Gaussian model M ⊆ PD_m, the data samples are the columns of a matrix Y ∈ R^{m×n}. Assuming independent samples, the likelihood function is

L_Y(Ψ) = ∏_{l=1}^n p_Ψ(Y_{·,l}) ∝ det(Ψ)^{n/2} exp(−(n/2) tr(S_n Ψ)),

where S_n = (1/n) Y Y^⊤ is the sample covariance matrix. Many sets A can correspond to the same model M_A. For instance, the full cone PD_m is M_A whenever A contains all invertible upper triangular matrices. When the set A is a group, the model M_A is called a Gaussian group model [AKRS21]. 1.2.
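The log-likelihood described here can be sketched concretely; the function and sample data below are illustrative, not part of the paper. For the unrestricted model M = PD_m, the MLE is the inverse sample covariance, and the sketch checks it beats an arbitrary candidate:

```python
import numpy as np

def log_likelihood(Psi, Y):
    """Log-likelihood of concentration matrix Psi for mean-zero
    Gaussian samples given as the columns of Y (m x n)."""
    m, n = Y.shape
    sign, logdet = np.linalg.slogdet(Psi)
    assert sign > 0, "Psi must be positive definite"
    S = (Y @ Y.T) / n  # sample covariance S_n
    return 0.5 * n * (logdet - np.trace(S @ Psi)) - 0.5 * n * m * np.log(2 * np.pi)

rng = np.random.default_rng(2)
Y = rng.standard_normal((3, 50))
S = (Y @ Y.T) / Y.shape[1]
Psi_hat = np.linalg.inv(S)  # unrestricted MLE

# The MLE attains at least the likelihood of any other candidate.
assert log_likelihood(Psi_hat, Y) >= log_likelihood(np.eye(3), Y)
```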
s then return "MLE does not exist"; else the coefficients λ_{s,t} are such that P_{Y,s} = Σ_{t∈prc(s)} λ_{s,t} M^{(t)}_{Y,s}. By Lemma 4.10(iii) we can first determine the λ̂_{s,t}, t ∈ [β_s], that minimise (16), provided the M^{(t)}_{Y,s}, t ∈ [β_s], are linearly independent. Denote the minimum value of (16) by ζ_s. We will apply Lemma 4.10 several times with γ_s

For part (d), it suffices to see that a polystable Y has a trivial stabiliser A_Y if and only if for all s ∈ c(I) the rows of M_{Y,s} are linearly independent. If the rows of M_{Y,s} are linearly independent, then (20) has exactly one solution, namely a_s = 1 and a_{s,t} = 0 for all t ∈ [β_s]. Thus, if the rows of M_{Y,s} are linearly independent for all s ∈ c(I), then A_Y = {id}. On the other hand, if for some s ∈ c(I) the rows of M_{Y,s} are linearly dependent,

for all s ∈ c(I), where a_s ∈ R^× is the entry of a for vertex colour s, and a_{s,t} ∈ R is the entry of a for the parent-relationship colour encoded by (s, t), where t ∈ [β_s]. We note that (20) implies a_s = 1 and Σ_{t∈[β_s]} a_{s,t} M^{(t)}_{Y,s} = 0, by polystability of Y and part (c).
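The linear-independence condition in part (d) is a rank condition, so it can be checked numerically. A small sketch, where the matrices are hypothetical stand-ins for the M_{Y,s} rather than matrices built from a model:

```python
import numpy as np

def rows_independent(M, tol=1e-10):
    """Rows of M are linearly independent iff rank(M) equals the row count."""
    return np.linalg.matrix_rank(M, tol=tol) == M.shape[0]

# Hypothetical stand-ins for the matrices M_{Y,s}.
M_full = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 1.0]])
M_deficient = np.array([[1.0, 2.0, 3.0],
                        [2.0, 4.0, 6.0]])  # second row is twice the first

assert rows_independent(M_full)        # stabiliser condition satisfied
assert not rows_independent(M_deficient)
```

If every M_{Y,s} passes this test, the stabiliser is trivial in the sense of part (d).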
_{,s}‖² for all s ∈ c(I); hence ‖tu • Y‖ ≥ ‖tY‖. The latter inequality is strict if and only if there is strict inequality in (21) for at least one s. By uniqueness of Y, this is the case if and only if uY ≠ Y.

Y, the matrices aY and a′Y are also of minimal norm in T • Y. We note that T is a group isomorphic to the reductive group {(t_s)_{s∈c(I)} | t_s ∈ R^×, ∏_s t_s^{α_s} = 1}. Hence the Kempf–Ness theorem, see [AKRS21, Theorem 2.2], applied to the action of T implies that there is some orthogonal t ∈ T that relates the minimal norm elements aY and a′Y in T • Y. We conclude this appendix with a proof of Proposition A.3.
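As a toy illustration of minimal-norm elements in a torus orbit (a one-parameter analogue, not the torus T of the paper): under the action t • (y₁, y₂) = (t y₁, y₂/t), the orbit norm ‖t • Y‖² = t²y₁² + y₂²/t² is minimised at t⁴ = y₂²/y₁², with minimum value 2|y₁y₂|, which a grid search confirms:

```python
import numpy as np

# One-parameter torus action of t in R^x on Y = (y1, y2): t . Y = (t*y1, y2/t).
y1, y2 = 3.0, 2.0
min_norm_sq = 2 * abs(y1 * y2)  # closed-form minimum of ||t . Y||^2

# Numerical verification by grid search over t > 0.
ts = np.linspace(0.1, 10.0, 100001)
norms_sq = (ts * y1) ** 2 + (y2 / ts) ** 2
assert abs(norms_sq.min() - min_norm_sq) < 1e-6
```

Note that t and −t give orbit elements of the same norm, an echo of the orthogonal torus element relating the two minimal-norm points above.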
For (c), write a = tu with t ∈ T and u ∈ U. Since aY is of minimal norm in A • Y, we deduce uY = Y using (b). Thus, aY ∈ T • Y and similarly a′Y ∈ T • Y. As T • Y ⊆ A •