Recursive max-linear models with propagating noise

Recursive max-linear vectors model causal dependence between node variables by a structural equation model, expressing each node variable as a max-linear function of its parental nodes in a directed acyclic graph (DAG) and some exogenous innovation. For such a model, there exists a unique minimum DAG, represented by the Kleene star matrix of its edge weight matrix, which identifies the model and can be estimated. For more realistic statistical modeling we introduce random observational noise. A probabilistic analysis of this new noisy model reveals that the unique minimum DAG representing the distribution of the non-noisy model remains unchanged and identifiable. Moreover, the distribution of the minimum ratio estimators of the model parameters at their left limits is completely determined by the distribution of the noise variables up to a positive constant. Under a regular variation condition on the noise variables we prove that the estimated Kleene star matrix converges to a matrix of independent Weibull entries after proper centering and scaling.

1. Introduction. Graphical modeling has proven to be a powerful tool for understanding causal dependencies in a multivariate random vector. However, most models have been linear and limited to discrete or Gaussian distributions (see, e.g., [24] and [26]). Such models lead to severe underestimation of large risks and are, therefore, not suitable in the context of extreme risk assessment. First examples combining extreme value methods with graphical models include flooding in river networks ([12]), financial risk ([10], [23]), and nutrients ([23]).
We consider the class of recursive max-linear (ML) models, which has been defined in [13]. A recursive ML model is defined by a structural equation model (SEM) of the form (1.1) below, where the dependence structure between random variables is represented by a DAG D := (V, E) with node set V := {1, …, d} and edge set E = E(D) ⊆ V × V, and each variable X_i for i ∈ V has a representation in terms of ML functions of its parental nodes pa(i) = {j ∈ V : (j, i) ∈ E} and an independent innovation Z_i:

(1.1) X_i = ⋁_{k∈pa(i)} c_ki X_k ∨ Z_i, i = 1, …, d.

Both SEMs (e.g. [5], [30]) and directed graphical models (e.g. [24], [26], [34]) are well-established and widely used to understand causality.
As shown in [22], recursive ML models respect the basic Markov properties associated with DAGs (e.g. [27], [28]). Moreover, the equation system (1.1) has the solution

X_i = ⋁_{j∈An(i)} b_ji Z_j, i = 1, …, d,

with ML coefficient matrix (in tropical algebra called the Kleene star matrix) B := (b_ij)_{d×d}; see [6], Corollary 1.6.16. Unlike the edge weight matrix C = (c_ij)_{d×d}, B is identifiable and completely determines the distribution of X := (X_1, …, X_d) (see [14], Theorem 1). Also, B is idempotent with respect to the tropical matrix multiplication defined in (2.4) below, and defines a graphical model on a DAG. Furthermore, [14] proposes a minimum ratio estimator for B, which is itself idempotent and is a generalized maximum likelihood estimator in the sense of [21].
We extend the original model (1.1) by allowing for multiplicative observation errors and define

(1.2) U_i = (⋁_{k∈pa(i)} c_ki U_k ∨ Z_i) ε_i, i = 1, …, d,

with ε_i ≥ 1 and iid for i = 1, …, d. By taking advantage of tropical algebra, we present in Theorem 3.2 a solution of (1.2) which represents each node variable U_i in terms of a ML function of its ancestral innovations and an independent one,

U_i = ⋁_{j∈an(i)} b̃_ji Z_j ∨ ε_i Z_i,

where an(i) denotes the ancestors of i and the b̃_ji are random variables involving the edge weights and the noise variables. It comes as no surprise that the true DAG and edge weights for a recursive ML model with propagating noise inherit the non-identifiability property from the non-noisy model. However, as we will prove in section 4, the ML coefficient matrix B = (b_ij)_{d×d} remains identifiable in spite of the observational noise, and even if we do not know the underlying DAG.
To link up our new model (1.2) with the existing literature, observe that a log-transformation of (1.2) yields

(1.3) Ũ_i = ⋁_{j∈pa(i)} (ln c_ji + Ũ_j) ∨ Z̃_i + ε̃_i, with ε̃_i ≥ 0,

where the tilde denotes the logarithm of the respective quantity. Thus, for every j ∈ pa(i), the difference Ũ_i − Ũ_j is bounded below by ln c_ji. The estimation of (linear) functions with one-sided errors has been considered in the literature before. For instance, in [16] and [19] observations are given by Y_j = f(X_j) + ε_j for j = 1, …, n with observation errors ε_j > 0, with density given conditionally or unconditionally on X_j = x, where f describes some frontier or boundary curve which has to be estimated. To present an archetypical example, consider the linear regression problem stated in [32] and [33] as Y_i = β + ε_i for i = 1, …, n with observation errors which have density g(x) ∼ αcx^{α−1} as x ↓ 0 for α, c > 0. In these papers the focus is on the non-regular case α < 2. Then β can be estimated by the sample minimum Y_{1,n}, which has a Weibull limit law:

(1.4) lim_{n→∞} P(n^{1/α}(Y_{1,n} − β) > x) = exp(−c x^α), x > 0.

The work in [32] has been used in [8] to estimate the coefficient φ of a first order autoregressive time series with positive innovations. They propose the minimum ratio estimator φ̂ = ⋀_{j=1}^{n} X_j/X_{j−1} and show in their Corollary 2.4 that it also has a Weibull limit law similar to (1.4).
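To make the Weibull limit law (1.4) concrete, the following Monte Carlo sketch simulates the sample-minimum estimator; the error distribution F(x) = x^α on [0, 1] (so that c = 1) and all numerical values are illustrative assumptions, not taken from [32] or [33].

```python
import numpy as np

# Monte Carlo sketch of the Weibull limit law (1.4): for Y_i = beta + eps_i with
# P(eps <= x) ~ c x^alpha as x -> 0, the scaled minimum n^{1/alpha} (Y_{1,n} - beta)
# has the limit law P(W > x) = exp(-c x^alpha).
# Illustrative choices: alpha = 1.5, c = 1, beta = 2.
rng = np.random.default_rng(0)
alpha, beta, n, runs = 1.5, 2.0, 10_000, 5_000

# Errors with distribution function F(x) = x^alpha on [0, 1] via inverse sampling.
eps = rng.uniform(size=(runs, n)) ** (1.0 / alpha)
scaled_min = n ** (1.0 / alpha) * (np.min(beta + eps, axis=1) - beta)

# Compare the empirical survival function with exp(-x^alpha) at a few points.
for x in (0.5, 1.0, 1.5):
    print(f"x={x}: empirical {np.mean(scaled_min > x):.3f}"
          f" vs limit {np.exp(-x ** alpha):.3f}")
```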
In our model (1.2) we find two interpretations for the noise variables. Firstly, in the log-transformed version (1.3) we consider a ML model as baseline model, which is observed with some additive noise. A second representation is given in Corollary 3.3 below, where the edge and path weights become noisy through the noise variables. This gives rise to the interpretation that we observe the model parameters with noise, similarly to the regression examples above. As a consequence, a path from j to i realising the ML coefficient b_ji is no longer deterministic but depends on the individual realizations of the noise variables. However, in Theorem 3.12 we show that at the left limit of support the distribution of the ratio of two model components is determined by all noise variables along the path between the two nodes. Assuming noise variables with regularly varying distribution in their left limit of support, we propose a minimum ratio estimator and show in Theorem 5.5 that the estimated ML coefficient matrix converges to a matrix of independent Weibull entries after proper centering and rescaling.
The paper is organized as follows. In section 2 we summarize the properties of recursive ML models as defined in (1.1) and state the results most relevant for our paper. In section 3 we consider the extension of the recursive ML model given in (1.2), which we coin the max-linear model with propagating noise, and present its solution and the main properties of this new model. In section 4 we address the identifiability of the ML model with propagating noise and, similarly as in (1.4), suggest minimum ratio estimators for the model parameters B. In section 5 we assume regular variation of the noise variables; under this assumption, we show that the minimum ratios are asymptotically independent and Weibull distributed. Finally, in section 6, we provide a data example and apply the theory derived in the previous sections. All proofs are postponed to an Appendix.
Throughout we use the following notation. R_+ = (0, ∞) and R̄_+ = [0, ∞); x ∧ y = min{x, y} and x ∨ y = max{x, y}, with ⋀_{i∈∅} x_i = ∞ and ⋁_{i∈∅} x_i = 0 for x_i ∈ R̄_+. Bold letters denote vectors and matrices; e.g. I_d denotes the d × d identity matrix. Moreover, all vectors are row vectors unless stated otherwise. For two functions f, g we write f(x) ∼ g(x) as x ↓ c if lim_{x↓c} f(x)/g(x) = 1, and 1 denotes the indicator function. Moreover, an(i), pa(i) and de(i) denote the ancestors, the parents, and the descendants of node i, respectively, and An(i) := an(i) ∪ {i}. Every edge (j, i) ∈ E is a directed edge j → i. Finally, for a path p = [k_0 → … → k_n] we define the node set on the path (excluding the initial node) by S_p := {k_1, …, k_n} and its path length by |S_p|. For a random variable Y with distribution function F_Y, the symbol F^←_Y denotes its quantile function.
2. Preliminaries - Recursive max-linear models. We first formally introduce the class of recursive ML models and state their most important results for this paper. Let D = (V, E) be a DAG with nodes V = {1, …, d} and edges E. Then a random vector X := (X_1, …, X_d) is a recursive max-linear vector, or follows a max-linear Bayesian network on D, if

(2.1) X_i = ⋁_{k∈pa(i)} c_ki X_k ∨ Z_i, i = 1, …, d,

with positive edge weights c_ki for i ∈ V and k ∈ pa(i), and independent positive random variables Z_1, …, Z_d with support [0, ∞) and atom-free distributions. We shall refer to Z := (Z_1, …, Z_d) as the vector of innovations.
For a path p = [j = k_0 → k_1 → … → k_n = i] from j to i we define the path weight

(2.2) d_ji(p) := ∏_{l=1}^{n} c_{k_{l−1} k_l}.

Denoting the set of all paths from j to i by P_ji, we define the ML coefficient matrix B = (b_ij)_{d×d} of X with entries

b_ji := ⋁_{p∈P_ji} d_ji(p) for j ∈ an(i), b_jj := 1, and b_ji := 0 for j ∈ V \ An(i).

The components of X can also be expressed as ML functions of their ancestral innovations and an independent one; the corresponding ML coefficients are the entries of B:

(2.3) X_i = ⋁_{j∈An(i)} b_ji Z_j, i = 1, …, d,

which can be shown by a path analysis as in Theorem 2.2 in [13] or by tropical algebra as in (2.6) below, as we explain now.
For two non-negative matrices F and G, where the number of columns of F is equal to the number of rows of G, we define the matrix product

(2.4) (F ⊙ G)_ij := ⋁_k F_ik G_kj.

The triple (R̄_+, ∨, ·) is an idempotent semiring with 0 as 0-element and 1 as 1-element, and ⊙ is therefore a matrix product over this semiring; see for example [6]. Denoting by M all d × d matrices with non-negative entries and by ∨ the componentwise maximum between two matrices, (M, ∨, ⊙) is also a semiring with the null matrix as 0-element and the d × d identity matrix I_d as 1-element.
The matrix product ⊙ allows us to represent the ML coefficient matrix B of X in terms of the edge weight matrix C := (c_ij 1_pa(j)(i))_{d×d} of D, since (2.1) can be rewritten as

(2.5) X = X ⊙ C ∨ Z,

with unique solution (equivalent to (2.3)) given by

(2.6) X = Z ⊙ B, where B = C* := I_d ∨ C ∨ C^⊙2 ∨ … ∨ C^⊙(d−1)

is the Kleene star matrix and C^⊙k := C^⊙(k−1) ⊙ C for k ∈ N; see Proposition 1.6.15 of [6] as well as Theorem 2.4 and Corollary 2.5 of [13]. For more information on the max-times (tropical) algebra in ML models see section 2.2 in [1].
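The max-times product and the Kleene star are straightforward to compute. The following sketch (with an illustrative edge weight matrix, not from the paper) truncates the star at the (d−1)-st power, which suffices on a DAG since longer products vanish.

```python
import numpy as np

def tropical_product(F, G):
    """Max-times matrix product (2.4): (F o G)[i, j] = max_k F[i, k] * G[k, j]."""
    return np.max(F[:, :, None] * G[None, :, :], axis=1)

def kleene_star(C):
    """Kleene star B = I v C v C^2 v ... v C^(d-1) over the max-times semiring."""
    d = C.shape[0]
    B, P = np.eye(d), np.eye(d)   # identity is the 1-element of the matrix semiring
    for _ in range(d - 1):
        P = tropical_product(P, C)
        B = np.maximum(B, P)      # componentwise maximum of matrices
    return B

# Illustrative edge weight matrix for the DAG 1 -> 2, 2 -> 3, 1 -> 3.
C = np.array([[0.0, 0.5, 0.2],
              [0.0, 0.0, 0.7],
              [0.0, 0.0, 0.0]])
print(kleene_star(C))  # entry (1, 3): max(c_13, c_12 * c_23) = max(0.2, 0.35) = 0.35
```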
We have seen that a recursive ML vector X has two representations: one in terms of parental nodes X_j and edge weights c_ji, and another in terms of innovations Z_j and ML coefficients b_ji. However, while the ML coefficient matrix B of X is identifiable from the distribution of X, the edge weight matrix C is generally not; see Theorem 5.4(b) in [13]. Theorem 5.3 in that paper and Theorem 2 in [14] show that an edge with edge weight c_ji is identifiable from B if and only if it is the unique path p from j to i with d_ji(p) = b_ji.
For a recursive ML vector X on a DAG D = (V, E) and ML coefficient matrix B this result leads to the following definition.
Definition 2.1. Let X ∈ R_+^d be a recursive ML vector on the DAG D = (V, E) with ML coefficient matrix B. We define the minimum ML DAG of X as D^B := (V, E(D^B)) with E(D^B) := {(j, i) ∈ E : b_ji > b_jk b_ki for all k ∈ de(j) ∩ an(i)}.

Moreover, it has been shown that for a recursive ML vector X the support of the ratio X_i/X_j satisfies supp(X_i/X_j) ⊆ [b_ji, ∞) with P(X_i/X_j = b_ji) > 0 for all j ∈ an(i); see Lemma 1 of [14]. Hence, for a given iid sample X^1, …, X^n from X we define a minimum ratio estimator B̂ of B by b̂_ji := ⋀_{k=1}^{n} (X^k_i/X^k_j) for i, j ∈ V. Moreover, when the DAG D is known, we define B̂^0 by (4.1). Theorem 4 of [14] ensures that B̂ is a generalized maximum likelihood estimator (GMLE) in the sense of [21].
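In code, the minimum ratio estimator is a one-liner over the sample array; the sketch below assumes observations stored row-wise and follows the convention that entry (j, i) estimates b_ji via minima of X_i/X_j.

```python
import numpy as np

def min_ratio_estimator(X):
    """Minimum ratio estimator: entry (j, i) is the minimum over the sample of
    X_i / X_j, which estimates the ML coefficient b_ji.  X is an (n, d) array
    of n iid observations of the positive vector (X_1, ..., X_d)."""
    ratios = X[:, None, :] / X[:, :, None]   # ratios[k, j, i] = X_i^k / X_j^k
    return np.min(ratios, axis=0)
```

Diagonal entries are exactly 1, and entries with j outside An(i) shrink towards 0 as the sample grows, in line with the consistency results of section 4.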
3. Recursive ML model with propagating noise. In this section we present structural results for the recursive ML model with propagating noise; in particular, we investigate which properties of the non-noisy model prevail.
Definition 3.1. A random vector U ∈ R_+^d is a recursive ML vector with propagating noise on a DAG D = (V, E) if

(3.1) U_i = (⋁_{k∈pa(i)} c_ki U_k ∨ Z_i) ε_i, i = 1, …, d,

with edge weight matrix C := (c_ij 1_pa(j)(i))_{d×d}. The noise variables ε_1, …, ε_d are iid and atom-free random variables with ε_i ≥ 1, unbounded above, for all i ∈ V, and independent of the innovations vector Z := (Z_1, …, Z_d). For simplicity, we denote by ε a generic noise variable and by Z a generic innovation.
Although the noise variables act on the observations, formally we can view them as random scalings of the edge weights. More precisely, for a path p = [j = k_0 → k_1 → … → k_n = i] from j to i we define the random path weight

(3.2) d̃_ji(p) := ε_j ∏_{l=1}^{n} c_{k_{l−1} k_l} ε_{k_l},

similarly to the definition of d_ji(p) in (2.2). If we define the random edge weight matrix

(3.3) C̃ := (c̃_ij 1_pa(j)(i))_{d×d} with c̃_ji := c_ji ε_i,

then d̃_ji(p) is, up to the factor ε_j, the weight of p under C̃. Hence, we can view the noise variables as random scalings of the edge weights c_ji. Since ε ≥ 1, the edge weights c_ji of the non-noisy model are lower bounds for the random edge weights c̃_ji of the propagating noise model. Again denoting the set of all paths from j to i by P_ji, we define the random ML coefficient matrix B̃ = (b̃_ij)_{d×d} of U with entries

(3.4) b̃_ji := ⋁_{p∈P_ji} d̃_ji(p) for j ∈ an(i), b̃_jj := ε_j, and b̃_ji := 0 for j ∈ V \ An(i).

We next show that there exists a solution of (3.1) in terms of the ancestral innovations Z and B̃. All proofs of this section are postponed to Appendix A.
Theorem 3.2. Let U ∈ R_+^d be a recursive ML vector with propagating noise on a DAG D as in (3.1). Define E_d as the d × d diagonal matrix given by E_d(i, i) = ε_i for i ∈ V and E_d(i, j) = 0 for i, j ∈ V with i ≠ j.
We rewrite (3.1) in matrix form by means of the matrix multiplication (2.4) as U = U ⊙ C̃ ∨ Z ⊙ E_d. Then U has the unique solution

(3.5) U = Z ⊙ B̃ with B̃ = E_d ⊙ (C̃)*,

with C̃ as defined in (3.3).
Since b̃_ji = 0 whenever j ∉ An(i), the representation (3.5) can be rewritten as follows.
Corollary 3.3. Let U be as in Theorem 3.2 and b̃_ji the random ML coefficients defined in (3.4). Then (3.5) is equivalent to

(3.6) U_i = ⋁_{j∈an(i)} b̃_ji Z_j ∨ ε_i Z_i, i = 1, …, d.

Note that the definition in (3.1) is equivalent to U_i = ⋁_{k∈pa(i)} c̃_ki U_k ∨ ε_i Z_i; iterating this identity along the paths of D yields the representation (3.6) directly.
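For intuition, here is a minimal simulation sketch of (3.1), drawing nodes in a topological order; the Fréchet(2) innovations mirror section 6, while the noise distribution is an illustrative assumption with ln(ε) regularly varying at zero with index α = 2.

```python
import numpy as np

def simulate_noisy_ml(C, n, alpha=2.0, seed=1):
    """Draw n iid copies of U from the propagating noise model (3.1):
    U_i = eps_i * max(max_{k in pa(i)} c_ki * U_k, Z_i).
    C is the d x d edge weight matrix; nodes must be numbered in a
    topological order, so C is strictly upper triangular."""
    rng = np.random.default_rng(seed)
    d = C.shape[0]
    Z = (-np.log(rng.uniform(size=(n, d)))) ** (-0.5)        # Frechet(2) innovations
    eps = np.exp(rng.uniform(size=(n, d)) ** (1.0 / alpha))  # ln(eps) has df x^alpha on [0, 1]
    U = np.zeros((n, d))
    for i in range(d):
        parent_part = np.max(C[:, i] * U, axis=1)  # max_k c_ki * U_k; zero weights drop out
        U[:, i] = eps[:, i] * np.maximum(parent_part, Z[:, i])
    return U
```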
We next define critical and generic paths which play an essential role for the understanding of our model.
Definition 3.5. Let D be a DAG with edge weight matrix C and let B be the corresponding ML coefficient matrix (i.e. the Kleene star of C). Let p be a path from j to i with node set S_p.
(a) p is called a (non-random) critical path if d_ji(p) = b_ji.
(b) p is called a generic path if it is the only path satisfying d_ji(p) = b_ji.
(c) We call C generic if all paths in D are generic.
(d) p is called a random critical path if d̃_ji(p) = b̃_ji.
(e) p is called a possible critical path realization if the event {d̃_ji(p) = b̃_ji} happens with positive probability.
Remark 3.6. We have defined a non-random critical path and a random critical path. We want to emphasize, however, that while the first path property is simply inherited from C via B, the second one is inherited from C and the noise variables. We also note that, by continuity of the innovations and the noise variables, any random critical path between a pair of nodes must be unique, although it may vary with the realizations of the noise variables.
We explain the model and the notions of Definition 3.5 in an example.
Example 3.7. Consider the DAG D = (V, E) with V = {1, 2, 3} and E = {(1, 2), (2, 3), (1, 3)}. Then C is generic if and only if c_13 ≠ c_12 c_23. Moreover, we have b̃_13 = ε_1 (c̃_13 ∨ c̃_12 c̃_23) with c̃_ji = c_ji ε_i as defined in (3.3). Now assume that c_13 > c_12 c_23. In that case, [1 → 3] is the critical path, while the path [1 → 2 → 3] is not critical. However, P(c̃_13 < c̃_12 c̃_23) = P(ε_2 > c_13/(c_12 c_23)) > 0. For this reason, both paths can be random critical. Finally, all paths in D can be possible critical path realizations. To stress the difference between a random critical path and a possible critical path realization, observe that e.g. [1 → 3] and [2 → 3] can be random critical for the same realized noise and innovation variables; however, the two paths cannot be possible critical path realizations for the same noise and innovation variables up to a null set. In contrast, if c_13 < c_12 c_23, we have P(c̃_13 > c̃_12 c̃_23) = P(ε_2 < c_13/(c_12 c_23)) = 0. In this case, the path [1 → 3] cannot be random critical and in particular not a possible critical path realization.
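A quick numerical companion to Example 3.7; all weights and the noise law are illustrative assumptions. With c_13 > c_12 c_23 the edge [1 → 3] is critical, yet the event that [1 → 2 → 3] is random critical has positive probability:

```python
import numpy as np

# Example 3.7 numerically: edge [1 -> 3] is critical since c_13 > c_12 * c_23,
# but [1 -> 2 -> 3] is random critical whenever eps_2 > c_13 / (c_12 * c_23).
rng = np.random.default_rng(2)
c13, c12, c23 = 0.5, 0.5, 0.7                     # illustrative weights
eps2 = np.exp(rng.uniform(size=100_000) ** 0.5)   # illustrative noise, eps_2 in [1, e]
p = np.mean(c12 * c23 * eps2 > c13)
print(f"P(path 1->2->3 beats edge 1->3) approx {p:.3f}")  # positive, as claimed
```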
This illustrates that a path p from j to i with d_ji(p) < b_ji may as well contribute to the distribution of U_i. However, an edge p = [j → i] with d_ji(p) < b_ji is still not identifiable and does not change the distribution of U.
While we still want to estimate the (non-random) ML coefficient matrix B, we first present some useful properties of B and B̃, and a link between the noisy and non-noisy model as defined in (2.1) and (3.1), respectively.

Lemma 3.8. Let U ∈ R_+^d be a recursive ML vector with propagating noise on a DAG D as defined in (3.1), with B and B̃ defined in (2.6) and (3.5), respectively. Then the following assertions hold:
a) b̃_ji ≥ ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki / b̃_kk, where the inequality is strict whenever the random critical path from j to i is the edge j → i, or j = i.
b) There exists some path p := [j → … → k → … → i] from j to i that passes through k such that d̃_ji(p) = b̃_ji if and only if b̃_ji = b̃_jk b̃_ki / b̃_kk.
c) For all i, j ∈ V it holds that U_i/U_j ≥ b̃_ji/b̃_jj ≥ b_ji.
d) If j ≠ i, neither the distribution of U_i/U_j nor the distribution of U_j/U_i has any atoms.
e) If b_ji = ⋁_{k∈de(j)∩an(i)} b_jk b_ki / b_kk, then b̃_ji = ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki / b̃_kk.
f) If b_ji > ⋁_{k∈de(j)∩an(i)} b_jk b_ki / b_kk and de(j) ∩ an(i) ≠ ∅, then P(b̃_ji > ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki / b̃_kk) > 0 and P(b̃_ji = ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki / b̃_kk) > 0.

Lemma 3.8 b) and f) motivate the following definition.

Definition 3.9. Let U ∈ R_+^d be a recursive ML vector with propagating noise on the DAG D = (V, E) as defined in (3.1). Then we define the minimum ML DAG D*_{B̃} of U as the DAG with node set V that has an edge j → i exactly when P(b̃_ji > ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki / b̃_kk) > 0.

In addition, applying first Lemma 3.8 e) and f), and in the second part Lemma 3.8 b), yields the following result.
Corollary 3.10. Let X ∈ R_+^d be a recursive ML vector on a DAG D = (V, E) as defined in (2.1) and U ∈ R_+^d a recursive ML vector with propagating noise as defined in (3.1) on the same DAG D with the same edge weight matrix C. Then D*_{B̃} = D^B. Moreover, the minimum ML DAG D^B is the smallest DAG that preserves the distribution of X and of U.
Therefore, we will henceforth only use the term D^B.

Lemma 3.11. Let U ∈ R_+^d be a recursive ML vector with propagating noise on a DAG D as defined in (3.1). Then the following assertions hold:
a) A path p is a possible critical path realization from j to i if and only if all edges of p belong to the minimum ML DAG D^B.
b) Let p_1 and p_2 be two possible critical path realizations from j to i and from l to m, respectively. Then the event (3.9) that p_1 and p_2 are simultaneously random critical has positive probability if and only if S_{p_1} ∩ S_{p_2} = ∅, or for every r ∈ S_{p_1} ∩ S_{p_2} the sub-path of p_1 from j to r is a sub-path of p_2 or the sub-path of p_2 from l to r is a sub-path of p_1.
We illustrate part b) with Figure 1 and Figure 2.
We conclude this section with an important result that not only helps us to understand the model better, but is also an important step for learning the model.

Theorem 3.12. Let U ∈ R_+^d be a recursive ML vector with propagating noise on a DAG D as defined in (3.1). Suppose that the path p := [j → … → i] from j to i is generic. Then, as x ↓ 1,

P(U_i/U_j ≤ b_ji x) ∼ c P(∏_{k∈S_p} ε_k ≤ x)

for some constant c ∈ (0, 1).
Remark 3.13. If the distributions of the noise variables and the innovations, as well as the path weights of the underlying DAG D, are given, the constant c in Theorem 3.12 can be calculated explicitly.
Fig 1: Both dashed paths p_1 and p_2 can be possible critical path realizations from the same realized noise variables along the nodes.

Fig 2: Both dashed paths p_1 := [j → k_5 → k_6 → i] and p_2 := [l → k_5 → k_6 → m] can only on a null-set be possible critical path realizations from the same realized noise variables along the nodes.

Theorem 3.12 also shows that, while any path p from j to i with d_ji(p) < b_ji may well contribute to the distribution of U_i (as we have seen in Example 3.7), such paths influence the distribution of U_i/U_j at its left limit of support only through a constant.
We now extend the result to situations with several critical paths.
Corollary 3.14. Let U be as in Theorem 3.12. Suppose that there are several paths p_1, …, p_n from j to i that are critical, i.e. d_ji(p_m) = b_ji for all m = 1, …, n. Then, as x ↓ 1,

P(U_i/U_j ≤ b_ji x) ∼ c P(⋁_{m=1}^{n} ∏_{k∈S_{p_m}} ε_k ≤ x)

for some constant c ∈ (0, 1).
For simplicity, we assume from now on that C is generic in the sense of Definition 3.5. However, we want to remark that all such results can be extended to the case of several non-random critical paths between two nodes. The proofs of such results work similarly to the proof of Corollary 3.14.
We continue with another consequence of Theorem 3.12.
Corollary 3.15. Let U be as in Theorem 3.12 and suppose that the path p := [j → … → i] is generic. Let U^1, …, U^n, n ∈ N, be an iid sample from U. Then, for the same constant c ∈ (0, 1) as in Theorem 3.12, we have, as x ↓ 1,

P(⋀_{k=1}^{n} (U^k_i/U^k_j) ≤ b_ji x) ∼ n c P(∏_{k∈S_p} ε_k ≤ x).

We conclude this section by extending Theorem 3.12 to multivariate distributions. We only formulate and prove the bivariate case; the general case is then obvious. Recall that in Lemma 3.11 we gave a necessary and sufficient condition for (3.10) below.
Theorem 3.16. Let U ∈ R_+^d be a recursive ML vector with propagating noise on a DAG D as defined in (3.1). Suppose that p_1 from j to i and p_2 from l to m are generic paths and assume that

(3.10) P(d̃_ji(p_1) = b̃_ji, d̃_lm(p_2) = b̃_lm) > 0.

Then, as x_1, x_2 ↓ 1,

P(U_i/U_j ≤ b_ji x_1, U_m/U_l ≤ b_lm x_2) ∼ c P(∏_{k∈S_{p_1}} ε_k ≤ x_1, ∏_{k∈S_{p_2}} ε_k ≤ x_2)

for some constant c ∈ (0, 1).

4. Identification and estimation.
4.1. Identifiability of the model. We first address the question of identifiability of B from the distribution of U. Most results concerning identifiability are based on results from section 3. As we have already seen in Example 3.7, the edge weight matrix C is generally not identifiable from the distribution of U. However, an immediate consequence of Lemma 3.8 d) is the following.

Corollary 4.1. Let U ∈ R_+^d be a recursive ML model with propagating noise on a DAG D as defined in (3.1). Then the ML coefficient matrix B is identifiable from the distribution of U.

In particular, even though innovations and noise variables are generally not identifiable, B remains identifiable also in the propagating noise model.
Since we can identify B from the distribution of U, we can also identify the minimum ML DAG D^B from Definition 2.1 (which by Definition 3.9 and Corollary 3.10 is the minimum DAG preserving the distribution of U). Therefore, since ε ≥ 1, Theorem 2 of [14] also holds for the propagating noise model as defined in (3.1). Hence, as exemplified in Example 3.7, we can identify the class of all DAGs and edge weights that could have generated U. However, unlike for the non-noisy model, we generally cannot identify innovations or noise variables. To see this, assume a source node i in a DAG D such that an(i) = ∅. If U follows a recursive ML model with propagating noise, then U_i = Z_i ε_i; in particular, we cannot identify Z_i or ε_i. We discuss three settings (1)-(3) below. For each setting, we propose an appropriate minimum ratio estimator for B; afterwards, we show the almost sure convergence of each of the estimators.
When estimating a recursive ML model, we distinguish between three settings:
(1) All ancestral relations are known; i.e., we know the set of edges E, hence the DAG. This might be the case when modeling networks that contain natural information about edges. The problem then reduces to finding appropriate estimates b̂_ji for j ∈ an(i).
(2) The ancestral relations are unknown; however, we know a topological order of the nodes. Then, in contrast to setting (1), we need to decide whether a path from j to i with j < i exists.
(3) Neither the underlying DAG nor a topological order of the nodes is known. Then we need to find a topological order of the nodes first and then proceed as in setting (2).
We next want to estimate B in each of the three settings (1)-(3).

4.2. Known DAG structure with unknown edge weights. Given an iid sample U^1, …, U^n from a recursive ML model with propagating noise on a known DAG D as defined in (3.1), and knowing all ancestral relations of D, we could choose the simple estimate

(4.1) b̂⁰_ji := ⋀_{k=1}^{n} (U^k_i/U^k_j) for j ∈ an(i), b̂⁰_ii := 1, and b̂⁰_ji := 0 otherwise.

However, as in the non-noisy model, the estimate (4.1) may not define a recursive ML model on the given DAG D; cf. Example 3 of [14]. We use instead

(4.2) B̂ := (B̂⁰)*, the Kleene star of B̂⁰.

Applying Lemma 2 in [14] to B̂⁰, the estimator B̂ yields a valid estimate on the given DAG in the sense that B̂ defines a recursive ML model, and for any pair (j, i) ∈ E(D) we have b̂_ji ≥ ⋁_{k∈{1,…,d}\{j,i}} b̂_jk b̂_ki. Moreover, by the idempotency of B̂ and Lemma 3.8 c), similarly to the non-noisy model, it also holds that b̂_ji ≥ b_ji for all j ∈ An(i).

4.3. Known topological order. Given an iid sample U^1, …, U^n ∈ R_+^d from a recursive ML model with propagating noise without knowing D, but knowing a topological order of the nodes, we adapt the estimator (4.1) to this situation and define B̂ := (b̂_ij)_{d×d} as in (4.3)-(4.4), taking minimum ratios over all pairs (j, i) that respect the topological order.

4.4. Unknown DAG and unknown topological order. Given an iid sample U^1, …, U^n ∈ R_+^d from a recursive ML model with propagating noise without knowing D or the topological order, we will recover a topological order first and then proceed as in section 4.3.
Estimating the topological order of an underlying DAG is often done by learning algorithms that successively identify source nodes and succeeding generations. For additive models, usually regression techniques are applied (see e.g. [7] or [31]). In the recursive ML model the noise is not additive and the model is highly non-linear; hence, such regression methods cannot be applied. However, under the condition of multivariate regular variation, the paper [23] suggests a learning algorithm for the model without noise as given in (1.1). We propose a different approach, which to the best of our knowledge has not been considered in the literature before. It applies to the propagating noise model without any distributional assumptions on the innovations and noise variables, and learns the DAG by using minimum ratios. We first consider the matrix of all minimum ratios given by

(4.5) B̂ := (b̂_ij)_{d×d} with b̂_ji := ⋀_{k=1}^{n} (U^k_i/U^k_j) for i, j ∈ V.

Let Π denote the set of all topological orders of V. Furthermore, denote the equivalence class of topological orders induced by the underlying (unknown) DAG D = (V, E) by

(4.6) R_D := {π ∈ Π : π(j) < π(i) for all (j, i) ∈ E}.

By Lemma 3.8 d), b̂_ji is bounded below by b_ji for j ∈ an(i), and b̂_ji → 0 a.s. as n → ∞ for j ∉ An(i). This is a direct result of Lemma 3.8 c) and the fact that the minimum is non-increasing. Hence, for any π ∈ R_D, it holds that b̂_ji → 0 a.s. as n → ∞ whenever π(j) > π(i).
Therefore, also

(4.7) max_{(j,i)∈V×V: π(j)>π(i)} b̂_ji → 0 a.s. as n → ∞.

In contrast, for any π ∉ R_D, there is a pair of nodes (j, i) such that b_ji > 0 although π(j) > π(i). For this reason,

(4.8) max_{(j,i)∈V×V: π(j)>π(i)} b̂_ji ≥ b_ji > 0.

As a consequence, for a given topological order π, by (4.7) and (4.8), the maximum converges almost surely to zero if and only if π ∈ R_D. Hence we propose a topological order that minimizes this expression, i.e.,

(4.9) argmin_{π∈Π} max_{(j,i)∈V×V: π(j)>π(i)} b̂_ji.

A topological order found by (4.9) is generally not unique. Algorithm 1 returns a unique topological order for any fixed estimated matrix B̂.
The DAG Ď constructed in Algorithm 1 works as an auxiliary instrument to infer a topological order. Since Ď is complete, with edges between every node pair in V, it returns a unique topological order. Moreover, since we sort the weights by size, the algorithm solves (4.9) in an optimal way for the given B̂. At first sight the algorithm bears some similarity to Kruskal's classical algorithm for finding a minimum spanning tree; see [25]. However, Algorithm 1 works with directed edges and, of course, the optimization problem itself is very different.
Adding an edge and checking the presence of a path between any pair of nodes can both be implemented in O(d) amortized complexity (see [17]). Hence, since S as computed in line 2 of Algorithm 1 contains d(d − 1) pairs of nodes, we have an overall amortized complexity of O(d³). After Algorithm 1 we can again use the minimum ratio estimator, now restricted by the estimated topological order; we denote the resulting estimate by (4.10).

Algorithm 1 Estimating a topological order
Input: A matrix of minimum ratios B̂ as in (4.5)
Output: An estimated topological order π̂
1: Set Ď = (V, Ě) with V = {1, …, d} and Ě = ∅.
2: Set S := {(j, i) ∈ V × V : j ≠ i} and sort the elements (j, i) of S by the size of b̂_ji from big to small.
3: for (j, i) in S do
4:   if i ∉ an(j) in Ď then
5:     add the edge (j, i) to Ě
6:   end if
7: end for
8: return the topological order π̂ of Ď

Proposition 4.2. Let U ∈ R_+^d be a recursive ML vector with propagating noise on a DAG D as defined in (3.1) and let U^1, …, U^n ∈ R_+^d be an iid sample from U. Then the estimates (4.2), (4.4) and (4.10) of B are strongly consistent; i.e., it holds a.s. for n → ∞ that b̂_ji → b_ji for j ∈ an(i), b̂_ii = 1, and b̂_ji → 0 for j ∈ V \ An(i).
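For concreteness, here is a compact rendering of Algorithm 1; the transitive-closure bookkeeping replaces an explicit graph library, and the convention Bhat[j, i] = b̂_ji is an assumption about storage layout.

```python
import numpy as np

def estimate_topological_order(Bhat):
    """Sketch of Algorithm 1: insert directed edges (j, i), sorted by bhat_ji from
    big to small, skipping edges that would close a cycle; the accumulated
    reachability of the resulting complete DAG yields a unique topological order."""
    d = Bhat.shape[0]
    reach = np.eye(d, dtype=bool)                    # reach[a, b]: path a -> b exists
    pairs = [(j, i) for j in range(d) for i in range(d) if j != i]
    pairs.sort(key=lambda ji: Bhat[ji[0], ji[1]], reverse=True)
    for j, i in pairs:
        if not reach[i, j]:                          # adding j -> i keeps acyclicity
            # transitive closure: everything reaching j now reaches all i reaches
            reach[np.ix_(reach[:, j], reach[i, :])] = True
    return list(np.argsort(-reach.sum(axis=1)))      # more descendants = earlier
```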
In sections 4.2-4.4 we have discussed how to estimate B under the settings (1)-(3). However, as we know from Corollary 3.10, only critical edges of D contribute to the distribution of U. Asymptotically, we can almost surely identify D^B, since there is an edge j → i in D^B if and only if b_ji > b_jl b_li for all l ∈ de(j) ∩ an(i).
However, in real life we estimate the edges of D^B from a finite data set. Since ⋀_{k=1}^{n} (U^k_i/U^k_j) > 0 holds for all n ∈ N and all i, j ∈ V, the estimators (4.4) or (4.10) result in a matrix representing a complete DAG.
Since small estimated values b̂_ji may well be 0 in the true model, we use a threshold δ1 > 0 with the aim to set an estimate b̂_ji < δ1 equal to 0. However, setting single values b̂_ji := 0 may destroy the idempotency of B̂, since idempotency requires for any triple of nodes (j, l, i)

(4.11) b̂_ji ≥ b̂_jl b̂_li.

For the estimates, however, it might be possible that b̂_ji < δ1, while b̂_jl > δ1 and b̂_li > δ1. In this case, setting b̂_ji = 0 would result in b̂_ji < b̂_jl b̂_li, violating (4.11). To preserve the idempotency of B̂ while setting some small values to 0, we propose a simple adapted thresholding algorithm.
Lemma 4.3. Algorithm 2 with threshold δ1 > 0 outputs an idempotent matrix, i.e. B̂ ⊙ B̂ = B̂, and there is no other idempotent matrix B̄ with b̄_ji = b̂_ji whenever b̂_ji > δ1 that contains more zero entries than B̂.
Remark 4.4. If we choose δ1 ≤ min{b̂_ji : j < i}, no entry is set to 0, and if δ1 > max{b̂_ji : j < i}, all entries are set to 0 except for the diagonal. So in the first case we obtain the complete DAG, and in the second case the DAG consists of isolated nodes only.
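The thresholding rule of Lemma 4.3 is formalized as Algorithm 2 below; the following sketch is our reading of it, zeroing a small entry only when every two-step route through an intermediate node is already broken, so that (4.11) survives. Nodes are assumed to be indexed in topological order.

```python
import numpy as np

def threshold_idempotent(Bhat, delta1):
    """Sketch of Algorithm 2: zero out entries below delta1 while preserving
    idempotency, i.e. keep bhat_ji >= bhat_jl * bhat_li for all triples (j, l, i).
    Bhat is upper triangular w.r.t. a topological order; pairs are processed in
    order of increasing distance i - j, so inner pairs are settled first."""
    B = Bhat.copy()
    d = B.shape[0]
    small = sorted(((j, i) for j in range(d) for i in range(j + 1, d)
                    if 0 < B[j, i] < delta1), key=lambda ji: ji[1] - ji[0])
    for j, i in small:
        # zeroing (j, i) is safe iff every product bhat_jl * bhat_li is already 0
        if all(B[j, l] * B[l, i] == 0 for l in range(j + 1, i)):
            B[j, i] = 0.0
    return B
```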

Algorithm 2 Thresholding while maintaining idempotency
Input: A topological order π: 1, …, d, an idempotent estimate B̂ as in (4.4) or (4.10), and a threshold value δ1 > 0
Output: An idempotent estimate B̂
1: Ě := {(j, i) ∈ V × V : sgn(b̂_ji) = 1 and i ≠ j}
2: Ď := (V, Ě)
3: S := {(j, i) ∈ Ě : 0 < b̂_ji < δ1}
4: Sort the pairs (j, i) in S by the distance i − j from low to high
5: for (j, i) in S do
6:   if for every l with j < l < i: (j, l) ∈ S or (l, i) ∈ S then
7:     b̂_ji := 0
8:   end if
9: end for
10: return B̂

In order to estimate the minimum ML DAG D^B, it is not sufficient to decide whether a path from j to i exists, i.e. whether b_ji > 0; we need in particular to decide whether the edge j → i belongs to D^B. By continuity of the noise variables we may observe b̂_ji > b̂_jl b̂_li for the estimated path weights even if b_ji = b_jl b_li. However, by Proposition 4.2, in this situation the difference (b̂_ji − b̂_jl b̂_li) → 0 a.s. as n → ∞. Therefore, we introduce another threshold δ2 > 0, enforcing an edge in D^B if this difference is greater than δ2. In Theorem 3.12 we have seen that the distribution of the ratio, P(U_i/U_j ≤ b_ji x), is asymptotically determined by P(∏_{k∈S_p} ε_k − 1 ≤ x) for x ↓ 0. Hence, the rate of convergence of (b̂_ji − b̂_jl b̂_li) depends crucially on the path length m = |S_p|. Ideally, we therefore choose δ2 = δ2(n, m) depending not only on the sample size n, but also on the path length m.
More precisely, using the convergence rates from Theorem 5.5 and its proof below, and assuming that C is generic, we find that Algorithm 3 asymptotically identifies D^B if the threshold sequence δ2(n, m) tends to zero slowly enough relative to these rates. In real life we do not know the number of critical edges in any of the three settings. We distinguish between setting (1) and settings (2)-(3) and propose Algorithm 3 with δ2(m) := δ2(n, m); i.e., for a fixed sample size n we focus on the path length m.

Algorithm 3 Approximating max-weighted paths
Input: Threshold sequences δ2(1), …, δ2(d) and (a) a known underlying DAG D := (V, E) and an estimate B̂ as in (4.2), or (b) a (known or estimated) topological order π: 1, …, d and an estimate B̂ as in (4.4) or (4.10)
Output: An estimated minimum DAG D̂^B = (V, Ê^B)
1: Ê^B := ∅ and D̂^B := (V, Ê^B)
2: Setting (1): S := {(j, i) ∈ V × V : j ∈ pa(i)} and infer a topological order π: 1, …, d from D; settings (2)-(3): S := {(j, i) ∈ V × V : j < i}
3: Sort the pairs (j, i) in S by their distance (i − j) according to the topological order from low to high
4: for (j, i) in S do
5:   if there exists a path p from j to i in D̂^B then
6:     set m as the maximum length of such a path in D̂^B
7:     set l := argmax_{l∈V\{j,i}} b̂_jl b̂_li
8:     if b̂_ji − b̂_jl b̂_li > δ2(m) then add the edge (j, i) to Ê^B
9:   else
10:     add the edge (j, i) to Ê^B
11:   end if
12: end for
13: return D̂^B

For setting (1) we know the underlying unweighted DAG D; therefore, we do not need to decide whether some small value b̂_ji corresponds to a path from j to i. However, we do not know the minimum ML DAG D^B, so we apply Algorithm 3 to estimate D^B. For settings (2) and (3) we first apply Algorithm 2 and afterwards Algorithm 3.
In the next section we derive the asymptotic distribution of the estimators.

5. Asymptotic distribution of the minimum ratio estimators. With the goal of proving asymptotic distributional properties of the minimum ratio estimators for the different settings (1)-(3), we require regular variation of the noise variable ε at its left endpoint. Under this condition we first prove that the minimum ratio estimators ⋀_{k=1}^{n} (U^k_i/U^k_j) are also regularly varying. Moreover, we show that their joint limit distribution is a product of Weibull distributions. In this section we assume that C is generic in the sense of Definition 3.5. The results can be extended to a non-generic model by methods similar to those used in Corollary 3.14.
We first recall the family of Weibull distribution functions, which will act as limit distributions for the estimators of B, whose strong consistency we have already proved in section 4.5.
Definition 5.1. A positive random variable Y is Weibull distributed with shape α > 0 and scale s > 0, and we write Y ∼ Weibull(α, s), if the distribution function of Y is given by Ψ_{α,s}(x) = 1 − exp(−(x/s)^α), x ≥ 0.

Next we define regular variation at 0, which transforms to regular variation at 1 (or any other point) and at ∞ by the usual transformations (see [4] for details).

Definition 5.2. Let Y be a positive random variable with distribution function F. Then we call Y or F regularly varying at zero with exponent α > 0 if lim_{t↓0} F(tx)/F(t) = x^α for all x > 0. We abbreviate this by Y ∈ RV^0_α or F ∈ RV^0_α, respectively.
In what follows we assume that the random variables ε̃_i := ln(ε_i) > 0 for i = 1, …, d are iid and regularly varying at zero with exponent α > 0, and we remark in passing that, by a Taylor expansion of ln(ε) at one, this is equivalent to (ε − 1) ∈ RV^0_α or ε ∈ RV^1_α.
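Spelling out the Taylor-expansion step (a standard argument, included here for completeness):

```latex
% Since \ln(\varepsilon) = (\varepsilon - 1) + O\big((\varepsilon - 1)^2\big) as
% \varepsilon \downarrow 1, we have \ln(\varepsilon) \sim \varepsilon - 1, and hence
\lim_{t \downarrow 0}
  \frac{\mathbb{P}(\ln \varepsilon \le t x)}{\mathbb{P}(\ln \varepsilon \le t)}
  = \lim_{t \downarrow 0}
  \frac{\mathbb{P}(\varepsilon - 1 \le t x)}{\mathbb{P}(\varepsilon - 1 \le t)}
  = x^{\alpha}, \qquad x > 0,
% so \ln(\varepsilon) \in \mathrm{RV}^0_\alpha, \ (\varepsilon - 1) \in \mathrm{RV}^0_\alpha
% and \varepsilon \in \mathrm{RV}^1_\alpha are indeed equivalent.
```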
Two families of distribution functions such that ε̃ = ln(ε) ∈ RV^0_α are given in the next example.
We first prove that ln(U_i/U_j) − ln(b_ji) is regularly varying at zero, which is a consequence of Theorem 3.12. In this auxiliary result, as well as in the theorems below, we need that C is generic. Further, for a path p we denote by ζ(p) = |S_p| its path length.
Lemma 5.4. Let U ∈ R_+^d be a recursive ML vector with propagating noise on a DAG D as defined in (3.1), and assume that the path p := [j → … → i] from j to i is generic. Assume further that ε̃ = ln(ε) ∈ RV^0_α. Then ln(U_i/U_j) − ln(b_ji) ∈ RV^0_{ζ(p)α}.

The following is the main result of this section; it describes the asymptotic distribution of the minimum ratio estimator B̂ from (4.10). In particular, it shows that its entries are asymptotically independent.

Theorem 5.5. Let U ∈ R_+^d be a recursive ML vector with propagating noise as defined in (3.1). Assume that C is generic and that ε̃ = ln(ε) ∈ RV^0_α. For every path p_ji from j to i with node set S_{p_ji}, choose a_n^{(ji)} ∼ F^←_{Σ_{k∈S_{p_ji}} ε̃_k}(1/n) as n → ∞. Then the normalized estimation errors (ln(b̂_ji) − ln(b_ji))/a_n^{(ji)} converge jointly in distribution to independent Weibull random variables with shape parameters ζ(p_ji)α, where the scale parameters are determined by the constants c^{(ji)} ∈ (0, 1) defined as in Theorem 3.12.
If we know the minimum ML DAG D^B = (V, E(D^B)), it is preferable to estimate b_ji as in (4.2). Then Theorem 5.5 reduces accordingly.
6. Data analysis and simulation study. We now apply the methods developed in the previous sections to a data example. For a quality assessment we also perform a simulation study.
6.1. Data example. We consider dietary supplement data of n = 8327 independent patients taken from a dietary interview of the NHANES report for the years 2015-2016, which is available at https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DR1TOTI.XPT. The data contain 168 food components, with the objective of estimating the total intake of calories, macro and micro nutrients from foods and beverages consumed during the day prior to the interview. More details can be found on the website.
In [18], the data set has been considered in terms of an adapted k-means clustering algorithm for extremal observations. Moreover, assuming a recursive ML model and standardising the marginal data to regular variation at ∞ with α = 2, [23] investigated the causal relationship between four nutrients using a different estimation method based on scalings.
We first consider the full matrix of minimum ratios B̂ = (b̂_ij)_{d×d} with b̂_ji = ⋀_{t=1}^{n} (X^t_i/X^t_j), given in (6.1). We next apply Algorithm 1 to obtain an estimated topological order π̂ := (AC, LZ, BC, VA).
First we want to assess the quality of the estimated topological order π̂, which also supports or contradicts the model assumption of a max-linear Bayesian network. Motivated by the coefficient of determination R² in regression, we define the following.

Definition 6.1. For a given topological order π and an estimator B̂ of the ML coefficient matrix, we define the ML coefficient of determination R_max(π).

The coefficient R_max(π) can take any value in the interval [0, 1]. A large R_max(π) supports the hypothesis that the underlying graph is a DAG and that the estimated topological order lies in the equivalence class of topological orders defined in (4.6).
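The displayed formula of Definition 6.1 did not survive extraction; the sketch below is therefore only one plausible reading, consistent with (4.9) and with the range [0, 1]: it penalizes the largest minimum ratio that violates the candidate order. The function name and this exact form are our assumptions.

```python
import numpy as np

def r_max(Bhat, order):
    """Hypothetical reading of the ML coefficient of determination R_max(pi):
    1 minus the largest minimum ratio bhat_ji over all pairs violating the
    topological order 'order' (a list of nodes, earliest first)."""
    pos = {v: k for k, v in enumerate(order)}
    d = Bhat.shape[0]
    violating = [Bhat[j, i] for j in range(d) for i in range(d)
                 if j != i and pos[j] > pos[i]]   # pairs with pi(j) > pi(i)
    return 1.0 - max(violating)
```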
In our data example we have R_max(π̂) = 0.929, strongly supporting the hypothesis of a recursive ML model. Now using the estimator (4.10), and applying Algorithms 2 and 3 with δ1 = 0.02 and δ2(k) = 0.02 for k ∈ {1, 2, 3}, we obtain the estimated minimum ML DAG D̂^B and ML coefficient matrix B̂, where we have sorted the matrix according to π̂; these are shown in Figure 3.

Fig 3: Estimated minimum ML DAG D̂^B with estimated ML coefficient matrix B̂.

Observe that, since we estimate the edge from AC to LZ to be absent, there are two possible topological orders.
From the estimates we observe that both α-carotene and β-carotene lead to high amounts of vitamin A. This is in line with our expectation, since β-carotene is a precursor to vitamin A and can be converted by β-carotene 15,15'-monooxygenase by many animals, including humans. Similarly, α-carotene can also be converted to vitamin A; however, it is only half as active as β-carotene, which explains that the estimated edge weight from α-carotene to vitamin A is approximately half of the edge weight from β-carotene to vitamin A (0.146 compared to 0.321). Moreover, we can see that high amounts of lutein+zeaxanthin lead to high amounts of β-carotene, and high amounts of α-carotene also lead to high amounts of β-carotene. However, we did not find a significant connection between α-carotene and lutein+zeaxanthin. Observe that [23] inferred the same topological order, yet with one additional edge from α-carotene to lutein+zeaxanthin; however, it is also the edge with the smallest estimated edge weight. Similarly to [23], we plot bivariate extremes in Figure 4 to underline our finding. The first 5 plots in Figure 4 look rather similar: for every large value of the substance on the vertical axis, we see a large value of the substance on the horizontal axis, and these observations are shaped closely to a line. In contrast, a large value of the substance on the horizontal axis might as well coincide with a small value of the substance on the vertical axis. Therefore, e.g. a high amount of α-carotene leads to a high amount of vitamin A, but a high amount of vitamin A does not necessarily go along with a high amount of α-carotene. This also supports that the dependence is not mutual and hence can be modeled by a DAG. The same can be seen for any pair given in plots 1-5. Moreover, since parts of the observations are shaped closely along a line, which is what we would expect for a recursive ML model, we conclude that the recursive ML model fits the data very well. The sixth plot is different from the other 5: for most large observations of α-carotene the level of lutein+zeaxanthin is not increased, just as most large observations of lutein+zeaxanthin do not come with a high level of α-carotene. Therefore, the two substances do not seem to affect each other, and we rightly concluded that there is no edge.

6.2. Simulation study. We want to illustrate the effect of observational noise in the ML model. We simulate recursive ML vectors with propagating noise, where the innovations Z_1, …, Z_4 are Fréchet(2) distributed, and we use the estimated B̂ from (6.1) of the data analysis above for the ML coefficient matrix B. Moreover, we simulate three different scenarios: in the first scenario we assume the non-noisy model as given in (1.1), for the second scenario we choose the propagating noise model with a medium-sized noise, and in the third setting we choose a noise variable which is stochastically larger. We assume to have no information on the underlying DAG and only consider the quality of the estimator b̂_ji given in (4.5). We choose the sample sizes n ∈ {50, 200, 500, 1000} and 1000 simulation runs for each sample size. We first assess the success probabilities for Algorithm 1.
Table 1 shows that the topological order can be correctly estimated even for small sample sizes. Moreover, the number of correct runs increases for larger noise variables. This is expected, since the noise variables are one-sided: for a path p from j to i the ratio U_i/U_j ≥ d_ji(p) ∏_{k∈S_p} ε_k increases, while the ratio U_j/U_i ≤ 1/(d_ji(p) ∏_{k∈S_p} ε_k) decreases. Therefore, it is easier to identify the paths in D for larger noise.
Next, we want to assess the quality of the estimated ML coefficient matrix B̂. To do so, for every pair (j, i) with b_ji > 0 and every simulation run k ∈ {1, …, 1000}, we denote the minimum ratio estimator given in (4.5) by b̂^k_ji. We consider the empirical RMSE, standard deviation and bias for each b_ji > 0 in each of the three models. All three quantities are comparatively small even for small sample sizes and decrease whenever the sample size increases. Moreover, they are larger in the propagating noise model, and larger noise terms also increase the three quantities. This is in line with what we can expect from the model, as noise terms increase the ratios U_i/U_j and hence also increase the minimum ratio estimator. On the other hand, recall from above that with increasing noise the estimation of the DAG improves.

APPENDIX A: PROOFS OF SECTION 3

Proof of Lemma 3.8. (a) First assume that j ∈ an(i) and de(j) ∩ an(i) = ∅; then the edge j → i is the only path from j to i. Furthermore, the equality b̃_ji = b̃_jk b̃_ki/b̃_kk holds for k = i and k = j, while for all k ∉ {i, j} it must hold that b̃_jk b̃_ki/b̃_kk = 0. Therefore, the equality holds. Moreover, the right-hand side of the inequality again equals zero and we have strict inequality. Now assume j ∈ an(i) and de(j) ∩ pa(i) ≠ ∅. Then for every path p = [j = k_0 → k_1 → … → k_m = i] and every 0 < l < m we have

(A.3) d̃_ji(p) = d̃_{j k_l}(p_1) d̃_{k_l i}(p_2)/b̃_{k_l k_l},

where in the last step we have used that ε_{k_l} = b̃_{k_l k_l}. Therefore, for the random critical path p with b̃_ji = d̃_ji(p), every sub-path of this path is itself critical; otherwise we could find a path of larger random path weight by replacing the sub-path by a path of larger random weight. It follows that b̃_ji ≥ ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki/b̃_kk, with equality whenever the critical path p from j to i contains a node k ∈ de(j) ∩ an(i). Since for k = i or k = j we have b̃_ji = b̃_jk b̃_ki/b̃_kk, and for k ∈ V \ ((an(i) ∩ de(j)) ∪ {j, i}) we have b̃_jk b̃_ki/b̃_kk = 0, the equality holds as well.
(b) First assume that there is a path p := [j → … → k → … → i] with d̃_ji(p) = b̃_ji. Then by (A.3) we have b̃_ji = d̃_jk(p_1) d̃_ki(p_2)/b̃_kk. Now every sub-path of a random critical path must itself be critical, as explained in the proof of part a). Hence, b̃_jk = d̃_jk(p_1) and b̃_ki = d̃_ki(p_2), and for this reason b̃_ji = b̃_jk b̃_ki/b̃_kk. In contrast, let d̃_ji(p) < b̃_ji for all p ∈ P_jki, where P_jki denotes all paths from j to i that pass through k. Now choose p_1 = [j → … → k] and p_2 = [k → … → i] such that d̃_jk(p_1) = b̃_jk and d̃_ki(p_2) = b̃_ki. Then, for the path p ∈ P_jki that results from the concatenation of p_1 and p_2, we have by (3.4) b̃_ji > d̃_ji(p) = b̃_jk b̃_ki/b̃_kk, which proves the reverse direction.
(c) For j = i the inequality obviously holds, since b_ii = 1. If j ∉ An(i), then by definition b̃_ji = b_ji = 0 and b̃_jj = ε_j ≥ 1; therefore, the inequality is equivalent to U_i/U_j ≥ 0, which is true. Now let j ∈ an(i). Then by (3.2) and (3.4) the center ratio can be written as b̃_ji/b̃_jj = ⋁_{p∈P_ji} ∏_{l=1}^{n} c_{k_{l−1} k_l} ε_{k_l}, which is bounded below by b_ji since ε ≥ 1, and bounded above by U_i/U_j. Therefore, the ratio inherits the continuity of the noise variables, and part d) follows.
(e) Next assume that j ∈ an(i) and b_ji = ⋁_{k∈de(j)∩an(i)} b_jk b_ki/b_kk. For a contradiction, assume that b̃_ji > ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki/b̃_kk. This is equivalent to the edge j → i being the random critical path. However, every path p ∈ P_ji has a random path weight which depends on both noise variables ε_i and ε_j; in particular, the non-random critical path p = [j = k_0 → k_1 → … → k_n = i] from j to i with path weight d_ji(p) = b_ji is one of these paths. Therefore, by (3.2) and since b_ji > c_ji, the random path weight of p satisfies d̃_ji(p) = ε_j b_ji ∏_{k∈S_p} ε_k ≥ ε_j ε_i b_ji > ε_j ε_i c_ji = d̃_ji([j → i]), where we have used that ε ≥ 1. This is a contradiction, and hence b̃_ji = ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki/b̃_kk.

(f) The assumptions b_ji > ⋁_{k∈de(j)∩an(i)} b_jk b_ki/b_kk and de(j) ∩ an(i) ≠ ∅ are equivalent to the edge p_max = [j → i] being the only non-random critical path. Let p' = [j = k_0 → k_1 → … → k_n = i] ≠ p_max be the path such that ⋁_{p∈P_ji\{p_max}} d̃_ji(p) = d̃_ji(p'). Then ⋁_{p∈P_ji\{p_max}} d̃_ji(p) = d̃_ji(p') = ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki/b̃_kk, since otherwise we could construct a path of larger random path weight from j to i passing through some k, as explained in the proof of part a). First assume that b̃_ji = d̃_ji(p_max); then b̃_ji > ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki/b̃_kk. Hence, the event given by (A.7) has positive probability, which is, however, strictly smaller than one, since the noise variables do not have an upper bound. Therefore, since b̃_ji ≥ ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki/b̃_kk by part a), the complementary event b̃_ji = ⋁_{k∈de(j)∩an(i)} b̃_jk b̃_ki/b̃_kk also has positive probability.
Proof of Lemma 3.11. (a) Suppose there is an edge k_l → k_{l+1} in p that does not belong to D^B. Then de(k_l) ∩ an(k_{l+1}) ≠ ∅, and by Lemma 3.8 e), P(b̃_{k_l k_{l+1}} = c_{k_l k_{l+1}} ε_{k_l} ε_{k_{l+1}}) = 0, so we can replace the edge k_l → k_{l+1} by some other path to get a new path from j to i of larger random path weight than p. Hence, p is not a possible critical path realization. The same argument can be used for the reverse direction.
(b) First consider the case where it is not true that S_{p_1} ∩ S_{p_2} = ∅, or that for every r ∈ S_{p_1} ∩ S_{p_2} the sub-path of p_1 from j to r is a sub-path of p_2 or the sub-path of p_2 from l to r is a sub-path of p_1. Then there exists some node r ∈ S_{p_1} ∩ S_{p_2} such that p_1 = [j → … → s → r → … → i] and p_2 = [l → … → t → r → … → m] with s ≠ t. Denote by p_11 := [j → … → s] the sub-path of p_1 from j to s. We want to show by contradiction that the event (3.9) has probability zero. Therefore, we consider the subset of Ω on which (3.9) holds and show that it is a null-set. Since on this subset p_1 is the random critical path and passes through s, by Lemma 3.8 b) we have b̃_ji = b̃_js b̃_si/b̃_ss and U_s = U_j b̃_js/ε_j = U_j d_js(p_11) ∏_{k∈S_{p_11}} ε_k. With the same argument it also holds that b̃_ji = b̃_jr b̃_ri/b̃_rr and U_r = U_j b̃_jr/ε_j = U_j d_js(p_11) ∏_{k∈S_{p_11}} ε_k c_sr ε_r. Hence, it must hold that U_r = U_s c_sr ε_r. By the same arguments, we also must have U_r = U_t c_tr ε_r, which together leads to U_s c_sr = U_t c_tr; by (3.4) and (3.6) this is equivalent to an equality between two max-linear functions of the innovations and noise variables.
Now since D is acyclic, there cannot be both a path from s to t and a path from t to s; so without loss of generality we can assume that there is no path from t to s. However, the right-hand side of the equation always contains ε_t, which is not part of the left-hand side. Since Z_1, …, Z_d as well as ε_1, …, ε_d are atom-free and independent random variables, this can only happen on a null-set.
Next consider the reverse direction, i.e. S_{p_1} ∩ S_{p_2} = ∅, or for every r ∈ S_{p_1} ∩ S_{p_2} the sub-path of p_1 from j to r is a sub-path of p_2 or the sub-path of p_2 from l to r is a sub-path of p_1.
If S_{p_1} ∩ S_{p_2} = ∅, then the probability of (3.9) is obviously positive. Without loss of generality we now assume that for every r ∈ S_{p_1} ∩ S_{p_2} the sub-path of p_2 from l to r is a sub-path of p_1. We now define r to be the last common node of the two paths p_1 and p_2. Then p_1 and p_2 induce the paths p = [j → … → l → … → r], p' = [r → … → i] and p'' = [r → … → m], and the event (3.9) has positive probability, since S_p ∩ S_{p'} ∩ S_{p''} = ∅.
Proof of Theorem 3.12. By the law of total probability we decompose, for x ≥ 1, the probability P(U_i/U_j ≤ b_ji x) into the terms I_1(x) and I_2(x) of (A.8), and I_1(x) further into I_11(x) and I_12(x) as in (A.9). Cancelling all noise variables possible, and since ε > 1, we find a lower bound c_1 P(∏_{k∈S_{p_max}} ε_k ≤ x) for some constant c_1 ∈ [0, 1] by independence of the noise variables. We show that c_1 > 0. To do so, recall that b_ji/d_ji(p) > 1 for every p ≠ p_max. Therefore, since {p ∈ P_ji \ {p_max}} ≠ ∅ and ε > 1, the first defining event of c_1 has positive probability. Next, we want to show that also (A.15) holds with positive probability. For this, observe that the left-hand side of the inequality in (A.15) does not contain Z_j, since all paths from j to i pass through j. Since b̃_lj and the left-hand side of the inequality in (A.15) are independent of Z_j for all l ∈ {1, …, d}, and Z_j has unbounded support, Z_j can become arbitrarily large with positive probability such that (A.15) holds. The intersection of the two events also has positive probability, since (A.14) is independent of Z_j. This implies that c_1 > 0 and yields a positive lower bound for I_11(x).
To get an upper bound, observe that ε ≥ 1 and hence, for every node set S_p, the noise product over S_p is bounded below by any of its sub-products. Therefore, starting with (A.13), we find the upper bound c_2(x) P(∏_{k∈S_{p_max}} ε_k ≤ x). Since the innovations and the noise variables are atom-free, it follows that lim_{x↓1} c_2(x) = c_1, and (A.17) follows. We next show that I_12(x) = o(I_11(x)) as x ↓ 1. We have, for each summand m ∈ {1, …, r}, using the simple identity (A.10) to obtain the third line, a bound in terms of the event that the noise variables along p_m compensate the gap between d_ji(p_m) and b_ji; the first event rewrites as (A.18). Moreover, since ε ≥ 1, we have for every subset S ⊆ S_{p_max} that 1 ≤ ∏_{k∈S} ε_k ≤ x whenever 1 ≤ ∏_{k∈S_{p_max}} ε_k ≤ x, and the analogous bound holds for any further node set S' with S ∩ S' = ∅. Finally, since d̃_ji(p_m) = b̃_ji and d_ji(p_m) < d_ji(p_max), we have S_{p_m} \ S_{p_max} ≠ ∅. In total, we obtain that every summand of I_12(x) is negligible with respect to I_11(x) as x ↓ 1, as the interval in the second probability gets arbitrarily small and the distribution of ε is atom-free. Since there are only finitely many nodes, and hence finitely many paths from j to i, we have proved that I_12(x) = o(I_11(x)) as x ↓ 1, so that (A.20) holds. Next, we assume that r = 0, i.e. that there is only one path p_max from j to i. Then from (A.9) we find that I_1(x) = I_11(x), and (A.13) simplifies to c_1 P(∏_{k∈S_{p_max}} ε_k ≤ x) with c_1 > 0; since Z and ε are atom-free, it again follows that lim_{x↓1} c_2(x) = c_1, and therefore (A.20) holds also for r = 0. We next show that I_2(x) = o(I_1(x)) as x ↓ 1. Since I_12(x) = o(I_11(x)) as x ↓ 1, we can and do assume that b̃_ji = b_ji ε_j ∏_{k∈S_{p_max}} ε_k. On the other hand, (A.21) provides the matching bound, which, together with (A.24), proves the result.
Proof of Corollary 3.14. We first show the result for P_ji \ {p_1, …, p_n} ≠ ∅, i.e. there exists a path p from j to i with d_ji(p) < b_ji. We start as in the proof of Theorem 3.12 for x ≥ 1 and, similarly to (A.8), we again apply the law of total probability to I_1(x). With the same arguments as in the proof of Theorem 3.12 we find upper and lower bounds for Ĩ_11(x); analogously to (A.12) and (A.13), these are of the form c_1 P(·) and c_2(x) P(·). With the same arguments as in the previous proof, we can show that c_1 ∈ (0, 1) and c_2(x) → c_1 for x ↓ 1, as well as Ĩ_12(x) = o(Ĩ_11(x)) and I_2(x) = o(I_1(x)). Hence, the result follows. If P_ji \ {p_1, …, p_n} = ∅, the result follows analogously.
Proof of Corollary 3.15. From Theorem 3.12 we have, as x ↓ 1,

P(⋀_{k=1}^{n} (U^k_i/U^k_j) ≤ b_ji x) = 1 − (1 − P(U_i/U_j ≤ b_ji x))^n ∼ n c P(∏_{k∈S_p} ε_k ≤ x),

where we have used the binomial theorem and the fact that the summands for k ≥ 2 are negligible when n is fixed.
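The binomial-theorem step in the last display, written out with q(x) := P(U_i/U_j ≤ b_ji x):

```latex
1 - \bigl(1 - q(x)\bigr)^{n}
  = \sum_{k=1}^{n} \binom{n}{k} (-1)^{k+1} q(x)^{k}
  = n\,q(x) + O\bigl(q(x)^{2}\bigr), \qquad x \downarrow 1,
% for fixed n only the k = 1 term contributes to first order, and
% q(x) \sim c\,\mathbb{P}\bigl(\textstyle\prod_{k \in S_p} \varepsilon_k \le x\bigr)
% by Theorem 3.12.
```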
Proof of Theorem 3.16. We give a proof for P_ji \ {p_1} ≠ ∅ and P_lm \ {p_2} ≠ ∅, i.e. p_1 and p_2 are not the only paths from j to i and from l to m, respectively; all other cases follow analogously. By the law of total probability, we decompose P(U_i/U_j ≤ b_ji x_1, U_m/U_l ≤ b_lm x_2) into terms I_1(x_1, x_2), …, I_4(x_1, x_2). We first consider I_1(x_1, x_2). Observe that for I_11(x) defined in (A.9) we have, by (A.25), that I_1(x_1, x_2) is the bivariate extension of I_11(x). For this reason we can follow the proof of Theorem 3.12 at (A.11): we again find upper and lower bounds c_3(x_1, x_2) ≤ c ≤ c_4(x_1, x_2) based on the analogous decomposition, where the bounding constants collect the probabilities that all non-critical paths and all ancestral contributions respect the corresponding constraints. Since all random variables involved are continuous, c_4(x_1, x_2) tends to c_3(x_1, x_2) for x_1, x_2 ↓ 1, and since i ≠ m, we can use the same arguments as for c_1 in the proof of Theorem 3.12 to show that c > 0. Therefore, we only need to show that I_i(x_1, x_2) = o(I_1(x_1, x_2)) for i ∈ {2, 3, 4}. It is obvious that I_2(x_1, x_2) = o(I_1(x_1, x_2)) implies the other two cases. Using the same arguments from the proof of Theorem 3.12 regarding I_2(x) = o(I_1(x)) and I_12(x) = o(I_11(x)), the result follows.

The proof of Theorem 5.5 is divided into a proof of the one-dimensional marginal limit distributions, followed by the proof of the multidimensional result. We start with the one-dimensional limits.
Proposition C.4. Let U be a recursive ML vector with propagating noise on a DAG D as defined in (3.1) and assume that the path p := [j → … → i] from j to i is generic. Assume further that ε̃ = ln(ε) ∈ RV^0_α. For the node set S_p choose a_n ∼ F^←_{Σ_{k∈S_p} ε̃_k}(1/n) as n → ∞. Let U^1, …, U^n be an iid sample from U. Then a_n^{−1}(⋀_{k=1}^{n} ln(U^k_i/U^k_j) − ln(b_ji)) converges in distribution to a Weibull random variable with shape ζ(p)α, whose scale is determined by the constant c from Theorem 3.12.

Proof. Recall from Theorem 3.12 and the regular variation of ε̃ that, for the same c as defined in Theorem 3.12, we have, as x ↓ 0, P(ln(U_i/U_j) − ln(b_ji) ≤ x) ∼ c P(Σ_{k∈S_p} ε̃_k ≤ x).








Proof of Lemma C.2. (a) We use Proposition C.1 a) for U being F_X or F_Y, the distribution function of X or Y, respectively. Since the Laplace-Stieltjes transforms F̂_X(s) = ∫_{[0,∞)} e^{−sx} dF_X(x) ≤ 1 and F̂_Y(s) = ∫_{[0,∞)} e^{−sx} dF_Y(x) ≤ 1 for all s ≥ 0, by Proposition C.1 they are both regularly varying at ∞ in the sense of (C.1); i.e., F̂_X ∈ RV^∞_{α_1}, F̂_Y ∈ RV^∞_{α_2}. By independence, the convolution theorem for Laplace-Stieltjes transforms gives F̂_{X+Y}(s) = F̂_X(s) F̂_Y(s) and, therefore, F̂_{X+Y} ∈ RV^∞_{α_1+α_2}. Applying again Proposition C.1, we find that X + Y ∈ RV^0_{α_1+α_2}.
(b) This follows from a Taylor expansion.

Proof of Lemma 5.4. By Theorem 3.12 we get, for x > 0,

lim_{t↓0} P(ln(U_i/U_j) − ln(b_ji) ≤ tx) / P(ln(U_i/U_j) − ln(b_ji) ≤ t)
= lim_{t↓0} P(U_i/U_j ≤ b_ji exp(tx)) / P(U_i/U_j ≤ b_ji exp(t))
= lim_{t↓0} c P(∏_{k∈S_p} ε_k ≤ exp(tx)) / (c P(∏_{k∈S_p} ε_k ≤ exp(t)))
= lim_{t↓0} P(Σ_{k∈S_p} ln(ε_k) ≤ tx) / P(Σ_{k∈S_p} ln(ε_k) ≤ t) = x^{ζ(p)α}

for ζ(p) = |S_p|, by Lemma C.2 a) and the fact that ln(ε_k) ∈ RV^0_α.

For the proof of Theorem 5.5 we need the following distribution family.

Definition C.3. A positive random variable Y is Fréchet distributed with shape α > 0 and scale s > 0, and we write Y ∼ Fréchet(α, s), if the distribution function of Y is given by Φ_{α,s}(x) = exp(−(x/s)^{−α}), x > 0.