Geometry of weighted recursive and affine preferential attachment trees

We study two models of growing recursive trees. For both models, initially the tree only contains one vertex $u_1$ and at each time $n\geq 2$ a new vertex $u_n$ is added to the tree and its parent is chosen randomly according to some rule. In the \emph{weighted recursive tree}, we choose the parent $u_k$ of $u_n$ among $\{u_1,u_2,\dots, u_{n-1}\}$ with probability proportional to $w_k$, where $(w_n)_{n\geq1}$ is some deterministic sequence that we fix beforehand. In the affine preferential attachment tree with initial fitnesses, the probability of choosing the same $u_k$ is proportional to $a_k+\mathrm{deg}^{+}(u_k)$, where $\mathrm{deg}^{+}(u_k)$ denotes its current number of children, and the sequence of initial fitnesses $(a_n)_{n\geq 1}$ is deterministic and chosen as a parameter of the model. We show that for any sequence $(a_n)_{n\geq 1}$, the corresponding preferential attachment tree has the same distribution as some weighted recursive tree with a random sequence of weights (with some explicit distribution). We then prove almost sure convergences for some statistics associated to weighted recursive trees as time goes to infinity, such as degree sequence, height, profile and measures. Thanks to the connection between the two models, these results also apply to affine preferential attachment trees.


Introduction
The uniform recursive tree was introduced in the 1970s as an example of a random graph constructed by successive addition of vertices: starting from a tree with a single vertex, the vertices arrive one by one and the $n$-th vertex picks its parent uniformly at random from the $n-1$ vertices already present. Many properties of this tree were then investigated thanks to its particularly simple dynamics: number of leaves, profile, height, degrees, distribution of vertices into subtrees, and so on. We refer to [13] for an overview. A generalisation of the uniform recursive tree, the weighted recursive tree (WRT), was introduced in [6] in 2006. In this model, each vertex is assigned a non-negative weight, constant in time. When a newcomer randomly picks its parent, it does so with probability proportional to those weights. Although more general than the uniform recursive tree, WRTs have attracted far fewer contributions; see e.g. [20,16].
We will also consider another model of trees, which we call the affine preferential attachment tree (PA) with initial fitnesses. In this model, every vertex has a fixed initial fitness, and the probability of picking any vertex to be the parent of a newcomer is proportional to its initial fitness plus its current number of children. This type of preferential attachment mechanism has been extensively studied in the last two decades because it shares some quantitative properties with real networks; see in particular the literature about the Barabási-Albert model. One of our motivations for studying such trees arises from the analysis of some growing random graphs, see the companion paper [25].
We shall see that using a de Finetti-type argument, preferential attachment trees can be seen as WRT with random weights. This will enable us to translate results obtained for WRT to corresponding results for PA.

Two related models of growing trees
Definitions. For any sequence of non-negative real numbers $(w_n)_{n\geq 1}$ with $w_1>0$, we define the distribution $\mathrm{WRT}((w_n)_{n\geq 1})$ on sequences of growing rooted trees$^{1}$, which is called the weighted recursive tree with weights $(w_n)_{n\geq 1}$. We construct a sequence of rooted trees $(T_n)_{n\geq 1}$ starting from $T_1$ containing only one root-vertex $u_1$ and let it evolve in the following manner: the tree $T_{n+1}$ is obtained from $T_n$ by adding a vertex $u_{n+1}$ with label $n+1$. The father of this new vertex is chosen to be the vertex with label $K_{n+1}$, where
$$\forall k\in\{1,\dots,n\},\qquad \mathbb{P}\left(K_{n+1}=k \mid T_n\right)=\frac{w_k}{\sum_{i=1}^{n}w_i}.$$
In this definition, we also allow sequences of weights $(w_n)_{n\geq 1}$ that are random, and in this case the distribution $\mathrm{WRT}((w_n)_{n\geq 1})$ denotes the law of the random tree obtained by the above process conditionally on $(w_n)_{n\geq 1}$, so that the obtained distribution on growing trees is a mixture of WRT with deterministic sequences. Similarly, for any sequence $(a_n)_{n\geq 1}$ of real numbers, with $a_1>-1$ and $a_n\geq 0$ for $n\geq 2$, we define another model of growing tree. The construction goes on as before: $P_1$ contains only one root-vertex $u_1$, and $P_{n+1}$ is obtained from $P_n$ by adding a vertex $u_{n+1}$ with label $n+1$. The father of the newcomer is chosen to be the vertex with label $J_{n+1}$, where now
$$\forall k\in\{1,\dots,n\},\qquad \mathbb{P}\left(J_{n+1}=k \mid P_n\right)=\frac{a_k+\deg^{+}_{P_n}(u_k)}{\sum_{i=1}^{n}a_i+n-1},$$
where $\deg^{+}_{P_n}(\cdot)$ denotes the number of children in the tree $P_n$. In the particular case where $n=1$, the second vertex $u_2$ is always defined as a child of $u_1$, even in the case $-1<a_1\leq 0$ for which the last display does not make sense. We call this sequence of trees an affine preferential attachment tree with initial fitnesses $(a_n)_{n\geq 1}$ and its law is denoted by $\mathrm{PA}((a_n)_{n\geq 1})$.
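The two growth rules above can be simulated directly. The following is only a minimal sketch, not code from the paper: the function names `grow_wrt` and `grow_pa`, the 0-based indexing and the representation of a tree by its list of parents are illustrative assumptions.

```python
import random


def grow_wrt(weights, rng):
    """Parent list of a weighted recursive tree (illustrative sketch).

    weights[k] is the weight of vertex u_{k+1} (0-based storage) and
    weights[0] must be positive. parents[0] is None (the root) and
    parents[n] is the 0-based index of the parent of u_{n+1}.
    """
    parents = [None]
    for n in range(1, len(weights)):
        # The newcomer picks index k < n with probability w_k / W_n.
        k = rng.choices(range(n), weights=weights[:n])[0]
        parents.append(k)
    return parents


def grow_pa(fitnesses, rng):
    """Parent list of an affine preferential attachment tree (sketch)."""
    parents = [None]
    children = [0]  # current number of children of each vertex
    for n in range(1, len(fitnesses)):
        if n == 1:
            k = 0  # u_2 is always a child of u_1, even when a_1 <= 0
        else:
            # Vertex i is chosen proportionally to a_i + deg^+(u_i).
            scores = [fitnesses[i] + children[i] for i in range(n)]
            k = rng.choices(range(n), weights=scores)[0]
        parents.append(k)
        children[k] += 1
        children.append(0)
    return parents
```

For instance, `grow_wrt([1.0] * 1000, random.Random(0))` would produce one sample of the uniform recursive tree with 1000 vertices, since constant weights make every existing vertex equally likely.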
Here and in the rest of the paper, whenever we have any sequence of real numbers $(x_n)_{n\geq 1}$, we write $\mathbf{x}=(x_n)_{n\geq 1}$ in a bold font as a shorthand for the sequence itself, and $(X_n)_{n\geq 1}$ with a capital letter to denote the sequence of partial sums defined for all $n\geq 1$ as $X_n:=\sum_{i=1}^{n}x_i$. In particular, we do so for sequences of initial fitnesses $(a_n)_{n\geq 1}$, for deterministic sequences of weights $(w_n)_{n\geq 1}$ and for random sequences of weights $(\mathsf{w}_n)_{n\geq 1}$.
Representation result. The following result gives a connection between these two models of growing trees. It is an analogue of the so-called "Pólya urn-representation" result described in [2,Theorem 2.1] or [5,Section 1.2] for related models.
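Theorem 1 (WRT-representation of PA trees). For any sequence $\mathbf{a}$ of initial fitnesses, we define the associated random sequence $\mathsf{w}^{\mathbf{a}}=(\mathsf{w}^{\mathbf{a}}_n)_{n\geq 1}$ as
$$\mathsf{w}^{\mathbf{a}}_1=\mathsf{W}^{\mathbf{a}}_1=1 \quad\text{and}\quad \forall n\geq 2,\quad \mathsf{W}^{\mathbf{a}}_n=\prod_{k=1}^{n-1}\beta_k^{-1},$$
where the $(\beta_k)_{k\geq 1}$ are independent with respective distributions $\mathrm{Beta}(A_k+k,\,a_{k+1})$. Then, the distributions $\mathrm{PA}(\mathbf{a})$ and $\mathrm{WRT}(\mathsf{w}^{\mathbf{a}})$ coincide.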
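As an illustration of the theorem, the random weights can be sampled from independent Beta variables. The sketch below assumes the product form $\mathsf{W}^{\mathbf{a}}_n=\prod_{k=1}^{n-1}\beta_k^{-1}$, which is the reading consistent with the backward relation $\mathsf{M}^{\mathbf{a}}_n=\beta_n\cdot \mathsf{M}^{\mathbf{a}}_{n+1}$ recalled later in the introduction. The function name `sample_wrt_weights` is hypothetical, and treating $\mathrm{Beta}(\cdot,0)$ as a point mass at 1 when $a_{k+1}=0$ is an assumption of this sketch.

```python
import random


def sample_wrt_weights(fitnesses, rng):
    """Sample (w_1, ..., w_N) and (W_1, ..., W_N) for fitnesses (a_1, ..., a_N).

    Assumes W_1 = 1 and W_{k+1} = W_k / beta_k with independent
    beta_k ~ Beta(A_k + k, a_{k+1}), where A_k = a_1 + ... + a_k.
    """
    cum_w = [1.0]               # W_1 = 1
    partial_a = fitnesses[0]    # A_1
    for k in range(1, len(fitnesses)):
        a_next = fitnesses[k]   # a_{k+1}
        if a_next == 0:
            # Assumption: Beta(., 0) is read as a point mass at 1,
            # so a vertex with zero fitness receives zero weight.
            beta = 1.0
        else:
            beta = rng.betavariate(partial_a + k, a_next)
        cum_w.append(cum_w[-1] / beta)
        partial_a += a_next     # now equals A_{k+1}
    weights = [cum_w[0]] + [cum_w[i] - cum_w[i - 1] for i in range(1, len(cum_w))]
    return weights, cum_w
```

Feeding the sampled weights to any simulator of the weighted recursive tree would then give, under the theorem, a sample of the preferential attachment tree with the chosen fitnesses.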
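Let us quickly explain how this sequence $\mathsf{w}^{\mathbf{a}}$ can be read from the growth of the trees $(P_n)_{n\geq 1}\sim\mathrm{PA}(\mathbf{a})$. For any sequence of weights $\mathbf{w}$ that satisfies
$$W_n\underset{n\to\infty}{\sim}C\cdot n^{\gamma} \tag{2}$$
for some $\gamma\in(0,1)$ and some constant $C>0$, it is easy to prove that the degrees of vertices in a sequence of random trees $(T_n)_{n\geq 1}$ with distribution $\mathrm{WRT}(\mathbf{w})$ are such that almost surely for all $k\geq 1$,
$$n^{-(1-\gamma)}\cdot\deg^{+}_{T_n}(u_k)\underset{n\to\infty}{\longrightarrow}\frac{w_k}{C(1-\gamma)}. \tag{3}$$
From this observation, if we suppose that the theorem holds and that the sequence $\mathsf{w}^{\mathbf{a}}$ almost surely has the behaviour (2), then using the convergence (3) for the sequence $(P_n)_{n\geq 1}\sim\mathrm{WRT}(\mathsf{w}^{\mathbf{a}})$ conditionally on the sequence $\mathsf{w}^{\mathbf{a}}$ ensures that for all $k\geq 1$,
$$\frac{\deg^{+}_{P_n}(u_k)}{\deg^{+}_{P_n}(u_1)}\underset{n\to\infty}{\longrightarrow}\frac{\mathsf{w}^{\mathbf{a}}_k}{\mathsf{w}^{\mathbf{a}}_1}=\mathsf{w}^{\mathbf{a}}_k$$
almost surely.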
As suggested by the last display, the result of the theorem is obtained by studying the evolution of the degrees in the preferential attachment model (P n ) n≥1 . The key argument lies in the fact that we can describe the whole process using a sequence of independent Pólya urns, related to the degrees of the vertices. The theorem is then obtained by using de Finetti theorem for these urns.
In fact, and this is the content of Proposition 2 below, if $A_n$ grows linearly as $c\cdot n$ for some $c>0$, then the sequence $(\mathsf{W}^{\mathbf{a}}_n)$ indeed almost surely satisfies (2) for $\gamma=\frac{c}{c+1}$. This is done using moment computations under the explicit definition of $(\mathsf{W}^{\mathbf{a}}_n)_{n\geq 1}$ given by the theorem. In the rest of the paper, we investigate several properties of the WRT under this type of assumption on the sequence of weights, such as convergence of height, profile and measures carried on the tree. Thanks to this connection, our results then also hold for PA trees under the assumption that $A_n$ grows linearly.
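Assumptions on the sequences. For two sequences $(x_n)$ and $(y_n)$ we say that
$$x_n\underset{n\to\infty}{\bowtie}y_n \quad\text{if and only if}\quad \exists\epsilon>0,\ x_n=\left(1+O(n^{-\epsilon})\right)\cdot y_n.$$
Our main assumption for sequences $\mathbf{a}=(a_n)_{n\geq 1}$ of initial fitnesses is the following $(H_c)$, which is parametrised by some $c>0$ and ensures that the initial fitness of vertices is $c$ on average:
$$A_n\underset{n\to\infty}{\bowtie}c\cdot n. \tag{$H_c$}$$
For sequences of weights $\mathbf{w}=(w_n)_{n\geq 1}$, we introduce the following hypothesis, which depends on a parameter $\gamma>0$:
$$W_n\underset{n\to\infty}{\bowtie}\mathrm{cst}\cdot n^{\gamma}. \tag{$\square_\gamma$}$$
The following proposition ensures in particular that our assumption on sequences of initial fitnesses $\mathbf{a}$ translates to a power behaviour for the random sequence of cumulated weights $(\mathsf{W}^{\mathbf{a}}_n)_{n\geq 1}$ defined in Theorem 1.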
Proposition 2. Suppose that there exists $c>0$ such that $\mathbf{a}$ satisfies $(H_c)$. Then the random sequence $(\mathsf{w}^{\mathbf{a}}_n)_{n\geq 1}$ defined in Theorem 1 almost surely satisfies $(\square_\gamma)$ with $\gamma=\frac{c}{c+1}$.
Convergence of degrees using the WRT representation. In the WRT with a deterministic sequence of weights that satisfies (2), the degree of one fixed vertex evolves as a sum of independent Bernoulli random variables, and it is possible to handle it with elementary methods and obtain (3). Further calculations allow us to improve this statement to a convergence in an $\ell^p$ sense, for sequences $\mathbf{w}$ that satisfy some additional control. A precise version of this statement is given in Proposition 5. Suppose that $\mathbf{a}$ satisfies $(H_c)$. Applying this convergence to a sequence of random trees $(P_n)_{n\geq 1}$ with distribution $\mathrm{PA}(\mathbf{a})$, using its WRT-representation provided by Theorem 1 together with Proposition 2, yields the following almost sure convergence to a random sequence, in the product topology,
$$n^{-\frac{1}{c+1}}\cdot\left(\deg^{+}_{P_n}(u_1),\deg^{+}_{P_n}(u_2),\dots\right)\underset{n\to\infty}{\longrightarrow}\left(\mathsf{m}^{\mathbf{a}}_1,\mathsf{m}^{\mathbf{a}}_2,\dots\right),$$
which also takes place in the space $\ell^p$ for all $p>\frac{c+1}{1-(c+1)c'}$ as soon as $a_n\leq n^{c'+o(1)}$ for some $0\leq c'<\frac{1}{c+1}$. This improves some $\ell^p$ convergence proved in distribution in [21] for a related model, which we treat in Proposition 31.
Of course, thanks to our discussion above concerning the convergence of degrees, it is immediate that the sequence $(\mathsf{m}^{\mathbf{a}}_n)_{n\geq 1}$ is almost surely proportional to the sequence $(\mathsf{w}^{\mathbf{a}}_n)_{n\geq 1}$, i.e.
$$(\mathsf{m}^{\mathbf{a}}_n)_{n\geq 1}=\frac{c+1}{Z}\cdot(\mathsf{w}^{\mathbf{a}}_n)_{n\geq 1},$$
where $Z$ is the random variable such that $\mathsf{W}^{\mathbf{a}}_n\sim Z\cdot n^{\frac{c}{c+1}}$ almost surely as $n\to\infty$, which exists thanks to Proposition 2. Note that even though $(\mathsf{W}^{\mathbf{a}}_n)_{n\geq 1}$ was defined as a product of independent random variables, this is no longer the case for $(\mathsf{M}^{\mathbf{a}}_n)_{n\geq 1}$, since the random variable $Z$ depends on the whole sequence $(\beta_n)_{n\geq 1}$ used in the definition of $(\mathsf{W}^{\mathbf{a}}_n)_{n\geq 1}$. Nevertheless, the sequence still has the nice property of being an inhomogeneous Markov chain with a simple backward transition, characterised by the equality $\mathsf{M}^{\mathbf{a}}_n=\beta_n\cdot\mathsf{M}^{\mathbf{a}}_{n+1}$, where $\beta_n$ is independent of $\mathsf{M}^{\mathbf{a}}_{n+1}$ and has distribution $\mathrm{Beta}(A_n+n,\,a_{n+1})$. This is the content of Proposition 27.
Distribution of the limiting chain. For some specific choices of sequences a, the distribution of the chain (M a n ) n≥1 is explicit. Whenever a is of the form we retrieve Goldschmidt and Haas' Mittag-Leffler Markov chain family, introduced in [15] and also studied by James [17]. The other case where the chain is explicit is when a is of the form In this case, the process (M a n ) n≥1 is constant on the interval of the form 1 + kℓ , (k + 1)ℓ and we define N a k := M a (k−1)ℓ+1 for all k ≥ 1. Then the sequence ℓ ℓ m+ℓ m+ℓ ·(N a k ) k≥1 has the Product Generalised Gamma distribution PGG (a, ℓ, m), which we define in Section 5.1.2.

Other geometric properties of weighted random trees
Let us now state the convergence for other statistics of weighted random trees, namely profile, height and probability measures. Here we let $(T_n)_{n\geq 1}$ be a sequence of trees evolving according to the distribution $\mathrm{WRT}(\mathbf{w})$ for some deterministic sequence $\mathbf{w}$ and state our results in this setting. Our results also apply to random sequences of weights that satisfy the assumptions of the theorems almost surely; hence they apply to PA trees with appropriate sequences of initial fitnesses, thanks to Theorem 1 and Proposition 2.

Height and profile of WRT
For all $n\geq 1$ and $k\geq 0$, let
$$L_n(k):=\#\left\{1\leq i\leq n \mid \mathrm{ht}(u_i)=k\right\}$$
be the number of vertices of $T_n$ at height $k$. The function $k\mapsto L_n(k)$ is called the profile of the tree $T_n$. The height of the tree is the maximal distance of a vertex to the root, which we can also express as $\mathrm{ht}(T_n):=\max\{k\geq 0 \mid L_n(k)>0\}$. We are interested in the asymptotic behaviour of $L_n$ and $\mathrm{ht}(T_n)$ as $n\to\infty$.
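The profile and the height are straightforward to compute from the list of parents of a growing tree. The following sketch assumes, as an illustrative convention not taken from the paper, that a tree is stored as a list `parents` with `parents[0] = None` for the root and `parents[n] < n` for every other vertex.

```python
def heights_from_parents(parents):
    """Height of every vertex, given 0-based parent indices.

    Vertices are listed in order of arrival, so a parent always
    appears before its children and one forward pass suffices.
    """
    heights = [0]
    for n in range(1, len(parents)):
        heights.append(heights[parents[n]] + 1)
    return heights


def profile(parents):
    """Return [L(0), L(1), ..., L(ht)], the number of vertices per level."""
    heights = heights_from_parents(parents)
    levels = [0] * (max(heights) + 1)
    for h in heights:
        levels[h] += 1
    return levels


def tree_height(parents):
    """Maximal distance of a vertex to the root."""
    return len(profile(parents)) - 1
```

With this convention, a path on four vertices has profile `[1, 1, 1, 1]` and a star with three leaves has profile `[1, 3]`.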
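In order to express our results, we need to introduce some quantities. For $\gamma>0$, we define the function $f_\gamma:\mathbb{R}\to\mathbb{R}$ as
$$f_\gamma(z):=1+\gamma\left(e^{z}-1-ze^{z}\right).$$
This function is increasing on $(-\infty,0]$ and decreasing on $[0,\infty)$ with $f_\gamma(-\infty)=1-\gamma$, $f_\gamma(0)=1$ and $f_\gamma(\infty)=-\infty$. We define $z_+$ and $z_-$ as
$$z_+:=\sup\{z\in\mathbb{R} \mid f_\gamma(z)>0\} \quad\text{and}\quad z_-:=\inf\{z\in\mathbb{R} \mid f_\gamma(z)>0\}.$$
We are going to assume that we work with a sequence $\mathbf{w}$ which satisfies the following assumption $(\square^{p}_{\gamma})$ for some $\gamma>0$ and $p\in(1,2]$,
$$W_n\underset{n\to\infty}{\bowtie}\mathrm{cst}\cdot n^{\gamma} \quad\text{and}\quad \sum_{i=n}^{2n}w_i^{p}\leq n^{1+(\gamma-1)p+o(1)}. \tag{$\square^{p}_{\gamma}$}$$
Thanks to Proposition 2, this property is almost surely satisfied for $\gamma=\frac{c}{c+1}$ by the random sequence $\mathsf{w}^{\mathbf{a}}$ for any sequence $\mathbf{a}$ of initial fitnesses satisfying $A_n\underset{n\to\infty}{\bowtie}c\cdot n$ and $a_n\leq(n+1)^{o(1)}$.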
Theorem 3. Suppose that there exist $\gamma>0$ and $p\in(1,2]$ such that the sequence $\mathbf{w}$ satisfies $(\square^{p}_{\gamma})$. Then, for a sequence of random trees $(T_n)_{n\geq 1}\sim\mathrm{WRT}(\mathbf{w})$, we have the almost sure asymptotics for the profile
$$L_n(k)=\frac{n}{\sqrt{2\pi\gamma\log n}}\left(\exp\left(-\frac{(k-\gamma\log n)^2}{2\gamma\log n}\right)+O\left(\frac{1}{\sqrt{\log n}}\right)\right), \tag{7}$$
where the error term is uniform in $k\geq 0$. Also for any compact $K\subset(z_-,z_+)$ we have almost surely for all $z\in K$
$$L_n\left(\lfloor\gamma e^{z}\log n\rfloor\right)=n^{f_\gamma(z)-\frac{1}{2}\frac{\log\log n}{\log n}+O\left(\frac{1}{\log n}\right)}, \tag{8}$$
where the error term is uniform in $z\in K$. Moreover, we have the almost sure convergence
$$\frac{\mathrm{ht}(T_n)}{\log n}\underset{n\to\infty}{\longrightarrow}\gamma e^{z_+}. \tag{9}$$
The proof of this result follows the path used for many similar results for trees with logarithmic growth (see [7,8,19]): we study the Laplace transform of the profile $z\mapsto\sum_{k=0}^{n}e^{zk}L_n(k)$ on an open domain of the complex plane and prove its convergence to some random analytic function when appropriately rescaled. Then, we apply [18, Theorem 2.1], which consists in a fine Fourier inversion argument and hence allows us to obtain precise asymptotics for $L_n$. The application of the theorem in its full generality proves a so-called Edgeworth expansion for $L_n$, which we express here in a weaker form by equations (7) and (8). The convergence (7) expresses that the profile is asymptotically close to a Gaussian shape centred around $\gamma\log n$ and with variance $\gamma\log n$, so that a majority of vertices have a height of order $\gamma\log n$. The second equation (8) provides the behaviour of the number of vertices at a given height, for heights that are not necessarily close to $\gamma\log n$ (for which the preceding result ensures that there are of order $\frac{n}{\sqrt{\log n}}$ vertices per level). According to this result, at height $\lfloor\gamma e^{z}\log n\rfloor$ for any $z\in(z_-,z_+)$ there are of order $\frac{n^{f_\gamma(z)}}{\sqrt{\log n}}$ vertices. Remark that the exponent $f_\gamma(z)$ is continuous in $z$ and tends to $0$ when $z\to z_+$. Although this does not directly prove the convergence (9), it already provides a lower bound for $\mathrm{ht}(T_n)$ since it ensures that asymptotically there always exist vertices at height $\lfloor\gamma e^{(z_+-\epsilon)}\log n\rfloor$, for any small $\epsilon>0$.
The convergence of the height (9) can then be obtained by proving a corresponding upper-bound, which can be done using quite rough estimates. This result includes the well-known asymptotics ht(T n ) ∼ e log n as n → ∞ for the uniform random tree, proved for example in [12,24]. Using the connection of preferential attachment trees to weighted recursive trees given by Theorem 1, it also includes the case of preferential attachment trees with constant initial fitnesses, for which similar results were proved, in [24] for the height and in [19] for the asymptotic behaviour of the profile (7).
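The constant governing the height can be evaluated numerically. The sketch below takes $f_\gamma(z)=1+\gamma(e^{z}-1-ze^{z})$, an assumed closed form that matches the stated values $f_\gamma(0)=1$ and $f_\gamma(-\infty)=1-\gamma$ as well as the equation $\gamma(ze^{z}-e^{z}+1)-1=0$ given for $z_+$ in Proposition 9. It locates $z_+$ by bisection on $[0,\infty)$, where the function is decreasing; the function names are illustrative.

```python
import math


def f_gamma(gamma, z):
    """Assumed closed form of the exponent function f_gamma."""
    return 1.0 + gamma * (math.exp(z) - 1.0 - z * math.exp(z))


def z_plus(gamma, tol=1e-12):
    """Largest real zero of f_gamma, found by bisection on [0, infinity)."""
    lo, hi = 0.0, 1.0
    # f_gamma(0) = 1 > 0 and f_gamma decreases to -infinity on [0, inf),
    # so doubling hi eventually brackets the zero.
    while f_gamma(gamma, hi) > 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_gamma(gamma, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def height_constant(gamma):
    """The constant gamma * exp(z_plus) multiplying log n in the height."""
    return gamma * math.exp(z_plus(gamma))
```

For $\gamma=1$ one has $f_1(1)=0$, so `z_plus(1.0)` should be close to 1 and `height_constant(1.0)` close to $e$, in agreement with the asymptotics $\mathrm{ht}(T_n)\sim e\log n$ for the uniform case recalled above.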
As a complement to this result, let us mention that there is another case where we can compute the asymptotic height of the tree, which corresponds to sequences $\mathbf{w}$ that grow fast to infinity. For any sequence of weights $\mathbf{w}$, a quantity of interest is $\sum_{i=2}^{n}\frac{w_i}{W_i}$, which is the expected height of a "typical" point. When this quantity grows faster than logarithmically, we have the almost sure convergence (see Proposition 25 in Section 3.3) which in some sense indicates that all the action takes place at the very tip of the tree.

Convergence of the weight measure
We also study the convergence of some natural probability measures defined on the trees (T n ) n≥1 . This will prove useful for the applications developed in the companion paper [25].
For this result it will be easier to work with plane trees. We introduce the Ulam-Harris tree $\mathbb{U}=\bigcup_{n=0}^{\infty}\mathbb{N}^{n}$, where $\mathbb{N}:=\{1,2,\dots\}$. Classically, a plane tree $\tau$ is defined as a non-empty subset of $\mathbb{U}$ such that (i) if $v\in\tau$ and $v=ui$ for some $u\in\mathbb{U}$ and $i\in\mathbb{N}$, then $u\in\tau$; (ii) for every $u\in\tau$, there exists an integer $\deg^{+}_{\tau}(u)\geq 0$ such that $ui\in\tau$ if and only if $1\leq i\leq\deg^{+}_{\tau}(u)$. We choose to construct our sequence $(T_n)_{n\geq 1}$ of weighted recursive trees as plane trees by considering that each time a vertex is added, it becomes the right-most child of its parent. In this way the vertices $(u_1,u_2,\dots)$ of the trees $(T_n)_{n\geq 1}$, listed in order of arrival, form a sequence of elements of $\mathbb{U}$. In fact, from now on, we will always assume that we use this particular embedded construction, both for WRT and PA trees.
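We also denote $\partial\mathbb{U}=\mathbb{N}^{\mathbb{N}}$, which can be interpreted as the set of infinite paths from the root to infinity, and write $\overline{\mathbb{U}}=\mathbb{U}\cup\partial\mathbb{U}$. We classically endow this set with the distance
$$d(u,v)=\exp\left(-\mathrm{ht}(u\wedge v)\right),$$
where $u\wedge v$ denotes the most recent common ancestor of $u$ and $v$ in $\overline{\mathbb{U}}$.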
For every $n\geq 1$, we define the measure $\mu_n$ on $\overline{\mathbb{U}}$, which only charges the set $\{u_1,\dots,u_n\}$ of vertices of $T_n$, with for any $1\leq k\leq n$,
$$\mu_n(\{u_k\})=\frac{w_k}{W_n}.$$
We refer to $\mu_n$ as the natural weight measure on $T_n$. The following theorem classifies the possible behaviours of $(\mu_n)$ for any weight sequence.
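Theorem 4. The sequence $(\mu_n)_{n\geq 1}$ converges almost surely weakly towards a limiting probability measure $\mu$ on $\overline{\mathbb{U}}$. There are three possible behaviours for $\mu$:
(i) If $\sum_{i=1}^{\infty}w_i<\infty$, then $\mu$ is carried on $\mathbb{U}$.
(ii) If $\sum_{i=1}^{\infty}w_i=\infty$ and $\sum_{i=1}^{\infty}\left(\frac{w_i}{W_i}\right)^{2}<\infty$, then $\mu$ is diffuse and supported on $\partial\mathbb{U}$.
(iii) If $\sum_{i=1}^{\infty}w_i=\infty$ and $\sum_{i=1}^{\infty}\left(\frac{w_i}{W_i}\right)^{2}=\infty$, then $\mu$ is concentrated on one point of $\partial\mathbb{U}$.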
This convergence can be extended to other natural measures on the tree, such as the uniform measure on T n , or some "preferential attachment measure" which charges each vertex proportionally to some affine function of its degree. This is the content of Proposition 8.
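For a finite tree, the mass that the weight measure gives to the subtree descending from each vertex can be computed in a single backward sweep. The sketch below is illustrative only: the function name `subtree_weight_fractions` and the parent-list representation (0-based indices, `parents[0] = None`, `parents[n] < n`) are assumptions.

```python
def subtree_weight_fractions(parents, weights):
    """Return the weight-measure mass of the subtree of each vertex.

    Entry k is (total weight of u_{k+1} and its descendants) / W_n,
    where n = len(parents) is the current number of vertices.
    """
    n = len(parents)
    mass = [float(w) for w in weights[:n]]
    # Children have larger indices than their parent, so sweeping from
    # the last vertex to the first pushes each subtree mass upward once.
    for i in range(n - 1, 0, -1):
        mass[parents[i]] += mass[i]
    total = sum(weights[:n])
    return [m / total for m in mass]
```

Tracking entry `k` of this output along the growth of the tree gives the proportion of mass above a fixed vertex, whose limit as the tree grows is the quantity described by the theorem above.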

Organisation of the paper
The paper is organised as follows. We first investigate some properties of weighted random trees (T n ) n≥1 with deterministic weight sequence w. In Section 2.1 we first prove Proposition 5 which states the convergence of the degree sequence using elementary methods. Then in Section 2.2, we prove the weak convergence of the weight measure µ n to some limit µ and describe three regimes for its behaviour. We also study other natural measures related to the sequence of trees (T n ) and prove that they also converge towards µ. For all these measures, our main tool consists in introducing martingales related to the mass of a subtree descending from a fixed vertex. This is the content of Theorem 4 and Proposition 8. In Section 3, we prove Theorem 3 about the convergence of the height and the profile of WRT. This is achieved by first proving the uniform convergence of a rescaled version of the Laplace transform of the profile on a complex domain, which is the content of Proposition 9. This ensures that we can use [18,Theorem 2.1] for the convergence of the profile. This convergence provides a lower-bound for the height of the tree; we then prove a matching upper-bound to obtain asymptotics for the height.
Then we switch to studying a sequence $(P_n)_{n\geq 1}$ of preferential attachment trees with initial fitnesses $\mathbf{a}$. In Section 4, we present a proof of Theorem 1 using a coupling of the preferential attachment process with a sequence of Pólya urn processes; this establishes that $(P_n)_{n\geq 1}$ can also be described as having distribution $\mathrm{WRT}(\mathsf{w}^{\mathbf{a}})$ for a random sequence $\mathsf{w}^{\mathbf{a}}$. We then prove Proposition 2, which relates the properties of $\mathsf{w}^{\mathbf{a}}$ to the ones of $\mathbf{a}$. We finish the section by stating and proving Proposition 27, which shows that the sequence $(\mathsf{M}^{\mathbf{a}}_n)$ defined above as some random multiple of $(\mathsf{W}^{\mathbf{a}}_n)$ is a Markov chain. In Section 5, we identify in Proposition 28 the distribution of the chain $(\mathsf{M}^{\mathbf{a}}_n)$ for particular sequences $\mathbf{a}$ using moment identifications. We then present an application of this result to another model of preferential attachment graphs in Proposition 31.
Some technical results can be found in Appendix A.

Measures and degrees in weighted random trees
In this section, we work with a sequence of trees (T n ) n≥1 that has distribution WRT (w) for a deterministic sequence w. We start with two statistics of the tree that are quite easy to analyse, namely the sequence of degrees of the vertices of the tree and also some natural measures defined on the tree.

Convergence of the degree sequence
We start the section by proving convergence for the sequence of degrees of the vertices in their order of creation under the WRT model. We suppose here that the sequence of weights $\mathbf{w}$ is such that there exist constants $C>0$ and $0<\gamma<1$ for which
$$W_n\underset{n\to\infty}{\sim}C\cdot n^{\gamma}. \tag{11}$$
We write $\deg^{+}_{T_n}(u_k)$ for the out-degree of the vertex $u_k$ in $T_n$. For a fixed $k\geq 1$ remark that, as a sequence of random variables indexed by $n\geq 1$, we have the equality in distribution
$$\left(\deg^{+}_{T_n}(u_k)\right)_{n\geq 1}\overset{(d)}{=}\left(\sum_{i=k}^{n-1}\mathbf{1}_{\left\{U_i\leq\frac{w_k}{W_i}\right\}}\right)_{n\geq 1}, \tag{12}$$
with $(U_i)_{i\geq 1}$ a sequence of independent uniform variables in $(0,1)$. With this description of the distribution of the degrees of fixed vertices, using only a law of large numbers for the convergence and Chernoff bounds for the fluctuations, we obtain the following result.
Proposition 5. For a sequence of weights w satisfying (11), the following holds.
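(i) We have the almost sure pointwise convergence
$$n^{-(1-\gamma)}\cdot\deg^{+}_{T_n}(u_k)\underset{n\to\infty}{\longrightarrow}\frac{w_k}{C(1-\gamma)}. \tag{13}$$
(ii) If the sequence furthermore satisfies $w_k\leq(k+1)^{\gamma-1+c'+o(1)}$ for some constant $0\leq c'<1-\gamma$, then there exists a function of $k$ which goes to $0$ as $k\to\infty$, also denoted $o(1)$, such that for all $n$ large enough, we have for all $k\geq 1$
$$\deg^{+}_{T_n}(u_k)\leq n^{1-\gamma}\cdot(k+1)^{\gamma-1+c'+o(1)}, \tag{14}$$
and the convergence (13) holds almost surely in the space $\ell^{p}$ for all $p>\frac{1}{1-\gamma-c'}$.
Proof. To prove (i), just remark that for any $k\geq 1$ such that $w_k\neq 0$, thanks to (11), we have
$$\sum_{i=k}^{n-1}\frac{w_k}{W_i}\underset{n\to\infty}{\sim}\frac{w_k}{C(1-\gamma)}\cdot n^{1-\gamma},$$
so thanks to the law of large numbers, we get that almost surely
$$\deg^{+}_{T_n}(u_k)\underset{n\to\infty}{\sim}\frac{w_k}{C(1-\gamma)}\cdot n^{1-\gamma}.$$
For the indices $k$ for which $w_k=0$, we of course have $\deg^{+}_{T_n}(u_k)=0$ almost surely for all $n\geq 1$, and so the convergence also holds. This finishes the proof of (i).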
For the second part of the statement, let us first compute Now let C ′ be a constant such that for all n ≥ 1, we have n−1 i=1 1 Wi ≤ C ′ · n 1−γ (such a constant exists because of the assumption (11)). For all k ≥ 1, we introduce the following where the real number a > 0 is chosen in such a way that the function x → x γ−1 log(x + a) is decreasing on R * + . Using Markov's inequality, we get for any integers k and n such that n ≥ k Using an union bound, the fact that deg + Tn (u k ) = 0 for any k > n, and the definition of ξ k , we get that for all n ≥ 1 The last display is summable over all n ≥ 1 and hence using the Borel-Cantelli lemma, we almost surely have for n large enough ∀k ≥ 1, deg + Tn (u k ) ≤ n 1−γ · ξ k . We can conclude by noting that under our assumptions we have ξ k ≤ (k + 1) γ−1+c ′ +o (1) . The convergence in ℓ p for p > 1 1−γ−c ′ is just obtained by dominated convergence using the pointwise convergence (13) and the ℓ p domination (14).

Convergence of measures
The goal of this section is to prove Theorem 4, which concerns the convergence of the sequence of weight measures (µ n ) seen as measures on U. One of the key arguments is the fact that the weight of the subtree descending from a fixed vertex can be described using a generalised Pólya urn scheme, as studied by Pemantle [23]. We also prove Proposition 8, which states the weak convergence of other measures.
Convergence of the weight measure in $\overline{\mathbb{U}}$. Recall from the introduction the definition of the Ulam-Harris tree $\mathbb{U}=\bigcup_{n=0}^{\infty}\mathbb{N}^{n}$ and its completed version $\overline{\mathbb{U}}=\mathbb{U}\cup\partial\mathbb{U}$, which is endowed with the distance $d(u,v)=\exp(-\mathrm{ht}(u\wedge v))$. For any $u\in\mathbb{U}$, we write $T(u):=\{uv \mid v\in\overline{\mathbb{U}}\}$ for the subtree descending from $u$. In $\overline{\mathbb{U}}$ there is an easy characterisation of the weak convergence of Borel measures, which is a direct consequence of the Portmanteau theorem (see e.g. [4, Theorem 2.1]):
Lemma 6. Let $(\pi_n)_{n\geq 1}$ be a sequence of Borel probability measures on $\overline{\mathbb{U}}$. Then $(\pi_n)_{n\geq 1}$ converges weakly to a probability measure $\pi$ if and only if for any $u\in\mathbb{U}$, $\pi_n(\{u\})\to\pi(\{u\})$ and $\pi_n(T(u))\to\pi(T(u))$ as $n\to\infty$.
We are going to apply this criterion to our sequence $(\mu_n)_{n\geq 1}$, which, we recall, is defined in such a way that for all $n\geq 1$, the measure $\mu_n$ charges only the vertices $\{u_1,u_2,\dots,u_n\}$ of the tree $T_n$, and such that for any $1\leq k\leq n$,
$$\mu_n(\{u_k\})=\frac{w_k}{W_n}.$$
We can already see that if $(W_n)_{n\geq 1}$ converges to some $W_\infty$ we have $\mu_n(\{u_k\})\to\frac{w_k}{W_\infty}$ as $n\to\infty$, and in this case it is easy to verify that $\mu_n$ weakly converges to some limit $\mu$ which is such that
$$\forall k\geq 1,\qquad \mu(\{u_k\})=\frac{w_k}{W_\infty}.$$
In this case $\mu(\mathbb{U})=1$ and so $\mu$ is carried on $\mathbb{U}$. From now on, let us assume that $W_n\to\infty$ as $n\to\infty$. In this case we have $\mu_n(\{u_k\})\to 0$ as $n\to\infty$. Now denote, for all integers $n,k\geq 1$,
$$M^{(k)}_n:=\mu_n(T(u_k))$$
the proportion of the total mass above vertex $u_k$ at time $n$. Remark that this quantity evolves as the proportion of red balls in a time-dependent Pólya urn scheme with weights $(w_i)_{i\geq k+1}$, see [23], starting at time $k$ with $W_{k-1}$ black balls and $w_k$ red balls$^{2}$. In particular, for all $n\geq k$,
$$\mathbb{E}\left[M^{(k)}_{n+1} \mid T_1,\dots,T_n\right]=M^{(k)}_n.$$
Hence for all $k\geq 1$, the sequence $(M^{(k)}_n)_{n\geq k}$ is a bounded martingale and therefore converges almost surely to some limit $M^{(k)}_\infty$. Also, for any $u\in\mathbb{U}$ that does not receive a label in the process, the sequence $(\mu_n(T(u)))_{n\geq 1}$ (and also $(\mu_n(\{u\}))_{n\geq 1}$) is identically equal to zero. Hence we have convergence of $(\mu_n(\{u\}))_{n\geq 1}$ and $(\mu_n(T(u)))_{n\geq 1}$ for all $u\in\mathbb{U}$.
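The last step in order to prove the weak convergence of $(\mu_n)_{n\geq 1}$ is to prove that the quantities that we obtain in the limit indeed define a probability measure on $\partial\mathbb{U}$. If for all $u\in\mathbb{U}$ we have
$$\lim_{n\to\infty}\mu_n(T(u))=\sum_{i=1}^{\infty}\lim_{n\to\infty}\mu_n(T(ui)), \tag{16}$$
then it entails that $\mu_n\to\mu$ as $n\to\infty$, where $\mu$ is the unique probability measure on $\overline{\mathbb{U}}$ such that for all $u\in\mathbb{U}$, $\mu(\{u\})=0$ and $\mu(T(u))=\lim_{n\to\infty}\mu_n(T(u))$.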
2 Those numbers of balls are not required to be integers.
For any $u\notin\{u_1,u_2,\dots\}$, the equality (16) is immediate, so let us prove it for all $u_k$ with $k\geq 1$. For any $n,k,i\geq 1$, let
$$M^{(k,i)}_n:=\mu_n(T(u_k))-\sum_{j=1}^{i}\mu_n(T(u_kj)).$$
Using what we just proved, we know that for any $k,i$, the quantity $M^{(k,i)}_n$ almost surely converges as $n\to\infty$ to some limit $M^{(k,i)}_\infty$. Proving (16) reduces to proving that for any $k\geq 1$, we almost surely have $M^{(k,i)}_\infty\to 0$ as $i\to\infty$. The sequence $(M^{(k,i)}_\infty)_{i\geq 1}$ is non-negative and non-increasing, hence it converges, so it suffices to prove that its limit is $0$ almost surely.
We define $\tau^{(k,i)}:=\inf\{n\geq 1 \mid u_n=u_ki\}$, the time when the vertex $u_k$ receives its $i$-th child in the growth procedure. Remark that after this random time, the process $(M^{(k,i)}_n)_{n\geq\tau^{(k,i)}}$ is a martingale because, again, it evolves as the proportion of red balls in a time-dependent Pólya urn scheme, starting with $w_k$ red balls and $W_{\tau^{(k,i)}}$ black balls.
in L 1 , so its almost sure limit is also 0. In the end, by Lemma 6, the sequence of measures (µ n ) almost surely converges weakly to a limit µ, and this measure only charges the set ∂U.
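Lemma 7. Suppose that $W_n\to\infty$. If $\sum_{n=1}^{\infty}\left(\frac{w_n}{W_n}\right)^{2}=\infty$, then $\mu$ is almost surely concentrated on one point of $\partial\mathbb{U}$. If $\sum_{n=1}^{\infty}\left(\frac{w_n}{W_n}\right)^{2}<\infty$, then $\mu$ is almost surely diffuse.
Proof. For any $k\geq 1$ the process $(\mu_n(T(u_k)))_{n\geq k}$ follows a so-called time-dependent Pólya urn scheme with weights $(w_n)_{n\geq k+1}$. By the work of Pemantle in [22], if we assume $\sum_{n=1}^{\infty}\left(\frac{w_n}{W_n}\right)^{2}=\infty$ then the limiting proportion $\mu(T(u_k))$ almost surely belongs to the set $\{0,1\}$. This translates into the fact that $\mu(T(u))\in\{0,1\}$ almost surely for any $u\in\mathbb{U}$, which entails that $\mu$ is almost surely carried on one leaf of $\partial\mathbb{U}$.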
On the contrary, let us suppose that ∞ n=1 wn Wn 2 < ∞ and prove that this entails that the limiting measure µ is diffuse almost surely. Consider the function (· ∧ ·) : U × U → U which associates to each couple (u, v) their most recent common ancestor u ∧ v in the completed tree U. This function is continuous with respect to the distance d. Then, since µ n → µ almost surely, we also have the almost sure weak convergence Let us fix n ≥ 1 and let D n and D ′ n be two independent vertices taken under µ n , conditionally on the tree T n . Then, the proof of [10, Lemma 3.8] ensures that Note that the obtained sequence (p k ) k≥1 is a probability distribution, which thanks to the weak convergence (17) corresponds to the (annealed) distribution where D ∞ and D ′ ∞ are two independent points taken under the measure µ, conditionally on µ. Now we can write where the inequality is due to the fact that the vertices u 1 , u 2 , . . . , u k have a height smaller than k.
So, almost surely, two points taken independently under µ are different, and this ensures that µ is diffuse.
This concludes the proof of Theorem 4.
Other sequences of measures. We also study two other sequences of measures $(\eta_n)$ and $(\nu_n)$ carried on the Ulam tree $\mathbb{U}$. For every $n\geq 2$, these measures only charge the vertices $\{u_1,u_2,\dots,u_n\}$ in such a way that for any $1\leq k\leq n$,
$$\eta_n(\{u_k\})=\frac{b_k+\deg^{+}_{T_n}(u_k)}{B_n+n-1} \quad\text{and}\quad \nu_n(\{u_k\})=\frac{1}{n},$$
where $(b_n)_{n\geq 1}$ is a sequence of real numbers such that $b_1>-1$ and $b_n\geq 0$ for all $n\geq 2$. We write $B_n:=\sum_{k=1}^{n}b_k$. We suppose that $B_n=O(n)$ and that there exists $\epsilon>0$ such that $b_n=O(n^{1-\epsilon})$. The assumptions on the sequence $(b_n)_{n\geq 1}$ are chosen such that they are satisfied by a sequence $(a_n)_{n\geq 1}$ of initial fitnesses that satisfies $(H_c)$ for some $c>0$. For the proof of Proposition 8, we are going to use Lemma 6 again, using appropriate martingales in order to handle the evolution of the measure of the subtree descending from every vertex $u\in\mathbb{U}$. We treat the two sequences of measures separately.
The degree measure. Consider the sequence (η n ) n≥1 on U. Since the sequence (W n ) n≥1 tends to infinity, we have η n ({u}) → 0 for every u ∈ U. Indeed, using the equality in distribution (12) and Lemma 32 in the appendix, it is easy to see that either As in the preceding case, for all k ≥ n we let Conditionally on T n , with probability M (k) n , the vertex u n+1 is grafted onto T (u k ) and with complementary probability, it is not. So , then the last computation shows that is a martingale for the filtration generated by (T n ) n≥1 . More precisely we can write Then, using [9, Theorem 1], we get that if ∞ as n → ∞. In our case, we can verify that (18) holds. Indeed, using the fact that we assumed that B n = O(n) and b n+1 = O n 1−ǫ , we have which is summable under our assumptions. In the end, using Lemma 6, we have the almost sure convergence η n −→ µ weakly.
The uniform measure on the vertices of T n . Consider the sequence (ν n ) on U. Fix , which tends a.s. to some limit µ(T (u k )) as i → ∞. Using Lemma 32 in the appendix, we have In both cases we get ν n (T (u k )) → n→∞ lim i→∞ p i = µ(T (u k )) almost surely. We also have for any k ≥ 1, so we can conclude using Lemma 6 that almost surely ν n → n→∞ µ weakly.

Height and profile of WRT
The main goal of this section is to prove Theorem 3, which gives asymptotics for the profile and height of the tree. Recall that we denote
$$L_n(k):=\#\left\{1\leq i\leq n \mid \mathrm{ht}(u_i)=k\right\}$$
the number of vertices at height $k$ in the tree $T_n$. In order to get information on the sequence of functions $(k\mapsto L_n(k))_{n\geq 1}$ we study their Laplace transform
$$z\mapsto\sum_{k=0}^{\infty}e^{zk}L_n(k)=\sum_{i=1}^{n}e^{z\,\mathrm{ht}(u_i)}=n\cdot\int_{\mathbb{U}}e^{z\,\mathrm{ht}(u)}\,\mathrm{d}\nu_n(u), \tag{19}$$
where the last expression is given using an integral against the probability measure $\nu_n$ defined in Section 2.2 as the uniform measure on the vertices of $T_n$. The key result in our approach is to prove the convergence of this sequence of analytic functions when appropriately rescaled, uniformly in $z$ on an open neighbourhood of $0$ in the complex plane. It then allows us to use [18, Theorem 2.1] and hence derive a convergence result for the profile. We actually start in Section 3.1 by studying the convergence of the similarly defined sequence of functions
$$z\mapsto\int_{\mathbb{U}}e^{z\,\mathrm{ht}(u)}\,\mathrm{d}\mu_n(u)=\sum_{i=1}^{n}\frac{w_i}{W_n}e^{z\,\mathrm{ht}(u_i)}, \tag{20}$$
where we integrate with respect to the weight measure $\mu_n$ instead of the uniform measure $\nu_n$ as before. This one is easier to study because for every fixed $z\in\mathbb{C}$, it defines a martingale as $n$ grows, up to some deterministic scaling. Then in Section 3.2, we make use of this first convergence and show that, up to some deterministic multiplicative constant, the two sequences of integrals appearing in (19) and (20) are almost surely equivalent when $n$ tends to infinity.
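We work under some technical assumption for the sequence $\mathbf{w}$. Let us fix $\gamma>0$ and suppose from now on that the sequence $\mathbf{w}$ satisfies the assumption $(\square^{p}_{\gamma})$ for some $p\in(1,2]$, i.e.
$$W_n\underset{n\to\infty}{\bowtie}\mathrm{cst}\cdot n^{\gamma} \quad\text{and}\quad \sum_{i=n}^{2n}w_i^{p}\leq n^{1+(\gamma-1)p+o(1)}.$$
We let $\varphi:z\mapsto\gamma(e^{z}-1)$ be a function of a complex parameter $z$ and let $z\mapsto N_n(z)$ be the following rescaled version of the Laplace transform of the profile
$$N_n(z):=n^{-(1+\varphi(z))}\cdot\sum_{k=0}^{n}e^{zk}L_n(k).$$
The proposition below ensures that the sequence $(z\mapsto N_n(z))_{n\geq 1}$ converges uniformly on all compact subsets of some open domain $D\subset\mathbb{C}$ to some limiting function $z\mapsto N_\infty(z)$ which does not vanish anywhere on the set $D\cap\mathbb{R}$, along with some more technical statements.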
Proposition 9. Suppose that the weight sequence $\mathbf{w}$ satisfies $(\square^{p}_{\gamma})$ for some $\gamma>0$ and some $p\in(1,2]$. Then there exists an open connected domain $D\subset\mathbb{C}$ such that $D\cap\mathbb{R}=(z_-,z_+)$, where $z_-<0$ and $z_+$ is the largest real solution of the equation $\gamma(ze^{z}-e^{z}+1)-1=0$, and such that the following properties are satisfied.
(i) With probability 1, the sequence of random analytic functions $(z \mapsto N_n(z))_{n\geq 1}$ converges uniformly on all compact subsets of $D$, as $n \to \infty$, to some random analytic function $z \mapsto N_\infty(z)$ which satisfies $\mathbb{P}(N_\infty(z) \neq 0 \text{ for all } z \in (z_-, z_+)) = 1$.
(ii) For every compact set $K \subset D$ and $r \in \mathbb{N}$, we can find an a.s. finite random variable $C_{K,r}$ such that for all $n \in \mathbb{N}$,
(iii) For every compact set $K \subset (z_-, z_+)$, every $0 < a < \pi$ and $r \in \mathbb{N}$,

Under the results of Proposition 9 we can apply [18, Theorem 2.1], whose conclusions for the sequence $(k \mapsto L_n(k))_{n\geq 1}$ are the following. For any $k \geq 0$, $n \geq 1$ and $z \in (z_-, z_+)$, we denote Then, for every integer $r \geq 0$ and every compact set $K \subset (z_-, z_+)$, we have the convergence where for all $j \geq 0$, the (random) functions $G_j(x, z)$ are polynomials of degree at most 3 in $x$ and are entirely determined from $\phi$ and $N_\infty$, with $G_0 = 1$, see [18] for their complete definition. The asymptotics (7) and (8) stated in Theorem 3 follow from the last display. Indeed, (7) is obtained by letting $r = 0$ and $z = 0$ and using the fact that $N_\infty(0) = 1$ almost surely. For (8), we let $r = 0$ and use $k = \lfloor \gamma e^z \log n \rfloor$.

In Section 3.3, we complete the proof of Theorem 3 by computing the asymptotic behaviour of the height of the tree. Since the convergence of the profile already ensures that there almost surely are vertices at height $\gamma e^{z_+ - \epsilon} \log n$ for $\epsilon > 0$ small enough and all $n$ large enough, it suffices to prove a corresponding upper bound in order to finish proving the convergence (9) in Theorem 3.

Study of the Laplace transform of the weighted profile
We study the sequence $\left(z \mapsto \sum_{i=1}^{n} \frac{w_i}{W_n} e^{z\,\mathrm{ht}(u_i)}\right)_{n\geq 1}$. The following lemma is the starting point of our analysis.
Lemma 10. For all z ∈ C and all n ≥ 1, we have Proof. Recall that conditionally on T n , the n Taking conditional expectation with respect to T n yields: This concludes the proof.
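The one-step computation behind this lemma can be checked numerically. The sketch below assumes that the identity reads $\mathbb{E}\big[\sum_{i\leq n+1} \frac{w_i}{W_{n+1}} e^{z\,\mathrm{ht}(u_i)} \,\big|\, T_n\big] = \big(1 + (e^z - 1)\frac{w_{n+1}}{W_{n+1}}\big) \sum_{i\leq n} \frac{w_i}{W_n} e^{z\,\mathrm{ht}(u_i)}$, which is our reading of the statement, consistent with the factor $1 + (e^z - 1)\frac{w_i}{W_i}$ used right after it. The tree, the weights and the function names are hypothetical. The conditional expectation is computed exactly by averaging over the possible parents of the new vertex.

```python
import cmath


def weighted_laplace(weights, heights, z):
    """sum_i (w_i / W_n) * exp(z * ht(u_i)) over the current vertices."""
    total = sum(weights)
    return sum(w * cmath.exp(z * h) for w, h in zip(weights, heights)) / total


def expected_next(weights, heights, w_new, z):
    """Average the weighted Laplace transform at time n+1 over the parent choice."""
    total = sum(weights)
    acc = 0.0
    for k, w_k in enumerate(weights):
        # The new vertex attaches to the k-th existing vertex with probability w_k / W_n.
        new_value = weighted_laplace(
            weights + [w_new], heights + [heights[k] + 1], z
        )
        acc += (w_k / total) * new_value
    return acc


weights = [1.0, 0.5, 2.0, 0.25]   # hypothetical w_1, ..., w_4
heights = [0, 1, 1, 2]            # heights of u_1, ..., u_4 in some fixed tree T_4
w_new, z = 0.7, 0.4 + 1.1j
lhs = expected_next(weights, heights, w_new, z)
factor = 1 + (cmath.exp(z) - 1) * w_new / (sum(weights) + w_new)
rhs = factor * weighted_laplace(weights, heights, z)
```

Here `lhs` and `rhs` agree up to rounding errors, for any complex `z`, which is what makes the rescaled sequence a martingale.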
Let $J$ be an integer that we are going to fix later on. The last result ensures that if $z \in \mathbb{C}$ is such that $\forall i \geq J$, $1 + (e^z - 1)\frac{w_i}{W_i} \neq 0$, then we can define for all $n \geq J$ and the sequence $(M_n(z))_{n\geq J}$ is a martingale. We want to prove results about the asymptotic behaviour of $(z \mapsto M_n(z))_{n\geq J}$, uniformly in $z$ on an appropriate domain. If $J$ is fixed, then there exist parameters $z$ with $\mathrm{Im}(z) = \pi \bmod 2\pi$ for which the sequence $(C_n(z))_{n\geq J}$ takes the value $0$. Due to our assumption ( γ ) on the sequence $w$, we know that $\frac{w_n}{W_n} \to 0$ as $n \to \infty$. If we restrict ourselves to a domain of the form $\{z \in \mathbb{C} \mid \mathrm{Re}(z) < x\}$ for some $x > 0$, then hence it suffices to take $J$ large enough in order for the sequence $(C_n(z))_{n\geq J}$ to only take non-zero values for all $z \in \{\xi \in \mathbb{C} \mid \mathrm{Re}(\xi) < x\}$ and all $n \geq J$. In what follows we work on the domain where $z_+$ is as defined in Proposition 9. Using the preceding discussion, we fix $J \geq 1$ such that the sequence $z \mapsto (C_n(z))_{n\geq J}$ does not have any zero on $E$, so that $z \mapsto (M_n(z))_{n\geq J}$ is well-defined for all $z \in E$.

We introduce the following notation. Let $F(z, n)$ and $G(z, n)$ be two functions of a complex parameter $z$ and an integer $n \in \mathbb{N}$. For $D \subset \mathbb{C}$ a domain of the complex plane we write to express the fact that $F(z, n)$ is a big $O$ (resp. small $o$) of $G(z, n)$ as $n \to \infty$, uniformly on all compact $K \subset D$. Now, let us derive some information on the asymptotic behaviour of $C_n(z)$.
Lemma 11. Suppose that $w$ satisfies ( γ ). Then there exist $\epsilon > 0$ and an analytic function Remark that the lemma implies that for any $z \in E$, we have satisfies the same asymptotics up to a constant, as soon as $z$ is such that $\mathbb{E}[M_J(z)] \neq 0$.
Before proving the lemma, we state the following result which follows from elementary calculus. Its proof can be found in the appendix.
Proof of Lemma 11. For any $z \in \mathbb{C} \setminus (-\infty, -1]$ we write $\mathrm{Log}(1 + z)$ for the complex determination of the logarithm which coincides with $\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} z^n$ near $0$. If we let is summable in $i$ and the rest of the series is for some $\epsilon > 0$, thanks to Lemma 12. Then we write which yields using (23) and Lemma 12 and $c(z)$ is an analytic function of $z$, which finishes the proof.
Convergence of the martingales (M n (z)) n≥1 . When the parameter z is a positive real number, the sequence (M n (z)) n≥1 is a positive martingale and so it converges almost surely to some limit. We want to prove that these martingales converge almost surely and in L 1 for the largest possible range of parameters z. We recall that the weight sequence w satisfies ( p γ ) for some fixed parameters γ > 0 and p ∈ (1 , 2]. We align our notation with the one used in [8,Theorem 2.2] which states something similar to our forthcoming Proposition 14 for another model, the binary search tree.
For any z ∈ E and q ∈ (1 , p], we let For any q ∈ (1 , p], let V q = {z ∈ E | g(z, q) < 0}, and denote The proof of the proposition will follow from the next lemma, together with Lemma 34, stated in the appendix.
Lemma 15. For any $q \in (1, p]$ we have and also Proof. For any $q \in (1, p]$ and $n \geq J$, we write Taking the $q$-th power of the modulus on both sides and using the inequality $|a + b|^q \leq 2^q |a|^q + 2^q |b|^q$, we get Using Lemma 33 in the appendix, we have for any $n \geq J$, Using the last display and equation (27), we get a recurrence inequality of the form where a n (z) = 2 2q C n (z) Applying (28) in cascade we get Now notice that from our assumption on the sequence $(w_n)_{n\geq 1}$ we have a n (z) = 2 2q C n (z) On the other hand, thanks to Lemma 11 we have We conclude using the following lemma, which is an application of Hölder's inequality using the assumption ( p γ ) Together with (30), this proves that $(a_n(z))_{n\geq 1}$ is summable and so (1) . Substituting this in (29) finishes the proof of (25). In order to prove (26), we use Lemma 33 again and write which finishes the proof of the lemma.
Proof of Proposition 14. Any compact $K \subset V$ can be covered by a finite number of sets $V_q$. The convergence result is then an application of Lemma 34 on each domain $V_q$, with $\alpha(z) = 0$ and, say, $\delta(z) = -\frac{1}{2} g(z, q) > 0$. The limiting function is analytic as a uniform limit of analytic functions.
Zeros of the limit. Now that we have proved that there exists a limiting function $z \mapsto M_\infty(z)$ defined on the domain $V$, we are interested in the possible location of the zeros of this random function. In fact, the function $z \mapsto M_\infty(z)$ is related to the function $z \mapsto N_\infty(z)$ of Proposition 9, which we aim to prove has almost surely no zero on some real interval $(z_-, z_+)$ containing $0$. We will prove a similar result for $z \mapsto M_\infty(z)$ in Lemma 18, and we start by proving the following weaker statement.
Lemma 17. For all z ∈ V ∩ R, we have almost surely M ∞ (z) > 0. As a consequence, the number of zeros on every compact of V is almost surely finite.
Proof. This follows from an application of Kolmogorov's 0−1 law. Indeed, fix N ≥ J and z ∈ V ∩R and for all n ≥ N , let TN ) .
Now remark the following: n (z)) n≥N is a positive martingale which satisfies the same assumptions as M n (z) so it converges a.s. and in L 1 towards a non-negative limit, M (N ) , is independent of the N first steps of the construction, coded by the vector (K 2 , . . . , K N ).
Using all these observations we deduce that for any $N \geq J$ we have the equality of events This proves that $\{M_\infty(z) > 0\}$ is measurable with respect to the tail σ-algebra generated by the sequence $(K_2, K_3, \dots)$, whose terms are independent, and hence has probability $0$ or $1$.

In order to prove this lemma, we use an argument of self-similarity: essentially, if we take two vertices $u_i$ and $u_j$ in the tree, then conditionally on the sequences of vertices that are grafted above $u_i$ or above $u_j$, the subtrees above $u_i$ and $u_j$ evolve as two independent weighted recursive trees. Using Proposition 14 and Lemma 17, the normalized Laplace transform of the weighted profile of each of those two subtrees should converge to some random analytic function which is non-negative on $V \cap \mathbb{R}$ and has at most countably many zeros. Since the two are independent, their zeros should not overlap and hence the sum of their contributions should result in a function that is positive on $V \cap \mathbb{R}$.
Proof. Let us formalise this line of reasoning. Using Theorem 4, we know that the measure µ on ∂U is almost surely diffuse, hence we can define and they are almost surely finite. Thanks to Theorem 4 again, we know that for any k ≥ 1 we have the convergence ν n (T (u k )) → µ(T (u k )). In fact, the slightly stronger statement holds and we prove it at the end of the section. Let us consider the sequences 1 {u I (j) un} n≥1 for j = 1, 2, which record the times when a vertex is added to T (u I (1) ) or T (u I (2) ), and work conditionally on them. Thanks to our definition of I (1) and I (2) , we know that the number of vertices in each of those subtrees will grow linearly in time (in particular, they go to infinity). We let ∀n ≥ 1, N (j) n := nν n (T (u I (j) )) = n i=1 1 {u I (j) ui} and ∀k ≥ 1, τ (j) which record respectively the number of vertices in T (u I (j) ) at time n and conversely, the time when the k-th vertex is added. To ease notation we let w (j) k . Let us state the following intermediate result, which we will prove at the end of the section.
Lemma 20. For j = 1, 2 the sequences (w (j) k ) k≥1 almost surely satisfy ( p γ ). Recall the discussion before Lemma 10 and fix J ′ ≥ 1 such that for j = 1, 2, for all k ≥ J ′ and for all z ∈ E we have 1 + (e z − 1) Then we can define for j = 1, 2 and k ≥ J ′ , Conditionally on the sequences 1 {u I (j) un} n≥1 for j = 1, 2, these processes are the martingales associated to the weighted profile of the tree T (j) , and the sequences (T (j) k ) k≥1 for j = 1, 2 are independent weighted recursive trees with respective weight sequence (w (j) k ) k≥1 . We know thanks to Proposition 14 that these two sequences of functions converge to analytic limits on the domain V . In addition, thanks to Lemma 17, their almost sure limit z → M Using Lemma 11, we have almost surely for j = 1, 2, Using the asymptotics N Wn = µ n (T (u I (j) )) = µ(T (u I (j) ))(1 + O(n −ǫ )), this entails that for j = 1, 2, uniformly on all compact included in V the function z → Cn(z) converges to some analytic function z → A j (z), which only takes positive values on V ∩ R. Then we write If we condition on the location of the (at most) countable number of zeros z 1 , z 2 . . . of M ∞ is independent of z 1 , z 2 . . . , we have M Proof of Lemma 19 and Lemma 20. Recall the proof of Theorem 4. For all k ≥ 1 the process (µ n (T (u k ))) n≥k is a martingale and almost surely we have |µ n+1 (T (u k )) − µ n (T (u k ))| ≤ wn+1 Wn+1 , hence using Lemma 33 we get Using then Lemma 34 with q = p and α = 0 and δ = (p − 1)/2 yields |µ n (T (u k )) − µ(T (u k ))| = O(n −ǫ ) for some ǫ > 0. For the second one, we consider the process nν n (T (u k )) − n i=k+1 µ i (T (u k )) n≥k . It is easy to verify that this process is a martingale for its own filtration and that its increments are bounded by 1. Using again Lemma 34 with q = 2 and α = 1 and δ = 1, we get n −1 nν n (T (u k )) − n i=k+1 µ i (T (u k )) = O(n −ǫ ), which is enough to conclude for Lemma 19.
By definition of $I^{(1)}$ and $I^{(2)}$, the limits $\mu(T(u_{I^{(j)}}))$ are positive, so, using the preceding discussion, we can write $N_n^{(j)} = n\nu_n(T(u_{I^{(j)}})) \bowtie \mathrm{cst} \cdot n$ and also $\mu_n(T(u_{I^{(j)}})) \bowtie \mathrm{cst}$, with positive constants. Using the definition of $\tau_n^{(j)}$ we can check that this entails that $\tau_n^{(j)} \bowtie \mathrm{cst} \cdot n$ almost surely. Using ( γ ), by composition we get with a positive constant. Then we write, for $j = 1, 2$ where the last inequality is due to the linear growth of $\tau_n^{(j)}$ and the fact that $w$ satisfies ( p γ ).

From the weighted to the unweighted sum.
Now we want to transfer these results of convergence to the Laplace transform of the real profile.
To this end, we introduce the following quantity, for $n \geq J$, The goal of this subsection is to show that the quantity $X_n(z)$ is negligible as $n \to \infty$ compared to either of the two terms in the difference, for $z$ contained in some domain. This way we will transfer the asymptotics that we have proved for $M_n(z)$ and $C_n(z)$ in the last section to asymptotics for $N_n(z)$, which is the quantity that we want to study in the end. Recall the definition of $z_+$ and $z_-$ in (6). Let us define the domain $D$ to which we refer in the statement of Proposition 9 In this way $D$ is a connected domain of $\mathbb{C}$ and $D \cap \mathbb{R} = (z_-, z_+)$. Indeed, recall from Lemma 13 is an open interval which contains $0$ and has $z_+$ as its right bound. Now just check that $\{z \in \mathbb{R} \mid 1 + \mathrm{Re}(\phi(z)) > 0\} = (z_-, \infty)$ and that $z_- \in I_\gamma$.
For technical reasons, we also introduce on which the process (z → M n (z)) n≥J , and hence also (z → X n (z) n≥J , is well-defined. Let us further decompose D ′ into a union of open sets Lemma 21. The process (X n (z)) n≥J is a martingale with respect to the filtration generated by (T n ) n≥1 . Furthermore, for all q ∈ (1 , p], Proof. This process is of course (σ(T n ))-adapted and integrable. For the martingale property we compute For z ∈ E and q ∈ (1 , p], we make the following computation, using Lemma 11 and Lemma 15, and the last exponent reduces to q Re φ(z) ∨ φ(q Re z) because (q Re φ(z) + g(z, q)) = φ(q Re z) + 1 − q < φ(q Re z). Hence, using Lemma 33 which finishes the proof of the lemma. (ii) For all compact K ⊂ D ′ , there exists ǫ(K) > 0 such that Proof. For the first one, for any q ∈ (1 , p] we can apply Lemma 34 on the domain V q ∩ {z ∈ C | 1 + Re(φ(z)) > 0} with α(z) = 1 + Re(φ(z)) > 0 and δ(z) = min(q − 1, −g(z, q)) > 0, thanks to Lemma 21. Then using the compactness property, (i) is true for every compact K ⊂ D.
We can now prove Proposition 9.
Proof of Proposition 9. Let us start by proving simultaneously that $N_\infty(z) = \frac{e^{z + c(z)}}{1 + \phi(z)} M_\infty(z)$ and both points (i) and (ii) of the proposition. For $K \subset D$ compact and $z \in K$, we write The first term is $O_K(n^{-\epsilon(K)})$ thanks to Lemma 22(i). We bound the second one by the following quantity .
We then use respectively Lemma 23 and then Lemma 11 together with Proposition 14 to prove that the first and the second term of the last display are O K n −ǫ(K) . The limiting function N ∞ (z) is analytic as a uniform limit of analytic functions and has almost surely no zero on (z − , z + ) because of Lemma 18. For (iii), let us prove the stronger statement: for any compact set K ⊂ (z − , z + ) and 0 < a < π, there exists ǫ(K, a) > 0 such that almost surely, For this, we write We apply points (ii) and (iii) of Lemma 22 to the compact K ×[a , π] and get the desired bound.

Height of the tree
In this section, we study the behaviour of the height ht(T n ) of the tree T n , which is defined as the maximal height of the vertices of T n , i.e. ht(T n ) = max 1≤k≤n ht(u k ).
We start by showing that under the assumption ( p γ ) we have the convergence (9). Then, for the sake of completeness, we also study the simpler case where $\log n = o\left(\sum_{i=1}^{n} \frac{w_i}{W_i}\right)$.
One key argument in our proofs is the following equality for the annealed moment generating function of the height of u k , for any fixed k ≥ 1, which can be seen as a corollary of Lemma 10 Some elementary computations using the Chernoff bound and the last display yield the following lemma.
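As a sanity check of this kind of formula, the sketch below computes $\mathbb{E}[e^{z\,\mathrm{ht}(u_k)}]$ in two ways for real $z$: by conditioning on the parent of each new vertex, and through a product. The product form $e^z \prod_{i=2}^{k-1}\big(1 + (e^z - 1)\frac{w_i}{W_i}\big)$ for $k \geq 2$ is our reading of what the one-step identity for the weighted Laplace transform gives after iteration; it is an assumption, as are the weights and the function names.

```python
import math


def mgf_by_recursion(weights, z):
    """h[k] = E[exp(z * ht(u_{k+1}))], computed by conditioning on the parent."""
    h = [1.0]  # the root u_1 has height 0
    for n in range(1, len(weights)):
        total = sum(weights[:n])
        avg = sum(w * hk for w, hk in zip(weights[:n], h)) / total
        h.append(math.exp(z) * avg)
    return h


def mgf_by_product(weights, z):
    """Assumed product form e^z * prod_{i=2}^{k-1} (1 + (e^z - 1) w_i / W_i)."""
    out = [1.0]
    prod, cumulative = 1.0, weights[0]
    for n in range(1, len(weights)):
        out.append(math.exp(z) * prod)
        cumulative += weights[n]               # cumulative is now W_{n+1}
        prod *= 1 + (math.exp(z) - 1) * weights[n] / cumulative
    return out


weights = [1.0 / (i + 1) for i in range(30)]  # hypothetical w_1, ..., w_30
by_recursion = mgf_by_recursion(weights, 0.8)
by_product = mgf_by_product(weights, 0.8)
```

The two lists coincide up to rounding, which illustrates why a Chernoff bound on $\mathrm{ht}(u_n)$ only involves the deterministic ratios $\frac{w_i}{W_i}$.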
Lemma 24. Suppose that the sequence of weights w satisfies lim sup Then almost surely we have where z + (u) is the unique positive root of u(ze z − e z + 1) − 1 = 0.
Proof. Using the expression (32) for the moment generating function of $\mathrm{ht}(u_n)$ we get, for any where we use the inequality $1 + x \leq e^x$ and the assumption on $w$. Then, for any $z > 0$ and $n \geq 1$, $\mathbb{P}(\mathrm{ht}(u_n) \geq u e^z \log n) \leq e^{-u z e^z \log n}\, \mathbb{E}[e^{z\,\mathrm{ht}(u_n)}] \leq \exp(-u \log n\,(z e^z - e^z + 1 + o(1)))$. If we take $z > 0$ such that $u(z e^z - e^z + 1) > 1$ then the right-hand side is summable and hence the Borel-Cantelli lemma shows that almost surely, for all $n$ large enough, we have $\mathrm{ht}(u_n) \leq u e^z \log n$. Letting $z \searrow z_+(u)$, we get the result.
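The constant $u e^{z_+(u)}$ appearing in this argument is easy to evaluate numerically. The following sketch finds the positive root of $u(ze^z - e^z + 1) - 1 = 0$ by bisection, using that $z \mapsto ze^z - e^z + 1$ vanishes at $0$ and is increasing on $(0, \infty)$. The function names and the tolerance are our own choices.

```python
import math


def z_plus(u, tol=1e-12):
    """Positive root of u * (z e^z - e^z + 1) - 1 = 0, found by bisection."""
    if u <= 0:
        raise ValueError("u must be positive")

    def f(z):
        return u * (z * math.exp(z) - math.exp(z) + 1) - 1

    lo, hi = 0.0, 1.0
    while f(hi) < 0:      # f is increasing on (0, inf) with f(0) = -1
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def height_constant(u):
    """The constant u * exp(z_plus(u)) multiplying log n."""
    return u * math.exp(z_plus(u))
```

For $u = 1$ the equation reduces to $e^z(z - 1) = 0$, so `z_plus(1.0)` is close to $1$ and `height_constant(1.0)` is close to $e$.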
Let us prove the last claim of Theorem 3. Here we suppose that the weight sequence $w$ satisfies ( p γ ) for some $\gamma > 0$ and some $p \in (1, 2]$.

Proof of Theorem 3. Recall the asymptotics (8) in Theorem 3. It ensures that there almost surely exist vertices at height $\gamma e^z \log n$, for any $z \in (z_-, z_+)$. Hence the height of the tree $T_n$ satisfies $\liminf_{n\to\infty} \frac{\mathrm{ht}(T_n)}{\log n} \geq \gamma e^{z_+}$.
For the limsup, we use Lemma 24 with $u = \gamma$ (this is justified by Lemma 12), which yields $\limsup_{n\to\infty} \frac{\mathrm{ht}(T_n)}{\log n} \leq \gamma e^{z_+}$.
To finish the section, we state a proposition. Proof. For the upper-bound, we proceed as above. For any ǫ > 0 and z > 0: If we choose z > 0 close enough to 0 then the last display is summable, due to our assumption on f . This implies using the Borel-Cantelli lemma that lim sup n→∞ ht(Tn) f (n) ≤ 1 + ǫ almost surely, for any fixed ǫ > 0. Then we let ǫ ց 0.
For the lower-bound, we use the fact that we can construct jointly with (T n ) n≥1 a sequence (D n ) n≥1 such that ∀n ≥ 1, D n ∈ T n , increasing for the genealogical order and such that, as a sequence, we have  [20,Corollary 8]. Using the law of large numbers, we get that almost surely ht(D n ) ∼ n i=2 wi Wi = f (n) as n → ∞. Since ht(T n ) ≥ ht(D n ), this proves the lower-bound and finishes the proof.

Preferential attachment trees are weighted recursive trees
In this section, we study preferential attachment trees with initial fitnesses $a$ as defined in the introduction. First, in Section 4.1, we prove Theorem 1, which allows us to see them as weighted recursive trees $\mathrm{WRT}(w^a)$ for some random weight sequence $w^a$. Then, in Section 4.2, we prove Proposition 2, which relates the asymptotic behaviour of $w^a$ to the behaviour of $a$. Finally, in Section 4.3, we prove Proposition 27, which ensures that the sequence $m^a$ obtained as the scaling limit of the degrees can be expressed as the increments of a Markov chain.

Coupling with a sequence of Pólya urns
Here we fix an arbitrary sequence $a$ such that $a_1 > -1$ and $\forall n \geq 2$, $a_n \geq 0$. Let us recall the notation, for $n \geq 0$, $A_n := \sum_{i=1}^{n} a_i$, with the convention that $A_0 = 0$. We consider a sequence of trees $(P_n)_{n\geq 1}$ evolving according to the distribution $\mathrm{PA}(a)$ and we want to prove Theorem 1, namely that there exists a random sequence of weights $w^a$ for which the sequence evolves as a $\mathrm{WRT}(w^a)$. The proof uses a decomposition of this process into an infinite number of Pólya urns. This is very close to what is used in the proofs of [2, Theorem 2.1] or [5, Section 1.2] in similar settings. The novelty of our approach is to express this result using weighted recursive trees, since it allows us to apply all the results developed in the preceding section.
The quantities $X(n)$ and $\mathrm{Total}(n)$ represent respectively the number of red balls and the total number of balls at time $n$ in an urn containing red and black balls, in which we add one ball at each time, whose colour is chosen at random with probability proportional to the current number of balls of each colour in the urn. Starting at time $0$ from the state $(a, a + b)$, i.e. with $a$ red balls and $b$ black balls, it is well-known that the sequence $(\Delta X(n))_{n\geq 1} = (X(n) - X(n-1))_{n\geq 1}$ of random variables is exchangeable, and an application of de Finetti's representation theorem ensures that it has the same distribution as a sequence of i.i.d. Bernoulli random variables with a random parameter $\beta$, which has distribution $\mathrm{Beta}(a, b)$, where we use the convention that $\mathrm{Beta}(a, b) = \delta_1$ if $b = 0$.
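Both facts can be verified on small examples. The sketch below computes the probability that the urn produces a given sequence of colours, checks that it is invariant under permutations of the sequence, and compares it with the mixture formula $\mathbb{E}[\beta^r (1-\beta)^s] = \frac{B(a + r, b + s)}{B(a, b)}$ for $\beta \sim \mathrm{Beta}(a, b)$, where $r$ and $s$ count the red and black draws. The function names and the numerical values are illustrative, and the sketch assumes $a, b > 0$.

```python
import math
from itertools import permutations


def urn_sequence_probability(a, b, draws):
    """Probability that the urn started with a red and b black produces `draws`.

    draws is a sequence of 1 (a red ball is added) and 0 (a black ball is added).
    """
    red, black, prob = a, b, 1.0
    for d in draws:
        total = red + black
        if d == 1:
            prob *= red / total
            red += 1
        else:
            prob *= black / total
            black += 1
    return prob


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_mixture_probability(a, b, draws):
    """E[beta^r (1 - beta)^s] for beta ~ Beta(a, b), with r reds and s blacks."""
    r = sum(draws)
    s = len(draws) - r
    return math.exp(log_beta(a + r, b + s) - log_beta(a, b))


a, b = 1.5, 0.7                   # hypothetical initial composition
draws = (1, 0, 0, 1, 1)
p = urn_sequence_probability(a, b, draws)
all_orders = {urn_sequence_probability(a, b, perm) for perm in permutations(draws)}
```

All the values in `all_orders` agree with `p` up to rounding (exchangeability), and `p` agrees with `beta_mixture_probability(a, b, draws)` (de Finetti representation).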
Nested structure of urns in the tree. For all $k \geq 1$ we define the following process in the "total fitness" of the vertices $\{u_1, u_2, \dots, u_k\}$, for which we remark that for any $k \geq 1$ we have Imagine that $P_n$ is constructed and we add a new vertex $u_{n+1}$ to the tree. We choose its parent in a downward sequential way:
• we first determine whether the parent is $u_n$, this happens with probability $\frac{a_n + \deg^+_{P_n}(u_n)}{W_n(n)} = 1 - \frac{W_{n-1}(n)}{W_n(n)}$,
• then with the complementary probability $\frac{W_{n-1}(n)}{W_n(n)}$ it is not, so conditionally on this we determine whether it is $u_{n-1}$, this happens with (conditional) probability $\frac{a_{n-1} + \deg^+_{P_n}(u_{n-1})}{W_{n-1}(n)} = 1 - \frac{W_{n-2}(n)}{W_{n-1}(n)}$,
• then with the complementary probability $\frac{W_{n-2}(n)}{W_{n-1}(n)}$ it is not, etc.
We continue this process until we stop at some $u_i$.

Now let us fix $k \geq 1$ and introduce the following time-change: for all $N \geq 0$, we let be the $N$-th time that a vertex is attached to one of the vertices $\{u_1, \dots, u_{k+1}\}$. Remark that it can be the case that $\theta_k(N)$ is not defined for large $N$, if there is only a finite number of vertices attaching to $\{u_1, \dots, u_{k+1}\}$. Let us ignore this possible problem for the moment, and only consider bounded sequences $a$, for which this will almost surely not happen. In this case for all $N \geq 0$ we set Now, the two following facts are the key observations in order to prove Theorem 1:
(i) for all $k \geq 1$, the process $\mathrm{Urn}_k = (\mathrm{Urn}_k(N))_{N \geq 0}$ has the distribution of a Pólya urn starting from the state $(A_k + k, A_{k+1} + k)$,
(ii) those processes are jointly independent for $k \geq 1$.
Point (i) already follows from the discussion above. A moment of thought shows that (ii) holds as well: of course the processes (W k (n), W k+1 (n)) n≥k+1 for different k are not independent at all but the point is that they only interact through the time-changes (θ k (·), k ≥ 1).
Reversing the construction and using the exchangeability. Using de Finetti's theorem and points (i) and (ii), each of the processes Urn k can be produced by sampling β k ∼ Beta(A k + k, a k+1 ) and adding a red ball at each step independently with probability β k and a black ball with probability 1 − β k . This is of course done independently for different k ≥ 1.
In terms of our downward sequential procedure defined above for finding the parent of each newcomer, it amounts to saying that each time that we have to choose between attaching to u k+1 or attach to a vertex among {u 1 , . . . , u k }, the former is chosen with probability 1−β k and the latter with probability β k . Let us verify that the law of (P n ) n≥1 conditionally on the sequence (β k ) k≥1 can indeed be expressed as WRT with the random sequence of weights w a defined in Theorem 1, which is defined from the sequence (β k ) k≥1 as, and w a n = W a n − W a n−1 , with the convention that W a 1 = 1 and W a 0 = 0. Let us reason conditionally on the sequence (β k ) k≥1 (or equivalently the sequence (w a n ) n≥1 ). When determining the parent of u n+1 , we successively try to attach to u n , u n−1 , . . . until we stop at some u k . Using the independence, we get Remark that the above construction is still valid without the assumption that the sequence a is bounded, and hence Theorem 1 is proved.
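The correspondence between the sequence $(\beta_k)_{k\geq 1}$ and the weights can be illustrated numerically. The sketch below assumes that the cumulated weights are given by $W_n = \prod_{k=1}^{n-1} \beta_k^{-1}$, which is our reading of the missing display: it is compatible with the convention $W_1 = 1$ and with the fact that, in the downward procedure, the set $\{u_1, \dots, u_k\}$ is preferred to $u_{k+1}$ with probability $\beta_k$. The fitness sequence, the seed and the function names are hypothetical.

```python
import random


def sample_betas(a, rng):
    """Sample beta_k ~ Beta(A_k + k, a_{k+1}) for k = 1, ..., len(a) - 1.

    a[0] plays the role of a_1 > -1 and a[k] >= 0 the role of a_{k+1}.
    Beta(x, 0) is read as the Dirac mass at 1, as in the text.
    """
    betas, partial_sum = [], 0.0
    for k in range(1, len(a)):
        partial_sum += a[k - 1]                      # A_k = a_1 + ... + a_k
        if a[k] == 0:
            betas.append(1.0)
        else:
            betas.append(rng.betavariate(partial_sum + k, a[k]))
    return betas


def weights_from_betas(betas):
    """Assumed cumulated weights W_n = prod_{k<n} 1/beta_k and increments w_n."""
    cumulated = [1.0]
    for beta in betas:
        cumulated.append(cumulated[-1] / beta)
    increments = [cumulated[0]] + [
        cumulated[n] - cumulated[n - 1] for n in range(1, len(cumulated))
    ]
    return cumulated, increments


def sequential_parent_probability(betas, n, k):
    """P(parent of u_{n+1} is u_k) in the downward procedure, for 1 <= k <= n."""
    prob = 1.0
    for j in range(k, n):            # decline u_n, ..., u_{k+1} in turn
        prob *= betas[j - 1]
    if k >= 2:                       # accept u_k rather than {u_1, ..., u_{k-1}}
        prob *= 1 - betas[k - 2]
    return prob


rng = random.Random(1)
a = [0.5, 1.0, 0.0, 2.0, 1.0, 0.0, 3.0]   # hypothetical fitnesses a_1, ..., a_7
betas = sample_betas(a, rng)
W, w = weights_from_betas(betas)
n = len(a)
sequential = [sequential_parent_probability(betas, n, k) for k in range(1, n + 1)]
weighted = [w[k - 1] / W[n - 1] for k in range(1, n + 1)]
```

The lists `sequential` and `weighted` coincide up to rounding: conditionally on the $\beta_k$, the downward procedure selects $u_k$ with probability $w_k / W_n$, as in a weighted recursive tree. A vertex with fitness $0$ gets weight $0$ here, because it is never chosen before having children.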

Proof of Proposition 2
Let $(W^a_n)_{n\geq 1}$ be the random sequence of cumulated weights defined in Theorem 1, whose distribution depends on a sequence $a$ of initial fitnesses, and is expressed using a sequence of independent Beta-distributed random variables $(\beta_k)_{k\geq 1}$. We are going to prove Proposition 2, which relates the growth of $(W^a_n)_{n\geq 1}$ to the one of $(A_n)_{n\geq 1}$. In this proof, we omit the superscript $a$ for readability.
Proof of Proposition 2. As in [15], we introduce It is easy to see that $X_n$ is a positive martingale, hence it almost surely converges to a limit $X_\infty$ as $n \to \infty$. Now, using the fact that the $(\beta_n)_{n\geq 1}$ are independent and that a random variable with $\mathrm{Beta}(a, b)$ distribution has $q$-th moment $\frac{\Gamma(a+q)\,\Gamma(a+b)}{\Gamma(a)\,\Gamma(a+b+q)}$, for $q \geq 0$, we can compute Now from our hypotheses on the sequence $(A_n)$, we have for all $k \in [\![0, p-1]\!]$, $n + A_n + k - 1 \underset{n\to\infty}{=} (c+1)n + O(n^{1-\epsilon})$ and so $\frac{1}{n + A_n + k - 1} \underset{n\to\infty}{=} \frac{1}{(c+1)n} + O(n^{-1-\epsilon})$.
In the end, since where C p is a positive constant which depends on the sequence a and p. This entails that, under our assumptions, for any p ≥ 1, we have E [X p n ] → C p /C p 1 as n → ∞, which shows that this martingale is bounded in L p for all p ≥ 1 and hence it is uniformly integrable. Consequently, it converges a.s. and in L p to a limit random variable X ∞ , with moments determined by Furthermore, we have Since β n ∼ Beta(n + A n , a n+1 ), we get: Using equation (40), equation (41), Lemma 33 and summing over n ≤ k ≤ 2n − 1 we get that E (X 2n − X n ) 2 = O(n −ǫ ). Using Lemma 34, we get, for some ǫ > 0, Since β i > 0 almost surely for every i ≥ 1, the event {X ∞ = 0} is a tail event for the filtration generated by the β i and has probability 0 or 1. In the end, it has probability 0 because E [X ∞ ] = 1. We deduce that Hence, we have, Whenever a n ≤ n c ′ +o(1) as n → ∞, we can show the following (we postpone the proof to the end of the section) Lemma 26. For any δ > 0, we have Since the last quantity is summable in k we can use the Borel-Cantelli lemma (and a sequence of δ going to 0) to show that almost surely 1 − β k ≤ k −1+c ′ +o(1) as k → ∞. This finishes to prove the proposition, because we can write (1) .
We finish by giving a proof of Lemma 26.
Proof of Lemma 26. Let x > 0 and y > 1 and let β be a random variable with Beta(x, y) distribution. Then for any z ∈ [0 , 1] we have, using the explicit expression of the density of β: For any two sequences (x n ) and (y n ) simultaneously going to infinity with x n = o(y n ), we have the following bound using Stirling's approximation: x n log(y n ).
Applying the above computations for (1 − β n ) ∼ Beta (a n+1 , A n + n), and using the assumptions on the sequence a, we get which is what we wanted.

The distribution of the limiting sequence
Recall the convergence of the degree sequence stated in Proposition 5. Thanks to what precedes, we know that if some sequence a satisfies (H c ) then the associated random sequence (w a n ) n≥1 satisfies W a n ∼ n→∞ Z · n c/(c+1) and so in this setting the convergence of degrees can be stated as where (m a n ) n≥1 = c+1 Z ·(w a n ) n≥1 . Remark that the random variable Z depends on the whole sequence (β n ) n≥1 used in the definition of (W a n ) n≥1 , so the sequence (M a n ) n≥1 can not be seen as an iterated product of independent random variables, which was the case for (W a n ) n≥1 . We will prove that this new process still has some nice properties.
Proposition 27. For any sequence a that satisfies the condition (H c ), the sequence (M a k ) k≥1 is a (possibly time-inhomogeneous) Markov chain such that for all k ≥ 1, M a k+1 is independent of β 1 , β 2 , . . . , β k . The fact that for all k ≥ 1 we have M a k = β k · M a k+1 with β k ∼ Beta(A k + k, a k+1 ) independent of M a k+1 characterises the backward transitions of the chain.
Proof. We follow the same steps as [15,Lemma 1.1]. Let us fix a sequence a that satisfies the hypotheses of the proposition and make the dependence on it implicit to ease notation. Recall from (38) the definition of C 1 and from (42) the definition of Z from X ∞ . We have It then follows that we can write, for k ≥ 1, , which ensures that M k+1 is independent of β 1 , β 2 , ..., β k . The limit in the last equality exists almost surely thanks to the results of the preceding section. Now we prove the Markov property of the chain. Let k ≥ 1. Because of the definition of the chain as a product, the distribution of M k+1 conditional on the past trajectory M 1 , M 2 , . . . , M k is the same as the distribution of M k+1 conditional on M k , β 1 , . . . , β k−1 . Since M k+1 = β −1 k · M k and that β k and M k are both independent of β 1 , . . . , β k−1 , this conditional distribution corresponds to the one of M k+1 conditional on the present state of the chain M k .
Computing the moments. In some cases where the sequence $a$ is sufficiently regular, we can compute explicitly every moment of the random variable $M^a_k$ for every $k \geq 1$. Indeed, using (39) and (43) and the independence, we get In general, if the collection $(\mu_p)_{p\geq 1}$ of $p$-th moments of some positive random variable satisfies the so-called Carleman's condition: $\sum_{p=1}^{\infty} \mu_p^{-1/(2p)} = \infty$, then its distribution is uniquely determined by those moments.

Examples and applications
In this section, we compute the explicit distribution of (M a n ) for some particular sequences a. We apply this result to another model of preferential attachment.

The limit chain for particular sequences a
As stated in the preceding section, we can compute the distribution of M a k for some fixed k by the expression of its moments (44), provided that they satisfy Carleman's condition. Knowing these distributions and the backward transitions given in Proposition 27 then characterises the law of the whole process. For two particular examples, this law has a nice expression.
Proposition 28. In the two following cases, the distribution of the chain (M a n ) is explicit.
has the Product Generalised Gamma distribution PGG (a, ℓ, m).
We will prove the two points of this proposition in separate subsections. The proper definitions of the distributions to which we refer are given along the proof. For the rest of the section, we drop the superscript a and write (M n ) n≥1 .

Mittag-Leffler Markov chains
Let us study the case where the underlying preferential attachment tree has a sequence of initial fitnesses $a$ of the form $(a, b, b, b, \dots)$. We start by recalling the definitions of Mittag-Leffler distributions and Mittag-Leffler Markov chains, introduced in [15] and also studied in [17].
Mittag-Leffler Markov Chains. For any $0 < \alpha < 1$ and $\theta > -\alpha$, we introduce the (a priori) inhomogeneous Markov chain $(M_n^{\alpha,\theta})_{n\geq 1}$, the distribution of which we call the Mittag-Leffler Markov chain of parameters $(\alpha, \theta)$, or $\mathrm{MLMC}(\alpha, \theta)$. This type of Markov chain was already defined in [15], for some choice of parameters $\alpha$ and $\theta$. It is a Markov chain such that for any $n \geq 1$, and the transition probabilities are characterised by the following equality in law: . These chains are constructed (for some values of $\theta$ depending on $\alpha$) in [15]. In fact, our proof of Proposition 28(i) ensures that these chains exist for any choice of parameters $0 < \alpha < 1$ and $\theta > -\alpha$. Let us mention that the proof of [15, Lemma 1.1] is still valid for the whole range of parameters $0 < \alpha < 1$ and $\theta > -\alpha$, which proves that these Markov chains are in fact time-homogeneous.
The limiting Markov chain is a Mittag-Leffler. Recall the definition of the sequence (β k ) k≥1 and their respective distributions β k ∼ Beta(A k + k, a k+1 ). From our assumptions on the sequence a we have for all k ≥ 1, Proof of Proposition 28 (i). For p ≥ 1, we can make the following computation, using (37), one change of indices and several times the property of the Gamma function that for any z > 0 we have Γ (z + 1) = zΓ (z): Using Stirling formula, we can then compute the numbers C p introduced in (38), Using (44), the moments of M k are given, for any p ∈ N by the formula: These moments identify using (45) the distribution of M k for all k ≥ 1, From this, and the form of the backward transitions, we can identify (M k ) k≥1 as having a distribution MLMC 1 b+1 , a b+1 .

Products of generalised Gamma.
The following paragraphs aim at proving Proposition 28(ii). In the first paragraph we define the family of distributions of PGG-process. In the second one we prove that the distribution of (M k ) k≥1 belongs to this family whenever the sequence a is of the form assumed in Proposition 28(ii).
Construction of a PGG(a, ℓ, m)-process. For $a > -1$ a real number and $\ell, m \geq 1$ integers, we define the following. Let $(Z_i^{(q)})_{0 \leq q \leq m-1,\ i \geq 1}$ be a family of independent variables with the following distribution: for all $0 \leq q \leq m-1$, where, for any $k, \theta > 0$, the distribution $\mathrm{Gamma}(k, \theta)$ has density $x \mapsto \frac{x^{k-1} e^{-x/\theta}}{\theta^k\,\Gamma(k)} \mathbf{1}_{\{x > 0\}}$. Then for all $k \geq 1$ we define $G_k$ as, We say that the process $(G_k)_{k\geq 1}$ has the distribution Product of Generalised Gamma with parameters $(a, \ell, m)$, which we denote $\mathrm{PGG}(a, \ell, m)$.
The limiting chain is a PGG. Fix some integers $\ell \geq 1$ and $m \geq 1$ and suppose that the sequence $a$ has the following form, meaning that $a_1 = a > -1$, that for all $j \geq 1$ we have $a_{\ell \cdot j + 1} = m$, and that $a_n = 0$ whenever $n - 1$ is not a multiple of $\ell$.
Then we compute the following, using the properties of the Gamma function.
Using Stirling's approximation we get: Hence, recalling the definition of $C_p$ in (38), we get Then using (44) with $c = m/\ell$, Using the last display and the fact that a random variable with distribution $\mathrm{Gamma}(x, 1)$ has $p$-th moment equal to $\frac{\Gamma(x+p)}{\Gamma(x)}$, we can identify the distribution of the marginals ℓ ℓ m+ℓ m+ℓ · N k for any $k \geq 1$ with the ones of the process described in (48). The identification of the distribution of the whole process ℓ ℓ m+ℓ m+ℓ · (N k ) k≥1 with a $\mathrm{PGG}(a, \ell, m)$ is then obtained by checking that their backward transitions are the same.
Remark 29. For $m = a = 1$, the process $(G_k)_{k\geq 1}$ has exactly the distribution of the points of a Poisson process on $\mathbb{R}_+$ with intensity $(\ell + 1)t^{\ell}\,dt$, listed in increasing order.
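The Poisson process of Remark 29 is easy to sample, which may help to build intuition for the PGG distribution in this special case. The sketch below is an illustration under a standard fact (the mapping theorem for Poisson processes): since the cumulative intensity is $\Lambda(t) = t^{\ell+1}$, the points are obtained by applying $\Lambda^{-1}$ to the arrival times of a rate-1 Poisson process. The function name `poisson_points` is hypothetical.

```python
import random


def poisson_points(ell: int, count: int, rng: random.Random) -> list[float]:
    """First `count` points, in increasing order, of a Poisson process on R_+
    with intensity (ell + 1) * t**ell dt."""
    points = []
    total = 0.0
    for _ in range(count):
        # Arrival times of a rate-1 Poisson process are sums of Exp(1) variables.
        total += rng.expovariate(1.0)
        # Invert the cumulative intensity Lambda(t) = t**(ell + 1).
        points.append(total ** (1.0 / (ell + 1)))
    return points
```

In particular the first point $G_1$ then satisfies $P(G_1 > t) = e^{-t^{\ell+1}}$.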
Remark 30. The distribution of $G_1$ coincides with the one obtained in [1] for the limiting proportion of some periodic Pólya urn, which is not a surprise because the degree of the first vertex in the tree follows exactly the urn dynamics that they study (with completely different tools).

Applications to some other models of preferential attachment
Let us present here another model of preferential attachment which appears in the literature, for example in [21]. This model does not produce a tree as ours does, but we can couple them in such a way that some of their features coincide. We only focus on one particular graph model, but the method presented here can be adapted to other similar models.
A model of (m, α)-preferential attachment. Let $S$ be a non-empty graph, with vertex-set $\{v_1^{(1)}, \dots, v_1^{(k)}\}$ whose vertices have degrees $(d_1, \dots, d_k)$, let $m \geq 2$ be an integer and $\alpha > -m$ a real number such that $\alpha + d_i > 0$ for all $1 \leq i \leq k$. The model is then the following: we let $G_1 = S$. Then, at any time $n \geq 1$, the graph $G_{n+1}$ is constructed from the graph $G_n$ by:
• adding a new vertex labelled $v_{n+1}$ with $m$ outgoing edges,
• choosing sequentially to which other vertex each of these edges points, each vertex being chosen with probability proportional to $\alpha$ plus its degree (the degrees of the vertices are updated after each edge-creation).
The degree of a vertex in a graph refers in this section to the number of edges incident to it. Here the growth procedure in fact produces multigraphs, in which it is possible for two vertices to be connected to each other by more than one edge. In this case, all those edges contribute to the count of their degrees.
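The growth rule can be sketched in a few lines of code. This is only an illustration under simplifying assumptions: the graph is represented by its list of degrees alone (which suffices to run the dynamics), and the name `grow_pa_multigraph` and its parameters are hypothetical.

```python
import random


def grow_pa_multigraph(seed_degrees, m, alpha, steps, rng):
    """Return the list of degrees after `steps` vertex additions.

    seed_degrees: degrees (d_1, ..., d_k) of the seed graph S.
    Each new vertex sends m edges, one at a time, to previously existing
    vertices chosen with probability proportional to alpha + degree.
    """
    degrees = list(seed_degrees)
    if alpha <= -m or any(alpha + d <= 0 for d in degrees):
        raise ValueError("need alpha > -m and alpha + d_i > 0 for all seed vertices")
    for _ in range(steps):
        existing = len(degrees)  # the new vertex itself is not a candidate target
        new_degree = 0
        for _ in range(m):
            weights = [alpha + degrees[v] for v in range(existing)]
            target = rng.choices(range(existing), weights=weights)[0]
            degrees[target] += 1  # degrees are updated after each edge creation
            new_degree += 1
        degrees.append(new_degree)
    return degrees
```

Since each step adds $m$ edges, the sum of the degrees grows by $2m$ per step, which gives a quick sanity check on any run.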
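We can couple this model to a preferential attachment tree with sequence of initial fitnesses $a$ defined as: $a = (w(S), \underbrace{0, \dots, 0}_{m-1}, m+\alpha, \underbrace{0, \dots, 0}_{m-1}, m+\alpha, \dots)$, where $w(S) := d_1 + d_2 + \dots + d_k + k\alpha$.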
Indeed, we can construct $(T_n)$ with distribution $\mathrm{PA}(a)$. Then, for any $n \geq 1$, consider the tree $T_{1+m(n-1)}$ and, for all $2 \leq i \leq n$, merge each vertex with initial fitness $m + \alpha$ together with the $m - 1$ vertices with fitness 0 that arrived just before it. If $G_1$ only contains one vertex, it is immediate that the obtained sequence of graphs has exactly the same distribution as $(G_n)_{n\geq 1}$. For general seed graphs $S$, we can still use the same construction and the obtained sequence of graphs has the same evolution as some sequence $(\widetilde{G}_n)_{n\geq 1}$ which would be obtained from $(G_n)_{n\geq 1}$ by merging all the vertices $\{v_1^{(1)}, \dots, v_1^{(k)}\}$ of the seed into a single vertex.

Note that a similar construction would also be possible if the initial degrees of the vertices $v_2, v_3, \dots$ were given by a sequence of integers $(m_2, m_3, \dots)$ instead of all being equal to some constant value $m$. This is for example the case in the model studied in [11], where the initial degrees are random.
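The merging construction can be illustrated by a short simulation. The sketch below rests on an assumption read off from the description of the merging: the fitness sequence consists of $w(S)$ followed by blocks of $m - 1$ zeros and one value $m + \alpha$. All function names are hypothetical, and the degrees returned for the merged seed count only the edges it receives, not the edges internal to $S$.

```python
import random


def pa_tree_children_counts(fitness, rng):
    """Grow a PA(a) tree: vertex n picks its parent k among earlier vertices
    with probability proportional to a_k + (current number of children of k)."""
    parents = [None]
    children = [0]
    for n in range(1, len(fitness)):
        weights = [fitness[k] + children[k] for k in range(n)]
        parent = rng.choices(range(n), weights=weights)[0]
        children[parent] += 1
        parents.append(parent)
        children.append(0)
    return parents, children


def coupled_fitness(w_seed, m, alpha, n):
    """Assumed fitness sequence of length 1 + m(n-1): w(S), then (n - 1) blocks
    made of (m - 1) zeros followed by m + alpha."""
    return [w_seed] + ([0.0] * (m - 1) + [m + alpha]) * (n - 1)


def merged_degrees(w_seed, m, alpha, n, rng):
    """Degrees in the multigraph obtained by merging each block into one vertex."""
    fitness = coupled_fitness(w_seed, m, alpha, n)
    _, children = pa_tree_children_counts(fitness, rng)
    degrees = [children[0]]  # edges received by the (merged) seed
    for i in range(1, n):
        block = range(1 + m * (i - 1), 1 + m * i)
        # m outgoing edges (one per tree vertex of the block) plus incoming edges.
        degrees.append(m + sum(children[v] for v in block))
    return degrees
```

A vertex with fitness 0 and no children has weight 0, so it is never chosen as a parent; this is why no edge can join two tree vertices of the same block, and why the merged vertex has exactly $m$ outgoing edges.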
We have the following convergence for degrees of vertices in the graph, as n → ∞.
Furthermore, whenever $\alpha \in \mathbb{Z}$ or $m = 1$, the distribution of $(N_n)_{n\geq 1}$ is explicit and given by: This result strengthens those of [21, Theorem 1, Theorem 2 and Proposition 1], which correspond (up to some definition convention) to the case $\alpha = 1 - m$. We emphasize that the convergence here is almost sure in an $\ell^p$ space.
Proof of Proposition 31. Using the coupling argument, we know that the sequence 1 ) − d 2 ) + · · · + (deg Gn (v almost surely in ℓ p for all p > 2 + α m , for some random sequence (N k ) k≥1 . In the case α ∈ Z or m = 1, Proposition 28 identifies the distribution of the limiting sequence. Last, the convergence of 1 ), . . . , deg Gn (v (k) 1 )) just follows from the classical result of convergence for the proportion of balls in a Pólya urn.

A Technical proofs and results
This appendix contains the proofs of technical results that are used throughout this paper. Let us start by stating a useful conditional version of the Borel-Cantelli lemma.
Lemma 32. Let $(\mathcal{F}_n)$ be a filtration and let $(B_n)_{n\geq 1}$ be a sequence of events adapted to this filtration. For all $n \geq 1$, let $p_n := P(B_n \mid \mathcal{F}_{n-1})$. We have

Proof. The first convergence is the content of Theorem 5.4.11 and the second one is an application of Theorem 5.4.9, both taken from [14].
The following lemma is a rewriting of [3, Lemma 1]. We provide the proof for completeness.
Lemma 33 ("Biggins' lemma"). Let $(M_n)_{n\geq 1}$ be a complex-valued martingale with finite $q$-th moment for some $q \in [1, 2]$. Then for every $n \geq 1$ we have

Proof. Let $X_{n+1} := M_{n+1} - M_n$ and let $X'_{n+1}$ be a random variable such that, conditionally on $(M_1, \dots, M_n)$, the random variable $X'_{n+1}$ is independent of, and has the same distribution as, $X_{n+1}$. Then where the first equality comes from the fact that $E[X'_{n+1} \mid M_1, \dots, M_{n+1}] = 0$. The first inequality is Jensen's inequality for conditional expectation, applied to the convex function $z \mapsto |z|^q$. The second inequality is due to Clarkson, see [26, Lemma 1], and can be applied because the distribution of $X_{n+1} - X'_{n+1}$ conditional on $M_n$ is symmetric and $1 \leq q \leq 2$. The last inequality comes from the triangle inequality for the $L^q$-norm.
Let us state another result about martingales, which we use numerous times throughout the paper. Recall our uniform big-O and small-o notation, introduced in (22).
Lemma 34. Suppose that $(z \mapsto Z_n(z))_{n\geq 1}$ is a sequence of analytic functions on some open domain $O \subset \mathbb{C}$, adapted to some filtration $(\mathcal{G}_n)$. Suppose that for every $z \in O$, the sequence $(Z_n(z))_{n\geq 1}$ is a martingale with respect to the filtration $(\mathcal{G}_n)$. If there exist a parameter $q > 1$ and continuous functions $\alpha : O \to \mathbb{R}$ and $\delta : O \to \mathbb{R}_+^*$ such that for all $n \geq 1$ we have then for any compact $K \subset O$, there exists $\epsilon(K) > 0$ such that (i) if $\alpha > 0$ on $O$, we have $n^{-\alpha(z)} \cdot |Z_n(z)| = O_K(n^{-\epsilon(K)})$ almost surely and also in expectation, (ii) if $\alpha \leq 0$ on $O$, the almost sure limit $Z_\infty(z)$ exists for $z \in O$ and we have $n^{-\alpha(z)} \cdot |Z_n(z) - Z_\infty(z)| = O_K(n^{-\epsilon(K)})$ almost surely and also in expectation.
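Proof of Lemma 34. By compactness, it is sufficient to prove the result for a small disk around each $x \in K$. Since $O$ is an open set, let $\rho > 0$ be such that $D(x, 2\rho) \subset O$, where $D(x, 2\rho)$ is the closed disk in the complex plane with centre $x$ and radius $2\rho$. We denote $\underline{\alpha} := \inf_{D(x,2\rho)} \alpha$, $\overline{\alpha} := \sup_{D(x,2\rho)} \alpha$ and $\underline{\delta} := \inf_{D(x,2\rho)} \delta$, and choose $\rho$ small enough so that $\underline{\alpha} - \overline{\alpha} + \frac{1}{q}\underline{\delta} > 0$. Then if we let $\xi : [0, 2\pi] \to \mathbb{C}$ be such that $\xi(t) = x + 2\rho e^{it}$, we have for any $n$ and $m$, using Cauchy's formula, $\sup_{z \in D(x,\rho)} |Z_n(z) - Z_m(z)| \leq \pi^{-1} \int_0^{2\pi} |Z_n(\xi(t)) - Z_m(\xi(t))| \, dt$.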
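The Cauchy-formula bound used here (the supremum over the disk of radius $\rho$ is at most $\pi^{-1}$ times the integral of the modulus over the circle of radius $2\rho$) can be checked numerically on any analytic function. The sketch below is only an illustration: the function name `cauchy_bound_check` is hypothetical, the supremum is approximated by sampling the boundary circle (where it is attained, by the maximum modulus principle), and the integral is approximated by a Riemann sum.

```python
import cmath
import math


def cauchy_bound_check(f, center, rho, samples=2000):
    """Return (sup_inner, bound), where sup_inner approximates the sup of |f|
    on the closed disk D(center, rho) and bound approximates
    pi^{-1} * integral over [0, 2*pi] of |f(center + 2*rho*e^{it})| dt."""
    angles = [2 * math.pi * j / samples for j in range(samples)]
    # Maximum modulus principle: the sup over the disk is attained on its boundary.
    sup_inner = max(abs(f(center + rho * cmath.exp(1j * t))) for t in angles)
    # Riemann sum along xi(t) = center + 2*rho*e^{it}.
    integral = sum(abs(f(center + 2 * rho * cmath.exp(1j * t))) for t in angles)
    integral *= 2 * math.pi / samples
    return sup_inner, integral / math.pi
```

In the proof, the role of `f` is played by $Z_n - Z_m$; the factor $\pi^{-1}$ comes from $\frac{1}{2\pi}$ times the ratio $2\rho/\rho$ between the length element on the outer circle and the minimal distance to the inner disk.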
• For $\alpha > 0$ and $n \geq 1$, let $r \in \mathbb{N}$ be such that $2^r \leq n \leq 2^{r+1}$ and write The expectation of the right-hand side tends to 0 exponentially fast in $r$, hence the right-hand side also tends to 0 almost surely, which proves point (i).
• For $\alpha \leq 0$ and any $n$, let $r$ be such that $2^r \leq n \leq 2^{r+1}$ and write E n −α(z) sup k≥n sup z∈D(x,ρ) and the last display converges exponentially fast to 0. So the function $z \mapsto Z_n(z)$ converges almost surely to some $z \mapsto Z_\infty(z)$ uniformly on the disk, and point (ii) is satisfied.
Finally, let us give a proof of Lemma 12.
Proof of Lemma 12. Let $\epsilon > 0$ and suppose that $W_n = \mathrm{cst} \cdot n^{\gamma} + O(n^{\gamma-\epsilon})$ as $n \to \infty$. Then it is immediate that $w_n = W_{n+1} - W_n = O(n^{\gamma-\epsilon})$. Then and the first point follows by summing over intervals of the type $[n2^k, n2^{k+1}]$. Now write Since $\frac{w_i}{W_i} \to 0$ as $i \to \infty$, we get Putting everything together, we get Last, just remark that $\log W_n = \log(\mathrm{cst} \cdot n^{\gamma} \cdot (1 + O(n^{-\epsilon}))) = \gamma \log n + \mathrm{cst} + O(n^{-\epsilon})$, which finishes the proof.