Random trees and applications

We discuss several connections between discrete and continuous random trees. In the discrete setting, we focus on Galton-Watson trees under various conditionings. In particular, we present a simple approach to Aldous' theorem giving the convergence in distribution of the contour process of conditioned Galton-Watson trees towards the normalized Brownian excursion. We also briefly discuss applications to combinatorial trees. In the continuous setting, we use the formalism of real trees, which yields an elegant formulation of the convergence of rescaled discrete trees towards continuous objects. We explain the coding of real trees by functions, which is a continuous version of the well-known coding of discrete trees by Dyck paths. We pay special attention to random real trees coded by Brownian excursions, and in particular we provide a simple derivation of the marginal distributions of the CRT. The last section is an introduction to the theory of the Brownian snake, which combines the genealogical structure of random real trees with independent spatial motions. We introduce exit measures for the Brownian snake and we present some applications to a class of semilinear partial differential equations.


Introduction
The main purposes of these notes are to present some recent work about continuous genealogical structures and to point at some of their applications. Interest in these continuous branching structures first arose from their connections with the measure-valued branching processes called superprocesses, which have been studied extensively since the end of the eighties. Independently of the theory of superprocesses, Aldous [1], [2] discussed scaling limits of various classes of discrete trees conditioned to be large. In the case of a Galton-Watson tree with a finite variance critical offspring distribution and conditioned to have a large number of vertices, he proved that the scaling limit is a continuous random tree called the Brownian CRT. Moreover, this limiting continuous object can be coded by a normalized Brownian excursion, a fact that is reminiscent of the Brownian snake construction of superprocesses [25]. In recent years, these ideas have been extended to much more general continuous trees: See in particular [12] for a discussion of Lévy trees, which are the possible scaling limits of critical or subcritical Galton-Watson trees.
Section 1 below is concerned with scaling limits of discrete trees. In fact, we do not really discuss limits of trees, but consider rather certain coding functions of these trees, namely the height function and the contour function (see Fig.1 below). Our main result (Theorem 1.8) states that the rescaled height function associated with a forest of independent (critical, finite variance) Galton-Watson trees converges in distribution towards reflected Brownian motion on the positive half-line. From this, one can derive a variety of analogous limit theorems for a single Galton-Watson tree conditioned to be large. The derivation is quite simple if the conditioning is "non-degenerate": For instance, if the tree is conditioned to have height greater than n (resp. total progeny greater than n), the scaling limit of the height function will be a Brownian excursion conditioned to have height greater than 1 (resp. duration greater than 1). For degenerate conditionings, things become a little more complicated: The case of a Galton-Watson tree conditioned to have exactly n vertices, corresponding to Aldous' theorem [2], is treated under an exponential moment assumption using an idea of Marckert and Mokkadem [31]. We briefly discuss some applications to various classes of "combinatorial" trees.
Although the limit theorems of Section 1 give a lot of useful information about asymptotics of discrete random trees, it is a bit unsatisfactory that they only discuss continuous limits for the coding functions of trees and not for the trees themselves. The formalism of real trees, which is briefly presented in Section 2, provides an elegant way of restating the limit theorems of Section 1 in terms of convergence of trees. The use of real trees for probabilistic purposes seems to be quite recent: See in particular [17] and [12]. In Section 2, we first discuss the coding of a (compact) real tree by a continuous function. This is of course a continuous analogue of the correspondence between a discrete tree and its contour function. This coding makes it possible to get a simple and efficient construction of the CRT and related random trees as trees coded by various kinds of Brownian excursions. As an application, we use some tools of Brownian excursion theory to derive the finite-dimensional marginals of the CRT, which had been computed by Aldous with a very different method.
Section 3 gives an introduction to the path-valued process called the Brownian snake and its connections with certain semilinear partial differential equations. The Brownian snake combines the genealogical structure of the random real trees studied in Section 2 with spatial motions governed by a general Markov process ξ. Informally, each Brownian snake path corresponds to the spatial positions along the ancestral line of a vertex in the tree. The precise definition of the Brownian snake is thus motivated by the coding of real trees that is discussed in Section 2. In view of applications to PDE, we introduce the exit measure from a domain D, which is in a sense uniformly spread over the set of exit points of the Brownian snake paths from D. We then derive the key integral equation (Theorem 3.11) for the Laplace functional of the exit measure. In the particular case when the underlying spatial motion ξ is d-dimensional Brownian motion, this quickly leads to the connection between the Brownian snake and the semilinear PDE ∆u = u^2. Up to some point, this connection can be viewed as a reformulation of (a special case of) Dynkin's work [13] about superprocesses. Although we content ourselves with a simple application to solutions with boundary blow-up, the results of Section 3 provide most of the background that is necessary to understand the recent deep applications of the Brownian snake to the classification and probabilistic representation of solutions of ∆u = u^2 in a domain [33].
The concepts and results that are presented here have many other recent applications. Let us list a few of these. Random continuous trees can be used to model the genealogy of self-similar fragmentations [20]. The Brownian snake has turned out to be a powerful tool in the study of super-Brownian motion: See the monograph [27] and the references therein. The random measure known as ISE (see [3] and Section 3 below), which is easily obtained from the Brownian snake driven by a normalized Brownian excursion, has appeared in asymptotics for various models of statistical mechanics [7], [21], [23]. There are similar limit theorems for interacting particle systems in Z^d: See [5] for the voter model and coalescing random walks and the recent paper [22] for the contact process. Another promising area of application is the asymptotic study of planar maps. Using a bijection between quadrangulations and well-labelled discrete trees, Chassaing and Schaeffer [6] were able to derive precise asymptotics for random quadrangulations in terms of a one-dimensional Brownian snake conditioned to remain positive (see [30] for the definition of this object). The Chassaing-Schaeffer asymptotics have been extended to more general planar maps by Marckert and Miermont [32]. In fact one expects the existence of a universal continuous limit of planar maps that should be described by random real trees of the type considered here.

From Discrete to Continuous Trees
In this section, we first explain how discrete random trees can be coded by discrete paths called the height function and the contour function of the tree. We then prove that the rescaled height function associated with a forest of independent Galton-Watson trees converges in distribution towards reflecting Brownian motion on the positive half-line. This has several interesting consequences for the asymptotic behavior of various functionals of Galton-Watson forests or trees. We also discuss analogous results for a single Galton-Watson tree conditioned to be large. In particular we recover a famous theorem of Aldous showing that the suitably rescaled contour function of a Galton-Watson tree conditioned to have n vertices converges in distribution towards the normalized Brownian excursion as n → ∞. Consequences for various classes of "combinatorial trees" are outlined in subsection 1.5.

Discrete trees
We will be interested in (finite) rooted ordered trees, which are also called plane trees in combinatorics (see e.g. [41]). We first introduce the set of labels
U = ⋃_{n≥0} N^n,
where N = {1, 2, . . .} and by convention N^0 = {∅}. An element of U is thus a sequence u = (u_1, . . . , u_n) of elements of N, and we set |u| = n, so that |u| represents the "generation" of u. If u = (u_1, . . . , u_m) and v = (v_1, . . . , v_n) belong to U, we write uv = (u_1, . . . , u_m, v_1, . . . , v_n) for the concatenation of u and v. In particular u∅ = ∅u = u.
A (finite) rooted ordered tree t is a finite subset of U such that:
(i) ∅ ∈ t.
(ii) If v = uj ∈ t for some u ∈ U and j ∈ N, then u ∈ t.
(iii) For every u ∈ t, there exists an integer k_u(t) ≥ 0 such that, for every j ∈ N, uj ∈ t if and only if 1 ≤ j ≤ k_u(t).
The number k_u(t) is interpreted as the "number of children" of u in t.
We denote by A the set of all rooted ordered trees. In what follows, we see each vertex of the tree t as an individual of a population of which t is the family tree. The cardinality #(t) of t is the total progeny. We will now explain how trees can be coded by discrete functions. We first introduce the (discrete) height function associated with a tree t. Let us denote by u_0 = ∅, u_1, u_2, . . . , u_{#(t)−1} the elements of t listed in lexicographical order. The height function (h_t(n); 0 ≤ n < #(t)) is defined by
h_t(n) = |u_n|, 0 ≤ n < #(t).
The height function is thus the sequence of the generations of the individuals of t, when these individuals are listed in the lexicographical order (see Fig.1 for an example). It is easy to check that h t characterizes the tree t.
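As a concrete illustration (the code and the helper names are ours, not part of the original text), the height function can be read off directly from this definition in Python, representing a plane tree as a set of label tuples as above, the root being the empty tuple:

```python
def height_function(tree):
    """Height function h_t: generations of the vertices of a plane tree,
    listed in lexicographical order. A tree is a set of label tuples."""
    vertices = sorted(tree)              # lexicographical order on tuples
    return [len(u) for u in vertices]    # h_t(n) = |u_n|

# Tree with root, two children 1 and 2, and grandchildren 11, 12:
t = {(), (1,), (1, 1), (1, 2), (2,)}
print(height_function(t))                # [0, 1, 2, 2, 1]
```

Note that Python's lexicographic ordering of tuples agrees with the ordering on U used in the text, a prefix being smaller than any of its extensions.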
The contour function (or Dyck path in the terminology of [41]) gives another way of characterizing the tree, which is easier to visualize on a picture (see Fig.1). Suppose that the tree is embedded in the half-plane in such a way that edges have length one. Informally, we imagine the motion of a particle that starts at time t = 0 from the root of the tree and then explores the tree from the left to the right, moving continuously along the edges at unit speed (in the way explained by the arrows of Fig.1), until all edges have been explored and the particle has come back to the root. Since it is clear that each edge will be crossed twice in this evolution, the total time needed to explore the tree is ζ(t) := 2(#(t) − 1). The value C_s of the contour function at time s ∈ [0, ζ(t)] is the distance (on the tree) between the position of the particle at time s and the root. By convention C_s = 0 if s ≥ ζ(t). Fig.1 explains the construction of the contour function better than a formal definition.
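The values of the contour function at integer times can likewise be generated by a depth-first traversal; the sketch below (our own illustration, with the same tuple representation of trees as before) records the particle's height each time it reaches a vertex:

```python
def contour_function(tree, u=()):
    """Values of the contour function of a plane tree at integer times
    0, 1, ..., 2(#(t) - 1): depth-first exploration at unit speed."""
    c = [len(u)]                         # the particle sits at u
    j = 1
    while u + (j,) in tree:              # explore children left to right
        c.extend(contour_function(tree, u + (j,)))
        c.append(len(u))                 # move back up to u
        j += 1
    return c

t = {(), (1,), (1, 1), (1, 2), (2,)}
print(contour_function(t))               # [0, 1, 2, 1, 2, 1, 0, 1, 0]
```

The returned list has 2(#(t) − 1) + 1 entries, each edge being crossed exactly twice, in accordance with the formula for ζ(t).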
Proposition 1.1 The mapping Φ : t ↦ (k_{u_0}(t), k_{u_1}(t), . . . , k_{u_{#(t)−1}}(t)) defines a bijection from A onto the set S of all finite sequences (m_1, . . . , m_p) of nonnegative integers such that
m_1 + · · · + m_p = p − 1 and m_1 + · · · + m_i ≥ i for every i ∈ {1, . . . , p − 1}.
Proof. We note that if #(t) = p, the sum k_{u_0}(t) + k_{u_1}(t) + · · · + k_{u_{p−1}}(t) counts the total number of children of all individuals in the tree and is thus equal to p − 1 (because ∅ is not counted!). Furthermore, if i ∈ {0, 1, . . . , p − 2}, k_{u_0} + · · · + k_{u_i} is the number of children of u_0, . . . , u_i and is thus greater than or equal to i, because u_1, . . . , u_i are counted among these children (in the lexicographical order, an individual is visited before his children). There is even a strict inequality, because the father of u_{i+1} belongs to {u_0, . . . , u_i}. It follows that Φ maps A into S. We leave the rest of the proof to the reader.
Let t ∈ A and p = #(t). Rather than the sequence (m_1, . . . , m_p) = Φ(t), we will often consider the finite sequence of integers x_0, x_1, . . . , x_p defined by x_0 = 0 and
x_{n+1} = x_n + m_{n+1} − 1, 0 ≤ n ≤ p − 1,
which satisfies the following properties: x_n ≥ 0 for every 0 ≤ n ≤ p − 1, and x_p = −1. Such a sequence is called a Lukasiewicz path. Obviously the mapping Φ of the proposition induces a bijection between trees (in A) and Lukasiewicz paths.
We now observe that there is a simple relation between the height function of a tree and its Lukasiewicz path.
Proposition 1.2 The height function h_t of a tree t is related to the Lukasiewicz path of t by the formula
h_t(n) = #{ j ∈ {0, 1, . . . , n − 1} : x_j = min_{j≤l≤n} x_l }, 0 ≤ n < #(t).
Proof. We have
h_t(n) = #{ j ∈ {0, 1, . . . , n − 1} : u_j ≺ u_n },
where ≺ stands for the genealogical order on the tree (u ≺ v if v is a descendant of u). Thus it is enough to prove that, for j ∈ {0, 1, . . . , n − 1}, u_j ≺ u_n holds iff
x_j = min_{j≤l≤n} x_l.
To this end, it suffices to verify that inf{k ≥ j : x_k < x_j} is equal either to #(t), in the case when all u_k with k > j are descendants of u_j, or to the first index k > j such that u_k is not a descendant of u_j.
However, writing
x_ℓ − x_j = Σ_{i=j}^{ℓ−1} (k_{u_i}(t) − 1)
and using the same arguments as in the proof of Proposition 1.1 (to prove that Φ takes values in S), we see that for every ℓ > j such that u_ℓ is a descendant of u_j, we have x_ℓ − x_j ≥ 0, whereas on the other hand x_k − x_j = −1 if k is the first index k > j such that u_k is not a descendant of u_j (or k = #(t) if there is no such index). This completes the proof.
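The relation between the height function and the Lukasiewicz path, namely h_t(n) = #{ j < n : x_j = min_{j≤l≤n} x_l }, is easy to verify mechanically. The following sketch (our own illustration, reusing the tuple representation of plane trees) computes both sides:

```python
def lukasiewicz_path(tree):
    """x_0 = 0 and x_{n+1} = x_n + k_{u_n}(t) - 1, the vertices of the
    plane tree (a set of label tuples) taken in lexicographical order."""
    x = [0]
    for u in sorted(tree):
        k_u = 0
        while u + (k_u + 1,) in tree:    # count the children of u
            k_u += 1
        x.append(x[-1] + k_u - 1)
    return x

def height_from_path(x):
    """h_t(n) = #{ j in {0,...,n-1} : x_j = min_{j<=l<=n} x_l }."""
    return [sum(1 for j in range(n) if x[j] == min(x[j:n + 1]))
            for n in range(len(x) - 1)]

t = {(), (1,), (1, 1), (1, 2), (2,)}
print(lukasiewicz_path(t))                      # [0, 1, 2, 1, 0, -1]
print(height_from_path(lukasiewicz_path(t)))    # [0, 1, 2, 2, 1]
```

The second printed list coincides with the sequence of generations |u_0|, . . . , |u_{#(t)−1}|, as Proposition 1.2 asserts.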

Galton-Watson trees
Let µ be a critical or subcritical offspring distribution. This means that µ is a probability measure on Z_+ such that
Σ_{k=0}^∞ k µ(k) ≤ 1.
We exclude the trivial case where µ(1) = 1. We will make use of the following explicit construction of Galton-Watson trees: Let (K_u, u ∈ U) be a collection of independent random variables with law µ, indexed by the label set U. Denote by θ the random subset of U defined by
θ = { u = (u_1, . . . , u_n) ∈ U : u_j ≤ K_{(u_1,...,u_{j−1})} for every 1 ≤ j ≤ n }.
Proposition 1.3 θ is a.s. a (finite) tree. Moreover, if
Z_n = #{u ∈ θ : |u| = n},
then (Z_n, n ≥ 0) is a Galton-Watson process with offspring distribution µ and initial value Z_0 = 1.
The tree θ, or any random tree with the same distribution, will be called a Galton-Watson tree with offspring distribution µ, or in short a µ-Galton-Watson tree. We also write Π µ for the distribution of θ on the space A.
We leave the easy proof of the proposition to the reader. The finiteness of the tree θ comes from the fact that the Galton-Watson process with offspring distribution µ becomes extinct a.s., so that Z n = 0 for n large.
If t is a tree and 1 ≤ j ≤ k_∅(t), we write T_j t for the tree t shifted at j:
T_j t = { u ∈ U : ju ∈ t }.
Note that T_j t is a tree.
Then Π_µ may be characterized by the following two properties (see e.g. [34] for more general statements):
(i) Π_µ(k_∅ = j) = µ(j) for every j ≥ 0.
(ii) For every j ≥ 1 with µ(j) > 0, the shifted trees T_1 t, . . . , T_j t are independent under the conditional probability Π_µ(dt | k_∅ = j) and their conditional distribution is Π_µ.
Property (ii) is often called the branching property of the Galton-Watson tree.
We now give an explicit formula for Π_µ.
Proposition 1.4 For every t ∈ A,
Π_µ(t) = ∏_{u∈t} µ(k_u(t)).
Proof. We can easily check from the preceding construction that
P(θ = t) = ∏_{u∈t} µ(k_u(t)),
which gives the desired formula.
Recall from Proposition 1.1 the definition of the mapping Φ.
Proposition 1.5 Let θ be a µ-Galton-Watson tree. Then
Φ(θ) (d)= ( M_1 , M_2 , . . . , M_T ),
where the random variables M_1, M_2, . . . are independent with distribution µ, and
T = inf{ n ≥ 1 : M_1 + · · · + M_n = n − 1 }.
Remark. The fact that T < ∞ a.s. is indeed a consequence of our approach, but it is also easy to prove directly by a martingale argument.
Proof. We may assume that θ is given by the preceding explicit construction. Write U_0 = ∅, U_1, . . . , U_{#(θ)−1} for the elements of θ listed in lexicographical order, in such a way that
Φ(θ) = ( K_{U_0} , K_{U_1} , . . . , K_{U_{#(θ)−1}} ).
We already know that K_{U_0} + · · · + K_{U_n} ≥ n + 1 for every n ∈ {0, 1, . . . , #(θ) − 2}, and K_{U_0} + · · · + K_{U_{#(θ)−1}} = #(θ) − 1. It will be convenient to also define U_p for p ≥ #(θ), for instance by setting
U_p = U_{#(θ)−1} 1 1 · · · 1,
where in the right-hand side we have added p − #(θ) + 1 labels 1. Then the proof of the proposition reduces to checking that, for every p ≥ 0, the variables K_{U_0}, . . . , K_{U_p} are independent with distribution µ. The point is that the labels U_j are random (they depend on the collection (K_u, u ∈ U)) and so we cannot just use the fact that the variables K_u, u ∈ U, are i.i.d. with distribution µ. We argue by induction on p. For p = 0 or p = 1, the result is obvious since U_0 = ∅ and U_1 = 1 are deterministic. Fix p ≥ 2 and assume that the desired result holds at order p − 1. Use the notation u ≤ v for the lexicographical order on U (in contrast with u ≺ v for the genealogical order!). As usual u < v if u ≤ v and u ≠ v. The point is to observe that, for every fixed u ∈ U, the random set
θ ∩ { v ∈ U : v ≤ u }
is measurable with respect to the σ-field σ(K_v, v < u). This readily follows from the construction of θ. As a consequence, for every fixed u ∈ U and every j ≥ 0, the event
{ U_j = u }
is measurable with respect to σ(K_v, v < u). It is also easy to see that the same measurability property holds for the event
{ U_0 = u_0 , U_1 = u_1 , . . . , U_p = u_p }, u_0 < u_1 < · · · < u_p,
with respect to σ(K_v, v < u_p). Finally, if g_0, g_1, . . . , g_p are nonnegative functions on {0, 1, . . .},
E[ g_0(K_{U_0}) · · · g_p(K_{U_p}) ]
= Σ_{u_0<···<u_p} E[ 1_{{U_0=u_0,...,U_p=u_p}} g_0(K_{u_0}) · · · g_{p−1}(K_{u_{p−1}}) g_p(K_{u_p}) ]
= Σ_{u_0<···<u_p} E[ 1_{{U_0=u_0,...,U_p=u_p}} g_0(K_{u_0}) · · · g_{p−1}(K_{u_{p−1}}) ] E[ g_p(K_{u_p}) ],
because K_{u_p} is independent of σ(K_v, v < u_p), and we use the preceding measurability property. Then E[g_p(K_{u_p})] = µ(g_p) does not depend on u_p, and taking g_p = 1 in the preceding formula we see that
E[ g_0(K_{U_0}) · · · g_p(K_{U_p}) ] = µ(g_p) E[ g_0(K_{U_0}) · · · g_{p−1}(K_{U_{p−1}}) ].
An application of the induction assumption completes the proof.
Corollary 1.6 Let (S_n, n ≥ 0) be a random walk on Z with initial value S_0 = 0 and jump distribution ν(k) = µ(k + 1) for every k ≥ −1.
Set
T = inf{ n ≥ 0 : S_n = −1 }.
Then the Lukasiewicz path of a µ-Galton-Watson tree θ has the same distribution as (S_0, S_1, . . . , S_T). In particular, #(θ) and T have the same distribution. This is an immediate consequence of the preceding proposition.
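Corollary 1.6 yields a convenient way of sampling a µ-Galton-Watson tree: run the walk with jump law ν until it first hits −1. A minimal Python sketch (our own illustration; the offspring law is passed as a list of probabilities, and criticality guarantees that the walk hits −1 a.s.):

```python
import random

def sample_lukasiewicz_path(mu, rng):
    """Sample (S_0, ..., S_T), where S has jumps M_i - 1 with M_i i.i.d.
    of law `mu` (a list, mu[k] = mu(k)) and T = inf{n : S_n = -1}.
    By Corollary 1.6, this is the Lukasiewicz path of a mu-GW tree."""
    s, path = 0, [0]
    while s >= 0:
        m = rng.choices(range(len(mu)), weights=mu)[0]
        s += m - 1
        path.append(s)
    return path

rng = random.Random(2024)
mu = [0.25, 0.5, 0.25]                   # critical: mean offspring = 1
sizes = []
for _ in range(200):
    path = sample_lukasiewicz_path(mu, rng)
    assert path[-1] == -1 and min(path[:-1]) >= 0   # a Lukasiewicz path
    sizes.append(len(path) - 1)          # #(theta) = T
```

Every sampled path satisfies the defining properties of a Lukasiewicz path, and len(path) − 1 realizes the total progeny #(θ).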

Convergence to Brownian motion
Our goal is to show that the height functions (or contour functions) of Galton-Watson trees (resp. of Galton-Watson forests) converge in distribution, modulo a suitable rescaling, towards Brownian excursions (resp. reflected Brownian motions).
We fix a critical offspring distribution µ with finite variance σ^2 > 0. Note that criticality means that we now have
Σ_{k=0}^∞ k µ(k) = 1.
Let θ_1, θ_2, . . . be a sequence of independent µ-Galton-Watson trees. With each θ_i we can associate its height function (h_{θ_i}(n), 0 ≤ n ≤ #(θ_i) − 1). We then define the height process (H_n, n ≥ 0) of the forest by concatenating the functions h_{θ_1}, h_{θ_2}, . . .:
H_n = h_{θ_i}( n − (#(θ_1) + · · · + #(θ_{i−1})) ), if #(θ_1) + · · · + #(θ_{i−1}) ≤ n < #(θ_1) + · · · + #(θ_i).
Clearly, the function (H_n, n ≥ 0) determines the sequence of trees. To be specific, the "k-th excursion" of H from 0 (more precisely, the values of H between its k-th zero and the next one) is the height function of the k-th tree in the sequence. By combining Corollary 1.6 with Proposition 1.2, we arrive at the following result (cf Corollary 2.2 in [29]).
Proposition 1.7 We have for every n ≥ 0
H_n = #{ k ∈ {0, 1, . . . , n − 1} : S_k = inf_{k≤j≤n} S_j },   (1)
where (S_n, n ≥ 0) is a random walk with the distribution described in Corollary 1.6.
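The identity of Proposition 1.7, H_n = #{ k < n : S_k = inf_{k≤j≤n} S_j }, can be checked mechanically on a small forest. The following sketch (our own illustration, with trees represented as sets of label tuples) builds both sides directly from the definitions:

```python
def height_process(forest):
    """Concatenation of the height functions of a list of plane trees
    (each tree a set of label tuples, vertices in lexicographical order)."""
    return [len(u) for tree in forest for u in sorted(tree)]

def forest_walk(forest):
    """The walk S of Proposition 1.7: concatenation of the Lukasiewicz
    paths of the trees, the i-th one run from level -(i - 1)."""
    s, walk = 0, [0]
    for tree in forest:
        for u in sorted(tree):
            k_u = 0
            while u + (k_u + 1,) in tree:
                k_u += 1
            s += k_u - 1
            walk.append(s)
    return walk

forest = [{(), (1,), (1, 1), (1, 2), (2,)}, {()}, {(), (1,)}]
H, S = height_process(forest), forest_walk(forest)
# H_n = #{ k in {0,...,n-1} : S_k = inf_{k<=j<=n} S_j }
assert all(H[n] == sum(1 for k in range(n) if S[k] == min(S[k:n + 1]))
           for n in range(len(H)))
```

The assertion holds for any finite forest, in agreement with the proposition.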
This is the main ingredient for the proof of the following theorem. By definition, a reflected Brownian motion (started at the origin) is the absolute value of a standard linear Brownian motion started at the origin. The notation [x] refers to the integer part of x.
Theorem 1.8 Let θ_1, θ_2, . . . be a sequence of independent µ-Galton-Watson trees, and let (H_n, n ≥ 0) be the associated height process. Then
( n^{−1/2} H_{[nt]} ; t ≥ 0 ) (d)→ ( (2/σ) γ_t ; t ≥ 0 ),
where γ is a reflected Brownian motion. The convergence holds in the sense of weak convergence on the Skorokhod space D(R_+, R_+).
Let us establish the weak convergence of finite-dimensional marginals in the theorem.
Let S = (S_n, n ≥ 0) be as in Proposition 1.7. Note that the jump distribution ν has mean 0 and finite variance σ^2, and thus the random walk S is recurrent. We also introduce the notation
M_n = sup_{0≤k≤n} S_k , I_n = inf_{0≤k≤n} S_k.
Donsker's invariance theorem gives
( n^{−1/2} S_{[nt]} ; t ≥ 0 ) (d)→ ( σ B_t ; t ≥ 0 ),   (2)
where B is a standard linear Brownian motion started at the origin. For every n ≥ 0, introduce the time-reversed random walk Ŝ^n defined by
Ŝ^n_k = S_n − S_{(n−k)_+} , 0 ≤ k ≤ n,
and note that (Ŝ^n_k, 0 ≤ k ≤ n) has the same distribution as (S_k, 0 ≤ k ≤ n). From formula (1), we have
H_n = Φ_n(Ŝ^n),
where for any discrete trajectory ω = (ω(0), ω(1), . . .) we have set
Φ_n(ω) = #{ k ∈ {1, . . . , n} : ω(k) = sup_{0≤j≤k} ω(j) }.
We also set K_n = Φ_n(S) = #{ k ∈ {1, . . . , n} : S_k = M_k }.
Lemma 1.9 Define a sequence of stopping times T_j, j = 0, 1, . . . inductively by setting T_0 = 0 and, for every j ≥ 1,
T_j = inf{ n > T_{j−1} : S_n = M_n }.
Then the random variables S_{T_j} − S_{T_{j−1}}, j = 1, 2, . . . are independent and identically distributed, with distribution
P( S_{T_1} = k ) = ν([k, ∞)) , k ≥ 0.
Proof. The fact that the random variables S_{T_j} − S_{T_{j−1}}, j = 1, 2, . . . are independent and identically distributed is a straightforward consequence of the strong Markov property. It remains to compute the distribution of S_{T_1}. The invariant measure of the recurrent random walk S is the counting measure on Z. By a standard result, if R_0 = inf{n ≥ 1 : S_n = 0}, we have for every i ∈ Z,
E[ Σ_{n=0}^{R_0−1} 1_{{S_n=i}} ] = 1.
Notice that T_1 ≤ R_0 and that the random walk takes positive values on ]T_1, R_0[. It easily follows that for every i ≤ 0,
E[ Σ_{n=0}^{T_1−1} 1_{{S_n=i}} ] = 1.
Therefore, for any nonnegative function g on Z,
E[ Σ_{n=0}^{T_1−1} g(S_n) ] = Σ_{i≤0} g(i).   (3)
Then, for any nonnegative function f on Z, writing h_f(i) = Σ_{j≥−i} ν(j) f(i + j), and noting that on {k < T_1} we have M_k = 0, so that T_1 = k + 1 if and only if S_{k+1} ≥ 0,
E[ f(S_{T_1}) ] = E[ Σ_{k=0}^∞ 1_{{T_1=k+1}} f(S_{k+1}) ]
= Σ_{k=0}^∞ E[ 1_{{k<T_1}} 1_{{S_{k+1}≥0}} f(S_{k+1}) ]
= Σ_{k=0}^∞ E[ 1_{{k<T_1}} h_f(S_k) ]
= Σ_{i≤0} h_f(i)
= Σ_{k≥0} ν([k, ∞)) f(k),
which gives the desired formula. In the third equality we used the Markov property at time k and in the fourth one we applied (3).
Note that the distribution of S_{T_1} has a finite first moment:
E[ S_{T_1} ] = Σ_{k=0}^∞ k ν([k, ∞)) = σ^2/2.
The next lemma is the key to the first part of the proof.
Lemma 1.10 For every t ≥ 0,
n^{−1/2} | H_{[nt]} − (2/σ^2)( S_{[nt]} − I_{[nt]} ) | (P)→ 0,
where the notation (P)→ means convergence in probability.
Proof. From our definitions, we have
S_{T_{K_n}} = M_n and T_{K_n} ≤ n < T_{K_n+1}.   (4)
Using Lemma 1.9 and the law of large numbers (note that K_n −→ ∞ a.s.), we get
M_n / K_n −→ σ^2/2 , a.s.
By replacing S with the time-reversed walk Ŝ^n we see that, for every n, the pair (M_n, K_n) has the same distribution as (S_n − I_n, H_n). Hence the previous convergence entails
( S_n − I_n ) / H_n (P)→ σ^2/2,
and the lemma follows. From (2), we have for every choice of 0 ≤ t_1 ≤ t_2 ≤ · · · ≤ t_m,
( n^{−1/2}(S_{[nt_i]} − I_{[nt_i]}) )_{1≤i≤m} (d)→ ( σ( B_{t_i} − inf_{s≤t_i} B_s ) )_{1≤i≤m}.
Therefore it follows from Lemma 1.10 that
( n^{−1/2} H_{[nt_i]} )_{1≤i≤m} (d)→ ( (2/σ)( B_{t_i} − inf_{s≤t_i} B_s ) )_{1≤i≤m}.
However, a famous theorem of Lévy states that the process
( B_t − inf_{s≤t} B_s ; t ≥ 0 )
is a reflected Brownian motion. This completes the proof of the convergence of finite-dimensional marginals in Theorem 1.8.
We will now discuss the functional convergence in Theorem 1.8. To this end, we will need more precise estimates. We will give details of the argument in the case when µ has small exponential moments, that is, when there exists λ > 0 such that
Σ_{k=0}^∞ e^{λk} µ(k) < ∞.
Our approach in that case is inspired by [31]. See [28] for a proof in the general case. We first state a lemma.
Lemma 1.11 Let ε ∈ (0, 1/4). We can find ε′ > 0 and an integer N ≥ 1 such that, for every n ≥ N and ℓ ∈ {0, 1, . . . , n},
P( | K_ℓ − (2/σ^2) M_ℓ | > n^{1/4+ε} ) ≤ exp(−n^{ε′}).
We postpone the proof of the lemma and complete the proof of Theorem 1.8. We apply Lemma 1.11 with ε = 1/8. Since for every n the pair (M_n, K_n) has the same distribution as (S_n − I_n, H_n), we get that, for every sufficiently large n and every ℓ ∈ {0, 1, . . . , n},
P( | H_ℓ − (2/σ^2)(S_ℓ − I_ℓ) | > n^{3/8} ) ≤ exp(−n^{ε′}).   (5)
Let A ≥ 1 be a fixed integer. We deduce from the preceding bound that, for every p sufficiently large,
P( sup_{ℓ≤Ap} | H_ℓ − (2/σ^2)(S_ℓ − I_ℓ) | > p^{3/8} ) ≤ (Ap + 1) exp(−p^{ε′}).
A simple application of the Borel-Cantelli lemma gives
p^{−1/2} sup_{ℓ≤Ap} | H_ℓ − (2/σ^2)(S_ℓ − I_ℓ) | −→ 0 , a.s.
It is now clear that the theorem follows from the convergence
( (2/σ^2) n^{−1/2}( S_{[nt]} − I_{[nt]} ) ; t ≥ 0 ) (d)→ ( (2/σ) γ_t ; t ≥ 0 ),
which is an immediate consequence of (2).
We still have to prove Lemma 1.11. We first state a very simple "moderate deviations" lemma for sums of independent random variables.
Lemma 1.12 Let Y_1, Y_2, . . . be a sequence of i.i.d. real random variables. We assume that there exists a number λ > 0 such that E[exp(λ|Y_1|)] < ∞, and that E[Y_1] = 0. Then, for every α > 0, we can choose N sufficiently large so that, for every n ≥ N and ℓ ∈ {1, 2, . . . , n},
P( | Y_1 + · · · + Y_ℓ | > n^{1/2+α} ) ≤ exp( −n^{α/2} ).
Proof. The assumption implies that E[e^{λY_1}] = 1 + cλ^2 + o(λ^2) as λ → 0, where c = (1/2) var(Y_1). Hence we can find a constant C such that, for every sufficiently small λ > 0,
E[ e^{λY_1} ] ≤ exp( Cλ^2 ).
It follows that, for every sufficiently small λ > 0,
P( Y_1 + · · · + Y_ℓ > n^{1/2+α} ) ≤ e^{−λn^{1/2+α}} E[ e^{λY_1} ]^ℓ ≤ exp( Cλ^2 n − λ n^{1/2+α} ).
If n is sufficiently large we can take λ = n^{−1/2} and the desired result follows (after also replacing Y_i with −Y_i).
Let us now prove Lemma 1.11. We choose α ∈ (0, ε/2) and to simplify notation we put m_n = [n^{1/2+α}]. Then, for every ℓ ∈ {0, 1, . . . , n},
P( | K_ℓ − (2/σ^2) M_ℓ | > n^{1/4+ε} ) ≤ P( K_ℓ ≤ m_n , | K_ℓ − (2/σ^2) M_ℓ | > n^{1/4+ε} ) + P( K_ℓ > m_n ).   (6)
Recalling (4), we have first
P( K_ℓ ≤ m_n , | K_ℓ − (2/σ^2) M_ℓ | > n^{1/4+ε} ) ≤ P( max_{j≤m_n} | S_{T_j} − (σ^2/2) j | > (σ^2/2) n^{1/4+ε} ) ≤ exp(−n^{ε′}),
where the last bound holds for n large by Lemma 1.12. Note that we are assuming that µ has small exponential moments, and the same then holds for the law of S_{T_1} by Lemma 1.9, which allows us to apply Lemma 1.12.
We still need to bound the second term in the right-hand side of (6). Plainly,
P( K_ℓ > m_n ) ≤ P( T_{m_n} ≤ n ) ≤ P( S_{T_{m_n}} ≤ M_n ) ≤ P( M_n ≥ (σ^2/4) m_n ) + P( S_{T_{m_n}} ≤ (σ^2/4) m_n ).
Applying again Lemma 1.12, we get that for n large,
P( M_n ≥ (σ^2/4) m_n ) ≤ (n + 1) exp(−n^{ε′}).
Finally,
P( S_{T_{m_n}} ≤ (σ^2/4) m_n ) ≤ P( | S_{T_{m_n}} − (σ^2/2) m_n | ≥ (σ^2/4) m_n ),
and since S_{T_{m_n}} − (σ^2/2) m_n is the sum of m_n i.i.d. centered random variables having small exponential moments, we can again apply Lemma 1.12 (or a classical large deviations estimate) to get the needed bound. This completes the proof of Lemma 1.11.

Some applications
Let us first recall some important properties of linear Brownian motion. Let β be a standard linear Brownian motion started at 0. Then there exists a continuous increasing process (L^0_t = L^0_t(β), t ≥ 0), called the local time of β at 0, such that, if N_ε(t) denotes the number of positive excursions of β away from 0 with height greater than ε and completed before time t, one has
lim_{ε→0} 2ε N_ε(t) = L^0_t for every t ≥ 0, a.s.
The topological support of the measure dL^0_t coincides a.s. with the zero set {t ≥ 0 : β_t = 0}. Moreover, the above-mentioned Lévy theorem can be strengthened in the form
( B_t − inf_{0≤s≤t} B_s , − inf_{0≤s≤t} B_s ; t ≥ 0 ) (d)= ( |β_t| , L^0_t(β) ; t ≥ 0 ).
See e.g. [38], Chapter VI, Theorem VI.2.3. Keeping the notation of subsection 1.3, we set for every n ≥ 0,
Λ_n = k if and only if #(θ_1) + · · · + #(θ_{k−1}) ≤ n < #(θ_1) + · · · + #(θ_k),
in such a way that Λ_n is the index of the tree to which the n-th visited vertex belongs.
The convergence stated in Theorem 1.8 can now be strengthened in the following form:
( ( n^{−1/2} H_{[nt]} , n^{−1/2} Λ_{[nt]} ) ; t ≥ 0 ) (d)→ ( ( (2/σ) |β_t| , σ L^0_t ) ; t ≥ 0 ).   (7)
This is a simple consequence of our arguments: It is easily seen that
Λ_n = 1 − I_n.
On the other hand, we saw that, for every A > 0,
n^{−1/2} sup_{ℓ≤An} | H_ℓ − (2/σ^2)( S_ℓ − I_ℓ ) | −→ 0 , a.s.
Combining with Donsker's theorem, we get
( ( (2/σ^2) n^{−1/2}( S_{[nt]} − I_{[nt]} ) , −n^{−1/2} I_{[nt]} ) ; t ≥ 0 ) (d)→ ( ( (2/σ)( B_t − inf_{s≤t} B_s ) , −σ inf_{s≤t} B_s ) ; t ≥ 0 ),
and an application of Lévy's theorem in the form recalled above yields (7). We will now apply (7) to study the asymptotics of a single Galton-Watson tree conditioned to be large. We write h(θ) = sup{|v| : v ∈ θ} for the height of the tree θ. Let us fix x > 0 and for every integer p ≥ 1 denote by θ^{(x√p)} a random tree with distribution
Π_µ( dt | h(t) > x√p ),
where we recall that Π_µ is the law of the Galton-Watson tree with offspring distribution µ.
We denote by H^{(x√p)} the height function of θ^{(x√p)}, with the convention that H^{(x√p)}_n = 0 for n ≥ #(θ^{(x√p)}).
Corollary 1.13 We have
( p^{−1/2} H^{(x√p)}_{[pt]} ; t ≥ 0 ) (d)→ ( (2/σ) e^{σx/2}_t ; t ≥ 0 ),
where e^{σx/2} is a Brownian excursion conditioned to have height greater than σx/2.
The excursion e^{σx/2} can be constructed explicitly in the following way. Set
T = inf{ t ≥ 0 : |β_t| > σx/2 } , G = sup{ t ≤ T : β_t = 0 } , D = inf{ t ≥ T : β_t = 0 }.
Then we may take e^{σx/2}_t = |β_{(G+t)∧D}|.
Proof. By (7) and the Skorokhod representation theorem, we may assume that
( p^{−1/2} H_{[pt]} , p^{−1/2} Λ_{[pt]} ) −→ ( (2/σ) |β_t| , σ L^0_t )   (8)
uniformly on every compact set, a.s. Set
T^{(p)} = p^{−1} inf{ n ≥ 0 : H_n > x√p } , G^{(p)} = p^{−1} sup{ n ≤ pT^{(p)} : H_n = 0 } − p^{−1} , D^{(p)} = p^{−1} inf{ n ≥ pT^{(p)} : H_n = 0 }.
The reason for the term −p^{−1} in the formula for G^{(p)} comes from the integer part in p^{−1/2}H_{[pt]}: We want the process ( p^{−1/2} H_{[p((G^{(p)}+t)∧D^{(p)})]} , t ≥ 0 ) to have the distribution of the (rescaled) height process of the first tree in the sequence θ_1, θ_2, . . . with height greater than x√p, which is distributed as θ^{(x√p)}. Therefore, it is enough to prove the convergence of ( p^{−1/2} H_{[p((G^{(p)}+t)∧D^{(p)})]} , t ≥ 0 ) towards ( (2/σ)|β_{(G+t)∧D}| , t ≥ 0 ). The corollary will thus follow from (8) if we can prove that
T^{(p)} −→ T , G^{(p)} −→ G , D^{(p)} −→ D , a.s.
Using the fact that immediately after time T the process |β| hits levels strictly larger than σx/2, we easily get from (8) that
T^{(p)} −→ T , a.s.
Now note that L^0_D = L^0_T. Write Λ^{(p)}_t = p^{−1/2} Λ_{[pt]}. Since T^{(p)} converges to T, (8) also shows that Λ^{(p)}_{T^{(p)}} converges to σL^0_T, and it follows that for p large,
Λ^{(p)}_t > Λ^{(p)}_{T^{(p)}} , a.s. on {D < t}.
Observing that Λ^{(p)} stays constant on the interval [T^{(p)}, D^{(p)}), we conclude that D^{(p)} < t for every p sufficiently large, a.s. on {D < t}. This is enough to prove that lim sup D^{(p)} ≤ D a.s.
Replacing p by p^2 and taking x = 1, we deduce from the corollary (in fact rather from its proof) that
p^{−2} #(θ^{(p)}) (d)→ ζ_{σ/2} ,
where ζ_{σ/2} = D − G is the length of the excursion e^{σ/2}. Indeed this immediately follows from the convergence of D^{(p)} − G^{(p)} towards D − G and the fact that, by construction,
#(θ^{(p)}) = p^2 ( D^{(p)} − G^{(p)} )
in the notation of the preceding proof.
Notice that the Laplace transform of the limiting law is known explicitly: it can be derived from the Williams decomposition of Itô's excursion measure (Theorem XII.4.5 in [38]) together with the known formulas for the hitting time of σ/2 by a three-dimensional Bessel process or by a linear Brownian motion started at 0 (see e.g. [38]).
Exercise. Show the convergence in distribution of p^{−1} h(θ^{(p)}) and identify the limiting law.
We will now discuss "occupation measures". Rather than considering a single tree as above, we will be interested in a finite forest whose size will tend to ∞ with p. Precisely, we fix b > 0, and we set
H^p_n = H_n , 0 ≤ n < τ^p := inf{ k ≥ 0 : Λ_k > [bp] },
in such a way that H^p is the height process for a collection of [bp] independent Galton-Watson trees. Then it easily follows from (7) that, for every nonnegative continuous function f with compact support on R_+,
p^{−2} Σ_{n=0}^{τ^p−1} f( p^{−1} H^p_n ) (d)→ ∫_0^{τ_{b/σ}} f( (2/σ) |β_t| ) dt ,   (9)
where, for every r > 0,
τ_r = inf{ t ≥ 0 : L^0_t > r }.
Indeed, we can write
p^{−2} Σ_{n=0}^{τ^p−1} f( p^{−1} H^p_n ) = ∫_0^{p^{−2}τ^p} f( p^{−1} H_{[p^2 t]} ) dt + O(p^{−2}).
Then we observe from (7) that
p^{−2} τ^p (d)→ τ_{b/σ} ,
jointly with the convergence of the rescaled height process, and (9) follows. Taking b = 1, we deduce from (9) that, for every x > 0,
lim_{p→∞} P( sup_{n<τ^p} H^p_n < xp ) = P( sup_{t≤τ_{1/σ}} (2/σ) |β_t| < x ) = exp( −2/(σ^2 x) ).
The last equality is a simple consequence of excursion theory for linear Brownian motion (see e.g. Chapter XII in [38]). Now obviously
P( sup_{n<τ^p} H^p_n < xp ) = ( 1 − P( h(θ) ≥ xp ) )^p ,
and we recover the classical fact in the theory of branching processes:
lim_{n→∞} n P( h(θ) ≥ n ) = 2/σ^2.
We now set Z^p_0 = p and, for every n ≥ 1,
Z^p_n = #{ u ∈ θ_1 ∪ · · · ∪ θ_p : |u| = n } ,
where θ_1, . . . , θ_p are viewed as disjoint. From Proposition 1.3, we know that (Z^p_n, n ≥ 0) is a Galton-Watson branching process with offspring distribution µ, started at p. We can thus apply the classical diffusion approximation to this process.
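The classical branching-process estimate just recovered, n P(h(θ) ≥ n) → 2/σ^2, can be checked numerically without any simulation: the tail of the height satisfies the exact recursion P(h(θ) < m + 1) = g(P(h(θ) < m)), where g is the offspring generating function. A small Python sketch (our own illustration), for the geometric offspring distribution µ(k) = 2^{−k−1}, for which g(s) = 1/(2 − s) and σ^2 = 2, so the limit should be 1:

```python
def prob_height_ge(n, g):
    """P(h(theta) >= n) for a Galton-Watson tree with offspring
    generating function g, via P(h < m + 1) = g(P(h < m))."""
    u = 0.0                              # u = P(h < 0) = 0
    for _ in range(n):
        u = g(u)                         # after m steps, u = P(h < m)
    return 1.0 - u

g = lambda s: 1.0 / (2.0 - s)            # geometric offspring mu(k) = 2^{-k-1}
for n in (10, 100, 1000):
    print(n, n * prob_height_ge(n, g))   # approaches 2/sigma^2 = 1
```

In this particular case the recursion can be solved in closed form, P(h(θ) ≥ n) = 1/(n + 1), which makes the convergence transparent.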
Theorem 1.14 We have
( p^{−1} Z^p_{[pt]} ; t ≥ 0 ) (d)→ ( X_t ; t ≥ 0 ),
where the limiting process X is a diffusion process with infinitesimal generator (1/2) σ^2 x (d^2/dx^2), which can be obtained as the unique solution of the stochastic differential equation
dX_t = σ √(X_t) dB_t , X_0 = 1.
For a proof, see e.g. Theorem 9.1.3 in Ethier and Kurtz [16]. It is easy to see that, for every p ≥ 1, the process ( p^{−1} Z^p_k , k ≥ 0 ) is a martingale, which strongly suggests that the limiting process X is of the form stated in the theorem.
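The martingale property of p^{−1}Z^p_k noted above can be verified exactly, with rational arithmetic, by propagating the law of the Galton-Watson process one generation at a time (a small self-contained sketch of ours, for a critical offspring law on {0, 1, 2}):

```python
from fractions import Fraction

mu = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # critical

def convolve(d1, d2):
    """Law of the sum of two independent integer-valued variables."""
    out = {}
    for a, pa in d1.items():
        for b, pb in d2.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out

def next_generation(dist):
    """Exact law of Z_{n+1} from the law of Z_n: given Z_n = z, the next
    generation is the sum of z i.i.d. offspring numbers of law mu."""
    out = {}
    for z, pz in dist.items():
        s = {0: Fraction(1)}
        for _ in range(z):
            s = convolve(s, mu)
        for a, pa in s.items():
            out[a] = out.get(a, Fraction(0)) + pz * pa
    return out

mean = lambda d: sum(z * p for z, p in d.items())
dist = {3: Fraction(1)}                  # Z_0 = p = 3
for n in range(4):
    assert mean(dist) == 3               # E[Z_n] = p: p^{-1}Z is a martingale
    dist = next_generation(dist)
```

Criticality of µ makes the mean of each generation exactly p, which is the (first-moment part of the) martingale property used in the heuristic above.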
The process X is called Feller's branching diffusion. When σ = 2, this is also the zero-dimensional squared Bessel process in the terminology of [38]. Note that X hits 0 in finite time and is absorbed at 0.
To simplify notation, let us fix µ with σ = 2. Let f_1, . . . , f_q be q continuous functions with compact support from R_+ into R_+. As a consequence of (9), we have
( p^{−2} Σ_n f_1( p^{−1} H^p_n ) , . . . , p^{−2} Σ_n f_q( p^{−1} H^p_n ) ) (d)→ ( ∫_0^{τ_{1/2}} f_1(|β_t|) dt , . . . , ∫_0^{τ_{1/2}} f_q(|β_t|) dt ).
On the other hand,
p^{−2} Σ_n f_j( p^{−1} H^p_n ) = p^{−2} Σ_{a=0}^∞ Z^p_a f_j( p^{−1} a ) ≈ ∫_0^∞ p^{−1} Z^p_{[pa]} f_j(a) da.
By using Theorem 1.14, we see that
( ∫_0^{τ_{1/2}} f_1(|β_t|) dt , . . . , ∫_0^{τ_{1/2}} f_q(|β_t|) dt ) (d)= ( ∫_0^∞ f_1(a) X_a da , . . . , ∫_0^∞ f_q(a) X_a da ).
In other words, the occupation measure of |β| over the time interval [0, τ_{1/2}], that is the measure
A ↦ ∫_0^{τ_{1/2}} 1_A(|β_t|) dt ,
has the same distribution as the measure X_a da. We have recovered one of the celebrated Ray-Knight theorems for Brownian local times (see e.g. Theorem XI.2.3 in [38]).

Galton-Watson trees with a fixed progeny
We can also use Theorem 1.8 to recover a famous result of Aldous concerning Galton-Watson trees conditioned to have a large (fixed) number of vertices (see [2]; Aldous dealt with the contour function rather than the height function, but this is more or less equivalent, as we will see in the next subsection). We will follow an idea of [31]. We assume, as at the end of subsection 1.3, that µ has small exponential moments. Our results hold without this assumption, but it will simplify the proof. For every p ≥ 1 we denote by θ^{(p)} a µ-Galton-Watson tree conditioned to have #(θ) = p. For this to make sense we need P(#(θ) = p) > 0 for every p ≥ 1, which holds if µ(1) > 0 (in fact, we only need P(#(θ) = p) > 0 for p large, which holds under an aperiodicity condition on µ). In the notation of subsection 1.2, the distribution of θ^{(p)} is Π_µ(da | #(a) = p).
We denote by (H^{(p)}_n, 0 ≤ n ≤ p) the height function of θ^{(p)}, with the convention H^{(p)}_p = 0. We also need to introduce the normalized Brownian excursion (e_t)_{0≤t≤1}. This is simply the Brownian excursion conditioned to have length 1. For instance, we may look at the first positive excursion of β (away from 0) with length greater than 1, write [G, D] for the corresponding time interval, and set
e_t = (D − G)^{−1/2} β_{G + t(D−G)} , 0 ≤ t ≤ 1.
A more intrinsic construction of the normalized Brownian excursion will be presented in the next subsection.
Theorem 1.15 We have
( p^{−1/2} H^{(p)}_{[pt]} ; 0 ≤ t ≤ 1 ) (d)→ ( (2/σ) e_t ; 0 ≤ t ≤ 1 ).
This is obviously very similar to our previous results Theorem 1.8 and Corollary 1.13. However, because the present conditioning "degenerates in the limit" p → ∞ (there is no Brownian excursion with length exactly equal to 1), we cannot use the same strategy of proof as in Corollary 1.13.
Proof. Let (H n , n ≥ 0) be as in Theorem 1.8 the height process associated with a sequence of independent µ-Galton-Watson trees. We may and will assume that H is given in terms of the random walk S as in (1).
Denote by T_1 the number of vertices of the first tree in the sequence, or equivalently
T_1 = inf{ n ≥ 0 : S_n = −1 }.
A simple combinatorial argument (consider all circular permutations of the p increments of the random walk S over the interval [0, p]) shows that, for every p ≥ 1,
P( T_1 = p ) = (1/p) P( S_p = −1 ).
On the other hand, classical results for random walk (see e.g. P9 in Chapter II of [40]) give
P( S_p = −1 ) ∼ ( σ √(2πp) )^{−1} as p → ∞,
and it follows that
P( T_1 = p ) ∼ ( σ √(2π) )^{−1} p^{−3/2} as p → ∞.   (10)
Recall from the end of the proof of Theorem 1.8 (see (5)) that we can find ε > 0 so that, for p large enough,
P( sup_{n≤p} | H_n − (2/σ^2)( S_n − I_n ) | > p^{3/8} ) ≤ exp(−p^{ε}).
By comparing with (10), we see that we have also, for p large,
P( sup_{n≤p} | H_n − (2/σ^2)( S_n − I_n ) | > p^{3/8}  |  T_1 = p ) ≤ exp(−p^{ε′})
for any ε′ < ε. Since I_n = 0 for 0 ≤ n < T_1, we have also, for p large,
P( sup_{n<p} | H_n − (2/σ^2) S_n | > p^{3/8}  |  T_1 = p ) ≤ exp(−p^{ε′}).
Therefore Theorem 1.15 is a consequence of the last bound and the following lemma, which relates the normalized Brownian excursion to the random walk excursion with a fixed long duration.
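The combinatorial identity P(T_1 = p) = (1/p) P(S_p = −1) used above (a form of the cycle lemma) can be checked exactly for small p by enumerating all offspring sequences; the sketch below (our own illustration, with a critical offspring law on {0, 1, 2} so that the enumeration is finite) uses rational arithmetic:

```python
from fractions import Fraction
from itertools import product

mu = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # critical

def cycle_lemma_check(p):
    """Exact values of P(T_1 = p) and P(S_p = -1), by enumerating all
    offspring sequences (M_1, ..., M_p) with M_i in {0, 1, 2}."""
    p_T, p_S = Fraction(0), Fraction(0)
    for ms in product(mu, repeat=p):
        w = Fraction(1)
        for m in ms:
            w *= mu[m]
        s, first = 0, None
        for n, m in enumerate(ms, 1):
            s += m - 1                   # the walk S_n = M_1+...+M_n - n
            if s == -1 and first is None:
                first = n                # first hitting time of -1
        if s == -1:
            p_S += w
        if first == p:
            p_T += w
    return p_T, p_S

for p in range(1, 7):
    p_T, p_S = cycle_lemma_check(p)
    assert p_T == p_S / p                # P(T_1 = p) = (1/p) P(S_p = -1)
```

The identity holds exactly for every p here, as the circular-permutation argument predicts.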
Lemma 1.16 The law of the suitably rescaled random walk excursion under the conditional probability P(· | T_1 = p) converges as p tends to ∞ to the law of the normalized Brownian excursion.
We omit the proof of this lemma, which can be viewed as a conditional version of Donsker's theorem. See Kaigh [24].
See also Duquesne [10] for generalisations of Theorem 1.15.
Application. We immediately deduce from Theorem 1.15 that, for every x > 0,
lim_{p→∞} P( height(θ^{(p)}) ≥ x √p ) = P( (2/σ) max_{0≤t≤1} e_t ≥ x ).  (11)
There is an explicit (complicated) formula for the right-hand side of (11).
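For the reader's convenience, the right-hand side of (11) can be made explicit through the classical series for the maximum of the normalized excursion, which goes back to Chung and Kennedy; this display is a reminder added here, not part of the original derivation:

```latex
P\Bigl(\max_{0\le t\le 1} e_t > x\Bigr)
  \;=\; 2\sum_{k=1}^{\infty}\bigl(4k^2x^2-1\bigr)\,e^{-2k^2x^2},
  \qquad x>0,
```

so that the limit in (11) is obtained by evaluating this series at σx/2.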
Combinatorial consequences. For several particular choices of µ, the measure Π µ (da | #(a) = p) coincides with the uniform probability measure on a class of "combinatorial trees" with p vertices, and Theorem 1.15 gives information about the proportion of trees in this class that satisfy certain properties.
To make this more explicit, consider first the case when µ is the geometric distribution with parameter 1/2 (µ(k) = 2^{−k−1} for every k ≥ 0). Then Π_µ(da | #(a) = p) is the uniform distribution on the set A_p of all rooted ordered trees with p vertices (this follows from Proposition 1.4). Thus (11) shows that the height of a tree chosen at random in A_p is of order √p, and more precisely it gives the asymptotic proportion of those trees in A_p with height greater than x√p. Similar arguments apply to functionals other than the height: for instance, if 0 ≤ a < b are given, we could derive asymptotics for the number of vertices in the tree between generations a√p and b√p, for the number of vertices at generation [a√p] that have descendants at generation [b√p], etc. The limiting distributions are expressed in terms of the normalized Brownian excursion, and are not always easy to compute explicitly! Similarly, if µ is the Poisson distribution with parameter 1, that is µ(k) = e^{−1}/k!, then Π_µ(da | #(a) = p) yields the uniform distribution on the set of all rooted Cayley trees with p vertices. Let us explain this in more detail. Recall that a Cayley tree (with p vertices) is an unordered tree on vertices labelled 1, 2, . . . , p. The root can then be any of these vertices. By a famous formula due to Cayley, there are p^{p−2} Cayley trees on p vertices, and so p^{p−1} rooted Cayley trees on p vertices. If we start from a rooted ordered tree distributed according to Π_µ(da | #(a) = p), then assign labels 1, 2, . . . , p uniformly at random to the vertices, and finally "forget" the ordering of the tree we started from, we get a random tree that is uniformly distributed over the set of all rooted Cayley trees with p vertices. Hence Theorem 1.15, and in particular (11), also give information about the properties of large Cayley trees.
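The correspondence between the conditioned geometric Galton-Watson tree and the uniform distribution on A_p can be illustrated by a small rejection sampler; this is a sketch with our own naming (gw_geometric_tree and uniform_ordered_tree are not from the text):

```python
import random

def gw_geometric_tree(max_vertices):
    """Sample a Galton-Watson tree with Geometric(1/2) offspring law
    (mu(k) = 2^{-k-1}); return the list of numbers of children in
    lexicographical (depth-first) order, or None if the tree grows
    beyond max_vertices (to abort hopeless attempts early)."""
    children = []
    pending = 1                          # individuals still to reproduce
    while pending > 0:
        if len(children) >= max_vertices:
            return None
        k = 0                            # Geometric(1/2) on {0, 1, 2, ...}
        while random.random() < 0.5:
            k += 1
        children.append(k)
        pending += k - 1
    return children

def uniform_ordered_tree(p, rng_seed=0):
    """Uniform rooted ordered tree with p vertices, by rejection:
    a mu-GW tree conditioned on #(theta) = p is uniform on A_p."""
    random.seed(rng_seed)
    while True:
        t = gw_geometric_tree(max_vertices=p + 1)
        if t is not None and len(t) == p:
            return t

tree = uniform_ordered_tree(10)
assert len(tree) == 10            # p vertices
assert sum(tree) == 9             # any tree has p - 1 edges
```

The same rejection scheme with Poisson(1) offspring numbers would produce, after uniform labelling and forgetting the order, uniform rooted Cayley trees.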
As a last example, we can take µ = (1/2)(δ_0 + δ_2), and it follows from Proposition 1.4 that, provided p is odd, Π_µ(da | #(a) = p) is the uniform distribution over the set of (complete) binary trees with p vertices. Strictly speaking, Theorem 1.15 does not include this case, since we assumed that µ(1) > 0. It is however not hard to check that the convergence of Theorem 1.15 still holds, provided we restrict our attention to odd values of p.
It may seem unexpected that these different classes of combinatorial trees give rise to the same scaling limit, and in particular that the limiting law appearing in (11) is the same in each case. Note however that the constant σ varies: σ² = 1 for (complete) binary trees or for Cayley trees, whereas σ² = 2 for rooted ordered trees.
As a final remark, let us observe that the convergence in distribution of Theorem 1.15 is often not strong enough to deduce rigorously the desired combinatorial asymptotics (this is the case for instance if one looks at the height profile of the tree, that is the number of vertices at every level in the tree). Still Theorem 1.15 allows one to guess what the limit should be in terms of the normalized Brownian excursion. See in particular [9] for asymptotics of the profile that confirmed a conjecture of Aldous.

Convergence of contour functions
In this subsection, we briefly explain how the preceding results can be stated as well in terms of the contour processes of the trees rather than the height processes as discussed above. The contour function of a tree was discussed in subsection 1.1 (see Fig.1). Notice that in contrast to the height process it is convenient to have the contour function indexed by a real parameter.
We will give the result corresponding to Theorem 1.8. So we consider again a sequence θ 1 , θ 2 , . . . of independent µ-Galton-Watson trees and we denote by (C t , t ≥ 0) the process obtained by concatenating the contour functions of θ 1 , θ 2 , . . . Here we need to define precisely what we mean by concatenation.
In Fig. 1 and the discussion of subsection 1.1, the contour function of a tree θ is naturally defined on the time interval [0, ζ(θ)], where ζ(θ) = 2(#(θ) − 1). This has the unpleasant consequence that the contour function of the tree consisting only of the root is trivial. For this reason we make the slightly artificial convention that the contour function of that tree is defined on a time interval of length one, on which it vanishes identically. For every n ≥ 0, we denote by J_n the first time at which the contour process visits the individual n (individuals are enumerated in lexicographical order, one tree after another). Note that the sequence (J_n) is strictly increasing and that J_n ≥ n.
Recall that the value at time n of the height process is the generation of the individual visited at step n, assuming that individuals are visited in lexicographical order, one tree after another. It is easily checked by induction on n that [J_n, J_{n+1}] is exactly the time interval during which the contour process goes from the individual n to the individual n + 1. From this observation, we get
sup_{t ∈ [J_n, J_{n+1}]} |C_t − H_n| ≤ |H_{n+1} − H_n| + 1.
A more precise argument for this bound follows from the explicit formula for C_t in terms of the height process: for t ∈ [J_n, J_{n+1}],
C_t = (H_n − (t − J_n))^+  if t ∈ [J_n, J_{n+1} − 1],
C_t = (H_{n+1} − (J_{n+1} − t))^+  if t ∈ [J_{n+1} − 1, J_{n+1}].
These formulas are easily checked by induction on n. Define a random function ϕ : R_+ −→ {0, 1, . . .} by setting ϕ(t) = n if and only if t ∈ [J_n, J_{n+1}). From the previous bound, we get for every integer m ≥ 1,
sup_{t ≤ m} |C_t − H_{ϕ(t)}| ≤ 1 + sup_{n ≤ m} |H_{n+1} − H_n|.
Similarly, it follows from the definition of J_n that
sup_{t ≤ m} |ϕ(t) − t/2| ≤ (1/2) sup_{n ≤ m} H_n + 1.
Theorem 1.17 We have, in distribution as m → ∞,
( m^{−1/2} C_{2mt} )_{t≥0}  −→  ( (2/σ) |β_t| )_{t≥0} ,
where β is a standard linear Brownian motion.
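The interplay between the height process, the contour function and the first visit times J_n can be checked on a small example; the following sketch (with helper names of our own) encodes a rooted ordered tree by its list of children counts in lexicographical order:

```python
def build(children_counts):
    """Rebuild a rooted ordered tree (nested lists) from the numbers of
    children listed in depth-first (lexicographical) order."""
    it = iter(children_counts)
    def node():
        return [node() for _ in range(next(it))]
    return node()

def height_process(tree):
    """H_n = generation of the n-th individual in lexicographical order."""
    H = []
    def visit(t, h):
        H.append(h)
        for c in t:
            visit(c, h + 1)
    visit(tree, 0)
    return H

def contour(tree):
    """(C_k, id_k): height and label of the vertex occupied at integer
    time k by the particle tracing the contour at unit speed."""
    counter = [0]
    path = []
    def visit(t, h):
        me = counter[0]; counter[0] += 1
        path.append((h, me))
        for c in t:
            visit(c, h + 1)
            path.append((h, me))
    visit(tree, 0)
    return path

t = build([2, 1, 0, 0])          # 4 vertices
H = height_process(t)            # heights [0, 1, 2, 1]
path = contour(t)
assert len(path) == 2 * (4 - 1) + 1
# J_n = first time the contour visits individual n; then C_{J_n} = H_n,
# (J_n) is strictly increasing and J_n >= n, as claimed in the text.
J = [min(k for k, (_, v) in enumerate(path) if v == n) for n in range(4)]
assert all(path[J[n]][0] == H[n] for n in range(4))
assert all(J[n] >= n for n in range(4)) and J == sorted(set(J))
```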
Remark. There is one special case where Theorem 1.17 is easy, without any reference to Theorem 1.8: the case where µ is the geometric distribution µ(k) = 2^{−k−1}, which satisfies our assumptions with σ² = 2. In that case, it is not hard to see that, away from the origin, the contour process (C_n, n ≥ 0) evaluated at integer times behaves like simple random walk (indeed, by the lack-of-memory property of the geometric distribution, the probability that an individual has at least n + 1 children given that he has at least n is 1/2, independently of n). A simple argument then shows that the statement of Theorem 1.17 follows from Donsker's invariance theorem.

Conclusion
The various results of this section show that the rescaled height processes (or contour processes) of large Galton-Watson trees converge in distribution towards Brownian excursions. Still we did not assert that the trees themselves converge. In fact, a precise mathematical formulation of this fact requires a formal definition of what the limiting random trees are and what the convergence means. In the next section, we will give a precise definition of continuous trees and discuss a topology on the space of continuous trees. This will make it possible to reinterpret the results of this section as convergence theorems for random trees.
Bibliographical notes. The coding of discrete trees by contour functions (Dyck paths) or Lukasiewicz words is well known: see e.g. [41]. Theorem 1.8 can be viewed as a variant of Aldous' theorem about the scaling limit of the contour function of Galton-Watson trees [2]. The method presented here is taken from [28], with an additional idea from [31]. More general statements can be found in Chapter 2 of the monograph [11]. See Chapters 5 and 6 of Pitman [36], and the references therein, for more results about the connections between trees, random walks and Brownian motion.

Real Trees and their Coding by Brownian Excursions
In this section, we first describe the formalism of real trees, which can be used to give a precise mathematical meaning to the convergence of rescaled discrete trees towards continuous objects. We then show how a real tree can be coded by a continuous function in a way similar to the coding of discrete trees by their contour functions. Aldous' Continuum Random Tree (the CRT) can be defined as the random real tree coded by a normalized Brownian excursion. For every integer p ≥ 1, we then compute the p-dimensional marginal distribution (that is the law of the reduced tree consisting of the ancestral lines of p individuals chosen uniformly at random) of the tree coded by a Brownian excursion under the Itô excursion measure. Via a conditioning argument, this leads to a simple derivation of the marginal distributions of the CRT.

Real trees
We start with a formal definition. In the present work, we consider only compact real trees, and so we include this compactness property in the definition.
Definition 2.1 A compact metric space (T, d) is a real tree if the following two properties hold for every a, b ∈ T:
(i) There is a unique isometric map f_{a,b} from [0, d(a, b)] into T such that f_{a,b}(0) = a and f_{a,b}(d(a, b)) = b.
(ii) If q is a continuous injective map from [0, 1] into T such that q(0) = a and q(1) = b, we have q([0, 1]) = f_{a,b}([0, d(a, b)]).
A rooted real tree is a real tree (T , d) with a distinguished vertex ρ = ρ(T ) called the root. In what follows, real trees will always be rooted, even if this is not mentioned explicitly.
Let us consider a rooted real tree (T, d). The range of the mapping f_{a,b} in (i) is denoted by [[a, b]] (this is the line segment between a and b in the tree). In particular, [[ρ, a]] is the path going from the root to a, which we will interpret as the ancestral line of vertex a. More precisely, we define a partial order on the tree by setting a ≼ b (a is an ancestor of b) if and only if a ∈ [[ρ, b]]. If a, b ∈ T, there is a unique c ∈ T such that [[ρ, a]] ∩ [[ρ, b]] = [[ρ, c]]. We write c = a ∧ b and call c the most recent common ancestor of a and b.
By definition, the multiplicity of a vertex a ∈ T is the number of connected components of T \{a}. Vertices of T \{ρ} which have multiplicity 1 are called leaves.
Our goal is to study the convergence of random real trees. To this end, it is of course necessary to have a notion of distance between two real trees. We will use the Gromov-Hausdorff distance between compact metric spaces, which has been introduced by Gromov (see e.g. [19]) in view of geometric applications.
If (E, δ) is a metric space, we use the notation δ_Haus(K, K′) for the usual Hausdorff metric between compact subsets of E:
δ_Haus(K, K′) = inf{ε > 0 : K ⊂ U_ε(K′) and K′ ⊂ U_ε(K)},
where U_ε(K) := {x ∈ E : δ(x, K) ≤ ε}. Then, if T and T′ are two rooted compact metric spaces, with respective roots ρ and ρ′, we define the distance d_GH(T, T′) by
d_GH(T, T′) = inf { δ_Haus(ϕ(T), ϕ′(T′)) ∨ δ(ϕ(ρ), ϕ′(ρ′)) },
where the infimum is over all choices of a metric space (E, δ) and all isometric embeddings ϕ : T −→ E and ϕ′ : T′ −→ E. Two rooted compact metric spaces T_1 and T_2 are called equivalent if there is a root-preserving isometry that maps T_1 onto T_2. Obviously d_GH(T, T′) only depends on the equivalence classes of T and T′. Then d_GH defines a metric on the set of all equivalence classes of rooted compact metric spaces (cf. [19] and [17]). We denote by T the set of all (equivalence classes of) rooted real trees.
Theorem 2.1 The metric space (T, d_GH) is complete and separable. We will not really use this theorem (see however the remarks after Lemma 2.4), so we refer the reader to [17], Theorem 1, for a detailed proof.
We will use the following alternative definition of d_GH. First recall that if (T_1, d_1) and (T_2, d_2) are two compact metric spaces, a correspondence between T_1 and T_2 is a subset R of T_1 × T_2 such that for every x_1 ∈ T_1 there exists at least one x_2 ∈ T_2 such that (x_1, x_2) ∈ R and, conversely, for every y_2 ∈ T_2 there exists at least one y_1 ∈ T_1 such that (y_1, y_2) ∈ R. The distortion of the correspondence R is defined by
dis(R) = sup{ |d_1(x_1, y_1) − d_2(x_2, y_2)| : (x_1, x_2), (y_1, y_2) ∈ R }.
Then, if T and T′ are two rooted compact metric spaces with respective roots ρ and ρ′, we have
d_GH(T, T′) = (1/2) inf{ dis(R) : R ∈ C(T, T′), (ρ, ρ′) ∈ R },  (18)
where C(T, T′) denotes the set of all correspondences between T and T′ (see Lemma 2.3 in [17]; actually this lemma is stated for trees, but the proof applies as well to compact metric spaces).

Coding real trees
In this subsection, we describe a method for constructing real trees, which is particularly well suited to our forthcoming applications to random trees. We consider a (deterministic) continuous function g : [0, ∞[ −→ [0, ∞[ with compact support and such that g(0) = 0. To avoid trivialities, we will also assume that g is not identically zero. For every s, t ≥ 0, we set
m_g(s, t) = inf_{r ∈ [s∧t, s∨t]} g(r)
and
d_g(s, t) = g(s) + g(t) − 2 m_g(s, t).
Clearly d_g(s, t) = d_g(t, s), and it is also easy to verify the triangle inequality
d_g(s, u) ≤ d_g(s, t) + d_g(t, u)
for every s, t, u ≥ 0. We then introduce the equivalence relation s ∼ t if and only if d_g(s, t) = 0 (or equivalently g(s) = g(t) = m_g(s, t)), and we let T_g be the quotient space [0, ∞[ / ∼. Obviously the function d_g induces a distance on T_g, and we keep the notation d_g for this distance. We denote by p_g : [0, ∞[ −→ T_g the canonical projection. Clearly p_g is continuous (when [0, ∞[ is equipped with the Euclidean metric and T_g with the metric d_g). We set ρ = p_g(0). If ζ > 0 is the supremum of the support of g, we have p_g(t) = ρ for every t ≥ ζ. In particular, T_g = p_g([0, ζ]) is compact.
Theorem 2.2 The metric space (T_g, d_g) is a real tree. We will view (T_g, d_g) as a rooted tree with root ρ = p_g(0).
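The definitions of m_g and d_g are easy to experiment with numerically; the sketch below (our own naming, with g sampled at integer times) checks symmetry and the triangle inequality, so that d_g is indeed a pseudo-metric:

```python
def m(g, s, t):
    """m_g(s, t): minimum of g over [s ∧ t, s ∨ t] (g sampled on a grid)."""
    lo, hi = min(s, t), max(s, t)
    return min(g[lo:hi + 1])

def d(g, s, t):
    """d_g(s, t) = g(s) + g(t) - 2 m_g(s, t)."""
    return g[s] + g[t] - 2 * m(g, s, t)

# a discretized excursion-like function with g(0) = 0 and compact support
g = [0, 1, 2, 1, 2, 3, 1, 2, 0, 0]
n = len(g)
for s in range(n):
    for t in range(n):
        assert d(g, s, t) == d(g, t, s) >= 0      # symmetry, nonnegativity
        for u in range(n):
            # triangle inequality: d_g is a pseudo-metric
            assert d(g, s, u) <= d(g, s, t) + d(g, t, u)

# distinct times may code the same point of T_g (d_g vanishes):
assert d(g, 3, 6) == 0
```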

Remark.
It is also possible to prove that any (rooted) real tree can be represented in the form T g . We will leave this as an exercise for the reader.
To get an intuitive understanding of Theorem 2.2, the reader should have a look at Figure 2. This figure shows how to construct a simple subtree of T g , namely the "reduced tree" consisting of the union of the ancestral lines in T g of three vertices p g (s), p g (t), p g (u) corresponding to three (given) times s, t, u ∈ [0, ζ]. This reduced tree is the union of the five bold line segments that are constructed from the graph of g in the way explained on the left part of the figure. Notice that the lengths of the horizontal dotted lines play no role in the construction, and that the reduced tree should be viewed as pictured on the right part of Figure 2. The ancestral line of p g (s) (resp. p g (t), p g (u)) is a line segment of length g(s) (resp. g(t), g(u)). The ancestral lines of p g (s) and p g (t) share a common part, which has length m g (s, t) (the line segment at the bottom in the left or the right part of Figure 2), and of course a similar property holds for the ancestral lines of p g (s) and p g (u), or of p g (t) and p g (u).
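The reduced tree of Figure 2 can be read off from the pairwise minima m_g alone; the following sketch (our own naming, g sampled on a grid) checks the key identity m_g(s, u) = min(m_g(s, t), m_g(t, u)) for s ≤ t ≤ u, which expresses that the most recent common ancestor of p_g(s) and p_g(u) sits at height m_g(s, u):

```python
def m(g, s, t):
    """m_g(s, t): minimum of the grid-sampled g over [s, t], s <= t."""
    return min(g[s:t + 1])

g = [0, 2, 1, 3, 2, 4, 1, 2, 0]      # a discretized excursion, g(0) = 0
n = len(g)
# For s <= t <= u the minimum over [s, u] is attained in [s, t] or in
# [t, u]; hence the common ancestor heights behave as in Figure 2.
for s in range(n):
    for t in range(s, n):
        for u in range(t, n):
            assert m(g, s, u) == min(m(g, s, t), m(g, t, u))

# Reduced tree spanned by the three vertices p_g(1), p_g(3), p_g(5):
s, t, u = 1, 3, 5
assert (g[s], g[t], g[u]) == (2, 3, 4)            # three leaf heights
assert (m(g, s, t), m(g, t, u)) == (1, 2)         # two fork heights
assert m(g, s, u) == min(m(g, s, t), m(g, t, u))  # root-most fork
```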
We present below an elementary proof of Theorem 2.2, which uses only the definition of a real tree, and also helps to understand the notions of ancestral line and most recent common ancestor in T g . Another argument depending on Theorem 2.1 is presented at the end of the subsection. See also [18] for a short proof using the characterization of real trees via the so-called four-point condition.
Before proceeding to the proof of the theorem, we state and prove the following root change lemma, which is of independent interest.
Lemma 2.3 Let s_0 ∈ [0, ζ]. For every s ∈ [0, ζ], set s_0 ⊕ s = s_0 + s if s_0 + s ≤ ζ, and s_0 ⊕ s = s_0 + s − ζ otherwise. Define
g′(s) = g(s_0) + g(s_0 ⊕ s) − 2 m_g(s_0 ∧ (s_0 ⊕ s), s_0 ∨ (s_0 ⊕ s))
for every s ∈ [0, ζ], and g′(s) = 0 for s > ζ. Then, the function g′ is continuous with compact support and satisfies g′(0) = 0, so that we can define T_{g′}. Furthermore, for every s, t ∈ [0, ζ], we have
d_{g′}(s, t) = d_g(s_0 ⊕ s, s_0 ⊕ t)  (19)
and there exists a unique isometry R from T_{g′} onto T_g such that, for every s ∈ [0, ζ],
R(p_{g′}(s)) = p_g(s_0 ⊕ s).  (20)
Assuming that Theorem 2.2 is proved, we see that T g ′ coincides with the real tree T g re-rooted at p g (s 0 ). Thus the lemma tells us which function codes the tree T g re-rooted at an arbitrary vertex.
Proof. It is immediately checked that g′ satisfies the same assumptions as g, so that we can make sense of T_{g′}. Then the key step is to verify the relation (19). Consider first the case where s, t ∈ [0, ζ − s_0[ with s ≤ t. Then two possibilities may occur.
If m_g(s_0 + s, s_0 + t) ≥ m_g(s_0, s_0 + s), then m_g(s_0, s_0 + r) = m_g(s_0, s_0 + s) for every r ∈ [s, t], and consequently
m_{g′}(s, t) = g(s_0) + m_g(s_0 + s, s_0 + t) − 2 m_g(s_0, s_0 + s).
It follows that
d_{g′}(s, t) = g′(s) + g′(t) − 2 m_{g′}(s, t) = g(s_0 + s) + g(s_0 + t) − 2 m_g(s_0 + s, s_0 + t) = d_g(s_0 + s, s_0 + t).
If m_g(s_0 + s, s_0 + t) < m_g(s_0, s_0 + s), then the minimum in the definition of m_{g′}(s, t) is attained at r_1, defined as the first r ∈ [s, t] such that g(s_0 + r) = m_g(s_0, s_0 + s) (because for r ∈ [r_1, t] we will have g(s_0 + r) − 2 m_g(s_0, s_0 + r) ≥ −m_g(s_0, s_0 + r) ≥ −m_g(s_0, s_0 + r_1)). Therefore,
m_{g′}(s, t) = g(s_0) − m_g(s_0, s_0 + s),
and since in this case m_g(s_0, s_0 + t) = m_g(s_0 + s, s_0 + t), we again get d_{g′}(s, t) = d_g(s_0 + s, s_0 + t).
The other cases are treated in a similar way and are left to the reader. By (19), if s, t ∈ [0, ζ] are such that d_{g′}(s, t) = 0, we have d_g(s_0 ⊕ s, s_0 ⊕ t) = 0, so that p_g(s_0 ⊕ s) = p_g(s_0 ⊕ t). Noting that T_{g′} = p_{g′}([0, ζ]) (the supremum of the support of g′ is less than or equal to ζ), we can define R in a unique way by the relation (20). From (19), R is an isometry, and it is also immediate that R takes T_{g′} onto T_g.
Proof of Theorem 2.2. Let us start with some preliminaries. For σ, σ′ ∈ T_g, we set σ ≼ σ′ if and only if d_g(σ, σ′) = d_g(ρ, σ′) − d_g(ρ, σ). If σ = p_g(s) and σ′ = p_g(t), it follows from our definitions that σ ≼ σ′ iff m_g(s, t) = g(s). It is immediate to verify that ≼ defines a partial order on T_g.
For any σ_0, σ ∈ T_g, we set
[[σ_0, σ]] = {σ′ ∈ T_g : d_g(σ_0, σ′) + d_g(σ′, σ) = d_g(σ_0, σ)}.
We now prove property (i) of the definition of a real tree. So we fix σ_1 and σ_2 in T_g, and we have to prove existence and uniqueness of the mapping f_{σ_1,σ_2}. By using Lemma 2.3 with s_0 such that p_g(s_0) = σ_1, we may assume that σ_1 = ρ. If σ ∈ T_g is fixed, we have to prove that there exists a unique isometric mapping f = f_{ρ,σ} from [0, d_g(ρ, σ)] into T_g such that f(0) = ρ and f(d_g(ρ, σ)) = σ. We first choose s ≥ 0 such that p_g(s) = σ, so that g(s) = d_g(ρ, σ). Then, for every a ∈ [0, d_g(ρ, σ)], we set
v(a) = inf{r ∈ [0, s] : m_g(r, s) = a}.
Note that g(v(a)) = a. We put f(a) = p_g(v(a)). We have f(0) = ρ and f(d_g(ρ, σ)) = σ, the latter because m_g(v(g(s)), s) = g(s) implies p_g(v(g(s))) = p_g(s) = σ. It is also easy to verify that f is an isometry: if a, b ∈ [0, d_g(ρ, σ)] with a ≤ b, it is immediate that m_g(v(a), v(b)) = a, and so
d_g(f(a), f(b)) = g(v(a)) + g(v(b)) − 2a = b − a.
To get uniqueness, suppose that f̃ is an isometric mapping satisfying the same properties as f. Then, if a ∈ [0, d_g(ρ, σ)],
d_g(f̃(a), σ) = d_g(ρ, σ) − a = d_g(ρ, σ) − d_g(ρ, f̃(a)).
Therefore, f̃(a) ≼ σ. Recall that σ = p_g(s), and choose t such that p_g(t) = f̃(a). Note that g(t) = d_g(ρ, p_g(t)) = a. Since f̃(a) ≼ σ, we have g(t) = m_g(t, s). On the other hand, we also know that a = g(v(a)) = m_g(v(a), s). It follows that we have a = g(t) = g(v(a)) = m_g(v(a), t), and thus d_g(t, v(a)) = 0, so that f̃(a) = p_g(t) = p_g(v(a)) = f(a). This completes the proof of (i).
Once we know that (T_g, d_g) is a real tree, it is straightforward to verify that the notation σ ≼ σ′, [[σ, σ′]], σ ∧ σ′ introduced in the preceding proof is consistent with the definitions of subsection 2.1 stated for a general real tree.
Let us briefly discuss multiplicities of vertices in the tree T_g. If σ ∈ T_g is not a leaf then we must have ℓ(σ) < r(σ), where
ℓ(σ) = min{t ∈ [0, ζ] : p_g(t) = σ}  and  r(σ) = max{t ∈ [0, ζ] : p_g(t) = σ}
are respectively the smallest and the largest element in the equivalence class of σ in [0, ζ]. Note that m_g(ℓ(σ), r(σ)) = g(ℓ(σ)) = g(r(σ)) = d_g(ρ, σ). Denote by ]a_i, b_i[, i ∈ I, the connected components of the open set ]ℓ(σ), r(σ)[ ∩ {t ∈ [0, ∞[ : g(t) > d_g(ρ, σ)} (the index set I is empty if σ is a leaf). Then we claim that the connected components of the open set T_g\{σ} are the sets p_g(]a_i, b_i[), i ∈ I, and T_g\T_g[σ] (the latter only if σ is not the root), where T_g[σ] = {σ′ ∈ T_g : σ ≼ σ′} is the set of descendants of σ. We have already noticed that T_g\T_g[σ] is open, and the argument used above for T_g[σ]\{σ} also shows that the sets p_g(]a_i, b_i[), i ∈ I, are open. Finally, the sets p_g(]a_i, b_i[) are connected as continuous images of intervals.
We conclude this subsection with a lemma comparing the trees coded by two different functions g and g′.
Lemma 2.4 Let g and g′ be two continuous functions with compact support from [0, ∞[ into [0, ∞[, such that g(0) = g′(0) = 0. Then,
d_GH(T_g, T_{g′}) ≤ 2 ‖g − g′‖,
where ‖g − g′‖ stands for the uniform norm of g − g′.
Proof. We rely on formula (18) for the Gromov-Hausdorff distance. We can construct a correspondence between T_g and T_{g′} by setting
R = {(σ, σ′) : σ = p_g(t) and σ′ = p_{g′}(t) for some t ≥ 0}.
In order to bound the distortion of R, let (σ, σ′) ∈ R and (η, η′) ∈ R. By our definition of R we can find s, t ≥ 0 such that p_g(s) = σ, p_{g′}(s) = σ′ and p_g(t) = η, p_{g′}(t) = η′. Now recall that
d_g(s, t) = g(s) + g(t) − 2 m_g(s, t),  d_{g′}(s, t) = g′(s) + g′(t) − 2 m_{g′}(s, t),
so that
|d_g(σ, η) − d_{g′}(σ′, η′)| ≤ |g(s) − g′(s)| + |g(t) − g′(t)| + 2 |m_g(s, t) − m_{g′}(s, t)| ≤ 4 ‖g − g′‖.
Thus we have dis(R) ≤ 4 ‖g − g′‖, and the desired result follows from (18).
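The distortion bound in the proof of Lemma 2.4 can be checked numerically on discretized functions; a sketch with our own naming:

```python
def d(g, s, t):
    """d_g(s, t) for a grid-sampled g."""
    lo, hi = min(s, t), max(s, t)
    return g[s] + g[t] - 2 * min(g[lo:hi + 1])

g  = [0, 1, 3, 2, 4, 1, 0, 0]
gp = [0, 2, 3, 1, 3, 2, 1, 0]        # a perturbation of g
sup_diff = max(abs(a - b) for a, b in zip(g, gp))

# Distortion of the correspondence R = {(p_g(t), p_{g'}(t)) : t >= 0}:
dis = max(abs(d(g, s, t) - d(gp, s, t))
          for s in range(len(g)) for t in range(len(g)))
assert dis <= 4 * sup_diff       # hence d_GH(T_g, T_{g'}) <= 2 ||g - g'||
```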
Lemma 2.4 suggests the following alternative proof of Theorem 2.2. Denote by C_00 the set of all functions g : [0, ∞[ −→ [0, ∞[ that satisfy the assumptions stated at the beginning of this subsection, and such that the following holds: there exist ε > 0 and ρ > 0 such that, for every i ∈ N, the function g is linear with slope ρ or −ρ over the interval [(i − 1)ε, iε]. Then it is easy to see that T_g is a real tree if g ∈ C_00. Indeed, up to a simple time-space rescaling, g will be the contour function of a discrete tree t ∈ A, and T_g coincides (up to rescaling) with the real tree that can be constructed from t in an obvious way. Then, a general function g can be written as the uniform limit of a sequence (g_n) in C_00, and Lemma 2.4 implies that T_g is the limit in the Gromov-Hausdorff distance of the sequence T_{g_n}. Since each T_{g_n} is a real tree, T_g also must be a real tree, by Theorem 2.1 (we do not really need Theorem 2.1, but only the fact that the set of all real trees is closed in the set of all compact metric spaces equipped with the Gromov-Hausdorff metric, cf. Lemma 2.1 in [17]).
Remark. Recalling that any rooted real tree can be represented in the form T g , the separability in Theorem 2.1 can be obtained as a consequence of Lemma 2.4 and the separability of the space of continuous functions with compact support on R + .

The continuum random tree
We recall from Section 1 the notation e = (e t , 0 ≤ t ≤ 1) for the normalized Brownian excursion. By convention, we take e t = 0 if t > 1.

Definition 2.2
The continuum random tree (CRT) is the random real tree T e coded by the normalized Brownian excursion.
The CRT T e is thus a random variable taking values in the set T. Note that the measurability of this random variable follows from Lemma 2.4.

Remark.
Aldous [1], [2] uses a different method to define the CRT. The preceding definition then corresponds to Corollary 22 in [2]. Note that our normalization differs by an unimportant scaling factor 2 from the one in Aldous' papers: The CRT there is the tree T 2e instead of T e .
We can restate many of the results of Section 1 in terms of weak convergence in the space T. Rather than doing this in an exhaustive manner, we will give a typical example showing that the CRT is the limit of rescaled "combinatorial" trees.
Recall from subsection 1.1 the notation A for the set of all (finite) rooted ordered trees, and denote by A n the subset of A consisting of trees with n vertices. We may and will view each element t of A as a rooted real tree: Simply view t as a union of line segments of length 1 in the plane, in the way represented in the left part of Figure 1, equipped with the obvious distance (the distance between σ and σ ′ is the length of the shortest path from σ to σ ′ in the tree). Alternatively, if (C t , t ≥ 0) is the contour function of the tree, this means that we identify t = T C (this is not really an identification, because the tree t has an order structure which disappears when we consider it as a real tree).
For any λ > 0 and a tree T ∈ T, the tree λT is the "same" tree with all distances multiplied by the factor λ (if the tree is embedded in the plane as suggested above, this corresponds to replacing the set T by λT ).
Theorem 2.5 For every n ≥ 1, let T (n) be a random tree distributed uniformly over A n . Then (2n) −1/2 T (n) converges in distribution to the CRT T e , in the space T.
Proof. Let θ be a Galton-Watson tree with geometric offspring distribution µ(k) = 2^{−k−1}, and for every n ≥ 1 let θ_n be distributed as θ conditioned to have n vertices. Then it is easy to verify that θ_n has the same distribution as T^{(n)}. On the other hand, let (C^n_t, t ≥ 0) be the contour function of θ_n, and set
C̃^n_t = (2n)^{−1/2} C^n_{2nt} ,  t ≥ 0.
From Theorem 1.15 (restated in terms of the contour function, as explained in subsection 1.6), we have
( C̃^n_t )_{0≤t≤1}  −→  ( e_t )_{0≤t≤1}
in distribution as n → ∞. On the other hand, from the observations preceding the theorem, and the fact that θ_n has the same distribution as T^{(n)}, it is immediate that the tree T_{C̃^n} coded by C̃^n has the same distribution as (2n)^{−1/2} T^{(n)}. The statement of the theorem now follows from the previous convergence and Lemma 2.4.
We could state analogues of Theorem 2.5 for several other classes of combinatorial trees, such as the ones considered at the end of subsection 1.5. For instance, if τ_n is distributed uniformly among all rooted Cayley trees with n vertices, then (4n)^{−1/2} τ_n converges in distribution to the CRT T_e, in the space T. Notice that Cayley trees are not ordered, but that they can be obtained from (ordered) Galton-Watson trees with Poisson offspring distribution by "forgetting" the order, as was explained at the end of subsection 1.5. By applying the same argument as in the preceding proof to these (conditioned) Galton-Watson trees, we get the desired convergence for rescaled Cayley trees.

The Itô excursion measure
Our goal is to derive certain explicit distributions for the CRT, and more specifically its so-called finite-dimensional marginal distributions. For these calculations, we will need some basic properties of Brownian excursions. Before dealing with the normalized Brownian excursion, we will consider Itô's measure.
We denote by (B_t, t ≥ 0) a linear Brownian motion, which starts at x under the probability measure P_x. We set
S_t = sup_{s≤t} B_s ,  I_t = inf_{s≤t} B_s ,  T_a = inf{t ≥ 0 : B_t = a}.
A classical application of the reflection principle shows that, for every t > 0, the density under P_0 of the law of the pair (S_t, B_t) is
γ_t(a, b) = (2(2a − b) / √(2πt³)) exp(−(2a − b)²/2t) ,  a ≥ 0, b ≤ a.  (21)
The reflection principle also implies that S_t and |B_t| have the same distribution. Let a > 0. Observing that {T_a ≤ t} = {S_t ≥ a}, P_0 a.s., we obtain that the density of T_a under P_0 is the function
q_a(t) = (a / √(2πt³)) exp(−a²/2t).
Notice the relation γ_t(a, b) = 2 q_{2a−b}(t) for a ≥ 0 and b < a. For every ε > 0, denote by ν_ε the law of the first excursion of B away from 0 that hits level ε. More specifically, if G_ε = sup{t < T_ε : B_t = 0} and D_ε = inf{t > T_ε : B_t = 0}, ν_ε is the law under P_0 of (B_{(G_ε+t)∧D_ε}, t ≥ 0). The measure ν_ε is thus a probability measure on the set C = C(R_+, R_+) of all continuous functions from R_+ into R_+, and is supported on C_ε = {e ∈ C : sup_s e(s) ≥ ε}. If 0 < ε < ε′, we have
ν_{ε′} = ν_ε(· | C_{ε′}) = (ε′/ε) ν_ε(· ∩ C_{ε′}),
since an excursion that hits ε hits ε′ with probability ε/ε′. For every ε > 0, set
n_ε = (2ε)^{−1} ν_ε .
Then n_{ε′} = n_ε(· ∩ C_{ε′}) for every 0 < ε < ε′. This leads to the following definition: the Itô measure n(de) of positive excursions is the unique σ-finite measure on C such that, for every ε > 0,
n(· ∩ C_ε) = n_ε .
Let us briefly state some simple properties of the Itô measure. First, n(de) is supported on the set E consisting of all elements e ∈ C which have the property that there exists σ = σ(e) > 0 such that e(t) > 0 if and only if 0 < t < σ (the number σ(e) is called the length, or the duration, of the excursion e). By construction, n_ε is the restriction of n to C_ε, and in particular n(C_ε) = (2ε)^{−1}. Finally, if T_ε(e) = inf{t ≥ 0 : e(t) = ε}, the law of (e(T_ε(e) + t), t ≥ 0) under n(· | T_ε < ∞) = ν_ε is the law of (B_{t∧T_0}, t ≥ 0) under P_ε. The last property follows from the construction of the measure ν_ε and the strong Markov property of Brownian motion at time T_ε.
Proposition 2.6 (i) For every t > 0 and every measurable function g : R_+ −→ R_+ such that g(0) = 0,
n(g(e(t))) = ∫_0^∞ dx q_x(t) g(x).  (22)
In particular, n(σ > t) = n(e(t) > 0) = (2πt)^{−1/2} < ∞. Moreover,
n( ∫_0^σ dt g(e(t)) ) = ∫_0^∞ dx g(x).  (23)
(ii) Let t > 0 and let Φ and Ψ be two nonnegative measurable functions defined respectively on C([0, t], R_+) and C. Then,
n( Φ((e(r))_{0≤r≤t}) Ψ((e(t + r))_{r≥0}) ) = n( Φ((e(r))_{0≤r≤t}) E_{e(t)}[Ψ((B_{r∧T_0})_{r≥0})] ).
Proof. (i) We may assume that g is bounded and continuous and that there exists α > 0 such that g(x) = 0 if x ≤ α. Then, by dominated convergence,
n(g(e(t))) = lim_{ε→0} n(g(e(t)) 1_{C_ε}) = lim_{ε→0} (2ε)^{−1} E_ε[g(B_t) 1_{{t<T_0}}] ,
using the property stated just before the proposition (the time spent by the excursion before hitting ε becomes negligible as ε → 0). From formula (21) we get
lim_{ε→0} (2ε)^{−1} P_ε(B_t ∈ dx, t < T_0) = (1/2) γ_t(0, −x) dx.
The first assertion in (i) now follows, observing that q_x(t) = (1/2) γ_t(0, −x). The identity n(e(t) > 0) = (2πt)^{−1/2} < ∞ is obtained by taking g(x) = 1_{{x>0}}.
The last assertion in (i) follows from (22), recalling that the function t → q x (t) is a probability density.
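The densities q_a and γ_t lend themselves to quick numerical sanity checks; the sketch below (function names ours) verifies the relation γ_t(a, b) = 2 q_{2a−b}(t) and compares a midpoint-rule integral of q_a with the exact distribution function of T_a:

```python
import math

def q(a, t):
    """Density of the hitting time T_a (a > 0) of Brownian motion from 0:
    q_a(t) = a (2 pi t^3)^{-1/2} exp(-a^2 / (2t))."""
    return a / math.sqrt(2 * math.pi * t ** 3) * math.exp(-a * a / (2 * t))

def gamma(t, a, b):
    """Joint density gamma_t(a, b) of (S_t, B_t) under P_0, a >= 0, b <= a."""
    return 2 * (2 * a - b) / math.sqrt(2 * math.pi * t ** 3) \
        * math.exp(-(2 * a - b) ** 2 / (2 * t))

# gamma_t(a, b) = 2 q_{2a-b}(t), as noted in the text
assert abs(gamma(1.3, 0.7, 0.2) - 2 * q(1.2, 1.3)) < 1e-12

# P(T_a <= T) = P(S_T >= a) = 2 P(B_T >= a) = erfc(a / sqrt(2T)):
# compare with a midpoint-rule integral of the density q_a over [0, T]
a, T, dt = 1.0, 10.0, 1e-3
exact = math.erfc(a / math.sqrt(2 * T))
approx = sum(q(a, (k + 0.5) * dt) * dt for k in range(int(T / dt)))
assert abs(approx - exact) < 1e-3
```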

Finite-dimensional marginals under the Itô measure
If (T, d) is a real tree with root ρ, and if x_1, . . . , x_p ∈ T, the subtree spanned by x_1, . . . , x_p is simply the set
T(x_1, . . . , x_p) = [[ρ, x_1]] ∪ · · · ∪ [[ρ, x_p]].
It is easy to see that T(x_1, . . . , x_p), equipped with the distance d, is again a real tree, which has a discrete structure: more precisely, T(x_1, . . . , x_p) can be represented by a discrete skeleton (which is a discrete rooted tree with p labelled leaves) and the collection, indexed by the vertices of the skeleton, of the lengths of its "branches". Rather than giving formal definitions for a general real tree, we will concentrate on the case of the tree T_g coded by g in the sense of subsection 2.2.

Figure 3
To give a precise definition of θ(g; t 1 , . . . , t p ), we proceed by induction on p.
If now g satisfies the conditions in subsection 2.2, it is easy to see that θ(g; t_1, . . . , t_p) corresponds to the tree T_g(p_g(t_1), . . . , p_g(t_p)) spanned by the vertices p_g(t_1), . . . , p_g(t_p) in the tree T_g. More precisely, if we attach to every v ∈ τ(g; t_1, . . . , t_p) a line segment in the plane with length h_v, in such a way that the line segments attached to v and to its children share a common end (the same for all children of v) and that the line segments otherwise do not intersect, the union of the resulting line segments gives a representative of the equivalence class of T_g(p_g(t_1), . . . , p_g(t_p)). (Note that the order structure of τ(g; t_1, . . . , t_p) plays no role in this construction.) We let A_(p) be the set of all rooted ordered trees with p leaves (a leaf of a tree τ ∈ A is a vertex u ∈ τ with no child, i.e. such that k_u(τ) = 0, with the notation of Section 1). We denote by Θ_(p) the set of all marked trees with p leaves: elements of Θ_(p) are of the form θ = (τ, (h_v)_{v∈τ}), where τ ∈ A_(p) and h_v ≥ 0 for every v ∈ τ. The set Θ_(p) is equipped with the obvious topology and the associated Borel σ-field. We also consider the corresponding sets of binary trees: A_bin(p) is the set of all (complete) binary rooted ordered trees with p leaves (and hence 2p − 1 vertices), and Θ_bin(p) is the set of marked trees θ = (τ, (h_v)_{v∈τ}) whose skeleton τ belongs to A_bin(p). Recall that
#A_bin(p) = (1/p) \binom{2p−2}{p−1}
is the Catalan number of order p − 1.
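The count of binary skeletons can be verified by the obvious recursion (a binary tree with p ≥ 2 leaves splits into two binary subtrees at the root); a sketch with our own naming:

```python
from math import comb

def catalan(n):
    """Catalan number C_n = (1/(n+1)) binom(2n, n)."""
    return comb(2 * n, n) // (n + 1)

def count_binary(p, _memo={1: 1}):
    """Number of (complete) binary rooted ordered trees with p leaves:
    for p >= 2, split the p leaves between the two subtrees of the root."""
    if p not in _memo:
        _memo[p] = sum(count_binary(j) * count_binary(p - j)
                       for j in range(1, p))
    return _memo[p]

# #A_bin(p) is the Catalan number of order p - 1
for p in range(1, 12):
    assert count_binary(p) == catalan(p - 1)
```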
Before proving this proposition, we state a lemma which is an immediate consequence of (21). Recall that B is a Brownian motion that starts from x under the probability P x , and that I = (I t , t ≥ 0) is the associated minimum process.

Lemma 2.9
If g is a nonnegative measurable function on R³ and x ≥ 0,
∫_0^∞ dt E_x[g(t, I_t, B_t)] = ∫_0^∞ dt ∫_{−∞}^x da ∫_a^∞ db γ_t(x − a, x − b) g(t, a, b).  (24)
In particular, if h is a nonnegative measurable function on R²,
∫_0^∞ dt E_x[h(I_t, B_t)] = 2 ∫_{−∞}^x da ∫_a^∞ db h(a, b).  (25)
Proof of Proposition 2.8. This is a simple consequence of Lemma 2.9. For p = 1, the result is exactly formula (23) in Proposition 2.6. We proceed by induction on p, using the Markov property under n (property (ii) in Proposition 2.6) and then (25). The proof is then completed by using the induction hypothesis.
The existence of this function easily follows from our construction by induction of the marked tree θ(e; t 1 , . . . , t p ).
Denote by ∆_p the measure on R_+^{2p−1} introduced in Proposition 2.8. In view of Proposition 2.8, the proof of Theorem 2.7 reduces to checking that Γ_p(∆_p) = Λ_p. For p = 1, this is obvious.
Let p ≥ 2 and suppose that the result holds up to order p − 1. For every j ∈ {1, . . . , p − 1}, let H_j be the subset of R_+^{2p−1} on which the minimum of α_1, . . . , α_{p−1} is attained only at index j; then ∆_p = Σ_j 1_{H_j} · ∆_p. On the other hand, it is immediate to verify that 1_{H_j} · ∆_p is the image of a product of a measure of type ∆_j with a measure of type ∆_{p−j}, together with the law of the common mark. The construction by induction of the tree θ(e; t_1, . . . , t_p) exactly shows that, a.e. for the measure ∆_j(dα′ dβ′) considered there, the resulting marked tree is of the form θ *_h θ′, where, if θ ∈ Θ_(j) and θ′ ∈ Θ_(p−j), the tree θ *_h θ′ is obtained by concatenating the discrete skeletons of θ and θ′ (as above in the construction by induction of θ(g; t_1, . . . , t_p)) and assigning the mark h to the root ∅.
Together with the induction hypothesis, the previous observations imply the desired identity for any nonnegative measurable function f on Θ_(p), and this completes the proof.
Remark. The fact that we get only binary trees in Theorem 2.7 corresponds to the property that local minima of Brownian motion are distinct: if 0 < t_1 < · · · < t_p and if the local minima m_g(t_i, t_{i+1}) are distinct, the tree θ(g; t_1, . . . , t_p) is clearly binary.

Finite-dimensional marginals of the CRT
In this subsection, we propose to calculate the law of the tree θ(e; t_1, . . . , t_p) when e is a normalized Brownian excursion. This corresponds to choosing p vertices independently and uniformly on the CRT (the uniform measure on the CRT T_e is by definition the image of Lebesgue measure on [0, 1] under the mapping p_e) and determining the law of the tree spanned by these vertices. In contrast with the measure Λ_p of Theorem 2.7, we will get for every p a probability measure on Θ_bin(p), which may be called the p-dimensional marginal distribution of the CRT (cf. Aldous [1], [2]).
We first recall the connection between the Itô measure and the normalized Brownian excursion. Informally, the law of the normalized Brownian excursion (e in the notation of Section 1) is n(de | σ(e) = 1). More precisely, using a standard disintegration theorem for measures, together with the Brownian scaling property, one easily shows that there exists a unique collection of probability measures (n_(s), s > 0) on the set E of excursions, such that the following properties hold:
(i) For every s > 0, n_(s)(σ = s) = 1.
(ii) For every λ > 0 and s > 0, the law under n (s) (de) of e λ (t) = √ λ e(t/λ) is n (λs) . (iii) For every Borel subset A of E, The measure n (1) is the law of the normalized Brownian excursion e which was considered in Section 1 and in subsection 2.3 above.
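The conditioning n(de | σ = 1) also suggests a way to simulate e: by Vervaat's theorem (a classical fact not proved in these notes, and an assumption of this sketch), cyclically rotating a Brownian bridge around the location of its minimum yields a normalized Brownian excursion. A minimal Python illustration on a discrete grid (function names are ours):

```python
import math
import random

def normalized_excursion(n_steps=1000, seed=0):
    """Approximate a normalized Brownian excursion on [0, 1] via the
    Vervaat transform: build a Brownian bridge on a grid, then rotate
    the path cyclically around its minimum and shift it to start at 0."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    # Brownian motion on the grid.
    b = [0.0]
    for _ in range(n_steps):
        b.append(b[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Brownian bridge: B_t - t * B_1.
    bridge = [b[i] - (i / n_steps) * b[-1] for i in range(n_steps + 1)]
    # Cyclic rotation around the argmin of the bridge.
    m = min(range(n_steps + 1), key=lambda i: bridge[i])
    rotated = bridge[m:] + bridge[1:m + 1]
    return [v - bridge[m] for v in rotated]

e = normalized_excursion()
```

The resulting path is nonnegative, vanishes at both endpoints, and has duration 1 by construction.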
Our first goal is to get a statement more precise than Theorem 2.7 by considering the pair (θ(e; t 1 , . . . , t p ), σ) instead of θ(e; t 1 , . . . , t p ). If θ = (τ, {h v , v ∈ τ }) is a marked tree, the length of θ is defined in the obvious way by Proposition 2.10 The law of the pair (θ(e; t 1 , . . . , t p ), σ) under the measure Proof. Recall the notation of the proof of Theorem 2.7. We will verify that, for any nonnegative measurable function f on R 3p + , . , α p−1 , β 1 , . . . , β p , s 1 , .. , s p+1 ). (26) Suppose that (26) holds. It is easy to check (for instance by induction on p) that Using the convolution identity q x * q y = q x+y (which is immediate from the interpretation of q x as the density of T x under P 0 ), we get from (26) that As in the proof of Theorem 2.7, the statement of Proposition 2.10 follows from this last identity and the equality Γ p (∆ p ) = Λ p .
It remains to prove (26). The case p = 1 is easy: By using the Markov property under the Itô measure (Proposition 2.6 (ii)), then the definition of the function q x and finally (22), we get Let p ≥ 2. Applying the Markov property under n successively at t p and at t p−1 , and then using (24), we obtain It is then straightforward to complete the proof by induction on p.
We can now state and prove the main result of this subsection.

Theorem 2.11
The law of the tree θ(e; t_1, . . . , t_p) under the probability measure Proof. Let F be a nonnegative bounded continuous function on Θ^{(p)} and let h be bounded, nonnegative and measurable on R_+. By Proposition 2.10, On the other hand, using the defining properties of the measures n^{(s)}, the same quantity can also be written as an integral in s of h(s) times the expectation under n^{(s)} of F(θ(e; t_1, . . . , t_p)).
By comparing with the previous identity, we get for a.a. s > 0, Both sides of the previous equality are continuous functions of s (use the scaling property of n (s) for the left side). Thus the equality holds for every s > 0, and in particular for s = 1. This completes the proof.
Concluding remarks. If we pick t_1, . . . , t_p in [0, 1] we can consider the increasing rearrangement t′_1 ≤ t′_2 ≤ · · · ≤ t′_p of t_1, . . . , t_p and define θ(e; t_1, . . . , t_p) = θ(e; t′_1, . . . , t′_p). We can also keep track of the initial ordering and consider the tree θ̃(e; t_1, . . . , t_p) defined as the tree θ(e; t_1, . . . , t_p) where leaves are labelled 1, . . . , p, the leaf corresponding to time t_i receiving the label i. (This labelling has nothing to do with the ordering of the tree.) Theorem 2.11 implies that the law of the tree θ̃(e; t_1, . . . , t_p) under the probability measure has density 2^{p+1} L(θ) exp(−2L(θ)²) with respect to Θ_bin^{(p)}(dθ), the uniform measure on the set of labelled marked (ordered) binary trees with p leaves.
We can then "forget" the ordering. Let θ̄(e; t_1, . . . , t_p) be the tree θ̃(e; t_1, . . . , t_p) without the order structure. Since there are 2^{p−1} possible orderings for a given labelled binary tree with p leaves, we get that the law (under the same measure) of the tree θ̄(e; t_1, . . . , t_p) has density 2^{2p} L(θ) exp(−2L(θ)²) with respect to Θ̄_bin^{(p)}(dθ), the uniform measure on the set of labelled marked (unordered) binary trees with p leaves.
In agreement with Aldous' normalization of the CRT, replace the excursion e by 2e (this simply means that all marks h_v are multiplied by 2). We obtain that the law of the tree θ̄(2e; t_1, . . . , t_p) has density L(θ) exp(−L(θ)²/2) with respect to Θ̄_bin^{(p)}(dθ). It is remarkable that the previous density (apparently) does not depend on p.
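The disappearance of p can be checked by the change of variables h_v ↦ 2h_v. Assuming the density 2^{2p} L(θ) exp(−2L(θ)²) for the excursion e (as in the preceding paragraph), and writing ℓ = 2L for the length of the rescaled tree, the Jacobian of the map on the 2p − 1 marks contributes 2^{−(2p−1)}:

```latex
% marks doubled: \ell = 2L, Jacobian factor 2^{-(2p-1)} on \mathbb{R}_+^{2p-1}
2^{2p}\,\frac{\ell}{2}\,
\exp\!\Bigl(-2\bigl(\tfrac{\ell}{2}\bigr)^{2}\Bigr)\cdot 2^{-(2p-1)}
\;=\;\ell\,\exp\!\bigl(-\ell^{2}/2\bigr).
```

All powers of 2 cancel, which is why the rescaled density no longer involves p.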
In the previous form, we recognize the finite-dimensional marginals of Aldous' continuum random tree [1], [2]. To give a more explicit description, the discrete skeleton τ̄(2e; t_1, . . . , t_p) is distributed uniformly on the set of labelled rooted binary trees with p leaves. This set has b_p elements. Then, conditionally on the discrete skeleton, the marks h_v are distributed with the density (verify that this is a probability density on R_+^{2p−1}!).
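The two quantities left implicit here can be cross-checked numerically. A natural candidate (our reconstruction, consistent with the normalization of the density ℓ exp(−ℓ²/2) above) is b_p = (2p−2)!/(2^{p−1}(p−1)!) = 1·3·5⋯(2p−3) for the number of labelled rooted binary trees with p leaves, and, given the skeleton, the marks density b_p (∑_v h_v) exp(−(∑_v h_v)²/2) on R_+^{2p−1}. The sketch below checks the count against the graft recursion b_p = (2p−3) b_{p−1} and verifies numerically that the candidate density integrates to 1:

```python
from math import factorial, exp

def b(p):
    """Number of leaf-labelled rooted binary trees with p leaves, via the
    recursion b_p = (2p-3) * b_{p-1}: a new leaf can be grafted onto any
    of the 2p-3 edges of a tree with p-1 leaves."""
    count = 1
    for k in range(2, p + 1):
        count *= 2 * k - 3
    return count

def closed_form(p):
    # Candidate closed form (an assumption checked against the recursion).
    return factorial(2 * p - 2) // (2 ** (p - 1) * factorial(p - 1))

def normalization(p, n=200000, upper=15.0):
    """Total mass of b_p * (sum h_v) * exp(-(sum h_v)^2 / 2) on R_+^{2p-1}:
    integrating over the simplex {sum h_v = s} reduces it to the
    one-dimensional integral of b_p * s^{2p-1} exp(-s^2/2) / (2p-2)!."""
    h = upper / n
    total = 0.0
    for i in range(1, n):
        s = i * h
        total += s ** (2 * p - 1) * exp(-s * s / 2.0)
    total *= h
    return b(p) * total / factorial(2 * p - 2)
```

For every p the numerical normalization should come out equal to 1, confirming that the candidate is indeed a probability density.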
Bibliographical notes. The coding of real trees described in subsection 2.2 is taken from [12], although the underlying ideas can be found in earlier papers (see in particular [2] and [25]). A simple approach to Theorem 2.2, based on the four-point condition for real trees, can be found in Lemma 3.1 of [18]. See e.g. Chapter XII of [38] or the last section of [39] for a thorough discussion of the Itô excursion measure. The CRT was introduced and studied by Aldous [1], [2]. The direct approach to finite-dimensional marginals of the CRT which is presented in subsections 2.5 and 2.6 above is taken from [26].

The Brownian Snake and its Connections with Partial Differential Equations
Our goal in this section is to combine the continuous tree structure studied in the previous section with independent spatial motions: In addition to the genealogical structure, "individuals" move in space according to the law of a certain Markov process ξ. This motivates the definition of the path-valued process called the Brownian snake. We study basic properties of the Brownian snake, and we use our previous calculation of marginal distributions of random real trees (subsection 2.5) to give explicit formulas for moment functionals. We then introduce the exit measure of the Brownian snake from a domain, and we derive a key integral equation for the Laplace functional of this random measure. In the case when the spatial motion ξ is Brownian motion in R^d, this integral equation leads to important connections with semilinear partial differential equations, which have been studied recently by several authors.

Combining the branching structure of a real tree with a spatial displacement
We consider a Markov process (ξ_t, Π_x)_{t≥0, x∈E} with values in a Polish space E. We will assume that ξ has continuous sample paths and that there exist an integer m > 2 and positive constants C and ε such that for every x ∈ E and t > 0, where δ denotes the distance on E. This assumption is not really necessary, but it will simplify our treatment. It holds for Brownian motion or for solutions of stochastic differential equations with smooth coefficients in R^d or on a manifold. Later ξ will simply be d-dimensional Brownian motion, but for the moment it is preferable to argue in a more general setting. We denote by W the set of all finite E-valued paths. An element of W is a continuous mapping w : [0, ζ] → E, where ζ = ζ_(w) ≥ 0 depends on w and is called the lifetime of w. The final point of w will be denoted by ŵ = w(ζ). If x ∈ E, the trivial path w such that ζ_(w) = 0 and w(0) = x is identified with the point x, so that E is embedded in W. The set W is a Polish space for the distance For x ∈ E we denote by W_x the set W_x = {w ∈ W : w(0) = x}.
If g : R + → R + is a continuous function with compact support such that g(0) = 0, we saw in subsection 2.2 that g codes a real tree T g . Our goal is now to combine this branching structure with spatial motions distributed according to the law of the process ξ. To this end, we will not make an explicit use of the tree T g but rather give our definitions in terms of the coding function. It will be convenient to drop the compact support assumption on g, and to consider instead a general function f ∈ C(R + , R + ).
Notation. Let w : [0, ζ] → E be an element of W, let a ∈ [0, ζ] and b ≥ a. We denote by R a,b (w, dw ′ ) the unique probability measure on W such that Under R a,b (w, dw ′ ), the path w ′ is the same as w up to time a and then behaves according to the spatial motion ξ up to time b.
Let (W s , s ≥ 0) denote the canonical process on the space C(R + , W) of continuous functions from R + into W. We also denote by ζ s = ζ (Ws) the lifetime of W s .
Proposition 3.1 Assume that f is locally Hölder continuous with exponent η for every η ∈ (0, 1/2). Let w ∈ W with ζ_(w) = f(0). Then, there exists a unique probability measure Γ^f_w on C(R_+, W) such that W_0 = w, Γ^f_w a.s., and, under Γ^f_w, the canonical process (W_s, s ≥ 0) is (time-inhomogeneous) Markov with transition kernel between times s and s′ given by The intuitive meaning of this construction should be clear: at least when f(0) = 0 and f has compact support (so that we can use the theory of subsection 2.2), the vertices p_f(s) and p_f(s′) in the tree T_f have the same ancestors up to generation m_f(s, s′). Therefore, the corresponding spatial motions W_s and W_{s′} must be the same up to time m_f(s, s′) and then behave independently. This means that the path W_{s′} is obtained from the path W_s through the kernel R_{m_f(s,s′), f(s′)}. Proof. We give the detailed argument only in the case when f(0) = 0, which implies that w = x ∈ E. We leave the general case as an exercise for the reader (cf the proof of Proposition IV.5 in [27]).
For each choice of 0 ≤ t_1 ≤ t_2 ≤ · · · ≤ t_p, we can consider the probability measure π^{x,f}_{t_1,...,t_p} on W^p defined by It is easy to verify that this collection is consistent when p and t_1, . . . , t_p vary. Hence the Kolmogorov extension theorem yields the existence of a process (W̃_s, s ≥ 0) with values in W (in fact in W_x) whose finite-dimensional marginals are the measures π^{x,f}_{t_1,...,t_p}. We then verify that (W̃_s, s ≥ 0) has a continuous modification. Thanks to the classical Kolmogorov lemma, it is enough to show that, for every T > 0, there are constants β > 0 and C such that for every s ≤ s′ ≤ T. (Here m is as in assumption (27).) Our assumption on f guarantees that for every η ∈ (0, 1/2) there exists a finite constant C_{η,T} such that, for every s, s′ ∈ [0, T], By our construction, the joint distribution of (W̃_s, W̃_{s′}) is This means that W̃_s and W̃_{s′} are two random paths that coincide up to time m_f(s, s′) and then behave independently according to the law of the process ξ.
Using the definition of the distance d, we get for every s, where we used assumption (27) in the second inequality, and (29) in the third one. We can choose η close enough to 1/2 so that mη > 1 and (2 + ε)η > 1. The bound (28) then follows.
We then define Γ^f_x as the law on C(R_+, W) of the continuous modification of (W̃_s, s ≥ 0). The fact that under Γ^f_x the canonical process is Markov with the given transition kernels is obvious from the choice of the finite-dimensional marginals. The form of the marginals also readily shows that ζ_{t_1} = f(t_1), . . . , ζ_{t_p} = f(t_p), Γ^f_x a.s., for every finite collection of times t_1, . . . , t_p. The last assertion of the proposition then follows from a continuity argument.
The process (W_s, s ≥ 0) under the probability measure Γ^f_w is sometimes called the snake driven by the function f (with spatial motion ξ and initial value w). From the form of the transition kernels and an easy continuity argument we have, Γ^f_w a.s. for every 0 ≤ s ≤ s′, W_s(t) = W_{s′}(t) for every t ≤ m_f(s, s′). We sometimes refer to this property as the snake property. Note in particular that if x = w(0) we have W_s(0) = x for every s ≥ 0, Γ^f_w a.s.
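At the discrete level the snake mechanism is easy to exhibit and test: drive the snake with integer heights f, extending the current path by a fresh random step when f goes up and erasing its last step when f goes down. The lifetime of the path then equals f, and two paths agree up to the minimum of f in between. This is only an illustrative toy version (a ±1 random walk stands in for the spatial motion ξ), not the construction used in the text:

```python
import random

def discrete_snake(f, seed=0):
    """Discrete snake driven by the integer heights f: returns the list of
    paths (W_s); path s has length f[s], and paths s <= s' share their
    first min(f[s..s']) steps -- the discrete snake property."""
    rng = random.Random(seed)
    paths, current = [], []
    for height in f:
        while len(current) > height:      # f went down: erase last steps
            current.pop()
        while len(current) < height:      # f went up: append fresh steps
            current.append(rng.choice([-1, 1]))
        paths.append(list(current))
    return paths

# Driving function: a small deterministic excursion (Dyck-path heights).
f = [0, 1, 2, 1, 2, 3, 2, 1, 0]
W = discrete_snake(f)
```

One can check directly that W[s] and W[t] coincide up to min(f[s..t]), the discrete analogue of W_s(t) = W_{s′}(t) for t ≤ m_f(s, s′).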

The Brownian snake
We now randomize f in the construction of the previous subsection. For every r ≥ 0, we denote by P r (df ) the law of reflected Brownian motion started at r (the law of (|B s |, s ≥ 0) if B is a linear Brownian motion started at r). Then P r (df ) is a probability measure on the set C(R + , R + ). Note that the assumption of Proposition 3.1 holds P r (df ) a.s. For every s > 0, we denote by ρ r s (da db) the law under P r of the pair ( inf 0≤u≤s f (u), f (s)).
The reflection principle (cf subsection 2.4) easily gives the explicit form of ρ^r_s(da, db). Theorem 3.2 For every w ∈ W, denote by P_w the probability measure on C(R_+, W) defined by P_w(dω) = ∫ P_{ζ_(w)}(df) Γ^f_w(dω). The process (W_s, P_w)_{s≥0, w∈W} is a continuous (time-homogeneous) Markov process with values in W, with transition kernels Q_s(w, dw′) = ∫ ρ^{ζ_(w)}_s(da, db) R_{a,b}(w, dw′). This process is called the Brownian snake with spatial motion ξ.
Proof. The continuity is obvious and it is clear that W 0 = w, P w a.s. The semigroup property for the kernels Q s is also easy to verify. As for the Markov property, we write This completes the proof.
Under P w , the process (ζ s , s ≥ 0) is a reflected Brownian motion started at ζ (w) . This property is obvious from the last assertion of Proposition 3.1 and the very definition of P w .
The snake property can then be stated in the form for every s < s ′ , P w a.s. In particular, if w(0) = x we have W s ∈ W x for every s ≥ 0, P w a.s.
We now state the strong Markov property of W, which is very useful in applications. We denote by F_s the canonical filtration on C(R_+, W) (F_s is the σ-field generated by W_r, 0 ≤ r ≤ s) and as usual we take F_{s+} = ⋂_{r>s} F_r. Theorem 3.3 The process (W_s, P_w)_{s≥0, w∈W} is strong Markov with respect to the filtration (F_{s+}).
Proof. Let T be a stopping time of the filtration (F_{s+}) such that T ≤ K for some K < ∞. Let F be bounded and F_{T+}-measurable, and let Ψ be a bounded measurable function on W. It is enough to prove that for every s > 0, We may assume that Ψ is continuous. Then, In the first equality, we used the continuity of paths and in the second one the ordinary Markov property, together with the fact that 1_{{k/n ≤ T < (k+1)/n}} F is F_{(k+1)/n}-measurable. At this point, we need an extra argument. We claim that Clearly, the desired result follows from (30), because on the set {k/n ≤ T < (k + 1)/n} we can bound To prove (30), we write down explicitly and a similar expression holds for E_{W_t}[Ψ(W_s)]. Set c(ε) = sup_{t≤K, t≤r≤t+ε} |ζ_r − ζ_t| and note that c(ε) tends to 0 as ε → 0, P_w a.s. Then observe that if t ≤ K and t ≤ r ≤ t + ε, the paths W_r and W_t coincide at least up to time (ζ_t − c(ε))^+. Therefore we have for every a ≤ (ζ_t − c(ε))^+ and b ≥ a. The claim (30) follows from this observation and the known explicit form of ρ^{ζ_r}_s(da, db).
Remark. The strong Markov property holds for W even if the underlying spatial motion ξ is not strong Markov.

Excursion measures of the Brownian snake
For every x ∈ E, the excursion measure N x is the σ-finite measure on C(R + , W) defined by where n(de) denotes Itô's excursion measure as in Section 2. In particular, (ζ s , s ≥ 0) is distributed under N x according to the Itô measure n(de). As in Section 2, we will use the notation σ for the duration of the excursion under N x (dω): σ = inf{s > 0 : ζ s = 0}.
Note that W s ∈ W x for every s ≥ 0, N x a.e.
Let us comment on the intuitive interpretation of N_x. As we saw in Section 2, the excursion (ζ_s, 0 ≤ s ≤ σ) codes a random real tree. If a is a vertex of this real tree, corresponding say to s ∈ [0, σ], the path W_s gives the spatial positions along the line of ancestors of a in the tree, and in particular its terminal point Ŵ_s is the position of a. The snake property reflects the fact that the ancestors of the vertices corresponding to s and s′ are the same up to generation inf_{[s,s′]} ζ_r. To summarize, the paths W_s, 0 ≤ s ≤ σ form under N_x a "tree of paths" of the spatial motion ξ, whose genealogical structure is governed by the tree coded by the Brownian excursion (ζ_s, 0 ≤ s ≤ σ).
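This genealogical picture can be tested on the coding function itself: d(s, s′) = ζ_s + ζ_{s′} − 2 inf_{[s,s′]} ζ_r defines a tree pseudo-metric, so among the three pairwise sums appearing in the four-point condition for real trees (cf the bibliographical notes at the end of Section 2), the two largest must coincide. With integer-valued heights the check is exact:

```python
import random

def tree_distance(f, s, t):
    """Distance between times s and t in the tree coded by the heights f:
    f[s] + f[t] - 2 * (min of f between s and t)."""
    s, t = min(s, t), max(s, t)
    return f[s] + f[t] - 2 * min(f[s:t + 1])

def four_point_ok(f, x, y, z, w):
    """Four-point condition for a tree metric: among the three sums
    d(x,y)+d(z,w), d(x,z)+d(y,w), d(x,w)+d(y,z), the two largest agree."""
    d = tree_distance
    sums = sorted([d(f, x, y) + d(f, z, w),
                   d(f, x, z) + d(f, y, w),
                   d(f, x, w) + d(f, y, z)])
    return sums[1] == sums[2]

# Nonnegative integer "excursion": heights of a reflected simple walk.
rng = random.Random(1)
f = [0]
for _ in range(200):
    f.append(max(0, f[-1] + rng.choice([-1, 1])))

checks = all(four_point_ok(f, *rng.sample(range(len(f)), 4))
             for _ in range(500))
```

Every random quadruple of times passes the test, as it must for the metric of a real tree.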
Remark. We know that the process (W_s, P_w)_{s≥0, w∈W}, where the driving random function is reflected Brownian motion, is a continuous strong Markov process. Furthermore, every point x ∈ E (viewed as an element of W) is regular for W. This last property is immediate from the analogous property for reflected linear Brownian motion. Thus it makes sense to consider the excursion measure of W away from x, in the sense of the general Itô excursion theory (see e.g. Blumenthal [4]), and this excursion measure is easily identified with N_x.
In what follows, since we use the notation (W s , s ≥ 0) for the canonical process on C(R + , W), it is important to realize that the driving random function of our Brownian snake (W s , s ≥ 0) is either reflected Brownian motion (under the probability measures P w ) or a Brownian excursion (under the excursion measures N x ).
We will make use of the strong Markov property under N x . To state it, it will be convenient to introduce the following notation. For every w ∈ W, we denote by P * w the distribution under P w of the process where σ := inf{s > 0 : ζ s = 0} as above.
Theorem 3.4 Let T be a stopping time of the filtration (F s+ ). Assume that 0 < T ≤ σ, N x a.e. Then, if F and G are nonnegative measurable functionals on C(R + , W), and if F is F T+ -measurable, we have If we interpret N x as an excursion measure away from x (as explained in the remark above), the preceding theorem becomes a well-known fact of the theory of Markov processes: See e.g. [4]. Alternatively, it is also easy to give a direct proof of Theorem 3.4 using the same method as in the proof of Theorem 3.3, together with the simple Markov property under the Itô excursion measure (cf Proposition 2.6 (ii)).
We will now use Theorem 2.7 to derive some important formulas under the excursion measure N_x. Let p ≥ 1 be an integer. Recall from subsection 2.5 the notation Θ_bin^{(p)} for the set of all binary marked trees with p leaves, and let θ ∈ Θ_bin^{(p)}. For every x ∈ E, we associate with θ a probability measure on W^p, denoted by Π^θ_x, which is defined inductively as follows.
Informally, Π θ x is obtained by running independent copies of ξ along the branches of the tree θ.
Finally, we recall the notation θ(f ; t 1 , . . . , t p ) from subsection 2.5, and we let Λ p be as in Theorem 2.7 the uniform measure on Θ bin (p) .
(ii) For any nonnegative Borel measurable function F on W p , Proof. Assertion (i) follows easily from the definition of Γ f x and the construction of the trees θ(f ; t 1 , . . . , t p ). A precise argument can be given using induction on p, but we leave details to the reader. To get (ii), we write The first equality is the definition of N x , the second one is part (i) of the proposition, and the last one is Theorem 2.7.
The cases p = 1 and p = 2 of Proposition 3.5 (ii) are of special interest. Let us rewrite the corresponding formulas in a particular case. Recall that we denote by ŵ the terminal point of w. For any nonnegative Borel measurable function g on E, we have

Remark. In addition to (31), we could also consider the associated normalized excursion measure where n^{(1)} is as in Section 2 the law of the normalized Brownian excursion. Let Z be the random probability measure on E defined under N^{(1)}_x by In the case when ξ is Brownian motion in E = R^d and x = 0, the random measure Z is called ISE (Aldous [3]) for Integrated Super-Brownian Excursion. ISE and its variants play an important role in various asymptotics for statistical mechanics models (see e.g. [7], [21]). In such applications, the explicit formula for the moments of the random measure Z is often useful. This formula is proved by the same method as Proposition 3.5 (ii), using Theorem 2.11 instead of Theorem 2.7. Precisely, we have the following result.
Proposition 3.6 For any nonnegative Borel measurable function g on E, We conclude this subsection with an important technical lemma. We fix w ∈ W_x with ζ_(w) > 0 and now consider the Brownian snake under P_w; that is, the driving random function of the snake is reflected Brownian motion (and no longer a Brownian excursion as in the beginning of this section). We again use the notation σ = inf{s > 0 : ζ_s = 0}. From the snake property we have in fact W^i ∈ C(R_+, W_{w(ζ_{α_i})}).
is under P_w a Poisson point measure on R_+ × C(R_+, W) with intensity Proof. A well-known theorem of Lévy (already used in Section 1) states that, if (β_t, t ≥ 0) is a linear Brownian motion started at a, the process β_t − inf_{[0,t]} β_r is a reflected Brownian motion whose local time at 0 is t → 2(a − inf_{[0,t]} β_r). From this and excursion theory, it follows that the point measure is under P_w a Poisson point measure with intensity 2 · 1_{[0,ζ_(w)]}(t) dt n(de).
It remains to combine this result with the spatial displacements. To this end, fix a function f ∈ C(R_+, R_+) such that f(0) = ζ_(w), σ(f) = inf{t > 0 : f(t) = 0} < ∞ and f is locally Hölder with exponent 1/2 − γ for every γ > 0. Recall the notation Γ^f_w from subsection 3.1 above. Denote by e_j, j ∈ J the excursions of f(s) − inf_{[0,s]} f(r) away from 0 before time σ(f), by (a_j, b_j), j ∈ J the corresponding time intervals, and define for every j ∈ J, W^j_s(t) = W_{(a_j+s)∧b_j}(f(a_j) + t). From the definition of Γ^f_w, it is easily verified that the processes W^j, j ∈ J are independent under Γ^f_w, with respective distributions Γ^{e_j}_{w(f(a_j))}.
Let F be a bounded nonnegative measurable function on R + × C(R + , W), such that F (t, ω) = 0 if sup ζ s (ω) ≤ γ, for some γ > 0. Recall the notation P r (df ) for the law of reflected Brownian motion started at r. By using the last observation and then the beginning of the proof, we get The third equality is the exponential formula for Poisson measures, and the last one is the definition of N x . This completes the proof.
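The theorem of Lévy invoked at the start of the proof has an elementary random-walk analogue that can be checked mechanically: for a simple random walk S with running minimum I, the process R = S − I is nonnegative and I decreases only at times when R vanishes, mirroring the local time description t ↦ 2(a − inf_{[0,t]} β_r). A sketch:

```python
import random

def walk_and_minimum(n=2000, seed=7):
    """Simple random walk S, its running minimum I, and R = S - I,
    the random-walk analogue of beta_t - inf beta_r in Levy's theorem."""
    rng = random.Random(seed)
    S, I = [0], [0]
    for _ in range(n):
        S.append(S[-1] + rng.choice([-1, 1]))
        I.append(min(I[-1], S[-1]))
    R = [s - i for s, i in zip(S, I)]
    return S, I, R

S, I, R = walk_and_minimum()
```

The assertions below confirm the two structural facts: R is a nonnegative "reflected" process, and the minimum can only move while R sits at 0, which is why −2·inf serves as a local time at 0.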

The exit measure
Let D be an open set in E and fix x ∈ D. For every w ∈ W_x set where inf ∅ = +∞. Define so that E^D is the set of all exit points from D of the paths W_s, for those paths that do exit D. Our goal is to construct, N_x a.e., a random measure that is in some sense uniformly spread over E^D. To avoid trivial cases, we first assume that We start by constructing a continuous increasing process that increases only on the set {s ≥ 0 : τ(W_s) = ζ_s}. Proof. Since N_x can be viewed as the excursion measure of W away from x, it is enough to prove that the given statement holds under P_x. Indeed, it follows from the construction of the Itô measure that, for every h > 0, N_x(· | sup ζ_s > h) is the law under P_x of the first excursion of W away from x with "height" greater than h, and so the result under N_x can easily be derived from the case of P_x.
We use the following lemma, where w ∈ W x is fixed.
Then σ_s < ∞ for every s ≥ 0, P_w a.s., and the process Γ_s = γ_{σ_s} is under P_w a reflected Brownian motion started at (ζ_(w) − τ(w))^+, which is the desired result.
Proof of Lemma 3.9. For every ε > 0, introduce the stopping times We first verify that the stopping times S^ε_n and T^ε_n are finite P_w a.s. By applying the strong Markov property at inf{s ≥ 0 : ζ_s = 0}, it is enough to consider the case when w = x. Still another application of the strong Markov property shows that it is enough to verify that S^ε_1 < ∞ a.s. To this end, observe that P_x(ζ_1 ≥ τ(W_1) + ε) > 0 (by (33) and because, conditionally on ζ_1, W_1 is a path of ξ with length ζ_1) and apply the strong Markov property at inf{s ≥ 1 : ζ_s = 0}.
From the snake property and the continuity of s → ζ_s, one easily gets that the mapping s → γ_s is also continuous. It follows that γ_{S^ε_1} = ε ∨ (ζ_(w) − τ(w)) and γ_{S^ε_n} = ε for n ≥ 2. We then claim that, for every n ≥ 1, we have T^ε_n = inf{s ≥ S^ε_n : ζ_s = τ(W_{S^ε_n})}. Indeed, the snake property implies that, for S^ε_n ≤ r ≤ T^ε_n, the paths W_r and W_{S^ε_n} coincide up to time τ(W_{S^ε_n}), so that τ(W_r) = τ(W_{S^ε_n}). This argument also shows that γ_r = ζ_r − τ(W_{S^ε_n}) for S^ε_n ≤ r ≤ T^ε_n.
From the previous observations and the strong Markov property of the Brownian snake, we see that the processes (γ_{(S^ε_n + r) ∧ T^ε_n}, r ≥ 0), n = 1, 2, . . . are independent and distributed according to the law of a linear Brownian motion started at ε (at ε ∨ (ζ_(w) − τ(w)) for n = 1) and stopped when it hits 0. Hence, with a suitable time change σ^ε_r, the process (γ_{σ^ε_r}, r ≥ 0) is obtained by pasting together a linear Brownian motion started at ε ∨ (ζ_(w) − τ(w)) and stopped when it hits 0, with a sequence of independent copies of the same process started at ε. A simple coupling argument shows that (γ_{σ^ε_r}, r ≥ 0) converges in distribution as ε → 0 to reflected Brownian motion started at (ζ_(w) − τ(w))^+. The lemma follows since it is clear that σ^ε_r ↓ σ_r a.s. for every r ≥ 0.
Definition. The exit measure Z^D from D is defined under N_x by the formula ⟨Z^D, g⟩ = ∫_0^σ dL^D_s g(Ŵ_s). From Proposition 3.8 it is easy to obtain that L^D_s increases only on the (closed) set {s ∈ [0, σ] : ζ_s = τ(W_s)}. It follows that Z^D is (N_x a.e.) supported on E^D.
Let us consider the case when (33) does not hold. Then a first moment calculation, using the case p = 1 of Proposition 3.5 (ii), shows that the result of Proposition 3.8 still holds under N_x with L^D_s = 0 for every s ≥ 0. Consequently, we take Z^D = 0 in that case.
We will need a first moment formula for L^D. With a slight abuse of notation, we also denote by τ the first exit time from D for ξ. Proposition 3.10 Let Π^D_x denote the law of (ξ_r, 0 ≤ r ≤ τ) under the subprobability measure Π_x(· ∩ {τ < ∞}) (Π^D_x is viewed as a measure on W_x). Then, for every bounded nonnegative measurable function G on W_x, In particular, for every bounded nonnegative measurable function g on E, Proof. We may assume that G is continuous and bounded, and G(w) = 0 if ζ_(w) ≤ K^{−1} or ζ_(w) ≥ K, for some K > 0. By Proposition 3.8, N_x a.e. If we can justify the fact that the convergence (34) also holds in L^1(N_x), we will get from the case p = 1 of Proposition 3.5 (ii): It remains to justify the convergence in L^1(N_x). Because of our assumption on G, we may deal with the finite measure N_x(· ∩ {sup ζ_s > K^{−1}}) and so it is enough to prove that is finite. This easily follows from the case p = 2 of Proposition 3.5 (ii), using now the fact that G(w) = 0 if ζ_(w) ≥ K.
Let us give an important remark. Without any additional effort, the previous construction applies to the more general case of a space-time open set D ⊂ R_+ × E, such that (0, x) ∈ D. In this setting, Z^D is a random measure on ∂D ⊂ R_+ × E such that for g ∈ C_b+(∂D) To see that this more general case is in fact contained in the previous construction, simply replace ξ by the space-time process ξ′_t = (t, ξ_t), which also satisfies assumption (27), and note that the Brownian snake with spatial motion ξ′ is related to the Brownian snake with spatial motion ξ in a trivial manner.
We will now derive an integral equation for the Laplace functional of the exit measure. This result is the key to the connections with partial differential equations that will be investigated later.
Theorem 3.11 Let g be a nonnegative bounded measurable function on E. For every x ∈ E, set u(x) = N_x(1 − exp(−⟨Z^D, g⟩)). The function u solves the integral equation u(x) + 2 Π_x[∫_0^τ u(ξ_s)² ds] = Π_x[1_{τ<∞} g(ξ_τ)]. (35) Our proof of Theorem 3.11 is based on Lemma 3.7. Another more computational proof would rely on calculations of moments of the exit measure from Proposition 3.5 above.
Proof. For every r > 0 set η^D_r = inf{s ≥ 0 : L^D_s > r}, with the usual convention inf ∅ = ∞. By the definition of Z^D, we have The second equality is the simple identity 1 − exp(−A_t) = ∫_0^t dA_s exp(−(A_t − A_s)), valid for any continuous nondecreasing function A. The third equality is the change of variables s = η^D_r and the fourth one follows from the strong Markov property under N_x (cf Theorem 3.4) at the stopping time η^D_r. Let w ∈ W_x be such that ζ_(w) = τ(w). From Lemma 3.7, we have Hence, by Proposition 3.10. The proof is now easily completed by the usual Feynman-Kac argument:
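The elementary identity 1 − exp(−A_t) = ∫_0^t dA_s exp(−(A_t − A_s)) used in the second equality is easy to confirm numerically for a smooth nondecreasing A; for instance, with the arbitrary test function A_t = t² + t on [0, 1]:

```python
from math import exp

def lhs(A):
    """Left side of the identity, 1 - exp(-A_t), at the final time."""
    return 1.0 - exp(-A[-1])

def rhs(A):
    """Riemann-Stieltjes sum for the integral of exp(-(A_t - A_s)) dA_s,
    using the midpoint value of A on each increment."""
    total, At = 0.0, A[-1]
    for a0, a1 in zip(A, A[1:]):
        total += exp(-(At - 0.5 * (a0 + a1))) * (a1 - a0)
    return total

# Discretization of A_t = t^2 + t on [0, 1] (so A_0 = 0, A_1 = 2).
n = 100000
A = [(i / n) ** 2 + (i / n) for i in range(n + 1)]
```

The two sides agree up to discretization error, as an integration by parts in the variable a = A_s shows they must.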

The probabilistic solution of the nonlinear Dirichlet problem
In this subsection, we assume that ξ is Brownian motion in R d . The results however could easily be extended to an elliptic diffusion process in R d or on a manifold.
We say that y ∈ ∂D is regular for D^c if inf{t > 0 : ξ_t ∈ D^c} = 0, Π_y a.s.
The open set D is called regular if every point y ∈ ∂D is regular for D c . We say that a real-valued function u defined on D solves ∆u = 4u 2 in D if u is of class C 2 on D and the equality ∆u = 4u 2 holds pointwise on D.
where the notation u_{|∂D} = g means that, for every y ∈ ∂D, lim_{D∋x→y} u(x) = g(y).
Proof. First observe that, by (35), u(x) ≤ Π_x[1_{τ<∞} g(ξ_τ)], so that u is bounded in D. Let B be a ball whose closure is contained in D, and denote by τ_B the first exit time from B. From (35) and the strong Markov property of ξ at τ_B, Π_x[1_{τ<∞} g(ξ_τ)] = Π_x[Π_{ξ_{τ_B}}(1_{τ<∞} g(ξ_τ))].
By combining this with formula (35) applied with x = ξ_{τ_B}, we arrive at u(x) + 2 Π_x[∫_0^{τ_B} u(ξ_s)² ds] = Π_x[u(ξ_{τ_B})]. The function h(x) = Π_x[u(ξ_{τ_B})] is harmonic in B, so that h is of class C² and ∆h = 0 in B. Set f(x) = Π_x[∫_0^{τ_B} u(ξ_s)² ds] = ∫_B G_B(x, y) u(y)² dy, where G_B is the Green function of Brownian motion in B. Since u is measurable and bounded, Theorem 6.6 of [37] shows that f is continuously differentiable in B, and so is u since u = h − 2f. Then again by Theorem 6.6 of [37], the previous formula for f implies that f is of class C² in B and −(1/2)∆f = u² in B, which leads to the desired equation for u.
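For the record, the arithmetic behind "the desired equation for u" in the last step is the following one-line computation:

```latex
u = h - 2f,\qquad \Delta h = 0,\qquad -\tfrac{1}{2}\,\Delta f = u^{2}
\;\Longrightarrow\;
\Delta u \;=\; \Delta h - 2\,\Delta f \;=\; -2\,(-2u^{2}) \;=\; 4u^{2}
\quad\text{in } B.
```

Since B was an arbitrary ball with closure in D, this gives ∆u = 4u² throughout D.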
For the second part of the theorem, suppose first that D is bounded, and let y ∈ ∂D be regular for D c . Then, if g is continuous at y, it is well known that lim D∋x→y Π x g(ξ τ ) = g(y) .
On the other hand, we have also lim sup_{D∋x→y} Π_x[∫_0^τ u(ξ_s)² ds] = 0, since u is bounded and Π_x(τ) tends to 0 as x → y when y is regular. Thus (35) implies that lim_{D∋x→y} u(x) = g(y).
When D is unbounded, a similar argument applies after replacing D by D ∩ B, where B is now a large ball: argue as in the derivation of (37) to verify that, for x ∈ D ∩ B, u(x) + 2 Π_x[∫_0^{τ_{D∩B}} u(ξ_s)² ds] = Π_x[1_{τ≤τ_B} g(ξ_τ)] + Π_x[1_{τ_B<τ} u(ξ_{τ_B})], and then follow the same route as in the bounded case. The nonnegative solution of the problem (36) is always unique. When D is bounded, this is a consequence of the following analytic lemma. Then u ≤ v.
Proof. Set f = u − v and D′ = {x ∈ D : f(x) > 0}. If D′ is not empty, we have for every x ∈ D′. Furthermore, it follows from the assumption and the definition of D′ that lim sup_{D′∋x→z} f(x) ≤ 0 for every z ∈ ∂D′. Then the classical maximum principle implies that f ≤ 0 on D′, which is a contradiction. Proof. (i) By translation invariance we may assume that x = 0. We then use a scaling argument. For λ > 0, the law under n(de) of e_λ(s) = λ^{−1} e(λ²s) is λ^{−1} n (exercise!). It easily follows that the law under N_0 of W^{(ε)}_s(t) = ε^{−1} W_{ε⁴s}(ε²t) is ε^{−2} N_0. Then, with an obvious notation, It remains to verify that N_0(R ∩ B(0, 1)^c ≠ ∅) < ∞. If this were not true, excursion theory would imply that, P_0 a.s., infinitely many excursions of the Brownian snake exit the ball B(0, 1) before time 1. Clearly this would contradict the continuity of s → W_s under P_0.
(ii) Let x ∈ D and r > 0 be such that B̄(x, r) ⊂ D. By Corollary 3.14, we have for every y ∈ B(x, r), u(y) = N_y(1 − exp(−⟨Z^{B(x,r)}, u⟩)).
In particular, In the second inequality we used the fact that Z B(x,r) is supported on E B(x,r) ⊂ R ∩ B(x, r) c .
(iii) Let (u_n) be a sequence of nonnegative solutions of ∆u = 4u² in D such that u_n(x) → u(x) as n → ∞ for every x ∈ D. Let U be an open ball whose closure is contained in D. By Corollary 3.14, for every n ≥ 1 and x ∈ U, u_n(x) = N_x(1 − exp(−⟨Z^U, u_n⟩)).
Note that N_x(Z^U ≠ 0) < ∞ (by (i)) and that the functions u_n are uniformly bounded on ∂U (by (ii)). Hence we can pass to the limit in the previous formula and get u(x) = N_x(1 − exp(−⟨Z^U, u⟩)) for x ∈ U. The desired result then follows from Theorem 3.12.
Let us conclude this subsection with the following remark. Theorem 3.11 could be applied as well to treat parabolic analogues of the equation ∆u = 4u². To this end we need only replace the Brownian motion ξ by the space-time process (t, ξ_t). If we make this replacement and let D ⊂ R_+ × R^d be a space-time domain, then for every bounded nonnegative measurable function g on ∂D, the formula u(t, x) = N_{t,x}(1 − exp(−⟨Z^D, g⟩)) gives a solution of ∂u/∂t + (1/2)∆u − 2u² = 0 in D. Furthermore, u has boundary condition g under suitable conditions on D and g. The proof proceeds from the integral equation (35) as for Theorem 3.11.

Solutions with boundary blow-up
Proof. First note that u_1(x) < ∞ by Proposition 3.15 (i). For every n ≥ 1, set v_n(x) = N_x(1 − exp(−n⟨Z^D, 1⟩)), x ∈ D. By Theorem 3.12, v_n solves (36) with g = n. By Proposition 3.15 (iii), u_1 = lim ↑ v_n also solves ∆u = 4u² in D. The condition u_1|_{∂D} = ∞ is clear since u_1 ≥ v_n and v_n|_{∂D} = n. Finally, if v is another nonnegative solution of the problem (38), the comparison principle (Lemma 3.13) implies that v ≥ v_n for every n and so v ≥ u_1.

Proposition 3.17 Set u_2(x) = N_x(R ∩ D^c ≠ ∅) for x ∈ D. Then u_2 is the maximal nonnegative solution of ∆u = 4u² in D.

Proof. First note that R is connected N_x a.e., as the range of the continuous mapping s → W_s. It follows that we may deal separately with each connected component of D, and thus assume that D is a domain. Then we can easily construct a sequence (D_n) of bounded regular subdomains of D such that D = lim ↑ D_n and D̄_n ⊂ D_{n+1} for every n. Set v_n(x) = N_x(Z^{D_n} ≠ 0), ṽ_n(x) = N_x(R ∩ D_n^c ≠ ∅) for x ∈ D_n. By the support property of the exit measure, it is clear that v_n ≤ ṽ_n. We also claim that ṽ_{n+1}(x) ≤ v_n(x) for x ∈ D_n. To verify this, observe that on the event {R ∩ D_{n+1}^c ≠ ∅} there exists a path W_s that hits D_{n+1}^c. For this path W_s, we must have τ_{D_n}(W_s) < ζ_s (here τ_{D_n} stands for the exit time from D_n), and it follows from the properties of the Brownian snake that This follows easily from the fact that the event {R ∩ D^c ≠ ∅} is equal, N_x a.e., to the intersection of the events {R ∩ D_n^c ≠ ∅}. By Proposition 3.16, v_n solves ∆u = 4u² in D_n. It then follows from (39) and Proposition 3.15 (iii) that u_2 solves ∆u = 4u² in D. Finally, if u is another nonnegative solution in D, the comparison principle implies that u ≤ v_n in D_n and it follows that u ≤ u_2.
Example. Let us apply the previous proposition to compute N_x(0 ∈ R) for x ≠ 0. By rotational invariance and the same scaling argument as in the proof of Proposition 3.15 (i), we get N_x(0 ∈ R) = C|x|^{−2} with a nonnegative constant C. On the other hand, by Proposition 3.17, we know that u(x) = N_x(0 ∈ R) solves ∆u = 4u² in R^d \ {0}. A short calculation, using the expression of the Laplacian for a radial function, shows that the only possible values of C are C = 0 and C = 2 − d/2. Since u is the maximal solution, we conclude that N_x(0 ∈ R) = (2 − d/2)|x|^{−2} if d ≤ 3, whereas N_x(0 ∈ R) = 0 if d ≥ 4. In particular, points are polar (in the sense that they are not hit by the range) if and only if d ≥ 4.
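The "short calculation" can be carried out in exact rational arithmetic: for u(x) = C|x|^{−2}, the radial formula Δ(r^a) = a(a + d − 2) r^{a−2} gives Δu = C(8 − 2d)|x|^{−4}, so Δu = 4u² forces C(8 − 2d) = 4C², whose solutions are C = 0 and C = 2 − d/2 (positive only for d ≤ 3). A sketch:

```python
from fractions import Fraction

def laplacian_coeff(C, d):
    """Coefficient of |x|^{-4} in the Laplacian of u = C |x|^{-2}:
    Delta(r^a) = a(a + d - 2) r^{a-2} with a = -2 gives
    Delta u = C * (-2)(d - 4) r^{-4} = C (8 - 2d) r^{-4}."""
    return C * (8 - 2 * d)

solutions = {}
for d in range(1, 7):
    # Solve C * (8 - 2d) == 4 C^2 over the rationals.
    candidates = [Fraction(0)]
    if d != 4:
        candidates.append(Fraction(8 - 2 * d, 4))  # equals 2 - d/2
    solutions[d] = [C for C in candidates
                    if laplacian_coeff(C, d) == 4 * C * C]
```

For d = 1, 2, 3 the nonzero root 2 − d/2 is positive; for d = 4 it degenerates to 0 and for d ≥ 5 it is negative, so only C = 0 is admissible, in agreement with the polarity of points for d ≥ 4.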
Let us conclude with some remarks. First note that, if D is bounded and regular (the boundedness is superfluous here), the function u_2 of Proposition 3.17 also satisfies u_2|_{∂D} = +∞. This is obvious since u_2 ≥ u_1. We may ask the following two questions.
1. If D is regular, is it true that u_1 = u_2? (uniqueness of the solution with boundary blow-up)
2. For a general domain D, when is it true that u_2|_{∂D} = +∞? (existence of a solution with boundary blow-up)
A complete answer to question 2 is provided in [8] (see also [27]). A general answer to question 1 is still an open problem (see [27] and the references therein for partial results).
Bibliographical notes. Much of this section is taken from [27], where additional references about the Brownian snake can be found. The connections with partial differential equations that are discussed in subsections 3.5 and 3.6 were originally formulated by Dynkin [13] in the language of superprocesses (see Perkins [35] for a recent account of the theory of superprocesses). These connections are still the subject of an active research: See Dynkin's books [14], [15]. Mselati's thesis [33] gives an application of the Brownian snake to the classification and probabilistic representation of the solutions of ∆u = u 2 in a smooth domain.