A survey of random processes with reinforcement

The models surveyed include generalized Pólya urns, reinforced random walks, interacting urn models, and continuous reinforced processes. Emphasis is on methods and results, with sketches provided of some proofs. Applications are discussed in statistics, biology, economics, and a number of other areas.


Introduction
In 1988 I wrote a Ph.D. thesis entitled "Random Processes with Reinforcement". The first section was a survey of previous work: it ran under ten pages. Twenty years later, the field has grown substantially. In some sense it is still a collection of disjoint techniques. The few difficult open problems that have been solved have not led to broad theoretical advances. On the other hand, some nontrivial mathematics is being put to use in a fairly coherent way by communities of social and biological scientists. Though not full-time mathematicians, these scientists are mathematically apt, and continue to draw on what theory there is. I suspect much time is lost, Google notwithstanding, as they sift through the existing literature and folklore in search of the right shoulders to stand on. My primary motivation for writing this survey is to create universal shoulders: a centralized base of knowledge of the three or four most useful techniques, in a context of applications broad enough to speak to any of half a dozen constituencies of users.
Such an account should contain several things. It should contain a discussion of the main results and methods, with sufficient sketches of proofs to give a pretty good idea of the mathematics involved. It should contain precise pointers to more detailed statements and proofs, and to various existing versions of the results. It should be historically accurate enough not to insult anyone still living, while providing a modern editorial perspective. In its choice of applications it should winnow out the trivial while not discarding what is simple but useful.
The resulting survey will not have the mathematical depth of many of the Probability Surveys. There is only one nexus of techniques, namely the stochastic approximation / dynamical system approach, which could be called a theory and which contains its own terminology, constructions, fundamental results, compelling open problems and so forth. There would have been two, but it seems that the multitype branching process approach pioneered by Athreya and Karlin has been taken pretty much to completion by recent work of S. Janson.
There is one more area that seems fertile if not yet coherent, namely reinforcement in continuous time and space. Continuous reinforcement processes are to reinforced random walks what Brownian motion is to simple random walk, that is to say, there are new layers of complexity. Even excluding the hot new subfield of SLE, which could be considered a negatively reinforced process, there are several other self-interacting diffusions and more general continuous-time processes that open up mathematics of some depth and practical relevance. These are not yet at the mature "surveyable" state, but a section has been devoted to an in-progress glimpse of them.
The organization of the rest of the survey is as follows. Section 2 provides an overview of the basic models, primarily urn models, and corresponding known methods of analysis. Section 3 is devoted to urn models, surveying what is known about some common variants. Section 4 collects applications of these models from a wide variety of disciplines. The focus is on useful application rather than on new mathematics. Section 5 is devoted to reinforced random walks. These are more complicated than urn models and therefore less likely to be taken literally in applications, but have been the source of many of the recognized open problems in reinforcement theory. Section 6 introduces continuous reinforcement processes as well as negative reinforcement. This includes the self-avoiding random walk and its continuous limits, which are well studied in the mathematical physics literature, though not yet thoroughly understood.

Overview of models and methods
Dozens of processes with reinforcement will be discussed in the remainder of this survey. A difficult organizational issue has been whether to interleave general results and mathematical infrastructure with detailed descriptions of individual processes, or instead to lay out the bulk of the mathematics first, leaving only some refinements to be discussed along with specific processes and applications. Because of the way research has developed, the existing literature is organized mostly by application; indeed, many existing theoretical results are very much tailored to specific applications and are not easily discussed abstractly. It is, however, possible to describe several distinct approaches to the analysis of reinforcement processes. This section is meant to do so, and to serve as a standalone synopsis of available methodology. Thus, only the most basic urn processes and reinforced random walks will be introduced in this section: just enough to fuel the discussion of mathematical infrastructure. Four main analytical methods are then introduced: exchangeability, branching process embedding, stochastic approximation via martingale methods, and results on perturbed dynamical systems that extend the stochastic approximation results. Prototypical theorems are given in each of these four sections, and pointers are given to later sections where further refinements arise.

Some basic models
The basic building block for reinforced processes is the urn model. A (single-urn) urn model has an urn containing a number of balls of different types. The set of types may be finite or, in the more general models, countably or uncountably infinite; the types are often taken to be colors, for ease of visualization. The number of balls of each type may be a nonnegative integer or, in the more general models, a nonnegative real number.
At each time n = 1, 2, 3, . . . a ball is drawn from the urn and its type noted. The contents of the urn are then altered, depending on the type that was drawn. In the most straightforward models, the probability of choosing a ball of a given type is equal to the proportion of that type in the urn, but in more general models this may be replaced by a different assumption, perhaps in a way that depends on the time or some aspect of the past, there may be more than one ball drawn, there may be immigration of new types, and so forth.
In this section, the discussion is limited to generalized Pólya urn models, in which a single ball is drawn each time uniformly from the contents of the urn. Sections 3 and 4 review a variety of more general single-urn models. The most general discrete-time models considered in the survey have multiple urns that interact with each other. The simplest among these are mean-field models, in which an urn interacts equally with all other urns, while the more complex have either a spatial structure that governs the interactions or a stochastically evolving interaction structure. Some applications of these more complex models are discussed in Section 4.6. We now define the processes discussed in this section.
Some notation in effect throughout this survey is as follows. Let (Ω, F, P) be a probability space on which are defined countably many IID random variables uniform on [0, 1]. This is all the randomness we will need. Denote these random variables by {U_{nk} : n, k ≥ 1} and let F_n denote the σ-field σ(U_{mk} : m ≤ n) that they generate. The variables {U_{nk}}_{k≥1} are the sources of randomness used to go from step n − 1 to step n, and F_n is the information up to time n. In this section we will need only one uniform random variable U_n at each time n, so we let U_n denote U_{n1}. A notation that will be used throughout is 1_A for the indicator function of the event A, that is, 1_A = 1 on A and 1_A = 0 on the complement of A. Vectors will be typeset in boldface, with their coordinates denoted by corresponding lightface subscripted variables; for example, a random sequence of d-dimensional vectors {X_n : n = 1, 2, . . .} may be written out as X_1 := (X_{11}, . . . , X_{1d}) and so forth. Expectations E(·) always refer to the measure P.

Pólya's urn
The original Pólya urn model, which first appeared in [EP23; Pól31], has an urn that begins with one red ball and one black ball. At each time step, a ball is chosen at random and put back in the urn along with one extra ball of the color drawn, this process being repeated ad infinitum. We construct this recursively: let R_0 = a and B_0 = b for some constants a, b > 0; for n ≥ 1, let R_{n+1} = R_n + 1_{U_{n+1} ≤ X_n} and B_{n+1} = B_n + 1_{U_{n+1} > X_n}, where X_n := R_n/(R_n + B_n). We interpret R_n as the number of red balls in the urn at time n and B_n as the number of black balls at time n. Uniform drawing corresponds to drawing a red ball with probability X_n independently of the past; this probability is generated by our source of randomness via the random variable U_{n+1}, with the event {U_{n+1} ≤ X_n} being the event of drawing a red ball at step n. This model was introduced by Pólya to model, among other things, the spread of infectious disease. The following is the main result concerning this model. The best known proofs, whose origins are not certain [Fre65; BK64], are discussed below.
Theorem 2.1. The random variables X_n converge almost surely to a limit X. The distribution of X is β(a, b), that is, it has density C x^{a−1} (1 − x)^{b−1}, where C = Γ(a + b)/(Γ(a)Γ(b)). In particular, when a = b = 1 (the case in [EP23]), the limit variable X is uniform on [0, 1].
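Theorem 2.1 is easy to check numerically. The following sketch (in Python; the helper name and parameters are my own) runs many independent copies of the urn and looks at the spread of the final proportions, which for a = b = 1 should be approximately uniform on [0, 1], hence with variance near 1/12:

```python
import random

def polya_urn(a, b, steps, rng):
    """Simulate Polya's urn starting with a red and b black balls.

    Each draw is uniform over the urn's contents and the drawn color
    gains one ball.  Returns the final proportion of red balls.
    """
    red, black = float(a), float(b)
    for _ in range(steps):
        if rng.random() < red / (red + black):
            red += 1
        else:
            black += 1
    return red / (red + black)

rng = random.Random(0)
# With a = b = 1 the limit X is uniform on [0, 1], so across many
# independent runs the final proportions spread out rather than
# concentrate: their sample variance approaches 1/12.
finals = [polya_urn(1, 1, 2000, rng) for _ in range(500)]
mean = sum(finals) / len(finals)
var = sum((x - mean) ** 2 for x in finals) / len(finals)
```

A deterministic limit would show up here as a sample variance near zero; the random β(a, b) limit shows up as a variance near 1/12 for a = b = 1.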
The remarkable property of Pólya's urn is that it has a random limit. Those outside of the field of probability often require a lengthy explanation in order to understand this. The phenomenon has been rediscovered by researchers in many fields and given many names, such as "lock-in" (chiefly in economic models) and "self organization" (physical models and automata).

Generalized Pólya urns
Let us generalize Pólya's urn in several quite natural ways. First, take the number of colors to be any integer k ≥ 2. The number of balls of color j at time n will be denoted R_{nj}. Secondly, fix real numbers {A_{ij} : 1 ≤ i, j ≤ k} satisfying A_{ij} ≥ −δ_{ij}, where δ_{ij} is the Kronecker delta. When a ball of color i is drawn, it is replaced in the urn along with A_{ij} balls of color j for 1 ≤ j ≤ k. The reason to allow A_{ii} ∈ [−1, 0] is that we may think of not replacing (or not entirely replacing) the ball that is drawn. Formally, the evolution of the vector R_n is defined by letting X_n := R_n / Σ_{j=1}^k R_{nj} and setting R_{n+1,j} = R_{nj} + A_{ij} for the unique i with Σ_{t<i} X_{nt} < U_{n+1} ≤ Σ_{t≤i} X_{nt}. This guarantees that, for each i, the event {R_{n+1,j} = R_{nj} + A_{ij} for all j} has probability X_{ni}. A further generalization is to let {Y_n} be IID random matrices with mean A and to take R_{n+1,j} = R_{nj} + (Y_n)_{ij}, where again i satisfies Σ_{t<i} X_{nt} < U_{n+1} ≤ Σ_{t≤i} X_{nt}.
I will use the term generalized Pólya urn scheme (GPU) to refer to the model where the reinforcement is A_{ij}, and the term GPU with random increments when the reinforcement (Y_n)_{ij} involves further randomization. Greater generalizations are possible; see the discussion of time-inhomogeneity in Section 3.2. Various older urn models, such as the Ehrenfest urn model [EE07], can be cast as generalized Pólya urn schemes. The earliest variant I know of was formulated by Bernard Friedman [Fri49]. In Friedman's urn, there are two colors; the color drawn is reinforced by α > 0 and the color not drawn is reinforced by β. This is a GPU whose matrix A is the 2 × 2 matrix with diagonal entries α and off-diagonal entries β.

Let X_n denote X_{n1}, the proportion of red balls (balls of color 1). Friedman analyzed three special cases. Later, David Freedman [Fre65] gave a general analysis of Friedman's urn when α > β > 0. Freedman's first result is as follows (the paper goes on to find regions of Gaussian and non-Gaussian behavior for X_n − 1/2).

Theorem 2.2 ([Fre65, Corollaries 3.1, 4.1 and 5.1]). The proportion X_n of red balls converges almost surely to 1/2.

What is remarkable about Theorem 2.2 is that the proportion of red balls does not have a random limit. It strikes many people as counterintuitive, after coming to grips with Pólya's urn, that reinforcing with, say, 1000 balls of the color drawn and 1 of the opposite color should push the ratio eventually to 1/2 rather than to a random limit or to {0, 1} almost surely. The mystery evaporates rapidly with some back-of-the-napkin computations, as discussed in Section 2.4, or with the following observation.
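Theorem 2.2 can likewise be checked by simulation. In the sketch below (Python; the function name and the choice α = 2, β = 1 are mine), the drawn color is reinforced twice as strongly as the other, yet every run ends close to 1/2:

```python
import random

def friedman_urn(alpha, beta, steps, rng):
    """Friedman's urn: reinforce the drawn color by alpha balls
    and the color not drawn by beta balls; start from (1, 1).
    Returns the final proportion of red balls."""
    red, black = 1.0, 1.0
    for _ in range(steps):
        if rng.random() < red / (red + black):
            red += alpha
            black += beta
        else:
            black += alpha
            red += beta
    return red / (red + black)

rng = random.Random(1)
# alpha = 2, beta = 1: despite the drawn color being reinforced twice
# as strongly, the proportion of red is driven to 1/2 (Theorem 2.2),
# in contrast with the random limit of Polya's urn (beta = 0).
props = [friedman_urn(2, 1, 20000, rng) for _ in range(20)]
```

Setting beta = 0 in the same code recovers (a rescaled) Pólya urn and the deterministic limit disappears.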
Consider now a generalized Pólya urn with all the A_{ij} strictly positive. The expected number of balls of color j added to the urn at time n, given the past, is Σ_i X_{ni} A_{ij}. By Perron-Frobenius theory, A has a unique simple eigenvalue of maximal modulus, whose left unit eigenvector π has positive coordinates, so it should not after all be surprising that X_n converges to π. The following theorem, from [AK68, Equation (33)], will be proved in Section 2.3.

Theorem 2.3. In a GPU with all A_{ij} > 0, the vector X_n converges almost surely to π, where π is the unique positive left eigenvector of A normalized by |π| := Σ_i π_i = 1.
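The convergence in Theorem 2.3 can be illustrated with a small experiment (Python; the matrix A and all names are my own choices): simulate the GPU for a while, then compare the empirical proportions with the normalized left Perron eigenvector obtained by power iteration.

```python
import random

def gpu_step(counts, A, rng):
    """One draw from a generalized Polya urn: pick color i with
    probability proportional to counts[i], then add A[i][j] balls
    of each color j.  Returns the updated count vector."""
    total = sum(counts)
    u = rng.random() * total
    acc = 0.0
    for i, c in enumerate(counts):
        acc += c
        if u <= acc:
            break
    return [c + a for c, a in zip(counts, A[i])]

def left_perron(A, iters=200):
    """Normalized left Perron eigenvector of A, by power iteration
    (multiplying row vectors on the right by A)."""
    k = len(A)
    v = [1.0 / k] * k
    for _ in range(iters):
        w = [sum(v[i] * A[i][j] for i in range(k)) for j in range(k)]
        s = sum(w)
        v = [x / s for x in w]
    return v

A = [[2.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 2.0]]
rng = random.Random(2)
counts = [1.0, 1.0, 1.0]
for _ in range(50000):
    counts = gpu_step(counts, A, rng)
total = sum(counts)
X = [c / total for c in counts]   # empirical proportions X_n
pi = left_perron(A)               # Perron prediction
```

For this (symmetric) choice of A the Perron eigenvalue is 3 + √2 and π ≈ (0.293, 0.414, 0.293), which the empirical proportions approach.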
Remark. When some of the A ij vanish, and in particular when the matrix A has a nontrivial Jordan block for its Perron-Frobenius eigenvalue, then more subtleties arise. We will discuss these in Section 3.1 when we review some results of S. Janson.

Reinforced random walk
The first reinforced random walk appearing in the literature was the edge-reinforced random walk (ERRW) of [CD87]. This is a stochastic process defined as follows. Let G be a locally finite, connected, undirected graph with vertex set V and edge set E. Let v ∼ w denote the neighbor relation {v, w} ∈ E(G). Define a stochastic process X_0, X_1, X_2, . . . taking values in V(G) by the following transition rule. Let G_n denote the σ-field σ(X_1, . . . , X_n). Let X_0 = v and for n ≥ 0, let

P(X_{n+1} = w | G_n) = a_n(w, X_n) / Σ_{y∼X_n} a_n(y, X_n),     (2.1)

where a_n(x, y) is one plus the number of previous times the edge {x, y} has been traversed (in either direction):

a_n(x, y) := 1 + Σ_{k=1}^{n−1} 1_{{X_k, X_{k+1}} = {x, y}}.     (2.2)

Formally, we may construct such a process by ordering the neighbor set of each vertex v arbitrarily as g_1(v), . . . , g_{d(v)}(v) and taking X_{n+1} = g_i(X_n) if

Σ_{t=1}^{i−1} a_n(g_t(X_n), X_n) / Σ_{t=1}^{d(X_n)} a_n(g_t(X_n), X_n) ≤ U_n < Σ_{t=1}^{i} a_n(g_t(X_n), X_n) / Σ_{t=1}^{d(X_n)} a_n(g_t(X_n), X_n).     (2.3)

In the case that G is a tree, it is not hard to find multi-color Pólya urns embedded in the ERRW. For any fixed vertex v, the occupation measures of the edges adjacent to v, when sampled at the return times to v, form a Pólya urn process {X^{(v)}_n : n ≥ 0}. A lemma from [Pem88a] formalizing this observation begins the analysis in Section 5.1 of ERRW on a tree. The vertex-reinforced random walk, or VRRW, also due to Diaconis and introduced in [Pem88b], is similarly defined except that the edge weights a_n(g_t(X_n), X_n) in equation (2.3) are replaced by the occupation measures at the destination vertices:

a_n(g_t(X_n)) := 1 + Σ_{k=1}^{n} 1_{X_k = g_t(X_n)}.     (2.4)

For VRRW, for ERRW on a graph with cycles, and for the other variants of reinforced random walk defined later, there is no direct representation as a product of Pólya urn processes or even generalized Pólya urn processes, but one may find embedded urn processes that interact nontrivially. We now turn to the various methods of analyzing these processes. These are ordered from the least to the most generalizable.
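The ERRW transition rule above is straightforward to implement. Below is a minimal sketch (Python; the graph, names, and parameters are mine) of ERRW on a triangle, maintaining one weight per undirected edge:

```python
import random
from collections import defaultdict

def errw(adj, start, steps, rng):
    """Edge-reinforced random walk on an undirected graph.

    adj maps each vertex to its list of neighbors.  Each edge starts
    with weight 1; traversing an edge (in either direction) adds 1
    to its weight.  The walk steps to a neighbor with probability
    proportional to the current weight of the connecting edge."""
    weight = defaultdict(lambda: 1)        # keyed by frozenset({v, w})
    path = [start]
    v = start
    for _ in range(steps):
        nbrs = adj[v]
        ws = [weight[frozenset((v, w))] for w in nbrs]
        total = sum(ws)
        u = rng.random() * total
        acc = 0
        for w, wt in zip(nbrs, ws):
            acc += wt
            if u <= acc:
                break
        weight[frozenset((v, w))] += 1     # reinforce the traversed edge
        path.append(w)
        v = w
    return path, weight

# ERRW on a triangle
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rng = random.Random(3)
path, weight = errw(adj, 0, 1000, rng)
```

Since each step adds exactly one unit of weight, the total edge weight after n steps is the number of edges plus n, which gives a quick sanity check on the implementation.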

Exchangeability
There are several ways to see that the sequence {X n } in the original Pólya's urn converges almost surely. The prettiest analysis of Pólya's urn is based on the following lemma.
Lemma 2.5. The sequence of colors drawn from Pólya's urn is exchangeable. In other words, letting C_n = 1 if R_n = R_{n−1} + 1 (a red ball is drawn) and C_n = 0 otherwise, the probability of observing the sequence (C_1 = ε_1, . . . , C_n = ε_n) depends only on how many zeros and ones there are in the sequence (ε_1, . . . , ε_n), but not on their order.
Proof: Let k denote Σ_{i=1}^n ε_i. One may simply compute the probabilities:

P(C_1 = ε_1, . . . , C_n = ε_n) = [a(a+1) · · · (a+k−1)] [b(b+1) · · · (b+n−k−1)] / [(a+b)(a+b+1) · · · (a+b+n−1)],     (2.5)

which indeed depends only on k.

It follows by de Finetti's Theorem [Fel71, Section VII.4] that X_n → X almost surely, and that conditioned on X = p, the {C_i} are distributed as independent Bernoulli random variables with mean p. The distribution of the limiting random variable X stated in Theorem 2.1 is then a consequence of the formula (2.5) (see, e.g., [Fel71, VII.4]).

The method of exchangeability is neither robust nor widely applicable: the fact that the sequence of draws is exchangeable appears to be a stroke of luck. The method would not merit a separate subsection were it not for two further appearances. The first is in the statistical applications in Section 4.2 below. The second is in ERRW. This process turns out to be Markov-exchangeable in the sense of [DF80], which allows an explicit analysis and leads to some interesting open questions, also discussed in Section 5 below.
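Lemma 2.5 can be verified symbolically for small n (a Python sketch; the function name is mine): computing the probability of each draw sequence step by step with exact rational arithmetic shows that all orderings with the same number of red draws receive the same probability.

```python
from fractions import Fraction
from itertools import permutations

def seq_prob(eps, a, b):
    """Exact probability of observing the draw sequence eps
    (1 = red, 0 = black) from a Polya urn starting with a red
    and b black balls, computed draw by draw."""
    p = Fraction(1)
    red, black = a, b
    for e in eps:
        if e:
            p *= Fraction(red, red + black)
            red += 1
        else:
            p *= Fraction(black, red + black)
            black += 1
    return p

# All 10 orderings of three reds and two blacks from the (1, 1) urn
# receive the same probability, namely 1*2*3 * 1*2 / (2*3*4*5*6) = 1/60.
probs = {seq_prob(eps, 1, 1) for eps in set(permutations((1, 1, 1, 0, 0)))}
```

The single value in `probs` matches the product formula (2.5) with a = b = 1, n = 5, k = 3.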

Embedding
Embedding in a multitype branching process

Let {Z(t) := (Z_1(t), . . . , Z_k(t))}_{t≥0} be a branching process in continuous time with k types and the following branching mechanism. At all times t, each of the |Z(t)| := Σ_{i=1}^k Z_i(t) particles behaves independently, and a particle of type i branches in the time interval (t, t + dt] with probability a_i dt. When a particle of type i branches, the collection of particles replacing it may be counted according to type, and the law of this random integer k-vector is denoted μ_i. For any a_1, . . . , a_k > 0 and any μ_1, . . . , μ_k with finite mean, such a process is known to exist and has been constructed in, e.g., [INW66; Ath68]. We assume henceforth, for nondegeneracy, that it is not possible to get from |Z(t)| > 0 to |Z(t)| = 0 and that it is possible to go from |Z(t)| = 1 to |Z(t)| = n for all sufficiently large n. We will often also assume that the states form a single irreducible aperiodic class.
Let 0 < τ_1 < τ_2 < · · · denote the times of successive branching; our assumptions imply that for all n, τ_n < ∞ = sup_m τ_m. We examine the process X_n := Z(τ_n). The evolution of {X_n} may be described as follows. Let F_n = σ(X_1, . . . , X_n). Then, conditional on F_n, the next particle to branch is of type i with probability

a_i X_{ni} / Σ_{j=1}^k a_j X_{nj},

this being the probability that the first exponential clock to ring belongs to a particle of type i. When a_i = 1 for all i, the type of the next particle to branch is distributed proportionally to its representation in the population. Thus, {X_n} is a GPU with random increments. If we further require each μ_i to be deterministic, namely a point mass at some vector (A_{i1}, . . . , A_{ik}), then we have a classical GPU.
The first people to have exploited this correspondence to prove facts about GPU's were Athreya and Karlin in [AK68]. On the level of strong laws, results about Z(t) transfer immediately to results about X n = Z(τ n ). Thus, for example, the fact that Z(t)e −λ1t converges almost surely to a random multiple of the Perron-Frobenius eigenvector of the mean matrix A [Ath68, Theorem 1] gives a proof of Theorem 2.3. Distributional results about Z(t) do not transfer to distributional results about X n without some further regularity assumptions; see Section 3.1 for further discussion.

Embedding via exponentials
A special case of the above multitype branching construction yields the classical Pólya urn: each particle independently gives birth at rate 1 to a new particle of the same color (or, equivalently, disappears and gives birth to two particles of the original color). This provides yet another means of analysis of the classical Pólya urn, and new generalizations follow. In particular, the collective birth rate of color i may be taken to be a function f(Z_i) depending on the number of particles of color i (but on no other color). Sampling at birth times then yields the dynamic X_{n+1} = X_n + e_i with probability f(X_{ni}) / Σ_{j=1}^k f(X_{nj}). Herman Rubin was the first to recognize that this dynamic may be de-coupled via the above embedding into independent exponential processes. His observations were published by B. Davis [Dav90] and are discussed in Section 3.2 in connection with a generalized urn model.
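Rubin's de-coupling is easy to turn into an exact sampler for two colors (a Python sketch; the function name and parameters are mine): each color carries its own exponential clock, sped up by f of its current count, and the observed draw sequence is just the merge of the two event streams in time order. When f is superlinear enough that Σ_m 1/f(m) < ∞, one color monopolizes the urn after finitely many draws of the other, as in Rubin's theorem reported in [Dav90]; the sketch uses f(m) = 2^m to make the monopoly visible quickly.

```python
import random

def rubin_draws(f, init, n_draws, rng):
    """Sample the first n_draws colors of a two-color urn in which a
    color currently having m balls is drawn with probability
    proportional to f(m), via Rubin's exponential embedding: color i's
    successive draws occur at partial sums of independent Exp(f(m))
    waiting times, and the two event streams are merged in time order."""
    counts = list(init)
    # next_time[i] = time at which color i would next be drawn
    next_time = [rng.expovariate(f(counts[i])) for i in (0, 1)]
    draws = []
    for _ in range(n_draws):
        i = 0 if next_time[0] < next_time[1] else 1
        draws.append(i)
        counts[i] += 1
        next_time[i] += rng.expovariate(f(counts[i]))
    return draws

rng = random.Random(4)
# f(m) = 2**m: the total clock sum of each color converges, so after
# some point every draw is of the same (random) winning color.
draws = rubin_draws(lambda m: 2.0 ** m, (1, 1), 200, rng)
```

The losing color here typically receives only a handful of draws, which is the finite-time shadow of the almost-sure monopoly.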
To illustrate the versatility of embedding, I include an interesting, if not particularly consequential, application. The so-called OK Corral process is a shootout in which, at time n, there are X_n good cowboys and Y_n bad cowboys. Each cowboy is equally likely to land the next successful shot, killing a cowboy on the opposite side. Thus the transition probabilities are (X_{n+1}, Y_{n+1}) = (X_n − 1, Y_n) with probability Y_n/(X_n + Y_n) and (X_{n+1}, Y_{n+1}) = (X_n, Y_n − 1) with probability X_n/(X_n + Y_n). The process stops when (X_n, Y_n) reaches (0, S) or (S, 0) for some integer S > 0. Of interest is the distribution of S, starting from, say, the state (N, N). It turns out (see [KV03]) that the trajectories of the OK Corral process are distributed exactly as time-reversals of the Friedman urn process in which α = 0 and β = 1, that is, a ball is added of the color opposite to the color drawn. The correct scaling of S was known to be N^{3/4} [WM98; Kin99]. By embedding in a branching process, Kingman and Volkov were able to compute the leading-term asymptotics for the individual probabilities P(S = k) with k on the order of N^{3/4}.
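The cheapest way to see the N^{3/4} scaling is direct simulation (Python; names and parameters are mine):

```python
import random

def ok_corral(n, rng):
    """Run the OK Corral shootout from (n, n) until one side is
    wiped out; return the number S of survivors on the other side."""
    x = y = n
    while x > 0 and y > 0:
        # the next successful shot is fired by a uniformly chosen cowboy
        if rng.random() < y / (x + y):
            x -= 1          # a bad cowboy hits a good one
        else:
            y -= 1
    return max(x, y)

rng = random.Random(5)
n = 400
runs = [ok_corral(n, rng) for _ in range(200)]
avg = sum(runs) / len(runs)
# S is of order n**0.75 (about 89 for n = 400): far more than n**0.5,
# far fewer than n.
```

The average survivor count sits well below N but well above sqrt(N), consistent with the N^{3/4} scaling of [WM98; Kin99].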

Martingale methods and stochastic approximation
Let {X_n : n ≥ 0} be a stochastic process in Euclidean space R^d, adapted to a filtration {F_n}. Suppose that X_n satisfies

X_{n+1} − X_n = n^{−1} (F(X_n) + ξ_{n+1} + R_n),     (2.6)

where F is a vector field on R^d, E(ξ_{n+1} | F_n) = 0, and the remainder terms R_n ∈ F_n go to zero and satisfy Σ_{n=1}^∞ n^{−1} |R_n| < ∞ almost surely. Such a process is known as a stochastic approximation process after [RM51], who used this to approximate the root of an unknown function in the setting where evaluation queries may be made but the answers are noisy.

Stochastic approximations arise in urn processes for the following reason. The probability distributions Q_n governing the color of the next ball chosen are typically defined to depend on the content vector R_n only via its normalization X_n. If b new balls are added to N existing balls, the resulting increment X_{n+1} − X_n is exactly (b/(b+N)) (Y_n − X_n), where Y_n is the normalized vector of added balls. Since b is of constant order and N is of order n, the mean increment is of order n^{−1} times the conditional mean of Y_n − X_n, which is a function of X_n alone. Defining ξ_{n+1} to be the martingale increment X_{n+1} − E(X_{n+1} | F_n) recovers (2.6).

Various recent analyses have allowed scaling such as n^{−γ} in place of n^{−1} in equation (2.6) for 1/2 < γ ≤ 1, or, more generally, in place of n^{−1}, any constants γ_n satisfying

Σ_n γ_n = ∞     (2.7)

and

Σ_n γ_n² < ∞.     (2.8)

These more general schemes do not arise in urn and related reinforcement processes, though some of these processes require the slightly greater generality where γ_n is a random variable in F_n with γ_n = Θ(1/n) almost surely. Because a number of available results are not known to hold under (2.7)-(2.8), the term stochastic approximation will be reserved for processes satisfying (2.6). Stochastic approximations arising from urn models with d colors have the property that X_n lies in the simplex Δ^{d−1} := {x ∈ (R^+)^d : Σ_{i=1}^d x_i = 1}. In the two-color case (d = 2), the X_n take values in [0, 1] and F is a univariate function on [0, 1].
We discuss this case now, then in the next subsection take up the geometric issues arising when d ≥ 3.
Lemma 2.6. Suppose {X_n} satisfies (2.6) and that F < −δ < 0 or F > δ > 0 on an interval (a_0, b_0). Then for any [a, b] ⊂ (a_0, b_0), almost surely X_n ∈ [a, b] for only finitely many n; in particular, [a, b] is almost surely disjoint from the limit set of {X_n}.

Proof: By symmetry we need only consider the case F < −δ on (a_0, b_0). There is a semi-martingale decomposition X_n = T_n + Z_n, where T_n is the predictable part and Z_n = Σ_{k=1}^n γ_k ξ_k is the martingale part of X_n. Square summability of the scaling constants (2.8) implies that Z_n converges almost surely. By assumption, the partial sums Σ_{k≤n} k^{−1} R_k converge almost surely. Thus there is an almost surely finite N(ω) beyond which the combined fluctuations of the martingale and remainder terms are smaller than the gaps a − a_0 and b_0 − b. When N is sufficiently large, the trajectory {X_{N+k}} may not jump from [a, b] to the right of b_0 nor from the left of a_0 to [a, b]. The lemma then follows from the observation that for n > N, the trajectory, if started in [a, b], must exit [(a + a_0)/2, b] to the left and may then never return to [a, b].
Corollary 2.7. If F is continuous then X n converges almost surely to the zero set of F .
Proof: consider the sub-intervals [a, b] of intervals (a 0 , b 0 ) on which F > δ or F < −δ. Countably many of these cover the complement of the zero set of F and each is almost surely excluded from the limit set of {X n }.
This generalizes a result of [HLS80], who generalized Pólya's urn so that the probability of drawing a red ball is not the proportion X_n of red balls in the urn but f(X_n) for some prescribed urn function f. This leads to a stochastic approximation process with F(x) = f(x) − x. They also derived convergence results for discontinuous F (the arguments for the continuous case work unless points where F oscillates in sign are dense in an interval) and showed the following.

Theorem 2.8 ([HLS80, Theorem 4.1]). Suppose there is a point p and an ε > 0 with F(p) = 0, F > 0 on (p − ε, p), and F < 0 on (p, p + ε). Then P(X_n → p) > 0. Similarly, if F < 0 on (0, ε) or F > 0 on (1 − ε, 1), then there is a positive probability of convergence to 0 or 1, respectively.
Proof, if F is continuous: Suppose 0 < p < 1 satisfies the hypotheses of the theorem. By Corollary 2.7, X_n converges to the union of {p} and the complement of (p − ε, p + ε). On the other hand, the semi-martingale decomposition shows that if X_n is in a small enough neighborhood of p and n is sufficiently large, then {X_{n+k}} cannot escape (p − ε, p + ε). The cases p = 0 and p = 1 are similar.
It is typically possible to find more martingales, special to the problem at hand, that help to prove such things. For the Friedman urn in the case α > 3β, it is shown in [Fre65, Theorem 3.1] that the quantity Y_n := C_n (R_n − B_n) is a martingale when {C_n} are constants asymptotic to n^{−ρ} for ρ := (α − β)/(α + β). Similar computations for higher moments show that lim inf Y_n > 0, whence R_n − B_n = Θ(n^ρ).
Much recent effort has been spent obtaining some kind of general hypotheses under which convergence can be shown not to occur at points from which the process is being "pushed away". Intuitively, it is the noise of the process that prevents it from settling down at an unstable zero of F , but it is difficult to find the right conditions on the noise and connect them rigorously to destabilization of unstable equilibria. The proper context for a full discussion of this is the next subsection, in which the geometry of vector flows and their stochastic analogues is discussed, but we close here with a one-dimensional result that underlies many of the multi-dimensional results. The result was proved in various forms in [Pem88b;Pem90a].

Theorem 2.9 ([Pem88b; Pem90a]). Suppose {X_n} satisfies (2.6) and that p ∈ (0, 1) is a zero of F from which the drift pushes away, that is, F ≤ 0 on (p − ε, p) and F ≥ 0 on (p, p + ε) for some ε > 0. Suppose further that the noise is nondegenerate near p: there is a constant c > 0 such that E(ξ_{n+1}^+ | F_n) ≥ c and E(ξ_{n+1}^− | F_n) ≥ c whenever |X_n − p| < ε. Then P(X_n → p) = 0.

Proof:
Step 1: it suffices to show that there is an ε > 0 such that for every n, P(X_k → p | F_n) < 1 − ε almost surely. Proof: A standard fact is that P(X_k → p | F_n) → 1 almost surely on the event {X_k → p} (this holds with any event A in place of {X_k → p}). In particular, if P(X_k → p) = a > 0, then for any ε > 0 there is some n such that P(X_k → p | F_n) > 1 − ε on a set of measure at least a/2. Thus P(X_k → p) > 0 is incompatible with having P(X_k → p | F_n) < 1 − ε almost surely for every n.
Step 2: with probability at least ε, given F_n, the process {X_{n+k}} wanders away from p by cn^{−1/2} due to noise. Proof: Let τ be the exit time of the interval (p − cn^{−1/2}, p + cn^{−1/2}). Then E(X_τ − p)² ≤ c² n^{−1}. On the other hand, the quadratic variation of {X_{(n+k)∧τ} − p} increases by Θ(n^{−2}) at each step, so on {τ = ∞} its total is Θ(n^{−1}). If c is small enough, we see that the event {τ = ∞} must fail with probability at least ε.
As an example, apply this to the urn process in [HLS80], choosing the urn function f(x) = 3x² − 2x³. This corresponds to choosing the color of each draw to be the majority of three draws sampled with replacement. Here it is easily seen that F < 0 on (0, 1/2) and F > 0 on (1/2, 1). Verifying the hypotheses on ξ, we find that convergence to 1/2 is impossible, so X_n → 0 or 1 almost surely.
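This prediction is easy to test by simulation (a Python sketch, with my own names and parameters): with the majority-of-three urn function, runs starting from one ball of each color finish near 0 or near 1, not near the unstable zero at 1/2.

```python
import random

def hls_urn(f, steps, rng):
    """Urn of [HLS80]: starting from one red and one black ball, at
    each step a red ball is added with probability f(X_n), where X_n
    is the current proportion of red; otherwise a black ball is added.
    Returns the final proportion of red."""
    red, total = 1.0, 2.0
    for _ in range(steps):
        if rng.random() < f(red / total):
            red += 1
        total += 1
    return red / total

f = lambda x: 3 * x * x - 2 * x ** 3    # majority of three draws
rng = random.Random(6)
finals = [hls_urn(f, 20000, rng) for _ in range(50)]
# Count the runs that have settled near one of the stable ends {0, 1}.
near_end = sum(min(x, 1 - x) < 0.15 for x in finals)
```

Most runs land within a small distance of an endpoint; none should linger near 1/2 for long, in line with Theorem 2.9.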

Dynamical systems and their stochastic counterparts
In a vein of research spanning the 1990's and continuing through the present, Benaïm and collaborators have formulated an approach to stochastic approximations based on notions of stability for the approximating ODE. This section describes the dynamical system approach. Much of the material here is taken from the survey [Ben99].

The dynamical system heuristic
For processes in any dimension obeying the stochastic approximation equation (2.6) there are two natural heuristics. Sending the noise and remainder terms to zero yields a difference equation X_{n+1} − X_n = n^{−1} F(X_n), and approximating Σ_{k=1}^n k^{−1} by the continuous variable log t yields the differential equation

dX(t)/dt = F(X(t)).     (2.9)

The first heuristic is that trajectories of the stochastic approximation {X_n} should approximate trajectories of the ODE {X(t)}. The second is that stable trajectories of the ODE should show up in the stochastic system, but unstable trajectories should not. A complicating factor in the analysis is the possibility that the trajectories of the ODE are themselves difficult to understand or classify. A standard battery of examples from the dynamical systems literature shows that, once the dimension is greater than one, complicated geometry may arise, such as spiraling toward cyclic orbits, orbit chains punctuated by fixed points, and even chaotic trajectories. Successful analysis, therefore, must have several components. First, definitions and results are required in order to understand the forward trajectories of dynamical systems; see the notions of ω-limit sets (forward limit sets) and attractors, below. Next, the notion of trajectory must be generalized to take into account perturbation; see the notions of chain recurrence and chain transitivity below. These topological notions must be further generalized to allow for the kind of perturbation created by stochastic approximation dynamics; see the notion of asymptotic pseudotrajectory below. Finally, with the right definitions in hand, one may prove that a stochastic approximation process {X_n} does in fact behave as an asymptotic pseudotrajectory, and one may establish, under the appropriate hypotheses, versions of the stability heuristic.
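The first heuristic can be watched in action with a toy example (Python; the drift, noise, and all names are my own choices): iterate (2.6) with a linear drift toward 1/2 and bounded mean-zero noise, and observe the iterates tracking the ODE solution into the stable zero.

```python
import random

def stoch_approx(F, x0, n_steps, noise, rng):
    """Iterate X_{n+1} = X_n + (1/n)(F(X_n) + xi_{n+1}), i.e. the
    scheme (2.6) with R_n = 0 and bounded mean-zero noise xi."""
    x = x0
    traj = [x]
    for n in range(1, n_steps + 1):
        xi = noise * (2 * rng.random() - 1)   # uniform on [-noise, noise]
        x += (F(x) + xi) / n
        traj.append(x)
    return traj

F = lambda x: 0.5 - x          # linear pull toward the zero at 1/2
rng = random.Random(7)
traj = stoch_approx(F, 0.9, 10000, 0.5, rng)
# After the time change t = log n, the ODE dx/dt = F(x) has solution
# x(t) = 1/2 + (x0 - 1/2) exp(-t), so X_n should approach 1/2.
final = traj[-1]
```

The noise enters with weight 1/n, so its cumulative effect converges (this is (2.8) for γ_n = 1/n), and the iterates shadow the ODE flow toward the stable rest point.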
It should be noted that an early body of literature exists in which simplifying assumptions preclude flows with the worst geometries. The most common simplifying assumption is that F = −∇V for some function V , which we think of as a potential. In this case, all trajectories of X(t) lead "downhill" to the set of local minima of V . From the viewpoint of stochastic processes obeying (2.6) that arise in reinforcement models, the assumption F = −∇V is quite strong. Recall, however, that the original stochastic approximation processes were designed to locate points such as constrained minima [Lju77;KC78], in which case F is the negative gradient of the objective function. Thus, as pointed out in [BH95;Ben99], much of the early work on stochastic approximation processes focused exclusively on geometrically simple cases such as gradient flow [KC78;BMP90] or attraction to a point [AEK83]. Stochastic approximation processes in the absence of Lyapunov functions can and do follow limit cycles; the earliest natural example I know is found in [Ben97].

Topological notions
Although all our flows come from differential equations on real manifolds, many of the key notions are purely topological. A flow on a topological space M is a continuous map Φ : R × M → M, written Φ_t(x) := Φ(t, x), such that Φ_0 is the identity and Φ_{s+t} = Φ_s ∘ Φ_t for all s, t ∈ R (note that negative times are allowed). The relation to ordinary differential equations is that any bounded Lipschitz vector field F on R^n has unique integral curves and therefore defines a unique flow Φ for which (d/dt)Φ_t(x) = F(Φ_t(x)); we call this the flow associated to F. We will assume hereafter that M is compact, our chief example being the d-simplex in R^{d+1}. The following constructions and results are due mostly to Bowen and Conley and are taken from Conley's CBMS lecture notes [Con78]. The notions of forward (and backward) limit sets and attractors (and repellers) are old and well known.
For any set Y ⊆ M , define the forward limit set by $\omega(Y) := \bigcap_{t \geq 0} \overline{\bigcup_{s \geq t} \Phi_s(Y)}$. (2.10) When Y = {y}, this is the set of limit points of the forward trajectory from y. Limit sets for sample trajectories will be defined in (2.11) below; a key result will be to relate these to the forward limit sets of the corresponding flow. Reversing time in (2.10), the backward limit set is denoted α(Y ). An attractor is a set A that has a neighborhood U such that ω(U ) = A. A repeller is the time-reversal of this, replacing ω(U ) by α(U ). The set Λ 0 of rest points is the set {x ∈ M : Φ t (x) = x for all t}.
Conley then defines the chain relation on M , denoted →. Say that x → y if for all t > 0 and all open covers U of M , there is a sequence x = z 0 , z 1 , . . . , z n−1 , z n = y of some length n and numbers t 1 , . . . , t n ≥ t such that $\Phi_{t_i}(z_{i-1})$ and z i are both in some U ∈ U. In the metric case, this is easier to parse: one must be able to get from x to y by a sequence of arbitrarily long flows separated by arbitrarily small jumps. The chain recurrent set R = R(M, Φ) is defined to be the set {x ∈ M : x → x}. The set R is a compact set containing all rest points of the flow (points x such that Φ t (x) = x for all t), all closures of periodic orbits, and in general all forward and backward limit sets ω(y) and α(y) of trajectories.
An invariant set S (a union of trajectories) is called (internally) chain recurrent if x → S x for all x ∈ S, where → S denotes the flow restricted to S. It is called (internally) chain transitive if x → S y for all x, y ∈ S. The following equivalence from [Bow75] helps to keep straight the relations between these definitions.
Proposition 2.10 ([Ben99, Proposition 5.3]). The following are equivalent conditions on a set S ⊆ M .
1. S is chain transitive;
2. S is chain recurrent and connected;
3. S is a closed invariant set and the flow restricted to S has no attractor other than S itself.
As we have seen, the geometry is greatly simplified when F = −∇V . Although this requires differential structure, there is a topological notion that captures the essence. Say that a flow {Φ t } is gradient-like if there is a continuous real function V : M → R that is strictly decreasing along non-constant trajectories. Equation (1) of [Con78,I.5] shows that being gradient-like is strictly weaker than being topologically equivalent to an actual gradient. If in addition, the set R is totally disconnected (hence equal to the set of rest points), then the flow is called strongly gradient-like.
Chain recurrence and gradient-like behavior are in some sense the only two possible phenomena. In a gradient-like flow, one can only flow downward. In a chain-recurrent flow, any function weakly decreasing on orbits must in fact be constant on components. Although we will not need the following result, it does help to increase understanding.
Theorem 2.11 ([Con78, page 17]). Every flow on a compact space M is uniquely represented as the extension of a chain recurrent flow by a strongly gradient flow. That is, there is a unique subflow (the flow restricted to R) which is chain recurrent and for which the quotient flow (collapsing components of R to a point) is strongly gradient-like.

Probabilistic analysis
An important notion, introduced by Benaïm and Hirsch [BH96], is the asymptotic pseudotrajectory. A metric is used in the definition, although it is pointed out in [BLR02, page 13-14] that the property depends only on the topology, not the metric.
Definition 2.12 (asymptotic pseudotrajectories). Let (t, x) → Φ t (x) be a flow on a metric space M . For a continuous trajectory X : $\mathbb{R}^+ \to M$, let $d_{\Phi,t,T}(X) := \sup_{0 \leq s \leq T} d\big(X(t+s), \Phi_s(X(t))\big)$ denote the greatest divergence over the time interval [t, t + T ] between X and the flow Φ started from X(t). The trajectory X is an asymptotic pseudotrajectory if $d_{\Phi,t,T}(X) \to 0$ as $t \to \infty$ for every fixed $T > 0$. This definition is important because it generalizes the "→" relation so that divergence from the flow need not occur at discrete points separated by large times but may occur continuously as long as the divergence remains small over arbitrarily large intervals. This definition also serves as the intermediary between stochastic approximations and chain transitive sets, as shown by the next two results. The first is proved in [Ben99, Proposition 4.4 and Remark 4.5] and the second in [Ben99, Theorem 5.7].
Theorem 2.13 (stochastic approximations are asymptotic pseudotrajectories). Let {X n } be a stochastic approximation process, that is, a process satisfying (2.6), and assume F is Lipschitz. Let $X(t) := X_n + (t - n)(X_{n+1} - X_n)$ for $n \leq t < n + 1$ linearly interpolate X at nonintegral times. Assume bounded noise: |ξ n | ≤ K. Then {X(t)} is almost surely an asymptotic pseudotrajectory for the flow Φ of integral curves of F .
Remark. With deterministic step sizes as in (2.6) one may weaken the bounded noise assumption to $L^2$-boundedness: $\mathbb{E}|\xi_n|^2 \leq K$; the stronger assumption is needed only under (2.7)-(2.8). The purpose of the Lipschitz assumption on F is to ensure (along with the standing compactness assumption on M ) that the flow Φ is well defined.
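The quantity $d_{\Phi,t,T}(X)$ can be estimated numerically. In the rough check below (my own sketch, not from the survey), a stochastic approximation with the hypothetical mean field F(x) = 1/2 − x is interpolated on the natural timescale $\tau_n = \sum_{k \le n} 1/k$, under which it should track the flow of dX/dt = F(X); the divergence over unit windows becomes small as t grows:

```python
import bisect, math, random

def F(x): return 0.5 - x                       # hypothetical Lipschitz mean field
def flow(x0, s):                               # exact flow of dX/dt = F(X)
    return 0.5 + (x0 - 0.5) * math.exp(-s)

random.seed(1)
N = 200000
xs, taus = [0.9], [0.0]                        # tau_n = sum_{k <= n} 1/k
for n in range(1, N):
    xs.append(xs[-1] + (F(xs[-1]) + random.uniform(-0.5, 0.5)) / n)
    taus.append(taus[-1] + 1.0 / n)

def X(t):                                      # piecewise-linear interpolation
    i = min(bisect.bisect_right(taus, t) - 1, N - 2)
    frac = (t - taus[i]) / (taus[i + 1] - taus[i])
    return xs[i] + frac * (xs[i + 1] - xs[i])

def d(t, T, steps=200):                        # approximates d_{Phi,t,T}(X)
    x0 = X(t)
    return max(abs(X(t + j * T / steps) - flow(x0, j * T / steps))
               for j in range(steps + 1))

print(d(2.0, 1.0), d(10.0, 1.0))               # compare an early and a late window
```

The logarithmic time change matters: a unit window at time t covers the urn steps from roughly $e^t$ to $e^{t+1}$, over which the 1/n step sizes make the accumulated noise small.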
The limit set of a trajectory is defined similarly to a forward limit set for a flow. If X : R + → M is a trajectory, or X : Z + → M is a discrete time trajectory, define $L(X) := \bigcap_{t \geq 0} \overline{X([t, \infty))}$. (2.11) Theorem 2.14 (asymptotic pseudotrajectories have chain-transitive limits).
The limit set L(X) of any asymptotic pseudotrajectory, X, is chain transitive.
Combining Theorems 2.13 and 2.14, and drawing on Proposition 2.10 yields a frequently used basic result, appearing first in [Ben93].
Corollary 2.15. Let X := {X n } be a stochastic approximation process with bounded noise, whose mean vector field F is Lipschitz. Then with probability 1, the limit set L(X) is chain transitive. In view of Proposition 2.10, it is therefore invariant, connected, and contains no proper attractor.
Continuing Example 2.1, the right-hand flow has three connected, closed invariant sets S 1 , {a} and {b}. The flow restricted to either {a} or {b} is chain transitive, so either is a possible limit set for {X n }, but the whole set S 1 is not chain transitive, thus may not be the limit set of {X n }. We expect to rule out the repeller {a} as well, but it is easy to fabricate a stochastic approximation that is rigged to converge to {a} with positive probability. Further hypotheses on the noise are required to rule out {a} as a limit point. For the left-hand flow, any of the three invariant sets is possible as a limit set.
Examples such as these show that the approximation heuristic, while useful, is somewhat weak without the stability heuristic. Turning to the stability heuristic, one finds better results for convergence than nonconvergence. From [Ben99,Theorem 7.3], we have: Theorem 2.16 (convergence to an attractor). Let A be an attractor for the flow associated to the Lipschitz vector field F , the mean vector field for a stochastic approximation X := {X n }. Then either (i) there is a t for which {X t+s : s ≥ 0} almost surely avoids some neighborhood of A or (ii) there is a positive probability that L(X) ⊆ A .
Proof: A geometric fact requiring no probability is that asymptotic pseudotrajectories get sucked into attractors. Specifically, let K be a compact neighborhood of the attractor A for which ω(K) = A (these exist, by definition of an attractor). It is shown in [Ben99, Lemma 6.8] that there are T, δ > 0 such that for any trajectory X starting in K, d Φ,t,T (X) < δ for all t implies L(X) ⊆ A.
Fix such a neighborhood K of A and fix T, δ as above. Theorem 2.13 may be strengthened to yield a t such that, on the event X t ∈ K, the bound d Φ,s,T (X) < δ for all s ≥ t holds with positive conditional probability, and hence L(X) ⊆ A with positive conditional probability. If P(X t ∈ K) = 0 for all t then conclusion (i) of the theorem is true, while if P(X t ∈ K) > 0 for some t, then conclusion (ii) is true.
For the nonconvergence heuristic, most known results (an exception may be found in [Pem91]) are proved under linear instability. This is a stronger hypothesis than topological instability, requiring that at least one eigenvalue of dF have strictly positive real part. An exact formulation may be found in Section 9 of [Ben99]. It is important to note that linear instability is defined there for periodic orbits as well as rest points, thus yielding conclusions about nonconvergence to entire orbits, a feature notably lacking in [Pem90a].
Theorem 2.17 ([Ben99, Theorem 9.1]). Let {X n } be a stochastic approximation process on a compact manifold M with bounded noise ||ξ n || ≤ K for all n and C 2 vector field F . Let Γ be a linearly unstable equilibrium or periodic orbit for the flow induced by F . Then $P(L(\{X_n\}) = \Gamma) = 0$. Proof: The method of proof is to construct a scalar function η, measuring displacement from Γ, for which η(X n ) obeys the hypotheses of Theorem 2.9. This relies on known straightening results for stable manifolds and is carried out in [Pem90a] for Γ = {p} and in [BH95] for general Γ; see also [Bra98].

Infinite dimensional spaces
The stochastic approximation processes discussed up to this point obey equation (2.6) which presumes the ambient space R d . In Section 6.1 we will consider a stochastic approximation on the space P(M ) of probability measures on a compact manifold M . The space P(M ) is compact in the weak topology and metrizable, hence the topological definitions of limits, attractors and chain transitive sets are still valid and Theorem 2.14 is still available to force asymptotic pseudotrajectories to have limit sets that are chain transitive. In fact this justifies the space devoted in [Ben99] and its predecessors to establishing results that applied to more than just R d . The place where new proofs are required is in proving versions of Theorem 2.13 for processes in infinite-dimensional spaces (see Theorem 6.4 below).

Lyapunov functions
A Lyapunov function for a flow Φ with respect to the compact invariant set Λ is defined to be a continuous function V : M → R that is constant on trajectories in Λ and strictly decreasing on trajectories not in Λ. When Λ = Λ 0 , the set of rest points, existence of a Lyapunov function is equivalent to the flow being gradient-like. The values V (Λ 0 ) of a Lyapunov function at rest points are called critical values. Gradient-like flows are geometrically much better behaved than more general flows, as is shown in [Ben99, Proposition 6.4 and Corollary 6.6]: Proposition 2.18 (chain transitive sets when there is a Lyapunov function). Suppose V is a Lyapunov function for a set Λ such that the set of values V (Λ) has empty interior. Then every chain transitive set L is contained in Λ and is a set of constancy for V . In particular, if Λ = Λ 0 and Λ intersects the limit set of an asymptotic pseudotrajectory {X(t)} in at most countably many points, then X(t) must converge to one of these points.
It follows that the presence of a Lyapunov function for the vector flow associated to F implies convergence of {X t } to a set of constancy for the Lyapunov function. For example, Corollary 2.7 may be proved by constructing a Lyapunov function with Λ = the zero set of F . A usual first step in the analysis of a stochastic approximation is therefore to determine whether there is a Lyapunov function. When F = −∇V of course V itself is a Lyapunov function with Λ = the set of critical points of V .
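To make this concrete, here is a small simulation (my own construction, not from the survey) with F = −∇V for the double-well potential V(x, y) = (x² − 1)² + y². The iterates are clamped to a compact square, reflecting the standing compactness assumption, and settle at one of the two minima (±1, 0), a set of constancy for V:

```python
import random

def V(x, y):                       # double-well potential; F = -grad V
    return (x * x - 1.0) ** 2 + y * y

def F(x, y):
    return (-4.0 * x * (x * x - 1.0), -2.0 * y)

def clamp(z):                      # keep iterates in a compact set
    return max(-2.0, min(2.0, z))

random.seed(2)
x, y = 0.3, 0.8
for n in range(1, 200001):
    fx, fy = F(x, y)
    x = clamp(x + (fx + random.uniform(-0.5, 0.5)) / n)
    y = clamp(y + (fy + random.uniform(-0.5, 0.5)) / n)

# the limit should be near one of the critical points (1, 0) or (-1, 0)
print(x, y)
```

Which minimum is selected depends on the early noise, but the limit set is always contained in the zero set of F, as Proposition 2.18 predicts.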

Time-homogeneous generalized Pólya urns
Recall from Section 2.1 the definition of a generalized Pólya urn with reinforcement matrix A. We saw in Section 2.3 that the resulting urn process {X n } may be realized as a multitype branching process {Z(t)} sampled at its jump times τ n . Already in 1965, for the special case of the Friedman urn with $A := \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix}$, D. Freedman was able to prove the following limit laws via martingale analysis.
Arguments for these results will be given shortly by means of embedding in branching processes. Freedman's original proof of (iii) was via moments, estimating each moment by means of an asymptotic recursion; a readable sketch of this argument may be found in [Mah03, Section 6]. The present section summarizes further results that have been obtained via the embedding technique described in Section 2.3. Such an approach rests on an analysis of limit laws in multitype branching processes. These are of independent interest, and it is worth noting that such results were not pre-existing: the development of limit laws for multitype branching processes was motivated in part by applications to urn processes. In particular, the studies [Ath68] and [Jan04] of multitype limit laws were motivated respectively by the companion paper [AK68] on urn models and by applications to urns in [Jan04; Jan05].
The first thorough study of GPU's via embedding was undertaken by Athreya and Karlin. Although they allow reinforcements to be random, subject to the condition of finite variance, their results depend only on the mean matrix, again denoted A. They make an irreducibility assumption, namely that exp(tA) has positive entries. This streamlines the analysis. While it does not lose too much generality, it probably caused some interesting phenomena in the complementary case to remain hidden for another several decades.
The assumptions imply, by the Perron-Frobenius theory, that the leading eigenvalue λ 1 of A is real and has multiplicity 1, and that we may write the eigenvalues as $\lambda_1 > \mathrm{Re}\,\lambda_2 \geq \mathrm{Re}\,\lambda_3 \geq \cdots$. If we do not allow balls to be subtracted and we rule out the trivial case of no reinforcement, then λ 1 > 0. For any right eigenvector ξ with eigenvalue λ, the quantity $\xi \cdot Z(t) e^{-\lambda t}$ is easily seen to be a martingale [AK68, Proposition 1]. When Re {λ} > λ 1 /2, this martingale is square integrable, leading to an almost sure limit. This recovers Freedman's first result in two steps. First, taking ξ = (1, 1) and λ = λ 1 = α + β, we see that $R(t) + B(t) \sim W e^{(\alpha+\beta)t}$ for some random W > 0. Second, taking ξ = (1, −1) and λ = λ 2 = α − β, we see that $(R(t) - B(t)) e^{-(\alpha-\beta)t}$ converges almost surely, with the assumption ρ := (α − β)/(α + β) > 1/2 being exactly what is needed for square integrability. These two almost sure limit laws imply Freedman's result (i) above.
The analogue of Freedman's result (iii) is that for any eigenvector ξ whose eigenvalue λ has Re {λ} < λ 1 /2, the quantity $\xi \cdot X_n / \sqrt{v \cdot X_n}$ converges to a normal distribution. The greater generality sheds some light on the reason for the phase transition in the Friedman model at ρ = 1/2. For small ρ, the mean drift of R n − B n = u · X n is swamped by the noise coming from the large number of particles v · X n = R n + B n . For large ρ, early fluctuations in R n − B n persist because their mean evolution is of greater magnitude than the noise.
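These limit laws are easy to observe numerically. The following sketch (mine, not from the survey) simulates a Friedman urn with α = 2, β = 1, so that ρ = (α − β)/(α + β) = 1/3 < 1/2 and the proportion of red is driven to 1/2, in contrast with the classical Pólya urn (β = 0), whose limit is random:

```python
import random

def friedman(alpha, beta, steps, seed):
    # Friedman urn: add alpha balls of the drawn color and beta of the other
    random.seed(seed)
    r, b = 1.0, 1.0
    for _ in range(steps):
        if random.random() < r / (r + b):
            r += alpha
            b += beta
        else:
            b += alpha
            r += beta
    return r / (r + b)

# alpha = 2, beta = 1 gives rho = 1/3 < 1/2: the share of red tends to 1/2
shares = [friedman(2, 1, 200000, seed) for seed in range(5)]
print(shares)
```

Running several seeds shows the concentration at 1/2; the residual fluctuations are of the n^{-1/2} order described by result (iii).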
A distributional limit for {X n = Z(τ n )} does not follow automatically from the limit law for Z(t). A chief contribution of [AK68] is to carry out the necessary estimates to bridge this gap.
Athreya and Karlin also state that a similar result may be obtained in the "log" case Re {λ} = λ 1 /2, extending Freedman's result (ii), but they do not provide details.
At some point, perhaps not until the 1990's, it was noticed that there are interesting cases of GPU's not covered by the analyses of Athreya and Karlin. In particular, the diagonal entries of A may be between −1 and 0, or enough of the off-diagonal entries may vanish that exp(tA) has some vanishing entries; essentially the only way this can happen is when the urn is triangular, meaning that in some ordering of the colors, A ij = 0 for i > j.
The special case of balanced urns, meaning that the row sums of A are constant, is somewhat easier to analyze combinatorially because the total number of balls in the urn increases by a constant each time. Even when the reinforcement is random with mean matrix A, the assumption of balance simplifies the analysis. Under the assumption of balance and tenability (that is, it is not possible for one of the populations to become negative), a number of analyses have been undertaken, including [BP85], [Smy96] and [Mah03]; see also [MS92;MS95] for applications of two-color balanced urns to random recursive trees, and [Mah98] for a tree application of a three-color balanced urn. Exact solutions to two-color balanced urns involve number-theoretic phenomena which are described in [FGP05].
Without the assumption of balance, results on triangular urns date back at least to [DV97]. Their chief results are for two colors, and their method is to analyze the simultaneous functional equations satisfied by the generating functions. Kotz, Mahmoud and Robert [KMR00] concern themselves with removing the balance assumption, attacking the special case $A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ by combinatorial means. A martingale-based analysis of the cases $A = \begin{pmatrix} 1 & 0 \\ c & 1 \end{pmatrix}$ and $A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ is hidden in [PV99]. The latter case had appeared in various places dating back to [Ros40], the result being as follows.
Theorem 3.3 (diagonal urn). Let a > b > 0 and consider a GPU with reinforcement matrix $A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$. Then $R_n / B_n^{\rho}$ converges almost surely to a nonzero finite limit, where ρ := a/b. Proof: From branching process theory there are variables W, W ′ with $e^{-at} R(t) \to W$ and $e^{-bt} B(t) \to W'$. This implies $R(t)/B(t)^{\rho}$ converges to the random variable W/(W ′ ) ρ , which gives convergence of $R_n/B_n^{\rho}$ to the same quantity.
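A quick simulation (my own check, not part of the proof) illustrates the theorem for a = 2, b = 1, ρ = 2: the ratio $R_n/B_n^{\rho}$ stabilizes at a random finite positive value, with the value itself varying from run to run:

```python
import random

random.seed(3)
a, b, rho = 2.0, 1.0, 2.0          # rho = a/b
red, black = 1.0, 1.0
ratios = {}
for n in range(1, 200001):
    # draw proportional to current contents; reinforce the drawn color only
    if random.random() < red / (red + black):
        red += a
    else:
        black += b
    if n in (100000, 200000):
        ratios[n] = red / black ** rho
print(ratios)
```

Comparing the ratio at n = 100000 and n = 200000 shows it has essentially stopped moving, even though red and black counts keep growing on very different scales.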
Given the piecemeal approaches to GPU's it is fitting that more comprehensive analyses finally emerged. These are due to Janson [Jan04; Jan05]. The first of these is via the embedding approach. The matrix A may be of any finite size, diagonal entries may be as small as −1, and the irreducibility assumption is weakened to the largest eigenvalue λ 1 having multiplicity 1 and being "dominant". This last requirement is removed in [Jan05], which combines the embedding approach with some computations at times τ n via generating functions, thus bypassing the need for converting distributional limit theorems in Z(t) to the stopping times τ n . The results, given in terms of projections of A onto various subspaces, are somewhat unwieldy to formulate and will not be reproduced here. As far as I can tell, Janson's results do subsume pretty much everything previously known. For example, the logarithmic scaling result appearing in a crude form in [PV99, Theorem 2.3] and elsewhere was proved as Theorem 1.3 (iv) of [Jan05]: the quantity in (3.1) converges almost surely to a random finite limit; equivalently, the quantity in (3.2) converges to a random finite limit.
To verify the equivalence of the two versions of the conclusion, found respectively in [PV99] and [Jan05], use the deterministic relation for some finite random Z. Also, both versions of the conclusion imply log(n/B n ) = log log n + log c + o(1) and log log n = log log B n + o(1). It follows then that (3.2) is equivalent to which is equivalent to the convergence of (3.1) to the random limit c −1 (Z−log c).

Dependence on time
The time-dependent urn is a two-color urn, where only the color drawn is reinforced; the number of reinforcements added at time n is not independent of n but is given by a deterministic sequence of positive real numbers {a n : n = 0, 1, 2, . . .}. This is introduced in [Pem90b] with a story about modeling American primary elections. Denote the contents by R n , B n and X n = R n /(R n + B n ) as usual. It is easy to see that X n is a martingale, and the fact that the almost sure limit has no atoms in the open interval (0, 1) may be shown via the same three-step nonconvergence argument used to prove Theorem 2.9. The question of atoms among the endpoints {0, 1} is more delicate. It turns out there is an exact recurrence for the variance of X n , which leads to a characterization of when the almost sure limit is supported on {0, 1}.
Theorem 3.5 ([Pem90b, Theorem 2]). Define $\delta_n := a_n / (R_0 + B_0 + \sum_{j=0}^{n-1} a_j)$ to be the ratio of the n th increment to the volume of the urn before the increment is added. Then $\lim_{n \to \infty} X_n \in \{0, 1\}$ almost surely if and only if $\sum_{n=1}^{\infty} \delta_n^2 = \infty$.
Note that the almost sure convergence of X n to {0, 1} is not the same as convergence of X n to {0, 1} with positive probability: the latter but not the former happens when a n = n. It is also not the same as almost surely choosing one color only finitely often. No sharp criterion is known for positive probability of lim n→∞ X n ∈ {0, 1}, but it is known [Pem90b, Theorem 4] that this cannot happen when sup n a n < ∞.
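The criterion of Theorem 3.5 is easy to check numerically. The sketch below (my own, with illustrative sequences) contrasts a_n = n + 1, for which δ_n is of order 2/n and Σδ_n² converges, with a_n = 2^n, for which δ_n tends to 1 and the series diverges:

```python
# delta_n = a_n / (R_0 + B_0 + sum_{j<n} a_j): the n-th increment divided
# by the urn volume just before it is added (the quantity in Theorem 3.5)
def delta_sq_partial(a, N, initial_volume=2.0):
    vol, s = initial_volume, 0.0
    for n in range(N):
        d = a(n) / vol
        s += d * d
        vol += a(n)
    return s

# a_n = n + 1: the partial sums of delta_n^2 level off (series converges)
linear = [delta_sq_partial(lambda n: n + 1.0, N) for N in (10**3, 10**6)]
# a_n = 2^n: delta_n tends to 1, so the partial sums grow linearly
geometric = [delta_sq_partial(lambda n: 2.0 ** n, N) for N in (10, 1000)]
print(linear, geometric)
```

So with linear increments the limiting proportion has mass in the open interval, while with geometric increments the proportion is driven to 0 or 1 almost surely.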

Ordinal dependence
A related variation adds a n red balls the n th time a red ball is drawn and a ′ n black balls the n th time a black ball is drawn, where the sequence {a ′ n } need not equal {a n }. As is characteristic of such models, a seemingly small change in the definition leads to entirely different behavior, and to an entirely different method of analysis. The following result appears in the appendix of [Dav90] and is proved by Rubin's exponential embedding.
Theorem 3.6 (Rubin's Theorem). Let $S_n := \sum_{k=0}^{n} a_k$ and $S'_n := \sum_{k=0}^{n} a'_k$. Let G denote the event that all but finitely many draws are red, and G ′ the event that all but finitely many draws are black. Then P(G) > 0 if and only if $\sum_n 1/S_n < \infty$, and symmetrically for G ′ ; when both sums are finite, P(G) + P(G ′ ) = 1, while when both are infinite each color is drawn infinitely often almost surely. Proof: Let {Y n , Y ′ n : n = 0, 1, 2, . . .} be independent exponentials with respective means 1/S n and 1/S ′ n . We think of the sequence Y 0 , Y 0 + Y 1 , . . . as successive times of an alarm clock. Let R(t) = sup{n : $\sum_{k=0}^{n} Y_k \leq t$} be the number of alarms up to time t, and similarly let B(t) = sup{n : $\sum_{k=0}^{n} Y'_k \leq t$} be the number of alarms in the primed variables up to time t. If {τ n } are the successive jump times of the pair (R(t), B(t)) then (R(τ n ), B(τ n )) is a copy of the Davis-Rubin urn process. The theorem follows immediately from this representation, and from the fact that $\sum_{n=0}^{\infty} Y_n$ is finite if and only if its mean $\sum_n 1/S_n$ is finite (in which case "explosion" occurs) and has no atoms when finite.
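The explosion phenomenon is easy to see in simulation. In the sketch below (mine, with the illustrative choice a_n = a'_n = (n+1)², so that S_n grows cubically and Σ 1/S_n < ∞), one color takes over after only finitely many draws of the other:

```python
import random

def ordinal_urn(draws, seed):
    # n-th reinforcement of a color adds (n+1)^2 balls of that color, so
    # the weights S_n grow cubically and sum 1/S_n < infinity: Rubin's
    # theorem predicts all but finitely many draws are of a single color
    random.seed(seed)
    wr, wb = 1.0, 1.0          # current red / black weights
    nr, nb = 0, 0              # numbers of red / black draws so far
    for _ in range(draws):
        if random.random() < wr / (wr + wb):
            nr += 1
            wr += (nr + 1) ** 2
        else:
            nb += 1
            wb += (nb + 1) ** 2
    return nr, nb

counts = [ordinal_urn(5000, s) for s in range(5)]
print(counts)
```

Across seeds, one of the two counts is close to 5000 and the other is tiny; which color wins is decided by the first few draws, mirroring the race between the two exploding clocks.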

Altering the draw
Mahmoud [Mah04] considers an urn model in which each draw consists of k balls rather than just one. There are k + 1 possible reinforcements depending on how many red balls there are in the sample. This is related to the model of Hill, Lane and Sudderth [HLS80] in which one ball is added each time but the probability it is red is not X n but f (X n ) for some function f : [0, 1] → [0, 1]. The end of Section 2.4 introduced the example of majority draw: if three balls are drawn and the majority is reinforced, then f (x) = x 3 + 3x 2 (1 − x) is the probability that a majority of three will be red when the proportion of reds is x. If one samples with replacement in Mahmoud's model and limits the reinforcement to a single ball, then one obtains another special case of the model of Hill, Lane and Sudderth.
A common generalization of these models is to define a family of probability distributions {G x : 0 ≤ x ≤ 1} on pairs (Y, Z) of nonnegative real numbers, and to reinforce by a fresh draw from G x when X n = x. If G x puts mass f (x) on (1, 0) and 1 − f (x) on (0, 1), this gives the Hill-Lane-Sudderth urn; an identical model appears in [AEK83]. If G x gives probability $\binom{k}{j} x^j (1-x)^{k-j}$ to the pair (α 1j , α 2j ) for 0 ≤ j ≤ k then this gives Mahmoud's urn with sample size k and reinforcement matrix α.
When the G x are all supported on a bounded set, the model fits in the stochastic approximation framework of Section 2.4. For two-color urns, the dimension of the space is 1, and the vector field reduces to a scalar field F determined by the mean of G x ; in the Hill-Lane-Sudderth case, F (x) = f (x) − x. As we have already seen, under weak conditions on F , the proportion X n of red balls must converge to a zero of F , with points at which the graph of F crosses the x-axis in the downward direction (such as the point 1/2 in a Friedman urn) occurring as the limit with positive probability and points where the graph of F crosses the x-axis in an upward direction (such as the point 1/2 in the majority vote model) occurring as the limit with probability zero.
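For instance, in the majority-of-three urn the scalar field F(x) = f(x) − x = −x(2x − 1)(x − 1) crosses the axis upward at 1/2, so the simulation below (my own sketch) finds X_n near 0 or 1 but never near the unstable zero 1/2:

```python
import random

def f(x):
    # majority-of-three draw: probability the reinforced color is red
    return x ** 3 + 3 * x * x * (1 - x)

def hls_urn(steps, seed):
    # Hill-Lane-Sudderth urn: add one ball, red with probability f(X_n)
    random.seed(seed)
    r, b = 1.0, 1.0
    for _ in range(steps):
        if random.random() < f(r / (r + b)):
            r += 1
        else:
            b += 1
    return r / (r + b)

limits = [hls_urn(100000, seed) for seed in range(6)]
print(limits)
```

The runs split between the two stable zeros 0 and 1; by contrast, replacing f with the Friedman-type field (downward crossing at 1/2) would send every run to 1/2.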
Suppose F is a continuous function and the graph of F touches the x-axis at (p, 0) but does not cross it. The question of whether X n → p with positive probability is then more delicate. On one side of p, the drift is toward p and on the other side of p the drift is away from p. It turns out that convergence can only occur if X n stays on the side where the drift is toward p, and this can only happen if the drift is small enough. A curve tangent to the x-axis always yields small enough drift that convergence is possible. The phase transition occurs when the one-sided derivative of F is −1/2. More specifically, it is shown in [Pem91] that (i) if F (x) ≤ c(p − x) on a neighborhood (p − ǫ, p) for some c < 1/2, then P(X n → p) > 0, while (ii) if F (x) ≥ c(p − x) on (p − ǫ, p) for some c > 1/2 and F (x) > 0 on a neighborhood (p, p + ǫ), then P(X n → p) = 0. The proof of (i) consists of establishing a power law p − X n = Ω(n −α ), precluding X n ever from exceeding p.
The paper [AEK83] introduces the same model with an arbitrary finite number of colors. When the number of colors is d + 1, the state vector X n lives in the d-simplex $\Delta_d := \{x \in \mathbb{R}^{d+1} : x_j \geq 0, \sum_j x_j = 1\}$. Under relatively strong conditions, they prove convergence with probability 1 to a global attractor. A recent variation by Siegmund and Yakir weakens the hypothesis of a global attractor to allow for finitely many non-attracting fixed points on ∂∆ d [SY05, Theorem 2.2]. They apply their result to an urn model in which balls are labeled by elements of a finite group: balls are drawn two at a time, and the result of drawing g and h is to place an extra ball of type g ·h in the urn. The result is that the contents of the urn converge to the uniform distribution on the subgroup generated by the initial contents.
All of this has been superseded by the stochastic approximation framework of Benaïm et al. While convergence to attractors and nonconvergence to repelling sets is now understood, at least in the hyperbolic case (where no eigenvalue of dF (p) has vanishing real part), some questions still remain. In particular, the estimation of deviation probabilities has not yet been carried out. One may ask, for example, how the probability of being at least ǫ away from a global attractor at time n decreases with n, or how fast the probability of being within ǫ of a repeller at time n decreases with n. These questions appear related to quantitative estimates on the proximity to which {X n } shadows the vector flow {X(t)} associated to F (cf. the Shadowing Theorem of Benaïm and Hirsch [Ben99, Theorem 8.9]).

Urn models: applications
In this section, the focus is on modeling rather than theory. Most of the examples contain no significant new mathematical results, but are chosen for inclusion here because they use reinforcement models (mostly urn models) to explain and predict physical or behavioral phenomena or to provide quick and robust algorithms.

Self-organization
The term self-organization is used for systems which, due to micro-level interaction rules, attain a level of coordination across space or time. The term is applied to models from statistical physics, but we are concerned here with self-organization in dynamical models of social networks. Here, self-organization usually connotes a coordination which may be a random limit and is not explicitly programmed into the evolution rules. The Pólya urn is an example of this: the coordination is the approach of X n to a limit; the limit is random and its sample values are not inherent in the reinforcement rule.

Market share
One very broad application of Pólya-like urn models is as a simplified but plausible micro-level mechanism to explain the so-called "lock-in" phenomenon in industrial or consumer behavior. The questions are why one technology is chosen over another (think of the VHS versus Betamax standard for videotape), why the locations of industrial sites exhibit clustering behavior, and so forth. In a series of articles in the 1980's, Stanford economist W. Brian Arthur proposed urn models for this type of social or industrial process, matching data to the predictions of some of the models. Arthur used only very simple urn models, most of which were not new, but his conclusions evidently resonated with the economics community. The stories he associated with the models included the following.
Random limiting market share: Suppose two technologies (say Apple versus IBM) are selectively neutral (neither is clearly better) and enter the market at roughly the same time. Suppose that new consumers choose which of the two to buy in proportion to the numbers already possessed by previous consumers. This is the basic Pólya urn model, leading to a random limiting market share: X n → X. In the case of Apple computers, the sample value of X is between 10% and 15%. This model is discussed at length in [AEK87]. Random monopoly: Still assuming no intrinsic advantage, suppose that economies of scale lead to future adoption rates proportional to a power α > 1 of present market share. This particular one-dimensional GPU is of the type in Theorem 2.8 (a Hill-Lane-Sudderth urn) with $$f(x) = \frac{x^{\alpha}}{x^{\alpha} + (1-x)^{\alpha}}. \qquad (4.1)$$ The graph of F is shaped as in figure 2 below. The equilibrium at x = 1/2 is unstable and X n converges almost surely to 0 or 1. Which of these two occurs depends on chance fluctuations near the beginning of the run. In fact such qualitative behavior persists even if one of the technologies does have an intrinsic advantage, as long as the shape of F remains qualitatively the same. The possibility of an eventual monopoly by an inferior technology is discussed as well in [AEK87] and in the popular account [Art90]. The particular F of (4.1) leads to interesting quantitative questions as to the time the system can spend in disequilibrium, which are discussed in [CL06b; OS05].
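Simulating the adoption rule makes the lock-in effect visible. The sketch below is mine, using f(x) = x^α/(x^α + (1 − x)^α) with α = 2 as one concrete choice function whose adoption odds are proportional to the α-power of market share; every run ends near total market share for one technology:

```python
import random

def adoption(alpha, steps, seed):
    # each new consumer picks technology A with probability
    # x^alpha / (x^alpha + (1-x)^alpha), x = current market share of A;
    # alpha > 1 models economies of scale
    random.seed(seed)
    na, nb = 1.0, 1.0
    for _ in range(steps):
        x = na / (na + nb)
        p = x ** alpha / (x ** alpha + (1 - x) ** alpha)
        if random.random() < p:
            na += 1
        else:
            nb += 1
    return na / (na + nb)

shares = [adoption(2.0, 50000, seed) for seed in range(6)]
print(shares)
```

Which technology monopolizes differs from seed to seed, illustrating that the outcome is decided by early chance fluctuations rather than by any intrinsic advantage.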

Neuron polarity
The following model for neuron growth is mathematically almost identical. The motivating biological question concerns the mechanisms by which apparently identical cells develop into different types. This is poorly understood in many important developmental processes. Khanin and Khanin examine the development of neurons into two types: axon and dendrite. Indistinguishable at first, groups of such cells exhibit periods of growth and retraction until one rapidly elongates to eventually become an axon [KK01, page 1]. They note experimental data suggesting that any neuron has the potential to be either type, and hypotheses that a neuron's length at various stages of growth relative to nearby neurons may influence its development.
They propose an urn model where at each discrete time one of the existing neurons grows by a constant length, l, and the others do not grow. The probability of being selected to grow is proportional to the α-power of its length, for some parameter α > 0. They give rigorous proofs of the long-term behavior in three cases. When α > 1, they quote Rubin's Theorem from [Dav90] to show that after a certain random time, only one neuron grows. When α = 1, they cite results on the classical Pólya urn from [Fel68] to show that the pairwise length ratios have random finite limits. When α < 1, they use embedding methods to show that every pair of lengths has ratio equal to 1 in the limit and to show fluctuations that are Gaussian when α < 1/2, Gaussian with a logarithm in the scaling when α = 1/2, and differing by t^α times a random limiting constant when α ∈ (1/2, 1) (cf. Freedman's results quoted in Section 3.1).
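A simulation sketch (mine, not from [KK01]) of the α-power growth rule shows the dichotomy: for α > 1 a single neuron takes over, while for α < 1 the length ratios approach 1:

```python
import random

def grow(alpha, neurons, steps, seed):
    # at each step one neuron grows by a constant length l = 1; the
    # grower is chosen with probability proportional to length^alpha
    random.seed(seed)
    lengths = [1.0] * neurons
    for _ in range(steps):
        weights = [L ** alpha for L in lengths]
        u, acc = random.random() * sum(weights), 0.0
        for i, w in enumerate(weights):
            acc += w
            if u <= acc:
                break
        lengths[i] += 1.0
    return lengths

runaway = grow(2.0, 3, 30000, seed=4)  # alpha > 1: one neuron dominates
even = grow(0.6, 3, 30000, seed=4)     # alpha < 1: length ratios tend to 1
print(sorted(runaway), sorted(even))
```

In the α = 2 run almost all growth goes to one neuron (the "axon"), while in the α = 0.6 run the three lengths stay within a few percent of each other.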

Preferential attachment
Another self-organization story has to do with random networks. Models of random networks are used to model the internet, trade, political persuasion and a host of other phenomena. Mathematically, the best studied model is the Erdős-Rényi model where each possible edge is present independently with some probability p. For the purposes of many applications, two properties are desirable that do not occur in the Erdős-Rényi model. First, empirical studies show that the distribution of vertex degrees should follow a power law rather than be tightly clustered around its mean. Secondly, there should be local clustering but global connectivity, meaning roughly that as the number of vertices goes to infinity with the average degree constant, the graph-theoretic distance between typical vertices should be small (logarithmic) but the collection of geodesics should have bottlenecks at certain "hub" vertices.
A model known as the small-world model was introduced by Watts and Strogatz [WS98] who were interested in the "six degrees of separation" phenomenon (essentially the empirical fact that the graph of humans and acquaintanceship has local clustering and global connectivity). Their graph is a random perturbation of a nearest neighbor graph. It does exhibit local clustering and global connectivity but not the power-law variation of degrees, and is not easy to work with. A model with the flexibility to fit an arbitrary degree profile was proposed by Chung and Graham and analyzed in [CL03]. This static model is flexible, tractable and provides graphs that match data. Neither this nor the small-world model, however, provides a micro-level explanation of the formation of the graph. A collection of dynamic growth urn models, known as preferential attachment models, the first of which was introduced by Barabási and Albert [BA99], has been developed in order to address this need.
Let a parameter α ∈ [0, 1] be chosen and construct a growing sequence of graphs $\{G_n^\alpha\}$ on the vertex set {1, . . . , n} as follows. Let $G_1^\alpha$ be the unique graph on one vertex. Given $G_n^\alpha$, let $G_{n+1}^\alpha$ be obtained from $G_n^\alpha$ by adding a single vertex labeled n + 1 along with a single edge connecting n + 1 to a random vertex $V_n \in G_n^\alpha$. With probability α the vertex $V_n$ is chosen uniformly from {1, . . . , n}, while with probability 1 − α the probability that $V_n = v$ is taken to be proportional to the degree of v.
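The construction is easy to code; maintaining a list with one entry per edge-endpoint makes the degree-biased choice a uniform pick from that list. A sketch (names are illustrative):

```python
import random

def pa_tree(n, alpha, seed=0):
    """Build the mixed uniform/preferential attachment tree on {1,...,n};
    returns parent[v] for each vertex v >= 2."""
    rng = random.Random(seed)
    parent = {}
    endpoints = []   # one entry per half-edge; a uniform pick is degree-biased
    for v in range(2, n + 1):
        if v == 2 or rng.random() < alpha:
            target = rng.randrange(1, v)     # uniform over existing vertices
        else:
            target = rng.choice(endpoints)   # proportional to degree
        parent[v] = target
        endpoints.extend([v, target])        # record both half-edges
    return parent
```

Since each vertex attaches to an earlier one, the result is always a tree rooted at vertex 1.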
This procedure always produces a tree. When α = 1, this is the well known random recursive tree. The other extreme case α = 0 may be regarded as pure preferential attachment. A modification is to add some fixed number m of new edges each time, choosing each independently according to the procedure in the case m = 1 and handling collisions among these m new edges by some arbitrary re-sampling scheme. This procedure produces a directed graph that is not, in general, a tree. We denote this random graph by $G_n^{\alpha,m}$. Preferential attachment models, also known as rich-get-richer models, are examples of scale-free models 3 . The power laws they exhibit have been fit to data many times, e.g., in figure 1 of [BA99]. Preferential attachment graphs have also been used as the underlying graphs for models of interacting systems. For example, [KKO + 05] examines a market pricing model known as the graphical Fisher model for price setting. In this model, there is a bipartite graph whose vertices are vendors and buyers. Each buyer buys a unit of goods from the cheapest neighboring vendor, with the vendors trying to set prices as high as possible while still selling all their goods. The emergent prices are entirely a function of the graph structure. In [KKO + 05], the graph is taken to be a bipartite version of $G_n^{\alpha,m}$ and the prices are shown to vary only when m = 1. A number of nonrigorous arguments for the degree profile of $G_n^{\alpha,m}$ appear in the literature. For example, in Barabási and Albert's original paper, the following heuristic argument is given for the case α = 0; see also [Mit03]. Consider the vertex v added at time k. Let us use an urn model to keep track of its degree. There is a red ball for each edge incident to v and a black ball for each half of each edge not incident to v. The urn begins with 2km balls, of which m are red. At each time step a total of 2m balls are added. Half of these are always colored black (half-edges incident to the m new vertices) while half are colored by choosing from the urn. Let $R_l$ be the number of red balls in the urn at time l. Then
$$\mathbb{E} R_l = m \prod_{j=k}^{l-1} \left( 1 + \frac{1}{2j} \right) \sim m \sqrt{l/k} .$$
Thus far, the urn analysis is rigorous. The heuristic now proposes that the degree of each vertex is exactly the greatest integer below this expectation. Solving for k so that the vertex has degree d at time n gives k as a function of d:
$$k(d) = \frac{m^2 n}{d^2} .$$
The number of k for which the expected degree is between d and d + 1 is ⌊k(d)⌋ − ⌊k(d + 1)⌋; this is roughly the derivative with respect to −d of k(d), namely $2m^2 n / d^3$. Thus the fraction of vertices having degree exactly d should be asymptotic to $2m^2 / d^3$.
Chapter 3 of the forthcoming book of Chung and Lu [CL06a] will contain the first rigorous and somewhat comprehensive treatment of preferential attachment schemes (see the discussion in their Section 3.2 of the perils of unjustified heuristics with regard to this model). The only published, rigorous analysis of preferential attachment that I know of is by Bollobás et al. [BRST01] and is restricted to the case α = 0. Bollobás et al. clean up the definition of $G_n^{0,m}$ with regard to the initial conditions and the procedure for resolving collisions. They then prove the following theorem.
Theorem 4.1 (degrees in the pure preferential attachment graph). Let
$$\beta(m, d) := \frac{2m(m+1)}{(m+d)(m+d+1)(m+d+2)}$$
and let $X_{n,m,d}$ denote the proportion among all n vertices of $G_n^{0,m}$ that have degree m + d (that is, they have in-degree d when edges are directed toward the original vertex). Then both $X_{n,m,d} / \beta(m, d)$ and $\sup_{d \le n^{1/15}} X_{n,m,d} / \beta(m, d)$ converge to 1 in probability as n → ∞.
As d → ∞ with m fixed, β(m, d) is asymptotic to $2m(m+1) d^{-3}$. This agrees with the power law $d^{-3}$ from the heuristic for α = 0, while providing more information for small d. The method of proof is to use Azuma's inequality on a martingale with respect to the filtration $\sigma(G_n^{0,m} : n = 1, 2, \ldots)$; once this concentration inequality is established, a relatively easy computation finishes the proof by showing convergence of $\mathbb{E} X_{n,m,d}$ to β(m, d).
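It is known that in the pure preferential attachment tree (α = 0, m = 1) the limiting fraction of vertices of degree d is 4/(d(d+1)(d+2)); in particular, two thirds of the vertices are leaves. A quick simulation (a sketch; names are illustrative) checks this:

```python
import random

def pa_leaf_fraction(n, seed=0):
    """Fraction of degree-1 vertices in the pure preferential attachment
    tree (alpha = 0, m = 1) on n vertices; the known limit is 2/3."""
    rng = random.Random(seed)
    degree = [0] * (n + 1)
    degree[1] = degree[2] = 1            # the first edge joins vertices 1 and 2
    endpoints = [1, 2]                   # one entry per half-edge
    for v in range(3, n + 1):
        target = rng.choice(endpoints)   # attach proportionally to degree
        degree[v] = 1
        degree[target] += 1
        endpoints.extend([v, target])
    return sum(1 for d in degree[1:] if d == 1) / n
```

The concentration provided by Azuma's inequality means even a single run of moderate size lands close to 2/3.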

Statistics
We saw in Theorem 2.1 that the fraction of red balls in a Pólya urn with initial composition (R(0), B(0)) converges almost surely and that the limit distribution is β(R(0), B(0)). Because the sequence of draws is exchangeable, de Finetti's Theorem allows us to interpret the Pólya process as Bayesian observation of a coin with unknown bias, p, with a β(R(0), B(0)) prior on p, the probability of flipping "Red" (see the discussion in Section 2.2). Each new flip changes our posterior on p, the new posterior after n observations being exactly β(R(n), B(n)). When R(0) = B(0) = 1, the prior is uniform on [0, 1]. According to [Fel68, Chapter V, Section 2], Laplace used this model for a tongue-in-cheek estimate that the odds are 1.8 million to one in favor of the sun rising tomorrow; this is based on a record of the sun having risen every day in the modern era (about 5,000 years or 1.8 million days).
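The computation behind Laplace's joke is the rule of succession: with a uniform Beta(1, 1) prior and n sunrises in n observed days, the posterior is Beta(n + 1, 1), the predictive probability of one more sunrise is (n + 1)/(n + 2), and the odds are therefore n + 1 to 1. A sketch (the 5,000-year figure is the one quoted in the text):

```python
n = 5000 * 365                  # ~1.8 million days of recorded sunrises
posterior = (n + 1, 1)          # Beta parameters after n "red" draws from Beta(1,1)
p_rise = (n + 1) / (n + 2)      # predictive probability of another sunrise
odds = p_rise / (1 - p_rise)    # equals n + 1
print(f"odds of sunrise: {odds:,.0f} to 1")
```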

Dirichlet distributions
The urn representation of the β distribution generalizes in the following manner to any number of colors. Consider a d-color Pólya urn with initial quantities $R_1(0), \ldots, R_d(0)$. Blackwell and MacQueen [BM73, Theorem 1] showed that the limiting distribution is a Dirichlet distribution with parameters $(R_1(0), \ldots, R_d(0))$, where the Dirichlet distribution with parameters $(\alpha_1, \ldots, \alpha_d)$ is defined to be the measure on the (d − 1)-simplex with density
$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{j=1}^d x_j^{\alpha_j - 1} . \tag{4.2}$$
The Dirichlet distribution has important statistical properties, some of which we now discuss. Ferguson [Fer73] gives a formula and a discussion of the history. It was long known to Bayesians as the conjugate prior for the parameters of a multinomial distribution (Ferguson refers to [Goo65] for this fact). Thus, for example, the sequence of colors drawn from an urn with initial composition (1, . . . , 1) are distributed as flips of a d-sided coin whose probability vector is drawn from a prior that is uniform on the (d − 1)-simplex; the posterior after n flips will be a Dirichlet with parameters $(R_1(n), \ldots, R_d(n))$.
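Running a d-color Pólya urn for many steps gives one approximate sample from the Dirichlet limit; since the proportion of each color is a martingale, averaging over runs recovers the Dirichlet mean $\alpha_i / \sum_j \alpha_j$ exactly. A sketch (names illustrative):

```python
import random

def polya_proportions(init, steps, rng):
    """Multicolor Polya urn: repeatedly draw a ball (with probability
    proportional to the counts) and add one more of the same color."""
    counts = list(init)
    for _ in range(steps):
        i = rng.choices(range(len(counts)), weights=counts)[0]
        counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]

rng = random.Random(0)
samples = [polya_proportions((2, 1, 1), 200, rng) for _ in range(400)]
mean_first = sum(s[0] for s in samples) / len(samples)   # Dirichlet mean = 2/4
```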
Given a finite measure α on a space S, the Dirichlet process with reference measure α is a random measure ν on S such that for any disjoint sets A 1 , . . . , A d , the vector of random measures (ν(A 1 ), . . . , ν(A d )) has a Dirichlet distribution with parameters (α(A 1 ), . . . , α(A d )). We denote the law of ν by D(α). Because Dirichlet distributions are supported on the unit simplex, the random measure ν is almost surely a probability measure.
Ferguson [Fer73] suggests using the Dirichlet process as a natural, uninformative prior on the space of probability measures on S. Its chief virtue is the ease of computing the posterior: Ferguson shows that after observing independent samples $x_1, \ldots, x_n$ from an unknown measure ν distributed as D(α), the posterior for ν is $D(\alpha + \sum_{k=1}^n \delta(x_k))$, where $\delta(x_k)$ is a point mass at $x_k$. A corollary of this is a beautiful urn representation for D(α): it is the limiting contents of an S-colored Pólya urn with initial "contents" equal to α. A second virtue of the Dirichlet prior is that it is weakly dense in the space of probability measures on probability measures on the unit simplex. A drawback is that it is almost surely an atomic measure, meaning that it predicts the eventual occurrence of identical data values. One might prefer a prior supported on the space of continuous measures, although in this regard, the Dirichlet prior is more attractive than its best known predecessor, namely a random distribution function on [0, 1], defined by Dubins and Freedman [DF66], which is almost surely singular-continuous.
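The urn representation can be sampled directly: each new observation is either a fresh draw from the normalized reference measure (with probability proportional to its total mass) or a repeat of an earlier observation (with probability proportional to the number of past draws). The repeats are exactly the atoms just mentioned. A sketch, with illustrative names:

```python
import random

def dp_urn_draws(n, mass, base_draw, seed=0):
    """n sequential draws from the Polya-urn representation of a Dirichlet
    process: reference measure of total mass `mass`, base distribution
    sampled by base_draw(rng)."""
    rng = random.Random(seed)
    draws = []
    for k in range(n):
        if rng.uniform(0.0, mass + k) < mass:
            draws.append(base_draw(rng))      # new value from the base measure
        else:
            draws.append(rng.choice(draws))   # repeat a past value (an atom)
    return draws

xs = dp_urn_draws(500, 1.0, lambda r: r.random(), seed=0)
```

The expected number of distinct values after n draws is $\sum_{k=0}^{n-1} \mathrm{mass}/(\mathrm{mass}+k) \approx \mathrm{mass} \cdot \log n$, so ties are essentially certain: the sampled measure is atomic.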
The Dirichlet prior and the urn process representing it have been generalized in a number of ways. A random prior on the sequence space $E := \{0, \ldots, k-1\}^\infty$ is defined in [Fer74; MSW92] via an infinite k-ary tree of urns. Each urn is a Pólya urn, and the rule for a single update is as follows: sample from the urn at the root; if color j is chosen, put an extra ball of color j in that urn, move to the urn that is the j-th child, and repeat this sampling and moving infinitely often. Mapping the space E into any other space S gives a prior on S. Taking k = 2, S = [0, 1] and the binary map $(x_j) \mapsto \sum_j x_j 2^{-j}$, one recovers the almost surely singular-continuous prior of [DF66]. Taking k = 1, the tree is an infinite ray, and the construction may be used to obtain the Beta-Stacy prior [MSW00].
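This tree-of-urns prior can be sampled by keeping one Pólya urn per visited node of the k-ary tree; each draw walks down the tree, sampling and reinforcing the urns it passes, so draws sharing the same tree are dependent. A sketch, truncated at a finite depth (names illustrative):

```python
import random

def tree_urn_draw(urns, depth, k=2, rng=random):
    """One digit sequence from the tree of Polya urns; the `urns` dict
    persists between calls, so later draws are reinforced by earlier ones."""
    path, digits = (), []
    for _ in range(depth):
        counts = urns.setdefault(path, [1] * k)   # fresh urn: one ball per color
        r = rng.randrange(sum(counts))
        j, cum = 0, counts[0]
        while r >= cum:
            j += 1
            cum += counts[j]
        counts[j] += 1                            # reinforce the chosen color
        digits.append(j)
        path += (j,)
    return digits

urns = {}
points = [sum(d * 2.0 ** -(i + 1)
              for i, d in enumerate(tree_urn_draw(urns, 20, rng=random.Random(s))))
          for s in range(50)]
```

Mapping the digit sequences through the binary expansion, as above, gives draws from a random measure on [0, 1] in the spirit of [DF66].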
Another generalization formulates a natural conjugate prior on the transition matrix of a reversible Markov chain. The edge-reinforced random walk, defined in Section 2.1, is a Markov-exchangeable process (see the last sentence of Section 2.2). This implies that the law of this sequence is a mixture of laws of Markov chains. Given a set of initial weights on the edges, the mixing measure may be explicitly described, as in Theorem 5.1 below. Diaconis and Rolles [DR06] propose this family of measures, with initial weights as parameters, as priors over reversible Markov transition matrices. Suppose we fix such a prior, coming from initial weights {w(e)}, and we then observe a single sample $X_0, \ldots, X_n$ of the unknown reversible Markov chain run for time n.
The posterior distribution will then be another measure from this family, with weights $w(e) + N_n(e)$, where $N_n(e)$ denotes the number of times the observed path traverses the edge e up to time n. This is exactly analogous to Ferguson's use of Dirichlet priors for the parameter of an IID sequence and yields, as far as I know, the only computationally feasible Bayesian analysis of an unknown reversible Markov chain.
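Concretely, the conjugacy makes Bayesian updating mere bookkeeping: assuming, as in the edge-reinforced walk itself, that each traversal of an undirected edge adds one to its weight, the posterior is computed by counting crossings. A sketch (helper names are illustrative):

```python
def posterior_weights(prior, path):
    """Update edge-reinforced-random-walk prior weights from one observed
    trajectory: each traversal of an (undirected) edge adds 1 to its weight."""
    post = dict(prior)
    for a, b in zip(path, path[1:]):
        e = frozenset((a, b))
        post[e] = post.get(e, 0.0) + 1.0
    return post

prior = {frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0}
obs = [0, 1, 2, 1, 0, 1]        # crosses {0,1} three times and {1,2} twice
post = posterior_weights(prior, obs)
```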

The Greenwood-Yule distribution and applications
Distributions obtained from Pólya urn schemes have been proposed for a variety of applications in which the urn mechanism is plausible at the micro-level. For example, it is proposed in [Jan82] that the number of males born in a family of a specified size n might fit the distribution of a Pólya urn at time n better than a binomial (n, p) if the propensity of having a male was not a constant p but varied according to family. Mackerro and Lawson [ML82] make a similar case (with more convincing data) about the number of days in a given season that are suitable for crop spraying. For more amusing examples, see [Coh76].
Consider a Pólya urn started with R red balls and n black balls and run to time αn. The probability that no red ball is drawn during this time is equal to
$$\prod_{j=0}^{\alpha n - 1} \frac{n+j}{n+R+j} ,$$
which converges as n → ∞ to $(1+\alpha)^{-R}$. The probability of drawing exactly k red balls during this time converges as well. To identify the limit, use exchangeability to see that this is $\binom{\alpha n}{k}$ times the probability of choosing zero red balls in αn − k steps and then k red balls in a row. Thus the probability $p_{\alpha n}(k)$ of choosing exactly k red balls converges to
$$p(k) = \binom{R+k-1}{k} (1+\alpha)^{-R} \left( \frac{\alpha}{1+\alpha} \right)^k .$$
The limiting distribution is a distribution with very fat tails known as the Greenwood-Yule distribution (also, sometimes, the Eggenberger-Pólya distribution). Successive ratios p(k + 1)/p(k) are of the form $c \, \frac{R+k}{k+1}$, which may be contrasted with the successive ratios $c/(k+1)$ of the Poisson. Thus it is typically used in models where one occurrence may increase the propensity for the next occurrence. It is of historical interest because its use in modeling dependent events precedes the paper [EP23] of Eggenberger and Pólya by several years: the distribution was introduced by Greenwood and Yule [GY20] in order to model numbers of accidents in industrial worksites. More recently it has been proposed as a model for the number of crimes committed by an individual [Gre91], the spontaneous mutation rate in filamentous fungi [BB03] and the number of days in a dry spell [DGVEE05].
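The limit is easy to check numerically: simulate the urn for a large n and compare the frequency of runs with zero red draws against $(1+\alpha)^{-R}$. A sketch (names and parameter values illustrative):

```python
import random

def red_draws(R, n, alpha, rng):
    """Number of red draws when a Polya urn with R red and n black balls is
    run for alpha * n steps (each draw adds one ball of the drawn color)."""
    red, black, k = R, n, 0
    for _ in range(int(alpha * n)):
        if rng.uniform(0.0, red + black) < red:
            red += 1
            k += 1
        else:
            black += 1
    return k

rng = random.Random(0)
reps = [red_draws(2, 500, 1.0, rng) for _ in range(1000)]
frac_zero = sum(1 for k in reps if k == 0) / len(reps)  # limit: (1+1)**-2 = 0.25
```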
It is particularly interesting when the inference process is reversed. The cross-section of the number of particles created in high-speed hadronic collisions is known experimentally to have a Greenwood-Yule distribution. This has led physicists [YMN74; Min74] to look for a mechanism responsible for this, perhaps similar to the urn model for Bose-Einstein statistics.

Sequential design
The "two-armed" bandit, whose name seems already to have entered the folklore between 1952 and 1957 [Rob52; BJK62], is a slot machine with two arms. One arm yields a payoff of $1 with probability p and the other arm yields a payoff of $1 with probability q. The catch is, you don't know which arm is which, nor do you know p and q. The goal is to play so as to maximize your expected return, or limiting average expected return. When p and q are unknown, it is not at all obvious what to do. At the n th step, assuming you have played both arms by then, if you play the arm with the lower historical yield your immediate return is sub-optimal. However, if you always play the arm with the higher historical return, you could miss out forever on a much better action which misled you with an initial run of bad luck.
The type of analysis needed to solve the two-armed bandit problem goes by the names of sequential analysis, adaptive control, or stochastic or optimal control. Mathematically similar problems occur in statistical hypothesis testing and in the design of clinical trials. The formulation of what is to be optimized, and hence the solution to the problem, will vary with the particular application. In the gambling problem, one wants to maximize the expected return, in the sense of the limiting average (or perhaps the total return in a finite time or infinite time with the future discounted). Determining which of two distributions has a greater mean seems almost identical to the two-armed bandit problem but the objective function is probably some combination of a cost per observation and a reward according to the accuracy of the inference. When designing a clinical trial, say to determine which of two treatments is more effective, there are two competing goals because one is simultaneously gathering data and treating patients. The most data is gathered in a balanced design, where each treatment is tried equally often. But there is an ethical dilemma each time an apparently less effective treatment is prescribed, and the onus is to keep these to a minimum. A survey of both the statistical and ethical problems may be found in [Ros96].
The two-armed bandit problem may be played with asymptotic efficiency. In other words, letting $X_n$ be the payoff at time n, there is a strategy such that $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^n X_k = \max\{p, q\}$ no matter what the values of p and q. The first construction I am aware of is due to [Rob52]. A number of papers followed upon that, giving more quantitative solutions in the cases of a finite time horizon [Vog62b; Vog62a], under a finite memory constraint [Rob56; SP65; Sam68], or in a Bayesian framework [Fel62; FvZ70]. One way to formulate an algorithm for asymptotically optimal play is: let $\{\epsilon_n\}$ be a given sequence of real numbers converging to zero; with probability $1 - \epsilon_n$ at time n, play whichever arm up to now has the greater average return, and with probability $\epsilon_n$ play the other arm. Such an algorithm is described in [Duf96] and shown to be asymptotically efficient.
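A sketch of such a strategy with the illustrative choice $\epsilon_n = 1/n$ (not the specific construction of any of the cited papers):

```python
import random

def eps_greedy(p, q, n, seed=0):
    """Play n rounds with exploration probability eps_k = 1/k; returns the
    average payoff, which should approach max(p, q)."""
    rng = random.Random(seed)
    payoff = (p, q)
    pulls, wins, total = [0, 0], [0, 0], 0
    for k in range(1, n + 1):
        if 0 in pulls:                        # try each arm once first
            arm = pulls.index(0)
        else:
            # cross-multiplied comparison of historical average returns
            best = 0 if wins[0] * pulls[1] >= wins[1] * pulls[0] else 1
            arm = best if rng.random() > 1.0 / k else 1 - best
        x = 1 if rng.random() < payoff[arm] else 0
        pulls[arm] += 1
        wins[arm] += x
        total += x
    return total / n
```

Since $\sum_n 1/n = \infty$, both arms are sampled infinitely often, so the empirical leader eventually identifies the better arm; since $1/n \to 0$, the exploration cost is asymptotically negligible.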
In designing a clinical trial, it could be argued that the common good is best served by gathering the most data, since the harm to any finite number of patients who are given the inferior treatment is counterbalanced by the greater efficacy of treatment for all who follow. Block designs, for example alternating between the treatments, were once prevalent but suffer from being predictable by the physician and therefore not double blind.
In 1978, Wei and Durham [WD78] proposed the use of an urn scheme to dictate the sequence of plays in a medical trial. Suppose two treatments have dichotomous outcomes, one succeeding with probability p and the other with probability q, both unknown. In Wei and Durham's scheme there is an urn containing at any time two colors of balls, corresponding to the two treatments.
At each time a ball is drawn and replaced, and the corresponding treatment given. If the treatment succeeds, α balls of the same color and β < α balls of the opposite color are added; if the treatment fails, α balls of the opposite color and β balls of the same color are added. This is a GPU with random reinforcement and mean reinforcement matrix
$$\begin{pmatrix} \alpha p + \beta (1-p) & \beta p + \alpha (1-p) \\ \alpha (1-q) + \beta q & \alpha q + \beta (1-q) \end{pmatrix} .$$
The unique equilibrium gives nonzero frequencies to both treatments but favors the more effective treatment. It is easy to execute, unpredictable, and compromises between balance and favoring the superior treatment.
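A simulation sketch of the Wei-Durham urn (parameter values illustrative): with success probabilities p = 0.8, q = 0.4 and reinforcements α = 3, β = 1, the urn settles at an interior equilibrium that assigns the better treatment more often while keeping both frequencies nonzero.

```python
import random

def wei_durham(p, q, alpha, beta, n, seed=0):
    """Wei-Durham adaptive design; returns the fraction of the n patients
    assigned treatment 1 (success probabilities p and q)."""
    rng = random.Random(seed)
    balls = [1.0, 1.0]
    succ = (p, q)
    assigned = 0
    for _ in range(n):
        i = 0 if rng.uniform(0.0, balls[0] + balls[1]) < balls[0] else 1
        assigned += (i == 0)
        if rng.random() < succ[i]:
            balls[i] += alpha                 # success: mostly reinforce same color
            balls[1 - i] += beta
        else:
            balls[i] += beta                  # failure: mostly reinforce the other
            balls[1 - i] += alpha
    return assigned / n
```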
If one is relatively more concerned with reducing the number of inferior treatments prescribed, then one seeks something closer to asymptotic efficiency. It is possible to achieve this via an urn scheme as well. Perhaps the simplest way is to reinforce by a constant α if the chosen treatment is effective, but never to reinforce the treatment not chosen. The mean reinforcement matrix for this is simply $\begin{pmatrix} p & 0 \\ 0 & q \end{pmatrix}$. If p = q we have a Pólya urn with a random limit. If p > q we obtain the diagonal urn of Theorem 3.3; the urn population approaches a pure state consisting of only the more effective treatment, with the chance of assigning the inferior treatment at time n being on the order of $n^{-|p-q|/p}$. Surprisingly, the literature on urn schemes in sequential sampling, as recently as the survey [Dir00], contains no mention of such a scheme. In [LPT04] a stochastic approximation scheme is introduced. Their context is competing investments, and they assume a division of the portfolio into two investments $(X_n, 1 - X_n)$. Let $\{\gamma_n\}$ be a sequence of positive real numbers summing to infinity. Each day, a draw from the urn determines which investment to monitor: the first is monitored with probability $X_n$ and the second with probability $1 - X_n$. If the monitored investment exceeds some threshold, then a fraction $\gamma_n$ of the other investment is transferred into that investment. The respective probabilities for the investments to perform well are unknown and denoted by p and q. Defining $T_n$ recursively by $T_n / T_{n+1} = 1 - \gamma_n$, this is a time-dependent Pólya urn process (see Section 3.2) with $a_n = T_{n+1} - T_n$, modified so that the reinforcement only occurs if the chosen investment exceeds the threshold. If $\gamma_n = 1/n$ then $a_n \equiv 1$ and one obtains the diagonal Pólya urn of the preceding paragraph.
When p ≠ q, the only equilibria are at $X_n = 0$ and $X_n = 1$. The equilibrium at the endpoint 0 is attracting when p < q and repelling when p > q, and conversely for the equilibrium at 1. The attractor must be the limit of $\{X_n\}$ with positive probability, but can the repeller be the limit with positive probability? The answer depends on the sequence $\{\gamma_n\}$. It is shown in [LPT04] that for $\gamma_n \sim n^{-\alpha}$, the repeller can be a limit with positive probability when α < 1. Indeed, in this case it is easy to see that with positive probability, the attractor is chosen only finitely often. Since we assume $\sum_n \gamma_n = \infty$, this leaves interesting cases near $\gamma_n \approx n^{-1}$. In fact Lamberton, Pagès and Tarrès [LPT04, Corollary 2] show that for $\gamma_n = C/(n+C)$ and p > q, the probability of converging to the repeller is zero if and only if C < 1/p.

Learning
A problem of longstanding interest to psychologists is how behavior is learned. Consider a simple model where a subject faces a dichotomous choice: A or B. After choosing, the subject receives a reward. How is future behavior influenced by the reward? Here, the subjects may be animals or humans: in [Her70] pigeons pecked one of two keys and were rewarded with food; in [SP67] the subjects were rats and the reward was pleasant electrical stimulation; in [RE95] the subjects were human and the reward monetary; in [ES54] the subjects were human and success was its own reward. All of these experimenters wished primarily to describe what occurred.
The literature on this sort of learning model is large, but results tend to be mixed, with one model fitting one experiment but not generalizing well. I will, therefore, be content here to describe two popular models and say where they arise. A very basic model is that after a short while, the subject learns which option is best and fixates on that option. According to Herrnstein [Her70, page 243], this does not describe the majority of cases. A hypothesis over 100 years old [Tho98], called the law of effect, is that choices will be made with probabilities in proportion to the total reward accumulated when making that choice in the past. Given a (deterministic or stochastic) reward scheme, this then translates into a GPU. In the economic context, the law of effect, also called the matching law, is outlined by Roth and Erev [RE95]. They note a resemblance to the evolutionary dynamics formulated by Maynard Smith [MS82], though the models are not the same, and apply their model and some variants to a variety of economic games.
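As a GPU, the law of effect with a Bernoulli reward reads: keep a propensity (accumulated reward) per action, choose an action with probability proportional to the propensities, and add the realized reward to the chosen action's propensity. A sketch (names and parameter values illustrative):

```python
import random

def law_of_effect(success_prob, n, init=1.0, seed=0):
    """Choose an action with probability proportional to its accumulated
    reward; a chosen action pays 1 with its own success probability."""
    rng = random.Random(seed)
    acc = [init] * len(success_prob)     # accumulated reward per action
    counts = [0] * len(success_prob)
    for _ in range(n):
        j = rng.choices(range(len(acc)), weights=acc)[0]
        counts[j] += 1
        if rng.random() < success_prob[j]:
            acc[j] += 1.0                # reinforce only the chosen action
    return counts

counts = law_of_effect((0.9, 0.3), 20000, seed=0)
```

With unequal success probabilities this is a diagonal GPU, so play concentrates on the more rewarding action.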
Erev and Roth provide little philosophical justification for the matching law, though their paper has been very influential among evolutionary game theorists. When there are reasons to believe that decision making is operating at a simple level, such models are particularly compelling. In a study of decision making by individuals with brain damage stemming from Huntington's disease, Busemeyer and Stout [BS02] compare a number of plausible models including a Bayesian expected utility model, a stochastic model similar to the Markovian learning models described in the next paragraph, and a Roth-Erev type model. They estimate parameters and test the fit of each model, finding that the Roth-Erev model consistently outperforms the others. See Section 4.6 for more general justifications of this type of model.
A second type of learning model in the psychology literature is a Markovian model with constant step size, which exhibits a stationary distribution rather than convergence to a random limit. Norman [Nor74] reviews several such models, the simplest of which is as follows. A subject repeatedly predicts A or B (in this case, a human predicts whether or not a lamp will flash). The subject's internal state at time n is represented by the probability the subject will choose A, and is denoted $X_n$. The evolution rules contain four parameters, $\theta_1, \ldots, \theta_4 \in (0, 1)$. The four possible occurrences are: choose A correctly, choose A incorrectly, choose B incorrectly, or choose B correctly, and the new value of $X_{n+1}$ is respectively $X_n + \theta_1 (1 - X_n)$, $(1 - \theta_2) X_n$, $X_n + \theta_3 (1 - X_n)$ or $(1 - \theta_4) X_n$. Such models were introduced by [ES54; BM55]. The corresponding Markov chain on [0, 1] is amenable to analysis. One interesting result [Nor74, Theorem 3.3] concerns the case $\theta_1 = \theta_4 = \theta$ and $\theta_2 = \theta_3 = 0$. Sending θ to zero while nθ → t gives convergence of $X_n$ to the time-t distribution of a limiting diffusion.
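The four-parameter chain is a few lines of code (a sketch; names illustrative). Note that when all four parameters equal θ, the one-step drift is $\mathbb{E}[X_{n+1} - X_n \mid X_n] = \theta (p - X_n)$, where p is the probability the lamp flashes, so the stationary mean sits at p: the model exhibits probability matching.

```python
import random

def learning_chain(theta, p, n, x0=0.5, seed=0):
    """Simulate the four-parameter model; X = P(subject predicts A), and
    the lamp flashes (outcome A) with probability p."""
    t1, t2, t3, t4 = theta
    rng = random.Random(seed)
    x = x0
    for _ in range(n):
        chose_a = rng.random() < x
        outcome_a = rng.random() < p
        if chose_a and outcome_a:          # chose A correctly
            x = x + t1 * (1.0 - x)
        elif chose_a:                      # chose A incorrectly
            x = (1.0 - t2) * x
        elif outcome_a:                    # chose B incorrectly
            x = x + t3 * (1.0 - x)
        else:                              # chose B correctly
            x = (1.0 - t4) * x
    return x
```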

Evolutionary game theory
Evolutionary game theory is the marriage of the economic concepts of game theory and Nash equilibria with the paradigm of Darwinian evolution originating in biology. A useful reference is [HS98] (replacing the earlier work [HS88]), which has separate introductions for economists and biologists. This subject has exploded in the last several decades, with entire departments and institutes devoted to its study. Naturally, only a very small piece can be discussed here. I will present several applications that reflect the use of urn and reinforcement models, capturing the flavor of this area by giving a vignette rather than a careful history of ideas and methods in evolutionary game theory (and even then, it will take a few pages to arrive at any urn models).

Economics meets biology
Applications of evolutionary game theory arise both in economics and biology. This is because each discipline profits considerably from the paradigms of the other, as will now be discussed.
A dominant paradigm in genetics is the stochastic evolution of a genome in a fitness landscape. The fitness landscape is a function from genotypes to the real numbers, measuring the adaptive fitness of the corresponding phenotype in the existing environment. A variety of models exist for the change in populations of genotypes based on natural selection with respect to the fitness landscape. Often, randomness is introduced by mechanisms of mutation as well as by stochastic modeling of interactions with the environment. Much of the import of any particular model is in the details of the fitness landscape. Any realistic fitness landscape is hopelessly intractable and different choices of simplifications lead to models illuminating different aspects of evolution.
Game theory enters the biological scene as one type of model for fitness, designed to capture some aspect of the behavior of interacting organisms. Game theoretic models focus on one or two behavioral attributes, usually modeled as expressions of single genes. Different genotypes correspond to different strategies in a single game. Fitness is modeled by the payoff of the given strategy against a mix of other strategies determined by the entire population. Selection acts through increased reproduction as a function of fitness.
In economics, the theory of games and equilibria has been a longstanding dominant paradigm. Interactions between two or more agents are formalized by payoff matrices. Pure and mixed strategies are allowed, but it is generally held that the only strategies that should end up played by rational, informed agents should be Nash equilibria 4 , that is, strategies that cannot be improved upon given the stochastic mix of strategies in use by the other agents. Two-player games of perfect information are relatively straightforward under assumptions of rationality and perfect information. There is, however, often a distressing lack of correspondence between actual behavior and what is predicted by Nash equilibrium theory.

Equilibrium selection
Equilibrium theory can only predict that certain strategies will not be played, leaving open the question of selection among different equilibria. Thus, among the questions that motivated the introduction of evolutionary mechanisms are:
• equilibrium selection: Which of the equilibria will be played?
• equilibrium formation: By what route does a population of players come to an equilibrium?
• equilibrium or not: Will an equilibrium be played at all?
Darwinism enters the economic scene as a means of incorporating bounded information and rationality, explaining equilibrium selection, and modeling games repeated over time and among collections of agents. Assumptions of perfect information and rationality are drastically weakened. Instead, one assumes that individual agents arrive with specific strategies, which they alter only due to data about how well these work (fitness) or to unlikely chance events (mutation). These models make sense in several types of situation. One is when agents are assumed to have low information, for instance in modeling adoption of new technology by consumers, companies, and industries (see the discussion in Section 4.1 of VHS versus Betamax, or Apple versus Mac). Another is when agents are bound by laws, rules or protocols. These, by their nature, must be simple and general 5 .
One early application of evolutionary game theory was to explain how players might avoid a Pareto-dominated equilibrium. The ultimate form of this is the Prisoner's Dilemma paradox, in which smart people (e.g., game theorists) must choose the only Nash equilibrium, but this is not Pareto-optimal and in fact is dominated by a non-equilibrium play chosen by uneducated people (e.g., mobsters). There are by now many solutions to this dilemma, most commonly involving repeated play. Along the lines of evolutionary game theory, large-scale interactive experiments have been run 6 in which contestants are solicited to submit computer programs that embody various strategies in repeated Prisoner's Dilemma, and then these are run against each other (in segments of 50 games against each individual opponent) with actual stochastic replicator dynamics to determine which strategies thrive in evolving populations 7 .

4 Many refinements of this notion have been formulated, including subgame-perfect equilibria, coordinated equilibria, etc.

5 Morals and social norms may be viewed as simple and general principles that may be applied to complex situations. An evolutionary game theoretic approach to explaining these may therefore seem inevitable, and indeed this is the thrust of recent works such as [Sky04; Ale05].

6 The first was apparently run by Robert Axelrod, a political scientist at the University of Michigan.
In the context of more general two-player games, Harsanyi and Selten introduced the concept of the risk-dominant equilibrium. This is a notion satisfying certain axioms, among which are naturality not only with respect to game-theoretic equivalences but also with respect to the best-reply structure. Consider symmetric 2 × 2 games of the form
$$\begin{pmatrix} (a, y) & (0, 0) \\ (0, 0) & (b, z) \end{pmatrix} .$$
When a > b and z > y this is a prototypical Nash Bargaining Game. The strategy pair (1, 1) is risk-dominant if ay > bz. For these games, Pareto-optimality implies risk-dominance, but for other 2 × 2 games with multiple equilibria, the risk-dominant equilibrium may not be Pareto-optimal. Another development in the theory of equilibrium selection, dating back to around 1973, was Selten's trembling hand. This is the notion of stochastically perturbing a player's chosen strategy with a small probability ε. The idea is that even in an obviously mutually beneficial Nash equilibrium, there is some chance that the opponent will switch to another strategy by mistake (a trembling of the hand), if not through malice or stupidity 8 . A number of notions of equilibria stable under such perturbations arose, depending on the exact model for the ε-perturbation, and the way in which ε → 0. An early definition due to J. Maynard Smith was formulated without probability. An evolutionarily stable strategy is a strategy such that if it is adopted by a fraction 1 − ε of the population then, for sufficiently small ε, any other strategy played by the remaining fraction ε fares worse.

Replicator dynamics
One of the earliest and most basic evolutionary game theoretic models is the replicator. There are two versions: the (deterministic) replicator dynamical system and the stochastic replicator. The deterministic replicator assumes a population in which pairs of players with strategy types 1, . . . , m are repeatedly selected at random from a large population, matched against each other in a fixed (generally non-zero-sum) two-player game, and then given a selective advantage in accordance with the outcome of the game. Formally, the model is defined as follows. Fix a two-player (non-zero-sum) game with m strategies for each player such that the payoff to i when playing i against j does not depend on whether the player is Player 1 or Player 2; the matrix of these outcomes is denoted M. Let X(t) denote the normalized population vector, that is, $X_i(t)$ is the proportion of the population at time t that is of type i. For any normalized population vector y, the expected outcome for strategy i against a random pick from the population is $E(i, y) := \sum_{j=1}^m M_{i,j} y_j$. Let $E'(i, y) := E(i, y) - E_0(y)$ where $E_0(y) := \sum_{j=1}^m y_j E(j, y)$ is the average fitness of the population y; we interpret $E'(i, y)$ as the selective advantage of type i in population y. The replicator model is the differential equation
$$\frac{d}{dt} X_i(t) = X_i(t) \, E'(i, X(t)) , \qquad 1 \le i \le m .$$
Hofbauer and Sigmund [HS88] study it extensively. The notion of evolutionarily stable strategies may be generalized to mixed strategies by means of replicator dynamics. Nash equilibria correspond to rest points for the replicator dynamics. An evolutionarily stable state is a population vector that is an attractor for the replicator dynamics (see [HS98, Theorem 7.3.2]).
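An Euler-discretization sketch of the standard replicator equation $\frac{d}{dt} x_i = x_i (E(i,x) - E_0(x))$ (the game matrix below is illustrative). Note that each step preserves $\sum_i x_i$ exactly, since $\sum_i x_i (E(i,x) - E_0(x)) = 0$:

```python
def replicator_step(x, M, dt):
    """One Euler step of dx_i/dt = x_i * (E(i, x) - E0(x))."""
    m = len(x)
    E = [sum(M[i][j] * x[j] for j in range(m)) for i in range(m)]
    E0 = sum(x[i] * E[i] for i in range(m))          # population-average fitness
    return [x[i] + dt * x[i] * (E[i] - E0) for i in range(m)]

# Coordination game with diagonal payoffs 2 and 1: two pure Nash equilibria.
# Starting with a majority for strategy 1 (above the basin boundary x1 = 1/3),
# the flow converges to the fixation point (1, 0).
M = [[2.0, 0.0], [0.0, 1.0]]
x = [0.6, 0.4]
for _ in range(2000):
    x = replicator_step(x, M, 0.01)
```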
The presence of the continuous parameter in replicator dynamics indicates that they are a large-population limit. There are a number of discrete systems achieving this limit, but one of the most natural is the stochastic replicator. Fix a positive integer d and a d × d real matrix M. We view M as the payoff matrix (for the first player) in a two-player game with d possible strategies, and assume it is normalized to have nonnegative entries. At each integer time t ≥ 0 there is a population of some size N(t), consisting of individuals whose only attributes are their types, the allowed types being {1, . . . , d}. These individuals are represented by an urn with balls of colors 1, . . . , d numbering N(t) altogether. The population at time t + 1 is determined as follows. Draw i and j at random from the population at time t (with replacement) and return them to the urn along with $M_{ij}$ extra balls of type i. The interpretation is that $M_{ij}$ is the fitness of strategy i against strategy j and that the interaction between the two agents causes the representation of type i in the population to change on average by an amount proportional to its fitness against the other strategy it encounters. Repeating this will allow the average growth of type i to be proportional to its average success against all strategies weighted by their representation in the population. One might expect an increase as well of $M_{ji}$ in type j, since the interaction has, after all, effects on two agents; in the long run such a term would simply double the rate of change, since an individual will on average be chosen to be Player 1 half the time.
Much of the preceding paragraph is drawn from S. Schreiber's article [Sch01], in which further randomization is allowed (M is the mean matrix for a random increment); as we have seen before, this randomness is not particularly consequential; enough randomness enters through the choice of two individual players. Schreiber also allows M ij ∈ [−1, 0], which gives his results more general scope than some of their predecessors.
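As a concrete check on this description, the urn dynamics are easy to simulate. The sketch below is mine (the function name and the toy 2 × 2 payoff matrix are illustrative assumptions, not taken from [Sch01]): at each step two balls are drawn with replacement and $M_{ij}$ balls of the first drawn type are added.

```python
import random

def replicator_urn(M, z0, steps, rng=random.Random(0)):
    """Stochastic replicator urn.

    M     : d x d payoff matrix with nonnegative integer entries
            (M[i][j] balls of type i are added when i meets j).
    z0    : initial counts of each type.
    Returns the final counts."""
    z = list(z0)
    for _ in range(steps):
        # draw two individuals with replacement, proportional to counts
        i = rng.choices(range(len(z)), weights=z)[0]
        j = rng.choices(range(len(z)), weights=z)[0]
        # reinforce the first drawn type by its payoff against the second
        z[i] += M[i][j]
    return z

# toy coordination payoff matrix (illustrative only)
M = [[2, 0],
     [0, 1]]
z = replicator_urn(M, [10, 10], 2000)
x = [c / sum(z) for c in z]  # normalized population vector X(t)
```

Note that counts never decrease (entries of M are nonnegative here), matching the urn description; Schreiber's extension to $M_{ij} \in [-1, 0]$ would require allowing removals.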
The stochastic replicator is evidently a generalized Pólya urn and its mean ODE is $\frac{d}{dt} Z_i(t) = X_i(t) \, E(i, X(t))$, where X(t) := Z(t)/|Z(t)| is the normalized population vector and |Z(t)| is the sum of the components of Z(t). The normalized vector evolves, as promised, by a (possibly time-changed) replicator equation: $\frac{d}{dt} X_i(t) = |Z(t)|^{-1} X_i(t) \, E'(i, X(t))$. In other words, the growth rate of each type is proportional to its selective advantage. The early study of replicator dynamics concentrated on determining trajectories of the dynamical systems, formulating a notion of stability (such as the evolutionarily stable strategy of [MSP73]), and applying these to theoretically interesting biological systems (see especially [MS82]).
The stochastic replicator process fits into the framework of Benaïm et al. described in Section 2.5 (except for the possibility of extinction when M ii is allowed to be negative). Schreiber [Sch01, Theorem 2.2] proves a version of Theorem 2.13 for replicator processes, holding on the event of nonextinction. This allows him to derive a version of Corollary 2.15 for replicator processes. It follows from the attractor convergence theorem (Theorem 2.16) that any attractor of the dynamical system attracts the replicator process with positive probability [BST04, Theorem 7].
Completing the circle of ideas, Schreiber has applied his results to a biological model. In [SL96], data is presented showing that three possible color patterns and associated behaviors among the side-blotched lizard Uta stansburiana have a non-transitive dominance order in terms of success in competing for females 9 . Furthermore, the evolution of population vectors over a six-year period showed a cycle predicted by the dynamical system models of Maynard Smith, which are cited in the paper. Schreiber then applies the replicator urn dynamics. These are the same as in the classic Rock-Paper-Scissors example analyzed in [HS98], and they predict initial cycling followed by convergence to an even mix of all three types in the population.

Fictitious play
A quest somewhat related to the problem of explaining equilibrium selection is the problem of finding a mechanism by which a population might evolve toward any equilibrium at all in a game with many strategies. In other words, the emphasis moves from explaining behavior in as Darwinistic a manner as possible to using the idea of natural selection to formulate a coordination algorithm by means of which relatively uninformed agents might adaptively find good (i.e., equilibrium) strategies. Such algorithms are quite important in computer science (internet protocols for use of shared channels, coordination protocols for parallel processing, and so forth).
In 1951, G. Brown [Bro51] proposed a mechanism known as fictitious play. A payoff matrix M is given for a two-player, zero-sum game. Two players play the game repeatedly, with each player choosing at time n + 1 an action that is optimal under the assumption that the other player will play according to the past empirical distribution. That is, Player 1 plays i on turn n + 1, where i is a value of x maximizing the average payoff $n^{-1} \sum_{k=1}^{n} M_{x, y_k}$ and $y_1, \ldots, y_n$ are the previous plays of Player 2; Player 2 plays analogously. Robinson [Rob51] showed that for each player, the empirical distribution of their play converges to an optimal mixed strategy 10 .
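Brown's procedure can be sketched in a few lines. The implementation below is an illustrative sketch of mine (deterministic tie-breaking and the matching pennies payoff matrix are my choices, not from [Bro51]); by Robinson's theorem the empirical frequencies should approach the optimal mixed strategy (1/2, 1/2).

```python
def fictitious_play(M, rounds):
    """Fictitious play for a two-player zero-sum game with payoff
    matrix M (row player's payoff).  Each round, each player
    best-responds to the opponent's empirical distribution of play."""
    m, n = len(M), len(M[0])
    counts1, counts2 = [0] * m, [0] * n  # empirical play counts
    counts1[0] += 1  # arbitrary first moves
    counts2[0] += 1
    for _ in range(rounds - 1):
        # row player maximizes average payoff against counts2
        i = max(range(m), key=lambda x: sum(M[x][y] * counts2[y] for y in range(n)))
        # column player minimizes row player's payoff against counts1
        j = min(range(n), key=lambda y: sum(M[x][y] * counts1[x] for x in range(m)))
        counts1[i] += 1
        counts2[j] += 1
    return ([c / rounds for c in counts1], [c / rounds for c in counts2])

# matching pennies: the unique optimal mixed strategy is (1/2, 1/2)
M = [[1, -1],
     [-1, 1]]
p1, p2 = fictitious_play(M, 5000)
```

The actual plays cycle in growing blocks; only the empirical distributions settle down, which is exactly the coordination issue discussed below.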
Fictitious play makes sense for non-zero-sum games as well, and for games with more than two players, provided it is specified whether the Bayesian assumption is that each other player independently plays from his empirical distribution or whether the joint play of the other players is from the joint empirical distribution. Robinson's result was extended to non-zero-sum 2 × 2 games by [Miy61], but then shown to fail in general by Shapley [Sha64] (a two-player, three-strategy counterexample; see also [Jor93] for a counterexample with dichotomous strategies but three players). There are, however, subclasses of non-zero-sum games for which fictitious play has been shown to converge to Nash equilibria. These include potential games [MS96] (every player receives the same payoff), supermodular games [MR90] (the payoff matrix is supermodular) and games with interior evolutionarily stable strategies.
Although originally proposed as a computational mechanism, fictitious play became popular among behavioral modelers. However, when interpreted as a psychological micro-level mechanism, there are troubling aspects to fictitious play. For a two-player zero-sum game with a unique Nash equilibrium, while the marginals will converge to a saddle point, the plays of the two players may be entirely coordinated, so that actual payoffs may not have the correct long-run average. When there are more than two players, modeling the opponents' future plays as independent picks from empirical marginals seems overly naïve because the empirical joint distribution is known. (The coordination problems that can arise with two players can be thought of in the same way: a failure to model dependence between the opponent's plays and one's own.) Fudenberg and Kreps [FK93] address these concerns via a greatly generalized framework of optimum response. Their chief concern is to give a notion of convergence to Nash equilibrium that precludes the kind of coordination problems mentioned above. In doing so, they take up the notion, due to Harsanyi [Har73], of stochastically perturbed best response, in which each player has independent noise added to the utilities during the computation of the optimum response. They then extend Miyasawa's result on convergence of fictitious play for 2 × 2 non-zero-sum games to the setting of stochastic fictitious play, under the assumption of a unique Nash equilibrium [FK93, Proposition 8.1].
Stochastically perturbed fictitious play fits directly into the stochastic approximation framework. While the stochastic element caused technical difficulties for Fudenberg and Kreps, for whom the available technology was limited to pre-1990 works such as [KC78; Lju77], this same element fits nicely into the framework of Benaïm et al. to eliminate unstable trajectories. The groundwork for an analysis in the stochastic approximation framework was laid in [BH99a]. They obtain the usual basic conclusions: the system converges to chain recurrent sets for the associated ODE, and attractors attract with positive probability.
They give examples of failure to converge, including the stochastic analogue of Jordan's 2 × 2 × 2 counterexample. They then begin to catalogue cases where stochastic fictitious play does converge. Under suitable nondegeneracy assumptions on the noise, they extend [FK93, Proposition 8.1] to allow at most countably many Nash equilibria. Perhaps more interesting is their introduction of a class of two-player n × 2 games they call generalized coordination games, for which they are able to obtain convergence of stochastic fictitious play. This condition is somewhat restrictive, but in a subsequent work [BH99b], they formulate a simpler and more general condition. Let F denote the vector field of the stochastic approximation process associated with stochastically perturbed fictitious play for a given m-player (non-zero-sum) game. Say that F is cooperative if $\partial F_i / \partial x_j \geq 0$ for every $i \neq j$. For example, it turns out that the vector field for any generalized coordination game is cooperative. Under a number of technical assumptions, they prove the following result for any cooperative stochastic approximation. Note, though, that this is proved for stochastic approximations with constant step size ε, as ε → 0; this is in keeping with the prevailing economic formulations of perturbed equilibria, but in contrast to the usual stochastic approximation framework. Theorem: If F is cooperative then, as ε → 0, the empirical measure of the stochastic approximation process converges in probability to the set of equilibria of the vector field F. If in addition F is real analytic or has only finitely many stable equilibria, then the empirical distribution converges to an asymptotically stable equilibrium.
Remark. This result requires constant step size (2.6) but is conjectured to hold under (2.7)-(2.8); see [Ben00, Conjecture 2.3]. The difficulty is that the convergence theorems for general step sizes require smoother unstable manifolds than can be proved using the cooperation hypothesis.
Benaïm and Hirsch then show that this result applies to any m-player generalized coordination game with stochastic fictitious play with optimal response determined as in the framework of [FK93], provided that the response map is smooth (which requires some noise). Generalized coordination games by definition have only two strategies per player, so the extension of these results to multi-strategy games was left open. At the time of writing, the final installment in the story of stochastic fictitious play is the extension by Hofbauer and Sandholm of the non-stochastic convergence results (for potential games, supermodular games and games with an internal evolutionarily stable strategy) to the stochastic setting [HS02]. Forthcoming work of Benaïm, Hofbauer and Sorin [BHS05; BHS06] replaces the differential equation by a set-valued differential inclusion in order to handle fictitious play with imperfect information or with discontinuous F.

Agent-based modeling
In agent-based models, according to [Bon02], "A system is modeled as a collection of autonomous decision-making entities called agents, [with each] agent individually assessing its situation and making decisions on the basis of a set of rules." A typical example is a graph theoretic model, where the agents are vertices of a graph and at each time step, each agent chooses an action based on various characteristics of its neighbors in the graph; these actions, together with external sources of randomness, determine outcomes which may alter the characteristics of the agents. Stochastic replicator dynamics fall within this rubric, as do a number of the other processes already discussed. The boundaries are blurry, but this section is chiefly devoted to agent-based models from the social sciences, in which some sort of graph theoretic structure is imposed.
Analytic intractability is the rule rather than the exception for such models. The recent boom in agent-based modeling is probably due to the emergence of fast computers and of software platforms specialized to perform agent-based simulation. One scientific utility for such models is to give simple explanations for complex phenomena. Another motivation comes from psychology. Even in situations where people are capable of some kind of rational game-theoretic computation, evidence shows that actual decision mechanisms are often much more primitive. Brain architecture dictates that the different components of a decision are processed by different centers, with the responses then chemically or electrically superimposed (see for example [AHS05]). Three realistic components of decision making, captured better by agent-based models than by rational choice models, are noted by Flache and Macy [FM02, page 633]:
• Players develop preferences for choices associated with better outcomes, even though the association may be coincidental, causally spurious, or superstitious.
• Decisions are driven by the two simultaneous and distinct mechanisms of reward and punishment, which are known to operate ubiquitously in humans.
• Satisficing, or persisting in a strategy that yields a positive but not optimal outcome, is common and indicates a mechanism of reinforcement rather than optimization.
Agent-based models now abound in a variety of social science disciplines, including psychology, sociology [BL03], public health [EL04], and political science [OMH+04]. The discussion here will concentrate on a few game-theoretic applications in which rigorous results have been obtained.
A number of recent analyses have centered on a two-player coordination game similar to Rousseau's stag hunt. Each player can choose to hunt rabbits or stags. The payoff is bigger for a stag, but the stag hunt is successful only if both players hunt stag, whereas rabbit hunting is always successful. More generally, consider a payoff matrix as follows:
$$\begin{pmatrix} (a, a) & (c, d) \\ (d, c) & (b, b) \end{pmatrix} \qquad (4.5)$$
When a > d and b > c, the outcomes (a, a) and (b, b) are both Nash equilibria. Assume these inequalities, and without loss of generality assume a > b; then (a, a) is always the unique Pareto-optimal equilibrium. In 1993, Kandori, Mailath and Rob [KMR93] analyzed a very general class of evolutionary dynamics for populations of N individuals distributed between the two strategy types. The class included the following extreme version of stochastic replicator dynamics: each player independently with probability 1 − 2ε changes type to whichever strategy type was most successful against the present population mix, and with probability 2ε resets the type according to the result of independent fair coins. In the case of a game described by (4.5), they showed that the resulting Markov chain always converges to the risk-dominant equilibrium, in the sense that the chain has a stationary measure µ N,ε concentrating on that equilibrium as ε → 0. Proof: Assume without loss of generality that a − d > b − c, that is, that strategy 1 is risk-dominant. There is an embedded two-state Markov chain, where state 1 contains all populations in which the proportion of type 1 players is at least α, with α = α(ε) the threshold fraction above which strategy 1 is superior to strategy 2 against such a population. Because a − d > b − c, we know α < 1/2. Going from state 2 to state 1 requires at least αN "mutations" (types chosen by coin flip), while going from state 1 to state 2 requires at least (1 − α)N mutations. The ratio of the stationary measures of state 1 to state 2 tends to the ratio of these two probabilities, which goes to infinity as ε → 0.
Unfortunately, the waiting time to get from either state to the other is exponential in N log(1/ε), meaning that for many realistic parameter values, the population, if started at the sub-optimal equilibrium, does not have time to learn the better equilibrium. That many simultaneous mutations are about as rare as all the oxygen molecules suddenly moving to the other side of the room (well, not quite). Ellison [Ell93] proposes a variant. Let the agents be labeled by the integers modulo N, and for fixed k < N/2, let i and j be considered neighbors if their graph distance is at most k. Ellison's dynamics are the same as in [KMR93] except that each agent with probability 1 − 2ε chooses the best play against the reference population consisting of that individual together with its 2k neighbors. The following result shows that when global interactions are replaced by local interactions, the population learns the optimal equilibrium much more rapidly. Proof: Let j ≤ k be such that j out of 2k + 1 neighbors of type 1 is sufficient to make strategy 1 optimal. Once there are j consecutive players of type 1, the size of the interval of players of type 1 (allowing an ε fraction of errors) will tend to increase by roughly 2(k − j − εN) at each turn. The probability of the interval r + 1, . . . , r + j all turning to type 1 in one step is small but nonzero, so such an interval arises in time that does not grow with N.
The issue of how people might come to choose the superior (a, a) in this case has been of longstanding concern to game theorists. In [SP00], a new evolutionary dynamic is introduced. A two-player game is fixed, along with a population of players labeled 1, . . . , N. Each player is initially assigned a strategy type. Positive weights w(i, j, 1) are assigned as well, usually all equal to 1. The novel element of the model is the simultaneous evolution of network structure with strategy. Specifically, the network at time t is given by the collection of weights w(i, j, t) representing propensities for player i to interact with player j at time t. At each time step, each player i chooses a partner j independently at random with probabilities proportional to w(i, j, t), then plays the game with the partner. After this, w(i, j, t + 1) is set equal to w(i, j, t) + u and w(j, i, t + 1) is set equal to w(j, i, t) + u′, where u and u′ are the respective utilities obtained by players i and j. (Note that each player plays at least once in each round, but more than once if the player is chosen as partner by one or more of the other players.) In their first model, Skyrms and Pemantle take the strategy type to be fixed and examine the results of evolving network structure. Their first result concerns a population with 2k > 0 stag hunters and 2(n − k) > 0 rabbit hunters: under the above network evolution rules, with no evolution or mutation of strategies, as t → ∞, the probability approaches 1 that all stag hunters choose stag hunters and all rabbit hunters choose rabbit hunters.
Proof: If i is a stag hunter and j is a rabbit hunter then w(i, j, t) remains 1 for all time; hence stag hunters do not choose rabbit hunters in the limit. The situation is more complicated for w(j, i, t), since rabbit hunters get reinforced no matter whom they choose or are chosen by. However, if A denotes the set of stag hunters and $Z(j, t) := \sum_{i \in A} w(j, i, t) / \sum_{i} w(j, i, t)$ denotes the probability that j will choose a stag hunter at time t, then it is not hard to find λ, µ > 0 such that exp(λZ(j, t) + µ log t) is a supermartingale, which implies that Z(j, t) → 0 (in fact, exponentially fast in log t).
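The segregation described in this proof sketch is visible in a small simulation of the weight dynamics. The sketch below is mine; the stag hunt payoff values (stag-stag pays 1, rabbit always pays 0.75, stag against rabbit pays 0) are illustrative assumptions, not taken from [SP00].

```python
import random

def payoff(me_stag, partner_stag):
    """Illustrative stag hunt payoffs: stag pays 1 only if both hunt
    stag; rabbit always pays 0.75 regardless of partner."""
    if me_stag:
        return 1.0 if partner_stag else 0.0
    return 0.75

def evolve_network(types, rounds, rng=random.Random(0)):
    """Skyrms-Pemantle style weight evolution with fixed strategies:
    each round, every player picks a partner proportionally to the
    weights, and both weights are reinforced by the players' utilities."""
    n = len(types)
    # w[i][j] = propensity of i to pick j; no self-interaction
    w = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for _ in range(rounds):
        for i in range(n):
            j = rng.choices(range(n), weights=w[i])[0]
            w[i][j] += payoff(types[i], types[j])
            w[j][i] += payoff(types[j], types[i])
    return w

types = [True, True, False, False]  # two stag hunters, two rabbit hunters
w = evolve_network(types, 2000)
# probability that stag hunter 0 picks the other stag hunter
p_stag = w[0][1] / sum(w[0])
```

As in the first step of the proof, the weight from a stag hunter to each rabbit hunter remains exactly 1, while the weight to the other stag hunter grows without bound, so the choice probability tends to 1.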
Further results via simulation show that when each agent after each round decides with a fixed probability ε > 0 to switch to the strategy that is optimal against the present population, then all agents converge to a single type; it is random which type, however. When the evolution of strategy was slow (e.g., ε = 1/100), the system usually ended up at the optimal equilibrium (everyone hunts stag), but when the evolution of strategy was more rapid (e.g., ε = 1/10), the majority (78%) of the simulations resulted in the maximin equilibrium where everyone hunts rabbits. Evidently, more rapid evolution of strategy causes the system to mirror the stochastic replicator models, in which the risk-dominant equilibrium is always chosen.

Splines and interpolating curves
Computer-aided drawing programs often provide interpolated curves. A finite sequence $x_0, \ldots, x_n$ of control points in $\mathbb{R}^d$ is specified, and a curve {f(t) : 0 ≤ t ≤ 1} is generated which in some sense approximates the polygonal path g(t) defined to equal $x_k + (nt - k)(x_{k+1} - x_k)$ for k/n ≤ t ≤ (k + 1)/n. In many cases, the formula for producing f is $f(t) = \sum_{k=0}^{n} B_{n,k}(t) \, x_k$. Depending on the choice of $\{B_{n,k}(t)\}$, one obtains some of the familiar blending curves: Bézier curves, B-splines, and so forth.
Goldman [Gol85] proposes a new family of blending functions. Consider a two-color Pólya urn with constant reinforcement c ≥ 0, initially containing weight t of red balls and weight 1 − t of black balls. Let $B_{n,k}(t)$ be the probability of obtaining exactly k red balls in the first n trials. The functions $\{B_{n,k}\}$ are shown to have almost all of the requisite properties for families of blending functions. In particular, (i) $\{B_{n,k}(t) : k = 0, \ldots, n\}$ are nonnegative and sum to 1, implying that the interpolated curve is in the convex hull of the polygonal curve; (ii) $B_{n,k}(t) = B_{n,n-k}(1 - t)$, implying symmetry under reversal; (iii) $B_{n,k}(0) = \delta_{k,0}$ and $B_{n,k}(1) = \delta_{k,n}$, implying that the curve and polygon have the same endpoints (useful for piecing together curves); (iv) $\sum_{k=0}^{n} k B_{n,k}(t) = nt$, implying that the curve is a line when $x_{k+1} - x_k$ is independent of k; (v) the curve is less wiggly than the polygonal path: for any vector v, the number of sign changes of f(t) · v is at most the number of sign changes of g(t) · v; (vi) given control points $P_0, \ldots, P_n$ there are $Q_0, \ldots, Q_{n+1}$ that reproduce the same curve f(t) with the same parametrization; (vii) any segment {f(t) : a ≤ t ≤ b} of the curve with control points $P_0, \ldots, P_n$ is reproducible as the entire curve corresponding to control points $Q_0, \ldots, Q_n$, where the parametrization may differ but n remains the same.
There is of course an explicit formula for the polynomials $B_{n,k}(t)$. This generalizes the Bernstein polynomials, which are obtained when the reinforcement parameter c is zero. However, the urn model pulls its weight in the sense that verification of many of the features is simplified by the urn interpretation. For example, property (i) translates simply to the fact that for fixed n and t the quantities $B_{n,k}(t)$ are the probabilities of the n + 1 possible values of a random variable.
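Concretely, by exchangeability of the Pólya urn, every draw sequence with exactly k reds has the same probability, so $B_{n,k}(t)$ is a binomial coefficient times a product. The sketch below (function name mine) computes $B_{n,k}(t)$ this way; properties (i), (ii) and (iv) can then be checked numerically, and c = 0 recovers the Bernstein polynomials.

```python
from math import comb

def blending(n, k, t, c):
    """Goldman's urn-based blending function B_{n,k}(t): probability
    of exactly k red draws in n steps of a Polya urn with initial red
    weight t, black weight 1-t, and reinforcement c per draw.  By
    exchangeability every k-red sequence has the same probability."""
    num = 1.0
    for i in range(k):
        num *= t + i * c          # successive red weights
    for j in range(n - k):
        num *= (1 - t) + j * c    # successive black weights
    den = 1.0
    for m in range(n):
        den *= 1 + m * c          # total weight before each draw
    return comb(n, k) * num / den
```

Setting c = 0 gives blending(n, k, t, 0) = C(n, k) t^k (1 − t)^(n−k), the Bernstein polynomials.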
In a subsequent paper [Gol88a], Goldman goes on to represent the so-called Beta-spline functions of [Bar81] via a somewhat more complicated time-varying Friedman urn model. Classical B-splines have a similar representation, which has consequences for the closeness of approximations by B-splines and Bernstein polynomials [Gol88b].

Image reconstruction
An interesting application of a network of Pólya urns is described in [BBA99]. The object is to reconstruct an image, represented in a grid of pixels, each of which contains a single color from a finite color set {1, . . . , k}. Some coherence of the image is presumed, indicating that pixels dissimilar to their neighbors are probably errors and should be changed to agree with their neighbors. Among the existing methods to do this are maximum likelihood estimators, Markov random field models with Gibbs-sampler updating, and smoothing via wavelets. Computation of the MLE may be difficult, the Gibbs sampler may converge too slowly, and wavelet computation may be time-consuming as well.
Banerjee et al. propose letting the image evolve stochastically via a network of urns. This is fast, parallelizable, and should capture the qualitative features of smoothing. The procedure is as follows. There is an urn for each pixel. Initially, urn x contains x(j) balls of color j, where $x(j) := \sum_{y \neq x} d(x, y)^{-1} \delta(y, j)$, d(x, y) is the distance between pixels x and y, and δ(y, j) is one if pixel y is colored j and zero otherwise. In other words, the initial contents are determined by the empirical distribution of colors near x, weighted by inverse distance. Define a neighborhood structure: for each x there is a set of pixels N(x); this may for example be the nearest neighbors or all pixels up to a certain distance from x. The update rule for urn x is to sample from the combined urn of all elements of N(x) and add a constant number ∆ of balls of the sampled color to urn x. This may be done simultaneously for all x, sequentially, or by choosing x uniformly at random. After a long time, the process halts and the output configuration is chosen by taking the plurality color at each pixel. The mathematical analysis is incomplete, but experimental data shows that this procedure outperforms a popular relaxation labeling algorithm (the urn scheme is faster and provides better noise reduction).
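A minimal sketch of the scheme follows; the 5 × 5 grid, nearest-neighbor windows, ∆ = 1, three synchronous rounds, and the restriction of the initialization to nearest neighbors are all my own illustrative choices, not parameters from [BBA99].

```python
import random
from collections import Counter

def reconstruct(img, rounds=3, delta=1, rng=random.Random(0)):
    """Urn-network smoothing sketch: one urn per pixel, initialized
    from the colors of its nearest neighbors; each round every urn
    draws from the combined urns of its neighbors (synchronously) and
    adds delta balls of the sampled color."""
    h, w = len(img), len(img[0])
    def nbrs(x, y):
        return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < h and 0 <= y + dy < w]
    # initial urn contents: one ball per neighbor, of that neighbor's color
    urns = {(x, y): Counter(img[a][b] for a, b in nbrs(x, y))
            for x in range(h) for y in range(w)}
    for _ in range(rounds):
        new = {p: urn.copy() for p, urn in urns.items()}
        for x in range(h):
            for y in range(w):
                pool = Counter()
                for p in nbrs(x, y):
                    pool += urns[p]  # combined urn of the neighborhood
                colors = list(pool)
                c = rng.choices(colors, weights=[pool[c] for c in colors])[0]
                new[(x, y)][c] += delta
        urns = new
    # output: plurality color at each pixel
    return [[urns[(x, y)].most_common(1)[0][0] for y in range(w)]
            for x in range(h)]

# constant image with one flipped pixel at the center
img = [[0] * 5 for _ in range(5)]
img[2][2] = 1
out = reconstruct(img)
```

On a constant image with a single flipped pixel, the flipped pixel's urn is initialized entirely from its correctly colored neighbors, so the plurality vote restores it.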

Reinforced random walk
In 1987 [Dia88] (see also [CD87]), Diaconis introduced the following process, known now as edge-reinforced random walk or ERRW. A walker traverses the edges of a finite graph. Initially any edge incident to the present location is equally likely, but as the process continues, the likelihood for the walker to choose an edge increases with each traversal of that edge, remaining proportional to the weight of the edge, which is one more than the number of times the edge has been traversed in either direction.
Formally, let G := (V, E) be any finite graph and let v ∈ V be the starting vertex. Define $X_0 = v$ and W(e, 0) = 1 for all e ∈ E. Inductively define $\mathcal{F}_n := \sigma(X_0, \ldots, X_n)$ and $W(\{y, z\}, n) = W(\{y, z\}, n - 1) + \mathbf{1}(\{X_{n-1}, X_n\} = \{y, z\})$, and let $P(X_{n+1} = w \mid \mathcal{F}_n)$ be proportional to $W(\{X_n, w\}, n)$ if w is a neighbor of $X_n$ and zero otherwise. The main result of [CD87] is that ERRW is a mixture of Markov chains, and that the edge occupation vector converges to a random limit whose density may be explicitly identified.
Furthermore, the normalized weights approach a random limit w that is continuous with respect to Lebesgue measure on the simplex $\{w : w(e) \geq 0, \sum_e w(e) = 1\}$ of nonnegative numbers indexed by E and summing to 1. The density of the limit is given by a formula, (5.1), expressed in terms of the following data: w(v) denotes the sum of w(e) over edges e adjacent to v; d(v) is the degree of v; and A is the matrix indexed by cycles C forming a basis for the homology group $H_1(G)$, with $A(C, C) := \sum_{e \in C} 1/w(e)$ and $A(C, D) := \sum_{e \in C \cap D} \pm 1/w(e)$, the sign being positive if e has the same orientation in C and D and negative otherwise.
This result is proved by invoking a notion of partial exchangeability [dF38], shown by [DF80] to imply that a process is a mixture of Markov chains 11 . The formula (5.1) is then proved by a direct computation. The computation was never written down and remained unavailable until a more general proof was published by Keane and Rolles [KR99]. The definition extends easily to ERRW on the infinite lattice $\mathbb{Z}^d$, and Diaconis posed the question of recurrence: Question 5.1. Does ERRW on $\mathbb{Z}^d$ return to the origin with probability 1?
This question, still open, has provoked a substantial amount of study. Early results on ERRW and some of its generalizations are discussed in the next subsection; the following subsections concern two other variants: vertex-reinforced random walk and continuous time reinforced random walk on a graph. For further results on all sorts of ERRW models, the reader is referred to the short but friendly survey [MR06].
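For concreteness, the basic ERRW is straightforward to simulate (a sketch of mine; the triangle graph and the seed are just illustrative choices). Each traversal adds 1 to the weight of the traversed edge, so after n steps the total weight added over all edges is exactly n.

```python
import random

def errw(adj, start, steps, rng=random.Random(0)):
    """Edge-reinforced random walk on a finite graph.
    adj: dict mapping each vertex to its list of neighbors (undirected).
    Untraversed edges have weight 1; each traversal adds 1, and the
    next edge is chosen proportionally to current weights."""
    w = {}  # edge weights, keyed by frozenset({u, v}); default weight 1
    def weight(u, v):
        return w.get(frozenset((u, v)), 1)
    x = start
    path = [x]
    for _ in range(steps):
        nbrs = adj[x]
        y = rng.choices(nbrs, weights=[weight(x, z) for z in nbrs])[0]
        w[frozenset((x, y))] = weight(x, y) + 1  # reinforce traversed edge
        x = y
        path.append(x)
    return path, w

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # triangle graph
path, w = errw(adj, 0, 100)
```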

Edge-reinforced random walk on a tree
A preliminary observation is that ERRW on a directed graph may be represented by a network of Pólya urn processes. That is, suppose that P (X n+1 = w | F n ) is proportional to one plus the number of directed transits from X n to w. Then for each vertex v, the sequence of vertices visited after each visit to v is distributed exactly as a Pólya urn process whose initial composition is one ball of color w for each neighbor w of v; as v varies, these urns are independent. Formally, consider a collection of independent Pólya urns labeled by vertices v ∈ V , the contents of each of which are initially a single ball of color w for each neighbor w of v; let {X n,v : n = 1, 2, . . .} denote the sequence of draws from urn v; then we may couple an ERRW {X n } to the independent urns so that X n+1 = w ⇐⇒ X s,v = w, where s is the number of times v has been visited at time n.
For the usual undirected ERRW, no such simple representation is possible because the probabilities of successive transitions out of v are affected by which edges the path has taken coming into v. However, if G is a tree, then the first visit to any vertex $v \neq v_0$ must be along the unique edge incident to v leading toward $v_0$, and the (n + 1)st visit to v must be a reverse traversal of the edge by which the walk left v for the nth time. This observation, which is the basis for Lemma 2.4, was used by [Pem88a] to represent ERRW on an infinite tree by an infinite collection of independent urns. In this analysis, the reinforcement was generalized from 1 to an arbitrary constant c > 0. The urn process corresponding to a vertex $v \neq v_0$ has initial composition (1 + c, 1, . . . , 1), where the first component corresponds to the color of the parent of v, and reinforcement 2c each time. Recalling from (4.2) that such an urn is exchangeable with limit distribution that is Dirichlet with parameters $((1 + c)/(2c), 1/(2c), \ldots, 1/(2c))$, one has a representation of ERRW on a tree by a mixture of Markov chains whose transition probabilities out of each vertex are given by picks from the specified independent Dirichlet distributions. This leads to a phase transition result (see also extensions by Collevecchio [Col04; Col06a; Col06b]).

Other edge-reinforcement schemes
The reinforcement scheme may be generalized in several ways. Suppose the transition probabilities out of X n at step n are proportional not to the weights w({X n , w}, n) incident to X n at time n but instead to F (w({X n , w}, n)) where F : Z + → R + is any nondecreasing function. Letting a n := F (n) − F (n − 1), one might alternatively imagine that the reinforcement is a n on the n th time an edge is crossed (see the paragraph in Section 3.2 on ordinal dependence). Davis [Dav99] calls this a reinforced random walk of sequence type. A special case of this is when a 1 = δ and a n = 0 for n ≥ 2. This is called once-reinforced random walk for the obvious reason that the reinforcement occurs only once, and its invention is usually attributed to M. Keane. More generally, one might take the sequence to be different for every edge, that is, for each edge e there is a nondecreasing function F e : Z + → R + and P(X n+1 = w | F n ) is proportional to F e (w(e, n)) with e = {X n , w}.
It is easy to see that for random walks of sequence type on any graph, if $\sum_{n=1}^{\infty} 1/F(n) < \infty$ then with positive probability the sequence of choices out of a given edge will fixate. This extends to a dichotomy for sequence-type ERRW on $\mathbb{Z}$ [Dav90]: if $\sum_n 1/F(n) < \infty$ then the walk is almost surely eventually trapped on a single edge, while if $\sum_n 1/F(n) = \infty$ the walk is recurrent. Proof: Assume first that $\sum_{n=1}^{\infty} 1/F(n) < \infty$. To see that $\sup_n X_n < \infty$ with probability 1, it suffices to observe that for each k, conditional on ever reaching k, the probability that $\sup_n X_n = k$ is bounded below by $\prod_{n=1}^{\infty} F(n)/(1 + F(n))$, which is nonzero. The same holds for $\inf_n X_n$, implying finite range almost surely. To improve this to almost sure fixation on a single edge, Davis applies Herman Rubin's Theorem (Theorem 3.6) to show that the sequence of choices from each vertex eventually fixates. Conversely, if $\sum_{n=1}^{\infty} 1/F(n)$ is infinite, then each choice is made infinitely often from each vertex, immediately implying either recurrence or convergence to ±∞. The latter is ruled out by means of an argument based on the fact that the sum $M_n := \sum_{k=1}^{X_n} 1/F(w(\{k-1, k\}, n))$ of the inverse weights up to the present location is a supermartingale [Dav90, Lemma 3.0]. Remark 5.4. The most general ERRW considered in the literature appears in [Dav90]. There, the weights {w(e, n)} are arbitrary random variables subject to w(e, n) being $\mathcal{F}_n$-measurable and w(e, n + 1) ≥ w(e, n), with equality unless e = {X_n, X_{n+1}}. The initial weights may be arbitrary as well, with the term initially fair used to denote all initial weights equal to 1. At this level of generality there is no exchangeability, and the chief techniques are based on martingales. Lemma 3.0 of [Dav90], used to rule out convergence to ±∞, is in fact proved in the context of such a general, initially fair ERRW.
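The role of the product $\prod_n F(n)/(1 + F(n))$ in the proof sketch can be seen numerically: the partial products stay bounded away from zero exactly when $\sum_n 1/F(n)$ converges. A quick check of mine, with the illustrative choices F(n) = n versus F(n) = n²:

```python
def trap_product(F, N):
    """Partial product prod_{n=1}^N F(n)/(1+F(n)); its infinite limit
    is nonzero iff sum 1/F(n) < infinity, and lower-bounds the
    probability of fixating on one edge in that case."""
    p = 1.0
    for n in range(1, N + 1):
        p *= F(n) / (1 + F(n))
    return p

p_lin = trap_product(lambda n: n, 1000)       # sum 1/n diverges: product -> 0
p_quad = trap_product(lambda n: n * n, 1000)  # sum 1/n^2 converges: product stays positive
```

For F(n) = n the product telescopes to 1/(N + 1), vanishing in the limit, while for F(n) = n² the partial products converge to a positive constant.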
When the graph is not a tree, many of the arguments become more difficult. Sellke [Sel94] extended the martingale technique to sequence-type ERRW on the $d$-dimensional integer lattice. Because of the bipartite nature of the graph, one must consider separately the sums $\sum_{n=1}^\infty 1/F(2n)$ and $\sum_{n=1}^\infty 1/F(2n+1)$. For convenience, let us assume these two both converge or both diverge. The following dichotomy then holds ([Sel94, Theorems 1-3]): if $\sum_{n=1}^\infty 1/F(n) < \infty$ then with probability one the process is eventually trapped on a single edge; if $\sum_{n=1}^\infty 1/F(n) = \infty$, then with probability one the range is infinite and each coordinate is zero infinitely often.
The proofs are idiosyncratic, based on martingales and Rubin's construction. It is noted that (i) the conclusion in the case $\sum_{n=1}^\infty 1/F(n) = \infty$ falls short of recurrence; and (ii) the conclusion of almost sure trapping in the opposite case is specific to bipartite graphs, with the argument generalizing neither to the triangular lattice, nor even to a single triangle! This situation was not remedied until Limic [Lim03] proved that for ERRW on a triangle, when $F(n) = n^\rho$ for $\rho > 1$, the walk is eventually trapped on a single edge. This was generalized in [LT06] to handle any $F$ with $\sum_{n=1}^\infty 1/F(n) < \infty$. Because of the difficulty of proving results for sequence-type ERRW, it was thought that the special case of once-reinforced random walk might be a more tractable place to begin. Even here, no one has settled the question of recurrence versus transience for the two-dimensional integer lattice. The answer is known for a tree. In contrast to the phase transition for ordinary ERRW on a tree, a once-reinforced ERRW is transient for every $\delta > 0$ (in fact the same is true when "once" is replaced by "$k$ times"). This was proved for regular trees in [DKL02] and extended to Galton-Watson trees in [Die05].
The only other graph for which I am aware of an analysis of once-reinforced ERRW is the ladder. Let $G$ be the product of $\mathbb{Z}$ with $K_2$ (the unique connected two-vertex graph); the vertices are $\mathbb{Z} \times \{0, 1\}$ and the edges connect neighbors in $\mathbb{Z}$ with the same $K_2$-coordinate, or two vertices with the same $\mathbb{Z}$-coordinate. The following recurrence result was first proved by T. Sellke in 1993, in the more general context allowing arbitrary vertical movement (cf. [MR05]). These results are too recent to have been included in this survey.

Vertex-reinforced random walk
Recall that the vertex-reinforced random walk (VRRW) is defined analogously to the ERRW except that in the equation (2.1) for choosing the next step, the edge occupation counts (2.2) are replaced by the vertex occupation counts (2.4).
This leads to entirely different behavior. Partial exchangeability is lost, so there is no representation as a random walk in a random environment. There are no obvious embedded urns. Moreover, an arbitrary occupation vector is unlikely to be evolutionarily stable. That is, suppose that for some large $n$, the normalized occupation vector $X_n$, whose components are the proportion of the time spent at each vertex, is equal to a vector $x$. Let $\pi_x$ denote the stationary measure for the Markov chain with transition probabilities $p(y, z) = x_z / \sum_{z' \sim y} x_{z'}$, which moves proportionally to the coordinate of $x$ corresponding to the destination vertex. For $1 \ll k \ll n$, $X_{n+k} = (1 + o(1)) X_n$, so the proportion of the time in $[n, n+k]$ that the walk spends at vertex $y$ will be proportional to $\pi_x(y)$. It is easy to see from this that $\{X_n\}$ obeys a stochastic approximation equation (2.6). The analysis from here depends on the nature of the graph. The methods of Section 2.5 show that $X_n$ is an asymptotic pseudotrajectory for the flow $dX/dt = F(X)$, converging to an equilibrium point or orbit. There is always a Lyapunov function $V(x) := x^T A x$, where $A$ is the adjacency matrix of the underlying graph $G$. Therefore equilibrium sets are sets of constancy for $V$, and any equilibrium point $p$ is a critical point for $V$ restricted to the face of the $(d-1)$-simplex containing $p$. Any attractor for the flow appears as a limit with positive probability, while linearly unstable orbits occur with probability zero. Several examples are given in [Pem88b].
Example 5.2. Let G be a cycle of d nodes for d ≥ 5 (the smaller cases turn out to behave differently). The centroid (1/d, . . . , 1/d) is still an isolated equilibrium but for d ≥ 5, it is linearly unstable. Although it was only guessed at the time this example appeared in [Pem88b], it follows from the nonconvergence theorems of [Pem90a; BH95] that the probability of convergence to the centroid is zero. The other equilibria are cyclic permutations of the points (a, 1/2, 1/2 − a, 0, . . . , 0) and certain convex combinations of these. It was conjectured in [Pem88b] and corroborated by simulation that the extreme points, namely cyclic permutations of (a, 1/2, 1/2 − a, 0, . . . , 0), were the only possible limits.
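The predicted localization is easy to watch in simulation. The following sketch is my own and purely illustrative: VRRW on a $d$-cycle, stepping to a neighbor with probability proportional to one plus that neighbor's visit count.

```python
import random

def vrrw_cycle(d, steps, seed=0):
    # VRRW on a d-cycle: from v, step to neighbor w with probability
    # proportional to 1 + visits(w).
    rng = random.Random(seed)
    visits = [0] * d
    v = 0
    visits[v] = 1
    for _ in range(steps):
        left, right = (v - 1) % d, (v + 1) % d
        wl, wr = 1 + visits[left], 1 + visits[right]
        v = right if rng.random() < wr / (wl + wr) else left
        visits[v] += 1
    return visits

occ = vrrw_cycle(7, 50000, seed=3)
shares = sorted((x / sum(occ) for x in occ), reverse=True)
```

With the conjectured limits, one expects the three largest occupation shares to account for nearly all of the time, approximating the pattern $(a, 1/2, 1/2 - a)$.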
Taking $d \to \infty$ in the last example results in VRRW on the one-dimensional integer lattice. The analogous conjecture is that the occupation measure for VRRW on $\mathbb{Z}$ converges to a translation of $(\ldots, 0, 0, a, 1/2, 1/2 - a, 0, 0, \ldots)$. This was largely confirmed in [PV99] for the one-dimensional lattice by proving almost sure trapping on an interval of exactly five vertices, with the conjectured power laws.

Slime mold
A mechanism by which simple organisms move in purposeful directions is called taxis. The organism requires a signal to govern such motion, which is usually something present in the environment such as sunlight, chemical gradient or particles of food. Othmer and Stevens [OS97] consider instances in which the organism's response modifies the signal. In particular, Othmer and Stevens study myxobacteria: organisms which produce slime, over which it is then easier for bacteria to travel in the future. Aware of the work of Davis on ERRW [Dav90], they propose a stochastic cellular automaton to model the propagation of one or more bacteria. One of their goals is to determine what features of a model lead to stable aggregation of organisms; apparently previous such models have led to aggregates forming but then disbanding.
In the Othmer-Stevens model, the build-up of slime at the intersection points of the integer lattice is modeled by postulating that the likelihood for the organism to navigate to a given next vertex is one plus the number of previous visits to that site (by any organism). With one organism, this is just a VRRW, which they call a "simplified Davis' model", the simplification being to go from ERRW to VRRW. They allow a variable weight function $W(n) = \sum_{k=1}^n a_k$. On page 1047, Othmer and Stevens describe results from simulations of the "simplified" VRRW for a single particle. Their analysis of the simulations may be paraphrased as follows.
If F (n) grows exponentially, the particle ultimately oscillates between two points. If F grows linearly with a small growth rate, the particle does not stay in a fixed finite region. These two results agree with the theoretical result, which is proven, however, only in one dimension. If the growth is linear with a large growth rate, results of the simulation are "no longer comparable to the theoretical prediction" but this is because the time for a particle to leave a fixed finite region increases with the growth rate of F .
Given what we know about VRRW, we can give a different interpretation of the simulation data. We know that VRRW, unlike ERRW, fixates on a finite set. The results of [Vol01] imply that for $\mathbb{Z}^2$ the fixation set has positive probability both of being a 4-cycle and of being a plus sign (a vertex and its four neighbors). All of this is independent of the linear growth rate. Therefore, the simulations with large growth rates do agree with theory: the particle is being trapped, rather than exiting too slowly to observe. On the other hand, for small values of the linear reinforcement parameter, the particle must also be trapped in the end, and in this case it is the trapping that occurs too slowly to observe. The power laws in [Vol01, Corollary 1] and part (iii) of Theorem 5.7 give an indication of why the trapping may occur too slowly to observe.
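The localization statements can likewise be probed by simulation. Here is a sketch of mine (not the Othmer-Stevens code) of VRRW on $\mathbb{Z}^2$ with linear weights, recording the occupation measure and hence the set of visited sites:

```python
import random

def vrrw_z2(steps, seed=0):
    # VRRW on the two-dimensional integer lattice: from x, move to a
    # neighbor y with probability proportional to 1 + visits(y).
    rng = random.Random(seed)
    visits = {(0, 0): 1}
    x = (0, 0)
    for _ in range(steps):
        nbrs = [(x[0] + 1, x[1]), (x[0] - 1, x[1]),
                (x[0], x[1] + 1), (x[0], x[1] - 1)]
        w = [1 + visits.get(y, 0) for y in nbrs]
        r = rng.random() * sum(w)
        for y, wy in zip(nbrs, w):
            r -= wy
            if r < 0:
                x = y
                break
        visits[x] = visits.get(x, 0) + 1
    return visits

occ = vrrw_z2(20000, seed=2)
```

In runs like this the visited set typically stays far smaller than the step count, consistent with trapping rather than slow escape.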
Othmer and Stevens are ultimately concerned with the behavior of large collections of myxobacteria performing a simultaneous VRRW (each particle at each time step chooses its next step independently, with probabilities proportional to the total reinforcement due to any particle's visits to the destination site). They make the assumption that the system may be described by differential equations corresponding to the mean-field limit of the system, where the state is described by a density over $\mathbb{R}^2$. They then give a rigorous analysis of the mean-field differential equations, presumably related to scaling limits of ERRW. The mean-field assumption takes us out of the realm of rigorous mathematics, so we will leave Othmer and Stevens here; but in the end they are able to argue that stable aggregation may be brought about by the purely local mechanisms of reinforced random walk.

A continuous-time reinforced jump process
The next section treats a number of continuous-time models. I include the vertex-reinforced jump process in this section because it is a process on discrete space which does not involve a scaling limit, and it seems similar to the other models in this section.
The vertex-reinforced jump process (VRJP) is a continuous-time process on the one-dimensional lattice. From any site $x$, at time $t$, it jumps to each nearest neighbor $y$ at rate equal to one plus the amount of time, $L(y, t)$, that the process has spent at $y$. On a state space that keeps track of occupation measure as well as position, it is Markovian. The process is defined and constructed in [DV02] and attributed to W. Werner; because the jump rate at time $t$ is bounded by $2 + t$, the definition is completely routine. We may obtain a one-parameter family of reinforcement strengths by jumping at rate $C + L(y, t)$ instead of $1 + L(y, t)$.
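Because only the occupation at the current site grows while the process sits there, the competing jump rates are constant on each holding interval, so the process can be simulated with ordinary exponential clocks. A sketch of my own, on $\mathbb{Z}$ with $C = 1$:

```python
import random

def vrjp_line(T, seed=0):
    # Vertex-reinforced jump process on Z, run until time T: from x,
    # jump to neighbor y at rate 1 + L[y], where L[y] is the time spent
    # at y so far.  While sitting at x only L[x] grows, so the two
    # competing rates are constant during each holding interval.
    rng = random.Random(seed)
    L = {}
    x, t = 0, 0.0
    while t < T:
        waits = {y: rng.expovariate(1.0 + L.get(y, 0.0))
                 for y in (x - 1, x + 1)}
        y = min(waits, key=waits.get)
        if t + waits[y] >= T:            # truncate at the horizon T
            L[x] = L.get(x, 0.0) + (T - t)
            break
        L[x] = L.get(x, 0.0) + waits[y]
        t += waits[y]
        x = y
    return L

L = vrjp_line(200.0, seed=5)
```

The occupation measure `L` is the object that the martingale argument of [DV02] controls: ratios at neighboring sites converge.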
The VRJP is a natural continuous-time analogue of VRRW. An alternative analogue would have kept the total jump rate out of $x$ equal to 1, with the chance of a jump to $y = x \pm 1$ remaining proportional to the occupation measure at $y$. In fact the choice of variable jump rates decouples jumps to the left from jumps to the right, making the process more tractable. For any two consecutive sites $a$ and $a + 1$, let $m(t)$ denote the occupation measure of $a + 1$ the first time the occupation measure of $a$ is $t$. Then $m(t)/t$ is a martingale [DV02, Corollary 2.3], which implies convergence of the ratio of occupation measures at $a + 1$ and $a$. Together with some computations, this leads to an exact characterization of the limiting (unscaled) normalized occupation measure.
The limiting ratios $\{Y_n : n \in \mathbb{Z}\}$ of occupation measures at consecutive sites have an explicit joint distribution. The process may be defined on any locally finite graph, and limiting ratios of the occupation measure at neighboring vertices have the same description. On an infinite regular tree, this leads, as in [Pem88a], to a transition between recurrence and transience depending on the reinforcement parameter $C$; see [DV04].

Continuous processes, limiting processes, and negative reinforcement
In this section we will consider continuous processes with reinforcement. Especially when these are diffusions, they might be termed "reinforced Brownian motion". Some of these arise as scaling limits of reinforced random walks, while others are defined directly. We then consider some random walks with negative reinforcement. The most extreme example is the self-avoiding random walk, which is barred from going where it has gone before. Limits of self-avoiding walks turn out to be particularly nice continuous processes.

Reinforced diffusions
Random walk perturbed at its extrema
Recall the once-reinforced random walk of Section 5.2. This is a sequence-type ERRW with $F(0) = 1$ and $F(n) = 1 + \delta$ for all $n \geq 1$. The transition probabilities for this walk may be phrased as $P(k, k+1) = 1/2$ unless $\{X_n\}$ is at its maximum or minimum value, in which case $P(k, k+1) = 1/(2 + \delta)$ or $(1 + \delta)/(2 + \delta)$ respectively. If such a process has a scaling limit, the limiting process would evolve as a Brownian motion away from its left-to-right maxima and minima, plus some kind of drift inwards when it is at a left-to-right extremum. This inward drift might come from a local time process, but constructions depending on local time processes involve considerable technical difficulty (see, e.g., [TW98]). An alternate approach is an implicit definition that makes use of the maximum or minimum process, recalling the way a reflecting Brownian motion may be constructed as the difference, $\{B_t - B^\#_t\}$, between a Brownian motion and its minimum process.
Let $\alpha, \beta \in (-\infty, 1)$ be fixed, let $g^*(t) := \sup_{0 \le s \le t} g(s)$ denote the maximum process of a function $g$, and let $g^\#(t) := \inf_{0 \le s \le t} g(s)$ denote its minimum process. Carmona, Petit and Yor [CPY98] examine the equation
$$g(t) = f(t) + \alpha\, g^*(t) + \beta\, g^\#(t). \qquad (6.1)$$
They show that if $f$ is any continuous function vanishing at 0, then there is a unique solution $g(t)$ to (6.1), provided that $\rho := |\alpha\beta / ((1 - \alpha)(1 - \beta))| < 1$. If $f$ is the sample path of a Brownian motion, then results of [CPY98] imply that the solution $Y_t := g(t)$ to (6.1) is adapted to the Brownian filtration. It is a logical candidate for a "Brownian motion perturbed at its extrema". In 1996, Burgess Davis [Dav96] showed that the Carmona-Petit-Yor process is in fact the scaling limit of the once-reinforced random walk. His argument is based on the property that the map taking $f$ to $g$ in (6.1) is Lipschitz: $\|g_1 - g_2\|_\infty \le C \|f_1 - f_2\|_\infty$. The precise statement is as follows. Let $\alpha = \beta = -\delta$. The process $g$ is then well defined, since $\rho = |\delta/(1 + \delta)|^2 < 1$.
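When $|\alpha| + |\beta| < 1$, the map $g \mapsto f + \alpha g^* + \beta g^\#$ is a plain sup-norm contraction, so the solution of (6.1) can be found by straightforward Picard iteration. This discretized sketch (my own; it uses that crude regime rather than the sharper condition $\rho < 1$ of [CPY98]) feeds in a simple random walk path in place of the Brownian sample path:

```python
import random

def running_extrema(g):
    # Running max and min processes g*(t), g#(t) of a discrete path.
    mx, mn = [], []
    cur_mx = cur_mn = g[0]
    for v in g:
        cur_mx = max(cur_mx, v)
        cur_mn = min(cur_mn, v)
        mx.append(cur_mx)
        mn.append(cur_mn)
    return mx, mn

def perturbed_path(f, alpha, beta, iters=200):
    # Picard iteration for the discretized equation
    #   g = f + alpha * g_max + beta * g_min,
    # a sup-norm contraction when |alpha| + |beta| < 1.
    g = list(f)
    for _ in range(iters):
        mx, mn = running_extrema(g)
        g = [fv + alpha * M + beta * m for fv, M, m in zip(f, mx, mn)]
    return g

# A simple random walk path stands in for the Brownian sample path.
rng = random.Random(7)
f = [0.0]
for _ in range(500):
    f.append(f[-1] + rng.choice((-1.0, 1.0)))

delta = 0.3
g = perturbed_path(f, -delta, -delta)
mx, mn = running_extrema(g)
res = max(abs(gv - (fv - delta * M - delta * m))
          for gv, fv, M, m in zip(g, f, mx, mn))
```

After 200 iterations the residual `res` of equation (6.1) is at the level of floating-point noise, reflecting the geometric convergence of the contraction.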

Drift as a function of occupation measure
Suppose one wishes to formulate a diffusion that behaves like a Brownian motion, pushed by a drift that depends in some natural way on the past, say through the occupation measure. There are a multitude of ways to do this. One way, suggested by Durrett and Rogers [DR92], is to choose a function $f$ and let the drift of the diffusion $\{X_t\}$ at time $t$ be given by $\int_0^t f(X_t - X_s)\,ds$. If $f$ is Lipschitz then there is no trouble in showing that the equation has a pathwise unique strong solution.
Durrett and Rogers were the first to prove anything about such a process, but they could not prove much. When $f$ has compact support, they proved that in any dimension there is a nonrandom bound $\limsup_{t \to \infty} |X_t|/t \le C$ almost surely, and that in one dimension, when $f \ge 0$ and $f(0) > 0$, $X_t/t \to \mu$ almost surely for some nonrandom $\mu$. The condition $f \ge 0$ was weakened by [CM96] to be required only in a neighborhood of 0. Among Durrett and Rogers' conjectures is that if $f$ is a compactly supported odd function with $x f(x) \ge 0$, then $X_t/t \to 0$ almost surely. Their reasoning is that the process should behave roughly like a negatively once-reinforced random walk. It sees only the occupation in an interval, say $[X_t - 1, X_t + 1]$, drifting linearly to the right for a while due to the imbalance in the occupation, until diffusive fluctuations cause it to go to the left of its maximum. It should then get pushed to the left at a roughly linear rate until it suffers another reversal. They were not able to make this rigorous. However, taking the support to zero while maintaining $\int_{-\infty}^0 f(x)\,dx = c$ gives a very interesting process about which Tóth and Werner were able to obtain results (see Section 6.3 below).
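A crude Euler scheme (my own sketch; quadratic cost in the number of steps) makes the self-interacting diffusion easy to visualize. With $f(x) = -ax$, the drift $\int_0^t f(X_t - X_s)\,ds$ is a restoring force of strength growing like $at$ toward the running mean, and the path freezes:

```python
import math
import random

def self_interacting_diffusion(f, steps=2000, dt=0.01, seed=1):
    # Euler-Maruyama sketch of dX_t = (int_0^t f(X_t - X_s) ds) dt + dB_t,
    # with the drift integral approximated by a Riemann sum over the
    # discretized past path.  Illustration only.
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(steps):
        x = path[-1]
        drift = sum(f(x - y) for y in path) * dt
        path.append(x + drift * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path

# The Cranston-Le Jan linear case: restoring force toward the running mean.
path = self_interacting_diffusion(lambda x: -2.0 * x)
tail_spread = max(path[-200:]) - min(path[-200:])
```

By the end of the run, the path barely moves: the effective reversion rate has grown to order $at$, so fluctuations of the tail segment are far smaller than those of a free Brownian motion over the same time window.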
Cranston and Le Jan [CLJ95] take up this model in two special cases. When $f(x) = -ax$ with $a > 0$, there is a restoring force equal to the total moment of the occupation measure about the present location. The restoring force increases without bound, so it may not be too surprising that $X_t$ converges almost surely to the mean of the limiting occupation measure.
Theorem 6.2 ([CLJ95, Theorem 1]). Let $a > 0$ and set $f(x) = -ax$. Then there is a random variable $X_\infty$ such that $X_t \to X_\infty$ almost surely and in $L^2$.
Proof: This may be derived from the fact that the stochastic differential equation has the (unique) strong solution with $h(t, s) = 1 - a s\, e^{a s^2/2} \int_s^t e^{-a u^2/2}\,du$. The other case they consider is $f(x) = -a\,\mathrm{sgn}(x)$. It is not hard to show existence and uniqueness despite the discontinuity. This time the restoring force is toward the median rather than the mean, but otherwise the same result, $X_t \to X_\infty$, should hold and does [CLJ95, Theorem 2]. This was extended to higher dimensions by the following theorem of Raimond.
Drift as a function of normalized occupation measure
The above diffusions have drift terms that are additive functionals of the full occupation measure $\mu_t := \mu_0 + \int_0^t \delta_{X_s}\,ds$. The papers that analyze this kind of diffusion are [DR92; CLJ95; Rai97; CM96]; see also [NRW87]. In a series of papers [BLR02; BR02; BR03; BR05], Benaïm and Raimond (and sometimes Ledoux) consider diffusions whose drift is a function of the normalized occupation measure $\pi_t := t^{-1}\mu_t$. Arguably, this is closer in spirit to the reinforced random walk. Another difference in the direction taken by Benaïm and Raimond is that their state space is a compact manifold without boundary. This sets it apart from continuum limits of reinforced random walks on $\mathbb{Z}^d$ (not compact) or limits of urn processes on the $(d-1)$-simplex (which has a boundary).
The theory is a vast extension of the dynamical system framework discussed in Section 2.5. To define the object of study, let $M$ be a compact Riemannian manifold. There is a Riemannian probability measure, which we call simply $dx$, and a standard Brownian motion defined on $M$, which we call $B_t$. Let $V : M \times M \to \mathbb{R}$ be a smooth "potential" function and define the function $V\mu$ by $V\mu(y) = \int V(x, y)\,d\mu(x)$.
The additive functional of normalized occupation measure is always taken to be $V\pi_t = t^{-1} V\mu_t$; thus the drift at time $t$ should be $-\nabla(V\pi_t)$. Since $V\mu(\cdot) = \int V(x, \cdot)\,d\mu(x)$, we may write the stochastic differential equation as in [BLR02]. A preliminary step establishes the existence of the process $\{X_t\}$ from any starting point, including the possibility of an arbitrary starting occupation measure [BLR02, Proposition 2.5]. Simultaneously, this defines the occupation measure process $\{\mu_t\}$ and the normalized occupation measure process $\{\pi_t\}$. When $t$ is large, $\pi_{t+s}$ will remain near $\pi_t$ for a while. As in the dynamical system and stochastic approximation framework, the next step is to investigate what happens if one fixes the drift for times $t + s$ at $-\nabla(V\pi_t)$. A diffusion on $M$ with drift $-\nabla f$ has an invariant measure whose density may be described explicitly. This leads us to define a function $\Pi$ that associates with each measure $\mu$ the density of the stationary measure for Brownian motion with potential function $V\mu$. The process $\{\pi_t\}$ evolves stochastically. Taking a cue from the framework of Section 2.5, we compute the deterministic equation of mean flow. If $t \gg \delta t \gg 1$, then $\pi_{t + \delta t}$ should be approximately $\pi_t + \frac{\delta t}{t}(\Pi(\pi_t) - \pi_t)$. Thus we are led to define a vector field on the space of measures on $M$ by $F(\mu) := \Pi(\mu) - \mu$. A second preliminary step, carried out in [BLR02, Lemma 3.1], is that this vector field is smooth and induces a flow $\Phi_t$ on the space $\mathcal{P}(M)$ of probability measures on $M$. As with stochastic approximation processes, one expects the trajectories of the stochastic process $\pi_t$ to approximate trajectories of $\Phi_t$: convergence of $\pi_t$ to fixed points or closed orbits of the flow, positive probability of convergence to isolated sinks, and zero probability of convergence to unstable equilibria.
A good part of the work accomplished in the sequence of papers [BLR02; BR02; BR03; BR05] is to extend results on asymptotic pseudotrajectories in $\mathbb{R}^d$ to prove these convergence and nonconvergence results in the space of measures on $M$. One link in the chain that does not need to be extended is Theorem 2.14 (asymptotic pseudotrajectories have chain transitive limits), which is already valid in a general metric space. Benaïm et al. then go on to prove the following results. The proof of the first is quite technical and occupies Section 5 of [BLR02].
Corollary 6.5. The limit set of {π t } is almost surely an invariant chainrecurrent set containing no proper attractor.
Theorem 6.6 ([BR05, Theorem 2.4]). Suppose that V is symmetric, that is, V (x, y) = V (y, x). With probability 1, the limit set of the process {π t } is a compact connected subset of the fixed points of Π (that is, the zero set of F ).
Proof: Define the free energy $J(f)$ of a strictly positive $f \in L^2(dx)$ in terms of the potential $Vf(y) = \int V(x, y) f(x)\,dx$ and the inner product $\langle f, g \rangle = \int f(x) g(x)\,dx$. Next, verify that $J$ is a Lyapunov function for the flow $\Phi_t$ ([BR05, Proposition 4.1]) and that $F(\mu) = 0$ if and only if $\mu$ has a density $f$ and $f$ is a critical point for the free energy, i.e., $\nabla J(f) = 0$ ([BR05, Proposition 2.9]). The result then follows, with a little work, from Theorem 6.4 and the general result that the limit set of an asymptotic pseudotrajectory is chain transitive.
Corollary 6.7 ([BR05, Corollary 2.5]). If, in addition, the zero set of F contains only isolated points then π t converges almost surely.
The next two results are proved in a similar manner to the proofs of the convergence and nonconvergence results Theorem 2.16 and Theorem 2.9, though some additional infrastructure must be built in the infinite-dimensional case.
For the nonconvergence results, as well as for criteria guaranteeing the existence of a sink, the following definition is very useful.
While the assumption of a symmetric Mercer kernel may appear restrictive, it is shown in [BR05, Examples 2.14-2.20] that many classes of kernels satisfy it, including the transition kernel of any reversible Markov semigroup, any even function of $x - y$ on the torus $T^n$ with nonnegative Fourier coefficients, any completely monotonic function of $\|x - y\|^2$ for a manifold embedded in $\mathbb{R}^n$, and any $V$ represented as $V(x, y) = \int_E G(\alpha, x) G(\alpha, y)\,d\nu(\alpha)$ for some space $E$ and measure $\nu$ (this last class is in fact dense in the set of Mercer kernels). The most important fact about Mercer kernels is that they make the free energy strictly convex.
Lemma 6.10 ([BR05, Theorem 2.13]). If V is Mercer then J is strictly convex, hence has a unique critical point f which is a global minimum.
Corollary 6.11. If V is Mercer then the process π t converges almost surely to the measure f dx where f minimizes the free energy, J.
Proof of lemma: The second derivative $D^2 J$ is easily computed [BR05, Proposition 2.9] to be $D^2 J(f)(g, g) = \langle Vg, g \rangle + \int g^2/f\,dx$. The second term is always positive definite, while the first is nonnegative definite by hypothesis.
The only nonconvergence result they prove requires a hypothesis involving Mercer kernels.
Theorem 6.12 ([BR05, Theorem 2.26]). If $\pi^*$ is a quadratically nondegenerate zero of $F$ with at least one positive eigenvalue, and if $V$ is the difference of Mercer kernels, then $P(\pi_t \to \pi^*) = 0$.
A number of examples are given, but perhaps the most interesting is one where $V$ is not symmetric. It is possible that there is no Lyapunov function and that the limit set of $\pi_t$, which must be an asymptotic pseudotrajectory, may be a nontrivial orbit. In this case, one expects that $\mu_t$ should precess along the orbit at logarithmic speed, due to the factor of $1/t$ in the mean differential equation $d\pi_t/dt = (1/t) F(\pi_t)$.
When $\phi$ is not 0 or $\pi$, this kernel fails to be symmetric. A detailed trigonometric analysis shows that when $c \cos\phi \ge -1/2$, the unique invariant set for $\Phi_t$ is Lebesgue measure, $dx$, and hence $\pi_t \to dx$ almost surely.
Suppose now that $c \cos\phi < -1/2$. If $\phi = 0$ then the critical points of the free energy function are a one-parameter family of zeros of $F$ with densities $g_\theta := c_1(c)\, e^{c_2(c) \cos(x - \theta)}$. It is shown in [BLR02, Theorem 1.1] that $\pi_t \to g_Z$ almost surely, where $Z$ is a random variable. The same holds when $\phi = \pi$.
When $\phi \neq 0, \pi$, things are the most interesting. The forward limit set for $\{\pi_t\}$ under $\Phi$ consists of the unstable equilibrium point $dx$ (Lebesgue measure) together with a periodic orbit $\{\rho_\theta : \theta \in S^1\}$, obtained by averaging $g_\theta$ while moving with logarithmic speed. To rule out the point $dx$ as a limit for the stochastic process (6.2) would appear to require generalizing the nonconvergence result Theorem 2.17 to the infinite-dimensional setting. It turns out, however, that the finite-dimensional projection $\mu \mapsto \int_{S^1} x\,d\mu$ maps the process to a stochastic approximation process in the unit disk; that is, the evolution of $\int_{S^1} x\,d\mu$ depends on $\mu$ only through $\int_{S^1} x\,d\mu$. For the projected process, 0 is an unstable equilibrium, whence $dx$ is almost surely not a limit point of $\{\pi_t\}$. By Corollary 6.5, the limit set of the process is the periodic orbit. In fact there is a random variable $Z \in S^1$ describing the phase along the orbit; this precise result relies on shadowing theorems such as [Ben99, Theorem 8.9].

Self-avoiding walks
A path of finite length on the integer lattice is said to be self-avoiding if its vertices are distinct. Such paths have been studied in the context of polymer chemistry beginning with [Flo49], where nonrigorous arguments were given to show that the diameter of a polymer chain of length $n$ in three-space should be of order $n^\nu$ for some $\nu$ greater than the value $1/2$ predicted by a simple random walk model. Let $\Omega_n$ denote the set of self-avoiding paths in $\mathbb{Z}^d$ of length $n$ starting from the origin. Surprisingly, good estimates on the number of such paths are still not known. Hammersley and Morton [HM54] observed that $|\Omega_n|$ is sub-multiplicative: concatenation is a bijection between $\Omega_j \times \Omega_k$ and a set containing $\Omega_{j+k}$, whence $|\Omega_{j+k}| \le |\Omega_j| \cdot |\Omega_k|$. It follows that $|\Omega_n|^{1/n}$ converges to $\inf_k |\Omega_k|^{1/k}$. The connective constant $\mu = \mu_d$, defined to be the value of this limit in $\mathbb{Z}^d$, is not known, though rigorous estimates place $\mu_2 \in [2.62, 2.70]$ and nonrigorous estimates claim great precision. It is not known, though widely believed, that in any dimension $|\Omega_{n+1}|/|\Omega_n| \to \mu_d$; Kesten [Kes63] did show that $|\Omega_{n+2}|/|\Omega_n| \to \mu_d^2$.
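The submultiplicativity argument is easy to check against exact counts for small $n$. This brute-force enumeration (exponential time, so small $n$ only) is my own illustration:

```python
def count_saws(n):
    # Count self-avoiding paths of length n from the origin in Z^2 by
    # depth-first enumeration.
    def extend(x, y, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(x + dx, y + dy, visited, remaining - 1)
                visited.remove(nxt)
        return total
    return extend(0, 0, {(0, 0)}, n)

counts = [count_saws(n) for n in range(1, 9)]   # |Omega_1| .. |Omega_8|
```

The counts begin 4, 12, 36, 100, and one checks $|\Omega_8| \le |\Omega_4|^2$ as submultiplicativity requires; already $|\Omega_8|^{1/8} \approx 2.96$ is within shouting distance of $\mu_2 \approx 2.64$.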
Let U n denote the uniform measure on Ω n . Given that the cardinality of Ω n is poorly understood, it is not surprising that U n is also poorly understood. In dimensions five and higher, a substantial body of work by Hara and Slade has established the convergence under rescaling of U n to Brownian motion, convergence of |Ω n+1 |/|Ω n |, and values of several exponents and constants. Their technique is to use asymptotic expansions known as lace expansions, based on numbers of various sub-configurations in the path. See [MS93] for a comprehensive account of work up to 1993 or Slade's piece in the Mathematical Intelligencer [Sla94] for a nontechnical overview.
In dimensions 2, 3 and 4, very little is rigorously known. Nevertheless, there are many conjectures, such as the existence and supposed values of the diffusion exponent $\nu = \nu_d$, for which the $U_n$-expected square distance between the endpoints of the path (usually denoted $R_n^2$) is of order $n^{2\nu}$. Absent rigorous results, the measure $U_n$ has been investigated by simulation, but even that is difficult: the exponentially small fraction of paths that are self-avoiding prevents sampling from $U_n$ in any direct way once $n$ is of order, say, 100.
Various Monte Carlo sampling schemes have been proposed. Berretti and Sokal [BS85] suggest a Markov chain Monte Carlo algorithm, each step of which either extends or retracts the path by one edge. Adjusting the relative probabilities of extension and retraction produces a Markov chain whose stationary distribution approximates a mixture of the measures $U_n$, and which approaches this distribution in polynomial time, provided certain conjectures hold and parameters have been correctly adjusted. Randall and Sinclair take this a step further, building into the algorithm foolproof tests of both of these provisions [RS00].
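A stylized version of the move set can be sketched as follows. This is my own illustration of the extend/retract dynamics only; the acceptance probabilities in [BS85] are tuned more carefully, so that the stationary law is the desired mixture of the $U_n$.

```python
import random

def extend_retract(p, sweeps, seed=0):
    # Stylized extend/retract dynamics on self-avoiding paths in Z^2:
    # with probability p propose appending a uniformly random step
    # (rejected if it would create a self-intersection); otherwise
    # retract the last step (if the path is nonempty).
    rng = random.Random(seed)
    path = [(0, 0)]
    for _ in range(sweeps):
        if rng.random() < p:
            x, y = path[-1]
            dx, dy = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
        elif len(path) > 1:
            path.pop()
    return path

final = extend_retract(0.6, 5000, seed=4)
```

Whatever the parameter values, every state of the chain is a genuine self-avoiding path from the origin, which is the structural point of the design.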
Of relevance to this survey are dynamic reinforcement schemes to produce self-avoiding or nearly self-avoiding random walks. It should be mentioned that there is no consensus on what measure should properly be termed the infinite self-avoiding random walk. If U n converges weakly then the limit is a candidate for such a walk. Two other ideas, discussed below, are to make the self-avoiding constraint soft, then take a limit, and to get rid of self-intersection by erasing loops as they form.
'True' self-avoiding random walk
For physicists, it is natural to consider the constraint of self-avoidance to be the limit of imposing a finite penalty for each self-intersection. In such a formulation, the probability of a path $\gamma$ is proportional to $e^{-\beta H(\gamma)}$, where the energy $H(\gamma)$ is the sum of the penalties.
A variant on this is to develop the walk dynamically via
$$P(X_{n+1} = y \mid X_n = x, \mathcal{F}_n) = \frac{e^{-\beta N(y, n)}}{\sum_{z \sim x} e^{-\beta N(z, n)}},$$
where $N(z, n)$ is the number of visits to $z$ up to time $n$. This does not yield the same measure as the soft-constraint ensemble, but it has the advantage that it extends to a measure on infinite paths. Such a random walk was first considered by [APP83] and given the unfortunate name true self-avoiding walk. For finite inverse temperature $\beta$, this object is nontrivial in one dimension as well as in higher dimensions, and most of what is rigorously known pertains to one dimension. Tóth [Tót95] proves a number of results. His penalty function counts the number of pairs of transitions across the same edge, rather than the number of pairs of times the walk is at the same vertex, but is otherwise the same as in [APP83].
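In one dimension the dynamics are easy to simulate. This sketch of mine uses vertex occupation counts as in [APP83]:

```python
import math
import random

def true_saw_1d(beta, steps, seed=0):
    # 'True' self-avoiding walk on Z: step to neighbor y with
    # probability proportional to exp(-beta * N(y, n)), where N(y, n)
    # counts previous visits to y.
    rng = random.Random(seed)
    N = {0: 1}
    x, path = 0, [0]
    for _ in range(steps):
        wl = math.exp(-beta * N.get(x - 1, 0))
        wr = math.exp(-beta * N.get(x + 1, 0))
        x = x + 1 if rng.random() < wr / (wl + wr) else x - 1
        N[x] = N.get(x, 0) + 1
        path.append(x)
    return path

walk = true_saw_1d(2.0, 4000, seed=9)
spread = max(walk) - min(walk)
```

With strong repulsion the range grows markedly faster than the diffusive $\sqrt{n}$, in keeping with the superdiffusive $n^{2/3}$ behavior discussed below.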
In the terminology of this survey, we have an ERRW [Tót95] or VRRW [APP83] of sequence type, with sequence $F(n) = e^{-\beta n}$. Tóth calls this exponential self-repulsion. In a subsequent paper [Tót94], the dynamics are generalized to subexponential self-repulsion $F(n) = e^{-\beta n^\kappa}$, with $0 < \kappa < 1$. The papers [Tót96; Tót97] then consider polynomial reinforcement $F(n) = n^\alpha$. When $\alpha < 0$ this is self-repulsion and when $\alpha > 0$ it is self-attraction. The following results give a glimpse into this substantial body of work, concentrating on the case of self-repulsion. An overview, which includes the case of self-attraction, may be found in the survey [Tót99]. For technical reasons, instead of $X_N$ in the first result, a random stopping time $\theta(N) = \theta(\lambda, N)$ is required. Define $\theta(N)$ to be a geometric random variable with mean $\lambda N$, independent of all other variables.
Theorem 6.13 (see [Tót99, Theorem 1.4]). Let $\{X_n\}$ be a sequence-type ERRW with sequence $F(n)$ equal to one of the following, and define the constant $\nu$ in each case as shown.
Tóth also proves a Ray-Knight theorem for the local time spent on each edge. As expected, the time scaling is $N^{-\gamma}$, where $\gamma = (1 - \nu)/\nu$.

Loop-erased random walk
Lawler [Law80] introduced a new way to generate a random self-avoiding path. Assume the dimension $d$ is at least 3. Inductively, we suppose that at time $n$, a self-avoiding walk from the origin $\gamma_n := (x_0, x_1, \ldots, x_k) \in \Omega_k$ has been chosen. Let $X_{n+1}$ be chosen uniformly from the neighbors of $x_k$, independently of what has come before. At time $n + 1$, if $X_{n+1}$ is distinct from all $x_j$, $0 \le j \le k$, then $\gamma_{n+1}$ is taken to be $(x_0, \ldots, x_k, X_{n+1})$. If not, then $\gamma_{n+1}$ is taken to be $(x_0, \ldots, x_r)$ for the unique $r \le n$ such that $x_r = X_{n+1}$ (we allow $\gamma_{n+1}$ to become the empty sequence if $r = 0$). In other words, the final loop in the path $(x_0, \ldots, x_r, x_{r+1}, \ldots, X_{n+1})$ is erased. In dimension three and higher, $|X_n| \to \infty$, and hence for each $k$ the first $k$ steps of $\gamma_n$ are eventually constant. The limiting path $\gamma$ is therefore well defined on a set of probability 1 and is a deterministic function, the loop-erasure of the simple random walk path $X_0, X_1, X_2, \ldots$, denoted $LE(X)$. The loop-erased random walk measure, LERW, is defined to be the law of $\gamma$.
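Chronological loop erasure itself is a simple deterministic map on paths. The following sketch of mine erases each loop at the moment it closes and checks the construction on a finite simple random walk path:

```python
import random

def loop_erase(path):
    # Chronological loop erasure: scan the path and, whenever a site is
    # revisited, delete the loop just closed (everything after the
    # earlier visit).
    out, pos = [], {}
    for v in path:
        if v in pos:
            for u in out[pos[v] + 1:]:
                del pos[u]
            del out[pos[v] + 1:]
        else:
            pos[v] = len(out)
            out.append(v)
    return out

def srw_path_2d(n, seed=0):
    # A simple random walk path of length n in Z^2.
    rng = random.Random(seed)
    p = [(0, 0)]
    for _ in range(n):
        dx, dy = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        p.append((p[-1][0] + dx, p[-1][1] + dy))
    return p

srw = srw_path_2d(3000, seed=11)
gamma = loop_erase(srw)
```

This finite-path operation matches the construction above when the underlying walk is transient; in low dimensions one must first stop the walk, as discussed next.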
The process $\gamma = LE(X)$ seems to have little to do with reinforcement until one sees the following alternate description. Let $\{Y_n : n \ge 0\}$ be defined inductively by $Y_0 = 0$ and
$$P(Y_{n+1} = z \mid Y_0, \ldots, Y_n) = \frac{h(z)}{\sum_{w \sim Y_n} h(w)} \quad \text{for } z \sim Y_n, \qquad (6.4)$$
where $h(z)$ is the probability that a simple random walk beginning at $z$ avoids $\{Y_0, \ldots, Y_n\}$ forever, with $h(z) := 0$ if $z = Y_k$ for some $k \le n$. Lawler observed [Law91, Proposition 7.3.1] that $\{Y_n\}$ has law LERW. Thus one might consider LERW to be an infinitely negatively reinforced VRRW that sees the future. Moreover, altering (6.4) by conditioning on avoiding the past for time $M$ instead of forever, and then letting $M \to \infty$, gives a definition of LERW in two dimensions that agrees with the loop-erasing construction when both are stopped at suitable random times. The law of $\gamma = LE(X)$ is completely different from the laws $U_n$ and their putative limits, yet it has some very nice features that make it worthy of study. It is time reversible: for example, the loop erasure of a random walk from $a$, conditioned to hit $b$ and stopped when it does, has the same law if $a$ and $b$ are switched. The loop-erased random walk on an arbitrary graph is also intimately related to an algorithm of Aldous and Broder [Ald90] for choosing a spanning tree uniformly. In dimensions five and above, LERW behaves in the same way as the self-avoiding measure $U_n$, rescaling to a Brownian motion, but in dimensions 2, 3 and 4 it has different connectivity and diffusion exponents from $U_n$.

Continuous time limits of self-avoiding walks
Both the 'true' self-avoiding random walk and the loop-erased random walk have continuous limiting processes that are very pretty. The chance to spend a few paragraphs on each of these was a large part of my reason for including the entire section on negative reinforcement.

The 'true' self-repelling motion
The true self-avoiding random walk with exponential self-repulsion was shown in Theorem 6.13 (part 1) to have a limit law for its time-t marginal. In fact it has a limit as a process. Most of this is shown in the paper [TW98], with a key tightness result added in [NR06]. Some properties of this limit process {X t } are summarized as follows. In particular, having 3/2-variation it is not a diffusion.
• The process {X t } has continuous paths.
• It is recurrent.
• It is self-similar: • It has non-trivial local variation of order 3/2.
• The occupation measure at time t has a density; this may be called the local time L t (x). • The pair (X t , L t (·)) is a Markov process.
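For orientation, here is a toy simulation of one version of the discrete walk behind this limit, with self-repulsion driven by edge local times; the parameter beta and the exact form of the weights are illustrative assumptions, not a transcription of Theorem 6.13.

```python
import math
import random

def tsaw_step(x, edge_time, beta):
    """One step of a nearest-neighbor walk on Z with exponential
    self-repulsion: the odds of crossing each adjacent edge are
    proportional to exp(-beta * (that edge's current local time))."""
    rt = edge_time.get((x, x + 1), 0)
    lt = edge_time.get((x - 1, x), 0)
    w_r = math.exp(-beta * rt)
    w_l = math.exp(-beta * lt)
    if random.random() < w_r / (w_r + w_l):
        edge_time[(x, x + 1)] = rt + 1
        return x + 1
    edge_time[(x - 1, x)] = lt + 1
    return x - 1

random.seed(2)
beta, n = 1.0, 10000
x, edge_time = 0, {}
for _ in range(n):
    x = tsaw_step(x, edge_time, beta)
# In the scaling limit the displacement is superdiffusive, of order n^{2/3}.
```

The self-similarity exponent 2/3 in the bullet list above is what the final comment alludes to: displacement grows like n^{2/3} rather than the diffusive n^{1/2}.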
To construct this process and show it is the limit of the exponentially self-repelling true self-avoiding walk, Tóth and Werner rely on the Ray-Knight theory developed in [Tót95]. While technical statements would involve too much notation, the gist is that the local time at the edge {k, k + 1} converges under rescaling, not only for fixed k but as a process in k. A strange but convenient choice is to stop the walk when the occupation time at the edge {z, z + 1} reaches m. The joint occupation times of the other edges {j, j + 1} then converge, under suitable rescaling, to a Brownian motion started at time z and position m, absorbed at zero once the time parameter is positive; if z < 0 it is reflected at zero until then. When reading the previous sentence, be careful: Ray-Knight theory has a habit of switching space and time.
Because this holds separately for each pair (z, m) ∈ R × R^+, the limiting process {X_t} may be constructed in the strong sense by means of a family of coupled coalescing Brownian motions {B_{z,m}(t) : t ≥ z} indexed by (z, m) ∈ R × R^+. These coupled Brownian motions are jointly limits of coupled simple random walks. On this level the description is somewhat less technical, as follows.
Let V_e denote the even vertices of Z × Z^+, that is, the sites (z, m) with z + m even. For each (z, m) ∈ V_e, flip an independent fair coin to determine a single directed edge from (z, m) to (z + 1, m ± 1); the exception is when m = 1: then for z < 0 there is an edge {(z, 1), (z + 1, 2)}, while for z ≥ 0 there is a v-shaped edge {(z, 1), (z + 1, 0), (z + 2, 1)}. Traveling rightward, one sees coalescing simple random walks, with absorption at zero once time is positive. A picture of this is shown. If one uses the odd sites and travels leftward, one obtains a dual, distributed as a reflection (in time) of the original coalescing random walks. The complement of the union of the coalescing random walks and the dual walks is topologically a single path. Draw a polygonal path down the center of it: the z-values at which the center line crosses an integer level form a discrete process {Y_n}.
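The coin-flip construction is easy to simulate. In the sketch below (my own minimal version, which omits the special boundary rules at m = 0 and m = 1), each even site carries one shared fair coin, so any two rightward walks that meet follow the same arrows from then on and hence coalesce.

```python
import random

random.seed(3)
# One fair coin per site (z, m) with z + m even and m >= 1: an arrow
# from (z, m) to (z + 1, m + 1) or (z + 1, m - 1).  Coins are sampled
# lazily but stored, so different walks see the same arrows.
arrow = {}
def get_arrow(z, m):
    if (z, m) not in arrow:
        arrow[(z, m)] = random.choice([-1, 1])
    return arrow[(z, m)]

def walk_right(z, m, z_max=40):
    """Follow the arrows rightward, stopping at level 0 or at z_max.
    (Boundary rules at m = 0, 1 are omitted for brevity.)"""
    path = [(z, m)]
    while z < z_max and m >= 1:
        m += get_arrow(z, m)
        z += 1
        path.append((z, m))
    return path

p1 = walk_right(0, 10)   # start sites chosen with z + m even
p2 = walk_right(0, 12)
```

Once p1 and p2 agree at some site, their continuations are identical, which is exactly the coalescence seen in the picture.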
This process {Y_n} is a different process from the true self-avoiding walk we started with, but it has some other nice descriptions, discussed in [TW98, Section 11].

[Figure: coalescing random walks; coalescing random walks and their duals; the process Y_n]

In particular, it may be described as an "infinitely negatively edge-reinforced random walk with initial occupation measure alternating between zero and one". To be more precise, give the nearest-neighbor edges of Z weight 1 if their center is at ±(1/2 + 2k) for k = 0, 1, 2, . . ., and weight 0 otherwise. Thus the two edges adjacent to zero are both labeled with a one, and, going away from zero in either direction, ones and zeros alternate. Now do a random walk that always chooses the less traveled edge, flipping a fair coin in the case of a tie (each crossing of an edge increases its weight by one). The process {Y_n} converges when rescaled to the process {X_t} which is the scaling limit of the true self-avoiding walk. The limit operation in this case is more transparent: the coalescing simple random walks turn into coalescing Brownian motions. These Brownian motions are the local time processes given by the Ray-Knight theory. The construction of the process {X_t} in [TW98] is in fact via these coalescing Brownian motions.
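The alternating-weight walk just described can be written out directly; the following sketch implements the rule verbatim (initial weight 1 on edges centered at ±(1/2 + 2k), always cross the less-traveled adjacent edge, fair coin on ties, each crossing adds one).

```python
import random

def initial_weight(edge):
    """Edge (x, x+1) has center x + 1/2; initial weight 1 iff the
    center is at +/-(1/2 + 2k), so going away from zero the weights
    alternate 1, 0, 1, 0, ... in either direction."""
    return 1 if abs(2 * edge[0] + 1) % 4 == 1 else 0

random.seed(4)
weight = {}
def w(e):
    if e not in weight:
        weight[e] = initial_weight(e)
    return weight[e]

# Always cross the less-traveled adjacent edge (fair coin on ties);
# each crossing adds one to that edge's weight.
x, path = 0, [0]
for _ in range(200):
    right, left = (x, x + 1), (x - 1, x)
    if w(right) < w(left) or (w(right) == w(left) and random.random() < 0.5):
        e, x = right, x + 1
    else:
        e, x = left, x - 1
    weight[e] += 1
    path.append(x)
```

Rescaled, this discrete process converges to the same limit {X_t} as the true self-avoiding walk, as stated above.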

The Stochastic Loewner Equation
Suppose that the loop-erased random walk has a scaling limit. For specificity, it will be convenient to use the time reversal property of LERW and think of the walk as beginning on the boundary of a large disk and conditioned to hit the origin before returning to the boundary of the disk. The recursive h-process formulation (6.4) indicates that the infinitesimal future of such a limiting path would be a Brownian motion conditioned to avoid the path it has traced so far. Such conditioning, even if well defined, would seem to be complicated. But suppose, as is known for unconditioned Brownian motion and widely believed for many scaling limits, that the limiting LERW is conformally invariant. The complement of the path traced so far is simply connected, hence by the Riemann Mapping Theorem it is conformally homeomorphic to the open unit disk, with the present location mapping to a boundary point. The infinitesimal future in these coordinates is a Brownian motion conditioned to enter the interior of the disk immediately and stay there until it hits the origin. If we could compute in these coordinates, such conditioning would be routine.
In 2000, Schramm [Sch00] observed that such a conformal map may be computed via the classical Löwner equation. This is a differential equation satisfied by the conformal maps between a disk and the complement of a growing path inward from the boundary of the disk. More precisely, let β be a compact simple path in the closed unit disk with one endpoint at zero and the other endpoint being the only point of β on ∂U. Let q : (−∞, 0] → β \ {0} be a parametrization of β \ {0}, and for each t ≤ 0 let

f(t, ·) : U → U \ q([t, 0])   (6.5)

be the unique conformal map fixing 0 and having positive real derivative at 0. Löwner [Löw23] proved:

Theorem 6.14 (Löwner's Slit Mapping Theorem). Given β, there is a parametrization q and a continuous function g : (−∞, 0] → ∂U such that the function f in (6.5) satisfies the partial differential equation

∂f/∂t (t, z) = z (∂f/∂z)(t, z) · (g(t) + z) / (g(t) − z)   (6.6)

with initial condition f(0, z) = z.
The point q(t) is a boundary point of U \ q([t, 0]), so it corresponds under the Riemann map f(t, ·) to a point on ∂U; it is easy to see that this point must be g(t). Imagine that β is the scaling limit of LERW started from the origin and stopped when it hits ∂U (recurrence of two-dimensional random walk forces us to use a stopping construction). Since a Brownian motion conditioned to enter the interior of the disk has an angular component that is a simple Brownian motion, it is not too great a leap to believe that g must be a Brownian motion on the circle ∂U, started from an arbitrary point, let us say 1. The solution to (6.6) exists for any g; that is, given g, we may recover the path q. We may then plug in for g a Brownian motion run at speed κ, that is, with E B_t^2 = κt, for some scale parameter κ. We obtain what is known as the radial SLE_κ.
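Numerically, one can get a feel for the Loewner flow by tracking a single point of the disk under a forward radial equation with Brownian driving. The sketch below uses one common sign and time convention (forward time, dw/dt = w(ζ + w)/(ζ − w)), which is not literally the reversed-time convention of (6.5)–(6.6); it is an illustration of the mechanism, not a transcription. While |w| < 1 the modulus increases, and the point is "swallowed" when it collides with the driving point or reaches the circle.

```python
import cmath
import math
import random

def radial_flow(z0, kappa, T=1.0, dt=1e-3, seed=5):
    """Euler-integrate dw/dt = w * (zeta(t) + w) / (zeta(t) - w) for a
    single point w(0) = z0 in the unit disk, with driving point
    zeta(t) = exp(i * B(kappa * t)) on the unit circle.  Returns the
    final position and whether the point was swallowed by the hull."""
    random.seed(seed)
    w, theta, t = z0, 0.0, 0.0
    while t < T:
        zeta = cmath.exp(1j * theta)
        if abs(zeta - w) < 1e-3 or abs(w) >= 1.0:
            return w, True                     # swallowed
        w = w + dt * w * (zeta + w) / (zeta - w)
        theta += math.sqrt(kappa * dt) * random.gauss(0.0, 1.0)
        t += dt
    return w, False

w_final, swallowed = radial_flow(0.3 + 0.2j, kappa=2.0)
```

Since Re((ζ + w)/(ζ − w)) = (1 − |w|²)/|ζ − w|² > 0 inside the disk, each Euler step strictly increases |w|, matching the intuition that interior points flow outward toward the growing hull.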
More precisely, for any κ > 0, any simply connected open domain D, and any x ∈ ∂D, y ∈ D, there is a unique process SLE_κ(D; x, y) yielding a path β as above from x to y. We have constructed SLE_κ(U; 1, 0). This is sufficient because SLE_κ is invariant under conformal maps of the triple (D; x, y). Letting y approach a boundary point z ∈ ∂D gives a well defined limit known as chordal SLE_κ(D; x, z).
Lawler, Schramm and Werner have written over a dozen substantial papers describing SLE_κ for various κ and using SLE to analyze various scaling limits and solve some longstanding problems. A number of properties are proved in [RS05]: for example, SLE_κ is always a path, is self-avoiding if and only if κ ≤ 4, and is space-filling when κ ≥ 8. Regarding the question of whether SLE is the scaling limit of LERW, it was shown in [Sch00] that if LERW has a scaling limit and this limit is conformally invariant, then it is SLE_2. The conformally invariant limit was confirmed just a few years later:

Theorem 6.15 ([LSW04, Theorem 1.3]). Two-dimensional LERW stopped at the boundary of a disk has a scaling limit, and this limit is conformally invariant. Consequently, the limit is SLE_2.
In the same paper, Lawler, Schramm and Werner show that the Peano curve separating an infinite uniform spanning tree from its dual has SLE_8 as its scaling limit. The SLE_6 path is not self-avoiding, but its outer boundary is, up to an inessential transformation, the same as the outer boundary of a two-dimensional Brownian motion run until a certain stopping time. A recently announced result of Smirnov is that the interface between positive and negative clusters of the two-dimensional Ising model is an SLE_3. It is conjectured that the scaling limit of the classical self-avoiding random walk is SLE_{8/3}; this would follow if such a scaling limit could be shown to exist and to be conformally invariant.