Limits of relative entropies associated with weakly interacting particle systems

February 16, 2015

The limits of scaled relative entropies between probability distributions associated with N-particle weakly interacting Markov processes are considered. The convergence of such scaled relative entropies is established in various settings. The analysis is motivated by the role relative entropy plays as a Lyapunov function for the (linear) Kolmogorov forward equation associated with an ergodic Markov process, and Lyapunov function properties of these scaling limits with respect to nonlinear finite-state Markov processes are studied in the companion paper [6].


Introduction
We consider a collection of N weakly interacting particles, in which each particle evolves as a continuous time pure jump càdlàg stochastic process taking values in a finite state space X = {1, . . ., d}. The evolution of this collection of particles is described by an N-dimensional time-homogeneous Markov process X^N = {X^{i,N}}_{i=1,...,N}, where for t ≥ 0, X^{i,N}(t) represents the state of the ith particle at time t. The jump intensity of any given particle depends on the configuration of other particles only through the empirical measure

μ^N(t) .= (1/N) Σ_{i=1}^N δ_{X^{i,N}(t)},   (1.1)

where δ_a is the Dirac measure at a. Consequently, a typical particle's effect on the dynamics of the given particle is of order 1/N. For this reason the interaction is referred to as a "weak interaction." Note that μ^N(t) is a random variable with values in the space P_N(X) .= P(X) ∩ (1/N)Z^d, where P(X) is the space of probability measures on X, equipped with the usual topology of weak convergence. In the setting considered here, at most one particle will jump, i.e., change state, at a given time, and the jump intensities of any given particle depend only on its own state and the state of the empirical measure at that time. In addition, the jump intensities of all particles will have the same functional form. Thus, if the initial particle distribution of X^N(0) = {X^{i,N}(0)}_{i=1,...,N} is exchangeable, then at any time t > 0, X^N(t) = {X^{i,N}(t)}_{i=1,...,N} is also exchangeable.
Such mean field weakly interacting processes arise in a variety of applications ranging from physics and biology to social networks and telecommunications, and have been studied in many works (see, e.g., [1,2,3,13,17,25,30]). The majority of this research has focused on establishing so-called "propagation-of-chaos" results (see, e.g., [18,16,13,19,20,25,27,30]). Roughly speaking, such a result states that on any fixed time interval [0, T], the particles become asymptotically independent as N → ∞, and that for each fixed t the distribution of a typical particle converges to a probability measure p(t), which coincides with the limit in probability of the sequence of empirical measures {μ^N(t)}_{N∈N} as N → ∞. Under suitable conditions, the function t ↦ p(t) can be characterized as the unique solution of a nonlinear differential equation on P(X) of the form

d/dt p(t) = p(t) Γ(p(t)),   (1.2)

where for each p ∈ P(X), Γ(p) is a rate matrix for a Markov chain on X. This differential equation admits an interpretation as the forward equation of a "nonlinear" jump Markov process that represents the evolution of the typical particle. In the context of weakly interacting diffusions, this limit equation is also referred to as the McKean-Vlasov limit.
Other work on mean field weakly interacting processes has established central limit theorems [32,31,26,22,7] or sample path large deviations of the sequence {μ^N} [8,5,11]. All of these results are concerned with the behavior of the N-particle system over a finite time interval [0, T].

Discussion of main results
An important but difficult issue in the study of nonlinear Markov processes is stability. Here, what is meant is the stability of the P(X)-valued deterministic dynamical system {p(t)}_{t≥0}. For example, one can ask if there is a unique, globally attracting fixed point for the ordinary differential equation (ODE) (1.2). When this is not the case, all the usual questions regarding stability of deterministic systems, such as existence of multiple fixed points, their local stability properties, etc., arise here as well. One is also interested in the connection between these sorts of stability properties of the limit model and related stability and metastability (in the sense of small noise stochastic systems) questions for the prelimit model.
There are several features which make stability analysis particularly difficult for these models. One is that the state space of the system, being the set of probability measures on X, is not a linear space (although it is a closed, convex subset of a Euclidean space). A standard approach to the study of stability is through construction of suitable Lyapunov functions. Obvious first choices for Lyapunov functions, such as quadratic functions, are not naturally suited to such state spaces. Related to the structure of the state space is the fact that the dynamics, linearized at any point in the state space, always have a zero eigenvalue, which also complicates the stability analysis.
The purpose of the present paper and the companion paper [6] is to introduce and develop a systematic approach to the construction of Lyapunov functions for nonlinear Markov processes. The starting point is the observation that given any ergodic Markov process, the mapping q ↦ R(q ‖ π), where R is relative entropy and π is the stationary distribution, in a certain sense always defines a Lyapunov function for the distribution of the Markov process [28]. We discuss this point in some detail in Section 3. For an ergodic Markov process the dynamical system describing the evolution of the law of the process (i.e., the associated Kolmogorov forward equation) is a linear ODE with a unique fixed point. In contrast, for a nonlinear Markov process the corresponding ODE (1.2) can have multiple fixed points which may or may not be locally stable, and this is possible even when the jump rates given by the off-diagonal elements of Γ(p) are bounded away from 0 uniformly in p. Furthermore, as is explained in Section 3, due to the nonlinearity of Γ(·), relative entropy typically cannot be used directly as a Lyapunov function for (1.2).
The approach we take for nonlinear Markov processes is to lift the problem to the level of the pre-limit N-particle processes, which describe a linear Markov process. Under mild conditions the N-particle process will be ergodic, and thus relative entropy can be used to define a Lyapunov function for the joint distribution of these N particles. The scaling properties of relative entropy and convergence properties of the weakly interacting system then suggest that the limit of suitably normalized relative entropies for the N-particle system, assuming it exists, is a natural candidate Lyapunov function for the corresponding nonlinear Markov process. Specifically, denoting the unique invariant measure of the N-particle Markov process X^N by π^N ∈ P(X^N), the function F : P(X) → R defined by

F(q) .= lim_{N→∞} (1/N) R(⊗^N q ‖ π^N),   (1.3)

where ⊗^N q denotes the N-fold product of q, is a natural candidate for a Lyapunov function. The aim of this paper is the calculation of quantities of the form (1.3). In the companion paper [6] we will use these results to construct Lyapunov functions for various particular systems.
Of course for this approach to work, we need the limit on the right side in (1.3) to exist and to be computable. In Section 4 we introduce a family of nonlinear Markov processes that arises as the large particle limit of systems of Gibbs type. For this family, the invariant distribution of the corresponding N-particle system takes an explicit form, and we show that the right side of (1.3) has a closed form expression. In Section 4 of [6] we show that this limiting function is indeed a Lyapunov function for the corresponding nonlinear dynamical system (1.2).
The class of models just mentioned demonstrates that the approach for constructing Lyapunov functions by studying scaling limits of the relative entropies associated with the corresponding N-particle Markov processes has merit. However, for typical nonlinear systems as in (1.2), one does not have an explicit form for the stationary distribution of the associated N-particle system, and thus the approach of computing limits of F_N as in (1.3) becomes infeasible. An alternative is to consider the limits of

F^N_t(q) .= (1/N) R(⊗^N q ‖ p^N(t)),   (1.4)

where p^N(t) is the (exchangeable) probability distribution of X^N(t) with some exchangeable initial distribution p^N(0) on X^N. Formally taking the limit of F^N_t, first as t → ∞ and then as N → ∞, we arrive at the function F introduced in (1.3). Since, as we have noted, this limit cannot in general be evaluated, one may instead attempt to evaluate the limit in the reverse order, i.e., send N → ∞ first, followed by t → ∞.
A basic question one then asks is whether the limit lim_{N→∞} F^N_t(q) takes a useful form. In Section 5.1 we will answer this question in a rather general setting. Specifically, we show that under suitable conditions the limit of (1/N) R(⊗^N q ‖ Q^N) as N → ∞ exists for every q ∈ P(X) and every exchangeable sequence {Q^N}_{N∈N}, Q^N ∈ P(X^N). The main condition needed is that the collection of empirical measures of N random variables with joint distribution Q^N satisfies a locally uniform large deviation principle (LDP) on P(X). We show in this case that the limit of (1/N) R(⊗^N q ‖ Q^N) is given by J(q), where J is the rate function associated with the LDP. Applying this result to Q^N = p^N(t), we then identify the limit as N → ∞ of F^N_t(q) as J_t(q), where J_t is the large deviations rate function for the collection of P(X)-valued random variables {μ^N(t)}_{N∈N}. In the companion paper we will show that the limit of J_t(q) as t → ∞ yields a Lyapunov function for (1.2) for interesting models, including a class we call "locally Gibbs," which generalizes those obtained as limits of N-particle Gibbs models.
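For product measures the 1/N normalization in quantities such as (1.3) is exact rather than asymptotic: by the chain rule for relative entropy, R(⊗^N q ‖ ⊗^N π) = N R(q ‖ π). The following sketch (Python, with arbitrary illustrative distributions q and π) checks this scaling property directly; the general exchangeable case treated in Section 5.1 replaces the exact identity with a limit.

```python
import math
from itertools import product

def rel_entropy(p, q):
    """R(p || q) = sum_x p_x log(p_x / q_x), with the convention 0 log 0 = 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def product_pmf(p, N):
    """Probabilities of the N-fold product measure of p, listed in a fixed order."""
    return [math.prod(p[i] for i in x) for x in product(range(len(p)), repeat=N)]

q  = [0.2, 0.3, 0.5]   # illustrative distributions on X = {1, 2, 3}
pi = [0.4, 0.4, 0.2]
for N in (1, 2, 3):
    RN = rel_entropy(product_pmf(q, N), product_pmf(pi, N))
    # (1/N) R(q^{(x)N} || pi^{(x)N}) equals R(q || pi) exactly, for every N
    assert abs(RN / N - rel_entropy(q, pi)) < 1e-12
```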

Outline of the paper and common notation
The paper is organized as follows. In Section 2 we describe the interacting particle system model and the ODE that characterizes its scaling limit. Section 3 recalls the descent property of relative entropy for (linear) Markov processes. Section 4 studies systems of Gibbs type and shows how a Lyapunov function can be obtained by evaluating limits of F_N(q) as N → ∞. Next, in Section 5 we consider models more general than Gibbs systems. In Section 5.1, we carry out an asymptotic analysis of (1/N) R(⊗^N q ‖ Q^N) as N → ∞ for an exchangeable sequence {Q^N}_{N∈N}. The results of Section 5.1 are then used in Section 5.2 to evaluate lim_{N→∞} F^N_t(q). Section 5.2 also contains remarks on relations between the constructed Lyapunov functions and the Freidlin-Wentzell quasipotential and metastability issues for the underlying N-particle Markov process.
The following notation will be used. Given any Polish space E, D([0, ∞) : E) denotes the space of E-valued right continuous functions on [0, ∞) with finite left limits on (0, ∞), equipped with the usual Skorohod topology. Weak convergence of a sequence {X_n} of E-valued random variables to a random variable X is denoted by X_n ⇒ X. The cardinality of a finite set C is denoted by |C|.

Background and Model Description

Description of the N-particle system

In this section, we provide a precise description of the time-homogeneous X^N-valued Markov process X^N = (X^{1,N}, . . ., X^{N,N}) that describes the evolution of the N-particle system. We assume that at most one particle can change state at any given time. Models for which more than one particle can change state simultaneously are also common [1,12,11]. However, under broad conditions the limit (1.2) for such models also has an interpretation as the forward equation of a model in which only one particle can change state at any time [11], and so for purposes of stability analysis of (1.2) this assumption is not much of a restriction.
Recall that X is the finite set {1, . . ., d}. The transitions of X^N are determined by a family of matrices {Γ^N(r)}_{r∈P(X)}, where for each r ∈ P(X), Γ^N(r) = {Γ^N_{x,y}(r), x, y ∈ X} is a transition rate matrix of a continuous time Markov chain on X. For y ≠ x, Γ^N_{x,y}(r) ≥ 0 represents the rate at which a single particle transitions from state x to state y when the empirical measure has value r. More precisely, the transition mechanism of X^N is as follows. Given X^N(t) = x ∈ X^N, an index i ∈ {1, . . ., N} and y ≠ x_i, the jump rate at time t for the transition x ↦ (x_1, . . ., x_{i−1}, y, x_{i+1}, . . ., x_N) is Γ^N_{x_i,y}(r^N(x)), where r^N(x) ∈ P_N(X) is the empirical measure of the vector x ∈ X^N, which is given explicitly by

r^N(x) .= (1/N) Σ_{i=1}^N δ_{x_i}.   (2.1)

Moreover, the jump rates for transitions of any other type are zero. Note that r^N(X^N(t)) equals the empirical measure μ^N(t) defined in (1.1). The description in the last paragraph completely specifies the infinitesimal generator or rate matrix of the X^N-valued Markov process X^N, which we will denote throughout by Ψ^N. Note that the sample paths of X^N lie in D([0, ∞) : X^N), where X^N is endowed with the discrete topology. The generator Ψ^N, together with a collection of X-valued random variables {X^{i,N}(0)}_{i=1,...,N} whose distribution we take to be exchangeable, determines the law of X^N.

The jump Markov process for the empirical measure
As noted in Section 1, exchangeability of the initial random vector {X^{i,N}(0), i = 1, . . ., N} implies that the processes {X^{i,N}}_{i=1,...,N} are also exchangeable. From this, it follows that the empirical measure process μ^N = {μ^N(t)}_{t≥0} is a Markov chain taking values in P_N(X). We now describe the evolution of this measure-valued Markov chain. For x ∈ X, let e_x denote the unit coordinate vector in the x-direction in R^d. Since almost surely at most one particle can change state at any given time, the possible jumps of μ^N are of the form v/N, v ∈ V, where

V .= {e_y − e_x : x, y ∈ X, x ≠ y}.   (2.2)

Moreover, if μ^N(t) = r for some r ∈ P_N(X), then at time t, N r_x of the particles are in the state x. Therefore, the rate of the particular transition r → r + (e_y − e_x)/N is N r_x Γ^N_{x,y}(r). Consequently the generator L^N of the jump Markov process μ^N is given by

L^N f(r) .= Σ_{x,y∈X, y≠x} N r_x Γ^N_{x,y}(r) [f(r + (e_y − e_x)/N) − f(r)]   (2.3)

for real-valued functions f on P_N(X).
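The transition rates of μ^N described above translate directly into a simulation: from state r, the jump r → r + (e_y − e_x)/N occurs at rate N r_x Γ^N_{x,y}(r). Below is a minimal Gillespie-type sketch in Python; the rate family Γ(x, y, r) = 1 + r_y is a hypothetical stand-in for a concrete model.

```python
import random

def simulate_empirical_measure(Gamma, r0, T, N, rng):
    """Gillespie simulation of the P_N(X)-valued chain mu^N: from state r,
    the jump r -> r + (e_y - e_x)/N occurs at rate N r_x Gamma(x, y, r)."""
    d = len(r0)
    r, t = list(r0), 0.0
    while True:
        jumps = [(x, y, N * r[x] * Gamma(x, y, r))
                 for x in range(d) for y in range(d) if y != x and r[x] > 0]
        total = sum(w for _, _, w in jumps)
        if total == 0:
            return r
        t += rng.expovariate(total)       # exponential holding time
        if t > T:
            return r
        u, acc = rng.random() * total, 0.0
        for x, y, w in jumps:             # pick a jump proportionally to its rate
            acc += w
            if u <= acc:
                r[x] -= 1.0 / N           # one particle leaves state x ...
                r[y] += 1.0 / N           # ... and enters state y
                break

# hypothetical mean-field rates (not from the paper): jump toward y at rate 1 + r_y
Gamma = lambda x, y, r: 1.0 + r[y]
r_final = simulate_empirical_measure(Gamma, [1.0, 0.0, 0.0], T=5.0, N=200,
                                     rng=random.Random(0))
assert abs(sum(r_final) - 1.0) < 1e-9    # the chain stays on the simplex P_N(X)
```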

Law of large numbers limit
We now characterize the law of large numbers limit of the sequence {μ^N}_{N∈N}. It will be convenient to identify P(X) with the (d − 1)-dimensional simplex S in R^d, given by

S .= {p ∈ R^d : p_x ≥ 0 for all x ∈ X, Σ_{x∈X} p_x = 1},

and identify P_N(X) with S_N .= S ∩ (1/N)Z^d. We use P(X) and S (likewise, P_N(X) and S_N) interchangeably. We endow S with the usual Euclidean topology and note that this corresponds to P(X) endowed with the topology of weak convergence. We also let S° denote the relative interior of S. We will find it convenient to define Γ_{xx}(r) .= −Σ_{y∈X, y≠x} Γ_{xy}(r), so that Γ(r) can be viewed as a d × d transition rate matrix of a jump Markov process on X.
Laws of large numbers for the empirical measures of interacting processes can be efficiently established using a martingale problem formulation; see for instance [27]. Since X is finite, in the present situation we can rely on a classical convergence theorem for pure jump Markov processes with state space contained in a Euclidean space.
Theorem 2.2 Suppose that Condition 2.1 holds, and assume that μ^N(0) converges in probability to q ∈ P(X) as N tends to infinity. Then {μ^N(·)}_{N∈N} converges uniformly on compact time intervals in probability to p(·), where p(·) is the unique solution to (1.2) with p(0) = q.
Proof. The assertion follows from Theorem 2.11 in [21]. In the notation of that work, E = P(X), and we recall e_x is the unit vector in R^d with component x equal to 1. Note that if f is the identity function f(p) = p, then F(p) .= lim_{N→∞} L^N f(p), where L^N is the generator given in (2.3). Moreover, the z-th component of the d-dimensional vector F(p) is equal to Σ_{x:x≠z} p_x Γ_{x,z}(p) − Σ_{y:y≠z} p_z Γ_{z,y}(p), which in turn is equal to Σ_x p_x Γ_{x,z}(p), the z-th component of the row vector pΓ(p). The ODE d/dt p(t) = F(p(t)) is therefore the same as (1.2). Since F is Lipschitz continuous by Condition 2.1, this ODE has a unique solution. The proof is now immediate from Theorem 2.11 in [21].
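The limit ODE (1.2) can be integrated numerically. The sketch below applies forward Euler to d/dt p(t) = p(t)Γ(p(t)) with hypothetical rates Γ_{x,z}(p) = 1 + p_z for z ≠ x; for these rates the drift reduces to 1 − d p_z in each component, so the uniform distribution is the unique, globally attracting fixed point.

```python
def solve_forward_equation(Gamma, p0, T, dt=1e-3):
    """Forward-Euler integration of the nonlinear forward equation
    d/dt p(t) = p(t) Gamma(p(t)), with p a row vector on the simplex."""
    d = len(p0)
    p = list(p0)
    for _ in range(int(round(T / dt))):
        # (p Gamma(p))_z = sum_x p_x Gamma_{x,z}(p)
        drift = [sum(p[x] * Gamma(x, z, p) for x in range(d)) for z in range(d)]
        p = [p[z] + dt * drift[z] for z in range(d)]
    return p

def Gamma(x, z, p):
    # hypothetical rates: jump toward z at rate 1 + p_z;
    # the diagonal is chosen so that rows sum to zero
    if x == z:
        return -sum(Gamma(x, y, p) for y in range(len(p)) if y != x)
    return 1.0 + p[z]

# for these rates the z-th drift component is 1 - d p_z (when sum_x p_x = 1),
# so the solution converges to the uniform distribution
p = solve_forward_equation(Gamma, [0.9, 0.05, 0.05], T=10.0)
assert abs(sum(p) - 1.0) < 1e-8          # mass conserved, since rows of Gamma sum to zero
assert all(abs(pz - 1/3) < 1e-6 for pz in p)
```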
The solution to (1.2) has a stochastic representation. Given a probability measure q(0) ∈ P(X), one can construct a process X with sample paths in D([0, T] : X) such that for all functions f : X → R,

f(X(t)) − f(X(0)) − ∫_0^t (Γ(q(s))f)(X(s)) ds, where (Γ(p)f)(x) .= Σ_{y∈X} Γ_{x,y}(p) f(y),

is a martingale, where q(t) denotes the probability distribution of X(t), t ≥ 0. Furthermore, X is unique in law. Note that the rate matrix of X(t) is time inhomogeneous and equal to Γ(q(t)), with q_x(t) = P{X(t) = x}.
Because the evolution of X at time t depends on the distribution of X(t), this process is called a nonlinear Markov process. Note that q(t) also solves (1.2), and so if q(0) = p(0), by uniqueness p_x(t) = P{X(t) = x}. One can show that, under the conditions of Theorem 2.2, X(·) is the limit in distribution of X^{i,N}(·) for any fixed i, as N → ∞ (see Proposition 2.2 of [30]).
A fundamental property of interacting systems that will play a role in the discussion below is propagation of chaos; see [14] for an exposition and characterization. Propagation of chaos means that the first k components of the N-particle system over any finite time interval will be asymptotically independent and identically distributed (i.i.d.) as N tends to infinity, whenever the initial distributions of all components are asymptotically i.i.d. In the present context, propagation of chaos for the family (X^N)_{N∈N} (or {Ψ^N}_{N∈N}) means the following. For t ≥ 0 denote the probability distribution of (X^{1,N}(t), . . ., X^{k,N}(t)) by p^{N,k}(t). If q ∈ P(X) and if for all k ∈ N, p^{N,k}(0) converges weakly to the product measure ⊗^k q as N → ∞, then for all k ∈ N and all t ≥ 0, p^{N,k}(t) converges weakly to ⊗^k p(t), where p(·) is the solution to (1.2) with p(0) = q. Instead of a particular time t a finite time interval may be considered. Under the assumptions of Theorem 2.2, propagation of chaos holds for the family of N-particle systems determined by {Ψ^N}_{N∈N}. See, for instance, Theorem 4.1 in [15].

Descent Property of Relative Entropy for Markov Processes
We next discuss an important property of the usual (linear) Markov processes. As noted in the introduction, various features of the deterministic system (1.2) make the standard candidate forms of Lyapunov functions unsuitable. Indeed, one of the most challenging problems in the construction of Lyapunov functions for any system is the identification of natural forms that reflect the particular features and structure of the system. The ODE (1.2) is naturally related to a flow of probability measures, and for this reason one might consider constructions based on relative entropy. It is known that for an ergodic linear Markov process relative entropy serves as a Lyapunov function. Specifically, relative entropy has a descent property along the solution of the forward equation. The earliest proof in the setting of finite-state continuous-time Markov processes the authors have been able to locate is [28, pp. I-16-17]. Since analogous arguments will be used elsewhere (see Section 2 of [6]), we give the proof of this fact.

Let G = (G_{x,y})_{x,y∈X} be an irreducible rate matrix over the finite state space X, and denote its unique stationary distribution by π. The forward equation for the family of Markov processes with rate matrix G is the linear ODE

d/dt p(t) = p(t) G.   (3.1)

Recall that the relative entropy of p ∈ P(X) with respect to q ∈ P(X) is given by

R(p ‖ q) .= Σ_{x∈X} p_x log(p_x/q_x).   (3.2)

Lemma 3.1 Let p(·), q(·) be solutions to (3.1) with initial conditions p(0), q(0) ∈ P(X). Then for all t > 0,

d/dt R(p(t) ‖ q(t)) ≤ 0.

Moreover, d/dt R(p(t) ‖ q(t)) = 0 if and only if p(t) = q(t).
Proof. Let ℓ(z) .= z log z − z + 1 for z ≥ 0. It is well known (and easy to check) that ℓ is strictly convex on [0, ∞), with ℓ(0) = 1 and ℓ(z) = 0 if and only if z = 1. Owing to the irreducibility of G, for t > 0, p(t) and q(t) have no zero components and hence are equivalent probability vectors. By assumption, p′_x(t) .= d/dt p_x(t) = Σ_{y∈X} p_y(t) G_{y,x} for all x ∈ X and all t ≥ 0, and similarly for q(t). Thus for t > 0

d/dt R(p(t) ‖ q(t)) = Σ_{x∈X} p′_x(t) log(p_x(t)/q_x(t)) − Σ_{x∈X} (p_x(t)/q_x(t)) q′_x(t)
= Σ_{x,y∈X} p_y(t) G_{y,x} log( p_x(t)q_y(t) / (q_x(t)p_y(t)) ) − Σ_{x,y∈X} (p_x(t)q_y(t)/q_x(t)) G_{y,x} + Σ_{x,y∈X} p_y(t) G_{y,x},

where the last equality follows from the fact that, since G is a rate matrix, Σ_{x∈X} G_{y,x} = 0 for all y ∈ X. Rearranging terms we have

d/dt R(p(t) ‖ q(t)) = − Σ_{x,y∈X: x≠y} ℓ( p_y(t)q_x(t) / (p_x(t)q_y(t)) ) (p_x(t)q_y(t)/q_x(t)) G_{y,x}.
Recall that ℓ ≥ 0, that for t > 0, q_x(t) > 0 and p_x(t) > 0 for all x ∈ X, and that G_{y,x} ≥ 0 for all x ≠ y. It follows that d/dt R(p(t) ‖ q(t)) ≤ 0. It remains to show that d/dt R(p(t) ‖ q(t)) = 0 if and only if p(t) = q(t). We claim this follows from the fact that ℓ ≥ 0 with ℓ(z) = 0 if and only if z = 1, and from the irreducibility of G. Indeed, p(t) = q(t) if and only if p_y(t)q_x(t)/(p_x(t)q_y(t)) = 1 for all x, y ∈ X with x ≠ y. Thus p(t) = q(t) implies d/dt R(p(t) ‖ q(t)) = 0. If d/dt R(p(t) ‖ q(t)) = 0 then immediately p_y(t)q_x(t)/(p_x(t)q_y(t)) = 1 for all x, y ∈ X such that G_{y,x} > 0. If y does not directly communicate with x then, by irreducibility, there is a chain of directly communicating states leading from y to x, and using those states it follows that p_y(t)q_x(t)/(p_x(t)q_y(t)) = 1.

If q(0) = π then, by stationarity, q(t) = π for all t ≥ 0. Lemma 3.1 then implies that the mapping

p ↦ R(p ‖ π)   (3.3)

is a local (and also global) Lyapunov function (cf. Definition 2.4 in [6]) for the linear forward equation (3.1) on any relatively open subset of S that contains π. This is, however, just one of many ways that relative entropy can be used to define Lyapunov functions. For example, Lemma 3.1 also implies

p ↦ R(π ‖ p)   (3.4)

is a local and global Lyapunov function for (3.1). Yet a third can be constructed as follows. Let T > 0 and consider the mapping

p ↦ R(p ‖ q_p(T)),   (3.5)

where q_p(·) is the solution to (3.1) with q_p(0) = p. Lemma 3.1 also implies that the mapping given by (3.5) is a Lyapunov function for (3.1). This is because R(p(t) ‖ q_{p(t)}(T)) = R(p(t) ‖ q(t)), where q(·) is the solution to (3.1) with q(0) = p(T), and thus q(t) = p(T + t) = q_{p(t)}(T). Note that (3.3) arises as the limit of (3.5) as T goes to infinity.

The proof of the descent property in Lemma 3.1 crucially uses the fact that p(·) and q(·) satisfy a forward equation with respect to the same fixed rate matrix, and therefore for general nonlinear Markov processes one does not expect relative entropy to serve directly as a Lyapunov function. However, one might conjecture this to be
true if the nonlinearity is in some sense weak, and a result of this type is presented in the companion paper [6] (see Section 3 therein). For more general settings our approach will be to consider functions such as those in (3.3) and (3.5) associated with the N-particle Markov processes and then take a suitable scaling limit as N → ∞. The issue is somewhat subtle, e.g., while this approach is feasible with the form (3.3) it is not feasible when the form (3.4) is used, even though both define Lyapunov functions in the linear case. For further discussion on this point we refer to Remark 4.4.
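The descent property of Lemma 3.1 survives a simple discretization: for small dt the matrix I + dt G is a stochastic matrix, so by the data processing inequality R(p(I + dt G) ‖ q(I + dt G)) ≤ R(p ‖ q) holds exactly at every forward-Euler step. A sketch with a randomly generated irreducible rate matrix:

```python
import math, random

def rel_entropy(p, q):
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def euler_step(p, G, dt):
    """One forward-Euler step of p' = p G; for small dt this is multiplication
    by the stochastic matrix I + dt G."""
    d = len(p)
    return [p[z] + dt * sum(p[y] * G[y][z] for y in range(d)) for z in range(d)]

rng = random.Random(1)
d = 4
# random irreducible rate matrix: positive off-diagonal entries, rows summing to zero
G = [[rng.uniform(0.5, 2.0) for _ in range(d)] for _ in range(d)]
for x in range(d):
    G[x][x] = -sum(G[x][y] for y in range(d) if y != x)

p = [0.7, 0.1, 0.1, 0.1]
q = [0.1, 0.1, 0.1, 0.7]
dt, prev = 1e-3, rel_entropy(p, q)
for _ in range(10000):
    p, q = euler_step(p, G, dt), euler_step(q, G, dt)
    cur = rel_entropy(p, q)
    assert cur <= prev + 1e-12   # descent, step by step, as in Lemma 3.1
    prev = cur
assert prev < 1e-4               # the two laws have nearly merged by t = 10
```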

Systems of Gibbs Type
In this section we evaluate the limit in (1.3) for a family of interacting N-particle systems with an explicit stationary distribution. This limit is shown to be a Lyapunov function in [6]. Section 4.1 introduces the class of weakly interacting Markov processes and the corresponding nonlinear Markov processes. The construction starts from the definition of the stationary distribution as a Gibbs measure for the N-particle system. In Section 4.2 we derive candidate Lyapunov functions for the limit systems as limits of relative entropy.

The prelimit and limit systems
Recall that X is a finite set with d ≥ 2 elements. Let K : X × R^d → R be such that for each x ∈ X, K(x, ·) is twice continuously differentiable. For (x, p) ∈ X × R^d, we often write K(x, p) as K_x(p). Consider the probability measure π^N on X^N defined by

π^N(x) .= (1/Z^N) exp(−U^N(x)),   (4.1)

where

U^N(x) .= Σ_{i=1}^N K(x_i, r^N(x)),   (4.2)

Z^N is the normalization constant, and r^N(x) is the empirical measure of x, defined in (2.1) (recall we identify an element of P(X) with a vector in S).
A particular example of K that has been extensively studied is given by

K_x(p) .= V(x) + β Σ_{y∈X} W(x, y) p_y,   (4.3)

where V : X → R is referred to as the environment potential, W : X × X → R the interaction potential, and β > 0 the interaction parameter. In this case U^N, referred to as the N-particle energy function, takes the form

U^N(x) = Σ_{i=1}^N V(x_i) + (β/N) Σ_{i=1}^N Σ_{j=1}^N W(x_i, x_j).

There are standard methods for identifying X^N-valued Markov processes for which π^N is the stationary distribution. The resulting rate matrices are often called Glauber dynamics; see, for instance, [29] or [24]. To be precise, we seek an X^N-valued Markov process which has the structure of a weakly interacting N-particle system and is reversible with respect to π^N.
Let (α(x, y))_{x,y∈X} be an irreducible and symmetric matrix with diagonal entries equal to zero and off-diagonal entries either one or zero. The matrix α will identify those states of a single particle that can be reached in one jump from any given state. For N ∈ N, define a matrix A^N = (A^N(x, y))_{x,y∈X^N} indexed by elements of X^N according to A^N(x, y) = α(x_l, y_l) if x and y differ in exactly one index l ∈ {1, . . ., N}, and A^N(x, y) = 0 otherwise. Then A^N determines which states of the N-particle system can be reached in one jump. Observe that A^N is symmetric and irreducible with values in {0, 1}. There are many ways one can define a rate matrix Ψ^N such that the corresponding Markov process is reversible with respect to π^N. Three standard ones are as follows. Let a_+ = max{a, 0}. For x, y ∈ X^N, x ≠ y, set either

Ψ^N(x, y) .= e^{−(U^N(y) − U^N(x))_+} A^N(x, y),   (4.4a)

or one of the corresponding alternatives (4.4b) and (4.4c). In all three cases set Ψ^N(x, x) .= −Σ_{y:y≠x} Ψ^N(x, y), x ∈ X^N. The model defined by (4.4a) is sometimes referred to as Metropolis dynamics, and (4.4b) as heat bath dynamics [24]. The matrix Ψ^N is the generator of an irreducible continuous-time finite-state Markov process with state space X^N. In what follows we will consider only (4.4a), the analysis for the other dynamics being completely analogous. Define

H_x(p) .= K_x(p) + Σ_{y∈X} p_y (∂K_y(p)/∂p_x), (x, p) ∈ X × S,   (4.5)

and Ψ : X × X × S → R by Ψ(x, y, p) .= H_y(p) − H_x(p). The following lemma shows that each Ψ^N in (4.4) is the infinitesimal generator of a family of weakly interacting Markov processes in the sense of Section 2.1. For example, with the dynamics (4.4a) it will follow from Lemma 4.1 that Γ^N_{x,y}(r) → e^{−(Ψ(x,y,r))_+} α(x, y) as N → ∞.

Lemma 4.1 There exists C < ∞ and for each N ∈ N a function B^N : X × X × P(X) → R satisfying

sup_{(x,y,p)∈X×X×P(X)} |B^N(x, y, p)| ≤ C/N   (4.6)

such that the following holds. Let x, y ∈ X^N be such that A^N(x, y) = 1, and let l ∈ {1, . . ., N} be the unique index such that x_l ≠ y_l. Then

U^N(y) − U^N(x) = Ψ(x_l, y_l, r^N(x)) + B^N(x_l, y_l, r^N(x)).   (4.7)

Proof. Write r = r^N(x), so that r^N(y) = r + (e_{y_l} − e_{x_l})/N. Using the definition of U^N we have

U^N(y) − U^N(x) = K_{y_l}(r^N(y)) − K_{x_l}(r) + Σ_{i≠l} [K_{x_i}(r^N(y)) − K_{x_i}(r)].

Since K(z, ·) is twice continuously differentiable for each z ∈ X, its first and second partial derivatives are bounded on S, and a Taylor expansion gives, for some c_2 ∈ (0, ∞),

| K_{x_i}(r^N(y)) − K_{x_i}(r) − (1/N)(∂K_{x_i}(r)/∂p_{y_l} − ∂K_{x_i}(r)/∂p_{x_l}) | ≤ c_2/N^2.

Also, since r^N_z(x) is the empirical measure, the sum over i ≠ l contributes Σ_{z∈X} r_z (∂K_z(r)/∂p_{y_l} − ∂K_z(r)/∂p_{x_l}) up to an error of order 1/N. Using the various definitions, and in particular (4.5), we obtain (4.7) with B^N(x_l, y_l, r) .= U^N(y) − U^N(x) − Ψ(x_l, y_l, r), and the bounds above show that (4.6) is satisfied for a suitable C < ∞.
From Lemma 4.1 we have that the jump rates of the Markov process governed by Ψ^N in each of the three cases in (4.4) depend on the components x_j, j ≠ l, only through the empirical measure r^N(x). For example, with Ψ^N as in (4.4a), for x, y ∈ X^N such that x_l ≠ y_l for some l ∈ {1, . . ., N} and x_j = y_j for j ≠ l,

Ψ^N(x, y) = e^{−(Ψ(x_l,y_l,r^N(x)) + B^N(x_l,y_l,r^N(x)))_+} A^N(x, y).
Thus Ψ^N as in (4.4) is the generator of a family of weakly interacting Markov processes in the sense of Section 2. Indeed for (4.4a), in the notation of that section, Ψ^N is defined in terms of the family of matrices {Γ^N(r)}_{r∈P(X)}, where for x, y ∈ X, x ≠ y,

Γ^N_{x,y}(r) = e^{−(Ψ(x,y,r) + B^N(x,y,r))_+} α(x, y).   (4.10)

The rate matrix Ψ^N in (4.4) has π^N defined in (4.1) as its stationary distribution. To see this, let x, y ∈ X^N. By symmetry, A^N(x, y) = A^N(y, x). Taking into account (4.1), it is easy to see that for any of the three choices of Ψ^N according to (4.4) we have π^N(x)Ψ^N(x, y) = π^N(y)Ψ^N(y, x). Thus Ψ^N satisfies the detailed balance condition with respect to π^N, and since Ψ^N is irreducible, π^N is its unique stationary distribution.
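Reversibility can be verified exhaustively for a small system. The sketch below assumes the Metropolis form (4.4a) and the energy associated with K of the form (4.3); the potentials V and W and the parameter β are arbitrary illustrative choices, and α is taken to be one off the diagonal, so that A^N connects configurations differing in exactly one coordinate.

```python
import math
from itertools import product

X, N, beta = range(3), 3, 1.0
V = [0.0, 0.5, 1.0]                 # hypothetical environment potential
W = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]               # hypothetical symmetric interaction potential

def U(x):
    """N-particle energy for K of the form (4.3):
    U^N(x) = sum_i V(x_i) + (beta/N) sum_{i,j} W(x_i, x_j)."""
    return (sum(V[xi] for xi in x)
            + beta / N * sum(W[xi][xj] for xi in x for xj in x))

def A(x, y):
    """A^N(x, y) = 1 iff x and y differ in exactly one coordinate."""
    return 1 if sum(a != b for a, b in zip(x, y)) == 1 else 0

def Psi(x, y):
    """Metropolis rates (4.4a): Psi^N(x, y) = e^{-(U(y) - U(x))_+} A^N(x, y)."""
    return math.exp(-max(U(y) - U(x), 0.0)) * A(x, y)

Z = sum(math.exp(-U(x)) for x in product(X, repeat=N))
pi = {x: math.exp(-U(x)) / Z for x in product(X, repeat=N)}

for x in product(X, repeat=N):      # exhaustive check of detailed balance
    for y in product(X, repeat=N):
        assert abs(pi[x] * Psi(x, y) - pi[y] * Psi(y, x)) < 1e-12
```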
Hence by (4.6), the family {Γ^N(r)}_{r∈P(X)} defined by (4.10) satisfies Condition 2.1 with

Γ_{x,y}(r) = e^{−(Ψ(x,y,r))_+} α(x, y), x ≠ y, r ∈ P(X).   (4.11)

With X^N and μ^N associated with Γ^N(·) as in Section 2.1, Theorem 2.2 implies the sequence {μ^N}_{N∈N} of D([0, ∞) : P(X))-valued random variables satisfies a law of large numbers with limit determined by (1.2), with Γ(·) as in (4.11). More precisely, if μ^N(0) converges in distribution to q ∈ P(X) as N goes to infinity, then μ^N(·) converges in distribution to the solution p(·) of (1.2) with p(0) = q. Thus Γ(·) describes the limit model for the families of weakly interacting Markov processes of Gibbs type introduced above. If p ∈ P(X) is fixed then Γ(p) is the generator of an ergodic finite-state Markov process, and the unique invariant distribution on X is given by π(p) with

π_x(p) .= exp(−H_x(p))/Z(p), x ∈ X,   (4.12)

where Z(p) .= Σ_{x∈X} exp(−H_x(p)).
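Fixed points of the limit dynamics of Gibbs type solve π(p) = p. The sketch below iterates p ← π(p) using the explicit form H_x(p) = V(x) + 2β Σ_z W(x, z) p_z, which is what (4.5) reduces to for K of the form (4.3) with symmetric W (an assumption of this sketch); for small β the map is a contraction and the iteration converges to the unique fixed point.

```python
import math

V = [0.0, 0.5, 1.0]                 # hypothetical potentials
W = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]               # symmetric interaction potential
beta = 0.2                          # weak interaction

def H(x, p):
    # assumed mean-field energy for K of the form (4.3) with symmetric W:
    # H_x(p) = V(x) + 2 beta sum_z W(x, z) p_z
    return V[x] + 2 * beta * sum(W[x][z] * p[z] for z in range(len(p)))

def pi_of(p):
    """pi_x(p) = exp(-H_x(p)) / Z(p), as in (4.12)."""
    w = [math.exp(-H(x, p)) for x in range(len(p))]
    Z = sum(w)
    return [wx / Z for wx in w]

p = [1/3, 1/3, 1/3]
for _ in range(200):                # fixed-point iteration for pi(p) = p
    p = pi_of(p)
assert abs(sum(p) - 1.0) < 1e-12
assert max(abs(a - b) for a, b in zip(p, pi_of(p))) < 1e-10
```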

Limit of relative entropies
We will now evaluate the limit in (1.3) for the family of interacting N-particle systems introduced in Section 4.1. As noted earlier, the paper [6] will study the Lyapunov function properties of the limit.
Thus by Sanov's theorem and Varadhan's theorem on the asymptotic evaluation of exponential integrals [10], it follows that the limit

C .= lim_{N→∞} (1/N) log Z^N

exists. Note that C is finite and does not depend on p.
Recalling that X_1 is a random variable with distribution p, we have on combining these observations that

lim_{N→∞} (1/N) R(⊗^N p ‖ π^N) = Σ_{x∈X} p_x log p_x + Σ_{x∈X} K_x(p) p_x + C.

This proves (4.14) and completes the proof.
As an immediate consequence we get the following result for K as in (4.3).
Corollary 4.3 Suppose that K is defined by (4.3) and let F_N be as in (4.13). Then

lim_{N→∞} F_N(p) = Σ_{x∈X} p_x log p_x + Σ_{x∈X} V(x) p_x + β Σ_{x,y∈X} W(x, y) p_x p_y + C.

One may conjecture that an analogous descent property holds for the function F obtained by taking limits of relative entropies computed in the reverse order, namely for the function

F(p) .= lim_{N→∞} (1/N) R(π^N ‖ ⊗^N p).   (4.17)
However, in general, this is not true, as the following example illustrates. Consider the setting where K is given by (4.3) with environment potential V ≡ 0, β = 1, and non-constant symmetric interaction potential W with W ≥ 0 and W(x, x) = 0 for all x ∈ X. Then, by (4.12), the invariant distributions are given by

π_x(p) = exp(−2 Σ_{z∈X} W(x, z) p_z)/Z(p), x ∈ X,

and the family of rate matrices (Γ(p))_{p∈P(X)} is of the form (4.11), with Ψ(x, y, p) .= 2 Σ_{z∈X} (W(y, z) − W(x, z)) p_z. Suppose W is such that there exists a unique solution π* ∈ P(X) to the fixed point equation π(p) = p. Then using the same type of calculations as those used to prove Theorem 4.2, one can check that F is well defined and takes the form

F(p) = R(π* ‖ p) + C,

for some finite constant C ∈ R that depends on π* (but not on p). Thus the proposed Lyapunov function is relative entropy with the independent variable in the second position, and the dynamics are of the form (1.2) for Γ that is not constant. While R(π* ‖ p) satisfies the descent property for constant ergodic rate matrices Γ such that π*Γ = 0, this property is not valid in any generality when Γ depends on p, and one can then easily construct examples for which the function F defined above does not enjoy the descent property.

General Weakly Interacting Systems
The analysis of Section 4 crucially relied on the fact that the stationary distributions for systems of Gibbs type take an explicit form. In general, when the form of π^N is not known, evaluation of the limit in (1.3) becomes infeasible. A natural approach then is to consider the function in (1.4) and to evaluate the quantity lim_{t→∞} lim_{N→∞} F^N_t(q). In this section we will consider the problem of evaluating the inner limit, i.e., lim_{N→∞} F^N_t(q). We will show that this limit, denoted by J_t(q), exists quite generally. In [6] we will study properties of the candidate Lyapunov function lim_{t→∞} J_t(q).
To argue the existence of lim_{N→∞} F^N_t(q) and to identify the limit we begin with a general result. Fix ε > 0. By the assumption that r^N under Q^N satisfies a locally uniform LDP with rate function J, and that J(q) < ∞, there exists N_0 < ∞ such that for all N ≥ N_0,

e^{−N(J(q)+ε)} ≤ Q^N(y : r^N(y) = q_N) ≤ e^{−N(J(q)−ε)}.   (5.7)

Next, as in Theorem 4.2, let ν denote the uniform measure on X and let

Q^N_0 .= ⊗^N ν;   (5.8)

then r^N under Q^N_0 satisfies a locally uniform LDP with rate function R(· ‖ ν). Indeed, elementary combinatorial arguments (see, for example, Lemma 2.1.9 of [9]) show that for every N ∈ N,

(N + 1)^{−|X|} e^{−N R(q_N ‖ ν)} ≤ Q^N_0(y : r^N(y) = q_N) ≤ e^{−N R(q_N ‖ ν)}.   (5.9)

Since ν is the uniform measure on X, R(q_N ‖ ν) = Σ_{x∈X} q_{N,x} log q_{N,x} + log |X|. The locally uniform LDP of r^N under Q^N_0 then follows from the continuity of the rate function and the fact that (1/N) log (N + 1)^{−|X|} → 0 as N → ∞. These relations imply there exists Ñ_0 < ∞ such that for all N ≥ Ñ_0,

e^{−N(R(q ‖ ν)+ε)} ≤ C_N(q_N) |X|^{−N} ≤ e^{−N(R(q ‖ ν)−ε)},

where C_N(q_N) denotes the number of y ∈ X^N with r^N(y) = q_N.
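The two-sided combinatorial bound (5.9) can be checked numerically: under Q^N_0 the probability that r^N equals a given type q_N is a multinomial coefficient times |X|^{−N}. A sketch verifying the bound for a few types with |X| = 3:

```python
import math

def rel_entropy(p, q):
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def type_probability(counts):
    """Q_0^N(r^N = q_N) for i.i.d. uniform coordinates: the multinomial
    coefficient C_N(q_N) times |X|^{-N}."""
    N, dX = sum(counts), len(counts)
    coef = math.factorial(N)
    for c in counts:
        coef //= math.factorial(c)
    return coef * dX ** (-float(N))

for counts in [(3, 2, 5), (4, 3, 3), (7, 7, 6)]:
    N, dX = sum(counts), len(counts)
    qN = [c / N for c in counts]
    nu = [1 / dX] * dX
    prob = type_probability(counts)
    ent = math.exp(-N * rel_entropy(qN, nu))
    # the two-sided bound of (5.9)
    assert (N + 1) ** (-dX) * ent <= prob <= ent
```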
the inverted limit. Note that in general, this limit will depend on p^N(0) through J_0. We will use this limit, and in particular the form (5.11), in two ways. The first is to derive an analytic characterization for the limit as t → ∞ of J_t(q). This characterization will be used in [6], together with insights into the structure of candidate Lyapunov functions obtained from the Gibbs models of Section 4, to identify and verify that candidate Lyapunov functions for various classes of models actually are Lyapunov functions. The second use is to directly connect these limits of relative entropies with the Freidlin-Wentzell quasipotential related to the processes μ^N.
The quasipotential provides another approach to the construction of candidate Lyapunov functions, but one based on notions of "energy conservation" and related variational methods, and with no a priori connection with the descent property of relative entropies for linear Markov processes. In the rest of this section we further compare these approaches. Suppose that π* is a (locally) stable equilibrium point for p′ = pΓ(p), so that there is a relatively open subset D ⊂ S with π* ∈ D such that if p(0) ∈ D then the solution to p′ = pΓ(p) satisfies p(t) → π* as t → ∞. From [23,4,11] it follows that if a deterministic sequence μ^N(0) converges to p(0) ∈ S, then for each T ∈ (0, ∞), {μ^N(t)}_{0≤t≤T} satisfies a LDP in D([0, T] : S) with rate function equal to ∫_0^T L(φ(s), φ̇(s)) ds if φ(·) is absolutely continuous with φ(0) = p(0), and equal to ∞ otherwise. The Freidlin-Wentzell quasipotential associated with the large time, large N behavior of {μ^N(t)} and with respect to the initial condition π* is defined by

V_{π*}(q) = inf { ∫_0^T L(φ(s), φ̇(s)) ds : φ(0) = π*, φ(T) = q, T ∈ (0, ∞) },

where the infimum is over all absolutely continuous φ : [0, T] → S.
Next suppose J_0 is a rate function that is consistent with the weak convergence of r^N under p^N(0) to π* as N → ∞. One example is J̄_0(r) .= R(r ‖ π*), which corresponds to p^N(0) equal to product measure with marginals all equal to π*. A second example is J^{π*}_0(r) .= 0 when r = π* and ∞ otherwise, which corresponds to a "nearly deterministic" initial condition p^N(0). To simplify we will consider just J^{π*}_0. All other choices of J_0 bound J^{π*}_0 from below and, while leading to other candidate Lyapunov functions, they will also bound the one corresponding to J^{π*}_0 from below. Using the fact
