Large deviations for configurations generated by Gibbs distributions with energy functionals consisting of singular interaction and weakly confining potentials

We establish large deviation principles (LDPs) for empirical measures associated with a sequence of Gibbs distributions on n-particle configurations, each of which is defined in terms of an inverse temperature β_n and an energy functional consisting of a (possibly singular) interaction potential and a (possibly weakly) confining potential. Under fairly general assumptions on the potentials, we use a common framework to establish LDPs both with speeds β_n satisfying β_n/n → ∞, in which case the rate function is expressed in terms of a functional involving the potentials, and with speed β_n = n, in which case the rate function contains an additional entropic term. Such LDPs are motivated by questions arising in random matrix theory, sampling, simulated annealing and asymptotic convex geometry. Our approach, which uses the weak convergence method developed by Dupuis and Ellis, establishes LDPs with respect to stronger Wasserstein-type topologies. Our results address several interesting examples not covered by previous works, including the case of a weakly confining potential, which allows for rate functions with minimizers that do not have compact support, thus resolving several open questions raised in a work of Chafaï et al.


Description of problem and contributions
EJP 25 (2020), paper 46.

We consider configurations of a finite number of $\mathbb{R}^d$-valued particles that are subject to an external force consisting of a confining potential $V : \mathbb{R}^d \to (-\infty, +\infty]$ that acts on each particle and a pairwise interaction potential $W : \mathbb{R}^d \times \mathbb{R}^d \to (-\infty, +\infty]$. For every $n \in \mathbb{N}$, we define a Hamiltonian or energy functional $H_n : \mathbb{R}^{dn} \to (-\infty, +\infty]$ that assigns to every $\mathbb{R}^{dn}$-valued configuration $\mathbf{x}^n = (x_1, x_2, \ldots, x_n)$ of $n$ particles the energy

$H_n(\mathbf{x}^n) = H_n(x_1, \ldots, x_n)$ : (1.1)

Also, for any $n$-particle configuration $\mathbf{x}^n = (x_1, x_2, \ldots, x_n)$, let $L_n(\mathbf{x}^n; \cdot)$ be the associated empirical measure:

$$L_n(\mathbf{x}^n; \cdot) := \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(\cdot), \qquad (1.2)$$

where $\delta_y$ represents the Dirac delta mass at $y \in \mathbb{R}^d$. Given a separable metric space $S$, let $\mathcal{B}(S)$ denote the collection of Borel subsets of $S$, and let $\mathcal{P}(S)$ denote the space of probability measures on $(S, \mathcal{B}(S))$. Note that for every $\mathbf{x}^n \in \mathbb{R}^{dn}$, $L_n(\mathbf{x}^n; \cdot)$ lies in $\mathcal{P}(\mathbb{R}^d)$, where $\mathbb{R}^d$ is equipped with the usual Euclidean metric. If $\mathbf{x}^n$ is random and each component of $\mathbf{x}^n$ has a law that is absolutely continuous with respect to a measure with no atoms (which will be true in this article), $H_n$ can be rewritten in terms of $L_n$ as follows:

$W(x, y)\, L_n(\mathbf{x}^n; dx)\, L_n(\mathbf{x}^n; dy)$ $(W(x, y) + V(x) + V(y))\, L_n(\mathbf{x}^n; dx)\, L_n(\mathbf{x}^n; dy)$, (1.3)

where for $k \in \mathbb{N}$ and a set $A \subset \mathbb{R}^{kd}$, the symbol $A_{\neq}$ denotes the set of points in $A$ whose $d$-dimensional components are all distinct:

$$A_{\neq} := A \setminus \left\{ (x_1, \ldots, x_k) \in \mathbb{R}^{kd} : x_i = x_j \text{ for some } 1 \le i < j \le k \right\}.$$
(1.4)

Let $\{\beta_n\}$ be a sequence of positive numbers diverging to infinity, which can be interpreted as a sequence of inverse temperatures, and for each $n \in \mathbb{N}$, let $P_n \in \mathcal{P}(\mathbb{R}^{dn})$ be the probability measure given by

$$P_n(dx_1, \ldots, dx_n) := \frac{\exp(-\beta_n H_n(x_1, \ldots, x_n))}{Z_n}\, \ell(dx_1) \cdots \ell(dx_n), \qquad (1.5)$$

where $\ell$ is a σ-finite measure on $\mathbb{R}^d$ that has no atoms and acts as a reference measure, and $Z_n$ is the normalization constant (also referred to as the partition function) given by

$$Z_n := \int_{\mathbb{R}^{dn}} \exp(-\beta_n H_n(x_1, \ldots, x_n))\, \ell(dx_1) \cdots \ell(dx_n). \qquad (1.6)$$

Measures of the form (1.5) arise in a variety of contexts. For the case when $\ell$ is Lebesgue measure on $\mathbb{R}^d$, it is well known that if $W$ and $V$ are sufficiently smooth, then $P_n$ is the invariant distribution of a reversible Markov diffusion on $\mathbb{R}^{dn}$ (with identity diffusion matrix and drift proportional to $\nabla H_n$), which can be viewed as describing the dynamics of $n$ interacting Brownian particles in $\mathbb{R}^d$ [17, Chapter 5]. On the other hand, for particular choices of $d$, $V$ and $W$, $P_n$ arises as the law of the spectrum of various random matrix ensembles, including the so-called β-ensemble as well as certain random normal matrices (see Section 1.5.7 of [9] for details).
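As a numerical aside, a measure of the form (1.5) can be sampled approximately when the reference measure is Lebesgue measure. The following is a minimal sketch in dimension d = 1 that uses a random-walk Metropolis chain, which is a substitute for (not an implementation of) the reversible diffusion mentioned above. The names `toy_energy` and `metropolis_gibbs`, the proposal step size, and the toy energy itself, including its prefactors, are illustrative assumptions and are not taken from (1.1).

```python
import math
import random


def toy_energy(x):
    """Hypothetical energy: quadratic confinement plus logarithmic repulsion.

    The prefactors 1/n and 1/n^2 are illustrative assumptions.
    Returns math.inf when two particles coincide.
    """
    n = len(x)
    confinement = sum(xi * xi for xi in x) / n
    interaction = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(x[i] - x[j])
            if gap == 0.0:
                return math.inf
            interaction -= math.log(gap)
    return confinement + interaction / n**2


def metropolis_gibbs(energy, x0, beta, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis chain whose target density (w.r.t. Lebesgue
    measure) is proportional to exp(-beta * energy(x)).

    Returns the final configuration and the fraction of accepted moves.
    """
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    accepted = 0
    for _ in range(n_steps):
        i = rng.randrange(len(x))            # move one particle at a time
        y = list(x)
        y[i] = x[i] + rng.gauss(0.0, step)   # symmetric Gaussian proposal
        e_new = energy(y)
        # Accept with probability min(1, exp(-beta * (e_new - e))).
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            x, e = y, e_new
            accepted += 1
    return x, accepted / n_steps


if __name__ == "__main__":
    n = 8
    start = [float(i) for i in range(n)]
    sample, rate = metropolis_gibbs(toy_energy, start, beta=n * n, n_steps=2000)
    print(sorted(sample), rate)
```

Passing the energy as a callable keeps the sampler independent of any particular choice of V and W; only the ratio of target densities enters the acceptance step, so the partition function in (1.6) is never needed.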
Given $P_n \in \mathcal{P}(\mathbb{R}^{dn})$ as in (1.5), let $Q_n = (L_n)_{\#} P_n$ be the measure induced on $\mathcal{P}(\mathbb{R}^d)$ by pushing $P_n$ forward under the mapping $L_n : \mathbb{R}^{dn} \to \mathcal{P}(\mathbb{R}^d)$ defined in (1.2) (see Definition 2.3 for the definition of $(L_n)_{\#}$). The aim of this paper is to establish large deviation principles (LDPs) for sequences $\{Q_n\}$ under general conditions on $V$ and $W$ that allow $V$ and $W$ to be not only unbounded, but also highly irregular. We apply the weak convergence methods developed in [13] to provide results both for the case $\beta_n = n$ (see Theorem 2.9) and for the case $\lim_{n \to \infty} \beta_n / n = \infty$ (see Theorem 2.13). For the reader unfamiliar with the weak convergence approach to large deviations, we provide a brief outline in Appendix A.
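To make the role of the empirical measure (1.2) and of the off-diagonal set (1.4) concrete, here is a small sketch of how integrals against $L_n$ and against $L_n \otimes L_n$ restricted to off-diagonal pairs reduce to finite sums. The function names are hypothetical, and no prefactor from (1.1) or (1.3) is assumed; the code only evaluates integrals of user-supplied functions.

```python
def integrate_empirical(f, xs):
    """Integral of f against the empirical measure L_n of the points xs,
    that is, (1/n) * sum_i f(x_i)."""
    return sum(f(x) for x in xs) / len(xs)


def integrate_off_diagonal(g, xs):
    """Integral of g against L_n x L_n restricted to pairs (x, y) with x != y.

    Pairs of coinciding points are skipped before g is evaluated, so g may be
    singular on the diagonal. When all points are distinct this equals
    (1/n^2) * sum over i != j of g(x_i, x_j).
    """
    n = len(xs)
    total = sum(
        g(xs[i], xs[j])
        for i in range(n)
        for j in range(n)
        if xs[i] != xs[j]
    )
    return total / n**2


if __name__ == "__main__":
    points = [0.0, 1.0, 3.0]
    print(integrate_empirical(lambda x: x * x, points))
    print(integrate_off_diagonal(lambda x, y: abs(x - y), points))
```

For distinct points, the off-diagonal integral of $V(x) + V(y)$ equals $2(1 - 1/n)$ times the integral of $V$ against $L_n$, which is one way to see how a confining term can be moved between the single and double integrals up to a factor that tends to one.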
To the best of our knowledge, the most general result in the direction of Theorem 2.13 is [9, Theorem 1.1]. The latter seems to be the first paper to present a general approach to proving LDPs for empirical measures generated by Gibbs distributions when the inverse temperatures $\beta_n$ diverge faster than $n$, the number of particles (the particular case of $\beta_n = n^2$ was considered earlier in [4]). Our theorems recover existing results (see Example 2.15) and also extend prior results in the following directions, in particular resolving several open questions raised in [9]:
1. First, whereas the result in [9, Theorem 1.1] considers only speeds $\{\beta_n\}$ that satisfy $\lim_{n \to \infty} \beta_n / (n \log n) = \infty$, we allow for any speed diverging faster than $n$, including the speed $n^2$ considered in [4], thus showing that the growth rate condition of [9] is a technical one related to the combinatorial approach used in the proofs therein.
2. Second, we consider more general confining potentials V . In particular, the assumptions imposed in [9] only allow for large deviation rate functions whose minimizers have compact support. The minimizer of the rate function in the LDP (when unique) identifies the limiting equilibrium measure of the particles. On the other hand, LDPs with rate functions that have minimizers that are not compactly supported arise in the context of simulated annealing algorithms that are designed to sample from the minimizer of the rate function (in which case the sequence {β n } is referred to as a cooling schedule); see, e.g., [9,Section 1.5.8]. In this article, we impose weaker assumptions that, unlike in [9], allow for confining potentials V that can be weak (that is, with the limit of the corresponding sequence of empirical measures being non-compactly supported), discontinuous (see Examples 2.16 and 2.21), unbounded, and possibly not even locally integrable. In particular, the potential V is allowed to even be zero in a non-trivial unbounded domain, provided that the volume of the domain outside a ball of radius R around the origin decreases "sufficiently fast" as R goes to infinity. This allows us to consider cases where the particles are not confined in a bounded set, and in particular leads to examples with minimizers that do not have compact support (see Example 2.16), thus addressing the open question raised in [9,Section 1.5.1]. It appears that this has only been previously studied in the case of R 2 with the logarithmic Coulomb interaction potential (see Remark 2.20).
3. Third, the freedom of choice for the reference measure $\ell$ allows the study of Gibbs distributions defined on sets that have zero d-dimensional Lebesgue measure such as, for example, a non-smooth surface in $\mathbb{R}^3$ or a fractal set like the Cantor dust in $\mathbb{R}^2$. A more specific example that often appears in complex potential theory is the case where $W$ is the Coulomb potential and $\ell$ is Lebesgue measure on some 1-dimensional subset of the complex plane $\mathbb{C}$ such as the unit circle.
4. Furthermore, we establish these LDPs not only with respect to the weak topology, but also with respect to a family of stronger topologies that include the p-Wasserstein topologies for p ≥ 1, thus resolving another open question raised in [9, Section 1.5.6]. The LDPs with respect to stronger topologies are used in Lemma 3.4 of [22] (see also [21]) to show that, for a large class of Hamiltonians, the sequence $\{\mathbf{x}^n\}_{n \in \mathbb{N}}$ of Gibbs configurations satisfies the so-called "asymptotic thin-shell condition", which is of relevance in asymptotic geometric analysis and high-dimensional probability. Specifically, this condition stipulates that the sequence of scaled Euclidean norms of the random vectors satisfies an LDP, and was shown in [22] (see Theorems 2.4, 2.6 and 2.8 therein) to imply that the corresponding sequences of multi-dimensional random projections of the random vectors also satisfy an LDP. This can be viewed as a non-universal large deviation counterpart of the universality result of [2], which states that random projections of a high-dimensional measure whose Euclidean norms satisfy a certain concentration property called the "thin shell condition" have Gaussian fluctuations. As first observed in [16,15]

It is also worthwhile to mention that, in contrast to prior works, in this work the LDPs for all speeds and topologies are established using a common methodology.

Discussion of related recent results
This article is a substantial generalization of its first version [14] and also resolves a minor technical issue therein. It contains significant extensions, the most important of which is to allow weakly confining potentials $V$. In the case $\lim_{n \to \infty} \beta_n / n = \infty$ we show that although the entropic term disappears in the limit, its appearance in the pre-limit can guarantee the validity of the LDP in some cases when $V$ does not satisfy $\lim_{\|x\| \to \infty} V(x) = \infty$ (see Section 5.4). This result also highlights the intuitive nature of weak convergence methods, and more specifically the use of representations that are connected to the method, like the one in (5.4). In addition, compared to the original version [14], in the present article the illustrative examples have been significantly extended (see Section 2.4) to include cases where $V$ and $W$ are not continuous, and heuristic arguments in [14] related to the examples have been replaced here with rigorous proofs. Finally, some open problems have also been added, which can generate new directions for research.
Since the first version [14] appeared, several authors have extended our work or used some of the arguments. In [18] an LDP for a sequence of point processes defined by Gibbs measures on a compact orientable two-dimensional Riemannian manifold is studied. In [5,19], the connection between LDPs and Γ-convergence that was first highlighted in [23], and subsequently implicitly exploited in our work, was further explored. We believe that this is a very natural connection and hypothesize that it can also lead to some new insights in the case where Assumption 2.10(2) is not satisfied. More recent work that appeared after the present version of this article was posted includes [10], where the particular case of the Coulomb potential in dimension d = 2 is studied.

Assumptions and main results
This section is devoted to stating and discussing our main assumptions and results. Section 2.1 introduces basic definitions and notation used throughout the article, and Sections 2.2 and 2.3 present the main results for the cases $\beta_n = n$ and $\beta_n / n \to \infty$, respectively. Corollaries of the main results and illustrative examples are presented in Section 2.4, and an outline of the rest of the article is given in Section 2.5. First, we start by stating the assumptions on the potentials $V$ and $W$ that will hold throughout.
Assumption 2.1 guarantees that the Gibbs distribution given in (1.5) is well defined. More precisely, (2.1) guarantees that the measure is well defined and finite, and (2.2) guarantees that the measure is not trivial.

Remark 2.2.
Under Assumption 2.1, without loss of generality, we can assume that $\int_{\mathbb{R}^d} e^{-(1-a)V(x)}\, \ell(dx) = 1$, because any constant added to $V$ can be absorbed into $Z_n$; see (1.5). With some abuse of notation, we use $e^{-(1-a)V}$ to denote the probability measure $e^{-(1-a)V(x)}\, \ell(dx)$.

Notation and definitions
In this section, we provide some necessary definitions. Although some of the definitions will be given in their more general form, we would like to clarify that for our results, the underlying space $S$ will be a separable metric space without any assumptions about its completeness. We recall the standard definition of the pushforward operator $\#$.

Definition 2.3. Given measurable spaces $(S, \mathcal{F})$ and $(\tilde{S}, \tilde{\mathcal{F}})$, a measurable mapping $f : S \to \tilde{S}$ and a measure $\mu : \mathcal{F} \to [0, \infty]$, the pushforward of $\mu$ is the measure induced on $(\tilde{S}, \tilde{\mathcal{F}})$ by $\mu$ under $f$, that is, the measure $f_{\#}\mu : \tilde{\mathcal{F}} \to [0, \infty]$ is given by

$$f_{\#}\mu(A) := \mu\left(f^{-1}(A)\right), \qquad A \in \tilde{\mathcal{F}}.$$

In other words, $f_{\#}\mu$ is the image measure of $\mu$ under $f$.

We next recall the definition of a rate function on a separable metric space $S$.
Note that a function that satisfies the properties in Definition 2.4 is sometimes referred to as a good rate function in the literature, as a way to highlight the compactness of the level sets and to distinguish it from lower semicontinuous functions, which can be defined by the property of having closed level sets and which can in some cases also provide large deviation rates of decay. When not in the context of LDPs, a function that has the properties stated in Definition 2.4 is also called a tightness function, a term that will be used extensively in the sequel. In contrast to many previous applications of weak convergence methods in large deviations, here we do not assume that $S$ is complete. This will be convenient when dealing with topologies other than the weak topology.
We now recall the definition of an LDP for a sequence of probability measures on (S, B(S)).
where $E^{\circ}$ and $\bar{E}$ denote the interior and closure of $E$, respectively.
Let $C(\mathbb{R}^d)$ be the space of continuous functions on $\mathbb{R}^d$, and let $C_b(\mathbb{R}^d)$ denote the subspace of bounded functions in $C(\mathbb{R}^d)$. We endow $\mathcal{P}(\mathbb{R}^d)$ with the topology of weak convergence and use $\xrightarrow{w}$ to denote convergence with respect to this topology; recall that $\mu_n \xrightarrow{w} \mu$ if and only if $\int f\, d\mu_n \to \int f\, d\mu$ for every $f \in C_b(\mathbb{R}^d)$. The Lévy-Prohorov metric $d_w$ metrizes the weak topology on $\mathcal{P}(\mathbb{R}^d)$, and the space $(\mathcal{P}(\mathbb{R}^d), d_w)$ is Polish (see, e.g., Theorem 5 of Appendix III of [6]). We also consider stronger topologies, parameterized by functions belonging to the following set:

The space $(\mathcal{P}_\psi(\mathbb{R}^d), d_\psi)$ is a separable metric space (see Lemma C.1 for a proof).

Remark 2.6. (Alternative metrizations of the Wasserstein topology) When $\psi(x) = \|x\|^p$ for $p \in [1, \infty)$, with $x \in \mathbb{R}^d$ and $\|\cdot\|$ denoting the Euclidean norm, $d_\psi$ induces the p-Wasserstein topology (see [1, Remark 7.1.11]). Another metric that is commonly used to induce the p-Wasserstein topology is

$$d_p(\mu, \nu) := \left( \inf_{\zeta \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^p\, \zeta(dx, dy) \right)^{1/p},$$

where $\Pi(\mu, \nu)$ is the set of all probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with first marginal $\mu$ and second marginal $\nu$. Although $\mathcal{P}_\psi(\mathbb{R}^d)$ endowed with $d_p$ is complete and separable, we use the somewhat simpler metric $d_\psi$ defined for any $\psi$ satisfying (2.3), under which $\mathcal{P}_\psi(\mathbb{R}^d)$ is only separable, and not complete. For more information on the Wasserstein distance and its topological properties, the reader is referred to [28]; specifically, see Theorem 6.8 therein.
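For intuition about the coupling-based metric in Remark 2.6, the infimum over couplings can be computed in closed form in one special case: two empirical measures on the real line with the same number of atoms, where for p ≥ 1 an optimal coupling matches the sorted points in order. The sketch below covers only this special case (d = 1, equal sample sizes); the function name is a hypothetical choice, and it does not compute the metric $d_\psi$ used in this article.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """p-Wasserstein distance (p >= 1) between the empirical measures of two
    equal-size samples on the real line, via the sorted (monotone) coupling."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if p < 1.0:
        raise ValueError("p must be at least 1")
    n = len(xs)
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


if __name__ == "__main__":
    print(wasserstein_1d([0.0, 1.0, 2.0], [2.5, 0.5, 1.5], p=2.0))
```

In higher dimensions no such sorting shortcut is available, and the infimum is a genuine optimal transport problem.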

Results in the case β n = n
Our first result concerns the LDP for $\{Q_n\}$ with speed $\alpha_n = \beta_n = n$. The rate function is expressed in terms of the following functionals. Given $a \in [0, 1]$, for $\zeta \in \mathcal{P}($

Also, for a measure $\nu \in \mathcal{P}(\mathbb{R}^d)$, as usual we define the relative entropy functional by

$$R(\mu | \nu) := \begin{cases} \displaystyle\int_{\mathbb{R}^d} \log\left(\frac{d\mu}{d\nu}\right) d\mu & \text{if } \mu \ll \nu, \\ +\infty & \text{otherwise,} \end{cases}$$

where $\mu \ll \nu$ denotes that $\mu$ is absolutely continuous with respect to $\nu$. Then, for $\mu \in \mathcal{P}(\mathbb{R}^d)$ let

and

$$I(\mu) := R\left(\mu \,\middle|\, e^{-(1-a)V}\right) + J_a(\mu). \qquad (2.8)$$

Note that the lack of subscript on $I$ in (2.8) is justified because, as the following easily verifiable relation shows, $I$ does not depend on the constant $a$:

and $\mathcal{W}(\mu)$ : (2.10)

To establish the LDP with respect to the stronger topologies induced by $d_\psi$, $\psi \in \Psi$, we will need an additional condition, which we now state. Let $\Phi$ be the class of functions defined by (2.11)

Assumption 2.7. There exists a lsc function $\gamma : \mathbb{R}^d \to \mathbb{R}$ of the form $\gamma(x) = \phi(\psi(x))$, for some $\phi \in \Phi$, such that for the constant $a$ in Assumption 2.1 and every $\mu \in \mathcal{P}(\mathbb{R}^d)$, we have (2.12)

The following lemma provides a more easily verifiable sufficient condition under which Assumption 2.7 holds; its proof is deferred to Appendix B. Recall the set $\Psi$ defined in (2.3).
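As a concrete illustration of the relative entropy functional that enters the rate function $I$, the following sketch evaluates it for probability vectors on a finite set, where the Radon-Nikodym derivative is just a ratio of weights. This finite setting and the function name are assumptions made for illustration; the article works with measures on $\mathbb{R}^d$.

```python
import math


def relative_entropy(mu, nu):
    """Relative entropy R(mu | nu) for probability vectors on a finite set.

    Returns math.inf when mu is not absolutely continuous with respect to nu,
    i.e. when mu charges a point to which nu assigns zero mass.
    """
    total = 0.0
    for m, v in zip(mu, nu):
        if m == 0.0:
            continue          # convention: 0 * log(0 / v) = 0
        if v == 0.0:
            return math.inf   # mu is not absolutely continuous w.r.t. nu
        total += m * math.log(m / v)
    return total


if __name__ == "__main__":
    print(relative_entropy([0.5, 0.5], [0.25, 0.75]))
```

The value is zero exactly when the two vectors coincide and is infinite when absolute continuity fails, mirroring the two branches of the definition.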
We now state our first main result, whose proof is given in Section 4.

Theorem 2.9. Let $V$ and $W$ satisfy Assumption 2.1, and for $n \in \mathbb{N}$, let $\beta_n = n$, let $P_n$ be defined as in (1.5) and let $Q_n = (L_n)_{\#} P_n$. Then $\{Q_n\}$ satisfies an LDP on $(\mathcal{P}(\mathbb{R}^d), d_w)$ with speed $\alpha_n = \beta_n = n$ and rate function

$W$ is both unbounded and discontinuous, and therefore Theorem 2.9 is the first in that direction. Furthermore, Assumption 2.7 provides a sufficient condition for the LDP to hold with respect to a rather large class of stronger topologies, which was useful for the verification of the asymptotic thin shell condition in [21,22].

Results in the case β n /n → ∞
Motivated by questions arising in random matrix theory, sampling and simulated annealing, several authors [3,4,9,20,24,25] have considered LDPs for $\{Q_n\}$ at specific speeds that are faster than $n$, such as $\beta_n / (n \log n) \to \infty$ and $\beta_n = n^2$. Our second theorem presents a general result for speeds faster than $n$, that is, when $\beta_n / n \to \infty$, under Assumption 2.1 and certain modified assumptions on $V$ and $W$ stated in Assumption 2.10 below. In what follows, we use $J$ to denote the functional $J_1 : \mathcal{P}(\mathbb{R}^d) \to (-\infty, \infty]$ defined in (2.7), with $a = 1$. Recall the set $\Phi$ defined in (2.11) and, for a set $A \subset \mathbb{R}^d$, the set $A_{\neq}$ defined in (1.4); as usual, let $I_A$ denote the indicator function, which assigns 1 to points in $A$ and 0 otherwise.

Assumption 2.10.
When $A = \emptyset$, Assumption 2.10(1) collapses to the following more easily verifiable condition:

Assumption 2.11. There exists a lsc function $\gamma$ : (2.20)

Assumption 2.11 covers all of the well-known examples in the literature and the majority of generalizations that we provide in this paper. The main reason we introduce the more complicated Assumption 2.10(1) is that we want to include cases involving higher dimensional spaces ($d \ge 3$), where the confining potential $V$ does not necessarily satisfy $\lim_{n \to \infty} V(x_n) = \infty$ when $\|x_n\| \to \infty$, an assumption that is used in both [9] and [25], which cover cases where the rate function has minimizers with compact support. For special cases in two dimensions where minimizers do not have compact support, see [20]

We now state our second main result, whose proof is deferred to Section 5.
Theorem 2.13. Consider a sequence $\{\beta_n\}$ such that $\lim_{n \to \infty} \beta_n / n = \infty$, and let $V$ and $W$ satisfy Assumptions 2.1, 2.10(1) and 2.10(2). For $n \in \mathbb{N}$, let $P_n$ be as in (1.5) and $Q_n = (L_n)_{\#} P_n$. Then $\{Q_n\}$ satisfies an LDP on $\mathcal{P}(\mathbb{R}^d)$ with speed $\alpha_n = \beta_n$ and rate function

where $J$ is the functional $J_1$ given in (2.7). Furthermore, if there exists $\psi \in \Psi$ for which Assumption 2.10(3) holds, then $\{Q_n\}$ satisfies an LDP on

A direct consequence of Theorems 2.9 and 2.13 is the following.
Remark 2.14. (LDPs for empirical moments) Suppose $V$ and $W$ satisfy Assumption 2.1, and Assumption 2.7 holds with $\psi(x) := \|x\|^p$ for some $p \ge 1$. Let $(X^n_1, \ldots, X^n_n)$ be distributed according to $P_n$ and, for any $q \le p$, let $Y^n_q := \frac{1}{n} \sum_{i=1}^{n} |X^n_i|^q$, $n \in \mathbb{N}$. Then Theorem 2.9, the continuity of the map $\mu \mapsto \int \|x\|^q\, \mu(dx)$ in the Wasserstein-p topology and the contraction principle [11, Theorem 4.2.1] together show that $\{Y^n_q\}$ satisfies an LDP with speed $\beta_n = n$ and rate function

Likewise, if $V$ and $W$ satisfy Assumptions 2.1, 2.10(1) and 2.10(2), and Assumption 2.10(3) holds with $\psi(x) := \|x\|^p$ and $\beta_n / n \to \infty$ as $n \to \infty$, then Theorem 2.13 shows that $\{Y^n_q\}$ satisfies an LDP with speed $\beta_n$ and rate function H(
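The statistic in Remark 2.14 is simply the integral of the q-th power of the norm against the empirical measure. A minimal sketch of its computation for a configuration of points in $\mathbb{R}^d$ follows; the function name and the representation of points as tuples are illustrative assumptions, and the norm is taken to be Euclidean.

```python
import math


def empirical_moment(points, q):
    """q-th empirical moment: (1/n) * sum_i |x_i|^q with the Euclidean norm,
    for a configuration given as a list of d-dimensional tuples."""
    n = len(points)
    return sum(math.sqrt(sum(c * c for c in x)) ** q for x in points) / n


if __name__ == "__main__":
    configuration = [(3.0, 4.0), (0.0, 0.0)]
    print(empirical_moment(configuration, 1), empirical_moment(configuration, 2))
```

Because the statistic depends on the configuration only through its empirical measure, an LDP for the empirical measures in a topology that makes this map continuous transfers to the moments, which is the content of the contraction argument in the remark.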

Corollaries of the main results
In this section, we provide several illustrative examples for which the assumptions of Theorems 2.9 and 2.13 can be verified.

Known examples covered by our assumptions
We start by considering potentials that have already been investigated in the literature for the weak topology, and showing that they satisfy our assumptions. In what follows,

Example 2.15. Let $\ell$ be Lebesgue measure. The pair $(V, W)$ given by $V(x) = \|x\|^p$ for some $p > 1$ and $W(x, y) = K_\Delta(x - y)$ satisfies Assumptions 2.1, 2.11 and 2.10(3), and Assumptions 2.7 and 2.10(2) also hold with $\psi(x) = \|x\|^q$, $q < p$.
Proof of Example 2.15. For $d \ge 3$, it is trivial to verify that Assumption 2.1 is satisfied with $a = 0$. For the case $d = 2$, we pick $a = \frac{1}{2}$, and observe that

We verify Assumption 2.11 by picking $\phi(s) = \frac{1}{4} s^p + \bar{C}$, where $\bar{C}$ is a suitable constant. Finally, it is also easy to see that the pair $(V, W)$ satisfies Assumptions 2.7 and 2.10(3) with $\psi(x) = \|x\|^q$, $q < p$, by applying Lemma 2.8 for the first case and by picking $\phi(s) = \frac{1}{4} s^{p/q} + \bar{C}$, where $\bar{C}$ is a suitable constant, for the second.
Verification of Assumption 2.10(2) is a direct application of point (3) in [9, Proposition 2.8].
Example 2.15 shows, in particular, that our assumptions are satisfied in the cases covered in [9], including the popular case studied in [9,4,20,24] of $V(x) = \|x\|^2$, $W(x, y) = -\log \|x - y\|$ and $\ell$ being Lebesgue measure.

Non-diverging, weakly confining potentials
Given Assumption 2.1, Assumption 2.11 is, at least seemingly, weaker than the condition $\lim_{\|x\| \to \infty} V(x) = +\infty$ imposed in [9]. This can be directly seen if one takes $\phi(t) =$

where $C$ is chosen accordingly. However, the two assumptions have similar origins. More specifically, since $W$ and $V$ generate $H_n$ by a linear combination, one can transfer the confining attributes of $V$ to $W$ by taking, for example, any $V \ge 0$ such that $e^{-V}$ is a probability measure and $W(x, y) = \|x\|^2 + \|y\|^2$. The important point is that for these cases, $W(x, y) + V(x) + V(y)$ penalizes large values of $\|x\|$ and $\|y\|$.
In [25, p. 38] there is an intuitive explanation for how this containment can be applied, in the case where $V$ is superlinear and $W$ is the Coulomb potential, in order to prove the LDP. The proof is based on the idea that in a closed subset of $\mathcal{P}(\mathbb{R}^d)$, the minimizers of the rate function $J$ can be approximated by minimizers of $H_n$ when $n$ is sufficiently large. The approximation property is a consequence of the Γ-convergence of $H_n$ to $J$ (see [7]) together with the "coercivity" of $H_n$ (i.e., $H_n$ has minimizers in every closed set). To establish these properties of $H_n$, the confining properties of $V$ play a crucial role.
We now provide the following example.
Remark 2.17. For this example there are non-trivial regions of $\mathbb{R}^{2d}$ extending to infinity on which $W(x, y) + V(x) + V(y)$ is close to zero, so there is no obvious confining property. Also, since $J$ is a rate function, it has minimizers on every closed ball $\bar{B}$ of $\mathcal{P}(\mathbb{R}^d)$. In contrast, $H_n$ not only does not have minimizers, but also

However, for this example the LDP holds, for instance, in the case $\beta_n = n^2$.

Remark 2.18. We can modify Example 2.16 slightly to take $V$ to be continuous, for example, if we set $V$ equal to zero only in $B(z, a_z/2)$, equal to $\|x\|^q$ outside $\cup_{z \in \mathbb{Z}^3} B(z, a_z)$, and use Tietze's extension theorem to extend it to a continuous function. For such a continuous $V$, it is trivial to establish Assumption 2.10(2) by applying the results in [9, Proposition 2.8]. It will be clear from the proof of Example 2.16 below that Assumptions 2.10(1) and 2.10(3) also continue to hold with this modification since $V$ remains nonnegative. Assumption 2.10(2) can also be proved directly for our initial example (with discontinuous $V$) in a straightforward manner. We omit the proof.
Proof of Example 2.16. We start by verifying Assumption 2.10(1), namely inequalities (2.17)-(2.19). Let $\tilde{\gamma}(x) := \frac{1}{2} \|x\|^q$. It follows immediately from the definitions that (2.17) holds with $C = 0$ and $\gamma = C \tilde{\gamma}$ for any $C \le 1$. Next, let $n \in \mathbb{N}$, and let $A^n_1 := A \cap B(0, r_n)$, $A^n_2 := A \setminus B(0, r_n)$ be as in Assumption 2.10(1). We split the set $\mathbb{Z}^3 \cap A^n_1$ into two sets $Z^n_1$ and $Z^n_2$: let $Z^n_1$ contain the cubic integers $z \in \mathbb{Z}^3 \cap A^n_1$ for which $L_n(B_z \cap A^n_1) \le \frac{1}{n}$, and let $Z^n_2$ contain the cubic integers for which $L_n(B_z \cap A^n_1) \ge \frac{2}{n}$. The idea is to choose the cardinality $|Z^n_1|$ sufficiently small that particles in the corresponding sets do not contribute much to the interaction energy and, at the same time, to choose the balls around $Z^n_2$ to be so small that the pairs of particles that are in one of these sets create a significant interaction energy.

We now verify (2.18). Fix $\mathbf{x}^n \in (\mathbb{R}^{dn})_{\neq}$ and denote $L_n := L_n(\mathbf{x}^n, \cdot)$. For $z \in Z^n_2$, since $W(x, y) = 2 / \|x - y\| \ge 1 / a_z$ when $x, y \in B_z$, we have

Combining this with the nonnegativity of $W$, the Cauchy-Schwarz inequality (in the third inequality below), the summability of $\|z\|^{2q} a_z$, $z \in \mathbb{Z}^3$ (in the fourth inequality), and using $C > 0$ to denote a constant that may change from line to line, we obtain

Expanding the last term and using the relations $s^2 \ge s - 1$, $|Z^n_1| \le r_n^3$, and $r_n = \sqrt[q+3]{n}$, we see that the left-hand side above is bounded below by

This implies that (2.18) holds with $\gamma = C \tilde{\gamma} = \min(C, 1)\tilde{\gamma}$.

Lastly, to prove (2.19), recalling that $\beta_n = n^2$, $r_n = \sqrt[q+3]{n}$ and $V = 0$ on the set $A$, we have, for any $a \in [0, 1)$,

which converges to zero as $n \to \infty$, and hence (2.19) follows. This concludes the verification of Assumption 2.10(1), and also verifies Assumption 2.10(3) for any $\psi \in \Psi$ such that $\max_{x : \|x\| = m} \psi(x) / m^q \to 0$.

We now argue by contradiction to prove that $J$ (equivalently, $J_*$) does not have any minimizers with compact support.
Let $\mu_{\min}$ be a minimizer with compact support $K$ that is contained in $B(0, R)$ for some $R \in \mathbb{N}$. Let $\tilde{z} = (6R, 0, 0)$. We pick $\tilde{x} \in K$ such that $\mu_{\min}(B(\tilde{x}, a_{\tilde{z}})) > 0$, and by choosing $R$ sufficiently large so that $a_{\tilde{z}}$ is sufficiently small, we can assume without loss of generality that $0 < \mu_{\min}(B(\tilde{x}, a_{\tilde{z}})) < 1$.
We now show that $J(\mu^*_{\min}) < J(\mu_{\min})$. First, note that since $V$ is zero on the support of $\mu_{\tilde{z}}$,
Next, by the symmetry and nonnegativity of $W$, we have

$W(x, y)\, \mu_{\min}(dx)\, \mu_{\min}(dy)$,

Now, since $\mu_{\min}$ has support in $K$, we have

The definition of $\mu_{\tilde{z}}$ and the fact that

Combining the above relations, recalling that $\mu^*_{\min} := (\mu_{\min})|_{K \setminus B(\tilde{z}, a_{\tilde{z}})} + \mu_{\tilde{z}}$, and invoking (2.24), we conclude that

Since we proved earlier that $\mathcal{V}(\mu_{\min}) \ge \mathcal{V}(\mu^*_{\min})$, we conclude that $J(\mu_{\min}) > J(\mu^*_{\min})$, which contradicts the assumption that $\mu_{\min}$ is a minimizer of $J$.

Unlike Assumption 2.11 (and most conditions imposed in the literature), which holds independently of $\{\beta_n\}$, Assumption 2.10(1) is $\{\beta_n\}$-dependent. More specifically, for a fixed $n$, one can pick $\beta_n$ sufficiently large that (by the Laplace principle) the measure $P_n$ in (1.5) mainly charges configurations that are near-infimizers of $H_n$, which (as noted above) have strictly smaller values than the minimizer of $J$. In these cases, the LDP could not possibly hold in $\mathcal{P}(\mathbb{R}^d)$ with rate function $J$. One might expect that if $J$ (the anticipated rate function) is a tightness function, then the LDP will hold for all speeds faster than $n$. However, the comments above show that this is not true, and it is possible that $J$ is a tightness function but the LDP does not hold on $\mathcal{P}(\mathbb{R}^d)$ with rate function $J$. It is nevertheless likely that a non-trivial LDP still holds on some compactified space like $\mathcal{P}(\mathbb{R}^d \cup \{\infty\})$. The following natural questions arise.

[20] or in the more recent [10], where the particular case of the Coulomb potential in dimension $d = 2$ is studied. The proofs in [20] are based on specific properties of the Coulomb potential $-\log |x - y|$ and the complex plane. Our result works for more general $W$, $d$, $\beta_n$, and topological spaces $\mathcal{P}_\psi(\mathbb{R}^d)$. We expect that the weak convergence methods of [13] that we use here can be used to study other problems with weakly confining potentials.

Discontinuous interaction potentials
Our assumptions are also satisfied in cases where $V$ and/or $W$ are discontinuous. Example 2.16 already provides one example where $V$ is discontinuous. We now give another illustrative example where $W$ is discontinuous.

Before we provide a justification of our assertions, note that in this example $W$ can be interpreted as an interaction that takes place only when both particles are inside the same ball $B_i$. For visualization purposes one can take $h_i = K_\Delta$ for every $i$, with $K_\Delta$ from Example 2.15. A situation like this can arise with an electric potential between particles that are positioned in different regions with isolating boundaries.
Proof of Example 2.21. Since each $h_i$ satisfies Assumption 2.1, we get that Assumption 2.1 holds for $(V, W)$ with $a = 0$, and Assumption 2.11 holds immediately due to the fact that $V(x) = \|x\|^2$ and the definition of $W$. We now sketch a proof of why Assumption 2.10(2) also holds. Let $\mu \in \mathcal{P}(\mathbb{R}^d)$ and $\varepsilon > 0$. We

We would like to approximate each $\mu_i$ by an absolutely continuous measure with the same total mass, with energy close to the original and with support inside its original support (so that no new interaction occurs). By

. . , N }. It is possible that $\mu_0(\partial M) > 0$. However, we can move this mass to the interior of $M$ by pushing $\mu_0$ forward under

with $\delta > 0$ small. We can even assume that the resulting $\mu^\delta_0$ has compact support in the interior of $M$ by removing the mass in a small neighborhood of $\partial M$, and then renormalizing to keep the total mass constant, as was done for the other $\mu_i$. For small $\delta > 0$ it is easy to see that, since only the continuous confining potential acts on it, $|J(\mu_0) - J(\mu^\delta_0)| \le \varepsilon$. Since Assumption 2.10(2) is satisfied for each $h_i$, we can apply it to get a measure $\mu^{\delta,n}_i$, absolutely continuous with respect to Lebesgue measure and with the same mass as $\mu_i$, that is supported only inside $B_i$, such that $|J(\mu^{\delta,n}_i) - J(\mu^\delta_i)| \le \varepsilon$ and $\mu^{\delta,n}_i$, $\mu^\delta_i$ are close in the weak topology. We set $\mu^{\delta,n}_0 = \mu^\delta_0 * G_n$, where $G_n$ is a truncated Gaussian of radius $1/n$, which creates an absolutely continuous measure with support in $K$ for which $|J(\mu^{\delta,n}_0) - J(\mu^\delta_0)| \le \varepsilon$ for large enough $n$. Then $\mu^{\delta,n} = \sum_{i=0}^{N} \mu^{\delta,n}_i$ satisfies $|J(\mu^{\delta,n}) - J(\mu)| \le (2N + 2)\varepsilon$, and by making $n$ big enough and $\delta > 0$ small enough we can also have $d_w(\mu^{\delta,n}, \mu) \le \varepsilon$.
Open problem 2.22. In [9], the (extended) continuity of $W$ was used as a sufficient condition for Assumption 2.10(2) to hold. Example 2.21 above shows that continuity is not necessary for Assumption 2.10(2) to hold. Our preliminary investigations suggest that in some of these cases, it may be possible to establish LDPs with different rate functions, namely those that are given by some type of regularization of the $J$ functional (like appropriate Γ-convergence relaxations but only with sequences that belong to specific subsets of the set of probability measures). We pose the existence of LDPs for cases where Assumption 2.10(2) fails as an open problem.

Outline of the paper
The structure of the rest of the article is as follows. In Section 3 we provide definitions and lemmas that are used throughout the paper and then show that the candidate rate functions introduced above are indeed rate functions. In Section 4 we prove results for the speed β n = n, and in Section 5 we consider the case of speeds β n that grow faster than n. An outline of the weak convergence approach and proofs of several lemmas that are needed for the main theorems are collected in the Appendices.

Rate Function Property
In what follows, recall the set Ψ defined in (2.3). In Section 3.2, we show that under various combinations of Assumptions 2.1-2.10, the functions I and J defined in (2.14) and (2.21), and for ψ ∈ Ψ, the functions I ψ and J ψ defined in (2.15) and (2.22) are rate functions on the spaces P(R d ) and P ψ (R d ), respectively. To begin with, in Section 3.1 we first introduce basic notions that will be used in the rest of the paper.

Basic definitions
Definition 3.1. Let $I$ be an index set and let $\{\lambda_a, a \in I\} \subset \mathcal{P}(S)$. The collection $\{\lambda_a, a \in I\}$ is said to be tight if for every $\varepsilon > 0$, there is a compact set $K_\varepsilon \subset S$ such that $\inf\{\lambda_a(K_\varepsilon), a \in I\} \ge 1 - \varepsilon$.

The next result identifies a convenient tightness function on $\mathcal{P}_\psi(\mathbb{R}^d)$; see Appendix C for a proof.

is a tightness function on $\mathcal{P}_\psi(\mathbb{R}^d)$.
Finally, it will be convenient to introduce the following projection operators to define marginal distributions.

Definition 3.6. We denote by $\pi_k$, $k = 1, 2$, the projection operators on a product space $S_1 \times S_2$ defined by $\pi_k(x_1, x_2) = x_k$ for $(x_1, x_2) \in S_1 \times S_2$.
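To illustrate how the coordinate projections produce marginals, the following minimal sketch pushes a finitely supported joint law on a product space forward under each projection. The joint law `zeta` and the function names are hypothetical and serve only as an example.

```python
from collections import defaultdict
from fractions import Fraction

def push_forward(joint, proj):
    """Push a finitely supported law {(x1, x2): mass} forward under the map proj."""
    out = defaultdict(Fraction)
    for point, mass in joint.items():
        out[proj(point)] += mass
    return dict(out)

def pi_1(point):
    """Projection onto the first coordinate."""
    return point[0]

def pi_2(point):
    """Projection onto the second coordinate."""
    return point[1]

# Hypothetical joint law on {0, 1} x {'a', 'b'}.
zeta = {
    (0, 'a'): Fraction(1, 2),
    (0, 'b'): Fraction(1, 4),
    (1, 'b'): Fraction(1, 4),
}

first_marginal = push_forward(zeta, pi_1)
second_marginal = push_forward(zeta, pi_2)
print(first_marginal, second_marginal)
```

Exact rational arithmetic is used so that the marginals sum to one without rounding error.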

Verification of the rate function property
Lemma 3.7. Suppose Assumption 2.1 holds. Then I and J defined in (2.14) and (2.21), respectively, are lsc on P(R d ). Moreover, for ψ ∈ Ψ, I ψ * and J ψ * defined in (2.15) and (2.22), respectively, are lsc on P ψ (R d ).
Proof. We start by showing that the functional J a defined in (2.7) is lsc. For µ ∈ P(R d ), let µ ⊗ µ denote the corresponding product measure on R d × R d , and recall from (2.7) that J a (µ) = J a (µ ⊗ µ), with J a defined as in (2.6). The map µ → µ ⊗ µ from P(R d ) to P(R d × R d ) is continuous, and by Fatou's lemma (for weak convergence) the map ζ → J a (ζ) is lower semicontinuous if W (x, y) + aV (x) + aV (y) is lower semicontinuous and bounded from below. Since the latter property holds under Assumption 2.1, it follows that J a is lsc. Since I = J a + R(·|e −(1−a)V ) and, as is well known, R ·|e −(1−a)V is lsc on P(R d ), this shows that I, and hence I * , are lsc. By the same argument, the lower semicontinuity of J can be deduced from the fact that J = J 1 (µ ⊗ µ) where J a is given in (2.6), and the fact that (x, y) → W (x, y) + V (x) + V (y) is lsc and uniformly bounded from below due to Assumption 2.1, and from (2.21), it follows that J * is lsc. Since the topology on P ψ (R d ) is stronger than that on P(R d ), it follows that both I ψ and J ψ defined in (2.15) and (2.22), respectively, are also lsc on P ψ (R d ).
Lemma 3.8. Suppose Assumption 2.1 is satisfied. Then I is a rate function on P(R d ). If, in addition, there exists ψ ∈ Ψ such that Assumption 2.7 is satisfied, then I ψ is a rate function on P ψ (R d ).
Proof. Since $I$ is lsc on $\mathcal{P}(\mathbb{R}^d)$ by Lemma 3.7, it only remains to show that the level sets of $I_*$, or equivalently of $I$, are precompact in $\mathcal{P}(\mathbb{R}^d)$.
Proof. From the definitions of J and J ψ in (2.21) and (2.22), it is clear that to prove the lemma, it suffices to show that under Assumptions 2.101 and 2.103, J is a tightness function in the respective spaces P(R d ) and P ψ (R d ). The fact that J is lsc follows from Lemma 3.7. It remains to prove that the functionals have precompact level sets. For this, by Lemmas 3.2 and 3.3, it suffices to prove that there exist C > 0 and C 1 ∈ R such that for every µ ∈ P(R d ) for some tightness function γ. We will first prove that this is true for every µ ∈ P(R d ) without atoms and with compact support, and then use a limiting argument. By Assumption 2.101 (resp. Assumption 2.103), there exist a tightness function γ : R d → R, A ∈ B(R d ) and C ∈ R such that the inequality (2.17) holds: Fix R < ∞ and µ ∈ P(R d ) whose support lies in B(0, R). Integrating both sides of (3.3) with respect to µ, we have Since µ has no atoms, by Lemma E.1 there exists a sequence x n ∈ R dn = , n ∈ N, such that L n := L(x n , ·) has support in B(0, R), and L n w → µ and J = (L n ) → J (µ) as n → ∞. Therefore, by Assumption 2.101 (resp. Assumption 2.103), and in particular (2.18), there exists n 0 ∈ N, such that n ≥ n 0 implies r n > R, and hence where c is the lower bound in Assumption 2.1. Combining this with the lower semicontinuity of γ, the fact that L n w → µ and J = (L n ) → J (µ) as n → ∞, we see that has compact support and is without atoms. The relation (3.2) holds for each N ∈ N, from which we obtain (B(0, N )) .
The integrands in the last inequality are bounded from below. Therefore, without loss of generality, we can assume that they are nonnegative, since otherwise we can add and subtract their respective infima. By applying the monotone convergence theorem in the last relation, it follows that (3.2) holds for any $\mu \in \mathcal{P}(\mathbb{R}^d)$ without atoms. Finally, fix an arbitrary $\mu \in \mathcal{P}(\mathbb{R}^d)$. Assume without loss of generality that $J(\mu) < \infty$, for if not, (3.2) holds trivially. Then by Assumption 2.102, there exists a sequence $\{\mu_n\} \subset \mathcal{P}(\mathbb{R}^d)$ such that each $\mu_n$ is absolutely continuous with respect to the reference measure (and consequently non-atomic, since the reference measure is non-atomic), $\mu_n \to \mu$ weakly and $J(\mu_n) \to J(\mu)$. Since, as shown above, (3.2) holds when $\mu$ is replaced with $\mu_n$ for each $n$, taking the limit inferior as $n \to \infty$ of both sides and using the fact that $\int_{\mathbb{R}^d} \gamma(x)\,\mu(dx) \le \liminf_{n\to\infty} \int_{\mathbb{R}^d} \gamma(x)\,\mu_n(dx)$, which holds because $\gamma$ is lsc, it follows that (3.2) also holds for any $\mu \in \mathcal{P}(\mathbb{R}^d)$.

Proof of Theorem 2.9
Throughout this section, we assume that Assumption 2.1 is satisfied. To establish the LDP stated in Theorem 2.9, by [13, Theorem 1.2.3] we can equivalently verify the Laplace principle. For any probability measure $P$, we use $E_P$ to denote the corresponding expectation, and for conciseness denote $E_P$ by $E$. In view of the rate function property of $I$ and $I^\psi$ already established in Lemmas 3.7 and 3.8, it suffices to show the following: for any bounded and continuous function $f$ on $S$, the Laplace principle limit (4.1) holds.

To establish (4.1), we first express $-\frac{1}{n} \log E_{Q_n}[e^{-nf}]$ in terms of a variational problem (equivalently, a stochastic control problem). We then prove tightness of nearly minimizing controls, and finally prove convergence of the values of the corresponding controlled problems to the value of the limiting variational problem. The last step is reminiscent of the notion of Γ-convergence that is often used for analyzing variational problems in the analysis community. For an exposition of the relationship between LDPs and Γ-convergence, the reader is referred to [23].
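The shape of a Laplace principle can be checked numerically in a much simpler setting than the one treated here. The sketch below is a toy example and is not the Gibbs model of this paper: it takes the sample mean of $n$ fair coin flips, whose rate function is the relative entropy of Bernoulli($x$) with respect to Bernoulli($1/2$), and compares $-\frac{1}{n}\log E[e^{-n f(S_n/n)}]$ with $\inf_x [f(x) + I(x)]$. The test function `f` and all names are hypothetical.

```python
import math

def rate(x):
    """Relative entropy of Bernoulli(x) w.r.t. Bernoulli(1/2): the rate function for fair coin means."""
    total = math.log(2.0)
    for p in (x, 1.0 - x):
        if p > 0.0:
            total += p * math.log(p)
    return total

def f(x):
    """A bounded continuous test function on [0, 1] (hypothetical choice)."""
    return abs(x - 0.9)

def laplace_functional(n):
    """Compute -(1/n) log E[exp(-n f(S_n / n))] for S_n ~ Binomial(n, 1/2) via log-sum-exp."""
    logs = []
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) - n * math.log(2.0))
        logs.append(log_pmf - n * f(k / n))
    m = max(logs)
    return -(m + math.log(sum(math.exp(v - m) for v in logs))) / n

def variational_value(grid=100001):
    """Approximate inf over x in [0, 1] of f(x) + rate(x) on a uniform grid."""
    return min(f(i / (grid - 1)) + rate(i / (grid - 1)) for i in range(grid))

for n in (10, 100, 2000):
    print(n, laplace_functional(n), variational_value())
```

The log-sum-exp step avoids underflow for large $n$; the printed values should approach the variational value as $n$ grows.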

Representation formula
Recall that $P_n$ is the probability measure on $\mathbb{R}^{dn}$ defined in (1.5) and that $Q_n$ is the push-forward of $P_n$ under $L_n$. Let $a \in [0, 1)$ be the constant in Assumption 2.1 and let $P_n^*$ be the measure on $\mathbb{R}^{dn}$ defined by

$$P_n^*(dx_1, \ldots, dx_n) := e^{-\sum_{i=1}^{n} (1-a)V(x_i)}\, \ell(dx_1) \cdots \ell(dx_n), \qquad (4.2)$$

and note that it is a probability measure due to Assumption 2.1 and Remark 2.2. Let $J_{a,\neq}$ be defined as in (3.1):

$$J_{a,\neq}(\mu) = \frac{1}{2} \int_{(\mathbb{R}^d \times \mathbb{R}^d)^{\neq}} \big(W(x, y) + aV(x) + aV(y)\big)\, \mu(dx)\mu(dy).$$

When $\beta_n = n$, using (4.2), (1.5) and (1.3) to calculate $dP_n/dP_n^*$, we see that for any measurable function $f$ on $\mathcal{P}(\mathbb{R}^d)$ (or on $\mathcal{P}_\psi(\mathbb{R}^d)$), the identity (4.3) holds, where $Z_n$ is the normalizing constant defined in (1.6) and $\mathcal{V}$ is the functional defined in (2.9). We next state a representation for the quantity on the right-hand side of (4.3). To avoid confusion with the original distributions and random variables, we use an overbar (e.g., $\bar{L}_n$) for quantities that will appear in the representation, and refer to them as "controlled" versions. Given a probability measure $\bar{P}_n \in \mathcal{P}(\mathbb{R}^{dn})$, we can factor it into conditional distributions $\bar{\mu}_i^n$, $1 \le i \le n$, as in (4.4). Note that $\bar{\mu}_i^n$, $1 \le i \le n$, are random probability measures, and the $i$th measure is measurable with respect to the σ-algebra generated by $\{\bar{X}_j^n\}_{j<i}$. We refer to the collection $\{\bar{\mu}_i^n, 1 \le i \le n\}$ as a control, and let $\bar{L}_n(\cdot) = L_n(\bar{X}^n; \cdot)$, with $L_n$ defined by (1.2), be the (random) empirical measure of $\{\bar{X}_j^n\}_{1 \le j \le n}$, which we refer to as the controlled empirical measure.
Let f belong to the space of functions on P(R d ) (or P ψ (R d )) such that the map x n → f (L n (x n ; ·)) from R nd to R is measurable and bounded from below. This space clearly includes all bounded continuous functions on P(R d ) (respectively, P ψ (R d )). Then, since the functional J a, = is also measurable and bounded from below (due to Assumption 2.7), we can apply [13, Proposition 4.5.1] to the function x n ∈ R d → f (L n (x n ; ·)) + J a, = (L n (x n ; ·)), to obtain − 1 n log E P n e −n(f +J a, = + a whereL n is the controlled empirical measure associated withP n as defined above, and the infimum is over all controls {μ n i } defined in terms of some joint distribution P n ∈ P(R dn ) via (4.4). FactoringP n as above and using the chain rule for relative entropy (see [ where the infimum is over all controls {μ n i } (equivalently, joint distributionsP n ∈ P(R nd )).
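The chain rule for relative entropy used in this factorization can be verified by hand on a two-step example. In the sketch below, the reference measure is a product of two copies of a law `p` on $\{0, 1\}$, and the controlled joint law is built from a first-step law `mu_1` and a conditional second-step law `mu_2`; all of these are hypothetical choices. The relative entropy of the joint law with respect to the product then equals the sum of the expected relative entropies of the individual controls.

```python
import math

def rel_entropy(q, p):
    """Relative entropy R(q | p) = sum of q log(q / p) for pmfs given as dicts on a finite set."""
    return sum(qx * math.log(qx / p[x]) for x, qx in q.items() if qx > 0.0)

# Reference law for each coordinate, and a hypothetical two-step control.
p = {0: 0.5, 1: 0.5}
mu_1 = {0: 0.3, 1: 0.7}                 # law of the first coordinate
mu_2 = {0: {0: 0.9, 1: 0.1},            # conditional law of the second coordinate
        1: {0: 0.2, 1: 0.8}}            # given the value of the first

joint = {(x1, x2): mu_1[x1] * mu_2[x1][x2] for x1 in p for x2 in p}
product = {(x1, x2): p[x1] * p[x2] for x1 in p for x2 in p}

# Left side: relative entropy of the joint law w.r.t. the product reference measure.
lhs = rel_entropy(joint, product)
# Right side: cost of the first control plus the expected cost of the second control.
rhs = rel_entropy(mu_1, p) + sum(mu_1[x1] * rel_entropy(mu_2[x1], p) for x1 in p)
print(lhs, rhs)
```

The same decomposition, applied over $n$ steps, is what turns the relative entropy of a joint law into a sum of costs of the individual controls.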
− 1 n logZ n = − 1 n log E P n e −nJ a, = (Ln)+ a n V(Ln) We claim that to prove Theorem 2.9, it suffices to show that for every bounded and continuous (in the respective topology) function f , the lower bound and upper bound lim sup hold. Indeed, when combined with (4.6), (4.7) and (4.3), these bounds imply the desired limit (4.1). The lower and upper bounds are established in Sections 4.3 and 4.4, respectively. First, in Section 4.2, we establish some tightness properties of the controls that will be used in the proofs of these bounds.

Properties of the controls
We continue to use the notation for the controls introduced in the previous section. We start with a simplifying observation.

Remark 4.2.
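In the proof of the lower bound (4.8), we can assume that there exists $C_0 < \infty$ such that (4.10) holds. If this were not true, we could restrict to a subsequence that has this property, because for any subsequence along which the left-hand side of (4.10) is infinite, the lower bound (4.8) is satisfied by default. Furthermore, since $J_{a,\neq} > \min\{0, 2c\}$ under Assumption 2.1, we can restrict to controls for which the relative entropy cost is bounded by $C_0 + 2|c|$, as expressed in (4.11).

Proof. Let $\{\bar{\mu}_i^n\}$, $n \in \mathbb{N}$, be a sequence of controls that satisfies (4.11). By the convexity of relative entropy and Jensen's inequality, the bound (4.11) carries over to the averaged controls $\bar{\mu}_n$. We know that $R(\cdot\,|\,e^{-(1-a)V})$ is a tightness function on $\mathcal{P}(\mathbb{R}^d)$ and hence, by Lemma 3.2, the sequence of random probability measures $\{\bar{\mu}_n, n \in \mathbb{N}\}$ is tight. By Lemma 3.4, the sequence of probability measures $\{E[\bar{\mu}_n], n \in \mathbb{N}\}$ is tight. Since $\bar{\mu}_i^n$ is the conditional distribution of $\bar{X}_i^n$ given $(\bar{X}_1^n, \ldots, \bar{X}_{i-1}^n)$, for any measurable function $g : \mathbb{R}^d \to \mathbb{R}$ that is bounded from below, we have

$$E\Big[\int_{\mathbb{R}^d} g\, d\bar{L}_n\Big] = \frac{1}{n} \sum_{i=1}^{n} E\big[g(\bar{X}_i^n)\big] = \frac{1}{n} \sum_{i=1}^{n} E\Big[\int_{\mathbb{R}^d} g\, d\bar{\mu}_i^n\Big] = E\Big[\int_{\mathbb{R}^d} g\, d\bar{\mu}_n\Big].$$

Thus, $E[\bar{L}_n] = E[\bar{\mu}_n]$, and so $\{E[\bar{L}_n], n \in \mathbb{N}\}$ is also tight. Another application of Lemma 3.4 then shows that $\{\bar{L}_n, n \in \mathbb{N}\}$ is tight, which together with the tightness of $\{\bar{\mu}_n\}$ established above implies that $\{(\bar{\mu}_n, \bar{L}_n), n \in \mathbb{N}\}$ is tight.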
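The convexity step can be seen concretely on a finite state space: the relative entropy of an average of controls is at most the average of their relative entropies. The sketch below checks this for three hypothetical controls and a hypothetical reference law `theta`; none of these values come from the paper.

```python
import math

def rel_entropy(q, p):
    """Relative entropy R(q | p) for pmfs given as lists over the same finite set."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0.0)

# Hypothetical reference law and three hypothetical controls on a three-point space.
theta = [0.25, 0.25, 0.5]
controls = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
]

n = len(controls)
# Averaged control: the pointwise mean of the individual controls.
mixture = [sum(mu[k] for mu in controls) / n for k in range(len(theta))]

entropy_of_average = rel_entropy(mixture, theta)
average_of_entropies = sum(rel_entropy(mu, theta) for mu in controls) / n
print(entropy_of_average, average_of_entropies)
```

A uniform bound on the averaged cost of the individual controls therefore gives the same bound on the relative entropy of the averaged control.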
The following lemma, which uses an elementary martingale argument, appears in [13]. For the reader's convenience the proof is given in Appendix D. For the next result, it will be convenient to first define a collection of auxiliary random measures that extend the ones that appear in the representation (4.6). Let P n be a probability measure on R dn , and let (X n 1 , . . . ,X n n ) be random variables with joint distributionP n . For J ⊂ {1, ..., n}, letP n J equal the marginal distribution ofP n on {x j , j ∈ J}, and for disjoint subsets I 1 and I 2 of {1, . . . , n}, letP n I1|I2 denote the stochastic kernel defined as follows: P n I1|I2 (dx i , i ∈ I 1 |x k , k ∈ I 2 )P n I2 (dx k , k ∈ I 2 ) =P n I1∪I2 (dx j , j ∈ I 1 ∪ I 2 ).
Let $K_k := \{1, \ldots, k - 1\}$. In the sequel we fix $i < j$ (the case $j < i$ can be handled in a symmetric way), and define Also, note that with this notation are the controls used in the representation (4.6). We claim that

$$\pi_1 \# \bar{\mu}_{ij}^n = \bar{\mu}_i^n \quad \text{and} \quad \pi_2 \# \bar{\mu}_{ij}^n = E\big[\bar{\mu}_j^n \,\big|\, \bar{X}_k^n, k \in K_i\big], \qquad (4.15)$$

where $\pi_k$, $k = 1, 2$, and $\#$ are the projection and push-forward operators introduced in Definition 3.6 and Definition 2.3, respectively. The second equality in (4.15) is a little more involved. Indeed, note that for every $A \in \mathcal{B}(\mathbb{R}^d)$, from which the second equality in (4.15) follows.  (4.16) and let $\bar{\mu}_n$ be as defined in (4.12). Then Proof. Let $\theta$ be a probability measure on $\mathbb{R}^d$. By the chain rule for relative entropy, we In addition, Jensen's inequality gives Combining the last two displays with (4.13) and (4.14), we obtain ac n
where J a is the functional defined in (2.6) and c is a lower bound for V. Next, let µ 2,n := 1 n(n − 1)  Then combining (4.18) with the convexity of R in both arguments (see [13,Lemma 1.4.3]), the linearity of J a , and the definition ofμ 2,n in (4.19), we obtain (4.20) We now use (4.20) to establish tightness of both {L n } and {μ n } in the d ψ topology. Note thatμ 2,n is a random probability measure on R d × R d and that it has identical marginals. Since V and W satisfy Assumption 2.7 and relative entropy is nonnegative, there exists a superlinear function φ for which we have the inequalities n .  Substituting this into the right-hand side of (4.22) and letting C 0 < ∞ denote the left-hand side of (4.16), we obtain the bound for all sufficiently large n. However, since we know from Lemma 3.5 that Φ(µ) = R d φ (ψ (x)) µ (dx) is a tightness function on P ψ R d , it follows that {μ n } is tight as a collection of P ψ R d -valued random elements. Finally, note that we have the equality Setting g(x) = φ(ψ(x)), and again invoking Lemma 3.5, we see that {L n } is also tight.

Remark 4.6.
In the remainder of the proof, which is carried out in Sections 4.3 and 4.4, the arguments for both P(R d ) and P ψ (R d ) are similar, and so we will treat both cases simultaneously. The functions f used will be considered continuous in the respective topology and any infimum taken should be with respect to the corresponding set P(R d ) or P ψ (R d ).

Proof of the lower bound
For the proof of the lower bound (4.8) we will use some auxiliary functionals. For d ∈ N, an arbitrary function F : and note that for every µ ∈ P(R d ), where $J_{a,\neq} \ge J^M_{a,\neq}$ is used for the third inequality and the last inequality uses (4.24) and the fact that $\bar{L}_n \otimes \bar{L}_n$ puts mass at most $1/n$ on the diagonal of $\mathbb{R}^d \times \mathbb{R}^d$.
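The role of the diagonal can be made concrete for an empirical measure of distinct points. The sketch below computes the mass that $L_n \otimes L_n$ puts on the diagonal, and evaluates the off-diagonal double integral of $W(x, y) + aV(x) + aV(y)$ with the factor $1/2$ used in the definition of $J_{a,\neq}$. The logarithmic interaction, the quadratic confinement and the value of `a` are hypothetical choices, and the points are assumed to be distinct.

```python
import math

def diagonal_mass(points):
    """Mass that L_n (x) L_n puts on {x = y}, where L_n is the empirical measure of `points`."""
    n = len(points)
    return sum(1 for x in points for y in points if x == y) / n**2

def offdiag_energy(points, W, V, a):
    """(1 / (2 n^2)) * sum over i != j of W(x_i, x_j) + a V(x_i) + a V(x_j)."""
    n = len(points)
    total = 0.0
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            if i != j:  # skip the diagonal, where a singular W may be infinite
                total += W(x, y) + a * V(x) + a * V(y)
    return total / (2 * n**2)

def W_log(x, y):
    """Hypothetical singular interaction: -log|x - y|, infinite on the diagonal."""
    return -math.log(abs(x - y))

def V_quad(x):
    """Hypothetical confining potential."""
    return 0.5 * x * x

points = [0.0, 1.0, 2.0, 4.0]   # distinct one-dimensional particles
print(diagonal_mass(points), offdiag_energy(points, W_log, V_quad, a=0.5))
```

Excluding the diagonal keeps the energy finite for a singular interaction, at the cost of ignoring a set whose mass under $L_n \otimes L_n$ is $1/n$ when the points are distinct.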
Let $\bar{\mu}_n := \frac{1}{n} \sum_{i=1}^{n} \bar{\mu}_i^n$. Since Lemma 4.5 implies that $\{(\bar{L}_n, \bar{\mu}_n)\}$ is tight, we can extract a further subsequence, which we denote again by $\{(\bar{L}_n, \bar{\mu}_n)\}$, that converges in distribution to some limit $(\bar{L}, \bar{\mu})$. If the lower bound is established for this subsequence, a standard argument by contradiction establishes the lower bound for the original sequence. Let $\{M_n\}$ be an increasing sequence such that $\lim_{n\to\infty} M_n = \infty$ and $\lim_{n\to\infty} M_n/n = 0$, and let $m \in \mathbb{N}$. By the monotonicity of $n \mapsto W^{M_n}$, Jensen's inequality, the definition of $\bar{\mu}_n$, and Fatou's lemma we have where the continuity of $f$ and lower semicontinuity of $J_a^{M_m}$ and $R(\cdot\,|\,e^{-(1-a)V})$ are also used in the last inequality. Since this inequality holds for arbitrary $m \in \mathbb{N}$, the monotone convergence theorem, the property that $\bar{L} = \bar{\mu}$ almost surely (due to Lemma 4.4) and the definition of $I$ in (2.14), together imply

Proof of the upper bound
Again, fix $f$ to be a bounded continuous function on $\mathcal{P}(\mathbb{R}^d)$ (respectively, $\mathcal{P}_\psi(\mathbb{R}^d)$), let $\varepsilon > 0$ and let $\mu^* \in \mathcal{P}(\mathbb{R}^d)$ (respectively, $\mathcal{P}_\psi(\mathbb{R}^d)$) be $\varepsilon$-optimal for the limiting variational problem, in the sense of (4.28). For $n \in \mathbb{N}$, let $\{\bar{\mu}_i^n, 1 \le i \le n\}$ denote the particular control defined by $\bar{\mu}_i^n := \mu^*$ for all $n \in \mathbb{N}$ and $i \in \{1, \ldots, n\}$, and let $\bar{X}_i^n$, $i = 1, \ldots, n$, and $\bar{L}_n$ denote the associated controlled objects. Recall that the reference measure, and hence $\mu^*$, is non-atomic. From the definitions of $J_a$ and $J_{a,\neq}$ in (2.7) and (3.1), respectively, we have (4.29). Define $\bar{\mu}_n := \frac{1}{n} \sum_{i=1}^{n} \bar{\mu}_i^n = \mu^*$. Then, due to (4.28), the conditions of Lemma 4.5 hold for $\{(\bar{L}_n, \bar{\mu}_n)\}$. Together with Lemma 4.3, this shows that $\{\bar{L}_n\}$ is tight in $\mathcal{P}(\mathbb{R}^d)$ and $\mathcal{P}_\psi(\mathbb{R}^d)$. When combined with the almost sure convergence $\bar{L}_n \to \mu^*$, which holds due to Lemma 4.4 (or the Glivenko-Cantelli lemma), this implies convergence of $\bar{L}_n$ to $\mu^*$ with respect to both $d_w$ and $d_\psi$, as appropriate. Since $f$ is bounded and continuous, $\lim_{n\to\infty} E[f(\bar{L}_n)] = f(\mu^*)$ by the dominated convergence theorem. The above observations, together with (4.29), the uniform lower bound on $J_a$ and $V$, and (4.28), show that the upper bound (4.9) holds up to an additive error of $\varepsilon$. Since $\varepsilon$ is arbitrary, this implies the upper bound (4.9), which together with (4.8) and the discussion at the end of Section 4.1 completes the proof of Theorem 2.9.

Proof of Theorem 2.13
This section is devoted to the proof of Theorem 2.13. The structure of the proof is similar to that of the case with speed β n = n. In view of Lemmas 3.7 and 3.8 and Theorem 1.2.3 in [13], it suffices to prove that for any bounded and continuous function

Representation formula
As before, fix $a \in [0, 1)$ as in Assumption 2.1, and let $P_n^*$ be the probability measure on $\mathbb{R}^{dn}$ defined in (4.2). We now introduce the functional $J_{n,\neq} : \mathcal{P}(\mathbb{R}^d) \to (-\infty, \infty]$ given by (5.2). Note that $J_{n,\neq}(\mu)$ is bounded below for all sufficiently large $n$ due to Assumption 2.101 and the fact that $\beta_n/n \to \infty$. When $x^n \in (\mathbb{R}^{dn})^{\neq}$, using (5.2) we can rewrite $\beta_n H_n$, where $H_n$ was defined in (1.1), in terms of the empirical measure $L_n(x^n; \cdot)$. Let $f$ be a measurable function on $\mathcal{P}(\mathbb{R}^d)$ (or on $\mathcal{P}_\psi(\mathbb{R}^d)$) that is bounded below (in particular, $f$ could be bounded and continuous). Then by the definition of $P_n^*$, we have (5.3), where $Z_n$ is the normalization constant defined in (1.6).
Using the same notation and arguments as in Section 4.1, the following representations are valid. Fix any function f on P(R d ) (or P ψ (R d )), such that f • L n is measurable in R dn and bounded from below (this includes all continuous and bounded functions on P(R d ) or P ψ (R d )). Then, since the function (x, y) y) is measurable and bounded from below, we can apply [13, Proposition 4.5.1] to f (L n (x n ; ·)) + J n, = (L n (x n ; ·)) + 1 n − 1−a βn V(L n (x n ; ·)), to obtain As before, to establish Theorem 2.13, in view of (5.4), (5.5) and (5.3), it suffices to establish the lower bound (5.6) and the upper bound for all bounded and continuous functions f (with respect to the corresponding topologies).

Tightness of controls
As in Remark 4.2, we have the following observation that simplifies the proof of the lower bound.
Lemma 5.2. Let {μ n i } be a sequence of controls such that the associated controlled empirical measures satisfy (5.8). Assume also that V and W satisfy Assumptions 2.1 and 2.101. Then {L n } is tight in P R d . Further, if Assumption 2.103 is also satisfied for some ψ ∈ Ψ, then {L n } is tight on P ψ R d . Proof. First, note that by (5.2), J n, = (L n ) can be rewritten as For large enough n, Assumption 2.1 implies that all the integrands are bounded from below. Therefore, we have Now, let the set A, sequence {r n } and lsc function γ : R d → R be as in Assumption 2.101, and let A 1 n and A 2 n be the associated sets defined therein. Then the integral J = (L n ) can be decomposed as the sum of integrals over the following sets: Each of these terms is bounded from below by direct application of Assumption 2.1, and since E[J = (L n )] is uniformly bounded from above we have that the expectation of each of them is also bounded from above.
To prove that $\{\bar{L}_n\}$ is tight, by Lemmas 3.2 and 3.3 it suffices to prove that the term We now show that each of the three terms in the last inequality is uniformly bounded from above. By applying (2.17) of Assumption 2.101, we obtain for $n \ge 2$, By Assumption 2.101 we also have Due to (5.9), the last two displays show that the first two terms on the right-hand side of (5.10) are uniformly bounded. Finally, for the third term, since $\bar{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \bar{\mu}_i^n$, recalling (from Section 4.1) that $\{\bar{X}_j^n\}_{1 \le j \le n}$ are the controlled random variables with joint distribution $\bar{P}_n(dx_1, \ldots, dx_n)$, and using the tower property of conditional expectations, we
Recalling that $I_{A_n^2}$ denotes the indicator function of the set $A_n^2$, by an extension of the formula that relates exponential integrals and relative entropy [13, Proposition 4.5.1], the bound (2.19) in Assumption 2.101 (resp. Assumption 2.103), and (5.9), we see that The first term on the right-hand side is uniformly bounded by (2.19) in Assumption 2.101 (resp. Assumption 2.103) and the second term is uniformly bounded by (5.9). This concludes the proof.

Proof of the lower bound
For the proof of the lower bound we use some auxiliary functionals on P(R d ).
These integrals are well defined for sufficiently large $n$ because of Assumption 2.1. For every $M, n \in \mathbb{N}$, Let $\varepsilon > 0$ and $\{\bar{\mu}_i^n\}$ be such that where $C$ is a finite upper bound, which exists by Remark 5.1 and the boundedness of $f$, and the last inequality follows from (5.11) and the fact that $\bar{L}_n(dx)\bar{L}_n(dy)$ puts mass $1/n$ on the diagonal $x = y$.
Owing to tightness (see Lemma 5.2), we can extract a further subsequence of $\{(\bar{L}_n, \bar{\mu}_n)\}$, where $\bar{\mu}_n := \frac{1}{n} \sum_{i=1}^{n} \bar{\mu}_i^n$, which (with some abuse of notation) we denote again by $\{(\bar{L}_n, \bar{\mu}_n)\}$ and which converges weakly to some limit $(\bar{L}, \bar{\mu})$. Let $\{M_n\}$ be a sequence that goes to infinity such that $\lim_{n\to\infty} M_n/n = 0$, and let $m \in \mathbb{N}$. By Fatou's lemma, the nonnegativity of $R(\cdot\,|\,e^{-V})$, the definition of $\mathcal{V}$ in (2.9), and the fact that $n/\beta_n \to 0$, we have lim inf Since the above inequality holds for arbitrary $m$, we can let $m \to \infty$ using the monotone convergence theorem. Since $\varepsilon > 0$ is arbitrary, this establishes (5.6).

Proof of the upper bound
We start by making an observation, whose proof is deferred to Appendix F. . Furthermore, if µ ∈ P ψ (R d ) for some ψ ∈ Ψ, then we can assume in addition that d ψ (µ n , µ) → 0.
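Now, let $f$ be a bounded and continuous function on $\mathcal{P}(\mathbb{R}^d)$ (or $\mathcal{P}_\psi(\mathbb{R}^d)$), let $\varepsilon > 0$ and let $\mu^*$ be $\varepsilon$-optimal for the infimum of $f + J$. We can also assume that $R(\mu^*\,|\,e^{-(1-a)V}) < \infty$, due to Assumption 2.102 and Lemma 5.3. Then let $\bar{\mu}_i^n = \mu^*$ for all $n \in \mathbb{N}$ and $i \in \{1, \ldots, n\}$, and let the random variables $\bar{X}_i^n$, $1 \le i \le n$, $n \in \mathbb{N}$, be iid with distribution $\mu^*$. By Lemma 4.4, the weak limit of $\bar{L}_n$ equals $\mu^*$. Calculations very similar to those of (4.29), together with the convergence $\bar{L}_n \to \mu^*$, the dominated convergence theorem and the fact that $n/\beta_n \to 0$, imply that the limit superior of the corresponding controlled costs is equal to $f(\mu^*) + J(\mu^*)$. Thus, we have shown that the upper bound (5.7) holds up to an additive error of $\varepsilon$. Since $\varepsilon > 0$ is arbitrary, we obtain the upper bound (5.7), thus completing the proof of Theorem 2.13.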
X n →X in distribution, then lim inf n→∞ E C(X n :X n ) ≥ E[I(X)], and therefore lim inf Since {X n } is arbitrary, this gives the needed bound (in fact, in this part of the analysis one often identifies a candidate for the rate function).
The reverse bound is obtained as follows. Given any $x^*$ for which $f(x^*) + I(x^*)$ is within $\varepsilon > 0$ of $\inf_{x \in \mathcal{X}} [f(x) + I(x)]$, one identifies controls that will drive $\bar{X}^n$ to $x^*$ (recall that large deviations uses a law of large numbers scaling), and with costs that satisfy $\limsup_{n\to\infty} E[C(\bar{X}^n : X^n)] \le I(x^*)$. The reverse bound follows since $\varepsilon > 0$ is arbitrary, and together the bounds give (A.1). To successfully carry out these steps, one typically needs a very good understanding of the law of large numbers analysis of the original system $\{X^n\}$, since the weak convergence analysis ends up being a law of large numbers analysis of the controlled versions, and methods that are useful for the first problem can often be adapted to deal with the second.
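The prototype for trading an exponential integral for a controlled problem with a relative entropy cost is the variational formula $-\log \sum_x p(x) e^{-f(x)} = \min_q \big[\sum_x q(x) f(x) + R(q\,|\,p)\big]$ on a finite set, with the minimum attained at $q^*(x) \propto p(x) e^{-f(x)}$. The sketch below checks this numerically; the law `p`, the cost `f` and the randomly sampled competitors are hypothetical and illustrate only the one-step formula, not the full representation used in the paper.

```python
import math
import random

def rel_entropy(q, p):
    """Relative entropy R(q | p) for pmfs given as lists over the same finite set."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0.0)

def controlled_cost(q, p, f):
    """Expected cost under the control q plus the relative entropy paid for using q."""
    return sum(qk * fk for qk, fk in zip(q, f)) + rel_entropy(q, p)

p = [0.2, 0.5, 0.3]     # hypothetical reference law
f = [1.0, 0.0, 2.5]     # hypothetical cost function

z = sum(pk * math.exp(-fk) for pk, fk in zip(p, f))
log_value = -math.log(z)                                   # left side of the formula
q_star = [pk * math.exp(-fk) / z for pk, fk in zip(p, f)]  # optimal control

rng = random.Random(0)

def random_pmf(k):
    """A random probability vector of length k (used as an arbitrary competitor)."""
    w = [rng.random() + 1e-12 for _ in range(k)]
    s = sum(w)
    return [wk / s for wk in w]

# The gap between any control's cost and the left side should be nonnegative.
gaps = [controlled_cost(random_pmf(3), p, f) - log_value for _ in range(1000)]
print(log_value, controlled_cost(q_star, p, f), min(gaps))
```

In the large deviation argument, nearly minimizing controls play the role of `q_star`, and their costs are what must be shown to be tight and to converge.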

B Proof of Lemma 2.8
The proof of Lemma 2.8 is based on two preliminary results, established in Lemma B.1 and Lemma B.2 below.
Lemma B.1. Let ν ∈ P (R m ) and letψ : R m → R + be measurable. Then increasing. Finally, we have (⇐) Letφ be as in the statement of the lemma. Sinceφ satisfies lim s→∞φ  for all λ < ∞ if and only if there exists a convex, increasing and superlinear function φ : R + → R and a constant C < ∞ such that for any µ ∈ P (R m ), and (B.5) implies that log R m eφ (ψ(z)) ν (dz) is bounded, which proves (B.3).
Proof of Lemma 2.8. Consider the probability measure on R d × R d defined by where Z is the normalization constant that makes ν a probability measure; the finiteness of Z follows on setting λ = 0 in (2.13). Since ψ satisfies (2.13), we can apply Lemma B.2 withψ(x, y) = ψ(x) + ψ(y) to conclude that there exists a convex and increasing functionφ : R + → R with lim s→∞φ (s)/s = ∞ such that for any ζ ∈ P(R d × R d ), We claim, and prove below, that for every ζ we have If the claim holds, then sinceφ is increasing and since ψ and R are positive, for i = 1, 2, where recall from Definition 3.6 and Definition 2.3 that π i # ζ represents the ith marginal of ζ. Adding the inequality (B.8) for i = 1 and i = 2 we have ≤ 4J a (ζ) + 4R ζ|e −(1−a)V ⊗ e −(1−a)V + 2(C + log Z).
We now turn to the proof of the claim (B.8). We can assume without loss of generality that ζ(dxdy) has a density with respect to the measure e −(1−a)V ⊗ e −(1−a)V , because otherwise (B.8) holds trivially, since W (x, y) + aV (x) + aV (y) is bounded from below. Denoting this density (with some abuse of notation) by ζ(x, y), (B.6) then gives Therefore, recalling the definition of J a in (2.6), we have which completes the proof of the claim, and therefore the lemma.

C Proof of Lemma 3.5
We first establish a preliminary result in Lemma C.1 below. Let B(0, r) denote the closed ball about 0 of radius r, and let B c (0, r) denote its complement. Furthermore, the metric space (P ψ (R d ), d ψ ) is separable.
Since ε is arbitrary, the conclusion follows.
Since ε is arbitrary, when substituted back into (C.4), this shows that We now turn to the proof that P ψ (R d ) is separable. Let {x n } be a countable dense subset of R d , and define where Q + is the set of nonnegative rational numbers, and observe that A is a countable subset of P ψ . We now show that A is dense in P ψ . Fix µ ∈ P ψ and ε > 0. Also and let F 1 be the subspace of functions with ||f || BL ≤ 1. Then consider the metric on P(R d ) given by In view of the definition of d ψ in (2.5) and the fact that there exists a constant C < ∞ such that d w (µ, ν) ≤ C d BL (µ, ν) (see [12, p. 396]), it suffices to show that there exists ν ∈ A such that Recalling that ψ is continuous, for each n ∈ N, choose r n ∈ (0, ε/2) such that sup x∈Br n (xn) |ψ(x) − ψ(x n )| ≤ ε 2 , (C.6) and note that then we also have sup x∈Br n (xn) Now defineB n := B rn (x n ) \ ∪ n−1 k=1 B r k (x k ) and b n := µ(B n ). Clearly, {B n } n∈N forms a disjoint partition of R d and hence, ∞ n=1 b n = 1. Moreover, by (C.6) and (C.7) we have for all f ∈ F 1 ∪ {ψ}, (C.8) We can assume without loss of generality that ψ is uniformly bounded from below away from zero. Since R d ψ(x)µ(dx) is finite, this implies ∞ n=1 b n ψ(x n ) < ∞, and hence there exists N ∈ N such that Observe that N n=1 c n = ∞ n=1 b n = 1, and hence, c 1 also lies in Q + . Set ν := N n=1 c n δ xn . Then, for f ∈ F 1 ∪ {ψ}, using (C.10) and (C.9), we have When combined with (C.8) this establishes the desired inequality (C.5).
Thus, by the first assertion of Lemma C.1, {µ n } is tight in P ψ (R d ).

D Tightness results
∆ n m,i ∆ n m,j   .
Let F n j = σ(X n i , i = 1, . . . , j). As we show below, by a standard conditioning argument, the off-diagonal terms vanish: for i > j, E ∆ n m,i ∆ n m,j = E E ∆ n m,i ∆ n m,j F n i = E E ∆ n m,i F n i ∆ n m,j = 0.
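The vanishing of the off-diagonal terms can be checked exactly on a two-step example. The sketch below makes the hypothetical choice that each difference has the form $\Delta_i = g(X_i) - E[g(X_i) \mid \text{past}]$, which is the typical way such martingale differences arise, and uses exact rational arithmetic; the laws `mu_1`, `mu_2` and the function `g` are illustrative values only.

```python
from fractions import Fraction as F

# Hypothetical two-step model: law of X_1, and conditional law of X_2 given X_1.
mu_1 = {0: F(1, 3), 1: F(2, 3)}
mu_2 = {0: {0: F(1, 4), 1: F(3, 4)},
        1: {0: F(1, 2), 1: F(1, 2)}}
g = {0: F(2), 1: F(-1)}   # hypothetical test function

# Centering terms: the mean of g(X_1) and the conditional mean of g(X_2) given X_1.
mean_1 = sum(mu_1[x] * g[x] for x in mu_1)
cond_mean_2 = {x1: sum(mu_2[x1][x2] * g[x2] for x2 in g) for x1 in mu_1}

def expect(h):
    """Expectation of h(x1, x2) under the joint law mu_1(x1) * mu_2(x2 | x1)."""
    return sum(mu_1[x1] * mu_2[x1][x2] * h(x1, x2) for x1 in mu_1 for x2 in g)

# Off-diagonal term E[Delta_1 * Delta_2]: vanishes after conditioning on X_1.
cross = expect(lambda x1, x2: (g[x1] - mean_1) * (g[x2] - cond_mean_2[x1]))
# Diagonal term E[Delta_2^2]: generally positive.
diag = expect(lambda x1, x2: (g[x2] - cond_mean_2[x1]) ** 2)
print(cross, diag)
```

Only the diagonal terms survive in the second moment of the sum, which is what makes the resulting bound of order $1/n$ after normalization.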