ELECTRONIC COMMUNICATIONS in PROBABILITY

LARGE DEVIATIONS FOR MIXTURES

Suppose the probability measures $(\mu_n)$ on $\Theta$ obey a large deviation principle (LDP). Suppose too that $\mu_n$ is concentrated on $\Theta_n$ and that, for $\theta(n) \in \Theta_n$ with $\theta(n) \to \theta \in \Theta$, the probability measures $(P^n_{\theta(n)})$ on $X$ also obey an LDP. The main purpose of this paper is to give conditions which allow an LDP for the mixtures $(P_n)$, given by $P_n(A) = \int P^n_\theta(A)\, d\mu_n(\theta)$, to be deduced. Chaganty (1997) also considered this question, but under stronger assumptions. The treatment here follows that of Dinwoodie and Zabell (1992) who, motivated by exchangeability, considered the case where $\mu_n$ does not vary with $n$.


Introduction and motivation
Let $\mu_n$ be a (mixing) probability measure on the Borel σ-algebra of a topological space $\Theta$, concentrated on (the measurable set) $\Theta_n$. For each $\theta \in \Theta_n$, let $P^n_\theta$ be a probability measure on the Borel σ-algebra of the topological space $X$ for which the map $\theta \mapsto P^n_\theta(A)$ is measurable on $\Theta_n$ for every measurable $A \subset X$. For definiteness, let $P^n_\theta$ be given by some fixed probability measure on $X$ when $\theta \notin \Theta_n$. Based on these, the joint distribution $\tilde P_n$ and the marginal distribution $P_n$, obtained by mixing over $\theta$, have the usual definitions:
$$d\tilde P_n(\theta, x) = dP^n_\theta(x)\, d\mu_n(\theta) \quad\text{and}\quad dP_n(x) = \int_\Theta dP^n_\theta(x)\, d\mu_n(\theta) = \int_{\Theta_n} dP^n_\theta(x)\, d\mu_n(\theta).$$
Throughout, $\Theta$ and $X$ are assumed to be Hausdorff (i.e. distinct points can be separated by disjoint open sets) and $\Theta$ is assumed to be first countable (i.e. for each $\theta$ there is a countable collection of neighbourhoods of $\theta$ such that every neighbourhood of $\theta$ contains one of this collection), which implies that convergence in $\Theta$ can be described using sequences. However, $X$ is not assumed to be first countable. Hence $X$ could, for example, be a space of measures with the τ-topology, which is the topology induced by the bounded measurable functions. In this context, the assumption that $P^n_\theta$ is a measure on the Borel sets of $X$, rather than on some smaller σ-algebra, does rule out some natural examples; see Section 6.2 in Dembo and Zeitouni (1993).
The sequence of probability measures $(P_n)$ (on the Borel σ-algebra of the topological space $X$) obeys a large deviation principle (LDP) if there is a lower semicontinuous non-negative function $\lambda$ (a rate function) such that for every closed $F$ and open $G$
$$\limsup_{n\to\infty}\frac{1}{n}\log P_n(F) \le -\inf_{x\in F}\lambda(x) \quad\text{and}\quad \liminf_{n\to\infty}\frac{1}{n}\log P_n(G) \ge -\inf_{x\in G}\lambda(x).$$
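As a concrete instance of this definition, consider Cramér's theorem; this is an illustration chosen here and is not drawn from the source. The averages of $n$ independent Bernoulli($p$) variables obey an LDP with the good rate function $H(x \mid p) = x\log(x/p) + (1-x)\log((1-x)/(1-p))$. For $p < a < 1$ and the closed set $F = [a, 1]$, the infimum of $H$ over $F$ is $H(a \mid p)$, and in this case the upper bound is attained as a limit. The sketch below computes $-(1/n)\log P_n(F)$ exactly from binomial probabilities. It works in log space with `math.lgamma`, so the convergence to $H(a \mid p)$ can be watched. The function names are hypothetical.

```python
import math

def log_binom_pmf(n, k, p):
    """log P(S_n = k) for S_n ~ Binomial(n, p), assuming 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_upper_tail(n, a, p):
    """log P(S_n / n >= a), summed in log space (log-sum-exp) to avoid underflow."""
    k0 = math.ceil(n * a)
    terms = [log_binom_pmf(n, k, p) for k in range(k0, n + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def cramer_rate(x, p):
    """Cramér rate for Bernoulli(p) averages: relative entropy H(x | p), for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, a = 0.5, 0.7
for n in (100, 500, 2000):
    # -(1/n) log P_n([a, 1]) approaches inf_{x in [a, 1]} H(x | p) = H(a | p)
    print(n, -log_upper_tail(n, a, p) / n, cramer_rate(a, p))
```

The remaining gap is of order $(\log n)/n$. It comes from the polynomial prefactor, which the LDP deliberately ignores.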
The rate function $\lambda$ is called 'good' (or 'proper') if for every finite $\beta$ the set $\{x : \lambda(x) \le \beta\}$ is compact. The sequence satisfies a weak LDP if the upper bound holds for compact, rather than closed, $F$. Furthermore, the sequence of probability measures $(P_n)$ is said to be exponentially tight if for every $\alpha > 0$ there is a set $O_\alpha$ whose complement is compact with
$$\limsup_{n\to\infty}\frac{1}{n}\log P_n(O_\alpha) < -\alpha. \qquad (1)$$

The main idea is to combine large deviation results for $(P^n_\theta)$ and $(\mu_n)$ to give large deviation results for the marginal distributions $(P_n)$. The treatment draws heavily on that in Section 2 of Dinwoodie and Zabell (1992), who consider the case where $\mu_n$ does not depend on $n$. They used their results to study large deviations for exchangeable sequences in rather general spaces; this motivation led naturally to the assumption that $\mu_n$ is independent of $n$.

The basic framework adopted here is used by Chaganty (1997), who also provides a number of statistical applications, but the treatment here is more general in two main ways. First, the use of $\Theta_n$ rather than $\Theta$ is needed to deal with our motivating example; it is natural, and it produces genuine complications in the argument. Second, Chaganty (1997) confines attention to cases where $\Theta$ and $X$ are both Polish, whereas here greater topological generality, in the spirit of Dinwoodie and Zabell (1992), is maintained. A final, arguably less significant, difference is that the focus in Chaganty (1997) is on the LDP for the joint distributions $(\tilde P_n)$, rather than the marginals $(P_n)$. Chaganty's main result will be a consequence of the results here.

The motivating example for developing these results arose in the study of random graphs. The classical random graph is very well understood, but fails to match the graphs occurring in many applications. Recently, Cannings and Penman (2003) suggested a model with more flexibility; see also Penman (1998). Suppose a graph is to have $n$ vertices. To produce random graphs with a correlation structure between edge occurrences, Cannings and Penman (2003) proposed that each vertex is independently assigned one of a number of colours, and that the probability that an edge arises depends on the colours of its two vertices. The problem posed is to find an LDP for the number of edges as $n$ becomes large. This falls exactly into the framework proposed here.

To elucidate, consider the graph with $n$ vertices. Let $\theta$ be the vector of proportions of these vertices having each of the possible colours; then $\mu_n$ is the distribution of $\theta$. Given $n$ and $\theta$, the number of edges is a sum of independent (but not identically distributed) random variables; this specifies $P^n_\theta$. Note that for finite $n$ the possible values of $\theta$ are confined to those for which $n\theta$ has integer entries; this defines $\Theta_n$ here. The details of this application are discussed in Biggins and Penman (2003).

The next section contains the statements of the main results. The following two sections contain their proofs and those of various intermediate results. A brief final section mentions some possible directions for further work.
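A short simulation sketch of the coloured random graph model from the introduction may help fix ideas. The colour weights, the edge-probability matrix and the seed below are hypothetical inputs chosen for illustration; they are not taken from Cannings and Penman (2003). The helper names are likewise made up. Sampling the colours gives $\theta$, the vector of colour proportions, whose law is $\mu_n$. Given the colours, edges are independent Bernoulli trials, so the edge count is a sum of independent, non-identically distributed indicators, as for $P^n_\theta$.

```python
import random

def sample_colours(n, colour_weights, rng):
    """Independently assign each of n vertices a colour 0..k-1 with the given weights."""
    return rng.choices(range(len(colour_weights)), weights=colour_weights, k=n)

def colour_counts(vertex_colours, k):
    """Number of vertices of each colour; dividing by n gives theta, a point of Theta_n."""
    return [vertex_colours.count(c) for c in range(k)]

def count_edges(vertex_colours, edge_prob, rng):
    """Given the colours, include edge {u, v} independently with prob edge_prob[c_u][c_v].

    edge_prob is assumed symmetric, since edges are unordered pairs."""
    n = len(vertex_colours)
    edges = 0
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < edge_prob[vertex_colours[u]][vertex_colours[v]]:
                edges += 1
    return edges

def conditional_mean_edges(counts, edge_prob):
    """E[number of edges | colour counts]: the sum of the Bernoulli means over all pairs."""
    k = len(counts)
    mean = 0.0
    for i in range(k):
        mean += counts[i] * (counts[i] - 1) / 2 * edge_prob[i][i]   # same-colour pairs
        for j in range(i + 1, k):
            mean += counts[i] * counts[j] * edge_prob[i][j]          # mixed-colour pairs
    return mean

rng = random.Random(2003)                # hypothetical seed
weights = [0.4, 0.6]                     # hypothetical colour probabilities
edge_prob = [[0.5, 0.1], [0.1, 0.2]]     # hypothetical symmetric edge probabilities
n = 300
colours = sample_colours(n, weights, rng)
counts = colour_counts(colours, len(weights))
theta = [c / n for c in counts]
print(theta, count_edges(colours, edge_prob, rng), conditional_mean_edges(counts, edge_prob))
```

The simulation separates the two layers of randomness. The colour assignment plays the role of the mixing variable, and the edge indicators given the colours play the role of the conditional law.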

The main results
For easier reference in the statements and proofs, various assumptions will be labelled. The first two concern the LDP and the exponential tightness of the mixing distributions $(\mu_n)$ on $\Theta$.
ldpµ: $(\mu_n)$ satisfies an LDP with rate $\psi$.
tightµ: $(\mu_n)$ is exponentially tight.
When $\Theta$ is compact, tightµ holds automatically. The third assumption is an LDP statement for the conditional distributions; in Dinwoodie and Zabell (1992), this kind of condition is called exponential continuity. To state it, a little more notation is needed. Let $\hat\Theta$ be the limit set of sequences whose $n$th member is in $\Theta_n$; thus
$$\hat\Theta = \{\theta \in \Theta : \exists\, \theta(n) \in \Theta_n \text{ with } \theta(n) \to \theta\}.$$
It is easy to check that $\hat\Theta$ is closed; see Lemma 10 in the next section. Most applications will have $\hat\Theta = \Theta$.
exp-cty: $\hat\Theta$ is non-empty and, whenever $\theta(n) \in \Theta_n$ and $\theta(n) \to \theta \in \hat\Theta$, $(P^n_{\theta(n)})$ satisfies an LDP with rate $\lambda_\theta$.
When $\theta \notin \hat\Theta$, let $\lambda_\theta(x) = \infty$ for all $x$. Since $\lambda_\theta$ is a rate function, it is lower semicontinuous on $X$ for each $\theta$. The fourth assumption is in a similar vein.
lsc: The function $\lambda$, defined by
$$\lambda(x) = \inf_{\theta \in \Theta}\{\lambda_\theta(x) + \psi(\theta)\}, \qquad (2)$$
is lower semicontinuous on $X$.

Recall that a topological space is regular if for every point $x$ and every open $U$ containing $x$ there is an open $O$ also containing $x$ with its closure contained in $U$.
Theorem 1 Suppose ldpµ, tightµ, exp-cty and lsc all hold. Suppose also that $\Theta$ is regular. Then $(P_n)$ satisfies an LDP with rate function $\lambda$. When $\Theta$ is compact, tightµ holds automatically. When $\psi$ takes only the value 0, the requirement that $\Theta$ is regular is not needed.
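Theorem 1 can be seen at work numerically in a toy mixture. The example is chosen here for illustration and is not from the source. Let $\theta = J/n$ with $J \sim$ Binomial$(n, q)$, so that $\Theta = [0,1]$ is compact, $\Theta_n = \{0, 1/n, \dots, 1\}$ and $\psi(\theta) = H(\theta \mid q)$. Given $\theta$, let $P^n_\theta$ be the law of $S/n$ with $S \sim$ Binomial$(n, \theta)$, so that $\lambda_\theta(x) = H(x \mid \theta)$. Here $H$ is the Bernoulli relative entropy. The mixture rate is then $\lambda(x) = \inf_\theta\{\lambda_\theta(x) + \psi(\theta)\}$. This $\lambda$ is convex, being an infimum over $\theta$ of a jointly convex function, and it vanishes at $q$. Hence for $a > q$ the infimum over $[a, 1]$ is $\lambda(a)$. The sketch below evaluates the mixture tail exactly in log space and compares it with a grid approximation of $\lambda(a)$. The helper names and the grid size are arbitrary choices.

```python
import math

NEG_INF = float("-inf")

def log_binom_pmf(n, k, p):
    """log P(Binomial(n, p) = k), including the degenerate cases p = 0 and p = 1."""
    if p == 0.0:
        return 0.0 if k == 0 else NEG_INF
    if p == 1.0:
        return 0.0 if k == n else NEG_INF
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t) for t in terms)); handles all -inf."""
    m = max(terms)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(t - m) for t in terms))

def log_mixture_tail(n, a, q):
    """log P_n([a, 1]) where theta = J/n, J ~ Bin(n, q), and S ~ Bin(n, theta)."""
    k0 = math.ceil(n * a)
    terms = []
    for j in range(n + 1):                     # theta runs over Theta_n
        theta = j / n
        tail = log_sum_exp([log_binom_pmf(n, k, theta) for k in range(k0, n + 1)])
        # log mu_n({theta}) + log P^n_theta([a, 1])
        terms.append(log_binom_pmf(n, j, q) + tail)
    return log_sum_exp(terms)

def rel_entropy(x, p):
    """Bernoulli relative entropy H(x | p), for 0 <= x <= 1 and 0 < p < 1."""
    out = 0.0
    if x > 0:
        out += x * math.log(x / p)
    if x < 1:
        out += (1 - x) * math.log((1 - x) / (1 - p))
    return out

def mixture_rate(x, q, grid=10_000):
    """Grid approximation of lambda(x) = inf_theta {H(x | theta) + H(theta | q)}."""
    return min(rel_entropy(x, t) + rel_entropy(t, q)
               for t in (i / grid for i in range(1, grid)))

q, a = 0.3, 0.6
print(mixture_rate(a, q), -log_mixture_tail(400, a, q) / 400, rel_entropy(a, q))
```

The last printed value, $H(a \mid q)$, is the rate obtained with $\theta$ held fixed at $q$. Mixing over $\theta$ gives a strictly smaller rate, that is, a heavier tail, because the cheapest way to reach $a$ is to move $\theta$ part of the way as well.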
It turns out that, under the extra conditions that $X$ is regular and the rate functions $\psi$ and $\lambda_\theta$ are good, $\lambda$ is automatically a good rate function, so that lsc need not be assumed in Theorem 1. This is the essential content of the next theorem.
Theorem 2 Suppose ldpµ, tightµ and exp-cty all hold and that Θ and X are regular. Suppose also that the following conditions hold. goodψ: The rate function ψ in ldpµ is good. goodλ θ : For each θ ∈ Θ, the rate function λ θ in exp-cty is good. Then λ, defined at (2), is a good rate function and (P n ) satisfies an LDP with rate function λ.
The next result notes that often the rate function being good implies exponential tightness. It shows that the hypothesis tightµ in Theorem 2, and in later results, is superfluous when goodψ holds and $\Theta$ is locally compact or Polish. For locally compact spaces the result is contained in Exercise 1.2.19 in Dembo and Zeitouni (1993). For Polish spaces it is Lemma 2.6 in Lynch and Sethuraman (1987); see also Exercise 4.1.10 in Dembo and Zeitouni (1993).
Lemma 3 Suppose Θ is either locally compact or Polish and goodψ holds. Then tightµ holds.
The following condition is a natural extension of the property that $\lambda_\theta$ is lower semicontinuous on $X$ for each $\theta$.
jnt-lsc: $\lambda_\theta(x)$ is jointly lower semicontinuous in $(\theta, x)$ on $\Theta \times X$.
Dinwoodie and Zabell (1992) give some general conditions for jnt-lsc to hold; see also Lemma 3.2 in Chaganty (1997). The next result uses this condition to provide one way to check that lsc holds.
Theorems 1 and 2 approach the LDP for $(P_n)$ directly. The next result, which essentially contains Theorem 2.3 in Chaganty (1997), approaches the question through a weak LDP for the joint distributions $(\tilde P_n)$.
Theorem 5 Suppose ldpµ, exp-cty and jnt-lsc hold. Suppose too that both Θ and X are regular.
(a) Then the joint distributions $(\tilde P_n)$ satisfy a weak LDP with rate function $\lambda_\theta(x) + \psi(\theta)$. Furthermore, when $\Theta$ is locally compact the LDP in ldpµ can be replaced by a weak LDP and, similarly, when $X$ is locally compact a weak LDP is enough in exp-cty.
(b) If in addition the joint distributions $(\tilde P_n)$ are exponentially tight, then the (full) LDP holds with a good rate function and $(P_n)$ satisfies an LDP with the good rate function $\lambda$ defined at (2).
It is desirable to have conditions that ensure that the joint distributions $(\tilde P_n)$ are exponentially tight in order to use the last part of the previous result. The next three propositions, and Lemma 3, provide a variety of conditions for this. Before giving them, one further definition is needed. A family of sequences (of probability measures) is uniformly exponentially tight if, in (1), for every $\alpha > 0$ the same $O_\alpha$ can be used for every sequence.
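As a worked example of this definition, take a toy Gaussian case; it is chosen here and is not taken from the source. Let $P^n_\gamma$ be the $N(\gamma, 1/n)$ law of the mean $Z_n$ of $n$ independent $N(\gamma, 1)$ variables. Fix $\theta \in \mathbb{R}$ and $\alpha > 0$, put $r = \sqrt{2\alpha + 2}$, and take $O_\alpha = \{x : |x| > |\theta| + r\}$, whose complement is compact. Let $\theta(n) \to \theta$ be any sequence. On $O_\alpha$ we have $|x - \theta(n)| > t_n$, where $t_n = |\theta| + r - |\theta(n)|$. The Gaussian tail bound $\Pr(|N(0, 1/n)| > t) \le 2e^{-nt^2/2}$, valid for $t \ge 0$, then gives, for all large $n$:

```latex
\begin{aligned}
P^n_{\theta(n)}(O_\alpha)
  &\le \Pr\bigl(|Z_n - \theta(n)| > t_n\bigr)
   \le 2\exp\bigl(-n t_n^2/2\bigr),
  \qquad t_n = |\theta| + r - |\theta(n)| \to r, \\
\limsup_{n\to\infty} \frac{1}{n}\log P^n_{\theta(n)}(O_\alpha)
  &\le -\frac{r^2}{2} = -(\alpha + 1) < -\alpha .
\end{aligned}
```

Since $O_\alpha$ depends only on $\theta$ and $\alpha$, and not on the particular sequence, the family $\{(P^n_{\theta(n)}) : \theta(n) \to \theta\}$ is uniformly exponentially tight in this example.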
Proposition 6 Suppose that tightµ holds. Suppose also that the following condition holds.
uni-tight: For each $\theta \in \Theta$, the family of sequences $\{(P^n_{\theta(n)}) : \theta(n) \in \Theta_n,\ \theta(n) \to \theta\}$ is uniformly exponentially tight.
Then the joint distributions $(\tilde P_n)$ are exponentially tight. Dinwoodie and Zabell (1992) give conditions under which uni-tight holds when, for each $\theta$, $P^n_\theta$ is the distribution (on a rather general space) of the average of independent identically distributed variables. In Lemma 3 it is noted that tightµ can be replaced by goodψ when $\Theta$ is locally compact or Polish. The next proposition is in a similar spirit.

Proposition 7 If X is locally compact then, in Proposition 6, uni-tight can be replaced by goodλ θ .
The final result in this trio is not useful for getting the LDP for the marginals $(P_n)$ from Theorem 5, since its conditions are the same as those in Theorem 2 except for a more restrictive condition on $X$. However, it could be used to strengthen the weak LDP for the joint distributions $(\tilde P_n)$ to a (full) LDP.

Proofs of Theorems 1 and 2 and Proposition 4
By definition, a function f on X is lower semicontinuous at x if for each c < f (x) there is an open U containing x such that f (y) > c for every y ∈ U .
Lemma 9 If X is regular and f is lower semicontinuous on X then for every x and c < f (x) there is a closed set C x with x in its interior and f (y) > c for all y ∈ C x .
Proof. Fix $x$ and $c < f(x)$. By the definition of lower semicontinuity, there is an open set $U$ containing $x$ with $f(y) > c$ for $y \in U$. Applying regularity, there is an open set $V_x$ containing $x$ with its closure inside $U$. Take $C_x$ to be the closure of $V_x$; then $x \in V_x$, which lies in the interior of $C_x$, and $f(y) > c$ for all $y \in C_x \subset U$.
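is also an open neighbourhood of $\theta_{k(i)}$ and so there is an $n_{k(i)}$ with $\theta^{(k(i))}_n \in U_i$ for all $n \ge n_{k(i)}$. The sequence $\vartheta_n = \theta^{(k(i))}_n \in \Theta_n$ for $n_{k(i)} \le n < n_{k(i+1)}$ converges to $\theta$, and so $\theta$ belongs to the limit set of the $\Theta_n$.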
In particular, for G open in X, To demonstrate this, suppose it fails. Then there are n(i) > n(i − 1) and θ(i) ∈ Θ n(i) with θ(i) → θ such that which contradicts the lower bound in the LDP in exp-cty.
Thus, for n ≥ N , and so, using ldpµ, The last part comes from taking $G^* = \Theta \times G$, for then the joint distribution satisfies $\tilde P_n(G^*) = P_n(G)$.
Lemma 12 Suppose ldpµ and exp-cty hold. Suppose too that tightµ holds and $\Theta$ is regular. Let $F \subset X$ be closed. Then
$$\limsup_{n\to\infty}\frac{1}{n}\log P_n(F) \le -\inf_{x\in F}\lambda(x).$$
Proof. Fix F . Let c and d be such that Using tightµ, let O be such that  To demonstrate this, suppose it fails. Then there are $n(i) > n(i-1)$ and $\theta(i) \in \Theta_{n(i)}$ with $\theta(i) \to \theta$ such that $P^{n(i)}_{\theta(i)}(F)$ > exp(−n(i)Λ (θ)), and then lim sup log P n(i) which contradicts the upper bound in the LDP in exp-cty. Furthermore, using the lower semicontinuity of ψ, by taking O θ to be smaller if necessary, and, using regularity of Θ, there is an open set V θ with closure $\bar V_\theta$ such that θ ∈ V θ and is an open covering of S. Since S is compact a finite subcover (V θ(i) ) 1≤i≤k exists. Then, for sufficiently large n, exp(−nΛ (θ(i))) exp(−nψ (θ(i))).
Hence, since $c < 1/\varepsilon$, Since $c < d$ and $\varepsilon > 0$ are arbitrary, the result follows.
Lemma 13 In Lemma 12, if ψ takes only the value 0 then the hypothesis that Θ is regular is not needed.
Proof. When ψ takes only the value 0 there is no need to introduce V θ ; it suffices to take a finite subcover from (O θ : θ ∈ Θ).
Proof of Theorem 1. The last part of Lemma 11 gives the lower bound for open sets, and Lemma 12 gives the upper bound for closed sets. Finally, λ is lower semicontinuous by assumption. Lemma 13 gives the simplification contained in the final assertion.
Some further work is needed to deal with the conditions implying that λ is good, to produce a proof of Theorem 2.
Then the sets {y : λ(y) ≤ c} and {y : λ(y) ≤ c} are the same, and λ and λ agree on this set.
Proof. The set K is compact because ψ is good. Then, since λ θ is non-negative, it is easy to see that λ(y) ≥ λ(y) ≥ min{λ(y), c}, which gives the result.
Lemma 15 Suppose ldpµ, exp-cty, goodψ and goodλ θ hold. Suppose also that both Θ and X are regular. Then λ, given by (2), is a good rate function.
Proof. Take $\varepsilon$ and $\alpha$ with $0 \le \alpha < \alpha + 2\varepsilon < \infty$. Let $K$ be $\{\theta : \psi(\theta) \le c = \alpha + 2\varepsilon\}$, which is compact, by goodψ. Now, by Lemma 14, Denote this set by $L_\alpha$ and suppose $\alpha$ was selected so that $L_\alpha$ is not compact. Then there exists a net $\{(\theta(i), x(i)) : i \in I\} \subset K \times X$ such that $\{x(i)\} \subset L_\alpha$, $\{x(i)\}$ has no convergent subnet and $\lambda_{\theta(i)}(x(i)) + \psi(\theta(i)) \le \alpha + \varepsilon$ for all $i$. Note that this implies that each $\theta(i)$ belongs to the limit set of the $\Theta_n$ because, by definition, $\lambda_\theta(x) = \infty$ for $\theta$ outside that set. Since $K$ is compact and first countable, there is a subsequence $(\theta(i_k), x(i_k))$ such that $\theta(i_k) \to \theta$, where $\theta$ is also in the limit set since that set is closed. Furthermore, because $\psi$ is lower semicontinuous, $\liminf_k \psi(\theta(i_k)) \ge \psi(\theta)$, and so $\psi(\theta) \le \alpha + \varepsilon$. Take $\beta = \alpha + 3\varepsilon$. The level set of $\lambda_\theta$ given by $L^\beta_\theta = \{x : \lambda_\theta(x) \le \beta - \psi(\theta)\}$ is compact, by goodλθ. Hence, for large enough $k_0$, $C_0 = \{x(i_k) : k \ge k_0\}$ must be in the complement of $L^\beta_\theta$. Following exactly the argument in Lemma 2.1 in Dinwoodie and Zabell (1992), $C_0$ is closed and so, using the regularity of $X$ and the compactness of $L^\beta_\theta$, there are open sets separating $C_0$ and $L^\beta_\theta$. Hence, there is an open set $U$ containing $C_0$ with closure $C$ in the complement of $L^\beta_\theta$. Now take $\vartheta(i, n) \in \Theta_n$ with $\vartheta(i, n) \to \theta(i)$. By the LDP lower bound in exp-cty, Hence, selecting suitable subsequences, there is an increasing sequence $n(k)$ and $\vartheta(k) \in \Theta_{n(k)}$ such that $\vartheta(k) \to \theta$ and log P n(k) By the LDP upper bound in exp-cty because $C$ is in the complement of $L^\beta_\theta$. Since $U \subset C$, this contradicts the previous inequality. Therefore $L_\alpha$ must be compact. It is therefore also closed, since $X$ is Hausdorff, which means $\lambda$ is lower semicontinuous.
Proof of Theorem 2. The last part of Lemma 11 gives the lower bound for open sets, and Lemma 12 gives the upper bound for closed sets. Lemma 15 shows that λ is a good rate function under the stated conditions.
For each θ, because $\lambda_\theta(y)$ is jointly lower semicontinuous and $\psi$ is lower semicontinuous, there are open sets $O_\theta \subset \Theta$ and $U_\theta \subset X$ containing $\theta$ and $x$ respectively such that throughout The $\{O_\theta : \theta \in C\}$ cover $C$, and so there is a finite subcover, which is open and contains x. Then for $y \in U$, λ(y) ≥ min provided $\varepsilon$ is small enough. Then $x \in U \subset \{y : \lambda(y) > c\}$, proving the result.

Proof of Theorem 5 and associated results
Proof of Theorem 5. First, the lower bound for open sets is contained in Lemma 11. Second, by ldpµ and jnt-lsc, $\lambda_\theta(x) + \psi(\theta)$ is lower semicontinuous. To prove (a) it remains to consider the upper bound for compact sets. Fix $F \subset \Theta \times X$ compact. By lower semicontinuity and regularity, for each $(\theta, x)$ there are open sets $O \subset \Theta$ and $U \subset X$ containing $\theta$ and $x$ respectively, with closures $\bar O$ and $\bar U$, such that By taking $O$ to be smaller, if necessary, there is an integer $N$ such that for $n \ge N$ and $\gamma \in O \cap \Theta_n$, $P^n_\gamma(\bar U)$ ≤ exp(−nλ (x, θ)) and $\mu_n(\bar O)$ ≤ exp(−nψ (θ)). Thus, for n ≥ N , ≤ exp(−nλ θ (x)) exp(−nψ (θ)).

Hence lim sup
As $(\theta, x)$ varies over $F$, the corresponding sets $O \times U$ cover $F$. Taking a finite subcover, using it to get an upper bound on $\tilde P_n(F)$ and then letting $\varepsilon$ go to zero completes the proof. In the locally compact cases, $O$ and $U$ can be taken so that $\bar O$ and $\bar U$ are compact, and so a weak LDP is enough to bound the corresponding terms. This completes the proof of (a). Part (b) follows immediately from Lemma 1.2.18 in Dembo and Zeitouni (1993), which gives the LDP for the joint distributions $(\tilde P_n)$, and the contraction principle (given in Theorem 4.2.1 of Dembo and Zeitouni (1993)) applied to the projection from $\Theta \times X$ to $X$, which gives the LDP for $(P_n)$.
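The contraction step in part (b) can be made explicit. What follows is a sketch of the standard argument; the symbols $\pi$ and $I$ are introduced here only for this display. Write $\pi(\theta, x) = x$ for the continuous projection and $I(\theta, x) = \lambda_\theta(x) + \psi(\theta)$ for the good joint rate function. The marginal is the image of the joint distribution under $\pi$, and the contraction principle transfers the LDP with the rate shown on the second line:

```latex
\begin{aligned}
\tilde P_n\bigl(\pi^{-1}(A)\bigr) &= \tilde P_n(\Theta \times A)
  = \int_{\Theta} P^n_\theta(A)\, d\mu_n(\theta) = P_n(A), \\
\inf_{(\theta, x)\,:\,\pi(\theta, x) = y} I(\theta, x)
  &= \inf_{\theta \in \Theta} \{\lambda_\theta(y) + \psi(\theta)\} = \lambda(y).
\end{aligned}
```

The goodness of $I$ matters here. The contraction principle requires a good rate function, and it then also delivers a good rate function for the image.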
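Proof of Proposition 6. Fix $\alpha$. Using tightµ, let $O$ be such that $\limsup_{n\to\infty} n^{-1}\log\mu_n(O) < -\alpha$ and let $S$ be the (compact) complement of $O$. For $\theta$ in the limit set of the $\Theta_n$, let $U_\theta \subset X$ be a set with compact complement such that for any $\theta(n) \in \Theta_n$ with $\theta(n) \to \theta$ The existence of $U_\theta$ is guaranteed by uni-tight. For $\theta$ outside the limit set, let $U_\theta = X$. Then there is an open set $V_\theta$, containing $\theta$, and an integer $N_\theta$ such that for $n \ge N_\theta$ and $\gamma \in V_\theta \cap \Theta_n$, $P^n_\gamma(U_\theta) < \exp(-n\alpha)$.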
Otherwise a suitable subsequence contradicts uni-tight. The collection $\{V_\theta : \theta \in S\}$ covers $S$. Take a finite subcover $(V_{\theta(i)} : 1 \le i \le k)$ of $S$; then let $U$ be the set $\bigcap_i U_{\theta(i)}$ and $K$ be its complement, which, as the union of a finite number of compact sets, is itself compact. Then $(O \times X) \cup (S \times U)$ has the complement $S \times K$, which is compact, and, for $n$ large enough which suffices, since $\alpha$ was arbitrary.
Lemma 16 Suppose X is locally compact and exp-cty holds. Then uni-tight holds when goodλ θ holds.
Proof. Locally compact means that for every x ∈ X there is U x open and C x compact with x ∈ U x ⊂ C x . Fix θ ∈ Θ. Take β < α < ∞. Since λ θ is good, Consider {P n θ(n) } where θ(n) ∈ Θ n , and θ(n) → θ. Then, by exp-cty, Since the set O is independent of the particular sequence (θ(n)) the result is proved.
Proof of Proposition 7. This is an immediate consequence of Lemma 16.
Proof of Proposition 8. The argument is borrowed from the last part of the proof of Theorem 2.3 in Chaganty (1997). Fix $\alpha$. Using tightµ, let $O \subset \Theta$, with a compact complement $S$, be such that $\limsup_{n\to\infty} n^{-1}\log\mu_n(O) < -\alpha/2$.
By Theorem 2, $(P_n)$ satisfies an LDP with the good rate function $\lambda$. Then, by Lemma 3, $(P_n)$ is exponentially tight and so there is an open set $U \subset X$ with a compact complement $K$ such that $\limsup_{n\to\infty} n^{-1}\log P_n(U) < -\alpha/2$.

Possible extensions and refinements
This is a brief note of things that have not been attempted but seem to be of some interest. Clearly, it would be desirable to have some variant of Lemma 16 for Polish spaces. However, the proof that a good rate function implies exponential tightness in a Polish space seems to work only for a given sequence; see Lemma 2.6 in Lynch and Sethuraman (1987). Hence, it does not produce the uniformity needed in uni-tight.

This note aims to generalise Theorem 2.3 in Dinwoodie and Zabell (1992). In that theorem, the mixing LDP, ldpµ, and the associated exponential tightness, tightµ, hold automatically, while exponential continuity, exp-cty, and joint lower semicontinuity of $\lambda_\theta(x)$, jnt-lsc, are taken as hypotheses. In a further study, Dinwoodie and Zabell (1993) give results that relax these assumptions and also their assumption that $\Theta$ is compact, which is, in a sense, analogous to tightµ here. Their ideas could be taken up in this context.

As hinted in the introduction, the possibility that $X$ is a space of measures with the τ-topology raises the question of whether the arguments can be adapted to allow $P^n_\theta$ to be defined on a smaller σ-algebra than the Borel one. Finally, the approach to large deviations described in Puhalskii (2001) could be explored; Theorem 1.8.9 and Lemma 1.8.12 there are relevant. Roughly translated into the language here, they give conditions on $\psi(\theta)$ and $\lambda_\theta(x)$ which make $\lambda_\theta(x) + \psi(\theta)$ a rate function on $\Theta \times X$.