Monte Carlo methods for improper target distributions

Abstract: Monte Carlo methods (based on iid sampling or Markov chains) for estimating integrals with respect to a proper target distribution (that is, a probability distribution) are well known in the statistics literature. If the target distribution π happens to be improper then it is shown here that the standard time average estimator based on Markov chains with π as its stationary distribution will converge to zero with probability 1, and hence is not appropriate. In this paper, we present some limit theorems for regenerative sequences and use these to develop some algorithms to produce strongly consistent estimators (called regeneration and ratio estimators) that work whether π is proper or improper. These methods may be referred to as regenerative sequence Monte Carlo (RSMC) methods. The regenerative sequences include Markov chains as a special case. We also present an algorithm that uses the domination of the given target π by a probability distribution π0. Examples are given to illustrate the use and limitations of our algorithms.


Introduction
Let π be a distribution (that is, a measure) on a set S and let f be a real valued function on S, integrable with respect to π, that is, ∫_S |f| dπ < ∞. Let λ := ∫_S f dπ. The problem addressed in this paper is to find appropriate statistical procedures to estimate λ. That is, the problem is to find a way to generate data (X_1, X_2, ..., X_n) and find a statistic λ̂_n such that for large n, λ̂_n is close to λ in a suitable sense. When π is known to be a probability distribution, that is, when π(S) = 1, a well established statistical procedure is the classical (iid) Monte Carlo method. This method requires one to generate iid observations (X_1, X_2, ..., X_n) with distribution π and estimate λ by λ̂_n ≡ Σ_{i=1}^{n} f(X_i)/n. By the law of large numbers for iid random variables, λ̂_n → λ almost surely as n → ∞. If in addition ∫_S f² dπ < ∞, then the accuracy of the estimate λ̂_n, that is, the behavior of the quantity (λ̂_n − λ), is (by the classical CLT) of the order n^{−1/2}.
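As a concrete illustration of the classical iid Monte Carlo recipe just described (a sketch only, not from the paper; the target N(0, 1), the choice f(x) = x², and the use of Python rather than the R used for the paper's simulations are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def iid_monte_carlo(f, sampler, n):
    """Classical Monte Carlo: average f over n iid draws from the target pi."""
    x = sampler(n)
    return float(np.mean(f(x)))

# Illustrative choice: pi = N(0, 1) and f(x) = x^2, so lambda = 1.
est = iid_monte_carlo(lambda x: x**2, rng.standard_normal, 100_000)
```

By the CLT the error here is of order n^{−1/2}, matching the rate quoted above.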
Over the past twenty years an alternative statistical procedure known as Markov chain Monte Carlo (MCMC) has come into extensive use in the statistics literature. In particular, if generating iid observations from π is not easy, then classical Monte Carlo cannot be used to estimate λ. The MCMC method is to generate a Markov chain {X_n}_{n≥0} with π as its invariant distribution that is suitably irreducible. Then one appeals to the law of large numbers for such chains, which says that for any initial value X_0, the "time average" Σ_{j=1}^{n} f(X_j)/n converges almost surely to the "space average" λ = ∫_S f dπ. Indeed, the pioneering paper [15] introduced this method in a setting where the target distribution π, although a probability distribution, was the so called Gibbs measure on the configuration space in statistical mechanics, and it was difficult to simulate from π. In that fundamental paper, the authors constructed a Markov chain {X_n}_{n≥0} that is appropriately irreducible and has π as its stationary distribution. This idea was adapted to some statistical problems by [10]. This whole apparatus was rediscovered in the late 80's and early 90's by a number of statisticians, and since then the subject has exploded [see e.g. 21]. Both of the above methods, that is, iid Monte Carlo and MCMC, depend crucially on the assumption that π is a probability distribution.
A natural question suggested by the above discussion is the following. Suppose π is not proper (that is, π(S) = ∞) but f : S → R is such that ∫_S |f| dπ < ∞. How do we estimate λ by statistical tools, as distinct from classical numerical analytic tools? Clearly, iid Monte Carlo generation of data from π is not feasible since π(S) = ∞. However, it may be possible to generate a Markov chain {X_n}_{n≥0} that is appropriately irreducible and admits the (improper) distribution π as its stationary distribution. In such situations, it can be shown (see Corollary 2) that the usual estimator Σ_{j=1}^{n} f(X_j)/n will indeed not work; that is, it will not converge to λ, but rather will go to zero with probability 1. In particular, in Bayesian statistics, if an improper prior is assigned to the parameters then the resulting posterior distribution π (for the given data) is not guaranteed to be proper. If posterior propriety is not checked and π happens to be improper, then Corollary 2 implies that the usual MCMC estimator Σ_{j=1}^{n} f(X_j)/n based on a Markov chain {X_n}_{n≥0} with π as its stationary (improper) distribution will simply not work, in the sense that it will go to zero with probability 1 and not to λ. The use of MCMC for improper posterior distributions does not seem to have received much attention in the literature; exceptions include [22, 12], and [13].
In this paper we show that, given a distribution π on some space S that may or may not be proper, it may be possible to generate data {X_n}_{n≥0} such that more general time averages (called regeneration and ratio estimators in Theorem 1) than the standard Monte Carlo estimator Σ_{j=1}^{n} f(X_j)/n will converge to λ. This is based on the idea of regenerative sequences (Definition 1), which include iid samples and Markov chains (null or positive recurrent) as special cases (see Section 2). For Harris recurrent Markov chains, the possibility of using the ratio estimator proposed here was previously noted by [22]. Based on these theoretical results we present several algorithms for statistical estimation of λ (see Sections 2 and 3). Section 3 discusses the use of a special class of regenerative sequences, namely regenerative sequences (not necessarily Markov chains) based on null recurrent random walks, for estimating countable sums and integrals when S = R^d for any d < ∞. Some examples are given in Section 4. Some of these examples illustrate the use of null recurrent Gibbs chains for consistent estimation of quantities like λ. This use of null recurrent Markov chains does not seem to have been noted before in the statistical literature. Section 4 also presents a generalization of importance sampling methods and its potential application in Bayesian sensitivity analysis.
Another method discussed in this paper uses the idea of dominating the given target π by a probability distribution π_0 and then using the standard Monte Carlo methodology (iid or MCMC) with π_0 as its stationary distribution. This is discussed in Section 5. Some concluding remarks appear in Section 6. Finally, proofs of theorems and corollaries are given in Appendix B, and some technical results appear in Appendix A and Appendix C.

Convergence results for regenerative sequences
In this section we introduce the concept of regenerative sequences and establish some convergence results for these. This in turn provides some useful statistical tools for estimating parameters. These tools are more general than the Monte Carlo methods based on iid observations or Markov chains. In particular, they are applicable to the case when the target distribution is not proper. The following definition of a regenerative sequence is given in [2].

Definition 1. Let (Ω, F, P) be a probability space and (S, S) be a measurable space. A sequence of random variables {X_n}_{n≥0} defined on (Ω, F, P) with values in (S, S) is called regenerative if there exists a sequence of integer valued (random) times 0 ≤ T_0 < T_1 < T_2 < ... such that the excursions {X_n : T_j ≤ n < T_{j+1}}, τ_{j+1}, j ≥ 0, are iid, where τ_j = T_j − T_{j−1} for j = 1, 2, ...; that is, for all k_0, k_1, ..., k_r ∈ N (the set of positive integers), A_{q,j} ∈ S, 0 ≤ q < k_j, j = 0, 1, ..., r, and r ≥ 1,

P(τ_{j+1} = k_j, X_{T_j + q} ∈ A_{q,j}, 0 ≤ q < k_j, j = 0, 1, ..., r) = Π_{j=0}^{r} P(τ_1 = k_j, X_{T_0 + q} ∈ A_{q,j}, 0 ≤ q < k_j).

The random times {T_n}_{n≥0} are called regeneration times and {τ_j}_{j≥1} are the excursion times.
Example 1. Let {X_n}_{n≥0} be an irreducible, recurrent Markov chain with countable state space S = {s_0, s_1, s_2, ...}. Let X_0 = s_0, T_0 = 0, and T_{r+1} = inf{n > T_r : X_n = s_0} for r ≥ 0. By the strong Markov property, the excursions between successive returns to s_0 are iid, so {X_n}_{n≥0} is regenerative.

Example 2. Let {X_n}_{n≥0} be a Markov chain with general state space (S, S). Assume that S is countably generated and {X_n}_{n≥0} is Harris recurrent. Then it has been independently shown by [3] and [18] that for some n_0 ≥ 1, the Markov chain {Z_n}_{n≥0} has an atom that it visits infinitely often, where Z_n ≡ X_{n n_0} for n = 0, 1, .... By the Markov property, the excursions between consecutive visits to this atom are iid, making {Z_n}_{n≥0} a regenerative sequence (see e.g. Meyn and Tweedie [16, Section 17.3.1] or Athreya and Lahiri [2, Theorem 14.2.8]).
Example 3. Let {Y_n}_{n≥0} be a Harris recurrent Markov chain as in Example 2. Given {Y_n = y_n}_{n≥0}, let {A_n}_{n≥0} be independent positive integer valued random variables.

The following theorem presents some useful limit results for regenerative sequences that can be used for constructing consistent estimators of integrals with respect to improper measures.

Theorem 1. Let {X_n}_{n≥0} be a regenerative sequence with regeneration times {T_n}_{n≥0}. Let

π(A) := E( Σ_{n=T_0}^{T_1−1} I(X_n ∈ A) ), A ∈ S. (2.1)

(The measure π is known as the canonical (or, occupation) measure for {X_n}_{n≥0}.) Let N_n := max{k ≥ 0 : T_k ≤ n} be the number of regenerations by time n, and let f, g : S → R be π-integrable with ∫_S g dπ ≠ 0. Then, as n → ∞,

(i) (regeneration estimator) λ̂_n := Σ_{j=0}^{n} f(X_j)/N_n → ∫_S f dπ almost surely; (2.2)

(ii) (ratio estimator) R̂_n := Σ_{j=0}^{n} f(X_j) / Σ_{j=0}^{n} g(X_j) → ∫_S f dπ / ∫_S g dπ almost surely. (2.3)

A proof of Theorem 1 is given in Appendix B. Theorem 1(i) for continuous time regenerative sequences is available in [1]. Theorem 1(ii) for Harris recurrent Markov chains is available in Meyn and Tweedie [16, Theorem 17.3.2]. In Definition 1, if the excursions {X_n : T_j ≤ n < T_{j+1}}, τ_{j+1}, are assumed to be iid only for j ≥ 1, then {X_n}_{n≥0} is called a delayed regenerative process. A straightforward modification of the proof of Theorem 1 shows that it holds for any delayed regenerative process.
The next theorem describes the asymptotic behavior of the regeneration times T_n and the number of regenerations N_n. The result on the asymptotic distribution of T_n presented in the theorem below was already proved in Feller [8, Chapter XIII, Section 6].
Theorem 2. Let {T_j}_{j≥0} and {τ_j}_{j≥1} be as defined in Definition 1. Suppose in addition that

P(τ_1 > n) = n^{−α} L(n),

where 0 < α < 1 and L(•) is slowly varying at ∞. Then, as n → ∞,

(i) T_n/a_n →d V_α, (2.5)

where {a_n} satisfies n a_n^{−α} L(a_n) → 1 and V_α is a positive random variable with a stable distribution with index α; and

(ii) N_n/b_n →d V_α^{−α}, (2.6)

where b_n = n^α/L(n) and α and V_α are as defined in (2.5).
As mentioned before, the proof of Theorem 2(i) is available in Feller [8, Chapter XIII, Section 6]. The proof of Theorem 2(ii) for random walk Markov chains on R using the method of moments is presented in [4, page 229]. We provide a proof of Theorem 2(ii) in Appendix B.
Recall from the Introduction that our goal is to obtain a consistent estimate of λ = ∫_S f dπ for a given (possibly infinite) measure π and an integrable function f : S → R. Theorem 1 yields the following generic recipe for estimating λ: find a regenerative sequence {X_n}_{n≥0} whose canonical measure is π and use the regeneration estimator (2.2), or the ratio estimator (2.3) together with a function g whose integral with respect to π is known. In order to use the above recipe, one needs to find a regenerative sequence with π as its canonical measure. Later in this section, we see that a Markov chain {X_n}_{n≥0} that has a recurrent atom and has π as its invariant measure can be used for this purpose.
The following corollary uses (2.6) to provide an alternative estimator of λ.
Corollary 1. From (2.2) and (2.6) it follows that, under the hypotheses of Theorem 1 and Theorem 2, Σ_{j=0}^{n} f(X_j)/b_n →d λ V_α^{−α} as n → ∞. If the distribution of V_α^{−α} is known and E(V_α^{−α}) is finite, then we can estimate ∫_S f dπ in the following way. Run k independent copies of {X_n} and let D_n^r = Σ_{j=0}^{n} f(X_j^r)/b_n, where {X_j^r}_{j=0}^{n} are the realized values of the chain in the rth run, r = 1, 2, ..., k. Since for large n, D_n^r, r = 1, 2, ..., k, are independent draws from an approximation to the distribution of λ V_α^{−α}, the statistic λ̂_{n,k} := Σ_{r=1}^{k} D_n^r / (k E(V_α^{−α})) estimates λ. It can be shown that there exist subsequences k_{j_l} ↑ ∞ and n_{k_{j_l}} ↑ ∞ such that λ̂_{n_{k_{j_l}}, k_{j_l}} → λ almost surely.
For the Markov chains in Example 1, it can be shown that P(X_n = s_i for some 1 ≤ n < ∞ | X_0 = s_j) = 1 for all s_i, s_j ∈ S; that is, for such chains any s_i ∈ S is a recurrent point and the Markov chain regenerates every time it reaches s_i. If {X_n}_{n≥0} is a Markov chain with general state space (S, S) and is φ-irreducible with a nontrivial σ-finite reference measure φ, then by the regeneration method of [3] and [18], the Markov chain {X_n}_{n≥0} is equivalent to an appropriate Markov chain {X̌_n} on an enlarged space Š, so that {X̌_n} admits an atom ∆ in Š [16, Theorem 5.1.3]. If in addition we assume that {X_n}_{n≥0} is Harris recurrent as in Example 2, then P(X̌_n = ∆ for some 1 ≤ n < ∞ | X̌_0 = x) = 1 for all x ∈ Š, i.e., ∆ is a recurrent point for {X̌_n}. In Appendix A we present some results on the existence and the uniqueness of invariant measures for Markov chains. Using these results and Theorem 1, the following remark discusses how integrals with respect to a given target distribution can be estimated using Markov chains.
Remark 1. Let π be a given (proper or improper) target distribution and {X_n}_{n≥0} be a Markov chain as in Examples 1 or 2 with invariant distribution π. Then from Corollary 3 in Appendix A we know that π is the unique (up to a multiplicative constant) invariant measure for {X_n}_{n≥0}. Also from Theorem 1 we know that for any two integrable functions f, g : S → R with ∫_S g dπ ≠ 0, no matter what the distribution of the starting value X_0, as n → ∞ the ratio R̂_n = Σ_{j=0}^{n} f(X_j) / Σ_{j=0}^{n} g(X_j) converges almost surely to ∫_S f dπ / ∫_S g dπ. If we choose g such that E_π g = ∫_S g dπ is easy to evaluate, then λ = ∫_S f dπ can be consistently estimated by R̂_n (E_π g). In particular, if g = I_K for some K ∈ S with π(K) ∈ (0, ∞) known, then Σ_{j=0}^{n} g(X_j) = N_n(K), the number of times the Markov chain visits K by time n. Further, from Corollary 3 in Appendix A we note that one may use the regeneration estimator λ̂_n as in (2.2) if π({∆}) is known.
From Remark 1 we see that even when π is not proper, the regeneration estimator (2.2) and the ratio estimator (2.3) based on a Markov chain provide consistent estimates of λ. On the contrary, in Corollary 2 below we show that when π is improper, the usual time average MCMC estimator Σ_{j=1}^{n} f(X_j)/n is not consistent for λ; indeed, we show that it converges to zero with probability 1. A result similar to Corollary 2, but under a different set of hypotheses, was established by [13].
Theorem 3. Let {X_n}_{n≥0} be a regenerative sequence with occupation measure π as in (2.1). Let π(S) = E(T_1) = ∞. Then for any f : S → R with ∫_S |f| dπ < ∞, the standard time average estimator Σ_{i=1}^{n} f(X_i)/n converges to 0 with probability 1.
The proof of Theorem 3 is given in Appendix B. Corollary 2 follows from Theorem 3 since, as pointed out in Example 2, every Harris recurrent Markov chain is regenerative. In the improper target case, N_n, the number of regenerations, grows more slowly than n, as noted in (2.6). On the other hand, from (2.2) we know that the time sums Σ_{i=1}^{n} f(X_i) grow exactly at the rate of N_n. This may explain why the usual time average MCMC estimator is not consistent for λ when π is improper.

Algorithms based on random walks
In this section we focus on the use of a special class of regenerative sequences, namely regenerative sequences (not necessarily Markov chains) based on null recurrent random walks on Z and their variants, for estimating countable sums and integrals. We begin with the case when S is countable.
Let S be any countable set and f : S → R be such that Σ_{s∈S} |f(s)| < ∞. Our goal is to obtain a consistent estimator of λ = Σ_{s∈S} f(s). Without loss of generality (see Remark 3) we take S to be Z, the set of integers. Let {X_n}_{n≥0} be a simple symmetric random walk (SSRW) on Z starting at X_0. That is, X_{n+1} = X_n + δ_{n+1}, n ≥ 0, where {δ_n}_{n≥1} are iid with distribution P(δ_1 = +1) = 1/2 = P(δ_1 = −1) and independent of X_0. This Markov chain {X_n}_{n≥0} is null recurrent with the counting measure {π(i) ≡ 1 : i ∈ Z} being its unique (up to a multiplicative constant) invariant measure [see e.g. 16, Section 8.4.3]. From Example 1 of Section 2, we know that {X_n}_{n≥0} is regenerative and it satisfies the hypothesis of Corollary 3 in Appendix A with ∆ = 0. Since π(∆) = 1, Corollary 3 in Appendix A allows us to identify the occupation measure π with the counting measure, and hence from (2.2) it follows that for any initial X_0, λ̂_n = Σ_{j=0}^{n} f(X_j)/N_n → λ almost surely, where N_n = Σ_{j=0}^{n} I(X_j = 0) is the number of visits to zero by {X_j}_{j=0}^{n}. So λ̂_n is a consistent estimator of λ. With X_0 = 0, T_0 = 0 and T_{r+1} = inf{n > T_r : X_n = 0}, r ≥ 0, the hypotheses of Theorem 2 hold with α = 1/2, where V_α is as defined in (2.5); since the density function of V_{1/2} is explicitly available, Corollary 1 can also be used here. The above discussion implies the following algorithm, which can be used to estimate λ = Σ_{s∈S} f(s)π(s), where π is a given (possibly infinite) measure on a countable set S and f : S → R is a function satisfying Σ_{s∈S} |f(s)|π(s) < ∞. Again, without loss of generality (see Remark 3) we assume that S = Z.
Algorithm I:
1. Run the simple symmetric random walk (SSRW) {X_n}_{n≥0} on Z.
2. Let λ̂_n = Σ_{j=0}^{n} f(X_j)π(X_j)/N_n, where N_n is the number of visits by {X_j}_{j=0}^{n} to 0 by time n. Then λ̂_n → λ almost surely; that is, λ̂_n is a consistent estimator of λ.

Remark 3. Let S be any countable set and, as before, let N be the set of positive integers. Then there exists a one-to-one, onto map ζ : N → S. Consider also a one-to-one, onto map ξ : Z → N; relabeling the states of the SSRW by ζ ∘ ξ then carries the method over to S. We now present some examples of the function ζ.
(i) Note that, if S = Z, we can take ζ to be simply ξ^{−1}.
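Algorithm I can be sketched in a few lines of Python (an illustrative sketch only; the paper's simulations used R). Here π is the counting measure and f(m) = 1/m² for m ≥ 1, so λ = π²/6 ≈ 1.645 as in the simulation example of Section 4; the naive time average is also computed, to exhibit the collapse predicted by Theorem 3:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Step 1: run the simple symmetric random walk on Z started at X_0 = 0.
X = np.concatenate(([0], np.cumsum(rng.choice([-1, 1], size=n))))

# f(m) = 1/m^2 for m >= 1, f(m) = 0 otherwise; pi is the counting measure.
f = np.where(X >= 1, 1.0 / np.maximum(X, 1) ** 2, 0.0)

# Step 2: divide by the number of visits to 0 (= number of regenerations).
Nn = int(np.sum(X == 0))
algo1_est = f.sum() / Nn        # consistent for lambda = pi^2/6
naive_est = f.sum() / (n + 1)   # usual time average; tends to 0 (Theorem 3)
```

Convergence is slow, since N_n grows only like √n; for a general countable S one first relabels the states via ζ ∘ ξ as in Remark 3.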
Next we consider the case S = R². Let f : R² → R be Lebesgue integrable and let X_{n+1} = X_n + η_{n+1}, n ≥ 0, where {η_i = (η_i^1, η_i^2)}_{i≥1} are iid random vectors with η_i^1, η_i^2 iid Uniform (−1/2, 1/2). From Durrett's (2010) Theorem 4.2.8 we know that, starting anywhere X_0 = x, {X_n}_{n≥0} visits any circle infinitely often with probability one. Also, since η_i has an absolutely continuous distribution whose density is bounded below on a neighborhood of the origin [see e.g. 7, Example 6.8.2], a regeneration scheme as in [3] exists, making {X_n}_{n≥0} regenerative. Since the Lebesgue measure on R² is an invariant measure for {X_n}_{n≥0}, by Corollary 3 in Appendix A and (2.3) of Theorem 1, a consistent estimator of λ is given by R̂_n = Σ_{j=0}^{n} f(X_j)/N_n(K), where N_n(K) is the number of visits by {X_j}_{j=0}^{n} to the unit square K = (−1/2, 1/2) × (−1/2, 1/2). Since X_0 = (0, 0), note that N_n(K) ≥ 1.
By the local central limit theorem, as n → ∞, P(X_n ∈ K) ∼ 6/(πn). (Here, as n → ∞, c_n ∼ d_n means that lim_{n→∞} c_n/d_n = 1.) This yields the following proposition.

Proposition 1. Let N_n(K) be as defined above. Then E(N_n(K)) ∼ (6/π) log n as n → ∞.

Next, using discrete renewal theory and Karamata's Tauberian theorem, arguing as in [4, Theorem 10.30], one can show that for each r = 1, 2, ...,

E( ((π/6) N_n(K)/log n)^r ) → μ_r,

and {μ_r}_{r≥1} is the moment sequence of an exponential distribution and hence satisfies Carleman's condition for determining the distribution. This implies that the following theorem holds.

Theorem 4. Let N_n(K) be as defined above. Then (π/6) N_n(K)/log n →d Y as n → ∞, where Y is an Exponential(1) random variable.
Some recent results (analogous to Theorem 4) for random walks on Z² can be found in [9] and [19]. As in the case of S = R, we can use Theorem 4 to obtain an alternative estimator of λ (Corollary 1).
Let S = R and π be absolutely continuous with density h. From the above discussion we see that the following algorithm can be used to estimate λ = ∫_R f(x)π(dx), where f : R → R is π-integrable.

Algorithm II:
1. Run a random walk {X_n}_{n≥0} on R with Uniform (−1/2, 1/2) increments.
2. Let R̂_n = Σ_{j=0}^{n} f(X_j)h(X_j)/N_n(K), where N_n(K) is the number of visits by {X_j}_{j=0}^{n} to the unit interval K = (−1/2, 1/2). Then R̂_n → λ almost surely; that is, R̂_n is a consistent estimator of λ.

From the above discussion we know that Algorithm II can be suitably modified for the case S = R².

Remark 4. When S is countable, the method proposed here works with the SSRW replaced by any mean 0 random walk on Z such that X_n/n → 0 in probability; by the Chung–Fuchs theorem such a random walk is recurrent [7, page 195]. Similarly, when S = R or R², any mean 0, finite variance random walk with an absolutely continuous increment distribution can be used. On the other hand, a symmetric random walk on R need not be recurrent. For example, consider a random walk Markov chain {X_n}_{n≥0} on R where the increment random variable η has a symmetric stable distribution with index α, 0 < α < 1, that is, the characteristic function of η is ψ(t) = exp(−|t|^α). It can be shown that {X_n}_{n≥0} is not recurrent [see 6, page 253], in the sense that for any nondegenerate interval I in R, Σ_{n=0}^{∞} P(X_n ∈ I) < ∞, and hence the number of visits by {X_n}_{n≥0} to I is finite with probability 1.
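A minimal Python sketch of Algorithm II follows (illustrative choices are ours: the target π is Lebesgue measure, so its density h ≡ 1, and f(x) = 1/x² on [1, ∞), so λ = ∫_1^∞ x^{−2} dx = 1, the integral estimated in Section 4):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Step 1: random walk on R with Uniform(-1/2, 1/2) increments, started at 0.
X = np.concatenate(([0.0], np.cumsum(rng.uniform(-0.5, 0.5, size=n))))

# f(x) = 1/x^2 for x >= 1; pi is Lebesgue measure here, so its density h = 1.
fh = np.where(X >= 1.0, 1.0 / np.maximum(X, 1.0) ** 2, 0.0)

# Step 2: divide by the visits to K = (-1/2, 1/2); note Lebesgue(K) = 1.
NnK = int(np.sum(np.abs(X) < 0.5))
est = fh.sum() / NnK            # consistent for lambda = 1
```

As the simulations in Section 4 indicate, the interval estimates from this algorithm can be wide unless n is very large.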
Lastly we present an algorithm that is based on the SSRW on Z and a randomization tool to estimate integrals with respect to an absolutely continuous measure π on R^d, d < ∞. Let f : R^d → R be Borel measurable and π be an absolutely continuous distribution on R^d with Radon–Nikodym derivative p(•). Assume that ∫_{R^d} |f(x)|p(x)dx < ∞. The following theorem provides a consistent estimator of λ = ∫_{R^d} f(x)p(x)dx.

Theorem 5. Let {X_n}_{n≥0} be a SSRW on Z. Let {U_{ij} : i = 0, 1, ...; j = 1, 2, ..., d} be iid Uniform (−1/2, 1/2) random variables, independent of {X_n}_{n≥0}, and let κ : Z → Z^d be one-to-one and onto. Define Y_n = κ(X_n) + (U_{n1}, ..., U_{nd}), n ≥ 0. Then λ̂_n = Σ_{j=0}^{n} f(Y_j)p(Y_j)/N_n → λ almost surely, where N_n = Σ_{j=0}^{n} I(X_j = 0) is the number of visits to zero by {X_j}_{j=0}^{n}.

It may be noted that the sequence {Y_n}_{n≥0} defined above is not a Markov chain but is indeed regenerative as in Theorem 1, with the {T_n} sequence corresponding to the returns of the SSRW {X_n}_{n≥0} to zero. The proof of Theorem 5 is given in Appendix B. Note that the function κ can be chosen as κ = ζ ∘ ξ, where ζ and ξ are as given in Remark 3. Theorem 5 suggests the following variation of Algorithm I (Algorithm III): run the SSRW {X_n}_{n≥0}, set Y_n = κ(X_n) + (U_{n1}, ..., U_{nd}), and estimate λ by λ̂_n = Σ_{j=0}^{n} f(Y_j)p(Y_j)/N_n.
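For d = 1, the bijection κ can simply be the identity on Z, which gives the following Python sketch of the Theorem 5 estimator (the target density p = N(0, 1) and f(y) = y², so that λ = ∫ y² p(y) dy = 1, are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1_000_000

# SSRW on Z and iid Uniform(-1/2, 1/2) randomization; kappa = identity (d = 1).
X = np.concatenate(([0], np.cumsum(rng.choice([-1, 1], size=n))))
U = rng.uniform(-0.5, 0.5, size=n + 1)
Y = X + U                       # regenerative; occupation measure = Lebesgue on R

p = np.exp(-(Y**2) / 2) / np.sqrt(2 * np.pi)  # target density p = N(0, 1)
f = Y**2                                       # lambda = int f p dy = 1

Nn = int(np.sum(X == 0))        # regenerations = visits of the SSRW to 0
est = np.sum(f * p) / Nn        # Theorem 5 estimator
```

Note that {Y_n} is not Markov, yet the estimator only needs the regeneration times of the underlying SSRW.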

Examples
In this section we demonstrate the use of the statistical algorithms developed in the previous sections with some examples.Some of our simulations also illustrate that the convergence of these algorithms could be rather slow.

Simulation examples
We first consider estimating λ = Σ_{m=1}^{∞} 1/m² using SSRW (Algorithm I) with X_0 = 0. The left panel in Figure 1 shows the estimates for 6 values of n (log_10 n = 3, 4, ..., 8, where log_10 denotes logarithm base 10). The plot includes the median estimates and the 90% empirical confidence interval estimates based on 30 independent repetitions. The median and 90% interval estimates for n = 10^8 are 1.654 and (1.560, 1.701) respectively. Note that the true value is λ = π²/6 ≈ 1.645. The time to run the SSRW chain for 10^8 steps using R [20] on an old Intel Q9550 2.83GHz machine running Windows 7 is about 3 seconds. Next we used the same chain (SSRW with X_0 = 0) to estimate λ = Σ_{m=1001}^{∞} 1/(m − 1000)² and found that the first 10^5 iterations yielded an estimate of 0 (red lines in the right panel in Figure 1). However, estimation of λ can be improved if we start the SSRW at X_0 = 1000 and define regenerations as the returns to 1000 (green lines in the right panel in Figure 1).
Next we consider estimating λ = ∫_1^∞ 1/x² dx (see the left panel in Figure 2) using Algorithms II (red line) and III (green line) for the same six n values mentioned above. We observe that the interval estimates based on Algorithm II are much wider. Next, consider an improper measure π on R² specified through a pair of conditional densities f_{X|Y}(•|y) and f_{Y|X}(•|x), and a set A ⊂ R²; notice that the integral π(A) is not available in closed form. Let {(X_n, Y_n)}_{n≥0} be the Gibbs sampler Markov chain that uses f_{X|Y}(•|y) and f_{Y|X}(•|x) alternately. Then {(X_n, Y_n)}_{n≥0} has π as its invariant measure. From Theorem 3 we know that the usual MCMC estimator Σ_{j=0}^{n} I_A(X_j, Y_j)/n → 0 almost surely, since π is an improper measure (see the right panel in Figure 2). It can be verified that {(X_n, Y_n)}_{n≥0} is regenerative (a proof of this is given in Appendix C). So by Theorem 1, for any initial (X_0, Y_0) and for any −∞ < a < b < ∞, the resulting ratio estimator R̂_n is a consistent estimator of π(A). The solid lines in the left panel in Figure 3 show the median (based on 30 independent repetitions) estimates of π((−1, 1) × (−1, 1)) using Algorithm II with uniform random walk in R² (red line), Algorithm III (green line) and the above ratio estimator based on the Gibbs sampler (with −a = b = 1000) (blue line) for the same six n values mentioned above. The median estimates for n = 10^8 corresponding to the Gibbs sampler, Algorithm II (with uniform random walk in R²) and Algorithm III are 4.438, 2.801, and 3.060 respectively. Numerical integration yields 3.056 as an estimate of π(A). As in the previous example, here also Algorithm III performs better than Algorithm II.
The next example is as follows. Let f(x, y) = exp(−xy), 0 < x, y < ∞, so that f_{X|Y}(x|y) = f(x, y)/∫_{R_+} f(x′, y)dx′ = y exp(−xy). Then for each y, f_{X|Y}(•|y) is a probability density on R_+. Consider the Gibbs sampler {(X_n, Y_n)}_{n≥0} that uses the two conditional densities f_{X|Y}(•|y) and f_{Y|X}(•|x) alternately. [5] considered this example and found that the usual estimator Σ_{j=0}^{n} f_{X|Y}(x|Y_j)/n for the marginal density f_X(x) = ∫_{R_+} f(x, y)dy = ∫_{R_+} f_{X|Y}(x|y)f_Y(y)dy = 1/x breaks down. Indeed, from Theorem 3 we know that this estimator converges with probability 1 to zero, as the above Gibbs sampler chain is regenerative (see Proposition 3 in Appendix C) with an improper invariant measure. ([11] showed that this Gibbs sampler visits any set A with positive Lebesgue measure on R² infinitely often.) [5] suggested as a remedy that one restrict the conditional densities to compact intervals. Theorem 1 provides alternatives that do not require this restriction.
By Theorem 1 we have the following consistent estimator of f_X(x): for any 0 < a < b < ∞, R̂_n = Σ_{j=0}^{n} f_{X|Y}(x|Y_j) / Σ_{j=0}^{n} I(a < Y_j < b) satisfies log(b/a) R̂_n → f_X(x) almost surely. Although log(b/a) R̂_n is a consistent estimator of f_X(x) based on the Gibbs sampler, our simulations show that in this case the Gibbs chain is apt to take (positive) values so large or so small that they fall outside the range of machine precision. Here we use Algorithms II and III to estimate f_X(2) = 1/2. The right panel in Figure 3 shows the estimates (median and 90% interval based on 30 repetitions) using Algorithms II (red line) and III (green line) for the same six n values mentioned above. The median estimates corresponding to the red and green lines for n = 10^8 are 0.503 and 0.501 respectively.
Remark 5. Note that in order to use the ratio estimator (2.3) based on the Gibbs sampler {(X_n, Y_n)}_{n≥0} in the above two examples, by Theorem 1 it suffices to establish that these chains are regenerative. This avoids having to establish the Harris recurrence of {(X_n, Y_n)}_{n≥0}, which is required in order to apply Theorem 17.3.2 of Meyn and Tweedie [16].

Generalization of importance sampling
Importance sampling is a Monte Carlo technique in which a sample from one density is used to estimate an expectation with respect to another. Suppose π is a probability density on S (with respect to a measure ν) and we want to estimate E_π f = ∫_S f(x)π(x)ν(dx). Often such a π is known only up to a normalizing constant; that is, π(x) = c p(x), where c > 0 is a finite constant that is not easy to evaluate and p(x) is a known function. Suppose π̃ is another density (not necessarily proper) with respect to the measure ν such that {x : π(x) > 0} ⊂ {x : π̃(x) > 0}. Now note that

E_π f = ∫_S f(x)[p(x)/π̃(x)] π̃(x)ν(dx) / ∫_S [p(x)/π̃(x)] π̃(x)ν(dx),

where ∫_S f(x)[p(x)/π̃(x)]π̃(x)ν(dx) = E_π f /c is well defined and finite, and 0 < ∫_S [p(x)/π̃(x)]π̃(x)ν(dx) = 1/c < ∞. So if we can generate a Harris recurrent Markov chain (regenerative sequence) {X_n}_{n≥0} with invariant (occupation) density π̃, then by Theorem 1 we have, for any initial X_0,

R̂_n = Σ_{j=0}^{n} f(X_j)p(X_j)/π̃(X_j) / Σ_{j=0}^{n} p(X_j)/π̃(X_j) → E_π f almost surely.

Thus R̂_n is a consistent estimator of E_π f. Note that here the ratio limit theorem (Theorem 1(ii)) is naturally applicable, since the above importance sampling estimator of E_π f has the form of a ratio; that is, in order to use it, we do not need to find any function g such that ∫_S g dπ̃ (≠ 0) is known. Note also that standard importance sampling using MCMC requires π̃ to be a probability density and {X_n}_{n≥0} to be positive recurrent [10]. Here we are able to drop both these conditions.
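A small Python sketch of this generalized importance sampling (the specific choices are ours: ν = Lebesgue measure, target π = N(0, 1) known only through p(x) = exp(−x²/2), improper trial density π̃ ≡ 1 realized by a uniform-increment random walk, and f(x) = x², so E_π f = 1):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Harris recurrent random walk on R; Lebesgue measure (improper) is invariant.
X = np.concatenate(([0.0], np.cumsum(rng.uniform(-0.5, 0.5, size=n))))

p = np.exp(-(X**2) / 2)         # unnormalized target density (c unknown)
f = X**2                         # E_pi f = 1 when pi = N(0, 1)

# Ratio estimator; the trial density pi-tilde = 1 cancels from both sums.
est = float(np.sum(f * p) / np.sum(p))
```

The normalizing constant c cancels in the ratio, and the improper trial density trivially has heavier tails than the target.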

An example involving Bayesian sensitivity analysis
The importance sampling estimator (4.3) can be applied here, with the parameter θ corresponding to x in (4.3) and the posterior density π_h(θ|y) playing the role of the target π. In general, the importance density π̃ in (4.3) should have heavier tails than π_h(θ|y) for all h ∈ H, as otherwise the ratio estimate will tend to be unstable, because the weights p_h(θ)/π̃(θ) can be arbitrarily large, where p_h(θ) = ℓ(θ|y)π_h(θ). On the other hand, it may be difficult to construct a proper importance density π̃ which has heavier tails than π_h(θ|y) for all h ∈ H. Since the importance sampling estimator (4.3) can be used even when π̃ is improper, in the following example we take π̃ to be the improper uniform density on R. Let y ≡ (y_1, y_2, ..., y_m) be a random sample from N(μ, 1). Suppose the conjugate normal prior N(μ_0, σ_0²) is assumed for μ. Here h = (μ_0, σ_0). Let {μ_j}_{j≥0} be the random walk on R with Uniform (−1/2, 1/2) increment distribution. As mentioned in Section 3, {μ_j}_{j≥0} is a Harris recurrent Markov chain with state space R and the Lebesgue measure as its invariant measure. The estimator (4.3) for E_h(f(μ)|y) in this case becomes

Ê_h = Σ_{j=0}^{n} f(μ_j) φ(ȳ; μ_j, 1/m) φ(μ_j; μ_0, σ_0²) / Σ_{j=0}^{n} φ(ȳ; μ_j, 1/m) φ(μ_j; μ_0, σ_0²), (4.4)

where ȳ = Σ_{i=1}^{m} y_i/m and φ(x; a, b) is the density of the normal distribution with mean a and variance b, evaluated at the point x. A single random walk chain {μ_j}_{j=0}^{n} started at μ_0 = 0 with n = 10^7 is used to estimate E_h(f(μ)|y) with f(μ) ≡ μ for different values of (μ_0, σ_0). The plots in Figure 4 give graphs of the estimate (4.4) as μ_0 and σ_0 vary. For making these plots, we took m = 5 and ȳ = 0.
We fixed σ_0 = 0.5 for the graph in the left panel in Figure 4, and the plot is based on the estimate (4.4) for 41 values of μ_0. (We took μ_0 ranging from −10 to 10 in increments of 0.5.) The right panel in Figure 4 is based on the estimate (4.4) for 410 values of (μ_0, σ_0), with μ_0 ranging from −10 to 10 in increments of 0.5 and σ_0 ranging from 0.1 to 1 in increments of 0.1. Both plots show median estimates of the posterior expectations E_h(μ|y) based on 30 independent repetitions of random walk chains of length n = 10^7 started at 0. The pointwise empirical 90% confidence intervals are too narrow to distinguish these quantiles from the median estimates in these plots. The median estimates also match (up to two decimal places) the true values E_{(μ_0,σ_0)}(μ|y) = mσ_0²ȳ/(mσ_0² + 1) + μ_0/(mσ_0² + 1).
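The estimator (4.4) is easy to reproduce; the sketch below (in Python rather than the R used for the paper's figures, with a shorter chain of length 10^6 and the single hyperparameter value (μ_0, σ_0) = (2, 0.5) as our illustrative choice) recomputes the posterior mean from one random walk chain and compares it with the closed form value quoted above:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1_000_000
m, ybar = 5, 0.0                # data summary, as in the example above

def phi(x, mean, var):
    """Normal density with the given mean and variance."""
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

# One uniform-increment random walk chain, reusable for every hyperparameter h.
mu = np.concatenate(([0.0], np.cumsum(rng.uniform(-0.5, 0.5, size=n))))

def post_mean(mu0, s0):
    """Estimator (4.4) with f(mu) = mu; weights = likelihood x prior."""
    w = phi(ybar, mu, 1.0 / m) * phi(mu, mu0, s0**2)
    return float(np.sum(mu * w) / np.sum(w))

est = post_mean(2.0, 0.5)
truth = (m * 0.5**2 * ybar + 2.0) / (m * 0.5**2 + 1.0)  # closed form value
```

Since the chain does not depend on h, `post_mean` can be called repeatedly for a whole grid of hyperparameters, which is exactly the sensitivity analysis use case.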

Reduction to proper target distribution
In the previous sections we have presented several algorithms for estimating integrals with respect to a measure π that may be proper or improper.In this section, we propose one more algorithm that involves reduction to a proper target distribution.We begin with the following result.
Lemma 1.Let π be a σ-finite measure on a measurable space (S, S).
(i) Let {A_n}_{n≥1} be a finite or countable sequence of S-sets such that the A_n's are disjoint, π(A_n) < ∞ for all n ≥ 1, and S = ∪_{n=1}^{∞} A_n. Let J = {j : π(A_j) > 0} and let {p_j : j ∈ J} be such that p_j > 0 for all j ∈ J and Σ_{j∈J} p_j = 1. Then π is dominated by the probability measure π_0 defined by

π_0(A) = Σ_{j∈J} p_j π(A ∩ A_j)/π(A_j), A ∈ S. (5.1)

That is, for A ∈ S, π_0(A) = 0 implies π(A) = 0.
(ii) Further, the Radon–Nikodym derivative h = dπ/dπ_0 is given by

h(x) = Σ_{j∈J} [π(A_j)/p_j] I_{A_j}(x), x ∈ S. (5.2)

Remark 6. Since there can be many choices of {A_n}_{n≥1}, and since we can use any {p_j : j ∈ J} such that p_j > 0 for all j ∈ J and Σ_{j∈J} p_j = 1, the above lemma shows that, given a σ-finite measure π, there are uncountably many probability measures π_0 dominating π.
Let π be the given σ-finite (possibly infinite, that is, π(S) = ∞) measure on (S, S) and suppose we can find a probability measure π_0 on S such that π is dominated by π_0. (For example, if π is absolutely continuous on R^d, we can take π_0 to be the probability measure corresponding to the d-variate Student's t distribution.) Let h ≡ dπ/dπ_0 be the Radon–Nikodym derivative of π with respect to π_0, and suppose we know h only up to a normalizing constant, that is, h = c h*, where the finite constant c > 0 is unknown and h* is known. In that case, if we have an integrable (with respect to π) function g : S → R such that ∫_S g dπ (≠ 0) is known, we have a consistent estimator of λ = ∫_S f dπ via a new ratio estimator

λ̂_n = (∫_S g dπ) Σ_{j=0}^{n} f(X_j)h*(X_j) / Σ_{j=0}^{n} g(X_j)h*(X_j), (5.3)

where {X_j}_{j=0}^{n} is either an iid sample from π_0 or realizations of a positive Harris recurrent Markov chain with invariant measure π_0 started at any initial X_0 = x_0.
Lemma 1 and the above discussion are formalized in the following algorithm for estimating λ = ∫_S f dπ, where π is a σ-finite measure on S.
Algorithm IV:
1. Find a probability measure π_0 dominating π (there are infinitely many such probability measures). Let h be the density of π with respect to π_0.
2. Draw an iid sample {X_j}_{j=0}^{n} from π_0, or run a positive Harris recurrent Markov chain {X_j}_{j=0}^{n} (starting at any X_0 = x_0) with stationary distribution π_0. Estimate λ by the usual estimator Σ_{j=0}^{n} f(X_j)h(X_j)/(n + 1) if h is completely known; otherwise, if h is known only up to a multiplicative constant, estimate λ via the ratio estimator (5.3).
One of the difficulties with Algorithm IV is that if π_0 has thinner tails than π, then the function h (or h*) can take rather large values at some of the X_j's, making the estimator highly variable.
Example 5.1. Let S = R and let f : R → R be Lebesgue integrable. Our goal is to obtain a consistent estimator of λ = ∫_R f(x)π(dx) using Algorithm IV, where π(dx) = dx is the Lebesgue measure on R.
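As a small illustration of this example (a sketch only; the choice f(x) = exp(−x²), so that λ = ∫_R e^{−x²} dx = √π, is ours), the following Python snippet applies Algorithm IV with π_0 the Cauchy distribution, i.e. the Student's t distribution with one degree of freedom suggested above, and with h completely known:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# pi_0 = Cauchy (Student's t, 1 df) dominates pi = Lebesgue measure on R.
X = rng.standard_cauchy(size=n)
h = np.pi * (1.0 + X**2)        # h = d(pi)/d(pi_0) = 1 / (Cauchy density)

f = np.exp(-(X**2))             # lambda = int_R f(x) dx = sqrt(pi)
est = float(np.mean(f * h))     # Algorithm IV with h completely known
```

Here f · h is bounded, so the thin-tail difficulty noted above does not arise; with a thinner-tailed π_0 (e.g. normal) the weights h(X_j) would be far more variable.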

Fig 3. Estimates using Gibbs sampler and random walks.

Fig 4. Posterior expectations for different values of hyperparameters.
Since the density function of V_{1/2} is available [see e.g. 8, page 173], simple calculations show that V_{1/2}^{−1/2} =d √(π/2) |Z|, where Z ∼ N(0, 1). (We can also use (2.5) to find this distribution.) So Σ_{j=0}^{n} f(X_j)/b_n →d √(π/2) λ|Z|, and since E(|Z|) = √(2/π) is known, we can use Corollary 1 to obtain an alternative estimator of λ.

Remark 2. It can be shown [see 23] that the number of distinct values of X_j for 0 ≤ j ≤ n grows at the rate of √n. Thus, after n steps of running the SSRW Markov chain, the computation of Σ_{j=0}^{n} f(X_j) involves evaluation of f at only approximately √n sites in Z.
The importance sampling estimator (4.3) can be useful in Bayesian analysis when one wishes to see the effect of a change in the prior distribution on a posterior expectation. Let {π_h(θ) : h ∈ H} be a family of prior densities, where h is called a hyperparameter. The resulting posterior densities are denoted by π_h(θ|y) ∝ ℓ(θ|y)π_h(θ), where y denotes the observed data and ℓ(θ|y) is the likelihood function. Let f(θ) be a quantity of interest, where f(•) is a function. In sensitivity analysis one wants to assess the changes in the posterior expectation E_h(f(θ)|y) as h varies. If two different values of h result in almost the same posterior expectations, then the use of either of those two values will not be controversial. On the other hand, if E_h(f(θ)|y) changes significantly as h varies, one may want to make a plot of the posterior expectations to see what values of h cause big changes. If posterior expectations are not available in closed form, then the importance sampling estimator (4.3) based on a single chain may be used to estimate E_h(f(θ)|y) for all h ∈ H.