Simulation of Hyper-Inverse Wishart Distributions for Non-decomposable Graphs

We propose an efficient solution to the problem of sampling from a hyperinverse Wishart distribution in non-decomposable graphs. The method relies on local computations based on the standard junction tree representation of graphs and distribution theoretical results of constraint Wishart matrices. We hope with this note to clarify a few confusing points that appeared in previous attempts to solve this problem. Some key words: Hyper-inverse Wishart; Junction Trees; Non-decomposable graphs; Posterior simulation.

1. Introduction [3] proposed a general method to direct sample from the hyper-inverse Wishart for both decomposable and non-decomposable graphs.The approach is based on the compositional form of the joint distribution over a sequence of subgraphs defined by the junction tree as if Σ ∼ HIW G (b, D), where G = (V, E), and for a sequence of perfectly ordered prime components {P 1 , S 2 , P 2 , . . ., P k } the density takes the form The efficiency of the proposed sampling strategy derives from the fact that all matrix operations are done at the component level and therefore the complexity of the algorithm is dependent only on the dimension of the largest prime component of G.
In decomposable graphs their methodology follows directly from basic conditioning results of normals and Wishart distributions and it works perfectly.For non-decomposable graphs, however, they used the general distributional theory for the Cholesky decomposition of a hyper-inverse Wishart defined by [2] in conjunction with their decomposition idea.In this paper, we identify two problems in their approach and propose a direct sampling solution that preserves the computational efficiencies associated with matrices computations being carried out at the level of the prime components.
For clarity of presentation we assume from this point forward that G = (V, E) can be decomposed into two prime components and one separator {P 1 , S 2 , P 2 }.Assume further that the subgraph G P1 is complete and G P2 is not.This implies no loss of generality as the proposed sampling process can be repeated down the junction tree in the presence of both complete and incomplete additional components.

Sampling in a single non-decomposable prime component
Let us start by focusing on sampling from the marginal distribution of the Write D −1 P2 = T ′ T and K = Φ ′ Φ as Cholesky decompositions and define Ψ = ΦT −1 .Following the nomenclature of [2], the free elements of Φ are those φ ij such that (i, j) is an edge in P 2 .From Theorem 1 and equation (38) of [2], these free elements have density defined by where p is the cardinality of subgraph G P2 and the ψ ij s, for (i, j) / ∈ E and i < j, are well-defined functions of the free elements of G P2 .
Based on expression (3), Section 4.3 of [2] suggested a direct algorithm to sample K using chi-squares and normal random variables.This suggestion was adapted in [3] in the context of the decomposition in (1).This is where the first problem with their approach appears.The derivations of [2] are correct but the sampling suggestion used in [3] is not.According to (3), the free elements are not independent normal and chi-square random variates; they are implicitly dependent through the term of exp − 1 2 (i,j) / ∈E i<j ψ 2 ij .We suggest the following rejection sampling modification [7] as a fix for the problem: The algorithm in [3] only has Step (i) and (iii).However, a sample of Ψ in Step (i) is not from equation ( 3), but from the instrumental distribution: Therefore, the Accept-Reject method in Step (ii) is necessary to ensure that this sample is from the true distribution of Ψ, that is, equation (3).Note that exp − 1 where C is the inverse of the normalizing constant of the distribution of equation (3).By Corollary 2.17 and Algorithm A.4 of [7], we can generate a correct sample of Ψ using the rejection sampling of Step (ii).

Sampling HIW distributions for non-decomposable graphs
With the correct mechanism to sample from a single non-decomposable prime component available, we can revisit the decomposition strategy of [3] to draw samples of Σ in all of G.It should be clear that sampling Σ P1 is a simple task as Σ P1 ∼ IW(b, D P1 ) so the problem is reduced to sampling from the conditional distribution p(Σ P2 | Σ S2 ) as Σ S2 is a complete subset of Σ P1 .The second mistake in [3] was the suggestion that obtaining samples from this conditional distribution in non-decomposable graphs could be done by matching elements of the Cholesky decomposition of the Σ −1 P2 with the ones obatained from the decomposition of Σ −1 S2 [see 3, Section 3.3].We now fix this problem by defining the appropriate conditioning strategy.
Write Σ P2 and K = Σ −1 P2 as By noting that S 2 is complete so that the subgraph G P2 is collapsible onto The independency result in (4) allow us to define the sampling scheme for p(Σ P2 | Σ S2 ) as follows: 1. Sample K ∼ W G (b, D P2 ) using the rejection sampling of Section 2 providing values to the submatrices K R and With samples from Σ P1 , Σ S2 and Σ P2 available, the remaining elements of Σ where (i, j) / ∈ E can be computed via the completion operation derived in [5] and presented in equation ( 6) of [3].

A simulated example
We consider one simulated example that involves a two prime component nondecomposable graph G in Figure 1.The simulation method was applied to generate 5000 samples from the hyper-inverse Wishart distribution HIW G (203, D) where To demonstrate the efficacy of the our sampler, we first compute the theoretically exact value of E(Σ E | D, d, G), where Σ E denotes the free elements of Σ.The Corollary 2 of [8] implies that this expectation can be calculated as The theoretically exact value and the Monte Carlo estimate based the sample mean of the 5000 simulated covariance matrices are , respectively, where • denotes non-free elements to highlight structure.The mean and median number of samples required to accept one sample in the rejection sampling of Step (ii) in Section 2 is 1.35 and 1 respectively.

Final remarks
We have identified and fixed two problems involving the sampling of hyperinverse Wishart random variables conditional on non-decomposable graphical models as proposed by [2] and [3].Their ideas are still used in our approach and it now clear that direct samples can be obtained in a computationally efficient way by using the junction tree of a graph.Finally, we would like to point the reader to two recently proposed alternative solutions to this problem: [1], [6] and [4].It is important to highlight, however, that all of these strategies are based on iterative Markov chain Monte Carlo and therefore have to rely on the eventual convergence of the chain.Our method is a direct sampling strategy that, at each step, delivers exact samples from the hyper-inverse Wishart distribution.Upon publication of this note, a R-package for sampling from general hyper-inverse Wishart distributions will be available at the author's website.