Objective Bayesian Analysis for Gaussian Hierarchical Models with Intrinsic Conditional Autoregressive Priors

. Bayesian hierarchical models are commonly used for modeling spatially correlated areal data. However, choosing appropriate prior distributions for the parameters in these models is necessary and sometimes challenging. In particular, an intrinsic conditional autoregressive (CAR) hierarchical component is often used to account for spatial association. Vague proper prior distributions have frequently been used for this type of model, but this requires the careful selection of suitable hyperparameters. In this paper, we derive several objective priors for the Gaussian hierarchical model with an intrinsic CAR component and discuss their properties. We show that the independence Jeﬀreys and Jeﬀreys-rule priors result in improper posterior distributions, while the reference prior results in a proper posterior distribution. We present results from a simulation study that compares frequentist properties of Bayesian procedures that use several competing priors, including the derived reference prior. We demonstrate that using the reference prior results in favorable coverage, interval length, and mean squared error. Finally, we illustrate our methodology with an application to 2012 housing foreclosure rates in the 88 counties of Ohio.


Introduction
Bayesian hierarchical models with intrinsic conditional autoregressive (CAR) priors are used for many statistical models for spatially dependent data in applications such as disease mapping (Clayton and Kaldor, 1987;Bell and Broemeling, 2000;Moraga and Lawson, 2012;Goicoa et al., 2016), image restoration (Besag et al., 1991), complex survey data (Mercer et al., 2015), and neuroimaging (Liu et al., 2016).The use of CAR specifications for modeling areal data was first introduced by Besag (1974), followed by the intrinsic CAR model as a prior for a latent spatial process proposed by Besag et al. (1991).Bayesian methods have been the dominating paradigm for spatial models including a CAR component (Sun et al., 1999;Hodges et al., 2003;Reich et al., 2006;Banerjee et al., 2014).Often in practice, it is difficult to subjectively choose informative priors for the parameters of the CAR component that are meaningful based on relevant domain knowledge.As a result, practitioners frequently use vague naïve priors (Bernardinelli et al., 1995;Best et al., 1999;Bell and Broemeling, 2000;Lee, 2013) that in the case of spatial models may lead to poor performance, such as slow Markov chain Monte Carlo (MCMC) convergence (Natarajan and McCulloch, 1998), unacceptably wide credible intervals, or larger mean squared error for estimation of parameters.To solve these problems, we introduce a novel objective prior that eliminates the need to subjectively choose priors for the parameters of the intrinsic CAR prior when no previous knowledge is available.Our objective prior can serve as an automatic prior for the popular Gaussian hierarchical model with an intrinsic CAR prior for the spatial random effects, thus enabling an automatic Bayesian analysis.
To address concerns with the use of subjective proper priors, research has been conducted to explore objective Bayesian analysis of various spatial models.In particular, Berger et al. (2001) have introduced objective priors for geostatistical models for spatially correlated data, which has been further developed by De Oliveira (2007) who has derived objective priors for a similar model that includes measurement error.The development of objective priors for CAR models for the observed data has been explored by Ferreira andDe Oliveira (2007), De Oliveira (2012), and Ren and Sun (2013).While many use CAR models directly for observed data, sometimes it is preferred to use a CAR prior for spatial random effects to allow for a smoother spatial process.More recently, Ren and Sun (2014) have derived objective priors for autoregressive models incorporated into the model as a latent process.Ren and Sun (2014) have focused on the use of proper CAR priors for the latent process component.However, in practice spatial random effects are usually assigned intrinsic CAR priors (Best et al., 2005).In this work, we consider objective priors for hierarchical models that use intrinsic CAR priors for the spatial random effects.
In the spatial statistics literature, intrinsic autoregressions (or intrinsic CARs) carry the notion that they are improper "densities" that are made proper by imposing a constraint.The most frequently used constraint is a sum-zero constraint that ensures that the sum of the spatial random effects is equal to zero.For the spatial random effects, we formally define a sum-zero constrained intrinsic CAR prior which is actually a singular multivariate Gaussian distribution that is proper.
We provide further details and explanations in the following sections.In Section 2, we introduce notation and describe the Gaussian hierarchical model with unstructured random effects and intrinsic CAR spatial random effects.We also describe how to express the constrained intrinsic CAR prior as the limit of a proper CAR prior.We derive explicit expressions for the reference prior, the independence Jeffreys prior, and the Jeffreys-rule prior for this model and discuss posterior propriety in Section 3. We describe the details of the MCMC algorithm we use to simulate from the posterior distribution in Section 4. In Section 5, we present results of a simulation study to assess the frequentist properties of the Bayesian analyses using several competing priors.To illustrate and compare our proposed method to other common approaches, we conduct an analysis using the proposed reference prior and two frequently used subjective priors to model 2012 foreclosure rates in the 88 counties of Ohio in Section 6.Finally, we provide our conclusions and discussion of future work in Section 7.For clarity of exposition, the proofs of the theoretical results are provided in the Appendix section of the Supplementary Material (Keefe et al., 2019).

Model
Consider a contiguous geographical region of interest that is partitioned into n disjoint subregions that collectively make up the entire region of interest.For example, a state could be divided into several counties.Additionally, suppose that a neighborhood structure is considered for the region of interest such that {N j ; j = 1, . . ., n} denotes the set of subregions that are neighbors of subregion j.Typically, subregions that share a boundary are considered neighbors.Within this framework, consider the following model: where Y is the n × 1 vector containing the response variable, X is a n × p matrix of covariates, and β is the p × 1 vector of fixed effect regression coefficients.Furthermore, we assume that θ = (θ 1 , θ 2 , . . ., θ n ) T is a vector of unstructured random effects defined such that θ i are independent and normally distributed with mean 0 and variance σ 2 for i = 1, 2, . . ., n.Finally, φ = (φ 1 , φ 2 , . . ., φ n ) T is a vector of spatial random effects that is assigned an intrinsic CAR prior (Besag and Kooperberg, 1995), with the sum-zero constraint n i=1 φ i = 0. Additionally, we assume that θ and φ are independent a priori.In order to demonstrate how to obtain the sum-zero constrained intrinsic CAR prior, we first provide background information on how intrinsic CAR models are typically defined in Section 2.2.Then, we describe a formal procedure to obtain the sum-zero constrained intrinsic CAR prior in Section 2.3 that is used in our model formulation.

Background on Intrinsic CAR Models
To distinguish our formal specification of the intrinsic CAR spatial random effects from the traditional specification, we reserve φ for our formal definition and use ω to provide background information on intrinsic CAR models as they are frequently used in practice.Typically, intrinsic CAR models are defined by their conditional distributions, which are proper distributions.Consider the frequently used intrinsic CAR model for ω = (ω 1 , ω 2 , . . ., ω n ) T specified by its conditional distributions where ω ∼i is the vector of the CAR elements for all subregions except subregion i and τ ω > 0 is a precision parameter.In addition, g ij ≥ 0 is a measure of how similar subregions i and j are, g ij = g ji , and h i = n j=1 g ij .Alternatively, we may write the joint density for ω as where H is a symmetric, positive semi-definite precision matrix defined as The matrix H is assumed to be fixed and known, as its structure is typically chosen as a modeling decision.One common choice for the similarity measure is g ij = 1 if subregions i and j are neighbors, and g ij = 0 if subregions i and j are not neighbors.This binary similarity measure implies that h i is the number of neighbors of subregion i.Note that (3) is a multivariate normal kernel specified by its precision matrix τ ω H. Furthermore, note that the matrix H is singular with one eigenvalue equal to zero and corresponding eigenvector n −1/2 1 n , where 1 n denotes a n × 1 vector of ones.As a consequence, the density given in (3) does not change if we substitute ω by ω + a1 n , where a is any real constant.Because of that, (3) is an improper "density".By improper, we mean that the integral of the "density" with respect to ω diverges to ∞.To make this density proper, practitioners usually assume the sum-zero constraint n i=1 ω i = 0.This constraint is frequently imposed in an informal manner with the vector of spatial random effects being recentered around its own mean at the end of each MCMC iteration.While ingenious, this mathematically informal way to impose the sum-zero constraint obscures the actual joint prior density of ω under the constraint.Consequently, a hierarchical model with spatial random effects defined by (3) with the sum-zero constraint n i=1 ω i = 0 imposed within the MCMC algorithm does not yield itself to a formal objective Bayes analysis.
To enable a formal objective Bayes analysis, in Section 2.3 we consider an intrinsic CAR expressed as the limit of a proper CAR, rather than starting with an intrinsic CAR.We impose the sum-zero constraint before taking the limit.The resulting joint density for the spatial random effects after taking the limit is proper.

Intrinsic CAR as the Limit of a Proper CAR
In the following development, we consider the following notation to clearly demonstrate the transition between different CAR models.First, we consider a proper CAR for the random vector φ * * , then impose the sum-zero constraint to obtain the distribution of the random vector φ * , and finally consider the limiting case that leads to the sum-zero constrained intrinsic CAR for the random vector φ.The intrinsic CAR prior is a limiting case of a proper CAR prior.Specifically, we use a signal-to-noise ratio parametrization to express the proper CAR prior as where σ 2 and τ c > 0 are unknown parameters, λ > 0 is a propriety parameter, and Σ −1 λ = λI n + H with I n being the n × n identity matrix and H is defined as in (4).Note that rather than using a CAR model directly for the data, we consider a hierarchical model with both conditional autoregressive spatial random effects and unstructured random effects.Because of this, we use a signal-to-noise ratio parametrization of the CAR prior where the variance of the spatial random effects is expressed as a function of the variance of the unstructured random effects, as seen in expression (5).When λ → 0, φ * * in (5) approaches an intrinsic CAR (Besag et al., 1991;Besag and Kooperberg, 1995).In practice, the intrinsic CAR spatial random effects are constrained to sum to zero to ensure propriety of the prior.This is often performed within the posterior sampling algorithm by centering the sampled random effects after each iteration.Rather than taking this approach, we first consider the proper CAR in (5), then we impose the sum-zero constraint, and finally we take the limit as λ → 0 to obtain a constrained intrinsic CAR.
To impose the sum-zero constraint for the CAR prior in (5), we project φ * * onto the subspace of R n that is orthogonal to the subspace spanned by the vector n −1/2 1 n to obtain φ * .By construction, n i=1 φ * i = 0. Specifically, we define φ * = P φ * * , where P = (I n − n −1 1 n 1 T n ) is a centering (projection) matrix.After imposing the constraint, the distribution of φ * is given by where Σ φλ = P Σ λ P T .Now, suppose that the spectral decomposition of H is given by H = QDQ T where Q = (q 1 , q 2 , . . ., q n ) is a n × n matrix comprised of columns which are the normalized eigenvectors of H and Additionally, if all of the subregions that completely partition the region of interest are connected (i.e.there is a path connecting any two subregions), then H has rank n−1.Consequently, 0 is an eigenvalue of H with multiplicity 1 (e.g.see De Oliveira and Ferreira (2011)).As a result, d n−1 > d n = 0 and d n has a corresponding eigenvector q n = n −1/2 1 n .
Lemma 1.If all of the subregions that completely partition the region of interest are connected, then Σ φλ has rank n − 1 and the spectral decomposition of Σ φλ is Σ φλ = QM λ Q T , where M λ = diag((λ + d 1 ) −1 , . . ., (λ + d n−1 ) −1 , 0).Finally, Σ φλ can be written as Proof.See Appendix in Supplementary Material Now we take the limit as λ → 0 to obtain the final density for φ which appears in our model from (1) given by where (Penrose, 1955).Thus, the vector of spatial random effects φ has a singular Gaussian distribution (Muirhead, 2009;Ferreira et al., 2011).Further, we note that the integral of the corresponding density with respect to φ is equal to one.Hence, this singular Gaussian distribution is proper.In addition, Ferreira (2018) shows that the singular Gaussian distribution in (2.8) is the limiting distribution of a one-at-a-time Gibbs sampler applied to the intrinsic CAR prior in (2.2) with centering on the fly.Therefore, our reference prior is directly applicable to the Gaussian hierarchical models with intrinsic CAR priors widely used in practice.
A possible concern raised by Lavine and Hodges (2012) is that there are several ways of approaching an improper intrinsic CAR as the limit of a proper CAR.Without imposing the sum-zero constraint, Lavine and Hodges (2012) showed that different ways to approach the limit may lead to different functions of τ ω appearing in the constant of proportionality in (2.3).As a consequence, the likelihood function for τ ω would not be well-defined and formal Bayesian inference would not be possible.To address this issue, Keefe et al. (2018) have shown that different ways of approaching the limit are equivalent to using different initial proper CAR models.In particular, Keefe et al. (2018) consider proper CAR models given by (2.5) but substituting the identity matrix I n by a matrix K to obtain the precision matrix Σ −1 λ = λK + H. Keefe et al. (2018) consider for K the class of symmetric positive semidefinite matrices such that the sum of its elements is positive.This construction can represent two of the ways of approaching the limit considered by Lavine and Hodges (2012).Keefe et al. (2018) have shown that for any matrix K in this class, if the sum-zero constraint is formally imposed before taking the limit, then sum-zero constrained intrinsic CAR model does not depend on K and is, therefore, unique.As a consequence, for any such matrix K, the sum-zero constrained intrinsic CAR density has a well-defined constant of proportionality that is a unique function of τ c .Therefore, for the hierarchical model with the sum-zero constrained intrinsic CAR random effects we consider, the likelihood function is well-defined.The resulting sum-zero constrained intrinsic CAR prior for φ in (2.8) is a singular Gaussian distribution, which is amenable for an objective Bayes analysis.
Thus, it follows that the model for the response Y with the spatial random effects

Remarks
Note that we assume a signal-to-noise ratio parametrization for the variance components of the random components of the model, as is done in De Oliveira (2007).One of the benefits of this parametrization is that it leads to simpler expressions for the objective priors.The unknown model parameters for this hierarchical spatial model for areal data Of particular interest is the parameter τ c , which controls the strength of spatial dependence.Conditional on the neighborhood structure chosen by the user, Σ φ is fixed and the correlation between the response for two neighboring subregions Y i and Y j is given by where σ ij denotes the element of Σ φ located in the i th row and j th column and σ ii and σ jj denote the i th and j th diagonal elements of Σ φ , respectively.Since ρ ij is a decreasing function of τ c , as τ c → 0, ρ ij increases implying that the spatial dependence also increases.As is commonly the case in stochastic processes, an increase in correlation is often accompanied by an increase in marginal variance.Take for example the firstorder autoregressive model (AR(1)) with error variance δ 2 and autoregressive coefficient 0 < ρ < 1.The correlation function for lag h is Corr(x t , x t−h ) = ρ h and the marginal variance is δ 2 /(1 − ρ 2 ).Therefore, for the widely used AR(1) time series model, increase in ρ corresponds to increase in both correlation and marginal variance.Note that in our model, Var . So, it is also the case for our model that an increase in correlation is accompanied by an increase in marginal variance.
Specifying the model using a sum-zero constrained intrinsic CAR simplifies application of the results of De Oliveira (2007) to derive several objective priors.Although these priors take on similar forms as those presented by De Oliveira ( 2007), the properties of the derived priors differ in our case.The reason for the difference is that for the geostatistical models considered by De Oliveira ( 2007), the matrix Σ φ is full rank, whereas we consider a hierarchical model for areal data where Σ φ is singular.

Likelihood Functions
In order to obtain objective priors, such as the reference, independence Jeffreys, and Jeffreys-rule priors for this model, we consider the likelihood and integrated likelihood functions.The likelihood of η given the observed response y and the matrix X is given by where Ω = I n + 1 τc Σ φ and |A| denotes the determinant of the matrix A. All of the objective priors we derive with respect to the integrated likelihoods in the following sections conveniently fall into a class of priors of the form where a ∈ R is a hyperparameter and π(τ c ) is referred to as the marginal prior for τ c .
In order to obtain the objective priors, it is first necessary to obtain the integrated likelihoods.If we first integrate L(η; y, X) with respect to β, we obtain the integrated likelihood of (σ 2 , τ c ).This integration leads to the log-integrated likelihood given by log where ) −a with respect to σ 2 yields the integrated likelihood of τ c , which is given by Unfortunately, it is difficult to calculate the reference, independence Jeffreys, and Jeffreys-rule priors directly using expressions ( 13) and ( 14).To overcome this challenge, it is particularly useful to express (13) in a simplified form involving the eigenvalue decompositions of matrices that are functions of the matrices X and Σ φ .Consider the following lemma (Verbyla, 1990;Dietrich, 1991;Kuo, 1999;De Oliveira, 2007): Lemma 2. Suppose X n×p is of full rank, with n > p, and Σ is an n × n symmetric positive definite matrix.Then there exists a full rank n × (n − p) matrix L with the following properties: Here we propose one way to obtain a matrix L with the above properties by the following steps: 2. Calculate the matrix Q * such that the columns of Q * are the normalized eigenvectors corresponding to the non-zero eigenvalues of G * .

Denote the spectral decomposition of
The resulting L matrix has properties that allow for convenient simplifications in the integrated likelihoods.Let Then, by the way we obtain the matrix L, we have that Then the following results hold (De Oliveira, 2007) where σ 2 * and τ c * are the true values of σ 2 and τ c , respectively, and {Z 2 j } iid ∼ χ 2 1 .Using these results, the log-integrated likelihood in (13) satisfies log L I (σ 2 , τ c ; y, X) The posterior distribution of η using a prior of the form given in ( 12) is proper if and only if Proposition 1.Consider the model given in ( 9) along with the prior given in ( 12).Then, L I (τ c ; y, X) is a continuous function on (0, ∞) and

Proof. See Appendix in Supplementary Material
The condition in (18) and the tail behavior of the integrated likelihood in Proposition 1 provide justification for determining whether or not the proposed priors lead to proper posterior distributions.For example, the tail behavior of the posterior distribution as τ c → 0 must be O(τ m c ) where m > −1 in order to guarantee posterior propriety.Likewise, the tail behavior of the posterior distribution as τ c → ∞ must be O(τ m c ) where m < −1 in order to guarantee posterior propriety.

Reference Prior
In order to obtain a reference prior for the model parameters η, it is necessary to identify an order of importance for the parameters.Here, we consider β to be a nuisance parameter, while (σ 2 , τ c ) is the parameter vector of interest.Then, we use exact marginalization to find the reference prior (Berger et al., 2001).First, the joint prior distribution of η is factored as Then given (σ 2 , τ c ), the conditional reference prior for β is π R (β|σ 2 , τ c ) ∝ 1 (Bernardo and Smith, 1994).Finally, we use the Jeffreys-rule algorithm on the integrated likelihood Theorem 1.Consider the model given in ( 9).Then, the reference prior of η is of the form ( 12) with

Proof. See Appendix in Supplementary Material
The following proposition and corollary describe the properties of the proposed reference prior and its resulting posterior distribution.

Proof. See Appendix in Supplementary Material
Corollary 1.Using the model given by ( 9), we have (i) The reference prior π R (τ c ) given in ( 19) and its resulting posterior π R (η|y, X) are both proper.
(ii) The k th moment of the marginal reference posterior π R (τ c |y, X) does not exist for k ≥ 1.

Proof. See Appendix in Supplementary Material
Notice that the reference prior depends on the covariates X through the eigenvalues ξ 1 , ξ 2 , . . ., ξ n−p of L T Σ φ L. We also note that since our proposed reference prior has the same mathematical expression as that of De Oliveira (2007), the results regarding the marginal reference prior, the joint posterior distribution, and the marginal posterior distribution are all the same.However, the same is not true for the independence Jeffreys and Jeffreys-rule priors that follow.This is due to the fact that, in our case, L T Σ φ L is full rank, while Σ φ is not full rank.

Independence Jeffreys Prior
The independence Jeffreys prior is obtained by assuming that β and (σ 2 , τ c ) are independent a priori.Then, the Jeffreys rule is used to find the marginal prior of each parameter, assuming the other parameters are known.We denote the ordered eigenvalues of Σ φ as It is important to note that in our case, we are dealing with areal data, and thus γ n = 0 since Σ φ is of rank n − 1.This is a key distinction between our work and that of De Oliveira (2007), which leads to different results regarding posterior propriety.

Theorem 2. Consider the model given in (9). Then, the independence Jeffreys prior of η is of the form (12) with
Proof.See Appendix in Supplementary Material Proposition 3. The marginal independence Jeffreys prior of τ c in ( 20) is a continuous function on (0, ∞) where π J1 (τ c ) = O(τ −1 c ) as τ c → 0.

Proof. See Appendix in Supplementary Material
Corollary 2. Using the model given by ( 9), the independence Jeffreys prior π J1 (τ c ) given in ( 20) and its resulting posterior π J1 (η|y, X) are both improper.
Proof.See Appendix in Supplementary Material
Theorem 3. Consider the model given in ( 9).Then, the Jeffreys-rule prior of η, is of the form ( 12) with Proof.See Appendix in Supplementary Material ) as τ c → 0.

Proof. See Appendix in Supplementary Material
Corollary 3. Using the model given by ( 9), the posterior distribution obtained by using the Jeffreys-rule prior π J2 (τ c ) given in ( 21) is improper.

Proof. See Appendix in Supplementary Material
Similar to the reference prior, notice that the Jeffreys-rule prior also depends on the covariates X through the eigenvalues ξ 1 , ξ 2 , . . ., ξ n−p of L T Σ φ L.

Remarks
There are a few key distinctions between our results and those of other researchers.Specifically, our reference prior leads to a proper posterior distribution, while the independence Jeffreys and Jeffreys-rule priors lead to improper posterior distributions, unlike De Oliveira (2007).Although we consider areal data while De Oliveira (2007) has considered geostatistical data, the reference prior in (19) has the same mathematical expression as that proposed by De Oliveira (2007) because we choose to formulate our model with the intrinsic CAR component as the limit of a proper CAR.However, the matrix Σ φ is not full rank in the areal data setting.Thus, the smallest eigenvalue of Σ φ is γ n = 0, which leads to improper posterior distributions when the independence Jeffreys and Jeffreys-rule priors are used.Ren and Sun (2014) have defined their proper CAR prior using conditional distributions with a constant variance that does not depend on the number of neighbors a subregion has, which limits utility for many applications.In contrast, the conditional specification of the CAR prior we use has a variance that depends on the number of neighboring subregions.This allows the conditional distribution of φ i given φ ∼i for subregions with more neighbors to have a smaller variance.Furthermore, the objective priors that have been derived by Ren and Sun (2014) do not have explicit mathematical forms.Specifically, the priors of Ren and Sun (2014) are written as functions of determinants of Fisher information matrices that do not resemble our reference prior.In contrast, the expression of our reference prior can be easily implemented with its explicit mathematical form.

Sampling from the Posterior Distribution
In order to perform posterior inference about the model parameters, we propose a MCMC algorithm.Specifically, we have implemented a MCMC sampler (Gelfand and Smith, 1990) that considers the integrated likelihood given in (9) obtained by integrating out the spatial random effects φ.Our MCMC sampler is a Metropolis-within-Gibbs algorithm that includes a Gibbs step for β and a joint Metropolis-Hastings step for τ c and σ 2 .Additionally, we simulate φ using composite sampling.
The use of the signal-to-noise ratio parametrization results in a Fisher information matrix of log(σ 2 ) and log(τ c ) which does not depend on the value of σ 2 .This implies a uniform prior for log(σ 2 ).Thus, in sampling from the posterior distribution, we jointly propose values for log(σ 2 ) and log(τ c ) with a Metropolis-Hastings step using Normal proposal distributions, and accept proposed values based on the appropriate Metropolis-Hastings acceptance probability (Gamerman and Lopes, 2006;Robert and Casella, 2004).Posterior inference can then be conducted directly from the MCMC samples after discarding an appropriate number of iterations as burn-in.While we choose to use a MCMC sampler, there are numerous alternative methods that could be used to obtain samples from the posterior distribution.For example, De Oliveira (2007) has implemented adaptive rejection Metropolis sampling (Gilks et al., 1995).The steps of our MCMC algorithm are provided in Algorithm 1.
Since the matrix Σ φ can be conveniently expressed as a function of eigenvectors and eigenvalues of H, as shown in ( 7), several computational equivalences can be used to speed up the MCMC algorithm.Specifically, execution of the MCMC algorithm relies on computation of the determinant and inverse of the matrix (I + τ −1 c Σ φ ).In this work, these quantities can be expressed by the following equivalences , δ t ) 3. Compute joint acceptance probability for σ 2 * and τ * c : Use composite sampling to generate φ (i) (i = 1, . . ., K) from its full conditional distribution which is provided in the Appendix in the Supplementary Material.

Comparison of Priors
We use Monte Carlo simulation to compare the performance of the proposed reference prior with two commonly used vague naïve prior distributions from the literature.Because the independence Jeffreys and Jeffreys-rule priors lead to improper posterior distributions, we have not included them in this simulation study.Performance is as-sessed using frequentist properties of Bayesian procedures, including interval coverage rate, average interval length (IL), and mean squared error (MSE).
The vague naïve prior distributions we consider come from the CARBayes R package (Lee, 2013) and Best et al. (1999).Both of these approaches assume gamma(shape, rate) prior distributions for the precisions of both unstructured and spatial random effects.Specifically, the CARBayes package (version 4.0) has implemented gamma(0.001,0.001) prior distributions as the default for both precision parameters.Best et al. (1999) has used gamma(0.001,0.001) and gamma(0.1,0.1) prior distributions for the precisions of the unstructured and spatial random effects, respectively.Previously, the CARBayes package used uniform prior distributions which are no longer available.We adopt the prior distributions implied by the CARBayes package and Best et al. (1999) under our signal-to-noise ratio parametrization.For the purpose of reporting, we refer to these methods as CARBayes and NB (lead author initials), respectively.
Since we have adopted a signal-to-noise ratio parametrization in this work, the gamma(α 1 , β 1 ) and gamma(α 2 , β 2 ) prior distributions for the precisions of the unstructured and spatial random effects, respectively, are re-parametrized.The implied marginal prior for τ c under the signal-to-noise ratio parametrization is given by where Γ(•) is the Gamma function.
When comparing priors, it is important to consider not only the regions in the parameter space that receive the most mass, but also the rate of decay of the tail of each prior.It is clear from (25) that with regard to tail behavior, as ).Thus, for values of α 1 and α 2 close to 0, the vague naïve prior is close to impropriety.From Proposition 2, we can see that the tail behavior of the reference prior is considerably different than that of the CARBayes and NB priors.Consequently, the slower decay rate as τ c → ∞ implies that the CARBayes and NB priors place significantly more mass on large values of τ c . Figure 1(a) provides a plot of each of the prior distributions for τ c that we consider.The plot of log π(τ c ) versus log 10 τ c provided in Figure 1(b) shows that the reference prior places more mass on values of τ c between 0.01 and 56 (CARBayes) and 0.02 and 35 (NB), while the CARBayes and NB priors place significantly more mass on large values of τ c .
In addition, the CARBayes and NB priors put more mass on very small values of τ c .Our practical experience indicates that real spatial datasets will usually have estimated values of τ c between 0.1 and 10.For example, in the application considered in Section 6 the estimated values of τ c for the three considered priors range from about 0.15 to about 0.25.Further, because we use a signal-to-noise ratio parametrization, values of τ c much less than 0.01 are not of practical use.Note that τ c < 0.01 implies that the conditional variance of the structured random effects is greater than 100 times the variance of the unstructured random effects.In that case, the analyst should consider a model that does not include unstructured random effects.Hence, in the rare case when for a given dataset the estimate of τ c is less than 0.01, then instead of the hierarchical model given in (1), the analyst should consider a proper CAR model directly for the observations (e.g., see Ferreira and De Oliveira, 2007).
In order to better understand the importance of the parameter τ c as it relates to the strength of spatial dependence, we consider the Kullback-Leibler divergence between the spatial model and the independent data model as a function of τ c .Specifically, if p i and p s correspond to the independent and spatial models, respectively, we consider the Kullback-Leibler divergence given by

KL(p
Note that as τ c → ∞, KL(p i p s ) → 0 indicating that the spatial and independent data models are nearly identical for large values of τ c .Specifically, Figure 1(c) shows that for values of τ c > 10, the Kullback-Leibler divergence per observation is nearly zero, indicating that the spatial model we consider is only relevant for smaller values of τ c .The CARBayes and NB priors unnecessarily place significantly more mass on large values of τ c .In contrast, the reference prior intuitively places mass on values of τ c that are typical of spatially dependent data.

Simulation Design
For the design of our simulations, we consider square regions with three different sample sizes of n = 5 2 , 7 2 , 10 2 .We fix σ 2 = 2, while considering values of τ c = 0.01, 0.032, 0.1, 0.32, 1, 3.2, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100.To explore how the number of neighbors which exert spatial dependence on a subregion affects inferential performance, we investigate both first-and second-order neighborhood structure (i.e., diagonal subregions are not/are considered neighbors, respectively).We also consider p = 1 (intercept only) with β = 1 and p = 6 with β = (−3, −2, −1, 1, 2, 3) T .All covariates are generated from a normal distribution with mean 0 and variance 1.We generate results based on 1,000 simulated data sets for each combination of these levels of n, τ c , neighborhood structure, and p.For each data set, 15,000 MCMC iterations are obtained with the first 5,000 iterations discarded as burn-in.The Gelman-Rubin convergence diagnostic (Gelman and Rubin, 1992) for τ c and σ 2 for three MCMC chains using one setting of parameter values was used to determine the number of MCMC iterations needed to obtain convergence.Gelman-Rubin convergence diagnostics for τ c and σ 2 were calculated to be less than 1.01, indicating that 15,000 MCMC iterations is sufficient for our simulation study.The variances of the proposal distributions described in Section 4 were chosen to be δ s = δ t = 0.5 yielding an acceptance rate close to 40%.Due to the heavily right-skewed posterior density for τ c , we consider 95% highest posterior density (HPD) regions for τ c , while 95% equal-tailed credible intervals are considered for σ 2 and β.
Performance results pertaining to τ c less than or equal to 10, first-order neighborhood structure, and p = 6 are displayed graphically and discussed in Section 5.3.For the sake of brevity, performance results pertaining to β, σ 2 , second-order neighborhood structure, or p = 1 are described in Section 5.3 and tabulated in the Supplementary Material.To investigate the case of a larger number of regressors, we have also performed a simulation study with p = 20 when n = 100; because the results are qualitatively similar to the results discussed in Section 5.3, we do not include them here.Simulation results for values of τ c > 10 indicate that the performance of all considered priors deteriorates as τ c increases (see Simulation Results in Supplementary Material; To make the information easier to digest, in addition to the values of τ c less or equal to 10, the tables provided in the Supplementary Material include only three representative values of τ c greater than 10: 40, 70, and 100.).In particular, the frequentist coverages of the reference and NB credible intervals for τ c decrease substantially as τ c increases much beyond 10.The frequentist coverage of the CARBayes credible interval remains above nominal, but the CARBayes credible interval is so wide that it becomes of no practical use.The deterioration of performance of the reference credible interval is not a problem, because as illustrated by the Kullback-Leibler divergence shown in Figure 1(c), larger values of τ c indicate that the observations are nearly independent, which may imply that a model without spatial dependence would be preferred.We focus on values of τ c < 10, since our reference prior is intended for a model that accounts for spatial dependence.

Simulation Results
Figure 2 displays the frequentist interval coverage rate and average IL (on the log 10 scale) for τ c resulting from the reference, CARBayes, and NB priors across a range of values for τ c (on the log 10 scale) for various sample sizes, assuming p = 6 and firstorder neighborhood structure.Figure 2 demonstrates that the reference prior achieves nominal interval coverage and low average interval length as n increases across a range of values for τ c .Further, the CARBayes and NB priors exhibit a dip in interval coverage for values of τ c corresponding to strong spatial dependence.The CARBayes prior yields particularly wide average interval lengths, potentially detracting from the value of its close-to-nominal coverage.These results demonstrate that, when compared to the two gamma priors, the reference prior leads to favorable coverage and interval length.
Mean squared error of the posterior median of τ c is shown in Figure 3, which demonstrates that use of the reference prior leads to advantageous estimation properties across all sample sizes, especially smaller sample sizes.Use of the CARBayes prior leads to considerably higher MSE than that of the other priors.
All credible intervals for each of the regression coefficients β achieve the nominal level for frequentist coverage with comparable interval lengths across all three priors.Note that for the tabulated results in the Supplementary Material for the case of p = 6, the frequentist coverage and average interval length are averaged across all six of the regression coefficients since results were comparable for each of the regression coefficients.Additionally, all three priors show reasonable performance in terms of frequentist coverage and average interval length for σ 2 .Results for second-order neighborhood structure are qualitatively similar to those presented for first-order neighborhood structure.Likewise, results for p = 1 are qualitatively similar to those for p = 6.
An intriguing result is that, even though Figure 1(b) seems to show that the marginal CARBayes and NB priors for τ c are fairly similar, the frequentist properties of the CAR-Bayes and NB posteriors can be very different.The issue here is that in the CARBayes and NB specifications, τ c and σ 2 are not independent a priori.Specifically, it is straightforward to show that in both CARBayes and NB specifications, the conditional prior for σ 2 given τ c is IG(α 1 + α 2 , β 1 + β 2 τ c ), where IG(., .)corresponds to an inverse gamma distribution defined by its shape and scale parameters, respectively.Hence, the joint posteriors of σ 2 and τ c (and the marginal posteriors of τ c ) under the CARBayes and  Note that since the y-axis is on the log 10 scale, the difference in MSE of the posterior median of τ c between the reference prior and the CARBayes prior is considerable.
NB specifications may differ substantially.Therefore, even thought the marginal priors for τ c under the CARBayes and NB specifications behave similarly, the frequentist properties of CARBayes and NB procedures for τ c may be very different.This is illustrated in Figure S1 (Supplementary Material) that shows the impact of the joint prior of (σ 2 , τ c ) on the posterior inference for these parameters for a difficult dataset.Specifically, Figure S1 shows contour plots of (a) integrated likelihood of (σ 2 , τ c ), and joint posterior density of (σ 2 , τ c ) based on (b) reference prior, (c) NB prior, and (d) CARBayes prior.The dataset was simulated with a first-order neighborhood structure, square regular grid, n = 49, and parameters β = 1, σ 2 = 2, and τ c = 0.1.Contours correspond to HPD regions with credible levels equal to 10%, 20%, . .., 90%.
The integrated likelihood behaves badly for this dataset and at first glance seems not to provide much information about σ 2 and τ c .We have found empirically that the probability of such a bad integrated likelihood behavior increases as the Moran-I statistic decreases.For the particular dataset used to produce Figure S1, the Moran-I statistic is equal to 0.203 with a corresponding p-value of 0.0181 (null hypothesis is of no spatial dependence).This is not an extreme dataset; under the conditions used to simulate this dataset, the probability of the Moran-I statistic being less than 0.203 is about 0.12.For this dataset, the CARBayes analysis provides unacceptably large HPD regions.The NB analysis provides somewhat smaller HPD regions than the CARBayes analysis.Finally, in this case the reference analysis provides much smaller HPD regions.True parameter values are indicated with a red dot, and the posterior median under each prior is indicated with a red star.The posterior medians of τ c are equal to 1.0, 2.2, and 3.2 for the reference, NB, and CARBayes analyses, respectively.Hence, analyses based on the reference, NB, and CARBayes priors may differ substantially.Finally, for this dataset the reference analysis provides much tighter HPD regions and a much better estimate of τ c .
Finally, Figure S2 in the Supplementary Material shows the behavior of the integrated likelihood and the posterior densities for a dataset with Moran-I statistic of moderate size.Specifically, for this dataset the Moran-I statistic is equal to 0.363 with a corresponding p-value of 0.00017 (null hypothesis is of no spatial dependence).This Moran-I statistic is close to the median of the sampling distribution of the Moran-I statistics.The simulation setting was the same as in the previous paragraph: first-order neighborhood structure, square regular grid, n = 49, and parameters β = 1, σ 2 = 2, and τ c = 0.1.For this dataset, the integrated likelihood behaves much better than the one discussed in the previous paragraph.However, the integrated likelihood tends to put much of its mass at values of σ 2 and τ c larger than the true values.The reference prior corrects the integrated likelihood behavior and leads to a reference posterior density that has much of its mass close to the true values of the parameters.For this dataset, the NB and CARBayes priors tend to shrink the posterior density too much towards zero.Specifically, the posterior medians of τ c are equal to 0.095 (reference), 0.017 (NB), and 0.015 (CARBayes), while the posterior medians of σ 2 are 2.8 (reference), 0.6 (NB), and 0.6 (CARBayes).Thus, for this dataset the NB and CARBayes analyses are actually fairly similar and severely underestimate both σ 2 and τ c .Meanwhile, the reference analysis provides a reasonable correction to the integrated likelihood and yields estimates for σ 2 and τ c much closer to the true values of these parameters.Such beneficial correction of badly behaved likelihood functions by reference (and other non-informative) priors has been previously reported in other statistical inference problems such as for example in the analysis of generalized linear models (Firth, 1993), linear regression with Student-t errors (Fonseca et al., 2008) and exponential power errors (Salazar et al., 2012;Ferreira and Salazar, 2014), and analysis of elapsed times in continuous time Markov chains (Ferreira and Suchard, 2008) used in phylogenetic tree reconstruction (Bouckaert et al., 2014).
In summary, the proposed reference prior exhibits a favorable combination of high interval coverage, short average interval length, and low MSE relative to the CARBayes and NB priors, in addition to philosophical appeal and practical convenience relative to vague naïve priors previously used in the literature.

Data Description
To illustrate an objective Bayesian analysis using the model in ( 9) along with the reference prior, we consider a data set containing foreclosure rates as a proportion of all housing transactions for each of the 88 counties in the state of Ohio for the year 2012.This data set is a subset of a larger database of approximately 54 million records from municipalities across the United States between 2005 and 2014 that we obtained from our partnerships with the Virginia Center for Housing Research and Metrostudy, a Hanley Wood Company.The data include very fine spatial and temporal information, including latitude, longitude, and sale date of each closing record.We choose to aggregate the data at the county level to illustrate this analysis.For more details describing how records were classified as foreclosures, see Keefe et al. (2017).In addition to foreclosure rates, county unemployment rates are also available as a potential covariate to be used in modeling (Bureau of Labor Statistics, 2012).For each county, the total number of housing transactions and the observed number of foreclosures are available.
Analogously to what is typically done in disease mapping, we consider the standardized morbidity ratio (SMR) for each county, defined by SMR i = O i /E i , where O i is the observed number of foreclosures in county i and E i is the expected number of foreclosures in county i.The expected counts are calculated by E i = n i ( j O j / j n j ), where n i is the total number of housing transactions in county i.A map of the SMRs for foreclosure rates in Ohio counties for 2012 is shown in Figure 4(a).Figure 4(b) shows a map of the unemployment rates in 2012 for each county.

Modeling
Using the 2012 foreclosure rate data set, we fit the model given in (9) using the reference, CARBayes, and NB priors in order to compare the results using the proposed reference prior to the results using the priors used frequently in practice.We consider y i = log(SMR i ) as the response variable and unemployment rate as a covariate.The posterior distribution was sampled using the MCMC algorithm described in Section 4 with 25,000 iterations where the first 5,000 were discarded as burn-in.Gelman-Rubin convergence diagnostics were calculated using four MCMC chains with the reference prior implemented with different starting values for β = (β 0 , β 1 ) T , τ c and σ 2 .The 25,000 MCMC iterations with 5,000 burn-in iterations yielded Gelman-Rubin diagnostics that were all below 1.005, indicating acceptable posterior convergence.Posterior summaries, including posterior medians and credible intervals for the parameters are provided in Table 1.Note that 95% HPD credible intervals are used for τ c , while 95% equal-tailed credible intervals are used for β = (β 0 , β 1 ) T and σ 2 .
From Table 1, we see that posterior estimates for the fixed effect regression coefficients are all similar.The estimates for τ c and σ 2 vary quite a bit for the different priors.In particular, the analysis using the NB prior implied much stronger spatial dependence than the reference prior with an estimate of τ c closer to 0. The estimated values of log 10 τ c for all three analyses are between −0.8 and −0.55, which according to our simulation study are values of τ c for which the proposed reference prior provides superior results in terms of interval coverage, average interval length, and MSE for τ c .Thus, we should trust the resulting inference based on the reference prior more than the other priors.Furthermore, note that the credible interval for τ c is wider when the CARBayes prior is used, while use of the reference and NB priors results in more narrow credible intervals.This supports the findings of our simulation study regarding the average interval length for τ c .Although the credible interval for τ c using the CARBayes prior is only slightly wider in the case study, the simulation study results illustrate the potential danger of using this particular vague prior.While the analysis of foreclosure rates does not result in undesirably wide credible intervals for τ c when the CARBayes prior is used, our simulation study shows that other data sets may lead to credible intervals for τ c that are orders of magnitude wider than those obtained when the reference prior is used.Unreasonably wide credible intervals for τ c could lead practitioners to decide that a spatial model is not necessary for the data, resulting in the incorrect use of a non-spatial model.We thus recommend use of the reference prior.
The estimate of the regression coefficient for unemployment rate is positive and its credible interval does not contain zero in all three cases.This implies that, on average, as unemployment rate increases, foreclosure rates also increase.Consider the posterior distribution of the expected SMR in county i given by E[SMR i ] = exp{β 0 + β 1 x i + φ i }, where x i is the unemployment rate of county i for i = 1, 2, . . ., n. Figures 4(c) and 4(d) show maps of the posterior median and posterior standard deviation, respectively, of the expected SMR values computed directly from the results of the MCMC algorithm using the reference prior.These maps show that counties with higher unemployment rates tend to have higher risk of foreclosure than counties with lower unemployment rates.
As shown in the simulation study, the main selling point of the reference prior is that, in the absence of prior information, it allows a more realistic assessment of the uncertainty for the several parameters.This actually makes difference in our case study.For example, we may be interested in identifying clusters of counties with risk higher than predicted by the regressors.If that is the case, we would analyze the spatial random effects.A common decision rule would be to say that counties for which P (φ i > 0|Y ) > 0.95 would have risk higher than predicted by the regressors.With respect to counties with risk higher than predicted by the regressor, the reference, CARBayes, and NB analyses identify 3, 4, and 9 counties respectively.Similarly to the results presented in Table 1, for this particular dataset the CARBayes analysis provides results that are in between those of the reference and the NB analyses.Further, the NB analysis identifies many more counties than the reference analysis.Thus, the analyses based on the different priors lead to very different conclusions in terms of what counties have risk of foreclosure higher than predicted by the regressor.That is because such analyses are based on tail probabilities that depend crucially on an appropriate assess-ment of uncertainty through the variance parameters σ 2 and τ c .As our simulation study indicates, amongst the analyses considered in this study, the reference prior provides the best uncertainty quantification for these parameters and is, therefore, preferable.
It is often difficult to choose sensible hyperparameters for the commonly used priors for this type of spatial model.For instance, it might be difficult for a housing market expert to articulate their understanding of the spatial dependence among counties' foreclosure rates in such a way as to inform prior specification for a Bayesian hierarchical model.The proposed reference prior is an appealing alternative because it is automatic.It does not require the choice of hyperparameters and has favorable performance.This reference prior is useful in situations where researchers are unsure of how to subjectively choose priors for areal data with a nugget effect.

Discussion
This paper presents a fully Bayesian analysis for a commonly used Gaussian hierarchical model for spatial data.We have derived explicit expressions for several objective priors, including the reference, independence Jeffreys, and Jeffreys-rule priors for the Gaussian hierarchical model with an intrinsic CAR prior for the spatial random effects.We have shown that the reference prior results in a proper posterior distribution, while the independence Jeffreys and Jeffreys-rule priors lead to improper posterior distributions.Furthermore, we have studied frequentist properties of Bayesian procedures using the proposed reference prior and two commonly used priors for this model.
We have determined that the reference prior leads to a combination of favorable frequentist coverage, average interval length, and mean squared error relative to two commonly used priors.We have demonstrated using the Kullback-Leibler divergence that focus should be placed on small values of τ c , which correspond to strong spatial dependence.So, while inferential performance deteriorates across all considered priors for larger values of τ c , this is not a concern since perhaps a model without spatial dependence would be preferred in this situation.More importantly, the reference prior performs better than the competing priors for smaller values of τ c in terms of frequentist coverage, average interval length, and mean squared error.
Additionally, the reference prior approach obviates the need to subjectively specify hyperparameters, allowing for an automatic Bayesian analysis.This philosophy works well for hierarchical modeling when interpretation of hyperparameters and elicitation of meaningful priors is difficult.Often times, practitioners choose to use a vague naïve prior in the absence of prior information or understanding of the problem.However, in situations such as the one considered here, where an improper prior for τ c would lead to an improper posterior, the use of a vague naïve prior only masks the impropriety issue, rather than solving it (Berger, 2006).By contrast, an objective prior like the reference prior proposed here, will lead to a proper posterior distribution and will let the data speak for themselves.
Although much research has been done on objective Bayesian analysis for spatial models, such as geostatistical models (Berger et al., 2001;De Oliveira, 2007), proper CAR models for the observed data (Ferreira and De Oliveira, 2007;De Oliveira, 2012;Ren and Sun, 2013), and proper CAR models for latent process models for areal data (Ren and Sun, 2014), the hierarchical model with an intrinsic CAR prior that we consider here is one of the most popularly used models in applications of disease mapping and other areas of research.All intrinsic CAR models require subjective decisions regarding rigorous specification and additional constraints to ensure propriety (Lavine and Hodges, 2012).Keefe et al. (2018) have shown that if we first impose the sumzero constraint and then consider the limit to obtain the sum-zero constrained intrinsic CAR, there is a broad class of proper CAR priors that all result in the unique intrinsic CAR prior given in Equation (8).Our development of an appropriate objective Bayesian analysis for this model will hopefully help researchers analyze areal data.
There are many possible avenues for future research.For example, this work could be extended by developing objective priors for areal models for non-Gaussian responses, such as counts and rates that can be more accessible to disease mapping applications such as the one described in Section 6.Other future work may also include the study and development of objective priors for spatio-temporal models for areal data, as well as the effect of the choice of priors for these types of models.

Figure 1 :
Figure 1: (a) Raw plot of π(τ c ) and (b) plot of log π(τ c ) vs. log 10 τ c for the reference, CARBayes, and NB priors.The CARBayes and NB priors both approach ∞ as τ c → 0, while the reference prior does not.Panel (c) shows the Kullback-Leibler divergence per observation between the spatial model and the independent data model across values of τ c , indicating that large values of τ c correspond to a model for nearly independent data.

Figure 2 :
Figure 2: Frequentist coverage and log 10 average interval length (IL) for τ c for n = 100 (top row), n = 49 (middle row), and n = 25(bottom row).Reference prior shows favorable performance in terms of frequentist coverage and average interval length.

Figure 3 :
Figure 3: log 10 MSE for the posterior median of τ c for (a) n = 100, (b) n = 49, and (c) n = 25.Reference prior leads to favorable performance in terms of estimation of τ c .Note that since the y-axis is on the log 10 scale, the difference in MSE of the posterior median of τ c between the reference prior and the CARBayes prior is considerable.

Table 1 :
Posterior Summaries for Foreclosure Rate Case Study for Bayesian Analyses Using Reference, CARBayes, and NB Priors.