Linking physics and spatial statistics: A new family of Boltzmann-Gibbs random fields

We investigate a connection between spatial statistics and statistical physics to obtain new covariance functions with direct physical interpretation for spatial random fields. These covariance functions are based on the exponential Boltzmann-Gibbs representation and use an energy functional to represent interactions between the values of the random field at different points in space. This formulation results in closed-form generalized covariance functions, which display infinite variance in Euclidean spaces of dimension larger than one. We propose regularization schemes in real and reciprocal (spectral) space that lead to well-behaved covariance structures. The real-space regularization parameter allows a continuous interpolation between the Boltzmann-Gibbs covariance and the exponential covariance. We also propose discretized approximations on regular grids, and we show that they represent reparametrized versions of the well-known Besag and Leroux lattice models. We then discuss parameter estimation and spatial prediction for the regularized Boltzmann-Gibbs covariance model in two dimensions. We recommend using the pairwise difference likelihood that combines satisfactory estimation performance and good scalability with many observation points. The predictive performance of the regularized covariance function is assessed by means of cross-validation statistics. Irregularly-spaced samples from the Walker Lake dataset are used, and spatial prediction is conducted by means of ordinary kriging. The regularized Boltzmann-Gibbs covariance yields improved predictive performance compared to the exponential covariance model.


Introduction
There are strong links between statistical physics and spatial statistics. These date back to the work of Besag [6,7] on lattice autoregression models, which was inspired by the Ising model developed for magnetic systems by the physicist Ernst Ising [32], and to the seminal Monte Carlo algorithm of Nicholas Metropolis [38], which was initially developed for the simulation of liquids and today occupies a central position in Bayesian statistics and simulation-based statistical inference [42].
In statistical physics, the spatial dependence and variation of field values is usually expressed in terms of Boltzmann-Gibbs probability density functions (pdf), which have the general form f = exp(−H)/Z, where H is an energy functional, and Z is a normalization constant known as the partition function. For example, this formulation is used in statistical field theory [34]. The partition function can be explicitly calculated in finite-dimensional cases if the model is Gaussian, i.e., if the energy functional is of quadratic form, but it is usually intractable in more general cases. Nonetheless, various techniques allow the approximate calculation of Z [26].
The main ideas of statistical field theory have been used to develop Boltzmann-Gibbs random fields with a new class of covariance structure [24,28,25], including cross-covariance models for multivariate data [29]. Furthermore, combining the equilibrium (i.e., purely spatial) Boltzmann-Gibbs fields with linear response theory, a new space-time covariance function was obtained [30].
The Boltzmann-Gibbs framework was also used to develop the so-called stochastic local interaction (SLI) model. This model defines a Gaussian Markov random field over the observed locations, which may be irregularly spaced, by using an appropriately specified neighborhood for each location. Then, fast numerical estimation and prediction are possible even with large numbers of locations, and without resorting to the framework of finite-element-based approximations to stochastic partial differential equations (SPDEs) [37]. The SLI model has been recently extended to the spatiotemporal setting [27]. The main advantage of the Boltzmann-Gibbs representation in this case is that it provides explicit and sparse expressions for the precision matrix (i.e., the inverse covariance matrix), which confer a significant computational advantage with respect to the classical, covariance-based approaches (e.g., kriging and Gaussian process regression).
In this study, we investigate theoretical properties and statistical techniques for a form of Gaussian Boltzmann-Gibbs models whose energy functional H is related to the spatial variation of the realizations of the field x(s) and of its gradient function ∇x(s). High energies (and therefore low probabilities of realization) are associated with large values of the squares of the field and its gradient. Two parameters are used to scale the magnitude of the contributions from these two terms to the overall energy. This approach offers a natural mechanism for assigning lower probability of realization to functions x(s) that show strong global or local variability in space, such that spatial coherence of values x(s) across different locations s is established and can be controlled parametrically.
The content of the remainder of the paper is structured as follows. In Section 2, we define the Boltzmann-Gibbs energy with respect to an appropriate function space as a starting point and then derive distributional properties of the resulting random fields. Specifically, we characterize the spatial variability through both its spectral representation and the corresponding covariance functions. This allows us to highlight links to existing random field models that have been explored for statistical implementations. In Euclidean space of dimension larger than one, our formulation yields generalized covariances with infinite variance. Therefore, in Section 3 we develop and compare two regularization schemes that define approximating models with finite variance. One of the regularization schemes operates in the spectral domain and the other directly on the covariance function. For both approaches, we analytically and numerically investigate the behavior of the variogram function in different limits. Section 4 highlights links to the Besag and Leroux models on regular lattices, which can be viewed as another regularization approach. Section 5 discusses parameter estimation of the regularized covariance function using maximum likelihood techniques. A simulation study illustrates the performance of the estimators for various data configurations based on subsamples of the Walker Lake dataset; we advocate the pairwise difference likelihood as a method that provides accurate estimates and good scalability with many observation locations. Section 6 presents a study to assess the prediction performance of the new covariance models using the Walker Lake dataset. Finally, a discussion of the results and our conclusions are presented in Section 7.

Boltzmann-Gibbs random fields
We consider spatial models defined over a domain D, which we choose as a compact subset of R^d, where s ∈ R^d denotes the location vector. The boundary ∂D of D is supposed to be a piecewise smooth (i.e., infinitely differentiable) manifold of dimension d − 1. More precisely, {X(s; ω) : s ∈ R^d, ω ∈ Ω} is a scalar, real-valued spatial random field defined on a probability space (Ω, F, P), where Ω is the sample space, F is a σ-field of subsets of Ω, and P is a probability measure [49]. The realizations of the random field X(s; ω) for ω ∈ Ω are deterministic functions denoted by x(s). The expectation over all states will be denoted by E[·], and m_x(s) = E[X(s; ω)] if the expectation is well defined, i.e., if ∫_Ω |x(s; ω)| P(dω) < ∞. Without loss of generality, we will consider below m_x(s) = 0 unless otherwise specified.

Energy representation
In this work, we consider realizations x(s), s ∈ R^d, possessing regularity properties that give mathematical sense to operations such as differentiation (up to the second order), integration on compact subsets of R^d, and Fourier transforms, i.e., we ensure that such operations are well-defined. In the usual mean-square sense, these regularity conditions are too restrictive for practical use. We therefore consider them in a weak sense. In recent work, [11] defined Generalized Random Fields as tempered distributions from Schwartz's theory of distributions [45]. In this framework, the operations of differentiation, integration and Fourier transform are well defined and are considered in a distributional sense. Here, we will suppose that x(s) belongs to the Sobolev space W 2,2 (D), which is the weak-sense restriction to D of the generalized functions (i.e., tempered distributions) on R^d admitting square-integrable second-order distributional derivatives. In W 2,2 (D), point values such as x(s), or the Laplacian of the function x(s), are expressed in terms of integrable generalized functions, such as Dirac delta functions. For simplicity of exposition, we abuse notation and conveniently write x(s) and ∇^2 x(s) to refer to these tempered distributions. The focus of our work is on random fields characterized by the Boltzmann-Gibbs probability density function (pdf) f = exp(−H)/Z over W 2,2 (D), with energy functional given by

H[x] = (1/2) ∫_{R^d} [ α_0 x^2(s) + α_1 (∇x(s))^2 ] I(s ∈ D) ds,   (2.1)

where α_0, α_1 > 0 are coefficients, ∇ = (∂/∂s_1, …, ∂/∂s_d)^T is the gradient operator, and I(A) is the indicator function with I(A) = 1 if the condition A is satisfied and I(A) = 0 otherwise.
In the pdf, the denominator Z is the normalizing constant, frequently referred to as the partition function. The functions x(s) belong to the space W 2,2 (D), and they denote realizations (states) of the random field X(s; ω). We shall refer to such random fields as Boltzmann-Gibbs (BG) Random Fields.
From a rigorous, mathematically oriented perspective, the representation (2.1) should be set in the framework of Gaussian measures on Hilbert spaces [13]. The probability density would then be deduced from the respective characteristic function, involving its trace operator on W 2,2 (D). This rigorous approach would come at the cost of highly technical functional analysis arguments. Since the focus of this work is on the statistical use of BG random fields, we opt for a more direct, physics-oriented characterization. With this perspective in mind, the main properties of BG random fields X(s; ω) can be stated as follows: (A.1) The partition function Z is given by a functional integral over all states x(s) [18,34,3,26]. This represents an integral over an infinite-dimensional space, the rigorous definition of which is rather involved [3]. However, for our purposes, the mathematical details arising in the calculation of Z are circumstantial, since the statistical properties of the random field follow directly from the structure of the energy functional without making explicit use of Z. (A.2) The random field is Gaussian. This property derives from the fact that the energy is a quadratic functional of x(s).
(A.3) The expectation is equal to zero. This property follows from the fact that for Gaussian random fields the expectation coincides with the mode of the pdf. Since α_0, α_1 > 0, it holds that H ≥ 0, and the mode is obtained for H = 0. However, we get H = 0 if and only if x(s) is the zero element in W 2,2 (D), which corresponds to the generalized version in W 2,2 (D) of the null function over R^d. (A.4) The random field is stationary. This follows from the fact that (i) the expectation is zero, and (ii) the parameters α_0 and α_1, which determine the generalized covariance function (see Section 2.2), are independent of the spatial locations. In particular, the energy functional admits a spectral representation, cf. Eq. (2.6), which involves a translation-invariant spectral density determined by α_0 and α_1. Since the field is Gaussian, the translation invariance of the mean and the covariance function guarantees both wide-sense and strict-sense stationarity. (A.5) The random field is statistically isotropic. This property reflects the fact that the energy functional does not have a directional preference. The isotropy is broken if, e.g., (∇x(s))^2 is replaced by (b · ∇x(s))^2, where b ∈ R^d and b · ∇x(s) denotes the dot (inner) product of the two vectors.
In more general cases, where we consider random fields whose mean function m_x(s) is not zero everywhere, we assume that m_x(s) also belongs to W 2,2 (D). If the mean function is not constant, the random field is non-stationary in the mean. Stationarity and isotropy then hold for x(s) − m_x(s) in (2.1). From now on, m_x(s) is assumed to be equal to zero everywhere.
In the following, the integral of the square gradient in (2.1) is expressed in terms of the Laplacian of x(s). Let dS be the unit surface element on ∂D, and n(s) the unit outward-pointing vector perpendicular to ∂D for s ∈ ∂D. Then, Green's first identity, which extends integration by parts to dimensions d > 1, leads to the following representation:

∫_D (∇x(s))^2 ds = − ∫_D x(s) ∇^2 x(s) ds + ∮_{∂D} x(s) ∇x(s) · n(s) dS.   (2.2)

The standard assumption in statistical field theory is that the boundary term, i.e., the last integral on the right-hand side of Eq. (2.2), can be neglected [19]. Indeed, if the domain D is a hypercube [0, L]^d, the "bulk" term, i.e., the first integral on the right-hand side of Eq. (2.2), extends over a domain that scales as L^d, while the boundary term involves an integral over a domain that scales as L^{d−1}. Hence, if L ≫ 1, the contribution of the boundary term becomes negligible in comparison to the bulk.
In light of the above, we propose to work with the energy functional given as

H[x] = (1/2) ∫_D x(s) (α_0 − α_1 ∇^2) x(s) ds.   (2.3)

In the Boltzmann-Gibbs approach, the regularity properties of the realizations (i.e., of the sample paths) are not derived from the covariance function, as is commonly done with mathematical representations of Gaussian random fields based on the covariance [2]. In contrast, our approach consists of defining the class of functions (i.e., sample paths) admissible in the energy functional (2.1), and the properties of the joint pdf and the covariance function then follow.

Spectral representation
In this section we derive a spectral representation for the energy functional given by Eq. (2.1) in terms of the Fourier transform (FT). The Fourier representation of x(s)I(s ∈ D) involves a series of discrete frequency vectors k_i. The series representation converges to the Fourier integral as D → R^d. However, the functions x(s) in the energy functional Eq. (2.1) are not absolutely integrable over R^d. Hence, their Fourier transform does not exist in the ordinary sense. Nonetheless, a generalized Fourier transform can be defined for tempered distributions by means of suitably defined test functions [11,17,41]. For the sake of brevity, we shall directly use the main results of this theory regarding Fourier transforms. Interested readers will find all the theoretical justifications in the references above. In the following, x̃(k) refers to the generalized Fourier transform. The BG random field states can then be expressed using the weak-sense Cramér representation of x(s), given in terms of the following integral:

x(s) = IFT[x̃(k)] = (2π)^{−d} ∫_{R^d} x̃(k) e^{j k·s} dk,   (2.4)

where j is the imaginary unit, k ∈ R^d denotes the frequency vectors in reciprocal (Fourier) space, and IFT denotes the inverse Fourier transform. The energy functional can be expressed in the spectral representation. For this, we take into account the orthonormality of the Fourier basis, i.e.,

∫_{R^d} e^{j (k − k′)·s} ds = (2π)^d δ(k − k′).   (2.5)

We thus obtain the following expression of the energy functional in (2.1), which corresponds to Parseval's identity (i.e., the equality of the energy content in the direct and reciprocal spaces):

H[x̃_D] = (1/2) (2π)^{−d} ∫_{R^d} (α_0 + α_1 ‖k‖^2) x̃(k) x̃*(k) dk,

where x̃_D = {x̃(k), k ∈ R^d} and z* represents complex conjugation of z. Based on this expression, the BG spectral density is given by

C̃(k) = 1/(α_0 + α_1 ‖k‖^2) = σ^2 a^d / (1 + a^2 ‖k‖^2).   (2.6)

Hence a = (α_1/α_0)^{1/2} can be interpreted as the BG (length) scale parameter, whereas 1/(α_0 a^d) corresponds to the BG variance parameter, which we will denote by σ^2 from now on.
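As a quick sanity check of the spectral density (2.6), one can verify numerically that in d = 1 its inverse Fourier transform returns the exponential covariance. The following sketch is illustrative only and not part of the original development; the parameter values are arbitrary:

```python
import numpy as np

sigma2, a = 1.0, 0.5  # arbitrary illustrative values

def bg_spectral_density(k):
    # BG spectral density in d = 1: sigma^2 * a / (1 + a^2 k^2)
    return sigma2 * a / (1.0 + (a * k) ** 2)

def covariance_ift(r, kmax=2000.0, n=400001):
    # C(r) = (2*pi)^(-1) * integral of C~(k) * cos(k*r) over (-kmax, kmax)
    k = np.linspace(-kmax, kmax, n)
    f = bg_spectral_density(k) * np.cos(k * r)
    integral = 0.5 * np.sum((f[1:] + f[:-1]) * np.diff(k))  # trapezoid rule
    return integral / (2.0 * np.pi)

for r in [0.0, 0.25, 1.0]:
    exact = 0.5 * sigma2 * np.exp(-r / a)  # expected exponential covariance
    assert abs(covariance_ift(r) - exact) < 1e-3
```

The truncation at kmax introduces an O(1/kmax) tail error, which is why a large cutoff is needed for three-digit agreement.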

Covariance representation
The covariance function is obtained through the following inverse Fourier transform (where h ∈ R^d is the vector-valued lag in real space):

C(h) = (2π)^{−d} ∫_{R^d} C̃(k) e^{j k·h} dk.   (2.7)

Since C̃(k) is a radial function (i.e., its dependence on k is only through k = ‖k‖), the covariance function is also radial, i.e., it depends on h only through the Euclidean distance r = ‖h‖ ≥ 0. For the sake of conciseness, we will use the same notation C for the vector covariance function and its radial representation. The radial spectral representation is given by the following Hankel transform [49,26]:

C(r) = (2π)^{−d/2} r^{1−d/2} ∫_0^∞ C̃(k) J_{d/2−1}(k r) k^{d/2} dk,   (2.8)

where J_ν(·) is the Bessel function of the first kind and order ν = d/2 − 1.
In particular, it follows that the generalized BG covariance is given by

C_1(r) = (σ^2/2) e^{−r/a},
C_2(r) = (σ^2/2π) K_0(r/a),
C_3(r) = (σ^2 a/4π r) e^{−r/a},
C_4(r) = (σ^2 a/4π^2 r) K_1(r/a).   (2.10)

Above, K_ν(·) denotes the modified Bessel function of the second kind of order ν, and S_d = 2π^{d/2}/Γ(d/2) denotes the surface area of the (d − 1)-dimensional unit sphere (i.e., the "area" of the boundary of the d-dimensional unit ball), with S_1 = 2, S_2 = 2π, S_3 = 4π and S_4 = 2π^2. To see how the expressions for d = 1 and d = 3 follow from Eq. (2.9), we need to take into account, in particular, the closed form K_{1/2}(z) = (π/2z)^{1/2} e^{−z} of the half-integer-order Bessel function. In Eq. (2.10), we see that C_2(·) to C_4(·) exhibit a singularity at r = 0. More generally, for d = 2, 3, 4 the function C_d(h) is a generalized covariance [28,11]. This is a consequence of the fact that both the BG field's derivatives and its Fourier transform are defined in the generalized sense.
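The dimension-specific cases can be checked against a single Matérn-family expression; the form C_d(r) = σ^2 (2π)^{−d/2} (r/a)^{1−d/2} K_{d/2−1}(r/a) used in this sketch is an assumption consistent with the spectral density (2.6), and the d = 1 and d = 3 cases reduce to exponentials via K_{1/2}(z) = (π/2z)^{1/2} e^{−z}:

```python
import numpy as np
from scipy.special import kv  # modified Bessel function K_nu

sigma2, a = 2.0, 1.5  # arbitrary illustrative values

def C_general(r, d):
    # Assumed Matern-family form of the generalized BG covariance
    z = r / a
    return sigma2 * (2 * np.pi) ** (-d / 2) * z ** (1 - d / 2) * kv(d / 2 - 1, z)

r = 0.8
# d = 1: (sigma^2 / 2) * exp(-r/a)
assert np.isclose(C_general(r, 1), 0.5 * sigma2 * np.exp(-r / a))
# d = 3: sigma^2 * a * exp(-r/a) / (4 * pi * r)
assert np.isclose(C_general(r, 3), sigma2 * a * np.exp(-r / a) / (4 * np.pi * r))
# d = 2 diverges (logarithmically) as r -> 0: a generalized covariance
assert C_general(1e-8, 2) > C_general(1e-3, 2) > C_general(1.0, 2)
```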

Some special cases
We now elaborate on two special cases of the BG random field.

Whittle-Matérn (W-M) random fields:
The spectral density of Whittle-Matérn (W-M) random fields is given by C̃(k) = A_{ν,d}/(1 + ‖k‖^2 a^2)^{ν+d/2}, where A_{ν,d} is a coefficient that depends on the dimension of space and the smoothness index ν > 0 of the W-M field. A comparison between the above and Eq. (2.6) shows that the latter describes a W-M covariance with ν = 1/2 in d = 1. In d = 2, the BG spectral density is equivalent to the W-M spectral density for ν = 0. As shown in [11], the random field corresponding to ν = 0 is a Generalized Random Field, which is well defined in the framework of tempered random distributions. The associated covariance is a covariance distribution, referred to as the generalized Matérn covariance. In [37,11], it was shown that this Generalized Random Field arises as a weak solution of the stochastic partial differential equation (SPDE)

(κ^2 − ∇^2)^{1/2} X(s) = ε(s),   (2.11)

where κ^2 > 0, ε(s) is Gaussian white noise, and the SPDE is fractional due to the non-integer exponent 1/2. Using appropriate boundary conditions on the PDE of Neumann or Dirichlet type, [37] defined a stochastic weak solution and obtained well-defined and accurate approximations through classical Gaussian random fields. Then, the link to the parametrization of C_2(r) in (2.10) is given by a = 1/κ.

Scale-free case:
For α_0 = 0, the BG spectral density becomes C̃(k) = (α_1 ‖k‖^2)^{−1}. This is the Fourier representation of a generalized covariance function that behaves as −ln(r) in d = 2 and as r^{2−d} in d > 2. In this case, the covariance does not have a characteristic scale a. The generalized covariance function is undefined at r = 0, but it is still positive definite in a distributional sense. Furthermore, it represents the Green's function of the Laplace equation [33]. In addition, the logarithmic function obtained in d = 2 corresponds to the De Wijs geostatistical model [39,12]. Again, this random field can be interpreted as a stochastic weak solution to the SPDE in (2.11) when κ^2 = 0; see [37,11] for details.

Regularization of the covariance function
The covariances obtained for d = 2 to d = 4, i.e., C 2 (·), C 3 (·) and C 4 (·), exhibit a singularity at zero which hinders their practical use. In this section, we use regularization to build easily tractable finite-variance approximations, which can be used as standard stationary covariance functions for prediction and simulation. The term "regularization" is here used for procedures that lead to stable solutions of ill-posed problems [23]. In the case of divergent Feynman integrals obtained in statistical field theory, various regularization techniques can be used to make the integrals finite, including momentum-cutoff (spectral-cutoff), the Pauli-Villars method, as well as lattice-based and dimensional regularization methods [35]. Herein, we consider two options: real-space regularization using a spatial cutoff at very small distances, and regularization with a spectral cutoff at large frequencies.

Real-space regularization
In this section we introduce a regularization method which is based on the use of a small cutoff distance ε.

Theoretical result
We first state the general theoretical result in the following proposition and then provide its proof. The set of positive integers is denoted N*.

Proposition 3.1. For any ε > 0, the shifted function C_{d,ε}(r) = C_d(r + ε), where C_d(·) is any of the radial forms defined on the right-hand side of (2.10), is positive definite in R^ℓ for any ℓ ∈ N*.
For the proof, the following well-known proposition recalls some properties characterizing radial functions that are strictly positive definite in R^ℓ for any ℓ ∈ N*. Proofs can be found in [47,15].

Proposition 3.2 (Completely monotone functions). For a radial function ϕ(r) : [0, ∞) → R, the following three properties are equivalent:
(1) ϕ(‖h‖) is positive definite on R^ℓ for any ℓ ∈ N*;
(2) the function t ↦ ϕ(t^{1/2}) is completely monotone (CM) on [0, ∞);
(3) there exists a finite positive measure μ on [0, ∞) such that ϕ(r) = ∫_0^∞ e^{−r^2 t} μ(dt) for all r ≥ 0.
The equivalence of (2) and (3) is known as the Bernstein theorem [5], while the equivalence of (1) and (2) was shown by Schoenberg [44]. The equivalence of (1) and (3) follows from the other two equivalences. We also need the following Lemma, which is an interesting result on its own.
Proof. As stated in Proposition 3.2, the function ϕ(z) is a CM function on (0, ∞) (respectively on [0, ∞)) if and only if there exists a positive measure μ on [0, ∞) such that, for all z > 0 (respectively z ≥ 0),

ϕ(z) = ∫_0^∞ e^{−z t} μ(dt).

The claim then follows by considering a suitable positive measure constructed from μ. Equipped with these preliminary results, we are now able to prove the main result in Proposition 3.1.
Proof of Proposition 3.1. The negative exponential e^{−r} is known to be positive definite on R^ℓ for any ℓ ∈ N*. Following [14, Proposition 4], the function z^{−ν} K_ν(z), ν ≥ 0, is also CM on (0, ∞), since the product of two CM functions is CM. Then, the same reasoning as above applies: by application of Lemma 3.1, the shifted function C_d(· + ε) is positive definite on R^ℓ, for any ℓ. The proof follows by setting ν = 1/2 and ν = 1, corresponding to d = 3 and d = 4, respectively.
We highlight that, even though the functions in Eq. (2.10) have been obtained from (2.9) for specific space dimensions, all covariance functions C_{d,ε}(h) obtained by application of Proposition 3.1 are valid on R^ℓ, ℓ ∈ N*, for any value ε > 0.
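The positive definiteness asserted by Proposition 3.1 can also be probed empirically. The sketch below assumes C_2(r) = (σ^2/2π) K_0(r/a) and checks that the shifted function C_2(r + ε) yields a (numerically) non-negative definite matrix on an arbitrary irregular point configuration; the point set and parameter values are illustrative:

```python
import numpy as np
from scipy.special import k0  # modified Bessel function K_0

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(60, 2))   # irregular locations in R^2
a, eps, sigma2 = 2.0, 0.05, 1.0

# Pairwise distance matrix and shifted covariance C_2(r + eps)
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = sigma2 / (2 * np.pi) * k0((dist + eps) / a)

eigmin = np.linalg.eigvalsh(C).min()
assert eigmin > -1e-10   # non-negative definite up to rounding error
```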

Variograms of regularized BG random fields
The functions C_{d,ε}(h)/C_d(ε) are valid correlation functions for any value ε > 0. The associated normalized variograms are

γ̃_{d,ε}(r) = 1 − C_d(r + ε)/C_d(ε),   (3.1)

with ε_a = ε/a. In Figure 1, several 2D normalized variograms γ̃_{2,ε} are shown for various values of a and ε_a. The parameter ε_a has a significant effect on the shape of the variogram, which we next study analytically for ε_a ≪ 1. Note that larger values of ε_a also provide valid variograms, but in this case the function C_{d,ε}(r) can barely be considered as an approximation of the Boltzmann-Gibbs model explored in this work. At the end of this section, we further show that C_{d,ε} tends to the exponential model as ε → ∞.
Let us first consider the case d = 2, with ε_a ≪ 1. For the sake of simplifying notation we drop the index d. The dependence of the variogram at very short distances can be obtained from the leading-order approximation

K_0(z) ≈ −ln(z/2) − γ_E,  z ≪ 1,

where γ_E ≈ 0.5772 is the Euler-Mascheroni constant. This approximation is very accurate if z < 0.1, and it remains practically useful as long as z < 0.2. Direct calculation of Eq. (3.1) using the approximation yields the normalized variogram at distance ε:

γ̃_ε(ε) ≈ −ln(2) / [ln(ε_a) − ln(2) + γ_E].   (3.2)

Furthermore, given a specific level γ̃_ε(r_g) = g ≪ 1, based on Eq. (3.2) the distance r_g where this level is achieved is given by

r_g ≈ ε [ (2 e^{−γ_E}/ε_a)^g − 1 ].

Let us now compute the distance r_{1/2} such that γ̃_ε(r_{1/2}) = 1/2. Using the leading-order approximation of K_0(·), it follows that

K_0(r_{1/2}/a + ε_a) = K_0(ε_a)/2,

based on which the distance r_{1/2} is given by

r_{1/2} ≈ a (2 e^{−γ_E} ε_a)^{1/2} − ε.

If we assume that ε ≪ r_{1/2}, one finally gets

r_{1/2} ≈ a (2 e^{−γ_E} ε_a)^{1/2},

for which the condition ε ≪ a (ε_a)^{1/2} is indeed verified, since ε_a ≪ 1. These computations show that the behavior of the normalized variogram at very short distances is controlled by ε_a = ε/a, see Eq. (3.2). Furthermore, the distance at which the normalized variogram reaches 1/2 is proportional to the range a and proportional to (ε_a)^{1/2}.
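The short-distance value can be verified numerically. The sketch below assumes the d = 2 normalized variogram γ̃_ε(r) = 1 − K_0(r/a + ε_a)/K_0(ε_a) and the logarithmic approximation of K_0, both reconstructions consistent with this subsection:

```python
import numpy as np
from scipy.special import k0

gamma_E = 0.5772156649015329  # Euler-Mascheroni constant
a = 1.0

for eps_a in [1e-4, 1e-3, 1e-2]:
    # Exact normalized variogram at r = eps (so r/a + eps_a = 2*eps_a)
    exact = 1.0 - k0(2.0 * eps_a) / k0(eps_a)
    # Logarithmic approximation of Eq. (3.2)
    approx = -np.log(2) / (np.log(eps_a) - np.log(2) + gamma_E)
    assert abs(exact - approx) / exact < 0.01   # within 1 percent
```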
The practical range of the variogram can be defined as the distance r_η such that γ̃_ε(r_η) = 1 − η, with some η ≪ 1. According to Eq. (3.1), this is given by the solution of the equation K_0(r_η/a + ε_a) = η K_0(ε_a), which involves a nonlinear combination of a and ε_a.
Let us now examine the variogram corresponding to d = 3. Here again we drop the subscript d = 3 for the sake of lighter notation. Then,

γ̃_ε(ε) = 1 − (1/2) e^{−ε_a}.

Notice that this value is very close to 1/2 when ε_a ≪ 1, which is usually the case. For d = 4, the first-order Taylor series expansion K_1(z) ≈ 1/z as z → 0 leads to γ̃_ε(ε) ≈ 3/4, independently of a. These findings imply that as d increases, the (regularized) variograms γ̃_{d,ε} show an increasingly steep behavior at short distances, and the associated BG random fields have increasingly rough sample paths. Next, we consider the case ε_a ≫ 1, i.e., the asymptotic behavior as ε_a → ∞. When d = 3, one gets immediately

γ̃_ε(r) = 1 − [ε/(r + ε)] e^{−r/a} → 1 − e^{−r/a}  as ε → ∞.

For d = 2 and d = 4, following Abramowitz and Stegun [1, Eq. 9.7.2], as z → ∞ we obtain the approximation

K_ν(z) ≈ (π/2z)^{1/2} e^{−z}.

Hence, as ε ≫ a, and considering ν = d/2 − 1, one gets that

γ̃_ε(r) ≈ 1 − e^{−r/a}.

Therefore, in all cases γ̃_{d,ε} tends to the exponential variogram as ε_a tends to infinity.
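The convergence to the exponential variogram can be illustrated numerically for d = 2; the sketch assumes the form γ̃_ε(r) = 1 − K_0((r + ε)/a)/K_0(ε/a), and uses the exponentially scaled Bessel function k0e to avoid underflow of K_0 at large arguments:

```python
import numpy as np
from scipy.special import k0e  # k0e(x) = exp(x) * K_0(x)

a = 1.0
eps_a = 1.0e3   # eps/a >> 1

for r in [0.5, 1.0, 3.0]:
    # K_0((r+eps)/a) / K_0(eps/a) = k0e(r/a + eps_a)/k0e(eps_a) * exp(-r/a)
    gamma_reg = 1.0 - k0e(r / a + eps_a) / k0e(eps_a) * np.exp(-r / a)
    gamma_exp = 1.0 - np.exp(-r / a)   # exponential variogram
    assert abs(gamma_reg - gamma_exp) < 5e-3
```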
In conclusion, the parameter ε_a plays the role of a regularity parameter that controls the shape of the covariance function, ranging from a generalized covariance function with a slope at the origin proportional to ε_a^{−1} if ε_a ≪ 1, to an exponential model if ε_a ≫ 1.

Frequency-space regularization
The divergent covariance functions in Eq. (2.10) (for d = 2, 3, 4) can also be regularized by imposing a frequency cutoff Λ > 0, i.e., by assuming that the spectral density is given by

C̃_Λ(k) = C̃(k) I(‖k‖ ≤ Λ).   (3.4)

Note that the BG spectral density in Eq. (2.6) is given by C̃(k) = lim_{Λ→∞} C̃_Λ(k). Using the frequency cutoff Λ < ∞, the variance is given by the integral

σ_Λ^2 = (2π)^{−d} ∫_{‖k‖≤Λ} C̃(k) dk,   (3.5)

which takes a finite value for d ≥ 2.
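The cutoff variance in d = 2 can be checked by quadrature; the closed form σ_Λ^2 = (σ^2/4π) ln(1 + a^2 Λ^2) used as the reference below is our reconstruction of the d = 2 case of (3.9):

```python
import numpy as np
from scipy.integrate import quad

sigma2, a, Lam = 1.0, 2.0, 50.0  # arbitrary illustrative values

def spec(k):
    # radial BG spectral density in d = 2: sigma^2 a^2 / (1 + a^2 k^2)
    return sigma2 * a**2 / (1.0 + (a * k) ** 2)

# sigma_Lam^2 = (2*pi)^(-2) * Int_{|k| <= Lam} C~(k) dk (polar coordinates)
num, _ = quad(lambda k: spec(k) * 2.0 * np.pi * k, 0.0, Lam)
var_num = num / (2.0 * np.pi) ** 2

var_closed = sigma2 / (4.0 * np.pi) * np.log(1.0 + (a * Lam) ** 2)
assert np.isclose(var_num, var_closed, rtol=1e-8)
```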
The finite cutoff replaces the upper integration limit in the spectral integral of Eq. (2.8) for the covariance function. To our knowledge, this integral with finite upper limit cannot be explicitly evaluated for every r ≥ 0. However, for large r the equations (2.10) provide a very accurate approximation of the regularized covariance. A finite value of Λ implies (i) that the regularized variance (3.5) is also finite and (ii) that the random field is mean-square differentiable for every order. Hence, the variogram function is well-defined for r → 0, and in addition γ(r) = c 0 r 2 + O(r 4 ).
Indeed, using the spectral representation (2.8), we obtain the following representation for finite Λ and for d ≥ 2:

γ_Λ(r) = σ_Λ^2 − (2π)^{−d/2} r^{1−d/2} ∫_0^Λ C̃(k) J_{d/2−1}(k r) k^{d/2} dk.   (3.6)

To approximate the above expression, we use the Taylor series expansion of the Bessel function J_ν(·), which is given by [1, Eq. 9.1.10]

J_ν(z) = (z/2)^ν ∑_{m=0}^∞ (−z^2/4)^m / [m! Γ(ν + m + 1)].   (3.7)

In light of the above, the expansion of the variogram function around r = 0 is given by a series of moment integrals of the spectral density. Each such integral is evaluated using the change-of-variable transformation ak → x and leads to the general expression

γ_Λ(r) ≈ σ^2 B_d(aΛ) (r/a)^2,   (3.8a)

where B_d(aΛ) is a dimensionless constant. More precisely, the corresponding expressions of B_d(aΛ) for d = 2, 3, 4 are given in (3.8b). The variance of the spectrally regularized BG process with spectral density (2.6) is given by

σ_Λ^2 = (2π)^{−d} ∫_{‖k‖≤Λ} C̃(k) dk.
Evaluation of the above integral leads to the following expressions:

σ_Λ^2 = (σ^2/4π) ln(1 + a^2 Λ^2),  d = 2,
σ_Λ^2 = (σ^2/2π^2) [aΛ − tan^{−1}(aΛ)],  d = 3,
σ_Λ^2 = (σ^2/16π^2) [a^2 Λ^2 − ln(1 + a^2 Λ^2)],  d = 4.   (3.9)

Thanks to the high-frequency cutoff, the variance is finite as can be seen from (3.9), and the variogram is therefore bounded. Thus, a normalized version of the variogram can be defined, which we denote by γ̃_Λ(r) = γ_Λ(r)/σ_Λ^2. Let us consider the case d = 2. Based on the spectral representation Eq. (3.6), the series expansion Eq. (3.7) of the Bessel function J_0, and the regularized variance Eq. (3.9), the series expansion of the normalized variogram becomes a power series in (rΛ/2)^2, with the series coefficients u_k being defined by the integrals

u_k = ∫_0^Λ q^{2k+1}/(1 + a^2 q^2) dq,  k = 0, 1, 2, …   (3.10b)

Let us define ũ_k = u_k/Λ^{2k}, with ũ_0 = ln(1 + a^2 Λ^2)/(2a^2). Then, the ũ_k are obtained from the following recursive equation: ũ_k = (1/a^2)[1/(2k) − ũ_{k−1}/Λ^2], which shows that a^2 ũ_k tends rapidly to 1/(2k) for increasing k. Finally, the following power series is obtained for γ̃_Λ(r):

γ̃_Λ(r) = ∑_{k=1}^∞ (−1)^{k+1} (rΛ/2)^{2k} ũ_k / [(k!)^2 ũ_0].   (3.11)

The power series expansion in Eq. (3.11) converges absolutely for all r. This can be shown by means of D'Alembert's ratio test [46], thanks to the factorial term (k!)^2 in the denominator. However, the expansion is not very useful in practice, since the rapid growth of the term (rΛ/2)^{2k} with increasing r makes the series unstable as soon as rΛ/2 > 20. A more stable approach is the expansion of the Hankel transform (2.8), which makes use of the roots of the zero-order Bessel function of the first kind and leads to semi-analytical expressions for d = 2, 4 [4,22]. For numerical computation of γ_Λ(r) by means of Eq. (3.6), standard numerical integration routines provide stable results (e.g., using the integrate routine in the base library of the R software).
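Mirroring the remark on numerical integration (the text uses R's integrate routine; scipy.integrate.quad plays the same role here), the cutoff covariance in d = 2 can be evaluated directly from the Hankel integral and compared with the generalized covariance at a moderate lag, where the text states the two are very close:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0, k0

sigma2, a, Lam = 1.0, 1.0, 200.0  # arbitrary illustrative values

def C_cutoff(r):
    # C_Lambda(r) = (1/2pi) * Int_0^Lambda C~(k) J_0(k r) k dk  (d = 2)
    val, _ = quad(lambda k: sigma2 * a**2 / (1.0 + (a * k) ** 2) * j0(k * r) * k,
                  0.0, Lam, limit=500)
    return val / (2.0 * np.pi)

r = 1.0
# For r >> 1/Lambda the cutoff covariance approaches (sigma^2/2pi) K_0(r/a)
assert abs(C_cutoff(r) - sigma2 / (2.0 * np.pi) * k0(r / a)) < 1e-3
```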

Matching the regularization schemes
In the preceding two subsections, we presented real-space and frequency-space regularization methods for the generalized covariance functions (2.10) in d = 2, 3, 4. Both approaches provide expressions that yield finite variance values, while maintaining the form of the generalized covariances in (2.10) at larger lags. Hence, it is useful to investigate conditions on the regularization parameters for which both approaches lead to the same variance, i.e., σ_ε^2 = C_ε(0) = C(ε) = σ_Λ^2. To better understand this constraint, consider the fact that the variance is given by the integral of the spectral density, as evidenced in (3.5). Both regularization schemes modify the spectral density (3.4) at high frequencies (large ‖k‖). Hence, the matching of the variance ensures that the integrals of the respective spectral densities, C̃_Λ(‖k‖) and C̃_ε(‖k‖), are the same over the entire range of ‖k‖, and thus over the regime of high ‖k‖ as well.
By taking into account (2.10) and (3.9), the variance matching condition leads to the following equations:

K_0(ε/a) = (1/2) ln(1 + a^2 Λ^2),  d = 2,
(a/ε) e^{−ε/a} = (2/π) [aΛ − tan^{−1}(aΛ)],  d = 3,
(a/ε) K_1(ε/a) = (1/4) [a^2 Λ^2 − ln(1 + a^2 Λ^2)],  d = 4.

We further simplify the equations above by considering that, in order to only modify the behavior of the covariance near zero, it must hold that aΛ ≫ 1 and ε/a ≪ 1. We therefore use the following approximations: K_0(ε/a) ≈ −γ_E − ln(ε/2a), K_1(ε/a) ≈ a/ε, ln(1 + a^2 Λ^2) ≈ 2 ln(aΛ), exp(−ε/a) ≈ 1, and aΛ − tan^{−1}(aΛ) ≈ aΛ. Then, the variance matching condition simply becomes Λε = 2e^{−γ_E} in d = 2, Λε = π/2 in d = 3, and Λε = 2 in d = 4. As a first-order approximation one could thus set Λε ≈ d/2. In addition to the assumptions aΛ ≫ 1 and ε/a ≪ 1, the above analysis demonstrates that the variance matching condition also requires Λε ∼ O(1). These conditions allow a wide range of combinations of ε and Λ values.
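The d = 2 matching can be checked with a root finder; the equation K_0(ε/a) = (1/2) ln(1 + a^2 Λ^2) solved below is our reconstruction of the d = 2 variance-matching condition, and its root should satisfy Λε ≈ 2e^{−γ_E}:

```python
import numpy as np
from scipy.special import k0
from scipy.optimize import brentq

gamma_E = 0.5772156649015329
a = 1.0

for eps in [1e-4, 1e-3]:
    # Solve K_0(eps/a) = 0.5 * ln(1 + (a*Lam)^2) for Lam
    f = lambda Lam: k0(eps / a) - 0.5 * np.log(1.0 + (a * Lam) ** 2)
    Lam = brentq(f, 1.0, 1e8)
    ratio = Lam * eps / (2.0 * np.exp(-gamma_E))
    assert abs(ratio - 1.0) < 0.01   # Lam * eps is close to 2 * exp(-gamma_E)
```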
A different approach for matching the regularization schemes involves matching the short distance behavior of the variogram function. As before, we assume aΛ ≫ 1 and ε/a ≪ 1. In d = 2, we consider the first term of the series expansion (3.8) for the normalized variogram γ̃_Λ(r), which is given by

γ̃_Λ(r) ≈ (rΛ/2)^2 [1 − ln(1 + a^2Λ^2)/(a^2Λ^2)] / ln(1 + a^2Λ^2) ≈ (rΛ)^2 / [8 ln(aΛ)],

where the last approximation is based on aΛ ≫ 1. If we assume that Λε = b for some fixed scalar b, then γ̃_Λ(ε) ≈ b^2 / [8 ln(ab/ε)]. On the other hand, the variogram at distance ε according to real-space regularization is given by (3.2), γ̃_ε(ε) ≈ −ln(2)/[ln(ε/a) − ln(2) + γ_E]. To match the two regularized variograms at distance ε, we numerically determine the locus b(ε) of the nonlinear equation γ̃_Λ(ε) − γ̃_ε(ε) = 0, for log(ε/a) ∈ [−10, −2]. Figure 2 plots ε/a against the respective values of b. As evidenced in this plot, the two variograms match for b ∈ [2.45, 2.75] and ε/a ≤ 0.1, which then implies that aΛ ≫ 10. Hence, we conclude that both matching approaches lead to very similar results, despite their different origins.
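The locus b(ε) can be reproduced approximately with a root finder; the expansion coefficient 1/(8 ln(aΛ)) used below is a reconstruction consistent with the approximations of this subsection (with a = 1), so the resulting interval for b is indicative rather than exact:

```python
import numpy as np
from scipy.optimize import brentq

gamma_E = 0.5772156649015329

def b_match(log_eps_a):
    # Solve gamma~_Lam(eps) = gamma~_eps(eps) for b = Lambda * eps, where
    # gamma~_Lam(eps) ~ b^2 / (8 ln(b / eps_a)) and
    # gamma~_eps(eps) ~ -ln(2) / (ln(eps_a) - ln(2) + gamma_E), with a = 1.
    eps_a = np.exp(log_eps_a)
    target = -np.log(2) / (log_eps_a - np.log(2) + gamma_E)
    g = lambda b: b**2 / (8.0 * np.log(b / eps_a)) - target
    return brentq(g, 1.0, 10.0)

bs = [b_match(x) for x in np.linspace(-10.0, -2.3, 9)]
assert all(2.4 < b < 2.8 for b in bs)   # close to the reported range [2.45, 2.75]
```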

Statistical models for lattice data
In this section, we present a discretization of the continuum BG model over a regular grid in dimension d. We consider a hyper-rectangular grid G = {1, …, n_1} × … × {1, …, n_d} ⊂ Z^d with a hyper-cubic unit cell. The d-dimensional vectors s_k, k = 1, …, n with n = ∏_{i=1}^d n_i are the grid node locations, and x = (x_1, …, x_n)^T with x_k = x(s_k) is the vector of the field values on the grid. We further allow the grid step b > 0 to be different from 1, such that we use the grid G_b = {b j | j ∈ G}, where j = j(s) = (j_1, …, j_d) is a G-valued index vector with one index for each of the d dimensions. We represent grid points s as follows:

s = b ∑_{i=1}^d j_i e_i,

where {e_1, …, e_d} is the set of unit vectors e_i = (0, …, 0, 1, 0, …, 0)^T with 1 at the i-th position, and j is the G-valued index defined above.

Energy terms for lattice models
By using a discrete approximation of the gradient evaluated at the grid points in G_b, the energy term (2.1) can be expressed as a bilinear functional of x. Using the forward difference approximation for the gradient, the discretized contribution of a grid point s ∈ G_b to the integral in the energy functional of the Boltzmann-Gibbs model (2.1) involves a leading term that corresponds to the normalization with respect to the hyper-volume b^d of the grid cell containing s. Since each point s ∈ G_b is the forward neighbor of the d points s − b e_i, i = 1, …, d, the overall additive contribution (to the approximation of the energy integral) of the terms that directly involve x_k = x(s_k) can be collected in a single term I_k. Therefore, we use the approximation H_b(x; θ) = ∑_{k=1}^n I_k for the lattice model over G_b. The points s at the boundary of G_b merit special attention. If s has index vector j and j_i = n_i for some i ∈ {1, …, d}, then the forward neighbor s + b e_i is not part of G_b. Similarly, if j_i = 1 for some i ∈ {1, …, d}, then s is the forward neighbor of the point s − b e_i, which is not part of G_b. In the following equations, we therefore keep the points lying outside the grid G_b if they are direct neighbors of points within the grid. In statistical practice, even if the field values for such points outside the grid are not observed, it is possible to impose appropriate boundary conditions for them. A first possibility is to use Dirichlet boundaries, i.e., to set x(s) = 0 if s ∉ G_b. Another possibility is to use Neumann boundaries, i.e., to set x(s + b e_i) = x(s) if exactly one of the two points s or s + b e_i is not part of G_b.
The energy term H_b(x; θ) discretized over the grid G_b is bilinear with respect to the components of x, such that it can be written in matrix notation in terms of a precision matrix Q(θ). Since the original energy functional and the discrete approximation in (4.1) are non-negative for any α₀, α₁ ≥ 0 such that max(α₀, α₁) > 0, the precision matrix Q(θ) is non-negative definite. Moreover, the precision matrix is positive definite if at least one of α₀, α₁ is positive, and if in addition max_{i=1,…,d} n_i ≥ 2 in the case of Neumann boundary conditions. The energy functional (4.1) tends to the continuum energy (2.1) in the limit of infill asymptotics b → 0, with the number of grid points n_i growing accordingly. The precision matrix can be additively decomposed into two terms: the n × n identity matrix I_n, which results from the energy of the squared fluctuations, and the gradient precision sub-matrix Q₁, with entries indexed by k₁, k₂ ∈ {1, …, n}, which incorporates the sum of the squared differences.
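The additive decomposition above can be illustrated with a small sparse-matrix sketch in d = 2 (the function names and the treatment of Neumann boundaries via a path-graph Laplacian are this sketch's own assumptions; the b^d and b^{d−2} scalings follow the normalization of the discretized energy):

```python
import numpy as np
import scipy.sparse as sp

def path_laplacian(n):
    # 1D graph Laplacian of a path (Neumann-type boundary): diagonal entries
    # equal the number of neighbors (1 at the endpoints, 2 in the interior)
    main = np.full(n, 2.0)
    main[[0, -1]] = 1.0
    off = -np.ones(n - 1)
    return sp.diags([off, main, off], [-1, 0, 1])

def bg_precision_2d(n1, n2, alpha0, alpha1, b=1.0):
    """Sketch of Q = alpha0*b^d*I + alpha1*b^(d-2)*Q1 for d = 2."""
    L = sp.kron(sp.identity(n2), path_laplacian(n1)) \
        + sp.kron(path_laplacian(n2), sp.identity(n1))  # 2D lattice Laplacian
    return alpha0 * b**2 * sp.identity(n1 * n2) + alpha1 * L

Q = bg_precision_2d(10, 8, alpha0=0.5, alpha1=1.0)
eigmin = np.linalg.eigvalsh(Q.toarray()).min()
assert eigmin > 0  # positive definite when alpha0 > 0
```

The smallest eigenvalue here equals α₀ b², since the lattice Laplacian annihilates constant vectors under this boundary treatment.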

Links to common lattice models
Using the above expressions for the precision matrix elements, we can explicitly provide the conditional distribution of the observation x_k at point s_k given the observations at its grid neighbors, where x_{−k} refers to the values of the field over all grid points except x_k. From (4.3), it follows immediately that our grid approximation of the model is closely related to two widely used models for lattice data in spatial statistics. The most popular and fundamental model for areal data observed over known graphs, which are defined by means of the respective adjacency matrix, is due to Besag [6,7,8,43]; it is often referred to as the conditionally autoregressive (CAR) model. The general Besag model can be characterized through its conditional distributions, in which τ > 0 is a precision parameter. Therefore, the Boltzmann-Gibbs model characterized by the conditional distributions in (4.3) is equivalent to the Besag model (with the points connected to s given by s ± b e_i, i = 1, …, d) if α₀ = 0 and τ is identified accordingly. An extension of the Besag model is the Leroux model [36], which is characterized by a precision matrix involving a precision parameter τ_Leroux > 0, a mixing parameter λ, and the Besag precision matrix Q_Besag(1) for unit grid step. As a result, the grid approximation of the Boltzmann-Gibbs model with energy given in (4.1) can be interpreted as a reparametrization of the Leroux model using parameters α₀ and α₁ instead of τ_Leroux and λ. The grid step b is not considered a parameter since it is usually fixed before using the model. The grid approximation, potentially combined with appropriate boundary conditions, can be viewed as another regularization technique for the continuous-space model, where smaller values of b lead to a more faithful approximation.
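To make the reparametrization concrete, the following sketch assumes the common Leroux form Q = τ[(1 − λ)I + λ Q_Besag] and checks numerically that it coincides with a lattice precision of the form α₀I + α₁Q_Besag under τ = α₀ + α₁ and λ = α₁/(α₀ + α₁), for b = 1 (this parameter mapping is an illustration derived from the standard Leroux form, not taken verbatim from the text):

```python
import numpy as np

def besag_structure(n):
    # Besag structure matrix (graph Laplacian) of a 1D path graph, unit step
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

alpha0, alpha1, n = 0.7, 1.3, 6          # illustrative parameter values
L = besag_structure(n)
Q_bg = alpha0 * np.eye(n) + alpha1 * L    # BG-type lattice precision (b = 1)
tau = alpha0 + alpha1                     # Leroux precision parameter
lam = alpha1 / (alpha0 + alpha1)          # Leroux mixing parameter
Q_leroux = tau * ((1 - lam) * np.eye(n) + lam * L)
assert np.allclose(Q_bg, Q_leroux)        # same matrix, two parametrizations
```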

Parameter estimation
We now explore how the estimation of the parameters of a regularized covariance function of the form σ² C_{d,ε}(h; a) can be achieved. Several standard estimation methods are available for Gaussian random fields: Weighted Least Squares (WLS) [12], Full Likelihood (FL) and Composite Likelihood (CL) [10,9]. Composite likelihoods are products of "smaller" likelihoods defined on certain subsets of the data, such as data pairs, for which the likelihood is easy to compute. One can consider the marginal likelihood, the conditional likelihood or the likelihood of differences (also referred to as increments) of such pairs. We follow [10] and choose to consider all possible differences U_ij(ω) = X(s_i; ω) − X(s_j; ω), where U_ij(ω) is the two-point random increment. Compared to a full likelihood, this approach can easily handle very large datasets, which resonates with the motivation for exploring Boltzmann-Gibbs models. Following the definition of the variogram function, i.e., γ_ij(θ) = ½ Var[X(s_i; ω) − X(s_j; ω)], we obtain Var[U_ij(ω)] = 2γ_ij(θ); note that γ_ij(θ) is abbreviated notation for γ_{d,ε}(s_j − s_i; θ), which emphasizes the dependence upon the parameters θ to be estimated. Under the Gaussian assumption for the random field X(s; ω), U_ij(ω) is distributed as a Gaussian random variable N(0, 2γ_ij(θ)). Ignoring the spatial dependence between different variables U_ij(ω), and after dropping the constant terms that are irrelevant for parameter estimation, the negative weighted composite likelihood cl(θ) is a weighted sum of pairwise likelihood terms over all pairs (i, j). The computational cost of calculating cl(θ) is O(N²). It can be significantly reduced by only considering pairs of sites that are closer than a specified distance r₀, i.e., by setting w_ij = I(r_ij ≤ r₀).
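A minimal sketch of this pairwise-difference composite likelihood follows (the function name, the example variogram and the parameter values are illustrative; constants independent of θ are dropped, and the weights are w_ij = I(r_ij ≤ r₀)):

```python
import numpy as np
from scipy.spatial.distance import pdist

def neg_pairwise_diff_cl(values, coords, variogram, r0):
    """Negative pairwise-difference composite likelihood (constants dropped).

    Each increment U_ij = X(s_i) - X(s_j) is treated as N(0, 2*gamma_ij);
    pairs farther apart than the cutoff r0 receive zero weight.
    """
    r = pdist(coords)              # pairwise distances r_ij
    u = pdist(values[:, None])     # |X_i - X_j|, squared below
    mask = r <= r0
    g = variogram(r[mask])         # gamma_ij(theta)
    return np.sum(0.5 * np.log(g) + u[mask] ** 2 / (4.0 * g))

# Example with an exponential variogram (sigma2 and a are illustrative values)
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
values = rng.normal(size=50)
expo = lambda r, sigma2=1.0, a=3.0: sigma2 * (1 - np.exp(-r / a))
cl = neg_pairwise_diff_cl(values, coords, expo, r0=5.0)
```

In practice cl(θ) would be minimized over θ with a numerical optimizer; the cutoff r₀ keeps the number of pairs manageable for large N.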
As shown in [10], choosing an appropriate r₀ not only provides a significant gain in computational time but also increases the statistical efficiency, since pairs of observations separated by distances exceeding the correlation range are uninformative for the spatial scale parameter. The associated CL score statistic is unbiased, i.e., E[CL^(1)(θ; ω)] = 0, where 0 is the zero vector in a space with dimension equal to the number of components of θ. In Eq. (5.2), f^(1) denotes the gradient (i.e., the vector of partial derivatives) of a function f with respect to θ. This result does not depend on distributional assumptions about X(s; ω), since for stationary random fields it always holds that E[U_ij(ω)²] = 2γ_ij(θ). CL is therefore expected to provide better estimates than FL when the Gaussian assumption is violated.

Description of the data set
We use the well-known Walker Lake data set [31] to illustrate statistical aspects of the Boltzmann-Gibbs covariance model. It is available from the gstat package in R [40]. The Walker Lake data set was derived from digital terrain elevations of the Walker Lake area in western Nevada, US. A rectangular region containing elevation values at 1.95 million regularly spaced points was divided into a coarse-grained 300 × 260 grid in which each cell contained a 5 × 5 array of points. The mean and variance of the elevation within each cell were determined. Through a deterministic function of these two values, a synthetic metal concentration V (herein denoted by X for consistency with our random field notation) was derived for each cell of the grid. A complete description of the construction of the Walker Lake data set is given in [31, Appendix A]. Because the complete data set is highly non-stationary, we selected the [10, 200] × [10, 100] subdomain containing N = 17,381 data points, which we deem compatible with a stationarity assumption (see the left-hand plot of Figure 3).

Estimation with full dataset
We first estimate the parameters of the BG variogram using as training set all 17,381 data points in the subdomain, based on WLS and CL. Full likelihood is not feasible for numerical reasons due to the large size of the training dataset. As can be seen from the histogram on the right-hand side of Figure 3, the concentration follows a non-Gaussian distribution. For numerical optimization we used the R function nlminb, which was successfully tested on simulated data in unreported preliminary analyses. Estimated values of the BG parameter vector (σ², a, ε/a) are listed in Table 1, and the estimated variograms are shown in Figure 4. The experimental variogram (black circles) is computed for distances ranging from 0 to 70, using K = 80 bins of equal width. When the fitting is performed using WLS, the experimental variogram is closely fitted by the Boltzmann-Gibbs variogram σ² γ̃_{2,ε}(h), where γ̃_{2,ε}(h) is given by Eq. (3.1) (blue continuous line). WLS is used with optimal weights proportional to n_k/γ²(h_k; θ), where h_k is the distance corresponding to bin k ∈ {1, …, K} and n_k is the number of pairs in that bin (cf. [12] for a justification of this choice). CL-based variograms estimated with cutoff distances r₀ = 10 and r₀ = 20 underestimate the variability at distances larger than the respective cutoff distance. For r₀ = 30 or larger, the estimated variogram is almost identical to the one estimated with WLS. In all cases, the estimated spatial cutoff parameter verifies ε/a ≪ 1 (see Table 1), as discussed in Section 3.
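The WLS fit with weights n_k/γ² can be sketched as follows (an exponential variogram stands in for the BG model, since Eq. (3.1) is not reproduced here; the synthetic data, binning and starting values are all illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def empirical_variogram(values, coords, edges):
    # Method-of-moments estimate: half the mean squared increment per bin
    r = pdist(coords)
    u2 = pdist(values[:, None]) ** 2
    idx = np.digitize(r, edges) - 1
    h, g, n = [], [], []
    for k in range(len(edges) - 1):
        sel = idx == k
        if sel.sum() > 0:
            h.append(r[sel].mean())
            g.append(0.5 * u2[sel].mean())
            n.append(sel.sum())
    return np.array(h), np.array(g), np.array(n)

def wls_objective(theta, h, g_emp, n_pairs, model):
    # Weighted least squares with weights proportional to n_k / gamma^2
    g_mod = model(h, *theta)
    return np.sum(n_pairs * (g_emp - g_mod) ** 2 / g_mod ** 2)

expo = lambda h, sigma2, a: sigma2 * (1 - np.exp(-h / a))
rng = np.random.default_rng(1)
coords = rng.uniform(0, 50, size=(200, 2))
values = rng.normal(size=200)
h, g, n = empirical_variogram(values, coords, np.linspace(0, 25, 21))
res = minimize(wls_objective, x0=[1.0, 5.0], args=(h, g, n, expo),
               method="Nelder-Mead")
```

The fitted parameters are in `res.x`; for the BG model, `expo` would simply be replaced by σ² γ̃_{2,ε}(h).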
For comparison, an exponential variogram σ²(1 − exp(−‖h‖/a)) was also fitted with WLS and CL under the same conditions. Its fit to the experimental variogram is evidently inferior to that of the BG model, in particular for intermediate (with respect to the range) distances, approximately between 15 and 35. The additional regularization parameter in the BG variogram allows a very accurate fit of the experimental variogram at all distances (short, intermediate and large). For the exponential variogram, CL systematically underestimates the variability at distances larger than r₀. The estimates are equal to the WLS estimates only if the cutoff distance is equal to 50, which equals half of the North-South diagonal distance in the study domain. From now on, the set of parameters (σ², a, ε/a) = (67,450, 40.7, 0.07) will be considered as the "ground truth". These parameters correspond to the WLS and CL estimates with r₀ ≥ 30 (blue curve in Figure 4).

Estimation using smaller training sets
We now assess the ability to estimate the "true" variogram based on training sets of small to moderate size. We will uniformly select 200, 400 and 800 samples from the full subdomain dataset and estimate the parameters using WLS, Full Likelihood (FL) and CL with r 0 ∈ {5, 10, 20, 30, 40, 50}. The whole experiment of random subsampling followed by estimation is repeated 100 times.
These estimates are then compared to the "ground truth" established in Section 6.2.1. Since the synthetic metal concentration does not follow the Gaussian distribution, likelihood methods might not perform very well. FL is expected to perform poorly (even if computationally feasible), because it relies heavily on the distributional properties of the multivariate Gaussian distribution. Figure 5 shows the estimates of the three BG parameters for different estimation methods, as well as the scatter plot of log(â) vs. σ̂², for all methods. The estimates are obtained for 100 configurations of n = 400 randomly sampled observations. FL leads to a bimodal set of estimates. In the first group, the estimates are close to those obtained with the complete dataset (black markers in Figure 5). However, there is also a second group of estimated values (red markers in Figure 5) with much larger values of σ̂² (from 115,687 to 200,643) and ranges (from 949 to 2,102, with 2,102 representing the upper bound for the range used in the optimization procedure) and smaller ε/a (from 0.0007 to 0.0032). This bimodality is specific to FL in this experiment, even though WLS and CL with large r₀ also lead to estimates that significantly deviate from the "true" values. Closer inspection reveals that r₀ ∈ {20, 30} consistently leads to the least biased estimates. However, the variance is underestimated due to the small size of the training set. The upper-right panel of Figure 5 is a scatter plot of σ̂² and â obtained from all the estimation methods (WLS, CL, FL). As evidenced in this plot, σ̂² and â are not correlated if σ̂² is approximately around the "true" value. However, for training configurations that lead to larger estimates of σ̂², there is a positive linear correlation between log(â) and σ̂².
Very similar patterns were obtained with training sets containing n = 200 and n = 800 observations. Estimates are more or less dispersed around the "true" values, depending on n. In particular, the FL group of large variance and large range estimates becomes smaller as n increases (and vice-versa, it becomes larger as n decreases). The score statistics associated with CL, given in Eq. (5.2), are very close to zero for all cutoff distances r 0 . Figure 6 represents the 100 BG model variograms (based on CL parameter estimates with r 0 = 30), along with the variograms (experimental and based on CL estimates) obtained from the complete dataset of the Walker Lake subdomain. For the sake of comparison, both Boltzmann-Gibbs and exponential variograms were estimated. The comparison clearly shows that BG estimates are less dispersed and more centered around the "true" (experimental) variogram. This figure provides further evidence that the regularized BG variogram model better fits the spatial dependence of the data than the exponential model, even for moderately sized training sets. Similar plots were obtained for n = 200 and n = 800 but are not reported herein for the sake of conciseness.

Prediction and cross-validation
To assess the predictive performance of the BG model, the random training sets and their associated parameter estimates obtained in Section 6.2 are now used for prediction (kriging) and cross-validation. For a training set of size n, where n ∈ {200, 400, 800}, we use ordinary kriging to predict the synthetic metal concentration X at each of the N − n remaining locations of the Walker Lake subdomain (N = 17,381). The partitioning of the dataset into training and validation sets is repeated 100 times. Four different spatial models are used for the comparison. First, ordinary kriging is performed for each randomly selected training set with the BG "ground truth" variogram and the exponential (Exp) variogram (both shown as blue lines in Figure 4). Recall that the "ground truth" parameters were estimated with WLS, and they are listed in Table 1. Then, for each of the above random training sets, the variograms (BG and Exp) with parameters estimated using maximum CL with the optimal cutoff distance (i.e., r₀ = 30) are also used for ordinary kriging. The resulting four spatial models are denoted F-BG, F-Exp, S-BG and S-Exp, respectively, where F stands for "Full dataset" (on the selected subdomain) and S for "Subdomain samples". For each random training set and each spatial model, four different scores are computed: mean error (ME), mean squared error (MSE), mean normalized squared error (MNSE), and the linear correlation between the prediction error and the true value (COR). Good predictive performance requires ME close to 0 and MNSE close to 1. The latter requirement is due to the fact that (assuming zero bias of the kriging estimates) E[(X_i − X̂_i)²] = σ²_{K,i}, where X_i and X̂_i are the true value and the kriging estimate at point s_i, respectively, and σ²_{K,i} is the associated kriging variance. Recall that kriging is the orthogonal projection on the linear subspace spanned by the data, e.g., [12].
Therefore, E[(X_i − X̂_i) X̂_i] = 0, and COR should be as close to 0 as possible. For each of the above four scores and each of the four spatial models, we obtain 100 values associated with the 100 training sets.
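The four scores can be computed directly from the true values, kriging predictions and kriging variances (a straightforward sketch; the array names and the toy data are this example's own):

```python
import numpy as np

def cv_scores(x_true, x_pred, krig_var):
    """ME, MSE, MNSE (squared errors normalized by the kriging variance),
    and COR (correlation between prediction error and true value)."""
    err = x_true - x_pred
    return {
        "ME": err.mean(),
        "MSE": (err ** 2).mean(),
        "MNSE": (err ** 2 / krig_var).mean(),
        "COR": np.corrcoef(err, x_true)[0, 1],
    }

# Toy check: errors drawn with the nominal kriging variance give MNSE near 1
rng = np.random.default_rng(0)
x_pred = rng.normal(size=5000)
krig_var = np.full(5000, 0.25)
x_true = x_pred + rng.normal(scale=np.sqrt(krig_var))
scores = cv_scores(x_true, x_pred, krig_var)
assert abs(scores["MNSE"] - 1.0) < 0.1
assert abs(scores["ME"]) < 0.05
```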
The results of the cross-validation analysis are summarized in Table 2. As expected, for all values of n and for all the spatial models, kriging is essentially unbiased (ME is close to zero). Boxplots of the average MSEs are shown in Figure 7. For all n, the MSE is smaller for the BG model than for the Exp model, whether based on the full dataset (F-BG vs. F-Exp) or the subsamples (S-BG vs. S-Exp). The relative MSE improvement of the BG model versus the Exp model ranges from 1.5% when n = 200 to 2.4% when n = 800. When using the Exp model, predictions are slightly more accurate with the "full" model than with the specific model estimated with the sample at hand. The opposite holds when using the BG model, but the difference is very small. A striking result is that MNSE is significantly closer to 1 for the BG than for the Exp model. In other words, the exponential model largely underestimates the magnitude of the kriging error, whereas the BG model has an MNSE score quite close to 1. In addition, COR is always smaller with the BG than with the Exp model. To summarize, the (regularized) BG model provides better scores than the exponential model for all validation metrics considered. A particularly interesting feature of the BG model that enables efficient computation of predictions is the degree of sparsity of the precision matrix. An indicator of sparsity is the proportion of precision matrix entries with an absolute value larger than 10⁻⁸, denoted by p_s. The index p_s is listed in the third column of Table 2. Its values show that the precision matrix is always sparser for the BG than for the Exp model. This is a noteworthy finding, since the exponential model possesses a Markov property on the continuous plane [37] and is known to show very sparse precision matrices in practice.
All of the above results of cross-validation analysis provide strong evidence in favor of the regularized Boltzmann-Gibbs model, at least for this dataset.

Conclusions
The work presented in this paper was motivated by the strong interest in statistical methodology geared towards spatial models based on physically inspired stochastic differential equations, e.g., in the developments and applications originating in the seminal work of [37], and going further in [11], as well as connections between statistical field theory and spatial models [24,28,25]. Starting from the specification of a Boltzmann-Gibbs model that directly and intuitively controls spatial dependence in terms of an energy functional that contains contributions from squares of the fluctuations and their gradients, we have characterized the properties of the resulting random fields in terms of the covariance and the spectrum. In cases where the variance is infinite, we achieve finite covariance structures, useful for simulation and statistical modeling and prediction, by means of regularization schemes in the direct and spectral domains. Specifically, the direct-space covariance regularization introduces a new approximation parameter, ε > 0, which can be estimated jointly with the other model parameters using techniques such as the pairwise difference likelihood. When ε moves from 0 to infinity, the corresponding model smoothly interpolates between the Boltzmann-Gibbs generalized covariance and the exponential covariance. In two dimensions both of the above can be considered as special cases of the widely used Matérn covariance with regularity parameters 0 and 0.5, respectively.
With respect to spectral regularization, instead of using a "hard" (abrupt) frequency cutoff in Eq. (3.4), it is possible to use a tapering function that provides a smooth decay of the spectral density, as proposed in [24]. This is equivalent to a convolution of the random field with a suitable kernel function in direct space; the latter would then act as a filter for the high-frequency fluctuations of the field. Such a choice is certainly appealing due to the smooth decay of the spectral density that it produces. On the other hand, the integrals for the regularized variance (3.5) and the variogram function (3.6) will become more complicated.
A grid approximation of the Boltzmann-Gibbs field with appropriate boundary conditions can be viewed as another regularization technique and establishes a direct link to the popular Besag and Leroux lattice models used in spatial statistics. The grid approximation is characterized by sparse precision matrices, which are useful for gridded high-dimensional data thanks to their numerical benefits. In future work, we plan to develop similar sparse matrix approximations of the Boltzmann-Gibbs models for irregularly spaced locations.
Another promising research direction is the extension of the Boltzmann-Gibbs model to vector random fields, which would be applicable to multivariate continuum data as well as multi-spectral images. In addition, given the simple rational function expression of the Boltzmann-Gibbs spectral density, it is worthwhile to investigate spectral estimation methods for Boltzmann-Gibbs random fields, both for gridded but incomplete and irregularly spaced data. Such approaches based on Whittle's approximation [48] of the Gaussian log-likelihood have been explored in [16,21].
The Gibbs energy term (2.1) studied herein provides control over local gradients but not directly over the local curvature of the random field. Boltzmann-Gibbs models which include curvature have been developed in [24,28,25]. Extensions towards the inclusion of higher-order (than the gradient and curvature) terms allowing for smoother realizations of the Gaussian field, or towards terms defining non-Gaussian Boltzmann-Gibbs fields, are other promising areas of research.