Nearest-Neighbor Mixture Models for Non-Gaussian Spatial Processes

We develop a class of nearest-neighbor mixture models that provide direct, computationally efficient, probabilistic modeling for non-Gaussian geospatial data. The class is defined over a directed acyclic graph, which implies conditional independence assumptions that allow a multivariate distribution to be represented through a factorization into a product of univariate conditionals, and is extended to a full spatial process. We model each conditional as a mixture of spatially varying transition kernels, with locally adaptive weights, for each one of a given number of nearest neighbors. The modeling framework emphasizes the description of non-Gaussian dependence at the data level, in contrast with approaches that introduce a spatial process for transformed data, or for functionals of the data probability distribution. Thus, it facilitates efficient, full simulation-based inference. We study model construction and properties analytically through specification of the bivariate distributions that define the local transition kernels, providing a general strategy for modeling different types of non-Gaussian data. Regarding computation, the framework lays out a new approach to handling large spatial data sets, leveraging the mixture model structure to avoid computational issues that arise from large matrix operations. We illustrate the methodology with synthetic data examples and an analysis of Mediterranean Sea surface temperature observations.


1 Introduction
Gaussian processes have been widely used as an underlying structure in model-based analysis of irregularly located spatial data in order to capture short-range variability. The fruitfulness of these spatial models owes to the simple characterization of the Gaussian process by a mean and a covariance function, and the optimal prediction it provides that justifies kriging. However, the assumption of Gaussianity is restrictive in many fields where the data exhibit non-Gaussian features, for example, vegetation abundance (Eskelson et al., 2011), precipitation data (Sun et al., 2015), temperature data (North et al., 2011), and wind speed data (Bevilacqua et al., 2020). This article aims at developing a flexible class of geostatistical models that is customizable to general non-Gaussian distributions, with particular focus on continuous data.
Several approaches have been developed for non-Gaussian geostatistical modeling. A straightforward approach consists of fitting a Gaussian process after transformation of the original data. Possible transformations include Box-Cox (De Oliveira et al., 1997), power (Allcroft and Glasbey, 2003), square-root (Johns et al., 2003), and Tukey g-and-h (Xu and Genton, 2017) transforms, to name a few. An alternative approach is to represent a non-Gaussian distribution as a location-scale mixture of Gaussian distributions. This yields Gaussian process extensions that are able to capture skewness and long tails (Kim and Mallick, 2004; Palacios and Steel, 2006; Zhang and El-Shaarawi, 2010; Mahmoudian, 2017; Morris et al., 2017; Zareifard et al., 2018; Bevilacqua et al., 2021). Beyond methods based on continuous mixtures of Gaussian distributions, Bayesian nonparametric methods have been explored for geostatistical data modeling, starting with the approach in Gelfand et al. (2005), which extends the Dirichlet process (Ferguson, 1973) to a prior model for random spatial surfaces. We refer to Müller et al. (2018) for a review. From a different perspective, Bolin (2014) formulates stochastic partial differential equations driven by non-Gaussian noise, resulting in a class of non-Gaussian Matérn fields.
An alternative popular approach involves a hierarchical model structure that assumes conditionally independent non-Gaussian marginals, combined with a latent spatial process that is associated with some functional or link function of the first-stage marginals. Hereafter, we refer to these models as hierarchical first-stage non-Gaussian models. If the latent process is linked through a function of some parameter(s) of the first-stage marginal which belongs to the exponential dispersion family, the approach is known as the spatial generalized linear mixed model and its extensions (Diggle et al., 1998). Non-Gaussian spatial models that build from copulas (Joe, 2014) can also be classified into this category. Copula models assume pre-specified families of marginals for observations, with a multivariate distribution underlying the copula for a vector of latent variables that are probability integral transformations of the observations (Danaher and Smith, 2011). Spatial copula models replace the multivariate distribution with one that corresponds to a spatial process, thus introducing spatial dependence (Bárdossy, 2006; Ghosh and Mallick, 2011; Krupskii et al., 2018; Beck et al., 2020).
The non-Gaussian modeling framework proposed in this article is distinctly different from the previously mentioned approaches. Our methodology builds on the class of nearest-neighbor processes obtained by extending a joint density for a reference set of locations to the entire spatial domain. The original joint density is factorized into a product of conditionals with respect to a sparse directed acyclic graph (DAG). Deriving each conditional from a Gaussian process results in the nearest-neighbor Gaussian process (Datta et al., 2016a). Models defined over DAGs have received substantial attention; see, e.g., Datta et al. (2016b), Finley et al. (2019), Peruzzi et al. (2020), and Peruzzi and Dunson (2022). The class of DAG-based models originates from the Vecchia approximation framework (Vecchia, 1988); Katzfuss and Guinness (2021) provide further generalization. Considerably less attention, however, has been devoted to defining models over a sparse DAG with non-Gaussian distributions for the conditionals of the joint density. This is in general a difficult problem, as each conditional involves, say, a p-dimensional conditioning set, which requires a coherent model for a (p + 1)-dimensional non-Gaussian distribution, with p potentially large. In this article, we take on the challenging task of developing a computationally efficient, interpretable framework that provides generality for modeling different types of non-Gaussian data and flexibility for complex spatial dependence.
We overcome the aforementioned challenge by modeling each conditional of the joint density as a weighted combination of spatially varying transition kernels, each of which depends on a specific neighbor. This approach produces multivariate non-Gaussian distributions by specification of the bivariate distributions that define the local transition kernels. Thus, it provides generality for modeling different non-Gaussian behaviors, since, relative to the multivariate analogue, constructing bivariate distributions is substantially easier, for instance, using bivariate copulas. Moreover, such a model structure offers the convenience of quantifying multivariate dependence through the collection of bivariate distributions. As an illustration, we study tail dependence properties under appropriate families of bivariate distributions, and provide results that guide modeling choices. The modeling framework achieves flexibility by letting both the weights and the transition kernels be spatially dependent, inducing sufficient local dependence to describe a wide range of spatial variability. We refer to the resulting geospatial process as the nearest-neighbor mixture process (NNMP).
An important feature of the model structure is that it facilitates the study of conditions for constructing NNMPs with pre-specified families of marginal distributions. Such conditions are easily implemented without parameter constraints, thus resulting in a general modeling tool to describe spatial data distributions that are skewed, heavy-tailed, positive-valued, or have bounded support, as illustrated through several examples in Section 4 and in the supplementary material. The NNMP framework emphasizes direct modeling by introducing spatial dependence at the data level. It avoids the use of transformations that may distort the Gaussian process properties (Wallin and Bolin, 2015). It is fundamentally different from the class of hierarchical first-stage non-Gaussian models that introduce spatial dependence through functionals of the data probability distribution, such as the transformed mean. Regarding computation, NNMP models do not require estimation of potentially large vectors of spatially correlated latent variables, something unavoidable with hierarchical first-stage non-Gaussian models. In fact, approaches for such models typically resort to approximate inference, either directly or combined with a scalable model (Zilber and Katzfuss, 2021). Estimation of NNMPs is analogous to that of a finite mixture model, thus avoiding the need to perform costly matrix operations for large data sets, and allowing for computationally efficient, full simulation-based inference. Overall, the NNMP framework offers a flexible class of models that is able to describe complex spatial dependence, coupled with an efficient computational approach, leveraged from the mixture structure of the model.
The rest of the article is organized as follows. In Section 2, we formulate the NNMP framework and study model properties; specific examples of NNMP models illustrate different components of the methodology. Section 3 develops the general approach to Bayesian estimation and prediction under NNMP models. In Section 4, we demonstrate different NNMP models with a synthetic data example and with an analysis of Mediterranean Sea surface temperature data. Finally, Section 5 concludes with a summary and discussion of future work.
2 Nearest-Neighbor Mixture Processes for Spatial Data

2.1 Modeling Framework
Consider a univariate spatial process {Z(v) : v ∈ D}, where D ⊂ R^p, for p ≥ 1. Let S = {s_1, ..., s_n} be a finite collection of locations in D, referred to as the reference set. We write the joint density of the random vector z_S = (Z(s_1), ..., Z(s_n))' as p(z_S) = p(z(s_1)) \prod_{i=2}^{n} p(z(s_i) | z(s_{i-1}), ..., z(s_1)). If we regard the conditioning set of z(s_i) as the parents of z(s_i), the joint density p(z_S) is a factorization according to a DAG whose vertices are the z(s_i). We obtain a sparse DAG by reducing the conditioning set of z(s_i) to a smaller subset, denoted as z_{Ne(s_i)}, with Ne(s_i) ⊂ S_i = {s_1, ..., s_{i-1}}. We refer to Ne(s_i) as the neighbor set for s_i, having at most L elements with L ≪ n. The resulting density for the sparse DAG is

p̃(z_S) = p(z(s_1)) \prod_{i=2}^{n} p(z(s_i) | z_{Ne(s_i)}),    (1)

which is a proper density (Lauritzen, 1996). Choosing the neighbor sets Ne(s_i) creates different sparse DAGs. There are different ways to select members from S_i for Ne(s_i); see, e.g., Vecchia (1988), Stein et al. (2004), and Gramacy and Apley (2015). Our selection is based on the geostatistical distance between s_i and s_j ∈ S_i. The selected locations s_j are placed in ascending order according to the distance, denoted as s_{(i1)}, ..., s_{(i,i_L)}, where i_L := (i - 1) ∧ L. We note that the development of the proposed framework holds true for any choice of the neighbor sets. Constructing a nearest-neighbor process involves specification of the conditional densities p(z(s_i) | z_{Ne(s_i)}) in (1), and extension to an arbitrary finite set in D that does not overlap with S. We approach the problem of constructing a nearest-neighbor non-Gaussian process following this idea. A notable difference, however, from the nearest-neighbor Gaussian process approach is that we do not posit a parent process when deriving p̃(z_S). Datta et al. (2016a) assume a Gaussian process for p(z_S), and use it to model the conditional densities p(z(s_i) | z_{Ne(s_i)}). Similar ideas underlie the Vecchia approximation framework, which considers (1) as an approximation for the density of a Gaussian process realization. We instead utilize the structure in (1) to develop nearest-neighbor models for non-Gaussian spatial processes. To this end, we define the conditional density in the product in (1) as

p(z(s_i) | z_{Ne(s_i)}) = \sum_{l=1}^{i_L} w_l(s_i) f_{s_i,l}(z(s_i) | z(s_{(il)})),    (2)

where f_{s_i,l} is the lth component of the mixture density, and the weights satisfy w_l(s_i) ≥ 0, for all l, and \sum_{l=1}^{i_L} w_l(s_i) = 1, for every s_i ∈ S. In a sparse DAG, the nearest neighbors in the set Ne(s_i) are nodes that have directed edges pointing to s_i. Thus, it is appealing to consider a high-order Markov model in which temporal lags have a similar notion of direction. Our approach to formulating (2) is motivated by a class of mixture transition distribution models (Le et al., 1996), which consist of a mixture of first-order transition densities with a vector of common weights. A key feature of the formulation in (2) is the decomposition of a non-Gaussian conditional density, with a potentially large conditioning set, into a weighted sum of local conditional densities. This provides flexible, parsimonious modeling of p(z(s_i) | z_{Ne(s_i)}) through specification of the bivariate distributions that define the local conditionals f_{s_i,l}(z(s_i) | z(s_{(il)})). We provide further discussion of this feature, for model construction and relevant properties, in the following sections.
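As a concrete illustration of the neighbor-set construction described above, the following is a minimal sketch (a hypothetical helper, not code from the paper) that, for each location in an already-ordered reference set, selects the i_L = min(i - 1, L) previously ordered locations closest in Euclidean distance:

```python
import math

# Hypothetical helper (not from the paper): for each location s_i in an
# ordered reference set, build the neighbor set Ne(s_i) consisting of the
# min(i - 1, L) previously ordered locations closest to s_i.
def neighbor_sets(locations, L):
    """locations: list of (x, y) tuples, already ordered; returns a list
    whose ith entry holds the parent indices of s_i, sorted by ascending
    Euclidean distance to s_i (ties broken by index)."""
    sets = [[]]  # s_1 has no parents in the DAG
    for i in range(1, len(locations)):
        xi, yi = locations[i]
        dists = [(math.hypot(xi - x, yi - y), j)
                 for j, (x, y) in enumerate(locations[:i])]
        dists.sort()
        sets.append([j for _, j in dists[:min(i, L)]])
    return sets
```

For example, with four locations and L = 2, the last location's neighbor set contains the indices of its two nearest predecessors.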
Spatial dependence characterized by (2) is twofold. First, each component f_{s_i,l} is associated with spatially varying parameters indexed at s_i ∈ S, defined by a probability model or a link function. Second, the weights w_l(s_i) are spatially varying. As each component density f_{s_i,l} depends on a specific neighbor, the weights indicate the contribution of each neighbor of s_i. Moreover, the weights adapt to the change of locations. For two different s_i, s_j in S, the relative locations of the nearest neighbors Ne(s_i) to s_i differ from those of Ne(s_j) to s_j. If all elements of Ne(s_i) are very close to s_i, then the values of (w_1(s_i), ..., w_{i_L}(s_i)) should be quite even. On the other hand, if, for s_j, only a subset of its neighbors in Ne(s_j) are close to s_j, then the weights corresponding to this subset should receive larger values. We remark that, in general, probability models or link functions for the spatially varying parameters should be considered case by case, given the different specifications of the components f_{s_i,l}. Details of the construction of the component densities and the weights are deferred to later sections.
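The paper defers the weight construction to later sections; purely as an illustration of the qualitative behavior described above (even weights for equidistant neighbors, larger weights for closer ones), one made-up choice is an exponential decay in distance, normalized to sum to one. The bandwidth h below is a hypothetical tuning parameter, not part of the paper's specification:

```python
import math

# Illustrative only: a made-up, distance-decaying weight construction that
# mimics the adaptivity described in the text. Closer neighbors receive
# larger weights; equidistant neighbors receive equal weights.
def adaptive_weights(si, neighbors, h=1.0):
    """si: (x, y); neighbors: list of (x, y); h: hypothetical bandwidth."""
    raw = [math.exp(-math.hypot(si[0] - x, si[1] - y) / h)
           for (x, y) in neighbors]
    total = sum(raw)
    return [r / total for r in raw]
```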
We obtain the NNMP, a legitimate spatial process, by extending (2) to an arbitrary set of non-reference locations U = {u_1, ..., u_r}, where U ⊂ D \ S. In particular, we define the conditional density of z_U given z_S as

p(z_U | z_S) = \prod_{i=1}^{r} \sum_{l=1}^{L} w_l(u_i) f_{u_i,l}(z(u_i) | z(u_{(il)})),    (3)

where the specification of w_l(u_i) and f_{u_i,l}, for all i and all l, is analogous to that for (2), except that Ne(u_i) = {u_{(i1)}, ..., u_{(iL)}} are the first L locations in S that are closest to u_i in terms of geostatistical distance. Building the construction of the neighbor sets Ne(u_i) on the reference set ensures that p(z_U | z_S) is a proper density. Given (2) and (3), we can obtain the joint density p(z_V) of a realization z_V over any finite set of locations V ⊂ D. When V ⊂ S, the joint density p(z_V) is directly available as the appropriate marginal of p̃(z_S). Otherwise, letting U = V \ S, we have p(z_V) = \int p(z_U | z_S) p̃(z_S) \prod_{s_i ∈ S \setminus V} dz(s_i). In general, the joint density p(z_V) of an NNMP is intractable. However, since both p(z_U | z_S) and p̃(z_S) are products of mixtures, we can recognize that p(z_V) is a finite mixture, which suggests flexibility of the model to capture complex non-Gaussian dependence over the domain D. Moreover, we show in Section 2.3 that for some NNMPs the joint density p(z_V) has a closed-form expression. In the subsequent development of model properties, we will use the conditional density

p(z(v) | z_{Ne(v)}) = \sum_{l=1}^{L} w_l(v) f_{v,l}(z(v) | z(v_{(l)}))    (4)

to characterize an NNMP, where Ne(v) contains the first L locations that are closest to v, selected from the locations in S. These locations in Ne(v) are placed in ascending order according to distance, denoted as v_{(1)}, ..., v_{(L)}.

Before closing this section, we note that spatial locations are not naturally ordered. Given a distance function, a different ordering of the locations results in different neighbor sets. Therefore, a different sparse DAG with density p̃(z_S) is created accordingly for model inference. For the NNMP models illustrated in the data examples, we found through simulation experiments that there were no discernible differences between the inferences based on p̃(z_S) under two different orderings. This observation is coherent with the literature on nearest-neighbor likelihood approximations. Since the approximation of p̃(z_S) to p(z_S) depends on the information borrowed from the neighbors, as outlined in Datta et al. (2016a), the effectiveness is determined by the size of Ne(s_i) rather than the ordering. A further remark is that the ordering of the reference set S is typically reserved for observed data. Thus, the ordering effect lies only in model estimation based on (2) with realization z_S. Spatial prediction typically rests on locations outside S using (3), where the ordering effect disappears.

2.2 NNMPs with Stationary Marginal Distributions
We develop a sufficient condition to construct NNMPs with general stationary marginal distributions. The key feature of this result is that the condition relies on the bivariate distributions that define the first-order transition kernels in (4), without the need to impose restrictions on the parameter space. The supplementary material includes the proof of Proposition 1, as well as those of Propositions 2, 3 and 4 and of Corollary 1, formulated later in Sections 2.3 and 2.4.
Proposition 1. Consider an NNMP for which the component density f_{v,l} is specified by the conditional density of U_{v,l} given V_{v,l}, where the random vector (U_{v,l}, V_{v,l}) follows a bivariate distribution with marginal densities f_{U_{v,l}} and f_{V_{v,l}}, for l = 1, ..., L. The NNMP has stationary marginal density f_Z if it satisfies the invariant condition: Z(s_1) ∼ f_Z, s_1 ∈ S, and, for every v ∈ D, f_{U_{v,l}}(z) = f_{V_{v,l}}(z) = f_Z(z), for all z and for all l. This result builds on the one in Zheng et al. (2022), where mixture transition distribution models with stationary marginal distributions were constructed. It applies regardless of whether Z(v) is a continuous, discrete or mixed random variable, thus allowing for a wide range of non-Gaussian marginal distributions and a general functional form, either linear or non-linear, for the expectation with respect to the conditional density in (4).
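The invariant condition can be checked numerically for a given transition kernel. The sketch below verifies it for the Gaussian kernel f(z | x) = N(z | ρx, 1 - ρ²) with an N(0, 1) marginal: integrating the kernel against the marginal should recover the N(0, 1) density (a self-contained check using the trapezoidal rule; the specific values of z and ρ are arbitrary):

```python
import math

def norm_pdf(z, mu=0.0, sd=1.0):
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Numerical check of the invariant condition for the Gaussian transition
# kernel f(z | x) = N(z | rho*x, 1 - rho^2): integrating the kernel against
# the N(0, 1) marginal over x should return the N(0, 1) density at z.
def integrated_density(z, rho, half_width=8.0, n=4000):
    h = 2 * half_width / n
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * h
        wgt = 0.5 if k in (0, n) else 1.0  # trapezoidal rule
        total += wgt * norm_pdf(z, mu=rho * x, sd=math.sqrt(1 - rho ** 2)) * norm_pdf(x)
    return total * h
```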
As previously discussed, the mixture model formulation for the conditional density in (4) induces a finite mixture for the NNMP finite-dimensional distributions. On the other hand, due to the mixture form, an explicit expression for the covariance function is difficult to derive. A recursive equation can be obtained for the class of NNMP models for which the conditional expectation with respect to f_{v,l} is linear, that is, E(Z(v) | z(v_{(l)})) = a_l(v) + b_l(v) z(v_{(l)}), for all l and for all v ∈ D. Suppose the NNMP has a stationary marginal distribution with finite first and second moments. Without loss of generality, we assume the first moment is zero. Then the covariance over any two locations s_i, s_j ∈ S satisfies

Cov(Z(s_i), Z(s_j)) = \sum_{l=1}^{i_L} w_l(s_i) b_l(s_i) Cov(Z(s_{(il)}), Z(s_j)),    (5)

where, without loss of generality, we assume i > j. The covariance in (5) implies that, even though the NNMP has a stationary marginal distribution, it is second-order non-stationary.
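For a zero-mean NNMP whose component conditional expectations are linear through the neighbor value, the covariances can be filled in recursively along the DAG ordering. A minimal sketch (the neighbor indices, weights, and slopes in the test are made-up inputs):

```python
# Sketch of the covariance recursion for a zero-mean NNMP with linear
# component conditional expectations E[Z(s_i) | z(s_(il))] = b_l(s_i) z(s_(il)).
# All inputs (neighbor indices, weights w_l(s_i), slopes b_l(s_i)) are
# supplied by the caller; a stationary marginal variance `var` is assumed.
def nnmp_covariances(neighbors, weights, slopes, var=1.0):
    """neighbors[i]: parent indices of s_i (indices < i); weights[i] and
    slopes[i]: the corresponding w_l(s_i) and b_l(s_i). Returns the matrix
    C with C[i][j] = Cov(Z(s_i), Z(s_j)), computed recursively."""
    n = len(neighbors)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        C[i][i] = var
        for j in range(i):  # Cov(Z(s_i), Z(s_j)) for j < i
            C[i][j] = C[j][i] = sum(
                w * b * C[k][j]
                for k, w, b in zip(neighbors[i], weights[i], slopes[i]))
    return C
```

For a simple chain s_1 → s_2 → s_3 with slope 0.5 at each step, the recursion gives Cov(Z(s_2), Z(s_1)) = 0.5 and Cov(Z(s_3), Z(s_1)) = 0.25.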

2.3 Construction of NNMP Models
The spatially varying conditional densities f_{v,l} in (4) correspond to a sequence of bivariate distributions indexed at v, namely, the distributions of (U_{v,l}, V_{v,l}), for l = 1, ..., L. To balance model flexibility and scalability, we build spatially varying distributions by considering the distribution of a random vector (U_l, V_l), for l = 1, ..., L, and extending some of its parameters to be spatially varying, that is, indexed in v. To this end, we use a probability model or a link function. We refer to the random vectors (U_l, V_l) as the set of base random vectors. With a careful choice of the model/function for the spatially varying parameter(s), this construction method reduces significantly the dimension of the parameter space, while preserving the capability of the NNMP model structure to capture spatial dependence.
We illustrate the method with several examples below, starting with a bivariate Gaussian distribution and its continuous mixtures for real-valued data, followed by a general strategy using bivariate copulas that can model data with general support. Before proceeding to the examples, we emphasize that our method allows for general bivariate distributions. One can also consider using a pair of compatible conditionals to specify bivariate distributions (Arnold et al., 1999), for instance, a pair of Lomax conditionals. This is illustrated in Example 4 in Section 2.4.
Example 1. Gaussian and continuous mixture of Gaussian NNMP models.
For l = 1, ..., L, take (U_l, V_l) to be a bivariate Gaussian random vector with mean μ_l 1_2 and covariance matrix Σ_l = σ_l^2 \begin{pmatrix} 1 & ρ_l \\ ρ_l & 1 \end{pmatrix}, where 1_2 is the two-dimensional column vector of ones, resulting in the Gaussian conditional density f_l(u | v) = N(u | μ_l + ρ_l (v - μ_l), σ_l^2 (1 - ρ_l^2)). If we extend the correlation parameter to be spatially varying, ρ_l(v) = k_l(v, v_{(l)}), for a correlation function k_l, we obtain the spatially varying conditional density

p(z(v) | z_{Ne(v)}) = \sum_{l=1}^{L} w_l(v) N(z(v) | μ_l + ρ_l(v)(z(v_{(l)}) - μ_l), σ_l^2 (1 - ρ_l(v)^2)).

This NNMP is referred to as the Gaussian NNMP. If we take Z(s_1) ∼ N(z | μ, σ^2), and set μ_l = μ and σ_l^2 = σ^2, for all l, the resulting model satisfies the invariant condition of Proposition 1, with stationary marginal given by the N(μ, σ^2) distribution. The finite-dimensional distribution of the Gaussian NNMP is characterized by the following proposition.
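Realizations from the stationary Gaussian NNMP (μ = 0, σ² = 1) can be drawn by ancestral sampling along the DAG ordering: for each location, pick one neighbor according to the weights, then draw from that neighbor's transition kernel. A minimal sketch; the neighbor indices, weights, and correlations used in the test are made-up inputs:

```python
import math, random

# Sketch: ancestral simulation of a stationary Gaussian NNMP with mu = 0
# and sigma^2 = 1. Each location picks one neighbor with probabilities
# w_l(s_i), then draws from N(rho_l(s_i) * z(s_(il)), 1 - rho_l(s_i)^2).
def simulate_gaussian_nnmp(neighbors, weights, rhos, rng=random):
    z = []
    for nbrs, ws, rs in zip(neighbors, weights, rhos):
        if not nbrs:
            z.append(rng.gauss(0.0, 1.0))  # Z(s_1) ~ N(0, 1)
            continue
        l = rng.choices(range(len(nbrs)), weights=ws)[0]
        mean = rs[l] * z[nbrs[l]]
        z.append(rng.gauss(mean, math.sqrt(1.0 - rs[l] ** 2)))
    return z
```

Since every transition kernel leaves N(0, 1) invariant, each Z(s_i) is marginally standard Gaussian, which can be checked empirically.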
We refer to the model in Proposition 2 as the stationary Gaussian NNMP. Based on the Gaussian NNMP, various NNMP models with different families for (U_l, V_l) can be constructed by exploiting location-scale mixtures of Gaussian distributions. We illustrate the approach with the skew-Gaussian NNMP model. Denote by TN(μ, σ^2; a, b) the Gaussian distribution with mean μ and variance σ^2, truncated to the interval (a, b). Building from the Gaussian NNMP, we start with a conditional bivariate Gaussian distribution for (U_l, V_l), given z_0 ∼ TN(0, 1; 0, ∞), where μ_l is replaced with μ_l + λ_l z_0. Marginalizing out z_0 yields a bivariate skew-Gaussian distribution for (U_l, V_l) (Azzalini, 2013). Extending again ρ_l to ρ_l(v), for all l, the conditional density p(z(v) | z_{Ne(v)}) for the skew-Gaussian NNMP model is obtained as a weighted sum of the component conditional densities implied by the bivariate skew-Gaussian distributions. Setting λ_l = λ, μ_l = μ, and σ_l^2 = σ^2, for all l, we obtain the stationary skew-Gaussian NNMP model, with skew-Gaussian marginal density f_Z(z) = 2 N(z | μ, σ^2 + λ^2) Φ(λ(z - μ)/(σ(σ^2 + λ^2)^{1/2})). The skew-Gaussian NNMP model is an example of a location mixture of Gaussian distributions. Scale mixtures can also be considered to obtain, for example, the Student-t model. In that case, we replace the covariance matrix Σ_l with cΣ_l, taking c as a random variable with an appropriate inverse-gamma distribution. Important families that admit a location and/or scale mixture of Gaussians representation include the skew-t, Laplace, and asymmetric Laplace distributions. Using an approach similar to the one for the skew-Gaussian NNMP example, we can construct the corresponding NNMP models.
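The location-mixture construction behind the skew-Gaussian marginal can be sketched directly: conditional on z_0 ∼ TN(0, 1; 0, ∞), draw from N(μ + λz_0, σ²); marginalizing z_0 yields a skew-Gaussian variate. The parameter values in the check are made up; the known mean E[Z] = μ + λ√(2/π) of this construction serves as a sanity check:

```python
import math, random

# Sketch of the location mixture that produces a skew-Gaussian variate:
# z0 ~ TN(0, 1; 0, inf) (realized as |N(0, 1)|), then Z | z0 ~ N(mu + lam*z0, sigma^2).
def sample_skew_gaussian(mu, lam, sigma, rng=random):
    z0 = abs(rng.gauss(0.0, 1.0))  # |N(0,1)| has the TN(0, 1; 0, inf) law
    return rng.gauss(mu + lam * z0, sigma)
```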
A copula function C : [0, 1]^p → [0, 1] is a function such that, for any multivariate distribution F(z_1, ..., z_p), there exists a copula C for which F(z_1, ..., z_p) = C(F_1(z_1), ..., F_p(z_p)), where F_j is the marginal distribution function of Z_j, j = 1, ..., p (Sklar, 1959). If F_j is continuous for all j, C is unique. A copula enables us to separate the modeling of the marginal distributions from the dependence. Thus, the invariant condition in Proposition 1 can be attained by specifying the stationary distribution F_Z as the marginal distribution of (U_l, V_l) for all l. The copula parameter that determines the dependence of (U_l, V_l) can be modeled as spatially varying to create a sequence of spatially dependent bivariate vectors (U_{v,l}, V_{v,l}). Here, we focus on continuous distributions, although this strategy can be applied to any family of distributions for F_Z. We consider bivariate copulas with a single copula parameter, and illustrate next the construction of a copula NNMP given a stationary marginal density f_Z.
Example 2. Copula NNMP models.
For the bivariate distribution of each (U_l, V_l) with marginals f_{U_l} and f_{V_l}, we consider a copula C_l with parameter η_l, for l = 1, ..., L. We obtain a spatially varying copula C_{v,l} by extending η_l to η_l(v), resulting in the component density f_{v,l}(z(v) | z(v_{(l)})) = c_{v,l}(F_{U_{v,l}}(z(v)), F_{V_{v,l}}(z(v_{(l)}))) f_{U_{v,l}}(z(v)), where c_{v,l} is the copula density of C_{v,l}, and f_{U_{v,l}} = f_{U_l} and f_{V_{v,l}} = f_{V_l} are the marginal densities of U_{v,l} and V_{v,l}, respectively. Given a pre-specified stationary marginal f_Z, we replace both f_{U_{v,l}} and f_{V_{v,l}} with f_Z, for every v and for all l. We then obtain the conditional density that characterizes the stationary copula NNMP. Under the copula framework, one strategy to specify the spatially varying parameter is through the Kendall's τ coefficient. Kendall's τ, taking values in [-1, 1], is a bivariate concordance measure with properties useful for non-Gaussian modeling. In particular, its existence does not require a finite second moment, and it is invariant under strictly increasing transformations.
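The component density f(u | v) = c(F(u), F(v)) f(u) can be sketched concretely with a bivariate Gaussian copula. For simplicity the sketch uses Exp(1) marginals as a stand-in for the gamma and beta marginals mentioned in the paper's examples; being a conditional density, it should integrate to one in u for any fixed v, which the check verifies numerically:

```python
import math
from statistics import NormalDist

ND = NormalDist()

# Density c(a, b) of the bivariate Gaussian copula with correlation rho,
# obtained as phi2(x, y; rho) / (phi(x) phi(y)) with x = Phi^{-1}(a), y = Phi^{-1}(b).
def gaussian_copula_density(a, b, rho):
    x, y = ND.inv_cdf(a), ND.inv_cdf(b)
    q = (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / math.sqrt(1.0 - rho * rho)

# Copula-built component density f(u | v) = c(F(u), F(v)) f(u), here with
# Exp(1) marginals standing in for the paper's gamma/beta choices.
def component_conditional(u, v, rho):
    F = 1.0 - math.exp(-u)  # Exp(1) cdf at u
    G = 1.0 - math.exp(-v)  # Exp(1) cdf at v
    return gaussian_copula_density(F, G, rho) * math.exp(-u)  # times Exp(1) pdf
```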
In particular, for copula families in which there is a one-to-one correspondence between Kendall's τ and the copula parameter, we set τ_l(v) = k_l(v, v_{(l)}) for a kernel function k_l, and map it to the copula parameter η_l(v) ∈ H_l, where H_l is the parameter space associated with C_l. The kernel k_l should be specified with caution; k_l must satisfy the axioms in the definition of a bivariate concordance measure (Joe 2014, Section 2.12). We illustrate the strategy with the following example.
Example 3. The bivariate Gumbel copula is an asymmetric copula, useful for modeling dependence when the marginals are positive and heavy-tailed. The Gumbel copula with parameter η ≥ 1 is C(u_1, u_2; η) = exp{-[(-log u_1)^η + (-log u_2)^η]^{1/η}}, with Kendall's τ = 1 - 1/η. The spatially varying Gumbel copula can thus be defined as C_{v,l}(u_1, u_2) = C(u_1, u_2; η_l(v)), with η_l(v) = {1 - k_l(v, v_{(l)})}^{-1} for a kernel k_l taking values in [0, 1). After we define a spatially varying copula, we obtain a family of copula NNMPs by choosing a desired family of marginal distributions. Section 4.1 illustrates Gaussian and Gumbel copula NNMP models with gamma marginals, and the supplementary material provides an additional example of a Gaussian copula NNMP with beta marginals.
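The Gumbel pieces above can be sketched directly: the copula formula, the τ-to-η map, and a finite-q approximation of the upper tail dependence coefficient 2 - 2^{1/η} (discussed further in Section 2.4), computed from P(U_1 > q | U_2 > q) = (1 - 2q + C(q, q))/(1 - q) near q = 1:

```python
import math

# Gumbel copula C(u1, u2; eta) = exp(-[(-log u1)^eta + (-log u2)^eta]^(1/eta)),
# eta >= 1. Kendall's tau = 1 - 1/eta, so a kernel value tau in [0, 1)
# maps to eta = 1/(1 - tau).
def gumbel_copula(u1, u2, eta):
    s = (-math.log(u1)) ** eta + (-math.log(u2)) ** eta
    return math.exp(-s ** (1.0 / eta))

def eta_from_tau(tau):
    return 1.0 / (1.0 - tau)

def upper_tail_coef_approx(eta, q=0.9999):
    # finite-q approximation of P(U1 > q | U2 > q), which tends to 2 - 2^(1/eta)
    return (1.0 - 2.0 * q + gumbel_copula(q, q, eta)) / (1.0 - q)
```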
Copula NNMP models offer avenues to capture complex dependence using general bivariate copulas. Traditional spatial copula models specify the finite-dimensional distributions of the underlying spatial process with a multivariate copula. However, multivariate copulas need to be used with careful consideration in a spatial setting. For example, it is common to assume that spatial processes exhibit stronger dependence at smaller distances. Thus, copulas such as the multivariate Archimedean copula, which induce an exchangeable dependence structure, are inappropriate. Though spatial vine copula models (Gräler, 2014) can resolve this restriction, their model structure and computation are substantially more complicated than those of copula NNMP models.

2.4 Mixture Component Specification and Tail Dependence
A benefit of building NNMPs from a set of base random vectors is that the specification of the multivariate dependence of Z(v), given its neighbors, is determined mainly by that of the base random vectors. In this section, we illustrate this attractive property of the model by establishing lower bounds for two measures used to assess the strength of tail dependence.
The main assumption is that the base random vector (U_l, V_l) has stochastically increasing positive dependence. U_l is said to be stochastically increasing in V_l if P(U_l > u | V_l = v) ≤ P(U_l > u | V_l = v') for all u, whenever v ≤ v'. The definition can be extended to a multivariate random vector (Z_1, ..., Z_p): Z_1 is said to be stochastically increasing in (Z_2, ..., Z_p) if P(Z_1 > z_1 | Z_2 = z_2, ..., Z_p = z_p) ≤ P(Z_1 > z_1 | Z_2 = z_2', ..., Z_p = z_p') for all (z_2, ..., z_p) and (z_2', ..., z_p') in the support of (Z_2, ..., Z_p), where z_j ≤ z_j', for j = 2, ..., p. The conditional density in (4) implies that, if each (U_{v,l}, V_{v,l}) is built from the vector (U_l, V_l), then the set of base random vectors determines the stochastically increasing positive dependence of Z(v) given its neighbors.
For a bivariate random vector (U_l, V_l), the upper and lower tail dependence coefficients, denoted λ_{H,l} and λ_{L,l}, respectively, are λ_{H,l} = lim_{q→1^-} P(U_l > F_{U_l}^{-1}(q) | V_l > F_{V_l}^{-1}(q)) and λ_{L,l} = lim_{q→0^+} P(U_l ≤ F_{U_l}^{-1}(q) | V_l ≤ F_{V_l}^{-1}(q)). When λ_{H,l} > 0, we say that U_l and V_l have upper tail dependence. When λ_{H,l} = 0, U_l and V_l are said to be asymptotically independent in the upper tail. Lower tail dependence and asymptotic independence in the lower tail are defined similarly using λ_{L,l}. Let F_{Z(v)} be the marginal distribution function of Z(v). Analogously, we can define the upper and lower tail dependence coefficients λ_H(v) and λ_L(v) for Z(v) given its nearest neighbors, replacing (U_l, V_l) with Z(v) and its neighbors. The following proposition provides lower bounds for the tail dependence coefficients.
Proposition 3. Consider an NNMP for which the component density f_{v,l} is specified by the conditional density of U_{v,l} given V_{v,l}, where the random vector (U_{v,l}, V_{v,l}) follows a bivariate distribution with marginal distribution functions F_{U_{v,l}} and F_{V_{v,l}}, for l = 1, ..., L.
The spatial dependence of the random vector (U_{v,l}, V_{v,l}) is built from the base vector (U_l, V_l), which has a bivariate distribution such that U_l is stochastically increasing in V_l, for l = 1, ..., L. Then, for every v, the lower bound for the upper tail dependence coefficient is λ_H(v) ≥ \sum_{l=1}^{L} w_l(v) lim_{q→1^-} P(U_{v,l} > F_{U_{v,l}}^{-1}(q) | V_{v,l} = F_{V_{v,l}}^{-1}(q)), and the lower bound for the lower tail dependence coefficient is λ_L(v) ≥ \sum_{l=1}^{L} w_l(v) lim_{q→0^+} P(U_{v,l} ≤ F_{U_{v,l}}^{-1}(q) | V_{v,l} = F_{V_{v,l}}^{-1}(q)). Proposition 3 establishes that the lower and upper tail dependence coefficients are bounded below by a convex combination of, respectively, the limits of the conditional distribution functions and the conditional survival functions. These are fully determined by the dependence structure of the bivariate distributions for the (U_l, V_l). The result is best illustrated with an example.
Example 4. Consider a Lomax NNMP for which the bivariate distributions of the base random vectors correspond to a bivariate Lomax distribution (Arnold et al., 1999), whose marginals have density Lo(z | α, φ) = αφ(1 + φz)^{-(α+1)}, the Lomax density, a shifted version of the Pareto Type I density. A small value of α indicates a heavy tail. The component conditional survival function of the Lomax NNMP, expressed in terms of the quantile q, converges to 2^{-α_l(v)} as q → 1^-. Therefore, the lower bound for λ_H(v) is \sum_{l=1}^{L} w_l(v) 2^{-α_l(v)}. As α_l(v) → 0 for all l, the lower bound for λ_H(v) tends to one, and hence λ_H(v) tends to one, since λ_H(v) ≤ 1. As α_l(v) → ∞ for all l, the lower bound tends to zero.
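The 2^{-α} limit can be illustrated numerically. The sketch below is a heuristic check, assuming the standard bivariate Lomax joint survival function S(u, v) = (1 + φu + φv)^{-α} (Arnold et al., 1999) and using the simpler exceedance form P(U > z | V > z) = S(z, z)/S_V(z), which also tends to 2^{-α} as z → ∞:

```python
# Heuristic check of the 2^(-alpha) tail limit, assuming the standard
# bivariate Lomax joint survival S(u, v) = (1 + phi*u + phi*v)^(-alpha),
# with marginal survival S_V(z) = (1 + phi*z)^(-alpha).
def lomax_survival_ratio(z, alpha, phi):
    """P(U > z | V > z) = S(z, z) / S_V(z)."""
    return ((1.0 + 2.0 * phi * z) / (1.0 + phi * z)) ** (-alpha)
```

As z grows, the ratio approaches 2^{-α}: near one for small α (heavy tails), near zero for large α.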
Proposition 3 holds for the general framework. If the distribution of (U_l, V_l), with F_{U_l} = F_{V_l}, has first-order partial derivatives and exchangeable dependence, namely (U_l, V_l) and (V_l, U_l) have the same joint distribution, the lower bounds of the tail dependence coefficients depend on the component tail dependence coefficients. The result is summarized in the following corollary.
Corollary 1. Suppose that the base random vector (U_l, V_l) in Proposition 3 is exchangeable, and its bivariate distribution with marginals F_{U_l} = F_{V_l} has first-order partial derivatives, for all l. Then the upper and lower tail dependence coefficients λ_H(v) and λ_L(v) are bounded below by \sum_{l=1}^{L} w_l(v) λ_{H,l}(v)/2 and \sum_{l=1}^{L} w_l(v) λ_{L,l}(v)/2, where λ_{H,l}(v) and λ_{L,l}(v) are the tail dependence coefficients with respect to (U_{v,l}, V_{v,l}).
Under Corollary 1, if the bivariate distribution of (U_l, V_l) is symmetric, for instance, an elliptically symmetric distribution, the upper and lower tail dependence coefficients coincide and can simply be denoted λ(v). Then, we have that λ(v) ≥ \sum_{l=1}^{L} w_l(v) λ_l(v)/2, where λ_l(v) is the tail dependence coefficient with respect to (U_{v,l}, V_{v,l}).
Tail dependence can also be quantified using the boundary of the conditional distribution function, as proposed in Hua and Joe (2014) for a bivariate random vector. In particular, the upper tail dependence of (U_l, V_l) is said to have some strength if the conditional distribution function of U_l, evaluated at the upper boundary of the support of V_l, places positive mass at the upper endpoint of the support of U_l; the analogous condition at the lower boundaries indicates some strength of dependence in the lower tails. The functions F_{U_l|V_l}(F_{U_l}^{-1}(q) | F_{V_l}^{-1}(1)) and F_{U_l|V_l}(F_{U_l}^{-1}(q) | F_{V_l}^{-1}(0)), viewed as functions of q, are referred to as the boundary conditional distribution functions. We use the analogous functions, with (U_l, V_l) replaced by Z(v) and its neighbors, as the boundary conditional distribution functions for the NNMP model. The upper tail dependence is said to be (i) strongest if the boundary conditional distribution function equals 0 for 0 ≤ q < 1 and has a mass of 1 at q = 1; (ii) intermediate if it has positive mass at q = 1 without degenerating there; and (iii) weakest if it has no mass at q = 1. The strength of lower tail dependence is defined likewise using the boundary conditional distribution function at the lower endpoint. The following result gives lower bounds for the boundary conditional distribution functions.
Proposition 4. Consider an NNMP for which the component density f_{v,l} is specified by the conditional density of U_{v,l} given V_{v,l}. The spatial dependence of the random vector (U_{v,l}, V_{v,l}) is built from the base vector (U_l, V_l), which has a bivariate distribution such that U_l is stochastically increasing in V_l, for l = 1, …, L. Let λ_{L,l}(v) and λ_{H,l}(v) be the lower and upper tail dependence coefficients corresponding to (U_{v,l}, V_{v,l}). If for a given v there exists λ_{L,l}(v) > 0 for some l, then the boundary conditional distribution function indicates some strength of lower tail dependence. Similarly, if for a given v there exists λ_{H,l}(v) > 0 for some l, then the boundary conditional distribution function indicates some strength of upper tail dependence.

Proposition 4 complements Proposition 3 to assess strength of tail dependence. It readily applies for bivariate distributions, especially for copulas, which yield explicit expressions for the tail dependence coefficients. In particular, the spatially varying Gumbel copula C_{v,l} in Example 3 has upper tail dependence coefficient 2 − 2^{1/η_l(v)} > 0 for η_l(v) > 1, so the tail dependence of a Gumbel copula NNMP model has some strength if η_l(v) > 1 for some l. In fact, applying the result in Hua and Joe (2014), with a Gumbel copula, the boundary conditional distribution function degenerates at q = 1, implying strongest tail dependence.
3 Bayesian Hierarchical Model and Inference

Hierarchical Model Formulation
We introduce the general approach for NNMP Bayesian implementation, treating the observed spatial responses as an NNMP realization. The inferential framework can be easily extended to incorporate model components that may be needed in practical settings, such as covariates and additional error terms. We illustrate the extensions with the real data analysis in Section 4.2 and in the supplementary material, and provide further discussion in Section 5.
Our approach for inference is based on a likelihood conditional on the first L elements of the realization z_S = (z(s_1), …, z(s_n))' over the reference set S ⊂ D. Following a commonly used approach for fitting mixture models, we use data augmentation to facilitate inference. For each z(s_i), i = L + 1, …, n, we introduce a configuration variable ℓ_i, taking values in {1, …, L}, such that Pr(ℓ_i = l | w(s_i)) = w_l(s_i), where w(s_i) = (w_1(s_i), …, w_L(s_i))', and δ_l(ℓ_i) = 1 if ℓ_i = l and 0 otherwise. Conditional on the configuration variables and the vector (z(s_1), …, z(s_L))', the augmented model factorizes as the product over i = L + 1, …, n of the component densities f_{s_i, ℓ_i} and the weights w_{ℓ_i}(s_i), where θ collects the parameters of the densities f_{s_i,l}.
A key component of the Bayesian model formulation is the prior model for the weights. Weights are allowed to vary in space, adjusting to the neighbor structure of different reference locations. We describe the construction for weights corresponding to a point in the reference set; for non-reference points, weights are defined analogously. Consider a collection of spatially dependent distribution functions {G_{s_i} : s_i ∈ S} supported on (0, 1). For each s_i, the weights are defined as the increments of G_{s_i} with cutoff points r_{s_i,0}, …, r_{s_i,L}. More specifically, w_l(s_i) = G_{s_i}(r_{s_i,l}) − G_{s_i}(r_{s_i,l−1}) = ∫ 1_{(r_{s_i,l−1}, r_{s_i,l}]}(t) dG_{s_i}(t), where 1_A denotes the indicator function for set A. The cutoff points 0 = r_{s_i,0} < r_{s_i,1} < ⋯ < r_{s_i,L} = 1 are built from a kernel, with increments r_{s_i,l} − r_{s_i,l−1} proportional to k(s_i, s_{(il)}; ζ), where k : D × D → [0, 1] is a bounded kernel function with parameters ζ. The kernel and its associated parameters affect the smoothness of the resulting random field. We take G_{s_i} as a logit Gaussian distribution, denoted as G_{s_i}(· | µ(s_i), κ²), such that the corresponding Gaussian distribution has mean µ(s_i) and variance κ². The spatial dependence across the weights is introduced through the mean µ(s_i) = γ_0 + γ_1 s_{i1} + γ_2 s_{i2}, where s_i = (s_{i1}, s_{i2}). Given the cutoff points and κ², a small value of µ(s_i) favors large weights for the near neighbors of s_i. A simpler version of the model in (9) is obtained by letting G_{s_i} be the uniform distribution on (0, 1); the weights then reduce to the normalized kernel values k(s_i, s_{(il)}; ζ) / ∑_{j=1}^{L} k(s_i, s_{(ij)}; ζ). We notice that Cadonna et al. (2019) use a set of fixed, uniform cutoff points on [0, 1], i.e., r_{s_i,l} − r_{s_i,l−1} = 1/L, for spectral density estimation, with a collection of logit Gaussian distributions indexed by frequency.
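To make the construction concrete, the following is a minimal sketch (not the authors' code) of computing the weights w_l(s_i) at a single location. It assumes an exponential kernel k(s, s') = exp(−‖s − s'‖/ζ) and cutoff increments proportional to the kernel values; the function names and the specific kernel are illustrative assumptions.

```python
import math

def logit_gaussian_cdf(r, mu, kappa2):
    """CDF of a logit-Gaussian distribution on (0, 1): G(r) = Phi((logit(r) - mu)/kappa)."""
    z = (math.log(r / (1.0 - r)) - mu) / math.sqrt(kappa2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def nnmp_weights(s, neighbors, zeta, gamma, kappa2):
    """Weights w_l(s) = G_s(r_{s,l}) - G_s(r_{s,l-1}) with kernel-based cutoff points."""
    # kernel values for each of the L nearest neighbors (exponential kernel assumed)
    k = [math.exp(-math.dist(s, nb) / zeta) for nb in neighbors]
    total = sum(k)
    # cutoff points 0 = r_0 < r_1 < ... < r_L = 1, increments proportional to the kernel
    cuts = [0.0]
    acc = 0.0
    for kl in k:
        acc += kl / total
        cuts.append(min(acc, 1.0))
    cuts[-1] = 1.0
    # spatially varying logit-Gaussian mean: mu(s) = gamma_0 + gamma_1 s_1 + gamma_2 s_2
    mu = gamma[0] + gamma[1] * s[0] + gamma[2] * s[1]
    w = []
    for l in range(1, len(cuts)):
        lo = logit_gaussian_cdf(cuts[l - 1], mu, kappa2) if cuts[l - 1] > 0.0 else 0.0
        hi = logit_gaussian_cdf(cuts[l], mu, kappa2) if cuts[l] < 1.0 else 1.0
        w.append(hi - lo)
    return w
```

By construction the weights telescope to G(1) − G(0) = 1, so they form a valid probability vector for any kernel and any location.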
Finally, we note that an NNMP model requires selection of the neighborhood size L. This can be done using standard model comparison metrics, scoring rules, or information criteria; for example, Datta et al. (2016a) used root mean square predictive error and Guinness (2018) used Kullback–Leibler divergence. In general, a larger L increases computational cost. Datta et al. (2016a) conclude that a moderate value of L (≤ 20) typically suffices for nearest-neighbor Gaussian process models. Peruzzi et al. (2020) point out that a smaller L corresponds to a larger Kullback–Leibler divergence of the nearest-neighbor approximation from the full model density p(z_S), regardless of the distributional assumption of the density. Moreover, it is possible that information from the farthest neighbors is also important (Stein et al., 2004). Therefore, for large non-Gaussian data sets with complex dependence, one may seek a larger L to obtain a better approximation to the full model. Our model for the weights allows taking a relatively large neighbor set with less computational demand. We assign small probabilities a priori to distant neighbors. The contribution of each neighbor will be induced by the mixing, with important neighbors being assigned large weights a posteriori.

Estimation and Prediction
We implement a Markov chain Monte Carlo sampler to simulate from the posterior distribution of the model parameters. To allow for efficient simulation of the parameters γ and κ², we associate each y(s_i) with a latent Gaussian variable t_i with mean µ(s_i) and variance κ². There is a one-to-one correspondence between the configuration variables ℓ_i and the latent variables t_i: ℓ_i = l if and only if t_i ∈ (r*_{s_i,l−1}, r*_{s_i,l}), where r*_{s_i,l} = log(r_{s_i,l}/(1 − r_{s_i,l})), for l = 1, …, L. The posterior distribution of the model parameters, based on the new augmented model, is proportional to the product of the priors π_θ and π_ζ for θ and ζ, the priors on γ and κ², the Gaussian density N(t | Dγ, κ² I_{n−L}), and the component likelihood terms f_{s_i, ℓ_i}, where I_{n−L} is an (n − L) × (n − L) identity matrix, the vector t = (t_{L+1}, …, t_n)', and the matrix D is (n − L) × 3 such that the ith row is (1, s_{L+i,1}, s_{L+i,2}). The posterior full conditional distribution of θ depends on the form of f_{s_i,l}. Details for the models implemented in Section 4 are given in the supplementary material.
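The one-to-one correspondence between t_i and ℓ_i can be sketched as follows (an illustrative helper with hypothetical names, not the paper's code): the cutoff points on (0, 1) are mapped to the real line by the logit transform, and the latent Gaussian t_i determines the configuration by which interval it falls into.

```python
import math

def logit(r):
    return math.log(r / (1.0 - r))

def config_from_latent(t_i, cutoffs):
    """Return l such that t_i lies in (r*_{l-1}, r*_l], with r* = logit(r).

    cutoffs is the sequence 0 = r_0 < r_1 < ... < r_L = 1; the endpoints map
    to -inf and +inf on the logit scale, so the intervals partition the real line.
    """
    for l in range(1, len(cutoffs)):
        lo = -math.inf if cutoffs[l - 1] <= 0.0 else logit(cutoffs[l - 1])
        hi = math.inf if cutoffs[l] >= 1.0 else logit(cutoffs[l])
        if lo < t_i <= hi:
            return l
    raise ValueError("cutoffs do not form a partition of (0, 1)")
```

Because the intervals partition the real line, every Gaussian draw t_i maps to exactly one configuration label, which is what makes the augmented sampler for γ and κ² conjugate.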
To update ζ, we first marginalize out the latent variables t_i from the joint posterior distribution, and then update ζ using a random walk Metropolis step. Given ζ, the full conditional distribution of each t_i is a mixture of Gaussians truncated to the intervals (r*_{s_i,l−1}, r*_{s_i,l}), with mixture probabilities q_l(s_i), for l = 1, …, L. Hence, each t_i can be updated by sampling from the lth truncated Gaussian with probability proportional to q_l(s_i). The posterior full conditional distribution of γ is multivariate normal. Turning to prediction, let v_0 ∈ D. We obtain posterior predictive samples of z(v_0) as follows. If v_0 ∉ S, for each posterior sample of the parameters, we first compute the cutoff points r_{v_0,0}, …, r_{v_0,L}, and obtain the weights w_l(v_0) = G_{v_0}(r_{v_0,l}) − G_{v_0}(r_{v_0,l−1}), for l = 1, …, L. We then predict z(v_0) using (3). If v_0 ≡ s_i ∈ S, we generate z(v_0) similarly, but using samples for the weights collected from the posterior simulation, and applying (2) instead of (3).

Simulation Study
We have conducted several simulation experiments to study the inferential benefits of the NNMP modeling framework. Here, we present one synthetic data example. Three additional simulation examples are detailed in the supplementary material: the first demonstrates that Gaussian NNMP models can effectively approximate Gaussian random fields, the second illustrates the ability of skew-Gaussian NNMPs to handle data with different levels of skewness, and the third explores a copula NNMP model with beta marginals for bounded spatial data.
In this section, we demonstrate the use of copulas to construct NNMPs for tail dependence modeling. Our focus is on illustrating the flexibility of copula NNMPs for modeling complex dependence structures, and not specifically on extreme value modeling.
To simulate data, we created a regular grid of 200 × 200 resolution on a unit square domain D, and generated over this grid a realization from the random field y(v) = F^{-1}(T_ν(ω(v))), v ∈ D; see Fig. 1(a). Here, ω(v) is a standard Student-t process with tail parameter ν and scale matrix specified by an exponential correlation function with range parameter φ_w, and the distribution functions F and T_ν correspond to a gamma distribution, Ga(2, 2) with mean 1, and a standard Student-t distribution with tail parameter ν, respectively. For a given pair of locations in D with correlation ρ_0 = exp(−d_0/φ_w), the corresponding tail dependence coefficient of the random field is χ_ν = 2 T_{ν+1}(−√((1 + ν)(1 − ρ_0)/(1 + ρ_0))). We took φ_w = 1/12, and chose ν = 10 so that the synthetic data exhibit moderate tail dependence at close distance, with the dependence decreasing rapidly as the distance d_0 becomes larger. In particular, for ρ_0 = 0.05, 0.5, 0.95, we obtain χ_10 = 0.01, 0.08, 0.61, respectively. We applied two copula NNMP models. The models are of the form in (7) with stationary gamma marginal Ga(a, b) with mean a/b. In the first model, the component copula density c_{v,l} corresponds to a bivariate Gaussian copula, which is known to be unsuitable for tail dependence modeling. The spatially varying correlation parameter of the copula was specified by an exponential correlation function with range parameter φ_1. In the second model, we consider a spatially varying Gumbel copula as in Example 3.
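The tail dependence coefficient χ_ν is straightforward to evaluate numerically. The sketch below uses a hand-rolled Student-t cdf (Simpson's rule) to keep the example dependency-free; it reproduces the values χ_10 ≈ 0.01, 0.08, 0.61 quoted above for ρ_0 = 0.05, 0.5, 0.95.

```python
import math

def t_pdf(x, nu):
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + x * x / nu) ** (-(nu + 1) / 2.0)

def t_cdf(x, nu, lo=-60.0, n=20000):
    """Student-t cdf by Simpson's rule on [lo, x]; adequate for moderate accuracy."""
    if x <= lo:
        return 0.0
    h = (x - lo) / n
    s = t_pdf(lo, nu) + t_pdf(x, nu)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * t_pdf(lo + i * h, nu)
    return s * h / 3.0

def chi(nu, rho):
    """Tail dependence coefficient: chi_nu = 2 T_{nu+1}(-sqrt((1+nu)(1-rho)/(1+rho)))."""
    arg = -math.sqrt((1.0 + nu) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * t_cdf(arg, nu + 1)
```

For instance, chi(10, 0.95) evaluates to approximately 0.61, matching the strongest of the three quoted values.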
The spatially varying parameter of the copula density is defined through a link function that maps the exponential correlation exp(−d_0/φ_2) to η_l(d_0), truncated at an upper bound of 50 to ensure numerical stability; when η_l(d_0) = 50, exp(−d_0/φ_2) = 0.98. With this link function, we assume that, given φ_2, the strength of the tail dependence with respect to the lth component of the Gumbel model stays the same for any distance smaller than d_0 between two locations. For the cutoff point kernels, we specified an exponential correlation function with range parameters ζ_1 and ζ_2, respectively, for each model. The Bayesian model is completed with an IG(3, 1/3) prior for φ_1 and φ_2, a Ga(1, 1) prior for a and b, an IG(3, 0.2) prior for ζ_1 and ζ_2, and N(γ | (−1.5, 0, 0)', 2I_3) and IG(κ² | 3, 1) priors.
We randomly selected 2000 locations as the reference set, with a random ordering for model fitting. For the purposes of illustration, we chose neighbor size L = 10. Results are based on posterior samples collected every 10 iterations from a Markov chain of 30000 iterations, with the first 10000 samples discarded. Implementation details for both models are provided in the supplementary material. The computing time was around 18 minutes. Fig. 1 shows the random fields, marginal densities, and conditional survival probabilities estimated by the two models. From Fig. 1(a)-1(c), we see that, compared with the true field, the posterior median estimate from the Gumbel copula model recovers the large values better than the Gaussian copula model. Moreover, as shown in Fig. 1(d), the Gumbel copula model provides a more accurate estimate of the marginal distribution, especially in the tails. We computed the conditional survival probabilities at two different unobserved sites marked in Fig. 1(a); Site 1 (Site 2) is surrounded by reference set observations with moderate (large) values. The Gumbel copula model provides much more accurate estimates of the conditional survival probabilities, indicating that it captures better the tail dependence structure in the data. Overall, this example demonstrates that the Gumbel copula NNMP model is a useful option for modeling spatial processes with tail dependence.

Mediterranean Sea Surface Temperature Data Analysis
The study of ocean dynamics is crucial for understanding climate variability. One of the most valuable sources of information regarding the evolution of the state of the ocean is the centuries-long record of temperature observations from the surface of the oceans. The record of sea surface temperatures consists of data collected over time at irregularly scattered locations. In this section, we examine sea surface temperature from the Mediterranean Sea area during December 2003.
It is well known that the Mediterranean Sea area produces very heterogeneous temperature fields. A goal of the spatial analysis of sea surface temperature in the area is to generate a spatially continuous field that accounts for the complexity of the surrounding coastlines as well as the non-linear dynamics of the circulation system. An additional source of complexity comes from the data collection process. Historically, the observations were collected from different types of devices: buckets launched from navigating vessels, readings from the water intake of ships' engine rooms, moored buoys, and drifting buoys (Kirsner and Sansó, 2020). The source of some observations is known, but not all the data are labelled. A thorough case study would be needed to include all this information in order to account for possible heterogeneities due to the different measuring devices; that is beyond the scope of this paper. We focus instead on demonstrating the ability of the proposed framework to model non-Gaussian spatial processes that capture the complexities of the physical process and the data collection protocol. We notice that in the original record several sites had multiple observations; in those cases we took the median of the observations, resulting in a total of 1966 observations. The data are shown in Fig. 2(a). We first examine the Gaussianity assumption for the data, comparing the Gaussian NNMP and the nearest-neighbor Gaussian process models over a subset of the region where the ocean dynamics are known to be complex and the observations are heterogeneous. The model comparison is detailed in the supplementary material, and the results support the NNMP model.
In light of the evidence (Pisano et al., 2020) that sea surface temperature spatial patterns differ over the Mediterranean sub-basins, shown in Fig. 2(b), which are characterized by different dynamics and high variability of surface currents (Bouzaiene et al., 2020), we further investigate the sea surface temperature over those sub-basins. We fitted a non-spatial linear model to all the data, including longitude and latitude as covariates, and obtained residuals from the linear model. Fig. 2(e) shows that the histograms of the residuals are asymmetric over the sub-basins, indicating skewness in the marginal distribution, with levels of skewness that vary across sub-basins.
The exploratory data analysis suggests the need for a spatial model that can capture skewness. We thus analyze the full data set with an extension of the skew-Gaussian NNMP model. The new model has two features that extend the skew-Gaussian NNMP: (i) it incorporates a fixed effect through the location parameter of the mixture component; (ii) it allows the skewness parameter λ to vary in space. More specifically, the spatially varying conditional density f_{v,l} of the model builds from a Gaussian random vector with mean x(v)'β + λ(v) z_0(v), where x(v) = (1, v_1, v_2)', z_0(v) ∼ TN(0, 1; 0, ∞), for all v and for all l, and (v_1, v_2) are longitude and latitude. The conditional density p(y(v) | y_{Ne(v)}) of the extended model, given in (10), follows by replacing the component location parameters accordingly. After marginalizing out z_0(v), we obtain that the marginal distribution of Y(v) is SN(x(v)'β, (λ(v))² + σ², λ(v)/σ), based on the result of Proposition 1. We model the spatially varying λ(v) via a partitioning approach. In particular, we partition the Mediterranean Sea D according to the sub-basins, that is, D = ∪_{k=1}^{K} P_k, with P_i ∩ P_j = ∅ for i ≠ j, where K = 5. For all v ∈ P_k, we take λ(v) = λ_k, for k = 1, …, K. The partitions P_1, …, P_K correspond to the sub-basins: Westernmost Mediterranean Sea, Tyrrhenian Sea, Adriatic Sea, Ionian Sea, and Levantine-Aegean Sea.
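The skew-normal marginalization can be checked by simulation. Below is a small illustrative sketch (not the paper's code) of the truncated-normal augmentation: drawing Y = x'β + λ|Z_0| + σZ and comparing the sample mean with the skew-normal mean x'β + λ√(2/π) implied by SN(x'β, λ² + σ², λ/σ).

```python
import math
import random

def sample_augmented(xb, lam, sigma, rng):
    """One draw of Y = x'beta + lam*z0 + sigma*z, with z0 ~ TN(0,1; 0,inf), z ~ N(0,1)."""
    z0 = abs(rng.gauss(0.0, 1.0))  # |N(0,1)| has the TN(0, 1; 0, inf) distribution
    return xb + lam * z0 + sigma * rng.gauss(0.0, 1.0)

rng = random.Random(7)
lam, sigma = 2.0, 1.0
n = 200_000
mean = sum(sample_augmented(0.0, lam, sigma, rng) for _ in range(n)) / n
# the skew-normal marginal implies E[Y] = x'beta + lam * sqrt(2/pi)
```

With λ = 2 and x'β = 0 the target mean is λ√(2/π) ≈ 1.596, and the Monte Carlo average agrees to within simulation error.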
We applied the extended skew-Gaussian NNMP model (10) using the whole data set as the reference set, with L chosen to be 10, 15 or 20. The regression parameters β = (β_0, β_1, β_2)' were assigned mean-zero, dispersed normal priors. For the skewness parameters λ = (λ_1, …, λ_5)', each element received a N(0, 5) prior. We used the same prior specification for the other parameters as in the first simulation experiment. Posterior inference was based on thinned samples retaining every 4th iteration, from a total of 30000 samples with a burn-in of 10000 samples. The computing time was around 14, 16, and 20 minutes, respectively, for each of the L values.
Fig. 2(c) provides the temperature posterior median estimate over a dense grid of locations on the Mediterranean Sea. Compared to Fig. 2(a), the estimate generally resembles the observed pattern. The prediction is quite smooth even for areas with few observations. The 95% credible interval width of the prediction over the gridded locations, shown in Fig. 2(d), demonstrates that the model describes the uncertainty in accordance with the observed data structure; the uncertainty is higher in areas where there are fewer observations or the observations are volatile.

Discussion
We have introduced a class of geostatistical models for non-Gaussian processes, and demonstrated its flexibility for modeling complex dependence through specification of a collection of bivariate distributions indexed in space. The scope of the methodology has been illustrated through synthetic spatial data examples corresponding to distributions of different nature and support, and with the analysis of sea surface temperature measurements from the Mediterranean Sea.
To incorporate covariates, the NNMP can be embedded in an additive or multiplicative model. The former is illustrated in the supplementary material with a spatially varying regression model. Under an additive model, the posterior simulation algorithm requires extra care, as it involves sequential updating of the elements in z_S; this may induce slow convergence. An alternative strategy for covariate inclusion is to model the weights or some parameter(s) of the spatially varying conditional density as a function of covariates. For example, in Section 4.2, we modeled the location parameter of the skew-Gaussian marginal as a linear function of the covariates. Posterior simulation under this approach is easily developed by modifying the update of the relevant parameters discussed in Section 3.2 to that of the regression coefficients.
The NNMP model structure not only bypasses the potential issues arising from large matrix operations, but also enhances modeling power. Kernel functions, such as wave covariance functions, that are impractical for Gaussian process-based models due to numerical instability from matrix inversion, can be used as link functions for the spatially varying parameters of the NNMP. One limitation of NNMP computation, shared with mixture models in general, is that the posterior simulation algorithm may experience slow convergence. Further development of efficient algorithms is needed for fast computation, especially when dealing with massive, complex data sets.
We have focused in this article on a modeling framework for continuous data. The proposed approach can be naturally extended to modeling discrete spatial data. Modeling options for geostatistical count data in the existing literature involve either spatial generalized linear mixed models or spatial copula models (Madsen, 2009). However, owing to their structures, both models have limitations with respect to the distributional assumption for the spatial random effects, as well as in computational efficiency. The extension to discrete NNMP models has the potential to provide both inferential and computational benefits in modeling large discrete data sets.
It is also interesting to explore opportunities for the analysis of spatial extremes using the NNMP framework. We developed guidelines in Section 2.4 to choose NNMP mixture components based on strength of tail dependence. The results highlight the ability of the NNMP model structure to capture local tail dependence at different levels, controlled by the mixture component bivariate distributions, for instance, with a class of bivariate extreme-value copulas. Moreover, using NNMPs for spatial extreme value modeling allows for efficient implementation of inference, which is typically a challenge for existing approaches (Davison et al., 2012).
Other research directions include extensions to multivariate and spatio-temporal settings. The former requires families of high-dimensional multivariate distributions to construct an NNMP; effective strategies will be needed to define spatially varying multivariate distributions that balance flexibility and scalability. For a joint model over time and space, there is large scope for exploring the integration of the time component into the model, including extending the NNMP weights or the NNMP mixture components.
where the second-to-last equality holds by the result that Z(s) ∼ f_Z for all s ∈ S, and Ne(u) ⊂ S for every u ∈ U. The last equality follows from (1).
Proof of Proposition 2. We verify the proposition by partitioning the domain D into the reference set S and the non-reference set U. We first prove by induction the result for the joint distribution p(z_S) over S. To complete the proof, it then suffices to show that for every location u ∈ U, the joint density p(z_{U_1}) is a mixture of multivariate Gaussian distributions, where U_1 = S ∪ {u}.
Taking q → 1⁻ on both sides of (3), we obtain the lower bound for λ_H(v), where F_{U_{v,l}} and F_{V_{v,l}} are the marginal distribution functions of (U_{v,l}, V_{v,l}). Similarly, we can obtain the lower bound for λ_L(v).
Proof of Corollary 1. We prove the result for λ_L(v); the result for λ_H(v) is obtained in a similar way. Consider a bivariate cdf F_{U_l,V_l} for the random vector (U_l, V_l), with marginal cdfs F_{U_l} = F_{V_l}, evaluated at quantiles q ∈ [0, 1]. If F_{U_l,V_l} has first-order partial derivatives, applying L'Hôpital's rule, we obtain the limiting form of the tail dependence coefficient. This reproduces the result in Theorem 8.57 of Joe (2014).
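For concreteness, the L'Hôpital step can be written out in copula form (a sketch of the reproduced result; here C_l denotes the copula of (U_l, V_l) and C_{2|1} its conditional distribution):

```latex
\lambda_L
= \lim_{q \to 0^{+}} \frac{C_l(q, q)}{q}
= \lim_{q \to 0^{+}} \frac{d}{dq}\, C_l(q, q)
= \lim_{q \to 0^{+}} \Big[ \partial_1 C_l(u, q)\big|_{u = q} + \partial_2 C_l(q, v)\big|_{v = q} \Big]
= 2 \lim_{q \to 0^{+}} C_{2|1}(q \mid q),
```

where the last equality uses the exchangeability of (U_l, V_l), so that the two partial derivatives agree in the limit.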
The bivariate distributions of (U_{v,l}, V_{v,l}) of an NNMP model are built from the base random vectors (U_l, V_l). By our assumption that F_{U_l} = F_{V_l} for all l, the marginal cdfs of (U_{v,l}, V_{v,l}) extended from (U_l, V_l) are F_{v,l} = F_{U_{v,l}} = F_{V_{v,l}} for all v and all l. The bound for λ_{L,l}(v) then follows from the limit as q → 0.

Proof of Proposition 4. By the assumption that U_l is stochastically increasing in V_l and that (U_{v,l}, V_{v,l}) is constructed based on (U_l, V_l), U_{v,l} is stochastically increasing in V_{v,l} for all v ∈ D and for all l. The result then follows for (Z(v), Z(v_{(l)})) with respect to the bivariate distribution of (U_{v,l}, V_{v,l}) with marginal distributions F_{U_{v,l}} and F_{V_{v,l}}.

In each of the experiments, we created a regular grid of 200 × 200 resolution on a unit square domain, and generated data at each grid location. We then randomly selected a subset of the data as the reference set, with a random ordering for model fitting. For the purpose of illustration, we chose neighbor size L = 10 for both cases. Results are based on posterior samples collected every 10 iterations from a Markov chain of 30000 iterations, with the first 10000 samples as burn-in.

B.1 Additional Simulation Experiment 1
We generated data from a spatially varying regression y(v) = x(v)'β + z(v) + ε(v), where z(v) is a standard Gaussian process with an exponential correlation function with range parameter 1/12, and ε(v) ∼ N(0, τ²). We included an intercept and a covariate drawn from N(0, 1) in the model, and chose β = (β_0, β_1)' = (1, 5)' and τ² = 0.1. The setting follows the simulation experiment in Datta et al. (2016a).
We applied two models. The first assumes that z(v) follows an NNGP model with variance σ_0² and exponential correlation function with range parameter φ_0. The second assumes that z(v) follows a stationary GNNMP model, i.e., µ_l = 0 and σ_l² = σ² for all l, such that z(v) has a stationary marginal N(0, σ²). For the GNNMP, we used exponential correlation functions with range parameters φ and ζ, respectively, for the correlation with respect to the component density and the cutoff point kernel function. For the NNGP model, we implemented the latent NNGP algorithm from the spNNGP package in R (Finley et al., 2020). We trained both models using 2000 of the 2500 observations, and used the remaining 500 observations for model comparison.
Table 1 shows the performance metrics of the two models. The performance metrics of the GNNMP model are comparable to those of the NNGP model, whose model assumptions are better suited to this particular synthetic data example. The posterior median estimates of the spatial random effects from both models are shown in Figure 1. We can see that the predictive field given by the GNNMP looks close to the true field and to the one predicted by the NNGP. On the whole, the GNNMP model provides a reasonably good approximation to the Gaussian random field.
B.2 Additional Simulation Experiment 2

We focus on the model's performance in capturing skewness. The posterior mean and 95% credible interval of λ for the three scenarios were −3.65 (−4.10, −3.27), 1.09 (0.91, 1.28) and 7.69 (6.88, 8.68), respectively, indicating the model's ability to estimate different levels of skewness. The bottom row of Figure 2 shows that the posterior median estimates of the surfaces capture well the features of the true surfaces, even when the level of skewness is small, thus demonstrating that the model is also able to recover near-Gaussian features. Figure 3 plots the posterior mean and pointwise 95% credible interval for the marginal density, overlaid on the histogram of the simulated data for each of the three cases. These estimates demonstrate the adaptability of the skew-GNNMP model in capturing skewed random fields with different levels of skewness.

B.3 Additional Simulation Experiment 3
Many spatial processes are measured over a compact interval; for example, data on proportions are common in ecological applications. In this experiment, we demonstrate the effectiveness of the NNMP model for directly modeling bounded spatial data. We generated data using the model y(v) = F^{-1}(Φ(ω(v))), where the cdf F corresponds to a beta distribution, denoted as Beta(a_0, b_0), and ω(v) is a standard Gaussian process with exponential correlation function with range parameter 0.1. We set a_0 = 3 and b_0 = 6. We applied a Gaussian copula NNMP model with stationary marginal Beta(a, b), with the same spatial Gaussian copula and prior specification used in the second experiment. We used 2000 observations to train the model. Figure 4(b) shows the estimated random field, which captures well the main features of the true field in Figure 4(a). The posterior mean and pointwise 95% credible interval of the estimated marginal density in Figure 4(c) are overlaid on the data histogram. These results show that the beta NNMP estimation and prediction provide a good approximation to the true field.
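A minimal, dependency-free sketch of this data-generating mechanism follows; the Cholesky factorization, beta cdf, and quantile routines are hand-rolled for illustration (assuming shape parameters at least 1), and the grid and parameter values are placeholders.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def beta_cdf(x, a, b, n=2000):
    """Regularized incomplete beta on [0, x] by Simpson's rule (assumes a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    f = lambda t: const * t ** (a - 1) * (1.0 - t) ** (b - 1)
    h = x / n
    s = f(0.0) + f(x)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(i * h)
    return min(s * h / 3.0, 1.0)

def beta_quantile(u, a, b):
    """Invert the beta cdf by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def chol(mat):
    """Cholesky factor of a symmetric positive definite matrix (lists of lists)."""
    n = len(mat)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(mat[i][i] - s) if i == j else (mat[i][j] - s) / L[j][j]
    return L

def simulate_beta_field(locs, phi, a, b, rng):
    """y(v) = F^{-1}(Phi(omega(v))): Gaussian process -> uniform -> Beta(a, b) marginal."""
    C = [[math.exp(-math.dist(u, v) / phi) for v in locs] for u in locs]
    L = chol(C)
    e = [rng.gauss(0.0, 1.0) for _ in locs]
    omega = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(len(locs))]
    return [beta_quantile(norm_cdf(w), a, b) for w in omega]
```

The transform leaves the Gaussian copula of ω(v) intact while forcing the Beta(a, b) marginal, which is exactly the structure the beta NNMP is asked to recover.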
It is worth mentioning that implementing the beta NNMP model is simpler than fitting existing models for data corresponding to proportions. For example, a spatial Gaussian copula model, which corresponds to the data generating process of this experiment, involves computations with large matrices. Alternatively, if a multivariate non-Gaussian copula is used, the resulting likelihood can be intractable and require certain approximations. Another model that is commonly used in this setting is defined analogously to a spatial generalized linear mixed model; the spatial element is introduced through the transformed mean of the observations, and a sample-based approach to fit such a model requires sampling a large number of highly correlated latent variables. We conducted an experiment to demonstrate the effectiveness of the beta NNMP in approximating random fields simulated by the link function approach. We generated data as follows.
We generated y(v) | µ(v) ∼ Beta(µ(v)ψ, (1 − µ(v))ψ), where logit(µ(v)) = µ_0 + σ_0 ω(v). This model is analogous to a spatial generalized linear mixed model where the mean µ(v) of the beta distribution is modeled via a logit link function, and ω(v) is a standard Gaussian process with exponential correlation function with range parameter 0.1. We set ψ = 20, µ_0 = −0.5 and σ_0 = 0.8.
Since our purpose is primarily demonstrative, we applied a Gaussian copula NNMP model with a stationary beta marginal Beta(a, b), referred to as the beta NNMP model. We trained the model using 2000 observations. Figure 5(a)-(b) shows the interpolated surface of the true field and the predictive field given by the beta NNMP model. Although the beta NNMP's stationary marginal distribution assumption does not align with the true model, the predictive field was able to capture the main features of the true field. Moreover, it is worth mentioning that the MCMC algorithm for the beta NNMP took around 18 minutes for 30000 iterations on this data set. This is substantially faster than the MCMC algorithm for fitting the true model, which involves sampling a large number of highly correlated latent variables.

B.4 Mediterranean Sea Surface Temperature Regional Analysis
In this section, we examine the non-Gaussian process assumption for the Mediterranean Sea surface temperature data. We focus on sea surface temperature (SST) over an area near the Gulf of Lion, along the islands near the shores of Spain, France, Monaco and Italy, between 0–9° E longitude and 33.5–44.5° N latitude. The SST observations in the region, shown in Figure 6(a), are very heterogeneous, implying that the short range variability can be non-Gaussian. We compare the GNNMP with the NNGP in a spatially varying regression model, demonstrating the benefit of using a non-Gaussian process to explain the SST variability. In particular, the GNNMP has the same Gaussian marginals as the NNGP, but its finite-dimensional distribution is a mixture of Gaussian distributions. We consider the spatially varying regression model y(v) = x(v)'β + z(v) + ε(v), where y(v) is the SST observation, x(v) = (1, v_1, v_2)' includes longitude v_1 and latitude v_2 to account for the long range variability in SST with regression parameters β = (β_0, β_1, β_2)', z(v) is a spatial process, and ε(v) ∼ N(0, τ²) represents the micro-scale variability and/or the measurement error.
We held out around 10% of the data in the region for model comparison, and used the remaining 580 observations to train the models. We compare models based on root mean square predictive error (RMSPE), 95% posterior credible interval coverage rate (95% CI coverage), the deviance information criterion (DIC; Spiegelhalter et al. 2002), the posterior predictive loss criterion (PPLC; Gelfand and Ghosh, 1998), and the continuous ranked probability score (CRPS; Gneiting and Raftery 2007). To effectively compare the GNNMP and NNGP models, we first trained the NNGP model with neighbor sizes L from 5 to 20, and selected the optimal neighbor size, L = 11, corresponding to the smallest RMSPE. We then trained the GNNMP model with the same L. In all cases, we ran the MCMC for 120000 iterations, discarding the first 20000 samples, and collected samples every 20 iterations. The computing time was around 12 and 9 minutes for the GNNMP and NNGP models, respectively. We report the results for both models. The posterior mean and 95% credible interval estimate of the regression intercept from the GNNMP was higher than that from the NNGP: 28.55 (22.81, 35.34) versus 27.18 (23.53, 30.94). The corresponding posterior estimates of the coefficients for longitude and latitude were 0.09 (−0.03, 0.24) and −0.32 (−0.50, −0.17) for the GNNMP, and 0.09 (0.01, 0.17) and −0.29 (−0.39, −0.19) for the NNGP. Both models indicated a trend of SST decreasing with latitude in the selected region. For the error variance τ², the GNNMP provided a smaller estimate, 0.12 (0.02, 0.38), compared to 0.45 (0.02, 0.92) from the NNGP.
Regarding the model performance metrics in Table 2, both the PPLC and DIC suggest that the GNNMP has a better goodness-of-fit than the NNGP. For out-of-sample prediction, the GNNMP produced smaller RMSPE and CRPS than the NNGP. Both models gave the same 95% CI coverage, while the intervals from the GNNMP were narrower. Figure 6(b)-6(c) shows the posterior median estimates of the temperature field from both models. Both models yield estimates that resemble the pattern in the observations; however, the predictive surface produced by the NNGP depicts some very localized, unrealistic features that are not present in the results from the GNNMP.
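As a side note on the CRPS used in the comparison: for a Gaussian predictive distribution the score has a well-known closed form (Gneiting and Raftery, 2007). The sketch below is illustrative and not tied to the fitted models above.

```python
import math

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) predictive distribution at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

The score is minimized (at about 0.234·σ) when the observation falls at the predictive mean, and it grows as the observation moves into the tails, which is why sharper, well-calibrated predictive distributions earn smaller average CRPS.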
The MCMC sampler used to obtain samples from the joint posterior distribution is described in the main paper. Here we present the posterior updates of $\lambda$, $\sigma^2$, and $\phi$. Note that the configuration variables $\ell_i$ are such that $\ell_i = l$ if $t_i \in (r^*_{s_i,l-1}, r^*_{s_i,l})$, for $i \geq L+1$. Denote $f_{s_i,l} = b_l(s_i)\,N(y(s_i) \mid \rho_l(s_i)\,y(s_{(il)}),\, \omega^2(1 - (\rho_l(s_i))^2))$. We use a random walk Metropolis step to update $\lambda$ with target density $N(\lambda \mid \mu_\lambda, \sigma^2_\lambda) \prod_{i=L+1}^{n} f_{s_i,\ell_i}$. The posterior full conditional distributions of $\sigma^2$ and $\phi$ are proportional to $IG(\sigma^2 \mid u_{\sigma^2}, v_{\sigma^2}) \prod_{i=L+1}^{n} f_{s_i,\ell_i}$ and $IG(\phi \mid u_\phi, v_\phi) \prod_{i=L+1}^{n} f_{s_i,\ell_i}$, respectively. We update each of these parameters on its log scale with a random walk Metropolis step.
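Updating a positive parameter on the log scale requires adding the log-Jacobian of the transformation to the log target in the acceptance ratio. A minimal sketch of such a step, under an arbitrary user-supplied log target; this is illustrative and not the paper's implementation:

```python
import numpy as np

def rwm_log_scale(log_target, theta0, n_iter, step=0.5, rng=None):
    # Random walk Metropolis for a positive-valued parameter theta,
    # proposing on u = log(theta). The term "+ u" below is the
    # log-Jacobian |d theta / d u| = exp(u) of the transformation.
    rng = np.random.default_rng() if rng is None else rng
    u = np.log(theta0)
    lp = log_target(np.exp(u)) + u
    samples = np.empty(n_iter)
    for t in range(n_iter):
        u_prop = u + step * rng.standard_normal()
        lp_prop = log_target(np.exp(u_prop)) + u_prop
        # Symmetric proposal, so the acceptance ratio is just the
        # target (plus Jacobian) ratio.
        if np.log(rng.uniform()) < lp_prop - lp:
            u, lp = u_prop, lp_prop
        samples[t] = np.exp(u)
    return samples
```

In the sampler above, `log_target` would be the log of the corresponding full conditional, e.g. the inverse-gamma prior term plus the sum of the log mixture factors over $i = L+1, \ldots, n$.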

Figure 1: Synthetic data example. Top panels are interpolated surfaces of the true field and posterior median estimates from both models. Bottom panels are estimated marginal densities and conditional survival probabilities from the two models. The green dashed lines correspond to the true model. The red (blue) dashed lines and shaded regions are the posterior mean and 95% credible interval estimates from the Gaussian (Gumbel) copula NNMP models.

Figure 2: Mediterranean Sea surface temperature data analysis. Panel (a) shows the observations. Panels (b) and (e) are partitions according to Mediterranean sub-basins and histograms of the residuals obtained from a non-spatial linear model. Panels (c) and (d) are posterior median and 95% credible interval estimates of the sea surface temperature from the extended skew-Gaussian NNMP model.

Figure 1: Additional simulation experiment 1 data analysis. Interpolated surface of the true Gaussian random field and posterior median estimates from the NNGP and GNNMP models.
Figure 2: Additional simulation example 2 data analysis. Top panels are interpolated surfaces of y(v) generated by (5). Bottom panels are the posterior median estimates from the skew-GNNMP model.

Figure 3: Additional simulation example 2 data analysis. Green lines are the true marginal densities; dashed lines and shaded regions are the posterior means and 95% credible intervals of the estimated marginal densities.

Figure 4: Additional simulation example 3 data analysis. Panels (a) and (b) are interpolated surfaces of the true field and the posterior median estimate from the beta NNMP model, respectively. In Panel (c), the green dotted line corresponds to the true marginal. The red dashed line and shaded region are the posterior mean and pointwise 95% credible interval for the estimated marginal.

Figure 5: Additional simulation experiment 3 data analysis. Interpolated surfaces of the true field and posterior median estimate from the beta NNMP model.

Figure 6: SST data analysis. Panel (a) shows the observations at the selected region. Panels (b)-(d) plot posterior median estimates of the SST by different models.

Table 1: Additional simulation experiment 1 data analysis. Performance metrics of different models.

Table 2: SST data analysis. Performance metrics of different models.