Multivariate isotropic random fields on spheres: Nonparametric Bayesian modeling and Lp fast approximations

We study multivariate Gaussian random fields defined over d-dimensional spheres. First, we provide a nonparametric Bayesian framework for modeling and inference on matrix-valued covariance functions. We determine the support (under the topology of uniform convergence) of the proposed random matrices, which covers the whole class of matrix-valued geodesically isotropic covariance functions on spheres. We provide a thorough inspection of the properties of the proposed model in terms of (a) first moments, (b) posterior distributions, and (c) Lipschitz continuities. We then provide an approximation method for multivariate fields on the sphere for which measures of L^p accuracy are established. Our findings are supported through simulation studies that show the rate of convergence when truncating a spectral expansion of a multivariate random field at a finite order. To illustrate the modeling framework developed in this paper, we consider a bivariate spatial data set of two 2019 NCEP/NCAR Flux Reanalyses.


Introduction
The paper deals with multivariate (or vector-valued) Gaussian random fields defined over the d-dimensional unit sphere S^d = {x ∈ R^{d+1} : ‖x‖ = 1} embedded in R^{d+1}, having a specified matrix-valued covariance function; see Ma (2012), Du, Ma and Li (2013), Ma (2015), Porcu, Bevilacqua and Genton (2016), Alegría et al. (2019a), Bevilacqua, Diggle and Porcu (2019) and Alegría (2020) for recent literature on this topic. Choosing the d-dimensional sphere as an index set is motivated by the increasing availability of global data, covering a large portion of our planet, and we remind the reader of recent applications in Reinsel et al. (1981), Di Lorenzo et al. (2014), Nychka et al. (2015), Combes, Hormázabal and Di Lorenzo (2017) and Edwards, Castruccio and Hammerling (2019). In addition to the Earth sciences, Cosmic Microwave Background (CMB) radiation provides a natural application for Gaussian random field models over S^2 (see Cabella and Marinucci, 2009; Marinucci and Peccati, 2011). Recently, random fields using the circle as an index set have been used to capture temporal seasonality (Shirota and Gelfand, 2017; White and Porcu, 2019; Mastrantonio et al., 2019) or direction (Jona-Lasinio, Gelfand and Jona-Lasinio, 2012; Wang and Gelfand, 2014). Additionally, spheres of dimension d > 2 are relevant in various disciplines, such as physics, chemistry and material sciences, as discussed in Dette et al. (2019). For other manifolds we refer the reader to Kerkyacharian et al. (2018) and Lu, Leonenko and Ma (2020).
There is a rich literature on nonparametric Bayesian approaches for scalar-valued random fields on planar surfaces. See, for instance, Schmidt and O'Hagan (2003); Gelfand, Kottas and MacEachern (2005); Duan, Guindani and Gelfand (2007); Zheng, Zhu and Roy (2010); Reich and Fuentes (2012); Chopin, Rousseau and Liseo (2013); Holbrook et al. (2018) and Müller, Quintana and Page (2018). In addition, there has also been some recent work on nonparametric Bayesian inference for scalar-valued random fields and their covariance functions on spheres and more general manifolds (Castillo, Kerkyacharian and Picard, 2014). Despite the activity in this area, multivariate random fields on spheres have not yet been considered under the Bayesian nonparametric framework. The benefits of adopting a nonparametric Bayesian framework are that (a) we focus on Gaussian random fields, so that the matrix-valued covariance function becomes crucial for modeling, inference, and prediction, and (b) we can exploit the simple spectral representation for multivariate covariance functions on spheres (Hannan, 2009; Yaglom, 1987). Second, fast simulation of multivariate Gaussian random fields over spheres has only been considered to a limited extent, and the reader is referred to the HEALPix software (Gorski et al., 2005) and recent contributions by Emery and Porcu (2019) and Alegría, Emery and Lantuéjoul (2020).
We focus our work on developing approaches for nonparametric Bayesian modeling and fast simulation for multivariate random fields over spheres. Specifically, our paper contributes as follows:
1. We provide a strategy for nonparametric Bayesian modeling of matrix-valued covariance functions over d-dimensional spheres. The basic idea is that the matrix-valued covariance function is considered as a random matrix with a given prior probability distribution.
2. We show that our problem is well-posed in the sense that the support of these random matrix-valued covariance functions is, under the uniform topology, equal to the whole set of geodesically isotropic covariance functions that are defined over d-dimensional spheres.
3. We then provide a detailed study of the posterior distributions, the first-order moments of the so-defined random matrices, and the geometric properties of these random matrices in terms of Lipschitz continuities.
4. We propose an approximation method for k-variate random fields with given classes of geodesically isotropic matrix-valued covariance functions. Our method is based on truncation at a finite order of a spectral expansion associated with a k-variate random field defined over S^d. We assess the accuracy of the approximation in the L^p sense, detailed subsequently. Our findings are then supported through a simulation study on bivariate random fields.
5. We illustrate our findings on a bivariate climate data set, where we discuss modeling details.
The plan of the paper is the following: Section 2 contains the mathematical background necessary to illustrate our findings. Contributions 1.-3. above are provided in Section 3, and contribution 4. in Section 4. Section 5 presents an analysis of a bivariate global climate dataset (contribution 5.). The paper concludes with a discussion in Section 6. Technical proofs are deferred to the Appendix.

Notation
We use A^⊤ to denote the transpose of a matrix A. For a positive semi-definite square matrix A, a square root A^{1/2} is a matrix of the same order as A such that A = A^{1/2}(A^{1/2})^⊤. This decomposition is unique when A is (strictly) positive definite. For the remainder of the paper, I(k) denotes the k × k identity matrix.
Let R^{k1×k2} denote the space of matrices of order k1 × k2, where k1, k2 ∈ N. The Frobenius inner product in R^{k1×k2} is defined as ⟨A, B⟩_F = tr(AB^⊤), where the trace of a square matrix is the sum of the elements on its main diagonal. The corresponding norm of a matrix takes the form ‖A‖_F = ⟨A, A⟩_F^{1/2}. The results provided in Section 4 make use of the following Lebesgue norms. Let (M, μ) be a measure space and p ≥ 1. The L^p(M; R^k)-norm of a vector-valued function f : M → R^k is defined as ‖f‖_{L^p(M;R^k)} = (∫_M ‖f(x)‖^p dμ(x))^{1/p}, where ‖·‖ denotes the Euclidean norm on R^k. Special emphasis is put on the case M = S^d, equipped with the spherical measure μ. The corresponding inner product of two vector-valued functions f, g : S^d → R^k takes the form ⟨f, g⟩_{L^2(S^d;R^k)} = ∫_{S^d} f(x)^⊤ g(x) dμ(x). Now, let (Ω, F, P) be a probability space. A vector-valued random function f : Ω → L^p(M; R^k) belongs to L^q(Ω; L^p(M; R^k)), for q ≥ 1, when ‖f‖_{L^q(Ω;L^p(M;R^k))} = (E ‖f‖_{L^p(M;R^k)}^q)^{1/q} < ∞. For the special case p = q = 2 we have ‖f‖_{L^2(Ω;L^2(M;R^k))}^2 = E ∫_M ‖f(x)‖^2 dμ(x).
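The norms above translate directly into code. The following numpy sketch (the function names are ours, purely for illustration) computes the Frobenius inner product and a Monte Carlo estimate of the L^2(S^d; R^k) norm against the spherical surface measure:

```python
import numpy as np
from math import gamma, pi

def frobenius_inner(A, B):
    """Frobenius inner product <A, B>_F = tr(A B^T)."""
    return np.trace(A @ B.T)

def frobenius_norm(A):
    """||A||_F = <A, A>_F^{1/2}."""
    return float(np.sqrt(frobenius_inner(A, A)))

def uniform_sphere(n, d, rng):
    """n points uniformly distributed on S^d, embedded in R^{d+1}."""
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def l2_norm_mc(f, d, n=10_000, seed=0):
    """Monte Carlo estimate of the L^2(S^d; R^k) norm of f, using the
    surface area 2 pi^{(d+1)/2} / Gamma((d+1)/2) of S^d."""
    rng = np.random.default_rng(seed)
    x = uniform_sphere(n, d, rng)
    vals = np.array([f(xi) for xi in x])          # shape (n, k)
    surf = 2 * pi ** ((d + 1) / 2) / gamma((d + 1) / 2)
    return float(np.sqrt(surf * np.mean(np.sum(vals ** 2, axis=1))))
```

For a constant function on S^2 the estimate reduces to the exact value √(4π), a convenient sanity check.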

Multivariate random fields on spheres
We define a k-variate zero-mean Gaussian random field Z = (Z_1, ..., Z_k)^⊤ as a vector-valued stochastic process that is continuously indexed over S^d. The components Z_i, i = 1, ..., k, are called scalar random fields. The covariance function C : [0, π] → R^{k×k} is a matrix-valued mapping whose elements are defined as C_{i,j}(θ(x_1, x_2)) = cov(Z_i(x_1), Z_j(x_2)), where θ(x_1, x_2) = arccos(x_1^⊤ x_2) is the geodesic distance; throughout the paper, we use the shortcut θ whenever there is no confusion. Following Porcu, Bevilacqua and Genton (2016) and Alegría et al. (2019a), we say that C is geodesically isotropic, as it depends on x_1 and x_2 only through their geodesic distance. This paper does not adopt any strategy of covariance modeling through the chordal distance, which has received substantial criticism in the recent papers by Gneiting (2013) and Porcu, Bevilacqua and Genton (2016), as well as an earlier paper by Banerjee (2005). Throughout, the diagonal elements C_{i,i} are called marginal covariance functions, while the off-diagonal elements C_{i,j} are called cross-covariance functions. For the remainder of the paper, we always assume continuity of the elements C_{i,j} of the matrix-valued mapping C. This assumption is not too restrictive because discontinuous models are rarely used in practice, with the exception of the nugget effect, which is regularly used as a summand in additive covariance structures for capturing abrupt short-scale variations in spatial data applications (Chiles and Delfiner, 2009).
The mapping C is positive semi-definite; that is,
Σ_{i=1}^n Σ_{j=1}^n a_i^⊤ C(θ(x_i, x_j)) a_j ≥ 0,
for all positive integers n, {x_1, ..., x_n} ⊂ S^d and {a_1, ..., a_n} ⊂ R^k. A necessary and sufficient condition (Hannan, 2009) for the continuous mapping C to be the covariance matrix function of a k-variate stationary and isotropic Gaussian random field on S^d (i.e., to be positive semi-definite) is that there exists a summable sequence of positive semidefinite k × k matrices {B_n, n = 0, 1, 2, ...} such that, for θ ∈ [0, π],
C(θ) = Σ_{n=0}^∞ B_n c_n(d, cos θ), (7)
where c_n(d, x) is the normalized Gegenbauer polynomial, that is,
c_n(d, x) = C_n^{((d-1)/2)}(x) / C_n^{((d-1)/2)}(1), d ≥ 2. (8)
Since the Chebyshev polynomials T_n = C_n^{(0)} are already normalized, the last expression is not valid for d = 1, where c_n(1, x) = T_n(x). Such a result extends to the matrix-valued case the celebrated theorem of Schoenberg (1942). We note that we have been sloppy in writing B_n instead of B_{n,d} in (7): clearly, for any dimension d, there exists a different sequence B_{n,d} satisfying (7). Yet, this is not relevant for the present work, and for other technical aspects the reader is referred to Porcu, Alegría and Furrer (2018) and the references therein. Representation (7) has a precise interpretation in terms of the associated zero-mean Gaussian k-variate process Z, which admits a uniquely determined expansion of the type
Z(x) = Σ_{n=0}^∞ Σ_{m ∈ N_{n,d}} ξ_{n,m} Y_{n,m}(x), (9)
where the Y_{n,m} denote spherical harmonics, an orthonormal basis for the space of square-integrable functions (against the Lebesgue measure) on the sphere; for more details see Dai and Xu (2013). The index set N_{n,d} is finite, and for details the reader is referred to Porcu, Alegría and Furrer (2018). Here, {ξ_{n,m}} is a sequence of random vectors in R^k that have zero mean and are such that E ξ_{n,m} ξ_{k,h}^⊤ = δ_{n=k} δ_{m=h} B_n, where δ is the Kronecker delta and the B_n are the positive semidefinite, summable matrices defined at (7). An alternative expansion, with the same covariance structure, will be discussed in Section 4.
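The normalized Gegenbauer polynomials c_n(d, ·) in (8), with the Chebyshev convention for d = 1, can be evaluated with `scipy.special`; a minimal sketch:

```python
import numpy as np
from scipy.special import eval_gegenbauer, eval_chebyt

def c_n(n, d, x):
    """Normalized Gegenbauer polynomial c_n(d, x), with c_n(d, 1) = 1.

    For d = 1 the Gegenbauer normalization degenerates, so the Chebyshev
    polynomial T_n (already normalized, T_n(1) = 1) is used instead.
    """
    if d == 1:
        return eval_chebyt(n, x)
    lam = (d - 1) / 2.0
    return eval_gegenbauer(n, lam, x) / eval_gegenbauer(n, lam, 1.0)
```

For d = 2 the parameter is λ = 1/2, so c_n(2, ·) reduces to the Legendre polynomial P_n, consistent with the classical Schoenberg expansion on S^2.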

Random matrix covariances, and their support
For a neater illustration of the results contained in this section, a slight change of notation is needed. We consider a stochastic process W, continuously indexed over S^d × {1, ..., k}, such that W(x, j) = Z_j(x) for x ∈ S^d and j = 1, ..., k, with Z_j being the j-th component of the k-variate random field Z defined at (9). The covariance function K associated with the process W is defined as K((x_1, i), (x_2, j)) = cov(W(x_1, i), W(x_2, j)). Clearly, K((x_1, i), (x_2, j)) is the (i, j)-entry of the covariance matrix function C(θ(x_1, x_2)) defined at (7). Thus,
K((x_1, i), (x_2, j)) = Σ_{n=0}^∞ (B_n)_{i,j} c_n(d, cos θ(x_1, x_2)), (10)
where (B_n)_{i,j} denotes the (i, j)-entry of B_n. We denote by A the class of continuous covariance functions K on S^d × {1, ..., k}, for some positive integer k, which admit the representation (10). Clearly, any continuous function K belonging to the class A implies that the matrix-valued function C with entries C_{i,j}(θ(x_1, x_2)) = K((x_1, i), (x_2, j)) has a uniquely determined expansion as in (7). This fundamental connection is made explicit several times in the exposition of our results. Some comments are in order. Since each matrix B_n in (7) is positive semidefinite, by the Cholesky decomposition there exists a lower triangular matrix D_n such that B_n = D_n D_n^⊤, for n = 0, 1, ..., where the diagonal entries of D_n are nonnegative. The Cholesky decomposition is unique if B_n is positive definite, and it is equivalent to
(B_n)_{i,j} = Σ_{l=1}^{min(i,j)} (D_n)_{i,l} (D_n)_{j,l}. (11)
As a consequence of Schoenberg's representations (7) and (10), the matrices D_n, n = 0, 1, 2, ..., satisfy
Σ_{n=0}^∞ |Σ_{l=1}^{min(i,j)} (D_n)_{i,l} (D_n)_{j,l}| < ∞,
for every i, j = 1, ..., k.
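The Cholesky step described above is a one-liner in practice; an illustrative check with numpy (the matrix B_n below is a made-up example, not from the paper):

```python
import numpy as np

# A positive definite Schoenberg matrix B_n and its (unique) Cholesky
# factor D_n, so that B_n = D_n D_n^T with nonnegative diagonal.
B_n = np.array([[2.0, 0.6],
                [0.6, 1.0]])
D_n = np.linalg.cholesky(B_n)          # lower triangular factor
assert np.allclose(D_n @ D_n.T, B_n)
assert np.all(np.diag(D_n) > 0)
```

Note that `np.linalg.cholesky` returns the lower-triangular factor, matching the convention used for D_n here.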
The primary goal here is to propose a nonparametric Bayesian model by assigning prior distributions to the (random) matrices B_n in (7). This is equivalent to assigning a prior distribution to the matrices D_n, n = 0, 1, 2, ..., which are therefore replaced by random matrices D̃_n that satisfy the following conditions: A.1 each matrix D̃_n is a lower triangular matrix whose diagonal elements are non-negative, namely (D̃_n)_{i,j} = 0 for 1 ≤ i < j ≤ k, and (D̃_n)_{j,j} ≥ 0, for j = 1, ..., k; A.2 the random matrices D̃_0, D̃_1, D̃_2, ... are independent; A.3 for each n = 0, 1, ..., the R_+-valued random variables (D̃_n)_{j,j} (j = 1, ..., k) are i.i.d., and the real-valued random variables (D̃_n)_{i,j} (1 ≤ j < i ≤ k) are i.i.d.; A.4 Σ_{n=0}^∞ E[((D̃_n)_{i,j})^2] < ∞, for i, j = 1, ..., k. The following result shows that these conditions are indeed sufficient to ensure almost sure convergence of the series of random matrices {B̃_n}_{n=0}^∞, a fundamental ingredient for the uniquely determined expansion (7) to be well-defined.
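A concrete prior satisfying A.1-A.3 can be sampled as follows. The half-normal/normal choice and the s/(n+1) scaling are our illustrative assumptions (chosen so that the second moments are summable, in the spirit of A.4), not the paper's prescription:

```python
import numpy as np

def sample_D(n, k, s=1.0, rng=None):
    """Draw a random lower-triangular matrix D_n satisfying A.1-A.3.

    Illustrative choice: diagonal entries are i.i.d. half-normal and
    strictly-lower entries are i.i.d. normal, both with standard
    deviation s/(n+1), so that sum_n E[(D_n)_{i,j}^2] < infinity.
    """
    rng = rng or np.random.default_rng()
    scale = s / (n + 1)
    D = np.tril(rng.normal(0.0, scale, size=(k, k)), k=-1)   # strict lower part
    D += np.diag(np.abs(rng.normal(0.0, scale, size=k)))     # half-normal diagonal
    return D
```

Each draw D̃_n then yields a random Schoenberg matrix B̃_n = D̃_n D̃_n^⊤ that is positive semidefinite by construction.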
Conditions A.1-A.4, in concert with Proposition 1, allow us to construct a covariance function K̃, a random version of the covariance function defined at (10). We define it through
K̃((x_1, j_1), (x_2, j_2)) = Σ_{n=0}^∞ (D̃_n D̃_n^⊤)_{j_1,j_2} c_n(d, cos θ(x_1, x_2)), (14)
for every x_1, x_2 ∈ S^d and every j_1, j_2 ∈ {1, ..., k}. This implicitly defines a random matrix-valued covariance function C̃, having expansion
C̃(θ) = Σ_{n=0}^∞ B̃_n c_n(d, cos θ), (15)
with B̃_n = D̃_n D̃_n^⊤. The random nature of K̃ and C̃ is ensured provided that they are, actually, a random variable and a random matrix, respectively. This is in turn ensured by the following result.

Proposition 2. Let A be equipped with the topology induced by the uniform (or sup) norm, and consider the sigma-algebra generated by the open sets of this topology. Then the map K̃ : Ω → A defined in Equation (14) is measurable. Therefore, K̃ is a random variable.
The final step in the Bayesian nonparametric construction specified through Conditions A.1-A.4 is to establish the support of the random variable K̃ under a specified topology. We choose to work under the topology of uniform convergence. To establish our next result, an additional condition is needed: A.5 let the support of (D̃_n)_{1,1} be R_+, and let the support of (D̃_n)_{2,1} be R.
With this additional assumption, we are ready to provide the core result of this section.

Theorem 3. Let conditions A.1-A.5 hold true. Then, the support of K̃ in the uniform topology is A.
Theorem 3 shows that the support of the random matrix-valued covariance function C̃ is the whole class of matrix-valued positive definite functions defined by the series representation (7).

Moments, posteriors, and Lipschitz continuities
The previous section developed an approach for Bayesian nonparametric covariance matrix functions that have a well-defined support under the uniform topology. This section provides further inspection of the properties of this proposed framework. We start by exploring the first-order properties of the random covariance function K̃ defined in (14).
We now provide a technical result needed to establish the properties of our proposed model's posterior distribution.
Proposition 5. Let conditions A.1-A.5 hold true. Then, for every positive integer n, for every finite system of pairwise distinct points x_1, ..., x_n ∈ S^d, for any (j_1, ..., j_n) ∈ {1, ..., k}^n and constants c_1, ..., c_n ∈ R with c_l ≠ 0 for some l, the following holds almost surely:
Σ_{l=1}^n Σ_{m=1}^n c_l c_m K̃((x_l, j_l), (x_m, j_m)) > 0.
Rephrased, Proposition 5 shows that K̃ is almost surely strictly positive definite. This will be useful later on.
We are now able to study the posterior properties of the proposed model. Suppose that we have observations z := (z(x_1), ..., z(x_n)) at points x_1, ..., x_n ∈ S^d, sampled from a k-variate Gaussian process on S^d. Therefore, z is a (k × n)-dimensional matrix, and its (j, i)-entry, which we denote by z_j(x_i), is a realization of the random variable Z_j(x_i), for j = 1, ..., k, i = 1, ..., n. The unknown covariance function K̃ in (14) is uniquely determined through the sequence of random matrices D̃_0, D̃_1, ... defined through (11), which we denote by D̃.
Thus, conditional on D̃, Z is a k-variate stationary and isotropic Gaussian process with covariance matrix function C̃. Specifying the probability distribution of D̃ and the conditional distribution of Z given D̃ specifies the joint prior distribution of D̃ and Z. Because conditioning on observations (Z(x_1), ..., Z(x_n)) updates the conditional distribution of the random pair (Z, D̃), we explore the conditional distribution of the parameter D̃ given (Z(x_1), ..., Z(x_n)), namely the posterior distribution of D̃. To do this, we first define vec(z) as the vectorization of the matrix z, obtained by stacking its columns, and let mod denote the modulo operation.
Proposition 6. The posterior P_z of D̃ exists, is unique, and is absolutely continuous with respect to the prior P, with uniquely determined density
(dP_z/dP)(D) = f(z; D) / ∫ f(z; D') P(dD'), (16)
where f(z; D) is the kn-dimensional centred multivariate Gaussian density evaluated at vec(z) with covariance matrix Σ_n(D), whose entries are obtained by evaluating the covariance function (14), with coefficient matrices D, at the appropriate location-component pairs:
(Σ_n(D))_{u,v} = K((x_{⌈u/k⌉}, ((u−1) mod k) + 1), (x_{⌈v/k⌉}, ((v−1) mod k) + 1)), u, v = 1, ..., kn. (17)
The matrix (17) is not singular thanks to Proposition 5. This ensures that the density (16) is well defined.
Importantly, we verify that the posterior distribution P_z of the parameters varies smoothly as a function of z. This relates to robustness to changes in the observed data: this property ensures that small changes in the data lead to small changes in the posterior distribution, and it is usually established through continuity in the Hellinger metric. If this holds, we say that the estimation task is well-posed (see Stuart, 2010; Dashti and Stuart, 2016).
To address well-posedness of the posterior, recall the definition of the Hellinger distance between two probability measures ν_1 and ν_2, dominated by the same measure ν, with Radon-Nikodym derivatives f_1 and f_2, respectively:
d_H(ν_1, ν_2) = ( (1/2) ∫ (√f_1 − √f_2)^2 dν )^{1/2}.
We are now able to illustrate our next finding.
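For two multivariate Gaussians the Hellinger distance has a well-known closed form, which is convenient for checking Lipschitz behavior numerically; a sketch (the function name is ours):

```python
import numpy as np

def hellinger_gaussians(mu1, S1, mu2, S2):
    """Closed-form Hellinger distance between N(mu1, S1) and N(mu2, S2),
    a convenient special case of the general definition in the text:
    H^2 = 1 - det(S1)^{1/4} det(S2)^{1/4} / det(Sbar)^{1/2}
              * exp(-(1/8) (mu1-mu2)^T Sbar^{-1} (mu1-mu2)),
    where Sbar = (S1 + S2) / 2."""
    Sbar = 0.5 * (S1 + S2)
    dmu = np.asarray(mu1, float) - np.asarray(mu2, float)
    coef = (np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25
            / np.sqrt(np.linalg.det(Sbar)))
    expo = np.exp(-0.125 * dmu @ np.linalg.solve(Sbar, dmu))
    return float(np.sqrt(max(0.0, 1.0 - coef * expo)))
```

For instance, two standard normals shifted by one unit are at Hellinger distance √(1 − e^{−1/8}).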
Theorem 7. The posterior P_z given in Proposition 6 is Lipschitz continuous with respect to the Hellinger distance in the data z: for any r > 0, there exists a strictly positive constant m(r) such that, for all z_1, z_2 with ‖vec(z_1)‖, ‖vec(z_2)‖ ≤ r,
d_H(P_{z_1}, P_{z_2}) ≤ m(r) ‖vec(z_1) − vec(z_2)‖.

Approximations on L p spaces
This section suggests a method of approximation of k-variate isotropic random fields Z on S^d, and studies the level of its accuracy in an L^p sense.
The following Karhunen-Loève expansion, proved in Ma (2016) (see also Lu, Leonenko and Ma, 2020), plays a fundamental role in this section. Let {V_n}_{n=0}^∞ be a sequence of independent k-variate random vectors with E V_n = 0 and E V_n V_n^⊤ = A_n, and let U be a random vector uniformly distributed on S^d, independent of {V_n}. Then
Z(x) = Σ_{n=0}^∞ V_n c_n(d, x^⊤ U), x ∈ S^d, (20)
is a k-variate isotropic random field on S^d, with zero mean and covariance matrix function of the form (7), with coefficient matrices proportional to A_n (see Ma, 2016, for the exact normalizing constants). Observe that the realizations of the random field defined in (20) are constant across hyperplanes that intersect the sphere (i.e., constant along arcs over the spherical surface). The vector U determines the orientation of these arcs. Thus, a more flexible variant of the random field constructed above, which preserves its mean and covariance functions, is given by
Z(x) = Σ_{n=0}^∞ V_n c_n(d, x^⊤ U_n), x ∈ S^d, (21)
where U_0, U_1, ... is a sequence of independent random vectors, uniformly distributed on S^d and independent of the vectors V_0, V_1, .... The proof of this more general assertion follows the same arguments reported in Ma (2016) (we provide more details in Appendix A.1).
Remark 8. Notice that both expansions, (20) and (21), generate random fields with the desired covariance structure; however, their finite-dimensional distributions are clearly non-Gaussian. A common strategy to achieve approximately Gaussian random fields is to superimpose several independent realizations (properly rescaled). This procedure is justified by the central limit theorem (for more details, see Lantuéjoul, 2006, and Alegría, Emery and Lantuéjoul, 2020, and the references therein).
For R ∈ N we define the truncated series
Z_R(x) = Σ_{n=0}^R V_n c_n(d, x^⊤ U_n), x ∈ S^d, (22)
that is, the truncated version of the random field Z whose expansion has been defined at (21). To justify the approximation of Z by Z_R, we have to show that Z_R converges to Z, as R → ∞, in several norms that account for the error simultaneously in a stochastic and an analytic way. Precisely, the following result characterizes the rate of convergence of the error Z − Z_R, in the L^p and P-almost sure senses, in terms of the decay of the sequence {‖A_n‖_F}_{n∈N}.
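A direct implementation of the truncated expansion (22) is straightforward. In the sketch below the normalizing constants of Ma (2016) are assumed to be absorbed into the supplied factors, so this is illustrative rather than exact:

```python
import numpy as np
from scipy.special import eval_gegenbauer

def c_n(n, d, x):
    """Normalized Gegenbauer polynomial c_n(d, x), d >= 2."""
    lam = (d - 1) / 2.0
    return eval_gegenbauer(n, lam, x) / eval_gegenbauer(n, lam, 1.0)

def simulate_ZR(x, D_list, rng=None):
    """Sketch of the truncated expansion Z_R at unit vectors x of shape (m, d+1).

    Each D_n in D_list is a Cholesky-type factor whose product D_n D_n^T
    plays the role of the (suitably rescaled) coefficient matrix of order
    n; normalizing constants from Ma (2016) are assumed absorbed here.
    """
    rng = rng or np.random.default_rng()
    m, k = x.shape[0], D_list[0].shape[0]
    d = x.shape[1] - 1
    Z = np.zeros((m, k))
    for n, D in enumerate(D_list):
        U = rng.standard_normal(d + 1)
        U /= np.linalg.norm(U)                # U_n uniform on S^d
        V = D @ rng.standard_normal(k)        # V_n with E V_n V_n^T = D D^T
        Z += np.outer(c_n(n, d, x @ U), V)
    return Z
```

Superimposing several rescaled independent calls to `simulate_ZR`, as in Remark 8, yields approximately Gaussian fields.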
Theorem 9. Let Z be a k-variate isotropic zero-mean Gaussian random field on S^d given by (21). Assume that the trace of the matrix A_n decays algebraically with order d − 1 + ε; i.e., there exist c_0 > 0 and N_0 ∈ N such that
tr(A_n) ≤ c_0 n^{−(d−1+ε)} for all n ≥ N_0, (23)
for some ε > 0. Then, the sequence of truncated random fields {Z_R}_{R∈N} converges to the random field Z in the following senses:
1. in L^2(Ω, L^2(S^d; R^k)), and the truncation error is bounded by
‖Z − Z_R‖_{L^2(Ω,L^2(S^d;R^k))} ≤ c R^{−ε/2}
for R ≥ N_0, where c = c(d, ε) > 0;
2. in L^p(Ω, L^2(S^d; R^k)) for every p > 0; moreover, there exists a constant c = c(d, ε, p) > 0 such that the truncation error is bounded by
‖Z − Z_R‖_{L^p(Ω,L^2(S^d;R^k))} ≤ c R^{−ε/2}
for R ≥ N_0;
3. P-almost surely, and the truncation error is asymptotically bounded by
‖Z − Z_R‖_{L^2(S^d;R^k)} = O(R^{−γ}) almost surely,
for every γ < ε/2.

Numerical illustrations
The truncated expansion (22) can be used to simulate zero-mean random fields with a prescribed second-order dependence structure. The experiments developed in this section are multivariate versions of similar numerical studies reported by Lang and Schwab (2015) for univariate random fields.
In this section, we restrict ourselves to the bivariate case, where the (i, j)th entry of the Schoenberg matrix B_n is given by
(B_n)_{i,j} = ρ_{i,j} (σ_{i,i} σ_{j,j})^{1/2} (b_n)_{i,j},
where σ_{i,i} is the variance of the ith component of the multivariate random field, and ρ_{i,j} is the colocated correlation coefficient; we can write ρ_{i,j} in the equation above because ρ_{i,i} = 1 by construction. Throughout, we assume that σ_{i,i} = 1, for i = 1, 2, so that both components of the random field have unit variance. Our numerical findings are based on the following two special examples for the sequence {(b_n)_{i,j}}.
• F family. Bevilacqua, Diggle and Porcu (2019) have recently proposed a bivariate structure on the basis of the so-called F family of functions (Alegría et al., 2019b). This model is indexed by positive parameters τ_{i,j}, α_{i,j} and ν_{i,j}, and involves the Beta function B(·, ·). We fix τ_{i,j} = 1, for all i, j = 1, 2, since we are essentially interested in the smoothness and range of the associated random field, which are governed by the parameters ν_{i,j} and α_{i,j}, respectively (see Alegría et al., 2019b). The technical condition |ρ_{12}| ≤ B(α_{12}, ν_{12}) / √(B(α_{11}, ν_{11}) B(α_{22}, ν_{22})), stated in Bevilacqua, Diggle and Porcu (2019), is fundamental to ensure the validity of this model. We consider the following scenarios: -Scenario F1. A random field with each component having the same smoothness and range: ν_{i,j} = 2, α_{i,j} = 4, for all i, j = 1, 2. The cross-correlation coefficient is ρ_{12} = 0.85, which means that both fields are strongly positively correlated.
-Scenario F2. Same scenario as F1, but ν_{i,j} = 4, α_{i,j} = 8, for all i, j = 1, 2. The sample paths are expected to be smoother in comparison to Scenario F1. The cross-correlation coefficient is set to ρ_{12} = −0.85, which means that both fields are strongly negatively correlated.
-Scenario F3. A random field with all components having the same smoothness, but different values of α: ν ij = 4, for all i, j = 1, 2, and α 11 = 8, α 22 = 4, and α 12 = 6. Here, the range is smaller for the first random field, since an increase in α corresponds to decreasing the correlation range. The cross correlation coefficient is chosen as ρ 12 = 0.85. Under this choice both fields are strongly positively correlated.
-Scenario F4. A random field with all components having the same range: α ij = 8, for all i, j = 1, 2, and different degrees of smoothness: ν 11 = 2, ν 22 = 4, and ν 12 = 3. Here, the second random field is smoother than the first one. The cross correlation coefficient is chosen as ρ 12 = 0.85. Under this choice both fields are strongly positively correlated.
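The admissibility bound on ρ_{12} stated above can be checked with `scipy.special.beta`; we assume the square root extends over the product of the two Beta functions, and the function name is ours:

```python
from math import sqrt
from scipy.special import beta

def f_family_admissible(rho12, a11, a22, a12, v11, v22, v12):
    """Check the colocated-correlation bound for the bivariate F family,
    |rho_12| <= B(alpha_12, nu_12) / sqrt(B(alpha_11, nu_11) B(alpha_22, nu_22)),
    as stated in the text."""
    bound = beta(a12, v12) / sqrt(beta(a11, v11) * beta(a22, v22))
    return abs(rho12) <= bound
```

Under Scenario F1 (all α = 4, ν = 2) the bound equals 1, so ρ_{12} = 0.85 is admissible; under the Scenario F3 parameters the bound is roughly 0.853, so 0.85 sits just inside it.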
• Multiquadric family. The bivariate multiquadric model is characterized by sequences indexed by parameters ζ_{i,j} ∈ (0, 1) (Ma, 2012; Bevilacqua, Diggle and Porcu, 2019). This model is admissible provided that a parameter restriction given in Bevilacqua, Diggle and Porcu (2019) holds. Unlike the F family, the members of the multiquadric class are infinitely differentiable at the origin, so they always induce random fields with smooth realizations. For this parametric class, we look at the following scenarios: -Scenario M1. We consider ζ_{11} = 0.8, ζ_{22} = 0.7 and ζ_{12} = 0.65, which indicates that the range of the first random field component is smaller. The cross-correlation coefficient is ρ_{12} = 0.65, which means that both fields are positively correlated.
-Scenario M2. Same values for ζ ij as in M1, but the cross correlation coefficient is ρ 12 = −0.65, which means that both fields are negatively correlated.
-Scenario M3. We consider ζ 11 = 0.5, ζ 22 = 0.4 and ζ 12 = 0.35. Again, it indicates that the range of the first random field component is smaller. The cross correlation coefficient is ρ 12 = 0.8, which means that both fields are strongly positively correlated.
-Scenario M4. Same values for ζ ij as in M3, but the cross correlation coefficient is ρ 12 = −0.8, which means that both fields are strongly negatively correlated. Figures 1 and 2 show simulated bivariate random fields on S 2 , from the F and multiquadric families, respectively, over a grid of size 600 × 300, with R = 200. The simulations match the theoretical aspects in terms of smoothness, ranges, and cross-correlation.
We turn to a numerical validation of the truncation error derived above. We simulate random fields over 450 sites (these locations were generated through uniform sampling over the spherical surface), under scenarios F1-F4. We focus on the F family since it has a parameter that controls the asymptotic decay of the Schoenberg sequence. Indeed, the asymptotic behavior (b^F_n)_{i,j} ∼ n^{−(1+ν_{i,j})} (Alegría et al., 2019b) implies that condition (23) is satisfied with ε = min{ν_{1,1}, ν_{2,2}}. We take as the "true" random field the truncated expansion (22) with R = 500. We gradually truncate the expansion at different values of R and measure the L^2 discrepancy between the truncated and true fields. For each scenario, we repeat this experiment 100 times. Observe that the uniformly distributed spatial locations, together with the independent repetitions of the experiment, allow us to approximate the integrals involved in the L^2 error by means of a Monte Carlo argument. Figure 3 displays the results. We observe that the empirical error matches the theoretical one for each scenario: the error decays algebraically with order R^{−min{ν_{1,1}, ν_{2,2}}/2}.
Although we have paid attention to the two parametric models described above, additional parametric families of Schoenberg coefficients can be employed (see, e.g., Ma, 2012, and Fuentes, 2016).

NCEP/NCAR bivariate flux reanalysis
In this section, we apply the proposed modeling framework presented in Section 3 to a bivariate meteorological dataset. Recent climatic changes have affected various atmospheric constituents and downward solar radiation flux (see, e.g., Forster et al., 2007;Wild, 2009). Downward solar radiation flux (DSRF throughout) is an important component of the Earth's climate system, affecting, among other things, surface temperature and the hydrologic cycle. DSRF is affected by atmospheric pressures at various levels because these directly affect cloud cover.
In this data illustration, we consider a bivariate spatial data set from two 2019 NCEP/NCAR Flux Reanalyses (Kalnay et al., 1996): atmospheric pressure at high cloud bottom and DSRF averaged over 2019. Of course, by averaging over time, we average over possibly important temporal patterns in these data. We plot these data in Figure 4. These data show apparent negative correlation and strong spatial patterns.  For scalability, we thin these data to n s = 1568 locations such that each remaining observation represents approximately the same area. Thus, we thin more aggressively in polar regions than in equatorial regions. For simplicity in modeling our data, we subtract off the mean and divide by the standard deviation of each quantity so that the sample mean and variance are 0 and 1, respectively.
To specify a covariance matrix function of the form (7), we use independent prior distributions for the elements of D̃_n that follow Assumptions A.1-A.5. We assume a prior of the form
D̃_n = ( σ_{1n} 0 ; a_n σ_{2n} ),
where σ_{1n} and σ_{2n} follow Gamma distributions with unit shape, scaled so that E(σ_{1n}^2) = s_1^2/(n+1)^2 and E(σ_{2n}^2) = s_2^2/(n+1)^2, and a_n ∼ N(0, s_1^2/(n+1)^2), for n = 0, 1, 2, ..., where N denotes the normal distribution. Here, we assume that all parameters are a priori independent; thus, Assumptions A.2 and A.3 are met. Assumption A.1 is met because D̃_n is lower triangular. Because E(σ_{1n}^2) = E(a_n^2) = s_1^2/(n+1)^2, with Σ_{n=0}^∞ s_1^2/(n+1)^2 = s_1^2 π^2/6, Assumption A.4 is satisfied. Given the proposed prior distributions, Assumption A.5 is also satisfied, although the Gamma density is not always defined at 0.
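A draw from a prior of this form (for the bivariate case) might look as follows. The exact Gamma parameterization is our assumption, chosen only so that E(σ^2) matches the stated s_1^2/(n+1)^2 decay:

```python
import numpy as np

def sample_Dn_prior(n, s=1.0, rng=None):
    """Illustrative draw from a shrinkage prior of the stated form, k = 2.

    sigma_{1n}, sigma_{2n} ~ Gamma(shape=1, scale) with the scale chosen
    so that E(sigma^2) = s^2/(n+1)^2 (for Gamma(1, b), E X^2 = 2 b^2),
    and a_n ~ N(0, s^2/(n+1)^2); the precise hyper-parameterization is
    an assumption here, not the paper's exact specification.
    """
    rng = rng or np.random.default_rng()
    scale = s / ((n + 1) * np.sqrt(2.0))
    s1 = rng.gamma(1.0, scale)
    s2 = rng.gamma(1.0, scale)
    a = rng.normal(0.0, s / (n + 1))
    return np.array([[s1, 0.0],
                     [a,  s2]])
```

The 1/(n+1) scaling makes draws of high-order matrices concentrate near zero, which is exactly the shrinkage behavior discussed next.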
Intuitively, the prior distribution provides increased shrinkage or penalization for higher-order terms, effectively pushing these parameters to 0. This type of prior distribution allows us to fit models with many Gegenbauer polynomial terms without many of the issues associated with including many parameters in a model (e.g., overfitting and high parameter uncertainty). Moreover, this prior distribution onD n yields positive definiteB n which, when combined with Gegenbauer polynomials of cos(θ), yield positive definite matrix functions with the great-circle distance as a metric.
We add two practical considerations. First, to account for variation over very short distances in each outcome, we include nugget effects τ_1^2 and τ_2^2. Second, although this model is only guaranteed to be positive definite when using all terms in (7), for practical reasons we focus on finite truncations of (7). We explore model fit as a function of the truncation number (i.e., the number of Gegenbauer polynomials included, n = 0, ..., N). The nugget effects τ_1^2 and τ_2^2 ensure positive definiteness. We assume a priori that these nugget terms come from a uniform distribution between 10^{−4} and 1. The lower bound of 10^{−4} ensures that the joint covariance matrix is positive definite. The upper bound of 1 is set because we have scaled both variables to have a variance of 1.
Let Y_1 be atmospheric pressure at high cloud bottom and Y_2 be downward solar flux. The joint model for Y_1 and Y_2 is
(Y_1^⊤, Y_2^⊤)^⊤ ∼ N( 0, Σ_{n=0}^N B̃_n ⊗ C_n + diag(τ_1^2, τ_2^2) ⊗ I(n_s) ),
where ⊗ is the Kronecker product, C_n is an n_s × n_s matrix of the nth-order Gegenbauer polynomial evaluated at the cosine of the great-circle distances between locations on the sphere, and diag(τ_1^2, τ_2^2) is a 2 × 2 diagonal matrix. Because of the clear trends in latitude, the absolute value of latitude could be incorporated as a covariate in this model. In this illustration, however, we present a simple model featuring our nonparametric covariance model. We fit the model using adaptive Markov chain Monte Carlo (MCMC) (see Roberts and Rosenthal, 2009, for a discussion of adaptive MCMC). Specifically, we perform multivariate adaptive Metropolis-Hastings updates on the log scale for σ_{1n} and σ_{2n}, and on the untransformed scale for a_n (Haario, Saksman and Tamminen, 2001). We update each of these blocks in a separate step. We jointly propose candidates σ_{1n}, for n = 0, 1, ..., N, from a multivariate log-normal distribution using the natural log of the current values of σ_{1n} as the mean. We propose values for σ_{2n} similarly. For a_n, we sample candidate values from a multivariate normal distribution centered at the current values. We tune the proposal covariance using the covariance of previous samples of a_n and of the natural log of σ_{1n} and σ_{2n}, as well as adding a small number to the diagonal of these covariance matrices (Haario, Saksman and Tamminen, 2001). We update τ_1^2 and τ_2^2 univariately using an adaptive Metropolis-Hastings algorithm with a log-normal random walk, where the candidate variance is tuned to obtain an acceptance rate between 0.2 and 0.5, a rule presented in Roberts, Gelman and Gilks (1997). We fit the model using N = 2, 4, 6, 8, 10, 15, 20, 30 Gegenbauer polynomials, with either s_1 = s_2 = 1 or s_1 = s_2 = 100 in the prior distributions, giving 16 models in total. For each model, we calculate the mean log-likelihood and the deviance information criterion (DIC).
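The adaptive step above can be sketched as follows, in the spirit of Haario, Saksman and Tamminen (2001): the proposal covariance is the empirical covariance of past samples, scaled and regularized. This is a generic sketch, not the authors' exact sampler:

```python
import numpy as np

def adaptive_metropolis(logpost, x0, n_iter=5000, eps=1e-6, rng=None):
    """Minimal adaptive random-walk Metropolis sampler.

    After a short warm-up the proposal covariance is
    (2.38^2 / d) * Cov(past samples) + eps * I, following Haario,
    Saksman and Tamminen (2001)."""
    rng = rng or np.random.default_rng(0)
    d = len(x0)
    chain = np.empty((n_iter, d))
    x, lp = np.asarray(x0, float), logpost(x0)
    for t in range(n_iter):
        if t < 2 * d:
            C = 0.1 * np.eye(d)                 # fixed warm-up proposal
        else:
            C = (2.38 ** 2 / d) * np.cov(chain[:t].T) + eps * np.eye(d)
        prop = rng.multivariate_normal(x, C)
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject
            x, lp = prop, lp_prop
        chain[t] = x
    return chain
```

In the paper's setting, `logpost` would combine the Gaussian likelihood of the data with the shrinkage prior on the block being updated (log-transformed for the σ parameters).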
Using these values, we explore how model fit changes with the number of Gegenbauer polynomials included and with the variance of the prior distribution (i.e., the degree of shrinkage or penalization).
In Figure 5, we plot the log-likelihood averaged over all post-burn-in iterations of our MCMC sampler. While adding Gegenbauer polynomials increases the log-likelihood, the improvements diminish as the number of terms increases. We also plot in Figure 5 the improvement in DIC (smaller is better) obtained with the lower-variance prior distribution. We find that the lower-variance prior distribution yields lower (better) DIC for all models. However, the relative improvement decreases as the number of terms increases, likely because the higher-variance model still imposes heavy shrinkage on high-order polynomials. Here, we do not make any claims about how the number of Gegenbauer polynomials affects predictive performance.
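For reference, DIC can be computed directly from the post-burn-in log-likelihood trace; a minimal sketch (the function name and interface are ours, not the authors'):

```python
import numpy as np

def dic(loglik_trace, loglik_at_posterior_mean):
    """Deviance Information Criterion (Spiegelhalter et al., 2002).

    DIC = D(theta_hat) + 2 * p_D, where D = -2 * log-likelihood and
    p_D = mean(D) - D(theta_hat) is the effective number of parameters.
    """
    d_bar = -2.0 * np.mean(loglik_trace)      # posterior mean deviance
    d_hat = -2.0 * loglik_at_posterior_mean   # deviance at the posterior mean
    p_d = d_bar - d_hat
    return d_hat + 2.0 * p_d
```

Smaller DIC indicates a better trade-off between fit (d_hat) and model complexity (p_d).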
For the model with N = 30 Gegenbauer polynomials and higher prior shrinkage (s_1 = s_2 = 1), we summarize the behavior of the resulting covariance matrix function. At the estimated posterior mean, we plot C_{11}(θ), C_{12}(θ), and C_{22}(θ) in Figure 6. Defining the effective range as the distance at which the covariance (or cross-covariance) is 5% of its most extreme value, we find that the estimated effective ranges of C_{11}(θ), C_{12}(θ), and C_{22}(θ) are 7250, 6880, and 8060 km, respectively. These long effective ranges are unsurprising given the relatively smooth maps we observe in Figure 4.
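The effective-range computation can be sketched as a search along a discretized covariance curve; this is an illustrative helper under the 5% definition used above, not the authors' code:

```python
import numpy as np

def effective_range(theta, cov_vals, level=0.05):
    """Smallest distance at which |C(theta)| has fallen to `level` times its
    most extreme absolute value and stays below that threshold thereafter.

    theta    : increasing grid of distances
    cov_vals : covariance (or cross-covariance) evaluated on the grid
    """
    thresh = level * np.max(np.abs(cov_vals))
    below = np.abs(cov_vals) < thresh
    for i in range(len(theta)):
        if below[i:].all():        # below threshold from here on
            return theta[i]
    return np.inf                  # never falls below the threshold on this grid
```

Requiring the curve to stay below the threshold guards against oscillating cross-covariances crossing zero before their final decay.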
In this section, we model two negatively correlated NCEP/NCAR reanalysis datasets, downward solar flux and atmospheric pressure at high cloud bottom, using a Bayesian nonparametric covariance matrix function constructed through Gegenbauer polynomials to model spatial cross-covariance. In our analysis, we focus on the degree of shrinkage imposed by our prior distribution and the number of Gegenbauer polynomials in the cross-covariance function. We only considered models with a fixed and finite number of Gegenbauer polynomials. While it is not possible to use an infinite number of Gegenbauer polynomials, future extensions of this model could allow the number of terms to be random. If the number of components comes from a Poisson prior distribution, then model fitting would require reversible jump MCMC (Green, 1995) or a point process representation of the Poisson prior distribution (Stephens, 2000). Alternatively, scaling the prior distributions of D_n using stick-breaking weights (Sethuraman, 1994) may be a possible route to learning the number of polynomials needed.
One of the most fundamental limitations of our analysis is that our covariance model is isotropic and stationary. Therefore, the estimated bivariate cross-covariance function represents the estimated global covariance pattern as a function of distance, without accounting for regional differences. Because global meteorological data over the Earth often exhibit anisotropies (Sahoo, Guinness and Reich, 2019), the estimated covariance functions may not be representative of local covariance patterns, and the results must be interpreted with this limitation in mind. In Section 6 we discuss some extensions of this framework that could address these limitations.

Conclusion
We have provided a Bayesian nonparametric modeling framework for estimating matrix-valued covariance functions. Practical implementation of the estimation techniques proposed in this paper requires truncation of the Schoenberg expansion. Thus, we studied approximations of k-variate isotropic random fields and their accuracy in an L^p sense, both theoretically and through simulation. In addition, we have explored the effect of the prior distribution and of truncation on model fit using a global climate reanalysis dataset.
We have worked under the assumption of geodesic isotropy, which might be questionable for some global data sets where local geographic conditions lead to anisotropy. In future analyses, we could explore approaches for incorporating anisotropy or nonstationarity within our proposed framework. Climate data also typically exhibit axial symmetry; that is, the covariance function is stationary by longitude and nonstationary by latitude (Jones, 1963;Stein, 2007). Extending this paper to the case of axial symmetry will require considerable technical work, as spectral representations for the axially symmetric case are extremely complicated.
In Section 5, we focus our analysis on how the number of Gegenbauer polynomial terms and prior shrinkage affect fit and model complexity. In future work, one could explore more thoroughly various prior formulations on the matrices B_n in (7) and their resulting posteriors. As part of this study, there may be analogs between specific prior distributions on B_n and the closed-form representations presented in Guinness and Fuentes (2016) and Ma (2012). In addition, we suggest exploring this framework in non-Gaussian settings (as in Terdik, 2015) and for tensor-valued quantities (see, e.g., Leonenko and Malyarenko, 2017).
One very promising application for this modeling framework is the study of Cosmic Microwave Background (CMB) radiation, including temperature (often called T) as a function of redshift and polarizations (both electric and magnetic modes, called E and B modes) (see, for example, Fixsen, 2009;Durrer, 2020). These data provide an application where the flexible models presented here could provide insight into early universe cosmology (Cabella and Marinucci, 2009;Marinucci and Peccati, 2011). However, because these datasets are generally large, one would likely need to consider low-rank (see, e.g., Higdon, 1998;Banerjee et al., 2008;Cressie and Johannesson, 2008) or directed acyclic graphical models (see Katzfuss and Guinness, 2021, for a recent review).

A.1. An alternative Karhunen-Loève expansion
The goal of this appendix section is to justify that the expansion (21) produces a random field with the covariance function (7). Firstly, it is clear that such a random field has zero mean (see Ma, 2016, for details). Secondly, observe that the convergence of the series Σ_{n=0}^∞ A_n C_n^{(d−1)/2}(1) is fundamental to ensure that the random field (21) has finite variance, so that the covariance function is well defined. We proceed to calculate the covariance function of the random field defined in (21). In the preceding equalities, we used the independence between the V_n's and U_n's, as well as the fact that E(V_n V_m^T) = δ_{n=m} α_n^2 I(k). Finally, applying the integral identity of Ma (2016) yields the covariance function (7), as expected.
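The paper's expansion (21) is multivariate; as a univariate (k = 1, d = 2) illustration of the same Karhunen-Loève principle, one can simulate an isotropic field on S^2 from independent Gaussian coefficients against real spherical harmonics, whose covariance is Σ_n σ_n^2 (2n+1)/(4π) P_n(cos θ) by the addition theorem. This sketch uses our own normalization conventions:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def real_sph_harm(m, n, lon, colat):
    """Real, fully normalized spherical harmonics via associated Legendre functions."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    p = lpmv(am, n, np.cos(colat))
    if m > 0:
        return np.sqrt(2.0) * norm * p * np.cos(m * lon)
    if m < 0:
        return np.sqrt(2.0) * norm * p * np.sin(am * lon)
    return norm * p

def simulate_isotropic_field(lon, lat, sigma2, rng):
    """Zero-mean isotropic Gaussian field on S^2 via a truncated Karhunen-Loeve
    expansion; its covariance is sum_n sigma2[n]*(2n+1)/(4*pi)*P_n(cos theta)."""
    colat = np.pi / 2.0 - lat                       # colatitude from latitude
    z = np.zeros_like(np.asarray(lon, dtype=float))
    for n, s2 in enumerate(sigma2):
        for m in range(-n, n + 1):
            # independent N(0, sigma2[n]) coefficient for each harmonic
            z += rng.normal() * np.sqrt(s2) * real_sph_harm(m, n, lon, colat)
    return z
```

In particular, the pointwise variance of the simulated field equals Σ_n σ_n^2 (2n+1)/(4π), which is the univariate analogue of requiring Σ_n A_n C_n^{(d−1)/2}(1) to converge.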

A.2. Proof of Proposition 1
If j = l, by condition A.4, the monotone convergence theorem allows the series and the expectation to be swapped, and therefore Σ_{n=0}^∞ ((D_n)_{j,r})^2 < ∞ almost surely, since any random variable with finite expectation is finite almost surely. If j ≠ l, for the same reason, it is sufficient to verify the analogous summability condition. By condition A.3, (D_n)_{j,r} and (D_n)_{l,r} are independent, and therefore it is sufficient to bound the corresponding product of expectations. This follows from condition A.4 and Jensen's inequality.

A.4. Proof of Theorem 3
In what follows, we let ‖f‖_∞ = sup_E |f| denote the supremum norm of a bounded function f : E → R.
Lemma 10. For every K in A satisfying (10) and (11), and every ε > 0, there exist an integer m ≥ 1 and δ > 0 such that: Proof. In the rest of the proof, let K̃, D̃_n and B̃_n stand for K̃(ω), D̃_n(ω) and B̃_n(ω), respectively. We aim to prove that for every ε > 0 there exist an integer m ≥ 1 and δ > 0 such that the desired bound holds whenever |(D̃_n)_{j_1,j_2} − (D_n)_{j_1,j_2}| ≤ δ for every j_1, j_2 = 1, . . . , k. To this end, note that the series (10) converges uniformly by the Weierstrass M-test, since ‖c_n(d, ·)‖_∞ ≤ 1. Clearly, the same is true for K̃. Hence, there is an integer m ≥ 1 beyond which the tails of both series are uniformly small. Therefore, (36) implies a bound which is smaller than ε if δ is small enough, and the proof is complete.
Proof of Theorem 3. The event appearing on the right-hand side of (35) is equal to an intersection of events, which are independent since the elements of the random array {(D_n)_{j_1,j_2} : n = 0, 1, . . . ; j_1, j_2 = 1, . . . , k} are independent. Thanks to Lemma 10, the proof is completed by noting that each event in the intersection has positive probability. Indeed, for each n = 0, 1, . . . , we assume that the support of (D_n)_{1,1} is the whole of [0, ∞) and the support of (D_n)_{2,1} is the whole of R.

A.6. Proof of Proposition 5
The claim follows by combining condition A.5 with Theorem 3.5 of Guella and Menegatto (2018).

A.7. Proof of Proposition 6
The proof is based on an application of Bayes' theorem as stated in general form in Theorem 1.31 of Schervish (1995). In particular, that theorem ensures the existence of the conditional distribution of D given vec(Z(x_1), . . . , Z(x_n)). The theorem can be applied since the conditional distribution of vec(Z(x_1), . . . To understand the first inequality, note that f(vec(z_1); W) ≤ f(vec(z_2); W) implies that M(z_1) ≤ M(z_2), and therefore (f(vec(z_1); W) − f(vec(z_2); W))(M(z_2) − M(z_1)) ≤ 0 for any pair (z_1, z_2).

A.9. Proof of Theorem 9
By (2) and (3) we extract the relevant expansion. By the Fubini-Tonelli theorem, the mixed norm (5) takes a form where, for the second equality, we used the independence between the V_n's and U_n's, whereas for the third equality we used the fact that E(V_n V_m^T) = δ_{n=m} α_n^2 I(k). The identity (32) then implies that ‖Z − Z_R‖²_{L²(Ω, L²(S^d; R^k))} = Σ_{n>R} trace(A_n) C_n^{(d−1)/2}(1).
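The identity above can be evaluated numerically. A small sketch with a hypothetical, summable spectral sequence trace(A_n); it also illustrates that faster spectral decay gives faster decay of the truncation error in R:

```python
import numpy as np
from scipy.special import eval_gegenbauer

def l2_truncation_error(traces, R, d=2):
    """||Z - Z_R||_{L^2(Omega, L^2(S^d; R^k))} = sqrt(sum_{n>R} trace(A_n) * C_n^{(d-1)/2}(1)).

    traces : trace(A_n) for n = 0, ..., N (a hypothetical summable sequence,
             standing in for the tail of the Schoenberg expansion)
    """
    err2 = sum(t * eval_gegenbauer(n, (d - 1) / 2, 1.0)
               for n, t in enumerate(traces) if n > R)
    return float(np.sqrt(err2))
```

For d = 2 one has C_n^{1/2}(1) = P_n(1) = 1, so geometric traces 2^{-n} give a squared error of exactly Σ_{n>R} 2^{-n} ≈ 2^{-R}.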
To show (25), we begin with the case p < 2. Applying the Hölder inequality on a probability space with index 2/p > 1 directly gives ‖Z − Z_R‖_{L^p(Ω, L²(S^d; R^k))} ≤ ‖Z − Z_R‖_{L²(Ω, L²(S^d; R^k))}. We now consider p > 2. We can choose ν = ν_p ∈ N such that 2(ν − 1) < p ≤ 2ν. Similarly to the case p < 2, the Hölder inequality with index 2ν/p yields ‖Z − Z_R‖_{L^p(Ω, L²(S^d; R^k))} ≤ ‖Z − Z_R‖_{L^{2ν}(Ω, L²(S^d; R^k))}.
For the last claim, we have to prove that P(‖Z − Z_R‖_{L²(S^d; R^k)} ≥ R^{−γ} for infinitely many R ∈ N) = 0.
This will be a consequence of the Borel-Cantelli lemma (Klenke, 2008, Theorem 2.7). It suffices to prove that the corresponding probabilities are summable over R. Let p > 0. By Chebyshev's inequality, (25), and the mixed norm (4), we derive the required bound.