Space-Time Covariance Models on Networks with An Application on Streams

The second-order, small-scale dependence structure of a stochastic process defined in the space-time domain is key to prediction (or kriging). While great efforts have been dedicated to developing models for cases in which the spatial domain is either a finite-dimensional Euclidean space or a unit sphere, counterpart developments on a generalized linear network are practically non-existent. To fill this gap, we develop a broad range of parametric, non-separable space-time covariance models on generalized linear networks and then an important subgroup -- Euclidean trees by the space embedding technique -- in concert with the generalized Gneiting class of models and 1-symmetric characteristic functions in the literature, and the scale mixture approach. We give examples from each class of models and investigate the geometric features of these covariance functions near the origin and at infinity. We also show the linkage between different classes of space-time covariance models on Euclidean trees. We illustrate the use of models constructed by different methodologies on a daily stream temperature data set and compare model predictive performance by cross validation.


Introduction 1.Background
Despite its wide variety of applications in different scientific disciplines, including environmental (see for example, Ver Hoef et al. (2006); Cressie et al. (2006); Ver Hoef and Peterson (2010);and ODonnell et al. (2014)), neurological (Jammalamadaka et al. (2013); Baddeley et al. (2014)), ecological (Ang et al., 2012) and social sciences (Ang et al. (2012); Baddeley et al. (2017)), the study of a random process observed on a network is a relatively new area in spatial statistics.Observations that are closer together in space tend to be more alike than observations far apart (Tobler, 1970).Thus, the small-scale (covariance) structure of a geostatistical process is usually assumed to be a function of distance between spatial locations.On a network, it is possible that two sampling sites are close together in the sense of Euclidean distance, but are far apart within the network.Under such a circumstance, it is more reasonable to use the alternative metric when modeling the dependence structure.
However, merely replacing the Euclidean distance in a standard geostatistical model with the shortest path within the network may lead to an invalid (not positive definite) covariance function on the network, and thus result in negative prediction variances (Ver Hoef et al., 2006).
In a finite-dimensional Euclidean space, the well-known Bochner's theorem (Bochner, 1955, pp. 58) fully characterizes the class of stationary continuous covariance functions as Fourier transforms of finite, nonnegative measures.Though this powerful theorem provides a sufficient and necessary condition for positive definiteness, closed-form Fourier inversions do not always exist.Schoenberg's result (Schoenberg, 1938), on the other hand, is Fourier transform free.It reveals the one-to-one relationship between isotropic covariance functions and completely monotone functions in a Hilbert space.Quite a few non-separable para-metric spatio-temporal covariance functions have been developed based on Bochner's and Schoenberg's theorems; see, for example, Cressie and Huang (1999) and Gneiting (2002).Yaglom (1987, pp. 526)'s kernel convolution-based approach can also be generalized into the space-time domain (Rodrigues and Diggle, 2010).
Covariance functions receive special attention due to the fact that a Gaussian random process is completely determined by its first-and second-order moments (Porcu et al., 2020).Although a broad range of classes of space-time covariance models are available in Euclidean space (De Iaco et al., 2013) and a thorough review has recently been given by Porcu et al. (2020), corresponding results for pure spatial linear networks are few and far between -a recent exception being Anderes et al. (2020) -and space-time results on networks are practically non-existent.To fill this gap, in this article we develop Fourier-free space-time covariance functions on generalized linear networks, using space embedding and scale mixture approaches.

Parametric Space-time Covariance Models
Let {Z(s; t) : (s, t) ∈ D × R} denote a univariate, real-valued, continuously-indexed stochastic process on the product space of a spatial domain D and a temporal domain R.In the literature, the spatial domain is usually taken to be either a finite-dimensional Euclidean space (D = R n ) or a unit sphere (D = S n ) (Porcu et al., 2020).In contrast, we consider a random process on a generalized linear network (D = G) whose definition will be given in the subsequent section.Assume that the first two moments of the random process exist and that the mean structure µ(s; t), which measures the global scale space-time variability, can be fixed as a constant, i.e. µ(s; t) ≡ µ, or modeled as a linear combination of covariates of interest, i.e. µ(s; t) = x(s; t) β.The second-order, small-scale dependence structure, commonly described by a parametric covariance function, is key to space-time prediction (or kriging) and regression-type parameter estimation and is the main focus of this paper.
We do not distinguish between Gaussian and non-Gaussian processes unless necessary since the covariance function plays an important role in either situation.
From the definition, C is symmetric, i.e., C(s 1 , s 2 ; t 1 , t 2 ) = C(s 2 , s 1 ; t 2 , t 1 ).Moreover, a covariance function must be positive definite, meaning that a i a j C(s i , s j ; t i , t j ) ≥ 0 (1) for any finite collections of {a i } N i=1 ⊂ R and {(s i , t i )} N i=1 ⊂ D × R. Covariance functions which fail to satisfy this condition are likely to lead to negative prediction variances and undefined probability densities.Whenever a function C satisfies the symmetry and positive definiteness conditions, we call it a valid covariance function.
In geostatistics, a common assumption made by practitioners is second-order stationarity, which requires that the overall mean is constant and that the covariance function depends on the spatial locations only through their relative positions.Moreover, in Euclidean space, a covariance function is called isotropic if it is a function of the Euclidean norm of the difference between locations.Unlike its counterpart in Euclidean space, the definition of stationarity is less clear on networks (Anderes et al., 2020).Nevertheless, a space-time covariance function is said to be isotropic within components if C(s 1 , s is a distance metric (Anderes et al., 2020) is the absolute difference between times.We call such an f a radial profile function.We work with either the covariance function or the radial profile function, denoting both by C, when the context causes no confusion.By assuming isotropy, the model guarantees that the covariance function is fully symmetric (Gneiting, 2002) When it comes to spatio-temporal covariance models, assuming separability is a convenient starting point (Rodrigues and Diggle, 2010).Specifically, a space-time model is separable if it can be written as a product or a sum of pure spatial and pure temporal models, i.e., for all space-time coordinates (s 1 , t 1 ), (s 2 , t 2 ) ∈ D × R. Given that C S and C T are valid covariance functions on D and R, the product or the sum is valid on D × R. It has been argued that the class of separable covariance models is severely limited due to the lack of space-time interaction (De Iaco et al., 2013), and that in many cases it implies "unphysical dependence among process variables" (Porcu et al., 2020).We limit our attention to nonseparable covariance functions in this paper.

Overview and Contributions
In this paper, we adopt both the space embedding technique and the scale mixture approach to construct a broad range of valid space-time covariance functions on generalized linear networks and/or Euclidean trees.The rest of the paper is organized as follows.Section 2 reviews preliminaries about generalized linear networks equipped with two distance metrics: resistance distance and geodesic distance.Section 3 gives sufficient conditions for constructing isotropic space-time covariance models by space embedding on arbitrary generalized linear networks and then on an important subgroup -Euclidean trees.Besides deriving space-time covariance functions on directed Euclidean trees based on the scale mixture approach and the convex cone property in Section 4, we also show that the exponential tail-down model (Ver Hoef and Peterson, 2010) is the one and only that is directionless (i.e.isotropic) and is thereby a bridge between models in Section 3 and Section 4.
In Section 5, we apply a few space-time covariance models on daily stream temperature measurements from a stream network in the northwest United States and compare model predictive performance.Section 6 concludes the paper with discussion.
2 Generalized Linear Networks and Distance Metrics

Generalized Linear Networks
A network, also called a graph, is a collection of vertices (nodes) joined by edges (Newman, 2010, chap. 1) and is denoted by the pair (V, E).A linear network is the union of finitely many line segments in the plane where different edges only possibly intersect with each other at one of their vertices (see Figure 1).It is useful to associate each edge with a positive real number, which is called the weight.Weights can be physical edge lengths, strengths, etc.The space-time covariance functions in our paper are defined on generalized linear networks, whose definition was introduced by Anderes et for every e ∈ E connecting two vertices u, v ∈ V.
In our work, we assume that the topological structure of G does not evolve over time.Any arbitrary site s on such a network G is denoted as s ∈ G = s ∈ V ∪ e∈E .Graphs with Euclidean edges extend the notion of traditional linear networks by including graphs which do not have a planar representation in R 2 (see, e.g., Figure 3 of Anderes et al. (2020)).We use the terms generalized linear networks and graphs with Euclidean edges interchangeably in this paper.Any tree-like graph (V, E) is planar and can be constructed as a graph with Euclidean edges easily.We call such a graph a Euclidean tree and denote it by T .Vertices of a Euclidean tree connected with only one edge are called leaves.

Distance Metrics
Let Φ(D, d) denote the class of radial profile functions such that for any f ∈ Φ(D, d), for any finite collection {a i } N i=1 ⊂ R and {s i } N i=1 ⊂ D. For any f ∈ Φ(D, d), we say that f is positive definite on D with respect to distance d.When the space domain is a finitedimensional Euclidean space D = R n , for any x, y ∈ R n , where x = (x 1 , • • • , x n ) and y = (y 1 , • • • , y n ) , let d p be the standard l p norm with When the space domain is a real Hilbert space H, the norm, denoted by Equality holds if and only if G is a Euclidean tree.
Since, by Proposition 1, d R,T = d G,T , henceforth we let d •,T denote either one.

Isotropic Space-time Models by Space Embedding
In this section, we adopt the space embedding technique to transform the abstract, less familiar spatial domain, G, to simpler, well-studied spaces, e.g.Hilbert and Euclidean spaces, and build isotropic space-time models from there.
We follow the definition of isometric spaces given by Anderes et al. (2020) for all s 1 , s 2 ∈ G.In the special case in which, G is a Euclidean tree, the above result also holds for the geodesic distance.
The so-called Gneiting class of covariance functions has been especially popular in space-time geostatistical modeling (Porcu et al., 2020) and will be the building block of the isotropic covariance functions given in this section.Despite some discrepancy in the literature, here a function ϕ ), infinitely differentiable on (0, ∞) and (−1) j ϕ (j) (t) ≥ 0 over (0, ∞) for every integer j ≥ 0, where ϕ (j) denotes the jth derivative of ϕ and ϕ for any finite collections of {a i } N i=1 ⊂ R and {s i } N i=1 ⊂ D, given N i=1 a i = 0.For such a function we write f ∈ CN D(D, d).
We are now ready to formulate and prove our first main result.Denote the generalized Gneiting class of continuous functions by G α , where with ψ and ϕ being strictly positive and continuous.Theorem 3.2 in Menegatto et al.
(2020) provides sufficient conditions for G α to be positive definite over the product space of a quasi-metric space and a finite Euclidean space, which extends Gneiting ( 2002)'s results in Euclidean spaces and will be applied on G × R.
Theorem 2. (generalized Gneiting class) Let G α be the function defined above with ϕ(•) being completely monotone.Assume that a ∈ (0, 1] and α ≥ 1/2.Then for any pairs where G is equipped with the resistance distance d R,G , the following statements are true: where g is a positive Bernstein function and Moreover, when G is a Euclidean tree, the above results hold for d G,G as well.
Proof.The symmetry of the functions defined in all three parts is obvious.A distance space (D, d) defined in this paper is also a quasi-metric space in Menegatto et al. ( 2020 Then by definition, there exists the following relationship for all s 1 , s 2 ∈ G, between γ and f .Hence, given any radial profile For instance, ψ(t Table 1) and also belongs to CN D(T , d •,T ) (see Lemma 3 in Supplement A).
When b = 1 in (c), both (a) and (c) give the same subclass of valid covariance functions on T × R.
In addition to constructing valid covariance functions on G × R, we also investigate the geometric features of marginal functions whose definition will be given shortly, near the origin and at infinity.It has been discussed in De Iaco (2010) that spatial and temporal marginals, defined as f S (d) := f (d, 0) and f T (u) := f (0, u), respectively (where f denotes the space-time radial profile function), play a significant role in selecting an appropriate and physically meaningful class of covariances in applications.By comparing empirical covariance functions with estimated ones, any obvious disagreement would indicate model misspecification (Stein, 2005).
It is clear that the covariance function, as well as both marginal functions, constructed by Theorem 2 are continuous at the origin since ψ and ϕ are continuous on [0, ∞).Although Lévy-Khinchin's formula (Berg, 2008, pp. 15-45) ϕ is completely monotone and ψ is a positive Bernstein function.
Then the spatial f S and temporal f T marginal functions, i.e., f since ϕ and ψ are positive functions and the derivative ψ is completely monotonic, thus nonnegative.Similarly, one can show the first-order derivative of the temporal marginal is ϕ is completely monotone and ψ is a positive Bernstein function.
Then the spatial marginal function Proof.By direct calculation, the second-order derivative of the spatial marginal is where each term in the brackets is non-positive for d ∈ (0, ∞).
Note that the temporal marginal function f T does not share the convexity property in general.Justified by Proposition 2, space-time covariance functions constructed by Theorem 2(c) satisfy the physical law (Tobler, 1970) which says observations that are closer in space and time have higher correlation.The asymptotic behavior of the model, e.g.lim d→∞ G α d b , u 2a for fixed u and lim u→∞ G α d b , u 2a for fixed d, depends on the asymptotic behavior of ψ and ϕ, respectively, thus does not present a unified conclusion, in general.
More completely monotone functions can be built by constructive tools, such as the property that the class of functions is closed under addition and multiplication (Miller and Samko, 2001).We show the application of Theorem 2 by a couple of examples.
Example 1.Consider the completely monotone function and the Bernstein function from the first row of Table 1.To avoid the model being overly complicated, assume α = 1 2 , a = 1 and write the geodesic distance between sites as d and the time lag as u throughout.The space-time covariance function C 0 given by Theorem 2(c) may be written as follows: where κ > 0, 0 < b ≤ 1 and δ ≥ 0. By the Schur product theorem (Schur, 1911), it follows that the product of C 0 and C S also defines a valid space-time covariance function on G × R.
The spatial and temporal marginals, as well as the covariance function itself, all decay to zero as d → ∞ and/or u → ∞, which indicates the same variability in the spatial and temporal components in the sense discussed by De Iaco et al. ( 2013) (visualizations of marginal functions, along with the covariance surface can be found in Suppplement B).We will come back to the model given by (5) later.

l 1 Embedding of Euclidean Trees
Now, we focus on a subclass of generalized linear networks G: the Euclidean trees T .In addition to the square-root embedding result which holds for any G, Anderes et al. (2020) provides another space embedding result for T only, which is restated below.) is isometrically embeddable into (R n , ρ 1 ), where n = m 2 and ρ 1 is the l 1 norm, such that there exists a mapping i : T → R n satisfying: for any s 1 , s 2 ∈ T .
The l 1 embedding result comes directly from the proof pertaining to Theorem 4 in Anderes et al. (2020).Meanwhile, the positive definite functions on (R n , d 1 ) are essentially the same as α−symmetric (here α = 1) characteristic functions in R n by Bochner's theorem, which have been extensively studied by Cambanis et al. (1983), Gneiting (1998), and Zastavnyi (2000) and will play a fundamental role in constructing space-time covariance functions on T × R. Before we dive into our second main contribution, we give the definition of linear isometric embedding due to Zastavnyi (2000).The term linear is added to distinguish from Definition 2.
Definition 3. Suppose that L i is a linear space, and a function ρ i : such that ρ 1 (x) = ρ 2 (Ax) for all x ∈ L 1 .If either of (L 1 , ρ 1 ) and (L 2 , ρ 2 ) is linearly isometrically embeddable in the other, we call these spaces linearly isometric.
Being slightly different from (2), let Φ(R n , ρ) denote the class of functions such that for For any such f and ρ, we say that Proof.Let A be the diagonal matrix with elements c 1 , • • • , c n on the main diagonal.Then for all x ∈ R n , we have ρ 1 . By Definition 3, (R n , ρ 1 ) and (R n , ρ 2 ) are linearly isometric.Lemma 1 then follows from Lemma 2(2) in Zastavnyi (2000).
Let θ denote the vector of covariance parameters and Θ n the parameter space with subscript n emphasizing that the dependence relates to R n .The theorem below gives a general framework for constructing valid space-time covariance functions on T × R by l 1 embedding.
Theorem 4. (metric models) Suppose that T is a Euclidean tree with m leaves, where β is also positive definite on R n+1 given α, β > 0, in addition to θ ∈ Θ n+1 .In concert with Theorem 3, there exists a mapping i : for any finite collection {a i } N i=1 ⊂ R and points {s i } N i=1 ⊂ T , where α, β > 0 and θ ∈ Θ n+1 .
The scaling parameters, α and β, make the spatial and temporal distances comparable.
In the literature, d •,T α + u β has been called the space-time distance and the models given by Theorem 4 have been called metric models (De Iaco et al., 2013).Combining Theorem 4 with the sufficient conditions of 1-symmetric characteristic functions given by Cambanis et al. (1983) and Gneiting (1998), we have the following corollaries.
Corollary 1. Suppose that T is a Euclidean tree with m leaves, where m ≥ 3. The function , is a valid covariance function on T ×R, provided that any of the following conditions is true: (a) C 0 is continuous, bounded, and is the Laplace transformation of a nonnegative random variable, where f n is the function where µ is a finite positive measure on [0, ∞) and ω n is defined by Although Corollary 1 provides sufficient conditions for space-time covariance functions to be positive definite, some of them are hard to check.We thus end this section with a corollary giving an explicit example, by applying Theorem 4 to a 1-symmetric characteristic function.
Example 3. (powered linear with sill models) The parametric model given by the following corollary belongs to the family of metric models and has the powered linear with sill representation.
The powered linear with sill model given by Corollary 2 is continuous near the origin.
Both marginals, as well as the covariance function itself, monotonically decay to zero.By direct calculation, one can show that both spatial and temporal marginal functions are convex near the origin.One feature of this model is that it has compact support, i.e. it reaches zero when the space-time distance is sufficiently large.This property is appealing for modeling large scale space-time datasets.However, its parameter space depends on the topology of the graph, i.e., the number of leaves m.For a fixed space-time distance, the more leaves in the tree, the smaller the dependence between observations.

Space-time Models on Directed Euclidean Trees
Instead of working with heavily mathematically involved functions, the kernel convolutionbased (or moving average) models tackle the problem from another perspective.According to Yaglom (1987, pp. 526), a large class of stationary covariance functions on the real line can be obtained by constructing a random process {Z(x) : x ∈ R}, which convolves a square-integrable kernel function g(•) over a white noise process Y (x) defined on R 1 as: x, s ∈ R.
When Y (x) is Brownian motion, the induced covariance function is valid and can be shown to be The kernel convolution-based approach allows considerable flexibility and can be generalized to nonstationary (Fuentes, 2002), space-time (Rodrigues and Diggle, 2010) and tree-like network (Ver Hoef and Peterson, 2010) settings.Details of the latter generalization are given below.

Tail-up and Tail-down Models
The space-time covariance functions given in the previous section are isotropic, which might not always be an appropriate assumption due to the fact that some networks are directed in nature.For instance, in streams, flow direction is yet another important factor, in addition to shortest path length (geodesic distance), that researchers should take into consideration when modeling physical processes.Variables that move passively downstream, e.g. chemical particles, and variables that may move upstream, e.g.fish and insects (Ver Hoef and Peterson, 2010) may need to be modeled differently.Especially for the former, we may want to allow the correlation between locations that do not share flow to be small or even zero.Based on the kernel convolution approach, Ver Hoef et al. ( 2006) and Ver Hoef and Peterson ( 2010) introduce the unilateral tail-up and tail-down covariance models on streams, which manage to handle these two scenarios differently.For detailed discussion of the models, we refer readers to Ver Hoef and Peterson ( 2010).Here, we only give the most necessary background, which will later become essential components in our space-time covariance functions on tree-like networks.
The dendritic structure of streams guarantees that condition (I) in Definition 1 holds.
Since every tree-like network is planar, we follow the prescription of Anderes et al. (2020) by letting each edge set e ∈ E be the interior of the corresponding line segment in R 2 and letting V be the set of endpoints of the line segments.Moreover, let the bijection ξ e preserve the path-length parameterization of each line segment.Thus, conditions (I) -(IV) in Definition 1 are satisfied and a stream equipped with stream distance, denoted by (T , d •,T ), is a (directed) Euclidean tree.Note that models built in this section can apply to any directed Euclidean tree, which we call a stream for convenience.
Depending on the flow direction, the tail-up and tail-down models also assume there exists a single most downstream location, which is called the outlet (see the right panel of Figure 1).Let the index set of all stream segments be denoted by A, and let the index set of stream segments that are upstream of site s i ∈ T , including the segment where s i resides, be denoted by U i ⊆ A. We say two sites s i and s j are "flow-connected" if they share water, i.e. if U i ∩U j = ∅, and are "flow-unconnected" if the water at one location does not flow to the other, i.e. if U i ∩ U j = ∅.Equivalently, two sites are called flow-connected if and only if one is on the path of the other downstream to the outlet.The pure spatial tail-up and tail-down models are given below, where the unilateral kernel g(x) is nonzero only when x > 0.
• Tail-up models: where d is the stream distance between sites s 1 and s 2 (i.e.d •,T (s 1 , s 2 )) and π 1,2 is a weight defined as follows.Let Ω(x) be a positive additive function such that Ω(x) is constant within a stream segment, but is the sum of each segment's value when two segments join at a junction, following the flow direction.Then the weight π 1,2 = Ω(s 1 ) Ω(s 2 ) ∧ Ω(s 2 ) Ω(s 1 ) ensures a constant variance of the process.In the literature, there exist different weighting schemes, see Cressie et al. (2006) andVer Hoef et al. (2006), and they have been proven equivalent (Ver Hoef and Peterson, 2010).
• Tail-down models: Commonly used kernels on streams can be found in Ver Hoef and Peterson (2010).
Obviously, a nontrivial tail-up covariance function cannot be isotropic as the covariance is always zero when sites are flow-unconnected, while a tail-down model is a function of a and b, in general.It has been shown by Ver Hoef et al. (2006) and Ver Hoef and Peterson (2010) that when the kernel is exponential, i.e. g(x) = θ 1 exp(−x/θ 2 ) for x ≥ 0, θ 1 , θ 2 > 0, the tail-down model is a function of the geodesic distance d •,T alone, regardless of flow-connectedness.Before introducing our next main contribution in terms of space-time covariance functions, we prove that the exponential kernel is the one and only which makes the tail-down model depend on d •,T alone, or in other words, isotropic.
Theorem 5. A tail-down model is isotropic, such that there exists a function The proof of Theorem 5 is nontrivial and left to Supplement A. When the kernel is exponential, the isotropic tail-down model can be written as where θ 0 , θ 2 > 0. (7) also appears in Anderes et al. (2020), where all isotropic covariance functions are developed by space embedding, as a valid covariance function on (G, d R,G ).
Therefore, Theorem 5 shows that the exponential tail-down covariance function is the only bridge which connects pure spatial covariance functions on Euclidean trees constructed by space embedding and kernel convolution, and will later help us find the linkage of space-time models constructed by different approaches as well.

Convex Cone
Stemming from the convex cone property of the class of positive definite functions, Theorem 6 provides easy to implement, yet practically important, ways to construct space-time covariance functions on directed Euclidean trees.
Theorem 6.The functions given below are valid space-time covariance functions on a directed Euclidean tree: where the tail-up C T U and the tail-down C T D models are defined in Section 4.1, and C T 1 and C T 2 are valid temporal covariance functions.
Similar to the variance components model in Ver Hoef and Peterson (2010), these functions allow high autocorrelation among sites that are flow-connected, and small but significant autocorrelation among sites that are flow-unconnected, at fixed temporal components.If we further assume that the number of observations over time on each site is the same, then substantial computational efficiency can be gained by exploiting the covariance matrix structure.
Example 4. Consider the isotropic exponential tail-down model, C T 1 being a cosine function which captures potential seasonal fluctuations, and C T 2 exponential as well.For the tailup spatial component, we adopt a Mariah kernel (Ver Hoef and Peterson, 2010), which specifies g(x) = 1 2 1 1+x/θ 1 , for x ≥ 0 with θ 1 > 0. After reparameterization, expression (10) from Theorem 6 gives the valid space-time model as where θ 1 , • • • , θ 4 > 0 and the weight π 1,2 is defined in Section 4.1.Note that the model above contains a metric sub-model which is a function of the space-time distance, i.e.

Scale Mixture Models
We conclude the theoretical development of space-time models with a clever trick (Porcu et al., 2020), which gives the so-called scale mixture model.The trick can trace back to the second stability property of covariance functions given by Chils and Delfiner (1999, pp. 60).
Theorem 7. Let C 0 (s 1 , s 2 ; t 1 , t 2 ; a) be a space-time covariance model on G × R, a be a parameter where a ∈ Θ a ⊂ R, and µ(•) a positive measure on the set Θ a .Then for every pair of space-time coordinates.
Theorem 7 can be proved by the definition of positive definiteness directly, which we will skip here.Any valid space-time model that satisfies the condition can be chosen as the integrand, and we emphasize its application on directed Euclidean trees in the following corollary.
Corollary 3. Suppose that C S (•, •) is a pure spatial covariance function on a directed Euclidean tree T and C T (•, •) is a pure temporal covariance function.Parameter a has the support Θ a ⊂ R, and µ(•) is a positive measure on the set Θ a .Then where a > 0, θ 1 > 0, θ 2 ∈ R. According to Corollary 3, is a valid space-time covariance model on directed Euclidean tree.In order to integrate (11), we use the result from Ng and Geller (1969) that Now we can simplify (11) as where the second to last equality holds by change of variable and the last by ( 12).Observe that the model given by ( 13) is a special case of ( 5 Γ(θ 4 ) After reparameterization, we have that where θ 1 , θ 2 , θ 4 > 0, and θ 3 ∈ (0, 2] is a valid space-time covariance function on Euclidean trees.The model given by ( 14) extends the metric model in Section 3.2 since it is a function of d θ 1 + u θ 3 θ 2 .The covariance function is continuous at the origin and monotonically decays to 0 as d → ∞ and/or u → ∞.The spatial marginal is convex on (0, ∞), but this is not necessarily so for the temporal marginal.

Stream Temperature Example
As mentioned in Section 2.1 and 4.1, a tree-like stream network is naturally a Euclidean tree with the geodesic distance.In this section, we illustrate the use of models introduced in previous sections on a real stream temperature data set.Streams and rivers are important to humans as well as certain plants and animals (Ver Hoef and Peterson, 2010).Stream temperature controls many physicochemical processes (Isaak et al., 2018) and is one of the key variables affecting habitat suitability for numerous aquatic species (Gallice, 2016).
Temperature-sensitive species and life stages are most vulnerable during the warm summer season (Isaak et al., 2018).Thus, gaining insights on potentially spatiotemporally varying stream temperatures over networks during the warmest time of the year is of special interest.The advent of inexpensive sensors provides a large amount of accessible water temperature records (Isaak et al., 2018).The original data (Isaak et al., 2018)  We cannot obtain the exact number of leaves where points lie due to the limitation of current software.However, the parameter space of δ in Model 3 depends on the number of leaves, and large δ will result in a rather small covariance value, which makes the estimation of covariance parameters less reliable.Therefore, we let δ = 97 in the following analysis.For Model 4, the positive additive function Ω(x) is generated using the accumulated drainage area which comes along with the NSI dataset using STARS.Models introduced thus far are continuous with variance σ 2 0 = C 0 (s, s; t, t) = 1, (s, t) ∈ T ×R.Based on applications in the literature (Gneiting et al. (2006) and Gneiting (2002)), as well as empirical analysis of space and time marginal functions for the stream temperature data, we allow a discontinuity of the covariance function at the origin by multiplying  Another interesting issue is how these models' predictive performances compare to models in the literature.Direct comparison is unavailable since there do not exist space-time covariance models on the network and researches on this specific data set are limited.
But Gallice (2016) did an extensive literature review on statistical stream temperature prediction in ungauged basins on different time scales.Despite different study regions and models applied, the daily RMSPEs range from 1.0 -2.7 • C (Table 2.1, Gallice ( 2016)).
Moreover, the measurement errors from different sensors in the original data are from ±0.2 to ±0.5 • C (Isaak et al., 2018).In light of this, all of our proposed models achieve decent prediction precision.

Discussion
This article presented a collection of tools to build valid space-time covariance models on generalized linear networks, and on an important subclass, Euclidean trees.We studied examples obtained by each constructive method and applied them to a real stream temperature data.We have not yet given a standard guidance on how to choose suitable candidate models for an arbitrary data set.But understanding the underlying physical process (Gneiting, 2002) and matching geometric features of theoretical covariance functions to the empirical space-time covariance surface (De Iaco et al., 2013) would be helpful.
It has been argued that when prediction is the goal, model estimation is just a means to an end (Porcu et al., 2020).We notice that maximum likelihood estimates of σ 2 for Model 3 and 5 are much higher, i.e. σ2 = 15.426 and 13.137, respectively, than the rest.It does not cause much trouble in this particular analysis as both models have reasonable prediction precision, as determined by cross validation.However, the study of parameter estimability under infill and increasing domain asymptotics on a network is an open question that needs to be addressed and requires special attention.For instance, there is a wide range of geodesic distances between pairs of observation sites, i.e. from less than 0.1 km to 453.438 km, in the water temperature data.Precisely characterizing the dynamics of the space-time process can help refine future sampling designs and reduce data redundancy (Isaak et al., 2018).
Though we emphasized the decisive role of valid covariance functions in geostatistical models, they also allow direct extension to space-time log Gaussian Cox processes on generalized networks (Møller et al. (1998); Anderes et al. (2020)), which we leave for future investigation.
al. (2020) and is revisited below.Definition 1.A triple G = (V, E, {ξ e } e∈E ) which satisfies conditions (I) -(IV) is called a graph with Euclidean edges.(I) Graph structure: (V, E) is a finite simple connected graph, meaning that the vertex set V is finite, the graph has no self-edges or multi-edges, and every pair of vertices is connected by a path.(II) Edge sets: Each edge e ∈ E is associated with a unique abstract set, also denoted by e.The vertex set V and all the edge sets are mutually disjoint.(III) Edge coordinates: For each edge e ∈ E and vertices u, v ∈ V joined by e, there is a bijective mapping ξ e defined on the union of the edge set e and vertices {u, v}, i.e. e ∪ {u, v}, such that ξ e maps e onto an open interval (e, e) ⊂ R and {u, v} onto endpoints {e, e}, respectively.(IV) Distance consistency: Define d G (u, v) : V × V → [0, ∞) as the length of the shortest path on vertices of a weighted graph where the weight associated with each edge e ∈ E is defined as e − e.Then, the following equality holds:

Figure 1 :
Figure 1: Examples of networks.The middle panel is a directionless tree, while the right most panel is a directed tree (e.g.stream) with the outlet superimposed.
with completely monotone derivative is called a Bernstein function.In analogy to the definition of positive definite functions, we recall that for a distance space (D, d), a continuous function f : D → R is called conditionally negative definite (see for example in Menegatto et al. (2020)) on D with respect to d if ), while the converse is not necessarily true.Therefore, parts (a) and (b) of Theorem 2 are direct applications of Menegatto et al. (2020)'s work to the case where the dimension of the Euclidean space is 1 and the quasi-metric space is (G, d R,G ).By Theorem 1, (G, d R,G ) is square root-embeddable in a Hilbert space H. Notice that (G, d R,G ) is again a distance space and thus isometrically embeddable into H by Definition 2. Statement (c) follows in concert with Theorem 3.2 (iii) in Menegatto et al. (2020).When G is a Euclidean tree, Proposition 1 gives that d R,G = d G,G , which completes the proof.Each part (a) -(c) of Theorem 2 provides researchers an easy to implement method for constructing valid space-time covariance functions on a generalized linear network.Let us digress for a moment and consider a pure spatial, isotropic random process Z(s) defined on (G, d R,G ) with an overall constant mean µ.Similar to the conclusion in geostatistics, the semivariogram, defined as γ we can construct γ based on (4), which belongs to CN D(D, d R,G ).For examples of the class Φ(G, d R,G ), we refer to Anderes et al. (2020).Note that statements (a) -(c) are not exclusive.
characterizes the class of conditionally negative definite functions in R n , analogous results in the distance space (D, d), especially (G, d R,G ), have not been obtained, as far as we know.Hence, we defer marginal results related to CN D(G, d R,G ) for future research and investigate properties of marginals pertaining to the covariance functions by Theorem 2(c) only.
and 0 < b ≤ 1.Since b and λ appear only as the product bλ, and both have the same support, i.e. (0, 1], henceforth to avoid an identification issue we use b instead of bλ.Imitating Example 1 from Gneiting (2002), consider the pure spatial covariance function C S (Anderes et al., 2020) on (G, d R,G ):

Theorem 3 .
(l 1 embedding, Anderes et al.) Let T be a Euclidean tree with m leaves, where m ≥ 3. Then (T , d •,T and J ν (t) denoting the Bessel function of the first kind with order ν.(c) C 0 is a continuous function such that C 0 (0Let n = m/2 .Given condition (a), Theorem 3.3 in Cambanis et al. (1983) shows that C 0 • ρ 1 is positive definite on R n+1 .It follows immediately from Theorem 4 that the space-time covariance function C 0 d •,T α + u β is positive definite on T × R. Parts (b) and(c) may be proved similarly using Theorem 3.1 inCambanis et al. (1983) and Theorem 3.2 inGneiting (1998).Pure spatial results, such as Theorems 4 and 5 inAnderes et al. (2020), are applications of the l 1 embedding technique to Theorem 3.1 inCambanis et al. (1983) and Theorem 3.2 inGneiting (1998), respectively.
flow-unconnected, where d has the same definition as in tail-up models and a, b represent the distances from each site to the nearest junction downstream of which it shares flow with the other site.When s 1 , s 2 ∈ T are flow-unconnected, d = a + b.
and C T D (•, •)C T 2 (•, •), are positive definite on T × R. The remaining results then follow easily from the definition of positive definiteness.
is a valid space-time covariance function on T × R given that the integral exists.The proof of Corollary 3 follows from Theorem 7 in concert with the fact that the separable space-time function C S (s 1 , s 2 ; a)C T (t 1 , t 2 ; a) is a valid covariance function on T × R. We illustrate the use of Corollary 3 by two examples, one of which shows an interesting linkage between space-embedding models and scale mixture models due to Theorem 5. Example 1 Revisit.Following Example 4 in De Iaco et al. (2002), consider an exponential tail-down model as the spatial component (C S ), a cosine function as the temporal component (C T ) and a half-normal probability density function (µ ), which are parameterized as follows: ) in Example 1, where κ = 1/θ 1 , c = θ 2 2 , ν = b = β = 1, and τ = 1/2.Lemma 2. Assume that the scale mixture space-time covariance function is isotropic, that C S is an exponential tail-down model and C T depends on the time lag u only.Then C T (0; a) ≥ 0 for a ∈ Θ a is a sufficient but not necessary condition for the spatial marginal function f S (d) := C(d; 0) to be convex on R + .Proof.Let C(d; u) := Θa exp − a 2 c d C T (u; a)dµ(a).Suppose that the measure µ(•) and the temporal covariance function C T are smooth enough to allow interchanging the order of differentiation and integration.Then f S (d) = T (0; a)dµ(a) ≥ 0, given that C T (0; a) ≥ 0 for all a ∈ Θ a .However, this condition is not necessary since we have shown in Example 1 Revisit that when C T (u; a) = cos [a(2θ 2 u)], the scale mixture covariance model is essentially a special case of Example 1 and by Proposition 3, the spatial marginal is always convex on (0, ∞).Apart from the sufficient condition provided in Lemma 2, convex cone and scale mixture models do not share unified geometric features.Example 5. Again, let the spatial component be an exponential tail-down model with a slightly different parameterization: C S (d; a) = exp − a Γ(θ 4 )

Figure 2 :
Figure 2: Delineated stream lines of the Clearwater River Basin with 96 observation sites superimposed on top.

Figure 3 :
Figure 3: Time series of residuals from the linear regression model with i.i.d.errors on 08/01/2014 -08/10/2014 between various pairs of survey sites.Upper left: flow-connected sites with the smallest geodesic distance, i.e. 0.104 km; upper right: flow-connected sites with the largest geodesic distance, i.e. 224.577 km; bottom left:flow-unconnected sites with the smallest geodesic distance, i.e. 0.031 km; bottom right: flow-unconnected sites with the largest geodesic distance, i.e. 453.438 km.

Figure 4 :
Figure 4: Empirical (points) and fitted (red lines) marginal covariance functions of residuals from Model 2 by maximum likelihood estimation on the full data.Open circles are sample covariograms with size proportional to number of data pairs within each group.

Table 2 :
Summary of geographical variables and mean August water temperature in the year 2014 of 96 survey sites.
(Isaak et al., 2018):// www.researchgate.net/publication/325933910_Principal_components_of_thermal_regimes_in_mountain_river_networks,comprisesmeandailywatertemperature(inCelsius)during a 5-year time period(2011 -2015)at 226 observational sites residing on several adjacent river networks in the northwestern United States.As the proposed models only apply to observations on the same network with finite geodesic distances in between, we consider 96 survey sites from the Clearwater River Basin (Figure2) in central Idaho.Observations are more crowded in the northeast region and geographical summary statistics are given in Table2.The average rate of missingness of the original water temperature data is about 12%(Isaak et al., 2018).Considering computational feasibility, we choose to analyze data from 10 consecutive days 08/01/2014 -08/10/2014 from the warmest season when records of those 96 sites are complete, which yields a total of 960 observations.Figure3clearly indicates that spatial and temporal dependence may exist among observations.

Table 3 :
Comparison of log likelihood values (LL), Bayesian Information Criteria (BIC), root mean squared prediction errors (RMSPE) and continuous ranked probability scores (CRPS) averaged over 8-fold cross validation replications for Models 1 -5, the separable model and the linear regression (LR) model on the water temperature data with covariates We interpret the somewhat contradictory results that a model behaves the best in-sample but worst out-of-sample as there might exist over-parameterization and model misspecification issues.Since our focus is the predictive performance, we prefer Model 2, which is constructed based on the Hilbert space embedding technique as well, because it has the smallest cross validation RMSPE and CRPS.It reduces the former metric by 17.86% and 16.55%, and latter by 21.19% and 17.70%, compared to Model 1 (also the separable model) and the linear regression model, respectively.Besides, its in-sample performance is only inferior to Model 1 and the separable model.Thus, Model 2 is the overall "best" model for this data set and fitted marginal functions are given in Figure4.