Joint Modeling of Longitudinal Relational Data and Exogenous Variables

This article proposes a framework based on shared, time varying stochastic latent factor models for modeling relational data in which network and node-attributes co-evolve over time. Our proposed framework is flexible enough to handle both categorical and continuous attributes, allows us to estimate the dimension of the latent social space, and automatically yields Bayesian hypothesis tests for the association between network structure and nodal attributes. Additionally, the model is easy to compute and readily yields inference and prediction for missing link between nodes. We employ our model framework to study coevolution of international relations between 22 countries and the country specific indicators over a period of 11 years.


Introduction
Understanding the coevolution of relational and nodal attributes is a common problem in fields as diverse as public health (Christakis and Fowler, 2007;Fowler and Christakis, 2008), finance (Kalyagin et al., 2014) and genomics (Butland et al., 2005).In these types of applications, data consists of two parts: a time series of dyadic relationships among a common set of n actors, which is encoded in the sociomatrix Y (t) = {y i,j (t) : i, j = 1, .., n; t ∈ N}, and a collection of p time series of attributes associated with each of the actors, which is encoded by the matrix Z(t) = {z i,k (t) : i = 1, . . ., n; k = 1, . . ., p}.
Broadly speaking, the literature approaches the study of the association between nodal and network attributes through two different approaches.One of these approaches focuses on modeling the structure of the network conditional of the nodal attributes.The goal in that case is to understand how social relationships are formed based on the attributes of individuals, a process known as "selection".The other approach consists of models of the nodal attributes and their association conditional on the network structure.Those models are used to understand how relationships affect attributes of the individuals in a network, a process referred to as "influence" or "contagion".Models of selection are typically built by regressing y ij (t) on node-or dyad-specific regressors using Exponential Random Graph Models (ERGMs) (Holland and Leinhardt, 1981;Robins et al., 2007) or mixed-effects generalized linear models (Wasserman and Anderson, 1987;Holland et al., 1983;Hoff et al., 2002;Hoff, 2005).ERGMs, also known as p * models, assume that the conditional distribution of the network given the covariates follows an exponential family model in which the sufficient statistics are functions of a few summary metrics of the network and nodal attributes (such as the number of edges or the number of triangles in the networks).Temporal Exponential Random Graph Models (TERGMs) are extensions of ERGMs that build dependencies between graphs over time by incorporating summary statistics that involve counts of the number links among groups of nodes at various points in time (Robins and Pattison, 2001;Hanneke and Xing, 2007;Hanneke et al., 2010).Uncertainty in TERGMs is usually assessed using resampling techniques such as the bootstrap (Desmarais andCranmer, 2010, 2012).In a similar spirit, stochastic actor oriented models (SOAMs) (Snijders et al., 2010;Brandes et al., 2012;Snijders, 2013) have been recently proposed to model the evolution of dynamic networks.SOAMs use network-specific rules or network effects such as reciprocity, triadic closure, or homophily and identify network effects to explain the observed evolution of the networks.Though naturally appealing, the performance of ERGMs and SOAMs heavily depends on the choice of sufficient statistics or networks-specific rules that are incorporated into the formulation.ERGMs in particular can be weak at capturing local features of the network, often resulting in less than satisfactory performance in many real world problems (Snijders, 2002;Handcock et al., 2003).Alternatively, latent variable models regress the dyadic responses on covariates and/or unobserved latent variables.These models are computationally efficient and useful in modeling transitivity and homophily.On the other hand, models of contagion are usually constructed by regressing nodal attributes against those of other nodes in their network (for example, see Christakis and Fowler, 2007;Fowler and Christakis, 2008;Shoham et al., 2015 and references therein).Common methodological approaches include simultaneous autoregressive (SAR) models (e.g., see Lin, 2010) and threshold models (e.g., see Watts and Dodds, 2009).
Determining whether selection or contagion are at play (i.e., the direction of the causal relationship) is typically a difficult problem (Doreian, 2001).Instead, we focus here on jointly modeling the co-evolution of network and nodal attributes through shared latent variables.The goal of our model is twofold.First, we are interested in developing tests of association between structural features of the network and individual nodal attributes.In particular, although our approach does not allow us to distinguish between contagion and selection, it does allow us to carry out tests of independence between the network and the nodal attributes.Second, we are interested in developing predictive models that can be used to jointly predict both future links and future nodal attributes.Joint models of network and nodal attributes have started to receive increasing attention over the last few years.In a static setting, Fosdick and Hoff (2015) recently proposed an extension of the bilinear model of Hoff (2005) in which the nodal attributes and the latent factors used to explain transitivity in the network are jointly modeled using a multivariate normal distribution.In their model, testing for association between the network and nodal attributes reduces to testing whether the cross-covariance matrix between latent factors and the nodal attributes is the null matrix using a likelihood ratio test.On the other hand, Durante and Dunson (2017) proposes joint modeling of a binary/categorical response and a network using latent variable tensor factorization of the joint probability model.This framework finds its application in clustering networks into multiple groups and thus has a different focus than ours.In a dynamic setting, De la Haye et al. (2010) proposed time varying joint models for network and attributes when the attributes are binary or categorical, while Niezink et al. (2017) extend the framework to accommodate continuous nodal attributes.These models rely on extensions of the ERGM and TERGM approaches described above, and therefore inherit the same drawbacks.
In this paper we propose a fully Bayesian approach to inference, testing and prediction for co-evolving networks and nodal attributes.The development of this model is motivated by the study of the relationship between international relationships and country-specific economic performance.Our approach is related to, but distinct from, the one presented in Fosdick and Hoff (2015).In addition to accommodating both discrete and continuous attributes and considering the more general case of time series data, our approach uses a common set of latent factors to explain network transitivity and covariation among attributes and network structure, and provides a fully Bayesian test of association that can be used to study individual nodal attributes.When the nodal attributes are assumed to follow conditional Gaussian distribution, our model can be interpreted as a dynamic version of the model presented in Fosdick and Hoff (2015), but with a structured (and potentially more parsimonious) prior on the covariance matrix between the latent traits and the nodal attributes.When the attributes are non-Gaussian, modeling through shared latent traits allows us to include them in the model in a straightforward fashion, something that the formulation in Fosdick and Hoff (2015) does not permit.Shared latent factors have been recently employed, among others, by Rodríguez and Moser (2015) to jointly model voting outcomes and abstentions in roll-call data in the U.S. Congress, and by Durante et al. (2017) to generate multilayer extension to the bilinear model.Our approach captures the dynamics of the system using autoregressive priors for the shared latent parameters, in an approach reminiscent to Sarkar and Moore (2006), Durante and Dunson (2014) and Sewell and Chen (2015).We refer to our proposed joint modeling framework as Joint Latent Factor Model (JLAFAC).
The remainder of the article flows as follows.Section 2 describes the joint modeling framework with latent factors and highlights some features of the proposed joint model.This section details our prior distribution on the model parameters and latent factors.Section 3 explains posterior computation and explains how MCMC samples are used for link and attribute prediction and assessments of association between structural features of the network and individual nodal attributes.Sections 4 and 5 demonstrate performance of the proposed framework along with the competing models in simulation studies and in a international relationships network data respectively.Finally, Section 6 concludes the article.Details of MCMC updates are described in the online supplement.

Model formulation 2.1 A joint model for nodal attributes and networks
Assume that a group of n actors is followed over time, and let Y (t) = [y i,j (t)] denote the n × n binary matrix capturing dyadic interactions between these actors at time imsart-ba ver.2014/10/16 file: JLAFAC_RG_1230_abel.tex date: March 31, 2019 t, and Z(t) = [z i,k (t)] be the n × p matrix of continuous or discrete attributes for these actors at time t.Although we concentrate here in binary directed relationships between actors, the model can be easily extended to continuous and other categorical relationships, as well as to undirected relationships (please see Section 6).The network and actor attributes are observed at a finite number of time points t 1 < t 2 < • • • < t T resulting in realizations Y (t 1 ), . . ., Y (t T ) of the stochastic network process Y (t), and Z(t 1 ), . . ., Z(t T ) of the stochastic attributes Z(t).
We propose conditionally random effect models for Y (t) and Z(t) with shared latent factors to accommodate co-evolution.To be more precise, for the entries of the sociomatrix Y (t) we consider a bilinear model, where u i (t) = (u i,1 (t), . . ., u i,R (t)) and v j (t) = (v j,1 (t), . . ., v j,R (t)) are both timevarying, R-dimensional latent variables, and G is an appropriate link function.In the sequel we focus our discussion on the probit link where G(•) corresponds to the cumulative distribution function of the standard normal distribution, but more general links such as the logistic can be easily accommodated with relatively minor changes to our computational approach.
Equation (2.1) corresponds to the bilinear model proposed in Hoff (2009).The time varying latent vectors u 1 (t), . . ., u n (t) and v 1 (t), . . ., v n (t) capture transitivity and reciprocity in the dyadic relationship (Hoff, 2009).The eigenvalues λ 1 , . . ., λ R play a crucial role in determining the specific form of that relationship, in the sense that similar values u i,r (t), u j,r (t) and v i,r (t), v j,r (t) contribute positively or negatively to the relationships i → j and j → i, depending on whether λ r > 0 or λ r < 0. Furthermore, note that if λ r = 0 then the r-th dimension of the latent variables has no impact on the network structure.
The time varying attributes attached to each node can either be continuous, binary or categorical.To model them, we propose to use a set of conditionally independent generalized linear models E {z i,k (t)} = δ i,k (t) where where ζ i,k (t) corresponds to realizations from a white-noise process with variance φ −1 k .
imsart-ba ver.2014/10/16 file: JLAFAC_RG_1230_abel.tex date: March 31, 2019 Note that the latent factors u 1 (t), . . ., u n (t) and v 1 (t), . . ., v n (t) we introduced in (2.1) reappear as predictors in (2.2).The static coefficients ψ k,r and ξ k,r capture the "average" effect of the rth component of the latent factors on the corresponding nodal attribute, and the time varying coefficients α k,r (t) and β k,r (t) control how those baseline effects vary over time.Hence, u 1 (t), . . ., u n (t) and v 1 (t), . . ., v n (t) not only determine the structure of the network but also control the level of association between nodal attributes, as well as the co-evolution between the network and the nodal attributes.Furthermore, just like in the case of λ r , note that setting ψ k,r = 0 and ξ k,r = 0 simultaneously implies that the rth dimension of the latent factors has no effect in the evolution of z i,k (t).
To motivate the shared latent factor formulation discussed above it is convenient to think of the vectors u i and v i as describing the position of actor i in a latent "social" space (e.g., see Hoff et al., 2002;Hoff, 2005).From that point of view, our model makes the intuitively appealing assumption that the relative position of the nodes in social space determines both the likelihood of a link between nodes (i.e., the network structure) and the behavior of the attributes.The shared latent factor formulation can also be motivated as a generalization of the model introduced in Fosdick and Hoff (2015), which we discuss in more detail in Section 2.4.Alternatively, u i and v i can be viewed as ith node-specific scores with respect to a set of R latent features/attributes.These features regulate latent similarities among nodes (via the joint models (2.1) and (2.2)) and the evolution of the associated external covariates.
Although the previous formulation does not incorporate known covariates that might impact the evolution structure of the network or the nodal attributes, extending it in this direction is trivial by including additional regression terms in (2.1) and (2.2).

Modeling the evolution of the nodal attributes and networks
In order to borrow information across time, we propose to model the stochastic processes . ., p and r = 1, . . ., R using independent Gaussian process priors.Within the general class of Gaussian processes, we find it particularly convenient to work with stationary Ornstein-Uhlenbeck (OU) processes.The OU process is a Gaussian process with an exponentially damping covariance kernel over time.A key feature of the OU process is that its discretization leads to the well known first order autoregressive process, for which fast computational algorithms exist (see discussion below).Another key advantage of this specification is that this prior has "full support," in the sense that it has positive probability of generating joint functions {(Θ(t), ∆(t)) : t ∈ T } arbitrarily close to any true joint function {(Θ 0 (t), ∆ 0 (t)) : t ∈ T } generating the data, where . Details of theoretical results on full support.including the proofs, are given in the supplementary material Section 0.2.
For ease of presentation we assume that the times t 1 , t 2 , . . ., t T at which the data is observed are equally spaced, in which case an Euler discretization of these Ornstein-Uhlenbeck processes leads to first-order autoregressive priors as described in the next few paragraphs.This discretization can be easily modified in the case of irregularly spaced observations to provide a time-consistent model.
Let u i,r (t l ) = u i,r,l and v i,r (t l ) = v i,r,l for all l = 1, . . ., T .Assuming that t 1 , t 2 , . . ., t T are equally spaced, the Ornstein-Uhlenbeck prior implies that where |ρ u | < 1 and |ρ v | < 1 are autocorrelation coefficients.The initial states u i,r,0 and v i,r,0 are assigned the respective stationary distribution, u i,r,0 ∼ N 0, , which implies that the marginal prior distributions for u i,r,l and v i,r,l have zero means and the constant variances.Furthermore, setting ρ u = ρ v = 0 leads to independent priors at every point in time.
We use a similar argument for each of the other four sets of dynamic parameters in our model.More specifically, letting where, as before, the processes are assumed to start at their respective stationary distribution, α k,r,0 ∼ N 0, To help address identifiability issues, the parameters σ 2 u , σ 2 v , σ 2 α and σ 2 β all are fixed at 1 (please see Section 3.1) .Additionally, σ 2 µ and σ 2 η are also kept fixed at 1 without loss of generality.The rest of the variance parameters φ −1 k , k = 1, . . ., p follow proper inverse gamma distributions with infinite means, IGam(1, 2).Finally, since lag parameters ρ u , ρ v , ρ α , ρ β , ρ η , ρ µ lie in [−1, 1] interval, we assign the standard normal prior distributions on transformed ρ u , ρ v , ρ α , ρ β , ρ η , ρ µ 's with full range in R.This choice of priors for ρ u , ..., ρ µ is meant to facilitate the use of elliptical slice sampling algorithms Nishihara et al. (2014) in order to improve the mixing of the algorithm (see Section 3).

Dimension selection and association testing
As mentioned in Section 3.3, the coefficients λ 1 , . . ., λ R can potentially be used to determine which components of the latent factor vector affect the structure of the network.To exploit this property of the model we propose to use independent Gaussian mixture imsart-ba ver.2014/10/16 file: JLAFAC_RG_1230_abel.tex date: March 31, 2019 priors, , and τ r ∼ Gam q 3(r−1) , q 2(r−1) , q > 1.This prior specification on τ r ensures fast decay of the variance parameter 1 τr , so as to safeguard the implied prior on the θ i,j (t)s from degenerating as R becomes large.In particular, which implies that the prior variance of θ i,j also converges as R → ∞.In our experiments we work with q = 1.5.On the other hand, allowing the prior inclusion probability π λ to be random allows us to automatically adjust for multiple comparisons (Scott et al., 2010).
Note that, as v 0 → ∞, the mixture component associated with γ r = 0 converges to a Dirac delta function at 0, δ 0 .Though using a degenerate measure at zero as one component of mixture distribution is suitable for selecting unimportant dimensions, we instead use a mixture of two normal distributions with large value of v 0 in (2.4) to mimic the effect of δ 0 .This is a very common practice in the variable selection literature to enhance better mixing of the parameters, see (George and McCulloch, 1993;Ishwaran et al., 2005).Sensitivity of the procedure to the choice of v 0 is explored and discussion is offered in Section 4.
We use a similar approach to identify components of the latent factors that do not affect the different nodal attributes.In particular, we let where The indicator variables γ 1 , . . ., γ R , ω 1,1 , . . ., ω p,R and ς 1,1 , . . ., ς p,R can be used to investigate the pattern of association between the network structure and the nodal attributes, as well as to estimate the effective dimension of the latent space (see Section 3.3 for more details, as well as for a discussion on the elicitation of the parameters a λ , b λ , a ψ , b ψ , a ξ and b ξ ).

2.4
Relationship with other models in the literature Fosdick and Hoff (2015), which focuses in situations in which the network and nodal attributes are observed only once, builds a joint model through a prior of the form i.e., they assume independence between the latent factors but allow for general crosscovariance matrices Ω u,z , Ω u,z between latent factors and attributes, as well as for an unrestricted covariance structure Ω z,z for the nodal attributes.The authors are then careful to enforce constraints on Ω u,z , Ω u,z and Ω z,z so that the full covariance matrix is well defined.In this formulation, the nodal attributes and the network structure are independent if and only if both Ω u,z = 0 and Ω v,z = 0, which the authors check using a likelihood ratio test.
Contrast this structure with that induced by our model in the simplified case in which the data is static (i.e., T = 1) and the attributes are conditionally Gaussian.For our model, the joint distribution for (u i , v i , z i ) is also normal and the covariance structure has a very similar structure.In particular, the latent factors are all independent, and we have cov . Hence, our model allows for cross-covariance matrices Ω u,z and Ω v,z that are as general as those induced by the model discussed in Fosdick and Hoff (2015).However, in our case and Φ −1 is a diagonal matrix with entries φ 1 , . . ., φ p (recall Section 3.3).Therefore, when R p, our model induces a sparse, "low rank + diagonal" decomposition for the marginal covariance matrix for the attributes Ω zz , while for R ≥ p the model allows for a general (unrestricted) covariance matrix.
This discussion shows that the model discussed in Fosdick and Hoff (2015) can be recovered as a special case of our shared latent factor model, and also makes the advantages of the more general specification clear.First, if the attributes where being modeled independently, it would be natural to assume some sort of sparse structure for the matrix Ω zz .Our latent factor formulation yields that for free.Secondly, the formulation automatically leads to a well-defined joint covariance matrix without any need for a careful prior specification for the cross-covariance matrices.Thirdly, the latent factor formulation can easily accommodate categorical attributes with minimal modifications.Finally, the latent factor formulation allows for an easy extension to the time-varying case that is the focus of this paper.

Posterior Inference
Under a Bayesian framework, parameter estimation can be achieved via Markov chain Monte Carlo (MCMC) algorithms, in which posterior distributions for the unknown quantities are approximated with empirical distributions of samples from a Markov chain.To streamline computation, we follow Albert and Chib (1993)  latent variables w i,j,l for the network data such that w i,j,l > 0 if y i,j (t l ) = 1 and w i,j,l < 0 otherwise.Equation (2.1) can now be written in terms of w i,j,l as w i,j,l = µ l + R r=1 λ r u i,r,l v j,r,l + i,j,l , i,j,l ∼ N (0, 1). (3.1) For binary or ordinal nodal attributes we follow a similar latent variable augmentation, while for attributes that follow Gaussian distributions no data augmentation is required.This data augmentations lead to updates that mostly use Gibbs sampling steps (see supplementary material Section 0.1 for details).One exception is the correlation coefficients ρ u , ..., ρ µ , for which we employ elliptical slice samplers Nishihara et al. (2014) in order to improve the mixing of the MCMC algorithm, particularly for large values of p and T (see Section 4 for an illustrations).

Identifiability of Parameters
One important issue related to the proposed model is the identifiability of the unknown parameters . Note that the parameters in the proposed model could have potentially faced following identifiability issue.
Permutation indeterminacy: r),l v j,P (r),l .Similar identifiability issue can be faced for R r=1 ψ k,r α k,r,l u i,r,l and R r=1 ξ k,r β k,r,l v i,r,l if all ψ k,r 's and all ξ k,r 's are equal respectively.
Note that λ r 's, ξ k,r 's and ψ k,r 's are assigned normal prior distributions which imposes increasing shrinkage towards zero as R increases, so that both orthogonal and permutation indeterminacies are ameliorated.Moreover, σ 2 u = σ 2 v = σ 2 α = σ 2 β = 1 partially solves the scale indeterminacy issue, namely λ r 's, ξ k,r 's and ψ k,r 's are identifiable up to sign.

Link and nodal attribute prediction
We can approximate the posterior probability of a directed dyad from node k 1 to node k 2 at time t l as an average of M post burn-in, suitably thinned, MCMC samples as where the superscript (s) denotes the s-th post burn-in MCMC sample for a parameter after suitable thinning.Note that this estimated link probability can be used to infer missing links within observed networks (under the additional assumption of ignorable missingness), or to predict the structure of the network at a future time.To decide whether a directed dyad between nodes k 1 and k 2 is present, one can choose a cut-off c ∈ (0, 1) so that θ k1,k2 (t l ) > c implies a directed link.By varying c we can construct a receiver characteristic curve for our network prediction algorithm.
A similar approach can be used to predict the value of nodal attributes.For example, for the purpose of point prediction, i,r,l .

Patterns of association between network and nodal attributes
The latent space provides us understanding of the association between network and nodal attrubutes.The R dimensions of the latent space can be divided into four groups: systemic dimensions, which influence both the network structure and at least one of the nodal attributes, two groups of idiosyncratic dimensions, one group that is associated only with the network structure, and a second group that solely impacts the association between nodal attributes, and a set of inactive dimensions that have no effect on either of the outcomes of interest.More specifically, a given dimension r is systemic if and only if it significantly affects the structure of the network (i.e., γ r = 1) and it also affects at least one of the nodal attributes (i.e., if there is at least one k for which ω k,r = 1 or ς k,r = 1, or both).Hence, q S , the number of systemic dimensions in the latent space is given by It is argued in the supplementary material Section 0.3 that q S is an identifiable quantity.Furthermore, q S = 0 if and only if the sequence of networks and the time series of nodal attributes are mutually independent.Hence, where I(•) denotes the indicator function.P (q S = 0 | data) provides us with a mechanism to evaluate global association between the network structure and nodal attributes, with high values of this probability indicating that these are independent.This is a Bayesian alternative to the likelihood ratio test discussed in Fosdick and Hoff (2015), one that also takes into account the dynamic nature of our data.
The aforesaid procedure to understand general association between network and nodal attributes can be slightly modified to investigate whether the network structure is associated with a particular nodal attribute.For this purpose, define Again, an estimate of P (q S,k = 0 | data) can be obtained from the samples generated by the MCMC algorithm.High values for this probability indicate that the network structure and the kth nodal attribute are (marginally) independent from each other.
A similar approach can be used to define the number of idiosyncratic dimensions associated only with the network structure, and the number of idiosyncratic dimensions associated with the nodal attributes, Note that if q A = 0, then all correlations among the nodal attributes is explained by some of the same factors that explain the network structure.Furthermore, although the model allows the dimension of the latent space to be as high as R, the effective dimensionally of the space is R * = q S + q N + q A ≤ R, with the number of inactive dimensions being q O = R − q S − q N − q A .Supplementary material Section 0.3 show all of q N , q A and q S and R * are identifiable.Hence, a posteriori our model is able to learn the dimension of the latent space.

Hyperparameter selection
The prior distribution on the summaries q S , q N and q A depends critically on the hyperparmeters a λ , b λ , a ψ , b ψ , a ξ and b ξ , which must be carefully chosen.For example, a priori, For even moderate p, the usual choice of a ψ = b ψ = a ξ = b ξ = 1 (i.e., using uniform distributions for the inclusion probabilities) will lead to a prior distribution on q N that is imsart-ba ver.2014/10/16 file: JLAFAC_RG_1230_abel.tex date: March 31, 2019 heavily skewed towards 0. Such prior distribution would typically be unappealing since it would make it difficult to identify components that are idiosyncratic to the network, and therefore lead to an overestimation of the number of systemic components.
To address this issue, in our own data analysis we set a λ = b λ = 1, which ensures that the R r=1 γ r follows a uniform distribution on {0, . . ., R}, and then set the values of a ψ , b ψ , a ξ and b ξ so that a ψ = a ξ = b ψ = b ξ and the marginal distributions for q S , q N and q A are (approximately) identical.We evaluate the sensitivity to various choices of a ψ , b ψ , a λ , b λ in Section 4.4.

Simulation study
We evaluate the performance of our model using five simulated datasets, each generated under a different scenario.The purpose of this simulation study is fourfold: (a) to assess predictive performance of JLAFAC in terms of link and nodal attribute predictions, (b) to investigate the ability of our model to identify dependence and independence relationships in the data, (c) to assess the impact of hyperparameters (and, in particular, of the constant v 0 , a ψ and a λ used in our variable selection prior) on model performance, and (d) to evaluate the model performance under a misspecified setting.For each of datasets 1, 2 and 5 we fitted JLAFAC using R = 5 latent dimensions, JLAFAC is fitted with R = 3 for dataset 3, while for dataset 4 JLAFAC was fitted with R = 10 latent dimensions.All posterior inferences are based on 4,000 samples from the MCMC iterations obtained after a burn-in period of 10,000 iterations and thinning the chain every 10 samples.Convergence is assessed by monitoring the behavior of the likelihood function and the posterior distributions of q N , q S , q A and R * .On the other hand, overall mixing, monitor the effective sample size out of the 4, 000 thinned post burn-in samples for all parameters.Figure 1 presents boxplots of effective sample sizes (ESS) for ψ k,r , ξ k,r 's over k = 1, ..p; r = 1, ..., R. Since in each simulation there are only a few λ r 's, r = 1, ..., R, boxplots for effective sample sizes of λ r 's are not presented.In fact, the average ESS for λ r 's for scenario 1-5 are given by 1851,1650,2915,1436,3203 respectively.These results suggests a good performance of the algorithm in terms of mixing.
In our first simulation scenario the true dimension of the latent space is R * = 5, the number of attributes is p = 5, sample size n = 40 and number of time points T = 10.The true vector of eigenvalues takes the form λ = (0, 0, 0, λ 4 , λ 5 ), while the matrices of selection coefficients take the form Note that, in this case, we have q S = 2, q N = 0 and q A = 3.For our second scenario we again set the true dimension of the latent space to R * = 5, the number of attributes is imsart-ba ver.2014/10/16 file: JLAFAC_RG_1230_abel.tex date: March 31, 2019 p = 5, n = 40 and T = 10.However, in this case, the true vector of eigenvalues takes the form λ = (0, 0, 0, 0, λ 5 ) and the selection matrices are of the form In this case q S = 0, q N = 1 and q A = 4, i.e., the network and the nodal attributes are independent in this dataset.Note that the second simulation study allows structural mismatch in terms of positioning of nonzero entries between Ψ and Ξ.We set up the third simulation study to allow for a large mismatch between the fitted dimension R = 10 and the true dimension of latent variables.Specifically, we set the true dimension of the latent space to R * = 3, the number of attributes to p = 5, n = 40 and T = 10.In this case the true vector of eigenvalues is λ = (λ 1 , λ 2 , λ 3 ) and we set , so that q S = 3, q N = 0 and q A = 0.
The fourth simulation evaluates the performance of the JLAFAC model when both the sample size and the number of time points is moderately large.Specifically, both the true dimension R * and the fitted dimension R is set to be 3.The number of attributes is p = 5, the number of network nodes n = 100 and the number of time points T = 100.In this case the true vector of eigenvalues is λ = (0, λ 2 , λ 3 ) and we set , so that q S = 2, q N = 0 and q A = 1.The fifth and final simulation scenario aims at addressing performance of the JLAFAC model under a misspecified setting.In this simulation, the nodal attributes are simulated from (2.3) with the same selection coefficient matrices Ψ and Ξ used to simulate dataset 1.However, in this case the networks is simulated conditionally on the nodal attributes usoing a temporal exponential random graph model (TERGM) that includes two sufficient statistics, S 1 (Y l , Z l ) = n i=1 ( n j=1 y i,j,l )z i,1,l z i,2,l and S 2 (Y l , Z l ) = n i,j=1 y i,j,l with the corresponding coefficients θ 1 = l/T, θ 2 = 0.5.The network data is thus simulated from a totally misspecified TERGM with complex higher order dependence between the network and nodal attributes.This simulation sets p = 5, n = 40 and T = 10.We fit the JLAFAC model with R = 5 in this simulated data.In scenario 1-4 we generate the non-zero entries of λ, imsart-ba ver.2014/10/16 file: JLAFAC_RG_1230_abel.tex date: March 31, 2019 0.4 1.0 0.25 1.0 0.35 1.0 0.7 0.5 0.1 0.25 0.3 0.6 Table 1: True value of the hyperparameters used to generate data for our simulation study.

Link and attribute prediction
In order to evaluate the ability of the model to predict links, we carry out an outof-sample cross-validation exercise.More specifically, we randomly select 400 dyads to hold-out as a validation set and then estimate our model treating these dyads as if they were missing at random. Figure 1 shows the area under the ROC curve (AUC) of JLAFAC for the observations in the validation set.In order to assess the value of jointly modeling nodal attributes and network features, we also present the results generated by fitting only the components of the model associated with the network data, referred to as the marginal model.It is evident that in simulations in which network and attributes are associated in the true data generation mechanism (scenarios 1, 3, 4 and 5), joint modeling has clear advantages in terms of predicting missing links.This can be attributed to the fact that the shared latent factors in such cases are estimated based on both the relational and nodal attribute data, leading to more accurate estimation.This is true even in the case in which the true data generating mechanism is different from our model (scenario 5).On the other hand, the predictive performance of both models is essentially identical when q S = 0, as would be expected.
In addition to the marginal model as a competitor to JLAFAC, we also implement logistic regression of y i,j (t l ) on the (p + 1) 2 covariates obtained by taking the nodal attributes corresponding to the ith and jth node along with the all possible interactions between these nodal attributes, [1, z i,l , z j,l , z i,l ⊗ z j,l ], where z i,l = (z i,1 (t l ), ..., z i,p (t l )) and z j,l = (z j,1 (t l ), ..., z j,p (t l )) .This approach exhibits performance similar to a random classifier in all three simulation studies and is thus omitted from the comparison.
We also carry out prediction of attribute values based on both JLAFAC and a dynamic factor model that ignores the relational information.Table 2 presents the mean squared error (MSE) values obtained under each model and for each scenario.Again, the results suggest that jointly modeling both sources of information leads to much improved predictions of nodal attributes in scenarios 1, 3, 4 and 5, and that those advantages disappear in scenario 2 where the nodal attributes evolve independently from the relational data.1.57 1.17 0.09 0.08 0.66 Dynamic Factor 3.38 1.04 0.13 0.59 0.74 Table 2: Mean squared error (MSE) for the two competitors in simulation cases 1 to 5. MSE is calculated as the mean absolute squared deviation of predicted and true values of all the predictors.

Patterns of association between network and nodal attributes
Prior and posterior distributions for q S , q N , q A and R * = q S + q N + q A under JLAFAC for each of the three simulation scenarios are recorded in Table 3.In scenario 1 and 4 the model is capable of identifying both the right dimension of the latent space, the breakdown into systemic and idiosyncratic components, and thereby the lack of independence between the network and attribute data.In scenario 2 the model also is able to correctly assess the dimension of dimensions of the latent space, and thereby the independence of network and attribute data, In scenario 3 the model is able to estimate q S and q N accurately, although q A (and therefore R * ) is slightly overestimated.In spite of the misspecification in scenario 5, our model is still able to provide moderate evidence of association between the network structure and the attributes (P (q S > 0) ≈ 0.65), as well as evidence of residual correlation between the attributes (P (q A > 0) ≈ 1).
Finally, Figure 2 shows the posterior distributions for q S,1 , . . ., q S,p on each of the three simulation scenarios.Again, the model is capable of correctly identifying the number of systemic components associated with each attribute for scenarios 1, 2, 3 and 4. The results are also presented for scenario 5, which shows P (q S,1 > 0) ≈ 0.27, P (q S,2 > 0) ≈ 0.51, P (q S,1 > 0) ≈ 0.15, P (q S,1 > 0) ≈ 0.12, P (q S,1 > 0) ≈ 0.21.Thus the model correctly identifies the association between attribute 2 and the network and no association between nodal attributes 3,4,5 with the network.For nodal attribute 1, the model detects a weak association with the network.Overall, it does a fair job even under a complex mis-specified setting.

Computational Complexity
The computational complexity of the proposed model is dominated by the complexity of updating u i,1 , . . ., u i,T , v i,1 , . . ., v i,T , α k,1 , . . ., α k,T and β k,1 , . . ., β k,T for k = 1, ..., p and i = 1, ..., n However, we note that u i and v i , i = 1, ..., n can be sampled in parallel over all network nodes.Similarly, α k and β k , k = 1, ..., p can be updated in parallel over nodal attributes.Hence, parallelization allows us to supports data with moderately large.Furthermore, the choice of an OU process for modeling the dynamics of the latent factors allows us to use a forward-filtering, backward-sampling algorithm to jointly sample parameters over time (e.g., see Frühwirth-Schnatter, 1994).Hence, the Gibbs sampling sampling algorithm can support inference for a moderately-sized dataset.To illustrate this, per iteration run times are provided in Table 4 for different choices of n and T assuming parallelizing computation over multiple servers.In Section 6 we offer more discussion on several other strategies to ease computation for very large n and T and for general Gaussian process priors on the latent functions.

Sensitivity Analysis
All of the results presented above were obtained after fixing v 0 = 100.Also as mentioned before, we choose a ψ and a λ so as to ensure approximately same distribution for q S , q A , q N , q O a priori.We investigated the sensitivity of the posterior means of q S , q N , q A by rerunning the model with v 0 = 1000, 10000, a ψ = b ψ = 2, 6, 10 and a λ = b λ = 1, 3, 6. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q A1 A2 A3 A4 A5 0.0 0.2 0.4 0.6 0.8 1.0 Posterior Probabilities q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q A1 A2 A3 q q q q q q q q q q q q q q q q q q q A1 A2 A3 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q A1 A2 A3 A4 A5 0.0 0.2 0.4 0.6 0.8 1.0 Posterior Probabilities q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q A1 A2 A3 A4 A5 0.0 0.2 0.4 0.6 0.8 1.0 Posterior Probabilities (e) Scenario 5 Figure 2: Posterior distribution for q S,1 , . . ., q S,p under each of our five simulation scenarios.Filled bullets indicate the true value of the summary.Ak denotes the kth attribute.
For simulation 5, the fitted model is different from the true model.Thus the truth is not specified.
fined by linking two countries if there was at least one "positive verbal action" during the first week of the year (Goldstein, 1992).Examples of positive verbal actions include granting diplomatic recognition and positive verbal support in international forums.These data on international relations is available from http://www.stat.washington.edu/people/pdhoff/Code and was previously studied in Hoff (2015).In addition to the relational data, we consider the evolution of three country-specific economic indicators, which are modeled using a conditionally Gaussian model: the growth rate of Gross Domestic Product (GDP) Per Capita, the level of National Savings as a percentage of GDP, and the amount of International Trade as a percentage of GDP.Data on these indicators are available from the World Development Indicators Database, which is published annually by the World Bank.For this dataset we estimate JLAFAC using R = 10 latent dimensions.All inferences presented below are based on 50, 000 samples from our MCMC algorithm obtained after a burn-in period of 10, 000 samples and thinning the chain every 10 iterations.
As in Section 4, we first evaluate the ability of JLAFAC to make out-of-sample predictions.In this case we hold out 500 randomly-chosen dyads, whose values are predicted using the rest of the data.Figure 4 presents the ROC curve and the AUC value under both JLAFAC and a dynamic network model that ignores the attribute data (this comparison is similar to the one we carried out in Section 4).As in the simulation study, jointly modeling the relational and nodal attribute data provides some improvement in the predictions, although the improvement in this case is relatively minor.We also calculate the RMSE of estimating the predictors which turns out to be 0.065 as compared to the value of 0.147 for the marginal model.
Next we investigate the pattern of association between the network and country specific attributes in the data.Table 5 presents the prior and posterior distributions for q S , q N , q A and R * = q S + q N + q A .It is evident from the Table that the data favors the use of 7 latent dimensions.Furthermore, the model provides strong evidence for the presence of at least one systemic component (P (q S > 0 | data) ≈ 0.95), suggesting that the network structure and the nodal attributes are indeed associated.To investigate the pattern of association in more detail we present in Figure 4 the posterior distributions for q S,1 , q S,2 and q S,3 , where subscripts 1,2,3 correspond to growth of GDP per capita, ratio of trade to GDP and ratio of savings to GDP respectively.Note that while there is strong evidence that the ratio of trade to GDP and GDP growth per capita are associated with the structure of international relations (P (q S,1 > 0 | data) ≈ 0.65 and P (q S,2 > 0 | data) ≈ 0.98), the evidence for an association with national savings is weak (P (q S,3 > 0 | data) ≈ 0.40, respectively).A review of the literature suggests that our results are consistent with previous findings.Indeed, there is a well established relationship between international trade and conflict, which was first highlighted in the classic paper by Polachek (1980) (see also Reuveny and Kang, 1996, Morrow, 1999and Reuveny, 2000).Similarly, there is some evidence in the literature for an association between GDP growth and international conflict (e.g., see Rodrik, 1999, Anderton et al., 2003, Miguel et al., 2004and Polachek and Sevastianova, 2012), although the evidence is highly disputed and the direction of the causal effect often unclear.On the other hand, as far as we could find, an association between international conflict and the level aggregate national savings has not been proposed or reported in the literature.q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q GDP Trade Savings 0.0 0.2 0.4 0.6 0.8 1.0 Posterior Probabilities  JLAFAC and a dynamic network model that ignores information about network attributes in the international relations data.Solid and dashed curves represent ROC curves for JLAFAC and dynamic network model without network attributes respectively.Figure in the bottom shows posterior distributions for q S,1 , q S,2 and q S,3 for the international relations data.

Conclusion
This article introduces the idea of jointly modeling of network and associated nodal attributes over time.Our proposed framework relied on modeling the network and nodal attributes jointly through latent factor representations, with some latent factors shared across both models to introduce dependence of directional dyad between nodes on nodal attributes.Evolution of both network and nodal attributes over time is modeled by allowing the latent factors to vary over time, and inference is carried out using a Bayesian approach.
Because of the application that motivated our work we have assumed that the network process Y (t) is directed and binary.The model can easily be reformulated to accommodate undirected networks by replacing equations (2.1) and (2.2) with y i,j (t) ∼ Ber (θ i,j (t)) , θ i,j (t) = G µ(t) + R r=1 λ r u i,r (t)u j,r (t) , and ψ k,r α k,r (t)u i,r (t) , respectively.Similarly, situation in which y i,j (t) belong to other members of the exponential family can be easily accommodated through appropriate generalized linear factor analysis models.
An important feature of our model is its ability to yield dependence / independence among network and attributes.While our simulations suggests that these test might be somewhat robust to model misspecification (recall the results from scenario 5 in our simulation study), it is important to remember that all of our tests are conditioned on the specific model structure we have assumed.In particular, while the eigenvalue model for networks that we employ here is quite flexible, there is a possibility that it might not be able to capture all structural features of the network that affect the attributes.Similarly, our model does not incorporate lags of the latent process.While it is important to keep these issues in mind, we should note the dependence on model assumptions is a common limitation of most statistical approaches.For example, tests of conditional independence for Gaussian graphical models are valid only if the data is actually Gaussian.
Although the article presents an simulation study of the proposed approach for moderately large network and time points, one important extension of the proposed model would be to scale it for larger datasets with millions of individuals connected under social networking and observed over a large number of time points.Since OU process is a variant of the Gaussian process and there is an emerging literature on modeling Gaussian processes for big data, we intend to employ such computationally convenient models to specify correlations across large number of time points.On the other hand, a potential issue in extending these models for massive networks would be to estimate a large number of latent variables.We propose to adapt strategies described in Guhaniyogi et al. (2017); Guhaniyogi and Banerjee (2018); Heaton et al. (2017) for efficient estimation of large number of correlated latent variables.It is also interesting to extend our methodology for online social networks.Some of these constitutes our current work.

Figure 1 :
Figure 1: Figures in the first row presents boxplots of Effective Sample Sizes (ESS) for scenario 1-5 for ψ k,r 's and ξ k,r 's.The figures have horizontal dashed lines at 4000 to indicate that the ESS are calculated on 4000 post burn-in thinned MCMC samples.Second row presents Area Under the ROC Curve (AUC) for our out-of-sample crossvalidation in five different simulation scenarios.Continuous curves correspond to the results under our JLAFAC model, while the dashed curves come from a marginal model that does not incorporate the attribute data.

Figure 4 :
Figure 4: Figure in the top presents ROC curves and corresponding AUC values forJLAFAC and a dynamic network model that ignores information about network attributes in the international relations data.Solid and dashed curves represent ROC curves for JLAFAC and dynamic network model without network attributes respectively.Figure in the bottom shows posterior distributions for q S,1 , q S,2 and q S,3 for the international relations data.