A Latent Shrinkage Position Model for Binary and Count Network Data

Interactions between actors are frequently represented using a network. The latent position model is widely used for analysing network data, whereby each actor is positioned in a latent space. Inferring the dimension of this space is challenging. Often, for simplicity, two dimensions are used or model selection criteria are employed to select the dimension, but this requires choosing a criterion and the computational expense of fitting multiple models. Here the latent shrinkage position model (LSPM) is proposed which intrinsically infers the effective dimension of the latent space. The LSPM employs a Bayesian nonparametric multiplicative truncated gamma process prior that ensures shrinkage of the variance of the latent positions across higher dimensions. Dimensions with non-negligible variance are deemed most useful to describe the observed network, inducing automatic inference on the latent space dimension. While the LSPM is applicable to many network types, logistic and Poisson LSPMs are developed here for binary and count networks respectively. Inference proceeds via a Markov chain Monte Carlo algorithm, where novel surrogate proposal distributions reduce the computational burden. The LSPM's properties are assessed through simulation studies, and its utility is illustrated through application to real network datasets. Open source software assists wider implementation of the LSPM.


Introduction
Network data are a collection of interconnected objects which are usually represented formally using graph theory. The objects are often called nodes while their connections to each other are called edges. Living in an increasingly connected world, network data have garnered increased interest in recent years. Social network data are a well known example where friendships [Liu and Chen, 2021], competitions [D'Angelo et al., 2019], companies [Ryan et al., 2017], households [Fosdick et al., 2019], and other social relations are exhibited between individuals (or actors) and can be modelled to understand how people interact in different situations. Beyond that, the network-based perspective has found useful applications in complex systems across various fields including computational biology such as protein-protein interactions [Chen et al., 2019]; neuroscience on the topological properties of brain connectomes [Yang et al., 2020]; economy on understanding trades between countries [Sajedianfard et al., 2021]; education on online problem based learning [Saqr and Alamro, 2019]; and public health on studying the spread of infectious disease [Jo et al., 2021].
There are many types of models for network data. Most are variants of the random graph model [Erdős and Rényi, 1959, Gilbert, 1959], the stochastic blockmodel [Holland et al., 1983], or the latent position model [LPM, Hoff et al., 2002]. The LPM has received much attention as it provides a meaningful visualisation of the data and rich qualitative information [Ma et al., 2020, Tafakori et al., 2021]. The LPM postulates that the nodes have positions in a latent space, and that the observed edge formation process is explained through the distance between the nodes' latent positions. In this way, important features such as transitivity, reciprocity, and homophily are easily accounted for in the model, which captures local and global structures [Kim et al., 2018, Smith et al., 2019].
The number of dimensions of the latent space, p, in the LPM is usually unknown and needs to be inferred from the data. Typically, p is fixed at two for easy visualisation and interpretation [Sewell and Chen, 2016, Zhang et al., 2020, Liu and Chen, 2021]. However, this is an arbitrary choice and may result in an incomplete or overly complex description of the network. The estimation of p has received some attention in the literature. Attempts to tackle the dimension selection issue have used model selection tools such as a variant of the well-known Bayesian information criterion [BIC, Handcock et al., 2007], the Akaike information criterion [Murphy, 2010, Sewell, 2021], the deviance information criterion, and the Watanabe-Akaike information criterion [Ng et al., 2021, Sosa and Betancourt, 2022]. Models with different numbers of dimensions are fitted and then compared using the chosen model selection criterion. However, the selected criterion may not be formally correct for choosing p [Handcock et al., 2007]. Furthermore, fitting a range of models, each with a different value of p, can incur a large computational cost and restrict the set of models considered.
An alternative approach to the dimension selection problem is to have an automatic process that infers the optimal dimension of the latent space from the data. This type of process has been used in areas such as factor analysis using shrinkage priors [Bhattacharya and Dunson, 2011, Durante, 2017, Murphy et al., 2020], or spike and slab distributions [Legramanti et al., 2020]. In network analysis, within a stochastic blockmodel setting, automatic dimension estimation has been used in Yang et al. [2020] in a frequentist framework while Passino and Heard [2020] used a Bayesian framework.
In the context of the LPM, a number of proposals for automatic inference of the dimension of the latent space using shrinkage priors have been suggested. Rastelli [2018] develops an approach to automatically determine the number of dimensions by creating a finite mixture of unidimensional LPMs and utilising a Dirichlet shrinkage prior on the mixing proportions. The key idea is to intentionally overfit the number of dimensions by considering a relatively large number of mixture components and then empty the superfluous components using the shrinkage prior during inference. The approach shows good performance in recovering the true dimension with small networks but tends to overestimate the number of dimensions when the number of nodes is large. Also, the approach is less interpretable than the original LPM as the Euclidean distance between nodes across dimensions has no relevant meaning. Durante and Dunson [2014] employ the shrinkage prior of Bhattacharya and Dunson [2011] when considering the dimension of the latent space in the projection model formulation of the LPM for a dynamic binary network; the distance model formulation of the LPM is considered here. A similar shrinkage approach has also been used in the context of a population of networks.
Here the latent shrinkage position model (LSPM), which facilitates automatic inference on the dimensionality of the latent space, is introduced. The LSPM is a Bayesian nonparametric model that theoretically allows infinitely many dimensions. This is established through a shrinkage process prior inspired by Bhattacharya and Dunson [2011]. The key idea is that the variance of the latent positions on each dimension becomes increasingly small as the number of dimensions grows. The informative, effective dimensions are those that have non-negligible variance in the latent positions while the uninformative, negligible dimensions will have latent position variances that tend to zero. With the LSPM, the need to select a model selection criterion is obviated, only a single model needs to be fitted, and the LPM's ease of interpretation is retained. In addition, posterior uncertainty concerning the latent space dimension is automatically accounted for. The LSPM builds on Durante and Dunson [2014] and related shrinkage approaches by (a) considering binary and count valued networks with the development of the logistic LSPM and the Poisson LSPM respectively, (b) introducing a truncated version of the Bhattacharya and Dunson [2011] multiplicative gamma process prior, and (c) proposing a computationally efficient Markov chain Monte Carlo inference scheme through the use of surrogate proposal distributions.
The remainder of this article is structured as follows: Section 2 describes the LPM and the proposed LSPM. Section 3 outlines the inferential process. Section 4 contains simulation studies exploring the performance of the LSPM in terms of inferring the number of effective dimensions and estimating parameters across realistic settings. Section 5 applies the proposed LSPM to a range of networks, having different edge types and characteristics. Section 6 concludes the article and discusses some potential extensions. R [R Core Team, 2022] code with which all results presented herein were produced is freely available from the lspm GitLab repository.
The latent shrinkage position model

The latent position model
Network data typically take the form of an $n \times n$ adjacency matrix, $Y$, where $n$ is the number of observed nodes, with entries $y_{i,j}$ denoting the relationship between node $i$ and node $j$. To probabilistically model the presence or absence of an edge between two nodes, the latent position model [LPM, Hoff et al., 2002] assumes that the observed edge formation process can be explained in terms of the nodes' latent positions in a $p$-dimensional latent space. Under the LPM, edges are assumed independent, conditional on the latent positions of the nodes. Self-loops are not permitted and thus the diagonal elements of $Y$ are zero. The sampling distribution is
$$P(Y \mid Z, \alpha) = \prod_{i \neq j} P(y_{i,j} \mid z_i, z_j, \alpha),$$
where $Z$ is the $n \times p$ matrix of latent positions with $z_i$ denoting the latent position of node $i$, while $\alpha$ is a global parameter that captures the overall connectivity level in the network. A generalised linear model [McCullagh and Nelder, 1998] can be used to model the probability of a range of edge types between nodes as a function of their latent positions. Hoff et al. [2002] propose the distance model formulation of the LPM by considering the Euclidean distance between $z_i$ and $z_j$, the latent locations of nodes $i$ and $j$ respectively, i.e.,
$$g\left(E[y_{i,j} \mid z_i, z_j, \alpha]\right) = \alpha - \lVert z_i - z_j \rVert_2^2,$$
where $g$ is an appropriate link function. As in Murphy [2016] and D'Angelo et al. [2019], here the distance is taken to be the squared Euclidean distance. This model is particularly suitable for undirected networks or directed networks that exhibit strong reciprocity. The projection model formulation is an alternative LPM that considers the angle between nodes; Salter-Townshend et al. [2012] provide a comprehensive review.
The LPM has been predominantly used to model binary valued networks via the use of a logit link function
$$\log \frac{q_{i,j}}{1 - q_{i,j}} = \alpha - \lVert z_i - z_j \rVert_2^2,$$
where $q_{i,j}$ is the probability of forming an edge between node $i$ and node $j$. The probability of forming an edge between nodes when the distance between them is zero is determined by $\alpha$. There have been various extensions to this model, for example, by including covariates [Hoff et al., 2002] and random effects [Hoff, 2003], sender and receiver effects [Hoff, 2005], social reach [Sewell and Chen, 2015], and stochastic blockmodels [Ng et al., 2021]. While feasible, such extensions are not considered here in the interest of simplicity. In addition to binary valued networks, various link functions within the generalised linear model framework can be utilised to model alternative edge types. This includes, but is not limited to, the use of a Poisson distribution [Hoff, 2005] for modelling count edges, and a truncated normal distribution [Sewell and Chen, 2016] and an exponential distribution [Rastelli, 2018] for non-negative continuous edges. In addition to binary edges, the LPM for count edges is considered here, where $\log(\lambda_{i,j}) = \alpha - \lVert z_i - z_j \rVert_2^2$ and $\lambda_{i,j}$ is the rate parameter of the Poisson distribution for the edge between nodes $i$ and $j$. A similar interpretation to the binary case applies here: $\alpha$ determines the rate of forming an edge between a pair of nodes when the distance between them is zero, that rate being $\exp(\alpha)$.
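As a small illustration of the two link functions, the sketch below evaluates the edge probability (logistic case) and the edge rate (Poisson case) for a pair of latent positions. The function names are hypothetical and are not taken from the authors' software; the squared Euclidean distance is used, as in the text.

```python
import math

def squared_distance(z_i, z_j):
    """Squared Euclidean distance between two latent positions."""
    return sum((a - b) ** 2 for a, b in zip(z_i, z_j))

def edge_probability(alpha, z_i, z_j):
    """Logistic case: inverse logit of alpha minus the squared distance."""
    eta = alpha - squared_distance(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-eta))

def edge_rate(alpha, z_i, z_j):
    """Poisson case: exponential of alpha minus the squared distance."""
    return math.exp(alpha - squared_distance(z_i, z_j))

# Coincident nodes: the probability and the rate depend on alpha alone.
p_same = edge_probability(0.0, [0.3, -0.1], [0.3, -0.1])
rate_same = edge_rate(math.log(4.0), [1.0], [1.0])
# Nodes two units apart: the squared distance of 4 lowers the probability.
p_far = edge_probability(0.0, [0.0, 0.0], [2.0, 0.0])
```

With $\alpha = 0$ and zero distance the edge probability is one half, and with $\alpha = \log 4$ and zero distance the Poisson rate is 4, in line with the interpretation of $\alpha$ given above.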
Beyond different edge types, the LPM has been extended, for example, to facilitate clustering of nodes in a network [Handcock et al., 2007, Gormley and Murphy, 2010], to model dynamic networks [Sewell and Chen, 2015], and to model multiple networks [Gollini and Murphy, 2016, Salter-Townshend and McCormick, 2017, D'Angelo et al., 2019]. While extending the LPM has received much attention, attempts to objectively infer the latent space dimension have been few; subjectively choosing p = 2 or choosing between a set of possible dimensions via a selected model selection criterion are typical approaches. Here, the LSPM is proposed to address the inference of p in a Bayesian nonparametric framework using a shrinkage prior.

The latent shrinkage position model
The latent shrinkage position model (LSPM) is a LPM which employs a multiplicative truncated gamma process (MTGP) prior allowing the model to intrinsically infer the number of effective dimensions from the network data, where "effective dimensions" means the dimensions necessary to fully describe the network. The MTGP prior builds on the multiplicative gamma process (MGP) prior that originates from the infinite factor model literature [Bhattacharya and Dunson, 2011], where the MGP prior allows an infinite number of factors whose loadings are shrunk towards zero as the factor number increases. Legramanti et al. [2020] modified the MGP by employing a spike and slab distribution structure that increases the prior mass on the spike for later factors. Other popular shrinkage priors include the Dirichlet-Laplace [Bhattacharya et al., 2015], generalized double Pareto [Armagan et al., 2013], and the horseshoe [Carvalho et al., 2010] priors.
Under the LPM, the prior on the latent positions is typically assumed to be a zero centred Gaussian with equal variance across each of the $p$ dimensions. Under the LSPM, the latent positions are assumed to have a zero centred Gaussian distribution with diagonal precision matrix $\Omega$, whose entries $\omega_\ell$ denote the precision of the latent positions in dimension $\ell$, for $\ell = 1, \ldots, \infty$. The LSPM employs an MTGP prior on the precision parameters: the latent dimension $h$ has an associated shrinkage strength parameter $\delta_h$, where the cumulative product of $\delta_1$ to $\delta_\ell$ gives the precision $\omega_\ell$. An unconstrained gamma prior is assumed for the shrinkage strength parameter for the first dimension, while a truncated gamma distribution is assumed for the remaining dimensions to ensure shrinkage. Specifically, for $i = 1, \ldots, n$,
$$z_i \sim \text{MVN}(0, \Omega^{-1}), \quad \Omega = \text{diag}(\omega_1, \omega_2, \ldots), \quad \omega_\ell = \prod_{h=1}^{\ell} \delta_h, \quad \delta_1 \sim \text{Gam}(a_1, b_1), \quad \delta_h \sim \text{Gam}^{T}(a_2, b_2, c_2) \text{ for } h > 1. \tag{2}$$
Here $a_1$ and $b_1$ are the shape and rate parameters of the gamma prior on the shrinkage parameter of the first dimension, while $a_2$ is the shape parameter, $b_2$ is the rate parameter, and $c_2$ is the left truncation point (here set to 1) of the truncated gamma prior for dimensions $h > 1$. This MTGP prior results in an increasing precision and therefore a shrinking variance in the higher dimensions of the latent space. While Bhattacharya and Dunson [2011] state that $\omega_\ell$ stochastically increases under the restriction $a_2 > 1$, Durante [2017] shows that this does not guarantee shrinkage, and indeed that shrinkage does not occur when $a_1 = 1$ and $a_2 = 1.1$. While the reader is referred to Durante [2017] for guidelines to study the behaviour of the MGP under all possible combinations of $a_1$ and $a_2$, Durante [2017] theoretically and empirically suggests that using $a_1 = 2$ and $a_2 = 3$ induces posterior distributions with the desired characteristics, with improved performance when the true number of dimensions is low.
Although the ω ℓ will be stochastically increasing under such hyperparameter settings, this sometimes poses difficulties in the MCMC algorithm used for inference. In particular, δ h < 1 can be accepted in some cases, which increases the latent positions' variance on dimension h rather than decreasing it. To tackle this, here a truncated gamma distribution bounded between 1 and ∞ is assumed for the second and higher dimensions to ensure shrinkage across these dimensions. Nevertheless, the shrinkage parameter on the first dimension does not require the use of a truncated gamma distribution allowing it to have an unconstrained range and allowing the first dimension to encode as much information as required. The variance of the first dimension therefore determines the maximum possible variance for higher dimensions.
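To make the effect of the truncation concrete, the following sketch draws shrinkage strengths and precisions from the prior described above for a finite number of dimensions. It is an illustration only: the function names are hypothetical, the truncated gamma draw uses plain rejection sampling (a simple stand-in, not necessarily what the authors' software does), and the default hyperparameters are the values $a_1 = 2$, $a_2 = 3$, $b_1 = b_2 = 1$ and truncation point 1 quoted in the text.

```python
import random

def sample_truncated_gamma(shape, rate, lower, rng):
    """Draw from a gamma(shape, rate) restricted to [lower, inf) by rejection."""
    while True:
        draw = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
        if draw >= lower:
            return draw

def sample_mtgp_precisions(p, a1=2.0, b1=1.0, a2=3.0, b2=1.0, c2=1.0, rng=None):
    """Return (delta, omega) for p dimensions under the truncated prior."""
    rng = rng or random.Random()
    # First dimension: unconstrained gamma draw.
    delta = [rng.gammavariate(a1, 1.0 / b1)]
    # Higher dimensions: gamma draws truncated to [c2, inf).
    delta += [sample_truncated_gamma(a2, b2, c2, rng) for _ in range(p - 1)]
    # Precisions are the cumulative products of the shrinkage strengths.
    omega, running = [], 1.0
    for d in delta:
        running *= d
        omega.append(running)
    return delta, omega

delta, omega = sample_mtgp_precisions(p=6, rng=random.Random(1))
variances = [1.0 / w for w in omega]
```

Because every shrinkage strength after the first is at least 1, the precisions are non-decreasing and the variances non-increasing across dimensions in every draw, which is the property the truncation is designed to guarantee.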
Empirically (see supplementary material, Appendix C), while the use of the MGP rather than the MTGP prior tends to make little difference in terms of inferring the posterior mode of the number of dimensions, the MGP does tend to result in more diffuse posteriors, with the MTGP being more precise in its inference of the dimension and the variance parameters. Moreover, the MTGP prior is faithful to the inherent model principle of shrinking variance at higher dimensions. Furthermore, the MTGP prior induces an intuitively appealing decrease in the order of importance of subsequent dimensions, as is typical of dimension reduction methods.
Under this MTGP prior, the LSPM is nonparametric with infinitely many dimensions, where unnecessary higher dimensions' variances are increasingly shrunk towards zero. Dimensions that have variance very close to zero will then have little to no meaningful information encoded in them as the distances between nodes will be close to zero. Thus, the effective dimensions are those in which the variance is non-negligible.

Properties of the latent shrinkage position model
Given the use of the multiplicative truncated gamma process prior, it is of interest to explore the behaviour of distances in the latent space imposed by the LSPM. The full derivations for this section are given in the supplementary material in Appendix A.
The expected squared distance between nodes $i$ and $j$ within the $\ell$-th dimension is $2\omega_\ell^{-1}$ given its precision $\omega_\ell$. Taking the expectation of $\omega_\ell^{-1}$ under the MTGP prior, with $a_1 > 1$ and $b_2 = c_2 = 1$, gives
$$E[(z_{i\ell} - z_{j\ell})^2] = \frac{2 b_1}{a_1 - 1} \left[\frac{\Gamma(a_2 - 1, 1)}{\Gamma(a_2, 1)}\right]^{\ell - 1}, \tag{3}$$
where $\Gamma(\cdot, \cdot)$ is the upper incomplete gamma function. Since $\Gamma(a_2 - 1, 1) < \Gamma(a_2, 1)$, any dimension higher than $\ell = 1$ is expected to have a smaller distance. Increasing $a_1$ will result in the biggest decrease in all of the expected distances for $\ell = 1, \ldots, \infty$ as later dimensions are based on previous dimension(s). Figure 1, in which $a_1$ is set to 2, illustrates the typical behaviour of the expected squared distance between a pair of nodes. Increasing $a_2$ will decrease the gamma function ratio in (3) and thus decrease the expected squared distance as shown in Figure 1(a). For fixed $a_1$ and $b_1$, the minimum decrease in expected squared distance is governed by $\lim_{a_2 \to 0} \Gamma(a_2 - 1, 1) / \Gamma(a_2, 1) = 0.68$, which means the expected distance contribution from each higher dimension will be at least 32% less than that of the previous dimension.
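The expected squared distance between nodes $i$ and $j$ in the latent space is the geometric sum of (3) across all dimensions $\ell = 1, \ldots, p < \infty$, i.e.
$$\sum_{\ell=1}^{p} E[(z_{i\ell} - z_{j\ell})^2] = \frac{2 b_1}{a_1 - 1} \cdot \frac{1 - r^{p}}{1 - r}, \quad \text{where } r = \frac{\Gamma(a_2 - 1, 1)}{\Gamma(a_2, 1)}.$$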
Figure 1(b) shows the behaviour of $E[(z_i - z_j)^2 \mid \omega_1, \ldots, \omega_p]$ under different truncation levels while varying the hyperparameter $a_2$. As $a_2$ increases, the expected squared distance contribution from the higher dimensions gets increasingly small and encourages a smaller number of effective dimensions, which is notable at $a_2 > 6$, showing very small distance contributions from higher latent dimensions. The positive limit as $a_2 \to \infty$, alluded to in Figure 1(b), is due to the contribution of the expected squared distance from the first dimension.
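The behaviour described in this section can be checked numerically. The sketch below is a worked illustration under stated assumptions, not the authors' code: it assumes the expected squared distance in dimension $\ell$ equals $2 b_1/(a_1 - 1)$ times the ratio $\Gamma(a_2 - 1, 1)/\Gamma(a_2, 1)$ raised to the power $\ell - 1$ (with $a_1 > 1$ and $b_2 = c_2 = 1$), and it approximates the upper incomplete gamma function by Simpson's rule so that only the standard library is needed. All function names are hypothetical.

```python
import math

def upper_incomplete_gamma(a, x, upper=60.0, steps=6000):
    """Simpson's-rule approximation of the integral of t^(a-1) e^(-t) over
    [x, x + upper]; the neglected tail is negligible for moderate a."""
    h = upper / steps  # steps must be even for Simpson's rule

    def f(t):
        return t ** (a - 1.0) * math.exp(-t)

    total = f(x) + f(x + upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(x + k * h)
    return total * h / 3.0

def shrink_ratio(a2):
    """Ratio of upper incomplete gamma functions with truncation point 1."""
    return upper_incomplete_gamma(a2 - 1.0, 1.0) / upper_incomplete_gamma(a2, 1.0)

def expected_sq_distance(ell, a1=2.0, b1=1.0, a2=3.0):
    """Assumed expected squared distance contributed by dimension ell."""
    return 2.0 * b1 / (a1 - 1.0) * shrink_ratio(a2) ** (ell - 1)

ratio = shrink_ratio(3.0)            # a2 = 3
limit_ratio = shrink_ratio(1e-6)     # a2 close to 0
per_dim = [expected_sq_distance(ell) for ell in range(1, 6)]
total_p5 = sum(per_dim)              # truncation level p = 5
```

For $a_2 = 3$ the ratio is 0.4, so with $a_1 = 2$ and $b_1 = 1$ the per-dimension contributions are 2, 0.8, 0.32 and so on. As $a_2$ approaches 0 the ratio approaches roughly 0.68, matching the limit quoted above.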

Inference
The joint posterior distribution of the LSPM is
$$P(\alpha, Z, \delta \mid Y) \propto P(Y \mid \alpha, Z)\, P(\alpha)\, P(Z \mid \delta)\, P(\delta),$$
where $P(Z \mid \delta)$ and $P(\delta)$ are the prior distributions outlined in Section 2.2; for $\alpha$, a $N(\mu_\alpha = 0, \sigma^2_\alpha = 9)$ non-informative prior is assumed throughout, where $\mu_\alpha$ is the mean and $\sigma^2_\alpha$ is the variance. Markov chain Monte Carlo (MCMC) is employed here to draw samples from the joint posterior distribution. Although the MTGP prior is nonparametric, it is not implementable in practice without setting a truncation level, $p$, on the number of dimensions fitted. Thus, an adaptive Metropolis-within-Gibbs sampler is used to dynamically shrink or augment the truncation level (see Section 3.2). Details of the derivations of the full conditional distributions of the latent positions and parameters are given in the supplementary material, in Appendix B.
The MCMC algorithm proceeds by iterating the following steps for s = 1, . . . , S where s denotes the current iteration and S is the total number of iterations.
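1. Sample a value $\check{Z}$ from the $\text{MVN}(Z^{(s)}, k\,\Omega^{-1(s)})$ proposal distribution, where $k$ is a step size factor. Accept $\check{Z}$ as $Z^{(s+1)}$ with probability
$$\min\left\{1, \frac{P(Y \mid \check{Z}, \alpha^{(s)})\, P(\check{Z} \mid \Omega^{(s)})}{P(Y \mid Z^{(s)}, \alpha^{(s)})\, P(Z^{(s)} \mid \Omega^{(s)})}\right\}.$$
2. Sample a value $\check{\alpha}$ from an informed Gaussian proposal distribution and accept $\check{\alpha}$ as $\alpha^{(s+1)}$ following the Metropolis-Hastings (M-H) acceptance ratio. Details of this informed proposal distribution are given in Section 3.1.
3. Sample $\delta_1^{(s+1)}$ from its full conditional distribution
$$\delta_1^{(s+1)} \sim \text{Gam}\left(\frac{np}{2} + a_1,\; \frac{1}{2}\sum_{i=1}^{n}\sum_{\ell=1}^{p}\Big(\prod_{m=2}^{\ell}\delta_m^{(s)}\Big) z_{i\ell}^{2} + b_1\right),$$
where $z_{i\ell}$ is the latent position of node $i$ in dimension $\ell$.
4. Sample $\delta_h^{(s+1)}$, for $h = 2, \ldots, p$, from its full conditional truncated gamma distribution
$$\delta_h^{(s+1)} \sim \text{Gam}^{T}\left(\frac{n(p-h+1)}{2} + a_2,\; \frac{1}{2}\sum_{i=1}^{n}\sum_{\ell=h}^{p}\Big(\prod_{m=1,\, m \neq h}^{\ell}\delta_m^{(s^*)}\Big) z_{i\ell}^{2} + b_2,\; c_2\right),$$
where $s^* = s + 1$ for $m < h$ and $s^* = s$ for $m > h$.
5. Calculate $\omega_\ell^{(s+1)} = \prod_{h=1}^{\ell} \delta_h^{(s+1)}$ for $\ell = 1, \ldots, p$.

Since the likelihood function considers the Euclidean distances between the latent positions, it is invariant to rotation, reflection, or translation of the latent positions. Thus, to ensure valid posterior inference, as in Gormley and Murphy [2010], here a Procrustean transformation is performed which translates, reflects and rotates the configurations of the latent positions $Z^{(1)}, \ldots, Z^{(S)}$ to be as similar as possible to a reference configuration. This reference configuration is the configuration with the highest log-likelihood in the burn-in period of the MCMC chain. While this choice is arbitrary, it is somewhat irrelevant as the configuration is used only to identify the model.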
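The updates of the shrinkage strengths and precisions in the algorithm above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it assumes the conjugate gamma full conditionals that follow from the Gaussian prior on the latent positions (shape $a_1 + np/2$ for the first dimension and $a_2 + n(p-h+1)/2$ for dimension $h > 1$), it uses plain rejection sampling for the truncated gamma draws, and all function names are hypothetical.

```python
import random

def truncated_gamma(shape, rate, lower, rng, max_tries=100000):
    """Gamma(shape, rate) restricted to [lower, inf), by simple rejection."""
    for _ in range(max_tries):
        draw = rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale
        if draw >= lower:
            return draw
    raise RuntimeError("rejection sampler failed; a tail sampler is needed")

def update_shrinkage(Z, delta, a1=2.0, b1=1.0, a2=3.0, b2=1.0, c2=1.0, rng=None):
    """One Gibbs sweep over delta_1, ..., delta_p; returns (delta, omega)."""
    rng = rng or random.Random()
    n, p = len(Z), len(delta)
    col_ss = [sum(row[l] ** 2 for row in Z) for l in range(p)]  # sum_i z_il^2
    delta = list(delta)
    for h in range(p):  # h is zero-based here
        rate = 0.0
        for l in range(h, p):
            prod = 1.0
            for m in range(l + 1):
                if m != h:
                    prod *= delta[m]  # values for m < h are already updated
            rate += prod * col_ss[l]
        shape = n * (p - h) / 2.0  # n times the number of dimensions l >= h
        if h == 0:
            delta[h] = rng.gammavariate(a1 + shape, 1.0 / (b1 + 0.5 * rate))
        else:
            delta[h] = truncated_gamma(a2 + shape, b2 + 0.5 * rate, c2, rng)
    # Precisions are cumulative products of the updated shrinkage strengths.
    omega, running = [], 1.0
    for d in delta:
        running *= d
        omega.append(running)
    return delta, omega

# Illustrative latent positions whose spread halves in each dimension.
rng = random.Random(7)
sds = [1.0, 0.5, 0.25]
Z = [[rng.gauss(0.0, sd) for sd in sds] for _ in range(50)]
delta = [1.0, 1.0, 1.0]
for _ in range(20):
    delta, omega = update_shrinkage(Z, delta, rng=rng)
```

Rejection sampling is adequate here because the full conditionals place most of their mass above the truncation point; when they do not, a dedicated truncated gamma sampler would be the safer choice.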
As inference from the MCMC algorithm is sensitive to its initial values, they are initialised using the following approach. When $s = 0$:
1. Calculate the geodesic distances between the nodes in the network. The geodesic distance [Kolaczyk and Csárdi, 2020] is the length of the shortest path between two nodes, measured as the number of edges that must be traversed to move from one node to the other.
2. Apply classical multidimensional scaling (MDS) [Cox and Cox, 2001] to the geodesic distances of the network.
3. Apply K-means clustering [Wu, 2012] to the eigenvalues from the MDS and use the number of eigenvalues in the smallest cluster as the initial truncation level, $p_0$.
4. Set $Z^{(0)}$ to be the set of $n \times p_0$ positions from the resulting MDS coordinates.
5. Fit a standard regression model, with regression coefficients $\alpha$ and $\beta$, to the vectorised adjacency matrix, depending on the edge type, e.g., (a) for binary edges, fit a logistic regression model where $\log \text{odds}(y_{i,j} = 1) = \alpha - \beta \lVert z_i - z_j \rVert_2^2$, to obtain estimates $\hat{\alpha}$ and $\hat{\beta}$.
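The first initialisation step can be sketched with a breadth-first search from every node. The code below is an illustration only, with a hypothetical function name. It treats any non-zero entry of the adjacency matrix as an edge and returns infinity for unreachable pairs; how such pairs are handled before applying MDS is not stated in the text and would need a separate choice.

```python
from collections import deque

def geodesic_distances(adjacency):
    """All-pairs shortest path lengths (in edges) via breadth-first search."""
    n = len(adjacency)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                # Visit each neighbour the first time it is reached.
                if adjacency[u][v] and dist[source][v] == inf:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist

# Path graph 0 - 1 - 2 - 3 plus an isolated node 4.
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
D = geodesic_distances(A)
```

In this toy network nodes 0 and 3 are three edges apart, while node 4 is unreachable from every other node.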

An informed proposal distribution for α
To improve mixing and the speed of convergence of the Markov chain, an informed proposal distribution is used for α in the Metropolis-Hastings algorithm. This proposal distribution has parameters that are updated as the chain progresses, ensuring that it shadows the target distribution well. This is achieved by approximating a non-linear term in the loglikelihood function using a quadratic Taylor expansion, similar to Gormley and Murphy [2010]. Details of the proposal distribution for the Poisson LSPM are provided below; the derivation of the informed proposal distribution for the logistic LSPM is given in the supplementary material in Appendix B.2.
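For the Poisson LSPM, the loglikelihood is
$$\log L(\alpha, Z) = \sum_{i \neq j} \left[ y_{i,j} \left(\alpha - \lVert z_i - z_j \rVert_2^2\right) - \exp\left(\alpha - \lVert z_i - z_j \rVert_2^2\right) - \log(y_{i,j}!) \right],$$
the second term of which can be approximated by a quadratic Taylor expansion around $\alpha^{(s)}$ to give
$$\exp\left(\alpha - \lVert z_i - z_j \rVert_2^2\right) \approx \exp\left(\alpha^{(s)} - \lVert z_i - z_j \rVert_2^2\right) \left[1 + (\alpha - \alpha^{(s)}) + \tfrac{1}{2}(\alpha - \alpha^{(s)})^2\right].$$
Substituting this expression and completing the square gives an approximated likelihood function that is quadratic in $\alpha$, which, when combined with the normal prior on $\alpha$, suggests the use of a normal proposal distribution $N(\hat{\mu}_{\alpha,C}, \hat{\sigma}^2_{\alpha,C})$ with mean
$$\hat{\mu}_{\alpha,C} = \hat{\sigma}^2_{\alpha,C} \left[ \sum_{i \neq j} y_{i,j} - (1 - \alpha^{(s)}) \sum_{i \neq j} \exp\left(\alpha^{(s)} - \lVert z_i - z_j \rVert_2^2\right) + \frac{\mu_\alpha}{\sigma^2_\alpha} \right]$$
and variance
$$\hat{\sigma}^2_{\alpha,C} = \left[ \sum_{i \neq j} \exp\left(\alpha^{(s)} - \lVert z_i - z_j \rVert_2^2\right) + \frac{1}{\sigma^2_\alpha} \right]^{-1}.$$
The parameters of this informed proposal distribution depend on the current state of the chain and thus are automatically updated each iteration. This maintains the approximation of the target distribution across all iterations of the algorithm. A step size factor that multiplies the variance parameter of the proposal distribution is introduced to assist in achieving satisfactory acceptance and further improve mixing. In terms of computational gain when compared to using a random walk proposal distribution, in the case of a network with $n = 50$ and 2 latent dimensions, for example, using the informed proposal distribution reduced the thinning required by up to 25%.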
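The construction of the informed proposal for the Poisson case can be sketched in a few lines. The code below is an illustration with hypothetical names, not the authors' implementation. It assumes the proposal mean and variance obtained by expanding the exponential term of the Poisson loglikelihood to second order around the current value of $\alpha$ and combining the result with the normal prior; the inputs are flat lists over the same ordered node pairs.

```python
import math

def informed_alpha_proposal(alpha_s, y, sq_dist, mu_alpha=0.0, var_alpha=9.0):
    """Mean and variance of the Gaussian proposal for alpha (Poisson case).

    alpha_s : current value of alpha
    y       : edge counts, one per node pair
    sq_dist : squared latent distances for the same node pairs
    """
    sum_y = sum(y)
    sum_rate = sum(math.exp(alpha_s - d) for d in sq_dist)
    # Curvature of the quadratic approximation plus the prior precision.
    var_c = 1.0 / (sum_rate + 1.0 / var_alpha)
    mean_c = var_c * (sum_y - sum_rate * (1.0 - alpha_s) + mu_alpha / var_alpha)
    return mean_c, var_c

# Six coincident node pairs, each with a count of 4.
y = [4] * 6
sq_dist = [0.0] * 6
alpha_hat = math.log(4.0)

# Nearly flat prior: at the maximum likelihood value the proposal is centred there.
mean_c, var_c = informed_alpha_proposal(alpha_hat, y, sq_dist, var_alpha=1e12)
# Default N(0, 9) prior: the proposal mean is pulled slightly towards zero.
mean_prior, var_prior = informed_alpha_proposal(alpha_hat, y, sq_dist)
```

In this toy case the maximum likelihood value of $\alpha$ is $\log 4$, and the proposal centred at that value stays there, as expected of a local quadratic approximation. A step size factor, as described above, would simply multiply the returned variance before a draw is made.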

An adaptive MCMC sampler to infer the latent dimension
Inference through MCMC for the LSPM is implemented using an adaptive sampler. The sampler shrinks or augments the truncation level $p$, in order to have a finite number of active dimensions in each iteration of the MCMC chain. A similar sampler is used in Bhattacharya and Dunson [2011] and Murphy et al. [2020]. At the $s$-th iteration, adaptation occurs with probability $P(s) = \exp(-\kappa_0 - \kappa_1 s)$, with $\kappa_0 \geq 0$ and $\kappa_1 > 0$ chosen so that adaptation occurs often at the beginning of the chain but decreases exponentially fast as the chain grows. Here, $3 \leq \kappa_0 \leq 5$ and $\kappa_1 = 3 \times 10^{-5}$ were found to be useful and are used across the studies that follow. Adaptation only occurs after the burn-in period, in order to ensure targeting of the posterior distribution.
At an adaptation step, when $p > 1$, a reduction in the truncation level is based on a criterion which considers the cumulative proportion of variance that the dimensions $\ell = 1, \ldots, p - 1$ contain. If the dimensions up to the $\ell$-th cumulatively contain at least a proportion $\epsilon_1$ of the total variance, then the dimensions from $\ell + 1$ to $p$ add little information, and $p$ is reduced to $\ell$. In practice, $\epsilon_1 = 0.9$ has been found to work well. When the adaptation criterion for reducing $p$ is not met, an increase in $p$ is then considered by examining $\delta_p^{-1}$ and a threshold $\epsilon_2$. If $\delta_p^{-1} > \epsilon_2$, a new dimension is added, by sampling $\delta_{p+1}$ from the MTGP prior and the latent positions from a univariate zero mean Normal distribution with induced variance $\omega_{p+1}^{-1}$. Under the MTGP, $\delta_p^{-1} < 1$, and when $\delta_p^{-1} \approx 1$ shrinkage in the $p$th dimension is weak and more dimensions may be required to fully describe the data. Therefore $\epsilon_2 = 0.9$ is considered, which was found to work well in practice.
When $p = 1$, only an increase in $p$ is possible. In this case, we consider the proportion of latent positions that have absolute deviation from the mean that exceeds the 95% critical value of a standard Normal distribution. When this proportion is $\epsilon_3$ times greater than the expected 0.05 proportion, $p$ is increased to 2. Setting $\epsilon_3 = 5$ demonstrated good performance in simulation studies, with higher values of $\epsilon_3$ increasing the tendency to remain at 1 dimension.
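The adaptation rule for $p > 1$ can be sketched as below. This is an illustration with hypothetical function names and two assumptions that the text does not settle: the variance of dimension $\ell$ is taken to be the reciprocal of its precision, and the truncation level is reduced to the smallest $\ell$ that meets the cumulative variance criterion. The $p = 1$ case is omitted.

```python
import math

def adaptation_probability(s, kappa0=4.0, kappa1=3e-5):
    """Probability of adapting the truncation level at iteration s."""
    return math.exp(-kappa0 - kappa1 * s)

def propose_truncation(omega, delta, eps1=0.9, eps2=0.9):
    """Return the truncation level after one adaptation step (case p > 1)."""
    p = len(omega)
    variances = [1.0 / w for w in omega]
    total = sum(variances)
    cumulative = 0.0
    for ell in range(1, p):  # candidate levels 1, ..., p - 1
        cumulative += variances[ell - 1]
        if cumulative / total >= eps1:
            return ell  # dimensions ell + 1, ..., p add little information
    if 1.0 / delta[-1] > eps2:
        return p + 1  # weak shrinkage in the last dimension: add one
    return p

# Strong shrinkage after the first dimension: reduce to 1.
p_reduce = propose_truncation([1.0, 20.0, 400.0], [1.0, 20.0, 20.0])
# Weak shrinkage throughout: add a dimension.
p_grow = propose_truncation([1.0, 1.05, 1.1025], [1.0, 1.05, 1.05])
# Moderate shrinkage: keep the current level.
p_keep = propose_truncation([1.0, 1.5, 3.0], [1.0, 1.5, 2.0])
prob_start = adaptation_probability(0)
```

With $\kappa_0 = 4$ the adaptation probability starts at about 0.018 and decays slowly with the iteration number, so adaptations become progressively rarer as the chain grows.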
The hyperparameters κ 0 and a 2 can influence mixing in the adaptive MCMC sampler. When p is increased or decreased, it can take time for α to mix well as it compensates for the change in dimension. Small κ 0 increases the adaptation frequency, which can make achieving sufficient mixing of α challenging. Similarly, small a 2 encourages the addition of a dimension with variance similar to that of the pth dimension, which results in large distances between latent positions and α needs time to adjust. Conversely, an a 2 that is too large adds a dimension that has very little variance and encodes little information. The aforementioned 3 ≤ κ 0 ≤ 5 and a 2 = 3 were found to be a good combination in practice.
The adaptive sampler allows inference on the posterior distribution of the number of active dimensions p. The posterior mode p m is used as the estimate of the effective p, with credible intervals quantifying the associated uncertainty.

Posterior predictive checking
Posterior predictive checking is a useful way of assessing the fit of a model to the data. Any systematic differences between the networks simulated from the posterior predictive distribution and the observed network may indicate potential failings of the model [Gelman et al., 2013]. Sections 4 and 5 include posterior predictive checks to assess the fit of the LSPM to simulated and observed networks respectively. Samples drawn from the posterior predictive distribution are used to simulate replicate networks under the fitted model. These simulated networks are then compared to the observed network by checking similarity metrics, network properties, and distances between networks. The type of metrics used depends on the network type.

Binary valued networks
For binary networks, the similarity metrics considered here are the accuracy and the $F_1$-score between the observed network and the replicate networks drawn from the posterior predictive distribution. Accuracy is a measure of the correctly identified edges, i.e., accuracy = (TP + TN)/[n(n − 1)], where TP is the number of true positives and TN is the number of true negatives. The $F_1$-score is the harmonic mean of precision and recall, i.e., $F_1\text{-score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$, where precision = TP/(TP + FP) and recall = TP/(TP + FN), with FP the number of false positives, and FN the number of false negatives.
Network properties such as density and transitivity are also considered. A network's density is the ratio of the number of observed dyadic connections over the number of possible dyadic connections, while transitivity is three times the number of triangles divided by the total number of connected triples. These metrics assess if the LSPM captures network properties of the observed data.
The Hamming distance is also considered, measuring the normalised difference between the observed network and a replicate network. The Hamming distance is given by $\frac{1}{n(n-1)} \sum_{i \neq j} \lvert y_{i,j} - \tilde{y}^{(r)}_{i,j} \rvert$, where $\tilde{y}^{(r)}_{i,j}$ is the link between nodes $i$ and $j$ in the $r$th replicate network from the posterior predictive distribution.
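The binary network checks above can be computed directly from an observed and a replicate adjacency matrix. The sketch below is illustrative, with a hypothetical function name. It counts ordered off-diagonal pairs, consistent with the $n(n-1)$ denominator used for accuracy, and returns zero for precision, recall or the $F_1$-score when their denominators vanish (a convention chosen here, not taken from the text).

```python
def binary_fit_metrics(observed, replicate):
    """Accuracy, F1-score, Hamming distance and observed density."""
    n = len(observed)
    tp = tn = fp = fn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-loops are not permitted
            o, r = observed[i][j], replicate[i][j]
            if o and r:
                tp += 1
            elif not o and not r:
                tn += 1
            elif r:
                fp += 1
            else:
                fn += 1
    pairs = n * (n - 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / pairs,
        "f1": 2.0 * precision * recall / denom if denom else 0.0,
        "hamming": (fp + fn) / pairs,
        "density": (tp + fn) / pairs,
    }

observed = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
replicate = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
metrics = binary_fit_metrics(observed, replicate)
```

For binary edges the Hamming distance defined this way is one minus the accuracy, so the two metrics carry the same information on different scales.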

Count valued networks
As binary network measures are not directly applicable to count valued networks, the (log) frequencies of counts in replicate networks from the posterior predictive distribution are compared to those of the observed network. The mean absolute difference between replicate and observed counts is also considered.
In the simulation studies outlined in Section 4, the fit of the Poisson LSPM is also evaluated in terms of Euclidean distances for estimated and actual latent positions. Specifically, the ratios between the n(n − 1)/2 distances derived from the posterior mean latent positions and the distances derived from the true latent positions are computed. A distribution of these ratios tightly centered around 1 will indicate good model fit.

Simulation Studies
The performance of the LSPM is assessed on simulated data scenarios. The simulated data are generated by Bernoulli trials for each node pair based on the probabilities derived from the distances between the nodes' latent positions. For count valued edges, a Poisson distribution is employed instead of the Bernoulli.
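The data generating process just described can be sketched as follows. This is an illustration under assumptions, not the simulation code used for the studies: the network is taken to be undirected, the function names are hypothetical, and Poisson draws use Knuth's multiplication method, which is adequate only for small to moderate rates.

```python
import math
import random

def poisson_draw(rate, rng):
    """Knuth's multiplication method for a Poisson draw."""
    threshold, k, prod = math.exp(-rate), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_network(Z, alpha, edge_type="binary", rng=None):
    """Simulate an undirected network given latent positions Z and alpha."""
    rng = rng or random.Random()
    n = len(Z)
    Y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(Z[i], Z[j]))
            eta = alpha - d2
            if edge_type == "binary":
                # Bernoulli trial with inverse-logit probability.
                y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
            else:
                # Poisson count with rate exp(eta).
                y = poisson_draw(math.exp(eta), rng)
            Y[i][j] = Y[j][i] = y
    return Y

rng = random.Random(3)
Z = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.5)] for _ in range(10)]
Y_binary = simulate_network(Z, alpha=1.0, edge_type="binary", rng=rng)
Y_count = simulate_network(Z, alpha=1.0, edge_type="count", rng=rng)
```

For directed networks the inner loop would instead run over all ordered pairs, drawing each direction separately.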
The latent positions are simulated according to (2) with the shrinkage strengths being manually set to explore their effect across different settings. Hyperparameters are set as $\mu_\alpha = 0$, $\sigma_\alpha = 3$, $a_1 = 2$, $a_2 = 3$, $b_1 = b_2 = 1$. A total of 30 networks are simulated in each case. Different step sizes between 0.1 and 3 are used in the proposal distributions to ensure acceptance rates are within the 20% to 40% range. The MCMC chains are run for 1,000,000 iterations for Sections 4.1 and 4.3, with a burn-in period of 100,000 iterations and thinning every 1,500th iteration. In Sections 4.2 and 4.4, 500,000 iterations are considered, with a burn-in period of 50,000 iterations and thinning every 1,500th iteration for the logistic LSPM and every 1,000th iteration for the Poisson LSPM.
The simulation studies are structured as follows: Section 4.1 examines the impact of different initial truncation levels, p 0 . Section 4.2 assesses LSPM capability under different network sizes, n. Section 4.3 studies the performance of the logistic LSPM under different network densities; Section 4.4 explores the performance of the Poisson LSPM under different levels of overdispersion. Where relevant, violin plots are used to visualise posterior distributions for each of the 30 simulated networks. Red crosses or red lines within the violin plots represent the true values used to simulate the network. Throughout, the LSPM is compared with the LPM and posterior predictive checks are used to assess model fit. Additional results and posterior predictive checks for the simulation studies are available in the supplementary material, in Appendix D.

Study 1: initial truncation level
This section explores the effect of the initial number of latent dimensions p 0 on the inference of the number of active dimensions. Networks with n = 100 are generated with the true number of effective latent dimensions p * = 4 and shrinkage strengths of δ 1 = 0.5, δ 2 = 1.1, δ 3 = 1.05, and δ 4 = 1.15, which means that the second dimension has importance similar to the first. Here, α = 6, giving moderate network density (i.e. ≃ 20%) in the case of binary networks. Four initial truncation levels are considered: p 0 initialised as described in step 3 of Section 3 (termed 'auto') and p 0 = {2, 4, 10}, representing situations where the initial truncation has been underestimated, correctly specified, and overestimated, respectively. Across the simulated networks, 3 ≤ p 0 ≤ 6 was found under the 'auto' procedure, with p 0 = {4, 5} in the majority of cases. Figure 2 shows the posterior of the number of active dimensions p inferred from the 30 simulated networks. Under the 'auto' initialisation procedure, and when p 0 > p * , the posterior concentrates around the true number of dimensions. However, when p 0 < p * , the posterior tends to concentrate on dimensions lower than p * . Further results are summarised in Table 1, which shows that for all initial values p 0 , the 95% credible intervals include the true dimension p * . Across binary and count simulated networks, the proportion of times the posterior modal dimension p m = p * = 4 is greater than 0.63 under the 'auto' initialisation, and is greater than 0.53 when p 0 = 10, with a poorer proportion of 0.13 or less when initialising p 0 < p * . Table 1 also reports the Procrustes correlations between the true latent positions and the LSPM posterior mean positions, conditioned on p m ; there is good agreement across network types and initialisation strategies. These results suggest that the 'auto' approach or starting with a reasonably large p 0 is advisable for accurate inference on the number of dimensions.
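To see why these shrinkage strengths give the second dimension an importance similar to the first, the implied per-dimension variances can be computed. The sketch below assumes the usual multiplicative gamma process construction, in which the precision of dimension ℓ is the cumulative product of δ 1 , . . . , δ ℓ ; that form, and the helper name `dimension_variances`, are assumptions not restated in this section.

```python
import numpy as np

def dimension_variances(delta):
    """Variance of each latent dimension, ASSUMING precision_l = prod_{h<=l} delta_h."""
    return 1.0 / np.cumprod(np.asarray(delta, dtype=float))

delta = [0.5, 1.1, 1.05, 1.15]  # shrinkage strengths used in Study 1
variances = dimension_variances(delta)

# Latent positions with independent, zero-mean Gaussian dimensions.
rng = np.random.default_rng(1)
z = rng.normal(size=(100, len(delta))) * np.sqrt(variances)
```

Under this assumption the variances are roughly 2.00, 1.82, 1.73 and 1.51, so they decrease slowly across the four dimensions.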
Figure 3 illustrates the posterior distributions of the shrinkage strength parameters, with the number of dimensions truncated at 8 for visualization purposes. When p 0 = {4, 10} or is 'auto' initialised, the shrinkage parameter δ 5 tends to have a higher and more diffuse posterior than δ 4 , correctly indicating strong shrinkage after the 4th dimension. However, when p 0 = 2 < p * , the shrinkage parameters δ h for h > p 0 tend to be overestimated, causing excessive shrinkage below the true number of dimensions. When p 0 < p * , the adaptive sampler can struggle to increase the dimension, with the shrinkage prior hyperparameter being very influential. These results support the suggestion that the 'auto' approach or starting with a reasonably large p 0 is advisable for accurate inference on the number of dimensions.
Posterior distributions for the intercept parameter α, conditional on different numbers of active dimensions p = {2, 3, 4, 5}, are illustrated in Figure 4. When p = p * = 4, for any p 0 , α is accurately inferred, with more diffuse posteriors in the case of binary networks than for count networks. However, when p ≠ p * , the α parameter compensates for the additional or missing dimensions: when p < p * underestimation of α occurs, due to the missing contribution from the Euclidean distances in the omitted dimensions. Conversely, when p > p * , α is overestimated to account for the additional distance contributed by the extra dimensions, but with a smaller bias than when p < p * .
Posterior predictive checks under the logistic LSPM for binary simulated networks initialised using the 'auto' approach and p 0 = {2, 4, 10} are shown in Figure  5. A number of LPM models were fitted using the latentnet R package [Krivitsky and Handcock, 2020] with p = {1, . . . , 8}; the BIC suggested p = 4 dimensions as optimal. The logistic LSPM fitted well and similarly across p 0 ≥ p * while model fit was poorer across all metrics considered when p 0 < p * . The LPM with p = 4 also fitted well, but required multiple models to be fit and the use of a model selection criterion. Figure 6 shows the posterior predictive checks for the Poisson LSPM fitted to simulated count networks. Similar behaviour to the binary case is observed with poor model fit when p 0 < p * and improved fit when p 0 ≥ p * : mean absolute count differences are small and ratios of pairwise distances between true and inferred latent locations are centred around 1. There is good agreement between the log frequency of counts in posterior predictive networks and the observed counts, with p 0 = 2 having larger uncertainty.
In terms of computational cost, fitting the logistic LSPM on a computer with an i7-10510U CPU and 16GB RAM took on average 40 minutes for a simulated binary network with n = 100. For comparison, fitting a single p = 4 LPM with the latentnet [Krivitsky and Handcock, 2020] R package, with default settings but the same burn-in period and thinning as the LSPM, took on average 37 minutes. However, fitting a number of LPMs, each with a different p, and choosing between them using the BIC is required in the LPM setting, thus incurring additional computational cost.

Table 1: For different initial truncation levels, p 0 , the posterior mode p m of the number of active dimensions, the proportion of the 30 simulated networks for which p m = p * = 4, and the Procrustes correlation of the 30 simulated networks' latent positions with the posterior mean positions. The 95% credible intervals are given in the brackets.

Study 2: network size
Inference on p m , the posterior mode of p, and the Procrustes correlations between inferred and true latent locations as n varies are detailed in Table 2. The posterior mode p m is accurate across different n, while the 95% credible intervals' upper bounds tend to increase as n increases. This affects the proportion of times the posterior mode is equal to the optimal p * = 2. Networks with larger n are characterised by larger uncertainty on the dimension of the latent space, as the sampler tends to explore higher dimensional solutions. In the majority of cases where p m is not 2, the posterior mode indicates 3 dimensions. The Procrustes correlations between true and posterior mean latent positions, conditioned on p m , are increasingly accurate and precise as n increases, with higher values for count than binary networks. As n increases, posterior predictive checks under the LSPM indicate improved model fit (see supplementary material, Appendix D).
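A Procrustes correlation between two configurations can be computed as sketched below. This uses one common definition, from symmetric Procrustes analysis: centre both configurations, scale them to unit Frobenius norm, and sum the singular values of their cross-product. Whether this matches the exact definition used for the reported values is an assumption, and `procrustes_correlation` is a hypothetical helper.

```python
import numpy as np

def procrustes_correlation(x, y):
    """Symmetric Procrustes correlation between two n x p configurations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = x - x.mean(axis=0)   # remove translation
    y0 = y - y.mean(axis=0)
    x0 = x0 / np.linalg.norm(x0)  # remove scale (unit Frobenius norm)
    y0 = y0 / np.linalg.norm(y0)
    # The optimal rotation/reflection attains the sum of singular values.
    s = np.linalg.svd(x0.T @ y0, compute_uv=False)
    return float(s.sum())

rng = np.random.default_rng(2)
z_true = rng.normal(size=(30, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
z_rotated = 2.5 * z_true @ q + 1.0            # rotated, scaled and shifted copy
r_same = procrustes_correlation(z_true, z_rotated)
r_noise = procrustes_correlation(z_true, rng.normal(size=(30, 3)))
```

A configuration that differs from the truth only by rotation, scaling and translation has correlation 1, while unrelated configurations give values well below 1.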

Study 3: density in binary networks
In the context of binary networks, the impact of network density on the performance of the LSPM is explored. Networks with n = 50, p * = 3, δ 1 = 0.5, δ 2 = 1.1, and δ 3 = 1.05 are generated. A range of α values between 0 and 30 is used to simulate networks with densities ranging from 2% to 99%. Table 3 reports the posterior mode of p and Procrustes correlations between inferred and true latent positions under different α values. Under the LSPM, the accuracy of p m increases as the network density increases from 2% to 65%, but accuracy deteriorates for denser networks with density > 79%. Figure 7(a) illustrates the effect of different network densities on the estimation […] Figure 7(b) where the Procrustes correlations decrease for large values of α, the true latent positions are difficult to recover since their influence on the link probabilities is small in comparison to α when α is large. Similarly, the Procrustes correlations decrease for lower values of α as there are many possible locations that will produce a mostly unconnected network. In summary, the logistic LSPM performs best for networks of moderate density. Additional posterior distributions of dimensions, variances and shrinkage strength parameters, and posterior predictive checks, are available in the supplementary material, in Appendix D.

Study 4: overdispersion in count networks
Count network data are often characterised by overdispersion, whereby the variance of the counts is greater than their mean. Here, the impact on the performance of the Poisson LSPM of different levels of overdispersion in count network data is examined by varying the α and δ parameters. The first set of 30 networks are simulated with α = 0.5 and δ 1 = δ 2 = 1.5 giving mean counts between 0.45 and 0.60 and variance between 0.65 and 0.85 (low overdispersion). Another 30 more overdispersed count networks are generated with α = 1.5, δ 1 = 0.5, and δ 2 = 1.5 giving mean counts between 0.5 and 0.7 and variance between 1.4 and 2.0 (moderate overdispersion). The final set of 30 highly overdispersed count networks are generated with α = 5, δ 1 = 0.1, and δ 2 = 1.5 giving mean counts between 3 and 6 and variance between 220 and 420 (high overdispersion). Table 4 reports on the posterior mode of p and on the Procrustes correla-

Illustrative applications
The logistic and Poisson LSPMs are fitted to a range of real network datasets with different characteristics: the well-known Zachary karate club binary network which has a small number of nodes, a cat brain connectivity binary network with moderate number of nodes and density, a relatively large but sparse worm nervous system binary network, and an overdispersed phone calls count network. In all cases hyperparameters, initial values, and step sizes are set as in Section 4.

Zachary karate club binary network data
The well-known Zachary's karate club network [Zachary, 1977] is a social network with n = 34 members and describes the relationships in a university karate club. The karate teacher, Mr. Hi, and the club president, John, are the two central figures and the club was divided into two new clubs after an argument between them. The network is binary, undirected and has a density of 13.9%. For inference, 10 MCMC chains are run, each for 100,000 iterations with a burn-in period of 1,000, thinning every 400th sample. Figure 9(a) indicates that the posterior modal dimension is 2, with low associated uncertainty. Upon fitting multiple LPMs with p = {1, . . . , 5}, the BIC suggests 1 dimension is optimal. Posterior predictive checks for the LPM with p = 1 and the logistic LSPM with p m = 2 (Figure 9(b)) indicate better fit for network density, transitivity and F 1 score under the LSPM, while the LPM has better accuracy and Hamming distance. The Procrustes correlation between the posterior mean positions from the 1 dimensional LPM and the first dimension of the LSPM has a median value of 0.92 (standard deviation of 0.06) across 10 logistic LSPM and LPM chains. Additional results regarding the posterior distributions of variance and shrinkage strength parameters can be found in the supplementary material, Appendix E.
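Several of the posterior predictive metrics mentioned above compare an observed binary network with a replicate drawn from the fitted model. A minimal sketch using standard definitions is given below; the exact definitions behind the reported figures (for instance, whether the Hamming distance is normalised) are not stated here, so the choices in `predictive_check_metrics`, a hypothetical helper, are assumptions. Transitivity is omitted for brevity.

```python
import numpy as np

def predictive_check_metrics(y_obs, y_rep):
    """Compare a replicate binary adjacency matrix with the observed one (off-diagonal entries only)."""
    y_obs = np.asarray(y_obs)
    y_rep = np.asarray(y_rep)
    mask = ~np.eye(y_obs.shape[0], dtype=bool)  # ignore self-loops
    obs = y_obs[mask].astype(bool)
    rep = y_rep[mask].astype(bool)
    tp = int(np.sum(obs & rep))    # links present in both
    fp = int(np.sum(~obs & rep))   # links only in the replicate
    fn = int(np.sum(obs & ~rep))   # links only in the observed network
    denom = 2 * tp + fp + fn
    return {
        "density": float(rep.mean()),
        "accuracy": float(np.mean(obs == rep)),
        "f1": 2 * tp / denom if denom > 0 else 0.0,
        "hamming": fp + fn,  # ASSUMPTION: unnormalised count of differing entries
    }

# Toy 3-node example: observed links 0-1 and 1-2, replicate links 0-1 and 0-2.
y_obs = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
y_rep = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
metrics = predictive_check_metrics(y_obs, y_rep)
```

In practice these metrics would be computed for many replicate networks per chain and summarised, rather than for a single replicate.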

The cat brain connectivity binary network data
The logistic LSPM is used to analyse a cat brain connectivity network which includes n = 65 non-overlapping regions in the cortex treated as nodes and 1139 interregional macroscopic axonal projections treated as edges [Scannell et al., 1995, de Reus and van den Heuvel, 2013]. The cortex regions were classified into four categories, namely visual regions (18 areas), auditory regions (10 areas), somatomotor regions (18 areas), and frontolimbic regions (19 areas). These classifications were based on neurophysiological information about the functional role of each brain area. Here, the network is viewed as a binary directed network, with y i,j = 1 indicating there is a connection between regions i and j, and y i,j = 0 indicating there is not. The network density is 27.37%.
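The reported density can be checked from the node and edge counts, assuming the usual definition for a directed network without self-loops (edges divided by n(n − 1)); `directed_density` is a hypothetical helper.

```python
def directed_density(n_nodes, n_edges):
    """Density of a directed network without self-loops."""
    return n_edges / (n_nodes * (n_nodes - 1))

cat_density = directed_density(65, 1139)  # 1139 / 4160
```

This gives approximately 0.274, in line with the reported 27.37%.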
To assess convergence, the logistic LSPM is fitted 10 times using different initial latent positions by introducing noise to the initial configuration obtained as outlined in Section 3. The MCMC chains are run for 500,000 iterations with a burn-in period of 50,000 and thinning every 2,000th sample. Model fitting took on average 10 minutes on a computer with an i7-10510U CPU and 16GB RAM. For comparison, fitting an LPM with p = 5 with the latentnet [Krivitsky and Handcock, 2020] R package with default settings but 500,000 iterations took 7 minutes. However, the LPM setting requires fitting multiple models, thus incurring additional computational cost. The Gelman-Rubin convergence criterion [Gelman et al., 2013] was satisfied with 1.0 ≤ R̂ < 1.1 for α and the shrinkage parameters. Visual inspection of trace plots also suggests convergence; an example is provided in the supplementary material, Appendix E. Figure 10(a) suggests that p m = 3 effective dimensions are required for these data, with low associated uncertainty. Upon fitting multiple LPMs with p = {1, . . . , 5}, the BIC suggests 2 dimensions are optimal, with 3 dimensions being next best. Posterior predictive check metrics from 30 posterior predictive replicate networks for each of the 10 logistic LSPM and LPM MCMC chains are shown in Figure 10(b). While the logistic LSPM tends to underestimate network transitivity, it performs better than the LPM across the majority of metrics, with the estimated density close to the network data density, higher accuracy and F 1 score, and lower Hamming distance. Figure 11 shows posterior mean latent positions under the logistic LSPM with p m = 3. The inferred latent positions under both the LPM and the LSPM are very similar with a mean Procrustes correlation of 0.97 (standard deviation 0.01) across the 10 chains. Nodes from the same cortical region also lie close to each other in the latent space under both the LPM and the logistic LSPM.
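The Gelman-Rubin statistic for a scalar parameter such as α can be computed from multiple chains as sketched below. This is the classic (non-split) potential scale reduction factor; whether the reported values use this or a split-chain variant is an assumption, and the simulated draws are placeholders for real MCMC output.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m chains) x (n draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    var_plus = (n - 1) / n * w + b / n       # pooled variance estimate
    return float(np.sqrt(var_plus / w))

# Placeholder draws: 10 chains with 225 retained samples each,
# i.e. (500,000 - 50,000) / 2,000 as in the settings above.
rng = np.random.default_rng(3)
mixed = rng.normal(size=(10, 225))
rhat_mixed = gelman_rubin(mixed)
# Chains stuck at different levels give a clearly inflated statistic.
rhat_stuck = gelman_rubin(mixed + np.arange(10)[:, None])
```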

Worm nervous system binary network data
This binary directed network contains n = 272 nodes representing neurons from the nervous system of the Caenorhabditis elegans adult male worm, with each of the 4451 edges representing the presence of either a chemical or electrical interaction between nodes. The data were reconstructed from serial electron micrograph sections by Jarrell et al. [2012]. Three unconnected nodes are removed prior to analysis. The network is sparse with a density of 6.09%. To fit the logistic LSPM to these data, each of 10 MCMC chains was run for 1,000,000 iterations with a burn-in period of 100,000 iterations, thinning every 3,000th sample. From a range of LPMs with p = {1, . . . , 7}, the BIC suggests p = 4 as optimal, closely followed by p = 3, as shown in Figure 12. In terms of posterior predictive checks, Figure 13 shows that the LSPM with both 3 and 4 active dimensions fits better than the LPM with 3 and 4 dimensions in terms of network density and F 1 score, but worse in terms of transitivity, accuracy, and Hamming distance. The LSPM with p m = 3 fits only slightly worse than its 4-dimensional counterpart, and so may be preferred as a lower dimensional representation. The Procrustes correlations between locations under the LSPM with p m = {3, 4} and the LPM with p = {3, 4} have similar median values of more than 0.88 (standard deviation 0.05).

Phone calls count network data
Mobile phone call logs are available from the "Friends and Family" data collected by the MIT Human Dynamics Lab [Aharony et al., 2011]. Here, the set of call logs from the November 2010 period is considered; the associated network contains the number of (directed) calls within a young community of n = 120 people. Modelling the number of calls rather than reducing the data to a binary network representing whether or not a call was made or received allows for deeper insight on the exchange of phone calls within the network. The Poisson LSPM is therefore fitted to this count network data. Fourteen disconnected nodes are removed prior to analysis. The data are overdispersed, with a mean count of 0.51 and a variance of 33.29.
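The degree of overdispersion can be summarised by the variance-to-mean ratio of the counts, which equals 1 under a Poisson distribution with a common rate. The sketch below applies it to the summary statistics quoted above and, as a sanity check, to simulated Poisson counts; `dispersion_index` is a hypothetical helper.

```python
import numpy as np

def dispersion_index(y):
    """Variance-to-mean ratio of the off-diagonal entries of a count adjacency matrix."""
    y = np.asarray(y, dtype=float)
    off = y[~np.eye(y.shape[0], dtype=bool)]  # ignore self-loops
    return float(off.var() / off.mean())

# From the summary statistics reported for the phone calls network.
reported_index = 33.29 / 0.51

# Sanity check on equidispersed data: Poisson counts with a common rate.
rng = np.random.default_rng(4)
poisson_index = dispersion_index(rng.poisson(2.0, size=(200, 200)))
```

The ratio of roughly 65 for the phone calls data is far above the value of about 1 obtained for the simulated Poisson counts.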
A total of 10 MCMC chains were run, each from different starting configurations. In terms of posterior predictive checks, Figure 14(b) shows that the Poisson LSPM posterior predictive networks tend to be more overdispersed than the observed network; considering a model that accounts for such overdispersion is likely to give improved model fit.

Discussion
The proposed latent shrinkage position model is a nonparametric Bayesian approach to model network data that focuses on the issue of inferring the dimension of the latent space. The LSPM extends the latent position model by employing a multiplicative truncated gamma process prior to allow an infinite number of dimensions in the latent space. In practice, the LSPM intrinsically infers the number of effective dimensions required to describe the network. The LSPM therefore circumvents the more computationally intensive approach where model selection criteria are used to choose the dimension after fitting a range of models with different dimensions p. Importantly, the LSPM retains the ease of interpretability inherent to the original LPM. The logistic and Poisson LSPM are developed for the analysis of binary and count network data respectively; extensions to alternative models for different link types are similarly feasible.
While fitting the LSPM is computationally feasible on the networks considered here, fitting the LSPM to large networks is relatively slow. There are many improvements that could be made to shorten run time, for example, by embedding the case-control concepts of Raftery et al. [2012] and by implementing the variational Bayesian inference of Salter-Townshend and Murphy [2013].
Since their introduction, both the concepts of the LPM for network data and the Bayesian nonparametric multiplicative gamma shrinkage prior have received much attention in a range of fields; as such the LSPM framework developed here should be of interest to a wide array of researchers and practitioners. The provision of the associated lspm R package will assist in usage of the Bayesian nonparametric LSPM models.

Acknowledgments
This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant number 18/CRT/6049. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The authors are grateful to the anonymous reviewers for their suggestions which greatly contributed to improving this work.

C: Generic constant term used for the derivations of the full conditional distributions.
κ 0 , κ 1 : The dimension adaptation probability parameters.
ϵ 1 : The threshold for decreasing dimension(s) in the dimension adaptation step.
ϵ 2 : The threshold for increasing a dimension in the dimension adaptation step.
ϵ 3 : The threshold for increasing a dimension in the dimension adaptation step when the active dimension is 1.

A.2 Expectation of 1/δ h
The prior for δ h is assumed to be

A.3 Expectation of the pairwise squared distance in the ℓ-th dimension
Since latent space dimensions are assumed to be independent, i.e.

B Derivation of full conditional distributions

B.1 The logistic latent shrinkage position model
Links between nodes i and j are assumed to form probabilistically from a binomial distribution: For the binary network setting, the probability q i,j follows a logistic model i.e. Denoting , and the log-likelihood function is The joint posterior distribution is
P(α, Z, δ|Y) ∝ P(Y|α, Z, δ)P(α, Z, δ)
∝ P(Y|α, Z)P(α|Z, δ)P(Z, δ), since Y only depends on α and Z,
∝ P(Y|α, Z)P(α)P(Z, δ)
∝ P(Y|α, Z)P(α)P(Z|δ)P(δ).
Assumptions and prior specifications are as follows: Since latent space dimensions are assumed independent, then The full conditional distribution for Z is: As this is not a recognisable distribution, the Metropolis-Hastings algorithm is employed. The full conditional distribution for δ 1 is: The full conditional distribution for δ h , where h ≥ 2 is: The full conditional distribution for α is: A Metropolis-Hastings algorithm is used to sample from this full conditional distribution since it is not in a recognisable form.
Completing the square gives: which is in the form of a Gaussian distribution with parameters .
The prior on α is N (µ α , σ 2 α ), which makes (α − α (s) ) ∼ N (µ α − α (s) , σ 2 α ) (since α (s) is a constant, it only affects the mean). Hence, adding the approximated log-likelihood function to the log of the prior distribution on (α − α (s) ) gives a sum of the logs of two normal densities, so that the distribution of (α − α (s) ) is N (µ α,B , σ 2 α,B ), where Thus, here the full conditional distribution of α is approximated by a normal distribution with variance σ 2 α,B and mean This distribution is used as the proposal distribution in the Metropolis-Hastings algorithm.

B.3 The Poisson latent shrinkage position model
A link between nodes i and j is formed probabilistically from a Poisson distribution: A natural logarithm is used as the link function: Denoting The joint posterior distribution is the same as the binary case with the exception that Y i,j |α, Z i ∼ Pois(λ i,j ) which gives: The full conditional distribution for Z is: As this is not a recognisable distribution, the Metropolis-Hastings algorithm is employed. The full conditional distributions for δ 1 and δ h , where h ≥ 2, are the same for count networks as in the binary network case, since the likelihood function does not affect these full conditionals. The full conditional distribution for α is: A Metropolis-Hastings algorithm is used to sample from this full conditional distribution since it is not in a recognisable form.

B.4 Deriving an informed proposal distribution for α for the Poisson latent shrinkage position model
For count networks, the log-likelihood function is The 3rd term in equation B.4 can be approximated using a quadratic Taylor expansion, i.e. let g(α) = − ∑ i≠j exp(η i,j ), then the quadratic Taylor expansion of g(α) around α (s) is g(α) ≈ g(α (s) ) + (α − α (s) )g ′ (α (s) ) + 0.5(α − α (s) ) 2 g ′′ (α (s) ). Here, α (s) is the current value of α in the MCMC chain. Removing terms that are constant with respect to α, Introducing the constant (with respect to α) term, −α (s) y i,j : Completing the square gives which is in the form of a Gaussian distribution with parameters The prior on α is N (µ α , σ 2 α ), which makes (α − α (s) ) ∼ N (µ α − α (s) , σ 2 α ) (since α (s) is a constant, it only affects the mean). Hence, adding the approximated log-likelihood function to the log of the prior distribution on (α − α (s) ) gives a sum of the logs of two normal densities, so that the distribution of (α − α (s) ) is N (µ α,C , σ 2 α,C ), where Thus, here the full conditional distribution of α is approximated by a normal distribution with variance σ 2 α,C and mean This distribution is used as the proposal distribution in the Metropolis-Hastings algorithm.
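The construction above can be sketched numerically. The sketch assumes the Poisson log-likelihood has the form ∑ i≠j [y i,j η i,j − exp(η i,j )] up to a constant, with η i,j equal to α minus a distance term that does not depend on α. Under that assumption, g ′ (α (s) ) = g ′′ (α (s) ) = −S with S = ∑ i≠j exp(η i,j ) evaluated at α (s) , so the likelihood part is Gaussian in (α − α (s) ) with mean (∑ y i,j − S)/S and variance 1/S, and combining it with the prior adds the precisions. The function name and arguments are hypothetical, and the resulting expressions are a reconstruction rather than the authors' stated formulas.

```python
import numpy as np

def poisson_alpha_proposal(alpha_s, y, dist, mu_alpha=0.0, sigma_alpha=3.0):
    """Mean and variance of a Gaussian proposal for alpha around its current value alpha_s."""
    y = np.asarray(y, dtype=float)
    dist = np.asarray(dist, dtype=float)
    mask = ~np.eye(y.shape[0], dtype=bool)  # sum over i != j
    # ASSUMPTION: eta_ij = alpha - dist_ij, with dist_ij free of alpha.
    s = np.exp(alpha_s - dist[mask]).sum()  # equals -g'(alpha_s) and -g''(alpha_s)
    y_sum = y[mask].sum()
    # Likelihood part: (alpha - alpha_s) ~ N((y_sum - s) / s, 1 / s).
    # Prior part:      (alpha - alpha_s) ~ N(mu_alpha - alpha_s, sigma_alpha**2).
    precision = s + 1.0 / sigma_alpha ** 2
    shift = ((y_sum - s) + (mu_alpha - alpha_s) / sigma_alpha ** 2) / precision
    return alpha_s + shift, 1.0 / precision

# Toy check with 3 nodes and zero distances: counts of 1 match exp(0) = 1,
# so the proposal stays centred at alpha_s = 0; counts of 2 pull it upwards.
zeros = np.zeros((3, 3))
ones = np.ones((3, 3)) - np.eye(3)
mean_fit, var_fit = poisson_alpha_proposal(0.0, ones, zeros)
mean_up, var_up = poisson_alpha_proposal(0.0, 2 * ones, zeros)
```

The default values of `mu_alpha` and `sigma_alpha` follow the hyperparameters used in the simulation studies; the proposal mean is a single Newton-type step from α (s) , shrunk towards the prior mean.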

C Comparison of LSPM performance under the MGP and MTGP priors
As discussed in Section 2.2 of the paper, it is of interest to explore the substantive impact of using the MTGP rather than the MGP prior on the performance of the LSPM. In this appendix we consider the analyses from the simulation studies and the applications examined in the paper, but where the LSPM employs the MGP rather than the MTGP prior. Figure 15 illustrates the posterior distributions of p for the logistic LSPM with the MTGP prior (top row) and the MGP prior (bottom row) for the simulated binary networks from Section 4.1 of the main paper. While the posterior modal dimension when p 0 = {2, 4, 10} is broadly similar under both priors, the posterior under the LSPM with the MTGP prior has higher precision. The LSPM with the MGP prior tends to explore higher dimensions more often. The performance of the LSPM with the MGP prior is further summarised in Table 5, where p m is only accurately inferred when p 0 = {4, 10}. The proportion of simulated networks for which p m = p * under the MGP prior is lower than when the MTGP prior is used (see Table 1 in the main paper) for p 0 = 10 or 'auto'. Figure 16 illustrates posterior distributions of the variance parameters under a logistic LSPM with the MTGP and the MGP priors. Again the exploration of higher dimensions is evident under the MGP prior, giving more diffuse posteriors. Figure 17 shows similar behaviour for the intercept parameter for both MTGP and MGP priors when p varies, with the posteriors under the MTGP prior being more precise. Figure 18 provides posterior predictive checks for the logistic LSPM with the MTGP prior and the MGP prior for the simulated binary networks. These checks show very similar model fit behaviour, conditional on accurate inference on the number of dimensions.
Figures 19, 20, 21 and 22 illustrate inference after applying the logistic LSPM with the MGP prior on simulated networks from Section 4.3, the Zachary network, the cat connectome network, and the worm neuron network respectively. All again show similar behaviour in that posterior distributions are more diffuse than when the MTGP is employed and model fit is similar, conditional on the correct dimension being inferred. Figures 23 and 24 illustrate two specific examples that clearly highlight the difference under the MTGP and MGP priors.

D Simulation studies' additional posterior predictive checks
The LSPM performance is assessed via the posterior distributions of the dimension, variance and shrinkage strength parameters, and Procrustes correlations between inferred and true locations. Posterior predictive checks are employed to assess model fit. Included here are plots to assist in assessing the performance of the LSPM, to supplement those provided in the paper.