Vertex Nomination, Consistent Estimation, and Adversarial Modification

Given a pair of graphs $G_1$ and $G_2$ and a vertex set of interest in $G_1$, the vertex nomination (VN) problem seeks to find the corresponding vertices of interest in $G_2$ (if they exist) and produce a rank list of the vertices in $G_2$, with the corresponding vertices of interest in $G_2$ concentrating, ideally, at the top of the rank list. In this paper, we define and derive the analogue of Bayes optimality for VN with multiple vertices of interest, and we define the notion of maximal consistency classes in vertex nomination. This theory forms the foundation for a novel VN adversarial contamination model, and we demonstrate with real and simulated data that there are VN schemes that perform effectively in the uncontaminated setting but whose performance is adversely impacted by adversarial network contamination. We further define a network regularization method for mitigating the impact of the adversarial contamination, and we demonstrate the effectiveness of regularization in both real and synthetic data.


Introduction and Background
Given graphs G_1 and G_2 and vertices of interest V* ⊂ V(G_1), the aim of the vertex nomination (VN) problem is to rank the vertices of G_2 into a nomination list with the corresponding vertices of interest concentrating at the top of the nomination list. In recent years, a host of VN procedures have been introduced (see, for example, [14,30,26,17,37,48]) that have proven to be effective information retrieval tools in both synthetic and real data applications. Moreover, recent work establishing a fundamental statistical framework for VN has led to a novel understanding of the limitations of VN efficacy in evolving network environments [27]. Herein, we consider a general statistical model for adversarial contamination in the context of vertex nomination (here the adversary can randomly add or remove edges and/or vertices in the network), and we examine the effect of these contaminations on VN performance. In addition, we extend existing theory on consistent vertex nomination to multiple vertices of interest, and we define and derive Bayes optimal schemes in this setting. We further show that there are infinitely many classes of distributions for which a vertex nomination scheme is not consistent.
The practical additional value of this paper is to 1. extend existing consistency and Bayes optimality theory for vertex nomination to the setting of multiple vertices of interest; 2. rigorously frame the concept of an adversary in the random graph framework; 3. develop theory showing how it is possible for an adversary to render vertex nomination schemes inconsistent; 4. demonstrate empirically that although an adversary can have a negative impact, regularization can succeed in recovering consistency.
The reason we do not prove that regularization succeeds is that the regularization scheme depends on the particular graph observation and introduces complex dependence structure into the problem. Such dependence, coupled with the already difficult spectral analysis problem, makes it unclear what exactly is being estimated when using any spectral nomination scheme with regularization. Furthermore, the regularization scheme we consider is highly model-dependent, whereas our main theoretical contributions apply to any vertex nomination scheme and as such are necessary to begin to understand adversarial vertex nomination. To motivate our mathematical and statistical results further, we first consider an illustrative real data example in Section 1.1, in which we demonstrate a VN scheme that works effectively in the uncontaminated setting, with network contamination adversely impacting its performance. We provide a more thorough background of the relevant literature after the motivating example, in Section 1.2.

Motivating example
Consider the pair of high school friendship networks in [32]: the first, G_1, has 156 nodes, each representing a student, with two vertices adjacent if the two students made contact with each other at school in a given time period; the second, G_2, has 134 vertices, again with each vertex representing a student, with two vertices adjacent if the two students are friends on Facebook. There are 82 students appearing in both G_1 and G_2, and we pose the VN problem here as follows: given a student-of-interest in G_1, can we nominate the corresponding student (if they exist) in G_2? We note here that the vertex nomination approach outlined below easily adapts to the multiple vertices of interest (v.o.i.) scenario (i.e., given students-of-interest in G_1, can we nominate the corresponding students, if they exist, in G_2?), and we will provide the necessary details for handling both single and multiple v.o.i. below. Recall that the VN problem assumes there is a correspondence between the vertices but that the practitioner does not have access to this correspondence. To this end, we act as though we do not know the corresponding student in each graph.
In one idealized data setting, all students would appear in both graphs as this would potentially maximize the signal present in the correspondence of labels across graphs. This bears itself out in the following illustrative VN experiment. Consider the following simple VN scheme, which we denote VN • GMM • ASE: Given vertex (or vertices) of interest v * in G 1 and seeded vertices S ⊂ V 1 ∩ V 2 (seeds here represent vertices whose identity across networks is known a priori), we proceed by embedding the graphs into a common Euclidean space R d and clustering using Mahalanobis distances between the embeddings of the vertices (see Section 4.1 for full detail).
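As a rough illustration of the embed-then-rank idea behind VN ∘ GMM ∘ ASE (and not the exact procedure of Section 4.1), the sketch below embeds both adjacency matrices, aligns the second embedding onto the first with an orthogonal Procrustes transform fit on the seed vertices, and ranks the vertices of the second graph by Euclidean (rather than Mahalanobis) distance to the embedded vertices of interest, skipping the GMM clustering step. All function names are ours.

```python
import numpy as np

def ase(A, d):
    # Adjacency spectral embedding: eigenvectors of A scaled by the square
    # roots of the d eigenvalues largest in magnitude.
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def vn_ase(A1, A2, voi, seeds1, seeds2, d=2):
    # Embed both graphs, align the second embedding onto the first via an
    # orthogonal Procrustes transform fit on the seeds, then rank the
    # vertices of G2 by Euclidean distance to the embedded v.o.i.
    X1, X2 = ase(A1, d), ase(A2, d)
    U, _, Vt = np.linalg.svd(X2[seeds2].T @ X1[seeds1])
    X2 = X2 @ (U @ Vt)
    targets = np.atleast_2d(X1[voi])
    dists = np.linalg.norm(X2[:, None, :] - targets[None, :, :], axis=2).min(axis=1)
    return list(np.argsort(dists))  # nomination list, best candidate first
```

When the two graphs are identical and the embedding dimension captures real structure, the corresponding vertex sits at distance (numerically) zero and is nominated first.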
We can consider running VN ∘ GMM ∘ ASE in the idealized data setting where we only consider the induced subgraphs of G_1 and G_2 containing the 82 common vertices across graphs (call these graphs G_1^(i) and G_2^(i)), and we can also consider running the procedure in the setting where the 52 vertices in G_2 without matches across graphs are added to G_2^(i) as a form of contamination. These unmatchable vertices can have the effect of obfuscating the correspondence amongst the common vertices across graphs, and thus can diminish VN performance. Indeed, we see this play out in Figure 1, where we plot the performance of VN ∘ GMM ∘ ASE averaged over nMC = 500 random seed sets of size s = 10. In the left figure, the x-axis shows the ranks in the nomination list and the y-axis shows the mean (±2 s.e.) number of vertices of G_1 whose corresponding vertices in G_2 are ranked at or above each rank. Note that the chance normalization is computed separately under the core and noisy models, and the seeming performance gain relative to chance in the contaminated setting is attributable to the fact that G_2 has significantly more vertices than the idealized G_2^(i), and chance is therefore significantly worse. We emphasize here the effect of the contamination on VN performance; indeed, the adversarial contamination greatly (negatively) affects the performance of our vertex nomination scheme, suggesting that perhaps the vertex nomination scheme is not consistent for this class of contaminated distributions. In effect, the adversary is knocking the networks out of the consistency class for VN ∘ GMM ∘ ASE; see Section 2.3 for detail. While the results of Section 2.3.2 show that we cannot verify (in an unsupervised manner, without the true labels) the extent to which the contamination negatively impacts the performance of VN, in Section 3.2.1 we empirically explore the impact of regularization strategies for mitigating this contamination.
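One generic form of network regularization, offered here as a simple stand-in for the model-dependent scheme studied later (the quantile threshold q is a hypothetical choice of ours), trims low-degree vertices before embedding, since weakly connected contaminating vertices carry little alignment signal:

```python
import numpy as np

def trim_low_degree(A, q=0.1):
    # Drop vertices whose degree is at or below the q-th degree quantile,
    # returning the induced adjacency submatrix and the retained indices
    # (needed to map nomination ranks back to the original vertex labels).
    deg = A.sum(axis=1)
    keep = np.flatnonzero(deg > np.quantile(deg, q))
    return A[np.ix_(keep, keep)], keep
```

The returned `keep` index array is essential in the VN setting: ranks computed on the trimmed graph must be reported against the original labels.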
Recall that the vertex nomination problem can be stated loosely as follows: given graphs G 1 and G 2 and vertices of interest V * ⊂ V (G 1 ), rank the vertices of G 2 into a nomination list with the corresponding vertices of interest concentrating at the top of the nomination list (see Definition 10 for full detail). While vertex nomination has found applications in a number of different areas, such as social networks in [37] and data associated with human trafficking in [17], there are relatively few results establishing the statistical properties of vertex nomination. In [17], consistency is developed within the stochastic blockmodel random graph framework, where interesting vertices were defined via community membership. In [27], the authors develop the concepts of consistency and Bayes optimality for a very general class of random graph models and a very general definition of what makes the v.o.i. interesting. In this paper, we further develop the ideas in [27], with the aim of developing a theoretical regime in which to ground the notion of adversarial contamination in VN. In addition, their results are derived in the setting of a single vertex of interest; since many real application problems involve finding similar groups of nodes, we extend their results to multiple vertices of interest.
There has been significant recent attention towards better understanding the impact of adversarial attacks on machine learning methodologies (see, for example, [24,8,36,15,50]). Herein, we define an adversarial attack on a machine learning algorithm to be a mechanism that changes the data distribution in order to negatively affect algorithmic performance; see Definition 17. From a practical standpoint, adversarial attacks model the very real problem of having data compromised; if an intelligent agent has access to the data and algorithm, the agent may want to modify the data or the algorithm to give the wrong prediction/inferential conclusion. Although there has been much work on adversarial modeling in machine learning, there has been less theory developed for adversarial attacks from a statistical perspective.
The adversarial framework we consider is similar to the model considered in [8], and it is motivated by the example in the previous section in which the addition of the vertices without correspondences to G_2 negatively impacted VN performance. Suppose that we are interested in performing vertex nomination on a graph pair, but an adversary randomly adds and deletes some edges and/or vertices in the second graph. For example, suppose we are trying to find influencers on Instagram by vertex matching to Facebook. An influencer that has knowledge of our procedure may attempt to make our algorithm fail in its nominations, perhaps by friending and de-friending people on Facebook. Even if our vertex nomination scheme was working well prior to encountering the adversary, it may no longer work well after the adversary's modifications.
From a statistical standpoint, what can we say about the statistical consistency of our original vertex nomination rule? Our motivating example suggests that there are adversaries that can render our vertex nomination scheme no longer consistent, but theory is needed both to explain why that may be the case and to properly frame the problem. Hence, to answer these questions, we further develop the theory in [27] to situate the notion of adversarial contamination within the idea of maximal consistency classes for a given VN rule (Section 2.3). In this framework, the goal of an adversary is to move a model out of a rule's consistency class. We demonstrate with real and synthetic data examples how an adversary is able to move a model out of a rule's consistency class.
We finish with a brief discussion on how regularization can effectively recover consistency, though a full theoretical treatment is left for future work.
Notation: See Table 1 for frequently used notation.
Notation              Description
[k]                   The set of integers {1, 2, ..., k}
G = (V, E)            A (random) graph with vertex set V and edge set E
I_g(u)                The set of vertices in g topologically equivalent to u
Φ(g_1, o(g_2), V*)    A vertex nomination scheme with vertex set of interest V* and observed graphs g_1 and o(g_2)
r_Φ(S)                The set of ranks of a set S under Φ(g_1, o(g_2), V*)

Vertex Nomination and Consistency
Before discussing how to define adversarial attacks, we discuss the previous work of [27], the first of its kind to derive the Bayes optimal vertex nomination scheme for a single vertex of interest. The present paper can be viewed as a follow-on to that work, in which we provide a groundwork for rigorously framing an adversary in vertex nomination. First, we will situate our analysis of the VN problem in the very general framework of nominatable distributions.
Definition 2 (Nominatable Distribution). For given n, m ∈ Z_{>0}, the set of Nominatable Distributions of order (n, m), denoted N_{n,m}, is the collection of all families of distributions F_Θ^{(n,m)} = { F_{c,θ}^{(n,m)} : θ ∈ Θ, c ∈ {0, 1, ..., min(n, m)} }, where each F_{c,θ}^{(n,m)} is a distribution on G_n × G_m parameterized by θ ∈ Θ satisfying:

1. The vertex sets of G_1 and G_2 are V_1 = {u_1, ..., u_n} and V_2 = {u_1, ..., u_c, w_1, ..., w_{m-c}} respectively. We refer to C = {u_1, ..., u_c} as the core vertices. These are the vertices that are shared across the two graphs and imbue the model with a natural notion of corresponding vertices.

2. Vertices in J_1 = V_1 \ C = {u_{c+1}, ..., u_n} and J_2 = V_2 \ C = {w_1, ..., w_{m-c}} are referred to as junk vertices. These are the vertices in each graph that have no corresponding vertex in the other graph.

3. The induced subgraphs G_1[J_1] and G_2[J_2] are conditionally independent given θ.
The vertices in C are those that have a corresponding paired vertex in each graph, where correspondence can be defined very generally. Corresponding vertices need not correspond to the same person/user/account; rather, corresponding vertices are understood as those that share a desired property (for example, a role in the network) across graphs. In particular, we will assume that the vertices of interest in G_1 have corresponding vertices in G_2, and that these corresponding vertices are the vertices of interest in G_2.
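As a toy instance of a nominatable distribution, the sketch below samples a pair of Erdős–Rényi-type graphs sharing c core vertices: every edge is Bernoulli(p), a core-core edge of the second graph copies the corresponding edge of the first with probability rho (yielding edge correlation rho) and is drawn independently otherwise, and the junk-junk blocks are independent. The construction and names are illustrative, not the paper's model.

```python
import numpy as np

def sample_nominatable_pair(n, m, c, p=0.3, rho=0.8, seed=0):
    # Vertices 0..c-1 are the core C (shared across graphs); the remaining
    # vertices of each graph are junk. Core-core edges of A2 are correlated
    # with those of A1; all other edges are independent Bernoulli(p).
    rng = np.random.default_rng(seed)

    def er(k):
        A = np.triu((rng.random((k, k)) < p).astype(int), 1)
        return A + A.T

    A1, A2 = er(n), er(m)
    for i in range(c):
        for j in range(i + 1, c):
            if rng.random() < rho:
                # Copy A1's edge indicator with prob rho; else keep the
                # independent draw already in A2 (edge correlation = rho).
                A2[i, j] = A2[j, i] = A1[i, j]
    return A1, A2
```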
Having access to the vertex labels would then render the VN problem trivial. To model the uncertainty often present in data applications, where the vertex labels (or correspondences) are unknown a priori, we adopt the notion of obfuscation functions from [27].
Definition 3 (Obfuscating Function). Let F_{c,θ}^{(n,m)} ∈ N_{n,m}, and let W be a set satisfying W ∩ V_i = ∅ for i = 1, 2. An obfuscating function o : V_2 → W is a bijection from V_2 to W. We refer to W as an obfuscating set, and we let O_W be the set of all such obfuscating functions.
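An obfuscating function is just a bijection onto a fresh label set disjoint from both vertex sets; a minimal sketch (the labels `w0, w1, ...` are an arbitrary choice of ours):

```python
import random

def obfuscate(V2, seed=0):
    # An obfuscating function o: V2 -> W, realized as a dict. W is a fresh
    # label set disjoint from the vertex sets, and o is a random bijection,
    # so the original identities of the vertices of G2 are hidden.
    rng = random.Random(seed)
    W = [f"w{i}" for i in range(len(V2))]
    rng.shuffle(W)
    return dict(zip(V2, W))
```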

VN in the Setting of a Single Vertex of Interest
With these two definitions in place, we now present the definition of a vertex nomination scheme for a single vertex of interest as in [27]. In Section 2.2, we will extend the definition of a vertex nomination scheme to encompass multiple vertices of interest. In the remainder of this section, we will let v* ∈ V_1 be the given vertex of interest in G_1.

Definition 4 (VN Scheme for single VOI). Let n, m ∈ Z_{>0}, and for each g ∈ G_m and u ∈ V(g), let I_g(u) denote the set of vertices in g topologically equivalent to u. Let W be an obfuscating set and o ∈ O_W be given. For a set A, let T_A denote the set of all total orderings of the elements of A. A vertex nomination scheme is a function Φ : G_n × o(G_m) × V_1 → T_W satisfying the following consistency property: if for each u ∈ V_2 we define rank_{Φ(g_1, o(g_2), v*)}(o(u)) to be the position of o(u) in the total ordering provided by Φ(g_1, o(g_2), v*), and we define r_Φ to be the induced set-valued rank map, then we require that for any g_1 ∈ G_n, g_2 ∈ G_m, v* ∈ V_1, obfuscating functions o_1, o_2 ∈ O_W, and any u ∈ V(g_2),

r_{Φ(g_1, o_1(g_2), v*)}(o_1(I_{g_2}(u))) = r_{Φ(g_1, o_2(g_2), v*)}(o_2(I_{g_2}(u))).   (1)

For k ∈ [m], Φ(g_1, o(g_2), v*)[k] denotes the k-th element (i.e., the rank-k vertex) in the ordering Φ(g_1, o(g_2), v*). We let V_{n,m} denote the set of all such VN schemes.
Remark 5. The consistency criterion, Eq. 1, models the property that a sensibly-defined vertex nomination scheme should view all vertices in a given I g (u) as being equally "interesting" in G 2 . These vertices are topologically indistinguishable, and thus are only separated by their labels which have been obfuscated via o. Truly obfuscated vertex labels should be independent of the obfuscation function, and the consistency criterion requires that the set of ranks of each set of equivalent vertices (i.e., each I g 2 (u)) does not depend on the particular choice of obfuscation function.
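The sets I_g(u) of topologically indistinguishable vertices can be computed by brute force on small graphs, checking every vertex permutation for being an automorphism; a sketch (factorial-time, so only viable for tiny graphs):

```python
from itertools import permutations

def topologically_equivalent(adj, u):
    # I_g(u): all v such that some automorphism of g maps u to v. We test
    # every permutation of the vertex set against the edge set.
    n = len(adj)
    edges = {(i, j) for i in range(n) for j in range(n) if adj[i][j]}
    equiv = set()
    for perm in permutations(range(n)):
        if {(perm[i], perm[j]) for (i, j) in edges} == edges:
            equiv.add(perm[u])
    return equiv
```

On a path 0-1-2 the endpoints are interchangeable, while on a 4-cycle every vertex is equivalent to every other.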
One can already begin to see how one might extend these definitions to multiple vertices of interest; note that Φ is a function of two graphs and a single vertex. It will be natural to require Φ to be a function of two graphs and a vertex set instead. We give these definitions in Section 2.2. We first define the error for the vertex nomination scheme defined above.
Definition 6 (VN loss function, level-k error for single VOI). Let Φ be a vertex nomination scheme, and o an obfuscating function. For (g_1, g_2) realized from (G_1, G_2) ~ F_{c,θ}^{(n,m)} with vertex of interest v* ∈ C, and k ∈ [m − 1], we define the level-k nomination loss via

ℓ_k(Φ, g_1, o(g_2), v*) = 1{ rank_{Φ(g_1, o(g_2), v*)}(o(v*)) > k }.

The level-k error of Φ at v* is then defined to be

L_k(Φ, v*) = E[ ℓ_k(Φ, G_1, o(G_2), v*) ].

The level-k error is simply the probability that the rank of the vertex of interest in g_2 is not in the top k of the nomination list; this matches our intuition for what the error should be. To discuss the notion of consistency, we need to assume that the core sets C of the nominatable distributions are nested in the following sense.
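Treating the nomination list as an ordered list of obfuscated labels, the level-k loss is a single rank indicator; the level-k error would then be estimated by averaging this loss over Monte Carlo replicates. A minimal sketch (representation choices ours):

```python
def level_k_loss(nomination_list, obfuscated_voi, k):
    # 1{rank of o(v*) in the nomination list exceeds k}. Ranks are 1-based,
    # so the v.o.i. incurs no loss exactly when it sits in the top k.
    rank = nomination_list.index(obfuscated_voi) + 1
    return int(rank > k)
```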
Definition 7 (Nested Cores). Let F = (F_{c_n,θ_n}^{(n,m_n)})_{n=n_0}^∞ be a sequence of distributions in N. We say that F has nested cores if there exists an n_1 such that for all n_1 ≤ n < n', if (G_1, G_2) ~ F_{c_n,θ_n}^{(n,m_n)} and (G'_1, G'_2) ~ F_{c_{n'},θ_{n'}}^{(n',m_{n'})}, we have, letting C and C' be the core vertices associated with the two distributions respectively, and denoting the junk vertices J_1, J'_1, J_2, J'_2 analogously, C ⊆ C', J_1 ⊆ J'_1, and J_2 ⊆ J'_2.

In [27], for any given nominatable distribution F_{c,θ}^{(n,m)}, a Bayes optimal VN scheme is defined that is simultaneously optimal at all levels k. We will denote this optimal scheme via Φ*. For a given non-decreasing sequence (k_n), we say that a VN rule Φ = (Φ_{n,m_n})_{n=n_0}^∞ is level-(k_n) consistent relative to a nested-core sequence F if lim_{n→∞} L_{k_n}(Φ_{n,m_n}, v*) = 0. We say that a VN rule Φ is universally level-(k_n) consistent if it is level-(k_n) consistent for all nested-core nominatable sequences F. Before presenting vertex nomination schemes in the multiple v.o.i. setting, we first present an important consistency result given in [27], which says that there are no universally consistent vertex nomination schemes.
Theorem 9 (Corollary 28 of [27]). Let ε ∈ (0, 1) be arbitrary, and consider a VN rule Φ = (Φ_{n,m}). For any nondecreasing sequence (k_n)_{n=n_0}^∞ satisfying k_n = o(m), there exists a sequence of distributions F_{c_n,θ_n}^{(n,m_n)} in N with nested cores such that

lim inf_{n→∞} L_{k_n}(Φ_{n,m_n}, v*) ≥ 1 − ε.

This result is markedly different from the setting of classical classification, in which there exist universally consistent classifiers. In Section 3, we will explore the ramifications of Theorem 9 on our understanding of adversarial attacks on VN rules; effectively, such a result might mean that an adversary acts by moving a given distribution outside of the "consistency class" of a given nomination rule (see Section 2.3 for detail).
We next extend definitions to the more practical setting of multiple vertices of interest.

Extension to Multiple Vertices of Interest
We will now rigorously define the VN problem and consistency within the VN framework for multiple vertices of interest. Combined with the results on consistency classes in Section 2.3, this will allow us to provide a statistical basis for understanding adversarial attacks in VN. Our definitions and notation are based on those in the previous section, though we have a few more general requirements. Recall that [27] defined a vertex nomination scheme as a function from Φ : G n × o(G m ) × V 1 → T W satisfying a certain consistency property. The extension to multiple vertices of interest requires that Φ be a function taking in a set of vertices. The rigorous definition is given below.
Definition 10 (VN Scheme). Let n, m ∈ Z_{>0}, and for each g ∈ G_m and u ∈ V(g), again let I_g(u) denote the set of vertices in g topologically equivalent to u. Let W be an obfuscating set and o ∈ O_W be given. For a set A, let T_A denote the set of all total orderings of the elements of A. A vertex nomination scheme is a function Φ : G_n × o(G_m) × 2^{V_1} → T_W satisfying the following consistency property: if for each u ∈ V_2 we define rank_{Φ(g_1, o(g_2), V*)}(o(u)) to be the position of o(u) in the total ordering provided by Φ(g_1, o(g_2), V*), and we define r_Φ analogously to the single v.o.i. setting, then we require that for any g_1 ∈ G_n, g_2 ∈ G_m, V* ⊆ V_1, obfuscating functions o_1, o_2 ∈ O_W, and any u ∈ V(g_2),

r_{Φ(g_1, o_1(g_2), V*)}(o_1(I_{g_2}(u))) = r_{Φ(g_1, o_2(g_2), V*)}(o_2(I_{g_2}(u))).   (2)

We let V_{n,m} denote the set of all such VN schemes.
A VN scheme is an information retrieval tool for efficiently querying large network data sets. Rather than naively searching G 2 for interesting vertices, an appropriate VN scheme provides a rank list of the vertices in G 2 that, ideally, allows users to identify v.o.i. in G 2 in a time-efficient manner. As such, to measure the performance of a VN scheme on multiple vertices, we will adopt a recall-at-k/precision-at-k framework. More precisely, we have the following definition.
Definition 11 (Level-k Nomination Loss). Let Φ ∈ V_{n,m} be a vertex nomination scheme, W an obfuscating set, and o ∈ O_W. Let (g_1, g_2) be realized from (G_1, G_2) ~ F_{c,θ}^{(n,m)} with vertices of interest V* ⊆ C. For k ∈ [m − 1], define

ℓ_k^{(1)}(Φ, g_1, o(g_2), V*) = (1/|V*|) |{ v ∈ V* : rank_{Φ(g_1, o(g_2), V*)}(o(v)) > k }|,

ℓ_k^{(2)}(Φ, g_1, o(g_2), V*) = (1/k) |{ j ∈ [k] : Φ(g_1, o(g_2), V*)[j] ∉ o(V*) }|,

where the (1) and (2) superscripts refer to recall and precision respectively. The error of a VN scheme is then defined as the expected loss. To wit, we have the following definition.
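With the nomination list represented as an ordered list of obfuscated labels, the recall and precision losses at level k (scaled by |V*| and k respectively, as noted again in Section 2.3.2) can be sketched as:

```python
def loss_recall(nomination_list, voi, k):
    # Fraction of the obfuscated v.o.i. missing from the top k of the
    # nomination list (one minus recall-at-k), normalized by |V*|.
    voi = set(voi)
    return len(voi - set(nomination_list[:k])) / len(voi)

def loss_precision(nomination_list, voi, k):
    # Fraction of the top-k nominations that are not v.o.i.
    # (one minus precision-at-k), normalized by k.
    voi = set(voi)
    return sum(1 for w in nomination_list[:k] if w not in voi) / k
```

Note the two losses need not agree: with |V*| = 2 and k = 4, a list containing both v.o.i. has zero recall loss but precision loss 1/2.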
The level-k Bayes optimal scheme is defined as any element of V_{n,m} achieving the minimal corresponding errors, denoted L_k^{*,(1)} and L_k^{*,(2)}. In the almost sure absence of symmetries amongst the vertices in V* (i.e., I_{g_2}(v) = {v} for all v ∈ V*), the derivation of the Bayes optimal scheme in the present |V*| > 1 setting mimics that of the |V*| = 1 setting presented in [27].

Bayes Optimal VN Scheme Construction
With notation as above, let n, m be fixed and let F be the nominatable distribution under consideration, supported on G^a_{n,m}. For each (g_1, g_2) ∈ G^a_{n,m}, define equivalence of graph pairs up to graph isomorphism. For each w ∈ W and u ∈ V_2, we also define the restriction to those pairs admitting an isomorphism σ with σ(o^{−1}(w)) = u, so that these sets partition G^a_{n,m}. To ease notation, we will denote this partition via P^g_{n,m}. We will next define a Bayes optimal scheme Φ* (optimal under both loss functions simultaneously for all k ∈ [m − 1] for the above F supported on G^a_{n,m}).
For ease of notation, for each i ∈ [h] and u ∈ W, define the corresponding conditional probabilities; then set the ordering Φ*(g_1, o(g_2)) to rank the elements of W accordingly (where ties are broken in a fixed but arbitrary manner). See Appendix A for a proof of the optimality of such a scheme.
Bayes optimal schemes when symmetries exist for the v.o.i. (i.e., when there are v ∈ V* such that |I_{g_2}(v)| > 1) offer additional complications and, as in the |V*| = 1 case treated in [27], little additional insight. Precisely defining the Bayes optimal scheme in the case of symmetries when |V*| > 1 is notationally and technically nontrivial, and is the subject of current research.

Consistency in VN with Multiple Vertices of Interest
Consistency in the VN framework for multiple vertices is then defined as follows.

Definition 12 (Precision and Recall Consistency). For a given non-decreasing sequence (k_n), we say that a VN rule Φ = (Φ_{n,m_n})_{n=n_0}^∞ is

i. level-(k_n) recall consistent for nested V*_n ⊆ C_n with respect to F if lim_{n→∞} E[ ℓ_{k_n}^{(1)}(Φ_{n,m_n}, G_1, o_n(G_2), V*_n) ] = 0 for any sequence of obfuscating functions o_n of V_2 with |V_2| = m_n;

ii. level-(k_n) precision consistent for nested V*_n ⊆ C_n with respect to F if lim_{n→∞} E[ ℓ_{k_n}^{(2)}(Φ_{n,m_n}, G_1, o_n(G_2), V*_n) ] = 0 for any sequence of obfuscating functions o_n of V_2 with |V_2| = m_n.

In both cases, the level-k_n loss is computed with respect to F_n = F_{c_n,θ_n}^{(n,m_n)}.
We say that a VN rule Φ is universally level-(k n ) precision recall consistent if it is level-(k n ) precision recall consistent for all nested-core nominatable sequences F. Theorem 9 in the previous section (Corollary 28 from [27]) proves that universally consistent VN schemes do not exist for any nondecreasing integral sequences (k n ) satisfying k n = o(m n ) and any (V * n ) satisfying |V * n | = Θ(1). Beyond the ramifications for practically implementing VN in streaming or evolving network environments considered in [27], this lack of universal consistency is also the motivating result for our statistical approach to adversarial contamination in VN. Indeed, a simple consequence of the lack of universal consistency is that for any VN rule there are nominatable sequences for which the rule is not consistent. An adversary could then be understood as a probabilistic mechanism designed to transform nominatable sequences for which the rule is consistent into nominatable sequences for which the rule is not consistent.
To develop this reasoning further, we next develop the notion of (maximal) consistency classes in the VN framework.

VN Consistency Classes
We next explore the concept of consistency classes in VN, with an eye towards the development of a statistical adversarial contamination framework for VN. First, let N_{V*} be the collection of all nested-core nominatable sequences with nested v.o.i. V* = (V*_n ⊆ C_n). For a VN rule Φ and nondecreasing sequence (k_n) (satisfying the growth condition k_n = o(n) of Theorem 15), the level-(k_n) precision recall consistency class of Φ, denoted C_Φ, is defined to be the set of F ∈ N_{V*} for which Φ is level-(k_n) precision recall consistent. The lack of universal consistency ensures that C_Φ ≠ N_{V*} for every VN rule Φ. A natural question is then whether N_{V*} can be partitioned into finitely many maximal consistency classes. An affirmative answer would allow for ensemble methods to practically overcome the lack of universally consistent rules, and hence practically overcome any adversarial attack in the VN framework. We will see in Section 2.3.1 that the answer is, as expected, no, and any partition of N_{V*} into maximal consistency classes necessarily contains infinitely many parts; see Theorem 15. As a consequence, ensemble methods cannot recover universal consistency in VN. The insights developed in Section 2.3.1 further motivate the development of adversarial contamination regimes for a given rule Φ. The idea behind adversarial contamination is simple in this framework: the adversary contaminates a distribution in C_Φ so that the contaminated distribution no longer lies in C_Φ.

Counting Consistency Classes
How can a practitioner mitigate the impact of a lack of universal consistency? One idea would be to consider ensemble methods, as the practical implications of the lack of universal consistency can be mitigated if universally consistent ensemble schemes exist. In this section, we will formalize the notion of maximal VN consistency classes and prove that infinitely many maximal consistency classes exist. We begin by defining the notion of maximal consistency classes in the VN framework.
Definition 14 (Maximal Consistency Class). As above, let N_{V*} be the collection of all nested-core nominatable sequences with nested v.o.i. V* = (V*_n ⊆ C_n). For a nondecreasing integer sequence (k_n), we say that C ⊆ N_{V*} is a maximal level-(k_n) precision recall consistency class for V* if the following two conditions hold.
i. There exists a VN rule Φ that is jointly level-(k n ) precision recall consistent for V * for each F ∈ C; ii. If F / ∈ C, then there does not exist a VN rule Φ that is jointly level-(k n ) precision recall consistent for V * for each F ∈ C ∪ {F }.
A natural question to ask is whether it is possible to partition N_{V*} into a finite number of maximal level-(k_n) consistency classes for a particular sequence (k_n)_{n=1}^∞. Our next result, Theorem 15, shows that for any integer sequence (k_n) satisfying a modest growth condition, any partition of N_{V*} into maximal level-(k_n) consistency classes must contain infinitely many parts, thus erasing the hope that ensemble methods can recover universal consistency and practically mitigate the effect of any VN adversarial attack.
Theorem 15. Let (k n ) be a sequence of nondecreasing integers satisfying k n = o(n), and let V * be a nested sequence of vertices of interest satisfying |V * n | = Θ(1).
i. Let N_{V*} = ∪_{α∈A} C_α be a partition of N_{V*} into maximal level-(k_n) recall consistency classes; then |A| = ∞.
The proof of this Theorem can be found in Appendix B.

Verification functions
In the presence of an adversarial attack, is it possible, without additional supervision, to verify whether a given VN scheme is working on a given F_{c,θ}^{(n,m)} ∈ N_{n,m}? In other words, given a nondecreasing integer sequence (k_n), (g_1, g_2) ∈ G_n × G_m, and v.o.i. V*_n, can we consistently estimate the verification function h_{Φ_n} measuring the level-k_n losses of the scheme on (g_1, g_2)? Note that the scaling by |V*_n| in the recall setting and by k_n in the precision setting does not affect consistent estimation of h if |V*_n| = Θ(1) or, in the precision setting, if k_n = Θ(1). As such, the scaling is omitted.
The internal consistency criterion, Eq. 2, guarantees that h_{Φ_n} takes the same value for all obfuscating functions o_n, õ_n ∈ O_n. Indeed, the v.o.i.'s in g_2 are identical (though obfuscated differently) in o_n(g_2) and õ_n(g_2). If we consider an alternate (g'_1, g'_2) ~ F'_n, it could be the case that g'_1 = g_1 and g'_2 ≇ g_2, while the verification functions differ for all o_n ∈ O_n; indeed, consider letting the v.o.i.'s in g'_2 be different from (and not isomorphic to) those in g_2 (i.e., the behavior of the v.o.i. in F'_n is different from the behavior of the v.o.i. in F_n). Consider the problem of estimating h_{Φ_n} via ĥ_{Φ_n}. If the estimator is label-agnostic (i.e., there is no information in the obfuscated labeling of o(g_2)), then it is sensible to require that ĥ_{Φ_n} take the same value on all g'_2 ≅ g_2. Contrasting this with Eqs. (4) and (5), we see that (ĥ_{Φ_n}) cannot universally consistently estimate (h_{Φ_n}), as the sequence of estimators cannot account for the potentially different behaviors of the v.o.i.'s under the umbrella of nominatable distributions. To wit, we have the following lemma.

Lemma 16. With notation as above, let (ĥ_{Φ_n})_n be any sequence of label-agnostic (i.e., satisfying Eq. 6) estimators of (h_{Φ_n})_n. There exist sequences of nested-core nominatable distributions F = (F_n) and F' = (F'_n) such that for n sufficiently large, if (G_1, G_2) ~ F_n and (G'_1, G'_2) ~ F'_n, then (ĥ_{Φ_n}) cannot consistently estimate both h_{Φ_n}(G_1, G_2) and h_{Φ_n}(G'_1, G'_2).

As a result of the above discussion and lemma, we are unable to verify, without additional supervision, whether an adversary has moved the distribution out of a given VN rule's consistency class. This points to the primacy of additional supervision, which in the VN framework often comes in the form of a user-in-the-loop. Indeed, we are currently exploring the role and impact of a user-in-the-loop in VN, where the user can evaluate the interestingness of the vertices in the top k of the nomination list for a cost c_k.
This supervision can also be thought of as a form of regularization, designed to increase the consistency class of a given VN rule.

Adversarial Vertex Nomination
In order to actively model adversarial attacks in the VN-framework, we formalize the notion of an edge adversary.
Definition 17 (Adversary). Let F be a distribution on graphs in G_m, and let U be a random variable independent of G ~ F. We say A = (f_A, V_A, U, θ) is an adversary parameterized by θ ∈ Θ if f_A(G, U, θ) is a graph on the vertices of G such that every edge in the symmetric difference E(f_A(G, U, θ)) Δ E(G) has both of its endpoints in V_A(G, U, θ). Succinctly put, if an edge is added to or removed from E(G), then the vertices adjacent to that edge must be in V_A(G, U, θ).
In the above, U represents an independent source of randomness utilized in the adversarial attack.
Note that f_A is simply a function that adds/deletes edges from a network, potentially at random, and these edges must be incident to the vertices of V_A. To this end, we will refer to V_A as the vertices contaminated by A.
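A minimal sketch of such an adversary acting on an adjacency matrix, with V_A fixed rather than random and a single flip probability (both simplifying choices of ours):

```python
import numpy as np

def edge_adversary(A, VA, s_flip, seed=0):
    # f_A(G, U, theta): independently flip (add or delete) each vertex pair
    # whose endpoints BOTH lie in the contaminated set V_A, with probability
    # s_flip. Pairs touching any vertex outside V_A are never modified,
    # which is exactly the constraint Definition 17 places on an adversary.
    rng = np.random.default_rng(seed)  # plays the role of U
    Ac = A.copy()
    VA = sorted(VA)
    for a, u in enumerate(VA):
        for v in VA[a + 1:]:
            if rng.random() < s_flip:
                Ac[u, v] = Ac[v, u] = 1 - Ac[u, v]
    return Ac
```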
If we are given a sequence of nominatable distributions F = (F_n)_{n=n_0}^∞, where F_n is a distribution on G_n × G_m, then we will let f_{A_n}(F_n) denote the distribution of a pair of graphs realized from F_n with the second graph G_2 contaminated by f_{A_n}; we call a sequence (f_{A_n})_{n=n_0}^∞ an adversary rule. In the language of VN consistency classes, we posit that an adversary rule aims to contaminate a VN rule Φ by transforming a sequence F in the consistency class of Φ into a contaminated sequence (f_{A_n}(F_n)) lying outside the consistency class of Φ.

Remark 18. Let G_2 = (V_2, E_2) and G'_2 = (V'_2, E'_2). Consider an edge adversary f_A acting on G'_2. By considering V_2 = V(G'_2) \ V_A, we can also consider this adversary as a vertex adversary that randomly adds vertices to G_2. Vertex addition and deletion can be simultaneously modeled by first considering a mechanism for randomly deleting vertices from G_2 = (V_2, E_2) before using the above approach to add adversarial vertices to the network.
Remark 19. In [50], the authors consider direct attacks and influencer attacks in which, given a vertex of interest v * , either v * ∈ V A or v * / ∈ V A respectively. However, note that in [50], the objective is vertex classification, whereas we are not directly classifying vertices. Rather, we are interested in ranking vertices in G 2 by interestingness given limited training data in G 1 . We will typically assume that v * / ∈ V A (i.e. the adversary does not control the vertex of interest), so that we are examining influencer attacks.

A Simple VN Adversarial Contamination Model
Now that we have developed the requisite setting for framing the idea of adversarial contamination in the VN-setting, we will consider a simple model for adversarial contamination in the stochastic blockmodel (SBM) of [23].
Definition 20 (Stochastic Blockmodel). We say that an $n$-vertex random graph $G$ is an instantiation of a stochastic blockmodel with parameters $(n, K, B, \pi)$ (written $G \sim \mathrm{SBM}(n, K, B, \pi)$) if
i. The block membership vector $\pi \in \mathbb{R}^K$ satisfies $\pi_i \ge 0$ for all $i \in [K]$, and $\sum_i \pi(i) = 1$;
ii. Each vertex $v \in V(G)$ is independently assigned a block label $b(v) \in [K]$, with $P(b(v) = k) = \pi_k$;
iii. Conditional on the block assignments, the edges are independent, with $P(\{u, v\} \in E(G)) = B_{b(u), b(v)}$, where $B \in [0,1]^{K \times K}$ is the block-probability matrix.
In addition, we will say that a pair of graphs $(G_1, G_2)$ is an instantiation of a $\rho$-correlated $\mathrm{SBM}(n, K, B, \pi)$ (written $(G_1, G_2) \sim \mathrm{SBM}(\rho, n, K, B, \pi)$) if marginally $G_1 \sim \mathrm{SBM}(n, K, B, \pi)$ and $G_2 \sim \mathrm{SBM}(n, K, B, \pi)$, and the collection of edge-indicator random variables is mutually independent except that for each $\{u, v\} \in \binom{V}{2}$, $\mathrm{corr}\big(\mathbb{1}\{u \sim_{G_1} v\}, \mathbb{1}\{u \sim_{G_2} v\}\big) = \rho$.
Consider $G$ as an $n$-vertex stochastic blockmodel, with two blocks, $B_1$ and $B_2$, and with $\pi = (1/2, 1/2)^T$. The block-probability matrix $B$ is given by
$$B = \begin{pmatrix} p & r \\ r & q \end{pmatrix} \qquad (7)$$
with $p \ge q \ge r > 0$. Given $G = g$, we define the following VN adversarial contamination procedure $A = (f_A, V_A, U, \theta)$ acting on $g$ as follows:
1. $\theta = (c_+, c_-, \pi_+, \pi_-, s_+, s_-)$ is a vector of parameters, where $c_+, c_- \in \mathbb{Z}$ satisfy $c_+ + c_- \le n$; $\pi_+, \pi_- \in (0, 1)$; and $s_+, s_- \in [0, 1]$;
2. $U$ is a uniformly distributed random variable independent of $G$;
3. $f_A(g, U, \theta) \in \mathcal{G}_n$ is defined as follows:
i. Initialize $g_c = g$.
ii. Create a set of vertices $W_+$ by independently selecting each vertex in $V = [n]$ to be in $W_+$ with probability $\pi_+$. Then, create a set of vertices $W_-$ by independently selecting each vertex in $V \setminus W_+$ to be in $W_-$ with probability $\pi_-$.
iii. For each vertex pair $\{v, u\} \in W_+ \times (V \setminus W_-)$: if $\{v, u\} \in E(g_c)$, nothing happens; if $\{v, u\} \notin E(g_c)$, an edge connecting $\{v, u\}$ is independently added to $g_c$ with probability $s_+$.
iv. For each vertex pair $\{v, u\} \in W_- \times (V \setminus W_+)$: if $\{v, u\} \notin E(g_c)$, nothing happens; if $\{v, u\} \in E(g_c)$, the edge is independently deleted from $g_c$ with probability $s_-$.
v. Set $f_A(g, U, \theta) = g_c \in \mathcal{G}_n$.
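The steps i.–v. above can be sketched as follows on an adjacency matrix. This is a minimal illustration: the function name `contaminate` is ours, and the vertex-count parameters $c_+, c_-$ of $\theta$ are omitted since steps i.–v. do not use them.

```python
import numpy as np

def contaminate(A, pi_plus, pi_minus, s_plus, s_minus, rng=None):
    """Sketch of the adversarial procedure of Definition 20 (steps i.-v.),
    acting on a symmetric 0/1 adjacency matrix A."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    Ac = A.copy()                                        # i. initialize g_c = g
    # ii. select W+ first, then W- from the remaining vertices
    W_plus = {v for v in range(n) if rng.random() < pi_plus}
    W_minus = {v for v in range(n)
               if v not in W_plus and rng.random() < pi_minus}
    # iii. add absent edges between W+ and V \ W- with probability s+
    for v in W_plus:
        for u in range(n):
            if u != v and u not in W_minus and Ac[v, u] == 0:
                if rng.random() < s_plus:
                    Ac[v, u] = Ac[u, v] = 1
    # iv. delete present edges between W- and V \ W+ with probability s-
    for v in W_minus:
        for u in range(n):
            if u != v and u not in W_plus and Ac[v, u] == 1:
                if rng.random() < s_minus:
                    Ac[v, u] = Ac[u, v] = 0
    return Ac                                            # v. return g_c
```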
The auxiliary randomness U in A is utilized to make the random vertex selections in ii., the random edge additions in iii., and the random edge deletions in iv.
Notice that this adversarial model gives rise to a new stochastic blockmodel, with an edge-probability matrix $\tilde B$ defined over six blocks: $\tilde B_1^+$ denotes the vertices in $W_+ \cap B_1$; $\tilde B_1^-$ the vertices in $B_1 \cap W_-$; and $\tilde B_1$ the vertices in $B_1 \setminus (\tilde B_1^+ \cup \tilde B_1^-)$; with $\tilde B_2^+$, $\tilde B_2^-$, and $\tilde B_2$ defined analogously. We note here that this adversarial contamination model is similar to the contamination model considered in [8].
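For instance, a vertex pair with original connection probability $p$ that is treated in step iii. has contaminated probability $p + (1-p)s_+$ (the edge is kept, or added with probability $s_+$ if absent), while a pair treated in step iv. has contaminated probability $p(1 - s_-)$. These entries of $\tilde B$ can be computed as follows (helper names are ours):

```python
def add_prob(p, s_plus):
    # kept if present; added with probability s+ if absent: p + (1 - p) s+
    return p + (1.0 - p) * s_plus

def del_prob(p, s_minus):
    # absent stays absent; deleted with probability s- if present: p (1 - s-)
    return p * (1.0 - s_minus)
```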
Note also that the original block structure is preserved amongst the vertices in $\tilde B_1 \cup \tilde B_2$, and we can view this contamination model as adding vertices randomly to $G[\tilde B_1 \cup \tilde B_2]$, i.e., the induced subgraph on $\tilde B_1 \cup \tilde B_2$. When $(G_1, G_2) \sim \mathrm{SBM}(\rho, n, K, B, \pi)$ and this adversarial procedure is applied to $G_2$, we will denote the resulting contaminated pair by $(G_1, f_{A_n}(G_2))$.
Remark 21. Let $A_n$ be the simple adversarial rule outlined above. We next exhibit a very simple VN rule $\Phi$ and nested core nominatable sequence $F$ for which $\Phi$ is consistent, and show that the adversary acting on $G_2$ destroys this consistency. Consider $F_n = \mathrm{SBM}(\rho, n, K, B, \pi)$ supported on $\mathcal{G}_n \times \mathcal{G}_n$, where $B$ is as in Eq. 7 with $\pi = (1/2, 1/2)$, $p > q > r$ fixed, and $\rho > 0$ fixed. Suppose that $\Phi_n$ is a VN scheme that runs spectral clustering on the contaminated graph by first selecting the number of communities in a consistent manner (via adjacency spectral clustering, for example [28]), then ranking all the vertices in the group with the highest probability of within-group connection (in a fixed but arbitrary order), and finally ranking the rest of the vertices in a fixed but arbitrary order. We present the following result as a lemma, though the proof is a simple calculation.

Lemma 22.
In the adversarial contamination model $A_n$ defined above, if either
1. $p - q < s_-$, or
2. $\frac{p-q}{1-q} < s_+$,
then $\Phi_n$ is no longer consistent with respect to the adversarially contaminated model sequence.
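The sufficient conditions of Lemma 22 are easy to check numerically; a small hypothetical helper (the function name is ours):

```python
def adversary_breaks_consistency(p, q, s_plus, s_minus):
    """Check the sufficient conditions of Lemma 22: the scheme Phi_n loses
    consistency if s- exceeds p - q, or s+ exceeds (p - q) / (1 - q)."""
    return (p - q < s_minus) or ((p - q) / (1.0 - q) < s_plus)
```

For example, with $p = 0.5$ and $q = 0.3$, an edge-deletion rate $s_- = 0.25 > p - q = 0.2$ already suffices to break consistency.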

Regularizing the Adversary
Given the adversarial model considered above, and the discussion on VN verification in Section 2.3.2, it is natural to seek procedures for mitigating the effect of the contamination in G 2 . Network regularization is a natural solution, and we here consider as a regularization strategy the network analogue of the classical trimmed mean estimator. To wit, we consider the regularization procedure in Algorithm 1 inspired by the network trimming procedure in [16]; see also the work in [25] for the impact of trimming regularization on random graph concentration.

Algorithm 1 Regularization via network trimming
Input: Graph $G$; $\epsilon, h \in (0, 1)$; seed set $S$.
1. Initialize $V_t = S$.
2. Rank the vertices in $V(G) \setminus S$ by descending degree (ties are broken via averaging over ranks); for each vertex $u \in V(G) \setminus S$, denote the rank via $\mathrm{rk}(u)$.
3. Remove the $h$-fraction of highest-degree vertices and the $\epsilon$-fraction of lowest-degree vertices in $V(G) \setminus S$, yielding the trimmed graph $G^{(\epsilon, h)}$.
4. Embed $G^{(\epsilon, h)}$ using ASE and cluster the embedding using a model-based GMM procedure. Given a clustering $\mathcal{C}$, the modularity is defined as usual via
$$Q = \frac{1}{2|E|} \sum_{i,j} \left( A_{ij} - \frac{d_i d_j}{2|E|} \right) \mathbb{1}\{C_i = C_j\},$$
where $|E|$ is the number of edges in $G^{(\epsilon, h)}$, $d_i$ is the degree of vertex $i$, and $C_i$ is the cluster containing vertex $i$ in $\mathcal{C}$.
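A minimal sketch of the trimming step and the modularity computation, assuming (as in the surrounding discussion) that $\epsilon$ trims the lowest-degree fraction and $h$ the highest-degree fraction of non-seed vertices; the function names are ours, and the ASE/GMM steps are omitted here.

```python
import numpy as np

def trim(A, eps, h, seeds):
    """Sketch of the trimming step of Algorithm 1: drop the lowest-degree
    eps-fraction and the highest-degree h-fraction of non-seed vertices
    (seed vertices are always retained)."""
    n = A.shape[0]
    nonseed = [v for v in range(n) if v not in set(seeds)]
    order = sorted(nonseed, key=lambda v: A[v].sum())    # ascending degree
    lo, hi = int(eps * len(order)), int(h * len(order))
    cut = set(order[:lo])
    if hi > 0:
        cut |= set(order[-hi:])
    keep = sorted(v for v in range(n) if v not in cut)
    return A[np.ix_(keep, keep)], keep

def modularity(A, labels):
    """Newman modularity Q of a clustering of the graph with adjacency A."""
    m2 = A.sum()                                         # 2|E| for a simple graph
    deg = A.sum(axis=1)
    Q = 0.0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] == labels[j]:
                Q += A[i, j] - deg[i] * deg[j] / m2
    return Q / m2
```

For two disjoint triangles clustered by component, this returns the familiar value $Q = 1/2$.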

Regularization in our Motivating Example from Section 1.1
We next explore the impact of regularization on our motivating HS social network example from Section 1.1. In the left panel of Figure 2, we plot the modularity of the GMM clustering in the trimmed $G_2^{(\epsilon, h)}$ as a function of $\epsilon, h \in \{0, 0.05, 0.1, 0.15, 0.2, 0.25\}$. Note that we average the modularity values over $n_{MC} = 500$ seed sets of size $s = 10$ (the same seed sets as used in Figure 1). The color indicates the value of the modularity, with darker red indicating lower values and lighter yellow-to-white indicating larger values. From the figure, we can see that modularity is maximized when $h = 0$ (i.e., no large-degree vertices trimmed) and $\epsilon \approx 0.05$–$0.1$. We note that this trimming process can cut core vertices as well as junk vertices, and core vertices cut from $G_2$ can never be recovered via VN • GMM • ASE. This is demonstrated in the right panel of Figure 2, where the horizontal asymptote for each trimming value indicates the maximum number of core vertices that are recoverable after regularization. In the figure, the gold line represents performance in the idealized network pair. Results are further summarized in Figure 3 and Table 2.

Experiments
We next explore the effect of our adversarial noise model in a simulated data experiment, and the effect of adversarial contamination (and a subsequent model for regularization) in a real data example derived from Bing entity transition graphs. First, we explain in detail the steps of the VN scheme we will consider in our experiments.

Experimental Setup
In the contamination model of Section 3.1, we consider the following VN scheme, denoted VN • GMM • ASE. Letting v * ∈ V (G 1 ) (resp., V * ⊂ V (G 1 )) be the vertex (resp., vertices) of interest in G 1 , we seek the corresponding vertex (resp., vertices) of interest in V (G 2 ) as follows: 1. Given two graphs, G 1 and G 2 , we use Adjacency Spectral Embedding (ASE) [43] to separately embed G 1 and G 2 into a common Euclidean space R d . Given the n × n adjacency matrix A of G 1 , the d-dimensional ASE of G 1 is defined as follows.
Definition 24 (Adjacency spectral embedding (ASE)). Given $d \in \mathbb{Z}_{>0}$, the $d$-dimensional adjacency spectral embedding of $A$ is given by $\mathrm{ASE}(A) = U_A S_A^{1/2}$, where $|A| = (A^T A)^{1/2}$, $S_A \in \mathbb{R}^{d \times d}$ is the diagonal matrix with the $d$ largest eigenvalues of $|A|$ on its diagonal, and $U_A \in \mathbb{R}^{n \times d}$ has columns which are the eigenvectors corresponding to the eigenvalues of $S_A$.
Simply stated, the ASE of a graph $G$ provides Euclidean features for each vertex in $G$ on which to perform subsequent inference. Combined with recent efforts proving that the ASE provides consistent estimators of the latent position parameters in random dot product graphs and positive-definite stochastic blockmodels [43,2], the ASE allows a host of classical inference methodologies to be successfully employed within these random graph frameworks [44,45,29]. To choose $d$ above, we use the machinery of [49,10] to develop the principled heuristic of estimating $d$ as the larger of the two elbows of the associated scree plots of the singular values of $G_1$ and $G_2$.
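For a symmetric adjacency matrix, Definition 24 can be computed directly from an eigendecomposition, since $|A| = (A^T A)^{1/2}$ then shares eigenvectors with $A$ and has eigenvalues $|\lambda_i|$. A minimal sketch (the function name `ase` is ours):

```python
import numpy as np

def ase(A, d):
    """Sketch of the d-dimensional adjacency spectral embedding of a
    symmetric adjacency matrix A: top-d spectral decomposition of |A|."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:d]   # d largest eigenvalues of |A|
    U = vecs[:, idx]                      # corresponding eigenvectors U_A
    S = np.abs(vals[idx])                 # diagonal of S_A
    return U * np.sqrt(S)                 # X_hat = U_A S_A^{1/2}
```

For instance, on the complete graph $K_4$ the rank-one embedding satisfies $\hat X \hat X^T \approx 3/4$ off the diagonal, recovering the leading low-rank structure of $A$.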

2.
Solve the orthogonal Procrustes problem [40] to find an orthogonal transformation aligning the seeded vertices across graphs. Let $X_S$ (resp., $Y_S$) be the matrix composed of the rows of $\mathrm{ASE}(G_1)$ (resp., $\mathrm{ASE}(G_2)$) corresponding to the seeded vertices in $S$. Letting the SVD of $Y_S^T X_S$ be $U \Sigma V^T$, the solution to
$$\min_{R \,:\, R^T R = I} \| Y_S R - X_S \|_F$$
is given by $R = U V^T$. Use this transformation to align the embeddings of $G_1$ and $G_2$.

3.
Motivated by the central limit theorem of [3] for the residual errors between the rows of the ASE and the latent position parameters in random dot product graphs, we use model-based Gaussian mixture modeling (GMM) to simultaneously cluster the vertices of the embedded graphs. Here, we employ the R package MClust [19].
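This Procrustes step can be sketched in a few lines (the function name is ours):

```python
import numpy as np

def procrustes_align(X_S, Y_S):
    """Sketch of step 2: the orthogonal R minimizing ||Y_S R - X_S||_F,
    via the SVD Y_S^T X_S = U Sigma V^T, with solution R = U V^T."""
    U, _, Vt = np.linalg.svd(Y_S.T @ X_S)
    return U @ Vt
```

If the second embedding is an exact rotation of the first, the recovered $R$ undoes it: for $Y_S = X_S Q$ with $Q$ orthogonal, $Y_S R = X_S$.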

4.
Rank the candidate matches in $G_2$ according to the following heuristic. If $u \in V(G_1)$ and $v \in V(G_2)$ are clustered points in the Procrustes-aligned embedding of $G_1$ and $G_2$, with respective covariance matrices $\Sigma_u$ and $\Sigma_v$ in their components of the GMM, then compute a dissimilarity $\Delta(u, v)$ from the aligned embeddings and these covariances, and rank the vertices of $G_2$ (where $n_2 = |V(G_2)|$) by increasing value of $\Delta$, with ties broken in a fixed, deterministic fashion.
In the case of multiple v.o.i. $V^*$, we rank the vertices $u$ in $G_2$ by increasing value of $\min_{v \in V^*} \Delta(v, u)$, with ties broken in a fixed, deterministic fashion. We choose $\min_{v \in V^*} \Delta(v, u)$ as our ranking metric here because what defines interestingness can vary even among the v.o.i. in $G_1$; i.e., $\max_{v, v' \in V^*} \Delta(v, v')$ may be relatively large. Being uniformly close to the collection of v.o.i. would then be too stringent a condition, and we merely require highly nominated vertices to have close proximity to a v.o.i., as this is evidence that the highly nominated vertices in $G_2$ correspond to these proximal v.o.i. in $G_1$.
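The multiple-v.o.i. ranking rule can be sketched as follows, with Euclidean distance standing in for the GMM-based dissimilarity $\Delta$ (an illustrative assumption; the function name is ours):

```python
import numpy as np

def nominate(X1_voi, X2):
    """Sketch of step 4 for multiple v.o.i.: rank candidates u in G_2 by
    increasing min_{v in V*} Delta(v, u), with Euclidean distance between
    aligned embedding rows standing in for Delta. Ties are broken
    deterministically via a stable sort."""
    # distance from each candidate row of X2 to its nearest v.o.i. embedding
    d = np.min(
        np.linalg.norm(X2[:, None, :] - X1_voi[None, :, :], axis=2), axis=1
    )
    return list(np.argsort(d, kind="stable"))
```

A candidate needs to be close to only one v.o.i. to be highly ranked, mirroring the min-based metric above.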

Simulation
We consider the model in Section 3.1 with fixed parameter choices, evaluating performance for varying values of $(\epsilon, h)$.
Note that these parameter choices yield an illustrative simulation, and we find that the resulting findings hold across multiple parameter choices as well. Note that, in the notation of Section 3.1, if $(G_1, G_2) \sim \mathrm{SBM}(\rho, n, K, B, \pi)$, we will consider the contaminated pair $(G_1, f_{A_n}(G_2))$. In this simulation example, we observe that the adversarial contamination model significantly decreases VN performance and that the trimming regularization mitigates this contamination and recovers much of the lost inferential performance. In Figure 4 we plot the performance of VN • GMM • ASE over a number of $(\epsilon, h)$ trimming pairs (we note that for all correlation/regularized/contaminated/trimmed combinations, mean performance is significantly better than chance, and chance-normalized plots are omitted). In the left panel, we plot the modularity of the GMM clustering in the trimmed $G_2^{(\epsilon, h)}$. We see here that, as expected, performance loss due to contamination is mitigated by using the true model-based trimming parameters $\epsilon = h = 0.1$, and by using the modularity-maximizing $\epsilon = 0.1$, $h = 0$. If we over-trim, here represented by $\epsilon = h = 0.2$, we see a degradation in performance, as expected from the low modularity value in the left panel for $\epsilon = h = 0.2$. We again see here the interesting phenomenon observed in the motivating high school friendship network example of Section 1.1: modularity, and subsequently VN performance, tends to favor more trimming of the low-degree vertices and less trimming of the high-degree vertices. This suggests that low-degree contamination is most effective at thwarting the performance of VN • GMM • ASE, perhaps contrary to the intuition that high-degree nodes adversely affect concentration of adjacency matrices [25].
As expected, over-regularizing results in a significant number of v.o.i. being trimmed, and in significant performance loss as compared to the more moderate choices of regularization. Lastly, exploring the effect of $\rho$ on VN • GMM • ASE performance, we repeat the above experiment with $\rho = 0.5$ and $\rho = 0.3$. Results are plotted in Figure 6. As expected, the trends observed in Figure 4 hold here as well, with an across-the-board performance decrease as $\rho$ decreases.

Microsoft Bing Entity Graph Transitions
In the next example, we consider a multigraph derived from one month of aggregate Bing entity graph transitions. The multigraph represents entity transitions, and each weighted edge-type of the multigraph represents an aggregated signal that captures a transition rate between two entities while browsing. There are multiple ways that a transition between two entities can be made, so we count each aggregated signal separately using the different edge-types in the multigraph: one edge-type represents transitions that were made via a suggestion interface; the other edge-type represents transitions that were made independent of any suggestion interface. As such, one type will have a constrained set of transition probabilities (it can realistically only connect to a subset of the vertices in the graph), while the other will be more "unlimited" in that it may connect to any other entity in the entire graph.
The resulting graphs are symmetric, weighted, and loop-free, with 36808 vertices. As expected, absolute performance in the clean case (the left panel of Figure 7) is better than in the regularized setting. From the right panel of Figure 7 (in which the gold line represents the idealized network pair, the red line the contaminated pair, and the other colors various levels of regularization; see Section 4 for details), we observe, however, that the relative improvement over chance achieved in the regularized setting exceeds that in the clean setting, and we observe that VN • GMM • ASE performance is worse than chance in the contaminated and over-regularized network settings. While regularization has not recovered the performance of the idealized setting, the improvement induced via regularization is dramatic versus the contaminated setting. We also note that the modularity levels for automating the choice of $(\epsilon, h)$ in this example are relatively stable to the trimming value, with the clustered trimmed graph achieving $Q = 0.53$. Indeed, in this data example the graphs do not cluster particularly well under any trimming conditions, and a more modest trimming scheme is more effective for the subsequent VN inference task.
In Figure 8, we again consider the performance of VN • GMM • ASE with the same $n_{MC} = 2$ randomly chosen 100-vertex seed sets and various levels of regularization, here plotting over an extended x-axis. In pink, we plot VN • GMM • ASE run on (G

Discussion
Our motivating question is two-fold: what effect does adversarial contamination have on the performance of vertex nomination, and can this effect be mitigated? Herein, we have demonstrated both theoretically and empirically that an adversary can cause our VN scheme to fail (i.e., nominate the wrong vertices). Empirically, we have also demonstrated that regularization can be effective for mitigating the effect of the contamination model posited herein, though we have not proven this result. Establishing the theoretical effect of regularization on VN is an open problem, and the subject of our present research.
In [27], the authors showed that there can be no universally consistent vertex nomination scheme assuming only one vertex of interest. In this paper, we have seen that with a suitable definition of a maximal consistency class and (possibly) multiple vertices of interest, there are infinitely many such consistency classes, which implies that ensemble methods cannot recover consistency and/or thwart an arbitrary adversary. This allows us to formulate our model of adversarial contamination in terms of consistency classes; indeed, an adversary for a particular VN rule aims to move the distribution out of the rule's consistency class. A natural next question to consider would be what effect regularization has on a VN rule's consistency class. Ideally, regularization enlarges the consistency class of a VN rule, thereby making the adversary's job (i.e., moving the model out of the consistency class) more difficult. The interplay between the adversary and regularization in VN is central to this story, although we are only in the infancy of understanding it.
There are several issues compounding the theoretical analysis of regularization, even in the relatively simple setting posited herein. Indeed, the adversarially modified graph $G_2$ is, under our modeling assumptions of Section 3.1, a stochastic blockmodel, albeit with more blocks than in $G_1$. Theoretically analyzing the effect of our trimming regularizer in the context of VN • GMM • ASE would require novel results on the concentration and spectral properties of regularized random graphs, akin to (though different from) those in [25]. Indeed, regularization and its effect on the spectral analysis of random graphs is still not very well understood, as regularization often induces complicated dependency structure in the resulting regularized graph. Existing spectral analysis techniques often require relating differences in eigenvectors/eigenvalues for perturbed matrices with independent (or weakly dependent [9]) entries, which is not directly applicable in the regularized setting. Hence, new techniques must be developed to understand regularization. We believe that our theoretical findings are a necessary first step toward understanding how an adversary can affect vertex nomination.
Our proposed definition of an adversary is suited to a general random graph setting, and it provides a simple surrogate in which to study the effect of contamination in real data examples. From our simulation study and real data examples, we have seen that a particular VN rule (VN • GMM • ASE) succeeds before adversarial contamination, fails after contamination, and succeeds again after graph regularization. We are currently exploring the effect of contamination on a broader class of VN rules, and considering other models for adversarial contamination and subsequent regularization. Finally, while we have given a partial negative answer to our question of whether consistency can be retained in the general adversarial setting, another valid consideration is whether there are adversarial models for which the adversary does not affect consistency. While we believe even simple manipulation of the edges of $G_2$ can affect consistency, it may be possible to derive bounds and phase transitions on the number of edges (or vertices) that an adversary would need to modify to change the result. Mathematically, this is akin to finding limits on the size of $|V_A|$ in our definition of an adversary.
A Proof of Bayes Optimality for the Scheme in Sec. 2.2.1
and $\sigma_{\Phi(g_1, g_2)}$. Lastly, for $(g_1, g_2) \in \mathcal{G}^a_n \times \mathcal{G}^a_m$, define $p_\Phi \in [0, 1]^m$ analogously. Note that, by definition, $p_{\Phi^*}$ majorizes $p_\Phi$.
To show that Φ * is Bayes optimal for L

B Proof of Theorem 15
We first note that the growth conditions on $|V^*_n|$ and on $k_n$ in the precision case ensure that the results for precision and recall consistency follow from each other; we will therefore focus our attention on recall consistency, as the analogous result for precision follows mutatis mutandis.
For each $\ell \in [\lfloor n/(3\xi_n) \rfloor]$ and each vertex $v \in V(B_\ell)$, independently of all other edges in the network, select $\ell$ vertices uniformly at random from $H_n$, i.e., from $\{\lfloor n/(3\xi_n) \rfloor \xi_n + 1, \lfloor n/(3\xi_n) \rfloor \xi_n + 2, \ldots, n\}$.
Denote this set of vertices via $V_{v, \ell}$, and place an edge between $v$ and each vertex in $V_{v, \ell}$. Let $\mathcal{H}_{n,i}$ be the collection of all graphs possible under the above construction, and let $F_{n,i}$ be the distribution on $\mathcal{H}_{n,i}$ outlined above. With $c = n$ and the correspondence the identity, note that here $L^{*, (1)}_{k_n}(V^*, F_{n,i}) \le 1 - k_n/\xi_n$.
Indeed, for a given $F_{n,i}$, consider the following VN scheme $\Psi_n$. First identify the vertices of $H_n$; this is possible as $H_n$ is a complete subgraph of order at least $2n/3$, and each $B_i$ is of order $o(n)$ with vertices of degree at most $\lfloor n/(3\xi_n) \rfloor \le n/3$. Each $B_i$ can then be recovered and identified by computing the number of edges between $H_n$ and each vertex $v \in V \setminus V(H_n)$; in particular, $B_i$ can be identified as the set of vertices in $V \setminus V(H_n)$ with $i$ edges to $V(H_n)$. Let $\Psi_n$ then rank the vertices in $B_i$ (in arbitrary order) at the top of its nomination list. It is immediate then that $L^{(1)}_{k_n}(\Psi_n, V^*) = 1 - k_n/\xi_n$.
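The block-identification step of $\Psi_n$ can be sketched as follows (assuming the clique $H_n$ has already been located; the function name is ours):

```python
import numpy as np

def identify_blocks(A, H):
    """Sketch of the block-identification step in the scheme Psi_n:
    given the (already located) clique H_n, assign each remaining vertex v
    to B_i, where i is the number of edges from v to V(H_n)."""
    H = sorted(H)
    Hset = set(H)
    blocks = {}
    for v in range(A.shape[0]):
        if v in Hset:
            continue
        i = int(A[v, H].sum())            # edges from v into the clique
        blocks.setdefault(i, []).append(v)
    return blocks
```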
By consistency with respect to $\tilde F_{n,i}$ and $\tilde F_{n,j}$, i.e., by Eqs. 10–11, we have that for any $\epsilon > 0$, there exists $\tilde n$ such that for $n \ge \tilde n$, we have
$$P_{F_{n,i}}(E^v_{n,i}) \ge k_n/\xi_n - \epsilon; \qquad (12)$$
$$P_{F_{n,j}}(E^v_{n,j}) \ge k_n/\xi_n - \epsilon. \qquad (13)$$
As $\epsilon$ was chosen arbitrarily, and $k_n/\xi_n$ is bounded away from 0 by assumption, we reach our desired contradiction, and $\Phi$ cannot be consistent with respect to both $F_i$ and $F_j$. As $i, j \in [\lfloor n_0/(3\xi_{n_0}) \rfloor]$ were arbitrary, we see that there must be at least countably many consistency classes (since there are at least $\lfloor n_0/(3\xi_{n_0}) \rfloor$ of them, and we can let $n_0$ tend to infinity).