The Annals of Applied Statistics

Vertex nomination schemes for membership prediction

D. E. Fishkind, V. Lyzinski, H. Pao, L. Chen, and C. E. Priebe

Full-text: Access denied (no subscription detected)

We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber. If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text

Abstract

Suppose that a graph is realized from a stochastic block model where one of the blocks is of interest, but many or all of the vertices’ block labels are unobserved. The task is to order the vertices with unobserved block labels into a “nomination list” such that, with high probability, vertices from the interesting block are concentrated near the list’s beginning. We propose several vertex nomination schemes. Our basic—but principled—setting and development yields a best nomination scheme (which is a Bayes–Optimal analogue), and also a likelihood maximization nomination scheme that is practical to implement when there are a thousand vertices, and which is empirically near-optimal when the number of vertices is small enough to allow comparison to the best nomination scheme. We then illustrate the robustness of the likelihood maximization nomination scheme to the modeling challenges inherent in real data, using examples which include a social network involving human trafficking, the Enron Graph, a worm brain connectome and a political blog network.

Article information

Source
Ann. Appl. Stat. Volume 9, Number 3 (2015), 1510-1532.

Dates
Received: August 2014
Revised: February 2015
First available in Project Euclid: 2 November 2015

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1446488749

Digital Object Identifier
doi:10.1214/15-AOAS834

Mathematical Reviews number (MathSciNet)
MR3418733

Zentralblatt MATH identifier
06525996

Keywords
Vertex nomination stochastic block model graph matching spectral partitioning

Citation

Fishkind, D. E.; Lyzinski, V.; Pao, H.; Chen, L.; Priebe, C. E. Vertex nomination schemes for membership prediction. Ann. Appl. Stat. 9 (2015), no. 3, 1510--1532. doi:10.1214/15-AOAS834. https://projecteuclid.org/euclid.aoas/1446488749


Export citation

References

  • Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery LinkKDD’05 36–43. ACM, New York.
  • Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems 21 (D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, eds.) 33–40. Curran, Red Hook, New York.
  • Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
  • Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993–1022.
  • Chang, J. and Dai, A. (2010). LDA: Collapsed Gibbs sampling methods for topic models, 2010. R Package Version 1.
  • Conte, D., Foggia, P., Sansone, C. and Vento, M. (2004). Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18 265–298.
  • Coppersmith, G. (2014). Vertex nomination. Wiley Interdisciplinary Reviews: Computational Statistics 6 144–153.
  • Coppersmith, G. A. and Priebe, C. E. (2012). Vertex nomination via content and context. Preprint. Available at arXiv:1201.4118.
  • Erdős, P. and Rényi, A. (1963). Asymmetric graphs. Acta Math. Acad. Sci. Hungar 14 295–315.
  • Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T. and Priebe, C. E. (2013). Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM J. Matrix Anal. Appl. 34 23–39.
  • Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486 75–174.
  • Fraley, C. and Raftery, A. E. (1999). MCLUST: Software for model-based cluster analysis. J. Classification 16 297–306.
  • Fraley, C. and Raftery, A. E. (2003). Enhanced model-based clustering, density estimation, and discriminant analysis software: MCLUST. J. Classification 20 263–286.
  • Garey, M. R. and Johnson, D. S. (1979). Computer and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York.
  • Hubert, L. and Arabie, P. (1985). Comparing partitions. J. Classification 2 193–218.
  • Lee, D. S. and Priebe, C. E. (2012). Bayesian vertex nomination. Preprint. Available at arXiv:1205.5082.
  • Lyzinski, V., Fishkind, D. E. and Priebe, C. E. (2014). Seeded graph matching for correlated Erdos–Renyi graphs. J. Mach. Learn. Res. 15 3513–3540.
  • Lyzinski, V., Sussman, D. L., Fishkind, D. E., Pao, H., Chen, L., Vogelstein, J. T., Park, Y. and Priebe, C. E. (2015a). Spectral clustering for divide-and-conquer graph matching. Parallel Comput. 47 70–87.
  • Lyzinski, V., Fishkind, D., Fiori, M., Vogelstein, J. T., Priebe, C. E. and Sapiro, G. (2015b). Graph matching: Relax at your own risk. Preprint. IEEE Trans. Pattern Anal. Mach. Intell. To appear. DOI:10.1109/TPAMI.2015.2424894.
  • Lyzinski, V., Sussman, D. L., Tang, M., Athreya, A. and Priebe, C. E. (2014b). Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding. Electron. J. Stat. 8 2905–2922.
  • Newman, M. E. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E (3) 69 026113.
  • Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077–1087.
  • Pólya, G. (1937). Kombinatorische Anzahlbestimmungen für Gruppen, Graphen und chemische Verbindungen. Acta Math. 68 145–254.
  • Priebe, C. E., Conroy, J. M., Marchette, D. J. and Park, Y. (2005). Scan statistics on enron graphs. Comput. Math. Organ. Theory 11 229–247.
  • Read, R. C. and Corneil, D. G. (1977). The graph isomorphism disease. J. Graph Theory 1 339–363.
  • Sussman, D. L., Tang, M., Fishkind, D. E. and Priebe, C. E. (2012). A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Amer. Statist. Assoc. 107 1119–1128.
  • Vogelstein, J. T., Conroy, J. M., Lyzinski, V., Podrazik, L. J., Kratzer, S. G., Harley, E. T., Fishkind, D. E., Vogelstein, R. J. and Priebe, C. E. (2015). Fast approximate quadratic programming for graph matching. PLOS One 10 e0121002.
  • Zaslavskiy, M., Bach, F. and Vert, J.-P. (2009). A path following algorithm for the graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell. 31 2227–2242.