Statistics Surveys

Basic models and questions in statistical network analysis

Miklós Z. Rácz and Sébastien Bubeck

Full-text: Open access

Abstract

Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as Pólya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more.

Article information

Source
Statist. Surv. Volume 11 (2017), 1-47.

Dates
Received: September 2016
First available in Project Euclid: 8 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.ssu/1504836152

Digital Object Identifier
doi:10.1214/17-SS117

Subjects
Primary: 62-02: Research exposition (monographs, survey articles); 05C80: Random graphs [See also 60B20]; 60C05: Combinatorial probability

Keywords
Networks; statistical inference; random graphs; random trees; community detection; stochastic block model; random geometric graphs; evolving random graphs; preferential attachment

Rights
Creative Commons Attribution 4.0 International License.

Citation

Rácz, Miklós Z.; Bubeck, Sébastien. Basic models and questions in statistical network analysis. Statist. Surv. 11 (2017), 1–47. doi:10.1214/17-SS117. https://projecteuclid.org/euclid.ssu/1504836152

