## Statistics Surveys

- Statist. Surv.
- Volume 11 (2017), 1-47.

### Basic models and questions in statistical network analysis

Miklós Z. Rácz and Sébastien Bubeck

**Full-text: Open access**

#### Abstract

Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as Pólya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more.
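The three canonical models named in the abstract can be made concrete with a short simulation. What follows is a hypothetical sketch in standard-library Python, not the authors' code, and every function name is our own. It rests on three assumptions:

- The community-detection step is a simple power-iteration spectral heuristic, not the threshold-achieving algorithms the survey discusses.
- The geometric graph's edge threshold is estimated empirically from the sampled points.
- The root estimator ranks vertices by ψ(u), the size of the largest component left after deleting u. This is one natural centrality score, chosen here only for illustration.

```python
import math
import random


def sample_sbm(n, p, q, rng):
    """Two-community SBM: labels are +1/-1 uniformly at random; each pair is
    joined with probability p inside a community and q across communities."""
    labels = [rng.choice((1, -1)) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < (p if labels[i] == labels[j] else q):
                adj[i][j] = adj[j][i] = 1
    return labels, adj


def spectral_bisection(adj, iters=50, seed=0):
    """Sign of the leading eigenvector of A - a*J (a = edge density,
    J = all-ones matrix), approximated by power iteration."""
    n = len(adj)
    a = sum(map(sum, adj)) / (n * n)
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        s = sum(x)  # (a*J) x equals a * sum(x) in every coordinate
        y = [sum(aij * xj for aij, xj in zip(row, x)) - a * s for row in adj]
        norm = math.sqrt(sum(v * v for v in y)) or 1.0
        x = [v / norm for v in y]
    return [1 if v >= 0 else -1 for v in x]


def overlap(truth, guess):
    """Fraction of correct labels, up to a global swap of the two labels."""
    agree = sum(t == g for t, g in zip(truth, guess)) / len(truth)
    return max(agree, 1 - agree)


def sample_rgg_sphere(n, d, p, rng):
    """Uniform points on the unit sphere in R^d; the round(p * C(n,2)) pairs
    with the largest inner products become edges (an empirical stand-in for
    the threshold t with P(<X_i, X_j> >= t) = p)."""
    pts = []
    for _ in range(n):
        g = [rng.gauss(0, 1) for _ in range(d)]
        r = math.sqrt(sum(v * v for v in g))
        pts.append([v / r for v in g])  # a normalized Gaussian is uniform on the sphere
    pairs = sorted(((sum(u * v for u, v in zip(pts[i], pts[j])), i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    adj = [[0] * n for _ in range(n)]
    for _, i, j in pairs[:round(p * len(pairs))]:
        adj[i][j] = adj[j][i] = 1
    return adj


def triangle_count(adj):
    """Number of triangles, checking each triple i < j < k once."""
    n = len(adj)
    return sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j]
               for k in range(j + 1, n) if adj[i][k] and adj[j][k])


def sample_pa_tree(n, rng):
    """Linear preferential attachment tree on n >= 2 vertices rooted at 0:
    each new vertex picks its parent with probability proportional to degree.
    Returns parent[v] (always < v) for v >= 1; parent[0] is None."""
    parent = [None, 0]
    ends = [0, 1]  # every vertex appears once per incident edge
    for v in range(2, n):
        u = rng.choice(ends)
        parent.append(u)
        ends += [u, v]
    return parent


def psi(parent):
    """psi(u): size of the largest component left after deleting vertex u."""
    n = len(parent)
    size, big_child = [1] * n, [0] * n
    for v in range(n - 1, 0, -1):  # children carry larger labels than parents
        size[parent[v]] += size[v]
        big_child[parent[v]] = max(big_child[parent[v]], size[v])
    return [max(n - size[u], big_child[u]) for u in range(n)]


def root_candidates(parent, k):
    """The k vertices with the smallest psi (ties broken by label)."""
    scores = psi(parent)
    return sorted(range(len(parent)), key=lambda u: (scores[u], u))[:k]
```

The following behavior is expected, not guaranteed:

- With `labels, adj = sample_sbm(200, 0.5, 0.05, random.Random(1))`, the value `overlap(labels, spectral_bisection(adj))` should be close to 1, because these parameters are far from the detection threshold.
- In low dimension, `triangle_count` of a geometric graph typically exceeds the Erdős–Rényi mean C(n,3)p³, and this gap fades as d grows.
- `root_candidates(sample_pa_tree(n, rng), k)` returns a size-k candidate set for the original vertex 0.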

#### Article information

**Source**

Statist. Surv. Volume 11 (2017), 1-47.

**Dates**

Received: September 2016

First available in Project Euclid: 8 September 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.ssu/1504836152

**Digital Object Identifier**

doi:10.1214/17-SS117

**Subjects**

Primary: 62-02: Research exposition (monographs, survey articles); 05C80: Random graphs [See also 60B20]; 60C05: Combinatorial probability

**Keywords**

Networks; statistical inference; random graphs; random trees; community detection; stochastic block model; random geometric graphs; evolving random graphs; preferential attachment

**Rights**

Creative Commons Attribution 4.0 International License.

#### Citation

Rácz, Miklós Z.; Bubeck, Sébastien. Basic models and questions in statistical network analysis. Statist. Surv. 11 (2017), 1-47. doi:10.1214/17-SS117. https://projecteuclid.org/euclid.ssu/1504836152


#### The American Statistical Association, the Bernoulli Society, the Institute of Mathematical Statistics, and the Statistical Society of Canada

### More like this

- Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters

  Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W., Internet Mathematics, 2009

- On a preferential attachment and generalized Pólya’s urn model

  Collevecchio, Andrea, Cotar, Codina, and LiCalzi, Marco, The Annals of Applied Probability, 2013

- A testing based extraction algorithm for identifying significant communities in networks

  Wilson, James D., Wang, Simi, Mucha, Peter J., Bhamidi, Shankar, and Nobel, Andrew B., The Annals of Applied Statistics, 2014

- Consistency of community detection in networks under degree-corrected stochastic block models

  Zhao, Yunpeng, Levina, Elizaveta, and Zhu, Ji, The Annals of Statistics, 2012

- Minimax rates of community detection in stochastic block models

  Zhang, Anderson Y. and Zhou, Harrison H., The Annals of Statistics, 2016

- Random walk attachment graphs

  Cannings, Chris and Jordan, Jonathan, Electronic Communications in Probability, 2013

- Bayesian degree-corrected stochastic blockmodels for community detection

  Peng, Lijun and Carvalho, Luis, Electronic Journal of Statistics, 2016

- On a memory game and preferential attachment graphs

  Acan, Hüseyin and Hitczenko, Paweł, Advances in Applied Probability, 2016

- Twitter event networks and the Superstar model

  Bhamidi, Shankar, Steele, J. Michael, and Zaman, Tauhid, The Annals of Applied Probability, 2015

- Coupling Online and Offline Analyses for Random Power Law Graphs

  Chung, Fan and Lu, Linyuan, Internet Mathematics, 2004