Electronic Journal of Statistics

Classification and estimation in the Stochastic Blockmodel based on the empirical degrees

Antoine Channarond, Jean-Jacques Daudin, and Stéphane Robin

Full-text: Open access

Abstract

The Stochastic Blockmodel [16] is a mixture model for heterogeneous network data. Unlike the usual statistical framework, new nodes give additional information about the previous ones in this model. Thereby the distribution of the degrees concentrates in points conditionally on the node class. We show under a mild assumption that classification, estimation and model selection can actually be achieved with no more than the empirical degree data. We provide an algorithm able to process very large networks and consistent estimators based on it. In particular, we prove a bound of the probability of misclassification of at least one node, including when the number of classes grows.

Article information

Source
Electron. J. Statist., Volume 6 (2012), 2574-2601.

Dates
First available in Project Euclid: 11 January 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1357913089

Digital Object Identifier
doi:10.1214/12-EJS753

Mathematical Reviews number (MathSciNet)
MR3020277

Zentralblatt MATH identifier
1295.62065

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20] 62H12: Estimation

Keywords
Stochastic Blockmodel unsupervised classification clustering estimation model selection

Citation

Channarond, Antoine; Daudin, Jean-Jacques; Robin, Stéphane. Classification and estimation in the Stochastic Blockmodel based on the empirical degrees. Electron. J. Statist. 6 (2012), 2574--2601. doi:10.1214/12-EJS753. https://projecteuclid.org/euclid.ejs/1357913089


Export citation

References

  • [1] R. Albert and A.L. Barabási. Statistical mechanics of complex networks., Reviews of modern physics, 74(1):47, 2002.
  • [2] C. Ambroise and C. Matias. New consistent and asymptotically normal parameter estimates for random-graph mixture models., Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2011.
  • [3] P.J. Bickel and A. Chen. A nonparametric view of network models and Newman–Girvan and other modularities., Proceedings of the National Academy of Sciences, 106 (50) :21068, 2009.
  • [4] P.J. Bickel, A. Chen, and E. Levina. The method of moments and degree distributions for network models., The Annals of Statistics, 39(5):38–59, 2011.
  • [5] B. Bollobás, S. Janson, and O. Riordan. The phase transition in inhomogeneous random graphs., Random Structures & Algorithms, 31(1): 3–122, 2007. ISSN 1098-2418.
  • [6] A. Celisse, J.-J. Daudin, and L. Pierre. Consistency of maximum-likelihood and variational estimators in the stochastic block model., Electron. J. Statist., 6 :1847–1899, 2012. ISSN 1935-7524. 10.1214/12-EJS729
  • [7] D.S. Choi, P.J. Wolfe, and E.M. Airoldi. Stochastic blockmodels with growing number of classes., Arxiv preprint arXiv :1011.4644, 2010.
  • [8] A. Condon and R.M. Karp. Algorithms for graph partitioning on the planted partition model., Random Structures and Algorithms, 18(2): 116–140, 2001.
  • [9] J.J. Daudin, F. Picard, and S. Robin. A mixture model for random graphs., Statistics and computing, 18(2):173–183, 2008.
  • [10] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications., Physical Review E, 84(6) :066106, 2011.
  • [11] P. Erdős and A. Rényi. On random graphs, I., Publicationes Mathematicae (Debrecen), 6:290–297, 1959.
  • [12] K. Faust and S. Wasserman., Social network analysis: Methods and applications. Cambridge University Press, 1994.
  • [13] S.E. Fienberg and S.S. Wasserman. Categorical data analysis of single sociometric relations., Sociological methodology, 12:156–192, 1981. ISSN 0081-1750.
  • [14] E.N. Gilbert. Random graphs., The Annals of Mathematical Statistics, 30 (4) :1141–1144, 1959. ISSN 0003-4851.
  • [15] M. Girvan and M.E.J. Newman. Community structure in social and biological networks., Proceedings of the National Academy of Sciences of the United States of America, 99(12) :7821, 2002.
  • [16] P.W. Holland, K.B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps., Social Networks, 5(2):109–137, 1983.
  • [17] P.W. Holland and S. Leinhardt. An exponential family of probability distributions for directed graphs., Journal of the American Statistical Association, 76 (373):33–50, 1981. ISSN 0162-1459.
  • [18] T.S. Jaakkola. Tutorial on variational approximation methods., Advanced mean field methods: theory and practice, pages 129–159, 2000.
  • [19] A. Lancichinetti, S. Fortunato, and J. Kertész. Detecting the overlapping and hierarchical community structure in complex networks., New Journal of Physics, 11 :033015, 2009.
  • [20] F. Lorrain and H.C. White. Structural equivalence of individuals in social networks., The Journal of mathematical sociology, 1 (1):49–80, 1971.
  • [21] M.E.J. Newman. Modularity and community structure in networks., Proceedings of the National Academy of Sciences, 103 (23) :8577, 2006.
  • [22] K. Nowicki and T.A.B. Snijders. Estimation and prediction for stochastic blockstructures., Journal of the American Statistical Association, 96 (455) :1077–1087, 2001.
  • [23] F. Picard, V. Miele, J.J. Daudin, L. Cottret, and S. Robin. Deciphering the connectivity structure of biological networks using MixNet., BMC bioinformatics, 10(Suppl 6):S17, 2009.
  • [24] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional Stochastic Block Model., Arxiv preprint arXiv :1007.1684, 2010.
  • [25] T.A.B. Snijders and K. Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure., Journal of Classification, 14(1):75–100, 1997.
  • [26] R. Van Der Hofstad. Random graphs and complex networks., Available on http://www.win.tue.nl/ rhofstad/NotesRGCN.pdf, 2009.
  • [27] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference., Foundations and Trends® in Machine Learning, 1(1-2):1–305, 2008.