The Annals of Statistics

Likelihood-based model selection for stochastic block models

Y. X. Rachel Wang and Peter J. Bickel

Full-text: Access denied (no subscription detected)

We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber. If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text

Abstract

The stochastic block model (SBM) provides a popular framework for modeling community structures in networks. However, more attention has been devoted to problems concerning estimating the latent node labels and the model parameters than the issue of choosing the number of blocks. We consider an approach based on the log likelihood ratio statistic and analyze its asymptotic properties under model misspecification. We show the limiting distribution of the statistic in the case of underfitting is normal and obtain its convergence rate in the case of overfitting. These conclusions remain valid when the average degree grows at a polylog rate. The results enable us to derive the correct order of the penalty term for model complexity and arrive at a likelihood-based model selection criterion that is asymptotically consistent. Our analysis can also be extended to a degree-corrected block model (DCSBM). In practice, the likelihood function can be estimated using more computationally efficient variational methods or consistent label estimation algorithms, allowing the criterion to be applied to large networks.

Article information

Source
Ann. Statist., Volume 45, Number 2 (2017), 500-528.

Dates
Received: October 2015
Revised: February 2016
First available in Project Euclid: 16 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1494921948

Digital Object Identifier
doi:10.1214/16-AOS1457

Mathematical Reviews number (MathSciNet)
MR3650391

Zentralblatt MATH identifier
1371.62017

Subjects
Primary: 62F05: Asymptotic properties of tests

Keywords
Stochastic block models model misspecification network communities likelihood ratio statistic

Citation

Wang, Y. X. Rachel; Bickel, Peter J. Likelihood-based model selection for stochastic block models. Ann. Statist. 45 (2017), no. 2, 500--528. doi:10.1214/16-AOS1457. https://projecteuclid.org/euclid.aos/1494921948


Export citation

References

  • Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery 36–43. ACM, New York.
  • Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.
  • Amini, A. A., Chen, A., Bickel, P. J. and Levina, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. 41 2097–2122.
  • Ball, B., Karrer, B. and Newman, M. E. J. (2011). An efficient and principled method for detecting communities in networks. Phys. Rev. E (3) 84 036103.
  • Bickel, P. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–73.
  • Bickel, P. J. and Sarkar, P. (2016). Hypothesis testing for automated community detection in networks. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 253–273.
  • Bickel, P., Choi, D., Chang, X. and Zhang, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist. 41 1922–1943.
  • Celisse, A., Daudin, J.-J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Stat. 6 1847–1899.
  • Chen, K. and Lei, J. (2014). Network cross-validation for determining the number of communities in network data. Preprint. Available at arXiv:1411.1715.
  • Choi, D. S., Wolfe, P. J. and Airoldi, E. M. (2012). Stochastic blockmodels with a growing number of classes. Biometrika 99 273–284.
  • Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Stat. Comput. 18 173–183.
  • Decelle, A., Krzakala, F., Moore, C. and Zdeborová, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E (3) 84 066106.
  • Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T. and Priebe, C. E. (2013). Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM J. Matrix Anal. Appl. 34 23–39.
  • Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109–137.
  • Joseph, A. and Yu, B. (2013). Impact of regularization on spectral clustering. Preprint. Available at arXiv:1312.1733.
  • Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107, 10.
  • Latouche, P., Birmelé, E. and Ambroise, C. (2012). Variational Bayesian inference and complexity control for stochastic block models. Stat. Model. 12 93–115.
  • Le, C. M. and Levina, E. (2015). Estimating the number of communities in networks by spectral methods. Preprint. Available at arXiv:1507.00827.
  • Lei, J. (2014). A goodness-of-fit test for stochastic block models. Preprint. Available at arXiv:1412.4857.
  • Leskovec, J. and Mcauley, J. J. (2012). Learning to discover social circles in ego networks. In Adv. Neural Inf. Process. Syst. 539–547. Lake Tahoe.
  • Newman, M. E. J. (2006a). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577–8582.
  • Newman, M. E. J. (2006b). Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E (3) 74 036104, 19.
  • Peixoto, T. P. (2013). Parsimonious module inference in large networks. Phys. Rev. Lett. 110 148701.
  • Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915.
  • Saldana, D. F., Yu, Y. and Feng, Y. (2014). How many communities are there? Preprint. Available at arXiv:1412.1684.
  • Wang, Y. X. R. and Bickel, P. J. (2016). Supplement to “Likelihood-based model selection for stochastic block models.” DOI:10.1214/16-AOS1457SUPP.
  • Wolfe, P. J. and Olhede, S. C. (2013). Nonparametric graphon estimation. Preprint. Available at arXiv:1309.5936.
  • Zhao, Y., Levina, E. and Zhu, J. (2011). Community extraction for social networks. Proc. Natl. Acad. Sci. USA 108 7321–7326.
  • Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40 2266–2292.

Supplemental materials

  • Supplement to “Likelihood-based model selection for stochastic block models”. A proof sketch of how the main results in the paper can be extended to the DCSBM described in Section 2.5 is provided in the supplement.