Electronic Journal of Statistics

Model selection in overlapping stochastic block models

Pierre Latouche, Etienne Birmelé, and Christophe Ambroise

Full-text: Open access

Abstract

Networks are a commonly used mathematical model to describe the rich set of interactions between objects of interest. Many clustering methods have been developed in order to partition such structures, among which several rely on underlying probabilistic models, typically mixture models. The relevant hidden structure may however show overlapping groups in several applications. The Overlapping Stochastic Block Model (Latouche, Birmelé and Ambroise (2011)) has been developed to take this phenomenon into account. Nevertheless, the problem of the choice of the number of classes in the inference step is still open. To tackle this issue, we consider the proposed model in a Bayesian framework and develop a new criterion based on a non asymptotic approximation of the marginal log-likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm, and demonstrate its efficiency by running it on both simulated and real data.

Article information

Source
Electron. J. Statist. Volume 8, Number 1 (2014), 762-794.

Dates
First available in Project Euclid: 9 June 2014

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1402320466

Digital Object Identifier
doi:10.1214/14-EJS903

Mathematical Reviews number (MathSciNet)
MR3217788

Zentralblatt MATH identifier
1349.62276

Keywords
Graph clustering random graph models overlapping stochastic block models model selection global and local variational techniques

Citation

Latouche, Pierre; Birmelé, Etienne; Ambroise, Christophe. Model selection in overlapping stochastic block models. Electron. J. Statist. 8 (2014), no. 1, 762--794. doi:10.1214/14-EJS903. https://projecteuclid.org/euclid.ejs/1402320466.


Export citation

References

  • Airoldi, E., Blei, D., Fienberg, S., and Xing, E., Mixed membership analysis of high-throughput interaction studies: relational data. ArXiv e-prints, 2007.
  • Airoldi, E., Blei, D., Xing, E., and Fienberg, S., Mixed membership stochastic block models for relational data with application to protein-protein interactions. In Proceedings of the International Biometrics Society Annual Meeting, 2006.
  • Airoldi, E.M., Blei, D.M., Fienberg, S.E., and Xing, E.P., Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014, 2008.
  • Albert, R. and Barabási, A.L., Statistical mechanics of complex networks. Modern Physics, 74:47–97, 2002.
  • Ball, B., Karrer, B., and Newman, M.E.J., An efficient and principled method for detecting communities in networks. Phys. Rev. E, 84(036103), 2011.
  • Barabási, A.L. and Oltvai, Z.N., Network biology: understanding the cell’s functional organization. Nature Rev. Genet., 5:101–113, 2004.
  • Beal, M.J. and Ghahramani, Z., The variational Bayesian em algorithm for incomplete data: with application to scoring graphical model structures. In J.M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A.F.M. Smith, and M. West, editors, Bayesian Statistics 7: Proceedings of the 7th Valencia International Meeting, page 453, 2002.
  • Bickel, P.J. and Chen, A., A non parametric view of network models and newman-girvan and other modularities. In Proceedings of the National Academy of Sciences, volume 106, pages 21068–21073, 2009.
  • Biernacki, C., Celeux, G., and Govaert, G., Exact and monte carlo calculations of integrated likelihoods for the latent class model. Journal of Statistical Planning and Inference, 140:2991–3002, 2010.
  • Bishop, C.M., Pattern Recognition and Machine Learning. Springer-Verlag, 2006.
  • Bishop, C.M. and Svensén, M., Bayesian hierarchical mixtures of experts. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, pages 57–64. U. Kjaerulff and C. Meek, 2003.
  • Blei, D., Ng, A.Y., and Jordan, M.I., Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
  • Boer, P., Huisman, M., Snijders, T.A.B., Steglich, C.E.G., Wichers, L.H.Y., and Zeggelink, E.P.H., StOCNET: An Open Software System for the Advanced Statistical Analysis of Social Networks, 2006.
  • Daudin, J., Picard, F., and Robin, S., A mixture model for random graphs. Statistics and Computing, 18:1–36, 2008.
  • Dempster, A.P., Laird, N.M., and Rubin, D.B., Maximum likelihood for incomplete data via the em algorithm. Journal of the Royal Statistical Society, B39:1–38, 1977.
  • Estrada, E. and Rodriguez-Velazquez, J.A., Spectral measures of bipartivity in complex networks. Physical Review E, 72:046105, 2005.
  • Fienberg, S.E. and Wasserman, S., Categorical data analysis of single sociometric relations. Sociological Methodology, 12:156–192, 1981.
  • Frank, O. and Harary, F., Cluster inference by using transitivity indices in empirical graphs. Journal of the American Statistical Association, 77:835–840, 1982.
  • Gazal, S., Daudin, J.-J., and Robin, S., Accuracy of variational estimates for random graph mixture models. Journal of Statistical Computation and Simulation, 2011.
  • Girvan, M. and Newman, M.E.J., Community structure in social and biological networks. In Proceedings of the National Academy of Sciences, volume 99, pages 7821–7826, 2002.
  • Griffiths, T. and Ghahramani, Z., Infinite latent feature models and the indian buffet process. In Neural Information Processing Systems, volume 18, pages 475–482, 2005.
  • Handcock, M.S., Raftery, A.E., and Tantrum, J.M., Model-based clustering for social networks. Journal of the Royal Statistical Society, 170:1–22, 2007.
  • Heller, K. and Ghahramani, Z., A nonparametric Bayesian approach to modeling overlapping clusters. In In Proceedings of The 11th International Conference On AI And Statistics, 2007.
  • Heller, K., Williamson, S., and Ghahramani, Z., Statistical models for partial membership. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 392–399, 2008.
  • Hofman, J.M. and Wiggins, C.H., A Bayesian approach to network modularity. Physical Review Letters, 100:258701, 2008.
  • Holland, P., Laskey, K.B., and Leinhardt, S., Stochastic blockmodels: some first steps. Social Networks, 5:109–137, 1983.
  • Jaakkola, T.S. and Jordan, M.I., Bayesian parameter estimation via variational methods. Statistics and Computing, 10:25–37, 2000.
  • Jeffery, C.J., Moonlighting proteins. Trends in Biochemical Sciences, 24:8–11, 1999.
  • Krivitsky, P.N. and Handcock, M.S., The latentnet package, 2009.
  • Latouche, P., Birmelé, E., and Ambroise, C., Bayesian methods for graph clustering, pages 229–239. Springer, 2009.
  • Latouche, P., Birmelé, E., and Ambroise, C., Overlapping stochastic block models with application to the french political blogosphere. Annals of Applied Statistics, 5(1):309–336, 2011.
  • Latouche, P., Birmelé, E., and Ambroise, C., Variational Bayes inference and complexity control for stochastic block models. Statistical Modelling, 12(1):93–115, 2012.
  • Mariadassou, M., Robin, S., and Vacher, C., Uncovering latent structure in valued graphs: a variational approach. Annals of Applied Statistics, 4(2), 2010.
  • McLachlan, G. and Krishnan, T., The EM algorithm and extensions. New York: John Wiley, 1997.
  • Newman, M.E.J., Modularity and community structure in networks. In aaa, volume 103, pages 8577–8582, 2006.
  • Nowicki, K. and Snijders, T.A.B., Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96:1077–1087, 2001.
  • Palla, G., Barabási, A.L., and Vicsek, T., Quantifying social group evolution. Nature, 446:664–667, 2007.
  • Palla, G., Derenyi, I., Farkas, I., and Vicsek, T., Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435:814–818, 2005.
  • Palla, G., Derenyi, I., Farkas, I., and Vicsek, T., CFinder, the community cluster finding program, 2006.
  • Snijders, T.A.B. and Nowicki, K., Estimation and prediction for stochastic block-structures for graphs with latent block structure. Journal of Classification, 14:75–100, 1997.
  • Wang, Y.J. and Wong, G.Y., Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82:8–19, 1987.
  • Yang, J. and Lescovec, J., Overlapping community detection at scale: a nonnegative matrix factorization approach. In ACM International Conference on Web Search and Data Mining (WSDM), 2013.
  • Zanghi, H., Ambroise, C., and Miele, V., Fast online graph clustering via erdös renyi mixture. Pattern Recognition, 41(12):3592–3599, 2008.