## The Annals of Statistics

### Exact recovery in the Ising blockmodel

#### Abstract

We consider the problem of recovering the block structure of an Ising model from independent observations on the binary hypercube. This new model, called the Ising blockmodel, is a perturbation of the Curie–Weiss model, the mean-field approximation of the Ising model: the sites are partitioned into two blocks of equal size, and the interaction between sites in the same block is stronger than the interaction across blocks, to account for more order within each block. We study probabilistic, statistical and computational aspects of this model in the high-dimensional case, where the number of sites may be much larger than the sample size.
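
For concreteness, a model of this kind can be written as follows. This is an illustrative sketch rather than a quotation of the paper's definition: the symbols $p$ (number of sites), $n$ (sample size), $S$ (one of the two blocks), $\alpha < \beta$ (across-block and within-block interaction strengths) and $Z$ (normalizing constant), as well as the $1/(2p)$ scaling, are assumptions introduced here.

$$
\mathbb{P}(\sigma) = \frac{1}{Z} \exp\Big( \frac{\beta}{2p} \sum_{i \sim j} \sigma_i \sigma_j + \frac{\alpha}{2p} \sum_{i \nsim j} \sigma_i \sigma_j \Big), \qquad \sigma \in \{-1, 1\}^p,
$$

where $i \sim j$ means that sites $i$ and $j$ lie in the same block ($S$ or its complement $S^c$, each of size $p/2$) and $i \nsim j$ means that they lie in different blocks. Under this parametrization, setting $\alpha = \beta$ gives the same interaction between every pair of sites, which is the Curie–Weiss model. Exact recovery then amounts to identifying the partition $\{S, S^c\}$ from $n$ independent draws of $\sigma$, with $p$ possibly much larger than $n$.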

#### Article information

Source
Ann. Statist., Volume 47, Number 4 (2019), 1805–1834.

Dates
Revised: July 2017
First available in Project Euclid: 21 May 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1558425631

Digital Object Identifier
doi:10.1214/17-AOS1620

Mathematical Reviews number (MathSciNet)
MR3953436

Zentralblatt MATH identifier
07082271

#### Citation

Berthet, Quentin; Rigollet, Philippe; Srivastava, Piyush. Exact recovery in the Ising blockmodel. Ann. Statist. 47 (2019), no. 4, 1805–1834. doi:10.1214/17-AOS1620. https://projecteuclid.org/euclid.aos/1558425631

#### Supplemental materials

• Supplement to “Exact recovery in the Ising blockmodel”. The Supplementary Material contains additional facts about the Curie–Weiss model in Appendix A and proofs of technical results in Appendix B.