## The Annals of Statistics

### Consistency of spectral hypergraph partitioning under planted partition model

#### Abstract

Hypergraph partitioning lies at the heart of a number of problems in machine learning and network sciences. Many algorithms for hypergraph partitioning have been proposed that extend standard approaches for graph partitioning to the case of hypergraphs. However, theoretical aspects of such methods have seldom received attention in the literature as compared to the extensive studies on the guarantees of graph partitioning. For instance, consistency results of spectral graph partitioning under the stochastic block model are well known. In this paper, we present a planted partition model for sparse random nonuniform hypergraphs that generalizes the stochastic block model. We derive an error bound for a spectral hypergraph partitioning algorithm under this model using matrix concentration inequalities. To the best of our knowledge, this is the first consistency result related to partitioning nonuniform hypergraphs.

#### Article information

Source
Ann. Statist., Volume 45, Number 1 (2017), 289-315.

Dates
Revised: February 2016
First available in Project Euclid: 21 February 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1487667624

Digital Object Identifier
doi:10.1214/16-AOS1453

Mathematical Reviews number (MathSciNet)
MR3611493

Zentralblatt MATH identifier
1360.62330

#### Citation

Ghoshdastidar, Debarghya; Dukkipati, Ambedkar. Consistency of spectral hypergraph partitioning under planted partition model. Ann. Statist. 45 (2017), no. 1, 289–315. doi:10.1214/16-AOS1453. https://projecteuclid.org/euclid.aos/1487667624

#### Supplemental materials

• Supplement to “Consistency of spectral hypergraph partitioning under planted partition model”. The supplementary material contains detailed proofs of all the lemmas and corollaries stated in Sections 4 and 5.