The Annals of Statistics

Co-clustering of nonsmooth graphons

David Choi

Abstract

Performance bounds are given for exploratory co-clustering/blockmodeling of bipartite graph data, where we assume the rows and columns of the data matrix are samples from an arbitrary population. This is equivalent to assuming that the data is generated from a nonsmooth graphon. It is shown that co-clusters found by any method can be extended to the row and column populations, or equivalently that the estimated blockmodel approximates a blocked version of the generative graphon, with estimation error bounded by $O_{P}(n^{-1/2})$. Analogous performance bounds are also given for degree-corrected blockmodels and random dot product graphs, with error rates depending on the dimensionality of the latent variable space.
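The objects named in the abstract can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not the paper's estimator, and it does not reproduce its bounds. It makes three choices of its own:

- It samples a bipartite 0/1 matrix from an assumed nonsmooth graphon (the hypothetical `step_graphon`, which jumps along x + y = 1).
- It forms the estimated blockmodel as the block averages of the data matrix under a given row and column co-clustering.
- It compares that blockmodel with the blocked graphon, meaning the average of the graphon over each product cell.

For simplicity the co-clustering is an oracle: the hypothetical `bin_labels` uses equal-width bins of the latent positions. This sidesteps the harder part of the stated result, which covers co-clusters found by any method from the data alone and extended to the row and column populations. All function names are illustrative.

```python
import random


def step_graphon(x, y):
    """Hypothetical nonsmooth graphon: jumps from 0.2 to 0.8 across the line x + y = 1."""
    return 0.8 if x + y > 1.0 else 0.2


def sample_bipartite(n_rows, n_cols, w, rng):
    """Draw latent positions uniformly on [0, 1) and set A[i][j] ~ Bernoulli(w(x_i, y_j))."""
    xs = [rng.random() for _ in range(n_rows)]
    ys = [rng.random() for _ in range(n_cols)]
    a = [[1 if rng.random() < w(x, y) else 0 for y in ys] for x in xs]
    return xs, ys, a


def bin_labels(positions, k):
    """Hypothetical oracle co-clustering: equal-width bins of the latent positions."""
    return [min(int(p * k), k - 1) for p in positions]


def block_means(a, row_labels, col_labels, k_rows, k_cols):
    """Estimated blockmodel: mean of A over each (row cluster, column cluster) block."""
    sums = [[0.0] * k_cols for _ in range(k_rows)]
    counts = [[0] * k_cols for _ in range(k_rows)]
    for row, r in zip(a, row_labels):
        for value, c in zip(row, col_labels):
            sums[r][c] += value
            counts[r][c] += 1
    # Empty blocks get NaN rather than a made-up estimate.
    return [[s / n if n else float("nan") for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(sums, counts)]


def blocked_graphon(w, k_rows, k_cols, grid=200):
    """Average of w over each cell [p/k_rows, (p+1)/k_rows) x [q/k_cols, (q+1)/k_cols).

    Uses a grid x grid midpoint rule, so values are approximate near jumps of w.
    """
    out = []
    for p in range(k_rows):
        row = []
        for q in range(k_cols):
            total = 0.0
            for i in range(grid):
                x = (p + (i + 0.5) / grid) / k_rows
                for j in range(grid):
                    y = (q + (j + 0.5) / grid) / k_cols
                    total += w(x, y)
            row.append(total / grid ** 2)
        out.append(row)
    return out


rng = random.Random(0)  # fixed seed so the run is reproducible
xs, ys, a = sample_bipartite(400, 400, step_graphon, rng)
k = 2
est = block_means(a, bin_labels(xs, k), bin_labels(ys, k), k, k)
target = blocked_graphon(step_graphon, k, k)
gap = max(abs(e - t) for e_row, t_row in zip(est, target) for e, t in zip(e_row, t_row))
print("estimated blockmodel:", [[round(v, 3) for v in r] for r in est])
print("blocked graphon:     ", [[round(v, 3) for v in r] for r in target])
print("max abs gap:", round(gap, 3))
```

For this graphon with two bins per side, the exact blocked graphon is [[0.2, 0.5], [0.5, 0.8]], and the midpoint rule reproduces it up to a small discretization error. Rerunning with larger `n_rows` and `n_cols` should tend to shrink the gap. That trend is in line with, though not a demonstration of, the $O_{P}(n^{-1/2})$ rate stated above.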

Article information

Source
Ann. Statist., Volume 45, Number 4 (2017), 1488–1515.

Dates
Revised: March 2016
First available in Project Euclid: 28 June 2017

Permanent link: https://projecteuclid.org/euclid.aos/1498636864

Digital Object Identifier
doi:10.1214/16-AOS1497

Mathematical Reviews number (MathSciNet)
MR3670186

Zentralblatt MATH identifier
06773281

Citation

Choi, David. Co-clustering of nonsmooth graphons. Ann. Statist. 45 (2017), no. 4, 1488–1515. doi:10.1214/16-AOS1497. https://projecteuclid.org/euclid.aos/1498636864


Supplemental materials

• Supplement to “Co-clustering of nonsmooth graphons” (DOI:10.1214/16-AOS1497SUPP). The supplementary material contains the proofs of Lemma 7 and Theorem 2.