The Annals of Applied Probability

Consistency of modularity clustering on random geometric graphs

Erik Davis and Sunder Sethuraman

Abstract

Given a graph, the popular “modularity” clustering method specifies a partition of the vertex set as the solution of a certain optimization problem. In this paper, we discuss scaling limits of this method with respect to random geometric graphs constructed from i.i.d. points $\mathcal{X}_{n}=\{X_{1},X_{2},\ldots,X_{n}\}$, distributed according to a probability measure $\nu$ supported on a bounded domain $D\subset\mathbb{R}^{d}$. Among other results, we show, via a $\Gamma$-convergence framework, a geometric form of consistency: when the number of clusters, or partitioning sets of $\mathcal{X}_{n}$, is a priori bounded above, the discrete optimal modularity clusterings converge in a specific sense to a continuum partition of the underlying domain $D$, characterized as the solution to a “soap bubble” or “Kelvin”-type shape optimization problem.
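The modularity objective and the graph construction are not spelled out on this page. As a hedged illustration only, the sketch below assumes the standard Newman–Girvan modularity, $Q = \sum_{c} \big( e_c/m - (\mathrm{vol}_c/2m)^2 \big)$, where $m$ is the number of edges, $e_c$ the number of edges inside cluster $c$ and $\mathrm{vol}_c$ the total degree of $c$. The paper may use a variant (for example with a resolution parameter or a different null model). It also assumes $\nu$ uniform on $D=[0,1]^2$ and a fixed connection radius `eps`. The function names `random_geometric_graph`, `modularity` and `best_partition` are hypothetical. The exhaustive search over labelings with at most `max_clusters` labels mirrors the "a priori bounded number of clusters" setting, but it is feasible only for tiny graphs and is not the paper's method.

```python
import itertools
import math
import random


def random_geometric_graph(n, eps, d=2, seed=0):
    """Sample n i.i.d. uniform points in [0,1]^d; join pairs at distance < eps."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    edges = [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if math.dist(points[i], points[j]) < eps
    ]
    return points, edges


def modularity(n, edges, labels):
    """Newman-Girvan modularity of a labeling (one label per vertex).

    Q = sum over clusters c of  e_c / m - (vol_c / 2m)^2,
    where m is the number of edges, e_c the number of edges inside c,
    and vol_c the sum of the degrees of the vertices in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = [0] * n
    inner = {}
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if labels[i] == labels[j]:
            inner[labels[i]] = inner.get(labels[i], 0) + 1
    vol = {}
    for v in range(n):
        vol[labels[v]] = vol.get(labels[v], 0) + degree[v]
    return sum(inner.get(c, 0) / m - (vol[c] / (2 * m)) ** 2 for c in vol)


def best_partition(n, edges, max_clusters):
    """Exhaustive search over all labelings using at most max_clusters labels.

    Cost is max_clusters ** n evaluations, so this is only for tiny graphs.
    """
    best_q, best_labels = -math.inf, None
    for labels in itertools.product(range(max_clusters), repeat=n):
        q = modularity(n, edges, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels


if __name__ == "__main__":
    points, edges = random_geometric_graph(n=12, eps=0.4, seed=1)
    q, labels = best_partition(len(points), edges, max_clusters=2)
    print(f"{len(edges)} edges; best modularity with <= 2 clusters: {q:.3f}")
    print("labels:", labels)
```

As a sanity check on the formula, two disjoint triangles split into their two components give $Q = 2(3/6 - (6/12)^2) = 0.5$, while putting all six vertices in one cluster gives $Q = 0$.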

Article information

Source
Ann. Appl. Probab., Volume 28, Number 4 (2018), 2003-2062.

Dates
Received: April 2016
Revised: February 2017
First available in Project Euclid: 9 August 2018

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1533780266

Digital Object Identifier
doi:10.1214/17-AAP1313

Mathematical Reviews number (MathSciNet)
MR3843822

Zentralblatt MATH identifier
06974744

Subjects
Primary: 60D05: Geometric probability and stochastic geometry [See also 52A22, 53C65]
Secondary:
62G20: Asymptotic properties
05C82: Small world graphs, complex networks [See also 90Bxx, 91D30]
49J55: Problems involving randomness [See also 93E20]
49J45: Methods involving semicontinuity and convergence; relaxation
68R10: Graph theory (including graph drawing) [See also 05Cxx, 90B10, 90B35, 90C35]

Keywords
Modularity, community detection, consistency, random geometric graph, Gamma convergence, Kelvin’s problem, scaling limit, shape optimization, optimal transport, total variation, perimeter

Citation

Davis, Erik; Sethuraman, Sunder. Consistency of modularity clustering on random geometric graphs. Ann. Appl. Probab. 28 (2018), no. 4, 2003–2062. doi:10.1214/17-AAP1313. https://projecteuclid.org/euclid.aoap/1533780266

