## The Annals of Statistics

### Detection thresholds for the $\beta$-model on sparse graphs

#### Abstract

In this paper, we study sharp thresholds for detecting sparse signals in $\beta$-models for potentially sparse random graphs. The results demonstrate interesting interplay between graph sparsity, signal sparsity and signal strength. In regimes of moderately dense signals, irrespective of graph sparsity, the detection thresholds mirror corresponding results in independent Gaussian sequence problems. For sparser signals, extreme graph sparsity implies that all tests are asymptotically powerless, irrespective of the signal strength. On the other hand, sharp detection thresholds are obtained, up to matching constants, on denser graphs. The phase transitions mentioned above are sharp. As a crucial ingredient, we study a version of the higher criticism test which is provably sharp up to optimal constants in the regime of sparse signals. The theoretical results are further verified by numerical simulations.
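As a rough illustration of the detection problem described above, the following sketch simulates a graph and computes a higher-criticism-style statistic from its degrees. It rests on several assumptions that the abstract does not state: the edge probability is taken to be $(\lambda/n)\,e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$, the null hypothesis is $\beta = 0$, and degrees are standardized with a normal approximation that is only reasonable on dense graphs. The function names and parameter values are hypothetical, and the statistic analyzed in the paper may differ in its details.

```python
import math
import numpy as np

def sample_beta_model(beta, lam, rng):
    """Sample a symmetric 0/1 adjacency matrix.

    Assumed parameterization (not taken from the abstract):
    P(edge i~j) = (lam / n) * sigmoid(beta_i + beta_j) for i < j.
    """
    beta = np.asarray(beta, dtype=float)
    n = len(beta)
    s = beta[:, None] + beta[None, :]
    p = (lam / n) / (1.0 + np.exp(-s))
    # Draw each pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

def degree_z_scores(adj, lam):
    """Standardize degrees under the null beta = 0.

    Under the assumed null, each degree is Binomial(n - 1, lam / (2n)).
    """
    n = adj.shape[0]
    p0 = lam / (2.0 * n)
    deg = adj.sum(axis=1)
    return (deg - (n - 1) * p0) / np.sqrt((n - 1) * p0 * (1.0 - p0))

def higher_criticism(z, thresholds):
    """Largest standardized excess of z-scores above a threshold.

    Uses the standard normal upper tail as the null exceedance
    probability, which is an approximation for binomial degrees.
    """
    n = len(z)
    best = -np.inf
    for t in thresholds:
        q = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(N(0,1) > t)
        count = np.sum(z > t)
        stat = (count - n * q) / math.sqrt(n * q * (1.0 - q))
        best = max(best, stat)
    return float(best)

rng = np.random.default_rng(0)
n, lam = 400, 400.0              # lam = n gives a dense graph (illustrative choice)
thresholds = np.linspace(1.0, 3.0, 21)

beta_null = np.zeros(n)
beta_alt = np.zeros(n)
beta_alt[:40] = 1.0              # 40 signal vertices of strength 1 (illustrative)

adj_null = sample_beta_model(beta_null, lam, rng)
adj_alt = sample_beta_model(beta_alt, lam, rng)

hc_null = higher_criticism(degree_z_scores(adj_null, lam), thresholds)
hc_alt = higher_criticism(degree_z_scores(adj_alt, lam), thresholds)
print(f"HC under null: {hc_null:.2f}, HC under alternative: {hc_alt:.2f}")
```

With a signal this strong the statistic under the alternative should be far larger than under the null. On sparse graphs (small `lam`) the normal approximation breaks down, and exact binomial tail probabilities would be the more natural choice.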

#### Article information

Source
Ann. Statist., Volume 46, Number 3 (2018), 1288-1317.

Dates
Revised: May 2017
First available in Project Euclid: 3 May 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1525313083

Digital Object Identifier
doi:10.1214/17-AOS1585

#### Citation

Mukherjee, Rajarshi; Mukherjee, Sumit; Sen, Subhabrata. Detection thresholds for the $\beta$-model on sparse graphs. Ann. Statist. 46 (2018), no. 3, 1288–1317. doi:10.1214/17-AOS1585. https://projecteuclid.org/euclid.aos/1525313083

#### Supplemental materials

• Supplement to “Detection thresholds for the $\beta$-model on sparse graphs”. The supplementary material contains the proofs of additional technical results.