Network sampling is an indispensable tool for understanding features of large complex networks where it is practically impossible to search over the entire graph. In this paper, we develop a framework for statistical inference for counting network motifs, such as edges, triangles and wedges, in the widely used subgraph sampling model, where each vertex is sampled independently, and the subgraph induced by the sampled vertices is observed. We derive necessary and sufficient conditions for the consistency and the asymptotic normality of the natural Horvitz–Thompson (HT) estimator, which can be used for constructing confidence intervals and hypothesis tests for the motif counts based on the sampled graph. In particular, we show that the asymptotic normality of the HT estimator exhibits an interesting fourth-moment phenomenon, which asserts that the HT estimator (appropriately centered and rescaled) converges in distribution to the standard normal whenever its fourth moment converges to 3 (the fourth moment of the standard normal distribution). As a consequence, we derive the exact thresholds for consistency and asymptotic normality of the HT estimator in various natural graph ensembles, such as sparse graphs with bounded degree, Erdős–Rényi random graphs, random regular graphs and dense graphons.
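To make the sampling model concrete, the following is a minimal sketch of the HT estimator for edge and triangle counts. It assumes every vertex is kept with the same probability p (the abstract states only that vertices are sampled independently), so a motif on s vertices survives with probability p^s and the observed count is rescaled by 1/p^s. The function names are illustrative and are not taken from the paper.

```python
import random


def sample_vertices(vertices, p, rng):
    """Keep each vertex independently with probability p (assumed common to all vertices)."""
    return {v for v in vertices if rng.random() < p}


def count_edges(edges, kept):
    """Number of edges with both endpoints in the sampled vertex set."""
    return sum(1 for u, v in edges if u in kept and v in kept)


def count_triangles(edges, kept):
    """Number of triangles in the subgraph induced by the sampled vertices.

    Assumes `edges` lists each undirected edge once and has no self-loops.
    """
    induced = [(u, v) for u, v in edges if u in kept and v in kept]
    adj = {}
    for u, v in induced:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in induced) // 3


def ht_estimate(sampled_count, p, motif_size):
    """Inverse-probability weighting: divide the observed count by p**motif_size."""
    return sampled_count / p ** motif_size


if __name__ == "__main__":
    # Toy graph: the complete graph on 4 vertices (6 edges, 4 triangles).
    vertices = range(4)
    edges = [(u, v) for u in vertices for v in vertices if u < v]
    rng = random.Random(0)
    kept = sample_vertices(vertices, 0.5, rng)
    print("edge estimate:", ht_estimate(count_edges(edges, kept), 0.5, 2))
    print("triangle estimate:", ht_estimate(count_triangles(edges, kept), 0.5, 3))
```

With p = 1 the sampled graph is the whole graph and the estimates equal the true counts. For p < 1 a single estimate can be far from the truth, but its average over the sampling randomness equals the true count.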
The first author was supported by NSF CAREER Grant DMS-2046393 and a Sloan Research Fellowship.
The third author was supported by NSF Grant DMS-1712037.
The authors thank Sohom Bhattacharya for pointing out , and Jason Klusowski for helpful discussions. The authors also thank the Associate Editor and the anonymous referees for their detailed and thoughtful comments, which greatly improved the quality and the presentation of the paper.
"Motif estimation via subgraph sampling: The fourth-moment phenomenon." Ann. Statist. 50 (2) 987–1011, April 2022. https://doi.org/10.1214/21-AOS2134