## The Annals of Statistics

### Combinatorial inference for graphical models

#### Abstract

We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference, where the main interest is point estimation or parameter testing, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of a certain size, or the maximum degree of the graph. We first study the information-theoretic limits of a large family of combinatorial inference problems. We propose new concepts, including structural packing and buffer entropies, to characterize how the complexity of combinatorial graph structures impacts the corresponding minimax lower bounds. We then propose a family of novel and practical structural testing algorithms that match the lower bounds. We provide numerical results on both synthetic graphical models and brain networks to illustrate the usefulness of the proposed methods.

#### Article information

Source
Ann. Statist., Volume 47, Number 2 (2019), 795–827.

Dates
Revised: August 2017
First available in Project Euclid: 11 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1547197239

Digital Object Identifier
doi:10.1214/17-AOS1650

Mathematical Reviews number (MathSciNet)
MR3909951

Zentralblatt MATH identifier
07033152

Subjects
Primary: 62F03: Hypothesis testing; 62H15: Hypothesis testing

#### Citation

Neykov, Matey; Lu, Junwei; Liu, Han. Combinatorial inference for graphical models. Ann. Statist. 47 (2019), no. 2, 795--827. doi:10.1214/17-AOS1650. https://projecteuclid.org/euclid.aos/1547197239

#### References

• Addario-Berry, L., Broutin, N., Devroye, L. and Lugosi, G. (2010). On combinatorial testing problems. Ann. Statist. 38 3063–3092.
• Arias-Castro, E., Bubeck, S. and Lugosi, G. (2012). Detection of correlations. Ann. Statist. 40 412–435.
• Arias-Castro, E., Bubeck, S. and Lugosi, G. (2015). Detecting positive correlations in a multivariate sample. Bernoulli 21 209–241.
• Arias-Castro, E., Candès, E. J. and Durand, A. (2011). Detection of an anomalous cluster in a network. Ann. Statist. 39 278–304.
• Arias-Castro, E., Candès, E. J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533–2556.
• Arias-Castro, E., Bubeck, S., Lugosi, G. and Verzelen, N. (2015). Detecting Markov random fields hidden in white noise. Preprint. Available at arXiv:1504.06984.
• Baraud, Y. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8 577–606.
• Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
• Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
• Cai, T., Liu, W. and Luo, X. (2011). A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594–607.
• Chen, M., Ren, Z., Zhao, H. and Zhou, H. (2016). Asymptotically normal and efficient estimation of covariate-adjusted Gaussian graphical model. J. Amer. Statist. Assoc. 111 394–406.
• Chernozhukov, V., Chetverikov, D. and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist. 41 2786–2819.
• Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
• Dubhashi, D. and Ranjan, D. (1998). Balls and bins: A study in negative dependence. Random Structures Algorithms 13 99–124.
• Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
• Gu, Q., Cao, Y., Ning, Y. and Liu, H. (2015). Local and global inference for high dimensional Gaussian copula graphical models. Preprint. Available at arXiv:1502.02347.
• Ingster, Y. I. (1982). Minimax nonparametric detection of signals in white Gaussian noise. Problemy Peredachi Informatsii 18 61–73.
• Ingster, Y. I., Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476–1526.
• Janková, J. and van de Geer, S. (2015). Confidence intervals for high-dimensional inverse covariance estimation. Electron. J. Stat. 9 1205–1229.
• Joag-Dev, K. and Proschan, F. (1983). Negative association of random variables, with applications. Ann. Statist. 11 286–295.
• Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
• Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
• Le Cam, L. (1973). Convergence of estimates under dimensionality restrictions. Ann. Statist. 1 38–53.
• Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control. Ann. Statist. 41 2948–2978.
• Liu, H., Lafferty, J. and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10 2295–2328.
• Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• Neykov, M., Lu, J. and Liu, H. (2018a). Supplement to “Combinatorial inference for graphical models.” DOI:10.1214/17-AOS1650SUPP.
• Neykov, M., Ning, Y., Liu, J. S. and Liu, H. (2018b). A unified theory of confidence regions and testing for high dimensional estimating equations. Statist. Sci. 33 427–443.
• Raskutti, G., Yu, B., Wainwright, M. J. and Ravikumar, P. K. (2008). Model selection in Gaussian graphical models: High-dimensional consistency of $\ell_{1}$-regularized MLE. In Advances in Neural Information Processing Systems 1329–1336.
• Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_{1}$-penalized log-determinant divergence. Electron. J. Stat. 5 935–980.
• Ren, Z., Sun, T., Zhang, C.-H. and Zhou, H. H. (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann. Statist. 43 991–1026.
• Romano, J. P. and Wolf, M. (2005). Exact and approximate stepdown methods for multiple hypothesis testing. J. Amer. Statist. Assoc. 100 94–108.
• Santhanam, N. P. and Wainwright, M. J. (2012). Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Trans. Inform. Theory 58 4117–4134.
• Verzelen, N. and Villers, F. (2010). Goodness-of-fit tests for high-dimensional Gaussian linear models. Ann. Statist. 38 704–752.
• Wang, W., Wainwright, M. J. and Ramchandran, K. (2010). Information-theoretic bounds on model selection for Gaussian Markov random fields. In ISIT 1373–1377.
• Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564–1599.
• Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
• Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217–242.

#### Supplemental materials

• Supplement to “Combinatorial inference for graphical models”. The Supplementary Material contains proofs and derivations of some of the main results of the paper, as well as simulation results and real data analysis.