Electronic Journal of Statistics

Berry-Esseen bounds for estimating undirected graphs

Larry Wasserman, Mladen Kolar, and Alessandro Rinaldo

Full-text: Open access

Abstract

We consider the problem of providing nonparametric confidence guarantees, with finite-sample Berry-Esseen bounds, for undirected graphs under weak assumptions. We do not assume sparsity or incoherence. We allow the dimension $D$ to increase with the sample size $n$. First, we prove lower bounds that show that if we want accurate inferences with weak assumptions, then $D$ must be less than $n$. In that case, we show that methods based on Normal approximations and on the bootstrap lead to valid inferences, and we provide new Berry-Esseen bounds on the accuracy of the Normal approximation and the bootstrap. When the dimension is large relative to the sample size, accurate inferences for graphs under weak assumptions are not possible. Instead, we propose to estimate something less demanding than the entire partial correlation graph. In particular, we consider cluster graphs, restricted partial correlation graphs, and correlation graphs.
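To make the bootstrap idea in the abstract concrete, the following is a minimal sketch, not the authors' procedure. It assumes $D < n$ so that the sample covariance is invertible, estimates partial correlations from the inverse sample covariance, and uses the bootstrap distribution of the maximum deviation to form a simultaneous confidence band; an edge is drawn when the band for a pair excludes zero. The function names `partial_correlations` and `bootstrap_graph`, and the parameters `alpha`, `n_boot`, and `seed`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def partial_correlations(X):
    """Partial correlation matrix from the inverse sample covariance.

    Assumes the number of columns D is smaller than the number of rows n,
    so that the sample covariance matrix is invertible.
    """
    omega = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(omega))
    theta = -omega / np.outer(d, d)  # theta_jk = -omega_jk / sqrt(omega_jj * omega_kk)
    np.fill_diagonal(theta, 1.0)
    return theta


def bootstrap_graph(X, alpha=0.1, n_boot=500, seed=0):
    """Illustrative simultaneous bootstrap band for all partial correlations.

    Returns the estimate, the common half-width of the band, and a boolean
    adjacency matrix marking pairs whose band excludes zero.
    """
    rng = np.random.default_rng(seed)
    n, D = X.shape
    theta_hat = partial_correlations(X)
    upper = np.triu_indices(D, k=1)  # each unordered pair once

    max_dev = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        theta_b = partial_correlations(X[idx])
        max_dev[b] = np.max(np.abs(theta_b[upper] - theta_hat[upper]))

    # (1 - alpha) quantile of the maximum deviation gives the band half-width
    half_width = np.quantile(max_dev, 1 - alpha)
    edges = np.abs(theta_hat) > half_width
    np.fill_diagonal(edges, False)
    return theta_hat, half_width, edges


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): X1 depends on X0, X2 is independent.
    rng = np.random.default_rng(1)
    x0 = rng.normal(size=500)
    X = np.column_stack([x0, x0 + rng.normal(size=500), rng.normal(size=500)])
    theta_hat, half_width, edges = bootstrap_graph(X)
    print("band half-width:", half_width)
    print(edges.astype(int))
```

Using the maximum deviation over all pairs gives one common band width, which is what makes the guarantee simultaneous over the whole graph rather than edge by edge. The sketch breaks down when $D$ is close to or larger than $n$, consistent with the abstract's point that accurate inference is then not possible without stronger assumptions.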

Article information

Source
Electron. J. Statist., Volume 8, Number 1 (2014), 1188-1224.

Dates
First available in Project Euclid: 12 August 2014

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1407848859

Digital Object Identifier
doi:10.1214/14-EJS928

Mathematical Reviews number (MathSciNet)
MR3263117

Zentralblatt MATH identifier
1298.62089

Subjects
Primary: 62H12: Estimation
Secondary: 62H10: Distribution of statistics

Keywords
Graphical models; high dimensional inference

Citation

Wasserman, Larry; Kolar, Mladen; Rinaldo, Alessandro. Berry-Esseen bounds for estimating undirected graphs. Electron. J. Statist. 8 (2014), no. 1, 1188–1224. doi:10.1214/14-EJS928. https://projecteuclid.org/euclid.ejs/1407848859



References

  • Bergsma, W. (2011). A note on the distribution of the partial correlation coefficient with nonparametrically estimated marginal regressions. arXiv:1101.4616.
  • Boik, R. and Haaland, B. (2006). Second-order accurate inference on simple, partial, and multiple correlations. Journal of Modern Applied Statistical Methods 5 283–308.
  • Castelo, R. and Roverato, A. (2006). A robust procedure for Gaussian graphical model search from microarray data with $p$ larger than $n$. The Journal of Machine Learning Research 7 2621–2650.
  • Chen, L. H. and Shao, Q.-M. (2007). Normal approximation for nonlinear statistics using a concentration inequality approach. Bernoulli 581–599.
  • Chernozhukov, V., Chetverikov, D. and Kato, K. (2012). Central limit theorem and multiplier bootstrap when $p$ is much larger than $n$. arXiv:1212.6906.
  • Chernozhukov, V., Chetverikov, D. and Kato, K. (2013). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. arXiv:1301.4807.
  • Devroye, L. and Lugosi, G. (2001). Combinatorial Methods in Density Estimation. Springer Verlag.
  • Drton, M. and Perlman, M. D. (2004). Model selection for Gaussian concentration graphs. Biometrika 91 591–602.
  • Friedman, J. and Tibshirani, R. (2007). Graphical lasso.
  • Harris, N. and Drton, M. (2012). PC algorithm for Gaussian copula graphical models. arXiv:1207.0242.
  • Horn, R. A. and Johnson, C. R. (1990). Matrix Analysis. Cambridge University Press.
  • Jalali, A., Johnson, C. C. and Ravikumar, P. D. (2011). On Learning Discrete Graphical Models using Greedy Methods. In NIPS 1935–1943.
  • Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88 365–411.
  • Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control. arXiv:1306.0976.
  • Magnus, J. R. and Neudecker, H. (1988). Matrix differential calculus. New York.
  • Mammen, E. (1993). Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics 255–285.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34 1436–1462.
  • Pinelis, I. and Molzon, R. (2013). Berry-Esseen bounds for general nonlinear statistics, with applications to Pearson’s and non-central Student’s and Hotelling’s. arXiv:0906.0177.
  • Portnoy, S. (1988). Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. The Annals of Statistics 16 356–366.
  • Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_{1}$-penalized log-determinant divergence. Electronic Journal of Statistics 5 935–980.
  • Ren, Z., Sun, T., Zhang, C.-H. and Zhou, H. (2013). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Manuscript.
  • Schäfer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology 4 32.
  • Vershynin, R. (2010). Introduction to the non-asymptotic analysis of random matrices. arXiv:1011.3027.
  • Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.