The Annals of Statistics

High-dimensional semiparametric Gaussian copula graphical models

Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman

Full-text: Open access

Abstract

We propose a semiparametric approach called the nonparanormal SKEPTIC for efficiently and robustly estimating high-dimensional undirected graphical models. To achieve modeling flexibility, we consider the nonparanormal graphical models proposed by Liu, Lafferty and Wasserman [J. Mach. Learn. Res. 10 (2009) 2295–2328]. To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman’s rho and Kendall’s tau. We prove that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergence for both graph recovery and parameter estimation. This result suggests that the nonparanormal graphical models can be used as a safe replacement for the popular Gaussian graphical models, even when the data are truly Gaussian. Besides theoretical analysis, we also conduct thorough numerical simulations to compare the graph recovery performance of different estimators under both ideal and noisy settings. The proposed methods are then applied to a large-scale genomic data set to illustrate their empirical usefulness. The R package huge implementing the proposed methods is available on the Comprehensive R Archive Network: http://cran.r-project.org/.
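
The rank-based step described in the abstract can be illustrated with a small sketch. It assumes the Kendall's tau variant of the estimator, in which each off-diagonal entry of the correlation matrix estimate is sin(π/2 · τ̂_jk); the function names (kendall_tau, skeptic_correlation, simulate) are hypothetical and chosen for illustration only, and this is not the code of the huge package. Ties are assumed absent, and the quadratic-time tau computation is used for clarity rather than speed.

```python
import math
import random


def sign(a):
    """Return -1, 0 or 1 according to the sign of a."""
    return (a > 0) - (a < 0)


def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant pairs) / total pairs.

    Plain O(n^2) version; assumes the samples contain no ties.
    """
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return 2.0 * s / (n * (n - 1))


def skeptic_correlation(columns):
    """Rank-based correlation matrix estimate.

    columns is a list of d equal-length samples, one per variable.
    Off-diagonal entries are sin(pi/2 * tau_jk); the diagonal is 1.
    """
    d = len(columns)
    S = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            tau = kendall_tau(columns[j], columns[k])
            S[j][k] = S[k][j] = math.sin(math.pi / 2.0 * tau)
    return S


def simulate(n, rho, seed=0):
    """Draw a bivariate Gaussian pair with correlation rho, then distort
    each margin with a strictly increasing map (illustrative choice)."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for a in z1]
    x1 = [math.exp(a) for a in z1]   # log-normal margin
    x2 = [b ** 3 for b in z2]        # cubed margin
    return (z1, z2), (x1, x2)


if __name__ == "__main__":
    gaussian, distorted = simulate(n=400, rho=0.6)
    print("estimate on Gaussian data: ", skeptic_correlation(list(gaussian))[0][1])
    print("estimate on distorted data:", skeptic_correlation(list(distorted))[0][1])
```

Because Kendall's tau depends only on ranks, the estimate is unchanged by the monotone distortions of the margins, which is the source of the robustness claimed above. In the full procedure, the resulting matrix would then be passed to a sparse precision matrix estimator such as the graphical lasso to recover the graph; that step is omitted here.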

Article information

Source
Ann. Statist. Volume 40, Number 4 (2012), 2293-2326.

Dates
First available in Project Euclid: 23 January 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1358951383

Digital Object Identifier
doi:10.1214/12-AOS1037

Mathematical Reviews number (MathSciNet)
MR3059084

Zentralblatt MATH identifier
1297.62073

Subjects
Primary: 62G05: Estimation
Secondary: 62G20: Asymptotic properties; 62F12: Asymptotic properties of estimators

Keywords
High-dimensional statistics; undirected graphical models; Gaussian copula; nonparanormal graphical models; robust statistics; minimax optimality; biological regulatory networks

Citation

Liu, Han; Han, Fang; Yuan, Ming; Lafferty, John; Wasserman, Larry. High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist. 40 (2012), no. 4, 2293–2326. doi:10.1214/12-AOS1037. https://projecteuclid.org/euclid.aos/1358951383.


References

  • Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9 485–516.
  • Cai, T., Liu, W. and Luo, X. (2011). A constrained $\ell_1$ minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594–607.
  • Christensen, D. (2005). Fast algorithms for the calculation of Kendall’s $\tau$. Comput. Statist. 20 51–62.
  • Dempster, A. P. (1972). Covariance selection. Biometrics 28 157–175.
  • Drton, M. and Perlman, M. D. (2007). Multiple testing and error control in Gaussian graphical model selection. Statist. Sci. 22 430–449.
  • Drton, M. and Perlman, M. D. (2008). A SINful approach to Gaussian graphical model selection. J. Statist. Plann. Inference 138 1179–1200.
  • Friedman, J. H., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
  • Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19 293–325.
  • Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30.
  • James, G. M., Radchenko, P. and Lv, J. (2009). DASSO: Connections between the Dantzig selector and lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 127–142.
  • Kendall, M. (1948). Rank Correlation Methods. Griffin, London.
  • Klaassen, C. A. J. and Wellner, J. A. (1997). Efficient estimation in the bivariate normal copula model: Normal margins are least favourable. Bernoulli 3 55–77.
  • Kruskal, W. H. (1958). Ordinal measures of association. J. Amer. Statist. Assoc. 53 814–861.
  • Lafferty, J., Liu, H. and Wasserman, L. (2012). Sparse nonparametric graphical models. Statist. Sci. To appear.
  • Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
  • Leek, J. T. and Storey, J. D. (2007). Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3 e161.
  • Liu, H., Lafferty, J. and Wasserman, L. (2008). Nonparametric regression and classification with joint sparsity constraints. In Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS) 969–976. Curran Associates, Inc., Red Hook, NY.
  • Liu, H., Lafferty, J. and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10 2295–2328.
  • Liu, H., Roeder, K. and Wasserman, L. (2010). Stability approach to regularization selection (StARS) for high dimensional graphical models. In Proceedings of the Twenty-Third Annual Conference on Neural Information Processing Systems (NIPS). Curran Associates, Inc., Red Hook, NY.
  • Liu, H. and Zhang, J. (2009). Estimation consistency of the group lasso and its applications. J. Mach. Learn. Res. Proceedings Track 5 376–383.
  • Liu, H., Chen, X., Lafferty, J. and Wasserman, L. (2010). Graph-valued regression. In Proceedings of the Twenty-Third Annual Conference on Neural Information Processing Systems (NIPS). Curran Associates, Inc., Red Hook, NY.
  • Liu, H., Xu, M., Gu, H., Gupta, A., Lafferty, J. and Wasserman, L. (2011). Forest density estimation. J. Mach. Learn. Res. 12 907–951.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Pickands, J. III (1969). An iterated logarithm law for the maximum in a stationary Gaussian sequence. Z. Wahrsch. Verw. Gebiete 12 344–353.
  • Ravikumar, P., Wainwright, M., Raskutti, G. and Yu, B. (2009). Model selection in Gaussian graphical models: High-dimensional consistency of $\ell_1$-regularized MLE. In Advances in Neural Information Processing Systems 22. MIT Press, Cambridge, MA.
  • Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 1009–1030.
  • Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
  • Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.
  • Tsukahara, H. (2005). Semiparametric estimation in copula models. Canad. J. Statist. 33 357–375.
  • Xue, L. and Zou, H. (2012). Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann. Statist. To appear.
  • Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 11 2261–2286.
  • Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.
  • Zimmerman, D. W., Zumbo, B. D. and Williams, R. H. (2003). Bias in estimation and hypothesis testing of correlation. Psicológica 24 133–158.