The Annals of Statistics

ROCKET: Robust confidence intervals via Kendall’s tau for transelliptical graphical models

Abstract

Understanding complex relationships between random variables is of fundamental importance in high-dimensional statistics, with numerous applications in biological and social sciences. Undirected graphical models are often used to represent dependencies between random variables, where an edge between two random variables is drawn if they are conditionally dependent given all the other measured variables. A large body of literature exists on methods that estimate the structure of an undirected graphical model; however, little is known about the distributional properties of the estimators beyond the Gaussian setting. In this paper, we focus on inference for edge parameters in a high-dimensional transelliptical model, which generalizes Gaussian and nonparanormal graphical models. We propose ROCKET, a novel procedure for estimating parameters in the latent inverse covariance matrix. We establish asymptotic normality of ROCKET in an ultra high-dimensional setting under mild assumptions, without relying on oracle model selection results. ROCKET requires the same number of samples that is known to be necessary for obtaining a $\sqrt{n}$-consistent estimator of an element in the precision matrix under a Gaussian model. Hence, it is an optimal estimator under a much larger family of distributions. The result hinges on a tight control of the sparse spectral norm of the nonparametric Kendall’s tau estimator of the correlation matrix, which is of independent interest. Empirically, ROCKET outperforms the nonparanormal and Gaussian models in terms of achieving accurate inference on simulated data. We also compare the three methods on real data (daily stock returns), and find that the ROCKET estimator is the only method whose behavior across subsamples agrees with the distribution predicted by the theory.
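
As a rough illustration of the Kendall’s tau estimator of the correlation matrix mentioned in the abstract, the sketch below computes pairwise Kendall’s tau and maps it to a latent correlation through the standard identity $\Sigma_{jk} = \sin(\frac{\pi}{2}\tau_{jk})$ for elliptical-type models. It covers only this first-stage correlation estimate, not the ROCKET procedure or its confidence intervals. The function names (`sign`, `kendall_tau`, `latent_correlation`) and the simulated data are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau: average sign agreement over all pairs of observations."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return 2.0 * total / (n * (n - 1))


def latent_correlation(rows):
    """Map pairwise Kendall's tau to a correlation matrix via sin(pi/2 * tau).

    rows: list of n observations, each a list of p values.
    Returns a p x p nested list with unit diagonal.
    """
    p = len(rows[0])
    cols = [[row[j] for row in rows] for j in range(p)]
    sigma = [[1.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):
            r = math.sin(0.5 * math.pi * kendall_tau(cols[a], cols[b]))
            sigma[a][b] = sigma[b][a] = r
    return sigma


if __name__ == "__main__":
    # Hypothetical data: a Gaussian pair with correlation 0.6, observed only
    # through monotone marginal transformations (exp and cube).
    random.seed(0)
    rows = []
    for _ in range(300):
        z1 = random.gauss(0, 1)
        z2 = 0.6 * z1 + 0.8 * random.gauss(0, 1)
        rows.append([math.exp(z1), z2 ** 3])
    print(latent_correlation(rows)[0][1])
```

Because Kendall’s tau depends only on the ordering of the observations, the estimate is unchanged by the monotone marginal transformations, which is why a rank-based estimator suits nonparanormal and transelliptical models. The quadratic-time loop is chosen for readability rather than speed.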

Article information

Source
Ann. Statist., Volume 46, Number 6B (2018), 3422-3450.

Dates
Revised: April 2017
First available in Project Euclid: 11 September 2018

https://projecteuclid.org/euclid.aos/1536631279

Digital Object Identifier
doi:10.1214/17-AOS1663

Mathematical Reviews number (MathSciNet)
MR3852657

Subjects
Primary: 62G10: Hypothesis testing
Secondary: 62F12: Asymptotic properties of estimators; 62G20: Asymptotic properties

Citation

Barber, Rina Foygel; Kolar, Mladen. ROCKET: Robust confidence intervals via Kendall’s tau for transelliptical graphical models. Ann. Statist. 46 (2018), no. 6B, 3422-3450. doi:10.1214/17-AOS1663. https://projecteuclid.org/euclid.aos/1536631279

Supplemental materials

• Supplement to “ROCKET: Robust confidence intervals via Kendall’s tau for transelliptical graphical models”. In the supplementary materials, we provide additional experimental results (as described in Section 5), as well as details for all proofs of the theoretical results provided in this paper.