## Statistical Science

### A Unified Theory of Confidence Regions and Testing for High-Dimensional Estimating Equations

#### Abstract

We propose a new inferential framework for constructing confidence regions and testing hypotheses in statistical models specified by a system of high-dimensional estimating equations. We construct an influence function by projecting the fitted estimating equations onto a sparse direction obtained by solving a large-scale linear program. Our main theoretical contribution is a unified Z-estimation theory of confidence regions for high-dimensional problems. Unlike existing methods, all of which require the specification of a likelihood or pseudo-likelihood, our framework is likelihood-free. As a result, our approach provides valid inference for a broad class of high-dimensional constrained estimating equation problems that are not covered by existing methods. Examples include noisy compressed sensing, instrumental variable regression, undirected graphical models, discriminant analysis and vector autoregressive models. We present detailed theoretical results for all of these examples. Finally, we conduct thorough numerical simulations and a real-data analysis to support the theoretical results.
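
As a rough sketch of the projection step, in notation assumed here for illustration rather than taken from the abstract: write the estimating equations as $t(\beta) \in \mathbb{R}^{d}$ with $\beta = (\theta, \gamma)$, where $\theta$ is a scalar parameter of interest and $\gamma$ collects the nuisance parameters, and let $\hat\beta = (\hat\theta, \hat\gamma)$ be an initial sparse estimate. One linear program consistent with the description above is

$$
\hat v = \arg\min_{v \in \mathbb{R}^{d}} \|v\|_{1} \quad \text{subject to} \quad \bigl\| v^{\top} \nabla t(\hat\beta) - e_{1}^{\top} \bigr\|_{\infty} \le \lambda',
$$

where $e_{1}$ is the canonical basis vector for the coordinate of $\theta$ and $\lambda'$ is a hypothetical tuning parameter. The projected estimating function would then be

$$
\hat S(\theta) = \hat v^{\top}\, t(\theta, \hat\gamma),
$$

and a confidence interval for $\theta$ could be centered at a root of $\hat S(\theta) = 0$. The exact constraints, tuning choices and regularity conditions are those stated in the article itself, not this sketch.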

#### Article information

Source
Statist. Sci., Volume 33, Number 3 (2018), 427-443.

Dates
First available in Project Euclid: 13 August 2018

Permanent link to this document
https://projecteuclid.org/euclid.ss/1534147231

Digital Object Identifier
doi:10.1214/18-STS661

Mathematical Reviews number (MathSciNet)
MR3843384

#### Citation

Neykov, Matey; Ning, Yang; Liu, Jun S.; Liu, Han. A Unified Theory of Confidence Regions and Testing for High-Dimensional Estimating Equations. Statist. Sci. 33 (2018), no. 3, 427–443. doi:10.1214/18-STS661. https://projecteuclid.org/euclid.ss/1534147231

#### Supplemental materials

• Supplement to “A Unified Theory of Confidence Regions and Testing for High-Dimensional Estimating Equations”. This is the supplementary material to “A Unified Theory of Confidence Regions and Testing for High-Dimensional Estimating Equations” by M. Neykov, Y. Ning, H. Liu and J. Liu.