## The Annals of Statistics

### Uniformly valid post-regularization confidence regions for many functional parameters in z-estimation framework

#### Abstract

In this paper, we develop procedures to construct simultaneous confidence bands for $\tilde{p}$ potentially infinite-dimensional parameters after model selection for general moment condition models, where $\tilde{p}$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the $\tilde{p}$ parameters is a function. The procedure is based on the construction of score functions that approximately satisfy the Neyman orthogonality condition. The proposed simultaneous confidence bands rely on uniform central limit theorems for high-dimensional vectors (and not on Donsker arguments, as we allow for $\tilde{p}\gg n$). To construct the bands, we employ a multiplier bootstrap procedure that is computationally efficient, as it only involves resampling the estimated score functions (and does not require re-solving the high-dimensional optimization problems). We formally apply the general theory to inference on the regression coefficient process in the distribution regression model with a logistic link, where two implementations are analyzed in detail. Simulations and an application to real data are provided to help illustrate the applicability of the results.
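To make the multiplier bootstrap step concrete, the following is a minimal sketch and not the authors' implementation. It assumes the estimated score values are already available as an $n \times \tilde{p}$ table (one column per parameter), uses Gaussian multipliers, and centers and standardizes each column before resampling; the function names `standardize_columns`, `multiplier_bootstrap_critical_value`, and `simultaneous_bands` are hypothetical. The point it illustrates is that only the stored scores are reweighted, so no estimator is refit inside the bootstrap loop.

```python
import math
import random
import statistics


def standardize_columns(scores):
    """Center each column of `scores` and scale it to unit sample std. dev.

    `scores` is a list of n rows, each a list of p estimated score values.
    Returns (standardized rows, per-column standard deviations).
    """
    p = len(scores[0])
    columns = [[row[j] for row in scores] for j in range(p)]
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    standardized = [
        [(row[j] - means[j]) / sds[j] for j in range(p)] for row in scores
    ]
    return standardized, sds


def multiplier_bootstrap_critical_value(scores, alpha=0.05, n_boot=500, seed=0):
    """Approximate the (1 - alpha) quantile of max_j |n^{-1/2} sum_i xi_i z_ij|.

    The xi_i are i.i.d. N(0, 1) multipliers (an assumption of this sketch) and
    z_ij are the standardized scores. The scores are reused in every draw.
    """
    rng = random.Random(seed)
    standardized, _ = standardize_columns(scores)
    n = len(standardized)
    p = len(standardized[0])
    max_stats = []
    for _ in range(n_boot):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sums = [0.0] * p
        for weight, row in zip(xi, standardized):
            for j in range(p):
                sums[j] += weight * row[j]
        max_stats.append(max(abs(s) for s in sums) / math.sqrt(n))
    max_stats.sort()
    index = min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)
    return max_stats[index]


def simultaneous_bands(theta_hat, scores, alpha=0.05, n_boot=500, seed=0):
    """Return (lower, upper, critical value) for all p parameters jointly."""
    n = len(scores)
    _, sds = standardize_columns(scores)
    crit = multiplier_bootstrap_critical_value(scores, alpha, n_boot, seed)
    half_widths = [crit * sd / math.sqrt(n) for sd in sds]
    lower = [t - h for t, h in zip(theta_hat, half_widths)]
    upper = [t + h for t, h in zip(theta_hat, half_widths)]
    return lower, upper, crit
```

Because a single critical value is shared by all coordinates, the resulting intervals are wider than pointwise normal intervals; that extra width is what buys joint coverage across the $\tilde{p}$ parameters in this sketch.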

#### Article information

Source
Ann. Statist., Volume 46, Number 6B (2018), 3643-3675.

Dates
Received: February 2016
Revised: October 2017
First available in Project Euclid: 11 September 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1536631286

Digital Object Identifier
doi:10.1214/17-AOS1671

Mathematical Reviews number (MathSciNet)
MR3852664

Zentralblatt MATH identifier
1407.62268

Subjects
Primary: 62-07: Data analysis
Secondary: 62H99: None of the above, but in this section

#### Citation

Belloni, Alexandre; Chernozhukov, Victor; Chetverikov, Denis; Wei, Ying. Uniformly valid post-regularization confidence regions for many functional parameters in z-estimation framework. Ann. Statist. 46 (2018), no. 6B, 3643–3675. doi:10.1214/17-AOS1671. https://projecteuclid.org/euclid.aos/1536631286

#### Supplemental materials

• Supplement to “Uniformly valid post-regularization confidence regions for many functional parameters in z-estimation framework”. The supplemental material contains additional proofs omitted in the main text, a discussion of the double selection method, a set of new results for $\ell_{1}$-penalized $M$-estimators with functional data, additional simulation results, and an empirical application.