The Annals of Statistics

Pivotal estimation via square-root Lasso in nonparametric regression

Alexandre Belloni, Victor Chernozhukov, and Lie Wang

Full-text: Open access


We propose a self-tuning $\sqrt{\mathrm {Lasso}} $ method that simultaneously resolves three important practical problems in high-dimensional regression analysis: it handles the unknown scale, the heteroscedasticity and the (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and, in contrast to Lasso, generates sharp bounds even in extreme cases such as the infinite-variance case and the noiseless case. We establish various nonasymptotic bounds for $\sqrt{\mathrm {Lasso}} $, including bounds on the prediction norm rate and on sparsity. Our analysis is based on new impact factors tailored to bounding the prediction norm. To cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums, which yields Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least squares (OLS) applied to the model selected by $\sqrt{\mathrm {Lasso}} $, accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of OLS post $\sqrt{\mathrm {Lasso}} $ is as good as the rate of $\sqrt{\mathrm {Lasso}} $. As an application, we consider the use of $\sqrt{\mathrm {Lasso}} $ and OLS post $\sqrt{\mathrm {Lasso}} $ as estimators of nuisance parameters in a generic semiparametric problem (a nonlinear moment condition or $Z$-problem), which yields $\sqrt{n}$-consistent and asymptotically normal estimators of the main parameters.
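The estimator and the two-step OLS refit described above can be sketched in code. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: it treats the homoscedastic case with unit penalty loadings, minimizing $\sqrt{\hat Q(\beta)} + (\lambda/n)\|\beta\|_1$ with $\hat Q(\beta) = n^{-1}\sum_i (y_i - x_i'\beta)^2$, and it omits the data-driven penalty loadings needed for heteroscedastic noise. The penalty level $\lambda = c\sqrt{n}\,\Phi^{-1}(1 - \alpha/(2p))$ with $c = 1.1$ and $\alpha = 0.05$ is an assumed, commonly used pivotal choice. The solver is swapped in: it uses the scaled-Lasso alternating iteration (a Lasso step at penalty $\sigma\lambda/n$, then $\sigma = \sqrt{\hat Q(\beta)}$), not a conic-programming solver. All function names (`sqrt_lasso_penalty`, `sqrt_lasso`, `ols_post`) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sqrt_lasso_penalty(n, p, c=1.1, alpha=0.05):
    """Assumed pivotal penalty lam = c * sqrt(n) * Phi^{-1}(1 - alpha / (2p)).

    It depends only on n and p, not on the unknown noise scale.
    """
    return c * math.sqrt(n) * NormalDist().inv_cdf(1 - alpha / (2 * p))


def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def sqrt_lasso(X, y, lam, n_outer=100, n_inner=1000, tol=1e-8, sigma_floor=1e-12):
    """Approximately minimize sqrt(Qhat(b)) + (lam / n) * ||b||_1.

    Alternates (i) a Lasso fit of Qhat(b)/2 + (sigma * lam / n) * ||b||_1 by
    cyclic coordinate descent and (ii) sigma = sqrt(Qhat(b)). Both steps
    decrease the jointly convex Qhat(b)/(2 sigma) + sigma/2 + (lam/n)||b||_1,
    whose minimum over sigma is the sqrt-Lasso objective.
    """
    n, p = len(X), len(X[0])
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    r = list(y)  # current residuals y - X b
    sigma = math.sqrt(sum(v * v for v in r) / n)
    for _ in range(n_outer):
        t = max(sigma, sigma_floor) * lam / n  # floor guards the noiseless case
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue  # an all-zero column never enters the model
                rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
                new = soft_threshold(rho, t) / col_sq[j]
                delta = new - b[j]
                if delta != 0.0:
                    for i in range(n):
                        r[i] -= X[i][j] * delta
                    b[j] = new
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        new_sigma = math.sqrt(sum(v * v for v in r) / n)
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return b


def ols_post(X, y, support):
    """OLS refit of y on the columns in `support` (assumes full column rank).

    Solves the normal equations by Gauss-Jordan elimination with pivoting.
    """
    n, k = len(X), len(support)
    if k == 0:
        return {}
    # Augmented matrix [Xs'Xs | Xs'y] for the selected columns Xs.
    M = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in support]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in support]
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        for row in range(k):
            if row != col:
                f = M[row][col] / M[col][col]
                M[row] = [u - f * w for u, w in zip(M[row], M[col])]
    return {j: M[idx][k] / M[idx][idx] for idx, j in enumerate(support)}


if __name__ == "__main__":
    random.seed(0)
    n, p, s = 100, 30, 3  # hypothetical simulation sizes
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Hypothetical sparse truth: first s coefficients equal 1, the rest 0.
    y = [sum(X[i][:s]) + random.gauss(0, 1) for i in range(n)]
    lam = sqrt_lasso_penalty(n, p)
    b = sqrt_lasso(X, y, lam)
    support = [j for j in range(p) if abs(b[j]) > 1e-8]
    print("selected columns:", support)
    print("OLS post sqrt-Lasso:", ols_post(X, y, support))
```

In the usage example the penalty is computed from $n$ and $p$ alone; the noise standard deviation used to simulate `y` is never passed in, which illustrates the pivotal property. Pure Python keeps the sketch self-contained; for realistic problem sizes a vectorized or conic-programming solver would be preferable.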

Article information

Ann. Statist., Volume 42, Number 2 (2014), 757–788.

First available in Project Euclid: 20 May 2014

Digital Object Identifier: doi:10.1214/14-AOS1204

Primary: 62G05 (Estimation); 62G08 (Nonparametric regression)
Secondary: 62G35 (Robustness)

Keywords: Pivotal, square-root Lasso, model selection, non-Gaussian heteroscedastic, generic semiparametric problem, nonlinear instrumental variable, $Z$-estimation problem, $\sqrt{n}$-consistency and asymptotic normality after model selection


Belloni, Alexandre; Chernozhukov, Victor; Wang, Lie. Pivotal estimation via square-root Lasso in nonparametric regression. Ann. Statist. 42 (2014), no. 2, 757–788. doi:10.1214/14-AOS1204.

Supplemental materials

  • Supplementary material: deferred proofs, additional theoretical results on convergence rates in the $\ell_{2}$, $\ell_{1}$ and $\ell_{\infty}$ norms, a lower bound on the prediction rate, and Monte Carlo simulations.