The Annals of Statistics

ℓ1-penalized quantile regression in high-dimensional sparse models

Alexandre Belloni and Victor Chernozhukov

We consider median regression and, more generally, a possibly infinite collection of quantile regressions in high-dimensional sparse models. In these models, the number of regressors p is very large, possibly larger than the sample size n, but only at most s regressors have a nonzero impact on each conditional quantile of the response variable, where s grows more slowly than n. Since ordinary quantile regression is not consistent in this case, we consider ℓ1-penalized quantile regression (ℓ1-QR), which penalizes the ℓ1-norm of regression coefficients, as well as the post-penalized QR estimator (post-ℓ1-QR), which applies ordinary QR to the model selected by ℓ1-QR. First, we show that under general conditions ℓ1-QR is consistent at the near-oracle rate $\sqrt{s/n}\sqrt{\log(p\vee n)}$, uniformly in the compact set $\mathcal{U}\subset(0,1)$ of quantile indices. In deriving this result, we propose a partly pivotal, data-driven choice of the penalty level and show that it satisfies the requirements for achieving this rate. Second, we show that under similar conditions post-ℓ1-QR is consistent at the near-oracle rate $\sqrt{s/n}\sqrt{\log(p\vee n)}$, uniformly over $\mathcal{U}$, even if the ℓ1-QR-selected models miss some components of the true models, and the rate could be even closer to the oracle rate otherwise. Third, we characterize conditions under which ℓ1-QR contains the true model as a submodel, and derive bounds on the dimension of the selected model, uniformly over $\mathcal{U}$; we also provide conditions under which hard-thresholding selects the minimal true model, uniformly over $\mathcal{U}$.
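The ℓ1-QR objective is convex and piecewise linear, so it can be solved exactly as a linear program. The sketch below is a minimal illustration of that formulation using `scipy.optimize.linprog`, together with a naive post-ℓ1-QR refit (unpenalized QR on the selected support). The data-generating process, the fixed penalty level `lam`, and the function names are illustrative assumptions: the paper uses a data-driven, quantile-dependent penalty based on a simulated pivotal quantity, not a user-supplied constant.

```python
import numpy as np
from scipy.optimize import linprog

def l1_qr(X, y, tau=0.5, lam=1.0):
    """l1-penalized quantile regression (l1-QR) via its linear-program form.

    Minimizes sum_i rho_tau(y_i - x_i' beta) + lam * ||beta||_1, where
    rho_tau(u) = (tau - 1{u < 0}) u is the check (pinball) loss. Writing
    beta = b_plus - b_minus and the residual r = u_plus - u_minus, with all
    four blocks nonnegative, turns the problem into a standard LP.
    """
    n, p = X.shape
    # Cost over the stacked variable [b_plus, b_minus, u_plus, u_minus].
    c = np.concatenate([lam * np.ones(p), lam * np.ones(p),
                        tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    # Equality constraint: X b_plus - X b_minus + u_plus - u_minus = y.
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    return z[:p] - z[p:2 * p]

def post_l1_qr(X, y, tau=0.5, lam=1.0, tol=1e-6):
    """Post-l1-QR: ordinary (unpenalized) QR refit on the selected support."""
    beta = l1_qr(X, y, tau=tau, lam=lam)
    support = np.flatnonzero(np.abs(beta) > tol)
    refit = np.zeros(X.shape[1])
    if support.size:
        # Setting lam = 0 reduces l1_qr to ordinary quantile regression.
        refit[support] = l1_qr(X[:, support], y, tau=tau, lam=0.0)
    return refit, support
```

In practice the intercept is usually left unpenalized, and the paper's penalty level scales like √(n log(p ∨ n)); the constant `lam` above is only a placeholder for illustration.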

Article information

Ann. Statist., Volume 39, Number 1 (2011), 82–130.

First available in Project Euclid: 3 December 2010

Primary: 62H12: Estimation 62J99: None of the above, but in this section
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords: median regression; quantile regression; sparse models


Belloni, Alexandre; Chernozhukov, Victor. ℓ1-penalized quantile regression in high-dimensional sparse models. Ann. Statist. 39 (2011), no. 1, 82–130. doi:10.1214/10-AOS827.

References

  • [1] Belloni, A. and Chernozhukov, V. (2009). Computational complexity of MCMC-based estimators in large samples. Ann. Statist. 37 2011–2055.
  • [2] Belloni, A. and Chernozhukov, V. (2010). Supplement to “ℓ1-penalized quantile regression in high-dimensional sparse models.” DOI: 10.1214/10-AOS827SUPP.
  • [3] Belloni, A. and Chernozhukov, V. (2009). ℓ1-penalized quantile regression in high-dimensional sparse models. Available at arXiv:0904.2931.
  • [4] Belloni, A. and Chernozhukov, V. (2008). Conditional quantile processes under increasing dimension. Technical report, Duke and MIT.
  • [5] Belloni, A. and Chernozhukov, V. (2009). Post-ℓ1-penalized estimators in high-dimensional linear regression models. Available at arXiv:1001.0188.
  • [6] Bertsimas, D. and Tsitsiklis, J. (1997). Introduction to Linear Optimization. Athena Scientific, Belmont, MA.
  • [7] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [8] Buchinsky, M. (1994). Changes in the U.S. wage structure 1963–1987: Application of quantile regression. Econometrica 62 405–458.
  • [9] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2006). Aggregation and sparsity via ℓ1 penalized least squares. In Proceedings of 19th Annual Conference on Learning Theory (COLT 2006) (G. Lugosi and H. U. Simon, eds.). Lecture Notes in Artificial Intelligence 4005 379–391. Springer, Berlin.
  • [10] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
  • [11] Bunea, F., Tsybakov, A. and Wegkamp, M. H. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
  • [12] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
  • [13] Chernozhukov, V. (2005). Extremal quantile regression. Ann. Statist. 33 806–839.
  • [14] Fan, J. and Lv, J. (2008). Sure independence screening for ultra-high dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849–911.
  • [15] Gutenbrunner, C. and Jurečková, J. (1992). Regression rank scores and regression quantiles. Ann. Statist. 20 305–330.
  • [16] He, X. and Shao, Q.-M. (2000). On parameters of increasing dimensions. J. Multivariate Anal. 73 120–135.
  • [17] Knight, K. (1998). Limiting distributions for L1 regression estimators under general conditions. Ann. Statist. 26 755–770.
  • [18] Knight, K. and Fu, W. J. (2000). Asymptotics for Lasso-type estimators. Ann. Statist. 28 1356–1378.
  • [19] Koenker, R. (2005). Quantile Regression. Cambridge Univ. Press, Cambridge.
  • [20] Koenker, R. (2010). Additive models for quantile regression: Model selection and confidence bandaids. Working paper. Available at
  • [21] Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica 46 33–50.
  • [22] Koltchinskii, V. (2009). Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré Probab. Statist. 45 7–57.
  • [23] Laplace, P.-S. (1818). Théorie Analytique des Probabilités. Éditions Jacques Gabay (1995), Paris.
  • [24] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete 23. Springer, Berlin.
  • [25] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. A. (2009). Taking advantage of sparsity in multi-task learning. In COLT’09. Omnipress, Madison, WI.
  • [26] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
  • [27] Portnoy, S. (1991). Asymptotic behavior of regression quantiles in nonstationary, dependent cases. J. Multivariate Anal. 38 100–113.
  • [28] Portnoy, S. and Koenker, R. (1997). The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statist. Sci. 12 279–300.
  • [29] Rosenbaum, M. and Tsybakov, A. B. (2010). Sparse recovery under matrix uncertainty. Ann. Statist. 38 2620–2651.
  • [30] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • [31] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Cambridge.
  • [32] van de Geer, S. A. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614–645.
  • [33] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
  • [34] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.

Supplemental materials

  • Supplementary material: Supplement to “ℓ1-penalized quantile regression in high-dimensional sparse models”. The supplement contains technical proofs omitted from the main text: examples of simple sufficient conditions, VC index bounds and Gaussian sparse eigenvalues.