Annals of Statistics

UPS delivers optimal phase diagram in high-dimensional variable selection

Pengsheng Ji and Jiashun Jin

Full-text: Open access


Consider a linear model Y = Xβ + z, z ∼ N(0, I_n). Here, X = X_{n,p}, where both p and n are large but p > n. We model the rows of X as i.i.d. samples from N(0, (1/n)Ω), where Ω is a p × p correlation matrix that is unknown to us but is presumably sparse. The vector β is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros.
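The data-generating model above can be simulated directly. The sketch below uses illustrative parameter values (n, p, signal count, and signal strength are assumptions, and Ω is taken tridiagonal, one of the simple settings discussed later): rows of X are drawn i.i.d. from N(0, (1/n)Ω) via a Cholesky factor, and Y = Xβ + z.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 500               # p > n, as in the abstract (values illustrative)

# Sparse tridiagonal correlation matrix Omega; eigenvalues stay in (0.2, 1.8),
# so Omega is positive definite and has a Cholesky factor
Omega = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
L = np.linalg.cholesky(Omega)

# Rows of X are i.i.d. N(0, (1/n) * Omega)
X = rng.standard_normal((n, p)) @ L.T / np.sqrt(n)

# beta is sparse: a few nonzero coordinates of common (assumed) strength
beta = np.zeros(p)
support = rng.choice(p, size=10, replace=False)
beta[support] = 3.0

# Y = X beta + z, with z ~ N(0, I_n)
z = rng.standard_normal(n)
Y = X @ beta + z
```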

We propose the Univariate Penalization Screening (UPS) for variable selection. This is a screen-and-clean method where we screen with univariate thresholding and clean with penalized MLE. It has two important properties: sure screening and separability after screening. These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation.
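To make the screen-and-clean idea concrete, here is a schematic sketch, not the authors' exact thresholds or penalty: screen by thresholding the marginal statistics X′Y, split the survivors into connected components of the Gram graph, and clean each small component separately with an ℓ0-penalized least-squares fit. The function name `ups_sketch` and the parameters `t_screen` and `lam` are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def ups_sketch(X, Y, t_screen, lam, corr_tol=1e-8):
    """Schematic screen-and-clean sketch in the spirit of UPS
    (illustrative only; not the authors' exact rule).

    Screen: keep coordinates j with |X_j' Y| > t_screen.
    Clean:  split survivors into connected components of the Gram graph and
            fit L0-penalized least squares separately on each component.
    """
    n, p = X.shape
    kept = np.flatnonzero(np.abs(X.T @ Y) > t_screen)   # univariate screening

    # Gram-graph adjacency among the survivors
    G = X[:, kept].T @ X[:, kept]
    adj = np.abs(G) > corr_tol
    np.fill_diagonal(adj, False)

    beta_hat = np.zeros(p)
    seen = set()
    for start in range(len(kept)):
        if start in seen:
            continue
        # Depth-first search for the connected component containing `start`
        comp, stack = [], [start]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(np.flatnonzero(adj[v]).tolist())
        cols = kept[comp]

        # Clean: exhaustive L0-penalized least squares over the component;
        # feasible only because screening leaves small groups
        # (the "separability after screening" property)
        best_score, best_sub, best_b = np.sum(Y ** 2), None, None
        for k in range(1, len(cols) + 1):
            for sub in combinations(cols.tolist(), k):
                idx = list(sub)
                b, *_ = np.linalg.lstsq(X[:, idx], Y, rcond=None)
                score = np.sum((Y - X[:, idx] @ b) ** 2) + lam * k
                if score < best_score:
                    best_score, best_sub, best_b = score, idx, b
        if best_sub is not None:
            beta_hat[best_sub] = best_b
    return beta_hat
```

For instance, with an identity design the sketch reduces to coordinatewise hard selection: `ups_sketch(np.eye(4), np.array([5., 0., 0., 4.]), t_screen=2.0, lam=1.0)` returns the vector with nonzeros 5 and 4 in positions 0 and 3.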

We measure the performance of a procedure by the Hamming distance, and use an asymptotic framework where p → ∞ and other quantities (e.g., n, sparsity level and strength of signals) are linked to p by fixed parameters. We find that in many cases, the UPS achieves the optimal rate of convergence. Also, for many different Ω, there is a common three-phase diagram in the two-dimensional phase space quantifying the signal sparsity and signal strength. In the first phase, it is possible to recover all signals. In the second phase, it is possible to recover most of the signals, but not all of them. In the third phase, successful variable selection is impossible. UPS partitions the phase space in the same way that the optimal procedures do, and recovers most of the signals as long as successful variable selection is possible.
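The Hamming-distance loss counts every selection error, false positives and false negatives alike. A minimal support-pattern version is shown below (the paper works with sign patterns, which additionally penalizes sign errors; the function name is an illustrative choice):

```python
import numpy as np

def hamming_selection_error(beta_hat, beta):
    """Number of coordinates where the selected support disagrees with the
    true support: false positives + false negatives."""
    return int(np.sum((beta_hat != 0) != (beta != 0)))

beta     = np.array([0.0, 3.0, 0.0, 3.0, 0.0])   # true support {1, 3}
beta_hat = np.array([0.0, 3.1, 0.5, 0.0, 0.0])   # one false positive, one miss
# hamming_selection_error(beta_hat, beta) -> 2
```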

The lasso and subset selection are well-known approaches to variable selection. However, somewhat surprisingly, there are regions in the phase space where neither of them is rate optimal, even in very simple settings, such as when Ω is tridiagonal, and even when the tuning parameter is ideally set.

Article information

Ann. Statist., Volume 40, Number 1 (2012), 73-103.

First available in Project Euclid: 15 March 2012


Primary: 62J05: Linear regression 62J07: Ridge regression; shrinkage estimators
Secondary: 62G20: Asymptotic properties 62C05: General considerations

Keywords: graph; Hamming distance; lasso; Stein’s normal means; penalization methods; phase diagram; screen and clean; subset selection; variable selection


Ji, Pengsheng; Jin, Jiashun. UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40 (2012), no. 1, 73--103. doi:10.1214/11-AOS947.



  • [1] Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584–653.
  • [2] Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control 19 716–723.
  • [3] Bajwa, W. U., Haupt, J. D., Raz, G. M., Wright, S. J. and Nowak, R. D. (2007). Toeplitz-structured compressed sensing matrices. In Proceedings of IEEE Workshop on Statistical Signal Processing (SSP), Madison, Wisconsin 294–298. IEEE Computer Society, Washington, DC.
  • [4] Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
  • [5] Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by ℓ1 minimization. Ann. Statist. 37 2145–2177.
  • [6] Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61.
  • [7] Diestel, R. (2005). Graph Theory, 3rd ed. Graduate Texts in Mathematics 173. Springer, Berlin.
  • [8] Dinur, I. and Nissim, K. (2003). Revealing information while preserving privacy. In Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems 202–210. ACM Press, New York.
  • [9] Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289–1306.
  • [10] Donoho, D. L. and Tanner, J. (2005). Sparse nonnegative solution of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci. USA 102 9446–9451 (electronic).
  • [11] Fan, J., Jin, J. and Ke, Z. (2011). Optimal procedure for variable selection in the presence of strong dependence. Unpublished manuscript.
  • [12] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849–911.
  • [13] Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975.
  • [14] Friedman, J. H., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 1–22.
  • [15] Genovese, C., Jin, J. and Wasserman, L. (2011). Revisiting marginal regression. Unpublished manuscript.
  • [16] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
  • [17] Ji, P. (2011). Selected topics in nonparametric testing and variable selection for high dimensional data. Ph.D. thesis, Dept. Statistical Science, Cornell Univ.
  • [18] Ji, P. and Jin, J. (2011). Supplement to “UPS delivers optimal phase diagram in high dimensional variable selection.” DOI:10.1214/11-AOS947SUPP.
  • [19] Jin, J. and Zhang, C.-H. (2011). Adaptive optimality of UPS in high dimensional variable selection. Unpublished manuscript.
  • [20] Jin, J. and Zhang, Q. (2011). Optimal selection of variable when signals come from an Ising model. Unpublished manuscript.
  • [21] Kerkyacharian, G., Mougeot, M., Picard, D. and Tribouley, K. (2009). Learning out of leaders. In Multiscale, Nonlinear and Adaptive Approximation 295–324. Springer, Berlin.
  • [22] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • [23] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
  • [24] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • [25] Wainwright, M. (2006). Sharp threshold for high-dimensional and noisy recovery of sparsity. Technical report, Dept. Statistics, Univ. California, Berkeley.
  • [26] Wasserman, L. (2006). All of Nonparametric Statistics. Springer, New York.
  • [27] Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
  • [28] Ye, F. and Zhang, C. H. (2009). Rate minimaxity of the lasso and Dantzig estimators. Technical report, Dept. Statistics and Biostatistics, Rutgers Univ.
  • [29] Zhou, S. (2010). Thresholded Lasso for high dimensional variable selection and statistical estimation. Available at arXiv:1002.1583.
  • [30] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.

Supplemental materials

  • Supplementary material: Supplementary material for “UPS delivers optimal phase diagram in high-dimensional variable selection”. Owing to space constraints, the technical proofs are moved to a supplementary document [18].