The Annals of Applied Statistics

SLOPE—Adaptive variable selection via convex optimization

Abstract

We introduce a new estimator for the vector of coefficients $\beta$ in the linear model $y=X\beta+z$, where $X$ has dimensions $n\times p$ with $p$ possibly larger than $n$. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to

$$\min_{b\in\mathbb{R}^{p}}\frac{1}{2}\Vert y-Xb\Vert_{\ell_{2}}^{2}+\lambda_{1}\vert b\vert_{(1)}+\lambda_{2}\vert b\vert_{(2)}+\cdots+\lambda_{p}\vert b\vert_{(p)},$$

where $\lambda_{1}\ge\lambda_{2}\ge\cdots\ge\lambda_{p}\ge0$ and $\vert b\vert_{(1)}\ge\vert b\vert_{(2)}\ge\cdots\ge\vert b\vert_{(p)}$ are the absolute values of the entries of $b$ sorted in decreasing order. This is a convex program, and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical $\ell_{1}$ procedures such as the Lasso. Here, the regularizer is a sorted $\ell_{1}$ norm, which penalizes the regression coefficients according to their rank: the higher the rank (that is, the stronger the signal), the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300] procedure (BH), which compares more significant $p$-values with more stringent thresholds. One notable choice of the sequence $\{\lambda_{i}\}$ is given by the BH critical values $\lambda_{\mathrm{BH}}(i)=z(1-i\cdot q/2p)$, where $q\in(0,1)$ and $z(\alpha)$ is the $\alpha$th quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with $\lambda_{\mathrm{BH}}$ provably controls FDR at level $q$. Moreover, it also appears to have appreciable inferential properties under more general designs $X$ while having substantial power, as demonstrated in a series of experiments run on both simulated and real data.
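To make the objective above concrete, the following is a minimal Python sketch of the BH critical values $\lambda_{\mathrm{BH}}(i)$, the sorted $\ell_{1}$ penalty, and the SLOPE objective. All function names (`bh_lambdas`, `sorted_l1_norm`, `slope_objective`, `prox_sorted_l1`) are hypothetical and chosen for illustration only. The last function evaluates the proximal operator of the sorted $\ell_{1}$ norm with a standard pool-adjacent-violators pass; this is an assumption about one reasonable way to do it, not a transcription of the algorithm in the paper.

```python
from statistics import NormalDist

import numpy as np


def bh_lambdas(p, q):
    """BH critical values lambda_BH(i) = z(1 - i*q/(2p)) for i = 1, ..., p."""
    normal = NormalDist()  # standard normal; inv_cdf is its quantile function z
    return np.array([normal.inv_cdf(1.0 - i * q / (2.0 * p))
                     for i in range(1, p + 1)])


def sorted_l1_norm(b, lam):
    """Pair the largest |b_j| with lambda_1, the next largest with lambda_2, ..."""
    abs_desc = np.sort(np.abs(np.asarray(b, dtype=float)))[::-1]
    return float(abs_desc @ np.asarray(lam, dtype=float))


def slope_objective(y, X, b, lam):
    """0.5 * ||y - X b||_2^2 plus the sorted-l1 penalty of b."""
    b = np.asarray(b, dtype=float)
    residual = np.asarray(y, dtype=float) - np.asarray(X, dtype=float) @ b
    return 0.5 * float(residual @ residual) + sorted_l1_norm(b, lam)


def prox_sorted_l1(v, lam):
    """Minimize 0.5 * ||b - v||^2 + sorted_l1_norm(b, lam) over b.

    Assumed approach (not taken from the abstract): sort |v| in decreasing
    order, subtract lam, restore a nonincreasing sequence by pooling
    adjacent violators, clip at zero, then undo the sort and the signs.
    """
    v = np.asarray(v, dtype=float)
    lam = np.asarray(lam, dtype=float)
    order = np.argsort(-np.abs(v), kind="stable")  # indices, largest |v| first
    shifted = np.abs(v)[order] - lam

    blocks = []  # each block is [sum of entries, number of entries]
    for value in shifted:
        blocks.append([value, 1])
        # Merge while the previous block's mean does not exceed the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    fitted = np.concatenate([np.full(count, max(total / count, 0.0))
                             for total, count in blocks])
    out = np.empty_like(v)
    out[order] = fitted  # map sorted positions back to the original ones
    return np.sign(v) * out
```

When all $\lambda_{i}$ are equal, `sorted_l1_norm` reduces to the usual Lasso penalty and `prox_sorted_l1` reduces to soft thresholding, which gives a quick sanity check. A proximal gradient loop built on `prox_sorted_l1` would be one way to minimize `slope_objective`, under the same assumptions.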

Article information

Source
Ann. Appl. Stat., Volume 9, Number 3 (2015), 1103–1140.

Dates
Revised: February 2015
First available in Project Euclid: 2 November 2015

https://projecteuclid.org/euclid.aoas/1446488733

Digital Object Identifier
doi:10.1214/15-AOAS842

Mathematical Reviews number (MathSciNet)
MR3418717

Zentralblatt MATH identifier
06525980

Citation

Bogdan, Małgorzata; van den Berg, Ewout; Sabatti, Chiara; Su, Weijie; Candès, Emmanuel J. SLOPE—Adaptive variable selection via convex optimization. Ann. Appl. Stat. 9 (2015), no. 3, 1103–1140. doi:10.1214/15-AOAS842. https://projecteuclid.org/euclid.aoas/1446488733

References

• Abramovich, F. and Benjamini, Y. (1995). Thresholding of wavelet coefficients as multiple hypotheses testing procedure. In Wavelets and Statistics. Lecture Notes in Statistics 103 5–14. Springer, Berlin.
• Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584–653.
• Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Control AC-19 716–723. System identification and time-series analysis.
• Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972). Statistical Inference Under Order Restrictions. The Theory and Application of Isotonic Regression. Wiley, New York.
• Bauer, P., Pötscher, B. M. and Hackl, P. (1988). Model selection by multiple test procedures. Statistics 19 39–44.
• Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202.
• Becker, S. R., Candès, E. J. and Grant, M. C. (2011). Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3 165–218.
• Benjamini, Y. and Gavrilov, Y. (2009). A simple forward selection procedure based on false discovery rate control. Ann. Appl. Stat. 3 179–198.
• Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
• Benjamini, Y. and Yekutieli, D. (2005). False discovery rate-adjusted multiple confidence intervals for selected parameters. J. Amer. Statist. Assoc. 100 71–93.
• Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802–837.
• Best, M. J. and Chakravarti, N. (1990). Active set algorithms for isotonic regression; a unifying framework. Math. Program. 47 425–439.
• Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3 203–268.
• Bogdan, M., Chakrabarti, A., Frommlet, F. and Ghosh, J. K. (2011). Asymptotic Bayes-optimality under sparsity of some multiple testing procedures. Ann. Statist. 39 1551–1579.
• Bogdan, M., Ghosh, J. K. and Żak-Szatkowska, M. (2008). Selecting explanatory variables with the modified version of Bayesian information criterion. Qual. Reliab. Eng. Int. 24 627–641.
• Bogdan, M., van den Berg, E., Sabatti, C., Su, W. and Candès, E. J. (2015). Supplement to “SLOPE—Adaptive variable selection via convex optimization.” DOI:10.1214/15-AOAS842SUPP.
• Bogdan, M., van den Berg, E., Su, W. and Candès, E. J. (2013). Statistical estimation and testing via the ordered $\ell_{1}$ norm. Preprint. Available at arXiv:1310.1969v2.
• Bondell, H. D. and Reich, B. J. (2008). Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64 115–123, 322–323.
• Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212–1242.
• Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
• Candès, E. J., Wakin, M. B. and Boyd, S. P. (2008). Enhancing sparsity by reweighted $l_{1}$ minimization. J. Fourier Anal. Appl. 14 877–905.
• de Leeuw, J., Hornik, K. and Mair, P. (2009). Isotone optimization in R: Pool-adjacent-violators algorithm (PAVA) and active set methods. J. Stat. Softw. 32 1–24.
• Efron, B. (2011). Tweedie’s formula and selection bias. J. Amer. Statist. Assoc. 106 1602–1614.
• Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975.
• Foster, D. P. and Stine, R. A. (1999). Local asymptotic coding and the minimum description length. IEEE Trans. Inform. Theory 45 1289–1293.
• Foygel-Barber, R. and Candès, E. J. (2014). Controlling the false discovery rate via knockoffs. Ann. Statist. To appear. Available at arXiv:1404.5609.
• Frommlet, F. and Bogdan, M. (2013). Some optimality properties of FDR controlling rules under sparsity. Electron. J. Stat. 7 1328–1368.
• Frommlet, F., Ruhaltinger, F., Twaróg, P. and Bogdan, M. (2012). Modified versions of Bayesian information criterion for genome-wide association studies. Comput. Statist. Data Anal. 56 1038–1051.
• Grazier G’Sell, M., Hastie, T. and Tibshirani, R. (2013). False variable selection rates in regression. Preprint. Available at arXiv:1302.2303.
• Grotzinger, S. J. and Witzgall, C. (1984). Projections onto order simplexes. Appl. Math. Optim. 12 247–270.
• Ingster, Yu. I. (1998). Minimax detection of a signal for $l^{n}$-balls. Math. Methods Statist. 7 401–428.
• Javanmard, A. and Montanari, A. (2014a). Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 15 2869–2909.
• Javanmard, A. and Montanari, A. (2014b). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. IEEE Trans. Inform. Theory 60 6522–6554.
• Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika 29 115–129.
• Lockhart, R., Taylor, J., Tibshirani, R. J. and Tibshirani, R. (2014). A significance test for the Lasso. Ann. Statist. 42 413–468.
• Mallows, C. L. (1973). Some comments on $C_{p}$. Technometrics 15 661–676.
• Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374–393.
• Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 417–473.
• Meinshausen, N., Meier, L. and Bühlmann, P. (2009). $p$-values for high-dimensional regression. J. Amer. Statist. Assoc. 104 1671–1681.
• Nesterov, Y. (2004). Introductory Lectures on Convex Optimization. A Basic Course. Kluwer Academic, Boston, MA.
• Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. CORE discussion paper. Center for Operations Research and Econometrics (CORE), Université Catholique de Louvain. Available at http://www.ecore.be/DPs/dp_1191313936.pdf.
• Parikh, N. and Boyd, S. (2013). Proximal algorithms. In Foundations and Trends in Optimization 1 123–231.
• Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist. 30 239–257.
• Service, S. K., Teslovich, T. M., Fuchsberger, C., Ramensky, V., Yajnik, P., Koboldt, D. C., Larson, D. E., Zhang, Q., Lin, L., Welch, R., Ding, L., McLellan, M. D., O’Laughlin, M., Fronick, C., Fulton, L. L., Magrini, V., Swift, A., Elliott, P., Jarvelin, M. R., Kaakinen, M., McCarthy, M. I., Peltonen, L., Pouta, A., Bonnycastle, L. L., Collins, F. S., Narisu, N., Stringham, H. M., Tuomilehto, J., Ripatti, S., Fulton, R. S., Sabatti, C., Wilson, R. K., Boehnke, M. and Freimer, N. B. (2014). Re-sequencing expands our understanding of the phenotypic impact of variants at GWAS loci. PLoS Genet. 10 e1004147.
• Städler, N., Bühlmann, P. and van de Geer, S. (2010). $\ell_{1}$-penalization for mixture regression models. TEST 19 209–256.
• Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879–898.
• Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• Tibshirani, R. and Knight, K. (1999). The covariance inflation criterion for adaptive model selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 61 529–546.
• van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166–1202.
• Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
• Wu, Z. and Zhou, H. H. (2013). Model selection and sharp asymptotic minimaxity. Probab. Theory Related Fields 156 165–191.
• Zeng, X. and Figueiredo, M. (2014). Decreasing weighted sorted $\ell_{1}$ regularization. IEEE Signal Process. Lett. 1240–1244.
• Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217–242.
• Zhong, L. and Kwok, J. (2012). Efficient sparse modeling with automatic feature grouping. IEEE Trans. Neural Netw. Learn. Syst. 1436–1447.
• Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.