Source: Ann. Statist.
Volume 37, Number 1
The Lasso is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables pn is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables.
Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ2-norm sense for fixed designs under conditions on (a) the number sn of nonzero components of the vector βn and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the ℓ2 error with an appropriate choice of the smoothing parameter. The rate is shown to be optimal under the condition of bounded maximal and minimal sparse eigenvalues. Our results imply that, with high probability, all important variables are selected. The set of selected variables is a meaningful reduction on the original set of variables. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics.
 Bickel, P., Ritov, Y. and Tsybakov, A. (2008). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. To appear.
 Bunea, B., Tsybakov, A. and Wegkamp, M. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
 Bunea, F., Tsybakov, A. and Wegkamp, M. (2006). Sparsity oracle inequalities for the Lasso. Electron. J. Statist. 169–194.
 Candes, E. and Tao, T. (2005a). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
 Candes, E. and Tao, T. (2005b). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
 Cornish, N. and Crowder, J. (2005). LISA data analysis using Markov chain Monte Carlo methods. Phys. Rev. D 72 43005.
 Davidson, K. and Szarek, S. (2001). Local operator theory, random matrices and Banach spaces. In Handbook on the Geometry of Banach Spaces 1 (W. B. Johnson and J. Lindenstrauss, eds.) 317–366. North -Holland, Amsterdam.
 Donoho, D. (2006). For most large underdetermined systems of linear equations, the minimal l1-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 797–829.
 Donoho, D. and Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1-minimization. Proc. Natl. Acad. Sci. USA 100 2197–2202.
 Donoho, D., Elad, M. and Temlyakov, V. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6–18.
 Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–451.
 Fuchs, J. (2005). Recovery of exact sparse representations in the presence of bounded noise. IEEE Trans. Inform. Theory 51 3601–3608.
 Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional predictor selection and the virtue of over-parametrization. Bernoulli 10 971–988.
 Gribonval, R. and Nielsen, M. (2003). Sparse representations in unions of bases. IEEE Trans. Inform. Theory 49 3320–3325.
 Hall, P., Reimann, J. and Rice, J. (2000). Nonparametric estimation of a periodic function. Biometrika 87 545–557.
 Hannan, E. and Quinn, B. (1989). The resolution of closely adjacent spectral lines. J. Time Ser. Anal. 10 13–31.
 Joshi, R., Crump, V. and Fischer, T. (1995). Image subband coding using arithmetic coded trellis codedquantization. IEEE Transactions on Circuits and Systems for Video Technology 5 515–523.
 Knight, K. and Fu, W. (2000). Asymptotics for Lasso-type estimators. Ann. Statist. 28 1356–1378.
 LoPresto, S., Ramchandran, K. and Orchard, M. (1997). Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework. In Proc. Data Compression Conference 221–230.
 Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Machine Intelligence 11 674–693.
 Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. Roy. Statist. Soc. Ser. B 70 53–71.
 Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374–393.
 Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
 Meinshausen, N., Rocha, G. and Yu, B. (2007). A tale of three cousins: Lasso, L2Boosting and Dantzig. Ann. Statist. 35 2373–2384.
 Osborne, M., Presnell, B. and Turlach, B. (2000). On the Lasso and its dual. J. Comput. Graph. Statistics 9 319–337.
 Paul, D. (2007). Asymptotics of sample eigenstructure for a large-dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
 Pojmanski, G. (2002). The All Sky Automated Survey. Catalog of Variable Stars. I. 0 h–6 Quarter of the Southern Hemisphere. Acta Astronomica 52 397–427.
 Scargle, J. (1982). Studies in astronomical time series analysis. II. Statistical aspects of spectral analysis of unevenly spaced data. Astrophysical J. 263 835.
 Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
 Tropp, J. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50 2231–2242.
 Tropp, J. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030–1051.
 Umstätter, R., Christensen, N., Hendry, M., Meyer, R., Simha, V., Veitch, J., Vigeland, S. and Woan, G. (2005). LISA source confusion: Identification and characterization of signals. Classical and Quantum Gravity 22 901.
 Valdés-Sosa, P., Sánchez-Bornot, J., Lage-Castellanos, A., Vega-Hernández, M., Bosch-Bayard, J., Melie-García, L. and Canales-Rodríguez, E. (2005). Estimating brain functional connectivity with sparse multivariate autoregression. Philos. Trans. Roy. Soc. B: Biological Sciences 360 969–981.
 van de Geer, S. (2006). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614–645.
 Wainwright, M. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Available at arXiv:math.ST/0605740.
 Yuan, M. and Lin, Y. (2006a). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49–67.
 Yuan, M. and Lin, Y. (2006b). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49–67.
 Zhang, C.-H. and Huang, J. (2006). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
 Zhang, H. and Lu, W. (2007). Adaptive-Lasso for Cox’s proportional hazards model. Biometrika 94 691–703.
 Zhao, P. and Yu, B. (2004). Stagewise Lasso. J. Machine Learning Research 8 2701–2726.
 Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Machine Learning Research 7 2541–2563.
 Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.