The Annals of Statistics

Lasso-type recovery of sparse representations for high-dimensional data

Nicolai Meinshausen and Bin Yu
Source: Ann. Statist. Volume 37, Number 1 (2009), 246-270.

Abstract

The Lasso is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p_n is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables.
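To make the irrepresentable condition concrete, here is a minimal sketch (not code from the paper) of the quantity it bounds in its usual strong form. With Gram matrix C = X^T X / n, true support S and sign vector sign(β_S), every entry of |C_{S^c,S} C_{S,S}^{-1} sign(β_S)| must stay strictly below 1. The helper name `irrepresentable_margin` is hypothetical, and it takes C directly so that a small correlation structure can be checked by hand.

```python
import numpy as np


def irrepresentable_margin(C, support, signs):
    """Return max_j |C_{S^c,S} C_{S,S}^{-1} sign(beta_S)|_j over j outside S.

    C is the Gram matrix X^T X / n. A value strictly below 1 means the
    strong irrepresentable condition holds for this support and sign
    pattern; a value >= 1 means it is violated.
    """
    p = C.shape[0]
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(p), S)  # indices of the zero coefficients
    C_SS = C[np.ix_(S, S)]
    C_ScS = C[np.ix_(Sc, S)]
    # Solve C_SS w = sign(beta_S) instead of forming an explicit inverse.
    w = np.linalg.solve(C_SS, np.asarray(signs, dtype=float))
    return float(np.max(np.abs(C_ScS @ w)))


# Example: variables 0 and 1 are relevant and mutually uncorrelated;
# the irrelevant variable 2 has correlation 0.6 with each of them.
C = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.6],
              [0.6, 0.6, 1.0]])
print(irrepresentable_margin(C, [0, 1], [1, 1]))   # equal signs
print(irrepresentable_margin(C, [0, 1], [1, -1]))  # opposite signs
```

With equal signs the value is 1.2, so the condition fails even though no pairwise correlation exceeds 0.6; with opposite signs the two contributions cancel and the value is 0. The condition therefore depends on the sign pattern as well as on the correlations.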

Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ2-norm sense for fixed designs under conditions on (a) the number s_n of nonzero components of the vector β_n and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the ℓ2 error with an appropriate choice of the smoothing parameter. The rate is shown to be optimal under the condition of bounded maximal and minimal sparse eigenvalues. Our results imply that, with high probability, all important variables are selected. The set of selected variables is a meaningful reduction on the original set of variables. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics.
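Condition (b) is usually phrased through sparse eigenvalues. Assuming the common definitions φ_min(m) = min over ||b||_0 ≤ m of b^T C b / b^T b and φ_max(m) = the corresponding maximum, with C = X^T X / n, a brute-force sketch follows. The helper `sparse_eigenvalues` is hypothetical, and its cost grows like (p choose m), so it only suits toy problems. By eigenvalue interlacing it is enough to scan principal submatrices of size exactly m.

```python
import itertools

import numpy as np


def sparse_eigenvalues(C, m):
    """Brute-force (phi_min(m), phi_max(m)) of a symmetric Gram matrix C.

    phi_min(m) is the smallest Rayleigh quotient b' C b / b' b over
    vectors b with at most m nonzero entries; phi_max(m) is the largest.
    Each such quotient lies in the eigenvalue range of a principal
    submatrix C[A, A], and by interlacing |A| = m suffices.
    """
    p = C.shape[0]
    if not 1 <= m <= p:
        raise ValueError("m must satisfy 1 <= m <= p")
    lo, hi = np.inf, -np.inf
    for A in itertools.combinations(range(p), m):
        eig = np.linalg.eigvalsh(C[np.ix_(A, A)])  # ascending order
        lo = min(lo, eig[0])
        hi = max(hi, eig[-1])
    return float(lo), float(hi)


# Example: columns 0 and 1 are nearly collinear, column 2 is orthogonal.
C = np.array([[1.0, 0.99, 0.0],
              [0.99, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(sparse_eigenvalues(C, 1))
print(sparse_eigenvalues(C, 2))
```

In this example every 1-sparse eigenvalue equals 1, but the nearly collinear pair pushes φ_min(2) down to about 0.01 and φ_max(2) up to about 1.99. This is one way highly correlated variables show up in sparse-eigenvalue conditions.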

Primary Subjects: 62J07
Secondary Subjects: 62F07
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1232115934
Digital Object Identifier: doi:10.1214/07-AOS582
Mathematical Reviews number (MathSciNet): MR2488351
Zentralblatt MATH identifier: 1155.62050

2013 © Institute of Mathematical Statistics