Source: Ann. Statist.
Volume 35, Number 6
In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y=Xβ+z, where β∈Rp is a parameter vector of interest, X is a data matrix with possibly far fewer rows than columns, n≪p, and the zi’s are i.i.d. N(0, σ2). Is it possible to estimate β reliably based on the noisy data y?
To estimate β, we introduce a new estimator—we call it the Dantzig selector—which is a solution to the ℓ1-regularization problem
where r is the residual vector y−Xβ̃ and t is a positive scalar. We show that if X obeys a uniform uncertainty principle (with unit-normed columns) and if the true parameter vector β is sufficiently sparse (which here roughly guarantees that the model is identifiable), then with very large probability,
‖β̂−β‖ℓ22≤C2⋅2log p⋅(σ2+∑imin(βi2, σ2)).
Our results are nonasymptotic and we give values for the constant C. Even though n may be much smaller than p, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information about which coordinates are nonzero, and which were above the noise level.
In multivariate regression and from a model selection viewpoint, our result says that it is possible nearly to select the best subset of variables by solving a very simple convex program, which, in fact, can easily be recast as a convenient linear program (LP).
Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control 19 716–723.
Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations (with discussion). J. Amer. Statist. Assoc. 96 939–967.
Baraud, Y. (2000). Model selection for regression on a fixed design. Probab. Theory Related Fields 117 467–493.
Barron, A. R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413.
Barron, A. R. and Cover, T. M. (1991). Minimum complexity density estimation. IEEE Trans. Inform. Theory 37 1034–1054.
Birgé, L. and Massart, P. (1997). From model selection to adaptive estimation. In Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen and G. L. Yang, eds.) 55–87. Springer, New York.
Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3 203–268.
Boyd, S. and Vandenberghe L. (2004). Convex Optimization. Cambridge Univ. Press.
Candès, E. J. and Romberg, J. (2005). Practical signal recovery from random projections. In Computational Imaging III: Proc. SPIE International Symposium on Electronic Imaging 1 76–86. San Jose, CA.
Candès, E. J., Romberg, J. and Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59 1207–1223.
Candès, E. J., Romberg, J. and Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52 489–509.
Candès, E. J., Rudelson, M., Vershynin, R. and Tao, T. (2005). Error correction via linear programming. In Proc. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS) 295–308. IEEE, Los Alamitos, CA.
Candès, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
Candès, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406–5425.
Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61.
Daniel, B. L., Yen, Y. F., Glover, G. H. et al. (1998). Breast disease: Dynamic spiral MR imaging. Radiology 209 499–509.
Daubechies, I. (2005). Personal communication.
Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 797–829.
Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289–1306.
Donoho, D. L. and Huo, X. (2001). Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47 2845–2862.
Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425–455.
Donoho, D. L. and Johnstone, I. M. (1994). Ideal denoising in an orthonormal basis chosen from a library of bases. C. R. Acad. Sci. Paris Sér. I Math. 319 1317–1322.
Donoho, D. L. and Johnstone, I. M. (1995). Empirical atomic decomposition. Unpublished manuscript.
Elad, M. and Bruckstein, A. M. (2002). A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Trans. Inform. Theory 48 2558–2567.
Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975.
Fuchs, J. (2004). On sparse representations in arbitrary redundant bases. IEEE Trans. Inform. Theory 50 1341–1344.
Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
Haupt, J. and Nowak, R. (2006). Signal reconstruction from noisy random projections. IEEE Trans. Inform. Theory 52 4036–4048.
Kettenring, J., Lindsay, B. and Siegmund, D., eds. (2003). Statistics: Challenges and opportunities for the twenty-first century. NSF report. Available at www.pnl.gov/scales/docs/nsf_report.pdf.
Mallows, C. L. (1973). Some comments on $C_P$. Technometrics 15 661–675.
Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM J. Comput. 24 227–234.
Peters, D. C., Korosec, F. R., Grist, T. M., Block, W. F., Holden, J. E., Vigen, K. K. and Mistretta, C. A. (2000). Undersampled projection reconstruction applied to MR angiography. Magnetic Resonance in Medicine 43 91–101.
Rudin, L. I., Osher, S. and Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D 60 259–268.
Sardy, S., Bruce, A. G. and Tseng, P. (2000). Block coordinate relaxation methods for nonparametric wavelet denoising. J. Comput. Graph. Statist. 9 361–379.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
Szarek, S. J. (1991). Condition numbers of random matrices. J. Complexity 7 131–149.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.