The Annals of Statistics

The Dantzig selector: Statistical estimation when p is much larger than n

Emmanuel Candès and Terence Tao
Source: Ann. Statist. Volume 35, Number 6 (2007), 2313-2351.

Abstract

In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations $y = X\beta + z$, where $\beta\in\mathbf{R}^{p}$ is a parameter vector of interest, X is a data matrix with possibly far fewer rows than columns, $n \ll p$, and the $z_i$'s are i.i.d. $N(0,\sigma^{2})$. Is it possible to estimate β reliably based on the noisy data y?

To estimate β, we introduce a new estimator (we call it the Dantzig selector), which is a solution to the $\ell_1$-regularization problem

\[\min_{\tilde{\beta}\in\mathbf{R}^{p}}\|\tilde{\beta}\|_{\ell_{1}}\quad\mbox{subject to}\quad \|X^{*}r\|_{\ell_{\infty}}\leq(1+t^{-1})\sqrt{2\log p}\cdot\sigma,\]

where r is the residual vector $y - X\tilde{\beta}$ and t is a positive scalar. We show that if X obeys a uniform uncertainty principle (with unit-normed columns) and if the true parameter vector β is sufficiently sparse (which here roughly guarantees that the model is identifiable), then with very large probability,

\[\|\hat{\beta}-\beta\|_{\ell_{2}}^{2}\leq C^{2}\cdot 2\log p\cdot\Bigl(\sigma^{2}+\sum_{i}\min(\beta_{i}^{2},\sigma^{2})\Bigr).\]

Our results are nonasymptotic and we give values for the constant C. Even though n may be much smaller than p, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information about which coordinates are nonzero, and which were above the noise level.
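To make the right-hand side of the bound concrete, the following minimal sketch evaluates the quantity $\sigma^{2}+\sum_{i}\min(\beta_{i}^{2},\sigma^{2})$ and the resulting bound for a given parameter vector. The function names `ideal_risk_proxy` and `error_bound` are hypothetical, and the constant `C` is left as an argument because the abstract does not state its value.

```python
import numpy as np


def ideal_risk_proxy(beta, sigma):
    """Return sigma^2 + sum_i min(beta_i^2, sigma^2).

    Each coordinate contributes beta_i^2 when it lies below the noise
    level and sigma^2 when it lies above it.
    """
    beta = np.asarray(beta, dtype=float)
    return sigma**2 + np.sum(np.minimum(beta**2, sigma**2))


def error_bound(beta, sigma, C):
    """Return C^2 * 2 log p * (sigma^2 + sum_i min(beta_i^2, sigma^2)).

    C is a placeholder: the abstract says the paper gives values for it,
    but does not state them, so the caller must supply one.
    """
    beta = np.asarray(beta, dtype=float)
    p = beta.size
    return C**2 * 2.0 * np.log(p) * ideal_risk_proxy(beta, sigma)


if __name__ == "__main__":
    # One large coordinate, one below the noise level, one exactly zero.
    beta = np.array([3.0, 0.05, 0.0])
    sigma = 0.1
    print("ideal risk proxy:", ideal_risk_proxy(beta, sigma))
    print("bound with C = 1 (illustrative only):", error_bound(beta, sigma, C=1.0))
```

In this example the large coordinate contributes only $\sigma^{2}$ rather than $\beta_{i}^{2}$, which is the sense in which the oracle knows which coordinates lie above the noise level.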

In multivariate regression and from a model selection viewpoint, our result says that it is possible nearly to select the best subset of variables by solving a very simple convex program, which, in fact, can easily be recast as a convenient linear program (LP).
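One standard way to recast the program as an LP is to introduce auxiliary variables u with $-u\leq\tilde{\beta}\leq u$ and minimize $\sum_i u_i$ under the same constraint on $X^{*}r$. The sketch below does this with SciPy's `linprog`. It is an illustration under stated assumptions, not the authors' implementation: the function name `dantzig_selector`, the dense constraint matrices, the HiGHS solver, and the demo values of n, p, σ and t are all choices made here.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, lam):
    """Solve min ||b||_1 subject to ||X^T (y - X b)||_inf <= lam as an LP.

    LP variables are (b, u) in R^{2p}; we minimize sum(u) subject to
    -u <= b <= u and -lam <= X^T (y - X b) <= lam.
    """
    n, p = X.shape
    G = X.T @ X          # Gram matrix
    Xty = X.T @ y
    I = np.eye(p)
    Z = np.zeros((p, p))

    c = np.concatenate([np.zeros(p), np.ones(p)])  # objective: sum of u
    A_ub = np.block([
        [I, -I],    #  b - u <= 0
        [-I, -I],   # -b - u <= 0
        [-G, Z],    #  X^T y - G b <= lam
        [G, Z],     #  G b - X^T y <= lam
    ])
    b_ub = np.concatenate([np.zeros(p), np.zeros(p), lam - Xty, lam + Xty])
    bounds = [(None, None)] * p + [(0, None)] * p  # b free, u >= 0

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


if __name__ == "__main__":
    # Illustrative synthetic problem with n much smaller than p.
    rng = np.random.default_rng(0)
    n, p, sigma, t = 40, 100, 0.1, 1.0
    X = rng.standard_normal((n, p))
    X /= np.linalg.norm(X, axis=0)           # unit-normed columns
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]              # sparse true parameter vector
    y = X @ beta + sigma * rng.standard_normal(n)

    lam = (1 + 1 / t) * np.sqrt(2 * np.log(p)) * sigma
    beta_hat = dantzig_selector(X, y, lam)
    print("squared l2 error:", np.sum((beta_hat - beta) ** 2))
```

When X is the identity, the constraint decouples across coordinates and the solution reduces to soft-thresholding y at level `lam`, which gives a quick sanity check on the formulation.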

Primary Subjects: 62C05, 62G05
Secondary Subjects: 94A08, 94A12
Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1201012958
Digital Object Identifier: doi:10.1214/009053606000001523
Mathematical Reviews number (MathSciNet): MR2382644
Zentralblatt MATH identifier: 1139.62019

2013 © Institute of Mathematical Statistics
