## The Annals of Statistics

### CoCoLasso for high-dimensional error-in-variables regression

#### Abstract

Much theoretical and applied work has been devoted to high-dimensional regression with clean data. However, we often face corrupted data in many applications where missing data and measurement errors cannot be ignored. Loh and Wainwright [Ann. Statist. 40 (2012) 1637–1664] proposed a nonconvex modification of the Lasso for doing high-dimensional regression with noisy and missing data. It is generally agreed that the virtues of convexity contribute fundamentally to the success and popularity of the Lasso. In light of this, we propose a new method named CoCoLasso that is convex and can handle a general class of corrupted datasets. We establish the estimation error bounds of CoCoLasso and its asymptotic sign-consistent selection property. We further elucidate how standard cross-validation techniques can be misleading in the presence of measurement error and develop a novel calibrated cross-validation technique by using the basic idea in CoCoLasso. The calibrated cross-validation technique is of independent interest. We demonstrate the superior performance of our method over the nonconvex approach by simulation studies.
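To make the idea of a convex corrected Lasso concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes additive measurement error, $Z = X + A$, with a known error covariance `Sigma_A`; the function names (`nearest_psd`, `lasso_quadratic`, `corrected_lasso_sketch`) are hypothetical. The bias-corrected Gram matrix $Z^{\top}Z/n - \Sigma_A$ can have negative eigenvalues when $p > n$, which makes the Lasso-type objective nonconvex. The sketch replaces it with a positive semidefinite surrogate before solving. For simplicity, the surrogate is obtained by eigenvalue clipping (a Frobenius-norm projection), which is a swapped-in technique and may differ from the projection used in the paper.

```python
import numpy as np


def nearest_psd(S, eps=1e-6):
    """Clip eigenvalues of a symmetric matrix at eps (Frobenius-norm projection).

    Assumption: this simple projection stands in for whatever positive
    semidefinite approximation a full implementation would use.
    """
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.maximum(w, eps)) @ V.T


def soft(x, t):
    """Scalar soft-thresholding operator."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def lasso_quadratic(Sigma, rho, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for min_b 0.5 * b'Sigma b - rho'b + lam * ||b||_1.

    Requires Sigma to have strictly positive diagonal entries.
    """
    p = len(rho)
    b = np.zeros(p)
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(p):
            # rho_j minus the contribution of all coordinates except j
            r = rho[j] - Sigma[j] @ b + Sigma[j, j] * b[j]
            b[j] = soft(r, lam) / Sigma[j, j]
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b


def corrected_lasso_sketch(Z, y, Sigma_A, lam):
    """Bias-correct the Gram matrix, make it PSD, then solve a convex Lasso."""
    n = Z.shape[0]
    Sigma_hat = Z.T @ Z / n - Sigma_A  # unbiased for X'X/n, possibly indefinite
    rho = Z.T @ y / n                  # unbiased for X'y/n under additive error
    Sigma_tilde = nearest_psd(Sigma_hat)
    return lasso_quadratic(Sigma_tilde, rho, lam)


# Small synthetic demonstration (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, tau = 50, 100, 0.5
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
X = rng.normal(size=(n, p))
y = X @ beta + 0.5 * rng.normal(size=n)
Z = X + tau * rng.normal(size=(n, p))  # covariates observed with noise
Sigma_A = tau**2 * np.eye(p)

beta_hat = corrected_lasso_sketch(Z, y, Sigma_A, lam=0.5)
print("nonzero coefficients:", np.flatnonzero(beta_hat))
```

Because the surrogate matrix is positive definite, the objective solved by `lasso_quadratic` is convex, so coordinate descent is not trapped in spurious local minima. The penalty level `lam=0.5` is an arbitrary choice for illustration.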

#### Article information

Source
Ann. Statist. Volume 45, Number 6 (2017), 2400–2426.

Dates
Revised: August 2016
First available in Project Euclid: 15 December 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1513328577

Digital Object Identifier
doi:10.1214/16-AOS1527

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62F12: Asymptotic properties of estimators

#### Citation

Datta, Abhirup; Zou, Hui. CoCoLasso for high-dimensional error-in-variables regression. Ann. Statist. 45 (2017), no. 6, 2400–2426. doi:10.1214/16-AOS1527. https://projecteuclid.org/euclid.aos/1513328577

#### References

• [1] Belloni, A., Rosenbaum, M. and Tsybakov, A. B. (2014). Linear and conic programming estimators in high-dimensional errors-in-variables models. Preprint. Available at arXiv:1408.0241.
• [2] Belloni, A., Rosenbaum, M. and Tsybakov, A. B. (2016). An $\{\ell_{1},\ell_{2},\ell_{\infty}\}$-regularization approach to high-dimensional errors-in-variables models. Electron. J. Stat. 10 1729–1750.
• [3] Benjamini, Y. and Speed, T. P. (2012). Estimation and correction for GC-content bias in high throughput sequencing. Nucleic Acids Res. 40 72.
• [4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [5] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 1–122.
• [6] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
• [7] Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
• [8] Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
• [9] Duchi, J., Shalev-Shwartz, S., Singer, Y. and Chandra, T. (2008). Efficient projections onto the L1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning (ICML ’08) 272–279. ACM, New York.
• [10] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
• [11] Efron, B., Hastie, T. and Tibshirani, R. (2007). Discussion: “The Dantzig selector: Statistical estimation when $p$ is much larger than $n$” [Ann. Statist. 35 (2007), no. 6, 2313–2351; MR2382644] by E. Candes and T. Tao. Ann. Statist. 35 2358–2364.
• [12] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• [13] Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In International Congress of Mathematicians. Vol. III 595–622. Eur. Math. Soc., Zürich.
• [14] Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statist. Sinica 20 101–148.
• [15] Friedman, J. H., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1–22.
• [16] Hastie, T., Tibshirani, R. and Friedman, J. (2011). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
• [17] Loh, P. L. and Wainwright, M. J. (2012). High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Ann. Statist. 40 1637–1664.
• [18] Mai, Q., Zou, H. and Yuan, M. (2012). A direct approach to sparse discriminant analysis in ultra-high dimensions. Biometrika 99 29–42.
• [19] Purdom, E. and Holmes, S. P. (2005). Error distribution for gene expression data. Stat. Appl. Genet. Mol. Biol. 4 Art. 16, 35.
• [20] Rosenbaum, M. and Tsybakov, A. B. (2010). Sparse recovery under matrix uncertainty. Ann. Statist. 38 2620–2651.
• [21] Rosenbaum, M. and Tsybakov, A. B. (2013). Improved matrix uncertainty selector. In From Probability to Statistics and Back: High-Dimensional Models and Processes. Inst. Math. Stat. (IMS) Collect. 9 276–290. IMS, Beachwood, OH.
• [22] Slijepcevic, S., Megerian, S. and Potkonjak, M. (2002). Location errors in wireless embedded sensor networks: Sources, models, and effects on applications. Mob. Comput. Commun. Rev. 6 67–78.
• [23] Sørensen, Ø., Frigessi, A. and Thoresen, M. (2013). Measurement error in LASSO: Impact and likelihood bias correction. Statist. Sinica 23. To appear.
• [24] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58 267–288.
• [25] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 91–108.
• [26] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
• [27] Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
• [28] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• [29] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
• [30] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
• [31] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301–320.