Electronic Journal of Statistics

A general family of trimmed estimators for robust high-dimensional data analysis

Eunho Yang, Aurélie C. Lozano, and Aleksandr Aravkin

Abstract

We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications, which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using convex and non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data.
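To make the trimming idea concrete: the estimators studied here jointly optimize over the model parameters and a set of per-sample weights that decide which observations to trust. Up to scaling conventions (an assumption here, not a formula quoted from the paper), the general trimmed M-estimator takes the form

    minimize over (theta, w):   sum_{i=1}^{n} w_i * L(theta; z_i) + lambda * R(theta)
    subject to:                 w in [0, 1]^n,  sum_i w_i = h,

where L is the loss, R the structural regularizer, and h < n the number of samples retained. In the Least Trimmed Squares Lasso special case, a natural way to compute such an estimate alternates between re-selecting the h best-fitting samples and refitting an l1-penalized regression on them. The Python sketch below illustrates this scheme; the function name trimmed_lasso, the scikit-learn Lasso solver, the random initialization, and the iteration budget are illustrative assumptions, not the authors' algorithm.

    import numpy as np
    from sklearn.linear_model import Lasso

    def trimmed_lasso(X, y, h, alpha=0.1, n_steps=20, seed=0):
        """Sketch of a trimmed Lasso: keep h of n samples, fit l1-penalized LS."""
        n = X.shape[0]
        rng = np.random.default_rng(seed)
        keep = np.sort(rng.choice(n, size=h, replace=False))  # random initial subset
        model = Lasso(alpha=alpha)
        for _ in range(n_steps):
            model.fit(X[keep], y[keep])                # refit on currently kept samples
            resid = (y - model.predict(X)) ** 2        # squared residuals on all samples
            new_keep = np.sort(np.argsort(resid)[:h])  # trim: keep h smallest residuals
            if np.array_equal(new_keep, keep):         # subset stabilized; stop
                break
            keep = new_keep
        return model.coef_, keep

For example, with 100 observations of which roughly 10 are suspected to be corrupted, trimmed_lasso(X, y, h=90) fits the model on the 90 best-fitting samples while the remaining 10 exert no influence on the final coefficients.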

Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 3519-3553.

Dates
Received: March 2018
First available in Project Euclid: 22 October 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1540195547

Digital Object Identifier
doi:10.1214/18-EJS1470

Keywords
Lasso, robust estimation, high-dimensional variable selection, sparse learning

Rights
Creative Commons Attribution 4.0 International License.

Citation

Yang, Eunho; Lozano, Aurélie C.; Aravkin, Aleksandr. A general family of trimmed estimators for robust high-dimensional data analysis. Electron. J. Statist. 12 (2018), no. 2, 3519--3553. doi:10.1214/18-EJS1470. https://projecteuclid.org/euclid.ejs/1540195547

