The Annals of Applied Statistics

Robust elastic net estimators for variable selection and identification of proteomic biomarkers

Gabriela V. Cohen Freue, David Kepplinger, Matías Salibián-Barrera, and Ezequiel Smucler

Full-text: Open access

Abstract

In large-scale quantitative proteomic studies, scientists measure the abundance of thousands of proteins from the human proteome in search of novel biomarkers for a given disease. Penalized regression estimators can be used to identify potential biomarkers among a large set of measured molecular features. Yet, the performance and statistical properties of these estimators depend on the loss and penalty functions used to define them. Motivated by a real plasma proteomic biomarkers study, we propose a new class of penalized robust estimators based on the elastic net penalty, which can be tuned to keep groups of correlated variables together in the selected model and maintain robustness against possible outliers. We also propose an efficient algorithm to compute our robust penalized estimators and derive a data-driven method to select the penalty term. Our robust penalized estimators have very good robustness properties and are also consistent under certain regularity conditions. Numerical results show that our robust estimators compare favorably to other robust penalized estimators. Using our proposed methodology for the analysis of the proteomics data, we identify new potentially relevant biomarkers of cardiac allograft vasculopathy that are not found with nonrobust alternatives. The selected model is validated in a new set of 52 test samples and achieves an area under the receiver operating characteristic curve (AUC) of 0.85.
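
As a rough illustration of the kind of objective the abstract describes, the sketch below pairs an elastic net penalty with a robust measure of residual size. The abstract does not give the loss, so several choices here are assumptions made for illustration only: the squared M-scale of the residuals as the loss, the Tukey bisquare rho function, the constants `c` and `delta`, and the function names. The penalty uses one common parameterization of the elastic net, with `alpha` mixing the L1 and squared L2 terms.

```python
import numpy as np


def elastic_net_penalty(beta, lam, alpha):
    """Return lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))
    l2_sq = np.sum(beta ** 2)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


def bisquare_rho(t, c=1.54764):
    """Tukey bisquare rho, scaled so that rho(t) = 1 whenever |t| >= c."""
    u = np.clip(np.abs(t) / c, 0.0, 1.0)
    return 1.0 - (1.0 - u ** 2) ** 3


def m_scale(residuals, delta=0.5, c=1.54764, tol=1e-10, max_iter=200):
    """Solve mean(rho(r / s)) = delta for s by fixed-point iteration.

    delta and c are illustrative defaults (assumptions), not values
    taken from the abstract.
    """
    r = np.asarray(residuals, dtype=float)
    s = np.median(np.abs(r)) / 0.6745  # normalized MAD as a starting value
    if s <= 0.0:
        return 0.0
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(bisquare_rho(r / s, c)) / delta)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s


def penalized_s_objective(beta, intercept, X, y, lam, alpha):
    """Squared M-scale of the residuals plus the elastic net penalty."""
    r = y - intercept - X @ beta
    return m_scale(r) ** 2 + elastic_net_penalty(beta, lam, alpha)
```

Because the rho function is bounded, a single wild residual contributes at most 1/n to the mean in the scale equation, so it cannot inflate the scale without limit. Setting `alpha=1` reduces the penalty to the lasso, and `alpha=0` to a ridge penalty.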

Article information

Source
Ann. Appl. Stat., Volume 13, Number 4 (2019), 2065-2090.

Dates
Received: March 2018
Revised: February 2019
First available in Project Euclid: 28 November 2019

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1574910036

Digital Object Identifier
doi:10.1214/19-AOAS1269

Keywords
Robust estimation; regularized estimation; penalized estimation; elastic net penalty; proteomics biomarkers

Citation

Cohen Freue, Gabriela V.; Kepplinger, David; Salibián-Barrera, Matías; Smucler, Ezequiel. Robust elastic net estimators for variable selection and identification of proteomic biomarkers. Ann. Appl. Stat. 13 (2019), no. 4, 2065–2090. doi:10.1214/19-AOAS1269. https://projecteuclid.org/euclid.aoas/1574910036


Supplemental materials

  • Supplementary material for “Robust elastic net estimators for variable selection and identification of proteomic biomarkers”. We provide additional details on the PENSE algorithm, its properties, and mathematical proofs.