The Annals of Applied Statistics

Random lasso

Sijian Wang, Bin Nan, Saharon Rosset, and Ji Zhu

Full-text: Open access

Abstract

We propose a computationally intensive method, the random lasso, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso is applied to many bootstrap samples, each using a set of randomly selected covariates. This step yields a measure of importance for each covariate. In step 2, a procedure similar to step 1 is carried out, except that for each bootstrap sample a subset of covariates is randomly selected with unequal selection probabilities determined by the covariates’ importance. Adaptive lasso may be used in the second step, with weights determined by the importance measures. The final set of covariates and their coefficients are determined by averaging the bootstrap results obtained from step 2. The proposed method alleviates some of the limitations of lasso, elastic-net and related methods, noted especially in the context of microarray data analysis: it tends to either remove highly correlated variables altogether or select them all, and it maintains maximal flexibility in estimating their coefficients, particularly when they have different signs; the number of selected variables is no longer limited by the sample size; and the resulting prediction accuracy is competitive with or superior to that of the alternatives. We illustrate the proposed method with extensive simulation studies and apply it to a glioblastoma microarray data analysis.
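The two steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the importance measure is taken to be the absolute value of the averaged step 1 coefficients, the lasso penalty `lam` is fixed rather than tuned, the subset sizes `q1` and `q2` are supplied by the caller, and the helper names (`lasso_cd`, `bootstrap_pass`, `random_lasso`) are hypothetical. The lasso itself is fitted with a plain coordinate descent routine so that only NumPy is needed.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta because beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (excluding j).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta

def bootstrap_pass(X, y, q, lam, n_boot, rng, probs=None):
    """Average lasso coefficients over bootstrap samples, each fitted on q
    randomly chosen covariates (uniformly if probs is None)."""
    n, p = X.shape
    coefs = np.zeros((n_boot, p))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)                       # bootstrap rows
        cols = rng.choice(p, size=q, replace=False, p=probs)    # covariate subset
        Xb = X[rows][:, cols]
        yb = y[rows]
        Xb = Xb - Xb.mean(axis=0)   # center so no intercept is needed
        yb = yb - yb.mean()
        coefs[b, cols] = lasso_cd(Xb, yb, lam)
    # Covariates not drawn in a given sample contribute a zero coefficient.
    return coefs.mean(axis=0)

def random_lasso(X, y, q1, q2, lam=0.1, n_boot=50, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: uniform covariate sampling; importance = |mean coefficient|
    # (an assumption of this sketch).
    importance = np.abs(bootstrap_pass(X, y, q1, lam, n_boot, rng))
    # Step 2: sample covariates with probability proportional to importance.
    # A tiny offset keeps every probability strictly positive.
    weights = importance + 1e-12
    probs = weights / weights.sum()
    return bootstrap_pass(X, y, q2, lam, n_boot, rng, probs)
```

A call such as `random_lasso(X, y, q1=5, q2=5)` returns one averaged coefficient per covariate. Two things the abstract mentions are left out of this sketch: the optional adaptive lasso weighting in step 2, and any rule for turning the averaged coefficients into a final selected set.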

Article information

Source
Ann. Appl. Stat., Volume 5, Number 1 (2011), 468-485.

Dates
First available in Project Euclid: 21 March 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1300715199

Digital Object Identifier
doi:10.1214/10-AOAS377

Mathematical Reviews number (MathSciNet)
MR2810406

Zentralblatt MATH identifier
1220.62091

Keywords
Lasso; microarray; regularization; variable selection

Citation

Wang, Sijian; Nan, Bin; Rosset, Saharon; Zhu, Ji. Random lasso. Ann. Appl. Stat. 5 (2011), no. 1, 468–485. doi:10.1214/10-AOAS377. https://projecteuclid.org/euclid.aoas/1300715199


References

  • Breiman, L. (1995). Better subset regression using the non-negative garrote. Technometrics 37 373–384.
  • Breiman, L. (2001). Random forests. Mach. Learn. 45 5–32.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Freije, W., Castro-Vargas, F. E., Fang, Z., Horvath, S., Cloughesy, T., Liau, L. M., Mischel, P. S. and Nelson, S. F. (2004). Gene expression profiling of gliomas strongly predicts survival. Cancer Res. 64 6503–6510.
  • Horvath, S., Zhang, B., Carlson, M., Lu, K. V., Zhu, S., Felciano, R. M., Laurance, M. F., Zhao, W., Shu, Q., Lee, Y., Scheck, A. C., Liau, L. M., Wu, H., Geschwind, D. H., Febbo, P. G., Kornblum, H. I., Cloughesy, T. F., Nelson, S. F. and Mischel, P. S. (2006). Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a novel molecular target. Proc. Natl. Acad. Sci. 103 17402–17407.
  • Meinshausen, N. (2007). Relaxed lasso. Comput. Statist. Data Anal. 52 374–393.
  • Park, M. and Hastie, T. (2008). Penalized logistic regression for detecting gene interactions. Biostatistics 9 30–50.
  • Radchenko, P. and James, G. (2008). Variable inclusion and shrinkage algorithms. J. Amer. Statist. Assoc. 103 1304–1315.
  • Tatenhorst, L., Senner, V., Puttmann, S. and Paulus, W. (2004). Regulators of G-protein signaling 3 and 4 (RGS3, RGS4) are associated with glioma cell motility. J. Neuropathol. Exp. Neurol. 63 210–222.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Xie, Y., Chan, H., Fan, J., Chen, Y., Young, J., Li, W., Miao, X., Yuan, Z., Wang, H., Tam, P. K. and Ren, Y. (2007). Involvement of visinin-like protein-1 (VSNL-1) in regulating proliferative and invasive properties of neuroblastoma. Carcinogenesis 28 2122–2130.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • Zou, M., Al-Baradie, R. S., Al-Hindi, H., Farid, N. R. and Shi, Y. (2005). Gene overexpression is associated with invasion and metastasis of papillary thyroid carcinoma. Br. J. Cancer 93 1277–1284.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301–320.