Source: Ann. Statist., Volume 37, Number 4
We consider the problem of model selection and estimation in situations where the number of parameters diverges with the sample size. When the dimension is high, an ideal method should have the oracle property [J. Amer. Statist. Assoc. 96 (2001) 1348–1360; Ann. Statist. 32 (2004) 928–961], which ensures optimal large-sample performance. Furthermore, high dimensionality often induces collinearity, which an ideal method should also handle properly. Many existing variable selection methods fail to achieve both goals simultaneously. In this paper, we propose the adaptive elastic-net, which combines the strengths of quadratic regularization and adaptively weighted lasso shrinkage. Under weak regularity conditions, we establish the oracle property of the adaptive elastic-net. We show by simulations that the adaptive elastic-net deals with the collinearity problem better than other oracle-like methods, thus enjoying much improved finite-sample performance.
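The two-stage idea described above can be sketched in code: an ordinary elastic-net fit supplies data-driven weights, and a second, adaptively weighted elastic-net fit is then computed with those weights on the ℓ1 term. The sketch below, which is a minimal illustration and not the authors' implementation, uses a naive coordinate-descent solver; the penalty parameterization, the weight exponent `gamma`, the small constant guarding against division by zero, and the final rescaling factor are all assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def enet_cd(X, y, lam1, lam2, weights=None, n_iter=200):
    """Naive coordinate descent for the (weighted) elastic net:
    (1/2n)||y - Xb||^2 + lam1 * sum_j w_j |b_j| + (lam2/2)||b||^2."""
    n, p = X.shape
    w = np.ones(p) if weights is None else weights
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                            # residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]             # remove coordinate j's contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam1 * w[j]) / (col_sq[j] + lam2)
            r -= X[:, j] * b[j]             # add the updated contribution back
    return b

def adaptive_enet(X, y, lam1, lam2, gamma=1.0):
    """Two-stage adaptive elastic-net sketch: the first-stage elastic-net
    coefficients define adaptive weights for the second-stage fit."""
    b_enet = enet_cd(X, y, lam1, lam2)
    # Large weights on coefficients the first stage shrank to (near) zero;
    # the small epsilon avoids division by zero and is an assumption.
    w = 1.0 / (np.abs(b_enet) + 1e-8) ** gamma
    b = enet_cd(X, y, lam1, lam2, weights=w)
    # Bias-correcting rescaling in the spirit of the elastic-net literature;
    # the exact factor depends on the penalty parameterization.
    return (1 + lam2) * b
```

On a simulated sparse linear model, the second stage typically keeps the large coefficients nearly unbiased while the adaptive weights zero out the spurious ones, which is the behavior the oracle property formalizes.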
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Ann. Statist. 24 2350–2383.
Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
Candes, E., Wakin, M. and Boyd, S. (2008). Enhancing sparsity by reweighted l1 minimization. J. Fourier Anal. Appl. To appear.
Donoho, D. and Johnstone, I. (1994). Ideal spatial adaptation via wavelet shrinkage. Biometrika 81 425–455.
Donoho, D., Johnstone, I., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B 57 301–337.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In International Congress of Mathematicians 3 595–622.
Fan, J. and Lv, J. (2008). Sure independence screening for ultra-high-dimensional feature space. J. Roy. Statist. Soc. Ser. B 70 849–911.
Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
Fan, J., Peng, H. and Huang, T. (2005). Semilinear high-dimensional model for normalization of microarray data: A theoretical analysis and partial consistency (with discussion). J. Amer. Statist. Assoc. 100 781–813.
Huber, P. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1 799–821.
Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
Lam, C. and Fan, J. (2008). Profile-kernel likelihood inference with diverging number of parameters. Ann. Statist. 36 2232–2260.
Portnoy, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. Ann. Statist. 12 1298–1309.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
Wang, H., Li, R. and Tsai, C. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94 553–568.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301–320.
Zou, H., Hastie, T. and Tibshirani, R. (2007). On the degrees of freedom of the lasso. Ann. Statist. 35 2173–2192.