In this article we investigate the consistency of selection in regression models via the popular Lasso method. Here we depart from the traditional linear regression assumption and consider approximations of the regression function $f$ by elements of a given dictionary of $M$ functions. The target for consistency is the index set of those functions from this dictionary that realize the most parsimonious approximation to $f$ among all linear combinations belonging to an $L_2$ ball centered at $f$ and of radius $r_{n,M}^2$. In this framework we show that a consistent estimate of this index set can be derived via $\ell_1$-penalized least squares, with a data-dependent penalty and with tuning sequence $r_{n,M} > \sqrt{\log(Mn)/n}$, where $n$ is the sample size. Our results hold for any $1 \le M \le n^{\gamma}$, for any $\gamma > 0$.
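As an illustration only, here is a minimal NumPy sketch of $\ell_1$-penalized least squares over a dictionary, with the selected index set read off as $\{j : \hat\beta_j \neq 0\}$. It rests on several assumptions that the abstract does not spell out. First, the data-dependent penalty is taken to be $2 r_{n,M} \sum_j \|f_j\|_n |\beta_j|$, with $\|f_j\|_n^2 = \frac1n \sum_i f_j(X_i)^2$, added to $\frac1n \sum_i \big(Y_i - \sum_j \beta_j f_j(X_i)\big)^2$. Second, the problem is solved by cyclic coordinate descent, which is a standard Lasso solver and not necessarily the one used in the article. Third, the constant `c` in the hypothetical helper `tuning_sequence` merely stands in for a choice of $r_{n,M}$ above $\sqrt{\log(Mn)/n}$.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def tuning_sequence(n, M, c=1.5):
    """Hypothetical helper: r_{n,M} = c * sqrt(log(M n) / n); c > 1 is an assumption."""
    return c * np.sqrt(np.log(M * n) / n)

def weighted_lasso(F, y, r, n_iter=500, tol=1e-10):
    """Coordinate descent for the assumed objective
        (1/n) * ||y - F beta||^2 + 2 * r * sum_j w_j * |beta_j|,
    where F[i, j] = f_j(X_i) and w_j = ||f_j||_n (the assumed data-dependent weights).
    """
    n, M = F.shape
    w = np.sqrt((F ** 2).mean(axis=0))  # empirical norms ||f_j||_n
    a = w ** 2                          # (1/n) * sum_i f_j(X_i)^2
    beta = np.zeros(M)
    resid = y.astype(float).copy()      # current residual y - F @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(M):
            if a[j] == 0.0:             # an all-zero dictionary element is never selected
                continue
            # (1/n) * <f_j, y - sum_{k != j} beta_k f_k>
            rho = F[:, j] @ resid / n + a[j] * beta[j]
            new = soft_threshold(rho, r * w[j]) / a[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= delta * F[:, j]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:            # stop once a full sweep barely moves beta
            break
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, M = 200, 50
    F = rng.standard_normal((n, M))     # toy random dictionary evaluated at the X_i
    y = 2.0 * F[:, 0] - 3.0 * F[:, 3] + 0.5 * rng.standard_normal(n)
    beta = weighted_lasso(F, y, r=tuning_sequence(n, M, c=2.0))
    print("selected indices:", np.flatnonzero(beta))
```

Because each coordinate update is a closed-form soft-thresholding step, coefficients whose empirical correlation with the partial residual falls below $r_{n,M}\|f_j\|_n$ are set exactly to zero. This is what makes the estimate usable for selection rather than only for prediction.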