## Electronic Journal of Statistics

- Electron. J. Statist.
- Volume 8, Number 1 (2014), 328-354.

### Estimation and variable selection with exponential weights

Ery Arias-Castro and Karim Lounici

**Full-text: Open access**

#### Abstract

In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to achieve oracle inequalities for denoising/prediction. We show that such methods also succeed at variable selection and estimation under a near-minimal condition on the design matrix, instead of the much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variable selection under similar conditions.
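The flavor of an exponential weights procedure can be sketched on a toy problem: fit least squares on every small candidate support, then weight each fit by the exponential of its negative residual sum of squares plus a sparsity penalty. This is a minimal illustration in the spirit of exponential weighting/screening, not the authors' exact estimator; the model sizes, the $k\log p$ prior penalty, and all variable names are illustrative choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse linear model: y = X beta + noise, with 2 active coefficients.
n, p, sigma = 50, 8, 0.5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[1, 4]] = [2.0, -1.5]
y = X @ beta + sigma * rng.standard_normal(n)

def exponential_weights(X, y, sigma, max_size=3):
    """Aggregate least-squares fits over all supports of size <= max_size,
    weighting each fit by exp(-RSS / (2 sigma^2) - k log p)."""
    n, p = X.shape
    log_w, estimates, supports = [], [], []
    for k in range(max_size + 1):
        for S in itertools.combinations(range(p), k):
            idx = list(S)
            b = np.zeros(p)
            if k > 0:
                # Least-squares fit restricted to the candidate support S.
                b[idx] = np.linalg.lstsq(X[:, idx], y, rcond=None)[0]
            rss = np.sum((y - X @ b) ** 2)
            # Sparsity prior: supports of size k pay a k * log(p) penalty.
            log_w.append(-rss / (2 * sigma**2) - k * np.log(p))
            estimates.append(b)
            supports.append(S)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # stabilize before normalizing
    w /= w.sum()
    beta_hat = w @ np.array(estimates)       # exponentially weighted average
    best = supports[int(np.argmax(w))]       # highest-weight support
    return beta_hat, best

beta_hat, best_support = exponential_weights(X, y, sigma)
```

In practice the support space is far too large to enumerate, which is why the keywords mention a Gibbs sampler: the weights above define a posterior-like distribution over supports that can be sampled instead of summed exactly.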

#### Article information

**Source**

Electron. J. Statist., Volume 8, Number 1 (2014), 328-354.

**Dates**

First available in Project Euclid: 18 April 2014

**Permanent link to this document**

https://projecteuclid.org/euclid.ejs/1397826704

**Digital Object Identifier**

doi:10.1214/14-EJS883

**Mathematical Reviews number (MathSciNet)**

MR3195119

**Zentralblatt MATH identifier**

1294.62164

**Subjects**

Primary: 62J99: None of the above, but in this section

**Keywords**

Estimation, variable selection, model selection, sparse linear model, exponential weights, Gibbs sampler, identifiability condition

#### Citation

Arias-Castro, Ery; Lounici, Karim. Estimation and variable selection with exponential weights. Electron. J. Statist. 8 (2014), no. 1, 328--354. doi:10.1214/14-EJS883. https://projecteuclid.org/euclid.ejs/1397826704

#### References

- [1] Abramovich, F. and V. Grinshtein (2010). MAP model selection in Gaussian regression. *Electron. J. Stat.* 4, 932–949.
- [2] Alquier, P. and K. Lounici (2011). PAC-Bayesian theorems for sparse regression estimation with exponential weights. *Electronic Journal of Statistics* 5, 127–145. arXiv:1009.2707.
- [3] Bach, F. R. (2008). Bolasso: model consistent Lasso estimation through the bootstrap. In *Proceedings of the 25th International Conference on Machine Learning*, ICML '08, New York, NY, USA, pp. 33–40. ACM.
- [4] Bickel, P., Y. Ritov, and A. Tsybakov (2009). Simultaneous analysis of Lasso and Dantzig selector. *Annals of Statistics* 37(4), 1705–1732.
- [5] Birgé, L. and P. Massart (2001). Gaussian model selection. *J. Eur. Math. Soc. (JEMS)* 3(3), 203–268.
- [6] Bunea, F. (2008). Consistent selection via the Lasso for high dimensional approximating regression models. In *Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh*, Volume 3 of *Inst. Math. Stat. Collect.*, pp. 122–137. Beachwood, OH: Inst. Math. Statist.
- [7] Bunea, F. and A. Nobel (2008). Sequential procedures for aggregating arbitrary estimators of a conditional mean. *IEEE Trans. Inform. Theory* 54(4), 1725–1735.
- [8] Bunea, F., A. Tsybakov, and M. Wegkamp (2007). Sparsity oracle inequalities for the Lasso. *Electronic Journal of Statistics* 1, 169–194.
- [9] Cai, T. T. and L. Wang (2011). Orthogonal matching pursuit for sparse signal recovery with noise. *IEEE Trans. Inform. Theory* 57(7), 4680–4688.
- [10] Candès, E. and T. Tao (2007). The Dantzig selector: statistical estimation when $p$ is much larger than $n$. *Ann. Statist.* 35(6), 2313–2351.
- [11] Candès, E. J. and M. A. Davenport (2013). How well can we estimate a sparse vector? *Appl. Comput. Harmon. Anal.* 34(2), 317–323.
- [12] Candès, E. J. and Y. Plan (2009). Near-ideal model selection by $\ell_{1}$ minimization. *Ann. Statist.* 37(5A), 2145–2177.
- [13] Catoni, O. (2004). *Statistical Learning Theory and Stochastic Optimization*, Volume 1851 of *Lecture Notes in Mathematics*. Berlin: Springer-Verlag. Lecture notes from the 31st Summer School on Probability Theory held in Saint-Flour, July 8–25, 2001.
- [14] Chen, J. and Z. Chen (2008). Extended Bayesian information criteria for model selection with large model spaces. *Biometrika* 95(3), 759–771.
- [15] Chipman, H., E. I. George, and R. E. McCulloch (2001). The practical implementation of Bayesian model selection. In *Model Selection*, Volume 38 of *IMS Lecture Notes Monogr. Ser.*, pp. 65–134. Beachwood, OH: Inst. Math. Statist. With discussion by M. Clyde, Dean P. Foster, and Robert A. Stine, and a rejoinder by the authors.
- [16] Dalalyan, A. and J. Salmon (2011). Optimal aggregation of affine estimators. In *Proceedings of the 24th Annual Conference on Computational Learning Theory*, Budapest, Hungary.
- [17] Dalalyan, A. and J. Salmon (2012). Sharp oracle inequalities for aggregation of affine estimators. *Ann. Statist.* 40(4), 2327–2355.
- [18] Dalalyan, A. and A. Tsybakov (2007). Aggregation by exponential weighting and sharp oracle inequalities. In *Learning Theory*, Volume 4539 of *Lecture Notes in Comput. Sci.*, pp. 97–111. Berlin: Springer.
- [19] Fan, J. and J. Lv (2008). Sure independence screening for ultrahigh dimensional feature space. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 70(5), 849–911.
- [20] Fan, J. and J. Lv (2011). Nonconcave penalized likelihood with NP-dimensionality. *IEEE Trans. Inform. Theory* 57(8), 5467–5484.
- [21] Fan, J. and H. Peng (2004). Nonconcave penalized likelihood with a diverging number of parameters. *Ann. Statist.* 32(3), 928–961.
- [22] Gautier, E. and A. Tsybakov (2011). High-dimensional instrumental variables regression and confidence sets. Technical report, arXiv preprint 1105.2454v3.
- [23] Giraud, C. (2008). Mixing least-squares estimators when the variance is unknown. *Bernoulli* 14(4), 1089–1107.
- [24] Ji, P. and J. Jin (2012). UPS delivers optimal phase diagram in high-dimensional variable selection. *Ann. Statist.* 40(1), 73–103.
- [25] Jin, J., C. Zhang, and Q. Zhang (2012). Optimality of graphlet screening in high dimensional variable selection. Available online at http://arxiv.org/abs/1204.6452.
- [26] Juditsky, A., P. Rigollet, and A. B. Tsybakov (2008). Learning by mirror averaging. *Ann. Statist.* 36(5), 2183–2206.
- [27] Leung, G. and A. Barron (2006). Information theory and mixing least-squares regressions. *IEEE Transactions on Information Theory* 52(8), 3396–3410.
- [28] Lounici, K. (2007). Generalized mirror averaging and $D$-convex aggregation. *Math. Methods Statist.* 16(3), 246–259.
- [29] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. *Electronic Journal of Statistics* 2, 90–102.
- [30] Lounici, K. (2009). *Statistical Estimation in High-Dimension, Sparsity Oracle Inequalities*. Ph.D. thesis, University Paris Diderot - Paris 7.
- [31] Lounici, K., M. Pontil, A. Tsybakov, and S. van de Geer (2011). Oracle inequalities and optimal inference under group sparsity. *Ann. Statist.* 39(4), 2164–2204.
- [32] Meinshausen, N. and P. Bühlmann (2006). High dimensional graphs and variable selection with the Lasso. *Annals of Statistics* 34, 1436–1462.
- [33] Meinshausen, N. and B. Yu (2009). Lasso-type recovery of sparse representations for high-dimensional data. *Ann. Statist.* 37(1), 246–270.
- [34] Raskutti, G., M. J. Wainwright, and B. Yu (2011). Minimax rates of estimation for high-dimensional linear regression over $\ell_{q}$-balls. *IEEE Trans. Inform. Theory* 57(10), 6976–6994.
- [35] Rigollet, P. and A. Tsybakov (2011). Exponential screening and optimal rates of sparse estimation. *Ann. Statist.* 39(2), 731–771.
- [36] Rigollet, P. and A. B. Tsybakov (2012). Sparse estimation by exponential weighting. *Statist. Sci.* 27(4), 558–575.
- [37] Rudelson, M. and S. Zhou (2013). Reconstruction from anisotropic random measurements. *IEEE Trans. Inform. Theory* 59(6), 3434–3447.
- [38] Shao, J. (1997). An asymptotic theory for linear model selection. *Statist. Sinica* 7(2), 221–264. With comments and a rejoinder by the author.
- [39] Stewart, G. W. and J. G. Sun (1990). *Matrix Perturbation Theory*. Computer Science and Scientific Computing. Boston, MA: Academic Press Inc.
- [40] Vershynin, R. (2010). Introduction to the non-asymptotic analysis of random matrices. Available from http://arxiv.org/abs/1011.3027.
- [41] Verzelen, N. (2012). Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. *Electron. J. Stat.* 6, 38–90.
- [42] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). *IEEE Trans. Inform. Theory* 55(5), 2183–2202.
- [43] Yang, Y. (2004). Aggregating regression procedures to improve performance. *Bernoulli* 10(8), 25–47.
- [44] Yang, Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. *Biometrika* 92(4), 937–950.
- [45] Ye, F. and C.-H. Zhang (2010). Rate minimaxity of the Lasso and Dantzig selector for the $\ell_{q}$ loss in $\ell_{r}$ balls. *J. Mach. Learn. Res.* 11, 3519–3540.
- [46] Zhang, C.-H. (2007). Information-theoretic optimality of variable selection with concave penalty. Technical report, Dept. Statistics, Rutgers Univ.
- [47] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. *Ann. Statist.* 38(2), 894–942.
- [48] Zhang, C.-H. and T. Zhang (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. *Statist. Sci.* 27(4), 576–593.
- [49] Zhao, P. and B. Yu (2006). On model selection consistency of Lasso. *J. Mach. Learn. Res.* 7, 2541–2563.

#### The Institute of Mathematical Statistics and the Bernoulli Society

### More like this

- Simultaneous analysis of Lasso and Dantzig selector. Bickel, Peter J., Ritov, Ya'acov, and Tsybakov, Alexandre B., The Annals of Statistics, 2009
- High-dimensional graphs and variable selection with the Lasso. Meinshausen, Nicolai and Bühlmann, Peter, The Annals of Statistics, 2006
- The Dantzig selector: Statistical estimation when p is much larger than n. Candes, Emmanuel and Tao, Terence, The Annals of Statistics, 2007
- Oracle inequalities and optimal inference under group sparsity. Lounici, Karim, Pontil, Massimiliano, van de Geer, Sara, and Tsybakov, Alexandre B., The Annals of Statistics, 2011
- Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Xue, Lingzhou and Zou, Hui, The Annals of Statistics, 2012
- On the adaptive elastic-net with a diverging number of parameters. Zou, Hui and Zhang, Hao Helen, The Annals of Statistics, 2009
- Weaker Regularity Conditions and Sparse Recovery in High-Dimensional Regression. Wang, Shiqing, Shi, Yan, and Su, Limin, Journal of Applied Mathematics, 2014
- Bayesian Variable Selection and Estimation for Group Lasso. Xu, Xiaofan and Ghosh, Malay, Bayesian Analysis, 2015
- Normalized and standard Dantzig estimators: Two approaches. Mielniczuk, Jan and Szymanowski, Hubert, Electronic Journal of Statistics, 2015
- The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). van de Geer, Sara, Bühlmann, Peter, and Zhou, Shuheng, Electronic Journal of Statistics, 2011