## Bernoulli

• Volume 22, Number 3 (2016), 1937-1961.

### An analysis of penalized interaction models

#### Abstract

An important consideration for variable selection in interaction models is to design a penalty that respects the hierarchy of importance among the variables. A common approach is to include an interaction term only after the corresponding main effects are present. In this paper, we study several recently proposed approaches and present a unified analysis of the convergence rate for a class of estimators when the design satisfies the restricted eigenvalue condition. In particular, we show that, with probability tending to one, the resulting estimates converge at the rate $s\sqrt{\log p_{1}/n}$ in $\ell_{1}$ error, where $p_{1}$ is the ambient dimension, $s$ is the true dimension and $n$ is the sample size. We also give a new proof that the restricted eigenvalue condition holds with high probability when the main-effect variables and the errors follow sub-Gaussian distributions. Under this setup, the interactions are no longer Gaussian or sub-Gaussian even if the main effects are Gaussian, so existing results do not apply. This result is of independent interest.
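To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It assumes that the ambient dimension $p_{1}$ counts the main effects plus all pairwise products with $j < k$ (squared terms excluded), which the abstract does not specify. The helper names `interaction_design`, `respects_strong_hierarchy` and `l1_rate` are hypothetical, and the rate function omits the constants that a formal bound would carry.

```python
import itertools
import math

import numpy as np


def interaction_design(X):
    """Append all pairwise products x_j * x_k (j < k) to the main effects.

    Assumption: squared terms are excluded; the paper may define the
    interaction design differently.
    """
    n, p = X.shape
    pairs = list(itertools.combinations(range(p), 2))
    products = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return np.hstack([X, products]), pairs


def respects_strong_hierarchy(main_support, interaction_support):
    """True if every selected interaction (j, k) has both main effects selected."""
    return all(j in main_support and k in main_support
               for j, k in interaction_support)


def l1_rate(s, p1, n):
    """The rate s * sqrt(log(p1) / n) from the abstract, constants omitted."""
    return s * math.sqrt(math.log(p1) / n)


rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))        # Gaussian main effects
Z, pairs = interaction_design(X)       # products of Gaussians are not Gaussian
p1 = Z.shape[1]                        # 10 main effects + 45 pairs = 55 columns
```

Under these assumptions, `respects_strong_hierarchy({0, 1}, [(0, 1)])` holds, while a model that selects the interaction `(0, 1)` with only main effect `0` present violates the constraint. Calling `l1_rate(s, p1, n)` shows how the rate grows linearly in the sparsity `s` but only logarithmically in `p1`.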

#### Article information

Source
Bernoulli, Volume 22, Number 3 (2016), 1937-1961.

Dates
Revised: November 2014
First available in Project Euclid: 16 March 2016

https://projecteuclid.org/euclid.bj/1458133003

Digital Object Identifier
doi:10.3150/15-BEJ715

Mathematical Reviews number (MathSciNet)
MR3474837

Zentralblatt MATH identifier
1360.62392

#### Citation

Zhao, Junlong; Leng, Chenlei. An analysis of penalized interaction models. Bernoulli 22 (2016), no. 3, 1937-1961. doi:10.3150/15-BEJ715. https://projecteuclid.org/euclid.bj/1458133003
