## Electronic Journal of Statistics

### Model selection in regression under structural constraints

#### Abstract

This paper considers model selection in regression under additional structural constraints on the admissible models, where the number of potential predictors might be even larger than the available sample size. We develop a Bayesian formalism that serves as a natural tool for generating a wide class of model selection criteria based on penalized least squares estimation with various complexity penalties associated with a prior on the model size. The resulting criteria are adaptive to structural constraints. We establish an upper bound for the quadratic risk of the resulting MAP estimator and a corresponding lower bound for the minimax risk over the set of admissible models of a given size. We then specify the class of priors (and, therefore, the class of complexity penalties) for which, under a “nearly-orthogonal” design, the MAP estimator is asymptotically at least nearly-minimax (up to a log-factor) simultaneously over an entire range of sparse and dense setups. Moreover, when the numbers of admissible models are “small” (e.g., ordered variable selection) or, conversely, in the case of complete variable selection, the proposed estimator achieves the exact minimax rates.
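As a rough illustration of the kind of criterion the abstract describes, the following minimal sketch performs penalized least squares selection in the ordered variable selection setting, where the only admissible model of size k consists of the first k predictors. The function name `select_ordered_model` and the linear penalty used in the demo are assumptions made for illustration; they are not the prior-based complexity penalties derived in the paper.

```python
import numpy as np


def select_ordered_model(X, y, penalty):
    """Return (k, criterion) minimizing RSS_k + penalty(k) over nested models.

    Model k uses the first k columns of X (ordered variable selection).
    `penalty` is a user-supplied function of the model size k; it is a
    hypothetical stand-in for a complexity penalty induced by a prior.
    """
    n, p = X.shape
    # Empty model: the fit is zero, so the residual sum of squares is ||y||^2.
    best_k, best_crit = 0, float(y @ y) + penalty(0)
    for k in range(1, p + 1):
        Xk = X[:, :k]
        # Least squares fit on the first k predictors.
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        rss = float(np.sum((y - Xk @ beta) ** 2))
        crit = rss + penalty(k)
        if crit < best_crit:
            best_k, best_crit = k, crit
    return best_k, best_crit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, sigma = 50, 10, 1.0
    X = rng.standard_normal((n, p))
    beta_true = np.concatenate([np.full(3, 4.0), np.zeros(p - 3)])
    y = X @ beta_true + sigma * rng.standard_normal(n)
    # Illustrative linear penalty 2 * sigma^2 * k (an assumption of this sketch).
    k_hat, crit = select_ordered_model(X, y, lambda k: 2.0 * sigma**2 * k)
    print("selected model size:", k_hat)
```

In this sketch the penalty depends on a model only through its size, mirroring the abstract's description of complexity penalties tied to a prior on the model size. Under other structural constraints the search would instead run over whichever set of models is admissible.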

#### Article information

Source
Electron. J. Statist., Volume 7 (2013), 480-498.

Dates
First available in Project Euclid: 21 February 2013

https://projecteuclid.org/euclid.ejs/1361455094

Digital Object Identifier
doi:10.1214/13-EJS780

Mathematical Reviews number (MathSciNet)
MR3035263

Zentralblatt MATH identifier
1337.62156

#### Citation

Abramovich, Felix; Grinshtein, Vadim. Model selection in regression under structural constraints. Electron. J. Statist. 7 (2013), 480–498. doi:10.1214/13-EJS780. https://projecteuclid.org/euclid.ejs/1361455094

#### References

• [1] Abramovich, F., Benjamini, Y., Donoho, D.L. and Johnstone, I.M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34, 584–653.
• [2] Abramovich, F. and Grinshtein, V. (2010). MAP model selection in Gaussian regression. Electron. J. Statist. 4, 932–949.
• [3] Abramovich, F., Grinshtein, V. and Pensky, M. (2007). On optimality of Bayesian testimation in the normal means problem. Ann. Statist. 35, 2261–2286.
• [4] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (eds. B.N. Petrov and F. Csáki). Akadémiai Kiadó, Budapest, 267–281.
• [5] Bien, J., Taylor, J. and Tibshirani, R. (2013). A Lasso for hierarchical interactions. Ann. Statist., to appear.
• [6] Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3, 203–268.
• [7] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Theory Relat. Fields 138, 33–73.
• [8] Bunea, F., Tsybakov, A. and Wegkamp, M.H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35, 1674–1697.
• [9] Chipman, H. (1996). Bayesian variable selection with related predictors. Canad. J. Statist. 24, 17–36.
• [10] Chipman, H., George, E.I. and McCulloch, R.E. (2001). The Practical Implementation of Bayesian Model Selection. IMS Lecture Notes – Monograph Series 38.
• [11] Donoho, D. and Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455.
• [12] Farcomeni, A. (2010). Bayesian constrained variable selection. Statistica Sinica 20, 1043–1062.
• [13] Foster, D.P. and George, E.I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22, 1947–1975.
• [14] George, E.I. and McCulloch, R.E. (1993). Variable selection via Gibbs sampling. J. Am. Statist. Assoc. 88, 881–889.
• [15] George, E.I. and McCulloch, R.E. (1997). Approaches to Bayesian variable selection. Statistica Sinica 7, 339–373.
• [16] Gutin, G. and Jones, M. (2012). Note on large subsets of binary vectors with similar distances. SIAM J. Discrete Math. 26, 1108–1111.
• [17] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38, 1681–1732.
• [18] Johnstone, I.M. (2011). Function Estimation and Gaussian Sequence Models, unpublished manuscript.
• [19] Lounici, K., Pontil, M., Tsybakov, A. and van de Geer, S. (2011). Oracle inequalities and optimal inference under group sparsity. Ann. Statist. 39, 2164–2204.
• [20] Mallows, C.L. (1973). Some comments on $C_p$. Technometrics 15, 661–675.
• [21] Raskutti, G., Wainwright, M.J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional regression over $l_q$ balls. IEEE Trans. Inform. Theory 57, 6976–6994.
• [22] Rigollet, P. and Tsybakov, A. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39, 731–771.
• [23] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–464.
• [24] Tropp, J.A. and Wright, S.J. (2010). Computational methods for sparse solution of linear inverse problems. Proc. IEEE, special issue “Applications of sparse representation and compressive sensing”.
• [25] Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with $g$-prior distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti (eds. Goel, P.K. and Zellner, A.), North-Holland, Amsterdam, 233–243.