The Annals of Applied Statistics

Structured variable selection and estimation

Ming Yuan, V. Roshan Joseph, and Hui Zou

Full-text: Open access


In linear regression problems with related predictors, it is desirable to perform variable selection and estimation while maintaining the hierarchical or structural relationships among predictors. In this paper we propose nonnegative garrote methods that can naturally incorporate such relationships, defined through effect heredity principles or marginality principles. We show that the methods are very easy to compute and enjoy nice theoretical properties. We also show that the methods can be easily extended to deal with more general regression problems such as generalized linear models. Simulations and real examples are used to illustrate the merits of the proposed methods.
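As an illustration of the effect heredity principles mentioned in the abstract, the following minimal sketch checks whether a selected set of terms respects strong heredity (an interaction enters only if all of its parent main effects are selected) or weak heredity (at least one parent is selected). It is not the estimator proposed in the paper. The function name `satisfies_heredity`, the `"A:B"` notation for interactions, and the restriction to main effects and two-factor interactions are assumptions made for this example.

```python
def satisfies_heredity(model, strong=True):
    """Return True if the selected terms respect effect heredity.

    Hypothetical helper for illustration only.
    model  : iterable of term names; "A" is a main effect and
             "A:B" is the two-factor interaction of A and B.
    strong : True  -> every parent main effect must be selected;
             False -> at least one parent main effect must be selected.
    """
    selected = set(model)
    for term in selected:
        factors = term.split(":")
        if len(factors) == 1:
            continue  # main effects have no parents to check
        if len(factors) != 2:
            raise ValueError("only two-factor interactions are handled: " + term)
        present = [factor in selected for factor in factors]
        ok = all(present) if strong else any(present)
        if not ok:
            return False
    return True


# A:B with both parents satisfies strong (and hence weak) heredity.
print(satisfies_heredity(["A", "B", "A:B"], strong=True))   # True
# A:B with only A selected violates strong but satisfies weak heredity.
print(satisfies_heredity(["A", "A:B"], strong=True))        # False
print(satisfies_heredity(["A", "A:B"], strong=False))       # True
# A:B with no parents selected violates both.
print(satisfies_heredity(["A:B"], strong=False))            # False
```

A check of this kind only validates a model after selection. The methods described in the abstract instead build the structural relationships into the estimation itself.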

Article information

Ann. Appl. Stat., Volume 3, Number 4 (2009), 1738-1757.

First available in Project Euclid: 1 March 2010

Keywords: Effect heredity, nonnegative garrote, quadratic programming, regularization, variable selection


Yuan, Ming; Joseph, V. Roshan; Zou, Hui. Structured variable selection and estimation. Ann. Appl. Stat. 3 (2009), no. 4, 1738–1757. doi:10.1214/09-AOAS254.

