Electronic Journal of Statistics

Structured variable selection in support vector machines

Seongho Wu, Hui Zou, and Ming Yuan

Full-text: Open access

Abstract

When applying the support vector machine (SVM) to high-dimensional classification problems, we often impose a sparse structure on the SVM to eliminate the influence of irrelevant predictors. The lasso and other variable selection techniques have been successfully used in the SVM to perform automatic variable selection. In some problems, there is a natural hierarchical structure among the variables. Thus, in order to have an interpretable SVM classifier, it is important to respect the heredity principle when enforcing sparsity in the SVM. Many variable selection methods, however, do not respect the heredity principle. In this paper we enforce both sparsity and the heredity principle in the SVM by using the so-called structured variable selection (SVS) framework originally proposed by Yuan, Joseph, and Zou [20]. We minimize the empirical hinge loss under a set of linear inequality constraints and a lasso-type penalty. The solution always obeys the desired heredity principle and enjoys sparsity. The new SVM classifier can be fitted efficiently, because the optimization problem is a linear program. Another contribution of this work is a nonparametric extension of the SVS framework, which leads to the proposed nonparametric heredity SVMs. Simulated and real data are used to illustrate the merits of the proposed method.
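To make the linear-programming claim in the abstract concrete, the following is a minimal sketch, not the paper's exact formulation. It assumes each coefficient is split as beta = bp - bm with bp, bm >= 0, and that strong heredity is encoded by the linear constraints bp[child] + bm[child] <= bp[parent] + bm[parent]. The function name `fit_heredity_l1_svm`, the `parents` mapping, the tuning value `lam`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def fit_heredity_l1_svm(X, y, parents, lam=0.1):
    """Hinge loss + lasso-type penalty under heredity constraints, as an LP.

    X: (n, p) design matrix that already contains the interaction columns.
    y: labels in {-1, +1}.
    parents: dict mapping a child column index to a list of parent columns
             (an assumed encoding of the hierarchy, not taken from the paper).
    """
    n, p = X.shape
    n_var = 1 + 2 * p + n  # layout: [b0, bp (p), bm (p), xi (n)]
    # Objective: lam * sum(bp + bm) + mean of the slack variables xi.
    c = np.concatenate(([0.0], lam * np.ones(2 * p), np.ones(n) / n))
    # Margin constraints y_i (b0 + x_i'(bp - bm)) + xi_i >= 1, in "<=" form.
    Yx = y[:, None] * X
    A_margin = np.hstack([-y[:, None], -Yx, Yx, -np.eye(n)])
    b_margin = -np.ones(n)
    # Assumed heredity constraints: size(child) - size(parent) <= 0.
    rows = []
    for child, pars in parents.items():
        for par in pars:
            row = np.zeros(n_var)
            row[1 + child] = row[1 + p + child] = 1.0
            row[1 + par] = row[1 + p + par] = -1.0
            rows.append(row)
    A_her = np.array(rows).reshape(-1, n_var)
    A_ub = np.vstack([A_margin, A_her])
    b_ub = np.concatenate([b_margin, np.zeros(len(rows))])
    # Intercept is free; all other variables are nonnegative.
    bounds = [(None, None)] + [(0, None)] * (2 * p + n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    bp, bm = res.x[1:1 + p], res.x[1 + p:1 + 2 * p]
    return res.x[0], bp - bm, bp + bm  # intercept, coefficients, sizes


# Toy data: two main effects (columns 0, 1) and their interaction (column 2).
rng = np.random.default_rng(0)
x = rng.normal(size=(80, 2))
X = np.column_stack([x, x[:, 0] * x[:, 1]])
y = np.where(x[:, 0] + 0.5 * x[:, 1] > 0, 1.0, -1.0)
b0, beta, size = fit_heredity_l1_svm(X, y, parents={2: [0, 1]}, lam=0.05)
```

In this sketch the interaction term can only receive weight if both of its parent main effects carry at least as much "size", which is one way a set of linear inequalities can tie sparsity to a hierarchy while keeping the whole problem a linear program.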

Article information

Source
Electron. J. Statist. Volume 2 (2008), 103-117.

Dates
First available in Project Euclid: 22 February 2008

Permanent link to this document
http://projecteuclid.org/euclid.ejs/1203692405

Digital Object Identifier
doi:10.1214/07-EJS125

Mathematical Reviews number (MathSciNet)
MR2386088

Subjects
Primary: 68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}
Secondary: 62G05: Estimation

Keywords
Classification; Heredity; Nonparametric estimation; Support vector machine; Variable selection

Citation

Wu, Seongho; Zou, Hui; Yuan, Ming. Structured variable selection in support vector machines. Electron. J. Statist. 2 (2008), 103–117. doi:10.1214/07-EJS125. http://projecteuclid.org/euclid.ejs/1203692405.


References

  • Bradley, P. and Mangasarian, O. (1998). Feature selection via concave minimization and support vector machines. In J. Shavlik (ed.), ICML’98. Morgan Kaufmann.
  • Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37, 4, 373–384.
  • Chipman, H. (1996). Bayesian variable selection with related predictors. Canad. J. Statist. 24, 1, 17–36.
  • Chipman, H., Hamada, M. and Wu, C. F. J. (1997). A Bayesian variable selection approach for analyzing designed experiments with complex aliasing. Technometrics 39, 372–381.
  • Choi, N. and Zhu, J. (2006). Variable selection with strong heredity / marginality constraints. Technical Report, Department of Statistics, University of Michigan, Ann Arbor.
  • de Boor, C. (1978). A practical guide to splines. Applied Mathematical Sciences, Vol. 27. Springer-Verlag, New York.
  • Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32, 2, 407–499. With discussion, and a rejoinder by the authors.
  • Green, P. J. and Silverman, B. W. (1994). Nonparametric regression and generalized linear models. Monographs on Statistics and Applied Probability, Vol. 58. Chapman & Hall, London. A roughness penalty approach.
  • Hamada, M. and Wu, C. F. J. (1992). Analysis of designed experiments with complex aliasing. Journal of Quality Technology 24, 130–137.
  • Hastie, T. and Tibshirani, R. (2004). Efficient quadratic regularization for expression arrays. Biostatistics 5, 329–340.
  • Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning. Springer Series in Statistics. Springer-Verlag, New York. Data mining, inference, and prediction.
  • Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models. Monographs on Statistics and Applied Probability, Vol. 43. Chapman and Hall Ltd., London.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58, 1, 267–288.
  • Vapnik, V. (1996). The Nature of Statistical Learning. Springer Verlag, New York.
  • Venables, W. N. and Ripley, B. D. (1994). Modern applied statistics with S-Plus. Statistics and Computing. Springer-Verlag, New York. With 1 IBM-PC floppy disk (3.5 inch; HD).
  • Wahba, G. (1990). Spline models for observational data. CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 59. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
  • Wu, C. F. J. and Hamada, M. (2000). Experiments. Wiley Series in Probability and Statistics: Texts and References Section. John Wiley & Sons Inc., New York. Planning, analysis, and parameter design optimization, A Wiley-Interscience Publication.
  • Yuan, M. and Lin, Y. (2005). Efficient empirical Bayes variable selection and estimation in linear models. J. Amer. Statist. Assoc. 100, 472, 1215–1225.
  • Yuan, M., Joseph, R. and Zou, H. (2007). Structured variable selection and estimation. Technical Report, School of Industrial and Systems Engineering, Georgia Institute of Technology.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68, 1, 49–67.
  • Yuan, M. and Lin, Y. (2007). On the non-negative garrotte estimator. J. R. Stat. Soc. Ser. B Stat. Methodol. 69, 2, 143–161.
  • Zhao, P., Rocha, G. and Yu, B. (2006). Grouped and hierarchical model selection through composite absolute penalties. Technical Report, Department of Statistics, University of California, Berkeley.
  • Zhu, J., Rosset, S., Hastie, T. and Tibshirani, R. (2004). 1-norm support vector machines. Advances in Neural Information Processing Systems 16.
  • Zou, H. and Yuan, M. (2005). The F∞-norm support vector machine. Statistica Sinica. To appear.