Journal of Applied Mathematics

  • J. Appl. Math.
  • Volume 2013, Special Issue (2013), Article ID 590614, 18 pages.

Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

Simon Fong, Yan Zhuang, Rui Tang, Xin-She Yang, and Suash Deb

Full-text: Open access

Abstract

Selecting the right set of features from high-dimensional data for inducing an accurate classification model is a tough computational challenge. It is close to an NP-hard problem, as the number of possible feature combinations grows exponentially with the number of features. Unfortunately, in data mining, bioinformatics, and other engineering applications, data are often described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since exhaustively evaluating every possible combination of features by brute force is computationally infeasible, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme, called Swarm Search, that finds an optimal feature set by means of metaheuristics. The advantage of Swarm Search is its flexibility: any classifier can be integrated into its fitness function, and any metaheuristic algorithm can be plugged in to drive the heuristic search. Simulation experiments are carried out by testing Swarm Search over several high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experimental results show that Swarm Search attains relatively low classification error rates without shrinking the feature subset to its minimum size.
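The wrapper idea the abstract describes can be sketched in a few lines: a swarm of candidate binary feature masks is evaluated by a fitness function that trains and tests a classifier on only the selected features, and the swarm is pulled toward the best mask found so far. The sketch below is a minimal illustration under assumed simplifications, not the authors' implementation: the classifier is a plain nearest-centroid model, the "swarm" update is a simple move-toward-best step with random mutation rather than a specific published metaheuristic, and the data are synthetic with only the first few features carrying signal.

```python
import random

def make_data(n=200, n_features=10, n_informative=3, seed=1):
    # Synthetic two-class data: only the first n_informative
    # features carry signal; the rest are pure noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [(label * 2.0 + rng.gauss(0, 1)) if j < n_informative
               else rng.gauss(0, 1) for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def error_rate(mask, X_tr, y_tr, X_te, y_te):
    # Fitness function: test error of a nearest-centroid classifier
    # restricted to the features selected by the binary mask.
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 1.0  # empty subset: worst possible fitness
    centroids = {}
    for c in (0, 1):
        rows = [[x[j] for j in idx] for x, lab in zip(X_tr, y_tr) if lab == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    errors = 0
    for x, lab in zip(X_te, y_te):
        v = [x[j] for j in idx]
        pred = min((0, 1), key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(v, centroids[c])))
        errors += pred != lab
    return errors / len(y_te)

def swarm_search(X, y, n_particles=12, n_iters=40, seed=7):
    # Swarm of binary masks; each iteration evaluates fitness,
    # then moves every particle toward the global best with mutation.
    rng = random.Random(seed)
    split = len(X) * 7 // 10
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]
    d = len(X[0])
    swarm = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    best, best_err = None, 1.1
    for _ in range(n_iters):
        for mask in swarm:
            err = error_rate(mask, X_tr, y_tr, X_te, y_te)
            if err < best_err:
                best, best_err = mask[:], err
        swarm = [[best[j] if rng.random() < 0.5
                  else (1 - m[j] if rng.random() < 0.1 else m[j])
                  for j in range(d)] for m in swarm]
    return best, best_err

if __name__ == "__main__":
    X, y = make_data()
    mask, err = swarm_search(X, y)
    print("selected mask:", mask, "error:", err)
```

Because the fitness function only receives a mask and returns an error rate, both the classifier inside `error_rate` and the update rule inside `swarm_search` can be swapped independently, which is the flexibility the paper claims for the scheme.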

Article information

Source
J. Appl. Math., Volume 2013, Special Issue (2013), Article ID 590614, 18 pages.

Dates
First available in Project Euclid: 14 March 2014

Permanent link to this document
https://projecteuclid.org/euclid.jam/1394807362

Digital Object Identifier
doi:10.1155/2013/590614

Citation

Fong, Simon; Zhuang, Yan; Tang, Rui; Yang, Xin-She; Deb, Suash. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search. J. Appl. Math. 2013, Special Issue (2013), Article ID 590614, 18 pages. doi:10.1155/2013/590614. https://projecteuclid.org/euclid.jam/1394807362


