Journal of Applied Mathematics

On Software Defect Prediction Using Machine Learning

Jinsheng Ren, Ke Qin, Ying Ma, and Guangchun Luo


Abstract

This paper deals with how kernel methods can be used for software defect prediction, since class imbalance can greatly reduce the performance of defect predictors. Two classifiers, namely, the asymmetric kernel partial least squares classifier (AKPLSC) and the asymmetric kernel principal component analysis classifier (AKPCAC), are proposed to solve the class imbalance problem. They are obtained by applying a kernel function to the asymmetric partial least squares classifier and the asymmetric principal component analysis classifier, respectively; the kernel used in both classifiers is the Gaussian function. Experiments conducted on NASA and SOFTLAB data sets using the F-measure, Friedman's test, and Tukey's test confirm the validity of our methods.
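Two ingredients named in the abstract can be made concrete: the Gaussian (RBF) kernel that both proposed classifiers use, and the F-measure used for evaluation. The sketch below is illustrative only, not the authors' AKPLSC/AKPCAC implementation; the bandwidth parameter `sigma` and the choice of label 1 for the defective (minority) class are assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    # Pairwise squared Euclidean distances between rows of X and rows of Y.
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-sq / (2.0 * sigma**2))

def f_measure(y_true, y_pred):
    """F-measure (harmonic mean of precision and recall) for the positive class,
    here taken to be the defective modules (label 1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```

The F-measure is preferred over plain accuracy for defect data because a classifier that labels every module non-defective scores high accuracy on imbalanced sets yet has zero recall on the defective class.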

Article information

Source
J. Appl. Math. Volume 2014 (2014), Article ID 785435, 8 pages.

Dates
First available in Project Euclid: 2 March 2015

Permanent link to this document
http://projecteuclid.org/euclid.jam/1425305549

Digital Object Identifier
doi:10.1155/2014/785435

Mathematical Reviews number (MathSciNet)
MR3176829

Citation

Ren, Jinsheng; Qin, Ke; Ma, Ying; Luo, Guangchun. On Software Defect Prediction Using Machine Learning. J. Appl. Math. 2014 (2014), Article ID 785435, 8 pages. doi:10.1155/2014/785435. http://projecteuclid.org/euclid.jam/1425305549.



References

  • T. M. Khoshgoftaar, E. B. Allen, and J. Deng, “Using regression trees to classify fault-prone software modules,” IEEE Transactions on Reliability, vol. 51, no. 4, pp. 455–462, 2002.
  • T. Menzies, J. Greenwald, and A. Frank, “Data mining static code attributes to learn defect predictors,” IEEE Transactions on Software Engineering, vol. 33, no. 1, pp. 2–13, 2007.
  • Y. Ma, G. Luo, X. Zeng, and A. Chen, “Transfer learning for cross-company software defect prediction,” Information and Software Technology, vol. 54, no. 3, pp. 248–256, 2012.
  • C. Seiffert, T. M. Khoshgoftaar, and J. Van Hulse, “Improving software-quality predictions with data sampling and boosting,” IEEE Transactions on Systems, Man, and Cybernetics A, vol. 39, no. 6, pp. 1283–1294, 2009.
  • L. Guo, Y. Ma, B. Cukic, and H. Singh, “Robust prediction of fault-proneness by random forests,” in Proceedings of the 15th International Symposium on Software Reliability Engineering (ISSRE '04), pp. 417–428, November 2004.
  • Y. Freund and R. Schapire, “Experiments with a new boosting algorithm,” in Proceedings of the 13th International Conference on Machine Learning, pp. 148–156, 1996.
  • M. Barker and W. Rayens, “Partial least squares for discrimination,” Journal of Chemometrics, vol. 17, no. 3, pp. 166–173, 2003.
  • J.-H. Xue and D. M. Titterington, “Do unbalanced data have a negative effect on LDA?” Pattern Recognition, vol. 41, no. 5, pp. 1558–1571, 2008.
  • X. Jiang, “Asymmetric principal component and discriminant analyses for pattern classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 931–937, 2009.
  • I. T. Jolliffe, Principal Component Analysis, Springer, New York, NY, USA, 1986.
  • R. Rosipal, M. Girolami, L. J. Trejo, and A. Cichocki, “Kernel PCA for feature extraction and de-noising in nonlinear regression,” Neural Computing and Applications, vol. 10, no. 3, pp. 231–243, 2001.
  • Y. Ma, G. Luo, and H. Chen, “Kernel based asymmetric learning for software defect prediction,” IEICE Transactions on Information and Systems, vol. E-95-D, no. 1, pp. 267–270, 2012.
  • R. Rosipal, L. J. Trejo, and B. Matthews, “Kernel PLS-SVC for linear and nonlinear classification,” in Proceedings of the 20th International Conference on Machine Learning (ICML '03), pp. 640–647, August 2003.
  • H.-N. Qu, G.-Z. Li, and W.-S. Xu, “An asymmetric classifier based on partial least squares,” Pattern Recognition, vol. 43, no. 10, pp. 3448–3457, 2010.
  • K. Bache and M. Lichman, “UCI Machine Learning Repository,” University of California, School of Information and Computer Science, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml/.
  • T. Menzies, B. Caglayan, E. Kocaguneli, J. Krall, F. Peters, and B. Turhan, “The PROMISE Repository of empirical software engineering data,” West Virginia University, Department of Computer Science, 2012, http://promisedata.googlecode.com/.
  • N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
  • M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” Journal of the American Statistical Association, vol. 32, pp. 675–701, 1937.
  • M. Friedman, “A comparison of alternative tests of significance for the problem of $m$ rankings,” Annals of Mathematical Statistics, vol. 11, pp. 86–92, 1940.
  • W. Mendenhall and T. Sincich, Statistics for Engineering and the Sciences, Pearson, London, UK, 5th edition, 2006.