Journal of Applied Mathematics

  • J. Appl. Math.
  • Volume 2014, Special Issue (2013), Article ID 425731, 6 pages.

Classification of Phishing Email Using Random Forest Machine Learning Technique

Andronicus A. Akinyelu and Aderemi O. Adewumi

Full-text: Open access


Phishing is one of the major challenges facing e-commerce today. Phishing attacks have cost companies and individuals billions of dollars. In 2012, an online report put the loss due to phishing attacks at about $1.5 billion. The global impact of phishing attacks will continue to increase, so more efficient phishing detection techniques are needed to curb the menace. This paper investigates and reports the use of the random forest machine learning algorithm in the classification of phishing attacks, with the major objective of developing an improved phishing email classifier with better prediction accuracy and fewer features. From a dataset consisting of 2000 phishing and ham emails, a set of prominent phishing email features (identified from the literature) was extracted and used by the machine learning algorithm, with a resulting classification accuracy of 99.7% and low false negative (FN) and false positive (FP) rates.
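As a rough illustration of the approach summarized above, the following Python sketch trains a very small random forest (bootstrap samples, a random feature subset at each split, majority vote) and computes accuracy together with FP and FN rates. The three features, the toy emails, and all function names and parameter values are assumptions made for illustration only; they are not the paper's feature set, dataset, or implementation, and the toy output says nothing about the reported 99.7% accuracy.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, n_try, rng, depth=0, max_depth=5):
    """Grow one tree; a leaf is a class label, an inner node is a tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    # Random forests consider only a random subset of features per split.
    feats = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, labels, feats)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       n_try, rng, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       n_try, rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, n_try=2, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def evaluate(y_true, y_pred):
    """Return (accuracy, FP rate, FN rate); phishing (1) is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    return accuracy, fp_rate, fn_rate

# Hypothetical toy features per email: [has_ip_url, n_links, has_script].
phish = [[1, 5 + i % 5, 1] for i in range(10)]
ham = [[0, i % 4, 0] for i in range(10)]
rows = phish + ham
labels = [1] * 10 + [0] * 10  # 1 = phishing, 0 = ham

forest = train_forest(rows, labels)
preds = [predict_forest(forest, r) for r in rows]
print(evaluate(labels, preds))
```

Here the FP rate is the fraction of ham emails flagged as phishing and the FN rate is the fraction of phishing emails that pass as ham. The sketch scores the forest on its own training data only to keep it short; a real evaluation would use held-out emails or cross-validation.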

Article information


First available in Project Euclid: 1 October 2014


Akinyelu, Andronicus A.; Adewumi, Aderemi O. Classification of Phishing Email Using Random Forest Machine Learning Technique. J. Appl. Math. 2014, Special Issue (2013), Article ID 425731, 6 pages. doi:10.1155/2014/425731.


