Evaluating the Performance of Heterogeneous and Homogeneous Ensemble-based Models for Twitter Spam Classification

  • Akinyemi Moruff OYELAKIN Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
  • A. O. Ameen Department of Computer Science, University of Ilorin, Ilorin, Nigeria
  • I. K. Ajiboye Computer Science Unit, Abdulraheem College of Advanced Studies (an affiliate of Al-Hikmah University), Ilorin, Nigeria
  • I. S. Olatinwo Computer Science Unit, Abdulraheem College of Advanced Studies (an affiliate of Al-Hikmah University), Ilorin, Nigeria
  • K. Y. Obiwusi Department of Mathematics and Computer Science, Summit University, Offa, Nigeria
  • T. S. Ogundele Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
Keywords: ensemble classification, predictive accuracy, social network, Twitter spam detection

Abstract

Spam-based attacks are growing across social networks. Social network spam is unwanted content that appears on social networking sites such as Facebook, Twitter, and Instagram. This study used two categories of ensemble algorithms to build Twitter spam classification models. Ensemble algorithms combine the strengths of individual learning algorithms and report their combined performance, on the assumption that combining the output of multiple models is better than using a single classifier. Several studies have investigated the classification of Twitter spam using available datasets. However, few have investigated how machine learning-based models built with homogeneous and heterogeneous algorithms behave in Twitter spam classification. This study therefore used a labeled public dataset for machine learning-based Twitter spam detection. The ANOVA F-test was used to select the most promising features in the dataset. A homogeneous tree-based Random Forest (RF) ensemble and a heterogeneous ensemble voting classifier were then employed to classify Twitter spam. Tree-based algorithms were used to build the homogeneous Twitter spam detection model, while a combination of Support Vector Machine (SVM) and Decision Tree (DT) algorithms, joined by a maximum voting classifier, was used to build the heterogeneous model. The performance of both Twitter spam detection models was promising. Overall, the heterogeneous model recorded better accuracy, precision, recall, and F1-score than the model built with homogeneous base classifiers.
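
The two mechanics named in the abstract, ANOVA F-test feature scoring and maximum (hard) voting, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature values, and the three prediction lists are hypothetical, and a third voter is included only so that the toy example has no ties (the study itself combined SVM and DT).

```python
from collections import Counter
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("need at least two classes")
    grand_mean = mean(values)
    # Spread of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Spread of each sample around its own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def majority_vote(predictions):
    """Hard voting: the most common label per sample.

    Assumption of this sketch: ties go to the label voted first.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (hypothetical): 1 = spam, 0 = not spam.
labels = [1, 1, 1, 0, 0, 0]
url_count = [3, 4, 5, 0, 1, 2]   # differs between classes
noise = [1, 2, 3, 1, 2, 3]       # same distribution in both classes

f_url = anova_f_score(url_count, labels)
f_noise = anova_f_score(noise, labels)

# Per-sample predictions from three hypothetical base classifiers.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
combined = majority_vote([model_a, model_b, model_c])
```

A feature with a larger F score separates the classes more strongly, so a selector would keep `url_count` ahead of `noise` here. With only two voters, as in an SVM plus DT pairing, disagreements are ties, so the tie-breaking rule (or a switch to probability-based soft voting) matters in practice.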

Published
2022-12-25
How to Cite
Oyelakin, A. M., Ameen, A. O., Ajiboye, I. K., Olatinwo, I. S., Obiwusi, K. Y., & Ogundele, T. S. (2022). Evaluating the Performance of Heterogeneous and Homogeneous Ensemble-based Models for Twitter Spam Classification. Innovative Computing Review, 2(2). https://doi.org/10.32350/icr.0202.01