Tree-Based Learning Models for Botnet Malware Classification in Real World Sub-Sample Dataset

Akinyemi Moruff Oyelakin; Jimoh Rasheed Gbenga

doi:10.32350/icr.32.01

Akinyemi Moruff Oyelakin, Department of Computer Science, Crescent University, Abeokuta, Nigeria
Jimoh Rasheed Gbenga, Department of Computer Science, University of Ilorin, Ilorin, Nigeria


Keywords: botnet malware, bot communication, malware detection, tree learning algorithms

Abstract


The use of machine learning techniques for botnet detection has been an active area of research in the security field for some years. Some earlier machine learning-based botnet detection studies used synthetically generated datasets. The release of CTU-13, a large real-life botnet dataset, allowed researchers to build machine learning-based models from real-world data, and its real-life traces make it a promising resource for botnet identification studies. The current study proposed the use of a single tree-based learning algorithm to classify botnet evidence in sub-sampled portions of three captures from the CTU-13 dataset. The first step of the methodology was an experimental analysis of three of the thirteen captures in the dataset, which revealed the basic characteristics of the data and guided the rest of the study. Missing values and categorical data types were handled through mixed imputation and feature encoding, respectively. The big-data nature of the dataset was handled through random sub-sampling, with a view to building a botnet detection model that is less computationally intensive. The sub-sampling was done without changing the data distributions in the dataset and produced the three datasets used in the study. Botnet detection models were then built from the three sub-sampled captures using the decision-tree algorithm, and their performance was evaluated using accuracy, precision, recall, and F1-score. Overall, the model built with the scenario 5 capture performed slightly better than those built with the scenario 6 and scenario 7 captures.
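The abstract does not give implementation details, so the following is only a rough, standard-library Python sketch of the kind of pipeline it describes: mixed imputation, categorical feature encoding, distribution-preserving random sub-sampling, a decision-tree classifier, and the four evaluation metrics. Every specific here is an assumption: "mixed imputation" is read as median filling for numeric columns plus mode filling for categorical ones; plain ordinal encoding stands in for the unspecified feature encoding; "without changing the data distributions" is read as per-class (stratified) sampling; and the tree is a minimal Gini-based CART, not the authors' configuration. The column names TotBytes, Dur, and Proto are loosely modelled on CTU-13 flow fields, but the records are synthetic and all helper functions (impute, encode, stratified_subsample, build_tree, predict, scores) are hypothetical.

```python
import random
from collections import Counter
from statistics import median

def impute(rows, numeric_cols, categorical_cols):
    """Mixed imputation (assumed scheme): median for numeric columns, mode for categorical ones."""
    fill = {c: median(r[c] for r in rows if r[c] is not None) for c in numeric_cols}
    for c in categorical_cols:
        fill[c] = Counter(r[c] for r in rows if r[c] is not None).most_common(1)[0][0]
    return [{k: (fill[k] if v is None else v) for k, v in r.items()} for r in rows]

def encode(rows, categorical_cols):
    """Ordinal encoding: map each category to an integer (sorted, so it is reproducible)."""
    maps = {c: {v: i for i, v in enumerate(sorted({r[c] for r in rows}))} for c in categorical_cols}
    return [{k: (maps[k][v] if k in maps else v) for k, v in r.items()} for r in rows]

def stratified_subsample(rows, label, fraction, seed=0):
    """Random sub-sample drawn class by class, so class proportions match the full data."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    sample = []
    for members in by_class.values():
        sample.extend(rng.sample(members, max(1, round(len(members) * fraction))))
    rng.shuffle(sample)
    return sample

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3):
    """Minimal CART-style tree: greedy binary splits minimising weighted Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority  # leaf node: predicted class label
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        vals = sorted({x[f] for x in X})
        for t in ((a + b) / 2 for a, b in zip(vals, vals[1:])):  # midpoints between values
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every feature is constant, so no split is possible
        return majority
    _, f, t = best
    go_left = [x[f] <= t for x in X]
    kids = [build_tree([x for x, g in zip(X, go_left) if g == side],
                       [yi for yi, g in zip(y, go_left) if g == side], depth + 1, max_depth)
            for side in (True, False)]
    return (f, t, kids[0], kids[1])  # internal node: (feature, threshold, left, right)

def predict(tree, x):
    while isinstance(tree, tuple):  # walk down until a leaf label is reached
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating the botnet class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Synthetic, hypothetical flow records (NOT CTU-13 data); label 1 = botnet, 0 = normal.
rng = random.Random(42)
data = []
for i in range(200):
    bot = i % 5 == 0  # 40 botnet rows out of 200
    data.append({
        "TotBytes": rng.randint(60, 300) if bot else rng.randint(1000, 1500),
        "Dur": None if i % 17 == 0 else (rng.uniform(0, 1) if bot else rng.uniform(2, 10)),
        "Proto": None if i % 23 == 0 else ("udp" if bot else rng.choice(["tcp", "icmp"])),
        "label": int(bot),
    })

data = encode(impute(data, ["TotBytes", "Dur"], ["Proto"]), ["Proto"])
train = stratified_subsample(data, "label", 0.5, seed=1)  # half the rows, same class ratio
picked = {id(r) for r in train}
test = [r for r in data if id(r) not in picked]  # held-out rows for evaluation

features = ["TotBytes", "Dur", "Proto"]
tree = build_tree([[r[c] for c in features] for r in train], [r["label"] for r in train])
preds = [predict(tree, [r[c] for c in features]) for r in test]
result = scores([r["label"] for r in test], preds)
print(result)
```

Because the sub-sample is drawn class by class, its botnet share equals that of the full data (20% in this toy set), whereas drawing rows uniformly at random would preserve that share only on average. The perfect scores this toy run produces reflect the deliberately separable synthetic data, not the study's results. In practice, a library implementation such as scikit-learn's DecisionTreeClassifier would normally replace the hand-written tree; it is spelled out here only to keep the sketch dependency-free.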

