Use of Deep Learning in Early Software Bug Detection

Syed Mibran Hassan Zaidi

doi:10.32350/umt-air.42.03

Syed Mibran Hassan Zaidi, NED University of Engineering & Technology

Keywords: Deep learning, CNN, Random Forest, Early Software Defect Prediction

Abstract

Traditional defect prediction studies rely primarily on hand-crafted features fed into Machine Learning (ML) classifiers to identify defective code. However, these features frequently fail to capture the semantic and structural information in programs and metric-based datasets that is critical for accurate defect prediction. To address this limitation, the current study introduces a deep learning pipeline for Software Defect Prediction (SDP) built on multiple publicly available datasets, focusing on the integration of a Convolutional Neural Network (CNN) with a Random Forest classifier. The datasets, which represent different versions of software projects, were loaded, concatenated, and preprocessed using StandardScaler and OneHotEncoder to put the data in a format suitable for model training. A CNN model was built and trained to capture semantic and structural features of the software data, and a Random Forest model was tuned with GridSearchCV. Predictions from the two models were combined in an ensemble in which a majority vote determined the final prediction, and the ensemble was evaluated using accuracy, precision, recall, and F1 score. Experimental results show that the ensemble, which leverages the strengths of both the CNN and the Random Forest classifier, achieved high accuracy and F1 scores across multiple datasets. These results highlight the effectiveness of combining deep learning with traditional ML techniques for SDP and offer a robust framework for improving software reliability and helping developers identify potential bugs.
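The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the study's implementation. The data is synthetic, and the column names (`version`, `loc`, `wmc`, `bug`) are hypothetical. The parameter grid is arbitrary. To keep the example free of a deep learning dependency, scikit-learn's `MLPClassifier` is swapped in where the study uses a CNN. Because a hard majority vote between two models can tie and the abstract does not state a tie rule, the sketch averages the two predicted probabilities instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)


def make_version(name, n):
    """Synthetic stand-in for one project version (hypothetical columns)."""
    loc = rng.integers(10, 500, size=n)   # lines of code
    wmc = rng.integers(1, 40, size=n)     # weighted methods per class
    noise = rng.normal(0, 0.5, size=n)
    # Toy labelling rule: larger, more complex modules are defective more often.
    bug = ((loc / 500 + wmc / 40 + noise) > 1.0).astype(int)
    return pd.DataFrame({"version": name, "loc": loc, "wmc": wmc, "bug": bug})


# Load and concatenate several versions into one table.
data = pd.concat([make_version("v1", 150), make_version("v2", 150)],
                 ignore_index=True)
X, y = data.drop(columns="bug"), data["bug"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def make_preprocessor():
    """Scale numeric metrics and one-hot encode the categorical column."""
    return ColumnTransformer([
        ("num", StandardScaler(), ["loc", "wmc"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["version"]),
    ])


# Random Forest tuned with GridSearchCV (the grid is an arbitrary assumption).
param_grid = {"rf__n_estimators": [50, 100], "rf__max_depth": [None, 5]}
rf_search = GridSearchCV(
    Pipeline([("prep", make_preprocessor()),
              ("rf", RandomForestClassifier(random_state=0))]),
    param_grid=param_grid, scoring="f1", cv=3)
rf_search.fit(X_train, y_train)

# Neural model: an MLP standing in for the CNN (assumption, see lead-in).
net = Pipeline([("prep", make_preprocessor()),
                ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                      random_state=0))])
net.fit(X_train, y_train)

# Combine the two models by averaging the predicted defect probabilities.
p_rf = rf_search.predict_proba(X_test)[:, 1]
p_net = net.predict_proba(X_test)[:, 1]
y_pred = ((p_rf + p_net) / 2 >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
print(rf_search.best_params_, metrics)
```

Wrapping the preprocessing inside each `Pipeline` means the scaler and encoder are fitted only on training folds during the grid search, which avoids leaking test statistics into the model. With three or more base models, a hard majority vote becomes well defined and could replace the probability averaging used here.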
