Use of Deep Learning in Early Software Bug Detection
Abstract
Traditional defect prediction studies rely primarily on hand-crafted features fed into Machine Learning (ML) classifiers to identify defective code. However, these features frequently fail to capture the semantic and structural information in programs and metric-based datasets that accurate defect prediction depends on. To address this limitation, the current study introduced a deep learning pipeline for Software Defect Prediction (SDP) and evaluated it on multiple publicly available datasets. The study focused on combining a Convolutional Neural Network (CNN) with a Random Forest classifier. The datasets, which represent different versions of software projects, were loaded, concatenated, and preprocessed using StandardScaler and OneHotEncoder to put the data in a format suitable for model training. A CNN model was then built and trained to capture semantic and structural features of the software data, and a Random Forest model was tuned through GridSearchCV. The predictions of the two models were combined in an ensemble, with a majority vote determining the final prediction. The ensemble was evaluated in terms of accuracy, precision, recall, and F1 score. The experimental results showed that the ensemble model, which draws on the strengths of both the CNN and the Random Forest classifier, achieved high accuracy and F1 scores across multiple datasets. The study thus highlighted the effectiveness of combining deep learning with traditional ML techniques for SDP and offered a robust framework for improving software reliability and helping developers identify potential bugs.
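The steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the data is synthetic (the columns `loc`, `complexity`, `version`, and `bug` are hypothetical), a small scikit-learn `MLPClassifier` stands in for the CNN so that the example needs no deep learning framework, the hyperparameter grid is arbitrary, and, because a vote between two models can tie, disagreements are resolved here by averaging the two predicted probabilities.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)


def make_release(n, version):
    """Synthetic stand-in for one release of a metric-based defect dataset."""
    loc = rng.integers(10, 500, size=n)
    complexity = rng.integers(1, 30, size=n)
    # Hypothetical rule: larger, more complex modules are defective more often.
    score = 0.004 * loc + 0.08 * complexity + rng.normal(0, 0.5, size=n)
    return pd.DataFrame({
        "loc": loc,
        "complexity": complexity,
        "version": version,
        "bug": (score > 2.2).astype(int),
    })


# Load and concatenate several versions of a project.
data = pd.concat([make_release(200, "1.0"), make_release(200, "1.1")],
                 ignore_index=True)
X, y = data.drop(columns="bug"), data["bug"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale numeric metrics and one-hot encode categorical columns.
preprocess = ColumnTransformer(
    [("num", StandardScaler(), ["loc", "complexity"]),
     ("cat", OneHotEncoder(handle_unknown="ignore"), ["version"])],
    sparse_threshold=0.0,  # always return a dense array
)
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)

# Random Forest tuned with GridSearchCV (the grid is an assumption).
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    scoring="f1",
    cv=3,
)
grid.fit(X_train_t, y_train)
rf_proba = grid.predict_proba(X_test_t)[:, 1]

# Stand-in for the CNN: a small fully connected network, NOT a CNN.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
net.fit(X_train_t, y_train)
net_proba = net.predict_proba(X_test_t)[:, 1]

# Vote: keep the label when both models agree; otherwise fall back to the
# averaged probability (this tie-break rule is an assumption).
rf_pred = (rf_proba >= 0.5).astype(int)
net_pred = (net_proba >= 0.5).astype(int)
agree = rf_pred == net_pred
tie_break = ((rf_proba + net_proba) / 2 >= 0.5).astype(int)
ensemble_pred = np.where(agree, rf_pred, tie_break)

metrics = {
    "accuracy": accuracy_score(y_test, ensemble_pred),
    "precision": precision_score(y_test, ensemble_pred, zero_division=0),
    "recall": recall_score(y_test, ensemble_pred, zero_division=0),
    "f1": f1_score(y_test, ensemble_pred, zero_division=0),
}
print(metrics)
```

Fitting the preprocessing on the training split only, and reusing it on the test split, keeps test-set statistics out of the scaler. Swapping the stand-in network for an actual CNN would only change how `net_proba` is produced; the voting and evaluation steps would stay the same.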
Copyright (c) 2024 Syed Mibran Hassan Zaidi

This work is licensed under a Creative Commons Attribution 4.0 International License.
UMT-AIR follows an open-access publishing policy, and the full text of all published articles is available free of charge immediately upon publication of an issue. The journal’s contents are published and distributed under the terms of the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Submission of a work to the journal implies that it is the original, unpublished work of the authors (neither published previously nor accepted or under consideration for publication elsewhere). On acceptance of a manuscript for publication, the corresponding author, on behalf of all co-authors, will sign and submit a completed Copyright and Author Consent Form.

