Prediction of Breast Cancer Using Machine Learning Techniques

  • Tahir Iqbal Department of Computer Sciences, Bahria University, Lahore Campus, Pakistan
  • Asif Farooq Department of Computer Science, University of Central Punjab, Lahore Pakistan
  • Nadeem Sarwar Department of Computer Sciences, Bahria University, Lahore Campus, Pakistan http://orcid.org/0000-0001-8681-6382
  • Mohsin Ashraf Department of Computer Science, University of Central Punjab, Lahore Pakistan https://orcid.org/0000-0001-9984-3400
  • Asma Irshad School of Biochemistry and Biotechnology, University of the Punjab, Lahore, Pakistan
Keywords: breast cancer, naïve Bayes, neural network, machine learning, medical imaging, support vector machine

Abstract

Breast cancer affects a large number of women worldwide and is a major cause of death among them. To investigate its causes, samples were analyzed using several modern techniques, namely logistic regression, discriminant analysis, and principal component analysis (PCA), which help identify the factors associated with breast cancer. The data came from the Breast Cancer Wisconsin Diagnostic Dataset, obtained from a machine learning repository. The correlation matrix of the data served as the starting point for the analysis. PCA, discriminant analysis, and logistic regression were used to extract features from the dataset. Decision tree, naive Bayes, logistic regression, support vector machine (SVM), and artificial neural network models were then trained, and their performance was rigorously examined. The results suggest that the proposed strategy works effectively and reduces training time. These methods can help doctors understand the origins of breast cancer and distinguish between tumor types. Data mining techniques are used extensively, especially for feature selection. Finally, among all models, the hybrid discriminant-logistic (DA-LR) feature selection model was found to outperform SVM and naive Bayes.
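The abstract mentions a data correlation matrix and feature selection but does not spell out the procedure. The following is a minimal sketch of one common way a correlation matrix can guide feature selection: compute pairwise Pearson correlations and keep only one feature from each highly correlated group. The function names, the 0.95 threshold, and the toy values are assumptions made for illustration; they are not taken from the article or from the Wisconsin dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict mapping each pair of feature names to their correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def drop_redundant(columns, threshold=0.95):
    """Greedy filter: keep a feature only if its absolute correlation with
    every feature kept so far is below the (assumed) threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy values, not real measurements.
features = {
    "radius": [12.0, 14.5, 20.1, 11.2, 17.9],
    "perimeter": [78.0, 94.0, 132.0, 72.5, 117.0],
    "smoothness": [0.11, 0.09, 0.12, 0.08, 0.10],
}

matrix = correlation_matrix(features)
selected = drop_redundant(features)
print(selected)
```

In this toy data, perimeter is almost a linear function of radius, so the filter treats it as redundant, while smoothness is only moderately correlated with radius and is kept. A filter of this kind is only a pre-processing step; the article's own feature extraction relies on PCA, discriminant analysis, and logistic regression.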

Published
2022-03-25
How to Cite
Tahir Iqbal, Asif Farooq, Sarwar, N., Mohsin Ashraf, & Asma Irshad. (2022). Prediction of Breast Cancer Using Machine Learning Techniques. BioScientific Review, 4(1), 59-75. https://doi.org/10.32350/BSR.0401.04
Section
Research Articles