Determining Urdu News Type from Headline Text Using Deep Learning

  • Umair Arshad, School of Computing, Robert Gordon University, Aberdeen, United Kingdom
  • Khawar Iqbal Malik, Riphah School of Computing & Innovation, Riphah International University, Lahore Campus, Pakistan
  • Hira Arooj, Department of Mathematics and Statistics, University of Lahore, Sargodha Campus, Pakistan
  • Muhammad Fiaz, Department of Computer Science, University of Lahore, Sargodha Campus, Pakistan
Keywords: deep learning (DL), fastText, natural language processing (NLP), Urdu news classification, Word2vec

Abstract

In recent years, the volume of regional-language data available on the Internet has grown significantly, helping people to express themselves across linguistic boundaries. The accessibility of news articles on the web also provides billions of users with a source of knowledge. This research presents a classification model that categorizes Urdu news headlines using deep learning (DL) techniques and different word vector embeddings. To improve the efficacy of Urdu natural language processing (NLP) applications, the study used two neural word embeddings built with the most widely used approaches, namely Word2vec and pre-trained fastText. Both intrinsic and extrinsic evaluation methods were used to examine the quality of these embeddings. The study employed a large, new corpus of Urdu text containing 153,050 headlines categorized into 8 classes. After text pre-processing, two DL models were applied, namely Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM), and the results were compared across embeddings. BiLSTM with the pre-trained fastText embedding outperformed the other models, with an accuracy of 93.93%, precision of 93.86%, recall of 93.93%, and F1 score of 93.89%.
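The abstract does not say how precision, recall, and F1 were averaged over the 8 classes. The reported recall is identical to the accuracy while the precision differs, which is what support-weighted averaging would produce; this reading is an assumption, not something the abstract confirms. The sketch below computes the four metrics under that assumption in plain Python. The function name `weighted_metrics` and the toy labels are invented for illustration and do not come from the paper's corpus.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumes single-label multi-class data: one true label and one
    predicted label per headline.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for label in labels:
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support[label] / n  # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(true_pos.values()) / n
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, invented for illustration only.
y_true = ["sports", "sports", "sports", "business", "business", "world"]
y_pred = ["sports", "sports", "business", "business", "world", "world"]
scores = weighted_metrics(y_true, y_pred)
```

With support weighting, each class recall TP_c / n_c is multiplied by n_c / N, so the weighted recall reduces to the total number of correct predictions divided by N, which is the accuracy. Weighted precision has no such identity, so it can differ, as it does in the reported figures.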

Published
2023-12-20
How to Cite
Arshad, U., Malik, K. I., Arooj, H., & Fiaz, M. (2023). Determining Urdu News Type from Headline Text Using Deep Learning. UMT Artificial Intelligence Review, 3(2). https://doi.org/10.32350/umt-air.32.05