Sentiment Analysis of Roman Urdu Text Using Machine Learning Techniques

  • Mubasher Malik, Department of Computer Science, Institute of Southern Punjab, Multan, Pakistan
  • Hamid Ghous, Australian Scientific & Engineering Solutions, Sydney, New South Wales, Australia
Keywords: feature engineering, opinion mining, Roman Urdu, sentiment classification, supervised learning, unsupervised learning



Social media has gained popularity over the last few decades, driven by the rapid growth of online businesses and social interaction. People use it to interact with one another and to express their ideas, opinions, and sentiments. Businesses involved in manufacturing, sales, and marketing increasingly turn to social media to collect feedback on their goods and services from people worldwide, and they must process and analyze the sentiments in this feedback to gain business insights. Every day, millions of Urdu and Roman Urdu sentences are posted on social media platforms. Much of this data is lost to analysis because opinions expressed in low-resource languages, such as Urdu and Roman Urdu, are ignored in favor of resource-rich languages, such as English. The current study focused on sentiment analysis of Roman Urdu text. Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction techniques were used to represent the text. Support Vector Machine (SVM), Linear Support Vector Machine (LSVC), Logistic Regression (LR), and Random Forest (RF) classifiers were trained on these features. In the experiments, SVM achieved 94.74% accuracy and RF achieved 93.13% accuracy using BoW features.
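To make the BoW and TF-IDF representations named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the study's pipeline: the three Roman Urdu sentences are hypothetical toy examples invented for illustration, the helper names (`tokenize`, `build_vocabulary`, `bow_vector`, `tfidf_vectors`) are assumptions, and the weighting shown is the basic unsmoothed form tf × log(N/df), which may differ from the variant the authors used.

```python
import math
from collections import Counter

# Hypothetical toy reviews, invented for illustration only
# (they are NOT taken from the study's dataset).
docs = [
    "yeh phone bohat acha hai",
    "yeh phone bohat kharab hai",
    "service achi thi",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_vocabulary(documents):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})

def bow_vector(doc, vocab):
    """Bag of Words: raw count of each vocabulary term in the document."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

def tfidf_vectors(documents, vocab):
    """Basic TF-IDF: (count / document length) * log(N / document frequency)."""
    n_docs = len(documents)
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: number of documents containing each term.
    df = {term: sum(1 for toks in tokenized if term in toks) for term in vocab}
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] / len(toks) * idf[term] for term in vocab])
    return vectors

vocab = build_vocabulary(docs)
bow = [bow_vector(doc, vocab) for doc in docs]
tfidf = tfidf_vectors(docs, vocab)
```

In this sketch, a term that occurs in only one sentence (such as "acha") receives a higher TF-IDF weight than a term shared by several sentences (such as "yeh"), whereas BoW counts both equally. Either set of vectors could then be passed to a classifier such as SVM or RF.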





How to Cite
Malik, M., & Ghous, H. (2023). Sentiment Analysis of Roman Urdu Text Using Machine Learning Techniques. Innovative Computing Review, 3(2).