Roman Urdu to Urdu Machine Transliteration by Using T5 Transformer

  • Usama Ahmed University of Management and Technology
  • Muhammad Adeel University of Management and Technology
  • Usama Amjad University of Management and Technology
Keywords: Machine Transliteration, T5 transformer, Deep learning, Roman Urdu, Encoder Decoder, Transfer Learning

Abstract

Transliteration is the process of mapping words written in a source script onto the corresponding words in a target script without changing their meaning. When this conversion of source-language text into the characters of the target language is performed automatically, it is known as machine transliteration. Recent studies indicate that no dedicated system currently exists for Roman Urdu to Urdu (RU-U) machine transliteration. Previous researchers have attempted to solve this problem using deep learning techniques, particularly the Recurrent Neural Network (RNN) model. RNNs are built to process sequential input, such as natural language, for tasks like translation and text summarization, but they perform better on short sentences than on long ones. The proposed methodology uses the T5 transformer, an encoder-decoder model that casts NLP problems into a text-to-text format. T5 is a transfer learning model, and the variant used in this paper is trained on 101 languages, including the source language. After training on our parallel dataset of 1,107,156 sentences, the study achieved a BLEU score of 91.56.
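
For readers unfamiliar with the metric reported above, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale. It is not the authors' evaluation code: whitespace tokenization, a single reference per sentence, n-gram orders up to 4, and the absence of smoothing are all assumptions made for illustration, and the function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumptions: whitespace tokenization, uniform n-gram weights, no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # n-grams present in the hypotheses, per order
    ref_len = hyp_len = 0

    for ref, hyp in zip(references, hypotheses):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref_tok, n)
            hyp_counts = ngrams(hyp_tok, n)
            # Counter intersection keeps the minimum count per n-gram (clipping).
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty punishes hypotheses shorter than their references.
    if hyp_len > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Illustrative pair: Urdu reference for the Roman Urdu input
    # "main ghar ja raha hon", and a hypothetical model output.
    reference = ["میں گھر جا رہا ہوں"]
    model_output = ["میں گھر جا رہا ہوں"]
    print(corpus_bleu(reference, model_output))
```

An exact match between output and reference gives the maximum score, while each mismatched word lowers the precision at every n-gram order it touches, which is why long sentences are harder to score well on than short ones.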

Published
2025-06-25
How to Cite
Usama Ahmed, Muhammad Adeel, & Usama Amjad. (2025). Roman Urdu to Urdu Machine Transliteration by Using T5 Transformer. UMT Artificial Intelligence Review, 5(1). Retrieved from https://journals.umt.edu.pk/index.php/UMT-AIR/article/view/5820
Section
Articles