Towards Sindhi Corpus Construction

Mutee U Rahman

doi:10.32350/llr/11/04

Authors

Mutee U Rahman Department of Computer Science, Isra University - Hyderabad, Pakistan

DOI:

https://doi.org/10.32350/llr/11/04

Keywords:

corpus construction, unigram, bigram, trigram frequencies, orthography, script

Abstract

The paper discusses the current state of Sindhi corpus construction. Sindhi corpus development issues, including corpus acquisition, preprocessing, and tokenization, are discussed in detail. Preliminary results and observations are presented; these include letter unigram, bigram, and trigram frequencies, word frequencies, and word bigram frequencies. The current state of the Sindhi corpus, together with its limitations and future work, is also discussed. The paper also explores the orthography and script of the Sindhi language with reference to corpus development.
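
As an illustration of the kind of counts the abstract mentions, the following is a minimal sketch of letter n-gram and word n-gram frequency counting. It is not the paper's method: the function names `letter_ngrams` and `word_ngrams` are hypothetical, tokenization is assumed to be a plain whitespace split, and each Unicode code point is treated as one letter, which may not match the preprocessing and tokenization decisions made for the actual Sindhi corpus. The sample string is an illustrative phrase, not data from the corpus.

```python
from collections import Counter


def letter_ngrams(text, n):
    """Count letter n-grams inside each whitespace-separated token.

    Assumption: one Unicode code point is one letter, and n-grams do not
    cross token boundaries.
    """
    counts = Counter()
    for token in text.split():
        for i in range(len(token) - n + 1):
            counts[token[i:i + n]] += 1
    return counts


def word_ngrams(text, n):
    """Count word n-grams over whitespace-separated tokens (assumed tokenizer)."""
    tokens = text.split()
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    # Illustrative Sindhi phrase only; not taken from the paper's corpus.
    sample = "سنڌي ٻولي سنڌي ادب"
    print(letter_ngrams(sample, 1).most_common(5))  # letter unigrams
    print(letter_ngrams(sample, 2).most_common(5))  # letter bigrams
    print(letter_ngrams(sample, 3).most_common(5))  # letter trigrams
    print(word_ngrams(sample, 1).most_common(5))    # word frequencies
    print(word_ngrams(sample, 2).most_common(5))    # word bigrams
```

A real pipeline would likely need script-aware normalization before counting, since a whitespace split is only a rough approximation of word boundaries in Arabic-script text.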

Published

2015-03-31

How to Cite

Mutee U Rahman. (2015). Towards Sindhi Corpus Construction. Linguistics and Literature Review, 1(1), 39–48. https://doi.org/10.32350/llr/11/04

Issue

Vol. 1 No. 1 (2015): Spring

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.
