Development of Saraiki WordNet by Mapping of Word Senses: A Corpus Based Approach

This paper aimed to develop the Saraiki WordNet. Saraiki is one of the regional languages spoken in Pakistan and has a unique history of its own. It is remarkably similar to two languages, namely Punjabi and Sindhi. Saraiki has different dialects, and each dialect is representative of the region where it is spoken. This paper used the Urdu WordNet (Zafar, Mahmood, Shams & Hussain, 2014) as the basis for the formation of the Saraiki WordNet. The Urdu WordNet (Zafar et al., 2014) was created by UET Lahore and is based on the Princeton WordNet (Miller, 1990). Dictionaries (lughats), literary sources such as poetry and fiction, and non-literary sources such as Saraiki newspapers were used to extract data. Urdu word senses were then mapped onto Saraiki word senses, and the expansion approach was followed in the mapping process. This study may aid in creating bilingual Saraiki-Urdu dictionaries in the future.
Keywords: expansion approach, mapping, Saraiki language, WordNet


Introduction
Saraiki is counted among the widely spoken languages in the Pakistani provinces of Punjab and Khyber Pakhtunkhwa (KPK). It is a sister language of Sindhi and Punjabi languages and is greatly influenced by both of them. The speakers of this language are scattered across different geographical regions of Pakistan. In each area, their Saraiki speech is influenced by local languages. Hence, Saraiki has incorporated different elements of local languages, which has allowed it to evolve into a distinct but related language (Garcia, 2016).
WordNet is a thesaurus that is very useful for computational purposes. It can be downloaded or used online, and versions of it exist in many languages. Princeton WordNet (Miller, 1990), an English-language WordNet developed by George Armitage Miller, was the first WordNet to be developed. Previously, people looked up meanings in dictionaries, which are designed for human readers only. A WordNet contains not just words and meanings but also their concepts and examples. This makes it more useful than conventional dictionaries wherever word databases are processed computationally. It is an intricate system and consists of a database provided with a complete system of documentation and tracking (Miller, 1990).
The WordNet developed by George Armitage Miller consists of strings of words. Each word has multiple synonyms, and these synonyms pertain to one sense of the word. The WordNet incorporates all of their related meanings, senses and concepts. It contains around 118,000 different word forms and around 90,000 different word senses, which together give around 166,000 pairs of word form and sense. About 17% of the words are polysemous and about 40% have a set of synonyms. Different categories, such as nouns and verbs, are distinguished in this WordNet based on different criteria. For the parsing system, some 300 prepositions and pronouns are important (Miller, 1990).
Inflectional and derivational morphology are also taken into account in this WordNet. Inflectional morphology is a large part of the WordNet system, as it makes it possible to see the other forms of a word, while derivational morphological information is also given a distinct position in it. Different semantic relations are given importance as well (Miller, 1995). The semantic relations mentioned include synonymy, antonymy, hyponymy, meronymy, troponymy and entailment. All these relations are assigned specific categories, and there are almost 116,000 such relations in this WordNet (Miller, 1990).
The systems provided in this WordNet make it possible to search for a required item and find its category. This, in turn, makes it possible to find the exact meronyms and hyponyms of a given word, so that information can be retrieved easily and kept at hand. The issue of polysemy arises when one language is translated into another. Sometimes a word has multiple meanings, which makes it hard to determine its proper translation. It is crucial to take context into consideration in order to find the exact meaning. This WordNet needs considerable development in this regard, as it gives multiple meanings without proper consideration of the context, and it becomes hard to find the relevant meaning of a word. Algorithms are needed to provide the required context, sense identification is needed to find the exact meaning, and proper contextual representation is needed in a WordNet (Miller, 1990).
Many methods have been used to counter this issue in computational linguistics. One way is to limit the discourse. Topical context is another way to solve the problem, and local context is sometimes used as well. Still, a proper system is needed to find the correct meaning according to the context, and its absence creates many problems for people using this WordNet. Semantic concordance is very important in creating links between the lexicon and the words contained in a corpus. It is a small-scale solution to this problem, and a large-scale solution is still required (Miller, 1990).

Aims and Objectives
The aim of the current study is to develop a Saraiki WordNet using the Urdu WordNet as its basis.

Research Questions
This research asked the following questions:
1. What is the process involved in mapping Urdu word senses to Saraiki word senses?
2. How are the word senses of the two languages aligned to help develop the Saraiki WordNet?

Literature Review
The Saraiki language is part of the vibrant culture of Pakistan, which takes on different colours from region to region. Saraiki is spoken in multiple parts of Pakistan, in both the northern and the southern parts of the country, and it has a long history with regard to its origin and dialects.
According to some researchers, a lexical knowledge base is very useful in developing language resources. WordNet is one of the most important lexical knowledge bases and is very useful in language processing, and there are many ways to extend it. It helps in semantic search, text summarization and Word Sense Disambiguation (Fernando & Stevenson, 2012).
Mapping is one of the ways used to enrich WordNets and also to develop them. Both automated and manual methods are important in the development of a WordNet. This is why manual annotation played a very important part in the development of the present WordNet. After a WordNet has been developed, its mappings are put online for access (Fernando & Stevenson, 2012).
There are many semantic relations in WordNet, such as the ones given below.
[Table: Semantic relations in WordNet. Reprinted from "Nouns in WordNet: A Lexical Inheritance System" (Miller, 1998).]
WordNet is very important in Natural Language Processing (NLP) and is an invaluable source in computational linguistics. It works on the basis of a thesaurus. Working to overcome the problems of everyday dictionaries, Miller (1990) developed the first WordNet. It solved problems relating to senses and definitions. WordNet consists of lemmas and senses (Artale, Magnini, & Strapparava, 1997).
There are two ways to map senses in WordNet: manually and automatically. The automatic method uses already available resources to construct a WordNet. It relies on Word Sense Disambiguation (WSD), which takes the words collected from bilingual dictionaries and connects them with the WordNet Synsets. Many WordNets have been developed from dictionaries in this way. Many NLP functions demand compact ontologies, which help in information retrieval. Only a few languages have ontologies, and many languages still lack them. It is very difficult to develop ontologies manually, as this work demands a lot of time and resources. Researchers therefore use already available resources to develop a WordNet, as these resources cover a wide range of lexical knowledge and semantic information. The Korean WordNet is among the WordNets developed by this method. It uses automatic WordNet mapping by utilizing WSD: Korean words from a bilingual machine-readable dictionary (MRD) are linked with the English WordNet Synsets (Lee, Lee & Yun, 2000).
The mapping process involved in the development of this WordNet used several heuristics, and a decision tree helped in the disambiguation process. All heuristics were used either to link or to discard the candidate Synsets. In the instance of the Korean WordNet, manual classification was used to link or discard the 3260 candidate Synsets with the senses found in the Korean bilingual dictionary. Precision and coverage were used to evaluate the process. Precision shows how many of the linked senses were linked correctly, and coverage indicates the proportion of senses that were linked (Lee et al., 2000).
There are two methods used to develop a WordNet. One is the 'expansion' approach and the other is the 'merge' approach. The method employed here is the expansion approach, which has been used previously to develop a number of WordNets. The merge approach is used where extensive resources are available and there are few or no time constraints.
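The two evaluation measures mentioned for the Korean WordNet can be illustrated with a minimal sketch. It assumes that precision is the share of linked senses whose Synset is correct and that coverage is the share of dictionary senses that received a link. The function name, its arguments and the toy data are hypothetical and are not taken from Lee et al. (2000).

```python
def precision_and_coverage(links, gold, dictionary_senses):
    """Return (precision, coverage) for a set of automatic sense links.

    links: dict mapping a dictionary sense to the Synset chosen by the mapper
    gold: dict mapping a dictionary sense to the manually verified Synset
    dictionary_senses: list of all senses that should have been linked
    """
    linked = [s for s in dictionary_senses if s in links]
    correct = [s for s in linked if gold.get(s) == links[s]]
    # Precision: correctly linked senses among all linked senses.
    precision = len(correct) / len(linked) if linked else 0.0
    # Coverage: linked senses among all senses in the dictionary.
    coverage = len(linked) / len(dictionary_senses) if dictionary_senses else 0.0
    return precision, coverage


# Toy data (assumed): four senses, three linked, two of them correctly.
senses = ["s1", "s2", "s3", "s4"]
links = {"s1": "synset_a", "s2": "synset_b", "s3": "synset_c"}
gold = {"s1": "synset_a", "s2": "synset_b", "s3": "synset_x", "s4": "synset_d"}
precision, coverage = precision_and_coverage(links, gold, senses)
```

In this toy case two of the three linked senses are correct and three of the four senses are linked.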
Several methods are used in the expansion approach to develop a WordNet. These methods include (i) cross-lingual WSD, (ii) Google Similarity Distance, (iii) the intersection method, (iv) the multiple heuristic method, (v) combining multiple methods, (vi) the assign procedure, (vii) base concepts and (viii) the MultiDic tool. Cross-lingual WSD uses both Word Sense Induction (WSI) and Word Sense Disambiguation (WSD). This method is discussed by Apidianaki. The first WordNet developed by using it was the French WordNet. The method creates semantically similar groups and, after disambiguation, these groups are placed in their positions in the WordNet. Apidianaki applied the WSI method to an English-Greek corpus. The variations of English words are represented by three Greek equivalents (EQVs) in this WordNet, and every EQV represents a different sense of the given word. To distinguish between the senses and to place semantically similar EQVs in the same cluster, the semantic similarity of each pair is calculated (Nadageri & Haribhakta, 2017).
In the instance of the word 'variation', two words {increase, significant} among the surrounding context features were found in cluster 1, which represents the Greek word διακύμανση with the sense 'fluctuation'. Using this approach, Greek equivalents replaced PWN Synsets to create the Greek WordNet. The performance of cross-lingual WSD was found to be quite promising: approximately 72% of nouns, 62% of verbs, 81% of adjectives and 86% of adverbs were correctly distinguished (Nadageri & Haribhakta, 2017).
Another method, known as Google Similarity Distance, is used to link words with the English Synsets through WSD. To find a suitable link with the Princeton WordNet (Miller, 1990), the similarity between the translated Synsets and the translated definition in the target language was determined. This method was used in the development of the Macedonian WordNet. It was reported that "the result as per the discussion in shows that Google Similarity Distance method has 87% accuracy in assignment of appropriate Synsets. It correctly translates 14,335 English Synsets into Macedonian Synsets" (Nadageri & Haribhakta, 2017).
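As an illustration of how such a distance can be computed, the sketch below assumes that the measure meant here is the Normalized Google Distance, which is calculated from page counts returned by a search engine. The function name and the counts are hypothetical, and nothing in the sketch reproduces the actual Macedonian WordNet pipeline.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance between two terms.

    fx, fy: number of pages containing each term (must be positive)
    fxy: number of pages containing both terms
    n: total number of indexed pages (assumed known)
    """
    if fxy == 0:
        # The terms never co-occur, so they are treated as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Invented counts: a smaller value means the two terms are more closely related.
distance = normalized_google_distance(fx=1000, fy=100, fxy=10, n=10**6)
```

A candidate Synset whose translated words lie closest to the translated definition would then be preferred.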
The intersection method is another method used in the development of WordNets. In this method, synonymy is the main feature, as it is responsible for creating equivalence classes. Two WordNets, the Macedonian and the Romanian, were developed using this method. It involves two rules (Nadageri & Haribhakta, 2017). The first rule states that if the original Synset contains at least one monosemous word, then the translation of that monosemous word is sufficient to translate the other words in the Synset. The second rule states that if the original Synset contains more than one polysemous word, then the intersection of the translations of each word in the Synset forms the translation of the original Synset (Nadageri & Haribhakta, 2017).
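The two rules can be sketched as follows. This is a minimal illustration under stated assumptions: the second rule is applied whenever the Synset contains no monosemous word, and the function name, the sense counts and the placeholder translations (written as `t_...`) are all hypothetical.

```python
def translate_synset(synset, translations, sense_counts):
    """Translate a source Synset with the two intersection-method rules.

    synset: list of source-language words in the Synset
    translations: dict mapping a word to the set of its dictionary translations
    sense_counts: dict mapping a word to its number of senses in the source WordNet
    """
    monosemous = [w for w in synset if sense_counts.get(w, 0) == 1]
    if monosemous:
        # Rule 1: a monosemous word is unambiguous, so its translations
        # are taken as the translation of the whole Synset.
        result = set()
        for word in monosemous:
            result |= translations.get(word, set())
        return result
    # Rule 2: every word is polysemous, so only the translations shared
    # by all words of the Synset are kept.
    candidate_sets = [translations.get(word, set()) for word in synset]
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)


# Rule 1: "auto" has a single sense, so its translation decides.
rule_one = translate_synset(
    ["car", "auto"],
    {"car": {"t_car", "t_wagon"}, "auto": {"t_car"}},
    {"car": 5, "auto": 1},
)

# Rule 2: both words are polysemous, so the shared translation is kept.
rule_two = translate_synset(
    ["bank", "shore"],
    {"bank": {"t_money_house", "t_riverside"}, "shore": {"t_riverside", "t_coast"}},
    {"bank": 2, "shore": 2},
)
```

An empty result signals that the Synset could not be translated automatically and would need manual attention.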
Another method is called 'combining multiple methods', which combines different ways of developing a WordNet. A Homogeneous Bilingual (HBil) dictionary is very useful here and forms the basis of this method. It has word entries in both directions, which makes it easier to work in both languages, and it helps in linking senses with the WordNet. Other methods have also been used in this way, including the class method, the structural method and the conceptual distance method. The class method uses the processed dictionary and a set of criteria to develop words. The criteria used in this method are the polysemic criterion, the hybrid criterion and the field criterion. The structural method takes the whole structure of PW and links it with the Synsets of the Princeton WordNet (Miller, 1990). The criteria involved in this method are the intersection criterion, the parent criterion, the brother criterion and the distant hyperonym criterion. The last method, the conceptual distance method, deals with the closeness between the meanings of words. This closeness is calculated for pairs of words, and monolingual dictionary entries are explored. The Synsets are linked to the Princeton WordNet with an accuracy level of 85% (Nadageri & Haribhakta, 2017).
The Spanish WordNet was built using a combination of these methods, and the results were quite encouraging. In Spanish WordNet v. 0.0, all Synsets with a Confidence Score (CS) of more than 85% were selected and 10,982 connections were obtained. Including discarded Synsets with a CS below but near 85% was found to be acceptable, and these new connections increased the number of connections by 7,244. Finally, Spanish WordNet v. 0.1 was obtained with a greater accuracy of 86.4% (Nadageri & Haribhakta, 2017).

Research Methodology and Corpus Development
The expansion approach is the most widely used method in WordNet development. Lexicographers use this method to build a WordNet by connecting it to another, already developed WordNet, so that the new WordNet carries the format and properties of the existing one.

Dictionaries Used in the Current Study
In this study, different sources were utilised. Of these sources, dictionaries played a big part in the development of WordNet. These dictionaries were both monolingual and bilingual. Bilingual Saraiki-English and Saraiki-Urdu dictionaries proved to be very helpful in conducting this study.

Sources of Corpora
The corpus was compiled from different sources such as newspapers, stories, essays and poetry. It took a long time to compile this diverse corpus, which proved very helpful in providing the necessary examples and in elaborating the concepts. Many WordNets have used a corpus in the process of their development. One such WordNet is the Tatar WordNet (Galieva, Nevzorova & Suleymanov, 2015), also called the Tat WordNet. It uses the Tatar National Corpus as the source from which verbs are collected. Due to the ambiguities regarding the semantics of Turkic and Tatar words, there is a need for a comprehensive language source. The Tatar National Corpus helped to find the correct definitions and to create a hierarchical network in the development of the Tat WordNet. Its use spurred the development of the modern WordNet. It also helped to analyze the various syntactic features and hierarchical networks of semantic relations (Galieva et al., 2015).
Developing a WordNet-like thesaurus of Tatar verbs allowed its authors to combine the experience of traditional Tatar lexicography with modern information technologies. The Tatar National Corpus played an important role in building the Tatar WordNet. The use of corpus technology made it possible to create a resource that adequately reflects the distribution of Tatar words and their lexical-semantic variants in real contextual environments (Galieva et al., 2015).
The real use of language in corpus is beneficial as it yields adequate data to provide definitions in the WordNet. Next is the development of synsets which requires a lot of data and analysis. The Tatar National Corpus helped to find the correct pairs and synsets. These relations and pairs proved pivotal in the analysis and processing of data. The Tatar National Corpus takes into account the verbs but other parts of speech can be processed as well (Galieva et al., 2015).

Disciplines of Corpora
Different sources were used to collect the data manually. Corpora can be developed using either an automatic or a manual method. In one study by Giampieri (2019), the manual method was found to be far more reliable, although very tiresome in processing. These diverse sources provided the amount of data needed to establish the proper use of the language. The different disciplines used in the study are given below.
All of the above sources were utilized to obtain the required data. These corpora helped in cross-checking the data and in authenticating the mapping process. The corpora were compiled in Microsoft Word. This Word document contains untagged corpora, and all of the data still needs to be properly tagged, edited and saved in the Word document.

Converting Data into Machine Readable Form
All the data gathered from different sources was later converted into machine-readable format. For this task, different tools and methods were used. The books were initially in hard copy, and it took tremendous effort to convert them into machine-readable format. Firstly, all the books were scanned using an HP DeskJet All-In-One Printer and the scans were converted into PDF format using the iLovePDF website. In some instances, OCR was also performed with Google Lens, which helped in separating the text from the images. Later on, this text was pasted into a Word file, and a Word document was developed in this way. This data was later combined with data taken from other sources, such as the internet.

Coding the Corpus
The newspaper corpus was given the code NP. The fiction corpus was coded as FT and the essay corpus as ES. These codes were recorded for each corpus, and during compilation they helped to identify the different sources used.

Universal Tag Set
The data was tagged using a POS tag set. Specially designed POS tag sets are freely available for this purpose. These tag sets helped in tagging the data in the Word document and in sorting it into the proper grammatical categories. POS tag sets can be developed from scratch or downloaded. There are many kinds of POS tag sets: some are made for specific purposes and some are made for many languages. The tag sets made for many languages can be used for tagging multiple languages and are known as universal tag sets. Information about the tags of nouns, verbs and other parts of speech helps in determining the grammatical categories of collocated words. For example, knowing the nouns from POS tagging helps in identifying adjectives and other grammatical categories, and the placement of a noun in a phrase indicates the nature of the phrase. POS tagging helps in many ways. One of its benefits is that it supports the extraction of information about people and organizations, which are named entities. Other benefits are found in speech recognition and co-reference resolution (Jurafsky & Martin, 2019).

Benefits of POS Tag Sets
The main benefit of universal tag sets is that they can be used for many languages. Many languages have been tagged with universal tag sets, which have been developed by many researchers. All the languages tagged with a universal tag set form a kind of database in which they can be compared and mapped together. Two widely used universal tag sets are the Universal Dependencies tag set and the Google Universal POS tag set. Both are easily available and can be applied to multiple documents. They are refined tag sets which provide clarity regarding the use of grammatical categories. The Universal Dependencies tag set has 16 tags, and these can be modified further to add the grammatical categories of different languages (Nivre et al., 2016, as cited in Jurafsky & Martin, 2019).

Google Universal Tag Set
The POS tag set used in this study was the Google Universal tag set. It is quite helpful, as it gives the basic details needed for POS tagging. The Google Universal tag set consists of twelve POS tags. It not only provides the tags but also comes with a mapping of 25 treebank tag sets from different languages. These mappings provide what is needed to compare different languages. Combined with the original treebank data, the tag set and the mappings yield a dataset covering 22 different languages in the same place. To check the benefits and use of this tag set, it was put through many experiments. All the treebanks were checked against the Universal POS tag set to establish its reliability, and the tag set was also utilized for unsupervised grammar induction and parsing (Petrov, Das & McDonald, 2011).
The table shows the Google Universal tag set, which is complete and contains all basic grammatical categories. This is a basic tag set and it is very helpful in providing mappings between different languages.
This table gives the total number of words in the corpora. Word type denotes the individual (distinct) words, while word token denotes the frequency of occurrence of these words in the corpora.
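The idea of mapping a treebank-specific tag set onto the twelve universal tags can be sketched as follows. The twelve tags are those of the Google Universal tag set. The small Penn Treebank excerpt and the function name are chosen here for illustration only, since the tag set actually mapped in this study was an Urdu one whose tags are not listed in the paper.

```python
# The twelve tags of the Google Universal tag set.
UNIVERSAL_TAGS = (
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
)

# Illustrative excerpt of a fine-grained-to-universal mapping (Penn Treebank tags).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB",
    "JJ": "ADJ", "RB": "ADV",
    "PRP": "PRON", "DT": "DET",
    "IN": "ADP", "CD": "NUM",
    "CC": "CONJ", "RP": "PRT",
}


def to_universal(tag):
    """Map a fine-grained tag to a universal tag; unknown tags fall back to X."""
    return PTB_TO_UNIVERSAL.get(tag, "X")


tagged = [("books", "NNS"), ("quickly", "RB"), ("of", "IN")]
universal = [(word, to_universal(tag)) for word, tag in tagged]
```

Because every language-specific tag set is reduced to the same twelve categories, tagged data from different languages can be compared directly.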

Development of the WordNet
The process of developing the Saraiki WordNet was marked by various issues. The WordNet was developed using the expansion approach. For this purpose, a complete WordNet was needed to help in the mapping process, and the Urdu WordNet (Zafar et al., 2014) was used. Saraiki words were taken from the corpora developed from news reports, poetry and other sources. Microsoft Excel sheets were used to store the basic database. These sheets were first loaded with the Urdu WordNet (Zafar et al., 2014) acquired from CLE, UET Lahore. The Urdu WordNet was received as a UTF-16 text file, opened in Notepad, and was then loaded into the Excel sheets. Relevant labels were added and the data was refined to fit the requirements. This WordNet was later used as the pivot for further work on developing the Saraiki WordNet.
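The loading step was carried out by hand in Notepad and Excel. A scripted equivalent might look like the sketch below, which assumes a tab-separated UTF-16 text file. The file names, the column layout and the function name are hypothetical, since the layout of the Urdu WordNet file is not described in the study.

```python
import csv
import tempfile
from pathlib import Path


def utf16_text_to_csv(src, dst, delimiter="\t"):
    """Copy a delimited UTF-16 text file into a UTF-8 CSV file that Excel can open.

    Returns the number of non-empty rows written.
    """
    rows_written = 0
    # The "utf-16" codec honours the byte-order mark at the start of the file.
    with open(src, encoding="utf-16", newline="") as fin, \
            open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin, delimiter=delimiter):
            if any(cell.strip() for cell in row):  # skip blank lines
                writer.writerow(row)
                rows_written += 1
    return rows_written


# Usage with a throwaway file; the two-column layout is invented for the demo.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "urdu_wordnet.txt"
    target = Path(tmp) / "urdu_wordnet.csv"
    source.write_text("sense_id\tpos\nS1\tnoun\n\nS2\tverb\n", encoding="utf-16")
    rows_written = utf16_text_to_csv(source, target)
    converted = target.read_text(encoding="utf-8-sig")
```

Writing the output as UTF-8 with a byte-order mark is a design choice made here so that a spreadsheet program detects the encoding of the Urdu text.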

Translation of Urdu Entries
After the loading of data, the process of translating Urdu entries into Saraiki began. Saraiki translation of Urdu entries took into account all the senses of the words and no concept was left out. Hence, there was less confusion and retaining the clarity of senses remained the utmost priority at that stage. These translations were made with the help of native speakers and bilingual dictionaries. These sources helped in doing literal translations of Saraiki words and they also helped in other processes. The literal translations were all documented in the Excel sheets. Later on, they were compared with the corpus for another round of determining their authenticity and also to root out any mistakes or false translations. These entries were stored in a separate database and afterwards helped in finding the correct senses. The translations were the starting point in the mapping process and comprehensive translations were very important at this point.

Role of Corpora
The next process involved the preparation of the corpus. The corpora used were of the three kinds already explained in the section on corpus compilation. They were composed of different sources, which proved to be very diverse and helpful in finding the correct usage of words and in the process of translation. The translations were cross-checked against the corpora and incorporated into a different database. These translations were mapped with the Urdu words. In the mapping process, the previous literal translations helped a lot, as they provided the cornerstone on which the suitability of a sense was decided. Suitable senses were later added as Saraiki words, side by side with the Urdu words, as the final translations of the Urdu words after their evaluation based on the literal translations and the corpora.

Process of Encoding
The corpus helped to find the correct literal translation of words and also acted as a backup resource for relevant examples and concepts. Determining the literal translation of words from the corpus is a long process. The corpora were first compiled in Microsoft Word, with all the relevant words and information. The data was cleaned and stored in Microsoft Word files for future use. It was then saved as UTF-8 plain text in Notepad, which made it suitable for analysis in AntConc (Anthony, 2019). After the data was loaded into the AntConc software (Anthony, 2019), the corpora were ready for analysis. Among the tabs provided for the different types of analysis, the wordlist is one of the most important. It gives the frequency of the words, that is, how many times a single word appears in the corpus. The resultant list was cloned and used for analysis. Moreover, it was also used to find the translations of the words and to cross-check them.

Frequency
The wordlist provides details about each word used in the corpus, including its frequency, which is the number of times the word is used in the corpus. Each corpus was loaded into the AntConc software (Anthony, 2019) and the frequencies were noted down. The frequencies of all the entries were combined to create a complete list of all the words in the corpora. The wordlist helped to find the correct and reliable senses of Saraiki words. As this wordlist was based on the live use of language, it is a reliable and trustworthy source for the WordNet. The translations were added to the database after thorough evaluation, under the label of Saraiki words.
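What the wordlist computes can be illustrated with a short sketch. It is not AntConc itself but a rough stand-in that assumes simple whitespace tokenization; the placeholder texts are invented, while the corpus codes NP, FT and ES are the ones used in this study.

```python
from collections import Counter


def wordlist(text):
    """Count how often each whitespace-separated word form occurs in a text."""
    return Counter(text.split())


# Placeholder texts standing in for the three coded corpora.
corpora = {
    "NP": "w1 w2 w1",
    "FT": "w2 w3",
    "ES": "w1 w3 w3",
}

per_corpus = {code: wordlist(text) for code, text in corpora.items()}

# Combine the three frequency lists into one list for all corpora.
combined = sum(per_corpus.values(), Counter())

word_types = len(combined)            # number of distinct words
word_tokens = sum(combined.values())  # total number of running words
```

The combined list can then be sorted with `combined.most_common()` to see the most frequent candidate words first.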

Concepts
The next step was the addition of concepts which provided the unique explanations of these words. Concepts were provided with the help of the corpora and native speakers. The explanations helped to provide the bases for the addition of further examples. They furnished the meanings and explanations of the literal senses already provided in the WordNet. These explanations were clear and provided complete meanings of these words. Concepts were not ambiguous and solved the problem of the similarity of senses. Whenever a new WordNet is compiled, there is always an issue of similar senses of words. So, in order to solve this issue, corpora were consulted to provide the clarity of meaning and also to remove the ambiguities. The compiled corpora helped not only in providing clarity but also in providing related examples from the live use of language. The concepts helped to end the ambiguities. They were added right beside the Saraiki senses and together they helped to provide the basic explanations of the Saraiki words.

POS Tags
The next important step was assigning a POS to the senses of words. After the corpora were loaded into AntConc (Anthony, 2019), the wordlist was generated, and it revealed the exact number of times each word appeared in the corpora. The results from the wordlist were later cloned and copied to a Microsoft Word file. This file was used as the basis for the tagging process, in which an Urdu tag set was used to assign the relevant and suitable part of speech as the grammatical tag of each Saraiki sense. The words in the corpora were tagged according to their grammatical category. Later on, these tags were used to provide the Saraiki senses from the corpora with the POS categories of verb, noun or adjective. The tagging process provided the correct grammatical categories which, in turn, helped in the mapping process. The grammatical categories were very carefully mapped with the Urdu WordNet (Zafar et al., 2014).

Examples
Relevant and suitable examples were assigned to the senses. These examples were taken from the corpus and provided context exemplifying the use of the language. To trace examples, concordance lines were used to find the relevant word, and then the whole sentence was incorporated into the WordNet. The concordance tool in the AntConc software (Anthony, 2019) helped to find all the occurrences related to a word. With the context in sight, it becomes far easier to find a suitable occurrence which shows all the qualities of a word and does not leave out any meaning.
After the data was loaded into AntConc (Anthony, 2019), the wordlist was created. The results were cloned and the chosen words were then processed in the Concordance tab of the tool. The concordance process helped to find the context of the words, and the sentence which best explained the given word was chosen. Full sentences were retrieved in File View and added to the WordNet. Concordance yielded different instances of a word used in various contexts, and the context which matched the sense of the word in the Saraiki language was chosen for the WordNet. The concordance line is thus the criterion for finding a suitable word to add to the WordNet. The process incorporated all three corpus files, so the whole corpora were searched and no file was left out. File View provides the details regarding the use of words in the corpora. When a certain word was chosen from the wordlist, its concordance was searched to find the different instances of that word. By clicking on an instance, the use of that word in different sentences of the corpora was determined. The one instance which was most suitable to the sense was added to the developing WordNet.
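The kind of keyword-in-context lines that a concordance tool displays can be approximated with the following sketch. It is a simplified stand-in for the AntConc Concordance tab, written under the assumption of whitespace-tokenized text; the function name and the placeholder tokens are hypothetical.

```python
def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Placeholder tokens standing in for a tokenized Saraiki corpus file.
tokens = "a b c d b e".split()
hits = concordance(tokens, "b", window=1)
```

Each returned line shows one occurrence of the word with its neighbours, from which the most suitable example sentence can be picked by hand.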

Mapping of Senses
The WordNet senses of Saraiki and Urdu words were mapped onto each other. The mapping process involved both the Urdu and the Saraiki words in order to determine which of them could be mapped and which could not. The Urdu senses from the Urdu WordNet (Zafar et al., 2014) were already loaded into the Excel sheets. Saraiki words, which were chosen from the corpora, were added to the database, and the Urdu and Saraiki words were then compared. Some of the senses matched, while some did not. The meaning and translation of the words must be clear so that no mistakes are made, and therefore only suitable words were aligned with Urdu words. The mapping process revealed the similarities and differences between the two languages. For a match, there should be similarity in the use of language in the word senses as well as in the other categories. The mapping of senses gave information regarding the overall similarity between different words based on their literal translations. The senses were all based on the Saraiki literal translations and the corpora. Successful mapping meant that words were properly aligned on the basis of their senses, concepts and examples. Any discrepancy between these categories resulted in a 'no match'. Some of the ambiguous and confusing entries were removed from the developed WordNet. The entries which were most compatible with the Urdu word senses were chosen and the rest were removed to avoid any problem. Table 4.1 shows the word senses found in the Urdu WordNet (Zafar et al., 2014).
Table 4.1: Word senses in the Urdu WordNet (Zafar et al., 2014)
Adverb    97
Adjective    1067
The unmapped word senses were considered 'no match' and are marked as such in the WordNet. Some words were hard to map. These words were aligned side by side while keeping in view each concept and grammatical category.
If the translations of words were not found in the corpora, the words were removed from the WordNet and were not aligned with any source. The words which were mapped are all shown in the Saraiki word column, aligned with the Urdu WordNet (Zafar et al., 2014). For the authenticity of the data, it is important that information is taken from the corpus. Without corpora, it would be hard to find instances of the use of Saraiki words.
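The decision described above was made manually in Excel. Its logic can be summarised in a minimal sketch, assuming that a sense counts as mapped only when a Saraiki candidate exists, is attested in the corpora and has the same part of speech as the Urdu sense. All names, fields and placeholder entries are hypothetical.

```python
from dataclasses import dataclass

NO_MATCH = "no match"


@dataclass
class UrduSense:
    sense_id: str
    word: str
    pos: str


@dataclass
class SaraikiEntry:
    word: str
    pos: str
    concept: str
    example: str


def map_senses(urdu_senses, candidates, corpus_vocabulary):
    """Align each Urdu sense with its Saraiki candidate or mark it as no match.

    candidates: dict mapping an Urdu sense id to the translated SaraikiEntry
    corpus_vocabulary: set of Saraiki word forms attested in the corpora
    """
    mapping = {}
    for sense in urdu_senses:
        entry = candidates.get(sense.sense_id)
        if entry and entry.word in corpus_vocabulary and entry.pos == sense.pos:
            mapping[sense.sense_id] = entry
        else:
            mapping[sense.sense_id] = NO_MATCH
    return mapping


# Placeholder data: only the first sense has an attested, POS-compatible candidate.
urdu = [
    UrduSense("u1", "urdu_word_1", "noun"),
    UrduSense("u2", "urdu_word_2", "verb"),
    UrduSense("u3", "urdu_word_3", "adjective"),
]
candidates = {
    "u1": SaraikiEntry("saraiki_word_1", "noun", "concept text", "example sentence"),
    "u2": SaraikiEntry("saraiki_word_2", "verb", "concept text", "example sentence"),
}
result = map_senses(urdu, candidates, corpus_vocabulary={"saraiki_word_1"})
```

In the study itself the comparison of senses, concepts and examples relied on human judgement, which a simple equality check like this one cannot replace.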

Unmapped Senses
There were a number of senses which remained unmapped. These unmapped word senses were removed from the WordNet. Not many senses of Saraiki words were found in the Urdu WordNet (Zafar et al., 2014). Occasionally, some word senses from the Urdu WordNet (Zafar et al., 2014) were not present in the literal translation of Saraiki words. Urdu WordNet (Zafar et al., 2014) has a total number of 5132 senses in it and 2910 senses were matched. The remaining Urdu word senses were not present in the Saraiki corpora and were removed from the WordNet.
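The proportion of mapped senses follows directly from the two totals reported above. The short calculation below uses only those two figures; the variable names are chosen for illustration.

```python
total_urdu_senses = 5132   # senses in the Urdu WordNet
matched_senses = 2910      # senses matched with Saraiki senses

unmatched_senses = total_urdu_senses - matched_senses
matched_share = matched_senses / total_urdu_senses

print(f"Unmatched senses: {unmatched_senses}")
print(f"Matched share: {matched_share:.1%}")
```

This gives 2222 unmatched senses and a matched share of about 56.7% of the Urdu word senses.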

Conclusion
WordNet is a great source of lexical information. Different WordNets have been developed in the past.
In the current study, Saraiki WordNet was developed by mapping Urdu and Saraiki word senses. Urdu word senses are part of the Urdu WordNet (Zafar et al., 2014) and these were mapped onto the Saraiki word senses. There are many methods used to develop a WordNet. This Saraiki WordNet was developed using the expansion approach, which is one of the best ways to build a WordNet. The expansion approach helps in linking two WordNets and also takes into consideration already existing resources.
This study was limited by the constraints of the data. Only corpus data was utilized during the process of WordNet development. Corpora exemplify a limited use of language. Hence, the data was limited to the number of instances provided in the corpora.
This study will prove useful for many researchers working on the Saraiki language. Future researchers working on different reports or dissertations can get useful data from this study and its results.
The biggest advantage that this study confers is the corpora developed for it. The corpora of the Saraiki language show its use in different settings and in different varieties. Because of their diversity, the corpora will prove useful for students and researchers working on this language. Any part of the corpora can be used in future research dealing with corpus analysis or NLP.
These corpora are helpful for building future bilingual dictionaries as well. Bilingual dictionaries contain information related to two languages. This research is based on mapping Saraiki and Urdu word senses and paves the way for a future bilingual Saraiki-Urdu dictionary.