c#arabicwordnet

Arabic WordNet with plural words


I am using Arabic WordNet with C# to get the synonyms of a singular word such as "عرض", and I get the following synonyms: علامة, أمارة, شدة, ضر, شؤم, بلية, etc.
My question is: is there a way to get the synonyms of a plural word, such as "علامات", from Arabic WordNet?
I need this because I have not found a way to derive the singular from a plural in Arabic, for example "علامات" => "علامة".
I appreciate any help you can provide.


Solution

  • I solved this by editing the awn.xml file and adding all the plural words I needed. For example, the word "عرض" has the plural "أعراض", whose synonyms are علامات, أمارات, شدائد, بلايا, and أضرار. First add a synset item for each plural, as follows:

    <wordnet version="20">
    <item itemid="&gt;aArad_n1AR" offset="102231120" lexfile="" name="أعراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="1" />
    <authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
    <item itemid="&gt;aMrad_n1AR" offset="102231121" lexfile="" name="أمراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="2" />
    <authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
    <item itemid="&gt;Isteqsa'at" offset="102231121" lexfile="" name="استقصاءات" type="synset" headword="" POS="n" source="" gloss="" authorshipid="3" />
    <authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
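
For many plurals, the same kind of synset item could be appended programmatically instead of by hand. The following is a minimal sketch using LINQ to XML, assuming the item elements sit directly under the wordnet root as in the fragment above. AwnItemWriter and AddSynsetItem are hypothetical names, not part of the Arabic WordNet API. Note that the XML parser exposes the escaped `&gt;` in an itemid as a plain `>`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the Arabic WordNet API.
public static class AwnItemWriter
{
    // Appends a synset <item> to the <wordnet> root unless an item with the
    // same itemid is already present. Returns true when an element was added.
    public static bool AddSynsetItem(XElement wordnet, string itemId,
                                     string offset, string name, string authorshipId)
    {
        bool exists = wordnet.Elements("item")
            .Any(e => (string)e.Attribute("itemid") == itemId);
        if (exists)
            return false;

        // Attribute names and empty values mirror the hand-written fragment.
        wordnet.Add(new XElement("item",
            new XAttribute("itemid", itemId),
            new XAttribute("offset", offset),
            new XAttribute("lexfile", ""),
            new XAttribute("name", name),
            new XAttribute("type", "synset"),
            new XAttribute("headword", ""),
            new XAttribute("POS", "n"),
            new XAttribute("source", ""),
            new XAttribute("gloss", ""),
            new XAttribute("authorshipid", authorshipId)));
        return true;
    }
}
```

Calling AddSynsetItem on the root of a document loaded with XDocument.Load and then saving the document would persist the new item; working on a copy of awn.xml first is the safer choice.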
    

    Then add the synonyms to each synset as follows:

    <authorship author="ali" date="20180215" score="" comment="From suggested word" covering="0" authorshipid="12136" />
    <word wordid="&lt;aArad_n1AR" value="أعراض" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;aArad_n1AR" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;$araat" value="إشارات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;$araat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;Alamat" value="علامات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;Alamat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;$adaed" value="شدائد" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;$adaed" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;adrar" value="أضرار" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;adrar" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;balaya" value="بلايا" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;balaya" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;tawar'a" value="طوارئ" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;tawar'a" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;fawajea" value="فواجع" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;fawajea" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;fawadeh" value="فوادح" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;fawadeh" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;kawareth" value="كوارث" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;kawareth" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;mehan" value="محن" synsetid="&gt;aArad_n1AR"  type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;mehan" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;makrohat" value="مكروهات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;makrohat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;masaeb" value="مصائب" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;masaeb" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;masawea" value="مساوئ" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أعراض" wordid="&lt;masawea" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;Elal" value="علل" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أمراض" wordid="&lt;Elal" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;Ellat" value="علات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أمراض" wordid="&lt;Ellat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;Eatilalat" value="اعتلالات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أمراض" wordid="&lt;Eatilalat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;Da'aat" value="داءات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أمراض" wordid="&lt;Da'aat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;waakat" value="وعكات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أمراض" wordid="&lt;waakat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;askaam" value="أسقام" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أمراض" wordid="&lt;askaam" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;$akawa" value="شكاوى" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أمراض" wordid="&lt;$akawa" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;aMrad_n1AR" value="أمراض" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="أمراض" wordid="&lt;aMrad_n1AR" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;Fohosat" value="فحوصات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="استقصاءات" wordid="&lt;Fohosat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;Taharieat" value="تحريات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="استقصاءات" wordid="&lt;Taharieat" type="brokenPlural" authorshipid="12137" />
    <word wordid="&lt;Isteqsa'at" value="استقصاءات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
    <form value="استقصاءات" wordid="&lt;Isteqsa'at" type="brokenPlural" authorshipid="12137" />
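
Each plural above is written as a word element followed by a form element that shares its wordid. A minimal sketch of generating such a pair with LINQ to XML is shown below. AwnWordWriter and AddPluralWord are hypothetical names, and the form value is passed in separately because the entries above store the synset's headword there rather than the word's own value.

```csharp
using System.Xml.Linq;

// Hypothetical helper, not part of the Arabic WordNet API.
public static class AwnWordWriter
{
    // Appends one <word>/<form> pair to the <wordnet> root, using the same
    // attribute names and the "brokenPlural" type seen in the entries above.
    public static void AddPluralWord(XElement wordnet, string wordId, string value,
                                     string synsetId, string formValue, string authorshipId)
    {
        wordnet.Add(
            new XElement("word",
                new XAttribute("wordid", wordId),
                new XAttribute("value", value),
                new XAttribute("synsetid", synsetId),
                new XAttribute("type", "brokenPlural"),
                new XAttribute("frequency", ""),
                new XAttribute("corpus", ""),
                new XAttribute("authorshipid", authorshipId)),
            new XElement("form",
                new XAttribute("value", formValue),
                new XAttribute("wordid", wordId),
                new XAttribute("type", "brokenPlural"),
                new XAttribute("authorshipid", authorshipId)));
    }
}
```

For example, AddPluralWord(root, "<Alamat", "علامات", ">aArad_n1AR", "أعراض", "12137") would produce a pair with the same attributes as the "علامات" entry above.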
    

    Now, when we execute the following code snippet:

            // Word ids of every entry whose value is the plural we are looking up.
            List<string> wordIds = _awn.Get_List_Word_Id_From_Value("علامات");
            List<string> synonyms = new List<string>();
            if (wordIds != null)
            {
                foreach (string wordId in wordIds)
                {
                    // Look up the synset of this word id, then every word id in that synset.
                    string synsetId = _awn.Get_Synset_ID_From_Word_Id(wordId);
                    List<string> memberIds = _awn.Get_List_Word_Id_From_Synset_ID(synsetId);
                    foreach (string memberId in memberIds)
                    {
                        // Collect each member's value once.
                        string value = _awn.Get_Word_Value_From_Word_Id(memberId);
                        if (!synonyms.Contains(value))
                            synonyms.Add(value);
                    }
                }
            }
    

    we get the members of the synset that contains "علامات" in the synonyms list, for example "أعراض", "إشارات", "شدائد", "أضرار", and "بلايا". These are the plural forms of the synonyms of the word "عرض".
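
To check the edited file independently of the API, the same lookup can be run directly against the XML. This is a minimal sketch using LINQ to XML, assuming the word elements carry value and synsetid attributes as in the entries above. AwnLookup and GetSynsetMembers are hypothetical names, not part of the Arabic WordNet API.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the Arabic WordNet API.
public static class AwnLookup
{
    // Returns the distinct values of all <word> elements that share a synset
    // with any <word> whose value equals the given text. The looked-up word
    // itself is included, as in the API-based snippet.
    public static List<string> GetSynsetMembers(XElement wordnet, string value)
    {
        List<XElement> words = wordnet.Descendants("word").ToList();

        // Synsets that contain the looked-up value.
        List<string> synsetIds = words
            .Where(w => (string)w.Attribute("value") == value)
            .Select(w => (string)w.Attribute("synsetid"))
            .Distinct()
            .ToList();

        // Every word value that belongs to one of those synsets.
        return words
            .Where(w => synsetIds.Contains((string)w.Attribute("synsetid")))
            .Select(w => (string)w.Attribute("value"))
            .Distinct()
            .ToList();
    }
}
```

Passing the root of the loaded awn.xml and the text "علامات" should list the words that share a synset with it, which makes a mistyped synsetid in a hand-edited entry easy to spot.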