
Python ISRIStemmer for Arabic text


I am running the following code in IDLE (Python 2). I want to enter an Arabic string and get its stem, but it doesn't work:

>>> from nltk.stem.isri import ISRIStemmer
>>> st = ISRIStemmer()
>>> w= 'حركات'
>>> join = w.decode('Windows-1256')
>>> print st.stem(join).encode('Windows-1256').decode('utf-8')

The output is the same text as w, 'حركات', rather than its stem.

But when I do the following:

>>> print st.stem(u'اعلاميون')

it succeeds and returns the stem, 'علم'.

Why does passing some words to the stem() function not return the stem?
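A short Python 3 sketch of the likely cause follows. It assumes, as the working decode('utf-8') call in the solution suggests, that the bytes IDLE stored in w were UTF-8. Python 3 string literals are already Unicode, so the sketch calls .encode('utf-8') to reproduce a Python 2 byte string. It uses only the standard library and does not call ISRIStemmer itself.

```python
# Reproduce the Python 2 situation: the literal typed in IDLE arrives as
# UTF-8 encoded bytes (Python 3 needs an explicit .encode() to get bytes).
word = "حركات"
raw = word.encode("utf-8")

# Decoding these UTF-8 bytes as Windows-1256 does not raise an error; it just
# yields a different string (one character per byte, so two per Arabic letter).
wrong = raw.decode("windows-1256")
right = raw.decode("utf-8")

print(len(word), len(wrong))
print(right == word, wrong == word)

# Reversing the wrong decode restores the original bytes, so a string that the
# stemmer left unchanged comes back out looking exactly like the input word.
round_trip = wrong.encode("windows-1256").decode("utf-8")
print(round_trip == word)
```

This matches the symptom in the question: the stemmer receives a mis-decoded string that is not the Arabic word, returns it unstemmed, and the encode/decode round trip turns it back into 'حركات'.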


Solution

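  • OK, I solved the problem myself with the following:

    w = 'حركات'
    # Python 2: w is a byte string holding UTF-8, so decode it as UTF-8
    st.stem(w.decode('utf-8'))

    and it correctly returns the root, "حرك".

    The original attempt failed because the bytes in w were UTF-8, not Windows-1256. Decoding them as Windows-1256 produced a different string containing none of the affixes ISRIStemmer strips, so stem() returned it unchanged, and the encode('Windows-1256').decode('utf-8') round trip then turned it back into 'حركات'.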
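For Python 3, a minimal sketch of the same fix is below. The helper names to_text and stem_arabic are hypothetical, introduced only for illustration; the only NLTK call used is ISRIStemmer().stem() from the question. It assumes input bytes are UTF-8 unless another encoding is passed.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with the given encoding.

    In Python 3 a literal such as 'حركات' is already str, so only bytes
    (for example, data read from a file opened in binary mode) need decoding.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def stem_arabic(value, encoding="utf-8"):
    """Return the ISRI stem of an Arabic word given as str or encoded bytes."""
    # Imported inside the function so to_text() is usable without NLTK installed.
    from nltk.stem.isri import ISRIStemmer

    return ISRIStemmer().stem(to_text(value, encoding))
```

With NLTK installed, stem_arabic('حركات') and stem_arabic('حركات'.encode('utf-8')) should both return the root reported above, 'حرك'. Bytes in another encoding need the matching encoding argument, e.g. encoding='windows-1256'. Creating a new ISRIStemmer per call keeps the sketch short; when stemming many words, create one instance and reuse it.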