There's this article about sentiment analysis of Arabic.
At the beginning of page 5 it says:
"Experiments also show that stemming words before feature extraction and classification nearly always degrades the results".
Later on in the same page, they state that:
"...and an Arabic light stemmer is used for stemming the words"
I thought that a stemmer/lemmatizer was always used before text classification. Why do the authors say that it degrades the results?
Thanks :)
I do not know Arabic, and it may differ in many respects, so my answer concerns English.
"I thought that a stemmer/lemmatizer was always used before text classification. Why do the authors say that it degrades the results?"
No, it is not; it depends entirely on the task. If you want to extract the general topic of a text, then stemming/lemmatization is a good step. But when analyzing short chunks of text, where each word is valuable, stemming can destroy the distinctions you need. In particular, in sentiment analysis stemming may strip away the part of a word that carries its sentiment.
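To make this concrete, here is a minimal sketch in Python. The `toy_stem` function and its suffix list are made up for illustration; they are not Porter's algorithm or any real library's stemmer, and real stemmers differ in which suffixes they remove. The point is only that once a stemmer strips a sentiment-bearing suffix, two texts with opposite sentiment can end up with identical features.

```python
# Toy suffix-stripping stemmer (hypothetical, for illustration only;
# not Porter's algorithm or any real library's implementation).
SUFFIXES = ("less", "ful", "ness", "ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def features(text, stem=False):
    """Build a bag-of-words feature set, optionally stemming each token."""
    tokens = text.lower().split()
    return {toy_stem(t) if stem else t for t in tokens}


positive = "a hopeful ending"
negative = "a hopeless ending"

# Without stemming, the two texts have different feature sets.
print(features(positive) == features(negative))

# With stemming, "hopeful" and "hopeless" both become "hope",
# so a classifier would see the same features for both texts.
print(features(positive, stem=True) == features(negative, stem=True))
```

For topic classification, merging "hopeful" and "hopeless" is harmless or even helpful, since both are about hope. For sentiment, the suffix was the signal, which is one plausible reason stemming could hurt results.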