nlp svd lemmatization lsa latent-semantic-analysis

Latent Semantic Analysis and Stemming


Assume a very large corpus in some inflected language. Does the following make sense? When LSA is applied to such a corpus, words that express similar concepts end up close together in the vector space, so inflected word forms referring to the same concept should ideally land on (or very near) their lemma in that space. Under this assumption, no lemmatization or stemming of the queries or the corpus would be necessary. Or am I totally wrong?


Solution

  • According to the creators of LSA, stemming is not necessary. That said, I don't think the literature agrees on this: I have read a few papers in which stemming improved results on a given information retrieval task.

    More generally, recent research suggests that stemming does not help in topic modeling and may even hurt topic coherence.
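To make the hypothesis in the question concrete, here is a minimal sketch of LSA on a toy corpus, using only the Python standard library. Everything in it is an assumption made for illustration: the eight made-up documents, the helper names (`term_document_matrix`, `top_eigenpairs`, `lsa_term_vectors`, `cosine`), raw counts instead of TF-IDF weighting, and power iteration with deflation standing in for a real SVD routine. It only shows the mechanism by which unstemmed forms can be pulled together; it says nothing about how well that works on a real corpus.

```python
import math
import random

# Toy corpus, made up for illustration: inflected pairs such as cat/cats
# never share a document, but they do share context words ("fish", "sofa").
DOCS = [
    "cat eats fish", "cats eat fish",
    "cat sleeps sofa", "cats sleep sofa",
    "bank raises rates", "banks raise rates",
    "bank lends money", "banks lend money",
]

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w]][j] += 1.0
    return vocab, counts

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpairs(sym, k, iters=300, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix: power iteration + deflation."""
    rng = random.Random(seed)
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        # A fresh random start for every component.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(work, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next round.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_term_vectors(docs, k=2):
    """Map each term to its k-dimensional LSA vector (row of U_k * S_k)."""
    vocab, a = term_document_matrix(docs)
    # Eigenvectors of A A^T are the left singular vectors of A;
    # its eigenvalues are the squared singular values.
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]
    pairs = top_eigenpairs(gram, k)
    return {
        w: [math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
        for i, w in enumerate(vocab)
    }

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

vocab, raw = term_document_matrix(DOCS)
vectors = lsa_term_vectors(DOCS, k=2)
print("raw  cat vs cats:", cosine(raw[vocab.index("cat")], raw[vocab.index("cats")]))
print("LSA  cat vs cats:", round(cosine(vectors["cat"], vectors["cats"]), 3))
print("LSA  cat vs bank:", round(cosine(vectors["cat"], vectors["bank"]), 3))
```

In the raw count space "cat" and "cats" are orthogonal because they never co-occur, while in the rank-2 space they point in essentially the same direction and stay separated from "bank". Note that with k=2 on this corpus every animal-related term collapses onto that same direction, so the sketch illustrates the effect the question describes but does not show that LSA recovers lemmas specifically. On real data, rare inflected forms may not have enough shared context for this to happen, which is one plausible reason the papers disagree.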