nlp gensim word2vec

Stopword removal when using word2vec


I have been trying word2vec for a while now, using gensim's word2vec library. My question is: do I have to remove stopwords from my input text? Based on my initial experimental results, I can see words like 'of' and 'when' (stopwords) popping up when I call model.most_similar('someword').

However, I haven't seen anything saying that stopword removal is necessary with word2vec. Is word2vec supposed to handle stopwords even if you don't remove them?
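Some context on whether word2vec "handles" stopwords by itself: the original word2vec paper (Mikolov et al., 2013) introduced subsampling of frequent words, where each occurrence of a word w with relative corpus frequency f(w) is discarded during training with probability P = 1 - sqrt(t / f(w)). gensim exposes the threshold t as the `sample` parameter of `Word2Vec` (default 1e-3) but uses a slightly different variant of the formula, so the sketch below only illustrates the paper's version. The word frequencies are made up for illustration.

```python
import math


def discard_probability(word_freq: float, t: float = 1e-3) -> float:
    """Probability that one occurrence of a word is dropped during training,
    per the subsampling formula in Mikolov et al. (2013):
    P(discard) = 1 - sqrt(t / f), clipped at 0 so rare words are never dropped.
    Note: gensim implements a variant of this, not this exact formula."""
    return max(0.0, 1.0 - math.sqrt(t / word_freq))


# Hypothetical relative frequencies (fraction of all tokens in a corpus).
freqs = {"the": 0.05, "of": 0.03, "when": 0.002, "word2vec": 0.00001}
for word, f in freqs.items():
    print(f"{word:10s} f={f:<8} P(discard)={discard_probability(f):.3f}")
```

Very frequent words such as "the" and "of" are dropped most of the time, while rare words are always kept. This reduces the influence of stopwords but does not remove them entirely, which is consistent with seeing them in `most_similar` results.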

Which preprocessing steps are a must? (For topic modeling, for example, stopword removal is almost mandatory.)


Solution

  • Personally, I think removing stopwords will give better results; check the link.

    Also, for topic modeling you should preprocess the text. The following steps are a must:

    1. Stopword removal.
    2. Tokenization.
    3. Stemming and lemmatization.
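A minimal pure-Python sketch of these steps, producing lists of tokens (the input shape gensim's `Word2Vec` and most topic-modeling tools expect). Everything here is an illustrative assumption: the tiny `STOPWORDS` set, the regex tokenizer, and the toy `crude_stem` suffix-stripper are hypothetical stand-ins, not the Porter algorithm. In practice you would use a fuller stopword list (for example gensim's `gensim.parsing.preprocessing.STOPWORDS` or NLTK's) and a real stemmer or lemmatizer (for example NLTK's `PorterStemmer` or `WordNetLemmatizer`). Lemmatization needs a dictionary or morphological analyzer and is not shown. Tokenization runs first here, since stopwords are matched against tokens.

```python
import re

# Small illustrative stopword list (hypothetical; use a fuller list in practice).
STOPWORDS = {"a", "an", "and", "the", "of", "when", "is", "are",
             "to", "in", "on", "it"}

# Suffixes removed by the toy stemmer below, checked in this order.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters.
    A toy suffix-stripper for illustration only, NOT the Porter algorithm."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize, remove stopwords, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stopwords(tokenize(text))]


corpus = [
    "The cats are sitting on the mat when it rains.",
    "Word vectors of similar words end up close together.",
]
sentences = [preprocess(doc) for doc in corpus]
print(sentences)
```

The toy stemmer turns "sitting" into "sitt", which shows why a proper stemmer or lemmatizer is preferable. The resulting `sentences` (an iterable of token lists) could then be passed as the `sentences` argument of gensim's `Word2Vec`.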