
Word2Vec vocabulary not defined error


I am new to Python and word2vec, and I keep getting a "you must first build vocabulary before training the model" error. What is wrong with my code?

Here is my code:

    file_object=open("SupremeCourt.txt","w")
    from gensim.models import word2vec
    
    data = word2vec.Text8Corpus('SupremeCourt.txt')
    model = word2vec.Word2Vec(data, size=200)
    
    out=model.most_similar()
    
    print(out[1])
    print(out[2])

Solution

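  • I can see two problems in your code. First, the file is opened in write mode ("w"), which truncates SupremeCourt.txt to zero bytes. Text8Corpus then reads an empty corpus, Word2Vec has no words to build a vocabulary from, and that raises the error. You do not need to open the file yourself at all, because Text8Corpus takes the path directly. Second, most_similar needs the word whose neighbours you want, and that word must be in the model's vocabulary, otherwise gensim raises a KeyError. The usage in gensim is out = model.wv.most_similar("word-name") (in gensim 4.x the method is reached through model.wv). I would also suggest either loading pretrained vectors such as the Google News vectors in gensim or building your own word2vec model from a non-empty text, so that you don't get the error.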

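    from gensim.models import word2vec
    
    # Read the existing file; opening it with "w" would empty it first.
    data = word2vec.Text8Corpus('SupremeCourt.txt')
    
    # gensim 4.x calls this parameter vector_size (it was size in 3.x).
    # min_count=1 keeps rare words, which matters for a small corpus.
    model = word2vec.Word2Vec(data, vector_size=200, min_count=1)
    # Alternatively, load pretrained vectors such as Google News here.
    
    # "word" is a placeholder: pass a word that occurs in your corpus.
    out = model.wv.most_similar("word")
    print(out)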
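To see why opening the file with "w" leads to this error, here is a minimal standard-library sketch that does not use gensim at all. `build_vocab` and `check_vocab` are hypothetical helpers that only approximate what gensim does: whitespace tokenization, a `min_count` cutoff (gensim's default is 5), and a check that the vocabulary is non-empty before training. The sample sentence is made up for illustration.

```python
import os
import tempfile
from collections import Counter


def build_vocab(path, min_count=5):
    """Hypothetical, simplified stand-in for vocabulary building (NOT gensim's code):
    split the file on whitespace, count tokens, keep those seen >= min_count times."""
    with open(path, "r") as f:
        counts = Counter(f.read().split())
    return {word: n for word, n in counts.items() if n >= min_count}


def check_vocab(vocab):
    # An empty vocabulary is the condition behind gensim's
    # "you must first build vocabulary before training the model" error.
    if not vocab:
        raise RuntimeError("you must first build vocabulary before training the model")


path = os.path.join(tempfile.mkdtemp(), "SupremeCourt.txt")
with open(path, "w") as f:
    f.write("the court held that the court may review the case\n")

# Reading the file leaves its contents intact.
print(build_vocab(path, min_count=1)["court"])  # 2

# With min_count=5, every word in this tiny file is dropped.
print(build_vocab(path))  # {}

# Opening with "w" (as in the question) truncates the file to zero bytes.
open(path, "w").close()
print(os.path.getsize(path))  # 0

try:
    check_vocab(build_vocab(path, min_count=1))
except RuntimeError as err:
    print("RuntimeError:", err)
```

The empty vocabulary can therefore come from two places: an empty file, or a file whose words all fall below `min_count`.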