I am new to Python and word2vec and keep getting a "you must first build vocabulary before training the model" error. What is wrong with my code?
Here is my code:
file_object=open("SupremeCourt.txt","w")
from gensim.models import word2vec
data = word2vec.Text8Corpus('SupremeCourt.txt')
model = word2vec.Word2Vec(data, size=200)
out=model.most_similar()
print(out[1])
print(out[2])
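I can see two problems in your code. First, the file is opened in write mode ("w"), which truncates SupremeCourt.txt to an empty file, so there is no text left to build a vocabulary from; that is what triggers the error. You will need to restore the file's contents before training again. Second, most_similar() is called without a word, and the word you look up has to be one that the model's vocabulary actually contains.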
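To see why write mode alone is enough to cause the vocabulary error, here is a minimal sketch using only the standard library. The temporary file corpus.txt is a made-up stand-in for SupremeCourt.txt, and the sentence written to it is illustrative.

```python
import os
import tempfile

# Hypothetical stand-in for SupremeCourt.txt, created in a temp directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "corpus.txt")
sample = "the court ruled that the appeal was denied\n"

with open(path, "w") as f:
    f.write(sample)
size_before = os.path.getsize(path)

# Opening in "w" mode truncates the file immediately, even if nothing is written.
open(path, "w").close()
size_after_write_mode = os.path.getsize(path)

# Restore the content, then open in "r" mode: reading leaves the text intact.
with open(path, "w") as f:
    f.write(sample)
with open(path, "r") as f:
    text = f.read()

print(size_before, size_after_write_mode, len(text.split()))
```

After the "w" open the file size is 0, so a corpus reader pointed at that file finds no words at all.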
I suggest either loading a pretrained model into gensim, such as the Google News vectors, or training your own word2vec model on a non-empty corpus, so that the vocabulary is actually built.
The usage of most_similar in gensim is: out = model.most_similar("word-name")
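from gensim.models import word2vec

# Text8Corpus opens the file itself for reading, so no separate open() is needed.
# Never open the corpus with mode "w": that truncates it to an empty file.
data = word2vec.Text8Corpus('SupremeCourt.txt')
# Train on your own corpus, or load pretrained vectors such as Google News instead.
# In gensim 4.x, size was renamed to vector_size.
model = word2vec.Word2Vec(data, size=200)
# most_similar needs a word that is in the model's vocabulary.
# In gensim 4.x, call model.wv.most_similar("word") instead.
out = model.most_similar("word")
print(out)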