python word2vec

word2vec vocab vs char


I'm using word2vec to represent my words as vectors.

import numpy as np
from gensim.models import word2vec as w2v  # assumed import behind the w2v alias

text = np.loadtxt("file.txt", dtype=str, delimiter=" ")
word2vec = w2v.Word2Vec(text, size=100, window=5, min_count=5, workers=4)
print(len(word2vec.wv.vocab))

text is a list of words (strings). Instead of printing the number of distinct words, this code prints 26, the number of letters in the English alphabet. To train word2vec for my model, I need to work with words, not letters. I tried converting text to a single string, but that didn't help. What am I doing wrong?


Solution

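  • I believe you need to pass a list of lists of words. Word2Vec expects an iterable of sentences, each of which is a list of tokens. Given a flat list of strings, it treats every string as a sentence and every character as a token, which is why the vocabulary ends up with 26 entries. Reshaping turns each word into its own one-word sentence: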

    word2vec = w2v.Word2Vec(text.reshape(-1, 1), size=100, window=5, min_count=5, workers=4)
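
To see why the shape of the input matters, here is a minimal sketch that does not need gensim. `scan_vocab` and `chunk_words` are hypothetical helpers written for this illustration, not part of any library. `scan_vocab` only imitates the way a vocabulary scan iterates over each sentence, and the `chunk_size` default is an arbitrary choice.

```python
from collections import Counter


def scan_vocab(sentences):
    """Count tokens by iterating over the items of each sentence.

    Illustrative stand-in for a vocabulary scan, not gensim's own code.
    """
    counts = Counter()
    for sentence in sentences:
        # Iterating a str yields its characters; iterating a list yields words.
        counts.update(sentence)
    return counts


def chunk_words(words, chunk_size=1000):
    """Split a flat sequence of words into lists of at most chunk_size words."""
    words = list(words)
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


words = "the cat sat on the mat the end".split()

# Flat list: every word is treated as a sentence, so the tokens are characters.
flat = scan_vocab(words)

# List of lists: every chunk is a sentence, so the tokens are whole words.
nested = scan_vocab(chunk_words(words, chunk_size=4))

print(sorted(flat))
print(sorted(nested))
```

With the array from the question, the same idea would be `w2v.Word2Vec(chunk_words(text), size=100, window=5, min_count=5, workers=4)`. Unlike `text.reshape(-1, 1)`, which leaves one word per sentence and so gives `window=5` no neighbouring words to learn from, chunking keeps adjacent words together. If the file has real sentence boundaries, splitting on those is probably a better choice than fixed-size chunks. Note that `size` and `wv.vocab` are the gensim 3.x names; gensim 4 uses `vector_size` and `wv.key_to_index`.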