This is my code:
w2v = Word2Vec(vector_size=150,min_count = 10)
w2v.build_vocab(x_train)
w2v.train(x_train)
def average_vec(text):
    vec = np.zeros(300).reshape((1,300))
    for word in text:
        try:
            vec += w2v[word].reshape((1,300))
        except KeyError:
            continue
    return vec
And this throws the following error:
Traceback (most recent call last):
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 27, in <module>
    train_vec = np.concatenate([average_vec(z) for z in x_train])
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 27, in <listcomp>
    train_vec = np.concatenate([average_vec(z) for z in x_train])
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 21, in average_vec
    vec += w2v[word]
TypeError: 'Word2Vec' object is not subscriptable

Process finished with exit code 1
In Gensim 4.0 and above, the Word2Vec model object itself (w2v in your code) no longer supports direct access to individual vectors by word key.
Instead, you should use the subsidiary object in its .wv property, an object of type KeyedVectors, which can be used to work with the set of word-vectors separately. (Separating functionality like this helps in cases where you only want the word-vectors, or only have the word-vectors from someone else, without the full model's overhead.)
So, everywhere you might use w2v[word], try w2v.wv[word] instead.
Or perhaps, name things more like the following, and hold a different variable reference to the word-vectors:
w2v_model = Word2Vec(...)
word_vectors = w2v_model.wv
print(word_vectors[word])
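Applied to the averaging function from the question, the change could look something like the sketch below. It is only an illustration under a few assumptions: the word_vectors and size parameters are added here and are not in the original code, the function is assumed to be meant to return a true average (the original only sums), and word_vectors can be anything indexable by word that raises KeyError for unknown words, which is how the .wv object behaves in Gensim 4.

```python
import numpy as np


def average_vec(tokens, word_vectors, size):
    """Average the vectors of the tokens that are present in word_vectors.

    word_vectors: any object indexable by word that raises KeyError for
    unknown words (assumption: in Gensim 4 this would be w2v.wv).
    size: dimensionality of the vectors (assumption: w2v.vector_size).
    """
    vec = np.zeros((1, size))
    count = 0
    for word in tokens:
        try:
            vec += np.asarray(word_vectors[word]).reshape((1, size))
            count += 1
        except KeyError:
            # Word is not in the vocabulary (e.g. dropped by min_count).
            continue
    if count:
        vec /= count  # assumption: a true average is wanted, not a sum
    return vec
```

With the model from the question this would be called as average_vec(z, w2v.wv, w2v.vector_size). Passing the size from the model also avoids the mismatch between vector_size=150 and the hard-coded 300 in the original function.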
For other tips in adapting your own older code, or examples online, to Gensim 4.0, the following project wiki page may be helpful: