I ran into a problem using word2vec. It may be a version issue, but I don't know how to solve it.

This is my code:

w2v = Word2Vec(vector_size=150,min_count = 10)
w2v.build_vocab(x_train)
w2v.train(x_train)

def average_vec(text):
    vec = np.zeros(300).reshape((1,300))
    for word in text:
        try:
            vec += w2v[word].reshape((1,300))
        except KeyError:
            continue
        return vec

And this throws the following error:

Traceback (most recent call last):
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 27, in <module>
    train_vec = np.concatenate([average_vec(z) for z in x_train])
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 27, in <listcomp>
    train_vec = np.concatenate([average_vec(z) for z in x_train])
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 21, in average_vec
    vec += w2v[word]
TypeError: 'Word2Vec' object is not subscriptable

Process finished with exit code 1

Solution

In Gensim 4.0 and above, the Word2Vec model object itself (w2v in your code) no longer supports direct lookup of individual vectors by word key.

Instead, use the subsidiary object in its .wv property, an object of type KeyedVectors that lets you work with the set of word-vectors separately. (Separating functionality like this helps in cases where you only want the word-vectors, or only have the word-vectors from someone else, without the full model's overhead.)

So, everywhere you might use w2v[word], try w2v.wv[word] instead.

Alternatively, name things more like the following, and keep a separate variable that refers to the word-vectors:

w2v_model = Word2Vec(...)
word_vectors = w2v_model.wv
print(word_vectors[word])
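Applied to the helper in the question, the change might look like the sketch below. It is only an illustration under a few assumptions: the function is meant to return an average (the original only sums), the vector size is passed in rather than hard-coded (the question mixes vector_size=150 with 300), and the extra parameters word_vectors and size are names introduced here, not part of the original code. The return also sits after the loop, so every word contributes.

```python
import numpy as np

def average_vec(tokens, word_vectors, size):
    """Average the vectors of the tokens that are present in word_vectors.

    word_vectors can be any object indexable by word that raises KeyError
    for unknown words, such as the KeyedVectors in model.wv (Gensim 4+).
    """
    vec = np.zeros((1, size))
    count = 0
    for word in tokens:
        try:
            # Look the word up on the word-vectors object, not on the model.
            vec += np.asarray(word_vectors[word]).reshape((1, size))
            count += 1
        except KeyError:
            # Word is not in the vocabulary (e.g. dropped by min_count).
            continue
    if count:
        vec /= count
    # Return after the loop so that all tokens are taken into account.
    return vec
```

With a trained model, the call from the traceback would then become something like train_vec = np.concatenate([average_vec(z, w2v.wv, w2v.wv.vector_size) for z in x_train]).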

For other tips on adapting your own older code, or examples found online, to Gensim 4.0, the following project wiki page may be helpful:

Migrating from Gensim 3.x to 4