Tags: python, gensim, word2vec

I ran into a problem using word2vec. It may be a version issue, but I don't know how to solve it.


This is my code

w2v = Word2Vec(vector_size=150,min_count = 10)
w2v.build_vocab(x_train)
w2v.train(x_train)

def average_vec(text):
    vec = np.zeros(300).reshape((1,300))
    for word in text:
        try:
            vec += w2v[word].reshape((1,300))
        except KeyError:
            continue
        return vec

And this throws the following error:

Traceback (most recent call last):
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 27, in <module>
    train_vec = np.concatenate([average_vec(z) for z in x_train])
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 27, in <listcomp>
    train_vec = np.concatenate([average_vec(z) for z in x_train])
  File "C:/Users/machao/Desktop/svm-master/word2vec.py", line 21, in average_vec
    vec += w2v[word]
TypeError: 'Word2Vec' object is not subscriptable

Process finished with exit code 1

Solution

  • In Gensim 4.0 and above, the Word2Vec model object itself (w2v in your code) no longer supports direct lookup of individual vectors by word key.

    Instead, you should use the subsidiary object in its .wv property - an object of type KeyedVectors which can be used to work with the set of word-vectors separately. (Separating functionality like this helps in cases where you only want the word-vectors, or only have the word-vectors from someone else, but not the full model's overhead.)

    So, everywhere you might use w2v[word], try w2v.wv[word] instead.

    Alternatively, name things more like the following, and keep a separate variable that refers to the word-vectors:

    w2v_model = Word2Vec(...)
    word_vectors = w2v_model.wv
    print(word_vectors[word])
    

    For other tips in adapting your own older code, or examples online, to Gensim 4.0, the following project wiki page may be helpful:

    Migrating from Gensim 3.x to 4
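
Applied to the average_vec function from the question, the change could look like the minimal sketch below. It is an illustration under stated assumptions, not the asker's final code: the word_vectors and size parameters are additions made here so the function does not depend on a global model, and the toy dictionary only stands in for a real KeyedVectors object (anything that supports word_vectors[word] and raises KeyError for unknown words). The sketch also divides by the number of words found, which the function name suggests but the posted code does not do; drop that step if a plain sum is intended.

```python
import numpy as np


def average_vec(text, word_vectors, size):
    """Average the vectors of the words in `text` that have a vector.

    `word_vectors` must support `word_vectors[word]` and raise KeyError for
    unknown words, e.g. the `.wv` attribute of a trained Word2Vec model.
    `size` must match the dimensionality of those vectors.
    """
    vec = np.zeros((1, size))
    count = 0
    for word in text:
        try:
            vec += np.asarray(word_vectors[word]).reshape((1, size))
            count += 1
        except KeyError:
            continue  # word has no vector (e.g. it fell below min_count)
    if count:
        vec /= count  # assumption: a true average is wanted, not a sum
    return vec  # returned after the loop, not inside it


# Toy stand-in for a KeyedVectors object, only to exercise the function.
toy_vectors = {"cat": np.array([1.0, 3.0]), "dog": np.array([3.0, 5.0])}
result = average_vec(["cat", "dog", "unknown"], toy_vectors, size=2)
print(result)
```

With the model from the question, the call would be average_vec(z, w2v.wv, 150). Note that the model was created with vector_size=150, so the vectors have 150 dimensions, not the 300 hard-coded in the original function.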