python-3.x nlp gensim word2vec

word2vec - find a word by a specific vector


I trained a gensim Word2Vec model. Let's say I have a certain vector and I want to find the word it represents. What is the best way to do so?

Meaning, for a specific vector:

vec = array([-0.00449447, -0.00310097,  0.02421786, ...], dtype=float32)

I want to get a word:

 word = model.vec2word(vec)  # hypothetical method; should return 'computer'

Solution

  • Word-vectors are generated through an iterative, approximate process, so they shouldn't be thought of as precisely right (even though they do have exact coordinates), just "useful within certain tolerances".

    So, there's no lookup of exact-word-for-exact-coordinates. Instead, gensim's Word2Vec and related classes offer most_similar(), which returns the known words closest to the given known words or vector coordinates, in ranked order, along with their cosine similarities. So if you've just trained (or loaded) a full Word2Vec model into the variable model, you can get the closest words to your vector with:

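    from numpy import array, float32

    vec = array([-0.00449447, -0.00310097,  0.02421786, ...], dtype=float32)
    # most_similar() returns (word, cosine_similarity) tuples, best match first
    similars = model.wv.most_similar(positive=[vec])
    print(similars)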
    

    If you just want the single closest word, it'd be in similars[0][0] (the first position of the top-ranked tuple).
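
To make the ranking idea concrete, here is a minimal pure-Python sketch of a nearest-word lookup by cosine similarity. It is not gensim's implementation: the helper names (cosine_similarity, closest_words) and the three-word toy vocabulary with its 3-dimensional vectors are all made up for illustration. It only shows why a vector that is close to, but not exactly equal to, a stored word-vector still resolves to that word.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_words(vec, word_vectors, topn=3):
    """Rank known words by cosine similarity to vec, best match first."""
    scored = [(word, cosine_similarity(vec, wv)) for word, wv in word_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vocabulary with made-up 3-dimensional vectors
# (a real model would hold one vector per vocabulary word).
toy_vectors = {
    "computer": [0.9, 0.1, 0.0],
    "laptop": [0.8, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}

# A query vector near, but not exactly equal to, the "computer" vector.
query = [0.88, 0.12, 0.01]

similars = closest_words(query, toy_vectors)
print(similars[0][0])  # the single closest word
```

The result has the same shape as the list described above: (word, similarity) tuples in ranked order, so similars[0][0] is the top word. With a real gensim model, model.wv.similar_by_vector(vec, topn=1) should give the same kind of answer for a raw vector without writing the loop yourself.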