I trained a gensim Word2Vec model. Let's say I have a certain vector and I want to find the word it represents - what is the best way to do so?
Meaning, for a specific vector:
vec = array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
I want to get a word:
'computer' = model.vec2word(vec)
Word-vectors are generated through an iterative, approximate process, so they shouldn't be thought of as precisely right (even though they do have exact coordinates), just "useful within certain tolerances".
So, there's no lookup of exact-word-for-exact-coordinates. Instead, gensim's Word2Vec and related classes offer most_similar(), which gives the known words closest to given known words or vector coordinates, in ranked order, with their cosine similarities. So if you've just trained (or loaded) a full Word2Vec model into the variable model, you can get the closest words to your vector with:
import numpy as np

vec = np.array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=np.float32)
similars = model.wv.most_similar(positive=[vec])
print(similars)
If you just want the single closest word, it'd be in similars[0][0]
(the first position of the top-ranked tuple).
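Under the hood, most_similar() is essentially a nearest-neighbor search by cosine similarity over every vector in the vocabulary. Here's a minimal sketch of how such a vec2word lookup works, using plain numpy and a tiny made-up vocabulary (the names vec2word, words, and vectors are illustrative, not part of gensim's API):

```python
import numpy as np

def vec2word(vec, vectors, words):
    """Return the word whose row in `vectors` has the highest
    cosine similarity to the query vector `vec`."""
    # Normalize the rows and the query so dot products become cosine similarities.
    unit_rows = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_vec = vec / np.linalg.norm(vec)
    sims = unit_rows @ unit_vec
    return words[int(np.argmax(sims))]

# Hypothetical 3-word vocabulary with 3-dimensional vectors.
words = ["computer", "keyboard", "banana"]
vectors = np.array([[0.9, 0.1, 0.0],
                    [0.7, 0.6, 0.1],
                    [0.0, 0.2, 0.9]], dtype=np.float32)

query = np.array([0.88, 0.12, 0.01], dtype=np.float32)
print(vec2word(query, vectors, words))  # prints "computer"
```

This is only a sketch; a real model has tens of thousands of rows, and gensim does the same search with a single vectorized matrix multiply, which is why passing a raw vector into most_similar() is the practical way to do it.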