Tags: python, word2vec, gensim, word-embedding, doc2vec

Given a word vector, get the corresponding word in word2vec


I obtain word vectors from my code, e.g.:

array([ -3.09521449e-04,   2.73033947e-06,   2.15601496e-04, ...,
         5.12349070e-04,   5.04256517e-04,   8.16784304e-05], dtype=float32)

Now I want to identify which word this vector represents in gensim's word2vec.

I tried the code below, but it did not work.

print(model.wv.index2word(kmeans_clustering.cluster_centers_))

Please help me.


Solution

  • The gensim most_similar() method also accepts a vector as an argument, but you have to supply it explicitly as one item inside a list of positive examples, so that it is not misinterpreted as something else.

    For example:

    wv = model.wv['book']
    similars = model.wv.most_similar(positive=[wv,])
    

    Naturally, 'book' will be at the top of this list of words most-similar to its own vector.
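The same call can be applied to each k-means cluster center from the question. Below is a minimal sketch, assuming `model.wv` is a gensim KeyedVectors object and `kmeans_clustering.cluster_centers_` holds one vector per row; the helper name `nearest_words` is hypothetical, not part of gensim. Note that a cluster center is usually not the exact vector of any word, so the result is only the closest word by similarity.

```python
def nearest_words(keyed_vectors, vectors, topn=1):
    """Return, for each vector, its topn most similar (word, similarity) pairs.

    keyed_vectors: an object with gensim's most_similar() method, e.g. model.wv.
    vectors: an iterable of 1-D vectors, e.g. kmeans_clustering.cluster_centers_.
    """
    results = []
    for vec in vectors:
        # Wrap each vector in a list so most_similar() treats it as a single
        # positive example rather than as a sequence of words.
        results.append(keyed_vectors.most_similar(positive=[vec], topn=topn))
    return results


# Usage with the objects from the question (assumed to exist already):
# centers = kmeans_clustering.cluster_centers_
# for i, hits in enumerate(nearest_words(model.wv, centers)):
#     print(i, hits)
```

Passing a larger `topn` returns several candidate words per center, which can help when judging what a cluster is about.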