I have built a Word2Vec model using the gensim library in Python. I want to evaluate my word embeddings as follows:
If A is related to B and C is related to D, then A-C+B should be equal to D. For example, embedding vector arithmetic of "India"-"Rupee"+"Japan" should be equal to the embedding of "Yen".
I have tried gensim's built-in functions such as predict_output_word and most_similar, but couldn't get the desired results.
new_model.predict_output_word(['india','rupee','japan'],topn=10)
new_model.most_similar(positive=['india', 'rupee'], negative=['japan'])
Kindly help me in evaluating my model as per the criteria above.
You should use the most_similar() method's positive and negative arguments in the same manner as the accuracy() method does.
Specifically, if you have an analogy of the form "A is to B as C is to [expected]", you should look at:
results = model.most_similar(positive=[word_b, word_c], negative=[word_a])
Or in your example:
results = model.most_similar(positive=['rupee', 'japan'], negative=['india'])
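To see why the argument order matters, here is a minimal pure-Python sketch of the arithmetic behind this kind of query: add the unit vectors of the positive words, subtract those of the negative words, and rank the remaining words by cosine similarity. The 3-d vectors and the toy_most_similar helper are made up for illustration and are not gensim's implementation; a real model will give noisier results.

```python
import math

# Toy 3-d "embeddings", invented for illustration only (not from a trained model).
# Axes: [India-ness, Japan-ness, currency-ness].
vectors = {
    "india": [1.0, 0.0, 0.0],
    "japan": [0.0, 1.0, 0.0],
    "rupee": [1.0, 0.0, 1.0],
    "yen":   [0.0, 1.0, 1.0],
    "delhi": [1.0, 0.0, 0.2],
    "tokyo": [0.0, 1.0, 0.2],
}


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def toy_most_similar(positive, negative, topn=3):
    """Hypothetical stand-in for most_similar(): rank words by cosine
    similarity to (sum of positive vectors) - (sum of negative vectors)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, unit(vectors[word]))]
    for word in negative:
        target = [t - x for t, x in zip(target, unit(vectors[word]))]
    target = unit(target)

    # Skip the query words themselves when ranking candidates.
    exclude = set(positive) | set(negative)
    scores = [
        (word, sum(t * x for t, x in zip(target, unit(vec))))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# rupee - india + japan: the order suggested above.
correct = toy_most_similar(positive=["rupee", "japan"], negative=["india"])
# india + rupee - japan: the order used in the question.
wrong = toy_most_similar(positive=["india", "rupee"], negative=["japan"])

print(correct[0][0])
print(wrong[0][0])
```

With these toy vectors the first query ranks "yen" first, while the order from the question ranks "delhi" first, because subtracting 'japan' pushes the target away from everything Japan-related. Note also that in gensim 4.x the method is called on the model's word vectors, i.e. model.wv.most_similar(...), rather than on the model itself.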