nlp · word2vec · word-embedding

Evaluating Word2Vec model by finding linear algebraic structure of words


I have built a Word2Vec model using the gensim library in Python. I want to evaluate my word embeddings as follows:

If A is related to B in the same way that C is related to D, then the vector arithmetic B - A + C should be approximately equal to D. For example, the embedding arithmetic "Rupee" - "India" + "Japan" should be close to the embedding of "Yen".
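
As a sanity check, this is the kind of raw vector arithmetic I have in mind (a minimal sketch, assuming new_model is the trained gensim Word2Vec model and that the lowercased words 'india', 'rupee', 'japan', and 'yen' are all in its vocabulary):

import numpy as np

wv = new_model.wv  # KeyedVectors holding the learned embeddings

# B - A + C: 'rupee' - 'india' + 'japan' should land near 'yen'
target = wv['rupee'] - wv['india'] + wv['japan']

# cosine similarity between the arithmetic result and the expected word
cosine = np.dot(target, wv['yen']) / (np.linalg.norm(target) * np.linalg.norm(wv['yen']))
print('cosine(rupee - india + japan, yen) =', cosine)

# nearest neighbours of the raw result vector (note: similar_by_vector
# does not exclude the input words, so 'japan' or 'rupee' may rank high)
print(wv.similar_by_vector(target, topn=5))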

I have used built-in gensim functions like predict_output_word and most_similar, but couldn't get the desired results:

new_model.predict_output_word(['india','rupee','japan'],topn=10)
new_model.most_similar(positive=['india', 'rupee'], negative=['japan'])

Kindly help me in evaluating my model as per the criteria above.


Solution

  • You should use the most_similar() method's positive and negative arguments in the same manner as the accuracy() method does:

    https://github.com/RaRe-Technologies/gensim/blob/718b1c6bd1a8a98625993d73b83d98baf385752d/gensim/models/keyedvectors.py#L697

    Specifically, if you have an analogy of the form "A is to B as C is to [expected]", you should look at:

    results = model.most_similar(positive=[word_b, word_c], negative=[word_a])
    

    Or in your example:

    results = model.most_similar(positive=['rupee', 'japan'], negative=['india'])
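
    Beyond spot-checking single analogies, gensim can score a whole file of them at once. Here is a minimal sketch, assuming gensim >= 3.4 (older versions expose the same idea as KeyedVectors.accuracy()) and an analogy file in the standard questions-words.txt format, where each line holds four words "A B C expected" (the Google set bundled with gensim includes a currency section with exactly this country-currency pattern):

    from gensim.test.utils import datapath

    # Google's analogy question set ships with gensim's test data
    analogy_path = datapath('questions-words.txt')

    # returns overall accuracy plus per-section correct/incorrect details
    score, sections = new_model.wv.evaluate_word_analogies(analogy_path)
    print('overall analogy accuracy: %.2f%%' % (score * 100))

    Each entry in sections records which analogies in that category were answered correctly, so you can also supply your own file with custom analogies like the rupee/yen example.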