python-3.x gensim word2vec

Evaluating a word2vec model using SimLex-999


I have trained my model with Gensim. Now I want to evaluate it with SimLex-999, but I get an error. My code:

model.wv.evaluate_word_analogies('SimLex-999.txt')
2019-08-25 13:43:22,766 : INFO : Evaluating word analogies for top 300000 words in the model on SimLex-999.txt

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-60cb96c45579> in <module>()
----> 1 model.wv.evaluate_word_analogies('SimLex-999.txt')

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in evaluate_word_analogies(self, analogies, restrict_vocab, case_insensitive, dummy4unknown)
   1088             else:
   1089                 if not section:
-> 1090                     raise ValueError("Missing section header before line #%i in %s" % (line_no, analogies))
   1091                 try:
   1092                     if case_insensitive:

ValueError: Missing section header before line #0 in SimLex-999.txt

I have also tried:

from gensim.test.utils import datapath

# Same call as shown in the traceback below: the vectors live on model.wv
similarities = model.wv.evaluate_word_pairs(datapath('SimLex-999.txt'), dummy4unknown=True)

print(similarities)

but it raises a KeyError. Please help me solve the problem.

KeyError                                  Traceback (most recent call last)
<ipython-input-29-caeb682cb7ff> in <module>()
      1 from gensim.test.utils import datapath
      2 
----> 3 similarities = model.wv.evaluate_word_pairs(datapath('SimLex-999.txt'),dummy4unknown=True)
      4 
      5 print(similarities)

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in evaluate_word_pairs(self, pairs, delimiter, restrict_vocab, case_insensitive, dummy4unknown)
   1287 
   1288         """
-> 1289         ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
   1290         ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
   1291 

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in <listcomp>(.0)
   1287 
   1288         """
-> 1289         ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
   1290         ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
   1291 

KeyError: 'movie'

Solution

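  • SimLex-999.txt is not a list of word analogies, so it is not a suitable argument for the evaluate_word_analogies() function. That function expects an analogy file in which each group of questions is preceded by a section header line starting with ":", which is why it reports "Missing section header before line #0".

    SimLex-999 is a set of word pairs with human similarity scores, which is the kind of data evaluate_word_pairs() is meant for. Have you tried that function on a file containing only the two words and the score on each line? Its description is at:

    https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.Word2VecKeyedVectors.evaluate_word_pairs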
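
A minimal sketch of preparing the data for evaluate_word_pairs(), under some assumptions: that the raw SimLex-999.txt is tab-separated with a header row containing the columns word1, word2 and SimLex999, and that evaluate_word_pairs() wants exactly three tab-separated fields per line (word, word, score). The helper name simlex_to_pairs and the output file name are made up for illustration; check them against your copy of the file before relying on this.

```python
import csv


def simlex_to_pairs(src_path, dst_path):
    """Copy word1, word2 and the SimLex999 score into a 3-column TSV file.

    Assumes src_path is tab-separated and has a header row with the
    columns 'word1', 'word2' and 'SimLex999' (an assumption about the
    raw SimLex-999 download, not something confirmed in this thread).
    Returns the number of pairs written.
    """
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        for row in reader:
            # Keep only the two words and the similarity score.
            dst.write("%s\t%s\t%s\n" % (row["word1"], row["word2"], row["SimLex999"]))
            count += 1
    return count


# Intended usage (not run here; 'model' is your trained Word2Vec model):
# simlex_to_pairs('SimLex-999.txt', 'simlex999_pairs.txt')
# pearson, spearman, oov_ratio = model.wv.evaluate_word_pairs('simlex999_pairs.txt')
# print(pearson, spearman, oov_ratio)
```

Note that the converted file is passed by its own path rather than through datapath(), since datapath() points into Gensim's bundled test data directory rather than your working directory.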