python gensim word2vec

How to fix unpickling key error when loading word2vec (gensim)?


I am trying to load a pre-trained word2vec model in .pkl format, taken from the Wikipedia2Vec pretrained embeddings page.

The line of code I use to load it:

model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl') 

However, I keep getting the following error (full traceback):

UnpicklingError                           Traceback (most recent call last)
<ipython-input-15-ebd5780b6636> in <module>
     55 
     56 #Load pretrained word2vec
---> 57 model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl',mmap='r')
     58 

~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls, fname_or_handle, **kwargs)
   1551     @classmethod
   1552     def load(cls, fname_or_handle, **kwargs):
-> 1553         model = super(WordEmbeddingsKeyedVectors, cls).load(fname_or_handle, **kwargs)
   1554         if isinstance(model, FastTextKeyedVectors):
   1555             if not hasattr(model, 'compatible_hash'):

~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls, fname_or_handle, **kwargs)
    226     @classmethod
    227     def load(cls, fname_or_handle, **kwargs):
--> 228         return super(BaseKeyedVectors, cls).load(fname_or_handle, **kwargs)
    229 
    230     def similarity(self, entity1, entity2):

~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in load(cls, fname, mmap)
    433         compress, subname = SaveLoad._adapt_by_suffix(fname)
    434 
--> 435         obj = unpickle(fname)
    436         obj._load_specials(fname, mmap, compress, subname)
    437         logger.info("loaded %s", fname)

~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in unpickle(fname)
   1396         # Because of loading from S3 load can't be used (missing readline in smart_open)
   1397         if sys.version_info > (3, 0):
-> 1398             return _pickle.load(f, encoding='latin1')
   1399         else:
   1400             return _pickle.loads(f.read())

UnpicklingError: invalid load key, ':'.

I tried loading it with load_word2vec_format, but no luck. Any ideas what might be wrong with it?


Solution

  • Per your link, https://wikipedia2vec.github.io/wikipedia2vec/pretrained/, these files are meant to be loaded with that library's own Wikipedia2Vec.load() method.

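    Gensim's .load() methods should only be used with files written by a Gensim object's own .save(). A .pkl produced by another library will not unpickle into a Gensim model, hence the UnpicklingError.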

    The Wikipedia2Vec project does say that their .txt file formats will load with .load_word2vec_format(), so you could also try that, but only with one of their .txt files, not the .pkl.

    Their full model .pkl files are only going to work with their class's own loading function.
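
    As a rough sketch of the two options above, the helper below picks a loader from the file name. The helper names pick_loader and load_embeddings are made up for illustration, and the .txt / .txt.bz2 file names are assumptions about what the download page offers. It also assumes both wikipedia2vec and gensim are installed, and that Gensim reads the .bz2 file directly, as it generally does for compressed word2vec text files.

```python
from pathlib import Path


def pick_loader(path):
    """Return which library should read the file (hypothetical helper)."""
    name = Path(path).name
    if name.endswith(".pkl"):
        # Full Wikipedia2Vec model files: only that library can read them.
        return "wikipedia2vec"
    if name.endswith((".txt", ".txt.bz2")):
        # Assumed names for the word2vec text-format downloads.
        return "gensim"
    raise ValueError(f"unrecognised embedding file: {name}")


def load_embeddings(path):
    """Load a Wikipedia2Vec download with the matching library."""
    if pick_loader(path) == "wikipedia2vec":
        # Imported lazily so only the library actually needed must be installed.
        from wikipedia2vec import Wikipedia2Vec
        return Wikipedia2Vec.load(str(path))
    from gensim.models import KeyedVectors
    # The .txt files are word2vec *text* format, hence binary=False.
    return KeyedVectors.load_word2vec_format(str(path), binary=False)
```

    Note that load_embeddings('enwiki_20180420_500d.pkl') would return a Wikipedia2Vec object, not a Gensim KeyedVectors, so vectors are looked up with its own methods such as get_word_vector('the') rather than Gensim-style indexing.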