python amazon-ec2 gensim word2vec

Array reshape error when loading word2vec model


I have the following piece of code:

from gensim.models import Word2Vec
model = Word2Vec.load('model2')
X = model.wv[model.wv.vocab]

This code works on one of my machines but fails on another, even though the model file is the same. What's going on? This is the error message I get:

  File "/home/ec2-user/miniconda3/envs/word2vec/lib/python3.7/site-packages/gensim/models/word2vec.py", line 1330, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/word2vec/lib/python3.7/site-packages/gensim/models/base_any2vec.py", line 1244, in load
    model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/word2vec/lib/python3.7/site-packages/gensim/models/base_any2vec.py", line 603, in load
    return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
  File "/home/ec2-user/miniconda3/envs/word2vec/lib/python3.7/site-packages/gensim/utils.py", line 427, in load
    obj._load_specials(fname, mmap, compress, subname)
  File "/home/ec2-user/miniconda3/envs/word2vec/lib/python3.7/site-packages/gensim/utils.py", line 458, in _load_specials
    getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
  File "/home/ec2-user/miniconda3/envs/word2vec/lib/python3.7/site-packages/gensim/utils.py", line 469, in _load_specials
    val = np.load(subname(fname, attrib), mmap_mode=mmap)
  File "/home/ec2-user/miniconda3/envs/word2vec/lib/python3.7/site-packages/numpy/lib/npyio.py", line 440, in load
    pickle_kwargs=pickle_kwargs)
  File "/home/ec2-user/miniconda3/envs/word2vec/lib/python3.7/site-packages/numpy/lib/format.py", line 771, in read_array
    array.shape = shape
ValueError: cannot reshape array of size 16777184 into shape (134441,128)

To install gensim, I used conda install -c anaconda gensim


Solution

  • I checked what @gojomo pointed out in the comments, and he was correct: my file sizes were wrong. Something must have gone wrong during the upload. For large models, gensim saves the model in 3 files, storing the large arrays as separate .npy files next to the main file. Assuming your model name is "model2", you will have:

    1. model2
    2. model2.trainables.syn1neg.npy
    3. model2.wv.vectors.npy

    My .wv.vectors.npy was a few kilobytes smaller than the copy on my other machine, so NumPy could not reshape the data it read into the shape declared in the file.
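
One way to check for this kind of truncation is to compare each file's size and checksum on both machines, and to compare the size of each .npy file against what its own header promises. In the traceback above, the header declares 134441 × 128 = 17,208,448 values, but only 16,777,184 could be read. The sketch below is a minimal illustration using only the Python standard library. The helper names (sha256_of, npy_expected_size, report) are made up for this example, and the header parsing assumes a plain numeric dtype string such as '<f4'.

```python
import ast
import hashlib
import os
import struct


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def npy_expected_size(path):
    """Return (expected_bytes, actual_bytes) for a .npy file.

    Only the header is parsed. Assumption: the dtype is a simple numeric
    string such as '<f4', whose trailing digits are the item size in bytes.
    """
    with open(path, "rb") as fh:
        if fh.read(6) != b"\x93NUMPY":
            raise ValueError("not a .npy file: %r" % path)
        major, _minor = fh.read(2)
        # Format version 1 stores the header length in 2 bytes, later ones in 4.
        if major == 1:
            (header_len,) = struct.unpack("<H", fh.read(2))
        else:
            (header_len,) = struct.unpack("<I", fh.read(4))
        header = ast.literal_eval(fh.read(header_len).decode("latin1").strip())
        data_offset = fh.tell()

    descr = header["descr"]
    if not isinstance(descr, str):
        raise ValueError("structured dtypes are not handled in this sketch")
    itemsize = int(descr[2:])

    count = 1
    for dim in header["shape"]:
        count *= dim

    return data_offset + count * itemsize, os.path.getsize(path)


def report(paths):
    """Print size and checksum per file, plus a header check for .npy files."""
    for path in paths:
        if not os.path.exists(path):
            print("%s: missing" % path)
            continue
        print("%s: %d bytes, sha256=%s"
              % (path, os.path.getsize(path), sha256_of(path)))
        if path.endswith(".npy"):
            expected, actual = npy_expected_size(path)
            if expected != actual:
                print("  header expects %d bytes, file has %d"
                      % (expected, actual))
```

Calling report(["model2", "model2.trainables.syn1neg.npy", "model2.wv.vectors.npy"]) on both machines should print identical sizes and checksums if the upload was intact. A .npy file whose actual size is below the expected size was most likely cut short in transit.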