I have a big gensim Doc2Vec model, and I only need to infer vectors with it; I load the training documents' vectors from another source. Is it possible to load the model as-is, without the big `.npy` files?
Edit: I did the following:
```python
from gensim.models.doc2vec import Doc2Vec

model_path = r'C:\model/model'
model = Doc2Vec.load(model_path)
model.delete_temporary_training_data(keep_doctags_vectors=False, keep_inference=True)
model.save(model_path)
```
then removed the files (`model.trainables.syn1neg.npy`, `model.wv.vectors.npy`) manually, and ran:
```python
model = Doc2Vec.load(model_path)
```
but it fails with:
```
Traceback (most recent call last):
  File "<ipython-input-5-7f868a7dbe0c>", line 1, in <module>
    model = Doc2Vec.load(model_path)
  File "C:\ProgramData\Anaconda3\envs\py\lib\site-packages\gensim\models\doc2vec.py", line 1113, in load
    return super(Doc2Vec, cls).load(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\py\lib\site-packages\gensim\models\base_any2vec.py", line 1244, in load
    model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\py\lib\site-packages\gensim\models\base_any2vec.py", line 603, in load
    return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\py\lib\site-packages\gensim\utils.py", line 427, in load
    obj._load_specials(fname, mmap, compress, subname)
  File "C:\ProgramData\Anaconda3\envs\py\lib\site-packages\gensim\utils.py", line 458, in _load_specials
    getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
  File "C:\ProgramData\Anaconda3\envs\py\lib\site-packages\gensim\utils.py", line 469, in _load_specials
    val = np.load(subname(fname, attrib), mmap_mode=mmap)
  File "C:\ProgramData\Anaconda3\envs\py\lib\site-packages\numpy\lib\npyio.py", line 428, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\model/model.trainables.syn1neg.npy'
```
Note: those files no longer exist in the directory. The model runs on a server and the model file is downloaded from storage. My question is: must the model have those files for inference? I want to run it with as little memory consumption as possible. Thanks.
Edit: Is the file `model.trainables.syn1neg.npy` the model weights? Is the file `model.wv.vectors.npy` necessary for running inference?
I'm not a fan of the `delete_temporary_training_data()` method. It implies there's a clearer separation between training state and the state needed for later uses than actually exists. (Inference is very similar to training, though it doesn't need the cached doc-vectors for the training texts.)
That said, if you've used that method, you shouldn't then delete any of the side files that were still part of the save. If they were written by `.save()`, they'll be expected, by name, by `.load()`. They must be kept alongside the main model file. (There might be fewer such files, or smaller ones, after the `delete_temporary_training_data()` call, but any that were written must be kept for reading.)
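Because gensim names side files by appending the attribute path to the main filename, a simple glob on that prefix will list everything that has to travel with the model. This is a small sketch (the `model_side_files` helper and the example path are my own, not part of gensim's API):

```python
import glob


def model_side_files(model_path):
    """Return every auxiliary .npy file written alongside a gensim save.

    gensim's .save() names side files by appending the attribute path to
    the main filename (e.g. model.trainables.syn1neg.npy), so globbing on
    the prefix finds them all.
    """
    return sorted(glob.glob(model_path + ".*.npy"))


# Hypothetical usage: every file returned here must be kept (and
# downloaded to the server) together with the main model file.
# for f in model_side_files(r'C:\model/model'):
#     print(f)
```

If your server-side download step fetches only the main file, adding a loop like this to copy the side files as well would avoid the `FileNotFoundError` above.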
The `syn1neg` file is absolutely required for inference: it holds the model's hidden-to-output weights, needed to perform new forward-predictions (and thus also the backpropagated inference adjustments). The `wv.vectors` file is definitely needed in the default `dm=1` mode, where word-vectors are part of the doc-vector calculation. (It might be optional in `dm=0` mode, but I'm not sure the code is armored against their absence, neither via in-memory trimming, and definitely not against the expected file being deleted out-of-band.)
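So the supported way to shrink the model is to let `save()` rewrite everything after trimming, and never delete side files by hand. A minimal sketch of that flow, essentially your own steps minus the manual deletion (the path is a hypothetical stand-in, and the API shown is gensim 3.x, matching your traceback):

```python
# Sketch: slim a Doc2Vec model for inference-only use, then reload it.
# Assumes gensim 3.x; guarded import so the sketch is self-contained.
try:
    from gensim.models.doc2vec import Doc2Vec
except ImportError:
    Doc2Vec = None  # gensim not installed; this remains a sketch


def slim_for_inference(model_path):
    """Drop cached training doc-vectors but keep inference state,
    re-save, and reload. All side files written by save() must stay
    in place; do not delete any of them afterwards."""
    model = Doc2Vec.load(model_path)
    model.delete_temporary_training_data(keep_doctags_vectors=False,
                                         keep_inference=True)
    model.save(model_path)           # rewrites the main file AND its side files
    return Doc2Vec.load(model_path)  # loads exactly what save() just wrote


# Hypothetical usage:
# model = slim_for_inference(r'C:\model/model')
# vec = model.infer_vector(['some', 'tokenized', 'words'])
```

If memory is the main concern, you could also try passing `mmap='r'` to `.load()`, which memory-maps the large `.npy` arrays from disk instead of reading them fully into RAM; that still requires the side files to exist on disk.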