I have a look-up object, specifically a pre-trained word2vec model loaded as a `gensim.models.keyedvectors.Word2VecKeyedVectors` instance. I need to do some data pre-processing, and I am using multiprocessing for it. Is there a way for all of my processes to use the object from the same memory location, instead of each process loading its own copy into memory?
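Yes, if:

- you save the model using Gensim's native `.save()` method, and the relevant large arrays of vectors end up as clearly separate `.npy` files; and
- you load the model using Gensim's native `.load()` method, with the `mmap` option (e.g. `mmap='r'`).

See this prior answer for an overview of the steps and concerns involved in a similar need.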
(The concerns and extra steps listed there to avoid breaking the mmap sharing, namely manual patch-ups of the `norm` properties, should no longer be necessary in Gensim 4.0.0, which is currently available only as a prerelease version.)