I am using word2vec to calculate the similarity between two words, with the GoogleNews model. This model is huge and therefore takes a long time to load.
model = Word2Vec.load_word2vec_format('D:/Userfiles/vsachidananda/Downloads/GoogleNews-vectors-negative300.bin.gz', binary=True)
I would like to load it once and keep it in a variable/object, so that whenever I run a Python program I can just call
model.similarity('word1','word2')
How can this be achieved? Any idea?
The only way I know to share complex objects between Python processes is to use multiprocessing.Manager. But model would be pickled and unpickled each time it needs to be shared with a subprocess, so I guess it would be as slow as load_word2vec_format.
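To get a feel for that cost: when an object is sent to another process by value (for example through a multiprocessing queue or as a process argument), it is serialized with pickle and rebuilt on the other side. The following is only a rough sketch that times one pickle round trip; the `vectors` dictionary is a hypothetical stand-in that is far smaller than the real GoogleNews model, so the numbers it prints are not representative of the real thing.

```python
import pickle
import time

# Hypothetical stand-in for the model: a table of word -> vector.
# The real GoogleNews vectors are several orders of magnitude larger.
vectors = {"word%d" % i: [float(i)] * 50 for i in range(20000)}

start = time.perf_counter()
payload = pickle.dumps(vectors, protocol=pickle.HIGHEST_PROTOCOL)
copy = pickle.loads(payload)
elapsed = time.perf_counter() - start

# The receiving side ends up with a full, independent copy of the data;
# nothing is shared, and the work is repeated for every transfer.
print("pickled size: %d bytes, round trip: %.3f s" % (len(payload), elapsed))
```

The cost grows with the size of the object, which is why keeping the model inside one long-lived process is attractive.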
You could instead run a launcher that loads model once, then waits and executes another Python script on demand. A very simple launcher would look like this:
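from gensim.models import Word2Vec

# Load the model once; this is the slow step.
model = Word2Vec.load_word2vec_format(...)

try:
    import traceback
    import script
    while True:
        # Block until Enter is pressed, then re-run the script.
        # (On Python 3, use input() and importlib.reload() instead.)
        raw_input()
        try:
            reload(script)  # pick up any edits made to script.py
            script.main(model)
        except Exception:
            # print_exc() writes the traceback itself and returns None
            traceback.print_exc()
except KeyboardInterrupt:
    print 'exit launcher'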
With this basic launcher, script.py should be in the same folder and needs to define a main():
def main(model):
    model.similarity('word1','word2')
    ...
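For Python 3, the same launcher idea could be sketched roughly as below. The helper names `run_script` and `launcher_loop`, and the `wait` parameter, are made up for this illustration and are not part of gensim; the model is simply whatever object you loaded beforehand.

```python
import importlib
import sys
import traceback


def run_script(module_name, model):
    """Import module_name (or reload it if already imported) and call main(model)."""
    importlib.invalidate_caches()
    if module_name in sys.modules:
        # Re-execute the module so that edits made since the last run apply.
        module = importlib.reload(sys.modules[module_name])
    else:
        module = importlib.import_module(module_name)
    return module.main(model)


def launcher_loop(model, module_name="script", wait=input):
    """Run module_name.main(model) each time wait() returns; Ctrl-C exits.

    Returns the number of runs that completed without an exception.
    """
    runs = 0
    try:
        while True:
            wait()  # with the default, blocks until Enter is pressed
            try:
                run_script(module_name, model)
                runs += 1
            except Exception:
                # Report the error but keep the already-loaded model alive.
                traceback.print_exc()
    except (KeyboardInterrupt, EOFError):
        print("exit launcher")
    return runs
```

After loading the model once, calling launcher_loop(model) re-runs script.main(model) on every Enter press, so script.py can be edited between runs without paying the load time again.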