python object python-3.x word2vec

Keep an object alive for use in another Python program


I am using word2vec to calculate the similarity between two words, with the GoogleNews model. This model is huge and takes a long time to load.

model = Word2Vec.load_word2vec_format('D:/Userfiles/vsachidananda/Downloads/GoogleNews-vectors negative300.bin.gz', binary=True)

I would like to load it once and keep it in a variable/object, so that whenever I run a Python program I can just call

model.similarity('word1','word2')

How can this be achieved? Any idea?


Solution

  • The only way I know to share complex objects between Python processes is multiprocessing.Manager. But the model would be pickled and unpickled each time it is shared with a subprocess, so I guess it would be as slow as load_word2vec_format.

    You could instead run a launcher that loads the model once, then waits and executes another Python script on demand. A very simple launcher would look like this:

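    import importlib
    import traceback

    from gensim.models import Word2Vec

    import script  # script.py in the same folder; must define main(model)

    # Load the large model once; it stays in memory while the launcher runs.
    # (Newer gensim releases expose this loader as KeyedVectors.load_word2vec_format.)
    model = Word2Vec.load_word2vec_format(...)

    try:
        while True:
            input()  # wait for Enter before each run

            try:
                importlib.reload(script)  # pick up edits made to script.py
                script.main(model)
            except Exception:
                traceback.print_exc()  # report the error and keep the launcher alive

    except KeyboardInterrupt:  # Ctrl+C stops the launcher
        print('exit launcher')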
    

    With this basic launcher, script.py must be in the same folder and must define a main() function:

    def main(model):
        print(model.similarity('word1', 'word2'))
        ...
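
To try the reload-and-call step without waiting for GoogleNews to load, the core of the launcher can be pulled into a small function and exercised with a toy model. This is only a sketch: `StubModel` and `run_script` are hypothetical names invented for illustration, not part of gensim or of the launcher above, and the toy `similarity` score has nothing to do with real word vectors.

```python
import importlib
import traceback


class StubModel:
    """Hypothetical stand-in for the loaded word2vec model (not a gensim class)."""

    def similarity(self, w1, w2):
        # Toy score for testing only: 1.0 for identical words, 0.0 otherwise.
        return 1.0 if w1 == w2 else 0.0


def run_script(module, model):
    """Reload an already-imported module and call its main(model).

    Returns True if main() finished, False if the reload or the call raised.
    Errors are printed instead of propagated so a launcher loop keeps running.
    """
    try:
        module = importlib.reload(module)  # re-executes the module's source file
        module.main(model)
        return True
    except Exception:
        traceback.print_exc()
        return False
```

In the launcher, the inner try/except would then shrink to `run_script(script, model)`, and `StubModel()` can be swapped for the real model once the loop behaves as expected.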