
Load Precomputed Vectors in Gensim


I am using the Gensim Python package to learn a neural language model, and I know that you can provide a training corpus to learn the model. However, there already exist many precomputed word vectors available in text format (e.g. http://www-nlp.stanford.edu/projects/glove/). Is there some way to initialize a Gensim Word2Vec model that just makes use of some precomputed vectors, rather than having to learn the vectors from scratch?

Thanks!


Solution

  • You can download pre-trained word vectors from here (get the file 'GoogleNews-vectors-negative300.bin'): word2vec

    Extract the file, then load it in Python like this:

    import os
    import gensim
    
    model = gensim.models.word2vec.Word2Vec.load_word2vec_format(
        os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'),
        binary=True)
    
    model.most_similar('dog')
    

    EDIT (May 2017): The call above is now deprecated; load the vectors like this instead:

    model = gensim.models.KeyedVectors.load_word2vec_format(
        os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'),
        binary=True)
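    Since the question mentions GloVe vectors specifically: GloVe's text files have the same word-plus-components layout as word2vec's text format, just without the header line. A minimal sketch, assuming gensim >= 4.0 (where `no_header=True` was added) and using a tiny hand-written file in place of a real GloVe download:

    ```python
    from gensim.models import KeyedVectors

    # Write a tiny GloVe-style file: one word per line followed by its
    # vector components, with no "vocab_size dim" header line.
    # (Stand-in for a real file like glove.6B.100d.txt.)
    with open("tiny_glove.txt", "w") as f:
        f.write("dog 0.1 0.2 0.3\n")
        f.write("cat 0.1 0.2 0.4\n")
        f.write("car 0.9 0.1 0.0\n")

    # no_header=True tells gensim the file lacks the word2vec header,
    # so header-less GloVe text files load directly.
    kv = KeyedVectors.load_word2vec_format(
        "tiny_glove.txt", binary=False, no_header=True)

    # The loaded vectors behave like any trained model's vectors:
    print(kv.most_similar("dog", topn=1))
    ```

    With these toy vectors, "cat" should come back as the nearest neighbor of "dog". On older gensim versions without `no_header`, the bundled `gensim.scripts.glove2word2vec` script can first convert a GloVe file by prepending the header.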