I am using the Gensim Python package to train a neural language model, and I know that you can provide a training corpus to learn the model. However, many precomputed word vectors are already available in text format (e.g. http://www-nlp.stanford.edu/projects/glove/). Is there some way to initialize a Gensim Word2Vec model from precomputed vectors, rather than having to learn the vectors from scratch?
Thanks!
You can download pre-trained word vectors from the word2vec page (get the file 'GoogleNews-vectors-negative300.bin').
Extract the file, and then you can load it in Python like this:
model = gensim.models.word2vec.Word2Vec.load_word2vec_format(os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'), binary=True)
model.most_similar('dog')
EDIT (May 2017): As the above code is now deprecated, this is how you'd load the vectors now:
model = gensim.models.KeyedVectors.load_word2vec_format(os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'), binary=True)
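The question also mentions GloVe vectors in text format. As far as I know, those files contain one word and its vector per line but lack the "<vocab_size> <dimensions>" header line that the word2vec text format starts with. Here is a minimal sketch of adding that header using only the standard library. The function name glove_to_word2vec_text, the file names, and the toy 3-dimensional vectors are made up for illustration and are not part of Gensim or GloVe:

```python
import os
import tempfile


def glove_to_word2vec_text(glove_path, output_path):
    """Copy a GloVe-style text file, prepending a '<vocab_size> <dim>' header.

    Hypothetical helper, not a Gensim function. Returns (vocab_size, dim).
    """
    vocab_size = 0
    dim = 0
    # First pass: count the vectors and read the dimensionality from the first one.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if vocab_size == 0:
                # The first token is the word, the rest are the vector components.
                dim = len(line.rstrip().split(" ")) - 1
            vocab_size += 1
    # Second pass: write the header, then copy the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{vocab_size} {dim}\n")
        for line in src:
            if line.strip():
                dst.write(line.rstrip("\n") + "\n")
    return vocab_size, dim


# Tiny demo with invented vectors (not real GloVe values).
workdir = tempfile.mkdtemp()
glove_file = os.path.join(workdir, "toy_glove.txt")
w2v_file = os.path.join(workdir, "toy_glove.w2v.txt")
with open(glove_file, "w", encoding="utf-8") as f:
    f.write("dog 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")

counts = glove_to_word2vec_text(glove_file, w2v_file)

# Loading the converted file needs Gensim installed, so it is left commented out:
# from gensim.models import KeyedVectors
# vectors = KeyedVectors.load_word2vec_format(w2v_file, binary=False)
```

After the conversion, the same load_word2vec_format call as above should work with binary=False. Gensim 4.x also documents a no_header=True option for load_word2vec_format, which may make the conversion step unnecessary; check the version you have installed.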