I am doing a project where I need pre-trained vectors from the skip-gram model. I heard that there is also a variant named the skip-n-gram model which gives better results.
I am wondering what I would need to train the models myself, since I only need them to initialize the embedding layer of my model.
I have searched a lot but didn't find good examples. I need suggestions: where can I get such a pre-trained model, or is there no pre-trained model for this?
You can train your own word vectors if you have enough data. This can be done using gensim, which provides very simple yet powerful APIs for training word embeddings and for topic modeling.
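For example, here is a minimal sketch of training skip-gram vectors with gensim (this assumes gensim 4.x, where the argument is vector_size; in the 3.x releases it was called size, and sentences here is just a stand-in for your own tokenized corpus):
from gensim.models import Word2Vec
# stand-in corpus: replace with your own tokenized sentences
sentences = [['access', 'to', 'the', 'airport'],
             ['the', 'aeroway', 'near', 'the', 'airport']]
# sg=1 selects the skip-gram architecture (sg=0 would be CBOW)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
# keep just the word vectors for initializing an embedding layer later
model.wv.save_word2vec_format('my_vectors.bin', binary=True)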
But if you want to use an already-trained word2vec model, you can use the one released by Google. It's 1.5GB and includes word vectors for a vocabulary of 3 million words and phrases, trained on roughly 100 billion words from a Google News dataset.
You can load this model with gensim. Download the trained word2vec model and use the following code to get started.
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
from gensim.models.keyedvectors import KeyedVectors
words = ['access', 'aeroway', 'airport']
# path to the downloaded file
path_to_model = 'GoogleNews-vectors-negative300.bin'
# load the model (binary=True because the Google News file is in binary format)
model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)
# extract a word vector
print(model[words[0]])  # vector representing 'access'
Result vector:
[ -8.74023438e-02 -1.86523438e-01 .. ]
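Since you only need the vectors to initialize your embedding layer, you can copy them into a matrix indexed by your own vocabulary once the model is loaded. A rough sketch with NumPy (vocab and embedding_dim are assumptions standing in for your model's vocabulary and vector size; the Google News vectors are 300-dimensional):
import numpy as np
embedding_dim = 300  # dimensionality of the Google News vectors
vocab = ['access', 'aeroway', 'airport']  # your model's vocabulary
embedding_matrix = np.zeros((len(vocab), embedding_dim))
for i, word in enumerate(vocab):
    if word in model:  # out-of-vocabulary words keep the zero vector
        embedding_matrix[i] = model[word]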
Please note that your system may freeze while loading such a huge model.
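If that happens, load_word2vec_format takes a limit argument that reads only the first N vectors from the file; since the Google News file is ordered roughly by word frequency, this keeps the most common words while cutting memory use considerably:
# load only the 500,000 most frequent words instead of all 3 million
model = KeyedVectors.load_word2vec_format(path_to_model, binary=True, limit=500000)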