Tags: word2vec, pre-trained-model

Pre-trained vectors for skip-gram and skip-n-gram


I am doing a project where I need pre-trained vectors from the skip-gram model. I have heard that there is also a variant called the skip-n-gram model, which gives better results.

I am wondering what I would need to train the models myself, since I only need them to initialize the embedding layer of my model.

I have searched quite a bit but haven't found good examples. I would appreciate your suggestions: where can I get such a pre-trained model, or is there no pre-trained model for this?


Solution

  • You can train your own word vectors if you have enough data. This can be done with gensim, which provides simple yet powerful APIs for topic modeling and word embeddings.

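    For example, here is a minimal sketch of training your own skip-gram vectors with gensim. The toy corpus and parameters are only illustrative; note that gensim 4.x names the dimensionality parameter vector_size, while older releases called it size:

    from gensim.models import Word2Vec

    # toy corpus: a list of tokenized sentences -- replace with your own data
    sentences = [
        ['access', 'to', 'the', 'airport'],
        ['the', 'aeroway', 'leads', 'to', 'the', 'airport'],
    ]

    # sg=1 selects the skip-gram architecture (sg=0 would be CBOW)
    own_model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

    # the trained vectors live on own_model.wv
    print(own_model.wv['airport'])
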
    But if you want to use an already-trained word2vec model, you can use the model released by Google. It is about 1.5 GB and includes word vectors for a vocabulary of 3 million words and phrases, trained on roughly 100 billion words from a Google News dataset.

    You can load this model with gensim. Download the trained word2vec model and use the following code to get started.

    import warnings
    warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
    
    from gensim.models.keyedvectors import KeyedVectors
    
    # path to the downloaded Google News model file
    path_to_model = 'GoogleNews-vectors-negative300.bin'
    
    words = ['access', 'aeroway', 'airport']
    
    # load the pre-trained vectors (binary=True because the file is in binary format)
    model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)
    
    # extract the word vector for 'access'
    print(model[words[0]])
    

    Result vector:

    [ -8.74023438e-02  -1.86523438e-01 .. ]
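
    Since you only need the vectors to initialize an embedding layer, here is a hedged sketch of turning the loaded KeyedVectors into a weight matrix. The vocabulary list is illustrative, and the zero-vector fallback for out-of-vocabulary words is my assumption, not part of the model itself:

    import numpy as np

    # hypothetical vocabulary of your downstream model
    vocab = ['access', 'aeroway', 'airport']

    # the Google News vectors are 300-dimensional; fall back to zeros
    # (an illustrative choice) for words missing from the model
    embedding_matrix = np.stack([
        model[w] if w in model else np.zeros(300)
        for w in vocab
    ])
    print(embedding_matrix.shape)  # (3, 300)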
    

    Please note that your system may freeze while loading such a huge model.
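
    If memory is a concern, load_word2vec_format also accepts a limit argument that reads only the first N vectors from the file; the Google News file is ordered roughly by word frequency, so this keeps the most common words:

    # load only the 500,000 most frequent entries to reduce memory usage
    model = KeyedVectors.load_word2vec_format(path_to_model, binary=True, limit=500000)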