Tags: gensim, pre-trained-model, fasttext

How to load a pre-trained fastText model saved with .npy files in gensim


I am new to deep learning and I am trying to play with a pretrained word embedding model from a paper. I downloaded the following files:

1) sa-d300-m2-fasttext.model
2) sa-d300-m2-fasttext.model.trainables.syn1neg.npy
3) sa-d300-m2-fasttext.model.trainables.vectors_ngrams_lockf.npy
4) sa-d300-m2-fasttext.model.wv.vectors.npy
5) sa-d300-m2-fasttext.model.wv.vectors_ngrams.npy
6) sa-d300-m2-fasttext.model.wv.vectors_vocab.npy

In case these details are needed: sa = Sanskrit, d300 = embedding dimension 300, fasttext = fastText.

I don't have any prior experience with gensim. How can I load the model into gensim or into TensorFlow?

I tried:

from gensim.models.wrappers import FastText
FastText.load_fasttext_format('/content/sa/300/fasttext/sa-d300-m2-fasttext.model.wv.vectors_ngrams.npy')

but it fails with:

FileNotFoundError: [Errno 2] No such file or directory: '/content/sa/300/fasttext/sa-d300-m2-fasttext.model.wv.vectors_ngrams.npy.bin'


Solution

  • That set of files looks like it was saved from Gensim's FastText implementation using Gensim's save() method, so it is not in Facebook's original 'fasttext_format'.

    So, try loading them with the following instead:

    from gensim.models.fasttext import FastText
    model = FastText.load('/content/sa/300/fasttext/sa-d300-m2-fasttext.model')
    

    (Upon loading that main/root file, it will find the subsidiary related files in the same directory, as long as they're all present.)

    The source where you downloaded these files should have included clear instructions for loading them nearby!
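
Since load() only works when the main file and its companion .npy files sit in the same directory, it can help to check for them before loading. The following is a minimal sketch using only the standard library. The helper name missing_sidecar_files and the EXPECTED_SUFFIXES list are hypothetical: the suffixes are simply copied from the file list in the question, and a model saved by a different Gensim version may come with a different set.

```python
import tempfile
from pathlib import Path

# Assumption: suffixes copied from the file list in the question; this is not
# a list guaranteed by Gensim, so adjust it to the files you actually received.
EXPECTED_SUFFIXES = [
    "trainables.syn1neg.npy",
    "trainables.vectors_ngrams_lockf.npy",
    "wv.vectors.npy",
    "wv.vectors_ngrams.npy",
    "wv.vectors_vocab.npy",
]


def missing_sidecar_files(model_path, expected_suffixes=EXPECTED_SUFFIXES):
    """Return the names of expected files that are absent next to model_path."""
    main = Path(model_path)
    missing = []
    if not main.is_file():
        missing.append(main.name)
    for suffix in expected_suffixes:
        # Companion files are named "<main file name>.<suffix>"
        companion = main.with_name(f"{main.name}.{suffix}")
        if not companion.is_file():
            missing.append(companion.name)
    return missing


# Demo on a throwaway directory filled with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    main_file = Path(tmp) / "sa-d300-m2-fasttext.model"
    main_file.touch()
    for suffix in EXPECTED_SUFFIXES[:-1]:  # leave the last one out on purpose
        (Path(tmp) / f"{main_file.name}.{suffix}").touch()
    print(missing_sidecar_files(main_file))
    # prints ['sa-d300-m2-fasttext.model.wv.vectors_vocab.npy']
```

On the real download, calling missing_sidecar_files('/content/sa/300/fasttext/sa-d300-m2-fasttext.model') should return an empty list before you call FastText.load() on the same path.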