nlp, stanford-nlp, gensim, fasttext

How to load pre-trained FastText Word Embeddings using Gensim?


I downloaded word embeddings from this link. I want to load them in Gensim to do some work, but I am not able to. I have found many resources, and none of them work. I am using Gensim version 4.1.

I have tried

gensim.models.fasttext.load_facebook_model('/home/admin1/embeddings/crawl-300d-2M.vec')
gensim.models.fasttext.load_facebook_vectors('/home/admin1/embeddings/crawl-300d-2M.vec')

and it is showing me

NotImplementedError: Supervised fastText models are not supported

I then tried to load it using FastText.load('/home/admin1/embeddings/crawl-300d-2M.vec'), but that raised UnpicklingError: could not find MARK.

Also, using


Solution

  • As the NotImplementedError says, those are the one kind of full Facebook FastText model that Gensim does not support: models trained in -supervised mode.

    So sadly, the answer to "How do you load these?" is "you don't".

    The .vec files contain just the full-word vectors in a plain-text format – no subword info for synthesizing OOV vectors, or supervised-classification output features. Those can be loaded into a KeyedVectors model:

    from gensim.models import KeyedVectors
    kv_model = KeyedVectors.load_word2vec_format('crawl-300d-2M.vec')
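To make the "plain-text format" point concrete, here is a minimal sketch that writes a tiny made-up .vec file and reads it back with plain Python. The helper name read_vec, the sample words, and the 4-dimensional vectors are all hypothetical, chosen for illustration. The layout is the word2vec text format: a header line with the word count and the dimension, followed by one word and its floats per line. This is not how Gensim parses the file internally; it only shows what the file holds.

```python
import os
import tempfile


def read_vec(path, limit=None):
    """Parse a word2vec-style text file into (count, dim, {word: [floats]})."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        # Header line: "<number of words> <vector dimension>"
        count, dim = map(int, fh.readline().split())
        for i, line in enumerate(fh):
            if limit is not None and i >= limit:
                break
            # Each row: the word, then `dim` space-separated floats
            parts = line.rstrip().split(" ")
            word, values = parts[0], [float(x) for x in parts[1:]]
            if len(values) != dim:
                raise ValueError(f"row {i} has {len(values)} values, expected {dim}")
            vectors[word] = values
    return count, dim, vectors


# Tiny made-up stand-in for crawl-300d-2M.vec (3 words, 4 dimensions)
sample = (
    "3 4\n"
    "the 0.1 0.2 0.3 0.4\n"
    "cat 0.5 0.6 0.7 0.8\n"
    "sat 0.9 1.0 1.1 1.2\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".vec", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    path = tmp.name

count, dim, vectors = read_vec(path)
os.remove(path)

print(count, dim)           # 3 4
print(vectors["cat"])       # [0.5, 0.6, 0.7, 0.8]
print("kitten" in vectors)  # False: only whole words are stored, no subwords
```

The last line is the practical consequence: a word missing from the file has no vector at all. With the Gensim KeyedVectors loaded above, the equivalent lookup kv_model['cat'] returns the stored vector, and an out-of-vocabulary word raises KeyError. Because the real file holds 2 million rows, passing a limit argument to load_word2vec_format to read only the first N vectors can be handy while experimenting.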