Tags: python, nlp, gensim, fasttext

FastText: AttributeError: type object 'FastText' has no attribute 'reduce_model'


I use FastText to generate word embeddings. I downloaded the pre-trained model from https://fasttext.cc/docs/en/crawl-vectors.html. The model has 300 dimensions, but I want 100, so I tried the reduce_model command and got an error:

import gensim
model = gensim.models.fasttext.FastText.load_fasttext_format('cc.th.300.bin')
gensim.models.fasttext.utils.reduce_model(model, 100)
I got: AttributeError: module 'gensim.utils' has no attribute 'reduce_model'

Here is the code from the FastText docs:

import fasttext
import fasttext.util
ft = fasttext.load_model('cc.en.300.bin')
fasttext.util.reduce_model(ft, 100)

How can I fix this error? I cannot find any docs for a corresponding command.

Thank you


Solution

  • As the error message says, gensim.models.fasttext.utils (which is just Gensim's general gensim.utils module) does not have a reduce_model() function.

    That's not a common or standard operation; it's just something the Facebook wrapper decided to implement. (Per its source code, it looks like it's using standard PCA on a tiny subsample of the vectors.)
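To make "PCA on a subsample" concrete, here is a minimal NumPy sketch of the general idea. It is not Facebook's code: the function name pca_reduce and the sample_size and seed parameters are made up for illustration, and it only handles a plain matrix of vectors, not a full FastText model with its n-gram buckets and output layer.

```python
import numpy as np


def pca_reduce(vectors, target_dim, sample_size=10000, seed=0):
    """Project `vectors` (n x d) onto the top `target_dim` principal
    components, estimated from a random subsample of the rows.

    Hypothetical helper for illustration; not part of fasttext or Gensim.
    """
    rng = np.random.default_rng(seed)
    n = vectors.shape[0]
    # Estimate the principal directions from a subsample only.
    idx = rng.choice(n, size=min(sample_size, n), replace=False)
    sample = vectors[idx]
    mean = sample.mean(axis=0)
    centered = sample - mean
    cov = centered.T @ centered / max(len(sample) - 1, 1)
    # eigh returns eigenvalues in ascending order, so reverse the columns
    # to put the largest-variance directions first.
    _, eigvecs = np.linalg.eigh(cov)
    projection = eigvecs[:, ::-1][:, :target_dim]
    # Apply the same centering and projection to every vector.
    return (vectors - mean) @ projection, projection
```

Calling pca_reduce(matrix, 100) on an n x 300 matrix returns an n x 100 matrix plus the 300 x 100 projection. For a real FastText model, the same projection would have to be applied consistently to every matrix that uses the original dimensionality.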

    Why do you want to reduce the dimensionality?

    Note that you'll lose some of the model's expressiveness. Also, if you were able to load the model at all to do the reduction, it's not too big for your RAM. If your main goal is a smaller model, there may be better ways, such as discarding rarer words.

    If you absolutely need to perform such a reduction, some options could be:

    • Do it using the Facebook wrapper, then save the results in a form Gensim can load.
    • Reimplement the same operation for a Gensim model, perhaps using the FB code as a guide. (You'd have to make sure the Gensim model is updated everywhere it depends on the original dimensionality, which might be tricky; after-the-fact model-shrinking has never been a goal or function of Gensim.)
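
The first option could look roughly like the sketch below. It assumes the official fasttext Python package is installed alongside a Gensim version that provides load_facebook_model (which supersedes the deprecated load_fasttext_format). The helper name reduce_and_reload and the output file name cc.th.100.bin are made up for illustration, and I haven't verified that every Gensim version reads a reduced file without complaint.

```python
def reduce_and_reload(src_path, dst_path, target_dim):
    """Shrink a Facebook .bin model with the official wrapper, save it,
    and load the saved file with Gensim.

    Hypothetical helper for illustration; not part of either library.
    """
    # Imported inside the function so that merely defining the helper
    # does not require either package to be installed.
    import fasttext
    import fasttext.util
    from gensim.models.fasttext import load_facebook_model

    ft = fasttext.load_model(src_path)
    fasttext.util.reduce_model(ft, target_dim)  # modifies ft in place
    ft.save_model(dst_path)                     # native Facebook .bin format
    return load_facebook_model(dst_path)


# Example call (needs the downloaded model file and enough RAM):
# model = reduce_and_reload('cc.th.300.bin', 'cc.th.100.bin', 100)
# print(model.wv.vector_size)
```

Saving in the native .bin format keeps the n-gram information, so the Gensim model should still be able to build vectors for out-of-vocabulary words.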