
Fasttext Quantize Unsupervised model


I am trying to quantize an unsupervised model in fastText with this command:

model.quantize(input=train_data, qnorm=True, retrain=True, cutoff=200000)

It throws an error saying that quantization is supported only for supervised models.


Is there any alternative way to quantize unsupervised models?


Solution

  • The paper that introduced the FastText team's quantization strategy only evaluated classification models, and it used some pruning steps that may only make sense with labeled training documents. (That said, the arguments to -quantize don't appear to include the original training docs, so I'm not sure the pruning technique as described in the paper is fully implemented.)

    While some of the compression steps could be applied to the unsupervised dense vectors, I've not yet seen a library offering that functionality; it could be a neat thing to implement and contribute.

    However, it's possible that the kind of classification done by the FastText work is a "sweet spot" for these techniques, and applied to other word-vectors they'd have a much larger negative impact on downstream uses. So, extension of the technique should be accompanied by some experiments confirming its value.
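To make the idea concrete, here is a minimal sketch of the core compression step (product quantization) applied to a plain word-vector matrix, independent of fastText's supervised-only code path. This is an illustration, not fastText's actual implementation: the matrix below is random stand-in data, and with a real model you would substitute something like the array returned by `model.get_input_matrix()`. The sub-vector and centroid counts are arbitrary choices for the example.

```python
# Sketch: product quantization of a word-vector matrix.
# Each vector is split into chunks; each chunk is replaced by the index of
# its nearest centroid from a small per-chunk codebook learned via k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 100)).astype(np.float32)  # 1000 "words", dim 100

n_subvectors = 4   # split each 100-d vector into 4 chunks of 25 dims
n_centroids = 16   # 16 codes per chunk
sub_dim = vectors.shape[1] // n_subvectors

codebooks, codes = [], []
for i in range(n_subvectors):
    chunk = vectors[:, i * sub_dim:(i + 1) * sub_dim]
    km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(chunk)
    codebooks.append(km.cluster_centers_)          # (16, 25) per chunk
    codes.append(km.labels_.astype(np.uint8))      # one byte per chunk

codes = np.stack(codes, axis=1)  # (1000, 4): ~4 bytes/word vs. 400 uncompressed

def decode(word_idx):
    """Reconstruct an approximate vector from its stored codes."""
    return np.concatenate([codebooks[i][codes[word_idx, i]]
                           for i in range(n_subvectors)])
```

The reconstruction is lossy, which is exactly why the answer above cautions that the quality impact on downstream tasks should be measured before trusting the technique outside the classification setting it was designed for.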