text-classification fasttext mlp

Using fastText Sentence Vector as an Input Feature


I want to use the fastText sentence vector as an input feature.

vector = model.get_sentence_vector('Original Sentence')

I am trying to do binary classification of sentences with an MLP, which I will train on the fixed-size feature vector generated by the code above. Is this a plausible thing to do?


Solution

  • You can take the mean of the word embeddings: tokenize the sentence, look up the embedding of each word, and average them. This gives you a fixed-size NumPy array that you can use as input to whatever classifier you want. Depending on the classification task, it might be useful to remove function words first.

    Gensim has a richer Python API than fastText itself. If you just want to train a classifier quickly, the best option is to use fastText's command-line interface.
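The averaging approach described above can be sketched roughly as follows. This is a minimal illustration, not the fastText implementation: the embedding table `TOY_EMBEDDINGS`, the helper names `sentence_vector` and `toy_lookup`, and the `FUNCTION_WORDS` list are all made up for the example. With a real model you would presumably pass something like `model.get_word_vector` as the lookup instead.

```python
import numpy as np

# Hypothetical toy embedding table standing in for a trained model.
# With the fastText Python bindings, the lookup could be model.get_word_vector.
TOY_EMBEDDINGS = {
    "movie": np.array([1.0, 0.0, 2.0]),
    "great": np.array([3.0, 2.0, 0.0]),
    "the": np.array([0.5, 0.5, 0.5]),
}
DIM = 3

# Illustrative function-word list only, not a standard stop-word list.
FUNCTION_WORDS = {"the", "a", "is", "was"}


def toy_lookup(word):
    """Return the toy embedding for a word, or None if it is unknown."""
    return TOY_EMBEDDINGS.get(word)


def sentence_vector(sentence, lookup, dim, stop_words=frozenset()):
    """Average the embeddings of the whitespace-separated tokens of a sentence."""
    # Naive tokenization: lowercase and split on whitespace.
    tokens = [t for t in sentence.lower().split() if t not in stop_words]
    # Skip tokens the lookup cannot embed (only relevant for the toy table).
    vectors = [v for v in (lookup(t) for t in tokens) if v is not None]
    if not vectors:
        # No usable tokens: fall back to a zero vector of the right size.
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


features = sentence_vector("The movie was great", toy_lookup, DIM, FUNCTION_WORDS)
```

Every sentence yields a vector of the same length, so the vectors for a dataset can be stacked into a 2-D array and passed to a classifier such as an MLP.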