python machine-learning text-classification

Alternative to TfidfVectorizer


Is there any alternative to the TfidfVectorizer class in the sklearn.feature_extraction.text module? I've heard of fastText and GloVe, but couldn't find a good explanation of how to use them to vectorize text.

Edit: Basically, I have a feature called narration, which consists of English sentences. To feed it into any ML algorithm, I have to convert it into a numeric matrix representation. TF-IDF is one way. Is there any other way I can try? (It may or may not be under sklearn.)


Solution

  • What you are looking for is called text embedding; see for example this. Essentially, for your narration feature you want to turn each sequence of words into a vector, hence seq_to_vec. TF-IDF is just one of the simplest ways of doing this, and it yields a sparse representation (many more components are zero than not). I suggest you look here for a good starting point.
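
To make the seq_to_vec idea concrete, here is a minimal sketch of one common baseline: look up a pretrained vector for each word and average them, which gives one dense row per sentence. It assumes the vectors come in the plain-text layout that GloVe files use (a token followed by its components on each line). The tiny table, its values, and the helper names load_vectors and embed_sentence are made up for illustration and are not part of any library.

```python
import io

# Toy stand-in for a pretrained GloVe-style text file: one token per line,
# followed by its vector components. These values are invented for the
# example; a real file would have far more words and dimensions.
TOY_VECTORS = """\
payment 0.9 0.1 0.0
received 0.7 0.3 0.1
refund 0.1 0.9 0.2
issued 0.2 0.8 0.1
"""


def load_vectors(handle):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed_sentence(sentence, vectors):
    """Average the vectors of known words; unknown words are skipped."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim  # no known word: fall back to a zero vector
    return [sum(column) / len(known) for column in zip(*known)]


vectors = load_vectors(io.StringIO(TOY_VECTORS))
narration = ["Payment received", "Refund issued", "Unknown words only"]

# One dense row per sentence, ready to pass to an ML model.
matrix = [embed_sentence(s, vectors) for s in narration]
```

With a real pretrained file you would pass an open file handle instead of the io.StringIO object. Averaging ignores word order and uses naive whitespace tokenization, so treat it as a starting point rather than a finished pipeline.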