python neural-network nlp rasa-nlu

MITIE library for NLP


I am trying to understand how MITIE is integrated with Rasa. I want to know what exactly the MITIE file total_word_feature_extractor.dat contains, but I can't find any good documentation about it.

Thanks!


Solution

  • If you poke around deep enough in the MITIE repos on GitHub, you can find your answer. For example, here is some information about what goes into that file:

    As for what's inside, yes, it's a variant of word2vec based on the two-step CCA method from this paper: http://icml.cc/2012/papers/763.pdf. I also upgraded it to include something that is similar to the CCA method but works on out-of-sample words by analyzing their morphology to produce a word vector. This significantly improved the results on datasets containing lots of words not in the original dictionary.

    As for how MITIE integrates with Rasa: it is one of a few backend choices for Rasa. It provides pipeline components for both intent classification and named entity recognition (NER). Both components use an SVM, and both rely on total_word_feature_extractor.dat to provide the individual word vectors.
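The quoted answer says the word vectors come from the two-step CCA method in the linked paper. Below is a minimal, dense NumPy sketch of that two-step idea on a toy corpus, for illustration only; it is not MITIE's training code and says nothing about the layout of the .dat file. Step 1 runs CCA between the left and right context words; step 2 runs CCA between the word at each position and the projected context, and the word-side projection gives one k-dimensional vector per vocabulary word. The ridge term `reg`, the context width `h`, the uncentered covariances, and all function names are simplifications chosen for this sketch; a real run would need sparse matrices and a far larger corpus.

```python
import numpy as np


def inv_sqrt(c, reg):
    """Inverse square root of a symmetric PSD matrix after adding a ridge term."""
    vals, vecs = np.linalg.eigh(c + reg * np.eye(c.shape[0]))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x, y, k, reg=1e-3):
    """Top-k CCA projections for x and y (uncentered covariances, for brevity)."""
    n = x.shape[0]
    wx = inv_sqrt(x.T @ x / n, reg)
    wy = inv_sqrt(y.T @ y / n, reg)
    # SVD of the whitened cross-covariance gives the canonical directions.
    u, _, vt = np.linalg.svd(wx @ (x.T @ y / n) @ wy)
    return wx @ u[:, :k], wy @ vt.T[:, :k]


def one_hot(ids, v):
    """One row per position, with a 1 in the column of that word id."""
    m = np.zeros((len(ids), v))
    m[np.arange(len(ids)), ids] = 1.0
    return m


def two_step_cca(tokens, k=2, h=1, reg=1e-3):
    """Return {word: k-dim vector} for every word in the token list."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    ids = [index[t] for t in tokens]
    v = len(vocab)
    pos = range(h, len(ids) - h)  # positions with h words on each side
    w = one_hot([ids[i] for i in pos], v)
    left = np.hstack([one_hot([ids[i - j] for i in pos], v) for j in range(1, h + 1)])
    right = np.hstack([one_hot([ids[i + j] for i in pos], v) for j in range(1, h + 1)])
    # Step 1: CCA between left and right context gives a k-dim "state" per side.
    phi_l, phi_r = cca(left, right, k, reg)
    state = np.hstack([left @ phi_l, right @ phi_r])
    # Step 2: CCA between the word indicators and that state; the rows of
    # phi_w are the word vectors.
    phi_w, _ = cca(w, state, k, reg)
    return {word: phi_w[index[word]] for word in vocab}


corpus = "the cat sat on the mat the dog sat on the rug".split()
vectors = two_step_cca(corpus, k=2)
```

In this sketch a word that never appears in a center position (here "rug", the last token) gets no useful vector, roughly a zero vector. That is the out-of-sample gap the morphology-based extension described in the quote is meant to close.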
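The answer says both MITIE components run an SVM over word vectors taken from total_word_feature_extractor.dat. The sketch below is a rough, hypothetical illustration of that flow for intent classification, not Rasa's or MITIE's code: mean-pool the per-word vectors of a sentence, then train a binary linear SVM with the Pegasos algorithm (stochastic subgradient descent on the regularized hinge loss). `ToyExtractor` is a made-up stand-in for MITIE's extractor; it only offers `num_dimensions` and `get_feature_vector(word)`, names assumed to match MITIE's Python bindings. Mean pooling and Pegasos are choices made for this sketch.

```python
import numpy as np


class ToyExtractor:
    """Hypothetical stand-in for MITIE's total_word_feature_extractor.

    Offers only num_dimensions and get_feature_vector(word); the vectors
    come from a small hand-made table instead of the .dat file.
    """

    def __init__(self, table):
        self._table = {w: np.asarray(v, dtype=float) for w, v in table.items()}
        self.num_dimensions = len(next(iter(self._table.values())))

    def get_feature_vector(self, word):
        # Unknown words get zeros here; MITIE instead derives a vector from morphology.
        return self._table.get(word, np.zeros(self.num_dimensions))


def sentence_features(extractor, tokens):
    """Mean-pool the word vectors of a tokenized sentence."""
    vec = np.zeros(extractor.num_dimensions)
    for tok in tokens:
        vec += extractor.get_feature_vector(tok)
    return vec / max(len(tokens), 1)


def train_linear_svm(x, y, lam=0.01, epochs=500, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    Labels y must be -1 or +1. A constant 1 feature acts as the bias (and is
    regularized with the other weights, to keep the sketch short).
    """
    rng = np.random.default_rng(seed)
    x = np.hstack([x, np.ones((len(x), 1))])
    w = np.zeros(x.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (x[i] @ w) < 1  # check the margin before shrinking w
            w *= 1.0 - eta * lam
            if violates:
                w += eta * y[i] * x[i]
    return w


def predict(w, x):
    return np.sign(np.hstack([x, np.ones((len(x), 1))]) @ w)


ext = ToyExtractor({"hello": [1, 0], "hi": [1, 0], "bye": [0, 1], "goodbye": [0, 1]})
sentences = [["hello"], ["hi", "there"], ["bye"], ["goodbye", "now"]]
x = np.array([sentence_features(ext, s) for s in sentences])
y = np.array([1, 1, -1, -1])  # +1 = greet intent, -1 = goodbye intent
w = train_linear_svm(x, y)
labels = predict(w, x)
```

With MITIE installed, only the `ext = ...` line would change to load the real extractor from the .dat file. The SVM here is binary; more than two intents would need something like one-vs-rest training.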