spacy rasa-nlu

How to improve the accuracy of Rasa NLU when using spaCy as the pipeline?


The spaCy documentation mentions that it uses vector similarity for featurization and hence for classification.

For example, if we test a sentence that is not in the training data but has the same meaning, it should be classified under the same intent as the matching training sentences.

But that's not happening. Let's say the training data looks like this:

## intent: delete_event
- delete event
- delete all events
- delete all events of friday
- delete ...

Now if I test "remove event", it is not classified as delete_event; instead it falls into some other intent.

I have tried changing the pipeline to supervised_embeddings and have also changed the components of the spaCy pipeline, but the issue is still there.

I don't want to create training data for "remove..." texts, as spaCy should support them according to its documentation.

I don't have any other intents that contain "delete..." sentences.

Rasa config file:

language: "en_core_web_sm"

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "SpacyEntityExtractor"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"

policies:
- name: MemoizationPolicy
- name: KerasPolicy
- name: MappingPolicy

Solution

  • This is probably an overdone answer, but most likely you just need more training data, and that probably means including some words other than delete.

    Yes, spaCy can generalize beyond the words you include, but if all of your training data for that intent uses the word delete, then you are training the classifier either to accept only that word or to treat that word as extremely important. If you include more words similar to delete, you train it that related words are allowed.

    As for the TensorFlow pipeline (supervised_embeddings), it doesn't even know a word exists until you use it, so you would be best served by including remove at least once so that it can build the vectors connecting delete and remove (and cancel, call off, drop, etc.).

    Also, you are currently using the small spaCy language model (en_core_web_sm), which does not ship with word vectors. It may be worth trying one of the larger ones, such as en_core_web_md or en_core_web_lg, once you've got more training data.
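
To make the "include more words besides delete" advice concrete, here is a minimal sketch that expands the delete_event intent with related verbs and prints it in the same Markdown training-data layout used in the question. The function name build_intent_block and the VERBS and OBJECTS lists are hypothetical and only for illustration; they are not part of Rasa or spaCy. Templated examples like these are only a starting point, and hand-written variations will probably serve the classifier better.

```python
from itertools import product

# Hypothetical word lists for illustration; adjust them to your own domain.
VERBS = ["delete", "remove", "cancel", "call off", "drop"]
OBJECTS = ["event", "all events", "all events of friday"]


def build_intent_block(intent, verbs, objects):
    """Return Markdown training data with one example per verb/object pair."""
    lines = ["## intent: " + intent]
    for verb, obj in product(verbs, objects):
        lines.append("- " + verb + " " + obj)
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed block into the NLU training file, then retrain.
    print(build_intent_block("delete_event", VERBS, OBJECTS))
```

After retraining, testing "remove event" again should show whether the added verbs were enough or whether more varied phrasings are still needed.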