
Rasa NLU: Entity Synonyms detection inconsistency


My team and I have been using Rasa NLU as a replacement for MS LUIS for over two months now, and it has worked out pretty well for us so far. We now have around 900 entries as Entity Synonyms (since we were using the List entity in LUIS).

However, the entity is detected as a synonym for only some utterances; for the majority of utterances, Rasa is unable to detect Entity Synonyms. To get synonyms detected, I have to create another simple entity, which we again train manually with all the synonym values. Once the intents are trained with this simple entity, Rasa seems to detect the entity for those intents as both the simple entity and the synonym.

Another quick question: are Entity Synonyms in Rasa designed to return only one matched entity (unlike LUIS, which used to return all the matched entity values)?

Is there any alternative in Rasa to the LUIS list entity?


Solution

  • Entity Synonyms in Rasa can lead to some confusion. The functionality they actually provide is very simple: for each entity the model extracts, the entity's value is checked against the list of entity synonyms. If the value matches a synonym, it is replaced with the value that synonym maps to.

    The big catch in the above statement is that the entity has to be identified by the model before it can be replaced with a synonym.
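
    To make that order of operations concrete, here is a minimal Python sketch of the post-processing step. It is not Rasa's actual implementation: the helper names build_synonym_map and replace_synonyms are hypothetical, the case-insensitive lookup is an assumption of this sketch, and the extractor output is hard-coded to stand in for whatever the trained model returns.

```python
from typing import Dict, List


def build_synonym_map(entity_synonyms: List[dict]) -> Dict[str, str]:
    """Map each synonym (lowercased, an assumption of this sketch) to its canonical value."""
    mapping: Dict[str, str] = {}
    for entry in entity_synonyms:
        for synonym in entry["synonyms"]:
            mapping[synonym.lower()] = entry["value"]
    return mapping


def replace_synonyms(entities: List[dict], synonym_map: Dict[str, str]) -> List[dict]:
    """Rewrite the value of each entity the model ALREADY extracted, if it is a synonym."""
    result = []
    for entity in entities:
        updated = dict(entity)  # copy so the caller's list is left untouched
        canonical = synonym_map.get(str(entity["value"]).lower())
        if canonical is not None:
            updated["value"] = canonical
        result.append(updated)
    return result


synonyms = build_synonym_map(
    [{"value": "New York City", "synonyms": ["NYC", "nyc", "the big apple"]}]
)

# Hypothetical extractor output: the model found "NYC" as a city entity.
found = [{"start": 17, "end": 20, "value": "NYC", "entity": "city"}]
print(replace_synonyms(found, synonyms))
# [{'start': 17, 'end': 20, 'value': 'New York City', 'entity': 'city'}]

# Hypothetical extractor output for "in the center of the big apple":
# the model extracted nothing, so the synonym table is never consulted.
missed: List[dict] = []
print(replace_synonyms(missed, synonyms))
# []
```

    The second call is the catch in miniature: "the big apple" is in the synonym table, but because no entity was extracted, there is nothing to replace.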

    Take this simplified example. Here is my entity synonym definition:

    {
      "value": "New York City",
      "synonyms": ["NYC", "nyc", "the big apple"]
    }
    

    If my training data only provides this example:

    {
      "text": "in the center of NYC",
      "intent": "search",
      "entities": [
        {
          "start": 17,
          "end": 20,
          "value": "New York City",
          "entity": "city"
        }
      ]
    }
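
    Because start and end have to line up exactly with the labeled text, it can help to check them mechanically before training. Here is a small Python sketch; the helper name labeled_spans is hypothetical, not part of Rasa, and it treats end as exclusive, which matches the example above, where characters 17 to 20 of the text are "NYC".

```python
from typing import List, Tuple


def labeled_spans(example: dict) -> List[Tuple[str, str, str]]:
    """Return (entity, text covered by start/end, value) for each labeled entity.

    end is treated as exclusive, so text[start:end] is the labeled span.
    Raises ValueError if a span is empty or falls outside the text.
    """
    text = example["text"]
    spans = []
    for ent in example.get("entities", []):
        start, end = ent["start"], ent["end"]
        if not 0 <= start < end <= len(text):
            raise ValueError(f"bad offsets {start}-{end} for {text!r}")
        spans.append((ent["entity"], text[start:end], ent["value"]))
    return spans


example = {
    "text": "in the center of NYC",
    "intent": "search",
    "entities": [
        {"start": 17, "end": 20, "value": "New York City", "entity": "city"}
    ],
}

print(labeled_spans(example))
# [('city', 'NYC', 'New York City')]
```

    Note that the labeled span ("NYC") and the value ("New York City") differ; the synonym mapping is what connects the two.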
    

    It is very unlikely that my model will be able to detect an entity in a sentence like "in the center of the big apple". As I said above, if "the big apple" isn't extracted as an entity by the model, it cannot be replaced via the entity synonyms to read "New York City".

    For this reason, you should add more examples, with the entities labeled, to the actual common_examples of the training data. Once all of the variations of the entity are being extracted correctly, add those values to the entity synonyms and they will be replaced.

    [
      {
        "text": "in the center of NYC",
        "intent": "search",
        "entities": [
          {
            "start": 17,
            "end": 20,
            "value": "New York City",
            "entity": "city"
          }
        ]
      },
      {
        "text": "in the centre of New York City",
        "intent": "search",
        "entities": [
          {
            "start": 17,
            "end": 30,
            "value": "New York City",
            "entity": "city"
          }
        ]
      }
    ]
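
    With hundreds of synonym values (like the roughly 900 entries mentioned in the question), writing these examples by hand gets tedious. The following Python sketch fills a few sentence templates with every variant, computes the start/end offsets, and emits a matching entity_synonyms entry. The templates, the "{city}" placeholder and the function name build_training_data are illustrative assumptions. The output uses the rasa_nlu_data layout with common_examples and entity_synonyms keys, but check it against the format your Rasa NLU version expects.

```python
import json
from typing import Dict, List


def build_training_data(
    templates: List[str], entity: str, value: str, variants: List[str], intent: str
) -> Dict[str, dict]:
    """Fill each template's "{entity}" placeholder with every variant and label the span."""
    placeholder = "{" + entity + "}"
    examples = []
    for template in templates:
        prefix, sep, suffix = template.partition(placeholder)
        if not sep:
            raise ValueError(f"template {template!r} lacks {placeholder}")
        for variant in variants:
            start = len(prefix)  # the variant begins right after the prefix
            examples.append(
                {
                    "text": prefix + variant + suffix,
                    "intent": intent,
                    "entities": [
                        {
                            "start": start,
                            "end": start + len(variant),  # exclusive end
                            "value": value,
                            "entity": entity,
                        }
                    ],
                }
            )
    # Every variant other than the canonical value itself becomes a synonym.
    synonyms = [v for v in variants if v != value]
    return {
        "rasa_nlu_data": {
            "common_examples": examples,
            "entity_synonyms": [{"value": value, "synonyms": synonyms}],
        }
    }


data = build_training_data(
    templates=["in the center of {city}", "find hotels in {city} please"],
    entity="city",
    value="New York City",
    variants=["New York City", "NYC", "nyc", "the big apple"],
    intent="search",
)
print(json.dumps(data, indent=2))
```

    The sketch only varies the entity value; the templates themselves should still resemble how users actually phrase their requests.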
    

    I've opened a pull request against the Rasa docs to add a note to this effect.