
Prevent Luis.ai from recognizing 'a' or 'the' as entities


I created a Pattern.any entity and an intent with example utterances similar to these:

I want to run [salesforce] bot
I want to run [facebook] bot

I named the entity "BotName" and the intent "BotRun".

This works fine. However, LUIS gets the entity wrong when the user enters 'a' or 'the', for example:

I want to run a bot
I want to run the bot

In these cases, LUIS still recognizes 'a' and 'the' as BotName entities.

Is there any way to "exclude" certain words such as 'a' or 'the'? Or is there another way to solve this problem?


Solution

  • No, you can't do this directly. I have tried on several occasions to create the kind of exclusion you want, and none of the methods worked. In a LUIS pattern, square brackets [] mark words as optional, but that doesn't behave the way you'd expect. For example,

        `run [a] [the] {BotName} bot`
    

    should in theory keep those words out of the entity, but in practice "a" and "the" are still recognized as the entity.
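    A rough Python sketch of why this happens, assuming the pattern behaves like a regular expression in which each bracketed word is optional. This is only an approximation for illustration: LUIS does not match patterns with Python's `re`, and `APPROX_OPTIONAL` and `extract_bot_name` are hypothetical names. Because both words are optional, the matcher is free to skip them and hand "a" or "the" to the entity slot.

```python
import re
from typing import Optional

# Hypothetical regex approximation of the LUIS pattern `run [a] [the] {BotName} bot`.
# Square brackets mark optional text, so each bracketed word becomes "(?:word )?".
# This is NOT how LUIS is implemented; it only mimics the observed behavior.
APPROX_OPTIONAL = re.compile(r"\brun (?:a )?(?:the )?(?P<BotName>.+?) bot\b", re.IGNORECASE)


def extract_bot_name(utterance: str) -> Optional[str]:
    """Return the text captured for BotName, or None if the pattern doesn't match."""
    match = APPROX_OPTIONAL.search(utterance)
    return match.group("BotName") if match else None


if __name__ == "__main__":
    print(extract_bot_name("I want to run the facebook bot"))  # facebook
    # Nothing is left for BotName after "the", so the optional word is skipped
    # and "the" itself is captured as the bot name.
    print(extract_bot_name("I want to run the bot"))  # the
    print(extract_bot_name("I want to run a bot"))  # a
```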

    If you can be a little stricter about the required utterance format, you can use something like

        `run (a|the) {BotName} bot`
    

    which requires "a" or "the" before the bot name. LUIS patterns handle this much better. With this pattern, LUIS will not identify "a" or "the" as the entity in a phrase like "Run the bot", but it also won't recognize the intent (unless you separately add "run the bot" to the intent's regular, non-pattern example utterances). Likewise, "Run facebook bot" isn't recognized, which isn't ideal. However, I think it's reasonable to require a slightly more complete phrase, especially if you rely on this entity extraction; "Run the facebook bot" is more natural anyway. This pattern also correctly recognizes longer phrases such as "I want to run the facebook bot" or "Can you run the facebook bot?"
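    Under the same kind of regex approximation (again hypothetical, not LUIS internals), the stricter pattern makes the article mandatory, so the article is consumed by the pattern text rather than by the entity. The sketch reproduces the behavior described above: the full phrases match, while "Run the bot" and "Run facebook bot" do not.

```python
import re
from typing import Optional

# Hypothetical regex approximation of the LUIS pattern `run (a|the) {BotName} bot`.
# "(a|the)" requires exactly one of the listed words, so the article is always
# matched by the pattern itself and never handed to the entity slot.
APPROX_STRICT = re.compile(r"\brun (?:a|the) (?P<BotName>.+?) bot\b", re.IGNORECASE)


def extract_bot_name(utterance: str) -> Optional[str]:
    """Return the text captured for BotName, or None if the pattern doesn't match."""
    match = APPROX_STRICT.search(utterance)
    return match.group("BotName") if match else None


if __name__ == "__main__":
    for text in [
        "Run the facebook bot",            # facebook
        "I want to run the facebook bot",  # facebook
        "Can you run the facebook bot?",   # facebook
        "Run the bot",                     # None: no words left for BotName
        "Run facebook bot",                # None: the article is missing
    ]:
        print(f"{text!r} -> {extract_bot_name(text)}")
```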

    You can add more patterns to handle other cases, but you can't also keep a less specific version of the same phrase. For example, if you keep the simple `run {BotName} bot` pattern, it will override the more specific pattern and you'll be back to picking up "a" and "the" as bot names, because the patterns are matched independently.
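    A small sketch of that interaction, using hypothetical regex approximations of both patterns: each pattern is checked on its own, so even when the strict pattern correctly rejects "I want to run the bot", the simple pattern still matches and captures "the". How LUIS chooses between competing pattern matches is not modeled here, and `matches_by_pattern` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Hypothetical regex approximations of two LUIS patterns kept side by side.
PATTERNS = {
    "run (a|the) {BotName} bot": re.compile(r"\brun (?:a|the) (?P<BotName>.+?) bot\b", re.IGNORECASE),
    "run {BotName} bot": re.compile(r"\brun (?P<BotName>.+?) bot\b", re.IGNORECASE),
}


def matches_by_pattern(utterance: str) -> Dict[str, Optional[str]]:
    """Check every pattern independently and report what each one captured for BotName."""
    results: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        results[name] = match.group("BotName") if match else None
    return results


if __name__ == "__main__":
    # The strict pattern rejects this utterance, but the simple one captures "the".
    print(matches_by_pattern("I want to run the bot"))
```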

    You could check the extracted entity and ignore it if it is "a" or "the", but accounting for every random word a user might type is probably just as manual as listing the bot names themselves.
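    A minimal sketch of that post-check, assuming you have already pulled the raw BotName text out of the LUIS prediction response (how you do that depends on the API version and SDK you use, so it is not shown). `FILLER_WORDS` and `clean_bot_name` are hypothetical names, and the word set is exactly the part that keeps growing as users find new words to type.

```python
from typing import Optional

# Hypothetical set of words that should never be accepted as a bot name.
# In practice this set keeps growing as users type new words, which is the drawback.
FILLER_WORDS = {"a", "an", "the", "my", "this", "that"}


def clean_bot_name(raw_value: Optional[str]) -> Optional[str]:
    """Return the bot name, or None if nothing was captured or it is only filler words."""
    if raw_value is None:
        return None
    value = raw_value.strip()
    # Reject empty values and values made up entirely of filler words, e.g. "the" or "a".
    if not value or all(word in FILLER_WORDS for word in value.lower().split()):
        return None
    return value


if __name__ == "__main__":
    print(clean_bot_name("the"))       # None
    print(clean_bot_name("facebook"))  # facebook
```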

    What I've settled on is using very specific patterns, where there is little chance of extra words being captured, and adding more general utterances directly to the intent as regular example utterances. I almost always use dialogs, so I check whether the entity was found and, if not, prompt for it. You do risk frustrating users who type something like "Run my facebook bot", where the entity won't be recognized, but the real alternative is a list entity, which may not be feasible depending on how many possible values the entity has.
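    The dialog fallback can be sketched as a single decision step: use the bot name if the entity was found, otherwise ask for it. This is a framework-agnostic sketch in plain Python, not the Bot Framework SDK API; `Step`, `next_step`, and the prompt text are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """Hypothetical result of one dialog turn."""
    action: str  # "run" when we have a bot name, "prompt" when we need to ask
    text: str    # the bot name to run, or the question to show the user


def next_step(bot_name: Optional[str]) -> Step:
    """Run the bot if the entity was found; otherwise prompt the user for the name."""
    if bot_name and bot_name.strip():
        return Step(action="run", text=bot_name.strip())
    return Step(action="prompt", text="Which bot would you like to run?")


if __name__ == "__main__":
    print(next_step("facebook"))  # Step(action='run', text='facebook')
    # e.g. an utterance like "Run my facebook bot" where no BotName entity came back
    print(next_step(None))  # Step(action='prompt', text='Which bot would you like to run?')
```

    Keeping the decision in one small function makes it easy to reuse the same prompt whether the pattern missed the entity or a post-check rejected it.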