Tags: python, nlp, spacy, azure-language-understanding, named-entity-extraction

Entity detection - entities clashing with English words


I have a few sentences like the ones below:

  • what is the sales org for fpc 1234 for IS?
  • give me sales org for fpc 12234 for IS?
  • give me sales org for fpc 12234 with scope ME?

In the above sentences, the entities I'm looking for are IS, IS, and ME respectively. The entity values include IS, ME, AN, and AM, which are also common words in English sentences. I'm using LUIS for entity detection and maintaining the entities as a list entity. The issue is that although LUIS detects the entities (IS, AN, AM), it also detects them in ordinary sentences like:

  • what is the sales org for fpc 1234

This sentence contains no entity, but the entity IS is still picked up.
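
The false positive is easy to reproduce outside LUIS. The sketch below is only a rough stand-in for list-style matching (a plain case-insensitive token lookup, not LUIS's actual implementation); `SCOPE_CODES` and `match_list_entities` are names made up for illustration.

```python
import re

# Hypothetical list of entity values, taken from the examples above.
SCOPE_CODES = {"IS", "ME", "AN", "AM"}

def match_list_entities(utterance):
    """Return every token that matches a list value, ignoring case and context."""
    tokens = re.findall(r"[A-Za-z]+", utterance)
    return [tok for tok in tokens if tok.upper() in SCOPE_CODES]

# The ordinary verb "is" matches even though no scope code was given.
print(match_list_entities("what is the sales org for fpc 1234"))          # ['is']
# With a real scope code, both the verb and the code match.
print(match_list_entities("what is the sales org for fpc 1234 for IS?"))  # ['is', 'IS']
```

Because the lookup never looks at the surrounding words, it cannot tell the verb from the code, which is the behaviour described above.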

How do we detect the entities only when they're actually being referred to, and not when they're just part of the sentence construction?

A few points to note:

  • The list of entities is too long to train them as learned entities rather than use a list entity
  • We can't hard-code a rule that looks for a value occurring twice, as it may fail, like here:
    • give me sales org for fpc 12234 for IS?
    • ME and IS do not occur twice here, so the occurrence count cannot be used to create a rule.
  • The issue is not specific to LUIS but applies to entity extraction in general. I'm looking at POS tagging as well, but that needs the entity to be in capital letters to be identified as a noun, which may not always be the case.
    • I have also just tried POS tagging with spaCy. The results are below.
    • which sales org extended to the fpc 1234 for TO? - TO is classified as a preposition (which it actually is).
    • what is the sales org for fpc 1234 with scope IS? - IS is classified as a verb.

Solution

  • You've probably figured out that non-machine-learned entities are not ideal in your case because they don't take context into consideration. I think you have a few options.

    Option 1: Simple Entities

    I just tested by adding your three utterances to an intent named "Sales org" and then creating a simple entity named "Scope." I labeled IS, IS, and ME at the ends of those utterances as the Scope entity. When I then tested "give me sales org for fpc 12234 for is?", LUIS correctly identified "is" as the entity and did not tag "me".

    After making a call to LUIS, your bot code can then validate the recognized entity to make sure it's within the list of acceptable values.
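
    A minimal sketch of that validation step, assuming the recognized entity text has already been pulled out of the LUIS response as a plain string. `ALLOWED_SCOPES` and `validate_scope` are hypothetical names, and the four values stand in for the full list.

```python
# Hypothetical allow-list; in practice this would hold the full list of scope codes.
ALLOWED_SCOPES = {"IS", "ME", "AN", "AM"}

def validate_scope(recognized_text):
    """Return the normalized scope code, or None if the value is not acceptable."""
    if recognized_text is None:
        return None
    candidate = recognized_text.strip().upper()
    return candidate if candidate in ALLOWED_SCOPES else None

print(validate_scope("is"))   # IS
print(validate_scope("org"))  # None
```

    Normalizing to upper case before the lookup means the check still works when the user types the code in lower case.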

    Option 2: Roles

    If you would rather keep a list entity, you can still have LUIS give you contextual information about the entity by using roles.

    I just tested by creating an entity named "ScopeName" with your four values IS, ME, AN, and AM. I then created two roles for that entity: "scope" and "falsePositive." Then I labeled the entities in the "Sales org" utterances like this:

    [screenshot of the labeled utterances]

    If you do this, LUIS will still recognize IS, ME, AN, and AM when they're in the parts of the sentence where you don't want them to be recognized, but you'll know to ignore them in your bot code because they'll be assigned the "falsePositive" role.
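
    The filtering in the bot code could look roughly like the sketch below. It assumes each recognized entity has already been flattened into a dict with `text` and `role` keys; that shape and the name `keep_scope_entities` are hypothetical and not the literal LUIS response format.

```python
def keep_scope_entities(entities):
    """Keep only entities labeled with the 'scope' role; drop everything else."""
    return [e["text"] for e in entities if e.get("role") == "scope"]

# Hypothetical flattened result for "what is the sales org for fpc 1234 for IS?"
recognized = [
    {"text": "is", "role": "falsePositive"},
    {"text": "IS", "role": "scope"},
]
print(keep_scope_entities(recognized))  # ['IS']
```

    Entities without a role are dropped as well, which errs on the side of ignoring anything LUIS did not explicitly mark as a scope.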