nlp botframework azure-cognitive-services azure-language-understanding

How to make Microsoft LUIS case sensitive?

I have an Azure LUIS instance for NLP and tried to extract alphanumeric values using a regular expression entity. The match worked, but the extracted entity came back in lowercase.

For example:

CASE 1

My Input: " run job for AE0002" RegExCode = [a-zA-Z]{2}\d+

Output:

{
  "query": " run job for AE0002",
  "topScoringIntent": {
    "intent": "Run Job",
    "score": 0.7897274
  },
  "intents": [
    {
      "intent": "Run Job",
      "score": 0.7897274
    },
    {
      "intent": "None",
      "score": 0.00434472738
    }
  ],
  "entities": [
    {
      "entity": "ae0002",
      "type": "Alpha Number",
      "startIndex": 15,
      "endIndex": 20
    }
  ]
}

I need to maintain the case of the input.

CASE 2

My Input : "Extract only abreaviations like HP and IBM" RegExCode = [A-Z]{2,}

Output :

{
  "query": "extract only abreaviations like hp and ibm", // Query accepted by LUIS test window
  "query": "extract only abreaviations like HP and IBM", // Query accepted as an endpoint url
  "prediction": {
    "normalizedQuery": "extract only abreaviations like hp and ibm",
    "topIntent": "None",
    "intents": {
      "None": {
        "score": 0.09844558
      }
    },
    "entities": {
      "Abbre": [
        "extract",
        "only",
        "abreaviations",
        "like",
        "hp",
        "and",
        "ibm"
      ],
      "$instance": {
        "Abbre": [
          {
            "type": "Abbre",
            "text": "extract",
            "startIndex": 0,
            "length": 7,
            "modelTypeId": 8,
            "modelType": "Regex Entity Extractor",
            "recognitionSources": [
              "model"
            ]
          },
          {
            "type": "Abbre",
            "text": "only",
            "startIndex": 8,
            "length": 4,
            "modelTypeId": 8,
            "modelType": "Regex Entity Extractor",
            "recognitionSources": [
              "model"
            ]
          },....          
          {
            "type": "Abbre",
            "text": "ibm",
            "startIndex": 39,
            "length": 3,
            "modelTypeId": 8,
            "modelType": "Regex Entity Extractor",
            "recognitionSources": [
              "model"
            ]
          }
        ]
      }
    }
  }
}

This makes me suspect that the entire training happens in lowercase. What shocked me was that all the words that were initially trained to their respective entities were retrained as Abbre.

Any input would be of great help :)

Thank you

Solution

For Case 1, do you need to preserve the case in order to query the job on your system? As long as the job identifier always has uppercase characters you can just use toUpperCase(), e.g. var jobName = step._info.options.entities.Alpha_Number.toUpperCase() (not sure about the underscore in Alpha Number, I've never had an entity with spaces before).
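As a rough sketch of the toUpperCase() idea, working directly on a response shaped like the one in the question: the helper name getEntityUpperCase and the sampleResult object are made up for illustration, and this only helps if your job identifiers are always uppercase.

```javascript
// Hypothetical helper: return the first entity of the given type, upper-cased.
// Assumes the response shape shown in the question: an `entities` array of
// { entity, type, startIndex, endIndex } objects.
function getEntityUpperCase(luisResult, entityType) {
  const entities = (luisResult && luisResult.entities) || [];
  const match = entities.find((e) => e.type === entityType);
  // Return null when LUIS did not recognize an entity of that type.
  return match ? match.entity.toUpperCase() : null;
}

// Sample data copied from the Case 1 output in the question.
const sampleResult = {
  query: " run job for AE0002",
  entities: [
    { entity: "ae0002", type: "Alpha Number", startIndex: 15, endIndex: 20 },
  ],
};

console.log(getEntityUpperCase(sampleResult, "Alpha Number")); // AE0002
```

Looking the entity up by its type string sidesteps the question of how a name with a space is exposed as a property.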

For Case 2, this is a shortcoming of the LUIS application. You can force case sensitivity in the regex with (?-i) (e.g. /(?-i)[A-Z]{2,}/g). However, LUIS appears to convert everything to lowercase first, so you'll never get any matches with that statement (which is better than matching every word, but that isn't saying much!). I don't know of any way to make LUIS recognize entities in the way you are requesting.

You could create a list entity with all of the abbreviations you are expecting, but depending on the inputs you are expecting, that could be too much to maintain. Plus abbreviations that are also words would be picked up as false positives (e.g. CAT and cat). You could also write a function to do it for you outside of LUIS, basically building your own manual entity detection. There could be some additional solutions based on exactly what you are trying to do after you identify the abbreviations.
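A minimal sketch of the "do it outside of LUIS" option might look like the following. It runs a case-sensitive regex over the raw utterance (before it is sent to LUIS, or alongside the LUIS call). The function name findAbbreviations and the shape of the returned objects are assumptions of mine, not anything LUIS provides.

```javascript
// Hypothetical helper: find runs of two or more uppercase ASCII letters
// in the raw, un-normalized utterance.
function findAbbreviations(utterance) {
  // No `i` flag, so the match is case sensitive; \b keeps it to whole words.
  const pattern = /\b[A-Z]{2,}\b/g;
  const results = [];
  for (const m of utterance.matchAll(pattern)) {
    results.push({ text: m[0], startIndex: m.index, length: m[0].length });
  }
  return results;
}

const found = findAbbreviations("Extract only abreaviations like HP and IBM");
console.log(found.map((f) => f.text)); // [ 'HP', 'IBM' ]
```

This will still pick up any word typed entirely in capitals, so it probably needs a stop list or a check against known abbreviations, depending on your inputs.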