Tags: nlp, speech-recognition, stanford-nlp, pos-tagger

Possible error with Stanford POS Tagger and classifying intent and the replies


I have a specific use case in which a person says something like this:

  • "Hey (Trigger Word), note in object history XYZ" or:
  • "Hey (Trigger Word), record in object diagnosis that PQR"
  • ("object" as used in the example is a placeholder and can be replaced with words like 'Maintenance/Patient', etc.)

I would like to recognize the intent and the slots.

I then run the sentence through the Stanford POS Tagger. For example, tagging "Note in object history object was last updated in may twenty eighteen" gives this list of tuples:

[('Note', 'VB'),
 ('in', 'IN'),
 ('object', 'NN'),
 ('history', 'NN'),
 ('object', 'NN'),
 ('was', 'VBD'),
 ('last', 'RB'),
 ('updated', 'VBN'),
 ('in', 'IN'),
 ('may', 'MD'),
 ('twenty', 'CD'),
 ('eighteen', 'CD')]

  1. My question is: how can I use this information to get the output I need?

    • Where to note (we have a DB field: Object History), and
    • What to note (object was last updated in may twenty eighteen).
  2. Another issue: because the NLP input comes from an ASR system, capitalization is missing, and the POS Tagger then mis-tags 'note' as 'NN' (instead of 'VB'). Ideally 'note'/'record' should be tagged as a verb. How do I fix this likely error?


Solution

  • You can use the TrueCaseAnnotator to fix case issues:

    https://stanfordnlp.github.io/CoreNLP/truecase.html

    In general, you probably just want to use TokensRegex and write rule patterns to handle these templates. More info here:

    https://stanfordnlp.github.io/CoreNLP/tokensregex.html
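
To give a rough idea of the template approach, here is a minimal sketch that uses Python's built-in `re` module as a stand-in. It is not TokensRegex itself, only an illustration of matching the "verb + in + object + field + content" template from the question. The verb list, the field list, the single-word trigger, and the `parse_command` helper are all assumptions made for this example. Because the pattern is case-insensitive and does not rely on POS tags, the missing capitalization from ASR does not affect it.

```python
import re

# Hypothetical vocabularies for illustration; replace with the real
# command verbs and DB field names.
COMMAND_VERBS = ("note", "record")
FIELDS = ("history", "diagnosis")

PATTERN = re.compile(
    r"^(?:hey\s+\w+\s*,?\s*)?"                         # optional "hey <trigger word>,"
    r"(?P<verb>" + "|".join(COMMAND_VERBS) + r")\s+in\s+"
    r"(?P<object>\w+)\s+"                              # e.g. "object", "patient"
    r"(?P<field>" + "|".join(FIELDS) + r")\s+"
    r"(?:that\s+)?"                                    # optional "that"
    r"(?P<content>.+)$",
    re.IGNORECASE,
)


def parse_command(utterance):
    """Return intent and slots for a matching utterance, else None."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return None
    return {
        "intent": match.group("verb").lower(),
        "where": f"{match.group('object')} {match.group('field')}".lower(),
        "what": match.group("content").strip(),
    }


if __name__ == "__main__":
    print(parse_command(
        "note in object history object was last updated in may twenty eighteen"
    ))
```

The "where" slot can then be mapped to the DB field (for example "object history" to Object History), and "what" holds the free text to store. A TokensRegex rule would follow the same shape but match over tokens and their annotations rather than raw characters.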