botframework, azure-language-understanding

Pre-process user utterances in bot before forwarding them to LUIS


I am building a German-language bot that should understand Swiss number formats:

  • English format for 1 million: 1,000,000
  • German format for 1 million: 1.000.000
  • Swiss format for 1 million: 1'000'000

Unfortunately, LUIS has no Swiss culture and therefore does not correctly recognize 1'000'000 with the built-in number entity. My idea is to pre-process each user utterance before forwarding it to LUIS: if I see a Swiss thousand separator (i.e. ') with at least one digit on the left and three digits on the right, I remove it from the utterance. LUIS should then recognize the number correctly because it no longer contains thousand separators.

Does anyone have an idea how to do this in the bot, or better, in middleware? I am new to the Bot Framework and pretty much lost.

Thanks!


Solution

  • Yes, you can modify the activity before you pass it to LUIS. You just need an appropriate regex to find and remove the '. For example, here is a bot where I do this in the onTurn function, using a regex replace that I think will work for you (in Node.js):

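    // Requires: const { ActivityTypes } = require('botbuilder');
    async onTurn(context) {
        if (context.activity.type === ActivityTypes.Message) {
            // Skip messages without text (e.g. attachment-only messages).
            if (context.activity.text) {
                // Remove each ' that has a digit on its left and three digits on its right.
                context.activity.text = context.activity.text.replace(/(?<=\d)'(?=\d{3})/g, '');
            }

            const dc = await this.dialogs.createContext(context);
            const results = await this.luisRecognizer.recognize(context);
            // ... rest of the handler is not shown here
        }
    }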

    The regex looks for a ' character preceded by a digit (more digits before it are fine, as in the middle of a number) and followed by three digits. You would probably be fine with just /'(?=\d{3})/g, which matches a ' followed by three digits, although that version also removes an apostrophe that has no digit in front of it.

    The same applies if you are using C# or a different turn handler: you just need to modify activity.text before you pass it to LUIS.
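
Since the question also asks about middleware, here is a minimal sketch of the same clean-up done there, so that every recognizer and dialog further down the pipeline sees the cleaned text. The names `stripSwissThousandSeparators` and `SwissNumberMiddleware` are made up for this illustration. It assumes the usual Bot Framework JS middleware shape (an object with an `onTurn(context, next)` method registered via `adapter.use(...)`) and compares the activity type against the string 'message', which is the value of `ActivityTypes.Message`, so the snippet runs without importing the SDK.

```javascript
// Hypothetical helper: removes a ' only when it has a digit on its left
// and at least three digits on its right (same regex idea as in the answer).
function stripSwissThousandSeparators(text) {
    if (typeof text !== 'string') {
        return text; // e.g. attachment-only messages have no text
    }
    return text.replace(/(?<=\d)'(?=\d{3})/g, '');
}

// Hypothetical middleware: cleans the utterance, then hands the turn on
// to the rest of the pipeline (including the bot's own turn handler).
class SwissNumberMiddleware {
    async onTurn(context, next) {
        const activity = context.activity;
        if (activity && activity.type === 'message') {
            activity.text = stripSwissThousandSeparators(activity.text);
        }
        await next();
    }
}

// Registration, assuming `adapter` is your existing adapter instance:
// adapter.use(new SwissNumberMiddleware());
```

With this in place the bot's turn handler no longer needs its own replace call. Ordinary apostrophes such as the one in "Wie geht's" are left alone because no digit precedes them. Note that the lookbehind `(?<=...)` needs a reasonably recent Node.js version.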