uwp c++-cx grxml

Creating voice command synonyms in GRXML


I've created a voice-controlled UWP application in C++/CX (for HoloLens, if that matters). It is very simple and mostly follows the samples. This is the speech recognition event handler:

void MyAppMain::HasSpoken(SpeechContinuousRecognitionSession ^sender, SpeechContinuousRecognitionResultGeneratedEventArgs ^args)
{
    if (args->Result->Confidence == SpeechRecognitionConfidence::Medium
        || args->Result->Confidence == SpeechRecognitionConfidence::High)
    {
        process_voice_command(args->Result->Text);
    }
}

Everything works so far: the recognition result is in args->Result->Text. Now, I only need to support a very limited set of voice commands and can simply ignore everything else, but within that limited set I want some variability. The last example on this page seems to be exactly about that, so I made the following grammar file based on it:

<grammar version="1.0" xml:lang="en-US" root="nextCommands" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">

  <rule id="nextCommands">
    <item>
      <one-of>
        <item>next</item>
        <item>go</item>        
        <item>advance</item>
      </one-of>
      <tag>out="next";</tag>
    </item>
  </rule>

</grammar>

What I want is that when I say "next", "go" or "advance", the recognition engine returns just "next" in args->Result->Text above. What it actually does right now is limit the set of recognized words to those three, but it returns the word I said without converting it to "next". It looks like either the <tag> element is ignored, or I have to retrieve its content in a different way in my C++/CX program, or <tag> doesn't work the way I think it does. What should I change to make it work?


Solution

  • I have found a way to do what I want with SRGS (at least for the very simple case described in the question). It seems that <tag> doesn't change the recognition result directly (at least not with tag-format="semantics/1.0"; there are other tag-format values, as described, for example, here, and they may behave differently). Instead, it populates an additional collection of properties. This is how I changed my grammar for now:

    <grammar version="1.0" xml:lang="en-US" 
    root="nextCommands" xmlns="http://www.w3.org/2001/06/grammar" 
    tag-format="semantics/1.0">
    
      <rule id="nextCommands">
        <item>
          <one-of>
            <item>next</item>
            <item>go</item>        
            <item>advance</item>
          </one-of>
          <tag>out.HONEY="bunny";</tag>
        </item>
      </rule>
    
    </grammar>
    

    Now, when "next", "go" or "advance" is recognized, the spoken word still goes to args->Result->Text unchanged, but there is also a new pair in args->Result->SemanticInterpretation->Properties with the key HONEY and the value bunny. I can check whether that is the case with

    args->Result->SemanticInterpretation->Properties->HasKey("HONEY");
    

    and, if so, retrieve its value with

    args->Result->SemanticInterpretation->Properties->Lookup("HONEY")->GetAt(0); //returns "bunny"
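
The two calls above can be folded into one helper that returns the canonical command when the grammar attached one and falls back to the spoken text otherwise. The following is only a minimal sketch in standard C++, not C++/CX: the SemanticProperties alias is a hypothetical stand-in for SemanticInterpretation->Properties (a map from a key to a list of values), and resolve_command is an illustrative name that does not come from the Windows API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for SemanticInterpretation->Properties:
// each key maps to a list of string values.
using SemanticProperties = std::map<std::string, std::vector<std::string>>;

// Returns the first semantic value stored under `key`. If the grammar attached
// no such property (or an empty list), returns the raw recognized text instead.
std::string resolve_command(const std::string& raw_text,
                            const SemanticProperties& properties,
                            const std::string& key)
{
    auto it = properties.find(key);              // plays the role of HasKey(key)
    if (it != properties.end() && !it->second.empty()) {
        return it->second.front();               // plays the role of Lookup(key)->GetAt(0)
    }
    return raw_text;                             // no tag fired: keep the spoken word
}
```

In the real handler, the same check would presumably sit where process_voice_command is called, so that the rest of the program sees the tag value regardless of which synonym was spoken. With the grammar above, that value is "bunny" for "next", "go" and "advance".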