I've created a voice-controlled UWP application in C++/CX (for HoloLens, if that matters). It is a very simple one, mostly based on some samples. This is the speech recognition event handler:
void MyAppMain::HasSpoken(SpeechContinuousRecognitionSession ^sender, SpeechContinuousRecognitionResultGeneratedEventArgs ^args)
{
    if (args->Result->Confidence == SpeechRecognitionConfidence::Medium
        || args->Result->Confidence == SpeechRecognitionConfidence::High)
    {
        process_voice_command(args->Result->Text);
    }
}
Everything works so far: the recognition result is in the args->Result->Text variable. Now, I only need to support a very limited set of voice commands and simply ignore everything else, but within that limited set of commands I want some variability. It seems the last example on this page is exactly about that. So I made the following grammar file based on it:
<grammar version="1.0" xml:lang="en-US" root="nextCommands" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="nextCommands">
    <item>
      <one-of>
        <item>next</item>
        <item>go</item>
        <item>advance</item>
      </one-of>
      <tag>out="next";</tag>
    </item>
  </rule>
</grammar>
What I want is that when I say "next", "go" or "advance", the recognition engine just returns "next", so that this is what ends up in args->Result->Text above. What it actually does right now is limit the set of recognized words to those three, but it simply returns the word I said, without converting it to "next". It looks like either it ignores the <tag> element, or I have to retrieve its content in a different way in my C++/CX program, or <tag> doesn't work the way I think it does. What should I change to make it work?
I have found a way to do what I want with SRGS (at least for the very simple case described in the question). It seems <tag> doesn't change the recognition result directly (at least not with tag-format="semantics/1.0"; there are other tag-format values, as described, for example, here, and they may do something else). Instead, it populates an additional collection of properties. So this is how I changed my grammar for now:
<grammar version="1.0" xml:lang="en-US"
         root="nextCommands" xmlns="http://www.w3.org/2001/06/grammar"
         tag-format="semantics/1.0">
  <rule id="nextCommands">
    <item>
      <one-of>
        <item>next</item>
        <item>go</item>
        <item>advance</item>
      </one-of>
      <tag>out.HONEY="bunny";</tag>
    </item>
  </rule>
</grammar>
Now, when "next", "go" or "advance" is recognized, the word still goes to args->Result->Text unchanged, but there will also be a new pair in args->Result->SemanticInterpretation->Properties with the key HONEY and the value bunny. I can check whether that is the case with
args->Result->SemanticInterpretation->Properties->HasKey("HONEY");
and, if so, retrieve its value with
args->Result->SemanticInterpretation->Properties->Lookup("HONEY")->GetAt(0); //returns "bunny"
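The two calls above can be combined into one lookup that falls back to the spoken text when the grammar did not set the property. The following is only a rough sketch in standard C++, so that it compiles outside a UWP project: SemanticProperties is a stand-in for the Properties map (a string key mapped to a list of string values), and resolve_command is a hypothetical helper name, not part of the speech API. In the real handler, find would correspond to HasKey and front() to Lookup(...)->GetAt(0).

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in (assumption) for args->Result->SemanticInterpretation->Properties:
// each key maps to a list of string values.
using SemanticProperties = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: returns the first value stored under `key`,
// or `recognizedText` when the key is missing or has no values.
std::string resolve_command(const SemanticProperties& properties,
                            const std::string& key,
                            const std::string& recognizedText)
{
    auto it = properties.find(key);          // plays the role of HasKey(key)
    if (it != properties.end() && !it->second.empty())
    {
        return it->second.front();           // plays the role of Lookup(key)->GetAt(0)
    }
    return recognizedText;                   // no semantic value: keep the spoken word
}
```

With the grammar above, calling resolve_command with the key "HONEY" would yield "bunny" for any of the three words, and the handler could pass that result to process_voice_command instead of args->Result->Text.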