I have tried VB in Visual Studio 2010 and 2012 with .NET Framework 4.5. After installing the Speech SDK version 11, I check "Microsoft Speech Object Library ver 11" in my references. I then import "SpeechLib" and declare the speech objects with:
Public WithEvents m_Recocontext As SpInProcRecoContext
Public m_Recognizer As SpInprocRecognizer
Public m_Grammar As ISpeechRecoGrammar
and in Form_Load:
m_Recocontext = New SpInProcRecoContext
m_Recognizer = CType(m_Recocontext.Recognizer, SpInprocRecognizer)
m_Grammar = m_Recocontext.CreateGrammar(0)
Dim grammarfile As String = Application.StartupPath & "\grammartest.xml"
m_Grammar.CmdLoadFromFile(grammarfile, SpeechLoadOption.SLODynamic)
m_Recocontext.EventInterests = SpeechRecoEvents.SREAllEvents
m_Recocontext.RetainedAudio = SpeechRetainedAudioOptions.SRAORetainAudio
Then in my "SpeechOn" routine I connect the audio path to my telephony device on a phone call like this:
m_AudioIn = New SpMMAudioIn
m_AudioIn.DeviceId = TelePhoneLine.WaveRecordID
m_AudioIn.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono
m_Recognizer.AudioInputStream = m_AudioIn
m_Grammar.CmdSetRuleIdState(1, SpeechRuleState.SGDSActive)
Here is my grammar file:
<GRAMMAR LANGID="409">
<RULE ID="1" Name="number" TOPLEVEL="ACTIVE">
<L PROPNAME="number">
<P VAL="1">+one</P>
<P VAL="2">+two</P>
<P VAL="3">+three</P>
<P VAL="4">+four</P>
<P VAL="5">+five</P>
<P VAL="6">+six</P>
<P VAL="7">+seven</P>
<P VAL="8">+eight</P>
<P VAL="9">+nine</P>
<P VAL="0">+zero</P>
</L>
</RULE>
<RULE ID="2" Name="yesno" TOPLEVEL="ACTIVE">
<L PROPNAME="yesno">
<P VAL="1">+yes</P>
<P VAL="2">+no</P>
<P VAL="3">+maybe</P>
</L>
</RULE>
</GRAMMAR>
My "OnRecognition" event fires every time I speak a command-and-control word from my XML grammar file. The grammar is very small, usually just 0-9 or even just 1 or 2 (press 1 for yes and 2 for no). If I speak "1...2...3...4...5" at a normal pace, it misses every second or third number. If I speak one number per second, it gets them all. What trick am I missing to make speech recognition fast enough to be usable?
And here is the final working version. The "propname" attribute had to be added to the tag or the grammar would not load.
<GRAMMAR LANGID="409">
<RULE ID="1" Name="number" TOPLEVEL="ACTIVE">
<PHRASE min="5" max="7">
<RULEREF Name="digits" propname="digits"/>
</PHRASE>
</RULE>
<RULE Name="digits">
<L PROPNAME="digits">
<P VAL="0">0</P>
<P VAL="1">1</P>
<P VAL="2">2</P>
<P VAL="3">3</P>
<P VAL="4">4</P>
<P VAL="5">5</P>
<P VAL="6">6</P>
<P VAL="7">7</P>
<P VAL="8">8</P>
<P VAL="9">9</P>
</L>
</RULE>
</GRAMMAR>
You'll want to change your grammar (using the SAPI grammar spec) so that a single rule matches the whole account number instead of one digit per recognition. Assuming your account numbers are 7-10 digits long, you could use something like this:
<rule name="accountno">
<phrase min="7" max="10">
<ruleref name="digit" propname="digit"/>
</phrase>
</rule>
<rule name="digit">
<l>
<p val="0">0</p>
<p val="1">1</p>
<p val="2">2</p>
<p val="3">3</p>
<p val="4">4</p>
<p val="5">5</p>
<p val="6">6</p>
<p val="7">7</p>
<p val="8">8</p>
<p val="9">9</p>
</l>
</rule>
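With a repeated-digit rule like this, one Recognition event should carry the whole number, so the handler has to pull every digit out of the result rather than just one value. Below is a minimal sketch of one way to do that in VB with the SpeechLib interop. The class name DigitResultReader and the function ReadDigits are hypothetical, and the sketch assumes that each recognized digit shows up as a property carrying its VAL somewhere in the result's property tree, in spoken order. That nesting has not been verified against this grammar, so check it in the debugger first.

```vb
Imports System.Text
Imports SpeechLib

' Hypothetical helper: collects the VAL of every semantic property in a result.
Public Class DigitResultReader

    ' Returns the concatenated property values, e.g. the digits of an account number.
    ' Assumption: properties are reported in the order the digits were spoken.
    Public Shared Function ReadDigits(ByVal result As ISpeechRecoResult) As String
        Dim digits As New StringBuilder()
        AppendValues(result.PhraseInfo.Properties, digits)
        Return digits.ToString()
    End Function

    ' Depth-first walk: a RULEREF property may hold the list property as a child.
    Private Shared Sub AppendValues(ByVal props As ISpeechPhraseProperties,
                                    ByVal digits As StringBuilder)
        If props Is Nothing Then Return
        For i As Integer = 0 To props.Count - 1
            Dim prop As ISpeechPhraseProperty = props.Item(i)
            ' Properties without a VAL (such as the RULEREF wrapper) have no value.
            If prop.Value IsNot Nothing Then digits.Append(prop.Value.ToString())
            AppendValues(prop.Children, digits)
        Next
    End Sub
End Class
```

In the form, the Recognition handler for m_Recocontext would then call DigitResultReader.ReadDigits(Result) once per utterance. If the property tree turns out to be shaped differently, Result.PhraseInfo.GetText() is a simpler fallback that returns the recognized words as one string.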