.net, powershell, speech-recognition, sapi

MS SpeechRecognitionEngine not returning results


I am trying to do some simple speech recognition (from a .wav file) using PowerShell, with Microsoft.Speech.Recognition.SpeechRecognitionEngine. Sadly, I have run into some serious problems with it. First off, here is my code:

[System.Reflection.Assembly]::LoadFrom("C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll")
[System.Reflection.Assembly]::LoadWithPartialName("System.Speech")


$cult = New-Object System.Globalization.CultureInfo("en-US")

$listener = New-Object Microsoft.Speech.Recognition.SpeechRecognitionEngine($cult)
$listener.SetInputToWaveFile("C:\Users\user\Downloads\audio.wav")

$arr = @("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q" ,"r", "s", "t", "u","v","w","x","y","z","four","red")
$text = New-Object Microsoft.Speech.Recognition.Choices
$text.Add($arr)
$toGram = New-Object Microsoft.Speech.Recognition.GrammarBuilder($text)
$toGram.Culture = $cult
$gram = New-Object Microsoft.Speech.Recognition.Grammar($toGram)
$listener.LoadGrammar($gram)

Register-ObjectEvent $listener RecognizeCompleted -SourceIdentifier "RecognizeCompleted" -Action {if($EventArgs){$EventArgs.Result.Text; write-host $EventArgs.Result.Confidence} else {write-host "nope"} }
$listener.RecognizeAsync()

My problem is that when I use .Recognize() I get no output at all, not even output with 0 results. When I register for the completion of the async method (.RecognizeAsync()), the handler gets called and $EventArgs does exist, but I cannot access any properties of the variable or even get output from Get-Member.

Am I doing something obviously wrong here? I would appreciate any input, as I'm kind of going mad right now...

I would also be open to any alternatives to the MS Speech API (any command-line tool that can do basic speech recognition in English would do).

Update: The wave file contains a series of letters or numbers, for example "3 D 6 H Y".

Update: I appreciate edits, but I don't appreciate someone removing code! Thanks! Don't do it!

Update: It seems SAPI doesn't handle single characters very well (if at all). I'll probably try Sphinx next. Thanks, though, to brandon for investing so much time to help me.


Solution

  • This is from my removed comment as it's part of the answer:

    Recognize() is blocking, and the way you have it now it performs a single recognition per call. I don't have any experience with PowerShell, so correct me if I'm wrong, but it looks like you'd have to call that function (or procedure, or script, etc.) every time you want a recognition.

    Basically: if it hears "A", that's it; you have to call Recognize() again to get "B". Try it with a microphone (SetInputToDefaultAudioDevice). Lastly, both Recognize() and RecognizeAsync() raise the SpeechRecognized event, which is where you retrieve results, and it doesn't look like you handle it.

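    You'll probably want to call RecognizeAsync instead, so the engine can handle more than one bit of spoken text in the same action. Note that the parameterless RecognizeAsync() overload also performs only a single recognition; pass RecognizeMode.Multiple to keep recognizing. It can be done both ways, however.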

    Again, because I'm not familiar with PowerShell, here's some pseudo-C# code to get you on the right track:

    Recognize() method:

    function InitializeRecognizer
        setup your recognizer and audio input, .wav file etc.
        add the handler for the SpeechRecognized event.
        call the Recognize method
    
    function SpeechRecognizedHandler
        read the EventArgs data to get the speech element
        do your output or logic
        if we want to listen to some more stuff
            call Recognize() again
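
    For the blocking variant, a PowerShell version could look roughly like the sketch below. It is untested and assumes the Microsoft Speech Platform SDK 11 and an en-US recognizer language pack are installed; the DLL and .wav paths are the ones from the question, and the short word list is only illustrative. Instead of calling Recognize() again from inside the handler, it simply loops on the return value. As far as I can tell, Recognize() returns $null both at the end of the input and when a phrase is rejected, so the loop may stop early on a rejected phrase; treat that as an assumption to verify.

    ```powershell
    # Load the Microsoft.Speech assembly (path taken from the question).
    Add-Type -Path "C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll"

    $culture = New-Object System.Globalization.CultureInfo("en-US")
    $engine  = New-Object Microsoft.Speech.Recognition.SpeechRecognitionEngine($culture)
    $engine.SetInputToWaveFile("C:\Users\user\Downloads\audio.wav")

    # Illustrative vocabulary only; the question uses a longer list.
    $choices = New-Object Microsoft.Speech.Recognition.Choices
    $choices.Add([string[]]@("a", "b", "c", "four", "red"))
    $builder = New-Object Microsoft.Speech.Recognition.GrammarBuilder($choices)
    $builder.Culture = $culture
    $grammar = New-Object Microsoft.Speech.Recognition.Grammar($builder)
    $engine.LoadGrammar($grammar)

    # Each Recognize() call blocks and yields at most one phrase.
    # Keep calling it until it returns $null.
    while ($null -ne ($result = $engine.Recognize())) {
        Write-Host ("{0} (confidence {1:N2})" -f $result.Text, $result.Confidence)
    }

    $engine.Dispose()
    ```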
    

    RecognizeAsync() method:

    function InitializeRecognizer
        setup your recognizer and audio input, .wav file etc.
        add the handler for the SpeechRecognized event.
        call the RecognizeAsync() method
    
    function SpeechRecognizedHandler
        read the EventArgs data to get the speech element
        do your output or logic
        (Note: you may have to call RecognizeAsyncCancel()
           or something similar here if you run into issues 
           where it's recognizing stuff in a weird order)
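
    In PowerShell, the asynchronous variant might be sketched as follows. Again this is untested and rests on the same assumptions (Speech Platform SDK 11 plus an en-US language pack installed, paths from the question, illustrative word list). The 60-second timeout is an arbitrary choice. Results are read in a SpeechRecognized handler, and RecognizeCompleted is only used to know when to stop waiting. Inside an -Action block, plain expression output goes to the event job rather than the console, which is why the handler uses Write-Host.

    ```powershell
    Add-Type -Path "C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll"

    $culture = New-Object System.Globalization.CultureInfo("en-US")
    $engine  = New-Object Microsoft.Speech.Recognition.SpeechRecognitionEngine($culture)
    $engine.SetInputToWaveFile("C:\Users\user\Downloads\audio.wav")

    # Illustrative vocabulary only; the question uses a longer list.
    $choices = New-Object Microsoft.Speech.Recognition.Choices
    $choices.Add([string[]]@("a", "b", "c", "four", "red"))
    $builder = New-Object Microsoft.Speech.Recognition.GrammarBuilder($choices)
    $builder.Culture = $culture
    $grammar = New-Object Microsoft.Speech.Recognition.Grammar($builder)
    $engine.LoadGrammar($grammar)

    # SpeechRecognized fires once per recognized phrase.
    Register-ObjectEvent -InputObject $engine -EventName SpeechRecognized `
        -SourceIdentifier "SpeechRecognized" -Action {
            Write-Host ("{0} (confidence {1:N2})" -f $EventArgs.Result.Text, $EventArgs.Result.Confidence)
        } | Out-Null

    # RecognizeCompleted fires once, when the whole operation ends.
    # No -Action here, so the event is queued for Wait-Event.
    Register-ObjectEvent -InputObject $engine -EventName RecognizeCompleted `
        -SourceIdentifier "RecognizeCompleted"

    # Multiple keeps recognizing until the input ends or the operation is cancelled.
    $engine.RecognizeAsync([Microsoft.Speech.Recognition.RecognizeMode]::Multiple)

    # Wait (up to 60 seconds, an arbitrary limit) so the script does not exit early.
    $done = Wait-Event -SourceIdentifier "RecognizeCompleted" -Timeout 60
    if ($done -and $done.SourceEventArgs.Error) {
        Write-Host $done.SourceEventArgs.Error.Message
    }

    Unregister-Event -SourceIdentifier "SpeechRecognized"
    Unregister-Event -SourceIdentifier "RecognizeCompleted"
    $engine.Dispose()
    ```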
    

    Here's a link to the RecognizeAsync() MSDN doc, which will show you the events raised by the Recognize family.

    http://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognitionengine.recognizeasync%28v=vs.110%29.aspx