Search code examples
c#.netspeech-recognitionspeechspeech-to-text

good Speech recognition API


I am working on a college project in which I am using speech recognition. Currently I am developing it on Windows 7 and I'm using system.speech API package which comes along with .net and I am doing it on C#.

The problem I am facing is dictation recognition is not accurate enough. Then whenever I start my application the desktop speech recognition starts automatically. This is a big nuicance to me. As already the words I speak are not clear enough and conflicting recognition are interpreted as commands and actions like application switching minimize is being carried out.

This is a critical part of my app and i kindly request you to suggest any good speech API for me other than this Microsoft blunder. It will be good even if it can understand just simple dictation grammar.


Solution

  • I think desktop recognition is starting because you are using a shared desktop recognizer. You should use an inproc recognizer for your application only. you do this by instantiating a SpeechRecognitionEngine() in your application.

    Since you are using the dictation grammar and the desktop windows recognizer, I believe it can be trained by the speaker to improve its accuracy. Go through the Windows 7 recognizer training and see if the accuracy improves.

    To get started with .NET speech, there is a very good article that was published a few years ago at http://msdn.microsoft.com/en-us/magazine/cc163663.aspx. It is probably the best introductory article I’ve found so far. It is a little out of date, but very helfpul. (The AppendResultKeyValue method was dropped after the beta.)

    Here is a quick sample that shows one of the simplest .NET windows forms app to use a dictation grammar that I could think of. This should work on Windows Vista or Windows 7. I created a form. Dropped a button on it and made the button big. Added a reference to System.Speech and the line:

    using System.Speech.Recognition;
    

    Then I added the following event handler to button1:

    private void button1_Click(object sender, EventArgs e)
    {         
        SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine();
        Grammar dictationGrammar = new DictationGrammar();
        recognizer.LoadGrammar(dictationGrammar);
        try
        {
            button1.Text = "Speak Now";
            recognizer.SetInputToDefaultAudioDevice();
            RecognitionResult result = recognizer.Recognize();
            button1.Text = result.Text;
        }
        catch (InvalidOperationException exception)
        {
            button1.Text = String.Format("Could not recognize input from default aduio device. Is a microphone or sound card available?\r\n{0} - {1}.", exception.Source, exception.Message);
        }
        finally
        {
            recognizer.UnloadAllGrammars();
        }                          
    }
    

    A little more information comparing the various flavors of speech engines and APIs shipped by Microsoft can be found at What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition??