c# speech-recognition sapi

MS SAPI SpeechRecognitionEngine in C# completely wrong transcription


I'm new to MS SAPI and I'm trying to write a WAV-to-TXT conversion utility in C#/Windows Forms using the SpeechRecognitionEngine class. The transcription is completely incorrect: the recognized words don't even sound similar to what is spoken. I'm guessing this could be influenced by a long list of factors, such as the sound quality of the input WAV file and the grammar loaded into the recognition engine. I am using the DictationGrammar class.

I'd appreciate any leads from seasoned speech recognition/digital signal processing folks out there.


Solution

  • There are a few reasons you may be having such disappointing results. First, if you are using a desktop recognizer, you should train it for the speaker.

    Second, if you are converting from a WAV file, take care when choosing the format of that file. You may have to resample the WAV file, because speech recognition engines support only certain sample rates. The following format works well on Windows:

    • 8 bits per sample
    • single channel (mono)
    • 22,050 samples per second
    • PCM encoding

    See https://stackoverflow.com/a/6203533/90236 for more information.
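To make the format advice concrete, here is a minimal sketch of checking a WAV file against the format listed above before feeding it to the recognizer. The class name `WavToText` and both method names are made up for illustration. It assumes a canonical WAV header (the "fmt " chunk directly after the RIFF/WAVE header) and that an en-US desktop recognizer is installed; neither is guaranteed on your machine.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Speech.Recognition;
using System.Text;

static class WavToText
{
    // Returns true if the header describes 8-bit, mono, 22,050 Hz PCM audio.
    // Assumption: canonical layout, with the "fmt " chunk right after "WAVE".
    // Files with extra chunks before "fmt " need a real chunk parser.
    public static bool IsRecommendedFormat(Stream wav)
    {
        using (var reader = new BinaryReader(wav, Encoding.ASCII, leaveOpen: true))
        {
            string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
            reader.ReadInt32();                       // RIFF chunk size (unused)
            string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
            string fmt = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (riff != "RIFF" || wave != "WAVE" || fmt != "fmt ")
                throw new InvalidDataException("Not a canonical RIFF/WAVE header.");

            reader.ReadInt32();                       // fmt chunk size (unused)
            int formatTag = reader.ReadUInt16();      // 1 means PCM
            int channels = reader.ReadUInt16();
            int sampleRate = reader.ReadInt32();
            reader.ReadInt32();                       // byte rate (unused)
            reader.ReadUInt16();                      // block align (unused)
            int bitsPerSample = reader.ReadUInt16();

            return formatTag == 1 && channels == 1
                && sampleRate == 22050 && bitsPerSample == 8;
        }
    }

    // Transcribes a WAV file with the dictation grammar from the question.
    // Assumption: an "en-US" recognizer is installed; the constructor throws otherwise.
    public static string Transcribe(string wavPath)
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(wavPath);

            var text = new StringBuilder();
            RecognitionResult result;
            // Recognize() returns null once no further phrase is recognized,
            // which in practice includes reaching the end of the file.
            while ((result = engine.Recognize()) != null)
            {
                text.Append(result.Text).Append(' ');
            }
            return text.ToString().Trim();
        }
    }
}
```

A typical use would be to open the file with `File.OpenRead`, call `IsRecommendedFormat`, and resample with an audio tool if it returns false before calling `Transcribe`. Looking at `result.Confidence` for each phrase may also help you tell a format problem from an untrained recognizer.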