Microsoft SAPI System.Speech for transcription

I'm currently researching tools that can transcribe audio files. The first option I looked at is Microsoft's System.Speech API.

Looking through the MSDN documentation, it seems this API is better suited to short voice commands, where you have some knowledge of what to expect from the speaker. It requires you to create a Grammar to get good accuracy.

Can someone who has experience with this API confirm whether this is right?

Solution

Yes and no.

While theoretically any speech recognizer could implement SAPI (and therefore theoretically have ANY degree of accuracy), the stock Windows recognizer is, in my experience, very good for command and control, but not so good for free-form dictation or for things like keyword spotting.
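
To make the distinction concrete, here is a minimal sketch of both modes against a WAV file. It assumes a .NET project that references System.Speech, an installed Windows desktop recognizer, and a PCM WAV input. The file name sample.wav and the three command phrases are made up for illustration.

```csharp
using System;
using System.Speech.Recognition;

class TranscribeDemo
{
    static void Main(string[] args)
    {
        // Hypothetical input path; a PCM WAV file is assumed.
        string wavPath = args.Length > 0 ? args[0] : "sample.wav";

        // Command and control: the engine only has to choose among
        // a small, fixed set of phrases (made up for this example).
        using (var engine = new SpeechRecognitionEngine())
        {
            var commands = new Choices("open file", "save file", "close window");
            engine.LoadGrammar(new Grammar(new GrammarBuilder(commands)));
            engine.SetInputToWaveFile(wavPath);

            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result != null
                ? "Command: " + result.Text + " (confidence " + result.Confidence + ")"
                : "Command: no match");
        }

        // Free-form dictation: the built-in dictation grammar,
        // with no prior knowledge of what the speaker will say.
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(wavPath);

            // Recognize() handles one phrase per call and returns null
            // when nothing more is recognized (end of input or a timeout).
            RecognitionResult result;
            while ((result = engine.Recognize()) != null)
            {
                Console.WriteLine(result.Text);
            }
        }
    }
}
```

Running both halves on the same recording is a quick way to judge the trade-off for your own audio. A long silence can end the dictation loop early, so treat this as a starting point rather than a complete transcriber.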

That's not to say you couldn't recognize a robust selection of words and have it be very accurate. I've had SAPI recognize and speak Klingon, and I've used very large grammar files. It's just that when you attempt to create your own recognizer, or even your own SAPI voice, there is an absolute dearth of information. Typically the people who could help you are unlikely to, precisely BECAUSE it is so difficult or because the information they have is proprietary.

If you have a larger lexicon that you'd like to have recognized in a free-form fashion, you'd probably be better served by something like CMU Sphinx.