c# win-universal-app speech-recognition text-to-speech desktop-application

API for Live Captions on Windows

I'm building a Windows Universal App that shows live captions for a lecture that a student is watching or attending in person. I'm looking for a free, built-in solution for audio-to-text conversion.

macOS has the Speech framework https://developer.apple.com/documentation/speech , which we're going to use there, but I cannot find an equivalent on Windows. I found the docs for the Windows.Media.SpeechRecognition namespace, but cannot figure out whether it actually has an audio-to-text API or only supports command recognition: https://learn.microsoft.com/en-us/uwp/api/windows.media.speechrecognition?view=winrt-22621

Does anyone have experience building this kind of feature on Windows?

Solution

Yes, you can use the Windows.Media.SpeechRecognition API for this. It is not limited to command recognition; it also supports free-form dictation (speech to text).

For a quick test, try the official speech recognition sample, SpeechRecognitionAndSynthesis. Just remember to enable Online speech recognition first (Settings -> Privacy -> Speech).
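
If you want to see the shape of it before opening the sample, here is a minimal sketch of continuous dictation. The LiveCaptioner class and its two events are hypothetical names made up for illustration; the SpeechRecognizer calls are from Windows.Media.SpeechRecognition. It assumes the Microphone capability is declared in Package.appxmanifest and that Online speech recognition is enabled as described above.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

// Hypothetical helper class (not part of the Windows API) that wraps
// SpeechRecognizer for caption-style continuous dictation.
public sealed class LiveCaptioner : IDisposable
{
    private readonly SpeechRecognizer recognizer = new SpeechRecognizer();

    // Raised with in-progress text while the speaker is still talking.
    public event Action<string> PartialCaption;

    // Raised with the recognized text once a phrase is complete.
    public event Action<string> FinalCaption;

    public async Task StartAsync()
    {
        // A dictation topic constraint asks for free-form speech to text
        // rather than matching a fixed list of commands.
        recognizer.Constraints.Add(
            new SpeechRecognitionTopicConstraint(
                SpeechRecognitionScenario.Dictation, "dictation"));

        SpeechRecognitionCompilationResult compilation =
            await recognizer.CompileConstraintsAsync();
        if (compilation.Status != SpeechRecognitionResultStatus.Success)
        {
            throw new InvalidOperationException(
                "Could not compile constraints: " + compilation.Status);
        }

        // Hypotheses arrive while a phrase is still being spoken.
        recognizer.HypothesisGenerated += (sender, args) =>
            PartialCaption?.Invoke(args.Hypothesis.Text);

        // A result arrives each time the recognizer finishes a phrase.
        recognizer.ContinuousRecognitionSession.ResultGenerated += (sender, args) =>
        {
            if (args.Result.Status == SpeechRecognitionResultStatus.Success)
            {
                FinalCaption?.Invoke(args.Result.Text);
            }
        };

        await recognizer.ContinuousRecognitionSession.StartAsync();
    }

    public async Task StopAsync()
    {
        await recognizer.ContinuousRecognitionSession.StopAsync();
    }

    public void Dispose()
    {
        recognizer.Dispose();
    }
}
```

Both events are raised off the UI thread, so marshal back to the UI (for example with Dispatcher.RunAsync) before updating a TextBlock. Also note that, as far as I know, the recognizer listens to the default audio input device, so this fits the in-person lecture case; captioning audio played back on the same machine would likely need a different capture path.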