speech-recognition speech-to-text bing azure-cognitive-services

Service recognizes text only until I pause, and nothing after

I'm working on an app that lets the user tell a short story (1-2 minutes) and transcribes it to text.

I use MediaCapture to stream the recorded voice and send it to the Bing Speech API with chunked transfer encoding. Everything works great except for one issue: if the user pauses for a couple of seconds and then continues, nothing spoken after the pause is recognized.

I tried the same thing with a prerecorded WAV file to rule out chunked transfer as the cause, and it produced the same behavior. So the transfer is correct and I get a valid response, but only for the first part of the recording.

Has anyone run into the same issue? Is this by design, and if so, is there a way around it?

Solution

You might want to use the SDK instead. It is better suited to long-form scenarios such as dictation. With the REST API, the service waits only a few seconds of silence before it closes the connection, whereas the SDK waits longer.
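A minimal sketch of the SDK approach follows. It assumes Microsoft's Python Speech SDK (the `azure-cognitiveservices-speech` package) rather than the MediaCapture/C# setup from the question. The key, region and WAV path are placeholders, and `TranscriptCollector` and `transcribe_file` are hypothetical helper names. In continuous recognition mode the recognizer raises a `recognized` event for each phrase and keeps listening after a pause, so the phrases are collected and joined into a single transcript.

```python
import threading


class TranscriptCollector:
    """Hypothetical helper that accumulates the text of each recognized phrase."""

    def __init__(self):
        self._phrases = []
        self._lock = threading.Lock()  # SDK callbacks run on a background thread

    def add(self, text):
        # Each pause typically ends one phrase; blank text is skipped.
        text = text.strip()
        if text:
            with self._lock:
                self._phrases.append(text)

    @property
    def transcript(self):
        with self._lock:
            return " ".join(self._phrases)


def transcribe_file(wav_path, key, region):
    # Imported here so TranscriptCollector can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    collector = TranscriptCollector()
    done = threading.Event()

    def on_recognized(evt):
        # Fired once per phrase; NoMatch results are ignored.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            collector.add(evt.result.text)

    def on_stop(evt):
        # Session ends at the end of the file, or on cancellation/error.
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    # Unlike single-shot recognition, this does not stop at the first pause.
    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return collector.transcript
```

Usage, again with placeholder values: `print(transcribe_file("story.wav", "YOUR_KEY", "YOUR_REGION"))`. Keeping the accumulation in a separate class makes it easy to swap the file input for a live microphone or push stream later without changing how phrases are stitched together.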