I'm working on an app that lets the user tell a short story (1-2 minutes) and transcribes it to text.
I use MediaCapture to stream the recorded voice and send it to the Bing Speech API with chunked transfer encoding. Everything works great except for one issue: if the user pauses for a couple of seconds and then continues, nothing spoken after the pause is recognized.
I tried the same with a recorded WAV file to rule out the chunked transfer as the source of this issue, but it produced the same behavior. So the transfer is correct and I get a valid response, but only for the first part of the recording.
Has anyone run into the same issue? Is this by design, and if so, is there a way around this behavior?
You might want to use the SDK. It is better suited for long-form scenarios like dictation. The REST API waits only a few seconds before it closes the connection, whereas the SDK waits longer.