Tags: c#, speech-recognition, speech-to-text, ibm-watson

IBM Watson Speech to Text C# API


I'm using the following example to recognize text from audio: https://gist.github.com/nfriedly/0240e862901474a9447a600e5795d500. I also need time codes, so at line 40 I added "timestamps": true, and I removed "interim_results": true because I only need final results. But that broke it: after the { "state": "listening" } message it waits for a while and then throws an exception like this: "The 'Text' message received is invalid after calling WebSocket.CloseAsync. WebSocket.CloseAsync should only be used when you do not expect to receive other data from the remote endpoint. Use 'WebSocket.CloseOutputAsync' to keep the possibility of receiving additional data while closing the outgoing channel."

And if I set "continuous": false, it only transcribes the first utterance (the first few words before a pause), then repeats {"state": "listening"} and freezes.
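For reference, the JSON "start" message the question describes (word-level "timestamps" turned on, "interim_results" left out so only final results arrive) could be built roughly like this. It is a sketch assuming System.Text.Json (.NET Core 3.0 or later); the `StartMessage` class and the default "audio/wav" content type are illustrative assumptions, not taken from the gist, and "continuous" is exposed as a parameter only because the question toggles it.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class StartMessage
{
    // Builds the JSON "start" action sent as the first text message.
    // "timestamps": true requests per-word start/end times; "interim_results"
    // is simply omitted, so only final results should come back.
    // The default content type is an assumption; match it to your audio.
    public static string Build(string contentType = "audio/wav", bool continuous = true)
    {
        var start = new Dictionary<string, object>
        {
            ["action"] = "start",
            ["content-type"] = contentType,
            // false means "stop transcribing at the first pause".
            ["continuous"] = continuous,
            ["timestamps"] = true,
        };
        return JsonSerializer.Serialize(start);
    }
}
```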

How can I update that example to return time codes?


Solution

  • continuous: false means "only transcribe until the first pause", so it isn't "freezing"; it's just stopping when you tell it to.

    The service then sends the final results followed by a second {"state": "listening"} message to indicate that it's done sending results. The example code closes the connection after that, but it sounds like you're still attempting to send audio after closing the connection.

    I'm not certain, but I think that timestamps and interim_results will probably work the way you want once you set continuous: false.

    Although, if you only need final results, then the HTTP interface might make more sense. It's much simpler than the WebSockets one.

    Finally, as I mentioned by email, the official IBM Watson .NET SDK has support for Speech to Text in the development branch right now, and should include it in a release soon.
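The closing sequence the answer describes (wait for the second {"state": "listening"} message, and only then close) can be sketched as follows, assuming a `System.Net.WebSockets.ClientWebSocket` that has already sent the "start" message. The names `SttSession`, `IsListeningState`, `SendStopAsync` and `ReceiveUntilDoneAsync` are hypothetical. Sending {"action": "stop"} after the last audio chunk is, as far as I know, how the service is told no more audio follows; check the API reference for your service version. The audio-sending code must have finished (or been cancelled) before the close, since sending after it is what the exception in the question complains about.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class SttSession
{
    // True if the message is the service's {"state": "listening"} notification.
    public static bool IsListeningState(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        return root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("state", out var state)
            && state.ValueKind == JsonValueKind.String
            && state.GetString() == "listening";
    }

    // Tells the service that no more audio will follow.
    public static Task SendStopAsync(ClientWebSocket ws, CancellationToken ct)
    {
        var bytes = Encoding.UTF8.GetBytes("{\"action\": \"stop\"}");
        return ws.SendAsync(new ArraySegment<byte>(bytes), WebSocketMessageType.Text, true, ct);
    }

    // Reads messages until the second {"state": "listening"}: the first follows
    // the "start" action, the second follows the last final result. Only then
    // is the socket closed, because the service has nothing left to send.
    public static async Task ReceiveUntilDoneAsync(
        ClientWebSocket ws, Action<string> onMessage, CancellationToken ct)
    {
        var buffer = new byte[8192];
        int listeningCount = 0;
        while (ws.State == WebSocketState.Open)
        {
            // Collect all frames of one message before decoding it as UTF-8.
            using var ms = new MemoryStream();
            WebSocketReceiveResult result;
            do
            {
                result = await ws.ReceiveAsync(new ArraySegment<byte>(buffer), ct);
                if (result.MessageType == WebSocketMessageType.Close)
                    return; // the server ended the session itself
                ms.Write(buffer, 0, result.Count);
            } while (!result.EndOfMessage);

            string message = Encoding.UTF8.GetString(ms.ToArray());
            onMessage(message);

            if (IsListeningState(message) && ++listeningCount == 2)
            {
                await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "done", ct);
                return;
            }
        }
    }
}
```

In practice the receive loop would run concurrently with the audio-sending loop (for example with `Task.WhenAll`), with `SendStopAsync` called once all audio has been sent.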
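With "timestamps": true, each final result should carry a "timestamps" array of [word, start, end] entries (times in seconds) inside its first alternative. A sketch of extracting those with System.Text.Json, assuming that response layout; the `WordTiming` record and `TimestampParser` class are hypothetical names, and messages without "results" (such as the "listening" state) simply yield an empty list.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// One recognized word with its start and end time in seconds.
public record WordTiming(string Word, double Start, double End);

public static class TimestampParser
{
    public static List<WordTiming> Parse(string json)
    {
        var words = new List<WordTiming>();
        using var doc = JsonDocument.Parse(json);
        if (!doc.RootElement.TryGetProperty("results", out var results))
            return words; // e.g. a {"state": "listening"} message

        foreach (var result in results.EnumerateArray())
        {
            // Keep only final hypotheses.
            if (!result.TryGetProperty("final", out var final) || !final.GetBoolean())
                continue;
            var alternatives = result.GetProperty("alternatives");
            if (alternatives.GetArrayLength() == 0)
                continue;
            // "timestamps" is present only when the start message requested it.
            if (!alternatives[0].TryGetProperty("timestamps", out var stamps))
                continue;
            foreach (var t in stamps.EnumerateArray())
                words.Add(new WordTiming(t[0].GetString(), t[1].GetDouble(), t[2].GetDouble()));
        }
        return words;
    }
}
```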
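For the HTTP route suggested for final-results-only use, a rough sketch: one POST of the whole audio file to a /v1/recognize endpoint with timestamps=true in the query string. The base URL, the Basic-auth credentials and the "audio/wav" content type are assumptions to replace with your instance's values (newer IBM Cloud instances, I believe, accept Basic auth with the user name "apikey" and the API key as password). Other settings from the WebSocket start message can probably be passed as query parameters the same way; verify that against the API reference. `SttHttp`, `BuildRecognizeUrl` and `RecognizeAsync` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class SttHttp
{
    // baseUrl is your service URL (an assumption; it differs per instance).
    public static string BuildRecognizeUrl(string baseUrl) =>
        baseUrl.TrimEnd('/') + "/v1/recognize?timestamps=true";

    // Sends the whole file in one request and returns the raw JSON response,
    // which should contain the same "results" structure as the WebSocket API.
    public static async Task<string> RecognizeAsync(
        HttpClient http, string baseUrl, string user, string password, string wavPath)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, BuildRecognizeUrl(baseUrl));
        var token = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{user}:{password}"));
        request.Headers.Authorization = new AuthenticationHeaderValue("Basic", token);

        var body = new ByteArrayContent(await File.ReadAllBytesAsync(wavPath));
        body.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");
        request.Content = body;

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Compared with the WebSocket version there is no connection state to manage: the response arrives once the whole file has been processed, so nothing can be sent after a close.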