Tags: ibm-cloud, speech-to-text, ibm-watson

Don't receive results other than those from the first audio chunk


I want some level of real-time speech-to-text conversion. I am using the WebSocket interface with interim_results=true. However, I am receiving results for the first audio chunk only; the second, third, ... audio chunks that I am sending are not getting transcribed. I do know that my receiver is not blocked, since I do receive the inactivity message:

json {"error": "Session timed out due to inactivity after 30 seconds."}

Please let me know if I am missing something or if I need to provide more contextual information.

Just for reference, this is my init JSON:

{
  "action": "start",
  "content-type": "audio/wav",
  "interim_results": true,
  "continuous": true,
  "inactivity_timeout": 10
}
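
For context, this is roughly how I open the connection and send that start message (simplified sketch; I am using gorilla/websocket, and the endpoint URL and token handling below are placeholders, not my exact setup):

package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

func main() {
	header := http.Header{}
	header.Set("Authorization", "Bearer <access-token>") // placeholder

	// Placeholder endpoint; the real URL depends on the service instance.
	conn, _, err := websocket.DefaultDialer.Dial(
		"wss://api.us-south.speech-to-text.watson.cloud.ibm.com/v1/recognize",
		header,
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Same start message as the init JSON above.
	start := map[string]interface{}{
		"action":             "start",
		"content-type":       "audio/wav",
		"interim_results":    true,
		"continuous":         true,
		"inactivity_timeout": 10,
	}
	if err := conn.WriteJSON(start); err != nil {
		log.Fatal(err)
	}
}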

In the results that I get for the first audio chunk, the "final" field in the JSON is always received as false.
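
For completeness, this is roughly how I read and decode result messages (simplified; the struct only covers the fields I actually look at):

type recognizeResponse struct {
	Results []struct {
		Final        bool `json:"final"`
		Alternatives []struct {
			Transcript string `json:"transcript"`
		} `json:"alternatives"`
	} `json:"results"`
	Error string `json:"error"`
}

func readResults(conn *websocket.Conn) {
	for {
		var resp recognizeResponse
		if err := conn.ReadJSON(&resp); err != nil {
			log.Println("read:", err)
			return
		}
		if resp.Error != "" {
			log.Println("service error:", resp.Error) // e.g. the inactivity timeout above
			return
		}
		for _, r := range resp.Results {
			if len(r.Alternatives) > 0 {
				log.Printf("final=%v transcript=%q", r.Final, r.Alternatives[0].Transcript)
			}
		}
	}
}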

Also, I am using Go, but that should not really matter.

EDIT:

Consider the following pseudo log (a minimal sketch of the forwarding step follows the list):

  • localhost-server receives the first 4 seconds of binary data # let's call it Binary 1
  • Binary 1 is sent to Watson
  • {interim_result_1 for first chunk}
  • {interim_result_2 for first chunk}
  • localhost-server receives the last 4 seconds of binary data # let's call it Binary 2
  • Binary 2 is sent to Watson
  • Send {"action": "stop"} to Watson
  • {interim_result_3 for first chunk}
  • Final result for the first chunk
  • No transcription is ever received for the second chunk
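
The forwarding step from that log looks roughly like this (simplified sketch; chunk boundaries are whatever my localhost server hands me, and conn is the same gorilla/websocket connection as above):

// Forward each audio chunk to Watson as a binary WebSocket message,
// then tell the service that no more audio is coming.
func streamChunks(conn *websocket.Conn, chunks <-chan []byte) error {
	for chunk := range chunks {
		if err := conn.WriteMessage(websocket.BinaryMessage, chunk); err != nil {
			return err
		}
	}
	return conn.WriteJSON(map[string]string{"action": "stop"})
}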

Link to code


Solution

  • The solution was to set the size fields in the WAV header to 0, so the service does not stop transcribing once the originally declared audio length has been consumed.
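
For reference, a minimal sketch of that fix in Go, assuming a canonical 44-byte PCM WAV header (the offsets would differ if the header carries extra sub-chunks); it only needs encoding/binary from the standard library:

// Zero the RIFF chunk size (bytes 4-7) and the data sub-chunk size
// (bytes 40-43) so the declared length never cuts the stream short.
func zeroWavSizes(header []byte) {
	if len(header) < 44 {
		return
	}
	binary.LittleEndian.PutUint32(header[4:8], 0)   // RIFF chunk size
	binary.LittleEndian.PutUint32(header[40:44], 0) // "data" sub-chunk size
}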