
Which audio file type does the Watson Speech to Text service process faster?


I have tried the Watson Speech to Text API with both MP3 and WAV files. In my observation, audio of the same length takes less time when it is sent as MP3 than as WAV. Over 10 consecutive API calls with different audio clips, the MP3 files took 8.7 seconds on average, while the same input in WAV format took 11.1 seconds on average. Does the service's response time depend on the file type? Which file type is recommended for getting results faster?
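A minimal sketch of how such a comparison could be timed, assuming a hypothetical `transcribe` callable that wraps whatever client is used (for example the `ibm-watson` Python SDK) and blocks until the transcript comes back. It measures wall-clock time per call with `time.perf_counter`, so each number includes the upload as well as the recognition itself.

```python
import statistics
import time
from typing import Callable, Iterable


def mean_latency(transcribe: Callable[[bytes], object], payloads: Iterable[bytes]) -> float:
    """Return the average wall-clock seconds per recognition call.

    `transcribe` is a hypothetical wrapper around the recognition request.
    The measured time covers network upload, queueing and recognition together,
    so it cannot tell transfer cost apart from processing cost.
    Raises statistics.StatisticsError if `payloads` is empty.
    """
    durations = []
    for audio in payloads:
        start = time.perf_counter()
        transcribe(audio)  # blocks until the (hypothetical) service responds
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```

Running it once with the MP3 payloads and once with the WAV payloads of the same recordings gives averages like the ones quoted above.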


Solution

  • Different encoding formats have different bitrates. MP3 and Opus are lossy compression formats (although still suitable for speech recognition when the bitrate is not too low), so they offer the lowest bitrates. Pushing fewer bytes over the network is typically better for latency, so depending on your network speed you may see shorter processing times with lower-bitrate encodings.

    However, the speech recognition itself (ignoring the data transfer over the network) is equally fast for all encodings: before recognition starts, the audio is decompressed if necessary and converted to the sampling rate of the target model (broadband or narrowband).
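The trade-off described above can be made concrete with some back-of-the-envelope arithmetic. This is a sketch under assumed numbers, not measurements of the service: 60 seconds of speech, WAV as 16 kHz 16-bit mono linear PCM (256 kbit/s, header ignored), MP3 at 32 kbit/s, a 10 Mbit/s uplink, and a 16 kHz broadband model. The helper names are hypothetical.

```python
def payload_bytes(duration_s: float, bitrate_bps: int) -> int:
    """Bytes needed to carry `duration_s` seconds of audio at a constant bitrate."""
    return int(duration_s * bitrate_bps / 8)


def upload_seconds(size_bytes: int, uplink_bps: float) -> float:
    """Idealized transfer time, ignoring protocol overhead and round trips."""
    return size_bytes * 8 / uplink_bps


def pcm_samples(duration_s: float, model_rate_hz: int) -> int:
    """Samples the recognizer works on after decoding and resampling."""
    return int(duration_s * model_rate_hz)


DURATION = 60.0          # seconds of speech (assumed)
WAV_BPS = 16_000 * 16    # 16 kHz, 16-bit, mono PCM = 256 kbit/s (assumed)
MP3_BPS = 32_000         # 32 kbit/s MP3 (assumed)
UPLINK = 10_000_000      # 10 Mbit/s upload (assumed)
MODEL_RATE = 16_000      # broadband model sampling rate

for name, bps in (("wav", WAV_BPS), ("mp3", MP3_BPS)):
    size = payload_bytes(DURATION, bps)
    # Transfer time differs with the encoding; the decoded sample count does not.
    print(name, size, round(upload_seconds(size, UPLINK), 3), pcm_samples(DURATION, MODEL_RATE))
```

With these assumed numbers the WAV payload is 1,920,000 bytes (about 1.54 s to upload) versus 240,000 bytes (about 0.19 s) for MP3, yet both decode to the same 960,000 samples for the recognizer. Any speed difference therefore comes from the transfer, not from the recognition step.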