Google Speech API output changes every time for the same audio file

Google Speech API output changes every time for the same audio file. Is there a way to get the same output each time, or to pin the model the transcriber uses?

Solution

You're probably using the "default" model, which is the one recommended for audio. But I found out (and I wasn't alone) that it's not that great. You're way better off using the "video" model (it's one of the enhanced models and requires data logging). I suggest trying the video model even if you're transcribing only audio.
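A minimal sketch of naming the model explicitly, assuming the v1 REST `speech:recognize` method and its JSON field names (`model`, `useEnhanced`, `languageCode`, `sampleRateHertz`, `encoding`). The LINEAR16 encoding, 16 kHz sample rate, `en-US` language code and the `gs://` URI are placeholder assumptions, and `build_recognize_request` is a hypothetical helper. It only builds the request body; it does not call the API.

```python
import json


def build_recognize_request(gcs_uri: str,
                            model: str = "video",
                            language_code: str = "en-US",
                            sample_rate_hertz: int = 16000) -> dict:
    """Build a speech:recognize request body that names the model explicitly.

    Hypothetical helper; field names follow the v1 REST RecognitionConfig.
    """
    return {
        "config": {
            # Assumption: 16-bit linear PCM audio. Change to match your files.
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            # Pin the model instead of leaving the choice to the default.
            "model": model,
            # "video" is an enhanced model, so ask for the enhanced variant.
            "useEnhanced": True,
        },
        # Audio stored in Cloud Storage (placeholder URI supplied by the caller).
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    body = build_recognize_request("gs://example-bucket/sample.wav")  # hypothetical file
    print(json.dumps(body, indent=2))
```

You would send this body (with your credentials) to the recognize endpoint. Because the model is named in every request, each run asks for the same one rather than whatever the default happens to be.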

Also, if the files you'll be transcribing share a common theme, try to supply some common phrases to the API. The results improve a lot and become noticeably more stable when you do that. (ref: SpeechContext)
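A minimal sketch of adding phrase hints, assuming the v1 REST JSON shape where `RecognitionConfig.speechContexts` is a list of SpeechContext objects, each with a `phrases` list. `with_phrase_hints` and the example phrases are hypothetical; swap in the vocabulary that actually recurs in your files.

```python
import copy
from typing import Sequence


def with_phrase_hints(config: dict, phrases: Sequence[str]) -> dict:
    """Return a copy of a RecognitionConfig dict with one more SpeechContext.

    Hypothetical helper; the caller's dict is left unchanged.
    """
    updated = copy.deepcopy(config)
    # Append to any existing speechContexts rather than overwriting them.
    updated.setdefault("speechContexts", []).append({"phrases": list(phrases)})
    return updated


if __name__ == "__main__":
    base = {
        "encoding": "LINEAR16",      # placeholder audio settings
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "model": "video",
        "useEnhanced": True,
    }
    # Hypothetical domain phrases for, say, earnings-call recordings.
    print(with_phrase_hints(base, ["quarterly earnings", "EBITDA", "guidance"]))
```

Returning a copy keeps one shared base config reusable across different sets of files, each with its own phrase list.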