I am trying to use the Google Speech-to-Text API. However, I keep getting "Specify MP3 encoding to match audio file", even though I have set up my code to go through all of the available encodings.
This is the file I am trying to use
I should add that if I upload the file through their UI, I do get an output, so I assume there is nothing wrong with the source file.
import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')
speech_file = 'chunk7.mp3'

with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16,
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]
SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Detects speech in the audio file
        response = []
        print(response)
        try:
            response = client.recognize(config, audio)
            print(response)
        except:
            pass
        print("-----------------------------------------------------")
        print(str(rate) + " " + str(enco))
        print("response: ", str(response))
There is also another file here, in Persian ('fa-IR'), with which I face a similar issue. I initially posted the Obama file because it is easier to understand. I would appreciate it if you could test your answer with the second file as well.
It looks like you have an unsupported audio format. The easiest fix is to convert it to another format (FLAC is advised). You have two options:
Convert it yourself on your machine:
1) Install sox (audio editing)
2) Install the encoders it needs:
* [lame](http://lame.sourceforge.net) mp3 encoder
* [flac](https://xiph.org/flac/download.html) flac encoder
3) Run the command:
sox source.mp3 --channels=1 --bits=16 dest.flac
In that case, you can also use Python to execute the command:
import subprocess
subprocess.check_output(['sox',sourcePath,'--channels=1','--bits=16',destPath])
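A slightly more defensive version of that call might look like the sketch below. It assumes `sox` is installed and on the `PATH`; the helper names `build_sox_command` and `convert_to_flac` are made up for illustration and are not part of any library.

```python
import shutil
import subprocess


def build_sox_command(source_path, dest_path):
    """Return the sox argument list for a mono, 16-bit conversion."""
    return ['sox', source_path, '--channels=1', '--bits=16', dest_path]


def convert_to_flac(source_path, dest_path):
    """Convert an audio file with sox (hypothetical helper).

    Raises RuntimeError if sox is not installed, and
    subprocess.CalledProcessError if sox exits with an error.
    """
    if shutil.which('sox') is None:
        raise RuntimeError('sox was not found on PATH')
    # check=True turns a non-zero exit status into an exception
    subprocess.run(build_sox_command(source_path, dest_path), check=True)
    return dest_path
```

Calling `convert_to_flac('source.mp3', 'dest.flac')` should then leave a FLAC file next to the original, and a failed conversion surfaces as an exception instead of being silently ignored.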
Note that you don't need to specify either sample_rate_hertz or encoding, because all of that information is in the FLAC headers themselves, so you can omit them:
# audio must be built from the bytes of the converted FLAC file
config = types.RecognitionConfig(language_code="fa-IR")
response = client.recognize(config, audio)
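If you want to confirm locally that the converted file really carries the sample rate and channel count in its header, here is a minimal sketch that reads the FLAC STREAMINFO block by hand. The function name `read_flac_streaminfo` is hypothetical, and the sketch assumes a plain FLAC stream that starts with the `fLaC` marker (it does not handle a leading ID3 tag or Ogg-wrapped FLAC).

```python
import struct


def read_flac_streaminfo(data):
    """Return (sample_rate, channels, bits_per_sample) from FLAC bytes."""
    if len(data) < 26:
        raise ValueError('need at least the first 26 bytes of the file')
    if data[:4] != b'fLaC':
        raise ValueError('not a FLAC stream (missing fLaC marker)')
    # Byte 4: top bit is the last-block flag, low 7 bits are the block type.
    # STREAMINFO is type 0 and is always the first metadata block.
    if data[4] & 0x7F != 0:
        raise ValueError('first metadata block is not STREAMINFO')
    # Skip marker (4 bytes), block header (4), block/frame size fields (10).
    # The next 64 bits pack: sample rate (20 bits), channels - 1 (3 bits),
    # bits per sample - 1 (5 bits), total samples (36 bits).
    (packed,) = struct.unpack('>Q', data[18:26])
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    return sample_rate, channels, bits_per_sample
```

To use it, open `dest.flac` in binary mode, read the first 42 bytes, and pass them in. After the sox command above you would expect one channel and 16 bits per sample.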
Resources: troubleshooting