google-speech-api google-cloud-speech

400 Specify MP3 encoding to match audio file


I am trying to use the Google Speech-to-Text API; however, I keep getting "Specify MP3 encoding to match audio file" even though I have set up my code to go through all available encodings.

This is the file I am trying to use

I should add that if I upload the file through their UI, I do get an output, so I assume there is nothing wrong with the source file.

import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

# Read the whole audio file into memory and wrap it for the API
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)


ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16, 
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS, 
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Try to detect speech with this encoding/sample-rate combination
        response = None
        try:
            response = client.recognize(config, audio)
        except Exception as exc:
            # Show the API error instead of silently swallowing it
            print("error: ", exc)
        print("-----------------------------------------------------")
        print(str(rate) + "   " + str(enco))
        print("response: ", str(response))

Alternatively, there is another file here in Persian ('fa-IR') with which I face a similar issue. I initially posted the Obama file because it is easier to understand. I would appreciate it if you tested your answer with the second file as well.


Solution

  • It looks like your audio is in a format the API does not support. The easiest fix is to convert it to another format (FLAC is advised). You have two options:

    • Search Google for an online audio converter
    • Convert it yourself on your machine:

      1) Install sox (an audio editing tool)

      2) Install the encoders it needs:

       * [lame](http://lame.sourceforge.net) mp3 encoder
       * [flac](https://xiph.org/flac/download.html) flac encoder
      

      3) Run the command:

      sox source.mp3 --channels=1 --bits=16 dest.flac

    You can also run the command from Python:

    import subprocess
    # sourcePath and destPath are the input MP3 and output FLAC file paths
    subprocess.check_output(['sox', sourcePath, '--channels=1', '--bits=16', destPath])
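
    The sox call above could be wrapped in a small helper that fails loudly when sox is missing or the conversion goes wrong. This is only a minimal sketch and not part of the original answer: `build_sox_command` and `convert_to_flac` are hypothetical names, and it assumes sox is on the PATH and was installed with MP3 support.

```python
import shutil
import subprocess


def build_sox_command(source_path, dest_path):
    """Return the sox argument list for a mono, 16-bit conversion."""
    # Format options placed before the output file apply to the output.
    return ['sox', source_path, '--channels=1', '--bits=16', dest_path]


def convert_to_flac(source_path, dest_path):
    """Convert source_path to dest_path with sox; raise if sox fails."""
    if shutil.which('sox') is None:
        raise RuntimeError('sox was not found on PATH')
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_sox_command(source_path, dest_path), check=True)
    return dest_path
```

    For example, `convert_to_flac('chunk7.mp3', 'chunk7.flac')` would produce the file to read and send to the API in place of the MP3.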
    

    Note that you need to specify neither sample_rate_hertz nor encoding, because that information is already in the FLAC header, so you can omit them:

    config = types.RecognitionConfig(language_code="fa-IR")
    response = client.recognize(config, audio)
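
    Omitting the encoding only works if the file really is FLAC, so a quick look at its first bytes before calling the API may save a confusing 400 error. The following is a rough sketch: `sniff_audio_format` and `sniff_audio_file` are hypothetical helpers (not part of google-cloud-speech), and the check is a heuristic based on well-known file signatures, not a full parser.

```python
def sniff_audio_format(header):
    """Guess the container from the first bytes of a file (best effort)."""
    if header.startswith(b'fLaC'):
        return 'flac'
    if header.startswith(b'RIFF') and header[8:12] == b'WAVE':
        return 'wav'
    if header.startswith(b'OggS'):
        return 'ogg'
    # MP3 files usually start with an ID3v2 tag or directly with a frame
    # sync: 0xFF followed by a byte whose top three bits are set.
    # Caveat: other formats (e.g. ADTS AAC) can also match the sync test.
    if header.startswith(b'ID3'):
        return 'mp3'
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return 'mp3'
    return 'unknown'


def sniff_audio_file(path):
    """Read the first 12 bytes of path and guess its format."""
    with open(path, 'rb') as f:
        return sniff_audio_format(f.read(12))
```

    If `sniff_audio_file(speech_file)` still reports 'mp3', the file sent to the API is the unconverted original, and the request will most likely fail with the same error.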
    

    Resources: troubleshooting