Tags: python, google-speech-api, google-cloud-speech

Empty response using Speech-to-Text API on WAV file


I have a WAV file generated from a WebRTC stream. The sample demo transcribes it and returns results, but my code gets an empty response. Here's my config:

audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.OGG_OPUS,
    sample_rate_hertz=48000,
    language_code="es-US",
    audio_channel_count=2,
    enable_separate_recognition_per_channel=True,
    use_enhanced=True,
    model="command_and_search"
)



Solution

  • I tested your file with the configuration you provided and got blank results as well. I'm not sure what code or API version the "Try it" demo uses on the backend that lets it handle your audio file. As a workaround, I converted your file to FLAC, and that worked.

    To convert the file I used FFmpeg. Any audio converter will do, as long as it produces a valid FLAC file. The command:

    ffmpeg -i hola.wav hola.flac
    

    With the converted file, I changed the audio encoding in the config to FLAC and the transcription worked. See the code below:

    from google.cloud import speech


    def transcribe_file(speech_file):
        """Transcribe a local FLAC file with the synchronous recognize call."""
        client = speech.SpeechClient()

        # Read the raw audio bytes to send inline with the request.
        with open(speech_file, "rb") as audio_file:
            content = audio_file.read()

        audio = speech.RecognitionAudio(content=content)
        config = speech.RecognitionConfig(
            # The encoding must match the converted file.
            encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
            sample_rate_hertz=48000,
            audio_channel_count=2,
            language_code="en-US",
            model="command_and_search",
        )

        response = client.recognize(config=config, audio=audio)
        print(response)

        # Print the top alternative of each result.
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")


    transcribe_file("./hola.flac")
    

    Output: [screenshot of the console output; image not preserved]

    Also, for reference: if you still get an empty result after trying to optimize the audio (for example, splitting it into mono), try converting the file to FLAC, as the troubleshooting docs suggest.
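
The conversion and mono steps mentioned above can be scripted. The following is only a minimal sketch, not part of the original answer: the helper names wav_info and ffmpeg_flac_command are made up for illustration. It assumes the input is an uncompressed PCM WAV (the standard-library wave module raises wave.Error on anything else, which is itself a hint that the file is not what the config expects) and that ffmpeg is installed. The first helper reads the channel count and sample rate from the WAV header so that audio_channel_count and sample_rate_hertz can be checked against the file. The second only builds the ffmpeg command, optionally downmixing to mono with -ac 1.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, sample_width_bytes) from a PCM WAV header.

    Hypothetical helper; `path` must be a str path to an uncompressed WAV file.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth()


def ffmpeg_flac_command(src, dst, mono=False):
    """Build (but do not run) an ffmpeg command that converts src to dst.

    Hypothetical helper; ffmpeg picks the FLAC encoder from the .flac extension.
    """
    cmd = ["ffmpeg", "-i", src]
    if mono:
        cmd += ["-ac", "1"]  # downmix to a single audio channel
    cmd.append(dst)
    return cmd
```

To actually run the conversion, the command list can be passed to subprocess.run(ffmpeg_flac_command("hola.wav", "hola.flac"), check=True), assuming ffmpeg is on the PATH. If the audio is downmixed to mono, audio_channel_count in the config should be set to 1 to match.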