google-cloud-speech

MP3 AudioEncoding not working, am I currently running v1beta1?


I am trying to transcribe audio from a stream using this tutorial (section, "Performing streaming speech recognition on a local file"): https://cloud.google.com/speech-to-text/docs/streaming-recognize

The file is an M3U file, so I am trying to use the RecognitionConfig.AudioEncoding.MP3 option, but the MP3 attribute is being rejected. When I try to autocomplete the option, MP3 does not appear either.

The documentation shows that the MP3 attribute is only available in version v1beta1 (https://cloud.google.com/text-to-speech/docs/reference/rpc/google.cloud.texttospeech.v1beta1#google.cloud.texttospeech.v1beta1.AudioEncoding), and I have already run the pip upgrade.

Is there something else I need to do to install v1beta1?


Solution

  • Note that the second link you shared, about v1beta1, is for the Text-to-Speech API, which does the opposite of the Speech-to-Text API that your examples use.

    To use RecognitionConfig.AudioEncoding.MP3 with Speech-to-Text, you'll need the v1p1beta1 version instead. No changes are needed to the pip command (pip install --upgrade google-cloud-speech), but you need to import the right version (speech_v1p1beta1) in your Python code:

    # [START speech_transcribe_streaming]
    def transcribe_streaming(stream_file):
        """Streams transcription of the given audio file."""
        import io
        from google.cloud import speech_v1p1beta1
        from google.cloud.speech_v1p1beta1 import enums
        from google.cloud.speech_v1p1beta1 import types
        client = speech_v1p1beta1.SpeechClient()
    

    And now you can use the MP3 encoding:

        config = types.RecognitionConfig(
            encoding=enums.RecognitionConfig.AudioEncoding.MP3,
            sample_rate_hertz=16000,
            language_code='en-US')
        streaming_config = types.StreamingRecognitionConfig(config=config)
    

    The full code is just the base example with the changes above.
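
    The rest of the base example reads the audio file and sends its bytes to the API as streaming requests. Below is a minimal sketch of just the file-reading part, in case you want to send the audio in smaller pieces. The helper name read_chunks and the 32 KiB chunk size are my own assumptions, not something taken from the tutorial, and in-memory bytes stand in for a real MP3 file:

```python
import io


def read_chunks(stream, chunk_size=32 * 1024):
    """Yield successive byte chunks from a binary file-like object.

    Hypothetical helper: the name and the default chunk size are
    illustrative assumptions, not part of google-cloud-speech.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        yield chunk


# In-memory bytes stand in for the contents of an MP3 file.
fake_audio = io.BytesIO(b"\x00" * 70_000)
sizes = [len(chunk) for chunk in read_chunks(fake_audio)]
print(sizes)  # [32768, 32768, 4464]
```

    With a real file, open(stream_file, 'rb') would take the place of the BytesIO object. In the library version used above, each chunk would then be wrapped in types.StreamingRecognizeRequest(audio_content=chunk) and the resulting requests passed to client.streaming_recognize(streaming_config, requests).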

    Tested with an MP3 sample:

    $ python mp3.py sample.mp3
    Finished: True
    Stability: 0.0
    Confidence: 0.9875912666320801
    Transcript: I'm sorry Dave I'm afraid I can't do that
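
    As for the question in the title: the pip package ships several API versions side by side, and which one you are "running" depends on the module you import. If you want to see which of them exposes MP3 in your installation, a small check along these lines may help. This is only a sketch: has_mp3_encoding is a hypothetical helper, and it assumes the enum lives either in an enums submodule (as in the imports above) or directly on the versioned module:

```python
import importlib


def has_mp3_encoding(module_name):
    """Return True if `module_name` imports and exposes an MP3 audio encoding.

    Hypothetical helper, not part of google-cloud-speech.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # the module (or the whole package) is not installed
    # Assumption: the enum is reachable as <module>.enums.RecognitionConfig
    # or as <module>.RecognitionConfig, depending on the library release.
    for holder in (getattr(module, "enums", None), module):
        config = getattr(holder, "RecognitionConfig", None)
        encoding = getattr(config, "AudioEncoding", None)
        if encoding is not None and hasattr(encoding, "MP3"):
            return True
    return False


for name in ("google.cloud.speech_v1", "google.cloud.speech_v1p1beta1"):
    print(name, "->", has_mp3_encoding(name))
```

    The result depends on the release you have installed, so no expected output is shown here. A False for a module can mean either that it is not installed or that it has no MP3 member.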