
Google Speech API Returning Blank JSON Response


I want to use the Google Speech API V1 with Python.

So far I have it working with Google's example URI and get content back. When I modified the code to use my own recorded audio file, I still get a response from Google, but it contains no transcribed content.

I set up the request like this:

import base64
import json
import time
from datetime import datetime

# get_speech_service() builds the authenticated API client; it is defined
# elsewhere in the script and not shown here.

# Transcribe the given raw audio file asynchronously.
audio_file = 'audioFiles/test.raw'

with open(audio_file, 'rb') as speech:
    speech_content = base64.b64encode(speech.read())

service = get_speech_service()
service_request = service.speech().asyncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000,
            'languageCode': 'en-US',
        },
        'audio': {
            'content': speech_content.decode('utf-8', 'ignore')
        }
    })
response = service_request.execute()

print(json.dumps(response))

name = response['name']

service = get_speech_service()
service_request = service.operations().get(name=name)

while True:
    # Get the long-running operation with its response.
    response = service_request.execute()

    if response.get('done'):
        break

    # Give the server a minute to process before polling again.
    print('%s, waiting for results from job, %s' % (
        datetime.now().replace(second=0, microsecond=0), name))
    time.sleep(60)

print(json.dumps(response))

which gives me this output:

kayl@kayl-Surface-Pro-3:~/audioConversion$ python speechToText.py
{"name": "527788331906219767"}
2017-03-30 20:10:00, waiting for results from job, 527788331906219767
{"response": {"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"},"done": true, "name": "527788331906219767", "metadata": {"lastUpdateTime": "2017-03-31T03:11:16.391628Z", "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata", "startTime": "2017-03-31T03:10:52.351004Z", "progressPercent": 100}}

whereas I should be getting a response of the form:

{"response": {"@type":"type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse", "results":{...}}...
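
When nothing is recognized, the finished operation simply omits the `results` key, as in the output above. A minimal sketch of a helper that makes that case explicit, assuming the v1beta1 JSON shape in which `results` is a list of objects that each hold a list of `alternatives`. The function name `extract_transcripts` is hypothetical and not part of the API:

```python
def extract_transcripts(operation):
    """Return the top transcript of each result, or [] if none were returned."""
    # A finished operation carries its payload under "response"; when the
    # service recognized nothing, the "results" key is simply absent.
    results = operation.get('response', {}).get('results', [])
    transcripts = []
    for result in results:
        alternatives = result.get('alternatives', [])
        if alternatives:
            # Alternatives are ordered with the most likely one first.
            transcripts.append(alternatives[0].get('transcript', ''))
    return transcripts


# Same shape as the blank response shown in the question.
empty = {
    'done': True,
    'response': {
        '@type': 'type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse',
    },
}
print(extract_transcripts(empty))  # prints []
```

An empty list here means the request itself succeeded and the service found no speech it could transcribe, which points at the audio rather than at the polling code.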

The raw audio file is:

  • 16000 Hz sample rate (I tried 41000 Hz as well)
  • 16 bit Little Endian
  • Signed
  • 65 seconds long

To record this audio I run:

arecord -f cd -d 65 -r 16000 -t raw test.raw
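
One thing worth checking here: according to the arecord man page, `-f cd` is shorthand for 16 bit little endian, 44100 Hz, stereo, so the command above probably still records two channels even though `-r 16000` overrides the rate. As far as I know, the Speech API expects single-channel audio for LINEAR16, so a stereo file could plausibly explain an empty result. A rough sketch for inferring the channel count from the file size, assuming headerless 16-bit PCM and a known duration (all helper names are hypothetical):

```python
import os

BYTES_PER_SAMPLE = 2  # 16-bit PCM


def expected_size(seconds, sample_rate, channels):
    """Size in bytes of headerless 16-bit PCM audio."""
    return seconds * sample_rate * channels * BYTES_PER_SAMPLE


def guess_channels(size_bytes, seconds, sample_rate):
    """Estimate the channel count from a raw file's size."""
    per_channel = expected_size(seconds, sample_rate, 1)
    return int(round(size_bytes / float(per_channel)))


def check_raw_file(path, seconds=65, sample_rate=16000):
    """Print and return the channel count implied by the size of `path`."""
    channels = guess_channels(os.path.getsize(path), seconds, sample_rate)
    print('%s looks like %d channel(s)' % (path, channels))
    return channels


print(expected_size(65, 16000, 1))  # mono:   2080000 bytes
print(expected_size(65, 16000, 2))  # stereo: 4160000 bytes
```

If `check_raw_file('audioFiles/test.raw')` reports 2 channels, re-recording with an explicit mono format, for example `arecord -f S16_LE -c 1 -r 16000 -d 65 -t raw test.raw`, may be worth a try.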

Any advice that points me in the right direction would be much appreciated.


Solution

  • Your example is basically the same as this sample, which works for me with the test audio files.

    Does your code work with the test sample, audio.raw? If so, it's most likely an encoding issue. I've had the most success with FLAC files and with recording audio as recommended in the best practices. I have also used Audacity in the past to take some of the guesswork out of recording.

    On Mac OS X, the following shell command worked for recording 65 seconds of audio:

      rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 65
    

    I'm then using the following code to transcribe the audio:

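    import io
    import time

    from google.cloud import speech


    # The snippet is a function body (note the bare return below); the
    # function name used here is illustrative.
    def transcribe_file(speech_file):
        """Transcribe the given LINEAR16 audio file asynchronously."""
        speech_client = speech.Client()

        with io.open(speech_file, 'rb') as audio_file:
            content = audio_file.read()
            audio_sample = speech_client.sample(
                content,
                source_uri=None,
                encoding='LINEAR16',
                sample_rate=44100)  # must match the rate used when recording

        operation = speech_client.speech_api.async_recognize(audio_sample)

        # Poll every 2 seconds, up to 100 times, until the operation completes.
        retry_count = 100
        while retry_count > 0 and not operation.complete:
            retry_count -= 1
            time.sleep(2)
            operation.poll()

        if not operation.complete:
            print('Operation not complete and retry limit reached.')
            return

        alternatives = operation.results
        for alternative in alternatives:
            print('Transcript: {}'.format(alternative.transcript))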
    

    Note that my example uses the newer client library, which makes the API easier to access. I started from this sample code.
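
Since a mismatch between the file and the declared encoding is the likely culprit, it can help to confirm what a WAV file actually contains before sending it. A minimal sketch using Python's standard `wave` module; the helper name `describe_wav` is hypothetical, and it only applies to WAV files such as the one produced by the `rec` command, not to headerless raw audio:

```python
import wave


def describe_wav(path):
    """Return the WAV properties that the recognition settings must match."""
    reader = wave.open(path, 'rb')
    try:
        rate = reader.getframerate()
        return {
            'channels': reader.getnchannels(),
            'bits': reader.getsampwidth() * 8,  # sample width is in bytes
            'sample_rate': rate,
            'seconds': reader.getnframes() / float(rate),
        }
    finally:
        reader.close()
```

For a file recorded with the `rec` command above, I would expect `describe_wav('audio.wav')` to report 1 channel, 16 bits and a sample rate of 44100, which is the value then passed as `sample_rate`.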