Send audio file to DialogFlow using Python


I know that I can send text to Dialogflow from Python like this:

import apiai

# CLIENT_ACCESS_TOKEN is the client access token of the agent
ai = apiai.ApiAI(CLIENT_ACCESS_TOKEN)
request = ai.text_request()
request.lang = 'de'  # optional, default value is 'en'
request.session_id = "<SESSION ID, UNIQUE FOR EACH USER>"
request.query = "Hello"
response = request.getresponse()
print(response.read())  # raw response body returned by the API

But I'm not sure whether I can send an audio file to Dialogflow the same way. Does anyone know how to do that?


Solution

  • There are two ways to use audio files in Actions on Google/Dialogflow responses: SSML with the <audio> tag, and Media responses. Both expect the audio file to be provided via an HTTPS URL. The file itself is usually stored in a cloud storage service such as Google Cloud Storage or Amazon S3.

    SSML (Speech Synthesis Markup Language) is a markup language for audio output, just as HTML is for visual output. It is supported by Actions on Google and can be used as a drop-in replacement for the normal text response. Instead of including the response text like this:

    {
        "speech": "This is the text that the user hears",
        ...
    }
    

    you would mark it up with SSML like this:

    {
        "speech": "<speak><audio src=\"https://some_cloud_storage.com/my_audio_file.ogg\"></audio></speak>",
        ...
    }
    

    Note that the <speak> tags must always surround the entire response so that Google knows it has to render the text as SSML (just like the <html> tag on websites). The <audio> tag can take several optional attributes; see the documentation for details.

    SSML has the benefit of being very easy to use for you as the developer. However, the audio files are limited to a length of 120 seconds and a file size of 5 MB, and the user gets no playback control.

    Media responses do not have these limits and are displayed as a card with an image and playback controls, but they currently work only on Google Home and Android devices.
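
The SSML variant described above could be generated from Python roughly as follows. This is only a minimal sketch: the helper name build_ssml_audio_response is made up for illustration, the "speech" field and the URL are taken from the example above, and the other fields of the response (the "..." above) are left out.

```python
import json
from xml.sax.saxutils import quoteattr


def build_ssml_audio_response(audio_url):
    """Return a dict whose "speech" field plays one audio file via SSML.

    Hypothetical helper for illustration; it only builds the "speech"
    field shown in the answer and no other response fields.
    """
    # Both SSML and Media responses expect an HTTPS URL.
    if not audio_url.startswith("https://"):
        raise ValueError("the audio file must be served via HTTPS")
    # quoteattr() adds the surrounding quotes and escapes &, < and >.
    ssml = "<speak><audio src={}></audio></speak>".format(quoteattr(audio_url))
    return {"speech": ssml}


response = build_ssml_audio_response(
    "https://some_cloud_storage.com/my_audio_file.ogg"
)
# json.dumps() escapes the inner double quotes so the result is valid JSON.
print(json.dumps(response, indent=4))
```

Building the dict first and serializing it with json.dumps avoids writing the escaped quotes around the src attribute by hand.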