Search code examples
pythonamazon-web-servicesamazon-echo

How can the Amazon Echo catch errors?


I am building an application for the Amazon Echo in python. When I speak a bad utterance that the Amazon Echo does not recognize, my skill quits and returns me to the home screen. I am looking to prevent this and repeat what was just uttered by the Amazon Echo.

To try to achieve this to some extent I try calling a function to say something when the session ends or bad input is detected.

def on_session_ended(session_ended_request, session):
    """
    Called when the user ends the session.
    Is not called when the skill returns should_end_session=true
    """
    print("on_session_ended requestId=" + session_ended_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    return get_session_end_response()

However, I just get an error from the Echo -- this function, on_session_ended is never entered.

So how do I conduct error catching and handling on the Amazon Echo?

UPDATE 1: I reduced the number of utterances and the number of intents with custom slots to one. Now a user should only speak A, B, C, or D. If they speak anything outside of this, then the intent is still triggered but with no slot value. Thus, I can do some error checking based on whether the slot value is there or not. However, this seems like not the best way to do it. When I try to add in intents with no slots and a corresponding utterance, anything that doesn't match either of my intents defaults to this new intent. How can I resolve these issues?

UPDATE 2: Here are some relevant sections of my code.

Intent handlers:

def lambda_handler(event, context):
    print("Python START -------------------------------")
    print("event.session.application.applicationId=" +
          event['session']['application']['applicationId'])

    if event['session']['new']:
        on_session_started({'requestId': event['request']['requestId']},
                           event['session'])

    if event['request']['type'] == "LaunchRequest":
        return on_launch(event['request'], event['session'])
    elif event['request']['type'] == "IntentRequest":
        return on_intent(event['request'], event['session'])
    elif event['request']['type'] == "SessionEndedRequest":
        return on_session_ended(event['request'], event['session'])


def on_session_started(session_started_request, session):
    print("on_session_started requestId=" + session_started_request['requestId']
          + ", sessionId=" + session['sessionId'])


def on_launch(launch_request, session):
    """ Called when the user launches the skill without specifying what they want """
    print("on_launch requestId=" + launch_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    # Dispatch to your skill's launch
    return create_new_user()


def on_intent(intent_request, session):
    """ Called when the user specifies an intent for this skill """

    print("on_intent requestId=" + intent_request['requestId'] +
          ", sessionId=" + session['sessionId'])

    intent = intent_request['intent']
    intent_name = intent['name']
    attributes = session["attributes"] if 'attributes' in session else None
    intent_slots = intent['slots'] if 'slots' in intent else None

    # Dispatch to skill's intent handlers

    # TODO : Authenticate users
    #   TODO : Start session in a different spot depending on where user left off

    if intent_name == "StartQuizIntent":
        return create_new_user()

    elif intent_name == "AnswerIntent":
        return get_answer_response(intent_slots, attributes)

    elif intent_name == "TestAudioIntent":
        return get_audio_response()

    elif intent_name == "AMAZON.HelpIntent":
        return get_help_response()

    elif intent_name == "AMAZON.CancelIntent":
        return get_session_end_response()

    elif intent_name == "AMAZON.StopIntent":
        return get_session_end_response()

    else:
        return get_session_end_response()


def on_session_ended(session_ended_request, session):
    """
    Called when the user ends the session.
    Is not called when the skill returns should_end_session=true
    """
    print("on_session_ended requestId=" + session_ended_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    return get_session_end_response()

Then we have the functions that actually get called and the response builders. I have edited some of the code for privacy. I haven't built up all the display response text fields and have some uids hard coded so I don't have to worry about authentication yet.

# --------------- Functions that control the skill's behavior ------------------

####### GLOBAL SETTINGS ########
utility_background_image = "https://i.imgur.com/XXXX.png"


def get_welcome_response():
    """ Returns the welcome message if a user invokes the skill without specifying an intent """
    session_attributes = {}
    card_title = ""
    speech_output = ("Hello and welcome ... quiz .... blah blah ...")
    reprompt_text = "Ask me to start and we will begin the test!"
    should_end_session = False

    # visual responses
    primary_text = ''  # TODO
    secondary_text = ''  # TODO

    return build_response(session_attributes,
                          build_speechlet_response(card_title, speech_output, reprompt_text,
                                                   should_end_session,
                                                   build_display_response(utility_background_image,
                                                                          card_title, primary_text,
                                                                          secondary_text)))


def get_session_end_response():
    """ Returns the ending message if a user errs or exits the skill """
    session_attributes = {}
    card_title = ""
    speech_output = "Thank you for your time!"
    reprompt_text = ''
    should_end_session = True

    # visual responses
    primary_text = ''  # TODO
    secondary_text = ''  # TODO

    return build_response(session_attributes,
                          build_speechlet_response(card_title, speech_output, reprompt_text,
                                                   should_end_session,
                                                   build_display_response(utility_background_image,
                                                                          card_title, primary_text,
                                                                          secondary_text)))


def get_audio_response():
    """ Tests the audio capabilities of the echo """
    session_attributes = {}
    card_title = ""  # TODO : keep no 'welcome'?
    speech_output = ""
    reprompt_text = ""
    should_end_session = False

    # visual responses
    primary_text = ''  # TODO
    secondary_text = ''  # TODO

    return build_response(session_attributes,
                          build_speechlet_response(card_title, speech_output, reprompt_text,
                                                   should_end_session, build_audio_response()))


def create_new_user():
    """ Creates a new user that the server will recognize and whose action will be stored in db """
    url = "http://XXXXXX:XXXX/create_user"
    response = urllib.request.urlopen(url)
    data = json.loads(response.read().decode('utf8'))
    uuid = data["uuid"]
    return ask_question(uuid)


def query_server(uuid):
    """ Requests to get num_questions number of questions from the server """
    url = "http://XXXXXXXX:XXXX/get_question_json?uuid=%s" % (uuid)  # TODO : change needs to to be uuid
    response = urllib.request.urlopen(url)
    data = json.loads(response.read().decode('utf8'))

    if data["status"]:
        question = data["data"]["question"]
        quid = data["data"]["quid"]
        next_quid = data["data"]["next_quid"]  # TODO : will we need any of this?
        topic = data["data"]["topic"]
        type = data["data"]["type"]
        media_type = data["data"]["media_type"]  # either 'IMAGE', 'AUDIO', or 'VIDEO'
        answers = data["data"]["answer"]  # list of answers stored in order they should be spoken
        images = data["data"]["image"]  # list of images that correspond to order of answers list
        audio = data["data"]["audio"]
        video = data["data"]["video"]

        question_data = {"status": True, "data":{"question": question, "quid": quid, "answers": answers,
                         "media_type": media_type, "images": images, "audio": audio, "video": video}}
        if next_quid is "None":
            return None
        return question_data
    else:
        return {"status": False}


def ask_question(uuid):
    """ Returns a quiz question to the user since they specified a QuizIntent """
    question_data = query_server(uuid)

    if question_data is None:
        return get_session_end_response()

    card_title = "Ask Question"
    speech_output = ""
    session_attributes = {}
    should_end_session = False
    reprompt_text = ""

    # visual responses
    display_title = ""
    primary_text = ""
    secondary_text = ""

    images = []
    answers = []

    if question_data["status"]:
        session_attributes = {
            "quid": question_data["data"]["quid"],
            "uuid": "df876c9d-cd41-4b9f-a3b9-3ccd1b441f24",
            "question_start_time": time.time()
        }

        question = question_data["data"]["question"]
        answers = question_data["data"]["answers"]  # answers are shuffled when pulled from server
        images = question_data["data"]["images"]
        # TODO : consider different media types

        speech_output += question
        reprompt_text += ("Please choose an answer using the official NATO alphabet. For example," +
                          " A is alpha, B is bravo, and C is charlie.")

    else:
        speech_output += "Oops! This is embarrassing. There seems to be a problem with the server."
        reprompt_text += "I don't exactly know where to go from here. I suggest restarting this skill."

    return build_response(session_attributes, build_speechlet_response(card_title, speech_output,
            reprompt_text, should_end_session,
             build_display_response_list_template2(title=question, image_urls=images, answers=answers)))


def send_quiz_responses_to_server(uuid, quid, time_used_for_question, answer_given):
    """ Sends the users responses back to the server to be stored in the database """
    url = ("http://XXXXXXXX:XXXX/send_answers?uuid=%s&quid=%s&time=%s&answer_given=%s" %
          (uuid, quid, time_used_for_question, answer_given))
    response = urllib.request.urlopen(url)
    data = json.loads(response.read().decode('utf8'))
    return data["status"]


def get_answer_response(slots, attributes):
    """ Returns a correct/incorrect message to the user depending on their AnswerIntent """

    # get time, quid, and uuid from attributes
    question_start_time = attributes["question_start_time"]
    quid = attributes["quid"]
    uuid = attributes["uuid"]

    # get answer from slots
    try:
        answer_given = slots["Answer"]["value"].lower()
    except KeyError:
        return get_session_end_response()

    # calculate a rough estimate of the time it took to answer question
    time_used_for_question = str(int(time.time() - question_start_time))

    # record response data by sending it to the server
    send_quiz_responses_to_server(uuid, quid, time_used_for_question, answer_given)

    return ask_question(uuid)


def get_help_response():
    """ Returns a help message to the user since they called AMAZON.HelpIntent """
    session_attributes = {}
    card_title = ""
    speech_output = "" # TODO
    reprompt_text = "" # TODO
    should_end_session = False

    return build_response(session_attributes,
            build_speechlet_response(card_title, speech_output, reprompt_text, should_end_session,
             build_display_response(utility_background_image, card_title)))


# --------------- Helpers that build all of the responses ----------------------


def build_hint_response(hint):
    """
    Builds the hint response for a display.

    For example, Try "Alexa, play number 1" where "play number 1" is the hint.
    """
    return {
        "type": "Hint",
        "hint": {
            "type": "RichText",
            "text": hint
        }
    }


def build_display_response(url='', title='', primary_text='', secondary_text='', tertiary_text=''):
    """
    Builds the display template for the echo show to display.

    Echo show screen is 1024px x 600px

    For additional image size requirements, see the display interface reference.
    """
    return [{
        "type": "Display.RenderTemplate",
        "template": {
            "type": "BodyTemplate1",
            "token": "question",
            "title": title,
            "backgroundImage": {
                "contentDescription": "Question",
                "sources": [
                    {
                        "url": url
                    }
                ]
            },
            "textContent": {
                "primaryText": {
                    "type": "RichText",
                    "text": primary_text
                },
                "secondaryText": {
                    "type": "RichText",
                    "text": secondary_text
                },
                "tertiaryText": {
                    "type": "RichText",
                    "text": tertiary_text
                }
            }
        }
    }]


def build_list_item(url='', primary_text='', secondary_text='', tertiary_text=''):
    return {
        "token": "question_item",
        "image": {
            "sources": [
                {
                    "url": url
                }
            ],
            "contentDescription": "Question Image"
        },
        "textContent": {
            "primaryText": {
                "type": "RichText",
                "text": primary_text
            },
            "secondaryText": {
                "text": secondary_text,
                "type": "PlainText"
            },
            "tertiaryText": {
                "text": tertiary_text,
                "type": "PlainText"
            }
        }
    }


def build_display_response_list_template2(title='', image_urls=[], answers=[]):
    list_items = []
    for image, answer in zip(image_urls, answers):
        list_items.append(build_list_item(url=image, primary_text=answer))

    return [{
        "type": "Display.RenderTemplate",
        "template": {
            "type": "ListTemplate2",
            "token": "question",
            "title": title,
            "backgroundImage": {
                "contentDescription": "Question Background",
                "sources": [
                    {
                        "url": "https://i.imgur.com/HkaPLrK.png"
                    }
                ]
            },
            "listItems": list_items
        }
    }]


def build_audio_response(url): # TODO add a display repsonse here as well
    """ Builds audio response. I.e. plays back an audio file with zero offset """
    return [{
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
            "stream": {
                "token": "audio_clip",
                "url": url,
                "offsetInMilliseconds": 0
            }
        }
    }]


def build_speechlet_response(title, output, reprompt_text, should_end_session, directive=None):
    """ Builds speechlet response and puts display response inside """
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': title,
            'content': output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session,
        'directives': directive
    }


def build_response(session_attributes, speechlet_response):
    """ Builds the complete response to send back to Alexa """
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }

UPDATE 3: I updated the intents so there is now one custom intent that takes a custom slot, and then I have another custom intent that takes no slots. These custom intents also have there own sample utterances. Both the intents and their utterances are listed below. When I start the skill, it works fine. Then when I say/type "zoo zoo zoo" to test bad input, I get an error. Both the request for "zoo zoo zoo" and the response are listed below. I am looking for a good way to catch this bad input error and resume/revert the skill back to its previous state.

Intents:

...
{
      "intent": "TestAudioIntent"
},
{
      "slots": [
        {
          "name": "Answer",
          "type": "LETTER"
        }
      ],
      "intent": "AnswerIntent"
},
...

Sample Utterances:

AnswerIntent {Answer}
AnswerIntent I think it is {Answer}
TestAudioIntent test the audio

Example JSON request:

{
  "session": {
    "new": false,
    "sessionId": "SessionId.574f0b74-be17-4f79-bbd6-ce926a1bf856",
    "application": {
      "applicationId": "XXXXXXXX"
    },
    "attributes": {
      "quid": "7fa9fcbf-35db-4bbd-ac73-37977bcef563",
      "question_start_time": 1515691612.7381804,
      "uuid": "df876c9d-cd41-4b9f-a3b9-3ccd1b441f24"
    },
    "user": {
      "userId": "XXXXXXXX"
    }
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "EdwRequestId.23765cb0-f327-4f52-a9a3-b9f92a375a5f",
    "intent": {
      "name": "TestAudioIntent",
      "slots": {}
    },
    "locale": "en-US",
    "timestamp": "2018-01-11T17:26:57Z"
  },
  "context": {
    "AudioPlayer": {
      "playerActivity": "IDLE"
    },
    "System": {
      "application": {
        "applicationId": "XXXXXXXX"
      },
      "user": {
        "userId": "XXXXXXXX"
      },
      "device": {
        "supportedInterfaces": {
          "Display": {
            "templateVersion": "1",
            "markupVersion": "1"
          }
        }
      }
    }
  },
  "version": "1.0"
}

And I get the following testing error as a response:

The remote endpoint could not be called, or the response it returned was invalid.

Solution

  • What I ended up doing is using something similar to Amazon's dialogue management system. If a user says something that doesn't fill a slot, I re-prompt them with that question. My goal is to record a user's statements/answers after each time they speak, thus I didn't use the built-in dialogue management. Additionally, I used Amazon's slot synonyms for all my slots to make my modal more robust.

    I still don't know that this is the best way, but it is a starting point and seems to work O.K....