amazon-web-services, aws-lambda, aws-step-functions

SNS mail notification when a step is not kicked off within a threshold timeframe


I have an EMR step that is submitted through a Step Functions state machine. During the run I can see that the task is submitted, but the EMR step is not executed and the EMR console doesn't show any information about it.

  1. How can I debug this?
  2. How can I send an SNS notification when a step doesn't start executing within a threshold timeframe? In my case, Step Functions shows the EMR task as submitted, but there is no information in the EMR console, and the pipeline keeps running for more than half an hour without failing.

Solution

    1. You can start debugging from the Step Functions execution log to identify the specific step that failed, and then move on to the EMR console or whichever service failed. Usually, when the EMR step doesn't appear in the EMR console, it is due to a runtime error: an exception raised when calling the EMR step.

    2. For this scenario, you can use the error handling built into Step Functions, through the Catch and TimeoutSeconds fields; the AWS documentation on error handling has more details. Basically, you need to add these fields as shown below:

    {
        "StartAt": "EmrStep",
        "States": {
            "EmrStep": {
                "Type": "Task",
                "Resource": "arn:aws:emr:execute-X-step",
                "Comment": "This is your EMR step",
                "TimeoutSeconds": 10,
                "Catch": [ {
                    "ErrorEquals": ["States.Timeout"],
                    "Next": "ShutdownClusterAndSendSNS"
                } ],
                "End": true
            },
            "ShutdownClusterAndSendSNS": {
                "Type": "Pass",
                "Comment": "This step handles the timeout exception raised",
                "Result": "You can shutdown the EMR cluster to avoid increased cost here and later send a sns notification!",
                "End": true
            }
        }
    }
    

    Note: To catch the timeout exception, you have to catch the States.Timeout error, but you can also define the same Catch field for other types of errors.
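
    As a rough sketch of how the placeholder states above might look with real service integrations, the definition below uses the Step Functions optimized integrations for EMR (elasticmapreduce:addStep.sync) and SNS (sns:publish). The cluster ID input path, step name, jar arguments, S3 path, topic ARN, account ID, and the 1800-second timeout are all assumptions for illustration and are not taken from the question. Also note that TimeoutSeconds bounds the whole task, so with the .sync pattern it fires if the step has not finished in time, whether it never started or is simply still running.

```json
{
  "Comment": "Sketch: notify via SNS when the EMR step task times out (all names and ARNs are hypothetical)",
  "StartAt": "EmrStep",
  "States": {
    "EmrStep": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.ClusterId",
        "Step": {
          "Name": "example-step",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/job.py"]
          }
        }
      },
      "TimeoutSeconds": 1800,
      "Catch": [
        {
          "ErrorEquals": ["States.Timeout"],
          "ResultPath": "$.error",
          "Next": "NotifyTimeout"
        }
      ],
      "End": true
    },
    "NotifyTimeout": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:emr-step-alerts",
        "Subject": "EMR step timed out",
        "Message": "The EMR step did not finish within the configured timeout."
      },
      "Next": "TimedOut"
    },
    "TimedOut": {
      "Type": "Fail",
      "Error": "EmrStepTimeout",
      "Cause": "EMR step exceeded TimeoutSeconds"
    }
  }
}
```

    Ending in a Fail state after the notification keeps the execution marked as failed, which is usually what you want for alerting; the state machine's IAM role would also need permission to publish to the topic and to add steps to the cluster.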