Retry in AWS Step Functions


I am trying to implement an infinite retry of a Lambda function through Step Functions:

{
  "Comment": "A description of my state machine",
  "StartAt": "Check Export Status",
  "States": {
    "Check Export Status": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "OutputPath": "$.Payload",
      "Parameters": {
        "Payload.$": "$",
        "FunctionName": "arn:aws:lambda:eu-west-1:xxxx:function:xxxx:$LATEST"
      },
      "Next": "Glue StartJobRun",
      "Retry": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "BackoffRate": 1,
          "IntervalSeconds": 60,
          "MaxAttempts": 0
        }
      ]
    },
    "Glue StartJobRun": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun",
      "ResultPath": "$.error",
      "Parameters": {
        "JobName": "glue job test"
      },
      "End": true
    }
  }
}

When the state machine starts executing, it runs the task once, fails, and exits rather than retrying indefinitely. What am I missing?


Solution

  • You cannot retry indefinitely. The documentation says:

    MaxAttempts (Optional)

    A positive integer that represents the maximum number of retry attempts (3 by default). If the error recurs more times than specified, retries cease and normal error handling resumes. A value of 0 specifies that the error or errors are never retried. MaxAttempts has a maximum value of 99999999.

    Here is the link for reference: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html#error-handling-retrying-after-an-error

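    In your state machine, "MaxAttempts": 0 disables retries entirely, which is why the task runs once and then fails. You can set it as high as 99999999, which is still quite a lot.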
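
Applied to the state machine in the question, the Retry block might look like the sketch below. Only MaxAttempts changes; 99999999 is the documented maximum from the passage quoted above, and the other fields are copied unchanged from the question.

```json
"Retry": [
  {
    "ErrorEquals": [
      "States.ALL"
    ],
    "BackoffRate": 1,
    "IntervalSeconds": 60,
    "MaxAttempts": 99999999
  }
]
```

With BackoffRate 1 and IntervalSeconds 60, the interval stays constant, so the task would be retried roughly once a minute for as long as the error persists.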