aws-lambda amazon-ecs aws-step-functions

How to handle the taskfailedtostart error from an ECS task in Step Functions


I have a Step Functions state machine that runs an ECS task.

All of its states are monitored, and their errors are handled with Catch blocks.

However, a taskfailedtostart error occurs when the ECS task has no IP address available to run its container. I have fixed the IP issue, but I still can't handle the taskfailedtostart error with a Step Functions Catch block.

Any suggestions to handle the error?


Solution

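  • When the Catch fires, the failure details arrive as a JSON-encoded string in $.Cause. Parse that string with the States.StringToJson intrinsic function in a Pass state, and a Choice state can then decide whether to retry or fail based on the failure type.

    The example below reruns the task when StoppedReason starts with "ResourceInitializationError: " and sends every other failure to the Fail state. The loop has no retry counter, so only the 900-second TimeoutSeconds on the state machine bounds it.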

    {
      "Version": "1.0",
      "Comment": "Run AWS Fargate task",
      "TimeoutSeconds": 900,
      "StartAt": "Run Fargate Task",
      "States": {
        "Run Fargate Task": {
          "Type": "Task",
          "Resource": "arn:aws:states:::ecs:runTask.sync",
          "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": "<Cluster-ARN>",
            "TaskDefinition": "<TaskDef-ARN>",
            "Group.$": "$$.Execution.Name",
            "NetworkConfiguration": {
              "AwsvpcConfiguration": {
                "Subnets": [
                  "<Subnet-1>",
                  "<Subnet-2>",
                  "<Subnet-3>"
                ],
                "AssignPublicIp": "ENABLED",
                "SecurityGroups": [
                  "<SecurityGroup-ID>"
                ]
              }
            },
            "Overrides": {
              "ContainerOverrides": [
                {
                  "Name": "<Container-Name>",
                  "Environment": [
                    {
                      "Name": "<Environment-Variable-Name-To-Override>",
                      "Value": "<Environment-Variable-Value-To-Override>"
                    }
                  ]
                }
              ]
            }
          },
          "End": true,
          "Catch": [
            {
              "ErrorEquals": [
                "States.TaskFailed"
              ],
              "Next": "Cause to Json"
            }
          ]
        },
        "Cause to Json": {
          "Type": "Pass",
          "Parameters": {
            "Cause.$": "States.StringToJson($.Cause)"
          },
          "Next": "Retry or Finish"
        },
        "Retry or Finish": {
          "Type": "Choice",
          "Choices": [
            {
              "Variable": "$.Cause.StoppedReason",
              "StringMatches": "ResourceInitializationError: *",
              "Next": "Run Fargate Task"
            }
          ],
          "Default": "Fail"
        },
        "Fail": {
          "Type": "Fail"
        }
      }
    }
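
To check the branching logic locally before deploying, the two states that follow the Catch can be approximated in Python. This is a minimal sketch and not part of the original answer: the function names string_matches and next_state are hypothetical, the sample Cause payload is made up, and string_matches only approximates the Choice state's StringMatches operator (it ignores escaped wildcards).

```python
import json
import re


def string_matches(value: str, pattern: str) -> bool:
    """Approximate StringMatches: '*' matches any run of characters.

    Simplification (assumption): escaped wildcards are not handled.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def next_state(catch_output: dict) -> str:
    """Mirror "Cause to Json" followed by "Retry or Finish"."""
    # Pass state: States.StringToJson($.Cause)
    cause = json.loads(catch_output["Cause"])

    # Choice state: branch on $.Cause.StoppedReason.
    # Assumption: a missing StoppedReason is treated as "no match" here.
    stopped_reason = cause.get("StoppedReason")
    if isinstance(stopped_reason, str) and string_matches(
        stopped_reason, "ResourceInitializationError: *"
    ):
        return "Run Fargate Task"
    return "Fail"


if __name__ == "__main__":
    # Hypothetical Catch output; a real Cause carries many more fields.
    sample = {
        "Error": "States.TaskFailed",
        "Cause": json.dumps(
            {"StoppedReason": "ResourceInitializationError: example failure"}
        ),
    }
    print(next_state(sample))  # Run Fargate Task
```

In this sketch, json.loads raises if Cause is not valid JSON, so it is worth confirming that every error routed to "Cause to Json" really carries a JSON-encoded Cause.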