
How to see why a long-running AWS Step Function failed


I have an AWS Step Function that goes through many state transitions and can run for half an hour or more.

There are only a few states, and the application loops through them until it runs out of items to process.

I have a run that failed after about half an hour. I can look at the log under "Execution event history". However, since it records every transition and state, there are thousands of events. I cannot load enough of them (by repeatedly clicking the "Load More" button) to reach the failure without hanging my browser window.

There is no way to sort or filter this list that I can see.

How can I find the cause of the failure? Is there a way to export the Execution event history somewhere? Or send it to CloudWatch?


Solution

  • Use the AWS CLI command aws stepfunctions get-execution-history (pass the execution's ARN with --execution-arn) together with the --reverse-order flag. This lists the most recent events first, which is where the failure will appear.
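
The same newest-first lookup can also be scripted, which avoids reading through pages of JSON by hand. The following is a minimal sketch, assuming Python with boto3, whose Step Functions client exposes get_execution_history with a reverseOrder parameter. The helper names find_failure_events and summarize, the suffix filter, and the page limit are illustrative choices, not part of the original answer.

```python
# Event types that usually indicate a problem, e.g. ExecutionFailed,
# TaskFailed, TaskTimedOut, ExecutionAborted. This suffix filter is an
# assumption made for illustration; adjust it to your state machine.
FAILURE_SUFFIXES = ("Failed", "TimedOut", "Aborted")


def find_failure_events(sfn_client, execution_arn, max_pages=5):
    """Return failure-like events, newest first, reading at most max_pages pages."""
    failures = []
    kwargs = {
        "executionArn": execution_arn,
        "reverseOrder": True,  # newest events first, like --reverse-order
        "maxResults": 100,
    }
    for _ in range(max_pages):
        page = sfn_client.get_execution_history(**kwargs)
        for event in page["events"]:
            if event["type"].endswith(FAILURE_SUFFIXES):
                failures.append(event)
        token = page.get("nextToken")
        if not token:
            break  # reached the start of the execution
        kwargs["nextToken"] = token
    return failures


def summarize(event):
    """Format one event as 'id type: error - cause'."""
    # Each event carries its details under a key ending in "EventDetails".
    details = next(
        (v for k, v in event.items() if k.endswith("EventDetails")), {}
    )
    return "{} {}: {} - {}".format(
        event["id"],
        event["type"],
        details.get("error", "?"),
        details.get("cause", ""),
    )
```

With boto3 installed and credentials configured, passing boto3.client("stepfunctions") and the execution ARN to find_failure_events, then printing summarize(event) for each result, should show the error and cause of the most recent failures without loading the full history.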