aws-step-functions

How to trigger the next Step Function run from the previous run of the same state machine


We have a Step Function that is scheduled to run daily. Currently, the runs are independent of each other: a scheduled run starts even if the previous run failed. But if today's run fails, the next day's run will also fail, because we publish datasets at the end of each run that the next day's run requires. Suppose my Step Function has been failing for the past 4 days because of corrupt input. Once the input is fixed, I want to run the Step Function for each of those 4 days. Is there a way to trigger only the first day's run and have the following ones triggered automatically, one after the other, up to the current date?

To make each run depend on the previous one, I'll introduce a Lambda that writes the execution result to a DynamoDB table; the next scheduled run will check this result and start accordingly. For the problem described above, I found posts about how to trigger one state machine from another, but nothing on how to run multiple executions one by one automatically.


Solution

  • To "catch up" on previous days' runs, you'd want to execute the Step Function serially for a given date D, then D+1, D+2, and so on until the current date. There are several ways to approach this problem. Here are a few:

    1. Create a new Step Function that invokes your Step Function. Starting a Standard Workflow execution with StartExecution is asynchronous (the call returns before the run finishes), so you'll need to add an increment date (Pass) -> Continue? (Choice) -> Wait -> Execute State Machine (Task) loop to run the executions serially. This example from the docs may be helpful.
    2. You could also add the loop to the end of your current State Machine.
    3. If this is a one-off task, it's easiest just to invoke the executions manually.
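
For a one-off catch-up, the same serial loop can also be driven from a script instead of a wrapper state machine. The following is a minimal sketch, not a definitive implementation. It assumes the state machine reads the day to process from a `runDate` field in its input (a hypothetical input shape), and that `sfn` is a boto3 Step Functions client or any object with the same `start_execution` and `describe_execution` methods. The function names `dates_to_run` and `catch_up` are made up for illustration.

```python
import json
import time
from datetime import date, timedelta


def dates_to_run(first_day, last_day):
    """Return every date from first_day through last_day, inclusive."""
    days = (last_day - first_day).days
    return [first_day + timedelta(days=n) for n in range(days + 1)]


def catch_up(sfn, state_machine_arn, first_day, last_day, poll_seconds=30):
    """Run one execution per day, in order, stopping at the first failure.

    Returns the list of dates whose execution succeeded.
    """
    succeeded = []
    for day in dates_to_run(first_day, last_day):
        started = sfn.start_execution(
            stateMachineArn=state_machine_arn,
            # Assumption: the state machine takes its date from "runDate".
            input=json.dumps({"runDate": day.isoformat()}),
        )
        # StartExecution returns immediately, so poll until the run ends.
        while True:
            status = sfn.describe_execution(
                executionArn=started["executionArn"]
            )["status"]
            if status != "RUNNING":
                break
            time.sleep(poll_seconds)
        if status != "SUCCEEDED":
            # Later days depend on this day's datasets, so stop here.
            break
        succeeded.append(day)
    return succeeded
```

With boto3 installed, this might be called as `catch_up(boto3.client("stepfunctions"), arn, date(2024, 1, 1), date.today())`, where `arn` is your state machine's ARN. Stopping at the first unsuccessful run mirrors the dependency described in the question: each day needs the datasets published by the day before.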

    Some considerations:

    • You'll need to decide whether to include logic to check for the presence of the day's data set.
    • Instead of maintaining separate state in a DynamoDB table, as in the question, you could use the ListExecutions API to query for failed runs.
    • An alternative recovery approach would be to respond to execution failure events as they happen. An EventBridge rule would trigger some logic (e.g. a Lambda or a Step Function target) to rerun the execution.
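
The last two considerations can be sketched in Python as well. This is an illustration under stated assumptions rather than a complete recovery mechanism. Here `sfn` is assumed to be a boto3 Step Functions client (or an object with the same `list_executions` method), and the helper names are hypothetical. Note that an execution's start time is not necessarily the data date it was processing, and that a failed run may already have been rerun successfully, so treat the result as a starting point only.

```python
import json


def failed_executions(sfn, state_machine_arn):
    """Yield the summary of every FAILED execution, following pagination."""
    kwargs = {"stateMachineArn": state_machine_arn, "statusFilter": "FAILED"}
    while True:
        page = sfn.list_executions(**kwargs)
        yield from page["executions"]
        token = page.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token


def earliest_failure(sfn, state_machine_arn):
    """Return the start time of the oldest failed execution, or None."""
    starts = [e["startDate"] for e in failed_executions(sfn, state_machine_arn)]
    return min(starts) if starts else None


def failure_event_pattern(state_machine_arn):
    """Build an EventBridge event pattern for unsuccessful runs of one state machine."""
    return json.dumps({
        "source": ["aws.states"],
        "detail-type": ["Step Functions Execution Status Change"],
        "detail": {
            "status": ["FAILED", "TIMED_OUT", "ABORTED"],
            "stateMachineArn": [state_machine_arn],
        },
    })
```

The string returned by `failure_event_pattern` could be used as the event pattern of an EventBridge rule whose target is the Lambda or Step Function that handles the rerun.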