Tags: amazon-web-services, amazon-s3, aws-lambda, aws-step-functions

Starting a new execution of Step Function after exceeding 25,000 events, when iterating through objects in an S3 bucket


I am iterating through an S3 bucket to process the files. My solution is based on this example:

https://rubenjgarcia.es/step-function-to-iterate-s3/

The iteration works fine, but unfortunately I exceed the 25,000 events allowed per execution, so it eventually fails. I know I have to start a new execution of the step function, but I'm unclear how to tell it where I am in the current iteration. I have the count of how many files have been processed and, obviously, the ContinuationToken. Can I use the ContinuationToken to keep track of where I am in iterating through the S3 bucket, and the count to tell it when to start a new execution?

I have read the AWS docs https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-continue-new.html but I am not sure how to start applying this to my own solution. Has anyone done this when iterating through objects in an S3 bucket? If so, can you point me in the right direction?


Solution

  • I can think of two options:

    1. In your solution you iterate as long as there is a next ContinuationToken. You can extend that: add a counter and increment it on each iteration, then change your condition to keep iterating only while there is a next token and the count is below a threshold. Once the threshold is reached, start a new execution, passing the current ContinuationToken (and a reset count) as its input.
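
       To make the first option concrete, here is a minimal sketch of the decision logic the iterator Lambda could return, which a Choice state would then branch on. The field names (`continuationToken`, `count`, `continue`, `restart`) and the threshold value are assumptions, not taken from the linked example; the actual restart would be done by a state or Lambda calling `start_execution` with the returned token.

       ```python
       def next_iterator_state(state, threshold=10000):
           """Decide whether the loop should keep iterating in this
           execution or hand off to a fresh one.

           `state` mirrors the iterator Lambda's output: the S3
           ContinuationToken plus a running count of processed files.
           (Field names are assumptions for illustration.)
           """
           count = state.get("count", 0)
           token = state.get("continuationToken")

           if token is None:
               # No more objects to list: the iteration is finished.
               return {**state, "continue": False, "restart": False}

           if count >= threshold:
               # Objects remain, but this execution is approaching the
               # 25,000-event history limit: signal a Choice state to
               # start a new execution, carrying the token forward and
               # resetting the count for the fresh execution.
               return {
                   "continuationToken": token,
                   "count": 0,
                   "continue": False,
                   "restart": True,
               }

           # Stay in the current execution and keep looping.
           return {**state, "continue": True, "restart": False}
       ```

       A Choice state would then route `"restart": true` to a task that starts a new execution of the same state machine with this output as input, so the new execution resumes listing from the saved ContinuationToken.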

    2. I prefer to use a nested state machine to overcome the 25,000-event limitation. Say you read 100 items from S3 at a time. If you pass that list to a nested state machine that processes it, the top-level state machine stays well below 25,000 events, and so does each nested execution, since the child's events are recorded in its own history.
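
       For the nested option, the parent state machine can invoke the child synchronously via the Step Functions service integration. Below is a minimal ASL sketch; the state names, state machine ARN, and input fields are placeholders, not part of the linked example.

       ```json
       {
         "ProcessBatch": {
           "Type": "Task",
           "Resource": "arn:aws:states:::states:startExecution.sync:2",
           "Parameters": {
             "StateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:ProcessS3Batch",
             "Input": {
               "items.$": "$.items",
               "AWS_STEP_FUNCTIONS_STARTED_BY_EXECUTION_ID.$": "$$.Execution.Id"
             }
           },
           "Next": "CheckContinuationToken"
         }
       }
       ```

       The `.sync:2` suffix makes the parent wait for the child to finish and receive its output as JSON; the parent loop then only spends a handful of events per batch, regardless of how many states the child runs.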