I am new to AWS Step Functions, and I am exploring it as an option for an ETL service.
Here is my understanding of AWS Lambda's burst concurrency:
The maximum immediate increase in function concurrency that can occur when your functions scale in response to a burst of traffic. After the initial burst, concurrency scales by 500 executions per minute up to your concurrency limit.
However, I can set the maximum number of concurrent child executions of a Step Functions Distributed Map state to 10,000.
If I make "Invoke Lambda function" one of the steps that does some processing in the ETL pipeline, can Lambda keep up with a burst of 10,000? Or is Step Functions aware of this caveat, so that it scales the Lambda function accordingly, i.e., adds 500 instances per minute?
PS: My Lambda concurrency limit is 50,000.
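To put the quoted scaling rule into numbers, here is a small Python sketch of the linear model it describes: an initial burst, then 500 more concurrent executions per minute, capped at the account limit. The initial burst quota varies by Region and is not stated in the quoted text, so `initial_burst` has to be supplied; 3,000 below is only an illustrative value. The function names are hypothetical.

```python
import math


def minutes_to_reach(target, initial_burst, per_minute=500, account_limit=50_000):
    """Whole minutes of scaling needed after the initial burst before
    `target` concurrent executions can run, under the quoted linear model."""
    goal = min(target, account_limit)
    if goal <= initial_burst:
        return 0
    return math.ceil((goal - initial_burst) / per_minute)


def concurrency_after(minutes, initial_burst, per_minute=500, account_limit=50_000):
    """Concurrency available `minutes` after the burst starts (same model)."""
    return min(initial_burst + per_minute * minutes, account_limit)


# Assumed initial burst of 3,000 (illustrative only; check your Region's quota).
print(minutes_to_reach(10_000, 3_000))  # 14
print(minutes_to_reach(3_500, 3_000))   # 1
```

Under these assumptions, a Map that fans out to 10,000 items at once would see throttling for roughly 14 minutes before Lambda could serve every child execution concurrently.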
Load testing:
I created a sample test case in which I set the maximum concurrent child executions of the Step Functions Map state to 3,500, but I got this error:
Rate Exceeded. (Service: AWSLambda; Status Code: 429; Error Code: TooManyRequestsException; Request ID: 1448d65e-e451-4e7e-ae88-9b8708eb9060; Proxy: null)
How can I work around this issue?
One option that comes to mind is a warm-up execution that scales up the Lambda function, but I would rather not go with that option.
You probably want to look at Lambda Provisioned Concurrency. This lets you specify the concurrency you require, and AWS Lambda will ensure it is ready. If you expect this workload to be steady state, you may be fine leaving it provisioned at that level. If you expect the workload to vary over time and you want to enable Provisioned Concurrency only at specific times, you can use Application Auto Scaling (this blog post provides instructions).
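A minimal sketch of the scheduled approach, assuming a hypothetical function `etl-processor` with an alias `live` (Provisioned Concurrency is configured on a published version or alias, not `$LATEST`). It only builds keyword arguments for the boto3 Application Auto Scaling calls `register_scalable_target` and `put_scheduled_action`; the action names, cron expressions and capacities are placeholders.

```python
# Hypothetical target: function "etl-processor", alias "live".
RESOURCE_ID = "function:etl-processor:live"
DIMENSION = "lambda:function:ProvisionedConcurrency"


def scaling_requests(low, high, up_cron, down_cron):
    """Build kwargs for Application Auto Scaling: register the alias as a
    scalable target, then schedule a scale-up and a scale-down."""
    base = {
        "ServiceNamespace": "lambda",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": DIMENSION,
    }
    register = {**base, "MinCapacity": low, "MaxCapacity": high}
    actions = [
        {
            **base,
            "ScheduledActionName": "etl-scale-up",  # hypothetical name
            "Schedule": up_cron,
            # Pin Provisioned Concurrency at `high` for the ETL window.
            "ScalableTargetAction": {"MinCapacity": high, "MaxCapacity": high},
        },
        {
            **base,
            "ScheduledActionName": "etl-scale-down",  # hypothetical name
            "Schedule": down_cron,
            # Drop back to `low` once the window is over.
            "ScalableTargetAction": {"MinCapacity": low, "MaxCapacity": low},
        },
    ]
    return {"register": register, "actions": actions}


def apply_requests(client, reqs):
    """Send the requests with a boto3 "application-autoscaling" client."""
    client.register_scalable_target(**reqs["register"])
    for action in reqs["actions"]:
        client.put_scheduled_action(**action)
```

With boto3 installed and credentials configured, something like `apply_requests(boto3.client("application-autoscaling"), scaling_requests(1, 3500, "cron(0 1 * * ? *)", "cron(0 3 * * ? *)"))` would register the target and both schedules. Keeping request construction separate from the client call makes the values easy to review before anything touches the account.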
Alternatively, you can introduce retries in your ItemProcessor. You can configure Step Functions to use exponential backoff in response to these throttling errors, which gives burst concurrency time to scale up automatically. You will want to include this even if you are using Provisioned Concurrency, because you want your workflow to be resilient. Depending on your workload, though, retries may be sufficient on their own and simplify your account configuration.
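As a sketch of the retry approach, the Python below builds an Amazon States Language definition (as a dict) for a Distributed Map whose item processor invokes Lambda through the `arn:aws:states:::lambda:invoke` integration and retries `Lambda.TooManyRequestsException` with exponential backoff. The function name, state names and retry numbers are illustrative assumptions, and the item source (`ItemReader` or `ItemsPath`) is left out. `retry_delays` shows the nominal wait before each retry; with `JitterStrategy` set to `FULL`, the actual waits are randomized up to those values.

```python
import json

# Illustrative retry settings for Lambda throttling (HTTP 429).
THROTTLE_RETRY = {
    "ErrorEquals": ["Lambda.TooManyRequestsException"],
    "IntervalSeconds": 2,    # wait before the first retry
    "BackoffRate": 2.0,      # each later wait is multiplied by this
    "MaxAttempts": 6,        # retries after the initial attempt
    "MaxDelaySeconds": 60,   # upper bound on any single wait
    "JitterStrategy": "FULL",
}


def state_machine(function_name, max_concurrency):
    """ASL definition: a Distributed Map whose item processor calls Lambda."""
    transform = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "Retry": [THROTTLE_RETRY],
        "End": True,
    }
    return {
        "StartAt": "ProcessItems",
        "States": {
            "ProcessItems": {
                "Type": "Map",
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
                    "StartAt": "Transform",
                    "States": {"Transform": transform},
                },
                "End": True,
            }
        },
    }


def retry_delays(interval, backoff, attempts, max_delay):
    """Nominal wait in seconds before each retry, before jitter is applied."""
    return [min(interval * backoff ** n, max_delay) for n in range(attempts)]


print(json.dumps(state_machine("etl-processor", 3500), indent=2))
print(retry_delays(2, 2.0, 6, 60))
```

With these numbers, a throttled item gets six retries spread over at most 2 + 4 + 8 + 16 + 32 + 60 = 122 seconds of nominal waiting. Whether that window is long enough depends on how quickly concurrency scales for your function; raising `MaxAttempts` or `MaxDelaySeconds` widens it.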