
AWS SQS Lambda Processing n files at once


I have set up an SQS queue to which an S3 path is pushed whenever a file is uploaded.

I have also set up a Lambda with an SQS trigger and a batch size of 1.

In my scenario, I have to process n files at a time, say n = 10.

Say there are 100 messages in the queue. In my current implementation I do the following:

  1. Whenever there is a message in the input queue, the Lambda is triggered.
  2. First I check how many concurrent executions are active. If 10 are already running, the code simply returns without doing anything. If fewer than 10 are running, it reads one message from the queue and processes it.
  3. Once processing is done, the message is manually deleted from the queue.

With this approach, I'm able to process n files at a time. However, say 100 files land in S3 at the same time.

That leads to 100 Lambda invocations. Because of the condition check in the Lambda, the first 10 messages are processed and the remaining 90 messages become in-flight.

Now, when some of the processing finishes (say 3 of the 10), the main queue still appears empty because the remaining messages are still in flight.

As per my understanding, if processing a file takes x minutes, the visibility timeout of the messages in the queue should be less than x, so that the messages become available in the queue again.

But that leads to another problem. Say the batch takes longer to complete: the messages come back to the queue, the Lambda is triggered, and they go in-flight once again.

Is there any way I can control the number of Lambda invocations that are triggered? For example, only the first 10 messages should be processed while the remaining 90 stay visible in the queue. Or is there another way to make this design simpler?

I don't want to wait until 10 messages have accumulated. Even if there are only 5 messages, those files should be processed. And I don't want to invoke the Lambda on a schedule (e.g. every 5 minutes).


Solution

  • There is a setting in Lambda called Reserved Concurrency. To quote from the docs (emphasis mine):

    • Reserved concurrency – Reserved concurrency creates a pool of requests that can only be used by its function, and also prevents its function from using unreserved concurrency.

    [...]

    To ensure that a function can always reach a certain level of concurrency, configure the function with reserved concurrency. When a function has reserved concurrency, no other function can use that concurrency. Reserved concurrency also limits the maximum concurrency for the function, and applies to the function as a whole, including versions and aliases.

    For a deeper dive, check out this article from the documentation.

    You can use this to limit how many executions of the Lambda run in parallel. If no Lambda execution contexts are available, SQS invocations will wait.

    This is only necessary if you want to limit how many files can be processed in parallel. If there is no actual need to limit this, it won't cost you more to let Lambda scale out for you.
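As a rough illustration of the suggestion above, here is a minimal Python sketch, not a definitive implementation. It assumes boto3 is used to configure the function, that each SQS message body is simply the S3 path, and that `process_file` stands in for the real per-file work; the helper names are hypothetical. `put_function_concurrency` is the boto3 Lambda client call that sets reserved concurrency. With the cap in place, the handler no longer needs its own concurrency check, and it does not need to delete messages itself, because the SQS trigger removes a message when the handler returns without raising.

```python
def limit_concurrency(lambda_client, function_name, limit=10):
    """Set reserved concurrency so at most `limit` executions run at once.

    `lambda_client` is expected to be a boto3 Lambda client,
    e.g. boto3.client("lambda").
    """
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


def process_file(s3_path):
    # Hypothetical placeholder for the real per-file processing.
    print(f"processing {s3_path}")
    return s3_path


def handler(event, context):
    # With a batch size of 1 there is one record per invocation; looping
    # keeps the handler correct if the batch size is raised later.
    # Assumption: the message body is the plain S3 path.
    processed = [process_file(record["body"]) for record in event["Records"]]
    # Returning normally lets the SQS trigger delete the message(s).
    # Raising an exception instead makes them visible again once the
    # visibility timeout expires.
    return processed
```

The cap would be applied once, outside the handler, for example with `limit_concurrency(boto3.client("lambda"), "my-file-processor", 10)`, where `my-file-processor` is a made-up function name. The same value can also be set in the Lambda console.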