Tags: java, amazon-web-services, aws-lambda, cloud, amazon-cloudwatch

Trigger Lambda by number of SQS messages


I have an SQS queue that will receive a huge number of messages, and the messages keep arriving continuously.

I have a use case where, once the number of messages in the queue reaches some threshold X (say 1,000), the system needs to trigger an event to process those 1,000 messages at a time.

The system would then emit a series of triggers, each covering 1,000 messages.

For example, if there are 2,300 messages in the queue, we expect 3 triggers to a Lambda function: the first 2 triggers each covering 1,000 messages, and the last one covering the remaining 300.

I've been researching and see that a CloudWatch Alarm can be hooked up to the SQS metric NumberOfMessagesReceived and publish to SNS, but I don't know how to configure a separate alarm for each chunk of 1,000 messages.

Please advise whether AWS supports this use case, or what customization could achieve it.



Solution

  • So after going through some clarifications in the comments section with the OP, here's my answer (combined with @ChrisPollard's comment):

    Achieving what you want with SQS alone is impossible, because each Lambda batch can contain at most 10 messages. Since you need to process 1,000 messages at once, this is definitely a no-go.

    @ChrisPollard suggested creating a new record in DynamoDB every time a new file is pushed to S3. This is a very good approach: increment the partition key by 1 on every insert and trigger a Lambda through DynamoDB Streams. In your function, check the partition key and, if it has reached a multiple of 1,000, run a query against your DynamoDB table for the last 1,000 updated items (you'll need a Global Secondary Index on your CreatedAt field). Map these items (or use projections) into a very minimal JSON that contains only the necessary information, something like:

    [
        {
         "key": "my-amazing-key",
         "bucket": "my-super-cool-bucket"
        },
        ...
    ]
    

    A JSON like this is only 87 bytes long (if you take the square brackets out of the game, because they won't be repeated, you're left with 83 bytes). Even if you round each entry up to 100 bytes, you can still send all 1,000 entries as a single SQS message, as that is only around 100 KB of data, comfortably below the 256 KB SQS message size limit.
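
    To make the stream-handler piece concrete, here is a rough Java sketch of the DynamoDB Streams-triggered function. The table name (files-table), the GSI name (CreatedAt-index), the gsiPk attribute, and the QUEUE_URL environment variable are assumptions for illustration, not part of the original answer.

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
    import software.amazon.awssdk.services.dynamodb.model.QueryResponse;
    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

    import java.util.Map;
    import java.util.stream.Collectors;

    public class BatchTriggerHandler implements RequestHandler<DynamodbEvent, Void> {

        private static final DynamoDbClient dynamo = DynamoDbClient.create();
        private static final SqsClient sqs = SqsClient.create();
        private static final String QUEUE_URL = System.getenv("QUEUE_URL"); // assumed env var

        @Override
        public Void handleRequest(DynamodbEvent event, Context context) {
            for (DynamodbEvent.DynamodbStreamRecord streamRecord : event.getRecords()) {
                // React only to new file items, not to updates of the counter item.
                if (!"INSERT".equals(streamRecord.getEventName())) {
                    continue;
                }
                long batchId = Long.parseLong(
                        streamRecord.getDynamodb().getNewImage().get("batchId").getN());

                // Fire once every 1,000 inserts (the batchId % 1000 check from the answer).
                if (batchId % 1000 != 0) {
                    continue;
                }

                // Read the last 1,000 items through the GSI on CreatedAt
                // (key names here are assumptions about the table design).
                QueryResponse page = dynamo.query(QueryRequest.builder()
                        .tableName("files-table")
                        .indexName("CreatedAt-index")
                        .keyConditionExpression("gsiPk = :pk")
                        .expressionAttributeValues(Map.of(
                                ":pk", AttributeValue.builder().s("FILES").build()))
                        .scanIndexForward(false) // newest first
                        .limit(1000)
                        .build());

                // Map each item to the minimal {key, bucket} shape shown above.
                String body = page.items().stream()
                        .map(item -> String.format("{\"key\":\"%s\",\"bucket\":\"%s\"}",
                                item.get("key").s(), item.get("bucket").s()))
                        .collect(Collectors.joining(",", "[", "]"));

                // One SQS message carries all 1,000 entries (~100 KB, under the 256 KB limit).
                sqs.sendMessage(SendMessageRequest.builder()
                        .queueUrl(QUEUE_URL)
                        .messageBody(body)
                        .build());
            }
            return null;
        }
    }

    The INSERT filter is there so that updates to the counter item itself don't re-trigger the batch logic.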

    Then have one Lambda function subscribed to your SQS queue that finally concatenates the 1,000 files.
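
    The consumer side could look roughly like the Java sketch below: an SQS-subscribed handler that parses the minimal JSON array, downloads each object, and writes one concatenated object back to S3. The output bucket/key and the use of Jackson for parsing are assumptions for illustration.

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.SQSEvent;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import software.amazon.awssdk.core.sync.RequestBody;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.model.PutObjectRequest;

    import java.nio.charset.StandardCharsets;

    public class ConcatenateFilesHandler implements RequestHandler<SQSEvent, Void> {

        private static final S3Client s3 = S3Client.create();
        private static final ObjectMapper mapper = new ObjectMapper();

        @Override
        public Void handleRequest(SQSEvent event, Context context) {
            for (SQSEvent.SQSMessage message : event.getRecords()) {
                StringBuilder combined = new StringBuilder();
                try {
                    // The body is the minimal [{"key": ..., "bucket": ...}, ...] array
                    // produced by the stream handler.
                    for (JsonNode entry : mapper.readTree(message.getBody())) {
                        byte[] part = s3.getObjectAsBytes(GetObjectRequest.builder()
                                .bucket(entry.get("bucket").asText())
                                .key(entry.get("key").asText())
                                .build()).asByteArray();
                        combined.append(new String(part, StandardCharsets.UTF_8));
                    }
                } catch (Exception e) {
                    throw new RuntimeException("Failed to concatenate batch", e);
                }

                // Write the concatenated result back to S3 (placeholder bucket/key).
                s3.putObject(PutObjectRequest.builder()
                                .bucket("my-output-bucket")
                                .key("concatenated/" + message.getMessageId() + ".txt")
                                .build(),
                        RequestBody.fromString(combined.toString()));
            }
            return null;
        }
    }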

    Things to keep in mind:

    1. Make sure you really create the CreatedAt field in DynamoDB. By the time the counter hits one thousand, new items could already have been inserted, so this is how you make sure you are reading the 1,000 items you expected.

    2. In your Lambda check, just test batchId % 1000 == 0; this way you don't need to delete anything, saving DynamoDB operations.

    3. Watch out for your Lambda's execution time. Concatenating 1,000 files at once may take a while to run, so run a couple of tests and add 1 minute of overhead on top. That is, if it usually takes 5 minutes, set your function's timeout to 6 minutes.
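
    For completeness, here is a hedged sketch of the S3-triggered writer that produces the batchId and CreatedAt attributes the points above rely on, using a single atomically incremented counter item. The table name, counter item key, and attribute names are all assumptions for illustration.

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.S3Event;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
    import software.amazon.awssdk.services.dynamodb.model.ReturnValue;
    import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;
    import software.amazon.awssdk.services.dynamodb.model.UpdateItemResponse;

    import java.time.Instant;
    import java.util.Map;

    public class RegisterFileHandler implements RequestHandler<S3Event, Void> {

        private static final DynamoDbClient dynamo = DynamoDbClient.create();
        private static final String TABLE = "files-table"; // assumed table name

        @Override
        public Void handleRequest(S3Event event, Context context) {
            event.getRecords().forEach(rec -> {
                String bucket = rec.getS3().getBucket().getName();
                String key = rec.getS3().getObject().getKey();

                // Atomically increment a single counter item to get the next batchId.
                UpdateItemResponse counter = dynamo.updateItem(UpdateItemRequest.builder()
                        .tableName(TABLE)
                        .key(Map.of("pk", AttributeValue.builder().s("COUNTER").build()))
                        .updateExpression("ADD batchId :one")
                        .expressionAttributeValues(
                                Map.of(":one", AttributeValue.builder().n("1").build()))
                        .returnValues(ReturnValue.UPDATED_NEW)
                        .build());
                String batchId = counter.attributes().get("batchId").n();

                // Write the file record; CreatedAt backs the GSI used to read the last 1,000.
                dynamo.putItem(PutItemRequest.builder()
                        .tableName(TABLE)
                        .item(Map.of(
                                "pk", AttributeValue.builder().s("FILE#" + batchId).build(),
                                "gsiPk", AttributeValue.builder().s("FILES").build(),
                                "batchId", AttributeValue.builder().n(batchId).build(),
                                "CreatedAt", AttributeValue.builder().s(Instant.now().toString()).build(),
                                "bucket", AttributeValue.builder().s(bucket).build(),
                                "key", AttributeValue.builder().s(key).build()))
                        .build());
            });
            return null;
        }
    }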

    If you have new info to share I am happy to edit my answer.