I'd like to execute a Lambda function on multiple messages at once, but only after a fixed amount of data has been gathered. The threshold could be, for example, a specific number of messages, or all messages sent within a specific time window.
I thought of solving this with an SQS queue that I write the messages to, plus polling to check the queue's status. But I don't like this solution, because I'd like to trigger the Lambda as soon as the criterion is met (for example, a given time elapsed since the first message was sent, or a fixed number of messages).
Ideally, all the gathered messages would be sent, for example, 1 minute after the first message arrives.
To be clear:
Moreover, I'd like to handle different queues in parallel, based on different IDs.
Is there an elegant way to do so?
I already have a system in place that works with sequential Lambdas and handles the whole process one message at a time.
Unfortunately, this is not an easy task on AWS Lambda (we have a similar use case).
Using SQS or a Kinesis data stream as a trigger can be helpful, but both have several limitations:
SQS is polled by AWS Lambda at a very high frequency. You will have to add a concurrency limit to your Lambda for it to be triggered with more than a single item. And the maximum batch size is only 10.
The base polling rate for a Kinesis trigger is once per second for each shard, and it cannot be changed.
Aggregating records across invocations is not a good idea: you never know whether the next invocation will run in a different container, in which case the records held in memory are lost.
Kinesis Firehose can be helpful, as you can configure the maximum batch size and the maximum time interval before a new batch is sent. You can configure it to write to an S3 bucket, and configure a Lambda to be triggered by newly created files.
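As a rough illustration of the Firehose batching settings, here is a minimal sketch that builds the arguments for boto3's `create_delivery_stream` call. The helper names, the stream name, and the ARNs are hypothetical placeholders; the 60-second / 1 MB values are just an example, and the ranges Firehose accepts for them should be checked against the current AWS documentation.

```python
def build_firehose_args(stream_name, role_arn, bucket_arn,
                        max_seconds=60, max_megabytes=1):
    """Build keyword arguments for firehose.create_delivery_stream.

    All names and default values here are illustrative assumptions.
    """
    return {
        "DeliveryStreamName": stream_name,
        # "DirectPut" means producers write to Firehose directly.
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            # A batch is written to S3 when either limit is reached first.
            "BufferingHints": {
                "IntervalInSeconds": max_seconds,
                "SizeInMBs": max_megabytes,
            },
        },
    }


def create_stream(firehose_client, args):
    """Create the delivery stream.

    firehose_client is expected to be boto3.client("firehose").
    """
    return firehose_client.create_delivery_stream(**args)
```

With a real client this would be called as `create_stream(boto3.client("firehose"), build_firehose_args(...))`. The S3 trigger for the Lambda is configured separately on the bucket.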
Be aware that if you use a Kinesis data stream as the source of a Kinesis Firehose, the data from each shard of the data stream is batched separately in the Firehose (this is not documented by AWS).
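On the consuming side, the Lambda triggered by the new S3 files receives a whole batch at once and can split it by ID itself. The sketch below assumes (these are not from the setup above) that every message is a JSON object with an `id` field and that the producer appends a newline to each record before sending it to Firehose, since Firehose concatenates records as they are. The function names are hypothetical.

```python
import json
from collections import defaultdict
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) pairs from an S3 object-created notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def group_by_id(body, id_field="id"):
    """Group newline-delimited JSON messages by their id field (assumed)."""
    groups = defaultdict(list)
    for line in body.splitlines():
        if line.strip():
            message = json.loads(line)
            groups[message[id_field]].append(message)
    return dict(groups)


def handler(event, context, s3_client=None):
    """Lambda entry point: read each new batch file and group it by id."""
    if s3_client is None:
        import boto3  # available by default in the Lambda Python runtime
        s3_client = boto3.client("s3")
    results = {}
    for bucket, key in s3_objects(event):
        obj = s3_client.get_object(Bucket=bucket, Key=key)
        body = obj["Body"].read().decode("utf-8")
        results[key] = group_by_id(body)
    return results
```

The optional `s3_client` parameter is only there so the handler can be exercised locally with a stub; in Lambda it is left unset.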