amazon-web-services aws-lambda amazon-dynamodb amazon-sqs

What is a cost effective serverless pattern for SQS FIFO and Lambda to delay messages by Message Group Id?


I want to be able to delay messages coming through an SQS FIFO queue by a defined period of time. This delay needs to apply at the Message Group ID level. Please suggest your own solution, or comment on mine.

Requirements

Sequential message processing

Messages processed no faster than one every 2 minutes per Message Group ID.

Example

Messages 1, 2, 3 come in to a FIFO queue, for Message Group ID A, all 1 second after each other.

Let's say it's 10am.

  • Message 1 can be processed immediately (presuming no previous messages for that group id have come in, or been processed recently)
  • Message 2 should not be processed before 10:02am
  • Message 3 should not be processed until 2 minutes after message 2 has been processed (10:04am in this example)
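
To make the timing rule concrete, here is a minimal sketch that computes the earliest allowed start time for each message in one group. The function name `earliest_start_times` and the date used are hypothetical, and it assumes processing itself takes negligible time, so "2 minutes after the previous message was processed" is measured from the previous start.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=2)  # required spacing within one message group


def earliest_start_times(arrivals, min_gap=MIN_GAP):
    """Return the earliest time each message may start, in arrival order.

    Assumes processing is effectively instantaneous, so the gap is
    measured from the previous message's start time.
    """
    starts = []
    for arrival in arrivals:
        if starts:
            # Wait for both the arrival and the end of the previous gap.
            start = max(arrival, starts[-1] + min_gap)
        else:
            # First message in the group can run as soon as it arrives.
            start = arrival
        starts.append(start)
    return starts


# Messages 1, 2, 3 arrive one second apart, starting at 10am.
ten_am = datetime(2024, 1, 1, 10, 0, 0)  # arbitrary date, for illustration only
arrivals = [ten_am + timedelta(seconds=i) for i in range(3)]
schedule = [t.strftime("%H:%M:%S") for t in earliest_start_times(arrivals)]
print(schedule)
```

Under those assumptions the three messages are scheduled at 10:00:00, 10:02:00 and 10:04:00, matching the example.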

Current thought process

I have 2 solutions in my head

Solution 1:

  • SQS FIFO triggers a lambda
  • Create a DynamoDB table to track processing times
  • On receipt of a message, the lambda checks the DynamoDB table for the last message process time for that group id
  • If that was less than 2 minutes ago, fail the lambda
    • A message visibility timeout of 2 minutes on the queue should make the message re-appear after the 2 minute window has passed
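
Roughly, the check in Solution 1 might look like the sketch below. It is not a real Lambda handler: the dict `last_processed` is a hypothetical in-memory stand-in for the DynamoDB table, and `handle_message` and `TooSoon` are made-up names. In a real function the lookup and update would be DynamoDB reads and writes.

```python
import time

MIN_GAP_SECONDS = 120

# Hypothetical stand-in for the DynamoDB table:
# message group id -> epoch seconds of the last processed message.
last_processed = {}


class TooSoon(Exception):
    """Raised to fail the invocation so the message is retried later."""


def handle_message(group_id, body, now=None):
    """Process one message, or fail if its group was served too recently."""
    now = time.time() if now is None else now
    last = last_processed.get(group_id)
    if last is not None and now - last < MIN_GAP_SECONDS:
        # Failing leaves the message on the queue; it becomes visible
        # again once the visibility timeout expires.
        raise TooSoon(f"group {group_id} was processed {now - last:.0f}s ago")
    # ... the real work on `body` would happen here ...
    last_processed[group_id] = now  # record the processing time
    return body
```

Because a FIFO queue holds back later messages in a group while an earlier one is in flight, a failed message also blocks the rest of its group until it is retried, which is what keeps the ordering intact in this approach.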

Solution 2:

  • Trigger a lambda that calls a step function on message receive
  • Incorporate a wait on the step function at the end of the processing before deleting the message

Things I've ruled out:

Message delay won't solve the problem. If messages come in 1 second after each other, and processing time is brief (1 second on average), a delay will just result in the messages being processed in rapid succession, only n seconds later than their arrival.

Setting a wait in the lambda that processes them. This could run up costs on the lambda, even if the lambda was set to its lowest memory allocation.


Solution

  • This is a common ask: the ability to throttle the processing of messages, typically because of limitations imposed by an external system.

    Unfortunately, native SQS functionality can't address this requirement. I would recommend against intentionally triggering Lambda failure as a form of 're-queuing' -- it could cause problems when there is a backlog of messages. Don't abuse a system designed for one purpose to try and make it fit another purpose.

    Instead, I'd recommend implementing your own logic by putting the 'queue' in a database, then having an 'orchestrator' that determines when a message is able to be processed and invokes the Lambda function directly. The Lambda function would either need to update the database to say that it has finished processing the message, or it could return information to the orchestrator that then updates the database.

    The orchestrator would either need to be 'continuously running code' (e.g. on EC2 or ECS), or it could be an AWS Lambda function triggered every minute or so.
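
As a rough illustration of the orchestrator idea, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `Orchestrator` class, its `pending` and `last_finished` fields (standing in for the database), and the `invoke` callback (standing in for a direct, synchronous Lambda invocation). A real version would persist this state and call the function through the AWS SDK.

```python
from collections import defaultdict, deque

MIN_GAP_SECONDS = 120


class Orchestrator:
    """Hypothetical scheduler: at most one message per group per gap."""

    def __init__(self, invoke, min_gap=MIN_GAP_SECONDS):
        self.invoke = invoke                 # stand-in for invoking the worker Lambda
        self.min_gap = min_gap
        self.pending = defaultdict(deque)    # stand-in for the 'queue' table
        self.last_finished = {}              # group id -> time last message finished

    def enqueue(self, group_id, message):
        self.pending[group_id].append(message)

    def tick(self, now):
        """Run one scheduling pass, e.g. once a minute; return what was sent."""
        dispatched = []
        for group_id, queue in self.pending.items():
            if not queue:
                continue
            last = self.last_finished.get(group_id)
            if last is not None and now - last < self.min_gap:
                continue  # this group is still inside its 2 minute window
            message = queue.popleft()        # oldest first keeps group ordering
            self.invoke(group_id, message)   # assumed to return when processing is done
            self.last_finished[group_id] = now
            dispatched.append((group_id, message))
        return dispatched


# Usage: three messages for group A and one for group B, ticking once a minute.
processed = []
orch = Orchestrator(invoke=lambda g, m: processed.append((g, m)))
for m in ("m1", "m2", "m3"):
    orch.enqueue("A", m)
orch.enqueue("B", "n1")
first = orch.tick(now=0)     # one message from each group
second = orch.tick(now=60)   # nothing: both groups are inside their window
third = orch.tick(now=120)   # next message for group A
```

Groups are throttled independently, so a backlog in one group does not hold up another, and nothing has to fail on purpose to enforce the delay.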