
How to ensure an SQS FIFO queue stays blocked while a message is in its corresponding dead-letter queue


Imagine the following lifecycle of an order:

  1. Order is Paid
  2. Order is Approved
  3. Order is Completed

We chose an SQS FIFO queue to ensure all these messages are processed in the order they are produced. This avoids, for example, changing the status of an order to Approved after it has already been Completed, when Approved should only ever follow Paid.

But let's say that there is an error while trying to approve an order, and after several attempts the message is moved to the dead-letter queue.

The problem we noticed is that the subsequent message, "Order is Completed", is still processed, even though the previous message, "Order is Approved", is sitting in the dead-letter queue.
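The behaviour described above can be reproduced with a toy in-memory model of a single message group (one order). This is a sketch under stated assumptions, not the AWS API: `run_fifo_group` and `failing_event` are hypothetical names, and `max_receive_count` only approximately mirrors the `maxReceiveCount` setting of a redrive policy. The group stays blocked while its head message is being retried, but as soon as that message is moved to the dead-letter queue, the next message in the group becomes receivable.

```python
from collections import deque


def run_fifo_group(events, failing_event, max_receive_count=3):
    """Toy model of ONE FIFO message group (one order) with a redrive policy.

    Returns (processed, dead_lettered). Not the AWS API: it only mimics the
    group being blocked while its head message is retried, and being unblocked
    once that message is moved to the dead-letter queue.
    """
    group = deque(events)        # messages sharing one MessageGroupId, in send order
    receive_counts = {}
    processed, dead_lettered = [], []
    while group:
        event = group[0]         # only the head of the group can be received
        receive_counts[event] = receive_counts.get(event, 0) + 1
        if event != failing_event:
            processed.append(group.popleft())      # success: message deleted
        elif receive_counts[event] >= max_receive_count:
            dead_lettered.append(group.popleft())  # redriven to the DLQ
        # otherwise the failed message becomes visible again and is retried
    return processed, dead_lettered


processed, dead_lettered = run_fifo_group(
    ["Paid", "Approved", "Completed"], failing_event="Approved"
)
print(processed)      # ['Paid', 'Completed']
print(dead_lettered)  # ['Approved']
```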

How should we handle this?

Should we check the dead-letter queue for messages with the same MessageGroupId as the one being consumed, assuming that is even possible?
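The check being proposed is simple once the dead-lettered messages are at hand; the hard part is obtaining them, because SQS offers no call to look up queued messages by MessageGroupId. A list like `dead_lettered` would have to be built by receiving messages from the dead-letter queue, or by keeping your own record of failed group IDs. The sketch below assumes such a list already exists; `Message` and `should_process` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    group_id: str  # stands in for the SQS MessageGroupId attribute
    body: str


def should_process(message: Message, dead_lettered: list[Message]) -> bool:
    """Hypothetical guard: refuse a message if its group already has a dead-lettered message."""
    return all(dead.group_id != message.group_id for dead in dead_lettered)


dead_lettered = [Message("order-42", "Order is Approved")]
print(should_process(Message("order-42", "Order is Completed"), dead_lettered))  # False
print(should_process(Message("order-7", "Order is Completed"), dead_lettered))   # True
```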

Is there a mechanism that we are missing?


Solution

  • It sounds to me like you are using a single queue for multiple types of events, whereas I would recommend (at least) three separate queues:

    • An order paid event queue
    • An order approved event queue
    • An order completed event queue

    When an order payment comes in, an event is put into the first queue. Once your system has successfully processed that payment, it removes the item from the first queue (deletes the message) and then inserts an 'Order Approved' event into the second queue.

    The process responsible for those events watches only that queue and does what it needs to do. Once complete, it deletes the message and inserts a third message into the third queue, so that yet another process can see and act on that message: process it and then delete it.

    If anything fails along the way, the message will eventually end up in a dead-letter queue (either a shared one or one per queue; that makes no difference), but nothing that was supposed to happen AFTER the failed event would happen.

    It doesn't even sound to me like you need a FIFO queue at all in this case, though there is no real harm in using one (except for the slightly higher cost and lower throughput limits).
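The three-queue design above can be sketched as a toy in-memory model, assuming the consumers for each queue run one after another (in practice they would be independent processes polling real SQS queues). `STAGES`, `handle`, `run_pipeline` and `fail_stage` are hypothetical names, and the retry loop only approximates a redrive policy's `maxReceiveCount`. The point it illustrates: when approval fails, nothing is ever published to the "completed" queue.

```python
from collections import deque

STAGES = ["paid", "approved", "completed"]  # one queue per event type


def handle(stage, order_id, fail_stage):
    """Hypothetical business logic for one stage; raises to simulate an error."""
    if stage == fail_stage:
        raise RuntimeError(f"{stage} failed for {order_id}")


def run_pipeline(order_id, fail_stage=None, max_receive_count=3):
    queues = {stage: deque() for stage in STAGES}
    dead_letter = []  # one shared DLQ here; one DLQ per queue behaves the same
    completed = []    # stages that finished successfully, in order

    queues["paid"].append(order_id)  # an "Order is Paid" event arrives
    for position, stage in enumerate(STAGES):
        queue = queues[stage]
        while queue:
            message = queue[0]  # receive without deleting
            for _ in range(max_receive_count):
                try:
                    handle(stage, message, fail_stage)
                except RuntimeError:
                    continue  # not deleted: it will be received again
                queue.popleft()  # success: delete the message...
                completed.append(stage)
                if position + 1 < len(STAGES):
                    # ...and only then publish the event for the next stage
                    queues[STAGES[position + 1]].append(message)
                break
            else:
                # every attempt failed: redrive to the DLQ, publish nothing
                queue.popleft()
                dead_letter.append((stage, message))
    return completed, dead_letter


print(run_pipeline("order-42"))
# (['paid', 'approved', 'completed'], [])
print(run_pipeline("order-42", fail_stage="approved"))
# (['paid'], [('approved', 'order-42')])
```

Here the message is deleted before the next event is published, as described above. In a real deployment a crash between those two steps would lose the next event, whereas publishing first risks a duplicate instead, so idempotent handlers are a common pairing with either order.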