amazon-web-services, message-queue, amazon-sqs, message-bus

Amazon SQS Dead Letter Queue: Is it really dead letter or poison?


I'm trying to get clarification on what exactly Amazon's SQS Dead Letter Queue is doing.

According to http://aws.typepad.com/aws/2014/01/amazon-sqs-new-dead-letter-queue.html

Dead Letter Queue - The ARN (Amazon Resource Name) of an SQS queue that will receive the messages which were not successfully processed after maximum number of receives by consumers.

Doesn't that sound more like a Poison Queue? The key distinction is that the consumers did receive the message. A dead letter, as I understand it, is a message that may be perfectly fine but can't be delivered, probably because of a service outage. http://www.eaipatterns.com/DeadLetterChannel.html

Whereas this sounds like the message is successfully received multiple times but processing it fails each time, which I understand to be the meaning of a Poison Message Queue.
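To make the behavior concrete, here is a minimal Python sketch of how I read the announcement. `ToyQueue`, `drain`, and `redrive_policy` are names made up for illustration; `ToyQueue` is an in-memory model, not the SQS API, and it ignores visibility timeouts by making an undeleted message immediately receivable again. The only real SQS detail is the `RedrivePolicy` queue attribute with its `deadLetterTargetArn` and `maxReceiveCount` keys.

```python
import json
from collections import deque


def redrive_policy(dlq_arn, max_receive_count):
    """Build the JSON string for a queue's RedrivePolicy attribute."""
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    })


class ToyQueue:
    """Hypothetical in-memory model of a queue with a dead letter queue attached."""

    def __init__(self, max_receive_count):
        self.max_receive_count = max_receive_count
        self.messages = deque()   # each entry is [body, receive_count]
        self.dead_letters = []    # stands in for the dead letter queue

    def send(self, body):
        self.messages.append([body, 0])

    def receive(self):
        """Return the body at the head of the queue, or None if it is empty."""
        while self.messages:
            entry = self.messages[0]
            if entry[1] >= self.max_receive_count:
                # Received the maximum number of times and never deleted:
                # move it to the dead letter queue instead of delivering it.
                self.messages.popleft()
                self.dead_letters.append(entry[0])
                continue
            entry[1] += 1
            return entry[0]
        return None

    def delete(self):
        """Acknowledge the message most recently returned by receive()."""
        self.messages.popleft()


def drain(queue, handler):
    """Consume until the queue is empty; failed messages are left undeleted."""
    while True:
        body = queue.receive()
        if body is None:
            break
        try:
            handler(body)
        except Exception:
            continue  # not deleted, so it will be received again
        queue.delete()
```

With a handler that always fails on one particular message, that message is received `max_receive_count` times and then ends up in `dead_letters`. The consumer saw it every time, which is why it looks like a poison message to me rather than an undeliverable one.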

Message Bus vs Queue

Does the Dead Letter Pattern have a different meaning in the context of a plain old queue? Since SQS is just a queue, not a message bus, it isn't responsible for delivering messages. Instead, it waits for consumers to request them. So the traditional Dead Letter pattern doesn't really apply: there is no message bus attempting to deliver a message and failing to find a recipient.

Can SQS behave like a message bus?

Is there a way through SQS to set up channels and listeners instead of explicitly polling for messages from the queue?


Solution

  • Good question.

    Based on the definition from the canonical source, which you quoted (citations removed for clarity):

    The specific way a Dead Letter Channel works depends on the specific messaging system’s implementation, if it provides one at all. The channel may be called a “dead message queue” or “dead letter queue.” Typically, each machine the messaging system is installed on has its own local Dead Letter Channel so that whatever machine a message dies on, it can be moved from one local queue to another without any networking uncertainties. This also records what machine the message died on. When the messaging system moves the message, it may also record the original channel the message was supposed to be delivered on.

    ...it's not clear if there's really a difference. I understand what you mean by "poison queue," and your understanding of how SQS works is sound. Semantically, the difference between a DLQ and a PQ -- "undeliverable" in the style of email versus "poison" -- isn't clear to me. Perhaps a PQ is a flavor of a DLQ.

    FWIW, ActiveMQ's redelivery policy uses the same definition of DLQ -- a hybrid DLQ / PQ -- as SQS does.

    Can SQS behave like a message bus?

    SQS can't, but there are similar products that can.

    1. Amazon SNS

      SNS (Simple Notification Service) is a generalized publish-subscribe topic system. SNS allows you to create topics, and then register subscribers that receive push notifications. Currently, push notifications can come in the form of HTTP/S, email, SMS, SQS, and mobile device push notifications.

      SNS has a pretty sane retry policy for HTTP/S, but does not support a DLQ or PQ AFAIK.

    2. IronMQ's Push Queues

      IronMQ is another RESTful message queueing service that is a little more fully featured than SQS. (True FIFO message ordering, longer delays, and so on, but sadly smaller message sizes.) Push queues allow you to set up push "subscribers," which then receive an HTTP POST any time a new message is put onto the queue.

      If IronMQ fails to deliver a message -- the HTTP POST times out, or your endpoint returns anything but a 2xx -- then it will retry the delivery. If it runs out of retries, then it will put the message onto an error queue -- a combination DLQ and PQ in this case.

      This is probably as close as you're going to get to a true "ESB" in a managed service.

    Of course, then there are true open-source ESBs and SOA frameworks -- MULE, ServiceMix, and so on -- but I don't know nearly enough about what you're trying to do to make any kind of recommendation there. :)
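If you stay on SQS, a listener-style interface can still be layered on top of polling in your own client. Below is a minimal Python sketch of that idea, not anything SQS provides. `PollingDispatcher` and the `receive` / `delete` callables are hypothetical; in practice they would presumably wrap boto3's `receive_message` and `delete_message`, but here they are plain functions so the example stays self-contained.

```python
class PollingDispatcher:
    """Hypothetical wrapper that turns a polled queue into push-style listeners."""

    def __init__(self, receive, delete):
        # receive() -> list of (handle, body) pairs; delete(handle) acknowledges one.
        self._receive = receive
        self._delete = delete
        self._listeners = []

    def subscribe(self, listener):
        """Register a callable that is invoked with each message body."""
        self._listeners.append(listener)

    def poll_once(self):
        """Fetch one batch, notify listeners, and return how many messages were acknowledged."""
        acknowledged = 0
        for handle, body in self._receive():
            try:
                for listener in self._listeners:
                    listener(body)
            except Exception:
                # Leave the message undeleted so the queue redelivers it later;
                # after enough failed receives it would go to the dead letter queue.
                continue
            self._delete(handle)
            acknowledged += 1
        return acknowledged
```

Calling `poll_once()` in a loop (or on a timer) gives subscribers the feel of a channel with listeners, while the failure path still relies on the queue's own redelivery and dead letter handling.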