message-queue, messaging, distributed-computing

What are examples of real-world scenarios where a message queuing system can accept the loss of some messages?


I was reading this blog post, in which the author poses the following question, in the context of message queues:

does it matter if a message is lost? If your application node, processing the request, dies, can you recover? You’ll be surprised how often it doesn’t actually matter, and you can function properly without guaranteeing all messages are processed

At first I thought that the main point of handling messages was to never lose a single message - after all, a lost message could mean a hotel reservation not booked, a checkout not completed, or some other piece of functionality not carried through, which looks very much like a bug to me. I suppose I am missing something, so: what are examples of scenarios where it is OK for a messaging system to lose a few messages?


Solution

  • Well, your initial expectation:

    the main point of handling messages
    was to never lose a single message

    was just not a correct one.


    Right, if one strives for a certain type of robustness, where fail-safe measures have to take all due care and precautions so that not a single message can get lost, then yes, your a priori expressed expectation fits.

    This does not mean that all other system designs have to carry the same immense burdens and pay all the incurred costs ( resource-wise, latency-wise et al ) that the "100+% guaranteed delivery" systems do ( and even those can only do so when circumstances allow ).


    Anti-pattern cases:

    There are many use-cases where absolute certainty of delivery of each and every message originally sent is actually an anti-pattern.

    Just imagine a weakly synchronised system ( including ones that have nothing like back-throttling, i.e. back-pressure, or even the simplest form of feedback propagation at all ), where sensors read a current temperature, a sound sample or a video-frame and send a message with that value(s).

    Whenever a post-processing system gets such information delivered, there may be a reason not to read each and every "old" value, but only the most recent one(s).

    If the delivery framework has already received a newer set of values, all the "older" values not yet processed, still hanging at some depth below the queue-head, create the anti-pattern: one would not like to have to read and process each and every one of those "older" values, but just the most recent one(s).

    Just as no one will make a trade with you based on yesterday's prices, there is no positive value in making any new, current decision based on reading each and every "old" temperature reading that still waits in the queue.

    Some smart-messaging frameworks provide explicit means of taking just the very "newest" message from a given source - thus making it possible to imperatively discard any "older" messages and avoid reading and processing them at all, precisely because a more recent one is known to be present ( a small sketch of this pattern follows below ).

    This answers the original question about the assumed main point of handling messages.
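
    As a concrete illustration of that "take only the newest" pattern, here is a minimal sketch in Python; a plain in-process queue.Queue merely stands in for a real messaging framework ( some frameworks offer this natively, e.g. ZeroMQ's ZMQ_CONFLATE socket option ), and the names sensor_readings / latest_only are purely illustrative:

        # illustrative sketch - not any particular framework's API
        import queue
        import random
        import time

        def latest_only(q):
            """Drain the queue and return only the most recent item.

            Every older reading still sitting in the queue is discarded
            on purpose: a stale value is worthless for the current decision.
            """
            newest = None
            while True:
                try:
                    newest = q.get_nowait()   # keep overwriting with newer items
                except queue.Empty:
                    return newest             # None if nothing has arrived at all

        sensor_readings = queue.Queue()

        # A slow consumer: several readings pile up before it gets to them.
        for _ in range(5):
            sensor_readings.put(20.0 + random.random())   # simulated temperatures
            time.sleep(0.01)

        print("acting on:", latest_only(sensor_readings))  # only the newest value

    The design choice is the point: nothing here tries to guarantee that every reading is processed, because the core-logic only ever needs the freshest one.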


    Efficiency first:

    In any case where smart-delivery takes place ( either deliver an exact copy of the original message content or nothing-at-all ), resources are used at their best effort, without spending a single penny on anything but the "just-enough" smart-delivery.

    Building robustness costs more than that.
    Building ultimate robustness costs way more than that.

    Systems that do have such an extreme requirement can and may extend the resource-efficient smart-delivery so as to reach some requirements-defined level of robustness, at some add-on cost ( a tiny sketch of this cost difference follows below ).

    The same, but reversed, is not possible -- an "everything-proof" system cannot simply be given a slimmer form and fashion so as to fit onto restricted-resources hardware, nor be made to "forget" some "old" messages that are of no positive value at this very moment ( on the contrary, its guarantees make it a must for the processing element to read and process each and every "unwanted" message, just due to the fact it was delivered, while knowing the core-logic needs just the most recent one ).
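
    To make that cost difference concrete, here is a minimal, hypothetical sketch ( the lossy transport, the retry budget and the ack-timeout are assumptions, not any particular framework's API ): the fire-and-forget sender pays nothing beyond the send itself, while the "guaranteed" sender has to keep the message around, wait and resend -- i.e. spend extra memory, bandwidth and latency to buy its robustness:

        # illustrative sketch - not any particular framework's API
        import random
        import time

        def flaky_send(msg):
            """Hypothetical lossy transport: drops roughly 30 % of messages."""
            return random.random() > 0.3       # True means "acknowledged" (simulated)

        def fire_and_forget(msg):
            # "Just-enough" smart-delivery: one attempt, no bookkeeping,
            # no waiting for an acknowledgement.
            flaky_send(msg)

        def at_least_once(msg, retries=5, ack_timeout_s=0.2):
            # Robust delivery: keep the message, wait for an ack, resend on
            # failure - extra memory, bandwidth and latency are the price.
            for _ in range(retries):
                if flaky_send(msg):            # acknowledged ( simulated )
                    return True
                time.sleep(ack_timeout_s)      # back off before resending
            return False                       # even "guaranteed" delivery can give up

        fire_and_forget("temperature=21.7")                  # cheap, may be lost
        print("delivered:", at_least_once("order=42 paid"))  # costly, (almost) sure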

    Distributed systems accrue E2E-latency from many distributed sources, so any rigid-delivery system just blocks and penalises the only element that is ( latency-wise ) innocent -- the receiver.