Search code examples
scalaakkaactordistributed-system

Why is at-most-once delivery the default for actor systems?


I am working with actor systems for an event sourced service and I'd like to understand why the default is at-most-once delivery, i.e., a message from one actor to another may arrive at most once, but potentially not at all. I am working with Akka, but I understand this is in general the default for actor model implementations.

It seems to me that at-most-once delivery can easily result in silent failures/data corruption, and that the issues from at-least-once delivery can be easily fixed by versioning and tagging messages. Quoted from the How Akka Works manual:

(at-most-once works) especially in situations where the occasional lose of a message does not leave a system in an inconsistent state

I don't understand how to make the logical jump from "occasionally losing a message is no big deal" to "losing this particular message is no big deal". It seems to me: if I don't actually care about receiving this message, why send it at all?

Example: We are building a system to count the number of words in a document. We have an actor (M) that receives a document, splits it into lines, and then sends each line to a child actor (C) to count the words in the line. The state of M is how many words are in the document; it updates this state by receiving messages from its children C containing the number of words in a line and adds that number to the total.

In at-most-once delivery: if a child actor lost the message due to network error (if it was an error on the actual actor, the parent could fix using supervision), the parent wouldn't know. It could keep track of state by keeping a map of each child actor and whether it was done and update the map as replies come in, but if we make the computations slightly more complex and have a variable number of replies from the children, this gets out of hand quickly. Also, we just built a system to ensure that the message is received, so really we have started to venture out of the world of at-most-once delivery, especially if we rely on it to re-deliver the compute message to C as opposed to just knowing that the current count is corrupt.

At-least-once delivery: we can give the messages to M a serial ID and keep a log (C -> ID) in the parent M. This lets us know if a message arrives twice, we should discard it. To me this seems simpler and also more generally scalable if the task for the children becomes more complex.


Solution

  • The short answer for why it's the default is that:

    • It's not difficult to implement at-least-once on top of at-most-once (the ask pattern of pairing requests and replies is most of the way there), but there's no great way to get at-most-once out of at-least-once (e.g. without incurring the cost of at-least-once)
    • At-most-once means more or less one thing regardless of context, while at-least-once means an effectively infinite number of things (e.g. after how much time without a reply can a process decide that the send failed, and how many times should it be retried?)... There's no reasonable default meaning and it's reasonably likely that there isn't even a reasonable default for a given application/service.

    Note that at-least-once can lead to at least as much inconsistency as at-most-once, especially when dealing with non-idempotency and the generic ways of adding idempotency are really heavyweight; conversely designing idempotency at a higher level (and where needed) is typically able to be far lighter weight.