Tags: akka, distributed-computing, actor

Akka Failure Recovery: Gated State


The Akka documentation says that all outbound messages are dropped while the link to a remote system is in the gated state. Does this mean they are delivered to dead letters immediately, or only if the state later changes to quarantined?

The logs are explicit about quarantined state, but not gated state:

Association with remote system [...] has failed,
address is now gated for [5000] ms. Reason: [Disassociated]

Association to [...] having UID [...] is irrecoverably failed.
UID is now quarantined and all messages to this UID will be delivered to dead letters. 
Remote actorsystem must be restarted to recover from this situation.

  • If a remote system transitions from the gated state to the active state because of a successful inbound connection, will all dropped outbound messages be re-sent?

  • Is a registered DeathWatch on a remote actor sufficient to detect dropped messages, or do I need to handle message failures to gated (but not quarantined) systems separately?


Solution

  • While an address is gated, all outbound messages go to dead letters. They are not buffered or re-sent in any way, so there is no delivery guarantee. If you need one, you have to add that logic to your actors yourself (relevant section of the docs: http://doc.akka.io/docs/akka/2.4/general/message-delivery-reliability.html)

    DeathWatch lets you stop sending messages to an actor that has died. However, you may still have sent messages after it died but before your actor received the Terminated message, so DeathWatch alone is not sufficient to build delivery guarantees.

    Depending on your needs, implementing delivery guarantees can be as "light" as a simple acknowledgement protocol with the remote actor, or as "heavy" as AtLeastOnceDelivery from Akka Persistence (docs: http://doc.akka.io/docs/akka/2.4/scala/persistence.html#at-least-once-delivery-scala), which will handle a node crash without losing messages.
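
    As a rough illustration of the "light" option, here is a minimal sketch of an acknowledgement protocol using classic Akka actors (2.4-era API). The names Envelope, Ack, Resend and AckingSender are hypothetical and not part of Akka, and the sketch assumes the remote actor replies with Ack(id) for every Envelope it receives and tolerates duplicates. Pending messages are kept in memory only, so unlike AtLeastOnceDelivery this does not survive a crash of the sending node.

```scala
import akka.actor.{Actor, ActorRef, Cancellable, Props, Terminated}
import scala.concurrent.duration._

// Hypothetical protocol (not part of Akka): the receiver is assumed to
// reply with Ack(id) for every Envelope(id, payload) it processes.
final case class Envelope(id: Long, payload: Any)
final case class Ack(id: Long)
case object Resend

// Hypothetical sender that remembers unacknowledged messages and retries them.
class AckingSender(target: ActorRef, retryInterval: FiniteDuration) extends Actor {
  import context.dispatcher // execution context required by the scheduler

  private var nextId = 0L
  private var pending = Map.empty[Long, Envelope]

  // Periodic retry trigger. scheduler.schedule is the Akka 2.4 API;
  // it is deprecated from Akka 2.6 onwards.
  private val tick: Cancellable =
    context.system.scheduler.schedule(retryInterval, retryInterval, self, Resend)

  // DeathWatch tells us when retrying has become pointless.
  context.watch(target)

  override def postStop(): Unit = tick.cancel()

  def receive: Receive = {
    case Ack(id) =>
      // Confirmed by the receiver: stop retrying this message.
      pending -= id

    case Resend =>
      // Re-send everything still unacknowledged, oldest first. Messages
      // dropped while the address was gated get through on a later retry.
      pending.values.toSeq.sortBy(_.id).foreach(target ! _)

    case Terminated(ref) if ref == target =>
      // Simplification: give up and discard whatever is still pending.
      context.stop(self)

    case payload =>
      // Any other message is treated as a payload to deliver reliably.
      val envelope = Envelope(nextId, payload)
      nextId += 1
      pending += (envelope.id -> envelope)
      target ! envelope
  }
}
```

    Assuming remoteRef is an ActorRef for the remote actor, the sender could be created with system.actorOf(Props(new AckingSender(remoteRef, 2.seconds))) and then used in place of remoteRef. Because retries can overlap with a slow acknowledgement, the receiver may see the same id more than once and should deduplicate if the operation is not idempotent.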