java jms messaging amqp qpid

How does AMQP retry connection work with failover?


I couldn't find clear documentation on how JMS client reconnection works with the failover logic. I consulted the official docs that correspond to the versions I'm using.

The JMS client specifies the following URI to have failover and retry:

// Failover URI listing both brokers, with a limit of 20 reconnect attempts
String uri = "failover:(amqp://host1:5672,amqp://host2:5672)?failover.maxReconnectAttempts=20";
javax.jms.ConnectionFactory connectionFactory = new org.apache.qpid.jms.JmsConnectionFactory(uri);
  • Is failover.maxReconnectAttempts applied to each failover URI separately (i.e. the client retries 20 times on the first URI and, if it cannot reconnect, another 20 times on the second)? My concern with that reading is that with the default value of -1 the client would retry indefinitely on the first URI, so failover would never reach the second one. Or is it round-robin across both URIs (i.e. one attempt on the first URI, one on the second, then back to the first, and so on, for a total of 20 attempts)? I will test this, of course, but is this behavior explained in the official standard?

  • If a client is in the middle of sending or receiving a message and the connection to the broker on host1 fails, will the send or receive operation also be retried? I expect the underlying connection to be retried, but I am not sure what happens to the send or receive operation itself. If it is not retried automatically, there would have to be separate retry logic at the send/receive level (which I find very unlikely). As before, is this documented in the official standard?


Solution

  • No specification defines how an AMQP client manages failover, so behavior varies from one implementation to another. For the outdated version of Qpid JMS you are using, I think (I can't recall for certain) the client counts each attempt to connect to a remote URI as a distinct attempt, so it can miss URIs if you configure fewer attempts than there are URIs.
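
    The attempt counting described above can be modeled in a few lines. This is a toy sketch, not Qpid JMS code: the class FailoverAttemptModel and its method urisTried are hypothetical, and it assumes that every connection attempt to any URI uses up one unit of the maxReconnectAttempts budget while the client rotates through the URI list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT Qpid JMS code. It assumes every connection
// attempt to any URI consumes one unit of the maxReconnectAttempts budget.
final class FailoverAttemptModel {

    /** Returns the URIs that would be tried, in order, before the budget runs out. */
    static List<String> urisTried(List<String> uris, int maxReconnectAttempts) {
        List<String> tried = new ArrayList<>();
        if (uris.isEmpty()) {
            return tried;
        }
        // A negative budget stands for "no limit"; the model caps it at two
        // full passes over the list so that the loop terminates.
        int budget = maxReconnectAttempts < 0 ? uris.size() * 2 : maxReconnectAttempts;
        for (int attempt = 0; attempt < budget; attempt++) {
            tried.add(uris.get(attempt % uris.size())); // rotate through the list
        }
        return tried;
    }

    public static void main(String[] args) {
        List<String> uris = Arrays.asList("amqp://host1:5672", "amqp://host2:5672");
        System.out.println(urisTried(uris, 1));  // budget smaller than the URI list
        System.out.println(urisTried(uris, 20)); // budget from the question
    }
}
```

    Under these assumptions, a budget of 1 with two URIs never reaches host2, which is the "missed URI" case. Whether a given client version actually behaves this way is worth confirming with a test.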

    The failover retry logic was overhauled around v0.26.0 and is now much more robust and predictable, so you really should move to the latest release, which will soon be 0.35.0.

    How send and receive are handled depends on where in the process the connection dropped. Most of the time a send will be retried, but there are a few small windows in which you might get an exception indicating that the send failed. Receive is more difficult because the remote peer decides what happens to an unsettled delivery when the connection drops; it is within its rights to send that message somewhere else.

    Even when using failover, you need to handle JMS exceptions and code defensively: there is no such thing as fully transparent failover, and you need to be prepared to react when something unexpected happens.
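
    As a rough illustration of reacting to a failed send, the sketch below retries an operation a bounded number of times and rethrows the last failure. The helper SendRetry.callWithRetry is hypothetical and uses only the JDK; in a JMS application the operation would typically wrap producer.send(message), and the exception of interest would be javax.jms.JMSException. Note that resending after an ambiguous failure can produce a duplicate if the broker had already accepted the first attempt.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of JMS or Qpid JMS. In a JMS application the
// operation would typically wrap producer.send(message).
final class SendRetry {

    /** Runs the operation up to maxAttempts times, pausing between failures. */
    static <T> T callWithRetry(Callable<T> operation, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e; // remember the failure in case this was the final attempt
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // fixed pause before the next attempt
                }
            }
        }
        throw last; // every attempt failed; surface the most recent exception
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated send that fails twice before succeeding.
        String result = SendRetry.<String>callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated send failure " + calls[0]);
            }
            return "sent on attempt " + calls[0];
        }, 5, 10L);
        System.out.println(result);
    }
}
```

    The caller still sees an exception once the attempts are exhausted, so the surrounding code has to decide what to do with the message at that point.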