RabbitMQ clustering and queue mirroring for high availability: get the master node's IP for a queue at time t

From my understanding, RabbitMQ clustering is for scalability, not availability, but mirrored queues provide availability as well: if the master fails, an up-to-date slave can be promoted to master.

From the documentation:

Messages published to the queue are replicated to all slaves. Consumers are connected to the master regardless of which node they connect to, with slaves dropping messages that have been acknowledged at the master. Queue mirroring therefore enhances availability, but does not distribute load across nodes (all participating nodes each do all the work).

Therefore, load-balancing across the nodes for a given queue doesn't make sense as this will always add an extra trip from the node contacted to the master node for the queue (unless I'm misunderstanding something). Hence, we'd want to always be able to know which node is the master for a given queue.

I haven't worked with RabbitMQ very much, so perhaps I'm just missing it in the documentation, but there seems to be no way to determine a mirrored queue's master IP after a master failure, once a slave has been promoted to master. Every source I've found merely mentions that you can set the initial master node, which doesn't help me. For any time t, how do I find the master node's IP for a given queue?

PS: It also seems risky to simply put the nodes behind a load balancer: if there's a network partition (which can occur even between nodes on the same LAN), we could be hitting nodes that can't communicate with the queue's master or, worse, we could be feeding both sides of a split brain and making them diverge further.

Solution

You can create a smart client that keeps track of the queue mirroring topology. This is possible using the Management Plugin and its REST API.

For example, for a given queue, curl -i -u guest:guest http://[HOST]:[PORT]/api/queues/[VHOST]/[QUEUE] returns a payload like the following (abridged):

{
  "messages": 0,
  "slave_nodes": [
    "rabbit@node1",
    "rabbit@node0"
  ],
  "synchronised_slave_nodes": [
    "rabbit@node0",
    "rabbit@node1"
  ],
  "recoverable_slaves": [
    "rabbit@node0"
  ],
  "state": "running",
  "name": "myQueue",
  "node": "rabbit@node2"
}

For myQueue, the "node" field names the master, so your client would favor connecting to node2 (the myQueue master node) to minimize network hops.
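
As a rough illustration, the lookup could be sketched in Python along these lines. This is only a minimal sketch using the standard library: the helper names (queue_url, master_node, node_host, fetch_queue) are made up for this example, and it assumes the Management Plugin is enabled, the credentials are valid, and the host part of the node name (the part after "@") is resolvable from the client. Note that the vhost has to be percent-encoded in the URL, so the default vhost "/" becomes "%2F".

```python
import base64
import json
import urllib.parse
import urllib.request


def queue_url(host, port, vhost, queue):
    """Build the management API URL for a single queue."""
    # Each path segment is percent-encoded; the default vhost "/" becomes "%2F".
    v = urllib.parse.quote(vhost, safe="")
    q = urllib.parse.quote(queue, safe="")
    return f"http://{host}:{port}/api/queues/{v}/{q}"


def master_node(payload):
    """Return the node currently hosting the queue master, e.g. 'rabbit@node2'."""
    return payload["node"]


def node_host(node_name):
    """Extract the host part of a node name: 'rabbit@node2' -> 'node2'."""
    return node_name.split("@", 1)[1]


def fetch_queue(host, port, vhost, queue, user="guest", password="guest", timeout=5):
    """Fetch the queue description from the management API (HTTP basic auth)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        queue_url(host, port, vhost, queue),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage (needs a reachable broker, so it is left commented out):
# payload = fetch_queue("localhost", 15672, "/", "myQueue")
# print(node_host(master_node(payload)))
```

The result is a host name rather than an IP address; if you really need the IP, you could resolve it on the client side (for instance with socket.gethostbyname). Since the master can change after a failover, the client would have to query again whenever its connection drops, or poll periodically.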

I'm not sure it's worth the cost, though: it will increase the number of connections and the complexity of the client. I would be happy to receive feedback if you implement something.