performance replication load-balancing ravendb redundancy

RavenDb topology with load balancing and redundancy

We're trying to come up with the appropriate RavenDb topology that would allow us to balance load as well as be fault tolerant. It seems that better approach for load balancing would be to use native sharding, we might shift to use it but due to domain peculiarities it is not trivial at this point. In order to have redundancy we just setup 2 ravendb nodes per group with master/master replication between so if one fails, RavenDb client will automatically switches to another one. We have indexing "component" which is the only one who will be writing to the database so it'll be writing to one node and we expect these changes to be distributed eventually. We're going to setup master/master replication between two groups of ravendb nodes so if indexing component will eventually fall back to the group 1, changes should be replicated to the second group.

The schema

So, it seems there's low risk of having conflicts since we have only one player who writes to the database (with bunches, once in a minute). Several questions regarding this setup:

Is it typical for RavenDb to have so many master/master replications?
Can this problem be solved in an easier way?
How to configure client's fallback conventions so each web node will fail to another node in its group first before failing to another group's RavenDb node?
Do you see any potential issues if we embed simple round robin logic for reads on each of web nodes (web node 1 will read from both: RavenDb1_01 and RavenDb1_02)? Will it make standard RavenDb fallback behavior go crazy?

Solution

1) It is pretty common to have many nodes in such a cluster, yes. Note that you need to setup replication to be changed and replicated in such topology.

2) It is generally easier to have a fully connected topology, rather than the layers you have here..

3) The failover is always based on the client's primary node order of destinations. In other words, if node 2 has destinations (node 1, node 3) and node 1 has destinations (node 3, node 2). A client that was originally connected to node 2 would go to node 1 then node 3 on failover, and a client that was originally connected to node 1 will go to node 3 and then 2 on failover.

4) Round robin and failover behave separately.