Some questions about the details of Couchbase replicas


I have several questions about how replicas work in Couchbase and hope they can be answered. First, here is my own understanding: if there are 10 nodes in my cluster and I set the number of replicas to 3 for each bucket (3 seems to be the maximum value, and I couldn't find any way to make it larger), does that mean each piece of data in the bucket is copied to only three other nodes out of the 10? (I guess the three nodes are chosen randomly, but can they be selected manually?) Furthermore, if some of the 10 nodes go down, will that cause data loss?

To summarize my questions:

1. The maximum number of replicas in Couchbase is 3, right or wrong? If wrong, how can it be made larger than 3?

2. I guess the three nodes are chosen randomly, but can they be selected manually?

3. If my understanding is right, data will be lost when some nodes go down. How can we avoid data loss in that situation?


Solution

  • The maximum number of replicas in Couchbase is 3, right or wrong? If wrong, how can it be made larger than 3?

    The maximum number of replicas you can have is 3. We run in production with 1 replica, but it all comes down to how large your cluster is and the performance impact: the more replicas you have, the more inter-node communication and data transfer will occur.

    When you have 3 replicas, each node has its data replicated to 3 other nodes, which means you could handle 3 node failures in your cluster. That could happen, but it is unlikely. What's more likely is that a single machine dies, and Couchbase can then automatically fail over and promote a replica held on another node to serve requests and updates.

    Couchbase's system is nice because it means you can scale up and down by failing over a node and relying on automatic rebalancing.
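
    The replica arithmetic above can be checked with a small sketch. It is a toy model, not Couchbase's actual placement logic: the round-robin `place_vbuckets` function, the node names, and the count of 64 shards are all assumptions made for illustration. It only shows that one active copy plus 3 replicas on distinct nodes survives any 3 simultaneous node failures, but not every combination of 4.

```python
from itertools import combinations


def place_vbuckets(nodes, num_vbuckets, num_replicas):
    """Toy placement: 1 active copy plus num_replicas replicas per shard,
    all on distinct nodes (round-robin, NOT Couchbase's real algorithm)."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least num_replicas + 1 nodes")
    return {
        vb: [nodes[(vb + i) % len(nodes)] for i in range(num_replicas + 1)]
        for vb in range(num_vbuckets)
    }


def survives(placement, failed):
    """True if every shard still has at least one copy on a live node."""
    failed = set(failed)
    return all(
        any(node not in failed for node in copies)
        for copies in placement.values()
    )


def max_tolerated_failures(nodes, placement):
    """Largest k such that ANY k simultaneous node failures lose no data."""
    k = 0
    while k < len(nodes) and all(
        survives(placement, failed) for failed in combinations(nodes, k + 1)
    ):
        k += 1
    return k


# Hypothetical 10-node cluster with 3 replicas, as in the question.
nodes = [f"node{i}" for i in range(10)]
placement = place_vbuckets(nodes, num_vbuckets=64, num_replicas=3)
tolerated = max_tolerated_failures(nodes, placement)
print(tolerated)
```

    Note that losing 4 nodes does not always lose data in this model; it only does so when those 4 nodes happen to hold every copy of some shard.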

  • I guess the three nodes should be randomly chosen, but could I select them manually?

    You have no say in which nodes hold the replicas. In fact, I think it's a great thing that all of Couchbase's sharding and replica processes are taken out of the developer's hands; it's all an automatic process.

  • If my understanding is right, it will have loss of data when we find some nodes being in downtime. How could we avoid data loss under that condition?

    As I said before, if one node goes down then a replica is promoted. With 3 replicas there are 4 copies of the data, so you could lose 3 nodes and still have a copy of everything. In a production environment you should already have a warning system for each individual node (New Relic, Nagios, etc.) that reports when a server dies. If there was a catastrophic problem and you lost more than 3 nodes at once, then yes, you could have data loss.

    Bear in mind that automatic failover in Couchbase isn't instantaneous, but it's still pretty quick. If you need downtime across the cluster, say for server maintenance that requires a restart, it is possible to fail a node over, remove it from the cluster, perform your tasks on it, then add it back into the cluster and rebalance. Repeat those steps for as many nodes as you need. I've personally done that when I forgot to set specific Linux things that needed a system restart.

    Overall, to avoid data loss: monitor your servers, make (daily/hourly) backups of the data in your cluster, and have machines that are well provisioned for your workload.
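
    The one-node-at-a-time routine described above could be sketched roughly as follows. This is a pure-Python simulation under stated assumptions: `copies_for` and `rolling_maintenance` are hypothetical helpers, the round-robin placement is invented, and nothing here calls a real Couchbase API or CLI. It only illustrates why each node is taken out and added back before the next one is touched.

```python
def copies_for(vb, members, num_replicas):
    """Toy round-robin placement over the current cluster members
    (an assumption for illustration, not Couchbase's real algorithm)."""
    count = min(num_replicas + 1, len(members))
    return [members[(vb + i) % len(members)] for i in range(count)]


def rolling_maintenance(nodes, num_replicas=1, num_vbuckets=16):
    """Take each node out in turn, 'maintain' it, add it back.
    Returns a log of the simulated steps."""
    members = list(nodes)
    log = []
    for node in nodes:
        # 1. Fail over: every shard must still have a copy on another node.
        for vb in range(num_vbuckets):
            copies = copies_for(vb, members, num_replicas)
            if not [n for n in copies if n != node]:
                raise RuntimeError(f"shard {vb} has no copy outside {node}")
        log.append(f"failover {node}")

        # 2. Remove the node and rebalance across the remaining members.
        members.remove(node)
        log.append(f"rebalance without {node}")

        # 3. Do the maintenance work (restart, kernel settings, ...).
        log.append(f"maintain {node}")

        # 4. Add the node back and rebalance before touching the next one.
        members.append(node)
        log.append(f"rebalance with {node}")
    return log


for step in rolling_maintenance(["node0", "node1", "node2"]):
    print(step)
```

    With `num_replicas=0` the simulation raises at the first failover, which mirrors the point that this procedure is only safe when replicas exist.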

    Hope that helps!