error-handling cassandra ycsb

How does Cassandra handle errors? Will it retry or fail a request when some nodes are down?


I'm running YCSB on a 6-node Cassandra cluster with default settings. Assuming that the client has established a connection with the coordinator and found sufficient replicas to meet its consistency level, what will happen if:

(1) the coordinator is down? Will the YCSB client contact a different coordinator?

(2) some of the replicas are down? Will it retry or simply fail the request?


Solution

  • Please only ask one question at a time. In answer to your questions:

    1. If the node chosen as the coordinator is down, another node will be chosen as the coordinator. Note that clients should connect with a token-aware load balancing policy (TokenAwarePolicy in the DataStax Java driver 3.x; is that configurable in YCSB?). With token awareness, the driver routes each request directly to a replica that owns the data, which removes the need to designate a coordinator node, as long as the query includes the partition key (which all of your client-side queries should).

    2. That depends on the consistency level set on the client side. If the client operates at QUORUM and your keyspace is defined with a replication factor (RF) of 3, then only two replicas need to respond. If the client operates at ONE, only one replica needs to respond. So with an RF of 3 and queries at ONE or LOCAL_ONE, two of the three replicas could be down and you could still serve requests. If fewer replicas are alive than the consistency level requires, the coordinator fails the request with an Unavailable error. YCSB should really have a way to configure that.
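
The replica arithmetic in point 2 can be sketched in a few lines of Python. This is only an illustration of the counting rule, not Cassandra or driver code: the function names required_replicas and can_serve are hypothetical, a single datacenter is assumed (so LOCAL_ONE behaves like ONE), and only the consistency levels mentioned above plus ALL are covered.

```python
def required_replicas(consistency_level: str, rf: int) -> int:
    """Number of replicas that must respond for a request to succeed.

    Hypothetical helper for illustration; assumes a single datacenter.
    """
    level = consistency_level.upper()
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"consistency level not covered by this sketch: {consistency_level}")


def can_serve(consistency_level: str, rf: int, replicas_down: int) -> bool:
    """True if enough replicas are still alive to satisfy the consistency level."""
    alive = rf - replicas_down
    return alive >= required_replicas(consistency_level, rf)


if __name__ == "__main__":
    rf = 3
    for level in ("ONE", "LOCAL_ONE", "QUORUM", "ALL"):
        for down in range(rf + 1):
            print(f"RF={rf} CL={level} replicas_down={down} -> can_serve={can_serve(level, rf, down)}")
```

With an RF of 3, this reproduces the cases described above: QUORUM needs two replicas and so tolerates one replica being down, while ONE and LOCAL_ONE tolerate two.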