
Resolving write failures when writing to a Cassandra replica set


By design we can write to any node in a Cassandra cluster. In my situation the replica set has 2 nodes. When I make a write operation to node A but that node is unavailable, do I have to catch the exception and then re-send the write to node B manually?

In MongoDB, the driver has "Retryable Writes" that automatically send the write to another node if the primary node is down. Does Cassandra have this feature?


Solution

  • When you write to Cassandra you specify the consistency level you wish to write with - ranging from ANY, which provides no guarantees, up to ALL, which requires every replica in every DC to acknowledge back to the co-ordinator.

    The write is sent to a single node, chosen by your load balancing policy. That node acts as the co-ordinator for the whole operation and returns a single success / exception response. Your application does not have to send the write to multiple nodes itself; it sends it to one node (any node can be used), which co-ordinates the write to the replicas (see the first sketch after this answer).

    In the normal scenario of writing at local_quorum with the very common replication factor of 3, as long as the co-ordinator receives acknowledgements from 2 of the 3 replicas, the application will not get an exception - even if the 3rd node fails to write the data.

    There is a retry policy available on the driver, which can allow a retry in the event of a timeout. You should ensure the operation is idempotent when using it: for example, when appending an item to a list, a retry could result in the item appearing on the list twice on one of the replicas. (See the second sketch after this answer.)

    With your particular replication factor of 2, you currently lack either consistency guarantees or resilience:

    • one / local_one - only guarantees that one of the nodes got the write (both are likely to get it, but there is no guarantee).
    • quorum / local_quorum - requires both nodes to acknowledge, so you have no ability to tolerate a node failure.

    This is because a quorum of 2 is 2. If you used 3 nodes with RF=3, then local_quorum requires only 2 of the 3, which allows one node to be down while still providing a stronger consistency guarantee.
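
First sketch: a minimal illustration of the write path described above, assuming the DataStax Java driver 4.x (the question does not name a driver). The contact point, datacenter name, keyspace and table are hypothetical placeholders; the point is that the application issues the write once, with a consistency level, and the co-ordinator node handles the replicas.

```java
import java.net.InetSocketAddress;
import java.util.UUID;

import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class WriteExample {
    public static void main(String[] args) {
        // Contact point and datacenter name are placeholders for this sketch.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
                .withLocalDatacenter("dc1")
                .build()) {

            // The statement goes to a single co-ordinator (chosen by the load
            // balancing policy); the co-ordinator forwards it to the replicas and
            // reports success once LOCAL_QUORUM of them have acknowledged.
            SimpleStatement write = SimpleStatement
                .newInstance("INSERT INTO my_ks.users (id, name) VALUES (?, ?)",
                             UUID.randomUUID(), "alice")
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

            session.execute(write); // throws only if the consistency level cannot be met
        }
    }
}
```

With RF=3 and local_quorum this call succeeds even if one replica is down; with the RF=2 described in the question, it fails as soon as either node is unavailable.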
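Second sketch: the driver-side retry behaviour is configuration, not application code. Again assuming the DataStax Java driver 4.x, the options below can live in application.conf or be set programmatically; the values shown are one reasonable setup, not the only one. Note that the driver only retries or speculatively re-executes statements that are marked idempotent.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

public class RetryConfigExample {
    public static void main(String[] args) {
        // Programmatic equivalent of settings that normally live in application.conf.
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
            // Default consistency for requests that do not set one explicitly.
            .withString(DefaultDriverOption.REQUEST_CONSISTENCY, "LOCAL_QUORUM")
            // Only idempotent statements are retried; set this per statement instead
            // if some writes are not idempotent (e.g. counter updates, list appends).
            .withBoolean(DefaultDriverOption.REQUEST_DEFAULT_IDEMPOTENCE, true)
            // The built-in retry policy; it retries certain timeouts and errors,
            // possibly on a different co-ordinator node.
            .withString(DefaultDriverOption.RETRY_POLICY_CLASS, "DefaultRetryPolicy")
            .build();

        try (CqlSession session = CqlSession.builder()
                .withConfigLoader(loader)
                .withLocalDatacenter("dc1") // placeholder datacenter name
                .build()) {
            // Writes issued through this session inherit the defaults above.
        }
    }
}
```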