Failover and Replication in 2-node Cassandra cluster

I run KairosDB on a 2-node Cassandra cluster, RF = 2, Write CL = 1, Read CL = 1. If 2 nodes are alive, client sends half of data to node 1 (e.g. metric from METRIC_1 to METRIC_5000) and the other half of data to node 2 (e.g. metric from METRIC_5001 to METRIC_10000). Ideally, each node always has a copy of all data. But if one node is dead, client sends all data to the alive node.

Client started sending data to the cluster. After 30 minutes, I turned node 2 off for 10 minutes. During this 10-minute period, client sent all data to node 1 properly. After that, I restarted node 2 and client continued sending data to 2 nodes properly. One hour later I stopped the client.

I wanted to check if the data which was sent to node 1 when node 2 was dead had been automatically replicated to node 2 or not. To do this, I turned node 1 off and queried the data within time when node 2 was dead from node 2 but it returned nothing. This made me think that the data had not been replicated from node 1 to node 2. I posted a question Doesn't Cassandra perform “late” replication when a node down and up again?. It seems that the data was replicated automatically but it was so slow.

What I expect is data in both 2 servers are the same (for redundancy purpose). That means the data sent to the system when node 2 is dead must be replicated from node 1 to node 2 automatically after node 2 becomes available (because RF = 2).

I have several questions here:

1) Is the replication truly slow? Or did I configure something wrong?

2) If client sends half of data to each node as in this question I think it's possible to lose data (e.g. node 1 receives data from client, while node 1 is replicating data to node 2 it suddenly goes down). Am I right?

3) If I am right in 2), I am going to do like this: client sends all data to both 2 nodes. This can solve 2) and also takes advantages of replication if one node is dead and is available later. But I am wondering that, this would cause duplication of data because both 2 nodes receive the same data. Is there any problem here?

Thank you!

Solution

Can you check the value of hinted_handoff_enabled in cassandra.yaml config file?

For your question: Yes you may lose data in some cases, until the replication is fully achieved, Cassandra is not exactly doing late replication - there are three mechanisms.

Hinted handoffs http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsRepairNodesHintedHandoff.html
Repairs - http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsRepair.html
Read Repairs - those may not help much on your use case - http://wiki.apache.org/cassandra/ReadRepair

AFAIK, if you are running a version greater than 0.8, the hinted handoffs should duplicate the data after node restarts without the need for a repair, unless data is too old (this should not be the case for 10 minutes). I don't know why those handoffs where not sent to your replica node when it was restarted, it deserves some investigation.

Otherwise, when you restart the node, you can force Cassandra to make sure that data is consistent by running a repair (e.g. by running nodetool repair).

By your description I have the feeling you are getting confused between the coordinator node and the node that is getting the data (even if the two nodes hold the data, the distinction is important).

BTW, what is the client behaviour with metrics sharding between node 1 and node 2 you are describing? Neither KairosDB nor Cassandra work like that, is it your own client that is sending metrics to different KairosDB instances?

The Cassandra partition is not made on metric name but on row key (partition key exactly, but it's the same with kairosDB). So every 3-weeks data for each unique series will be associated a token based on hash code, this token will be use for sharding/replication on the cluster. KairosDB is able to communicate with several nodes and would round robin between those as coordinator nodes.

I hope this helps.