Stopping Cassandra replication at regular intervals

We have an idea that we want to think through and maybe use in production.

We want to use datacenter 1 as the primary Cassandra cluster and datacenter 2 as the secondary, with a third datacenter running a Cassandra cluster as the backup.

Datacenters 1 and 2 replicate as normal. Datacenter 3 should lag 30 minutes behind. We want to force this by shutting down replication to datacenter 3, basically by stopping and starting gossip via nodetool every 30 minutes.

I think this should work once in a while (for example, when you have a schema change, you switch off DC3, apply the update, and if something goes wrong you bootstrap DC1 and DC2 from DC3 with empty data), but what happens if you do it regularly?

What do you think? Will replication break sooner or later?

Solution

What do you want to accomplish?

First, if you do this, DC3 won't really lag 30 minutes behind all the time. In fact, DC3 gets no updates for 30 minutes and then catches up to a 'consistent' state with almost no lag.
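
To make the point concrete, here is a minimal sketch of how the lag behaves over time. It assumes a simplified model in which DC3 catches up instantly at the start of every 30-minute interval and receives nothing in between; the function name `dc3_lag_minutes` is hypothetical and only for illustration.

```python
def dc3_lag_minutes(minute: int, interval: int = 30) -> int:
    """Lag of DC3 at a given minute under the simplified model:
    instant catch-up every `interval` minutes, no updates in between."""
    return minute % interval


# Lag over three intervals (90 minutes): a sawtooth, not a constant 30.
lags = [dc3_lag_minutes(m) for m in range(90)]
print(min(lags), max(lags), sum(lags) / len(lags))  # prints: 0 29 14.5
```

So right after each catch-up, a bad write in DC1 or DC2 would reach DC3 almost immediately; only on average is DC3 about 15 minutes behind.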

All writes destined for DC3 will end up in hints, so you need to be sure none of them get lost, or you will need repairs really often. By default, hints are only written for a node that has been down for up to 3 hours (max_hint_window_in_ms), and hint delivery is throttled as well. That might fail in some situations, e.g. at times of high write load. (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__max_hint_window_in_ms)
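
A rough back-of-the-envelope check for the hint concern might look like the sketch below. The write rate and throttle values are made-up example numbers, the helper names are hypothetical, and the model ignores everything except raw volume, so treat it as an estimate only.

```python
# 3 hours in milliseconds, matching the default hint window mentioned above.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def outage_within_hint_window(outage_minutes: float,
                              max_hint_window_ms: int = DEFAULT_MAX_HINT_WINDOW_MS) -> bool:
    """True if hints are still being written for the whole pause."""
    return outage_minutes * 60 * 1000 <= max_hint_window_ms


def replay_seconds(write_kb_per_s: float, outage_minutes: float,
                   throttle_kb_per_s: float) -> float:
    """Estimated time to deliver the hint backlog at a given throttle."""
    backlog_kb = write_kb_per_s * outage_minutes * 60
    return backlog_kb / throttle_kb_per_s


print(outage_within_hint_window(30))   # True: 30 minutes is inside 3 hours
# Assumed numbers: 2048 KB/s of writes, 1024 KB/s effective hint throttle.
print(replay_seconds(2048, 30, 1024))  # 3600.0 seconds, i.e. 60 minutes
```

With these assumed numbers, replaying the backlog takes longer than the 30-minute pause itself, so DC3 would never fully catch up before the next stop.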

Do you want a safe point for disaster recovery?

You can create snapshots in DC3 via cron at 30-minute intervals. This is fast, needs no additional space at first (hard links), and you can easily recover from the snapshots. Tagging them with a timestamp such as 201707-1200 makes it easy to find the right ones, and also to clean up old ones.

http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html

http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupSnapshotRestore.html
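
The cron-driven snapshot idea could be sketched roughly as follows. This only builds the `nodetool snapshot -t` and `nodetool clearsnapshot -t` command lines and decides which tags are old enough to clear; the tag layout `%Y%m%d-%H%M`, the keyspace name `my_ks`, the retention period, and all function names are assumptions for illustration, not something prescribed by Cassandra.

```python
from datetime import datetime, timedelta
from typing import List

TAG_FORMAT = "%Y%m%d-%H%M"  # assumed tag layout, e.g. 20170715-1200


def snapshot_tag(now: datetime) -> str:
    """Timestamp tag for a snapshot taken at `now`."""
    return now.strftime(TAG_FORMAT)


def snapshot_command(tag: str, keyspace: str) -> List[str]:
    """Command line that snapshots one keyspace under the given tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def expired_tags(tags: List[str], now: datetime, keep: timedelta) -> List[str]:
    """Tags whose timestamp is older than the retention period."""
    cutoff = now - keep
    return [t for t in tags if datetime.strptime(t, TAG_FORMAT) < cutoff]


def clear_command(tag: str) -> List[str]:
    """Command line that removes the snapshot with the given tag."""
    return ["nodetool", "clearsnapshot", "-t", tag]


now = datetime(2017, 7, 15, 12, 0)
print(snapshot_command(snapshot_tag(now), "my_ks"))
print(expired_tags(["20170714-1200", "20170715-1130"], now, timedelta(hours=12)))
```

A script run from cron every 30 minutes on each DC3 node could pass these lists to `subprocess.run(..., check=True)`. Keep in mind that old snapshots pin SSTables on disk once compaction has replaced them, so the cleanup step matters.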