I have a question about a potential scenario and wanted to know if our assumption is correct. (using cassandra 3.x with DSE 5.x)
We've learned from the docs that in order to add a new (and fresh) datacenter to a cluster, we need to temporarily set ReplicationFactor like so:
{'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 0 }
Where DC1 is the currently running datacenter and DC2 is the one we are adding. This test helped us understand the impact of the streaming of data from an existing live ring to a brand new one.
Now to our hypothetical scenario, which is to be able to start replicating a keyspace that was initially only replicated to one DC, to now save to other currently running DCs.
When creating the keyspace:
CREATE KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 0};
Then, when business requirements change:
ALTER KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 2};
Is it considered safer to define all new keyspaces in an application with all DCs to 0, so that the value can be modified at some point. And would changing that replication factor be enough to trigger the streaming of the keyspace to the other datacenters - or do we also need to run nodetool rebuild
?
The accepted practice is to simply not define a replication factor for a DC that you don't want a particular keyspace to replicate to. I don't think that anything bad would happen if you did it your way, but I feel that not defining it would be the safer way to go.
would changing that replication factor be enough to trigger the streaming of the keyspace to the other datacenters - or do we also need to run nodetool rebuild?
Altering the replication factor on the keyspace will tell all future writes to that keyspace to go to also the new data center. However, for the existing data to replicate to the new data center you will have to run a nodetool repair
or nodetool rebuild
.