I have a Cassandra cluster (deployed on EC2 instances) in which every node is about to run out of disk space. If I add more instances to the cluster, will that increase the available disk space?
What I mean is: whenever we are running out of space, can we add more instances to the Cassandra cluster to increase overall disk space?
If so, is that the right way to do it?
What I mean is: whenever we are running out of space, can we add more instances to the Cassandra cluster to increase overall disk space?
Yes, and yes.
Consider a 4-node cluster with a replication factor (RF) of 3 and 100 GiB of storage per node. Assume that one complete copy of the data is initially 60 GiB. With 4 nodes and an RF of 3, each node is responsible for 3/4 of the data, or 45 GiB.
Address    Load       Owns    Total
10.0.0.1   45.0 GiB   75.0%   100 GiB
10.0.0.2   45.0 GiB   75.0%   100 GiB
10.0.0.3   45.0 GiB   75.0%   100 GiB
10.0.0.4   45.0 GiB   75.0%   100 GiB
With size-tiered compaction (the default), you want to keep each node under 50% disk usage, because a compaction can temporarily need about as much free space as the data it is rewriting. This setup allows for that.
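As a rough sketch of the arithmetic above, the per-node load can be estimated from the size of one full copy of the data, the node count, and the RF. The function name `per_node_load` is made up for illustration, and the estimate assumes data is spread perfectly evenly across nodes, which real clusters only approximate.

```python
def per_node_load(full_copy_gib, nodes, rf):
    """Estimate the data held by each node (assumes even token distribution)."""
    # Each node holds rf/nodes of the data set, capped at one full copy.
    ownership = min(rf / nodes, 1.0)
    return full_copy_gib * ownership


disk_gib = 100  # storage per node, from the example above
load = per_node_load(60, nodes=4, rf=3)
print(f"{load:.1f} GiB per node, {load / disk_gib:.0%} of disk")
# prints: 45.0 GiB per node, 45% of disk
```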
However, let's say the app team runs a big load overnight. We come in tomorrow morning, and find this:
Address    Load       Owns    Total
10.0.0.1   70.0 GiB   75.0%   100 GiB
10.0.0.2   70.0 GiB   75.0%   100 GiB
10.0.0.3   70.0 GiB   75.0%   100 GiB
10.0.0.4   70.0 GiB   75.0%   100 GiB
Essentially, a complete copy of the data has grown to 93.3 GiB (70 GiB / 0.75). To bring the amount of data per node back below 50% of its disk, we will have to add more nodes.
But how many?
If we add a single node (maintaining an RF of 3), each node becomes responsible for 3/5 (60%) of the data, which is 55.98 GiB. Close, but still above 50 GiB.
If we add two nodes, that brings us to a total of 6, which means that each node is responsible for 3/6 (50%) of the data, which is 46.65 GiB. That does bring us back under 50% per node, so we should add at least two nodes.
After doing so, the cluster should look like this:
Address    Load        Owns    Total
10.0.0.1   46.65 GiB   50.0%   100 GiB
10.0.0.2   46.65 GiB   50.0%   100 GiB
10.0.0.3   46.65 GiB   50.0%   100 GiB
10.0.0.4   46.65 GiB   50.0%   100 GiB
10.0.0.5   46.65 GiB   50.0%   100 GiB
10.0.0.6   46.65 GiB   50.0%   100 GiB
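The "how many nodes?" reasoning above can be sketched as a small search: recover the size of one full copy from the observed load, then grow the cluster until the per-node load drops under the target. The helper name `nodes_needed` and the `max_fill` parameter are hypothetical, and the sketch again assumes an even distribution of data.

```python
def nodes_needed(load_per_node_gib, nodes, rf, disk_gib, max_fill=0.5):
    """Return (cluster size, per-node load) that gets each node under max_fill."""
    # Observed load = full copy * rf/nodes, so invert that to get the full copy.
    full_copy = load_per_node_gib / min(rf / nodes, 1.0)
    target = disk_gib * max_fill
    n = nodes
    # Add one node at a time until the per-node load is strictly under target.
    while full_copy * min(rf / n, 1.0) >= target:
        n += 1
    return n, full_copy * min(rf / n, 1.0)


total_nodes, load = nodes_needed(70, nodes=4, rf=3, disk_gib=100)
print(f"{total_nodes} nodes total, about {load:.2f} GiB per node")
```

For the numbers in this example it returns 6 nodes. The per-node figure comes out at about 46.67 GiB rather than 46.65 GiB, because the sketch does not round the full copy to 93.3 GiB first.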
Note that simply bootstrapping new nodes only copies data to those nodes. It does not remove it from the existing nodes. For that, you should run nodetool cleanup on each pre-existing node.
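A minimal sketch of that last step, assuming you have SSH access to every node and that nodetool is on the remote PATH (both are assumptions, as are the helper names and the host list, which is taken from the example above). It runs the nodes one at a time, since cleanup rewrites SSTables and is I/O heavy.

```python
import subprocess

# The four nodes that existed before the new ones were bootstrapped.
PRE_EXISTING_NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]


def cleanup_command(host):
    """Build the ssh command that runs nodetool cleanup on one host."""
    return ["ssh", host, "nodetool", "cleanup"]


def run_cleanup(hosts, dry_run=True):
    """Run cleanup on each host in turn; with dry_run, only print the commands."""
    for host in hosts:
        cmd = cleanup_command(host)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop if cleanup fails on a node.
            subprocess.run(cmd, check=True)


run_cleanup(PRE_EXISTING_NODES)  # dry run: prints the commands only
```

Pass dry_run=False only once the new nodes have finished joining the cluster.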