I am new to Cassandra and have created a cluster with the following specification.
How can I make sure that Cassandra spreads data evenly around the cluster?
node count: 4
replication_factor: 3
table schema:
CREATE TABLE space.user (
id uuid PRIMARY KEY,
firstname text,
lastname text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The simplest way is to use nodetool status and check the value shown in the Load column, which is the amount of data on disk for each node. Note that it can also include data that hasn't been cleaned up yet: if you changed the topology, you may need to run nodetool cleanup to remove data the node no longer owns.
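To compare the Load values, it can help to express each node's load as a percentage deviation from the cluster average. The following is only a minimal sketch: the function name load_deviation_percent, the node addresses, and the load figures are all hypothetical, and you would copy the real numbers by hand from the Load column.

```python
def load_deviation_percent(loads):
    """Return each node's deviation from the mean load, in percent.

    `loads` maps a node name to its on-disk load (any unit, as long as
    every node uses the same one).
    """
    mean = sum(loads.values()) / len(loads)
    return {node: (load - mean) * 100 / mean for node, load in loads.items()}


if __name__ == "__main__":
    # Hypothetical values, NOT real measurements: replace them with the
    # figures reported for your own nodes.
    example_loads_gib = {
        "10.0.0.1": 10.4,
        "10.0.0.2": 9.7,
        "10.0.0.3": 10.1,
        "10.0.0.4": 9.8,
    }
    for node, deviation in load_deviation_percent(example_loads_gib).items():
        print(f"{node}: {deviation:+.1f}% from average")
```

A node that sits far above the others may simply be holding data it no longer owns, so it is worth running the cleanup before concluding that the distribution is uneven.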
Basically, you shouldn't see very big differences between nodes, but this depends on the value you specified for num_tokens. With 8 tokens per server, the difference could be around ±10-12% from the average size. With a higher number of tokens, the difference could be smaller.
But in your case I think the difference between nodes won't be very big, because you have very small rows, and first name/last name values shouldn't be very big.
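The effect of num_tokens on balance can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not a model of Cassandra itself: it assumes every node picks its tokens uniformly at random on a ring of 2^64 positions (the size of the Murmur3Partitioner token space), it looks only at primary ownership and ignores replication, and the function names are made up for the illustration. The percentages it prints depend on those assumptions and on the random seed, so they need not match the figures quoted above.

```python
import random

# Size of the token space, assuming Murmur3Partitioner (-2^63 .. 2^63 - 1).
RING_SIZE = 2**64


def ownership_fractions(node_count, num_tokens, rng):
    """Return the fraction of the ring each node owns as primary replica."""
    # Give every node `num_tokens` random positions on the ring.
    tokens = []
    for node in range(node_count):
        for _ in range(num_tokens):
            tokens.append((rng.randrange(RING_SIZE), node))
    tokens.sort()

    # A token owns the range from the previous token (exclusive) up to
    # itself (inclusive); the first range wraps around from the last token.
    owned = [0] * node_count
    previous = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        owned[node] += token - previous
        previous = token
    return [o / RING_SIZE for o in owned]


def mean_worst_deviation(node_count, num_tokens, trials=200, seed=42):
    """Average, over many random rings, of the worst node's relative
    deviation from the ideal share of 1 / node_count."""
    rng = random.Random(seed)
    ideal = 1.0 / node_count
    total = 0.0
    for _ in range(trials):
        fractions = ownership_fractions(node_count, num_tokens, rng)
        total += max(abs(f - ideal) for f in fractions) / ideal
    return total / trials


if __name__ == "__main__":
    # 4 nodes, as in the question; compare several num_tokens settings.
    for num_tokens in (1, 8, 64, 256):
        deviation = mean_worst_deviation(4, num_tokens) * 100
        print(f"num_tokens={num_tokens:>3}: worst node is off by about {deviation:.1f}%")
```

Running it should show the worst-case deviation shrinking as num_tokens grows, which is the trend described above.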