Please bear with me for a slightly longer problem description. I am new to Cassandra and am trying to migrate my current product from an Oracle-based data layer to Cassandra.
To support range queries, I have created a table like the one below:
create table if not exists my_system.my_system_log_dated(
    id uuid,
    client_request_id text,
    tenant_id text,
    vertical_id text,
    channel text,
    event text,
    event_type text,
    created_date date,
    primary key((created_date, tenant_id, vertical_id, channel, event),
        event_type, client_request_id, id)
) with clustering order by (created_date desc);
Now, I have come across several documentation pages, resources, and blogs that say I should keep my partition size under 100 MB for an optimally performing cluster. With the volume of traffic my system handles per day for certain combinations of the partition key, there is no way I can keep it under 100 MB with the partition key above.
To fix this, I introduced a new column called bucket_id and was thinking of assigning it the hour of the day, to break partitions into smaller chunks and keep them under 100 MB (this means I have to do 24 reads to serve traffic details for one day, but I am fine with some inefficiency in reads). Here is the schema with the bucket id:
create table if not exists my_system.my_system_log_dated(
    id uuid,
    client_request_id text,
    tenant_id text,
    vertical_id text,
    channel text,
    event text,
    bucket_id int,
    event_type text,
    created_date date,
    primary key((created_date, tenant_id, vertical_id, channel, event,
        bucket_id), event_type, client_request_id, id)
) with clustering order by (created_date desc);
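To make the read pattern concrete, here is a rough sketch of the query I would issue per bucket with this schema. All literal values (date, tenant, vertical, channel, event) are made up for illustration; the query has to be repeated for bucket_id 0 through 23 to cover one day.

```cql
-- One of the 24 reads needed for a single day: every partition key
-- column must be given, including the hourly bucket.
-- All literal values below are hypothetical.
SELECT id, client_request_id, event_type
FROM my_system.my_system_log_dated
WHERE created_date = '2020-01-15'
  AND tenant_id = 'tenant-a'
  AND vertical_id = 'vertical-x'
  AND channel = 'web'
  AND event = 'login'
  AND bucket_id = 13;
```

A single query with bucket_id IN (0, 1, ..., 23) would also be accepted, but then one coordinator has to fan out to all 24 partitions, so I am assuming separate queries here.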
Even with this, a couple of partition key combinations exceed 100 MB, while all the other volume sits comfortably within the limit.
With this situation in mind, I have the following questions:
Here is some more info that I thought may be useful:
Thanks in advance for your inputs!!
Your approach with the bucket id looks good. Answering your questions:
my_system.my_system_log_dated. Check how to configure this compaction strategy, because the time window you set will be very important.
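As a rough sketch only, assuming the strategy in question is TimeWindowCompactionStrategy (that is my assumption, so adjust if you end up on a different one), the time window is set through the table's compaction options. The window unit and size below are placeholder values, not a recommendation for your workload.

```cql
-- Assumes TimeWindowCompactionStrategy; the window values are illustrative.
ALTER TABLE my_system.my_system_log_dated
WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',   -- MINUTES, HOURS or DAYS
  'compaction_window_size': '1'       -- number of units per window
};
```

The window should line up with how the data is written and expired, so it is worth testing a few settings against your real traffic before settling on one.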