
What are the rules-of-thumb for choosing the right partition key in Cassandra?


I have just started learning Cassandra and am digging deeper to understand what happens behind the scenes that makes it so fast. I went through the following docs1 & docs2 but am still confused about choosing the right partition key for my table.

I'm designing the data model for a test application similar to Slack, and I'm creating a messages table like:

CREATE TABLE messages (
    id uuid,
    work_space_id text,
    user_id text,
    channel_id text,
    body text,
    edited boolean,
    deleted boolean,
    year text,
    created_at TIMESTAMP,
    PRIMARY KEY (..................)
);

My query fetches all the messages for a given channel_id and work_space_id. So the following are the options in my mind for the primary key:

  1. PRIMARY KEY ((work_space_id, year), channel_id, created_at)
  2. PRIMARY KEY ((channel_id, work_space_id), created_at)

If I go with option 1, each workspace gets a separate partition per year. This might create a hotspot if one workspace has 100 million messages in a year while another has only a few hundred.

If I go with option 2, each workspace channel gets a separate partition. But what if there are 1 million workspaces, each with 1K channels? That would create about 1 billion partitions. I know the limit is 2 billion per table.

So what is the rule of thumb for choosing a partition key that will distribute data evenly and not create hotspots in a data center?


Solution

  • The primary rule of data modeling for Cassandra is that you must design a table for each application query. In your case, the app query needs to retrieve all messages based on the workspace and channel IDs.

    The two critical things from your app query which should guide you are:

    1. Retrieve multiple messages.
    2. Filter by workspace + channel IDs.

    The filter determines the partition key for the table, which is (workspace_id, channel_id). Since each partition contains rows of messages, we'll use the created_at column as the clustering key so messages are sorted in descending chronological order. That gives us:

    CREATE TABLE messages_by_workspace_channel_ids (
        workspace_id text,
        channel_id text,
        created_at timestamp,
        user_id text,
        body text,
        edited boolean,
        deleted boolean,
        PRIMARY KEY ((workspace_id, channel_id), created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC);
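
    Reading from this table is then a single-partition query. A minimal sketch of how the app-side read might be built, in Python (the helper name and column selection are illustrative, not from the original answer; executing the statement would require a real driver session):

```python
# Sketch: building the single-partition read for the table above.
# Because of CLUSTERING ORDER BY (created_at DESC), rows come back
# newest-first, so a plain LIMIT yields the most recent messages.

def latest_messages_query(limit: int = 50) -> str:
    """Return the CQL to fetch the newest `limit` messages in a channel."""
    return (
        "SELECT created_at, user_id, body "
        "FROM messages_by_workspace_channel_ids "
        "WHERE workspace_id = ? AND channel_id = ? "
        f"LIMIT {limit}"
    )

print(latest_messages_query(20))
```

    Note that the WHERE clause restricts on the full partition key only, so Cassandra can route the request straight to the replicas owning that partition.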
    

    Ordinarily we would stop there, but as you correctly pointed out, each channel could potentially have millions of messages, which would lead to very large partitions. To avoid that, we need to group the messages into "buckets" to keep the partitions small.

    You attempted to do that by grouping messages by year, but that may not be enough. The general recommendation is to keep partitions under 100MB for optimum performance -- smaller partitions are faster to read. We can make the partitions smaller by also grouping them by month:

    CREATE TABLE messages_by_workspace_channel_ids_yearmonth (
        workspace_id text,
        channel_id text,
        year int,
        month int,
        created_at timestamp,
        ...
        PRIMARY KEY ((workspace_id, channel_id, year, month), created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC);
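
    With time buckets in the partition key, the application must compute the same bucket on both the write path and the read path. A minimal sketch of that derivation (the helper name is illustrative):

```python
from datetime import datetime, timezone

def bucket_for(created_at: datetime) -> tuple[int, int]:
    """Derive the (year, month) partition-key components from a message
    timestamp, so writes and reads land on the same partition."""
    return created_at.year, created_at.month

# Example: a message sent in May 2023 belongs to the (2023, 5) bucket.
ts = datetime(2023, 5, 17, 9, 30, tzinfo=timezone.utc)
print(bucket_for(ts))  # (2023, 5)
```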
    

    You could make them even smaller by further grouping them by date:

    CREATE TABLE messages_by_workspace_channel_ids_createdate (
        workspace_id text,
        channel_id text,
        createdate date,
        created_at timestamp,
        ...
        PRIMARY KEY ((workspace_id, channel_id, createdate), created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC);
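
    The trade-off with small date buckets is that a request for "the latest N messages" may span several partitions, so the read path has to walk the buckets backwards until it has enough rows. A hedged sketch of that loop (the `fetch` callback stands in for a real per-partition SELECT; names are illustrative):

```python
from datetime import date, timedelta
from typing import Callable

def recent_messages(fetch: Callable[[date], list],
                    today: date, want: int, max_days: int = 30) -> list:
    """Walk day buckets backwards from `today`, collecting messages until
    we have `want` of them, giving up after `max_days` buckets.
    `fetch(day)` stands in for a per-partition query on the table above."""
    out: list = []
    day = today
    for _ in range(max_days):
        out.extend(fetch(day))
        if len(out) >= want:
            break
        day -= timedelta(days=1)
    return out[:want]

# Toy usage with an in-memory stand-in for the per-day partitions:
store = {date(2023, 5, 17): ["m3", "m2"], date(2023, 5, 16): ["m1"]}
msgs = recent_messages(lambda d: store.get(d, []), date(2023, 5, 17), 3)
print(msgs)  # ['m3', 'm2', 'm1']
```

    The `max_days` cap matters: without it, a quiet channel would trigger an unbounded series of single-partition reads.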
    

    The more "buckets" you use, the more partitions the table will have -- which is a good thing, since more partitions mean greater distribution of data across the cluster. Cheers!
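
    To decide between month and day buckets, a back-of-envelope calculation against the ~100MB guideline helps. A sketch with assumed, purely illustrative numbers (real row sizes and traffic should be measured, e.g. with nodetool tablehistograms):

```python
# Back-of-envelope partition sizing against the ~100 MB guideline.
# All figures below are assumptions for illustration, not measurements.

AVG_ROW_BYTES = 500               # assumed average encoded size of one message row
MSGS_PER_DAY = 20_000             # assumed traffic of a very busy channel
TARGET_BYTES = 100 * 1024 * 1024  # ~100 MB partition-size guideline

def partition_bytes(days_per_bucket: int) -> int:
    """Estimated partition size for a bucket spanning `days_per_bucket` days."""
    return MSGS_PER_DAY * AVG_ROW_BYTES * days_per_bucket

for label, days in [("month", 30), ("day", 1)]:
    size = partition_bytes(days)
    print(f"{label}-bucket: ~{size / 1024 / 1024:.0f} MB, "
          f"{'OK' if size <= TARGET_BYTES else 'too big'}")
```

    With these assumed numbers a month bucket lands well over the guideline while a day bucket stays comfortably under it, which is exactly the situation where the date-bucketed table earns its keep.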
