Search code examples
cassandracqlcassandra-3.0

Manage many to many relationship in Cassandra


I have these two tables:

create table users (
    id UUID,
    email ascii,
    created_at timeuuid,
    primary key(id, email)
);
create table groups (
    id UUID,
    name ascii,
    created_at timeuuid,
    primary key(id, name)
);

A user can be in multiple groups, a group can obviously have multiple users.

So I've two ways to maintain a many-to-many relationship (taken from here), one is:

CREATE TABLE user_group (
  user UUID,
  group UUID,
  PRIMARY KEY (user, group)
)

Another one is (using sets):

CREATE TABLE user_jn_group (
  user UUID PRIMARY KEY,
  groups set<UUID>
)
CREATE TABLE group_jn_user (
  group UUID PRIMARY KEY,
  users set<UUID>
)

I'm using Cassandra 3.9.0. I know both approaches have their own advantages, disadvantages. I want the least duplicity, but also I have an equal weight to read/write speed. Also, is there any more hidden cost behind any of both approaches?


Solution

  • Using collections for this is probably impractical because of the size limit on collections (although that shouldn't be a concern for a system with just a few users), chances are high that the set of users in a group will be too large.

    It's also worth noting that your solution based on the user_group table won't work as it won't support querying by group. You would need to maintain another table to support this query (and always maintain the two records):

    CREATE TABLE group_user (
      user UUID,
      group UUID,
      PRIMARY KEY (group, user)
    )
    

    This will allow querying by group.


    Additional options:

    Add a secondary index to user_group:
    Another approach is to expand the user_group solution: if you have a secondary index on the group field, you'll be able to perform lookups in both ways:

    CREATE INDEX ON user_group (group);
    

    Use a materialized view
    You can also use a materialized view instead of a group_user table. The data between user_group and this view will be kept in sync by cassandra (eventually):

    CREATE MATERIALIZED VIEW group_user
    AS SELECT group, user
    FROM user_group
    WHERE user IS NOT NULL AND group IS NOT NULL
    PRIMARY KEY (group, user);
    

    With this, you'll have to add a record to user_group only and the view will take care of searches by group.

    As you noted, each has pros and cons that can't be detailed here. Please check the docs on limitations of each option.