Search code examples
cassandracqldatastaxcql3datastax-java-driver

Pros and cons of given cassandra schema


I have this schema to store a post and its comment together in same table:

CREATE TABLE post ( 
  post_id int,
  access_key text,
  comment_id int,
  title text,
  comments FROZEN <type_comment>,
  PRIMARY KEY ((post_id, access_key), comment_id)
);

CREATE TYPE ks_test.type_comment (
  id int,
  content text
);

here is a sample data

post_id | access_key | comment_id | comments                 | title
---------+------------+------------+--------------------------+--------------
       1 | about_post |          1 |                     null | this is post
       1 |   comments |          2 | {id: 2, content: 'cmn1'} |         null
       1 |   comments |          3 | {id: 3, content: 'cmn2'} |         null
       1 |   comments |          4 | {id: 4, content: 'cmn3'} |         null

I am using this schema so that i have to access only one table to get a post and its comment. What will the pros & cons of this schema ?


Solution

  • Really thats fine and will work if you can figure out how to fill in the id fields.

    I think the comment_id and id in type_comment is redundant. Can replace entire comments column with content.

    Personally I would replace post_id with a time uuid since there is no auto incrementing thing id in Cassandra. Auto incrementing globally unique identifiers in a distributed environment is difficult. There are some things out there to do it for you but really, just using a time uuid/random uuid is the easiest.

    CREATE TABLE post ( 
      post_id uuid,
      access_key text,
      comment_id timeuuid,
      title text,
      content text,
      PRIMARY KEY ((post_id, access_key), comment_id)
    );
    

    If want to grab it by date can do something like

    CREATE TABLE post ( 
      year int,
      month int,
      access_key text,
      comment_id timeuuid,
      title text,
      content text,
      PRIMARY KEY ((access_key, year, month), comment_id)
    );
    

    then you can grab by month and can also use ranges to grab within periods of that month. Could also make a "time_bucket" to replace year/month as an iso time string like "2016-01-01 00:00:00" than you can change bucketing easier (ie months, days etc).