I have this schema to store a post
and its comment together in same table:
CREATE TABLE post (
post_id int,
access_key text,
comment_id int,
title text,
comments FROZEN <type_comment>,
PRIMARY KEY ((post_id, access_key), comment_id)
);
CREATE TYPE ks_test.type_comment (
id int,
content text
);
here is a sample data
post_id | access_key | comment_id | comments | title
---------+------------+------------+--------------------------+--------------
1 | about_post | 1 | null | this is post
1 | comments | 2 | {id: 2, content: 'cmn1'} | null
1 | comments | 3 | {id: 3, content: 'cmn2'} | null
1 | comments | 4 | {id: 4, content: 'cmn3'} | null
I am using this schema so that i have to access only one table to get a post
and its comment. What will the pros & cons of this schema ?
Really thats fine and will work if you can figure out how to fill in the id fields.
I think the comment_id
and id
in type_comment
is redundant. Can replace entire comments
column with content
.
Personally I would replace post_id with a time uuid since there is no auto incrementing thing id in Cassandra. Auto incrementing globally unique identifiers in a distributed environment is difficult. There are some things out there to do it for you but really, just using a time uuid/random uuid is the easiest.
CREATE TABLE post (
post_id uuid,
access_key text,
comment_id timeuuid,
title text,
content text,
PRIMARY KEY ((post_id, access_key), comment_id)
);
If want to grab it by date can do something like
CREATE TABLE post (
year int,
month int,
access_key text,
comment_id timeuuid,
title text,
content text,
PRIMARY KEY ((access_key, year, month), comment_id)
);
then you can grab by month and can also use ranges to grab within periods of that month. Could also make a "time_bucket" to replace year/month as an iso time string like "2016-01-01 00:00:00" than you can change bucketing easier (ie months, days etc).