I have task to create a social feed(news feed). I think no need to explain the standard functionality - all are how as FB. I chose solution apache cassandra and designed a data column Posts for storing information about posts users:
CREATE TABLE Posts (
post_id uuid,
post_at timestamp,
user_id text,
name varchar,
category set<text>,
link varchar,
image set<varchar>,
video set<varchar>,
content map<text, text>,
private boolean,
PRIMARY KEY ((post_id, user_id), post_at)
)
WITH CLUSTERING ORDER BY (post_at DESC) COMPACT STORAGE;
The next table contains id user posts:
CREATE TABLE posts_user (
post_id bigint,
post_at timestamp,
user_id bigint,
PRIMARY KEY ((post_id), post_at, user_id)
)
WITH CLUSTERING ORDER BY (post_at DESC) AND COMPACT STORAGE;
How do you think, is it good? What do you do change in the such a data model?
There are a couple of questions and a couple of improvements that jump out.
COMPACT STORAGE is deprecated now (if you want to take advantage of CQL 3 features). I do not think that you can create your table Posts
as you have defined above since it uses CQL 3 features (collections) with COMPACT STORAGE as well as declaring more than one column that is not part of the primary key.
posts_user
has completely different key types than Posts
does. I am not clear on what the relationship between the two tables is, but I imagine that post_id
is supposed to be consistent between them, whereas you have it as a uuid
in one table and a bigint
in the other. There are also discrepancies with the other fields.
Assuming post_id
is unique and represents the id of an individual post, it is strange to have it as the first part of a compound primary key in the Posts
table since if you know the post_id
then you can already uniquely access the record. Furthermore, as it is part of the partition key it also prevents you from doing wider selects of multiple posts and taking advantage of your post_at
ordering.
The common method to fix this is to create a dedicated index table to sort the data the way you want.
E.g.
CREATE TABLE posts (
id uuid,
created timestamp,
user_id uuid,
name text,
...
PRIMARY KEY (id)
);
CREATE TABLE posts_by_user_index (
user_id uuid,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id,post_at,post_id)
WITH CLUSTERING ORDER BY (post_at DESC)
);
Or more comprehensively:
CREATE TABLE posts_by_user_sort_index (
user_id uuid,
post_id uuid,
sort_field text,
sort_value text,
PRIMARY KEY ((user_id,sort_field),sort_value,post_id)
);
However, in your case if you only wish to select the data one way, then you can get away with using your posts
table to do the sorting:
CREATE TABLE posts (
id uuid,
post_at timestamp,
user_id uuid,
name text,
...
PRIMARY KEY (user_id,post_at,id)
WITH CLUSTERING ORDER BY (post_at DESC)
);
It will just make it more complicated if you wish to add additional indexes later since you will need to index each post not just by its post id, but by its user and post_at fields as well.