Search code examples
cassandradenormalization

Update large denormalized data in Cassandra


Let's say I have my data model in this way:

CREATE TABLE users (
   id UUID primary key,
   username text,
   ..10 more columns..
)

CREATE TABLE posts(...)

CREATE TABLE comments (
   id UUID primary key,
   post_id,
   user_id UUID,
   username text,
   comment text,
   ..etc..
)

The comments table contains all the comments the user posted, it contains also the user id and name, so we can retrieve all the data we need to display with one query.

Now, let's say the db contains 100.000 comments, and the user decides to change his username.

Should I really perfom 100.000 updates to change the username in all the comments?


Solution

  • Well yes, you have to update all comments. This is classical Normalization vs Denormalization (consistency vs performance) dilemma.

    So you have to decide if you want to store username in comments: lots of additional reads (can be cached) or some cumbersome mass updates (maybe username change is not that common?).

    Or you can leave it as it is and say it's a data retention policy :)