Search code examples
cassandradata-modelingdenormalizationnosql

How is denormalization handled in cassandra


What is the best approach to update table with duplicate data?

I have a table

table users (
id text PRIMARY KEY,
email text,
description,
salary
)

I will delete, update, insert etc to this table. But I also have a requirement to be able to search by email, and description. If I create new table with new composite keys for email, and description,

when I update my base table I do

insert into users (id, salary) values (1, 500);

I do not have the required data to also update my secondary table since all the client has is id and salary. How is the second table updated.

Other workarounds and shortcomings

  1. I could have created a materialized view, but since the base table has only one primary key I can only add one more column. my search requirement requires more than one column.
  2. Create secondary indexes on the columns that will be searched on. But the performance for this would be bad since the columns I will be searching on would have high cardinality. i.e. description, email, etc

Solution

  • So, the "correct" way of doing this is to create 3 tables. salary_by_id, salary_by_email and salary_by_description.

    table salary_by_id (
      id text PRIMARY KEY,
      salary int
    )
    
    table salary_by_email (
      email text PRIMARY KEY,
      salary int
    )
    
    table salary_by_description (
      description text,
      id int,
      salary int,
      primary key (description, id)
    )
    

    The reason i added id to salary_by_description is that, from my own guessing, description won't be globally uniq, so it has to have something else in it's primary key.

    Depending on the size of these tables the last one might need something extra added to it's partitioning key. And if needed you can add id, email and description to the other tables.

    Now, when inserting or deleting values you need so do it in all 3 tables. If you use a driver, like in java, that supports asynchronous calls, then this doesn't cost very much extra.