Search code examples
performancecassandracqlcql3

Storing items in a map or in rows in Cassandra


I need to store users lists by customer in cassandra. There are two basic approaches I see:

A: create table users (  // one row per user
     customer int, userId int, primary key (customer, userId),
     login text, name text, email text
   );

or

B: create table users (  // one row per customer
     customer int primary key, users map<int, text>
   );

where in the second approach I would store a JSON representation of the user data as "text".

I will have the following operations on the table:

  • insert / update / delete single user
  • read all users for a customer
  • read a single user by id and customer

Here are the questions:

1) For large users lists, B is a bad idea. What order of magnitude would "large" be?

2) Would you expect B to have better performance for small users lists? What order of magnitude would "small" be?

3) What other advantages / disadvantages do you see for A or B?

(For those who need to know: I'm using scala / datastax driver / phantom to access the database.)


Solution

  • I would stick with A, definitely.

    1. Collections can have at most 64k queryable elements so that's your hard limit. And C* reads all the collection during queries, so you want to keep the collections as empty as possible to avoid huge read penalties.

    2. I expect the performance to be of the same order of magnitude because both are sequential reads.

    3. In B you will use not idempotent queries to update the collection. My mistake, it's a map, not a list.

    4. A makes very easy to update your schema. In B you'd need to read-modify-write your records.

    Stick with A.