I need to store users lists by customer in cassandra. There are two basic approaches I see:
A: create table users ( // one row per user
customer int, userId int, primary key (customer, userId),
login text, name text, email text
);
or
B: create table users ( // one row per customer
customer int primary key, users map<int, text>
);
where in the second approach I would store a JSON representation of the user data as "text".
I will have the following operations on the table:
Here are the questions:
1) For large users lists, B is a bad idea. What order of magnitude would "large" be?
2) Would you expect B to have better performance for small users lists? What order of magnitude would "small" be?
3) What other advantages / disadvantages do you see for A or B?
(For those who need to know: I'm using scala / datastax driver / phantom to access the database.)
I would stick with A, definitely.
Collections can have at most 64k queryable elements so that's your hard limit. And C* reads all the collection during queries, so you want to keep the collections as empty as possible to avoid huge read penalties.
I expect the performance to be of the same order of magnitude because both are sequential reads.
In B you will use not idempotent queries to update the collection. My mistake, it's a map, not a list.
A makes very easy to update your schema. In B you'd need to read-modify-write your records.
Stick with A.