I'm having trouble understanding one thing from this article: http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
Exercise: we want to get all users by groupname.
Solution:
CREATE TABLE groups (
groupname text,
username text,
email text,
age int,
PRIMARY KEY (groupname, username)
);
SELECT * FROM groups WHERE groupname = 'footballers';
But to find all users in a group, we could also set PRIMARY KEY (groupname), and that works too.
Why is a clustering key (username) needed in this case? I know that when username is a clustering key we can use it in a WHERE clause. But when we only look users up by groupname, is there any difference between PRIMARY KEY (groupname) and PRIMARY KEY (groupname, username) in terms of query efficiency?
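Clustering keys provide multiple benefits: query flexibility, result-set ordering (within a partition key), and extended uniqueness (a row is identified by the partition key and the clustering key together, not by the partition key alone).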
"But to find all users in group we can set: PRIMARY KEY (groupname)"
Try that once. Create a new table using only groupname as your PRIMARY KEY, and then try to insert multiple usernames for each group. You will find that there will only ever be one row per group, and that the username column will be overwritten by each new user inserted into that group.
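The experiment above can be simulated without a cluster. This is a toy in-memory model in Python, not Cassandra or its driver; the upsert and rows_in_group helpers and the sample users are hypothetical. It assumes only Cassandra's upsert behaviour: writing a row whose full primary key already exists overwrites that row's columns instead of adding a new row.

```python
def upsert(table, key_columns, row):
    """Hypothetical helper: store row under its primary key. As with a
    Cassandra INSERT, an existing row with the same key is overwritten."""
    key = tuple(row[c] for c in key_columns)
    table.setdefault(key, {}).update(row)


def rows_in_group(table, group):
    # Emulates the result of: SELECT * FROM ... WHERE groupname = <group>
    return [row for key, row in table.items() if key[0] == group]


users = [
    {"groupname": "footballers", "username": "beckham", "email": "b@example.com", "age": 31},
    {"groupname": "footballers", "username": "messi", "email": "m@example.com", "age": 29},
]

by_group_only = {}      # models PRIMARY KEY (groupname)
by_group_and_user = {}  # models PRIMARY KEY (groupname, username)

for user in users:
    upsert(by_group_only, ["groupname"], user)
    upsert(by_group_and_user, ["groupname", "username"], user)

print([r["username"] for r in rows_in_group(by_group_only, "footballers")])      # ['messi']
print([r["username"] for r in rows_in_group(by_group_and_user, "footballers")])  # ['beckham', 'messi']
```

With the single-column key, the second insert lands on the same primary key ('footballers') and replaces beckham's values; with the compound key, each (groupname, username) pair is a distinct row.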
"But to find users only by groupname is any difference between PRIMARY KEY (groupname) and PRIMARY KEY (groupname, username) in terms of query efficiency?"
If PRIMARY KEY (groupname) performs faster, the most likely reason is that it can only ever return a single row.
In this case, defining username as a clustering key provides:
- The ability to sort by username within a group.
- The ability to query for a specific username within a group.
- The ability to add multiple usernames within a group.
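The three points above can be illustrated with a toy model of a single partition (groupname = 'footballers') in Python. It is a hypothetical sketch, not Cassandra's storage engine: the Partition class and the sample rows are made up, and it assumes only that rows inside a partition are kept in clustering-column order (ascending by default). The get method mirrors a query such as WHERE groupname = 'footballers' AND username = 'messi', and slice mirrors a range restriction on the clustering column, e.g. AND username >= 'c' AND username < 'n'.

```python
from bisect import bisect_left, insort


class Partition:
    """Hypothetical toy model of one partition of a table with
    PRIMARY KEY (groupname, username); not a Cassandra API."""

    def __init__(self):
        self.keys = []  # usernames, kept sorted like a clustering column
        self.rows = {}  # username -> row

    def insert(self, username, **columns):
        if username not in self.rows:
            insort(self.keys, username)  # a new username adds a new row
        # An existing username is upserted: its columns are overwritten.
        self.rows.setdefault(username, {"username": username}).update(columns)

    def all_rows(self):
        # Like WHERE groupname = ?: rows come back already ordered by username.
        return [self.rows[k] for k in self.keys]

    def get(self, username):
        # Like WHERE groupname = ? AND username = ?
        return self.rows.get(username)

    def slice(self, start, end):
        # Like WHERE groupname = ? AND username >= start AND username < end
        lo = bisect_left(self.keys, start)
        hi = bisect_left(self.keys, end)
        return [self.rows[k] for k in self.keys[lo:hi]]


footballers = Partition()
footballers.insert("zidane", email="z@example.com", age=30)
footballers.insert("beckham", email="b@example.com", age=31)
footballers.insert("messi", email="m@example.com", age=29)

print([r["username"] for r in footballers.all_rows()])       # ['beckham', 'messi', 'zidane']
print(footballers.get("messi")["age"])                       # 29
print([r["username"] for r in footballers.slice("c", "n")])  # ['messi']
```

Because the rows are already held in username order, both the full-partition read and the range slice return results without a separate sort step; with PRIMARY KEY (groupname) alone there would be nothing to slice, since each partition holds a single row.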