cassandra, data-modeling, cql

Can a Cassandra table be queried using only part of the composite partition key?


Consider a table like this to store a user's contacts -

CREATE TABLE contacts (
    user_name text,
    contact_name text,
    contact_id int,
    contact_data blob,
    PRIMARY KEY ((user_name, contact_name), contact_id)
    //           ^-- Note the composite partition key
);

The composite partition key results in a row per contact.

Let's say there are 100 million users and every user has a few hundred contacts.

I can look up a particular user's particular contact's data by using

SELECT contact_data FROM contacts WHERE user_name='foo' AND contact_name='bar'

However, is it also possible to look up all contact names for a user with something like the following?

SELECT contact_name FROM contacts WHERE user_name='foo'

That is, could the WHERE clause contain only some of the columns that form the primary key?

EDIT -- I tried this and Cassandra doesn't allow it. So my question now is, how would you model the data to support two queries -

  1. Get data for a specific user & contact
  2. Get all contact names for a user

I can think of two options -

  1. Create another table containing user_name and contact_name with only user_name as the primary key. But then if a user has too many contacts, could that be a wide row issue?
  2. Create an index on user_name. But given 100M users with only a few hundred contacts per user, would user_name be considered a high-cardinality value and hence a poor fit for an index?

Solution

  • In an RDBMS, the query planner might be able to build an efficient plan for that kind of query, but Cassandra cannot. Cassandra locates a partition by hashing the full partition key (here both user_name and contact_name), so with only user_name it would have to scan the whole table. Cassandra tries hard not to let you make that kind of query, which is why it rejects this one.
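
One possible way to serve both queries from a single table, offered as a sketch and not as something the answer above prescribes, is to make user_name the sole partition key and move contact_name into the clustering columns. The table name contacts_by_user is hypothetical:

CREATE TABLE contacts_by_user (
    user_name text,
    contact_name text,
    contact_id int,
    contact_data blob,
    -- user_name alone selects the partition;
    -- contact_name and contact_id order the rows inside it
    PRIMARY KEY ((user_name), contact_name, contact_id)
);

-- Query 1: data for a specific user and contact
SELECT contact_data FROM contacts_by_user WHERE user_name = 'foo' AND contact_name = 'bar';

-- Query 2: all contact names for a user (reads a single partition)
SELECT contact_name FROM contacts_by_user WHERE user_name = 'foo';

Both queries supply the complete partition key, so neither needs a table scan or a secondary index. With a few hundred contacts per user, each partition holds only a few hundred rows, which is usually well within what is considered a comfortably sized partition.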