Search code examples
cassandracqlcql3cassandra-2.1

Cassandra CQL where clause with multiple collection values?


My data model:-

tid                                  | codes        | raw          | type
-------------------------------------+--------------+--------------+------
a64fdd60-1bc4-11e5-9b30-3dca08b6a366 | {12, 34, 53} | {sdafb=safd} |  cmd

CREATE TABLE MyTable (
tid       TIMEUUID,
type      TEXT,
codes     SET<INT>,
raw       TEXT,
PRIMARY KEY (tid)
);
CREATE INDEX ON myTable (codes);

How to query the table to return rows based on multiple set values.

This works:-

select * from logData where codes contains 34;

But i want to get row based on multiple set values and none of this works:-

select * from logData where codes contains 34, 12; or 
select * from logData where codes contains 34 and 12; or
select * from logData where codes contains {34, 12};

Kindly assit.


Solution

  • If I create your table structure and insert a similar row to yours above, I can check for multiple values in the codes collection like this:

    aploetz@cqlsh:stackoverflow2> SELECT * FROM mytable 
        WHERE codes CONTAINS 34 
          AND codes CONTAINS 12
          ALLOW FILTERING;
    
     tid                                  | codes        | raw          | type
    --------------------------------------+--------------+--------------+------
     2569f270-1c06-11e5-92f0-21b264d4c94d | {12, 34, 53} | {sdafb=safd} |  cmd
    
    (1 rows)
    

    Now as others have mentioned, let me also tell you why this is a terrible idea...

    With a secondary index on the collection (and with the cardinality appearing to be fairly high) every node will have to be checked for each query. The idea with Cassandra, is to query by partition key as often as possible, that way you only have to hit one node per query. Apple's Richard Low wrote a great article called The sweet spot for Cassandra secondary indexes. It should make you re-think the way you use secondary indexes.

    Secondly, the only way I could get Cassandra to accept this query, was to use ALLOW FILTERING. What this means, is that the only way Cassandra can apply all of your fitlering criteria (WHERE clause) is to pull back every row and individually filter-out the rows that do not meet your criteria. Horribly inefficient. To be clear, the ALLOW FILTERING directive is something that you should never use.

    In any case, if codes are something that you will need to query by, then you should design an additional query table with codes as a part of the PRIMARY KEY.