Search code examples
cassandradatastax-java-drivercassandra-cli

ALLOW FILTERING implementation @Cassandra


I have a table like below:

CREATE TABLE tab(
    categoryid text,
    id text,
    name text,
    author text,
    desc text,
    PRIMARY KEY (categoryid , id)
) WITH CLUSTERING ORDER BY (id ASC);

CREATE INDEX ON tab (name);
CREATE INDEX ON tab (author);

When I execute the below queries:

select * from tab ALLOW FILTERING;  ---1
select * from tab where id = 'id01' ALLOW FILTERING;  ---2
select * from tab where categoryid = 'cid01' ALLOW FILTERING;  ---3

What is happening in the back end for the three queries?

Is it going to completely neglect the key indices on id and categoryid.

Appreciate the reply. Thanks


Solution

  • By specifying ALLOW FILTERING you are basically telling Cassandra that you are ok with it retrieving all rows from your table, examining each of them one-by-one, and returning only the ones that match your WHERE clause. Depending on your WHERE clause, it may or may not need to do this. This can be a painfully slow operation on a table that is large or where multiple nodes must be queried to retrieve all of the data.

    select * from tab ALLOW FILTERING;
    

    For this query, you do not have a WHERE clause specified, so it will return all of the rows in the tabs table. ALLOW FILTERING shouldn't alter performance in any noticeable way on this one.

    select * from tab where id = 'id01' ALLOW FILTERING;
    

    On this query, Cassandra will retrieve all rows from the tab table, and just return the ones where id='id01'. The fact that id is a clustering key really won't make much of a difference here. This is because the partition key is what is important for data lookup, and clustering keys are typically use for enforcing row uniqueness and ordering. And incidentally, since you are not querying by partition key (categoryid) it can't even enforce a clustering order on id, so your result set will not be sorted by id.

    select * from tab where categoryid = 'cid01' ALLOW FILTERING;
    

    Queries like this make Cassandra happy! Yes, the fact that categoryid is your partitioning key will be respected, and your result set will contain all CQL rows where categoryid='cid01'. And as an added bonus, in this case your clustering order will be enforced.

    DataStax's Developer Blog has a decent article on this topic that might also be of some further help to you: ALLOW FILTERING Explained.

    In my opinion, I would avoid queries that require ALLOW FILTERING. And I would certainly not run one in production or in an OLTP environment.