Search code examples
cassandracql

How to apply "not equal to" ( != ) operator on timeuuid column on cassandra query?


I have a column in cassandra table which contains timeuuid values. I want to exclude a row with a timeuuid value in a search query. if I apply '!=', I get the following error

Query:

select * from X where id != '0651de16-fa62-11eb-8318-dc4a3e6d5697 ' allow filtering ;

Error:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Unsupported "!=" relation: id != '0651de16-fa62-11eb-8318-dc4a3e6d5697

I got that I can't apply '!=' on timeuuid values. In that case how to exclude a certain timeuuid value while searching?


Solution

  • I got that I can't apply '!=' on timeuuid values.

    It's not just for timeUUIDs. The not equals operator (!=) does not exist in CQL.

    Searching vs. Querying

    I want to exclude a row with a timeuuid value in a search query.

    First of all, searching and querying are not the same thing. Querying involves pulling back a result set of a known size with specific criteria. On the other hand, searching returns a result set of an unknown size using very liberal criteria.

    It's important to understand this difference, because Cassandra is built to support querying. And very specific querying, at that. Unfortunately this use case requires Cassandra to perform random reads (instead of sequential reads). The reality is that Cassandra was not designed to support random read patterns.

    Due to Cassandra's distributed nature, it also doesn't do well with full table scans. Essentially, query time becomes network time (for each partition in the result set) and that's never going to be fast.

    This is why the not equals operator does not exist in CQL. Because there's no way that Cassandra can isolate a specific partition or node with "not equals" criteria.

    That being said, there are some solutions available:

    • A distributed analytics layer like Apache Spark can run that query.
    • If it's just one or a few timeUUIDs which need to be excluded, then perhaps that's something which could be done in the application layer.
    • For a pure Cassandra solution, build your query to pull back the ids that you actually want. That's much easier for Cassandra to do, as opposed to telling Cassandra what you don't want.

    Note: The ALLOW FILTERING directive should never be used on a production system. The idea with Cassandra is to build tables to support intended, known queries. Sometimes data may need to be duplicated into one or more "query tables," and that's ok.