Search code examples
cassandracql

how to use in and order by at the same time in cassandra


I see the official document: https://docs.datastax.com/en/cql/3.1/cql/cql_using/useColumnsSort.html

can use "SELECT * FROM users WHERE userID IN (102,104) ORDER BY age ASC;"

how to define the users table or materialized view?

PAGING OFF is a must?

Here is my current view:

CREATE MATERIALIZED VIEW navigation_by_id_time AS
       SELECT * FROM navigation
       WHERE competition_id IS NOT NULL AND event_type_id IS NOT NULL AND event_id IS NOT NULL AND market_id IS NOT NULL AND market_start_time IS NOT NULL AND market_suspend_time IS NOT NULL
       PRIMARY KEY (event_type_id, market_start_time, event_id, market_id, market_suspend_time);

This query doesn't work

select * from navigation_by_id_time where event_type_id in(1,2,3) order by market_start_time;

message="Cannot page queries with both ORDER BY and a IN restriction on the partition key; you must either remove the ORDER BY or the IN and sort client side, or disable paging for this query"

Thanks


Solution

  • Cassandra does not support paging when bth IN and ORDER BY are used. It is stated also in datastax documentation - Retrieval using IN keyword.

    Also, if you inspect the code, you will also get some more explanations.

    The check was added in JIRA-6722 and the explanations of the behaviour is:

    It is however not all that easy in practice. If you query a full page of each partition and there is many partitions in the IN, you'll load tons of data in memory, defeating in large parts the goal of paging. If you instead query less than the page size of each partition, you now may need to re-query some of the partitions depending on what the merge sort yield on those first pages.

    As a conclusion: you need to turn off paging in cqlsh. If you are using a Java client, this might help (also explains some more stuff).