Search code examples
cassandracqlcqlsh

Cassandra SELECT DISTINCT and timeout issue


When running the following CQL query:

SELECT DISTINCT partition_key FROM table_name;

This is supposedly meant to return the list of partition keys that are in use for the given table. However, with the default timeout settings of 10s, it always times out:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Changing the timeout settings to:

read_request_timeout_in_ms: 60000
range_request_timeout_in_ms: 60000
request_timeout_in_ms: 60000

And then running said query results in several Cassandra nodes crashing, including the coordinator node. The table has approximately >100M rows with about 5000 unique partition keys.

Is there a workaround to find the unique list of partition keys?


Solution

  • This query should work fine on modern versions of cassandra (2.1 and newer) assuming you're using a client that supports paging/fetch-size, and use a sufficiently low fetch-size (the actual limit depends on your server load).

    Using a third party driver, look for an option to drop the page/fetch size. Set it to 100 and see if it behaves better.

    Using cqlsh, if you have cassandra 3.0 or newer, try PAGING 100;