When querying the text primary key in Cassandra CQL, the string comparison works the opposite way of what one expect, i.e.
cqlsh:test> select * from sl; name | data --------------------------+------ 000000020000000000000003 | null 000000010000000000000005 | null 000000010000000000000003 | null 000000010000000000000002 | null 000000010000000000000001 | null cqlsh:test> select name from sl where token(name) < token('000000010000000000000005'); name -------------------------- 000000020000000000000003 (1 rows) cqlsh:test> select name from sl where token(name) > token('000000010000000000000005'); name -------------------------- 000000010000000000000003 000000010000000000000002 000000010000000000000001 (3 rows)
In constrast, this is what I get from the string comparison in Python (and I think in most other languages):
>>>'000000020000000000000003' < '000000010000000000000005'
False
If I query without the token function, I get the following error:
cqlsh:test> select name from sl where name < '000000010000000000000005'; Bad Request: Only EQ and IN relation are supported on the partition key (unless you use the token() function)
The table description is:
CREATE TABLE sl (
name text,
data blob,
PRIMARY KEY (name)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
Is there an explanation in the docs I've missed or somewhere else on why such a strange string comparison order is chosen, or does the string comparison operator do not what I expect of it (i.e. returning me some unrelated order, i.e. the order of the rows as they are written to the database). I'm using the Murmur3Partitioner partitioner in case it matters.
In Cassandra rows are ordered by the hash of their key value. With the Random and Murmur3 partitioners there is a random element to the hash value, and thus the order is A) not meaningful, and B) designed to be evenly spread across the ring.
Therefore querying for tokens less than token('000000010000000000000005')
won't do a comparison based-on the string value of "000000010000000000000005". It will do a comparison on the hashed token value. By virtue of the results that you are seeing, the token value of the string "000000020000000000000003" is less than the token value of "000000010000000000000005".
For more information, check through this doc from DataStax: Paging Through Unordered Partitioner Results.
Assuming that you want to be able to query your data by the value of "name", you could build a table kind of like this:
CREATE TABLE sl (
type text,
name text,
data blob,
PRIMARY KEY (type, name)
)
I've created type
as a partitioning key. I'm not sure if your data makes sense to be divided by "type" (or anything else for that matter), so it's more for the sake of example than anything else. Anyway, with name
as a clustering key (determining the on-disk sort order) this query would work:
select * from sl where type='sometype' AND name < '000000010000000000000005';
Again its just an example, but I hope that helps to point you in the right direction.