
Differences for max time uuid in cqlsh and Python cassandra driver


I have a bug where my Python code can't find a record in Cassandra, and it seems to boil down to differences between the minTimeuuid/maxTimeuuid functions in cqlsh and their counterparts in the Python driver.

When I run a query in cqlsh (the ts column is a TimeUUID):

cqlsh:mydb> SELECT minTimeuuid(unixTimestampOf(ts)), maxTimeuuid(unixTimestampOf(ts)), unixTimestampOf(ts), dateOf(ts) from mytable where ...;

 minTimeuuid(unixTimestampOf(ts))     | maxTimeuuid(unixTimestampOf(ts))     | unixTimestampOf(ts) | dateOf(ts)
--------------------------------------+--------------------------------------+---------------------+-------------------------
 177dc170-b8e3-11e1-8080-808080808080 | 177de87f-b8e3-11e1-7f7f-7f7f7f7f7f7f |       1339982128903 | 2012-06-18 03:15:28+0200

When I run the same thing in Python:

Python 2.7.12 (default, Oct  8 2019, 14:14:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cassandra.util
>>> from datetime import datetime
>>> dt = datetime(2012,6,18,1,15,28,903000)
>>> cassandra.util.max_uuid_from_time(dt)
UUID('177dc170-b8e3-11e1-bf7f-7f7f7f7f7f7f')
>>> cassandra.util.min_uuid_from_time(dt)
UUID('177dc170-b8e3-11e1-8080-808080808080')

Note that the min TimeUUIDs are identical but the max TimeUUIDs are not:

       | Min                                  | Max
cqlsh  | 177dc170-b8e3-11e1-8080-808080808080 | 177de87f-b8e3-11e1-7f7f-7f7f7f7f7f7f
Python | 177dc170-b8e3-11e1-8080-808080808080 | 177dc170-b8e3-11e1-bf7f-7f7f7f7f7f7f

I don't understand how they can be different. Any ideas? I've tried the same thing under Python 3.5.2 instead of 2.7, with the same results.
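A minimal sketch that decodes the four UUIDs from the question using only Python's standard `uuid` module, to make the differences concrete. The helper names (`unix_millis`, `sub_milli_ticks`) and the constant name `UUID_EPOCH_OFFSET` are illustrative; the constant's value is the standard count of 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.

```python
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

# UUIDs copied from the cqlsh and Python outputs above.
SAMPLES = [
    ("cqlsh min", uuid.UUID("177dc170-b8e3-11e1-8080-808080808080")),
    ("cqlsh max", uuid.UUID("177de87f-b8e3-11e1-7f7f-7f7f7f7f7f7f")),
    ("driver min", uuid.UUID("177dc170-b8e3-11e1-8080-808080808080")),
    ("driver max", uuid.UUID("177dc170-b8e3-11e1-bf7f-7f7f7f7f7f7f")),
]


def unix_millis(u):
    """Unix timestamp in milliseconds embedded in a version-1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) // 10000


def sub_milli_ticks(u):
    """100-ns ticks past the whole millisecond (0..9999)."""
    return (u.time - UUID_EPOCH_OFFSET) % 10000


for name, u in SAMPLES:
    # clock_seq_hi_variant is the first of the 8 least significant bytes.
    print("{:10} ms={} ticks={:4} clock_seq_hi=0x{:02x} variant={}".format(
        name, unix_millis(u), sub_milli_ticks(u), u.clock_seq_hi_variant, u.variant))
```

All four values carry the same millisecond (1339982128903, matching `unixTimestampOf(ts)`), but they differ in two places. cqlsh's max sets the sub-millisecond ticks to 9999, the last 100-ns tick of that millisecond, while the driver's max keeps 0 ticks, the same timestamp as the min. Its low bytes also differ: cqlsh uses 0x7f, which Python reports as a non-RFC 4122 variant, whereas the driver's 0xbf keeps the RFC 4122 variant bits. This suggests, though it is not confirmed here, that a stored TimeUUID created later within the same millisecond would sort after the driver's max bound.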


Solution

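  • Cassandra's timeuuid comparison compares the 8 least significant bytes of the UUID one at a time as signed 8-bit integers, while the most significant 64 bits, which hold the timestamp, are compared by timestamp value (unsigned). The minTimeuuid/maxTimeuuid functions therefore fill the least significant bytes with 0x80 (-128, the smallest signed byte) and 0x7f (127, the largest) to produce the smallest/largest UUID for a given timestamp in Cassandra's compare order.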

    My guess is that whoever wrote the original code overlooked the difference between signed and unsigned byte comparison, and the resulting order then had to be kept as legacy behavior to avoid breaking existing data.

    You can check out this commit for more details: https://github.com/apache/cassandra/commit/6d266253a5bdaf3a25eef14e54deb56aba9b2944
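A minimal Python sketch of the comparison order described above, assuming it reduces to "timestamp first, then the 8 least significant bytes compared as signed 8-bit integers". `cassandra_timeuuid_key` is a hypothetical helper (not part of the driver or of Cassandra), and the `stored` UUID is made up for illustration.

```python
import struct
import uuid


def cassandra_timeuuid_key(u):
    """Hypothetical sort key approximating Cassandra's timeuuid order.

    Timestamp first, then the 8 least significant bytes compared one at a
    time as signed 8-bit integers (struct format 'b', range -128..127).
    """
    signed_lsb = struct.unpack(">8b", u.bytes[8:])
    return (u.time, signed_lsb)


# Bounds produced by cqlsh's minTimeuuid/maxTimeuuid in the question.
lo = uuid.UUID("177dc170-b8e3-11e1-8080-808080808080")
hi = uuid.UUID("177de87f-b8e3-11e1-7f7f-7f7f7f7f7f7f")

# Made-up stored value with the same timestamp as `hi` and RFC 4122
# variant bits (first low byte 0x9f, i.e. -97 as a signed byte).
stored = uuid.UUID("177de87f-b8e3-11e1-9f3a-0123456789ab")

# Signed order: stored falls inside [lo, hi].
print(cassandra_timeuuid_key(lo) <= cassandra_timeuuid_key(stored) <= cassandra_timeuuid_key(hi))  # True

# Plain unsigned byte order: 0x9f > 0x7f puts stored above the max bound,
# and even the min bound's 0x80... sorts after the max bound's 0x7f...
print(stored.bytes[8:] > hi.bytes[8:])  # True
print(lo.bytes[8:] > hi.bytes[8:])      # True
```

Under the signed order, 0x80 and 0x7f really are the extreme byte values, which is why cqlsh's bounds bracket every UUID with the same timestamp; under an unsigned byte comparison those same bounds would describe an empty range.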