Tags: java, sorting, cassandra, datastax-java-driver, timeuuid

Java UUID compareTo not working correctly for Type1 UUIDs


I am working on a use case where data needs to be sorted by UUID. All of the UUIDs are Type 1 (time-based) and are generated with the DataStax Cassandra Java driver (UUIDs.timeBased()). I found that UUID.compareTo does not sort some of these UUIDs correctly. The logic in compareTo is:

/**
 * Compares this UUID with the specified UUID.
 *
 * <p> The first of two UUIDs is greater than the second if the most
 * significant field in which the UUIDs differ is greater for the first
 * UUID.
 *
 * @param  val
 *         {@code UUID} to which this {@code UUID} is to be compared
 *
 * @return  -1, 0 or 1 as this {@code UUID} is less than, equal to, or
 *          greater than {@code val}
 *
 */
public int compareTo(UUID val) {
    // The ordering is intentionally set up so that the UUIDs
    // can simply be numerically compared as two numbers
    return (this.mostSigBits < val.mostSigBits ? -1 :
            (this.mostSigBits > val.mostSigBits ? 1 :
             (this.leastSigBits < val.leastSigBits ? -1 :
              (this.leastSigBits > val.leastSigBits ? 1 :
               0))));
}

I had the following two UUIDs, both generated with the DataStax Cassandra driver for Java:

UUID uuid1 = java.util.UUID.fromString("7fff5ab0-43be-11ea-8fba-0f6f28968a17");
UUID uuid2 = java.util.UUID.fromString("80004510-43be-11ea-8fba-0f6f28968a17");
uuid1.timestamp(); // 137997224058510000
uuid2.timestamp(); // 137997224058570000

From the timestamps above, uuid1 is clearly earlier than uuid2, so comparing them with UUID.compareTo should return -1. Instead it returns 1, which claims that uuid1 is greater than uuid2:

uuid1.compareTo(uuid2); // returns 1, expected -1

On further analysis, I found that the most significant bits (msb) of uuid2 form a negative long, whereas the msb of uuid1 form a positive one. The first hex digit of uuid2 is 8, so the sign bit is set, and because compareTo compares mostSigBits as signed longs, it returns 1 instead of -1.

u_7fff5ab0 = {UUID@2623} "7fff5ab0-43be-11ea-8fba-0f6f28968a17"
mostSigBits = 9223190274975338986
leastSigBits = -8090136810520933865

u_80004510 = {UUID@2622} "80004510-43be-11ea-8fba-0f6f28968a17"
mostSigBits = -9223296100696452630
leastSigBits = -8090136810520933865
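
The sign flip described above can be reproduced with a minimal sketch that uses only java.util.UUID. The class name SignedMsbDemo and its two helper methods are hypothetical and exist purely for illustration; signedMsbCompare mirrors the first step of the compareTo source quoted earlier.

```java
import java.util.UUID;

final class SignedMsbDemo {
    // Signed comparison of the upper 64 bits, which is the first
    // check performed by the compareTo source quoted in the question.
    static int signedMsbCompare(UUID a, UUID b) {
        return Long.compare(a.getMostSignificantBits(), b.getMostSignificantBits());
    }

    // Comparison of the embedded 60-bit timestamps (100 ns units).
    // UUID.timestamp() is only defined for version 1 UUIDs.
    static int timestampCompare(UUID a, UUID b) {
        return Long.compare(a.timestamp(), b.timestamp());
    }

    public static void main(String[] args) {
        UUID uuid1 = UUID.fromString("7fff5ab0-43be-11ea-8fba-0f6f28968a17");
        UUID uuid2 = UUID.fromString("80004510-43be-11ea-8fba-0f6f28968a17");

        // time_low occupies the top 32 bits of mostSigBits, so a leading
        // hex digit of 8 or higher sets the sign bit of the long.
        System.out.println(uuid1.getMostSignificantBits() > 0); // true
        System.out.println(uuid2.getMostSignificantBits() < 0); // true

        System.out.println(signedMsbCompare(uuid1, uuid2)); // positive: uuid1 sorts after uuid2
        System.out.println(timestampCompare(uuid1, uuid2)); // negative: uuid1 is earlier
    }
}
```

Note that the timestamp's low 32 bits sit in the most significant position of the UUID, so even an unsigned comparison of mostSigBits would not give chronological order in general.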

Is this behaviour normal for UUIDs and their comparison with each other? If so, how should such time-based UUIDs be sorted?

Thank you


Solution

  • Comparing time-based UUIDs needs special care. From the docs:

    Lastly, please note that Cassandra's timeuuid sorting is not compatible with UUID.compareTo(java.util.UUID) and hence the UUID created by this method are not necessarily lower bound for that latter method.

    Time-based UUIDs should not be compared with java.util.UUID#compareTo. To order two time-based UUIDs, compare the timestamps they contain. You can write a small custom utility method for this, or simply compare the two timestamps directly. Here is an example:

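    // UUIDs is the driver 3.x helper class com.datastax.driver.core.utils.UUIDs.
    // Both arguments must be time-based (version 1) UUIDs.
    // unixTimestamp() returns milliseconds since the Unix epoch.
    int compareTo(UUID a, UUID b) {
        return Long.compare(UUIDs.unixTimestamp(a), UUIDs.unixTimestamp(b));
    }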
    

    To learn more, see the DataStax Java driver documentation for the UUIDs class.
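
    If a driver dependency is not wanted at the comparison site, or if UUIDs created within the same millisecond must still be ordered, the same idea can be sketched with java.util.UUID.timestamp(), which keeps the full 100 ns resolution. The class name TimeUuidComparator is hypothetical, and the tie-breaker on the least significant bits is an assumption made only to get a deterministic total order; it is not claimed to reproduce Cassandra's own timeuuid ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

final class TimeUuidComparator implements Comparator<UUID> {
    @Override
    public int compare(UUID a, UUID b) {
        // timestamp() throws UnsupportedOperationException
        // if either UUID is not version 1 (time-based).
        int byTime = Long.compare(a.timestamp(), b.timestamp());
        if (byTime != 0) {
            return byTime;
        }
        // Assumed tie-breaker: clock sequence and node bits, compared
        // as unsigned so the sign bit cannot flip the result.
        return Long.compareUnsigned(a.getLeastSignificantBits(),
                                    b.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        List<UUID> ids = new ArrayList<>();
        ids.add(UUID.fromString("80004510-43be-11ea-8fba-0f6f28968a17"));
        ids.add(UUID.fromString("7fff5ab0-43be-11ea-8fba-0f6f28968a17"));

        ids.sort(new TimeUuidComparator());
        System.out.println(ids); // the 7fff5ab0... UUID is listed first
    }
}
```

    Passing this comparator to List.sort, Collections.sort, or a TreeMap constructor avoids relying on UUID's natural ordering for time-based values.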