Search code examples
algorithmcassandratokenpartitionmurmurhash

What is the algorithm behind Cassandra's token function?


The Token function in my driver doesn't support a composite partition key, but it works very well with a single partition key, it takes a binary in 8 bits form as an input and pass it to murmur3 hash function and extract the 64-signed-little-integer (Token) from the result of murmur3 and ignore any extra binary buffer.

So my hope is to generate the binary for a composite partition key and then pass it to murmur3 as usual, an algorithm or bitwise operations will be really helpful or at least a source in any programming language.

I don't mean murmur3 part, only the token side which converts/mixes the composite partition key and outputs raw bytes in binary form.


Solution

  • Finally I found a solution to my question :The Algorithm to compute the token for a composite partition key : Primary_key((text, int)) -> therefore the partition key is a composite_partition_key (text, int).

    Example : a row with composite_partition_key ('hello', 1)

    Applying the algorithm :

    1- lay-out the components of the composite partition key in big-endian (16 bits) presentation :

    first_component = 'hello' -> 68 65 6c 6c 6f

    sec_component = 1 -> 00 00 00 01

    68 65 6c 6c 6f 00 00 00 01

    2- add two-bytes length of component before each component

    first_component = 'hello', length= 5-> 00 05 68 65 6c 6c 6f

    sec_component = 1, therefore length= 4 -> 00 04 00 00 00 01

    00 05 68 65 6c 6c 6f 00 04 00 00 00 01

    3- add zero-value after each component

    first_component = 'hello' -> 00 05 68 65 6c 6c 6f 00

    sec_component = 1 -> 00 04 00 00 00 01 00

    4- result

    00 05 68 65 6c 6c 6f 00 00 04 00 00 00 01 00

    now pass the result as whatever binary base your murmur3 function understand (make sure it's cassandra variant).