Search code examples
databaseredis

How to set hash tags correctly for a uuid key stored as byte array in Redis


As per Redis documentation hashtags could be used to force assign keys to the same hash slot. I have a use case where we are storing {UUID}Integer as a byte array in Redis where I expect only the UUID part to be used to assign a hash slot. The issue that we are facing is If the UUID contains characters 7b or 7d, it collides with { and } represented by ASCII byte values of 123 (0x7b) and 125 (0x7d) respectively which is impacting the distribution of these keys.

Eg - 7d3e4567-e89b-12d3-a456-426614174000 gets printed as {}>Eg\xe8\x9b\x12\xd3\xa4VBf\x14\x17@\x00} when stored with hashtags in Redis.

Is there a way to escape the characters in UUID for hash slot calculation to work correctly or is there any other workaround?


Solution

  • There is no way to tell Redis that a 7b or a 7d byte is not a curly bracket.

    The only way around this problem that I can see is to encode the UUIDs differently.

    I assume that you care about encoding speed and memory consumption. Currently, you are using 16 bytes per UUID which is optimal.

    You can use Base64 encoding to avoid the problem. it would require one byte per 6 bits, hence 22 bytes per UUID.

    Another option would be to use a custom encoding: every 7 bits are converted to a character 00..81 except 7b and 7d. The encoding is relatively simple - you'll have to extract 7 bits at a time and replace 7b with 80 and 7d with 81. It would require one byte per 7 bits, hence 19 bytes per UUID.

    Another possible encoding is to add 2 bytes to the UUID. Now you have 16 bytes + 16 bits. Set these 16 parity bits to 0. Next, for each of the 16 bytes: if it is 7b - replace it with 7a and set the matching parity bit. If it is 7d - replace it with 7c and set the matching parity bit. However, there is a problem with this encoding: the two added bytes could become 7b or 7d. To avoid it - add 3 bytes instead and divide the 16 parity bits between these 3 bytes, avoiding the most significant bit of each of these bytes. You can further optimize: for each UUID byte: if (byte & 0x01 == 0) { byte |= 1; set matching parity bit }

    More sophisticated encodings could reduce this number to 17 bytes per UUID but would be slower to compute.

    Another option to consider: If you are generating these UUIDs you can simply avoid generating UUIDs that contain 7b and 7d.