Search code examples
hashdata-partitioning

How to algorithmically partition a keyspace?


This is related to consistent hashing and while I conceptually understand what I need to do, I'm having a hard time translating this into code.

I'm trying to divide a given keyspace (say, 128 bits) into equal sized partitions. I want the upper bound (highest key) of each partition.

Basically, how would I complete this?

#define KEYSPACE_BYTE_SIZE  16
#define KEYSPACE_BIT_SIZE   (KEYSPACE_BYTE_SIZE * 8)

typedef struct _key
{ 
    char byte[KEYSPACE_BYTE_SIZE];
} key;

key * partition_keyspace( int num_partitions )
{
    key * partitions = malloc( sizeof(key) * num_partitions );

    // ...

}

Edit:

I suppose another way of saying this is:

for (i = 0; i < num_partitions; i++)
{
    partitions[i] = ((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * i;
}

Of course the problem is 2 ^ 128 is a very large number and can't be contained in any single integer variable in C with which to do the math (hence the char[16] struct).

I really don't want to use a large number library (or any library) for this.

Edit:

Although, in actuality the numbers I'm looking for is:

for (i = 0; i < num_partitions; i++)
{
    partitions[i] = (((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * (i + 1)) - 1;
}

Solution

  • The highest key in any particular partition will obviously be comprised of all 1-bits. If you have the lower n bits for your keys, and the upper m bits for your partition-ids, then all you need to do is run an m-bit counter, and concatenate it with n ones.
    To illustrate, assume an 8-bit keyspace with the upper 2 bits for the partitions (so num_partitions = 2^2 = 4, and the lower 6 for the keys. The highest key in each partition will be these four:

    00 111111
    01 111111
    10 111111
    11 111111
    

    In order to generate them, all you need to do is:

    for (int i = 0; i < num_partitions; i++)
        highest_key = (i << 6) | 0x3f // where 6 is key_bits and 0x3f is six ones.
    

    Of course, this assumes num_partitions is a power of two.

    Naturally, for a key-space as large as yours it won't be as simple as the above, since you can't fit everything into a single variable. Still, the principle remains the same. As long as your num_partitions is small enough, you can fit the counter into an ordinary int variable, copy it into the upper bits, and then filling the rest with ones is trivial.