Search code examples
algorithmlanguage-agnosticuniqueidentifier

Encoding a number with a suffix or prefix in a compact way


Let's say I have 6 digits order ids:

000000
000001
000003
...
000020
...
999999

And assume each of these come from a different node in a distributed system and I would like to encode the node id into the order. The easiest way to do this would be to simply reserve the first 2 digits for the node id like this:

010000 - second node
020001 - third node
010003 - second node again
150004 - 16th node
...

This works sort of fine, but since I know for sure I'm only expecting a small number of nodes (let's say 16) I'm losing lots of possible ids limiting myself to basically 10^4 instead of 10^6. Is there a smart way to encode the 15 unique nodes without limiting the possible numbers? Ideally, I would have 10^6 - 15 possibilities.

EDIT: I'm looking for a solution that won't equally distribute a range to each node id. I'm looking for a way to encode the node id in an already existing unique id, without losing (ideally) more that the number of nodes of possibilities.

EDIT2: The reason for which this has to be the string representation of a 6 digit number is because the API I'm working with requires this. There's no way around it, unfortunately.


Solution

  • I'm losing lots of possible ids limiting myself to basically 10^4 instead of 10^6.

    We still have 10^4 * 16 ids in total.

    Is there a smart way to encode the 15 unique nodes without limiting the possible numbers?

    This problem is similar to the distributed hash table keyspace partitioning. The best known solution for the problem is to create lots of virtual nodes, divide the keyspace among those virtual nodes and then assign those virtual nodes to physical in a particular manner (round-robin, random, on demand etc).

    The easiest way to implement keyspace partition is to make sure each node generates such an id, that:

    vnode_id = order_id % total_number_of_vnodes
    

    For example, if we have just 3 vnodes [0, 1, 2] then:

    vnode 0 must generate ids: 0, 3, 6, 9...
    vnode 1 must generate ids: 1, 4, 7, 10...
    vnode 2 must generate ids: 2, 5, 7, 11...
    

    If we have 7 vnodes [0, 1, 2, 3, 4, 5, 6] then:

    vnode 0 must generate ids: 0, 7, 14, 21... 
    vnode 1 must generate ids: 1, 8, 15, 22... 
    vnode 2 must generate ids: 2, 9, 16, 23... 
    ...
    vnode 6 must generate ids: 6, 13, 20, 27... 
    

    Then all physical nodes must map to the virtual in the known and common way, for example 1:1 mapping:

    physical node 0 takes vnode 0
    physical node 1 takes vnode 1
    physical node 2 takes vnode 2
    

    on demand mapping:

    physical node 0 takes vnode 0, 3, 7 (many orders)
    physical node 1 takes vnode 1, 4 (less orders)
    physical node 2 takes vnode 2 (no orders) 
    

    I hope you grasp the idea.

    Ideally, I would have 10^6 - 15 possibilities.

    Unfortunately, it is not possible. Consider this: we have a 10^6 of possible ids and 15 different nodes each generating an unique id.

    Basically, this means that one way or another we are dividing our ids among nodes, i.e. each node gets in average 10^6 / 15, which is much less than desirable 10^6 - 15.

    Using the method described above we still have 10^6 ids in total, but they will be partitioned among vnodes which in turn will be mapped to physical nodes. That is the best practical solution for your problem AFAIK.

    I'm looking for a solution that won't equally distribute a range to each node id. I'm looking for a way to encode the node id in an already existing unique id, without losing (ideally) more that the number of nodes of possibilities.

    Do not expect a miracle. There are might be lots of other tricks worth trying.

    For example, if Server and all Clients know that the next order id must be 235, but say Client 5 generates order id 240 (235 + 5) and send it to Server.

    Server expects order id 235, but receive order id 240. So now Server knows that this order comes form Client 5 (240 - 235).

    Or we can try to use another field to store client id. For instance if you have a time filed (HH:MM.SS), we might use seconds to store Client id.

    Just some examples, I guess you get the idea...