python, python-2.7, uuid, sharding

Python: Sharding based on UUID


I am working on database sharding where the database bucket has to be chosen in the Python code (application-level sharding). I have 3 database buckets and need to shard by user ID. The user ID is generated by uuid.uuid4(). How can I shard with these inputs? I tried converting the UUID to an int and then taking the modulus, but the modulus operator is failing.


Solution

  • The modulus operator should be fine:

    >>> import uuid
    >>> int(uuid.uuid4()) % 3
    2L
    >>> int(uuid.uuid4()) % 3
    1L
    >>> int(uuid.uuid4()) % 3
    2L
    >>> int(uuid.uuid4()) % 3
    1L
    >>> int(uuid.uuid4()) % 3
    1L
    >>> int(uuid.uuid4()) % 3
    0L
    >>> int(uuid.uuid4()) % 3
    1L
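
In practice you shard on an ID you already have stored, not on a fresh uuid4() each time. Here is a minimal sketch of that, assuming the ID may arrive either as a uuid.UUID or as its string form; `bucket_for` and `NUM_BUCKETS` are names I made up for illustration. One possible cause of the failure in the question (the question doesn't confirm it) is applying % directly to the string form, which Python treats as string formatting and rejects with a TypeError.

```python
import uuid

NUM_BUCKETS = 3  # the three database buckets from the question


def bucket_for(user_id):
    """Return a bucket index in range(NUM_BUCKETS) for a UUID or its string form.

    Hypothetical helper, not part of the uuid module.
    """
    if not isinstance(user_id, uuid.UUID):
        # Parse the canonical string form back into a UUID object first;
        # taking % of the raw string would be string formatting, not arithmetic.
        user_id = uuid.UUID(str(user_id))
    # UUID.int is the 128-bit integer value of the UUID.
    return user_id.int % NUM_BUCKETS


# A fixed example ID so the result is repeatable.
user_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(bucket_for(user_id))
print(bucket_for(str(user_id)) == bucket_for(user_id))
```

The same ID always lands in the same bucket, which is the property you need for lookups.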
    

    To future-proof your design, though, I'd suggest coding for at least 16 shards from the start, for example with 16 different hostnames pointing to 3 different backend hosts:

    myhostname00 IN CNAME backend01
    myhostname01 IN CNAME backend01
    myhostname02 IN CNAME backend01
    myhostname03 IN CNAME backend01
    myhostname04 IN CNAME backend01
    myhostname05 IN CNAME backend01
    myhostname06 IN CNAME backend02
    myhostname07 IN CNAME backend02
    myhostname08 IN CNAME backend02
    myhostname09 IN CNAME backend02
    myhostname10 IN CNAME backend02
    myhostname11 IN CNAME backend03
    myhostname12 IN CNAME backend03
    myhostname13 IN CNAME backend03
    myhostname14 IN CNAME backend03
    myhostname15 IN CNAME backend03
    

    This way you can add more backend servers in the future and move users to them without changing your code. You can even distribute users unevenly if some backend servers are more or less performant than others.
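
On the Python side, the 16-shard scheme could look roughly like the sketch below. It assumes the hostnames follow the myhostnameNN pattern from the DNS records above; `shard_hostname` and `NUM_LOGICAL_SHARDS` are illustrative names, not anything standard.

```python
import uuid

# Number of logical shards. This stays fixed in the code; only the
# DNS records decide which backend each logical shard points to.
NUM_LOGICAL_SHARDS = 16


def shard_hostname(user_id):
    """Map a uuid.UUID to one of myhostname00 .. myhostname15.

    Hypothetical helper; the hostname pattern is taken from the
    CNAME example above.
    """
    shard = user_id.int % NUM_LOGICAL_SHARDS
    return "myhostname%02d" % shard


# A fixed example ID so the result is repeatable.
user_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(shard_hostname(user_id))  # -> myhostname08
```

The application only ever connects to the logical hostname. Moving, say, myhostname08 to a new backend then means copying that shard's data and changing one CNAME record, with no code change.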