I am working on database sharding where I have to choose the database bucket in the Python code (application-level sharding). I have 3 different database buckets and have to shard by user ID. The user ID is generated by uuid.uuid4(). How can I shard with these inputs? I tried converting the UUID to an int first and then taking the modulus, but the modulus operator is failing.
The modulus operator should be fine:
>>> import uuid
>>> int(uuid.uuid4()) % 3
2L
>>> int(uuid.uuid4()) % 3
1L
>>> int(uuid.uuid4()) % 3
2L
>>> int(uuid.uuid4()) % 3
1L
>>> int(uuid.uuid4()) % 3
1L
>>> int(uuid.uuid4()) % 3
0L
>>> int(uuid.uuid4()) % 3
1L
But to future-proof your design, I'd suggest coding for at least 16 shards from the start, for example with 16 different hostnames pointing to 3 different backend hosts:
myhostname00 IN CNAME backend01
myhostname01 IN CNAME backend01
myhostname02 IN CNAME backend01
myhostname03 IN CNAME backend01
myhostname04 IN CNAME backend01
myhostname05 IN CNAME backend01
myhostname06 IN CNAME backend02
myhostname07 IN CNAME backend02
myhostname08 IN CNAME backend02
myhostname09 IN CNAME backend02
myhostname10 IN CNAME backend02
myhostname11 IN CNAME backend03
myhostname12 IN CNAME backend03
myhostname13 IN CNAME backend03
myhostname14 IN CNAME backend03
myhostname15 IN CNAME backend03
This way you can add more backend servers in the future and move users to them without changing your code. You can even distribute users unevenly if some backend servers are more or less performant than others.
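On the application side, this could look roughly like the sketch below. It assumes the hostnames follow the `myhostnameNN` pattern from the records above; `shard_hostname` and `NUM_LOGICAL_SHARDS` are hypothetical names chosen for illustration:

```python
import uuid

NUM_LOGICAL_SHARDS = 16  # fixed number of logical shards, independent of backend count


def shard_hostname(user_id, num_shards=NUM_LOGICAL_SHARDS):
    """Map a user ID (UUID or its string form) to a logical shard hostname."""
    if not isinstance(user_id, uuid.UUID):
        user_id = uuid.UUID(str(user_id))
    shard = user_id.int % num_shards
    # Zero-padded to two digits to match myhostname00 .. myhostname15.
    return "myhostname%02d" % shard
```

The code only ever sees the 16 logical names; which backend each name resolves to is decided in DNS. The number of logical shards has to stay fixed once data is stored, because changing it would map existing users to different shards.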