
Horizontally scaling out or sharding Python-RQ or Redis with Python


I'm trying to horizontally scale out the Redis instance that acts as the task server for Python-RQ.

As far as I know, the best way to do this would be to add sharding logic (most likely using consistent hashing) into a custom ConnectionPool and/or Connection class. I would rather use a library for the consistent-hashing mechanism, since that seems like something that should already be available and would most likely be better and more battle-tested than a homegrown solution.
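A minimal sketch of what I mean, with a hand-rolled ring just for illustration (the class name, the virtual-replica count, and the node labels are all my own; a packaged implementation such as uhashring provides the same mechanism):

```python
import bisect
import hashlib


class HashRing:
    """Illustrative consistent-hash ring mapping keys to node names."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per server, smooths the distribution
        self._keys = []           # sorted hash positions on the ring
        self._ring = {}           # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            self._ring[h] = node
            bisect.insort(self._keys, h)

    def get_node(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[self._keys[idx]]
```

Each queue name or job key would then map deterministically to one Redis server, and the custom ConnectionPool would hand out a connection to `ring.get_node("rq:queue:default")`.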

What would be a good pattern to do something like this? Is there some library that I should be looking into? Is there something I'm missing that I should be taking into account?

Thanks very much!


Solution

  • I think you should keep a couple of things in mind.

    The first is where your bottleneck actually is, and whether partitioning the Redis backend is worth it for a single Python-RQ deployment. Redis generally has enough throughput that the bottleneck is more likely to be the number of workers and the number of jobs you want processed at a given time.

    Ask yourself: how long can this job wait before it is processed? If you can figure out that value, the latency, you have the key metric for deciding when to add or remove nodes in your architecture.
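As a sketch of that latency-driven decision (the target value and function names are illustrative, not part of RQ; RQ jobs do carry timestamps such as `job.enqueued_at` and `job.started_at` that could feed this):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical latency target: the longest a job may sit in the queue.
TARGET_WAIT_SECONDS = 30.0


def observed_wait_seconds(enqueued_at: datetime, started_at: datetime) -> float:
    """Time a job spent waiting between enqueue and start of processing."""
    return (started_at - enqueued_at).total_seconds()


def should_add_workers(recent_waits, target=TARGET_WAIT_SECONDS):
    """Scale out when the worst recent wait exceeds the latency target."""
    return bool(recent_waits) and max(recent_waits) > target
```

If the worst observed wait stays well under the target, adding Redis shards buys you nothing; adding or removing workers is the cheaper lever.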

    That said, if you do want to improve your infrastructure with a sharding solution based on consistent hashing, you have to take into account the problem of rebalancing keys when a node is added or deleted.

    For instance, a worker could end up connected to the wrong Redis server for an already-existing job, because a server was added or removed after that job was enqueued and before it finished.
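The rebalancing cost is easy to see with naive modulo sharding, the scheme consistent hashing is meant to improve on. In this sketch (key names and node counts are made up), going from three nodes to four remaps roughly three-quarters of the keys; a consistent-hash ring cuts the remapped fraction to about 1/N, but never to zero, so the stale-connection scenario above remains possible:

```python
import hashlib


def shard(key: str, n_nodes: int) -> int:
    """Naive modulo sharding: which node index a key lands on."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_nodes


keys = [f"rq:job:{i}" for i in range(1000)]
before = {k: shard(k, 3) for k in keys}  # cluster of 3 Redis nodes
after = {k: shard(k, 4) for k in keys}   # one node added
moved = sum(1 for k in keys if before[k] != after[k])
# Most keys now resolve to a different node than when they were enqueued.
```

Any worker that resolves a pre-existing job's key after the topology change lands on a different server, which is exactly the failure mode described above.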